Manipulation-based Ranked Set Sampling Scheme

Cost-effective and efficient sampling methods are a main concern in many social, biological and environmental studies. In this article, an efficient sampling scheme, named the manipulation-based ranked set sampling (MBRSS) scheme, is introduced together with its properties for estimating the population mean and median. MBRSS is a mixture of the simple random sampling (SRS), ranked set sampling (RSS) and median ranked set sampling (MRSS) schemes and is applicable in situations where ordinary RSS cannot be conducted. It is shown that the proposed scheme provides an unbiased mean estimator provided the underlying distribution is symmetric. For asymmetric distributions, a weighted mean is proposed, where the optimal weights are computed using Shannon's entropy. Monte Carlo simulation is used to ascertain the effectiveness of the proposed mean and median estimators in the presence of outliers. We also compare the efficiency of the MBRSS and truncation-based ranked set sampling (TBRSS) schemes with respect to SRS under perfect and imperfect ranking, i.e., error in ranking with respect to the variable of interest. It is observed, on the basis of theoretical and numerical studies, that MBRSS is more efficient than SRS. Further, a real data set is used to illustrate the proposed MBRSS scheme. Subject Classification: MSC2010-62D05


Introduction
In many social and natural sciences where sampling is used, an efficient sampling method is sought, especially when measuring the characteristic of interest is costly or time-consuming. In this connection, McIntyre (1952) suggested an efficient sampling method as an alternative to SRS, later called the ranked set sampling (RSS) method, for estimating mean pasture and forage yield. Thereafter, many modifications were made to the basic RSS method to make it more economical or cost-effective. For instance, Samawi et al. (1996) suggested extreme ranked set sampling (ERSS) for estimating the population mean. Al-Nasser (2007) proposed L-ranked set sampling (LRSS) for estimating the population mean. Al-Omari and Raqab (2013) introduced truncation-based ranked set sampling (TBRSS) for estimating the population mean and median. Al-Nasser and Al-Omari (2015) introduced weighted TBRSS and showed that it is more efficient than ordinary TBRSS. Muttlak (1997) suggested median ranked set sampling (MRSS) for estimating the mean and median of symmetric and asymmetric distributions. Dell and Clutter (1972) showed that, under imperfect ranking, the sample mean remains an unbiased estimator of the population mean, but the ranking should be at least better than a random ordering. Stokes (1977) presented a simple linear model and showed that a concomitant variable can be used to rank the study variable. For more details about ranking based on a concomitant variable, see Zamanzade and Mohammadi (2016), Zamanzade and Vock (2015) and references therein.
In this paper, we propose a manipulation-based ranked set sampling (MBRSS) scheme for estimating the population mean and median. It gives the experimenter flexibility in selecting a more representative sample by adopting the SRS, RSS and MRSS schemes. It consumes fewer units than truncation-based ranked set sampling (TBRSS) and provides more efficient estimates than the conventional SRS scheme. The rest of the paper is organized as follows. In Section 2, RSS, MRSS, TBRSS and the proposed MBRSS are described. Estimation of the population mean and its efficiency are investigated in Section 3. Median estimation and its efficiency are elaborated in Section 4. The weighted mean estimators for skewed distributions are given in Section 5. A Monte Carlo simulation to ascertain the effectiveness of the proposed mean estimator in the presence of outliers is given in Section 6. TBRSS and MBRSS with a concomitant variable are studied in Section 7. An illustration of the proposed MBRSS with a real data set and its comparison with TBRSS are given in Section 8. Finally, concluding remarks are included in Section 9.

Sampling Methods
In this section, we describe the RSS, MRSS, TBRSS and MBRSS sampling methods.

Ranked Set Sampling (RSS)
RSS can be described as follows. To select $m$ units, identify $m^2$ units from the target population, arrange them into $m$ samples each of size $m$, and rank the units within each sample with respect to the variable of interest by any cost-free method. From the $i$th $(i = 1, 2, 3, \ldots, m)$ sample, select the $i$th smallest ranked unit for actual measurement. The whole procedure can be repeated $r$ times, if needed, to obtain an RSS sample of size $mr$.
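The RSS procedure can be sketched in a few lines of Python. This is only an illustration under perfect ranking: `rss_sample` and the zero-argument callable `draw` are hypothetical names introduced here, and sorting the true values stands in for the cost-free ranking step.

```python
import random

def rss_sample(draw, m, r=1):
    """Return a ranked set sample of size m*r under perfect ranking.

    `draw` is a hypothetical zero-argument callable returning one unit's value.
    """
    measured = []
    for _ in range(r):                        # r independent cycles
        for i in range(m):                    # i-th set of the cycle (0-based)
            ranked = sorted(draw() for _ in range(m))
            measured.append(ranked[i])        # measure the (i+1)-th smallest unit
    return measured

rng = random.Random(2024)
sample = rss_sample(lambda: rng.gauss(0.0, 1.0), m=4, r=3)
print(len(sample), sum(sample) / len(sample))
```

Each cycle identifies m*m units but measures only m of them, which is where the cost saving of RSS comes from.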

Median Ranked Set Sampling (MRSS)
MRSS is described as follows. Draw $m$ simple random samples, each of size $m$, from the target population and rank the units within each sample with respect to the variable of interest by any cost-free method. If $m$ is odd, select the $((m + 1)/2)$th smallest ranked unit from each sample. If $m$ is even, select the $(m/2)$th smallest ranked unit from each of the first $m/2$ samples and the $((m + 2)/2)$th smallest ranked unit from each of the last $m/2$ samples. The whole procedure can be repeated $r$ times, if needed, to obtain an MRSS sample of size $mr$.
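A minimal Python sketch of the MRSS selection rule follows, again assuming perfect ranking. The function name `mrss_sample` and the callable `draw` are hypothetical and serve only to illustrate the odd and even cases.

```python
import random

def mrss_sample(draw, m, r=1):
    """Return a median ranked set sample of size m*r under perfect ranking."""
    measured = []
    for _ in range(r):
        for i in range(m):                    # i-th set of the cycle (0-based)
            ranked = sorted(draw() for _ in range(m))
            if m % 2 == 1:
                pos = (m + 1) // 2            # odd m: the median rank
            elif i < m // 2:
                pos = m // 2                  # even m, first half of the sets
            else:
                pos = (m + 2) // 2            # even m, second half of the sets
            measured.append(ranked[pos - 1])  # pos is a 1-based rank
    return measured

rng = random.Random(5)
print(mrss_sample(lambda: rng.random(), m=4))
```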

Truncation Based Ranked Set Sampling (TBRSS)
TBRSS can be described as follows. Draw $m$ simple random samples, each of size $m$, from the target population and rank the units within each sample with respect to the variable of interest by any cost-free method. Define a coefficient $k = \lfloor \alpha m \rfloor$, where $0 \le \alpha < 0.5$ and $\lfloor t \rfloor$ is the largest integer less than or equal to $t$. From each of the first $k$ samples, select the smallest ranked unit; from each of the last $k$ samples, select the largest ranked unit; and from the remaining $(m - 2k)$ samples, select the $i$th ranked unit of the $i$th sample $(i = k + 1, k + 2, \ldots, m - k)$.
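The truncation rule can be sketched as follows, assuming perfect ranking. The names `tbrss_sample` and `draw` are hypothetical; the coefficient is computed as the floor of alpha times m, as in the description above.

```python
import math
import random

def tbrss_sample(draw, m, alpha):
    """Return one TBRSS cycle of size m under perfect ranking."""
    k = math.floor(alpha * m)                 # truncation coefficient
    measured = []
    for i in range(1, m + 1):                 # 1-based set index
        ranked = sorted(draw() for _ in range(m))
        if i <= k:
            pos = 1                           # first k sets: smallest unit
        elif i > m - k:
            pos = m                           # last k sets: largest unit
        else:
            pos = i                           # middle sets: i-th ranked unit
        measured.append(ranked[pos - 1])
    return measured

rng = random.Random(3)
print(tbrss_sample(lambda: rng.expovariate(1.0), m=5, alpha=0.4))
```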

The proposed MBRSS sampling scheme
In many practical situations, ordinary RSS cannot be carried out because of scarce resources or the lack of a large number of population elements. In such situations, the MBRSS scheme allows the experimenter to select a sample by applying the SRS, RSS and MRSS schemes. Thus, MBRSS is more economical and flexible than the ordinary RSS and TBRSS schemes. A manipulation-based ranked set sample of size $m$ can be selected by adopting the following steps:
Step-1: Define a constant $k_1 = \lfloor \alpha m \rfloor$, where $0 \le \alpha \le 0.5$ and $\lfloor t \rfloor$ is the largest integer less than or equal to $t$. If $k_1 = 1$, select $k_1$ unit from the target population by the SRS method. If $k_1 \ge 2$, select $k_1$ units by the RSS method.
Step-2: Select the remaining $k_2 = m - k_1$ units by applying MRSS as defined in Section 2.2. This completes one cycle for the selection of a sample of size $m = k_1 + k_2$ units under MBRSS. Steps 1-2 can be repeated $r$ times, if needed, to obtain a sample of $mr$ units. It is pertinent to mention that MBRSS utilizes $(m - k_1)^2 + k_1^2$ units, which is always less than or equal to the $m^2$ units consumed by the ordinary RSS and TBRSS schemes, to get a sample of size $m$. Note that for $k_1 = 0$, MBRSS reduces to MRSS.
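The two steps can be sketched in Python as below. The sketch assumes perfect ranking, takes $k_1$ directly as an input rather than deriving it from $\alpha$, and uses the hypothetical names `mbrss_cycle` and `draw`; it also counts the identified units so that the figure $(m - k_1)^2 + k_1^2$ can be checked.

```python
import random

def mbrss_cycle(draw, m, k1):
    """One MBRSS cycle: returns (measured values, number of units identified)."""
    k2 = m - k1
    measured, used = [], 0
    # Step 1: k1 sets of size k1 (a single set of size 1 is plain SRS).
    for i in range(k1):
        ranked = sorted(draw() for _ in range(k1))
        measured.append(ranked[i])
        used += k1
    # Step 2: MRSS on k2 sets of size k2.
    for i in range(k2):
        ranked = sorted(draw() for _ in range(k2))
        if k2 % 2 == 1:
            pos = (k2 + 1) // 2
        elif i < k2 // 2:
            pos = k2 // 2
        else:
            pos = (k2 + 2) // 2
        measured.append(ranked[pos - 1])
        used += k2
    return measured, used

rng = random.Random(9)
values, used = mbrss_cycle(lambda: rng.gauss(0.0, 1.0), m=6, k1=2)
print(len(values), used, 6 * 6)   # sample size, units used by MBRSS, units used by RSS
```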

Estimation of Population Mean
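Let the variable of interest $X$ have probability density function (pdf) $f(x)$ and cumulative distribution function $F(x)$ with mean $\mu$ and variance $\sigma^2$. Let $X_1, X_2, X_3, \ldots, X_m$ be an SRS of size $m$ from $f(x)$. The SRS estimator of the population mean $\mu$, if sampling is repeated $r$ times, is defined as
$$\bar{X}_{SRS} = \frac{1}{mr}\sum_{j=1}^{r}\sum_{i=1}^{m} X_{ij}.$$
Let $g_{(i:m)}(x)$ be the pdf of the $i$th order statistic $X_{(i:m)}$ $(i = 1, 2, 3, \ldots, m)$; then it can be shown that
$$g_{(i:m)}(x) = \frac{m!}{(i-1)!\,(m-i)!}\,[F(x)]^{i-1}\,[1-F(x)]^{m-i}\,f(x).$$
The mean and variance of $X_{(i:m)}$ are, respectively, given by
$$\mu_{(i:m)} = \int_{-\infty}^{\infty} x\,g_{(i:m)}(x)\,dx, \qquad \sigma^2_{(i:m)} = \int_{-\infty}^{\infty} \left(x-\mu_{(i:m)}\right)^2 g_{(i:m)}(x)\,dx;$$
for details see David and Nagaraja (2003). In what follows, $X_{i(h:m)j}$ denotes the unit of rank $h$ measured from the $i$th sample of size $m$ in the $j$th cycle. The RSS estimator of the population mean, say $\bar{X}_{RSS}$, is defined as
$$\bar{X}_{RSS} = \frac{1}{mr}\sum_{j=1}^{r}\sum_{i=1}^{m} X_{i(i:m)j}$$
and its variance is given by
$$\mathrm{Var}(\bar{X}_{RSS}) = \frac{1}{m^2 r}\sum_{i=1}^{m}\sigma^2_{(i:m)}.$$
The MRSS estimator of the population mean for even $m$ is defined as
$$\bar{X}_{MRSSe} = \frac{1}{mr}\sum_{j=1}^{r}\left(\sum_{i=1}^{m/2} X_{i(m/2:m)j} + \sum_{i=m/2+1}^{m} X_{i((m+2)/2:m)j}\right)$$
and its variance is given by
$$\mathrm{Var}(\bar{X}_{MRSSe}) = \frac{1}{2mr}\left(\sigma^2_{(m/2:m)} + \sigma^2_{((m+2)/2:m)}\right).$$
The MRSS estimator of the population mean for odd $m$ is defined as
$$\bar{X}_{MRSSo} = \frac{1}{mr}\sum_{j=1}^{r}\sum_{i=1}^{m} X_{i((m+1)/2:m)j}$$
and its variance is given by
$$\mathrm{Var}(\bar{X}_{MRSSo}) = \frac{1}{mr}\,\sigma^2_{((m+1)/2:m)}. \tag{7}$$
The TBRSS estimator of the population mean is defined as
$$\bar{X}_{TBRSS} = \frac{1}{mr}\sum_{j=1}^{r}\left(\sum_{i=1}^{k} X_{i(1:m)j} + \sum_{i=k+1}^{m-k} X_{i(i:m)j} + \sum_{i=m-k+1}^{m} X_{i(m:m)j}\right)$$
and its variance is given by
$$\mathrm{Var}(\bar{X}_{TBRSS}) = \frac{1}{m^2 r}\left(k\,\sigma^2_{(1:m)} + \sum_{i=k+1}^{m-k}\sigma^2_{(i:m)} + k\,\sigma^2_{(m:m)}\right).$$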
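As a worked check of how the order-statistic variances enter these estimators, the sketch below evaluates the variances of the SRS, RSS and MRSS mean estimators exactly for the Uniform(0,1) distribution. It relies on the standard fact that the $i$th order statistic of $m$ uniform draws is Beta$(i, m-i+1)$ distributed, and on the usual variance expressions for independent measured units; the function names are hypothetical.

```python
from fractions import Fraction

def os_var_uniform(i, m):
    """Variance of the i-th order statistic of m Uniform(0,1) draws."""
    return Fraction(i * (m - i + 1), (m + 1) ** 2 * (m + 2))

def var_srs(m, r=1):
    return Fraction(1, 12) / (m * r)          # sigma^2 = 1/12 for Uniform(0,1)

def var_rss(m, r=1):
    return sum(os_var_uniform(i, m) for i in range(1, m + 1)) / (m * m * r)

def var_mrss(m, r=1):
    if m % 2 == 1:
        return os_var_uniform((m + 1) // 2, m) / (m * r)
    lower = os_var_uniform(m // 2, m)
    upper = os_var_uniform((m + 2) // 2, m)
    return (lower + upper) / (2 * m * r)

for m in (3, 4, 5):
    print(m, float(var_srs(m) / var_rss(m)), float(var_srs(m) / var_mrss(m)))
```

Because the uniform distribution is symmetric, all three estimators are unbiased here and the variance ratios are the relative efficiencies with respect to SRS.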

Estimation of population mean using MBRSS
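The MBRSS estimator of the population mean for even $k_2$ can be defined as
$$\bar{X}_{MBRSSe} = \frac{1}{mr}\sum_{j=1}^{r}\left(\sum_{i=1}^{k_1} X_{i(i:k_1)j} + \sum_{i=1}^{k_2/2} X_{i(k_2/2:k_2)j} + \sum_{i=k_2/2+1}^{k_2} X_{i((k_2+2)/2:k_2)j}\right)$$
and its variance is given by
$$\mathrm{Var}(\bar{X}_{MBRSSe}) = \frac{1}{m^2 r}\left(\sum_{i=1}^{k_1}\sigma^2_{(i:k_1)} + \frac{k_2}{2}\left(\sigma^2_{(k_2/2:k_2)} + \sigma^2_{((k_2+2)/2:k_2)}\right)\right).$$
The MBRSS estimator of the population mean for odd $k_2$ can be defined as
$$\bar{X}_{MBRSSo} = \frac{1}{mr}\sum_{j=1}^{r}\left(\sum_{i=1}^{k_1} X_{i(i:k_1)j} + \sum_{i=1}^{k_2} X_{i((k_2+1)/2:k_2)j}\right)$$
and its variance is given by
$$\mathrm{Var}(\bar{X}_{MBRSSo}) = \frac{1}{m^2 r}\left(\sum_{i=1}^{k_1}\sigma^2_{(i:k_1)} + k_2\,\sigma^2_{((k_2+1)/2:k_2)}\right).$$
For a symmetric distribution, we have $\mu_{(i:n)} + \mu_{(n-i+1:n)} = 2\mu$, so the MBRSS mean estimator is unbiased. Further, using Eq. (12), for odd $k_2$ the variance of the MBRSS estimator can be written as $\sigma^2/(mr)$ minus a second term. Since the second term on the right-hand side is non-negative and $\mathrm{Var}(\bar{X}_{SRS}) = \sigma^2/(mr)$, the MBRSS estimator is at least as efficient as the SRS estimator. Similarly, considering Eq. (13), the second term on the right-hand side is again non-negative and $\mathrm{Var}(\bar{X}_{SRS}) = \sigma^2/(mr)$. This completes the proof.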
If the underlying distribution is asymmetric, the mean square errors (MSEs) of the mean estimators based on MBRSS and TBRSS are obtained from the following definitions. It may be noted that the MSE and bias of any estimator $T$ of the population parameter $\mu$ are defined as
$$\mathrm{MSE}(T) = \mathrm{Var}(T) + [\mathrm{Bias}(T)]^2, \qquad \mathrm{Bias}(T) = E(T) - \mu.$$
The REs of the mean estimators are reported in Tables 1-3. Calculations were done using Eqs. (17) and (18) for different values of $k_1$ and $k$ for MBRSS and TBRSS, respectively, with $m = 4, 5, 6, 7$. The results support the efficiency of MBRSS over SRS for estimating the mean of symmetric and asymmetric distributions. The RE of MBRSS increases as the sample size increases for all considered values of $k_1$ under both symmetric and asymmetric distributions. In contrast, the efficiency of MRSS generally decreases as the sample size gets large for highly skewed distributions such as the exponential(1), chisquare(1) and lognormal(0,1) distributions. For $m \ge 6$, MBRSS becomes superior to MRSS for estimating the mean of chisquare(1) when $k_1 > 1$. For instance, the RE of MRSS for estimating the mean of chisquare(1) is 1.7000 when $m = 6$, while it is 1.7885 under MBRSS with $k_1 = 2$ and $k_2 = 4$. However, for most of the considered distributions, maximum efficiency is obtained when $k_1 = 0$. In this case, as mentioned earlier, MBRSS is equivalent to MRSS. As regards TBRSS, it is less efficient than MBRSS for estimating the population mean under all considered skewed distributions except weibull(2,1), but it is superior to MBRSS for most of the considered symmetric distributions, except logistic(0,1). This loss in efficiency decreases as $m$ increases, and the minimum loss occurs when $k_1 = 1$. Therefore, it can be concluded that when resources are scarce the experimenter should prefer MBRSS.
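To make the roles of bias, MSE and RE concrete, the sketch below evaluates them exactly for the exponential(1) distribution, whose $i$th order statistic out of $n$ has mean $\sum_{j=n-i+1}^{n} 1/j$ and variance $\sum_{j=n-i+1}^{n} 1/j^2$. It assumes, in line with the unit count $(m-k_1)^2 + k_1^2$, that the RSS part ranks sets of size $k_1$ and the MRSS part ranks sets of size $k_2$. All function names are hypothetical, and the values produced are illustrative rather than a reproduction of Tables 1-3.

```python
from fractions import Fraction

def os_moments_exp(i, n):
    """Mean and variance of the i-th order statistic of n Exponential(1) draws."""
    terms = range(n - i + 1, n + 1)
    mean = sum(Fraction(1, j) for j in terms)
    var = sum(Fraction(1, j * j) for j in terms)
    return mean, var

def mbrss_ranks(m, k1):
    """(rank, set size) of every unit measured in one MBRSS cycle."""
    k2 = m - k1
    ranks = [(i, k1) for i in range(1, k1 + 1)]           # SRS or RSS part
    for i in range(1, k2 + 1):                            # MRSS part
        if k2 % 2 == 1:
            ranks.append(((k2 + 1) // 2, k2))
        elif i <= k2 // 2:
            ranks.append((k2 // 2, k2))
        else:
            ranks.append(((k2 + 2) // 2, k2))
    return ranks

def mbrss_bias_mse_exp(m, k1, r=1):
    moments = [os_moments_exp(i, n) for i, n in mbrss_ranks(m, k1)]
    mean = sum(mu for mu, _ in moments) / m
    var = sum(v for _, v in moments) / (m * m * r)
    bias = mean - 1                                       # Exponential(1) has mean 1
    return bias, var + bias ** 2                          # MSE = Var + Bias^2

def re_vs_srs(m, k1, r=1):
    _, mse = mbrss_bias_mse_exp(m, k1, r)
    return Fraction(1, m * r) / mse                       # Var(SRS mean) = 1/(m r)

for m in (4, 5, 6):
    print(m, float(mbrss_bias_mse_exp(m, 2)[0]), float(re_vs_srs(m, 2)))
```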

Estimation of population median
The median is a reliable measure of central tendency when the underlying distribution is asymmetric or highly skewed. We define median estimators based on SRS, TBRSS and MBRSS. An extensive simulation study is also conducted to compare the efficiency of the median estimators based on TBRSS and MBRSS relative to the conventional estimator based on SRS. Let $X_1, X_2, X_3, \ldots, X_m$ be an SRS of size $m$. Then the SRS estimator of the population median is defined as the median of these $m$ observations. The population median estimator under TBRSS is defined as the median of the $m$ units measured under TBRSS. Likewise, when $k_2$ is even and the measured units form an MBRSSe sample of size $m$, the population median estimator is given by the median of that sample; when $k_2$ is odd and the measured units form an MBRSSo sample of size $m$, it is again given by the median of that sample. We can see from Tables 4-5 that a substantial gain in efficiency is obtained by using MBRSS relative to SRS for estimating the median of symmetric and asymmetric distributions. RE increases as $m$ increases. However, the maximum gain in efficiency is obtained at $k_1 = 0$. Table 6 reflects the efficiency of TBRSS for estimating the population median. It can be seen from Tables 4-6 that the proposed MBRSS is superior to TBRSS for estimating the population median for both symmetric and asymmetric distributions. For instance, the RE of the median estimator under MBRSS for $m = 5$ at $k_1 = 1$ is 2.1640, while it is 1.5330 under TBRSS, for the normal(0,1) distribution. Therefore, the results suggest that MBRSS is an economical and efficient alternative to TBRSS for estimating the population median.
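A small Monte Carlo sketch of such a comparison is given below for the normal(0,1) distribution, taking the median estimator under each scheme to be the median of the measured units and assuming perfect ranking. The helper names, the number of replications and the seed are hypothetical choices, and the printed value is a simulation estimate that will vary with them.

```python
import random
import statistics

def mbrss_cycle(draw, m, k1):
    """Measured values from one MBRSS cycle under perfect ranking."""
    k2 = m - k1
    values = []
    for i in range(k1):                                   # SRS (k1 = 1) or RSS part
        values.append(sorted(draw() for _ in range(k1))[i])
    for i in range(k2):                                   # MRSS part
        if k2 % 2 == 1:
            pos = (k2 + 1) // 2
        elif i < k2 // 2:
            pos = k2 // 2
        else:
            pos = (k2 + 2) // 2
        values.append(sorted(draw() for _ in range(k2))[pos - 1])
    return values

def median_re(m, k1, reps=10000, seed=7):
    """Simulated MSE(SRS median) / MSE(MBRSS median) for N(0,1), true median 0."""
    rng = random.Random(seed)
    draw = lambda: rng.gauss(0.0, 1.0)
    se_srs = se_mbrss = 0.0
    for _ in range(reps):
        se_srs += statistics.median(draw() for _ in range(m)) ** 2
        se_mbrss += statistics.median(mbrss_cycle(draw, m, k1)) ** 2
    return se_srs / se_mbrss

re_median = median_re(5, 1)
print(round(re_median, 3))
```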

Weighted MBRSS for skewed distribution
To improve the efficiency of the MBRSS scheme in estimating the population mean when the underlying distribution is asymmetric, a weighted MBRSS estimator, $\bar{X}_{wMBRSSe}$, is defined for even $k_2$ and, similarly, a weighted MBRSS estimator, $\bar{X}_{wMBRSSo}$, is defined for odd $k_2$. First, we find estimated weights for the estimator $\bar{X}_{wMBRSSe}$, where $w_{11}$, $w_{21}$ and $w_{i1}$, $i = 1, 2, \ldots, k_1$, are non-negative weights to be chosen such that $\bar{X}_{wMBRSSe}$ is an unbiased estimator. The optimal weights can be found by using an entropy measure from information theory, where entropy is considered as a measure of uncertainty. A simple choice of this measure is Shannon's entropy, which is maximized subject to the constraints on the weights. To find the weights, the Lagrange function is formulated and the first-order conditions are solved, which yields the weights and the associated weighted variance of the estimator. We now find weights for the estimator $\bar{X}_{wMBRSSo}$, where $w_{12}$ and $w_{i2}$, $i = 1, 2, \ldots, k_1$, are non-negative weights to be chosen such that $\bar{X}_{wMBRSSo}$ is an unbiased estimator. In this case, the problem can be expressed as the nonlinear system
$$\text{Maximize } -\sum_{i=1}^{m} w_{i2}\ln(w_{i2})$$
subject to the constraints on the weights. To find the weights, the Lagrange function is again formulated and the first-order conditions are solved.
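The following sketch shows one way such entropy weights could be computed numerically. It rests on assumptions that are not taken from the text: one weight per measured unit, and exactly two constraints, namely that the weights sum to one and that the weighted combination of the units' expected values equals the population mean (unbiasedness). Under these assumptions the first-order conditions of the Lagrange function give weights proportional to $\exp(-\lambda\mu_i)$, and $\lambda$ is found here by bisection. The function name and the bracketing interval are hypothetical.

```python
import math

def max_entropy_weights(means, target, lo=-50.0, hi=50.0, iters=200):
    """Weights maximising -sum(w ln w) with sum(w) = 1 and sum(w * means) = target."""
    if not min(means) < target < max(means):
        raise ValueError("target must lie strictly between the smallest and largest mean")

    def weights(lam):
        expo = [-lam * mu for mu in means]
        top = max(expo)                                   # guard against overflow
        raw = [math.exp(e - top) for e in expo]
        total = sum(raw)
        return [x / total for x in raw]

    def gap(lam):                                         # decreasing in lam
        return sum(w * mu for w, mu in zip(weights(lam), means)) - target

    for _ in range(iters):                                # bisection on lam
        mid = (lo + hi) / 2.0
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return weights((lo + hi) / 2.0)

# Expected values of the units of one MBRSS cycle from Exponential(1) with
# m = 5, k1 = 2: two RSS units and three medians of sets of size 3.
mu_units = [0.5, 1.5, 5 / 6, 5 / 6, 5 / 6]
w = max_entropy_weights(mu_units, target=1.0)
print([round(x, 4) for x in w], sum(wi * mi for wi, mi in zip(w, mu_units)))
```

When the expected values are symmetric about the target, the multiplier is zero and the weights reduce to the equal weights of the unweighted estimator.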

Weighted TBRSS for skewed distribution
A population mean estimator based on weighted TBRSS (wTBRSS) and its associated weighted variance are defined in the same manner. The numerical values of the REs of wMBRSS and wTBRSS with respect to SRS for estimating the mean are reported in Tables 7-10 for different asymmetric distributions, assuming $k_1 = 2$ and $k = 2$ for wMBRSS and wTBRSS, respectively, and sample sizes $m = 4, 5, 6$. The results indicate a significant improvement in the efficiency of the mean estimator when wMBRSS is used. Moreover, RE increases as $m$ gets large. For instance, the REs of unweighted MBRSS for $m = 4, 5$ at $k_1 = 2$ are 1.3496 and 1.7777, respectively, as given in Table 1 for the exponential(1) distribution, while they are 5.3333 and 7.2246 under wMBRSS, as given in Table 7. A gain in efficiency is also obtained by using wTBRSS. For example, the RE of unweighted TBRSS for $m = 5$ at $k = 2$ is 1.3066 for the exponential(1) distribution, while it is 2.7111 under wTBRSS, as indicated in Table 9. However, a substantially larger gain in the efficiency of the mean estimator is obtained by using wMBRSS instead of wTBRSS, as indicated for the exponential(1) distribution.

Simulation study
In this section, the effectiveness of the proposed MBRSS scheme relative to the traditional SRS scheme is ascertained in the presence of an outlier for $m$ equal to 4, 5, 6 and 7 with $k_1 = 1, 2$. The idea is to replace the minimum value of the first sample in one of the two data sets, $k_1$ and $k_2$, with an outlier and ascertain its effect on the performance of the considered estimators. This is done by setting $X_{(1)}$, the minimum of that sample, to an outlying value. The simulation study is carried out for different symmetric and asymmetric distributions, namely normal(0,1), logistic(0,1), uniform(0,1), beta(3,3), exponential(1), weibull(2,1), gamma(2,3) and Student's t(5). The performance of the estimators is investigated by comparing the simulated mean square error (MSE) as a criterion of robustness of the mean and median estimators. The MSE based on 40,000 simulations is defined as
$$\mathrm{MSE}(\hat{\theta}) = \frac{1}{40000}\sum_{t=1}^{40000}\left(\hat{\theta}_t - \theta\right)^2,$$
where $\hat{\theta}_t$ is the estimate obtained in the $t$th simulation run and $\theta$ is the parameter being estimated. The estimated REs of MBRSS vs SRS based on MSEs for estimating the mean and median of the considered distributions are depicted in Figures 1-2. Figure 1 indicates that the RE of MBRSS vs SRS for estimating the mean increases with $m$, except for some skewed distributions such as exponential(1) and gamma(2,3), for which RE decreases as $m$ increases when one unit ($k_1 = 1$) is chosen by SRS and the remaining units by MRSS. For the choice $k_1 = 2$, i.e., two units are selected by RSS and the remaining units by MRSS, RE increases as $m$ gets large under all considered distributions. However, it is observed that the maximum gain in efficiency is obtained when one unit of a sample is taken by SRS, i.e., $k_1 = 1$.

Ranking with concomitant variable
In many practical problems the variable of interest, $X$, is hard to measure and difficult to rank as well, but a concomitant variable, $Y$, correlated with $X$, can easily be measured. The concomitant variable can then be used for ranking the sampling units. For instance, assessing the status of hazardous waste sites is usually costly, but a great deal of knowledge about such sites can often be obtained from records, photos, etc., and then be used to rank them. In this section, we follow the idea of Stokes (1977), in which ranking is performed using a concomitant variable, say $Y$, that can be measured easily. Stokes (1977) proposed the following model:
$$X_{[i:m]j} = \mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}\left(Y_{(i:m)j} - \mu_Y\right) + \varepsilon_{ij}, \tag{27}$$
with the assumptions that (1) the regression of $X$ on $Y$ is linear and (2) the standardized variable $Y$ and the error term $\varepsilon$ are independent, and $\varepsilon$ has mean zero and variance $\sigma_X^2(1-\rho^2)$. Here $Y_{(i:m)j}$ is the $i$th smallest value of $Y$ in the $i$th sample and $X_{[i:m]j}$ is the corresponding value of $X$ in the $j$th replication.

Estimation under imperfect ranking
In this section, we estimate the population mean and median when ranking is performed on the concomitant variable $Y$ in order to rank the study variable $X$, using the model given by Eq. (27). Suppose $(X, Y)$ follows a bivariate normal distribution. Then the mean estimator based on MBRSS with ranking on the concomitant variable, $\bar{X}_{MBRSSC}$, and its variance are obtained for even and odd $k_2$.
Lemma-2: The estimator $\bar{X}_{MBRSSC}$ is more efficient than $\bar{X}_{SRS}$, i.e., $\mathrm{Var}(\bar{X}_{MBRSSC}) \le \mathrm{Var}(\bar{X}_{SRS})$.
Proof: From Eq. (29), the variance can be written as $\mathrm{Var}(\bar{X}_{SRS})$ minus a second term. Note that the second term on the right-hand side is non-negative, so that $\mathrm{Var}(\bar{X}_{MBRSSCe}) \le \mathrm{Var}(\bar{X}_{SRS})$. Now, from Eq. (31), the same decomposition holds; here, we also note that the second term on the right-hand side is non-negative. This completes the proof.
The REs of $\bar{X}_{MBRSSC}$ and $\bar{X}_{TBRSSC}$ with respect to $\bar{X}_{SRS}$ are given by
$$RE(\bar{X}_{SRS}, \bar{X}_{J}) = \frac{\mathrm{Var}(\bar{X}_{SRS})}{\mathrm{Var}(\bar{X}_{J})}, \qquad J = MBRSSC, TBRSSC. \tag{34}$$
The performance of MBRSSC and TBRSSC for estimating the population mean and median is investigated when the study variable $X$ and the auxiliary variable $Y$ follow the standard bivariate normal distribution with pdf
$$f(x, y) = \frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left\{-\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)}\right\}.$$
The simulated REs of the mean and median estimators for $m = 4, 5, 6, 7$ with different values of the correlation coefficient, $\rho = \pm 0.20, \pm 0.40, \pm 0.60, \pm 0.80, \pm 0.90$, are calculated using Eq. (34) after $4 \times 10^4$ replications and reported in Tables 11-16. As expected, the performance of the mean and median estimators depends on the value of the correlation coefficient. The estimators become more precise as the correlation increases, and vice versa. MBRSSC is less efficient than TBRSSC for estimating the population mean, but this loss is not large, as can be seen from Tables 11-16. For example, if $m = 5$, the RE of MBRSSC for estimating the mean is 1.2255 for $k_1 = 1$ and $\rho = \pm 0.60$, while it is 1.5236 under TBRSSC for $k = 2$. However, MBRSSC performs better than TBRSSC in estimating the population median for $k_1 = 1$, $m \ge 6$ and $|\rho| \ge 0.80$. For instance, the RE of MBRSSC for estimating the population median is 2.1023 for $k_1 = 1$, $m = 6$ and $\rho = \pm 0.90$, while it is 1.6641 under TBRSSC. Therefore, it is concluded that, under imperfect ranking, MBRSSC is more efficient than TBRSSC for estimating the population median under these conditions but slightly less efficient than TBRSSC for estimating the population mean. This loss in efficiency decreases as the sample size increases.
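The imperfect-ranking setting can be sketched by simulation as follows: pairs $(Y, X)$ are drawn from the standard bivariate normal distribution, units are ranked on $Y$, and the attached $X$ values are averaged. The function names, the seed and the number of replications are hypothetical, and the RE uses the known value $\mathrm{Var}(\bar{X}_{SRS}) = 1/m$ for one cycle.

```python
import math
import random

def mbrssc_mean(rng, m, k1, rho):
    """Mean of X from one MBRSS cycle in which units are ranked on Y."""
    def pair():
        y = rng.gauss(0.0, 1.0)
        x = rho * y + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        return y, x                                       # tuples sort on Y first

    k2 = m - k1
    total = 0.0
    for i in range(k1):                                   # SRS (k1 = 1) or RSS part
        total += sorted(pair() for _ in range(k1))[i][1]
    for i in range(k2):                                   # MRSS part
        if k2 % 2 == 1:
            pos = (k2 + 1) // 2
        elif i < k2 // 2:
            pos = k2 // 2
        else:
            pos = (k2 + 2) // 2
        total += sorted(pair() for _ in range(k2))[pos - 1][1]
    return total / m

def re_mbrssc(m, k1, rho, reps=20000, seed=11):
    """Simulated Var(SRS mean) / MSE(MBRSSC mean); the true mean of X is 0."""
    rng = random.Random(seed)
    mse = sum(mbrssc_mean(rng, m, k1, rho) ** 2 for _ in range(reps)) / reps
    return (1.0 / m) / mse

re_mean = re_mbrssc(5, 1, 0.6)
print(round(re_mean, 3))
```

The configuration $m = 5$, $k_1 = 1$, $\rho = 0.60$ is the one for which the text quotes an RE of 1.2255; a simulated value fluctuates with the seed and the number of replications.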

Concluding remarks
In this paper, we suggested the MBRSS scheme for estimating the population mean and median. The population mean estimator based on MBRSS is unbiased provided the underlying distribution is symmetric. For asymmetric distributions, a weighted mean estimator based on MBRSS showed a significant improvement in efficiency relative to SRS. The Monte Carlo simulation results depicted in Figures 1 and 2 support the robustness of the proposed MBRSS relative to SRS for estimating the population mean and median in the presence of outliers. MBRSS performs better than TBRSS in estimating the population median under both perfect and imperfect ranking, i.e., error in ranking.
However, the proposed MBRSS is generally less efficient than TBRSS for estimating the mean of a symmetric population, although this loss in efficiency decreases as the sample size increases. Using MBRSS cuts the number of sampling units to be identified to approximately 2/3 to 1/2 of what is needed in TBRSS for the selection of the required units. Therefore, when the survey budget is limited or a large number of sampling units is not available, it is recommended to use the MBRSS scheme as an economical and efficient alternative to SRS in estimating the population mean and median.

leads to the optimal solution. This problem can be expressed as a nonlinear system ..., $m$) are estimated using Shannon's entropy; for details see Al-Nasser and Al-Omari (2015), and given by

$k_1 = 1$ and the remaining $k_2 = m - k_1$ units by the MRSS scheme. Figure 2 also indicates that MBRSS is superior to the traditional SRS for estimating the median of a population even in the presence of outliers. The choice $k_1 = 1$ also remains optimal in the case of median estimation.

Figure 1: REs of mean estimators based on MBRSS vs SRS in the presence of outliers for symmetric and asymmetric distributions

Table 17: RE of MBRSS vs SRS for estimating mean and median height of 399 trees (X) under perfect and imperfect rankings
A real data set from Platt et al. (1988) is used to illustrate the efficiency of the proposed MBRSS and TBRSS schemes with respect to SRS in estimating the mean and median height of 399 conifer trees. The data are based on two variables: X, the diameter in centimeters at breast height, and Y, the entire height in feet; for more detail see Platt et al. (1988). The summary statistics of the two variables show that both have non-zero skewness, so the data are asymmetrically distributed. The REs of the mean and median estimators based on MBRSS and TBRSS with respect to SRS are reported in Tables 17-18, respectively. It is evident from Table 17 that, under both perfect and imperfect rankings, the mean and median estimators based on the MBRSS scheme are more efficient than their counterparts based on SRS for all values of $k_1$ and $k_2$. As expected, RE decays under imperfect rankings because of error in ranking. Further, the efficiency of MBRSS in estimating the population mean generally increases for $m \ge 6$ as $k_1$ gets large. Consequently, for $m \ge 6$, MBRSS becomes superior to MRSS in estimating the mean of the population under study. For example, the RE of MRSS in estimating the mean for $m = 6$ under perfect ranking is 1.5700, while the corresponding REs of MBRSS at $k_1 = 2, 3$ are 1.6938 and 1.7441, respectively. On the other hand, the RE of TBRSS for estimating the population mean also increases as $m$ gets large. It is worth mentioning that at $m = 5$, MBRSS becomes superior to TBRSS for estimating the population mean. For $m \ge 6$, the efficiency of MBRSS at $k_1 = 3$ is approximately equal to that of TBRSS for $k = 2$. As regards population median estimation, MBRSS outperforms TBRSS under both perfect and imperfect rankings. For example, for $m = 5$ and $k_1 = 1$, the RE of the median estimator based on MBRSS under perfect ranking is 3.1439, while it is 1.8724 for TBRSS. From the above discussion, it can be concluded that MBRSS is a superior alternative to SRS. It also works well in estimating the population median as compared to TBRSS.

Table 18: RE of TBRSSC vs SRS for estimating mean and median height of 399 trees (X) under perfect and imperfect rankings