Estimation of Population Variance in Two-Phase Sampling in Presence of Random Non-Response

The present investigation deals with the problem of estimating the population variance in the presence of random non-response in two-phase (double) sampling. Using information on two auxiliary variables, two general classes of estimators are suggested for two different situations of random non-response, and their properties are studied under two different set-ups of two-phase sampling. It is shown that several estimators may be generated from the proposed classes of estimators. The proposed classes of estimators are compared empirically with some contemporary estimators of population variance under similar realistic situations, and their performance is demonstrated through numerical illustration and graphical interpretation, followed by suitable recommendations.
Mathematics Subject Classification: 62D05


Introduction
The problem of estimating a population variance arises in many practical situations. For example, a physician needs a full understanding of the variation in human blood pressure, body temperature and pulse rate for adequate prescription. An agriculturist needs an adequate understanding of the variation in climatic factors, especially from place to place (or time to time), to be able to plan when, how and where to plant a crop. Variance estimation using an auxiliary variable was first considered by Das and Tripathi (1978). It was further extended by Srivastava and Jhajj (1980), Isaki (1983), Singh (1983), Upadhyay and Singh (1983), Tripathi et al. (1988), Singh and Joarder (1998) and Ahamed et al. (2003), among others. In many situations, information on an auxiliary variable may be readily available for all units of the population; for example, the tonnage (or seat capacity) of each vehicle or ship is known in transportation surveys, and the number of beds in different hospitals may be known in hospital surveys.
However, it is common experience in sample surveys that data cannot always be collected from all the units selected in the sample. For example, the selected families may not be at home at the first attempt, and some of them may refuse to cooperate with the interviewer even if contacted. As many respondents do not reply, the available sample of returns is incomplete. The resulting incompleteness is called non-response and is sometimes so large that it can completely vitiate the results. Statisticians have long known that failure to account for the stochastic nature of incompleteness can damage the actual conclusion. An obvious problem, which one needs to address, arises when the mechanism causing the incompleteness is ignored. Rubin (1976) advocated three concepts: missing at random (MAR), observed at random (OAR), and parameter distribution (PD). Rubin defined: "The data are MAR if the probability of the observed missingness pattern, given the observed and unobserved data, does not depend on the value of the unobserved data". Singh and Joarder (1998) studied the properties of the ratio-type estimator of population variance suggested by Isaki (1983) under two different situations of random non-response (MAR) advocated by Tracy and Osahan (1994): (i) random non-response on both the study and auxiliary variables and (ii) random non-response only on the study variable. Singh et al. (2012) revisited the family of estimators of population variance suggested by Srivastava and Jhajj (1980) under the above situations of random non-response.
It is worth mentioning that all the above recent works on estimating the population variance in the presence of random non-response assume that either the population mean or both the population mean and variance of the auxiliary variable are known; even when they are unknown, it is assumed that no non-response occurs on the auxiliary variable in the sampled units. This may often not be the case. In such situations, it is advisable to draw a large preliminary sample in which the auxiliary variable alone is measured. This technique is known as double sampling or two-phase sampling. Two-phase sampling is a powerful and cost-effective (economical) technique for obtaining, from the first-phase (preliminary) sample, reliable estimates of the unknown population parameters of the auxiliary variables. Motivated by these arguments and using information on two auxiliary variables, we propose two general classes of estimators of population variance in two-phase sampling, applicable to two different realistic situations of random non-response, and study their properties under two different set-ups of two-phase sampling. It is shown that several estimators may be generated as members of the proposed classes of estimators. The superiority of the proposed classes of estimators over some contemporary estimators of population variance under similar realistic conditions is established through numerical illustration and graphical interpretation. Suitable recommendations are put forward to the survey statistician.

Two-Phase Sampling Structure
Consider a finite population $U = (U_1, U_2, \ldots, U_N)$ of $N$ units. We assume that no non-response occurs in the first-phase sample $S'$, while random non-response occurs either on both the variables $y$ and $x$ or on the variable $y$ alone in the second-phase sample $S$. We assume that the occurrence of random non-response follows the discrete probability distribution presented below.

Non-Response Probability Model
If random non-response occurs in the second-phase sample $S$ of size $m$ and $r$ $\{r = 0, 1, 2, \ldots, (m-2)\}$ denotes the number of sampling units on which information could not be collected due to random non-response, then the observations of the respective variables on which random non-response occurs can be taken from the remaining $(m - r)$ units of the second-phase sample. It is assumed that $r$ is less than $(m - 1)$, that is, $0 \le r \le (m - 2)$.


We also assume that if $p$ denotes the probability of a non-response among the $(m - 2)$ possible values of non-response, then $r$ has the following discrete distribution:
$$P(r) = \frac{m - r}{mq + 2p}\; {}^{m-2}C_r\; p^{r} q^{m-2-r}, \qquad r = 0, 1, 2, \ldots, (m-2), \tag{1}$$
where $q = 1 - p$ and ${}^{m-2}C_r$ denotes the total number of ways of obtaining $r$ non-responses out of the $(m - 2)$ total possible non-responses; for instance, see Singh and Joarder (1998).
It is to be noted that the probability model defined in equation (1) is free from the actual data values; hence, it can be considered a model suitable for the MAR situation.
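The probability model in equation (1) can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names `nonresponse_pmf` and `draw_nonresponse_count`, as well as the values $m = 20$ and $p = 0.1$, are illustrative assumptions. It evaluates $P(r)$, confirms that the probabilities sum to one, and draws a random number of non-responding units.

```python
import random
from math import comb


def nonresponse_pmf(r, m, p):
    """P(r) from equation (1): probability of r non-responding units
    in a second-phase sample of size m, with non-response probability p."""
    if not 0 <= r <= m - 2:
        return 0.0
    q = 1.0 - p
    return (m - r) / (m * q + 2 * p) * comb(m - 2, r) * p**r * q ** (m - 2 - r)


def draw_nonresponse_count(m, p, rng):
    """Draw one value of r from the distribution in equation (1)."""
    support = range(m - 1)  # r = 0, 1, ..., m - 2
    weights = [nonresponse_pmf(r, m, p) for r in support]
    return rng.choices(support, weights=weights, k=1)[0]


# Illustrative (assumed) values, not taken from the paper's data sets.
m, p = 20, 0.1
probs = [nonresponse_pmf(r, m, p) for r in range(m - 1)]
total = sum(probs)  # equals 1 up to rounding error
expected_r = sum(r * pr for r, pr in enumerate(probs))

rng = random.Random(12345)
r_drawn = draw_nonresponse_count(m, p, rng)
```

The normalising constant $mq + 2p$ arises because $\sum_r (m - r)\,{}^{m-2}C_r\, p^r q^{m-2-r} = m - (m-2)p$, which is why the probabilities add to one.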
We define the following quantities based on the responding part of the sample.

Proposed Estimation Strategies
Utilizing information on an auxiliary variable $x$ with unknown $S_x^2$ and following the work of Isaki (1983), one may propose the ratio-type estimator of population variance. Motivated by the above suggestions and assuming that the population variance $S_x^2$ of the auxiliary variable $x$ is unknown, we propose two general classes of estimators of the population variance $S_y^2$ in the two-phase sampling set-up, applicable to two different situations of random non-response, as presented below.

Situation I:
In this situation, we assume that random non-response occurs on both the study variable $y$ and the auxiliary variable $x$ in the second-phase sample $S$, and that the population variance $S_z^2$ of the auxiliary variable $z$ is known. Accordingly, we suggest a general class of estimators of the population variance $S_y^2$ in the two-phase sampling set-up, defined through a function $F$ of four sample variances that satisfies the following regularity conditions:
1. Whatever the chosen samples, the arguments of $F$ assume values in a closed convex subspace $R_4$ of the four-dimensional real space containing the point given by the corresponding population variances.
2. The function $F$ is continuous and bounded in $R_4$.
Proceeding as above, it may be found that the class of estimators $T_2$ is also very wide, and we present below some estimators of this class, where $b$ is a real scalar.

Bias and Mean Square Errors of the Proposed Classes of Estimators $T_1$ and $T_2$
The bias and mean square errors (M.S.E.s) of the proposed classes of estimators $T_1$ and $T_2$ are derived up to the first order of approximation under large-sample assumptions, using transformations of the sample statistics in terms of error terms $e_i$ such that $|e_i| < 1$ $(i = 0, 1, \ldots, 4)$.
We derive the bias and mean square errors of the proposed classes of estimators $T_1$ and $T_2$ separately for Cases I and II of the two-phase sampling structure defined in Section 2.1 and present them below.

1 Bias and Mean Square Errors of the Proposed Classes of Estimators under Case I
In this section, we consider the case in which the second-phase sample $S$ of size $m$ is drawn as a subsample of the first-phase sample $S'$ of size $n$, and we have the following results.
From the above expectations, it is to be noted that: S , S , S , S .
Taking expectations on both sides of equations (17) and (18) and using the results in equation (13), we obtain the expressions for the bias B(.) and mean square error M(.) of the proposed classes of estimators $T_1$ and $T_2$ to the first order of approximation as

2 Bias and Mean Square Errors of the Proposed Classes of Estimators under Case II
If the second-phase sample $S$ is drawn independently of the first-phase sample $S'$, then we have the following results.
Proceeding as in Section 3.1 and using the results in equation (23), we derive the expressions for the bias B(.) and mean square error M(.) of the proposed classes of estimators $T_1$ and $T_2$ to the first order of approximation as
The bias and mean square errors of the various estimators (indicated in Section 2.3) belonging to the classes of estimators $T_1$ and $T_2$ can be easily obtained by substituting the suitable values of the derivatives in equations (19)-(22) and (24
Substituting these optimum values of the derivatives in equations (21) and (22), we have the minimum M.S.E.s of the classes of estimators $T_1$ and $T_2$

Case II
When the second-phase sample $S$ is selected independently of the first-phase sample $S'$, the optimality conditions which minimize the mean square errors of the proposed classes of estimators $T_1$ and $T_2$ are obtained as
Remark 4.1: It is to be noted from the optimality conditions in equations (28) and (31) that the optimum values of the derivatives of the proposed classes of estimators depend on unknown population parameters such as $C_0$, $C_1$, $C_2$, $\rho_{01}$, $\rho_{12}$, $\rho_{02}$, $S_y^2$ and $S_x^2$. Thus, to use such estimators one has to use guessed or estimated values of these parameters. Guessed values of population parameters can be obtained either from past data or from experience gathered over time; for instance, see Murthy (1967) and Tracy et al. (1996). If guessed values are not available, it is advisable to use sample data to estimate these parameters, as suggested by Singh et al. (2007) and Gupta and Shabbir (2008). It can be seen that the minimum mean square errors of the classes of estimators remain the same up to the first order of approximation even if the population parameters are replaced by their respective sample estimates.

Efficiency Comparisons of the Proposed Classes of Estimators
The variance of $s_{ym}^{*2}$ can be obtained to the first order of approximation as
The performance of the proposed classes of estimators $T_1$ and $T_2$ under their respective optimality conditions is compared with that of the other estimators considered in this paper, and their dominance is shown by empirical and graphical comparisons.

Numerical Illustration
We have chosen four natural population data sets to illustrate the performance of the proposed classes of estimators $T_1$ and $T_2$. The sources of the populations, the nature of the variables $y$, $x$, $z$ and the values of the various parameters are given as follows.

Conclusions
The following conclusions can be drawn from the present study.

1. From Tables 1 and 2, it is observed that, for the different choices of the non-response rate $p$, the proposed classes of estimators $T_i$ $(i = 1, 2)$ are more efficient than the other estimators considered in this work.

2. From Figures 1-4, it is noticed that (a) the percent relative efficiencies of $T_i$ $(i = 1, 2)$ increase with increasing values of the correlation coefficients $\rho_{01}$, $\rho_{02}$ and $\rho_{12}$. This indicates that the proposed classes of estimators are more precise when information on highly positively correlated auxiliary variables is available.
Thus, it is clear that the use of auxiliary variables is highly rewarding for the proposed classes of estimators. Hence, the proposed classes of estimators are well justified, as they unify several results. The suggested classes of estimators are therefore more attractive than previous work of a similar nature.
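The percent relative efficiency (PRE) used in these comparisons can be computed as sketched below. This is a minimal illustration that assumes the usual definition, namely 100 times the ratio of the M.S.E. (or variance) of a reference estimator to the M.S.E. of the estimator under study. The function name and the numerical values are hypothetical and are not taken from the tables of this paper.

```python
def percent_relative_efficiency(mse_reference, mse_estimator):
    """PRE of an estimator with respect to a reference estimator,
    assuming PRE = 100 * M.S.E.(reference) / M.S.E.(estimator)."""
    if mse_reference <= 0 or mse_estimator <= 0:
        raise ValueError("mean square errors must be positive")
    return 100.0 * mse_reference / mse_estimator


# Hypothetical mean square errors, for illustration only.
pre_example = percent_relative_efficiency(mse_reference=4.0, mse_estimator=2.5)
```

Under this definition, a PRE above 100 means that the estimator under study has a smaller M.S.E. than the reference estimator.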

Recommendations of the Proposed Work for Real life Applications
In real-life surveys, the character of interest may be sensitive or stigmatizing, such as drinking alcohol, gambling habits, drug addiction, tax evasion, or a history of induced abortions. Hence, a direct survey is likely to yield unreliable responses because of the presence of random non-response in the sampled units. The suggested estimation strategies for estimating the character of interest are recommended to survey statisticians for handling these realistic situations.
Let $U = (U_1, U_2, \ldots, U_N)$ be the finite population of $N$ units, and let $y$, $x$ and $z$ be the variable under study, the first auxiliary variable and the second auxiliary variable respectively, with population means $\bar{Y}$, $\bar{X}$ and $\bar{Z}$. Let $y_k$, $x_k$ and $z_k$ be the values of $y$, $x$ and $z$ for the $k$-th $(k = 1, 2, \ldots, N)$ unit in the population. We wish to estimate the population variance $S_y^2$ of the study variable $y$ in the presence of the auxiliary variables $x$ and $z$, when the population variance $S_x^2$ of $x$ is unknown but information on $z$ is available for all the units of the population. To estimate $S_y^2$, a first-phase sample $S'$ of size $n$ is drawn by simple random sampling without replacement (SRSWOR) from the entire population $U$ and observed for the auxiliary variables $x$ and $z$ to estimate $S_x^2$. A second-phase sample $S$ of size $m$ $(m < n)$ is then drawn by SRSWOR, according to one of the following cases, to observe the characteristics $y$ and $x$.
Case I: The second-phase sample is drawn as a subsample of the first-phase sample (i.e. $S \subset S'$).
Case II: The second-phase sample $S$ is drawn independently of the first-phase sample $S'$.
Henceforth, we use the following notation:
$\bar{y}_m$, $\bar{x}_m$, $\bar{z}_n$: sample means of the respective variables based on the sample sizes shown in the suffixes.
$s^2$ with the corresponding suffixes: sample variances of the respective variables based on the sample sizes shown in the suffixes.
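The two sampling cases can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: the function name `two_phase_sample`, the synthetic population of $x$ values and the sizes $N = 200$, $n = 60$, $m = 25$ are all hypothetical. It draws both phases by SRSWOR and computes the sample variance of $x$ in each phase.

```python
import random
from statistics import variance


def two_phase_sample(N, n, m, case, rng):
    """Draw first- and second-phase samples of unit indices by SRSWOR.

    case "I":  the second phase is a subsample of the first phase.
    case "II": the second phase is drawn independently from the population.
    """
    if not (1 < m < n <= N):
        raise ValueError("sizes must satisfy 1 < m < n <= N")
    first = rng.sample(range(N), n)
    if case == "I":
        second = rng.sample(first, m)
    elif case == "II":
        second = rng.sample(range(N), m)
    else:
        raise ValueError("case must be 'I' or 'II'")
    return first, second


# Synthetic population of the auxiliary variable x (assumed values).
rng = random.Random(2024)
N, n, m = 200, 60, 25
x = [rng.gauss(50.0, 10.0) for _ in range(N)]

first, second = two_phase_sample(N, n, m, "I", rng)
s2_x_first = variance([x[k] for k in first])    # first-phase sample variance of x
s2_x_second = variance([x[k] for k in second])  # second-phase sample variance of x
```

In Case I every second-phase unit also belongs to the first-phase sample, whereas in Case II the two samples may or may not overlap.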


Sample means of the respective variables based on the responding part of the second-phase sample $S$.
Sample variance of the variable $x$ based on the responding part of the second-phase sample $S$.
Sample variance of the study variable $y$ based on the responding part of the second-phase sample $S$.

3 2 4 E
e = f C , E e = f C , E e = f C , E e = f C , E e = f C , E e e = f ρ C C , E e e = f ρ C C , E e e = f ρ C C , E e e = f ρ C C , E e e = f ρ C C , E e e = E e e = f C , E e e = f C , E e e =E e 0 (there is no non-response), the above expected values of the sample statistics on which random non-response occurs coincide with the usual results; see, for instance, Upadhyaya and Singh (2006).

1 T
= G s , s , s , s = S 1+e + c S e -e + c S e + S c e +c e +2c e e + c S e (18) 2 +2S S c e e + c e e +2c S S e e +2S S c e   T -Y =f S C + c f S C + c f S C +2c f S S ρ C C + 2c f S S ρ C C (22) f C , E e = f C , E e = f C , E e = f C , E e = f C , E e e = f ρ C C , E e e = f ρ C C , E e e = f C , E e e f ρ C C , E e e = E e e = E e e = E e e =E e e =E e e = 0 T -Y = S C c f +c f +c f S C +2c f S S ρ C C +2c f S S ρ C C T -Y =f S C +d S f +f C +d f S C +2d f S S ρ C C -2d d f S S ρ C C , T -Y = f S C +c S f +f C +c f S C +2c f S S ρ C C -2c c f S S ρ C C .(27) Remark 3.1.

Minimum M.S.E.s of the Proposed Classes of Estimators $T_1$ and $T_2$
It is obvious from equations (21), (22), (26), (27) and Remark 3.1 that the mean square errors of the proposed classes of estimators $T_i$ $(i = 1, 2)$ depend on the values of the derivatives $d_2$, $d_4$, $c_2$ and $c_4$. Therefore, we minimize the mean square errors of the proposed classes of estimators $T_i$ separately for the two cases of the two-phase sampling set-up considered in this work, as shown below.
Case I
When the second-phase sample $S$ is drawn as a subsample of the first-phase sample $S'$, the optimality conditions under which proposed classes of estimators

Substituting these optimum values of the derivatives $d_2$, $d_4$, $c_2$ and $c_4$ in equations (26) and (27), we have the expressions of minimum M.S.E. of the classes of estimators

Figure 1: PRE of $T_1$ under Case I
Figure 2: PRE of $T_2$ under Case I
For high positive values of the correlation coefficients $\rho_{01}$, $\rho_{02}$ and $\rho_{12}$ (especially for populations II and IV), the proposed class of estimators $T_1$ yields impressive gains in efficiency over the other estimators. This behavior is visible in both cases of the two-phase sampling set-up suggested in this work. Similar situations are also observed for the class of estimators $T_2$.
If random non-response occurs only on the study variable $y$, then the estimators
first, second and third order partial derivatives of
It can be observed from equation (8) that the class of estimators $T_1$ is very wide in the sense for any parametric function,
the M.S.E.s/minimum M.S.E.s of these estimators are derived up to the first order of approximation under Cases I and II of the two-phase sampling set-up and presented below.