Estimation of Finite Population Mean and Superpopulation Parameters when the Sampling Design is Informative and Nonresponse Mechanism is Nonignorable

In this paper we study the joint treatment of a not-missing-at-random (NMAR) nonresponse mechanism and informative sampling for survey data. This is the most general situation in surveys, and other combinations of sampling informativeness and response mechanisms can be considered as special cases. The proposed method combines two methodologies used in the analysis of sample surveys for the treatment of informative sampling and the nonignorable nonresponse mechanism. One incorporates the dependence of the first order inclusion probabilities on the study variable, while the other incorporates the dependence of the probability of nonresponse on unobserved or missing observations. The main purpose here is the estimation of the finite population mean and superpopulation parameters when the sampling design is informative and the nonresponse mechanism is nonignorable. Under four scenarios of sampling design and nonresponse mechanism, we obtain method of moments estimators of the finite population mean, together with their biases and mean square errors. Furthermore, a four-step estimation method is introduced for the estimation of superpopulation parameters under informative sampling and a nonignorable nonresponse mechanism. New relationships between moments of the response, nonresponse, sample, sample-complement and population distributions are derived. Most estimators of the finite population mean known from survey sampling can be derived as special cases of the results in this paper.


Introduction
Data collected by sample surveys are used extensively to make inferences on assumed population models. Often, survey design features (clustering, stratification, unequal probability selection, etc.) are ignored and the sample data are then analyzed using classical methods based on simple random sampling. This approach can, however, lead to erroneous inference because of the sample selection bias implied by informative sampling, in which the sample selection probabilities depend on the values of the model outcome variable (or the model outcome variable is correlated with design variables not included in the model). See Pfeffermann et al. (1998) and Eideh and Nathan (2006). In addition to the effect of complex sample design, one of the major problems in the analysis of survey data is that of missing values. Rubin (1976) and Little and Rubin (2002) consider three types of nonresponse mechanism or missing data mechanism:
(a) Missing completely at random (MCAR): if the response probability does not depend on the study variable or on the auxiliary population variable, the missing data are MCAR.
(b) Missing at random (MAR) given the auxiliary population variable: if the response probability depends on the auxiliary population variable but not on the study variable, the missing data are MAR.
(c) Not missing at random (NMAR): if the response probability depends on the study variable, even after conditioning on the auxiliary population variable, the missing data are NMAR.
Chang and Kott (2008). The methods used in these papers are summarized by Pfeffermann and Sikov (2011) and Eideh (2012).
For inference problems, Little (1982) classifies the nonresponse mechanism as ignorable (MAR and MCAR) or nonignorable (NMAR). In this sense, the cross classification of sampling design and nonresponse mechanism is:

Table 2

Sampling Design     Nonresponse Mechanism
                    Ignorable     Nonignorable
Informative         II            IN
Noninformative      NI            NN

Pfeffermann and Sikov (2011), and Eideh (2012) consider estimation of superpopulation parameters and prediction of finite population parameters (census parameters) under nonignorable nonresponse via response and nonresponse distributions when the sampling design is noninformative.
None of the above studies consider simultaneously the problem of informative sampling and the problem of nonignorable nonresponse when analyzing survey data.
In this paper, we study, within a modeling framework, the joint treatment of nonignorable nonresponse mechanism and informative sampling for survey data, by specifying the probability distribution of the observed measurements when the sampling design is informative. This is the most general situation in surveys and other combinations of sampling informativeness and response mechanisms can be considered as special cases.
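The three nonresponse mechanisms distinguished in this introduction can be illustrated with a small simulation. The sketch below is not taken from the paper: the population model, the logistic response probabilities, and all variable names are hypothetical choices made only for illustration. It compares the unweighted respondent mean with a simple regression adjustment on the auxiliary variable under MCAR, MAR and NMAR.

```python
import numpy as np

rng = np.random.default_rng(2016)
N = 200_000
x = rng.normal(size=N)                # auxiliary population variable (assumed)
y = 2.0 + x + rng.normal(size=N)      # study variable (assumed linear model)

def expit(z):
    """Logistic function, used here as a hypothetical response model."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical response probabilities for the three mechanisms.
rho = {
    "MCAR": np.full(N, 0.6),          # constant: depends on neither x nor y
    "MAR": expit(0.5 + x),            # depends on x only
    "NMAR": expit(0.5 + (y - 2.0)),   # depends on the study variable itself
}

results = {}
for name, p in rho.items():
    responds = rng.random(N) < p      # independent response indicators
    naive = y[responds].mean()        # unweighted respondent mean
    # Regression adjustment: fit y on x among respondents,
    # then predict at the population mean of x.
    slope, intercept = np.polyfit(x[responds], y[responds], 1)
    adjusted = intercept + slope * x.mean()
    results[name] = (naive, adjusted)
    print(f"{name}: naive error={naive - y.mean():+.3f}, "
          f"adjusted error={adjusted - y.mean():+.3f}")
```

In this setup the adjustment on the auxiliary variable removes the bias under MAR but not under NMAR, which is why the nonignorable case needs a model for the response probability as a function of the study variable.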
It should be pointed out here that, according to Sarndal (2011), "Nonresponse causes both bias and increased variance. Its square is typically the dominant portion of the Mean Squared Error (MSE). We address primarily surveys on individuals and households with quite large sample sizes, as is typical for Journal of Official Statistics for government surveys; consequently, the variance contribution to MSE is low by comparison. Increased variance due to nonresponse is nevertheless an issue; striking a balance between variance increase and bias reduction is considered, for example, in Little and Vartivarian (2005)." Furthermore, Brick (2013) mentioned that "Model assumptions and adjustments are made in an attempt to compensate for missing data. Because the mechanisms that cause unit nonresponse are almost never adequately reflected in the model assumptions, survey estimates may be biased even after the model based adjustments. Nonresponse also causes a loss in the precision of survey estimates, primarily due to reduced sample size and secondarily as the result of increased variation of the survey weights. However, bias is the dominant component of the nonresponse-related error in the estimates, and nonresponse bias generally does not decrease as the sample size increases. Thus, bias is often the largest component of mean square error of the estimates even for subdomains when the sample size is large". Here we focus on the bias, variance and MSE.
The paper is structured as follows. Section 2 reviews the definition of the response distribution and the estimation of response probabilities. Section 3 introduces new relationships between moments of the response, nonresponse, sample, sample-complement and population distributions. Section 4 describes the estimation of the finite population total under the four scenarios listed in Table 2, and computes the biases and mean square errors of the estimators. Section 5 is devoted to the estimation of superpopulation parameters under informative sampling and a nonignorable nonresponse mechanism. Section 6 provides the conclusions.

Response and Nonresponse Distributions
Let $U = \{1, \ldots, N\}$ denote a finite population consisting of $N$ units. Let $y$ be the study variable of interest and let $y_i$ be the value of $y$ for the $i$th population unit. A probability sample $s$ is drawn from $U$ according to a specified sampling design. The sample size is denoted by $n$. In addition to the effect of complex sample design, one of the major problems in the analysis of survey data is that of missing values. In recent articles by Eideh (2009), Pfeffermann and Sikov (2011), and Eideh (2012), the authors defined and studied the problem of nonignorable nonresponse using the response and nonresponse distributions where the sampling design is noninformative. Following their notation, denote by $R_i$ the response indicator, with $R_i = 1$ if unit $i \in s$ responds and $R_i = 0$ otherwise. We assume that these random variables are independent of one another and of the sample selection mechanism (Oh and Scheuren 1983). The response set is defined accordingly as $r = \{i \in s : R_i = 1\}$. We assume probability sampling, so that $\pi_i = \Pr(i \in s) > 0$ for all $i \in U$. In practice, these conditional expectations are not known. Assuming that the available data to the analyst is
Note that Section 3.3 of Beaumont (2002) is a special case of equation (14).

Method of Moments Estimators of Finite Population Mean
In this section we consider the estimation of the finite population total under the four scenarios listed in Table 2, namely: IN, II, NI, and NN. The section also computes the biases and mean square errors of these estimators.

which is the two-phase nonresponse adjusted estimator, see Sarndal and Lundstrom (2005, p 51).
is an unbiased estimator of $t_2$.
can be written as:

Case 2: Informative sampling design and nonresponse mechanism is ignorable (II)
The MME of which is similar to the estimator given by Sarndal (1980) and discussed in detail by Bethlehem (1988).

Proof:
Since a proof of this result does not appear to be available in the literature, it is presented here for the reader.
Note that, if the nonresponse mechanism is ignorable, that is, the population covariance between the study variable and the response probability is zero, then the quantity in equation (24) is zero.
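The role of this covariance can be made explicit with a standard identity, in the spirit of Bethlehem (1988). The following is only a sketch under assumed notation: the symbols $\rho_i$ (response probability of unit $i$), $\bar{\rho}$ (its population mean), $\bar{Y}$ (the population mean of $y$) and $C_{\rho y}$ (their population covariance) are introduced here for illustration and need not coincide with the notation of equation (24). The response-probability-weighted population mean, which approximates the expectation of the unweighted respondent mean when the sampling design is noninformative, differs from $\bar{Y}$ by exactly the covariance divided by the mean response probability:

```latex
\begin{align*}
\frac{\sum_{i=1}^{N} \rho_i y_i}{\sum_{i=1}^{N} \rho_i} - \bar{Y}
  &= \frac{\frac{1}{N}\sum_{i=1}^{N} \rho_i y_i - \bar{\rho}\,\bar{Y}}{\bar{\rho}}
   = \frac{C_{\rho y}}{\bar{\rho}}, \\
C_{\rho y} &= \frac{1}{N}\sum_{i=1}^{N} (\rho_i - \bar{\rho})(y_i - \bar{Y})
   = \frac{1}{N}\sum_{i=1}^{N} \rho_i y_i - \bar{\rho}\,\bar{Y}.
\end{align*}
```

Under this sketch, a zero covariance between the study variable and the response probability makes the difference vanish, which is the sense in which the nonresponse mechanism is ignorable for estimating the mean.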

Case 3: Noninformative sampling design and nonignorable nonresponse mechanism
We can show that the MME of Note that, if the sampling design is noninformative, that is the population covariance between the study variable and first order inclusion probability is zero,  

Case 4: Sampling design is noninformative and nonresponse mechanism is ignorable
Here, we can show that the MME of Note that, if the sampling design is noninformative and the nonresponse mechanism is ignorable, that is, the population covariance between the study variable and the inclusion probability is zero, then the quantity in equation (34) is zero.
The four cases can be summarized in Table 3.
An interesting feature of these results is that several classical estimators in common use, within randomization theory (the design-based school) of survey sampling, are shown to be special cases of the proposed approach, which provides a new justification for them.
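The way the four scenarios lead to different weighting schemes can be illustrated numerically. The sketch below is not the paper's estimators or derivation: it assumes Poisson sampling, known inclusion probabilities `pi` and known response probabilities `rho` (in practice the latter would have to be estimated), hypothetical logistic forms for both, and a ratio-type weighted mean. It compares weights of 1, 1/pi, 1/rho and 1/(pi*rho) when the design is informative and the nonresponse is nonignorable.

```python
import numpy as np

def expit(z):
    """Logistic function (hypothetical model for pi and rho)."""
    return 1.0 / (1.0 + np.exp(-z))

def weighted_mean(y, w):
    """Ratio-type weighted mean: sum(w * y) / sum(w)."""
    return float(np.sum(w * y) / np.sum(w))

rng = np.random.default_rng(467)
N = 400_000
y = rng.normal(10.0, 2.0, size=N)     # study variable (assumed population)
z = (y - 10.0) / 2.0

pi = 0.3 * expit(1.0 + 0.8 * z)       # informative design: pi depends on y
rho = expit(1.0 + 0.8 * z)            # nonignorable response: rho depends on y

in_sample = rng.random(N) < pi                    # Poisson sampling
responds = in_sample & (rng.random(N) < rho)      # independent response

y_r, pi_r, rho_r = y[responds], pi[responds], rho[responds]

estimates = {
    "unweighted": weighted_mean(y_r, np.ones_like(y_r)),
    "1/pi": weighted_mean(y_r, 1.0 / pi_r),
    "1/rho": weighted_mean(y_r, 1.0 / rho_r),
    "1/(pi*rho)": weighted_mean(y_r, 1.0 / (pi_r * rho_r)),
}

pop_mean = float(y.mean())
for name, est in estimates.items():
    print(f"{name:>11}: estimate={est:.3f}  error={est - pop_mean:+.3f}")
```

In this simulated setting only the weights that account for both the inclusion and the response probabilities bring the estimate close to the population mean; each of the other three choices corresponds to wrongly assuming that the design is noninformative, that the nonresponse is ignorable, or both.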

Estimation under Informative Sampling and Nonignorable Nonresponse Mechanism
One of the main advantages of basing the inference on the response distribution is that it permits the use of standard inference procedures like those based on the likelihood principle. Having derived the response distribution when the sampling design is informative and the nonresponse mechanism is nonignorable (NMAR), and if the response measurements are independent, the response likelihood for the parameters indexing the superpopulation model, the sampling design, and the nonresponse mechanism is given by:

Pak.j.stat.oper.res. Vol.XII No.3 2016 pp467-489

This paper can be considered as a generalization and extension of Bethlehem (1988).
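The structure of the response likelihood discussed in this section can be sketched as follows. This is a sketch under assumed notation, not a reproduction of the paper's equations: $\theta$, $\gamma$ and $\phi$ are hypothetical labels for the parameters indexing the superpopulation model, the sampling design and the nonresponse mechanism; $f_p$, $f_s$ and $f_r$ denote the population, sample and response densities; $R_i$ is the response indicator and $r$ the response set. The sample density follows the form given by Pfeffermann et al. (1998) and the response density the form used by Pfeffermann and Sikov (2011), both obtained from Bayes' rule:

```latex
\begin{align*}
f_s(y_i;\theta,\gamma)
  &= f(y_i \mid i \in s)
   = \frac{E_p(\pi_i \mid y_i;\gamma)\, f_p(y_i;\theta)}{E_p(\pi_i;\theta,\gamma)}, \\
f_r(y_i;\theta,\gamma,\phi)
  &= f(y_i \mid i \in s, R_i = 1)
   = \frac{\Pr(R_i = 1 \mid y_i, i \in s;\phi)\, f_s(y_i;\theta,\gamma)}
          {\Pr(R_i = 1 \mid i \in s;\theta,\gamma,\phi)}, \\
L_r(\theta,\gamma,\phi)
  &= \prod_{i \in r} f_r(y_i;\theta,\gamma,\phi).
\end{align*}
```

When the conditional expectation of $\pi_i$ does not depend on $y_i$, the sample density reduces to the population density (noninformative design), and when the response probability does not depend on $y_i$, the response density reduces to the sample density (ignorable nonresponse), so the other three scenarios appear as special cases.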

Conclusions
In this article we combine two methodologies used in the analysis of sample surveys for the treatment of informative sampling and the nonignorable nonresponse mechanism. One incorporates the dependence of the first order inclusion probabilities on the study variable, while the other incorporates the dependence of the probability of nonresponse on unobserved or missing observations. Using the new relationships derived in the present study between moments of the response, nonresponse, sample, sample-complement and population distributions, we develop four estimators of the finite population mean, one for each combination of sampling design and nonresponse mechanism. Known estimators in common use in official statistics are shown to be special cases of the present theory, which provides a new justification of these estimators as method of moments estimators. Further experimentation (simulation studies and real data problems) with this kind of estimator is therefore highly recommended. Furthermore, in this paper, we show the role of informative sampling design and nonignorable nonresponse in adjusting various estimators for bias reduction. In addition to the estimation of the finite population mean, we introduce a new method for the estimation of superpopulation parameters under informative sampling and a nonignorable nonresponse mechanism.
In brief, ignoring the informativeness of the sampling design and the nonignorability of the nonresponse will yield biased estimators of the finite population total. To reduce the bias, we propose the use of poststratification based on first order inclusion probabilities (in the case of an informative sampling design and ignorable nonresponse mechanism), or estimated response probabilities (for a noninformative sampling design and nonignorable nonresponse mechanism), or the product of the two (if the sampling design is informative and the nonresponse mechanism is nonignorable).
I hope that the new mathematical results obtained in the present article will encourage further theoretical, simulation-based, empirical and practical research, including work on real data problems, in these directions.