Three-Stage Stochastic Multivariate Stratified Sample Survey

In this paper, we consider the problem of three-stage sample surveys. The three-stage multivariate stratified sample survey problem is formulated as a non-linear stochastic programming problem in which the survey cost and the variances are treated as random variables. The stochastic programming problem is converted into an equivalent deterministic form using chance-constrained programming and the modified E-model.


Introduction
The analysis of two-stage stratified sampling designs is well established in the sampling literature. In these designs the total population is subdivided into a number of strata, and a two-stage sampling procedure is then applied within each stratum. A two-stage stratified design generally specifies two stages of selection: primary sampling units (PSUs) at the first stage, and sub-samples from each PSU at the second stage as secondary sampling units (SSUs). Methods for obtaining the optimum allocation of sampling units to each stage are readily available. Showkat et al. (2011) used the geometric programming approach in a multivariate two-stage sampling design to obtain the optimum sample sizes at each stage.
In a three-stage stratified sampling design, the population under study is divided into a number of strata and sub-sampling is carried out within each stratum, instead of enumerating the units completely. A three-stage design generally specifies three stages of selection: primary sampling units (PSUs) at the first stage, sub-samples from each PSU at the second stage as secondary sampling units (SSUs), and sub-samples from each SSU at the third stage as tertiary sampling units (TSUs). For instance, in surveys to estimate crop production in India (Sukhatme, 1947), the village is a convenient sampling unit. Within a village, only some of the fields growing the crop in question are selected, so that the field is a sub-unit. When a field is selected, only certain parts of it are cut for the determination of yield per acre; thus the sub-unit itself is sampled. Here we have to find the optimal sample sizes n, m and p for the three stages at minimum cost. The problem of optimum allocation in two-stage and three-stage sample surveys is described in standard textbooks on sampling such as Cochran (1977). Recently, Shafiullah et al. (2013) worked on three-stage sample surveys and applied the geometric programming approach to find the optimum sample sizes at each stage.
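The three stages of selection described above can be illustrated with a minimal Python sketch for a single stratum. The function name `three_stage_sample`, the nested-dictionary frame and the village/field/plot values are hypothetical and serve only to show equal-probability selection without replacement at each stage.

```python
import random

def three_stage_sample(population, n, m, p, seed=0):
    """Select n PSUs, then m SSUs per selected PSU, then p TSUs per selected SSU.

    `population` is a nested dict: PSU id -> SSU id -> list of TSU values.
    Selection is with equal probability and without replacement at each stage.
    """
    rng = random.Random(seed)
    sample = {}
    for psu in rng.sample(sorted(population), n):          # stage 1: villages
        ssus = population[psu]
        sample[psu] = {}
        for ssu in rng.sample(sorted(ssus), m):            # stage 2: fields
            sample[psu][ssu] = rng.sample(ssus[ssu], p)    # stage 3: plots
    return sample

# Hypothetical frame: 5 villages, 4 fields each, 6 plot yields per field.
frame = {
    f"village{v}": {f"field{f}": [10.0 * v + f + 0.1 * k for k in range(6)]
                    for f in range(4)}
    for v in range(5)
}
chosen = three_stage_sample(frame, n=2, m=3, p=2)
```

In a stratified design the same selection would be repeated independently in every stratum, with stratum-specific values of n, m and p.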
In many real-life situations decision makers have to optimize their objectives under certain conditions, and the parameters on which these objectives depend are not always known with certainty. A mathematical programming problem in which some of the parameters are unknown and are treated as random variables is called a stochastic programming problem. Stochastic programming plays a very important role in modeling optimization problems, and uncertainty is at its root. The main aim of stochastic programming is to find a solution that is feasible for all (or almost all) realizations of the data and optimal in some sense. Stochastic programming has been discussed by many authors, such as Prékopa (1995) and Charnes and Cooper (1959). In this paper, we formulate the three-stage sample survey problem as a stochastic programming problem in which the sampling variances and the stratum costs are normally distributed random variables. The stochastic formulation is converted into an equivalent deterministic form by applying the modified E-model to the objective function and chance-constrained programming to the cost constraint.

Formulation of the Problem in Three-Stage Stratified Sample Surveys
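The population is assumed to be heterogeneous; it is divided into $L$ internally homogeneous strata. Let the $h$-th stratum contain $N_h$ units such that $\sum_{h=1}^{L} N_h = N$. Primary stage units (PSUs) are selected from each stratum, with unit sizes taken to be constant within a stratum but allowed to differ from stratum to stratum. SSUs are then selected from each sampled PSU and TSUs from each sampled SSU, so that the $h$-th stratum contains $N_h$ PSUs, each with $M_h$ SSUs, each having $P_h$ TSUs. The corresponding sample sizes $n_h$, $m_h$ and $p_h$ are drawn with equal probability and without replacement at each stage.
Let $y_{hijk}$ be the value of the $k$-th TSU in the $j$-th SSU of the $i$-th PSU of the $h$-th stratum, where $h = 1, \dots, L$; $i = 1, \dots, N_h$; $j = 1, \dots, M_h$; $k = 1, \dots, P_h$.
The usual notation for the $h$-th stratum is as follows.
Sample mean per TSU in a selected SSU: $\bar{y}_{hij} = \frac{1}{p_h}\sum_{k=1}^{p_h} y_{hijk}$.
Population mean per TSU in an SSU: $\bar{Y}_{hij} = \frac{1}{P_h}\sum_{k=1}^{P_h} y_{hijk}$.
Sample mean per SSU in a selected PSU: $\bar{y}_{hi} = \frac{1}{m_h}\sum_{j=1}^{m_h} \bar{y}_{hij}$.
Population mean per SSU in a PSU: $\bar{Y}_{hi} = \frac{1}{M_h}\sum_{j=1}^{M_h} \bar{Y}_{hij}$.
Sample mean per PSU: $\bar{y}_{h} = \frac{1}{n_h}\sum_{i=1}^{n_h} \bar{y}_{hi}$.
Population mean per PSU: $\bar{Y}_{h} = \frac{1}{N_h}\sum_{i=1}^{N_h} \bar{Y}_{hi}$.
The required variances are the sampling variance among PSU means in the $h$-th stratum, $s_{1h}^2$; the sampling variance among SSUs within PSUs, $s_{2h}^2$; and the sampling variance among TSUs within SSUs, $s_{3h}^2$. Their population counterparts are
$$S_{1h}^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h}\left(\bar{Y}_{hi} - \bar{Y}_{h}\right)^2, \quad S_{2h}^2 = \frac{1}{N_h (M_h - 1)}\sum_{i=1}^{N_h}\sum_{j=1}^{M_h}\left(\bar{Y}_{hij} - \bar{Y}_{hi}\right)^2, \quad S_{3h}^2 = \frac{1}{N_h M_h (P_h - 1)}\sum_{i=1}^{N_h}\sum_{j=1}^{M_h}\sum_{k=1}^{P_h}\left(y_{hijk} - \bar{Y}_{hij}\right)^2 .$$
An unbiased estimate of the population mean $\bar{Y}$ per TSU may be written as
$$\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h ,$$
where $W_h = N_h M_h P_h \big/ \sum_{h=1}^{L} N_h M_h P_h$ is the relative size of the $h$-th stratum in terms of TSUs.
It is known that for stratified random sampling without replacement (WOR), with $\bar{y}_{st}$ as the unbiased estimator of the population mean $\bar{Y}$, the sampling variance is given by
$$V(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2\left[\frac{1 - f_{1h}}{n_h} S_{1h}^2 + \frac{1 - f_{2h}}{n_h m_h} S_{2h}^2 + \frac{1 - f_{3h}}{n_h m_h p_h} S_{3h}^2\right],$$
where $f_{1h} = n_h / N_h$, $f_{2h} = m_h / M_h$ and $f_{3h} = p_h / P_h$ are the sampling fractions at the three stages. Ignoring the finite population correction (fpc) and replacing the population variances by their sample counterparts, the estimated variance is given by
$$v(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2\left[\frac{s_{1h}^2}{n_h} + \frac{s_{2h}^2}{n_h m_h} + \frac{s_{3h}^2}{n_h m_h p_h}\right].$$
Now, if the travel cost may be ignored, the total cost of the survey can be written in the linear form given below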
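As a numerical illustration, the following Python sketch evaluates the variance of the stratified mean using the standard three-stage formula (Cochran, 1977) applied stratum by stratum. The dictionary keys (`W`, `N`, `M`, `P`, `n`, `m`, `p`, `S1`, `S2`, `S3`) and the example values are assumptions made here for illustration; `S1`, `S2`, `S3` hold the variance components at the PSU, SSU and TSU levels.

```python
def stratified_three_stage_variance(strata, ignore_fpc=False):
    """Variance of the stratified mean under three-stage sampling.

    Each stratum is a dict with weight W, population sizes N, M, P,
    sample sizes n, m, p and variance components S1, S2, S3.
    With ignore_fpc=True the sampling fractions are set to zero.
    """
    total = 0.0
    for s in strata:
        n, m, p = s["n"], s["m"], s["p"]
        if ignore_fpc:
            f1 = f2 = f3 = 0.0
        else:
            f1, f2, f3 = n / s["N"], m / s["M"], p / s["P"]
        total += s["W"] ** 2 * (
            (1 - f1) * s["S1"] / n
            + (1 - f2) * s["S2"] / (n * m)
            + (1 - f3) * s["S3"] / (n * m * p)
        )
    return total

# Hypothetical single stratum.
example = [{"W": 1.0, "N": 100, "M": 10, "P": 5,
            "n": 10, "m": 2, "p": 1, "S1": 4.0, "S2": 2.0, "S3": 1.0}]
with_fpc = stratified_three_stage_variance(example)
without_fpc = stratified_three_stage_variance(example, ignore_fpc=True)
```

Dropping the fpc can only increase the computed variance, which is why the simplified form is a conservative objective when the sampling fractions are small.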

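$$C = c_0 + \sum_{h=1}^{L}\left(c_{1h} n_h + c_{2h} n_h m_h + c_{3h} n_h m_h p_h\right)$$
where $C$ is the overall cost of sampling and $c_0$ is the fixed cost of the survey.
$c_{1h}$ is the cost of obtaining information from a sampled PSU in the $h$-th stratum, $c_{2h}$ is the cost of obtaining information from a sampled SSU in the $h$-th stratum, and $c_{3h}$ is the cost of obtaining information from a sampled TSU in the $h$-th stratum.
In practice, $c_{1h}$ is likely to be larger than $c_{2h}$, and $c_{2h}$ is likely to be larger than $c_{3h}$. Hence, a unit increase in $n_h$ increases the cost much more than a unit increase in $m_h$; similarly, a unit increase in $m_h$ increases the cost much more than a unit increase in $p_h$. Thus, the third component of the cost function will vary from sample to sample for given sample sizes.
If $C$ is considered as a finite limit on the cost and the optimum sizes $n_h$, $m_h$ and $p_h$ are required so that the total survey variance is minimized, the allocation problem for each characteristic takes the following non-linear programming problem (NLPP) form:
$$\text{Minimize } v(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2\left[\frac{s_{1h}^2}{n_h} + \frac{s_{2h}^2}{n_h m_h} + \frac{s_{3h}^2}{n_h m_h p_h}\right]$$
$$\text{subject to } c_0 + \sum_{h=1}^{L}\left(c_{1h} n_h + c_{2h} n_h m_h + c_{3h} n_h m_h p_h\right) \le C,$$
with $n_h$, $m_h$ and $p_h$ positive integers not exceeding $N_h$, $M_h$ and $P_h$ respectively, $h = 1, \dots, L$. Here $W_h$ is the stratum weight and $s_{1h}^2$, $s_{2h}^2$, $s_{3h}^2$ are the sampling variances at the three stages. Now, let us assume that the costs $c_{1h}$, $c_{2h}$ and $c_{3h}$ are independently normally distributed random variables. Further, the sampling variances in the $h$-th stratum are also random variables.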
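To make the structure of the allocation problem concrete, the sketch below evaluates the linear cost function and, for a single stratum with fixed (non-random) inputs, searches by brute force for the integer sample sizes that minimize the variance within a budget. The function names, the `max_size` bound and all numeric values are hypothetical; a real application would use a non-linear integer solver over all strata rather than enumeration.

```python
from itertools import product

def total_cost(c0, strata):
    """Linear survey cost: c0 + sum_h (c1*n + c2*n*m + c3*n*m*p)."""
    return c0 + sum(
        s["c1"] * s["n"]
        + s["c2"] * s["n"] * s["m"]
        + s["c3"] * s["n"] * s["m"] * s["p"]
        for s in strata
    )

def best_allocation_one_stratum(S1, S2, S3, c1, c2, c3, budget, max_size=20):
    """Brute-force the integer (n, m, p) minimising S1/n + S2/(n*m) + S3/(n*m*p)
    subject to c1*n + c2*n*m + c3*n*m*p <= budget (fpc ignored)."""
    best = None
    for n, m, p in product(range(1, max_size + 1), repeat=3):
        cost = c1 * n + c2 * n * m + c3 * n * m * p
        if cost > budget:
            continue                      # violates the cost constraint
        var = S1 / n + S2 / (n * m) + S3 / (n * m * p)
        if best is None or var < best[0]:
            best = (var, n, m, p)
    return best

cost = total_cost(100, [{"c1": 10, "c2": 3, "c3": 1, "n": 4, "m": 2, "p": 3}])
result = best_allocation_one_stratum(4.0, 2.0, 1.0, 10, 3, 1, budget=100)
```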

Solution Using Modified E-technique
In objective function of Eq.
The equivalent deterministic form of Eq. 2(i) can be obtained by using the modified E-model, which replaces the random objective by a weighted sum of its mean and its standard deviation.
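For orientation, the modified E-model is commonly stated as follows for a generic random objective. The symbols $Z(x)$ for the random objective, $x$ for the decision variables and $k_1$, $k_2$ for the non-negative weights are notation assumed here, not taken from the paper.

```latex
\min_{x}\; F(x) \;=\; k_1\, E\!\left[Z(x)\right] \;+\; k_2 \sqrt{\operatorname{Var}\!\left[Z(x)\right]},
\qquad k_1 \ge 0,\; k_2 \ge 0 .
```

Setting $k_2 = 0$ recovers the plain E-model (minimize the expected value only), while a larger $k_2$ puts more weight on the variability of the objective.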

Solution Using Chance Constraint Programming
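The costs in the constraint are assumed to be normally distributed random variables.
Finally, from Eq. (4), (6) and (8), the mean of the cost function with random costs is
$$E\left[\sum_{h=1}^{L}\left(c_{1h} n_h + c_{2h} n_h m_h + c_{3h} n_h m_h p_h\right)\right] = \sum_{h=1}^{L}\left[n_h E(c_{1h}) + n_h m_h E(c_{2h}) + n_h m_h p_h E(c_{3h})\right]$$
and, from Eq. (5), (7) and (9), its variance is
$$V\left[\sum_{h=1}^{L}\left(c_{1h} n_h + c_{2h} n_h m_h + c_{3h} n_h m_h p_h\right)\right] = \sum_{h=1}^{L}\left[n_h^2 V(c_{1h}) + n_h^2 m_h^2 V(c_{2h}) + n_h^2 m_h^2 p_h^2 V(c_{3h})\right].$$
Since the means and variances of the costs are unknown, they are replaced by their estimators. Likewise, since the moments of the sampling variances are unknown, they are replaced by their estimators. Thus, using the assumptions of Melaku (1968) and the modified E-model (see Garcia 2007), the NLPP (3) is finally formulated in deterministic form: the objective is the weighted sum of the estimated mean and standard deviation of the sampling variance, and the cost constraint is replaced by its chance-constrained deterministic equivalent.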
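The deterministic equivalent of a chance constraint with independent normal costs can be sketched in Python as follows. This is a minimal illustration under the assumption that the constraint has the form P(total cost <= C) >= p0; the function name, the dictionary layout and the numeric values are hypothetical. The constraint is satisfied when the returned value does not exceed the budget C.

```python
from math import sqrt
from statistics import NormalDist

def chance_constrained_cost(strata, c0, p0):
    """Left-hand side of the deterministic equivalent of
    P(c0 + sum_h cost_h <= C) >= p0 for independent normal unit costs.

    Each stratum dict holds sample sizes n, m, p plus (mean, variance)
    pairs for the unit costs under the keys c1, c2, c3.
    """
    k_alpha = NormalDist().inv_cdf(p0)      # standard normal quantile
    mean = c0
    var = 0.0
    for s in strata:
        # Number of units paid for at each stage: n, n*m, n*m*p.
        units = (s["n"], s["n"] * s["m"], s["n"] * s["m"] * s["p"])
        for size, key in zip(units, ("c1", "c2", "c3")):
            mu, sigma2 = s[key]
            mean += size * mu
            var += size ** 2 * sigma2
    return mean + k_alpha * sqrt(var)

# Hypothetical stratum: unit costs given as (mean, variance).
strata = [{"n": 2, "m": 2, "p": 2,
           "c1": (10.0, 1.0), "c2": (3.0, 0.25), "c3": (1.0, 0.0)}]
lhs_median = chance_constrained_cost(strata, c0=5.0, p0=0.5)
lhs_95 = chance_constrained_cost(strata, c0=5.0, p0=0.95)
```

A higher required probability p0 inflates the left-hand side through the normal quantile, so the same budget supports a smaller sample.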

Lexicographic Method
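To solve the converted deterministic NLPP using the lexicographic goal programming approach, the r characteristics are arranged in lexicographic order of importance. At the first stage of the solution, the NLPP for the most important characteristic is solved and the optimal value of its objective function is recorded.
At the second stage of the solution, the NLPP for the second characteristic is solved under the same constraints, with the additional requirement that the objective of the first characteristic does not exceed the optimal value obtained at the first stage.

Successively solving the problem at each stage, the NLPP at a given stage minimizes the objective of the current characteristic (a weighted sum of the estimated mean and standard deviation of the sampling variance) subject to the original constraints and to the objectives of all earlier characteristics being held at their optimal values.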
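The stage-by-stage logic of a lexicographic solution can be sketched in Python over a finite set of candidate allocations. This is an illustration of the general technique, not the paper's solver: the candidate grid, the costs, the variance components and the tolerance `tol` are all hypothetical, and only a single stratum with deterministic inputs is used.

```python
from itertools import product

def lexicographic_allocation(objectives, feasible, tol=1e-9):
    """Minimise objectives in priority order over a finite candidate set.

    objectives: list of functions x -> float, most important first.
    feasible:   iterable of candidate allocations (already cost-feasible).
    After each stage only candidates within `tol` of that stage's optimum
    are kept, so earlier goals are never degraded by later ones.
    """
    candidates = list(feasible)
    optima = []
    for f in objectives:
        best = min(f(x) for x in candidates)
        optima.append(best)
        candidates = [x for x in candidates if f(x) <= best + tol]
    return candidates[0], optima

def make_objective(S1, S2, S3):
    """Variance objective for one characteristic, x = (n, m, p)."""
    return lambda x: S1 / x[0] + S2 / (x[0] * x[1]) + S3 / (x[0] * x[1] * x[2])

budget, c1, c2, c3 = 100, 10, 3, 1
grid = [x for x in product(range(1, 11), repeat=3)
        if c1 * x[0] + c2 * x[0] * x[1] + c3 * x[0] * x[1] * x[2] <= budget]
f1, f2 = make_objective(4.0, 2.0, 1.0), make_objective(1.0, 3.0, 2.0)
allocation, optima = lexicographic_allocation([f1, f2], grid)
```

With a very small `tol` the first characteristic usually determines the allocation on its own; a larger tolerance gives the lower-priority characteristics room to improve.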

Other Allocation Methods
A Comparative Study

Proportional allocation
The proportional

Cochran's Allocation
The compromise criterion of Cochran's allocation is to average the individual optimum allocations that are solutions to the NLPP for each of the p characteristics separately. Khan et al. (2003)
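The averaging step can be sketched in Python as follows. The function name, the input layout and the rule of rounding to the nearest integer with a minimum of 1 are assumptions made for illustration; the individual optimum allocations are taken as given.

```python
def cochran_compromise(allocations):
    """Average individual optimum allocations over characteristics.

    allocations: list (one entry per characteristic) of lists of
    (n_h, m_h, p_h) tuples, one tuple per stratum.
    Returns one rounded (n_h, m_h, p_h) tuple per stratum.
    """
    n_chars = len(allocations)
    compromise = []
    for per_stratum in zip(*allocations):        # same stratum, all characteristics
        avg = tuple(sum(vals) / n_chars for vals in zip(*per_stratum))
        compromise.append(tuple(max(1, round(v)) for v in avg))
    return compromise

# Hypothetical optimum allocations for two characteristics and two strata.
individual = [
    [(4, 2, 3), (6, 3, 2)],   # characteristic 1
    [(6, 4, 1), (8, 3, 4)],   # characteristic 2
]
compromise = cochran_compromise(individual)
```

Because the averaged sizes are rounded, the cost of the compromise allocation should be re-checked against the budget.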

Sukhatme's Allocation
Sukhatme et al. [16] obtained the compromise allocation by minimizing the sum of the variances for the p characteristics under linear cost constraints. The NLPP for this allocation is given as

Simulation Study
To illustrate the theory developed in the previous sections, a simulation study was carried out.
Considering a population with two characteristics, random data from normal distributions were generated at each stage, with the total population divided into four strata. The data for the simulation of three-stage sampling were generated with the R software.
For the first characteristic at the primary stage, normal random variables with specified mean and variance were generated in R. The data were then divided into four assumed strata, and the mean, variance and fourth moment were obtained for each stratum. The normal random variables were regenerated with a different mean and standard deviation for the second characteristic.
Similarly, populations at each of the other stages were generated, and regenerated for the second characteristic, with pre-assumed means and variances. The data generated in R for characteristics one and two are shown in Tables 1 and 2 respectively.
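The data-generation step can be sketched as follows. The study itself used R; this Python sketch only mirrors the described procedure under stated assumptions: the means, standard deviations, population size and seeds are hypothetical, and sorting the values before cutting them into four equal strata is one possible way of forming the "assumed strata".

```python
import random
from statistics import fmean

def stratum_moments(values):
    """Mean, variance (divisor n - 1) and fourth central moment of a stratum."""
    mu = fmean(values)
    n = len(values)
    var = sum((v - mu) ** 2 for v in values) / (n - 1)
    m4 = sum((v - mu) ** 4 for v in values) / n
    return mu, var, m4

def simulate_strata(mean, sd, size, n_strata=4, seed=1):
    """Generate normal values, sort them, and cut them into strata."""
    rng = random.Random(seed)
    data = sorted(rng.gauss(mean, sd) for _ in range(size))
    step = size // n_strata
    strata = [data[i * step:(i + 1) * step] for i in range(n_strata - 1)]
    strata.append(data[(n_strata - 1) * step:])   # last stratum takes the remainder
    return [stratum_moments(s) for s in strata]

# Hypothetical parameters for two characteristics at the primary stage.
char1 = simulate_strata(mean=50.0, sd=10.0, size=400, seed=1)
char2 = simulate_strata(mean=120.0, sd=25.0, size=400, seed=2)
```

The fourth moment is needed because the variance of a sampling variance depends on it, which is what the modified E-model objective uses.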

Conclusions
This paper has provided a comprehensive study of optimum allocation in three-stage multivariate stratified sample surveys with the costs and the variances as random parameters. The problem is formulated as a non-linear stochastic programming problem by considering the survey cost and the variances as random variables. The stochastic problem is converted into an equivalent deterministic form by using chance-constrained programming and the modified E-model. Furthermore, researchers can use these formulations to obtain the optimum allocation for three-stage sample surveys whenever their costs need to be optimized subject to a limitation on variance.