An Optimum Multivariate-Multiobjective Stratified Sampling Design: Fuzzy Programming Approach

In stratified sampling design, when the cost of measuring the units does not vary significantly from stratum to stratum, it is advisable to estimate the population mean or total from a sample selected according to Neyman allocation. In general, the practical use of Neyman allocation suffers from a number of limitations. When there is no information about the strata standard deviations except that some of the strata have equal standard deviations, the precision of the estimate may be increased by pooling the strata with equal standard deviations into a single stratum, and the allocation problem is then resolved by using Neyman and proportional allocations simultaneously. In this paper, the case of multiple pooling of the strata standard deviations in multivariate stratified sampling with more than three strata is considered. The problem is formulated as a Multiobjective Nonlinear Programming Problem, and a solution procedure based on the Fuzzy Programming approach is suggested.


Introduction
In the sampling literature, the problem of determining the sample sizes for the strata that minimize the sampling variance of the estimator of the population mean (or total) for a fixed cost, or that minimize the total cost of the survey for a fixed precision of the estimator, is termed the problem of allocation. An allocation obtained by either of these criteria is known as an optimum allocation. When the measurement cost is constant and does not vary from stratum to stratum, minimizing the total cost of the survey is equivalent to minimizing the total sample size (Cochran (1977)). The optimum allocation for univariate stratified populations was first suggested by Neyman (1934). Later on, Mahalanobis (1944) introduced the cost function. Stuart (1954) … In practice the strata standard deviations $S_h$ are unknown and are replaced by their estimates $s_h$, which gives the sample sizes

$$\hat n_h = n\,\frac{W_h s_h}{\sum_{h=1}^{L} W_h s_h}, \qquad W_h = \frac{N_h}{N}, \quad h = 1, 2, \dots, L,$$

where the $\hat n_h$ are called the Modified Neyman allocation (see Sukhatme et al. (1984)).

In practice, some strata variances are unknown but may be assumed to be equal, as discussed by Park et al. (2007). They obtained an allocation by using estimated pooled standard deviations together with proportional allocation for the combined strata. They first obtained the pooled standard deviation for a single stratum comprising the strata with equal variances, and worked out the Modified Neyman allocation. The sample size allocated to the pooled stratum is then reallocated among its constituent strata by proportional allocation. They also showed that, under certain conditions, their allocation outperforms both the Modified Neyman allocation and proportional allocation. Ansari et al. (2011) justified the assumption, made by Park et al. (2007), that some of the stratum variances are equal. In practice, there may be circumstances that allow this assumption. For example, consider a population with L strata constructed to be as internally homogeneous as possible. For administrative convenience, a large homogeneous stratum may have to be divided into smaller strata whose variances are not significantly different. This can be ascertained by testing the hypotheses $H_0: S_h^2 = S_{h'}^2$ pair-wise.
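As a minimal sketch of the Modified Neyman formula above, the Python function below computes the continuous allocation $\hat n_h = n W_h s_h / \sum_h W_h s_h$. It ignores the bounds $2 \le n_h \le N_h$ and integer rounding, and the weights and standard deviations in the usage line are made-up values for illustration only.

```python
def modified_neyman(n, weights, sds):
    """Continuous (Modified) Neyman allocation.

    n_h = n * W_h * s_h / sum_k(W_k * s_k). With the true S_h this is the
    Neyman allocation; with estimates s_h it is the Modified Neyman
    allocation. Bounds and integer rounding are deliberately ignored.
    """
    products = [w * s for w, s in zip(weights, sds)]
    total = sum(products)
    if total <= 0:
        raise ValueError("need at least one stratum with W_h * s_h > 0")
    return [n * p / total for p in products]


# Hypothetical three-stratum population (weights and s_h are made up).
alloc = modified_neyman(100, [0.5, 0.3, 0.2], [2.0, 4.0, 6.0])
print([round(x, 2) for x in alloc])  # [29.41, 35.29, 35.29]
```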
In multivariate surveys, p characteristics are measured on each selected unit of the sample. The optimum allocation for one characteristic may then be far from optimal for the other characteristics (Khan et al. (1997)) unless the characteristics are highly correlated. Thus, to obtain an allocation that is optimum for all the characteristics in some sense, we need a compromise criterion that suits all the characteristics reasonably well. An allocation based on such a criterion is termed a compromise allocation in sample surveys.
In the present paper, the idea of pooling the standard deviations is extended to obtain a compromise allocation in a multivariate stratified population when the true values of the stratum standard deviations are unknown, but estimates of the strata standard deviations and additional information about the equality of standard deviations within specified groups of strata are available. The case of multiple pooling is also considered, i.e., the situation in which there is more than one group of strata with equal stratum variances.

Formulation of the Problem: The Univariate Case
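In stratified random sampling, the stratified sample mean

$$\bar y_{st} = \sum_{h=1}^{L} W_h \bar y_h$$

is an unbiased estimator of the overall population mean $\bar Y = \sum_{h=1}^{L} W_h \bar Y_h$, where $W_h = N_h/N$ is the weight of the $h$th stratum,

$$\bar Y_h = \frac{1}{N_h}\sum_{i=1}^{N_h} y_{hi}, \qquad \bar y_h = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi}$$

are the stratum mean and the sample mean for the $h$th stratum, and $y_{hi}$ is the value of the $i$th unit of the $h$th stratum (or of the sample from the $h$th stratum).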
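The problem of obtaining a Neyman allocation may be formulated as the following Nonlinear Programming Problem (NLPP):

$$\text{Minimize } V(\bar y_{st}) = \sum_{h=1}^{L} \frac{W_h^2 S_h^2}{n_h} \quad \text{subject to} \quad \sum_{h=1}^{L} n_h = n. \qquad (2.6)$$

In addition, the restrictions $n_h \ge 2$ are needed to estimate the strata variances, and $n_h \le N_h$ to avoid the problem of oversampling. After incorporating these, the NLPP (2.6) may be restated as

$$\text{Minimize } V(\bar y_{st}) = \sum_{h=1}^{L} \frac{W_h^2 S_h^2}{n_h} \quad \text{subject to} \quad \sum_{h=1}^{L} n_h = n, \quad 2 \le n_h \le N_h, \; h = 1, \dots, L,$$

where $V(\bar y_{st})$ denotes the variance of $\bar y_{st}$ ignoring the fpc.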
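The objective of the NLPP above can be evaluated directly. The sketch below, using hypothetical weights and standard deviations, computes $V(\bar y_{st})$ (fpc ignored) for the unconstrained Neyman solution and for proportional allocation; the Neyman value equals $(\sum_h W_h S_h)^2/n$. A solution obtained this way would still have to be checked against $2 \le n_h \le N_h$.

```python
def variance_no_fpc(weights, sds, alloc):
    """V(ybar_st) ignoring fpc: sum_h W_h^2 S_h^2 / n_h."""
    return sum(w * w * s * s / nh for w, s, nh in zip(weights, sds, alloc))


W = [0.5, 0.3, 0.2]   # hypothetical stratum weights
S = [2.0, 4.0, 6.0]   # hypothetical stratum standard deviations
n = 100

# Unconstrained optimum of the NLPP (Neyman) versus proportional allocation.
t = sum(w * s for w, s in zip(W, S))
neyman = [n * w * s / t for w, s in zip(W, S)]
proportional = [n * w for w in W]

v_ney = variance_no_fpc(W, S, neyman)
v_prop = variance_no_fpc(W, S, proportional)
print(round(v_ney, 4), round(v_prop, 4))  # 0.1156 0.14
```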

The Solution: The Univariate Case
The approach of Park et al. (2007) for the univariate case is summarized here for the sake of continuity and to prepare the formulation of the multivariate case.
In the absence of knowledge of the true values of the strata standard deviations, their estimates are used to work out an optimum allocation. If additional information about the equality of some strata standard deviations is available, Park et al. (2007) showed that, under certain conditions, this information can be used to improve the precision of the estimator $\bar y_{st}$ of the population mean $\bar Y$. First, preliminary samples of sizes $n'_h$ are drawn to work out the estimates $s_h$ of the unknown $S_h$. The sizes of the preliminary samples may be worked out using the criterion given in Sukhatme et al. (1984). The strata with equal $S_h$ are then combined into a single stratum, and the sample sizes are allocated by the Modified Neyman allocation using the pooled standard deviation obtained by pooling the equal strata variances. The sample size allocated to the combined stratum is then reallocated to its constituent strata using proportional allocation.
According to the above scheme, if some of the strata (say k) of a stratified population are known to have equal variances, then without loss of generality it can be assumed that the first k (< L) strata have equal variances, that is, $S_1 = S_2 = \dots = S_k$. When these k strata are combined into a single stratum, its pooled estimated standard deviation, denoted by $s_{pool}$, is

$$s_{pool} = \sqrt{\frac{\sum_{h=1}^{k} (n'_h - 1)\, s_h^2}{\sum_{h=1}^{k} (n'_h - 1)}}, \qquad (3.1)$$

where $n'_h$ is the size of the preliminary sample from the $h$th stratum. The simulation study carried out by the authors (Section 6) showed that the assumption of equal variances is not a rigid condition: even if some of the strata variances are only approximately equal (within ± 10%), the compromise allocation still works well.
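A direct transcription of the pooled standard deviation, assuming (3.1) is the usual degrees-of-freedom-weighted pooled estimator built from preliminary samples of sizes $n'_h$, might look like this. The numbers in the usage line are hypothetical.

```python
import math


def pooled_sd(prelim_sizes, sds):
    """Pooled SD of strata assumed to share a common standard deviation.

    Degrees-of-freedom weighting (assumed form of (3.1)):
    s_pool = sqrt( sum_h (n'_h - 1) s_h^2 / sum_h (n'_h - 1) ).
    """
    num = sum((m - 1) * s * s for m, s in zip(prelim_sizes, sds))
    den = sum(m - 1 for m in prelim_sizes)
    if den <= 0:
        raise ValueError("sum of (n'_h - 1) must be positive")
    return math.sqrt(num / den)


# Hypothetical: the first k = 3 strata are assumed to have equal S_h.
print(round(pooled_sd([10, 12, 8], [3.1, 2.8, 3.3]), 4))
```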
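Park's compromise allocation is then given by

$$n_h = \frac{N_h}{\sum_{h'=1}^{k} N_{h'}}\; n\,\frac{W_{pool}\, s_{pool}}{W_{pool}\, s_{pool} + \sum_{h'=k+1}^{L} W_{h'} s_{h'}}, \qquad h = 1, \dots, k, \qquad (3.2)$$

$$n_h = n\,\frac{W_h s_h}{W_{pool}\, s_{pool} + \sum_{h'=k+1}^{L} W_{h'} s_{h'}}, \qquad h = k+1, \dots, L, \qquad (3.3)$$

where $W_{pool} = \sum_{h=1}^{k} W_h$ and n denotes the total sample size. Park et al. (2007) also showed that if the differences between the unequal strata standard deviations are large, the estimator based on this compromise allocation is more efficient than the one based on proportional allocation.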
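The following sketch strings the three steps together: pool the first k strata, apply the Modified Neyman allocation to the collapsed population, then split the pooled sample size proportionally to $N_h$. It returns continuous sizes without bounds or rounding; the function name and example data are hypothetical.

```python
import math


def park_allocation(n, N, s, prelim, k):
    """Park-type compromise allocation when strata 0..k-1 share a common S_h.

    N      : stratum sizes N_h
    s      : estimated stratum standard deviations s_h
    prelim : preliminary sample sizes n'_h used to estimate s_h
    k      : number of leading strata assumed to have equal S_h
    """
    total_N = sum(N)
    W = [Nh / total_N for Nh in N]
    # Pooled SD of the first k strata (degrees-of-freedom weighting).
    num = sum((m - 1) * sh * sh for m, sh in zip(prelim[:k], s[:k]))
    den = sum(m - 1 for m in prelim[:k])
    s_pool = math.sqrt(num / den)
    W_pool = sum(W[:k])
    denom = W_pool * s_pool + sum(W[h] * s[h] for h in range(k, len(N)))
    # Modified Neyman size of the pooled stratum ...
    m_pool = n * W_pool * s_pool / denom
    # ... reallocated proportionally to N_h inside the pooled stratum.
    N_pool = sum(N[:k])
    alloc = [m_pool * N[h] / N_pool for h in range(k)]
    alloc += [n * W[h] * s[h] / denom for h in range(k, len(N))]
    return alloc


alloc = park_allocation(90, [100, 100, 200, 100], [2.0, 2.0, 4.0, 6.0],
                        [10, 10, 10, 10], k=2)
print([round(x, 6) for x in alloc])  # [10.0, 10.0, 40.0, 30.0]
```

Because the two pooled strata in this toy example have identical $s_h$, the result coincides with the Modified Neyman allocation; the two differ only when the pooled strata have different estimated standard deviations.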
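The suffix 'j' is introduced to represent the $j$th characteristic, $j = 1, 2, \dots, p$ (Ansari et al. (2011)). For a particular characteristic, the strata having equal or nearly equal stratum standard deviations are combined into a single stratum, and the pooled standard deviation is worked out using (3.1). The sample sizes are then allocated according to the Modified Neyman allocation using the pooled standard deviations, and the sample size allocated to the combined stratum is reallocated to its constituent strata according to proportional allocation. This gives Park's compromise allocation for a particular characteristic, as in (3.2) and (3.3).

For the $j$th characteristic, suppose there are $l_j$ groups of strata with equal stratum variances, and let $G_{jk}$ (say) denote the set of $L_{jk}$ strata in the $k$th group, $k = 1, \dots, l_j$. The number of strata having unequal stratum variances is then $L - \sum_{k=1}^{l_j} L_{jk}$, so that, while working out the Modified Neyman allocation, the total number of strata $L_j$ (say) for the $j$th characteristic is given by

$$L_j = L - \sum_{k=1}^{l_j} L_{jk} + l_j, \qquad j = 1, 2, \dots, p.$$

For the $j$th characteristic the number of poolings will thus be $l_j$. These $l_j$ pooled standard deviations for equal or nearly equal stratum variances are given by

$$s_{jk(pool)} = \sqrt{\frac{\sum_{h \in G_{jk}} (n'_{jh} - 1)\, s_{jh}^2}{\sum_{h \in G_{jk}} (n'_{jh} - 1)}}, \qquad k = 1, \dots, l_j;\; j = 1, \dots, p, \qquad (4.2)$$

where $n'_{jh}$ are the preliminary sample sizes. For the $j$th characteristic, the Modified Neyman allocation with pooled stratum variances will be the solution of the NLPP (4.1), re-expressed under the assumptions laid down earlier as

$$\text{Minimize } \sum_{k=1}^{l_j} \frac{W_{jk}^2\, s_{jk(pool)}^2}{m_{jk}} + \sum_{h \notin G_j} \frac{W_h^2\, s_{jh}^2}{n_{jh}} \quad \text{subject to} \quad \sum_{k=1}^{l_j} m_{jk} + \sum_{h \notin G_j} n_{jh} = n,$$

together with the limits on the sample sizes, where $W_{jk} = \sum_{h \in G_{jk}} W_h$, $G_j = \bigcup_{k} G_{jk}$, and $m_{jk}$ denote the combined sample sizes for the $k$th group of strata for the $j$th characteristic. The optimum values of $m_{jk}$ (say $m^*_{jk}$) are reallocated to their constituent strata according to proportional allocation, which gives

$$n^*_{jh} = m^*_{jk}\,\frac{N_h}{\sum_{h' \in G_{jk}} N_{h'}}, \qquad h \in G_{jk}.$$

Let $V_j^*$ denote the optimal value of the objective function of NLPP (4.1), and let $\mathbf n = (n_1, \dots, n_L)$ denote a compromise allocation. Then

$$V_j(\mathbf n) - V_j^* \ge 0, \qquad j = 1, 2, \dots, p. \qquad (4.4)$$

The LHS of (4.4) denotes the increase in the sampling variance of the estimate of $\bar Y_j$ due to using the compromise allocation instead of the individual optimum allocation.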
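For multiple pooling, one characteristic at a time, the same idea extends to several groups. The sketch below is one hedged reading of the scheme: each group is collapsed using its pooled standard deviation, the Modified Neyman allocation is applied over the $L_j$ resulting strata, and each group size $m_{jk}$ is reallocated proportionally to $N_h$. Bounds and rounding are ignored; the function name, the `groups` argument and the data are illustrative assumptions.

```python
import math


def multi_pooled_allocation(n, N, s, prelim, groups):
    """Allocation for one characteristic j with several pooled groups.

    groups : list of lists of 0-based stratum indices; strata in the same
             list are assumed to have equal S_jh. Ungrouped strata keep s_jh.
    Returns (allocation, L_j), L_j being the number of strata after pooling.
    """
    total_N = sum(N)
    W = [Nh / total_N for Nh in N]
    grouped = {h for g in groups for h in g}
    ungrouped = [h for h in range(len(N)) if h not in grouped]

    # One collapsed stratum per group: weight W_jk and pooled SD.
    units = []
    for g in groups:
        num = sum((prelim[h] - 1) * s[h] ** 2 for h in g)
        den = sum(prelim[h] - 1 for h in g)
        units.append((sum(W[h] for h in g), math.sqrt(num / den)))
    T = sum(w * sd for w, sd in units) + sum(W[h] * s[h] for h in ungrouped)

    alloc = [0.0] * len(N)
    for g, (w_g, sd_g) in zip(groups, units):
        m_jk = n * w_g * sd_g / T          # Modified Neyman size of the group
        N_g = sum(N[h] for h in g)
        for h in g:                        # proportional reallocation
            alloc[h] = m_jk * N[h] / N_g
    for h in ungrouped:
        alloc[h] = n * W[h] * s[h] / T
    L_j = len(N) - len(grouped) + len(groups)
    return alloc, L_j


# Hypothetical six-stratum example: strata 2, 3 and 5 (indices 1, 2, 4) pooled.
alloc, L_j = multi_pooled_allocation(
    170, [100] * 6, [2.0, 3.0, 3.0, 5.0, 3.0, 1.0], [10] * 6, [[1, 2, 4]])
print([round(x, 6) for x in alloc], L_j)  # [20.0, 30.0, 30.0, 50.0, 30.0, 10.0] 4
```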

The Fuzzy Programming Approach
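Obviously, the best compromise allocation would be the solution of the following multiobjective NLPP (MNLPP):

$$\text{Minimize } \big(V_1(\mathbf n), V_2(\mathbf n), \dots, V_p(\mathbf n)\big) \quad \text{subject to} \quad \sum_{h=1}^{L} n_h = n, \quad 2 \le n_h \le N_h, \; h = 1, \dots, L, \qquad (5.1)$$

where $V_j(\mathbf n)$ denotes the variance of the estimate of $\bar Y_j$ under the allocation $\mathbf n$. Since a multiobjective programming problem cannot, in general, be solved directly, it has to be converted into a single-objective problem by means of some compromise criterion.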
The Fuzzy programming approach to solving problem (5.1) consists of the following steps.
Step 1: To obtain the solution of the multiobjective NLPP (MNLPP), solve the single-objective problem for one objective at a time, ignoring the other objective functions, and take the optimum solution obtained for each characteristic as its ideal solution.
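Step 2: From the results of Step 1, determine the corresponding value of every objective at each solution obtained (the pay-off matrix). Let $U_j$ and $L_j$ be the upper and lower bounds of the $j$th objective function, where $L_j$ is the individual optimum value $V_j^*$ and $U_j$ is the largest value of $V_j$ over the solutions obtained in Step 1 ($U_j > L_j$).
Step 3: The membership function for the given problem can be defined as

$$\mu_j\big(V_j(\mathbf n)\big) = \begin{cases} 1, & V_j(\mathbf n) \le L_j,\\[2pt] \dfrac{U_j - V_j(\mathbf n)}{U_j - L_j}, & L_j < V_j(\mathbf n) < U_j,\\[2pt] 0, & V_j(\mathbf n) \ge U_j. \end{cases}$$

Therefore the general aggregation function can be defined as

$$\mu_D(\mathbf n) = \Phi\big(\mu_1(V_1(\mathbf n)), \dots, \mu_p(V_p(\mathbf n))\big),$$

and the fuzzy multiobjective formulation of the problem may be defined as

$$\text{Maximize } \mu_D(\mathbf n) \quad \text{subject to the constraints of (5.1).} \qquad (5.2)$$

The problem is to find the optimal values $n^*_{jh}$ for this convex fuzzy decision based on the addition operator (as in Tiwari et al. (1987)). Therefore problem (5.2) is rewritten, according to the max-addition operator, as

$$\text{Maximize } \sum_{j=1}^{p} \frac{U_j - V_j(\mathbf n)}{U_j - L_j} \quad \text{subject to the constraints of (5.1).} \qquad (5.4)$$

Since $U_j$ and $L_j$ are constants, problem (5.4) attains its maximum when $\sum_{j=1}^{p} V_j(\mathbf n)/(U_j - L_j)$ is minimum. Therefore problem (5.4) reduces to the following primal problem:

$$\text{Minimize } \sum_{j=1}^{p} \frac{V_j(\mathbf n)}{U_j - L_j} \quad \text{subject to the constraints of (5.1).} \qquad (5.5)$$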
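The steps above can be sketched numerically under simplifying assumptions: the objectives are $V_j(\mathbf n) = \sum_h W_h^2 S_{jh}^2/n_h$ with known (or already pooled) standard deviations, each ideal solution is the unconstrained Neyman allocation for one characteristic, and the bounds $2 \le n_h \le N_h$ are ignored. Under these assumptions the primal problem (minimize $\sum_j V_j(\mathbf n)/(U_j - L_j)$ subject to $\sum_h n_h = n$) has the Lagrange solution $n_h \propto W_h \sqrt{\sum_j S_{jh}^2/(U_j - L_j)}$, which the code uses in place of a general solver such as LINGO. All names and data are hypothetical.

```python
import math


def variances(W, S, alloc):
    """V_j(n) = sum_h W_h^2 S_jh^2 / n_h for each characteristic j (fpc ignored)."""
    return [sum(w * w * s * s / nh for w, s, nh in zip(W, S_j, alloc)) for S_j in S]


def neyman(n, W, S_j):
    """Unconstrained Neyman allocation for one characteristic (Step 1 ideal)."""
    t = sum(w * s for w, s in zip(W, S_j))
    return [n * w * s / t for w, s in zip(W, S_j)]


def membership(v, u, l):
    """Linear membership of a minimized objective, clipped to [0, 1] (Step 3)."""
    return min(1.0, max(0.0, (u - v) / (u - l)))


def fuzzy_max_addition(n, W, S):
    p = len(S)
    ideal = [neyman(n, W, S_j) for S_j in S]                  # Step 1
    payoff = [variances(W, S, a) for a in ideal]              # Step 2: row i = V(n^(i))
    L = [payoff[j][j] for j in range(p)]                      # individual optima
    U = [max(row[j] for row in payoff) for j in range(p)]     # worst value per column
    if any(u <= l for u, l in zip(U, L)):
        raise ValueError("U_j must exceed L_j (individual optima coincide)")
    # Primal of the max-addition problem, solved by Lagrange multipliers
    # (bounds on n_h ignored): n_h proportional to W_h*sqrt(sum_j S_jh^2/(U_j-L_j)).
    c = [W[h] * math.sqrt(sum(S[j][h] ** 2 / (U[j] - L[j]) for j in range(p)))
         for h in range(len(W))]
    alloc = [n * ch / sum(c) for ch in c]
    mu = [membership(v, u, l) for v, u, l in zip(variances(W, S, alloc), U, L)]
    return alloc, mu


# Hypothetical two-stratum population with two characteristics.
alloc, mu = fuzzy_max_addition(100, [0.5, 0.5], [[1.0, 3.0], [3.0, 1.0]])
print([round(a, 4) for a in alloc], [round(m, 4) for m in mu])  # [50.0, 50.0] [0.8125, 0.8125]
```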

Model (2):
A typical fuzzy programming model using under- and over-deviational variables can be expressed as follows: …, where the two sets of deviational variables are, respectively, the under- and over-deviations from the target set.

Model (3):
By introducing an auxiliary variable λ, the model can be reformulated as follows:

$$\text{Maximize } \lambda \quad \text{subject to} \quad V_j(\mathbf n) + \lambda\,(U_j - L_j) \le U_j, \; j = 1, \dots, p, \quad 0 \le \lambda \le 1,$$

together with the constraints of (5.1).

Model (4):
By introducing an auxiliary variable $\lambda_j$ for each objective, Model (3) can be formulated as follows: …

The common value of λ may be termed a measure of the degree of satisfaction, or degree of compromise ($0 \le \lambda \le 1$). If λ is close to 1 there is a high degree of satisfaction (compromise), and if λ is close to 0 there is a low degree of satisfaction.
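To illustrate Model (3) on a toy problem, the sketch below computes the degree of satisfaction λ = min_j μ_j of an allocation and maximizes it by exhaustive search over integer allocations of a hypothetical two-stratum, two-characteristic population. Exhaustive search is only practical for tiny problems and is used here instead of a nonlinear solver; all data are made up, and $L_j$, $U_j$ come from the pay-off matrix of the individual Neyman optima.

```python
def variances(W, S, alloc):
    """V_j(n) = sum_h W_h^2 S_jh^2 / n_h for each characteristic j (fpc ignored)."""
    return [sum(w * w * s * s / nh for w, s, nh in zip(W, S_j, alloc)) for S_j in S]


def degree_of_satisfaction(V, U, L):
    """lambda = min_j mu_j, with linear memberships clipped to [0, 1]."""
    return min(min(1.0, max(0.0, (u - v) / (u - l))) for v, u, l in zip(V, U, L))


# Hypothetical data: 2 strata, 2 characteristics, total sample size n.
W = [0.5, 0.5]
S = [[1.0, 3.0], [3.0, 1.0]]
n = 100

# Bounds L_j, U_j from the pay-off matrix of the individual Neyman optima.
ideal = [[n * w * s / sum(wk * sk for wk, sk in zip(W, S_j)) for w, s in zip(W, S_j)]
         for S_j in S]
payoff = [variances(W, S, a) for a in ideal]
L = [payoff[j][j] for j in range(len(S))]
U = [max(row[j] for row in payoff) for j in range(len(S))]

# Model (3): maximize lambda (the smallest membership) over integer
# allocations (n1, n - n1) with both sizes >= 2.
lam, n1 = max((degree_of_satisfaction(variances(W, S, [k, n - k]), U, L), k)
              for k in range(2, n - 1))
print(n1, round(lam, 4))  # 50 0.8125
```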
The NLPPs (5.5)-(5.8) may be solved by using a software package for constrained optimization. The software LINGO, developed by LINDO Systems Inc., is user-friendly and does not require much knowledge of computer programming or programming languages. A LINGO User's Guide (2001) is also available for reference.

Numerical Illustrations
Example 1: A simulation study has been carried out to illustrate the computational details for a multivariate population with multiple pooling of stratum variances. Consider a population with five strata ($L = 5$) in which three independent characteristics ($p = 3$) are defined on each unit of the population.
It is also assumed that the population of size N = 500 is divided into five strata with stratum sizes … The data for three independent normal populations, with the strata means $\bar Y_{jh}$ and strata standard deviations $S_{jh}$ given in Tables 6.1 and 6.2 respectively, are generated through the website "http://www.alewand.de/stattabneu/stattab.htm". The sample data are generated through a computer program using the model …, where $y_{jhi}$ denotes the value of the $i$th observation in the $h$th stratum for the $j$th characteristic and $Z_{hi}$ are the values of the randomly selected standard normal variate Z. Table 6.4 gives the estimated strata standard deviations $s_{jh}$. For the sake of comparison, the Averaged Neyman allocation for n = 100 using the true standard deviations $S_{jh}$ is worked out and given in Table 6.5. The true strata standard deviations $S_{jh}$ are now assumed to be unknown, but information about the equality of some of the stratum standard deviations for a particular characteristic is assumed to be available, as stated after Table 6.2.
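The data-generating step can be sketched as follows, assuming a location-scale model $y_{jhi} = \bar Y_{jh} + S_{jh} Z$ in which a fresh standard normal value is drawn for every characteristic, stratum and unit (drawing separately per characteristic keeps the characteristics independent). The model form, the function names and the seed are assumptions for illustration, not the authors' program.

```python
import random
import statistics


def simulate_samples(means, sds, sizes, seed=0):
    """Return data[j][h] = list of n_h values y_jhi = Ybar_jh + S_jh * Z.

    A fresh Z ~ N(0, 1) is drawn for every (j, h, i), so the p
    characteristics are generated independently (assumed model).
    """
    rng = random.Random(seed)
    return [[[m + s * rng.gauss(0.0, 1.0) for _ in range(nh)]
             for m, s, nh in zip(means_j, sds_j, sizes)]
            for means_j, sds_j in zip(means, sds)]


def estimated_sds(data):
    """Sample standard deviations s_jh (divisor n_h - 1) for every j and h."""
    return [[statistics.stdev(values) for values in strata] for strata in data]


# Hypothetical: p = 2 characteristics, L = 3 strata, preliminary sizes n'_h.
means = [[10.0, 20.0, 30.0], [5.0, 8.0, 12.0]]
sds = [[2.0, 3.0, 4.0], [1.0, 1.5, 2.5]]
data = simulate_samples(means, sds, sizes=[15, 20, 25], seed=42)
print([[round(s, 2) for s in row] for row in estimated_sds(data)])
```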
The pooled standard deviations are worked out as follows. … The pay-off matrix of the problems formulated above is then given as: … The upper and lower bounds of each objective function can be expressed as: … On applying the max-addition operator, the MOSSD problem reduces to: … In order to maximize the above problem, we have to minimize … subject to the constraints described below: …

Example 2: It is assumed that the population of size N = 600 is divided into six strata with stratum sizes $N_h$ and stratum weights $W_h$ as: … Using the sample standard deviations, the Averaged Modified Neyman allocation for n = 120 is given in Table 6.11. From the available data, the sampling variances of the estimates of the population means of the three characteristics (fpc ignored) under the Averaged Modified Neyman allocation are obtained as … For the sake of comparison, the Averaged Neyman allocation for n = 120 using the true standard deviations $S_{jh}$ is given in Table 6.12. Assuming that the true strata standard deviations $S_{jh}$ are unknown but that information about the equality of some of the standard deviations for a particular characteristic is available, as stated after Table 6.8, the pooled standard deviations are worked out using (4.2) as … For the characteristic $j = 1$, the Modified Neyman allocation with pooled standard deviations will be the solution of the following NLPP (6.4): … These allocations already satisfy the limits on the sample sizes; thus they solve the NLPP (6.4).
The sample size $m^*_{11}$ allocated to the combined stratum is reallocated proportionally to its constituent strata (the 2nd, 3rd and 5th) as

$$n^*_{1h} = m^*_{11}\,\frac{N_h}{N_2 + N_3 + N_5}, \qquad h = 2, 3, 5.$$

… On applying the max-addition operator, the MOSSD problem reduces to: … In order to maximize the above problem, we have to minimize … subject to the constraints described below: …

Conclusion
To validate the proposed compromise allocation, it is compared with some other existing compromise allocations and with proportional allocation. Tables 7.1 and 7.2, for Examples 1 and 2 respectively, summarize the performance of the proposed allocation and the other allocations compared.

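The proportional allocation is worked out by $n_h = n W_h$, $h = 1, \dots, L$, and its variance (ignoring fpc) is computed directly using the formula

$$V_{prop}(\bar y_{j,st}) = \frac{1}{n}\sum_{h=1}^{L} W_h S_{jh}^2.$$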
The following averaged compromise allocations are selected for comparison. The last columns of Tables 7.1 and 7.2 give the relative efficiency of all the allocations discussed with respect to proportional allocation.
Thus it can be concluded that the proposed approach, using fuzzy programming for the specific model discussed, may be considered a usable compromise criterion for solving allocation problems in multivariate surveys.

… the above situation, we have, for the first characteristic (…), … are given in Table 6.3.

 12 . 11 n
Thus the optimum allocation with pooled strata variances for the first characteristic is

$$n^*_{12} = 16.2054 \approx 16, \qquad n^*_{13} = 17.1875 \approx 17, \qquad n^*_{15} = 21.6071 \approx 22.$$

Thus the optimum values of the sample sizes to different strata for the first characteristic are: … The pay-off matrix of the above problems is given below:

Averaged Allocation with Pooled Standard Deviations. These allocations are averaged over the characteristics and rounded off to the nearest integers.

Kozak (2006b) considered five methods for working out a compromise allocation in multivariate stratified surveys. In the fifth method, Kozak minimized the sum of the relative increases in the variances due to not using the individual optimum allocations. Thus the problem of allocation may be stated as the following NLPP:

$$\text{Minimize } \sum_{j=1}^{p} \frac{V_j(\mathbf n) - V_j^*}{V_j^*} \quad \text{subject to} \quad \sum_{h=1}^{L} n_h = n, \quad n_h \ge 0, \qquad (7.2)$$

with $V_j(\mathbf n) = \sum_{h=1}^{L} W_h^2 S_{jh}^2/n_h$. The solution to the NLPP (7.2), obtained by the Lagrange multiplier technique after ignoring the non-negativity restrictions, is given as

$$n_h = n\,\frac{W_h \sqrt{\sum_{j=1}^{p} S_{jh}^2 / V_j^*}}{\sum_{h'=1}^{L} W_{h'} \sqrt{\sum_{j=1}^{p} S_{jh'}^2 / V_j^*}}, \qquad h = 1, \dots, L. \qquad (7.3)$$

The allocations given by (7.3) are termed "Kozak's allocation" and are placed for comparison in Tables 7.1 and 7.2. The basis of comparison is the trace (the sum of the principal diagonal elements) of the variance-covariance matrix of the estimators $\bar y_{j,st}$ of $\bar Y_j$, $j = 1, \dots, p$. Since all the characteristics are assumed to be independent, the covariances are zero. The relative efficiency (R.E.) of a compromise allocation with respect to the proportional allocation is defined as

$$\text{R.E.} = \frac{\sum_{j=1}^{p} V_{prop}(\bar y_{j,st})}{\sum_{j=1}^{p} V(\bar y_{j,st})}.$$
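Kozak's closed form (7.3) and the trace-based relative efficiency can be sketched together. The code assumes $V_j^*$ is the unconstrained Neyman optimum $(\sum_h W_h S_{jh})^2/n$ for each characteristic, ignores bounds and rounding, and uses made-up data; it illustrates the formulas rather than reproducing Tables 7.1 and 7.2.

```python
import math


def variances(W, S, alloc):
    """V_j(n) = sum_h W_h^2 S_jh^2 / n_h for each characteristic j (fpc ignored)."""
    return [sum(w * w * s * s / nh for w, s, nh in zip(W, S_j, alloc)) for S_j in S]


def kozak_allocation(n, W, S):
    """Closed form of Kozak's fifth criterion (bounds and rounding ignored).

    Minimizes sum_j (V_j(n) - V_j*) / V_j* subject to sum_h n_h = n, where
    V_j* = (sum_h W_h S_jh)^2 / n is the unconstrained Neyman optimum.
    """
    v_star = [sum(w * s for w, s in zip(W, S_j)) ** 2 / n for S_j in S]
    c = [W[h] * math.sqrt(sum(S_j[h] ** 2 / v for S_j, v in zip(S, v_star)))
         for h in range(len(W))]
    return [n * ch / sum(c) for ch in c]


def relative_efficiency(n, W, S, alloc):
    """Trace under proportional allocation divided by trace under `alloc`.

    With independent characteristics the covariances vanish, so each trace
    is simply the sum of the p variances.
    """
    trace_prop = sum(variances(W, S, [n * w for w in W]))
    return trace_prop / sum(variances(W, S, alloc))


# Hypothetical data: 2 strata, 2 characteristics, n = 10.
W = [0.5, 0.5]
S = [[1.0, 3.0], [2.0, 2.0]]
print([round(a, 3) for a in kozak_allocation(10, W, S)])
print(round(relative_efficiency(10, W, S, [4, 6]), 4))  # 1.0537
```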

4. The Problem: The Multivariate Case
1, determine the corresponding values for every objective at each solution obtained.Let  

Table 6.5: Averaged Neyman allocation for n = 100 (using $S_{jh}$)

With the help of the available data, the sampling variances of the estimates of the population means of the three characteristics (fpc ignored) under the Averaged Neyman allocation given in Table 6.5 are obtained as …

Table 6.7: Strata Means ($\bar Y_{jh}$)
… with strata means $\bar Y_{jh}$ and strata standard deviations $S_{jh}$ as given in Table 6.7 and Table 6.8 respectively.

jh n used to estimate jh S are given in Table6.9.

Table 6.9: Preliminary Sample Sizes ($n'_{jh}$)
… where $Z_{hi}$ are the values of the randomly selected standard normal variate Z. The sample values of the stratum standard deviations are summarized in Table 6.10.