Ordinal Logit and Multilevel Ordinal Logit Models: An Application on Wealth Index MICS-Survey Data

Ordinal logistic regression models are used to predict a dependent variable of ordinal type, in both single-level and multilevel settings. The most widely used model for ordinal regression is the proportional odds (PO) model, which assumes that the effect of each predictor is the same across all categories of the response variable. In this article the proportional odds model is used to estimate the wealth index of households in the province of Punjab. The wealth index is an ordered categorical dependent variable with five categories. The data come from MICS (2014), a Multiple Indicator Cluster Survey conducted by the Punjab Bureau of Statistics. The data were recorded at several levels: the individual (household) level, the district level and the division level. The MICS secondary data contain a sample of 41,413 households collected from both rural and urban areas of Punjab. In the present study, analyses were carried out for a single level (household level) and for two levels (households nested in divisions). After fitting the single-level proportional odds model, the proportionality assumption was tested with the Brant test, whose results suggest that all the predictors satisfy the proportional odds assumption. The significance values suggest that all the predictors have a significant effect on the wealth index. The division-level variance estimated by the two-level ordinal logistic regression is 5.842, and the intraclass correlation (ICC) is 0.6397, which shows that 63.97% of the total variation is due to the division level.


Introduction
Discrete choice models are a class of models in which the response variable Y takes a finite number of discrete values such as 0, 1, 2, and so on (Joe, 2008). In some cases a natural ordering exists in the dependent variable. For example, the grade of a high school student may be very good, good, satisfactory, poor, or very poor. As another example, opinion about a soap product may be strongly opposed, opposed, neutral, support, or strongly support; for such a categorical variable we can code 1 for "strongly opposed", 2 for "opposed", 3 for "neutral", 4 for "support" and 5 for "strongly support".
Here the values are not quantitative, but a natural ordering exists between them. The difference between categories 2 and 3 is not necessarily the same as the difference between categories 4 and 5. Commonly, ordinal logistic regression is used to predict an ordinal response variable (Bello et al., 2016; Christensen, 2010

Ordinal Logistic Regression Model
Many models for an ordinal dependent variable have been studied in the literature; the basic objective of ordered logit models is to calculate the cumulative probability of the dependent variable being greater than the jth category (Peterson and Harrell, 1990; Brant, 1990; Agresti, 1996; Liu and Agresti, 2005). McCullagh (1980) called this model the proportional odds model (POM). Its basic assumption is that the effect of each independent variable is the same across all categories of the dependent variable; this is also called the proportional odds assumption or parallel lines assumption. The assumption can be relaxed by using a generalized model that drops it for some or all of the predictors; Peterson and Harrell (1990) called this the partial proportional odds model (PPOM).
For a response variable $Y$ with $C$ categories and a set of predictors $X$ with effect parameters $\beta$, the probability of the response variable being less than or equal to category $j$ can be modeled by the logistic distribution as
$$\Pr(Y \le j \mid X) = \pi_j = \frac{\exp(\alpha_j - X\beta)}{1 + \exp(\alpha_j - X\beta)}, \quad j = 1, \dots, C-1.$$
The above proportional odds model gives the cumulative probability $\pi_j$ of category $j$; for a response variable with $C$ categories we find $C-1$ cumulative probabilities, since for the last category the cumulative probability is always equal to one. The above model can also be written as
$$\pi_j = \frac{1}{1 + \exp[-(\alpha_j - X\beta)]}.$$
The odds of the response variable being less than or equal to category $j$ versus greater than category $j$ are
$$\frac{\pi_j}{1 - \pi_j} = \exp(\alpha_j - X\beta),$$
and the logit model is the natural log of these odds, a linear function of the $k$ independent variables:
$$\ln\left(\frac{\pi_j}{1 - \pi_j}\right) = \alpha_j - (\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k).$$
The proportional odds model assumes that the explanatory variables have the same effect on the response variable across all categories of the response variable; this is called the proportional odds assumption. Under this assumption the $\beta$s remain the same and only the intercepts $\alpha_j$ vary across the categories of the response variable. Because the $\beta$s enter with a negative sign (they are subtracted), they show how a one-unit increase in a predictor increases the log odds of being in a category greater than $j$. The cumulative logit model
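The relationship between cut points, cumulative probabilities and category probabilities can be illustrated with a minimal Python sketch. It assumes the cumulative logit form logit Pr(Y ≤ j) = α_j − η, where η denotes the linear predictor Xβ; the cut points and the slope used here are hypothetical values chosen for illustration and are not estimates from this study.

```python
import math

def cumulative_probs(cut_points, eta):
    """P(Y <= j) for j = 1..C-1 under logit P(Y <= j) = alpha_j - eta."""
    return [1.0 / (1.0 + math.exp(-(a - eta))) for a in cut_points]

def category_probs(cut_points, eta):
    """P(Y = j) for j = 1..C, obtained by differencing the cumulative probabilities."""
    cum = cumulative_probs(cut_points, eta) + [1.0]  # last cumulative probability is 1
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical values: four cut points (C = 5 categories) and one predictor.
alphas = [-2.0, -0.5, 0.5, 2.0]
beta = 0.8

for x in (0.0, 1.0):
    probs = category_probs(alphas, beta * x)
    print(x, [round(p, 3) for p in probs])
```

Because β is subtracted, raising x by one unit lowers every cumulative probability Pr(Y ≤ j) and multiplies the odds of being above category j by exp(β), the same factor at every cut point. That constant factor is what the proportional odds assumption states.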

Multilevel Ordinal Logistic Regression Model
Many types of data, including observational data collected in the human and biological sciences, are nested in clusters. For example, children of the same parents are more alike in their mental and physical characteristics than children selected at random from a large population. Individuals may further be nested within localities and institutions such as schools. A multilevel structure can also exist in longitudinal studies, where an individual's responses may be correlated over time (Chan et al., 2015). Multilevel models account for hierarchically nested data structures by allowing error components at different levels of the hierarchy. For example, a two-level model may be used for child outcomes, where children are nested within schools. The multilevel model estimates the residuals at both levels, students and schools. Thus the total residual variance is divided into two parts, one between schools (level-two units, school residuals) and one within schools (between level-one units, student residuals).
In science, education and related fields, many research applications have a dependent variable that is ordinal and non-normal in distribution. For example, the marks of students in an examination may be categorized as (1: Extraordinary, 2: Good, 3: Satisfactory, 4: Average, 5: Fail); here the dependent variable is measured on an ordinal scale with five ordered categories. Another example is proficiency scores obtained by a mastery testing process, as when we are interested in the factors affecting students' level of proficiency in reading or English (e.g., 1: Below basic; 2: Basic; 3: Proficient; 4: Beyond proficient). Regression models for ordinal categorical data, unordered categorical data, rates, proportions, time-related data and other non-normal outcomes mostly belong to the class of generalized linear models. For continuous data, ordinary least squares regression is a special case of the generalized linear model. Generalized linear models are also used to describe variables that are limited or discrete and non-normal. Multilevel analysis is used to estimate the model and to estimate how much of the variation in the response occurs within clusters and how much between clusters, the higher-level units. The variation between clusters represents the random effect due to inclusion of the cluster level in the model. In multilevel modeling this variation is the most important and interesting part of the analysis.
The two-level random intercept model for the cumulative log odds can be written as
$$\ln\left[\frac{\Pr(Y_{ij} \le c)}{\Pr(Y_{ij} > c)}\right] = \alpha_c + u_{0j},$$
which measures the log odds of $Y_{ij}$ being in a category less than or equal to $c$ as compared to a category greater than $c$; $u_{0j}$ is the random effect of the level-two units and is assumed to follow the normal distribution $N(0, \sigma^2_{u0})$. The above model is the random intercept model with no explanatory variables. When the model also has fixed explanatory variables, it can be written as
$$\ln\left[\frac{\Pr(Y_{ij} \le c)}{\Pr(Y_{ij} > c)}\right] = \alpha_c - X_{ij}\beta + u_{0j},$$
where $X_{ij}$ is the data matrix of fixed predictors; the fixed effects $\beta$ are the same as for the simple proportional odds model.
The sample size required to obtain reliable estimates from multilevel models for ordinal data depends on many factors, for example the complexity of the model, the size of the cluster-level variance, and the method of parameter estimation. Recent simulation studies provide some guidelines on sample size for the multilevel ordered logit model. Austin (2010) used a random intercept logistic regression model, whereas Moineddin et al. (2007) used both random intercept and random slope logit models, in which both the intercept and the slope vary randomly across clusters. For the random intercept model, the estimates are comparatively good with every estimation method even for 10 to 15 clusters with an average cluster size of 10. If the clusters are small, more clusters are required to obtain reliable estimates. For the random slope model, more and larger clusters are required, say 30 clusters with an average size of 30.
The intraclass correlation coefficient (ICC) is used to assess how much of the variation in the response categories lies at level two (the group level). When a logistic model is used, the residuals at level one are assumed to follow the standard logistic distribution with mean 0 and variance $\pi^2/3 \approx 3.29$. The ICC for dichotomous and ordinal outcomes (Snijders and Bosker 2008) is defined as
$$\mathrm{ICC} = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + 3.29},$$
where $\sigma^2_{u0}$ is the variance of the level-two error term and 3.29 is the variance of the standard logistic distribution.
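As a quick check of the ICC definition, the short Python sketch below plugs in the division-level variance reported in this study (5.842) and uses π²/3 for the level-one variance. The function name `icc_logistic` is introduced here for illustration only.

```python
import math

def icc_logistic(sigma2_u0):
    """ICC for a logistic multilevel model: level-two variance over total variance."""
    level1_var = math.pi ** 2 / 3  # variance of the standard logistic distribution, about 3.29
    return sigma2_u0 / (sigma2_u0 + level1_var)

icc = icc_logistic(5.842)  # division-level variance reported in the abstract
print(round(icc, 4))       # 0.6397
```

The result agrees with the reported ICC of 0.6397, that is, about 64% of the variation on the latent logistic scale lies between divisions.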

Data and Methodology
In the present study, for the analysis and measurement of the wealth index quintile, we used the Multiple Indicator Cluster Survey data (MICS-2014). The target population of the survey is all areas of the province of Punjab, rural and urban, as defined by the population census of 1998 and the changes made by the Government thereafter. The MICS survey, conducted by the Bureau of Statistics in 2014, used multistage stratified sampling to select a sample of households; a complete list of enumeration areas was taken from the Punjab Bureau of Statistics. The sampling frame was first divided into the 9 divisions and 36 administrative districts of Punjab, and then into rural and urban areas. The first stage of sampling consisted of the selection of enumeration areas, and the second stage of the selection of a sample of households.
The value of all physical, natural and financial assets owned by a household is called its wealth. The wealth index is a cumulative measure of a household's living standard. It is a composite index built from all the asset ownership variables and serves as a proxy for household wealth. The wealth index is calculated from data on the ownership of household assets, such as the material used in the construction of the house, type of access to drinking water, sanitation facilities, bicycles, televisions, cars, etc. To construct the wealth index of a household we need all the indicators that are used as assets. The wealth index varies from household to household, locality to locality and country to country. Since the 1990s, wealth indices have become a major tool for measuring the economic status of a household. They are considered effective indicators of the socio-economic position, standard of living and health standard of a household (Córdova, 2009).
The wealth characteristics of a household have a great effect on health. The wealth index can be used to identify problems, especially for poor households, as the poor have unequal access to health care facilities compared to the wealthy. The DHS program, which is partially funded by the World Bank, developed the wealth index, which is used by governments to identify whether education, health services and other essentials are reaching the poorest households. The wealth index is used to check how the economic status of a household affects health, education, etc. (Garenne and Hohmann, 2003; Hong et al., 2006).
Estimation of the wealth index is based on data collected by the household questionnaire, which includes questions on the number of televisions owned, dwelling status, characteristics of the building of the house, source of drinking water, number of cars, toilet facility and other characteristics related to wealth status (Howe et al., 2008; Van Campenhout, 2007). Weights, or factor scores, are assigned to each household asset; these are obtained by principal components analysis. The asset scores are standardized in relation to a standard normal distribution with zero mean and unit standard deviation. Break points are then applied to these standardized scores to obtain the wealth index quintiles "Lowest", "Second", "Third", "Fourth" and "Highest". A standardized score is assigned to each household asset, and the score differs depending on whether or not the household owns that asset. The scores of all assets are summed and households are ranked on the basis of the total score. A single asset index is derived from the data for the entire country sample and used for all households. The index is the same for rural and urban households; no separate index is defined for rural and urban areas (Rasbash et al., 2002). The present research is planned to measure the wealth index quintile for households in the province of Punjab by the ordinal logistic regression model and by the multilevel ordinal logistic regression model.
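The construction described above (principal component weights, standardized scores, quintile break points) can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the MICS procedure itself: the asset matrix is synthetic, the function name `wealth_quintiles` is hypothetical, the first principal component of the correlation matrix supplies the weights, and every asset column is assumed to vary across households.

```python
import numpy as np

def wealth_quintiles(assets):
    """Return (scores, quintile) from a households-by-assets indicator matrix."""
    X = np.asarray(assets, dtype=float)
    # Standardize each asset indicator to zero mean and unit standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # First principal component: leading eigenvector of the correlation matrix.
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    weights = eigvecs[:, -1]          # eigh returns eigenvalues in ascending order
    if weights.sum() < 0:             # eigenvector sign is arbitrary; make ownership raise the score
        weights = -weights
    scores = Z @ weights              # weighted sum of standardized asset indicators
    scores = (scores - scores.mean()) / scores.std()
    # Break points at the 20th, 40th, 60th and 80th percentiles of the scores.
    cuts = np.quantile(scores, [0.2, 0.4, 0.6, 0.8])
    quintile = np.digitize(scores, cuts)  # 0 = Lowest, ..., 4 = Highest
    return scores, quintile

# Synthetic data (hypothetical): 1000 households, 6 binary asset indicators
# driven by one latent wealth variable plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=1000)
thresholds = np.linspace(-1.0, 1.0, 6)
assets = (latent[:, None] + rng.normal(size=(1000, 6)) > thresholds).astype(int)

scores, quintile = wealth_quintiles(assets)
labels = np.array(["Lowest", "Second", "Third", "Fourth", "Highest"])
print(labels[quintile][:5])
```

With only a few binary assets many households share the same score, so the five groups need not be of exactly equal size; a real asset list is much longer and gives finer separation.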

Results and Discussion
Descriptive statistical analysis is an important part of the study and a good way to check the nature of the data. The following table shows the descriptive statistics for the household data.

Ordinal Logistic Regression Models
The present study estimates the wealth index quintile, an ordered categorical response variable $Y_{ij}$ with five ordered categories (Lowest, Second, Third, Fourth, Highest).
The ordered logit model estimates the cumulative probability $\pi_j$, or equivalently the cumulative log odds, as a linear function of the predictors. For education of household head, the first category, none/preschool, is taken as reference. In $\beta_{3l} x_{3l}$, $l$ varies over 1 to 7, the categories of $x_3$ (main source of drinking water), and the first category, piped into dwelling, is taken as reference. In $\beta_{4l} x_{4l}$, $l$ varies over 1 to 2, the categories of $x_4$ (dwelling status of household), and the first category, own, is taken as reference. In $\beta_{5l} x_{5l}$, $l$ varies over 1 and 2, the categories of $x_5$ (household owns any animal), and the first category, yes, is taken as reference. In $\beta_{6l} x_{6l}$, $l$ varies over 1 and 2, the categories of $x_6$ (electricity facility), and the first category, yes, is taken as reference. In $\beta_{7l} x_{7l}$, $l$ varies over 1 to 6, the categories of $x_7$ (type of fuel used for cooking), and the first category, electricity, is taken as reference. In $\beta_8 x_8$, $\beta_8$ is the slope coefficient for $x_8$ (number of household members). In $\beta_9 x_9$, $\beta_9$ is the slope coefficient for $x_9$ (total children aged 1-17 years).
The statistical results presented in Table 3.2 show that the p-value for every predictor category is less than 0.01, with the single exception of the category motorized pump; apart from that category, all the predictors and their categories are statistically significant. Exp(B) is the odds ratio for each predictor variable, and the interpretations are made on the basis of the above results. The four intercepts differentiate the categories of the wealth index quintile, one for each comparison; they are also called the cut points. The cut point -2.159 is used to compare category lowest with categories second, third, fourth and highest; 0.029 is used to compare categories lowest and second with categories third, fourth and highest; 0.182 is used to compare categories lowest, second and third with categories fourth and highest; and 4.562 is used to compare categories lowest, second, third and fourth with category highest of the wealth index quintile. The household living in the urban area OR (exp(b)) =

Multilevel Ordinal Logistic Regression Models
In the previous section we fitted the ordinal logit model at the household level (single level) and explained the parameter estimates for that level. Because households are nested in a higher level of the hierarchy, namely divisions (level 2), we need to fit a two-level ordinal logit model to estimate the response variable, the wealth index quintile. In the two-level random intercept model we allow the cut points to vary at different levels of the hierarchy. We treat this random variation as a part of the total random variation, i.e. we partition the total random variation into parts associated with the different levels of the hierarchy. In the present problem, the estimation of the wealth index quintile, we use a two-level random intercept ordinal logit model to find the variation due to the division level.
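To make the idea of a division-level random intercept concrete, the following Python sketch simulates data from a two-level random intercept ordinal logit model of the form logit Pr(Y_ij ≤ c) = α_c + u_0j, with u_0j drawn from a normal distribution with mean 0 and variance σ²_u0. It is a simulation under stated assumptions and does not fit a model: the cut points and the cluster size are hypothetical, no fixed predictors are included, and only the number of divisions (9) and the variance (5.842) are taken from this study.

```python
import numpy as np

def simulate_two_level(cut_points, sigma2_u0, n_clusters, cluster_size, rng):
    """Simulate categories 1..C from logit P(Y_ij <= c) = alpha_c + u_0j."""
    alphas = np.asarray(cut_points, dtype=float)
    # Level-two random intercepts, one per cluster (division).
    u0 = rng.normal(0.0, np.sqrt(sigma2_u0), size=n_clusters)
    # Level-one residuals follow the standard logistic distribution.
    eps = rng.logistic(0.0, 1.0, size=(n_clusters, cluster_size))
    # Latent variable chosen so that P(latent <= alpha_c) = logistic(alpha_c + u_0j).
    latent = eps - u0[:, None]
    # Category = 1 + number of cut points lying below the latent value.
    y = 1 + (latent[:, :, None] > alphas).sum(axis=2)
    return y, u0

rng = np.random.default_rng(1)
alphas = [-2.0, -0.5, 0.5, 2.0]  # hypothetical cut points for five categories
y, u0 = simulate_two_level(alphas, 5.842, n_clusters=9, cluster_size=500, rng=rng)

share_lowest = (y == 1).mean(axis=1)  # per-division share of the lowest category
for u, s in zip(u0, share_lowest):
    print(round(float(u), 2), round(float(s), 3))
```

With a level-two variance this large, the share of households in the lowest category differs strongly from one simulated division to the next, which is the kind of between-division variation the random intercept is meant to capture.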
For a two-level random intercept ordinal logit model estimating the wealth index quintile with fixed predictors, the cumulative log odds model can be written as
$$\ln\left(\frac{\pi_{cij}}{1 - \pi_{cij}}\right) = \alpha_c - X_{ij}\beta + u_{0j},$$
where $\pi_{cij}$ represents the cumulative probability of the $i$th household in the $j$th division being in a category less than or equal to $c$, with $c = 1, 2, 3, 4$. $u_{0j}$ is the random variation of the $j$th division and is assumed to follow $N(0, \sigma^2_{u0})$. The fitted model gives one log odds equation for each cut point, from the log odds of category lowest against all categories above to the log odds of categories lowest to fourth against category highest; for example, the third equation gives the log odds of categories lowest to third against categories fourth and highest.
For multilevel models, first-order MQL estimation can produce biased estimates; therefore we used the PQL method to estimate the parameters of the multilevel models. The results of second-order PQL estimation in MLwiN are shown in the table above. The value of the intraclass correlation (ICC), 0.6397, shows that 63.97% of the variation in the dependent variable "wealth index" is at the division level (level two). Using the random intercept model with fixed predictors, the variation at the division level is 5.842. Intercept (cut point) to find the odds of lowest wealth index to all above categories is -4.119, cut point for odds of category lowest, second to third fourth and highest wealth index is -