Bayesian Analysis of Linear and Nonlinear Latent Variable Models with Fixed Covariate and Ordered Categorical Data

In this paper, ordered categorical variables are used to compare a linear Bayesian structural equation model with a covariate against a nonlinear model with interactions between covariates and latent variables. The Gibbs sampling method is applied for estimation and model comparison. A hidden continuous normal distribution (censored normal distribution) is used to handle the problem of ordered categorical data. Statistical inferences, which involve estimation of the parameters and their standard deviations, as well as residual analyses for testing the selected model, are discussed. The proposed procedure is illustrated by a real data set. Analyses are carried out using the OpenBUGS program.


Introduction
Structural equation modeling (SEM) is a statistical approach to testing hypotheses about the relationships among observed and latent variables. Observed variables are also called indicator variables or manifest variables; latent variables are also denoted as unobserved variables or factors. Examples of latent variables are math ability and intelligence in education, and depression and self-confidence in psychology. Latent variables cannot be measured directly, so researchers must define each latent variable in terms of observed variables (Khine, 2013).
At present, most statistical theory and computer software in the field of SEMs are based on models that involve only linear relationships among the manifest and latent variables. More statistically sound methods for linear and nonlinear SEMs and factor analysis have been proposed by Lee and Song (2003), Lee and Song (2005), Lee (2006), Lee and Tang (2006), Cai et al. (2008), Lee et al. (2009), and Lee et al. (2010). We use the Bayesian approach to develop methods for statistical inference; in particular, an MCMC method, the Gibbs sampler (Geman and Geman, 1984), is used in this paper. Theoretically, in light of the extension of simple linear regression to multiple and nonlinear regression, the importance of generalizing linear structural equation models to nonlinear models that include nonlinear terms of the latent variables is obvious. Practically, nonlinear relationships such as quadratic and interaction terms among the variables are important in establishing the substantive theory in many areas. The rapid growth of SEMs is due to the demand for subtle models and related statistical methods for solving complex research problems in various fields. The main objective of this paper is to propose a Bayesian approach for analysing linear and nonlinear SEMs with ordered categorical variables. The Deviance Information Criterion (DIC; see Spiegelhalter et al., 2002) is used for model comparison.
The main idea is to handle the ordered categorical variables in the Bayesian analysis by treating the underlying latent continuous measurements as hypothetical missing data and augmenting them with the observed data in the posterior analysis.
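As a concrete illustration of this data-augmentation step, the sketch below (in Python rather than OpenBUGS, with illustrative function names and threshold values that are not part of the fitted model) draws a hidden continuous measurement from a normal distribution truncated to the interval implied by the observed category; simple rejection sampling is used for clarity.

```python
import random

def draw_latent(category, thresholds, mean, sd, rng=random):
    """Rejection-sample the hidden continuous measurement y_k given its
    observed ordered category: a N(mean, sd^2) draw truncated to the
    interval [alpha_category, alpha_{category+1})."""
    lo = float("-inf") if category == 0 else thresholds[category - 1]
    hi = float("inf") if category == len(thresholds) else thresholds[category]
    while True:  # accept the first draw that lands in the interval
        y = rng.gauss(mean, sd)
        if lo <= y < hi:
            return y

# Hypothetical 5-category item with fixed interior thresholds:
rng = random.Random(1)
alphas = [-1.5, -0.5, 0.5, 1.5]
y = draw_latent(2, alphas, 0.0, 1.0, rng)
assert -0.5 <= y < 0.5
```

Within the Gibbs sampler, a draw of this kind is made for each ordered categorical entry at every iteration, after which the model can be updated as if the continuous data were observed.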
The paper is organized as follows. The model is described in Section 2. Bayesian estimation of linear and nonlinear structural equation models is described in Section 3. Model comparison using the DIC is described in Section 4. A case study is presented in Section 5. Empirical results obtained from the case study are discussed in Section 6. Some concluding remarks are given in Section 7.

Model Description
Consider the following measurement equation for a p x 1 manifest random vector y_i:

y_i = μ + Λ ω_i + ε_i, i = 1, ..., n,

where μ (p x 1) is the vector of intercepts, Λ is a p x q matrix of factor loadings, ω_i is a q x 1 vector of latent variables, and ε_i is a p x 1 vector of measurement errors. The underlying latent continuous measurements in y_i are unobservable; the information associated with them is given by an observable ordered categorical vector z = (z_1, ..., z_p)'. That is, any latent variable may have continuous and/or ordered categorical manifest variables as its indicators. The relationship between y and z is defined by a set of thresholds as follows:

z_k = a if α_{k,a} ≤ y_k < α_{k,a+1}, a = 0, 1, ..., b_k,

where -∞ = α_{k,0} < α_{k,1} < ... < α_{k,b_k} < α_{k,b_k+1} = ∞ are the thresholds associated with the kth variable, for k = 1, ..., p (Lee, 2007).
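The threshold mapping from a latent continuous measurement to its observed ordered category can be sketched in a few lines of Python (the threshold values here are hypothetical, used only to illustrate the rule):

```python
import bisect

def categorize(y_k, thresholds):
    """Map a latent continuous measurement y_k to its observed ordered
    category z_k in {0, 1, ..., b_k}: z_k = a iff
    alpha_{k,a} <= y_k < alpha_{k,a+1}.

    thresholds holds the finite interior cut points
    (alpha_{k,1}, ..., alpha_{k,b_k}); the two end thresholds are
    implicitly -infinity and +infinity.
    """
    return bisect.bisect_right(thresholds, y_k)

# Hypothetical thresholds for a 5-category item:
alphas = [-1.5, -0.5, 0.5, 1.5]
assert categorize(-2.0, alphas) == 0
assert categorize(0.0, alphas) == 2
assert categorize(9.9, alphas) == 4
```

Note that only the category label is observed; the value of y_k itself is treated as missing data in the posterior analysis.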
It has been pointed out by Lee et al. (1990) that single-sample models with ordered categorical variables are not identified without imposing identification conditions. This is also the case for multi-sample models. To solve this problem, we use the common method (see, for example, Lee et al., 1995; Shi and Lee, 1998) of fixing some thresholds at preassigned values. For convenience, we assume that the positions of the fixed elements are the same for each group.

Bayesian Estimation of Structural Equation Models
The objective of this section is to describe a Bayesian approach for analyzing the preceding nonlinear structural equation models in the context of ordered categorical data. Nice features of a Bayesian approach include the following: (a) prior knowledge can be directly incorporated in the analysis, so more accurate parameter estimates can be obtained in situations with good prior information; (b) as noted in many articles on Bayesian analysis of structural equation models (Lee, 2006), the likelihood p(Z | θ) depends on the sample size, whereas the prior p(θ) does not. For large samples, the prior of θ plays a less important role, and the posterior density p(θ | Z) is close to the likelihood. Thus, the Bayesian and ML approaches are asymptotically equivalent, and the Bayesian estimates have the same optimal asymptotic properties as the ML estimates. However, p(θ) plays a significant role in the Bayesian approach when the sample size is small or when the information given by the ordered categorical data Z is limited.
In this article, we define the Bayesian estimate of θ as the mean of the posterior distribution (the posterior mean). For simple structural equation models, the posterior mean can be obtained through direct integration. However, due to the complexity of the proposed nonlinear structural equation model with covariates and ordered categorical variables, the related integral does not have a closed form, and we apply MCMC methods in statistical computing to solve this problem. Let {θ^(t): t = 1, ..., T} be observations simulated from the posterior distribution; the Bayesian estimate and the posterior covariance matrix are then approximated by

θ̂ = T^(-1) Σ_{t=1}^{T} θ^(t), Var(θ | Z) ≈ (T - 1)^(-1) Σ_{t=1}^{T} (θ^(t) - θ̂)(θ^(t) - θ̂)'.
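For a single scalar parameter, the sample-mean and sample-variance computations reduce to the following minimal Python sketch (the draws shown are toy numbers for illustration, not output from the actual model):

```python
import statistics

def posterior_summary(draws):
    """Bayesian estimate (posterior mean) and standard-error estimate
    (posterior standard deviation) of a scalar parameter computed from
    T simulated posterior draws."""
    return statistics.fmean(draws), statistics.stdev(draws)

# Toy draws of a single parameter (illustrative numbers only):
draws = [0.9, 1.1, 1.0, 1.2, 0.8]
mean, sd = posterior_summary(draws)
assert abs(mean - 1.0) < 1e-12
```

The same mean and variance computations, applied componentwise with cross-products, give the full posterior covariance matrix.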
It is necessary to specify the prior distributions for the components of θ when deriving the conditional distribution of θ given (Y, Ω, Z). In general Bayesian analyses, conjugate prior distributions have been found to be flexible and convenient (Broemeling, 1985).
This kind of prior distribution has been widely applied in many Bayesian analyses of structural equation models (Lee and Song, 2004; Song and Lee, 2007). Hence, the well-known conjugate prior distributions are used, in which ψ_εk is the kth diagonal element of Ψ_ε, and the hyperparameters (such as R_0) are assumed to be given by prior information. In general, prior information can be obtained from casual observation, theoretical considerations of experts, or analyses of past data. As pointed out by Kass and Raftery (1995), priors are often picked for convenience when accurate prior knowledge is lacking, because the effect of the priors on Bayesian estimation is small when the sample size is fairly large. For completeness, the conditional posterior distributions of Y, Ω, and the components of θ are derived from the conjugate prior distributions. These results are useful for writing the computer program to implement the Gibbs sampler.
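As one example of such a conjugate update, the full conditional of a residual variance under an inverse-gamma prior is again inverse-gamma. A minimal Python sketch (hyperparameter values and residuals below are illustrative assumptions, not quantities from the case study):

```python
import random

def draw_residual_variance(residuals, alpha0, beta0, rng):
    """One Gibbs step for a residual variance psi under a conjugate
    inverse-gamma prior IG(alpha0, beta0): the full conditional is
    IG(alpha0 + n/2, beta0 + sum(e_i^2)/2), sampled here by inverting
    a gamma draw. alpha0 and beta0 are hyperparameters supplied by
    prior information."""
    n = len(residuals)
    shape = alpha0 + n / 2.0
    rate = beta0 + sum(e * e for e in residuals) / 2.0
    # If G ~ Gamma(shape, scale = 1/rate), then 1/G ~ IG(shape, rate).
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)

rng = random.Random(0)
psi = draw_residual_variance([0.5, -0.3, 0.8, -0.1], 9.0, 4.0, rng)
assert psi > 0.0
```

Conjugacy is what makes each full conditional a standard distribution that is easy to simulate inside the Gibbs cycle.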
In the Bayesian approach, we need to evaluate the posterior distribution [θ, Ω | Z]. This distribution is rather complicated. To capture its characteristics, we draw a sufficiently large number of observations from it such that the empirical distribution of the generated observations is a close approximation to the true distribution. A good candidate for simulating observations from the posterior distribution is the Gibbs sampler (Geman and Geman, 1984), which iteratively simulates θ and Ω from the full conditional distributions. However, owing to the presence of the ordered categorical variables, these conditional distributions are rather complicated to derive, and simulating observations from them is difficult. This motivates the further augmentation of the latent matrix Y in the posterior analysis and the consideration of the joint posterior distribution [θ, Ω, Y, α | Z].

To implement the Gibbs sampler for generating observations from this posterior distribution, we start with initial values (θ^(0), Ω^(0), Y^(0), α^(0)), simulate (θ^(1), Ω^(1), Y^(1), α^(1)), and so on according to the following procedure. At the mth iteration, with current values (θ^(m), Ω^(m), Y^(m), α^(m)), generate (θ^(m+1), Ω^(m+1), Y^(m+1), α^(m+1)) from the corresponding full conditional distributions. The cycle defined above generates (θ^(m), Ω^(m), Y^(m), α^(m)) after the mth iteration; as m approaches infinity, the joint distribution of these quantities converges to the joint posterior distribution [θ, Ω, Y, α | Z]. Convergence of the Gibbs sampler can be monitored by plots of several simulated sequences of the individual parameters with different starting values. The sequences of quantities simulated from the joint posterior distribution are used to calculate the Bayesian estimates and other related statistics.
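The mechanics of such a cycle can be illustrated with a toy two-block Gibbs sampler for a standard bivariate normal with correlation ρ, where each coordinate is repeatedly drawn from its full conditional given the other. This is a sketch of the iterative mechanism only, not the model of this paper, which cycles over θ, Ω, Y and the thresholds:

```python
import random
import statistics

def gibbs_bivariate(rho, n_iter, burn_in, rng):
    """Toy Gibbs cycle for a standard bivariate normal with
    correlation rho: x1 | x2 ~ N(rho*x2, 1-rho^2) and
    x2 | x1 ~ N(rho*x1, 1-rho^2) are drawn in turn."""
    x1, x2 = 0.0, 0.0                   # initial starting values
    cond_sd = (1.0 - rho * rho) ** 0.5  # conditional std. deviation
    kept = []
    for m in range(n_iter):
        x1 = rng.gauss(rho * x2, cond_sd)  # draw x1 | x2
        x2 = rng.gauss(rho * x1, cond_sd)  # draw x2 | x1
        if m >= burn_in:                   # discard burn-in iterations
            kept.append((x1, x2))
    return kept

rng = random.Random(0)
draws = gibbs_bivariate(0.8, 6000, 1000, rng)
mean_x1 = statistics.fmean(x for x, _ in draws)
assert abs(mean_x1) < 0.15  # marginal mean of x1 is 0
```

The kept draws after burn-in play the role of the posterior sample from which the Bayesian estimates and standard error estimates are computed.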

Model Comparisons
A model comparison statistic that takes into account the number of unknown parameters in the model is the DIC (see Spiegelhalter et al., 2002). This statistic is intended as a generalization of the Akaike Information Criterion (AIC; Akaike, 1973). Under a competing model M_k with a vector of unknown parameters θ_k, let {θ_k^(t): t = 1, ..., T} be a sample of observations simulated from the posterior distribution.
The DIC for M_k is computed as

DIC_k = D̄_k + p_{D_k},

where D̄_k = T^(-1) Σ_{t=1}^{T} D(θ_k^(t)) is the posterior mean of the deviance and p_{D_k} = D̄_k - D(θ̄_k) is the effective number of parameters. In model comparison, the model with the smaller DIC value is selected. As mentioned in Spiegelhalter et al. (2003), in practical applications of DIC it is important to note the following: (a) if the difference in DIC is small, for example less than 5, and the models make very different inferences, then just reporting the model with the lowest DIC could be misleading; (b) DIC can be applied to non-nested models; (c) similar to the Bayes factor (Kass and Raftery, 1995), BIC, and AIC, DIC gives a clear conclusion supporting either the null hypothesis or the alternative hypothesis.
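Given the deviance evaluated at each kept MCMC draw and at the posterior mean of the parameters, the DIC computation reduces to a few lines (the deviance values below are toy numbers for illustration):

```python
import statistics

def dic(deviances, deviance_at_posterior_mean):
    """DIC = Dbar + pD, where Dbar is the posterior mean of the
    deviance over the kept MCMC draws and
    pD = Dbar - D(theta_bar) is the effective number of parameters
    (Spiegelhalter et al., 2002)."""
    dbar = statistics.fmean(deviances)
    p_d = dbar - deviance_at_posterior_mean
    return dbar + p_d

# Toy deviance values recorded at each kept draw (illustrative only):
devs = [102.0, 98.0, 101.0, 99.0]
assert dic(devs, 96.0) == 104.0  # Dbar = 100, pD = 4
```

In OpenBUGS this quantity is produced by the built-in DIC monitor, so no hand computation is needed in practice.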
To illustrate the use of DIC for model comparison, we analyzed the same data by a linear and a nonlinear structural equation model with the same measurement model. The linear structural equation contains the covariate term in x(i,1) together with linear terms in the latent variables, whereas the nonlinear structural equation adds interaction terms between x(i,1) and the latent variables. The DIC values corresponding to the linear and nonlinear structural equation models were produced by OpenBUGS, and the structural equation model with the smaller DIC value was selected.
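The form of such a structural equation with a covariate and a covariate-by-latent interaction can be sketched as follows; the coefficient names g1..g4 and all values below are hypothetical illustrations, not estimates from the case study:

```python
def eta_nonlinear(x, xi1, xi2, gamma, delta=0.0):
    """Illustrative structural equation with a covariate x and an
    interaction between the covariate and a latent variable:
    eta = g1*x + g2*xi1 + g3*xi2 + g4*x*xi1 + delta,
    where delta is the structural residual."""
    g1, g2, g3, g4 = gamma
    return g1 * x + g2 * xi1 + g3 * xi2 + g4 * x * xi1 + delta

# The linear model is the special case g4 = 0:
eta_lin = eta_nonlinear(1.0, 2.0, 3.0, (0.5, 1.0, 1.0, 0.0))
eta_non = eta_nonlinear(1.0, 2.0, 3.0, (0.5, 1.0, 1.0, 0.1))
assert abs(eta_lin - 5.5) < 1e-9
assert abs(eta_non - 5.7) < 1e-9
```

Nesting the linear model inside the nonlinear one in this way is what makes the DIC comparison between the two models meaningful.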

Case Study and Example
The quality of life (QOL) data set was established by Power et al. (1999) to evaluate three latent variables (ω_1, ω_2, ω_3); some of its items are selected in this paper. The first three items (Q3 to Q5) are intended to address physical health, the next three items (Q6 to Q8) address psychological health, the three items that follow (Q9, Q10, Q11) cover social relationships, and the last item (Q12) addresses environment. The instrument also includes two ordered categorical items for the overall QOL (Q1) and general health (Q2), giving a total of 12 items.
The relationships among the latent variables are specified in the structural equation. The prior is informative and can have a significant effect on the parameter estimates in the small sample size case.
The model was analyzed with OpenBUGS. In checking convergence, we observed that most parameters (such as the intercepts, factor loadings, structural coefficients, and variance parameters) converged quickly; see Figures 3 and 4. In Bayesian analyses of structural equation models with ordered categorical data, the MCMC procedure required more iterations to converge. Bayesian estimates were obtained from T = 10000 iterations after discarding 5000 burn-in iterations. The Bayesian estimates of the nonlinear SEM with covariate and the 95% HPD intervals are presented in Table 1. The Bayesian estimates of the linear SEM with covariate and the 95% HPD intervals are presented in Table 2. Estimates of the latent variables were also obtained from OpenBUGS. The performance of the deviance information criterion when comparing the linear and nonlinear models is presented in Table 3.

Results and Discussion
The objective of this section is to present the results of the case study, which reveal the empirical performance of the Bayesian estimates and of the DIC for model comparison. For the linear SEM, the proposed structural equation contains the covariate term in x(i,1) and linear terms in the latent variables; the nonlinear SEM adds interaction terms between x(i,1) and the latent variables. In this paper, a Bayesian approach is introduced for analysing linear and nonlinear SEMs with ordered categorical variables. The Bayesian estimates of the unknown parameters and the Bayesian model selection statistic DIC are obtained using recently developed, powerful tools in statistical computing. All the computational work can be accomplished via the freely available software OpenBUGS; therefore, our proposed method can be conveniently applied to real data. The purpose of this analysis is to compare linear and nonlinear Bayesian SEMs with ordered categorical data.
There are some limitations of the current analysis. First, due to the design of questionnaires and the nature of the problems in the behavioral, educational, medical and social sciences, data often come as ordered categorical variables with observations in discrete form. The basic SEM assumption that the data come from a continuous normal distribution is then clearly violated, and rigorous analysis that takes the ordered categorical nature into account is necessary.
Hence, routinely treating ordered categorical variables as normal may lead to erroneous conclusions (see Lee et al., 1990; Olsson, 1979).
A better approach for assessing this kind of discrete data is to treat the observations as coming from a hidden continuous normal distribution with a threshold specification. Second, the current analysis was conducted under the normality assumption for the observed variables in the model; this assumption is likely to be violated. Developing a linear and nonlinear Bayesian approach to relax the normality assumption in nonlinear SEMs may represent a future research topic. From the results corresponding to the linear and nonlinear SEMs with covariates under Type I and II prior inputs, we observed that the SD values in the linear SEM are smaller than those in the nonlinear SEM with covariates. However, it is expected that the empirical performance would be worse with the nonlinear SEM with covariates.
The HPD intervals of all the parameters were computed. We observed that the performance of the HPD intervals in the linear SEM with covariate is satisfactory for ordered categorical variables. To reveal the performance of DIC for model comparison, we reanalysed the data sets via a nonlinear SEM with a covariate in the structural equation (Model 5). The DIC values obtained were compared with those obtained under the correct model. Results are presented in Table 3.
The DIC value for the linear SEM is smaller than that for the nonlinear SEM with ordered categorical variables. As a result, we observed that the performance of DIC is not satisfactory, and would be worse, under a nonlinear effect with ordered categorical variables; however, it performs very well with a linear effect, a covariate, and ordered categorical variables.
Convergence of the Gibbs sampler is monitored by plots of several simulated sequences of the individual parameters with different starting values, presented in Figures 3 and 4, respectively. Bayesian estimates were obtained from T = 10000 iterations after discarding 5000 burn-in iterations in the linear and nonlinear SEMs with covariates.
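Besides visual inspection of the trace plots, running several chains from different starting values also permits a numeric convergence check such as the Gelman-Rubin potential scale reduction factor. A minimal sketch, using toy chains rather than the actual MCMC output:

```python
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor (Gelman-Rubin diagnostic) for
    one parameter from several chains started at different values;
    values close to 1 are consistent with convergence."""
    m = len(chains)     # number of chains
    n = len(chains[0])  # draws per chain
    means = [statistics.fmean(c) for c in chains]
    grand = statistics.fmean(means)
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)      # between-chain
    w = statistics.fmean(statistics.variance(c) for c in chains)  # within-chain
    var_hat = (n - 1) / n * w + b / n  # pooled variance estimate
    return (var_hat / w) ** 0.5

# Two toy chains that agree closely (illustrative numbers only):
chains = [[1.0, 1.1, 0.9, 1.0], [1.05, 0.95, 1.0, 1.0]]
assert gelman_rubin(chains) < 1.1
```

A factor well above 1 indicates that the chains have not yet mixed and more iterations (or a longer burn-in) are needed.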

Conclusions and Recommendations
Bayesian linear and nonlinear SEMs involving covariates are very common in the social and behavioural sciences. However, few SEM examples exist that incorporate nonlinear and covariate terms of latent variables into structural equations. As pointed out by Bollen and Paxton (1998) and Schumacker and Marcoulides (1998), among others, this lack of applications is not due to the failure of substantive arguments that suggest the presence of nonlinearity; rather, the existing statistical methods are technically demanding and not well understood. In this paper, a Bayesian approach is proposed for analysing linear and nonlinear covariate models with ordered categorical variables. In addition to point estimation, we provide statistical methods to obtain standard deviation estimates and model comparisons using the deviance information criterion (DIC). Owing to the complexity of the proposed model, the difficulties arising from the causal relationships among the latent variables and the discrete nature of the ordered categorical manifest variables are alleviated by data augmentation with MCMC methods. More specifically, the basic idea of our development is inspired by the following common strategy from recent work in statistical computing (see Rubin, 1991): formulate the underlying complicated problem so that, when the observed data are augmented with the hypothetical missing data, the analysis is relatively easy with the complete data. This strategy is very powerful and can be applied to other, more complex models.

The nonlinear structural equation is described in Equation (10), with a covariate x (200 x 1). The following prior inputs for the hyperparameter values in the conjugate prior distributions of the parameters are considered. Prior I: elements in μ_0, Λ_0k and Λ_ω0k in Equation (6) are set equal to preassigned values, with initial values equal to 1. Prior II: elements in μ_0, Λ_0k and Λ_ω0k in Equation (6) are set equal to the true values. The covariance matrices H_0 are taken to be 0.25 times the identity matrices.

Figure 3. Sequences of simulated values of individual parameters for the linear SEM. The sample size of the whole data set is extremely large; to illustrate the Bayesian methods, we analyze only a synthetic data set with sample size n = 200.

Table 1. Bayesian Estimation of Nonlinear SEM with Ordered Categorical Variables

Table 2. Bayesian Estimation of Linear SEM with Ordered Categorical Variables

Table 3. DIC Values for Linear and Nonlinear SEMs with Ordered Categorical Variables and Covariates

The results corresponding to the nonlinear SEM with covariates under Type I and II inputs and ordered categorical variables are reported in Table 1. We observed that the SD values are very small in the nonlinear SEM. The results corresponding to the linear SEM with covariate under Type I and II inputs and ordered categorical variables are reported in Table 2. We observed that the SD values are very small in the linear SEM with covariate.