On Inference of the Linear Regression Model with Groupwise Heteroscedasticity

The performance of heteroscedasticity-consistent covariance matrix estimators (HCCMEs), namely HC0, HC1, HC2, HC3 and HC4, has been evaluated by numerous researchers for heteroscedastic linear regression models. This study examines the performance of these covariance estimators in the case of groupwise heteroscedasticity. With the help of Monte Carlo simulations, we evaluate the performance of these covariance estimators and the associated quasi-t tests. We consider cases in which the data are divided into 10, 20 and 30 groups of different sizes and the regression is run on the group means of the dependent variable and the regressor. The numerical results show that the HCCMEs perform appealingly well in the case of groupwise heteroscedasticity.


Introduction
Linear regression models for cross-sectional data often exhibit the problem of heteroscedasticity, i.e., the error variances are not constant across observations. In this situation, the ordinary least squares (OLS) estimators of the parameters remain unbiased and consistent but become inefficient. Since the form of heteroscedasticity is usually unknown, practitioners often use the OLS estimators even when the error variances are not constant. However, the usual OLS covariance matrix estimator becomes biased and inconsistent when the homoscedasticity assumption is violated. Since the OLS standard errors are based directly on these variances, inferences drawn from these estimators become misleading and erroneous. It thus becomes necessary to build and use alternative covariance matrix estimators that are consistent under both homoscedasticity and heteroscedasticity of unknown form. Several such consistent covariance matrix estimators are available in the literature.
In his econometrics textbook, Wooldridge (2008, p. 249) writes, "In the last two decades, econometricians have learned to adjust standard errors, t, F and LM statistics so that they are valid in the presence of heteroskedasticity of unknown form."

In real life, there are many situations in which the dependent and explanatory variables have to be aggregated as averages, i.e., only the means of the dependent and explanatory variables are available and the regression has to be run on these average values. As a result of such aggregation, the error term in the resulting model becomes heteroscedastic, in spite of the fact that it is homoscedastic in the model of individual values, because the number of observations, say $n_g$, varies across groups. This type of heteroscedasticity is called groupwise heteroscedasticity (see Greene, 2000, for more details). For the groupwise heteroscedastic regression model, the usual OLS covariance matrix of the estimates is obviously inconsistent, so the HCCMEs need to be used. The question then arises: how do the HCCMEs perform in the case of groupwise heteroscedasticity? The available literature is silent on this point, which motivated us to evaluate, through Monte Carlo simulations, the performance of the HCCMEs under groupwise heteroscedasticity.

Heteroscedasticity-Consistent Covariance Matrix Estimator (HCCME)
Consider the regression model
$$y = X\beta + \varepsilon, \qquad (1)$$
where $y$ is an $n \times 1$ vector of observations on the dependent variable, $X$ is an $n \times p$ known design matrix of rank $p$, $\beta$ is a $p \times 1$ vector of unknown parameters, and $\varepsilon$ is an $n \times 1$ vector of errors with $E(\varepsilon) = 0$ and $\mathrm{Cov}(\varepsilon) = \Omega = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$.

The OLS estimator of $\beta$ is
$$\hat\beta = (X'X)^{-1}X'y. \qquad (2)$$
Under homoscedasticity, $\Omega = \sigma^2 I_n$, and the usual estimator of the covariance matrix of $\hat\beta$ is
$$\widehat{\mathrm{Cov}}(\hat\beta) = \hat\sigma^2 (X'X)^{-1}, \qquad \hat\sigma^2 = \frac{\hat\varepsilon'\hat\varepsilon}{n-p}, \qquad (3)$$
where $n$ is the number of observations and $p$ is the number of unknown parameters.
But if the errors are heteroscedastic and the form of heteroscedasticity is unknown, the HCCME presented by White (1980), commonly known as HC0, can be used. White's estimator is defined as
$$\mathrm{HC0} = (X'X)^{-1} X' \hat\Omega X (X'X)^{-1}, \qquad \hat\Omega = \mathrm{diag}(\hat\varepsilon_1^2, \ldots, \hat\varepsilon_n^2).$$
The disadvantage of White's estimator is that it can be biased, usually downward, in small samples (see MacKinnon and White, 1985), and it takes no account of the fact that OLS residuals tend to be too small. MacKinnon and White (1985) provided an improved estimator, HC1, obtained after making the adjustment for the degrees of freedom (d.f.) suggested by Hinkley (1977):
$$\mathrm{HC1} = \frac{n}{n-p}\,\mathrm{HC0}.$$
In the HC2 estimator, the $i$th squared OLS residual is weighted by the reciprocal of $(1 - h_i)$, where $h_i$ is the $i$th diagonal element of the hat matrix $H = X(X'X)^{-1}X'$:
$$\mathrm{HC2} = (X'X)^{-1} X' \,\mathrm{diag}\!\left(\frac{\hat\varepsilon_i^2}{1-h_i}\right) X (X'X)^{-1}.$$
The third HCCME is the jackknife estimator of Efron (1979), as modified by Davidson and MacKinnon (1993):
$$\mathrm{HC3} = (X'X)^{-1} X' \,\mathrm{diag}\!\left(\frac{\hat\varepsilon_i^2}{(1-h_i)^2}\right) X (X'X)^{-1}.$$
The numerical results in Cribari-Neto (2004) showed that asymptotic inference in linear regression models is much affected by the presence of high-leverage points in the design matrix, which led to the HC4 estimator,
$$\mathrm{HC4} = (X'X)^{-1} X' \,\mathrm{diag}\!\left(\frac{\hat\varepsilon_i^2}{(1-h_i)^{\delta_i}}\right) X (X'X)^{-1}, \qquad \delta_i = \min\left\{4, \frac{h_i}{\bar h}\right\},$$
where $\bar h = \frac{1}{n}\sum_{i=1}^n h_i$. By Monte Carlo simulation, Cribari-Neto (2004) showed that quasi-t tests based on the HC4 estimator are reliable even in the presence of influential observations in the design matrix.
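The five estimators differ only in how the squared residuals are weighted inside the sandwich, which can be sketched in a few lines of NumPy (a minimal illustration; the function name `hccme` and the toy heteroscedastic data are our own choices, not taken from the study):

```python
import numpy as np

def hccme(X, resid, kind="HC0"):
    """Sandwich covariance estimators HC0-HC4 for OLS coefficients."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)   # diagonal of hat matrix
    e2 = resid ** 2
    if kind == "HC0":
        w = e2                                    # White (1980)
    elif kind == "HC1":
        w = e2 * n / (n - p)                      # degrees-of-freedom adjustment
    elif kind == "HC2":
        w = e2 / (1.0 - h)
    elif kind == "HC3":
        w = e2 / (1.0 - h) ** 2                   # jackknife-type
    elif kind == "HC4":
        delta = np.minimum(4.0, h / h.mean())     # leverage-adaptive exponent
        w = e2 / (1.0 - h) ** delta
    else:
        raise ValueError(kind)
    return xtx_inv @ (X.T * w) @ X @ xtx_inv

# toy example: simple regression with error s.d. growing in x
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0 + x)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
for k in ("HC0", "HC1", "HC2", "HC3", "HC4"):
    print(k, np.sqrt(np.diag(hccme(X, resid, k))))
```

Since the HC3 weights dominate the HC0 weights observation by observation, the HC3 variance estimates are never smaller than the HC0 ones.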

Groupwise Heteroscedasticity
Linear regression models are usually applied to data in which both the dependent and explanatory variables are observed as individual values. There are, however, many real-life situations in which the dependent and explanatory variables are available only as aggregated or average values, i.e., the means of the dependent and explanatory variables are available and the regression is run on these averages. Even when the error term in the model of individual values is homoscedastic, once the observations are grouped and the regression is run on the means of the dependent and explanatory variables, the error term in the resulting model becomes heteroscedastic.
If the $n$ observations are grouped into $G$ groups, the $g$th group containing $n_g$ observations, and the means $\bar y_g$ and $\bar x_g$ for $g = 1, 2, \ldots, G$ are observed, then the regression model given in Eq. (1) becomes
$$\bar y_g = \bar x_g' \beta + v_g, \qquad g = 1, 2, \ldots, G. \qquad (4)$$
The error term $v_g$ in the above model is now heteroscedastic, irrespective of the fact that the error term in regression model (1) is homoscedastic, since
$$\mathrm{Var}(v_g) = \frac{\sigma^2}{n_g},$$
where $n_g$ is the number of observations in group $g$.
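That the variance of a group mean of homoscedastic errors equals $\sigma^2/n_g$ can be checked with a small simulation (a sketch; the value of $\sigma^2$ and the two group sizes below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)
sigma2 = 4.0                      # individual-level error variance (illustrative)
reps = 20000

for n_g in (5, 50):               # two groups of very different size
    # group mean of n_g homoscedastic errors, replicated many times
    v_g = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n_g)).mean(axis=1)
    print(n_g, round(v_g.var(), 3), sigma2 / n_g)   # empirical vs sigma^2 / n_g
```

The empirical variance of the group-mean error shrinks in proportion to the group size, which is exactly the source of the groupwise heteroscedasticity.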

HCCME under Groupwise Heteroscedasticity
In the case of groupwise heteroscedasticity, the mean of the dependent variable is regressed on the means of the explanatory variables instead of the individual values. Consider regression model (4), in which both the dependent and independent variables represent the respective mean values, and let $\bar X$ denote the $G \times p$ matrix of group means with $g$th row $\bar x_g'$. For this model, White's estimator can be written as
$$\mathrm{HC0} = (\bar X'\bar X)^{-1} \bar X' \hat\Omega \bar X (\bar X'\bar X)^{-1}, \qquad \hat\Omega = \mathrm{diag}(\hat v_1^2, \ldots, \hat v_G^2). \qquad (5)$$
After applying the degrees-of-freedom adjustment, we get the HC1 estimator as
$$\mathrm{HC1} = \frac{G}{G-p}\,\mathrm{HC0}, \qquad (6)$$
where $G$ is the number of groups, which behaves like the sample size, and $p$ is the number of unknown parameters in the groupwise heteroscedastic model.
The HC2 estimator can be written as
$$\mathrm{HC2} = (\bar X'\bar X)^{-1} \bar X' \,\mathrm{diag}\!\left(\frac{\hat v_g^2}{1-h_g}\right) \bar X (\bar X'\bar X)^{-1}, \qquad (7)$$
where $h_g$ is the $g$th diagonal element of the hat matrix $\bar H = \bar X(\bar X'\bar X)^{-1}\bar X'$. The HC3 estimator becomes
$$\mathrm{HC3} = (\bar X'\bar X)^{-1} \bar X' \,\mathrm{diag}\!\left(\frac{\hat v_g^2}{(1-h_g)^2}\right) \bar X (\bar X'\bar X)^{-1}. \qquad (8)$$
Finally, the HC4 estimator becomes
$$\mathrm{HC4} = (\bar X'\bar X)^{-1} \bar X' \,\mathrm{diag}\!\left(\frac{\hat v_g^2}{(1-h_g)^{\delta_g}}\right) \bar X (\bar X'\bar X)^{-1}, \qquad \delta_g = \min\left\{4, \frac{h_g}{\bar h}\right\}. \qquad (9)$$
These estimators provide consistent covariance matrix estimates of the coefficient estimates for the model with groupwise heteroscedasticity.
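A grouped-data computation can be sketched end to end: simulate homoscedastic individual data, aggregate into group means, and compute HC3 standard errors for the group-mean regression (the group sizes, data-generating process and seed below are illustrative assumptions, not the study's design):

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [4, 9, 16, 25, 36, 49, 64, 81, 100, 121]   # n_g for G = 10 groups

ybar, xbar = [], []
for g, n_g in enumerate(sizes):
    x = rng.uniform(g, g + 1.0, n_g)               # regressor level differs by group
    y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n_g)  # homoscedastic individuals
    ybar.append(y.mean())
    xbar.append(x.mean())

Xbar = np.column_stack([np.ones(len(sizes)), xbar])
ybar = np.asarray(ybar)
xtx_inv = np.linalg.inv(Xbar.T @ Xbar)
bhat = xtx_inv @ Xbar.T @ ybar
v = ybar - Xbar @ bhat                             # group-mean residuals
h = np.einsum("ij,jk,ik->i", Xbar, xtx_inv, Xbar)  # leverages of the G groups
hc3 = xtx_inv @ (Xbar.T * (v**2 / (1.0 - h)**2)) @ Xbar @ xtx_inv
print(bhat, np.sqrt(np.diag(hc3)))                 # coefficients, HC3 std. errors
```

The residuals $\hat v_g$ play the role of the individual residuals in the ungrouped formulas, with $G$ acting as the effective sample size.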

Numerical Evaluation
The numerical results presented in this study were obtained using the simple regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$. The observations were first divided into G = 10 groups of different sizes and the means of $y$ and $x$ for all 10 groups were obtained; the same was done for G = 20 and G = 30 groups. The group-mean regression was then run on the means of the 10, 20 and 30 groups separately. These experiments were replicated 5000 times each for G = 10, 20 and 30. The estimates of the $\beta$'s were computed by OLS, and the estimates of $\mathrm{Cov}(\hat\beta)$ were computed using Eq. (3), i.e., the usual OLS estimator, and the HCCMEs HC0, HC1, HC2, HC3 and HC4 given in Eqs. (5)-(9), respectively. All the computations were performed through programming routines in the econometric package EViews 5.0 (visit www.eviews.com).
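The replication scheme just described can be sketched as follows (with an assumed data-generating process, assumed group sizes, and fewer replications than the study's 5000; the study's exact EViews design is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(123)
sizes = np.arange(2, 22, 2)           # G = 10 groups (illustrative sizes)
reps = 1000                           # the study uses 5000 replications

b1, v_ols, v_hc0 = [], [], []
for _ in range(reps):
    ybar, xbar = [], []
    for g, n_g in enumerate(sizes):
        x = rng.uniform(g, g + 1.0, n_g)
        y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n_g)   # homoscedastic individuals
        ybar.append(y.mean())
        xbar.append(x.mean())
    X = np.column_stack([np.ones(len(sizes)), xbar])
    yb = np.asarray(ybar)
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ yb
    e = yb - X @ b
    G, p = X.shape
    s2 = (e @ e) / (G - p)
    b1.append(b[1])
    v_ols.append(s2 * xtx_inv[1, 1])                    # usual OLS estimate, Eq. (3)
    v_hc0.append((xtx_inv @ (X.T * e**2) @ X @ xtx_inv)[1, 1])  # HC0, Eq. (5)

true_var = np.var(b1)                 # Monte Carlo proxy for the true variance
print(true_var, np.mean(v_ols), np.mean(v_hc0))
```

Averaging each variance estimator over replications and comparing it with the Monte Carlo variance of $\hat\beta_1$ is exactly the ingredient needed for the bias measures reported in the tables.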
Following Cribari-Neto and Galvão (2003) and Cribari-Neto (2004), we computed the total relative bias (TRB) of the OLS, HC0, HC1, HC2, HC3 and HC4 covariance estimators. The total relative bias is the sum of the absolute relative biases of the estimated variances of $\hat\beta_0$ and $\hat\beta_1$, where the relative bias is the difference between the mean value of the variance estimator and the true variance, divided by the true variance. Thus, the TRB is defined as
$$\mathrm{TRB} = \sum_{j=0}^{1} \left| \frac{E[\widehat{\mathrm{Var}}(\hat\beta_j)] - \mathrm{Var}(\hat\beta_j)}{\mathrm{Var}(\hat\beta_j)} \right|.$$
The figures in Tables 1 and 2 convey that when G = 10, the HC4 estimator is more biased than the other estimators, while the HC2 estimator yields the lowest bias of all. When G = 20, the HC4 estimator is again the most biased, but the amount of bias reduces considerably as the number of groups G, into which the data have been divided, increases; the HC2 estimator is again less biased than the others. Cribari-Neto (2004) reported that the HC4 estimator was the most biased among all the estimators, and the same behavior is observed in our case. We note that as the number of groups G increases, the TRB of the estimators decreases.

We also computed the root mean squared error (RMSE) of the variance estimators; the MSE thus comprises not only the bias but also the variance of the different estimators. Tables 3 and 4 show that the OLS estimator yields the smallest RMSE among all the estimators under consideration for all G (number of groups), whereas the HC4 estimator yields the largest RMSE. Cribari-Neto (2004) reported similar results, but for ungrouped data.
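With hypothetical numbers, the TRB computation reduces to a few lines (the variances below are made up purely for illustration):

```python
# Total relative bias: sum of absolute relative biases of the variance
# estimates of beta0_hat and beta1_hat (numbers are hypothetical).
true_var = [0.50, 0.020]   # true variances of the two coefficient estimates
mean_est = [0.44, 0.023]   # mean of a variance estimator over replications
trb = sum(abs((m - t) / t) for m, t in zip(mean_est, true_var))
print(round(trb, 2))       # |-0.12| + |0.15| = 0.27
```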
Following Cribari-Neto (2004), we computed the estimated null rejection rate (NRR) of quasi-t tests based on the OLS and HCCM estimators. Table 5 shows that when G = 10, the tests based on the HC0, HC1 and HC2 estimators result in larger size distortions. However, when G = 20, the size performance of the tests based on these estimators improves. The tests that use the OLS, HC3 and HC4 estimators result in relatively less size distortion. We note that when G = 10 and 20, the tests based on the HC4 estimator show satisfactory performance if the criterion is size distortion: the HC4-based test rejects the null hypothesis approximately 2% of the time, which is close to the nominal level of the test. For G = 30, the tests based on all the covariance estimators considered here show large size distortion, although the HC4 estimator results in the least size distortion among them. Tables 6 and 7 reveal similar behavior at the 5% and 10% levels of significance, respectively, under DGP-I.
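An NRR for an HC3-based quasi-t test can be estimated as in the following sketch (the data-generating process, the number of groups, and the use of the standard-normal 5% critical value 1.96 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
sizes = np.arange(2, 22, 2)           # G = 10 groups (illustrative)
reps, crit = 2000, 1.96               # 5% two-sided standard-normal critical value

rej = 0
for _ in range(reps):
    ybar, xbar = [], []
    for g, n_g in enumerate(sizes):
        x = rng.uniform(g, g + 1.0, n_g)
        y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n_g)
        ybar.append(y.mean())
        xbar.append(x.mean())
    X = np.column_stack([np.ones(len(sizes)), xbar])
    yb = np.asarray(ybar)
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ yb
    e = yb - X @ b
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    v_hc3 = (xtx_inv @ (X.T * (e**2 / (1.0 - h)**2)) @ X @ xtx_inv)[1, 1]
    t = (b[1] - 2.0) / np.sqrt(v_hc3)  # H0: beta1 = 2 is true in this DGP
    rej += abs(t) > crit

print(rej / reps)   # estimated NRR; should be near 0.05 if the test is well sized
```

Because the null hypothesis is true by construction, any excess of the printed rate over the nominal level measures the size distortion of the test.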

 
We compute the empirical coverage and average length of confidence intervals for both coefficients $\beta_0$ and $\beta_1$, but for the discussion here we focus on $\beta_1$ only.
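The empirical coverage and average length of a normal-based 95% interval for $\beta_1$ can be computed as in this sketch (the data-generating process and settings are illustrative assumptions, not the study's design):

```python
import numpy as np

rng = np.random.default_rng(11)
sizes = np.arange(2, 22, 2)           # G = 10 groups (illustrative)
reps, z = 2000, 1.96                  # normal-based 95% intervals

cover, length = 0, 0.0
for _ in range(reps):
    ybar, xbar = [], []
    for g, n_g in enumerate(sizes):
        x = rng.uniform(g, g + 1.0, n_g)
        y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n_g)
        ybar.append(y.mean())
        xbar.append(x.mean())
    X = np.column_stack([np.ones(len(sizes)), xbar])
    yb = np.asarray(ybar)
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ yb
    e = yb - X @ b
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    se = np.sqrt((xtx_inv @ (X.T * (e**2 / (1.0 - h)**2)) @ X @ xtx_inv)[1, 1])
    lo, hi = b[1] - z * se, b[1] + z * se     # HC3-based interval for beta1
    cover += (lo <= 2.0 <= hi)                # true beta1 = 2 in this DGP
    length += hi - lo

print(cover / reps, length / reps)    # empirical coverage, average length
```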

Conclusion
It is common practice to use HCCMEs to obtain consistent estimates of the variances of the OLS estimates in heteroscedastic linear regression models. In many practical situations, averages of the dependent and explanatory variables are used and the regression is run on these average values; in this situation, the problem of groupwise heteroscedasticity is evident. The present study examines the ability of the conventional HCCMEs to counter groupwise heteroscedasticity by providing consistent covariance matrix estimates. Five versions of the HCCME, HC0-HC4, have been computed for the linear regression model facing groupwise heteroscedasticity. The Monte Carlo results reveal that the HC3 and HC4 estimators perform adequately well in the presence of groupwise heteroscedasticity, as they do in the usual heteroscedastic linear regression model. Moreover, the HC4 estimator outperforms the HC3 estimator, giving less size distortion and better coverage in interval estimation.