A New Extended G Family of Continuous Distributions with Mathematical Properties, Characterizations and Regression Modeling

We propose a new extended G family of distributions. Some of its structural properties are derived and some useful characterization results are presented. The maximum likelihood method is used to estimate the model parameters, and its performance is assessed by means of graphical and numerical Monte Carlo simulation studies. The flexibility of the new family is illustrated by means of two real data sets. Moreover, we introduce a new log-location regression model based on the proposed family. The martingale and modified deviance residuals are defined to detect outliers and evaluate the model assumptions. The potential of the new regression model is illustrated by means of a real data set.


Introduction
In the statistical literature, many G families have been introduced based on bounded models, such as the beta-G family of distributions by Eugene (2002) and the Kumaraswamy G family of distributions by Cordeiro and de Castro (2011). The new log-location regression model based on the NE Weibull distribution provides better fits than the log-Weibull regression model in modeling the HIV survival data.

Useful expansions
First, we consider two power series (1 + ) − = ∑ ∞ =0 (−1 + )   Fourthly, using the binomial expansion to expand quantity , the above equation will be Fifthly, applying (5) for , we have Finally where and $\Pi_{\delta}(x) = G(x;\xi)^{\delta}$ is the cdf of the Exp-G distribution with power parameter $\delta$. The corresponding pdf can be written as where $\pi_{\delta}(x) = \delta\, g(x;\xi)\, G(x;\xi)^{\delta-1}$ is the pdf of the Exp-G distribution with power parameter $\delta$. Equations (7) and (8) reveal that the pdf of the NE-G family is a linear combination of Exp-G densities. Thereby, some properties of the proposed family, such as moments and the generating function, can be determined from those of the Exp-G distribution.
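As a small numerical illustration of the Exp-G building block used in this expansion, the sketch below evaluates the Exp-G cdf $G(x)^{\delta}$ and pdf $\delta\, g(x)\, G(x)^{\delta-1}$ and checks that the density integrates to about one. The exponential baseline, the values `rate=1.0` and `delta=2.5`, and all function names are assumptions chosen only for illustration; the mixture weights of the NE-G expansion are not reproduced here.

```python
import math

def base_cdf(x, rate=1.0):
    """Exponential baseline cdf G(x); the baseline choice is an assumption."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def base_pdf(x, rate=1.0):
    """Exponential baseline pdf g(x)."""
    return rate * math.exp(-rate * x) if x > 0 else 0.0

def expg_cdf(x, delta, rate=1.0):
    """Exp-G cdf: G(x) raised to the power parameter delta."""
    return base_cdf(x, rate) ** delta

def expg_pdf(x, delta, rate=1.0):
    """Exp-G pdf: delta * g(x) * G(x)^(delta - 1)."""
    big_g = base_cdf(x, rate)
    if big_g == 0.0:
        return 0.0
    return delta * base_pdf(x, rate) * big_g ** (delta - 1.0)

def trapezoid(func, lower, upper, steps=20000):
    """Plain trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * width)
    return total * width

# Total mass of the Exp-G density and its mass on [0, 1].
area = trapezoid(lambda x: expg_pdf(x, 2.5), 0.0, 40.0)
mass_to_one = trapezoid(lambda x: expg_pdf(x, 2.5), 0.0, 1.0)
```

Because any NE-G density is a weighted sum of such Exp-G densities, quantities such as moments can be assembled term by term from the same building block.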

Two Special Members of the Family
The NE-G family generates alternative extended distributions. We now present two important sub-models of this new family.

The NE-normal (NEN) distribution
As a first example, we extend the ordinary normal distribution, which is symmetric and bell-shaped. We define the NEN distribution by taking $G(x; \mu, \sigma) = \Phi\left(\frac{x - \mu}{\sigma}\right)$. Some possible plots of the NEN density for selected parameter values are displayed in Figure 1.
Figure 1: The pdf plots of NEN distributions for some selected parameter values.

The NE-Weibull (NEW) distribution
As the second example, we define the NEW distribution by employing the Weibull cdf with shape parameter $b > 0$ and scale parameter $a > 0$, defined by $G(x; a, b) = 1 - \exp[-(ax)^{b}]$ (for $x > 0$). The cdf of the NEW distribution follows by substituting this baseline into the NE-G cdf. Possible plots of the NEW density and hrf for selected parameter values are displayed in Figure 2. From Figure 2, we can say that the pdf of the NEW distribution can be bi-modal, uni-modal or decreasing-increasing-decreasing shaped. Also, its hrf can be both monotone and non-monotone shaped.

Moments, incomplete moments and generating function
The $r$th ordinary moment of $X$ is obtained from the Exp-G representation as a linear combination of the $r$th moments of Exp-G random variables. Henceforth, $Y_{\delta}$ denotes a random variable having the Exp-G distribution with power parameter $\delta$. The $r$th incomplete moment of $X$ can be expressed from (6) in the same way, which gives equation (10). The first incomplete moment is given by (10) with $r = 1$, and a general equation for it can be derived from (10) in terms of the first incomplete moment of the Exp-G model. The moment generating function (mgf) $M_X(t) = E(e^{tX})$ of $X$ can be derived using equation (9) as a linear combination of the mgfs $M_{\delta}(t)$ of $Y_{\delta}$. Hence, $M_X(t)$ can be determined from the Exp-G generating function.

Moments of residual and reversed residual life
The $n$th moment of the residual life, say $m_n(t) = E[(X - t)^n \mid X > t]$, $n = 1, 2, \ldots$, uniquely determines $F(x)$. The $n$th moment of the residual life of $X$ follows from the Exp-G representation of the NE-G pdf. The $n$th moment of the reversed residual life, say $M_n(t) = E[(t - X)^n \mid X \leq t]$ for $t > 0$ and $n = 1, 2, \ldots$, also uniquely determines $F(x)$. The $n$th moment of the reversed residual life (RRL) of $X$ is obtained in the same way. The mean residual life (MRL) function, or the life expectation at age $t$, is defined by $m_1(t) = E[(X - t) \mid X > t]$ and represents the expected additional life length for a unit which is alive at age $t$. The MRL of $X$ can be obtained by setting $n = 1$ in the $m_n(t)$ equation.
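To make the MRL definition concrete, the following sketch evaluates $E[(X - t) \mid X > t]$ numerically through the standard identity $\int_t^{\infty} S(x)\,dx / S(t)$, where $S$ is the survival function. It uses an exponential survival function rather than an NE-G model, because in that case the MRL is known to be constant and equal to the reciprocal of the rate; the function name, the truncation point `upper` and the rate value are assumptions made for illustration.

```python
import math

def mean_residual_life(t, survival, upper=200.0, steps=100000):
    """Approximate the MRL at age t as the integral of S over [t, upper] divided by S(t).

    `upper` truncates the infinite integral and must be large enough
    for S(upper) to be negligible.
    """
    width = (upper - t) / steps
    total = 0.5 * (survival(t) + survival(upper))
    for k in range(1, steps):
        total += survival(t + k * width)
    return total * width / survival(t)

rate = 0.5  # assumed exponential rate; the exact MRL is 1 / rate = 2 at every age

def exp_survival(x):
    return math.exp(-rate * x)

mrl_at_1 = mean_residual_life(1.0, exp_survival)
mrl_at_5 = mean_residual_life(5.0, exp_survival)
```

The same routine could be applied to any survival function that can be evaluated pointwise, including one built from an NE-G cdf.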

Characterizations
This section is devoted to the characterizations of the NE-G family of distributions in different directions: (i) based on the ratio of two truncated moments; (ii) in terms of the reverse hazard function. Note that (i) can be employed also when the cdf does not have a closed form. We would also like to mention that, due to the nature of the NE-G family of distributions, our characterizations may be the only possible ones. We present our characterizations (i)-(ii) in two subsections.

Characterizations based on two truncated moments
This subsection deals with the characterizations of the NE-G family of distributions based on the ratio of two truncated moments. Our first characterization employs a theorem due to Glänzel (1987); see Theorem 1 of Appendix A. The result, however, holds also when the interval $H$ is not closed, since the condition of the Theorem is on the interior of $H$.
Corollary 5.1: Let $X: \Omega \to \mathbb{R}$ be a continuous random variable and let $q_1(x)$ be as in Proposition 5.1. The random variable $X$ has pdf (3) if and only if there exist functions $q_2$ and $\eta$ defined in Theorem 1 satisfying the following differential equation

Corollary 5.2: The general solution of the differential equation in Corollary 5.1 is where $D$ is a constant. We would like to point out that one set of functions satisfying the above differential equation is given in Proposition 5.1 with $D = \frac{1}{2}$. Clearly, there are other triplets $(q_1, q_2, \eta)$ which satisfy the conditions of Theorem 1.

Characterization in terms of the reverse hazard function
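The reverse hazard function, $r_F$, of a twice differentiable distribution function, $F$, is defined as $r_F(x) = f(x)/F(x)$ for $x$ in the support of $F$, where $f$ is the corresponding pdf. In this subsection we present a characterization of the NE-G family of distributions, with both of its additional parameters set equal to 1, in terms of the reverse hazard function. In this case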

Examples:
a) Take the baseline cdf to be exponential with parameter ; then the cdf will be given by , $x \geq 0$.
b) Take the baseline cdf to be uniform (0,1); then the cdf will have the form , $0 \leq x \leq 1$.

Estimation and inference
Several approaches for parameter estimation have been proposed in the literature, but the maximum likelihood method is the most commonly employed. The MLEs enjoy desirable properties and can be used for constructing confidence intervals and test statistics. The normal approximation for these estimators in large samples can be easily handled either analytically or numerically. Here, we consider the estimation of the unknown parameters of the new family from complete samples only by maximum likelihood. Let $x_1, \ldots, x_n$ be a random sample from the NE-G model with a $(q + 2) \times 1$ parameter vector $\Theta$, consisting of the two additional parameters of the family and $\xi$, where $\xi$ is a $q \times 1$ baseline parameter vector. The log-likelihood function for $\Theta$ is $\ell_n(\Theta) = \sum_{i=1}^{n} \log f(x_i; \Theta)$, where $f$ denotes the NE-G pdf. Setting the components of the score vector equal to zero and solving the resulting nonlinear system of equations simultaneously yields the MLE $\hat{\Theta}$. To solve these equations, it is more convenient to use nonlinear optimization methods such as the quasi-Newton algorithm to maximize $\ell_n(\Theta)$ numerically. For interval estimation of the parameters, we can evaluate numerically the elements of the $(q + 2) \times (q + 2)$ observed information matrix $J(\Theta) = \{-\partial^2 \ell_n(\Theta) / \partial \Theta_r \, \partial \Theta_s\}$. Under standard regularity conditions, when $n \to \infty$, the distribution of $\hat{\Theta} - \Theta$ can be approximated by a multivariate normal $N_{q+2}(0, J(\hat{\Theta})^{-1})$ distribution to construct approximate confidence intervals for the parameters. Here, $J(\hat{\Theta})$ is the total observed information matrix evaluated at $\hat{\Theta}$. The bootstrap re-sampling method can be used for correcting the biases of the MLEs of the model parameters. Good interval estimates may also be obtained using the bootstrap percentile method.
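The chain "score equation, observed information, normal-approximation interval" can be followed by hand in a deliberately simple case. The sketch below is not the NE-G likelihood: it assumes an Exp-G model with a known exponential baseline and a single unknown power parameter `delta`, for which the score equation has the closed-form root $\hat{\delta} = -n / \sum_i \log G(x_i)$ and the observed information is $n / \hat{\delta}^2$. In the full NE-G setting no closed form is available and a numerical optimizer would be used instead. All names and numerical values are illustrative assumptions.

```python
import math
import random

def fit_power_parameter(sample, rate=1.0):
    """MLE, standard error and 95% Wald interval for the Exp-G power parameter.

    Assumes a known exponential baseline G(x) = 1 - exp(-rate * x).
    """
    n = len(sample)
    # Sum of log G(x_i); log1p keeps precision when G(x_i) is close to 1.
    log_g_sum = sum(math.log1p(-math.exp(-rate * x)) for x in sample)
    delta_hat = -n / log_g_sum               # root of the score equation
    observed_info = n / delta_hat ** 2       # minus the second derivative at delta_hat
    std_err = math.sqrt(1.0 / observed_info)
    ci = (delta_hat - 1.96 * std_err, delta_hat + 1.96 * std_err)
    return delta_hat, std_err, ci

# Simulated data (assumed values): inverse-cdf draws with delta = 2.5.
rng = random.Random(2024)
true_delta, rate = 2.5, 1.0
sample = []
for _ in range(500):
    u = rng.uniform(1e-12, 1.0 - 1e-12)
    sample.append(-math.log1p(-u ** (1.0 / true_delta)) / rate)

delta_hat, std_err, ci = fit_power_parameter(sample, rate)
```

With several unknown parameters, the scalar `observed_info` becomes the matrix $J(\hat{\Theta})$ and the standard errors are the square roots of the diagonal of its inverse.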

Simulation studies
In this section, we perform two simulation studies using the new extended normal and Weibull distributions to illustrate the performance of the MLEs for these distributions. Random numbers are generated by inverting the cdfs. All results related to the MLEs were obtained using the optim-CG routine in R.
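Random number generation by inverting a cdf can be sketched as follows. Since the NE-G quantile function is not reproduced here, the example inverts an Exp-G cdf with a Weibull baseline $G(x) = 1 - \exp[-(ax)^b]$, whose quantile function is available in closed form; the parameter names `delta`, `a`, `b`, their values and the function names are assumptions for illustration only.

```python
import math
import random

def weibull_cdf(x, a, b):
    """Weibull baseline cdf G(x) = 1 - exp(-(a x)^b) for x > 0."""
    return 1.0 - math.exp(-((a * x) ** b)) if x > 0 else 0.0

def exp_weibull_cdf(x, delta, a, b):
    """Exp-G cdf with Weibull baseline: G(x)^delta."""
    return weibull_cdf(x, a, b) ** delta

def exp_weibull_quantile(u, delta, a, b):
    """Solve G(x)^delta = u for x, with 0 < u < 1."""
    base_prob = u ** (1.0 / delta)            # required value of G(x)
    return (-math.log1p(-base_prob)) ** (1.0 / b) / a

def draw_sample(n, delta, a, b, seed=1):
    """Inverse-transform sampling: apply the quantile function to uniform draws."""
    rng = random.Random(seed)
    return [exp_weibull_quantile(rng.uniform(1e-12, 1.0 - 1e-12), delta, a, b)
            for _ in range(n)]

sample = draw_sample(1000, delta=2.0, a=0.5, b=1.5)
median = exp_weibull_quantile(0.5, 2.0, 0.5, 1.5)
share_below_median = sum(x <= median for x in sample) / len(sample)
```

When the quantile function has no closed form, the same idea applies with a numerical root finder solving $F(x) = u$.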

Simulation study 1
In the first simulation study, we obtain graphical results. We generate  = 1000 respectively. The results of this simulation study are given in Figure 3. Figure 3 shows that, as the sample size increases, the empirical means approach the true parameter values, whereas the biases, sds and MSEs approach 0 in all cases.

Simulation study 2
In the second simulation study, we generate 1,000 samples of sizes 50, 100 and 200 from selected new extended Weibull distributions. For this simulation study, we obtain the empirical means and sds of the MLEs. The results of this simulation study are reported in Table 1. Table 1 shows that, as the sample size increases, the empirical means approach the true parameter values whereas the sds decrease, as expected.
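The structure of such a Monte Carlo study (replicate, estimate, then summarize by empirical mean, bias, sd and MSE) can be sketched on a toy model. The example below uses the exponential distribution, whose rate MLE is the reciprocal of the sample mean, instead of the new extended Weibull model; the true rate, the seeds and the function name are assumptions, and the sample sizes and number of replications mirror those quoted above.

```python
import random
import statistics

def simulate_rate_mle(true_rate, sample_size, replications=1000, seed=0):
    """Empirical mean, bias, sd and MSE of the exponential-rate MLE."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        sample = [rng.expovariate(true_rate) for _ in range(sample_size)]
        estimates.append(1.0 / statistics.fmean(sample))   # MLE of the rate
    mean_est = statistics.fmean(estimates)
    bias = mean_est - true_rate
    sd = statistics.pstdev(estimates)                       # divisor = replications
    mse = statistics.fmean([(e - true_rate) ** 2 for e in estimates])
    return {"mean": mean_est, "bias": bias, "sd": sd, "mse": mse}

# One summary per sample size, as in a table of simulation results.
results = {n: simulate_rate_mle(2.0, n, seed=n) for n in (50, 100, 200)}
```

With the population form of the sd used here, the MSE equals the squared bias plus the squared sd, which is a convenient internal check on the summaries.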

Log-NEW regression model
Let $T$ be a random variable having the NEW density function with four positive parameters, as discussed in Subsection 2. The random variable $Y = \log(T)$ has the density function given in Equation (11), where $\mu \in \Re$ is the location parameter, $\sigma > 0$ is the scale parameter, and the two remaining positive parameters are shape parameters. We refer to Equation (11) as the log-NEW (LNEW) distribution. The standardized random variable $Z = (Y - \mu)/\sigma$ has the density function given in Equation (13). Based on the LNEW distribution, a linear location-scale regression model is proposed by linking the response variable $y_i$ and the explanatory variable vector $\mathbf{v}_i^{\top} = (v_{i1}, \ldots, v_{ip})$ by $y_i = \mathbf{v}_i^{\top} \boldsymbol{\beta} + \sigma z_i$, $i = 1, \ldots, n$ (14), where the random error $z_i$ has density function (13), and $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)^{\top}$, $\sigma > 0$ and the two shape parameters are unknown. The parameter $\mu_i = \mathbf{v}_i^{\top} \boldsymbol{\beta}$ is the location of $y_i$. The location parameter vector $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_n)^{\top}$ is represented by a linear model $\boldsymbol{\mu} = \mathbf{V} \boldsymbol{\beta}$, where $\mathbf{V} = (\mathbf{v}_1, \ldots, \mathbf{v}_n)^{\top}$ is a known model matrix.
Consider a sample $(y_1, \mathbf{v}_1), \ldots, (y_n, \mathbf{v}_n)$ of $n$ independent observations, where each random response is defined by $y_i = \min\{\log(t_i), \log(c_i)\}$, with $t_i$ the lifetime and $c_i$ the censoring time. We assume non-informative censoring such that the observed lifetimes and censoring times are independent. Let $F$ and $C$ be the sets of individuals for which $y_i$ is the log-lifetime or log-censoring, respectively. The log-likelihood function for the vector of parameters $\tau$ (the two shape parameters, $\sigma$ and $\boldsymbol{\beta}$) from model (14) has the form $l(\tau) = \sum_{i \in F} l_i(\tau) + \sum_{i \in C} l_i^{(c)}(\tau)$, where $l_i(\tau) = \log[f(y_i)]$, $l_i^{(c)}(\tau) = \log[S(y_i)]$, $f(y_i)$ is the density (11) and $S(y_i)$ is the survival function (12) of $Y_i$. The total log-likelihood function for $\tau$, Equation (15), is written in terms of $u_i = \exp(z_i)$ and $z_i = (y_i - \mathbf{v}_i^{\top} \boldsymbol{\beta})/\sigma$, where $r$ is the number of uncensored observations (failures) and the remaining $n - r$ observations are censored. The MLE $\hat{\tau}$ of the vector of unknown parameters can be evaluated by maximizing the log-likelihood (15).
The asymptotic covariance matrix of $\hat{\tau}$ can be approximated by the inverse of the $(p + 2) \times (p + 2)$ observed information matrix $-\ddot{L}(\tau)$, whose elements are evaluated numerically in most statistical packages. The approximate multivariate normal distribution $N_{p+2}(0, -\ddot{L}(\tau)^{-1})$ for $\hat{\tau}$ can be used in the classical way to construct approximate confidence intervals for the parameters in $\tau$.
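The censored log-likelihood described above adds $\log f(y_i)$ for failures and $\log S(y_i)$ for censored observations. As a hedged sketch, the code below implements this structure for the log-Weibull (LW) regression model, the comparison model used later in the paper, rather than for the LNEW model, whose density is not reproduced here. For the LW model, $f(y) = \sigma^{-1} \exp(z - e^{z})$ and $S(y) = \exp(-e^{z})$ with $z = (y - \mathbf{v}^{\top}\boldsymbol{\beta})/\sigma$. The function name, the 0/1 censoring indicator and the toy data are assumptions made for illustration.

```python
import math

def lw_censored_loglik(beta, sigma, y, covariates, delta):
    """Log-likelihood of a log-Weibull regression with right censoring.

    y          : observed log-times, min(log lifetime, log censoring time)
    covariates : one covariate vector per observation (include 1.0 for an intercept)
    delta      : 1 for an observed failure, 0 for a censored observation
    """
    total = 0.0
    for y_i, v_i, d_i in zip(y, covariates, delta):
        mu_i = sum(b * v for b, v in zip(beta, v_i))   # location v_i' beta
        z_i = (y_i - mu_i) / sigma
        u_i = math.exp(z_i)
        if d_i == 1:
            total += -math.log(sigma) + z_i - u_i      # log f(y_i)
        else:
            total += -u_i                              # log S(y_i)
    return total

# Toy data (assumed): intercept plus one binary covariate, one censored case.
y_obs = [math.log(2.0), math.log(3.5), math.log(1.2)]
v_obs = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
d_obs = [1, 0, 1]
value = lw_censored_loglik([0.5, 0.3], 0.8, y_obs, v_obs, d_obs)
```

Maximizing such a function over the regression coefficients and the scale with a general-purpose optimizer gives the MLEs, and the negative Hessian at the maximum plays the role of the observed information matrix.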

Residual analysis
Residual analysis plays a critical role in checking the adequacy of the fitted model. In order to analyze departures from the error assumption, two types of residuals are considered: martingale and modified deviance residuals.

Martingale residual
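The martingale residual is defined in terms of counting processes and takes values between $-\infty$ and $+1$ (see Fleming and Harrington (1994) for details). The martingale residual for the LNEW model is expressed in terms of $u_i = \exp(z_i)$ and $z_i = (y_i - \mathbf{v}_i^{\top} \boldsymbol{\beta})/\sigma$.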

Modified deviance residual
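The main drawback of the martingale residual is that, when the fitted model is correct, it is not symmetrically distributed about zero. To overcome this problem, the modified deviance residual was proposed by Therneau et al. (1990). The modified deviance residual is given by $\hat{r}_{D_i} = \mathrm{sign}(\hat{r}_{M_i}) \{-2[\hat{r}_{M_i} + \delta_i \log(\delta_i - \hat{r}_{M_i})]\}^{1/2}$, where $\hat{r}_{M_i}$ is the martingale residual and $\delta_i$ equals 1 for an uncensored observation and 0 for a censored one.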
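Both residuals can be computed from the censoring indicator and the fitted survival probability alone. The sketch below uses the usual parametric form of the martingale residual, $\delta_i + \log \hat{S}(y_i)$, and the deviance transformation attributed above to Therneau et al. (1990); it takes the fitted survival value as an input, so it does not depend on the specific LNEW survival function. The function names and the guard against tiny negative rounding errors are assumptions of this sketch.

```python
import math

def martingale_residual(delta_i, surv_i):
    """delta_i + log S(y_i), with delta_i in {0, 1} and 0 < surv_i < 1."""
    return delta_i + math.log(surv_i)

def modified_deviance_residual(delta_i, surv_i):
    """sign(r_M) * sqrt(-2 * [r_M + delta_i * log(delta_i - r_M)])."""
    r_m = martingale_residual(delta_i, surv_i)
    inner = r_m
    if delta_i == 1:
        # For failures, delta_i - r_M = -log S(y_i) > 0.
        inner += math.log(delta_i - r_m)
    radicand = max(0.0, -2.0 * inner)      # guard against rounding below zero
    sign = (r_m > 0) - (r_m < 0)
    return sign * math.sqrt(radicand)

# Example: a failure with fitted survival 0.5 and a censored case with survival exp(-2).
res_failure = modified_deviance_residual(1, 0.5)
res_censored = modified_deviance_residual(0, math.exp(-2.0))
```

An index plot of these values and a normal Q-Q plot are the usual ways to inspect them for outliers.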

Data Analysis
In this section, we provide applications to three real data sets to demonstrate empirically the potential of the NEW and NEN models. We compare the fits of these models with some generalizations of the Weibull and normal distributions on two real data sets. The third data set refers to regression modeling. To determine the best model, we compute the Cramér-von Mises ($W^*$) and Anderson-Darling ($A^*$) goodness-of-fit statistics for all models. The statistics $W^*$ and $A^*$ are described in detail in Chen and Balakrishnan (1995). In general, the model with the smallest values of the $W^*$ and $A^*$ statistics is chosen as the best model. All computations of the MLEs are performed with the maxLik routine and all goodness-of-fit statistics are calculated with the goftest routine in R. The details are given below.
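For readers who want to see how such statistics are assembled, the sketch below follows the procedure as it is commonly summarized from Chen and Balakrishnan (1995): transform the fitted cdf values to normal scores, standardize them, map back to the unit interval, compute the Cramér-von Mises and Anderson-Darling statistics, and apply the small-sample corrections $(1 + 0.5/n)$ and $(1 + 0.75/n + 2.25/n^2)$. This is an illustrative reimplementation under those assumptions, not the goftest routine used in the paper; the clipping constant `eps` and the toy exponential fit are also assumptions.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def chen_balakrishnan_stats(fitted_cdf_values):
    """Return (W*, A*) computed from the fitted cdf evaluated at the data."""
    eps = 1e-12                      # keeps probabilities strictly inside (0, 1)
    std_normal = NormalDist()
    v = sorted(min(max(p, eps), 1.0 - eps) for p in fitted_cdf_values)
    n = len(v)
    y = [std_normal.inv_cdf(p) for p in v]            # normal scores
    y_bar, s_y = fmean(y), stdev(y)
    u = [min(max(std_normal.cdf((y_i - y_bar) / s_y), eps), 1.0 - eps) for y_i in y]
    w2 = sum((u_i - (2 * i - 1) / (2 * n)) ** 2
             for i, u_i in enumerate(u, start=1)) + 1.0 / (12 * n)
    a2 = -n - sum((2 * i - 1) * math.log(u_i) + (2 * n + 1 - 2 * i) * math.log(1.0 - u_i)
                  for i, u_i in enumerate(u, start=1)) / n
    w_star = w2 * (1.0 + 0.5 / n)
    a_star = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)
    return w_star, a_star

# Toy use: exponential data with the exponential cdf fitted by maximum likelihood.
rng = random.Random(7)
data = [rng.expovariate(1.5) for _ in range(100)]
rate_hat = 1.0 / fmean(data)
w_star, a_star = chen_balakrishnan_stats([1.0 - math.exp(-rate_hat * x) for x in data])
```

Competing models are then ranked by comparing their `w_star` and `a_star` values on the same data set.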

Stress data
The first real data set gives the stress-rupture life of Kevlar 49/epoxy strands which were subjected to constant sustained pressure at the 90% stress level until all had failed, so that we have complete data with exact failure times. This data set was studied by Andrews and Herzberg (1985) and Cooray and Ananda (2008). To assess the shape of the hrf, we use the total time on test (TTT) plot, which is built from the order statistics of the sample. The TTT plot is convex for a decreasing hrf and concave for an increasing hrf. The TTT plot for this data set is given in Figure 5, which shows a convex-concave-convex shape. Table 2 lists the MLEs of the parameters, their standard errors and the goodness-of-fit statistics for the fitted models. Table 2 shows that the NEW model could be chosen as the best model among the fitted models since it has the lowest values of the $W^*$ and $A^*$ statistics. The plots of the fitted densities, cdfs and hrfs of all models, and the probability-probability (P-P) plot of the NEW model, are displayed in Figure 6. These plots show that the NEW model provides a good fit to these data compared to the other models. The fitted hrf shape of the NEW model is close to the TTT plot of the data set. None of the models, except NEW, reproduces the shape in Figure 5(a).

For the second data set, Table 3 lists the MLEs of the parameters, their standard errors and the goodness-of-fit statistics for the fitted models. Table 3 shows that the NEN model could be chosen as the best model among the fitted models since it has the lowest values of the $W^*$ and $A^*$ statistics. The plots of the fitted densities and cdfs are given in Figure 7. The P-P plots of all models are drawn in Figure 8. These plots show that the NEN model provides a good fit to these data compared to the other models.
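The TTT plot mentioned above can be computed directly from the ordered sample. The sketch below uses the standard scaled TTT transform, $T(r/n) = [\sum_{i=1}^{r} y_{(i)} + (n - r)\, y_{(r)}] / \sum_{i=1}^{n} y_{(i)}$ for $r = 1, \ldots, n$, plotted against $r/n$; this formula is quoted from the general literature as an assumption, and the function name and the toy data are illustrative.

```python
def ttt_transform(data):
    """Return the points (r/n, T(r/n)) of the scaled TTT plot."""
    ordered = sorted(data)
    n = len(ordered)
    total = sum(ordered)
    points = []
    running = 0.0
    for r, y_r in enumerate(ordered, start=1):
        running += y_r                                   # sum of the r smallest values
        points.append((r / n, (running + (n - r) * y_r) / total))
    return points

# Toy sample (assumed values, not the stress-rupture data).
points = ttt_transform([3.0, 1.0, 4.0, 2.0])
```

A curve lying above the diagonal (concave) points to an increasing hrf, and one lying below it (convex) to a decreasing hrf.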

Figure 2: Possible pdf and hrf plots of NEW distributions for some selected parameter values.

Figure 3: Simulation results of the new extended normal distribution.

Figure 4 displays plots of this density function for some parameter values. They reveal that the LNEW density can be very flexible for modeling left-skewed and symmetric data.

Figure 4: Plots of the LNEW density for selected parameter values.

Figure 6: The fitted plots for the first data set.

Figure 7: The fitted plots for the second data set.

Figure 8: The P-P plots for the second data set.

Figure 9 displays the index plot of the modified deviance residuals and their Q-Q plot against the N(0,1) quantiles for the data set used. Based on Figure 9, none of the observations appears to be a possible outlier. Therefore, the fitted model appears to be adequate for these data.

Table 1: Empirical means and standard deviations (in parentheses) for the new extended Weibull distributions.
and Paraniaba et al. (2013).

Table 4 lists the MLEs of the model parameters of the LNEW and LW regression models fitted to the current data and the estimated minus log-likelihood values. Based on the figures in Table 4, the LNEW regression model has a lower minus log-likelihood value than the LW regression model. Therefore, it is concluded that the LNEW regression model provides a better fit than the LW regression model for this data set. Based on the estimated regression parameters, note that $\beta_0$, $\beta_1$ and $\beta_2$ are statistically significant at the usual significance levels.

Table 4: MLEs of the parameters, their standard errors and p-values, and the estimated minus log-likelihood.