On Discriminating between Gamma and Log-logistic Distributions in Case of Progressive Type II Censoring

Gamma and log-logistic distributions are two popular distributions for analyzing lifetime data. In this paper, the problem of discriminating between these two distribution functions is considered under progressive Type II censoring. The ratio of the maximized likelihoods (RML) test is used to discriminate between them. Simulation experiments were performed to see how the probability of correct selection (PCS) under each model behaves for small sample sizes. A real-life data set is analyzed to see how the proposed method works in practice. As a special case of progressive Type II censoring, the problem of discriminating between the gamma and log-logistic distributions for complete samples is also considered. The RML and the ratio of minimized Kullback-Leibler divergence (RMKLD) tests are used to discriminate between them. The asymptotic results are used to estimate the PCS, which in turn is used to calculate the minimum sample size required for discriminating between the two distributions. Two real-life data sets are analyzed.


Introduction
Choosing the correct or best-fitting distribution for a given data set is an important issue. Often, several distribution functions provide a similar fit to the data, but it is still desirable to select the correct, or more nearly correct, model. The effect of choosing the wrong model has been studied by many researchers, such as Cox (1961), Wiens (1999) and Pascual (2005).
Special attention has been given to discriminating between some specific distribution functions because of their growing range of applications. Cox (1962) suggested tests of separate families of hypotheses and applied his test to discriminate between the lognormal and exponential distributions. Atkinson (1970) combined Cox's two hypotheses in a general model and applied his test to discriminate between the lognormal and exponential distributions. The ratio of the maximized likelihood (RML) procedure has been applied in discriminating between distributions by many authors, such as Pereira (1977), Bain and Engelhardt (1980), Kappenman (1982), Firth

Maximum Likelihood Estimation for Gamma Distribution
The probability density function of the gamma distribution, denoted by GA(λ,α), with scale parameter λ > 0 and shape parameter α > 0, is defined for x > 0.
Let $X_{1:m:n}, \dots, X_{m:m:n}$ be a progressively Type II censored sample from a two-parameter gamma distribution with censoring scheme $(r_1, \dots, r_m)$. Differentiating Equation (2.1) with respect to α and setting the derivative equal to zero gives a likelihood equation in which ψ(·) is the digamma function.

Maximum Likelihood Estimation for Log-logistic Distribution
The probability density function of the log-logistic distribution, denoted by LL(ε,σ), with scale parameter ε > 0 and shape parameter σ > 0, is defined for x > 0.
Let $X_{1:m:n}, \dots, X_{m:m:n}$ be a progressively Type II censored sample from a two-parameter log-logistic distribution with censoring scheme $(r_1, \dots, r_m)$. The likelihood function is given by Equation (2.4). Differentiating Equation (2.4) with respect to ε and σ and setting the derivatives equal to zero gives Equations (2.7) and (2.8). Therefore $\hat{\varepsilon}$ and $\hat{\sigma}$ can be obtained as the solutions of Equations (2.7) and (2.8).
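For reference, the likelihood of a progressively Type II censored sample has the standard form below (see Balakrishnan and Aggarwala, 2000). The explicit densities are assumptions made here for illustration: the rate-type gamma form and the usual log-logistic form. Equations (2.1) and (2.4) of the paper may differ from them by a reparameterization.

```latex
% General likelihood under progressive Type II censoring
L(\theta) = C \prod_{i=1}^{m} f(x_{i:m:n};\theta)\,\bigl[1 - F(x_{i:m:n};\theta)\bigr]^{r_i},
\qquad
C = n\,(n - r_1 - 1)\cdots(n - r_1 - \dots - r_{m-1} - m + 1).

% Assumed densities (x > 0)
f_{GA}(x;\lambda,\alpha) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x},
\qquad
f_{LL}(x;\varepsilon,\sigma) = \frac{(\sigma/\varepsilon)\,(x/\varepsilon)^{\sigma-1}}{\bigl[1 + (x/\varepsilon)^{\sigma}\bigr]^{2}},
\qquad
1 - F_{LL}(x;\varepsilon,\sigma) = \frac{1}{1 + (x/\varepsilon)^{\sigma}}.
```

Under these assumptions the log-logistic survival function is available in closed form, whereas the gamma survival function requires the incomplete gamma function.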

The Ratio of the Maximized Likelihood (RML)
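The ratio of the maximized likelihoods (RML) is defined as $\mathrm{RML} = L_{GA}(\hat{\lambda}, \hat{\alpha}) / L_{LL}(\hat{\varepsilon}, \hat{\sigma})$, and its logarithm is denoted by $T = \ln(\mathrm{RML})$. In this discrimination procedure, choose the gamma distribution if T > 0; otherwise choose the log-logistic distribution.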
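The selection rule can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: it assumes NumPy and SciPy are available, assumes the rate-type gamma density and the usual log-logistic density, and maximizes each log-likelihood numerically with Nelder-Mead instead of solving the likelihood equations. All function names are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize


def loglik_gamma(params, x, r):
    """Progressive Type II log-likelihood (up to a constant) for the gamma model.

    Assumes f(x) = lam**alpha / Gamma(alpha) * x**(alpha - 1) * exp(-lam * x).
    """
    lam, alpha = params
    dist = stats.gamma(a=alpha, scale=1.0 / lam)
    return float(np.sum(dist.logpdf(x) + r * dist.logsf(x)))


def loglik_loglogistic(params, x, r):
    """Progressive Type II log-likelihood (up to a constant) for LL(eps, sigma)."""
    eps, sigma = params
    z = sigma * (np.log(x) - np.log(eps))   # log of (x / eps) ** sigma
    log1p_z = np.logaddexp(0.0, z)          # log of 1 + (x / eps) ** sigma
    logpdf = np.log(sigma) - np.log(x) + z - 2.0 * log1p_z
    logsf = -log1p_z
    return float(np.sum(logpdf + r * logsf))


def max_loglik(loglik, x, r, start):
    """Maximize over positive parameters by optimizing their logarithms."""
    def objective(t):
        value = -loglik(np.exp(t), x, r)
        return value if np.isfinite(value) else np.inf

    res = minimize(objective, np.log(start), method="Nelder-Mead",
                   options={"maxiter": 2000})
    return -res.fun, np.exp(res.x)


def log_rml(x, r):
    """T = maximized gamma log-likelihood minus maximized log-logistic one."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    ll_ga, _ = max_loglik(loglik_gamma, x, r, start=(1.0 / x.mean(), 1.0))
    ll_ll, _ = max_loglik(loglik_loglogistic, x, r, start=(np.median(x), 1.0))
    return ll_ga - ll_ll


def choose_model(x, r):
    """Apply the rule: gamma if T > 0, log-logistic otherwise."""
    return "gamma" if log_rml(x, r) > 0 else "log-logistic"
```

Here `x` holds the m observed failure times and `r` the removals at each failure; setting every entry of `r` to zero gives the complete-sample case.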

Numerical Experiment
In this section, the RML procedure is used to select between the gamma and log-logistic distributions under progressive Type II censoring. The PCSs involved in discriminating between the gamma and the log-logistic distributions based on the likelihood ratio can be determined more accurately through simulated samples. For simplicity, the scale and shape parameters of the gamma and log-logistic distributions are fixed at specific values, without loss of generality in assessing the relative performance of the RML procedure.

i. First, when the true distribution is gamma, the PCS is computed as follows. The algorithm given by Balakrishnan and Aggarwala (2000) is used to generate progressively Type II right-censored order statistics from the gamma distribution.

1. Generate m independent Uniform(0, 1) observations $W_1, \dots, W_m$.

2. For given values of the progressive censoring scheme $R_1, R_2, \dots, R_m$, set $V_i = W_i^{1/(i + R_m + R_{m-1} + \dots + R_{m-i+1})}$ for $i = 1, \dots, m$.

3. Set $U_i = 1 - V_m V_{m-1} \cdots V_{m-i+1}$ for $i = 1, \dots, m$, so that $U_1, \dots, U_m$ is a progressively Type II right-censored sample from the Uniform(0, 1) distribution.

4. For given values of the two parameters λ and α, set $X_{i:m:n} = F^{-1}(U_i)$ for $i = 1, \dots, m$, where $F^{-1}$ is the inverse distribution function of GA(λ, α). Then $X_{1:m:n}, \dots, X_{m:m:n}$ is a progressively Type II right-censored sample of size m from the gamma distribution.

5. After maximum likelihood estimation, both the gamma and the log-logistic distributions have been fitted to the sample, and a realization t of the statistic T is calculated and stored.

6. Steps 1 to 5 are repeated many times (in this study, 100 times).

7. The approximate PCS under the assumption that the true distribution is gamma is $\mathrm{PCS}_{GA}$ = Pr[T > 0] ≈ (number of t values from step 5 that are greater than 0)/100. The results are given in Table 1.

ii. Second, when the true distribution is log-logistic, the PCS is computed as follows. For given values of the two parameters ε and σ, a progressively Type II right-censored sample of size m is generated from the log-logistic distribution by the same algorithm.

5. After maximum likelihood estimation, both the gamma and the log-logistic distributions have been fitted to the sample, and a realization t of the statistic T is calculated and stored.

6. Steps 1 to 5 are repeated many times (in this study, 100 times).

7. The approximate PCS under the assumption that the true distribution is log-logistic is $\mathrm{PCS}_{LL}$ = Pr[T < 0] ≈ (number of t values from step 5 that are less than 0)/100. The results are given in Table 2.
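The sample-generation step of the two procedures above can be sketched as follows. This is an illustrative implementation of the uniform-transformation algorithm of Balakrishnan and Aggarwala (2000) using only the Python standard library; the function names are hypothetical. The demonstration uses the log-logistic quantile function in its usual closed form (an assumption about the paper's parameterization). For the gamma case one would pass an inverse gamma distribution function instead, for example `scipy.stats.gamma.ppf`.

```python
import random


def progressive_type2_sample(quantile, scheme, rng=random):
    """Generate a progressively Type II right-censored sample.

    quantile: inverse distribution function F^{-1}(u) of the lifetime model.
    scheme:   removals (R_1, ..., R_m); the implied n is m + sum(scheme).
    """
    m = len(scheme)
    # W_i ~ U(0, 1); 1 - random() lies in (0, 1], which avoids W_i = 0.
    w = [1.0 - rng.random() for _ in range(m)]
    v = []
    for i in range(1, m + 1):
        exponent = i + sum(scheme[m - i:])      # i + R_m + ... + R_{m-i+1}
        v.append(w[i - 1] ** (1.0 / exponent))
    u = []
    for i in range(1, m + 1):
        prod = 1.0
        for j in range(m - i, m):               # V_{m-i+1}, ..., V_m
            prod *= v[j]
        u.append(1.0 - prod)                    # U_i, increasing in i
    return [quantile(ui) for ui in u]


def loglogistic_quantile(eps, sigma):
    """Quantile function of LL(eps, sigma): eps * (u / (1 - u)) ** (1 / sigma)."""
    return lambda u: eps * (u / (1.0 - u)) ** (1.0 / sigma)


if __name__ == "__main__":
    scheme = (0, 0, 3, 0, 3, 0, 0, 5)           # m = 8, n = 19
    sample = progressive_type2_sample(
        loglogistic_quantile(1.0, 2.2), scheme, random.Random(1))
    print(sample)
```

Repeating this generation step, fitting both models to each sample and recording the sign of T gives the Monte Carlo estimate of the PCS.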

Data Analysis
Nelson (1982, p. 105) presented data on the time (in minutes) to breakdown of an insulating fluid in an accelerated test at 34 kilovolts. These data are given in Table 3. Balakrishnan and Aggarwala (2000) used Nelson's data in an example illustrating the generation of progressively Type II censored order statistics. Consider m = 8 and the censoring scheme R = (0,0,3,0,3,0,0,5). The observations and the censoring scheme are reported in Table 4. Using the RML test to discriminate between the two distributions under progressive Type II censoring, the gamma model is chosen for this data set.

In Case of Complete Samples
As a special case of progressive censoring, this section considers the problem of discriminating between the gamma and log-logistic distribution functions for complete samples, that is, when the censoring scheme is $R_i = 0$.

Likelihood Ratio Test
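Putting $R_i = 0$ in (2.1) and (2.4) gives the log-likelihood functions for complete samples. The estimators $\hat{\lambda}$ and $\hat{\alpha}$ can be obtained as the solutions of the two likelihood equations of the gamma model. Similarly, $\hat{\varepsilon}$ and $\hat{\sigma}$ can be obtained as the solutions of the two likelihood equations of the log-logistic model. The logarithm of the RML is $T = \ln L_{GA}(\hat{\lambda}, \hat{\alpha}) - \ln L_{LL}(\hat{\varepsilon}, \hat{\sigma})$. Here $(\hat{\lambda}, \hat{\alpha})$ and $(\hat{\varepsilon}, \hat{\sigma})$ are the maximum likelihood estimators of (λ, α) and (ε, σ), respectively.
In this discrimination procedure, choose the gamma distribution if T > 0; otherwise choose the log-logistic distribution.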

Asymptotic Properties of the RML under Null Hypotheses
In this section, the asymptotic distributions of the RML statistic are obtained under the null hypotheses in two different cases. From now on, almost sure convergence is denoted by a.s.
Proof of Lemma 1. The proof follows from arguments similar to those of White (1982, Theorem 1) and is therefore omitted.

Lemma 2.
Under the assumption that the data are from LL(ε, σ), we have the following results as n → ∞.

Theorem 1.
Under the assumption that the data are from a gamma distribution, T is approximately normally distributed with mean $E_{GA}(T)$ and variance $V_{GA}(T)$. Proof: Using the central limit theorem and part (iii) of Lemma 1, one can easily show that T is asymptotically normally distributed.

Theorem 2.
Under the assumption that the data are from a log-logistic distribution, T is approximately normally distributed with mean $E_{LL}(T)$ and variance $V_{LL}(T)$. The proof follows along the same lines as that of Theorem 1.

Case (1):
The data come from a gamma distribution and the alternative is a log-logistic distribution.
Assume that n data points $x_1, x_2, \dots, x_n$ are obtained from GA(λ, α) with scale parameter λ and shape parameter α.

Pak.j.stat.oper.res. Vol.XIII No.1 2017 pp157-183

Case (2):
The data come from a log-logistic distribution and the alternative is a gamma distribution.
Assume that n data points $x_1, x_2, \dots, x_n$ are obtained from LL(ε, σ) with scale parameter ε and shape parameter σ. The quantities needed to obtain the two limiting parameter values in this case are defined if σ > 1 and are undefined otherwise.

Determination of Sample Size
In this section, we propose a method to determine the minimum sample size needed to discriminate between the gamma and log-logistic distributions for a given user-specified probability of correct selection (PCS). Intuitively, if two distributions are very close, a very large sample is needed to discriminate between them for a given PCS. On the other hand, if two distribution functions are quite different, a very large sample may not be needed. It is also true that if two distribution functions are very close to each other, there may be no need to distinguish between them.
It is expected that the user will specify beforehand the PCS and a tolerance limit in terms of the distance between the two distribution functions. The tolerance limit simply indicates that the user does not want to distinguish between two distribution functions if their distance is less than the tolerance limit. Based on the PCS and the tolerance limit, the required minimum sample size can be determined.
Section 3.2 showed that the RML statistic is approximately normally distributed for large n. This result is now used, together with the K-S distance, to determine the required sample size n.
Using Theorem 1, and since the data are assumed to come from GA(λ, α), the probability of correct selection is
$$\mathrm{PCS}_{GA} = P(T > 0) \approx \Phi\left(\frac{E_{GA}(T)}{\sqrt{V_{GA}(T)}}\right) = \Phi\left(\frac{\sqrt{n}\, AM_{GA}}{\sqrt{AV_{GA}}}\right).$$
Similarly, using Theorem 2, and since the data are assumed to come from LL(ε, σ), the PCS is
$$\mathrm{PCS}_{LL} = P(T < 0) \approx \Phi\left(\frac{-E_{LL}(T)}{\sqrt{V_{LL}(T)}}\right) = \Phi\left(\frac{-\sqrt{n}\, AM_{LL}}{\sqrt{AV_{LL}}}\right),$$
where AM and AV denote the asymptotic mean and variance, respectively, so that E(T) ≈ n·AM and V(T) ≈ n·AV, and Φ is the distribution function of the standard normal random variable.
Therefore, to determine the minimum sample size required to achieve at least an α* protection level, set $\mathrm{PCS}_{GA} = \alpha^*$ and $\mathrm{PCS}_{LL} = \alpha^*$. Solving these equations for n gives $n_1$ and $n_2$ as
$$n_1 = \frac{z_{\alpha^*}^2\, AV_{GA}}{AM_{GA}^2}, \qquad n_2 = \frac{z_{\alpha^*}^2\, AV_{LL}}{AM_{LL}^2}.$$
Here $z_{\alpha^*}$ is the α*-th percentile point of the standard normal distribution. Table 7 reports $n_1$, computed with the help of Table 5, for different values of α when the protection level is α* = 0.7. Table 8 reports $n_2$, computed with the help of Table 6, for different values of σ at the same protection level.
Therefore, the minimum sample size n required to discriminate between the gamma and log-logistic distributions can be taken as max($n_1$, $n_2$).
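The sample-size calculation can be sketched with the Python standard library. The sketch assumes the usual normal approximation, in which the mean and variance of T both grow linearly in n, so that the required n is $z^2 \cdot AV / AM^2$ for a protection level above 0.5. The function names and the numerical AM and AV values in the demonstration are hypothetical and are not taken from Tables 5 and 6.

```python
import math
from statistics import NormalDist


def min_sample_size(am, av, p_star):
    """Smallest n with Phi(sqrt(n) * |am| / sqrt(av)) >= p_star (p_star > 0.5)."""
    z = NormalDist().inv_cdf(p_star)
    return math.ceil(z * z * av / (am * am))


def rescale_factor(p_old, p_new):
    """Factor by which the required n changes with the protection level."""
    z_old = NormalDist().inv_cdf(p_old)
    z_new = NormalDist().inv_cdf(p_new)
    return (z_new / z_old) ** 2


if __name__ == "__main__":
    # Hypothetical asymptotic means and variances, for illustration only.
    n1 = min_sample_size(am=0.10, av=0.20, p_star=0.7)    # gamma is the null
    n2 = min_sample_size(am=-0.03, av=0.25, p_star=0.7)   # log-logistic is the null
    print(max(n1, n2))
    print(rescale_factor(0.7, 0.9))
```

Because n is proportional to the squared normal percentile under this approximation, a table computed for one protection level can be converted to another level by a single multiplicative factor, which is what `rescale_factor` returns.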
The distance between two distribution functions is measured by the Kolmogorov-Smirnov (K-S) distance. The K-S distance between two distribution functions, say F(x) and G(x), is defined as $\sup_x |F(x) - G(x)|$. The K-S distances between GA(1, α) and the corresponding log-logistic distribution are reported in Tables 7 and 8. If the range of the shape parameter of the null distribution is known, then for a given protection level P* the minimum sample size can be obtained by taking the maximum n obtained from Equations (3.12) and (3.13).
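A K-S distance of this kind can be approximated numerically on a grid. The sketch below uses only the standard library and is illustrative: it takes GA(1, 1), which is the unit exponential and has a closed-form distribution function, and a log-logistic distribution whose parameter values are hypothetical. For a general shape α, the gamma distribution function needs the regularized incomplete gamma function (available, for instance, as `scipy.special.gammainc`).

```python
import math


def ks_distance(cdf_f, cdf_g, upper, steps=100_000):
    """Approximate sup_x |F(x) - G(x)| on a uniform grid over (0, upper]."""
    best = 0.0
    for k in range(1, steps + 1):
        x = upper * k / steps
        best = max(best, abs(cdf_f(x) - cdf_g(x)))
    return best


def exponential_cdf(x):
    """Distribution function of GA(1, 1), i.e. the unit exponential."""
    return 1.0 - math.exp(-x)


def loglogistic_cdf(eps, sigma):
    """Distribution function of LL(eps, sigma) in its usual form."""
    def cdf(x):
        t = (x / eps) ** sigma
        return t / (1.0 + t)
    return cdf


if __name__ == "__main__":
    # eps and sigma are hypothetical values chosen only for illustration.
    print(ks_distance(exponential_cdf, loglogistic_cdf(0.7, 1.5), upper=50.0))
```

The grid must extend far enough into the right tail for both distribution functions to be close to one; the upper limit of 50 is an assumption suited to these particular parameter values.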
In practice, however, the shape parameter may be completely unknown; the K-S distance can then replace the unknown parameter in making the decision. That is, for a given protection level P* and a pre-specified tolerance limit D*, the minimum sample size can be obtained by taking the maximum n obtained from Equations (3.12) and (3.13). For example, for P* = 0.7, α = 2.2 and σ = 2.2, Tables 7 and 8 give the minimum sample size required to discriminate between the gamma and log-logistic distributions as max(21, 125) = 125.
On the other hand, suppose that α and σ are unknown and that the practitioner wants to discriminate between the gamma and log-logistic distribution functions only when the distance between them is greater than or equal to 0.217, i.e., D* ≥ 0.217, with P* = 0.7. Then, from Tables 7 and 8, D* ≥ 0.217 if α ≥ 1.2 and σ ≥ 2.2. When the null distribution is gamma, for the tolerance limit D* ≥ 0.217 one needs n = 5 to meet the PCS P* = 0.7. Similarly, when the null distribution is log-logistic, one needs n = 125 to meet the same protection level. Finally, the minimum sample size required to discriminate between the gamma and log-logistic distributions with P* = 0.7 and D* ≥ 0.217 is max(5, 125) = 125. Note that Tables 7 and 8 are obtained for the protection level 0.7, but they can easily be modified for other protection levels. For example, if we need the sample size corresponding to the protection level P* = 0.9, then all the entries in the row of n must be multiplied by

Numerical Experiments
In this section, some experimental results are presented to examine how the asymptotic results derived in Section 3.2 behave for finite sample sizes. Moreover, when the sample size is not sufficiently large, the PCSs involved in discriminating between the gamma and the log-logistic distributions based on the likelihood ratio can be determined more accurately through Monte Carlo (MC) simulation. The PCSs obtained from simulations are compared with those based on the asymptotic results derived in Section 3.2.
Different sample sizes and different shape parameters of the null distributions are considered. The details are explained below. In the first case, the null distribution is gamma and the alternative is log-logistic. Here we consider n = 20, 40, 60, 80, 100 and α = 1.2, 1.5, 2, 2.2, 2.5 and 3. The PCS is computed as follows.

1. For a sample size n, a random sample {$x_1, x_2, \dots, x_n$} is generated from a GA(1,α) distribution.

2. Both the gamma and the log-logistic distributions are fitted to the sample by maximum likelihood, and a realization t of the statistic T is calculated and stored.

3. Steps 1 and 2 are repeated many times (in this study, 10,000 times).

4. The approximate PCS under the assumption that the true distribution is gamma is $\mathrm{PCS}_{GA}$ = Pr[T > 0] ≈ (number of t values from step 2 that are greater than 0)/10,000.
The PCSs obtained by using the asymptotic results given in Section 3.2 are also computed. The results are reported in Table 9 in Section 3.5.
Similarly, results are obtained when the null distribution is log-logistic and the alternative is gamma. In this case we consider the same set of n and σ = 2.2, 2.5, 3, 3.2, 3.5 and 4. The results are reported in Table 10 in Section 3.5.
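The Monte Carlo procedure for complete samples can be sketched as follows for the case in which gamma is the null distribution. This is an illustration under stated assumptions, not the authors' code: it assumes NumPy and SciPy, the rate-type gamma and usual log-logistic densities, numerical maximization by Nelder-Mead with simple starting values, and a smaller default number of replications than the 10,000 used in the study. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def gamma_max_loglik(x):
    """Maximized complete-sample log-likelihood of the gamma model."""
    n, sx, slx = len(x), x.sum(), np.log(x).sum()

    def neg_ll(t):
        log_lam, log_alpha = t
        lam, alpha = np.exp(log_lam), np.exp(log_alpha)
        return -(n * (alpha * log_lam - gammaln(alpha))
                 + (alpha - 1.0) * slx - lam * sx)

    mean, var = x.mean(), x.var()
    start = np.log([mean / var, mean * mean / var])   # moment-based start
    return -minimize(neg_ll, start, method="Nelder-Mead").fun


def loglogistic_max_loglik(x):
    """Maximized complete-sample log-likelihood of the log-logistic model."""
    n, lx = len(x), np.log(x)

    def neg_ll(t):
        log_eps, log_sigma = t
        z = np.exp(log_sigma) * (lx - log_eps)        # log of (x / eps) ** sigma
        return -(n * log_sigma - lx.sum() + z.sum()
                 - 2.0 * np.logaddexp(0.0, z).sum())

    start = np.array([np.median(lx), 0.5])            # log-median as log-scale
    return -minimize(neg_ll, start, method="Nelder-Mead").fun


def pcs_gamma(n, alpha, reps=1000, seed=0):
    """Monte Carlo estimate of Pr[T > 0] when the data come from GA(1, alpha)."""
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(reps):
        x = rng.gamma(shape=alpha, scale=1.0, size=n)
        t = gamma_max_loglik(x) - loglogistic_max_loglik(x)
        if t > 0:
            wins += 1
    return wins / reps


if __name__ == "__main__":
    print(pcs_gamma(n=20, alpha=1.2))
```

The case in which the log-logistic distribution is the null follows the same pattern, with samples drawn from LL(1, σ) and the proportion of replications with T < 0 recorded instead.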

PCS of RMKLD
In probability and information theory, the Kullback-Leibler divergence (also called information discrepancy, information gain, relative entropy, or KLD) is a non-symmetric measure of the difference (dissimilarity) between two probability distributions $f_1(x)$ and $f_2(x)$. In other words, the KLD is a measure of the inefficiency of assuming that the distribution is $f_2(x)$ when the true distribution is $f_1(x)$. Because the measure from $f_1(x)$ to $f_2(x)$ is not the same as the measure from $f_2(x)$ to $f_1(x)$, it can be conceptualized as a "directed/oriented distance" between the two models.
The KLD is a natural distance function between models. It is usually used as a logical basis for model selection in conjunction with likelihood inference. Values of the KLD are not based only on the mean and variance of the distributions; rather, the distributions in their entirety are the subject of comparison. The latter is regarded as an advantage of the KLD as a test statistic. The KLD is defined as follows:
$$\mathrm{KLD}(f_1 \,\|\, f_2) = \int f_1(x) \ln\frac{f_1(x)}{f_2(x)}\, dx.$$
Here $\mathrm{KLD}(f_1 \,\|\, f_2) \ge 0$, and equality holds if and only if $f_1(x) = f_2(x)$. For small values of the KLD, "$f_1(x)$" is preferred, and large values of the KLD favor "$f_2(x)$". The KLD-based test statistic is thus used as a ruler to measure the similarity between the two hypotheses or distributions.
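The non-symmetry of the KLD can be checked numerically. The sketch below, using only the standard library, approximates the defining integral by the midpoint rule for two exponential densities, chosen here only because their divergence has the closed form ln(a/b) + b/a − 1 for rates a and b; the exponential with rate 1 coincides with GA(1, 1). The function names and the integration limits are assumptions made for illustration.

```python
import math


def kl_divergence(f1, f2, upper, steps=200_000):
    """Midpoint-rule approximation of the integral of f1 * ln(f1 / f2) on (0, upper)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        p, q = f1(x), f2(x)
        if p > 0.0 and q > 0.0:
            total += p * math.log(p / q) * h
    return total


def exponential_pdf(rate):
    """Density of the exponential distribution with the given rate."""
    return lambda x: rate * math.exp(-rate * x)


if __name__ == "__main__":
    f1, f2 = exponential_pdf(1.0), exponential_pdf(2.0)
    print(kl_divergence(f1, f2, upper=40.0))   # divergence from f1 to f2
    print(kl_divergence(f2, f1, upper=40.0))   # divergence from f2 to f1
```

The two printed values differ, which illustrates why the KLD is described as a directed distance rather than a metric.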
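The test statistic is defined as the natural logarithm of the ratio of two KLDs. The idea is similar to that of the RML, in which we are interested in selecting the model that maximizes the likelihood. In the KLD method, however, we are interested in selecting the model that minimizes the KLD, which is why it is called the ratio of minimized KLD (RMKLD). It is defined as $\mathrm{RMKLD} = \ln(\mathrm{KLD}_{GA} / \mathrm{KLD}_{LL})$, where $\mathrm{KLD}_{GA}$ and $\mathrm{KLD}_{LL}$ denote the minimized KLDs under the gamma and log-logistic models, respectively. This section presents how the proposed RMKLD test statistic works for different parameters and sample sizes.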
First, consider case 1, in which the null distribution is gamma. The RMKLD testing procedure is as follows.

1. For a sample size n, a random sample {$x_1, x_2, \dots, x_n$} is generated from a GA(1,α) distribution.

2. Calculate the KLD for each of the fitted gamma and log-logistic models.

3. The RMKLD is calculated and stored.

4. Steps 1 to 3 are repeated many times (in this study, 10,000 times).

5. Compute the proportion of times that RMKLD < 0 as the PCS of the gamma distribution: $\mathrm{PCS}_{GA}$ = (number of RMKLD values from step 3 that are less than 0)/10,000. The results are reported in Table 9.
Similarly, $\mathrm{PCS}_{LL}$ is obtained when the null distribution is log-logistic and the alternative is gamma. The results are reported in Table 10. In Tables 9 and 10, the element in the first row of each box is the result based on simulations (10,000 replications) for the ratio of the maximized likelihood functions, denoted by RMLS; the number in brackets immediately below it is the result obtained from the asymptotic results for the ratio of the maximized likelihood functions, denoted by RMLAC; and the element in the third row is the RMKLD result.
It is quite clear from Tables 9 and 10 that the PCS increases as the sample size increases, as expected. It is also clear that the PCS increases as the shape parameter moves away from 1, even when the sample size is 20.
Interestingly, when the null distribution is gamma, the PCS based on MC simulation, the asymptotic results (AR) and the RMKLD is considerably higher than in the other case, particularly for small sample sizes. For example, when the sample size is 20 and the null distribution is gamma, the PCSs for MC, AR and RMKLD are 0.77, 0.75 and 0.83, respectively. When the null distribution is log-logistic, for the same sample size the PCSs for MC, AR and RMKLD are 0.51, 0.47 and 0.57, respectively.
The asymptotic results work reasonably well for both distributions and for all the parameter ranges considered. The simulation study suggests that the asymptotic results can be used quite effectively even when the sample size is as small as 20, for all the choices of the shape parameters considered. Moreover, the PCS increases as the shape parameter increases.
Tables 9 and 10 also show that the RMKLD works better than the RML, because it yields a higher PCS (by about 3-9%), particularly for small sample sizes. In other words, the type I error for the RMKLD is noticeably smaller than that for the RML. For instance, in Table 9 at α = 1.2, the PCS is 83% for the RMKLD and 77% for the RML, which corresponds to a type I error of 17% for the RMKLD and 23% for the RML. Similarly, in Table 10 at σ = 2.2, the PCS is 57% for the RMKLD and 51% for the RML, which corresponds to a type I error of 43% for the RMKLD and 49% for the RML. Both methods also behave similarly in that the PCS increases as the sample size increases, as expected.

Data Analysis
For illustrative purposes, two data sets are analyzed using the RML method and some goodness-of-fit tests.


Therefore, the results show that the fitted gamma distribution is much closer to the empirical distribution function than the fitted log-logistic distribution. Results are reported in Table 11.


Therefore, the results show that the fitted log-logistic distribution is much closer to the empirical distribution function than the fitted gamma distribution. Results are reported in Table 12.

Conclusion
The problem of discriminating between the gamma and log-logistic distributions is considered under progressive Type II censoring. The RML test is used to discriminate between them. Simulation experiments were performed to see how the PCS under each model behaves for small sample sizes. It is observed that the PCS works quite well even for small sample sizes. Interestingly, when gamma is the true distribution, the PCS based on simulation is considerably higher than in the other case, particularly for small sample sizes. In addition, applying the RML test to a real-life data set selects the gamma model.
As a special case of progressive Type II censoring, with censoring scheme $R_i = 0$, the problem of discriminating between the gamma and log-logistic distributions for complete samples is considered. The RML and the RMKLD are used to discriminate between them. The asymptotic results are used to estimate the PCS, which is used to calculate the minimum sample size required for discriminating between the two distributions. The PCSs from simulations are compared with those from the asymptotic results and the RMKLD, and it is observed that the asymptotic results work quite well over a wide range of the parameter space even when the sample size is very small. It is also noticed that when the null distribution is gamma, the PCS is considerably higher than in the other case, particularly for small sample sizes. Two real-life data sets are analyzed, and applying the RML test we found that the gamma model is chosen for these two data sets.