Generalized Poisson-Lindley Distribution in the Promotion Time Cure Model

Long-term survival analysis has advanced over the last decade, and most models build on the promotion time cure model proposed by Chen et al. (1999). These models are based on the distribution of a latent variable N, the number of initiated node cells. In this paper we propose the Generalized Poisson-Lindley distribution as an alternative to the Negative Binomial distribution when there is overdispersion. Because of its greater flexibility, the proposed model fitted better than the alternatives. Parameters were estimated with a Bayesian approach on a real data set, and a simulation study shows the advantages of the proposed model.


Introduction
For analyzing count data with overdispersion, it is common to use the Negative Binomial (NB) distribution instead of the Poisson (P) distribution. It is well known that when the parameter of the Poisson distribution follows a Gamma distribution, the Negative Binomial distribution is obtained. An alternative to the Gamma distribution is the Lindley or Generalized Lindley distribution, and the resulting distribution is called the Poisson-Lindley (PL) or Generalized Poisson-Lindley (GPL) distribution.
In this paper we use the GPL distribution in long-term survival analysis, where two broad classes of models are used. The first, introduced by Boag (1949) and Berkson and Gage (1952), is called the mixture model. The second, introduced by Yakovlev and Tsodikov (1996) and Chen et al. (1999), is called the non-mixture cure model or promotion time cure model; in the cancer relapse setting it assumes that lymph node cells act as competing causes that produce detectable tumor cells. Cooner et al. (2007) generalized this approach to a flexible class of cure models under different latent activation schemes. Several authors have considered different distributions for the number of competing causes, such as the Poisson, Geometric, Negative Binomial, Conway-Maxwell Poisson, or generalized power series distributions (see e.g. Chen et al., 1999; Cooner et al., 2007; Cancho et al., 2011; Rodrigues et al., 2009; Borges et al., 2012).
In this paper, the Generalized Poisson-Lindley distribution is proposed for the number of lymph node cells in the promotion time cure model, in order to obtain a more flexible model for fitting a published data set.
A Bayesian framework was adopted for parameter estimation. Since the posterior distributions do not have a closed form and the model has a complex structure, Markov chain Monte Carlo (MCMC) methods were employed. To compare the models, the deviance information criterion (DIC) was applied, with the smallest value indicating the best fit.
The rest of this paper is organized as follows. The next section introduces the Generalized Poisson-Lindley distribution. The third section proposes the GPL distribution in the promotion time cure model. Statistical modeling and parameter estimation are discussed in Section 4. Section 5 is devoted to the application of the model to a cutaneous melanoma data set and to a simulation study. The results are discussed and conclusions are drawn in the final section.

Generalized Poisson-Lindley Distribution:
The Generalized Lindley (GL) distribution was introduced by Zakerzadeh and Dolati (2010). They noted that this distribution can be used in place of the Gamma and Weibull distributions for analyzing lifetime or skewed data.
Suppose $X \mid \lambda \sim P(\lambda)$ and $\lambda \sim GL(\alpha, \theta)$. Then $X \sim GPL(\alpha, \theta)$, with probability function given by (2), where $\theta$ is the scale parameter and $\alpha$ is the shape parameter. If $\alpha = 1$, this distribution reduces to the Poisson-Lindley distribution studied by Ghitany et al. (2008), which in many ways is a better distribution for modeling count data. Ghitany and Al-Mutairi have carried out a complete comparison between the Negative Binomial and Poisson-Lindley distributions; for more detail see (12).
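For reference, the density and probability function that this construction leads to can be written out as follows. This is a sketch that assumes the two-parameter generalized Lindley density with shape $\alpha$ and parameter $\theta$ as used by Mahmoudi and Zakerzadeh (2010); the symbols and the exact parametrization are assumptions and may differ from the notation of the original displays.

```latex
% Assumed generalized Lindley density for the Poisson mean \lambda
f(\lambda;\alpha,\theta)
  = \frac{\theta^{\alpha+1}}{(\theta+1)\,\Gamma(\alpha+1)}\,
    \lambda^{\alpha-1}(\alpha+\lambda)\,e^{-\theta\lambda},
  \qquad \lambda>0,\ \alpha>0,\ \theta>0 .

% Resulting GPL probability function, obtained by integrating out \lambda
P(X=x)
  = \int_0^\infty \frac{e^{-\lambda}\lambda^{x}}{x!}\,f(\lambda;\alpha,\theta)\,d\lambda
  = \frac{\Gamma(x+\alpha)}{x!\,\Gamma(\alpha+1)}\,
    \frac{\theta^{\alpha+1}}{(1+\theta)^{x+\alpha+1}}
    \left(\alpha+\frac{x+\alpha}{1+\theta}\right),
  \qquad x=0,1,2,\ldots
```

Under this assumed form, setting $\alpha = 1$ gives $P(X=x) = \theta^{2}(x+\theta+2)/(\theta+1)^{x+3}$, which is the Poisson-Lindley probability function.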
For comparison with the NB distribution proposed by Cancho et al. (2011) and the well-known P distribution, see Table 1 for their properties.
The dispersion index of the GPL distribution tends to 1 as its scale parameter grows, so this distribution is equi-dispersed for sufficiently large values of that parameter. The Negative Binomial distribution is overdispersed for all values of its parameters and is equi-dispersed only in a limiting case.
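The dispersion behaviour of the GPL distribution can be checked numerically. The following is a minimal Python sketch, assuming the GPL arises from a Poisson distribution whose mean follows a generalized Lindley distribution written as a two-component gamma mixture with shape `alpha` and rate-type parameter `theta` (our reading of the Mahmoudi and Zakerzadeh (2010) parametrization). The function names are illustrative and do not come from the paper.

```python
import math

def nb_pmf(x, shape, theta):
    """Poisson-gamma mixture pmf: Poisson mean ~ Gamma(shape, rate=theta)."""
    log_p = (math.lgamma(x + shape) - math.lgamma(shape) - math.lgamma(x + 1)
             + shape * math.log(theta / (1.0 + theta))
             - x * math.log(1.0 + theta))
    return math.exp(log_p)

def gpl_pmf(x, alpha, theta):
    """Assumed GPL pmf: mixture of two negative binomials.

    Weight theta/(1+theta) on gamma shape alpha, 1/(1+theta) on alpha + 1.
    """
    w = theta / (1.0 + theta)
    return w * nb_pmf(x, alpha, theta) + (1.0 - w) * nb_pmf(x, alpha + 1.0, theta)

def gpl_mean_var(alpha, theta, x_max=2000):
    """Mean and variance by direct summation of the pmf up to x_max."""
    probs = [gpl_pmf(x, alpha, theta) for x in range(x_max)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var

def dispersion_index(alpha, theta):
    """Variance-to-mean ratio; values above 1 indicate overdispersion."""
    mean, var = gpl_mean_var(alpha, theta)
    return var / mean
```

Under these assumptions, `dispersion_index(2.0, 0.5)` is well above 1, while `dispersion_index(2.0, 50.0)` is only slightly above 1, which matches the statement that the distribution approaches equi-dispersion for large values of the scale parameter.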

Promotion Time Cure Model with Generalized Poisson-Lindley Distribution:
In the promotion time cure rate model introduced by Cooner et al. (2007), under the first activation scheme, let $N$ be the number of competing causes (lymph nodes that remain active after treatment), let $Z_j$, $j = 1, \ldots, N$, be the time for the $j$th competing cause to produce detectable tumor cells, and define the observable time to event as $T = \min\{Z_1, \ldots, Z_N\}$, with $T = \infty$ when $N = 0$ and with $N$ independent of $Z_1, Z_2, \ldots$. The survival function for the population is then $S_{pop}(t) = G_N(S(t))$, where $G_N$ is the probability generating function of $N$ and $S(t)$ is the common survival function of the promotion times of the $N$ lymph nodes, which can be any of the usual survival functions, such as the Weibull or piecewise exponential. The population density function follows by differentiating $1 - S_{pop}(t)$ with respect to $t$. In this model the cure fraction is $p_0 = P(N = 0) = G_N(0)$, and the parameter of $N$ depends on the covariates, so the relation between the covariates and the cure rate is direct, as in the Poisson model. For example, the cure rate increases as the coefficient of a covariate increases.
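The population survival function and the cure fraction can be sketched in a few lines. This example assumes the GPL probability generating function $G_N(s) = \left(\frac{\theta}{\theta+1-s}\right)^{\alpha+1}\frac{\theta+2-s}{\theta+1}$, which follows from the gamma-mixture form of the generalized Lindley distribution with shape `alpha` and parameter `theta`, and a Weibull promotion time with survival function $\exp(-t^{\gamma}e^{\lambda})$. Both parametrizations and all function names are assumptions made for illustration.

```python
import math

def gpl_pgf(s, alpha, theta):
    """Assumed probability generating function of the GPL distribution."""
    return ((theta / (theta + 1.0 - s)) ** (alpha + 1.0)
            * (theta + 2.0 - s) / (theta + 1.0))

def weibull_surv(t, shape, log_scale):
    """Assumed Weibull survival function S(t) = exp(-t**shape * exp(log_scale))."""
    return math.exp(-(t ** shape) * math.exp(log_scale))

def pop_survival(t, alpha, theta, shape, log_scale):
    """Population survival: the pgf of N evaluated at S(t)."""
    return gpl_pgf(weibull_surv(t, shape, log_scale), alpha, theta)

def cure_fraction(alpha, theta):
    """Cure fraction P(N = 0), i.e. the pgf evaluated at 0."""
    return gpl_pgf(0.0, alpha, theta)
```

In this sketch the population survival starts at 1 for `t = 0` and levels off at `cure_fraction(alpha, theta)` for large `t`, and the cure fraction increases with `theta`.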

Statistical method

Likelihood function
Suppose that there are $n$ subjects, and let $N_i$ be the number of lymph nodes, representing the number of competing causes that can produce detectable tumor cells for the $i$th subject. Let $T_i$ and $C_i$ denote, respectively, the lifetime and the censoring time for the $i$th subject, so that we observe $t_i = \min\{T_i, C_i\}$ together with the indicator $\delta_i = I(T_i \le C_i)$, where $\delta_i = 1$ if $T_i \le C_i$ and $\delta_i = 0$ if $T_i > C_i$. For the $i$th individual, the observed data are therefore $D_{obs} = \{t_i, \delta_i, X_i\}$, where $X_i$ contains the covariates. We assume that the $N_i$, $i = 1, 2, \ldots, n$, are independent Generalized Poisson-Lindley variables with probability function given by (2), and that, given $N_i = n_i$, the promotion times are independent with the Weibull distribution described by (7). Under right censoring, the corresponding likelihood function is the product over subjects of the population density at $t_i$ when $\delta_i = 1$ and the population survival function at $t_i$ when $\delta_i = 0$, where the promotion times have the Weibull survival function (7).
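The right-censored likelihood can be sketched as follows. This is an illustration only: it assumes the GPL probability generating function implied by the gamma-mixture form of the generalized Lindley distribution, a Weibull promotion time with survival function $\exp(-t^{\gamma}e^{\lambda})$, and a log link `theta = exp(x'beta)` between the covariates and the GPL parameter. The link, the parametrizations, and all function names are assumptions rather than the paper's stated choices.

```python
import math

def weibull_surv(t, shape, log_scale):
    """Assumed Weibull survival function S(t) = exp(-t**shape * exp(log_scale))."""
    return math.exp(-(t ** shape) * math.exp(log_scale))

def weibull_dens(t, shape, log_scale):
    """Density matching weibull_surv."""
    return (shape * t ** (shape - 1.0) * math.exp(log_scale)
            * weibull_surv(t, shape, log_scale))

def gpl_pgf(s, alpha, theta):
    """Assumed probability generating function G(s) of the GPL distribution."""
    return ((theta / (theta + 1.0 - s)) ** (alpha + 1.0)
            * (theta + 2.0 - s) / (theta + 1.0))

def gpl_pgf_deriv(s, alpha, theta):
    """Derivative G'(s) of the assumed pgf."""
    u = theta + 1.0 - s
    return (theta ** (alpha + 1.0) / (theta + 1.0)
            * u ** (-(alpha + 2.0)) * (alpha * u + alpha + 1.0))

def log_likelihood(times, events, covariates, beta, alpha, shape, log_scale):
    """Log-likelihood under right censoring.

    events[i] = 1 contributes log f_pop(t_i) = log(f(t_i) * G'(S(t_i)));
    events[i] = 0 contributes log S_pop(t_i) = log G(S(t_i)).
    """
    total = 0.0
    for t, d, x in zip(times, events, covariates):
        # Hypothetical log link between covariates and the GPL parameter.
        theta = math.exp(sum(b * xi for b, xi in zip(beta, x)))
        s = weibull_surv(t, shape, log_scale)
        if d == 1:
            total += math.log(weibull_dens(t, shape, log_scale)
                              * gpl_pgf_deriv(s, alpha, theta))
        else:
            total += math.log(gpl_pgf(s, alpha, theta))
    return total
```

Each covariate row is expected to include a leading 1 if an intercept is wanted, for example `covariates = [[1.0, 0.0], [1.0, 1.0]]` with `beta = [-0.5, 0.8]`.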

Parameter Estimation:
For parameter estimation we employed the Bayesian approach using MCMC methods. We took non-informative priors so that the likelihood function dominates the posterior distribution. Without loss of generality, we assumed that the prior distributions are independent.
We considered a uniform improper prior for the regression coefficients, a Normal prior for the scale parameter of the promotion time distribution, and Gamma priors for its shape parameter and for the shape parameter of the GPL distribution. Combining these prior distributions with the likelihood function gives the joint posterior distribution (8). Because the joint posterior density in equation (8) is analytically intractable, we applied Markov chain Monte Carlo (MCMC) simulation, carried out with the Metropolis-Hastings algorithm [6]. Bayesian estimates are calculated for each parameter from samples drawn from its conditional posterior distribution, which is usually derived from the joint distribution of the parameters given the observations. In this model the joint posterior distribution of the parameters takes a complicated form, and it is too difficult to derive the marginal posterior distribution of each parameter. Hence, Markov chains are a good tool for approximating the distribution of interest: sampling from such a chain after an adequate burn-in period yields good approximations of the model parameters. In this study, the Metropolis algorithm and Gibbs sampling were implemented in WinBUGS 1.4 [14].
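To make the sampling idea concrete, here is a minimal random-walk Metropolis sketch in Python. It is not the WinBUGS program used in the study; it is a generic illustration that takes any log-posterior function and returns thinned draws after a burn-in period. The function name, the Gaussian proposal, and the tuning arguments are all assumptions made for this example.

```python
import math
import random

def metropolis(log_post, init, n_iter, step, burn_in, thin, seed=0):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal.

    log_post: function mapping a parameter list to its log-posterior value.
    Returns the draws kept after burn-in, one every `thin` iterations.
    """
    rng = random.Random(seed)
    current = list(init)
    current_lp = log_post(current)
    draws = []
    for it in range(n_iter):
        # Propose a move by perturbing every coordinate independently.
        proposal = [c + rng.gauss(0.0, step) for c in current]
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        # Keep a thinned sample once the burn-in period is over.
        if it >= burn_in and (it - burn_in) % thin == 0:
            draws.append(list(current))
    return draws
```

As a usage example, `metropolis(lambda p: -0.5 * p[0] ** 2, [0.0], 50000, 1.0, 10000, 10)` samples a standard normal target and keeps 4000 draws; for the cure model, `log_post` would be the log-likelihood plus the log-prior densities.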

Model Comparison Criteria:
To compare the models, the DIC was computed for each model. The DIC, proposed by Spiegelhalter et al. [15], is one of the best criteria for comparing Bayesian models [25]. Let $\theta$ be the vector of model parameters. The DIC is defined by $DIC = \bar{D}(\theta) + p_D = D(\hat{\theta}) + 2p_D$, where $D(\hat{\theta})$ is the deviance of the model evaluated at the posterior mean estimate $\hat{\theta}$, and $\bar{D}(\theta)$ is the posterior mean of the deviance, obtained by averaging minus twice the log-likelihood over the draws after the burn-in period; it measures goodness of fit. The term $p_D = \bar{D}(\theta) - D(\hat{\theta})$ is the difference between the posterior mean of the deviance and the deviance at the posterior mean of the parameters of interest; it represents the effective number of parameters in the model and is therefore an indicator of model complexity. Based on this measure, the model with the smallest DIC value is considered the best.
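The DIC calculation can be sketched directly from posterior output. The function below is a minimal illustration, assuming the deviance (minus twice the log-likelihood) has already been evaluated at each retained MCMC draw and at the posterior mean of the parameters; the function and argument names are hypothetical.

```python
def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, p_D) from post-burn-in deviance values.

    deviance_draws: deviance evaluated at each retained posterior draw.
    deviance_at_posterior_mean: deviance evaluated at the posterior mean
    of the parameter vector.
    """
    d_bar = sum(deviance_draws) / len(deviance_draws)  # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean           # effective number of parameters
    return d_bar + p_d, p_d
```

For instance, with deviance draws `[10.0, 12.0, 14.0]` and a deviance of `11.0` at the posterior mean, the posterior mean deviance is 12, `p_D` is 1, and the DIC is 13.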

Cutaneous Melanoma
We used the cutaneous melanoma data set that is available on the homepage of the book by Ibrahim et al. (2001) [10]. This data set contains 427 patients from a study, conducted between 1991 and 1998, that evaluated postoperative treatment with high-dose interferon alfa-2b to prevent recurrence. Ten subjects were excluded because their tumor thickness data were missing. The observed time (T) ranges from 0.15 to 7.01 years (3.18). In this data set, 55.6 percent of the observations were censored. The covariate that was most important and significant across several models was nodule category (1: n=82; 2: n=87; 3: n=137; 4: n=111). We treated this covariate as categorical and defined three dummy variables to represent it. For parameter estimation we assigned a normal prior to the scale parameter and a gamma prior to the shape parameter of the promotion time distribution, and a gamma prior to the shape parameter of the Generalized Poisson-Lindley distribution.
The code was written in WinBUGS. A total of 50,000 iterations were run, and after a burn-in of 10,000 iterations a sample was recorded every 10th iteration to reduce autocorrelation within the chain. The results for the three models (Poisson, Negative Binomial, and Generalized Poisson-Lindley) are shown in Table 2.
The credible intervals for the regression coefficients do not include zero, so there is evidence that the cure rate differs across the nodule categories. To compare the models we used the DIC, according to which the best model has the smallest value. The DIC values for the P, NB, and GPL models are 1036.9, 1029.9, and 1026.6, respectively; therefore the GPL is the best model. The cure rates (in percent) by nodule category under each model are shown in Table 3. Figure 1 shows the Kaplan-Meier estimate of the survival function together with the Bayesian estimates of the survival function under the different models; the GPL model gives the best fit.

Simulation
To assess the performance of the new model, we conducted a simulation study in which data sets were generated and subsequently analyzed by fitting the model. The simulation used the following steps:
Step 1. Generate a dummy variable x from the Bernoulli distribution with p = 0.5. The model parameter takes one value when x = 0 and another when x = 1, so that, under the chosen parameter values, the cure rates in the two groups are 23.1 and 63.4 percent, respectively.
Step 2. Generate data from the GPL distribution with the parameters obtained from Step 1.
We generated samples of size 500 with 50 replications and estimated the parameters. The simulation program was written in R, and the generated data were then passed to WinBUGS to obtain the parameter estimates. Table 4 shows the averaged posterior mean and standard deviation for each regression parameter. The posterior means of the parameters are quite close to the true values, indicating that the MCMC chains converged properly.
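The two simulation steps can be sketched in Python through the latent mechanism. This is an illustration, not the R program used in the study. It assumes that the GPL count is drawn as a Poisson variable whose mean follows a two-component gamma mixture, that the GPL parameter is linked to the dummy variable by `theta = exp(beta0 + beta1 * x)`, that promotion times are Weibull with survival function `exp(-t**shape * exp(log_scale))`, and that censoring times are exponential. All of these choices and all names are assumptions.

```python
import math
import random

def sample_poisson(rng, lam):
    """Poisson draw: count unit-rate exponential arrivals in [0, lam]."""
    count, total = 0, rng.expovariate(1.0)
    while total <= lam:
        count += 1
        total += rng.expovariate(1.0)
    return count

def sample_gpl(rng, alpha, theta):
    """GPL draw via the assumed gamma-mixture representation."""
    shape = alpha if rng.random() < theta / (1.0 + theta) else alpha + 1.0
    lam = rng.gammavariate(shape, 1.0 / theta)  # second argument is a scale
    return sample_poisson(rng, lam)

def sample_weibull(rng, shape, log_scale):
    """Inverse-transform draw from S(t) = exp(-t**shape * exp(log_scale))."""
    u = 1.0 - rng.random()  # lies in (0, 1], avoids log(0)
    return (-math.log(u) / math.exp(log_scale)) ** (1.0 / shape)

def simulate_cure_data(n, beta0, beta1, alpha, shape, log_scale,
                       censor_rate, seed=0):
    """Return rows (observed_time, event, x, latent_count)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0        # Step 1: Bernoulli(0.5) dummy
        theta = math.exp(beta0 + beta1 * x)       # hypothetical log link
        n_causes = sample_gpl(rng, alpha, theta)  # Step 2: latent GPL count
        if n_causes == 0:
            lifetime = math.inf                   # cured subject, never fails
        else:
            lifetime = min(sample_weibull(rng, shape, log_scale)
                           for _ in range(n_causes))
        censor = rng.expovariate(censor_rate)     # assumed censoring mechanism
        event = 1 if lifetime <= censor else 0
        rows.append((min(lifetime, censor), event, x, n_causes))
    return rows
```

A call such as `simulate_cure_data(500, -0.5, 1.0, 1.0, 1.5, -1.0, 0.2)` produces one data set of size 500; the latent count is returned only so that the simulated cure fraction can be checked and would be dropped before fitting.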

Discussion:
In this paper we introduced an alternative to the Negative Binomial distribution for handling overdispersion in the promotion time cure model. This distribution, the Generalized Poisson-Lindley distribution, was introduced by Mahmoudi and Zakerzadeh (2010). Not only its variance but also its mean depends on the shape parameter, which yields a more flexible model for analyzing complex data sets.
This data set was analyzed by Cancho et al. (2011, 2012), who used the Negative Binomial and Conway-Maxwell Poisson distributions. They noted that the DIC increases slightly when the nodule covariate is treated as categorical. Because the purpose of a cure model is to provide the cure rate, in this study we treated the covariates as categorical variables.
We proposed a new way to simulate cure rate data that differs from that of Cancho et al. (2011). This method uses the latent mechanism from which the model itself arises. As mentioned before, the Generalized Poisson-Lindley distribution reduces to the Poisson-Lindley distribution when its shape parameter equals 1. Although the estimate of this parameter in Table 1 does not differ from 1, the DIC increased to 1032.3 when we used the Poisson-Lindley distribution, so we retained the GPL to interpret these data. Mahmoudi and Zakerzadeh (2010) noted that the Generalized Lindley distribution is a two-component mixture of gamma distributions. It is therefore not surprising that the results of the NB and GPL models are not very different.
For comparison, we also considered Negative Binomial and Poisson distributions for the number of lymph nodes, as discussed by Cancho et al. (2011) and Chen et al. (1999).