On Generating a New Family of Distributions Using the Tangent Function

In this paper, a method for generating a new family of univariate continuous distributions using the tangent function is proposed. Some general properties of this new family are discussed: hazard function, quantile function, Rényi and Shannon entropies, symmetry, and existence of the non-central nth moment. Some new members are provided as sub-families in the T−X family of distributions. Three members of the new sub-families are defined and discussed: the four-parameter Normal-Generalized hyperbolic secant distribution (NGHS), the four-parameter Gumbel-Generalized hyperbolic secant distribution (GGHS), and the five-parameter Generalized Error-Generalized hyperbolic secant distribution (GEHS). The shapes of these distributions can be skewed right, skewed left, or symmetric, and unimodal, bimodal, or trimodal. Finally, to demonstrate the usefulness and capability of the distributions, two real data sets are used and the results are compared with those of other known distributions.


Introduction
A statistical distribution is a mathematical description of a random phenomenon in terms of the probabilities of events. Many methods have recently been proposed and developed to generate new statistical distributions.
A class of beta-generated distributions was proposed and studied by Eugene, Lee and Famoye (Eugene et al., 2002). Their idea rests on the fact that a beta random variable lies in the interval (0,1), and they defined the cumulative distribution function (CDF) of the beta-generated class by
$G(x) = \int_0^{F(x)} b(t)\,dt$, (1.1)
where $F(x)$ is the CDF of any random variable and $b(t)$ is the PDF of the beta distribution. Many studies of the beta-generated class have been published by applying different $F(x)$ in (1.2). Examples include the Beta-Gumbel distribution by Nadarajah and Kotz (2004), the beta-exponential distribution by Nadarajah and Kotz (2006), and the Beta-Weibull distribution.
The beta-generated class was extended by Jones (2009) and Cordeiro and de Castro (2011) by replacing the beta distribution $b(t)$ in (1.1) with the Kumaraswamy distribution $b(t) = \alpha\beta\, t^{\alpha-1}(1 - t^{\alpha})^{\beta-1}$, $t \in (0,1)$ (Kumaraswamy, 1980). The new Kumaraswamy-generated (Kw-G) class is given by
$G(x) = 1 - (1 - F^{\alpha}(x))^{\beta}$.
A further family depends on the quantile function of a random variable with a given CDF: the function W(·) is taken to be that quantile function on (0, 1), which satisfies the conditions in (1.5). The CDF of the new family, and the corresponding PDF if it exists, are then defined through this choice of W(·).
In this paper, a new W(·) function is used; this function depends on the tangent function, with (−∞, ∞) as the support of the random variable T. In Section 2, the family of T−X distributions based on the tangent function is defined and some of its general properties are discussed: hazard function, quantile function, Rényi and Shannon entropies, symmetry, and existence of the non-central nth moment.
Section 3 defines and discusses some new members as sub-families in the T−X family of distributions with different T distributions: the Normal−X sub-family, the Cauchy−X sub-family, and the Generalized Error−X sub-family. Three members of the new sub-families are also defined and discussed: the four-parameter Normal-Generalized hyperbolic secant distribution (NGHS), the four-parameter Gumbel-Generalized hyperbolic secant distribution (GGHS), and the five-parameter Generalized Error-Generalized hyperbolic secant distribution (GEHS). In Section 4, two real data sets are used to demonstrate the usefulness of this new family of distributions. Section 5 gives the summary and conclusion. The programs used to compute the results in Sections 4.1 and 4.2 were written in the R 3.3.1 programming language (R Core Team, 2016). These codes are available to the reader from the author.

Generating a new family of distributions using the tangent function
In this section, a new class of distributions is proposed. Let T be a continuous random variable with PDF r(t) and CDF R(t), defined on (−∞, ∞), and let X be any continuous random variable with PDF f(x) and CDF F(x).
The CDF of the new T−X family of distributions is defined by
$G(x) = \int_{-\infty}^{W(F(x))} r(t)\,dt = R(W(F(x)))$, (2.1)
where W(·) is the tangent-based function of this paper. The corresponding PDF of this family is given by
$g(x) = W'(F(x))\, f(x)\, r(W(F(x)))$.
Note: Aljarrah et al. (2014) mentioned the T−X{Cauchy} family in their Table 1. The family in (2.1) is the same as their family.
Denote is the random variable of with PDF , and is the random variable of with PDF . Since X is any continuous random variable, many new T−X families of distributions can easily be derived. The hazard function of the T−X family can be written in terms of $h_r$, where $h_r$ is the hazard function of the random variable T with CDF R(t).
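As a minimal sketch of how the CDF, PDF and hazard function of such a family can be evaluated, the Python code below composes the three ingredients numerically. It rests on assumptions that are not taken from the paper: the tangent link is used in its simplest form W(u) = tan(π(u − 1/2)) with no extra location or scale constants, T is normal, and X follows a hyperbolic secant distribution with location and scale parameters. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def x_cdf(x, loc=0.0, scale=1.0):
    """F(x): hyperbolic secant CDF (an assumed choice for X)."""
    z = (x - loc) / scale
    return (2.0 / math.pi) * math.atan(math.exp(math.pi * z / 2.0))


def x_pdf(x, loc=0.0, scale=1.0):
    """f(x): hyperbolic secant PDF (an assumed choice for X)."""
    z = (x - loc) / scale
    return 0.5 / (scale * math.cosh(math.pi * z / 2.0))


def w(u):
    """Assumed tangent link W(u) = tan(pi * (u - 1/2)), 0 < u < 1."""
    return math.tan(math.pi * (u - 0.5))


def w_prime(u):
    """Derivative of the assumed link: pi * sec^2(pi * (u - 1/2))."""
    return math.pi / math.cos(math.pi * (u - 0.5)) ** 2


def family_cdf(x, t_dist, loc=0.0, scale=1.0):
    """G(x) = R(W(F(x))), with R the CDF of T."""
    return t_dist.cdf(w(x_cdf(x, loc, scale)))


def family_pdf(x, t_dist, loc=0.0, scale=1.0):
    """g(x) = W'(F(x)) * f(x) * r(W(F(x))), with r the PDF of T."""
    u = x_cdf(x, loc, scale)
    return w_prime(u) * x_pdf(x, loc, scale) * t_dist.pdf(w(u))


def family_hazard(x, t_dist, loc=0.0, scale=1.0):
    """h(x) = g(x) / (1 - G(x))."""
    g = family_pdf(x, t_dist, loc, scale)
    return g / (1.0 - family_cdf(x, t_dist, loc, scale))


if __name__ == "__main__":
    t = NormalDist(mu=0.0, sigma=1.0)  # assumed distribution of T
    for x in (-1.0, 0.0, 1.0):
        print(x, family_cdf(x, t), family_pdf(x, t), family_hazard(x, t))
```

Under these assumptions the composition W(F(x)) reduces to sinh(π(x − loc) / (2 · scale)), which gives a quick way to sanity-check the numerical link.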
Theorem 1: Let 0 < p < 1. The quantile function of the T−X family of distributions defined in (2.4) is given by
$Q(p) = Q_F(W^{-1}(Q_R(p)))$,
where $W^{-1}$ is the inverse of the tangent-based function W(·), $Q_F(p) = F^{-1}(p)$ is the quantile function of the random variable X with CDF F(x), and $Q_R(p) = R^{-1}(p)$ is the quantile function of the random variable T with CDF R(t).
Entropy has wide applications in science, engineering and probability theory. For a random variable X, the entropy is a measure of the variation of the uncertainty. The Rényi entropy (the spectrum of Rényi information) of order α of a random variable X with PDF f(x) is defined as
$I_R(\alpha) = \frac{1}{1-\alpha}\log\left(\int_{-\infty}^{\infty} f^{\alpha}(x)\,dx\right)$, α ≥ 0 and α ≠ 1 (Rényi, 1961).
Let the random variable X follow the T−X family of distributions defined in (2.4); the Rényi entropy of X then follows by applying this definition to the PDF of the family. The Shannon entropy, defined by Shannon (1948), is a special case of the Rényi entropy obtained as α → 1. The Shannon entropy of a random variable X with PDF f(x) is defined as E(−log(f(X))).
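The quantile function in Theorem 1 lends itself to inverse-transform sampling. The following Python sketch illustrates this under assumptions that are not taken from the paper: the link is W(u) = tan(π(u − 1/2)), so that its inverse is 1/2 + arctan(t)/π, T is normal, and X follows a hyperbolic secant distribution with location and scale parameters. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def x_quantile(u, loc=0.0, scale=1.0):
    """F^{-1}(u) for the hyperbolic secant distribution (assumed choice for X)."""
    return loc + scale * (2.0 / math.pi) * math.log(math.tan(math.pi * u / 2.0))


def w_inverse(t):
    """Inverse of the assumed link W(u) = tan(pi * (u - 1/2))."""
    return 0.5 + math.atan(t) / math.pi


def family_quantile(p, t_dist, loc=0.0, scale=1.0):
    """Q(p) = F^{-1}(W^{-1}(R^{-1}(p))) for 0 < p < 1."""
    return x_quantile(w_inverse(t_dist.inv_cdf(p)), loc, scale)


def family_sample(n, t_dist, loc=0.0, scale=1.0, seed=0):
    """Draw n variates by inverse-transform sampling."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        p = rng.random()
        if p > 0.0:  # inv_cdf requires 0 < p < 1
            draws.append(family_quantile(p, t_dist, loc, scale))
    return draws


if __name__ == "__main__":
    t = NormalDist(mu=0.0, sigma=1.0)  # assumed distribution of T
    print([family_quantile(p, t) for p in (0.1, 0.5, 0.9)])
    print(family_sample(5, t, seed=1))
```

With these particular choices the quantile simplifies to loc + scale · (2/π) · asinh(R^{-1}(p)), which can serve as an independent check of the composition.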

Some sub-families of the T−X family of distributions with different T distributions
The sub-families of the T−X family of distributions defined in (2.4) can be obtained in two different ways: one fixes the distribution of the random variable T and changes the distribution of the random variable X, and the other fixes the distribution of X and changes the distribution of T.
In Table 1, the random variable X has been fixed, and by changing the distribution of the random variable T one obtains several such sub-families. For example, letting the random variable T be normally distributed generates the sub-family of Normal−X distributions.
In the following sub-sections, some properties of the following sub-families are discussed: the Normal−X sub-family, the Cauchy−X sub-family, and the Generalized Error−X sub-family.

The Normal−X Sub-Family
Lemma 1: Let 0 < p < 1. The quantile function of the Normal−X sub-family of distributions defined in (3.2) is expressed in terms of $F^{-1}(p)$, the quantile function of the random variable X with CDF F(x).
Plots in Figure 1 show the density function for different parameter values; the distribution can be symmetric, right skewed, left skewed, unimodal or bimodal.
By the binomial theorem, the expansion involves the Euler numbers $E_i$. All odd-indexed Euler numbers are zero, and the even-indexed ones are $E_0 = 1$, $E_2 = -1$, $E_4 = 5$, $E_6 = -61$, …. Hence, by substituting in (3.5), we obtain the result in (3.8).
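The change from unimodal to bimodal shapes can be explored numerically. The sketch below is illustrative only and relies on assumptions not taken from the paper: the link is W(u) = tan(π(u − 1/2)), T is normal with mean `mu_t` and standard deviation `sigma_t`, and X is hyperbolic secant with parameters `loc` and `scale`. Under these assumptions W(F(x)) simplifies to sinh(πz/2) with z = (x − loc)/scale, so the density has a closed form. The names `normal_hs_pdf` and `count_modes` are hypothetical.

```python
import math


def normal_hs_pdf(x, mu_t=0.0, sigma_t=1.0, loc=0.0, scale=1.0):
    """Density of the assumed Normal-hyperbolic-secant member.

    With the assumed link and a hyperbolic secant X, W(F(x)) equals
    sinh(pi * z / 2) where z = (x - loc) / scale.
    """
    u = math.pi * (x - loc) / (2.0 * scale)
    t = (math.sinh(u) - mu_t) / sigma_t
    normal_density = math.exp(-0.5 * t * t) / (sigma_t * math.sqrt(2.0 * math.pi))
    return normal_density * math.pi * math.cosh(u) / (2.0 * scale)


def count_modes(pdf, lo=-4.0, hi=4.0, step=0.01):
    """Count strict local maxima of pdf on an evenly spaced grid."""
    n = int(round((hi - lo) / step))
    values = [pdf(lo + i * step) for i in range(n + 1)]
    return sum(
        1
        for i in range(1, n)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    )


if __name__ == "__main__":
    for sigma_t in (0.5, 3.0):
        modes = count_modes(lambda x: normal_hs_pdf(x, sigma_t=sigma_t))
        print(sigma_t, modes)
```

In this symmetric case (mu_t = 0) the log-density has stationary points at sinh(u) = 0 and at cosh(u) = sigma_t, so a second mode appears only when sigma_t exceeds 1; a grid search is a crude check and can miss modes outside the chosen range.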

The Gumbel−X Sub-Family
Let the random variable T follow the Gumbel distribution. As one example of this family, let the random variable X follow the Generalized Hyperbolic Secant distribution with a location parameter in (−∞, ∞) and a scale parameter greater than 0, as in Section 3.1. From (3.9) we get the density of the resulting distribution, where x ∈ ℝ, the two location parameters are real, the two scale parameters are positive, and the parameter vector collects the four parameters. The CDF of this distribution is obtained in the same way.
Plots in Figure 2 show the density function for different parameter values; the distribution can be right skewed, left skewed, unimodal or bimodal.
The resulting expression involves the Euler numbers, the Euler-Mascheroni constant, and the exponential integral function.
Proof: By following the same steps used in proving Lemma 5 above, and by substituting in (3.13), we obtain the result in (3.16).
Proof: Since the random variable T has the Generalized Error distribution with parameters μ, σ, and s, its Shannon entropy is $s + \log(2^{s+1}\sigma\,\Gamma(s+1))$. Substituting in (2.9) gives the result (3.20).
Plots in Figure 3 show the density function for different parameter values; the distribution can be symmetric, right skewed, left skewed, unimodal, bimodal or trimodal.
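The Shannon entropy of the Generalized Error distribution quoted in the proof above can be checked numerically. The following Python sketch assumes, without confirmation from the paper, the parameterization f(t) = exp(−|z|^(1/s) / 2) / (2^(s+1) σ Γ(s+1)) with z = (t − μ)/σ, for which the closed form is s + log(2^(s+1) σ Γ(s+1)). The function names and the integration settings are hypothetical.

```python
import math


def ge_pdf(t, mu=0.0, sigma=1.0, s=0.5):
    """Assumed Generalized Error density with location mu, scale sigma, shape s."""
    z = abs((t - mu) / sigma)
    norm = 2.0 ** (s + 1.0) * sigma * math.gamma(s + 1.0)
    return math.exp(-0.5 * z ** (1.0 / s)) / norm


def ge_entropy_closed_form(sigma=1.0, s=0.5):
    """Closed form s + log(2^(s+1) * sigma * Gamma(s+1))."""
    return s + math.log(2.0 ** (s + 1.0) * sigma * math.gamma(s + 1.0))


def ge_entropy_numeric(mu=0.0, sigma=1.0, s=0.5, half_width=60.0, step=0.001):
    """Trapezoidal approximation of E(-log f(T)) on mu +/- half_width * sigma."""
    n = int(round(2.0 * half_width / step))
    h = 2.0 * half_width * sigma / n
    total = 0.0
    for i in range(n + 1):
        fx = ge_pdf(mu - half_width * sigma + i * h, mu, sigma, s)
        if fx > 0.0:  # skip points where the density underflows to zero
            weight = 0.5 if i in (0, n) else 1.0
            total -= weight * fx * math.log(fx)
    return total * h


if __name__ == "__main__":
    for s in (0.5, 1.0):
        print(s, ge_entropy_closed_form(s=s), ge_entropy_numeric(s=s))
```

For s = 0.5 the assumed density is the normal density, and the closed form agrees with the familiar normal entropy (1/2) log(2πeσ²), which supports this reading of the formula.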

Applications
We now consider two real numerical examples to demonstrate the usefulness of the GGHS distribution defined in (3.14) and the GEHS distribution defined in (3.22) in fitting data sets.

The famous Old Faithful Geyser eruption data
The famous Old Faithful Geyser eruption data (n = 272) were obtained from Härdle (1991, p. 201). The data are the eruption durations (in minutes) recorded from August 1 to August 15, 1985 (Dekking et al., 2005), and they are available as the faithful data within the MASS package in the R 3.3.3 programming language (Venables and Ripley, 2002). Figure 3 shows the histogram of the Old Faithful Geyser eruption data; the data have two distinct modes (bimodal).
A common approach for fitting such bimodal data is to use mixture distributions (Aljarrah et al., 2014). Arellano-Valle et al. (2010) used the epsilon-skew-normal distribution to fit these data and obtained the same fitting results as with the mixture-normal distribution. The four-parameter GGHS distribution defined in (3.1), the Mixture-Normal distribution, the five-parameter Normal-Weibull distribution defined by Aljarrah et al. (2014), and the Beta-Normal distribution defined by Eugene et al. (2002) are fitted to the data.
To compare the models, Table 2 shows the estimates and their standard errors, the log-likelihood values, the AIC (Akaike Information Criterion), the CAIC (Consistent Akaike Information Criterion), the Durbin-Watson test statistic, the Anderson-Darling test statistic, and the K-S (Kolmogorov-Smirnov) test statistic with its corresponding p-value. In general, the larger the log-likelihood and the K-S p-value, and the smaller the AIC, CAIC, Durbin-Watson, Anderson-Darling, and K-S statistics, the better the fit to the data.
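The comparison measures can be illustrated with a short Python sketch. It is not the author's code: the CAIC is written in Bozdogan's form, −2ℓ + k(log n + 1), which is an assumption because the formula is not stated here, and the function names and the numbers in the usage lines are hypothetical rather than values from Table 2.

```python
import math


def aic(loglik, k):
    """Akaike Information Criterion: -2 * loglik + 2 * k."""
    return -2.0 * loglik + 2.0 * k


def caic(loglik, k, n):
    """Consistent AIC in Bozdogan's form: -2 * loglik + k * (log(n) + 1)."""
    return -2.0 * loglik + k * (math.log(n) + 1.0)


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


if __name__ == "__main__":
    # Hypothetical fit: log-likelihood -100 with 4 parameters and 272 observations.
    print(aic(-100.0, 4), caic(-100.0, 4, 272))
    # Toy sample compared with the uniform CDF on (0, 1).
    print(ks_statistic([0.7, 0.1, 0.4], lambda x: x))
```

Here `k` is the number of estimated parameters and `n` the sample size, so a five-parameter model is penalized more heavily than a four-parameter one at the same log-likelihood.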
The results in Table 2 indicate that the four-parameter GGHS distribution outperforms the other three distributions (the Mixture-Normal, the five-parameter Normal-Weibull, and the Beta-Normal) and gives the best fit on all six measures: log-likelihood, AIC, CAIC, Durbin-Watson, Anderson-Darling, and K-S statistic with its corresponding p-value.
Plots of the probability density functions of the four fitted distributions (the GGHS, the Mixture-Normal, the five-parameter Normal-Weibull, and the Beta-Normal) with estimated parameters against the data are shown in Figure 4. The GGHS distribution can fit a wide variety of distribution shapes well, including bimodal data such as the Old Faithful Geyser eruption data.

Australian athletes' data
Cook and Weisberg (1994) presented the Australian athletes' data. This data set contains 13 variables on 102 male and 100 female Australian athletes collected at the Australian Institute of Sport. It is available as the ais data within the DAAG package in the R 3.3.3 programming language (Maindonald and Braun, 2015). Mudholkar and Srivastava (1993). They found that the distribution provides the best fit compared with the other distributions considered. The five-parameter GEHS distribution defined in (3.14) and three other distributions are fitted to the data. To compare the models, Table 3 shows the estimates and their standard errors, the log-likelihood values, the AIC, the CAIC, the Durbin-Watson test statistic, the Anderson-Darling test statistic, and the K-S test statistic with its corresponding p-value. The results in Table 3 indicate that the five-parameter GEHS distribution outperforms the three other distributions and gives the best fit on all six measures: log-likelihood, AIC, CAIC, Durbin-Watson, Anderson-Darling, and K-S statistic with its corresponding p-value.
Plots of the probability density functions of the four distributions with estimated parameters against the data are shown in Figure 5. The GEHS distribution can fit a wide variety of distribution shapes well, including left-skewed unimodal data such as the Australian athletes' data.

Figure 5
The PDFs for the heights data.

Summary and Conclusion
In this paper, we have proposed a method for generating a new family of univariate continuous distributions using a function W(F(x)) based on tan(π(F(x) − 1/2)), the tangent function applied to the CDF F(x). In Table 1 we have presented a list of some examples of the T−X family of distributions based on the tangent function, derived from different T distributions with support (−∞, ∞).
Two new distributions in the family, the four-parameter Gumbel-Generalized hyperbolic secant distribution (GGHS) and the five-parameter Generalized Error-Generalized hyperbolic secant distribution (GEHS), are defined, and some of their properties are given and discussed: quantiles, Shannon entropy, and existence of the nth raw moment with its upper bound. The shapes of these distributions can be skewed right, skewed left, or symmetric, and unimodal, bimodal, or trimodal.
To illustrate and assess the flexibility of the distributions, the GGHS distribution is fitted to the Old Faithful Geyser eruption data. These data are bimodal and were previously fitted using the Mixture-Normal distribution, the five-parameter Normal-Weibull distribution, and the Beta-Normal distribution (Aljarrah et al., 2014). Furthermore, the GEHS distribution has been used to fit the Australian athletes' data; this data set is unimodal and left skewed, and it was previously fitted using three other distributions (Al-Aqtash et al., 2014). The GGHS and GEHS distributions have been found to be very flexible and capable of fitting these data sets, with the highest log-likelihood value and the smallest AIC, CAIC, Durbin-Watson, Anderson-Darling, and K-S values among the four distributions compared in each case.