Compound probability distribution

Summary

In probability and statistics, a compound probability distribution (also known as a mixture distribution or contagious distribution) is the probability distribution that results from assuming that a random variable is distributed according to some parametrized distribution, with (some of) the parameters of that distribution themselves being random variables. If the parameter is a scale parameter, the resulting mixture is also called a scale mixture.

The compound distribution ("unconditional distribution") is the result of marginalizing (integrating) over the latent random variable(s) representing the parameter(s) of the parametrized distribution ("conditional distribution").

Definition

A compound probability distribution is the probability distribution that results from assuming that a random variable $X$ is distributed according to some parametrized distribution $F$ with an unknown parameter $\theta$ that is again distributed according to some other distribution $G$. The resulting distribution $H$ is said to be the distribution that results from compounding $F$ with $G$. The parameter's distribution $G$ is also called the mixing distribution or latent distribution. Technically, the unconditional distribution $H$ results from marginalizing over $G$, i.e., from integrating out the unknown parameter(s) $\theta$. Its probability density function is given by:

$$p_H(x) = \int p_F(x \mid \theta)\, p_G(\theta)\, \mathrm{d}\theta$$

The same formula applies analogously if some or all of the variables are vectors.

From the above formula, one can see that a compound distribution essentially is a special case of a marginal distribution: The joint distribution of $x$ and $\theta$ is given by $p(x, \theta) = p(x \mid \theta)\, p(\theta)$, and the compound results as its marginal distribution: $p(x) = \int p(x, \theta)\, \mathrm{d}\theta$. If the domain of $\theta$ is discrete, then the distribution is again a special case of a mixture distribution.
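The marginalization integral can be checked numerically in a case where the answer is known. The following is a minimal sketch, assuming (purely for illustration) that the conditional distribution is normal with random mean $\theta$ and fixed standard deviation `tau`, and that $\theta$ is itself normal with mean `mu` and standard deviation `sigma`. In that case the compound distribution is known to be normal with mean `mu` and variance `tau**2 + sigma**2`. The function names and the trapezoidal-rule settings are hypothetical choices, not part of the article.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def compound_pdf(x, mu, sigma, tau, steps=4000, width=10.0):
    """Approximate the integral of p(x | theta) * p(theta) over theta.

    Illustrative assumption: x | theta ~ Normal(theta, tau^2) and
    theta ~ Normal(mu, sigma^2). The integral is evaluated with the
    trapezoidal rule on [mu - width * sigma, mu + width * sigma].
    """
    lo = mu - width * sigma
    hi = mu + width * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end points count half
        total += weight * normal_pdf(x, theta, tau) * normal_pdf(theta, mu, sigma)
    return total * h


mu, sigma, tau = 1.0, 2.0, 0.5
for x in (-2.0, 1.0, 4.0):
    numeric = compound_pdf(x, mu, sigma, tau)
    closed_form = normal_pdf(x, mu, math.sqrt(tau**2 + sigma**2))
    print(f"x={x:5.1f}  numeric={numeric:.8f}  closed form={closed_form:.8f}")
```

The two columns should agree to many decimal places; for other choices of conditional and mixing distribution only the two density functions inside the loop would need to change.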

Properties

General

The compound distribution $H$ will depend on the specific expression of each distribution, as well as which parameter of $F$ is distributed according to the distribution $G$, and the parameters of $H$ will include any parameters of $G$ that are not marginalized, or integrated, out. The support of $H$ is the same as that of $F$, and if the latter is a two-parameter distribution parameterized with the mean and variance, some general properties exist.

Mean and variance

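The compound distribution's first two moments are given by the law of total expectation and the law of total variance:

$$\operatorname{E}_H[X] = \operatorname{E}_G\bigl[\operatorname{E}_F[X \mid \theta]\bigr]$$

$$\operatorname{Var}_H(X) = \operatorname{E}_G\bigl[\operatorname{Var}_F(X \mid \theta)\bigr] + \operatorname{Var}_G\bigl(\operatorname{E}_F[X \mid \theta]\bigr)$$

If the mean of $F$ is distributed as $G$, which in turn has mean $\mu$ and variance $\sigma^2$, the expressions above imply $\operatorname{E}_H[X] = \operatorname{E}_G[\theta] = \mu$ and $\operatorname{Var}_H(X) = \operatorname{Var}_F(X \mid \theta) + \operatorname{Var}_G(\theta) = \tau^2 + \sigma^2$, where $\tau^2$ is the variance of $F$ (taken here not to depend on $\theta$).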
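As a worked illustration of the two laws in a case where the conditional variance does depend on the parameter (an example chosen here, with symbols introduced for this purpose), suppose the conditional distribution is Poisson with rate $\theta$, so that $\operatorname{E}[X \mid \theta] = \operatorname{Var}(X \mid \theta) = \theta$, and let $\theta$ follow any distribution on the positive reals with mean $\mu$ and variance $\sigma^2$. Then:

$$\begin{aligned}
\operatorname{E}[X] &= \operatorname{E}\bigl[\operatorname{E}[X \mid \theta]\bigr] = \operatorname{E}[\theta] = \mu, \\
\operatorname{Var}(X) &= \operatorname{E}\bigl[\operatorname{Var}(X \mid \theta)\bigr] + \operatorname{Var}\bigl(\operatorname{E}[X \mid \theta]\bigr) = \operatorname{E}[\theta] + \operatorname{Var}(\theta) = \mu + \sigma^2 .
\end{aligned}$$

The variance therefore exceeds the mean whenever $\sigma^2 > 0$, whereas a plain Poisson distribution has variance equal to its mean.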

Proof

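Let $F$ and $G$ be probability distributions parameterized with mean and variance as

$$x \sim F(\theta, \tau^2), \qquad \theta \sim G(\mu, \sigma^2).$$

Then, denoting the probability density functions as $f(x \mid \theta) = p_F(x \mid \theta)$ and $g(\theta) = p_G(\theta)$ respectively, and with $h(x)$ being the probability density of $H$, we have

$$\operatorname{E}_H[X] = \int_F x\, h(x)\, \mathrm{d}x = \int_F x \int_G f(x \mid \theta)\, g(\theta)\, \mathrm{d}\theta\, \mathrm{d}x = \int_G \int_F x\, f(x \mid \theta)\, \mathrm{d}x\; g(\theta)\, \mathrm{d}\theta = \int_G \operatorname{E}_F[X \mid \theta]\, g(\theta)\, \mathrm{d}\theta,$$

and we have from the parameterization of $F$ and $G$ that

$$\operatorname{E}_F[X \mid \theta] = \int_F x\, f(x \mid \theta)\, \mathrm{d}x = \theta, \qquad \operatorname{E}_G[\theta] = \int_G \theta\, g(\theta)\, \mathrm{d}\theta = \mu,$$

and therefore the mean of the compound distribution is $\operatorname{E}_H[X] = \mu$, as per the expression for its first moment above.

The variance of $H$ is given by $\operatorname{E}_H[X^2] - (\operatorname{E}_H[X])^2$, and

$$\begin{aligned}
\operatorname{E}_H[X^2] &= \int_F x^2\, h(x)\, \mathrm{d}x = \int_F x^2 \int_G f(x \mid \theta)\, g(\theta)\, \mathrm{d}\theta\, \mathrm{d}x \\
&= \int_G g(\theta) \int_F x^2 f(x \mid \theta)\, \mathrm{d}x\, \mathrm{d}\theta = \int_G g(\theta)\, (\tau^2 + \theta^2)\, \mathrm{d}\theta \\
&= \tau^2 \int_G g(\theta)\, \mathrm{d}\theta + \int_G g(\theta)\, \theta^2\, \mathrm{d}\theta = \tau^2 + (\sigma^2 + \mu^2),
\end{aligned}$$

given the fact that $\int_F x^2 f(x \mid \theta)\, \mathrm{d}x = \operatorname{E}_F[X^2 \mid \theta] = \operatorname{Var}_F(X \mid \theta) + (\operatorname{E}_F[X \mid \theta])^2$ and $\int_G \theta^2 g(\theta)\, \mathrm{d}\theta = \operatorname{E}_G[\theta^2] = \operatorname{Var}_G(\theta) + (\operatorname{E}_G[\theta])^2$. Finally we get

$$\operatorname{Var}_H(X) = \operatorname{E}_H[X^2] - (\operatorname{E}_H[X])^2 = \tau^2 + \sigma^2 + \mu^2 - \mu^2 = \tau^2 + \sigma^2.$$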

Applications

Testing

Distributions of common test statistics result as compound distributions under their null hypothesis, for example in Student's t-test (where the test statistic results as the ratio of a standard normal random variable and the square root of a scaled chi-squared random variable), or in the F-test (where the test statistic is the ratio of two scaled chi-squared random variables).
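The t-distribution case can be made concrete. If $T = Z / \sqrt{V/\nu}$ with $Z$ standard normal and $V$ chi-squared with $\nu$ degrees of freedom, then conditional on the precision $\lambda = V/\nu$ the statistic is normal with mean 0 and variance $1/\lambda$, and $\lambda$ follows a gamma distribution with shape $\nu/2$ and rate $\nu/2$. The sketch below integrates out $\lambda$ numerically and compares the result with the closed-form t density. The function names, the integration cutoff, and the step count are illustrative assumptions.

```python
import math


def student_t_pdf(x, nu):
    """Closed-form density of Student's t-distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def scale_mixture_pdf(x, nu, upper=60.0, steps=60000):
    """Integrate Normal(x | 0, 1/lam) against Gamma(lam | shape nu/2, rate nu/2).

    lam plays the role of the random precision V / nu. The trapezoidal rule is
    applied on [0, upper]; the integrand vanishes at lam = 0 when nu > 1 and is
    negligible beyond `upper` for moderate nu.
    """
    a = b = nu / 2.0
    log_norm = a * math.log(b) - math.lgamma(a)  # log of b^a / Gamma(a)
    h = upper / steps
    total = 0.0
    for i in range(1, steps + 1):  # the i = 0 term is zero for nu > 1
        lam = i * h
        log_normal = 0.5 * math.log(lam / (2.0 * math.pi)) - 0.5 * lam * x * x
        log_gamma = log_norm + (a - 1.0) * math.log(lam) - b * lam
        weight = 0.5 if i == steps else 1.0
        total += weight * math.exp(log_normal + log_gamma)
    return total * h


nu = 5
for x in (0.0, 1.0, 2.5):
    print(f"x={x:4.1f}  mixture={scale_mixture_pdf(x, nu):.8f}  "
          f"t density={student_t_pdf(x, nu):.8f}")
```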

Overdispersion modeling

Compound distributions are useful for modeling outcomes exhibiting overdispersion, i.e., a greater amount of variability than would be expected under a certain model. For example, count data are commonly modeled using the Poisson distribution, whose variance is equal to its mean. The distribution may be generalized by allowing for variability in its rate parameter, implemented via a gamma distribution, which results in a marginal negative binomial distribution. This distribution is similar in its shape to the Poisson distribution, but it allows for larger variances. Similarly, a binomial distribution may be generalized to allow for additional variability by compounding it with a beta distribution for its success probability parameter, which results in a beta-binomial distribution.
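The extra variability of the beta-binomial distribution can be seen in a small simulation. This is a minimal sketch using only the Python standard library; the parameter values, the seed, and the function name `draw_beta_binomial` are arbitrary illustrative choices.

```python
import random
import statistics


def draw_beta_binomial(n, a, b, rng):
    """Two-stage draw: p ~ Beta(a, b), then X | p ~ Binomial(n, p)."""
    p = rng.betavariate(a, b)
    # A binomial draw as the number of successes in n Bernoulli(p) trials.
    return sum(rng.random() < p for _ in range(n))


rng = random.Random(12345)  # fixed seed, chosen arbitrarily for reproducibility
n, a, b = 20, 2.0, 3.0
draws = [draw_beta_binomial(n, a, b, rng) for _ in range(20000)]

p_mean = a / (a + b)
sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)

# Variance of a plain binomial with the same mean success probability.
binomial_var = n * p_mean * (1 - p_mean)
# Theoretical beta-binomial variance.
beta_binomial_var = n * a * b * (a + b + n) / ((a + b) ** 2 * (a + b + 1))

print(f"sample mean          = {sample_mean:.3f} (theory {n * p_mean:.3f})")
print(f"sample variance      = {sample_var:.3f} (theory {beta_binomial_var:.3f})")
print(f"binomial variance    = {binomial_var:.3f}")
```

With these settings the mean matches that of a binomial distribution with success probability 0.4, while the variance is several times larger than the binomial value.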

Bayesian inference

Besides ubiquitous marginal distributions that may be seen as special cases of compound distributions, in Bayesian inference, compound distributions arise when, in the notation above, F represents the distribution of future observations and G is the posterior distribution of the parameters of F, given the information in a set of observed data. This gives a posterior predictive distribution. Correspondingly, for the prior predictive distribution, F is the distribution of a new data point while G is the prior distribution of the parameters.
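A standard conjugate case makes this concrete. The sketch below assumes a binomial model for future observations (playing the role of F) and a beta prior on the success probability, so that the posterior (playing the role of G) is again a beta distribution and the posterior predictive distribution is beta-binomial. The function names and the default uniform prior are illustrative assumptions.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def posterior_predictive_pmf(k, m, successes, trials, a=1.0, b=1.0):
    """P(k successes in m future trials | observed data).

    Assumes a Beta(a, b) prior on the success probability p. The posterior is
    Beta(a + successes, b + trials - successes); integrating the
    Binomial(m, p) pmf against it gives a beta-binomial pmf.
    """
    a_post = a + successes
    b_post = b + trials - successes
    log_pmf = (math.log(math.comb(m, k))
               + log_beta(k + a_post, m - k + b_post)
               - log_beta(a_post, b_post))
    return math.exp(log_pmf)


# Observed: 7 successes in 10 trials; predict the number of successes in 5 more.
for k in range(6):
    print(k, round(posterior_predictive_pmf(k, 5, successes=7, trials=10), 4))
```

Calling the function with `successes=0` and `trials=0` leaves the prior unchanged and so gives the corresponding prior predictive distribution.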

Convolution

Convolution of probability distributions (to derive the probability distribution of sums of random variables) may also be seen as a special case of compounding; here the sum's distribution essentially results from considering one summand as a random location parameter for the other summand.[1]
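Written out for two independent continuous random variables $X$ and $Y$ with densities $p_X$ and $p_Y$ (notation introduced here for illustration): conditional on $Y = y$, the sum $Z = X + Y$ is distributed as $X$ shifted by $y$, so $y$ acts as a random location parameter and the density of the sum has the form of a compound density:

$$p_Z(z) = \int p_{Z \mid Y}(z \mid y)\, p_Y(y)\, \mathrm{d}y = \int p_X(z - y)\, p_Y(y)\, \mathrm{d}y .$$

The right-hand side is the usual convolution integral, with $p_X(z - y)$ in the role of the conditional density and $p_Y$ in the role of the mixing density.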

Computation

Compound distributions derived from exponential family distributions often have a closed form. If analytical integration is not possible, numerical methods may be necessary.

Compound distributions may relatively easily be investigated using Monte Carlo methods, i.e., by generating random samples. It is often easy to generate random numbers from the distributions $p(\theta)$ as well as $p(x \mid \theta)$ and then utilize these to perform collapsed Gibbs sampling to generate samples from $p(x)$.
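The simplest version of this idea is a plain two-stage draw (first the parameter, then the observation given the parameter, discarding the parameter), which is forward sampling rather than a full Gibbs sampler. The sketch below assumes, for illustration, an exponential distribution whose rate follows a gamma distribution with shape `alpha` and rate `beta`; the resulting compound distribution is a Lomax (Pareto type II) distribution with survival function $(1 + x/\beta)^{-\alpha}$, which gives an exact value to compare against. Names, parameter values, and the seed are arbitrary choices.

```python
import random


def sample_compound(alpha, beta, rng):
    """Draw theta ~ Gamma(shape alpha, rate beta), then X | theta ~ Exponential(rate theta)."""
    # random.gammavariate takes (shape, scale), so the scale is 1 / rate.
    theta = rng.gammavariate(alpha, 1.0 / beta)
    return rng.expovariate(theta)


rng = random.Random(2024)  # fixed seed, chosen arbitrarily for reproducibility
alpha, beta = 3.0, 2.0
samples = [sample_compound(alpha, beta, rng) for _ in range(50000)]


def empirical_survival(x):
    """Fraction of simulated values exceeding x."""
    return sum(s > x for s in samples) / len(samples)


def lomax_survival(x):
    """Exact P(X > x) for the exponential-gamma compound."""
    return (1.0 + x / beta) ** (-alpha)


for x in (0.5, 1.0, 3.0):
    print(f"x={x:3.1f}  simulated={empirical_survival(x):.4f}  exact={lomax_survival(x):.4f}")
```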

A compound distribution may usually also be approximated to a sufficient degree by a mixture distribution using a finite number of mixture components, which allows the density, distribution function, etc. to be derived approximately.[1]
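A rough sketch of such a finite approximation follows. It is not the method of the cited reference: it simply replaces the mixing distribution by equally weighted points placed at its quantiles, which is an assumption made here for brevity. The example uses a normal distribution with a normally distributed mean, for which the exact compound density is available for comparison; the function name and the number of components are hypothetical choices.

```python
import math
from statistics import NormalDist


def finite_mixture_pdf(x, mu, sigma, tau, k=200):
    """Approximate the compound density by k equally weighted normal components.

    Component j is Normal(theta_j, tau^2), where theta_j is the (j + 0.5) / k
    quantile of the mixing distribution Normal(mu, sigma^2).
    """
    mixing = NormalDist(mu, sigma)
    total = 0.0
    for j in range(k):
        theta_j = mixing.inv_cdf((j + 0.5) / k)
        total += NormalDist(theta_j, tau).pdf(x)
    return total / k


mu, sigma, tau = 0.0, 2.0, 1.0
exact = NormalDist(mu, math.sqrt(sigma**2 + tau**2))  # known compound distribution
for x in (-3.0, 0.0, 2.5):
    print(f"x={x:4.1f}  mixture={finite_mixture_pdf(x, mu, sigma, tau):.6f}  "
          f"exact={exact.pdf(x):.6f}")
```

Increasing `k` trades computation for accuracy; an approximate distribution function can be obtained the same way by averaging the component distribution functions instead of the densities.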

Parameter estimation (maximum-likelihood or maximum-a-posteriori estimation) within a compound distribution model may sometimes be simplified by using the EM algorithm.[2]

Examples

Similar terms

The notion of "compound distribution" as used, e.g., in the definition of a compound Poisson distribution or compound Poisson process is different from the definition found in this article. The meaning in this article corresponds to what is used, e.g., in Bayesian hierarchical modeling.

The special case for compound probability distributions where the parametrized distribution $F$ is the Poisson distribution is also called mixed Poisson distribution.

References

  1. Röver, C.; Friede, T. (2017). "Discrete approximation of a mixture distribution via restricted divergence". Journal of Computational and Graphical Statistics. 26 (1): 217–222. arXiv:1602.04060. doi:10.1080/10618600.2016.1276840.
  2. Gelman, A.; Carlin, J. B.; Stern, H.; Rubin, D. B. (1997). "9.5 Finding marginal posterior modes using EM and related algorithms". Bayesian Data Analysis (1st ed.). Boca Raton: Chapman & Hall / CRC. p. 276.
  3. Lee, S. X.; McLachlan, G. J. (2019). "Scale mixture distribution". Wiley StatsRef: Statistics Reference Online. doi:10.1002/9781118445112.stat08201.
  4. Gneiting, T. (1997). "Normal scale mixtures and dual probability densities". Journal of Statistical Computation and Simulation. 59 (4): 375–384. doi:10.1080/00949659708811867.
  5. Mood, A. M.; Graybill, F. A.; Boes, D. C. (1974). Introduction to the theory of statistics (3rd ed.). New York: McGraw-Hill.
  6. Andrews, D. F.; Mallows, C. L. (1974). "Scale mixtures of normal distributions". Journal of the Royal Statistical Society, Series B. 36 (1): 99–102. doi:10.1111/j.2517-6161.1974.tb00989.x.
  7. Johnson, N. L.; Kemp, A. W.; Kotz, S. (2005). "6.2.2". Univariate discrete distributions (3rd ed.). New York: Wiley. p. 253.
  8. Gelman, A.; Carlin, J. B.; Stern, H.; Dunson, D. B.; Vehtari, A.; Rubin, D. B. (2014). Bayesian Data Analysis (3rd ed.). Boca Raton: Chapman & Hall / CRC.
  9. Lawless, J. F. (1987). "Negative binomial and mixed Poisson regression". The Canadian Journal of Statistics. 15 (3): 209–225. doi:10.2307/3314912. JSTOR 3314912.
  10. Teich, M. C.; Diament, P. (1989). "Multiply stochastic representations for K distributions and their Poisson transforms". Journal of the Optical Society of America A. 6 (1): 80–91. Bibcode:1989JOSAA...6...80T. CiteSeerX 10.1.1.64.596. doi:10.1364/JOSAA.6.000080.
  11. Johnson, N. L.; Kotz, S.; Balakrishnan, N. (1994). "20 Pareto distributions". Continuous univariate distributions. Vol. 1 (2nd ed.). New York: Wiley. p. 573.
  12. Dubey, S. D. (1970). "Compound gamma, beta and F distributions". Metrika. 16: 27–31. doi:10.1007/BF02613934.

Further reading

  • Lindsay, B. G. (1995), Mixture models: theory, geometry and applications, NSF-CBMS Regional Conference Series in Probability and Statistics, vol. 5, Hayward, CA, USA: Institute of Mathematical Statistics, pp. i–163, ISBN 978-0-940600-32-4, JSTOR 4153184
  • Seidel, W. (2010), "Mixture models", in Lovric, M. (ed.), International Encyclopedia of Statistical Science, Heidelberg: Springer, pp. 827–829, doi:10.1007/978-3-642-04898-2_368, ISBN 978-3-642-04898-2
  • Mood, A. M.; Graybill, F. A.; Boes, D. C. (1974), "III.4.3 Contagious distributions and truncated distributions", Introduction to the theory of statistics (3rd ed.), New York: McGraw-Hill, ISBN 978-0-07-042864-5
  • Johnson, N. L.; Kemp, A. W.; Kotz, S. (2005), "8 Mixture distributions", Univariate discrete distributions, New York: Wiley, ISBN 978-0-471-27246-5