
In statistics, **completeness** is a property of a statistic computed on a sample dataset in relation to a parametric model of the dataset. It is opposed to the concept of an ancillary statistic. While an ancillary statistic contains no information about the model parameters, a complete statistic contains only information about the parameters, and no ancillary information. It is closely related to the concept of a sufficient statistic which contains all of the information that the dataset provides about the parameters.^{[1]}

Consider a random variable *X* whose probability distribution belongs to a parametric model **P**_{θ} parametrized by *θ*.

Say *T* is a statistic; that is, the composition of a measurable function with a random sample *X*_{1},...,*X*_{n}.

The statistic *T* is said to be **complete** for the distribution of *X* if, for every measurable function *g*,^{[2]}

E_{θ}(*g*(*T*)) = 0 for all *θ* implies P_{θ}(*g*(*T*) = 0) = 1 for all *θ*.

The statistic *T* is said to be **boundedly complete** for the distribution of *X* if this implication holds for every measurable function *g* that is also bounded.

The Bernoulli model admits a complete statistic.^{[3]} Let *X* be a random sample of size *n* such that each *X*_{i} has the same Bernoulli distribution with parameter *p*. Let *T* be the number of 1s observed in the sample, i.e. *T* = *X*_{1} + ⋯ + *X*_{n}. *T* is a statistic of *X* which has a binomial distribution with parameters (*n*, *p*). If the parameter space for *p* is (0,1), then *T* is a complete statistic. To see this, note that

E_{p}(*g*(*T*)) = Σ_{t=0}^{n} *g*(*t*) C(*n*, *t*) *p*^{t} (1 − *p*)^{n−t},

where C(*n*, *t*) denotes the binomial coefficient.

Observe also that neither *p* nor 1 − *p* can be 0. Hence, dividing through by (1 − *p*)^{n}, E_{p}(*g*(*T*)) = 0 if and only if:

Σ_{t=0}^{n} *g*(*t*) C(*n*, *t*) (*p*/(1 − *p*))^{t} = 0.

On denoting *p*/(1 − *p*) by *r*, one gets:

Σ_{t=0}^{n} *g*(*t*) C(*n*, *t*) *r*^{t} = 0 for all *r* > 0.

First, observe that the range of *r* is the positive reals. Also, E(*g*(*T*)) is a polynomial in *r* and, therefore, can only be identical to 0 if all coefficients are 0, that is, *g*(*t*) = 0 for all *t*.

It is important to notice that the result that all coefficients must be 0 was obtained because of the range of *r*. Had the parameter space been finite and with a number of elements less than or equal to *n*, it might be possible to solve the linear equations in *g*(*t*) obtained by substituting the values of *r* and get solutions different from 0. For example, if *n* = 1 and the parameter space is {0.5}, a single observation and a single parameter value, *T* is not complete. Observe that, with the definition:

*g*(*t*) = 2(*t* − 0.5), so that *g*(0) = −1 and *g*(1) = 1,

then E(*g*(*T*)) = 0.5 × (−1) + 0.5 × 1 = 0, although *g*(*t*) is not 0 for *t* = 0 nor for *t* = 1.
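
This counterexample can be checked exactly with a short script. The sketch below (Python, exact rational arithmetic; the helper `expect_g` is our own name) evaluates E(*g*(*T*)) for *g*(*t*) = 2(*t* − 0.5) at *p* = 0.5, where it vanishes, and at *p* = 0.3, where it does not:

```python
from fractions import Fraction
from math import comb

def expect_g(g, n, p):
    """Exact E[g(T)] for T ~ Binomial(n, p): sum of g(t)*C(n,t)*p^t*(1-p)^(n-t)."""
    return sum(Fraction(g(t)) * comb(n, t) * p**t * (1 - p)**(n - t)
               for t in range(n + 1))

# Counterexample from the text: n = 1, parameter space {0.5}, g(t) = 2(t - 0.5).
g = lambda t: 2 * (t - Fraction(1, 2))

print(expect_g(g, 1, Fraction(1, 2)))   # 0: g is an unbiased estimator of zero
print(g(0), g(1))                       # -1 1: yet g is not the zero function

# With the full parameter space (0, 1), the same g already fails at p = 0.3,
# so it no longer witnesses incompleteness.
print(expect_g(g, 1, Fraction(3, 10)))  # -2/5
```

With the full parameter space, E(*g*(*T*)) = 2*p* − 1 vanishes only at the single point *p* = 0.5, which is exactly why *T* is complete on (0,1) but not on {0.5}.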

This example will show that, in a sample *X*_{1}, *X*_{2} of size 2 from a normal distribution with known variance, the statistic *X*_{1} + *X*_{2} is complete and sufficient. Suppose (*X*_{1}, *X*_{2}) are independent, identically distributed random variables, normally distributed with expectation *θ* and variance 1.
The sum

*T*(*X*_{1}, *X*_{2}) = *X*_{1} + *X*_{2}

is a **complete statistic** for *θ*.

To show this, it is sufficient to demonstrate that there is no non-zero function *g* such that the expectation of

*g*(*T*) = *g*(*X*_{1} + *X*_{2})

remains zero regardless of the value of *θ*.

That fact may be seen as follows. The probability distribution of *X*_{1} + *X*_{2} is normal with expectation 2*θ* and variance 2. Its probability density function in *x* is therefore proportional to

exp(−(*x* − 2*θ*)²/4).

The expectation of *g* above would therefore be a constant times

∫_{−∞}^{∞} *g*(*x*) exp(−(*x* − 2*θ*)²/4) d*x*.

A bit of algebra (expanding the square in the exponent and factoring out the part that depends only on *θ*) reduces this to

*k*(*θ*) ∫_{−∞}^{∞} *h*(*x*) e^{xθ} d*x*,

where *k*(*θ*) = e^{−θ²} is nowhere zero and

*h*(*x*) = *g*(*x*) e^{−x²/4}.

As a function of *θ* this is a two-sided Laplace transform of *h*(*x*), and cannot be identically zero unless *h*(*x*) is zero almost everywhere.^{[4]} The exponential factor e^{−x²/4} is nowhere zero, so this can only happen if *g*(*x*) is zero almost everywhere.

By contrast, the statistic (*X*_{1}, *X*_{2}) is sufficient but not complete. It admits a non-zero unbiased estimator of zero, namely

*g*(*X*_{1}, *X*_{2}) = *X*_{1} − *X*_{2}.

Then E(*g*(*X*_{1}, *X*_{2})) = E(*X*_{1}) − E(*X*_{2}) = *θ* − *θ* = 0 regardless of the value of *θ*. Thus (*X*_{1}, *X*_{2}) is not complete.
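
The incompleteness of (*X*_{1}, *X*_{2}) can be illustrated by simulation. The sketch below (Python standard library; the function name is ours) averages *X*_{1} − *X*_{2} over many samples and finds a value near 0 for every *θ* tried, even though *X*_{1} − *X*_{2} is not the zero function:

```python
import random

random.seed(0)

def mean_of_difference(theta, reps=100_000):
    """Average X1 - X2 over many samples of size 2 from N(theta, 1)."""
    total = 0.0
    for _ in range(reps):
        x1 = random.gauss(theta, 1.0)
        x2 = random.gauss(theta, 1.0)
        total += x1 - x2
    return total / reps

# E[X1 - X2] = theta - theta = 0 for every theta, yet X1 - X2 is not
# identically zero: (X1, X2) is therefore not complete.
for theta in (-3.0, 0.0, 5.0):
    print(theta, round(mean_of_difference(theta), 3))
```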

For some parametric families, a complete sufficient statistic does not exist (for example, see Galili and Meilijson 2016^{[5]}).

For example, if you take a sample of size *n* > 2 from a *N*(*θ*, *θ*²) distribution, then (*X̄*, *s*²) is a minimal sufficient statistic and is a function of any other minimal sufficient statistic, but *n* *X̄*²/(*n* + 1) − *s*² has an expectation of 0 for all *θ*, so there cannot be a complete sufficient statistic.
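
Assuming the statistic given above, a simulation sketch (Python standard library; names ours) confirms that *n* *X̄*²/(*n* + 1) − *s*² averages out near 0 under several values of *θ*:

```python
import random
import statistics

random.seed(1)

def zero_estimator_mean(theta, n=5, reps=100_000):
    """Average of n*xbar^2/(n+1) - s^2 over samples of size n from N(theta, theta^2)."""
    total = 0.0
    for _ in range(reps):
        xs = [random.gauss(theta, abs(theta)) for _ in range(n)]
        xbar = statistics.fmean(xs)
        s2 = statistics.variance(xs)    # sample variance, divisor n - 1
        total += n * xbar * xbar / (n + 1) - s2
    return total / reps

# E[xbar^2] = theta^2 (n+1)/n and E[s^2] = theta^2, so the statistic has
# expectation 0 for every theta, although it is not identically zero.
for theta in (1.0, 2.0, -3.0):
    print(theta, round(zero_estimator_mean(theta), 2))
```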

If there is a minimal sufficient statistic then any complete sufficient statistic is also minimal sufficient. But there are pathological cases where a minimal sufficient statistic does not exist even if a complete statistic does.

The notion of completeness has many applications in statistics, particularly in the following two theorems of mathematical statistics.

**Completeness** occurs in the Lehmann–Scheffé theorem,^{[6]}
which states that if a statistic is unbiased, **complete** and sufficient for some parameter *θ*, then it is the best mean-unbiased estimator for *θ*. In other words, this statistic has a smaller expected loss than any other unbiased estimator for every convex loss function; in particular, under the squared-error loss function it has the smallest mean squared error among all estimators with the same expected value.

Examples exist in which the minimal sufficient statistic is **not complete**; then several alternative statistics exist for unbiased estimation of *θ*, and some of them have lower variance than others.^{[7]}

See also minimum-variance unbiased estimator.
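
To illustrate the practical content of the Lehmann–Scheffé theorem, the following sketch (Python standard library; names ours) compares two unbiased estimators of a normal mean with known variance: the sample mean, a function of the complete sufficient statistic, and the sample median. The mean comes out with the smaller sampling variance:

```python
import random
import statistics

random.seed(2)

def estimator_variances(theta=0.0, n=9, reps=50_000):
    """Sampling variance of the mean and median over N(theta, 1) samples of size n."""
    means, medians = [], []
    for _ in range(reps):
        xs = [random.gauss(theta, 1.0) for _ in range(n)]
        means.append(statistics.fmean(xs))
        medians.append(statistics.median(xs))
    return statistics.variance(means), statistics.variance(medians)

v_mean, v_median = estimator_variances()
# Both estimators are unbiased for theta, but the Lehmann-Scheffe estimator
# (the mean) has the smaller variance: roughly 1/9 versus roughly pi/(2*9).
print(round(v_mean, 3), round(v_median, 3))
```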

**Bounded completeness** occurs in Basu's theorem,^{[8]} which states that a statistic that is both **boundedly complete** and sufficient is independent of any ancillary statistic.
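
Basu's theorem can be illustrated with the normal model with known variance: the sample mean is boundedly complete and sufficient for *θ*, while the sample variance is ancillary. The sketch below (Python standard library; names ours) estimates their correlation, which the independence guaranteed by the theorem forces to be zero:

```python
import random
import statistics

random.seed(3)

def corr_mean_variance(theta=1.0, n=5, reps=50_000):
    """Sample correlation between xbar (complete sufficient for theta) and
    s^2 (ancillary when the variance is known) over N(theta, 1) samples."""
    xbars, s2s = [], []
    for _ in range(reps):
        xs = [random.gauss(theta, 1.0) for _ in range(n)]
        xbars.append(statistics.fmean(xs))
        s2s.append(statistics.variance(xs))
    mx, ms = statistics.fmean(xbars), statistics.fmean(s2s)
    cov = statistics.fmean([(a - mx) * (b - ms) for a, b in zip(xbars, s2s)])
    return cov / (statistics.stdev(xbars) * statistics.stdev(s2s))

# Basu's theorem predicts independence, hence zero correlation.
print(round(corr_mean_variance(), 3))
```

Near-zero correlation is only a necessary consequence of the independence the theorem asserts, but it is the part a simulation can check directly.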

**Bounded completeness** also occurs in Bahadur's theorem. In the case where there exists at least one minimal sufficient statistic, a statistic which is sufficient and boundedly complete is necessarily minimal sufficient.
Another form of Bahadur's theorem states that any sufficient and boundedly complete statistic over a finite-dimensional coordinate space is also minimal sufficient.^{[9]}

1. **^** Casella, George; Berger, Roger L. (2001). *Statistical Inference*. CRC Press. ISBN 978-1-032-59303-6.
2. **^** Young, G. A.; Smith, R. L. (2005). *Essentials of Statistical Inference*. Cambridge University Press. p. 94.
3. **^** Casella, G.; Berger, R. L. (2001). *Statistical Inference*. Duxbury Press. pp. 285–286.
4. **^** Orloff, Jeremy. "Uniqueness of Laplace Transform" (PDF).
5. **^** Galili, Tal; Meilijson, Isaac (2016). "An Example of an Improvable Rao–Blackwell Improvement, Inefficient Maximum Likelihood Estimator, and Unbiased Generalized Bayes Estimator". *The American Statistician*. **70** (1): 108–113. doi:10.1080/00031305.2015.1100683. PMC 4960505. PMID 27499547.
6. **^** Casella, George; Berger, Roger L. (2001). *Statistical Inference* (2nd ed.). Duxbury Press. ISBN 978-0534243128.
7. **^** Galili, Tal; Meilijson, Isaac (2016). "An Example of an Improvable Rao–Blackwell Improvement, Inefficient Maximum Likelihood Estimator, and Unbiased Generalized Bayes Estimator". *The American Statistician*. **70** (1): 108–113. doi:10.1080/00031305.2015.1100683. PMC 4960505. PMID 27499547.
8. **^** Casella, G.; Berger, R. L. (2001). *Statistical Inference*. Duxbury Press. p. 287.
9. **^** "Statistical Inference Lecture Notes" (PDF). July 7, 2022.
