The diehard tests are a battery of statistical tests for measuring the quality of a random number generator. They were developed by George Marsaglia over several years and first published in 1995 on a CD-ROM of random numbers.[1] In 2006, the original diehard tests were extended into the dieharder
tests.[2]
An initial battery of randomness tests for RNGs was suggested in the 1969 first edition of The Art of Computer Programming by Donald Knuth (Volume 2, Chapter 3.3: Statistical tests). Knuth's tests were then supplanted by George Marsaglia's Diehard tests (1996) consisting of fifteen different tests. The inability to modify the test parameters or add new tests led to the development of the TestU01 library, introduced in 2007 by Pierre L’Ecuyer and Richard Simard of the Université de Montréal.
Most of the tests in DIEHARD return a p-value, which should be uniform on [0,1) if the input file contains truly independent random bits. Those p-values are obtained by p = F(X), where F is the assumed distribution of the sample random variable X – often normal. But that assumed F is just an asymptotic approximation, for which the fit will be worst in the tails. Thus you should not be surprised with occasional p-values near 0 or 1, such as 0.0012 or 0.9983. When a bit stream really FAILS BIG, you will get ps of 0 or 1 to six or more places. Since there are many tests, it is not unlikely that a p < 0.025 or p > 0.975 means that the RNG has "failed the test at the 0.05 level". We expect a number of such events ps happen among the hundreds of events DIEHARD produces, even conditioned on the random number generator being perfect.