In statistics and research, internal consistency is typically a measure based on the correlations between different items on the same test (or the same subscale on a larger test). It measures whether several items that purport to measure the same general construct produce similar scores. For example, if a respondent expressed agreement with the statements "I like to ride bicycles" and "I've enjoyed riding bicycles in the past", and disagreement with the statement "I hate bicycles", this would indicate good internal consistency of the test.
Internal consistency is usually measured with Cronbach's alpha, a statistic calculated from the pairwise correlations between items. Alpha ranges between negative infinity and one. Coefficient alpha will be negative whenever there is greater within-subject variability than between-subject variability.[1]
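As an illustration, the following minimal sketch computes alpha in its common variance-based (raw) form, α = k/(k − 1) · (1 − Σ item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha`, the 1-to-5 agreement scale, and the five respondents' scores are invented for this example and do not come from any cited study. It reuses the bicycle items above and assumes the negatively worded item must be reverse-scored before alpha is computed.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Raw Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses on a 1-5 agreement scale to:
# "I like to ride bicycles", "I've enjoyed riding bicycles in the past", "I hate bicycles"
responses = [
    [5, 5, 1],
    [4, 4, 2],
    [2, 3, 4],
    [1, 2, 5],
    [3, 3, 3],
]

# Without reverse-scoring the third item, alpha is strongly negative.
alpha_raw = cronbach_alpha(responses)

# Reverse-score the negatively worded item (6 - x on a 1-5 scale).
recoded = [[a, b, 6 - c] for a, b, c in responses]
alpha_recoded = cronbach_alpha(recoded)

print(round(alpha_raw, 3), round(alpha_recoded, 3))
```

With these made-up data, alpha is negative before reverse-scoring and close to one afterwards, which illustrates both the unbounded lower range and the need to align item direction first.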
A commonly accepted rule of thumb for describing internal consistency is as follows:[2]
Cronbach's alpha | Internal consistency |
---|---|
0.9 ≤ α | Excellent |
0.8 ≤ α < 0.9 | Good |
0.7 ≤ α < 0.8 | Acceptable |
0.6 ≤ α < 0.7 | Questionable |
0.5 ≤ α < 0.6 | Poor |
α < 0.5 | Unacceptable |
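
The thresholds in the table can be written as a simple lookup. This is only a sketch of the rule of thumb as tabulated; the function name `describe_alpha` is hypothetical, and the labels should be read as rough descriptions rather than objective cut-offs.

```python
def describe_alpha(alpha):
    """Map a Cronbach's alpha value to the rule-of-thumb label from the table."""
    thresholds = [
        (0.9, "Excellent"),
        (0.8, "Good"),
        (0.7, "Acceptable"),
        (0.6, "Questionable"),
        (0.5, "Poor"),
    ]
    for lower_bound, label in thresholds:
        if alpha >= lower_bound:
            return label
    return "Unacceptable"  # covers every value below 0.5, including negatives


for value in (0.93, 0.85, 0.72, 0.61, 0.55, 0.2):
    print(value, describe_alpha(value))
```
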
Very high reliabilities (0.95 or higher) are not necessarily desirable, as they indicate that the items may be redundant.[3] The goal in designing a reliable instrument is for scores on similar items to be related (internally consistent), but for each item to contribute some unique information as well. Note further that Cronbach's alpha is necessarily higher for tests measuring narrower constructs, and lower when more generic, broad constructs are measured. This phenomenon, along with a number of other reasons, argues against using objective cut-off values for internal consistency measures.[4] Alpha is also a function of the number of items, so shorter scales will often have lower reliability estimates yet still be preferable in many situations because they place a lower burden on respondents.
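The dependence on the number of items can be seen from the standardized form of alpha, α = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ is the mean inter-item correlation. The sketch below holds r̄ fixed and varies k; the function name `standardized_alpha` and the value r̄ = 0.3 are assumptions chosen purely for illustration.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


mean_r = 0.3  # illustrative value, held constant across scale lengths
alphas = {k: standardized_alpha(k, mean_r) for k in (5, 10, 20)}

for k, alpha in alphas.items():
    print(f"{k} items: alpha = {alpha:.3f}")
```

Under these assumptions alpha rises from about 0.68 with 5 items to about 0.81 with 10 and about 0.90 with 20, even though the items are no more strongly related to one another.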
An alternative way of thinking about internal consistency is that it is the extent to which all of the items of a test measure the same latent variable. The advantage of this perspective over the notion of a high average correlation among the items of a test (the perspective underlying Cronbach's alpha) is that the average item correlation is affected by skewness in the distribution of item correlations, just as any other average is. Thus, whereas the modal item correlation is zero when the items of a test measure several unrelated latent variables, the average item correlation in such cases will be greater than zero. Consequently, although the ideal of measurement is for all items of a test to measure the same latent variable, alpha has been demonstrated many times to attain quite high values even when the set of items measures several unrelated latent variables.[5][6][7][8][9][10][11] The hierarchical "coefficient omega" may be a more appropriate index of the extent to which all of the items in a test measure the same latent variable.[12][13] Several different measures of internal consistency are reviewed by Revelle & Zinbarg (2009).[14][15]
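The point about unrelated latent variables can be illustrated with a constructed correlation matrix. The sketch below assumes a hypothetical 20-item test made of two clusters of 10 items, with a correlation of 0.5 between items in the same cluster and exactly 0 between items in different clusters, and applies the standardized alpha formula α = k·r̄ / (1 + (k − 1)·r̄). The helper names `block_corr` and `standardized_alpha_from_corr` and all numeric values are invented for illustration.

```python
def block_corr(n_clusters, items_per_cluster, within_r):
    """Correlation matrix with within_r inside each cluster and 0 between clusters."""
    k = n_clusters * items_per_cluster
    return [
        [
            1.0 if i == j
            else within_r if i // items_per_cluster == j // items_per_cluster
            else 0.0
            for j in range(k)
        ]
        for i in range(k)
    ]


def standardized_alpha_from_corr(corr):
    """Standardized alpha from the mean off-diagonal correlation."""
    k = len(corr)
    off_diagonal = [corr[i][j] for i in range(k) for j in range(k) if i != j]
    mean_r = sum(off_diagonal) / len(off_diagonal)
    return k * mean_r / (1 + (k - 1) * mean_r)


corr = block_corr(n_clusters=2, items_per_cluster=10, within_r=0.5)
off_diagonal = [corr[i][j] for i in range(20) for j in range(20) if i != j]
share_zero = off_diagonal.count(0.0) / len(off_diagonal)
alpha_two_factors = standardized_alpha_from_corr(corr)

print(f"share of zero correlations: {share_zero:.2f}")
print(f"alpha: {alpha_two_factors:.3f}")
```

In this constructed case more than half of the item correlations are zero, yet the mean correlation is about 0.24 and alpha is about 0.86, a value the usual rule of thumb would call good despite the items measuring two unrelated variables.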