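In statistics, model specification is part of the process of building a statistical model: specification consists of selecting an appropriate functional form for the model and choosing which variables to include. For example, given personal income $y$ together with years of schooling $s$ and on-the-job experience $x$, we might specify a functional relationship $y = f(s, x)$ as follows:
$$\ln y = \ln y_0 + \rho s + \beta_1 x + \beta_2 x^2 + \varepsilon$$
where $\varepsilon$ is the unexplained error term, assumed to consist of independent and identically distributed Gaussian variables.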
The statistician Sir David Cox has said, "How [the] translation from subject-matter problem to statistical model is done is often the most critical part of an analysis".
Specification error and bias
Specification error occurs when the functional form or the choice of independent variables poorly represents relevant aspects of the true data-generating process. In particular, bias (the expected value of the difference between an estimated parameter and the true underlying value) occurs if an independent variable is correlated with the errors inherent in the underlying process. There are several possible causes of specification error; some are listed below.
An inappropriate functional form could be employed.
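The effect of an independent variable that is correlated with the errors can be illustrated by simulation. The following Python sketch is an illustration under assumed settings only: the data-generating process, the `leak` parameter, the sample sizes, and all function names are hypothetical. It repeatedly fits a one-regressor least squares line and averages the slope estimates to approximate the bias, that is, the expected difference between the estimate and the true value.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of the least squares line of y on x (single regressor)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_slope(rng, n, true_slope, leak):
    """Draw one sample and return the estimated slope.

    `leak` (a hypothetical knob) controls how strongly the regressor
    depends on the error term; leak = 0 means no correlation.
    """
    errors = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [rng.gauss(0.0, 1.0) + leak * e for e in errors]
    y = [1.0 + true_slope * a + e for a, e in zip(x, errors)]
    return ols_slope(x, y)


def estimated_bias(leak, true_slope=2.0, n=200, replications=500, seed=0):
    """Monte Carlo approximation of E[slope estimate] - true slope."""
    rng = random.Random(seed)
    slopes = [simulate_slope(rng, n, true_slope, leak) for _ in range(replications)]
    return statistics.fmean(slopes) - true_slope


bias_exogenous = estimated_bias(leak=0.0)
bias_endogenous = estimated_bias(leak=0.5)
print(f"bias, regressor independent of errors: {bias_exogenous:+.3f}")
print(f"bias, regressor correlated with errors: {bias_endogenous:+.3f}")
```

In this assumed setup the regressor has variance 1 + leak² and covariance leak with the error, so the large-sample bias of the slope is leak / (1 + leak²), or 0.4 for leak = 0.5. The simulated value should be close to that figure, while the uncorrelated case should be close to zero.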
In the example given above relating personal income to schooling and job experience, if the assumptions of the model are correct, then the least squares estimates of the model's parameters will be efficient and unbiased. Hence specification diagnostics usually involve testing the first to fourth moments of the residuals.
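A diagnostic based on residual moments might look like the following Python sketch. It is a minimal illustration, not a complete diagnostic suite: the simulated data, the one-regressor model, and the function names are assumptions made for the example. The Jarque-Bera statistic is used here as one common way to combine the third and fourth moments into a single check of Gaussian errors.

```python
import random
import statistics


def fit_line(x, y):
    """Least squares intercept and slope for a single regressor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def residual_moments(residuals):
    """Mean, variance, skewness and excess kurtosis of the residuals."""
    n = len(residuals)
    mean = statistics.fmean(residuals)
    centered = [r - mean for r in residuals]
    m2 = sum(c ** 2 for c in centered) / n
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    return {
        "mean": mean,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


def jarque_bera(residuals):
    """Jarque-Bera statistic: n/6 * (skewness^2 + excess_kurtosis^2 / 4)."""
    m = residual_moments(residuals)
    return len(residuals) / 6.0 * (m["skewness"] ** 2 + m["excess_kurtosis"] ** 2 / 4.0)


def simulate_residuals(rng, n, draw_error):
    """Fit a line to simulated data (hypothetical process) and return residuals."""
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    y = [1.0 + 0.5 * xi + draw_error(rng) for xi in x]
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


rng = random.Random(1)
gaussian_res = simulate_residuals(rng, 2000, lambda r: r.gauss(0.0, 1.0))
# Shifted exponential errors: mean zero, but strongly right-skewed.
skewed_res = simulate_residuals(rng, 2000, lambda r: r.expovariate(1.0) - 1.0)

for label, res in (("Gaussian errors", gaussian_res), ("skewed errors", skewed_res)):
    moments = {k: round(v, 3) for k, v in residual_moments(res).items()}
    print(label, moments, "JB =", round(jarque_bera(res), 1))
```

Under Gaussian errors the Jarque-Bera statistic is approximately chi-squared distributed with two degrees of freedom, so a value far above roughly 6 suggests that the error assumption is violated. Note that with an intercept in the model the residual mean is zero by construction, so the first moment is uninformative in this simple form.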
Building a model involves finding a set of relationships to represent the process that is generating the data. This requires avoiding all the sources of misspecification mentioned above.
One approach is to start with a model in general form that relies on a theoretical understanding of the data-generating process. Then the model can be fit to the data and checked for the various sources of misspecification, in a task called statistical model validation. Theoretical understanding can then guide the modification of the model in such a way as to retain theoretical validity while removing the sources of misspecification. But if it proves impossible to find a theoretically acceptable specification that fits the data, the theoretical model may have to be rejected and replaced with another one.
A quotation from Karl Popper is apposite here: "Whenever a theory appears to you as the only possible one, take this as a sign that you have neither understood the theory nor the problem which it was intended to solve".
Another approach to model building is to specify several different models as candidates, and then compare those candidate models to each other. The purpose of the comparison is to determine which candidate model is most appropriate for statistical inference. Common criteria for comparing models include the following: R², Bayes factor, and the likelihood-ratio test together with its generalization, relative likelihood. For more on this topic, see statistical model selection.
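As a rough illustration of comparing candidate models, the Python sketch below fits polynomial candidates of degree 1, 2 and 3 to simulated data and reports R² and the likelihood-ratio statistic for nested pairs. Everything in it is an assumption made for the example: the data-generating process, the choice of polynomial candidates, and the helper names. Only two of the criteria mentioned above are shown; the Bayes factor is not computed.

```python
import math
import random


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def fit_polynomial(x, y, degree):
    """Least squares polynomial fit via the normal equations; returns (coef, rss)."""
    design = [[xi ** p for p in range(degree + 1)] for xi in x]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    coef = solve(xtx, xty)
    rss = sum((yi - sum(c * v for c, v in zip(coef, row))) ** 2 for row, yi in zip(design, y))
    return coef, rss


def r_squared(y, rss):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - rss / tss


def likelihood_ratio_statistic(rss_restricted, rss_full, n):
    """LR statistic for nested Gaussian linear models: n * ln(RSS_0 / RSS_1)."""
    return n * math.log(rss_restricted / rss_full)


# Hypothetical data: the true relationship is quadratic in x.
rng = random.Random(2)
n = 300
x = [rng.uniform(0.0, 4.0) for _ in range(n)]
y = [1.0 + 0.8 * xi - 0.15 * xi ** 2 + rng.gauss(0.0, 0.2) for xi in x]

rss = {degree: fit_polynomial(x, y, degree)[1] for degree in (1, 2, 3)}
r2 = {degree: r_squared(y, value) for degree, value in rss.items()}
lr_linear_vs_quadratic = likelihood_ratio_statistic(rss[1], rss[2], n)
lr_quadratic_vs_cubic = likelihood_ratio_statistic(rss[2], rss[3], n)

for degree in (1, 2, 3):
    print(f"degree {degree}: R^2 = {r2[degree]:.4f}")
print(f"LR, linear vs quadratic: {lr_linear_vs_quadratic:.1f}")
print(f"LR, quadratic vs cubic: {lr_quadratic_vs_cubic:.1f}")
```

Each likelihood-ratio statistic here tests one added coefficient, so it would be compared with a chi-squared distribution with one degree of freedom (5% critical value about 3.84). R² cannot decrease when a term is added to a nested least squares model, which is one reason it is a weak criterion on its own for choosing among candidates of different size.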