Prediction in linear models

Written by Agnar Höskuldsson, DTU
Chemometrics takes an empirical approach to modelling data: one can say that the data are used to generate the model. This is the main criticism from the statistical community. It is in the nature of the natural sciences to argue for a mathematical model first, and then to use data to estimate the unknown parameters in that model.
The estimation methods should be methodologically correct and provide estimates with good properties. If parameters in the model are not significant, they may or should be excluded from the model.
Chemometricians argue that this approach is not rational when working with industrial data and most types of scientific data. The argument is that such data typically have reduced rank, which makes it infeasible to estimate the unbiased or exact parameters of the model. Instead, it is suggested to find the latent structure in the data and use it for prediction. It is recommended to validate the model in different ways: both by studying the inherent features of the latent structure with graphical methods and by testing the model with cross-validation procedures. The argument is that the latent structure usually gives better predictions than the corresponding ‘full rank’ model estimated by traditional statistical methods. This issue of better performance is considered more closely here.
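The argument above can be illustrated with a small simulation. The following is a minimal sketch, not the author's method: it assumes simulated data whose 30 variables are driven by 3 latent factors plus small noise, and it uses principal component regression as a simple stand-in for latent-structure methods in general. All names, sizes, and noise levels (`simulate`, `fit_full_rank`, `fit_latent`, `cv_mse`, 40 samples, 5 folds) are assumptions chosen for illustration only.

```python
import numpy as np

def simulate(n=40, p=30, rank=3, seed=0):
    """Simulate X with an assumed low-rank latent structure plus small noise."""
    rng = np.random.default_rng(seed)
    T = rng.standard_normal((n, rank))   # latent scores
    P = rng.standard_normal((rank, p))   # loadings
    X = T @ P + 0.01 * rng.standard_normal((n, p))
    y = T @ rng.standard_normal(rank) + 0.1 * rng.standard_normal(n)
    return X, y

def fit_full_rank(X, y):
    """Ordinary least squares on centred data; returns a predict function."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    b, *_ = np.linalg.lstsq(X - x_mean, y - y_mean, rcond=None)
    return lambda X_new: (X_new - x_mean) @ b + y_mean

def fit_latent(X, y, k):
    """Principal component regression on the first k components."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    U, s, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    # Regress y on the first k score vectors, then map back to coefficients on X.
    b = Vt[:k].T @ ((U[:, :k].T @ (y - y_mean)) / s[:k])
    return lambda X_new: (X_new - x_mean) @ b + y_mean

def cv_mse(fit, X, y, folds=5, seed=1):
    """K-fold cross-validated mean squared prediction error."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = []
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        predict = fit(X[train], y[train])
        sq_err.extend((y[test] - predict(X[test])) ** 2)
    return float(np.mean(sq_err))

X, y = simulate()
mse_full = cv_mse(fit_full_rank, X, y)
mse_latent = cv_mse(lambda A, b: fit_latent(A, b, k=3), X, y)
print(f"full rank: {mse_full:.4f}   latent (k=3): {mse_latent:.4f}")
```

Under these assumptions the latent model should show the lower cross-validated error, because the full-rank fit spends most of its parameters on directions in X that carry only noise. The size of the gap depends on the chosen dimensions and noise levels, so the numbers illustrate the argument rather than prove it.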