Multivariate Solutions to Metabonomic Profiling and Functional Genomics, Part 2

Written by Henrik Antti, Elaine Holmes, Jeremy Nicholson, Imperial College, London

Simple, unsupervised chemometric methods work well for data sets with a limited number of well-defined classes. However, biological systems are seldom simple, and many of the biofluid data sets generated in metabonomics require more sophisticated statistical analysis.
Consequently, various adaptations of principal components-, partial least squares- and neural network-based methods have been suggested to optimise the classification of toxicity or disease. Supervised chemometric procedures (i.e. those that incorporate prior knowledge of class identity) work either to maximise the separation between classes, rather than to explain the maximum variation in the data, or to construct statistical boundaries around each class. In addition, the calculated models can be used to predict the class of independent samples from a training set of samples of known origin or class. Typically, each database of spectra is divided into two sets, a training set and a validation set, and predictive multivariate models are constructed from the training data. The model can then be validated by predicting the outcome for the independent set of validation samples. The ability to predict the class of toxicity for unknown compounds is important to the pharmaceutical industry, particularly for implementing efficient toxicity screening in lead compound selection and for minimising attrition.
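
The training/validation workflow described above can be illustrated with a minimal Python sketch. It is not the authors' method: a nearest-centroid classifier stands in for the PLS- or neural network-based models mentioned in the text, and the "spectra" are synthetic numbers generated purely for illustration. The class labels (`control`, `dosed`), the bin count, the 70/30 split and all function names are assumptions made for this example.

```python
import random


def make_spectra(n_per_class, n_bins=20, seed=0):
    """Generate synthetic 'spectra' for two classes (illustrative data only).

    The hypothetical 'dosed' class has raised intensity in a few bins.
    """
    rng = random.Random(seed)
    samples = []
    for label, shifted_bins in (("control", ()), ("dosed", (3, 4, 11))):
        for _ in range(n_per_class):
            spectrum = [rng.gauss(1.0, 0.2) for _ in range(n_bins)]
            for b in shifted_bins:
                spectrum[b] += 1.0
            samples.append((spectrum, label))
    rng.shuffle(samples)
    return samples


def split(samples, train_fraction=0.7):
    """Divide the samples into a training set and a validation set."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


def fit_centroids(training):
    """Supervised step: use the known class labels to compute class means."""
    sums, counts = {}, {}
    for spectrum, label in training:
        if label not in sums:
            sums[label] = [0.0] * len(spectrum)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], spectrum)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def predict(centroids, spectrum):
    """Assign a spectrum to the class with the nearest centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], spectrum))


def validation_accuracy(centroids, validation):
    """Fraction of independent validation samples predicted correctly."""
    correct = sum(predict(centroids, s) == lab for s, lab in validation)
    return correct / len(validation)


samples = make_spectra(n_per_class=30)
training, validation = split(samples)
centroids = fit_centroids(training)       # model built from training data only
accuracy = validation_accuracy(centroids, validation)
print(f"validation accuracy: {accuracy:.2f}")
```

The key design point is that the validation samples play no part in fitting the model, so the reported accuracy estimates how well the model would classify genuinely unseen samples.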