Chemometrics in Metabonomics and Metabolomics
Written by Johan Trygg, Umeå University & Torbjörn Lundstedt, AcurePharma   

In the post-genomics era, the use of methodologies that enable transcriptomic, proteomic and metabolomic data to be analysed in detail has revolutionized biological investigations. One of the major advantages of metabolomics investigations compared to traditional target metabolite analysis is that metabolomics data can give an unbiased view of changes in metabolism during environmental, genetic or developmental changes. Instead of tracking only a few metabolites, changes in the relative amounts of 300 to 1000 or even more metabolites can be recorded and analysed, covering all major metabolic pathways.

This development has accentuated the need to apply and further develop chemometric methodology.

However, in biology, chemometric methodology has been largely overlooked in favor of traditional statistics. Only recently has the overwhelming size and complexity of the data from the ‘omics’ technologies driven biology toward the adoption of chemometric methods. These are efficient and robust methods for modeling and analysis of complicated chemical/biological data tables that produce interpretable and reliable models capable of handling incomplete, noisy, and collinear data structures. They include principal component analysis (PCA) [Ref 1], partial least squares (PLS) [Refs 3,4] and orthogonal PLS (OPLS) [Refs 5-7]. Chemometrics also provides a means of collecting relevant information through Design of Experiments (DOE) [Refs 8-10].

The underlying philosophy of chemometrics, in combination with the chemometric toolbox, can be applied efficiently throughout a metabonomic study. The philosophy is needed from the very start of a study, through the whole process, to the biological interpretation.

Chemometric Approach to Metabonomic Studies

Step 1: Define the Aim.
It is important to formulate the objectives and goals of the metabonomic study.

Step 2: Selection of Objects, e.g. using Multivariate Design.
The selection of the objects (e.g., samples, individuals) needs to span the experimental domain in a balanced and systematic manner. To be able to do this, we have to characterize the objects with both measured and observed descriptors. This collected information represents a multivariate profile (with K descriptors) for each object that is a fingerprint of its inherent properties.

Step 3: Sample Preparation and Characterization.
In metabonomics, it is important to keep the experimental and biological variation at a minimum. At the same time, the metabolic analysis should be global, quantitative, robust, reproducible, accurate, and interpretable. Here, statistical design of experiments represents an important strategy to systematically investigate factors and optimize the experimental protocols. Typical working procedures for NMR spectroscopy for biofluids and tissue extraction are found in Appendix 4 of the SMRS Policy document [Ref 11]. For GC-MS, see [Refs 12,13].


Step 4: Evaluation of the Collected Data.
Class Specific Studies
Most of the published papers in the field deal with classification problems such as disease diagnosis or treated versus control, that is, distinguishing a group of control observations from another group of observations known to have a specific disease.

Dynamic Studies
Metabonomic studies that involve the quantification of the dynamic metabolic response are best evaluated using sequential sampling over an appropriate time course. The evaluation of human biofluid samples is further complicated by a high degree of normal physiological variation caused by genetic and lifestyle differences. Sampling period and interval are based on the expected or known pharmacokinetics of the expected effect. In other words, statistical experimental design is used to maximize the information content and increase the chances of capturing all possible variations of responses. This allows flexibility in the subsequent analysis and an unbiased evaluation of each individual’s kinetic profile. This also implies that the often assumed control (or pre-dose) and treated modeling approach is not optimal, as it fails to take into account the individual dynamics, for example, slow and fast responders. In addition, for dynamic studies, the traditional control group does not exist. Instead, each individual (object) is its own reference control.
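
The idea that each individual is its own reference control can be illustrated with a small sketch. It assumes NumPy is available; the sampling times, baseline levels and response curves are made up for illustration and do not come from any real study.

```python
import numpy as np

# Hypothetical sampling times in hours; index 0 is the pre-dose sample.
times = np.array([0, 2, 4, 8, 12, 24])

def synthetic_profile(baseline, peak_time):
    """Made-up response curve: individual baseline plus a bump at peak_time."""
    return baseline + np.exp(-((times - peak_time) / 4.0) ** 2)

# Shape (individuals, time points): a fast and a slow responder with
# different baseline levels of a single metabolite.
profiles = np.vstack([
    synthetic_profile(baseline=5.0, peak_time=4),
    synthetic_profile(baseline=7.0, peak_time=12),
])

# Each individual is its own reference: subtract its own pre-dose value.
delta = profiles - profiles[:, [0]]

# Time of maximum response, evaluated per individual rather than per group.
peak_times = times[np.argmax(delta, axis=1)]
print(peak_times)
```

Subtracting the individual pre-dose value removes the difference in baseline level, while the per-individual peak time keeps slow and fast responders apart instead of averaging them into one "treated" group.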

Chemometric techniques and methods

Design of Experiments - Making data contain information [Refs 8,9,10]

The metabonomics approach is more demanding on the quality, accuracy and richness of information in data sets. DOE is recommended to be used through the whole process, from defining the aim of the study to the final extraction of information.
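
As a minimal illustration of a balanced, systematic design, the sketch below builds a two-level full factorial design with the Python standard library. The factor names and levels are hypothetical and only indicate the kind of sample-preparation factors one might investigate; real studies often use fractional factorial or other designs, with added center points.

```python
from itertools import product

def full_factorial(factors):
    """Return all runs of a two-level full factorial design.

    `factors` maps a factor name to its (low, high) settings.
    Each run is a dict of factor name -> chosen setting.
    """
    names = list(factors)
    runs = []
    for levels in product(*(factors[name] for name in names)):
        runs.append(dict(zip(names, levels)))
    return runs

# Hypothetical extraction factors; names and levels are illustrative only.
factors = {
    "solvent_fraction": (0.6, 0.9),
    "temperature_C": (4, 25),
    "extraction_min": (5, 15),
}

design = full_factorial(factors)
for run in design:
    print(run)
```

With three factors this gives 2^3 = 8 runs, and every level of every factor occurs equally often, which is what makes the design balanced.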

Principal component analysis [Ref 1]
PCA is the workhorse of chemometrics for getting an overview of the multivariate profiles. Examining the scatter plot of the first two score vectors (t1-t2) reveals the homogeneity of the data, any groupings, outliers, and trends. Strong outliers are found as deviating points in the scatter plot.
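
A minimal sketch of how the t1 and t2 scores can be obtained is given below, assuming NumPy and a singular value decomposition of the mean-centered data. The data are synthetic (two groups plus one deliberately gross outlier), and the standardized score distance used to rank samples is a simple illustrative choice, not a formal outlier test.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of mean-centered X via SVD.

    Returns scores T (n x A), loadings P (K x A) and the fraction of
    variance explained by each retained component. Scaling to unit
    variance, often applied in practice, is left out for brevity.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]
    P = Vt[:n_components].T
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return T, P, explained

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))   # 20 samples, 50 variables (synthetic)
X[10:] += 1.5                   # second group, shifted in all variables
X[0] += 8.0                     # sample 0 made a strong outlier

T, P, explained = pca(X)

# Distance from the centre of the t1-t2 plot, in standardized score units.
z = T / T.std(axis=0, ddof=1)
distance = np.sqrt(np.sum(z ** 2, axis=1))
print("explained variance:", explained)
print("most deviating sample:", int(np.argmax(distance)))
```

Plotting `T[:, 0]` against `T[:, 1]` gives the t1-t2 score plot described above.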

The SIMCA method [Ref 2]
A supervised classification method based on PCA. The idea is to construct a separate PCA model for each known class of observations. These PCA models are then used to assign class membership to observations of unknown class origin by predicting these observations into each PCA class model, where the class boundaries have been defined by the 95% confidence interval.
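
The principle of one PCA model per class can be sketched as follows, assuming NumPy. Note the simplification: the class boundary here is the empirical 95th percentile of the training residual distances, which is an assumption of this sketch and not the statistically derived limit used in SIMCA implementations. The class name `ClassPCA`, the helper `simca_assign` and the data are hypothetical.

```python
import numpy as np

class ClassPCA:
    """PCA model of one class with an empirical residual-distance limit."""

    def __init__(self, X, n_components=2, quantile=0.95):
        self.mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.P = Vt[:n_components].T  # loadings, shape (K, A)
        # Simplified boundary: percentile of the training residuals.
        self.limit = np.quantile(self.residual_distance(X), quantile)

    def residual_distance(self, X):
        """Root-mean-square residual of each row after projection."""
        Xc = np.atleast_2d(X) - self.mean
        E = Xc - Xc @ self.P @ self.P.T
        return np.sqrt(np.mean(E ** 2, axis=1))

    def accepts(self, X):
        """True for rows whose residual distance is within the limit."""
        return self.residual_distance(X) <= self.limit

def simca_assign(models, x):
    """Names of all class models that accept observation x (may be empty)."""
    return [name for name, model in models.items() if model.accepts(x)[0]]

rng = np.random.default_rng(1)
K = 40  # number of variables

def make_class(center, n, loadings):
    """Synthetic class: center + two latent factors + small noise."""
    scores = rng.normal(size=(n, loadings.shape[0]))
    noise = 0.1 * rng.normal(size=(n, loadings.shape[1]))
    return center + scores @ loadings + noise

X_a = make_class(0.0, 30, rng.normal(size=(2, K)))
X_b = make_class(3.0, 30, rng.normal(size=(2, K)))
models = {"A": ClassPCA(X_a), "B": ClassPCA(X_b)}

outsider = np.full(K, -5.0)  # observation unlike either class
print(simca_assign(models, outsider))
```

Because each class has its own model, an observation can be accepted by one class, by several, or by none, which is a useful property when unknown samples may belong to a class that was not modeled.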

Partial Least-Squares (PLS) Method by Projections to Latent Structures, [Ref 3,4].
PLS is commonly used when a quantitative relationship is sought between two data tables: a matrix, X, usually comprising spectral or chromatographic data of a set of calibration samples, and another matrix, Y, containing quantitative values, for example, concentrations of endogenous metabolites. PLS can also be used in discriminant analysis, that is, PLS-DA. The Y matrix then contains qualitative values, for example, class membership, gender, and treatment of the samples.
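
For a single response variable y, the PLS calculation can be sketched in a few lines, assuming NumPy. This is a textbook-style PLS1 with sequential deflation, written for illustration; the function name `pls1_fit` and the synthetic "spectra" are hypothetical, and the number of components would normally be chosen by cross-validation, which is omitted here.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 (single y) by sequential deflation.

    Returns x_mean, y_mean and coefficients b such that
    y_hat = y_mean + (X_new - x_mean) @ b.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, c = [], [], []
    for _ in range(n_components):
        w = E.T @ f                  # weights: covariance of X with y
        w /= np.linalg.norm(w)
        t = E @ w                    # scores
        tt = t @ t
        p = E.T @ t / tt             # X loadings
        ca = f @ t / tt              # y loading
        E = E - np.outer(t, p)       # deflate X
        f = f - ca * t               # deflate y
        W.append(w)
        P.append(p)
        c.append(ca)
    W, P, c = np.array(W).T, np.array(P).T, np.array(c)
    b = W @ np.linalg.solve(P.T @ W, c)
    return x_mean, y_mean, b

rng = np.random.default_rng(2)
n, K = 30, 200                       # fewer samples than variables
S = rng.normal(size=(n, 3))          # hypothetical latent concentrations
L = rng.normal(size=(3, K))          # hypothetical pure "spectra"
X = S @ L + 0.05 * rng.normal(size=(n, K))
y = S @ np.array([1.0, -0.5, 0.25]) + 0.01 * rng.normal(size=n)

x_mean, y_mean, b = pls1_fit(X, y, n_components=3)
y_hat = y_mean + (X - x_mean) @ b
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"R2 on calibration data: {r2:.3f}")
```

The example has many more collinear variables than samples, a situation where ordinary least squares has no unique solution but a few latent components suffice. For PLS-DA, y would instead hold a class indicator such as 0/1.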

The Orthogonal-PLS Method (OPLS). [Ref 5,6,7,17,18]
The OPLS method is a recent modification of the PLS method. The main idea of OPLS is to separate the systematic variation in X into two parts, one that is linearly related to Y and one that is unrelated (orthogonal) to Y. This partitioning of the X-data facilitates model interpretation and model execution on new samples. OPLS-DA [Ref 17] is an extension combining the strengths of PLS-DA and SIMCA classification. O2PLS [Ref 6,7,18] is a further modification of the OPLS method that provides separate models for both joint and orthogonal variations between two blocks of data.
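
The separation into Y-related and Y-orthogonal variation can be sketched for the simplest case: a single y, one predictive component and one orthogonal component. The code assumes NumPy and loosely follows the steps described for single-y OPLS in [Ref 5]; it is a simplified illustration, not a full implementation (no cross-validation, no further orthogonal components, no prediction of new samples). The function name and the synthetic data are hypothetical.

```python
import numpy as np

def opls_one_orthogonal(X, y):
    """Single-y OPLS sketch: remove one y-orthogonal component from X."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    w = Xc.T @ yc                    # predictive weights
    w /= np.linalg.norm(w)
    t = Xc @ w                       # predictive scores
    p = Xc.T @ t / (t @ t)           # predictive loadings

    # The part of the loading that is not along w describes variation
    # in X that is unrelated to y. (Fails if p is exactly parallel to w,
    # in which case there is no orthogonal variation to remove.)
    w_ortho = p - (w @ p) * w
    w_ortho /= np.linalg.norm(w_ortho)
    t_ortho = Xc @ w_ortho           # orthogonal scores
    p_ortho = Xc.T @ t_ortho / (t_ortho @ t_ortho)

    X_filtered = Xc - np.outer(t_ortho, p_ortho)
    return {"w": w, "t": t, "p": p, "w_ortho": w_ortho,
            "t_ortho": t_ortho, "p_ortho": p_ortho,
            "X_filtered": X_filtered}

rng = np.random.default_rng(3)
n, K = 40, 60
y = rng.normal(size=n)
drift = rng.normal(size=n)                        # variation unrelated to y
X = (np.outer(y, rng.normal(size=K))              # y-related part
     + 3.0 * np.outer(drift, rng.normal(size=K))  # strong orthogonal part
     + 0.1 * rng.normal(size=(n, K)))

model = opls_one_orthogonal(X, y)
yc = y - y.mean()
print("t_ortho . y =", float(model["t_ortho"] @ yc))
```

By construction the orthogonal score vector is uncorrelated with y, so it can be inspected separately (for example to find instrument drift or sampling effects) while the predictive component is interpreted on its own.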

Batch Modeling. [Ref 14,15]
Batch modeling is routinely used for analysis of industrial batch process data. A batch process has a finite duration in time, in contrast to a continuous process. By analogy, batch modeling methods are used in metabonomic studies to model the time dependency or dynamics of biological processes, for example, the evolution of a toxic substance in rats. A drawback with batch modeling is that all study objects must have a similar metabolic and response rate; we cannot have slow and fast responders in the same model.

Hierarchical PCA. [Ref 16]
The idea behind hierarchical PCA is to block the variables to improve transparency and interpretability. This method operates on two or more levels. On each level, standard PCA scores and loading plots, as well as residuals and their summaries, such as DModX, are used for interpretation.
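
A two-level version of this idea is sketched below, assuming NumPy: a PCA model is fitted to each block of variables, and the block scores are then collected as the variables of an upper-level PCA. The block names and random data are purely illustrative, scaling the block scores to unit variance is an assumption of this sketch, and residual summaries such as DModX are not computed.

```python
import numpy as np

def pca(X, n_components):
    """Scores and loadings of mean-centered X via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components], Vt[:n_components].T

def hierarchical_pca(blocks, n_base=2, n_top=2):
    """Two-level PCA; `blocks` maps block name -> (n x K_b) array, same rows."""
    # Base level: one PCA model per block of variables.
    base = {name: pca(Xb, n_base) for name, Xb in blocks.items()}

    # Upper level: the block scores become the new variables.
    columns, labels = [], []
    for name, (T, _) in base.items():
        # Unit-variance scaling so no block dominates (a sketch assumption).
        columns.append(T / T.std(axis=0, ddof=1))
        labels += [f"{name}_t{a + 1}" for a in range(n_base)]
    top_T, top_P = pca(np.hstack(columns), n_top)
    return base, labels, top_T, top_P

rng = np.random.default_rng(4)
n = 25
blocks = {                      # hypothetical blocks measured on the same objects
    "NMR": rng.normal(size=(n, 100)),
    "GCMS": rng.normal(size=(n, 40)),
    "clinical": rng.normal(size=(n, 8)),
}

base, labels, top_T, top_P = hierarchical_pca(blocks)
for label, loading in zip(labels, top_P[:, 0]):
    print(f"{label}: {loading:+.2f}")
```

The upper-level loadings show which blocks drive a pattern among the objects; the base-level loadings of that block then show which of its variables are responsible.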

Discussion
The most common chemometrical tool used in the evaluation of a metabonomic study is PCA. PCA is always recommended as a starting point for analyzing multivariate data and will rapidly provide an overview of the information hidden in the data. Unfortunately, in a majority of the publications, the PCA method is the only tool applied. Often additional information can be extracted by using more advanced multivariate methods. In a few papers, PLS-DA and/or OPLS-DA have been used for modeling two classes of data to increase the class separation, simplify interpretation, and find potential biomarkers. For the two-class problem, OPLS-DA is recommended to obtain a clearer and more straightforward interpretation. It can also provide an understanding of the interclass variation. There is a general lack in applying Design of Experiments (DOE) to ensure balanced data and to have a defined experimental domain.

A future outlook for chemometrics in metabonomics is that the benefits of statistical experimental design in conjunction with more focused modeling methods such as PLS and OPLS become more widely known and applied to a much greater extent, not only for the two-class problems, but also for dynamic studies. However, it is likely to take some time until a fully integrated multivariate approach is published, based on the chemometric philosophy.

FURTHER READING

February issue 2007 of Journal of Proteome Research features a Metabonomics review.
See Editorial by Nicholson et al http://pubs.acs.org/subscribe/journals/jprobs/6/i02/html/0207edit.html

Chemometrics contribution partly described in this Editorial can be found here, Chemometrics in Metabonomics
http://pubs.acs.org/cgi-bin/article.cgi/jprobs/2007/6/i02/pdf/pr060594q.pdf

Download all articles and reviews in this issue of Journal of Proteome Research
http://pubs3.acs.org/acs/journals/toc.page?incoden=jprobs&indecade=0&involume=6&inissue=2

 

 

The Handbook of Metabonomics and Metabolomics
(Edited by J C Lindon, J K Nicholson & E Holmes)

  • Hardcover: 572 pages
  • Publisher: Elsevier Science (December 28, 2006)
  • Language: English
  • ISBN-10: 0444528415

    See also Amazon [http://www.amazon.com/Handbook-Metabonomics-Metabolomics-John-Lindon/dp/0444528415]

     

    REFERENCES

    1. Jackson, J. E. A User's Guide to Principal Components; Wiley: New York, 1991.

    2. Wold S: Pattern recognition by means of disjoint principal components models. Pattern Recognition 8(3): 127-139 (1976)

    3. Wold, S.; Ruhe, A.; Wold, H.; Dunn, W. J., III The Collinearity problem in linear regression. The partial least squares approach to generalized inverses. SIAM J. Sci. Stat. Comput. 1984, 5 (3), 735-743.

    4. Wold, S.; Martens, H.; Wold, H. Lecture Notes in Mathematics Proc. conf. Matrix pencils, Piteå, Sweden; Springer-Verlag: Heidelberg, 1983.

    5. Trygg, J.; Wold S. Orthogonal projections to latent structures (OPLS). J. Chemom. 2002, 16, 119-128.

    6. Trygg, J. O2-PLS for qualitative and quantitative analysis in multivariate calibration. J. Chemom. 2002, 16, 283-293.

    7. Trygg, J.; Wold, S. O2-PLS, a two-block (X-Y) latent variable regression (LVR) method with an integral OSC filter. J. Chemom. 2003, 17, 53-64.

    8. Lundstedt, T.; Seifert, E.; Abramo, L.; Thelin, B.; Nyström, A.; Pettersen, J.; Bergman R. Experimental design and optimization. Chemom. Intell. Lab. Syst. 1998, 42, 3-40.

    9. Box, G. E. P.; Hunter, W. G.; Hunter, J. S. Statistics for Experimenters; John Wiley & Sons: New York, 1978.

    10. Eriksson, L.; Johansson, E.; Kettaneh Wold, N.; Wikström, C.; Wold, S. Design of Experiments: Principles and Applications, Umetrics AB, Umeå, Sweden, 1996.

    11. The Standard Metabolic Reporting Structure, Version 2.3, http://www.smrsgroup.org/, January 13, 2006.

     

     

    12. Gullberg, J.; Jonsson, P.; Nordström, A.; Sjöström, M.; Moritz, T. Design of experiments: an efficient strategy to identify factors influencing extraction and derivatization of Arabidopsis thaliana samples in metabolomic studies with gas chromatography/mass spectrometry. Anal. Biochem. 2004, 331, 283-295.

    13. Jiye, A.; Trygg, J.; Gullberg, J.; Johansson, A. I.; Jonsson, P.; Antti, H.; Marklund, S. L.; Moritz, T. Extraction and GC/MS analysis of the human blood plasma metabolome. Anal. Chem. 2005, 77, 8086-8094.

    14. Wold, S.; Kettaneh, N.; Friden, H.; Holmberg, A. Modelling and diagnostics of batch processes and analogous kinetic experiments. Chemom. Intell. Lab. Syst. 1998, 44, 331-340.

    15. Antti, H.; Bollard, M. E.; Ebbels, T.; Keun, H.; Lindon, J. C.; Nicholson, J. K.; Holmes, E. Batch statistical processing of H-1 NMR-derived urinary spectral data. J. Chemom. 2002, 461-468.

    16. Wold, S.; Kettaneh, N.; Tjessem, K. Hierarchical multiblock, PLS and PC models for easier model interpretation and as an alternative to variable selection. J. Chemom. 1996, 10 (5-6), 463-482.

    17. Bylesjö M, Rantalainen M, Cloarec O, Nicholson JK, Holmes E, Trygg J, OPLS Discriminant Analysis, Combining the strengths of PLS-DA and SIMCA classification, Journal of Chemometrics, Jan, 2006, Early view [http://www3.interscience.wiley.com/cgi-bin/abstract/114103576/]

    18. Rantalainen M, Cloarec O, Beckonert O, Wilson ID, Jackson D, Rowlinson R, Jones S, Rayner S, Nickson J, Tonge R, Wilkinson R, Mills JD, Trygg J*, Nicholson JK, Holmes E, Statistically Integrated Metabonomic-Proteomic Studies on a Human Prostate Cancer Xenograft Model in Mice, J Proteome Res., 5 (10):2642-2655 Oct 6 2006 [http://dx.doi.org/10.1021/pr060124w]

     

     

     

     
