Classification methods
Written by Richard Brereton, Bristol University   
Classification is one of the fundamental tasks of the chemometrician. Information from a series of measurements is used to determine whether samples originate from one or more groups. Much classical work in multivariate statistics originates from biology, in particular the pioneering work of R A Fisher in the 1930s, in which organisms were classified into species according to their physical measurements. Biologists originally developed multivariate techniques to cope with the problem of classifying organisms, because one measurement is usually insufficient to distinguish species.

Consider trying to determine whether an organism is a baby mouse or a rhinoceros beetle. Their lengths are fairly similar, so length alone cannot assign an organism to a specific group. What about taking another measurement, such as width? A rhinoceros beetle may be as wide as an adult mouse, so width alone is not sufficient either. Figure 1 illustrates the dilemma. What is needed is a combination of measurements: for example, the rhinoceros beetle's width is quite large compared to its length, so the ratio of length to width may be a better discriminator.

Biologists therefore record a series of measurements and, by using them together, can distinguish different organisms. Some of the early work by biologists is quite elegant, for example using various body measurements to distinguish closely related species, which is particularly useful for the fossil record. Biology, especially taxonomy, was a powerful historical driving force for the development of multivariate methods in classification.
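The mouse and beetle argument can be made concrete with a small sketch. The measurements below are invented purely for illustration (they are not real biological data, and they are not taken from Figure 1), and the helper names `ranges_overlap`, `column` and `ratios` are hypothetical. The sketch only checks whether the range of values in one group overlaps the range in the other, first for each measurement alone and then for the length-to-width ratio.

```python
# Hypothetical (length, width) measurements in mm, invented for illustration only.
baby_mice = [(40.0, 12.0), (50.0, 15.0), (60.0, 18.0), (70.0, 21.0)]
rhino_beetles = [(36.0, 18.0), (40.0, 20.0), (50.0, 26.0), (60.0, 30.0)]


def ranges_overlap(a, b):
    """Return True if the [min, max] intervals of two value lists overlap."""
    return min(a) <= max(b) and min(b) <= max(a)


def column(samples, index):
    """Extract one measurement (0 = length, 1 = width) from every sample."""
    return [sample[index] for sample in samples]


def ratios(samples):
    """Length-to-width ratio for every sample."""
    return [length / width for length, width in samples]


# Each single measurement is compared between the two groups.
length_overlaps = ranges_overlap(column(baby_mice, 0), column(rhino_beetles, 0))
width_overlaps = ranges_overlap(column(baby_mice, 1), column(rhino_beetles, 1))

# The combined quantity (length / width) is compared in the same way.
ratio_overlaps = ranges_overlap(ratios(baby_mice), ratios(rhino_beetles))

print("length alone overlaps:", length_overlaps)
print("width alone overlaps: ", width_overlaps)
print("length/width overlaps:", ratio_overlaps)
```

With these made-up numbers the two groups overlap in length and in width, but not in the ratio, which is the point of the example. Real data would rarely separate this cleanly, and a multivariate method would normally be used to combine the measurements rather than a hand-picked ratio.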