Written by Richard Brereton, Bristol University
Classification is one of the fundamental tasks of the chemometrician: information from a series of measurements is used to determine which of one or more groups a sample originates from. Much classical work in multivariate statistics originates from biology, in particular the pioneering work of R. A. Fisher in the 1930s, in which organisms were classified into species according to their physical measurements.
Biologists originally developed multivariate techniques to cope with the problem of classifying organisms, because a single measurement is usually insufficient to distinguish species. Consider trying to determine whether an organism is a baby mouse or a rhinoceros beetle. Their lengths are fairly similar, so length alone cannot assign an organism to a specific group. What about another measurement, such as width? A rhinoceros beetle may be as wide as an adult mouse, so width alone is not sufficient either. Figure 1 illustrates the dilemma: neither length nor width on its own can distinguish a mouse from a beetle.
What is necessary is to use a series of measurements together. For example, the rhinoceros beetle's width is quite large compared to its length, so the ratio of length to width may be a better discriminator than either measurement alone. Biologists therefore record a series of measurements and use them in combination to distinguish different organisms. Some of the early pioneering work by biologists is quite elegant, for example using various body measurements to distinguish closely related species, an approach that is particularly useful for the fossil record. Biology, especially taxonomy, was a powerful historical driving force for the development of multivariate methods in classification.
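The mouse and beetle argument can be made concrete with a small sketch. The measurements below are hypothetical numbers invented purely for illustration (they are not taken from Figure 1 or from any real dataset), and the helper name `ranges_overlap` is likewise an assumption of this sketch. The code checks whether the two groups overlap on length, on width, and on the length-to-width ratio.

```python
# Hypothetical (length_mm, width_mm) measurements, invented for illustration only.
samples = {
    "mouse": [(40.0, 14.0), (55.0, 20.0), (70.0, 26.0)],
    "beetle": [(35.0, 19.0), (45.0, 24.0), (50.0, 27.0)],
}


def ranges_overlap(values_a, values_b):
    """Return True if the [min, max] intervals of the two groups intersect."""
    return min(values_a) <= max(values_b) and min(values_b) <= max(values_a)


# Build one list of values per group for each candidate variable.
lengths = {name: [l for l, _ in rows] for name, rows in samples.items()}
widths = {name: [w for _, w in rows] for name, rows in samples.items()}
ratios = {name: [l / w for l, w in rows] for name, rows in samples.items()}

# A variable can separate the groups on its own only if the ranges do not overlap.
overlap = {
    "length": ranges_overlap(lengths["mouse"], lengths["beetle"]),
    "width": ranges_overlap(widths["mouse"], widths["beetle"]),
    "length/width": ranges_overlap(ratios["mouse"], ratios["beetle"]),
}

for variable, overlaps in overlap.items():
    verdict = "groups overlap" if overlaps else "groups are separated"
    print(f"{variable}: {verdict}")
```

With these made-up values, length and width each overlap between the two groups, while the ratio does not. A real classification problem would of course use many more samples and a proper multivariate method rather than a simple range check.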