
Discriminant analysis

Used to assign members to groups, or differentiate between groups

Groups are pre-determined

Assumes existence of a well-defined group structure; also assumes data are multivariate normally distributed

Consider a plot of quadrats in species-dimensional space








Conceptually, discriminant analysis seeks similar entities and groups them together.

Similarity is measured with Mahalanobis distance (a distance measure which "corrects" for the correlation between species). It is identical to Euclidean distance if species are uncorrelated and standardized to unit variance.

Use of Mahalanobis distances requires multivariate normality
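To make the "correction" concrete, here is a minimal sketch using NumPy. The two-species covariance matrices and the points are made up for illustration; nothing here comes from real quadrat data. With an identity covariance matrix the Mahalanobis distance matches the Euclidean distance, while with correlated species the same Euclidean separation counts as small along the direction of correlation and large against it.

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Mahalanobis distance between points x and y for covariance matrix cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov @ z = diff rather than forming the inverse explicitly.
    z = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ z))

# Hypothetical centroid and two quadrats, each described by two species.
origin = np.array([0.0, 0.0])
along = np.array([2.0, 2.0])     # both species high together
against = np.array([2.0, -2.0])  # one species high, the other low

cov_identity = np.eye(2)                       # uncorrelated, unit variance
cov_correlated = np.array([[1.0, 0.8],
                           [0.8, 1.0]])        # positively correlated species

d_euclid = float(np.linalg.norm(along - origin))
d_identity = mahalanobis(along, origin, cov_identity)
d_along = mahalanobis(along, origin, cov_correlated)
d_against = mahalanobis(against, origin, cov_correlated)

print(d_euclid, d_identity, d_along, d_against)
```

Both quadrats are the same Euclidean distance from the centroid, but the one that departs from the usual species relationship is much farther away in Mahalanobis terms.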

Unlike univariate ANOVA, which is fairly robust to non-normality, discriminant analysis is sensitive to deviations from multivariate normality

DA is a maximization technique: it uses an F-test to determine which variable discriminates most strongly between groups
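As a rough illustration of ranking variables by an F-test, the sketch below computes an ordinary one-way ANOVA F statistic for each species separately. This is a simplification: stepwise DA programs typically use a partial F ("F-to-enter") that accounts for variables already selected. The abundance values and province labels are hypothetical.

```python
import numpy as np

def univariate_f(X, groups):
    """One-way ANOVA F statistic for each column (species) of X."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, k = X.shape[0], len(labels)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for g in labels:
        Xg = X[groups == g]
        group_mean = Xg.mean(axis=0)
        ss_between += len(Xg) * (group_mean - grand_mean) ** 2
        ss_within += ((Xg - group_mean) ** 2).sum(axis=0)
    # Mean square between groups divided by mean square within groups.
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical abundances: rows are quadrats, columns are three species.
X = np.array([[10, 2, 5],
              [12, 3, 6],
              [11, 1, 4],
              [ 3, 9, 5],
              [ 2, 11, 4],
              [ 4, 10, 6]])
province = np.array(["north", "north", "north", "south", "south", "south"])

f_values = univariate_f(X, province)
weakest = int(np.argmin(f_values))  # index of the least discriminating species
print(f_values, weakest)
```

In this made-up data set the third species has the same mean in both provinces, so its F statistic is zero and it contributes nothing to discrimination.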

Mechanically, the procedure uses prediction equations

These equations are called classification or discriminant functions

They are analogous to regression equations

e.g.,   R = c1(X1) + c2(X2) + c3(X3) + ... + cn(Xn), where

R = biogeographic province

c = coefficient (a constant estimated for each species)

X = species abundance, for various species (Xi's)

Values of R indicate province to which quadrats should be assigned

Coefficients, then, determine to which group a quadrat will be assigned

Coefficients are sensitive to deviations from multivariate normality

Coefficients are sensitive to prior probabilities (which must be assigned)
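The role of the coefficients and the prior probabilities can be sketched in a few lines of NumPy. The formulation below is one common textbook version (linear classification functions built from a pooled within-group covariance matrix, one function per group, with log(prior) added to the constant); it is an assumption for illustration, not necessarily the exact form behind the equation above. The function names, the abundance data, and the priors are all hypothetical.

```python
import numpy as np

def fit_classification_functions(X, groups, priors=None):
    """Return {group: (coefficients, constant)} for linear classification functions."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = sorted(set(groups.tolist()))
    n, p = X.shape
    if priors is None:
        priors = {g: 1.0 / len(labels) for g in labels}  # equal priors
    means = {}
    pooled = np.zeros((p, p))
    for g in labels:
        Xg = X[groups == g]
        means[g] = Xg.mean(axis=0)
        centered = Xg - means[g]
        pooled += centered.T @ centered
    pooled /= n - len(labels)  # pooled within-group covariance
    functions = {}
    for g in labels:
        coef = np.linalg.solve(pooled, means[g])
        const = -0.5 * means[g] @ coef + np.log(priors[g])
        functions[g] = (coef, const)
    return functions

def classify(x, functions):
    """Assign x to the group whose classification function scores highest."""
    x = np.asarray(x, dtype=float)
    scores = {g: float(coef @ x + const) for g, (coef, const) in functions.items()}
    return max(scores, key=scores.get)

# Hypothetical abundances of two species in six quadrats from two provinces.
X = np.array([[10, 2], [12, 3], [11, 1],
              [ 3, 9], [ 2, 11], [ 4, 10]])
province = np.array(["north", "north", "north", "south", "south", "south"])

equal = fit_classification_functions(X, province)
skewed = fit_classification_functions(X, province,
                                      priors={"north": 0.9, "south": 0.1})

borderline = [7.0, 6.2]  # a quadrat close to the boundary between provinces
print(classify(borderline, equal), classify(borderline, skewed))
```

The borderline quadrat is assigned to a different province once the priors are changed, which is the sensitivity to prior probabilities in miniature: quadrats far from the boundary are unaffected, but ambiguous ones follow the priors.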

DA assumes that covariances are equal within groups

But if group structure exists, then relationships between species are likely to differ among groups, which contradicts the equal-covariance assumption

Finally, there is a problem w/ statistical bias:

Group membership is based on all samples, including the sample being assigned to a group. Thus, the quadrat being considered for group membership contributes info to the determination of coefficients.
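One way to see this bias is to compare "resubstitution" accuracy (each quadrat is classified by functions it helped to fit) with leave-one-out accuracy (each quadrat is held out before fitting). The sketch below does this on pure random noise, so there is no real group structure to find. The classifier is a simple minimum-Mahalanobis-distance rule with a pooled covariance matrix, the sample sizes are arbitrary, and leave-one-out is a swapped-in check for illustration rather than part of the procedure described above.

```python
import numpy as np

def lda_predict(X_train, y_train, X_new):
    """Assign each row of X_new to the group with the nearest centroid
    in Mahalanobis distance (pooled within-group covariance)."""
    labels = np.unique(y_train)
    n, p = X_train.shape
    means = []
    pooled = np.zeros((p, p))
    for g in labels:
        Xg = X_train[y_train == g]
        m = Xg.mean(axis=0)
        means.append(m)
        pooled += (Xg - m).T @ (Xg - m)
    pooled /= n - len(labels)
    d2 = []
    for m in means:
        diff = X_new - m
        solved = np.linalg.solve(pooled, diff.T).T
        d2.append(np.einsum("ij,ij->i", diff, solved))  # squared distances
    return labels[np.argmin(np.array(d2), axis=0)]

def resubstitution_accuracy(X, y):
    """Fit on all quadrats, then classify those same quadrats."""
    return float(np.mean(lda_predict(X, y, X) == y))

def leave_one_out_accuracy(X, y):
    """Classify each quadrat using functions fitted without it."""
    n = len(y)
    hits = 0
    for i in range(n):
        keep = np.arange(n) != i
        pred = lda_predict(X[keep], y[keep], X[i:i + 1])[0]
        hits += int(pred == y[i])
    return hits / n

# Pure noise: 20 quadrats, 12 "species", two arbitrary groups of 10.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 12))
y = np.repeat([0, 1], 10)

resub = resubstitution_accuracy(X, y)
loo = leave_one_out_accuracy(X, y)
print(f"resubstitution: {resub:.2f}  leave-one-out: {loo:.2f}")
```

With many variables relative to the number of quadrats, resubstitution accuracy on noise is typically far above chance while the held-out accuracy is not, so the apparent success reflects the fitting procedure rather than the data.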

DA is often used in ecology, at least partially because it is one of few multivariate tools which allows a statistical test of hypothesis

In practice w/ real data, calculation of the F-statistic is so sensitive to violations of assumptions (bias, normality, equality of covariances, assignment of prior probabilities) that the F-test is badly flawed

Nonetheless, discrimination is often quite good--as a maximization technique, DA capitalizes on small discriminating differences --> good classification into groups

Furthermore, R² values are usually high, because sample size (number of quadrats) is usually small relative to the number of independent variables (species)

Therefore, DA almost always looks good on paper

However, the associated F-test should be viewed w/ extreme caution

Williams (1983, Ecology 64:1283-1291) concluded that DA is widely used in ecology and is almost always applied incorrectly

He recommended using DA only when assumptions are rigorously satisfied

But there is a difference between "exploration" and "confirmation"

Statistical procedures can be used to explore data whether assumptions are met or not

But any perceived patterns should be regarded as preliminary and should be used to suggest hypotheses which can be subsequently tested


