Discriminant analysis
Used to assign members to groups, or differentiate between groups
Groups are pre-determined
Assumes existence of a well-defined group structure; also assumes
data are multivariate normally distributed
Consider a plot of quadrats in species-dimensional
space
- Conceptually, discriminant analysis
seeks similar entities and
groups them together.
- Similarity is sought by using Mahalanobis distance (a
distance measure which "corrects" for the correlation
between species). It is identical to Euclidean distance if
species are uncorrelated and have unit variance.
- Use of Mahalanobis distances requires multivariate
normality
- Unlike univariate ANOVA, discriminant analysis is
sensitive to deviations from multivariate normality
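A minimal sketch of the Mahalanobis distance described above, assuming NumPy is available. The two quadrats, the two-species covariance matrices, and the helper name `mahalanobis` are hypothetical and chosen only for illustration. With an identity covariance matrix the result matches Euclidean distance; with correlated species it does not.

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Mahalanobis distance between vectors x and y for covariance matrix cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov @ z = diff instead of forming the inverse explicitly.
    z = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ z))

# Two hypothetical quadrats, each described by abundances of two species.
a = np.array([3.0, 4.0])
b = np.array([0.0, 0.0])

euclid = float(np.linalg.norm(a - b))        # 5.0
d_identity = mahalanobis(a, b, np.eye(2))    # identity covariance: also 5.0

# Positively correlated species: a difference along the shared trend
# counts for less than the raw Euclidean distance suggests.
cov_corr = np.array([[1.0, 0.8],
                     [0.8, 1.0]])
d_corr = mahalanobis(a, b, cov_corr)

print(euclid, d_identity, d_corr)
```

Here `d_corr` comes out smaller than `euclid` because both species differ in the same direction as their correlation.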
- DA is a maximization technique: it does an F-test to
determine which variable maximally discriminates between
groups
- Mechanically, the procedure uses prediction equations
- equations are called classification or discriminant
functions
- they are analogous to regression equations
- e.g., R = c1(X1) + c2(X2) + c3(X3) + ... + cn(Xn), where
- R = biogeographic province
- c = coefficient (a constant estimated from the data)
- X = species abundance, for various species (Xi's)
- Values of R indicate province to which quadrats should
be assigned
- Coefficients, then, determine to which group a quadrat will
be assigned
- Coefficients are sensitive to deviations from
multivariate normality
- Coefficients are sensitive to prior probabilities
(which must be assigned)
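The role of coefficients and prior probabilities can be sketched as follows. This is one common textbook form of linear classification functions (pooled within-group covariance, one function per group, log prior added to the constant), not necessarily the exact procedure the notes have in mind. The function names, the toy quadrats, and the provinces "A" and "B" are all hypothetical.

```python
import numpy as np

def fit_classification_functions(X, groups, priors):
    """Return (labels, coefs, consts); priors follow the sorted label order."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, p = X.shape
    means = np.array([X[groups == g].mean(axis=0) for g in labels])
    # Pooled within-group covariance (assumes equal covariances across groups).
    pooled = np.zeros((p, p))
    for g, m in zip(labels, means):
        d = X[groups == g] - m
        pooled += d.T @ d
    pooled /= (n - len(labels))
    coefs = np.linalg.solve(pooled, means.T).T           # row k: S^-1 @ mean_k
    consts = -0.5 * np.sum(coefs * means, axis=1) + np.log(priors)
    return labels, coefs, consts

def classify(x, labels, coefs, consts):
    """Assign x to the group whose classification function scores highest."""
    scores = coefs @ np.asarray(x, dtype=float) + consts
    return labels[int(np.argmax(scores))]

# Hypothetical quadrats: abundances of two species in two provinces.
X = [[1, 2], [2, 1], [2, 3], [3, 2],     # province A
     [4, 5], [5, 4], [5, 6], [6, 5]]     # province B
groups = ["A"] * 4 + ["B"] * 4
new_quadrat = [3.4, 3.4]

equal = fit_classification_functions(X, groups, priors=[0.5, 0.5])
skewed = fit_classification_functions(X, groups, priors=[0.1, 0.9])
print(classify(new_quadrat, *equal))     # A
print(classify(new_quadrat, *skewed))    # B
```

The same quadrat changes province when only the priors change, which is the sensitivity noted above.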
- DA assumes that covariances are equal within groups
- But if group structure exists, then relationships between
species should be different in different groups
- Finally, there is a problem w/ statistical bias:
- Group membership is based on all samples, including the
sample being assigned to a group. Thus, the quadrat being
considered for group membership contributes info to the
determination of coefficients.
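One way to see this bias is to compare classification accuracy when each quadrat helps estimate the functions used to classify it (resubstitution) with accuracy when it is held out first (leave-one-out). The sketch below is an illustration under stated assumptions, not a procedure from the notes: it uses simulated pure-noise abundances, equal priors, and a pooled covariance. All names are hypothetical.

```python
import numpy as np

def lda_predict(X_train, y_train, x_new):
    """Assign x_new to the group with the nearest centroid (Mahalanobis)."""
    labels = np.unique(y_train)
    means = np.array([X_train[y_train == g].mean(axis=0) for g in labels])
    p = X_train.shape[1]
    pooled = np.zeros((p, p))
    for g, m in zip(labels, means):
        d = X_train[y_train == g] - m
        pooled += d.T @ d
    pooled /= (len(y_train) - len(labels))
    diffs = means - x_new
    # Squared Mahalanobis distance from x_new to each group centroid.
    d2 = np.einsum("ij,ij->i", diffs, np.linalg.solve(pooled, diffs.T).T)
    return labels[int(np.argmin(d2))]

def resubstitution_accuracy(X, y):
    """Each quadrat is classified by functions it helped to estimate."""
    hits = [lda_predict(X, y, X[i]) == y[i] for i in range(len(y))]
    return float(np.mean(hits))

def leave_one_out_accuracy(X, y):
    """Each quadrat is removed before the functions are estimated."""
    hits = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        hits.append(lda_predict(X[keep], y[keep], X[i]) == y[i])
    return float(np.mean(hits))

# Simulated data: 20 quadrats, 6 species, no real group structure.
rng = np.random.default_rng(0)
n_per_group, n_species = 10, 6
X = rng.normal(size=(2 * n_per_group, n_species))
y = np.repeat([0, 1], n_per_group)

resub = resubstitution_accuracy(X, y)
loo = leave_one_out_accuracy(X, y)
print(f"resubstitution: {resub:.2f}  leave-one-out: {loo:.2f}")
```

Because the simulated groups are arbitrary, any accuracy well above chance under resubstitution reflects the bias rather than real structure; the leave-one-out figure is the less optimistic estimate.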
- DA is often used in ecology, at least partially because it is one
of the few multivariate tools which allows a statistical
hypothesis test
- In practice w/ real data, calculation of the F-statistic is
so sensitive to violations of assumptions (bias, normality,
equality of covariances, assignment of prior probabilities)
that the F-test is badly flawed
- Nonetheless, discrimination is often quite good--as a
maximization technique, DA capitalizes on small
discriminating differences --> good classification into
groups
- Furthermore, R^2 values are usually high, because sample
size (number of quadrats) is usually small relative to
the number of independent variables (species)
- Therefore, DA almost always looks good on paper
- However, the associated F-test should be viewed w/
extreme caution
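The point about high R^2 values can be illustrated with a small simulation. As an assumption for illustration only, R^2 is taken here as the R^2 of an ordinary least-squares regression of a 0/1 group indicator on species abundances. The abundances are pure noise, so any fit is spurious. The names `r_squared` and `n_quadrats` are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(1)
n_quadrats = 12
y = np.repeat([0.0, 1.0], n_quadrats // 2)            # arbitrary group labels
noise = rng.normal(size=(n_quadrats, n_quadrats - 1))  # "species" with no signal

# R^2 as more noise species are added, from 1 up to n_quadrats - 1.
r2_by_species_count = [r_squared(noise[:, :k], y) for k in range(1, n_quadrats)]
for k, r2 in enumerate(r2_by_species_count, start=1):
    print(k, round(r2, 3))
```

R^2 can only rise as species are added, and it reaches 1 once the number of species is one less than the number of quadrats, even though the groups carry no information.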
- Williams (1983, Ecology 64:1283-1291) concluded that there is
widespread use of DA in ecology, and almost always it is done
incorrectly
- He did not recommend using DA only when assumptions are
rigorously satisfied
- But there is a difference between "exploration" and
"confirmation"
- Statistical procedures can be used to explore data
whether assumptions are met or not
- But any perceived patterns should be regarded as
preliminary and should be used to suggest hypotheses
which can be subsequently tested