
Similarity measures

Once data are collected, we may be interested in the similarity (or absence thereof) between different samples, quadrats, or communities

Numerous similarity indices have been proposed to measure the degree to which species composition of quadrats is alike (conversely, dissimilarity coefficients assess the degree to which quadrats differ in composition)



Jaccard coefficient

Simplest index, developed to compare regional floras (e.g., Jaccard 1912, The distribution of the flora of the alpine zone, New Phytologist 11:37-50); widely used to assess similarity of quadrats

Uses presence/absence data (i.e., ignores info about abundance)

SJ = a/(a + b + c), where

SJ = Jaccard similarity coefficient,

a = number of species common to (shared by) quadrats,

b = number of species unique to the first quadrat, and

c = number of species unique to the second quadrat

e.g., given the following data:

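quadrat   sp 1   sp 2   sp 3   sp 4   sp 5   sp 6
i          3      4      2      1      0      0
j          3      3      0      5      2      1

Species 1, 2, and 4 occur in both quadrats (a = 3), species 3 occurs only in quadrat i (b = 1), and species 5 and 6 occur only in quadrat j (c = 2), so SJ = 3/(3 + 1 + 2) = 0.50

The following sums over species k are used by the abundance-based indices:

$\sum_k y_{ki} = 3 + 4 + 2 + 1 + 0 + 0 = 10$

$\sum_k y_{kj} = 3 + 3 + 0 + 5 + 2 + 1 = 14$

$\sum_k y_{ki}^2 = 3^2 + 4^2 + 2^2 + 1^2 + 0^2 + 0^2 = 30$

$\sum_k y_{kj}^2 = 3^2 + 3^2 + 0^2 + 5^2 + 2^2 + 1^2 = 48$

$\sum_k y_{ki} y_{kj} = 3(3) + 4(3) + 2(0) + 1(5) + 0(2) + 0(1) = 26$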

SJ is frequently multiplied by 100%, and may be represented in terms of dissimilarity (i.e., DJ = 1.0 - SJ)
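As a minimal sketch (not part of the original notes), the counts a, b, and c and the Jaccard coefficient for example quadrats i and j can be computed as follows. The function names presence_counts and jaccard are illustrative assumptions, and the sketch assumes at least one species is present in one of the two quadrats.

```python
# Abundances of species 1-6 in the two example quadrats.
quadrat_i = [3, 4, 2, 1, 0, 0]
quadrat_j = [3, 3, 0, 5, 2, 1]


def presence_counts(yi, yj):
    """Return (a, b, c): species shared, unique to the first, unique to the second."""
    a = sum(1 for p, q in zip(yi, yj) if p > 0 and q > 0)
    b = sum(1 for p, q in zip(yi, yj) if p > 0 and q == 0)
    c = sum(1 for p, q in zip(yi, yj) if p == 0 and q > 0)
    return a, b, c


def jaccard(yi, yj):
    """Jaccard similarity SJ = a / (a + b + c); abundances are reduced to presence/absence."""
    a, b, c = presence_counts(yi, yj)
    return a / (a + b + c)


a, b, c = presence_counts(quadrat_i, quadrat_j)
s_j = jaccard(quadrat_i, quadrat_j)
print(a, b, c)       # 3 1 2
print(s_j, 1 - s_j)  # 0.5 0.5 (similarity SJ and dissimilarity DJ)
```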



Sørensen coefficient (syn. coefficient of community, CC)

A very simple index, similar to Jaccard's index

Gives greater "weight" to species common to the quadrats than to those found in only one quadrat

Uses presence/absence data:

SS = 2a/(2a + b + c), where

SS = Sørensen similarity coefficient,

a = number of species common to both quadrats,

b = number of species unique to the first quadrat, and

c = number of species unique to the second quadrat

SS usually is multiplied by 100% (e.g., SS = 67% for quadrats i and j above), and may be represented in terms of dissimilarity (i.e., DS = 1.0 - SS)
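A short sketch of the same calculation in Python, using the abundances of example quadrats i and j from the Jaccard section. The function name sorensen is an illustrative assumption.

```python
# Abundances of species 1-6 in the two example quadrats.
quadrat_i = [3, 4, 2, 1, 0, 0]
quadrat_j = [3, 3, 0, 5, 2, 1]


def sorensen(yi, yj):
    """Sørensen similarity SS = 2a / (2a + b + c) from presence/absence."""
    a = sum(1 for p, q in zip(yi, yj) if p > 0 and q > 0)   # shared species
    b = sum(1 for p, q in zip(yi, yj) if p > 0 and q == 0)  # only in first quadrat
    c = sum(1 for p, q in zip(yi, yj) if p == 0 and q > 0)  # only in second quadrat
    return 2 * a / (2 * a + b + c)


s_s = sorensen(quadrat_i, quadrat_j)
print(round(100 * s_s))  # 67
```

With a = 3, b = 1, and c = 2, the shared species are counted twice, giving 6/9 rather than the Jaccard value of 3/6.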



Similarity ratio

Similarity between quadrats i and j is

$SR_{ij} = \sum_k y_{ki} y_{kj} / (\sum_k y_{ki}^2 + \sum_k y_{kj}^2 - \sum_k y_{ki} y_{kj})$, where

  $y_{ki}$ = abundance of kth species in quadrat i


For presence-absence data, SR reduces to the Jaccard index
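The similarity ratio can be sketched in Python as below, using the abundances of example quadrats i and j. The function name similarity_ratio is an illustrative assumption. The second call converts the abundances to presence/absence (1/0) to illustrate the reduction to the Jaccard index.

```python
# Abundances of species 1-6 in the two example quadrats.
quadrat_i = [3, 4, 2, 1, 0, 0]
quadrat_j = [3, 3, 0, 5, 2, 1]


def similarity_ratio(yi, yj):
    """SR = sum(yi*yj) / (sum(yi^2) + sum(yj^2) - sum(yi*yj))."""
    cross = sum(p * q for p, q in zip(yi, yj))
    squares_i = sum(p * p for p in yi)
    squares_j = sum(q * q for q in yj)
    return cross / (squares_i + squares_j - cross)


sr_abundance = similarity_ratio(quadrat_i, quadrat_j)

# Presence/absence version of the same quadrats.
present_i = [1 if p > 0 else 0 for p in quadrat_i]
present_j = [1 if q > 0 else 0 for q in quadrat_j]
sr_presence = similarity_ratio(present_i, present_j)

print(sr_abundance)  # 0.5
print(sr_presence)   # 0.5, equal to a/(a + b + c) = 3/6
```

For these particular quadrats the abundance-based and presence/absence values happen to coincide; in general they differ.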



Percentage similarity (syn. Czekanowski coefficient)

Percentage similarity between quadrats i and j is

$PS_{ij} = 200 \sum_k \min(y_{ki}, y_{kj}) / (\sum_k y_{ki} + \sum_k y_{kj})$, where

$\min(y_{ki}, y_{kj})$ = minimum value of $y_{ki}$ and $y_{kj}$


For presence-absence data, PS reduces to the Sørensen index
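A minimal Python sketch of percentage similarity for example quadrats i and j follows. The function name percentage_similarity is an illustrative assumption. The presence/absence call illustrates the reduction to the Sørensen index (expressed as a percentage).

```python
# Abundances of species 1-6 in the two example quadrats.
quadrat_i = [3, 4, 2, 1, 0, 0]
quadrat_j = [3, 3, 0, 5, 2, 1]


def percentage_similarity(yi, yj):
    """PS = 200 * sum(min(yi, yj)) / (sum(yi) + sum(yj))."""
    shared = sum(min(p, q) for p, q in zip(yi, yj))
    return 200 * shared / (sum(yi) + sum(yj))


ps_abundance = percentage_similarity(quadrat_i, quadrat_j)

# Presence/absence version of the same quadrats.
present_i = [1 if p > 0 else 0 for p in quadrat_i]
present_j = [1 if q > 0 else 0 for q in quadrat_j]
ps_presence = percentage_similarity(present_i, present_j)

print(round(ps_abundance, 1))  # 58.3 (200 * 7 / 24)
print(round(ps_presence, 1))   # 66.7 (200 * 3 / 9, the Sørensen value as a percentage)
```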



Euclidean distance (syn. coefficient of squared Euclidean distance)

Generalized formula for Euclidean distance is

$ED_{ij} = \sqrt{\sum_k (y_{ki} - y_{kj})^2}$, where

$ED_{ij}$ = Euclidean distance between quadrats i and j, and

  $y_{ki}$ = abundance of kth species in quadrat i

  $y_{kj}$ = abundance of kth species in quadrat j


Euclidean distance is routinely used as a measure of dissimilarity in cluster analysis algorithms (the larger the distance, the less similar the quadrats)
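As a sketch, the standard Euclidean distance (the square root of the summed squared differences in abundance) for example quadrats i and j can be computed as below. The function name euclidean_distance is an illustrative assumption.

```python
import math

# Abundances of species 1-6 in the two example quadrats.
quadrat_i = [3, 4, 2, 1, 0, 0]
quadrat_j = [3, 3, 0, 5, 2, 1]


def euclidean_distance(yi, yj):
    """Square root of the sum over species of squared abundance differences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(yi, yj)))


ed = euclidean_distance(quadrat_i, quadrat_j)
print(round(ed, 2))  # 5.1 (square root of 0 + 1 + 4 + 16 + 4 + 1 = 26)
```

Unlike the similarity indices above, this value has no upper bound and grows with the abundances involved.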



Direct gradient analysis


Multivariate analyses are required for community data because we're interested in the response of many species, simultaneously

Multivariate analyses are used to summarize redundancy, reduce noise, elucidate relationships, and identify outliers

Multivariate analyses can relate communities to other kinds of data (e.g., environmental, historical data)

Results from multivariate analyses are designed to improve our understanding of communities, esp. community structure

Direct gradient analysis

Used to display distribution of organisms along gradients of important environmental factors

Devised by Ramensky (1930) and Gause (1930), but not used extensively in ecological research until after about 1950 (Whittaker)

An example:

Dix and Smeins (1967) took 100 community samples to represent the range of vegetation present in Nelson County, North Dakota

Homogeneous stands of 0.1 ha were sampled by recording frequency in 30, 0.5 × 0.5 m quadrats

Numerous environmental variables were recorded for each stand

Defined indicator species of a drainage class as a species w/ frequency at least 10% greater in that class than in any other class

Defined indicator value as drainage class of the indicator species {drainage classes vary from 1 (good) to 6 (poor)}

Goal: summarize frequency of all species --> single number for each stand

Stand Index Number = [Σ(rel. freq. × indicator value) / Σ(rel. freq. of indiv. sp)] × 100, where both sums are taken over the species that have an indicator value



e.g., Stand 17 (sample data) {RF=rel. freq., IV=indicator value}:

Spp.     RF     IV    RF x IV
Stco     20     1     20
Stvi     10     --    --   (not an indicator for any drainage class)
Acmi     15     2     30
Lica      5     3     15
Other    50     --    --
Sum      40*          65

*sum of RF for spp. w/ IV (20+15+5)

Stand Index 17 = (65/40) × 100 = 162.5 ≈ 162
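The Stand 17 calculation can be sketched in Python as follows. The function name stand_index and the use of None for species without an indicator value are illustrative assumptions; the RF and IV values are the sample data above.

```python
# (species, relative frequency, indicator value); None marks species with no indicator value.
stand_17 = [
    ("Stco", 20, 1),
    ("Stvi", 10, None),
    ("Acmi", 15, 2),
    ("Lica", 5, 3),
    ("Other", 50, None),
]


def stand_index(records):
    """Weighted mean indicator value of the indicator species, multiplied by 100."""
    indicators = [(rf, iv) for _, rf, iv in records if iv is not None]
    weighted_sum = sum(rf * iv for rf, iv in indicators)  # sum of RF x IV
    rf_sum = sum(rf for rf, _ in indicators)              # sum of RF for spp. w/ IV
    return 100 * weighted_sum / rf_sum


print(stand_index(stand_17))  # 162.5
```

Because the index is a frequency-weighted mean of indicator values (1 to 6) scaled by 100, it is bounded by 100 and 600.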

For all stands, stand index varied from 100 to 600

Divided this 500-unit gradient into 10, 50-unit classes:

                                        Species frequency
Class      Stand w/in 50-unit class     A      B      C
100-149    4
           9
           12
                                        XA     XB     XC
150-199

and so on ...

=========> Fig. 2 [Dix and Smeins 1967, p. 33]

They could have plotted frequency over the entire 500-unit gradient, but the graph would have been messy; grouping stands into 10 drainage classes "smooths" the graph, making interpretation easier
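A rough sketch of this grouping step, assuming that the frequency of a species is averaged over the stands falling within each 50-unit class. The stand index and frequency values below are hypothetical, not data from Dix and Smeins, and the function names class_number and class_means are illustrative.

```python
from collections import defaultdict

# Hypothetical (stand index, frequency of species A in %) pairs; not from Dix and Smeins.
stands = [(104, 10.0), (130, 20.0), (149, 30.0), (150, 40.0), (199, 60.0), (600, 5.0)]


def class_number(stand_index, low=100, width=50, n_classes=10):
    """Map a stand index to class 0..n_classes-1; the top value (600) joins the last class."""
    k = int((stand_index - low) // width)
    return min(max(k, 0), n_classes - 1)


def class_means(pairs):
    """Mean frequency of the species within each class that contains at least one stand."""
    groups = defaultdict(list)
    for index, freq in pairs:
        groups[class_number(index)].append(freq)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


means = class_means(stands)
for k, mean_freq in means.items():
    lower = 100 + 50 * k
    print(f"class starting at {lower}: mean frequency {mean_freq:.1f}")
```

Plotting one mean per class, rather than one point per stand, is what reduces the scatter in the resulting graph.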


The purpose of direct gradient analysis is to organize community and environmental data to answer questions such as:

  1. Precisely which environmental factor in a complex of factors principally affects distribution of organisms and communities?

    While direct gradient analysis can be used to identify ecologically important environmental factors, experimental manipulations are needed to more precisely determine the importance of various environmental factors

  2. How can environmental factors best be measured or estimated?

    Dix and Smeins derived an index for drainage based on the plants themselves: this may be easier, more accurate, and less expensive than other measures of drainage or soil moisture

  3. What additional environmental gradients affect community composition?

    Often difficult to evaluate because secondary gradients are overshadowed by primary gradients

  4. What general principles emerge from direct gradient analysis to characterize the combining of individual species into communities?




Characteristics of DGA:

Data are plotted along environmental axes which are generally accepted as given. Axes can be:

direct

indirect

synthetic

Species, communities, and community-level characteristics can be plotted

Several dimensions are possible

Some form of data-smoothing is usually employed prior to presentation

common smoothing technique is weighted average for each datum; e.g.,

{smoothed current datum = (previous datum + 2 × current datum + next datum)/4}

resulting curve is less "noisy" than original data
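The weighted average can be sketched in Python as below. The treatment of the two endpoints (left unchanged because they have only one neighbor) is an assumption, the function name smooth_121 is illustrative, and the input values are hypothetical.

```python
def smooth_121(values):
    """Apply (previous + 2 * current + next) / 4 to every interior point.

    Assumption: the first and last values are returned unchanged.
    """
    if len(values) < 3:
        return list(values)
    smoothed = [values[0]]
    for k in range(1, len(values) - 1):
        smoothed.append((values[k - 1] + 2 * values[k] + values[k + 1]) / 4)
    smoothed.append(values[-1])
    return smoothed


raw = [0, 4, 0, 8, 4]   # hypothetical frequencies along a gradient
print(smooth_121(raw))  # [0, 2.0, 3.0, 5.0, 4]
```

The two peaks in the hypothetical raw series (4 and 8) are flattened into a single gradual rise, which illustrates how smoothing can also hide real structure such as bimodality.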

Whittaker offered the following conclusions about DGA:

  1. The general form for the distribution of a species population along an environmental complex-gradient is the bell-shaped curve

    • The center (or mode) of a species population along a complex-gradient is not at its physiological optimum but is a center of maximum population success in competition with other species populations

    • The centers of species populations are scattered along a complex-gradient in an apparently random manner

    One important qualification: in some cases, competing species appear to be not randomly but regularly distributed along environmental complex-gradients

    According to Whittaker, these considerations imply the following:

  2. Species do not form well-defined groups of associates with similar distributions, clearly separate from other such defined groups, but are distributed according to the principle of species individuality; each species is distributed in its own manner, according to its own genetic, physiological, and population response to environmental factors that affect it, including effects of other species

  3. Along an environmental complex-gradient, species populations (w/ their scattered centers and broadly overlapping distributions) form a population continuum or compositional gradient, suggesting that, in the absence of environmental discontinuity or disturbance, communities intergrade or are continuous w/ one another

  4. These conclusions led Whittaker to reject the "community-unit" hypothesis

Whittaker's conclusions were strongly influenced by his belief in bell-shaped curves of species distributions

The bell-shaped curve concept was challenged by Austin (1976, Vegetatio 33:33-41) in a summary of previously published data:

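Forms of response curve listed in the header: linear, bell symmetric, skewed, very skewed, bimodal, total [the header gives five form labels for six data columns, so the columns are left unlabeled; the last column is the row total]

Curtis               4     0     3     7     2     8     24
Noy-Meir             0     1     2     4     0     0      7
Monk                 3     2     3     1     1     8     18
Total                7     3     8    12     3    16     49
Percent of Total    14     6    16    24     6    33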

Whittaker     bell (%)    skewed   shouldered   plateau   bimodal   total
Smokies       8 (23%)       6         10           2         9        35
Siskiyous     14 (27%)     16          8          11         2        51

Austin therefore concluded that the general form of the species population distribution is not the normal, bell-shaped curve, even though the data he considered had already been smoothed

Werger (1983, Vegetatio 52:141-150) used a very conservative yardstick for "normal" distribution (50% of variation accounted for by curve)

31% of species normally distributed:

1 of 8 species (12%) on ridge tops

12 of 22 species (55%) midslope

5 of 32 species (16%) in swales

The data collected and summarized by Austin and Werger indicate that there is no a priori reason to assume bell-shaped normal curves for distributions of species on gradients



Conclusions about DGA:

  1. DGA is of unquestionable value and utility in ecology as a means of

    • data summarization and presentation, and

    • hypothesis generation

  2. DGA is soundly based in classical plant ecology (e.g., Jack Major's functional factorial approach to plant ecology: vegetation = f(topography, organisms, time, soil, climate))

  3. The use of data-smoothing may be misleading

  4. There is a high degree of subjectivity inherent in this method

  5. DGA (esp. w/ "synthetic" indices) is inherently circular

    Circularity results from subjective (pre-conceived) sampling design. Note that this was a criticism launched by Whittaker (among others) against the Clementsian approach of "seeing" communities and sampling w/in them.

    The DGA-based conclusion of a vegetation continuum results from arbitrary, subjective sampling (just as the discrete-community conclusion derives from sampling w/in well-defined communities which appear to be different).

    Both schools describe, but do not answer "why?" Both groups base conclusions on descriptive data, w/o testing hypotheses.


