Thereafter, we denote by:
\[ D_{Mg} = \frac{S - 1}{\ln N} \]
\[ D_{Mn} = \frac{S}{\sqrt{N}} \]
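As an illustration of the two richness indices above, here is a minimal Python sketch. It assumes that \(S\) is the number of observed taxa and \(N\) the total number of individuals (the notation list is not reproduced here), and the function names and the example counts are hypothetical.

```python
import math


def margalef(counts):
    """Margalef richness index: (S - 1) / ln(N). Requires N > 1."""
    present = [n for n in counts if n > 0]  # ignore absent taxa
    S = len(present)                        # assumed: number of observed taxa
    N = sum(present)                        # assumed: total number of individuals
    return (S - 1) / math.log(N)


def menhinick(counts):
    """Menhinick richness index: S / sqrt(N)."""
    present = [n for n in counts if n > 0]
    return len(present) / math.sqrt(sum(present))


counts = [10, 5, 3, 1, 1]  # hypothetical abundance vector, S = 5 and N = 20
print(margalef(counts), menhinick(counts))
```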
\[ \hat{S}_{Chao1} = \begin{cases} S + \frac{N - 1}{N} \frac{f_1^2}{2 f_2} & f_2 > 0 \\ S + \frac{N - 1}{N} \frac{f_1 (f_1 - 1)}{2} & f_2 = 0 \end{cases} \]
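The two branches of the Chao1 estimator can be sketched as follows. This assumes that \(f_1\) and \(f_2\) are the numbers of singletons and doubletons (taxa observed exactly once and exactly twice); the function name is hypothetical.

```python
from collections import Counter


def chao1(counts):
    """Chao1 richness estimate from a vector of abundances."""
    present = [n for n in counts if n > 0]
    S = len(present)             # observed number of taxa
    N = sum(present)             # total number of individuals
    freq = Counter(present)      # freq[i] = number of taxa observed i times
    f1, f2 = freq[1], freq[2]    # assumed: singletons and doubletons
    if f2 > 0:
        extra = f1 ** 2 / (2 * f2)
    else:
        extra = f1 * (f1 - 1) / 2
    return S + (N - 1) / N * extra


# Hypothetical sample: S = 9, N = 26, f1 = 4, f2 = 2
print(chao1([10, 5, 3, 2, 2, 1, 1, 1, 1]))
```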
In the special homogeneous case, a bias-corrected estimator is:
\[ \hat{S}_{Chao1-bc} = S + \frac{N - 1}{N} \frac{f_1 (f_1 - 1)}{2 (f_2 + 1)}\]
Improved Chao1 estimator, which uses the additional information from tripletons and quadrupletons (Chiu et al. 2014):
\[ \hat{S}_{iChao1} = \hat{S}_{Chao1} + \frac{N - 3}{4 N} \frac{f_3}{f_4} \times \max\left(f_1 - \frac{N - 3}{N - 1} \frac{f_2 f_3}{2 f_4} , 0\right)\]
\[ \hat{S}_{ACE} = \hat{S}_{abun} + \frac{\hat{S}_{rare}}{\hat{C}_{rare}} + \frac{f_1}{\hat{C}_{rare}} \times \hat{\gamma}^2_{rare} \]
Where \(\hat{S}_{rare} = \sum_{i = 1}^{k} f_i\) is the number of rare taxa, \(\hat{S}_{abun} = \sum_{i > k}^{N} f_i\) is the number of abundant taxa (for a given cut-off value \(k\)), \(\hat{C}_{rare} = 1 - \frac{f_1}{N_{rare}}\) is Turing’s coverage estimate, with \(N_{rare} = \sum_{i = 1}^{k} i f_i\) the number of individuals in rare taxa, and:
\[ \hat{\gamma}^2_{rare} = \max\left[\frac{\hat{S}_{rare}}{\hat{C}_{rare}} \frac{\sum_{i = 1}^{k} i(i - 1)f_i}{\left(\sum_{i = 1}^{k} if_i\right)\left(\sum_{i = 1}^{k} if_i - 1\right)} - 1, 0\right] \]
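The ACE computation can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name is hypothetical, the default cut-off `k = 10` is an assumption (the text leaves \(k\) free), and \(N_{rare}\) is taken to be \(\sum_{i = 1}^{k} i f_i\), the number of individuals belonging to rare taxa.

```python
from collections import Counter


def ace(counts, k=10):
    """Abundance-based coverage estimate (ACE); k = 10 is an assumed default."""
    present = [n for n in counts if n > 0]
    freq = Counter(present)                      # freq[i] = f_i
    s_abun = sum(1 for n in present if n > k)    # number of abundant taxa
    s_rare = sum(1 for n in present if n <= k)   # number of rare taxa
    if s_rare == 0:
        return float(s_abun)                     # nothing to extrapolate from
    n_rare = sum(n for n in present if n <= k)   # assumed: individuals in rare taxa
    f1 = freq[1]
    c_rare = 1 - f1 / n_rare                     # coverage estimate
    if c_rare == 0:
        raise ValueError("all rare taxa are singletons: ACE is undefined")
    num = sum(i * (i - 1) * freq[i] for i in range(1, k + 1))
    gamma2 = max(s_rare / c_rare * num / (n_rare * (n_rare - 1)) - 1, 0)
    return s_abun + s_rare / c_rare + f1 / c_rare * gamma2


# Hypothetical sample: one abundant taxon and five rare taxa
print(ace([20, 5, 5, 1, 1, 1]))
```

The guard on `c_rare` reflects the fact that the formula divides by the coverage estimate, which is zero when every rare taxon is a singleton.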
For replicated incidence data (i.e. an \(m \times p\) logical matrix), the Chao2 estimator is:
\[ \hat{S}_{Chao2} = \begin{cases} S + \frac{m - 1}{m} \frac{Q_1^2}{2 Q_2} & Q_2 > 0 \\ S + \frac{m - 1}{m} \frac{Q_1 (Q_1 - 1)}{2} & Q_2 = 0 \end{cases} \]
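A short sketch of Chao2 on an incidence matrix follows. It assumes that rows are the \(m\) sampling units, columns are the \(p\) taxa, and \(Q_1\) and \(Q_2\) are the numbers of taxa found in exactly one and exactly two sampling units; the function name and the example matrix are hypothetical.

```python
def chao2(incidence):
    """Chao2 richness estimate from an m x p presence/absence matrix."""
    m = len(incidence)                                   # number of sampling units
    col_sums = [sum(col) for col in zip(*incidence)]     # occurrences per taxon
    S = sum(1 for c in col_sums if c > 0)                # observed taxa
    Q1 = col_sums.count(1)                               # assumed: uniques
    Q2 = col_sums.count(2)                               # assumed: duplicates
    if Q2 > 0:
        extra = Q1 ** 2 / (2 * Q2)
    else:
        extra = Q1 * (Q1 - 1) / 2
    return S + (m - 1) / m * extra


X = [
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
]  # hypothetical: m = 3 units, p = 5 taxa, one taxon never observed
print(chao2(X))
```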
Improved Chao2 estimator (Chiu et al. 2014):
\[ \hat{S}_{iChao2} = \hat{S}_{Chao2} + \frac{m - 3}{4 m} \frac{Q_3}{Q_4} \times \max\left(Q_1 - \frac{m - 3}{m - 1} \frac{Q_2 Q_3}{2 Q_4} , 0\right)\]
\[ \hat{S}_{ICE} = \hat{S}_{freq} + \frac{\hat{S}_{infreq}}{\hat{C}_{infreq}} + \frac{Q_1}{\hat{C}_{infreq}} \times \hat{\gamma}^2_{infreq} \]
Where \(\hat{S}_{infreq} = \sum_{i = 1}^{k} Q_i\) is the number of infrequent taxa, \(\hat{S}_{freq} = \sum_{i > k}^{N} Q_i\) is the number of frequent taxa (for a given cut-off value \(k\)), \(\hat{C}_{infreq} = 1 - \frac{Q_1}{\sum_{i = 1}^{k} iQ_i}\) is Turing’s coverage estimate and:
\[ \hat{\gamma}^2_{infreq} = \max\left[\frac{\hat{S}_{infreq}}{\hat{C}_{infreq}} \frac{m_{infreq}}{m_{infreq} - 1} \frac{\sum_{i = 1}^{k} i(i - 1)Q_i}{\left(\sum_{i = 1}^{k} iQ_i\right)\left(\sum_{i = 1}^{k} iQ_i - 1\right)} - 1, 0\right] \]
Where \(m_{infreq}\) is the number of sampling units that include at least one infrequent species.
Hurlbert’s (1971) unbiased estimate of Sanders’ (1968) rarefaction:
\[ E(S) = \sum_{i = 1}^{S} \left[ 1 - \frac{\binom{N - N_i}{n}}{\binom{N}{n}} \right] \]
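The rarefaction formula can be evaluated directly with exact binomial coefficients. The sketch below assumes that \(N_i\) is the number of individuals of the \(i\)-th taxon, \(N\) their total and \(n\) the size of the subsample; the function name is hypothetical.

```python
from math import comb


def rarefy(counts, n):
    """Expected number of taxa in a random subsample of n individuals."""
    present = [c for c in counts if c > 0]
    N = sum(present)
    if not 0 <= n <= N:
        raise ValueError("n must lie between 0 and the sample size N")
    total = comb(N, n)
    # comb(N - Ni, n) is 0 when n > N - Ni, so such a taxon contributes 1
    return sum(1 - comb(N - Ni, n) / total for Ni in present)


counts = [10, 5, 3, 1, 1]  # hypothetical abundance vector, N = 20
print(rarefy(counts, 10))
```

Two sanity checks follow from the formula: subsampling a single individual always gives one taxon, and subsampling all \(N\) individuals returns the observed richness \(S\).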
Diversity:
\[ H' = - \sum_{i = 1}^{S} p_i \ln p_i \]
Evenness:
\[ E = \frac{H'}{H'_{max}} = \frac{H'}{\ln S} = - \sum_{i = 1}^{S} p_i \log_S p_i \]
When \(p_i\) is unknown in the population, it can be estimated by \(\hat{p}_i = \frac{n_i}{N}\) (the maximum likelihood estimator, MLE). As the use of \(\hat{p}_i\) results in a biased estimate, Hutcheson (1970) and Bowman et al. (1971) suggest the use of:
\[ \hat{H}' = - \sum_{i = 1}^{S} \hat{p}_i \ln \hat{p}_i - \frac{S - 1}{2N} + \frac{1 - \sum_{i = 1}^{S} \hat{p}_i^{-1}}{12N^2} + \frac{\sum_{i = 1}^{S} (\hat{p}_i^{-1} - \hat{p}_i^{-2})}{12N^3} + \cdots \]
This error is rarely significant (Peet 1974), so the unbiased form is not currently implemented.
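A minimal sketch of the Shannon diversity and evenness, using the plug-in estimate \(\hat{p}_i = n_i / N\) without any bias correction, might look like this. The function names are hypothetical.

```python
import math


def shannon(counts):
    """Shannon diversity H' using the plug-in estimate p_i = n_i / N."""
    present = [n for n in counts if n > 0]   # 0 * ln(0) is taken as 0
    N = sum(present)
    return -sum((n / N) * math.log(n / N) for n in present)


def shannon_evenness(counts):
    """Evenness E = H' / ln(S). Undefined (division by zero) when S = 1."""
    S = sum(1 for n in counts if n > 0)
    return shannon(counts) / math.log(S)


print(shannon([10, 5, 3, 1, 1]), shannon_evenness([10, 5, 3, 1, 1]))
```

For a perfectly even sample the diversity equals \(\ln S\), so the evenness is 1.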
Diversity:
\[ HB = \frac{\ln (N!) - \sum_{i = 1}^{S} \ln (n_i!)}{N} \]
Evenness:
\[ E = \frac{HB}{HB_{max}} \]
with:
\[ HB_{max} = \frac{1}{N} \ln \frac{N!}{\left( \lfloor \frac{N}{S} \rfloor! \right)^{S - r} \left[ \left( \lfloor \frac{N}{S} \rfloor + 1 \right)! \right]^{r}} \]
where: \(r = N - S \lfloor \frac{N}{S} \rfloor\).
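The Brillouin index involves factorials that overflow quickly, so a sketch in Python can work with log-factorials through `math.lgamma`, since \(\ln(n!) = \ln \Gamma(n + 1)\). The function names below are hypothetical.

```python
from math import lgamma


def log_factorial(n):
    """ln(n!) computed as lgamma(n + 1) to avoid overflow."""
    return lgamma(n + 1)


def brillouin(counts):
    """Brillouin diversity HB = (ln N! - sum ln n_i!) / N."""
    present = [n for n in counts if n > 0]
    N = sum(present)
    return (log_factorial(N) - sum(log_factorial(n) for n in present)) / N


def brillouin_evenness(counts):
    """Evenness HB / HB_max. Undefined (division by zero) when S = 1."""
    present = [n for n in counts if n > 0]
    S, N = len(present), sum(present)
    q, r = divmod(N, S)  # q = floor(N / S), r = N - S * q
    hb_max = (log_factorial(N)
              - (S - r) * log_factorial(q)
              - r * log_factorial(q + 1)) / N
    return brillouin(counts) / hb_max


print(brillouin([10, 5, 3, 1, 1]), brillouin_evenness([10, 5, 3, 1, 1]))
```

Here `divmod` gives both \(\lfloor N / S \rfloor\) and the remainder \(r\) in one step, matching the definition of \(r\) above.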
The following methods return a dominance index, not the reciprocal or inverse form usually adopted, so that an increase in the value of the index accompanies a decrease in diversity.
Dominance for an infinite sample:
\[ D = \sum_{i = 1}^{S} p_i^2 \]
Dominance for a finite sample:
\[ \lambda = \sum_{i = 1}^{S} \frac{n_i \left( n_i - 1 \right)}{N \left( N - 1 \right)} \]
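The two forms of the dominance index can be compared with a short sketch. It assumes that \(p_i\) is estimated by \(n_i / N\) in the infinite-sample form; the function names are hypothetical.

```python
def simpson_infinite(counts):
    """Dominance D = sum of p_i^2, with p_i estimated by n_i / N."""
    N = sum(counts)
    return sum((n / N) ** 2 for n in counts)


def simpson_finite(counts):
    """Dominance lambda = sum n_i (n_i - 1) / (N (N - 1)). Requires N > 1."""
    N = sum(counts)
    return sum(n * (n - 1) for n in counts) / (N * (N - 1))


counts = [2, 2]  # hypothetical sample
print(simpson_infinite(counts), simpson_finite(counts))
```

The finite-sample form is the probability that two individuals drawn without replacement belong to the same taxon, which is why it is smaller than the infinite-sample form for small \(N\).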
Dominance:
\[ D = \frac{N - U}{N - \sqrt{N}} \]
Evenness:
\[ E = \frac{N - U}{N - \frac{N}{\sqrt{S}}} \]
where \(U\) is the distance of the sample from the origin in an \(S\)-dimensional hypervolume:
\[U = \sqrt{\sum_{i = 1}^{S} n_i^2}\]
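Both quantities share the same numerator \(N - U\), so a sketch can compute them together. The function name is hypothetical.

```python
import math


def mcintosh(counts):
    """Return (dominance, evenness) based on the distance U from the origin."""
    present = [n for n in counts if n > 0]
    N = sum(present)
    S = len(present)
    U = math.sqrt(sum(n * n for n in present))      # Euclidean norm of the counts
    dominance = (N - U) / (N - math.sqrt(N))        # undefined when N = 1
    evenness = (N - U) / (N - N / math.sqrt(S))     # undefined when S = 1
    return dominance, evenness


print(mcintosh([3, 4]))  # hypothetical sample with U = 5
```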
Dominance:
\[ d = \frac{n_{max}}{N} \]
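As a one-line illustration, assuming \(n_{max}\) is the abundance of the most abundant taxon (the function name is hypothetical):

```python
def berger_parker(counts):
    """Dominance d = n_max / N: the share of the most abundant taxon."""
    return max(counts) / sum(counts)


print(berger_parker([10, 5, 5]))  # hypothetical sample
```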
The following methods can be used to ascertain the degree of turnover in taxa composition along a gradient on qualitative (presence/absence) data. This assumes that the order of the matrix rows (from 1 to \(m\)) follows the progression along the gradient/transect.
We denote the \(m \times p\) incidence matrix by \(X = \left[ x_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]\) and the \(p \times p\) corresponding co-occurrence matrix by \(Y = \left[ y_{ij} \right] ~\forall i,j \in \left[ 1,p \right]\), with row and column sums:
\[\begin{align} x_{i \cdot} = \sum_{j = 1}^{p} x_{ij} && x_{\cdot j} = \sum_{i = 1}^{m} x_{ij} && x_{\cdot \cdot} = \sum_{j = 1}^{p} \sum_{i = 1}^{m} x_{ij} && \forall x_{ij} \in \lbrace 0,1 \rbrace \\ y_{i \cdot} = \sum_{j \geqslant i}^{p} y_{ij} && y_{\cdot j} = \sum_{i \leqslant j}^{p} y_{ij} && y_{\cdot \cdot} = \sum_{i = 1}^{p} \sum_{j \geqslant i}^{p} y_{ij} && \forall y_{ij} \in \lbrace 0,1 \rbrace \end{align}\]
\[ \beta_W = \frac{S}{\alpha} - 1 \]
where \(\alpha\) is the mean sample diversity: \(\alpha = \frac{x_{\cdot \cdot}}{m}\).
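The measure \(\beta_W = S / \alpha - 1\) can be sketched on an incidence matrix as follows. This assumes that \(S\) is the total number of taxa recorded in at least one sample; the function name and the example matrix are hypothetical.

```python
def whittaker_beta(incidence):
    """beta_W = S / alpha - 1 for an m x p presence/absence matrix."""
    m = len(incidence)                                    # number of samples (rows)
    total = sum(sum(row) for row in incidence)            # x.. (all presences)
    alpha = total / m                                     # mean sample diversity
    S = sum(1 for col in zip(*incidence) if sum(col) > 0) # assumed: taxa seen at least once
    return S / alpha - 1


X = [
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
]  # hypothetical transect: 3 samples, 5 taxa
print(whittaker_beta(X))
```

When every sample contains every observed taxon, \(\alpha = S\) and the measure is 0, that is, no turnover.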
\[ \beta_C = \frac{g(H) + l(H)}{2} - 1 \]
where \(g(H)\) is the number of taxa gained along the transect and \(l(H)\) the number of taxa lost.
\[ \beta_T = \frac{g(H) + l(H)}{2\alpha} \]
where \(g(H)\) is the number of taxa gained along the transect, \(l(H)\) the number of taxa lost and \(\alpha\) the mean sample diversity, \(\alpha = \frac{x_{\cdot \cdot}}{m}\).
Similarity between two samples \(a\) and \(b\) or between two types \(x\) and \(y\) can be measured as follows.
These indices provide a scale of similarity from \(0\) to \(1\), where \(1\) is perfect similarity and \(0\) is no similarity, with the exception of the Brainerd-Robinson index, which is scaled between \(0\) and \(200\).
\(a_j\) and \(b_j\) denote the number of individuals of the \(j\)-th type/taxon in samples \(a\) and \(b\), \(j \in \left[ 1,n \right]\). \(o_j\) denotes the number of types/taxa common to both samples/cases: \(o_j = \sum_{k = 1}^{n} a_k \cap b_k\).
\(x_i\) and \(y_i\) denote the number of individuals of types \(x\) and \(y\) in the \(i\)-th sample/case, \(i \in \left[ 1,m \right]\). \(o_i\) denotes the number of samples/cases common to both types/taxa: \(o_i = \sum_{k = 1}^{m} x_k \cap y_k\).
\[ C_J = \frac{o_j}{S_a + S_b - o_j} \]
\[ C_S = \frac{2 \times o_j}{S_a + S_b} \]
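The two qualitative indices \(C_J\) and \(C_S\) can be sketched together. This assumes that \(S_a\) and \(S_b\) are the numbers of taxa present in samples \(a\) and \(b\), and that a taxon counts as present when its abundance is positive; the function names are hypothetical.

```python
def shared_and_totals(a, b):
    """Return (o, S_a, S_b): shared taxa and taxa present in each sample."""
    o = sum(1 for x, y in zip(a, b) if x > 0 and y > 0)
    S_a = sum(1 for x in a if x > 0)   # assumed: taxa present in sample a
    S_b = sum(1 for y in b if y > 0)   # assumed: taxa present in sample b
    return o, S_a, S_b


def jaccard(a, b):
    """C_J = o / (S_a + S_b - o)."""
    o, S_a, S_b = shared_and_totals(a, b)
    return o / (S_a + S_b - o)


def sorensen(a, b):
    """C_S = 2 o / (S_a + S_b)."""
    o, S_a, S_b = shared_and_totals(a, b)
    return 2 * o / (S_a + S_b)


a = [1, 1, 0, 1]  # hypothetical presence/absence vectors
b = [1, 0, 1, 1]
print(jaccard(a, b), sorensen(a, b))
```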
\[ C_{BR} = 200 - \sum_{j = 1}^{S} \left| \frac{a_j \times 100}{\sum_{j = 1}^{S} a_j} - \frac{b_j \times 100}{\sum_{j = 1}^{S} b_j} \right|\]
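The Brainerd-Robinson coefficient compares percentage profiles, which a short sketch makes explicit. The function name and the example counts are hypothetical.

```python
def brainerd_robinson(a, b):
    """C_BR = 200 - sum of absolute differences between percentage profiles."""
    total_a, total_b = sum(a), sum(b)
    return 200 - sum(
        abs(x * 100 / total_a - y * 100 / total_b) for x, y in zip(a, b)
    )


a = [10, 0, 30]   # hypothetical counts: 25 %, 0 %, 75 %
b = [10, 10, 20]  # hypothetical counts: 25 %, 25 %, 50 %
print(brainerd_robinson(a, b))
```

Identical profiles give 200 and samples with no type in common give 0, which matches the stated range of the index.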
Bray and Curtis’s (1957) modified version of Sørensen’s index:
\[ C_N = \frac{2 \sum_{j = 1}^{S} \min(a_j, b_j)}{N_a + N_b} \]
\[ C_{MH} = \frac{2 \sum_{j = 1}^{S} a_j \times b_j}{(\frac{\sum_{j = 1}^{S} a_j^2}{N_a^2} + \frac{\sum_{j = 1}^{S} b_j^2}{N_b^2}) \times N_a \times N_b} \]
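The two quantitative indices \(C_N\) and \(C_{MH}\) can be sketched side by side. This assumes that \(N_a\) and \(N_b\) are the total numbers of individuals in samples \(a\) and \(b\); the function names and the example counts are hypothetical.

```python
def bray_curtis(a, b):
    """C_N = 2 * sum(min(a_j, b_j)) / (N_a + N_b)."""
    return 2 * sum(min(x, y) for x, y in zip(a, b)) / (sum(a) + sum(b))


def morisita_horn(a, b):
    """C_MH = 2 * sum(a_j * b_j) / ((d_a + d_b) * N_a * N_b)."""
    N_a, N_b = sum(a), sum(b)                 # assumed: total individuals per sample
    d_a = sum(x * x for x in a) / N_a ** 2
    d_b = sum(y * y for y in b) / N_b ** 2
    cross = sum(x * y for x, y in zip(a, b))
    return 2 * cross / ((d_a + d_b) * N_a * N_b)


a = [10, 0, 30]   # hypothetical abundance vectors
b = [10, 10, 20]
print(bray_curtis(a, b), morisita_horn(a, b))
```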
\[ C_{Bi} = \frac{o_i - N \times p}{\sqrt{N \times p \times (1 - p)}} \]
Berger, W. H., and F. L. Parker. 1970. “Diversity of Planktonic Foraminifera in Deep-Sea Sediments.” Science 168 (3937): 1345–7. doi:10.1126/science.168.3937.1345.
Bowman, K. O., K. Hutcheson, E. P. Odum, and L. R. Shenton. 1971. “Comments on the Distribution of Indices of Diversity.” In Statistical Ecology, edited by G. P. Patil, E. C. Pielou, and W. E. Waters, 3:315–66. University Park, PA: Pennsylvania State University Press.
Brainerd, George W. 1951. “The Place of Chronological Ordering in Archaeological Analysis.” American Antiquity 16 (04): 301–13. doi:10.2307/276979.
Bray, J. Roger, and J. T. Curtis. 1957. “An Ordination of the Upland Forest Communities of Southern Wisconsin.” Ecological Monographs 27 (4): 325–49. doi:10.2307/1942268.
Brillouin, Leon. 1956. Science and Information Theory. New York: Academic Press.
Chao, Anne. 1984. “Nonparametric Estimation of the Number of Classes in a Population.” Scandinavian Journal of Statistics 11 (4): 265–70.
———. 1987. “Estimating the Population Size for Capture-Recapture Data with Unequal Catchability.” Biometrics 43 (4): 783–91. doi:10.2307/2531532.
Chao, Anne, and Chun-Huo Chiu. 2016. “Species Richness: Estimation and Comparison.” In Wiley StatsRef: Statistics Reference Online, edited by N. Balakrishnan, Theodore Colton, Brian Everitt, Walter Piegorsch, Fabrizio Ruggeri, and Jozef L. Teugels, 1–26. Chichester, UK: John Wiley & Sons, Ltd. doi:10.1002/9781118445112.stat03432.pub2.
Chao, Anne, and Shen-Ming Lee. 1992. “Estimating the Number of Classes via Sample Coverage.” Journal of the American Statistical Association 87 (417): 210–17. doi:10.1080/01621459.1992.10475194.
Chiu, Chun-Huo, Yi-Ting Wang, Bruno A. Walther, and Anne Chao. 2014. “An Improved Nonparametric Lower Bound of Species Richness via a Modified Good-Turing Frequency Formula.” Biometrics 70 (3): 671–82. doi:10.1111/biom.12200.
Cody, M. L. 1975. “Towards a Theory of Continental Species Diversity: Bird Distributions over Mediterranean Habitat Gradients.” In Ecology and Evolution of Communities, edited by M. L. Cody and J. M. Diamond, 214–57. Cambridge, MA: Harvard University Press.
Hurlbert, Stuart H. 1971. “The Nonconcept of Species Diversity: A Critique and Alternative Parameters.” Ecology 52 (4): 577–86. doi:10.2307/1934145.
Hutcheson, K. 1970. “A Test for Comparing Diversity Based on the Shannon Formula.” Journal of Theoretical Biology 29 (1): 151–54. doi:10.1016/0022-5193(70)90124-4.
Kintigh, Keith. 2006. “Ceramic Dating and Type Associations.” In Managing Archaeological Data: Essays in Honor of Sylvia W. Gaines, edited by Jeffrey Hantman and Rachel Most, 17–26. Anthropological Research Paper 57. Tempe, AZ: Arizona State University. doi:10.6067/XCV8J38QSS.
Margalef, R. 1958. “Information Theory in Ecology.” General Systems 3: 36–71.
McIntosh, Robert P. 1967. “An Index of Diversity and the Relation of Certain Concepts to Diversity.” Ecology 48 (3): 392–404. doi:10.2307/1932674.
Menhinick, Edward F. 1964. “A Comparison of Some Species-Individuals Diversity Indices Applied to Samples of Field Insects.” Ecology 45 (4): 859–61. doi:10.2307/1934933.
Peet, R. K. 1974. “The Measurement of Species Diversity.” Annual Review of Ecology and Systematics 5 (1): 285–307. doi:10.1146/annurev.es.05.110174.001441.
Robinson, W. S. 1951. “A Method for Chronologically Ordering Archaeological Deposits.” American Antiquity 16 (04): 293–301. doi:10.2307/276978.
Routledge, R. D. 1977. “On Whittaker’s Components of Diversity.” Ecology 58 (5): 1120–7. doi:10.2307/1936932.
Sanders, Howard L. 1968. “Marine Benthic Diversity: A Comparative Study.” The American Naturalist 102 (925): 243–82. https://www.jstor.org/stable/2459027.
Shannon, C. E. 1948. “A Mathematical Theory of Communication.” The Bell System Technical Journal 27: 379–423. doi:10.1002/j.1538-7305.1948.tb01338.x.
Simpson, E. H. 1949. “Measurement of Diversity.” Nature 163 (4148): 688. doi:10.1038/163688a0.
Whittaker, R. H. 1960. “Vegetation of the Siskiyou Mountains, Oregon and California.” Ecological Monographs 30 (3): 279–338. doi:10.2307/1943563.
Wilson, M. V., and A. Shmida. 1984. “Measuring Beta Diversity with Presence-Absence Data.” The Journal of Ecology 72 (3): 1055–64. doi:10.2307/2259551.