ksample.e {energy}    R Documentation

E-statistic (Energy Statistic) for Multivariate k-sample Test of Equal Distributions

Description

Returns the E-statistic (energy statistic) for the multivariate k-sample test of equal distributions.

Usage

 ksample.e(x, sizes, distance = FALSE, ix = 1:sum(sizes), 
           incomplete = FALSE, N = 100)

Arguments

x           data matrix of the pooled sample
sizes       vector of sample sizes
distance    logical: if TRUE, x is a distance matrix
ix          a permutation of the row indices of x
incomplete  logical: if TRUE, compute incomplete E-statistics
N           incomplete sample size

Details

The k-sample multivariate E-statistic for testing equal distributions is returned. The statistic is computed from the original pooled samples, stacked in matrix x where each row is a multivariate observation, or from the distance matrix x of the original data. The first sizes[1] rows of x are the first sample, the next sizes[2] rows of x are the second sample, etc.
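
The row layout described above can be sketched in base R. The sample names s1, s2, s3 and the helper variables are hypothetical and chosen only for illustration; the sketch shows how a pooled matrix, a matching sizes vector, and the alternative distance-matrix input might be prepared.

```r
# Three hypothetical samples with the same number of columns (2 here).
set.seed(1)
s1 <- matrix(rnorm(10 * 2), ncol = 2)
s2 <- matrix(rnorm(15 * 2, mean = 1), ncol = 2)
s3 <- matrix(rnorm(20 * 2, mean = 2), ncol = 2)

# Stack the samples row-wise; the order of the blocks must match 'sizes'.
x <- rbind(s1, s2, s3)
sizes <- c(nrow(s1), nrow(s2), nrow(s3))

# Row indices occupied by each sample inside the pooled matrix.
ends <- cumsum(sizes)
starts <- ends - sizes + 1
blocks <- Map(seq, starts, ends)

# Alternative input: the Euclidean distance matrix of the pooled data,
# intended for use with distance = TRUE.
d <- as.matrix(dist(x))
```

Here blocks[[2]] holds the row indices of the second sample, so x[blocks[[2]], ] recovers s2.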

The two-sample E-statistic proposed by Szekely and Rizzo (2003) is the e-distance e(S_i,S_j), defined for two samples S_i, S_j of size n_i, n_j by

e(S_i, S_j) = (n_i n_j)/(n_i + n_j) [2 M_{ij} - M_{ii} - M_{jj}],

where

M_{ij} = 1/(n_i n_j) sum_{p=1}^{n_i} sum_{q=1}^{n_j} ||X_{ip} - X_{jq}||,

|| || denotes the Euclidean norm, and X_{ip} denotes the p-th observation in the i-th sample. The k-sample E-statistic is defined by summing the pairwise e-distances over all k(k-1)/2 pairs of samples:

E = sum[i<j] e(S_i,S_j).
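
As a cross-check of the definitions above, the statistic can be written out directly in base R. This is a minimal sketch that uses the weight n_i n_j / (n_i + n_j) and includes the zero self-distances in M_{ii}. The function names mean_dist, edist2 and estat are hypothetical, and the sketch is not the package's implementation.

```r
# Mean Euclidean distance between the rows of a and the rows of b (M_ij).
mean_dist <- function(a, b) {
  a <- as.matrix(a); b <- as.matrix(b)
  d <- as.matrix(dist(rbind(a, b)))
  mean(d[seq_len(nrow(a)), nrow(a) + seq_len(nrow(b)), drop = FALSE])
}

# Two-sample e-distance with the weight n_i n_j / (n_i + n_j).
edist2 <- function(a, b) {
  a <- as.matrix(a); b <- as.matrix(b)
  w <- nrow(a) * nrow(b) / (nrow(a) + nrow(b))
  w * (2 * mean_dist(a, b) - mean_dist(a, a) - mean_dist(b, b))
}

# k-sample statistic: sum of e-distances over all pairs i < j.
estat <- function(x, sizes) {
  x <- as.matrix(x)
  grp <- rep(seq_along(sizes), sizes)  # sample label of each pooled row
  k <- length(sizes)
  total <- 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      total <- total + edist2(x[grp == i, , drop = FALSE],
                              x[grp == j, , drop = FALSE])
    }
  }
  total
}
```

For two single observations 0 and 1, estat(c(0, 1), c(1, 1)) works out to (1/2)(2 * 1 - 0 - 0) = 1, and two identical samples give 0.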

Large values of E are significant, that is, they are evidence against the hypothesis of equal distributions.

If incomplete==TRUE, an incomplete E-statistic (which is an incomplete V-statistic) is computed. That is, at most N observations from each sample are used, by sampling without replacement as needed.
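
The subsampling step can be illustrated as follows. This sketch only shows the idea of keeping at most N rows per sample, drawn without replacement. The helper subsample_pooled is hypothetical, and the exact sampling scheme and any rescaling used inside ksample.e are not specified here, so they are not reproduced.

```r
# Keep at most N rows from each sample of the pooled data; samples with
# more than N rows are subsampled without replacement.
subsample_pooled <- function(x, sizes, N = 100) {
  x <- as.matrix(x)
  ends <- cumsum(sizes)
  starts <- ends - sizes + 1
  keep <- unlist(lapply(seq_along(sizes), function(i) {
    rows <- seq(starts[i], ends[i])
    if (length(rows) > N) sort(rows[sample.int(length(rows), N)]) else rows
  }))
  # Reduced pooled matrix and the matching reduced sample sizes.
  list(x = x[keep, , drop = FALSE], sizes = pmin(sizes, N))
}
```

The returned x and sizes have the same layout as the original inputs, so they could be passed on to any function that expects pooled data.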

Value

The value of the multisample E-statistic corresponding to the permutation ix is returned.

Note

This function computes the E-statistic only. For the test decision, a nonparametric bootstrap test (approximate permutation test) is provided by the function eqdist.etest.
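
To show how a statistic of this kind feeds a test decision, the following is a rough sketch of an approximate permutation test in base R. The function perm_pvalue and its arguments are hypothetical; stat stands for any function of the form stat(x, sizes) for which large values are significant. It is an illustration only and is not claimed to match what eqdist.etest does internally.

```r
# Approximate permutation p-value for a statistic stat(x, sizes).
perm_pvalue <- function(stat, x, sizes, R = 199) {
  x <- as.matrix(x)
  n <- sum(sizes)
  observed <- stat(x, sizes)
  # Recompute the statistic after randomly reassigning rows to samples.
  replicates <- replicate(R, stat(x[sample.int(n), , drop = FALSE], sizes))
  # Large values are significant, so count replicates at least as large.
  (1 + sum(replicates >= observed)) / (R + 1)
}

# Possible usage, assuming the energy package is installed:
# perm_pvalue(energy::ksample.e, iris[, 1:4], c(50, 50, 50))
```

In practice eqdist.etest is the documented route to a test decision; the sketch only makes the permutation idea concrete.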

Author(s)

Maria Rizzo <rizzo@math.ohiou.edu>

References

Szekely, G. J. and Rizzo, M. L. (2003) Testing for Equal Distributions in High Dimension, submitted.

Szekely, G. J. (2000) E-statistics: Energy of Statistical Samples, preprint.

See Also

eqdist.etest

Examples

## compute 3-sample E-statistic for 4-dimensional iris data
 data(iris)
 ksample.e(iris[,1:4], c(50,50,50))

## compute univariate two-sample incomplete E-statistic
 x1 <- rnorm(200)
 x2 <- rnorm(300, .5)
 x <- c(x1, x2)
 ksample.e(x, c(200, 300), incomplete=TRUE, N=100)
 
