ksample.e {energy} | R Documentation |
Returns the E-statistic (energy statistic) for the multivariate k-sample test of equal distributions.
ksample.e(x, sizes, distance = FALSE, ix = 1:sum(sizes), incomplete = FALSE, N = 100)
x: data matrix of pooled sample
sizes: vector of sample sizes
distance: logical: if TRUE, x is a distance matrix
ix: a permutation of the row indices of x
incomplete: logical: if TRUE, compute incomplete E-statistics
N: incomplete sample size
The k-sample multivariate E-statistic for testing equal distributions is returned. The statistic is computed from the original pooled samples, stacked in matrix x where each row is a multivariate observation, or from the distance matrix x of the original data. The first sizes[1] rows of x are the first sample, the next sizes[2] rows of x are the second sample, etc.
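The pooled layout described above can be sketched in base R as follows. The sample names s1, s2, s3 and the simulated data are illustrative assumptions, not part of the package. The full square matrix from as.matrix(dist(x)) is assumed here to be a suitable input when distance = TRUE.

```r
set.seed(1)

# Three illustrative samples in 3 dimensions (simulated, hypothetical data)
s1 <- matrix(rnorm(10 * 3), nrow = 10)
s2 <- matrix(rnorm(15 * 3, mean = 1), nrow = 15)
s3 <- matrix(rnorm(20 * 3, mean = 2), nrow = 20)

# Stack the samples: one row per observation, samples in order
x <- rbind(s1, s2, s3)
sizes <- c(nrow(s1), nrow(s2), nrow(s3))

# Rows of x that belong to the second sample
rows2 <- (sizes[1] + 1):(sizes[1] + sizes[2])

# Distance-matrix form of the same pooled data (Euclidean distances)
d <- as.matrix(dist(x))
```

The order of sizes has to match the order in which the samples were stacked, since the function identifies samples by row position only.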
The two-sample E-statistic proposed by Szekely and Rizzo (2003) is the e-distance e(S_i,S_j), defined for two samples S_i, S_j of size n_i, n_j by
e(S_i, S_j) = (n_i n_j)/(n_i+n_j) [2M_(ij) - M_(ii) - M_(jj)],
where
M_(ij) = 1/(n_i n_j) sum[p = 1:n_i, q = 1:n_j] ||X_(ip) - X_(jq)||,
|| || denotes Euclidean norm, and X_(ip) denotes the p-th observation in the i-th sample. The k-sample E-statistic is defined by summing the pairwise e-distances over all k(k-1)/2 pairs of samples:
E = sum[i<j] e(S_i,S_j).
Large values of E are significant.
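As an illustration of the definition, the statistic can be computed directly in base R. This is a minimal sketch, not the package's implementation: the function name e_stat_sketch is hypothetical, and the weight n_i n_j / (n_i + n_j) is the standard e-distance weight.

```r
# Hypothetical helper: E-statistic computed term by term from the definition.
e_stat_sketch <- function(x, sizes) {
  d <- as.matrix(dist(x))               # pairwise Euclidean distances, pooled sample
  k <- length(sizes)
  g <- rep(seq_len(k), times = sizes)   # sample label of each pooled row

  # M[i, j] = mean distance between observations of samples i and j
  M <- matrix(0, k, k)
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      M[i, j] <- mean(d[g == i, g == j])
    }
  }

  # Sum the pairwise e-distances over all pairs i < j
  E <- 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      w <- sizes[i] * sizes[j] / (sizes[i] + sizes[j])
      E <- E + w * (2 * M[i, j] - M[i, i] - M[j, j])
    }
  }
  E
}

# Two single-point samples at 0 and 1: M_12 = 1, M_11 = M_22 = 0, weight 1/2
e_stat_sketch(c(0, 1), c(1, 1))
```

Note that M_(ii) averages over all n_i^2 ordered pairs, including the zero distances of each observation to itself, which is what makes this a V-statistic.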
If incomplete == TRUE, an incomplete E-statistic (which is an incomplete V-statistic) is computed. That is, at most N observations from each sample are used, by sampling without replacement as needed.
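The subsampling step can be pictured with the sketch below. It is an assumption about how "at most N observations from each sample" could be selected, not a description of the package internals. The helper name subsample_rows is hypothetical.

```r
# Hypothetical helper: choose at most N row indices from each sample
# of a pooled data matrix, sampling without replacement.
subsample_rows <- function(sizes, N) {
  # Offset of each sample within the pooled rows
  starts <- cumsum(c(0, sizes[-length(sizes)]))
  idx <- lapply(seq_along(sizes), function(i) {
    rows <- starts[i] + seq_len(sizes[i])
    if (sizes[i] > N) {
      # Sample positions, then map back to pooled row numbers
      sort(rows[sample.int(length(rows), N)])
    } else {
      rows                               # small samples are used in full
    }
  })
  list(rows = unlist(idx), sizes = pmin(sizes, N))
}

set.seed(1)
s <- subsample_rows(c(200, 300), N = 100)
s$sizes          # reduced sample sizes
length(s$rows)   # number of pooled rows retained
```

The reduced row set and sizes could then be used in place of the full pooled sample, so the cost of the distance computation is bounded by N per sample.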
The value of the multisample E-statistic corresponding to the permutation ix is returned.
This function computes the E-statistic only. For the test decision, a nonparametric bootstrap test (approximate permutation test) is provided by the function eqdist.etest.
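To show how a permutation argument such as ix can drive a test decision, here is a base R sketch of an approximate permutation test. It is not the code of eqdist.etest. The names e_from_dist and perm_test_sketch, the replicate count R = 199, and the p-value convention (1 + count)/(R + 1) are assumptions made for illustration.

```r
# Hypothetical helper: E-statistic from a precomputed distance matrix d,
# with the pooled rows taken in the order given by ix.
e_from_dist <- function(d, sizes, ix = seq_len(sum(sizes))) {
  d <- d[ix, ix, drop = FALSE]
  k <- length(sizes)
  g <- rep(seq_len(k), times = sizes)
  E <- 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      w <- sizes[i] * sizes[j] / (sizes[i] + sizes[j])
      E <- E + w * (2 * mean(d[g == i, g == j]) -
                    mean(d[g == i, g == i]) -
                    mean(d[g == j, g == j]))
    }
  }
  E
}

# Hypothetical helper: approximate permutation test based on the E-statistic.
perm_test_sketch <- function(x, sizes, R = 199) {
  d <- as.matrix(dist(x))     # distances are computed once and reused
  n <- sum(sizes)
  observed <- e_from_dist(d, sizes)
  # Each replicate reassigns observations to samples via a random permutation
  reps <- replicate(R, e_from_dist(d, sizes, ix = sample.int(n)))
  # Large values of E are significant
  p <- (1 + sum(reps >= observed)) / (R + 1)
  list(statistic = observed, p.value = p)
}

set.seed(1)
x <- c(rnorm(20), rnorm(20, mean = 5))
res <- perm_test_sketch(x, c(20, 20))
res$p.value
```

Permuting indices into a fixed distance matrix avoids recomputing distances for every replicate, which is presumably why the function accepts both a distance matrix and a permutation.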
Maria Rizzo rizzo@math.ohiou.edu
Szekely, G. J. and Rizzo, M. L. (2003) Testing for Equal Distributions in High Dimension, submitted.
Szekely, G. J. (2000) E-statistics: Energy of Statistical Samples, preprint.
## compute 3-sample E-statistic for 4-dimensional iris data
data(iris)
ksample.e(iris[,1:4], c(50,50,50))

## compute univariate two-sample incomplete E-statistic
x1 <- rnorm(200)
x2 <- rnorm(300, .5)
x <- c(x1, x2)
ksample.e(x, c(200, 300), incomplete=TRUE, N=100)