Introduction to ‘rg.test’


Working with rgTest

Get example data

set.seed(100)

d=200
vmu = rep(1.1/sqrt(d),d)
vsd = c(rep(1.1, d/5), rep(1, d-d/5))
num1 = 100
num2 = 100
s1 = matrix(0,num1,d)               # sample 1
s2 = matrix(0,num2,d)               # sample 2

for (i in 1:num1) {
  s1[i,] = rnorm(d)
}
for (i in 1:(num2)) {
  s2[i,] = rnorm(d, mean = vmu, sd = vsd)
}

num1 = nrow(s1)                     # number of observations in sample 1
num2 = nrow(s2)                     # number of observations in sample 2

Get an overview of the data.

Both samples have 200 variables. We look at the scatterplot matrix of the first five variables, with points colored by sample.

plot_dat = cbind(as.data.frame(rbind(s1[,1:5], s2[,1:5])), label = rep(c('sample 1', 'sample 2'), each = 100))
my_cols = c("#00AFBB", "#E7B800")  
pairs(plot_dat[, 1:5], col = my_cols[as.factor(plot_dat$label)])

Even though we know the observations in the two samples are generated from different distributions, it is hard to tell the difference by looking at the scatterplots.
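To see why the marginal plots look so alike, it can help to quantify the differences built into sample 2. The short sketch below reuses the values of d, vmu and vsd from the simulation above and needs only base R; the names per_coord_shift, total_shift and n_scaled are introduced here for illustration only.

```r
# Same parameters as in the simulation above
d <- 200
vmu <- rep(1.1 / sqrt(d), d)
vsd <- c(rep(1.1, d / 5), rep(1, d - d / 5))

per_coord_shift <- vmu[1]          # mean shift in any single variable
total_shift <- sqrt(sum(vmu^2))    # Euclidean norm of the full mean shift
n_scaled <- sum(vsd != 1)          # number of variables whose sd differs from 1

round(c(per_coord = per_coord_shift, total = total_shift, n_scaled = n_scaled), 3)
```

The mean shift in each variable is about 0.078 standard deviations, which is too small to see in a pairwise plot, while the shift accumulated over all 200 variables has norm 1.1. Only the first 40 variables have a different spread (sd 1.1 instead of 1).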

Graph-based two-sample test

Use data matrices

res1 = rg.test(data.X = s1, data.Y = s2, n1 = num1, n2 = num2, k = 5, weigh.fun = weiMax, perm.num = 1000, progress_bar = FALSE)

type	test statistic	p value
robust generalized(asymptotic)	9.00485158688115	0.0110820810661949
robust max-type(asymptotic)	2.37080471135692	0.0264665891417283
robust generalized(permutation)	NA	0.013
robust max-type(permutation)	NA	0.022

Use the distance matrix

data = rbind(s1, s2)
dist = dist(as.matrix(data))
res2 = rg.test(dis = dist, n1 = num1, n2 = num2, k = 5, weigh.fun = weiMax, perm.num = 1000)

type	test statistic	p value
robust generalized(asymptotic)	9.00485158688115	0.0110820810661949
robust max-type(asymptotic)	2.37080471135692	0.0264665891417283
robust generalized(permutation)	NA	0.015
robust max-type(permutation)	NA	0.026

Use the edge matrix

E = kmst(dis=dist, k=5)
res3 = rg.test(E = E, n1 = num1, n2 = num2, weigh.fun = weiMax, perm.num = 1000)

type	test statistic	p value
robust generalized(asymptotic)	9.00485158688115	0.0110820810661949
robust max-type(asymptotic)	2.37080471135692	0.0264665891417283
robust generalized(permutation)	NA	0.016
robust max-type(permutation)	NA	0.032
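For readers unfamiliar with edge matrices, the following is a minimal base-R sketch of how a single minimum spanning tree (the k = 1 case) could be turned into one. It assumes that an edge matrix stores one edge per row as a pair of node indices. The helper mst_edges is hypothetical: it is not part of rgTest and is not meant to reproduce how kmst works internally.

```r
# Hypothetical helper: Prim's algorithm on a distance matrix.
# Returns an (n - 1) x 2 integer matrix, one row per tree edge.
mst_edges <- function(dmat) {
  dmat <- as.matrix(dmat)
  n <- nrow(dmat)
  in_tree <- c(TRUE, rep(FALSE, n - 1))   # start the tree at node 1
  best_dist <- dmat[1, ]                  # cheapest link from the tree to each node
  best_from <- rep(1L, n)                 # tree endpoint of that cheapest link
  edges <- matrix(NA_integer_, nrow = n - 1, ncol = 2)
  for (step in seq_len(n - 1)) {
    candidates <- which(!in_tree)
    nxt <- candidates[which.min(best_dist[candidates])]
    edges[step, ] <- c(best_from[nxt], nxt)
    in_tree[nxt] <- TRUE
    # Update the cheapest links using the newly added node
    closer <- (!in_tree) & (dmat[nxt, ] < best_dist)
    best_dist[closer] <- dmat[nxt, closer]
    best_from[closer] <- nxt
  }
  edges
}

set.seed(1)
x <- matrix(rnorm(20 * 3), nrow = 20)     # 20 toy observations in 3 dimensions
E1 <- mst_edges(dist(x))
dim(E1)
```

A spanning tree on 20 nodes has 19 edges, so E1 has 19 rows and 2 columns.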

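The asymptotic results are identical whether the test is given the data matrices, the distance matrix, or the edge matrix generated by the 5-MST, as expected when all three inputs lead to the same similarity graph on the pooled observations. The permutation p-values differ slightly from run to run because each is based on 1000 random permutations, but they are close to the asymptotic p-values.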
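As a rough illustration of what a permutation p-value is, the base-R sketch below shuffles the sample labels and counts how often the permuted statistic is at least as large as the observed one. It is not the code used inside rg.test: the helper perm_pvalue and the toy statistic mean_diff are hypothetical names introduced here, and the statistic is a simple difference in means rather than a graph-based one.

```r
# Hypothetical helper, not part of rgTest: permutation p-value for any
# statistic stat_fun(pooled, labels) where larger values are more extreme.
perm_pvalue <- function(pooled, labels, stat_fun, perm_num = 1000) {
  observed <- stat_fun(pooled, labels)
  # Recompute the statistic under randomly shuffled sample labels
  permuted <- replicate(perm_num, stat_fun(pooled, sample(labels)))
  mean(permuted >= observed)
}

# Toy statistic: absolute difference between the two sample means
mean_diff <- function(x, labels) {
  abs(mean(x[labels == 1]) - mean(x[labels == 2]))
}

set.seed(1)
x <- c(rnorm(50), rnorm(50, mean = 1))    # pooled toy observations
labels <- rep(1:2, each = 50)             # sample membership
p <- perm_pvalue(x, labels, mean_diff)
p
```

With perm_num = 1000 the result is a multiple of 0.001, and a different seed gives a slightly different value, which is consistent with the small differences among the permutation p-values reported above.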
