The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
library(mgc)
library(reshape2)
library(ggplot2)
plot_sim_func <- function(X, Y, Xf, Yf, name, geom='line') {
if (!is.null(dim(Y))) {
Y <- Y[, 1]
Yf <- Yf[, 1]
}
if (geom == 'points') {
funcgeom <- geom_point
} else {
funcgeom <- geom_line
}
data <- data.frame(x1=X[,1], y=Y)
data_func <- data.frame(x1=Xf[,1], y=Yf)
ggplot(data, aes(x=x1, y=y)) +
funcgeom(data=data_func, aes(x=x1, y=y), color='red', size=3) +
geom_point() +
xlab("x") +
ylab("y") +
ggtitle(name) +
theme_bw()
}
plot_mtx <- function(Dx, main.title="Local Correlation Map", xlab.title="# X Neighbors", ylab.title="# Y Neighbors") {
data <- melt(Dx)
ggplot(data, aes(x=Var1, y=Var2, fill=value)) +
geom_tile() +
scale_fill_gradientn(name="l-corr",
colours=c("#f2f0f7", "#cbc9e2", "#9e9ac8", "#6a51a3"),
limits=c(min(Dx), max(Dx))) +
xlab(xlab.title) +
ylab(ylab.title) +
theme_bw() +
ggtitle(main.title)
}
In this notebook, we show how to use the MGC
statistic in a real and simulated context.
First, we use a simulated example, where Y = XB + N
; that is, Y
is linearly dependent on X
with added gaussian noise.
n=200 # 100 samples
d=1 # simple 1-d case
set.seed(12345)
data <- mgc.sims.linear(n, d) # data with noise
func <- mgc.sims.linear(n, d, eps=0) # source function
We first visualize the data:
plot_sim_func(data$X, data$Y, func$X, func$Y, name="Linear Simulation")
which clearly shows a slightly linear relationship.
visualizing the MGC
image with 100 replicates:
set.seed(12345)
res <- mgc.test(data$X, data$Y, nperm=20) # 20 permutations test; typically should be run with >100 permutations
## Warning in mgc.test(data$X, data$Y, nperm = 20): nperm is < 100. nperm should
## typically be set > 100.
plot_mtx(res$localCorr, main.title="Local Correlation Map")
print(res$optimalScale)
## $x
## [1] 200
##
## $y
## [1] 200
print(res$statMGC)
## NULL
As we can see, the local correlation plot suggests a strongly linear relationship. This is because intuitively, having more and more neighbors will help in our identification of the linear relationship between x
and y
, and as we can see in the local correlation map, k=n=200
and l=n=200
shows the best smoothed p
.
In the below demo, we show the result of MGC
to determine the relationship between the first (sepal length) and third (petal length) dimensions of the iris
dataset:
set.seed(12345)
res <- mgc.test(iris[,1,drop=FALSE], iris[,3,drop=FALSE], nperm=20)
## Warning in mgc.test(iris[, 1, drop = FALSE], iris[, 3, drop = FALSE], nperm =
## 20): nperm is < 100. nperm should typically be set > 100.
plot_mtx(res$localCorr, main.title="Local Correlation Map",
xlab.title="Sepal Length Neighbors", ylab.title="Petal Length Neighbors")
print(res$optimalScale)
## $x
## [1] 35
##
## $y
## [1] 43
print(res$statMGC)
## NULL
viewing the corr map above we see that the relationship betweel Sepal and Petal Length is strongly linear.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.