The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

LOL Cross-Validation

Eric Bridgeford

2020-06-25

Cross Validation to Determine Error of Embedding Alg composed with Classifier

require(lolR)
require(ggplot2)
n = 400
d = 30
r = 3

Here, we look at how to see the generalizability of a given model in the form of the cross-validated error. We simulate data with n=200 and d=30:

testdat <- lol.sims.rtrunk(n, d)
X <- testdat$X
Y <- testdat$Y

data <- data.frame(x1=X[,1], x2=X[,2], y=Y)
data$y <- factor(data$y)
ggplot(data, aes(x=x1, y=x2, color=y)) +
  geom_point() +
  xlab("x1") +
  ylab("x2") +
  ggtitle("Simulated Data")

We arbitrarily select LOL as our algorithm, and look at the leave-one-out (loo) cross-validated error with the LDA classifier. We project the resulting model to 3 dimensions and visualize the first 2:

result <- lol.xval.eval(X, Y, r, alg = lol.project.lol, alg.return="A",
                        classifier=MASS::lda, classifier.return="class", k='loo')

data <- data.frame(x1=result$model$Xr[,1], x2=result$model$Xr[,2], y=Y)
data$y <- factor(data$y)
ggplot(data, aes(x=x1, y=x2, color=y)) +
  geom_point() +
  xlab("x1") +
  ylab("x2") +
  ggtitle(sprintf("Projected Data using LOL, L=%.2f", result$lhat))

Cross Validation to Determine Optimal Embedding Dimensions of Embedding Alg composed with Classifier

result <- lol.xval.optimal_dimselect(X, Y, rs=c(5, 10, 15), alg = lol.project.lol, alg.return="A",
                        classifier=MASS::lda, classifier.return="class", k='loo')

data <- data.frame(x1=result$model$Xr[,1], x2=result$model$Xr[,2], y=Y)
data$y <- factor(data$y)
ggplot(data, aes(x=x1, y=x2, color=y)) +
  geom_point() +
  xlab("x1") +
  ylab("x2") +
  ggtitle(sprintf("Projected Data using LOL, L=%.2f", result$optimal.lhat))

ggplot(result$foldmeans.data, aes(x=r, y=lhat)) +
  geom_line() +
  xlab("Embedding Dimensions, r") +
  ylab("Misclassification Rate, L") +
  ggtitle("Impact on Misclassification Rate of Embedding Dimension")

print(sprintf("optimal dimension: %d", result$optimal.r))

## [1] "optimal dimension: 5"

print(sprintf("Misclassification rate at rhat = %d: %.2f", result$optimal.r, result$optimal.lhat))

## [1] "Misclassification rate at rhat = 5: 0.07"
## [2] "Misclassification rate at rhat = 5: 0.07"

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.