Let’s start with an extremely simple example: a prediction problem with a continuous outcome, where we want to use cross-validation to minimize the expected risk/loss on held-out data across a few different models.
We’ll use the iris dataset to do this.
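For intuition, here is a minimal sketch of what k-fold cross-validated loss
estimation looks like for a single candidate model, assuming squared-error
loss (this is illustrative only, not nadir’s internal implementation):
set.seed(1)
k <- 5
# randomly assign each row of iris to one of k folds
folds <- sample(rep(1:k, length.out = nrow(iris)))
cv_mse <- mean(sapply(1:k, function(fold) {
  # fit on all folds except the held-out one
  fit <- lm(Petal.Width ~ Petal.Length + Sepal.Length + Sepal.Width,
            data = iris[folds != fold, ])
  # evaluate squared-error loss on the held-out fold
  preds <- predict(fit, newdata = iris[folds == fold, ])
  mean((iris$Petal.Width[folds == fold] - preds)^2)
}))
cv_mse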
nadir::super_learner() strives to keep the syntax
simple, so the simplest call to super_learner() might look
something like this:
super_learner(
  data = iris,
  formula = Petal.Width ~ Petal.Length + Sepal.Length + Sepal.Width,
  learners = list(lnr_lm, lnr_rf, lnr_earth, lnr_mean))
#> $predict
#> function (newdata)
#> {
#> Reduce(`+`, x = future_lapply(1:length(fit_learners), function(i) {
#> fit_learners[[i]](newdata) * learner_weights[[i]]
#> }, future.seed = TRUE))
#> }
#> <bytecode: 0x1492f1408>
#> <environment: 0x1492f38d8>
#>
#> $y_variable
#> [1] "Petal.Width"
#>
#> $outcome_type
#> [1] "continuous"
#>
#> $learner_weights
#> lm rf earth mean
#> 0.5769071 0.4230929 0.0000000 0.0000000
#>
#> $holdout_predictions
#> # A tibble: 150 × 6
#> .sl_fold lm rf earth mean Petal.Width
#> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.425 0.329 1.83 1.20 0.4
#> 2 1 0.143 0.250 1.43 1.20 0.1
#> 3 1 0.0913 0.227 0.959 1.20 0.1
#> 4 1 0.0996 0.599 1.68 1.20 0.2
#> 5 1 0.319 0.220 1.64 1.20 0.4
#> 6 1 0.417 0.293 1.71 1.20 0.1
#> 7 1 0.203 0.191 1.55 1.20 0.2
#> 8 1 -0.0406 0.467 1.20 1.20 0.3
#> 9 1 0.232 0.213 1.02 1.20 0.2
#> 10 1 0.346 0.221 1.63 1.20 0.6
#> # ℹ 140 more rows
#>
#> attr(,"class")
#> [1] "nadir_sl_model"Notice what it returns: A function of newdata that
predicts across the learners, sums up according to the learned weights,
and returns the ensemble predictions.
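In other words, the ensemble predictor behaves like the following sketch,
where fit_learners and learner_weights are the names used in the printed
function above:
# a sketch of the weighted-ensemble logic: each fitted learner predicts on
# newdata, predictions are scaled by that learner's weight, and the weighted
# predictions are summed together
ensemble_predict <- function(newdata, fit_learners, learner_weights) {
  weighted_preds <- lapply(seq_along(fit_learners), function(i) {
    fit_learners[[i]](newdata) * learner_weights[[i]]
  })
  Reduce(`+`, weighted_preds)
}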
We can store that learned predictor function and use it:
# We recommend storing more complicated arguments used repeatedly to simplify
# the call to super_learner()
petal_formula <- Petal.Width ~ Petal.Length + Sepal.Length + Sepal.Width
learners <- list(lnr_lm, lnr_rf, lnr_earth, lnr_mean)
sl_model <- super_learner(
  data = iris,
  formula = petal_formula,
  learners = learners)

In particular, we can use it to predict on the same dataset,
predict(sl_model, iris) |> head()
#> 1 2 3 4 5 6
#> 0.2274732 0.1725615 0.1903219 0.2566321 0.2482329 0.3803236

On a random sample of it,
predict(sl_model, iris[sample.int(size = 10, n = nrow(iris)), ]) |>
head()
#> 128 45 25 2 97 106
#> 1.7339317 0.4793721 0.4237248 0.1725615 1.3776838 2.2293237

Or on completely new data.
fake_iris_data <- cbind.data.frame(
  Sepal.Length = rnorm(
    n = 6,
    mean = mean(iris$Sepal.Length),
    sd = sd(iris$Sepal.Length)
  ),
  Sepal.Width = rnorm(
    n = 6,
    mean = mean(iris$Sepal.Width),
    sd = sd(iris$Sepal.Width)
  ),
  Petal.Length = rnorm(
    n = 6,
    mean = mean(iris$Petal.Length),
    sd = sd(iris$Petal.Length)
  )
)
predict(sl_model, fake_iris_data) |>
head()
#> 1 2 3 4 5 6
#> 1.1015928 1.7899799 1.1575923 0.8751806 0.8710849 0.5649973

If we want to know a lot more about the super_learner() process, how it
weighted the candidate learners, what the candidate learners predicted on
the held-out data, etc., then we will want to look at the other metadata
contained in the nadir_sl_model object produced:
sl_model_iris <- super_learner(
  data = iris,
  formula = petal_formula,
  learners = learners)
str(sl_model_iris, max.level = 2)
#> List of 5
#> $ predict :function (newdata)
#> $ y_variable : chr "Petal.Width"
#> $ outcome_type : chr "continuous"
#> $ learner_weights : Named num [1:4] 0.521 0.479 0 0
#> ..- attr(*, "names")= chr [1:4] "lm" "rf" "earth" "mean"
#> $ holdout_predictions: tibble [150 × 6] (S3: tbl_df/tbl/data.frame)
#> - attr(*, "class")= chr "nadir_sl_model"

To put some description to what’s contained in the output from
super_learner():

- $predict: a function that takes newdata and returns the ensemble’s
  predictions.
- $y_variable and $outcome_type: provide some context to the learning task
  that was performed.
- $learner_weights: indicate what weight the different candidate learners
  were given.
- $holdout_predictions: a data.frame of predictions from each of the
  candidate learners, along with the actual outcome from the held-out data.

We can call compare_learners() on the verbose output
from super_learner() if we want to assess how the different
learners performed. We can also call cv_super_learner()
with the same arguments as super_learner() to wrap the
super_learner() call in another layer of cross-validation
to assess how super_learner() performs on held-out
data.
compare_learners(sl_model_iris)
#> Inferring the loss metric for learner comparison based on the outcome type:
#> outcome_type=continuous -> using mean squared error
#> # A tibble: 1 × 4
#> lm rf earth mean
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0.0373 0.0391 1.69 0.590
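These per-learner losses can, in principle, be reproduced by hand from the
$holdout_predictions component, assuming the column layout shown earlier:
# mean squared error of each candidate learner's held-out predictions
# against the observed Petal.Width (a sketch, not compare_learners() itself)
hp <- sl_model_iris$holdout_predictions
sapply(c("lm", "rf", "earth", "mean"), function(lnr) {
  mean((hp[[lnr]] - hp$Petal.Width)^2)
})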
cv_super_learner(
  data = iris,
  formula = petal_formula,
  learners = learners)$cv_loss
#> The loss_metric is being inferred based on the outcome_type=continuous -> using CV-MSE
#> [1] 0.03374206

We can, of course, do anything with a super-learned model that we would do
with a conventional prediction model, such as calculating performance
statistics like R².
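For example, here is a sketch of computing an (in-sample) R² for the super
learner’s predictions on iris; for honest held-out performance, prefer
cv_super_learner() as above:
# R^2 = 1 - SSE/SST, computed on the training data itself
preds <- predict(sl_model, iris)
1 - sum((iris$Petal.Width - preds)^2) /
  sum((iris$Petal.Width - mean(iris$Petal.Width))^2)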