
Basic Examples

library(nadir)

Let’s start with an extremely simple example: a prediction problem on a continuous outcome, where we want to use cross-validation to minimize the expected risk/loss on held-out data across a few different models.

We’ll use the iris dataset to do this.

nadir::super_learner() strives to keep the syntax simple, so the simplest call to super_learner() might look something like this:

super_learner(
  data = iris,
  formula = Petal.Width ~ Petal.Length + Sepal.Length + Sepal.Width,
  learners = list(lnr_lm, lnr_rf, lnr_earth, lnr_mean))
#> $predict
#> function (newdata) 
#> {
#>     Reduce(`+`, x = future_lapply(1:length(fit_learners), function(i) {
#>         fit_learners[[i]](newdata) * learner_weights[[i]]
#>     }, future.seed = TRUE))
#> }
#> <bytecode: 0x1492f1408>
#> <environment: 0x1492f38d8>
#> 
#> $y_variable
#> [1] "Petal.Width"
#> 
#> $outcome_type
#> [1] "continuous"
#> 
#> $learner_weights
#>        lm        rf     earth      mean 
#> 0.5769071 0.4230929 0.0000000 0.0000000 
#> 
#> $holdout_predictions
#> # A tibble: 150 × 6
#>    .sl_fold      lm    rf earth  mean Petal.Width
#>       <int>   <dbl> <dbl> <dbl> <dbl>       <dbl>
#>  1        1  0.425  0.329 1.83   1.20         0.4
#>  2        1  0.143  0.250 1.43   1.20         0.1
#>  3        1  0.0913 0.227 0.959  1.20         0.1
#>  4        1  0.0996 0.599 1.68   1.20         0.2
#>  5        1  0.319  0.220 1.64   1.20         0.4
#>  6        1  0.417  0.293 1.71   1.20         0.1
#>  7        1  0.203  0.191 1.55   1.20         0.2
#>  8        1 -0.0406 0.467 1.20   1.20         0.3
#>  9        1  0.232  0.213 1.02   1.20         0.2
#> 10        1  0.346  0.221 1.63   1.20         0.6
#> # ℹ 140 more rows
#> 
#> attr(,"class")
#> [1] "nadir_sl_model"

Notice what it returns: a nadir_sl_model object whose $predict element is a function of newdata that runs each fit learner on the new data, combines their predictions according to the learned weights, and returns the ensemble predictions.
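The weighted-combination step can be sketched in a few lines of base R. This is an illustration with made-up numbers, not nadir’s internals: `preds` stands in for per-learner predictions and `weights` for the learned learner weights.

```r
# Hypothetical per-learner predictions on two new observations
preds <- list(lm = c(0.30, 1.20), rf = c(0.40, 1.10))
# Hypothetical learned weights (sum to 1)
weights <- c(lm = 0.58, rf = 0.42)

# The ensemble prediction is the weight-scaled sum of learner predictions
ensemble <- Reduce(`+`, lapply(names(preds), function(nm) preds[[nm]] * weights[[nm]]))
ensemble
#> [1] 0.342 1.158
```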

We can store that learned predictor function and use it:

# We recommend storing more complicated arguments used repeatedly to simplify 
# the call to super_learner()
petal_formula <- Petal.Width ~ Petal.Length + Sepal.Length + Sepal.Width
learners <- list(lnr_lm, lnr_rf, lnr_earth, lnr_mean)

sl_model <- super_learner(
  data = iris,
  formula = petal_formula,
  learners = learners)

In particular, we can use it to predict on the same dataset,

predict(sl_model, iris) |> head()
#>         1         2         3         4         5         6 
#> 0.2274732 0.1725615 0.1903219 0.2566321 0.2482329 0.3803236

On a random sample of it,

predict(sl_model, iris[sample.int(size = 10, n = nrow(iris)), ]) |> 
  head()
#>       128        45        25         2        97       106 
#> 1.7339317 0.4793721 0.4237248 0.1725615 1.3776838 2.2293237

Or on completely new data.

fake_iris_data <- data.frame(
  Sepal.Length = rnorm(
    n = 6,
    mean = mean(iris$Sepal.Length),
    sd = sd(iris$Sepal.Length)
  ),
  Sepal.Width = rnorm(
    n = 6,
    mean = mean(iris$Sepal.Width),
    sd = sd(iris$Sepal.Width)
  ),
  Petal.Length = rnorm(
    n = 6,
    mean = mean(iris$Petal.Length),
    sd = sd(iris$Petal.Length)
  )
)

predict(sl_model, fake_iris_data) |> 
  head()
#>         1         2         3         4         5         6 
#> 1.1015928 1.7899799 1.1575923 0.8751806 0.8710849 0.5649973

Getting More Information Out

If we want to know more about the super_learner() process — how it weighted the candidate learners, what the candidate learners predicted on the held-out data, etc. — we can look at the other metadata contained in the nadir_sl_model object it produces.

sl_model_iris <- super_learner(
  data = iris,
  formula = petal_formula,
  learners = learners)

str(sl_model_iris, max.level = 2)
#> List of 5
#>  $ predict            :function (newdata)  
#>  $ y_variable         : chr "Petal.Width"
#>  $ outcome_type       : chr "continuous"
#>  $ learner_weights    : Named num [1:4] 0.521 0.479 0 0
#>   ..- attr(*, "names")= chr [1:4] "lm" "rf" "earth" "mean"
#>  $ holdout_predictions: tibble [150 × 6] (S3: tbl_df/tbl/data.frame)
#>  - attr(*, "class")= chr "nadir_sl_model"

To put some description to what’s contained in the output from super_learner():

- $predict: the learned ensemble prediction function of newdata.
- $y_variable: the name of the outcome variable.
- $outcome_type: the type of outcome ("continuous" here).
- $learner_weights: the weight assigned to each candidate learner.
- $holdout_predictions: each candidate learner’s predictions on held-out data, organized by cross-validation fold.

We can call compare_learners() on the fitted nadir_sl_model if we want to assess how the different candidate learners performed. We can also call cv_super_learner() with the same arguments as super_learner() to wrap the super_learner() call in another layer of cross-validation, which assesses how super_learner() itself performs on held-out data.

compare_learners(sl_model_iris)
#> Inferring the loss metric for learner comparison based on the outcome type:
#> outcome_type=continuous -> using mean squared error
#> # A tibble: 1 × 4
#>       lm     rf earth  mean
#>    <dbl>  <dbl> <dbl> <dbl>
#> 1 0.0373 0.0391  1.69 0.590
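In spirit, those per-learner losses are just the mean squared error of each learner’s held-out predictions against the observed outcome, as stored in $holdout_predictions. A minimal sketch with made-up values (the real tibble has 150 rows and all four learners):

```r
# Hypothetical slice of holdout predictions: two learners plus the outcome
holdout <- data.frame(
  lm          = c(0.42, 0.14, 0.09),
  rf          = c(0.33, 0.25, 0.23),
  Petal.Width = c(0.40, 0.10, 0.10)
)

# MSE of each learner's held-out predictions against the observed outcome
mse <- sapply(holdout[c("lm", "rf")], function(p) mean((p - holdout$Petal.Width)^2))
mse
```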

cv_super_learner(
  data = iris, 
  formula = petal_formula,
  learners = learners)$cv_loss
#> The loss_metric is being inferred based on the outcome_type=continuous -> using CV-MSE
#> [1] 0.03374206

We can, of course, do anything with a super learned model that we would do with a conventional prediction model, such as computing performance statistics like \(R^2\).

var_residuals <- var(iris$Petal.Width - predict(sl_model_iris, iris))
total_variance <- var(iris$Petal.Width)
variance_explained <- total_variance - var_residuals 

rsquared <- variance_explained / total_variance
print(rsquared)
