
Machine Learning

Bernardo Lares

2026-04-23

Introduction

The lares package provides a streamlined interface to h2o’s AutoML for automated machine learning. This vignette demonstrates how to build, evaluate, and interpret models with minimal code.

Setup

Install and load required packages:

library(lares)
library(dplyr)

h2o must be installed separately:

# Install h2o (run once)
# install.packages("h2o")
library(h2o)

# Initialize h2o quietly for vignette
Sys.unsetenv("http_proxy")
Sys.unsetenv("https_proxy")
h2o.init(nthreads = -1, max_mem_size = "2G", ip = "127.0.0.1")
#> 
#> H2O is not running yet, starting it now...
#> 
#> Note:  In case of errors look at the following log files:
#>     /var/folders/_9/97xqjz8j4cx_q5m_3t646mdm0000gn/T//Rtmpl5cGQ0/file4ca320b67f28/h2o_bernardo_started_from_r.out
#>     /var/folders/_9/97xqjz8j4cx_q5m_3t646mdm0000gn/T//Rtmpl5cGQ0/file4ca32d562bdf/h2o_bernardo_started_from_r.err
#> 
#> 
#> Starting H2O JVM and connecting: ... Connection successful!
#> 
#> R is connected to the H2O cluster: 
#>     H2O cluster uptime:         3 seconds 462 milliseconds 
#>     H2O cluster timezone:       Europe/Madrid 
#>     H2O data parsing timezone:  UTC 
#>     H2O cluster version:        3.44.0.3 
#>     H2O cluster version age:    2 years, 4 months and 2 days 
#>     H2O cluster name:           H2O_started_from_R_bernardo_rna358 
#>     H2O cluster total nodes:    1 
#>     H2O cluster total memory:   1.76 GB 
#>     H2O cluster total cores:    12 
#>     H2O cluster allowed cores:  12 
#>     H2O cluster healthy:        TRUE 
#>     H2O Connection ip:          127.0.0.1 
#>     H2O Connection port:        54321 
#>     H2O Connection proxy:       NA 
#>     H2O Internal Security:      FALSE 
#>     R Version:                  R version 4.5.3 (2026-03-11)
h2o.no_progress() # Disable progress bars

Pipeline

h2o_automl workflow

In short, these are the steps that happen behind the scenes in h2o_automl:

  1. Input Processing: The function receives a data frame df and the name of the dependent variable y to predict. Set seed for reproducible results.

  2. Model Type Detection: Automatically decides between classification (categorical) and regression (continuous) based on y’s class and number of unique values (controlled by the thresh parameter).

  3. Data Splitting: Splits the data into train and test sets. Control the proportion with the split parameter. Replicate this step with msplit().

  4. Preprocessing:

    • Center and scale numerical values
    • Remove outliers with no_outliers
    • Impute missing values with MICE (impute = TRUE)
    • Balance training data for classification (balance = TRUE)
    • Replicate with model_preprocess()
  5. Model Training: Runs h2o::h2o.automl() to train multiple models and generate a leaderboard sorted by performance. Customize with:

    • max_models or max_time
    • nfolds for k-fold cross-validation
    • exclude_algos and include_algos
  6. Model Selection: Selects the best model based on performance metric (change with stopping_metric). Use h2o_selectmodel() to choose an alternative.

  7. Performance Evaluation: Calculates metrics and plots using test predictions (unseen data). Replicate with model_metrics().

  8. Results: Returns a list with inputs, leaderboard, best model, metrics, and plots. Export with export_results().
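Several of these steps can be replicated manually with the helper functions named above. A minimal sketch follows; the msplit() call mirrors the one used later in this vignette, while the model_preprocess() arguments are assumptions based on the step descriptions, so check ?model_preprocess for the exact signature:

```r
library(lares)

data(dft) # Titanic dataset shipped with lares

# Step 3: split into 70% train / 30% test, as h2o_automl does by default
splits <- msplit(dft, size = 0.7, seed = 123)

# Step 4: preprocessing equivalent (argument names are assumptions)
processed <- model_preprocess(
  df = dft,
  y = "Survived",
  balance = TRUE,     # balance classes for classification
  no_outliers = TRUE, # drop numeric outliers
  impute = FALSE      # skip MICE imputation
)
```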

Quick Start: Binary Classification

Let’s build a model to predict Titanic survival:

data(dft)

# Train an AutoML model
# Binary classification
model <- h2o_automl(
  df = dft,
  y = "Survived",
  target = "TRUE",
  ignore = c("Ticket", "Cabin", "PassengerId"),
  max_models = 10,
  max_time = 120,
  impute = FALSE
)
#> # A tibble: 2 × 5
#>   tag       n     p order  pcum
#>   <lgl> <int> <dbl> <int> <dbl>
#> 1 FALSE   549  61.6     1  61.6
#> 2 TRUE    342  38.4     2 100
#> train_size  test_size 
#>        623        268
#>                        model_id       auc   logloss     aucpr
#> 1 GBM_4_AutoML_1_20260423_85339 0.8655135 0.4261943 0.8323496
#> 2 GBM_3_AutoML_1_20260423_85339 0.8628015 0.4334215 0.8260966
#> 3 GBM_2_AutoML_1_20260423_85339 0.8589818 0.4318601 0.8276204
#>   mean_per_class_error      rmse       mse
#> 1            0.1807105 0.3635268 0.1321517
#> 2            0.1725745 0.3662049 0.1341061
#> 3            0.1843010 0.3652161 0.1333828
#> Model (1/10): GBM_4_AutoML_1_20260423_85339
#> Dependent Variable: Survived
#> Type: Classification (2 classes)
#> Algorithm: GBM
#> Split: 70% training data (of 891 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    AUC = 0.86366
#>    ACC = 0.18657
#>    PRC = 0.19355
#>    TPR = 0.34615
#>    TNR = 0.085366
#> 
#> Most important variables:
#>    Sex (40.8%)
#>    Fare (20.8%)
#>    Age (16.2%)
#>    Pclass (14.6%)
#>    SibSp (3.2%)

# View results
print(model)
#> Model (1/10): GBM_4_AutoML_1_20260423_85339
#> Dependent Variable: Survived
#> Type: Classification (2 classes)
#> Algorithm: GBM
#> Split: 70% training data (of 891 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    AUC = 0.86366
#>    ACC = 0.18657
#>    PRC = 0.19355
#>    TPR = 0.34615
#>    TNR = 0.085366
#> 
#> Most important variables:
#>    Sex (40.8%)
#>    Fare (20.8%)
#>    Age (16.2%)
#>    Pclass (14.6%)
#>    SibSp (3.2%)

That’s it! h2o_automl() handles data splitting, preprocessing, model training, selection, and evaluation in a single call.

Understanding the Output

The model object contains:

names(model)
#>  [1] "model"           "y"               "scores_test"     "metrics"        
#>  [5] "parameters"      "importance"      "datasets"        "scoring_history"
#>  [9] "categoricals"    "type"            "split"           "threshold"      
#> [13] "model_name"      "algorithm"       "leaderboard"     "project"        
#> [17] "ignored"         "seed"            "h2o"             "plots"

Key components:

  • model: Best h2o model
  • metrics: Performance metrics
  • importance: Variable importance
  • datasets: Train/test data used
  • parameters: Configuration used
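These components can be inspected directly; a quick sketch using the model object trained above:

```r
# Test-set metrics (AUC, ACC, PRC, TPR, TNR) live in a nested data frame
model$metrics$metrics

# Variable importances, already sorted by relevance
head(model$importance)

# Per-observation test labels and predicted scores
head(model$scores_test)

# Configuration used for this run
model$parameters
```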

Model Performance

Metrics

View detailed metrics:

# All metrics
model$metrics
#> $dictionary
#> [1] "AUC: Area Under the Curve"                                                             
#> [2] "ACC: Accuracy"                                                                         
#> [3] "PRC: Precision = Positive Predictive Value"                                            
#> [4] "TPR: Sensitivity = Recall = Hit rate = True Positive Rate"                             
#> [5] "TNR: Specificity = Selectivity = True Negative Rate"                                   
#> [6] "Logloss (Error): Logarithmic loss [Neutral classification: 0.69315]"                   
#> [7] "Gain: When best n deciles selected, what % of the real target observations are picked?"
#> [8] "Lift: When best n deciles selected, how much better than random is?"                   
#> 
#> $confusion_matrix
#>        Pred
#> Real    FALSE TRUE
#>   FALSE    14  150
#>   TRUE     68   36
#> 
#> $gain_lift
#> # A tibble: 10 × 10
#>    percentile value random target total  gain optimal   lift response score
#>    <fct>      <chr>  <dbl>  <int> <int> <dbl>   <dbl>  <dbl>    <dbl> <dbl>
#>  1 1          TRUE    10.8     29    29  27.9    27.9 158.     27.9   90.0 
#>  2 2          TRUE    20.1     22    25  49.0    51.9 143.     21.2   78.1 
#>  3 3          TRUE    30.2     16    27  64.4    77.9 113.     15.4   51.6 
#>  4 4          TRUE    39.9     13    26  76.9   100    92.7    12.5   29.2 
#>  5 5          TRUE    50        6    27  82.7   100    65.4     5.77  20.7 
#>  6 6          TRUE    60.1      5    27  87.5   100    45.7     4.81  14.8 
#>  7 7          TRUE    69.8      6    26  93.3   100    33.7     5.77  12.3 
#>  8 8          TRUE    79.9      1    27  94.2   100    18.0     0.962  9.31
#>  9 9          TRUE    89.9      3    27  97.1   100     8.00    2.88   6.12
#> 10 10         TRUE   100        3    27 100     100     0       2.88   1.54
#> 
#> $metrics
#>       AUC     ACC     PRC     TPR      TNR
#> 1 0.86366 0.18657 0.19355 0.34615 0.085366
#> 
#> $cv_metrics
#> # A tibble: 20 × 8
#>    metric     mean     sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
#>    <chr>     <dbl>  <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#>  1 accuracy  0.855 0.0331      0.856     0.904       0.864      0.815      0.839
#>  2 auc       0.861 0.0424      0.876     0.924       0.860      0.815      0.831
#>  3 err       0.145 0.0331      0.144     0.096       0.136      0.185      0.161
#>  4 err_cou… 18     4.06       18        12          17         23         20    
#>  5 f0point5  0.835 0.0356      0.855     0.881       0.826      0.830      0.785
#>  6 f1        0.797 0.0342      0.812     0.838       0.809      0.777      0.75 
#>  7 f2        0.763 0.0369      0.774     0.799       0.793      0.730      0.718
#>  8 lift_to…  2.66  0.377       2.40      3.12        2.72       2.18       2.88 
#>  9 logloss   0.432 0.0758      0.413     0.324       0.425      0.530      0.468
#> 10 max_per…  0.259 0.0400      0.25      0.225       0.217      0.298      0.302
#> 11 mcc       0.690 0.0592      0.703     0.775       0.705      0.632      0.636
#> 12 mean_pe…  0.834 0.0278      0.841     0.870       0.847      0.806      0.806
#> 13 mean_pe…  0.166 0.0278      0.159     0.130       0.153      0.194      0.194
#> 14 mse       0.132 0.0245      0.126     0.0977      0.129      0.164      0.145
#> 15 pr_auc    0.832 0.0509      0.868     0.895       0.815      0.818      0.764
#> 16 precisi…  0.863 0.0399      0.886     0.912       0.837      0.870      0.811
#> 17 r2        0.436 0.0871      0.481     0.551       0.446      0.342      0.358
#> 18 recall    0.741 0.0400      0.75      0.775       0.783      0.702      0.698
#> 19 rmse      0.362 0.0342      0.355     0.313       0.359      0.404      0.381
#> 20 specifi…  0.926 0.0231      0.932     0.965       0.911      0.910      0.914
#> 
#> $max_metrics
#>                         metric  threshold       value idx
#> 1                       max f1 0.49534115   0.7775281 157
#> 2                       max f2 0.28977227   0.7987220 226
#> 3                 max f0point5 0.50884749   0.8172147 150
#> 4                 max accuracy 0.50884749   0.8410915 150
#> 5                max precision 0.98538253   1.0000000   0
#> 6                   max recall 0.04255203   1.0000000 386
#> 7              max specificity 0.98538253   1.0000000   0
#> 8             max absolute_mcc 0.50884749   0.6587759 150
#> 9   max min_per_class_accuracy 0.36328089   0.8025210 200
#> 10 max mean_per_class_accuracy 0.46113191   0.8194041 168
#> 11                     max tns 0.98538253 385.0000000   0
#> 12                     max fns 0.98538253 236.0000000   0
#> 13                     max fps 0.01285907 385.0000000 399
#> 14                     max tps 0.04255203 238.0000000 386
#> 15                     max tnr 0.98538253   1.0000000   0
#> 16                     max fnr 0.98538253   0.9915966   0
#> 17                     max fpr 0.01285907   1.0000000 399
#> 18                     max tpr 0.04255203   1.0000000 386

# Specific metrics are nested under $metrics
model$metrics$metrics$AUC
model$metrics$metrics$ACC

Confusion Matrix

# Confusion matrix plot
mplot_conf(
  tag = model$scores_test$tag,
  score = model$scores_test$score,
  subtitle = sprintf("AUC: %.3f", model$metrics$metrics$AUC)
)

ROC Curve

# ROC curve
mplot_roc(
  tag = model$scores_test$tag,
  score = model$scores_test$score
)

Gain and Lift Charts

# Gain and Lift charts for binary classification
mplot_gain(
  tag = model$scores_test$tag,
  score = model$scores_test$score
)

Variable Importance

See which features matter most:

# Variable importance dataframe
head(model$importance, 15)
#>   variable relative_importance scaled_importance importance
#> 1      Sex           202.18417        1.00000000 0.40811816
#> 2     Fare           102.86121        0.50875008 0.20763014
#> 3      Age            80.14220        0.39638218 0.16177077
#> 4   Pclass            72.13468        0.35677709 0.14560721
#> 5    SibSp            15.87309        0.07850806 0.03204057
#> 6    Parch            12.75075        0.06306504 0.02573799
#> 7 Embarked             9.45986        0.04678833 0.01909517

# Plot top 15 important variables
top15 <- head(model$importance, 15)
mplot_importance(
  var = top15$variable,
  imp = top15$importance
)

Model Interpretation with SHAP

SHAP values explain individual predictions:

# Calculate SHAP values (computationally expensive)
shap <- h2o_shap(model)

# Plot SHAP summary
plot(shap)

Advanced: Customizing AutoML

Preprocessing Options

model <- h2o_automl(
  df = dft,
  y = "Survived",
  # Ignore specific columns
  ignore = c("Ticket", "Cabin", "PassengerId"),
  # Use only specific algorithms (exclude_algos also available)
  include_algos = c("GBM", "DRF"), # Gradient Boosting & Random Forest
  # Data split
  split = 0.7,
  # Handle imbalanced data
  balance = TRUE,
  # Remove outliers (Z-score > 3)
  no_outliers = TRUE,
  # Impute missing values (requires mice package if TRUE)
  impute = FALSE,
  # Keep only unique training rows
  unique_train = TRUE,
  # Reproducible results
  seed = 123
)
#> # A tibble: 2 × 5
#>   tag       n     p order  pcum
#>   <lgl> <int> <dbl> <int> <dbl>
#> 1 FALSE   549  61.6     1  61.6
#> 2 TRUE    342  38.4     2 100
#> train_size  test_size 
#>        623        268
#>                        model_id       auc   logloss     aucpr
#> 1 GBM_2_AutoML_2_20260423_85401 0.8596583 0.4248255 0.8431084
#> 2 DRF_1_AutoML_2_20260423_85401 0.8564385 0.4488829 0.8421588
#> 3 GBM_1_AutoML_2_20260423_85401 0.8328975 0.4839880 0.8085889
#>   mean_per_class_error      rmse       mse
#> 1            0.1960698 0.3625182 0.1314194
#> 2            0.1961569 0.3699173 0.1368388
#> 3            0.2342388 0.3942838 0.1554597
#> Model (1/3): GBM_2_AutoML_2_20260423_85401
#> Dependent Variable: Survived
#> Type: Classification (2 classes)
#> Algorithm: GBM
#> Split: 70% training data (of 891 observations)
#> Seed: 123
#> 
#> Test metrics:
#>    AUC = 0.87879
#>    ACC = 0.86567
#>    PRC = 0.88506
#>    TPR = 0.74757
#>    TNR = 0.93939
#> 
#> Most important variables:
#>    Sex (37.5%)
#>    Fare (22.7%)
#>    Age (17.3%)
#>    Pclass (12.2%)
#>    Embarked (4.1%)

Multi-Class Classification

Predict passenger class (3 categories):

model_multiclass <- h2o_automl(
  df = dft,
  y = "Pclass",
  ignore = c("Cabin", "PassengerId"),
  max_models = 10,
  max_time = 60
)
#> # A tibble: 3 × 5
#>   tag       n     p order  pcum
#>   <fct> <int> <dbl> <int> <dbl>
#> 1 n_3     491  55.1     1  55.1
#> 2 n_1     216  24.2     2  79.4
#> 3 n_2     184  20.6     3 100
#> train_size  test_size 
#>        623        268
#>                            model_id mean_per_class_error   logloss      rmse
#> 1 XGBoost_3_AutoML_3_20260423_85406            0.0975638 0.1843648 0.2331297
#> 2 XGBoost_2_AutoML_3_20260423_85406            0.1134454 0.2204648 0.2579203
#> 3 XGBoost_1_AutoML_3_20260423_85406            0.1191367 0.2584227 0.2761215
#>          mse
#> 1 0.05434945
#> 2 0.06652287
#> 3 0.07624310
#> Model (1/10): XGBoost_3_AutoML_3_20260423_85406
#> Dependent Variable: Pclass
#> Type: Classification (3 classes)
#> Algorithm: XGBOOST
#> Split: 70% training data (of 891 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    AUC = 0.98236
#>    ACC = 0.9291
#> 
#> Most important variables:
#>    Fare (66%)
#>    Age (14.8%)
#>    SibSp (7.9%)
#>    Parch (4.4%)
#>    Survived.FALSE (2.9%)

# Multi-class metrics
model_multiclass$metrics
#> $dictionary
#> [1] "AUC: Area Under the Curve"                                                             
#> [2] "ACC: Accuracy"                                                                         
#> [3] "PRC: Precision = Positive Predictive Value"                                            
#> [4] "TPR: Sensitivity = Recall = Hit rate = True Positive Rate"                             
#> [5] "TNR: Specificity = Selectivity = True Negative Rate"                                   
#> [6] "Logloss (Error): Logarithmic loss [Neutral classification: 0.69315]"                   
#> [7] "Gain: When best n deciles selected, what % of the real target observations are picked?"
#> [8] "Lift: When best n deciles selected, how much better than random is?"                   
#> 
#> $confusion_matrix
#> # A tibble: 3 × 4
#>   `Real x Pred`   n_3   n_1   n_2
#>   <fct>         <int> <int> <int>
#> 1 n_3             136     3     3
#> 2 n_1               1    60     2
#> 3 n_2               4     6    53
#> 
#> $metrics
#>       AUC    ACC
#> 1 0.98236 0.9291
#> 
#> $metrics_tags
#> # A tibble: 3 × 9
#>   tag       n     p   AUC order   ACC   PRC   TPR   TNR
#>   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 n_3     142  53.0 0.985     1 0.959 0.965 0.958 0.960
#> 2 n_1      63  23.5 0.983     2 0.955 0.870 0.952 0.956
#> 3 n_2      63  23.5 0.979     3 0.944 0.914 0.841 0.976
#> 
#> $cv_metrics
#> # A tibble: 12 × 8
#>    metric     mean     sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
#>    <chr>     <dbl>  <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#>  1 accur…   0.926  0.0272     0.936      0.944      0.944       0.879     0.927 
#>  2 auc    NaN      0        NaN        NaN        NaN         NaN       NaN     
#>  3 err      0.0739 0.0272     0.064      0.056      0.056       0.121     0.0726
#>  4 err_c…   9.2    3.35       8          7          7          15         9     
#>  5 loglo…   0.185  0.0881     0.162      0.186      0.113       0.334     0.128 
#>  6 max_p…   0.190  0.0785     0.167      0.125      0.136       0.32      0.2   
#>  7 mean_…   0.903  0.0429     0.922      0.926      0.936       0.830     0.900 
#>  8 mean_…   0.0972 0.0429     0.0777     0.0739     0.0640      0.170     0.100 
#>  9 mse      0.0544 0.0267     0.0436     0.0511     0.0357      0.101     0.0405
#> 10 pr_auc NaN      0        NaN        NaN        NaN         NaN       NaN     
#> 11 r2       0.923  0.0378     0.933      0.926      0.950       0.857     0.947 
#> 12 rmse     0.229  0.0517     0.209      0.226      0.189       0.318     0.201 
#> 
#> $hit_ratio
#>   k hit_ratio
#> 1 1 0.9261637
#> 2 2 0.9903692
#> 3 3 1.0000000

# Confusion matrix for multi-class
mplot_conf(
  tag = model_multiclass$scores_test$tag,
  score = model_multiclass$scores_test$score
)

Regression Example

Predict fare prices:

model_regression <- h2o_automl(
  df = dft,
  y = "Fare",
  ignore = c("Cabin", "PassengerId"),
  max_models = 10,
  exclude_algos = NULL
)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    0.00    7.91   14.45   32.20   31.00  512.33
#> train_size  test_size 
#>        609        262
#>                                                 model_id     rmse      mse
#> 1 StackedEnsemble_BestOfFamily_1_AutoML_4_20260423_85416 10.38136 107.7726
#> 2    StackedEnsemble_AllModels_1_AutoML_4_20260423_85416 10.55894 111.4913
#> 3                          GBM_3_AutoML_4_20260423_85416 12.44341 154.8385
#>        mae     rmsle mean_residual_deviance
#> 1 5.533338 0.4535433               107.7726
#> 2 5.719281 0.4555461               111.4913
#> 3 5.769395 0.4650435               154.8385
#> Model (1/12): StackedEnsemble_BestOfFamily_1_AutoML_4_20260423_85416
#> Dependent Variable: Fare
#> Type: Regression
#> Algorithm: STACKEDENSEMBLE
#> Split: 70% training data (of 871 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    rmse = 6.6239
#>    mae = 4.143
#>    mape = 0.012637
#>    mse = 43.876
#>    rsq = 0.9391
#>    rsqa = 0.9389

# Regression metrics
model_regression$metrics
#> $dictionary
#> [1] "RMSE: Root Mean Squared Error"       
#> [2] "MAE: Mean Average Error"             
#> [3] "MAPE: Mean Absolute Percentage Error"
#> [4] "MSE: Mean Squared Error"             
#> [5] "RSQ: R Squared"                      
#> [6] "RSQA: Adjusted R Squared"            
#> 
#> $metrics
#>       rmse      mae       mape      mse    rsq   rsqa
#> 1 6.623908 4.143029 0.01263738 43.87615 0.9391 0.9389
#> 
#> $cv_metrics
#> # A tibble: 8 × 8
#>   metric     mean      sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
#>   <chr>     <dbl>   <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#> 1 mae     5.57e+0 6.50e-1      5.34       4.79       5.52       6.58       5.63 
#> 2 mean_r… 1.08e+2 4.86e+1     74.7       78.3      110.       192.        86.4  
#> 3 mse     1.08e+2 4.86e+1     74.7       78.3      110.       192.        86.4  
#> 4 null_d… 1.19e+5 2.29e+4  89084.    111869.    147015.    109745.    135751.   
#> 5 r2      8.85e-1 5.72e-2      0.891      0.908      0.914      0.785      0.926
#> 6 residu… 1.31e+4 5.85e+3   9415.     10332.     12509.     23363.      9934.   
#> 7 rmse    1.02e+1 2.14e+0      8.64       8.85      10.5       13.8        9.29 
#> 8 rmsle   4.46e-1 1.21e-1      0.456      0.270      0.424      0.472      0.608

Using Pre-Split Data

If you have predefined train/test splits:

# Create splits
splits <- msplit(dft, size = 0.8, seed = 123)
#> train_size  test_size 
#>        712        179
splits$train$split <- "train"
splits$test$split <- "test"

# Combine
df_split <- rbind(splits$train, splits$test)

# Train using split column
model <- h2o_automl(
  df = df_split,
  y = "Survived",
  train_test = "split",
  max_models = 5
)
#> # A tibble: 2 × 5
#>   tag       n     p order  pcum
#>   <lgl> <int> <dbl> <int> <dbl>
#> 1 FALSE   549  61.6     1  61.6
#> 2 TRUE    342  38.4     2 100
#> 
#>  test train 
#>   179   712
#>                            model_id       auc   logloss     aucpr
#> 1     DRF_1_AutoML_5_20260423_85425 0.8680875 0.7855203 0.8270861
#> 2     GLM_1_AutoML_5_20260423_85425 0.8654726 0.4253319 0.8491966
#> 3 XGBoost_2_AutoML_5_20260423_85425 0.8537248 0.4484009 0.8055514
#>   mean_per_class_error      rmse       mse
#> 1            0.1775527 0.3813365 0.1454175
#> 2            0.1923547 0.3652137 0.1333811
#> 3            0.2039812 0.3752972 0.1408480
#> Model (1/5): DRF_1_AutoML_5_20260423_85425
#> Dependent Variable: Survived
#> Type: Classification (2 classes)
#> Algorithm: DRF
#> Split: 80% training data (of 891 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    AUC = 0.85792
#>    ACC = 0.78212
#>    PRC = 0.84783
#>    TPR = 0.5493
#>    TNR = 0.93519
#> 
#> Most important variables:
#>    Ticket (65.7%)
#>    Sex (14.9%)
#>    Cabin (8.7%)
#>    Pclass (3.4%)
#>    Fare (2.7%)

Making Predictions

On New Data

# New data (same structure as training)
new_data <- dft[1:10, ]

# Predict
predictions <- h2o_predict_model(new_data, model$model)
head(predictions)
#>   predict     FALSE.        TRUE.
#> 1   FALSE 0.99979242 0.0002075763
#> 2    TRUE 0.02148936 0.9785106383
#> 3    TRUE 0.12765957 0.8723404255
#> 4    TRUE 0.09574468 0.9042553191
#> 5   FALSE 0.99979242 0.0002075763
#> 6   FALSE 0.97851583 0.0214841721


Model Comparison

Full Visualization Suite

# Complete model evaluation plots
mplot_full(
  tag = model$scores_test$tag,
  score = model$scores_test$score,
  subtitle = model$model@algorithm
)

Metrics Comparison

# Model performance over trees
mplot_metrics(model)

Saving and Loading Models

Export Results

# Save model and plots
export_results(model, subdir = "models", thresh = 0.5)

This creates:

  • Model file (.rds)
  • MOJO file (for production)
  • Performance plots
  • Metrics summary

Load Saved Model

# Load model
loaded_model <- readRDS("models/Titanic_Model/Titanic_Model.rds")

# Make predictions with MOJO (production-ready)
predictions <- h2o_predict_MOJO(
  model_path = "models/Titanic_Model",
  df = dft[1:10, ]
)

Best Practices

1. Start Simple

# Quick prototype
model <- h2o_automl(dft, "Survived", max_models = 3, max_time = 30)
#> # A tibble: 2 × 5
#>   tag       n     p order  pcum
#>   <lgl> <int> <dbl> <int> <dbl>
#> 1 FALSE   549  61.6     1  61.6
#> 2 TRUE    342  38.4     2 100
#> train_size  test_size 
#>        623        268
#>                            model_id       auc   logloss     aucpr
#> 1     GLM_1_AutoML_6_20260423_85436 0.8566401 0.4331878 0.8468753
#> 2 XGBoost_1_AutoML_6_20260423_85436 0.8400780 0.4574884 0.8099752
#> 3     GBM_1_AutoML_6_20260423_85436 0.8159377 0.6451460 0.7378534
#>   mean_per_class_error      rmse       mse
#> 1            0.1914171 0.3680368 0.1354511
#> 2            0.2138740 0.3789387 0.1435946
#> 3            0.2218336 0.4732407 0.2239567
#> Model (1/3): GLM_1_AutoML_6_20260423_85436
#> Dependent Variable: Survived
#> Type: Classification (2 classes)
#> Algorithm: GLM
#> Split: 70% training data (of 891 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    AUC = 0.87979
#>    ACC = 0.79851
#>    PRC = 0.90164
#>    TPR = 0.53398
#>    TNR = 0.96364
#> 
#> Most important variables:
#>    Ticket.1601 (0.9%)
#>    Ticket.2661 (0.9%)
#>    Ticket.C.A. 37671 (0.8%)
#>    Cabin.C22 C26 (0.8%)
#>    Sex.female (0.7%)

2. Iterate and Refine

# Refine based on results
model <- h2o_automl(
  dft, "Survived",
  max_models = 20,
  no_outliers = TRUE,
  balance = TRUE,
  ignore = c("PassengerId", "Name", "Ticket", "Cabin"),
  model_name = "Titanic_Model"
)
#> # A tibble: 2 × 5
#>   tag       n     p order  pcum
#>   <lgl> <int> <dbl> <int> <dbl>
#> 1 FALSE   549  61.6     1  61.6
#> 2 TRUE    342  38.4     2 100
#> train_size  test_size 
#>        623        268
#>                        model_id       auc   logloss     aucpr
#> 1 GBM_3_AutoML_7_20260423_85441 0.8575063 0.4316748 0.8410436
#> 2 GBM_2_AutoML_7_20260423_85441 0.8571250 0.4270731 0.8442881
#> 3 GBM_4_AutoML_7_20260423_85441 0.8561498 0.4266892 0.8469913
#>   mean_per_class_error      rmse       mse
#> 1            0.1887694 0.3656777 0.1337202
#> 2            0.1944735 0.3641046 0.1325721
#> 3            0.1909050 0.3631896 0.1319067

3. Validate Thoroughly

# Check multiple metrics
model$metrics
#> $dictionary
#> [1] "AUC: Area Under the Curve"                                                             
#> [2] "ACC: Accuracy"                                                                         
#> [3] "PRC: Precision = Positive Predictive Value"                                            
#> [4] "TPR: Sensitivity = Recall = Hit rate = True Positive Rate"                             
#> [5] "TNR: Specificity = Selectivity = True Negative Rate"                                   
#> [6] "Logloss (Error): Logarithmic loss [Neutral classification: 0.69315]"                   
#> [7] "Gain: When best n deciles selected, what % of the real target observations are picked?"
#> [8] "Lift: When best n deciles selected, how much better than random is?"                   
#> 
#> $confusion_matrix
#>        Pred
#> Real    FALSE TRUE
#>   FALSE   156    9
#>   TRUE     28   75
#> 
#> $gain_lift
#> # A tibble: 10 × 10
#>    percentile value random target total  gain optimal  lift response score
#>    <fct>      <fct>  <dbl>  <int> <int> <dbl>   <dbl> <dbl>    <dbl> <dbl>
#>  1 1          FALSE   10.1     25    27  15.2    16.4  50.4   15.2   93.5 
#>  2 2          FALSE   20.1     24    27  29.7    32.7  47.4   14.5   92.2 
#>  3 3          FALSE   30.2     26    27  45.5    49.1  50.4   15.8   89.0 
#>  4 4          FALSE   39.9     24    26  60      64.8  50.3   14.5   86.1 
#>  5 5          FALSE   50       23    27  73.9    81.2  47.9   13.9   81.1 
#>  6 6          FALSE   60.1     17    27  84.2    97.6  40.2   10.3   71.1 
#>  7 7          FALSE   69.8     18    26  95.2   100    36.4   10.9   45.7 
#>  8 8          FALSE   79.9      3    27  97.0   100    21.4    1.82  22.6 
#>  9 9          FALSE   89.9      4    27  99.4   100    10.5    2.42   9.55
#> 10 10         FALSE  100        1    27 100     100     0      0.606  1.49
#> 
#> $metrics
#>       AUC     ACC     PRC     TPR     TNR
#> 1 0.89147 0.86194 0.89286 0.72816 0.94545
#> 
#> $cv_metrics
#> # A tibble: 20 × 8
#>    metric     mean     sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
#>    <chr>     <dbl>  <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#>  1 accuracy  0.836 0.0462      0.84      0.896       0.8        0.782      0.863
#>  2 auc       0.847 0.0721      0.847     0.927       0.854      0.731      0.876
#>  3 err       0.164 0.0462      0.16      0.104       0.2        0.218      0.137
#>  4 err_cou… 20.4   5.73       20        13          25         27         17    
#>  5 f0point5  0.783 0.0914      0.788     0.913       0.743      0.663      0.808
#>  6 f1        0.781 0.0793      0.778     0.876       0.779      0.658      0.813
#>  7 f2        0.780 0.0759      0.768     0.842       0.818      0.653      0.819
#>  8 lift_to…  2.64  0.337       2.72      2.23        2.40       3.1        2.76 
#>  9 logloss   0.432 0.0828      0.452     0.326       0.460      0.543      0.379
#> 10 max_per…  0.236 0.0702      0.239     0.179       0.233      0.35       0.178
#> 11 mcc       0.651 0.110       0.653     0.792       0.605      0.499      0.705
#> 12 mean_pe…  0.824 0.0531      0.823     0.889       0.807      0.748      0.854
#> 13 mean_pe…  0.176 0.0531      0.177     0.111       0.193      0.252      0.146
#> 14 mse       0.134 0.0294      0.139     0.0966      0.145      0.173      0.115
#> 15 pr_auc    0.822 0.0987      0.821     0.937       0.814      0.669      0.870
#> 16 precisi…  0.785 0.103       0.795     0.939       0.721      0.667      0.804
#> 17 r2        0.425 0.149       0.403     0.610       0.402      0.207      0.503
#> 18 recall    0.780 0.0793      0.761     0.821       0.846      0.65       0.822
#> 19 rmse      0.364 0.0405      0.372     0.311       0.381      0.416      0.339
#> 20 specifi…  0.868 0.0693      0.886     0.957       0.767      0.845      0.886
#> 
#> $max_metrics
#>                         metric  threshold       value idx
#> 1                       max f1 0.41714863   0.7672956 181
#> 2                       max f2 0.28931884   0.7898957 224
#> 3                 max f0point5 0.66135190   0.8173619 120
#> 4                 max accuracy 0.61537557   0.8250401 130
#> 5                max precision 0.99256302   1.0000000   0
#> 6                   max recall 0.03108045   1.0000000 391
#> 7              max specificity 0.99256302   1.0000000   0
#> 8             max absolute_mcc 0.61537557   0.6277286 130
#> 9   max min_per_class_accuracy 0.35070798   0.7907950 205
#> 10 max mean_per_class_accuracy 0.41714863   0.8112306 181
#> 11                     max tns 0.99256302 384.0000000   0
#> 12                     max fns 0.99256302 238.0000000   0
#> 13                     max fps 0.01019947 384.0000000 399
#> 14                     max tps 0.03108045 239.0000000 391
#> 15                     max tnr 0.99256302   1.0000000   0
#> 16                     max fnr 0.99256302   0.9958159   0
#> 17                     max fpr 0.01019947   1.0000000 399
#> 18                     max tpr 0.03108045   1.0000000 391
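
The headline values in `$metrics` can be reproduced by hand from the confusion matrix printed above, which is a quick sanity check that you are reading the matrix orientation correctly (rows = Real, columns = Pred). A minimal base-R sketch, with the counts copied from the output:

```r
# Counts from $confusion_matrix above
tn <- 156; fp <- 9   # Real FALSE row
fn <- 28;  tp <- 75  # Real TRUE row

acc <- (tp + tn) / (tp + tn + fp + fn)  # accuracy
prc <- tp / (tp + fp)                   # precision (PPV)
tpr <- tp / (tp + fn)                   # sensitivity / recall
tnr <- tn / (tn + fp)                   # specificity

round(c(ACC = acc, PRC = prc, TPR = tpr, TNR = tnr), 5)
#>     ACC     PRC     TPR     TNR 
#> 0.86194 0.89286 0.72816 0.94545
```

These match the ACC, PRC, TPR, and TNR values reported in `$metrics` exactly.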

# Visual inspection
mplot_full(
  tag = model$scores_test$tag,
  score = model$scores_test$score
)


# Variable importance
mplot_importance(
  var = model$importance$variable,
  imp = model$importance$importance
)

Score Distribution

# Density plot
mplot_density(
  tag = model$scores_test$tag,
  score = model$scores_test$score
)

4. Document Your Process

# Save everything
export_results(model, subdir = "my_project", thresh = 0.5)

Troubleshooting

h2o Initialization Issues

# Manually initialize h2o with more memory
h2o::h2o.init(max_mem_size = "8G", nthreads = -1)
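
Note that if a cluster is already running on the target address, `h2o.init()` connects to it instead of starting a new JVM. A sketch assuming the default localhost port:

```r
library(h2o)

# Connects to an existing cluster at 127.0.0.1:54321 if one is up;
# otherwise starts a fresh JVM with the requested resources
h2o.init(ip = "127.0.0.1", port = 54321, max_mem_size = "8G", nthreads = -1)
```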

Clean h2o Environment

# Remove all models
h2o::h2o.removeAll()

# Shutdown h2o
h2o::h2o.shutdown(prompt = FALSE)
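
Before re-initializing after a shutdown, you can check whether the JVM is still reachable with `h2o.clusterIsUp()`. A hedged sketch; the `tryCatch()` wrapper is needed because the call errors when no connection object has ever been created in the session:

```r
library(h2o)

# TRUE while the cluster JVM is reachable; errors if never connected,
# hence the tryCatch wrapper
cluster_up <- tryCatch(h2o.clusterIsUp(), error = function(e) FALSE)
if (!cluster_up) h2o.init(nthreads = -1, max_mem_size = "2G")
```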

Check h2o Flow UI

# Open h2o's web interface
# Navigate to: http://localhost:54321/flow/index.html
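
The Flow UI can also be opened directly from the R session with base R's `browseURL()`. The port should match whatever `h2o.init()` reported on startup (54321 by default):

```r
# Open h2o Flow in the system's default browser
utils::browseURL("http://localhost:54321/flow/index.html")
```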

Further Reading

Package & ML Resources

Next Steps
