
Machine Learning

Bernardo Lares

2026-04-23

Introduction

The lares package provides a streamlined interface to h2o’s AutoML for automated machine learning. This vignette demonstrates how to build, evaluate, and interpret models with minimal code.

Setup

Install and load required packages:

library(lares)
library(dplyr)

h2o must be installed separately:

# Install h2o (run once)
# install.packages("h2o")
library(h2o)

# Initialize h2o quietly for vignette
Sys.unsetenv("http_proxy")
Sys.unsetenv("https_proxy")
h2o.init(nthreads = -1, max_mem_size = "2G", ip = "127.0.0.1")
#> 
#> H2O is not running yet, starting it now...
#> 
#> Note:  In case of errors look at the following log files:
#>     /var/folders/_9/97xqjz8j4cx_q5m_3t646mdm0000gn/T//Rtmpl5cGQ0/file4ca320b67f28/h2o_bernardo_started_from_r.out
#>     /var/folders/_9/97xqjz8j4cx_q5m_3t646mdm0000gn/T//Rtmpl5cGQ0/file4ca32d562bdf/h2o_bernardo_started_from_r.err
#> 
#> 
#> Starting H2O JVM and connecting: ... Connection successful!
#> 
#> R is connected to the H2O cluster: 
#>     H2O cluster uptime:         3 seconds 462 milliseconds 
#>     H2O cluster timezone:       Europe/Madrid 
#>     H2O data parsing timezone:  UTC 
#>     H2O cluster version:        3.44.0.3 
#>     H2O cluster version age:    2 years, 4 months and 2 days 
#>     H2O cluster name:           H2O_started_from_R_bernardo_rna358 
#>     H2O cluster total nodes:    1 
#>     H2O cluster total memory:   1.76 GB 
#>     H2O cluster total cores:    12 
#>     H2O cluster allowed cores:  12 
#>     H2O cluster healthy:        TRUE 
#>     H2O Connection ip:          127.0.0.1 
#>     H2O Connection port:        54321 
#>     H2O Connection proxy:       NA 
#>     H2O Internal Security:      FALSE 
#>     R Version:                  R version 4.5.3 (2026-03-11)
h2o.no_progress() # Disable progress bars

Pipeline

h2o_automl workflow

In short, these are the steps that happen behind the scenes in h2o_automl:

  1. Input Processing: The function receives a data frame df and the name of the dependent variable y to predict. Set seed for reproducible results.

  2. Model Type Detection: Automatically decides between classification (categorical) and regression (continuous) based on y’s class and number of unique values (controlled by the thresh parameter).

  3. Data Splitting: Splits the data into train and test sets. Control the proportion with the split parameter. Replicate this step with msplit().

  4. Preprocessing:

    • Center and scale numerical values
    • Remove outliers with no_outliers
    • Impute missing values with MICE (impute = TRUE)
    • Balance training data for classification (balance = TRUE)
    • Replicate with model_preprocess()
  5. Model Training: Runs h2o::h2o.automl() to train multiple models and generate a leaderboard sorted by performance. Customize with:

    • max_models or max_time
    • nfolds for k-fold cross-validation
    • exclude_algos and include_algos
  6. Model Selection: Selects the best model based on performance metric (change with stopping_metric). Use h2o_selectmodel() to choose an alternative.

  7. Performance Evaluation: Calculates metrics and plots using test predictions (unseen data). Replicate with model_metrics().

  8. Results: Returns a list with inputs, leaderboard, best model, metrics, and plots. Export with export_results().
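Several of these steps can be replicated manually with the helper functions named above. A minimal sketch follows; the msplit() call mirrors the one used later in this vignette, while the model_preprocess() arguments are assumptions based on the step descriptions, so check ?model_preprocess for the exact signature:

```r
library(lares)

data(dft) # Titanic dataset shipped with lares

# Step 3: split into 70% train / 30% test, as h2o_automl does by default
splits <- msplit(dft, size = 0.7, seed = 123)

# Step 4: preprocessing equivalent (argument names are assumptions)
processed <- model_preprocess(
  df = dft,
  y = "Survived",
  balance = TRUE,     # balance classes for classification
  no_outliers = TRUE, # drop numeric outliers
  impute = FALSE      # skip MICE imputation
)
```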

Quick Start: Binary Classification

Let’s build a model to predict Titanic survival:

data(dft)

# Train an AutoML model
# Binary classification
model <- h2o_automl(
  df = dft,
  y = "Survived",
  target = "TRUE",
  ignore = c("Ticket", "Cabin", "PassengerId"),
  max_models = 10,
  max_time = 120,
  impute = FALSE
)
#> # A tibble: 2 × 5
#>   tag       n     p order  pcum
#>   <lgl> <int> <dbl> <int> <dbl>
#> 1 FALSE   549  61.6     1  61.6
#> 2 TRUE    342  38.4     2 100
#> train_size  test_size 
#>        623        268
#>                        model_id       auc   logloss     aucpr
#> 1 GBM_4_AutoML_1_20260423_85339 0.8655135 0.4261943 0.8323496
#> 2 GBM_3_AutoML_1_20260423_85339 0.8628015 0.4334215 0.8260966
#> 3 GBM_2_AutoML_1_20260423_85339 0.8589818 0.4318601 0.8276204
#>   mean_per_class_error      rmse       mse
#> 1            0.1807105 0.3635268 0.1321517
#> 2            0.1725745 0.3662049 0.1341061
#> 3            0.1843010 0.3652161 0.1333828
#> Model (1/10): GBM_4_AutoML_1_20260423_85339
#> Dependent Variable: Survived
#> Type: Classification (2 classes)
#> Algorithm: GBM
#> Split: 70% training data (of 891 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    AUC = 0.86366
#>    ACC = 0.18657
#>    PRC = 0.19355
#>    TPR = 0.34615
#>    TNR = 0.085366
#> 
#> Most important variables:
#>    Sex (40.8%)
#>    Fare (20.8%)
#>    Age (16.2%)
#>    Pclass (14.6%)
#>    SibSp (3.2%)

# View results
print(model)
#> Model (1/10): GBM_4_AutoML_1_20260423_85339
#> Dependent Variable: Survived
#> Type: Classification (2 classes)
#> Algorithm: GBM
#> Split: 70% training data (of 891 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    AUC = 0.86366
#>    ACC = 0.18657
#>    PRC = 0.19355
#>    TPR = 0.34615
#>    TNR = 0.085366
#> 
#> Most important variables:
#>    Sex (40.8%)
#>    Fare (20.8%)
#>    Age (16.2%)
#>    Pclass (14.6%)
#>    SibSp (3.2%)

That’s it! h2o_automl() handles data splitting, preprocessing, model training, selection, and evaluation in a single call.

Understanding the Output

The model object contains:

names(model)
#>  [1] "model"           "y"               "scores_test"     "metrics"        
#>  [5] "parameters"      "importance"      "datasets"        "scoring_history"
#>  [9] "categoricals"    "type"            "split"           "threshold"      
#> [13] "model_name"      "algorithm"       "leaderboard"     "project"        
#> [17] "ignored"         "seed"            "h2o"             "plots"

Key components:

  • model: Best h2o model
  • metrics: Performance metrics
  • importance: Variable importance
  • datasets: Train/test data used
  • parameters: Configuration used
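These components can be inspected directly; a quick sketch using the model object trained above:

```r
# Test-set metrics (AUC, ACC, PRC, TPR, TNR) live in a nested data frame
model$metrics$metrics

# Variable importances, already sorted by relevance
head(model$importance)

# Per-observation test labels and predicted scores
head(model$scores_test)

# Configuration used for this run
model$parameters
```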

Model Performance

Metrics

View detailed metrics:

# All metrics
model$metrics
#> $dictionary
#> [1] "AUC: Area Under the Curve"                                                             
#> [2] "ACC: Accuracy"                                                                         
#> [3] "PRC: Precision = Positive Predictive Value"                                            
#> [4] "TPR: Sensitivity = Recall = Hit rate = True Positive Rate"                             
#> [5] "TNR: Specificity = Selectivity = True Negative Rate"                                   
#> [6] "Logloss (Error): Logarithmic loss [Neutral classification: 0.69315]"                   
#> [7] "Gain: When best n deciles selected, what % of the real target observations are picked?"
#> [8] "Lift: When best n deciles selected, how much better than random is?"                   
#> 
#> $confusion_matrix
#>        Pred
#> Real    FALSE TRUE
#>   FALSE    14  150
#>   TRUE     68   36
#> 
#> $gain_lift
#> # A tibble: 10 × 10
#>    percentile value random target total  gain optimal   lift response score
#>    <fct>      <chr>  <dbl>  <int> <int> <dbl>   <dbl>  <dbl>    <dbl> <dbl>
#>  1 1          TRUE    10.8     29    29  27.9    27.9 158.     27.9   90.0 
#>  2 2          TRUE    20.1     22    25  49.0    51.9 143.     21.2   78.1 
#>  3 3          TRUE    30.2     16    27  64.4    77.9 113.     15.4   51.6 
#>  4 4          TRUE    39.9     13    26  76.9   100    92.7    12.5   29.2 
#>  5 5          TRUE    50        6    27  82.7   100    65.4     5.77  20.7 
#>  6 6          TRUE    60.1      5    27  87.5   100    45.7     4.81  14.8 
#>  7 7          TRUE    69.8      6    26  93.3   100    33.7     5.77  12.3 
#>  8 8          TRUE    79.9      1    27  94.2   100    18.0     0.962  9.31
#>  9 9          TRUE    89.9      3    27  97.1   100     8.00    2.88   6.12
#> 10 10         TRUE   100        3    27 100     100     0       2.88   1.54
#> 
#> $metrics
#>       AUC     ACC     PRC     TPR      TNR
#> 1 0.86366 0.18657 0.19355 0.34615 0.085366
#> 
#> $cv_metrics
#> # A tibble: 20 × 8
#>    metric     mean     sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
#>    <chr>     <dbl>  <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#>  1 accuracy  0.855 0.0331      0.856     0.904       0.864      0.815      0.839
#>  2 auc       0.861 0.0424      0.876     0.924       0.860      0.815      0.831
#>  3 err       0.145 0.0331      0.144     0.096       0.136      0.185      0.161
#>  4 err_cou… 18     4.06       18        12          17         23         20    
#>  5 f0point5  0.835 0.0356      0.855     0.881       0.826      0.830      0.785
#>  6 f1        0.797 0.0342      0.812     0.838       0.809      0.777      0.75 
#>  7 f2        0.763 0.0369      0.774     0.799       0.793      0.730      0.718
#>  8 lift_to…  2.66  0.377       2.40      3.12        2.72       2.18       2.88 
#>  9 logloss   0.432 0.0758      0.413     0.324       0.425      0.530      0.468
#> 10 max_per…  0.259 0.0400      0.25      0.225       0.217      0.298      0.302
#> 11 mcc       0.690 0.0592      0.703     0.775       0.705      0.632      0.636
#> 12 mean_pe…  0.834 0.0278      0.841     0.870       0.847      0.806      0.806
#> 13 mean_pe…  0.166 0.0278      0.159     0.130       0.153      0.194      0.194
#> 14 mse       0.132 0.0245      0.126     0.0977      0.129      0.164      0.145
#> 15 pr_auc    0.832 0.0509      0.868     0.895       0.815      0.818      0.764
#> 16 precisi…  0.863 0.0399      0.886     0.912       0.837      0.870      0.811
#> 17 r2        0.436 0.0871      0.481     0.551       0.446      0.342      0.358
#> 18 recall    0.741 0.0400      0.75      0.775       0.783      0.702      0.698
#> 19 rmse      0.362 0.0342      0.355     0.313       0.359      0.404      0.381
#> 20 specifi…  0.926 0.0231      0.932     0.965       0.911      0.910      0.914
#> 
#> $max_metrics
#>                         metric  threshold       value idx
#> 1                       max f1 0.49534115   0.7775281 157
#> 2                       max f2 0.28977227   0.7987220 226
#> 3                 max f0point5 0.50884749   0.8172147 150
#> 4                 max accuracy 0.50884749   0.8410915 150
#> 5                max precision 0.98538253   1.0000000   0
#> 6                   max recall 0.04255203   1.0000000 386
#> 7              max specificity 0.98538253   1.0000000   0
#> 8             max absolute_mcc 0.50884749   0.6587759 150
#> 9   max min_per_class_accuracy 0.36328089   0.8025210 200
#> 10 max mean_per_class_accuracy 0.46113191   0.8194041 168
#> 11                     max tns 0.98538253 385.0000000   0
#> 12                     max fns 0.98538253 236.0000000   0
#> 13                     max fps 0.01285907 385.0000000 399
#> 14                     max tps 0.04255203 238.0000000 386
#> 15                     max tnr 0.98538253   1.0000000   0
#> 16                     max fnr 0.98538253   0.9915966   0
#> 17                     max fpr 0.01285907   1.0000000 399
#> 18                     max tpr 0.04255203   1.0000000 386

# Specific metrics are nested under $metrics
model$metrics$metrics$AUC
model$metrics$metrics$ACC

Confusion Matrix

# Confusion matrix plot
mplot_conf(
  tag = model$scores_test$tag,
  score = model$scores_test$score,
  subtitle = sprintf("AUC: %.3f", model$metrics$metrics$AUC)
)

ROC Curve

# ROC curve
mplot_roc(
  tag = model$scores_test$tag,
  score = model$scores_test$score
)

Gain and Lift Charts

# Gain and Lift charts for binary classification
mplot_gain(
  tag = model$scores_test$tag,
  score = model$scores_test$score
)

Variable Importance

See which features matter most:

# Variable importance dataframe
head(model$importance, 15)
#>   variable relative_importance scaled_importance importance
#> 1      Sex           202.18417        1.00000000 0.40811816
#> 2     Fare           102.86121        0.50875008 0.20763014
#> 3      Age            80.14220        0.39638218 0.16177077
#> 4   Pclass            72.13468        0.35677709 0.14560721
#> 5    SibSp            15.87309        0.07850806 0.03204057
#> 6    Parch            12.75075        0.06306504 0.02573799
#> 7 Embarked             9.45986        0.04678833 0.01909517

# Plot top 15 important variables
top15 <- head(model$importance, 15)
mplot_importance(
  var = top15$variable,
  imp = top15$importance
)

Model Interpretation with SHAP

SHAP values explain individual predictions:

# Calculate SHAP values (computationally expensive)
shap <- h2o_shap(model)

# Plot SHAP summary
plot(shap)

Advanced: Customizing AutoML

Preprocessing Options

model <- h2o_automl(
  df = dft,
  y = "Survived",
  # Ignore specific columns
  ignore = c("Ticket", "Cabin", "PassengerId"),
  # Use only specific algorithms (exclude_algos also available)
  include_algos = c("GBM", "DRF"), # Gradient Boosting & Random Forest
  # Data split
  split = 0.7,
  # Handle imbalanced data
  balance = TRUE,
  # Remove outliers (Z-score > 3)
  no_outliers = TRUE,
  # Impute missing values (requires mice package if TRUE)
  impute = FALSE,
  # Keep only unique training rows
  unique_train = TRUE,
  # Reproducible results
  seed = 123
)
#> # A tibble: 2 × 5
#>   tag       n     p order  pcum
#>   <lgl> <int> <dbl> <int> <dbl>
#> 1 FALSE   549  61.6     1  61.6
#> 2 TRUE    342  38.4     2 100
#> train_size  test_size 
#>        623        268
#>                        model_id       auc   logloss     aucpr
#> 1 GBM_2_AutoML_2_20260423_85401 0.8596583 0.4248255 0.8431084
#> 2 DRF_1_AutoML_2_20260423_85401 0.8564385 0.4488829 0.8421588
#> 3 GBM_1_AutoML_2_20260423_85401 0.8328975 0.4839880 0.8085889
#>   mean_per_class_error      rmse       mse
#> 1            0.1960698 0.3625182 0.1314194
#> 2            0.1961569 0.3699173 0.1368388
#> 3            0.2342388 0.3942838 0.1554597
#> Model (1/3): GBM_2_AutoML_2_20260423_85401
#> Dependent Variable: Survived
#> Type: Classification (2 classes)
#> Algorithm: GBM
#> Split: 70% training data (of 891 observations)
#> Seed: 123
#> 
#> Test metrics:
#>    AUC = 0.87879
#>    ACC = 0.86567
#>    PRC = 0.88506
#>    TPR = 0.74757
#>    TNR = 0.93939
#> 
#> Most important variables:
#>    Sex (37.5%)
#>    Fare (22.7%)
#>    Age (17.3%)
#>    Pclass (12.2%)
#>    Embarked (4.1%)

Multi-Class Classification

Predict passenger class (3 categories):

model_multiclass <- h2o_automl(
  df = dft,
  y = "Pclass",
  ignore = c("Cabin", "PassengerId"),
  max_models = 10,
  max_time = 60
)
#> # A tibble: 3 × 5
#>   tag       n     p order  pcum
#>   <fct> <int> <dbl> <int> <dbl>
#> 1 n_3     491  55.1     1  55.1
#> 2 n_1     216  24.2     2  79.4
#> 3 n_2     184  20.6     3 100
#> train_size  test_size 
#>        623        268
#>                            model_id mean_per_class_error   logloss      rmse
#> 1 XGBoost_3_AutoML_3_20260423_85406            0.0975638 0.1843648 0.2331297
#> 2 XGBoost_2_AutoML_3_20260423_85406            0.1134454 0.2204648 0.2579203
#> 3 XGBoost_1_AutoML_3_20260423_85406            0.1191367 0.2584227 0.2761215
#>          mse
#> 1 0.05434945
#> 2 0.06652287
#> 3 0.07624310
#> Model (1/10): XGBoost_3_AutoML_3_20260423_85406
#> Dependent Variable: Pclass
#> Type: Classification (3 classes)
#> Algorithm: XGBOOST
#> Split: 70% training data (of 891 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    AUC = 0.98236
#>    ACC = 0.9291
#> 
#> Most important variables:
#>    Fare (66%)
#>    Age (14.8%)
#>    SibSp (7.9%)
#>    Parch (4.4%)
#>    Survived.FALSE (2.9%)

# Multi-class metrics
model_multiclass$metrics
#> $dictionary
#> [1] "AUC: Area Under the Curve"                                                             
#> [2] "ACC: Accuracy"                                                                         
#> [3] "PRC: Precision = Positive Predictive Value"                                            
#> [4] "TPR: Sensitivity = Recall = Hit rate = True Positive Rate"                             
#> [5] "TNR: Specificity = Selectivity = True Negative Rate"                                   
#> [6] "Logloss (Error): Logarithmic loss [Neutral classification: 0.69315]"                   
#> [7] "Gain: When best n deciles selected, what % of the real target observations are picked?"
#> [8] "Lift: When best n deciles selected, how much better than random is?"                   
#> 
#> $confusion_matrix
#> # A tibble: 3 × 4
#>   `Real x Pred`   n_3   n_1   n_2
#>   <fct>         <int> <int> <int>
#> 1 n_3             136     3     3
#> 2 n_1               1    60     2
#> 3 n_2               4     6    53
#> 
#> $metrics
#>       AUC    ACC
#> 1 0.98236 0.9291
#> 
#> $metrics_tags
#> # A tibble: 3 × 9
#>   tag       n     p   AUC order   ACC   PRC   TPR   TNR
#>   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 n_3     142  53.0 0.985     1 0.959 0.965 0.958 0.960
#> 2 n_1      63  23.5 0.983     2 0.955 0.870 0.952 0.956
#> 3 n_2      63  23.5 0.979     3 0.944 0.914 0.841 0.976
#> 
#> $cv_metrics
#> # A tibble: 12 × 8
#>    metric     mean     sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
#>    <chr>     <dbl>  <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#>  1 accur…   0.926  0.0272     0.936      0.944      0.944       0.879     0.927 
#>  2 auc    NaN      0        NaN        NaN        NaN         NaN       NaN     
#>  3 err      0.0739 0.0272     0.064      0.056      0.056       0.121     0.0726
#>  4 err_c…   9.2    3.35       8          7          7          15         9     
#>  5 loglo…   0.185  0.0881     0.162      0.186      0.113       0.334     0.128 
#>  6 max_p…   0.190  0.0785     0.167      0.125      0.136       0.32      0.2   
#>  7 mean_…   0.903  0.0429     0.922      0.926      0.936       0.830     0.900 
#>  8 mean_…   0.0972 0.0429     0.0777     0.0739     0.0640      0.170     0.100 
#>  9 mse      0.0544 0.0267     0.0436     0.0511     0.0357      0.101     0.0405
#> 10 pr_auc NaN      0        NaN        NaN        NaN         NaN       NaN     
#> 11 r2       0.923  0.0378     0.933      0.926      0.950       0.857     0.947 
#> 12 rmse     0.229  0.0517     0.209      0.226      0.189       0.318     0.201 
#> 
#> $hit_ratio
#>   k hit_ratio
#> 1 1 0.9261637
#> 2 2 0.9903692
#> 3 3 1.0000000

# Confusion matrix for multi-class
mplot_conf(
  tag = model_multiclass$scores_test$tag,
  score = model_multiclass$scores_test$score
)

Regression Example

Predict fare prices:

model_regression <- h2o_automl(
  df = dft,
  y = "Fare",
  ignore = c("Cabin", "PassengerId"),
  max_models = 10,
  exclude_algos = NULL
)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    0.00    7.91   14.45   32.20   31.00  512.33
#> train_size  test_size 
#>        609        262
#>                                                 model_id     rmse      mse
#> 1 StackedEnsemble_BestOfFamily_1_AutoML_4_20260423_85416 10.38136 107.7726
#> 2    StackedEnsemble_AllModels_1_AutoML_4_20260423_85416 10.55894 111.4913
#> 3                          GBM_3_AutoML_4_20260423_85416 12.44341 154.8385
#>        mae     rmsle mean_residual_deviance
#> 1 5.533338 0.4535433               107.7726
#> 2 5.719281 0.4555461               111.4913
#> 3 5.769395 0.4650435               154.8385
#> Model (1/12): StackedEnsemble_BestOfFamily_1_AutoML_4_20260423_85416
#> Dependent Variable: Fare
#> Type: Regression
#> Algorithm: STACKEDENSEMBLE
#> Split: 70% training data (of 871 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    rmse = 6.6239
#>    mae = 4.143
#>    mape = 0.012637
#>    mse = 43.876
#>    rsq = 0.9391
#>    rsqa = 0.9389

# Regression metrics
model_regression$metrics
#> $dictionary
#> [1] "RMSE: Root Mean Squared Error"       
#> [2] "MAE: Mean Average Error"             
#> [3] "MAPE: Mean Absolute Percentage Error"
#> [4] "MSE: Mean Squared Error"             
#> [5] "RSQ: R Squared"                      
#> [6] "RSQA: Adjusted R Squared"            
#> 
#> $metrics
#>       rmse      mae       mape      mse    rsq   rsqa
#> 1 6.623908 4.143029 0.01263738 43.87615 0.9391 0.9389
#> 
#> $cv_metrics
#> # A tibble: 8 × 8
#>   metric     mean      sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
#>   <chr>     <dbl>   <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#> 1 mae     5.57e+0 6.50e-1      5.34       4.79       5.52       6.58       5.63 
#> 2 mean_r… 1.08e+2 4.86e+1     74.7       78.3      110.       192.        86.4  
#> 3 mse     1.08e+2 4.86e+1     74.7       78.3      110.       192.        86.4  
#> 4 null_d… 1.19e+5 2.29e+4  89084.    111869.    147015.    109745.    135751.   
#> 5 r2      8.85e-1 5.72e-2      0.891      0.908      0.914      0.785      0.926
#> 6 residu… 1.31e+4 5.85e+3   9415.     10332.     12509.     23363.      9934.   
#> 7 rmse    1.02e+1 2.14e+0      8.64       8.85      10.5       13.8        9.29 
#> 8 rmsle   4.46e-1 1.21e-1      0.456      0.270      0.424      0.472      0.608

Using Pre-Split Data

If you have predefined train/test splits:

# Create splits
splits <- msplit(dft, size = 0.8, seed = 123)
#> train_size  test_size 
#>        712        179
splits$train$split <- "train"
splits$test$split <- "test"

# Combine
df_split <- rbind(splits$train, splits$test)

# Train using split column
model <- h2o_automl(
  df = df_split,
  y = "Survived",
  train_test = "split",
  max_models = 5
)
#> # A tibble: 2 × 5
#>   tag       n     p order  pcum
#>   <lgl> <int> <dbl> <int> <dbl>
#> 1 FALSE   549  61.6     1  61.6
#> 2 TRUE    342  38.4     2 100
#> 
#>  test train 
#>   179   712
#>                            model_id       auc   logloss     aucpr
#> 1     DRF_1_AutoML_5_20260423_85425 0.8680875 0.7855203 0.8270861
#> 2     GLM_1_AutoML_5_20260423_85425 0.8654726 0.4253319 0.8491966
#> 3 XGBoost_2_AutoML_5_20260423_85425 0.8537248 0.4484009 0.8055514
#>   mean_per_class_error      rmse       mse
#> 1            0.1775527 0.3813365 0.1454175
#> 2            0.1923547 0.3652137 0.1333811
#> 3            0.2039812 0.3752972 0.1408480
#> Model (1/5): DRF_1_AutoML_5_20260423_85425
#> Dependent Variable: Survived
#> Type: Classification (2 classes)
#> Algorithm: DRF
#> Split: 80% training data (of 891 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    AUC = 0.85792
#>    ACC = 0.78212
#>    PRC = 0.84783
#>    TPR = 0.5493
#>    TNR = 0.93519
#> 
#> Most important variables:
#>    Ticket (65.7%)
#>    Sex (14.9%)
#>    Cabin (8.7%)
#>    Pclass (3.4%)
#>    Fare (2.7%)

Making Predictions

On New Data

# New data (same structure as training)
new_data <- dft[1:10, ]

# Predict
predictions <- h2o_predict_model(new_data, model$model)
head(predictions)
#>   predict     FALSE.        TRUE.
#> 1   FALSE 0.99979242 0.0002075763
#> 2    TRUE 0.02148936 0.9785106383
#> 3    TRUE 0.12765957 0.8723404255
#> 4    TRUE 0.09574468 0.9042553191
#> 5   FALSE 0.99979242 0.0002075763
#> 6   FALSE 0.97851583 0.0214841721


Model Comparison

Full Visualization Suite

# Complete model evaluation plots
mplot_full(
  tag = model$scores_test$tag,
  score = model$scores_test$score,
  subtitle = model$model@algorithm
)

Metrics Comparison

# Model performance over trees
mplot_metrics(model)

Saving and Loading Models

Export Results

# Save model and plots
export_results(model, subdir = "models", thresh = 0.5)

This creates:

  • Model file (.rds)
  • MOJO file (for production)
  • Performance plots
  • Metrics summary

Load Saved Model

# Load model
loaded_model <- readRDS("models/Titanic_Model/Titanic_Model.rds")

# Make predictions with MOJO (production-ready)
predictions <- h2o_predict_MOJO(
  model_path = "models/Titanic_Model",
  df = dft[1:10, ]
)

Best Practices

1. Start Simple

# Quick prototype
model <- h2o_automl(dft, "Survived", max_models = 3, max_time = 30)
#> # A tibble: 2 × 5
#>   tag       n     p order  pcum
#>   <lgl> <int> <dbl> <int> <dbl>
#> 1 FALSE   549  61.6     1  61.6
#> 2 TRUE    342  38.4     2 100
#> train_size  test_size 
#>        623        268
#>                            model_id       auc   logloss     aucpr
#> 1     GLM_1_AutoML_6_20260423_85436 0.8566401 0.4331878 0.8468753
#> 2 XGBoost_1_AutoML_6_20260423_85436 0.8400780 0.4574884 0.8099752
#> 3     GBM_1_AutoML_6_20260423_85436 0.8159377 0.6451460 0.7378534
#>   mean_per_class_error      rmse       mse
#> 1            0.1914171 0.3680368 0.1354511
#> 2            0.2138740 0.3789387 0.1435946
#> 3            0.2218336 0.4732407 0.2239567
#> Model (1/3): GLM_1_AutoML_6_20260423_85436
#> Dependent Variable: Survived
#> Type: Classification (2 classes)
#> Algorithm: GLM
#> Split: 70% training data (of 891 observations)
#> Seed: 0
#> 
#> Test metrics:
#>    AUC = 0.87979
#>    ACC = 0.79851
#>    PRC = 0.90164
#>    TPR = 0.53398
#>    TNR = 0.96364
#> 
#> Most important variables:
#>    Ticket.1601 (0.9%)
#>    Ticket.2661 (0.9%)
#>    Ticket.C.A. 37671 (0.8%)
#>    Cabin.C22 C26 (0.8%)
#>    Sex.female (0.7%)

2. Iterate and Refine

# Refine based on results
model <- h2o_automl(
  dft, "Survived",
  max_models = 20,
  no_outliers = TRUE,
  balance = TRUE,
  ignore = c("PassengerId", "Name", "Ticket", "Cabin"),
  model_name = "Titanic_Model"
)
#> # A tibble: 2 × 5
#>   tag       n     p order  pcum
#>   <lgl> <int> <dbl> <int> <dbl>
#> 1 FALSE   549  61.6     1  61.6
#> 2 TRUE    342  38.4     2 100
#> train_size  test_size 
#>        623        268
#>                        model_id       auc   logloss     aucpr
#> 1 GBM_3_AutoML_7_20260423_85441 0.8575063 0.4316748 0.8410436
#> 2 GBM_2_AutoML_7_20260423_85441 0.8571250 0.4270731 0.8442881
#> 3 GBM_4_AutoML_7_20260423_85441 0.8561498 0.4266892 0.8469913
#>   mean_per_class_error      rmse       mse
#> 1            0.1887694 0.3656777 0.1337202
#> 2            0.1944735 0.3641046 0.1325721
#> 3            0.1909050 0.3631896 0.1319067

3. Validate Thoroughly

# Check multiple metrics
model$metrics
#> $dictionary
#> [1] "AUC: Area Under the Curve"                                                             
#> [2] "ACC: Accuracy"                                                                         
#> [3] "PRC: Precision = Positive Predictive Value"                                            
#> [4] "TPR: Sensitivity = Recall = Hit rate = True Positive Rate"                             
#> [5] "TNR: Specificity = Selectivity = True Negative Rate"                                   
#> [6] "Logloss (Error): Logarithmic loss [Neutral classification: 0.69315]"                   
#> [7] "Gain: When best n deciles selected, what % of the real target observations are picked?"
#> [8] "Lift: When best n deciles selected, how much better than random is?"                   
#> 
#> $confusion_matrix
#>        Pred
#> Real    FALSE TRUE
#>   FALSE   156    9
#>   TRUE     28   75
#> 
#> $gain_lift
#> # A tibble: 10 × 10
#>    percentile value random target total  gain optimal  lift response score
#>    <fct>      <fct>  <dbl>  <int> <int> <dbl>   <dbl> <dbl>    <dbl> <dbl>
#>  1 1          FALSE   10.1     25    27  15.2    16.4  50.4   15.2   93.5 
#>  2 2          FALSE   20.1     24    27  29.7    32.7  47.4   14.5   92.2 
#>  3 3          FALSE   30.2     26    27  45.5    49.1  50.4   15.8   89.0 
#>  4 4          FALSE   39.9     24    26  60      64.8  50.3   14.5   86.1 
#>  5 5          FALSE   50       23    27  73.9    81.2  47.9   13.9   81.1 
#>  6 6          FALSE   60.1     17    27  84.2    97.6  40.2   10.3   71.1 
#>  7 7          FALSE   69.8     18    26  95.2   100    36.4   10.9   45.7 
#>  8 8          FALSE   79.9      3    27  97.0   100    21.4    1.82  22.6 
#>  9 9          FALSE   89.9      4    27  99.4   100    10.5    2.42   9.55
#> 10 10         FALSE  100        1    27 100     100     0      0.606  1.49
#> 
#> $metrics
#>       AUC     ACC     PRC     TPR     TNR
#> 1 0.89147 0.86194 0.89286 0.72816 0.94545
#> 
#> $cv_metrics
#> # A tibble: 20 × 8
#>    metric     mean     sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
#>    <chr>     <dbl>  <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#>  1 accuracy  0.836 0.0462      0.84      0.896       0.8        0.782      0.863
#>  2 auc       0.847 0.0721      0.847     0.927       0.854      0.731      0.876
#>  3 err       0.164 0.0462      0.16      0.104       0.2        0.218      0.137
#>  4 err_cou… 20.4   5.73       20        13          25         27         17    
#>  5 f0point5  0.783 0.0914      0.788     0.913       0.743      0.663      0.808
#>  6 f1        0.781 0.0793      0.778     0.876       0.779      0.658      0.813
#>  7 f2        0.780 0.0759      0.768     0.842       0.818      0.653      0.819
#>  8 lift_to…  2.64  0.337       2.72      2.23        2.40       3.1        2.76 
#>  9 logloss   0.432 0.0828      0.452     0.326       0.460      0.543      0.379
#> 10 max_per…  0.236 0.0702      0.239     0.179       0.233      0.35       0.178
#> 11 mcc       0.651 0.110       0.653     0.792       0.605      0.499      0.705
#> 12 mean_pe…  0.824 0.0531      0.823     0.889       0.807      0.748      0.854
#> 13 mean_pe…  0.176 0.0531      0.177     0.111       0.193      0.252      0.146
#> 14 mse       0.134 0.0294      0.139     0.0966      0.145      0.173      0.115
#> 15 pr_auc    0.822 0.0987      0.821     0.937       0.814      0.669      0.870
#> 16 precisi…  0.785 0.103       0.795     0.939       0.721      0.667      0.804
#> 17 r2        0.425 0.149       0.403     0.610       0.402      0.207      0.503
#> 18 recall    0.780 0.0793      0.761     0.821       0.846      0.65       0.822
#> 19 rmse      0.364 0.0405      0.372     0.311       0.381      0.416      0.339
#> 20 specifi…  0.868 0.0693      0.886     0.957       0.767      0.845      0.886
#> 
#> $max_metrics
#>                         metric  threshold       value idx
#> 1                       max f1 0.41714863   0.7672956 181
#> 2                       max f2 0.28931884   0.7898957 224
#> 3                 max f0point5 0.66135190   0.8173619 120
#> 4                 max accuracy 0.61537557   0.8250401 130
#> 5                max precision 0.99256302   1.0000000   0
#> 6                   max recall 0.03108045   1.0000000 391
#> 7              max specificity 0.99256302   1.0000000   0
#> 8             max absolute_mcc 0.61537557   0.6277286 130
#> 9   max min_per_class_accuracy 0.35070798   0.7907950 205
#> 10 max mean_per_class_accuracy 0.41714863   0.8112306 181
#> 11                     max tns 0.99256302 384.0000000   0
#> 12                     max fns 0.99256302 238.0000000   0
#> 13                     max fps 0.01019947 384.0000000 399
#> 14                     max tps 0.03108045 239.0000000 391
#> 15                     max tnr 0.99256302   1.0000000   0
#> 16                     max fnr 0.99256302   0.9958159   0
#> 17                     max fpr 0.01019947   1.0000000 399
#> 18                     max tpr 0.03108045   1.0000000 391
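
The headline values in `$metrics` can be reproduced by hand from the confusion matrix printed above, which is a quick sanity check that you are reading the matrix orientation correctly (rows = Real, columns = Pred). A minimal base-R sketch, with the counts copied from the output:

```r
# Counts from $confusion_matrix above
tn <- 156; fp <- 9   # Real FALSE row
fn <- 28;  tp <- 75  # Real TRUE row

acc <- (tp + tn) / (tp + tn + fp + fn)  # accuracy
prc <- tp / (tp + fp)                   # precision (PPV)
tpr <- tp / (tp + fn)                   # sensitivity / recall
tnr <- tn / (tn + fp)                   # specificity

round(c(ACC = acc, PRC = prc, TPR = tpr, TNR = tnr), 5)
#>     ACC     PRC     TPR     TNR 
#> 0.86194 0.89286 0.72816 0.94545
```

These match the ACC, PRC, TPR, and TNR values reported in `$metrics` exactly.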

# Visual inspection
mplot_full(
  tag = model$scores_test$tag,
  score = model$scores_test$score
)


# Variable importance
mplot_importance(
  var = model$importance$variable,
  imp = model$importance$importance
)

Score Distribution

# Density plot
mplot_density(
  tag = model$scores_test$tag,
  score = model$scores_test$score
)

4. Document Your Process

# Save everything
export_results(model, subdir = "my_project", thresh = 0.5)

Troubleshooting

h2o Initialization Issues

# Manually initialize h2o with more memory
h2o::h2o.init(max_mem_size = "8G", nthreads = -1)
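
Note that if a cluster is already running on the target address, `h2o.init()` connects to it instead of starting a new JVM. A sketch assuming the default localhost port:

```r
library(h2o)

# Connects to an existing cluster at 127.0.0.1:54321 if one is up;
# otherwise starts a fresh JVM with the requested resources
h2o.init(ip = "127.0.0.1", port = 54321, max_mem_size = "8G", nthreads = -1)
```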

Clean h2o Environment

# Remove all models
h2o::h2o.removeAll()

# Shutdown h2o
h2o::h2o.shutdown(prompt = FALSE)
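
Before re-initializing after a shutdown, you can check whether the JVM is still reachable with `h2o.clusterIsUp()`. A hedged sketch; the `tryCatch()` wrapper is needed because the call errors when no connection object has ever been created in the session:

```r
library(h2o)

# TRUE while the cluster JVM is reachable; errors if never connected,
# hence the tryCatch wrapper
cluster_up <- tryCatch(h2o.clusterIsUp(), error = function(e) FALSE)
if (!cluster_up) h2o.init(nthreads = -1, max_mem_size = "2G")
```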

Check h2o Flow UI

# Open h2o's web interface
# Navigate to: http://localhost:54321/flow/index.html
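
The Flow UI can also be opened directly from the R session with base R's `browseURL()`. The port should match whatever `h2o.init()` reported on startup (54321 by default):

```r
# Open h2o Flow in the system's default browser
utils::browseURL("http://localhost:54321/flow/index.html")
```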

Further Reading

Package & ML Resources

Next Steps
