modeltuning


The goal of modeltuning is to provide common model selection and tuning utilities in an intuitive manner. modeltuning is built on top of the future package, so cross validation and grid search can be parallelized with any backend that future supports.

Installation

You can install the development version of modeltuning with:

# install.packages("pak")
pak::pkg_install("dmolitor/modeltuning")

Usage

The following simple examples use the built-in iris dataset to illustrate the basic functionality of modeltuning.

Cross Validation

We’ll train a decision tree for binary classification, predicting whether each flower in iris is of species virginica, and specify a 3-fold cross validation scheme stratified by Species to estimate the model’s true error rate.

First, let’s split our data into a train and test set.

library(future)
library(modeltuning)
library(rpart)
library(rsample)
library(yardstick)

# Shuffle the rows and recode Species as a binary factor (TRUE = virginica)
iris_new <- iris[sample(1:nrow(iris), nrow(iris)), ]
iris_new$Species <- factor(iris_new$Species == "virginica")
# Hold out the last 50 rows as a test set
iris_train <- iris_new[1:100, ]
iris_test <- iris_new[101:150, ]
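
Stratifying the folds by Species keeps the TRUE/FALSE ratio roughly constant in every fold. A quick way to inspect the overall class balance (exact counts depend on the shuffle above):

# Roughly one third of all iris rows are virginica (TRUE)
table(iris_train$Species)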

Next, we’ll define a function to generate cross validation splits.

# Return a list of analysis-set row indices, one element per fold
splitter <- function(data, ...) lapply(vfold_cv(data, ...)$splits, \(.x) .x$in_id)
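
A custom splitter just needs to return a list of integer vectors, one per fold, giving the training (analysis) rows for that fold; that is what in_id stores in each rsample split. A minimal check, using the splitter above:

# Each element holds the analysis-set row indices for one fold
folds <- splitter(iris_train, v = 3, strata = Species)
lengths(folds)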

Now, let’s specify and fit the 3-fold cross validation scheme, computing F-measure, accuracy, and ROC AUC as our hold-out set evaluation metrics.

# Specify cross validation schema
iris_cv <- CV$new(
  learner = rpart,
  learner_args = list(method = "class"),
  splitter = splitter,
  splitter_args = list(v = 3, strata = Species),
  scorer = list(
    f_meas = f_meas_vec,
    accuracy = accuracy_vec,
    auc = roc_auc_vec
  ), 
  prediction_args = list(
    f_meas = list(type = "class"),
    accuracy = list(type = "class"), 
    auc = list(type = "prob")
  ),
  convert_predictions = list(
    f_meas = NULL,
    accuracy = NULL,
    auc = function(.x) .x[, "FALSE"]
  )
)

# Fit cross validated model
iris_cv_fitted <- iris_cv$fit(formula = Species ~ ., data = iris_train)

Now, let’s check our evaluation metrics averaged across folds.

iris_cv_fitted$mean_metrics
#> $f_meas
#> [1] 0.9492091
#> 
#> $accuracy
#> [1] 0.9333173
#> 
#> $auc
#> [1] 0.9304813
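
Conceptually, each scorer receives a fold’s hold-out truth and a prediction generated with the matching prediction_args entry, after any convert_predictions function is applied. The sketch below illustrates that contract, with a plain rpart fit on the training set standing in for a single fold:

# Illustration of the scorer contract, using the test set as the hold-out
fit <- rpart(Species ~ ., data = iris_train, method = "class")
# prediction_args: type = "prob" yields a matrix of class probabilities
probs <- predict(fit, newdata = iris_test, type = "prob")
# convert_predictions: keep the "FALSE" column, since yardstick treats the
# first factor level as the event level by default
roc_auc_vec(iris_test$Species, probs[, "FALSE"])
# type = "class" already yields a factor, so accuracy needs no conversion
preds <- predict(fit, newdata = iris_test, type = "class")
accuracy_vec(iris_test$Species, preds)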

Grid Search

Another common model-tuning method is grid search. We’ll use it to tune the minsplit and maxdepth parameters of our decision tree, choosing the hyperparameters that maximize ROC AUC on the hold-out test set.

# Specify Grid Search schema
iris_grid <- GridSearch$new(
  learner = rpart,
  learner_args = list(method = "class"),
  tune_params = list(
    minsplit = seq(10, 30, by = 5),
    maxdepth = seq(20, 30, by = 2)
  ),
  evaluation_data = list(x = iris_test, y = iris_test$Species),
  scorer = list(
    accuracy = accuracy_vec,
    auc = roc_auc_vec
  ),
  optimize_score = "max",
  prediction_args = list(
    accuracy = list(type = "class"),
    auc = list(type = "prob")
  ),
  convert_predictions = list(
    accuracy = NULL,
    auc = function(i) i[, "FALSE"]
  )
)

# Fit models across grid
iris_grid_fitted <- iris_grid$fit(
  formula = Species ~ .,
  data = iris_train
)
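
Grid search evaluates every combination of the supplied parameter values, so the grid above yields 5 × 6 = 30 candidate models; the equivalent expansion in base R:

# Every candidate combination in the grid
param_grid <- expand.grid(
  minsplit = seq(10, 30, by = 5),
  maxdepth = seq(20, 30, by = 2)
)
nrow(param_grid)
#> [1] 30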

Let’s check out the optimal decision tree hyperparameters.

iris_grid_fitted$best_params
#> $minsplit
#> [1] 10
#> 
#> $maxdepth
#> [1] 20
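
Since best_params is a plain named list, one way to use it is to refit a final tree directly with rpart (which forwards minsplit and maxdepth to rpart.control) and score it on the test set; a minimal sketch:

# Refit with the selected hyperparameters and evaluate on the hold-out set
best_fit <- do.call(
  rpart,
  c(list(formula = Species ~ ., data = iris_train, method = "class"),
    iris_grid_fitted$best_params)
)
accuracy_vec(iris_test$Species, predict(best_fit, iris_test, type = "class"))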

Grid Search with Cross Validation

Finally, modeltuning supports grid search that uses cross validation, rather than a hold-out validation set, to estimate each model’s true error rate. We’ll tune the same parameters as above.

# Specify Grid Search schema with cross validation
iris_grid_cv <- GridSearchCV$new(
  learner = rpart,
  learner_args = list(method = "class"),
  tune_params = list(
    minsplit = seq(10, 30, by = 5),
    maxdepth = seq(20, 30, by = 2)
  ),
  splitter = splitter,
  splitter_args = list(v = 3, strata = Species),
  scorer = list(
    accuracy = accuracy_vec,
    auc = roc_auc_vec
  ),
  optimize_score = "max",
  prediction_args = list(
    accuracy = list(type = "class"),
    auc = list(type = "prob")
  ),
  convert_predictions = list(
    accuracy = NULL,
    auc = function(i) i[, "FALSE"]
  )
)

# Fit models across grid
iris_grid_cv_fitted <- iris_grid_cv$fit(
  formula = Species ~ .,
  data = iris_train
)

Let’s check out the optimal decision tree hyperparameters

iris_grid_cv_fitted$best_params
#> $minsplit
#> [1] 10
#> 
#> $maxdepth
#> [1] 28

as well as the cross validation ROC AUC for those parameters

iris_grid_cv_fitted$best_metric
#> [1] 0.9555556

Parallelization

As noted above, modeltuning is built on top of the future package and can use any parallelization backend that future supports when fitting cross-validated models or tuning models with grid search. The code below evaluates the same cross-validated binary classification model using local parallelization.

plan(multisession)

# Fit cross validation model
iris_cv_fitted <- iris_cv$fit(formula = Species ~ ., data = iris_train)

plan(sequential)

# Model performance metrics
iris_cv_fitted$mean_metrics
#> $f_meas
#> [1] 0.9564668
#> 
#> $accuracy
#> [1] 0.939951
#> 
#> $auc
#> [1] 0.9480072

And voila!
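
The same pattern works with any other future backend; for example (worker counts and host names are illustrative):

# A fixed number of local background R sessions
plan(multisession, workers = 4)
# Or distribute fits across machines reachable over SSH
plan(cluster, workers = c("node1", "node2"))
# Return to sequential execution when done
plan(sequential)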

Examples

For a bunch of worked examples with common ML frameworks, check out the /examples directory!
