
The goal of modeltuning is to provide common model selection and tuning utilities in an intuitive manner. Additionally, modeltuning aims to be:

- compatible with Matrix sparse matrices
- built on top of the future package, and compatible with any of the (many!) available parallelization backends

You can install the development version of modeltuning with:
# install.packages("pak")
pak::pkg_install("dmolitor/modeltuning")

These are simple examples that use the built-in iris
data-set to illustrate the basic functionality of modeltuning.
First, we'll train a binary classification decision tree to predict
whether the flowers in iris are of the species
virginica, and we'll specify a 3-fold cross validation
scheme, stratified by Species, to estimate our model's true error
rate.
To start, let's split our data into a train and test set.
library(future)
library(modeltuning)
library(rpart)
library(rsample)
library(yardstick)
iris_new <- iris[sample(1:nrow(iris), nrow(iris)), ]
iris_new$Species <- factor(iris_new$Species == "virginica")
iris_train <- iris_new[1:100, ]
iris_test <- iris_new[101:150, ]
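Because the cross validation folds will be stratified by Species, it can be worth a quick look at the class balance in each split. This is plain base R, nothing modeltuning-specific:

# How many virginica (TRUE) vs. non-virginica (FALSE) flowers landed in each split?
table(iris_train$Species)
table(iris_test$Species)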
Next, we'll define a function to generate cross validation splits. The splitter takes a data frame, builds stratified folds with rsample::vfold_cv(), and returns a list with one element per fold containing that fold's in-sample (analysis) row indices.

splitter <- function(data, ...) lapply(vfold_cv(data, ...)$splits, \(.x) .x$in_id)

Now, let's specify and fit a 3-fold cross validation scheme and calculate the F-Measure, Accuracy, and ROC AUC as our hold-out set evaluation metrics.
# Specify cross validation schema
iris_cv <- CV$new(
  learner = rpart,
  learner_args = list(method = "class"),
  splitter = splitter,
  splitter_args = list(v = 3, strata = "Species"),
  scorer = list(
    f_meas = f_meas_vec,
    accuracy = accuracy_vec,
    auc = roc_auc_vec
  ), 
  prediction_args = list(
    f_meas = list(type = "class"),
    accuracy = list(type = "class"), 
    auc = list(type = "prob")
  ),
  convert_predictions = list(
    f_meas = NULL,
    accuracy = NULL,
    auc = function(.x) .x[, "FALSE"]
  )
)
# Fit cross validated model
iris_cv_fitted <- iris_cv$fit(formula = Species ~ ., data = iris_new)

Now, let's check our evaluation metrics averaged across folds.
iris_cv_fitted$mean_metrics
#> $f_meas
#> [1] 0.9492091
#> 
#> $accuracy
#> [1] 0.9333173
#> 
#> $auc
#> [1] 0.9304813
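The scorer functions above are yardstick *_vec metrics that take a truth factor and an estimate; the prediction_args and convert_predictions entries appear to control how predictions are generated and reshaped before scoring. As a rough, hand-written illustration of that flow for the auc metric on a single fold (a sketch using rpart and yardstick directly, not modeltuning internals):

# Fit a single tree on the training rows and score the hold-out rows by hand
fold_fit <- rpart(Species ~ ., data = iris_train, method = "class")

# type = "prob" (as in prediction_args) returns a two-column probability matrix
fold_probs <- predict(fold_fit, newdata = iris_test, type = "prob")

# Keep the probability of the first factor level ("FALSE"), as in
# convert_predictions above; roc_auc_vec() treats the first level as the event
roc_auc_vec(truth = iris_test$Species, estimate = fold_probs[, "FALSE"])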
Another common model-tuning method is grid search. We'll use it to
tune the minsplit and maxdepth parameters of
our decision tree. We will choose our optimal hyperparameters as those
that maximize the ROC AUC on the validation set.
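To make the size of the search concrete, the grid simply crosses the two parameter sequences used in the next chunk. This is plain base R bookkeeping, not part of the modeltuning API:

# 5 minsplit values x 6 maxdepth values = 30 candidate models
param_grid <- expand.grid(
  minsplit = seq(10, 30, by = 5),
  maxdepth = seq(20, 30, by = 2)
)
nrow(param_grid)
#> [1] 30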
# Specify Grid Search schema
iris_grid <- GridSearch$new(
  learner = rpart,
  learner_args = list(method = "class"),
  tune_params = list(
    minsplit = seq(10, 30, by = 5),
    maxdepth = seq(20, 30, by = 2)
  ),
  evaluation_data = list(x = iris_test, y = iris_test$Species),
  scorer = list(
    accuracy = accuracy_vec,
    auc = roc_auc_vec
  ),
  optimize_score = "max",
  prediction_args = list(
    accuracy = list(type = "class"),
    auc = list(type = "prob")
  ),
  convert_predictions = list(
    accuracy = NULL,
    auc = function(i) i[, "FALSE"]
  )
)
# Fit models across grid
iris_grid_fitted <- iris_grid$fit(
  formula = Species ~ .,
  data = iris_train
)

Let's check out the optimal decision tree hyperparameters.
iris_grid_fitted$best_params
#> $minsplit
#> [1] 10
#> 
#> $maxdepth
#> [1] 20
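One possible follow-up is to refit a single tree on all the training data with the selected values. A sketch using rpart directly, where the 10 and 20 are the best_params shown above:

# Refit on the full training set with the selected hyperparameters
final_tree <- rpart(
  Species ~ .,
  data = iris_train,
  method = "class",
  control = rpart.control(minsplit = 10, maxdepth = 20)
)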
Finally, modeltuning supports model-tuning with Grid
Search using cross validation to estimate each model's true error rate
instead of a hold-out validation set. We’ll use cross validation to tune
the same parameters as above.
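One thing to keep in mind about the cost: every grid point is now fit once per fold, so the total number of model fits grows accordingly (illustrative arithmetic only):

# 30 grid points (5 minsplit x 6 maxdepth values) x 3 folds = 90 model fits
5 * 6 * 3
#> [1] 90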
# Specify Grid Search schema with cross validation
iris_grid_cv <- GridSearchCV$new(
  learner = rpart,
  learner_args = list(method = "class"),
  tune_params = list(
    minsplit = seq(10, 30, by = 5),
    maxdepth = seq(20, 30, by = 2)
  ),
  splitter = splitter,
  splitter_args = list(v = 3, strata = "Species"),
  scorer = list(
    accuracy = accuracy_vec,
    auc = roc_auc_vec
  ),
  optimize_score = "max",
  prediction_args = list(
    accuracy = list(type = "class"),
    auc = list(type = "prob")
  ),
  convert_predictions = list(
    accuracy = NULL,
    auc = function(i) i[, "FALSE"]
  )
)
# Fit models across grid
iris_grid_cv_fitted <- iris_grid_cv$fit(
  formula = Species ~ .,
  data = iris_train
)

Let's check out the optimal decision tree hyperparameters
iris_grid_cv_fitted$best_params
#> $minsplit
#> [1] 10
#> 
#> $maxdepth
#> [1] 28

as well as the cross validation ROC AUC for those parameters.
iris_grid_cv_fitted$best_metric
#> [1] 0.9555556

As noted above, modeltuning is built on top of the future package and
can utilize any parallelization method it supports
when fitting cross-validated models or tuning models with grid search.
The code below evaluates the same cross-validated binary classification
model using local parallelization.
plan(multisession)
# Fit cross validation model
iris_cv_fitted <- iris_cv$fit(formula = Species ~ ., data = iris_train)
plan(sequential)
# Model performance metrics
iris_cv_fitted$mean_metrics
#> $f_meas
#> [1] 0.9564668
#> 
#> $accuracy
#> [1] 0.939951
#> 
#> $auc
#> [1] 0.9480072

And voila!
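The same fit runs unchanged under any other future backend; for example (the backend choices below are illustrative, nothing here is required by modeltuning):

# Explicit number of background R sessions (works on all platforms)
plan(multisession, workers = 2)
iris_cv_fitted <- iris_cv$fit(formula = Species ~ ., data = iris_train)

# Forked processes (Linux/macOS only)
plan(multicore)
iris_cv_fitted <- iris_cv$fit(formula = Species ~ ., data = iris_train)

# Return to sequential evaluation when done
plan(sequential)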
For a bunch of worked examples with common ML frameworks, check out
the /examples directory!