Note: This package is currently experimental and under active development. The API may change. Feedback and bug reports are welcome via GitHub Issues.
misl implements Multiple Imputation by Super Learning (MISL), a flexible approach to handling missing data that uses a stacked ensemble of machine learning algorithms to impute missing values across continuous, binary, and categorical variables.
Rather than relying on a single parametric imputation model, MISL builds a super learner for each incomplete variable using the tidymodels framework, combining learners such as linear/logistic regression, random forests, gradient boosted trees, and MARS to produce well-calibrated imputations.
The method is described in:
Carpenito T, Manjourides J. (2022) MISL: Multiple imputation by super learning. Statistical Methods in Medical Research. 31(10):1904–1915. doi: 10.1177/09622802221104238
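
To make the idea of a stacked ensemble concrete, here is a minimal base-R sketch for a single incomplete continuous variable. It is an illustration under stated assumptions, not the misl implementation: the two candidate learners (`mean_only`, `linear`), the 5-fold split, and the grid-searched convex weight `alpha` are all hypothetical choices made for this example, and the sketch fills in point predictions only rather than the imputation draws described in the paper.

```r
# Sketch only: stacking two hypothetical learners with base R.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 2 + 1.5 * x + rnorm(n)
y[sample(n, 30)] <- NA                     # introduce missingness in y
dat <- data.frame(x = x, y = y)
miss_idx <- is.na(dat$y)
obs <- dat[!miss_idx, ]                    # rows with y observed
mis <- dat[miss_idx, ]                     # rows with y missing

# Candidate learners: take training data and new data, return predictions.
learners <- list(
  mean_only = function(train, new) rep(mean(train$y), nrow(new)),
  linear    = function(train, new) predict(lm(y ~ x, data = train), newdata = new)
)

# Step 1: out-of-fold predictions from every learner (5-fold cross-validation).
k <- 5
fold <- sample(rep(seq_len(k), length.out = nrow(obs)))
cv_pred <- matrix(NA_real_, nrow(obs), length(learners),
                  dimnames = list(NULL, names(learners)))
for (f in seq_len(k)) {
  train <- obs[fold != f, ]
  test  <- obs[fold == f, ]
  for (l in names(learners)) {
    cv_pred[fold == f, l] <- learners[[l]](train, test)
  }
}

# Step 2: choose the convex weight that minimises cross-validated squared error.
alphas <- seq(0, 1, by = 0.05)
cv_mse <- sapply(alphas, function(a) {
  blend <- a * cv_pred[, "linear"] + (1 - a) * cv_pred[, "mean_only"]
  mean((obs$y - blend)^2)
})
alpha <- alphas[which.min(cv_mse)]

# Step 3: refit each learner on all observed rows and blend their predictions
# for the rows where y is missing.
full_pred <- sapply(learners, function(fit) fit(obs, mis))
ensemble  <- alpha * full_pred[, "linear"] + (1 - alpha) * full_pred[, "mean_only"]

imputed <- dat
imputed$y[miss_idx] <- ensemble
```

Because the grid includes the weights 0 and 1, the blended predictor's cross-validated error can never be worse than that of either learner alone, which is the basic appeal of stacking over committing to one model.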
misl is not yet on CRAN. Install the development version from GitHub:
# install.packages("remotes")
remotes::install_github("JustinManjourides/misl")

The following backend packages are optional but recommended:
install.packages(c("ranger", "xgboost", "earth"))

library(misl)
# Introduce missingness into a dataset
set.seed(42)
n <- 200
demo_data <- data.frame(
  age    = rnorm(n, mean = 50, sd = 10),
  weight = rnorm(n, mean = 70, sd = 15),
  smoker = rbinom(n, 1, 0.3),
  group  = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
demo_data[sample(n, 20), "age"] <- NA
demo_data[sample(n, 15), "weight"] <- NA
demo_data[sample(n, 10), "smoker"] <- NA
demo_data[sample(n, 10), "group"] <- NA
# Run MISL with default settings
misl_imp <- misl(
  demo_data,
  m = 5,
  maxit = 5,
  con_method = c("glm", "rand_forest"),
  bin_method = c("glm", "rand_forest"),
  cat_method = c("rand_forest", "multinom_reg")
)
# Each of the m imputed datasets is accessible via:
completed_data <- misl_imp[[1]]$datasets
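
A common next step after multiple imputation is to fit the same analysis model on each completed dataset and pool the results with Rubin's rules. The sketch below shows that pattern in base R. It assumes only what the example above shows, namely that each element of the result exposes a completed data frame as `$datasets`. So that it runs without misl installed, it builds a stand-in list `imp_list` of that shape; the stand-in imputations are crude random draws from the observed values, and the names `imp_list`, `est`, `se2`, and `pooled` are hypothetical.

```r
# Sketch only: pooling a regression coefficient across m completed datasets.
set.seed(42)
n <- 200
full <- data.frame(age = rnorm(n, mean = 50, sd = 10))
full$weight <- 40 + 0.6 * full$age + rnorm(n, sd = 12)
obs <- full
obs$weight[sample(n, 15)] <- NA

# Stand-in for a misl() result: a list of m elements, each holding a
# completed data frame in $datasets. These imputations are NOT produced
# by MISL; they are random draws from the observed weights.
m <- 5
imp_list <- lapply(seq_len(m), function(i) {
  d <- obs
  miss <- is.na(d$weight)
  d$weight[miss] <- sample(d$weight[!miss], sum(miss), replace = TRUE)
  list(datasets = d)
})

# Fit the same analysis model on every completed dataset.
fits <- lapply(imp_list, function(imp) lm(weight ~ age, data = imp$datasets))
est  <- sapply(fits, function(f) coef(f)[["age"]])        # point estimates
se2  <- sapply(fits, function(f) vcov(f)["age", "age"])   # squared standard errors

# Rubin's rules: combine within- and between-imputation variance.
q_bar <- mean(est)                    # pooled estimate
u_bar <- mean(se2)                    # within-imputation variance
b     <- var(est)                     # between-imputation variance
t_var <- u_bar + (1 + 1 / m) * b      # total variance

pooled <- c(estimate = q_bar, std_error = sqrt(t_var))
print(pooled)
```

With a real misl result, `imp_list` would be replaced by `misl_imp`, leaving the model-fitting and pooling steps unchanged.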
# Trace plots can be used to inspect convergence:
trace <- misl_imp[[1]]$trace

Imputation across the m datasets is parallelised via the future framework. To enable parallel execution, set a plan before calling misl():
library(future)
plan(multisession, workers = 4)
misl_imp <- misl(demo_data, m = 5, maxit = 5)
plan(sequential) # reset when done

# View all available learners
list_learners()
# Filter by outcome type
list_learners("continuous")
list_learners("categorical")
# Show only installed learners
list_learners(installed_only = TRUE)

If you use misl in your research, please cite the original paper:
Carpenito T, Manjourides J. (2022) MISL: Multiple imputation by super learning. Statistical Methods in Medical Research. 31(10):1904–1915. doi: 10.1177/09622802221104238
BibTeX:
@article{carpenito2022misl,
  author  = {Carpenito, T and Manjourides, J},
  title   = {{MISL}: Multiple imputation by super learning},
  journal = {Statistical Methods in Medical Research},
  year    = {2022},
  volume  = {31},
  number  = {10},
  pages   = {1904--1915},
  doi     = {10.1177/09622802221104238}
}

MIT © see LICENSE