The goal of imputeGeneric is to ease the implementation of imputation functions.
You can install the development version of imputeGeneric from GitHub with:
# install.packages("devtools")
devtools::install_github("torockel/imputeGeneric")
The aim of imputeGeneric is to make the implementation and usage of
imputation methods easier. The main function of the package is
impute_iterative(). This function can turn any parsnip model into an
imputation method. Furthermore, other customized approaches can be used
within the same general imputation framework. For more information, see
the documentation of impute_iterative(), impute_supervised() and
impute_unsupervised(), and the following examples.
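To give an intuition for what "turning a model into an imputation method" means, here is a minimal base-R sketch of a single supervised imputation step: fit a model on the rows where the target column is observed, then predict its missing entries. The function impute_column_sketch() is hypothetical and not part of imputeGeneric; it assumes a model function with a formula interface and a predict() method (lm() in the usage example).

```r
# Hypothetical helper, NOT part of imputeGeneric: impute the missing values of
# one column with a model fitted on rows where target and predictors are observed.
impute_column_sketch <- function(ds, target, fit_fun) {
  mis <- is.na(ds[[target]])
  if (!any(mis)) return(ds)
  predictors <- setdiff(names(ds), target)
  has_predictors <- complete.cases(ds[predictors])
  # Training rows: target and all predictors observed
  train <- ds[!mis & has_predictors, , drop = FALSE]
  model <- fit_fun(reformulate(predictors, response = target), train)
  # Only rows with fully observed predictors can be predicted
  to_fill <- mis & has_predictors
  ds[[target]][to_fill] <- predict(model, newdata = ds[to_fill, predictors, drop = FALSE])
  ds
}

# Usage: any model with a formula interface and a predict() method can be plugged in
set.seed(1)
ds <- data.frame(X = rnorm(50), Y = rnorm(50))
ds$Y[1:5] <- NA
ds_lm <- impute_column_sketch(ds, "Y", function(f, d) lm(f, data = d))
anyNA(ds_lm$Y)  # expected: FALSE
```

Passing the model as a function keeps the imputation logic independent of the model type, which is the same idea that impute_iterative() applies to parsnip model specifications.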
The use of a parsnip model for imputation is demonstrated using
regression trees from the rpart package via parsnip
(decision_tree("regression")). First, a data set with missing values is
created. Then, this data set is imputed in a single iteration
(max_iter = 1) with regression trees, using only completely observed
rows and columns to build the models.
library(imputeGeneric)
library(parsnip)
# create data set
set.seed(123)
ds_mis <- data.frame(X = rnorm(100), Y = rnorm(100))
ds_mis$Z <- 5 + 2 * ds_mis$X + ds_mis$Y + rnorm(100)
ds_mis$Z[sample.int(100, 30)] <- NA
ds_mis$Y[sample.int(100, 20)] <- NA
# impute data set
ds_imp <- impute_iterative(ds_mis, decision_tree("regression"), max_iter = 1)
anyNA(ds_imp)
#> [1] FALSE
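In this data set, X is the only completely observed column, and a row counts as completely observed only if both Y and Z are present. The following base-R lines are a plain illustration of how these subsets can be identified; they are not imputeGeneric code, and the exact row and column selection performed inside impute_iterative() may differ in detail.

```r
# Recreate the example data from above (same seed and steps)
set.seed(123)
ds_mis <- data.frame(X = rnorm(100), Y = rnorm(100))
ds_mis$Z <- 5 + 2 * ds_mis$X + ds_mis$Y + rnorm(100)
ds_mis$Z[sample.int(100, 30)] <- NA
ds_mis$Y[sample.int(100, 20)] <- NA

# Columns without any missing value
complete_cols <- names(ds_mis)[colSums(is.na(ds_mis)) == 0]
# Rows without any missing value
complete_rows <- complete.cases(ds_mis)

complete_cols       # "X": only X is completely observed
sum(complete_rows)  # between 50 and 70, depending on how the NAs in Y and Z overlap
```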
To use other parsnip models instead of regression trees, only the
model_spec_parsnip argument needs to be changed. For example, to use
linear regression instead of regression trees, pass linear_reg():
ds_imp_lm <- impute_iterative(ds_mis, linear_reg(), max_iter = 1)
anyNA(ds_imp_lm)
#> [1] FALSE
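For intuition about why only the model specification has to change, here is a simplified single-column sketch that calls parsnip's fit() and predict() directly. It imputes only Z from the completely observed column X and is not the internal code of impute_iterative(); switching models only touches the line that defines spec.

```r
library(parsnip)

set.seed(123)
ds_mis <- data.frame(X = rnorm(100), Y = rnorm(100))
ds_mis$Z <- 5 + 2 * ds_mis$X + ds_mis$Y + rnorm(100)
ds_mis$Z[sample.int(100, 30)] <- NA
ds_mis$Y[sample.int(100, 20)] <- NA

# Swap this specification for e.g. decision_tree("regression") (requires rpart)
spec <- linear_reg()

obs <- !is.na(ds_mis$Z)
# Fit on the rows where Z is observed, using X as the only predictor
z_fit <- fit(spec, Z ~ X, data = ds_mis[obs, ])
# For regression models, parsnip returns a tibble with a .pred column
z_pred <- predict(z_fit, new_data = ds_mis[!obs, "X", drop = FALSE])

ds_sketch <- ds_mis
ds_sketch$Z[!obs] <- z_pred$.pred
anyNA(ds_sketch$Z)  # expected: FALSE
```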
Many aspects of the imputation can be specified and customized. For
example, the missing values can first be imputed with column means
(initial_imputation_fun = missMethods::impute_mean). In addition, all
rows and all columns can be used for the imputation models
(rows_used_for_imputation = "all" and cols_used_for_imputation = "all").
Furthermore, the imputation can be iterative. The iteration stops when
either the difference between two successive imputed data sets falls
below a threshold (stop_fun = stop_ds_difference,
stop_fun_args = list(eps = 0.1)) or the maximum number of iterations
(max_iter = 5) is reached.
ds_imp2 <- impute_iterative(
  ds_mis, decision_tree("regression"),
  initial_imputation_fun = missMethods::impute_mean,
  cols_used_for_imputation = "all",
  rows_used_for_imputation = "all",
  stop_fun = stop_ds_difference,
  stop_fun_args = list(eps = 0.1),
  max_iter = 5
)
anyNA(ds_imp2)
#> [1] FALSE
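The iterative procedure configured above can be sketched in base R. This is a simplified illustration under stated assumptions, not the imputeGeneric implementation: lm() stands in for the parsnip model, mean initialization is done by hand instead of with missMethods::impute_mean, columns are updated one after another within each iteration, and the difference between two data sets is assumed to be the sum of absolute changes (stop_ds_difference may define it differently). The function iterative_impute_sketch() is hypothetical.

```r
# Hypothetical sketch, NOT the imputeGeneric implementation:
# mean initialization, then repeated lm-based updates of every incomplete column.
iterative_impute_sketch <- function(ds, max_iter = 5, eps = 0.1) {
  mis <- is.na(ds)
  # Initial imputation with column means
  ds_old <- ds
  for (col in names(ds)) {
    ds_old[[col]][mis[, col]] <- mean(ds[[col]], na.rm = TRUE)
  }
  incomplete_cols <- names(ds)[colSums(mis) > 0]
  for (iter in seq_len(max_iter)) {
    ds_new <- ds_old
    for (col in incomplete_cols) {
      # All rows and all other columns are used (analogous to the "all" settings)
      model <- lm(reformulate(setdiff(names(ds), col), response = col), data = ds_new)
      ds_new[[col]][mis[, col]] <- predict(model, newdata = ds_new[mis[, col], , drop = FALSE])
    }
    # Assumed distance: sum of absolute changes between successive data sets
    delta <- sum(abs(as.matrix(ds_new) - as.matrix(ds_old)))
    ds_old <- ds_new
    if (delta < eps) break  # stop early once the imputed values have settled
  }
  ds_old
}

set.seed(123)
ds_mis <- data.frame(X = rnorm(100), Y = rnorm(100))
ds_mis$Z <- 5 + 2 * ds_mis$X + ds_mis$Y + rnorm(100)
ds_mis$Z[sample.int(100, 30)] <- NA
ds_mis$Y[sample.int(100, 20)] <- NA

ds_imp_sketch <- iterative_impute_sketch(ds_mis, max_iter = 5, eps = 0.1)
anyNA(ds_imp_sketch)  # expected: FALSE
```

Only the originally missing entries are overwritten in each iteration; observed values stay fixed, so the stopping rule measures how much the imputed values still move.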