* `prc()` which enables easy building of precision-recall curves from `nestedcv` models and `repeatcv()` results
* `predict` method for `cva.glmnet`
* `|>` can be used instead
* `metrics()` which gives additional performance metrics for binary classification models such as F1 score, Matthews correlation coefficient and precision-recall AUC
* `pls_filter()` which uses partial least squares regression to filter features
* `repeatcv()` leading to significant improvement in speed
* `nestcv.train()`: if argument `cv.cores` > 1, OpenMP multithreading is now disabled, which prevents the caret models `xgbTree` and `xgbLinear` from crashing and allows them to be parallelised efficiently over the outer CV loops
* `var_stability()` and its plots
* `nestcv.glmnet()`
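As context for the Matthews correlation coefficient reported by `metrics()` above, a base-R sketch of the standard formula (illustrative only; `mcc` here is not a `nestedcv` function):

```r
# Matthews correlation coefficient from binary confusion-matrix counts
# (illustrative base-R sketch, not the package's own implementation)
mcc <- function(tp, tn, fp, fn) {
  num <- tp * tn - fp * fn
  den <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (den == 0) return(0)  # common convention when any margin is zero
  num / den
}

mcc(50, 50, 0, 0)    # perfect binary classifier gives 1
mcc(90, 85, 15, 10)  # strong but imperfect classifier
```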
* `repeatcv()` to apply repeated nested CV to the main `nestedcv` model functions for robust measurement of model performance
* `modifyX` argument to all `nestedcv` models. This allows more powerful manipulation of the predictors, such as scaling, imputing missing values or adding extra columns through variable manipulations. Importantly, these are applied to train and test input data separately.
* `predict()` function for `nestcv.SuperLearner()`
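A minimal sketch of a `modifyX`-style function. `scale_x` is hypothetical, and the commented `nestcv.glmnet()` call is an assumed usage; the exact interface expected by `modifyX` is described in the package documentation:

```r
# Hypothetical predictor-modifying function for the `modifyX` argument
# (a sketch; see the package documentation for the exact interface)
scale_x <- function(x) scale(as.matrix(x))  # centre and scale each column

# Assumed usage sketch; the modification is applied to train and
# test fold data separately, avoiding information leakage:
# fit <- nestcv.glmnet(y, x, modifyX = scale_x)

m <- scale_x(matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2))
round(colMeans(m), 12)  # both columns centred to 0
```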
* `pred_SuperLearner` wrapper for use with `fastshap::explain`
* `nestcv.SuperLearner()` on Windows
* `nestcv.glmnet()`
* `verbose` in `nestcv.train()`, `nestcv.glmnet()` and `outercv()` to show progress
* `multicore_fork` in `nestcv.train()` and `outercv()` to allow choice of parallelisation between forked multicore processing using `mclapply` or non-forked using `parLapply`. This can help prevent errors with certain multithreaded caret models, e.g. `model = "xgbTree"`.
* `one_hot()` changed `all_levels` argument default to `FALSE` to be compatible with regression models by default
* `lm_filter()` full results table
* `lm_filter()` where variables with zero variance were incorrectly reporting very low p-values in linear models instead of returning `NA`. This is due to how rank-deficient models are handled by `RcppEigen::fastLmPure`. The default method for `fastLmPure` has been changed to 0 to allow detection of rank-deficient models.
* Bug fix in `weight()` caused by `NA`. Allow `weight()` to tolerate character vectors.
* `keep_factors` option has been added to filters to control filtering of factors with 3 or more levels
* `one_hot()` for fast one-hot encoding of factors and character columns by creating dummy variables
* `stat_filter()` which applies univariate filtering to dataframes with mixed data types (continuous & categorical combined)
* `anova_filter()` changed from `Rfast::ftests()` to `matrixTests::col_oneway_welch()` for much better accuracy
* `nestcv.train()` (Matt Siggins suggestion)
* `n_inner_folds` argument to `nestcv.train()` to make it easier to set the number of inner CV folds, and `inner_folds` argument which enables setting the inner CV fold indices directly (suggestion Aline Wildberger)
* Bug fix in `plot_shap_beeswarm()` caused by change in fastshap 0.1.0 output from tibble to matrix
* `nestcv.train()`
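The `multicore_fork` option above corresponds to base R's two parallel backends. A base-R illustration of the difference; the `nestcv.train()` call at the end is an assumed usage sketch:

```r
library(parallel)

# Forking is unavailable on Windows, so fall back to one core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Forked parallelisation (mclapply): cheap start-up, unix-alikes only
res_fork <- mclapply(1:4, function(i) i^2, mc.cores = n_cores)

# Non-forked cluster (parLapply): portable, including Windows, and
# less prone to clashes with multithreaded models such as xgbTree
cl <- makeCluster(2)
res_sock <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)

identical(unlist(res_fork), unlist(res_sock))  # same results either way

# Assumed usage sketch of the new argument:
# fit <- nestcv.train(y, x, method = "xgbTree",
#                     cv.cores = 2, multicore_fork = FALSE)
```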
* `pass_outer_folds` to both `nestcv.glmnet` and `nestcv.train`: this enables passing of outer CV fold indices stored in `outer_folds` to the final round of CV. Note this can only work if `n_outer_folds` = number of inner CV folds and balancing is not applied, so that `y` is a consistent length.
* `nfolds` for final CV equals `n_inner_folds` in `nestcv.glmnet()`
* `plot_var_stability()` to be more user-friendly
* `top` argument to SHAP plots
* `fastshap` for calculating SHAP values
* `force_vars` argument to `glmnet_filter()`
* `ranger_filter()`
* `nestcv.train()` from models such as `gbm`. This fixes a multicore bug when using the standard R GUI on Mac/Linux.
* Bug fix when the `nestcv.glmnet()` model has 0 or 1 coefficients
* `nestedcv` models now return `xsub` containing a subset of the predictor matrix `x` with filtered variables across outer folds and the final fit
* `boxplot_model()` no longer needs the predictor matrix to be specified as it is contained in `xsub` in `nestedcv` models
* `boxplot_model()` now works for all `nestedcv` model types
* `var_stability()` to assess variance and stability of variable importance across outer folds, and directionality for binary outcomes
* `plot_var_stability()` to plot variable stability across outer folds
* `finalCV = NA` option which skips fitting the final model completely. This gives a useful speed boost if performance metrics are all that is needed.
* `model` argument in `outercv` now prefers a character value instead of a function for the model to be fitted
* `outercv`
* `nestcv.train` which improves error detection in caret, so `nestcv.train` can be run in multicore mode straightaway
* `nestcv.glmnet`
* `nestcv.glmnet`
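The `outer_folds` indices mentioned above are plain lists of row numbers. An illustrative base-R construction of that shape (the package builds these internally; the `nestcv.glmnet()` call is an assumed sketch):

```r
# Building CV fold indices as a list of row numbers, the shape used
# for `outer_folds` (illustrative base R, not the package's own code)
set.seed(42)
n <- 20
folds <- split(sample(seq_len(n)), rep(1:5, length.out = n))
lengths(folds)  # five folds of 4 rows each

# Assumed usage sketch: reuse outer fold indices in the final round of CV
# (requires n_outer_folds == number of inner folds and no balancing)
# fit <- nestcv.glmnet(y, x, n_outer_folds = 5, pass_outer_folds = TRUE)
```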
* `outer_train_predict` argument to enable saving of predictions on outer training folds
* `train_preds` to obtain outer training fold predictions
* `train_summary` to show performance metrics on outer training folds
* `smote()`
* `SuperLearner` package
* `nestcv.train` and `nestcv.glmnet`
* `nestcv.train` for caret models with tuning parameters which are factors
* `nestcv.train` for caret models using regression
* `nestcv.train` and `nestcv.glmnet` to tune final model parameters using a final round of CV on the whole dataset
* `nestcv.train` and `outercv`
* `randomsample()` to handle class imbalance using random over/undersampling
* `smote()` for SMOTE algorithm for increasing minority class data
* `boot_ttest()`
* `nestcv.glmnet()` is mean of best lambdas on log scale
* `plot_varImp` for plotting variable importance for `nestcv.glmnet` final models
* `nestcv.glmnet()`
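For context on `randomsample()`, random oversampling of a minority class can be sketched in base R as follows (this is not the package's implementation, which also supports undersampling and resamples the predictor rows):

```r
# Random oversampling of the minority class in base R
# (context sketch only; randomsample() in the package is more general)
set.seed(1)
y <- factor(c(rep("a", 90), rep("b", 10)))
minority <- which(y == "b")
extra <- sample(minority, size = 80, replace = TRUE)
idx_bal <- c(seq_along(y), extra)  # resampling index for y and rows of x
table(y[idx_bal])                  # classes now balanced at 90/90
```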
* `cva.glmnet()`
* `plot.cva.glmnet`
* `alphaSet` in `plot.cva.glmnet`
* `train` function of caret
* Passing arguments to `filterFUN` is no longer done through `...` but with a list of arguments passed through a new argument `filter_options`.
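To illustrate the `filter_options` change, here is a custom filter of the assumed shape (takes `y` and `x`, returns column indices to keep) together with an assumed usage sketch; the exact `filterFUN` contract is documented in the package:

```r
# A custom filterFUN of the assumed shape: takes y and x, returns the
# column indices of predictors to keep (here simply the most variable ones)
var_filter <- function(y, x, nfilter = 5) {
  v <- apply(x, 2, var)
  order(v, decreasing = TRUE)[seq_len(min(nfilter, ncol(x)))]
}

set.seed(3)
x <- matrix(rnorm(200), nrow = 20)  # 20 samples, 10 predictors
keep <- var_filter(NULL, x, nfilter = 3)
length(keep)  # 3 columns retained

# Assumed usage sketch: filter arguments now go in a list, not through `...`
# fit <- nestcv.glmnet(y, x,
#                      filterFUN = var_filter,
#                      filter_options = list(nfilter = 3))
```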