Argument response renamed to
responses: Now accepts multiple response
variables. Functions affected: collinear(),
preference_order(), and related validation
functions.
Argument encoding_method defaults to
NULL in collinear(): Target encoding
is now opt-in rather than automatic. Previously defaulted to
"mean".
Default values changed for max_cor and
max_vif: Both now default to NULL,
triggering adaptive threshold computation based on the correlation
structure of the data.
Output structure changed for
collinear(): Now returns a list of class
collinear_output containing sub-lists of class
collinear_selection, each with response,
df, preference_order, selection,
and formulas slots. Previously returned a character vector
or named list of character vectors.
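Downstream code that consumed the old character-vector return value now has to read the selection slot of each sub-list instead. The sketch below uses a hand-made stand-in so that it runs without the package: the object mock_output, the helper make_mock_selection(), and all response and predictor names are hypothetical. Only the class and slot names are taken from the description above.

```r
# Hypothetical stand-in for a collinear() result, built by hand.
# Only the slot and class names come from the changelog; every value
# below is invented for illustration.
make_mock_selection <- function(response, selection) {
  structure(
    list(
      response = response,
      df = NULL,                 # left empty in this mock
      preference_order = NULL,   # left empty in this mock
      selection = selection,
      formulas = NULL            # left empty in this mock
    ),
    class = c("collinear_selection", "list")
  )
}

mock_output <- structure(
  list(
    make_mock_selection("y_numeric", c("temp", "rain")),
    make_mock_selection("y_binary", c("rain", "slope"))
  ),
  class = c("collinear_output", "list")
)

# Recover a v2.0-style named list of character vectors,
# using the response slot of each sub-list as the name.
selections <- lapply(mock_output, function(s) s$selection)
names(selections) <- unname(
  vapply(mock_output, function(s) s$response, character(1))
)

str(selections)
```

Naming the result from the response slot avoids relying on how the real object names its sub-lists, which the changelog does not state.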
| Old Name (v2.0) | New Name (v3.0) |
|---|---|
| identify_predictors() | Split into identify_valid_variables(), identify_numeric_variables(), identify_categorical_variables(), identify_logical_variables() |
| identify_predictors_categorical() | identify_categorical_variables() |
| identify_predictors_numeric() | identify_numeric_variables() |
| identify_predictors_zero_variance() | identify_zero_variance_variables() |
| identify_predictors_type() | Removed (merged into identify_valid_variables()) |

f_ Functions for Preference Order

| Old Name (v2.0) | New Name (v3.0) |
|---|---|
| f_r2_glm_gaussian() | f_numeric_glm() |
| f_r2_gam_gaussian() | f_numeric_gam() |
| f_r2_rf() | f_numeric_rf() |
| f_r2_glm_poisson() | f_count_glm() |
| f_r2_gam_poisson() | f_count_gam() |
| f_auc_glm_binomial() | f_binomial_glm() |
| f_auc_gam_binomial() | f_binomial_gam() |
| f_auc_rf_binomial() | f_binomial_rf() |
| f_v_rf() | f_categorical_rf() |
| — | f_count_rf() (new) |

When both max_cor = NULL and max_vif = NULL, the function now automatically determines optimal filtering thresholds. This relies on a fitted GAM (gam_cor_to_vif) mapping correlation thresholds to equivalent VIF values.
This data-driven approach adapts to each dataset's correlation structure, preventing over-filtering while maintaining statistically meaningful bounds.
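
To see why a correlation threshold can be translated into a VIF threshold at all, consider the simplest case of exactly two predictors, where VIF = 1 / (1 - r^2). The base R sketch below is not the package's fitted GAM; the helper vif_from_cor() is a hypothetical name, and the closed form holds only for the two-predictor case. With more predictors there is no such simple formula, which is presumably why the package fits an empirical mapping instead.

```r
# Closed-form VIF for exactly two predictors with Pearson correlation r.
# Illustration only: this is not the package's gam_cor_to_vif model.
vif_from_cor <- function(r) 1 / (1 - r^2)

max_cor <- c(0.50, 0.75, 0.90)
thresholds <- data.frame(
  max_cor = max_cor,
  equivalent_vif = vif_from_cor(max_cor)
)
print(thresholds)

# Cross-check one value against the matrix definition of VIF:
# the diagonal of the inverse correlation matrix.
r <- 0.75
cor_2x2 <- matrix(c(1, r, r, 1), nrow = 2)
diag(solve(cor_2x2))
```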
step_collinear(): Recipe step for multicollinearity filtering in tidymodels workflows.
prep() and bake() methods following recipes architecture.
Arguments cv_training_fraction and cv_iterations in preference_order(), passed through collinear().
collinear() now returns comprehensive results.
S3 methods print() and summary() for
collinear_output and collinear_selection
classes provide clean output formatting.
cor_matrix() now returns signed correlations, preserving the positive semi-definite property required for VIF calculations.
max_cor thresholds.
collinear_stats(): Compute summary statistics for both correlation and VIF.
cor_stats(): Summary statistics for pairwise correlations.
vif_stats(): Summary statistics for variance inflation factors.
f_count_rf(): Score integer count predictors with random forest.
print.collinear_output()
print.collinear_selection()
summary.collinear_output()
summary.collinear_selection()

| Name | Description |
|---|---|
| experiment_adaptive_thresholds | Validation experiment results (10,000 iterations) |
| experiment_cor_vs_vif | Correlation vs VIF equivalence experiment results |
| gam_cor_to_vif | Fitted GAM for mapping max_cor to max_vif |
| prediction_cor_to_vif | Look-up table for threshold equivalence |
| toy | Simple dataset illustrating multicollinearity concepts |
| vi_smol | Smaller version of vi dataset (610 rows) for faster examples |
| vi_responses | Character vector of response variable names |
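
The point about signed correlations and VIF can be checked in base R. VIF values are the diagonal of the inverse correlation matrix, and taking abs() of a valid correlation matrix can destroy positive definiteness once there are four or more variables. The matrix below is a hand-picked illustration, not data from the package, and the code is a sketch of the general idea rather than the package's internal implementation.

```r
# A valid (positive definite) 4 x 4 correlation matrix with one
# negative pairwise correlation. Values are chosen by hand.
a <- 0.6
signed_cor <- matrix(
  c( 1,  a,  0, -a,
     a,  1,  a,  0,
     0,  a,  1,  a,
    -a,  0,  a,  1),
  nrow = 4, byrow = TRUE
)

# Eigenvalues: all positive for the signed matrix,
# but one turns negative after abs().
eig_signed <- eigen(signed_cor, symmetric = TRUE)$values
eig_abs <- eigen(abs(signed_cor), symmetric = TRUE)$values

# VIF as the diagonal of the inverse correlation matrix.
vif_signed <- diag(solve(signed_cor))      # valid VIFs, all >= 1
vif_abs <- diag(solve(abs(signed_cor)))    # meaningless, below 1

print(round(eig_signed, 3))
print(round(eig_abs, 3))
print(round(vif_signed, 3))
print(round(vif_abs, 3))
```

Because a VIF can never be smaller than 1, the values computed from the abs() matrix are not interpretable, which is consistent with keeping the sign of the correlations before inverting.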
solve() to prevent false singularity detection.
Inf.
validate_arg_*() functions provide consistent argument checking across the package.
@family tags for better cross-referencing.
@inheritSection for consistent documentation of shared concepts.
abs() before VIF computation.
"auto" in preference_order argument (ignored with message)

Expanded Functionality: Functions collinear() and preference_order() support both categorical and numeric responses and predictors, and can handle several responses at once.
Robust Selection Algorithms: Enhanced selection
in vif_select() and cor_select().
Enhanced Functionality to Rank Predictors: New functions to compute association between response and predictors covering most use-cases, and automated function selection depending on data features.
Simplified Target Encoding: Streamlined and
parallelized for better efficiency, and new default is
"loo" (leave-one-out).
Parallelization and Progress Bars: Utilizes
future and progressr for enhanced performance
and user experience.
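
For readers unfamiliar with the "loo" (leave-one-out) target encoding mentioned above, the idea is to replace each category level with the mean of the response over all other rows sharing that level, so a row's own response does not leak into its encoded value. The base R sketch below is not the package's implementation: the function loo_target_encode(), the example data, and the fallback to the global mean for levels with a single row are all assumptions made for illustration.

```r
# Leave-one-out target encoding of a categorical predictor x
# against a numeric response y. Illustrative sketch only.
loo_target_encode <- function(x, y) {
  x <- as.character(x)
  group_sum <- ave(y, x, FUN = sum)     # per-level sum of y, per row
  group_n <- ave(y, x, FUN = length)    # per-level row count, per row
  encoded <- (group_sum - y) / (group_n - 1)
  # Assumption: levels with one row fall back to the global mean,
  # because there are no other rows to average.
  encoded[group_n == 1] <- mean(y)
  encoded
}

# Invented example data.
example_df <- data.frame(
  soil = c("clay", "clay", "clay", "sand", "sand", "silt"),
  y = c(1, 2, 3, 10, 20, 5)
)
example_df$soil_loo <- loo_target_encode(example_df$soil, example_df$y)
print(example_df)
```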
collinear(), cor_select(), and vif_select()