evoFE (Evolutionary Feature Engineering) is an R
package that uses a genetic algorithm to automatically discover,
combine, and optimize feature transformations for tabular datasets.
Instead of manually engineering interaction terms, ratios, or binning
strategies, evoFE searches the space of possible feature
recipes to maximize the predictive performance of LightGBM or XGBoost
models.
The final output is a reusable evo_recipe object that applies the same feature transformations to new data at prediction time.
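The search idea can be made concrete with a toy genetic algorithm in base R. This is a hypothetical sketch, not evoFE's implementation or API. Each genome is a bit vector that switches a small, made-up menu of candidate transformations (such as hp / wt and log(disp / cyl)) on or off. Fitness is the 4-fold cross-validated RMSE of an lm() fit on mtcars$mpg, which stands in for the LightGBM or XGBoost evaluators. The search uses two-way tournament selection, one-point crossover, bit-flip mutation, and elitism.

```r
# Toy genetic algorithm for feature-recipe search (hypothetical sketch, not evoFE's code).
# Genome: one bit per candidate transformation. Fitness: k-fold CV RMSE of lm() (lower is better).
set.seed(42)

# Hypothetical candidate transformations of mtcars columns
candidates <- list(
  log_hp       = function(d) log(d$hp),
  ratio_hp_wt  = function(d) d$hp / d$wt,
  sqrt_disp    = function(d) sqrt(d$disp),
  log_disp_cyl = function(d) log(d$disp / d$cyl),
  wt           = function(d) d$wt,
  qsec         = function(d) d$qsec
)

# Apply the transformations switched on in a genome
build_features <- function(d, genome) {
  as.data.frame(lapply(candidates[genome == 1], function(f) f(d)))
}

# Cross-validated RMSE of a linear model on the selected features
cv_rmse <- function(d, y, genome, folds) {
  if (sum(genome) == 0) return(Inf)  # empty recipe: worst possible fitness
  X <- build_features(d, genome)
  sq_err <- numeric(0)
  for (k in unique(folds)) {
    train <- folds != k
    fit <- lm(y ~ ., data = cbind(y = y[train], X[train, , drop = FALSE]))
    pred <- predict(fit, newdata = X[!train, , drop = FALSE])
    sq_err <- c(sq_err, (y[!train] - pred)^2)
  }
  sqrt(mean(sq_err))
}

evolve <- function(d, y, pop_size = 8, generations = 10, p_mut = 0.15, k = 4) {
  n_genes <- length(candidates)
  folds <- sample(rep(seq_len(k), length.out = nrow(d)))  # fixed folds: comparable fitness
  pop <- matrix(rbinom(pop_size * n_genes, 1, 0.5), nrow = pop_size)
  score <- function(p) apply(p, 1, function(g) cv_rmse(d, y, g, folds))
  fit <- score(pop)
  tournament <- function() {  # better of two randomly drawn individuals
    i <- sample(pop_size, 2)
    pop[i[which.min(fit[i])], ]
  }
  for (gen in seq_len(generations)) {
    next_pop <- pop[which.min(fit), , drop = FALSE]  # elitism: keep the best genome
    while (nrow(next_pop) < pop_size) {
      p1 <- tournament()
      p2 <- tournament()
      cut_point <- sample(n_genes - 1, 1)             # one-point crossover
      child <- c(p1[1:cut_point], p2[(cut_point + 1):n_genes])
      flip <- runif(n_genes) < p_mut                  # bit-flip mutation
      child[flip] <- 1 - child[flip]
      next_pop <- rbind(next_pop, child)
    }
    pop <- next_pop
    fit <- score(pop)
  }
  best <- which.min(fit)
  list(genome = pop[best, ],
       features = names(candidates)[pop[best, ] == 1],
       rmse = fit[best])
}

res <- evolve(mtcars, mtcars$mpg)
cat("Selected features:", res$features, "\n")
cat("CV RMSE:", round(res$rmse, 3), "\n")
```

The folds are drawn once per run, so every genome is scored on the same splits and fitness values stay comparable across generations. evoFE's real search space, which includes composed transformers, group-by aggregations, and encodings, is much richer than this fixed menu.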
Transformations can be composed, e.g. log(ratio(x1, x2)). Two validation strategies are supported: cross-validation (cv) and a stratified train/validation/holdout split (split).

You can install the package directly from GitHub:
# Install devtools if you haven't already
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
# Install evoFE (build_vignettes = TRUE also builds the package vignettes)
devtools::install_github("tanopereira/evoFE", build_vignettes = TRUE)

Several of evoFE’s core transformers (such as Genie and Lumbermark clustering) are implemented in C++ and parallelized with OpenMP. On macOS, Apple's Clang compiler ships without OpenMP support, so these packages are built single-threaded by default. To enable multi-threading:
Install libomp via Homebrew:
brew install libomp

Configure your ~/.R/Makevars file to use OpenMP:
# Paths assume Homebrew on Apple Silicon (/opt/homebrew); on Intel Macs use /usr/local/opt/libomp instead
SHLIB_OPENMP_CFLAGS = -Xpreprocessor -fopenmp
SHLIB_OPENMP_CXXFLAGS = -Xpreprocessor -fopenmp
CPPFLAGS += -I/opt/homebrew/opt/libomp/include
LDFLAGS += -L/opt/homebrew/opt/libomp/lib -lomp

Reinstall quitefastmst, genieclust, lumbermark, and deadwood from source so they are recompiled with OpenMP enabled:
install.packages(c("quitefastmst", "genieclust", "lumbermark", "deadwood"), type = "source")

Here is a quick example using the mtcars dataset for a binary classification task:
library(evoFE)
data(mtcars)
df <- mtcars
df$am <- as.integer(df$am) # target: 0 = automatic, 1 = manual
# Evolve features
recipe <- evolve_features(
data = df,
target_col = "am",
task = "classification",
evaluator = "xgboost",
generations = 5,
pop_size = 8,
cv_folds = 3,
seed = 42,
verbose = TRUE
)
# View the winning recipe
cat("Best Recipe: ", individual_to_recipe_string(recipe$best_individual), "\n")
cat("Best Fitness: ", recipe$best_individual$fitness, "\n")
# Engineer features on new data
engineered_df <- predict(recipe, df[1:5, ])
# Run predictions using the trained model
predictions <- predict_model(recipe, df[1:5, ])

Available transformers:

| Category | Transformers |
|---|---|
| Arithmetic | log, sqrt, reciprocal, add, subtract, multiply, divide, normalized_difference, log_ratio |
| Group-by Aggregations | groupby_mean, groupby_sd, groupby_max, groupby_min, groupby_ratio, groupby_zscore |
| Encoding & Binning | target_encode, frequency_encode, one_hot_encode, quantile_binning, log_binning, quantile_binning_cat, log_binning_cat |
| Dimensionality Reduction | pca, truncated_svd, random_projection, umap |
| Graph & Clustering | genie, lumbermark, mst_score, deadwood |
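The transformer table gives names only. To illustrate what a few of them typically compute, here are base R versions built on assumed, common definitions: normalized_difference as (a - b) / (a + b), log_ratio as log(a / b), a per-group mean and z-score broadcast back to each row, frequency_encode as each category's share of rows, and quantile_binning as equal-frequency bins. These are hypothetical re-implementations, and evoFE's actual formulas, arguments, and edge-case handling may differ.

```r
# Hypothetical re-implementations of a few transformer names from the table.
# Assumed, common definitions only; not evoFE's source code or API.

normalized_difference <- function(a, b) (a - b) / (a + b)
log_ratio <- function(a, b) log(a / b)

# Group-by aggregations, broadcast back to one value per row
groupby_mean <- function(x, g) ave(x, g, FUN = mean)
groupby_zscore <- function(x, g) {
  mu <- ave(x, g, FUN = mean)
  s  <- ave(x, g, FUN = sd)
  ifelse(is.na(s) | s == 0, 0, (x - mu) / s)  # singleton or constant groups -> 0
}

# Share of rows carrying each row's category
frequency_encode <- function(g) {
  counts <- table(g)
  as.numeric(counts[as.character(g)]) / length(g)
}

# Equal-frequency bins; unique() guards against duplicated quantile breaks
quantile_binning <- function(x, n_bins = 4) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = n_bins + 1)))
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

d <- mtcars
d$nd_hp_wt   <- normalized_difference(d$hp, d$wt)
d$lr_disp_hp <- log_ratio(d$disp, d$hp)
d$hp_by_cyl  <- groupby_mean(d$hp, d$cyl)
d$hp_z_cyl   <- groupby_zscore(d$hp, d$cyl)
d$cyl_freq   <- frequency_encode(d$cyl)
d$mpg_bin    <- quantile_binning(d$mpg, 4)
print(head(d[, c("cyl", "hp", "hp_by_cyl", "hp_z_cyl", "cyl_freq", "mpg_bin")]))
```

In a real pipeline, group statistics, frequencies, and bin edges would normally be learned on training data and then looked up for new rows. This sketch simply recomputes them on whatever data it is given.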
This project is licensed under the MIT License - see the LICENSE file for details.