
OptimalBinningWoE is a high-performance R package for optimal binning and Weight of Evidence (WoE) transformation, designed for credit scoring, risk assessment, and predictive modeling applications.
| Feature | Benefit |
|---|---|
| 36 Algorithms | Choose the best method for your data characteristics |
| C++ Performance | Process millions of records efficiently via Rcpp/RcppEigen |
| tidymodels Ready | Seamless integration with modern ML pipelines |
| Regulatory Compliance | Monotonic binning for Basel/IFRS 9 requirements |
| Production Quality | Comprehensive testing and documentation |
```r
# Install from CRAN (when available)
install.packages("OptimalBinningWoE")

# Or install the development version from GitHub
# install.packages("pak")
pak::pak("evandeilton/OptimalBinningWoE")
```
```r
library(OptimalBinningWoE)

# Create sample data
set.seed(123)
df <- data.frame(
  age = rnorm(1000, 45, 15),
  income = exp(rnorm(1000, 10, 0.5)),
  education = sample(c("HS", "BA", "MA", "PhD"), 1000, replace = TRUE),
  target = rbinom(1000, 1, 0.15)
)

# Automatic optimal binning with WoE calculation
result <- obwoe(
  data = df,
  target = "target",
  algorithm = "jedi", # Joint Entropy-Driven Information
  min_bins = 3,
  max_bins = 6
)

# View summary
print(result)

# Examine binning details
result$results$age
```
```r
library(tidymodels)
library(OptimalBinningWoE)

# Create a preprocessing recipe with WoE transformation
rec <- recipe(default ~ ., data = credit_data) %>%
  step_obwoe(
    all_predictors(),
    outcome = "default",
    algorithm = "mob", # Monotonic Optimal Binning
    min_bins = 3,
    max_bins = tune(), # Tune the number of bins
    output = "woe"
  )

# Works seamlessly in ML workflows
workflow() %>%
  add_recipe(rec) %>%
  add_model(logistic_reg()) %>%
  fit(data = training_data)
```

WoE quantifies the predictive power of each bin by measuring the log-odds ratio:
\[\text{WoE}_i = \ln\left(\frac{\text{Distribution of Goods}_i}{\text{Distribution of Bads}_i}\right)\]
Interpretation:

- **WoE > 0**: the bin holds a larger share of goods than bads (lower risk).
- **WoE < 0**: the bin holds a larger share of bads than goods (higher risk).
- **WoE = 0**: the bin's good/bad mix matches the overall population.
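To make the formula concrete, WoE can be computed by hand from per-bin counts. The counts below are made up for illustration; this is not package code:

```r
# Toy bin counts (hypothetical): goods = target 0, bads = target 1
goods <- c(400, 300, 100)
bads  <- c(20, 50, 130)

# Share of all goods / all bads that fall in each bin
dist_goods <- goods / sum(goods)
dist_bads  <- bads / sum(bads)

# WoE_i = ln(dist_goods_i / dist_bads_i)
woe <- log(dist_goods / dist_bads)
round(woe, 3)  # 1.609  0.405 -1.649
```

The first bin is good-heavy (positive WoE), the last is bad-heavy (negative WoE).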
IV measures the overall predictive power of a feature:
\[\text{IV} = \sum_{i=1}^{n} (\text{Dist. Goods}_i - \text{Dist. Bads}_i) \times \text{WoE}_i\]
| IV Range | Predictive Power | Recommendation |
|---|---|---|
| < 0.02 | Unpredictive | Exclude |
| 0.02 – 0.10 | Weak | Use cautiously |
| 0.10 – 0.30 | Medium | Good predictor |
| 0.30 – 0.50 | Strong | Excellent predictor |
| > 0.50 | Suspicious | Check for data leakage |
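To make the IV table concrete, here is a hand computation from hypothetical bin counts (a sketch, not package code):

```r
# Toy bin counts (hypothetical)
goods <- c(400, 300, 100)
bads  <- c(20, 50, 130)
dist_goods <- goods / sum(goods)
dist_bads  <- bads / sum(bads)
woe <- log(dist_goods / dist_bads)

# IV = sum over bins of (dist_goods_i - dist_bads_i) * WoE_i
iv <- sum((dist_goods - dist_bads) * woe)
round(iv, 2)  # 1.56 -- falls in "> 0.50", suspiciously strong for real data
```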
OptimalBinningWoE provides 36 algorithms optimized for different scenarios:
| Algorithm | Function | Best For |
|---|---|---|
| JEDI | `ob_numerical_jedi()` | General purpose, balanced performance |
| MOB | `ob_numerical_mob()` | Regulatory compliance (monotonic) |
| ChiMerge | `ob_numerical_cm()` | Statistical significance-based merging |
| DP | `ob_numerical_dp()` | Optimal partitioning with constraints |
| Sketch | `ob_numerical_sketch()` | Large-scale / streaming data |
| Algorithm | Function | Specialty |
|---|---|---|
| MDLP | `ob_numerical_mdlp()` | Entropy-based discretization |
| MBLP | `ob_numerical_mblp()` | Monotonic binning via linear programming |
| IR | `ob_numerical_ir()` | Isotonic regression binning |
| EWB | `ob_numerical_ewb()` | Fast equal-width binning |
| KMB | `ob_numerical_kmb()` | K-means clustering approach |
| Acronym | Full Name | Description |
|---|---|---|
| BB | Branch and Bound | Exact optimization |
| CM | ChiMerge | Chi-square merging |
| DMIV | Decision Tree MIV | Recursive partitioning |
| DP | Dynamic Programming | Optimal partitioning |
| EWB | Equal Width | Fixed-width bins |
| Fast-MDLP | Fast MDLP | Optimized entropy |
| FETB | Fisher’s Exact Test | Statistical significance |
| IR | Isotonic Regression | Order-preserving |
| JEDI | Joint Entropy-Driven | Information maximization |
| JEDI-MWoE | JEDI Multinomial | Multi-class targets |
| KMB | K-Means Binning | Clustering-based |
| LDB | Local Density | Density estimation |
| LPDB | Local Polynomial | Smooth density |
| MBLP | Monotonic LP | LP optimization |
| MDLP | Min Description Length | Entropy-based |
| MOB | Monotonic Optimal | IV-optimal + monotonic |
| MRBLP | Monotonic Regression LP | Regression + LP |
| OSLP | Optimal Supervised LP | Supervised learning |
| Sketch | KLL Sketch | Streaming quantiles |
| UBSD | Unsupervised StdDev | Standard deviation |
| UDT | Unsupervised DT | Decision tree |
| Algorithm | Function | Specialty |
|---|---|---|
| SBLP | `ob_categorical_sblp()` | Similarity-based grouping |
| IVB | `ob_categorical_ivb()` | IV maximization |
| GMB | `ob_categorical_gmb()` | Greedy monotonic |
| SAB | `ob_categorical_sab()` | Simulated annealing |
| Acronym | Full Name | Description |
|---|---|---|
| CM | ChiMerge | Chi-square merging |
| DMIV | Decision Tree MIV | Recursive partitioning |
| DP | Dynamic Programming | Optimal partitioning |
| FETB | Fisher’s Exact Test | Statistical significance |
| GMB | Greedy Monotonic | Greedy monotonic binning |
| IVB | Information Value | IV maximization |
| JEDI | Joint Entropy-Driven | Information maximization |
| JEDI-MWoE | JEDI Multinomial | Multi-class targets |
| MBA | Modified Binning | Modified approach |
| MILP | Mixed Integer LP | LP optimization |
| MOB | Monotonic Optimal | IV-optimal + monotonic |
| SAB | Simulated Annealing | Stochastic optimization |
| SBLP | Similarity-Based LP | Similarity grouping |
| Sketch | Count-Min Sketch | Streaming counts |
| SWB | Sliding Window | Window-based |
| UDT | Unsupervised DT | Decision tree |
| Use Case | Recommended | Rationale |
|---|---|---|
| General Credit Scoring | `jedi`, `mob` | Best balance of speed and predictive power |
| Regulatory Compliance | `mob`, `mblp`, `ir` | Guaranteed monotonic WoE patterns |
| Large Datasets (>1M rows) | `sketch`, `ewb` | Sublinear memory, single-pass |
| High Cardinality Categorical | `sblp`, `gmb`, `ivb` | Intelligent category grouping |
| Interpretability Focus | `dp`, `mdlp` | Clear, explainable bins |
| Multi-class Targets | `jedi_mwoe` | Multinomial WoE support |
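For the regulatory-compliance row, "monotonic" means the WoE values move in a single direction across ordered bins. A minimal standalone check, with a hypothetical WoE vector (not package code):

```r
# Hypothetical WoE values for four ordered bins
woe <- c(1.61, 0.41, -0.30, -1.65)

# Monotone if successive differences never change sign
is_monotone <- all(diff(woe) <= 0) || all(diff(woe) >= 0)
is_monotone  # TRUE
```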
| Function | Purpose |
|---|---|
| `obwoe()` | Main interface for optimal binning and WoE |
| `obwoe_apply()` | Apply learned binning to new data |
| `obwoe_gains()` | Compute gains table with KS, Gini, lift |
| `step_obwoe()` | tidymodels recipe step |
| `ob_preprocess()` | Data preprocessing with outlier handling |
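Among the metrics `obwoe_gains()` reports, KS is the maximum gap between the cumulative good and bad distributions across ordered bins. For intuition, a hand-rolled sketch with made-up counts (not the package's implementation):

```r
# Toy bin counts, ordered from lowest- to highest-risk bin (hypothetical)
goods <- c(400, 300, 100)
bads  <- c(20, 50, 130)

# Cumulative share of goods and bads captured up to each bin
cum_goods <- cumsum(goods) / sum(goods)
cum_bads  <- cumsum(bads) / sum(bads)

# KS = maximum separation between the two cumulative curves
ks <- max(abs(cum_goods - cum_bads))
round(ks, 3)  # 0.525
```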
```r
library(OptimalBinningWoE)

# 1. Fit binning model on training data
model <- obwoe(
  data = train_data,
  target = "default",
  algorithm = "mob",
  min_bins = 3,
  max_bins = 5
)

# 2. View feature importance by IV
print(model$summary[order(-model$summary$total_iv), ])

# 3. Apply transformation
train_woe <- obwoe_apply(train_data, model)
test_woe <- obwoe_apply(test_data, model)

# 4. Compute performance metrics
gains <- obwoe_gains(model, feature = "income")
print(gains)
plot(gains, type = "ks")
```

OptimalBinningWoE is optimized for speed through its C++ core (Rcpp/RcppEigen).
Typical performance on a standard laptop:
| Data Size | Processing Time |
|---|---|
| 100K rows | < 1 second |
| 1M rows | 2–5 seconds |
| 10M rows | 20-60 seconds |
Contributions are welcome! Please see our Contributing Guidelines and Code of Conduct.
If you use OptimalBinningWoE in your research, please cite:
```bibtex
@software{optimalbinningwoe,
  author = {José Evandeilton Lopes},
  title  = {OptimalBinningWoE: Optimal Binning and Weight of Evidence Framework for Modeling},
  year   = {2026},
  url    = {https://github.com/evandeilton/OptimalBinningWoE}
}
```

MIT License © 2026 José Evandeilton Lopes