The olr package provides a systematic way to identify the best linear regression model by testing all combinations of predictor variables. You can choose to optimize based on either R-squared or adjusted R-squared.
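To make "testing all combinations" concrete, here is a minimal base-R sketch of an exhaustive subset search. It is not the olr package's implementation: `best_subset` is a hypothetical helper written for illustration, and it runs on the built-in `mtcars` data so it works without olr installed. With k candidate predictors there are 2^k - 1 non-empty subsets, so 12 predictors would mean 4095 candidate fits.

```r
# Hypothetical helper for illustration; NOT the olr package's internal code.
# Fits lm() on every non-empty subset of `predictors` and returns the fit with
# the highest R-squared (adjr2 = FALSE) or adjusted R-squared (adjr2 = TRUE).
best_subset <- function(data, response, predictors, adjr2 = TRUE) {
  best_fit <- NULL
  best_score <- -Inf
  for (k in seq_along(predictors)) {
    # All subsets of size k, as a list of character vectors
    for (vars in combn(predictors, k, simplify = FALSE)) {
      fit <- lm(reformulate(vars, response = response), data = data)
      s <- summary(fit)
      score <- if (adjr2) s$adj.r.squared else s$r.squared
      if (score > best_score) {
        best_score <- score
        best_fit <- fit
      }
    }
  }
  best_fit
}

candidates <- c("wt", "hp", "qsec", "drat")  # 2^4 - 1 = 15 candidate models
fit_r2    <- best_subset(mtcars, "mpg", candidates, adjr2 = FALSE)
fit_adjr2 <- best_subset(mtcars, "mpg", candidates, adjr2 = TRUE)
formula(fit_r2)     # plain R-squared favors the largest predictor set
formula(fit_adjr2)  # adjusted R-squared may select a smaller subset
```

Because R-squared cannot decrease when a predictor is added, the plain criterion effectively returns the full model; the adjusted criterion is the one that performs real selection.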
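# Load packages and data
library(olr)
library(ggplot2)  # needed for the plots further down
crudeoildata <- read.csv(system.file("extdata", "crudeoildata.csv", package = "olr"))
# Drop the first column, which is not used in the model
dataset <- crudeoildata[, -1]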
# Define variables
responseName <- 'CrudeOil'
predictorNames <- c('RigCount', 'API', 'FieldProduction', 'RefinerNetInput',
                    'OperableCapacity', 'Imports', 'StocksExcludingSPR',
                    'NonCommercialLong', 'NonCommercialShort',
                    'CommercialLong', 'CommercialShort', 'OpenInterest')
# Full model using R-squared
model_r2 <- olr(dataset, responseName, predictorNames, adjr2 = FALSE)
## Returning model with max R-squared.
##
## Call:
## lm(formula = CrudeOil ~ RigCount + API + FieldProduction + RefinerNetInput +
## OperableCapacity + Imports + StocksExcludingSPR + NonCommercialLong +
## NonCommercialShort + CommercialLong + CommercialShort + OpenInterest,
## data = dataset)
##
## Coefficients:
## (Intercept) RigCount API FieldProduction
## 0.0068578950 -0.3551354134 0.0004393875 0.2670366950
## RefinerNetInput OperableCapacity Imports StocksExcludingSPR
## 0.3535677365 0.0030449534 -0.1034192549 0.7417144521
## NonCommercialLong NonCommercialShort CommercialLong CommercialShort
## -0.5643353759 0.0207113857 -1.3007001952 1.8508558043
## OpenInterest
## -0.0409690597
# Adjusted R-squared model
model_adjr2 <- olr(dataset, responseName, predictorNames, adjr2 = TRUE)
## Returning model with max adjusted R-squared.
##
## Call:
## lm(formula = CrudeOil ~ RigCount + RefinerNetInput + Imports +
## StocksExcludingSPR + NonCommercialLong + CommercialLong +
## CommercialShort, data = dataset)
##
## Coefficients:
## (Intercept) RigCount RefinerNetInput Imports
## 0.008256759 -0.380836990 0.322995592 -0.102405212
## StocksExcludingSPR NonCommercialLong CommercialLong CommercialShort
## 0.694028117 -0.528991035 -1.219766893 1.676484528
# Actual and fitted values
actual <- dataset[[responseName]]
fitted_r2 <- model_r2$fitted.values
fitted_adjr2 <- model_adjr2$fitted.values
# Data frame for ggplot
plot_data <- data.frame(
  Index = seq_along(actual),
  Actual = actual,
  R2_Fitted = fitted_r2,
  AdjR2_Fitted = fitted_adjr2
)
# Plot both fits
ggplot(plot_data, aes(x = Index)) +
  geom_line(aes(y = Actual), color = "black", size = 1, linetype = "dashed") +
  geom_line(aes(y = R2_Fitted), color = "steelblue", size = 1) +
  labs(
    title = "Full Model (R-squared): Actual vs Fitted Values",
    subtitle = "Observation Index used in place of dates (parsed from original dataset)",
    x = "Observation Index",
    y = "CrudeOil % Change"
  ) +
  theme_minimal()
ggplot(plot_data, aes(x = Index)) +
  geom_line(aes(y = Actual), color = "black", size = 1, linetype = "dashed") +
  geom_line(aes(y = AdjR2_Fitted), color = "limegreen", size = 1.1) +
  labs(
    title = "Optimal Model (Adjusted R-squared): Actual vs Fitted Values",
    subtitle = "Observation Index used in place of dates (parsed from original dataset)",
    x = "Observation Index",
    y = "CrudeOil % Change"
  ) +
  theme_minimal() +
  theme(plot.background = element_rect(color = "limegreen", size = 2))
Metric | adjr2 = FALSE (All 12 Predictors) | adjr2 = TRUE (Best Subset of 7 Predictors) |
---|---|---|
Adjusted R-squared | 0.6145 | 0.6531 ✅ (higher is better) |
Multiple R-squared | 0.7018 | 0.699 |
Residual Std. Error | 0.02388 | 0.02265 ✅ (lower is better) |
F-statistic (p-value) | 8.042 (1.88e-07) | 15.26 (3.99e-10) ✅ (stronger model) |
Model Complexity | 12 predictors | 7 predictors ✅ (simpler, more robust) |
Significant Coeffs | 4 | 6 ✅ (more signal, less noise) |
R² Difference | — | ~0.003 ❗ (negligible) |
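
Metrics like those in the table are what `summary()` reports for an `lm` fit. The sketch below shows one way they could be collected side by side. `fit_metrics` is a hypothetical helper, not part of olr; the 0.05 significance threshold and the exclusion of the intercept from the count are assumptions, since the source does not say how significant coefficients were counted. The demo uses `mtcars`; passing `model_r2` and `model_adjr2` instead assumes that `olr()` returns `lm` objects, as the printed `Call:` output suggests.

```r
# Hypothetical helper; collects comparison metrics from an lm fit.
fit_metrics <- function(fit, alpha = 0.05) {
  s <- summary(fit)
  f <- s$fstatistic                            # named: value, numdf, dendf
  coefs <- s$coefficients[-1, , drop = FALSE]  # assumes row 1 is the intercept
  c(
    adj_r2       = s$adj.r.squared,
    r2           = s$r.squared,
    resid_se     = s$sigma,
    f_stat       = unname(f["value"]),
    f_pvalue     = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)),
    n_predictors = nrow(coefs),
    n_signif     = sum(coefs[, "Pr(>|t|)"] < alpha)  # assumed threshold
  )
}

# Stand-in models on built-in data: a larger fit and a nested smaller one
full  <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
small <- lm(mpg ~ wt + hp, data = mtcars)
metrics <- cbind(full = fit_metrics(full), small = fit_metrics(small))
print(signif(metrics, 4))
```
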
- The olr() function automates model selection by testing every valid predictor combination.
- Use adjr2 = TRUE to prioritize models that balance accuracy and parsimony.
- The adjusted R² model outperformed the full model on:
  - Adjusted R²
  - F-statistic
  - Residual error
  - Model simplicity
  - Number of significant coefficients

👉 Use adjusted R² (adjr2 = TRUE) in practice to reduce the risk of overfitting and keep the model interpretable.
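
Why the adjusted criterion guards against overfitting follows from the standard definitions, where n is the number of observations and p the number of predictors (excluding the intercept). In a least-squares fit with an intercept, adding a predictor never lowers R², but the factor (n - 1)/(n - p - 1) grows with p, so adjusted R² rises only when the new predictor reduces the residual variance enough to pay for the lost degree of freedom. The sample size is not stated in the text; n = 54 below is an assumption inferred from the reported R² and adjusted R² of the 12-predictor model, and with it the 7-predictor model reproduces its reported adjusted R² to within rounding.

```latex
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}},
\qquad
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}

% Consistency check against the comparison table, assuming n = 54 (inferred, not stated)
\begin{aligned}
p = 12:&\quad \bar{R}^2 = 1 - (1 - 0.7018)\cdot\tfrac{53}{41} \approx 0.6145 \\
p = 7:&\quad \bar{R}^2 = 1 - (1 - 0.699)\cdot\tfrac{53}{46} \approx 0.653
\end{aligned}
```

Dropping five predictors costs about 0.003 in R² but shrinks the penalty factor from 53/41 to 53/46, which is why the smaller model ends up with the higher adjusted R².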
Created by Mathew Fok • Author of the olr package
Contact: quiksilver67213@yahoo.com