fbrglm is a formula-based front-end for regularized
generalized linear models. Internally it delegates the fit to glmnet;
the wrapper's job is to make the user-facing experience look like base
R's glm(): a formula plus a data.frame, automatic factor handling,
complete-case filtering, and the familiar S3 methods (print,
summary, coef, predict, nobs, plot).
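To make the front-end's job concrete, here is a minimal base-R sketch of the formula-to-matrix step that such a wrapper performs before handing off to glmnet. It is an illustration under stated assumptions, not fbrglm's internal code: the helper name `formula_to_xy()` is hypothetical, and it assumes the formula keeps its intercept (glmnet fits its own, so the column is dropped).

```r
# Hypothetical helper (not part of fbrglm): turn formula + data.frame into
# the (x, y) pair that glmnet() expects, using only base R (stats).
formula_to_xy <- function(formula, data) {
  # model.frame() keeps only the formula's variables; na.omit drops
  # incomplete rows and records their indices in attr(mf, "na.action").
  mf <- model.frame(formula, data = data, na.action = na.omit)
  tt <- attr(mf, "terms")
  # model.matrix() expands factors into dummy columns. glmnet fits its own
  # intercept, so the "(Intercept)" column (assumed present) is dropped.
  x <- model.matrix(tt, mf)[, -1, drop = FALSE]
  y <- model.response(mf)
  list(x = x, y = y, n_dropped = length(attr(mf, "na.action")))
}

toy <- data.frame(
  y  = c(1, 0, 1, 1, 0, NA),
  x1 = c(0.5, -1.2, 0.3, 2.0, -0.7, 0.1),
  g  = factor(c("A", "B", "A", "C", "B", "C"))
)
xy <- formula_to_xy(y ~ x1 + g, toy)
dim(xy$x)        # 5 rows (one incomplete row dropped), 3 columns
colnames(xy$x)   # "x1" "gB" "gC"
```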
The MVP described here is infer = "none": it returns
regularized point estimates and does not report
classical standard errors, z values, p values, or confidence intervals.
Honest post-selection inference (via data splitting or selective
inference) is on the roadmap; see the package TODO.md.
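library(fbrglm)

# Add set.seed() first if you need reproducible numbers.
n <- 150
# y is simulated independently of x1, x2, x3 (pure noise).
dat <- data.frame(
y = rbinom(n, 1, 0.5),
x1 = rnorm(n),
x2 = rnorm(n),
x3 = rnorm(n)
)
fit <- fbrglm(y ~ x1 + x2 + x3, data = dat,
family = "binomial",
lambda = "cv_min")

print() shows the call and the basics of the fit: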
print(fit)
#> Formula-based regularized GLM (fbrglm)
#>
#> Call:
#> fbrglm(formula = y ~ x1 + x2 + x3, data = dat, family = "binomial",
#> lambda = "cv_min")
#>
#> Family: binomial
#> Alpha: 1
#> Lambda rule: cv_min
#> Lambda value: 0.05370831
#> Inference: none
#> Observations: total = 150, dropped = 0, used = 150
#> Non-zero terms: 0 / 3

All three slopes are shrunk to exactly zero, as expected for pure-noise predictors.

summary() returns a structured object that includes the
call, family, chosen λ, complete-case bookkeeping, and the (regularized)
coefficient vector, with zeros included:
summary(fit)
#> fbrglm summary
#> ==============
#>
#> Call:
#> fbrglm(formula = y ~ x1 + x2 + x3, data = dat, family = "binomial",
#> lambda = "cv_min")
#>
#> Family: binomial
#> Lambda method: cv_min
#> Lambda value: 0.05370831
#> Inference: none
#>
#> Observations:
#> total = 150, dropped (missing) = 0, used = 150
#>
#> Coefficients:
#> Estimate
#> (Intercept) -0.08
#> x1 0.00
#> x2 0.00
#> x3 0.00
#>
#> Non-zero predictors (0): (none)
#>
#> Note: no standard errors, z-values, or p-values are reported under
#> infer = "none". Classical inference does not account for the
#> shrinkage bias from L1/L2 penalisation or for the data-driven
#> selection of lambda. Use infer = "split" / "selective" (planned)
#> for valid post-selection inference; see vignette("fbrglm").

Coefficients and predictions follow the same shapes you'd expect from
glm():
coef(fit)
#> (Intercept) x1 x2 x3
#> -0.08004271 0.00000000 0.00000000 0.00000000
head(predict(fit, newdata = dat[1:5, ], type = "response"))
#> [1] 0.48 0.48 0.48 0.48 0.48

A plot() method is registered; it delegates to
plot.cv.glmnet() when λ was chosen by cross-validation and
to plot.glmnet() otherwise.
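The constant 0.48 predictions shown earlier follow directly from coef(fit): every slope is zero, so each row's predicted probability is the inverse logit of the intercept. For a logistic model with an unpenalized intercept and all slopes at zero, that intercept is the logit of the training proportion of 1s, so the printed values are consistent with a proportion of about 0.48. The arithmetic can be checked in base R:

```r
b0 <- -0.08004271   # intercept printed by coef(fit) above
plogis(b0)          # inverse logit of the intercept: about 0.48
qlogis(0.48)        # logit of a 0.48 proportion: about -0.0800427
```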
Choosing λ: there are three rules, all exposed through the single lambda argument:
fit_min <- fbrglm(y ~ x1 + x2 + x3, data = dat,
family = "binomial", lambda = "cv_min")
fit_1se <- fbrglm(y ~ x1 + x2 + x3, data = dat,
family = "binomial", lambda = "cv_1se")
fit_fix <- fbrglm(y ~ x1 + x2 + x3, data = dat,
family = "binomial",
lambda = "fix", lambda_value = 0.05)
c(cv_min = fit_min$lambda_value,
cv_1se = fit_1se$lambda_value,
fix = fit_fix$lambda_value)
#> cv_min cv_1se fix
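#> 0.05370831 0.05370831 0.05000000

"cv_min" and "cv_1se" go through cv.glmnet(); "fix" skips
cross-validation and calls glmnet() directly at the supplied
lambda_value. The numeric λ actually used is always available as
fit$lambda_value. In this pure-noise example the two CV rules happen
to agree: "cv_1se" never picks a smaller λ than "cv_min", and here the
CV minimum likely already sits at the largest λ on the path, where
every slope is zero.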
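The two CV rules presumably mirror cv.glmnet()'s lambda.min and lambda.1se: "cv_min" takes the λ with the smallest mean cross-validated error, and "cv_1se" takes the largest λ whose mean error is within one standard error of that minimum. A base-R sketch of the selection step, on a made-up CV table (the numbers are illustrative, not fbrglm output; `select_lambda()` is a hypothetical helper):

```r
# Hypothetical helper mirroring cv.glmnet()'s lambda.min / lambda.1se rules.
# The "fix" rule bypasses this step entirely.
select_lambda <- function(lambda, cvm, cvsd, rule = c("cv_min", "cv_1se")) {
  rule <- match.arg(rule)
  # which.min() returns the first minimizer; with lambda in decreasing
  # order (as glmnet stores its path) that is the largest tied lambda.
  i_min <- which.min(cvm)
  if (rule == "cv_min") return(lambda[i_min])
  # One-standard-error rule: the most regularized (largest) lambda whose
  # mean CV error does not exceed min error + its standard error.
  threshold <- cvm[i_min] + cvsd[i_min]
  max(lambda[cvm <= threshold])
}

# Illustrative CV table, lambda in decreasing order.
lambda <- c(0.20, 0.10, 0.05, 0.02, 0.01)
cvm    <- c(1.30, 1.12, 1.05, 1.04, 1.07)
cvsd   <- c(0.04, 0.05, 0.06, 0.09, 0.10)

select_lambda(lambda, cvm, cvsd, "cv_min")   # 0.02
select_lambda(lambda, cvm, cvsd, "cv_1se")   # 0.10
```

Because the minimizer always satisfies its own threshold, the one-standard-error rule can only return a λ equal to or larger than the minimum-error one, trading a little CV error for a sparser model.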
Factor columns are expanded into dummy columns via model.matrix(), and
the training factor levels are stored on the fit object, so
predict(newdata = ...) can rebuild a design matrix that
matches the training column structure, even when some training levels
do not occur in newdata.
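Base R's own predict.lm() solves the same problem by recording the training factor levels with .getXlevels() and passing them back to model.frame() through its xlev argument. A sketch of that mechanism follows; whether fbrglm uses exactly this route is an assumption, and the toy data are illustrative.

```r
toy_train <- data.frame(
  y  = c(0, 1, 0, 1, 1, 0),
  x1 = c(0.1, -0.3, 0.5, 1.2, -0.8, 0.0),
  g  = factor(c("A", "B", "C", "A", "B", "C"))
)
tt   <- terms(y ~ x1 + g)
mf   <- model.frame(tt, data = toy_train)
xlev <- .getXlevels(tt, mf)                  # list(g = c("A", "B", "C"))
x_train <- model.matrix(tt, mf)[, -1, drop = FALSE]

# New data: only "A" and "B" appear, and g is a plain character vector.
toy_new <- data.frame(x1 = c(0.2, -0.1), g = c("A", "B"))
tt_x   <- delete.response(tt)                # predictors only; no y needed
mf_new <- model.frame(tt_x, data = toy_new, xlev = xlev)  # re-levels g to A, B, C
x_new  <- model.matrix(tt_x, mf_new)[, -1, drop = FALSE]

colnames(x_train)   # "x1" "gB" "gC"
colnames(x_new)     # same layout as training
```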
n_train <- 200
train <- data.frame(
y = rbinom(n_train, 1, 0.5),
x1 = rnorm(n_train),
g = factor(sample(c("A", "B", "C"), n_train, replace = TRUE),
levels = c("A", "B", "C"))
)
fit_f <- fbrglm(y ~ x1 + g, data = train,
family = "binomial",
lambda = "fix", lambda_value = 0.05)
## newdata has no rows with level "C" (the level itself is kept)
test <- data.frame(
x1 = rnorm(10),
g = factor(rep(c("A", "B"), 5), levels = c("A", "B", "C"))
)
head(predict(fit_f, newdata = test, type = "response"))
#> [1] 0.56 0.56 0.56 0.56 0.56 0.56

fbrglm also tolerates the harder case where newdata's factor has had
its levels narrowed (for example, factor(c("A", "B")) with no "C" level
at all), not just its values: the missing one-hot columns are padded
with zeros before the matrix is handed to glmnet.
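When the levels themselves are narrowed, model.matrix() on newdata alone yields fewer dummy columns than in training. The zero-padding step described above can be sketched as follows; `align_columns()` is a hypothetical helper, not fbrglm's API, and the sketch assumes newdata's factor still has the training baseline level ("A") first.

```r
# Hypothetical helper: reorder and zero-pad a new design matrix so that it
# has exactly the training columns, in the training order.
align_columns <- function(x_new, train_cols) {
  out <- matrix(0, nrow = nrow(x_new), ncol = length(train_cols),
                dimnames = list(rownames(x_new), train_cols))
  common <- intersect(train_cols, colnames(x_new))
  out[, common] <- x_new[, common, drop = FALSE]  # copy shared columns
  # Training columns absent from x_new (here gC) stay 0, the right dummy
  # value for rows that are not level "C". Columns of x_new unseen in
  # training are silently dropped by this sketch.
  out
}

train_cols <- c("x1", "gB", "gC")
toy_new <- data.frame(x1 = c(0.2, -0.1), g = factor(c("A", "B")))  # levels: A, B
x_new <- model.matrix(~ x1 + g, data = toy_new)[, -1, drop = FALSE]
colnames(x_new)                 # "x1" "gB": no gC column
x_aligned <- align_columns(x_new, train_cols)
x_aligned
```

Padding is only safe while the baseline level is unchanged: a factor narrowed to c("B", "C") would make "B" the baseline, so its rows would be coded like "A". Re-leveling to the stored training levels first avoids that pitfall.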
fbrglm() drops rows with any NA from the
modelling frame, prints a one-line note, and records the counts on the
fit object under fit$nobs_info.
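The complete-case bookkeeping can be mimicked in base R with model.frame() and na.omit, which only look at the variables the formula uses. A sketch; `nobs_info_sketch()` is a hypothetical helper that reproduces the three counts, not fbrglm's implementation:

```r
# Hypothetical helper reproducing n_total / n_dropped_missing / n_used for
# the variables named in the formula (other columns are ignored).
nobs_info_sketch <- function(formula, data) {
  mf <- model.frame(formula, data = data, na.action = na.omit)
  # attr(mf, "na.action") holds the dropped row indices (NULL if none).
  n_dropped <- length(attr(mf, "na.action"))
  list(n_total = nrow(data),
       n_dropped_missing = n_dropped,
       n_used = nrow(mf))
}

d <- data.frame(y  = c(1, 0, NA, 1, 0),
                x1 = c(0.3, NA, 1.1, -0.4, 0.8),
                unused = NA)     # all-NA column not in the formula
nobs_info_sketch(y ~ x1, d)      # 5 total, 2 dropped, 3 used
```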
dat_na <- dat
dat_na$y[1:5] <- NA
fit_na <- fbrglm(y ~ x1 + x2 + x3, data = dat_na,
family = "binomial",
lambda = "fix", lambda_value = 0.05)
#> fbrglm: dropped 5 row(s) with missing values (145 / 150 kept).
fit_na$nobs_info
#> $n_total
#> [1] 150
#>
#> $n_dropped_missing
#> [1] 5
#>
#> $n_used
#> [1] 145
nobs(fit_na)
#> [1] 145

An offset supplied at fit time is passed through to glmnet(); at
predict time, pass newoffset with one value per row of newdata. With
newdata = NULL the stored training offset is reused; when newdata is
supplied, an explicit newoffset is required.
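The offset enters the linear predictor additively, one value per row, which is why a stored training offset (of length n_train) cannot simply be attached to new rows. A sketch of the prediction arithmetic for the binomial case; `predict_logistic()` and the intercept value are hypothetical, chosen only to mirror the constant predictions of fit_off:

```r
# Hypothetical helper: logistic predictions from coefficients plus an offset,
# i.e. plogis(b0 + x %*% beta + offset).
predict_logistic <- function(x, b0, beta, offset) {
  stopifnot(length(offset) == nrow(x))    # one offset value per row
  eta <- b0 + drop(x %*% beta) + offset   # linear predictor
  plogis(eta)                             # inverse logit: probabilities
}

x_new <- cbind(x1 = c(0.5, -1.0, 2.0), x2 = c(1.0, 0.0, -0.3))
beta  <- c(x1 = 0, x2 = 0)   # all slopes shrunk to zero
b0    <- 0.05                # illustrative intercept, not taken from fit_off
predict_logistic(x_new, b0, beta, offset = rep(0.2, 3))
# every row gets plogis(0.05 + 0.2) because the slopes are zero
```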
n_off <- 80
dat_off <- data.frame(
y = rbinom(n_off, 1, 0.5),
x1 = rnorm(n_off),
x2 = rnorm(n_off)
)
fit_off <- fbrglm(y ~ x1 + x2, data = dat_off, family = "binomial",
offset = rep(0.2, n_off),
lambda = "fix", lambda_value = 0.05)
head(predict(fit_off, type = "response")) # reuses training offset
#> [1] 0.5625 0.5625 0.5625 0.5625 0.5625 0.5625
head(predict(fit_off, newdata = dat_off[1:5, ],
newoffset = rep(0.2, 5), type = "response"))
#> [1] 0.5625 0.5625 0.5625 0.5625 0.5625

If you need a glmnet-specific tool, two accessors return the
underlying glmnet objects:
class(as_glmnet(fit_min))
#> [1] "lognet" "glmnet"
class(as_cv_glmnet(fit_min))
#> [1] "cv.glmnet"
class(as_glmnet(fit_fix))
#> [1] "lognet" "glmnet"
as_cv_glmnet(fit_fix) # NULL: no CV was run
#> NULL

as_glmnet() returns the underlying glmnet object (the $glmnet.fit
component of the cv.glmnet object when the wrapper used CV).
as_cv_glmnet() returns the cv.glmnet object,
or NULL for the "fix" λ path.
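The accessor behaviour described above amounts to a branch on whether a cv.glmnet object was stored. A sketch with a hypothetical storage layout: the field names cv_fit and glmnet_fit are invented for illustration (fbrglm's actual fields are not documented here), and mock objects stand in for real glmnet results. Only the glmnet.fit component name comes from cv.glmnet itself.

```r
# Hypothetical layout: a CV fit stores the cv.glmnet object in $cv_fit;
# a "fix" fit stores only the glmnet object in $glmnet_fit.
as_glmnet_sketch <- function(fit) {
  if (!is.null(fit$cv_fit)) fit$cv_fit$glmnet.fit else fit$glmnet_fit
}
as_cv_glmnet_sketch <- function(fit) fit$cv_fit   # NULL when no CV was run

# Mock objects standing in for real glmnet / cv.glmnet results.
path   <- structure(list(), class = c("lognet", "glmnet"))
cv_obj <- structure(list(glmnet.fit = path), class = "cv.glmnet")
mock_cv  <- list(cv_fit = cv_obj)
mock_fix <- list(glmnet_fit = path)

class(as_glmnet_sketch(mock_cv))         # "lognet" "glmnet"
class(as_cv_glmnet_sketch(mock_cv))      # "cv.glmnet"
is.null(as_cv_glmnet_sketch(mock_fix))   # TRUE
```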
The MVP is deliberately narrow:

- Inference: infer = "none" is implemented; "split" and "selective" are planned but not in this release.
- Families: gaussian, binomial, and poisson only; multinomial and cox will land later.
- Interface: the x / y direct-matrix entry point is reserved but not yet supported; supply formula + data instead.
- No classical inference output: glm()-style standard errors, z / p values, and confidence intervals are intentionally not shown for infer = "none". Computing them naively for regularized estimators would be misleading; honest inference is the next milestone.

Reproducible benchmarks against raw glmnet,
glmnetUtils, and a parsnip /
workflows pipeline with the glmnet engine live
in a separate repository: https://github.com/dsc-chiba-u/fbrglm-experiments. In
the prediction-failure case (narrowed test factor levels), a naively
built raw glmnet pipeline can fail; parsnip /
workflows succeeds, but with higher runtime overhead than
fbrglm in the tested small-data setting. See the experiments repo for
the CSVs and figures.