Linear models derive prediction intervals analytically from the covariance structure of their parameter estimates. A neural network has no equivalent: it computes a single deterministic output for each input, with no internal distributional model to derive intervals from.
This is not a limitation specific to kerasnip. The entire tidymodels ecosystem reflects the same reality: the parsnip package does not register conf_int or pred_int prediction types for any neural network engine, including brulee, the official tidymodels neural network package.
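By contrast, a plain linear model in base R illustrates the analytic intervals described above; no conformal machinery is needed because the intervals follow directly from the parameter covariance:

```r
# Analytic 90% prediction intervals from an ordinary linear model.
# The columns fit / lwr / upr come straight from the model's
# covariance structure -- a Keras network has no equivalent.
fit <- lm(mpg ~ wt + hp, data = mtcars)
pi <- predict(fit,
              newdata = mtcars[1:3, ],
              interval = "prediction",
              level = 0.90)
pi
```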
The supported and recommended path for obtaining prediction intervals
from neural networks in tidymodels is conformal
inference, provided by the probably package.
Conformal inference is model-agnostic and applied post hoc: no changes to your fit() and predict() calls are required. The probably package provides three conformal methods, described in the sections below.
library(kerasnip)
library(tidymodels)
#> ── Attaching packages ────────────────────────────────────────────────────────────────────────────── tidymodels 1.5.0 ──
#> ✔ broom 1.0.12 ✔ recipes 1.3.2
#> ✔ dials 1.4.3 ✔ rsample 1.3.2
#> ✔ dplyr 1.2.1 ✔ tailor 0.1.0
#> ✔ ggplot2 4.0.3 ✔ tidyr 1.3.2
#> ✔ infer 1.1.0 ✔ tune 2.1.0
#> ✔ modeldata 1.5.1 ✔ workflows 1.3.0
#> ✔ parsnip 1.5.0 ✔ workflowsets 1.1.1
#> ✔ purrr 1.2.2 ✔ yardstick 1.4.0
#> ── Conflicts ───────────────────────────────────────────────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ✖ recipes::step() masks stats::step()
library(probably)
#>
#> Attaching package: 'probably'
#> The following objects are masked from 'package:base':
#>
#> as.factor, as.ordered

All three methods work with any fitted workflows::workflow() object, so the setup is identical to any other kerasnip workflow.
input_block <- function(model, input_shape) {
keras3::keras_model_sequential(input_shape = input_shape)
}
dense_block <- function(model, units = 32) {
model |>
keras3::layer_dense(units = units, activation = "relu")
}
output_block <- function(model) {
model |> keras3::layer_dense(units = 1)
}
create_keras_sequential_spec(
model_name = "conf_mlp",
layer_blocks = list(
input = input_block,
dense = dense_block,
output = output_block
),
mode = "regression"
)
spec <- conf_mlp(dense_units = 32, fit_epochs = 30) |>
set_engine("keras")
data <- modeldata::ames |>
mutate(Sale_Price = as.numeric(Sale_Price)) |>
select(
Sale_Price,
Gr_Liv_Area,
Year_Built,
Garage_Area,
Total_Bsmt_SF
)
rec <- recipe(Sale_Price ~ ., data = data) |>
step_normalize(all_numeric_predictors())
wflow <- workflow(rec, spec)

## Split conformal inference (int_conformal_split)

This is the simplest and fastest method. The workflow is:

1. Fit the workflow on a training set.
2. Pass the fitted workflow and a separate calibration set to int_conformal_split(). The function runs predict() once on the calibration set and records the residuals.
3. Call predict() on the resulting object to obtain intervals for new data.

The model is never re-fitted. The calibration residuals act as a reference distribution for deciding how wide to make the intervals.
set.seed(1)
split <- initial_split(data, prop = 0.75)
train_dat <- training(split)
cal_dat <- testing(split)
# Fit on the training set
fit_obj <- fit(wflow, data = train_dat)
# Build the conformal object from the calibration set
conformal_split <- int_conformal_split(fit_obj, cal_data = cal_dat)
#> 23/23 - 0s - 8ms/step
conformal_split
#> Split Conformal inference
#> preprocessor: recipe
#> model: conf_mlp (engine = keras)
#> calibration set size: 733
#>
#> Use `predict(object, new_data, level)` to compute prediction intervals
# Predict intervals for new observations
new_obs <- cal_dat[1:6, ]
predict(conformal_split, new_data = new_obs, level = 0.90)
#> 1/1 - 0s - 63ms/step
#> # A tibble: 6 × 3
#> .pred .pred_lower .pred_upper
#> <dbl> <dbl> <dbl>
#> 1 59969. -168695. 288633.
#> 2 27399. -201265. 256063.
#> 3 60180. -168485. 288844.
#> 4 43338. -185326. 272002.
#> 5 42925. -185739. 271589.
#> 6 18101. -210563. 246765.

When to use this: when training cost is non-trivial and you can afford to set aside a calibration set. This is the recommended method for kerasnip models in most practical situations.
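Under the hood, split conformal intervals amount to a quantile of the calibration residuals. A simplified base-R sketch (using lm() as a stand-in for the Keras model, and ignoring the finite-sample (n + 1) correction the probably package applies) shows the idea:

```r
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- 3 * x + rnorm(n, sd = 2)

# Fit on the first 150 rows; hold out the rest for calibration
train_idx <- 1:150
fit <- lm(y ~ x, data = data.frame(x = x[train_idx], y = y[train_idx]))

# Absolute residuals on the held-out calibration set
cal <- data.frame(x = x[-train_idx], y = y[-train_idx])
cal_resid <- abs(cal$y - predict(fit, cal))

# The 90% interval half-width is a quantile of those residuals
q <- unname(quantile(cal_resid, probs = 0.90))

# Intervals for new data are simply the point prediction +/- q
new_x <- data.frame(x = c(2, 5, 8))
pred <- unname(predict(fit, new_x))
intervals <- data.frame(.pred       = pred,
                        .pred_lower = pred - q,
                        .pred_upper = pred + q)
intervals
```

The key property is that the model is trained once and the interval width is a single constant derived from the calibration set, which is why this method adds almost no cost on top of an ordinary fit.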
## CV+ conformal inference (int_conformal_cv)

This method uses cross-validation to generate out-of-fold predictions, then pools those residuals to calibrate the intervals. The workflow is:

1. Run tune::fit_resamples() on the workflow with save_pred = TRUE and extract = function(x) x so that both predictions and the fitted model objects are retained for each fold.
2. Pass the resample_results object to int_conformal_cv(). It reads the held-out predictions and uses them to calibrate the intervals.
3. Call predict() on the resulting object to obtain intervals for new data.

The final model is fitted once more on the full training data inside int_conformal_cv().
set.seed(1)
folds <- vfold_cv(data, v = 5)
# Fit the workflow on each fold, retaining predictions and model objects
fitted_folds <- tune::fit_resamples(
wflow,
folds,
control = tune::control_resamples(
save_pred = TRUE,
extract = function(x) x
)
)
#> 19/19 - 0s - 8ms/step
#> 19/19 - 0s - 12ms/step
#> 19/19 - 0s - 9ms/step
#> 19/19 - 0s - 10ms/step
#> 19/19 - 0s - 16ms/step
# Build the conformal object from the resampling results
conformal_cv <- int_conformal_cv(fitted_folds)
conformal_cv
#> Conformal inference via CV+
#> preprocessor: recipe
#> model: conf_mlp (engine = keras)
#> number of models: 5
#> training set size: 2,930
#>
#> Use `predict(object, new_data, level)` to compute prediction intervals
predict(conformal_cv, new_data = data[1:6, ], level = 0.90)
#> 1/1 - 0s - 122ms/step
#> 1/1 - 0s - 150ms/step
#> 1/1 - 0s - 76ms/step
#> 1/1 - 0s - 84ms/step
#> 1/1 - 0s - 86ms/step
#> # A tibble: 6 × 3
#> .pred_lower .pred .pred_upper
#> <dbl> <dbl> <dbl>
#> 1 -192359. 25742. 243844.
#> 2 -198053. 20049. 238151.
#> 3 -200821. 17281. 235383.
#> 4 -150145. 67957. 286059.
#> 5 -185703. 32399. 250501.
#> 6 -186155. 31947. 250049.

When to use this: when you do not want to reserve a separate calibration set and are already running cross-validation to evaluate your model. The cross-validation residuals tend to produce slightly narrower intervals than the split method because they use more of the data. For kerasnip models, note that this refits the model once per fold, which multiplies training time by the number of folds.
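The pooling step can be sketched in base R with manual folds and lm() standing in for the Keras model. This is a simplification of CV+ (it pools absolute out-of-fold residuals into one quantile rather than using the full per-model CV+ construction), but it shows why every row contributes to calibration:

```r
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- 3 * x + rnorm(n, sd = 2)
dat <- data.frame(x = x, y = y)

# Assign each row to one of 5 folds
fold_id <- sample(rep(1:5, length.out = n))

# For each fold, fit on the other folds and record held-out residuals
oof_resid <- unlist(lapply(1:5, function(k) {
  fit_k <- lm(y ~ x, data = dat[fold_id != k, ])
  held  <- dat[fold_id == k, ]
  abs(held$y - predict(fit_k, held))
}))

# Every row is held out exactly once, so all n residuals are pooled
q <- unname(quantile(oof_resid, probs = 0.90))

# Final model on the full data; intervals use the pooled half-width
final_fit <- lm(y ~ x, data = dat)
pred <- unname(predict(final_fit, data.frame(x = 5)))
c(.pred_lower = pred - q, .pred = pred, .pred_upper = pred + q)
```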
## Full conformal inference (int_conformal_full)

Full conformal inference is the most principled of the three methods in terms of statistical guarantees, but it is the most computationally expensive.
The algorithm works observation by observation. For each new test point, it appends that observation to the training set with a candidate outcome value, refits the model from scratch on the augmented dataset, and checks whether the residual for the candidate value is consistent with the training residuals. By sweeping over a grid of candidate values, it determines the interval of values that the algorithm would not reject.
For a kerasnip model, every refit call goes through the full build–compile–train cycle: the Keras architecture is reconstructed from the layer blocks, compiled, and trained from random weight initialisation. This means the total number of training runs is:
(number of new test observations) × (number of candidate grid points)
For the example below with 6 new observations and a 20-point grid,
that is 120 complete model training runs. Even with 30 epochs on a small
dataset this takes meaningful time. In practice,
int_conformal_split or int_conformal_cv are
the recommended choices for kerasnip models. Full conformal inference is
provided for completeness and for situations where the extra statistical
rigour is required and training cost is acceptable.
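The per-candidate refit loop can be sketched for a single test point in base R, with lm() standing in for the Keras model; the function name, grid, and alpha below are illustrative, not the probably package's internals:

```r
# Simplified full-conformal sweep for ONE test point
full_conformal_interval <- function(x_train, y_train, x_new,
                                    grid, alpha = 0.10) {
  keep <- vapply(grid, function(y_cand) {
    # Refit from scratch on the training data plus the candidate point
    aug <- data.frame(x = c(x_train, x_new), y = c(y_train, y_cand))
    fit <- lm(y ~ x, data = aug)
    res <- abs(residuals(fit))
    # Conformal p-value: how typical is the candidate's residual
    # (the last one) among all residuals of the augmented fit?
    p_val <- mean(res >= res[length(res)])
    p_val > alpha  # keep candidate values the procedure cannot reject
  }, logical(1))
  range(grid[keep])
}

set.seed(1)
x <- runif(50, 0, 10)
y <- 3 * x + rnorm(50, sd = 2)
full_conformal_interval(x, y, x_new = 5, grid = seq(-10, 40, by = 0.5))
```

Each iteration of the inner function is a full model refit, which is cheap for lm() but corresponds to an entire build-compile-train cycle for a Keras model; this is exactly where the (n_test × n_grid) cost comes from.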
# Use a small subset to keep runtime reasonable in this vignette
data_small <- data[1:100, ]
new_obs_small <- data[101:106, ]
fit_small <- fit(wflow, data = data_small)
conformal_full <- int_conformal_full(
fit_small,
train_data = data_small,
control = control_conformal_full(
method = "grid",
trial_points = 20
)
)
#> 4/4 - 0s - 34ms/step
conformal_full
#> Conformal inference
#> preprocessor: recipe
#> model: conf_mlp (engine = keras)
#> training set size: 100
#>
#> Use `predict(object, new_data, level)` to compute prediction intervals
predict(conformal_full, new_data = new_obs_small, level = 0.90)
#> 1/1 - 0s - 64ms/step
#> # A tibble: 6 × 2
#> .pred_lower .pred_upper
#> <dbl> <dbl>
#> 1 -452504. 452607.
#> 2 -451266. 451368.
#> 3 -444009. 444106.
#> 4 -471366. 471481.
#> 5 -463133. 463398.
#> 6 -502665. 502800.

When to use this: when you need the strongest possible coverage guarantees and are willing to pay the training cost. For neural networks, prefer int_conformal_split unless you have a specific reason to use full conformal inference.
| Method | Model refits | Coverage guarantee | Recommended for kerasnip |
|---|---|---|---|
| int_conformal_split | 1 (on training data only) | Marginal | Yes, lowest cost |
| int_conformal_cv | 1 per fold + 1 final | Marginal | Yes, if already cross-validating |
| int_conformal_full | n_test × n_grid | Marginal | With caution, expensive |
All three methods provide marginal coverage: across many test observations, at least the requested fraction will have their true outcome within the interval. None of them provide conditional coverage (coverage that holds at each specific input value), but in practice they are well calibrated for most regression problems.
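Marginal coverage is easy to check empirically on held-out data. A base-R sketch, again using split-conformal intervals around an lm() stand-in: the fraction of true outcomes falling inside their 90% intervals should land near 0.90:

```r
set.seed(1)
n <- 1000
x <- runif(n, 0, 10)
y <- 3 * x + rnorm(n, sd = 2)

train <- 1:600; cal <- 601:800; test <- 801:1000
fit <- lm(y ~ x, data = data.frame(x = x[train], y = y[train]))

# Split-conformal half-width from the calibration residuals
q <- unname(quantile(abs(y[cal] - predict(fit, data.frame(x = x[cal]))),
                     probs = 0.90))

# Empirical marginal coverage on the test set
pred <- unname(predict(fit, data.frame(x = x[test])))
coverage <- mean(y[test] >= pred - q & y[test] <= pred + q)
coverage
```

A result close to the nominal 0.90 is what marginal coverage promises; it says nothing about whether coverage is equally good in every region of the input space.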