ErrorTracer provides a complete pipeline for ecological and genomic forecasting with climate or environmental covariates. It bridges the gap between regularized regression and fully Bayesian workflows in three steps: fit a regularized (or plain) model (glmnet, lm, glm, ranger), extract its coefficient estimates, and carry those estimates forward as prior means into a Bayesian model, rather than discarding them.

The package is designed for ecological and genomic time-series forecasting in which a regularized (or plain) regression model is refit as a Bayesian model and predictions must be accompanied by a principled uncertainty budget. The API is fully general: response, predictors, and grouping are formula-driven.
| Feature | Description |
|---|---|
| Regularized → Bayesian prior pipeline | extract_priors() converts glmnet, lm, glm, or ranger coefficients into brms-compatible priors |
| Three-way uncertainty decomposition | Partitions forecast variance into parameter / environmental / residual components |
| Forecast shelf life | Quantifies the exact time point at which a forecast becomes uninformative |
| Group-level fitting | One function call fits and predicts across all groups (e.g., SNP clusters, species, sites) |
| Calibration assessment | Observed vs. nominal coverage probability at multiple CI levels |
| Rich visualizations | Six ggplot2-based plotting functions, all returning customizable objects |
| Full S3 class system | Consistent print, summary, and plot-family methods across all objects |
Install the development version from GitHub using remotes:

```r
# install.packages("remotes")
remotes::install_github("madrigalrocalj/ErrorTracer")
```

ErrorTracer requires a Stan backend. Install either cmdstanr (recommended) or rstan separately:

```r
# Option A: cmdstanr (recommended)
install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
cmdstanr::install_cmdstan()

# Option B: rstan
install.packages("rstan")
```

extract_priors() is an S3 generic that dispatches on the
class of the fitted model. Coefficient estimates (or importance weights
for ranger) are used as prior means; prior SDs are
proportional to coefficient magnitude or standard error.
```r
library(ErrorTracer)
library(glmnet)

# --- Elastic net (cv.glmnet) ---
cv_fit <- cv.glmnet(x_train, y_train, alpha = 0.5)

prior_spec <- extract_priors(
  model = cv_fit,
  lambda = "lambda.min",      # or "lambda.1se"
  multiplier = 2.0,           # prior SD = multiplier * |coef|
  min_sd = 0.1,               # floor for prior SD
  intercept_prior_sd = 2.5,   # intercept prior SD
  sigma_prior_scale = 1.0     # half-Cauchy scale for sigma
)

print(prior_spec)
# ErrorTracer prior specification
#   Method     : glmnet
#   Predictors : 5
#   Multiplier : 2
#   Min SD     : 0.1
#   Coefficients:
#     Tmean   mean =  0.3241   sd = 0.6482
#     PPT     mean = -0.1087   sd = 0.2174
#     ...
```

Supported model classes and their prior construction:
| Class | Prior mean | Prior SD |
|---|---|---|
| cv.glmnet / glmnet | Non-zero coefficients | multiplier × \|coef\| (floor: min_sd) |
| lm | Coefficient estimates | multiplier × SE (floor: min_sd) |
| glm | Coefficient estimates | multiplier × SE (floor: min_sd) |
| ranger | Zero (direction unknown) | multiplier × normalised_importance |
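The glmnet rule in the table can be sketched in a few lines of base R. The coefficient values below are illustrative (chosen to match the printed example above); only the multiplier and floor logic is the point:

```r
# Sketch of the glmnet prior rule: prior SD = multiplier * |coef|,
# floored at min_sd. Coefficient values are illustrative.
coefs <- c(Tmean = 0.3241, PPT = -0.1087, SWE = 0.02)
multiplier <- 2.0
min_sd <- 0.1

prior_mean <- coefs
prior_sd <- pmax(multiplier * abs(coefs), min_sd)

round(prior_sd, 4)
#  Tmean    PPT    SWE
# 0.6482 0.2174 0.1000
```

Note how SWE, whose scaled magnitude (0.04) falls below the floor, receives the min_sd of 0.1 instead.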
For ranger, only variables with positive permutation
importance are included. Because random forests do not produce signed
coefficients, priors are centered at zero and scaled by importance —
encoding “this predictor matters” without a directional claim.
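As a base-R sketch (with hypothetical importance values), the ranger rule amounts to:

```r
# Ranger rule: keep positive permutation importances, normalise them, and use
# them as zero-centered prior scales. Importance values are illustrative.
importance <- c(Tmean = 0.42, PPT = 0.10, SWE = 0.00, Elev = -0.03)
keep <- importance[importance > 0]   # positive importance only
norm_imp <- keep / sum(keep)         # normalised importance
multiplier <- 2.0

priors <- data.frame(
  predictor  = names(keep),
  prior_mean = 0,                          # no directional claim
  prior_sd   = unname(multiplier * norm_imp)
)
```

SWE and Elev are dropped entirely (non-positive importance); the remaining predictors get wider priors in proportion to their share of the total importance.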
Note on centering: The glmnet intercept is intentionally excluded from the prior — it reflects the centering of training data, which may not apply to new data. Center predictors upstream and document this in your analysis.
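One way to center upstream (base R; the data frames and column values below are synthetic) is to compute the training means once and apply the same shift to new data:

```r
# Center predictors with training-set means, then reuse those means for new
# data so train and forecast inputs share the same centering. Data are synthetic.
train <- data.frame(Tmean = c(10, 12, 14), PPT = c(800, 900, 1000))
new   <- data.frame(Tmean = c(13, 15),     PPT = c(950, 1050))

ctr <- colMeans(train)            # training means: Tmean = 12, PPT = 900
train_c <- sweep(train, 2, ctr)   # centered training predictors
new_c   <- sweep(new,   2, ctr)   # same shift applied to forecast data
```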
et_fit() wraps brms::brm() and attaches the
prior specification and training data for downstream use.
```r
fit <- et_fit(
  formula = z_diff ~ Tmean + PPT + SWE,
  data = train_df,
  priors = prior_spec,   # from extract_priors(); or NULL for brms defaults
  chains = 4L,
  iter = 2000L,
  warmup = 1000L,
  cores = 4L,
  seed = 42L,
  adapt_delta = 0.95,
  max_treedepth = 12L
)

print(fit)
# ErrorTracer model (et_model)
#   Formula : z_diff ~ Tmean + PPT + SWE
#   n obs   : 48
#   Chains  : 4   Iter: 2000   Warmup: 1000
#   Priors  : informed (glmnet, 3 predictors)
#   Rhat max: 1.002

summary(fit)
```

Pass a grouping column name to fit one model per group and receive an et_model_list:
```r
# Fits one brms model per SNP cluster
fit_list <- et_fit(
  formula = z_diff ~ Tmean + PPT + SWE,
  data = all_clusters_df,
  priors = prior_spec,      # single spec applied to all groups
  grouping = "cluster_id",  # or pass a named list of et_prior_spec for per-group priors
  chains = 4L,
  iter = 2000L
)

print(fit_list)
# ErrorTracer grouped model list (et_model_list)
#   Grouping : cluster_id
#   Formula  : z_diff ~ Tmean + PPT + SWE
#   Groups   : 12
#   Fitted   : 12 / 12
```

et_predict() draws from the posterior predictive distribution, propagates environmental measurement noise through the model, and computes credible intervals.
```r
pred <- et_predict(
  model = fit,
  newdata = forecast_df,
  env_noise = list(Tmean = 0.5, PPT = 10.0, SWE = 5.0),
  # env_noise can also be a single fraction of each predictor's SD:
  #   env_noise = 0.10 → 10% noise applied uniformly
  n_draws = 2000L,
  ci_levels = c(0.5, 0.8, 0.90, 0.95),
  n_perturb = 500L   # draws used in the perturbation step; reduce for speed
)

print(pred)
# ErrorTracer prediction (et_prediction)
#   Observations : 24
#   Draws        : 2000
#   CI levels    : 0.5, 0.8, 0.9, 0.95
#   Mean var decomposition (across observations):
#     Parameter    : 0.0412
#     Environmental: 0.0083
#     Residual     : 0.1204
#     Total        : 0.1619
```

The returned et_prediction object contains:
| Slot | Description |
|---|---|
| posterior_predict | [n_draws × n_obs] matrix — full posterior predictive draws |
| posterior_linpred | [n_draws × n_obs] matrix — linear predictor draws (parameter uncertainty only) |
| lp_perturbed | [n_perturb × n_obs] matrix — linear predictor under environmentally perturbed inputs |
| sigma_draws | Numeric vector of posterior sigma draws |
| credible_intervals | data.frame with columns row_id, ci_level, lower, median, upper, width |
| decomposition | data.frame with obs_id, param_var, env_var, residual_var, total_var |
| newdata | The input forecast data |
| model | Reference to the source et_model |
decompose_uncertainty() extracts the variance
decomposition table from a prediction object:
```r
decomp <- decompose_uncertainty(pred)
head(decomp)
#   obs_id param_var env_var residual_var total_var
# 1      1    0.0389  0.0071       0.1204    0.1601
# 2      2    0.0441  0.0098       0.1204    0.1637
# ...
```

Decomposition method:
| Component | Estimator |
|---|---|
| param_var | Var[posterior_linpred] across draws — coefficient uncertainty |
| env_var | Var[lp_perturbed] − Var[lp_unperturbed] — additional variance from predictor noise |
| residual_var | E[σ²] across posterior draws — biological process noise |
| total_var | Var[posterior_predict] — full predictive variance |
All components are guaranteed non-negative. Using the posterior mean
of σ² (not the median) ensures
param_var + residual_var ≈ total_var by the law of total
variance.
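The law-of-total-variance claim can be checked numerically with a base-R simulation (the numbers below are arbitrary, not package output):

```r
# Law of total variance: Var[predictive] ≈ Var[linpred] + E[sigma^2]
set.seed(1)
n_draws <- 1e5
linpred <- rnorm(n_draws, mean = 0.2, sd = 0.3)  # parameter uncertainty
sigma   <- 0.5                                   # residual scale
ypred   <- linpred + rnorm(n_draws, 0, sigma)    # predictive draws

param_var    <- var(linpred)   # ≈ 0.09
residual_var <- sigma^2        # 0.25
total_var    <- var(ypred)     # ≈ 0.34, i.e. ≈ param_var + residual_var
```

Substituting the median of σ² would break this additivity whenever the posterior of σ is skewed, which is why the mean is used.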
shelf_life() computes the ratio of credible interval
width to the plausible response range at each forecast time point, and
flags when the forecast becomes uninformative:
```r
# For arcsin-sqrt transformed responses, derive range from training data:
#   plausible_range = range(train_df$z_diff)
# For bounded responses, supply explicit bounds:
#   plausible_range = c(lower, upper)
sl <- shelf_life(
  predictions = pred,
  plausible_range = range(train_df$z_diff),
  ci_level = 0.90,    # must be present in the et_prediction object
  threshold = 1.0,    # CI/range ratio above which forecast is uninformative
  time_col = "year",  # column in newdata to use as time axis
  max_extrapolation_factor = 10  # cap on linear projection beyond observed window
)

print(sl)
# ErrorTracer shelf life analysis
#   Observations    : 24
#   Plausible range : 2
#   Informative     : 18 / 24
#   Mean CI/range   : 0.847
#   Max CI/range    : 1.231

as.data.frame(sl)
#   obs_id time ci_width plausible_range ratio informative
# 1      1 2024    0.841               2 0.421        TRUE
# 2      2 2025    1.103               2 0.552        TRUE
# 3      3 2026    2.461               2 1.231       FALSE
```

A threshold of 1.0 (the default) means the forecast is considered uninformative when the CI spans the entire plausible response range. Lower thresholds (e.g., 0.8) impose stricter requirements.
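The flagging rule reduces to a simple ratio. A base-R sketch (with made-up CI widths that widen over the horizon):

```r
# Shelf-life rule: flag as uninformative once CI width / plausible range
# exceeds the threshold. CI widths below are illustrative.
plausible_range <- c(-1, 1)      # hypothetical response bounds (range = 2)
ci_width <- c(0.8, 1.1, 2.5)     # CI widths widening over the horizon
threshold <- 1.0

ratio <- ci_width / diff(plausible_range)
informative <- ratio <= threshold

data.frame(ci_width, ratio, informative)
#   ci_width ratio informative
# 1      0.8  0.40        TRUE
# 2      1.1  0.55        TRUE
# 3      2.5  1.25       FALSE
```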
et_calibrate() computes observed coverage probability at
each nominal CI level. A well-calibrated model produces coverage that
matches the nominal level:
```r
cal <- et_calibrate(
  predictions = pred,
  observed = validation_df,  # same n_rows as newdata; matched positionally
  response_col = "z_diff",   # inferred from formula if omitted
  ci_levels = c(0.5, 0.8, 0.90, 0.95)
)

print(cal)
#   ci_level nominal observed_coverage n_obs calibration_error
# 1     0.50    0.50             0.542    24             0.042
# 2     0.80    0.80             0.792    24             0.008
# 3     0.90    0.90             0.875    24             0.025
# 4     0.95    0.95             0.958    24             0.008
```

et_diagnose() reports Rhat, effective sample size ratios, divergent transitions, and LOO-CV in a single call:
```r
diag <- et_diagnose(fit, loo = TRUE)

# Convergence
diag$convergence$rhat_max       # should be < 1.05
diag$convergence$neff_all_ok    # all Neff ratios > 0.1
diag$convergence$n_divergences

# LOO-CV
diag$loo$elpd_loo
diag$loo$n_bad_pareto_k         # Pareto k > 0.7 indicates influential observations
```

For grouped models, et_diagnose() returns a per_group list and a summary data.frame with one row per group.
All plotting functions return ggplot2 objects that can
be further customized with standard ggplot2 additions
(+ theme(...), + labs(...), etc.).
```r
# Stacked bar: proportional contribution of each variance component
et_plot_decomposition(decomp, proportional = TRUE)

# Line chart: CI width / plausible range over time, with uninformative threshold
et_plot_shelf_life(sl, show_ratio = TRUE)

# Calibration: observed vs. nominal coverage (points on the 1:1 diagonal = well-calibrated)
et_plot_calibration(cal)

# Fan chart: nested CI ribbons with optional observed values overlaid
et_plot_forecast(pred, observed = validation_df,
                 response_col = "z_diff", time_col = "year")

# Density overlay: prior vs. posterior for each regression coefficient
et_plot_prior_posterior(fit, max_preds = 8L)

# Forest plot: Bayesian 95% CI (blue) vs. regularized coefficient (red ×)
et_plot_coefficients(fit)
```

| Class | Created by | Methods |
|---|---|---|
| et_prior_spec | extract_priors() | print |
| et_model | et_fit() | print, summary |
| et_model_list | et_fit(..., grouping = ...) | print, summary |
| et_prediction | et_predict() | print |
| et_prediction_list | et_predict() on et_model_list | print |
| et_shelf_life | shelf_life() | print, summary |
```r
library(ErrorTracer)
library(glmnet)

# --- Prepare data ---
# train_df: historical years with columns z_diff, Tmean, PPT, SWE
# forecast_df: future years with predictor values only

# 1. Fit elastic net
x_mat <- as.matrix(train_df[, c("Tmean", "PPT", "SWE")])
cv_fit <- cv.glmnet(x_mat, train_df$z_diff, alpha = 0.5)

# 2. Extract informed priors
prior_spec <- extract_priors(cv_fit, multiplier = 2.0, min_sd = 0.1)

# 3. Fit Bayesian model
fit <- et_fit(
  formula = z_diff ~ Tmean + PPT + SWE,
  data = train_df,
  priors = prior_spec,
  chains = 4L, iter = 2000L, cores = 4L
)

# 4. Predict with environmental noise propagation
pred <- et_predict(
  model = fit,
  newdata = forecast_df,
  env_noise = list(Tmean = 0.5, PPT = 10.0, SWE = 5.0),
  ci_levels = c(0.5, 0.80, 0.90, 0.95)
)

# 5. Decompose uncertainty
decomp <- decompose_uncertainty(pred)

# 6. Assess forecast shelf life
# For arcsin-sqrt responses use the observed training range; for bounded
# responses supply explicit bounds.
sl <- shelf_life(
  pred,
  plausible_range = range(train_df$z_diff),
  ci_level = 0.90,
  threshold = 1.0,
  time_col = "year",
  max_extrapolation_factor = 10
)

# 7. Calibrate (if validation data available)
cal <- et_calibrate(pred, observed = valid_df, response_col = "z_diff")

# 8. Diagnose
et_diagnose(fit)

# 9. Visualize
library(ggplot2)
et_plot_decomposition(decomp)
et_plot_shelf_life(sl)
et_plot_calibration(cal)
et_plot_forecast(pred, observed = valid_df,
                 response_col = "z_diff", time_col = "year")
et_plot_prior_posterior(fit)
et_plot_coefficients(fit)
```

```r
# Fit one model per SNP cluster, then predict and summarize shelf life by cluster
fit_list <- et_fit(
  formula = z_diff ~ Tmean + PPT + SWE,
  data = all_df,
  priors = prior_spec,
  grouping = "cluster_id",
  chains = 4L, iter = 2000L, cores = 4L
)

pred_list <- et_predict(
  fit_list,
  newdata = forecast_df,
  env_noise = list(Tmean = 0.5, PPT = 10.0),
  ci_levels = c(0.50, 0.90, 0.95)
)

# Per-group decomposition and shelf life
decomp_all <- decompose_uncertainty(pred_list)
sl_all <- shelf_life(pred_list,
                     plausible_range = range(train_df$z_diff),
                     ci_level = 0.90,
                     time_col = "year",
                     max_extrapolation_factor = 10)

# Per-group diagnostics summary
diag_list <- et_diagnose(fit_list)
diag_list$summary  # data.frame: rhat_ok, neff_ok, n_divergences, elpd_loo per cluster

# Faceted shelf life plot
et_plot_shelf_life(sl_all) +
  ggplot2::facet_wrap(~ group, ncol = 4)
```

| Package | What it does | What is missing |
|---|---|---|
| brms | Bayesian regression | No shelf life, no decomposition, no enet→prior pipeline |
| propagate | Analytical error propagation | No posterior sampling, no Bayesian framework |
| rstanarm | Bayesian regression | Same gaps as brms |
| forecast / fable | Time series forecasting | No ecological/genomic data structures, no decomposition |
| INLA | Spatial/temporal Bayes | No shelf life concept, very different user base |
ErrorTracer’s niche: the full pipeline from regularized feature selection → informed Bayesian refitting → structured uncertainty decomposition → forecast shelf life, designed for ecology and genomics.
```
ErrorTracer/
├── R/
│   ├── priors.R       # extract_priors() — dispatch for glmnet, lm, glm, ranger
│   ├── fit.R          # et_fit() — Bayesian model fitting via brms
│   ├── predict.R      # et_predict() — posterior prediction with decomposition
│   ├── decompose.R    # decompose_uncertainty() — param / env / residual split
│   ├── shelf_life.R   # shelf_life() — forecast horizon analysis
│   ├── calibrate.R    # et_calibrate(), et_diagnose()
│   ├── plot.R         # et_plot_*() — six ggplot2 visualizations
│   └── utils.R        # internal helpers (standardize, logging, etc.)
├── tests/
│   └── testthat/
├── vignettes/
│   ├── genomics.Rmd   # Drosophila allele frequency example
│   └── ecology.Rmd    # Phenology / species abundance example
├── DESCRIPTION
├── NAMESPACE
```
Imports (installed automatically):
| Package | Version | Purpose |
|---|---|---|
| brms | >= 2.20.0 | Bayesian regression via Stan |
| loo | >= 2.6.0 | LOO cross-validation |
| bayesplot | >= 1.10.0 | Posterior predictive checks |
| ggplot2 | >= 3.4.0 | Visualization |
| tidyr | >= 1.3.0 | Data reshaping |
| rlang | >= 1.1.0 | Tidy evaluation |
Suggests (install separately as needed):
| Package | Purpose |
|---|---|
| glmnet | Elastic net / lasso prior extraction |
| ranger | Random forest prior extraction |
| dplyr | Data manipulation in vignettes |
| testthat | Unit testing |
| knitr, rmarkdown | Vignette building |
Backend (required; install separately): cmdstanr (recommended) or rstan.
With n < 10 observations, informed priors from extract_priors() dominate the posterior. This is the setting where the enet→prior pipeline provides the most value, but it is also the most sensitive to prior misspecification. Document prior choices carefully. When no informed prior is available, a weakly informative default (Normal(0, 2.5)) is appropriate.

env_var is estimated via perturbation and is reported separately from total_var, which is computed from the full posterior predictive (parameter + residual only). Interpret the decomposition table accordingly.

ranger priors are undirected: because random forests provide no signed coefficients, ranger-derived priors are centered at zero. This is conservative but avoids false directional assumptions.

MIT License. See LICENSE for details.