This vignette covers Tier 2 of the reproducr workflow in
depth: certify(), check_drift(), and
list_certs(). These three functions together form the
baseline and drift detection system.
You submit a paper in March. Before submission you run the analysis and note the key results: hazard ratio 0.582 (95% CI: 0.446–0.760, p < 0.001).
In May a reviewer asks for a revision. While working on the response
you upgrade your packages — including lme4, which adjusted
its default optimizer tolerances between versions 1.1.29 and 1.1.30. You
re-run the analysis: hazard ratio 0.591 (95% CI: 0.452–0.768).
The numbers are slightly different. No error was thrown. The code is identical. Without a record of what the March run produced, you would not know whether the change came from your revision or from the package upgrade.
[DRIFTED] hr: 0.582 → 0.591
[DRIFTED] ci_lower: 0.446 → 0.452
[DRIFTED] ci_upper: 0.760 → 0.768
With certify() and check_drift(), this is
caught immediately and you can investigate before sending your response
to the reviewer.
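A quick base-R check puts the size of this drift in perspective. It is a sketch, not part of reproducr. Each value moved by roughly 1 to 1.5% relative to March. all.equal() flags that at its default tolerance, but a loose tolerance would accept it silently. The 5% threshold below is an arbitrary illustrative choice.

```r
# Key results from the March and May runs of the example
march <- c(hr = 0.582, ci_lower = 0.446, ci_upper = 0.760)
may   <- c(hr = 0.591, ci_lower = 0.452, ci_upper = 0.768)

# Relative change of each value against the March baseline
rel_change <- abs(may - march) / abs(march)
print(round(rel_change, 4))

# Default tolerance (about 1.5e-8): the change is reported as a difference
strict <- isTRUE(all.equal(march, may))

# An illustrative loose 5% tolerance would accept the change without comment
loose <- isTRUE(all.equal(march, may, tolerance = 0.05))
print(c(strict = strict, loose = loose))
```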
More broadly, packages change hands, maintainers push silent fixes, platform-level libraries (BLAS, LAPACK) get updated by system administrators, and R itself changes RNG defaults between minor versions. Any of these can alter your numerical results without producing an error.
certify() and check_drift() detect this.
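One concrete case of an R-level default change: R 3.6.0 switched the default sample.kind used by sample() from "Rounding" to "Rejection". The base-R sketch below draws with the same seed under both settings. With identical code and seed the draws can differ, and no error is raised. R only warns when the old sampler is selected explicitly. The seed value is arbitrary.

```r
# Save the current RNG settings so they can be restored afterwards
old_kind <- RNGkind()

# Current default sampler ("Rejection" since R 3.6.0)
set.seed(2024, sample.kind = "Rejection")
draw_new <- sample(100, 5)

# Pre-3.6.0 sampler; R warns that it is non-uniform, so suppress that here
suppressWarnings(set.seed(2024, sample.kind = "Rounding"))
draw_old <- sample(100, 5)

# Restore the previous RNG configuration
RNGkind(kind = old_kind[1], normal.kind = old_kind[2], sample.kind = old_kind[3])

print(draw_new)
print(draw_old)
print(identical(draw_new, draw_old))
```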
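The idea is simple: run certify() once to record a hash of each named output under a tag, then on every later run call check_drift() with the same outputs to see which of them still match.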
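Here is a minimal base-R sketch of output fingerprinting. It assumes each object is hashed from its serialized bytes; this vignette does not show reproducr's actual hashing method. hash_object() is a hypothetical helper. It writes serialize() output to a temporary file and hashes it with tools::md5sum(). Identical objects give identical hashes, and any change in value gives a different one.

```r
# Hypothetical helper: MD5 of an object's serialized representation
hash_object <- function(x) {
  path <- tempfile()
  on.exit(unlink(path))
  writeBin(serialize(x, connection = NULL), path)
  unname(tools::md5sum(path))
}

# "Certify": record one hash per named output
outputs  <- list(hr = 0.582, n_obs = 32L)
baseline <- vapply(outputs, hash_object, character(1))

# "Check": recompute hashes on a later run and compare name by name
later   <- list(hr = 0.591, n_obs = 32L)
current <- vapply(later, hash_object, character(1))
print(current == baseline[names(current)])
```

This naive approach has a caveat. serialize() output includes a header recording the R version that wrote it, so these hashes would change after any R upgrade. A real tool needs to hash something more stable.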
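certify(): creating a baseline
Pass a fully named list of any R objects you want to protect. Common choices include model coefficients, fit statistics, sample sizes and summary tables, as in this example:
library(reproducr)
# Certification store used in this example (illustrative path)
cert_file <- tempfile(fileext = ".rds")
model <- lm(mpg ~ wt + cyl, data = mtcars)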
certify(
outputs = list(
coefs = coef(model),
r_squared = summary(model)$r.squared,
sigma = sigma(model),
n_obs = nrow(mtcars),
n_complete = sum(complete.cases(mtcars)),
group_means = aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
),
tag = "baseline-v1",
script = "analysis.R",
file = cert_file
)
#> reproducr: certified 6 output(s) [2026-06-15] under tag 'baseline-v1'
Certify outputs that should be identical every time the same code runs on the same data. Avoid certifying objects that are expected to differ across runs by design, such as proc.time() outputs or Sys.time() values.
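A quick base-R illustration shows why such objects make poor certification targets. Two calls to Sys.time() a moment apart are not identical. Refitting the same model on the same data in the same session, by contrast, reproduces the coefficients exactly. The short Sys.sleep() pause only ensures that the clock has moved.

```r
# Run-dependent value: changes on every call, so it would always "drift"
t1 <- Sys.time()
Sys.sleep(0.05)
t2 <- Sys.time()
print(identical(t1, t2))

# Deterministic output: same code, same data, same session
fit_a <- coef(lm(mpg ~ wt + cyl, data = mtcars))
fit_b <- coef(lm(mpg ~ wt + cyl, data = mtcars))
print(identical(fit_a, fit_b))
```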
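list_certs(): inspecting the store
list_certs() lists every certification in the store, one row per tag, with its timestamp, R version, operating system, number of outputs and script.
list_certs(file = cert_file)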
#> tag timestamp r_version os n_outputs
#> 1 baseline-v1 2026-06-15T17:03:19+0200 4.4.2 Darwin 25.5.0 1
#> 2 pre-peer-review 2026-06-15T17:03:19+0200 4.4.2 Darwin 25.5.0 1
#> 3 post-revision 2026-06-15T17:03:19+0200 4.4.2 Darwin 25.5.0 1
#> script
#> 1 <NA>
#> 2 <NA>
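#> 3 <NA>
check_drift(): comparing against a baseline
Re-run the analysis and pass the same named outputs to check_drift(), naming the baseline tag in against:
model2 <- lm(mpg ~ wt + cyl, data = mtcars)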
result <- check_drift(
outputs = list(
coefs = coef(model2),
r_squared = summary(model2)$r.squared,
sigma = sigma(model2),
n_obs = nrow(mtcars),
n_complete = sum(complete.cases(mtcars)),
group_means = aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
),
against = "baseline-v1",
file = cert_file
)
#> -- reproducr drift check vs 'baseline-v1' --
#> Verdict : ALL OUTPUTS MATCH
#> OK : 1
#> Drifted : 0
#> Missing : 0
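#> New : 5
The four statuses
Each output is classified as ok, drifted, missing or new. The example below certifies three outputs, then checks a set in which one is unchanged, one has changed, one has disappeared and one has been added:
certify(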
outputs = list(
stays_same = 42L,
will_change = coef(lm(mpg ~ wt, data = mtcars)),
will_vanish = "this output disappears next run"
),
tag = "four-statuses",
file = cert_file
)
#> reproducr: certified 3 output(s) [2026-06-15] under tag 'four-statuses'
demo_result <- check_drift(
outputs = list(
stays_same = 42L,
will_change = coef(lm(mpg ~ hp, data = mtcars)),
brand_new = "this output is new"
),
against = "four-statuses",
file = cert_file
)
#> -- reproducr drift check vs 'four-statuses' --
#> Verdict : DRIFT DETECTED
#> OK : 1
#> Drifted : 1
#> Missing : 1
#> New : 1
#> Drifted outputs:
#> - will_change
print(demo_result)
#>
#> -- reproducr drift report --
#>
#> [OK] stays_same
#> [DRIFT] will_change
#> Hash mismatch (numeric tolerance check requires stored values).
#> [NEW] brand_new
#> Not present in the baseline certification.
#> [MISSING] will_vanish
#> Present in baseline but not supplied to check_drift().

| Status | Meaning |
|---|---|
| ok | Hash matches the baseline exactly |
| drifted | Hash differs: the output has changed |
| missing | Present in baseline, not supplied to check_drift() |
| new | Supplied to check_drift(), not in baseline |

Comparing against "latest"
Pass against = "latest" to compare against the most recently certified tag instead of naming one:
certify(outputs = list(x = 1L), tag = "run-1", file = cert_file)
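The classification in this table can be expressed as a comparison of two named hash vectors. The sketch below uses a hypothetical base-R helper, classify_outputs(). It is not reproducr's implementation, and the short strings stand in for real hashes. It reproduces the statuses from the four-statuses example. The output and status column names mirror the fields this vignette's own code reads from check_drift() results.

```r
# Hypothetical helper: classify outputs by comparing named hash vectors
classify_outputs <- function(baseline, current) {
  all_names <- union(names(baseline), names(current))
  status <- vapply(all_names, function(nm) {
    in_base <- nm %in% names(baseline)
    in_curr <- nm %in% names(current)
    if (in_base && !in_curr) return("missing")  # certified, not supplied now
    if (!in_base && in_curr) return("new")      # supplied now, never certified
    if (identical(baseline[[nm]], current[[nm]])) "ok" else "drifted"
  }, character(1))
  data.frame(output = all_names, status = unname(status),
             stringsAsFactors = FALSE)
}

# Placeholder hashes mirroring the 'four-statuses' example
baseline <- c(stays_same = "a1", will_change = "b2", will_vanish = "c3")
current  <- c(stays_same = "a1", will_change = "d4", brand_new = "e5")

result <- classify_outputs(baseline, current)
print(result)
```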
#> reproducr: certified 1 output(s) [2026-06-15] under tag 'run-1'
certify(outputs = list(x = 1L), tag = "run-2", file = cert_file)
#> reproducr: certified 1 output(s) [2026-06-15] under tag 'run-2'
certify(outputs = list(x = 1L), tag = "run-3", file = cert_file)
#> reproducr: certified 1 output(s) [2026-06-15] under tag 'run-3'
check_drift(outputs = list(x = 1L), against = "latest", file = cert_file)
#> reproducr: comparing against latest tag: 'run-3'
#> -- reproducr drift check vs 'run-3' --
#> Verdict : ALL OUTPUTS MATCH
#> OK : 1
#> Drifted : 0
#> Missing : 0
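#> New : 0
Stopping a script on drift
Because the result of check_drift() carries output and status fields, a script (for example, one run in CI) can stop with an informative error as soon as anything has drifted:
# Illustrative placeholder: the named list of outputs you certified earlier
current_outputs <- list(coefs = coef(lm(mpg ~ wt + cyl, data = mtcars)))
result <- check_drift(outputs = current_outputs, against = "latest")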
n_drifted <- sum(result$status == "drifted")
if (n_drifted > 0L) {
drifted_names <- result$output[result$status == "drifted"]
stop(sprintf(
"%d output(s) have drifted since last certification: %s",
n_drifted,
paste(drifted_names, collapse = ", ")
))
}
Commit .reproducr.rds to your Git repository. This gives you a permanent, auditable history of what every run produced, and lets you compare against any past milestone.
Add this line to .gitattributes so that Git treats the store as a binary file and does not produce noisy text diffs:
.reproducr.rds binary