Reproducible ILD workflows with tidyILD provenance

Alex Litovchenko

2026-06-12

tidyILD records provenance at each step: data preparation, centering, lags, alignment, weighting, and model fitting. This vignette shows how to inspect history, generate methods text, build a report, export provenance, and compare two pipelines.

1. Prepare data

library(tidyILD)
set.seed(1)
d <- ild_simulate(n_id = 8, n_obs_per = 10, irregular = TRUE, seed = 42)
x <- ild_prepare(d, id = "id", time = "time", gap_threshold = 7200)

2. Center and lag

x <- ild_center(x, y)
x <- ild_lag(x, y, n = 1, mode = "gap_aware", max_gap = 7200)
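
To make the derived columns concrete, here is a base-R sketch of person-mean centering and a gap-aware lag on a toy data frame. It is an illustration only, not tidyILD's implementation: the toy data, the helper name gap_aware_lag, and the rule that a lag becomes NA when the preceding gap exceeds max_gap are all assumptions. The column names y_bp, y_wp, and y_lag1 mirror the names that the package reports creating.

```r
# Toy long-format data: two people, times in seconds (illustrative values).
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  time = c(0, 3600, 20000, 0, 3600, 7200),
  y    = c(1, 2, 3, 10, 20, 30)
)

# Person-mean centering: between-person part (the person mean) and
# within-person part (deviation from that mean).
toy$y_bp <- ave(toy$y, toy$id, FUN = mean)
toy$y_wp <- toy$y - toy$y_bp

# Hypothetical helper: lag-1 within person, left as NA when the gap to the
# previous observation exceeds max_gap (an assumed reading of "gap_aware").
gap_aware_lag <- function(y, time, id, max_gap) {
  out <- rep(NA_real_, length(y))
  for (i in unique(id)) {
    idx <- which(id == i)
    idx <- idx[order(time[idx])]   # sort this person's rows by time
    if (length(idx) < 2) next
    prev <- idx[-length(idx)]
    curr <- idx[-1]
    ok <- (time[curr] - time[prev]) <= max_gap
    out[curr[ok]] <- y[prev[ok]]
  }
  out
}

toy$y_lag1 <- gap_aware_lag(toy$y, toy$time, toy$id, max_gap = 7200)
toy
```

In this toy example the third row of person 1 follows a gap of 16400 seconds, so its lag stays NA, while every row of person 2 after the first receives the previous value.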

3. Fit model

fit <- ild_lme(y ~ y_bp + y_wp + (1 | id), data = x, ar1 = FALSE, warn_no_ar1 = FALSE)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge with max|grad| = 0.0265615 (tol = 0.002, component 1)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
#>  - Rescale variables?
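
These warnings are expected for this particular formula. Person-mean centering splits y into y_bp and y_wp, so y equals their sum and the model has essentially no residual variation left to estimate. The base-R sketch below reproduces the effect on made-up data, with lm() standing in for the mixed model purely for illustration.

```r
# Toy data: three people, three observations each (illustrative values).
toy <- data.frame(
  id = rep(1:3, each = 3),
  y  = c(1, 2, 4, 10, 13, 11, 20, 26, 23)
)
toy$y_bp <- ave(toy$y, toy$id, FUN = mean)  # person mean
toy$y_wp <- toy$y - toy$y_bp                # within-person deviation

# y is reconstructed exactly from its two parts ...
recon_ok <- isTRUE(all.equal(toy$y, toy$y_bp + toy$y_wp))

# ... so regressing y on both parts gives slopes of 1 and residuals near 0.
fit_toy <- lm(y ~ y_bp + y_wp, data = toy)
round(coef(fit_toy), 6)
max(abs(residuals(fit_toy)))
```

A formula whose outcome is a different variable from the centered predictor would not have this exact-sum property.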

4. Run diagnostics

diag <- ild_diagnostics(fit, type = c("residual_acf", "qq"))

5. Inspect ild_history()

ild_history() prints a human-readable log of all preprocessing and analysis steps recorded on the object.

ild_history(x)
#> ILD provenance (tidyILD 0.4.1)
#>   1. ild_prepare @ 2026-06-12T14:45:00
#>       args: id, time, gap_threshold, duplicate_handling, input_format, wide_cols, wide_names_pattern, wide_time_parser, wide_time_format, wide_keep_cols
#>       outputs: n_id, n_obs, spacing_class, source_was_tsibble, source_was_wide, wide_n_measures, wide_n_time, tsibble_interval_declared
#>   2. ild_center @ 2026-06-12T14:45:00
#>       args: vars, type, naming
#>       outputs: created
#>   3. ild_lag @ 2026-06-12T14:45:00
#>       args: vars, n, mode, max_gap, window, resolution
#>       outputs: created

For a model fit, provenance includes the source data steps plus the analysis step:

ild_history(fit)
#> ILD analysis provenance (tidyILD 0.4.1)
#>   [Source data steps: 3]
#>   1. ild_lme @ 2026-06-12T14:45:01
#>       args: formula, ar1, correlation_class, method, backend
#>       outputs: n_obs, n_id, fit_engine, backend_version
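
One way to picture such a log is as a list of step records carried along as an attribute on the object. The sketch below is a hypothetical base-R illustration of that idea, not tidyILD's internal representation; the attribute name prov_steps and the helper names add_step and show_history are invented for the example.

```r
# Hypothetical helper: append one step record to an object's provenance.
add_step <- function(obj, step, args = list()) {
  steps <- attr(obj, "prov_steps")
  if (is.null(steps)) steps <- list()
  steps[[length(steps) + 1]] <- list(
    step      = step,
    timestamp = format(Sys.time(), "%Y-%m-%dT%H:%M:%S"),
    args      = args
  )
  attr(obj, "prov_steps") <- steps
  obj
}

# Hypothetical helper: print the log, one numbered entry per step.
show_history <- function(obj) {
  steps <- attr(obj, "prov_steps")
  for (i in seq_along(steps)) {
    s <- steps[[i]]
    cat(sprintf("%d. %s @ %s\n", i, s$step, s$timestamp))
    cat(sprintf("     args: %s\n", paste(names(s$args), collapse = ", ")))
  }
  invisible(steps)
}

dat <- data.frame(id = c(1, 1, 2), y = c(3, 4, 5))
dat <- add_step(dat, "prepare", list(id = "id", gap_threshold = 7200))
dat <- add_step(dat, "center", list(vars = "y"))
log_steps <- show_history(dat)
```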

6. Generate ild_methods()

ild_methods() turns provenance into a single methods-style paragraph suitable for a manuscript.

ild_methods(fit)
#> [1] "Data were prepared using ild_prepare() with participant ID id, time variable time, and a gap threshold of 7200 units (spacing class: irregular-ish) (N = 8 persons, n = 80 observations). Predictor(s) y were person-mean centered using ild_center(), creating y_bp, y_wp. A gap_aware lag of y was computed using ild_lag() with lag 1 and max gap 7200, creating y_lag1. A mixed-effects model was fit using ild_lme() (lmer) with formula y ~ y_bp + y_wp + (1 | id) with AR1 disabled (n = 80 observations, N = 8 persons)."

If you reported fixed effects with cluster-robust standard errors (e.g. via tidy_ild_model(fit, se = "robust", robust_type = "CR2")), pass the same type as robust_se so the methods text mentions it:

ild_methods(fit, robust_se = "CR2")
#> [1] "Data were prepared using ild_prepare() with participant ID id, time variable time, and a gap threshold of 7200 units (spacing class: irregular-ish) (N = 8 persons, n = 80 observations). Predictor(s) y were person-mean centered using ild_center(), creating y_bp, y_wp. A gap_aware lag of y was computed using ild_lag() with lag 1 and max gap 7200, creating y_lag1. A mixed-effects model was fit using ild_lme() (lmer) with formula y ~ y_bp + y_wp + (1 | id) with AR1 disabled (n = 80 observations, N = 8 persons). Fixed effects were reported with cluster-robust standard errors (CR2)."
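
Because the result is one long string, it can help to wrap it before pasting it into a draft. The sketch below uses base R's strwrap() on a shortened stand-in string; with the package loaded, the same calls would apply to the value returned by ild_methods(fit).

```r
# Shortened stand-in for the character string returned by ild_methods(fit).
methods_text <- paste(
  "Data were prepared using ild_prepare() with participant ID id,",
  "time variable time, and a gap threshold of 7200 units.",
  "Predictor(s) y were person-mean centered using ild_center(),",
  "creating y_bp, y_wp."
)

# Wrap to a 72-column target and write the lines to a text file.
wrapped <- strwrap(methods_text, width = 72)
out_file <- tempfile(fileext = ".txt")
writeLines(wrapped, out_file)
cat(wrapped, sep = "\n")
```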

7. Run ild_report()

ild_report() assembles a standardized list: meta (n_obs, n_id, engine), methods text, the fixed-effects table, a diagnostics summary, provenance, and an optional export path.

r <- ild_report(fit)
names(r)
#> [1] "meta"                    "methods"                
#> [3] "model_table"             "diagnostics_summary"    
#> [5] "provenance"              "provenance_export_path" 
#> [7] "methods_with_guardrails"
r$meta
#> $n_obs
#> [1] 80
#> 
#> $n_id
#> [1] 8
#> 
#> $engine
#> [1] "lmer"
r$methods
#> [1] "Data were prepared using ild_prepare() with participant ID id, time variable time, and a gap threshold of 7200 units (spacing class: irregular-ish) (N = 8 persons, n = 80 observations). Predictor(s) y were person-mean centered using ild_center(), creating y_bp, y_wp. A gap_aware lag of y was computed using ild_lag() with lag 1 and max gap 7200, creating y_lag1. A mixed-effects model was fit using ild_lme() (lmer) with formula y ~ y_bp + y_wp + (1 | id) with AR1 disabled (n = 80 observations, N = 8 persons)."
r$model_table
#> # A tibble: 3 × 18
#>   term   component effect_level estimate std_error  conf_low conf_high statistic
#>   <chr>  <chr>     <chr>           <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
#> 1 (Inte… fixed     population   5.29e-17  5.16e-17 -4.82e-17  1.54e-16   1.03e 0
#> 2 y_bp   fixed     between      1   e+ 0  5.66e-17  1   e+ 0  1   e+ 0   1.77e16
#> 3 y_wp   fixed     within       1.00e+ 0  8.24e-17  1   e+ 0  1.00e+ 0   1.21e16
#> # ℹ 10 more variables: p_value <dbl>, interval_type <chr>, engine <chr>,
#> #   model_class <chr>, rhat <dbl>, ess_bulk <dbl>, ess_tail <dbl>, pd <dbl>,
#> #   rope_low <dbl>, rope_high <dbl>

The return schema is stable: meta, methods, model_table, diagnostics_summary, provenance, and provenance_export_path. The names() output above also shows a seventh element, methods_with_guardrails.
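
A stable schema means downstream scripts can check for the elements they need before using them. The sketch below shows such a guard in base R; the helper name check_report and the mock list standing in for the result of ild_report(fit) are assumptions for illustration.

```r
# Element names listed in the text as the stable part of the schema.
expected <- c("meta", "methods", "model_table",
              "diagnostics_summary", "provenance", "provenance_export_path")

# Hypothetical helper: stop with a clear message if any element is missing.
check_report <- function(r, required = expected) {
  missing_names <- setdiff(required, names(r))
  if (length(missing_names) > 0) {
    stop("report is missing: ", paste(missing_names, collapse = ", "))
  }
  invisible(TRUE)
}

# Mock object standing in for the result of ild_report(fit).
mock_report <- list(
  meta = list(n_obs = 80, n_id = 8, engine = "lmer"),
  methods = "methods text",
  model_table = data.frame(term = "y_wp"),
  diagnostics_summary = list(),
  provenance = list(),
  provenance_export_path = NA_character_
)

check_report(mock_report)
```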

8. Export provenance

Export the full provenance (data + analysis steps) to JSON or YAML for reproducibility supplements or archiving.

tmp <- tempfile(fileext = ".json")
ild_export_provenance(fit, tmp, format = "json")
readLines(tmp, n = 20)
#>  [1] "{"                                              
#>  [2] "  \"version\": \"0.4.1\","                      
#>  [3] "  \"schema_version\": \"1\","                   
#>  [4] "  \"object_type\": \"ild_model\","              
#>  [5] "  \"source_data_provenance\": {"                
#>  [6] "    \"version\": \"0.4.1\","                    
#>  [7] "    \"schema_version\": \"1\","                 
#>  [8] "    \"object_type\": \"ild_data\","             
#>  [9] "    \"steps\": ["                               
#> [10] "      {"                                        
#> [11] "        \"step_id\": \"1\","                    
#> [12] "        \"step\": \"ild_prepare\","             
#> [13] "        \"timestamp\": \"2026-06-12T14:45:00\","
#> [14] "        \"args\": {"                            
#> [15] "          \"id\": \"id\","                      
#> [16] "          \"time\": \"time\","                  
#> [17] "          \"gap_threshold\": 7200,"             
#> [18] "          \"duplicate_handling\": \"first\","   
#> [19] "          \"input_format\": \"long\","          
#> [20] "          \"wide_cols\": null,"
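
For archiving, it can be useful to store a checksum next to the exported file so that a later reader can confirm the file is unchanged. The sketch below uses tools::md5sum() from the base R distribution on a small stand-in file; the file contents are illustrative and far shorter than a real export. With the package loaded, tmp from the call above would take the place of prov_file.

```r
# Stand-in for a file written by ild_export_provenance(); the contents
# are illustrative only.
prov_file <- tempfile(fileext = ".json")
writeLines(c(
  "{",
  "  \"schema_version\": \"1\",",
  "  \"object_type\": \"ild_model\"",
  "}"
), prov_file)

# Record an MD5 checksum alongside the file.
checksum <- unname(tools::md5sum(prov_file))
writeLines(checksum, paste0(prov_file, ".md5"))

# Later: recompute and compare to confirm the file has not changed.
unchanged <- identical(unname(tools::md5sum(prov_file)),
                       readLines(paste0(prov_file, ".md5")))
unchanged
```

Because each recorded step carries a timestamp, re-running the pipeline produces an export with a different checksum. The checksum therefore identifies one archived file rather than the pipeline itself.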

ild_report() can also write the provenance file in the same call through export_provenance_path:

tmp2 <- tempfile(fileext = ".yaml")
r2 <- ild_report(fit, export_provenance_path = tmp2)
r2$provenance_export_path
#> [1] "/var/folders/3s/mxnfwt994sz7kmhmwt__s12m0000gn/T//RtmpLrRoDb/file2d03ad4a772.yaml"

9. Compare two pipelines

Use ild_compare_pipelines() to see how two objects differ (e.g. a different gap threshold, lag mode, or model formula).

x2 <- ild_prepare(d, id = "id", time = "time", gap_threshold = 3600)
x2 <- ild_center(x2, y)
x2 <- ild_lag(x2, y, n = 1, mode = "index")
fit2 <- ild_lme(y ~ y_bp + y_wp + (1 | id), data = x2, ar1 = FALSE, warn_no_ar1 = FALSE)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge with max|grad| = 0.0265615 (tol = 0.002, component 1)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
#>  - Rescale variables?

cmp <- ild_compare_pipelines(fit, fit2)
cmp
#> Pipeline comparison
#>   Different gap_threshold (ild_prepare): 7200 vs 3600
#>   Different mode (ild_lag): gap_aware vs index
#>   Different max_gap (ild_lag): 7200 vs Inf

This makes it easy to document sensitivity analyses or to check what changed between two analyses.
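
The idea behind such a comparison can be sketched in a few lines of base R: walk over the argument names of two steps and report those whose values differ. This is a hypothetical illustration, not the logic of ild_compare_pipelines(); the helper name diff_args and the two argument lists are made up to mirror the output above.

```r
# Hypothetical helper: describe arguments whose values differ between two
# named lists that belong to the same step.
diff_args <- function(a, b, step) {
  show <- function(v) if (is.null(v)) "<missing>" else format(v)
  keys <- union(names(a), names(b))
  out <- character(0)
  for (k in keys) {
    if (!identical(a[[k]], b[[k]])) {
      out <- c(out, sprintf("Different %s (%s): %s vs %s",
                            k, step, show(a[[k]]), show(b[[k]])))
    }
  }
  out
}

# Illustrative argument lists for the lag step of two pipelines.
lag_a <- list(n = 1, mode = "gap_aware", max_gap = 7200)
lag_b <- list(n = 1, mode = "index", max_gap = Inf)

diffs <- diff_args(lag_a, lag_b, step = "ild_lag")
cat(diffs, sep = "\n")
```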
