The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Developer contracts (package standards)

This vignette embeds the developer contract shipped with the package. Use it when adding engines (KFAS, ctsem, …) or evolving ild_tidy() / ild_augment() / ild_diagnose(). For a code skeleton (S3 stubs, bundle assembly, guardrails, provenance, tests), see inst/dev/backend-adapter-template.R in the source tree.

tidyILD developer contracts (package standards)

Status: Normative for new backends and API evolution. User-facing summaries live in ?ild_diagnostics_bundle, ?ild_tidy_schema, and ?ild_augment_schema; this document is the operational specification for implementers.

Version: Align with tidyILD package version; schema constants live in R/ild_diagnostics_bundle.R, R/ild_schema_tidy_augment.R, and R/ild_guardrail_registry.R.


1. Diagnostics bundle standard (ild_diagnostics_bundle)

1.1 Object-level contract

  • Class: c("ild_diagnostics_bundle", "list").
  • Top-level names (fixed order, non-optional):
    meta, data, design, fit, residual, predictive, missingness, causal, warnings, guardrails, summary_text
    (constant ILD_DIAGNOSTICS_BUNDLE_SLOTS in source).
  • Validation: validate_ild_diagnostics_bundle() — list names must match exactly; warnings and guardrails must be tibbles; guardrails must include the columns in §1.10; summary_text must be character.
  • NULL sections: Any of metacausal may be NULL if not applicable for an engine or run. warnings and guardrails are never NULL (use empty tibbles).

1.2 Section: meta

Field Required Type Semantics
engine Recommended character Backend id: "lmer", "lme", "brms", or future ids (e.g. "kfas").
n_obs, n_id Recommended integer Rows and distinct persons on the ILD used for diagnostics.
ar1 Optional logical Whether residual AR1/CAR1 was used (frequentist path).
brms_types Optional character Subset of diagnostics requested for brms (convergence, sampler, ppc).

Additional named entries are allowed if documented per engine.

1.3 Section: data

Purpose: Observation-level and cohort-level data quality (spacing, gaps, global missingness pattern).

Field Required Semantics
summary Recommended One-row tibble from ild_summary()$summary (e.g. n_id, n_obs, prop_gap, median_dt_sec, iqr_dt_sec).
spacing_class Recommended "regular-ish" or "irregular-ish" (ild_spacing_class()).
n_gaps, pct_gap Recommended Gap counts / percent of intervals flagged as gaps.
median_dt_sec, iqr_dt_sec Optional Timing dispersion.
time_range Optional Numeric range of .ild_time_num or equivalent.
cohort Recommended list(n_id, n_obs, n_intervals) aligned with ild_summary.
obs_per_id Recommended list(min, median, max, sd) observation counts per person (occasion imbalance).
outcome_summaries Optional Tibble per outcome column: mean, sd, min, max, pct_na, n (when model response/predictors passed to ild_diagnose).
missingness_rates Optional Tibble variable, pct_na for those columns.
missing_pattern Optional List: summary, overall, n_complete from ild_missing_pattern() (global columns).

1.4 Section: design

Purpose: Design structure (WP/BP, imbalance, design-time missingness).

Field Required Semantics
ild_design_check Recommended Full return value of ild_design_check() (embedded list object).
spacing_class, recommendation Optional Redundant copies acceptable for fast reporting.
wp_bp Optional Decomposition tibble or NULL if vars omitted.
design_missingness Optional Summaries bundled inside ild_design_check / design slice.
flags Recommended list(has_wp_bp, spacing_class, irregular) for fast checks.
time_coverage Recommended list(min, max, span_sec) from pooled timeline.
occasion_imbalance Recommended Same object as data$obs_per_id (mirrored for visibility).

1.5 Section: fit

Purpose: Estimation diagnostics (convergence, singularity, MCMC).

Frequentist lmerMod: engine, singular, converged, optimizer_messages, optinfo (lmer); reml, optimizer, theta_summary (length/min/max of variance-component vector), X_rank (fixed-effects design rank from qr(X)), residual_correlation (list(modeled, structure, note) — lmer has no AR1/CAR1; modeled reflects attr(ild_ar1)).

Frequentist lme: above where applicable; apVar_ok; residual_correlation with class (ild_correlation_class) and coef_corStruct when present.

Bayesian (brmsfit): engine, ild_posterior, convergence_table, max_rhat, n_divergent, n_max_treedepth_hits; residual_correlation (note only; residual structure is model-specific).

Optional heterogeneity (lmerMod, lme, brmsfit): list(available, reason, object). When available is TRUE, object is an ild_heterogeneity result from [ild_heterogeneity()] (person-specific partial-pooling summaries). When extraction fails (e.g. intercept-only fixed model without random effects, or parsing error), available is FALSE and reason is a short message. ild_autoplot(bundle, section = "fit", type = "heterogeneity", ...) forwards term / heterogeneity_type to [ild_autoplot.ild_heterogeneity()].

1.6 Section: residual

Purpose: Residual behavior (ACF, Q-Q, vs time/fitted).

Field Required Semantics
legacy_ild_diagnostics Optional Full ild_diagnostics() object for frequentist engines (enables plot_ild_diagnostics() / ild_autoplot(bundle, section = "residual", type = NULL) multi-plot or short type names acf / qq / fitted).
residual_sd, cor_observed_fitted Optional Scalar summaries.
engine Recommended Engine id.

Bayesian fits may omit legacy_ild_diagnostics and keep lighter summaries only.

1.7 Section: predictive

Purpose: Predictive checks (obs vs fitted, PPC).

Content Semantics
Frequentist engine, n, mean_abs_error, rmse, mean_residual (bias), max_abs_error, cor_observed_fitted.
brms ppc from ild_brms_ppc_summary() when requested; same observation-level scalars as frequentist when ild_augment succeeds (n, MAE, RMSE, etc.).

1.8 Section: missingness

Purpose: Variable-level missingness aligned to model terms (not duplicate of global data$missing_pattern when both exist).

Field Semantics
note Set when no model variables; else NULL.
summary, overall From ild_missing_pattern(data, vars = predictors) when predictors exist.
pct_na_by_var Subset of summary (variable, pct_na) when available.
missing_model_diagnostic Optional: tidy, message, outcome, predictors when ild_diagnose(..., missing_model = TRUE) runs ild_missing_model().

1.9 Section: causal

Purpose: Causal / weighting diagnostics (IPW, positivity).

Typical content: columns_found, weight_summary (min, max, mean) when .ipw or related columns exist on ild_data. Optional weight_detail (quantiles, sum_w) when ild_diagnose(..., causal_detail = TRUE).

1.10 Tibbles: warnings and guardrails

warnings (event log from software / sampler):

Column Semantics
source e.g. "lme4", "stan", "tidyILD".
level e.g. "warning", "note".
message Human-readable text.
code Short machine id: e.g. lmer_warning, divergent_transitions.

guardrails (methodological rules; see ?guardrail_registry):

Column Semantics
rule_id Stable id (e.g. GR_SINGULAR_RANDOM_EFFECTS).
section Bundle section the rule relates to (fit, design, data, …).
severity e.g. info, warning.
triggered Always TRUE for rows that appear (only triggered rules are stored).
message, recommendation Run-specific or default text.

Surfacing guardrails (package identity):

  • print() on ild_diagnostics_bundle: After the slot listing, prints a Summary block: warning row count, guardrail row count, and (when nrow(guardrails) > 0) highest severity (info < warning) and up to five unique rule_id values. Helpers: guardrail_severity_rank() / guardrail_max_severity() in R/ild_guardrail_registry.R.
  • ild_report(): On successful ild_diagnose(), diagnostics_summary includes guardrails_narrative (short string) and guardrails (n, max_severity, rule_ids). If any guardrails fired, methods_with_guardrails repeats the methods paragraph with the same guardrail sentence appended (equivalent to ild_methods(fit, bundle = diagnose_result)).
  • ild_methods(..., bundle = NULL): Optional bundle from ild_diagnose(fit); when nrow(bundle$guardrails) > 0, appends one sentence after provenance: Methodological cautions (tidyILD guardrails): …
  • New rules: Each new rule_id added to ILD_GUARDRAIL_REGISTRY must have a deterministic regression test that asserts the rule can still be triggered (see tests/testthat/test-guardrails-triggers.R and fit-level tests via evaluate_guardrails_fit() with a constructed fit_diag when full ild_diagnose() is impractical).

Contract regression (semantics, not only shape): tests/testthat/helper-contract-fixtures.R builds stable seeded scenarios (regular vs irregular spacing, IPW instability, late dropout with merged contextual guardrails, merged singular / brms fit-level guardrails). tests/testthat/test-contract-regression.R checks dense bundle sections, expected rule_ids, ild_tidy / ild_augment required columns, core ild_autoplot() routes, and guardrail-aware ild_methods(..., bundle) text. The brms scenario uses skip_on_cran() because it fits a small brms model.

1.11 summary_text

Character vector of short narrative lines for reporting (ild_report, printing). Not a substitute for structured sections.


2. Tidy output standard (ild_tidy())

Schema function: ild_tidy_schema() — constants ILD_TIDY_REQUIRED_COLS, ILD_TIDY_OPTIONAL_COLS.

2.1 Required columns (implemented)

term, component, effect_level, estimate, std_error, conf_low, conf_high, statistic, p_value, interval_type, engine, model_class

Frequentist (tidy_ild_model): interval_type is "Wald"; statistic is the model-reported t (or z-ratio when se = "robust"). brms: interval_type is "quantile" (equal-tailed default intervals); p_value is NA; optional posterior columns filled when intervals = TRUE.

2.2 Optional columns

rhat, ess_bulk, ess_tail, pd, rope_low, rope_high (Bayesian / posterior summaries).

2.3 Semantics: component

Conservative vocabulary (extend with engine-specific values only if documented):

Value Meaning
fixed Standard fixed-effect regression coefficients (current default for all exposed FE rows).
random Random-effect variance / covariance summaries when exposed as rows.
auxiliary Dispersion, residual variance, or residual-correlation parameters (not yet exposed for lme4/nlme in ild_tidy).
scale Dispersion parameters (legacy alias in docs; prefer auxiliary where appropriate).
correlation Correlation structure parameters (e.g. AR1).
nonlinear Smooth / spline terms (e.g. mgcv-style).

2.4 Semantics: effect_level

Value Meaning
population Intercept / cohort-mean type estimands ((Intercept) rows).
within Term name ends with _wp (within-person component).
between Term name ends with _bp (between-person component).
cross_level Interaction term involving both _wp and _bp substrings.
unknown Coefficient rows that are not clearly classified (default for generic predictors).
auxiliary Variance / sampler / state parameters (when such rows exist).

2.5 Semantics: interval_type

Value Meaning
Wald Symmetric interval from SE (normal or t as implemented).
quantile Equal-tailed posterior quantiles.
HPD Highest posterior density interval.
normal Alias for large-sample normal approximation when distinct from Wald.

3. Augment output standard (ild_augment())

Schema function: ild_augment_schema()ILD_AUGMENT_REQUIRED_COLS, ILD_AUGMENT_OPTIONAL_COLS.

3.1 Required columns (implemented)

.ild_id, .ild_time, .outcome, .fitted, .resid, .resid_std, engine, model_class

.resid_std semantics (principled but sparse): set from residuals(fit, type = "pearson") when the engine returns a numeric vector of the correct length; otherwise all NA. Do not populate with arbitrary z-scores of .resid in the same column (avoids mixing definitions). For brms, Pearson residuals are used when available; otherwise NA.

3.2 Optional columns

.fitted_lower, .fitted_upper (e.g. equal-tailed 95% for brms ild_augment), .influence, .state, .state_lower, .state_upper (e.g. latent states; often NA placeholders).

3.3 Reserved prefixes and names

Pattern Reserved for
.ild_* ILD system columns from ild_prepare() (do not overwrite for non-ILD semantics).
.fitted* Fitted / predicted mean response.
.resid* Residuals (raw or transformed).
.state* Latent trajectory / random-effect line values when exposed per row.

The observed response is always .outcome (no duplicate formula-named column in the augmented tibble).


4. Backend adapter checklist

Skeleton: inst/dev/backend-adapter-template.R — commented R patterns for ild_tidy, ild_augment, ild_diagnose, guardrails, provenance, tests, and docs (not sourced by the package; copy into R/ when implementing).

Each new estimation backend (e.g. KFAS, ctsem) should ship:

4.1 S3 methods (operational generics)

Generic Requirement
ild_tidy() Return a tibble aligned with ild_tidy_schema() (required columns when feasible).
ild_augment() Return a tibble aligned with ild_augment_schema(); attach attr(..., "ild_data") on the model object used by diagnostics.
ild_diagnose() Return ild_diagnostics_bundle only; populate all relevant sections; use shared fillers/helpers where possible (fill_diagnostics_*, guardrail evaluation).
ild_autoplot() For bundle: section-first routing (section + type); implement plotters that read bundle slots; call engine-specific code only inside plotters. Attach attr(bundle, "ild_fit") and attr(bundle, "ild_data") from ild_diagnose() so PPC, fitted, and missingness plots work without a second user argument.

4.2 Data attachment

  • Fitted object must carry attr(fit, "ild_data") (validated ILD) for ild_augment / ild_diagnose.

4.3 Provenance

  • Attach attr(fit, "ild_provenance") via ild_new_analysis_provenance() (or equivalent) with step set to the fitting function name (e.g. ild_kfas), serializable args, and outputs summarizing key choices.

4.4 Tests (minimum)

  • Schema: expect_named() / expect_true(all(ild_tidy_schema()$required %in% names(...))) for tidy output (or explicit waiver for transitional columns).
  • Bundle: expect_s3_class(..., "ild_diagnostics_bundle"); expect_no_error(validate_ild_diagnostics_bundle(...)) when constructed via ild_diagnose().
  • Smoke: one small simulated ILD dataset, one fitted model, ild_tidy, ild_augment, ild_diagnose run without error (use skip_if_not_installed for heavy deps).

4.5 Documentation

  • Export methods documented on the same @rdname as existing engines where possible; describe engine-specific fit / residual slots in Details.

5. Tsibble interoperability

Status: Phase 1 — provenance on input; best-effort round-trip via ild_as_tsibble().


6. Cross-backend validation benchmark harness

Normative spec: inst/dev/BACKEND_VALIDATION_BENCHMARK_CONTRACT.md (scenario IDs, tiers, metric columns, artifact layout).

Purpose: Run shared simulation scenarios across lme4 / nlme / brms / KFAS / ctsem entry points (where installed), write benchmark_raw.csv, benchmark_summary.csv, and benchmark_metadata.json, and optionally gate regressions with JSON thresholds in inst/benchmarks/thresholds-*.json.

Implementation (not exported API):

Location Role
tests/testthat/helper-backend-validation-harness.R harness_run_benchmark(), metric extraction, summarization, threshold evaluation.
scripts/run-backend-validation-benchmarks.R CLI runner (--tier, --backends, --n-sim, --seed, --out-dir). Expects package root + pkgload::load_all() or devtools::load_all().
scripts/check-backend-validation-thresholds.R Reads benchmark_summary.csv + thresholds JSON; writes benchmark_checks.csv; exit code 1 on hard failures.
.github/workflows/backend-validation-benchmarks.yml Scheduled / manual CI; uploads artifacts.

When adding a new backend: extend the scenario manifest and harness_fit_one() in the harness helper; document the scenario here and in the benchmark contract; add or adjust skip_if_not_installed() tests in tests/testthat/test-backend-validation-harness.R. Prefer warn-only thresholds for slow or fragile optional engines until metrics stabilize.


7. Temporal dynamics helpers and guardrails

User-facing functions (see vignette("temporal-dynamics-model-choice", package = "tidyILD")):

Function Role
ild_panel_lag_prepare() Multi-variable ild_lag() + single ild_check_lags(); provenance step ild_panel_lag_prepare.
ild_compare_fits() Named list of fits → tibble (aic, bic, n_obs, converged, optional n_guardrails); not an automatic nested-model test.
ild_brms_dynamics_formula() Returns a suggested formula + notes for ild_brms(); does not fit.

Guardrails (registry + evaluate_guardrails_contextual()):


See also

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.