The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
RobustFlow is an R package for auditing temporal drift, subgroup disparities, and robustness in longitudinal decision systems.
It is built around three methodological contributions:
| Metric | Definition | Key function |
|---|---|---|
| Drift Intensity Index (DII) | Frobenius norm of consecutive transition matrix differences | compute_drift() |
| Bias Amplification Index (BAI) | Change in disparity gap from first to last time point | compute_bai() |
| Temporal Fragility Index (TFI) | Minimum hidden-bias perturbation to nullify a trend conclusion | compute_tfi_simple() |
An interactive Shiny application (run_app()) provides a
seven-tab workflow with point-and-click access to all analyses plus
downloadable HTML reports, CSV bundles, and reproducible R scripts.
We construct a panel that mimics a study of mathematics risk status across five elementary school waves, with three SES groups.
set.seed(2024)
n_children <- 200
n_waves <- 5
# Simulate SES group with unequal sizes
ses <- sample(
c("Low SES", "Mid SES", "High SES"),
size = n_children,
replace = TRUE,
prob = c(0.35, 0.40, 0.25)
)
# Risk probability increases over waves for Low SES,
# decreases slightly for High SES
risk_prob <- function(ses_grp, wave) {
base <- switch(ses_grp,
"Low SES" = 0.55,
"Mid SES" = 0.35,
"High SES" = 0.20
)
trend <- switch(ses_grp,
"Low SES" = 0.04,
"Mid SES" = 0.01,
"High SES" = -0.02
)
pmin(pmax(base + trend * (wave - 1), 0.02), 0.98)
}
rows <- lapply(seq_len(n_children), function(i) {
data.frame(
child_id = i,
wave = seq_len(n_waves),
ses_group = ses[i],
school_id = sample(seq_len(20), 1),
risk_math = rbinom(n_waves, 1,
vapply(seq_len(n_waves),
function(w) risk_prob(ses[i], w),
numeric(1)))
)
})
ecls_demo <- do.call(rbind, rows)
head(ecls_demo, 8)
#> child_id wave ses_group school_id risk_math
#> 1 1 1 High SES 5 0
#> 2 1 2 High SES 5 0
#> 3 1 3 High SES 5 0
#> 4 1 4 High SES 5 0
#> 5 1 5 High SES 5 0
#> 6 2 1 Mid SES 14 0
#> 7 2 2 Mid SES 14 0
#> 8 2 3 Mid SES 14 1validate_panel_data() checks variable existence, sorts
the data, and reports panel balance and missingness.
validated <- validate_panel_data(
data = ecls_demo,
id = "child_id",
time = "wave",
decision = "risk_math",
group = "ses_group",
cluster = "school_id"
)
cat(sprintf(
"Individuals: %d\nWaves: %d\nBalanced: %s\n",
validated$n_ids,
validated$n_times,
validated$balanced
))
#> Individuals: 200
#> Waves: 5
#> Balanced: TRUEbuild_paths() constructs each child’s complete risk
trajectory and returns aggregate path frequencies, the pooled transition
matrix, and Shannon entropy.
paths <- build_paths(
data = validated$data,
id = "child_id",
time = "wave",
decision = "risk_math"
)
# Top 10 paths
head(paths$path_counts, 10)
#> path n pct
#> 1 0->0->0->0->0 28 14.0
#> 2 0->0->0->1->0 16 8.0
#> 3 0->0->1->0->0 11 5.5
#> 4 1->0->0->0->0 11 5.5
#> 5 0->0->0->0->1 10 5.0
#> 6 0->0->0->1->1 8 4.0
#> 7 0->0->1->0->1 8 4.0
#> 8 1->1->0->1->1 8 4.0
#> 9 0->1->0->0->0 7 3.5
#> 10 0->1->0->0->1 7 3.5Shannon entropy measures trajectory diversity — higher values mean more heterogeneous individual histories.
df_paths <- head(paths$path_counts, 10)
df_paths$path <- factor(df_paths$path, levels = rev(df_paths$path))
ggplot(df_paths, aes(x = path, y = n)) +
geom_col(fill = "#2c7bb6") +
coord_flip() +
labs(x = "Path", y = "Count",
title = "Top 10 Decision Paths") +
theme_minimal()Top 10 most common five-wave risk trajectories.
compute_drift() estimates a period-specific transition
matrix for each consecutive wave pair and computes the Frobenius norm of
their difference:
\[DII_t = \| P_t - P_{t-1} \|_F\]
A larger DII signals greater structural instability between two periods.
drift <- compute_drift(
data = validated$data,
id = "child_id",
time = "wave",
decision = "risk_math",
normalize = TRUE
)
drift$summary
#> time DII
#> 1 1 NA
#> 2 2 NA
#> 3 3 0.089542
#> 4 4 0.140190
#> 5 5 0.095655
cat(sprintf("Mean DII: %.4f\n", drift$mean_dii))
#> Mean DII: 0.1085
cat(sprintf("Period of max drift: wave %s\n", drift$max_dii_period))
#> Period of max drift: wave 4ggplot(drift$summary, aes(x = time, y = DII)) +
geom_line(color = "#1a9641", linewidth = 1) +
geom_point(size = 2.5) +
labs(x = "Wave", y = "DII",
title = "Drift Intensity Index Over Time") +
theme_minimal()
#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_line()`).
#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_point()`).DII over time. Higher values indicate greater structural change in risk transitions.
compute_group_gaps() computes the at-risk rate for each
SES group at each wave and the pairwise gap (alphabetically first group
minus second).
compute_bai() then summarizes whether that gap widens or
narrows:
\[BAI = Gap_T - Gap_1\]
gaps <- compute_group_gaps(
data = validated$data,
time = "wave",
decision = "risk_math",
group = "ses_group",
focal_value = 1
)
gaps$gap_df
#> time gap
#> 1 1 -0.277778
#> 2 2 -0.534722
#> 3 3 -0.347222
#> 4 4 -0.479167
#> 5 5 -0.590277ggplot(gaps$long_format,
aes(x = time, y = rate, color = group, group = group)) +
geom_line(linewidth = 1) +
geom_point(size = 2.5) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1),
limits = c(0, 1)) +
labs(x = "Wave", y = "At-Risk Rate", color = "SES Group",
title = "Math Risk Trajectories by SES Group") +
theme_minimal() +
theme(legend.position = "bottom")At-risk rates by SES group across five waves.
ggplot(gaps$gap_df, aes(x = time, y = gap)) +
geom_line(color = "#762a83", linewidth = 1) +
geom_hline(yintercept = 0, linetype = "dashed", color = "grey40") +
geom_point(size = 2.5) +
labs(x = "Wave", y = "Gap",
title = "Disparity Gap Over Time") +
theme_minimal()Disparity gap (first two alphabetic SES groups) over time.
bai <- compute_bai(gaps$gap)
cat(sprintf("BAI: %.4f\n", bai$bai))
#> BAI: -0.3125
cat(sprintf("Direction: %s\n", bai$direction))
#> Direction: convergence
cat(sprintf("Gap at t=1: %.4f | Gap at t=T: %.4f\n",
bai$gap_start, bai$gap_end))
#> Gap at t=1: -0.2778 | Gap at t=T: -0.5903The standardized version divides by the SD of the gap series:
bai_std <- compute_bai(gaps$gap, standardize = TRUE)
cat(sprintf("Standardized BAI: %.4f\n", bai_std$bai))
#> Standardized BAI: -2.3995compute_tfi_simple() estimates how much hidden bias —
modelled as a scalar attenuation \(u\)
— would be required to nullify the observed longitudinal trend in DII.
The adjusted slope under perturbation \(u\) is:
\[\hat{\beta}(u) = \hat{\beta}_{obs} \times (1 - u)\]
The TFI is the minimum \(u\) at which the trend reverses or reaches zero.
dii_vals <- drift$summary$DII[!is.na(drift$summary$DII)]
tfi <- compute_tfi_simple(dii_vals)
tfi$summary_table
#> Metric Value
#> 1 Observed slope 0.003056
#> 2 TFI 1
#> 3 Interpretation Moderately robustsc <- tfi$sensitivity_curve
ggplot(sc, aes(x = perturbation, y = adjusted_effect)) +
geom_line(color = "#4575b4", linewidth = 1) +
geom_hline(yintercept = 0, linetype = "dashed", color = "#d73027",
linewidth = 0.8) +
labs(x = "Hidden Bias Parameter (u)",
y = "Adjusted Slope",
title = "TFI Sensitivity Curve") +
theme_minimal()TFI sensitivity curve. The red dashed line marks the null effect threshold.
| TFI value | Interpretation |
|---|---|
Inf |
Conclusion cannot be nullified by scalar attenuation — highly robust |
> 0.5 |
Moderately robust |
0.1 – 0.5 |
Requires scrutiny — modest hidden bias could reverse conclusions |
< 0.1 |
Fragile — small bias sufficient to nullify the trend |
generate_r_script(
id_var = "child_id",
time_var = "wave",
decision_var = "risk_math",
group_var = "ses_group",
cluster_var = "school_id",
focal_value = 1,
output_file = "my_robustflow_analysis.R"
)The generated script is a self-contained .R file that a
collaborator can run to reproduce the full analysis without the Shiny
app.
The app loads in your default browser and provides:
| Tab | Content |
|---|---|
| Data | Upload, variable mapping, balance diagnostics, missingness |
| Paths | Path frequencies, transition matrix, entropy |
| Drift | Prevalence over time, DII trend, period-specific matrices |
| Disparities | Group trajectories, gap evolution, BAI |
| Robustness | TFI, sensitivity curve |
| Intervention | Highest-risk transitions, disparity-generating steps |
| Report | HTML report, CSV bundle, reproducible R script |
If you use RobustFlow in published research, please cite:
Hait, S. (2025). RobustFlow: Robustness and drift auditing for
longitudinal decision systems. R package version 0.1.0.
https://github.com/subirhait/RobustFlow
sessionInfo()
#> R version 4.5.1 (2025-06-13 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26200)
#>
#> Matrix products: default
#> LAPACK version 3.12.1
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> time zone: America/Chicago
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_4.0.2 RobustFlow_0.1.1
#>
#> loaded via a namespace (and not attached):
#> [1] config_0.3.2 plotly_4.11.0 sass_0.4.10 generics_0.1.4
#> [5] tidyr_1.3.2 digest_0.6.37 magrittr_2.0.4 evaluate_1.0.5
#> [9] attempt_0.3.1 grid_4.5.1 RColorBrewer_1.1-3 golem_0.5.1
#> [13] fastmap_1.2.0 jsonlite_2.0.0 promises_1.3.3 httr_1.4.7
#> [17] purrr_1.2.1 viridisLite_0.4.2 scales_1.4.0 lazyeval_0.2.2
#> [21] jquerylib_0.1.4 cli_3.6.5 shiny_1.11.1 rlang_1.2.0
#> [25] withr_3.0.2 cachem_1.1.0 yaml_2.3.10 tools_4.5.1
#> [29] dplyr_1.2.1 httpuv_1.6.16 DT_0.34.0 vctrs_0.7.1
#> [33] R6_2.6.1 mime_0.13 lifecycle_1.0.5 htmlwidgets_1.6.4
#> [37] pkgconfig_2.0.3 pillar_1.11.1 bslib_0.9.0 later_1.4.4
#> [41] gtable_0.3.6 glue_1.8.0 data.table_1.17.8 Rcpp_1.1.0
#> [45] xfun_0.53 tibble_3.3.1 tidyselect_1.2.1 rstudioapi_0.17.1
#> [49] knitr_1.51 farver_2.1.2 xtable_1.8-4 htmltools_0.5.8.1
#> [53] rmarkdown_2.31 labeling_0.4.3 compiler_4.5.1 S7_0.2.0These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.