diagFDR: DIA-NN diagnostics from report.parquet

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

This vignette demonstrates how to run diagFDR on DIA-NN exports and interpret the key diagnostics in terms of scope, calibration, and stability.

Recommended DIA-NN export settings

To enable all diagnostics, export:

decoys: --report-decoys
a permissive export ceiling: --qvalue 0.5 (or higher)

The q-value ceiling matters because some diagnostics operate in low-confidence regions (e.g. equal-chance plausibility checks, or local-window support around cutoffs).

Runnable toy example (no DIA-NN files required)

We start with a small simulated dataset that exercises the diagFDR functions. Any workflow producing outputs that can be mapped to the columns id, is_decoy, q, pep, run, and score can be handled similarly.

library(diagFDR)

set.seed(1)

n <- 3000
toy_global <- data.frame(
  id = paste0("P", seq_len(n)),
  is_decoy = sample(c(FALSE, TRUE), n, replace = TRUE, prob = c(0.97, 0.03)),
  q = pmin(1, runif(n)^3),       # skew toward small q-values
  pep = NA_real_,
  run = NA_character_,
  score = NA_real_
)

x_global <- as_dfdr_tbl(
  toy_global,
  unit = "precursor",
  scope = "global",
  q_source = "toy",
  q_max_export = 0.5
)

diag <- dfdr_run_all(
  xs = list(global = x_global),
  alpha_main = 0.01,
  alphas = c(1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 1e-1, 2e-1),
  low_conf = c(0.2, 0.5)
)

Headline stability at 1%

diag$tables$headline
#> # A tibble: 1 × 24
#>   alpha T_alpha D_alpha FDR_hat CV_hat FDR_minus1 FDR_plus1 FDR_minusK FDR_plusK
#>   <dbl>   <int>   <int>   <dbl>  <dbl>      <dbl>     <dbl>      <dbl>     <dbl>
#> 1  0.01     602      13  0.0216  0.277     0.0199    0.0233    0.00498    0.0382
#> # ℹ 15 more variables: k2sqrtD <int>, FDR_minus2sqrtD <dbl>,
#> #   FDR_plus2sqrtD <dbl>, list <chr>, D_alpha_win <int>, effect_abs <dbl>,
#> #   IPE <dbl>, flag_Dalpha <chr>, flag_CV <chr>, flag_Dwin <chr>,
#> #   flag_IPE <chr>, flag_FDR <chr>, flag_equalchance <chr>, status <chr>,
#> #   interpretation <chr>

Tail support and stability versus threshold

diag$plots$dalpha

diag$plots$cv

Local boundary support

diag$plots$dwin

Threshold elasticity (list sensitivity to changing alpha)

diag$plots$elasticity

Equal-chance plausibility by q-band

diag$tables$equal_chance_pooled
#> # A tibble: 1 × 12
#>   qmax_export low_lo low_hi N_test N_D_test pi_D_hat effect_abs ci95_lo ci95_hi
#>         <dbl>  <dbl>  <dbl>  <int>    <int>    <dbl>      <dbl>   <dbl>   <dbl>
#> 1         0.5    0.2    0.5    596       26   0.0436      0.456  0.0299  0.0632
#> # ℹ 3 more variables: p_value_binom <dbl>, pass_minN <lgl>, list <chr>
diag$plots$equal_chance__global
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
#> ℹ The deprecated feature was likely used in the diagFDR package.
#>   Please report the issue to the authors.
#> This warning is displayed once per session.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.

Real DIA-NN parquet workflow

The following code shows how to run the pipeline on a real DIA-NN report.parquet.

# Requires arrow 
rep <- read_diann_parquet("path/to/report.parquet")

# (A) Global precursor list using Global.Q.Value
# Recommended for experiment-wide (pooled) lists.
x_global_gq <- diann_global_precursor(
  rep,
  q_col = "Global.Q.Value",
  q_max_export = 0.5,
  unit = "precursor",
  scope = "global",
  q_source = "Global.Q.Value"
)

# (B) Run×precursor universe using run-wise Q.Value
# Recommended for per-run decisions / QC.
x_runx <- diann_runxprecursor(
  rep,
  q_col = "Q.Value",
  q_max_export = 0.5,
  id_mode = "runxid",
  unit = "runxprecursor",
  scope = "runwise",
  q_source = "Q.Value"
)

# (C) Scope misuse comparator: min run-wise q over runs per precursor (anti-pattern)
# Useful for demonstrating/diagnosing scope mismatch.
x_minrun <- diann_global_minrunq(
  rep,
  q_col = "Q.Value",
  q_max_export = 0.5,
  unit = "precursor",
  scope = "aggregated",
  q_source = "min_run(Q.Value)"
)

diag <- dfdr_run_all(
  xs = list(global = x_global_gq, runx = x_runx, minrun = x_minrun),
  alpha_main = 0.01,
  compute_pseudo_pvalues = TRUE  # <-- This adds p-value diagnostics
)

# Compare accepted lists across scopes (Jaccard overlap across alpha)
scope_tbl <- dfdr_scope_disagreement(
  x1 = x_global_gq,
  x2 = x_minrun,
  alphas = c(1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2),
  label1 = "Global.Q.Value",
  label2 = "min_run(Q.Value)"
)

# Write outputs to disk (tables + plots; optionally PPTX)
dfdr_write_report(diag, out_dir = "diagFDR_diann_out", formats = c("csv", "png", "manifest", "readme", "summary"))

# Render a single HTML report (requires rmarkdown in Suggests)
dfdr_render_report(diag, out_dir = "diagFDR_diann_out")

Interpretation notes

Scope: run-wise q-values (Q.Value) and global q-values (Global.Q.Value) do not control the same multiple-testing universe. Constructing experiment-wide lists by aggregating run-wise q-values (e.g., taking min(Q.Value) across runs) is generally anti-conservative.
Stability: stringent cutoffs can enter a granular regime where only a few decoys support the boundary. Inspect D_alpha, CV_hat, and the local boundary support D_alpha_win before making strong claims at very small alpha.
Equal-chance diagnostics (decoy fractions in low-confidence q-bands) and PEP reliability are internal consistency checks under target–decoy assumptions; they do not replace external validation (e.g. entrapment) when decoy representativeness is uncertain.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.