This vignette demonstrates how to compute diagFDR diagnostics from mokapot PSM exports.
A key choice is to compute diagnostics on competed
winners: one PSM per run + spectrum_id, selecting
the maximum mokapot score among candidates (targets and decoys). This
corresponds to the standard target–decoy competition setting
for PSM-level inference.
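The competition step can be sketched in a few lines of base R. The candidate table and its values below are hypothetical; in practice the package helper mokapot_competed_universe(), used later in this vignette, performs this step:

```r
# Hypothetical candidate table: targets and decoys competing per spectrum.
cand <- data.frame(
  run         = c("run1", "run1", "run1", "run1"),
  spectrum_id = c("scan1", "scan1", "scan2", "scan2"),
  is_decoy    = c(FALSE, TRUE, FALSE, TRUE),
  score       = c(2.1, 1.3, -0.4, 0.8)
)

# Sort so the highest-scoring candidate within each run + spectrum_id comes
# first, then keep only that first (winning) row per group.
cand <- cand[order(cand$run, cand$spectrum_id, -cand$score), ]
winners <- cand[!duplicated(cand[, c("run", "spectrum_id")]), ]
winners  # scan1 -> target wins (2.1), scan2 -> decoy wins (0.8)
```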
The workflow is:

1. Read the mokapot target and decoy PSM exports.
2. Construct competed winners (one PSM per run + spectrum_id, keeping the maximum mokapot score).
3. Run dfdr_run_all() and inspect the resulting tables and plots.

We start with a small simulated dataset that resembles a competed-winner mokapot output. Any workflow producing a table whose columns can be mapped to id, is_decoy, q, pep, run, and score can be used in the same way.
library(diagFDR)
set.seed(2)
n <- 6000
toy <- data.frame(
  id = paste0("run1||scan", seq_len(n)),
  is_decoy = sample(c(FALSE, TRUE), n, replace = TRUE, prob = c(0.98, 0.02)),
  q = pmin(1, runif(n)^3), # many small q-values
  pep = pmin(1, pmax(0, runif(n)^3 + rnorm(n, sd = 0.02))), # correlated toy PEP
  run = "run1",
  score = rnorm(n)
)
x <- as_dfdr_tbl(
  toy,
  unit = "psm",
  scope = "global",
  q_source = "toy mokapot",
  q_max_export = 1
)
diag <- dfdr_run_all(
  xs = list(mokapot = x),
  alpha_main = 0.01,
  alphas = c(5e-4, 1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 1e-1, 2e-1),
  low_conf = c(0.2, 0.5)
)

diag$tables$headline
#> # A tibble: 1 × 24
#> alpha T_alpha D_alpha FDR_hat CV_hat FDR_minus1 FDR_plus1 FDR_minusK FDR_plusK
#> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.01 1248 33 0.0264 0.174 0.0256 0.0272 0.0184 0.0345
#> # ℹ 15 more variables: k2sqrtD <int>, FDR_minus2sqrtD <dbl>,
#> # FDR_plus2sqrtD <dbl>, list <chr>, D_alpha_win <int>, IPE <dbl>,
#> # effect_abs <dbl>, flag_Dalpha <chr>, flag_CV <chr>, flag_Dwin <chr>,
#> # flag_IPE <chr>, flag_FDR <chr>, flag_equalchance <chr>, status <chr>,
#> #   interpretation <chr>

diag$tables$equal_chance_pooled
#> # A tibble: 1 × 12
#> qmax_export low_lo low_hi N_test N_D_test pi_D_hat effect_abs ci95_lo ci95_hi
#> <dbl> <dbl> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.2 0.5 1307 38 0.0291 0.471 0.0213 0.0397
#> # ℹ 3 more variables: p_value_binom <dbl>, pass_minN <lgl>, list <chr>
diag$plots$equal_chance__mokapot

diag$tables$pep_IPE
#> # A tibble: 1 × 2
#> list IPE
#> <chr> <dbl>
#> 1 mokapot 0.0739
diag$plots$pep_reliability__mokapot

# sumpep is a list-of-tibbles (one per list)
diag$tables$sumpep$mokapot
#> # A tibble: 9 × 4
#> alpha n_targets sum_PEP mean_PEP
#> <dbl> <int> <dbl> <dbl>
#> 1 0.0005 443 107. 0.241
#> 2 0.001 559 141. 0.253
#> 3 0.002 721 177. 0.246
#> 4 0.005 985 239. 0.243
#> 5 0.01 1248 304. 0.244
#> 6 0.02 1584 394. 0.249
#> 7 0.05 2155 527. 0.244
#> 8 0.1 2703 654. 0.242
#> 9 0.2 3390 822. 0.242

The code below shows how to run diagFDR directly from mokapot outputs.
library(diagFDR)
# Read mokapot outputs (targets + decoys)
raw <- read_mokapot_psms(
  target_path = "path/to/your_output.mokapot.psms.txt",
  decoy_path = "path/to/your_output.mokapot.decoy.psms.txt"
)
# Construct competed winners (1 per run+spectrum_id; max mokapot score)
x <- mokapot_competed_universe(
  raw,
  id_mode = "runxid",
  unit = "psm",
  scope = "global",
  q_source = "mokapot q-value",
  q_max_export = 1
)
# Run diagnostics
diag <- dfdr_run_all(
  xs = list(mokapot = x),
  alpha_main = 0.01,
  alphas = c(5e-4, 1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 1e-1, 2e-1),
  low_conf = c(0.2, 0.5)
)
# Write outputs to disk (tables + plots + summary + manifest + README)
dfdr_write_report(
  diag,
  out_dir = "diagFDR_mokapot_out",
  formats = c("csv", "png", "manifest", "readme", "summary")
)
# Render a single HTML report (requires rmarkdown in Suggests)
dfdr_render_report(diag, out_dir = "diagFDR_mokapot_out")

D_alpha (tail support) and D_alpha_win
(local boundary support) quantify how well the cutoff is supported by
decoys. Very strict thresholds can yield sparse decoy tails and thus
unstable operating points (large CV_hat).
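As an illustration, the headline values reported earlier are consistent with the simple estimates FDR_hat = D_alpha / T_alpha and CV_hat = 1/sqrt(D_alpha), a Poisson-style approximation for the decoy count. These closed forms are an assumption here, not taken from the package internals:

```r
# Values from the headline table at alpha = 0.01 (see above).
T_alpha <- 1248   # accepted targets
D_alpha <- 33     # accepted decoys

FDR_hat <- D_alpha / T_alpha   # decoy-based FDR estimate
CV_hat  <- 1 / sqrt(D_alpha)   # relative uncertainty of the decoy count

round(c(FDR_hat = FDR_hat, CV_hat = CV_hat), 3)
#> FDR_hat = 0.026, CV_hat = 0.174
```

With only 33 decoys past the cutoff, the 17% relative uncertainty shows why sparse decoy tails make the operating point unstable.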
S_alpha(eps) (elasticity) captures list sensitivity
to small threshold perturbations. Low Jaccard overlap indicates that
accepted IDs are sensitive to the chosen cutoff.
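A minimal sketch of the idea: the Jaccard definition is standard, but the q-values, IDs, and thresholds below are made up for illustration:

```r
# Jaccard overlap between two accepted-ID sets.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Toy q-values, and the set of IDs accepted at a given cutoff.
q <- c(id1 = 0.004, id2 = 0.009, id3 = 0.011, id4 = 0.030)
ids_at <- function(alpha) names(q)[q <= alpha]

# Perturb the cutoff by 20%: id3 enters, shrinking the overlap to 2/3.
jaccard(ids_at(0.01), ids_at(0.01 * 1.2))
#> [1] 0.6666667
```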
Equal-chance diagnostics (decoy fraction in low-confidence q-bands) and PEP-based diagnostics (PEP reliability, ΣPEP) are internal consistency checks under target–decoy assumptions. They are informative but do not replace external validation strategies (e.g., entrapment) when decoy representativeness is uncertain.
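For intuition, the pooled equal-chance row shown earlier can be reproduced from its counts. The sketch below assumes the null is a 50/50 target/decoy split in the low-confidence band, an assumption consistent with effect_abs = |0.5 - pi_D_hat| in that table but not confirmed from the package internals:

```r
# Counts from the pooled equal-chance table above (q-band 0.2-0.5).
N_test <- 1307   # PSMs in the low-confidence band
N_D    <- 38     # decoys among them

pi_D_hat   <- N_D / N_test         # observed decoy fraction
effect_abs <- abs(0.5 - pi_D_hat)  # distance from the equal-chance null
p_value    <- binom.test(N_D, N_test, p = 0.5)$p.value  # exact binomial test

round(c(pi_D_hat = pi_D_hat, effect_abs = effect_abs), 3)
#> pi_D_hat = 0.029, effect_abs = 0.471
```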