This vignette demonstrates how to compute diagFDR diagnostics from mokapot PSM exports.
A key choice is to compute diagnostics on competed
winners: one PSM per run + spectrum_id, selecting
the maximum mokapot score among candidates (targets and decoys). This
corresponds to the standard target–decoy competition setting
for PSM-level inference.
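The competition step can be sketched in a few lines of base R. The candidate table and its values below are hypothetical; in practice the package helper mokapot_competed_universe(), used later in this vignette, performs this step:

```r
# Hypothetical candidate table: targets and decoys competing per spectrum.
cand <- data.frame(
  run         = c("run1", "run1", "run1", "run1"),
  spectrum_id = c("scan1", "scan1", "scan2", "scan2"),
  is_decoy    = c(FALSE, TRUE, FALSE, TRUE),
  score       = c(2.1, 1.3, -0.4, 0.8)
)

# Sort so the highest-scoring candidate within each run + spectrum_id comes
# first, then keep only that first (winning) row per group.
cand <- cand[order(cand$run, cand$spectrum_id, -cand$score), ]
winners <- cand[!duplicated(cand[, c("run", "spectrum_id")]), ]
winners  # scan1 -> target wins (2.1), scan2 -> decoy wins (0.8)
```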
The workflow is:

1. Read the mokapot target and decoy PSM exports.
2. Construct competed winners (one PSM per run + spectrum_id, keeping the maximum mokapot score).
3. Run dfdr_run_all() and inspect the resulting tables and plots.

We start with a small simulated dataset that resembles a competed-winner mokapot output. Any workflow producing a table whose columns can be mapped to id, is_decoy, q, pep, run, and score can be used in the same way.
library(diagFDR)
set.seed(2)
n <- 6000
toy <- data.frame(
  id = paste0("run1||scan", seq_len(n)),
  is_decoy = sample(c(FALSE, TRUE), n, replace = TRUE, prob = c(0.98, 0.02)),
  q = pmin(1, runif(n)^3), # many small q-values
  pep = pmin(1, pmax(0, runif(n)^3 + rnorm(n, sd = 0.02))), # correlated toy PEP
  run = "run1",
  score = rnorm(n)
)
x <- as_dfdr_tbl(
  toy,
  unit = "psm",
  scope = "global",
  q_source = "toy mokapot",
  q_max_export = 1
)
diag <- dfdr_run_all(
  xs = list(mokapot = x),
  alpha_main = 0.01,
  alphas = c(5e-4, 1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 1e-1, 2e-1),
  low_conf = c(0.2, 0.5)
)

diag$tables$headline
#> # A tibble: 1 × 24
#> alpha T_alpha D_alpha FDR_hat CV_hat FDR_minus1 FDR_plus1 FDR_minusK FDR_plusK
#> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.01 1248 33 0.0264 0.174 0.0256 0.0272 0.0184 0.0345
#> # ℹ 15 more variables: k2sqrtD <int>, FDR_minus2sqrtD <dbl>,
#> # FDR_plus2sqrtD <dbl>, list <chr>, D_alpha_win <int>, IPE <dbl>,
#> # effect_abs <dbl>, flag_Dalpha <chr>, flag_CV <chr>, flag_Dwin <chr>,
#> # flag_IPE <chr>, flag_FDR <chr>, flag_equalchance <chr>, status <chr>,
#> #   interpretation <chr>

diag$tables$equal_chance_pooled
#> # A tibble: 1 × 12
#> qmax_export low_lo low_hi N_test N_D_test pi_D_hat effect_abs ci95_lo ci95_hi
#> <dbl> <dbl> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.2 0.5 1307 38 0.0291 0.471 0.0213 0.0397
#> # ℹ 3 more variables: p_value_binom <dbl>, pass_minN <lgl>, list <chr>
diag$plots$equal_chance__mokapot

diag$tables$pep_IPE
#> # A tibble: 1 × 2
#> list IPE
#> <chr> <dbl>
#> 1 mokapot 0.0739
diag$plots$pep_reliability__mokapot

# sumpep is a list-of-tibbles (one per list)
diag$tables$sumpep$mokapot
#> # A tibble: 9 × 4
#> alpha n_targets sum_PEP mean_PEP
#> <dbl> <int> <dbl> <dbl>
#> 1 0.0005 443 107. 0.241
#> 2 0.001 559 141. 0.253
#> 3 0.002 721 177. 0.246
#> 4 0.005 985 239. 0.243
#> 5 0.01 1248 304. 0.244
#> 6 0.02 1584 394. 0.249
#> 7 0.05 2155 527. 0.244
#> 8 0.1 2703 654. 0.242
#> 9 0.2 3390 822. 0.242

The code below shows how to run diagFDR directly from mokapot outputs.
library(diagFDR)
# Read mokapot outputs (targets + decoys)
raw <- read_mokapot_psms(
  target_path = "path/to/your_output.mokapot.psms.txt",
  decoy_path = "path/to/your_output.mokapot.decoy.psms.txt"
)
# Construct competed winners (1 per run+spectrum_id; max mokapot score)
x <- mokapot_competed_universe(
  raw,
  id_mode = "runxid",
  unit = "psm",
  scope = "global",
  q_source = "mokapot q-value",
  q_max_export = 1
)
# Run diagnostics
diag <- dfdr_run_all(
  xs = list(mokapot = x),
  alpha_main = 0.01,
  alphas = c(5e-4, 1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 1e-1, 2e-1),
  low_conf = c(0.2, 0.5)
)
# Write outputs to disk (tables + plots + summary + manifest + README)
dfdr_write_report(
  diag,
  out_dir = "diagFDR_mokapot_out",
  formats = c("csv", "png", "manifest", "readme", "summary")
)
# Render a single HTML report (requires rmarkdown in Suggests)
dfdr_render_report(diag, out_dir = "diagFDR_mokapot_out")

D_alpha (tail support) and D_alpha_win
(local boundary support) quantify how well the cutoff is supported by
decoys. Very strict thresholds can yield sparse decoy tails and thus
unstable operating points (large CV_hat).
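As an illustration, the headline values reported earlier are consistent with the simple estimates FDR_hat = D_alpha / T_alpha and CV_hat = 1/sqrt(D_alpha), a Poisson-style approximation for the decoy count. These closed forms are an assumption here, not taken from the package internals:

```r
# Values from the headline table at alpha = 0.01 (see above).
T_alpha <- 1248   # accepted targets
D_alpha <- 33     # accepted decoys

FDR_hat <- D_alpha / T_alpha   # decoy-based FDR estimate
CV_hat  <- 1 / sqrt(D_alpha)   # relative uncertainty of the decoy count

round(c(FDR_hat = FDR_hat, CV_hat = CV_hat), 3)
#> FDR_hat = 0.026, CV_hat = 0.174
```

With only 33 decoys past the cutoff, the 17% relative uncertainty shows why sparse decoy tails make the operating point unstable.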
S_alpha(eps) (elasticity) captures list sensitivity
to small threshold perturbations. Low Jaccard overlap indicates that
accepted IDs are sensitive to the chosen cutoff.
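A minimal sketch of the idea: the Jaccard definition is standard, but the q-values, IDs, and thresholds below are made up for illustration:

```r
# Jaccard overlap between two accepted-ID sets.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Toy q-values, and the set of IDs accepted at a given cutoff.
q <- c(id1 = 0.004, id2 = 0.009, id3 = 0.011, id4 = 0.030)
ids_at <- function(alpha) names(q)[q <= alpha]

# Perturb the cutoff by 20%: id3 enters, shrinking the overlap to 2/3.
jaccard(ids_at(0.01), ids_at(0.01 * 1.2))
#> [1] 0.6666667
```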
Equal-chance diagnostics (decoy fraction in low-confidence q-bands) and PEP-based diagnostics (PEP reliability, ΣPEP) are internal consistency checks under target–decoy assumptions. They are informative but do not replace external validation strategies (e.g., entrapment) when decoy representativeness is uncertain.
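For intuition, the pooled equal-chance row shown earlier can be reproduced from its counts. The sketch below assumes the null is a 50/50 target/decoy split in the low-confidence band, an assumption consistent with effect_abs = |0.5 - pi_D_hat| in that table but not confirmed from the package internals:

```r
# Counts from the pooled equal-chance table above (q-band 0.2-0.5).
N_test <- 1307   # PSMs in the low-confidence band
N_D    <- 38     # decoys among them

pi_D_hat   <- N_D / N_test         # observed decoy fraction
effect_abs <- abs(0.5 - pi_D_hat)  # distance from the equal-chance null
p_value    <- binom.test(N_D, N_test, p = 0.5)$p.value  # exact binomial test

round(c(pi_D_hat = pi_D_hat, effect_abs = effect_abs), 3)
#> pi_D_hat = 0.029, effect_abs = 0.471
```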