The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
This vignette demonstrates how to compute diagFDR
diagnostics from a MaxQuant msms.txt
file.
MaxQuant reports a PSM-level score (Andromeda Score) and
a decoy indicator (Reverse). diagFDR can
reconstruct target–decoy counting (TDC) q-values from
the score and then compute stability and calibration diagnostics on the
competed PSM universe.
The typical workflow is:
msms.txt from MaxQuant.msms.txt into a dfdr_tbl (PSM-level)
using read_dfdr_maxquant_msms().dfdr_run_all() and inspect
headline/stability/calibration diagnostics.We start with a simulated dataset that is similar to the
dfdr_tbl returned by
read_dfdr_maxquant_msms(). Any software producing outputs
that can be mapped to columns id, is_decoy,
q, pep, run, and
score can similarly be used.
library(diagFDR)
set.seed(10)
n <- 8000
toy <- data.frame(
id = as.character(seq_len(n)),
is_decoy = sample(c(FALSE, TRUE), n, replace = TRUE, prob = c(0.98, 0.02)),
run = sample(paste0("run", 1:3), n, replace = TRUE),
score = c(rnorm(n * 0.98, mean = 7, sd = 1), rnorm(n * 0.02, mean = 5, sd = 1))[seq_len(n)],
pep = NA_real_
)
# Reconstruct q-values by simple TDC from score (for toy data only):
# Here we mimic a typical "higher score is better" setting.
toy <- toy[order(toy$score, decreasing = TRUE), ]
toy$D_cum <- cumsum(toy$is_decoy)
toy$T_cum <- cumsum(!toy$is_decoy)
toy$FDR_hat <- (toy$D_cum + 1) / pmax(toy$T_cum, 1)
toy$q <- rev(cummin(rev(toy$FDR_hat)))
toy <- toy[, c("id","is_decoy","q","pep","run","score")]
x_toy <- as_dfdr_tbl(
toy,
unit = "psm",
scope = "global",
q_source = "toy TDC from score",
q_max_export = 1,
provenance = list(tool = "toy")
)
diag <- dfdr_run_all(xs = list(MaxQuant_PSM = x_toy), alpha_main = 0.01)diag$tables$headline
#> # A tibble: 1 × 24
#> alpha T_alpha D_alpha FDR_hat CV_hat FDR_minus1 FDR_plus1 FDR_minusK FDR_plusK
#> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.01 0 0 NA Inf NA NA NA NA
#> # ℹ 15 more variables: k2sqrtD <int>, FDR_minus2sqrtD <dbl>,
#> # FDR_plus2sqrtD <dbl>, list <chr>, D_alpha_win <int>, effect_abs <dbl>,
#> # IPE <dbl>, flag_Dalpha <chr>, flag_CV <chr>, flag_Dwin <chr>,
#> # flag_IPE <chr>, flag_FDR <chr>, flag_equalchance <chr>, status <chr>,
#> # interpretation <chr>diag$plots$elasticity
#> Warning: Removed 6 rows containing missing values or values outside the scale range
#> (`geom_line()`).
#> Warning: Removed 6 rows containing missing values or values outside the scale range
#> (`geom_point()`).diag$tables$equal_chance_pooled
#> # A tibble: 1 × 12
#> qmax_export low_lo low_hi N_test N_D_test pi_D_hat effect_abs ci95_lo ci95_hi
#> <dbl> <dbl> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.2 0.5 0 0 NA NA NA NA
#> # ℹ 3 more variables: p_value_binom <dbl>, pass_minN <lgl>, list <chr>
diag$plots$equal_chance__MaxQuant_PSMThe code below shows how to run diagFDR directly
from a MaxQuant msms.txt file.
library(diagFDR)
mq_path <- "path/to/msms.txt"
# Read msms.txt and reconstruct TDC q-values using MaxQuant Score and Reverse indicator.
# - Reverse == "+" is treated as a decoy indicator.
# - Score is assumed "higher is better".
# - q-values are computed using FDR(i) = (D(i)+add_decoy)/T(i) and q(i)=min_{j>=i} FDR(j).
x_mq <- read_dfdr_maxquant_msms(
path = mq_path,
pep_mode = "sanitize", # or "drop" if PEP contains values >1
exclude_contaminants = TRUE,
add_decoy = 1L,
unit = "psm",
scope = "global",
provenance = list(tool = "MaxQuant", file = basename(mq_path))
)
# Run diagnostics
diag <- dfdr_run_all(
xs = list(MaxQuant_PSM = x_mq),
alpha_main = 0.01,
alphas = c(1e-4, 5e-4, 1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 1e-1, 2e-1),
eps = 0.2,
win_rel = 0.2,
truncation = "warn_drop",
low_conf = c(0.2, 0.5)
)
# Inspect headline diagnostics
diag$tables$headline
# Export tables/plots + human-readable summary
dfdr_write_report(
diag,
out_dir = "diagFDR_maxquant_out",
formats = c("csv", "png", "manifest", "readme", "summary")
)
# Optional: render a single HTML report (requires rmarkdown in Suggests)
dfdr_render_report(diag, out_dir = "diagFDR_maxquant_out")
library(diagFDR)
mq_path <- "path/to/msms.txt"
# Read msms.txt and reconstruct TDC q-values
x_mq <- read_dfdr_maxquant_msms(
path = mq_path,
pep_mode = "sanitize",
exclude_contaminants = TRUE,
add_decoy = 1L,
unit = "psm",
scope = "global",
provenance = list(tool = "MaxQuant", file = basename(mq_path))
)
# Run diagnostics with automatic pseudo p-value computation
# This will use score_to_pvalue(method="decoy_ecdf") internally
diag <- dfdr_run_all(
xs = list(MaxQuant_PSM = x_mq),
alpha_main = 0.01,
compute_pseudo_pvalues = TRUE, # generates pseudo p-values from scores
pseudo_pvalue_method = "decoy_ecdf", # Most defensible method for arbitrary scores
p_stratify = "run" # Optional: stratify p-value diagnostics by run
)
# diag will contain:
# - All standard target-decoy diagnostics
# - PLUS p-value calibration plots and tables
# - PLUS Storey pi0 diagnostics
# - PLUS BH comparison diagnostics
# Inspect headline + p-value calibration
diag$tables$headline
diag$tables$p_calibration_summary
# Export everything
dfdr_write_report(
diag,
out_dir = "diagFDR_maxquant_out",
formats = c("csv", "png", "manifest", "readme", "summary")
)
# Render HTML report (will include p-value diagnostics section)
dfdr_render_report(diag, out_dir = "diagFDR_maxquant_out")##Interpretation of pseudo p-value diagnostics
When compute_pseudo_pvalues = TRUE with method = “decoy_ecdf”:
Pseudo p-values are computed as right-tail probabilities under the decoy score distribution
π₀ estimate: Estimates the proportion of true nulls; should be stable across λ
BH comparison: Compares BH-FDR to TDA-FDR; agreement supports consistent FDR control
Important: These are diagnostic pseudo p-values, not guaranteed null p-values. They provide:
✓ Additional calibration perspectives
✓ Cross-validation between BH and TDA procedures
✓ Stratified stability checks (e.g., by run)
Caveat: Pseudo p-value calibration cannot detect scoring pathologies that affect both targets and decoys equally.
MaxQuant Score is not a p-value. However, you can
construct pseudo-p-values to run p-value-based
calibration and BH-stability diagnostics. The most defensible option for
arbitrary scores is to use the empirical decoy null tail
(method = "decoy_ecdf"), which maps scores to right-tail
probabilities under the decoy distribution.
# Create pseudo-p-values from the decoy score tail and rerun diagnostics with p-value checks.
x_mq$p <- score_to_pvalue(
score = x_mq$score,
method = "decoy_ecdf",
is_decoy = x_mq$is_decoy
)
attr(x_mq, "meta")$p_source <- "score_to_pvalue(method='decoy_ecdf' on MaxQuant Score)"
diag_p <- dfdr_run_all(
xs = list(MaxQuant_PSM = x_mq),
alpha_main = 0.01,
p_stratify = "run" # optional stratification if run column is meaningful
)
dfdr_write_report(diag_p, out_dir = "diagFDR_maxquant_out_with_p", formats = c("csv","png","manifest","readme","summary"))
dfdr_render_report(diag_p, out_dir = "diagFDR_maxquant_out_with_p")Universe / scope:
read_dfdr_maxquant_msms() produces a PSM-level universe;
q-values are reconstructed by score-based TDC. Protein-level claims
require additional inference.
Stability: D_alpha,
CV_hat, and D_alpha_win quantify whether the
operating cutoff is well supported by decoys. Very strict cutoffs can
yield sparse decoy tails.
Calibration: equal-chance diagnostics are internal plausibility checks under the target–decoy model. Agreement with expectations supports (but does not prove) correct FDR control.
Pseudo-p-values derived from the decoy tail can be used for additional calibration/stability diagnostics, but should be interpreted as diagnostic tools unless null calibration is theoretically guaranteed.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.