diagFDR: MaxQuant diagnostics from msms.txt

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

This vignette demonstrates how to compute diagFDR diagnostics from a MaxQuant msms.txt file.

MaxQuant reports a PSM-level score (Andromeda Score) and a decoy indicator (Reverse). diagFDR can reconstruct target–decoy counting (TDC) q-values from the score and then compute stability and calibration diagnostics on the competed PSM universe.

Runnable toy example (no external files required)

We start with a simulated dataset that is similar to the dfdr_tbl returned by read_dfdr_maxquant_msms(). Any software producing outputs that can be mapped to columns id, is_decoy, q, pep, run, and score can similarly be used.

library(diagFDR)

set.seed(10)

n <- 8000
toy <- data.frame(
  id = as.character(seq_len(n)),
  is_decoy = sample(c(FALSE, TRUE), n, replace = TRUE, prob = c(0.98, 0.02)),
  run = sample(paste0("run", 1:3), n, replace = TRUE),
  score = c(rnorm(n * 0.98, mean = 7, sd = 1), rnorm(n * 0.02, mean = 5, sd = 1))[seq_len(n)],
  pep = NA_real_
)

# Reconstruct q-values by simple TDC from score (for toy data only):
# Here we mimic a typical "higher score is better" setting.
toy <- toy[order(toy$score, decreasing = TRUE), ]
toy$D_cum <- cumsum(toy$is_decoy)
toy$T_cum <- cumsum(!toy$is_decoy)
toy$FDR_hat <- (toy$D_cum + 1) / pmax(toy$T_cum, 1)
toy$q <- rev(cummin(rev(toy$FDR_hat)))
toy <- toy[, c("id","is_decoy","q","pep","run","score")]

x_toy <- as_dfdr_tbl(
  toy,
  unit = "psm",
  scope = "global",
  q_source = "toy TDC from score",
  q_max_export = 1,
  provenance = list(tool = "toy")
)

diag <- dfdr_run_all(xs = list(MaxQuant_PSM = x_toy), alpha_main = 0.01)

Headline stability at 1%

diag$tables$headline
#> # A tibble: 1 × 24
#>   alpha T_alpha D_alpha FDR_hat CV_hat FDR_minus1 FDR_plus1 FDR_minusK FDR_plusK
#>   <dbl>   <int>   <int>   <dbl>  <dbl>      <dbl>     <dbl>      <dbl>     <dbl>
#> 1  0.01       0       0      NA    Inf         NA        NA         NA        NA
#> # ℹ 15 more variables: k2sqrtD <int>, FDR_minus2sqrtD <dbl>,
#> #   FDR_plus2sqrtD <dbl>, list <chr>, D_alpha_win <int>, effect_abs <dbl>,
#> #   IPE <dbl>, flag_Dalpha <chr>, flag_CV <chr>, flag_Dwin <chr>,
#> #   flag_IPE <chr>, flag_FDR <chr>, flag_equalchance <chr>, status <chr>,
#> #   interpretation <chr>

Decoy tail support and stability proxy

diag$plots$dalpha

diag$plots$cv

Local boundary support and elasticity

diag$plots$dwin

diag$plots$elasticity
#> Warning: Removed 6 rows containing missing values or values outside the scale range
#> (`geom_line()`).
#> Warning: Removed 6 rows containing missing values or values outside the scale range
#> (`geom_point()`).

Equal-chance plausibility by q-band

diag$tables$equal_chance_pooled
#> # A tibble: 1 × 12
#>   qmax_export low_lo low_hi N_test N_D_test pi_D_hat effect_abs ci95_lo ci95_hi
#>         <dbl>  <dbl>  <dbl>  <int>    <int>    <dbl>      <dbl>   <dbl>   <dbl>
#> 1           1    0.2    0.5      0        0       NA         NA      NA      NA
#> # ℹ 3 more variables: p_value_binom <dbl>, pass_minN <lgl>, list <chr>
diag$plots$equal_chance__MaxQuant_PSM

Real MaxQuant workflow (msms.txt)

The code below shows how to run diagFDR directly from a MaxQuant msms.txt file.

library(diagFDR)

mq_path <- "path/to/msms.txt"

# Read msms.txt and reconstruct TDC q-values using MaxQuant Score and Reverse indicator.
# - Reverse == "+" is treated as a decoy indicator.
# - Score is assumed "higher is better".
# - q-values are computed using FDR(i) = (D(i)+add_decoy)/T(i) and q(i)=min_{j>=i} FDR(j).
x_mq <- read_dfdr_maxquant_msms(
  path = mq_path,
  pep_mode = "sanitize",          # or "drop" if PEP contains values >1
  exclude_contaminants = TRUE,
  add_decoy = 1L,
  unit = "psm",
  scope = "global",
  provenance = list(tool = "MaxQuant", file = basename(mq_path))
)

# Run diagnostics
diag <- dfdr_run_all(
  xs = list(MaxQuant_PSM = x_mq),
  alpha_main = 0.01,
  alphas = c(1e-4, 5e-4, 1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 1e-1, 2e-1),
  eps = 0.2,
  win_rel = 0.2,
  truncation = "warn_drop",
  low_conf = c(0.2, 0.5)
)

# Inspect headline diagnostics
diag$tables$headline

# Export tables/plots + human-readable summary
dfdr_write_report(
  diag,
  out_dir = "diagFDR_maxquant_out",
  formats = c("csv", "png", "manifest", "readme", "summary")
)

# Optional: render a single HTML report (requires rmarkdown in Suggests)
dfdr_render_report(diag, out_dir = "diagFDR_maxquant_out")


library(diagFDR)

mq_path <- "path/to/msms.txt"

# Read msms.txt and reconstruct TDC q-values
x_mq <- read_dfdr_maxquant_msms(
  path = mq_path,
  pep_mode = "sanitize",
  exclude_contaminants = TRUE,
  add_decoy = 1L,
  unit = "psm",
  scope = "global",
  provenance = list(tool = "MaxQuant", file = basename(mq_path))
)

# Run diagnostics with automatic pseudo p-value computation
# This will use score_to_pvalue(method="decoy_ecdf") internally
diag <- dfdr_run_all(
  xs = list(MaxQuant_PSM = x_mq),
  alpha_main = 0.01,
  compute_pseudo_pvalues = TRUE,   # generates pseudo p-values from scores
  pseudo_pvalue_method = "decoy_ecdf",  # Most defensible method for arbitrary scores
  p_stratify = "run"  # Optional: stratify p-value diagnostics by run
)

# diag will contain:
# - All standard target-decoy diagnostics
# - PLUS p-value calibration plots and tables
# - PLUS Storey pi0 diagnostics
# - PLUS BH comparison diagnostics

# Inspect headline + p-value calibration
diag$tables$headline
diag$tables$p_calibration_summary

# Export everything
dfdr_write_report(
  diag,
  out_dir = "diagFDR_maxquant_out",
  formats = c("csv", "png", "manifest", "readme", "summary")
)

# Render HTML report (will include p-value diagnostics section)
dfdr_render_report(diag, out_dir = "diagFDR_maxquant_out")

##Interpretation of pseudo p-value diagnostics

When compute_pseudo_pvalues = TRUE with method = “decoy_ecdf”:

Pseudo p-values are computed as right-tail probabilities under the decoy score distribution
π₀ estimate: Estimates the proportion of true nulls; should be stable across λ
BH comparison: Compares BH-FDR to TDA-FDR; agreement supports consistent FDR control

Important: These are diagnostic pseudo p-values, not guaranteed null p-values. They provide:

✓ Additional calibration perspectives
✓ Cross-validation between BH and TDA procedures
✓ Stratified stability checks (e.g., by run)

Caveat: Pseudo p-value calibration cannot detect scoring pathologies that affect both targets and decoys equally.

Optional: other way to get p-value / pseudo-p-value diagnostics from MaxQuant scores

MaxQuant Score is not a p-value. However, you can construct pseudo-p-values to run p-value-based calibration and BH-stability diagnostics. The most defensible option for arbitrary scores is to use the empirical decoy null tail (method = "decoy_ecdf"), which maps scores to right-tail probabilities under the decoy distribution.

# Create pseudo-p-values from the decoy score tail and rerun diagnostics with p-value checks.
x_mq$p <- score_to_pvalue(
  score = x_mq$score,
  method = "decoy_ecdf",
  is_decoy = x_mq$is_decoy
)
attr(x_mq, "meta")$p_source <- "score_to_pvalue(method='decoy_ecdf' on MaxQuant Score)"

diag_p <- dfdr_run_all(
  xs = list(MaxQuant_PSM = x_mq),
  alpha_main = 0.01,
  p_stratify = "run"   # optional stratification if run column is meaningful
)

dfdr_write_report(diag_p, out_dir = "diagFDR_maxquant_out_with_p", formats = c("csv","png","manifest","readme","summary"))
dfdr_render_report(diag_p, out_dir = "diagFDR_maxquant_out_with_p")

Interpretation notes

Universe / scope: read_dfdr_maxquant_msms() produces a PSM-level universe; q-values are reconstructed by score-based TDC. Protein-level claims require additional inference.
Stability: D_alpha, CV_hat, and D_alpha_win quantify whether the operating cutoff is well supported by decoys. Very strict cutoffs can yield sparse decoy tails.
Calibration: equal-chance diagnostics are internal plausibility checks under the target–decoy model. Agreement with expectations supports (but does not prove) correct FDR control.
Pseudo-p-values derived from the decoy tail can be used for additional calibration/stability diagnostics, but should be interpreted as diagnostic tools unless null calibration is theoretically guaranteed.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.