The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
vennDiagramLab reports five pairwise metrics for every set pair plus a multiple-testing correction. This vignette explains what each metric means, when to prefer it, and how to reproduce the values that appear in the web tool’s significance coloring.
library(vennDiagramLab)
result <- analyze(load_sample("dataset_real_cancer_drivers_4"))
stats <- statistics(result)For two sets A and B of sizes |A|, |B| with intersection |A ∩ B| drawn from a universe of N items:
|A ∩ B| / |A ∪ B|. Range [0, 1]. Symmetric.2 |A ∩ B| / (|A| + |B|). Range [0, 1]. Symmetric. Always >= Jaccard; the two relate by Dice = 2J / (1 + J).|A ∩ B| / min(|A|, |B|). Range [0, 1]. Equal to 1 when one set is contained in the other.|A ∩ B| or more shared items by chance, given |A|, |B|, and N. Tests over- representation.(|A ∩ B| * N) / (|A| * |B|). The ratio of observed to expected intersection size under independence. > 1 is over- representation.The helpers are exported and stateless:
jaccard(size_a = 138, size_b = 581, intersection = 100)
dice(size_a = 138, size_b = 581, intersection = 100)
overlap_coefficient(size_a = 138, size_b = 581, intersection = 100)
hypergeometric_p_value(N = 20000, K = 138, n = 581, k = 100)
fold_enrichment(N = 20000, K = 138, n = 581, k = 100)The hypergeometric p-value is essentially zero: 100 shared genes out of an expected (138 * 581) / 20000 ≈ 4 is a 25× enrichment.
statistics(result) returns five tables (four square NxN matrices for the ratio metrics + a long-form data.frame for the hypergeometric test):
stats@jaccardhead(stats@hypergeometric)stats@hypergeometric already carries the BH-FDR-adjusted q-value (p_adjusted) and a boolean significant (q < 0.05) and highly_significant (q < 0.001). The adjustment uses stats::p.adjust(method = "BH"):
raw_p <- stats@hypergeometric$p_value
adjusted <- bh_fdr(raw_p)
all.equal(adjusted, stats@hypergeometric$p_adjusted)For unrelated p-values, BH-FDR is more permissive than Bonferroni and more conservative than no correction:
toy_p <- c(0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 0.9)
data.frame(
raw = toy_p,
bonferroni = pmin(toy_p * length(toy_p), 1),
bh_fdr = bh_fdr(toy_p)
)broom::tidy() produces a tibble that’s pipeline-friendly (one row per pair, all metrics in one frame, sorted by adjusted p-value):
broom::tidy(result)The web tool colors significant pairs via the same p_adjusted thresholds:
sig_table <- broom::tidy(result)
sig_table$colour <- ifelse(sig_table$highly_significant, "red",
ifelse(sig_table$significant, "orange",
"grey"))
sig_table[, c("set_a", "set_b", "p_adjusted", "colour")]vignette("v02_real_cancer_drivers") — see these stats in the context of a real biological analysis.vignette("v06_pipeline_integration") — feed broom::tidy() into a downstream tidyverse pipeline.These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.