The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Illustrating multivariate MAPIT with Simulated Data

Julian Stamp

2023-09-25

Please Cite Us

If you use multivariate MAPIT in published research, please cite:

Getting Started

Load necessary libraries. For the sake of getting started, mvMAPIT comes with a small set of simulated data. This data contains random genotype-like data and two simulated quantitative traits with epistatic interactions. To make use of this data, call the genotype data as simulated_data$genotype, and the simulated trait data as simulated_data$trait. The vignette traces the analysis of simulated data. The simulations are described in vignette("simulations").

For a working installation of mvMAPIT please look at theREADME.md or vignette("docker-mvmapit")

library(mvMAPIT)
library(dplyr)
library(ggplot2)
library(ggrepel)
library(tidyr)
library(kableExtra)
library(RcppAlgos)

Running mvMAPIT

The R routine mvmapit can be run in multiple modes. By default it runs in a hybrid mode, performing tests both wtih a normal Z-test as well as the Davies method. The resulting p-values can be combined using functions provided by mvMAPIT, e.g. fishers_combined(), that work on the pvalues tibble that mvmapit returns.

NOTE: mvMAPIT takes the X matrix as \(p \times n\); not as \(n \times p\).

mvmapit_hybrid <- mvmapit(
        t(data$genotype),
        t(data$trait),
        test = "hybrid"
)
fisher <- fishers_combined(mvmapit_hybrid$pvalues)

To visualize the genome wide p-values, we use a Manhattan plot. The p-values are plotted after combining the results from the multivariate analysis using Fisher’s method.

manhplot <- ggplot(fisher, aes(x = 1:nrow(fisher), y = -log10(p))) +
  geom_hline(yintercept = -log10(thresh), color = "grey40", linetype = "dashed") +
  geom_point(alpha = 0.75, color = "grey50") +
  geom_text_repel(aes(label=ifelse(p < thresh, as.character(id), '')))
plot(manhplot)

To control the type I error despite multiple testing, we recommend the conservative Bonferroni correction. The significant SNPs returned by the mvMAPIT analysis are shown in the output below. There are in total 6 significant SNPs after multiple test correction. Of the significant SNPs, 4 are true positives.

thresh <- 0.05 / (nrow(fisher) / 2)
significant_snps <-  fisher %>%
    filter(p < thresh) # Call only marginally significant SNPs

truth <- simulated_data$epistatic %>%
  ungroup() %>%
  mutate(discovered = (name %in% significant_snps$id)) %>%
  select(name, discovered) %>%
  unique()

significant_snps %>%
  mutate_if(is.numeric, ~(as.character(signif(., 3)))) %>%
  mutate(true_pos = id %in% truth$name) %>%
  kable(., linesep = "") %>%
  kable_material(c("striped"))
id trait p true_pos
snp_00068 fisher 2.65e-10 TRUE
snp_00343 fisher 5.27e-05 FALSE
snp_00460 fisher 7.86e-07 FALSE
snp_00469 fisher 1.47e-09 TRUE
snp_00665 fisher 1.17e-09 TRUE
snp_00917 fisher 2.83e-07 TRUE

True Epistatic SNPs

To compare this list to the full list of causal epistatic SNPs of the simulations, refer to the following list. There are 5 causal SNPs. Of these 5 causal SNPs, 4 were succesfully discovered by mvMAPIT.

truth %>%
  kable(., linesep = "") %>%
  kable_material(c("striped"))
name discovered
snp_00068 TRUE
snp_00156 FALSE
snp_00469 TRUE
snp_00665 TRUE
snp_00917 TRUE

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.