Mapping disease risk loci using varbvs

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

2023-05-31

This vignettes demonstrates how to fit a Bayesian variable selection model using varbvs to identify genetic markers associated with Crohn’s disease risk. The data consist of 442,001 SNPs genotyped for 1,748 cases and 2,938 controls. Note that file cd.RData cannot be made publicly available due to data sharing restrictions, so this vignette is for viewing only.

Load the genotype and phenotype data

load("cd.RData")

Fit variational approximation to posterior

Here we fit the fully-factorized variational approximation to the posterior distribution of the coefficients for a logistic regression model of a binary outcome (case-control status), with spike and slab priors on the coefficients.

r <- system.time(fit <- varbvs(X,NULL,y,family = "binomial",
                               logodds = seq(-6,-3,0.25),n0 = 0)
cat(sprintf("Model fitting took %0.2f minutes.\n",r["elapsed"]/60))

Compute “single-marker” posterior inclusion probabilities.

pip <- c(varbvsindep(fit,X,NULL,y)$alpha %*% fit$w)

Save the results to a file

save(list = c("fit","map","pip","r"),
     file = "varbvs.demo.cd.RData")

Summarize the model fitting

print(summary(fit,nv = 9))

Show two “genome-wide scans”, one using the posterior inclusion probabilities (PIPs) computed in the joint analysis of all variables, and one using the PIPs that ignore correlations between the variables. The latter is meant to look like a typical genome-wide “Manhattan” plot used to summarize the results of a genome-wide association study. Variables with PIP > 0.5 are highlighted.

i <- which(fit$pip > 0.5)
var.labels <- paste0(round(map$pos[i]/1e6,digits = 2),"Mb")
print(plot(fit,groups = map$chr,vars = i,var.labels = var.labels,gap = 7500,
           ylab = "posterior prob."),
      split = c(1,1,1,2),more = TRUE)
print(plot(fit,groups = map$chr,score = log10(pip + 0.001),vars = i,
           var.labels = var.labels,gap = 7500,ylab = "log10 posterior prob."),
      split = c(1,2,1,2),more = FALSE)

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.