BeviMed provides functions for performing Bayesian Evaluation of Variant Involvement in Mendelian Disease. The aim is to estimate the probability of an association between a latent configuration of a restricted set of variants and a case-control label, and conditional on the association, the probabilities of pathogenicity for individual variants.
The inference is carried out based on the following inputs:
- y, a length N (number of samples) logical vector,
- G, a k by N integer matrix of allele counts (one row per variant, one column per sample),
- min_ac, representing a mode of inheritance hypothesis (i.e. the minimum number of pathogenic variants a sample must carry for its configuration of variants to be considered pathogenic).
Then, depending on the quantity of interest, the inference procedure can be invoked simply by passing the above arguments to the functions:
- prob_association: the probability of association between configurations of variants represented in G and the case-control label y (optionally broken down by mode of inheritance),
- log_BF: the log Bayes factor between the association model and the no-association model,
- prob_pathogenic: the probabilities of pathogenicity for the individual variants.
The inference is performed by the function bevimed, an MCMC sampling procedure with many parameters, including those listed above and others determining the sampling management and prior distributions of the model parameters.
It returns a list of traces for the sampled parameters in an object of class BeviMed. This object can take up a lot of memory, so it may be preferable to store only a summarised version, obtained by passing it to summary.
Here we demonstrate a simple application of BeviMed for some simulated examples.
library(BeviMed)
set.seed(0)
Firstly, we’ll generate some random data consisting of a case-control label, y, and a variant-wise indicator of pathogenicity, Z, where Z[i] == TRUE if variant i is pathogenic and FALSE otherwise.
In y, we will represent 100 samples, the first 20 of whom are cases and the next 80 of whom are controls. In Z, we will represent 20 SNPs at which the samples are to be genotyped, of which the first 3 are pathogenic and the remaining 17 are benign.
y <- c(rep(TRUE, 20), rep(FALSE, 80))
Z <- c(rep(TRUE, 3), rep(FALSE, 17))
In the first application, we’ll use a variant matrix G1 in which there is no association between y and the variants (i.e. all genotypes are simulated at random, with each variant present in each sample independently with probability 0.15).
G1 <- sapply(y, function(y_i) as.integer(runif(n=length(Z)) < 0.15))
prob_association(G=G1, y=y)
## [1] 0.01511634
The result indicates a low probability of association (by default, a prior probability of 0.01 is used for each mode of inheritance). In the example with association, we’ll sample the allele counts conditional on y (i.e. for each case we’ll include a randomly chosen pathogenic variant amongst the observed genotypes).
G2 <- sapply(y, function(y_i) as.integer(runif(n=length(Z)) < 0.15 |
(if (y_i) 1:length(Z) == sample(which(Z), size=1) else rep(FALSE, length(Z)))))
prob_association(G=G2, y=y)
## [1] 0.9987681
Notice that there is now a higher estimated probability of association.
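To see where the signal comes from, it can help to tabulate how many cases and controls carry each variant. The following is a minimal base-R sketch that does not call BeviMed; it regenerates a matrix in the same way as G2 above, but because the seed is reset here the genotypes will not match those used elsewhere in this document. The data frame name carriers is our own choice for illustration.

```r
set.seed(0)
y <- c(rep(TRUE, 20), rep(FALSE, 80))
Z <- c(rep(TRUE, 3), rep(FALSE, 17))

# Same construction as G2: background variants at rate 0.15, plus one
# randomly chosen pathogenic variant for every case
G2 <- sapply(y, function(y_i) as.integer(runif(n=length(Z)) < 0.15 |
  (if (y_i) seq_along(Z) == sample(which(Z), size=1) else rep(FALSE, length(Z)))))

# Count carriers (allele count > 0) of each variant among controls and cases
carriers <- data.frame(
  variant = seq_along(Z),
  pathogenic = Z,
  controls = rowSums(G2[, !y, drop=FALSE] > 0),
  cases = rowSums(G2[, y, drop=FALSE] > 0)
)
print(carriers)
```

By construction, every case carries at least one of the first 3 variants, so those rows should show relatively more carriers among the 20 cases than the benign rows do.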
By default, prob_association integrates over mode of inheritance (e.g. are at least 1 or 2 pathogenic variants required for a pathogenic configuration?). The probabilities of association with each mode of inheritance can be shown by passing the option by_MOI=TRUE (for more details, including how to set the ploidy of the samples within the region, see ?prob_pathogenic).
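To make the idea of a mode of inheritance hypothesis concrete, the sketch below shows what "at least min_ac pathogenic variants" means for a small hand-written genotype matrix. It is plain base R, not BeviMed's internal code: the function name has_pathogenic_config is hypothetical, and we assume that a configuration is counted as pathogenic when the allele counts at pathogenic variants sum to at least min_ac.

```r
# Hypothetical helper (not part of BeviMed).
# G: k-by-N integer matrix of allele counts (variants in rows, samples in columns)
# Z: length-k logical vector, TRUE for pathogenic variants
# min_ac: minimum number of pathogenic alleles required
has_pathogenic_config <- function(G, Z, min_ac = 1L) {
  colSums(G[Z, , drop = FALSE]) >= min_ac
}

# Toy example: 3 variants (rows) by 4 samples (columns)
G_toy <- matrix(c(1L, 0L, 0L,
                  1L, 1L, 0L,
                  0L, 0L, 1L,
                  0L, 0L, 0L), nrow = 3)
Z_toy <- c(TRUE, TRUE, FALSE)

dominant_like <- has_pathogenic_config(G_toy, Z_toy, min_ac = 1L)
recessive_like <- has_pathogenic_config(G_toy, Z_toy, min_ac = 2L)
print(dominant_like)   # TRUE TRUE FALSE FALSE
print(recessive_like)  # FALSE TRUE FALSE FALSE
```

Under min_ac = 1 the first two samples qualify, whereas under min_ac = 2 only the second sample, which carries two pathogenic alleles, does.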
For a more detailed output, the bevimed function can be used, and its returned value can be summarised and stored/printed.
output <- summary(bevimed(G=G2, y=y))
output
## ---------------------------------------------------------------------------
## The probability of association is 1 [prior: 0.01]
##
## Log Bayes factor between gamma 1 model and gamma 0 model is 11.16
##
## A confidence interval for the log Bayes factor is:
## 2.5% 97.5%
## 10.41 11.81
## ---------------------------------------------------------------------------
## Estimated probabilities of pathogenicity of individual variants
## (conditional on gamma = 1)
##
## Variant Controls Cases P(Z_j=1|y,gamma=1) Bar Chart
##       1        9     7               1.00 [=================== ]
##       2       17     9               1.00 [====================]
##       3       10     6               1.00 [====================]
##       4       12     2               0.00 [                    ]
##       5       14     2               0.00 [                    ]
##       6       11     4               0.01 [                    ]
##       7       12     4               0.01 [                    ]
##       8       13     2               0.01 [                    ]
##       9       11     2               0.00 [                    ]
##      10       12     4               0.00 [                    ]
##      11       11     4               0.01 [                    ]
##      12       10     4               0.00 [                    ]
##      13       16     5               0.00 [                    ]
##      14        8     4               0.01 [                    ]
##      15       16     4               0.00 [                    ]
##      16       11     1               0.01 [                    ]
##      17       14     2               0.00 [                    ]
##      18       14     1               0.01 [                    ]
##      19       11     3               0.00 [                    ]
##      20        7     2               0.01 [                    ]
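As a rough cross-check of the summary, a log Bayes factor can be converted into a posterior probability of association using Bayes' rule (posterior odds = Bayes factor x prior odds). The sketch below assumes that the reported log Bayes factor is a natural logarithm and that a single association model is being compared with the no-association model; posterior_from_log_BF is a hypothetical helper, not a BeviMed function.

```r
# Hypothetical helper (not part of BeviMed): posterior probability of
# association from a log Bayes factor and a prior probability.
# Works on the log-odds scale; assumes log_BF is a natural logarithm.
posterior_from_log_BF <- function(log_BF, prior) {
  plogis(log_BF + qlogis(prior))
}

# Values taken from the printed summary: log Bayes factor 11.16, prior 0.01
p <- posterior_from_log_BF(11.16, prior = 0.01)
print(p)
```

With these inputs the result is about 0.9986, in line with the probability of association close to 1 reported in the summary.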