Approximate Bayesian Computation (ABC) is a simulation based method for Bayesian inference. It is commonly used in evolutionary biology to estimate parameters of e.g. demographic models. Coala makes it easy to conduct the simulations for an ABC analysis and works well together with the abc
package for doing the estimation. To demonstrate the principle, we will estimate the parameter of an over-simplified toy model.
Lets assume we have 50 genetic loci from 10 individuals from a panmictic population. We’ll use the site frequency spectrum of the data as a summary statistic to estimate the scaled mutation rate theta
. Lets assume the frequency spectrum we get from the data is
sfs <- c(112, 57, 24, 34, 16, 29, 8, 10, 15)
We can now use coala to setup the model:
library(coala)
model <- coal_model(10, 50) +
feat_mutation(par_prior("theta", runif(1, 1, 5))) +
sumstat_sfs()
Note that we used par_prior
to set a flat uniform prior between 1 and 5 for theta
.
We can now do easily simulate the model:
sim_data <- simulate(model, nsim = 2000, seed = 17)
For this toy model, we did just 2000 simulations (to keep the time for building this document within a reasonable range). A real analysis will need many more!
We now need to prepare the simulation data for the abc
package:
# Getting the parameters
sim_pars <- matrix(sapply(sim_data, function(x) x$pars), 2000, 1)
colnames(sim_pars) <- "theta"
head(sim_pars, n = 3)
## theta
## [1,] 1.620203
## [2,] 1.216988
## [3,] 2.410258
# Getting the summary statistics
sim_sumstats <- t(sapply(sim_data, function(x) x$sfs))
colnames(sim_sumstats) <- paste0("S", 1:9)
head(sim_sumstats, n = 3)
## S1 S2 S3 S4 S5 S6 S7 S8 S9
## [1,] 74 54 32 30 13 14 10 6 2
## [2,] 63 30 12 12 12 12 11 8 7
## [3,] 121 49 45 36 12 28 12 7 8
And can now estimate theta
:
suppressPackageStartupMessages(library(abc))
posterior <- abc(sfs, sim_pars, sim_sumstats, 0.05, method = "rejection")
hist(posterior, breaks = 10)
Due to the low number of simulations we get a relatively flat posterior, but we still get an idea what the value for theta
might be.