The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
One commonly requested feature is to be able to run a post hoc Markov chain Monte Carlo analysis based on the results of a frequentist fit. This is often a reasonable shortcut for computing confidence intervals and p-values that allow for finite-sized samples rather than relying on asymptotic sampling distributions. This vignette shows an example of such an analysis. Some caveats:
Load packages:
library(glmmTMB)
library(coda) ## MCMC utilities
library(reshape2) ## for melt()
## graphics
library(lattice)
library(ggplot2); theme_set(theme_bw())
Fit basic model:
data("sleepstudy",package="lme4")
fm1 <- glmmTMB(Reaction ~ Days + (Days|Subject),
sleepstudy)
Set up for MCMC: define scaled log-posterior function (in this case the log-likelihood function); extract coefficients and variance-covariance matrices as starting points.
## FIXME: is there a better way for user to extract full coefs?
rawcoef <- with(fm1$obj$env,last.par[-random])
names(rawcoef) <- make.names(names(rawcoef),unique=TRUE)
## log-likelihood function
## (MCMCmetrop1R wants *positive* log-lik)
logpost_fun <- function(x) -fm1$obj$fn(x)
## check definitions
stopifnot(all.equal(c(logpost_fun(rawcoef)),
c(logLik(fm1)),
tolerance=1e-7))
V <- vcov(fm1,full=TRUE)
This is a basic block Metropolis sampler, based on Florian Hartig's code here.
##' @param start starting value
##' @param V variance-covariance matrix of MVN candidate distribution
##' @param iterations total iterations
##' @param nsamp number of samples to store
##' @param burnin number of initial samples to discard
##' @param thin thinning interval
##' @param tune tuning parameters; expand/contract V
##' @param seed random-number seed
run_MCMC <- function(start,
V,
logpost_fun,
iterations = 10000,
nsamp = 1000,
burnin = iterations/2,
thin = floor((iterations-burnin)/nsamp),
tune = NULL,
seed=NULL
) {
## initialize
if (!is.null(seed)) set.seed(seed)
if (!is.null(tune)) {
tunesq <- if (length(tune)==1) tune^2 else outer(tune,tune)
V <- V*tunesq
}
chain <- matrix(NA, nsamp+1, length(start))
chain[1,] <- cur_par <- start
postval <- logpost_fun(cur_par)
j <- 1
for (i in 1:iterations) {
proposal = MASS::mvrnorm(1,mu=cur_par, Sigma=V)
newpostval <- logpost_fun(proposal)
accept_prob <- exp(newpostval - postval)
if (runif(1) < accept_prob) {
cur_par <- proposal
postval <- newpostval
}
if ((i>burnin) && (i %% thin == 1)) {
chain[j+1,] <- cur_par
j <- j + 1
}
}
chain <- na.omit(chain)
colnames(chain) <- names(start)
chain <- coda::mcmc(chain)
return(chain)
}
Run the chain:
t1 <- system.time(m1 <- run_MCMC(start=rawcoef,
V=V, logpost_fun=logpost_fun,
seed=1001))
(running this chain takes 4.5 seconds)
Add more informative names and transform correlation parameter (see vignette on covariance structures and parameters):
colnames(m1) <- c(names(fixef(fm1)[[1]]),
"log(sigma)",
c("log(sd_Intercept)","log(sd_Days)","cor"))
m1[,"cor"] <- sapply(m1[,"cor"], get_cor)
xyplot(m1,layout=c(2,3),asp="fill")
The trace plots are poor, especially for the correlation; the effective sample size backs this up, as would any other diagnostics we did.
print(effectiveSize(m1),digits=3)
In a real analysis we would stop and fix the mixing/convergence problems before proceeding; for this simple sampler, some of our choices would be (1) simply run the chain for longer; (2) tune the candidate distribution (e.g. by using tune
to scale some parameters, or perhaps by switching to a multivariate Student t distribution [see the mvtnorm
package]); (3) add regularizing priors.
Ignoring the problems and proceeding, we can compute column-wise quantiles or highest posterior density intervals (coda::HPDinterval
) to get confidence intervals. Plotting posterior distributions, omitting the intercept because it's on a very different scale.
The tmbstan
package allows direct, simple access to a hybrid/Hamiltonian Monte Carlo algorithm for sampling from a TMB object; the $obj
component of a glmmTMB
fit is such an object. (To run this example you'll need to install the tmbstan
package and its dependencies.)
## install.packages("tmbstan")
library(tmbstan)
t2 <- system.time(m2 <- tmbstan(fm1$obj))
(running this command, which creates 4 chains, takes 336.1 seconds)
However, there are many indications (warning messages; trace plots) that the correlation parameter needs to be given a more informative prior. (In the plot below, the correlation parameter is shown on its unconstrained scale; the actual correlation would be \(\theta_3/\sqrt{1+\theta_3^2}\).)
Owls
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.