require(GGally)
## Loading required package: GGally
The BAS package provides easy-to-use functions to implement Bayesian Model Averaging in linear models and generalized linear models. Prior distributions on coefficients are based on Zellner's g-prior or mixtures of g-priors, such as the Zellner-Siow Cauchy prior or the mixtures of g-priors from Liang et al. (2008) for linear models, as well as other options including AIC, BIC, RIC and Empirical Bayes methods. Extensions to Generalized Linear Models are based on the mixtures of g-priors in GLMs of Li and Clyde (2015) using an integrated Laplace approximation.
BAS uses an adaptive sampling algorithm to sample without replacement from the space of models, or MCMC sampling, which is recommended for problems with a large number of predictors. See Clyde, Littman & Ghosh for more details on the sampling algorithms.
The stable version can be installed easily in the R console like any other package:
install.packages('BAS')
On the other hand, I welcome everyone to use the most recent version of the package with quick fixes, new features and probably new bugs. To get the latest development version from GitHub, use the devtools package from CRAN and enter in R:
# devtools::install_github('merliseclyde/BAS')
As the package depends on BLAS and LAPACK, installing from GitHub will require that you have FORTRAN and C compilers on your system.
We will use the UScrime data to illustrate some of the commands and functionality.
library(MASS)
data(UScrime)
Following other analyses, we will go ahead and log-transform all of the variables except column 2, which is the indicator of whether the state is a southern state.
UScrime[,-2] = log(UScrime[,-2])
To get started, we will use BAS with the Zellner-Siow Cauchy prior on the coefficients.
library(BAS)
crime.ZS = bas.lm(y ~ .,
                  data = UScrime,
                  prior = "ZS-null",
                  modelprior = uniform(),
                  initprobs = "eplogp")
BAS uses a model formula similar to lm to specify the full model with all of the potential predictors. Here we are using the shorthand . to indicate that all remaining variables in the data frame will be included. BAS requires a data frame as the input for the data argument. Different prior distributions on the regression coefficients may be specified using the prior argument; besides the "ZS-null" prior used here, options include "g-prior", "hyper-g", "BIC", "AIC" and "EB-local" (see the bas.lm documentation for the full list).
By default, BAS will try to enumerate all models, in this case \(2^{15}\). The prior distribution over the models is a uniform() distribution, which assigns equal probabilities to all models. The last optional argument, initprobs = "eplogp", provides a way to initialize the sampling algorithm.
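As a sketch of how the prior argument can be varied (not run here; the prior name and the alpha hyperparameter follow the bas.lm help page, so treat this as illustrative):

```r
library(BAS)
library(MASS)
data(UScrime)
UScrime[, -2] = log(UScrime[, -2])
# Hypothetical refit using the hyper-g/n mixture of g-priors
# instead of the Zellner-Siow Cauchy prior
crime.HG = bas.lm(y ~ .,
                  data = UScrime,
                  prior = "hyper-g-n", alpha = 3,
                  modelprior = uniform())
```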
Some graphical summaries of the output may be obtained by the plot function
plot(crime.ZS, ask=F)
which produces a panel of four plots. The first is a plot of residuals versus fitted values under Bayesian Model Averaging. Ideally, if our model assumptions hold, we will see no outliers or non-constant variance. The second plot shows the cumulative probability of the models in the order that they are sampled. This plot indicates that the cumulative probability is leveling off, as each additional model adds only a small increment to the cumulative probability, while earlier there are larger jumps corresponding to sampling high probability models. The third plot shows the dimension of each model (the number of regression coefficients including the intercept) versus the log of the marginal likelihood of the model. The last plot shows the marginal posterior inclusion probabilities (pip) for each of the covariates, with marginal pips greater than 0.5 shown in red. The variables with pip > 0.5 correspond to what is known as the median probability model. Variables with high inclusion probabilities are generally important for explaining the data or prediction, but marginal inclusion probabilities may be small if there are extreme correlations among predictors, similar to how p-values may be large in the presence of multicollinearity.
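As a quick sketch (assuming the probne0 and namesx components of the fitted object, which are used later in this document), the members of the median probability model can be read off directly:

```r
# Variables with marginal posterior inclusion probability > 0.5,
# i.e. the variables in the median probability model
crime.ZS$namesx[crime.ZS$probne0 > 0.5]
```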
Individual plots may be obtained using the which option.
plot(crime.ZS, which = 4, ask=FALSE, caption="", sub.caption="")
BAS has print and summary methods defined for objects of class bas. Typing the object's name
crime.ZS
##
## Call:
## bas.lm(formula = y ~ ., data = UScrime, prior = "ZS-null", modelprior = uniform(), initprobs = "eplogp")
##
##
## Marginal Posterior Inclusion Probabilities:
## Intercept M So Ed Po1 Po2
## 1.0000 0.8536 0.2737 0.9747 0.6652 0.4490
## LF M.F Pop NW U1 U2
## 0.2022 0.2050 0.3696 0.6944 0.2526 0.6149
## GDP Ineq Prob Time
## 0.3601 0.9965 0.8992 0.3718
returns a summary of the marginal inclusion probabilities, while the summary function provides more detail about the top models:
summary(crime.ZS)
## Intercept M So Ed Po1 Po2 LF M.F Pop NW U1 U2 GDP Ineq Prob Time
## [1,] 1 1 0 1 1 0 0 0 0 1 0 1 0 1 1 1
## [2,] 1 1 0 1 1 0 0 0 0 1 0 1 0 1 1 0
## [3,] 1 1 0 1 0 1 0 0 0 1 0 1 0 1 1 0
## [4,] 1 1 0 1 1 0 0 0 1 1 0 1 0 1 1 0
## [5,] 1 1 0 1 1 0 0 0 0 0 0 1 0 1 1 0
## BF PostProbs R2 dim logmarg
## [1,] 1.0000000 0.0182 0.8420 9 23.65111
## [2,] 0.9416178 0.0172 0.8265 8 23.59096
## [3,] 0.6369712 0.0116 0.8229 8 23.20008
## [4,] 0.5944530 0.0108 0.8375 9 23.13100
## [5,] 0.5301269 0.0097 0.8046 7 23.01647
This lists the top 5 models (in terms of posterior probability) with zero-one indicators for variable inclusion. The other columns in the summary are the Bayes factor of each model to the highest probability model (hence its Bayes factor is 1), the posterior probabilities of the models, the ordinary \(R^2\) of the models, the dimension of the models (number of coefficients including the intercept) and the log marginal likelihood under the selected prior distribution.
To see beyond the first five models, we can represent the collection of models via an image plot. By default this shows the top 20 models.
image(crime.ZS, rotate=F)
This image has rows that correspond to each of the variables and intercept, with labels for the variables on the y-axis. The x-axis corresponds to the possible models. These are sorted by their posterior probability from best at the left to worst at the right with the rank on the top x-axis.
Each column represents one of the 20 models shown. The variables that are excluded in a model are shown in black for that column, while the variables that are included are colored, with the color related to the log posterior probability. The color of each column is proportional to the log of the posterior probability (the lower x-axis) of that model. Models that are the same color have similar log posterior probabilities, which allows us to view clusters of models whose marginal likelihoods differ by amounts that are not “worth a bare mention”.
This plot indicates that the police expenditures in the two years (Po1 and Po2) do not enter the model together, which is an indication of the high correlation between the two variables.
To examine the marginal distributions of the two coefficients for the police expenditures, we can extract their coefficients and then plot the posterior distributions averaging over all of the models.
coef.ZS = coef(crime.ZS)
plot(coef.ZS, subset=c(5:6), ask=F)
The vertical bar represents the posterior probability that the coefficient is 0, while the bell-shaped curve represents the density of plausible values from all the models where the coefficient is non-zero. This is scaled so that the height of the density for non-zero values is the probability that the coefficient is non-zero.
Omitting the subset argument provides all of the marginal distributions
plot(coef.ZS, ask=FALSE)
To obtain credible intervals for coefficients, BAS includes a confint method to create Highest Posterior Density intervals from the summaries from coef.
confint(coef.ZS)
## 2.5 % 97.5 % beta
## Intercept 6.668626e+00 6.779743390 6.72493620
## M 0.000000e+00 2.146826338 1.14359433
## So -6.109821e-02 0.312584636 0.03547522
## Ed 6.421643e-01 3.132233107 1.85848834
## Po1 -4.898413e-05 1.411629669 0.60067372
## Po2 -1.091004e-01 1.437126001 0.31841766
## LF -5.370194e-01 0.914519154 0.05933737
## M.F -2.327060e+00 1.580858361 -0.02702786
## Pop -1.234147e-01 0.008296623 -0.02248283
## NW 0.000000e+00 0.165170878 0.06668437
## U1 -5.398903e-01 0.354907526 -0.02456854
## U2 -9.718635e-05 0.660028387 0.20702927
## GDP -4.123953e-02 1.150540460 0.20625063
## Ineq 6.681233e-01 2.096703914 1.39012647
## Prob -4.096855e-01 0.000000000 -0.21536203
## Time -4.938990e-01 0.069843468 -0.08433479
## attr(,"Probability")
## [1] 0.95
## attr(,"class")
## [1] "confint.bas"
where the third column is the posterior mean. This uses Monte Carlo sampling to draw from the mixture over models for each coefficient, where models are sampled based on their posterior probabilities.
We can also plot these via
plot(confint(coef.ZS, parm=2:16))
## NULL
using the parm argument to select which coefficients to plot (the intercept is parm = 1).
BAS has methods defined to return fitted values, fitted, using the observed design matrix, and predictions at either the observed data or potentially new values, predict, as with lm.
muhat.BMA = fitted(crime.ZS, estimator="BMA")
BMA = predict(crime.ZS, estimator="BMA")
# predict has additional slots for fitted values under BMA, predictions under each model
names(BMA)
## [1] "fit" "Ybma" "Ypred" "postprobs" "se.fit"
## [6] "se.pred" "se.bma.fit" "se.bma.pred" "df" "best"
## [11] "bestmodel" "prediction" "estimator"
Plotting the two sets of fitted values,
par(mar=c(9, 9, 3, 3))
plot(muhat.BMA, BMA$fit,
pch=16,
xlab=expression(hat(mu[i])), ylab=expression(hat(Y[i])))
abline(0,1)
we see that they are in perfect agreement. That is always the case, as the posterior mean for the regression mean function at a point \(x\) is the expected posterior predictive value for \(Y\) at \(x\). This is true not only for estimators such as BMA, but also for the expected values under model selection.
In addition to using BMA, we can use the posterior means under model selection. This corresponds to a decision rule that combines estimation and selection. BAS currently implements the following options:
highest probability model:
HPM = predict(crime.ZS, estimator="HPM")
# show the indices of variables in the best model where 0 is the intercept
HPM$bestmodel
## [1] 0 1 3 4 9 11 13 14 15
A little more interpretable version with names:
(crime.ZS$namesx[HPM$bestmodel +1])[-1]
## [1] "M" "Ed" "Po1" "NW" "U2" "Ineq" "Prob" "Time"
median probability model:
MPM = predict(crime.ZS, estimator="MPM")
(crime.ZS$namesx[attr(MPM$fit, 'model') +1])[-1]
## [1] "M" "Ed" "Po1" "NW" "U2" "Ineq" "Prob"
Note that we can also extract the best model from the attribute in the fitted values as well.
best predictive model:
This is the model that is closest to BMA predictions under squared error loss.
BPM = predict(crime.ZS, estimator="BPM")
(crime.ZS$namesx[attr(BPM$fit, 'model') +1])[-1]
## [1] "M" "So" "Ed" "Po1" "Po2" "M.F" "NW" "U2" "Ineq" "Prob"
Let’s see how they compare:
library(GGally)
ggpairs(data.frame(HPM = as.vector(HPM$fit), #this used predict so we need to extract fitted values
MPM = as.vector(MPM$fit), # this used fitted
BPM = as.vector(BPM$fit), # this used fitted
BMA = as.vector(BMA$fit))) # this used predict
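To make the comparison with BMA concrete, a small sketch (using the HPM, MPM, BPM and BMA objects created above) computes the squared distance of each estimator's fitted values from the BMA fit; by its definition the BPM should yield the smallest value among the single-model estimators:

```r
# Squared error distance of each set of fitted values from the BMA fit;
# the BPM is defined as the model minimizing this distance
sapply(list(HPM = HPM$fit, MPM = MPM$fit, BPM = BPM$fit),
       function(f) sum((as.vector(f) - as.vector(BMA$fit))^2))
```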
Using the se.fit = TRUE option with predict we can also calculate standard deviations for prediction or for the mean, and use this as input for the confint function for the prediction object.
BPM = predict(crime.ZS, estimator="BPM", se.fit=TRUE)
crime.conf.fit = confint(BPM, parm="mean")
crime.conf.pred = confint(BPM, parm="pred")
cbind(crime.conf.fit, crime.conf.pred)
## 2.5 % 97.5 % mean 2.5 % 97.5 % pred
## [1,] 6.513238 6.824738 6.668988 6.258715 7.079261 6.668988
## [2,] 7.151787 7.429921 7.290854 6.886619 7.695089 7.290854
## [3,] 6.039978 6.364354 6.202166 5.789406 6.614926 6.202166
## [4,] 7.490608 7.832006 7.661307 7.245129 8.077484 7.661307
## [5,] 6.847647 7.183493 7.015570 6.600523 7.430617 7.015570
## [6,] 6.279276 6.659818 6.469547 6.044966 6.894128 6.469547
## [7,] 6.555130 6.997135 6.776133 6.336920 7.215346 6.776133
## [8,] 7.117166 7.481955 7.299560 6.878450 7.720670 7.299560
## [9,] 6.482384 6.747470 6.614927 6.212890 7.016964 6.614927
## [10,] 6.468988 6.724836 6.596912 6.196374 6.997449 6.596912
## [11,] 6.877582 7.188087 7.032834 6.622750 7.442918 7.032834
## [12,] 6.462326 6.701317 6.581822 6.183896 6.979748 6.581822
## [13,] 6.281998 6.653843 6.467921 6.045271 6.890571 6.467921
## [14,] 6.403813 6.728664 6.566239 6.153385 6.979092 6.566239
## [15,] 6.388987 6.711270 6.550129 6.137779 6.962479 6.550129
## [16,] 6.746097 7.031087 6.888592 6.483166 7.294019 6.888592
## [17,] 6.063944 6.441526 6.252735 5.828815 6.676654 6.252735
## [18,] 6.564634 7.026895 6.795764 6.351369 7.240160 6.795764
## [19,] 6.766289 7.125086 6.945687 6.525866 7.365508 6.945687
## [20,] 6.840374 7.160289 7.000331 6.588442 7.412220 7.000331
## [21,] 6.443389 6.784108 6.613748 6.197710 7.029787 6.613748
## [22,] 6.352123 6.666946 6.509534 6.098628 6.920441 6.509534
## [23,] 6.589687 6.973172 6.781430 6.356187 7.206672 6.781430
## [24,] 6.659905 6.943825 6.801865 6.396626 7.207104 6.801865
## [25,] 6.187973 6.549014 6.368493 5.948191 6.788795 6.368493
## [26,] 7.173560 7.638879 7.406220 6.961027 7.851412 7.406220
## [27,] 5.780243 6.209869 5.995056 5.558924 6.431187 5.995056
## [28,] 6.970370 7.291621 7.130996 6.718847 7.543144 7.130996
## [29,] 6.904331 7.264275 7.084303 6.664237 7.504370 7.084303
## [30,] 6.360876 6.677539 6.519208 6.107948 6.930468 6.519208
## [31,] 5.952977 6.430114 6.191546 5.743237 6.639854 6.191546
## [32,] 6.472328 6.820844 6.646586 6.228936 7.064236 6.646586
## [33,] 6.591383 6.966323 6.778853 6.355520 7.202186 6.778853
## [34,] 6.683297 6.943958 6.813627 6.412314 7.214940 6.813627
## [35,] 6.503099 6.870205 6.686652 6.265039 7.108265 6.686652
## [36,] 6.788852 7.304426 7.046639 6.587815 7.505464 7.046639
## [37,] 6.601977 6.971745 6.786861 6.364667 7.209055 6.786861
## [38,] 6.128026 6.484162 6.306094 5.886840 6.725348 6.306094
## [39,] 6.460387 6.740965 6.600676 6.196020 7.005333 6.600676
## [40,] 6.934796 7.254189 7.094493 6.682705 7.506280 7.094493
## [41,] 6.374613 6.816734 6.595673 6.156431 7.034916 6.595673
## [42,] 5.761671 6.249794 6.005732 5.554476 6.456988 6.005732
## [43,] 6.822918 7.102682 6.962800 6.558285 7.367316 6.962800
## [44,] 6.910261 7.220580 7.065421 6.655371 7.475470 7.065421
## [45,] 6.060228 6.473190 6.266709 5.834621 6.698797 6.266709
## [46,] 6.315350 6.708046 6.511698 6.084359 6.939037 6.511698
## [47,] 6.644370 7.001773 6.823072 6.403548 7.242596 6.823072
plot(crime.conf.fit)
## NULL
Many problems are too large to enumerate all possible models. In such cases we may use the method = "MCMC" option to sample models via Markov chain Monte Carlo based on their posterior probabilities.
crime.ZS = bas.lm(y ~ .,
                  data = UScrime,
                  prior = "ZS-null",
                  modelprior = uniform(),
                  method = "MCMC")
This will run the MCMC sampler until the number of unique sampled models exceeds n.models, which is \(2^p\) (if \(p < 19\)) by default, or until MCMC.iterations has been exceeded, where MCMC.iterations = n.models*2 by default.
With MCMC sampling there are two estimates of the marginal inclusion probabilities: object$probne0, obtained by using the renormalized posterior odds from the sampled models to estimate probabilities, and the estimates based on Monte Carlo frequencies, object$probs.MCMC. These should be in close agreement if the MCMC sampler has run for enough iterations.
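The two sets of estimates can also be compared directly in a table (a sketch using the probne0 and probs.MCMC components named above):

```r
# Side-by-side comparison of the renormalized and Monte Carlo
# frequency estimates of the marginal inclusion probabilities
round(cbind(renormalized = crime.ZS$probne0,
            MCMC = crime.ZS$probs.MCMC), 4)
```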
BAS includes a diagnostic function to compare the two sets of estimates of posterior inclusion probabilities and posterior model probabilities
diagnostics(crime.ZS, type="pip", pch=16)
diagnostics(crime.ZS, type="model", pch=16)
In the left-hand plot of pips, each point represents the posterior inclusion probability of one of the 15 variables estimated under the two methods. The two estimators are in pretty close agreement. The plot of the model probabilities suggests that we should use more MCMC.iterations if we want more accurate estimates of the posterior model probabilities.
crime.ZS = bas.lm(y ~ .,
                  data = UScrime,
                  prior = "ZS-null",
                  modelprior = uniform(),
                  method = "MCMC", MCMC.iterations = 10^6)
diagnostics(crime.ZS, type="model", pch=16)
BAS includes other prior distributions on coefficients and models, as well as bas.glm for fitting Generalized Linear Models. Some of the syntax for bas.glm and bas.lm has not converged, particularly how some of the priors are represented, so please see the documentation for more features and details until this is updated!