Gibbs sampling was originally designed by Geman and Geman (1984) for drawing updates from the Gibbs distribution, hence the name. However, single-site Gibbs sampling exhibits poor mixing due to the posterior correlation between the pixel labels. Thus it is very slow to converge when the correlation (controlled by the inverse temperature \(\beta\)) is high.
The algorithm of Swendsen and Wang (1987) addresses this problem by forming clusters of neighbouring pixels, then updating all of the labels within a cluster to the same value. When simulating from the prior, such as a Potts model without an external field, this algorithm is very efficient.
The SW
function in the PottsUtils package is implemented in a combination of R
and C
. The swNoData
function in bayesImageS is implemented using RcppArmadillo, which gives it a speed advantage. It is worth noting that the intention of bayesImageS is not to replace PottsUtils. Rather, an efficient Swendsen-Wang algorithm is used as a building block for implementations of ABC (Grelaud et al. 2009), path sampling (Gelman and Meng 1998), and the exchange algorithm (Murray, Ghahramani, and MacKay 2006). These other algorithms will be covered in future posts.
There are two things that we want to keep track of in this simulation study: the speed of the algorithm and the distribution of the summary statistic. We will be using system.time(..)
to measure both CPU and elapsed (wall clock) time taken for the same number of iterations, for a range of inverse temperatures:
beta <- seq(0,2,by=0.1)
tmMx.PU <- tmMx.bIS <- matrix(nrow=length(beta),ncol=2)
rownames(tmMx.PU) <- rownames(tmMx.bIS) <- beta
colnames(tmMx.PU) <- colnames(tmMx.bIS) <- c("user","elapsed")
We will discard the first 100 iterations as burn-in and keep the remaining 500.
The distribution of pixel labels can be summarised by the sufficient statistic of the Potts model:
\[ S(z) = \sum_{i \sim \ell \in \mathscr{N}} \delta(z_i, z_\ell) \]
where \(i \sim \ell \in \mathscr{N}\) are all of the pairs of neighbours in the lattice (ie. the cliques) and \(\delta(u,v)\) is 1 if \(u = v\) and 0 otherwise (the Kronecker delta function). swNoData
returns this automatically, but with SW
we will need to use the function sufficientStat
to calculate the sufficient statistic for the labels.
##
## Attaching package: 'PottsUtils'
## The following objects are masked from 'package:bayesImageS':
##
## getBlocks, getEdges, getNeighbors
mask <- matrix(1,50,50)
neigh <- getNeighbors(mask, c(2,2,0,0))
block <- getBlocks(mask, 2)
edges <- getEdges(mask, c(2,2,0,0))
n <- sum(mask)
k <- 2
bcrit <- log(1 + sqrt(k))
maxSS <- nrow(edges)
for (i in 1:length(beta)) {
# PottsUtils
tm <- system.time(result <- SW(iter,n,k,edges,beta=beta[i]))
tmMx.PU[i,"user"] <- tm["user.self"]
tmMx.PU[i,"elapsed"] <- tm["elapsed"]
res <- sufficientStat(result, neigh, block, k)
samp.PU[i,] <- res$sum[(burn+1):iter]
print(paste("PottsUtils::SW",beta[i],tm["elapsed"],median(samp.PU[i,])))
# bayesImageS
tm <- system.time(result <- swNoData(beta[i],k,neigh,block,iter))
tmMx.bIS[i,"user"] <- tm["user.self"]
tmMx.bIS[i,"elapsed"] <- tm["elapsed"]
samp.bIS[i,] <- result$sum[(burn+1):iter]
print(paste("bayesImageS::swNoData",beta[i],tm["elapsed"],median(samp.bIS[i,])))
}
## [1] "PottsUtils::SW 0 4.40900000000001 2448"
## [1] "bayesImageS::swNoData 0 0.109999999999999 2447.5"
## [1] "PottsUtils::SW 0.1 2.72000000000001 2571"
## [1] "bayesImageS::swNoData 0.1 0.156999999999996 2575.5"
## [1] "PottsUtils::SW 0.2 2.574 2695"
## [1] "bayesImageS::swNoData 0.2 0.165000000000006 2698"
## [1] "PottsUtils::SW 0.3 2.547 2830"
## [1] "bayesImageS::swNoData 0.3 0.179000000000002 2827"
## [1] "PottsUtils::SW 0.4 2.43199999999999 2971"
## [1] "bayesImageS::swNoData 0.4 0.189999999999998 2974"
## [1] "PottsUtils::SW 0.5 2.29100000000001 3123"
## [1] "bayesImageS::swNoData 0.5 0.192999999999998 3132"
## [1] "PottsUtils::SW 0.6 2.184 3305"
## [1] "bayesImageS::swNoData 0.6 0.185999999999993 3310"
## [1] "PottsUtils::SW 0.7 2.077 3516"
## [1] "bayesImageS::swNoData 0.7 0.205000000000013 3527"
## [1] "PottsUtils::SW 0.8 1.96699999999998 3783"
## [1] "bayesImageS::swNoData 0.8 0.179999999999978 3782"
## [1] "PottsUtils::SW 0.9 1.851 4160"
## [1] "bayesImageS::swNoData 0.9 0.162000000000006 4186"
## [1] "PottsUtils::SW 1 1.71299999999999 4520"
## [1] "bayesImageS::swNoData 1 0.164000000000016 4529"
## [1] "PottsUtils::SW 1.1 1.64100000000002 4679"
## [1] "bayesImageS::swNoData 1.1 0.167000000000002 4677.5"
## [1] "PottsUtils::SW 1.2 1.56999999999999 4761"
## [1] "bayesImageS::swNoData 1.2 0.146999999999991 4762"
## [1] "PottsUtils::SW 1.3 1.577 4810"
## [1] "bayesImageS::swNoData 1.3 0.145999999999987 4812"
## [1] "PottsUtils::SW 1.4 1.52799999999999 4843"
## [1] "bayesImageS::swNoData 1.4 0.141999999999996 4841"
## [1] "PottsUtils::SW 1.5 1.53399999999999 4863"
## [1] "bayesImageS::swNoData 1.5 0.150999999999982 4865"
## [1] "PottsUtils::SW 1.6 1.54299999999998 4875"
## [1] "bayesImageS::swNoData 1.6 0.155999999999977 4876"
## [1] "PottsUtils::SW 1.7 1.69400000000002 4884"
## [1] "bayesImageS::swNoData 1.7 0.135999999999996 4884"
## [1] "PottsUtils::SW 1.8 1.721 4889"
## [1] "bayesImageS::swNoData 1.8 0.131 4889"
## [1] "PottsUtils::SW 1.9 1.58699999999999 4893"
## [1] "bayesImageS::swNoData 1.9 0.157000000000011 4893"
## [1] "PottsUtils::SW 2 1.55799999999999 4896"
## [1] "bayesImageS::swNoData 2 0.140999999999991 4896"
Here is the comparison of elapsed times between the two algorithms (in seconds):
## user elapsed
## Min. :1.388 Min. :1.528
## 1st Qu.:1.436 1st Qu.:1.577
## Median :1.574 Median :1.721
## Mean :1.887 Mean :2.034
## 3rd Qu.:2.143 3rd Qu.:2.291
## Max. :4.291 Max. :4.409
## user elapsed
## Min. :0.1080 Min. :0.1100
## 1st Qu.:0.1440 1st Qu.:0.1460
## Median :0.1540 Median :0.1570
## Mean :0.1576 Mean :0.1602
## 3rd Qu.:0.1760 3rd Qu.:0.1790
## Max. :0.2020 Max. :0.2050
On average, swNoData
using RcppArmadillo (Eddelbuettel and Sanderson 2014) is seven times faster than SW
.
library(lattice)
s_z <- c(samp.PU,samp.bIS)
s_x <- rep(beta,times=iter-burn)
s_a <- rep(1:2,each=length(beta)*(iter-burn))
s.frame <- data.frame(s_z,c(s_x,s_x),s_a)
names(s.frame) <- c("stat","beta","alg")
s.frame$alg <- factor(s_a,labels=c("SW","swNoData"))
xyplot(stat ~ beta | alg, data=s.frame)
The overlap between the two distributions is almost complete, although it is a bit tricky to verify that statistically. The relationship between \(beta\) and \(S(z)\) is nonlinear and heteroskedastic.
## [1] 0.716 4.334 1.674 -2.698 -1.010 7.746 1.946 8.534 0.082 21.128
## [11] 5.620 -3.016 -1.364 4.408 -2.248 0.720 0.438 0.468 0.776 -0.128
## [21] -0.164
## [1] 34.767386 36.687161 36.668988 36.726811 38.577284 43.979061 41.778715
## [8] 49.194918 56.279603 70.771043 45.209439 33.225939 24.994265 19.729318
## [15] 16.353182 12.014890 9.522702 7.984644 6.638950 5.652550 4.415097
## [1] 34.439460 36.057187 34.136805 36.112528 37.093624 40.823487 42.898457
## [8] 49.613610 55.255928 68.490544 47.140730 34.708536 26.807338 20.323738
## [15] 16.175408 12.847516 9.647048 7.780510 6.484738 5.610003 4.374120
## Df Sum Sq Mean Sq F value Pr(>F)
## alg 1 2.738e+04 27385 22.4 2.23e-06 ***
## beta 20 1.745e+10 872604519 713729.6 < 2e-16 ***
## Residuals 20978 2.565e+07 1223
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = stat ~ alg + beta, data = s.frame)
##
## $alg
## diff lwr upr p adj
## swNoData-SW 2.283905 1.338027 3.229783 2.1e-06
Eddelbuettel, Dirk, and Conrad Sanderson. 2014. “RcppArmadillo: Accelerating R with High-Performance C++ Linear Algebra.” Comput. Stat. Data Anal. 71: 1054–63. https://doi.org/10.1016/j.csda.2013.02.005.
Gelman, Andrew, and Xiao-Li Meng. 1998. “Simulating Normalizing Constants: From Importance Sampling to Bridge Sampling to Path Sampling.” Statist. Sci. 13 (2): 163–85. https://doi.org/10.1214/ss/1028905934.
Geman, Stuart, and Donald Geman. 1984. “Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images.” IEEE Trans. PAMI 6: 721–41.
Grelaud, Aude, Christian P. Robert, Jean-Michel Marin, François Rodolphe, and Jean-François Taly. 2009. “ABC Likelihood-Free Methods for Model Choice in Gibbs Random Fields.” Bayesian Analysis 4 (2): 317–36. https://doi.org/10.1214/09-BA412.
Murray, Iain, Zoubin Ghahramani, and David J. C. MacKay. 2006. “MCMC for Doubly-Intractable Distributions.” In Proc. \(22^{nd}\) Conf. UAI, 359–66. Arlington, VA: AUAI Press.
Swendsen, Robert H., and Jian-Sheng Wang. 1987. “Nonuniversal Critical Dynamics in Monte Carlo Simulations.” Phys. Rev. Lett. 58 (2): 86–88. https://doi.org/10.1103/PhysRevLett.58.86.