The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
Compared to the univariate gridded Gaussian case, we now place the data irregularly and assume we observe counts rather than a Gaussian response.
library(magrittr)
library(dplyr)
library(ggplot2)
library(meshed)
set.seed(2021)
<- 30 # coord values for jth dimension
SS <- 2 # spatial dimension
dd <- SS^2 # number of locations
n <- 3 # number of covariates
p
# irregularly spaced
<- cbind(runif(n), runif(n))
coords colnames(coords) <- c("Var1", "Var2")
<- 1.5
sigmasq <- 10
phi
# covariance at coordinates
<- sigmasq * exp(-phi * as.matrix(dist(coords)))
CC # cholesky of covariance matrix
<- t(chol(CC))
LC # spatial latent effects are a GP
<- LC %*% rnorm(n)
w
# covariates and coefficients
<- matrix(rnorm(n*p), ncol=p)
X <- matrix(rnorm(p), ncol=1)
Beta
# univariate outcome, fully observed
<- rpois(n, exp(-3 + X %*% Beta + w))
y_full
# now introduce some NA values in the outcomes
<- y_full %>% matrix(ncol=1)
y sample(1:n, n/5, replace=FALSE), ] <- NA
y[
<- coords %>%
simdata cbind(data.frame(Outcome_full= y_full,
Outcome_obs = y,
w = w))
%>% ggplot(aes(Var1, Var2, color=w)) +
simdata geom_point() +
scale_color_viridis_c() +
theme_minimal() + ggtitle("w: Latent GP")
%>% ggplot(aes(Var1, Var2, color=y)) +
simdata geom_point() +
scale_color_viridis_c() +
theme_minimal() + ggtitle("y: Observed outcomes")
We now fit the following spatial regression model \[ y ~ Poisson(\eta), \] where \(log(\eta) = X %*% Beta + w\), and \(w \sim MGP\) are spatial random effects. For spatial data, an exponential covariance function is used by default: \(C(h) = \sigma^2 \exp( -\phi h )\) where \(h\) is the spatial distance.
The prior for the spatial decay \(\phi\) is \(U[l,u]\) and the values of \(l\) and \(u\) must be specified. We choose \(l=1\), \(u=30\) for this dataset.1
Setting verbose
tells spmeshed
how many
intermediate messages to output while running MCMC. For brevity, we opt
to run a very short chain of MCMC with only 2000 iterations, of which we
discard the first third. Since the data are irregularly spaced, we build
a grid of knots of size 1600 using argument grid_size
,
which will facilitate computations. Then, just like in the gridded case
we use block_size=20
to specify the coarseness of domain
partitioning.
Finally, we note that NA
values for the outcome are OK
since the full dataset is on a grid. spmeshed
will figure
this out and use settings optimized for partly observed lattices, and
automatically predict the outcomes at missing locations. On the other
hand, X
values are assumed to not be missing.
<- 200 # too small! this is just a vignette.
mcmc_keep <- 400
mcmc_burn <- 2
mcmc_thin
<- system.time({
mesh_total_time <- spmeshed(y, X, coords,
meshout family="poisson",
grid_size=c(20, 20),
block_size = 20,
n_samples = mcmc_keep, n_burn = mcmc_burn, n_thin = mcmc_thin,
n_threads = 16,
verbose = 5,
prior=list(phi=c(1,30))
)})#> Bayesian Meshed GP regression model fit via Markov chain Monte Carlo
#> Sending to MCMC.
#> 20.2% elapsed: 67ms (+ 61ms). ETR: 0.29s.
#> theta: Metrop. acceptance 29.00%, average 35.80%
#> theta = 2.783 0.175
#> lambda = 1.893
#>
#> 40.4% elapsed: 124ms (+ 57ms). ETR: 0.20s.
#> theta: Metrop. acceptance 26.50%, average 31.27%
#> theta = 1.453 0.306
#> lambda = 2.467
#>
#> 60.5% elapsed: 189ms (+ 64ms). ETR: 0.14s.
#> theta: Metrop. acceptance 21.50%, average 28.51%
#> theta = 1.283 0.640
#> lambda = 2.577
#>
#> 80.6% elapsed: 251ms (+ 62ms). ETR: 0.07s.
#> theta: Metrop. acceptance 23.00%, average 26.98%
#> theta = 1.649 0.748
#> lambda = 2.196
#>
#> MCMC done [0.303s]
We can now do some postprocessing of the results. We extract
posterior marginal summaries for \(\sigma^2\), \(\phi\), \(\tau^2\), and \(\beta_2\). The model that
spmeshed
targets is a slight reparametrization of the
above:2
\[ log(\eta) = X \beta + \lambda w, \]
where \(w\sim MGP\) has unitary
variance. This model is equivalent to the previous one and in fact we
find \(\sigma^2=\lambda^2\). Naturally,
it is much more difficult to estimate parameters when data are
counts.
summary(meshout$lambda_mcmc[1,1,]^2)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 2.954 4.472 6.076 6.158 7.287 12.585
summary(meshout$theta_mcmc[1,1,])
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 1.004 1.261 1.561 1.645 1.952 3.718
summary(meshout$beta_mcmc[1,1,])
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -0.59923 -0.39025 -0.33478 -0.33508 -0.27247 -0.08278
We proceed to plot predictions across the domain along with the
recovered latent effects. We plot the latent effects at the grid we used
for fitting spmeshed
. Instead, we plot our predictions at
the original data locations. We may see some pattern by plotting the
data on the log scale.
# process means
<- data.frame(w_mgp = meshout$w_mcmc %>% summary_list_mean())
wmesh # predictions
<- data.frame(y_mgp = meshout$yhat_mcmc %>% summary_list_mean())
ymesh
<- meshout$coordsdata %>%
outdf cbind(wmesh, ymesh) %>%
left_join(simdata, by = c("Var1", "Var2"))
%>% filter(forced_grid==1) %>%
outdf ggplot(aes(Var1, Var2, fill=w_mgp)) +
geom_raster() +
scale_fill_viridis_c() +
theme_minimal() + ggtitle("w: recovered")
%>% filter(forced_grid==0) %>%
outdf ggplot(aes(Var1, Var2, color=y_mgp)) +
geom_point() +
scale_color_viridis_c() +
theme_minimal() + ggtitle("y: predictions")
spmeshed
implements a model which can be
interpreted as assigning \(\sigma^2\) a
folded-t via parameter expansion if settings$ps=TRUE
, or an
inverse Gamma with parameters \(a=2\)
and \(b=1\) if
settings$ps=FALSE
, which cannot be changed at this time.
\(\tau^2\) is assigned an exponential
prior.↩︎
At its core, spmeshed
implements the
spatial factor model \(Y(s) ~ Poisson(
exp(X(s)\beta + \Lambda v(s)) )\) where \(w(s) = \Lambda v(s)\) is modeled via linear
coregionalization.↩︎
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.