The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Getting Started with spatialAtomizeR

Yunzhe Qian (Bella), Principal Investigators: Dr. Nancy Krieger and Dr. Rachel Nethery

2026-02-13

Overview

spatialAtomizeR implements atom-based Bayesian regression methods (ABRM) for spatial data with misaligned grids. This vignette demonstrates the basic workflow for analyzing misaligned spatial data using both simulated and real-world examples.

Installation

# Install from CRAN
install.packages("spatialAtomizeR")

# Or install development version from GitHub
devtools::install_github("bellayqian/spatialAtomizeR")

Load Required Packages

Important: Always load nimble before using spatialAtomizeR functions that involve model fitting.

library(spatialAtomizeR)
library(nimble)  # Required for ABRM models

Example 1: Simulated Data

This example demonstrates the complete workflow using simulated data.

Step 1: Simulate Misaligned Data

Generate spatial data with misaligned grids. The function creates two non-overlapping grids (X-grid and Y-grid) and their intersection (atoms).

sim_data <- simulate_misaligned_data(
  seed = 42,
  
  # Distribution specifications for covariates
  dist_covariates_x = c('normal', 'poisson', 'binomial'),
  dist_covariates_y = c('normal', 'poisson', 'binomial'),
  dist_y = 'poisson',  # Outcome distribution
  
  # Intercepts for data generation (REQUIRED)
  x_intercepts = c(4, -1, -1),      # One per X covariate
  y_intercepts = c(4, -1, -1),      # One per Y covariate
  beta0_y = -1,                     # Outcome model intercept
  
  # Spatial correlation parameters
  x_correlation = 0.5,  # Correlation between X covariates
  y_correlation = 0.5,  # Correlation between Y covariates
  
  # True effect sizes for outcome model
  beta_x = c(-0.03, 0.1, -0.2),    # Effects of X covariates
  beta_y = c(0.03, -0.1, 0.2)      # Effects of Y covariates
)

The simulated data object contains: - gridx: X-grid spatial features with covariates - gridy: Y-grid spatial features with covariates and outcome - atoms: Atom-level spatial features (intersection of grids) - true_params: True parameter values used in simulation

Step 2: Examine Simulated Data

The simulate_misaligned_data function returns an object of class misaligned_data with useful S3 methods:

# Check the class
class(sim_data)

# Print method shows basic information
print(sim_data)

# Summary method shows more details
summary(sim_data)

Step 3: Get ABRM Model Code

Retrieve the pre-compiled NIMBLE model specification:

model_code <- get_abrm_model()

Step 4: Run ABRM Analysis

Fit the ABRM model. You must specify which covariates follow which distributions by providing their indices:

abrm_results <- run_abrm(
  gridx = sim_data$gridx,
  gridy = sim_data$gridy,
  atoms = sim_data$atoms,
  model_code = model_code,
  true_params = sim_data$true_params,  # Optional: for validation
  
  # Map distribution indices to positions in dist_covariates_x/y
  norm_idx_x = 1,   # 'normal' is 1st in dist_covariates_x
  pois_idx_x = 2,   # 'poisson' is 2nd
  binom_idx_x = 3,  # 'binomial' is 3rd
  norm_idx_y = 1,   # Same for Y covariates
  pois_idx_y = 2,
  binom_idx_y = 3,
  
  # Outcome distribution: 1=normal, 2=poisson, 3=binomial
  dist_y = 2,
  
  # MCMC parameters
  niter = 50000,    # Total iterations per chain
  nburnin = 30000,  # Burn-in iterations
  nchains = 2,      # Number of chains
  seed = 123
)

Step 5: Examine ABRM Results

The run_abrm function returns an object of class abrm with S3 methods:

# Check the class
class(abrm_results)

# Print method shows summary statistics
print(abrm_results)

# Summary method shows detailed parameter estimates
summary(abrm_results)

# Plot method shows MCMC diagnostics (if available)
plot(abrm_results)

Access vairance-covariance matrices from the MCMC posterior sample

# Get variance-covariance matrices
vcov_matrices <- vcov(abrm_results)

# Print summary
print(vcov_matrices)

# Access specific matrices
vcov_matrices$vcov_beta_x   # Variance-covariance for X-grid coefficients
vcov_matrices$vcov_beta_y   # Variance-covariance for Y-grid coefficients
vcov_matrices$vcov_beta_0   # Variance of intercept
vcov_matrices$vcov_all      # Combined matrix for all parameters

# Compute standard errors from diagonal
sqrt(diag(vcov_matrices$vcov_beta_x))

# Compute correlation matrix from covariance matrix
cov2cor(vcov_matrices$vcov_beta_y)

# Check if parameters are correlated
vcov_matrices$vcov_beta_x[1, 2]  # Covariance between first two X coefficients

Access specific results:

# Parameter estimates table
abrm_results$parameter_estimates

# MCMC convergence diagnostics
abrm_results$mcmc_results$convergence

# Full MCMC samples
# abrm_results$mcmc_results$samples

Example 2: Real Data - Utah Counties and School Districts

This example demonstrates using real spatial data from Utah, matching the analysis in the manuscript.

Load Required Packages for Spatial Data

library(tigris)    # For US Census shapefiles
library(sf)        # Spatial features
library(sp)        # Spatial objects
library(spdep)     # Spatial dependence
library(raster)    # Spatial data manipulation
library(dplyr)     # Data manipulation
library(ggplot2)   # Plotting

Step 1: Load Utah Shapefiles

set.seed(500)

# Load Utah counties (Y-grid)
cnty <- counties(state = 'UT')
gridy <- cnty %>%
  rename(ID = GEOID) %>%
  mutate(ID = as.numeric(ID))  # ID must be numeric

# Load Utah school districts (X-grid)
scd <- school_districts(state = 'UT')
gridx <- scd %>%
  rename(ID = GEOID) %>%
  mutate(ID = as.numeric(ID))

Step 2: Visualize Spatial Misalignment

ggplot() +
  geom_sf(data = gridx, fill = NA, color = "orange", linewidth = 1.2) +
  geom_sf(data = gridy, fill = NA, color = "blue", 
          linetype = 'dashed', linewidth = 0.5) +
  labs(title = "Spatial Misalignment in Utah",
       subtitle = "Orange: School Districts | Blue: Counties") +
  theme_void()

Step 3: Create Atoms by Intersecting Grids

# Intersect the two grids to create atoms
atoms <- raster::intersect(as(gridy, 'Spatial'), as(gridx, 'Spatial'))
atoms <- sf::st_as_sf(atoms)

# Rename ID variables to match expected names
atoms <- atoms %>%
  rename(ID_y = ID_1,    # Y-grid (county) ID
         ID_x = ID_2)     # X-grid (school district) ID

Step 4: Add Atom-Level Population Counts

ABRM requires population counts at the atom level. Here we use LandScan gridded population data:

# NOTE: You must download LandScan data separately
# Available at: https://landscan.ornl.gov/
# This example assumes the file is in your working directory

pop_raster <- raster("landscan-global-2022.tif")

# Extract population for each atom
pop_atoms <- raster::extract(
  pop_raster, 
  st_transform(atoms, crs(pop_raster)),
  fun = sum, 
  na.rm = TRUE
)

atoms$population <- pop_atoms

Step 5: Generate Semi-Synthetic Outcome and Covariates

For demonstration, we generate synthetic outcome and covariate data at the atom level:

# Create atom-level spatial adjacency matrix
W_a <- nb2mat(
  poly2nb(as(atoms, "Spatial"), queen = TRUE),
  style = "B", 
  zero.policy = TRUE
)

# Generate spatial random effects
atoms <- atoms %>%
  mutate(
    re_x = gen_correlated_spat(W = W_a, n_vars = 1, correlation = 1, rho = 0.8),
    re_z = gen_correlated_spat(W = W_a, n_vars = 1, correlation = 1, rho = 0.8),
    re_y = gen_correlated_spat(W = W_a, n_vars = 1, correlation = 1, rho = 0.8)
  )

# Generate atom-level covariates and outcome
atoms <- atoms %>%
  mutate(
    # X-grid covariate (Poisson)
    covariate_x_1 = rpois(
      n = nrow(atoms), 
      lambda = population * exp(-1 + re_x)
    ),
    # Y-grid covariate (Normal)
    covariate_y_1 = rnorm(
      n = nrow(atoms), 
      mean = population * (3 + re_z), 
      sd = 1 * population
    )
  ) %>%
  mutate(
    # Outcome (Poisson)
    y = rpois(
      n = nrow(atoms),
      lambda = population * exp(
        -5 + 
        1 * (covariate_x_1 / population) - 
        0.3 * (covariate_y_1 / population) + 
        re_y
      )
    )
  )

Step 6: Aggregate to Observed Grids

In reality, we only observe aggregated data at the grid level, not at atoms:

# Aggregate X covariates to X-grid
gridx[["covariate_x_1"]] <- sapply(gridx$ID, function(j) {
  atom_indices <- which(atoms$ID_x == j)
  sum(atoms[["covariate_x_1"]][atom_indices])
})

# Aggregate Y covariates to Y-grid
gridy[["covariate_y_1"]] <- sapply(gridy$ID, function(j) {
  atom_indices <- which(atoms$ID_y == j)
  sum(atoms[["covariate_y_1"]][atom_indices])
})

# Aggregate outcome to Y-grid
gridy$y <- sapply(gridy$ID, function(j) {
  atom_indices <- which(atoms$ID_y == j)
  sum(atoms$y[atom_indices])
})

Step 7: Run ABRM on Real Data

model_code <- get_abrm_model()

mcmc_results <- run_abrm(
  gridx = gridx,
  gridy = gridy,
  atoms = atoms,
  model_code = model_code,
  
  # Specify covariate distributions
  pois_idx_x = 1,  # X covariate is Poisson
  norm_idx_y = 1,  # Y covariate is Normal
  dist_y = 2,      # Outcome is Poisson
  
  # MCMC settings (increase for real analysis)
  niter = 300000,
  nburnin = 200000,
  nchains = 2,
  seed = 123
)

# View results
summary(mcmc_results)

Understanding Distribution Indices

When you specify covariates, the indices correspond to their positions:

dist_covariates_x = c('normal', 'poisson', 'binomial')
#                       ^1        ^2         ^3

# So you would specify:
norm_idx_x = 1    # First position
pois_idx_x = 2    # Second position  
binom_idx_x = 3   # Third position

Distribution Type Codes

Code	Distribution	String	Use Case
1	Normal	‘normal’	Continuous data
2	Poisson	‘poisson’	Count/rate data
3	Binomial	‘binomial’	Proportion/binary data

Troubleshooting

Common Errors

Error: “could not find function ‘getNimbleOption’” - Solution: Load nimble before calling get_abrm_model(): library(nimble)

Error: “unused arguments (res1 = c(5, 5), res2 = c(10, 10))” - Solution: The simulate_misaligned_data() function does not have res1 or res2 parameters. Grid resolutions are fixed internally.

Error: “gridx must contain an area ID variable named ‘ID’” - Solution: Ensure your spatial dataframes have a numeric column named ID

Key Parameters Reference

Critical Parameters (Don’t Omit!)

x_intercepts: Intercepts for X covariates (must match length of dist_covariates_x)
y_intercepts: Intercepts for Y covariates (must match length of dist_covariates_y)
beta0_y: Outcome model intercept
beta_x: True coefficients for X covariates (if simulating)
beta_y: True coefficients for Y covariates (if simulating)

MCMC Parameters

niter: Total MCMC iterations per chain (default: 50000)
nburnin: Burn-in iterations to discard (default: 30000)
nchains: Number of independent chains (default: 2)
thin: Thinning interval (default: 10)

Getting Help

Function documentation: ?simulate_misaligned_data, ?run_abrm
Report issues: https://github.com/bellayqian/spatialAtomizeR/issues
Package help: ?spatialAtomizeR

Citation

citation("spatialAtomizeR")

Reference: Qian, Y., Johnson, N., Krieger, N., & Nethery, R.C. (2026). spatialAtomizeR: Atom-Based Regression Models for Spatially Misaligned Data in R. R package version 0.2.6.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.