Coarse-to-fine spatial modeling (CFSM) using the spCF package

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

This vignette introduces spatial prediction, multiscale pattern extraction, and uncertainty quantification using the spCF package. Throughout the vignette, we employ the coarse-to-fine spatial modeling (CFSM) framework. CFSM sequentially learns spatial patterns from coarser to finer scales and terminates the learning process when the validation score no longer improves. CFSM is especially suitable for moderate-to-large samples. See Murakami et al. (2025) for further details.

Data and setup

Let us load the required packages

library(spCF)
library(sp)
library(sf)
#> Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE

The meuse dataset included in the sp package is used in this vignette. This dataset comprises measurements of heavy metal concentrations in topsoil collected from a floodplain along the River Meuse, along with several covariates. In this example, zinc concentrations observed at 155 sample sites are used to predict concentrations over 3,103 regularly spaced grid cells.

The response variable is the logarithm of zinc concentration. The covariates include the distance to the river and dummy variables indicating flood frequency (ffreq2 and ffreq3):

### Data at samples sites
data(meuse)
y        <- log(meuse[,"zinc"])     # Response variable
coords   <- meuse[,c("x","y")]      # Coordinates
x        <- data.frame(dist   = meuse[,"dist"],
                       ffreq2 = as.integer(meuse$ffreq == 2),
                       ffreq3 = as.integer(meuse$ffreq == 3)) # Covariates

### Data at prediction sites
data(meuse.grid)
coords0  <- meuse.grid[,c("x","y")] # Coordinates
x0       <- data.frame(dist   = meuse.grid[,"dist"],
                       ffreq2 = as.integer(meuse.grid$ffreq == 2),
                       ffreq3 = as.integer(meuse.grid$ffreq == 3)) # Covariates

Zinc concentration is spatially plotted as follows:

obs_s    <- st_as_sf( data.frame(coords, zinc=y), coords= c("x","y"), crs=28992)
plot(obs_s[,"zinc"], cex=0.6,pch = 20, nbreaks = 20, key.pos=4, axes=TRUE)

Coarse-to-fine spatial modeling

In CFSM, the number of spatial scales, R, is optimized through holdout validation. A smaller value of R, corresponding to early stopping, captures coarser-scale spatial patterns, whereas a larger R allows the model to represent finer-scale patterns. The cf_lm_hv function performs holdout validation as follows:

set.seed(123) # For this vignette, training samples are fixed
mod_hv   <- cf_lm_hv(y = y, x = x, coords = coords)
#> [1] "--- SSE: Linear regression ---"
#> [1] 7.462058
#> [1] "--- SSE: Learning multi-scale spatial processes ---"
#> [1] 6.733006 (Scale  1)
#> [1] 6.381509 (Scale  2)
#> [1] 6.202699 (Scale  3)
#> [1] 6.041188 (Scale  4)
#> [1] 5.895836 (Scale  5)
#> [1] 5.723286 (Scale  6)
#> [1] 5.534511 (Scale  7)
#> [1] 5.321755 (Scale  8)
#> [1] 5.159955 (Scale  9)
#> [1] 4.927529 (Scale 10)
#> [1] 4.707226 (Scale 11)
#> [1] 4.548683 (Scale 12)
#> [1] 4.416852 (Scale 13)
#> [1] 4.304862 (Scale 14)
#> [1] 4.204059 (Scale 15)
#> [1] 4.119032 (Scale 16)
#> [1] 4.046272 (Scale 17)
#> [1] 4.046272 (Scale 18)
#> [1] 3.964767 (Scale 19)
#> [1] 3.964767 (Scale 20)
#> [1] 3.901129 (Scale 21)
#> [1] 3.901129 (Scale 22)
#> [1] 3.901129 (Scale 23)
#> [1] 3.901129 (Scale 24)
#> [1] 3.901129 (Scale 25)
#> [1] "--- SSE: After coefficient adjustment ---"
#> [1] 3.842398

In this example, 75% of the samples are randomly selected for training, and the remaining samples are used for validation (train_rat = 0.75). Alternatively, the training samples can be specified explicitly using the id_train argument. As shown in the output, the sum-of-squares error (SSE) for the validation samples gradually decreases as learning proceeds from the coarsest scale (Scale 1) to the finest scale (Scale 21 in this case).

Optionally, an additional learning can be applied to further reduce the validation SSE. Currently, only random forest is available; it is implemented by specifying add_learn = "rf" in the cf_lm_hv function, to capture non-linear patterns and higher-order interactions among covariates and spatial coordinates.

After holdout validation, the full model is trained using the cf_lm function:

mod      <- cf_lm(y = y, x = x, x0 = x0, coords = coords, coords0 = coords0, mod_hv = mod_hv)
#> [1] "--- Learning multi-scale spatial processes ---"
#> [1]  Scale  1 (bandwidth:1436.96)
#> [1]  Scale  2 (bandwidth:1293.264)
#> [1]  Scale  3 (bandwidth:1163.938)
#> [1]  Scale  4 (bandwidth:1047.544)
#> [1]  Scale  5 (bandwidth:942.7897)
#> [1]  Scale  6 (bandwidth:848.5107)
#> [1]  Scale  7 (bandwidth:763.6596)
#> [1]  Scale  8 (bandwidth:687.2937)
#> [1]  Scale  9 (bandwidth:618.5643)
#> [1]  Scale 10 (bandwidth:556.7079)
#> [1]  Scale 11 (bandwidth:501.0371)
#> [1]  Scale 12 (bandwidth:450.9334)
#> [1]  Scale 13 (bandwidth:405.84)
#> [1]  Scale 14 (bandwidth:365.256)
#> [1]  Scale 15 (bandwidth:328.7304)
#> [1]  Scale 16 (bandwidth:295.8574)
#> [1]  Scale 17 (bandwidth:266.2717)
#> [1]  Scale 18 (bandwidth:239.6445)
#> [1]  Scale 19 (bandwidth:215.68)
#> [1]  Scale 20 (bandwidth:194.112)
#> [1]  Scale 21 (bandwidth:174.7008)

In addition to the covariates x and spatial coordinates coords at the sample sites, the corresponding covariates (x0) and coordinates (coords0) at the prediction sites are also specified to enable spatial prediction.

The estimated regression coefficients, standard deviations of the scale-wise spatial processes, and error statistics are displayed as follows:

mod
#> Call:
#> cf_lm(y = y, x = x, coords = coords, x0 = x0, coords0 = coords0, 
#>     mod_hv = mod_hv)
#> 
#> ----Coefficients---------------------------------------
#>                 coef    coef_se lower_95CI upper_95CI
#> Intercept  6.7883824 0.06277981  6.6653339  6.9114308
#> dist      -2.5958941 0.20338559 -2.9945298 -2.1972583
#> ffreq2    -0.6132297 0.09010173 -0.7898290 -0.4366303
#> ffreq3    -0.6041271 0.11213476 -0.8239113 -0.3843430
#> 
#> ----Standard deviations (influential elements only)----
#>           elements standard_deviation
#> 1               xb         0.68002228
#> 2   spatial_scale1         0.09866884
#> 3   spatial_scale2         0.05187771
#> 4   spatial_scale3         0.02740153
#> 5   spatial_scale4         0.01837173
#> 6   spatial_scale5         0.01802175
#> 7   spatial_scale6         0.01810347
#> 8   spatial_scale7         0.01920265
#> 9   spatial_scale8         0.02250761
#> 10  spatial_scale9         0.02380506
#> 11 spatial_scale10         0.02604527
#> 12 spatial_scale11         0.02789054
#> 13 spatial_scale12         0.02746576
#> 14 spatial_scale13         0.02726837
#> 15 spatial_scale14         0.02630008
#> 16 spatial_scale15         0.02538236
#> 17 spatial_scale16         0.02523592
#> 18 spatial_scale17         0.02611116
#> 19 spatial_scale19         0.03359572
#> 20 spatial_scale21         0.04286941
#> 21       residuals         0.17623437
#> 
#> ----Error statistics ----------------------------------
#>              stat     value
#> 1   validation_R2 0.9235123
#> 2 validation_RMSE 0.3138838
#> 3     residual_SD 0.1762344

Mapping

Predictive values

The predictive values and their standard deviations at the grid cells are mapped as follows:

### Convert gridded points to gridded polygons (for clear visualization)
meuse.grid_sp             <- meuse.grid
coordinates(meuse.grid_sp)<- c("x", "y")
gridded(meuse.grid_sp)    <- TRUE
meuse.grid_sf             <- st_as_sf(as(meuse.grid_sp, "SpatialPolygons"))
st_crs(meuse.grid_sf)     <- 28992

### Mapping predictive mean and standard deviations
meuse.grid_sf$pred        <- mod$pred0$pred   # Predictive mean
meuse.grid_sf$pred_sd     <- mod$pred0$pred_sd# Predictive standard deviations
plot(meuse.grid_sf[,"pred"], border = NA, nbreaks = 20, key.pos=4,axes=TRUE)

plot(meuse.grid_sf[,"pred_sd"], border = NA, nbreaks = 20, key.pos=4,axes=TRUE)

As expected, the standard deviation, which quantifies prediction uncertainty, increases in the eastern center where sample sites are relatively sparse.

Multiscale spatial pattern/feature extraction

The sp_scalewise function extracts scalewise spatial processes with bandwidth values falling within a pre-specified range. For example, the following commands extract the large-, moderate-, and small-scale processes, corresponding to bandwidth ranges of 1000+, 500–1000, and 0–500, respectively.

mod_s1<- sp_scalewise(mod,bw_range=c(1000,Inf)) # Large scale (1000 <= bandwdith)
mod_s2<- sp_scalewise(mod,bw_range=c(500,1000)) # Moderate scale (500 <= bandwdith < 1000)
mod_s3<- sp_scalewise(mod,bw_range=c(0,500))    # Small scale (bandwdith < 500)

The extracted scalewise processes are mapped as follows:

z1    <- mod_s1$pred0$pred   # Large-scale process
z2    <- mod_s2$pred0$pred   # Moderate-scale process
z3    <- mod_s3$pred0$pred   # Small-scale process
meuse.grid_sf3  <- cbind(meuse.grid_sf, z1, z2, z3)
plot(meuse.grid_sf3[,c("z1","z2","z3")], border = NA, nbreaks = 20,key.pos=4,axes=TRUE)

Their standard deviations are displayed as follows:

z1_sd <- mod_s1$pred0$pred_sd   # Large-scale process
z2_sd <- mod_s2$pred0$pred_sd   # Moderate-scale process
z3_sd <- mod_s3$pred0$pred_sd   # Small-scale process
meuse.grid_sf3  <- cbind(meuse.grid_sf, z1_sd, z2_sd, z3_sd)
plot(meuse.grid_sf3[,c("z1_sd","z2_sd","z3_sd")],border = NA,nbreaks = 20,key.pos=4,axes=TRUE)

The sp_scalewise function is useful for multiscale spatial pattern analysis (or feature extraction), which is commonly conducted in ecological, epidemiological, and environmental studies.

Reference

Murakami, Daisuke, Alexis Comber, Takahiro Yoshida, Narumasa Tsutsumida, Chris Brunsdon, and Tomoki Nakaya. 2025. “Coarse-to-Fine Spatial Modeling: A Scalable, Machine-Learning-Compatible Spatial Model.” ArXiv.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.