The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
This vignette introduces spatial prediction, multiscale pattern extraction, and uncertainty quantification using the spCF package. Throughout the vignette, we employ the coarse-to-fine spatial modeling (CFSM) framework. CFSM sequentially learns spatial patterns from coarser to finer scales and terminates the learning process when the validation score no longer improves. CFSM is especially suitable for moderate-to-large samples. See Murakami et al. (2025) for further details.
Let us load the required packages
library(spCF)
library(sp)
library(sf)
#> Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUEThe meuse dataset included in the sp package is used in this vignette. This dataset comprises measurements of heavy metal concentrations in topsoil collected from a floodplain along the River Meuse, along with several covariates. In this example, zinc concentrations observed at 155 sample sites are used to predict concentrations over 3,103 regularly spaced grid cells.
The response variable is the logarithm of zinc concentration. The covariates include the distance to the river and dummy variables indicating flood frequency (ffreq2 and ffreq3):
### Data at samples sites
data(meuse)
y <- log(meuse[,"zinc"]) # Response variable
coords <- meuse[,c("x","y")] # Coordinates
x <- data.frame(dist = meuse[,"dist"],
ffreq2 = as.integer(meuse$ffreq == 2),
ffreq3 = as.integer(meuse$ffreq == 3)) # Covariates
### Data at prediction sites
data(meuse.grid)
coords0 <- meuse.grid[,c("x","y")] # Coordinates
x0 <- data.frame(dist = meuse.grid[,"dist"],
ffreq2 = as.integer(meuse.grid$ffreq == 2),
ffreq3 = as.integer(meuse.grid$ffreq == 3)) # CovariatesZinc concentration is spatially plotted as follows:
obs_s <- st_as_sf( data.frame(coords, zinc=y), coords= c("x","y"), crs=28992)
plot(obs_s[,"zinc"], cex=0.6,pch = 20, nbreaks = 20, key.pos=4, axes=TRUE)In CFSM, the number of spatial scales, R, is optimized through
holdout validation. A smaller value of R, corresponding to early
stopping, captures coarser-scale spatial patterns, whereas a larger R
allows the model to represent finer-scale patterns. The
cf_lm_hv function performs holdout validation as
follows:
set.seed(123) # For this vignette, training samples are fixed
mod_hv <- cf_lm_hv(y = y, x = x, coords = coords)
#> [1] "--- SSE: Linear regression ---"
#> [1] 7.462058
#> [1] "--- SSE: Learning multi-scale spatial processes ---"
#> [1] 6.733006 (Scale 1)
#> [1] 6.381509 (Scale 2)
#> [1] 6.202699 (Scale 3)
#> [1] 6.041188 (Scale 4)
#> [1] 5.895836 (Scale 5)
#> [1] 5.723286 (Scale 6)
#> [1] 5.534511 (Scale 7)
#> [1] 5.321755 (Scale 8)
#> [1] 5.159955 (Scale 9)
#> [1] 4.927529 (Scale 10)
#> [1] 4.707226 (Scale 11)
#> [1] 4.548683 (Scale 12)
#> [1] 4.416852 (Scale 13)
#> [1] 4.304862 (Scale 14)
#> [1] 4.204059 (Scale 15)
#> [1] 4.119032 (Scale 16)
#> [1] 4.046272 (Scale 17)
#> [1] 4.046272 (Scale 18)
#> [1] 3.964767 (Scale 19)
#> [1] 3.964767 (Scale 20)
#> [1] 3.901129 (Scale 21)
#> [1] 3.901129 (Scale 22)
#> [1] 3.901129 (Scale 23)
#> [1] 3.901129 (Scale 24)
#> [1] 3.901129 (Scale 25)
#> [1] "--- SSE: After coefficient adjustment ---"
#> [1] 3.842398In this example, 75% of the samples are randomly selected for
training, and the remaining samples are used for validation
(train_rat = 0.75). Alternatively, the training samples can
be specified explicitly using the id_train argument. As
shown in the output, the sum-of-squares error (SSE) for the validation
samples gradually decreases as learning proceeds from the coarsest scale
(Scale 1) to the finest scale (Scale 21 in this case).
Optionally, an additional learning can be applied to further reduce
the validation SSE. Currently, only random forest is available; it is
implemented by specifying add_learn = "rf" in the
cf_lm_hv function, to capture non-linear patterns and
higher-order interactions among covariates and spatial coordinates.
After holdout validation, the full model is trained using the
cf_lm function:
mod <- cf_lm(y = y, x = x, x0 = x0, coords = coords, coords0 = coords0, mod_hv = mod_hv)
#> [1] "--- Learning multi-scale spatial processes ---"
#> [1] Scale 1 (bandwidth:1436.96)
#> [1] Scale 2 (bandwidth:1293.264)
#> [1] Scale 3 (bandwidth:1163.938)
#> [1] Scale 4 (bandwidth:1047.544)
#> [1] Scale 5 (bandwidth:942.7897)
#> [1] Scale 6 (bandwidth:848.5107)
#> [1] Scale 7 (bandwidth:763.6596)
#> [1] Scale 8 (bandwidth:687.2937)
#> [1] Scale 9 (bandwidth:618.5643)
#> [1] Scale 10 (bandwidth:556.7079)
#> [1] Scale 11 (bandwidth:501.0371)
#> [1] Scale 12 (bandwidth:450.9334)
#> [1] Scale 13 (bandwidth:405.84)
#> [1] Scale 14 (bandwidth:365.256)
#> [1] Scale 15 (bandwidth:328.7304)
#> [1] Scale 16 (bandwidth:295.8574)
#> [1] Scale 17 (bandwidth:266.2717)
#> [1] Scale 18 (bandwidth:239.6445)
#> [1] Scale 19 (bandwidth:215.68)
#> [1] Scale 20 (bandwidth:194.112)
#> [1] Scale 21 (bandwidth:174.7008)In addition to the covariates x and spatial coordinates
coords at the sample sites, the corresponding covariates
(x0) and coordinates (coords0) at the
prediction sites are also specified to enable spatial prediction.
The estimated regression coefficients, standard deviations of the scale-wise spatial processes, and error statistics are displayed as follows:
mod
#> Call:
#> cf_lm(y = y, x = x, coords = coords, x0 = x0, coords0 = coords0,
#> mod_hv = mod_hv)
#>
#> ----Coefficients---------------------------------------
#> coef coef_se lower_95CI upper_95CI
#> Intercept 6.7883824 0.06277981 6.6653339 6.9114308
#> dist -2.5958941 0.20338559 -2.9945298 -2.1972583
#> ffreq2 -0.6132297 0.09010173 -0.7898290 -0.4366303
#> ffreq3 -0.6041271 0.11213476 -0.8239113 -0.3843430
#>
#> ----Standard deviations (influential elements only)----
#> elements standard_deviation
#> 1 xb 0.68002228
#> 2 spatial_scale1 0.09866884
#> 3 spatial_scale2 0.05187771
#> 4 spatial_scale3 0.02740153
#> 5 spatial_scale4 0.01837173
#> 6 spatial_scale5 0.01802175
#> 7 spatial_scale6 0.01810347
#> 8 spatial_scale7 0.01920265
#> 9 spatial_scale8 0.02250761
#> 10 spatial_scale9 0.02380506
#> 11 spatial_scale10 0.02604527
#> 12 spatial_scale11 0.02789054
#> 13 spatial_scale12 0.02746576
#> 14 spatial_scale13 0.02726837
#> 15 spatial_scale14 0.02630008
#> 16 spatial_scale15 0.02538236
#> 17 spatial_scale16 0.02523592
#> 18 spatial_scale17 0.02611116
#> 19 spatial_scale19 0.03359572
#> 20 spatial_scale21 0.04286941
#> 21 residuals 0.17623437
#>
#> ----Error statistics ----------------------------------
#> stat value
#> 1 validation_R2 0.9235123
#> 2 validation_RMSE 0.3138838
#> 3 residual_SD 0.1762344The predictive values and their standard deviations at the grid cells are mapped as follows:
### Convert gridded points to gridded polygons (for clear visualization)
meuse.grid_sp <- meuse.grid
coordinates(meuse.grid_sp)<- c("x", "y")
gridded(meuse.grid_sp) <- TRUE
meuse.grid_sf <- st_as_sf(as(meuse.grid_sp, "SpatialPolygons"))
st_crs(meuse.grid_sf) <- 28992
### Mapping predictive mean and standard deviations
meuse.grid_sf$pred <- mod$pred0$pred # Predictive mean
meuse.grid_sf$pred_sd <- mod$pred0$pred_sd# Predictive standard deviations
plot(meuse.grid_sf[,"pred"], border = NA, nbreaks = 20, key.pos=4,axes=TRUE)As expected, the standard deviation, which quantifies prediction uncertainty, increases in the eastern center where sample sites are relatively sparse.
The sp_scalewise function extracts scalewise spatial
processes with bandwidth values falling within a pre-specified range.
For example, the following commands extract the large-, moderate-, and
small-scale processes, corresponding to bandwidth ranges of 1000+,
500–1000, and 0–500, respectively.
mod_s1<- sp_scalewise(mod,bw_range=c(1000,Inf)) # Large scale (1000 <= bandwdith)
mod_s2<- sp_scalewise(mod,bw_range=c(500,1000)) # Moderate scale (500 <= bandwdith < 1000)
mod_s3<- sp_scalewise(mod,bw_range=c(0,500)) # Small scale (bandwdith < 500)The extracted scalewise processes are mapped as follows:
z1 <- mod_s1$pred0$pred # Large-scale process
z2 <- mod_s2$pred0$pred # Moderate-scale process
z3 <- mod_s3$pred0$pred # Small-scale process
meuse.grid_sf3 <- cbind(meuse.grid_sf, z1, z2, z3)
plot(meuse.grid_sf3[,c("z1","z2","z3")], border = NA, nbreaks = 20,key.pos=4,axes=TRUE)
Their standard deviations are displayed as follows:
z1_sd <- mod_s1$pred0$pred_sd # Large-scale process
z2_sd <- mod_s2$pred0$pred_sd # Moderate-scale process
z3_sd <- mod_s3$pred0$pred_sd # Small-scale process
meuse.grid_sf3 <- cbind(meuse.grid_sf, z1_sd, z2_sd, z3_sd)
plot(meuse.grid_sf3[,c("z1_sd","z2_sd","z3_sd")],border = NA,nbreaks = 20,key.pos=4,axes=TRUE)The sp_scalewise function is useful for multiscale
spatial pattern analysis (or feature extraction), which is commonly
conducted in ecological, epidemiological, and environmental studies.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.