This package adds resampling methods for the {mlr3} package framework suited for spatial, temporal and spatiotemporal data. These methods can help to reduce the influence of autocorrelation on performance estimates when performing cross-validation. While this article gives a rather technical introduction to the package, a more applied approach can be found in the mlr3book section on “Spatiotemporal Analysis”.
After loading the package via
library("mlr3spatiotempcv")
the spatiotemporal resampling methods and example tasks provided by {mlr3spatiotempcv} are available to the user alongside the default {mlr3} resampling methods and tasks.
To make use of spatial resampling methods, a {mlr3} task that is aware of its spatial characteristics needs to be created. Two Task child classes exist in {mlr3spatiotempcv} for this purpose:
TaskClassifST
TaskRegrST
To create one of these, you have multiple options:
1. Create the Task directly via $new(). This only works for data.table backends (!)
2. Use the as_task_* converters (e.g. if your data is stored in an sf object).
We recommend the latter, as the as_task_* converters aim to make task construction easier, e.g., by creating the DataBackend (which is required to create a Task in {mlr3}) automatically and by setting the crs and coordinate_names fields. Let's assume your (point) data is stored in an sf object, which is a common scenario for spatial analysis in R.
# create 'sf' object
data_sf = sf::st_as_sf(ecuador, coords = c("x", "y"), crs = 32717)
# create `TaskClassifST` from `sf` object
task = as_task_classif_st(data_sf, id = "ecuador_task", target = "slides", positive = "TRUE")
You can also use a plain data.frame. In this case, crs and coordinate_names need to be passed along explicitly, as there is no sf object from which they could be inferred:
task = as_task_classif_st(ecuador, id = "ecuador_task", target = "slides",
positive = "TRUE", coordinate_names = c("x", "y"), crs = 32717)
The *ST task family prints a subset of the coordinates by default:
print(task)
#> <TaskClassifST:ecuador_task> (751 x 11)
#> * Target: slides
#> * Properties: twoclass
#> * Features (10):
#> - dbl (10): carea, cslope, dem, distdeforest, distroad,
#> distslidespast, hcurv, log.carea, slope, vcurv
#> * Coordinates:
#> x y
#> <num> <num>
#> 1: 712882.5 9560002
#> 2: 715232.5 9559582
#> 3: 715392.5 9560172
#> 4: 715042.5 9559312
#> 5: 715382.5 9560142
#> ---
#> 747: 714472.5 9558482
#> 748: 713142.5 9560992
#> 749: 713322.5 9560562
#> 750: 715392.5 9557932
#> 751: 713802.5 9560862
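The printed coordinates are only an excerpt. A minimal sketch for retrieving all of them, assuming the coordinate_names field and the coordinates() method of the *ST task classes (check the class reference of your installed version). It uses the built-in ecuador task instead of the sf-based task created above, so it runs on its own:

```r
library("mlr3")
library("mlr3spatiotempcv")

# built-in spatial classification task (same data as above)
task = tsk("ecuador")

# names of the columns that store the coordinates (assumed field)
task$coordinate_names

# coordinates of all observations, not only the printed subset;
# assumed to return a data.table with one row per observation
coords = task$coordinates()
head(coords)
nrow(coords)
```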
All *ST tasks can be treated as their super class equivalents TaskClassif or TaskRegr in subsequent {mlr3} modeling steps.
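As a quick illustration of that inheritance, the sketch below checks the class of the built-in ecuador task and trains an ordinary {mlr3} classification learner on it. The choice of classif.rpart is arbitrary; any classification learner should work. As the feature list printed above suggests, the coordinate columns are not part of the features by default.

```r
library("mlr3")
library("mlr3spatiotempcv")

task = tsk("ecuador")

# TaskClassifST extends mlr3's TaskClassif
class(task)
inherits(task, "TaskClassif")

# therefore a standard classification learner can be trained on it directly
learner = lrn("classif.rpart")
learner$train(task)
learner$model
```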
In {mlr3}, dictionaries provide an overview of the available methods. The following sections show which dictionaries get new entries when loading {mlr3spatiotempcv}.
New task types: TaskClassifST, TaskRegrST
mlr_reflections$task_types
#> Key: <type>
#> type package task learner
#> <char> <char> <char> <char>
#> 1: classif mlr3 TaskClassif LearnerClassif
#> 2: classif_st mlr3spatiotempcv TaskClassifST LearnerClassif
#> 3: regr mlr3 TaskRegr LearnerRegr
#> 4: regr_st mlr3spatiotempcv TaskRegrST LearnerRegr
#> 5: unsupervised mlr3 TaskUnsupervised Learner
#> prediction prediction_data measure
#> <char> <char> <char>
#> 1: PredictionClassif PredictionDataClassif MeasureClassif
#> 2: PredictionClassif PredictionDataClassif MeasureClassif
#> 3: PredictionRegr PredictionDataRegr MeasureRegr
#> 4: PredictionRegr PredictionDataRegr MeasureRegr
#> 5: <NA> <NA> <NA>
New column roles: coordinate, space, time
mlr_reflections$task_col_roles
#> $regr
#> [1] "feature" "target" "name" "order" "stratum" "group" "weight"
#>
#> $classif
#> [1] "feature" "target" "name" "order" "stratum" "group" "weight"
#>
#> $unsupervised
#> [1] "feature" "name" "order"
#>
#> $classif_st
#> [1] "feature" "target" "name" "order" "stratum"
#> [6] "group" "weight" "coordinate" "space" "time"
#>
#> $regr_st
#> [1] "feature" "target" "name" "order" "stratum"
#> [6] "group" "weight" "coordinate" "space" "time"
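The roles that are new compared with the plain classification task type can be extracted directly from the reflections printed above. A small sketch; the exact set of roles may differ between package versions:

```r
library("mlr3")
library("mlr3spatiotempcv")

# column roles available for classif_st but not for classif
new_roles = setdiff(mlr_reflections$task_col_roles$classif_st,
                    mlr_reflections$task_col_roles$classif)
new_roles
```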
New resampling methods:
mlr_resamplings_spcv_block
mlr_resamplings_spcv_buffer
mlr_resamplings_spcv_coords
mlr_resamplings_spcv_knndm
mlr_resamplings_spcv_disc
mlr_resamplings_spcv_tiles
mlr_resamplings_spcv_env
mlr_resamplings_sptcv_cstf
and, where available, their respective repeated versions. See as.data.table(mlr_resamplings) for the full dictionary.
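The dictionary can also be filtered by key. The sketch below assumes that all spatial and spatiotemporal keys contain "spcv" or "sptcv" (as the names listed above do), then retrieves one method with rsmp() and instantiates it on the ecuador task. The seed and the number of folds are arbitrary illustration values.

```r
library("mlr3")
library("mlr3spatiotempcv")

# keys of all spatial ("spcv") and spatiotemporal ("sptcv") resamplings,
# including their "repeated_" variants
sp_keys = mlr_resamplings$keys("sp(t)?cv")
sp_keys

# coordinate-based clustering into 4 spatially disjoint folds
set.seed(42)
task = tsk("ecuador")
resampling = rsmp("spcv_coords", folds = 4)
resampling$instantiate(task)

resampling$iters
length(resampling$test_set(1))
```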
New example tasks:
tsk("ecuador") (spatial, classif)
tsk("cookfarm_mlr3") (spatiotemporal, regr)
The following table lists all spatiotemporal methods implemented in {mlr3spatiotempcv} (or {mlr3}), their upstream R package and scientific references. All methods besides "spcv_buffer" also have a corresponding "repeated" method.
Category | (Package) Method Name | Reference | mlr3 Notation |
---|---|---|---|
Buffering, spatial | (blockCV) Spatial Buffering | Valavi et al. (2018) | mlr_resamplings_spcv_buffer |
Buffering, spatial | (sperrorest) Spatial Disc | Brenning (2012) | mlr_resamplings_spcv_disc |
Blocking, spatial | (blockCV) Spatial Blocking | Valavi et al. (2018) | mlr_resamplings_spcv_block |
Blocking, spatial | (sperrorest) Spatial Tiles | Valavi et al. (2018) | mlr_resamplings_spcv_tiles |
Clustering, spatial | (sperrorest) Spatial CV | Brenning (2012) | mlr_resamplings_spcv_coords |
Clustering, spatial | (CAST) KNNDM | Linnenbrink et al. (2023) | mlr_resamplings_spcv_knndm |
Clustering, feature-space | (blockCV) Environmental Blocking | Valavi et al. (2018) | mlr_resamplings_spcv_env |
Grouping, predefined inds | (mlr3) Predefined partitions | | mlr_resamplings_custom_cv |
Grouping, spatiotemporal | (mlr3) via col_roles "group" | | mlr_resamplings_cv, Task$set_col_roles(<variable>, "group") |
Grouping, spatiotemporal | (CAST) Leave-Location-and-Time-Out | Meyer et al. (2018) | mlr_resamplings_sptcv_cstf, Task$set_col_roles(<variable>, "space|time") |
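The last two rows rely on column roles. A minimal sketch of both approaches on the cookfarm_mlr3 task, assuming it contains a location identifier named SOURCEID and a time stamp named Date (verify with task$col_info before running). The number of folds is an arbitrary choice.

```r
library("mlr3")
library("mlr3spatiotempcv")

# (1) Leave-Location-and-Time-Out (CAST): assign the "space" and "time" roles
task = tsk("cookfarm_mlr3")
task$set_col_roles("SOURCEID", roles = "space")  # assumed location ID column
task$set_col_roles("Date", roles = "time")       # assumed time stamp column
rsmp_cstf = rsmp("sptcv_cstf", folds = 5)
rsmp_cstf$instantiate(task)

# (2) grouped CV (mlr3): all rows of one location end up in the same fold
task_grp = tsk("cookfarm_mlr3")
task_grp$set_col_roles("SOURCEID", roles = "group")
rsmp_grp = rsmp("cv", folds = 5)
rsmp_grp$instantiate(task_grp)

# check: no location occurs in both the training and the test set of fold 1
g = task_grp$groups
train_ids = unique(g$group[g$row_id %in% rsmp_grp$train_set(1)])
test_ids = unique(g$group[g$row_id %in% rsmp_grp$test_set(1)])
length(intersect(train_ids, test_ids))
```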