spatsoc is an R package for detecting spatial and temporal groups in GPS relocations. It can be used to convert GPS relocations to gambit-of-the-group format to build proximity-based social networks. In addition, the randomization function provides data-stream randomization methods suitable for GPS data.
spatsoc
leverages data.table
to modify by reference and iteratively work on subsets of the input data. The first input for all functions in spatsoc
is DT
, an input data.table
. If your data is a data.frame
, you can convert it by reference using setDT(DF)
.
spatsoc
is designed to work in two steps: temporal followed by spatial grouping. Considering your specific study species and system, determine a relevant temporal and spatial grouping threshold. This may be 5 minutes and 50 meters or 2 days and 100 meters or any other thresholds - that decision is left to you. In some cases, the spatial grouping function selected is only relevant with certain temporal grouping thresholds. For example, we wouldn't expect a threshold of 5 minutes with group_polys
.
# Load packages
library(spatsoc)
library(data.table)
# Read data as a data.table
DT <- fread(system.file("extdata", "DT.csv", package = "spatsoc"))
# Cast datetime column to POSIXct
DT[, datetime := as.POSIXct(datetime)]
# Temporal groups
group_times(DT, datetime = 'datetime', threshold = '5 minutes')
# Spatial groups
group_pts(
DT,
threshold = 50,
id = 'ID',
coords = c('X', 'Y'),
timegroup = 'timegroup'
)
ID | X | Y | datetime | timegroup | group |
---|---|---|---|---|---|
A | 715851.4 | 5505340 | 2016-11-01 00:00:54 | 1 | 1 |
A | 715822.8 | 5505289 | 2016-11-01 02:01:22 | 2 | 2 |
A | 715872.9 | 5505252 | 2016-11-01 04:01:24 | 3 | 3 |
A | 715820.5 | 5505231 | 2016-11-01 06:01:05 | 4 | 4 |
A | 715830.6 | 5505227 | 2016-11-01 08:01:11 | 5 | 5 |
See the vignette about using spatsoc in social network analysis!
Install GEOS
:
apt-get install libgeos-dev
pacman -S geos
dnf install geos geos-devel
brew install geos
data.table
igraph
sp
adehabitatHR
group_times(DT, datetime, threshold)
DT
: input data.table
datetime
: date time column name in input data.tablethreshold
: threshold for grouping A data.table
with a date time formatted column. The input DT
will be returned with columns appended. The timegroup
column corresponds to the temporal group assigned to each row. Please note that the actual value of the time group is meaningless. Reordered data will return a different time group. What is meaningful, however, is the contents of each group. Each group will contain all rows nearest to the threshold provided.
The group_times
function expects either one column (POSIXct
) or two columns (IDate
and ITime
).
Given a character column representing the date time, convert it to POSIXct
or IDate
and ITime
:
DT[, datetime := as.POSIXct(datetime)]
DT[, c('idate', 'itime') := IDateTime(datetime)]
These are then provided to the function using the names of the column in the input data.
group_times(DT, datetime = 'datetime', threshold = '5 minutes')
or
group_times(DT, datetime = c('idate', 'itime'), threshold = '5 minutes')
The threshold
provided to group_times
should be related to the fix rate of the input dataset and to the specific study system and species. If relocations are recorded every two hours, a threshold = '2 hours'
will group all rows to the nearest two hour group (10am, 12pm, 2pm, 4pm, …). This, however, means that the relocations can be up to one hour apart from each other. Picking a smaller threshold, e.g.: threshold = '15 minutes'
may be more relevant in some cases. The flexibility of spatsoc
's threshold argument means the user must carefully consider what threshold is reasonable to their specific system.
The threshold
of group_times
is considered only within the scope of 24 hours and this poses limitations on it:
threshold
must evenly divide into 60 minutes or 24 hoursDT
threshold
cannot be fractionalThe main column returned by group_times
is “timegroup”. It represents the temporal group of each row, where those nearest (either above or below) within the threshold are grouped. Its actual value does not have any meaning, but the contents of each group do. That means if the data is reordered, a row may have a different time group, but the other rows in that group should not change.
The extra columns are provided to help the user investigate, troubleshoot and interpret the timegroup.
threshold unit | column(s) added |
---|---|
minute | “minutes” column added identifying the nearest minute group for each row. |
hour | “hours” column added identifying the nearest hour group for each row. |
day | “block” columns added identifying the multiday block for each row. |
This message is returned to the user when a column matching those returned by group_times
is found in the input DT. This is commonly the case when group_times
is run multiple times consecutively.
This message is returned to the user when the threshold
is NULL. This is the default setting of threshold
and, at times, may be suitable. In this case, the date times in the datetime
column will be grouped exactly. Usually, a threshold should be provided.
group_pts(DT, threshold, id, coords, timegroup, splitBy)
DT
: input data.table
threshold
: threshold for grouping id
: column name of IDs in DT
coords
: column names of x and y coordinates in DT
timegroup
: (optional) column name of time groupsplitBy
: (optional) column names of extra variables to group onThe input data.table
. It will returned with a column named group appended, which represents the spatial (and temporal if timegroup
is provided) group.
The threshold must be in the units of the coordinates.
The coordinates must be planar, such as UTM (of whichever zone your relocations are in).
group_lines(DT, threshold, projection, id, coords, timegroup, sortBy, splitBy, spLines)
DT
: input data.table
threshold
: threshold for grouping projection
: projection of coordinates in DT
id
: column name of IDs in DT
coords
: column names of x and y coordinates in DT
timegroup
: (optional) column name of time groupsortBy
: column name of date time to sort rows for building linessplitBy
: (optional) column names of extra variables to group onspLines
: alternatively, provide solely a SpatialLines
objectSee 3.2.1.
The threshold
argument represents a buffer area around each line. When threshold = 0
, the lines are grouped by spatial overlap. If the threshold is greater than 0, the lines buffered, then grouped by spatial overlap.
The coordinates must be planar, such as UTM (of whichever zone your relocations are in).
The sortBy
argument expects a date time formatted column name, which is used to order the rows for each individual (and splitBy
).
group_polys(DT, area, hrType, hrParams, projection, id, coords, timegroup, splitBy, spLines)
DT
: input data.table
area
: boolean argument if proportional area should be returnedhrType
: type of home range created hrParams
: parameters relevant to the type of home range createdprojection
: projection of coordinates in DT
id
: column name of IDs in DT
coords
: column names of x and y coordinates in DT
timegroup
: (optional) column name of time groupsplitBy
: (optional) column names of extra variables to group onspPolys
: alternatively, provide solely a SpatialPolygons
objectIf area = FALSE
, see 3.2.1. If area = TRUE
, the DT will not be appended with a group column instead a data.table
with IDs and proportional area overlap will be returned.
The default unit for area overlap is square meters.
The coordinates must be planar, such as UTM (of whichever zone your relocations are in).
Currently, spatsoc
offers two types of home ranges provided by the adehabitatHR
package: 'mcp' (mcp
) and 'kernel' (kernelUD
and getverticeshr
). The parameters must match the arguments of those functions.
Internally, we match arguments to the functions allowing the user to provide, for example, both the percent (provided to getverticeshr
) and grid arguments (provided to mcp
).
group_polys(
DT,
area = FALSE,
projection = utm,
hrType = 'mcp',
hrParams = list(grid = 60, percent = 95),
id = 'ID',
coords = c('X', 'Y')
)
randomizations(DT, type, id, datetime, splitBy, iterations)
DT
: input data.table
type
: one of 'daily', 'step' or 'trajectory' id
: Character string of ID column namedatetime
: field used for providing date time or time group - see detailssplitBy
: List of fields in DT to split the randomization process byiterations
: The number of iterations to randomizeSee the vignette Using spatsoc in social network analysis for details about the randomizations
function (specifically the section 'Data stream randomization')
(Almost) all functions in spatsoc
use data.table's modify-by-reference to reduce recopying large datasets and improve performance. The exceptions are group_polys(area = TRUE)
and randomizations
with iterations
greater than 1.
Check that your data.table
has columns allocated (with data.table::truelength
) and if not, use data.table::setDT
. This can happen if you are reading your data from RDS
or RData
files. See here.
if (truelength(DT) == 0) {
setDT(DT)
}
# then go to spatsoc
group_times(DT, datetime = 'datetime', threshold = '5 minutes')
Here are some useful code chunks for understanding the spatial and temporal extent of your data and the outputs of spatsoc
functions.
# Number of unique individuals
DT[, uniqueN(ID)]
# Number of unique individuals by timegroup
DT[, uniqueN(ID), by = timegroup]
# Min, max datetime
DT[, range(datetime)]
# Difference between relocations in hours
DT[order(datetime),
.(difHours = as.numeric(difftime(datetime, shift(datetime), units = 'hours'))),
by = ID]
# Difference between relocations in hours
DT[order(datetime),
.(difMins = as.numeric(difftime(datetime, shift(datetime), units = 'mins'))),
by = ID]
Simple spatial extents can be calculated for all individuals or by individual.
# All individuals
DT[, .(minX = min(X),
maxX = max(X),
minY = min(Y),
maxY = max(Y),)]
# By individual
DT[, .(minX = min(X),
maxX = max(X),
minY = min(Y),
maxY = max(Y),),
by = ID]
spatsoc
outputsAfter using the grouping functions, we can determine the number of individuals in a temporal or spatial group.
# Number of unique individuals by timegroup
DT[, uniqueN(ID), by = timegroup]
# Number of unique individuals by group
DT[, uniqueN(ID), by = group]