In this vignette, we walk through periodogram for movement data and periodic movement model fitting and selection. It is assumed that you are already familiar with data preparation with ctmm, as well as the maximum likelihood procedure described in the variogram
vignette. Our example maned wolf data is already prepared into a telemetry
object.
library(ctmm)
data(wolf)
Gamba <- wolf$Gamba
plot(Gamba)
Before anything else, we want to plot the data in a way that makes periodic patterns apparent. This is the periodogram.
LSP <- periodogram(Gamba,fast=2,res.time=2)
The fast=2
option requests the use of the (much) faster FFT-based algorithm and furthermore samples a highly composite number of times. Set the argument to FALSE to revert to Scargle’s original algorithm, which involves fewer numerical approximations. The res.time
argument increases the resolution of the temporal grid when fast>0
. The algorithm defaults to adequate resolution for regularly scheduled data (permitting gaps), while variable sampling rates require res.time>1
to resolve the fine scale spectrum correctly.
plot(LSP,max=TRUE,diagnostic=TRUE,cex=0.5)
The max=TRUE
option keeps only local maxima and often yields periodograms that are easier to interpret, especially when the resolution of the periodogram has been increased from the default.
Periodicities in the data cause peaks at their respective periods on the horizontal axis. To visually assess significance, these peaks should be compared to the normal variation around the smooth trend of the periodogram. Important natural periods like the stellar day, synodic month and tropical year are labeled on the horizontal axes and given dark vertical lines in the plot. Harmonics of these natural periods have unlabeled tick marks on the horizontal axis and are are given lighter vertical lines in the plot. E.g., the two tick marks under a day represent the second and third harmonic of the day, or (24/2) 12 hours and (24/3) 8 hours, respectively Peaks at the harmonics of a period indicate that fine details of the periodicity are resolved by the data.
The diagnostic=TRUE
option draws the periodogram of the sampling schedule with red symbols. If the periodogram of the sampling schedule exhibits peaks, this indicates that the corresponding peaks in the movement data could be simply caused by irregularities in the sampling schedule and not by periodicity in the movement. Here the periodogram exhibits a clear peak for the period of one day, but this is also present in the (red) diagnostic.
Periodograms are a great data exploration tool, but they do not detect everything. In particular circulation processes, those that induce circulatory patterns via a stochastic rotational effect, are often not visible in periodograms. Periodograms are also difficult to compare from one individual to the next, if the aperiodic stochastic movements of these individuals are not the same.
Circulatory patterns can be incorporated into stochastic continuous time movement models in two ways: via the mean of the process, modelling that the animal reverts to a point that moves periodically through time, or via the stochastic component, by including a rotation effect. The first is called a periodic-mean process, and the second is called a circulation process.
For periodic-mean processes, you need to specify one or more period values that you want ctmm
to consider.
In our example, we are going to specify a period of a day, which is what the periodogram was saying. Other individuals in this population were also exhibiting a weekly periodicity, and in similar species, lunar cycles are well-known to affect ranging behaviors.
PROTO <- ctmm(mean="periodic",period=c(24 %#% "hours",1 %#% "month"),circle=TRUE)
The mean="periodic"
option tells ctmm
that you want a periodic-mean process. The default is mean="stationary"
. The period values are all specified in seconds (SI unit) by the %#%
function (see help("%#%")
for more information).
The circle=TRUE
option tells ctmm that you want to try and fit a model in which there is stochastic circulation on top of the periodicity in the process mean. For circulation processes, you do not need to specify candidate period values. The period of the circulation is estimated from the data.
As with other uses of ctmm, the next step is to generate an initial guess of the parameter values to feed to the likelihood optimization routine. You can do this using your intuition, by visually examining the variogram, or by letting ctmm
do it for you.
SVF <- variogram(Gamba,res=3)
GUESS <- ctmm.guess(Gamba,PROTO,variogram=SVF,interactive=FALSE)
ctmm.guess
is a generalization of variogram.fit
that can estimate other quantites from the data that are not apparent in the variogram, such as the circulation period and location correlations. A variogram argument is not necessary, but here we used the res
option to increase the FFT variogram’s temporal resolution and counteract sampling variability. The interactive
argument works just as with variogram.fit
.
Potentially, the most complex model based on our prototype could have both circulation and multiple harmonics of daily periodicity. In addition, we also do not know whether the velocities are autocorrelated in time (OUF model) and whether the animal moves amounts in different directions (anisotropy). All of this makes for a large number of effects. Forcing all these effects upon data that do not support them would inflate the risk of overfitting or convergence issues. We thereby conduct a model selection.
# ctmm beta optimizer is more reliable here
control <- list(method="pNewton")
# CRAN policy limits to 2 processes (cores)
FITS <- ctmm.select(Gamba,GUESS,verbose=TRUE,control=control,cores=2)
## Nyquist frequency estimated at harmonic 3 88.5917639444444 of the period.
The non-default control
list here specifies our prototype optimization code that has been tested to be faster and more accurate on this example. With verbose=TRUE
, we obtain a list fitted ctmm
objects, one for each relevant combination of effects, where the first element of that list is the preferred model.
summary(FITS)
## ΔAICc ΔRMSPE (m) DOF[area]
## OUF anisotropic harmonic 2 0 0.000000 58.45494 179.5625
## OUF anisotropic harmonic 1 0 20.554301 62.28416 183.3231
## OUF anisotropic harmonic 2 1 5.562484 64.79601 182.0841
## OUF anisotropic harmonic 1 1 25.975607 68.24534 185.9471
## OUF anisotropic harmonic 0 0 123.457393 160.36316 209.7547
## OUF anisotropic harmonic 0 1 128.194996 162.92746 212.5175
## OUF anisotropic harmonic 3 0 -17.064596 2474.48215 178.3531
## OUf anisotropic harmonic 2 0 19.461079 0.00000 240.1619
## OUf anisotropic harmonic 1 0 39.764079 6.02654 244.2230
## OUf anisotropic harmonic 0 0 124.144110 136.71159 235.8036
## OU anisotropic harmonic 2 0 34.417369 62.30380 151.5203
## OU anisotropic harmonic 1 0 51.591502 65.57303 155.5243
## OU anisotropic harmonic 0 0 193.294443 175.99469 169.0918
## OUF isotropic harmonic 2 0 88.832091 232.43221 167.3316
## OUF isotropic harmonic 1 0 112.907159 234.77280 170.1576
## OUF isotropic harmonic 0 0 229.937343 325.15215 200.6788
## OUF anisotropic circulation harmonic 0 0 124.199987 160.28095 209.7402
## OUF isotropic circulation harmonic 0 0 230.906720 325.09471 200.6995
From the first model, We see that the velocity autocorrelation (OUF), anisotropy, and 2
harmonics of daily periodicity were all selected. Given how we specified the prototype, harmonic 2 0
means that this preferred model has no lunar periodicity, but has two harmonics of the one-day periodicity. This fittingly corresponds to what the periodogram was saying. If there was some moon-related pattern of space use and given how we specified our prototype, we would have had a non-zero value as the second harmonic
value.
The sorting of our candidate models is more complex than in previous stationary examples. For a given autocovariance model, the different non-stationary models are sorted by mean square predictive error (MSPE) and not the information criteria. As we will demonstrate, likelihood-based model selection can badly overfit with these types of models. For sorting between the autocovariance models (each with best non-stationary model), the information criteria is used. MSPE is not valid for general purpose selection.
# these are sorted by MSPE
SUB <- grepl("OUF anisotropic harmonic",names(FITS))
summary(FITS[SUB])
## ΔAICc ΔRMSPE (m) DOF[area]
## OUF anisotropic harmonic 2 0 0.000000 0.000000 179.5625
## OUF anisotropic harmonic 1 0 20.554301 3.829219 183.3231
## OUF anisotropic harmonic 2 1 5.562484 6.341069 182.0841
## OUF anisotropic harmonic 1 1 25.975607 9.790403 185.9471
## OUF anisotropic harmonic 0 0 123.457393 101.908224 209.7547
## OUF anisotropic harmonic 0 1 128.194996 104.472526 212.5175
## OUF anisotropic harmonic 3 0 -17.064596 2416.027217 178.3531
# these are sorted by IC
SUB <- gsub('[0-9]+','',names(FITS))
SUB <- sapply(unique(SUB),function(s){which(SUB==s)[1]})
summary(FITS[SUB])
## ΔAICc ΔRMSPE (m) DOF[area]
## OUF anisotropic harmonic 2 0 0.00000 58.45494 179.5625
## OUf anisotropic harmonic 2 0 19.46108 0.00000 240.1619
## OU anisotropic harmonic 2 0 34.41737 62.30380 151.5203
## OUF isotropic harmonic 2 0 88.83209 232.43221 167.3316
## OUF anisotropic circulation harmonic 0 0 124.19999 160.28095 209.7402
## OUF isotropic circulation harmonic 0 0 230.90672 325.09471 200.6995
If only the information criteria was used, we would have selected harmonic 3 0
over harmonic 2 0
.
# sorting by IC only
summary(FITS,MSPE=NA)
## ΔAICc DOF[mean]
## OUF anisotropic harmonic 3 0 0.00000 107.18034
## OUF anisotropic harmonic 2 0 17.06460 106.12039
## OUF anisotropic harmonic 2 1 22.62708 107.90711
## OUf anisotropic harmonic 2 0 36.52568 149.48636
## OUF anisotropic harmonic 1 0 37.61890 108.21424
## OUF anisotropic harmonic 1 1 43.04020 110.06466
## OU anisotropic harmonic 2 0 51.48196 80.57093
## OUf anisotropic harmonic 1 0 56.82867 152.03294
## OU anisotropic harmonic 1 0 68.65610 82.77230
## OUF isotropic harmonic 2 0 105.89669 93.78426
## OUF isotropic harmonic 1 0 129.97175 95.21590
## OUF anisotropic harmonic 0 0 140.52199 132.48116
## OUf anisotropic harmonic 0 0 141.20871 149.62057
## OUF anisotropic circulation harmonic 0 0 141.26458 132.96314
## OUF anisotropic harmonic 0 1 145.25959 134.73607
## OU anisotropic harmonic 0 0 210.35904 90.04711
## OUF isotropic harmonic 0 0 247.00194 120.96682
## OUF isotropic circulation harmonic 0 0 247.97132 121.37711
Next we do some sanity checking on our results. The sampling interval for Gamba is fairly steady at
"hour" %#% stats::median(diff(Gamba$t))
## [1] 4
4 hours, which (ideally) corresponds to a Nyquist period of 8 (2 \(\times\) 4) hours, or 3 (24/8) times per day, or 3 harmonics of the day. The Nyquist period/frequency is an information limit on discretely sampled data. We expect to be able to extract a maximum of 3 harmonics from uniformly sampled data. Therefore, we should limit our consideration to harmonics of the day \(\leq\) 3, while for lower quality data, we might have to limit our consideration even further.
Consistent with these considerations, let us look at harmonics 3 and 2 of the day.
SUB <- rownames(summary(FITS,MSPE=NA))[1]
summary(FITS[[SUB]]) # harmonic 3 0 # selected by IC
## $name
## [1] "OUF anisotropic harmonic 3 0"
##
## $DOF
## mean area speed
## 107.1803439 178.3530902 0.8606009
##
## $CI
## low ML high
## rotation/deviation % 74.795669 91.774630 100.000000
## rotation/speed % 96.807430 98.987421 100.000000
## area (square kilometers) 41.444887 48.269156 55.605890
## τ[position] (hours) 6.984419 9.267908 12.297962
## τ[velocity] (hours) 1.126099 1.617409 2.323076
## speed (kilometers/day) 12.674827 103.001923 204.382858
summary(FITS[[1]]) # harmonic 2 0 # selected by MSPE
## $name
## [1] "OUF anisotropic harmonic 2 0"
##
## $DOF
## mean area speed
## 106.1204 179.5625 139.5356
##
## $CI
## low ML high
## rotation/deviation % 23.6228691 29.019474 35.648925
## rotation/speed % 24.3431627 29.868660 36.648355
## area (square kilometers) 41.1021782 47.844462 55.091282
## τ[position] (hours) 7.2925135 9.568355 12.554438
## τ[velocity] (hours) 0.9699887 1.423753 2.089789
## speed (kilometers/day) 14.7247950 16.056839 17.387434
rotation/deviation %
corresponds to \(100 \eta_P\) from the Péron et al (2017). It is interpreted as the proportion of the variance in the animal`s location that is caused by the periodicity in the mean.rotation/speed %
corresponds to \(100 \eta_V\) from the Péron et al (2017). It is interpreted as the proportion of the variance in the animal`s velocity that is caused by the periodicity in the mean.circulation period
is the period of the stochastic circulations. On average, the animal re-pass through the same neighborhoods every estimated number of months (or days, or hours, depending on the automated unit specification).Note that, aside from the rotational indices, these two models are largely consistent, and the 3 harmonic model has an extrordinarily uncertain speed estimate. We can get a better idea of what is happening by comparing the variograms.
xlim <- c(0,1/2) %#% "month"
plot(SVF,CTMM=FITS[[SUB]],xlim=xlim)
title("3 Harmonics")
plot(SVF,CTMM=FITS[[1]],xlim=xlim)
title("2 Harmonics")
While the confidence bands encompass the empirical variogram, the 3 harmonic model has clearly overfit in attempting to match the Nyquist period (3/day) with less than ideal data.
This is not coded yet!