In this vignette, I use the space-time permutation scan statistic to show how to generate data in R, run SaTScan on the generated data, and collect the results.
I begin by making data on a 10*10 grid of locations, over 30 days. Each day, each location has a 0.1 probability of having a single case.
set.seed(42)
mygeo = expand.grid(1:10,1:10)            # 10 x 10 grid of location coordinates
daysbase = 30                             # number of baseline days
locid = rep(1:100, times=daysbase)        # location ID for each location-day
basecas = rbinom(3000, 1, .1)             # one case with probability 0.1 per location-day
day = rep(1:30, each = 100)               # day for each location-day
mycas = data.frame(locid, basecas, day)
Here’s what the geo and case data look like. I’m using generic (integer) time units, for convenience.
head(mygeo)
## Var1 Var2
## 1 1 1
## 2 2 1
## 3 3 1
## 4 4 1
## 5 5 1
## 6 6 1
head(mycas)
## locid basecas day
## 1 1 1 1
## 2 2 1 1
## 3 3 0 1
## 4 4 0 1
## 5 5 0 1
## 6 6 0 1
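As a quick sanity check (not part of the original output), you can confirm that about 10% of the 3000 location-days carry a case:
mean(mycas$basecas)    # should be close to the 0.1 used to simulate the data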
Now I can write the data to disk. The row names of the mygeo data.frame object are the location IDs for SaTScan, so I use the userownames option to keep, rather than ignore, the row names from R when writing the geography file; in the case file, the same information is included as an explicit column.
library("rsatscan")
td = tempdir()
write.geo(mygeo, location = td, file = "mygeo", userownames=TRUE)
write.cas(mycas, location = td, file = "mycas")
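To confirm that the files landed in the temporary directory, a quick check (not in the original) is to list its contents:
list.files(td)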
Now I’m ready to build the parameter file. This is adapted pretty closely from the NYCfever example in the rsatscan vignette.
invisible(ss.options(reset=TRUE))                                      # start from the default parameter set
ss.options(list(CaseFile="mycas.cas", PrecisionCaseTimes=4))           # generic time precision
ss.options(list(StartDate="1", CoordinatesType=0, TimeAggregationUnits=4))   # Cartesian coordinates, generic time units
ss.options(list(EndDate="30", CoordinatesFile="mygeo.geo", AnalysisType=4, ModelType=2))   # prospective space-time analysis, space-time permutation model
ss.options(list(UseDistanceFromCenterOption="y", MaxSpatialSizeInDistanceFromCenter=3))    # spatial window limited to a radius of 3
ss.options(list(NonCompactnessPenalty=0, MaxTemporalSizeInterpretation=1, MaxTemporalSize=7))   # temporal window of up to 7 time units
ss.options(list(ProspectiveStartDate="30", ReportGiniClusters="n", LogRunToHistoryFile="n"))
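If you want to review the assembled settings before writing them out, calling ss.options() with no arguments returns the current parameter lines; a quick peek (not part of the original example) might be:
head(ss.options(), 10)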
Then I write the parameter file to disk and run SaTScan using it. I’ll peek at the summary cluster table to see what we got.
write.ss.prm(td, "mybase")
# This step omitted in compliance with CRAN policies
# Please install SaTScan and run the vignette with this and the following code uncommented
# SaTScan can be downloaded from www.satscan.org, free of charge
# You will also find there fully compiled versions of this vignette, with results
# mybase = satscan(td, "mybase")
# mybase$col[3:10]
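The object returned by satscan() is a list holding the SaTScan output files read back into R; if the run above were executed, one sketch of how to see what it contains (also left commented out here) is:
# str(mybase, max.level = 1)   # e.g. the main text report, the cluster table, and the parameter file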
As one would hope, there’s no evidence of a meaningful cluster.
Now, let’s add a day just like the others. I’ll stick it onto the end of the previous data, then write out a new case file.
newday = data.frame(locid = 1:100, basecas = rbinom(100,1,.1), day = 31)
newcas = rbind(mycas,newday)
write.cas(newcas, location = td, file = "mycas")
I don’t need to re-assign any parameter values that don’t change between runs. In this case, since I used the same name for the data file, I only need to change the end date of the surveillance period.
ss.options(list(EndDate="31"))
write.ss.prm(td, "day1")
# day1 = satscan(td, "day1")
# day1$col[3:10]
Again, no clusters, as we would expect.
But now let’s make a cluster appear. I create an additional day as before, but then give one location a heap of extra cases. I glue the new day onto the end of the accumulated case data, write it to disk, change the end date, and re-run SaTScan.
newday = data.frame(locid = 1:100, basecas = rbinom(100,1,.1), day = 32)
newday$basecas[20] = 5                    # location 20 gets a burst of 5 cases on day 32
newcas = rbind(newcas, newday)            # append day 32 to the accumulated case data
write.cas(newcas, location = td, file = "mycas")
ss.options(list(EndDate="32"))
write.ss.prm(td, "day2")
# day2 = satscan(td,"day2")
# day2$col[3:10]
This demonstrates that I did detect the cluster I inserted. I can also extract the wordier section of the main report about this cluster.
# summary(day2)
# cat(day2$main[20:31],fill=1)
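Finally, since everything was written to a temporary directory, it’s polite to tidy up. A possible last step, assuming the file names used above, is:
file.remove(file.path(td, c("mygeo.geo", "mycas.cas", "mybase.prm", "day1.prm", "day2.prm")))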