Backcalculating Reporting Delays Distributions from Linelist Data

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

This R Markdown document walks through the steps for imputing a reporting delay distribution from linelist data with missing disease onset data, adapted in STAN from Li and White

There are two starting points for this vignette: either you have caseCount data, which are aggregated case counts by day, or you have Line-list data means you have a single row for each case, that has dates for: infection, symptom onset, positive test, and when this was reported to public health agencies.

Example: lineList data

library(WhiteLabRt)

Step 1. Load data

data("sample_report_dates")
data("sample_onset_dates")

Step 2. Creating linelist object

my_linelist <- create_linelist(report_dates = sample_report_dates,
                                onset_dates = sample_onset_dates)
#> Warning in create_linelist(report_dates = sample_report_dates, onset_dates =
#> sample_onset_dates): Some onset dates are NA
head(my_linelist)
#>   report_date delay_int onset_date is_weekend report_int week_int
#> 1  2020-01-01         4 2019-12-28          0          1        1
#> 2  2020-01-01         4 2019-12-28          0          1        1
#> 3  2020-01-01        NA       <NA>          0          1        1
#> 4  2020-01-01        NA       <NA>          0          1        1
#> 5  2020-01-01        NA       <NA>          0          1        1
#> 6  2020-01-01         8 2019-12-24          0          1        1

Step 3. Define the serial interval. The si() function creates a vector of length 14 with shape and rate for a gamma distribution. Note, this has a leading 0 to indicate no infections on the day of disease onset.

sip <- si(14, 4.29, 1.18)
plot(sip, type = 'l')

Step 5. Run the back-calculation algorithm. The default is an R(t) sliding window of 7 days. Additional options to STAN can be specified in the last argument (e.g., chains, cores, control).

out_list_demo <- run_backnow(my_linelist, sip = sip, chains = 1)

Plot outputs. The points are aggregated reported cases, and the red line (and shaded confidence interval) represent the back-calculated case onsets that lead to the reported data.

data("out_list_demo")
plot(out_list_demo, "est")

You can also plot the R(t) curve over time. In this case, the red line (and shaded confidence interval) represent the time-varying r(t). See Li and White for description.

data("out_list_demo")
plot(out_list_demo, "rt")

Example: Case Count data

You can also do the same from case count data, although at some point you will have to assume a reporting delay distribution, so this would be a little circular.

Step 1. Load data

data("sample_dates")
data("sample_location")
data("sample_cases")

head(sample_dates)
#> [1] "2020-01-01" "2020-01-02" "2020-01-03" "2020-01-04" "2020-01-05"
#> [6] "2020-01-06"
head(sample_cases)
#> [1] 10  1  4 11  8 10
head(sample_location)
#> [1] "Tatooine" "Tatooine" "Tatooine" "Tatooine" "Tatooine" "Tatooine"

Step 2. Creating case-counts

caseCounts <- create_caseCounts(date_vec = sample_dates,
                                location_vec = sample_location,
                                cases_vec = sample_cases)
head(caseCounts)
#>         date cases location
#> 1 2020-01-01    10 Tatooine
#> 2 2020-01-02     1 Tatooine
#> 3 2020-01-03     4 Tatooine
#> 4 2020-01-04    11 Tatooine
#> 5 2020-01-05     8 Tatooine
#> 6 2020-01-06    10 Tatooine

Step 3. Convert to linelist data. You can specify the distribution for my_linelist in convert_to_linelist. reportF is the distribution function, _args lists the distribution params that are not x, and _missP is the percent missing. This must be between \({0 < x < 1}\). Both ‘caseCounts’ and ‘caseCounts_line’ objects can be fed into run_backnow. The implied onset distribution is rnbinom() with size = 3 and mu = 9, with reportF_missP = 0.6 .

my_linelist <- convert_to_linelist(caseCounts, 
                                   reportF = rnbinom, 
                                   reportF_args = list(size = 3, mu = 9),
                                   reportF_missP = 0.6)
head(my_linelist)
#>   report_date delay_int onset_date is_weekend report_int week_int
#> 1  2020-01-01        NA       <NA>          0          1        1
#> 2  2020-01-01        12 2019-12-20          0          1        1
#> 3  2020-01-01        NA       <NA>          0          1        1
#> 4  2020-01-01        NA       <NA>          0          1        1
#> 5  2020-01-01         5 2019-12-27          0          1        1
#> 6  2020-01-01        NA       <NA>          0          1        1

Step 4. Define the serial interval. The si() function creates a vector of length 14 with alpha and beta as defined in Li and White, for COVID-19.

sip <- si(14, 4.29, 1.18)

Step 5. Run the back-calculation algorithm. The defaults are 2000 iterations and an R(t) sliding window of 7 days.

options(mc.cores = 4)

out_list_demo <- run_backnow(my_linelist, sip = sip)

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.