Surveillance workflow

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Overview

This vignette demonstrates a complete end-to-end surveillance analysis: from raw count data to actionable outputs. The workflow mirrors what a public health genomics team would run weekly.

Step 1: Load and prepare data

library(lineagefreq)

data(sarscov2_us_2022)
x <- lfq_data(sarscov2_us_2022,
              lineage = variant,
              date    = date,
              count   = count,
              total   = total)

Step 2: Collapse rare lineages

Real surveillance data often contains dozens of low-frequency lineages. collapse_lineages() merges those below a threshold into an “Other” category.

x_clean <- collapse_lineages(x, min_freq = 0.02)
#> Collapsing 1 rare lineage into "Other".
attr(x_clean, "lineages")
#> [1] "BA.1"   "BA.2"   "BA.4/5" "Other"

Step 3: Fit model

fit <- fit_model(x_clean, engine = "mlr")
summary(fit)
#> Lineage Frequency Model Summary
#> ================================
#> Engine:       mlr 
#> Pivot:        BA.1 
#> Lineages:     4 
#> Time points:  40 
#> Total seqs:   461424 
#> Parameters:   6 
#> Log-lik:      -455671 
#> AIC:          911355 
#> BIC:          911365 
#> 
#> Growth rates (per 7 days):
#> # A tibble: 4 × 6
#>   lineage estimate lower upper type        pivot
#>   <chr>      <dbl> <dbl> <dbl> <chr>       <chr>
#> 1 BA.1       0     0     0     growth_rate BA.1 
#> 2 BA.2       0.228 0.226 0.229 growth_rate BA.1 
#> 3 BA.4/5     0.395 0.393 0.397 growth_rate BA.1 
#> 4 Other      0.153 0.152 0.154 growth_rate BA.1

Step 4: Growth advantages

ga <- growth_advantage(fit,
                       type = "relative_Rt",
                       generation_time = 5)
ga
#> # A tibble: 4 × 6
#>   lineage estimate lower upper type        pivot
#>   <chr>      <dbl> <dbl> <dbl> <chr>       <chr>
#> 1 BA.1        1     1     1    relative_Rt BA.1 
#> 2 BA.2        1.18  1.18  1.18 relative_Rt BA.1 
#> 3 BA.4/5      1.33  1.32  1.33 relative_Rt BA.1 
#> 4 Other       1.12  1.11  1.12 relative_Rt BA.1

autoplot(fit, type = "advantage", generation_time = 5)

Step 5: Identify emerging lineages

emerging <- summarize_emerging(x_clean)
emerging[emerging$significant, ]
#> # A tibble: 3 × 10
#>   lineage first_seen last_seen  n_timepoints current_freq growth_rate p_value
#>   <chr>   <date>     <date>            <int>        <dbl>       <dbl>   <dbl>
#> 1 BA.2    2022-01-08 2022-10-08           40       0.156      0.00476       0
#> 2 BA.4/5  2022-01-08 2022-10-08           40       0.797      0.0293        0
#> 3 Other   2022-01-08 2022-10-08           40       0.0469    -0.00442       0
#> # ℹ 3 more variables: p_adjusted <dbl>, significant <lgl>, direction <chr>

Step 6: Forecast

fc <- forecast(fit, horizon = 28)
autoplot(fc)
#> Warning in ggplot2::scale_x_date(date_labels = "%Y-%m-%d"): A <numeric> value was passed to a Date scale.
#> ℹ The value was converted to a <Date> object.

Step 7: Assess sequencing needs

How many sequences are needed to reliably detect a lineage at 2% frequency?

sequencing_power(
  target_precision = 0.05,
  current_freq = c(0.01, 0.02, 0.05, 0.10)
)
#> # A tibble: 4 × 4
#>   current_freq target_precision required_n ci_level
#>          <dbl>            <dbl>      <dbl>    <dbl>
#> 1         0.01             0.05         16     0.95
#> 2         0.02             0.05         31     0.95
#> 3         0.05             0.05         73     0.95
#> 4         0.1              0.05        139     0.95

Step 8: Extract tidy results

All results are compatible with the broom ecosystem.

tidy.lfq_fit(fit)
#> # A tibble: 6 × 6
#>   lineage term        estimate std.error conf.low conf.high
#>   <chr>   <chr>          <dbl>     <dbl>    <dbl>     <dbl>
#> 1 BA.2    intercept     -2.97   0.0109     -2.99     -2.95 
#> 2 BA.2    growth_rate    0.228  0.000750    0.226     0.229
#> 3 BA.4/5  intercept     -7.88   0.0238     -7.93     -7.83 
#> 4 BA.4/5  growth_rate    0.395  0.00103     0.393     0.397
#> 5 Other   intercept     -1.53   0.00823    -1.55     -1.52 
#> 6 Other   growth_rate    0.153  0.000677    0.152     0.154
glance.lfq_fit(fit)
#> # A tibble: 1 × 10
#>   engine n_lineages n_timepoints   nobs    df   logLik     AIC     BIC pivot
#>   <chr>       <int>        <int>  <int> <int>    <dbl>   <dbl>   <dbl> <chr>
#> 1 mlr             4           40 461424     6 -455671. 911355. 911365. BA.1 
#> # ℹ 1 more variable: convergence <int>

Summary

A typical weekly workflow:

lfq_data() — ingest new counts
collapse_lineages() — clean up rare lineages
fit_model() — estimate dynamics
summarize_emerging() — flag growing lineages
forecast() — project forward 4 weeks
autoplot() — generate report figures

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.