Model repeated cross-sectional data

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Age-time varying model

To monitor changes in a population’s seroprevalence over time, modelers often conduct multiple cross-sectional surveys at different time points, each using a new representative sample. The resulting data are known as repeated cross-sectional data.

Proposed approach

To model repeated cross-sectional serological data, serosv offers age_time_model() function which implements the following workflow:

Fit age-specific seroprevalence for each survey period
Monotonize age-specific or birth-cohort-specific prevalence over time
Fit monotonized age-specific seroprevalence for each survey period

Fitting data

The function expects input data with the following columns:

Either age and status (linelisting) or age and pos + tot (aggregated)
A column for date of survey (specified via time_col argument)
A column for id of each survey period (specified via grouping_col argument)

# Prepare data 
tb_nl <- tb_nl_1966_1973 %>% 
  mutate(
    survey_year = age + birthyr,
    survey_time = as.Date(paste0(survey_year, "-01-01"))
  ) %>% select(-birthyr) %>% 
  filter(survey_year > 1966) %>% 
  group_by(age, survey_year, survey_time) %>% 
  summarize(pos = sum(pos), tot = sum(tot), .groups = "drop")

head(tb_nl)
#> # A tibble: 6 × 5
#>     age survey_year survey_time   pos    tot
#>   <int>       <dbl> <date>      <int>  <int>
#> 1     6        1970 1970-01-01    140  40868
#> 2     6        1971 1971-01-01     55  17874
#> 3     6        1972 1972-01-01      4   2163
#> 4     7        1970 1970-01-01    328 105960
#> 5     7        1971 1971-01-01    308  96326
#> 6     7        1972 1972-01-01     11   2807

The monotonization method can be specified via the monotonize_method argument, serosv currently supports 2 options:

Pooled adjacent violators average (monotonize_method = "pava")
Shape constrained additive model (monotonize_method = "scam")

The users can also configure to monotonize either:

Age-specific seroprevalence over time (age_correct = FALSE)
Or birth cohort specific seroprevalence over time (age_correct=TRUE)

out_pava <- tb_nl %>% 
  age_time_model(
    time_col = "survey_time", 
    grouping_col = "survey_year",
    age_correct = F,
    monotonize_method = "pava"
  ) %>% 
  suppressWarnings()

out_scam <- tb_nl %>% 
  age_time_model(
    time_col = "survey_time", 
    grouping_col = "survey_year",
    age_correct = T,
    monotonize_method = "scam"
  ) %>% 
  suppressWarnings()

The output is a data.frame with dimension [number of survey, 9], where each row corresponds to a single survey period. The columns are:

column for id of survey period
df - input data.frame corresponding to that survey period
info - model for the seroprevalence
monotonized_info - model for the monotonized seroprevalence
monotonized_ci_mod - model for the monotonized confidence interval
sp - predicted seroprevalence of the given input data
foi - estimated force of infection from sp
monotonized_sp - predicted monotonized seroprevalence of the given input data
monotonized_foi - estimated force of infection from monotonized_sp

out_pava
#> Age-time varying seroprevalence model 
#> 
#> Input type:  aggregated 
#> Grouping variable:  survey_year 
#> Monotonization method:  pava 
#> Monotonize across:  age group 
#> # A tibble: 7 × 9
#>   survey_year monotonized_info monotonized_ci_mod df       info   sp        
#>         <dbl> <list>           <list>             <list>   <list> <list>    
#> 1        1967 <gam>            <named list [2]>   <tibble> <gam>  <dbl [5]> 
#> 2        1968 <gam>            <named list [2]>   <tibble> <gam>  <dbl [5]> 
#> 3        1969 <gam>            <named list [2]>   <tibble> <gam>  <dbl [6]> 
#> 4        1970 <gam>            <named list [2]>   <tibble> <gam>  <dbl [13]>
#> 5        1971 <gam>            <named list [2]>   <tibble> <gam>  <dbl [8]> 
#> 6        1972 <gam>            <named list [2]>   <tibble> <gam>  <dbl [8]> 
#> 7        1973 <gam>            <named list [2]>   <tibble> <gam>  <dbl [5]> 
#> # ℹ 3 more variables: foi <list>, monotonized_sp <list>, monotonized_foi <list>
out_scam
#> Age-time varying seroprevalence model 
#> 
#> Input type:  aggregated 
#> Grouping variable:  survey_year 
#> Monotonization method:  scam 
#> Monotonize across:  birth cohort 
#> # A tibble: 7 × 9
#>   survey_year monotonized_info monotonized_ci_mod df       info   sp        
#>         <dbl> <list>           <list>             <list>   <list> <list>    
#> 1        1967 <gam>            <named list [2]>   <tibble> <gam>  <dbl [5]> 
#> 2        1968 <gam>            <named list [2]>   <tibble> <gam>  <dbl [5]> 
#> 3        1969 <gam>            <named list [2]>   <tibble> <gam>  <dbl [6]> 
#> 4        1970 <gam>            <named list [2]>   <tibble> <gam>  <dbl [13]>
#> 5        1971 <gam>            <named list [2]>   <tibble> <gam>  <dbl [8]> 
#> 6        1972 <gam>            <named list [2]>   <tibble> <gam>  <dbl [8]> 
#> 7        1973 <gam>            <named list [2]>   <tibble> <gam>  <dbl [5]> 
#> # ℹ 3 more variables: foi <list>, monotonized_sp <list>, monotonized_foi <list>

For visualization, the plot function for age_time_model offers the following configurations

facet whether to visualize result for each survey period separately (facet=TRUE) or on a single plot (facet=FALSE)
modtype choose which model to visualize, the model fitted with input data (modtype="non-monotonized") or monotonized data (modtype="monotonized")

Example: output for model with PAVA for monotonization

plot(out_pava, facet = TRUE, modtype = "non-monotonized") + ylim(c(0, 0.07))

plot(out_pava, facet = TRUE, modtype = "monotonized") + ylim(c(0, 0.07))


plot(out_pava, facet = FALSE, modtype = "monotonized") + ylim(c(0, 0.07))

Example: output for model with SCAM for monotonization

plot(out_scam, facet = TRUE, modtype = "non-monotonized") + ylim(c(0, 0.07))

plot(out_scam, facet = TRUE, modtype = "monotonized") + ylim(c(0, 0.07))

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.