
Introduction to the msSPChelpR package - from long dataset to SIR analyses

Marian Eberl

26 October 2020

Introduction

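This vignette explains how to use the functions demonstrated in the examples below: renumber_time_id(), reshape_wide_tidyr(), pat_status(), calc_futime(), sir_byfutime() and summarize_sir_results().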

For some functions there are multiple variants that use different frameworks. They give the same results but differ in execution time and memory use.

Theory behind SIRs

The next version of this vignette will explain in this chapter the theoretical considerations behind how SIRs are calculated.
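As a placeholder, here is a minimal sketch of the standard definition, which is not necessarily the exact computation used inside msSPChelpR. The SIR is the number of observed cases divided by the number of expected cases. The expected count is the sum over strata of the reference incidence rate times the person-years at risk. The helper `sir_exact()` and all numbers are hypothetical. The exact Poisson confidence interval is one common choice and is an assumption here.

```r
# Hypothetical helper (not part of msSPChelpR): SIR with an exact Poisson
# confidence interval based on the chi-square distribution.
sir_exact <- function(observed, expected, alpha = 0.05) {
  sir <- observed / expected
  lci <- stats::qchisq(alpha / 2, 2 * observed) / 2 / expected
  uci <- stats::qchisq(1 - alpha / 2, 2 * (observed + 1)) / 2 / expected
  c(sir = sir, sir_lci = lci, sir_uci = uci)
}

# Toy strata (made-up numbers): reference incidence rate per 100,000
# person-years and person-years at risk contributed by the cohort
ref_rate_per_100k <- c(50, 120, 300)
pyar              <- c(4000, 2500, 1000)

# Expected cases: sum over strata of rate * person-years (2 + 3 + 3 = 8)
expected <- sum(ref_rate_per_100k / 1e5 * pyar)
observed <- 12

res <- sir_exact(observed, expected)
res["sir"]   # 12 / 8 = 1.5
```

An SIR above 1 means that more cases were observed in the cohort than the reference rates would predict for the same person-time.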

Examples

SEER lung cancer

Step 1 - Long dataset

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(magrittr)
library(msSPChelpR)
#Load synthetic dataset of patients with cancer to demonstrate package functions
data("us_second_cancer")

#This dataset is in long format, so each tumor is a separate row in the data
us_second_cancer
#> # A tibble: 113,999 × 16
#>    fake_id SEQ_NUM registry   sex   race  datebirth  t_datediag t_site_icd t_dco
#>    <chr>     <int> <chr>      <chr> <chr> <date>     <date>     <chr>      <chr>
#>  1 100004        1 SEER Reg … Male  White 1926-01-01 1992-07-15 C50        hist…
#>  2 100004        2 SEER Reg … Male  White 1926-01-01 2004-01-15 C54        hist…
#>  3 100004        3 SEER Reg … Male  White 1926-01-01 2006-06-15 C34        hist…
#>  4 100004        4 SEER Reg … Male  White 1926-01-01 2018-06-15 C14        DCO …
#>  5 100034        1 SEER Reg … Male  White 1979-01-01 2000-06-15 C50        hist…
#>  6 100037        1 SEER Reg … Fema… White 1938-01-01 1996-01-15 C54        hist…
#>  7 100038        1 SEER Reg … Male  White 1989-01-01 1991-04-15 C50        hist…
#>  8 100038        2 SEER Reg … Male  White 1989-01-01 2000-03-15 C80        hist…
#>  9 100039        1 SEER Reg … Fema… White 1946-01-01 2003-08-15 C50        hist…
#> 10 100039        2 SEER Reg … Fema… White 1946-01-01 2011-04-15 C34        hist…
#> # ℹ 113,989 more rows
#> # ℹ 7 more variables: t_hist <int>, fc_age <int>, datedeath <date>,
#> #   p_alive <chr>, p_dodmin <date>, fc_agegroup <chr>, t_yeardiag <chr>

Step 2 - Filter long dataset

#filter for lung cancer
ids <- us_second_cancer %>%
  #detect ids with any lung cancer
  filter(t_site_icd == "C34") %>%
  select(fake_id) %>%
  as.vector() %>%
  unname() %>%
  unlist()

filtered_usdata <- us_second_cancer %>%
  #filter according to above detected ids with any lung cancer diagnosis
  filter(fake_id %in% ids) %>%
  arrange(fake_id)

filtered_usdata
#> # A tibble: 62,661 × 16
#>    fake_id SEQ_NUM registry   sex   race  datebirth  t_datediag t_site_icd t_dco
#>    <chr>     <int> <chr>      <chr> <chr> <date>     <date>     <chr>      <chr>
#>  1 100004        1 SEER Reg … Male  White 1926-01-01 1992-07-15 C50        hist…
#>  2 100004        2 SEER Reg … Male  White 1926-01-01 2004-01-15 C54        hist…
#>  3 100004        3 SEER Reg … Male  White 1926-01-01 2006-06-15 C34        hist…
#>  4 100004        4 SEER Reg … Male  White 1926-01-01 2018-06-15 C14        DCO …
#>  5 100039        1 SEER Reg … Fema… White 1946-01-01 2003-08-15 C50        hist…
#>  6 100039        2 SEER Reg … Fema… White 1946-01-01 2011-04-15 C34        hist…
#>  7 100039        3 SEER Reg … Fema… White 1946-01-01 2018-01-15 C80        hist…
#>  8 100073        1 SEER Reg … Male  White 1960-01-01 1993-11-15 C44        hist…
#>  9 100073        2 SEER Reg … Male  White 1960-01-01 2003-12-15 C34        hist…
#> 10 100143        1 SEER Reg … Male  White 1944-01-01 1992-03-15 C50        hist…
#> # ℹ 62,651 more rows
#> # ℹ 7 more variables: t_hist <int>, fc_age <int>, datedeath <date>,
#> #   p_alive <chr>, p_dodmin <date>, fc_agegroup <chr>, t_yeardiag <chr>
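The filter above works in two stages: first collect the ids of all patients with any C34 diagnosis, then keep every tumor row of those patients. The same logic can be sketched in base R on a small made-up data frame; `toy` and its values are purely illustrative.

```r
# Toy long-format data (made up): one row per tumor
toy <- data.frame(
  fake_id    = c("A", "A", "B", "C", "C", "C"),
  t_site_icd = c("C50", "C34", "C50", "C34", "C18", "C80"),
  stringsAsFactors = FALSE
)

# 1) ids of all patients with at least one lung cancer (C34) diagnosis
ids <- unique(toy$fake_id[toy$t_site_icd == "C34"])

# 2) keep all tumor rows of those patients, not only the C34 rows
filtered_toy <- toy[toy$fake_id %in% ids, ]
filtered_toy <- filtered_toy[order(filtered_toy$fake_id), ]

filtered_toy
```

Patient "B" never had a C34 diagnosis and is dropped, while the non-lung tumors of "A" and "C" are kept. In a dplyr pipeline, `dplyr::pull(fake_id)` would return the id column as a plain vector in one step.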

Step 3 - Renumber time_id

renumbered_usdata <- filtered_usdata %>%
  renumber_time_id(new_time_id_var = "t_tumid", 
                   dattype = "seer",
                   case_id_var = "fake_id")

renumbered_usdata %>%
   select(fake_id, sex, t_site_icd, t_datediag, t_tumid)
#> # A tibble: 62,661 × 5
#>    fake_id sex    t_site_icd t_datediag t_tumid
#>    <chr>   <chr>  <chr>      <date>       <int>
#>  1 100004  Male   C50        1992-07-15       1
#>  2 100004  Male   C54        2004-01-15       2
#>  3 100004  Male   C34        2006-06-15       3
#>  4 100004  Male   C14        2018-06-15       4
#>  5 100039  Female C50        2003-08-15       1
#>  6 100039  Female C34        2011-04-15       2
#>  7 100039  Female C80        2018-01-15       3
#>  8 100073  Male   C44        1993-11-15       1
#>  9 100073  Male   C34        2003-12-15       2
#> 10 100143  Male   C50        1992-03-15       1
#> # ℹ 62,651 more rows
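To illustrate the idea of a renumbered time id, the following base R sketch numbers the tumors of each case consecutively in order of diagnosis date. It is an assumption about the intent of this step, not the implementation of renumber_time_id(), and the `toy` data are made up. Renumbering matters, for example, when rows have been removed and the original sequence number has gaps.

```r
# Toy long-format data (made up) with gaps in the original sequence number
toy <- data.frame(
  fake_id    = c("A", "A", "A", "B", "B"),
  SEQ_NUM    = c(1L, 3L, 4L, 2L, 5L),
  t_datediag = as.Date(c("1992-07-15", "2006-06-15", "2018-06-15",
                         "2003-12-15", "2011-04-15")),
  stringsAsFactors = FALSE
)

# Sort by case id and diagnosis date, then number tumors 1, 2, ... per case
toy <- toy[order(toy$fake_id, toy$t_datediag), ]
toy$t_tumid <- ave(seq_along(toy$fake_id), toy$fake_id, FUN = seq_along)

toy[, c("fake_id", "SEQ_NUM", "t_tumid")]
```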

Step 4 - Reshape to wide dataset

usdata_wide <- renumbered_usdata %>%
  reshape_wide_tidyr(case_id_var = "fake_id", time_id_var = "t_tumid", timevar_max = 10)

#now the data is in the wide format as required by many package functions. 
#This means each case is one row, and additional tumors per case ID are 
#added as new columns to the data, using the time_id as column name suffix.
usdata_wide
#> # A tibble: 31,997 × 136
#>    fake_id SEQ_NUM.1 registry.1            sex.1 race.1 datebirth.1 t_datediag.1
#>    <chr>       <int> <chr>                 <chr> <chr>  <date>      <date>      
#>  1 100004          1 SEER Reg 20 - Detroi… Male  White  1926-01-01  1992-07-15  
#>  2 100039          1 SEER Reg 02 - Connec… Fema… White  1946-01-01  2003-08-15  
#>  3 100073          1 SEER Reg 01 - San Fr… Male  White  1960-01-01  1993-11-15  
#>  4 100143          1 SEER Reg 02 - Connec… Male  White  1944-01-01  1992-03-15  
#>  5 100182          1 SEER Reg 02 - Connec… Male  Other  1927-01-01  1991-09-15  
#>  6 100197          1 SEER Reg 02 - Connec… Fema… White  1945-01-01  2012-06-15  
#>  7 100208          1 SEER Reg 02 - Connec… Male  White  1970-01-01  2019-11-15  
#>  8 100230          1 SEER Reg 01 - San Fr… Male  White  1947-01-01  1992-11-15  
#>  9 100234          1 SEER Reg 01 - San Fr… Male  White  1988-01-01  2010-02-15  
#> 10 100266          1 SEER Reg 01 - San Fr… Fema… White  1956-01-01  2010-07-15  
#> # ℹ 31,987 more rows
#> # ℹ 129 more variables: t_site_icd.1 <chr>, t_dco.1 <chr>, t_hist.1 <int>,
#> #   fc_age.1 <int>, datedeath.1 <date>, p_alive.1 <chr>, p_dodmin.1 <date>,
#> #   fc_agegroup.1 <chr>, t_yeardiag.1 <chr>, SEQ_NUM.2 <int>, registry.2 <chr>,
#> #   sex.2 <chr>, race.2 <chr>, datebirth.2 <date>, t_datediag.2 <date>,
#> #   t_site_icd.2 <chr>, t_dco.2 <chr>, t_hist.2 <int>, fc_age.2 <int>,
#> #   datedeath.2 <date>, p_alive.2 <chr>, p_dodmin.2 <date>, …
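The long-to-wide transformation can be illustrated with base R's `reshape()` on a small made-up data frame. This is only a sketch of the resulting layout (one row per case, columns suffixed with the time id) and makes no claim about how reshape_wide_tidyr() works internally.

```r
# Toy long-format data (made up): case "A" has two tumors, case "B" has one
toy_long <- data.frame(
  fake_id    = c("A", "A", "B"),
  t_tumid    = c(1L, 2L, 1L),
  t_site_icd = c("C50", "C34", "C34"),
  stringsAsFactors = FALSE
)

# One row per case; tumor-level columns get the time id as suffix
toy_wide <- reshape(toy_long, direction = "wide",
                    idvar = "fake_id", timevar = "t_tumid", sep = ".")

names(toy_wide)   # "fake_id" "t_site_icd.1" "t_site_icd.2"
toy_wide
```

Case "B" has no second tumor, so `t_site_icd.2` is `NA` for that row. This is the property that the next step uses to detect second primary cancers.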

Step 5 - Recalculate p_spc


usdata_wide <- usdata_wide %>%
  dplyr::mutate(p_spc = dplyr::case_when(is.na(t_site_icd.2)   ~ "No SPC",
                         !is.na(t_site_icd.2)           ~ "SPC developed",
                         TRUE ~ NA_character_)) %>%
  #create a numeric variable count_spc from t_site_icd.2:
  #1 if no second tumor is recorded, 0 otherwise
  dplyr::mutate(count_spc = dplyr::case_when(is.na(t_site_icd.2)   ~ 1,
                            TRUE ~ 0))
usdata_wide %>%
   dplyr::select(fake_id, sex.1, p_spc, count_spc, t_site_icd.1, 
                 t_datediag.1, t_site_icd.2, t_datediag.2)
#> # A tibble: 31,997 × 8
#>    fake_id sex.1  p_spc         count_spc t_site_icd.1 t_datediag.1 t_site_icd.2
#>    <chr>   <chr>  <chr>             <dbl> <chr>        <date>       <chr>       
#>  1 100004  Male   SPC developed         0 C50          1992-07-15   C54         
#>  2 100039  Female SPC developed         0 C50          2003-08-15   C34         
#>  3 100073  Male   SPC developed         0 C44          1993-11-15   C34         
#>  4 100143  Male   SPC developed         0 C50          1992-03-15   C34         
#>  5 100182  Male   SPC developed         0 C18          1991-09-15   C34         
#>  6 100197  Female SPC developed         0 C34          2012-06-15   C50         
#>  7 100208  Male   No SPC                1 C34          2019-11-15   <NA>        
#>  8 100230  Male   SPC developed         0 C44          1992-11-15   C34         
#>  9 100234  Male   No SPC                1 C34          2010-02-15   <NA>        
#> 10 100266  Female No SPC                1 C34          2010-07-15   <NA>        
#> # ℹ 31,987 more rows
#> # ℹ 1 more variable: t_datediag.2 <date>

Step 6 - Determine patient status at end of FU

usdata_wide <- usdata_wide %>%
  pat_status(., fu_end = "2017-12-31", dattype = "seer",
             status_var = "p_status", life_var = "p_alive.1",
             spc_var = "p_spc", birthdat_var = "datebirth.1",
             lifedat_var = "datedeath.1", fcdat_var = "t_datediag.1",
             spcdat_var = "t_datediag.2", life_stat_alive = "Alive",
             life_stat_dead = "Dead", spc_stat_yes = "SPC developed",
             spc_stat_no = "No SPC", lifedat_fu_end = "2019-12-31",
             use_lifedatmin = FALSE, check = TRUE, 
             as_labelled_factor = TRUE)
#> # A tibble: 10 × 3
#>    p_alive.1 p_status                                                          n
#>    <chr>     <fct>                                                         <int>
#>  1 Alive     Patient alive after FC (with or without following SPC after …  5986
#>  2 Alive     Patient alive after SPC                                       11421
#>  3 Alive     NA - Patient not born before end of FU                            4
#>  4 Alive     NA - Patient did not develop cancer before end of FU            873
#>  5 Dead      Patient alive after FC (with or without following SPC after …   909
#>  6 Dead      Patient alive after SPC                                        1294
#>  7 Dead      Patient dead after FC                                          6116
#>  8 Dead      Patient dead after SPC                                         5286
#>  9 Dead      NA - Patient did not develop cancer before end of FU             44
#> 10 Dead      NA - Patient date of death is missing                            64
#> # A tibble: 7 × 2
#>   p_status                                                                   n
#>   <fct>                                                                  <int>
#> 1 Patient alive after FC (with or without following SPC after end of FU)  6895
#> 2 Patient alive after SPC                                                12715
#> 3 Patient dead after FC                                                   6116
#> 4 Patient dead after SPC                                                  5286
#> 5 NA - Patient not born before end of FU                                     4
#> 6 NA - Patient did not develop cancer before end of FU                     917
#> 7 NA - Patient date of death is missing                                     64

usdata_wide %>%
   dplyr::select(fake_id, p_status, p_alive.1, datedeath.1, t_site_icd.1, t_datediag.1, 
                 t_site_icd.2, t_datediag.2)
#> # A tibble: 31,997 × 8
#>    fake_id p_status p_alive.1 datedeath.1 t_site_icd.1 t_datediag.1 t_site_icd.2
#>    <chr>   <fct>    <chr>     <date>      <chr>        <date>       <chr>       
#>  1 100004  Patient… Alive     NA          C50          1992-07-15   C54         
#>  2 100039  Patient… Alive     NA          C50          2003-08-15   C34         
#>  3 100073  Patient… Dead      2012-06-01  C44          1993-11-15   C34         
#>  4 100143  Patient… Alive     NA          C50          1992-03-15   C34         
#>  5 100182  Patient… Alive     NA          C18          1991-09-15   C34         
#>  6 100197  Patient… Alive     NA          C34          2012-06-15   C50         
#>  7 100208  NA - Pa… Dead      2019-11-15  C34          2019-11-15   <NA>        
#>  8 100230  Patient… Alive     NA          C44          1992-11-15   C34         
#>  9 100234  Patient… Alive     NA          C34          2010-02-15   <NA>        
#> 10 100266  Patient… Dead      2010-07-15  C34          2010-07-15   <NA>        
#> # ℹ 31,987 more rows
#> # ℹ 1 more variable: t_datediag.2 <date>

#alternatively, you can impute the date of death using lifedatmin_var
usdata_wide %>%
  pat_status(., fu_end = "2017-12-31", dattype = "seer",
             status_var = "p_status", life_var = "p_alive.1",
             spc_var = "p_spc", birthdat_var = "datebirth.1",
             lifedat_var = "datedeath.1", fcdat_var = "t_datediag.1",
             spcdat_var = "t_datediag.2", life_stat_alive = "Alive",
             life_stat_dead = "Dead", spc_stat_yes = "SPC developed",
             spc_stat_no = "No SPC", lifedat_fu_end = "2019-12-31",
             use_lifedatmin = TRUE, lifedatmin_var = "p_dodmin.1", 
             check = TRUE, as_labelled_factor = TRUE)
#> # A tibble: 9 × 3
#>   p_alive.1 p_status                                                           n
#>   <chr>     <fct>                                                          <int>
#> 1 Alive     Patient alive after FC (with or without following SPC after e…  5986
#> 2 Alive     Patient alive after SPC                                        11421
#> 3 Alive     NA - Patient not born before end of FU                             4
#> 4 Alive     NA - Patient did not develop cancer before end of FU             873
#> 5 Dead      Patient alive after FC (with or without following SPC after e…   913
#> 6 Dead      Patient alive after SPC                                         1295
#> 7 Dead      Patient dead after FC                                           6138
#> 8 Dead      Patient dead after SPC                                          5323
#> 9 Dead      NA - Patient did not develop cancer before end of FU              44
#> # A tibble: 6 × 2
#>   p_status                                                                   n
#>   <fct>                                                                  <int>
#> 1 Patient alive after FC (with or without following SPC after end of FU)  6899
#> 2 Patient alive after SPC                                                12716
#> 3 Patient dead after FC                                                   6138
#> 4 Patient dead after SPC                                                  5323
#> 5 NA - Patient not born before end of FU                                     4
#> 6 NA - Patient did not develop cancer before end of FU                     917
#> # A tibble: 31,997 × 139
#>    fake_id SEQ_NUM.1 registry.1            sex.1 race.1 datebirth.1 t_datediag.1
#>    <chr>       <int> <chr>                 <chr> <chr>  <date>      <date>      
#>  1 100004          1 SEER Reg 20 - Detroi… Male  White  1926-01-01  1992-07-15  
#>  2 100039          1 SEER Reg 02 - Connec… Fema… White  1946-01-01  2003-08-15  
#>  3 100073          1 SEER Reg 01 - San Fr… Male  White  1960-01-01  1993-11-15  
#>  4 100143          1 SEER Reg 02 - Connec… Male  White  1944-01-01  1992-03-15  
#>  5 100182          1 SEER Reg 02 - Connec… Male  Other  1927-01-01  1991-09-15  
#>  6 100197          1 SEER Reg 02 - Connec… Fema… White  1945-01-01  2012-06-15  
#>  7 100208          1 SEER Reg 02 - Connec… Male  White  1970-01-01  2019-11-15  
#>  8 100230          1 SEER Reg 01 - San Fr… Male  White  1947-01-01  1992-11-15  
#>  9 100234          1 SEER Reg 01 - San Fr… Male  White  1988-01-01  2010-02-15  
#> 10 100266          1 SEER Reg 01 - San Fr… Fema… White  1956-01-01  2010-07-15  
#> # ℹ 31,987 more rows
#> # ℹ 132 more variables: t_site_icd.1 <chr>, t_dco.1 <chr>, t_hist.1 <int>,
#> #   fc_age.1 <int>, datedeath.1 <date>, p_alive.1 <chr>, p_dodmin.1 <date>,
#> #   fc_agegroup.1 <chr>, t_yeardiag.1 <chr>, SEQ_NUM.2 <int>, registry.2 <chr>,
#> #   sex.2 <chr>, race.2 <chr>, datebirth.2 <date>, t_datediag.2 <date>,
#> #   t_site_icd.2 <chr>, t_dco.2 <chr>, t_hist.2 <int>, fc_age.2 <int>,
#> #   datedeath.2 <date>, p_alive.2 <chr>, p_dodmin.2 <date>, …
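The status categories shown above depend on which events happened before the end of follow-up. The sketch below is a deliberately simplified, hypothetical rule that is consistent with the example rows printed above. It is not the logic of pat_status(), it uses shortened labels, and it ignores edge cases such as missing death dates.

```r
# Hypothetical, simplified status rule for one patient at a fixed fu_end
status_at_fu_end <- function(fc_date, spc_date, death_date, fu_end) {
  fu_end <- as.Date(fu_end)
  # first cancer diagnosed only after end of follow-up
  if (fc_date > fu_end) return("NA - no cancer before end of FU")
  # only events on or before fu_end count
  dead <- !is.na(death_date) && death_date <= fu_end
  spc  <- !is.na(spc_date) && spc_date <= fu_end
  paste("Patient", if (dead) "dead" else "alive",
        "after", if (spc) "SPC" else "FC")
}

# Dates taken from rows 3 and 7 of the output above
status_at_fu_end(as.Date("1993-11-15"), as.Date("2003-12-15"),
                 as.Date("2012-06-01"), "2017-12-31")
status_at_fu_end(as.Date("2019-11-15"), as.Date(NA),
                 as.Date("2019-11-15"), "2017-12-31")
```

Under this rule a patient who died after `fu_end` still counts as alive at the end of follow-up, which fits the tables above, where some patients with `p_alive.1 == "Dead"` fall into the "alive" status categories.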

Step 6b - Remove patients irrelevant to analysis depending on status

usdata_wide <- usdata_wide %>%
  dplyr::filter(!p_status %in% c("NA - Patient not born before end of FU",
                                 "NA - Patient did not develop cancer before end of FU",
                                 "NA - Patient date of death is missing"))

usdata_wide %>%
  dplyr::count(p_status)
#> # A tibble: 4 × 2
#>   p_status                                                                   n
#>   <fct>                                                                  <int>
#> 1 Patient alive after FC (with or without following SPC after end of FU)  6895
#> 2 Patient alive after SPC                                                12715
#> 3 Patient dead after FC                                                   6116
#> 4 Patient dead after SPC                                                  5286

Step 7 - Calculate FU time

usdata_wide <- usdata_wide %>%
   calc_futime(., futime_var_new = "p_futimeyrs", fu_end = "2017-12-31",
               dattype = "seer", time_unit = "years", 
               lifedat_var = "datedeath.1", 
               fcdat_var = "t_datediag.1", spcdat_var = "t_datediag.2")
#> # A tibble: 4 × 5
#>   p_status                       mean_futime min_futime max_futime median_futime
#>   <fct>                                <dbl>      <dbl>      <dbl>         <dbl>
#> 1 Patient alive after FC (with …        9.56     0.0438       27.0          8.29
#> 2 Patient alive after SPC               8.70     0            26.9          7.50
#> 3 Patient dead after FC                 8.60     0            25.9          7.54
#> 4 Patient dead after SPC                6.29     0            25.3          5.17

usdata_wide %>%
   dplyr::select(fake_id, p_status, p_futimeyrs, p_alive.1, datedeath.1, t_datediag.1, t_datediag.2)
#> # A tibble: 31,012 × 7
#>    fake_id p_status  p_futimeyrs p_alive.1 datedeath.1 t_datediag.1 t_datediag.2
#>    <chr>   <fct>           <dbl> <chr>     <date>      <date>       <date>      
#>  1 100004  Patient …       11.5  Alive     NA          1992-07-15   2004-01-15  
#>  2 100039  Patient …        7.67 Alive     NA          2003-08-15   2011-04-15  
#>  3 100073  Patient …       10.1  Dead      2012-06-01  1993-11-15   2003-12-15  
#>  4 100143  Patient …        3.33 Alive     NA          1992-03-15   1995-07-15  
#>  5 100182  Patient …        7.08 Alive     NA          1991-09-15   1998-10-15  
#>  6 100197  Patient …        4.83 Alive     NA          2012-06-15   2017-04-15  
#>  7 100230  Patient …       11.0  Alive     NA          1992-11-15   2003-11-15  
#>  8 100234  Patient …        7.87 Alive     NA          2010-02-15   NA          
#>  9 100266  Patient …        0    Dead      2010-07-15  2010-07-15   NA          
#> 10 100274  Patient …        7.38 Dead      2011-06-01  2004-01-15   NA          
#> # ℹ 31,002 more rows
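The follow-up times printed above are consistent with measuring the time from first cancer diagnosis to the earliest of SPC diagnosis, death or end of follow-up. The following sketch reproduces three of the rows under that assumption. The helper `futime_years()` is hypothetical, and the divisor of 365.25 days per year is also an assumption rather than a documented detail of calc_futime().

```r
# Hypothetical helper: follow-up time in years from first cancer diagnosis
# to the earliest of SPC diagnosis, death or end of follow-up
futime_years <- function(fc_date, spc_date, death_date, fu_end) {
  fu_end   <- as.Date(fu_end)
  # NA means the event did not occur and is ignored
  end_date <- pmin(spc_date, death_date, fu_end, na.rm = TRUE)
  as.numeric(end_date - fc_date) / 365.25
}

# Dates of fake_id 100004, 100234 and 100266 from the output above
fc    <- as.Date(c("1992-07-15", "2010-02-15", "2010-07-15"))
spc   <- as.Date(c("2004-01-15", NA, NA))
death <- as.Date(c(NA, NA, "2010-07-15"))

fu <- futime_years(fc, spc, death, "2017-12-31")
round(fu, 2)   # 11.50, 7.87 and 0.00, as in p_futimeyrs above
```

The third patient died on the day of diagnosis and therefore contributes zero follow-up time, which is relevant for the notes about strata with 0 person-years in the next step.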

Step 8 - Calculate SIR

sircalc_results <- usdata_wide %>%
  sir_byfutime(
    dattype = "seer",
    ybreak_vars = c("race.1", "t_dco.1"),
    xbreak_var = "none",
    futime_breaks = c(0, 1/12, 2/12, 1, 5, 10, Inf),
    count_var = "count_spc",
    refrates_df = us_refrates_icd2,
    calc_total_row = TRUE,
    calc_total_fu = TRUE,
    region_var = "registry.1",
    age_var = "fc_agegroup.1",
    sex_var = "sex.1",
    year_var = "t_yeardiag.1",
    race_var = "race.1",
    site_var = "t_site_icd.1", #using grouping by second cancer incidence
    futime_var = "p_futimeyrs",
    alpha = 0.05)
#> [INFO Cases 0 PYARs] There are conflicts where strata with 0 follow-up time have data in observed.
#> ℹ 30 strata are affected.
#>  - This might be caused by cases where SPC occured at the same day as first cancer.
#>  - You can check this by excluding all cases from wide_df, where date of first diagnosis is equal.
#> ! Check attribute `problems_not_empty` of results to see what strata are affected.
#>  [INFO Unexpected Cases] There are observed cases in the results file that do not occur in the refrates_df.
#> ℹ 2665 strata are affected.
#> A possible explanation can be:
#>  - DCO cases or
#>  - diagnosis of second cancer occured in different time period than first cancer
#> ! Check attribute `notes_refcases` of results to see what strata are affected.
#> 

sircalc_results %>% print(n = 100)
#> # A tidytable: 421,430 × 22
#>     age    region sex   race  year  yvar_name yvar_label fu_time t_site observed
#>     <chr>  <chr>  <chr> <chr> <chr> <chr>     <chr>      <chr>   <chr>     <dbl>
#>   1 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C14           0
#>   2 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C18           0
#>   3 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C34           0
#>   4 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C44           0
#>   5 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C50           0
#>   6 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C54           0
#>   7 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C64           0
#>   8 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C80           0
#>   9 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C14           0
#>  10 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C18           0
#>  11 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C34           0
#>  12 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C44           0
#>  13 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C50           0
#>  14 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C54           0
#>  15 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C64           0
#>  16 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C80           0
#>  17 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C14           0
#>  18 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C18           0
#>  19 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C34           0
#>  20 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C44           0
#>  21 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C50           0
#>  22 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C54           0
#>  23 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C64           0
#>  24 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C80           0
#>  25 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C14           0
#>  26 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C18           0
#>  27 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C34           0
#>  28 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C44           0
#>  29 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C50           0
#>  30 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C54           0
#>  31 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C64           0
#>  32 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C80           0
#>  33 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C14           0
#>  34 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C18           0
#>  35 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C34           0
#>  36 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C44           0
#>  37 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C50           0
#>  38 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C54           0
#>  39 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C64           0
#>  40 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C80           0
#>  41 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C14           0
#>  42 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C18           0
#>  43 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C34           1
#>  44 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C44           0
#>  45 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C50           0
#>  46 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C54           0
#>  47 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C64           0
#>  48 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C80           0
#>  49 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C14           0
#>  50 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C18           0
#>  51 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C34           1
#>  52 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C44           0
#>  53 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C50           0
#>  54 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C54           0
#>  55 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C64           0
#>  56 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C80           0
#>  57 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C14           0
#>  58 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C18           0
#>  59 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C34           0
#>  60 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C44           0
#>  61 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C50           0
#>  62 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C54           0
#>  63 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C64           0
#>  64 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C80           0
#>  65 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C14           0
#>  66 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C18           0
#>  67 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C34           0
#>  68 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C44           0
#>  69 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C50           0
#>  70 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C54           0
#>  71 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C64           0
#>  72 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C80           0
#>  73 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C14           0
#>  74 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C18           0
#>  75 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C34           0
#>  76 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C44           0
#>  77 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C50           0
#>  78 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C54           0
#>  79 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C64           0
#>  80 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C80           0
#>  81 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C14           0
#>  82 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C18           0
#>  83 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C34           0
#>  84 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C44           0
#>  85 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C50           0
#>  86 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C54           0
#>  87 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C64           0
#>  88 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C80           0
#>  89 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C14           0
#>  90 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C18           0
#>  91 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C34           0
#>  92 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C44           0
#>  93 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C50           0
#>  94 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C54           0
#>  95 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C64           0
#>  96 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C80           0
#>  97 00 - … SEER … Fema… Black 1990… race.1    Black      10+ ye… C14           0
#>  98 00 - … SEER … Fema… Black 1990… race.1    Black      10+ ye… C18           0
#>  99 00 - … SEER … Fema… Black 1990… race.1    Black      10+ ye… C34           1
#> 100 00 - … SEER … Fema… Black 1990… race.1    Black      10+ ye… C44           0
#> # ℹ 421,330 more rows
#> # ℹ 12 more variables: expected <dbl>, sir <dbl>, sir_lci <dbl>, sir_uci <dbl>,
#> #   pyar <dbl>, n_base <dbl>, ref_inc_cases <dbl>, ref_population_pyar <dbl>,
#> #   ref_inc_crude_rate <dbl>, fu_time_sort <int>, yvar_sort <int>,
#> #   warning <chr>
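The `fu_time` strata come from the `futime_breaks` argument. A common way to stratify by follow-up time is to split each patient's person-years across the intervals, so that a patient followed for 7.5 years contributes to every interval up to "5-10 years". The sketch below shows this splitting with the breaks used above. The helper `split_pyar()` is hypothetical, and it is an assumption that sir_byfutime() allocates person-years in the same way.

```r
futime_breaks <- c(0, 1/12, 2/12, 1, 5, 10, Inf)

# Hypothetical helper: person-years that one follow-up time contributes
# to each interval [lower, upper) defined by the breaks
split_pyar <- function(futime, breaks) {
  lower <- head(breaks, -1)
  upper <- tail(breaks, -1)
  pmax(pmin(futime, upper) - lower, 0)
}

pyar <- split_pyar(7.5, futime_breaks)
round(pyar, 3)
sum(pyar)   # the pieces add up to the full 7.5 years
```

Here the patient contributes one month to each of the first two intervals, ten months to the third, 4 years to "1-5 years", 2.5 years to "5-10 years" and nothing to "10+ years".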

Step 9 - Summarize SIR results

#The summarize function is versatile. Here, for example, is a summary with minimal output

sircalc_results %>%
  #summarize results across region, age, year, race and t_site
  summarize_sir_results(.,
                        summarize_groups = c("region", "age", "year", "race"),
                        summarize_site = TRUE,
                        output = "long",  output_information = "minimal",
                        add_total_row = "only",  add_total_fu = "no",
                        collapse_ci = FALSE,  shorten_total_cols = TRUE,
                        fubreak_var_name = "fu_time", ybreak_var_name = "yvar_name",
                        xbreak_var_name = "none", site_var_name = "t_site",
                        alpha = 0.05
                        ) %>%
  dplyr::select(-region, -age, -year, -race, -sex, -yvar_name)
#> Warning: The results file `sir_df` contains observed cases in i_observed that do not occur in the refrates_df (ref_inc_cases).
#> Therefore calculation of the variables n_base and ref_population_pyar is ambiguous.
#> We take the first value of each variable. Expect small inconsistencies in the calculation of n_base, ref_population_pyar and ref_inc_crude_rate across strata.
#> ! If you want to know more, please check the `warnings` column of `sir_df`.
#> # A tidytable: 7 × 8
#>   yvar_label fu_time          fu_time_sort t_site observed expected   sir sir_ci
#>   <chr>      <chr>                   <int> <chr>     <dbl>    <dbl> <dbl> <chr> 
#> 1 Overall    to 1 month                  1 Total       306     20.6 14.9  13.25…
#> 2 Overall    0.0833-0.167 ye…            2 Total        74     20.4  3.62 2.84 …
#> 3 Overall    0.167-1 years               3 Total       717    196.   3.65 3.39 …
#> 4 Overall    1-5 years                   4 Total      2995    760.   3.94 3.8 -…
#> 5 Overall    5-10 years                  5 Total      3113    605.   5.14 4.96 …
#> 6 Overall    10+ years                   6 Total      4254    502.   8.47 8.22 …
#> 7 Overall    Total 0 to Inf …            7 Total     11459   2105.   5.44 5.34 …
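As a quick plausibility check, the summary table can be recomputed by hand from its own columns. The values below are copied from the output above; because `expected` is printed in rounded form, the recomputed ratios agree with the `sir` column only approximately.

```r
# Values copied from the six follow-up rows of the summary table above
observed <- c(306, 74, 717, 2995, 3113, 4254)
expected <- c(20.6, 20.4, 196, 760, 605, 502)   # rounded as printed

# The total row is the sum over the follow-up intervals
sum(observed)   # 11459

# SIR per interval is observed divided by expected
sir_check <- observed / expected
round(sir_check, 2)
```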

Built with

sessionInfo()
#> R version 4.3.2 (2023-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 11 x64 (build 22631)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=C                          
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> time zone: Europe/Berlin
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] msSPChelpR_0.9.1 magrittr_2.0.3   dplyr_1.1.4     
#> 
#> loaded via a namespace (and not attached):
#>  [1] jsonlite_1.8.8     compiler_4.3.2     tidyselect_1.2.0   stringr_1.5.1     
#>  [5] tidytable_0.10.2   tidyr_1.3.0        jquerylib_0.1.4    yaml_2.3.8        
#>  [9] fastmap_1.1.1      R6_2.5.1           generics_0.1.3     sjlabelled_1.2.0  
#> [13] knitr_1.45         forcats_1.0.0      tibble_3.2.1       insight_0.19.7    
#> [17] lubridate_1.9.3    bslib_0.6.1        pillar_1.9.0       rlang_1.1.3       
#> [21] utf8_1.2.4         stringi_1.8.3      cachem_1.0.8       xfun_0.41         
#> [25] sass_0.4.8         timechange_0.2.0   cli_3.6.2          withr_3.0.0       
#> [29] digest_0.6.34      rstudioapi_0.15.0  haven_2.5.4        hms_1.1.3         
#> [33] lifecycle_1.0.4    vctrs_0.6.5        data.table_1.14.10 evaluate_0.23     
#> [37] glue_1.7.0         fansi_1.0.6        rmarkdown_2.25     purrr_1.0.2       
#> [41] tools_4.3.2        pkgconfig_2.0.3    htmltools_0.5.7
