This vignette explains how to use the functions:

* calc_futime() to calculate follow-up time from the index event until the next event, death or end of follow-up date
* pat_status() to determine patient status at end of follow-up
* renumber_time_id() to calculate a consecutive index of events per case ID
* reshape_long() to transpose a dataset in wide format to long format
* reshape_wide() to transpose a dataset in long format to wide format (the wide format is required for many package functions)
* sir_byfutime() to calculate standardized incidence ratios (SIRs) with custom grouping variables, stratified by follow-up time
* summarize_sir_results() to summarize detailed SIR results produced by sir_byfutime()
* vital_status() to determine whether the patient is alive or dead at end of follow-up

For some functions there are multiple variants of the same function using varying frameworks (in this vignette, for example, renumber_time_id() and renumber_time_id_dt(), or reshape_wide(), reshape_wide_dt() and reshape_wide_tidyr()). They give the same results but differ in execution time and memory use.
It is recommended to run the following steps in the given order to obtain accurate follow-up time calculations:

1. Filter all cases in the long version of the dataset that are relevant for your analysis. Make sure that:
   * for each case_id, the index event (e.g. First Cancer FC) is still included and is the remaining row with the smallest time_id (TUMID3 variable for ZfKD data, and SEQ_NUM for SEER data)
   * some case_ids might or might not get a countable incident event (e.g. Second Primary Cancer SPC). This event should be the second entry per case_id (second smallest time_id) if it is to be counted
   * a variable count_var indicates whether the countable incident event (SPC) has occurred or not. It is coded 0 for non-occurrence (or not counted event) and 1 for a counted incident event.
2. Renumber filtered long dataset: In the filtered long dataset, run the helper function msSPChelpR::renumber_time_id_dt() (or the non-data.table variant msSPChelpR::renumber_time_id()). It renumbers all events per case_id and (if step 1 is fulfilled) assigns time_var_new = 1 to each index event and time_var_new = 2 to each second (possibly countable incident) event. SIR-related functions only count the second event if, in addition to time_var_new = 2, count_var = 1 also holds for that row.
3. Reshape dataset: Run msSPChelpR::reshape_wide_dt() or the non-data.table variant msSPChelpR::reshape_wide(), so that the dataset is transposed to wide format (1 row per case_id, creating variables such as count_var.2).
4. Set flag for Second Primary Cancer diagnosis: After filtering and reshaping it is essential to set p_spc again. This variable will be used by later steps of the analysis.
5. Determine patient status at a defined end of follow-up by using the msSPChelpR::pat_status() function. The date for end of follow-up must:
   * be in "YYYY-MM-DD" format; it is always defined via the fu_end = parameter
   * precede the end of data collection. E.g. if the last incident events for the dataset you are using are collected at the end of 2014, your fu_end must be fu_end = "2014-12-15" or earlier.
6. Based on the newly calculated patient status, you might want to exclude cases for which patient status cannot be determined.
7. Calculate follow-up times using the msSPChelpR::calc_futime() function and the same fu_end as in step 5. By default, all functions of the msSPChelpR package require follow-up times as numeric years.

In order to calculate SIRs using the package functions, the following data structure is needed:

* Wide format data wide_df with one row per patient that has encountered the index event (i.e. diagnosed with a first primary cancer FC)
wide_df needs to contain the following variables (columns) per patient (row):

* region_var - variable in df that contains information on the region where the case was incident.
* agegroup_var - variable in df that contains information on age-group.
* sex_var - variable in df that contains information on biological sex.
* year_var - variable in df that contains information on the year or year-period when the case was incident.
* site_var - variable in df that contains information on case (count event) diagnosis. Cases are usually the second cancers. Diagnoses can use any coding system (e.g. ICD), but the coding system must be consistent between the dataset and the reference data.
* futime_var - variable in df that contains follow-up time per person (in years) between the date of first cancer and whichever comes first: death, date of event (case), or end of FU date. If you have not calculated the FU time yet, you can use the workflow described in the previous chapter.

If your data has the required structure, you can calculate and summarize SIRs with the following two steps:
1. Calculate SIRs using the msSPChelpR::sir_byfutime() function. For this calculation a reference dataset that defines the population standard rates is usually required. refrates_df must use the same category coding of age, sex, region, year and cancer_site as agegroup_var, sex_var, region_var, year_var and site_var.
2. Summarize the results by running the msSPChelpR::summarize_sir_results() function on the stratified SIR results produced by the previous step.

In the next version of this vignette, the theoretical considerations behind the SIR calculation will be explained in this chapter.
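Until then, the standard textbook definition can be sketched in a few lines of base R: the SIR is the number of observed events divided by the number expected, where the expected count is the sum over strata of person-years at risk multiplied by the reference rate. All numbers below are made up, rates are assumed to be per person-year, and the exact Poisson confidence interval shown is one common choice, not necessarily the method used inside sir_byfutime().

```r
# Illustrative values only (not taken from the package data)
observed <- 12                        # observed second cancers
pyar     <- c(1500, 2300, 900)        # person-years at risk per stratum
ref_rate <- c(0.002, 0.003, 0.001)    # assumed reference rates per person-year

# Expected number of events under the reference rates
expected <- sum(pyar * ref_rate)

# Standardized incidence ratio
sir <- observed / expected

# Exact Poisson confidence limits via chi-square quantiles (assumption)
alpha   <- 0.05
sir_lci <- qchisq(alpha / 2, df = 2 * observed) / (2 * expected)
sir_uci <- qchisq(1 - alpha / 2, df = 2 * (observed + 1)) / (2 * expected)

round(c(sir = sir, lower = sir_lci, upper = sir_uci), 2)
```

An SIR above 1 means more events were observed than the reference rates predict; the interval shows how much of that excess is compatible with chance.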
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(magrittr)
library(msSPChelpR)
#Load synthetic dataset of patients with cancer to demonstrate package functions
data("us_second_cancer")
#This dataset is in long format, so each tumor is a separate row in the data
us_second_cancer
#> # A tibble: 113,999 × 16
#>    fake_id SEQ_NUM registry   sex   race  datebirth  t_datediag t_site_icd t_dco
#>    <chr>     <int> <chr>      <chr> <chr> <date>     <date>     <chr>      <chr>
#>  1 100004        1 SEER Reg … Male  White 1926-01-01 1992-07-15 C50        hist…
#>  2 100004        2 SEER Reg … Male  White 1926-01-01 2004-01-15 C54        hist…
#>  3 100004        3 SEER Reg … Male  White 1926-01-01 2006-06-15 C34        hist…
#>  4 100004        4 SEER Reg … Male  White 1926-01-01 2018-06-15 C14        DCO …
#>  5 100034        1 SEER Reg … Male  White 1979-01-01 2000-06-15 C50        hist…
#>  6 100037        1 SEER Reg … Fema… White 1938-01-01 1996-01-15 C54        hist…
#>  7 100038        1 SEER Reg … Male  White 1989-01-01 1991-04-15 C50        hist…
#>  8 100038        2 SEER Reg … Male  White 1989-01-01 2000-03-15 C80        hist…
#>  9 100039        1 SEER Reg … Fema… White 1946-01-01 2003-08-15 C50        hist…
#> 10 100039        2 SEER Reg … Fema… White 1946-01-01 2011-04-15 C34        hist…
#> # ℹ 113,989 more rows
#> # ℹ 7 more variables: t_hist <int>, fc_age <int>, datedeath <date>,
#> #   p_alive <chr>, p_dodmin <date>, fc_agegroup <chr>, t_yeardiag <chr>
#filter for lung cancer
ids <- us_second_cancer %>%
  #detect ids with any lung cancer
  filter(t_site_icd == "C34") %>%
  #extract the ids as a plain character vector
  pull(fake_id)
filtered_usdata <- us_second_cancer %>%
  #filter according to above detected ids with any lung cancer diagnosis
  filter(fake_id %in% ids) %>%
  arrange(fake_id)
filtered_usdata
#> # A tibble: 62,661 × 16
#>    fake_id SEQ_NUM registry   sex   race  datebirth  t_datediag t_site_icd t_dco
#>    <chr>     <int> <chr>      <chr> <chr> <date>     <date>     <chr>      <chr>
#>  1 100004        1 SEER Reg … Male  White 1926-01-01 1992-07-15 C50        hist…
#>  2 100004        2 SEER Reg … Male  White 1926-01-01 2004-01-15 C54        hist…
#>  3 100004        3 SEER Reg … Male  White 1926-01-01 2006-06-15 C34        hist…
#>  4 100004        4 SEER Reg … Male  White 1926-01-01 2018-06-15 C14        DCO …
#>  5 100039        1 SEER Reg … Fema… White 1946-01-01 2003-08-15 C50        hist…
#>  6 100039        2 SEER Reg … Fema… White 1946-01-01 2011-04-15 C34        hist…
#>  7 100039        3 SEER Reg … Fema… White 1946-01-01 2018-01-15 C80        hist…
#>  8 100073        1 SEER Reg … Male  White 1960-01-01 1993-11-15 C44        hist…
#>  9 100073        2 SEER Reg … Male  White 1960-01-01 2003-12-15 C34        hist…
#> 10 100143        1 SEER Reg … Male  White 1944-01-01 1992-03-15 C50        hist…
#> # ℹ 62,651 more rows
#> # ℹ 7 more variables: t_hist <int>, fc_age <int>, datedeath <date>,
#> #   p_alive <chr>, p_dodmin <date>, fc_agegroup <chr>, t_yeardiag <chr>
#renumber time_id
renumbered_usdata <- filtered_usdata %>%
  renumber_time_id(new_time_id_var = "t_tumid", 
                   dattype = "seer",
                   case_id_var = "fake_id")
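What the renumbering achieves can be sketched on toy data with base R. This is only an illustration of the idea (a consecutive index per case ID, ordered by the original sequence number), not the implementation used by renumber_time_id(); the data frame below is invented.

```r
# Toy long-format data: after filtering, sequence numbers have gaps
toy_long <- data.frame(
  fake_id = c("A", "A", "B", "B", "B"),
  SEQ_NUM = c(1L, 3L, 2L, 5L, 6L)
)

# Sort by case ID and original sequence number
toy_long <- toy_long[order(toy_long$fake_id, toy_long$SEQ_NUM), ]

# Consecutive index of events within each case ID
toy_long$t_tumid <- ave(toy_long$SEQ_NUM, toy_long$fake_id, FUN = seq_along)

toy_long
```

After this step the first remaining event of every case has index 1 and the next one index 2, regardless of gaps in the original numbering.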
renumbered_usdata %>%
   select(fake_id, sex, t_site_icd, t_datediag, t_tumid)
#> # A tibble: 62,661 × 5
#>    fake_id sex    t_site_icd t_datediag t_tumid
#>    <chr>   <chr>  <chr>      <date>       <int>
#>  1 100004  Male   C50        1992-07-15       1
#>  2 100004  Male   C54        2004-01-15       2
#>  3 100004  Male   C34        2006-06-15       3
#>  4 100004  Male   C14        2018-06-15       4
#>  5 100039  Female C50        2003-08-15       1
#>  6 100039  Female C34        2011-04-15       2
#>  7 100039  Female C80        2018-01-15       3
#>  8 100073  Male   C44        1993-11-15       1
#>  9 100073  Male   C34        2003-12-15       2
#> 10 100143  Male   C50        1992-03-15       1
#> # ℹ 62,651 more rows
usdata_wide <- renumbered_usdata %>%
  reshape_wide_tidyr(case_id_var = "fake_id", time_id_var = "t_tumid", timevar_max = 10)
#now the data is in the wide format as required by many package functions. 
#This means each case is a row, and the tumors per case ID are 
#added as new columns to the data, using the time_id as column name suffix.
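The long-to-wide idea and the resulting column naming can be reproduced on a small invented data frame with base R's reshape(). This is a sketch of the concept only; it is not claimed to be how the package's reshape_wide variants work internally.

```r
# Toy long-format data: one row per tumor
toy_long <- data.frame(
  fake_id    = c("A", "A", "B"),
  t_tumid    = c(1L, 2L, 1L),
  t_site_icd = c("C50", "C34", "C34"),
  stringsAsFactors = FALSE
)

# One row per case; the time id becomes the column name suffix
toy_wide <- reshape(toy_long,
                    idvar     = "fake_id",
                    timevar   = "t_tumid",
                    direction = "wide")

names(toy_wide)
```

Cases with only one tumor get NA in the columns with suffix .2, which is what later steps use to tell whether a second cancer exists.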
usdata_wide
#> # A tibble: 31,997 × 136
#>    fake_id SEQ_NUM.1 registry.1            sex.1 race.1 datebirth.1 t_datediag.1
#>    <chr>       <int> <chr>                 <chr> <chr>  <date>      <date>      
#>  1 100004          1 SEER Reg 20 - Detroi… Male  White  1926-01-01  1992-07-15  
#>  2 100039          1 SEER Reg 02 - Connec… Fema… White  1946-01-01  2003-08-15  
#>  3 100073          1 SEER Reg 01 - San Fr… Male  White  1960-01-01  1993-11-15  
#>  4 100143          1 SEER Reg 02 - Connec… Male  White  1944-01-01  1992-03-15  
#>  5 100182          1 SEER Reg 02 - Connec… Male  Other  1927-01-01  1991-09-15  
#>  6 100197          1 SEER Reg 02 - Connec… Fema… White  1945-01-01  2012-06-15  
#>  7 100208          1 SEER Reg 02 - Connec… Male  White  1970-01-01  2019-11-15  
#>  8 100230          1 SEER Reg 01 - San Fr… Male  White  1947-01-01  1992-11-15  
#>  9 100234          1 SEER Reg 01 - San Fr… Male  White  1988-01-01  2010-02-15  
#> 10 100266          1 SEER Reg 01 - San Fr… Fema… White  1956-01-01  2010-07-15  
#> # ℹ 31,987 more rows
#> # ℹ 129 more variables: t_site_icd.1 <chr>, t_dco.1 <chr>, t_hist.1 <int>,
#> #   fc_age.1 <int>, datedeath.1 <date>, p_alive.1 <chr>, p_dodmin.1 <date>,
#> #   fc_agegroup.1 <chr>, t_yeardiag.1 <chr>, SEQ_NUM.2 <int>, registry.2 <chr>,
#> #   sex.2 <chr>, race.2 <chr>, datebirth.2 <date>, t_datediag.2 <date>,
#> #   t_site_icd.2 <chr>, t_dco.2 <chr>, t_hist.2 <int>, fc_age.2 <int>,
#> #   datedeath.2 <date>, p_alive.2 <chr>, p_dodmin.2 <date>, …
#set p_spc flag
usdata_wide <- usdata_wide %>%
  dplyr::mutate(p_spc = dplyr::case_when(is.na(t_site_icd.2)   ~ "No SPC",
                         !is.na(t_site_icd.2)           ~ "SPC developed",
                         TRUE ~ NA_character_)) %>%
  #create the same information as numeric variable count_spc
  #(1 = SPC developed, 0 = no SPC)
  dplyr::mutate(count_spc = dplyr::case_when(is.na(t_site_icd.2)   ~ 0,
                            TRUE ~ 1))
usdata_wide %>%
   dplyr::select(fake_id, sex.1, p_spc, count_spc, t_site_icd.1, 
                 t_datediag.1, t_site_icd.2, t_datediag.2)
#> # A tibble: 31,997 × 8
#>    fake_id sex.1  p_spc         count_spc t_site_icd.1 t_datediag.1 t_site_icd.2
#>    <chr>   <chr>  <chr>             <dbl> <chr>        <date>       <chr>       
#>  1 100004  Male   SPC developed         1 C50          1992-07-15   C54         
#>  2 100039  Female SPC developed         1 C50          2003-08-15   C34         
#>  3 100073  Male   SPC developed         1 C44          1993-11-15   C34         
#>  4 100143  Male   SPC developed         1 C50          1992-03-15   C34         
#>  5 100182  Male   SPC developed         1 C18          1991-09-15   C34         
#>  6 100197  Female SPC developed         1 C34          2012-06-15   C50         
#>  7 100208  Male   No SPC                0 C34          2019-11-15   <NA>        
#>  8 100230  Male   SPC developed         1 C44          1992-11-15   C34         
#>  9 100234  Male   No SPC                0 C34          2010-02-15   <NA>        
#> 10 100266  Female No SPC                0 C34          2010-07-15   <NA>        
#> # ℹ 31,987 more rows
#> # ℹ 1 more variable: t_datediag.2 <date>
usdata_wide <- usdata_wide %>%
  pat_status(., fu_end = "2017-12-31", dattype = "seer",
             status_var = "p_status", life_var = "p_alive.1",
             spc_var = "p_spc", birthdat_var = "datebirth.1",
             lifedat_var = "datedeath.1", fcdat_var = "t_datediag.1",
             spcdat_var = "t_datediag.2", life_stat_alive = "Alive",
             life_stat_dead = "Dead", spc_stat_yes = "SPC developed",
             spc_stat_no = "No SPC", lifedat_fu_end = "2019-12-31",
             use_lifedatmin = FALSE, check = TRUE, 
             as_labelled_factor = TRUE)
#> # A tibble: 10 × 3
#>    p_alive.1 p_status                                                          n
#>    <chr>     <fct>                                                         <int>
#>  1 Alive     Patient alive after FC (with or without following SPC after …  5986
#>  2 Alive     Patient alive after SPC                                       11421
#>  3 Alive     NA - Patient not born before end of FU                            4
#>  4 Alive     NA - Patient did not develop cancer before end of FU            873
#>  5 Dead      Patient alive after FC (with or without following SPC after …   909
#>  6 Dead      Patient alive after SPC                                        1294
#>  7 Dead      Patient dead after FC                                          6116
#>  8 Dead      Patient dead after SPC                                         5286
#>  9 Dead      NA - Patient did not develop cancer before end of FU             44
#> 10 Dead      NA - Patient date of death is missing                            64
#> # A tibble: 7 × 2
#>   p_status                                                                   n
#>   <fct>                                                                  <int>
#> 1 Patient alive after FC (with or without following SPC after end of FU)  6895
#> 2 Patient alive after SPC                                                12715
#> 3 Patient dead after FC                                                   6116
#> 4 Patient dead after SPC                                                  5286
#> 5 NA - Patient not born before end of FU                                     4
#> 6 NA - Patient did not develop cancer before end of FU                     917
#> 7 NA - Patient date of death is missing                                     64
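The two requirements on fu_end (a "YYYY-MM-DD" string that precedes the end of data collection) can be checked before calling pat_status(). The helper below is hypothetical and not part of msSPChelpR; treating "2019-12-31" as the end of data collection is an assumption borrowed from the lifedat_fu_end value used above.

```r
# Hypothetical helper: TRUE if fu_end is a valid "YYYY-MM-DD" date
# that lies before the end of data collection
check_fu_end <- function(fu_end, data_end) {
  has_format <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", fu_end)
  parsed     <- as.Date(fu_end, format = "%Y-%m-%d")
  has_format && !is.na(parsed) && parsed < data_end
}

data_end <- as.Date("2019-12-31")   # assumed end of data collection

check_fu_end("2017-12-31", data_end)   # well-formed and early enough
check_fu_end("31.12.2017", data_end)   # wrong format
check_fu_end("2020-06-30", data_end)   # after end of data collection
```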
usdata_wide %>%
   dplyr::select(fake_id, p_status, p_alive.1, datedeath.1, t_site_icd.1, t_datediag.1, 
                 t_site_icd.2, t_datediag.2)
#> # A tibble: 31,997 × 8
#>    fake_id p_status p_alive.1 datedeath.1 t_site_icd.1 t_datediag.1 t_site_icd.2
#>    <chr>   <fct>    <chr>     <date>      <chr>        <date>       <chr>       
#>  1 100004  Patient… Alive     NA          C50          1992-07-15   C54         
#>  2 100039  Patient… Alive     NA          C50          2003-08-15   C34         
#>  3 100073  Patient… Dead      2012-06-01  C44          1993-11-15   C34         
#>  4 100143  Patient… Alive     NA          C50          1992-03-15   C34         
#>  5 100182  Patient… Alive     NA          C18          1991-09-15   C34         
#>  6 100197  Patient… Alive     NA          C34          2012-06-15   C50         
#>  7 100208  NA - Pa… Dead      2019-11-15  C34          2019-11-15   <NA>        
#>  8 100230  Patient… Alive     NA          C44          1992-11-15   C34         
#>  9 100234  Patient… Alive     NA          C34          2010-02-15   <NA>        
#> 10 100266  Patient… Dead      2010-07-15  C34          2010-07-15   <NA>        
#> # ℹ 31,987 more rows
#> # ℹ 1 more variable: t_datediag.2 <date>
#alternatively, you can impute the date of death using lifedatmin_var
usdata_wide %>%
  pat_status(., fu_end = "2017-12-31", dattype = "seer",
             status_var = "p_status", life_var = "p_alive.1",
             spc_var = "p_spc", birthdat_var = "datebirth.1",
             lifedat_var = "datedeath.1", fcdat_var = "t_datediag.1",
             spcdat_var = "t_datediag.2", life_stat_alive = "Alive",
             life_stat_dead = "Dead", spc_stat_yes = "SPC developed",
             spc_stat_no = "No SPC", lifedat_fu_end = "2019-12-31",
             use_lifedatmin = TRUE, lifedatmin_var = "p_dodmin.1", 
             check = TRUE, as_labelled_factor = TRUE)
#> # A tibble: 9 × 3
#>   p_alive.1 p_status                                                           n
#>   <chr>     <fct>                                                          <int>
#> 1 Alive     Patient alive after FC (with or without following SPC after e…  5986
#> 2 Alive     Patient alive after SPC                                        11421
#> 3 Alive     NA - Patient not born before end of FU                             4
#> 4 Alive     NA - Patient did not develop cancer before end of FU             873
#> 5 Dead      Patient alive after FC (with or without following SPC after e…   913
#> 6 Dead      Patient alive after SPC                                         1295
#> 7 Dead      Patient dead after FC                                           6138
#> 8 Dead      Patient dead after SPC                                          5323
#> 9 Dead      NA - Patient did not develop cancer before end of FU              44
#> # A tibble: 6 × 2
#>   p_status                                                                   n
#>   <fct>                                                                  <int>
#> 1 Patient alive after FC (with or without following SPC after end of FU)  6899
#> 2 Patient alive after SPC                                                12716
#> 3 Patient dead after FC                                                   6138
#> 4 Patient dead after SPC                                                  5323
#> 5 NA - Patient not born before end of FU                                     4
#> 6 NA - Patient did not develop cancer before end of FU                     917
#> # A tibble: 31,997 × 139
#>    fake_id SEQ_NUM.1 registry.1            sex.1 race.1 datebirth.1 t_datediag.1
#>    <chr>       <int> <chr>                 <chr> <chr>  <date>      <date>      
#>  1 100004          1 SEER Reg 20 - Detroi… Male  White  1926-01-01  1992-07-15  
#>  2 100039          1 SEER Reg 02 - Connec… Fema… White  1946-01-01  2003-08-15  
#>  3 100073          1 SEER Reg 01 - San Fr… Male  White  1960-01-01  1993-11-15  
#>  4 100143          1 SEER Reg 02 - Connec… Male  White  1944-01-01  1992-03-15  
#>  5 100182          1 SEER Reg 02 - Connec… Male  Other  1927-01-01  1991-09-15  
#>  6 100197          1 SEER Reg 02 - Connec… Fema… White  1945-01-01  2012-06-15  
#>  7 100208          1 SEER Reg 02 - Connec… Male  White  1970-01-01  2019-11-15  
#>  8 100230          1 SEER Reg 01 - San Fr… Male  White  1947-01-01  1992-11-15  
#>  9 100234          1 SEER Reg 01 - San Fr… Male  White  1988-01-01  2010-02-15  
#> 10 100266          1 SEER Reg 01 - San Fr… Fema… White  1956-01-01  2010-07-15  
#> # ℹ 31,987 more rows
#> # ℹ 132 more variables: t_site_icd.1 <chr>, t_dco.1 <chr>, t_hist.1 <int>,
#> #   fc_age.1 <int>, datedeath.1 <date>, p_alive.1 <chr>, p_dodmin.1 <date>,
#> #   fc_agegroup.1 <chr>, t_yeardiag.1 <chr>, SEQ_NUM.2 <int>, registry.2 <chr>,
#> #   sex.2 <chr>, race.2 <chr>, datebirth.2 <date>, t_datediag.2 <date>,
#> #   t_site_icd.2 <chr>, t_dco.2 <chr>, t_hist.2 <int>, fc_age.2 <int>,
#> #   datedeath.2 <date>, p_alive.2 <chr>, p_dodmin.2 <date>, …
usdata_wide <- usdata_wide %>%
  dplyr::filter(!p_status %in% c("NA - Patient not born before end of FU",
                                 "NA - Patient did not develop cancer before end of FU",
                                 "NA - Patient date of death is missing"))
usdata_wide %>%
  dplyr::count(p_status)
#> # A tibble: 4 × 2
#>   p_status                                                                   n
#>   <fct>                                                                  <int>
#> 1 Patient alive after FC (with or without following SPC after end of FU)  6895
#> 2 Patient alive after SPC                                                12715
#> 3 Patient dead after FC                                                   6116
#> 4 Patient dead after SPC                                                  5286
usdata_wide <- usdata_wide %>%
   calc_futime(., futime_var_new = "p_futimeyrs", fu_end = "2017-12-31",
               dattype = "seer", time_unit = "years", 
               lifedat_var = "datedeath.1", 
               fcdat_var = "t_datediag.1", spcdat_var = "t_datediag.2")
#> # A tibble: 4 × 5
#>   p_status                       mean_futime min_futime max_futime median_futime
#>   <fct>                                <dbl>      <dbl>      <dbl>         <dbl>
#> 1 Patient alive after FC (with …        9.56     0.0438       27.0          8.29
#> 2 Patient alive after SPC               8.70     0            26.9          7.50
#> 3 Patient dead after FC                 8.60     0            25.9          7.54
#> 4 Patient dead after SPC                6.29     0            25.3          5.17
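The underlying idea of the follow-up time, i.e. the time from first cancer to whichever comes first of second cancer, death or end of follow-up, can be sketched in base R. The toy data are invented, and dividing days by 365.25 is an assumption made here for illustration, not a statement about how calc_futime() converts to years.

```r
# Toy wide-format data; column names mirror the vignette, values are made up
toy <- data.frame(
  fake_id      = c("A", "B", "C"),
  t_datediag.1 = as.Date(c("2000-01-01", "2005-06-15", "2010-03-01")),
  t_datediag.2 = as.Date(c("2004-01-01", NA, NA)),
  datedeath.1  = as.Date(c(NA, "2008-06-15", NA))
)
fu_end <- as.Date("2017-12-31")

# End of observation = earliest of SPC diagnosis, death and end of follow-up
end_date <- pmin(toy$t_datediag.2, toy$datedeath.1, fu_end, na.rm = TRUE)

# Follow-up time in years (assumption: 365.25 days per year)
days <- as.numeric(difftime(end_date, toy$t_datediag.1, units = "days"))
toy$futime_yrs <- days / 365.25

toy[, c("fake_id", "futime_yrs")]
```

Case A stops at the second cancer, case B at death, and case C runs until fu_end.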
usdata_wide %>%
   dplyr::select(fake_id, p_status, p_futimeyrs, p_alive.1, datedeath.1, t_datediag.1, t_datediag.2)
#> # A tibble: 31,012 × 7
#>    fake_id p_status  p_futimeyrs p_alive.1 datedeath.1 t_datediag.1 t_datediag.2
#>    <chr>   <fct>           <dbl> <chr>     <date>      <date>       <date>      
#>  1 100004  Patient …       11.5  Alive     NA          1992-07-15   2004-01-15  
#>  2 100039  Patient …        7.67 Alive     NA          2003-08-15   2011-04-15  
#>  3 100073  Patient …       10.1  Dead      2012-06-01  1993-11-15   2003-12-15  
#>  4 100143  Patient …        3.33 Alive     NA          1992-03-15   1995-07-15  
#>  5 100182  Patient …        7.08 Alive     NA          1991-09-15   1998-10-15  
#>  6 100197  Patient …        4.83 Alive     NA          2012-06-15   2017-04-15  
#>  7 100230  Patient …       11.0  Alive     NA          1992-11-15   2003-11-15  
#>  8 100234  Patient …        7.87 Alive     NA          2010-02-15   NA          
#>  9 100266  Patient …        0    Dead      2010-07-15  2010-07-15   NA          
#> 10 100274  Patient …        7.38 Dead      2011-06-01  2004-01-15   NA          
#> # ℹ 31,002 more rows
sircalc_results <- usdata_wide %>%
  sir_byfutime(
    dattype = "seer",
    ybreak_vars = c("race.1", "t_dco.1"),
    xbreak_var = "none",
    futime_breaks = c(0, 1/12, 2/12, 1, 5, 10, Inf),
    count_var = "count_spc",
    refrates_df = us_refrates_icd2,
    calc_total_row = TRUE,
    calc_total_fu = TRUE,
    region_var = "registry.1",
    age_var = "fc_agegroup.1",
    sex_var = "sex.1",
    year_var = "t_yeardiag.1",
    race_var = "race.1",
    site_var = "t_site_icd.1", #using grouping by second cancer incidence
    futime_var = "p_futimeyrs",
    alpha = 0.05)
#> 
#> [INFO Cases 0 PYARs] There are conflicts where strata with 0 follow-up time have data in observed.
#> ℹ 30 strata are affected.
#>  - This might be caused by cases where SPC occured at the same day as first cancer.
#>  - You can check this by excluding all cases from wide_df, where date of first diagnosis is equal.
#> ! Check attribute `problems_not_empty` of results to see what strata are affected.
#>  [INFO Unexpected Cases] There are observed cases in the results file that do not occur in the refrates_df.
#> ℹ 2665 strata are affected.
#> A possible explanation can be:
#>  - DCO cases or
#>  - diagnosis of second cancer occured in different time period than first cancer
#> ! Check attribute `notes_refcases` of results to see what strata are affected.
#> 
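The messages above point to attributes attached to the results object. In base R, attributes are read with attr(), for example attr(sircalc_results, "problems_not_empty"). The sketch below uses an invented stand-in object, because the actual structure of these attributes is not shown in this vignette.

```r
# Stand-in for a results object; real results come from sir_byfutime()
toy_results <- data.frame(t_site = c("C34", "C50"), observed = c(1, 0))

# Attach an invented attribute with the name mentioned in the message
attr(toy_results, "problems_not_empty") <-
  data.frame(t_site = "C34", pyar = 0, observed = 1)

# Read the attribute back
attr(toy_results, "problems_not_empty")

# attr() returns NULL when an attribute is not set
is.null(attr(toy_results, "notes_refcases"))
```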
sircalc_results %>% print(n = 100)
#> # A tidytable: 421,430 × 22
#>     age    region sex   race  year  yvar_name yvar_label fu_time t_site observed
#>     <chr>  <chr>  <chr> <chr> <chr> <chr>     <chr>      <chr>   <chr>     <dbl>
#>   1 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C14           0
#>   2 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C18           0
#>   3 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C34           0
#>   4 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C44           0
#>   5 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C50           0
#>   6 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C54           0
#>   7 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C64           0
#>   8 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C80           0
#>   9 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C14           0
#>  10 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C18           0
#>  11 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C34           0
#>  12 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C44           0
#>  13 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C50           0
#>  14 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C54           0
#>  15 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C64           0
#>  16 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C80           0
#>  17 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C14           0
#>  18 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C18           0
#>  19 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C34           0
#>  20 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C44           0
#>  21 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C50           0
#>  22 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C54           0
#>  23 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C64           0
#>  24 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C80           0
#>  25 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C14           0
#>  26 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C18           0
#>  27 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C34           0
#>  28 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C44           0
#>  29 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C50           0
#>  30 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C54           0
#>  31 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C64           0
#>  32 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C80           0
#>  33 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C14           0
#>  34 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C18           0
#>  35 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C34           0
#>  36 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C44           0
#>  37 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C50           0
#>  38 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C54           0
#>  39 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C64           0
#>  40 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C80           0
#>  41 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C14           0
#>  42 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C18           0
#>  43 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C34           1
#>  44 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C44           0
#>  45 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C50           0
#>  46 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C54           0
#>  47 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C64           0
#>  48 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C80           0
#>  49 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C14           0
#>  50 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C18           0
#>  51 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C34           1
#>  52 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C44           0
#>  53 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C50           0
#>  54 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C54           0
#>  55 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C64           0
#>  56 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C80           0
#>  57 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C14           0
#>  58 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C18           0
#>  59 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C34           0
#>  60 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C44           0
#>  61 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C50           0
#>  62 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C54           0
#>  63 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C64           0
#>  64 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C80           0
#>  65 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C14           0
#>  66 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C18           0
#>  67 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C34           0
#>  68 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C44           0
#>  69 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C50           0
#>  70 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C54           0
#>  71 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C64           0
#>  72 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C80           0
#>  73 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C14           0
#>  74 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C18           0
#>  75 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C34           0
#>  76 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C44           0
#>  77 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C50           0
#>  78 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C54           0
#>  79 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C64           0
#>  80 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C80           0
#>  81 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C14           0
#>  82 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C18           0
#>  83 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C34           0
#>  84 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C44           0
#>  85 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C50           0
#>  86 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C54           0
#>  87 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C64           0
#>  88 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C80           0
#>  89 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C14           0
#>  90 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C18           0
#>  91 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C34           0
#>  92 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C44           0
#>  93 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C50           0
#>  94 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C54           0
#>  95 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C64           0
#>  96 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C80           0
#>  97 00 - … SEER … Fema… Black 1990… race.1    Black      10+ ye… C14           0
#>  98 00 - … SEER … Fema… Black 1990… race.1    Black      10+ ye… C18           0
#>  99 00 - … SEER … Fema… Black 1990… race.1    Black      10+ ye… C34           1
#> 100 00 - … SEER … Fema… Black 1990… race.1    Black      10+ ye… C44           0
#> # ℹ 421,330 more rows
#> # ℹ 12 more variables: expected <dbl>, sir <dbl>, sir_lci <dbl>, sir_uci <dbl>,
#> #   pyar <dbl>, n_base <dbl>, ref_inc_cases <dbl>, ref_population_pyar <dbl>,
#> #   ref_inc_crude_rate <dbl>, fu_time_sort <int>, yvar_sort <int>,
#> #   warning <chr>

#The summarize function is versatile. Here for example the summary with minimal output
sircalc_results %>%
  #summarize results across region, age, year, race and t_site
  summarize_sir_results(.,
                        summarize_groups = c("region", "age", "year", "race"),
                        summarize_site = TRUE,
                        output = "long",  output_information = "minimal",
                        add_total_row = "only",  add_total_fu = "no",
                        collapse_ci = FALSE,  shorten_total_cols = TRUE,
                        fubreak_var_name = "fu_time", ybreak_var_name = "yvar_name",
                        xbreak_var_name = "none", site_var_name = "t_site",
                        alpha = 0.05
                        ) %>%
  dplyr::select(-region, -age, -year, -race, -sex, -yvar_name)
#> Warning: The results file `sir_df` contains observed cases in i_observed that do not occur in the refrates_df (ref_inc_cases).
#> Therefore calculation of the variables n_base and ref_population_pyar is ambiguous.
#> We take the first value of each variable. Expect small inconsistencies in the calculation of n_base, ref_population_pyar and ref_inc_crude_rate across strata.
#> ! If you want to know more, please check the `warnings` column of `sir_df`.
#> # A tidytable: 7 × 8
#>   yvar_label fu_time          fu_time_sort t_site observed expected   sir sir_ci
#>   <chr>      <chr>                   <int> <chr>     <dbl>    <dbl> <dbl> <chr> 
#> 1 Overall    to 1 month                  1 Total       306     20.6 14.9  13.25…
#> 2 Overall    0.0833-0.167 ye…            2 Total        74     20.4  3.62 2.84 …
#> 3 Overall    0.167-1 years               3 Total       717    196.   3.65 3.39 …
#> 4 Overall    1-5 years                   4 Total      2995    760.   3.94 3.8 -…
#> 5 Overall    5-10 years                  5 Total      3113    605.   5.14 4.96 …
#> 6 Overall    10+ years                   6 Total      4254    502.   8.47 8.22 …
#> 7 Overall    Total 0 to Inf …            7 Total     11459   2105.   5.44 5.34 …

sessionInfo()
#> R version 4.3.2 (2023-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 11 x64 (build 22631)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=C                          
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> time zone: Europe/Berlin
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] msSPChelpR_0.9.1 magrittr_2.0.3   dplyr_1.1.4     
#> 
#> loaded via a namespace (and not attached):
#>  [1] jsonlite_1.8.8     compiler_4.3.2     tidyselect_1.2.0   stringr_1.5.1     
#>  [5] tidytable_0.10.2   tidyr_1.3.0        jquerylib_0.1.4    yaml_2.3.8        
#>  [9] fastmap_1.1.1      R6_2.5.1           generics_0.1.3     sjlabelled_1.2.0  
#> [13] knitr_1.45         forcats_1.0.0      tibble_3.2.1       insight_0.19.7    
#> [17] lubridate_1.9.3    bslib_0.6.1        pillar_1.9.0       rlang_1.1.3       
#> [21] utf8_1.2.4         stringi_1.8.3      cachem_1.0.8       xfun_0.41         
#> [25] sass_0.4.8         timechange_0.2.0   cli_3.6.2          withr_3.0.0       
#> [29] digest_0.6.34      rstudioapi_0.15.0  haven_2.5.4        hms_1.1.3         
#> [33] lifecycle_1.0.4    vctrs_0.6.5        data.table_1.14.10 evaluate_0.23     
#> [37] glue_1.7.0         fansi_1.0.6        rmarkdown_2.25     purrr_1.0.2       
#> [41] tools_4.3.2        pkgconfig_2.0.3    htmltools_0.5.7
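As a plausibility check on the minimal summary printed by `summarize_sir_results()`, the SIR column can be recomputed by hand as observed divided by expected. The sketch below copies the rounded `observed` and `expected` values from that printed table, so small deviations from the printed `sir` values are expected. The data frame `sir_tab` and the `*_check` columns are hypothetical names used only for illustration. The confidence limits use the exact Poisson (chi-square based) interval, which is a common choice for SIRs; that the package uses this exact method is an assumption here, not something stated in the output.

```r
# Rounded values copied from the printed 7-row summary (rows identified by fu_time_sort).
# `sir_tab` is a hypothetical helper object, not produced by the package.
sir_tab <- data.frame(
  fu_time_sort = 1:7,
  observed     = c(306, 74, 717, 2995, 3113, 4254, 11459),
  expected     = c(20.6, 20.4, 196, 760, 605, 502, 2105)
)

# SIR = observed / expected
sir_tab$sir_check <- sir_tab$observed / sir_tab$expected

# Exact Poisson confidence limits (assumption: may differ from the package's method)
alpha <- 0.05
sir_tab$lci_check <- stats::qchisq(alpha / 2, 2 * sir_tab$observed) / 2 /
  sir_tab$expected
sir_tab$uci_check <- stats::qchisq(1 - alpha / 2, 2 * (sir_tab$observed + 1)) / 2 /
  sir_tab$expected

# The total row (fu_time_sort == 7) should hold the sum of the six follow-up strata
strata_sum <- sum(sir_tab$observed[sir_tab$fu_time_sort < 7])

print(round(sir_tab, 2))
print(strata_sum)
```

Because the inputs are rounded, the recomputed values should only be compared with the printed `sir` column to about one decimal place.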