Introduction to PHEindicatormethods

Georgina Anderson

2023-03-03

Introduction

This vignette introduces the functions listed below from the PHEindicatormethods package and provides basic sample code to demonstrate their use. The sample code is based on the ‘examples’ section of each function's documentation. The vignette does not explain the underlying methods in detail; the method applied can optionally be output alongside the statistics, and a fuller explanation is available in the references section of each function's documentation.

The following packages must be installed, if not already available, and loaded:

library(PHEindicatormethods)
library(dplyr)

Package functions

This vignette covers the following functions available within the first release of the package (v1.0.8) but has been updated to apply to these functions in their latest release versions. If further functions are added to the package in future releases these will be explained elsewhere.

Function           Type                     Description
phe_proportion     Non-aggregate            Performs a calculation on each row of data (unless data is grouped)
phe_rate           Non-aggregate            Performs a calculation on each row of data (unless data is grouped)
phe_mean           Aggregate                Performs a calculation on each grouping set
phe_dsr            Aggregate, standardised  Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRatio  Aggregate, standardised  Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRate   Aggregate, standardised  Performs a calculation on each grouping set and requires additional reference inputs

Non-aggregate functions

Create some test data for the non-aggregate functions

The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:

df <- data.frame(
        area = rep(c("Area1","Area2","Area3","Area4"), 2),
        year = rep(2015:2016, each = 4),
        obs = sample(100, 2 * 4, replace = TRUE),
        pop = sample(100:200, 2 * 4, replace = TRUE))
df
#>    area year obs pop
#> 1 Area1 2015  10 135
#> 2 Area2 2015  38 118
#> 3 Area3 2015 100 118
#> 4 Area4 2015  52 135
#> 5 Area1 2016  78 109
#> 6 Area2 2016  17 192
#> 7 Area3 2016  76 124
#> 8 Area4 2016  71 147
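
Because sample() draws at random, the obs and pop values differ on every run, so the figures printed in this vignette will not be reproduced exactly. A minimal sketch of one way to make the test data repeatable, assuming a fixed seed is acceptable (the seed value 42 and the name df_fixed are arbitrary choices for illustration and will not reproduce the values printed above):

```r
# Fix the random number generator so the test data is identical on every run.
# The seed value (42) is an arbitrary assumption of this sketch.
set.seed(42)
df_fixed <- data.frame(
        area = rep(c("Area1","Area2","Area3","Area4"), 2),
        year = rep(2015:2016, each = 4),
        obs = sample(100, 2 * 4, replace = TRUE),
        pop = sample(100:200, 2 * 4, replace = TRUE))

# Re-seeding and repeating the first draw gives the same numerators
set.seed(42)
obs_again <- sample(100, 2 * 4, replace = TRUE)
identical(df_fixed$obs, obs_again)
```

The data frame is stored under a separate name so that the df object used in the rest of the vignette is left unchanged.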

Execute phe_proportion and phe_rate

INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.

OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified:

# default proportion
phe_proportion(df, obs, pop)
#>    area year obs pop      value    lowercl   uppercl confidence       statistic
#> 1 Area1 2015  10 135 0.07407407 0.04073054 0.1309866        95% proportion of 1
#> 2 Area2 2015  38 118 0.32203390 0.24448837 0.4108014        95% proportion of 1
#> 3 Area3 2015 100 118 0.84745763 0.77172806 0.9012777        95% proportion of 1
#> 4 Area4 2015  52 135 0.38518519 0.30735355 0.4693702        95% proportion of 1
#> 5 Area1 2016  78 109 0.71559633 0.62469702 0.7918166        95% proportion of 1
#> 6 Area2 2016  17 192 0.08854167 0.05601543 0.1372095        95% proportion of 1
#> 7 Area3 2016  76 124 0.61290323 0.52500839 0.6940129        95% proportion of 1
#> 8 Area4 2016  71 147 0.48299320 0.40367960 0.5631730        95% proportion of 1
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence=99.8)
#>    area year obs pop      value    lowercl   uppercl confidence       statistic
#> 1 Area1 2015  10 135 0.07407407 0.02925417 0.1751708      99.8% proportion of 1
#> 2 Area2 2015  38 118 0.32203390 0.20681397 0.4639022      99.8% proportion of 1
#> 3 Area3 2015 100 118 0.84745763 0.71968271 0.9232048      99.8% proportion of 1
#> 4 Area4 2015  52 135 0.38518519 0.26746001 0.5180806      99.8% proportion of 1
#> 5 Area1 2016  78 109 0.71559633 0.56901776 0.8274410      99.8% proportion of 1
#> 6 Area2 2016  17 192 0.08854167 0.04320031 0.1728733      99.8% proportion of 1
#> 7 Area3 2016  76 124 0.61290323 0.47433068 0.7353294      99.8% proportion of 1
#> 8 Area4 2016  71 147 0.48299320 0.36060673 0.6074545      99.8% proportion of 1
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify to output proportions as percentages
phe_proportion(df, obs, pop, multiplier=100)
#>    area year obs pop     value   lowercl  uppercl confidence  statistic method
#> 1 Area1 2015  10 135  7.407407  4.073054 13.09866        95% percentage Wilson
#> 2 Area2 2015  38 118 32.203390 24.448837 41.08014        95% percentage Wilson
#> 3 Area3 2015 100 118 84.745763 77.172806 90.12777        95% percentage Wilson
#> 4 Area4 2015  52 135 38.518519 30.735355 46.93702        95% percentage Wilson
#> 5 Area1 2016  78 109 71.559633 62.469702 79.18166        95% percentage Wilson
#> 6 Area2 2016  17 192  8.854167  5.601543 13.72095        95% percentage Wilson
#> 7 Area3 2016  76 124 61.290323 52.500839 69.40129        95% percentage Wilson
#> 8 Area4 2016  71 147 48.299320 40.367960 56.31730        95% percentage Wilson

# specify confidence level and output proportions as percentages
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop     value   lowercl  uppercl confidence  statistic method
#> 1 Area1 2015  10 135  7.407407  2.925417 17.51708      99.8% percentage Wilson
#> 2 Area2 2015  38 118 32.203390 20.681397 46.39022      99.8% percentage Wilson
#> 3 Area3 2015 100 118 84.745763 71.968271 92.32048      99.8% percentage Wilson
#> 4 Area4 2015  52 135 38.518519 26.746001 51.80806      99.8% percentage Wilson
#> 5 Area1 2016  78 109 71.559633 56.901776 82.74410      99.8% percentage Wilson
#> 6 Area2 2016  17 192  8.854167  4.320031 17.28733      99.8% percentage Wilson
#> 7 Area3 2016  76 124 61.290323 47.433068 73.53294      99.8% percentage Wilson
#> 8 Area4 2016  71 147 48.299320 36.060673 60.74545      99.8% percentage Wilson

# specify confidence level and multiplier, and remove metadata columns from output
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100, type="standard")
#>    area year obs pop     value   lowercl  uppercl
#> 1 Area1 2015  10 135  7.407407  2.925417 17.51708
#> 2 Area2 2015  38 118 32.203390 20.681397 46.39022
#> 3 Area3 2015 100 118 84.745763 71.968271 92.32048
#> 4 Area4 2015  52 135 38.518519 26.746001 51.80806
#> 5 Area1 2016  78 109 71.559633 56.901776 82.74410
#> 6 Area2 2016  17 192  8.854167  4.320031 17.28733
#> 7 Area3 2016  76 124 61.290323 47.433068 73.53294
#> 8 Area4 2016  71 147 48.299320 36.060673 60.74545
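
The method column in the proportion output above reports Wilson. As a rough illustration of what that interval involves, the following base R sketch computes a Wilson score interval directly. The helper wilson_ci is hypothetical and written for this illustration only; it is not a function from the package and does not claim to mirror the package's internal code.

```r
# Wilson score interval for a proportion, written in base R.
# wilson_ci is a hypothetical helper for illustration, not a package function.
wilson_ci <- function(x, n, confidence = 0.95) {
  z <- qnorm(1 - (1 - confidence) / 2)       # two-sided normal quantile
  p <- x / n                                 # observed proportion
  centre <- 2 * x + z^2
  spread <- z * sqrt(z^2 + 4 * x * (1 - p))
  denom  <- 2 * (n + z^2)
  c(value   = p,
    lowercl = (centre - spread) / denom,
    uppercl = (centre + spread) / denom)
}

# Area1, 2015 from the test data shown above: 10 events in a population of 135
wilson_ci(10, 135)
```

For this row the sketch agrees with the value, lowercl and uppercl columns of the default phe_proportion output to the precision printed there.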

# default rate
phe_rate(df, obs, pop)
#>    area year obs pop     value   lowercl   uppercl confidence       statistic
#> 1 Area1 2015  10 135  7407.407  3546.259  13623.30        95% rate per 100000
#> 2 Area2 2015  38 118 32203.390 22786.160  44202.91        95% rate per 100000
#> 3 Area3 2015 100 118 84745.763 68950.937 103074.52        95% rate per 100000
#> 4 Area4 2015  52 135 38518.519 28765.401  50512.93        95% rate per 100000
#> 5 Area1 2016  78 109 71559.633 56562.915  89310.80        95% rate per 100000
#> 6 Area2 2016  17 192  8854.167  5154.936  14177.14        95% rate per 100000
#> 7 Area3 2016  76 124 61290.323 48288.012  76714.98        95% rate per 100000
#> 8 Area4 2016  71 147 48299.320 37720.596  60923.88        95% rate per 100000
#>   method
#> 1  Byars
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify rate parameters
phe_rate(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop     value   lowercl   uppercl confidence    statistic
#> 1 Area1 2015  10 135  7.407407  2.160236  17.92128      99.8% rate per 100
#> 2 Area2 2015  38 118 32.203390 18.411842  51.86901      99.8% rate per 100
#> 3 Area3 2015 100 118 84.745763 60.935332 114.35900      99.8% rate per 100
#> 4 Area4 2015  52 135 38.518519 24.076544  58.07185      99.8% rate per 100
#> 5 Area1 2016  78 109 71.559633 49.089484 100.32861      99.8% rate per 100
#> 6 Area2 2016  17 192  8.854167  3.641033  17.72878      99.8% rate per 100
#> 7 Area3 2016  76 124 61.290323 41.821774  86.29739      99.8% rate per 100
#> 8 Area4 2016  71 147 48.299320 32.488672  68.78563      99.8% rate per 100
#>   method
#> 1  Byars
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify rate parameters and remove metadata columns from output
phe_rate(df, obs, pop, type="standard", confidence=99.8, multiplier=100)
#>    area year obs pop     value   lowercl   uppercl
#> 1 Area1 2015  10 135  7.407407  2.160236  17.92128
#> 2 Area2 2015  38 118 32.203390 18.411842  51.86901
#> 3 Area3 2015 100 118 84.745763 60.935332 114.35900
#> 4 Area4 2015  52 135 38.518519 24.076544  58.07185
#> 5 Area1 2016  78 109 71.559633 49.089484 100.32861
#> 6 Area2 2016  17 192  8.854167  3.641033  17.72878
#> 7 Area3 2016  76 124 61.290323 41.821774  86.29739
#> 8 Area4 2016  71 147 48.299320 32.488672  68.78563
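
The method column in the rate output above reports Byars. The following base R sketch shows Byar's approximation to the Poisson confidence interval for a count, scaled to a rate. The helper byars_ci is hypothetical and written for this illustration only. The sketch applies the approximation to any count; an exact method is often preferred for very small counts, and that is not attempted here.

```r
# Byar's approximation for the confidence interval of a rate, in base R.
# byars_ci is a hypothetical helper for illustration, not a package function.
byars_ci <- function(x, n, confidence = 0.95, multiplier = 100000) {
  z <- qnorm(1 - (1 - confidence) / 2)       # two-sided normal quantile
  # Limits for the observed count x
  lower <- x * (1 - 1 / (9 * x) - z / (3 * sqrt(x)))^3
  upper <- (x + 1) * (1 - 1 / (9 * (x + 1)) + z / (3 * sqrt(x + 1)))^3
  # Divide by the population and scale by the multiplier
  c(value = x / n, lowercl = lower / n, uppercl = upper / n) * multiplier
}

# Area1, 2015 from the test data shown above: 10 events in a population of 135
byars_ci(10, 135)
```

For this row the sketch agrees with the default phe_rate output (rate per 100000) to the precision printed there.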

These functions can also return aggregate data if the input data frames are grouped:

# default proportion - grouped
df %>%
  group_by(year) %>%
  phe_proportion(obs, pop)
#> # A tibble: 2 × 9
#> # Groups:   year [2]
#>    year   obs   pop value lowercl uppercl confidence statistic       method
#>   <int> <int> <int> <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   200   506 0.395   0.354   0.438 95%        proportion of 1 Wilson
#> 2  2016   242   572 0.423   0.383   0.464 95%        proportion of 1 Wilson

# default rate - grouped
df %>%
  group_by(year) %>%
  phe_rate(obs, pop)
#> # A tibble: 2 × 9
#> # Groups:   year [2]
#>    year   obs   pop  value lowercl uppercl confidence statistic       method
#>   <int> <int> <int>  <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   200   506 39526.  34237.  45400. 95%        rate per 100000 Byars 
#> 2  2016   242   572 42308.  37145.  47988. 95%        rate per 100000 Byars
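
The grouped output above shows that numerators and denominators are summed within each group before the statistic is calculated. A small base R sketch of that aggregation step, using the obs and pop values copied from the df output printed earlier (the name df_shown is chosen for this illustration only):

```r
# Values copied from the df output printed earlier in this vignette
df_shown <- data.frame(
  year = rep(2015:2016, each = 4),
  obs  = c(10, 38, 100, 52, 78, 17, 76, 71),
  pop  = c(135, 118, 118, 135, 109, 192, 124, 147))

# Sum numerators and denominators within each year, then take the proportion
totals <- aggregate(cbind(obs, pop) ~ year, data = df_shown, FUN = sum)
totals$value <- totals$obs / totals$pop
totals
```

The totals (200 of 506 for 2015 and 242 of 572 for 2016) and the resulting proportions match the obs, pop and value columns of the grouped phe_proportion output.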



Aggregate functions

The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.

The df test data generated earlier can be used to demonstrate phe_mean:

Execute phe_mean

INPUT: The phe_mean function takes a single data frame as input with a column representing the numbers to be averaged.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified:

# default mean
phe_mean(df,obs)
#>   value_sum value_count    stdev value  lowercl  uppercl confidence statistic
#> 1       442           8 31.66228 55.25 28.77967 81.72033        95%      mean
#>                     method
#> 1 Student's t-distribution

# multiple means in a single execution with 99.8% confidence
df %>%
    group_by(year) %>%
        phe_mean(obs, confidence=0.998)
#> # A tibble: 2 × 10
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl confi…¹ stati…² method
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr> 
#> 1  2015       200           4  37.6  50    -142.     242. 99.8%   mean    Stude…
#> 2  2016       242           4  29.1  60.5   -88.4    209. 99.8%   mean    Stude…
#> # … with abbreviated variable names ¹​confidence, ²​statistic

# multiple means in a single execution with 99.8% confidence and data-only output
df %>%
    group_by(year) %>%
        phe_mean(obs, type = "standard", confidence=0.998)
#> # A tibble: 2 × 7
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl>
#> 1  2015       200           4  37.6  50    -142.     242.
#> 2  2016       242           4  29.1  60.5   -88.4    209.
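
The method column in the phe_mean output reports Student's t-distribution. The following base R sketch computes a t-based confidence interval for a mean, using the obs values copied from the df output printed earlier. The helper t_ci is hypothetical and written for this illustration only.

```r
# Confidence interval for a mean using Student's t-distribution, in base R.
# t_ci is a hypothetical helper for illustration, not a package function.
t_ci <- function(x, confidence = 0.95) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)                              # standard error of the mean
  tq <- qt(1 - (1 - confidence) / 2, df = n - 1)     # two-sided t quantile
  c(value = mean(x), lowercl = mean(x) - tq * se, uppercl = mean(x) + tq * se)
}

# obs values copied from the df output printed earlier in this vignette
obs_shown <- c(10, 38, 100, 52, 78, 17, 76, 71)
t_ci(obs_shown)
```

For all eight rows the sketch agrees with the default phe_mean output to the precision printed there. In the grouped examples each year has only four values, so at 99.8% confidence the t quantile on 3 degrees of freedom is roughly 10.2, which is why those intervals are very wide and the lower limits are negative.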

Standardised Aggregate functions

Create some test data for the standardised aggregate functions

The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:

df_std <- data.frame(
            area = rep(c("Area1", "Area2", "Area3", "Area4"), each = 19 * 2 * 5),
            year = rep(2006:2010, each = 19 * 2),
            sex = rep(rep(c("Male", "Female"), each = 19), 5),
            ageband = rep(c(0, 5,10,15,20,25,30,35,40,45,
                           50,55,60,65,70,75,80,85,90), times = 10),
            obs = sample(200, 19 * 2 * 5 * 4, replace = TRUE),
            pop = sample(10000:20000, 19 * 2 * 5 * 4, replace = TRUE))
head(df_std)
#>    area year  sex ageband obs   pop
#> 1 Area1 2006 Male       0 177 14312
#> 2 Area1 2006 Male       5  24 16507
#> 3 Area1 2006 Male      10 177 15590
#> 4 Area1 2006 Male      15  90 17854
#> 5 Area1 2006 Male      20 144 12624
#> 6 Area1 2006 Male      25  13 18162

Execute phe_dsr

INPUT: The minimum input requirement for the phe_dsr function is a single data frame with columns representing the numerators and denominators for each standardisation category. This is sufficient when the default standard population described below is appropriate and the data is sorted to match it.

The 2013 European Standard Population is provided within the package in vector form (esp2013) and is used by default by this function. Alternative standard populations can be used but must be provided by the user. When the function joins a standard population vector to the input data frame it does this by position so it is important that the data is sorted accordingly. This is a user responsibility.

The function can also accept standard populations provided as a column within the input data frame.
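
Because a standard population vector is matched to the data by position, it can help to sort and check the age bands before standardising. A minimal base R sketch with a hypothetical three-row, single-group input (the data frame d and its values are invented for illustration):

```r
# Hypothetical single-group input whose age bands arrive out of order
d <- data.frame(ageband = c(10, 0, 5),
                obs     = c(3, 1, 2),
                pop     = c(300, 100, 200))

# Sort ascending by age band so that row i lines up with element i
# of the standard population vector
d_sorted <- d[order(d$ageband), ]

# A quick guard before standardising: age bands must be strictly increasing
stopifnot(all(diff(d_sorted$ageband) > 0))
d_sorted
```

With grouped data the same idea applies within each group, for example by ordering on the grouping columns first and the age band last.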

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If standard populations are being provided as a column within the input data frame then the user must specify this using the stdpoptype argument as the function expects a vector by default. The function also accepts additional arguments to specify the standard populations, the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_dsr function and the arguments that can optionally be specified:

# calculate separate dsrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_…¹ value lowercl uppercl confi…² stati…³
#>    <chr> <int> <chr>        <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>  
#>  1 Area1  2006 Female        2024   288172  685.    653.    718. 95%     dsr pe…
#>  2 Area1  2006 Male          1761   289012  627.    596.    660. 95%     dsr pe…
#>  3 Area1  2007 Female        2393   287928  845.    809.    882. 95%     dsr pe…
#>  4 Area1  2007 Male          1844   287266  658.    627.    691. 95%     dsr pe…
#>  5 Area1  2008 Female        1751   288418  611.    580.    642. 95%     dsr pe…
#>  6 Area1  2008 Male          2081   297307  771.    736.    808. 95%     dsr pe…
#>  7 Area1  2009 Female        1846   292801  614.    585.    645. 95%     dsr pe…
#>  8 Area1  2009 Male          1635   283820  568.    538.    598. 95%     dsr pe…
#>  9 Area1  2010 Female        2093   278420  792.    756.    829. 95%     dsr pe…
#> 10 Area1  2010 Male          2184   283341  781.    745.    818. 95%     dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> #   names ¹​total_pop, ²​confidence, ³​statistic

# calculate separate dsrs for each area, year and sex and drop metadata fields from output
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, type="standard")
#> # A tibble: 40 × 8
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_pop value lowercl uppercl
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female        2024    288172  685.    653.    718.
#>  2 Area1  2006 Male          1761    289012  627.    596.    660.
#>  3 Area1  2007 Female        2393    287928  845.    809.    882.
#>  4 Area1  2007 Male          1844    287266  658.    627.    691.
#>  5 Area1  2008 Female        1751    288418  611.    580.    642.
#>  6 Area1  2008 Male          2081    297307  771.    736.    808.
#>  7 Area1  2009 Female        1846    292801  614.    585.    645.
#>  8 Area1  2009 Male          1635    283820  568.    538.    598.
#>  9 Area1  2010 Female        2093    278420  792.    756.    829.
#> 10 Area1  2010 Male          2184    283341  781.    745.    818.
#> # … with 30 more rows

# calculate same specifying standard population in vector form
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_…¹ value lowercl uppercl confi…² stati…³
#>    <chr> <int> <chr>        <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>  
#>  1 Area1  2006 Female        2024   288172  685.    653.    718. 95%     dsr pe…
#>  2 Area1  2006 Male          1761   289012  627.    596.    660. 95%     dsr pe…
#>  3 Area1  2007 Female        2393   287928  845.    809.    882. 95%     dsr pe…
#>  4 Area1  2007 Male          1844   287266  658.    627.    691. 95%     dsr pe…
#>  5 Area1  2008 Female        1751   288418  611.    580.    642. 95%     dsr pe…
#>  6 Area1  2008 Male          2081   297307  771.    736.    808. 95%     dsr pe…
#>  7 Area1  2009 Female        1846   292801  614.    585.    645. 95%     dsr pe…
#>  8 Area1  2009 Male          1635   283820  568.    538.    598. 95%     dsr pe…
#>  9 Area1  2010 Female        2093   278420  792.    756.    829. 95%     dsr pe…
#> 10 Area1  2010 Male          2184   283341  781.    745.    818. 95%     dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> #   names ¹​total_pop, ²​confidence, ³​statistic

# calculate the same dsrs by appending the standard populations to the data frame
df_std %>%
    mutate(refpop = rep(esp2013,40)) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs,pop, stdpop=refpop, stdpoptype="field")
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_…¹ value lowercl uppercl confi…² stati…³
#>    <chr> <int> <chr>        <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>  
#>  1 Area1  2006 Female        2024   288172  685.    653.    718. 95%     dsr pe…
#>  2 Area1  2006 Male          1761   289012  627.    596.    660. 95%     dsr pe…
#>  3 Area1  2007 Female        2393   287928  845.    809.    882. 95%     dsr pe…
#>  4 Area1  2007 Male          1844   287266  658.    627.    691. 95%     dsr pe…
#>  5 Area1  2008 Female        1751   288418  611.    580.    642. 95%     dsr pe…
#>  6 Area1  2008 Male          2081   297307  771.    736.    808. 95%     dsr pe…
#>  7 Area1  2009 Female        1846   292801  614.    585.    645. 95%     dsr pe…
#>  8 Area1  2009 Male          1635   283820  568.    538.    598. 95%     dsr pe…
#>  9 Area1  2010 Female        2093   278420  792.    756.    829. 95%     dsr pe…
#> 10 Area1  2010 Male          2184   283341  781.    745.    818. 95%     dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> #   names ¹​total_pop, ²​confidence, ³​statistic

# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
df_std %>%
    filter(ageband <= 70) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013[1:15])
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_…¹ value lowercl uppercl confi…² stati…³
#>    <chr> <int> <chr>        <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>  
#>  1 Area1  2006 Female        1475   232192  627.    595.    661. 95%     dsr pe…
#>  2 Area1  2006 Male          1363   224949  635.    601.    671. 95%     dsr pe…
#>  3 Area1  2007 Female        1807   233596  813.    775.    853. 95%     dsr pe…
#>  4 Area1  2007 Male          1507   231371  661.    628.    696. 95%     dsr pe…
#>  5 Area1  2008 Female        1462   233240  627.    594.    661. 95%     dsr pe…
#>  6 Area1  2008 Male          1742   225496  807.    769.    847. 95%     dsr pe…
#>  7 Area1  2009 Female        1359   232743  596.    564.    629. 95%     dsr pe…
#>  8 Area1  2009 Male          1212   226688  550.    519.    583. 95%     dsr pe…
#>  9 Area1  2010 Female        1721   212672  814.    775.    855. 95%     dsr pe…
#> 10 Area1  2010 Male          1616   228550  765.    727.    805. 95%     dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> #   names ¹​total_pop, ²​confidence, ³​statistic
    
# calculate separate dsrs for persons for each area and year
df_std %>%
    group_by(area, year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last") %>%
    phe_dsr(obs,pop)
#> # A tibble: 20 × 10
#> # Groups:   area, year [20]
#>    area   year total_count total_…¹ value lowercl uppercl confi…² stati…³ method
#>    <chr> <int>       <int>    <int> <dbl>   <dbl>   <dbl> <chr>   <chr>   <chr> 
#>  1 Area1  2006        3785   577184  625.    604.    647. 95%     dsr pe… Dobson
#>  2 Area1  2007        4237   575194  742.    719.    766. 95%     dsr pe… Dobson
#>  3 Area1  2008        3832   585725  667.    645.    690. 95%     dsr pe… Dobson
#>  4 Area1  2009        3481   576621  580.    560.    601. 95%     dsr pe… Dobson
#>  5 Area1  2010        4277   561761  771.    746.    796. 95%     dsr pe… Dobson
#>  6 Area2  2006        2572   553802  483.    463.    503. 95%     dsr pe… Dobson
#>  7 Area2  2007        4362   565559  760.    736.    784. 95%     dsr pe… Dobson
#>  8 Area2  2008        4126   554590  778.    754.    804. 95%     dsr pe… Dobson
#>  9 Area2  2009        4296   590844  782.    758.    807. 95%     dsr pe… Dobson
#> 10 Area2  2010        3625   583271  633.    612.    656. 95%     dsr pe… Dobson
#> 11 Area3  2006        3855   556619  688.    665.    711. 95%     dsr pe… Dobson
#> 12 Area3  2007        3270   549441  621.    598.    644. 95%     dsr pe… Dobson
#> 13 Area3  2008        3885   561281  769.    744.    795. 95%     dsr pe… Dobson
#> 14 Area3  2009        3731   575388  656.    634.    678. 95%     dsr pe… Dobson
#> 15 Area3  2010        3561   530638  700.    676.    725. 95%     dsr pe… Dobson
#> 16 Area4  2006        3547   565865  675.    652.    699. 95%     dsr pe… Dobson
#> 17 Area4  2007        3892   545201  733.    709.    758. 95%     dsr pe… Dobson
#> 18 Area4  2008        3723   562596  690.    667.    713. 95%     dsr pe… Dobson
#> 19 Area4  2009        3879   573563  667.    644.    690. 95%     dsr pe… Dobson
#> 20 Area4  2010        4431   558789  812.    787.    838. 95%     dsr pe… Dobson
#> # … with abbreviated variable names ¹​total_pop, ²​confidence, ³​statistic
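
As a rough illustration of the quantity reported in the value column, a directly standardised rate is a weighted average of the age-specific rates, with the standard population as weights. The base R sketch below computes the point estimate only and does not attempt the confidence interval. The helper dsr_value and the input data are hypothetical, and the esp vector is a hard-coded copy of the 2013 European Standard Population that is assumed to match the package's esp2013 vector.

```r
# Hard-coded 2013 European Standard Population by 5-year age band (0-4 ... 90+).
# Assumed to match the package's esp2013 vector; verify before relying on it.
esp <- c(5000, 5500, 5500, 5500, 6000, 6000, 6500, 7000, 7000, 7000,
         7000, 6500, 6000, 5500, 5000, 4000, 2500, 1500, 1000)

# dsr_value is a hypothetical helper: standard-population-weighted average
# of the age-specific rates, scaled by the multiplier
dsr_value <- function(obs, pop, stdpop, multiplier = 100000) {
  sum(stdpop * obs / pop) / sum(stdpop) * multiplier
}

# Hypothetical data: 10 events per 1000 population in every age band
obs_flat <- rep(10, 19)
pop_flat <- rep(1000, 19)
dsr_value(obs_flat, pop_flat, esp)
```

When every age band has the same rate, as in this hypothetical input, the standardised rate equals the crude rate (1000 per 100000), which is a convenient sanity check on the weighting.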

Execute calculate_ISRatio and calculate_ISRate

INPUT: Unlike the phe_dsr function, there is no default standard or reference data for the calculate_ISRatio and calculate_ISRate functions. These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.

The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame.

OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (ISRate only), the indirectly standardised rate or ratio, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the functions expect vectors by default. The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:

df_ref <- df_std %>%
    filter(year == 2006) %>%
    group_by(ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last")
    
head(df_ref)
#> # A tibble: 6 × 3
#>   ageband   obs    pop
#>     <dbl> <int>  <int>
#> 1       0   818 113929
#> 2       5   536 126250
#> 3      10   859 122899
#> 4      15   783 134328
#> 5      20  1112 120966
#> 6      25   708 114075

Here are some example code chunks to demonstrate the calculate_ISRatio function and the arguments that can optionally be specified:

# calculate separate smrs for each area, year and sex
# standardised against the baseline-year (2006), all-sex, all-area reference data
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl confidence stati…¹
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>      <chr>  
#>  1 Area1  2006 Female     2024    1751. 1.16    1.11    1.21  95%        indire…
#>  2 Area1  2006 Male       1761    1753. 1.00    0.958   1.05  95%        indire…
#>  3 Area1  2007 Female     2393    1758. 1.36    1.31    1.42  95%        indire…
#>  4 Area1  2007 Male       1844    1742. 1.06    1.01    1.11  95%        indire…
#>  5 Area1  2008 Female     1751    1772. 0.988   0.943   1.04  95%        indire…
#>  6 Area1  2008 Male       2081    1820. 1.14    1.09    1.19  95%        indire…
#>  7 Area1  2009 Female     1846    1762. 1.05    1.00    1.10  95%        indire…
#>  8 Area1  2009 Male       1635    1749. 0.935   0.890   0.981 95%        indire…
#>  9 Area1  2010 Female     2093    1686. 1.24    1.19    1.30  95%        indire…
#> 10 Area1  2010 Male       2184    1740. 1.26    1.20    1.31  95%        indire…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> #   name ¹​statistic

# calculate the same smrs by appending the reference data to the data frame
# and drop metadata columns from output
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, refobs, refpop, refpoptype="field",
                      type = "standard")
#> # A tibble: 40 × 8
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female     2024    1751. 1.16    1.11    1.21 
#>  2 Area1  2006 Male       1761    1753. 1.00    0.958   1.05 
#>  3 Area1  2007 Female     2393    1758. 1.36    1.31    1.42 
#>  4 Area1  2007 Male       1844    1742. 1.06    1.01    1.11 
#>  5 Area1  2008 Female     1751    1772. 0.988   0.943   1.04 
#>  6 Area1  2008 Male       2081    1820. 1.14    1.09    1.19 
#>  7 Area1  2009 Female     1846    1762. 1.05    1.00    1.10 
#>  8 Area1  2009 Male       1635    1749. 0.935   0.890   0.981
#>  9 Area1  2010 Female     2093    1686. 1.24    1.19    1.30 
#> 10 Area1  2010 Male       2184    1740. 1.26    1.20    1.31 
#> # … with 30 more rows

The calculate_ISRate function works in the same way, but instead of expressing the result as the ratio of observed to expected counts, it expresses the result as a rate and also returns the reference rate. Here are some examples:

# calculate separate indirectly standardised rates for each area, year and sex
# standardised against the baseline-year (2006), all-sex, all-area reference data
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 12
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected ref_rate value lowercl uppercl confide…¹
#>    <chr> <int> <chr>     <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>    
#>  1 Area1  2006 Female     2024    1751.     611.  706.    676.    737. 95%      
#>  2 Area1  2006 Male       1761    1753.     611.  613.    585.    643. 95%      
#>  3 Area1  2007 Female     2393    1758.     611.  831.    798.    865. 95%      
#>  4 Area1  2007 Male       1844    1742.     611.  646.    617.    677. 95%      
#>  5 Area1  2008 Female     1751    1772.     611.  603.    576.    632. 95%      
#>  6 Area1  2008 Male       2081    1820.     611.  698.    669.    729. 95%      
#>  7 Area1  2009 Female     1846    1762.     611.  640.    611.    669. 95%      
#>  8 Area1  2009 Male       1635    1749.     611.  571.    543.    599. 95%      
#>  9 Area1  2010 Female     2093    1686.     611.  758.    726.    791. 95%      
#> 10 Area1  2010 Male       2184    1740.     611.  766.    735.    799. 95%      
#> # … with 30 more rows, 2 more variables: statistic <chr>, method <chr>, and
#> #   abbreviated variable name ¹​confidence

# calculate the same indirectly standardised rates by appending the reference data to the data frame
# and drop metadata columns from output
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, refobs, refpop, refpoptype="field",
                     type = "standard")
#> # A tibble: 40 × 9
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected ref_rate value lowercl uppercl
#>    <chr> <int> <chr>     <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female     2024    1751.     611.  706.    676.    737.
#>  2 Area1  2006 Male       1761    1753.     611.  613.    585.    643.
#>  3 Area1  2007 Female     2393    1758.     611.  831.    798.    865.
#>  4 Area1  2007 Male       1844    1742.     611.  646.    617.    677.
#>  5 Area1  2008 Female     1751    1772.     611.  603.    576.    632.
#>  6 Area1  2008 Male       2081    1820.     611.  698.    669.    729.
#>  7 Area1  2009 Female     1846    1762.     611.  640.    611.    669.
#>  8 Area1  2009 Male       1635    1749.     611.  571.    543.    599.
#>  9 Area1  2010 Female     2093    1686.     611.  758.    726.    791.
#> 10 Area1  2010 Male       2184    1740.     611.  766.    735.    799.
#> # … with 30 more rows