Introduction to PHEindicatormethods

Georgina Anderson

2019-04-23

Introduction

This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail; the method names can optionally be output alongside the statistics, and a more detailed explanation is available in the references section of the function documentation.

The following packages must be installed and loaded if not already available:

library(PHEindicatormethods)
library(dplyr)

Package functions

This vignette covers the following functions available within the first release of the package (v1.0.8) but has been updated to apply to these functions in their latest release versions. If further functions are added to the package in future releases these will be explained elsewhere.

Function | Type | Description
phe_proportion | Non-aggregate | Performs a calculation on each row of data (unless data is grouped)
phe_rate | Non-aggregate | Performs a calculation on each row of data (unless data is grouped)
phe_mean | Aggregate | Performs a calculation on each grouping set
phe_dsr | Aggregate, standardised | Performs a calculation on each grouping set and requires additional reference inputs
phe_smr | Aggregate, standardised | Performs a calculation on each grouping set and requires additional reference inputs
phe_isr | Aggregate, standardised | Performs a calculation on each grouping set and requires additional reference inputs

Non-aggregate functions

Create some test data for the non-aggregate functions

The following code chunk creates a data frame containing the observed numbers of events and the populations for 4 geographical areas over 2 time periods. It is used later to demonstrate the PHEindicatormethods package functions. The values are randomly sampled, so they will differ each time the code is run:

df <- data.frame(
        area = rep(c("Area1","Area2","Area3","Area4"), 2),
        year = rep(2015:2016, each = 4),
        obs = sample(100, 2 * 4, replace = TRUE),
        pop = sample(100:200, 2 * 4, replace = TRUE))
df
#>    area year obs pop
#> 1 Area1 2015  90 193
#> 2 Area2 2015  94 124
#> 3 Area3 2015  90 137
#> 4 Area4 2015  34 174
#> 5 Area1 2016  82 122
#> 6 Area2 2016  67 198
#> 7 Area3 2016  96 180
#> 8 Area4 2016  32 101
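
Because sample() draws random values, rerunning the chunk above will not give the figures printed in this vignette. One way to reproduce them is to type the printed values in literally. This is a minimal sketch: the name df_fixed is introduced here purely for illustration, and the values are copied from the output above.

```r
# Literal copy of the test data printed above
# (df_fixed is an illustrative name, not part of the package)
df_fixed <- data.frame(
  area = rep(c("Area1", "Area2", "Area3", "Area4"), 2),
  year = rep(2015:2016, each = 4),
  obs  = c(90L, 94L, 90L, 34L, 82L, 67L, 96L, 32L),
  pop  = c(193L, 124L, 137L, 174L, 122L, 198L, 180L, 101L)
)
df_fixed
```

Assigning this data frame to df instead would let the later chunks work from the same values as the printed output.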

Execute phe_proportion and phe_rate

INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.

OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified:

# default proportion
phe_proportion(df, obs, pop)
#>    area year obs pop     value   lowercl   uppercl confidence
#> 1 Area1 2015  90 193 0.4663212 0.3972851 0.5366719        95%
#> 2 Area2 2015  94 124 0.7580645 0.6756700 0.8249500        95%
#> 3 Area3 2015  90 137 0.6569343 0.5741342 0.7311736        95%
#> 4 Area4 2015  34 174 0.1954023 0.1433360 0.2606275        95%
#> 5 Area1 2016  82 122 0.6721311 0.5846897 0.7490636        95%
#> 6 Area2 2016  67 198 0.3383838 0.2761117 0.4068077        95%
#> 7 Area3 2016  96 180 0.5333333 0.4605179 0.6047558        95%
#> 8 Area4 2016  32 101 0.3168317 0.2342353 0.4128509        95%
#>         statistic method
#> 1 proportion of 1 Wilson
#> 2 proportion of 1 Wilson
#> 3 proportion of 1 Wilson
#> 4 proportion of 1 Wilson
#> 5 proportion of 1 Wilson
#> 6 proportion of 1 Wilson
#> 7 proportion of 1 Wilson
#> 8 proportion of 1 Wilson
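
The method column above reports Wilson. As a rough cross-check of the first row (Area1, 2015), the sketch below computes a Wilson score interval in base R. The helper wilson_ci is hypothetical and written only for this illustration; it is not a package function, and the package's own implementation may differ in detail.

```r
# Wilson score interval for a proportion x / n
# (wilson_ci is an illustrative helper, not a package function)
wilson_ci <- function(x, n, confidence = 0.95) {
  z <- qnorm(1 - (1 - confidence) / 2)   # two-sided normal quantile
  p <- x / n
  centre <- p + z^2 / (2 * n)
  spread <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2))
  denom  <- 1 + z^2 / n
  c(value   = p,
    lowercl = (centre - spread) / denom,
    uppercl = (centre + spread) / denom)
}

area1_95  <- wilson_ci(90, 193)                       # 95% limits
area1_998 <- wilson_ci(90, 193, confidence = 0.998)   # 99.8% limits
round(area1_95, 4)    # value 0.4663, lowercl 0.3973, uppercl 0.5367
round(area1_998, 4)   # value 0.4663, lowercl 0.3596, uppercl 0.5762
```

Widening the confidence level only changes the normal quantile z, which is why the 99.8% limits sit further from the same central value.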

# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence=99.8)
#>    area year obs pop     value   lowercl   uppercl confidence
#> 1 Area1 2015  90 193 0.4663212 0.3595776 0.5762406      99.8%
#> 2 Area2 2015  94 124 0.7580645 0.6236165 0.8556064      99.8%
#> 3 Area3 2015  90 137 0.6569343 0.5250925 0.7683237      99.8%
#> 4 Area4 2015  34 174 0.1954023 0.1194300 0.3030692      99.8%
#> 5 Area1 2016  82 122 0.6721311 0.5325394 0.7867319      99.8%
#> 6 Area2 2016  67 198 0.3383838 0.2440544 0.4475855      99.8%
#> 7 Area3 2016  96 180 0.5333333 0.4196635 0.6436445      99.8%
#> 8 Area4 2016  32 101 0.3168317 0.1950033 0.4703051      99.8%
#>         statistic method
#> 1 proportion of 1 Wilson
#> 2 proportion of 1 Wilson
#> 3 proportion of 1 Wilson
#> 4 proportion of 1 Wilson
#> 5 proportion of 1 Wilson
#> 6 proportion of 1 Wilson
#> 7 proportion of 1 Wilson
#> 8 proportion of 1 Wilson

# specify to output proportions as percentages
phe_proportion(df, obs, pop, multiplier=100)
#>    area year obs pop    value  lowercl  uppercl confidence  statistic
#> 1 Area1 2015  90 193 46.63212 39.72851 53.66719        95% percentage
#> 2 Area2 2015  94 124 75.80645 67.56700 82.49500        95% percentage
#> 3 Area3 2015  90 137 65.69343 57.41342 73.11736        95% percentage
#> 4 Area4 2015  34 174 19.54023 14.33360 26.06275        95% percentage
#> 5 Area1 2016  82 122 67.21311 58.46897 74.90636        95% percentage
#> 6 Area2 2016  67 198 33.83838 27.61117 40.68077        95% percentage
#> 7 Area3 2016  96 180 53.33333 46.05179 60.47558        95% percentage
#> 8 Area4 2016  32 101 31.68317 23.42353 41.28509        95% percentage
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify confidence level and output proportions as percentages
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop    value  lowercl  uppercl confidence  statistic
#> 1 Area1 2015  90 193 46.63212 35.95776 57.62406      99.8% percentage
#> 2 Area2 2015  94 124 75.80645 62.36165 85.56064      99.8% percentage
#> 3 Area3 2015  90 137 65.69343 52.50925 76.83237      99.8% percentage
#> 4 Area4 2015  34 174 19.54023 11.94300 30.30692      99.8% percentage
#> 5 Area1 2016  82 122 67.21311 53.25394 78.67319      99.8% percentage
#> 6 Area2 2016  67 198 33.83838 24.40544 44.75855      99.8% percentage
#> 7 Area3 2016  96 180 53.33333 41.96635 64.36445      99.8% percentage
#> 8 Area4 2016  32 101 31.68317 19.50033 47.03051      99.8% percentage
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify confidence level and multiplier for proportion and remove metadata columns
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100, type="standard")
#>    area year obs pop    value  lowercl  uppercl
#> 1 Area1 2015  90 193 46.63212 35.95776 57.62406
#> 2 Area2 2015  94 124 75.80645 62.36165 85.56064
#> 3 Area3 2015  90 137 65.69343 52.50925 76.83237
#> 4 Area4 2015  34 174 19.54023 11.94300 30.30692
#> 5 Area1 2016  82 122 67.21311 53.25394 78.67319
#> 6 Area2 2016  67 198 33.83838 24.40544 44.75855
#> 7 Area3 2016  96 180 53.33333 41.96635 64.36445
#> 8 Area4 2016  32 101 31.68317 19.50033 47.03051

# default rate
phe_rate(df, obs, pop)
#>    area year obs pop    value  lowercl  uppercl confidence       statistic
#> 1 Area1 2015  90 193 46632.12 37496.69 57319.41        95% rate per 100000
#> 2 Area2 2015  94 124 75806.45 61257.75 92768.84        95% rate per 100000
#> 3 Area3 2015  90 137 65693.43 52823.81 80749.24        95% rate per 100000
#> 4 Area4 2015  34 174 19540.23 13530.09 27306.37        95% rate per 100000
#> 5 Area1 2016  82 122 67213.11 53454.85 83430.20        95% rate per 100000
#> 6 Area2 2016  67 198 33838.38 26223.07 42974.20        95% rate per 100000
#> 7 Area3 2016  96 180 53333.33 43199.10 65129.77        95% rate per 100000
#> 8 Area4 2016  32 101 31683.17 21667.52 44728.66        95% rate per 100000
#>   method
#> 1  Byars
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars
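
The method column above reports Byars. The sketch below applies Byar's approximation to the first row (Area1, 2015) in base R as a cross-check. The helper byars_rate_ci is hypothetical and exists only for this illustration; it assumes the approximation is applied to the observed count and then divided by the population, and it does not cover any special handling the package may apply to very small counts.

```r
# Byar's approximation for the confidence limits of a rate x / n
# (byars_rate_ci is an illustrative helper, not a package function)
byars_rate_ci <- function(x, n, confidence = 0.95, multiplier = 100000) {
  z <- qnorm(1 - (1 - confidence) / 2)
  lower_count <- x * (1 - 1 / (9 * x) - z / (3 * sqrt(x)))^3
  upper_count <- (x + 1) * (1 - 1 / (9 * (x + 1)) + z / (3 * sqrt(x + 1)))^3
  c(value   = x / n,
    lowercl = lower_count / n,
    uppercl = upper_count / n) * multiplier
}

area1_rate <- byars_rate_ci(90, 193)   # Area1, 2015
round(area1_rate, 2)   # value 46632.12, lowercl 37496.69, uppercl 57319.41
```

Unlike a proportion, a rate is not bounded by the multiplier, so upper limits above the multiplier (as for Area2 in the 99.8% example that follows) are possible.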

# specify rate parameters
phe_rate(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop    value  lowercl   uppercl confidence    statistic
#> 1 Area1 2015  90 193 46.63212 32.89479  63.92121      99.8% rate per 100
#> 2 Area2 2015  94 124 75.80645 53.90614 103.23233      99.8% rate per 100
#> 3 Area3 2015  90 137 65.69343 46.34083  90.04959      99.8% rate per 100
#> 4 Area4 2015  34 174 19.54023 10.77682  32.29409      99.8% rate per 100
#> 5 Area1 2016  82 122 67.21311 46.57199  93.47871      99.8% rate per 100
#> 6 Area2 2016  67 198 33.83838 22.47514  48.67537      99.8% rate per 100
#> 7 Area3 2016  96 180 53.33333 38.07060  72.40174      99.8% rate per 100
#> 8 Area4 2016  32 101 31.68317 17.11577  53.13188      99.8% rate per 100
#>   method
#> 1  Byars
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify rate parameters and remove metadata columns from output
phe_rate(df, obs, pop, type="standard", confidence=99.8, multiplier=100)
#>    area year obs pop    value  lowercl   uppercl
#> 1 Area1 2015  90 193 46.63212 32.89479  63.92121
#> 2 Area2 2015  94 124 75.80645 53.90614 103.23233
#> 3 Area3 2015  90 137 65.69343 46.34083  90.04959
#> 4 Area4 2015  34 174 19.54023 10.77682  32.29409
#> 5 Area1 2016  82 122 67.21311 46.57199  93.47871
#> 6 Area2 2016  67 198 33.83838 22.47514  48.67537
#> 7 Area3 2016  96 180 53.33333 38.07060  72.40174
#> 8 Area4 2016  32 101 31.68317 17.11577  53.13188

These functions can also return aggregate data if the input data frame is grouped:

# default proportion - grouped
df %>%
  group_by(year) %>%
  phe_proportion(obs, pop)
#> # A tibble: 2 x 9
#>    year   obs   pop value lowercl uppercl confidence statistic       method
#>   <int> <int> <int> <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   308   628 0.490   0.452   0.529 95%        proportion of 1 Wilson
#> 2  2016   277   601 0.461   0.421   0.501 95%        proportion of 1 Wilson

# default rate - grouped
df %>%
  group_by(year) %>%
  phe_rate(obs, pop)
#> # A tibble: 2 x 9
#>    year   obs   pop  value lowercl uppercl confidence statistic      method
#>   <int> <int> <int>  <dbl>   <dbl>   <dbl> <chr>      <chr>          <chr> 
#> 1  2015   308   628 49045.  43720.  54839. 95%        rate per 1000~ Byars 
#> 2  2016   277   601 46090.  40821.  51850. 95%        rate per 1000~ Byars
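
The grouped output above is consistent with the numerators and denominators being summed within each group before the statistic is calculated. The base R sketch below reproduces the totals and central values for the printed test data; df_fixed is an illustrative name and the values are copied from the output shown earlier.

```r
# Printed test data, reduced to the columns needed here
df_fixed <- data.frame(
  year = rep(2015:2016, each = 4),
  obs  = c(90, 94, 90, 34, 82, 67, 96, 32),
  pop  = c(193, 124, 137, 174, 122, 198, 180, 101)
)

# Sum numerators and denominators within each year, then divide
totals <- aggregate(cbind(obs, pop) ~ year, data = df_fixed, FUN = sum)
totals$value <- totals$obs / totals$pop
totals
```

Multiplying the value column by 100000 gives the central values of the grouped rates.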



Aggregate functions

The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.

The df test data generated earlier can be used to demonstrate phe_mean:

Execute phe_mean

INPUT: The phe_mean function takes a single data frame as input with a column representing the numbers to be averaged.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the sum, count and standard deviation of the values, the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified:

# default mean
phe_mean(df,obs)
#>   value_sum value_count    stdev  value  lowercl  uppercl confidence
#> 1       585           8 26.36793 73.125 51.08086 95.16914        95%
#>   statistic                   method
#> 1      mean Student's t-distribution

# multiple means in a single execution with 99.8% confidence
df %>%
    group_by(year) %>%
        phe_mean(obs, confidence=0.998)
#> # A tibble: 2 x 10
#>    year value_sum value_count stdev value lowercl uppercl confidence
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl> <chr>     
#> 1  2015       308           4  28.7  77     -69.7    224. 99.8%     
#> 2  2016       277           4  27.5  69.2   -71.3    210. 99.8%     
#> # ... with 2 more variables: statistic <chr>, method <chr>

# multiple means in a single execution with 99.8% confidence and data-only output
df %>%
    group_by(year) %>%
        phe_mean(obs, type = "standard", confidence=0.998)
#> # A tibble: 2 x 7
#>    year value_sum value_count stdev value lowercl uppercl
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl>
#> 1  2015       308           4  28.7  77     -69.7    224.
#> 2  2016       277           4  27.5  69.2   -71.3    210.
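
The method column in the phe_mean output reports Student's t-distribution. The base R sketch below recomputes the limits from the printed obs values as a cross-check. The helper mean_ci is hypothetical, written only for this illustration, and takes the confidence level as a proportion.

```r
# Confidence interval for a mean using Student's t-distribution
# (mean_ci is an illustrative helper, not a package function)
mean_ci <- function(x, confidence = 0.95) {
  n <- length(x)
  t_crit <- qt(1 - (1 - confidence) / 2, df = n - 1)
  half_width <- t_crit * sd(x) / sqrt(n)
  c(value   = mean(x),
    lowercl = mean(x) - half_width,
    uppercl = mean(x) + half_width)
}

obs_all <- c(90, 94, 90, 34, 82, 67, 96, 32)   # obs column printed earlier

all_95    <- mean_ci(obs_all)                          # all 8 rows, 95%
y2015_998 <- mean_ci(obs_all[1:4], confidence = 0.998) # 2015 rows, 99.8%
round(all_95, 2)      # value 73.12, lowercl 51.08, uppercl 95.17
round(y2015_998, 1)   # value 77.0, lowercl -69.7, uppercl 223.7
```

With only 4 values per year and a 99.8% confidence level, the t quantile is large, which explains the very wide intervals and negative lower limits in the grouped output.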

Standardised Aggregate functions

Create some test data for the standardised aggregate functions

The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:

df_std <- data.frame(
            area = rep(c("Area1", "Area2", "Area3", "Area4"), each = 19 * 2 * 5),
            year = rep(2006:2010, each = 19 * 2),
            sex = rep(rep(c("Male", "Female"), each = 19), 5),
            ageband = rep(c(0, 5,10,15,20,25,30,35,40,45,
                           50,55,60,65,70,75,80,85,90), times = 10),
            obs = sample(200, 19 * 2 * 5 * 4, replace = TRUE),
            pop = sample(10000:20000, 19 * 2 * 5 * 4, replace = TRUE))
head(df_std)
#>    area year  sex ageband obs   pop
#> 1 Area1 2006 Male       0  75 14989
#> 2 Area1 2006 Male       5  78 17379
#> 3 Area1 2006 Male      10 179 10718
#> 4 Area1 2006 Male      15 183 10417
#> 5 Area1 2006 Male      20 111 17387
#> 6 Area1 2006 Male      25  21 19406

Execute phe_dsr

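INPUT: The minimum input requirement for the phe_dsr function is a single data frame with columns representing the numerators and denominators for each standardisation category. This is sufficient if the data are broken down into the same standardisation categories as the default standard population and sorted in the same order.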

The 2013 European Standard Population is provided within the package in vector form (esp2013) and is used by default by this function. Alternative standard populations can be used but must be provided by the user. When the function joins a standard population vector to the input data frame it does this by position so it is important that the data is sorted accordingly. This is a user responsibility.

The function can also accept standard populations provided as a column within the input data frame.
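
To illustrate why the sort order matters when a standard population is joined by position, the sketch below computes the central value of a directly standardised rate in base R. Everything in it is illustrative: dsr is a hypothetical helper (not the package function), the obs and pop values are made up, and the 19 European Standard Population 2013 values are typed in so that the sketch runs without the package. Confidence limits are not reproduced here.

```r
# 2013 European Standard Population for 19 five-year age bands (0-4 ... 90+),
# typed in here so that the sketch runs without the package
esp <- c(5000, 5500, 5500, 5500, 6000, 6000, 6500, 7000, 7000, 7000,
         7000, 6500, 6000, 5500, 5000, 4000, 2500, 1500, 1000)

# Made-up events and populations for one group, sorted by ascending age band
obs <- c(2, 1, 1, 3, 4, 5, 6, 8, 10, 14, 20, 28, 40, 55, 75, 90, 100, 95, 80)
pop <- rep(10000, 19)

# Weighted average of the age-specific rates, weights = standard population
# (dsr is an illustrative helper, not a package function)
dsr <- function(obs, pop, stdpop, multiplier = 100000) {
  sum(stdpop * obs / pop) / sum(stdpop) * multiplier
}

dsr_sorted   <- dsr(obs, pop, esp)            # data in the same order as esp
dsr_reversed <- dsr(rev(obs), rev(pop), esp)  # same data, wrong order
c(sorted = dsr_sorted, reversed = dsr_reversed)
```

The two results differ even though the data are identical, because each age-specific rate is paired with whichever weight happens to sit in the same position.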

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If standard populations are being provided as a column within the input data frame then the user must specify this using the stdpoptype argument as the function expects a vector by default. The function also accepts additional arguments to specify the standard populations, the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_dsr function and the arguments that can optionally be specified:

# calculate separate dsrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop)
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <fct> <int> <fct>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~        2034    274567  745.    711.    781. 95%       
#>  2 Area1  2006 Male         1793    294669  649.    617.    682. 95%       
#>  3 Area1  2007 Fema~        2043    287373  742.    708.    777. 95%       
#>  4 Area1  2007 Male         2021    290581  678.    647.    710. 95%       
#>  5 Area1  2008 Fema~        1812    302461  588.    560.    618. 95%       
#>  6 Area1  2008 Male         1964    292433  668.    638.    700. 95%       
#>  7 Area1  2009 Fema~        2226    281936  806.    770.    843. 95%       
#>  8 Area1  2009 Male         1763    284014  606.    575.    637. 95%       
#>  9 Area1  2010 Fema~        1786    287422  617.    587.    649. 95%       
#> 10 Area1  2010 Male         2141    292748  766.    733.    801. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>,
#> #   method <chr>

# calculate separate dsrs for each area, year and sex and drop metadata fields from output
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, type="standard")
#> # A tibble: 40 x 8
#> # Groups:   area, year [20]
#>    area   year sex    total_count total_pop value lowercl uppercl
#>    <fct> <int> <fct>        <int>     <int> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female        2034    274567  745.    711.    781.
#>  2 Area1  2006 Male          1793    294669  649.    617.    682.
#>  3 Area1  2007 Female        2043    287373  742.    708.    777.
#>  4 Area1  2007 Male          2021    290581  678.    647.    710.
#>  5 Area1  2008 Female        1812    302461  588.    560.    618.
#>  6 Area1  2008 Male          1964    292433  668.    638.    700.
#>  7 Area1  2009 Female        2226    281936  806.    770.    843.
#>  8 Area1  2009 Male          1763    284014  606.    575.    637.
#>  9 Area1  2010 Female        1786    287422  617.    587.    649.
#> 10 Area1  2010 Male          2141    292748  766.    733.    801.
#> # ... with 30 more rows

# calculate same specifying standard population in vector form
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013)
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <fct> <int> <fct>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~        2034    274567  745.    711.    781. 95%       
#>  2 Area1  2006 Male         1793    294669  649.    617.    682. 95%       
#>  3 Area1  2007 Fema~        2043    287373  742.    708.    777. 95%       
#>  4 Area1  2007 Male         2021    290581  678.    647.    710. 95%       
#>  5 Area1  2008 Fema~        1812    302461  588.    560.    618. 95%       
#>  6 Area1  2008 Male         1964    292433  668.    638.    700. 95%       
#>  7 Area1  2009 Fema~        2226    281936  806.    770.    843. 95%       
#>  8 Area1  2009 Male         1763    284014  606.    575.    637. 95%       
#>  9 Area1  2010 Fema~        1786    287422  617.    587.    649. 95%       
#> 10 Area1  2010 Male         2141    292748  766.    733.    801. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>,
#> #   method <chr>

# calculate the same dsrs by appending the standard populations to the data frame
df_std %>%
    mutate(refpop = rep(esp2013,40)) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs,pop, stdpop=refpop, stdpoptype="field")
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <fct> <int> <fct>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~        2034    274567  745.    711.    781. 95%       
#>  2 Area1  2006 Male         1793    294669  649.    617.    682. 95%       
#>  3 Area1  2007 Fema~        2043    287373  742.    708.    777. 95%       
#>  4 Area1  2007 Male         2021    290581  678.    647.    710. 95%       
#>  5 Area1  2008 Fema~        1812    302461  588.    560.    618. 95%       
#>  6 Area1  2008 Male         1964    292433  668.    638.    700. 95%       
#>  7 Area1  2009 Fema~        2226    281936  806.    770.    843. 95%       
#>  8 Area1  2009 Male         1763    284014  606.    575.    637. 95%       
#>  9 Area1  2010 Fema~        1786    287422  617.    587.    649. 95%       
#> 10 Area1  2010 Male         2141    292748  766.    733.    801. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>,
#> #   method <chr>

# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
df_std %>%
    filter(ageband <= 70) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013[1:15])
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   total_count total_pop value lowercl uppercl confidence
#>    <fct> <int> <fct>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~        1537    212682  726.    689.    764. 95%       
#>  2 Area1  2006 Male         1483    234729  671.    637.    707. 95%       
#>  3 Area1  2007 Fema~        1587    223775  737.    701.    776. 95%       
#>  4 Area1  2007 Male         1660    237251  702.    668.    737. 95%       
#>  5 Area1  2008 Fema~        1441    248228  582.    552.    614. 95%       
#>  6 Area1  2008 Male         1688    238220  687.    654.    721. 95%       
#>  7 Area1  2009 Fema~        1800    223206  823.    784.    863. 95%       
#>  8 Area1  2009 Male         1310    219212  601.    568.    635. 95%       
#>  9 Area1  2010 Fema~        1278    231722  569.    538.    602. 95%       
#> 10 Area1  2010 Male         1785    229198  788.    751.    825. 95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>,
#> #   method <chr>
    
# calculate separate dsrs for persons for each area and year
df_std %>%
    group_by(area, year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop)) %>%
    group_by(area, year) %>%
    phe_dsr(obs,pop)
#> # A tibble: 20 x 10
#> # Groups:   area [4]
#>    area   year total_count total_pop value lowercl uppercl confidence
#>    <fct> <int>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006        3827    569236  669.    647.    692. 95%       
#>  2 Area1  2007        4064    577954  706.    683.    730. 95%       
#>  3 Area1  2008        3776    594894  633.    612.    655. 95%       
#>  4 Area1  2009        3989    565950  699.    675.    722. 95%       
#>  5 Area1  2010        3927    580170  681.    658.    704. 95%       
#>  6 Area2  2006        3721    553521  648.    626.    671. 95%       
#>  7 Area2  2007        4343    559070  791.    767.    817. 95%       
#>  8 Area2  2008        4086    530565  793.    767.    818. 95%       
#>  9 Area2  2009        3997    547794  721.    697.    745. 95%       
#> 10 Area2  2010        3802    549293  740.    715.    765. 95%       
#> 11 Area3  2006        4464    577663  769.    746.    793. 95%       
#> 12 Area3  2007        3778    540250  742.    718.    767. 95%       
#> 13 Area3  2008        3853    543948  752.    727.    777. 95%       
#> 14 Area3  2009        3428    567772  600.    579.    622. 95%       
#> 15 Area3  2010        4159    547879  726.    703.    750. 95%       
#> 16 Area4  2006        3413    584511  627.    605.    649. 95%       
#> 17 Area4  2007        3656    548666  662.    640.    685. 95%       
#> 18 Area4  2008        3938    552582  693.    669.    717. 95%       
#> 19 Area4  2009        3755    549062  670.    647.    693. 95%       
#> 20 Area4  2010        3021    575179  543.    522.    564. 95%       
#> # ... with 2 more variables: statistic <chr>, method <chr>

Execute phe_smr and phe_isr

INPUT: Unlike the phe_dsr function, there is no default standard or reference data for the phe_smr and phe_isr functions. These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.

The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame.

OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (isr only), the smr or isr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the functions expect vectors by default. The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:

df_ref <- df_std %>%
    filter(year == 2006) %>%
    group_by(ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop))
    
head(df_ref)
#> # A tibble: 6 x 3
#>   ageband   obs    pop
#>     <dbl> <int>  <int>
#> 1       0   912 116087
#> 2       5   746 117691
#> 3      10   929 122302
#> 4      15   928 113389
#> 5      20   699 121968
#> 6      25   623 129101

Here are some example code chunks to demonstrate the phe_smr function and the arguments that can optionally be specified:

# calculate separate smrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_smr(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   observed expected value lowercl uppercl confidence
#>    <fct> <int> <fct>    <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~     2034    1851. 1.10    1.05    1.15  95%       
#>  2 Area1  2006 Male      1793    1973. 0.909   0.867   0.952 95%       
#>  3 Area1  2007 Fema~     2043    1940. 1.05    1.01    1.10  95%       
#>  4 Area1  2007 Male      2021    1959. 1.03    0.987   1.08  95%       
#>  5 Area1  2008 Fema~     1812    2045. 0.886   0.846   0.928 95%       
#>  6 Area1  2008 Male      1964    1963. 1.00    0.957   1.05  95%       
#>  7 Area1  2009 Fema~     2226    1930. 1.15    1.11    1.20  95%       
#>  8 Area1  2009 Male      1763    1934. 0.912   0.870   0.955 95%       
#>  9 Area1  2010 Fema~     1786    1963. 0.910   0.868   0.953 95%       
#> 10 Area1  2010 Male      2141    1988. 1.08    1.03    1.12  95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>,
#> #   method <chr>

# calculate the same smrs by appending the reference data to the data frame
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    phe_smr(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 x 11
#> # Groups:   area, year [20]
#>    area   year sex   observed expected value lowercl uppercl confidence
#>    <fct> <int> <fct>    <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema~     2034    1851. 1.10    1.05    1.15  95%       
#>  2 Area1  2006 Male      1793    1973. 0.909   0.867   0.952 95%       
#>  3 Area1  2007 Fema~     2043    1940. 1.05    1.01    1.10  95%       
#>  4 Area1  2007 Male      2021    1959. 1.03    0.987   1.08  95%       
#>  5 Area1  2008 Fema~     1812    2045. 0.886   0.846   0.928 95%       
#>  6 Area1  2008 Male      1964    1963. 1.00    0.957   1.05  95%       
#>  7 Area1  2009 Fema~     2226    1930. 1.15    1.11    1.20  95%       
#>  8 Area1  2009 Male      1763    1934. 0.912   0.870   0.955 95%       
#>  9 Area1  2010 Fema~     1786    1963. 0.910   0.868   0.953 95%       
#> 10 Area1  2010 Male      2141    1988. 1.08    1.03    1.12  95%       
#> # ... with 30 more rows, and 2 more variables: statistic <chr>,
#> #   method <chr>

# calculate separate smrs for each year and drop metadata columns from output
df_std %>%
    group_by(year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop)) %>%
    group_by(year) %>%
    phe_smr(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 x 6
#>    year observed expected value lowercl uppercl
#>   <int>    <int>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2006    15425   15425  1       0.984   1.02 
#> 2  2007    15841   15077. 1.05    1.03    1.07 
#> 3  2008    15653   14971. 1.05    1.03    1.06 
#> 4  2009    15169   15094. 1.00    0.989   1.02 
#> 5  2010    14909   15237. 0.978   0.963   0.994
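
The yearly output above can be cross-checked by hand. The sketch below assumes the expected count is the sum over age bands of the local population multiplied by the reference rate, and that the limits come from Byar's approximation applied to the observed count; under those assumptions it reproduces the printed 2006 row. The helpers expected_count and smr_ci are hypothetical and are not package functions.

```r
# Expected count: local population times reference rate, summed over age bands
# (illustrative helper, not a package function)
expected_count <- function(pop, refobs, refpop) {
  sum(pop * refobs / refpop)
}

# SMR with limits from Byar's approximation (illustrative helper)
smr_ci <- function(observed, expected, confidence = 0.95) {
  z <- qnorm(1 - (1 - confidence) / 2)
  lower_count <- observed * (1 - 1 / (9 * observed) - z / (3 * sqrt(observed)))^3
  upper_count <- (observed + 1) *
    (1 - 1 / (9 * (observed + 1)) + z / (3 * sqrt(observed + 1)))^3
  c(value   = observed / expected,
    lowercl = lower_count / expected,
    uppercl = upper_count / expected)
}

# 2006 row: the reference data are the 2006 totals, so observed == expected
smr_2006 <- smr_ci(observed = 15425, expected = 15425)
round(smr_2006, 3)   # value 1, lowercl 0.984, uppercl 1.016
```

The value of exactly 1 for 2006 is expected, because the reference data in this example were built from the 2006 totals.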

The phe_isr function works in exactly the same way, but instead of expressing the result as a ratio of the observed to expected counts, it expresses the result as a rate (the ratio multiplied by the reference rate) and also outputs the reference rate. Here are some examples:

# calculate separate isrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_isr(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 x 12
#> # Groups:   area, year [20]
#>    area   year sex   observed expected ref_rate value lowercl uppercl
#>    <fct> <int> <fct>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Fema~     2034    1851.     675.  742.    710.    775.
#>  2 Area1  2006 Male      1793    1973.     675.  613.    585.    643.
#>  3 Area1  2007 Fema~     2043    1940.     675.  711.    680.    742.
#>  4 Area1  2007 Male      2021    1959.     675.  696.    666.    727.
#>  5 Area1  2008 Fema~     1812    2045.     675.  598.    571.    626.
#>  6 Area1  2008 Male      1964    1963.     675.  675.    646.    706.
#>  7 Area1  2009 Fema~     2226    1930.     675.  779.    747.    812.
#>  8 Area1  2009 Male      1763    1934.     675.  616.    587.    645.
#>  9 Area1  2010 Fema~     1786    1963.     675.  614.    586.    643.
#> 10 Area1  2010 Male      2141    1988.     675.  727.    697.    758.
#> # ... with 30 more rows, and 3 more variables: confidence <chr>,
#> #   statistic <chr>, method <chr>

# calculate the same isrs by appending the reference data to the data frame
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    phe_isr(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 x 12
#> # Groups:   area, year [20]
#>    area   year sex   observed expected ref_rate value lowercl uppercl
#>    <fct> <int> <fct>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Fema~     2034    1851.     675.  742.    710.    775.
#>  2 Area1  2006 Male      1793    1973.     675.  613.    585.    643.
#>  3 Area1  2007 Fema~     2043    1940.     675.  711.    680.    742.
#>  4 Area1  2007 Male      2021    1959.     675.  696.    666.    727.
#>  5 Area1  2008 Fema~     1812    2045.     675.  598.    571.    626.
#>  6 Area1  2008 Male      1964    1963.     675.  675.    646.    706.
#>  7 Area1  2009 Fema~     2226    1930.     675.  779.    747.    812.
#>  8 Area1  2009 Male      1763    1934.     675.  616.    587.    645.
#>  9 Area1  2010 Fema~     1786    1963.     675.  614.    586.    643.
#> 10 Area1  2010 Male      2141    1988.     675.  727.    697.    758.
#> # ... with 30 more rows, and 3 more variables: confidence <chr>,
#> #   statistic <chr>, method <chr>

# calculate separate isrs for each year and drop metadata columns from output
df_std %>%
    group_by(year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop)) %>%
    group_by(year) %>%
    phe_isr(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 x 7
#>    year observed expected ref_rate value lowercl uppercl
#>   <int>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#> 1  2006    15425   15425      675.  675.    664.    686.
#> 2  2007    15841   15077.     675.  709.    698.    720.
#> 3  2008    15653   14971.     675.  706.    695.    717.
#> 4  2009    15169   15094.     675.  678.    668.    689.
#> 5  2010    14909   15237.     675.  661.    650.    671.
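
As a cross-check of the yearly output above, the sketch below recomputes the reference rate and the 2007 central value in base R from figures printed in this vignette. It assumes the indirectly standardised rate is the observed/expected ratio multiplied by the crude reference rate; the variable names are illustrative, and the expected count is taken from the rounded printed value, so the result is approximate.

```r
# 2006 all-area totals, taken from the persons dsr output printed earlier
ref_obs_total <- 15425
ref_pop_total <- sum(c(569236, 553521, 577663, 584511))

# Crude reference rate per 100000
ref_rate <- ref_obs_total / ref_pop_total * 100000

# 2007 row: observed and (rounded) expected counts from the output above
observed_2007 <- 15841
expected_2007 <- 15077

isr_2007 <- observed_2007 / expected_2007 * ref_rate
round(c(ref_rate = ref_rate, isr_2007 = isr_2007), 0)   # 675 and 709
```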