This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code included is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail but these can (optionally) be output alongside the statistics or for a more detailed explanation, please see the references section of the function documentation.
This vignette covers the following functions available within the first release of the package (v1.0.8) but has been updated to apply to these functions in their latest release versions. If further functions are added to the package in future releases these will be explained elsewhere.
Function | Type | Description |
---|---|---|
phe_proportion | Non-aggregate | Performs a calculation on each row of data (unless data is grouped) |
phe_rate | Non-aggregate | Performs a calculation on each row of data (unless data is grouped) |
phe_mean | Aggregate | Performs a calculation on each grouping set |
phe_dsr | Aggregate, standardised | Performs a calculation on each grouping set and requires additional reference inputs |
phe_smr | Aggregate, standardised | Performs a calculation on each grouping set and requires additional reference inputs |
phe_isr | Aggregate, standardised | Performs a calculation on each grouping set and requires additional reference inputs |
The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:
df <- data.frame(
area = rep(c("Area1","Area2","Area3","Area4"), 2),
year = rep(2015:2016, each = 4),
obs = sample(100, 2 * 4, replace = TRUE),
pop = sample(100:200, 2 * 4, replace = TRUE))
df
#> area year obs pop
#> 1 Area1 2015 12 176
#> 2 Area2 2015 11 200
#> 3 Area3 2015 94 110
#> 4 Area4 2015 67 165
#> 5 Area1 2016 5 185
#> 6 Area2 2016 52 161
#> 7 Area3 2016 69 164
#> 8 Area4 2016 95 129
INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.
OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.
OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.
Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified
# default proportion
phe_proportion(df, obs, pop)
#> area year obs pop value lowercl uppercl confidence
#> 1 Area1 2015 12 176 0.06818182 0.03942979 0.11538134 95%
#> 2 Area2 2015 11 200 0.05500000 0.03098534 0.09578700 95%
#> 3 Area3 2015 94 110 0.85454545 0.77672261 0.90844078 95%
#> 4 Area4 2015 67 165 0.40606061 0.33409150 0.48230431 95%
#> 5 Area1 2016 5 185 0.02702703 0.01159838 0.06169833 95%
#> 6 Area2 2016 52 161 0.32298137 0.25560301 0.39861020 95%
#> 7 Area3 2016 69 164 0.42073171 0.34783762 0.49725429 95%
#> 8 Area4 2016 95 129 0.73643411 0.65436652 0.80482747 95%
#> statistic method
#> 1 proportion of 1 Wilson
#> 2 proportion of 1 Wilson
#> 3 proportion of 1 Wilson
#> 4 proportion of 1 Wilson
#> 5 proportion of 1 Wilson
#> 6 proportion of 1 Wilson
#> 7 proportion of 1 Wilson
#> 8 proportion of 1 Wilson
# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence=99.8)
#> area year obs pop value lowercl uppercl confidence
#> 1 Area1 2015 12 176 0.06818182 0.029056743 0.15175500 99.8%
#> 2 Area2 2015 11 200 0.05500000 0.022555178 0.12800366 99.8%
#> 3 Area3 2015 94 110 0.85454545 0.722634682 0.92981453 99.8%
#> 4 Area4 2015 67 165 0.40606061 0.296217285 0.52618270 99.8%
#> 5 Area1 2016 5 185 0.02702703 0.007467365 0.09301879 99.8%
#> 6 Area2 2016 52 161 0.32298137 0.221799192 0.44398706 99.8%
#> 7 Area3 2016 69 164 0.42073171 0.309207763 0.54097910 99.8%
#> 8 Area4 2016 95 129 0.73643411 0.603330992 0.83694475 99.8%
#> statistic method
#> 1 proportion of 1 Wilson
#> 2 proportion of 1 Wilson
#> 3 proportion of 1 Wilson
#> 4 proportion of 1 Wilson
#> 5 proportion of 1 Wilson
#> 6 proportion of 1 Wilson
#> 7 proportion of 1 Wilson
#> 8 proportion of 1 Wilson
# specify to output proportions as percentages
phe_proportion(df, obs, pop, multiplier=100)
#> area year obs pop value lowercl uppercl confidence statistic method
#> 1 Area1 2015 12 176 6.818182 3.942979 11.538134 95% percentage Wilson
#> 2 Area2 2015 11 200 5.500000 3.098534 9.578700 95% percentage Wilson
#> 3 Area3 2015 94 110 85.454545 77.672261 90.844078 95% percentage Wilson
#> 4 Area4 2015 67 165 40.606061 33.409150 48.230431 95% percentage Wilson
#> 5 Area1 2016 5 185 2.702703 1.159838 6.169833 95% percentage Wilson
#> 6 Area2 2016 52 161 32.298137 25.560301 39.861020 95% percentage Wilson
#> 7 Area3 2016 69 164 42.073171 34.783762 49.725429 95% percentage Wilson
#> 8 Area4 2016 95 129 73.643411 65.436652 80.482747 95% percentage Wilson
# specify level of detail to output for proportion
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100)
#> area year obs pop value lowercl uppercl confidence statistic
#> 1 Area1 2015 12 176 6.818182 2.9056743 15.175500 99.8% percentage
#> 2 Area2 2015 11 200 5.500000 2.2555178 12.800366 99.8% percentage
#> 3 Area3 2015 94 110 85.454545 72.2634682 92.981453 99.8% percentage
#> 4 Area4 2015 67 165 40.606061 29.6217285 52.618270 99.8% percentage
#> 5 Area1 2016 5 185 2.702703 0.7467365 9.301879 99.8% percentage
#> 6 Area2 2016 52 161 32.298137 22.1799192 44.398706 99.8% percentage
#> 7 Area3 2016 69 164 42.073171 30.9207763 54.097910 99.8% percentage
#> 8 Area4 2016 95 129 73.643411 60.3330992 83.694475 99.8% percentage
#> method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson
# specify level of detail to output for proportion and remove metadata columns
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100, type="standard")
#> area year obs pop value lowercl uppercl
#> 1 Area1 2015 12 176 6.818182 2.9056743 15.175500
#> 2 Area2 2015 11 200 5.500000 2.2555178 12.800366
#> 3 Area3 2015 94 110 85.454545 72.2634682 92.981453
#> 4 Area4 2015 67 165 40.606061 29.6217285 52.618270
#> 5 Area1 2016 5 185 2.702703 0.7467365 9.301879
#> 6 Area2 2016 52 161 32.298137 22.1799192 44.398706
#> 7 Area3 2016 69 164 42.073171 30.9207763 54.097910
#> 8 Area4 2016 95 129 73.643411 60.3330992 83.694475
# default rate
phe_rate(df, obs, pop)
#> area year obs pop value lowercl uppercl confidence statistic
#> 1 Area1 2015 12 176 6818.182 3519.0397 11910.715 95% rate per 100000
#> 2 Area2 2015 11 200 5500.000 2741.8451 9841.619 95% rate per 100000
#> 3 Area3 2015 94 110 85454.545 69054.1930 104575.783 95% rate per 100000
#> 4 Area4 2015 67 165 40606.061 31467.6873 51569.045 95% rate per 100000
#> 5 Area1 2016 5 185 2702.703 877.5602 6307.207 95% rate per 100000
#> 6 Area2 2016 52 161 32298.137 24120.0571 42355.565 95% rate per 100000
#> 7 Area3 2016 69 164 42073.171 32734.0031 53247.072 95% rate per 100000
#> 8 Area4 2016 95 129 73643.411 59580.4102 90026.153 95% rate per 100000
#> method
#> 1 Byars
#> 2 Byars
#> 3 Byars
#> 4 Byars
#> 5 Exact
#> 6 Byars
#> 7 Byars
#> 8 Byars
# specify rate parameters
phe_rate(df, obs, pop, confidence=99.8, multiplier=100)
#> area year obs pop value lowercl uppercl confidence statistic
#> 1 Area1 2015 12 176 6.818182 2.2729409 15.386909 99.8% rate per 100
#> 2 Area2 2015 11 200 5.500000 1.7241851 12.823260 99.8% rate per 100
#> 3 Area3 2015 94 110 85.454545 60.7669262 116.370993 99.8% rate per 100
#> 4 Area4 2015 67 165 40.606061 26.9701671 58.410443 99.8% rate per 100
#> 5 Area1 2016 5 185 2.702703 0.3996604 8.894457 99.8% rate per 100
#> 6 Area2 2016 52 161 32.298137 20.1884062 48.693791 99.8% rate per 100
#> 7 Area3 2016 69 164 42.073171 28.1261491 60.212611 99.8% rate per 100
#> 8 Area4 2016 95 129 73.643411 52.4689388 100.128698 99.8% rate per 100
#> method
#> 1 Byars
#> 2 Byars
#> 3 Byars
#> 4 Byars
#> 5 Exact
#> 6 Byars
#> 7 Byars
#> 8 Byars
# specify rate parameters and reduce columns output and remove metadata columns
phe_rate(df, obs, pop, type="standard", confidence=99.8, multiplier=100)
#> area year obs pop value lowercl uppercl
#> 1 Area1 2015 12 176 6.818182 2.2729409 15.386909
#> 2 Area2 2015 11 200 5.500000 1.7241851 12.823260
#> 3 Area3 2015 94 110 85.454545 60.7669262 116.370993
#> 4 Area4 2015 67 165 40.606061 26.9701671 58.410443
#> 5 Area1 2016 5 185 2.702703 0.3996604 8.894457
#> 6 Area2 2016 52 161 32.298137 20.1884062 48.693791
#> 7 Area3 2016 69 164 42.073171 28.1261491 60.212611
#> 8 Area4 2016 95 129 73.643411 52.4689388 100.128698
These functions can also return aggregate data if the input dataframes are grouped:
# default proportion - grouped
df %>%
group_by(year) %>%
phe_proportion(obs, pop)
#> # A tibble: 2 x 9
#> year obs pop value lowercl uppercl confidence statistic method
#> <int> <int> <int> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#> 1 2015 184 651 0.283 0.249 0.318 95% proportion of 1 Wilson
#> 2 2016 221 639 0.346 0.310 0.384 95% proportion of 1 Wilson
# default rate - grouped
df %>%
group_by(year) %>%
phe_rate(obs, pop)
#> # A tibble: 2 x 9
#> year obs pop value lowercl uppercl confidence statistic method
#> <int> <int> <int> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#> 1 2015 184 651 28264. 24327. 32656. 95% rate per 100000 Byars
#> 2 2016 221 639 34585. 30175. 39458. 95% rate per 100000 Byars
The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.
The df test data generated earlier can be used to demonstrate phe_mean:
INPUT: The phe_mean function take a single data frame as input with a column representing the numbers to be averaged.
OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.
OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.
Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified
# default mean
phe_mean(df,obs)
#> value_sum value_count stdev value lowercl uppercl confidence statistic
#> 1 405 8 37.03642 50.625 19.66178 81.58822 95% mean
#> method
#> 1 Student's t-distribution
# multiple means in a single execution with 99.8% confidence
df %>%
group_by(year) %>%
phe_mean(obs, confidence=0.998)
#> # A tibble: 2 x 10
#> year value_sum value_count stdev value lowercl uppercl confidence statistic
#> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 2015 184 4 41.3 46 -165. 257. 99.8% mean
#> 2 2016 221 4 37.9 55.2 -138. 249. 99.8% mean
#> # ... with 1 more variable: method <chr>
# multiple means in a single execution with 99.8% confidence and data-only output
df %>%
group_by(year) %>%
phe_mean(obs, type = "standard", confidence=0.998)
#> # A tibble: 2 x 7
#> year value_sum value_count stdev value lowercl uppercl
#> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2015 184 4 41.3 46 -165. 257.
#> 2 2016 221 4 37.9 55.2 -138. 249.
The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:
df_std <- data.frame(
area = rep(c("Area1", "Area2", "Area3", "Area4"), each = 19 * 2 * 5),
year = rep(2006:2010, each = 19 * 2),
sex = rep(rep(c("Male", "Female"), each = 19), 5),
ageband = rep(c(0, 5,10,15,20,25,30,35,40,45,
50,55,60,65,70,75,80,85,90), times = 10),
obs = sample(200, 19 * 2 * 5 * 4, replace = TRUE),
pop = sample(10000:20000, 19 * 2 * 5 * 4, replace = TRUE))
head(df_std)
#> area year sex ageband obs pop
#> 1 Area1 2006 Male 0 2 16136
#> 2 Area1 2006 Male 5 153 12824
#> 3 Area1 2006 Male 10 129 19215
#> 4 Area1 2006 Male 15 190 16827
#> 5 Area1 2006 Male 20 25 13011
#> 6 Area1 2006 Male 25 145 13302
INPUT: The minimum input requirement for the phe_dsr function is a single data frame with columns representing the numerators and denominators for each standardisation category. This is sufficient if the data is:
The 2013 European Standard Population is provided within the package in vector form (esp2013) and is used by default by this function. Alternative standard populations can be used but must be provided by the user. When the function joins a standard population vector to the input data frame it does this by position so it is important that the data is sorted accordingly. This is a user responsibility.
The function can also accept standard populations provided as a column within the input data frame.
standard populations provided as a vector - the vector and the input data frame must both contain rows for the same standardisation categories, and both must be sorted, within each grouping set, by these standardisation categories in the same order
standard populations provided as a column within the input data frame - the standard populations can be appended to the input data frame by the user prior to execution of the function - if the data is grouped to generate multiple dsrs then the standard populations will need to be repeated and appended to the data rows for every grouping set.
OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.
OPTIONS: If standard populations are being provided as a column within the input data frame then the user must specify this using the stdpoptype argument as the function expects a vector by default. The function also accepts additional arguments to specify the standard populations, the level of confidence, the multiplier and a reduced level of detail to be output.
Here are some example code chunks to demonstrate the phe_dsr function and the arguments that can optionally be specified
# calculate separate dsrs for each area, year and sex
df_std %>%
group_by(area, year, sex) %>%
phe_dsr(obs, pop)
#> # A tibble: 40 x 11
#> # Groups: area, year [20]
#> area year sex total_count total_pop value lowercl uppercl confidence
#> <fct> <int> <fct> <int> <int> <dbl> <dbl> <dbl> <chr>
#> 1 Area1 2006 Fema~ 1823 290150 637. 607. 668. 95%
#> 2 Area1 2006 Male 1960 272886 761. 724. 798. 95%
#> 3 Area1 2007 Fema~ 1907 278471 708. 674. 742. 95%
#> 4 Area1 2007 Male 1755 269277 649. 617. 681. 95%
#> 5 Area1 2008 Fema~ 1799 285167 627. 596. 659. 95%
#> 6 Area1 2008 Male 1655 272028 744. 706. 783. 95%
#> 7 Area1 2009 Fema~ 2271 283824 838. 802. 876. 95%
#> 8 Area1 2009 Male 2291 277646 827. 791. 864. 95%
#> 9 Area1 2010 Fema~ 1743 285187 606. 575. 637. 95%
#> 10 Area1 2010 Male 1660 251262 624. 591. 658. 95%
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>
# calculate separate dsrs for each area, year and sex and drop metadata fields from output
df_std %>%
group_by(area, year, sex) %>%
phe_dsr(obs, pop, type="standard")
#> # A tibble: 40 x 8
#> # Groups: area, year [20]
#> area year sex total_count total_pop value lowercl uppercl
#> <fct> <int> <fct> <int> <int> <dbl> <dbl> <dbl>
#> 1 Area1 2006 Female 1823 290150 637. 607. 668.
#> 2 Area1 2006 Male 1960 272886 761. 724. 798.
#> 3 Area1 2007 Female 1907 278471 708. 674. 742.
#> 4 Area1 2007 Male 1755 269277 649. 617. 681.
#> 5 Area1 2008 Female 1799 285167 627. 596. 659.
#> 6 Area1 2008 Male 1655 272028 744. 706. 783.
#> 7 Area1 2009 Female 2271 283824 838. 802. 876.
#> 8 Area1 2009 Male 2291 277646 827. 791. 864.
#> 9 Area1 2010 Female 1743 285187 606. 575. 637.
#> 10 Area1 2010 Male 1660 251262 624. 591. 658.
#> # ... with 30 more rows
# calculate same specifying standard population in vector form
df_std %>%
group_by(area, year, sex) %>%
phe_dsr(obs, pop, stdpop = esp2013)
#> # A tibble: 40 x 11
#> # Groups: area, year [20]
#> area year sex total_count total_pop value lowercl uppercl confidence
#> <fct> <int> <fct> <int> <int> <dbl> <dbl> <dbl> <chr>
#> 1 Area1 2006 Fema~ 1823 290150 637. 607. 668. 95%
#> 2 Area1 2006 Male 1960 272886 761. 724. 798. 95%
#> 3 Area1 2007 Fema~ 1907 278471 708. 674. 742. 95%
#> 4 Area1 2007 Male 1755 269277 649. 617. 681. 95%
#> 5 Area1 2008 Fema~ 1799 285167 627. 596. 659. 95%
#> 6 Area1 2008 Male 1655 272028 744. 706. 783. 95%
#> 7 Area1 2009 Fema~ 2271 283824 838. 802. 876. 95%
#> 8 Area1 2009 Male 2291 277646 827. 791. 864. 95%
#> 9 Area1 2010 Fema~ 1743 285187 606. 575. 637. 95%
#> 10 Area1 2010 Male 1660 251262 624. 591. 658. 95%
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>
# calculate the same dsrs by appending the standard populations to the data frame
df_std %>%
mutate(refpop = rep(esp2013,40)) %>%
group_by(area, year, sex) %>%
phe_dsr(obs,pop, stdpop=refpop, stdpoptype="field")
#> # A tibble: 40 x 11
#> # Groups: area, year [20]
#> area year sex total_count total_pop value lowercl uppercl confidence
#> <fct> <int> <fct> <int> <int> <dbl> <dbl> <dbl> <chr>
#> 1 Area1 2006 Fema~ 1823 290150 637. 607. 668. 95%
#> 2 Area1 2006 Male 1960 272886 761. 724. 798. 95%
#> 3 Area1 2007 Fema~ 1907 278471 708. 674. 742. 95%
#> 4 Area1 2007 Male 1755 269277 649. 617. 681. 95%
#> 5 Area1 2008 Fema~ 1799 285167 627. 596. 659. 95%
#> 6 Area1 2008 Male 1655 272028 744. 706. 783. 95%
#> 7 Area1 2009 Fema~ 2271 283824 838. 802. 876. 95%
#> 8 Area1 2009 Male 2291 277646 827. 791. 864. 95%
#> 9 Area1 2010 Fema~ 1743 285187 606. 575. 637. 95%
#> 10 Area1 2010 Male 1660 251262 624. 591. 658. 95%
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>
# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
df_std %>%
filter(ageband <= 70) %>%
group_by(area, year, sex) %>%
phe_dsr(obs, pop, stdpop = esp2013[1:15])
#> # A tibble: 40 x 11
#> # Groups: area, year [20]
#> area year sex total_count total_pop value lowercl uppercl confidence
#> <fct> <int> <fct> <int> <int> <dbl> <dbl> <dbl> <chr>
#> 1 Area1 2006 Fema~ 1604 231777 666. 633. 699. 95%
#> 2 Area1 2006 Male 1510 214775 760. 720. 800. 95%
#> 3 Area1 2007 Fema~ 1546 214213 730. 693. 768. 95%
#> 4 Area1 2007 Male 1355 218590 626. 593. 661. 95%
#> 5 Area1 2008 Fema~ 1452 231978 627. 595. 661. 95%
#> 6 Area1 2008 Male 1462 202204 791. 749. 834. 95%
#> 7 Area1 2009 Fema~ 1916 223715 877. 837. 918. 95%
#> 8 Area1 2009 Male 1667 221576 792. 754. 831. 95%
#> 9 Area1 2010 Fema~ 1233 232456 547. 515. 579. 95%
#> 10 Area1 2010 Male 1086 204182 579. 544. 615. 95%
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>
# calculate separate dsrs for persons for each area and year)
df_std %>%
group_by(area, year, ageband) %>%
summarise(obs = sum(obs),
pop = sum(pop)) %>%
group_by(area, year) %>%
phe_dsr(obs,pop)
#> # A tibble: 20 x 10
#> # Groups: area [4]
#> area year total_count total_pop value lowercl uppercl confidence statistic
#> <fct> <int> <int> <int> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 Area1 2006 3783 563036 697. 674. 721. 95% dsr per ~
#> 2 Area1 2007 3662 547748 676. 653. 700. 95% dsr per ~
#> 3 Area1 2008 3454 557195 651. 628. 675. 95% dsr per ~
#> 4 Area1 2009 4562 561470 820. 795. 845. 95% dsr per ~
#> 5 Area1 2010 3403 536449 595. 573. 617. 95% dsr per ~
#> 6 Area2 2006 3906 550714 730. 706. 755. 95% dsr per ~
#> 7 Area2 2007 3761 577945 685. 662. 709. 95% dsr per ~
#> 8 Area2 2008 3864 582320 697. 674. 721. 95% dsr per ~
#> 9 Area2 2009 4243 548600 772. 748. 797. 95% dsr per ~
#> 10 Area2 2010 3442 590100 560. 540. 581. 95% dsr per ~
#> 11 Area3 2006 4212 547767 834. 807. 861. 95% dsr per ~
#> 12 Area3 2007 3686 562217 696. 672. 720. 95% dsr per ~
#> 13 Area3 2008 4169 562967 758. 733. 782. 95% dsr per ~
#> 14 Area3 2009 3809 562230 671. 648. 693. 95% dsr per ~
#> 15 Area3 2010 3792 581921 662. 640. 685. 95% dsr per ~
#> 16 Area4 2006 3658 575793 642. 619. 664. 95% dsr per ~
#> 17 Area4 2007 3812 579170 700. 676. 724. 95% dsr per ~
#> 18 Area4 2008 3412 553905 608. 586. 630. 95% dsr per ~
#> 19 Area4 2009 3624 573619 641. 619. 663. 95% dsr per ~
#> 20 Area4 2010 3885 572460 691. 668. 715. 95% dsr per ~
#> # ... with 1 more variable: method <chr>
INPUT: Unlike the phe_dsr function, there is no default standard or reference data for the phe_smr and phe_isr functions. These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.
The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame:
reference data provided as a data frame or as vectors - the data frame/vectors and the input data frame must both contain rows for the same standardisation categories, and both must be sorted, within each grouping set, by these standardisation categories in the same order.
reference data provided as columns within the input data frame - the reference numerators and denominators can be appended to the input data frame prior to execution of the function - if the data is grouped to generate multiple smrs/isrs then the reference data will need to be repeated and appended to the data rows for every grouping set.
OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (isr only), the smr or isr, the lower 95% confidence limit, and the upper 95% confidence limit, the confidence level, the statistic name and the method.
OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the function expects vectors by default. The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.
The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:
df_ref <- df_std %>%
filter(year == 2006) %>%
group_by(ageband) %>%
summarise(obs = sum(obs),
pop = sum(pop))
head(df_ref)
#> # A tibble: 6 x 3
#> ageband obs pop
#> <dbl> <int> <int>
#> 1 0 730 125561
#> 2 5 862 118008
#> 3 10 552 122253
#> 4 15 879 104675
#> 5 20 765 118936
#> 6 25 777 118799
Here are some example code chunks to demonstrate the phe_smr function and the arguments that can optionally be specified
# calculate separate smrs for each area, year and sex
df_std %>%
group_by(area, year, sex) %>%
phe_smr(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 x 11
#> # Groups: area, year [20]
#> area year sex observed expected value lowercl uppercl confidence
#> <fct> <int> <fct> <int> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 Area1 2006 Fema~ 1823 2047. 0.890 0.850 0.932 95%
#> 2 Area1 2006 Male 1960 1888. 1.04 0.993 1.09 95%
#> 3 Area1 2007 Fema~ 1907 1927. 0.990 0.946 1.03 95%
#> 4 Area1 2007 Male 1755 1875. 0.936 0.893 0.981 95%
#> 5 Area1 2008 Fema~ 1799 1998. 0.900 0.859 0.943 95%
#> 6 Area1 2008 Male 1655 1883. 0.879 0.837 0.922 95%
#> 7 Area1 2009 Fema~ 2271 1978. 1.15 1.10 1.20 95%
#> 8 Area1 2009 Male 2291 1948. 1.18 1.13 1.23 95%
#> 9 Area1 2010 Fema~ 1743 1995. 0.874 0.833 0.916 95%
#> 10 Area1 2010 Male 1660 1759. 0.944 0.899 0.990 95%
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>
# calculate the same smrs by appending the reference data to the data frame
df_std %>%
mutate(refobs = rep(df_ref$obs,40),
refpop = rep(df_ref$pop,40)) %>%
group_by(area, year, sex) %>%
phe_smr(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 x 11
#> # Groups: area, year [20]
#> area year sex observed expected value lowercl uppercl confidence
#> <fct> <int> <fct> <int> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 Area1 2006 Fema~ 1823 2047. 0.890 0.850 0.932 95%
#> 2 Area1 2006 Male 1960 1888. 1.04 0.993 1.09 95%
#> 3 Area1 2007 Fema~ 1907 1927. 0.990 0.946 1.03 95%
#> 4 Area1 2007 Male 1755 1875. 0.936 0.893 0.981 95%
#> 5 Area1 2008 Fema~ 1799 1998. 0.900 0.859 0.943 95%
#> 6 Area1 2008 Male 1655 1883. 0.879 0.837 0.922 95%
#> 7 Area1 2009 Fema~ 2271 1978. 1.15 1.10 1.20 95%
#> 8 Area1 2009 Male 2291 1948. 1.18 1.13 1.23 95%
#> 9 Area1 2010 Fema~ 1743 1995. 0.874 0.833 0.916 95%
#> 10 Area1 2010 Male 1660 1759. 0.944 0.899 0.990 95%
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>
# calculate separate smrs for each year and drop metadata columns from output
df_std %>%
group_by(year, ageband) %>%
summarise(obs = sum(obs),
pop = sum(pop)) %>%
group_by(year) %>%
phe_smr(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 x 6
#> year observed expected value lowercl uppercl
#> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2006 15559 15559 1 0.984 1.02
#> 2 2007 14921 15774. 0.946 0.931 0.961
#> 3 2008 14899 15732. 0.947 0.932 0.962
#> 4 2009 16238 15724. 1.03 1.02 1.05
#> 5 2010 14522 15928. 0.912 0.897 0.927
The phe_isr function works exactly the same way but instead of expressing the result as a ratio of the observed and expected rates the result is expressed as a rate and the reference rate is also provided. Here are some examples:
# calculate separate isrs for each area, year and sex
df_std %>%
group_by(area, year, sex) %>%
phe_isr(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 x 12
#> # Groups: area, year [20]
#> area year sex observed expected ref_rate value lowercl uppercl confidence
#> <fct> <int> <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 Area1 2006 Fema~ 1823 2047. 695. 619. 591. 648. 95%
#> 2 Area1 2006 Male 1960 1888. 695. 722. 690. 755. 95%
#> 3 Area1 2007 Fema~ 1907 1927. 695. 688. 658. 720. 95%
#> 4 Area1 2007 Male 1755 1875. 695. 651. 621. 682. 95%
#> 5 Area1 2008 Fema~ 1799 1998. 695. 626. 598. 656. 95%
#> 6 Area1 2008 Male 1655 1883. 695. 611. 582. 641. 95%
#> 7 Area1 2009 Fema~ 2271 1978. 695. 798. 766. 832. 95%
#> 8 Area1 2009 Male 2291 1948. 695. 818. 785. 852. 95%
#> 9 Area1 2010 Fema~ 1743 1995. 695. 608. 579. 637. 95%
#> 10 Area1 2010 Male 1660 1759. 695. 656. 625. 689. 95%
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>
# calculate the same isrs by appending the reference data to the data frame
df_std %>%
mutate(refobs = rep(df_ref$obs,40),
refpop = rep(df_ref$pop,40)) %>%
group_by(area, year, sex) %>%
phe_isr(obs, pop, refobs, refpop, refpoptype="field")
#> # A tibble: 40 x 12
#> # Groups: area, year [20]
#> area year sex observed expected ref_rate value lowercl uppercl confidence
#> <fct> <int> <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 Area1 2006 Fema~ 1823 2047. 695. 619. 591. 648. 95%
#> 2 Area1 2006 Male 1960 1888. 695. 722. 690. 755. 95%
#> 3 Area1 2007 Fema~ 1907 1927. 695. 688. 658. 720. 95%
#> 4 Area1 2007 Male 1755 1875. 695. 651. 621. 682. 95%
#> 5 Area1 2008 Fema~ 1799 1998. 695. 626. 598. 656. 95%
#> 6 Area1 2008 Male 1655 1883. 695. 611. 582. 641. 95%
#> 7 Area1 2009 Fema~ 2271 1978. 695. 798. 766. 832. 95%
#> 8 Area1 2009 Male 2291 1948. 695. 818. 785. 852. 95%
#> 9 Area1 2010 Fema~ 1743 1995. 695. 608. 579. 637. 95%
#> 10 Area1 2010 Male 1660 1759. 695. 656. 625. 689. 95%
#> # ... with 30 more rows, and 2 more variables: statistic <chr>, method <chr>
# calculate separate isrs for each year and drop metadata columns from output
df_std %>%
group_by(year, ageband) %>%
summarise(obs = sum(obs),
pop = sum(pop)) %>%
group_by(year) %>%
phe_isr(obs, pop, df_ref$obs, df_ref$pop, type="standard")
#> # A tibble: 5 x 7
#> year observed expected ref_rate value lowercl uppercl
#> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2006 15559 15559 695. 695. 685. 706.
#> 2 2007 14921 15774. 695. 658. 647. 668.
#> 3 2008 14899 15732. 695. 659. 648. 669.
#> 4 2009 16238 15724. 695. 718. 707. 729.
#> 5 2010 14522 15928. 695. 634. 624. 644.