This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code included is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail. The name of the method used can optionally be output alongside the statistics; for a more detailed explanation, please see the references section of the function documentation.
library(PHEindicatormethods)
library(dplyr)
This vignette covers the following functions available within the first release of the package (v1.0.8) but has been updated to apply to these functions in their latest release versions. If further functions are added to the package in future releases these will be explained elsewhere.
Function | Type | Description |
---|---|---|
phe_proportion | Non-aggregate | Performs a calculation on each row of data (unless data is grouped) |
phe_rate | Non-aggregate | Performs a calculation on each row of data (unless data is grouped) |
phe_mean | Aggregate | Performs a calculation on each grouping set |
phe_dsr | Aggregate, standardised | Performs a calculation on each grouping set and requires additional reference inputs |
calculate_ISRatio | Aggregate, standardised | Performs a calculation on each grouping set and requires additional reference inputs |
calculate_ISRate | Aggregate, standardised | Performs a calculation on each grouping set and requires additional reference inputs |
The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:
df <- data.frame(
  area = rep(c("Area1","Area2","Area3","Area4"), 2),
  year = rep(2015:2016, each = 4),
  obs = sample(100, 2 * 4, replace = TRUE),
  pop = sample(100:200, 2 * 4, replace = TRUE))
df
#> area year obs pop
#> 1 Area1 2015 10 135
#> 2 Area2 2015 38 118
#> 3 Area3 2015 100 118
#> 4 Area4 2015 52 135
#> 5 Area1 2016 78 109
#> 6 Area2 2016 17 192
#> 7 Area3 2016 76 124
#> 8 Area4 2016 71 147
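Because the chunk above draws values with sample() and no seed is set, rerunning it will give different numbers from those printed. One option for following the later examples with identical figures is to hard-code the printed values instead. This is only a sketch for reproducibility, not part of the package documentation:

```r
# Rebuild the example data from the values printed above (no random sampling)
df <- data.frame(
  area = rep(c("Area1", "Area2", "Area3", "Area4"), 2),
  year = rep(2015:2016, each = 4),
  obs  = c(10L, 38L, 100L, 52L, 78L, 17L, 76L, 71L),
  pop  = c(135L, 118L, 118L, 135L, 109L, 192L, 124L, 147L)
)

# Inspect the structure: 8 rows, one per area and year
str(df)
```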
INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.
OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.
OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.
Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified:
# default proportion
phe_proportion(df, obs, pop)
#> area year obs pop value lowercl uppercl confidence statistic
#> 1 Area1 2015 10 135 0.07407407 0.04073054 0.1309866 95% proportion of 1
#> 2 Area2 2015 38 118 0.32203390 0.24448837 0.4108014 95% proportion of 1
#> 3 Area3 2015 100 118 0.84745763 0.77172806 0.9012777 95% proportion of 1
#> 4 Area4 2015 52 135 0.38518519 0.30735355 0.4693702 95% proportion of 1
#> 5 Area1 2016 78 109 0.71559633 0.62469702 0.7918166 95% proportion of 1
#> 6 Area2 2016 17 192 0.08854167 0.05601543 0.1372095 95% proportion of 1
#> 7 Area3 2016 76 124 0.61290323 0.52500839 0.6940129 95% proportion of 1
#> 8 Area4 2016 71 147 0.48299320 0.40367960 0.5631730 95% proportion of 1
#> method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson
# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence=99.8)
#> area year obs pop value lowercl uppercl confidence statistic
#> 1 Area1 2015 10 135 0.07407407 0.02925417 0.1751708 99.8% proportion of 1
#> 2 Area2 2015 38 118 0.32203390 0.20681397 0.4639022 99.8% proportion of 1
#> 3 Area3 2015 100 118 0.84745763 0.71968271 0.9232048 99.8% proportion of 1
#> 4 Area4 2015 52 135 0.38518519 0.26746001 0.5180806 99.8% proportion of 1
#> 5 Area1 2016 78 109 0.71559633 0.56901776 0.8274410 99.8% proportion of 1
#> 6 Area2 2016 17 192 0.08854167 0.04320031 0.1728733 99.8% proportion of 1
#> 7 Area3 2016 76 124 0.61290323 0.47433068 0.7353294 99.8% proportion of 1
#> 8 Area4 2016 71 147 0.48299320 0.36060673 0.6074545 99.8% proportion of 1
#> method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson
# specify to output proportions as percentages
phe_proportion(df, obs, pop, multiplier=100)
#> area year obs pop value lowercl uppercl confidence statistic method
#> 1 Area1 2015 10 135 7.407407 4.073054 13.09866 95% percentage Wilson
#> 2 Area2 2015 38 118 32.203390 24.448837 41.08014 95% percentage Wilson
#> 3 Area3 2015 100 118 84.745763 77.172806 90.12777 95% percentage Wilson
#> 4 Area4 2015 52 135 38.518519 30.735355 46.93702 95% percentage Wilson
#> 5 Area1 2016 78 109 71.559633 62.469702 79.18166 95% percentage Wilson
#> 6 Area2 2016 17 192 8.854167 5.601543 13.72095 95% percentage Wilson
#> 7 Area3 2016 76 124 61.290323 52.500839 69.40129 95% percentage Wilson
#> 8 Area4 2016 71 147 48.299320 40.367960 56.31730 95% percentage Wilson
# specify confidence level and output proportions as percentages
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100)
#> area year obs pop value lowercl uppercl confidence statistic method
#> 1 Area1 2015 10 135 7.407407 2.925417 17.51708 99.8% percentage Wilson
#> 2 Area2 2015 38 118 32.203390 20.681397 46.39022 99.8% percentage Wilson
#> 3 Area3 2015 100 118 84.745763 71.968271 92.32048 99.8% percentage Wilson
#> 4 Area4 2015 52 135 38.518519 26.746001 51.80806 99.8% percentage Wilson
#> 5 Area1 2016 78 109 71.559633 56.901776 82.74410 99.8% percentage Wilson
#> 6 Area2 2016 17 192 8.854167 4.320031 17.28733 99.8% percentage Wilson
#> 7 Area3 2016 76 124 61.290323 47.433068 73.53294 99.8% percentage Wilson
#> 8 Area4 2016 71 147 48.299320 36.060673 60.74545 99.8% percentage Wilson
# specify confidence level and multiplier and remove metadata columns from output
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100, type="standard")
#> area year obs pop value lowercl uppercl
#> 1 Area1 2015 10 135 7.407407 2.925417 17.51708
#> 2 Area2 2015 38 118 32.203390 20.681397 46.39022
#> 3 Area3 2015 100 118 84.745763 71.968271 92.32048
#> 4 Area4 2015 52 135 38.518519 26.746001 51.80806
#> 5 Area1 2016 78 109 71.559633 56.901776 82.74410
#> 6 Area2 2016 17 192 8.854167 4.320031 17.28733
#> 7 Area3 2016 76 124 61.290323 47.433068 73.53294
#> 8 Area4 2016 71 147 48.299320 36.060673 60.74545
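The method column in the outputs above reports "Wilson". As a rough illustration of what that interval involves, the following base R sketch computes a Wilson score interval by hand. The helper name wilson_ci is hypothetical and is not part of PHEindicatormethods; the package's own implementation may differ in detail.

```r
# Wilson score interval for a proportion x / n (base R only).
# wilson_ci is an illustrative helper, not a package function.
wilson_ci <- function(x, n, confidence = 0.95) {
  z <- qnorm(1 - (1 - confidence) / 2)      # two-sided normal quantile
  p <- x / n
  centre <- 2 * x + z^2
  spread <- z * sqrt(z^2 + 4 * x * (1 - p))
  denom  <- 2 * (n + z^2)
  c(value   = p,
    lowercl = (centre - spread) / denom,
    uppercl = (centre + spread) / denom)
}

# Area1, 2015 in the example data: 10 events in a population of 135
wilson_ci(10, 135)
```

For this row the sketch gives limits of about 0.0407 and 0.1310, in line with the first row of the default phe_proportion output above.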
# default rate
phe_rate(df, obs, pop)
#> area year obs pop value lowercl uppercl confidence statistic
#> 1 Area1 2015 10 135 7407.407 3546.259 13623.30 95% rate per 100000
#> 2 Area2 2015 38 118 32203.390 22786.160 44202.91 95% rate per 100000
#> 3 Area3 2015 100 118 84745.763 68950.937 103074.52 95% rate per 100000
#> 4 Area4 2015 52 135 38518.519 28765.401 50512.93 95% rate per 100000
#> 5 Area1 2016 78 109 71559.633 56562.915 89310.80 95% rate per 100000
#> 6 Area2 2016 17 192 8854.167 5154.936 14177.14 95% rate per 100000
#> 7 Area3 2016 76 124 61290.323 48288.012 76714.98 95% rate per 100000
#> 8 Area4 2016 71 147 48299.320 37720.596 60923.88 95% rate per 100000
#> method
#> 1 Byars
#> 2 Byars
#> 3 Byars
#> 4 Byars
#> 5 Byars
#> 6 Byars
#> 7 Byars
#> 8 Byars
# specify rate parameters
phe_rate(df, obs, pop, confidence=99.8, multiplier=100)
#> area year obs pop value lowercl uppercl confidence statistic
#> 1 Area1 2015 10 135 7.407407 2.160236 17.92128 99.8% rate per 100
#> 2 Area2 2015 38 118 32.203390 18.411842 51.86901 99.8% rate per 100
#> 3 Area3 2015 100 118 84.745763 60.935332 114.35900 99.8% rate per 100
#> 4 Area4 2015 52 135 38.518519 24.076544 58.07185 99.8% rate per 100
#> 5 Area1 2016 78 109 71.559633 49.089484 100.32861 99.8% rate per 100
#> 6 Area2 2016 17 192 8.854167 3.641033 17.72878 99.8% rate per 100
#> 7 Area3 2016 76 124 61.290323 41.821774 86.29739 99.8% rate per 100
#> 8 Area4 2016 71 147 48.299320 32.488672 68.78563 99.8% rate per 100
#> method
#> 1 Byars
#> 2 Byars
#> 3 Byars
#> 4 Byars
#> 5 Byars
#> 6 Byars
#> 7 Byars
#> 8 Byars
# specify rate parameters and remove metadata columns from output
phe_rate(df, obs, pop, type="standard", confidence=99.8, multiplier=100)
#> area year obs pop value lowercl uppercl
#> 1 Area1 2015 10 135 7.407407 2.160236 17.92128
#> 2 Area2 2015 38 118 32.203390 18.411842 51.86901
#> 3 Area3 2015 100 118 84.745763 60.935332 114.35900
#> 4 Area4 2015 52 135 38.518519 24.076544 58.07185
#> 5 Area1 2016 78 109 71.559633 49.089484 100.32861
#> 6 Area2 2016 17 192 8.854167 3.641033 17.72878
#> 7 Area3 2016 76 124 61.290323 41.821774 86.29739
#> 8 Area4 2016 71 147 48.299320 32.488672 68.78563
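The rate outputs above report "Byars" as the method. The sketch below shows Byar's approximation to the Poisson confidence interval in base R. The helper name byars_rate_ci is hypothetical, and the sketch applies the approximation to every count, whereas implementations commonly switch to an exact Poisson method for small counts.

```r
# Byar's approximation to the Poisson confidence interval for a count x,
# expressed as a rate per `multiplier` population.
# byars_rate_ci is an illustrative helper, not a package function.
byars_rate_ci <- function(x, n, confidence = 0.95, multiplier = 100000) {
  z <- qnorm(1 - (1 - confidence) / 2)
  lower <- x * (1 - 1 / (9 * x) - z / (3 * sqrt(x)))^3
  upper <- (x + 1) * (1 - 1 / (9 * (x + 1)) + z / (3 * sqrt(x + 1)))^3
  c(value = x / n, lowercl = lower / n, uppercl = upper / n) * multiplier
}

# Area2, 2015 in the example data: 38 events in a population of 118
byars_rate_ci(38, 118)
```

For this row the limits come out at roughly 22786 and 44203 per 100,000, matching the second row of the default phe_rate output above.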
These functions can also return aggregate data if the input data frames are grouped:
# default proportion - grouped
df %>%
  group_by(year) %>%
  phe_proportion(obs, pop)
#> # A tibble: 2 × 9
#> # Groups: year [2]
#> year obs pop value lowercl uppercl confidence statistic method
#> <int> <int> <int> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#> 1 2015 200 506 0.395 0.354 0.438 95% proportion of 1 Wilson
#> 2 2016 242 572 0.423 0.383 0.464 95% proportion of 1 Wilson
# default rate - grouped
df %>%
  group_by(year) %>%
  phe_rate(obs, pop)
#> # A tibble: 2 × 9
#> # Groups: year [2]
#> year obs pop value lowercl uppercl confidence statistic method
#> <int> <int> <int> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#> 1 2015 200 506 39526. 34237. 45400. 95% rate per 100000 Byars
#> 2 2016 242 572 42308. 37145. 47988. 95% rate per 100000 Byars
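The grouped outputs above show one row per year, with obs and pop summed within each group before the statistic is calculated. A minimal base R sketch of that aggregation step, using the values printed for df earlier and not the package's own code, might look like this:

```r
# Example data as printed earlier in this vignette
df <- data.frame(
  year = rep(2015:2016, each = 4),
  obs  = c(10, 38, 100, 52, 78, 17, 76, 71),
  pop  = c(135, 118, 118, 135, 109, 192, 124, 147)
)

# Sum numerators and denominators within each year, then divide
totals <- aggregate(cbind(obs, pop) ~ year, data = df, FUN = sum)
totals$value <- totals$obs / totals$pop
totals
```

The totals (200 of 506 for 2015 and 242 of 572 for 2016) and the resulting proportions agree with the grouped phe_proportion output above.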
The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.
The df test data generated earlier can be used to demonstrate phe_mean:
INPUT: The phe_mean function takes a single data frame as input with a column representing the numbers to be averaged.
OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.
OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.
Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified:
# default mean
phe_mean(df,obs)
#> value_sum value_count stdev value lowercl uppercl confidence statistic
#> 1 442 8 31.66228 55.25 28.77967 81.72033 95% mean
#> method
#> 1 Student's t-distribution
# multiple means in a single execution with 99.8% confidence
df %>%
  group_by(year) %>%
  phe_mean(obs, confidence=0.998)
#> # A tibble: 2 × 10
#> # Groups: year [2]
#> year value_sum value_count stdev value lowercl uppercl confi…¹ stati…² method
#> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#> 1 2015 200 4 37.6 50 -142. 242. 99.8% mean Stude…
#> 2 2016 242 4 29.1 60.5 -88.4 209. 99.8% mean Stude…
#> # … with abbreviated variable names ¹confidence, ²statistic
# multiple means in a single execution with 99.8% confidence and data-only output
df %>%
  group_by(year) %>%
  phe_mean(obs, type = "standard", confidence=0.998)
#> # A tibble: 2 × 7
#> # Groups: year [2]
#> year value_sum value_count stdev value lowercl uppercl
#> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2015 200 4 37.6 50 -142. 242.
#> 2 2016 242 4 29.1 60.5 -88.4 209.
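The method reported for phe_mean is Student's t-distribution. As a sketch of that calculation in base R, the hypothetical helper mean_ci below (not a package function) reproduces the ungrouped figures from the obs values printed for df earlier:

```r
# obs values as printed for df earlier in this vignette
obs <- c(10, 38, 100, 52, 78, 17, 76, 71)

# Confidence interval for a mean using Student's t-distribution.
# mean_ci is an illustrative helper, not a package function.
mean_ci <- function(x, confidence = 0.95) {
  n <- length(x)
  t_crit <- qt(1 - (1 - confidence) / 2, df = n - 1)
  half_width <- t_crit * sd(x) / sqrt(n)
  c(value   = mean(x),
    lowercl = mean(x) - half_width,
    uppercl = mean(x) + half_width)
}

mean_ci(obs)
```

This gives a mean of 55.25 with limits of about 28.78 and 81.72, as in the default output above. With only four values per year, the 99.8% critical value on three degrees of freedom is large, which is why the grouped examples show negative lower limits.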
The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:
df_std <- data.frame(
  area = rep(c("Area1", "Area2", "Area3", "Area4"), each = 19 * 2 * 5),
  year = rep(2006:2010, each = 19 * 2),
  sex = rep(rep(c("Male", "Female"), each = 19), 5),
  ageband = rep(c(0, 5,10,15,20,25,30,35,40,45,
                  50,55,60,65,70,75,80,85,90), times = 10),
  obs = sample(200, 19 * 2 * 5 * 4, replace = TRUE),
  pop = sample(10000:20000, 19 * 2 * 5 * 4, replace = TRUE))
head(df_std)
#> area year sex ageband obs pop
#> 1 Area1 2006 Male 0 177 14312
#> 2 Area1 2006 Male 5 24 16507
#> 3 Area1 2006 Male 10 177 15590
#> 4 Area1 2006 Male 15 90 17854
#> 5 Area1 2006 Male 20 144 12624
#> 6 Area1 2006 Male 25 13 18162
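INPUT: The minimum input requirement for the phe_dsr function is a single data frame with columns representing the numerators and denominators for each standardisation category. This is sufficient if the data contains rows for the same standardisation categories as the default standard population described below and is sorted in the same order.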
The 2013 European Standard Population is provided within the package in vector form (esp2013) and is used by default by this function. Alternative standard populations can be used but must be provided by the user. When the function joins a standard population vector to the input data frame it does this by position so it is important that the data is sorted accordingly. This is a user responsibility.
The function can also accept standard populations provided as a column within the input data frame.
- Standard populations provided as a vector: the vector and the input data frame must both contain rows for the same standardisation categories, and both must be sorted, within each grouping set, by these standardisation categories in the same order.
- Standard populations provided as a column within the input data frame: the standard populations can be appended to the input data frame by the user prior to execution of the function. If the data is grouped to generate multiple dsrs then the standard populations will need to be repeated and appended to the data rows for every grouping set.
OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.
OPTIONS: If standard populations are being provided as a column within the input data frame then the user must specify this using the stdpoptype argument as the function expects a vector by default. The function also accepts additional arguments to specify the standard populations, the level of confidence, the multiplier and a reduced level of detail to be output.
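To illustrate why the sort order matters when a standard population vector is joined by position, here is a base R sketch of the directly standardised rate point estimate. The three age bands, the counts and the weights are all hypothetical and chosen only to keep the arithmetic easy to follow. The helper dsr_point_estimate is not a package function, and the sketch omits confidence limits.

```r
# Hypothetical three-band example (not real data, not esp2013)
obs    <- c(5, 10, 20)        # events per age band, youngest first
pop    <- c(1000, 2000, 1000) # population per age band
stdpop <- c(5000, 3000, 2000) # standard population weights, same order

# Weighted average of age-specific rates, scaled by the multiplier.
# dsr_point_estimate is an illustrative helper, not a package function.
dsr_point_estimate <- function(obs, pop, stdpop, multiplier = 100000) {
  sum(stdpop * obs / pop) / sum(stdpop) * multiplier
}

# Data and weights aligned by position
dsr_sorted <- dsr_point_estimate(obs, pop, stdpop)

# Same data sorted oldest first while the weights stay youngest first
dsr_misordered <- dsr_point_estimate(rev(obs), rev(pop), stdpop)

c(sorted = dsr_sorted, misordered = dsr_misordered)
```

The aligned data gives 800 per 100,000, while the misordered data gives 1250 per 100,000 without any error or warning, which is why sorting is left as a user responsibility.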
Here are some example code chunks to demonstrate the phe_dsr function and the arguments that can optionally be specified:
# calculate separate dsrs for each area, year and sex
df_std %>%
  group_by(area, year, sex) %>%
  phe_dsr(obs, pop)
#> # A tibble: 40 × 11
#> # Groups: area, year, sex [40]
#> area year sex total_count total_…¹ value lowercl uppercl confi…² stati…³
#> <chr> <int> <chr> <int> <int> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 Area1 2006 Female 2024 288172 685. 653. 718. 95% dsr pe…
#> 2 Area1 2006 Male 1761 289012 627. 596. 660. 95% dsr pe…
#> 3 Area1 2007 Female 2393 287928 845. 809. 882. 95% dsr pe…
#> 4 Area1 2007 Male 1844 287266 658. 627. 691. 95% dsr pe…
#> 5 Area1 2008 Female 1751 288418 611. 580. 642. 95% dsr pe…
#> 6 Area1 2008 Male 2081 297307 771. 736. 808. 95% dsr pe…
#> 7 Area1 2009 Female 1846 292801 614. 585. 645. 95% dsr pe…
#> 8 Area1 2009 Male 1635 283820 568. 538. 598. 95% dsr pe…
#> 9 Area1 2010 Female 2093 278420 792. 756. 829. 95% dsr pe…
#> 10 Area1 2010 Male 2184 283341 781. 745. 818. 95% dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> # names ¹total_pop, ²confidence, ³statistic
# calculate separate dsrs for each area, year and sex and drop metadata fields from output
df_std %>%
  group_by(area, year, sex) %>%
  phe_dsr(obs, pop, type="standard")
#> # A tibble: 40 × 8
#> # Groups: area, year, sex [40]
#> area year sex total_count total_pop value lowercl uppercl
#> <chr> <int> <chr> <int> <int> <dbl> <dbl> <dbl>
#> 1 Area1 2006 Female 2024 288172 685. 653. 718.
#> 2 Area1 2006 Male 1761 289012 627. 596. 660.
#> 3 Area1 2007 Female 2393 287928 845. 809. 882.
#> 4 Area1 2007 Male 1844 287266 658. 627. 691.
#> 5 Area1 2008 Female 1751 288418 611. 580. 642.
#> 6 Area1 2008 Male 2081 297307 771. 736. 808.
#> 7 Area1 2009 Female 1846 292801 614. 585. 645.
#> 8 Area1 2009 Male 1635 283820 568. 538. 598.
#> 9 Area1 2010 Female 2093 278420 792. 756. 829.
#> 10 Area1 2010 Male 2184 283341 781. 745. 818.
#> # … with 30 more rows
# calculate same specifying standard population in vector form
df_std %>%
  group_by(area, year, sex) %>%
  phe_dsr(obs, pop, stdpop = esp2013)
#> # A tibble: 40 × 11
#> # Groups: area, year, sex [40]
#> area year sex total_count total_…¹ value lowercl uppercl confi…² stati…³
#> <chr> <int> <chr> <int> <int> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 Area1 2006 Female 2024 288172 685. 653. 718. 95% dsr pe…
#> 2 Area1 2006 Male 1761 289012 627. 596. 660. 95% dsr pe…
#> 3 Area1 2007 Female 2393 287928 845. 809. 882. 95% dsr pe…
#> 4 Area1 2007 Male 1844 287266 658. 627. 691. 95% dsr pe…
#> 5 Area1 2008 Female 1751 288418 611. 580. 642. 95% dsr pe…
#> 6 Area1 2008 Male 2081 297307 771. 736. 808. 95% dsr pe…
#> 7 Area1 2009 Female 1846 292801 614. 585. 645. 95% dsr pe…
#> 8 Area1 2009 Male 1635 283820 568. 538. 598. 95% dsr pe…
#> 9 Area1 2010 Female 2093 278420 792. 756. 829. 95% dsr pe…
#> 10 Area1 2010 Male 2184 283341 781. 745. 818. 95% dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> # names ¹total_pop, ²confidence, ³statistic
# calculate the same dsrs by appending the standard populations to the data frame
df_std %>%
  mutate(refpop = rep(esp2013,40)) %>%
  group_by(area, year, sex) %>%
  phe_dsr(obs,pop, stdpop=refpop, stdpoptype="field")
#> # A tibble: 40 × 11
#> # Groups: area, year, sex [40]
#> area year sex total_count total_…¹ value lowercl uppercl confi…² stati…³
#> <chr> <int> <chr> <int> <int> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 Area1 2006 Female 2024 288172 685. 653. 718. 95% dsr pe…
#> 2 Area1 2006 Male 1761 289012 627. 596. 660. 95% dsr pe…
#> 3 Area1 2007 Female 2393 287928 845. 809. 882. 95% dsr pe…
#> 4 Area1 2007 Male 1844 287266 658. 627. 691. 95% dsr pe…
#> 5 Area1 2008 Female 1751 288418 611. 580. 642. 95% dsr pe…
#> 6 Area1 2008 Male 2081 297307 771. 736. 808. 95% dsr pe…
#> 7 Area1 2009 Female 1846 292801 614. 585. 645. 95% dsr pe…
#> 8 Area1 2009 Male 1635 283820 568. 538. 598. 95% dsr pe…
#> 9 Area1 2010 Female 2093 278420 792. 756. 829. 95% dsr pe…
#> 10 Area1 2010 Male 2184 283341 781. 745. 818. 95% dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> # names ¹total_pop, ²confidence, ³statistic
# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
df_std %>%
  filter(ageband <= 70) %>%
  group_by(area, year, sex) %>%
  phe_dsr(obs, pop, stdpop = esp2013[1:15])
#> # A tibble: 40 × 11
#> # Groups: area, year, sex [40]
#> area year sex total_count total_…¹ value lowercl uppercl confi…² stati…³
#> <chr> <int> <chr> <int> <int> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 Area1 2006 Female 1475 232192 627. 595. 661. 95% dsr pe…
#> 2 Area1 2006 Male 1363 224949 635. 601. 671. 95% dsr pe…
#> 3 Area1 2007 Female 1807 233596 813. 775. 853. 95% dsr pe…
#> 4 Area1 2007 Male 1507 231371 661. 628. 696. 95% dsr pe…
#> 5 Area1 2008 Female 1462 233240 627. 594. 661. 95% dsr pe…
#> 6 Area1 2008 Male 1742 225496 807. 769. 847. 95% dsr pe…
#> 7 Area1 2009 Female 1359 232743 596. 564. 629. 95% dsr pe…
#> 8 Area1 2009 Male 1212 226688 550. 519. 583. 95% dsr pe…
#> 9 Area1 2010 Female 1721 212672 814. 775. 855. 95% dsr pe…
#> 10 Area1 2010 Male 1616 228550 765. 727. 805. 95% dsr pe…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> # names ¹total_pop, ²confidence, ³statistic
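The under 75s example relies on the filtered data and the subset of the standard population covering the same age bands. A quick base R check of why the first 15 elements are the right ones, assuming the 19 five-year age bands are coded by their lower bound as in df_std, could be:

```r
# Lower bounds of the 19 five-year age bands used in df_std
agebands <- seq(0, 90, by = 5)

# Bands retained by filter(ageband <= 70)
keep <- agebands <= 70

sum(keep)    # number of bands retained
which(keep)  # their positions in the standard population vector
```

Fifteen bands are retained and they occupy positions 1 to 15, so subsetting the standard population with [1:15] keeps the two inputs aligned by position.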
# calculate separate dsrs for persons for each area and year
df_std %>%
  group_by(area, year, ageband) %>%
  summarise(obs = sum(obs),
            pop = sum(pop),
            .groups = "drop_last") %>%
  phe_dsr(obs,pop)
#> # A tibble: 20 × 10
#> # Groups: area, year [20]
#> area year total_count total_…¹ value lowercl uppercl confi…² stati…³ method
#> <chr> <int> <int> <int> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#> 1 Area1 2006 3785 577184 625. 604. 647. 95% dsr pe… Dobson
#> 2 Area1 2007 4237 575194 742. 719. 766. 95% dsr pe… Dobson
#> 3 Area1 2008 3832 585725 667. 645. 690. 95% dsr pe… Dobson
#> 4 Area1 2009 3481 576621 580. 560. 601. 95% dsr pe… Dobson
#> 5 Area1 2010 4277 561761 771. 746. 796. 95% dsr pe… Dobson
#> 6 Area2 2006 2572 553802 483. 463. 503. 95% dsr pe… Dobson
#> 7 Area2 2007 4362 565559 760. 736. 784. 95% dsr pe… Dobson
#> 8 Area2 2008 4126 554590 778. 754. 804. 95% dsr pe… Dobson
#> 9 Area2 2009 4296 590844 782. 758. 807. 95% dsr pe… Dobson
#> 10 Area2 2010 3625 583271 633. 612. 656. 95% dsr pe… Dobson
#> 11 Area3 2006 3855 556619 688. 665. 711. 95% dsr pe… Dobson
#> 12 Area3 2007 3270 549441 621. 598. 644. 95% dsr pe… Dobson
#> 13 Area3 2008 3885 561281 769. 744. 795. 95% dsr pe… Dobson
#> 14 Area3 2009 3731 575388 656. 634. 678. 95% dsr pe… Dobson
#> 15 Area3 2010 3561 530638 700. 676. 725. 95% dsr pe… Dobson
#> 16 Area4 2006 3547 565865 675. 652. 699. 95% dsr pe… Dobson
#> 17 Area4 2007 3892 545201 733. 709. 758. 95% dsr pe… Dobson
#> 18 Area4 2008 3723 562596 690. 667. 713. 95% dsr pe… Dobson
#> 19 Area4 2009 3879 573563 667. 644. 690. 95% dsr pe… Dobson
#> 20 Area4 2010 4431 558789 812. 787. 838. 95% dsr pe… Dobson
#> # … with abbreviated variable names ¹total_pop, ²confidence, ³statistic
INPUT: Unlike the phe_dsr function, there is no default standard or reference data for the calculate_ISRatio and calculate_ISRate functions. These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.
The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame:
- Reference data provided as a data frame or as vectors: the data frame/vectors and the input data frame must both contain rows for the same standardisation categories, and both must be sorted, within each grouping set, by these standardisation categories in the same order.
- Reference data provided as columns within the input data frame: the reference numerators and denominators can be appended to the input data frame prior to execution of the function. If the data is grouped to generate multiple indirectly standardised rates or ratios then the reference data will need to be repeated and appended to the data rows for every grouping set.
OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (ISRate only), the indirectly standardised rate or ratio, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.
OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this using the refpoptype argument, as the functions expect vectors by default. The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.
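As a rough guide to how the observed count, expected count, ratio and rate relate to one another, the following base R sketch works through indirect standardisation on hypothetical data for three age bands. All values and variable names here are illustrative assumptions, the rate is assumed to be the ratio multiplied by the crude reference rate, and confidence limits are omitted.

```r
# Hypothetical data for one area (not real data)
obs    <- c(5, 10, 20)           # observed events per age band
pop    <- c(1000, 2000, 1000)    # area population per age band
refobs <- c(50, 100, 300)        # reference events per age band
refpop <- c(10000, 10000, 10000) # reference population per age band
multiplier <- 100000

# Events expected if the area experienced the reference age-specific rates
observed <- sum(obs)
expected <- sum(pop * refobs / refpop)

# Indirectly standardised ratio: observed relative to expected
is_ratio <- observed / expected

# Crude reference rate, and the ratio rescaled to a rate
ref_rate <- sum(refobs) / sum(refpop) * multiplier
is_rate  <- is_ratio * ref_rate

c(observed = observed, expected = expected,
  ratio = is_ratio, ref_rate = ref_rate, rate = is_rate)
```

Here 35 events are observed against 55 expected, giving a ratio of about 0.64 and, with a reference rate of 1500 per 100,000, an indirectly standardised rate of about 955 per 100,000.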
The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:
df_ref <- df_std %>%
  filter(year == 2006) %>%
  group_by(ageband) %>%
  summarise(obs = sum(obs),
            pop = sum(pop),
            .groups = "drop_last")
head(df_ref)
#> # A tibble: 6 × 3
#> ageband obs pop
#> <dbl> <int> <int>
#> 1 0 818 113929
#> 2 5 536 126250
#> 3 10 859 122899
#> 4 15 783 134328
#> 5 20 1112 120966
#> 6 25 708 114075
Here are some example code chunks to demonstrate the calculate_ISRatio function and the arguments that can optionally be specified:
# calculate separate smrs for each area, year and sex
# standardised against the all-area, all-sex reference data for the baseline year (2006)
df_std %>%
  group_by(area, year, sex) %>%
  calculate_ISRatio(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 11
#> # Groups: area, year, sex [40]
#> area year sex observed expected value lowercl uppercl confidence stati…¹
#> <chr> <int> <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 Area1 2006 Female 2024 1751. 1.16 1.11 1.21 95% indire…
#> 2 Area1 2006 Male 1761 1753. 1.00 0.958 1.05 95% indire…
#> 3 Area1 2007 Female 2393 1758. 1.36 1.31 1.42 95% indire…
#> 4 Area1 2007 Male 1844 1742. 1.06 1.01 1.11 95% indire…
#> 5 Area1 2008 Female 1751 1772. 0.988 0.943 1.04 95% indire…
#> 6 Area1 2008 Male 2081 1820. 1.14 1.09 1.19 95% indire…
#> 7 Area1 2009 Female 1846 1762. 1.05 1.00 1.10 95% indire…
#> 8 Area1 2009 Male 1635 1749. 0.935 0.890 0.981 95% indire…
#> 9 Area1 2010 Female 2093 1686. 1.24 1.19 1.30 95% indire…
#> 10 Area1 2010 Male 2184 1740. 1.26 1.20 1.31 95% indire…
#> # … with 30 more rows, 1 more variable: method <chr>, and abbreviated variable
#> # name ¹statistic
# calculate the same smrs by appending the reference data to the data frame
# and drop metadata columns from output
df_std %>%
  mutate(refobs = rep(df_ref$obs,40),
         refpop = rep(df_ref$pop,40)) %>%
  group_by(area, year, sex) %>%
  calculate_ISRatio(obs, pop, refobs, refpop, refpoptype="field",
                    type = "standard")
#> # A tibble: 40 × 8
#> # Groups: area, year, sex [40]
#> area year sex observed expected value lowercl uppercl
#> <chr> <int> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 Area1 2006 Female 2024 1751. 1.16 1.11 1.21
#> 2 Area1 2006 Male 1761 1753. 1.00 0.958 1.05
#> 3 Area1 2007 Female 2393 1758. 1.36 1.31 1.42
#> 4 Area1 2007 Male 1844 1742. 1.06 1.01 1.11
#> 5 Area1 2008 Female 1751 1772. 0.988 0.943 1.04
#> 6 Area1 2008 Male 2081 1820. 1.14 1.09 1.19
#> 7 Area1 2009 Female 1846 1762. 1.05 1.00 1.10
#> 8 Area1 2009 Male 1635 1749. 0.935 0.890 0.981
#> 9 Area1 2010 Female 2093 1686. 1.24 1.19 1.30
#> 10 Area1 2010 Male 2184 1740. 1.26 1.20 1.31
#> # … with 30 more rows
The calculate_ISRate function works in exactly the same way, but instead of expressing the result as a ratio of the observed and expected rates, the result is expressed as a rate and the reference rate is also provided. Here are some examples:
# calculate separate indirectly standardised rates for each area, year and sex
# standardised against the all-area, all-sex reference data for the baseline year (2006)
df_std %>%
  group_by(area, year, sex) %>%
  calculate_ISRate(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 12
#> # Groups: area, year, sex [40]
#> area year sex observed expected ref_rate value lowercl uppercl confide…¹
#> <chr> <int> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 Area1 2006 Female 2024 1751. 611. 706. 676. 737. 95%
#> 2 Area1 2006 Male 1761 1753. 611. 613. 585. 643. 95%
#> 3 Area1 2007 Female 2393 1758. 611. 831. 798. 865. 95%
#> 4 Area1 2007 Male 1844 1742. 611. 646. 617. 677. 95%
#> 5 Area1 2008 Female 1751 1772. 611. 603. 576. 632. 95%
#> 6 Area1 2008 Male 2081 1820. 611. 698. 669. 729. 95%
#> 7 Area1 2009 Female 1846 1762. 611. 640. 611. 669. 95%
#> 8 Area1 2009 Male 1635 1749. 611. 571. 543. 599. 95%
#> 9 Area1 2010 Female 2093 1686. 611. 758. 726. 791. 95%
#> 10 Area1 2010 Male 2184 1740. 611. 766. 735. 799. 95%
#> # … with 30 more rows, 2 more variables: statistic <chr>, method <chr>, and
#> # abbreviated variable name ¹confidence
# calculate the same indirectly standardised rates by appending the reference data to the data frame
# and drop metadata columns from output
df_std %>%
  mutate(refobs = rep(df_ref$obs,40),
         refpop = rep(df_ref$pop,40)) %>%
  group_by(area, year, sex) %>%
  calculate_ISRate(obs, pop, refobs, refpop, refpoptype="field",
                   type = "standard")
#> # A tibble: 40 × 9
#> # Groups: area, year, sex [40]
#> area year sex observed expected ref_rate value lowercl uppercl
#> <chr> <int> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Area1 2006 Female 2024 1751. 611. 706. 676. 737.
#> 2 Area1 2006 Male 1761 1753. 611. 613. 585. 643.
#> 3 Area1 2007 Female 2393 1758. 611. 831. 798. 865.
#> 4 Area1 2007 Male 1844 1742. 611. 646. 617. 677.
#> 5 Area1 2008 Female 1751 1772. 611. 603. 576. 632.
#> 6 Area1 2008 Male 2081 1820. 611. 698. 669. 729.
#> 7 Area1 2009 Female 1846 1762. 611. 640. 611. 669.
#> 8 Area1 2009 Male 1635 1749. 611. 571. 543. 599.
#> 9 Area1 2010 Female 2093 1686. 611. 758. 726. 791.
#> 10 Area1 2010 Male 2184 1740. 611. 766. 735. 799.
#> # … with 30 more rows