The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Standardise OMOP results

Introduction

This vignette illustrates how EpiStandard can be used to standardise results from the IncidencePrevalence package, a DARWIN-EU package for estimating incidence and prevalence from data mapped to the OMOP Common Data Model.

Set-up

We first load the relevant packages and then create a mock OMOP-formatted dataset containing a population and an outcome of interest.

library(EpiStandard)
library(IncidencePrevalence)
library(omopgenerics)
library(dplyr)

cdm <- mockIncidencePrevalence(
  sampleSize = 10000,
  outPre = 0.25
)

Estimate Incidence Rates

We use the IncidencePrevalence package to generate a denominator cohort with age and sex stratifications, and then calculate overall outcome incidence and incidence by calendar year for each stratum combination. For more information on how to use this package, refer to its website.

Notice that the results are stored as a summarised_result object.

cdm <- generateDenominatorCohortSet(
  cdm = cdm,
  name = "denominator",
  cohortDateRange = as.Date(c("2008-01-01", "2020-01-01")),
  ageGroup = list(c(0, 19), c(20, 64), c(65, 150), c(0, 150)),
  sex = c("Male", "Female"),
  daysPriorObservation = 0
)

inc <- estimateIncidence(
  cdm = cdm,
  denominatorTable = "denominator",
  outcomeTable = "outcome",
  interval = c("years", "overall"),
  outcomeWashout = 0,
  repeatedEvents = FALSE
)

inc |> glimpse()
#> Rows: 1,049
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name         <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock…
#> $ group_name       <chr> "denominator_cohort_name &&& outcome_cohort_name", "d…
#> $ group_level      <chr> "denominator_cohort_8 &&& cohort_1", "denominator_coh…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "Denominator", "Outcome", "Denominator", "Denominator…
#> $ variable_level   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name    <chr> "denominator_count", "outcome_count", "person_days", …
#> $ estimate_type    <chr> "integer", "integer", "numeric", "numeric", "numeric"…
#> $ estimate_value   <chr> "277", "14", "85765", "234.812", "5962.217", "3259.59…
#> $ additional_name  <chr> "incidence_start_date &&& incidence_end_date &&& anal…
#> $ additional_level <chr> "2008-01-01 &&& 2008-12-31 &&& years", "2008-01-01 &&…

To use EpiStandard, we first filter the incidence results to the age groups of interest for standardisation and then convert the results into a format that facilitates further manipulation.

Standardise Incidence Rates

incidenceTidy <- inc |>
  filterSettings(denominator_age_group %in% c("0 to 19", "20 to 64", "65 to 150")) |>
  asIncidenceResult()

incidenceTidy |> glimpse()
#> Rows: 69
#> Columns: 26
#> $ cdm_name                             <chr> "mock", "mock", "mock", "mock", "…
#> $ denominator_cohort_name              <chr> "denominator_cohort_2", "denomina…
#> $ outcome_cohort_name                  <chr> "cohort_1", "cohort_1", "cohort_1…
#> $ incidence_start_date                 <date> 2008-01-01, 2009-01-01, 2010-01-…
#> $ incidence_end_date                   <date> 2008-12-31, 2009-12-31, 2010-12-…
#> $ analysis_interval                    <chr> "years", "years", "years", "years…
#> $ analysis_censor_cohort_name          <chr> "None", "None", "None", "None", "…
#> $ analysis_complete_database_intervals <chr> "TRUE", "TRUE", "TRUE", "TRUE", "…
#> $ analysis_outcome_washout             <chr> "0", "0", "0", "0", "0", "0", "0"…
#> $ analysis_repeated_events             <chr> "FALSE", "FALSE", "FALSE", "FALSE…
#> $ denominator_age_group                <chr> "0 to 19", "0 to 19", "0 to 19", …
#> $ denominator_days_prior_observation   <chr> "0", "0", "0", "0", "0", "0", "0"…
#> $ denominator_end_date                 <date> 2020-01-01, 2020-01-01, 2020-01-…
#> $ denominator_requirements_at_entry    <chr> "FALSE", "FALSE", "FALSE", "FALSE…
#> $ denominator_sex                      <chr> "Female", "Female", "Female", "Fe…
#> $ denominator_start_date               <date> 2008-01-01, 2008-01-01, 2008-01-…
#> $ denominator_target_cohort_name       <chr> "None", "None", "None", "None", "…
#> $ denominator_time_at_risk             <chr> "0 to Inf", "0 to Inf", "0 to Inf…
#> $ denominator_count                    <int> 38, 44, 38, 23, 16, 14, 8, 7, 4, …
#> $ outcome_count                        <int> 0, 1, 1, 0, 1, 0, 0, 0, 0, 3, 3, …
#> $ person_days                          <dbl> 12529, 12754, 10404, 7081, 5510, …
#> $ person_years                         <dbl> 34.303, 34.919, 28.485, 19.387, 1…
#> $ incidence_100000_pys                 <dbl> 0.000, 2863.770, 3510.620, 0.000,…
#> $ incidence_100000_pys_95CI_lower      <dbl> 0.000, 72.504, 88.881, 0.000, 167…
#> $ incidence_100000_pys_95CI_upper      <dbl> 10753.810, 15955.908, 19559.921, …
#> $ result_type                          <chr> "tidy_incidence", "tidy_incidence…

Now that the incidence results are in the correct format, we prepare the reference population. In this example, we use the European Standard Population (esp2013). As this population uses 5-year age bands, we merge these groups to match those used to estimate incidence.

standardPop <- mergeAgeGroups(
  standardPopulation("Europe"),
  newGroups = c("0 to 19", "20 to 64", "65 to 150")
) |>
  rename("denominator_age_group" = "age_group")
standardPop |> glimpse()
#> Rows: 3
#> Columns: 2
#> $ denominator_age_group <chr> "0 to 19", "20 to 64", "65 to 150"
#> $ pop                   <int> 21500, 59000, 19500

Finally, we use the directlyStandardiseRates() function to standardise the incidence results to the reference population. We specify the columns containing the event counts, person-years, and population weights, as well as the column identifying age groups. Since we want to standardise within each outcome, sex, and calendar-time stratum, we use the strata argument as follows:

standardInc <- directlyStandardiseRates(
  data = incidenceTidy,
  refdata = standardPop,
  event = "outcome_count",
  denominator = "person_years",
  age = "denominator_age_group",
  pop = "pop",
  strata = c("incidence_start_date", "denominator_sex", "analysis_interval", "outcome_cohort_name")
)
standardInc |> glimpse()
#> Rows: 22
#> Columns: 12
#> $ incidence_start_date         <date> 2009-01-01, 2010-01-01, 2012-01-01, 2008…
#> $ denominator_sex              <chr> "Female", "Female", "Female", "Female", "…
#> $ analysis_interval            <chr> "years", "years", "years", "overall", "ye…
#> $ outcome_cohort_name          <chr> "cohort_1", "cohort_1", "cohort_1", "coho…
#> $ outcome_count                <int> 10, 14, 4, 51, 13, 14, 9, 13, 4, 5, 71, 1…
#> $ person_years                 <dbl> 233.057, 209.706, 132.954, 1298.001, 272.…
#> $ crude_rate                   <dbl> 4290.7958, 6676.0131, 3008.5594, 3929.118…
#> $ crude_rate_95CI_lower        <dbl> 1631.3819, 3178.9678, 60.2254, 2850.7726,…
#> $ crude_rate_95CI_upper        <dbl> 6950.210, 10173.058, 5956.893, 5007.465, …
#> $ standardised_rate            <dbl> 4276.532, 6868.422, 3078.205, 3860.587, 5…
#> $ standardised_rate_95CI_lower <dbl> 1527.5585, 3158.1231, -362.0191, 2744.249…
#> $ standardised_rate_95CI_upper <dbl> 7025.506, 10578.720, 6518.429, 4976.924, …

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.