2 Introduction

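In this vignette, we explore the OmopSketch functions that provide an overview of the observation_period table. Three key functions facilitate this:

  • summariseObservationPeriod(): summarises the observation_period table.
  • tableObservationPeriod(): displays the summary as a formatted table.
  • plotObservationPeriod(): visualises the summary.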

2.1 Create a mock cdm

Let’s see an example of their functionality. To start, we load the essential packages and create a mock cdm using the R package omock.

library(dplyr)
library(OmopSketch)
library(omock)

# Connect to mock database
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
#> ℹ Reading GiBleed tables.
#> ℹ Adding drug_strength table.
#> ℹ Creating local <cdm_reference> object.
#> ℹ Inserting <cdm_reference> into duckdb.

3 Summarise observation periods

Let’s now use the summariseObservationPeriod() function from the OmopSketch package to generate an overview of the observation_period table.
This function provides both a general summary of the table and some detailed statistics, such as the Number of subjects and the Duration in days for each observation period (e.g., first, second, etc.).

summarisedResult <- summariseObservationPeriod(cdm = cdm)

summarisedResult
#> # A tibble: 3,126 × 13
#>    result_id cdm_name group_name            group_level strata_name strata_level
#>        <int> <chr>    <chr>                 <chr>       <chr>       <chr>       
#>  1         1 GiBleed  observation_period_o… all         overall     overall     
#>  2         1 GiBleed  observation_period_o… all         overall     overall     
#>  3         1 GiBleed  observation_period_o… all         overall     overall     
#>  4         1 GiBleed  observation_period_o… all         overall     overall     
#>  5         1 GiBleed  observation_period_o… all         overall     overall     
#>  6         1 GiBleed  observation_period_o… all         overall     overall     
#>  7         1 GiBleed  observation_period_o… all         overall     overall     
#>  8         1 GiBleed  observation_period_o… all         overall     overall     
#>  9         1 GiBleed  observation_period_o… all         overall     overall     
#> 10         1 GiBleed  observation_period_o… all         overall     overall     
#> # ℹ 3,116 more rows
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> #   estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> #   additional_name <chr>, additional_level <chr>

Notice that the output is in the summarised result format.

We can use the function arguments to specify which summary statistics to compute. For instance, the estimates argument allows us to define which estimates we want to calculate for variables such as the Duration in days of the observation period or the Number of records per person.

summarisedResult <- summariseObservationPeriod(
  cdm = cdm,
  estimates = c("mean", "sd", "q05", "q95")
)

summarisedResult |>
  filter(variable_name == "Duration in days") |>
  select(group_level, variable_name, estimate_name, estimate_value)
#> # A tibble: 8 × 4
#>   group_level variable_name    estimate_name estimate_value  
#>   <chr>       <chr>            <chr>         <chr>           
#> 1 all         Duration in days mean          14402.0014972862
#> 2 all         Duration in days sd            8725.34082831129
#> 3 all         Duration in days q05           1436            
#> 4 all         Duration in days q95           29133           
#> 5 1st         Duration in days mean          14402.0014972862
#> 6 1st         Duration in days sd            8725.34082831129
#> 7 1st         Duration in days q05           1436            
#> 8 1st         Duration in days q95           29133
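
Every estimate in the output above is stored as text (estimate_value is a <chr> column), so values need converting before any arithmetic. The following is a minimal base R sketch under that assumption. toyResult is a hypothetical hand-built stand-in that copies the four overall rows printed above; it is not the real summarised result object.

```r
# Hypothetical stand-in for the overall rows printed above; in practice these
# rows would come from filtering summarisedResult.
toyResult <- data.frame(
  group_level = "all",
  variable_name = "Duration in days",
  estimate_name = c("mean", "sd", "q05", "q95"),
  estimate_value = c("14402.0014972862", "8725.34082831129", "1436", "29133"),
  stringsAsFactors = FALSE
)

# estimate_value is character, so convert it before computing with it
durationDays <- setNames(
  as.numeric(toyResult$estimate_value),
  toyResult$estimate_name
)

# Express the estimates in years (365.25 days per year is an approximation)
durationYears <- round(durationDays / 365.25, 1)
durationYears
```

Naming the numeric vector by estimate_name keeps each value tied to the statistic it represents, which is easy to lose once the column is no longer part of the table.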

By default, the function returns statistics for the Number of subjects, Duration in days, and Days to next observation both overall and by each ordinal observation period (for example, first, second, etc.).
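
To make these three variables concrete, here is a rough base R sketch on a made-up table of three records. It is not OmopSketch's implementation. The inclusive day count (end - start + 1) and the definition of days to next observation as the gap between one period's end and the next period's start are assumptions of this sketch.

```r
# Toy observation_period table (made-up values, not taken from GiBleed)
obs <- data.frame(
  person_id = c(1, 1, 2),
  observation_period_start_date = as.Date(c("2000-01-01", "2005-01-01", "2010-06-01")),
  observation_period_end_date = as.Date(c("2001-12-31", "2005-12-31", "2010-06-30"))
)

# Order periods within each person and number them (1 = first, 2 = second, ...)
obs <- obs[order(obs$person_id, obs$observation_period_start_date), ]
obs$ordinal <- ave(seq_along(obs$person_id), obs$person_id, FUN = seq_along)

# Duration counted inclusively; this convention is an assumption of the sketch
obs$duration_days <- as.integer(
  obs$observation_period_end_date - obs$observation_period_start_date
) + 1L

# Days from the end of one period to the start of the same person's next
# period (NA when there is no later period)
nextStart <- ave(
  as.numeric(obs$observation_period_start_date),
  obs$person_id,
  FUN = function(x) c(x[-1], NA)
)
obs$days_to_next <- nextStart - as.numeric(obs$observation_period_end_date)

obs[, c("person_id", "ordinal", "duration_days", "days_to_next")]
```

Summarising duration_days over all rows corresponds to the "all" group, while restricting to ordinal == 1 corresponds to the "1st" group.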

If we are only interested in overall statistics rather than those broken down by ordinal period, we can set the argument byOrdinal = FALSE:

summarisedResult <- summariseObservationPeriod(
  cdm = cdm,
  estimates = c("mean", "sd", "q05", "q95"), 
  byOrdinal = FALSE
)

summarisedResult |>
  filter(variable_name == "Duration in days") |>
  distinct(group_name, group_level)
#> # A tibble: 1 × 2
#>   group_name                 group_level
#>   <chr>                      <chr>      
#> 1 observation_period_ordinal all

3.1 Missingness

When the argument missingData = TRUE is set, the results will include an overall summary of missing data in the table, including the number of 0s in the concept columns. This output is analogous to the results produced by the OmopSketch function summariseMissingData().

summarisedResult <- summariseObservationPeriod(
  cdm = cdm,
  estimates = c("mean", "sd", "q05", "q95"), 
  missingData = TRUE
)

summarisedResult |>
  filter(variable_name == "Column name") |>
  select(group_level, variable_name, estimate_name, estimate_value)
#> # A tibble: 16 × 4
#>    group_level variable_name estimate_name   estimate_value
#>    <chr>       <chr>         <chr>           <chr>         
#>  1 all         Column name   na_count        0             
#>  2 all         Column name   na_percentage   0.00          
#>  3 all         Column name   zero_count      0             
#>  4 all         Column name   zero_percentage 0.00          
#>  5 all         Column name   na_count        0             
#>  6 all         Column name   na_percentage   0.00          
#>  7 all         Column name   zero_count      0             
#>  8 all         Column name   zero_percentage 0.00          
#>  9 all         Column name   na_count        0             
#> 10 all         Column name   na_percentage   0.00          
#> 11 all         Column name   na_count        0             
#> 12 all         Column name   na_percentage   0.00          
#> 13 all         Column name   na_count        0             
#> 14 all         Column name   na_percentage   0.00          
#> 15 all         Column name   zero_count      0             
#> 16 all         Column name   zero_percentage 0.00
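
The four estimates above (na_count, na_percentage, zero_count, zero_percentage) can be illustrated on a toy table. This base R sketch uses made-up values and is not how OmopSketch computes them. Counting zeros only in columns whose name ends in _id is an assumption, chosen because the output reports zero counts for only some columns.

```r
# Toy table with made-up values: one missing end date and one concept id of 0
obs <- data.frame(
  observation_period_id = c(1, 2, 3, 4),
  observation_period_end_date = as.Date(c("2001-12-31", NA, "2010-06-30", "2015-01-01")),
  period_type_concept_id = c(101, 0, 101, 101)
)

# Count and percentage of NA values in every column
naCount <- colSums(is.na(obs))
naPercentage <- round(100 * naCount / nrow(obs), 2)

# Count zeros only in id columns (assumption of this sketch)
idColumns <- grep("_id$", names(obs), value = TRUE)
zeroCount <- sapply(obs[idColumns], function(x) sum(x == 0, na.rm = TRUE))
zeroPercentage <- round(100 * zeroCount / nrow(obs), 2)

naCount
zeroCount
```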

3.2 Quality

When the argument quality = TRUE is set, the results will include a quality assessment of the observation period table.
This assessment provides information such as:

  • Issues with date columns (e.g., start dates occurring after end dates, or dates preceding a subject’s birth date).
  • The presence of person_id values that do not exist in the person table.
  • A summary of the types of concept_id available in the table.

summarisedResult <- summariseObservationPeriod(
  cdm = cdm,
  estimates = "mean", 
  missingData = FALSE, 
  quality = TRUE
)

summarisedResult |>
  select(group_level, variable_name, variable_level, estimate_name, estimate_value)
#> # A tibble: 14 × 5
#>    group_level variable_name         variable_level estimate_name estimate_value
#>    <chr>       <chr>                 <chr>          <chr>         <chr>         
#>  1 all         Records per person    <NA>           mean          1             
#>  2 all         Duration in days      <NA>           mean          14402.0014972…
#>  3 1st         Duration in days      <NA>           mean          14402.0014972…
#>  4 all         Number records        <NA>           count         5343          
#>  5 all         Number subjects       <NA>           count         5343          
#>  6 1st         Number subjects       <NA>           count         5343          
#>  7 all         Type concept id       Period coveri… count         5343          
#>  8 all         Subjects not in pers… <NA>           count         2649          
#>  9 all         Subjects not in pers… <NA>           percentage    49.58         
#> 10 all         End date before star… <NA>           count         0             
#> 11 all         Start date before bi… <NA>           count         0             
#> 12 all         Type concept id       Period coveri… percentage    100.00        
#> 13 all         End date before star… <NA>           percentage    0.00          
#> 14 all         Start date before bi… <NA>           percentage    0.00
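
The checks behind these rows can be sketched on toy data. The base R example below uses made-up tables and only illustrates what each check means; it is not the package's implementation. The birth_date column is a hypothetical simplification of how a birth date is stored in the person table.

```r
# Toy person table; birth_date is a hypothetical, simplified column
person <- data.frame(
  person_id = c(1, 2),
  birth_date = as.Date(c("1980-05-01", "1990-01-01"))
)

# Toy observation_period table with one problem per record
obs <- data.frame(
  person_id = c(1, 2, 3),
  observation_period_start_date = as.Date(c("2000-01-01", "1985-01-01", "2010-01-01")),
  observation_period_end_date = as.Date(c("1999-01-01", "2020-01-01", "2015-01-01"))
)

# Subjects with no matching row in the person table
notInPerson <- setdiff(obs$person_id, person$person_id)

# Records whose end date precedes their start date
endBeforeStart <- sum(
  obs$observation_period_end_date < obs$observation_period_start_date
)

# Records starting before the subject's birth date; the inner join means
# this can only be checked for subjects present in the person table
joined <- merge(obs, person, by = "person_id")
startBeforeBirth <- sum(joined$observation_period_start_date < joined$birth_date)

c(
  not_in_person = length(notInPerson),
  end_before_start = endBeforeStart,
  start_before_birth = startBeforeBirth
)
```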

3.3 Strata

It is also possible to stratify the results by sex and age groups:

summarisedResult <- summariseObservationPeriod(
  cdm = cdm,
  estimates = c("mean", "sd", "q05", "q95"),
  sex = TRUE,
  ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)

Notice that, by default, the “overall” group is also included, as are the crossed strata (for example, sex == "Female" and ageGroup == ">=35").
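
As an illustration of how such an ageGroup list maps subjects to strata, here is a small base R sketch on made-up ages and sexes. It assumes both bounds of each interval are inclusive and is not the package's own code; assignGroup is a hypothetical helper.

```r
ageGroup <- list("<35" = c(0, 34), ">=35" = c(35, Inf))

# Made-up ages and sexes for five subjects
age <- c(12, 34, 35, 60, 80)
sex <- c("Female", "Male", "Female", "Female", "Male")

# Hypothetical helper: return the name of the first group whose inclusive
# [lower, upper] bounds contain the age
assignGroup <- function(a) {
  inGroup <- vapply(ageGroup, function(b) a >= b[1] && a <= b[2], logical(1))
  names(ageGroup)[inGroup][1]
}
group <- vapply(age, assignGroup, character(1))

# Counts per single stratum and for the crossed strata
bySex <- table(sex)
byAge <- table(group)
crossed <- table(sex, group)
crossed
```

Each cell of crossed corresponds to one crossed stratum, such as Female and ">=35", while bySex and byAge correspond to the single-variable strata.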

3.4 Tidy the summarised object

tableObservationPeriod() will help you to create a table (see supported types with: visOmopResults::tableType()). By default it creates a gt table.

summarisedResult <- summariseObservationPeriod(
  cdm = cdm,
  estimates = c("mean", "sd", "q05", "q95"),
  sex = TRUE
)

summarisedResult |>
  tableObservationPeriod(type = "gt")
Summary of observation_period table

Observation period ordinal  Variable name                 Variable level                         Estimate name       CDM name: GiBleed
overall
all                         Number records                                                       N                   5,343
                            Number subjects                                                      N                   5,343
                            Subjects not in person table                                         N (%)               2,649 (49.58%)
                            Records per person                                                   Mean (SD)           1.00 (0.00)
                            Duration in days                                                     Mean (SD)           14,402.00 (8,725.34)
                            Type concept id               Period covering healthcare encounters  N (%)               5,343 (100.00%)
                            Start date before birth date                                         N (%)               0 (0.00%)
                            End date before start date                                           N (%)               0 (0.00%)
                            Column name                   Observation period end date            N missing data (%)  0 (0.00%)
                                                          Observation period id                  N missing data (%)  0 (0.00%)
                                                                                                 N zeros (%)         0 (0.00%)
                                                          Observation period start date          N missing data (%)  0 (0.00%)
                                                          Period type concept id                 N missing data (%)  0 (0.00%)
                                                                                                 N zeros (%)         0 (0.00%)
                                                          Person id                              N missing data (%)  0 (0.00%)
                                                                                                 N zeros (%)         0 (0.00%)
1st                         Number subjects                                                      N                   5,343
                            Duration in days                                                     Mean (SD)           14,402.00 (8,725.34)
Female
all                         Number records                                                       N                   1,373
                            Number subjects                                                      N                   1,373
                            Records per person                                                   Mean (SD)           1.00 (0.00)
                            Duration in days                                                     Mean (SD)           21,666.77 (5,623.53)
                            Type concept id               Period covering healthcare encounters  N (%)               1,373 (100.00%)
                            Column name                   Observation period end date            N missing data (%)  0 (0.00%)
                                                          Observation period id                  N missing data (%)  0 (0.00%)
                                                                                                 N zeros (%)         0 (0.00%)
                                                          Observation period start date          N missing data (%)  0 (0.00%)
                                                          Period type concept id                 N missing data (%)  0 (0.00%)
                                                                                                 N zeros (%)         0 (0.00%)
                                                          Person id                              N missing data (%)  0 (0.00%)
                                                                                                 N zeros (%)         0 (0.00%)
1st                         Number subjects                                                      N                   1,373
                            Duration in days                                                     Mean (SD)           21,666.77 (5,623.53)
Male
all                         Number records                                                       N                   1,321
                            Number subjects                                                      N                   1,321
                            Records per person                                                   Mean (SD)           1.00 (0.00)
                            Duration in days                                                     Mean (SD)           21,535.91 (5,287.44)
                            Type concept id               Period covering healthcare encounters  N (%)               1,321 (100.00%)
                            Column name                   Observation period end date            N missing data (%)  0 (0.00%)
                                                          Observation period id                  N missing data (%)  0 (0.00%)
                                                                                                 N zeros (%)         0 (0.00%)
                                                          Observation period start date          N missing data (%)  0 (0.00%)
                                                          Period type concept id                 N missing data (%)  0 (0.00%)
                                                                                                 N zeros (%)         0 (0.00%)
                                                          Person id                              N missing data (%)  0 (0.00%)
                                                                                                 N zeros (%)         0 (0.00%)
1st                         Number subjects                                                      N                   1,321
                            Duration in days                                                     Mean (SD)           21,535.91 (5,287.44)
None
all                         Number records                                                       N                   2,649
                            Number subjects                                                      N                   2,649
                            Records per person                                                   Mean (SD)           1.00 (0.00)
                            Duration in days                                                     Mean (SD)           7,079.08 (4,106.70)
                            Type concept id               Period covering healthcare encounters  N (%)               2,649 (100.00%)
                            Column name                   Observation period end date            N missing data (%)  0 (0.00%)
                                                          Observation period id                  N missing data (%)  0 (0.00%)
                                                                                                 N zeros (%)         0 (0.00%)
                                                          Observation period start date          N missing data (%)  0 (0.00%)
                                                          Period type concept id                 N missing data (%)  0 (0.00%)
                                                                                                 N zeros (%)         0 (0.00%)
                                                          Person id                              N missing data (%)  0 (0.00%)
                                                                                                 N zeros (%)         0 (0.00%)
1st                         Number subjects                                                      N                   2,649
                            Duration in days                                                     Mean (SD)           7,079.08 (4,106.70)

3.5 Visualise the results

Finally, we can visualise the result using plotObservationPeriod().

summarisedResult <- summariseObservationPeriod(cdm = cdm)

plotObservationPeriod(
  result = summarisedResult,
  variableName = "Number subjects",
  plotType = "barplot"
)

Note that only Number subjects or Duration in days can be plotted. For Number subjects, the plot type must be barplot, whereas for Duration in days it can be barplot, boxplot, or densityplot.

Additionally, if the results were stratified by sex or age group, we can use the facet and colour arguments to distinguish the strata in the plot. The visOmopResults package can help us identify which variables are available for colouring or faceting.

summarisedResult <- summariseObservationPeriod(cdm = cdm, sex = TRUE)

plotObservationPeriod(
  result = summarisedResult,
  variableName = "Duration in days",
  plotType = "boxplot",
  facet = "sex"
)


summarisedResult <- summariseObservationPeriod(
  cdm = cdm,
  sex = TRUE,
  ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)

plotObservationPeriod(
  result = summarisedResult,
  colour = "sex",
  facet = "age_group"
)

4 Disconnect from CDM

Finally, disconnect from the mock CDM.

cdmDisconnect(cdm = cdm)
