2 Introduction

In this vignette, we explore how the OmopSketch functions databaseCharacteristics() and shinyCharacteristics() can be used to characterise databases containing electronic health records mapped to the OMOP Common Data Model.

2.1 Create a mock CDM

We begin by loading the necessary packages and creating a mock CDM using the mockOmopSketch() function:

library(dplyr)
library(OmopSketch)

cdm <- mockOmopSketch()

cdm
#> 
#> ── # OMOP CDM reference (duckdb) of mockOmopSketch ─────────────────────────────
#> • omop tables: person, observation_period, cdm_source, concept, vocabulary,
#> concept_relationship, concept_synonym, concept_ancestor, drug_strength,
#> condition_occurrence, death, drug_exposure, measurement, observation,
#> procedure_occurrence, visit_occurrence, device_exposure
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -
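
Before characterising the database, it can help to confirm which tables the reference holds and that they contain data. The following is a minimal sketch, assuming the cdm reference can be treated as a named list of lazy tables and that dplyr verbs are translated to the duckdb backend:

```r
library(dplyr)
library(OmopSketch)

cdm <- mockOmopSketch()

# Names of the tables available in the cdm reference
names(cdm)

# Number of rows in the person table, counted in the database and then pulled into R
nPersons <- cdm$person |>
  tally() |>
  pull("n")
nPersons
```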

3 Summarise database characteristics

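The databaseCharacteristics() function provides a comprehensive summary of the CDM. It returns a summarised result that combines a database snapshot, population characteristics, missing data, table quality, clinical records, record counts, in-observation summaries, and observation period summaries: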

result <- databaseCharacteristics(cdm)
#> The characterisation will focus on the following OMOP tables: person,
#> observation_period, visit_occurrence, condition_occurrence, drug_exposure,
#> procedure_occurrence, device_exposure, measurement, observation, and death
#> → Getting cdm snapshot
#> Warning: Vocabulary version in cdm_source (NA) doesn't match the one in the vocabulary
#> table (v5.0 18-JAN-19)
#> → Getting population characteristics
#> ℹ Building new trimmed cohort
#> Creating initial cohort
#> ✔ Cohort trimmed
#> ℹ adding demographics columns
#> 
#> ℹ summarising data
#> 
#> ℹ summarising cohort general_population
#> 
#> ✔ summariseCharacteristics finished!
#> 
#> → Summarising missing data
#> Warning: These columns contain missing values, which are not permitted:
#> "race_concept_id" and "ethnicity_concept_id"
#> Warning: These columns contain missing values, which are not permitted:
#> "period_type_concept_id"
#> Warning: device_exposureomop table is empty.
#> ! 56 duplicated rows eliminated.
#> → Summarising table quality
#> Warning: device_exposureomop table is empty.
#> → Summarising clinical records
#> ℹ Adding variables of interest to observation_period.
#> ℹ Summarising records per person in observation_period.
#> ℹ Summarising observation_period: `in_observation` and `type_concept`.
#> ℹ Adding variables of interest to visit_occurrence.
#> ℹ Summarising records per person in visit_occurrence.
#> ℹ Summarising visit_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising condition_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.
#> ℹ Adding variables of interest to drug_exposure.
#> ℹ Summarising records per person in drug_exposure.
#> ℹ Summarising drug_exposure: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.
#> ℹ Adding variables of interest to procedure_occurrence.
#> ℹ Summarising records per person in procedure_occurrence.
#> ℹ Summarising procedure_occurrence: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.
#> Warning: device_exposure is empty.
#> ℹ Adding variables of interest to measurement.
#> ℹ Summarising records per person in measurement.
#> ℹ Summarising measurement: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.
#> ℹ Adding variables of interest to observation.
#> ℹ Summarising records per person in observation.
#> ℹ Summarising observation: `in_observation`, `standard_concept`,
#>   `source_vocabulary`, `domain_id`, and `type_concept`.
#> ℹ Adding variables of interest to death.
#> ℹ Summarising records per person in death.
#> ℹ Summarising death: `in_observation`, `standard_concept`, `source_vocabulary`,
#>   `domain_id`, and `type_concept`.
#> → Summarising record counts
#> Warning: device_exposure omop table is empty after application of date range.
#> → Summarising in observation records, subjects, person-days, age and sex
#> ℹ The following estimates will be computed:
#> • age: median
#> → Start summary of data, at 2025-06-18 20:18:17.374429
#> 
#> ✔ Summary finished, at 2025-06-18 20:18:17.425096
#> → Summarising observation period
#> ☺ Database characterisation finished. Code ran in 0 min and 16 sec
#> ℹ 1 table created: "og_075_1750274284".

The settings of the result list each result type it contains and the package that produced it:

omopgenerics::settings(result) |> dplyr::select("result_id", "result_type", "package_name")
#> # A tibble: 8 × 3
#>   result_id result_type                  package_name         
#>       <int> <chr>                        <chr>                
#> 1         1 summarise_omop_snapshot      OmopSketch           
#> 2         2 summarise_characteristics    CohortCharacteristics
#> 3         3 summarise_missing_data       OmopSketch           
#> 4         4 summarise_table_quality      OmopSketch           
#> 5         5 summarise_clinical_records   OmopSketch           
#> 6         6 summarise_record_count       OmopSketch           
#> 7         7 summarise_in_observation     OmopSketch           
#> 8         8 summarise_observation_period OmopSketch
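
Because the result bundles several result types, a common next step is to extract one of them. The sketch below assumes that omopgenerics::filterSettings() is available in the installed omopgenerics version; it keeps only the rows whose settings match the condition. The reduced table selection is used here only to keep the run short.

```r
library(OmopSketch)

cdm <- mockOmopSketch()
result <- databaseCharacteristics(
  cdm,
  omopTableName = c("drug_exposure", "condition_occurrence")
)

# Keep only the rows that belong to the missing data summary
missingData <- result |>
  omopgenerics::filterSettings(result_type == "summarise_missing_data")

# The filtered object keeps its settings, so they can be inspected as before
omopgenerics::settings(missingData) |>
  dplyr::select("result_id", "result_type", "package_name")
```

The same pattern applies to any other value shown in the result_type column above.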

3.1 Selecting tables to characterise

By default, the following OMOP tables are included in the characterisation: person, observation_period, visit_occurrence, condition_occurrence, drug_exposure, procedure_occurrence, device_exposure, measurement, observation, death.

You can customise which tables to include in the analysis by specifying them with the omopTableName argument.

result <- databaseCharacteristics(cdm, omopTableName = c("drug_exposure", "condition_occurrence"))

3.2 Stratifying by sex

To stratify the characterisation results by sex, set the sex argument to TRUE:

result <- databaseCharacteristics(cdm, omopTableName = c("drug_exposure", "condition_occurrence"),
                                  sex = TRUE)

3.3 Stratifying by age group

To stratify the characterisation by age group, pass the ageGroup argument a list in which each element gives the lower and upper age of one group:

result <- databaseCharacteristics(cdm, omopTableName = c("drug_exposure", "condition_occurrence"),
                                  ageGroup = list(c(0,50), c(51,100)))
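
Overlapping or inverted age bands are easy to introduce by accident, so a quick check before the call can be useful. The helper below is hypothetical (it is not part of OmopSketch) and assumes that both bounds of each band are inclusive, as the example above suggests:

```r
# Hypothetical helper, not part of OmopSketch: returns TRUE when every band has
# lower <= upper and no two bands overlap (bounds treated as inclusive).
checkAgeGroups <- function(ageGroup) {
  # One row per band: column 1 = lower bound, column 2 = upper bound
  bounds <- do.call(rbind, ageGroup)
  bounds <- bounds[order(bounds[, 1]), , drop = FALSE]

  wellFormed <- all(bounds[, 1] <= bounds[, 2])
  # After sorting, each band must start after the previous band ends
  noOverlap <- all(bounds[-1, 1] > bounds[-nrow(bounds), 2])

  wellFormed && noOverlap
}

checkAgeGroups(list(c(0, 50), c(51, 100)))
#> [1] TRUE
checkAgeGroups(list(c(0, 50), c(50, 100)))
#> [1] FALSE
```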

3.4 Filtering by date range and time interval

Use the dateRange argument to limit the analysis to a specific period. Combine it with the interval argument to stratify results by time. Valid values for interval include “overall” (default), “years”, “quarters”, and “months”:

result <- databaseCharacteristics(cdm,
                                  interval = "years",
                                  dateRange = as.Date(c("2010-01-01", "2018-12-31")))
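
As a rough guide to how many yearly strata to expect, the calendar years touched by a date range can be listed with base R. This is only an illustration: it assumes that interval = "years" groups records by calendar year, which the text above does not state explicitly.

```r
dateRange <- as.Date(c("2010-01-01", "2018-12-31"))

# Calendar years touched by the range (both ends included)
startYear <- as.integer(format(dateRange[1], "%Y"))
endYear <- as.integer(format(dateRange[2], "%Y"))
years <- seq(startYear, endYear)

length(years)
#> [1] 9
```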

3.5 Including concept counts

To include concept counts in the characterisation, set conceptIdCounts = TRUE:

result <- databaseCharacteristics(cdm,
                                  conceptIdCounts = TRUE)

4 Visualise the characterisation results

To explore the characterisation results interactively, you can use the shinyCharacteristics() function. This function generates a Shiny application in the specified directory, allowing you to browse, filter, and visualise the results through an intuitive user interface.

shinyCharacteristics(result = result, directory = "path/to/your/shiny")
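
When the characterisation runs on one machine and the app is built on another, the result can be written to disk and read back first. The following is a sketch under stated assumptions: it relies on omopgenerics::exportSummarisedResult() and omopgenerics::importSummarisedResult(), and the output folder and file name are placeholders chosen for illustration.

```r
library(OmopSketch)

cdm <- mockOmopSketch()
result <- databaseCharacteristics(
  cdm,
  omopTableName = c("drug_exposure", "condition_occurrence")
)

# Placeholder output location for this illustration
outDir <- file.path(tempdir(), "characterisation")
dir.create(outDir, showWarnings = FALSE)
csvFile <- file.path(outDir, "database_characteristics.csv")

# Write the result to a CSV file; counts below minCellCount are suppressed
omopgenerics::exportSummarisedResult(
  result,
  minCellCount = 5,
  fileName = "database_characteristics.csv",
  path = outDir
)

# Read the result back as a summarised result object
resultFromDisk <- omopgenerics::importSummarisedResult(csvFile)
```

The imported object can then be passed to shinyCharacteristics() through its result argument in the same way as the original.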

4.1 Customise the Shiny app

You can customise the title, logo, and theme of the Shiny app by setting the appropriate arguments:

  • title: The title displayed at the top of the app

  • logo: Path to a custom logo (must be in SVG format)

  • theme: A custom Bootstrap theme (e.g., a bslib::bs_theme() call, supplied as a character string as in the example below)

shinyCharacteristics(result = result, directory = "path/to/my/shiny",
                     title = "Characterisation of my data",
                     logo = "path/to/my/logo.svg",
                     theme = "bslib::bs_theme(bootswatch = 'flatly')")

An example of the Shiny application generated by shinyCharacteristics() can be explored here, where the characterisation of several synthetic datasets is available.