In this vignette, we explore how the OmopSketch functions databaseCharacteristics() and shinyCharacteristics() can serve as valuable tools for characterising databases containing electronic health records mapped to the OMOP Common Data Model.
We begin by loading the necessary packages and creating a mock CDM using the mockOmopSketch() function:
library(dplyr)
library(OmopSketch)
cdm <- mockOmopSketch()
cdm
#>
#> ── # OMOP CDM reference (duckdb) of mockOmopSketch ─────────────────────────────
#> • omop tables: person, observation_period, cdm_source, concept, vocabulary,
#> concept_relationship, concept_synonym, concept_ancestor, drug_strength,
#> condition_occurrence, death, drug_exposure, measurement, observation,
#> procedure_occurrence, visit_occurrence, device_exposure
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -
The databaseCharacteristics() function provides a comprehensive summary of the CDM, returning a summarised result that includes:
- A general database snapshot, using summariseOmopSnapshot()
- A characterisation of the population in observation, built using the CohortConstructor and CohortCharacteristics packages
- A summary of the observation period table, using summariseObservationPeriod() and summariseInObservation()
- A data quality assessment of the clinical tables, using summariseMissingData()
- A characterisation of the clinical tables, using summariseClinicalRecords() and summariseRecordCount()
result <- databaseCharacteristics(cdm)
#> The characterisation will focus on the following OMOP tables: person,
#> observation_period, visit_occurrence, condition_occurrence, drug_exposure,
#> procedure_occurrence, device_exposure, measurement, observation, and death
#> → Getting cdm snapshot
#> Warning: Vocabulary version in cdm_source (NA) doesn't match the one in the vocabulary
#> table (v5.0 18-JAN-19)
#> → Getting population characteristics
#> ℹ Building new trimmed cohort
#> Creating initial cohort
#> ✔ Cohort trimmed
#> ℹ adding demographics columns
#>
#> ℹ summarising data
#>
#> ℹ summarising cohort general_population
#>
#> ✔ summariseCharacteristics finished!
#>
#> → Summarising missing data
#> Warning: These columns contain missing values, which are not permitted:
#> "race_concept_id" and "ethnicity_concept_id"
#> Warning: These columns contain missing values, which are not permitted:
#> "period_type_concept_id"
#> Warning: device_exposureomop table is empty.
#> ! 56 duplicated rows eliminated.
#> → Summarising table quality
#> Warning: device_exposureomop table is empty.
#> → Summarising clinical records
#> ℹ Adding variables of interest to observation_period.
#> ℹ Summarising records per person in observation_period.
#> ℹ Summarising observation_period: `in_observation` and `type_concept`.
#> ℹ Adding variables of interest to visit_occurrence.
#> ℹ Summarising records per person in visit_occurrence.
#> ℹ Summarising visit_occurrence: `in_observation`, `standard_concept`,
#> `source_vocabulary`, `domain_id`, and `type_concept`.
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising condition_occurrence: `in_observation`, `standard_concept`,
#> `source_vocabulary`, `domain_id`, and `type_concept`.
#> ℹ Adding variables of interest to drug_exposure.
#> ℹ Summarising records per person in drug_exposure.
#> ℹ Summarising drug_exposure: `in_observation`, `standard_concept`,
#> `source_vocabulary`, `domain_id`, and `type_concept`.
#> ℹ Adding variables of interest to procedure_occurrence.
#> ℹ Summarising records per person in procedure_occurrence.
#> ℹ Summarising procedure_occurrence: `in_observation`, `standard_concept`,
#> `source_vocabulary`, `domain_id`, and `type_concept`.
#> Warning: device_exposure is empty.
#> ℹ Adding variables of interest to measurement.
#> ℹ Summarising records per person in measurement.
#> ℹ Summarising measurement: `in_observation`, `standard_concept`,
#> `source_vocabulary`, `domain_id`, and `type_concept`.
#> ℹ Adding variables of interest to observation.
#> ℹ Summarising records per person in observation.
#> ℹ Summarising observation: `in_observation`, `standard_concept`,
#> `source_vocabulary`, `domain_id`, and `type_concept`.
#> ℹ Adding variables of interest to death.
#> ℹ Summarising records per person in death.
#> ℹ Summarising death: `in_observation`, `standard_concept`, `source_vocabulary`,
#> `domain_id`, and `type_concept`.
#> → Summarising record counts
#> Warning: device_exposure omop table is empty after application of date range.
#> → Summarising in observation records, subjects, person-days, age and sex
#> ℹ The following estimates will be computed:
#> • age: median
#> → Start summary of data, at 2025-06-18 20:18:17.374429
#>
#> ✔ Summary finished, at 2025-06-18 20:18:17.425096
#> → Summarising observation period
#> ☺ Database characterisation finished. Code ran in 0 min and 16 sec
#> ℹ 1 table created: "og_075_1750274284".
omopgenerics::settings(result) |> dplyr::select("result_id", "result_type", "package_name")
#> # A tibble: 8 × 3
#> result_id result_type package_name
#> <int> <chr> <chr>
#> 1 1 summarise_omop_snapshot OmopSketch
#> 2 2 summarise_characteristics CohortCharacteristics
#> 3 3 summarise_missing_data OmopSketch
#> 4 4 summarise_table_quality OmopSketch
#> 5 5 summarise_clinical_records OmopSketch
#> 6 6 summarise_record_count OmopSketch
#> 7 7 summarise_in_observation OmopSketch
#> 8 8 summarise_observation_period OmopSketch
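Because each component is stored under its own result_type, a single component can be pulled out of the combined result. The following is a minimal sketch, assuming that omopgenerics::filterSettings() is available in your installed version of omopgenerics and that a missing data summary is produced for the chosen table. The restriction to drug_exposure is only there to keep the example quick.

```r
library(OmopSketch)

# Rebuild a small result so the example is self-contained
cdm <- mockOmopSketch()
result <- databaseCharacteristics(cdm, omopTableName = "drug_exposure")

# Keep only the rows that belong to the missing data assessment
missingData <- omopgenerics::filterSettings(
  result,
  result_type == "summarise_missing_data"
)

# Check which result types remain after filtering
unique(omopgenerics::settings(missingData)$result_type)
```

The same pattern should work for any of the result types listed in the settings table above.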
By default, the following OMOP tables are included in the characterisation: person, observation_period, visit_occurrence, condition_occurrence, drug_exposure, procedure_occurrence, device_exposure, measurement, observation, death.
You can customise which tables to include in the analysis by specifying them with the omopTableName argument.
result <- databaseCharacteristics(cdm, omopTableName = c("drug_exposure", "condition_occurrence"))
To stratify the characterisation results by sex, set the sex argument to TRUE:
result <- databaseCharacteristics(cdm, omopTableName = c("drug_exposure", "condition_occurrence"),
sex = TRUE)
To stratify the characterisation by age group, pass a list defining the age groups to the ageGroup argument. Each element of the list is a vector with the lower and upper age bounds of one group:
result <- databaseCharacteristics(cdm, omopTableName = c("drug_exposure", "condition_occurrence"),
ageGroup = list(c(0,50), c(51,100)))
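When many age groups are needed, writing each pair of bounds by hand is error-prone. The sketch below builds the list from lower bounds only, using base R. The helper makeAgeGroups() and its maxAge default are hypothetical and not part of OmopSketch.

```r
# Hypothetical helper (not part of OmopSketch): turn increasing lower bounds
# into contiguous, non-overlapping c(lower, upper) age groups.
makeAgeGroups <- function(lowerBounds, maxAge = 100) {
  stopifnot(
    !is.unsorted(lowerBounds, strictly = TRUE),
    maxAge >= max(lowerBounds)
  )
  # Each group ends one year before the next group starts
  upperBounds <- c(lowerBounds[-1] - 1, maxAge)
  Map(c, lowerBounds, upperBounds)
}

# Reproduces the two groups used in the call above
ageGroups <- makeAgeGroups(c(0, 51))
str(ageGroups)
```

The resulting list can then be supplied as ageGroup = ageGroups in the call to databaseCharacteristics().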
Use the dateRange argument to limit the analysis to a specific period. Combine it with the interval argument to stratify results by time. Valid values for interval include “overall” (default), “years”, “quarters”, and “months”:
result <- databaseCharacteristics(cdm,
interval = "years",
dateRange = as.Date(c("2010-01-01", "2018-12-31")))
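Both arguments can be prepared and checked in base R before the characterisation is run. This is a small sketch under stated assumptions: the nine-year window is an illustrative choice that happens to match the call above, and the list of valid intervals is copied from the text rather than read from the package.

```r
# Allowed values for interval, as listed in the text above
validIntervals <- c("overall", "years", "quarters", "months")
interval <- match.arg("years", validIntervals)

# Illustrative study window: the nine calendar years ending on 2018-12-31
endDate <- as.Date("2018-12-31")
startDate <- seq(endDate, by = "-9 years", length.out = 2)[2] + 1
dateRange <- c(startDate, endDate)

# Fail early if the window is not ordered correctly
stopifnot(dateRange[1] <= dateRange[2])
format(dateRange)
```

The objects can then be passed as interval = interval and dateRange = dateRange to databaseCharacteristics().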
To include concept counts in the characterisation, set conceptIdCounts = TRUE:
result <- databaseCharacteristics(cdm,
conceptIdCounts = TRUE)
To explore the characterisation results interactively, you can use the shinyCharacteristics() function. This function generates a Shiny application in the specified directory, allowing you to browse, filter, and visualise the results through an intuitive user interface.
shinyCharacteristics(result = result, directory = "path/to/your/shiny")
You can customise the title, logo, and theme of the Shiny app by setting the appropriate arguments:
- title: The title displayed at the top of the app
- logo: Path to a custom logo (must be in SVG format)
- theme: A custom Bootstrap theme, given as a character string of code (e.g., "bslib::bs_theme(bootswatch = 'flatly')")
shinyCharacteristics(result = result, directory = "path/to/my/shiny",
title = "Characterisation of my data",
logo = "path/to/my/logo.svg",
theme = "bslib::bs_theme(bootswatch = 'flatly')")
An example of the Shiny application generated by shinyCharacteristics() can be explored here, where the characterisation of several synthetic datasets is available.