In this vignette, we will explore the OmopSketch functions designed to provide an overview of the clinical tables within a CDM object (e.g. visit_occurrence, condition_occurrence, drug_exposure, procedure_occurrence, device_exposure, measurement, observation, and death). Specifically, there are two key functions that facilitate this:
- summariseClinicalRecords(): creates summary statistics with key basic information about the clinical table (e.g., number of records, records per person, etc.), some quality checks (e.g., missingness, correct filling of date columns, etc.) and a summary of the concepts used in the table (domains, source vocabularies, etc.).
- tableClinicalRecords(): helps visualise the results in a formatted table.
Let’s see an example of their functionality. To start with, we will load the essential packages and create a mock CDM using the R package omock:
library(dplyr)
library(OmopSketch)
library(omock)
# Connect to mock database
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
#> ℹ Reading GiBleed tables.
#> ℹ Adding drug_strength table.
#> ℹ Creating local <cdm_reference> object.
#> ℹ Inserting <cdm_reference> into duckdb.
cdm
#>
#> ── # OMOP CDM reference (duckdb) of GiBleed ────────────────────────────────────
#> • omop tables: care_site, cdm_source, concept, concept_ancestor, concept_class,
#> concept_relationship, concept_synonym, condition_era, condition_occurrence,
#> cost, death, device_exposure, domain, dose_era, drug_era, drug_exposure,
#> drug_strength, fact_relationship, location, measurement, metadata, note,
#> note_nlp, observation, observation_period, payer_plan_period, person,
#> procedure_occurrence, provider, relationship, source_to_concept_map, specimen,
#> visit_detail, visit_occurrence, vocabulary
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -
Let’s now use summariseClinicalRecords() from the
OmopSketch package to get an overview of one of the clinical
tables of the cdm (i.e., condition_occurrence).
summarisedResult <- summariseClinicalRecords(
cdm = cdm,
omopTableName = "condition_occurrence"
)
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising subjects not in person table in condition_occurrence.
#> ℹ Summarising records in observation in condition_occurrence.
#> ℹ Summarising records with start before birth date in condition_occurrence.
#> ℹ Summarising records with end date before start date in condition_occurrence.
#> ℹ Summarising domains in condition_occurrence.
#> ℹ Summarising standard concepts in condition_occurrence.
#> ℹ Summarising source vocabularies in condition_occurrence.
#> ℹ Summarising concept types in condition_occurrence.
#> ℹ Summarising missing data in condition_occurrence.
summarisedResult
#> # A tibble: 82 × 13
#> result_id cdm_name group_name group_level strata_name strata_level
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 GiBleed omop_table condition_occurrence overall overall
#> 2 1 GiBleed omop_table condition_occurrence overall overall
#> 3 1 GiBleed omop_table condition_occurrence overall overall
#> 4 1 GiBleed omop_table condition_occurrence overall overall
#> 5 1 GiBleed omop_table condition_occurrence overall overall
#> 6 1 GiBleed omop_table condition_occurrence overall overall
#> 7 1 GiBleed omop_table condition_occurrence overall overall
#> 8 1 GiBleed omop_table condition_occurrence overall overall
#> 9 1 GiBleed omop_table condition_occurrence overall overall
#> 10 1 GiBleed omop_table condition_occurrence overall overall
#> # ℹ 72 more rows
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> # estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> # additional_name <chr>, additional_level <chr>
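In the result printed above, estimate_value is a character column and estimate_type records how each value should be read. The following is a minimal base R sketch of converting such values to numbers. The rows are made up for illustration, and the set of type labels treated as numeric ("integer", "numeric", "percentage") is an assumption of this sketch rather than something taken from the output above.

```r
# Toy rows shaped like a summarised result (values are made up for illustration)
toy <- data.frame(
  variable_name  = c("Number records", "Records per person", "Number subjects"),
  estimate_name  = c("count", "mean", "percentage"),
  estimate_type  = c("integer", "numeric", "percentage"),
  estimate_value = c("120", "2.5", "100"),
  stringsAsFactors = FALSE
)

# Assumed set of type labels that denote numeric estimates
numericTypes <- c("integer", "numeric", "percentage")

# Convert only the rows whose declared type is numeric; others stay NA
toy$estimate_numeric <- NA_real_
isNumeric <- toy$estimate_type %in% numericTypes
toy$estimate_numeric[isNumeric] <- as.numeric(toy$estimate_value[isNumeric])

toy[, c("variable_name", "estimate_name", "estimate_numeric")]
```

Keeping the original character column alongside the converted one makes it easy to check that nothing was lost in the conversion.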
Notice that the output is in the summarised result format.
## Records per person
We can use the arguments to specify which statistics to compute. For example, use the argument recordsPerPerson to indicate which estimates of the number of records per person you are interested in.
summarisedResult <- summariseClinicalRecords(
cdm = cdm,
omopTableName = "condition_occurrence",
recordsPerPerson = c("mean", "sd", "q05", "q95")
)
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising subjects not in person table in condition_occurrence.
#> ℹ Summarising records in observation in condition_occurrence.
#> ℹ Summarising records with start before birth date in condition_occurrence.
#> ℹ Summarising records with end date before start date in condition_occurrence.
#> ℹ Summarising domains in condition_occurrence.
#> ℹ Summarising standard concepts in condition_occurrence.
#> ℹ Summarising source vocabularies in condition_occurrence.
#> ℹ Summarising concept types in condition_occurrence.
#> ℹ Summarising missing data in condition_occurrence.
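summarisedResult |>
filter(variable_name == "Records per person") |>
select(variable_name, estimate_name, estimate_value)
#> # A tibble: 4 × 3
#> variable_name estimate_name estimate_value
#> <chr> <chr> <chr>
#> 1 Records per person mean 24.2509
#> 2 Records per person sd 7.4065
#> 3 Records per person q05 14
#> 4 Records per person q95 38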
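To make the meaning of the recordsPerPerson estimates concrete, here is a minimal base R sketch that computes the same four statistics on a made-up table. It is not OmopSketch's implementation: the data are hypothetical, people without any record are simply absent from the toy table, and base R's default quantile interpolation may differ from what a database backend returns.

```r
# Toy clinical table: one row per record (made-up person_id values)
records <- data.frame(person_id = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4))

# Count records for each person
perPerson <- as.vector(table(records$person_id))

# The four estimates requested via recordsPerPerson in the call above
estimates <- c(
  mean = mean(perPerson),
  sd   = sd(perPerson),
  q05  = unname(quantile(perPerson, 0.05)),
  q95  = unname(quantile(perPerson, 0.95))
)
estimates
```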
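When the argument quality = TRUE is set, the results will include a quality assessment of the data. This assessment provides information such as:
- the number of subjects whose person_id values do not exist in the person table,
- the number of records inside and outside observation,
- the number of records that start before the person's birth date,
- the number of records whose end date is before their start date.
summarisedResult <- summariseClinicalRecords(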
cdm = cdm,
omopTableName = "condition_occurrence",
recordsPerPerson = NULL,
conceptSummary = FALSE,
missing = FALSE,
quality = TRUE
)
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising subjects not in person table in condition_occurrence.
#> ℹ Summarising records in observation in condition_occurrence.
#> ℹ Summarising records with start before birth date in condition_occurrence.
#> ℹ Summarising records with end date before start date in condition_occurrence.
summarisedResult |>
select(variable_name, estimate_name, estimate_value)
#> # A tibble: 13 × 3
#> variable_name estimate_name estimate_value
#> <chr> <chr> <chr>
#> 1 Number subjects count 2694
#> 2 Number subjects percentage 100
#> 3 Number records count 65332
#> 4 Subjects not in person table count 0
#> 5 Subjects not in person table percentage 0.00
#> 6 In observation count 450
#> 7 In observation count 64882
#> 8 Start date before birth date count 0
#> 9 End date before start date count 0
#> 10 In observation percentage 0.69
#> 11 In observation percentage 99.31
#> 12 Start date before birth date percentage 0.00
#> 13 End date before start date percentage 0.00
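The checks listed in the output above can be pictured with a small base R sketch. Everything here is hypothetical: the three toy tables, their simplified column names (birth_date, obs_start, obs_end, start_date, end_date) and the assumption of a single observation period per person. It illustrates the idea behind each check, not how OmopSketch computes it.

```r
# Toy person, observation period and condition tables (all values are made up)
person <- data.frame(
  person_id  = c(1, 2),
  birth_date = as.Date(c("1980-05-01", "1990-01-15"))
)
observationPeriod <- data.frame(
  person_id = c(1, 2),
  obs_start = as.Date(c("2000-01-01", "2005-01-01")),
  obs_end   = as.Date(c("2020-12-31", "2015-12-31"))
)
condition <- data.frame(
  person_id  = c(1, 1, 2, 3),
  start_date = as.Date(c("2010-03-01", "1975-01-01", "2016-06-01", "2012-01-01")),
  end_date   = as.Date(c("2010-03-10", "1975-01-05", "2016-05-01", NA))
)

# Left joins keep every condition record, even for unknown persons
# (assumes one observation period per person, so no rows are duplicated)
x <- merge(condition, person, by = "person_id", all.x = TRUE)
x <- merge(x, observationPeriod, by = "person_id", all.x = TRUE)

checks <- c(
  subjects_not_in_person = length(setdiff(condition$person_id, person$person_id)),
  in_observation = sum(x$start_date >= x$obs_start & x$start_date <= x$obs_end, na.rm = TRUE),
  start_before_birth = sum(x$start_date < x$birth_date, na.rm = TRUE),
  end_before_start = sum(x$end_date < x$start_date, na.rm = TRUE)
)
checks
```

In this toy data each check flags exactly one case: person 3 is missing from the person table, only the first record starts inside an observation period, the 1975 record precedes its owner's birth, and the 2016 record ends before it starts.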
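When the argument conceptSummary = TRUE is set, the
results will also include information about the concepts contained in
the table, such as their domains, standard concept status, source vocabularies, concept types and, for tables such as drug_exposure, concept classes: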
summarisedResult <- summariseClinicalRecords(
cdm = cdm,
omopTableName = "drug_exposure",
recordsPerPerson = NULL,
conceptSummary = TRUE,
missing = FALSE,
quality = FALSE
)
#> ℹ Adding variables of interest to drug_exposure.
#> ℹ Summarising records per person in drug_exposure.
#> ℹ Summarising domains in drug_exposure.
#> ℹ Summarising standard concepts in drug_exposure.
#> ℹ Summarising source vocabularies in drug_exposure.
#> ℹ Summarising concept types in drug_exposure.
#> ℹ Summarising concept class in drug_exposure.
summarisedResult |>
select(variable_name, variable_level, estimate_name, estimate_value)
#> # A tibble: 37 × 4
#> variable_name variable_level estimate_name estimate_value
#> <chr> <chr> <chr> <chr>
#> 1 Number subjects <NA> count 2694
#> 2 Number subjects <NA> percentage 100
#> 3 Number records <NA> count 67707
#> 4 Domain Drug count 67707
#> 5 Standard concept S count 67707
#> 6 Source vocabulary No matching concept count 35
#> 7 Source vocabulary NDC count 2694
#> 8 Source vocabulary CVX count 25710
#> 9 Source vocabulary RxNorm count 39268
#> 10 Type concept id Dispensed in Outpatient office count 25710
#> # ℹ 27 more rows
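Conceptually, a summary like the one above looks up each record's concept in the vocabulary tables and counts records per attribute. The sketch below shows that idea in base R with made-up concept ids and a tiny hypothetical concept lookup; it is an illustration under those assumptions, not the package's implementation.

```r
# Toy drug records and a toy concept lookup (ids and labels are made up)
drug <- data.frame(
  drug_concept_id        = c(10, 10, 20, 30),
  drug_source_concept_id = c(101, 101, 0, 102)
)
concept <- data.frame(
  concept_id       = c(0, 10, 20, 30, 101, 102),
  domain_id        = c("Metadata", "Drug", "Drug", "Drug", "Drug", "Drug"),
  vocabulary_id    = c("None", "RxNorm", "RxNorm", "CVX", "NDC", "CVX"),
  standard_concept = c(NA, "S", "S", "S", NA, "S"),
  stringsAsFactors = FALSE
)

# Look up an attribute of each record's concept by matching ids
lookup <- function(ids, column) concept[[column]][match(ids, concept$concept_id)]

# Count records per domain, standard flag and source vocabulary
domainCounts   <- table(lookup(drug$drug_concept_id, "domain_id"))
standardCounts <- table(lookup(drug$drug_concept_id, "standard_concept"), useNA = "ifany")
sourceVocab    <- table(lookup(drug$drug_source_concept_id, "vocabulary_id"))

domainCounts
standardCounts
sourceVocab
```

The record whose source concept id is 0 ends up in its own category here, which mirrors the "No matching concept" row in the output above.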
When the argument missing = TRUE is set, the results
will include a summary of missing data in the table, including the
number of 0s in the concept columns.
This output is analogous to the results produced by the OmopSketch
function summariseMissingData().
summarisedResult <- summariseClinicalRecords(
cdm = cdm,
omopTableName = "condition_occurrence",
recordsPerPerson = NULL,
conceptSummary = FALSE,
missing = TRUE,
quality = FALSE
)
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising missing data in condition_occurrence.
summarisedResult |>
select(variable_name, variable_level, estimate_name, estimate_value)
#> # A tibble: 53 × 4
#> variable_name variable_level estimate_name estimate_value
#> <chr> <chr> <chr> <chr>
#> 1 Number subjects <NA> count 2694
#> 2 Number subjects <NA> percentage 100
#> 3 Number records <NA> count 65332
#> 4 Column name condition_occurrence_id na_count 0
#> 5 Column name condition_occurrence_id na_percentage 0.00
#> 6 Column name condition_occurrence_id zero_count 0
#> 7 Column name condition_occurrence_id zero_percentage 0.00
#> 8 Column name person_id na_count 0
#> 9 Column name person_id na_percentage 0.00
#> 10 Column name person_id zero_count 0
#> # ℹ 43 more rows
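The four estimates reported per column (na_count, na_percentage, zero_count, zero_percentage) can be reproduced on a toy table with a few lines of base R. The data and the helper name summariseColumn are hypothetical, and reporting zero counts only for numeric columns is an assumption of this sketch.

```r
# Toy table with missing values and zero concept ids (values are made up)
x <- data.frame(
  condition_concept_id = c(0, 321, 0, 654),
  provider_id          = c(NA, NA, 7, NA),
  stop_reason          = c(NA_character_, NA, NA, NA),
  stringsAsFactors = FALSE
)

# For one column: NA count and percentage, plus zero count for numeric columns
summariseColumn <- function(values) {
  n <- length(values)
  naCount <- sum(is.na(values))
  zeroCount <- if (is.numeric(values)) sum(values == 0, na.rm = TRUE) else NA_integer_
  c(
    na_count = naCount,
    na_percentage = 100 * naCount / n,
    zero_count = zeroCount,
    zero_percentage = 100 * zeroCount / n
  )
}

# One row per column of x, one column per estimate
missingSummary <- t(sapply(x, summariseColumn))
missingSummary
```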
It is also possible to stratify the results by sex and age groups:
summarisedResult <- summariseClinicalRecords(
cdm = cdm,
omopTableName = "condition_occurrence",
recordsPerPerson = c("mean", "sd", "q05", "q95"),
quality = TRUE,
conceptSummary = TRUE,
sex = TRUE,
ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising subjects not in person table in condition_occurrence.
#> ℹ Summarising records in observation in condition_occurrence.
#> ℹ Summarising records with start before birth date in condition_occurrence.
#> ℹ Summarising records with end date before start date in condition_occurrence.
#> ℹ Summarising domains in condition_occurrence.
#> ℹ Summarising standard concepts in condition_occurrence.
#> ℹ Summarising source vocabularies in condition_occurrence.
#> ℹ Summarising concept types in condition_occurrence.
#> ℹ Summarising missing data in condition_occurrence.
summarisedResult |>
select(variable_name, strata_level, estimate_name, estimate_value)
#> # A tibble: 663 × 4
#> variable_name strata_level estimate_name estimate_value
#> <chr> <chr> <chr> <chr>
#> 1 Number subjects overall count 2694
#> 2 Number subjects overall percentage 100
#> 3 Records per person overall mean 24.2509
#> 4 Records per person overall sd 7.4065
#> 5 Records per person overall q05 14
#> 6 Records per person overall q95 38
#> 7 Number records overall count 65332
#> 8 Number subjects <35 count 2694
#> 9 Number subjects >=35 count 2656
#> 10 Number subjects <35 percentage 100
#> # ℹ 653 more rows
Notice that, by default, the “overall” group will also be included,
as well as crossed strata (for example, sex == "Female" and
ageGroup == ">=35").
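The following base R sketch illustrates how records might be assigned to the age bands used above and then counted per stratum and per crossed stratum. The data are made up, and the assumption that age is evaluated per record (so that one person can fall into more than one age group) is a plausible reading of the subject counts in the output above, not something the source states.

```r
# Toy records with age at record start and sex (values are made up)
x <- data.frame(
  person_id = c(1, 1, 2, 3, 3, 3),
  age       = c(30, 36, 50, 20, 34, 35),
  sex       = c("Female", "Female", "Male", "Male", "Male", "Male"),
  stringsAsFactors = FALSE
)

# Same age bands as in the call above: 0 to 34 and 35 or older
ageGroup <- list("<35" = c(0, 34), ">=35" = c(35, Inf))

# Assign each record the name of the band containing its age
x$age_group <- NA_character_
for (nm in names(ageGroup)) {
  band <- ageGroup[[nm]]
  x$age_group[x$age >= band[1] & x$age <= band[2]] <- nm
}

# Record counts for the crossed strata (age group by sex)
crossed <- table(x$age_group, x$sex)

# Distinct subjects per age group; a person may appear in both bands
subjects <- tapply(x$person_id, x$age_group, function(p) length(unique(p)))

crossed
subjects
```

In this toy data the per-band subject counts (2 and 3) add up to more than the 3 distinct people, because persons 1 and 3 have records in both bands.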
The analysis can also be conducted for multiple OMOP tables at the same time:
summarisedResult <- summariseClinicalRecords(
cdm = cdm,
omopTableName = c("visit_occurrence", "drug_exposure"),
recordsPerPerson = c("mean", "sd"),
quality = FALSE,
conceptSummary = FALSE,
missingData = FALSE
)
#> ℹ Adding variables of interest to visit_occurrence.
#> ℹ Summarising records per person in visit_occurrence.
#> ℹ Adding variables of interest to drug_exposure.
#> ℹ Summarising records per person in drug_exposure.
summarisedResult |>
select(group_level, variable_name, estimate_name, estimate_value)
#> # A tibble: 10 × 4
#> group_level variable_name estimate_name estimate_value
#> <chr> <chr> <chr> <chr>
#> 1 visit_occurrence Number subjects count 890
#> 2 visit_occurrence Number subjects percentage 100
#> 3 visit_occurrence Records per person mean 1.1652
#> 4 visit_occurrence Records per person sd 0.4145
#> 5 visit_occurrence Number records count 1037
#> 6 drug_exposure Number subjects count 2694
#> 7 drug_exposure Number subjects percentage 100
#> 8 drug_exposure Records per person mean 25.1325
#> 9 drug_exposure Records per person sd 5.2457
#> 10 drug_exposure Number records count 67707
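The output above stacks the estimates for each table and tells them apart through group_level. A minimal base R sketch of that pattern follows; the toy tables and the helper summariseOne are hypothetical and only meant to show the "summarise each table, then bind the rows" idea.

```r
# Toy tables keyed by name (person_id values are made up)
tables <- list(
  visit_occurrence = data.frame(person_id = c(1, 2, 2)),
  drug_exposure    = data.frame(person_id = c(1, 1, 1, 2))
)

# Summarise one table into long format, tagging rows with the table name
summariseOne <- function(name) {
  perPerson <- as.vector(table(tables[[name]]$person_id))
  data.frame(
    group_level    = name,
    estimate_name  = c("mean", "sd"),
    estimate_value = c(mean(perPerson), sd(perPerson)),
    stringsAsFactors = FALSE
  )
}

# Apply the summary to every table and stack the results
result <- do.call(rbind, lapply(names(tables), summariseOne))
result
```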
We can also filter the clinical table to a specific time window by
setting the dateRange argument.
summarisedResult <- summariseClinicalRecords(
cdm = cdm,
omopTableName = "drug_exposure",
dateRange = as.Date(c("1990-01-01", "2010-01-01"))
)
#> ℹ Adding variables of interest to drug_exposure.
#> ℹ Summarising records per person in drug_exposure.
#> ℹ Summarising subjects not in person table in drug_exposure.
#> ℹ Summarising records in observation in drug_exposure.
#> ℹ Summarising records with start before birth date in drug_exposure.
#> ℹ Summarising records with end date before start date in drug_exposure.
#> ℹ Summarising domains in drug_exposure.
#> ℹ Summarising standard concepts in drug_exposure.
#> ℹ Summarising source vocabularies in drug_exposure.
#> ℹ Summarising concept types in drug_exposure.
#> ℹ Summarising concept class in drug_exposure.
#> ℹ Summarising missing data in drug_exposure.
summarisedResult |>
settings() |>
glimpse()
#> Rows: 1
#> Columns: 10
#> $ result_id <int> 1
#> $ result_type <chr> "summarise_clinical_records"
#> $ package_name <chr> "OmopSketch"
#> $ package_version <chr> "1.0.0"
#> $ group <chr> "omop_table"
#> $ strata <chr> ""
#> $ additional <chr> ""
#> $ min_cell_count <chr> "0"
#> $ study_period_end <chr> "2010-01-01"
#> $ study_period_start <chr> "1990-01-01"
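As a rough picture of what restricting to a date range means, the base R sketch below filters a made-up table to the same window. Filtering on the record start date and treating both ends of the window as inclusive are assumptions of this sketch; the source does not specify either detail.

```r
# Toy drug records (dates are made up)
drug <- data.frame(
  drug_exposure_id = 1:5,
  start_date = as.Date(c("1985-06-01", "1990-01-01", "2001-09-15",
                         "2010-01-01", "2012-03-20"))
)

# Same window as in the call above
dateRange <- as.Date(c("1990-01-01", "2010-01-01"))

# Keep records starting inside the window; treating both ends as inclusive
# is an assumption of this sketch
inWindow <- drug$start_date >= dateRange[1] & drug$start_date <= dateRange[2]
filtered <- drug[inWindow, ]
filtered$drug_exposure_id
```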
tableClinicalRecords() tidies the
previous results and creates a formatted table of type gt, reactable or datatable. By default, it
creates a gt table.
summarisedResult <- summariseClinicalRecords(cdm,
omopTableName = "condition_occurrence",
recordsPerPerson = c("mean", "sd", "q05", "q95"),
quality = TRUE,
conceptSummary = TRUE,
sex = TRUE
)
#> ℹ Adding variables of interest to condition_occurrence.
#> ℹ Summarising records per person in condition_occurrence.
#> ℹ Summarising subjects not in person table in condition_occurrence.
#> ℹ Summarising records in observation in condition_occurrence.
#> ℹ Summarising records with start before birth date in condition_occurrence.
#> ℹ Summarising records with end date before start date in condition_occurrence.
#> ℹ Summarising domains in condition_occurrence.
#> ℹ Summarising standard concepts in condition_occurrence.
#> ℹ Summarising source vocabularies in condition_occurrence.
#> ℹ Summarising concept types in condition_occurrence.
#> ℹ Summarising missing data in condition_occurrence.
tableClinicalRecords(result = summarisedResult, type = "gt")
| Variable name | Variable level | Estimate name | Database name: GiBleed |
|---|---|---|---|
| condition_occurrence; overall | | | |
| Number records | – | N | 65,332.00 |
| Number subjects | – | N (%) | 2,694 (100.00%) |
| Subjects not in person table | – | N (%) | 0 (0.00%) |
| Records per person | – | Mean (SD) | 24.25 (7.41) |
| | | q05 | 14.00 |
| | | q95 | 38.00 |
| In observation | No | N (%) | 450 (0.69%) |
| | Yes | N (%) | 64,882 (99.31%) |
| Domain | Condition | N (%) | 65,332 (100.00%) |
| Source vocabulary | Icd10cm | N (%) | 479 (0.73%) |
| | No matching concept | N (%) | 27 (0.04%) |
| | Snomed | N (%) | 64,826 (99.23%) |
| Standard concept | S | N (%) | 65,332 (100.00%) |
| Type concept id | Ehr encounter diagnosis | N (%) | 65,332 (100.00%) |
| Start date before birth date | – | N (%) | 0 (0.00%) |
| End date before start date | – | N (%) | 0 (0.00%) |
| Column name | Condition concept id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 0 (0.00%) |
| | Condition end date | N missing data (%) | 8,652 (13.24%) |
| | Condition end datetime | N missing data (%) | 8,652 (13.24%) |
| | Condition occurrence id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 0 (0.00%) |
| | Condition source concept id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 0 (0.00%) |
| | Condition source value | N missing data (%) | 0 (0.00%) |
| | Condition start date | N missing data (%) | 0 (0.00%) |
| | Condition start datetime | N missing data (%) | 0 (0.00%) |
| | Condition status concept id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 65,332 (100.00%) |
| | Condition status source value | N missing data (%) | 65,332 (100.00%) |
| | Condition type concept id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 0 (0.00%) |
| | Person id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 0 (0.00%) |
| | Provider id | N missing data (%) | 65,332 (100.00%) |
| | | N zeros (%) | 0 (0.00%) |
| | Stop reason | N missing data (%) | 65,332 (100.00%) |
| | Visit detail id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 65,332 (100.00%) |
| | Visit occurrence id | N missing data (%) | 64 (0.10%) |
| | | N zeros (%) | 0 (0.00%) |
| condition_occurrence; Female | | | |
| Number records | – | N | 33,744.00 |
| Number subjects | – | N (%) | 1,373 (100.00%) |
| Records per person | – | Mean (SD) | 24.58 (7.59) |
| | | q05 | 14.00 |
| | | q95 | 38.00 |
| In observation | No | N (%) | 227 (0.67%) |
| | Yes | N (%) | 33,517 (99.33%) |
| Domain | Condition | N (%) | 33,744 (100.00%) |
| Source vocabulary | Icd10cm | N (%) | 242 (0.72%) |
| | No matching concept | N (%) | 15 (0.04%) |
| | Snomed | N (%) | 33,487 (99.24%) |
| Standard concept | S | N (%) | 33,744 (100.00%) |
| Type concept id | Ehr encounter diagnosis | N (%) | 33,744 (100.00%) |
| Column name | Condition concept id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 0 (0.00%) |
| | Condition end date | N missing data (%) | 4,397 (13.03%) |
| | Condition end datetime | N missing data (%) | 4,397 (13.03%) |
| | Condition occurrence id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 0 (0.00%) |
| | Condition source concept id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 0 (0.00%) |
| | Condition source value | N missing data (%) | 0 (0.00%) |
| | Condition start date | N missing data (%) | 0 (0.00%) |
| | Condition start datetime | N missing data (%) | 0 (0.00%) |
| | Condition status concept id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 33,744 (100.00%) |
| | Condition status source value | N missing data (%) | 33,744 (100.00%) |
| | Condition type concept id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 0 (0.00%) |
| | Person id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 0 (0.00%) |
| | Provider id | N missing data (%) | 33,744 (100.00%) |
| | | N zeros (%) | 0 (0.00%) |
| | Stop reason | N missing data (%) | 33,744 (100.00%) |
| | Visit detail id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 33,744 (100.00%) |
| | Visit occurrence id | N missing data (%) | 24 (0.07%) |
| | | N zeros (%) | 0 (0.00%) |
| condition_occurrence; Male | | | |
| Number records | – | N | 31,588.00 |
| Number subjects | – | N (%) | 1,321 (100.00%) |
| Records per person | – | Mean (SD) | 23.91 (7.20) |
| | | q05 | 13.00 |
| | | q95 | 37.00 |
| In observation | No | N (%) | 223 (0.71%) |
| | Yes | N (%) | 31,365 (99.29%) |
| Domain | Condition | N (%) | 31,588 (100.00%) |
| Source vocabulary | Icd10cm | N (%) | 237 (0.75%) |
| | No matching concept | N (%) | 12 (0.04%) |
| | Snomed | N (%) | 31,339 (99.21%) |
| Standard concept | S | N (%) | 31,588 (100.00%) |
| Type concept id | Ehr encounter diagnosis | N (%) | 31,588 (100.00%) |
| Column name | Condition concept id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 0 (0.00%) |
| | Condition end date | N missing data (%) | 4,255 (13.47%) |
| | Condition end datetime | N missing data (%) | 4,255 (13.47%) |
| | Condition occurrence id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 0 (0.00%) |
| | Condition source concept id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 0 (0.00%) |
| | Condition source value | N missing data (%) | 0 (0.00%) |
| | Condition start date | N missing data (%) | 0 (0.00%) |
| | Condition start datetime | N missing data (%) | 0 (0.00%) |
| | Condition status concept id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 31,588 (100.00%) |
| | Condition status source value | N missing data (%) | 31,588 (100.00%) |
| | Condition type concept id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 0 (0.00%) |
| | Person id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 0 (0.00%) |
| | Provider id | N missing data (%) | 31,588 (100.00%) |
| | | N zeros (%) | 0 (0.00%) |
| | Stop reason | N missing data (%) | 31,588 (100.00%) |
| | Visit detail id | N missing data (%) | 0 (0.00%) |
| | | N zeros (%) | 31,588 (100.00%) |
| | Visit occurrence id | N missing data (%) | 40 (0.13%) |
| | | N zeros (%) | 0 (0.00%) |
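Cells such as "2,694 (100.00%)" in the table above combine a count and a percentage into one string. The helper below is a small base R sketch of that kind of formatting; the function name formatCountPercentage is hypothetical and this is not how tableClinicalRecords() is implemented internally.

```r
# Format a count and its percentage the way the table cells above read,
# e.g. "2,694 (100.00%)"
formatCountPercentage <- function(count, percentage) {
  paste0(
    formatC(count, format = "d", big.mark = ","),
    " (",
    formatC(percentage, format = "f", digits = 2),
    "%)"
  )
}

# Counts and percentages taken from the "overall" rows of the table above
formatted <- formatCountPercentage(c(2694, 450), c(100, 0.69))
formatted
```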
Finally, disconnect from the mock CDM.
cdmDisconnect(cdm = cdm)