
2 Introduction

In this vignette we explore the OmopSketch functions that give a concise overview of the OMOP person table. Two functions do the work: summarisePerson(), which computes the summaries, and tablePerson(), which formats them as a table.

2.1 Create a mock cdm

Let’s load the required packages and create a mock CDM using the R package omock so we can run the functions on a small example.

library(dplyr)
library(OmopSketch)
library(omock)

# Create a mock CDM from the GiBleed dataset, stored in duckdb
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb") 
#> ℹ Reading GiBleed tables.
#> ℹ Adding drug_strength table.
#> ℹ Creating local <cdm_reference> object.
#> ℹ Inserting <cdm_reference> into duckdb.
cdm
#> 
#> ── # OMOP CDM reference (duckdb) of GiBleed ────────────────────────────────────
#> • omop tables: care_site, cdm_source, concept, concept_ancestor, concept_class,
#> concept_relationship, concept_synonym, condition_era, condition_occurrence,
#> cost, death, device_exposure, domain, dose_era, drug_era, drug_exposure,
#> drug_strength, fact_relationship, location, measurement, metadata, note,
#> note_nlp, observation, observation_period, payer_plan_period, person,
#> procedure_occurrence, provider, relationship, source_to_concept_map, specimen,
#> visit_detail, visit_occurrence, vocabulary
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

3 Summarise person table

Run summarisePerson() to compute basic summaries of the person table. It returns a summarised_result object.

result <- summarisePerson(cdm = cdm)

result |> 
  glimpse()
#> Rows: 123
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name         <chr> "GiBleed", "GiBleed", "GiBleed", "GiBleed", "GiBleed"…
#> $ group_name       <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ group_level      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "Number subjects", "Number subjects not in observatio…
#> $ variable_level   <chr> NA, NA, NA, "Female", "Female", "Male", "Male", "None…
#> $ estimate_name    <chr> "count", "count", "percentage", "count", "percentage"…
#> $ estimate_type    <chr> "integer", "integer", "numeric", "integer", "numeric"…
#> $ estimate_value   <chr> "2694", "0", "0", "1373", "50.9651076466221", "1321",…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
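
The glimpse() output shows that estimate_value is stored as character, with estimate_type recording the intended type, so values need converting before any arithmetic. The following is a minimal base R sketch of that step. It works on a toy data frame (the name toy is hypothetical) whose rows are copied from the output above, and it assumes the Female and Male rows belong to the variable "Sex", since glimpse() truncates that column.

```r
# Toy rows copied from the glimpse() output; `toy` is a hypothetical
# stand-in for `result`, not the full summarised_result.
toy <- data.frame(
  variable_name  = c("Number subjects", "Sex", "Sex", "Sex"),
  variable_level = c(NA, "Female", "Female", "Male"),
  estimate_name  = c("count", "count", "percentage", "count"),
  estimate_type  = c("integer", "integer", "numeric", "integer"),
  estimate_value = c("2694", "1373", "50.9651076466221", "1321"),
  stringsAsFactors = FALSE
)

# Keep only the count rows for Sex and convert the character values to integers.
is_sex_count <- toy$variable_name == "Sex" & toy$estimate_name == "count"
sex_counts <- toy[is_sex_count, c("variable_level", "estimate_value")]
sex_counts$estimate_value <- as.integer(sex_counts$estimate_value)

# Recompute the percentage of females from the converted counts.
total <- as.integer(toy$estimate_value[toy$variable_name == "Number subjects"])
female <- sex_counts$estimate_value[sex_counts$variable_level == "Female"]
female_pct <- 100 * female / total

print(sex_counts)
print(female_pct)
```

The same row selection should carry over to the real result object, for example with dplyr::filter() on variable_name and estimate_name.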

3.1 What the function reports

summarisePerson() builds a set of common summaries:

  • Number subjects: total number of rows in person.

  • Number subjects not in observation: number (and percentage) of persons that do not appear in observation_period (useful to detect missing observation periods). A warning is emitted if any are found.

  • Sex: counts and percentages for the sex categories (Female, Male and, in the output shown here, None).

  • Sex source: distribution of the raw gender_source_value.

  • Race / Race source: distribution of race_concept_id and race_source_value.

  • Ethnicity / Ethnicity source: distribution of ethnicity_concept_id and ethnicity_source_value.

  • Year / Month / Day of birth: numeric summaries (missingness, quantiles, min/max) of birth date components.

  • Location, Provider, Care site: number and percentage of missing values and of zeros, plus the number of distinct values.
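
To make the "Number subjects not in observation" summary concrete, here is a minimal base R sketch of the idea: find the person_id values that never appear in observation_period, then report the count and percentage. The two toy tables are hypothetical, and this is not necessarily how summarisePerson() computes the value internally.

```r
# Hypothetical toy tables; only the person_id column matters here.
person <- data.frame(person_id = c(1L, 2L, 3L, 4L))
observation_period <- data.frame(person_id = c(1L, 2L, 2L, 4L))

# Persons with no row at all in observation_period.
not_in_observation <- setdiff(person$person_id, observation_period$person_id)

# Count and percentage relative to the number of rows in person.
n_missing <- length(not_in_observation)
pct_missing <- 100 * n_missing / nrow(person)

print(not_in_observation)
print(c(count = n_missing, percentage = pct_missing))
```

In this toy example person 3 has no observation period, so the count is 1 and the percentage is 25.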

4 Tidy the summarised object

tablePerson() tidies the summarised result into a formatted table of type gt, reactable or datatable. The default is a gt table.

tablePerson(result = result, type = "gt")
Summary of person table (CDM name: GiBleed)

Variable name                       Variable level       Estimate name           GiBleed
Number subjects                     -                    N                       2,694
Number subjects not in observation  -                    N (%)                   0 (0.00%)
Sex                                 Female               N (%)                   1,373 (50.97%)
Sex                                 Male                 N (%)                   1,321 (49.03%)
Sex                                 None                 N (%)                   0 (0.00%)
Sex source                          F                    N (%)                   1,373 (50.97%)
Sex source                          M                    N (%)                   1,321 (49.03%)
Race                                No matching concept  N (%)                   451 (16.74%)
Race                                Missing              N (%)                   2,243 (83.26%)
Race source                         asian                N (%)                   212 (7.87%)
Race source                         black                N (%)                   338 (12.55%)
Race source                         hispanic             N (%)                   435 (16.15%)
Race source                         native               N (%)                   14 (0.52%)
Race source                         other                N (%)                   2 (0.07%)
Race source                         white                N (%)                   1,693 (62.84%)
Ethnicity                           No matching concept  N (%)                   2,259 (83.85%)
Ethnicity                           Missing              N (%)                   435 (16.15%)
Ethnicity source                    african              N (%)                   119 (4.42%)
Ethnicity source                    american             N (%)                   79 (2.93%)
Ethnicity source                    american_indian      N (%)                   14 (0.52%)
Ethnicity source                    arab                 N (%)                   2 (0.07%)
Ethnicity source                    asian_indian         N (%)                   81 (3.01%)
Ethnicity source                    central_american     N (%)                   75 (2.78%)
Ethnicity source                    chinese              N (%)                   131 (4.86%)
Ethnicity source                    dominican            N (%)                   105 (3.90%)
Ethnicity source                    english              N (%)                   218 (8.09%)
Ethnicity source                    french               N (%)                   129 (4.79%)
Ethnicity source                    french_canadian      N (%)                   74 (2.75%)
Ethnicity source                    german               N (%)                   130 (4.83%)
Ethnicity source                    greek                N (%)                   19 (0.71%)
Ethnicity source                    irish                N (%)                   438 (16.26%)
Ethnicity source                    italian              N (%)                   295 (10.95%)
Ethnicity source                    mexican              N (%)                   42 (1.56%)
Ethnicity source                    polish               N (%)                   107 (3.97%)
Ethnicity source                    portuguese           N (%)                   93 (3.45%)
Ethnicity source                    puerto_rican         N (%)                   258 (9.58%)
Ethnicity source                    russian              N (%)                   34 (1.26%)
Ethnicity source                    scottish             N (%)                   48 (1.78%)
Ethnicity source                    south_american       N (%)                   60 (2.23%)
Ethnicity source                    swedish              N (%)                   29 (1.08%)
Ethnicity source                    west_indian          N (%)                   114 (4.23%)
Year of birth                       -                    Missing (%)             0 (0.00%)
Year of birth                       -                    Median [Q25 - Q75]      1,961 [1,950 - 1,970]
Year of birth                       -                    90% Range [Q05 to Q95]  1,922 to 1,979
Year of birth                       -                    Range [min to max]      1,908 to 1,986
Month of birth                      -                    Missing (%)             0 (0.00%)
Month of birth                      -                    Median [Q25 - Q75]      7 [4 - 10]
Month of birth                      -                    90% Range [Q05 to Q95]  1 to 12
Month of birth                      -                    Range [min to max]      1 to 12
Day of birth                        -                    Missing (%)             0 (0.00%)
Day of birth                        -                    Median [Q25 - Q75]      16 [8 - 23]
Day of birth                        -                    90% Range [Q05 to Q95]  2 to 29
Day of birth                        -                    Range [min to max]      1 to 31
Location                            -                    Missing (%)             2,694 (100.00%)
Location                            -                    Zero count (%)          0 (0.00%)
Location                            -                    Distinct values         1
Provider                            -                    Missing (%)             2,694 (100.00%)
Provider                            -                    Zero count (%)          0 (0.00%)
Provider                            -                    Distinct values         1
Care site                           -                    Missing (%)             2,694 (100.00%)
Care site                           -                    Zero count (%)          0 (0.00%)
Care site                           -                    Distinct values         1
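
The Location, Provider and Care site rows report 100% missing values yet 1 distinct value. One plausible reading is that the missing value itself is counted as a distinct value. That is an assumption, not something the vignette states. The base R sketch below shows how such numbers can arise for a hypothetical column in which every entry is missing. The package may compute these estimates differently.

```r
# Hypothetical column in which every value is missing.
location_id <- rep(NA_integer_, 5)

# Number of missing values.
n_missing <- sum(is.na(location_id))

# Number of zeros; NA comparisons are dropped rather than counted.
n_zero <- sum(location_id == 0, na.rm = TRUE)

# unique() keeps NA as a value, so an all-missing column has one distinct value.
n_distinct <- length(unique(location_id))

print(c(missing = n_missing, zero = n_zero, distinct = n_distinct))
```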

5 Disconnect from CDM

Finally, disconnect from the mock CDM.

cdmDisconnect(cdm = cdm)
