In this vignette we explore the OmopSketch functions that provide a concise overview of the OMOP person table. Two small utilities make this easy:
summarisePerson(): computes a set of summary statistics and data-quality checks for the person table (total subjects, missing observation-period checks, sex/race/ethnicity distributions, birth-date components, and simple summaries for id columns such as location_id, provider_id, and care_site_id).
tablePerson(): helps visualise the results in a formatted table.
Let’s load the required packages and create a mock CDM using the R package omock so we can run the functions on a small example.
library(dplyr)
library(OmopSketch)
library(omock)
# Connect to mock database
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
#> ℹ Reading GiBleed tables.
#> ℹ Adding drug_strength table.
#> ℹ Creating local <cdm_reference> object.
#> ℹ Inserting <cdm_reference> into duckdb.
cdm
#>
#> ── # OMOP CDM reference (duckdb) of GiBleed ────────────────────────────────────
#> • omop tables: care_site, cdm_source, concept, concept_ancestor, concept_class,
#> concept_relationship, concept_synonym, condition_era, condition_occurrence,
#> cost, death, device_exposure, domain, dose_era, drug_era, drug_exposure,
#> drug_strength, fact_relationship, location, measurement, metadata, note,
#> note_nlp, observation, observation_period, payer_plan_period, person,
#> procedure_occurrence, provider, relationship, source_to_concept_map, specimen,
#> visit_detail, visit_occurrence, vocabulary
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -
Run summarisePerson() to compute basic summaries for the person table. The function returns a summarised_result object.
result <- summarisePerson(cdm = cdm)
result |>
  glimpse()
#> Rows: 123
#> Columns: 13
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name <chr> "GiBleed", "GiBleed", "GiBleed", "GiBleed", "GiBleed"…
#> $ group_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ group_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "Number subjects", "Number subjects not in observatio…
#> $ variable_level <chr> NA, NA, NA, "Female", "Female", "Male", "Male", "None…
#> $ estimate_name <chr> "count", "count", "percentage", "count", "percentage"…
#> $ estimate_type <chr> "integer", "integer", "numeric", "integer", "numeric"…
#> $ estimate_value <chr> "2694", "0", "0", "1373", "50.9651076466221", "1321",…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
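Because the result is a long-format table in which estimate_value is stored as character, individual estimates can be pulled out with ordinary dplyr verbs. The following is a minimal sketch on a stand-in table: toy_result is a hypothetical object that copies only a few rows and columns from the glimpse() output above, so it is not a full summarised_result.

```r
library(dplyr)

# Hypothetical stand-in for the first rows of `result`, with values copied
# from the glimpse() output above; a real summarised_result has more rows
# and columns.
toy_result <- tibble(
  variable_name  = c("Number subjects", "Sex", "Sex", "Sex"),
  variable_level = c(NA, "Female", "Female", "Male"),
  estimate_name  = c("count", "count", "percentage", "count"),
  estimate_type  = c("integer", "integer", "numeric", "integer"),
  estimate_value = c("2694", "1373", "50.9651076466221", "1321")
)

# Total number of subjects, converted from character to integer.
total <- toy_result |>
  filter(variable_name == "Number subjects") |>
  pull(estimate_value) |>
  as.integer()

# Keep only the sex counts and recompute the percentages from them.
sex_counts <- toy_result |>
  filter(variable_name == "Sex", estimate_name == "count") |>
  transmute(sex = variable_level, n = as.integer(estimate_value)) |>
  mutate(percentage = 100 * n / total)

sex_counts
```

The same filter() and conversion steps should apply to the real result object, since it exposes the same variable_name, variable_level, estimate_name and estimate_value columns.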
summarisePerson() builds a set of common summaries:
Number subjects: total number of rows in person.
Number subjects not in observation: number (and percentage) of persons that do not appear in observation_period (useful to detect missing observation periods). A warning is emitted if any are found.
Sex: counts and percentages for the sex categories (Female, Male, and a third category for records without a recorded sex, labelled None in the output below).
Sex source: distribution of the raw gender_source_value.
Race / Race source: distribution of race_concept_id and race_source_value.
Ethnicity / Ethnicity source: distribution of ethnicity_concept_id and ethnicity_source_value.
Year / Month / Day of birth: numeric summaries (missingness, quantiles, min/max) of birth date components.
Location, Provider, Care site: number of missing values, zeros, and distinct values in location_id, provider_id, and care_site_id.
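To make two of these checks concrete, here is a minimal sketch on hypothetical toy tables. It is an illustration of the idea using local dplyr verbs, not the package's actual implementation; the tables person and observation_period below are invented and contain only the columns needed.

```r
library(dplyr)

# Hypothetical toy tables, not taken from GiBleed.
person <- tibble(
  person_id   = c(1L, 2L, 3L, 4L),
  location_id = c(NA, NA, NA, NA_integer_),
  provider_id = c(10L, 0L, NA, 10L)
)
observation_period <- tibble(person_id = c(1L, 2L, 4L))

# "Number subjects not in observation": persons with no matching row in
# observation_period.
not_in_observation <- person |>
  anti_join(observation_period, by = "person_id")

n_missing_obs   <- nrow(not_in_observation)
pct_missing_obs <- 100 * n_missing_obs / nrow(person)

# Missing, zero and distinct counts for the id columns.
id_summary <- person |>
  summarise(
    across(
      c(location_id, provider_id),
      list(
        missing  = ~ sum(is.na(.x)),
        zero     = ~ sum(.x == 0, na.rm = TRUE),
        distinct = ~ n_distinct(.x)
      )
    )
  )

id_summary
```

In this local sketch n_distinct() counts NA as one value, so a fully missing column reports one distinct value. That matches the "Distinct values 1" rows in the table for the fully missing id columns, although the package may compute this differently.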
tablePerson() tidies the previous results into a formatted table of type gt, reactable, or datatable. By default it creates a gt table.
tablePerson(result = result, type = "gt")
| Variable name | Variable level | Estimate name | CDM name: GiBleed |
|---|---|---|---|
| Number subjects | – | N | 2,694 |
| Number subjects not in observation | – | N (%) | 0 (0.00%) |
| Sex | Female | N (%) | 1,373 (50.97%) |
| | Male | N (%) | 1,321 (49.03%) |
| | None | N (%) | 0 (0.00%) |
| Sex source | F | N (%) | 1,373 (50.97%) |
| | M | N (%) | 1,321 (49.03%) |
| Race | No matching concept | N (%) | 451 (16.74%) |
| | Missing | N (%) | 2,243 (83.26%) |
| Race source | asian | N (%) | 212 (7.87%) |
| | black | N (%) | 338 (12.55%) |
| | hispanic | N (%) | 435 (16.15%) |
| | native | N (%) | 14 (0.52%) |
| | other | N (%) | 2 (0.07%) |
| | white | N (%) | 1,693 (62.84%) |
| Ethnicity | No matching concept | N (%) | 2,259 (83.85%) |
| | Missing | N (%) | 435 (16.15%) |
| Ethnicity source | african | N (%) | 119 (4.42%) |
| | american | N (%) | 79 (2.93%) |
| | american_indian | N (%) | 14 (0.52%) |
| | arab | N (%) | 2 (0.07%) |
| | asian_indian | N (%) | 81 (3.01%) |
| | central_american | N (%) | 75 (2.78%) |
| | chinese | N (%) | 131 (4.86%) |
| | dominican | N (%) | 105 (3.90%) |
| | english | N (%) | 218 (8.09%) |
| | french | N (%) | 129 (4.79%) |
| | french_canadian | N (%) | 74 (2.75%) |
| | german | N (%) | 130 (4.83%) |
| | greek | N (%) | 19 (0.71%) |
| | irish | N (%) | 438 (16.26%) |
| | italian | N (%) | 295 (10.95%) |
| | mexican | N (%) | 42 (1.56%) |
| | polish | N (%) | 107 (3.97%) |
| | portuguese | N (%) | 93 (3.45%) |
| | puerto_rican | N (%) | 258 (9.58%) |
| | russian | N (%) | 34 (1.26%) |
| | scottish | N (%) | 48 (1.78%) |
| | south_american | N (%) | 60 (2.23%) |
| | swedish | N (%) | 29 (1.08%) |
| | west_indian | N (%) | 114 (4.23%) |
| Year of birth | – | Missing (%) | 0 (0.00%) |
| | | Median [Q25 - Q75] | 1,961 [1,950 - 1,970] |
| | | 90% Range [Q05 to Q95] | 1,922 to 1,979 |
| | | Range [min to max] | 1,908 to 1,986 |
| Month of birth | – | Missing (%) | 0 (0.00%) |
| | | Median [Q25 - Q75] | 7 [4 - 10] |
| | | 90% Range [Q05 to Q95] | 1 to 12 |
| | | Range [min to max] | 1 to 12 |
| Day of birth | – | Missing (%) | 0 (0.00%) |
| | | Median [Q25 - Q75] | 16 [8 - 23] |
| | | 90% Range [Q05 to Q95] | 2 to 29 |
| | | Range [min to max] | 1 to 31 |
| Location | – | Missing (%) | 2,694 (100.00%) |
| | | Zero count (%) | 0 (0.00%) |
| | | Distinct values | 1 |
| Provider | – | Missing (%) | 2,694 (100.00%) |
| | | Zero count (%) | 0 (0.00%) |
| | | Distinct values | 1 |
| Care site | – | Missing (%) | 2,694 (100.00%) |
| | | Zero count (%) | 0 (0.00%) |
| | | Distinct values | 1 |
Finally, disconnect from the mock CDM.
cdmDisconnect(cdm = cdm)