
2 Introduction
- 2.1 Create a mock cdm
3 Summarise person table
- 3.1 What the function reports
4 Tidy the summarised object
5 Disconnect from CDM

2 Introduction

In this vignette we explore the OmopSketch functions that provide a concise overview of the OMOP person table. Two functions cover this:

summarisePerson(): computes a set of summary statistics and data-quality checks for the person table (total subjects, missing observation-period checks, sex/race/ethnicity distributions, birth-date components, and simple summaries for ID columns such as location_id, provider_id, and care_site_id).
tablePerson(): displays the results in a formatted table.

2.1 Create a mock cdm

Let’s load the required packages and create a mock CDM using the R package omock so we can run the functions on a small example.

library(dplyr)
library(OmopSketch)
library(omock)

# Connect to mock database
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb") 
#> ℹ Reading GiBleed tables.
#> ℹ Adding drug_strength table.
#> ℹ Creating local <cdm_reference> object.
#> ℹ Inserting <cdm_reference> into duckdb.
cdm
#> 
#> ── # OMOP CDM reference (duckdb) of GiBleed ────────────────────────────────────
#> • omop tables: care_site, cdm_source, concept, concept_ancestor, concept_class,
#> concept_relationship, concept_synonym, condition_era, condition_occurrence,
#> cost, death, device_exposure, domain, dose_era, drug_era, drug_exposure,
#> drug_strength, fact_relationship, location, measurement, metadata, note,
#> note_nlp, observation, observation_period, payer_plan_period, person,
#> procedure_occurrence, provider, relationship, source_to_concept_map, specimen,
#> visit_detail, visit_occurrence, vocabulary
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

3 Summarise person table

Run summarisePerson() to compute basic summaries for the person table. The function returns a summarised_result object.

result <- summarisePerson(cdm = cdm)

result |> 
  glimpse()
#> Rows: 123
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name         <chr> "GiBleed", "GiBleed", "GiBleed", "GiBleed", "GiBleed"…
#> $ group_name       <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ group_level      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "Number subjects", "Number subjects not in observatio…
#> $ variable_level   <chr> NA, NA, NA, "Female", "Female", "Male", "Male", "None…
#> $ estimate_name    <chr> "count", "count", "percentage", "count", "percentage"…
#> $ estimate_type    <chr> "integer", "integer", "numeric", "integer", "numeric"…
#> $ estimate_value   <chr> "2694", "0", "0", "1373", "50.9651076466221", "1321",…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
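
A summarised_result stores every estimate as character in long format, so a common next step is to filter rows and convert estimate_value according to estimate_type. The sketch below rebuilds the first six rows of the glimpse() output as a plain data frame and works on that with base R only, so it runs without OmopSketch. The object name toy_result is hypothetical, only five of the 13 columns are included, and the variable names truncated by glimpse() are filled in from the formatted table shown later in this vignette.

```r
# Toy reconstruction of the first rows shown by glimpse(); not the real object.
toy_result <- data.frame(
  variable_name = c(
    "Number subjects",
    "Number subjects not in observation",
    "Number subjects not in observation",
    "Sex", "Sex", "Sex"
  ),
  variable_level = c(NA, NA, NA, "Female", "Female", "Male"),
  estimate_name = c("count", "count", "percentage", "count", "percentage", "count"),
  estimate_type = c("integer", "integer", "numeric", "integer", "numeric", "integer"),
  estimate_value = c("2694", "0", "0", "1373", "50.9651076466221", "1321"),
  stringsAsFactors = FALSE
)

# Keep only the sex counts and convert the character estimates to integers.
sex_counts <- toy_result[
  toy_result$variable_name == "Sex" & toy_result$estimate_name == "count",
]
sex_counts$estimate_value <- as.integer(sex_counts$estimate_value)

# The Female and Male counts add up to the total number of subjects.
total <- as.integer(
  toy_result$estimate_value[toy_result$variable_name == "Number subjects"]
)
sum(sex_counts$estimate_value) == total
#> [1] TRUE
```

The same filtering can be written with dplyr::filter() on the real result; base R is used here only to keep the example free of dependencies.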

3.1 What the function reports

summarisePerson() builds a set of common summaries:

Number subjects: total number of rows in person.
Number subjects not in observation: number (and percentage) of persons who do not appear in observation_period, which helps detect missing observation periods. A warning is emitted if any are found.
Sex: counts and percentages for each sex category (Female, Male, plus a category for missing values, which appears as None in the output in this vignette).
Sex source: the raw gender_source_value distribution, reported as a separate variable.
Race / Race source: distribution of race_concept_id and race_source_value.
Ethnicity / Ethnicity source: distribution of ethnicity_concept_id and ethnicity_source_value.
Year / Month / Day of birth: numeric summaries (missingness, quantiles, min/max) of the birth date components.
Location, Provider, Care site: number of missing values, number of zeros, and number of distinct values for location_id, provider_id, and care_site_id.
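
To make these definitions concrete, the following base R sketch recomputes a few of them by hand on a tiny made-up person table. It only illustrates what each summary means and is not how summarisePerson() is implemented. The two data frames and all their values are invented for this example; the mapping of gender_concept_id 8532 and 8507 to Female and Male follows the standard OMOP gender concepts.

```r
# Invented toy tables; column names follow the OMOP CDM.
person <- data.frame(
  person_id = 1:5,
  gender_concept_id = c(8532, 8507, 8532, 8532, 0),
  year_of_birth = c(1950, 1961, 1970, 1986, 1908),
  location_id = c(NA, NA, 0, 10, 10)
)
observation_period <- data.frame(person_id = c(1, 2, 3, 5))

# Number subjects: one row per person.
n_subjects <- nrow(person)

# Number subjects not in observation: persons without an observation period.
not_in_obs <- sum(!person$person_id %in% observation_period$person_id)
pct_not_in_obs <- 100 * not_in_obs / n_subjects

# Sex: map the standard gender concepts to labels, anything else to "None".
sex <- ifelse(
  person$gender_concept_id == 8532, "Female",
  ifelse(person$gender_concept_id == 8507, "Male", "None")
)
sex_counts <- table(sex)
sex_pct <- 100 * sex_counts / n_subjects

# Year of birth: missingness and quantiles.
year_missing <- sum(is.na(person$year_of_birth))
year_quantiles <- quantile(person$year_of_birth, c(0.05, 0.25, 0.5, 0.75, 0.95))

# Location: missing values, zeros and distinct values.
loc_missing <- sum(is.na(person$location_id))
loc_zero <- sum(person$location_id == 0, na.rm = TRUE)
# NA is counted as one distinct value here (an assumption of this sketch).
loc_distinct <- length(unique(person$location_id))
```

In this toy data one of five persons has no observation period (20%), and location_id has two missing values, one zero, and three distinct values. Counting NA as a distinct value is a choice made for this sketch; it is at least consistent with the GiBleed output in this vignette, where the fully missing ID columns report one distinct value.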

4 Tidy the summarised object

tablePerson() tidies the previous result into a formatted table of type gt, reactable or datatable. By default it creates a gt table.

tablePerson(result = result, type = "gt")

Summary of person table
Variable name	Variable level	Estimate name	CDM name: GiBleed
Number subjects	–	N	2,694
Number subjects not in observation	–	N (%)	0 (0.00%)
Sex	Female	N (%)	1,373 (50.97%)
	Male	N (%)	1,321 (49.03%)
	None	N (%)	0 (0.00%)
Sex source	F	N (%)	1,373 (50.97%)
	M	N (%)	1,321 (49.03%)
Race	No matching concept	N (%)	451 (16.74%)
	Missing	N (%)	2,243 (83.26%)
Race source	asian	N (%)	212 (7.87%)
	black	N (%)	338 (12.55%)
	hispanic	N (%)	435 (16.15%)
	native	N (%)	14 (0.52%)
	other	N (%)	2 (0.07%)
	white	N (%)	1,693 (62.84%)
Ethnicity	No matching concept	N (%)	2,259 (83.85%)
	Missing	N (%)	435 (16.15%)
Ethnicity source	african	N (%)	119 (4.42%)
	american	N (%)	79 (2.93%)
	american_indian	N (%)	14 (0.52%)
	arab	N (%)	2 (0.07%)
	asian_indian	N (%)	81 (3.01%)
	central_american	N (%)	75 (2.78%)
	chinese	N (%)	131 (4.86%)
	dominican	N (%)	105 (3.90%)
	english	N (%)	218 (8.09%)
	french	N (%)	129 (4.79%)
	french_canadian	N (%)	74 (2.75%)
	german	N (%)	130 (4.83%)
	greek	N (%)	19 (0.71%)
	irish	N (%)	438 (16.26%)
	italian	N (%)	295 (10.95%)
	mexican	N (%)	42 (1.56%)
	polish	N (%)	107 (3.97%)
	portuguese	N (%)	93 (3.45%)
	puerto_rican	N (%)	258 (9.58%)
	russian	N (%)	34 (1.26%)
	scottish	N (%)	48 (1.78%)
	south_american	N (%)	60 (2.23%)
	swedish	N (%)	29 (1.08%)
	west_indian	N (%)	114 (4.23%)
Year of birth	–	Missing (%)	0 (0.00%)
		Median [Q25 - Q75]	1,961 [1,950 - 1,970]
		90% Range [Q05 to Q95]	1,922 to 1,979
		Range [min to max]	1,908 to 1,986
Month of birth	–	Missing (%)	0 (0.00%)
		Median [Q25 - Q75]	7 [4 - 10]
		90% Range [Q05 to Q95]	1 to 12
		Range [min to max]	1 to 12
Day of birth	–	Missing (%)	0 (0.00%)
		Median [Q25 - Q75]	16 [8 - 23]
		90% Range [Q05 to Q95]	2 to 29
		Range [min to max]	1 to 31
Location	–	Missing (%)	2,694 (100.00%)
		Zero count (%)	0 (0.00%)
		Distinct values	1
Provider	–	Missing (%)	2,694 (100.00%)
		Zero count (%)	0 (0.00%)
		Distinct values	1
Care site	–	Missing (%)	2,694 (100.00%)
		Zero count (%)	0 (0.00%)
		Distinct values	1
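
Each N (%) cell in the table combines two rows of the summarised result, a count and a percentage, into one string. The base R sketch below shows one way such a string can be produced, using the Female estimates from the glimpse() output above. format_n_pct is a hypothetical helper written for this example; it is not part of OmopSketch and is not claimed to be how tablePerson() formats its cells.

```r
# Hypothetical helper: combine a count and a percentage into "N (%)" text.
format_n_pct <- function(count, percentage, digits = 2) {
  # Thousands separator for the count, fixed decimals for the percentage.
  n <- formatC(count, format = "d", big.mark = ",")
  pct <- formatC(percentage, format = "f", digits = digits)
  paste0(n, " (", pct, "%)")
}

format_n_pct(1373L, 50.9651076466221)
#> [1] "1,373 (50.97%)"
format_n_pct(0L, 0)
#> [1] "0 (0.00%)"
```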

5 Disconnect from CDM

Finally, disconnect from the mock CDM.

cdmDisconnect(cdm = cdm)
