The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
The omopgenerics package provides definitions of core classes and methods used by analytic pipelines that query the OMOP common data model.
#> Warning in citation("omopgenerics"): no date field in DESCRIPTION file of
#> package 'omopgenerics'
#> Warning in citation("omopgenerics"): could not determine year for
#> 'omopgenerics' from package DESCRIPTION file
#>
#> To cite package 'omopgenerics' in publications use:
#>
#> Català M, Burn E (????). _omopgenerics: Methods and Classes for the
#> OMOP Common Data Model_. R package version 0.3.1.900,
#> <https://darwin-eu.github.io/omopgenerics/>.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Manual{,
#> title = {omopgenerics: Methods and Classes for the OMOP Common Data Model},
#> author = {Martí Català and Edward Burn},
#> note = {R package version 0.3.1.900},
#> url = {https://darwin-eu.github.io/omopgenerics/},
#> }
If you find the package useful in supporting your research study, please consider citing this package.
You can install the development version of OMOPGenerics from GitHub with:
install.packages("pak")
::pkg_install("darwin-eu/omopgenerics") pak
And load it using the library command:
library(omopgenerics)
library(dplyr)
A cdm reference is a single R object that represents OMOP CDM data. The tables in the cdm reference may be in a database, but a cdm reference may also contain OMOP CDM tables that are in dataframes/tibbles or in arrow. In the latter case the cdm reference would typically be a subset of an original cdm reference that has been derived as part of a particular analysis.
omopgenerics contains the class definition of a cdm reference and a dataframe implementation. For creating a cdm reference using a database, see the CDMConnector package (https://darwin-eu.github.io/CDMConnector/).
A cdm object can contain four type of tables:
omopTables()
#> [1] "person" "observation_period" "visit_occurrence"
#> [4] "visit_detail" "condition_occurrence" "drug_exposure"
#> [7] "procedure_occurrence" "device_exposure" "measurement"
#> [10] "observation" "death" "note"
#> [13] "note_nlp" "specimen" "fact_relationship"
#> [16] "location" "care_site" "provider"
#> [19] "payer_plan_period" "cost" "drug_era"
#> [22] "dose_era" "condition_era" "metadata"
#> [25] "cdm_source" "concept" "vocabulary"
#> [28] "domain" "concept_class" "concept_relationship"
#> [31] "relationship" "concept_synonym" "concept_ancestor"
#> [34] "source_to_concept_map" "drug_strength" "cohort_definition"
#> [37] "attribute_definition" "concept_recommended"
Each one of the tables has a required columns. For example, for the
person
table this are the required columns:
omopColumns(table = "person")
#> [1] "person_id" "gender_concept_id"
#> [3] "year_of_birth" "month_of_birth"
#> [5] "day_of_birth" "birth_datetime"
#> [7] "race_concept_id" "ethnicity_concept_id"
#> [9] "location_id" "provider_id"
#> [11] "care_site_id" "person_source_value"
#> [13] "gender_source_value" "gender_source_concept_id"
#> [15] "race_source_value" "race_source_concept_id"
#> [17] "ethnicity_source_value" "ethnicity_source_concept_id"
cohortTables()
#> [1] "cohort" "cohort_set" "cohort_attrition" "cohort_codelist"
cohortColumns(table = "cohort")
#> [1] "cohort_definition_id" "subject_id" "cohort_start_date"
#> [4] "cohort_end_date"
In addition, cohorts are defined in terms of a
generatedCohortSet
class. For more details on this class
definition see the corresponding vignette.
achillesTables()
#> [1] "achilles_analysis" "achilles_results" "achilles_results_dist"
achillesColumns(table = "achilles_results")
#> [1] "analysis_id" "stratum_1" "stratum_2" "stratum_3" "stratum_4"
#> [6] "stratum_5" "count_value"
Any table to be part of a cdm object has to fulfill 4 conditions:
All must share a common source.
The name of the tables must be lowercase.
The name of the column names of each table must be lowercase.
person
and observation_period
must be
present.
A concept set can be represented as either a codelist or a concept set expression. A codelist is a named list, with each item of the list containing specific concept IDs.
<- list("diabetes" = c(201820, 4087682, 3655269),
condition_codes "asthma" = 317009)
<- newCodelist(condition_codes)
condition_codes #> Warning: ! `codelist` contains numeric values, they are casted to integers.
condition_codes#>
#> ── 2 codelists ─────────────────────────────────────────────────────────────────
#>
#> - asthma (1 codes)
#> - diabetes (3 codes)
Meanwhile, a concept set expression provides a high-level definition of concepts that, when applied to a specific OMOP CDM vocabulary version (by making use of the concept hierarchies and relationships), will result in a codelist.
<- list(
condition_cs "diabetes" = dplyr::tibble(
"concept_id" = c(201820, 4087682),
"excluded" = c(FALSE, FALSE),
"descendants" = c(TRUE, FALSE),
"mapped" = c(FALSE, FALSE)
),"asthma" = dplyr::tibble(
"concept_id" = 317009,
"excluded" = FALSE,
"descendants" = FALSE,
"mapped" = FALSE
)
)<- newConceptSetExpression(condition_cs)
condition_cs
condition_cs#>
#> ── 2 conceptSetExpressions ─────────────────────────────────────────────────────
#>
#> - asthma (1 concept criteria)
#> - diabetes (2 concept criteria)
A cohort is a set of persons who satisfy one or more inclusion criteria for a duration of time and, when defined, this table in a cdm reference has a cohort table class. Cohort tables are then associated with attributes such as settings and attrition.
<- tibble(
person person_id = 1, gender_concept_id = 0, year_of_birth = 1990,
race_concept_id = 0, ethnicity_concept_id = 0
)<- dplyr::tibble(
observation_period observation_period_id = 1, person_id = 1,
observation_period_start_date = as.Date("2000-01-01"),
observation_period_end_date = as.Date("2023-12-31"),
period_type_concept_id = 0
)<- tibble(
diabetes cohort_definition_id = 1, subject_id = 1,
cohort_start_date = as.Date("2020-01-01"),
cohort_end_date = as.Date("2020-01-10")
)
<- cdmFromTables(
cdm tables = list(
"person" = person,
"observation_period" = observation_period,
"diabetes" = diabetes
),cdmName = "example_cdm"
)#> Warning: ! 5 column in person do not match expected column type:
#> • `person_id` is numeric but expected integer
#> • `gender_concept_id` is numeric but expected integer
#> • `year_of_birth` is numeric but expected integer
#> • `race_concept_id` is numeric but expected integer
#> • `ethnicity_concept_id` is numeric but expected integer
#> Warning: ! 3 column in observation_period do not match expected column type:
#> • `observation_period_id` is numeric but expected integer
#> • `person_id` is numeric but expected integer
#> • `period_type_concept_id` is numeric but expected integer
$diabetes <- newCohortTable(cdm$diabetes)
cdm#> Warning: ! 2 column in diabetes do not match expected column type:
#> • `cohort_definition_id` is numeric but expected integer
#> • `subject_id` is numeric but expected integer
$diabetes
cdm#> # A tibble: 1 × 4
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <dbl> <dbl> <date> <date>
#> 1 1 1 2020-01-01 2020-01-10
settings(cdm$diabetes)
#> # A tibble: 1 × 2
#> cohort_definition_id cohort_name
#> <int> <chr>
#> 1 1 cohort_1
attrition(cdm$diabetes)
#> # A tibble: 1 × 7
#> cohort_definition_id number_records number_subjects reason_id reason
#> <int> <int> <int> <int> <chr>
#> 1 1 1 1 1 Initial qualify…
#> # ℹ 2 more variables: excluded_records <int>, excluded_subjects <int>
cohortCount(cdm$diabetes)
#> # A tibble: 1 × 3
#> cohort_definition_id number_records number_subjects
#> <int> <int> <int>
#> 1 1 1 1
A summarised result provides a standard format for the results of an analysis performed against data mapped to the OMOP CDM.
For example this format is used when we get a summary of the cdm as a whole
summary(cdm) |>
::glimpse()
dplyr#> Rows: 13
#> Columns: 13
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
#> $ cdm_name <chr> "example_cdm", "example_cdm", "example_cdm", "example…
#> $ group_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ group_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "snapshot_date", "person_count", "observation_period_…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
#> $ estimate_name <chr> "value", "count", "count", "source_name", "version", …
#> $ estimate_type <chr> "date", "integer", "integer", "character", "character…
#> $ estimate_value <chr> "2024-11-01", "1", "1", "", NA, "5.3", "", "", "", ""…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
and also when we summarise a cohort
summary(cdm$diabetes) |>
::glimpse()
dplyr#> Rows: 6
#> Columns: 13
#> $ result_id <int> 1, 1, 2, 2, 2, 2
#> $ cdm_name <chr> "example_cdm", "example_cdm", "example_cdm", "example…
#> $ group_name <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level <chr> "cohort_1", "cohort_1", "cohort_1", "cohort_1", "coho…
#> $ strata_name <chr> "overall", "overall", "reason", "reason", "reason", "…
#> $ strata_level <chr> "overall", "overall", "Initial qualifying events", "I…
#> $ variable_name <chr> "number_records", "number_subjects", "number_records"…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA
#> $ estimate_name <chr> "count", "count", "count", "count", "count", "count"
#> $ estimate_type <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value <chr> "1", "1", "1", "1", "0", "0"
#> $ additional_name <chr> "overall", "overall", "reason_id", "reason_id", "reas…
#> $ additional_level <chr> "overall", "overall", "1", "1", "1", "1"
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.