Introduction to CodelistGenerator

Creating a code list for dementia

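For this example we are going to generate a candidate codelist for dementia, looking only for codes in the condition domain. Let's first load the packages we need: the code below uses functions from dplyr and CodelistGenerator, while DBI, RPostgres and CDMConnector are called with their namespace prefix.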

Connect to the OMOP CDM vocabularies

CodelistGenerator works with a cdm_reference to the vocabulary tables of the OMOP CDM, created using the CDMConnector package.

# example with postgres database connection details
db <- DBI::dbConnect(RPostgres::Postgres(),
  dbname = Sys.getenv("server"),
  port = Sys.getenv("port"),
  host = Sys.getenv("host"),
  user = Sys.getenv("user"),
  password = Sys.getenv("password")
)

# create cdm reference
cdm <- CDMConnector::cdmFromCon(
  con = db,
  cdmSchema = Sys.getenv("vocabulary_schema")
)

Check version of the vocabularies

Results from CodelistGenerator are specific to a particular version of the OMOP CDM vocabularies. We can check which version is being used as follows:

getVocabVersion(cdm = cdm)

#> [1] "vocabVersion"
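
For context, the OMOP CDM records the overall vocabulary release in the vocabulary table, in the row whose vocabulary_id is "None". The sketch below shows that lookup on a toy data frame. The table contents and version strings are made up for illustration, and this is not necessarily how getVocabVersion() is implemented.

```r
# Toy stand-in for the OMOP CDM vocabulary table.
# All version strings here are hypothetical placeholders.
vocabulary <- data.frame(
  vocabulary_id = c("None", "SNOMED", "ICD10CM"),
  vocabulary_version = c("example-release", "example-snomed", "example-icd10cm"),
  stringsAsFactors = FALSE
)

# The overall vocabulary release is stored against vocabulary_id "None"
toyVocabVersion <- vocabulary$vocabulary_version[
  vocabulary$vocabulary_id == "None"
]
toyVocabVersion
```

Recording this value alongside a codelist makes it easier to explain differences when the same search is rerun against a newer vocabulary release.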

A code list from “Dementia” (4182210) and its descendants

The simplest approach to identifying potential codes is to take a high-level code and include all its descendants.

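# schema holding the OMOP vocabulary tables (same one used for the cdm reference)
vocabularyDatabaseSchema <- Sys.getenv("vocabulary_schema")

codesFromDescendants <- tbl(
  db,
  sql(paste0(
    "SELECT * FROM ",
    vocabularyDatabaseSchema,
    ".concept_ancestor"
  ))
) |>
  # concept ids are integers, so compare against a number rather than a string
  filter(ancestor_concept_id == 4182210) |>
  select("descendant_concept_id") |>
  rename("concept_id" = "descendant_concept_id") |>
  # join to the concept table to get names, domains and vocabularies
  left_join(
    tbl(db, sql(paste0(
      "SELECT * FROM ",
      vocabularyDatabaseSchema,
      ".concept"
    ))),
    by = "concept_id"
  ) |>
  select(
    "concept_id", "concept_name",
    "domain_id", "vocabulary_id"
  ) |>
  collect()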

codesFromDescendants |> 
  glimpse()
#> Rows: 151
#> Columns: 4
#> $ concept_id    <int> 35610098, 4043241, 4139421, 37116466, 4046089, 44782559,…
#> $ concept_name  <chr> "Predominantly cortical dementia", "Familial Alzheimer's…
#> $ domain_id     <chr> "Condition", "Condition", "Condition", "Condition", "Con…
#> $ vocabulary_id <chr> "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SNOME…

This picks up most of the relevant codes. However, it misses any code that is not a descendant of 4182210. For example, “Wandering due to dementia” (37312577; https://athena.ohdsi.org/search-terms/terms/37312577) and “Anxiety due to dementia” (37312031; https://athena.ohdsi.org/search-terms/terms/37312031) are not picked up.
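
To make the gap concrete, here is a minimal sketch on toy data frames, using only base R so that it runs without a database. The concept ids and names come from the text above, but the tables are a hand-made subset for illustration, not the real vocabulary tables.

```r
# Toy stand-ins for the OMOP concept and concept_ancestor tables
concept <- data.frame(
  concept_id = c(4182210L, 35610098L, 37312577L, 37312031L),
  concept_name = c(
    "Dementia",
    "Predominantly cortical dementia",
    "Wandering due to dementia",
    "Anxiety due to dementia"
  ),
  stringsAsFactors = FALSE
)
# concept_ancestor also lists each concept as its own ancestor
concept_ancestor <- data.frame(
  ancestor_concept_id = c(4182210L, 4182210L),
  descendant_concept_id = c(4182210L, 35610098L)
)

# Approach 1: everything listed as a descendant of 4182210
fromDescendants <- concept_ancestor$descendant_concept_id[
  concept_ancestor$ancestor_concept_id == 4182210L
]

# Approach 2: case-insensitive keyword match on the concept name
fromKeyword <- concept$concept_id[
  grepl("dementia", concept$concept_name, ignore.case = TRUE)
]

# Concepts found by keyword but absent from the hierarchy lookup
missedByHierarchy <- setdiff(fromKeyword, fromDescendants)
missedByHierarchy
```

In this toy example the two "due to dementia" concepts are found only by the name match, which is the kind of code a purely hierarchical lookup leaves out.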

Generating a candidate code list using CodelistGenerator

To try to capture all terms that could be relevant, we can use CodelistGenerator.

First, let’s do a simple search for a single keyword of “dementia”, including descendants of the identified codes.

dementiaCodes1 <- getCandidateCodes(
  cdm = cdm,
  keywords = "dementia",
  domains = "Condition",
  includeDescendants = TRUE
)

dementiaCodes1 |>
  glimpse()
#> Rows: 187
#> Columns: 6
#> $ concept_id       <int> 374326, 374888, 375791, 376085, 376094, 376095, 37694…
#> $ found_from       <chr> "From initial search", "From initial search", "From i…
#> $ concept_name     <chr> "Arteriosclerotic dementia with depression", "Dementi…
#> $ domain_id        <chr> "Condition", "Condition", "Condition", "Condition", "…
#> $ vocabulary_id    <chr> "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SN…
#> $ standard_concept <chr> "standard", "standard", "standard", "standard", "stan…
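
As a rough illustration of what a keyword search with includeDescendants = TRUE involves, the sketch below runs the two steps on toy data frames in base R. The concept ids are made up, the "From descendants" label is an assumption modelled on the "From initial search" value in the output above, and the real getCandidateCodes() does considerably more than this.

```r
# Toy tables with hypothetical concept ids
concept <- data.frame(
  concept_id = c(1L, 2L, 3L),
  concept_name = c("Dementia", "Alzheimer's disease", "Essential hypertension"),
  stringsAsFactors = FALSE
)
concept_ancestor <- data.frame(
  ancestor_concept_id = c(1L, 1L, 2L, 3L),
  descendant_concept_id = c(1L, 2L, 2L, 3L)
)

# Step 1: keyword match on concept names
initial <- concept[
  grepl("dementia", concept$concept_name, ignore.case = TRUE),
]
initial$found_from <- "From initial search"

# Step 2: descendants of the matched concepts that were not already found
descendantIds <- setdiff(
  concept_ancestor$descendant_concept_id[
    concept_ancestor$ancestor_concept_id %in% initial$concept_id
  ],
  initial$concept_id
)
added <- concept[concept$concept_id %in% descendantIds, ]
added$found_from <- "From descendants" # assumed label, for illustration

toyCandidates <- rbind(initial, added)
toyCandidates
```

Here "Alzheimer's disease" does not contain the keyword but is still included, because it sits below a matched concept in the toy hierarchy.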

Comparing code lists

What is the difference between this code list and the one from 4182210 and its descendants?

codeComparison <- compareCodelists(
  codesFromDescendants,
  dementiaCodes1
)

codeComparison |>
  group_by(codelist) |>
  tally()
#> # A tibble: 2 × 2
#>   codelist            n
#>   <chr>           <int>
#> 1 Both              151
#> 2 Only codelist 2    36

What are these extra codes picked up by CodelistGenerator?

codeComparison |>
  filter(codelist == "Only codelist 2") |> 
  glimpse()
#> Rows: 36
#> Columns: 3
#> $ concept_id   <int> 4041685, 4043378, 4044415, 4046091, 4092747, 4187091, 425…
#> $ concept_name <chr> "Amyotrophic lateral sclerosis with dementia", "Frontotem…
#> $ codelist     <chr> "Only codelist 2", "Only codelist 2", "Only codelist 2", …
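
Conceptually, this comparison is a full join of the two codelists followed by a label saying where each concept was seen. The base R sketch below shows the idea on toy codelists. The concept ids are made up, the "Only codelist 1" label is assumed by analogy with the output above, and this is not the package's implementation.

```r
# Two toy codelists with hypothetical concept ids
codelist1 <- data.frame(
  concept_id = c(1L, 2L),
  concept_name = c("Dementia", "Alzheimer's disease"),
  stringsAsFactors = FALSE
)
codelist2 <- data.frame(
  concept_id = c(1L, 2L, 3L),
  concept_name = c("Dementia", "Alzheimer's disease", "Wandering due to dementia"),
  stringsAsFactors = FALSE
)

# Full join on the shared columns, keeping a flag for each source
toyComparison <- merge(
  transform(codelist1, in1 = TRUE),
  transform(codelist2, in2 = TRUE),
  by = c("concept_id", "concept_name"),
  all = TRUE
)

# Label each concept by which codelist(s) it came from
toyComparison$codelist <- ifelse(
  !is.na(toyComparison$in1) & !is.na(toyComparison$in2), "Both",
  ifelse(!is.na(toyComparison$in1), "Only codelist 1", "Only codelist 2")
)
table(toyComparison$codelist)
```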

Review mappings from non-standard vocabularies

Perhaps we want to see which ICD10CM codes map to our candidate code list. We can get these by running:

icdMappings <- getMappings(
  cdm = cdm,
  candidateCodelist = dementiaCodes1,
  nonStandardVocabularies = "ICD10CM"
)

icdMappings |> 
  glimpse()
#> Rows: 191
#> Columns: 7
#> $ standard_concept_id        <int> 372610, 374341, 374888, 374888, 374888, 374…
#> $ standard_concept_name      <chr> "Postconcussion syndrome", "Huntington's ch…
#> $ standard_vocabulary_id     <chr> "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SN…
#> $ non_standard_concept_id    <int> 45571706, 35207314, 1568088, 1568089, 37402…
#> $ non_standard_concept_name  <chr> "Postconcussional syndrome", "Huntington's …
#> $ non_standard_concept_code  <chr> "F07.81", "G10", "F02", "F02.8", "F02.811",…
#> $ non_standard_vocabulary_id <chr> "ICD10CM", "ICD10CM", "ICD10CM", "ICD10CM",…

We can do the same for the Read vocabulary:

readMappings <- getMappings(
  cdm = cdm,
  candidateCodelist = dementiaCodes1,
  nonStandardVocabularies = "Read"
)

readMappings |> 
  glimpse()
#> Rows: 93
#> Columns: 7
#> $ standard_concept_id        <int> 372610, 372610, 372610, 372610, 372610, 372…
#> $ standard_concept_name      <chr> "Postconcussion syndrome", "Postconcussion …
#> $ standard_vocabulary_id     <chr> "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SN…
#> $ non_standard_concept_id    <int> 45446542, 45446553, 45453190, 45459905, 455…
#> $ non_standard_concept_name  <chr> "Post-concussion syndrome", "[X]Post-trauma…
#> $ non_standard_concept_code  <chr> "E2A2.00", "Eu06212", "E2A2.11", "E2A2.12",…
#> $ non_standard_vocabulary_id <chr> "READ", "READ", "READ", "READ", "READ", "RE…
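
In the OMOP CDM, links from non-standard to standard concepts are stored in the concept_relationship table as "Maps to" relationships, with the non-standard concept in concept_id_1 and the standard concept in concept_id_2. The base R sketch below shows that lookup on toy tables. The ids and codes are taken from the output above, but the tables are a hand-made subset, the candidate list is deliberately reduced to a single concept, and getMappings() may work differently internally.

```r
# Toy subset of concept_relationship: "Maps to" links a non-standard
# concept (concept_id_1) to its standard concept (concept_id_2)
concept_relationship <- data.frame(
  concept_id_1 = c(1568088L, 1568089L, 45571706L),
  concept_id_2 = c(374888L, 374888L, 372610L),
  relationship_id = "Maps to",
  stringsAsFactors = FALSE
)
# Toy subset of the concept table holding the non-standard codes
nonStandard <- data.frame(
  concept_id = c(1568088L, 1568089L, 45571706L),
  concept_code = c("F02", "F02.8", "F07.81"),
  vocabulary_id = "ICD10CM",
  stringsAsFactors = FALSE
)
# Reduced candidate list, for illustration only
toyCandidateIds <- 374888L

# Keep only mappings that point at a concept in the candidate list
maps <- concept_relationship[
  concept_relationship$relationship_id == "Maps to" &
    concept_relationship$concept_id_2 %in% toyCandidateIds,
]

# Attach the code and vocabulary of each non-standard concept
toyMappings <- merge(
  maps, nonStandard,
  by.x = "concept_id_1", by.y = "concept_id"
)
toyMappings[, c("concept_id_2", "concept_id_1", "concept_code", "vocabulary_id")]
```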
