Working with cohorts

Adding a cohort

First, we’ll load packages and create a cdm reference. In this case we’ll be using the Eunomia “GI Bleed” dataset.

library(CDMConnector)
library(dplyr)

con <- DBI::dbConnect(duckdb::duckdb(), eunomia_dir())

cdm <- CDMConnector::cdm_from_con(
  con = con,
  cdm_schema = "main",
  write_schema = "main"
)

We can define a cohort for GI bleeding, where we exclude anyone with a record of rheumatoid arthritis at any time.

# devtools::install_github("OHDSI/Capr", "v2") # Use the development version of Capr v2
library(Capr)

# create a cohort set folder for saving cohort definitions
path <- file.path(tempdir(), "cohorts")
dir.create(path)

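gibleed_cohort_definition <- cohort(
  # entry event: gastrointestinal hemorrhage (concept 192671) or any descendant
  entry = condition(cs(descendants(192671))),
  attrition = attrition(
    # keep only people with exactly zero rheumatoid arthritis (concept 80809) records at any time
    "no RA" = withAll(
      exactly(0,
              condition(cs(descendants(80809))),
              duringInterval(eventStarts(-Inf, Inf))))
  )
)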

# write json cohort definition
writeCohort(gibleed_cohort_definition, file.path(path, "gibleed.json"))
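
To make the exclusion rule concrete, the sketch below mimics it with dplyr on two small made-up tables. This is only an illustration of the logic, not how Capr or CirceR build the cohort (they generate SQL that runs against the database). The tables `gi_bleed_events` and `ra_records`, and all values in them, are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: one row per GI bleed condition record
gi_bleed_events <- tibble(
  person_id = c(1, 2, 3, 4),
  condition_start_date = as.Date(
    c("2001-05-01", "2003-02-11", "2010-09-30", "2015-01-07")
  )
)

# Hypothetical toy data: rheumatoid arthritis records (any date counts)
ra_records <- tibble(
  person_id = c(2, 5),
  condition_start_date = as.Date(c("1999-03-15", "2008-08-08"))
)

# Keep entry events only for people with exactly zero RA records at any time
no_ra <- gi_bleed_events %>%
  anti_join(ra_records, by = "person_id")

no_ra$person_id
```

Person 2 is dropped because they have an RA record, regardless of whether it falls before or after the GI bleed, which is what the `eventStarts(-Inf, Inf)` interval expresses in the definition above.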

Now that we have our cohort definition, we can generate the cohort.

gibleed_cohort_set <- readCohortSet(path = path)
# requires CirceR optional dependency
cdm <- generateCohortSet(
  cdm,
  gibleed_cohort_set,
  name = "gibleed",
  computeAttrition = TRUE
)

The cohort is now instantiated in the database, and a reference to it has been added to the cdm reference.

cdm$gibleed %>% 
  glimpse()
#> Rows: 237
#> Columns: 4
#> $ cohort_definition_id <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
#> $ subject_id           <dbl> 3, 32, 80, 133, 187, 188, 202, 260, 264, 273, 285…
#> $ cohort_start_date    <date> 1958-01-29, 1987-06-09, 1974-10-27, 2019-04-05, …
#> $ cohort_end_date      <date> 2018-10-29, 2014-12-24, 2019-04-15, 2019-04-06, …

Cohort attributes

In addition to the cohort table itself, the cohort has a number of attributes. The first is a count of records and participants for each cohort, which we can get using cohortCount.

cohortCount(cdm$gibleed) %>% 
  glimpse()
#> Rows: 1
#> Columns: 3
#> $ cohort_definition_id <int> 3
#> $ cohort_entries       <dbl> 237
#> $ cohort_subjects      <dbl> 237
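
As a rough illustration of what these two columns mean, the sketch below computes the same kind of summary by hand with dplyr. It assumes that `cohort_entries` is the number of rows per cohort and `cohort_subjects` is the number of distinct `subject_id` values. The table `toy_cohort` and its values are hypothetical, and this is not the code cohortCount runs internally.

```r
library(dplyr)

# Hypothetical toy cohort table with the four standard cohort columns
toy_cohort <- tibble(
  cohort_definition_id = c(1L, 1L, 1L, 2L),
  subject_id = c(10, 10, 11, 12),
  cohort_start_date = as.Date(
    c("2000-01-01", "2005-06-01", "2001-03-15", "2010-10-10")
  ),
  cohort_end_date = as.Date(
    c("2000-02-01", "2005-07-01", "2001-04-15", "2010-11-10")
  )
)

# Entries count rows; subjects count distinct people within each cohort
toy_counts <- toy_cohort %>%
  group_by(cohort_definition_id) %>%
  summarise(
    cohort_entries = n(),
    cohort_subjects = n_distinct(subject_id),
    .groups = "drop"
  )

toy_counts
```

In the toy data, subject 10 enters cohort 1 twice, so that cohort has three entries but two subjects. In the GI bleed cohort above both numbers are 237, so each subject has a single entry.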

We also have the attrition associated with entry into the cohort available via cohortAttrition.

cohortAttrition(cdm$gibleed) %>% 
  glimpse()
#> Rows: 1
#> Columns: 7
#> $ cohort_definition_id <int> 3
#> $ number_records       <dbl> 237
#> $ number_subjects      <dbl> 237
#> $ reason_id            <dbl> 1
#> $ reason               <chr> "Qualifying initial records"
#> $ excluded_records     <dbl> 0
#> $ excluded_subjects    <dbl> 0

Lastly, we can access the settings associated with the cohort using cohortSet.

cohortSet(cdm$gibleed) %>% 
  glimpse()
#> Rows: 1
#> Columns: 2
#> $ cohort_definition_id <int> 3
#> $ cohort_name          <chr> "GIBleed_male"

# close the database connection
DBI::dbDisconnect(con, shutdown = TRUE)