1.1 Introduction

This document demonstrates how to use the addCohortIntersect() function in PatientProfiles to obtain cohort intersect information for each individual in your CDM cohort table.

addCohortIntersect() requires two input tables.

  1. The table containing the cohort of individuals to which the intersect information will be attached as extra columns. In the function this table is labelled x. It needs to be saved as a table within the CDM environment.

  2. The table from which the intersect information is searched and computed. In the function this table is labelled cohortTableName. It also needs to be saved as a table within the CDM environment.

Both tables need to contain the columns “cohort_definition_id”, “subject_id”, “cohort_start_date” and “cohort_end_date”. The intersect information this function can return is:

  1. number: the count of intersects found in cohortTableName.

  2. binary: an indicator of whether any intersect is found in cohortTableName.

  3. date: the earliest or the latest cohort_start_date among the intersects.

  4. time: the number of days between the individual's cohort_start_date and the earliest or the latest intersect date.
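
To make the four values concrete, here is a minimal base R sketch that computes them by hand for one individual. It is an illustration only and not how PatientProfiles computes them internally; the data frames x and events and all variable names are hypothetical, and it assumes that only intersects on or after the reference date are considered.

```r
# Hypothetical toy data, not part of PatientProfiles
x <- data.frame(
  subject_id = 1,
  cohort_start_date = as.Date("2020-01-01")
)
events <- data.frame(
  subject_id = c(1, 1, 2),
  cohort_start_date = as.Date(c("2020-01-15", "2020-01-25", "2020-01-10"))
)

# Keep events of the same individual on or after the reference date
same_person <- events$subject_id == x$subject_id
on_or_after <- events$cohort_start_date >= x$cohort_start_date
hits <- events$cohort_start_date[same_person & on_or_after]

number <- length(hits)                                      # count of intersects
binary <- as.integer(number > 0)                            # 1 if any intersect, else 0
date_first <- if (number > 0) min(hits) else as.Date(NA)    # earliest intersect date
time_first <- as.numeric(date_first - x$cohort_start_date)  # days to earliest intersect
```

For this toy individual the sketch finds two intersects, the earliest of which falls 14 days after the reference date.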

Below is an example of addCohortIntersect() run on mock data.

library(DBI)
library(duckdb)
library(tibble)
library(PatientProfiles)

  # mock cohort tables used in the examples below
  cohort1 <- dplyr::tibble(
    cohort_definition_id = c(1, 1, 1, 1, 1),
    subject_id = c(1, 1, 1, 2, 2),
    cohort_start_date = as.Date(
      c(
        "2020-01-01",
        "2020-01-15",
        "2020-01-20",
        "2020-01-01",
        "2020-02-01"
      )
    ),
    cohort_end_date = as.Date(
      c(
        "2020-01-01",
        "2020-01-15",
        "2020-01-20",
        "2020-01-01",
        "2020-02-01"
      )
    )
  )

  cohort2 <- dplyr::tibble(
    cohort_definition_id = c(1, 1, 1, 1, 1, 1, 1),
    subject_id = c(1, 1, 1, 2, 2, 2, 1),
    cohort_start_date = as.Date(
      c(
        "2020-01-15",
        "2020-01-25",
        "2020-01-26",
        "2020-01-29",
        "2020-03-15",
        "2020-01-24",
        "2020-02-16"
      )
    ),
    cohort_end_date = as.Date(
      c(
        "2020-01-15",
        "2020-01-25",
        "2020-01-26",
        "2020-01-29",
        "2020-03-15",
        "2020-01-24",
        "2020-02-16"
      )
    )
  )

  cdm <- mockPatientProfiles(cohort1 = cohort1, cohort2 = cohort2)

First we use mockPatientProfiles() to generate two cohort tables, cohort1 and cohort2, in the CDM environment and save the result as cdm. Then, to add the intersect information from cohort2 as columns in cohort1, we run the code below.

cdm$cohort1 %>% addCohortIntersect(cdm = cdm, cohortTableName = "cohort2")
## # Source:   table<dbplyr_001> [5 x 8]
## # Database: DuckDB 0.6.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
##   cohort_defi…¹ subje…² cohort_s…³ cohort_e…⁴ numbe…⁵ binar…⁶ time_…⁷ date_coh…⁸
##           <dbl>   <dbl> <date>     <date>       <dbl>   <dbl>   <dbl> <date>    
## 1             1       1 2020-01-01 2020-01-01       4       1      14 2020-01-15
## 2             1       1 2020-01-15 2020-01-15       4       1       0 2020-01-15
## 3             1       1 2020-01-20 2020-01-20       3       1       5 2020-01-25
## 4             1       2 2020-01-01 2020-01-01       3       1      23 2020-01-24
## 5             1       2 2020-02-01 2020-02-01       1       1      43 2020-03-15
## # … with abbreviated variable names ¹​cohort_definition_id, ²​subject_id,
## #   ³​cohort_start_date, ⁴​cohort_end_date, ⁵​`number_cohort2_(0,NA)_1`,
## #   ⁶​`binary_cohort2_(0,NA)_1`, ⁷​`time_cohort2_(0,NA)_first_1`,
## #   ⁸​`date_cohort2_(0,NA)_first_1`

As you can see from the result above, the function added 4 extra columns: “number_cohort2_(0,NA)_1”, “binary_cohort2_(0,NA)_1”, “time_cohort2_(0,NA)_first_1” and “date_cohort2_(0,NA)_first_1”. The columns are named as “{value of the intersect information}_{name of the table that contains the cohort}_{window of interest}_{order: first or last cohort_start_date of the intersect}”, followed by a numeric suffix (1 in this example). The order component appears only in the time and date columns.
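
The naming pattern can be reproduced with a few lines of base R. The helper intersect_col_name() below is hypothetical and is not a PatientProfiles function; it only mirrors the column names visible in the output above, and the meaning of the trailing suffix is left as an assumption.

```r
# Hypothetical helper, not a PatientProfiles function.
# Builds a column name from its components, mirroring the printed output.
intersect_col_name <- function(value, table_name, window, order = NULL, suffix = 1) {
  window_label <- paste0("(", window[1], ",", window[2], ")")
  # order is NULL for number and binary, so c() simply drops it
  parts <- c(value, table_name, window_label, order, suffix)
  paste(parts, collapse = "_")
}

time_name <- intersect_col_name("time", "cohort2", c(0, NA), "first")
number_name <- intersect_col_name("number", "cohort2", c(0, NA))
```

Building names this way can be handy when you need to select the new columns programmatically after changing the window or order options.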

To return the last cohort start date instead, we can use the order option in the function.

cdm$cohort1 %>% addCohortIntersect(cdm = cdm, cohortTableName = "cohort2", order = "last")
## # Source:   table<dbplyr_002> [5 x 8]
## # Database: DuckDB 0.6.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
##   cohort_defi…¹ subje…² cohort_s…³ cohort_e…⁴ numbe…⁵ binar…⁶ time_…⁷ date_coh…⁸
##           <dbl>   <dbl> <date>     <date>       <dbl>   <dbl>   <dbl> <date>    
## 1             1       1 2020-01-01 2020-01-01       4       1      46 2020-02-16
## 2             1       1 2020-01-15 2020-01-15       4       1      32 2020-02-16
## 3             1       1 2020-01-20 2020-01-20       3       1      27 2020-02-16
## 4             1       2 2020-01-01 2020-01-01       3       1      74 2020-03-15
## 5             1       2 2020-02-01 2020-02-01       1       1      43 2020-03-15
## # … with abbreviated variable names ¹​cohort_definition_id, ²​subject_id,
## #   ³​cohort_start_date, ⁴​cohort_end_date, ⁵​`number_cohort2_(0,NA)_1`,
## #   ⁶​`binary_cohort2_(0,NA)_1`, ⁷​`time_cohort2_(0,NA)_last_1`,
## #   ⁸​`date_cohort2_(0,NA)_last_1`
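
The effect of order can be checked by hand for the first row of cohort1 (subject 1, reference date 2020-01-01). The sketch below is plain base R, and pick_date() is a hypothetical helper rather than part of the package; it assumes "first" means the earliest intersect date and "last" the latest, which is consistent with the two outputs above.

```r
reference <- as.Date("2020-01-01")
# cohort2 start dates for subject 1, taken from the mock data
hits <- as.Date(c("2020-01-15", "2020-01-25", "2020-01-26", "2020-02-16"))

# Hypothetical helper: earliest date for "first", latest otherwise
pick_date <- function(dates, order = "first") {
  if (order == "first") min(dates) else max(dates)
}

date_first <- pick_date(hits, "first")
date_last <- pick_date(hits, "last")
time_first <- as.numeric(date_first - reference)  # days to the earliest intersect
time_last <- as.numeric(date_last - reference)    # days to the latest intersect
```

The resulting 14 and 46 days match the time columns of the first row in the two outputs.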

The value option can be used to specify which values you want returned.

cdm$cohort1 %>% addCohortIntersect(cdm = cdm, cohortTableName = "cohort2", value = c("binary", "number"))
## # Source:   table<dbplyr_003> [5 x 6]
## # Database: DuckDB 0.6.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
##   cohort_definition_id subject_id cohort_start_date cohort_end…¹ numbe…² binar…³
##                  <dbl>      <dbl> <date>            <date>         <dbl>   <dbl>
## 1                    1          1 2020-01-01        2020-01-01         4       1
## 2                    1          1 2020-01-15        2020-01-15         4       1
## 3                    1          1 2020-01-20        2020-01-20         3       1
## 4                    1          2 2020-01-01        2020-01-01         3       1
## 5                    1          2 2020-02-01        2020-02-01         1       1
## # … with abbreviated variable names ¹​cohort_end_date,
## #   ²​`number_cohort2_(0,NA)_1`, ³​`binary_cohort2_(0,NA)_1`

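You can use the window option to change the period, counted in days from the reference date in table x, within which intersects in cohortTableName are searched. In the outputs above the window is (0,NA), as shown in the column names; setting window = c(0,0) keeps only intersects that start on the reference date itself.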

cdm$cohort1 %>% addCohortIntersect(cdm = cdm, cohortTableName = "cohort2", window = c(0, 0))
## # Source:   table<dbplyr_004> [5 x 8]
## # Database: DuckDB 0.6.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
##   cohort_defi…¹ subje…² cohort_s…³ cohort_e…⁴ numbe…⁵ binar…⁶ time_…⁷ date_coh…⁸
##           <dbl>   <dbl> <date>     <date>       <dbl>   <dbl>   <dbl> <date>    
## 1             1       1 2020-01-15 2020-01-15       1       1       0 2020-01-15
## 2             1       1 2020-01-01 2020-01-01       0       0      NA NA        
## 3             1       1 2020-01-20 2020-01-20       0       0      NA NA        
## 4             1       2 2020-01-01 2020-01-01       0       0      NA NA        
## 5             1       2 2020-02-01 2020-02-01       0       0      NA NA        
## # … with abbreviated variable names ¹​cohort_definition_id, ²​subject_id,
## #   ³​cohort_start_date, ⁴​cohort_end_date, ⁵​`number_cohort2_(0,0)_1`,
## #   ⁶​`binary_cohort2_(0,0)_1`, ⁷​`time_cohort2_(0,0)_first_1`,
## #   ⁸​`date_cohort2_(0,0)_first_1`
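
The window logic can be sketched in base R as a filter on the number of days between the reference date and each intersect date. The helper in_window() is hypothetical and not part of PatientProfiles; it assumes that both bounds are inclusive and that NA means no bound on that side, which is consistent with the outputs above but not confirmed here.

```r
# Hypothetical helper, not a PatientProfiles function.
# window = c(lower, upper) in days; NA is treated as unbounded (assumption).
in_window <- function(reference_date, event_date, window) {
  days <- as.numeric(event_date - reference_date)
  lower_ok <- is.na(window[1]) | days >= window[1]
  upper_ok <- is.na(window[2]) | days <= window[2]
  lower_ok & upper_ok
}

# Subject 1 with reference date 2020-01-15, cohort2 dates from the mock data
reference <- as.Date("2020-01-15")
events <- as.Date(c("2020-01-15", "2020-01-25", "2020-01-26", "2020-02-16"))

n_open <- sum(in_window(reference, events, c(0, NA)))   # no upper bound
n_same_day <- sum(in_window(reference, events, c(0, 0)))  # same day only
```

With the open-ended window all four cohort2 records count, while c(0, 0) leaves only the record that starts on 2020-01-15, as in the number column of the outputs above.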