This document demonstrates how to use the addCohortIntersect() function in PatientProfiles to obtain cohort intersect information for each individual in your CDM cohort table.
addCohortIntersect() requires two input tables:
The table containing the cohort of individuals to which the intersect information will be attached as extra columns. In the function this table is labelled x. It must be saved as a table within the CDM environment.
The table in which intersections are searched for and from which the intersect information is computed. In the function this table is labelled cohortTableName. It must be saved as a table within the CDM environment.
Both tables must contain the columns “cohort_definition_id”, “subject_id”, “cohort_start_date” and “cohort_end_date”. The function can return the following intersect information:
number: the count of intersections found in cohortTableName.
binary: an indicator of whether any intersection is found in cohortTableName.
date: the earliest or the latest cohort_start_date among the intersections.
time: the number of days between the individual's cohort_start_date and the earliest or the latest intersect date.
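To make the four values concrete, here is a minimal base R sketch that computes them for a single individual. It is only an illustration of the definitions above, not how PatientProfiles computes them internally; the helper intersect_summary() is hypothetical, and it assumes that only events on or after the index date are counted.

```r
# Hypothetical helper (not part of PatientProfiles): summarises the event
# dates that fall on or after one index date.
intersect_summary <- function(index_date, event_dates, order = "first") {
  days <- as.numeric(event_dates - index_date)  # days from index date to each event
  hits <- event_dates[days >= 0]                # assumption: keep events on or after the index date
  if (length(hits) == 0) {
    picked <- as.Date(NA)                       # no intersect found
  } else if (order == "first") {
    picked <- min(hits)                         # earliest intersect date
  } else {
    picked <- max(hits)                         # latest intersect date
  }
  list(
    number = length(hits),                      # count of intersects
    binary = as.numeric(length(hits) > 0),      # 1 if any intersect, otherwise 0
    date   = picked,
    time   = as.numeric(picked - index_date)    # days between index and intersect date
  )
}

# Illustrative dates: one index date and four event dates for the same person
events <- as.Date(c("2020-01-15", "2020-01-25", "2020-01-26", "2020-02-16"))
res <- intersect_summary(as.Date("2020-01-01"), events)
str(res)
```

With these dates the sketch gives number 4, binary 1, date 2020-01-15 and time 14.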
Below is an example of addCohortIntersect() using mock data.
library(DBI)
library(duckdb)
library(tibble)
library(PatientProfiles)
# Create two mock cohort tables
cohort1 <- dplyr::tibble(
cohort_definition_id = c(1, 1, 1, 1, 1),
subject_id = c(1, 1, 1, 2, 2),
cohort_start_date = as.Date(
c(
"2020-01-01",
"2020-01-15",
"2020-01-20",
"2020-01-01",
"2020-02-01"
)
),
cohort_end_date = as.Date(
c(
"2020-01-01",
"2020-01-15",
"2020-01-20",
"2020-01-01",
"2020-02-01"
)
)
)
cohort2 <- dplyr::tibble(
cohort_definition_id = c(1, 1, 1, 1, 1, 1, 1),
subject_id = c(1, 1, 1, 2, 2, 2, 1),
cohort_start_date = as.Date(
c(
"2020-01-15",
"2020-01-25",
"2020-01-26",
"2020-01-29",
"2020-03-15",
"2020-01-24",
"2020-02-16"
)
),
cohort_end_date = as.Date(
c(
"2020-01-15",
"2020-01-25",
"2020-01-26",
"2020-01-29",
"2020-03-15",
"2020-01-24",
"2020-02-16"
)
)
)
cdm <- mockPatientProfiles(cohort1=cohort1, cohort2=cohort2)
First we use mockPatientProfiles() to generate two cohort tables in the CDM environment, named cohort1 and cohort2, and save the result as cdm. Then, to add the intersect information from cohort2 as columns in cohort1, we run the code below.
cdm$cohort1 %>% addCohortIntersect(cdm = cdm, cohortTableName = "cohort2")
## # Source: table<dbplyr_001> [5 x 8]
## # Database: DuckDB 0.6.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## cohort_defi…¹ subje…² cohort_s…³ cohort_e…⁴ numbe…⁵ binar…⁶ time_…⁷ date_coh…⁸
## <dbl> <dbl> <date> <date> <dbl> <dbl> <dbl> <date>
## 1 1 1 2020-01-01 2020-01-01 4 1 14 2020-01-15
## 2 1 1 2020-01-15 2020-01-15 4 1 0 2020-01-15
## 3 1 1 2020-01-20 2020-01-20 3 1 5 2020-01-25
## 4 1 2 2020-01-01 2020-01-01 3 1 23 2020-01-24
## 5 1 2 2020-02-01 2020-02-01 1 1 43 2020-03-15
## # … with abbreviated variable names ¹cohort_definition_id, ²subject_id,
## # ³cohort_start_date, ⁴cohort_end_date, ⁵`number_cohort2_(0,NA)_1`,
## # ⁶`binary_cohort2_(0,NA)_1`, ⁷`time_cohort2_(0,NA)_first_1`,
## # ⁸`date_cohort2_(0,NA)_first_1`
As the result above shows, four extra columns were added: “number_cohort2_(0,NA)_1”, “binary_cohort2_(0,NA)_1”, “time_cohort2_(0,NA)_first_1” and “date_cohort2_(0,NA)_first_1”. The columns are named as “{value of the intersect information}_{name of the table containing the cohort}_{window of interest}_{order: first or last cohort start date of the intersect}”. The order component appears only in the time and date columns, and in this output each name also ends in “_1”.
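The naming pattern can be reproduced with a few lines of base R. This is a sketch for reading the column names, not the code PatientProfiles uses; intersect_col_name() is a hypothetical helper, and the trailing suffix is simply copied from the output above without assuming what it denotes.

```r
# Hypothetical helper: rebuilds the column names seen in the output above.
intersect_col_name <- function(value, table_name, window, order = NULL, suffix = 1) {
  # Window label as printed in the column names, e.g. "(0,NA)"
  window_label <- paste0("(", window[1], ",", window[2], ")")
  # A NULL order is dropped by c(), so number and binary names have no order part
  parts <- c(value, table_name, window_label, order, suffix)
  paste(parts, collapse = "_")
}

time_name   <- intersect_col_name("time", "cohort2", c(0, NA), order = "first")
number_name <- intersect_col_name("number", "cohort2", c(0, NA))
print(c(time_name, number_name))
```

Because these names contain parentheses and a comma, they have to be wrapped in backticks when referred to in later dplyr code, as the abbreviated variable names in the output do.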
To return the last cohort start date instead of the first, use the order option of the function:
cdm$cohort1 %>% addCohortIntersect(cdm = cdm, cohortTableName = "cohort2", order = "last")
## # Source: table<dbplyr_002> [5 x 8]
## # Database: DuckDB 0.6.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## cohort_defi…¹ subje…² cohort_s…³ cohort_e…⁴ numbe…⁵ binar…⁶ time_…⁷ date_coh…⁸
## <dbl> <dbl> <date> <date> <dbl> <dbl> <dbl> <date>
## 1 1 1 2020-01-01 2020-01-01 4 1 46 2020-02-16
## 2 1 1 2020-01-15 2020-01-15 4 1 32 2020-02-16
## 3 1 1 2020-01-20 2020-01-20 3 1 27 2020-02-16
## 4 1 2 2020-01-01 2020-01-01 3 1 74 2020-03-15
## 5 1 2 2020-02-01 2020-02-01 1 1 43 2020-03-15
## # … with abbreviated variable names ¹cohort_definition_id, ²subject_id,
## # ³cohort_start_date, ⁴cohort_end_date, ⁵`number_cohort2_(0,NA)_1`,
## # ⁶`binary_cohort2_(0,NA)_1`, ⁷`time_cohort2_(0,NA)_last_1`,
## # ⁸`date_cohort2_(0,NA)_last_1`
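The effect of order on the date and time columns can be checked by hand. The base R sketch below uses the third row of cohort1 (subject 1, cohort_start_date 2020-01-20) and subject 1's start dates from cohort2; it assumes that only events on or after the index date count, and it is not the package's implementation.

```r
# Index date from cohort1 and subject 1's start dates from cohort2
index_date <- as.Date("2020-01-20")
events <- as.Date(c("2020-01-15", "2020-01-25", "2020-01-26", "2020-02-16"))

# Assumption: only events on or after the index date are intersected
later <- events[events >= index_date]

first_date <- min(later)   # what order = "first" reports as the date
last_date  <- max(later)   # what order = "last" reports as the date

time_first <- as.numeric(first_date - index_date)  # days to the earliest intersect
time_last  <- as.numeric(last_date - index_date)   # days to the latest intersect

print(c(first = time_first, last = time_last))
```

The count of intersects (3 here) is the same for either order, which is why the number and binary columns are identical in both outputs.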
The value option can be used to specify which values to return.
cdm$cohort1 %>% addCohortIntersect(cdm = cdm, cohortTableName = "cohort2", value = c("binary", "number"))
## # Source: table<dbplyr_003> [5 x 6]
## # Database: DuckDB 0.6.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## cohort_definition_id subject_id cohort_start_date cohort_end…¹ numbe…² binar…³
## <dbl> <dbl> <date> <date> <dbl> <dbl>
## 1 1 1 2020-01-01 2020-01-01 4 1
## 2 1 1 2020-01-15 2020-01-15 4 1
## 3 1 1 2020-01-20 2020-01-20 3 1
## 4 1 2 2020-01-01 2020-01-01 3 1
## 5 1 2 2020-02-01 2020-02-01 1 1
## # … with abbreviated variable names ¹cohort_end_date,
## # ²`number_cohort2_(0,NA)_1`, ³`binary_cohort2_(0,NA)_1`
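You can use the window option to change the period, relative to the reference date in table x, in which events in the event table are searched. The earlier outputs use the window (0,NA), as their column names show. With window = c(0,0), only events on the reference date itself are intersected.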
cdm$cohort1 %>% addCohortIntersect(cdm = cdm, cohortTableName = "cohort2", window = c(0, 0))
## # Source: table<dbplyr_004> [5 x 8]
## # Database: DuckDB 0.6.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## cohort_defi…¹ subje…² cohort_s…³ cohort_e…⁴ numbe…⁵ binar…⁶ time_…⁷ date_coh…⁸
## <dbl> <dbl> <date> <date> <dbl> <dbl> <dbl> <date>
## 1 1 1 2020-01-15 2020-01-15 1 1 0 2020-01-15
## 2 1 1 2020-01-01 2020-01-01 0 0 NA NA
## 3 1 1 2020-01-20 2020-01-20 0 0 NA NA
## 4 1 2 2020-01-01 2020-01-01 0 0 NA NA
## 5 1 2 2020-02-01 2020-02-01 0 0 NA NA
## # … with abbreviated variable names ¹cohort_definition_id, ²subject_id,
## # ³cohort_start_date, ⁴cohort_end_date, ⁵`number_cohort2_(0,0)_1`,
## # ⁶`binary_cohort2_(0,0)_1`, ⁷`time_cohort2_(0,0)_first_1`,
## # ⁸`date_cohort2_(0,0)_first_1`
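The window logic can be sketched in base R as a filter on the number of days between the reference date and each event date. The helper in_window() is hypothetical, and treating NA as "no limit" on either side is an assumption based on the (0,NA) window seen in the earlier column names.

```r
# Hypothetical helper: keeps the event dates whose offset in days from the
# index date lies inside the window. Assumption: NA means the bound is open.
in_window <- function(index_date, event_dates, window) {
  days  <- as.numeric(event_dates - index_date)
  lower <- if (is.na(window[1])) -Inf else window[1]
  upper <- if (is.na(window[2])) Inf else window[2]
  event_dates[days >= lower & days <= upper]
}

# Subject 1's start dates from cohort2
events <- as.Date(c("2020-01-15", "2020-01-25", "2020-01-26", "2020-02-16"))

same_day <- in_window(as.Date("2020-01-15"), events, c(0, 0))   # one event on the index date
no_match <- in_window(as.Date("2020-01-01"), events, c(0, 0))   # nothing on 2020-01-01
open_end <- in_window(as.Date("2020-01-01"), events, c(0, NA))  # everything from the index date on

print(c(length(same_day), length(no_match), length(open_end)))
```

These counts agree with the number column for subject 1 in the outputs above: 1 and 0 with window = c(0, 0), and 4 with the (0,NA) window.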