The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

CDM reference backends

Overview

The CDMConnector package allows us to work with cdm data in different locations consistently. The cdm_reference may be to tables in a database, files on disk, or tables loaded into R. This allows computation to take place wherever is most convenient.

Here we have a schematic of how CDMConnector can be used to create cdm_references to different locations.

Example

To show how this can work (and slightly overcomplicate things to show different options), let´s say we want to create a histogram with age of patients at diagnosis of tear of meniscus of knee (concept_id of “4035415”). We can start in the database and, after loading the required packages, subset our person table people to only include those people in the condition_occurrence table with condition_concept_id “4035415”

library(CDMConnector)
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())
cdm <- cdm_from_con(con, cdm_name = "eunomia", cdm_schema = "main", write_schema = "main")

# first filter to only those with condition_concept_id "4035415"
cdm$condition_occurrence %>% tally()

cdm$condition_occurrence <- cdm$condition_occurrence %>%
  filter(condition_concept_id == "4035415") %>%
  select(person_id, condition_start_date)

cdm$condition_occurrence %>% tally()

# then left_join person table
cdm$person %>% tally()
cdm$condition_occurrence %>%
  select(person_id) %>%
  left_join(select(cdm$person, person_id, year_of_birth), by = "person_id") %>% 
  tally()

We can save these tables to file

dOut <- tempfile()
dir.create(dOut)
CDMConnector::stow(cdm, dOut, format = "parquet")

And now we can create a cdm_reference to the files

cdm_arrow <- cdm_from_files(dOut, as_data_frame = FALSE, cdm_name = "GiBleed")

cdm_arrow$person %>%
  nrow()

cdm_arrow$condition_occurrence %>%
  nrow()

And create an age at diagnosis variable

result <- cdm_arrow$person %>%
  left_join(cdm_arrow$condition_occurrence, by = "person_id") %>%
  mutate(age_diag = year(condition_start_date) - year_of_birth) %>%
  collect()

We can then bring in this result to R and make the histogram

str(result)

result %>%
  ggplot(aes(age_diag)) +
  geom_histogram()
DBI::dbDisconnect(con, shutdown = TRUE)

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.