In this example we use the Eunomia synthetic data (the "synpuf-1k" dataset, CDM version 5.3).
library(CDMConnector)
library(CohortConstructor)
library(CodelistGenerator)
library(PhenotypeR)
library(dplyr)
library(ggplot2)
con <- DBI::dbConnect(duckdb::duckdb(),
                      CDMConnector::eunomiaDir("synpuf-1k", "5.3"))
cdm <- CDMConnector::cdmFromCon(con = con,
                                cdmName = "Eunomia Synpuf",
                                cdmSchema = "main",
                                writeSchema = "main",
                                achillesSchema = "main")
Even once we have created our study cohort, making analytic decisions
and interpreting results requires an understanding of the dataset
from which the cohort was derived. The databaseDiagnostics()
function helps us better understand a data source.
To run database diagnostics we just need to provide our cdm reference to the function.
db_diagnostics <- databaseDiagnostics(cdm)
db_diagnostics |> glimpse()
#> Rows: 6,224
#> Columns: 13
#> $ result_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,…
#> $ cdm_name <chr> "Eunomia Synpuf", "Eunomia Synpuf", "Eunomia Synpuf",…
#> $ group_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ group_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name <chr> "general", "general", "observation_period", "cdm", "g…
#> $ variable_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name <chr> "snapshot_date", "person_count", "count", "source_nam…
#> $ estimate_type <chr> "date", "integer", "integer", "character", "character…
#> $ estimate_value <chr> "2025-02-05", "1000", "1048", "Synpuf", "v5.0 06-AUG-…
#> $ additional_name <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
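The result is a long-format table in which every value is stored as character in estimate_value, with its intended type recorded in estimate_type. Individual estimates can therefore be pulled out with ordinary dplyr verbs. The following is a minimal sketch on a hand-built tibble that copies the first four rows visible in the glimpse above; the name toy_result is illustrative and stands in for the real db_diagnostics object.

```r
library(dplyr)

# Illustrative stand-in for db_diagnostics: the first four rows from the glimpse above
toy_result <- tibble(
  variable_name  = c("general", "general", "observation_period", "cdm"),
  estimate_name  = c("snapshot_date", "person_count", "count", "source_name"),
  estimate_type  = c("date", "integer", "integer", "character"),
  estimate_value = c("2025-02-05", "1000", "1048", "Synpuf")
)

# estimate_value is character, so convert it according to estimate_type
person_count <- toy_result |>
  filter(variable_name == "general", estimate_name == "person_count") |>
  pull(estimate_value) |>
  as.integer()

person_count
```

The same filter() and pull() calls should apply unchanged to db_diagnostics, since they rely only on columns shown in the glimpse.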
From our results we can create a table with a summary of metadata for the data source.
| Estimate | Database name: Eunomia Synpuf |
|---|---|
| General | |
| Snapshot date | 2025-02-05 |
| Person count | 1,000 |
| Vocabulary version | v5.0 06-AUG-21 |
| Observation period | |
| N | 1,048 |
| Start date | 2008-01-01 |
| End date | 2010-12-31 |
| Cdm | |
| Source name | Synpuf |
| Version | v5.3.1 |
| Holder name | ohdsi |
| Release date | 2018-03-15 |
| Description | |
| Documentation reference | |
| Source type | duckdb |
We can also see a summary of individuals’ observation periods. This shows whether any individuals have multiple, non-overlapping observation periods and how long each observation period lasts on average.
| Observation period ordinal | Variable name | Estimate name | CDM name: Eunomia Synpuf |
|---|---|---|---|
| all | Number records | N | 1,048 |
| all | Number subjects | N | 1,000 |
| all | Records per person | mean (sd) | 1.05 (0.21) |
| all | Records per person | median [Q25 - Q75] | 1 [1 - 1] |
| all | Duration in days | mean (sd) | 979.71 (262.79) |
| all | Duration in days | median [Q25 - Q75] | 1,096 [1,096 - 1,096] |
| all | Days to next observation period | mean (sd) | 172.17 (108.35) |
| all | Days to next observation period | median [Q25 - Q75] | 138 [93 - 254] |
| 1st | Number subjects | N | 1,000 |
| 1st | Duration in days | mean (sd) | 994.16 (257.95) |
| 1st | Duration in days | median [Q25 - Q75] | 1,096 [1,096 - 1,096] |
| 1st | Days to next observation period | mean (sd) | 172.17 (108.35) |
| 1st | Days to next observation period | median [Q25 - Q75] | 138 [93 - 254] |
| 2nd | Number subjects | N | 48 |
| 2nd | Duration in days | mean (sd) | 678.60 (164.50) |
| 2nd | Duration in days | median [Q25 - Q75] | 730 [730 - 730] |
| 2nd | Days to next observation period | mean (sd) | - |
| 2nd | Days to next observation period | median [Q25 - Q75] | - |
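As a quick sanity check, the "all" rows of the observation period table can be recovered from the per-ordinal rows. The sketch below uses only numbers copied from the table and assumes that nobody has more than two observation periods, which matches the ordinals shown. Because the per-ordinal means are already rounded, the recomputed overall mean is approximate.

```r
# Values copied from the observation period table above
n_first       <- 1000    # subjects with a 1st observation period
n_second      <- 48      # subjects with a 2nd observation period
mean_days_1st <- 994.16  # mean duration of 1st periods
mean_days_2nd <- 678.60  # mean duration of 2nd periods

# Total records = first periods + second periods
n_records <- n_first + n_second

# Mean number of observation period records per person
records_per_person <- round(n_records / n_first, 2)

# Overall mean duration as a record-weighted average of the per-ordinal means
mean_days_all <- round(
  (n_first * mean_days_1st + n_second * mean_days_2nd) / n_records, 2
)

c(n_records = n_records,
  records_per_person = records_per_person,
  mean_days_all = mean_days_all)
```

These reproduce the 1,048 records, 1.05 records per person, and 979.71-day mean duration reported in the "all" rows. The "Days to next observation period" estimates are identical for "all" and "1st" because only a first period can be followed by another one here.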