
Database diagnostics

Introduction

In this example we use the Eunomia synthetic data (the "synpuf-1k" dataset), which we load into a DuckDB database and wrap in a cdm reference.

library(CDMConnector)
library(CohortConstructor)
library(CodelistGenerator)
library(PhenotypeR)
library(dplyr)
library(ggplot2)

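# Connect to a DuckDB database holding the Eunomia "synpuf-1k" dataset (CDM v5.3)
con <- DBI::dbConnect(duckdb::duckdb(),
                      CDMConnector::eunomiaDir("synpuf-1k", "5.3"))

# Create the cdm reference; all schemas point to "main" in this example database
cdm <- CDMConnector::cdmFromCon(con = con,
                                cdmName = "Eunomia Synpuf",
                                cdmSchema = "main",
                                writeSchema = "main",
                                achillesSchema = "main")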

Database diagnostics

To inform analytic decisions and the interpretation of results, we need to understand the dataset from which a study cohort is derived. The databaseDiagnostics() function helps us better understand a data source.

To run database diagnostics we just need to provide our cdm reference to the function.

db_diagnostics <- databaseDiagnostics(cdm)
db_diagnostics |> glimpse()
#> Rows: 6,224
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,…
#> $ cdm_name         <chr> "Eunomia Synpuf", "Eunomia Synpuf", "Eunomia Synpuf",…
#> $ group_name       <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ group_level      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "general", "general", "observation_period", "cdm", "g…
#> $ variable_level   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name    <chr> "snapshot_date", "person_count", "count", "source_nam…
#> $ estimate_type    <chr> "date", "integer", "integer", "character", "character…
#> $ estimate_value   <chr> "2025-02-05", "1000", "1048", "Synpuf", "v5.0 06-AUG-…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
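
As the output above shows, every estimate is stored as a character string in estimate_value, with its intended type recorded in estimate_type. The following is a minimal base R sketch of how such values could be converted back to their types. It uses a toy data frame built from the first three rows visible in the glimpse() output rather than the real result, and get_estimate() is a hypothetical helper written for this illustration, not a function from PhenotypeR.

```r
# Toy stand-in for three rows of the result, copied from the glimpse() output
# above. A real result has many more rows and columns.
toy_result <- data.frame(
  variable_name  = c("general", "general", "observation_period"),
  estimate_name  = c("snapshot_date", "person_count", "count"),
  estimate_type  = c("date", "integer", "integer"),
  estimate_value = c("2025-02-05", "1000", "1048"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption for this sketch): pull a single estimate
# and convert it according to its recorded type.
get_estimate <- function(result, variable, estimate) {
  row <- result[result$variable_name == variable &
                  result$estimate_name == estimate, ]
  stopifnot(nrow(row) == 1)  # expect exactly one matching row
  switch(row$estimate_type,
         integer = as.integer(row$estimate_value),
         date    = as.Date(row$estimate_value),
         row$estimate_value)  # otherwise keep the raw character value
}

person_count  <- get_estimate(toy_result, "general", "person_count")
snapshot_date <- get_estimate(toy_result, "general", "snapshot_date")
n_periods     <- get_estimate(toy_result, "observation_period", "count")
```

Keeping values as character in one long table lets estimates of different types share a single column; the conversion only needs to happen when a value is used in a calculation.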

From our results we can create a table with a summary of metadata for the data source.

OmopSketch::tableOmopSnapshot(db_diagnostics)
Estimate                   Database name: Eunomia Synpuf
General
  Snapshot date            2025-02-05
  Person count             1,000
  Vocabulary version       v5.0 06-AUG-21
Observation period
  N                        1,048
  Start date               2008-01-01
  End date                 2010-12-31
Cdm
  Source name              Synpuf
  Version                  v5.3.1
  Holder name              ohdsi
  Release date             2018-03-15
  Description
  Documentation reference
  Source type              duckdb
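
The observation period start and end dates in the snapshot tell us the maximum follow-up any individual can have in this data source. As a small worked example, the sketch below computes that span in days with base R. The dates are typed in by hand from the table above, and both the first and last day are counted.

```r
# Earliest start and latest end of observation, copied from the snapshot table
start_date <- as.Date("2008-01-01")
end_date   <- as.Date("2010-12-31")

# Number of days covered, counting both the first and the last day
span_days <- as.integer(difftime(end_date, start_date, units = "days")) + 1L
span_days
#> [1] 1096
```

The span covers one leap year (2008) and two ordinary years, so 366 + 365 + 365 = 1,096 days.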

We can also see a summary of individuals' observation periods. This shows whether there are individuals with multiple, non-overlapping observation periods and how long observation periods last on average.

OmopSketch::tableObservationPeriod(db_diagnostics)
Observation period ordinal  Variable name                    Estimate name       Eunomia Synpuf (CDM name)
all                         Number records                   N                   1,048
                            Number subjects                  N                   1,000
                            Records per person               mean (sd)           1.05 (0.21)
                                                             median [Q25 - Q75]  1 [1 - 1]
                            Duration in days                 mean (sd)           979.71 (262.79)
                                                             median [Q25 - Q75]  1,096 [1,096 - 1,096]
                            Days to next observation period  mean (sd)           172.17 (108.35)
                                                             median [Q25 - Q75]  138 [93 - 254]
1st                         Number subjects                  N                   1,000
                            Duration in days                 mean (sd)           994.16 (257.95)
                                                             median [Q25 - Q75]  1,096 [1,096 - 1,096]
                            Days to next observation period  mean (sd)           172.17 (108.35)
                                                             median [Q25 - Q75]  138 [93 - 254]
2nd                         Number subjects                  N                   48
                            Duration in days                 mean (sd)           678.60 (164.50)
                                                             median [Q25 - Q75]  730 [730 - 730]
                            Days to next observation period  mean (sd)           -
                            				             median [Q25 - Q75]  -
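
The overall ("all") rows can be cross-checked against the per-ordinal rows. The sketch below is a worked arithmetic check in base R, with the numbers typed in by hand from the table above. It assumes that nobody has more than two observation periods, which is what the table suggests since only 1st and 2nd ordinals are reported.

```r
# Values copied from the table above
n_first     <- 1000    # subjects with a 1st observation period
mean_first  <- 994.16  # mean duration (days) of 1st periods
n_second    <- 48      # subjects with a 2nd observation period
mean_second <- 678.60  # mean duration (days) of 2nd periods

# Total records: one per 1st period plus one per 2nd period
# (assumption: there are no 3rd or later periods)
n_records <- n_first + n_second

# Overall mean duration is the record-weighted mean of the per-ordinal means
pooled_mean <- (n_first * mean_first + n_second * mean_second) / n_records

# Mean number of records per person
records_per_person <- n_records / n_first

round(pooled_mean, 2)
#> [1] 979.71
round(records_per_person, 2)
#> [1] 1.05
```

Both values match the "all" rows of the table: 48 of the 1,000 individuals (4.8%) have a second observation period, and these shorter second periods pull the overall mean duration slightly below the mean for first periods.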
