Database diagnostics

Introduction

In this example we use the Eunomia synthetic data (the synpuf-1k dataset, CDM version 5.3).

library(CDMConnector)
library(CohortConstructor)
library(CodelistGenerator)
library(PhenotypeR)
library(dplyr)
library(ggplot2)

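# Connect to a local DuckDB file holding the Eunomia synpuf-1k dataset
con <- DBI::dbConnect(duckdb::duckdb(),
                      CDMConnector::eunomiaDir("synpuf-1k", "5.3"))

# Create the cdm reference; all tables live in the "main" schema
cdm <- CDMConnector::cdmFromCon(con = con,
                                cdmName = "Eunomia Synpuf",
                                cdmSchema = "main",
                                writeSchema = "main",
                                achillesSchema = "main")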

Database diagnostics

We have created our study cohort, but informing analytic decisions and interpreting results requires an understanding of the dataset from which the cohort was derived. The databaseDiagnostics() function helps us better understand a data source.

To run database diagnostics we just need to provide our cdm reference to the function.

db_diagnostics <- databaseDiagnostics(cdm)
db_diagnostics |> glimpse()
#> Rows: 6,224
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,…
#> $ cdm_name         <chr> "Eunomia Synpuf", "Eunomia Synpuf", "Eunomia Synpuf",…
#> $ group_name       <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ group_level      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "general", "general", "observation_period", "cdm", "g…
#> $ variable_level   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name    <chr> "snapshot_date", "person_count", "count", "source_nam…
#> $ estimate_type    <chr> "date", "integer", "integer", "character", "character…
#> $ estimate_value   <chr> "2025-03-28", "1000", "1048", "Synpuf", "v5.0 06-AUG-…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
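
The result is in long format, with every estimate stored as a character string in estimate_value and its type recorded in estimate_type. As a minimal sketch of how individual estimates could be pulled out, the example below rebuilds three rows from the glimpse() output above as a toy table (toy_result is a hypothetical stand-in, not the full db_diagnostics object) and converts the integer estimates back to numbers with dplyr.

```r
library(dplyr)

# Toy rows copied from the glimpse() output above.
# Assumption: a real result has many more rows and all 13 columns.
toy_result <- tibble(
  variable_name  = c("general", "general", "observation_period"),
  estimate_name  = c("snapshot_date", "person_count", "count"),
  estimate_type  = c("date", "integer", "integer"),
  estimate_value = c("2025-03-28", "1000", "1048")
)

# estimate_value is character, so filter on the declared type before converting.
integer_estimates <- toy_result |>
  filter(estimate_type == "integer") |>
  mutate(estimate_value = as.integer(estimate_value)) |>
  select(variable_name, estimate_name, estimate_value)

integer_estimates
```

The same filter and mutate steps should apply to db_diagnostics itself, since it has the same columns.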

From our results we can create a table with a summary of metadata for the data source.

OmopSketch::tableOmopSnapshot(db_diagnostics)

Estimate	Eunomia Synpuf (database name)
General
Snapshot date	2025-03-28
Person count	1,000
Vocabulary version	v5.0 06-AUG-21
Observation period
N	1,048
Start date	2008-01-01
End date	2010-12-31
Cdm
Source name	Synpuf
Version	v5.3.1
Holder name	ohdsi
Release date	2018-03-15
Description
Documentation reference
Source type	duckdb

We can also see a summary of individuals’ observation periods. This shows whether some individuals have multiple, non-overlapping observation periods and how long each observation period lasts on average.

OmopSketch::tableObservationPeriod(db_diagnostics)

Observation period ordinal	Variable name	Estimate name	Eunomia Synpuf (CDM name)
all	Number records	N	1,048
	Number subjects	N	1,000
	Records per person	mean (sd)	1.05 (0.21)
		median [Q25 - Q75]	1 [1 - 1]
	Duration in days	mean (sd)	979.71 (262.79)
		median [Q25 - Q75]	1,096 [1,096 - 1,096]
	Days to next observation period	mean (sd)	172.17 (108.35)
		median [Q25 - Q75]	138 [93 - 254]
1st	Number subjects	N	1,000
	Duration in days	mean (sd)	994.16 (257.95)
		median [Q25 - Q75]	1,096 [1,096 - 1,096]
	Days to next observation period	mean (sd)	172.17 (108.35)
		median [Q25 - Q75]	138 [93 - 254]
2nd	Number subjects	N	48
	Duration in days	mean (sd)	678.60 (164.50)
		median [Q25 - Q75]	730 [730 - 730]
	Days to next observation period	mean (sd)	-
		median [Q25 - Q75]	-
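
The "all" rows can be cross-checked against the per-ordinal rows with a little arithmetic. The sketch below uses only numbers copied from the table above (the variable names are illustrative): records per person is the total number of records divided by the number of subjects, and the overall mean duration is the mean of the 1st and 2nd period durations weighted by their counts.

```r
# Counts and mean durations (in days) copied from the table above
n_first  <- 1000     # subjects with a 1st observation period
n_second <- 48       # subjects with a 2nd observation period
mean_duration_first  <- 994.16
mean_duration_second <- 678.60

# Total records divided by number of subjects
n_records <- n_first + n_second
records_per_person <- n_records / n_first

# Mean duration over all records, weighted by the number of periods per ordinal
overall_mean_duration <-
  (n_first * mean_duration_first + n_second * mean_duration_second) / n_records

round(records_per_person, 2)     # 1.05, matching "Records per person"
round(overall_mean_duration, 2)  # 979.71, matching "Duration in days" for "all"
```

The agreement also indicates that only a small share of individuals (48 of 1,000) have a second observation period, and that these second periods are shorter on average.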
