In this vignette, we will explore the OmopSketch functions
designed to provide an overview of the observation_period
table. Specifically, there are three key functions that facilitate this:
- summariseObservationPeriod(): get some overall statistics describing the observation_period table
- plotObservationPeriod(): create plots showing the results
- tableObservationPeriod(): display the results in a formatted table
Let’s see an example of their functionality. To start with, we will load the essential packages and create a mock cdm using the R package omock.
library(dplyr)
library(OmopSketch)
library(omock)
# Connect to mock database
cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
#> ℹ Reading GiBleed tables.
#> ℹ Adding drug_strength table.
#> ℹ Creating local <cdm_reference> object.
#> ℹ Inserting <cdm_reference> into duckdb.
Let’s now use the summariseObservationPeriod() function
from the OmopSketch package to generate an overview of
the observation_period table.
This function provides both a general summary of the table and some
detailed statistics, such as the Number of subjects and
the Duration in days for each observation period (e.g.,
first, second, etc.).
summarisedResult <- summariseObservationPeriod(cdm = cdm)
summarisedResult
#> # A tibble: 3,126 × 13
#> result_id cdm_name group_name group_level strata_name strata_level
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 GiBleed observation_period_o… all overall overall
#> 2 1 GiBleed observation_period_o… all overall overall
#> 3 1 GiBleed observation_period_o… all overall overall
#> 4 1 GiBleed observation_period_o… all overall overall
#> 5 1 GiBleed observation_period_o… all overall overall
#> 6 1 GiBleed observation_period_o… all overall overall
#> 7 1 GiBleed observation_period_o… all overall overall
#> 8 1 GiBleed observation_period_o… all overall overall
#> 9 1 GiBleed observation_period_o… all overall overall
#> 10 1 GiBleed observation_period_o… all overall overall
#> # ℹ 3,116 more rows
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> # estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> # additional_name <chr>, additional_level <chr>
Notice that the output is in the summarised result format.
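The printed tibble above lists the 13 columns of this format. As a minimal sketch, the check below compares a data frame against those column names. The helper missingColumns() and the empty toy data frame are hypothetical and only serve as illustration; they are not part of OmopSketch.

```r
# Column names as shown in the printed summarised result above
expectedColumns <- c(
  "result_id", "cdm_name", "group_name", "group_level",
  "strata_name", "strata_level", "variable_name", "variable_level",
  "estimate_name", "estimate_type", "estimate_value",
  "additional_name", "additional_level"
)

# Hypothetical helper: which expected columns are absent from x?
missingColumns <- function(x) {
  setdiff(expectedColumns, colnames(x))
}

# Toy, empty data frame with the expected columns (illustration only)
toy <- as.data.frame(
  matrix(character(0), nrow = 0, ncol = length(expectedColumns),
         dimnames = list(NULL, expectedColumns))
)

missingColumns(toy)        # no columns missing
missingColumns(toy[, -1])  # "result_id" is reported as missing
```

With the mock CDM loaded, the same helper could be applied to summarisedResult directly.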
We can use the function arguments to specify which summary statistics to compute. For instance, the estimates argument defines which estimates are calculated for variables such as the Duration in days of the observation period or the Records per person.
summarisedResult <- summariseObservationPeriod(
cdm = cdm,
estimates = c("mean", "sd", "q05", "q95")
)
summarisedResult |>
filter(variable_name == "Duration in days") |>
select(group_level, variable_name, estimate_name, estimate_value)
#> # A tibble: 8 × 4
#> group_level variable_name estimate_name estimate_value
#> <chr> <chr> <chr> <chr>
#> 1 all Duration in days mean 14402.0014972862
#> 2 all Duration in days sd 8725.34082831129
#> 3 all Duration in days q05 1436
#> 4 all Duration in days q95 29133
#> 5 1st Duration in days mean 14402.0014972862
#> 6 1st Duration in days sd 8725.34082831129
#> 7 1st Duration in days q05 1436
#> 8 1st Duration in days q95 29133
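Note that estimate_value is stored as character (`<chr>`), so it has to be converted before doing arithmetic. The sketch below copies the four "all" rows from the output above into a small data frame and expresses the durations in years; the 365.25 days-per-year divisor is an assumption made for illustration, and base R is used so the example runs without the mock CDM.

```r
# Values copied from the "all" rows of the output above
durationAll <- data.frame(
  estimate_name = c("mean", "sd", "q05", "q95"),
  estimate_value = c("14402.0014972862", "8725.34082831129", "1436", "29133"),
  stringsAsFactors = FALSE
)

# estimate_value is character, so convert it before computing
durationAll$estimate_value <- as.numeric(durationAll$estimate_value)

# Express the duration in years (assumes 365.25 days per year)
durationAll$years <- round(durationAll$estimate_value / 365.25, 1)

durationAll
```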
By default, the function returns statistics for the Number of subjects, Duration in days, and Days to next observation both overall and by each ordinal observation period (for example, first, second, etc.).
If we are only interested in overall statistics rather than those
broken down by ordinal period, we can set the argument
byOrdinal = FALSE:
summarisedResult <- summariseObservationPeriod(
cdm = cdm,
estimates = c("mean", "sd", "q05", "q95"),
byOrdinal = FALSE
)
summarisedResult |>
filter(variable_name == "Duration in days") |>
distinct(group_name, group_level)
#> # A tibble: 1 × 2
#> group_name group_level
#> <chr> <chr>
#> 1 observation_period_ordinal all
When the argument missingData = TRUE is set, the results
will include an overall summary of missing data in the table, including
the number of 0s in the concept columns. This output is
analogous to the results produced by the OmopSketch function
summariseMissingData().
summarisedResult <- summariseObservationPeriod(
cdm = cdm,
estimates = c("mean", "sd", "q05", "q95"),
missingData = TRUE
)
summarisedResult |>
filter(variable_name == "Column name") |>
select(group_level, variable_name, estimate_name, estimate_value)
#> # A tibble: 16 × 4
#> group_level variable_name estimate_name estimate_value
#> <chr> <chr> <chr> <chr>
#> 1 all Column name na_count 0
#> 2 all Column name na_percentage 0.00
#> 3 all Column name zero_count 0
#> 4 all Column name zero_percentage 0.00
#> 5 all Column name na_count 0
#> 6 all Column name na_percentage 0.00
#> 7 all Column name zero_count 0
#> 8 all Column name zero_percentage 0.00
#> 9 all Column name na_count 0
#> 10 all Column name na_percentage 0.00
#> 11 all Column name na_count 0
#> 12 all Column name na_percentage 0.00
#> 13 all Column name na_count 0
#> 14 all Column name na_percentage 0.00
#> 15 all Column name zero_count 0
#> 16 all Column name zero_percentage 0.00
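When the argument quality = TRUE is set, the results
will include a quality assessment of the observation period table.
This assessment provides information such as:
- the number of subjects whose person_id values do not exist in the person table
- the number of records with an end date before the start date
- the number of records with a start date before the birth date
summarisedResult <- summariseObservationPeriod(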
cdm = cdm,
estimates = "mean",
missingData = FALSE,
quality = TRUE
)
summarisedResult |>
select(group_level, variable_name, variable_level, estimate_name, estimate_value)
#> # A tibble: 14 × 5
#> group_level variable_name variable_level estimate_name estimate_value
#> <chr> <chr> <chr> <chr> <chr>
#> 1 all Records per person <NA> mean 1
#> 2 all Duration in days <NA> mean 14402.0014972…
#> 3 1st Duration in days <NA> mean 14402.0014972…
#> 4 all Number records <NA> count 5343
#> 5 all Number subjects <NA> count 5343
#> 6 1st Number subjects <NA> count 5343
#> 7 all Type concept id Period coveri… count 5343
#> 8 all Subjects not in pers… <NA> count 2649
#> 9 all Subjects not in pers… <NA> percentage 49.58
#> 10 all End date before star… <NA> count 0
#> 11 all Start date before bi… <NA> count 0
#> 12 all Type concept id Period coveri… percentage 100.00
#> 13 all End date before star… <NA> percentage 0.00
#> 14 all Start date before bi… <NA> percentage 0.00
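As a quick sanity check on the output above, the reported percentage of subjects not in the person table can be reproduced from the counts. This sketch assumes the denominator is the overall Number subjects (5343); that choice is inferred from the numbers shown, not from the package documentation.

```r
# Counts copied from the output above
subjectsNotInPerson <- 2649
numberSubjects <- 5343

# Assumed definition: share of subjects without a row in the person table
percentage <- round(100 * subjectsNotInPerson / numberSubjects, 2)
percentage
```

The result agrees with the 49.58 reported in row 9 of the output.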
It is also possible to stratify the results by sex and age groups:
summarisedResult <- summariseObservationPeriod(
cdm = cdm,
estimates = c("mean", "sd", "q05", "q95"),
sex = TRUE,
ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)
Notice that, by default, the “overall” group will also be included,
as well as crossed strata (for example, sex == "Female" and
ageGroup == ">=35").
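To make the ageGroup specification concrete, the sketch below assigns a few ages to the groups defined by that list. It assumes that both bounds of each interval are inclusive; the helper labelAge() is hypothetical and is not how OmopSketch computes age groups internally.

```r
ageGroup <- list("<35" = c(0, 34), ">=35" = c(35, Inf))

# Hypothetical helper: return the label of the first interval containing
# each age, assuming both bounds are inclusive
labelAge <- function(age, groups) {
  vapply(age, function(a) {
    inGroup <- vapply(groups, function(b) a >= b[1] && a <= b[2], logical(1))
    hit <- names(groups)[inGroup]
    if (length(hit) == 0) NA_character_ else hit[1]
  }, character(1))
}

labelAge(c(0, 34, 35, 80), ageGroup)
```

Under this assumption, ages 0 and 34 fall in "<35" and ages 35 and 80 fall in ">=35".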
tableObservationPeriod() will help you to create a table
(see supported types with: visOmopResults::tableType()). By
default it creates a gt table.
summarisedResult <- summariseObservationPeriod(
cdm = cdm,
estimates = c("mean", "sd", "q05", "q95"),
sex = TRUE
)
summarisedResult |>
tableObservationPeriod(type = "gt")
| Observation period ordinal | Variable name | Variable level | Estimate name | CDM name: GiBleed |
|---|---|---|---|---|
| overall | | | | |
| all | Number records | – | N | 5,343 |
| | Number subjects | – | N | 5,343 |
| | Subjects not in person table | – | N (%) | 2,649 (49.58%) |
| | Records per person | – | Mean (SD) | 1.00 (0.00) |
| | Duration in days | – | Mean (SD) | 14,402.00 (8,725.34) |
| | Type concept id | Period covering healthcare encounters | N (%) | 5,343 (100.00%) |
| | Start date before birth date | – | N (%) | 0 (0.00%) |
| | End date before start date | – | N (%) | 0 (0.00%) |
| | Column name | Observation period end date | N missing data (%) | 0 (0.00%) |
| | | Observation period id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| | | Observation period start date | N missing data (%) | 0 (0.00%) |
| | | Period type concept id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| | | Person id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| 1st | Number subjects | – | N | 5,343 |
| | Duration in days | – | Mean (SD) | 14,402.00 (8,725.34) |
| Female | | | | |
| all | Number records | – | N | 1,373 |
| | Number subjects | – | N | 1,373 |
| | Records per person | – | Mean (SD) | 1.00 (0.00) |
| | Duration in days | – | Mean (SD) | 21,666.77 (5,623.53) |
| | Type concept id | Period covering healthcare encounters | N (%) | 1,373 (100.00%) |
| | Column name | Observation period end date | N missing data (%) | 0 (0.00%) |
| | | Observation period id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| | | Observation period start date | N missing data (%) | 0 (0.00%) |
| | | Period type concept id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| | | Person id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| 1st | Number subjects | – | N | 1,373 |
| | Duration in days | – | Mean (SD) | 21,666.77 (5,623.53) |
| Male | | | | |
| all | Number records | – | N | 1,321 |
| | Number subjects | – | N | 1,321 |
| | Records per person | – | Mean (SD) | 1.00 (0.00) |
| | Duration in days | – | Mean (SD) | 21,535.91 (5,287.44) |
| | Type concept id | Period covering healthcare encounters | N (%) | 1,321 (100.00%) |
| | Column name | Observation period end date | N missing data (%) | 0 (0.00%) |
| | | Observation period id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| | | Observation period start date | N missing data (%) | 0 (0.00%) |
| | | Period type concept id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| | | Person id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| 1st | Number subjects | – | N | 1,321 |
| | Duration in days | – | Mean (SD) | 21,535.91 (5,287.44) |
| None | | | | |
| all | Number records | – | N | 2,649 |
| | Number subjects | – | N | 2,649 |
| | Records per person | – | Mean (SD) | 1.00 (0.00) |
| | Duration in days | – | Mean (SD) | 7,079.08 (4,106.70) |
| | Type concept id | Period covering healthcare encounters | N (%) | 2,649 (100.00%) |
| | Column name | Observation period end date | N missing data (%) | 0 (0.00%) |
| | | Observation period id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| | | Observation period start date | N missing data (%) | 0 (0.00%) |
| | | Period type concept id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| | | Person id | N missing data (%) | 0 (0.00%) |
| | | | N zeros (%) | 0 (0.00%) |
| 1st | Number subjects | – | N | 2,649 |
| | Duration in days | – | Mean (SD) | 7,079.08 (4,106.70) |
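The stratified rows can be cross-checked against the overall rows. The sketch below uses only numbers copied from the table: the subject counts of the three sex strata should add up to the overall count, and the count-weighted mean duration should recover the overall mean up to rounding. The variable names are illustrative.

```r
# Number subjects and mean Duration in days per sex stratum (from the table)
strata <- data.frame(
  sex = c("Female", "Male", "None"),
  n = c(1373, 1321, 2649),
  meanDuration = c(21666.77, 21535.91, 7079.08)
)

# The strata counts should add up to the overall Number subjects (5,343)
totalSubjects <- sum(strata$n)

# Count-weighted mean duration, to compare with the overall 14,402.00
weightedMean <- sum(strata$n * strata$meanDuration) / totalSubjects

totalSubjects
round(weightedMean, 2)
```

The "None" stratum has 2,649 subjects, the same count reported in the table as subjects not in the person table.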
We can also visualise the result using
plotObservationPeriod().
summarisedResult <- summariseObservationPeriod(cdm = cdm)
plotObservationPeriod(
result = summarisedResult,
variableName = "Number subjects",
plotType = "barplot"
)
Note that either Number subjects or
Duration in days can be plotted. For
Number subjects, the plot type can only be
barplot, whereas for Duration in days, the
plot type can be barplot, boxplot, or
densityplot.
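The combinations described above can be written down as a small lookup. This is only a sketch of the rule as stated in the text: allowedPlotTypes and isSupportedPlot() are hypothetical names and are not part of the OmopSketch API.

```r
# Variable / plot type combinations as described in the text above
allowedPlotTypes <- list(
  "Number subjects" = "barplot",
  "Duration in days" = c("barplot", "boxplot", "densityplot")
)

# Hypothetical helper: TRUE if the combination is listed above
isSupportedPlot <- function(variableName, plotType) {
  plotType %in% allowedPlotTypes[[variableName]]
}

isSupportedPlot("Duration in days", "boxplot")     # TRUE
isSupportedPlot("Number subjects", "densityplot")  # FALSE
```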
Additionally, if results were stratified by sex or age group, we can
further use the facet or colour arguments to
highlight the different results in the plot. To help identify the
variables we can colour or facet by, we can use the visOmopResults
package.
summarisedResult <- summariseObservationPeriod(cdm = cdm, sex = TRUE)
plotObservationPeriod(
result = summarisedResult,
variableName = "Duration in days",
plotType = "boxplot",
facet = "sex"
)
summarisedResult <- summariseObservationPeriod(
cdm = cdm,
sex = TRUE,
ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)
plotObservationPeriod(
result = summarisedResult,
colour = "sex",
facet = "age_group"
)
Finally, disconnect from the mock CDM.
cdmDisconnect(cdm = cdm)