PatientGenerator facilitates the creation of synthetic
test datasets for the OMOP Common Data Model (CDM) using two
complementary approaches:
- patientChat: Generates structured patient JSON files using Large Language Models (LLMs).
- patientDesigner: Provides a D3-based Shiny interface for reviewing and editing CDM test sets.

The package also includes support for Hecate-powered concept lookups to ensure valid OMOP concept codes.
# install.packages("remotes")
remotes::install_github("mi-erasmusmc/PatientGenerator")
The package exposes patientChat and patientDesigner(), with concept search (hecateSearch) available during table editing.
patientChat
Set an OPENAI_API_KEY environment variable (e.g., via
usethis::edit_r_environ()) to enable LLM access.
Available models can be listed using
PatientGenerator::availableModels().
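Before creating a patientChat instance, it can help to confirm that the key is actually visible to the R session. The following is a minimal base-R sketch; `has_openai_key()` is a hypothetical helper written for this example and is not part of PatientGenerator.

```r
# Hypothetical helper (not part of PatientGenerator): reports whether
# OPENAI_API_KEY is set to a non-empty value in the current session.
has_openai_key <- function() {
  nzchar(Sys.getenv("OPENAI_API_KEY", unset = ""))
}

if (!has_openai_key()) {
  message("OPENAI_API_KEY is not set; add it to .Renviron and restart R.")
}
```

Variables added through usethis::edit_r_environ() are only read at startup, so a restart of the R session is needed before the check passes.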
library(PatientGenerator)
patientGenerator <- patientChat$new(
model = "gpt-5.4",
echo = "none"
)
Provide detailed prompts, including specific concept sets, for optimal results.
patientGenerator$prompt(
"Population (person table):
- 10 adult patients
- 5 female
- 5 male
Observation Period:
- Start date between date of birth and 2025-12-31
Condition Occurrence:
- All patients must have Diabetes (condition_concept_id: 201826)
- Start date between 2015-01-01 and 2020-12-31
Drug Exposure:
- All patients must have Semaglutide (drug_concept_id: 19079450)
- Exposure within 30 days post-index date
Measurement:
- All patients must have Fasting glucose (measurement_concept_id: 3018251)
Procedure Occurrence:
- 50% of patients must have Amputation of toe (procedure_concept_id: 4159766)
Output Requirements:
- Populate only the tables specified in this prompt"
)
testthat
Save the generated dataset as a JSON file and use
TestGenerator::patientsCDM to instantiate a CDM
reference.
patientGenerator$save(name = "diabetes-patients")
cdm <- TestGenerator::patientsCDM(
testName = "diabetes-patients",
cdmVersion = "5.4"
)
cdm$person |>
dplyr::collect() |>
print()
#> cdm$person |> collect() |> head(5)
#> person_id gender_concept_id year_of_birth person_source_value
#> <int> <int> <int> <char>
#> 1: 1 8532 1965 SYN001
#> 2: 2 8532 1972 SYN002
#> 3: 3 8532 1958 SYN003
#> 4: 4 8532 1981 SYN004
#> 5: 5 8532 1949 SYN005
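Because LLM output can drift from the prompt, a quick check of the person table against the requested gender split may be worthwhile. The sketch below uses a hand-built data frame as a stand-in for the collected cdm$person table (it is not generated output), and `count_by_gender()` is a hypothetical helper. It relies on the standard OMOP gender concepts 8532 (female) and 8507 (male).

```r
# Stand-in for the collected person table; values are illustrative only.
person <- data.frame(
  person_id = 1:10,
  gender_concept_id = c(rep(8532L, 5), rep(8507L, 5))
)

# Hypothetical helper: counts rows per OMOP gender concept
# (8532 = female, 8507 = male).
count_by_gender <- function(person) {
  c(
    female = sum(person$gender_concept_id == 8532L),
    male = sum(person$gender_concept_id == 8507L)
  )
}

counts <- count_by_gender(person)

# Fails with an error if the split requested in the prompt was not honoured.
stopifnot(counts[["female"]] == 5, counts[["male"]] == 5)
```

The same assertions could be moved into a testthat test once the data frame is replaced by the real collected table.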
The LLM can be instructed to modify the current test set within the
same patientChat instance.
patientGenerator$prompt("Remove all male patients")
#> cdm$person |> collect() |> head(5)
#> person_id gender_concept_id year_of_birth person_source_value
#> <int> <int> <int> <char>
#> 1: 1 8532 1965 SYN001
#> 2: 2 8532 1972 SYN002
#> 3: 3 8532 1958 SYN003
#> 4: 4 8532 1981 SYN004
#> 5: 5 8532 1949 SYN005
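The prompts in this section share a regular shape: a table heading followed by bulleted requirements. As a rough sketch, such a prompt string can be assembled from a named list in base R. `build_prompt()` is a hypothetical helper for illustration, not a PatientGenerator function.

```r
# Hypothetical helper: turns a named list of requirement vectors into
# "Heading:\n- item\n- item" blocks joined by newlines.
build_prompt <- function(sections) {
  blocks <- vapply(names(sections), function(title) {
    bullets <- paste0("- ", sections[[title]], collapse = "\n")
    paste0(title, ":\n", bullets)
  }, character(1))
  paste(blocks, collapse = "\n")
}

prompt_text <- build_prompt(list(
  "Population (person table)" = c("10 adult patients", "5 female", "5 male"),
  "Condition Occurrence" = c(
    "All patients must have Diabetes (condition_concept_id: 201826)",
    "Start date between 2015-01-01 and 2020-12-31"
  )
))

cat(prompt_text)
```

The resulting string can then be passed to the prompt method in place of a hand-written one, which keeps concept IDs in one reviewable structure.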
patientDesigner()
Launch the interactive editor to review and refine datasets:
PatientGenerator::patientDesigner()
The interface supports:
patientDesigner integrates a concept search module
powered by hecateSearch(). This allows users to search for
and insert valid OMOP concept IDs directly into the CDM tables.
Configure Hecate globally via environment variables:
Sys.setenv(
HECATE_BASE_URL = "https://your-hecate-server/api",
HECATE_API_KEY = "your-api-key"
)
Or via package options:
options(PatientGenerator.hecate = list(
base_url = "https://your-hecate-server/api",
timeout_ms = 15000,
api_key = "your-api-key"
))
vignette("shiny-integration", package = "PatientGenerator")
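To see which Hecate settings are in effect when both mechanisms are in use, the two sources can be inspected side by side. The sketch below is an assumption-laden illustration: `resolve_hecate_config()` is a hypothetical helper, and the precedence it applies (package option first, environment variable as fallback) is a guess for demonstration purposes, not documented package behaviour.

```r
# Hypothetical helper (not part of PatientGenerator). Assumed precedence:
# a value in options(PatientGenerator.hecate = ...) wins over the
# corresponding environment variable.
resolve_hecate_config <- function() {
  opts <- getOption("PatientGenerator.hecate", default = list())

  # Returns the environment variable's value, or NULL if unset or empty.
  env_or_null <- function(name) {
    value <- Sys.getenv(name, unset = "")
    if (nzchar(value)) value else NULL
  }

  pick <- function(option_value, env_name) {
    if (!is.null(option_value)) option_value else env_or_null(env_name)
  }

  list(
    base_url = pick(opts$base_url, "HECATE_BASE_URL"),
    api_key = pick(opts$api_key, "HECATE_API_KEY"),
    timeout_ms = opts$timeout_ms  # only configurable via options in this sketch
  )
}

str(resolve_hecate_config())
```

Printing the resolved list this way makes it easier to spot a missing base URL or key before launching the designer.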