UKB phenotype data is stored in a proprietary .dataset
format on the RAP and cannot be read directly. The
extract_* functions provide R interfaces for discovering
approved fields and extracting phenotype data via the DNAnexus
dx extract_dataset and table-exporter
tools.
Two workflows are available:
| Function | Mode | Scale | Output |
|---|---|---|---|
| extract_batch() | Async job | Large / production (typically 50+ fields) | job ID → CSV on RAP cloud |
| extract_pheno() | Synchronous | Small (quick checks) | data.table in memory |
extract_batch() is the recommended
approach for any serious analysis. extract_pheno()
is provided for quick interactive inspection inside the RAP environment
only.
Ensure you are authenticated and have selected your project; see vignette("auth") for authentication setup.
Before extracting, use extract_ls() to explore what
fields are approved for your project:
# List all approved fields (cached after first call)
extract_ls()
# Search by keyword
extract_ls(pattern = "cancer")
extract_ls(pattern = "p31|p53|p21022")
# Force refresh after switching projects or datasets
extract_ls(refresh = TRUE)

The result is a data.frame with two columns:

| Column | Example |
|---|---|
| field_name | participant.p53_i0 |
| title | Date of attending assessment centre \| Instance 0 |
Fields reflect your project’s approved data only — not all UKB fields are present.
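The keyword search above appears to treat pattern as a regular expression, since "p31|p53|p21022" uses alternation. The following sketch, run on a mock data.frame rather than on real extract_ls() output, shows why a bare field ID can match more columns than intended. The matching behaviour of extract_ls() itself is an assumption here, and the last row of the mock data is invented purely for illustration.

```r
# Mock field list with the two columns described above.
# The last row is a made-up field used only to show over-matching.
fields <- data.frame(
  field_name = c("participant.p31", "participant.p53_i0",
                 "participant.p21022", "participant.p310_i0"),
  title = c("Sex", "Date of attending assessment centre | Instance 0",
            "Age at recruitment", "Made-up field | Instance 0"),
  stringsAsFactors = FALSE
)

# Plain alternation: "p31" is also a substring of "p310"
loose <- fields$field_name[grepl("p31|p53|p21022", fields$field_name)]

# Requiring "_" or end-of-string after the ID keeps only the intended fields
strict <- fields$field_name[grepl("\\.p(31|53|21022)(_|$)", fields$field_name)]

print(loose)
print(strict)
```

If the real search behaves like grepl(), anchoring the ID in this way may be a safer habit when field IDs share a leading digit sequence.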
extract_batch()

For large-scale or production extractions, submit an asynchronous table-exporter job on the RAP cloud:
# Submit extraction job
job_id <- extract_batch(c(31, 53, 21022, 22189))
# Custom output name
job_id <- extract_batch(
  field_id = c(31, 53, 21022, 22189),
  file = "ukb_demographics"
)
# High priority (faster queue, higher cost)
job_id <- extract_batch(
  field_id = c(31, 53, 21022, 22189),
  priority = "high"
)

The job runs asynchronously on the RAP cloud. The output CSV is saved to your RAP project, and the job can be monitored with the job_* functions:
job_status(job_id) # check progress
job_path(job_id) # get cloud file path once complete
job_result(job_id) # read result as data.table (inside RAP only)

extract_batch() automatically selects an appropriate instance based on the number of columns:
| Columns | Instance |
|---|---|
| ≤ 20 | mem1_ssd1_v2_x4 |
| ≤ 100 | mem1_ssd1_v2_x8 |
| ≤ 500 | mem1_ssd1_v2_x16 |
| > 500 | mem1_ssd1_v2_x36 |
You can override this with the instance_type argument if
needed.
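The tiering in the table above can be read as a simple threshold lookup. The sketch below restates it in base R; pick_instance() is a hypothetical helper written for illustration and is not the package's internal implementation.

```r
# Hypothetical helper mirroring the table above; not the package's own code
pick_instance <- function(n_cols) {
  stopifnot(is.numeric(n_cols), length(n_cols) == 1, n_cols >= 1)
  if (n_cols <= 20) {
    "mem1_ssd1_v2_x4"
  } else if (n_cols <= 100) {
    "mem1_ssd1_v2_x8"
  } else if (n_cols <= 500) {
    "mem1_ssd1_v2_x16"
  } else {
    "mem1_ssd1_v2_x36"
  }
}

print(pick_instance(4))    # falls in the smallest tier
print(pick_instance(250))  # falls in the third tier
```

Note that the thresholds refer to columns, not field IDs: a single field with several instances or array positions may expand into many columns.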
extract_pheno()

For small-scale interactive checks inside the RAP RStudio environment, use extract_pheno(). It is restricted to the RAP environment and returns data in memory only. For any analysis intended to be saved or reproduced, use extract_batch().
Note: extract_pheno() returns raw coded
values (e.g. 1/0 for Sex, numeric
codes for diseases). Use the decode_* series to convert
codes to human-readable labels.
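To make the idea of raw coded values concrete, here is a minimal base R sketch of a manual lookup. It does not use or reproduce the decode_* functions; the mapping 0 = Female, 1 = Male is assumed from UK Biobank's published coding for field 31, and the input vector is invented.

```r
# Raw coded values as they might appear in an extracted Sex column (invented)
sex_raw <- c(1, 0, 0, 1, NA)

# Assumed coding for field 31: 0 = Female, 1 = Male
sex_labels <- c("0" = "Female", "1" = "Male")

# Look up each code by name; missing codes stay NA
sex_decoded <- unname(sex_labels[as.character(sex_raw)])

print(sex_decoded)
```

A hand-written lookup like this only scales to a few fields, which is presumably why the package offers dedicated decoding helpers.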
Column naming differs between the two extraction methods:
extract_batch() — no prefix:
| Column | Meaning |
|---|---|
| eid | Participant ID |
| p31 | Field 31 (Sex) |
| p53_i0 | Field 53, Instance 0 |
| p20002_i0_a0 | Field 20002, Instance 0, Array 0 |
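The naming scheme in the table above (p<field>, optional _i<instance>, optional _a<array>) can be split back into its parts with a regular expression. This is a sketch in base R; parse_cols() is a hypothetical helper, not part of the package.

```r
cols <- c("eid", "p31", "p53_i0", "p20002_i0_a0")

# Group 1 = field ID, group 3 = instance, group 5 = array index
pattern <- "^p([0-9]+)(_i([0-9]+))?(_a([0-9]+))?$"

# Hypothetical helper: one row per column name, NA where a part is absent
parse_cols <- function(cols) {
  is_field <- grepl(pattern, cols)
  pick <- function(group) {
    out <- ifelse(is_field, sub(pattern, group, cols), NA_character_)
    as.integer(ifelse(out == "", NA_character_, out))
  }
  data.frame(
    column   = cols,
    field    = pick("\\1"),
    instance = pick("\\3"),
    array    = pick("\\5"),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_cols(cols)
print(parsed)
```

Columns that do not follow the scheme, such as eid, come back with NA in all three parsed fields.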
extract_pheno() —
participant. prefix:
| Column | Meaning |
|---|---|
| participant.eid | Participant ID |
| participant.p31 | Field 31 (Sex) |
| participant.p53_i0 | Field 53, Instance 0 |
| participant.p20002_i0_a0 | Field 20002, Instance 0, Array 0 |
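Because the two methods differ only in the participant. prefix, code written against one naming scheme can be reused with the other by stripping that prefix. A minimal sketch on a mock data.frame follows; the values are placeholders, not real participant data.

```r
# Mock of an in-memory result with prefixed names (placeholder values)
pheno <- data.frame(
  participant.eid    = c(1L, 2L),
  participant.p31    = c(0L, 1L),
  participant.p53_i0 = c("2008-01-01", "2009-06-15"),
  stringsAsFactors = FALSE
)

# Drop the leading "participant." so names match the batch-export style
names(pheno) <- sub("^participant\\.", "", names(pheno))

print(names(pheno))
```

The anchored pattern removes the prefix only at the start of a name, so any later occurrence of the same text inside a column name is left alone.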
?extract_ls, ?extract_pheno, ?extract_batch

vignette("auth"): authentication setup