Decoding UKB Column Names and Values


Overview

Raw UKB phenotype data contains encoded column names and values that need to be converted before analysis.

Source	Column names	Column values
`extract_pheno()`	`participant.p31`	Raw integer codes — needs `decode_values()`
`extract_batch()`	`p31`, `p53_i0`	Usually already decoded — `decode_values()` typically not needed

Both outputs need decode_names() to convert field ID column names to human-readable snake_case.

Call order matters: when using extract_pheno() output, always run decode_values() before decode_names(), because value decoding relies on the numeric field ID still being present in the column name.
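The reason for this ordering can be illustrated in base R. The sketch below is not how ukbflow is implemented. The helper `field_id_from_name()` is hypothetical and only shows that a field ID can be recovered from a raw column name but not from a decoded one.

```r
# Hypothetical helper (not part of ukbflow): pull the numeric field ID
# out of a raw UKB column name such as "participant.p31" or "p53_i0".
field_id_from_name <- function(x) {
  # Optional "participant." prefix, then "p<digits>", then an optional suffix.
  pattern <- "^(participant\\.)?p([0-9]+)(_.*)?$"
  id <- sub(pattern, "\\2", x)
  # Names that do not match the pattern carry no field ID.
  id[!grepl(pattern, x)] <- NA_character_
  as.integer(id)
}

raw_names     <- c("participant.p31", "p53_i0", "p20116_i0")
decoded_names <- c("sex", "date_of_attending_assessment_centre_i0")

field_id_from_name(raw_names)      # 31 53 20116
field_id_from_name(decoded_names)  # NA NA
```

Once a column is called `sex`, nothing in the name links it back to field 31, so a value lookup keyed on the field ID has nothing to work with.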

Recommended Workflow

library(ukbflow)

df <- extract_pheno(c(31, 54, 20116, 21022))
df <- decode_values(df)   # 0/1 → "Female"/"Male", etc.
df <- decode_names(df)    # participant.p31 → sex

Step 1: Decode Values

decode_values() converts raw integer codes to human-readable labels for categorical fields that have UKB encoding mappings. Continuous, date, text, and already-decoded fields are left unchanged.

df <- decode_values(df)
#> ✔ Decoded 3 categorical columns; 2 non-categorical columns unchanged.

It requires two metadata files from the UKB Showcase. Download them once with:

fetch_metadata(dest_dir = "data/metadata")

Then point decode_values() at the same directory. The default metadata_dir matches the default destination of fetch_metadata():

df <- decode_values(df, metadata_dir = "data/metadata")

What gets decoded

Column	Raw value	Decoded value
`p31`	`0` / `1`	`"Female"` / `"Male"`
`p54`	`11012`	`"Leeds"`
`p20116_i0`	`0` / `1` / `2`	`"Never"` / `"Previous"` / `"Current"`

Codes absent from the encoding table (including UKB missing codes -1, -3, -7) are returned as NA.
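The NA behaviour follows naturally from a table lookup. The following base R sketch is an illustration only, not the package's code: the `encoding` data frame is a toy stand-in for the UKB encoding metadata, using the smoking-status labels from the table above.

```r
# Toy encoding table (assumed layout, not the real metadata file format).
encoding <- data.frame(
  code  = c(0L, 1L, 2L),
  label = c("Never", "Previous", "Current"),
  stringsAsFactors = FALSE
)

raw <- c(0L, 2L, -3L, 1L, NA)

# match() returns NA for any code without a row in the table, so indexing
# the labels with its result yields NA for those entries.
decoded <- encoding$label[match(raw, encoding$code)]
decoded
# "Never" "Current" NA "Previous" NA
```

If the distinction between the individual missing codes matters for an analysis, one option is to tabulate the raw column, for example with `table(raw, useNA = "ifany")`, before decoding.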

Step 2: Decode Names

decode_names() renames columns from field ID format to snake_case labels using the approved UKB field dictionary available to your project.

df <- decode_names(df)
#> ✔ Renamed 5 columns.

Name conversion examples

Raw name	Decoded name
`participant.eid`	`eid`
`participant.p31`	`sex`
`participant.p21022`	`age_at_recruitment`
`participant.p53_i0`	`date_of_attending_assessment_centre_i0`
`p31`	`sex`
`p53_i0`	`date_of_attending_assessment_centre_i0`

Both extract_pheno() format (participant.p31) and extract_batch() format (p31) are handled automatically.
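As a rough picture of what such a renaming involves, here is a minimal base R sketch. It is not the package's implementation: `titles`, `to_snake()`, and `decode_name_sketch()` are hypothetical, and the three field titles are inferred from the decoded names in the table above.

```r
# Hypothetical mini field dictionary: field ID -> field title.
titles <- c(
  "31"    = "Sex",
  "53"    = "Date of attending assessment centre",
  "21022" = "Age at recruitment"
)

# Lower-case a title and collapse anything non-alphanumeric into "_".
to_snake <- function(x) {
  x <- gsub("[^a-z0-9]+", "_", tolower(x))
  gsub("^_+|_+$", "", x)  # trim leading/trailing underscores
}

decode_name_sketch <- function(x) {
  x <- sub("^participant\\.", "", x)   # unify the two input formats
  pattern <- "^p([0-9]+)((_.*)?)$"     # group 1: field ID, group 2: suffix
  id      <- sub(pattern, "\\1", x)
  suffix  <- sub(pattern, "\\2", x)
  known   <- grepl(pattern, x) & id %in% names(titles)
  # Only rename columns whose field ID is in the dictionary.
  x[known] <- paste0(to_snake(unname(titles[id[known]])), suffix[known])
  x
}

decode_name_sketch(c("participant.eid", "participant.p31", "p53_i0"))
# "eid" "sex" "date_of_attending_assessment_centre_i0"
```

Stripping the `participant.` prefix first is what lets one rule cover both input formats. Columns with no dictionary entry fall through unchanged.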

Long names

Some UKB field titles are verbose. Column names longer than max_nchar characters (default: 60) are flagged with a warning. Lower the threshold to flag more names:

df <- decode_names(df, max_nchar = 30)
#> ! 1 column name longer than 30 characters - consider renaming manually:
#> • date_of_attending_assessment_centre_i0
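The same check can be reproduced by hand with `nchar()`, which is handy for previewing which names a given threshold would flag. This is a base R sketch rather than the package's own logic, and the `col_names` vector is made up for illustration.

```r
col_names <- c("eid", "sex", "age_at_recruitment",
               "date_of_attending_assessment_centre_i0")

# Return the names strictly longer than the threshold.
names_over <- function(x, max_nchar) x[nchar(x) > max_nchar]

names_over(col_names, 60)  # character(0): nothing exceeds the default
names_over(col_names, 30)  # "date_of_attending_assessment_centre_i0"
```

The long name has 38 characters, which is why it passes the default threshold of 60 but is flagged at 30.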

Rename manually to something concise:

names(df)[names(df) == "date_of_attending_assessment_centre_i0"] <- "date_baseline"
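When several columns need shorter names, a named lookup vector may be easier to maintain than one assignment per column. The sketch below uses only base R. The toy `df` and the replacement names in `short_names` are assumptions chosen for illustration.

```r
# Toy data frame standing in for decode_names() output.
df <- data.frame(
  eid = 1:2,
  date_of_attending_assessment_centre_i0 = as.Date(c("2008-01-01", "2009-06-15")),
  age_at_recruitment = c(55L, 62L)
)

# Lookup vector: old name -> new name.
short_names <- c(
  date_of_attending_assessment_centre_i0 = "date_baseline",
  age_at_recruitment                     = "age"
)

# Rename only the columns that appear in the lookup; leave the rest alone.
hit <- names(df) %in% names(short_names)
names(df)[hit] <- unname(short_names[names(df)[hit]])

names(df)
# "eid" "date_baseline" "age"
```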

Getting Help

?decode_values, ?decode_names
vignette("extract") — extracting phenotype data
GitHub Issues
