
Decoding UKB Column Names and Values

Overview

Raw UKB phenotype data contains encoded column names and values that need to be converted before analysis.

| Source | Column names | Column values |
|---|---|---|
| extract_pheno() | participant.p31 | Raw integer codes; needs decode_values() |
| extract_batch() | p31, p53_i0 | Usually already decoded; decode_values() typically not needed |

Both outputs need decode_names() to convert field ID column names to human-readable snake_case.

Call order matters: when using extract_pheno() output, always run decode_values() before decode_names(), because value decoding relies on the numeric field ID still being present in the column name.
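The call-order rule can be illustrated with a hypothetical base R helper, field_id_from_name(), which is not part of the package and only assumes the two naming formats shown above. It recovers the numeric field ID from either raw format, but has nothing to parse once a column has been renamed to a label such as sex:

```r
# Hypothetical helper (not the package's code): extract the numeric field ID
# from a raw UKB column name in extract_pheno() or extract_batch() format.
field_id_from_name <- function(col) {
  # optional "participant." prefix, "p" + digits, optional instance/array suffixes
  pattern <- "^(participant\\.)?p([0-9]+)(_i[0-9]+)?(_a[0-9]+)?$"
  ifelse(grepl(pattern, col), sub(pattern, "\\2", col), NA_character_)
}

# Before renaming: the field ID is recoverable, so an encoding can be looked up
field_id_from_name(c("participant.p31", "p20116_i0"))
# -> "31" "20116"

# After renaming: no field ID remains, so value decoding cannot find a mapping
field_id_from_name(c("sex", "smoking_status_i0"))
# -> NA NA
```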


Step 1: Decode Values

decode_values() converts raw integer codes to human-readable labels for categorical fields that have UKB encoding mappings. Continuous, date, text, and already-decoded fields are left unchanged.
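A minimal sketch of this behaviour in base R, assuming the encodings have already been loaded as named vectors keyed by field ID (the list below is hypothetical; the real mappings come from the Showcase metadata files, whose layout is not assumed here). Columns without a mapping are skipped, and so are non-numeric columns, which stand in for "already decoded" ones:

```r
# Hypothetical encodings keyed by field ID (code -> label)
encodings <- list(
  "31"    = c("0" = "Female", "1" = "Male"),
  "20116" = c("0" = "Never", "1" = "Previous", "2" = "Current")
)

decode_values_sketch <- function(df, encodings) {
  for (col in names(df)) {
    # Field ID from "participant.p31" / "p20116_i0"; other names pass through
    id <- sub("^(participant\\.)?p([0-9]+).*$", "\\2", col)
    map <- encodings[[id]]
    # No mapping (continuous, date, text) or already decoded (non-numeric): skip
    if (is.null(map) || !is.numeric(df[[col]])) next
    df[[col]] <- unname(map[as.character(df[[col]])])
  }
  df
}

df <- data.frame(eid = 1:3, p31 = c(0, 1, 1), p20116_i0 = c(2, 0, 1),
                 p21022 = c(55, 61, 48))
decode_values_sketch(df, encodings)
```

Here p31 and p20116_i0 become labels, while eid and p21022 (no mapping) are returned unchanged.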

df <- decode_values(df)
#> ✔ Decoded 3 categorical columns; 2 non-categorical columns unchanged.

It requires two metadata files from the UKB Showcase. Download them once with:

fetch_metadata(dest_dir = "data/metadata")

Then point decode_values() at the same directory (its default matches that of fetch_metadata()):

df <- decode_values(df, metadata_dir = "data/metadata")

What gets decoded

| Column | Raw value | Decoded value |
|---|---|---|
| p31 | 0 / 1 | "Female" / "Male" |
| p54 | 11012 | "Leeds" |
| p20116_i0 | 0 / 1 / 2 | "Never" / "Previous" / "Current" |

Codes absent from the encoding table (including UKB missing codes -1, -3, -7) are returned as NA.
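The NA rule can be sketched with a single lookup vector. This is a hypothetical helper, decode_with_missing(), not the package's code; it assumes the missing codes are dropped before the lookup, so they become NA even if a coding happens to give them a label, and any other code with no entry in the table also falls through to NA:

```r
# UKB special missing-value codes named in the text above
ukb_missing <- c(-1, -3, -7)

decode_with_missing <- function(x, map) {
  x[x %in% ukb_missing] <- NA          # missing codes -> NA before lookup
  unname(map[as.character(x)])         # unlabelled codes (e.g. 9) -> NA as well
}

smoking <- c("0" = "Never", "1" = "Previous", "2" = "Current")
decode_with_missing(c(0, 2, -3, 9, NA), smoking)
# -> "Never" "Current" NA NA NA
```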


Step 2: Decode Names

decode_names() renames columns from field ID format to snake_case labels using the approved UKB field dictionary available to your project.
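The exact conversion rules decode_names() applies are not spelled out here. A plausible sketch, assuming a snake_case label is simply the lower-cased field title with every run of non-alphanumeric characters collapsed to one underscore (to_snake_case() is a hypothetical name):

```r
to_snake_case <- function(title) {
  x <- tolower(title)
  x <- gsub("[^a-z0-9]+", "_", x)   # spaces, brackets, punctuation -> "_"
  gsub("^_+|_+$", "", x)            # trim leading/trailing underscores
}

to_snake_case("Date of attending assessment centre")
# -> "date_of_attending_assessment_centre"
to_snake_case("Age at recruitment")
# -> "age_at_recruitment"
```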

df <- decode_names(df)
#> ✔ Renamed 5 columns.

Name conversion examples

| Raw name | Decoded name |
|---|---|
| participant.eid | eid |
| participant.p31 | sex |
| participant.p21022 | age_at_recruitment |
| participant.p53_i0 | date_of_attending_assessment_centre_i0 |
| p31 | sex |
| p53_i0 | date_of_attending_assessment_centre_i0 |

Both extract_pheno() format (participant.p31) and extract_batch() format (p31) are handled automatically.
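One way to handle both formats is sketched below with a hypothetical helper, decode_name_sketch(), and a small hypothetical dictionary of already snake_cased titles: strip the participant. prefix, split off the field ID and any instance/array suffix, look up the title, and re-attach the suffix. Names that do not match, or fields missing from the dictionary, are kept as-is:

```r
# Hypothetical dictionary: field ID -> snake_case title
field_dict <- c("31" = "sex", "21022" = "age_at_recruitment",
                "53" = "date_of_attending_assessment_centre")

decode_name_sketch <- function(col, dict) {
  base <- sub("^participant\\.", "", col)         # strip the extract_pheno() prefix
  pattern <- "^p([0-9]+)((_i[0-9]+)?(_a[0-9]+)?)$"
  if (!grepl(pattern, base)) return(base)         # e.g. "eid": nothing to look up
  id <- sub(pattern, "\\1", base)
  suffix <- sub(pattern, "\\2", base)             # "", "_i0", "_i0_a1", ...
  if (is.na(dict[id])) return(base)               # unknown field: keep raw name
  paste0(dict[[id]], suffix)
}

raw <- c("participant.eid", "participant.p31", "participant.p21022", "p53_i0")
vapply(raw, decode_name_sketch, character(1), dict = field_dict, USE.NAMES = FALSE)
# -> "eid" "sex" "age_at_recruitment" "date_of_attending_assessment_centre_i0"
```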

Long names

Some UKB field titles are verbose. decode_names() warns about any name longer than max_nchar characters (default: 60). Lower the threshold to flag names more aggressively:

df <- decode_names(df, max_nchar = 30)
#> ! 1 column name longer than 30 characters - consider renaming manually:
#> • date_of_attending_assessment_centre_i0
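The check itself amounts to comparing nchar() against the threshold. A base R sketch with a hypothetical flag_long_names() (plain warning() stands in for the package's own message formatting):

```r
flag_long_names <- function(nms, max_nchar = 60) {
  long <- nms[nchar(nms) > max_nchar]
  if (length(long) > 0) {
    warning(length(long), " column name(s) longer than ", max_nchar,
            " characters: ", paste(long, collapse = ", "), call. = FALSE)
  }
  invisible(long)   # return the offending names for follow-up renaming
}

nms <- c("eid", "sex", "age_at_recruitment", "date_of_attending_assessment_centre_i0")
flag_long_names(nms, max_nchar = 30)   # warns about the 38-character date column
flag_long_names(nms)                   # default 60: no warning
```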

Rename manually to something concise:

names(df)[names(df) == "date_of_attending_assessment_centre_i0"] <- "date_baseline"
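When several columns need shorter names, the same idea extends to a lookup vector of old = new pairs. A sketch with a hypothetical rename_cols() helper; the short names in rename_map are illustrative choices, not package defaults:

```r
# Hypothetical old -> new mapping
rename_map <- c(
  date_of_attending_assessment_centre_i0 = "date_baseline",
  age_at_recruitment = "age"
)

rename_cols <- function(df, map) {
  hit <- names(df) %in% names(map)                 # only rename columns present
  names(df)[hit] <- unname(map[names(df)[hit]])
  df
}

df <- data.frame(eid = 1:2, age_at_recruitment = c(55, 61),
                 date_of_attending_assessment_centre_i0 = as.Date(c("2008-01-01", "2009-06-15")))
names(rename_cols(df, rename_map))
# -> "eid" "age" "date_baseline"
```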

Getting Help
