Get Started with ukbflow

Welcome to ukbflow

ukbflow is an R package for UK Biobank analysis on the Research Analysis Platform (RAP). It covers the full midstream-to-downstream pipeline — from phenotype derivation and association analysis to publication-ready figures and genetic risk scoring — designed for RAP-native UKB workflows, with local simulated data for development and testing.

Installation

pak::pkg_install("evanbio/ukbflow")

A Quick Taste

Load data

library(ukbflow)

df <- ops_toy()   # synthetic UKB-like cohort, no RAP connection needed

# On RAP, replace with:
# auth_login()
# auth_select_project("project-XXXXXXXXXXXX")
# df <- extract_pheno(c(31, 21022, 53, 20116)) |>
#   decode_values() |>
#   decode_names()

Derive a disease phenotype

df <- df |>
  derive_missing() |>                                               # recode "Prefer not to answer" → NA
  derive_selfreport(name = "t2dm", regex = "diabetes",           # T2DM self-report
                    field = "noncancer") |>
  derive_icd10(name = "t2dm", icd10 = "E11", source = "hes") |> # T2DM from HES
  derive_case(name = "t2dm") |>                                  # → t2dm_status, t2dm_date
  derive_followup(name         = "t2dm",
                  event_col    = "t2dm_date",
                  baseline_col = "p53_i0",                          # assessment centre date
                  censor_date  = as.Date("2022-06-01"))
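To make the last two steps concrete, here is a minimal base-R sketch of the logic they suggest: a participant counts as a case if any source has a date, the event date is the earliest date across sources, and follow-up runs from baseline to the event or the censoring date, whichever comes first. This is not ukbflow's implementation. The data, the column names `sr_date` and `hes_date`, and the 365.25-day year are assumptions made for illustration.

```r
# Hypothetical toy data: one row per participant.
toy <- data.frame(
  baseline = as.Date(c("2008-03-01", "2009-07-15", "2010-01-10")),
  sr_date  = as.Date(c(NA,           "2012-05-01", NA)),  # self-report date (assumed name)
  hes_date = as.Date(c("2015-09-30", "2011-02-20", NA))   # hospital record date (assumed name)
)
censor_date <- as.Date("2022-06-01")

# Event date: earliest non-missing date across sources (NA if no source has one).
toy$event_date <- pmin(toy$sr_date, toy$hes_date, na.rm = TRUE)

# Case status: 1 if any source supplied a date, 0 otherwise.
toy$status <- as.integer(!is.na(toy$event_date))

# Follow-up ends at the event or at censoring, whichever comes first.
end_date <- pmin(toy$event_date, censor_date, na.rm = TRUE)
toy$followup_years <-
  as.numeric(difftime(end_date, toy$baseline, units = "days")) / 365.25

print(toy)
```

In this sketch an event dated before baseline would produce negative follow-up; how such prevalent cases are treated is a study-design decision that the sketch does not address.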

Run an association model

res <- assoc_coxph(
  data         = df,
  outcome_col  = "t2dm_status",
  time_col     = "t2dm_followup_years",
  exposure_col = "p21001_i0",   # BMI (continuous)
  covariates   = c("p21022",    # age_at_recruitment
                   "p31")       # sex
)
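For orientation, the following sketch fits a comparable Cox proportional hazards model directly with the survival package on simulated data. That `assoc_coxph()` works along these lines internally is an assumption, not something this page states. The data, column names, and effect size are invented for illustration.

```r
library(survival)

set.seed(1)
n <- 500
sim <- data.frame(
  bmi = rnorm(n, mean = 27, sd = 4),
  age = runif(n, min = 40, max = 70),
  sex = factor(sample(c("Female", "Male"), n, replace = TRUE))
)

# Simulate exponential event times whose hazard rises with BMI (made-up effect).
rate       <- 0.02 * exp(0.05 * (sim$bmi - 27))
event_time <- rexp(n, rate = rate)
sim$time   <- pmin(event_time, 12)           # administrative censoring at 12 years
sim$status <- as.integer(event_time <= 12)   # 1 = event observed, 0 = censored

# Exposure (BMI) adjusted for age and sex.
fit <- coxph(Surv(time, status) ~ bmi + age + sex, data = sim)

# Hazard ratios with 95% confidence intervals.
hr_table <- cbind(HR = exp(coef(fit)), exp(confint(fit)))
print(round(hr_table, 3))
```

Exponentiating the coefficients and their confidence limits gives the hazard ratio scale that a forest plot typically displays.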

Plot the results

# Forest plot — see vignette("plot") for full usage
res_df <- as.data.frame(res)
plot_forest(
  data      = res_df,
  est       = res_df$HR,
  lower     = res_df$CI_lower,
  upper     = res_df$CI_upper,
  ci_column = 7L   # res_df has 6 cols before HR; CI graphic goes here
)

# Table 1
plot_tableone(
  data   = as.data.frame(df),
  vars   = c("p21022",     # age_at_recruitment
             "p31",        # sex
             "p21001_i0"), # bmi
  strata = "t2dm_status"
)
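The hard-coded `ci_column = 7L` in the forest plot call depends on where `HR` sits in `res_df`. The base-R sketch below shows one way to look that position up by name instead of counting columns by hand. The data frame is hypothetical: apart from `HR`, `CI_lower`, and `CI_upper`, its column names are invented, and the real layout of the `assoc_coxph()` result may differ.

```r
# Hypothetical results table with six columns before HR (names are invented).
res_df_demo <- data.frame(
  exposure     = "p21001_i0",
  term         = "p21001_i0",
  model        = "adjusted",
  n            = 1000L,
  n_events     = 120L,
  person_years = 11000,
  HR           = 1.08,
  CI_lower     = 1.03,
  CI_upper     = 1.13,
  p_value      = 0.001
)

# Position of the HR column; match() returns NA if the name is absent.
ci_pos <- match("HR", names(res_df_demo))
stopifnot(!is.na(ci_pos))
print(ci_pos)
```

A value obtained this way could be passed as `ci_column`, so the call keeps working if columns are added or reordered upstream.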

Full Function Overview

| Module   | Key functions                                   | Vignette                    |
|----------|-------------------------------------------------|-----------------------------|
| Auth     | auth_login(), auth_select_project()             | vignette("auth")            |
| Fetch    | fetch_ls(), fetch_file(), fetch_tree()          | vignette("fetch")           |
| Extract  | extract_pheno(), extract_batch(), extract_ls()  | vignette("extract")         |
| Job      | job_wait(), job_status(), job_result()          | vignette("job")             |
| Decode   | decode_values(), decode_names()                 | vignette("decode")          |
| Derive   | derive_missing(), derive_icd10(), derive_case() | vignette("derive")          |
| Survival | derive_timing(), derive_age(), derive_followup() | vignette("derive-survival") |
| Assoc    | assoc_coxph(), assoc_logistic(), assoc_subgroup() | vignette("assoc")         |
| Plot     | plot_forest(), plot_tableone()                  | vignette("plot")            |
| GRS      | grs_check(), grs_score(), grs_validate()        | vignette("grs")             |
| Ops      | ops_setup(), ops_toy(), ops_snapshot()          | vignette("ops")             |

End-to-End Case Study

For a complete worked example using a simulated UK Biobank cohort — covering data loading, phenotype derivation, cohort assembly, Cox regression, and publication-ready visualisation — see:

vignette("smoking_lung_cancer"), "Smoking and Lung Cancer Risk: A Complete Analysis Workflow"

Additional Resources

“All models are wrong, but some are publishable.”

— after George Box
