Get Started with ukbflow

Welcome to `ukbflow`

ukbflow is an R package for UK Biobank (UKB) analysis on the Research Analysis Platform (RAP). It covers the midstream-to-downstream pipeline, from phenotype derivation and association analysis to publication-ready figures and genetic risk scoring. It is designed for RAP-native UKB workflows and includes local simulated data for development and testing.

Installation

# From CRAN (recommended)
install.packages("ukbflow")

# Latest development version from GitHub
pak::pkg_install("evanbio/ukbflow")

A Quick Taste

Load data

library(ukbflow)

df <- ops_toy()   # synthetic UKB-like cohort, no RAP connection needed

# On RAP, replace with:
# auth_login()
# auth_select_project("project-XXXXXXXXXXXX")
# df <- extract_pheno(c(31, 21022, 53, 20116)) |>
#   decode_values() |>
#   decode_names()

Derive a disease phenotype

df <- df |>
  derive_missing() |>                                               # recode "Prefer not to answer" → NA
  derive_selfreport(name = "t2dm", regex = "diabetes",           # T2DM self-report
                    field = "noncancer") |>
  derive_icd10(name = "t2dm", icd10 = "E11", source = "hes") |> # T2DM from HES
  derive_case(name = "t2dm") |>                                  # → t2dm_status, t2dm_date
  derive_followup(name         = "t2dm",
                  event_col    = "t2dm_date",
                  baseline_col = "p53_i0",                          # assessment centre date
                  censor_date  = as.Date("2022-06-01"))
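
For intuition, the follow-up step can be sketched in base R. This is an illustration only and not ukbflow's implementation: the helper `followup_sketch()`, its output column names, and the 365.25-day year are all assumptions, and `derive_followup()` may treat details such as death or loss to follow-up differently.

```r
# Hypothetical helper (not part of ukbflow): follow-up ends at the event date,
# or at the administrative censoring date if there is no event before it.
followup_sketch <- function(baseline, event, censor_date) {
  end <- event
  censored <- is.na(event) | event > censor_date
  end[censored] <- censor_date
  data.frame(
    status         = as.integer(!censored),               # 1 = event observed
    followup_years = as.numeric(end - baseline) / 365.25  # assumed year length
  )
}

toy <- data.frame(
  baseline = as.Date(c("2008-03-01", "2009-07-15", "2010-01-10")),
  event    = as.Date(c("2015-03-01", NA, "2023-05-20"))
)
fu <- followup_sketch(toy$baseline, toy$event, as.Date("2022-06-01"))
fu
```

The third toy participant has an event after the censoring date, so the sketch counts them as censored and stops the clock at `censor_date`.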

Run an association model

res <- assoc_coxph(
  data         = df,
  outcome_col  = "t2dm_status",
  time_col     = "t2dm_followup_years",
  exposure_col = "p21001_i0",   # BMI (continuous)
  covariates   = c("p21022",    # age_at_recruitment
                   "p31")       # sex
)
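
To show what a Cox model of this shape estimates, here is a minimal sketch that uses the `survival` package on simulated data. It does not call ukbflow. The column names mirror the call above, the simulated effect size and censoring time are arbitrary assumptions, and nothing here should be read as a description of how `assoc_coxph()` works internally.

```r
library(survival)

set.seed(1)
n <- 500
sim <- data.frame(
  p21001_i0 = rnorm(n, mean = 27, sd = 4),                        # BMI
  p21022    = round(runif(n, 40, 70)),                            # age at recruitment
  p31       = factor(sample(c("Female", "Male"), n, replace = TRUE))
)

# Simulate event times whose hazard rises with BMI (assumed log-HR of 0.08 per unit)
rate <- 0.01 * exp(0.08 * (sim$p21001_i0 - 27))
event_time <- rexp(n, rate = rate)
sim$t2dm_followup_years <- pmin(event_time, 12)        # censor everyone at 12 years
sim$t2dm_status         <- as.integer(event_time <= 12)

fit <- coxph(
  Surv(t2dm_followup_years, t2dm_status) ~ p21001_i0 + p21022 + p31,
  data = sim
)

# Hazard ratios with 95% confidence intervals, one row per model term
hr <- cbind(HR = exp(coef(fit)), exp(confint(fit)))
hr
```

The exposure row (`p21001_i0`) is the quantity of interest; the age and sex rows are adjustment covariates.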

Plot the results

# Forest plot — see vignette("plot") for full usage
res_df <- as.data.frame(res)
plot_forest(
  data      = res_df,
  est       = res_df$HR,
  lower     = res_df$CI_lower,
  upper     = res_df$CI_upper,
  ci_column = 7L   # res_df has 6 cols before HR; CI graphic goes here
)

# Table 1
plot_tableone(
  data   = as.data.frame(df),
  vars   = c("p21022",     # age_at_recruitment
             "p31",        # sex
             "p21001_i0"), # bmi
  strata = "t2dm_status"
)
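
As a rough picture of what a stratified baseline table summarises, the base R sketch below computes mean (SD) for the continuous variables and column percentages for sex within each outcome stratum. The simulated cohort and the helper `mean_sd()` are assumptions made for illustration; the actual layout and statistics produced by `plot_tableone()` are documented in `vignette("plot")`.

```r
set.seed(42)
cohort <- data.frame(
  p21022      = round(runif(200, 40, 70)),                        # age at recruitment
  p31         = factor(sample(c("Female", "Male"), 200, replace = TRUE)),
  p21001_i0   = rnorm(200, mean = 27, sd = 4),                    # BMI
  t2dm_status = rbinom(200, size = 1, prob = 0.15)
)

# Hypothetical formatter: "mean (SD)" to one decimal place
mean_sd <- function(x) sprintf("%.1f (%.1f)", mean(x), sd(x))

# Continuous variables: one row per stratum, one column per variable
cont <- sapply(
  cohort[c("p21022", "p21001_i0")],
  function(x) tapply(x, cohort$t2dm_status, mean_sd)
)

# Categorical variable: percentage of each sex within each stratum
sex_pct <- round(100 * prop.table(table(cohort$p31, cohort$t2dm_status), margin = 2), 1)

cont
sex_pct
```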

Full Function Overview

Module	Key functions	Vignette
Auth	`auth_login()`, `auth_select_project()`	`vignette("auth")`
Fetch	`fetch_ls()`, `fetch_file()`, `fetch_tree()`	`vignette("fetch")`
Extract	`extract_pheno()`, `extract_batch()`, `extract_ls()`	`vignette("extract")`
Job	`job_wait()`, `job_status()`, `job_result()`	`vignette("job")`
Decode	`decode_values()`, `decode_names()`	`vignette("decode")`
Derive	`derive_missing()`, `derive_icd10()`, `derive_case()`	`vignette("derive")`
Survival	`derive_timing()`, `derive_age()`, `derive_followup()`	`vignette("derive-survival")`
Assoc	`assoc_coxph()`, `assoc_logistic()`, `assoc_subgroup()`	`vignette("assoc")`
Plot	`plot_forest()`, `plot_tableone()`	`vignette("plot")`
GRS	`grs_check()`, `grs_score()`, `grs_validate()`	`vignette("grs")`
Ops	`ops_setup()`, `ops_toy()`, `ops_snapshot()`	`vignette("ops")`

End-to-End Case Study

For a complete worked example using a simulated UK Biobank cohort — covering data loading, phenotype derivation, cohort assembly, Cox regression, and publication-ready visualisation — see:

vignette("smoking_lung_cancer") — Smoking and Lung Cancer Risk: A Complete Analysis Workflow

Additional Resources

Documentation site
GitHub
View all functions: ?ukbflow or help(package = "ukbflow")

“All models are wrong, but some are publishable.”

— after George Box
