The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
The grs_* functions provide an end-to-end pipeline for
computing and validating polygenic / genetic risk scores within the UK
Biobank Research Analysis Platform (RAP). Because individual-level
genotype data cannot be downloaded locally, all computationally
intensive steps are executed as cloud jobs via the DNAnexus Swiss Army
Knife app.
| Function | Where it runs | Purpose |
|---|---|---|
grs_check() |
Local or RAP | Validate and export a SNP weights file |
grs_bgen2pgen() |
Submits RAP jobs | Convert UKB imputed BGEN files to PGEN |
grs_score() |
Submits RAP jobs | Score genotypes across chromosomes with plink2 |
grs_standardize() |
Local or RAP | Z-score standardise GRS columns |
grs_zscore() |
Local or RAP | Alias for grs_standardize() |
grs_validate() |
Local or RAP | Assess predictive performance (OR/HR, AUC, C-index) |
Typical pipeline:
grs_check() -> grs_bgen2pgen() -> grs_score() -> grs_standardize() -> grs_validate()
Prerequisite: you must be authenticated to UKB RAP and have a project selected. See
vignette("auth").
grs_check()Before uploading to RAP, validate your SNP weights file. The function reads the file, checks required columns, flags problems, and writes a plink2-compatible space-delimited output.
Required columns:
| Column | Description |
|---|---|
snp |
SNP identifier (expected rs + digits format) |
effect_allele |
Effect allele: one of A / T / C / G |
beta |
Effect size (log-OR or beta); must be numeric |
library(ukbflow)
# Local: weights file on your machine
w <- grs_check("weights.csv", dest = "weights_clean.txt")
#> Read 'weights.csv': 312 rows, 5 columns.
#> ✔ No NA values.
#> ✔ No duplicate SNPs.
#> ✔ All SNP IDs match rs[0-9]+ format.
#> ✔ All effect alleles are A/T/C/G.
#> Beta summary:
#> Range : -0.412 to 0.589
#> Mean |beta|: 0.183
#> Positive : 187 (59.9%)
#> Negative : 125 (40.1%)
#> Zero : 0
#> ✔ Weights file passed checks: 312 SNPs ready for UKB RAP.
#> ✔ Saved: 'weights_clean.txt'# On RAP (RStudio) -- use /mnt/project/ paths directly
w <- grs_check(
file = "/mnt/project/weights/weights.csv",
dest = "/mnt/project/weights/weights_clean.txt"
)The validated weights are also returned invisibly for inspection.
grs_bgen2pgen()UKB imputed genotype data is stored in BGEN format on RAP.
grs_bgen2pgen() submits one Swiss Army Knife job per
chromosome to convert BGEN files to the faster PGEN format with a MAF
filter applied via plink2.
This function is called from your local machine or RAP RStudio – the heavy lifting runs entirely in the cloud.
# Test on the smallest chromosome first
ids <- grs_bgen2pgen(chr = 22, dest = "/pgen/", priority = "high")
#> Uploading 'grs_bgen2pgen_std.R' to RAP ...
#> ✔ Uploaded: '/grs_bgen2pgen_std.R'
#> Submitting 1 job(s) -- mem2_ssd1_v2_x4 / priority: high
#> ✔ chr22 -> 'job-Gxxxxxxxxxxxxxxxxxxxxxxxx'
#> ✔ 1/1 job(s) submitted. Monitor with job_ls().# Full run: small chromosomes on standard, large on upgraded instance
ids_small <- grs_bgen2pgen(chr = 17:22, dest = "/pgen/")
ids_large <- grs_bgen2pgen(chr = 1:16, dest = "/pgen/", instance = "large")
# Monitor progress (job_wait() takes one job ID at a time)
job_ls()
for (id in c(ids_small, ids_large)) job_wait(id)Instance types:
| Preset | DNAnexus instance | Cores | RAM | SSD | Recommended for |
|---|---|---|---|---|---|
"standard" |
mem2_ssd1_v2_x4 |
4 | 12 GB | 200 GB | chr 15–22 |
"large" |
mem2_ssd2_v2_x8 |
8 | 28 GB | 640 GB | chr 1–16 |
Key arguments:
| Argument | Default | Description |
|---|---|---|
chr |
1:22 |
Chromosomes to process |
dest |
— | RAP output path for PGEN files (required) |
maf |
0.01 |
MAF filter passed to plink2 --maf |
instance |
"standard" |
Instance type preset |
priority |
"low" |
Job priority ("low" or "high") |
Storage warning: chromosomes 1–16 may exceed the 200 GB SSD on
"standard"instances. Useinstance = "large"for these.
grs_score()Once PGEN files are ready, grs_score() uploads your
weights file(s) and submits one plink2 scoring job per GRS. Each job
scores all 22 chromosomes and saves a single CSV to RAP.
ids <- grs_score(
file = c(
grs_a = "weights/grs_a_weights.txt",
grs_b = "weights/grs_b_weights.txt"
),
pgen_dir = "/mnt/project/pgen",
dest = "/grs/",
priority = "high"
)
#> ── Uploading 2 weight file(s) to RAP ────────────────────────────────────────
#> Uploading 'grs_a_weights.txt' ...
#> ✔ Uploaded: '/grs_a_weights.txt'
#> Uploading 'grs_b_weights.txt' ...
#> ✔ Uploaded: '/grs_b_weights.txt'
#> Submitting 2 job(s) -- mem2_ssd1_v2_x4 / priority: high
#> ✔ grs_a -> 'job-Gxxxxxxxxxxxxxxxxxxxxxxxx'
#> ✔ grs_b -> 'job-Gyyyyyyyyyyyyyyyyyyyyyyyy'
#> ✔ 2/2 job(s) submitted. Monitor with job_ls().
job_wait(ids)When running from RAP RStudio with weights already at the project root, the upload step is skipped automatically:
# On RAP: weights already at /mnt/project/grs_a_weights.txt
ids <- grs_score(
file = c(grs_a = "/mnt/project/grs_a_weights.txt"),
pgen_dir = "/mnt/project/pgen",
dest = "/grs/"
)
#> ℹ grs_a_weights.txt already at RAP root, skipping upload.Important: the
mafargument must match the value used ingrs_bgen2pgen()so that the correct PGEN files are located.
Output per job: /grs/<GRS_name>_scores.csv with
columns IID and the GRS score named
GRS_<name>.
grs_standardize()After downloading the score CSVs from RAP (via
fetch_file()) and merging them into your analysis dataset,
Z-score standardise each GRS column so that effect estimates are
interpretable as per-SD associations.
# Auto-detect all columns containing "grs" (case-insensitive)
cohort <- grs_standardize(cohort)
#> Auto-detected 2 GRS column(s): 'GRS_a', 'GRS_b'
#> ✔ GRS_a -> GRS_a_z [mean=0.0000, sd=1.0000]
#> ✔ GRS_b -> GRS_b_z [mean=0.0000, sd=1.0000]grs_zscore() is an alias and produces an identical
result.
The original columns are preserved; _z columns are
inserted immediately after their source column.
grs_validate()grs_validate() runs four complementary validation
analyses for each GRS:
| Analysis | What it measures |
|---|---|
| Per SD | OR (logistic) or HR (Cox) per 1-SD increase in GRS |
| High vs Low | OR / HR comparing top 20% vs bottom 20% |
| Trend test | P-trend across Q1–Q4 quartiles |
| Discrimination | AUC (logistic) or C-index (Cox) with 95% CI |
res <- grs_validate(
data = cohort,
grs_cols = c("GRS_a_z", "GRS_b_z"),
outcome_col = "outcome"
)
#> ── Creating GRS groups ───────────────────────────────────────────────────────
#> ── Effect per SD (OR) ───────────────────────────────────────────────────────
#> ── High vs Low ──────────────────────────────────────────────────────────────
#> ── Trend test ───────────────────────────────────────────────────────────────
#> ── AUC ──────────────────────────────────────────────────────────────────────
#> ✔ Validation complete.
res$per_sd
res$discriminationres <- grs_validate(
data = cohort,
grs_cols = c("GRS_a_z", "GRS_b_z"),
outcome_col = "outcome",
time_col = "followup_years",
covariates = c("age", "sex", paste0("pc", 1:10))
)
res$per_sd # HR per SD x model
res$high_vs_low # HR: top 20% vs bottom 20%
res$trend # p-trend across Q1–Q4
res$discrimination # C-index with 95% CIReturn value: a named list with four
data.table elements.
| Element | Columns (logistic) | Columns (Cox) |
|---|---|---|
per_sd |
exposure, term, model,
OR, CI_lower, CI_upper,
p_value |
same with HR |
high_vs_low |
same as per_sd |
same with HR |
trend |
exposure, model, p_trend |
same |
discrimination |
GRS, AUC, CI_lower,
CI_upper |
GRS, C_index, CI_lower,
CI_upper |
AUC calculation requires the
pROCpackage. Install withinstall.packages("pROC")if needed.
library(ukbflow)
# 1. Validate weights (local)
grs_check("weights.csv", dest = "weights_clean.txt")
# 2. Convert BGEN -> PGEN on RAP (submit jobs)
ids_std <- grs_bgen2pgen(chr = 17:22, dest = "/pgen/", maf = 0.01)
ids_lrg <- grs_bgen2pgen(chr = 1:16, dest = "/pgen/", maf = 0.01, instance = "large")
for (id in c(ids_std, ids_lrg)) job_wait(id)
# 3. Score GRS on RAP (submit jobs)
score_ids <- grs_score(
file = c(grs_a = "weights_clean.txt"),
pgen_dir = "/mnt/project/pgen",
maf = 0.01, # must match grs_bgen2pgen()
dest = "/grs/"
)
job_wait(score_ids)
# 4. Download score CSV from RAP
fetch_file("/grs/GRS_a_scores.csv", dest = "GRS_a_scores.csv")
# 5. Merge into analysis dataset and standardise
# cohort: your analysis data.table with IID and phenotype columns
scores <- data.table::fread("GRS_a_scores.csv") # columns: IID, GRS_a
cohort <- scores[cohort, on = "IID"] # right-join: keep all cohort rows
cohort <- grs_standardize(cohort, grs_cols = "GRS_a")
# 6. Validate
res <- grs_validate(
data = cohort,
grs_cols = "GRS_a_z",
outcome_col = "outcome",
time_col = "followup_years",
covariates = c("age", "sex", paste0("pc", 1:10))
)
res$per_sd
res$discrimination?grs_check, ?grs_bgen2pgen,
?grs_score?grs_standardize, ?grs_validatevignette("auth") – RAP authentication and project
selectionvignette("job") – monitoring and managing RAP jobsvignette("assoc") – association analysis functions used
inside grs_validate()These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.