Start by loading gwasrapidd, dplyr and tidyr.
library(dplyr)
library(tidyr)
library(gwasrapidd)
Let’s say you want to retrieve all variants associated with the phenotype Body Mass Index (BMI). Moreover, you want to list them by variant and risk allele, together with the effect size (beta coefficient) and p-value of each association.
First we find the Experimental Factor Ontology (EFO) identifier(s) corresponding to BMI. To do this, we download all traits in the GWAS Catalog and then look for ‘BMI’ in the trait description column.
all_traits <- get_traits()
dplyr::filter(all_traits@traits, grepl('BMI', trait, ignore.case = TRUE))
#> # A tibble: 10 × 3
#> efo_id trait uri
#> <chr> <chr> <chr>
#> 1 EFO_0005937 longitudinal BMI measurement http://www.ebi.ac…
#> 2 EFO_0007737 BMI-adjusted adiponectin measurement http://www.ebi.ac…
#> 3 EFO_0007788 BMI-adjusted waist-hip ratio http://www.ebi.ac…
#> 4 EFO_0007789 BMI-adjusted waist circumference http://www.ebi.ac…
#> 5 EFO_0007793 BMI-adjusted leptin measurement http://www.ebi.ac…
#> 6 EFO_0008036 BMI-adjusted fasting blood glucose measurement http://www.ebi.ac…
#> 7 EFO_0008037 BMI-adjusted fasting blood insulin measurement http://www.ebi.ac…
#> 8 EFO_0008038 BMI-adjusted hip bone size http://www.ebi.ac…
#> 9 EFO_0008039 BMI-adjusted hip circumference http://www.ebi.ac…
#> 10 EFO_0011044 BMI-adjusted neck circumference http://www.ebi.ac…
So there are several phenotypes whose description includes the keyword ‘BMI’. However, only the EFO trait 'EFO_0005937' (‘longitudinal BMI measurement’) really corresponds to BMI as a phenotypic trait. All other traits are adjusted for BMI but are not BMI traits per se (you can confirm this by opening each trait's URI in your web browser and reading its description).
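The keyword search and the manual narrowing described above can be sketched on a small stand-in table, without querying the GWAS Catalog. The tibble `toy_traits` and its third row are hypothetical; the first two rows reuse identifiers from the listing above. Excluding descriptions that contain 'adjusted' is only a heuristic that happens to fit this listing, not a general rule for EFO traits.

```r
library(dplyr)

# Toy stand-in for all_traits@traits (assumption: not real catalog output).
toy_traits <- tibble::tibble(
  efo_id = c('EFO_0005937', 'EFO_0007788', 'EFO_TOY_0001'),
  trait  = c('longitudinal BMI measurement',
             'BMI-adjusted waist-hip ratio',
             'toy unrelated trait')  # hypothetical row
)

# Case-insensitive keyword match, as in the call on the real traits table.
bmi_like <- dplyr::filter(toy_traits, grepl('BMI', trait, ignore.case = TRUE))

# Heuristic narrowing: drop traits that are merely adjusted for BMI.
bmi_only <- dplyr::filter(bmi_like, !grepl('adjusted', trait, ignore.case = TRUE))

bmi_only$efo_id
```

It is still worth checking the remaining candidates by eye, since trait descriptions are free text.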
To get statistical association data for the trait ‘longitudinal BMI measurement’ ('EFO_0005937'), as well as associated variants and effect sizes, we use the gwasrapidd get_associations() function:
bmi_associations <- get_associations(efo_id = 'EFO_0005937')
The S4 object bmi_associations contains several tables, namely 'associations', 'loci', 'risk_alleles', 'genes', 'ensembl_ids' and 'entrez_ids':
slotNames(bmi_associations)
#> [1] "associations" "loci" "risk_alleles" "genes" "ensembl_ids"
#> [6] "entrez_ids"
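For readers less familiar with S4 objects, here is a minimal sketch of how slots holding tables are listed and accessed. The class `ToyAssociations` and its contents are hypothetical and only mimic two of the slots named above; the real gwasrapidd class is defined by the package and is not reproduced here.

```r
library(methods)

# Hypothetical toy class with two table slots (assumption: for illustration only).
setClass(
  'ToyAssociations',
  representation(associations = 'data.frame', risk_alleles = 'data.frame')
)

toy <- new(
  'ToyAssociations',
  associations = data.frame(association_id = c('a1', 'a2'),
                            pvalue = c(1e-8, 5e-6)),
  risk_alleles = data.frame(association_id = c('a1', 'a2'),
                            variant_id = c('rs_toy1', 'rs_toy2'),
                            risk_allele = c('A', 'G'))
)

# List the slot names, then read one table with `@` and one with slot().
slotNames(toy)
toy@associations
slot(toy, 'risk_alleles')
```

The `@` operator used here is the same one applied to bmi_associations in this walkthrough.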
From table 'associations' we can extract the variables 'association_id', 'pvalue', 'beta_number', 'beta_unit' and 'beta_direction', whereas from table 'risk_alleles' we can obtain 'association_id', 'variant_id' and 'risk_allele'. We extract all these variables and combine them into one single data frame (bmi_variants), using 'association_id' as the matching key:
tbl01 <- dplyr::select(bmi_associations@risk_alleles, association_id, variant_id, risk_allele)
tbl02 <- dplyr::select(bmi_associations@associations, association_id, pvalue, beta_number, beta_unit, beta_direction)
bmi_variants <- dplyr::left_join(tbl01, tbl02, by = 'association_id') %>%
  tidyr::drop_na() %>%
  dplyr::arrange(variant_id, risk_allele)
The final results show 42 associations. Note that some variant/allele combinations may be repeated, because the same combination might have been assessed in more than one GWAS (for example, rs1347155 with risk allele T appears in rows 14 and 15).
print(bmi_variants, n = Inf)
#> # A tibble: 42 × 7
#> association_id variant_id risk_allele pvalue beta_number beta_unit
#> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 10066323 rs10041997 A 4e- 7 0.138 unit
#> 2 61729722 rs10070777 A 4e- 6 0.879 unit
#> 3 61729714 rs10278819 C 2e- 6 0.873 unit
#> 4 61729787 rs10426669 G 6e- 6 0.765 unit
#> 5 61729731 rs1048163 G 5e- 6 0.450 unit
#> 6 61729771 rs1048164 A 9e- 6 0.436 unit
#> 7 55309232 rs10515235 A 2e- 6 0.05 unit
#> 8 55309249 rs10938397 G 3e- 8 0.06 unit
#> 9 61729763 rs112045010 C 5e- 6 0.906 unit
#> 10 61729759 rs11979775 C 5e- 6 0.501 unit
#> 11 10068861 rs12926503 A 9e- 6 1.23 unit
#> 12 55655431 rs13015992 G 4e- 6 0.0149 unit
#> 13 61729726 rs13230004 A 4e- 6 0.534 unit
#> 14 10066325 rs1347155 T 5e- 7 0.341 unit
#> 15 10066338 rs1347155 T 9e- 8 0.559 unit
#> 16 55309238 rs1421085 C 3e-30 0.12 unit
#> 17 55655441 rs1485315 C 5e- 6 0.0151 unit
#> 18 61729779 rs17546654 A 7e- 6 0.553 unit
#> 19 55655446 rs17698894 A 6e- 6 0.0169 unit
#> 20 61729767 rs17834779 A 7e- 6 0.447 unit
#> 21 10068852 rs199950 G 3e- 6 1 unit
#> 22 55309254 rs2055816 C 5e- 6 0.07 unit
#> 23 61729739 rs2129108 A 8e- 6 0.795 unit
#> 24 55309243 rs2817419 A 4e-11 0.08 unit
#> 25 61729755 rs28716841 C 9e- 6 0.872 unit
#> 26 61729751 rs28800 A 5e- 6 0.522 unit
#> 27 10066336 rs347313 G 2e- 8 0.249 unit
#> 28 61729747 rs41501 A 8e- 6 0.513 unit
#> 29 61729709 rs57818938 T 1e- 6 0.932 unit
#> 30 10068856 rs634010 T 5e- 6 1.04 unit
#> 31 10066332 rs6447650 T 2e- 7 0.416 unit
#> 32 55655436 rs6897876 C 6e- 6 0.0161 unit
#> 33 61729718 rs7024062 A 2e- 6 0.588 unit
#> 34 10066327 rs7565158 A 6e- 7 0.136 unit
#> 35 61729775 rs76356591 T 6e- 6 0.588 unit
#> 36 61729704 rs78310016 G 4e- 8 1.05 unit
#> 37 61729783 rs8092589 A 9e- 6 0.472 unit
#> 38 10068866 rs8105895 T 3e- 6 1.26 unit
#> 39 61729735 rs927899 C 9e- 6 0.742 unit
#> 40 13293992 rs9346455 G 6e- 6 0.013 unit
#> 41 55309227 rs9436303 G 8e- 9 0.07 unit
#> 42 61729743 rs9836481 A 8e- 6 0.977 unit
#> # … with 1 more variable: beta_direction <chr>
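If the associations should instead be ranked by significance and effect size, or if you want to spot repeated variant/allele combinations, the same dplyr verbs can be reused. The sketch below runs on small toy tables; all identifiers such as `rs_toy1` and the row for `a4` are hypothetical, while the numeric values in the first three rows echo figures from the output above. It ranks by the magnitude of `beta_number` only and does not take `beta_direction` into account.

```r
library(dplyr)
library(tidyr)

# Toy stand-ins for the risk_alleles and associations tables (assumption).
toy_risk_alleles <- tibble::tibble(
  association_id = c('a1', 'a2', 'a3', 'a4'),
  variant_id     = c('rs_toy1', 'rs_toy1', 'rs_toy2', 'rs_toy3'),
  risk_allele    = c('T', 'T', 'A', NA)  # a4 has no reported risk allele
)
toy_associations <- tibble::tibble(
  association_id = c('a1', 'a2', 'a3', 'a4'),
  pvalue         = c(5e-7, 9e-8, 3e-30, 1e-6),
  beta_number    = c(0.341, 0.559, 0.12, 0.5)
)

# Same join-and-clean pattern as for bmi_variants.
toy_variants <- dplyr::left_join(toy_risk_alleles, toy_associations,
                                 by = 'association_id') %>%
  tidyr::drop_na()

# Rank by p-value (most significant first), breaking ties by larger beta.
by_significance <- dplyr::arrange(toy_variants, pvalue, dplyr::desc(beta_number))

# Variant/allele combinations reported in more than one association.
repeated <- toy_variants %>%
  dplyr::count(variant_id, risk_allele, name = 'n_associations') %>%
  dplyr::filter(n_associations > 1)
```

Applied to bmi_variants, the same two steps would reorder the 42 rows and list the combinations that occur more than once.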