The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Introduction to gtexr

The GTEx Portal API V2 enables programmatic access to data available from the Genotype-Tissue Expression Portal. The gtexr package wraps this API, providing R functions that correspond to each API endpoint:

Shiny app

Users can try out all functions interatively with the ⭐gtexr shiny app⭐, which pre-populates query parameters with those for the first working example from each function’s documentation. To run the app locally:

shiny::runApp(system.file("app", package = "gtexr"))

Paginated responses

library(gtexr)
library(dplyr)
library(purrr)

Many API endpoints return only the first 250 available items by default. A warning is raised if the number of available items exceeds the selected maximum page size e.g.

get_eqtl_genes("Whole_Blood")
#> Warning: ! Total number of items (12360) exceeds the selected maximum page size (250).
#> ✖ 12110 items were not retrieved.
#> ℹ To retrieve all available items, increase `itemsPerPage`, ensuring you reuse
#>   your original query parameters e.g.
#>   `get_eqtl_genes(<your_existing_parameters>, itemsPerPage = 100000)`
#> ℹ Alternatively, adjust global "gtexr.itemsPerPage" setting e.g.
#>   `options(list(gtexr.itemsPerPage = 100000))`
#> 
#> ── Paging info ─────────────────────────────────────────────────────────────────
#> • numberOfPages = 50
#> • page = 0
#> • maxItemsPerPage = 250
#> • totalNumberOfItems = 12360
#> # A tibble: 250 × 10
#>    tissueSiteDetailId ontologyId  datasetId empiricalPValue gencodeId geneSymbol
#>    <chr>              <chr>       <chr>               <dbl> <chr>     <chr>     
#>  1 Whole_Blood        UBERON:001… gtex_v8          1.05e- 9 ENSG0000… WASH7P    
#>  2 Whole_Blood        UBERON:001… gtex_v8          1.06e-25 ENSG0000… RP11-34P1…
#>  3 Whole_Blood        UBERON:001… gtex_v8          6.31e- 2 ENSG0000… CICP27    
#>  4 Whole_Blood        UBERON:001… gtex_v8          8.71e- 9 ENSG0000… RP11-34P1…
#>  5 Whole_Blood        UBERON:001… gtex_v8          6.01e-20 ENSG0000… RP11-34P1…
#>  6 Whole_Blood        UBERON:001… gtex_v8          6.96e- 9 ENSG0000… RP11-34P1…
#>  7 Whole_Blood        UBERON:001… gtex_v8          3.10e- 4 ENSG0000… RP11-34P1…
#>  8 Whole_Blood        UBERON:001… gtex_v8          1.92e- 3 ENSG0000… ABC7-4304…
#>  9 Whole_Blood        UBERON:001… gtex_v8          1.58e- 3 ENSG0000… RP11-34P1…
#> 10 Whole_Blood        UBERON:001… gtex_v8          7.82e- 2 ENSG0000… AP006222.2
#> # ℹ 240 more rows
#> # ℹ 4 more variables: log2AllelicFoldChange <dbl>, pValue <dbl>,
#> #   pValueThreshold <dbl>, qValue <dbl>

For most cases, the simplest solution is to increase the value of itemsPerPage e.g. get_eqtl_genes("Whole_Blood", itemsPerPage = 100000). This limit can be set globally by setting the “gtexr.itemsPerPage” option with options(list(gtexr.itemsPerPage = 100000)).

Alternatively, multiple pages can be retrieved sequentially e.g.

# to retrieve the first 3 pages, with default setting of 250 items per page
1:3 |>
  map(\(page) get_eqtl_genes("Whole_Blood", page = page, .verbose = FALSE) |>
        suppressWarnings()) |>
  bind_rows()
#> # A tibble: 750 × 10
#>    tissueSiteDetailId ontologyId  datasetId empiricalPValue gencodeId geneSymbol
#>    <chr>              <chr>       <chr>               <dbl> <chr>     <chr>     
#>  1 Whole_Blood        UBERON:001… gtex_v8          5.44e- 2 ENSG0000… ARID1A    
#>  2 Whole_Blood        UBERON:001… gtex_v8          2.60e-21 ENSG0000… PIGV      
#>  3 Whole_Blood        UBERON:001… gtex_v8          3.46e-17 ENSG0000… ZDHHC18   
#>  4 Whole_Blood        UBERON:001… gtex_v8          8.02e- 4 ENSG0000… GPN2      
#>  5 Whole_Blood        UBERON:001… gtex_v8          3.48e- 8 ENSG0000… TRNP1     
#>  6 Whole_Blood        UBERON:001… gtex_v8          2.15e- 3 ENSG0000… SLC9A1    
#>  7 Whole_Blood        UBERON:001… gtex_v8          1.09e- 7 ENSG0000… WDTC1     
#>  8 Whole_Blood        UBERON:001… gtex_v8          5.17e- 4 ENSG0000… RP11-4K3_…
#>  9 Whole_Blood        UBERON:001… gtex_v8          5.98e- 4 ENSG0000… TMEM222   
#> 10 Whole_Blood        UBERON:001… gtex_v8          7.62e- 6 ENSG0000… SYTL1     
#> # ℹ 740 more rows
#> # ℹ 4 more variables: log2AllelicFoldChange <dbl>, pValue <dbl>,
#> #   pValueThreshold <dbl>, qValue <dbl>

Note that paging information is printed to the R console by default. Set argument .verbose to FALSE to silence these messages, or disable globally with options(list(gtexr.verbose = FALSE)).

Examples

The rest of this vignette outlines some example applications of gtexr.

Get build 37 coordinates for a variant

get_variant(snpId = "rs1410858") |>
  tidyr::separate(
    col = b37VariantId,
    into = c(
      "chromosome",
      "position",
      "reference_allele",
      "alternative_allele",
      "genome_build"
    ),
    sep = "_",
    remove = FALSE
  ) |>
  select(snpId:genome_build)
#> 
#> ── Paging info ─────────────────────────────────────────────────────────────────
#> • numberOfPages = 1
#> • page = 0
#> • maxItemsPerPage = 250
#> • totalNumberOfItems = 1
#> # A tibble: 1 × 7
#>   snpId     b37VariantId chromosome position reference_allele alternative_allele
#>   <chr>     <chr>        <chr>      <chr>    <chr>            <chr>             
#> 1 rs1410858 1_153182116… 1          1531821… C                A                 
#> # ℹ 1 more variable: genome_build <chr>

Convert gene symbol to versioned GENCODE ID

Use get_gene() or get_genes()

get_genes("CRP") |>
  select(geneSymbol, gencodeId)
#> 
#> ── Paging info ─────────────────────────────────────────────────────────────────
#> • numberOfPages = 1
#> • page = 0
#> • maxItemsPerPage = 250
#> • totalNumberOfItems = 1
#> # A tibble: 1 × 2
#>   geneSymbol gencodeId         
#>   <chr>      <chr>             
#> 1 CRP        ENSG00000132693.12

Convert rsID to GTEx variant ID

Use get_variant()

get_variant(snpId = "rs1410858") |>
  select(snpId, variantId)
#> 
#> ── Paging info ─────────────────────────────────────────────────────────────────
#> • numberOfPages = 1
#> • page = 0
#> • maxItemsPerPage = 250
#> • totalNumberOfItems = 1
#> # A tibble: 1 × 2
#>   snpId     variantId             
#>   <chr>     <chr>                 
#> 1 rs1410858 chr1_153209640_C_A_b38

For a gene of interest, which tissues have significant cis-eQTLs?

Use get_significant_single_tissue_eqtls() (note this requires versioned GENCODE IDs)

gene_symbol_of_interest <- "CRP"

gene_gencodeId_of_interest <- get_genes(gene_symbol_of_interest) |>
  pull(gencodeId) |>
  suppressMessages()

gene_gencodeId_of_interest |>
  get_significant_single_tissue_eqtls() |>
  distinct(geneSymbol, gencodeId, tissueSiteDetailId)
#> 
#> ── Paging info ─────────────────────────────────────────────────────────────────
#> • numberOfPages = 1
#> • page = 0
#> • maxItemsPerPage = 250
#> • totalNumberOfItems = 93
#> # A tibble: 3 × 3
#>   geneSymbol gencodeId          tissueSiteDetailId                 
#>   <chr>      <chr>              <chr>                              
#> 1 CRP        ENSG00000132693.12 Thyroid                            
#> 2 CRP        ENSG00000132693.12 Esophagus_Gastroesophageal_Junction
#> 3 CRP        ENSG00000132693.12 Muscle_Skeletal

Get data for non-eQTL variants

Some analyses (e.g. Mendelian randomisation) require data for variants which may or may not be significant eQTLs. Use calculate_expression_quantitative_trait_loci() with purrr::map() to retrieve data for multiple variants

variants_of_interest <- c("rs12119111", "rs6605071", "rs1053870")

variants_of_interest |>
  set_names() |>
  map(
    \(x) calculate_expression_quantitative_trait_loci(
      tissueSiteDetailId = "Liver",
      gencodeId = "ENSG00000237973.1",
      variantId = x
    )
  ) |>
  bind_rows(.id = "rsid") |>
  # optionally, reformat output - first extract genomic coordinates and alleles
  tidyr::separate(
    col = "variantId",
    into = c(
      "chromosome",
      "position",
      "reference_allele",
      "alternative_allele",
      "genome_build"
    ),
    sep = "_"
  ) |>
  # ...then ascertain alternative_allele frequency
  mutate(
    alt_allele_count = (2 * homoAltCount) + hetCount,
    total_allele_count = 2 * (homoAltCount + hetCount + homoRefCount),
    alternative_allele_frequency = alt_allele_count / total_allele_count
  ) |>
  select(
    rsid,
    beta = nes,
    se = error,
    pValue,
    minor_allele_frequency = maf,
    alternative_allele_frequency,
    chromosome:genome_build,
    tissueSiteDetailId
  )
#> # A tibble: 3 × 12
#>   rsid         beta     se  pValue minor_allele_frequency alternative_allele_f…¹
#>   <chr>       <dbl>  <dbl>   <dbl>                  <dbl>                  <dbl>
#> 1 rs121191…  0.0270 0.0670 6.88e-1                 0.365                   0.635
#> 2 rs6605071 -0.601  0.166  3.88e-4                 0.0409                  0.959
#> 3 rs1053870  0.0247 0.0738 7.38e-1                 0.214                   0.214
#> # ℹ abbreviated name: ¹​alternative_allele_frequency
#> # ℹ 6 more variables: chromosome <chr>, position <chr>, reference_allele <chr>,
#> #   alternative_allele <chr>, genome_build <chr>, tissueSiteDetailId <chr>

  1. With the exception of get_sample_biobank_data() and get_sample_datasets(), for which ‘get_sample’ is additionally appended with their respective category titles ‘biobank_data’ and ‘datasets’.↩︎

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.