Example Workflow for Bulk RNA-Seq Analysis

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

This vignette provides a step-by-step guide on how to perform bulk RNA-Seq analysis using the Limma-voom workflow. It is designed to be a comprehensive resource for researchers looking to analyze gene expression data from projects such as TCGA. You can view an example script for this workflow by running the following command

Limma-voom workflow

Prepare data

For downloading and processing datasets from the Gene Expression Omnibus (GEO), utilize the prepare_geo() function. This function will generate a list containing count data, sample information, and gene data.

library(easybio)
# x <- prepare_geo('gseid')

When preparing The Cancer Genome Atlas (TCGA) RNA-Seq data, employ the prepare_tcga() function from the TCGAbiolinks package. This function returns a list with count data for all samples and unstranded FPKM data for tumor samples, along with sample and feature information.

Example Workflow: TCGA CHOL Project

For a detailed overview of the Limma workflow, refer to the article: RNA-seq analysis is as easy as 1-2-3 with limma, Glimma and edgeR. Three functions are designed for this workflow:

library(TCGAbiolinks)
library(easybio)
library(data.table)

query <- GDCquery(
  project = "TCGA-CHOL",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification"
)
GDCdownload(query = query)
data <- GDCprepare(query = query)

lt <- prepare_tcga(data)
lt$all$sampleInfo[["group"]] <- fifelse(lt$all$sampleInfo$sample_type %ilike% "Tumor", "Tumor", "Normal")

# limma-voom workflow
x <- dgeList(lt$all$exprCount, lt$all$sampleInfo, lt$all$featuresInfo)
x <- dprocess_dgeList(x, "group", 10)
efit <- limmaFit(x, "group")
get_attr(efit, "contrast")

CHOL_DEGs <- limma::topTable(fit = efit, coef = 1, number = Inf)

Set the criteria for differential gene expression:

CHOL_DEGs[, let(
  tumor_vs_normal = fcase(
    adj.P.Val < 0.05 & logFC > 2, "Up",
    adj.P.Val < 0.05 & logFC < -2, "Down",
    default = "Not-Significant"
  )
)]
setDT(CHOL_DEGs, keep.rownames = "rid")
head(CHOL_DEGs)

Visualize the differentially expressed genes (DEGs) with a volcano plot:

data(CHOL_DEGs)
plotVolcano(
  data = CHOL_DEGs,
  x = logFC,
  y = -log10(adj.P.Val),
  color = tumor_vs_normal
)

For pathway enrichment analysis, such as GO and KEGG, download the r4msigdb package to access the MSigDB gene sets. For more details on this package, please see r4msigdb.

# devtools::install_github('person-c/easybio')
library(data.table)
# Over Presentation Analysis(ORA)
pathwayGO <- r4msigdb::query("Hs", pathway = "^GO(BP|CC|MF)_")
pathwayGO <- setNames(pathwayGO$symbol, pathwayGO$standard_name)

oraRes <- fgsea::fora(
  pathways = pathwayGO,
  genes = CHOL_DEGs[.("Up"), gene_name, on = .(tumor_vs_normal)],
  universe = unique(CHOL_DEGs$gene_name)
)
oraRes[, let(
  category = fcase(
    pathway %like% "GOBP", "BP",
    pathway %like% "GOMF", "MF",
    pathway %like% "GOCC", "CC"
  )
)]

oraRes <- oraRes[, .SD[order(padj)], by = .(category)]
oraRes[, let(pathwayGO = factor(pathway, levels = rev(pathway)))]
plotORA(
  data = oraRes[, head(.SD, 5), by = category],
  x = -log10(padj),
  y = pathwayGO,
  size = log10(overlap),
  fill = category
)

To perform Gene Set Enrichment Analysis (GSEA), use the fgsea::fgsea() function. The core function is adapted from the fgsea package with minor visual enhancements.

library(fgsea)
data(examplePathways)
data(exampleRanks)

Run the GSEA analysis:

fgseaRes <- fgsea(pathways = examplePathways, 
                  stats    = exampleRanks,
                  minSize  = 15,
                  maxSize  = 500)
plotGSEA(
  fgseaRes, 
  pathways = examplePathways, 
  pwayname = "5991130_Programmed_Cell_Death", 
  stats = exampleRanks, 
  save = FALSE
)
#> Warning in fsort(stats, TRUE): New parallel sort has not been implemented for
#> decreasing=TRUE so far. Using one thread.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.