A common problem in R is labelling scatter plots with large numbers of points and/or labels. We provide a utility for easy labelling of scatter plots and quick plotting of volcano plots and MA plots for gene expression analyses. Using an interactive shiny and plotly interface, users can hover over points to see where specific points are located and click on points to label them. Labels can be toggled on/off simply by clicking. An input box and a batch input window provide an easy way to label points by name. Labels can be dragged around the plot to place them optimally. Notably, we provide an easy way to export directly to pdf for publication.
Install from CRAN
install.packages("easylabel")
library(easylabel)
Install from GitHub
devtools::install_github("myles-lewis/easylabel")
library(easylabel)
We recommend installing at least version 4.9.4.9 of plotly (currently the development version), which has much better on-screen resolution of scatter plots for a better user experience. Note that exporting to pdf is unaffected by this.
remotes::install_github("ropensci/plotly")
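To see whether the installed plotly already meets the version recommended above, a minimal check could look like the following. The variable names are illustrative only, and the sketch simply reports the result rather than installing anything.

```r
# Minimum plotly version recommended above
min_version <- "4.9.4.9"

if (requireNamespace("plotly", quietly = TRUE)) {
  installed <- utils::packageVersion("plotly")
  if (installed < min_version) {
    message("plotly ", as.character(installed),
            " is installed; consider updating to >= ", min_version)
  } else {
    message("plotly ", as.character(installed), " meets the recommendation")
  }
} else {
  message("plotly is not installed")
}
```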
If you wish to use the optional useQ argument with easyVolcano() and easyMAplot(), you will need to install the additional package qvalue from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("qvalue")
If you wish to use the optional fullGeneNames argument, you will need to install the packages AnnotationDbi and org.Hs.eg.db from Bioconductor:
BiocManager::install("AnnotationDbi")
BiocManager::install("org.Hs.eg.db")
Use easylabel() to open a shiny app and plot and label scatter plots. A table of the main data is supplied in the Table tab for easy viewing of data.
data(mtcars)
easylabel(mtcars, x = 'mpg', y = 'wt',
colScheme = 'royalblue')
Similar to ggplot2 and plotly, colour can either be set as a single colour, using colScheme = 'blue', or can be set to change with the level of a factor variable within data by setting col. Transparency can be altered by setting alpha. Currently colour gradients are not supported.
easylabel(mtcars, x = 'mpg', y = 'wt',
col = 'cyl')
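As a sketch of combining col with an explicit colScheme vector and alpha: the example below converts cyl to a factor first, so that its three levels are unambiguous. The object names cars and cyl_cols are made up for illustration, and the assumption that colours are assigned to factor levels in order is not confirmed here. The plotting call is guarded so it only runs in an interactive session with easylabel installed.

```r
data(mtcars)
cars <- mtcars
cars$cyl <- factor(cars$cyl)   # three levels: "4", "6", "8"

# One colour per factor level (assumed to be matched in level order)
cyl_cols <- c("royalblue", "orange", "firebrick")
stopifnot(length(cyl_cols) == nlevels(cars$cyl))

if (interactive() && requireNamespace("easylabel", quietly = TRUE)) {
  easylabel::easylabel(cars, x = "mpg", y = "wt",
                       col = "cyl",
                       colScheme = cyl_cols,
                       alpha = 0.8)
}
```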
Marker shapes can either be set as a single shape using shapeScheme = 21, based on the base graphics pch symbol scheme (see points()), or shapes can be set to change with the level of a factor variable within data by setting shape. Shapes and colours can be combined.
# gapminder data set
if(!require(gapminder)) {install.packages("gapminder")}
library(gapminder)
easylabel(gapminder[gapminder$year == 2007, ], x = 'gdpPercap', y = 'lifeExp',
col = 'continent', shape = 'continent',
size = 10,
labs = 'country',
zeroline = FALSE)
The size of all markers can be adjusted by setting size to a single number, e.g. size = 6 (default 8). Size can be assigned to a column of continuous data within data to produce a bubble chart. The size of points is automatically scaled to fit within a range of point sizes, determined by scaleRange.
library(gapminder)
easylabel(gapminder[gapminder$year == 2007, ], x = 'gdpPercap', y = 'lifeExp',
col = 'continent', labs = 'country',
size = 'pop',
alpha = 0.6,
zeroline = FALSE)
By default, easylabel() takes the rownames of the main data frame as the label names. Any column can be selected for label names by setting labs to the name of a column within data, as for example in the Gapminder dataset plots above.
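For instance, a shorter label can be derived from the rownames and stored in its own column, then passed via labs. The column name make and the object cars are hypothetical names chosen for this sketch. The plotting call only runs in an interactive session with easylabel installed.

```r
data(mtcars)
cars <- mtcars

# Keep only the first word of each rowname, e.g. "Mazda RX4" -> "Mazda"
cars$make <- sub(" .*", "", rownames(cars))

if (interactive() && requireNamespace("easylabel", quietly = TRUE)) {
  # Labels now come from the 'make' column instead of the rownames
  easylabel::easylabel(cars, x = "mpg", y = "wt", labs = "make")
}
```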
Axis titles can be set using xlab and ylab. These accept R expressions to allow maths symbols. This is primarily designed to work when exporting plots to pdf via base graphics, since plotly does not natively understand expressions.
easylabel(ymatrix, x = 'x', y = 'y', col = 'col',
colScheme = c('darkgrey', 'green3', 'gold3', 'blue'),
xlab = expression("log"[2] ~ " fold change post-Rituximab"),
ylab = expression("log"[2] ~ " fold change post-Tocilizumab"),
showgrid = TRUE)
Gridlines can be shown by setting showgrid = TRUE. Black lines through the origin can be controlled by setting zeroline to TRUE or FALSE. Dashed grey horizontal lines can be added at specific levels by setting hline with a vector of values. Similarly, dashed grey vertical lines can be added by setting vline with a vector of values.
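A small sketch of these options together, drawing reference lines at the median of each variable. The names mpg_mid and wt_mid are illustrative. The plotting call is guarded so it only runs in an interactive session with easylabel installed.

```r
data(mtcars)

# Reference levels for the dashed lines
mpg_mid <- median(mtcars$mpg)
wt_mid  <- median(mtcars$wt)

if (interactive() && requireNamespace("easylabel", quietly = TRUE)) {
  easylabel::easylabel(mtcars, x = "mpg", y = "wt",
                       showgrid = TRUE,
                       zeroline = FALSE,   # the origin lies outside this data
                       hline = wt_mid,     # horizontal line at median weight
                       vline = mpg_mid)    # vertical line at median mpg
}
```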
Use the easyVolcano() function to quickly plot a volcano plot from DESeq2 or edgeR objects. The useQ argument will switch to using q values for FDR.
# typical DESeq2 workflow
volc1 <- results(dds)
easyVolcano(volc1, useQ = TRUE)
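To try the function without a real DESeq2 analysis, one option is a simulated table. The sketch below is an assumption-laden illustration: the data are random, the object name sim is made up, and it assumes easyVolcano() accepts a plain data frame when the x, y and padj columns are named explicitly. The column names mimic those of a DESeq2 results table.

```r
set.seed(1)
n <- 2000

# Simulated differential expression table with DESeq2-style column names
sim <- data.frame(
  log2FoldChange = rnorm(n, sd = 1.5),
  pvalue = runif(n)^3,                     # skewed towards small p values
  row.names = paste0("gene", seq_len(n))
)
sim$padj <- p.adjust(sim$pvalue, method = "BH")  # Benjamini-Hochberg FDR

if (interactive() && requireNamespace("easylabel", quietly = TRUE)) {
  # Assumption: columns can be selected explicitly on a plain data frame
  easylabel::easyVolcano(sim, x = "log2FoldChange",
                         y = "pvalue", padj = "padj")
}
```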
Use the easyMAplot() function to quickly plot an MA plot from DESeq2 or edgeR objects.
easyMAplot(volc2, useQ = TRUE)
The fullGeneNames argument will use the Bioconductor package AnnotationDbi and the org.Hs.eg.db human gene database to expand gene symbols in the Table tab. Both will need to be installed from Bioconductor.
BiocManager::install("AnnotationDbi")
BiocManager::install("org.Hs.eg.db")
easyVolcano(volc1, useQ = TRUE, fullGeneNames = TRUE)
Full gene names are also shown when hovering over points in the scatter plot, which can make it easier to label points of interest. For mouse genes, install org.Mm.eg.db and set AnnotationDb = org.Mm.eg.db (note the lack of quotes).
BiocManager::install("org.Mm.eg.db")
library(org.Mm.eg.db)
easyVolcano(volc1,
fullGeneNames = TRUE,
AnnotationDb = org.Mm.eg.db)
For volcano plots, a simple colour scheme with downregulated genes in blue and upregulated genes in red can be rendered by setting fccut = 0.
easyVolcano(volc2,
useQ = TRUE,
fccut = 0,
main = "Volcano title")
A title can be added using main = "Title". The font size of the title can be modified using cex.main = 2 (default 1.2) if saving to pdf.
You can add left and right sided titles using Ltitle and Rtitle to explain the direction of effect for up/downregulation. The use of expression in the example below shows how to add left/right arrow symbols to the titles. The symbols only appear in the saved pdf; they are not compatible with plotly.
LRtitle_side = 1 puts these titles on the bottom, while LRtitle_side = 3 puts them on the top.
cex.lab controls font size for these titles as well as axis titles. cex.axis controls font size for axis numbering.
easyVolcano(volc1,
useQ = TRUE, fullGeneNames = TRUE,
Ltitle = expression(symbol("\254") ~ "Non-responder"),
Rtitle = expression("Responder" ~ symbol("\256")),
LRtitle_side = 1,
cex.lab = 0.9, cex.axis = 0.8,
fccut = c(1, 2), fdrcutoff = 0.2,
ylim = c(0, 6), xlim = c(-5, 5),
colScheme = c('darkgrey', 'blue', 'orange', 'red'))
The FDR cutoff is set using fdrcutoff (volcano plots allow a single value, while MA plots allow multiple values). To show significant genes based on unadjusted P values instead, a workaround is to set both y and padj manually to the unadjusted p value column. Note that the legend will then incorrectly state ‘FDR <’ etc.
easyVolcano(volc1, y = 'pvalue', padj = 'pvalue', fdrcutoff = 0.01)
xlim and ylim allow control over the range of each axis. Outlying points are shown as diamonds (see example above). Outliers can be toggled off using showOutliers = FALSE. Outlier symbol, colour and line width can be set using outlier_shape, outlier_col and outlier_lwd respectively.
Axes can be further customised using panel.last (see the description of panel.last further below).
The colour scheme system has been expanded to allow multiple fold change cut-offs. In the example above the colours are symmetrical.
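To make the symmetrical scheme concrete, the base R sketch below shows one plausible way points could be binned with fccut = c(1, 2) and fdrcutoff = 0.2 into the four colours used in the example above. This is an illustration of the idea only, not easylabel's internal code: the function classify_genes and its arguments are hypothetical, and the exact handling of boundary values in the package is not confirmed here.

```r
# Hypothetical helper: assign one colour per gene, symmetrical about x = 0
classify_genes <- function(lfc, fdr,
                           fccut = c(1, 2), fdrcutoff = 0.2,
                           cols = c("darkgrey", "blue", "orange", "red")) {
  stopifnot(length(cols) == length(fccut) + 2)
  # Bin by absolute fold change: [0, 1), [1, 2), [2, Inf)
  band <- cut(abs(lfc), breaks = c(0, fccut, Inf), right = FALSE,
              labels = cols[-1])
  # Non-significant genes (or missing FDR) take the first colour
  ifelse(!is.na(fdr) & fdr < fdrcutoff, as.character(band), cols[1])
}

classify_genes(lfc = c(0.5, -1.5, 3, -3),
               fdr = c(0.01, 0.01, 0.01, 0.9))
# returns "blue" "orange" "red" "darkgrey"
```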
In the next 2 plots, the colours range from blue for downregulated genes through to red for upregulated genes. Vertical lines can be added using vline.
colScheme <- c('darkgrey', 'blue', 'lightblue', 'orange', 'red')
easyVolcano(volc1, fccut = 1, fdrcutoff = 0.2,
ylim = c(0, 6), xlim = c(-5, 5),
colScheme = colScheme,
vline = c(-1, 1))
The next example has 6 colours and also shows how to remove the white outlines around points using outline_col = NA and use transparency instead via alpha.
library(RColorBrewer)
colScheme <- c('darkgrey', brewer.pal(9, 'RdYlBu')[c(9:7, 3:1)])
easyVolcano(volc1, fccut = c(1, 2), fdrcutoff = 0.2,
ylim = c(0, 6), xlim = c(-5, 5),
colScheme = colScheme,
alpha = 0.75, outline_col = NA)
Similarly 6 colours can be applied to MA plots using 3 levels of cut-off for FDR (note the colour scheme is in a slightly different order).
colScheme <- c('darkgrey', brewer.pal(9, 'RdYlBu')[c(7:9, 3:1)])
easyMAplot(volc2, fdrcutoff = c(0.05, 0.01, 0.001), size = 6, useQ = TRUE,
alpha = 0.75, outline_col = NA,
colScheme = colScheme)
Further control of plotting can be achieved by passing other arguments to plot() via R’s ... argument. For example, a box around the plot can be added using bty='o' (see par()). Note this method only works for saving to pdf since it is not supported by plotly.
Base graphics’ panel.last argument is a flexible way to add plotting commands. This provides a way to add trend lines, loess lines or any additional features such as extra legends to the plot after the points are plotted, but before labels are drawn.
The code below uses panel.last to completely customise the numbering of the axis ticks. Note that panel.last only works when saving to pdf, and is not available in plotly.
Axes can be customised by first suppressing the initial axis using xaxt='n' or yaxt='n' and then adding an axis() call using panel.last.
easyVolcano(volc1, useQ = TRUE, fdrcutoff = 0.2, ylim = c(0,8), xlim = c(-6,6),
size = 4.8,
xaxt = 'n', yaxt = 'n', bty = 'o',
main = "DEG volcano plot",
panel.last = {
axis(side = 1, at = -6:6)
axis(side = 2, at = 0:8)
})
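The same mechanism can be sketched for a trend line on a plain scatter plot. The example below fits a linear model with base R and draws it via panel.last. The object name fit is illustrative, and it is an assumption that the fitted object is visible when panel.last is evaluated during pdf export. As noted above, the line would appear only in the saved pdf, not in the plotly view.

```r
data(mtcars)

# Linear trend of weight on fuel economy
fit <- lm(wt ~ mpg, data = mtcars)

if (interactive() && requireNamespace("easylabel", quietly = TRUE)) {
  easylabel::easylabel(mtcars, x = "mpg", y = "wt",
                       panel.last = {
                         # Dashed red regression line, drawn before the labels
                         abline(reg = fit, col = "red", lty = 2)
                       })
}
```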
Label lines can be altered using the argument labelDir or by selecting the Label direction pulldown menu in the shiny app. Options are shown below:
easyVolcano(volc1, labelDir = "horiz")
easyMAplot(volc1, labelDir = "vert")
Rectangles can be drawn around the label text using rectangles = TRUE.
# Simple outlines
easyVolcano(volc2, useQ = TRUE, fccut = 0,
rectangles = TRUE)
# Red outlined labels, rounded ends
easyVolcano(volc2, useQ = TRUE, fullGeneNames = TRUE,
rectangles = TRUE,
padding = 5,
border_radius = 10,
line_col = 'red',
border_col = 'red',
text_col = 'red')
# Transparent grey rectangles, rounded ends
easyMAplot(volc2, fdrcutoff = c(0.05, 0.01, 0.001), size = 6, useQ = TRUE,
alpha = 0.75, outline_col = NA,
fullGeneNames = TRUE,
colScheme = c('darkgrey', brewer.pal(9, 'RdYlBu')[c(7:9, 3:1)]),
rectangles = TRUE,
border_col = NA,
padding = 5,
rect_col = adjustcolor('grey', alpha.f = 0.6),
border_radius = 20)
# White text on black background, no rounding
easyVolcano(volc2, useQ = TRUE, fullGeneNames = TRUE,
fccut = 0,
rectangles = TRUE,
padding = 4,
border_radius = 0,
rect_col = 'black',
text_col = 'white',
border_col = NA)
The plotly scattergl function, which uses webGL, seems to have a limit of around 400,000 scatter points per plot. Above this level, at around 500,000 points, the shiny/plotly app becomes sluggish, and at >600,000 points it tends to freeze and become unusable.
easylabel() relies on plotly for the interactive scatter plot and uses webGL for speed as a plotly scattergl trace. A known bug exists in the current version 4.9.4.1 of plotly for R, which still uses plotly.js version 2.0. This older version of plotly.js renders scatter plots with poor pixellation, particularly on high-DPI (retina) displays. This issue has been fixed in plotly.js version 2.5 and plotly for R version 4.9.4.9, which can be installed via GitHub:
remotes::install_github("ropensci/plotly")