Version 1.0.0 of the plotly R package introduces a new high-level interface for working with plotly’s JavaScript graphing library from R. The aim of this vignette is to explain the semantics of this interface, but I also recommend perusing plotly’s R homepage for more examples.
To initiate a plotly object, use plot_ly(). Here we turn the economics data frame (from the ggplot2 package) into a plotly visualization and store it as the object p.
library(plotly)
p <- plot_ly(economics, x = date, y = uempmed)
If you have a plotly account, printing plotly objects in the R console will create a new plotly figure, and open it in your web browser. If you’re using knitr/R Markdown with HTML output (like this vignette), printing not only creates the plot, but also embeds it as an HTML iframe. If you want to avoid iframes, check out plotly offline and the accompanying vignette for R.
p
plot_ly() has a number of arguments which are unique to the R package and make common visualizations a bit easier. These arguments are very much inspired by the semantics of ggplot2’s qplot() in the sense that scales are automatically applied to these variables (i.e., they map data to visual properties).
If a nominal variable (i.e., an unordered factor) is assigned to color, then a qualitative color palette is used by default.
plot_ly(iris, x = Petal.Length, y = Petal.Width,
color = Species, mode = "markers")
If you want to change the default palette, it’s recommended that you provide a ColorBrewer (http://colorbrewer2.org) qualitative palette name (e.g., “Set1” or “Accent”) to the colors argument.
plot_ly(iris, x = Petal.Length, y = Petal.Width,
color = Species, colors = "Set1", mode = "markers")
In this case, the palette consists of 9 colors and the default behavior is to pick colors that are furthest apart (“#E41A1C”, “#FF7F00”, and “#999999”).
cols <- RColorBrewer::brewer.pal(9, "Set1")
scales::show_col(cols)
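As a rough illustration of what "furthest apart" could mean, the sketch below picks evenly spaced positions along the nine-color palette. The helper pick_spread() is hypothetical and the even-spacing rule is an assumption about the behavior described above, not plotly's actual code. The palette is hard-coded so that no extra packages are needed.

```r
# The nine "Set1" colors, hard-coded so the sketch needs no extra packages
# (they match RColorBrewer::brewer.pal(9, "Set1")).
set1 <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00",
          "#FFFF33", "#A65628", "#F781BF", "#999999")

# Hypothetical helper: choose n evenly spaced positions along the palette.
pick_spread <- function(palette, n) {
  idx <- round(seq(1, length(palette), length.out = n))
  palette[idx]
}

# One color per level of Species (three levels).
picked <- pick_spread(set1, nlevels(iris$Species))
picked
```

With three levels, this selects the first, middle, and last entries of the palette, which are the three colors quoted above.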
If you’d like more control over the mapping, you can provide a vector of colors (of appropriate length).
cols <- RColorBrewer::brewer.pal(nlevels(iris$Species), "Set1")
plot_ly(iris, x = Petal.Length, y = Petal.Width,
color = Species, colors = cols, mode = "markers")
If either a numeric or an ordered factor is mapped to color, plot_ly() applies a sequential color scale by default.
plot_ly(iris, x = Petal.Length, y = Petal.Width,
color = as.ordered(Species), mode = "markers")
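The rule for choosing a default color scale can be summarized in a few lines of base R. The function scale_kind() below is a hypothetical helper written only to restate the rules in this section. It is not part of plotly and does not reflect how the package is implemented.

```r
# Hypothetical helper that restates the rules described in the text:
# numeric vectors and ordered factors get a sequential scale,
# everything else (e.g., unordered factors) gets a qualitative one.
scale_kind <- function(x) {
  if (is.numeric(x) || is.ordered(x)) "sequential" else "qualitative"
}

scale_kind(iris$Species)              # unordered factor
scale_kind(as.ordered(iris$Species))  # ordered factor
scale_kind(iris$Sepal.Length)         # numeric
```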
In the case of continuous numeric variables, plot_ly() performs a linear mapping between the data and an interpolated color palette.
plot_ly(iris, x = Petal.Length, y = Petal.Width,
color = Sepal.Length, mode = "markers")
The colors argument takes arbitrary color codes of arbitrary length. Here is how we could use it to replicate the default mapping in ggplot2.
plot_ly(iris, x = Petal.Length, y = Petal.Width,
color = Sepal.Length, colors = c("#132B43", "#56B1F7"),
mode = "markers")
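To make the idea of a linear mapping onto an interpolated palette concrete, here is a minimal sketch using only grDevices. It assumes interpolation in RGB space between two endpoint colors. This is an illustration of the general technique and may differ from what plotly or ggplot2 do internally.

```r
# Build an interpolation function between two endpoint colors.
ramp <- grDevices::colorRamp(c("#132B43", "#56B1F7"))

# Rescale the data linearly to the unit interval [0, 1].
x <- iris$Sepal.Length
t <- (x - min(x)) / (max(x) - min(x))

# Interpolate, then convert the RGB triples to hex color codes.
hex <- grDevices::rgb(ramp(t), maxColorValue = 255)

hex[which.min(x)]  # color assigned to the smallest value
hex[which.max(x)]  # color assigned to the largest value
```

The smallest and largest observations land on the two endpoint colors, and every other value falls proportionally in between.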
To encode values using symbols, use the symbol argument.
plot_ly(iris, x = Petal.Length, y = Petal.Width,
symbol = Species, mode = "markers")
To change the default symbols used, use the symbols argument. All the valid symbol types are listed here.
plot_ly(iris, x = Petal.Length, y = Petal.Width, mode = "markers",
symbol = Species, symbols = c("cross", "square", "triangle-down"))
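Conceptually, the symbols argument supplies one symbol per level of the mapped variable. The base R sketch below shows such a lookup. Pairing symbols with levels in level order is an assumption made for illustration, not a statement about plotly's internals.

```r
symbols <- c("cross", "square", "triangle-down")

# Pair each factor level with one symbol, in level order (assumed).
lookup <- setNames(symbols, levels(iris$Species))

# Look up the symbol for every observation.
point_symbol <- unname(lookup[as.character(iris$Species)])
table(point_symbol)
```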
subplot()
Using the group argument splits the data into different plotly “traces”.
plot_ly(iris, x = Petal.Length, y = Petal.Width,
group = Species, mode = "markers")
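One way to picture what "splitting into traces" means is to split the data frame by the grouping variable yourself. The sketch below uses base R's split() as an analogy. It is not how plotly implements the group argument.

```r
# Each element of the list holds the rows for one level of Species,
# which is roughly the data that would back one trace.
pieces <- split(iris, iris$Species)

names(pieces)
vapply(pieces, nrow, integer(1))
```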
Although we haven’t specified a coloring scheme, plotly will apply its own default scheme. The group argument is quite powerful when used in conjunction with subplot() in order to anchor traces onto different axes.
iris$id <- as.integer(iris$Species)
p <- plot_ly(iris, x = Petal.Length, y = Petal.Width, group = Species,
xaxis = paste0("x", id), mode = "markers")
subplot(p)
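It can help to inspect the axis references that paste0("x", id) produces before passing them on. The sketch below uses local variables (axis_ref is a name introduced here for illustration) rather than modifying iris.

```r
# Integer codes of the factor levels: 1, 2, 3.
id <- as.integer(iris$Species)

# One axis reference per observation.
axis_ref <- paste0("x", id)

unique(axis_ref)
# Cross-tabulate to confirm each species maps to exactly one axis.
table(iris$Species, axis_ref)
```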
Since subplot() does not assume x/y axes are on a common scale, it does not impose any restrictions on the range by default. However, you can change this by pre-specifying the range of the axis objects via the layout() function.
p2 <- layout(
p,
xaxis = list(range = range(Petal.Length) + c(-0.1, 0.1)),
yaxis = list(range = range(Petal.Width) + c(-0.1, 0.1))
)
subplot(p2)
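The range expressions are ordinary vector arithmetic: range() returns the minimum and maximum, and adding c(-0.1, 0.1) pads both ends. A quick check outside of plotly (pad, x_range, and y_range are names introduced here for illustration):

```r
pad <- c(-0.1, 0.1)

# Padded limits shared by all subplots.
x_range <- range(iris$Petal.Length) + pad
y_range <- range(iris$Petal.Width) + pad

x_range
y_range
```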
The subplot() function creates “xaxis[0-9]” objects which inherit pre-specified properties, but you can also customize each subplot by referencing these objects in the layout:
layout(
subplot(p2),
yaxis2 = list(title = ""),
yaxis3 = list(title = "")
)
See here for another example of using the group argument to make small multiples (with maps!).
Sometimes you may want multiple traces on a plot, with different traces drawn from different data sources. In this case, the add_trace() function and its (optional) data argument come in handy.
m <- loess(uempmed ~ as.numeric(date), economics)
efit <- data.frame(date = economics$date, yhat = fitted(m))
plot_ly(economics, x = date, y = uempmed, name = "observed") %>%
add_trace(y = yhat, name = "estimated", data = efit)
Note that the date information carries over from the first trace to the second. In fact, by default, information from the first trace carries over to all subsequent traces unless the property is overwritten or we set inherit = FALSE in plot_ly() (this helps avoid repeating yourself).
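The inheritance rule behaves much like merging two lists, where later values win. The sketch below uses base R's modifyList() as an analogy with made-up values. It is not a description of plotly's internals.

```r
# Properties of a first trace (values are invented for illustration).
first_trace <- list(x = 1:3, y = c(4.5, 4.7, 4.6), name = "observed")

# A second trace only states what differs.
overrides <- list(y = c(4.6, 4.6, 4.7), name = "estimated")

# Unspecified properties (here, x) carry over; specified ones are replaced.
second_trace <- modifyList(first_trace, overrides)
str(second_trace)
```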
If you look at the structure of plotly objects, they are actually data frames with a class of plotly and a special environment attached (this environment tracks the mapping from data to visual properties).
str(p <- plot_ly(economics, x = date, y = uempmed))
## Classes 'plotly' and 'data.frame': 478 obs. of 6 variables:
## $ date : Date, format: "1967-06-30" "1967-07-31" ...
## $ pce : num 508 511 517 513 518 ...
## $ pop : int 198712 198911 199113 199311 199498 199657 199808 199920 200056 200208 ...
## $ psavert : num 9.8 9.8 9 9.8 9.7 9.4 9 9.5 8.9 9.6 ...
## $ uempmed : num 4.5 4.7 4.6 4.9 4.7 4.8 5.1 4.5 4.1 4.6 ...
## $ unemploy: int 2944 2945 2958 3143 3066 3018 2878 3001 2877 2709 ...
## - attr(*, "plotly_hash")= chr "7ff330ec8c566561765c62cbafed3e0f#0"
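To see why an extra class on a data frame does not get in the way of ordinary data manipulation, consider the base R sketch below. The class name "demo_plot" is made up for this example and stands in for plotly's own class.

```r
df <- head(mtcars)

# Prepend a made-up class; the object is still a data frame.
class(df) <- c("demo_plot", class(df))
inherits(df, "data.frame")

# Ordinary row subsetting still works and keeps the extra class.
peak <- df[df$mpg == max(df$mpg), ]
class(peak)
```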
Doing this allows us to mix data manipulation and visualization verbs in a pure(ly) functional, predictable and pipeable manner. Here, we take advantage of dplyr’s filter() verb to label the highest peak in the time series:
p %>%
add_trace(y = fitted(loess(uempmed ~ as.numeric(date)))) %>%
layout(title = "Median duration of unemployment (in weeks)",
showlegend = FALSE) %>%
dplyr::filter(uempmed == max(uempmed)) %>%
layout(annotations = list(x = date, y = uempmed, text = "Peak", showarrow = TRUE))
Although data frames can be thought of as the central object in this package, plotly visualizations don’t actually require a data frame. This makes chart types that accept a z argument especially easy to use if you have a numeric matrix:
plot_ly(z = volcano, type = "surface")
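Before passing an object as z, it may be worth confirming that it really is a numeric matrix. The checks below use only base R and the built-in volcano dataset.

```r
# volcano ships with R's datasets package.
is.matrix(volcano)
is.numeric(volcano)

# Rows and columns of the elevation grid.
dim(volcano)
```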