1 TSplotly R Package Installation
N.B.: Please download the source files from our GitHub repository and install the package (in either R or RStudio) for testing. The same source is also available on the GitHub page of the SOCR team's TCIU project.
# Installation from the Windows binary (recommended for Windows systems)
## Set the working directory to the folder that contains "TSplotly_1.1.0.tar.gz"
library(devtools)
install("TSplotly",build_vignettes = TRUE)
## Alternatively, install from the command line
## Set the working directory to the folder that contains "TSplotly_1.1.0.tar.gz"
system("R CMD INSTALL TSplotly")
# Installation from the source (recommended for Macs and Linux systems)
install.packages("~/TSplotly_1.1.0.tar.gz", repos = NULL, type = "source")
Once published on CRAN, the installation can be done by the following command:
install.packages("TSplotly")
2 Background
This document describes the TSplotly package, which is developed in the R environment. The package provides a portable plot_ly-style interactive display of longitudinal (time series) data and builds mainly on existing R packages. Its functions mainly handle time series data (created by the function ts()) or results from ARIMA(X) models (generated by the function auto.arima()). Input data are typically preprocessed with another R package before being passed to these functions.
The main functions of this package were tested under the SOCR project Data Science: Time Complexity and Inferential Uncertainty (TCIU) by Ivo D. Dinov, Milen V. Velev, Yongkai Qiu, and Zhe Yin, University of Michigan, Ann Arbor. Most of the examples for this package can be found in the last part of Chapter 5.
3 Introduction of functions with examples
The TSplotly package comprises 4 functions.
TSplot: create a plot_ly plot from time series data or fitted ARIMA(X) models.
ADDline: add lines to existing TSplot objects, as needed.
GGtoPY: provide a convenient way to transform (reformat) ggplot2 datasets into a format that works with plot_ly.
GTSplot: create multiple plot_ly lines (time series) from data frames containing multiple time series.
3.1 Function TSplot
This function takes a fitted ARIMA(X) model created by the function auto.arima() from the forecast package. From the fitted model, it generates the predicted future time series along with 80% and 95% confidence intervals, and it also draws the original training time series on the plot. The number of periods of the original time series shown is controlled by the parameter origin_t. If the original model contains a matrix of external regressors (i.e. the model is an ARIMAX model), a regressor matrix must also be passed to this function through the parameter XREG.
Below are parameters in this function:
origin_t: number of periods of the original time series to include in the plot; use "all" to include all periods
ARIMAmodel: ARIMA model created by function “auto.arima()”
XREG: if using ARIMAX model, put in the regularized X matrix
TITLE: title for this plot
Ylab: label of Y axis
Xlab: label of X axis
ts_original: label for original time series line
ts_forecast: label for forecasted time series line
title_size: size of the title
This function returns a plot_ly-style plot. It can be saved as a variable, and more elements can be added with the pipe operator %>%. More details are available on the plotly for R homepage.
Note that this function only works with ARIMA(X) models whose time format contains a year and month (e.g. “2017-02-14”), because it uses the function as.yearmon() from the zoo package; the time format must therefore be one that as.yearmon() accepts. The function is most helpful for finance-related time series or log data with a standard time format. If an error occurs, you may wish to use TSplot_gen instead, which accepts more flexible time formats.
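To make the 80% and 95% bands concrete, here is a minimal sketch of how such intervals can be derived from a fitted ARIMA model. It is not the code TSplot runs: it uses base R's stats::arima() on simulated data instead of auto.arima(), and the model order, the 12-step horizon, and the helper name interval() are assumptions chosen for illustration.

```r
# Sketch only: normal-theory forecast intervals from a fitted ARIMA model.
# Simulated data and a fixed model order are assumptions, not TSplotly code.
set.seed(1)
y <- ts(cumsum(rnorm(120)), start = c(2009, 1), frequency = 12)

fit <- arima(y, order = c(1, 1, 0))   # hypothetical model order
fc  <- predict(fit, n.ahead = 12)     # point forecasts ($pred) and standard errors ($se)

# Interval at a given coverage level: forecast +/- z * standard error
interval <- function(level) {
  z <- qnorm(1 - (1 - level) / 2)
  cbind(lower = as.numeric(fc$pred - z * fc$se),
        upper = as.numeric(fc$pred + z * fc$se))
}

ci80 <- interval(0.80)
ci95 <- interval(0.95)
```

The 95% band always encloses the 80% band, which is why the plot shows two nested ribbons around the forecast line.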
3.2 Example of TSplot function
require(TSplotly)
require(zoo)
require(ggplot2)
require(plotly)
require(forecast)
# Creating time series data
MCSI_Data_monthAvg_ts_Y <- ts(Y, start=c(1978,1), end=c(2018, 12), frequency = 12)
# Applying ARIMAX model
modArima <- auto.arima(MCSI_Data_monthAvg_ts_Y, xreg=X)
# Creating plot_ly results
## 48 means that there will be 48 periods from the original
## time series dataset that is included in the plot result.
## You could also change this to "all" to see all original dataset in a single plot.
TSplot(48,modArima,X_new,title_size = 8,ts_original = "Original time series",
ts_forecast = "Predicted time series")
This example is based on TCIU Figure 1.6 of Chapter 1. The 48 training periods were chosen so that the plot covers the 4 years from 2015 to 2019.
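The effect of choosing 48 periods can be checked with base R alone. The sketch below uses simulated values (only the time index matters) and the same start, end, and frequency as the example series; the variable names are assumptions for illustration.

```r
# Sketch only: which calendar months do the last 48 periods cover?
# Values are simulated; the time index matches the example (1978-01 to 2018-12).
set.seed(1)
y <- ts(rnorm(492), start = c(1978, 1), end = c(2018, 12), frequency = 12)

origin_t <- 48
n <- length(y)

# Keep only the last origin_t observations
y_tail <- window(y, start = time(y)[n - origin_t + 1])

start(y_tail)   # first year and month of the retained window
end(y_tail)     # last year and month of the retained window
```

With a monthly series ending in December 2018, the retained training window begins in January 2015, and the forecast then extends the plot into 2019.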
3.3 Function TSplot_gen
TSplot_gen is a more general version of TSplot. It takes a fitted ARIMA(X) model and plots both the training time series and the predicted time series along with the 80% and 95% confidence intervals. The biggest advantage of this function is that it does not require the time format in the model to be consistent with the format accepted by the function as.yearmon() (i.e. time series data that carries year and month information); instead, it accepts any time format. However, if you wish to include a label for each time point, a vector of time labels must be supplied. Another advantage is that you can pass a list of other time series to this function so that more time lines are drawn together with the result of the ARIMA(X) model. Note that achieving this with the function TSplot requires the function ADDline as well.
Below are parameters in this function:
origin_t: number of periods of the original time series to include in the plot; use "all" to include all periods
ARIMAmodel: ARIMA model created by function “auto.arima()”
XREG: if using ARIMAX model, put in the regularized X matrix
TITLE: title for this plot
Ylab: label of Y axis
Xlab: label of X axis
plot_labels: specific labels for each time point of the training time series and the predicted ARIMA(X) result. Supply a vector whose length equals the number of chosen training periods (i.e. parameter origin_t) plus the number of predicted periods.
ts_original: label for original time series line
ts_forecast: label for forecasted time series line
title_size: size of the title
ts_list: use this parameter to draw more time lines on the original plot. Supply a list containing all extra time series that you wish to draw. Each element of this list should be created by the function ts().
ts_labels: when drawing extra time lines with the parameter ts_list, you can create specific labels for each time point. Supply a list with the same shape as the list in ts_list; each element should contain the time labels corresponding to the matching element of ts_list.
ts_names: labels for each extra time line you draw. The labels appear in the legend of the plot.
COLO: colors for each new line that is drawn.
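The length requirement on plot_labels is easy to get wrong, so here is a small base R sketch of building a label vector that covers the training window plus the forecast. The 12-period horizon, the first month, and the "%Y-%m" label format are assumptions for illustration; any character vector of the right length should serve the same purpose.

```r
# Sketch only: one label per plotted time point (training window + forecast).
origin_t <- 48   # training periods shown in the plot
h        <- 12   # hypothetical forecast horizon

first_month <- as.Date("2015-01-01")   # assumed first month of the training window
months      <- seq(first_month, by = "month", length.out = origin_t + h)

# Numeric year-month labels avoid locale-dependent month names
plot_labels <- format(months, "%Y-%m")

length(plot_labels) == origin_t + h    # the condition plot_labels must satisfy
```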
3.4 Examples of TSplot_gen function
3.4.1 Example one
This example generates the same result as the example of the function TSplot. Notice that you must supply a vector of labels to get year and month labels similar to the previous example (which means that for time series with a year and month time format, the function TSplot may be a good choice).
#Create labels for training time series data and ARIMAX result (48 periods of training data included)
require(zoo)
#Time labels for training data
time_label1<-as.yearmon(time(MCSI_Data_monthAvg_ts_Y))[(length(MCSI_Data_monthAvg_ts_Y)-48+1):length(MCSI_Data_monthAvg_ts_Y)]
#Time labels for ARIMAX model(need to fit model first)
time_pred<-forecast(modArima,xreg = X_new)
time_label2<-as.yearmon(time(time_pred$mean))
time_label<-as.character(c(time_label1,time_label2))
TSplot_gen(48,modArima,X_new,title_size = 8,ts_original = "Original time series",
ts_forecast = "Predicted time series", # include time labels
plot_labels = time_label)
3.4.2 Example two
A major advantage of the function TSplot_gen is that it can directly add new time lines to the plot without calling the function ADDline. Here another plot of TCIU Figure 1.6 of Chapter 1 is shown as an example:
# Step 1: creating the base plot
## Creating time labels
tl1<-as.yearmon(time(modArima_train$x))[(length(modArima_train$x)-48+1):length(modArima_train$x)]
tl2<-as.yearmon(time(forecast(modArima_train,xreg = as.matrix(X_test))$mean))
tl<-as.character(c(tl1,tl2))
Tempplot<-TSplot_gen(48,modArima_train,as.matrix(X_test),title_size = 8,ts_original = "Original time series",
ts_forecast = "Predicted time series")
# Show the base plot when no other elements (labels, new time lines, etc.) are included
Tempplot
# Step 2: including new lines and labels
## Creating list and other information for new lines
TSlist<-list(MCSI_Data_monthAvg_ts_Y_test)
TSlabel<-list(as.character(as.yearmon(time(TSlist[[1]]))))
TSname<-c("Original result")
## Put them into related parameters
TSplot_gen(48,modArima_train,as.matrix(X_test),title_size = 8,ts_original = "Original time series",
ts_forecast = "Predicted time series",plot_labels = tl, #labels of original plot
ts_list = TSlist,ts_names = TSname,ts_labels = TSlabel,COLO = "black")
3.5 Function ADDline
This function extends TSplot, which cannot draw new time lines by itself. It can also generate extra time lines to be added to other plot_ly-style objects. ADDline creates a list of 4 elements that can be passed quickly to plotly functions such as add_lines().
Below are parameters in this function:
linetype: two options for this parameter. If “TS” is applied, then data created by ts() should be supplied to the parameter TS. If “ARIMA” is applied, then the parameters ARIMAmodel and XREG should be used.
TS: data created by the function ts()
ARIMAmodel: ARIMA model created by the function auto.arima()
XREG: if using ARIMAX model, put in the regularized X matrix
Name: title for this line
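The example in the next section reads four elements from the ADDline result: X, TEXT, Y, and NAME. To show what such a list might look like, the sketch below builds one by hand from a small ts object. This is only an illustration of the shape; the contents that ADDline actually stores in each element are an assumption here, and the values are made up.

```r
# Sketch only: a hand-built list with the four element names used in the
# ADDline example (X, TEXT, Y, NAME). Values and contents are assumptions.
y_test <- ts(c(98.1, 97.4, 99.0, 100.2), start = c(2019, 1), frequency = 12)

yr <- as.integer(floor(time(y_test) + 1e-8))   # calendar year of each point
mo <- as.integer(cycle(y_test))                # month number (1 to 12)

new_line <- list(
  X    = as.numeric(time(y_test)),      # numeric time axis
  TEXT = sprintf("%d-%02d", yr, mo),    # hover label per point
  Y    = as.numeric(y_test),            # series values
  NAME = "Original Result"              # legend entry
)

str(new_line)
```

A list shaped like this can be passed element by element to add_lines(), as the example that follows does with the real ADDline output.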
3.6 Example of ADDline function
ADDline can be combined with TSplot to extend its capabilities. The example below uses these two functions and produces the same result as Example two of the function TSplot_gen.
require(forecast)
#Firstly create a base plotly plot
Tempplot<-TSplot(48,modArima_train,as.matrix(X_test),title_size = 8,ts_original = "Original time series",
ts_forecast = "Predicted time series")
# Generate a new line with ADDline function
newline<-ADDline(TS = MCSI_Data_monthAvg_ts_Y_test,linetype = "TS",Name = "Original Result")
## Put the new line into our plot
Tempplot%>%
add_lines(x=newline$X,text=newline$TEXT,y=newline$Y,name=newline$NAME,line=list(color="grey"))
3.7 Function GtoP_trans
A dataset suitable for the ggplot2 package is shaped quite differently from one suitable for the plotly package. This function provides a quick way to transform a data frame that works with ggplot2 into a shape that works with plot_ly, so that new datasets can be applied quickly to the previous functions.
Below are the parameters of this function:
dataframe: original data frame that is used with ggplot2
NAME: column that will be used on creating different lines on plot_ly
X: column that will serve as the x axis on plot_ly
Y: column that will serve as the value(y) on plot_ly
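The reshaping idea can be sketched in base R: a long data frame (one row per time point and series, as ggplot2 prefers) becomes a wide one (one row per time point, one column per series, which is convenient for adding one plot_ly line per column). This is not the implementation of GtoP_trans; the toy data and the use of tapply() are assumptions for illustration, with column names borrowed from the example that follows.

```r
# Sketch only: long-to-wide reshaping in base R (not GtoP_trans itself).
long <- data.frame(
  YYYYMM = rep(c("2018-01", "2018-02", "2018-03"), times = 2),
  series = rep(c("ICS", "ICC"), each = 3),
  value  = c(95.7, 99.7, 101.4, 110.5, 114.9, 121.2),   # made-up values
  stringsAsFactors = FALSE
)

# One row per time point (X), one column per series (NAME), cells hold Y
wide <- as.data.frame(
  with(long, tapply(value, list(YYYYMM, series), mean))
)

wide
```

In the wide form the time points sit in the row names, which matches how the later plot_ly code reads rownames(PYdf) for the x axis.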
3.8 Example of GtoP_trans
First, a ggplot2 example is shown here
ggplot(MCSI_Data_monthAvg_melt[MCSI_Data_monthAvg_melt$series!="INCOME", ],
aes(YYYYMM, value)) +
geom_line(aes(linetype=series, colour = series), size=2) +
geom_point(aes(shape=series, colour = series), size=0.3) +
geom_smooth(aes(colour = series), se = TRUE) +
coord_trans(y="log10") +
xlab("Time (monthly)") + ylab("Index Values (log-scale)") +
scale_x_date(date_breaks = "12 month", date_labels = "%m-%Y") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
text = element_text(size=20))+ theme(legend.position="top")
Data transformation by GtoP_trans is done here
PYdf<-GtoP_trans(MCSI_Data_monthAvg_melt[MCSI_Data_monthAvg_melt$series!="INCOME", ],NAME="series",X="YYYYMM",Y="value")
PYdf$INCOME<-NULL
#Log10 transformation
PYdf<-log10(PYdf)
Apply the plotly package to create an interactive plot
#Create an interactive list
updatemenus <- list(
list(
xanchor="left",
yanchor="top",
active = -1,
type= 'buttons',
buttons = list(
list(
label = "ALL",
method = "update",
args = list(list(visible = c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE)),
list(title = "All Indexes"))),
list(
label = "ICS",
method = "update",
args = list(list(visible = c(FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE)),
list(title = "ICS"))),
list(
label = "ICC",
method = "update",
args = list(list(visible = c(FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE)),
list(title = "ICC"))),
list(
label = "GOVT",
method = "update",
args = list(list(visible = c(FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE)),
list(title = "GOVT"))),
list(
label = "DUR",
method = "update",
args = list(list(visible = c(FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE)),
list(title = "DUR"))),
list(
label = "HOM",
method = "update",
args = list(list(visible = c(FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE)),
list(title = "HOM"))),
list(
label = "CAR",
method = "update",
args = list(list(visible = c(FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)),
list(title = "CAR"))),
list(
label = "AGE",
method = "update",
args = list(list(visible = c(FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE)),
list(title = "AGE"))),
list(
label = "EDUC",
method = "update",
args = list(list(visible = c(TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE)),
list(title = "EDUC")))
)
)
)
# Apply plot_ly to finish generating result
plot_ly(type="scatter",mode="lines")%>%
add_lines(x=as.yearmon(rownames(PYdf)),text=rownames(PYdf),y=PYdf$ICS,name="ICS",line=list(color="powderblue"))%>%
add_lines(x=as.yearmon(rownames(PYdf)),text=rownames(PYdf),y=PYdf$ICC,name="ICC",line=list(color="red"))%>%
add_lines(x=as.yearmon(rownames(PYdf)),text=rownames(PYdf),y=PYdf$GOVT,name="GOVT",line=list(color="green"))%>%
add_lines(x=as.yearmon(rownames(PYdf)),text=rownames(PYdf),y=PYdf$DUR,name="DUR",line=list(color="orange"))%>%
add_lines(x=as.yearmon(rownames(PYdf)),text=rownames(PYdf),y=PYdf$HOM,name="HOM",line=list(color="purple"))%>%
add_lines(x=as.yearmon(rownames(PYdf)),text=rownames(PYdf),y=PYdf$CAR,name="CAR",line=list(color="pink"))%>%
add_lines(x=as.yearmon(rownames(PYdf)),text=rownames(PYdf),y=PYdf$AGE,name="AGE",line=list(color="brown"))%>%
add_lines(x=as.yearmon(rownames(PYdf)),text=rownames(PYdf),y=PYdf$EDUC,name="EDUC",line=list(color="black"))%>%
layout(title= list(text="Time series for 8 indexes",font=list(family = "Times New Roman",size = 16,color = "black" )),
paper_bgcolor='rgb(255,255,255)', plot_bgcolor='rgb(229,229,229)',
xaxis = list(title ="Time (monthly)",
gridcolor = 'rgb(255,255,255)',
showgrid = TRUE,
showline = FALSE,
showticklabels = TRUE,
tickcolor = 'rgb(127,127,127)',
ticks = 'outside',
zeroline = FALSE),
yaxis = list(title = "Index Values (log-scale)",
gridcolor = 'rgb(255,255,255)',
showgrid = TRUE,
showline = FALSE,
showticklabels = TRUE,
tickcolor = 'rgb(127,127,127)',
ticks = 'outside',
zeroline = FALSE),
updatemenus=updatemenus)
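The hand-written button list above is repetitive, and a menu of the same shape can also be generated programmatically. The sketch below is one way to do that under a stated assumption: the figure contains exactly eight traces, one per series, in the order listed in series. If the figure carries an additional leading trace, each visibility vector would need a matching extra element. The helper name make_button() is hypothetical.

```r
# Sketch only: build one "show only this series" button per series, plus ALL.
# Assumes exactly one trace per series, in this order.
series <- c("ICS", "ICC", "GOVT", "DUR", "HOM", "CAR", "AGE", "EDUC")

# Hypothetical helper: one update button with a visibility vector and a title
make_button <- function(label, visible, title = label) {
  list(label  = label,
       method = "update",
       args   = list(list(visible = visible), list(title = title)))
}

single_buttons <- lapply(seq_along(series), function(i) {
  make_button(series[i], seq_along(series) == i)   # TRUE only at position i
})

all_button <- make_button("ALL", rep(TRUE, length(series)), title = "All Indexes")

updatemenus_sketch <- list(
  list(xanchor = "left", yanchor = "top", active = -1, type = "buttons",
       buttons = c(list(all_button), single_buttons))
)
```

Generating the vectors from the series names keeps the TRUE position and the button label in step, which is easy to get wrong when the vectors are typed by hand.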