Introduction to tidyquant

Matt Dancho

2017-01-07

Bringing quantitative financial analysis to the tidyverse

Overview

tidyquant integrates the best resources for collecting and analyzing quantitative financial data (xts and zoo, quantmod, and TTR) with the tidy data infrastructure of the tidyverse, allowing seamless interaction between the quantitative packages and the tidyverse workflow.

The three primary quantitative packages that form the backbone of quantitative financial analysis in R are:

  • xts and zoo: extensible time-series objects and tools for working with ordered observations
  • quantmod: a quantitative financial modeling and trading framework
  • TTR: technical trading rules (indicator functions)

The tidy data principles are a cornerstone of data management and the data modeling workflow. The foundation for tidy data management is the tidyverse, a collection of R packages (ggplot2, dplyr, tidyr, purrr, readr, and tibble) that work in harmony, are built for scalability, and are well documented in R for Data Science. Using this infrastructure and the core tidy concepts, we can integrate the tidy data principles with the best quantitative financial analysis packages via tidyquant.

Prerequisites

Load the tidyquant package to get started.

# Loads tidyquant, tidyverse, lubridate, xts, quantmod, TTR 
library(tidyquant)  

Benefits

The tidyquant philosophy:

A Few Core Functions with A Lot of Power

Minimizing the number of functions reduces the learning curve. Functions are grouped into verbs for efficient collection and manipulation of quantitative data:

Get Quantitative Data

The tq_get() function is used to collect all data by changing the get argument. The options include stock lists for 18 stock indexes from marketvolume.com; stock prices, dividends, and splits from Yahoo Finance; financial statements from Google Finance; metal prices and exchange rates from Oanda; and economic data from the FRED database. To see the full list, execute tq_get_options().

tq_get_options()
## [1] "stock.prices"   "stock.index"    "dividends"      "splits"        
## [5] "financials"     "key.ratios"     "economic.data"  "exchange.rates"
## [9] "metal.prices"

Stock Index:

A wide range of stock index / exchange lists can be retrieved using get = "stock.index". To get a full list of the options, use tq_get_stock_index_options().

tq_get_stock_index_options()
##  [1] "DOWJONES"    "DJI"         "DJT"         "DJU"         "SP100"      
##  [6] "SP400"       "SP500"       "SP600"       "RUSSELL1000" "RUSSELL2000"
## [11] "RUSSELL3000" "AMEX"        "AMEXGOLD"    "AMEXOIL"     "NASDAQ"     
## [16] "NASDAQ100"   "NYSE"        "SOX"

Set x to one of the options above, and get = "stock.index", to retrieve the desired stock index / exchange.

tq_get("sp500", get = "stock.index")
## # A tibble: 501 × 2
##    symbol                   company
##     <chr>                     <chr>
## 1     MMM                        3M
## 2     ABT       ABBOTT LABORATORIES
## 3    ABBV                ABBVIE INC
## 4     ACN                 ACCENTURE
## 5    ATVI       ACTIVISION BLIZZARD
## 6     AYI             ACUITY BRANDS
## 7    ADBE             ADOBE SYSTEMS
## 8     AAP        ADVANCE AUTO PARTS
## 9     AET                     AETNA
## 10    AMG AFFILIATED MANAGERS GROUP
## # ... with 491 more rows

The data source is www.marketvolume.com.

Stock Prices, Dividends and Splits:

Stock prices can be retrieved succinctly using get = "stock.prices".

aapl_prices  <- tq_get("AAPL", get = "stock.prices", from = "1990-01-01")
aapl_prices 
## # A tibble: 6,809 × 7
##          date   open   high   low  close   volume adjusted
##        <date>  <dbl>  <dbl> <dbl>  <dbl>    <dbl>    <dbl>
## 1  1990-01-02 35.250 37.500 35.00 37.250 45799600 1.132075
## 2  1990-01-03 38.000 38.000 37.50 37.500 51998800 1.139673
## 3  1990-01-04 38.250 38.750 37.25 37.625 55378400 1.143471
## 4  1990-01-05 37.750 38.250 37.00 37.750 30828000 1.147270
## 5  1990-01-08 37.500 38.000 37.00 38.000 25393200 1.154868
## 6  1990-01-09 38.000 38.000 37.00 37.625 21534800 1.143471
## 7  1990-01-10 37.625 37.625 35.75 36.000 49929600 1.094086
## 8  1990-01-11 36.250 36.250 34.50 34.500 52763200 1.048499
## 9  1990-01-12 34.250 34.750 33.75 34.500 42974400 1.048499
## 10 1990-01-15 34.500 35.750 34.25 34.250 40434800 1.040901
## # ... with 6,799 more rows

Dividends are obtained using get = "dividends".

aapl_divs <- tq_get("AAPL", get = "dividends", from = "1990-01-01")
aapl_divs
## # A tibble: 42 × 2
##          date dividends
##        <date>     <dbl>
## 1  1990-02-16   0.00393
## 2  1990-05-21   0.00393
## 3  1990-08-20   0.00393
## 4  1990-11-16   0.00429
## 5  1991-02-15   0.00429
## 6  1991-05-20   0.00429
## 7  1991-08-19   0.00429
## 8  1991-11-18   0.00429
## 9  1992-02-14   0.00429
## 10 1992-06-01   0.00429
## # ... with 32 more rows

Stock splits are obtained using get = "splits".

aapl_splits <- tq_get("AAPL", get = "splits", from = "1990-01-01")
aapl_splits
## # A tibble: 3 × 2
##         date    splits
##       <date>     <dbl>
## 1 2000-06-21 0.5000000
## 2 2005-02-28 0.5000000
## 3 2014-06-09 0.1428571

The data source is Yahoo Finance.

Financial Statements:

For any given stock, a total of six financial statements are retrieved as nested tibbles, one for each combination of statement type (Income Statement, Balance Sheet, and Cash Flow) and period (annual and quarterly).

aapl_financials <- tq_get("AAPL", get = "financials")
aapl_financials
## # A tibble: 3 × 3
##    type             annual            quarter
## * <chr>             <list>             <list>
## 1    BS <tibble [168 × 4]> <tibble [210 × 4]>
## 2    CF  <tibble [76 × 4]>  <tibble [76 × 4]>
## 3    IS <tibble [196 × 4]> <tibble [245 × 4]>

The statement information can be extracted by selecting (dplyr::select()) and filtering (dplyr::filter()) to the desired statement and unnesting (tidyr::unnest()) the results.

aapl_financials %>%
    filter(type == "IS") %>%
    select(annual) %>%
    unnest()
## # A tibble: 196 × 4
##    group             category       date  value
##    <int>                <chr>     <date>  <dbl>
## 1      1              Revenue 2016-09-24 215639
## 2      1              Revenue 2015-09-26 233715
## 3      1              Revenue 2014-09-27 182795
## 4      1              Revenue 2013-09-28 170910
## 5      2 Other Revenue, Total 2016-09-24     NA
## 6      2 Other Revenue, Total 2015-09-26     NA
## 7      2 Other Revenue, Total 2014-09-27     NA
## 8      2 Other Revenue, Total 2013-09-28     NA
## 9      3        Total Revenue 2016-09-24 215639
## 10     3        Total Revenue 2015-09-26 233715
## # ... with 186 more rows

A slightly more powerful example looks at all quarterly statements together. This is easy to do with unnest() and spread() from the tidyr package.

aapl_financials %>%
    unnest(quarter) %>% 
    spread(key = date, value = value)
## # A tibble: 110 × 8
##     type group                         category `2015-09-26` `2015-12-26`
## *  <chr> <int>                            <chr>        <dbl>        <dbl>
## 1     BS     1               Cash & Equivalents         9731         5537
## 2     BS     2           Short Term Investments        20590        21660
## 3     BS     3  Cash and Short Term Investments        41710        38349
## 4     BS     4 Accounts Receivable - Trade, Net        16849        12953
## 5     BS     5              Receivables - Other           NA           NA
## 6     BS     6           Total Receivables, Net        30343        24621
## 7     BS     7                  Total Inventory         2349         2451
## 8     BS     8                 Prepaid Expenses           NA           NA
## 9     BS     9      Other Current Assets, Total        14976        10798
## 10    BS    10             Total Current Assets        89378        76219
## # ... with 100 more rows, and 3 more variables: `2016-03-26` <dbl>,
## #   `2016-06-25` <dbl>, `2016-09-24` <dbl>

The data source is Google Finance.

Key Ratios:

For any given stock, ten years of key ratios are available, classified into the following sections:

  • Financials: These ratios include gross margin %, operating margin %, EPS, book value per share, and more.
  • Profitability: These ratios include margin as a percentage of sales (gross margin, operating margin, EBT margin, etc) and profitability metrics such as tax rate %, asset turnover, ROA, financial leverage, ROE, return on invested capital, and more.
  • Growth: These ratios include year over year, 3-year average, 5-year average, and 10-year average growth rates for revenue, operating income, net income, and EPS.
  • Cash Flow: These ratios include operating cash flow growth % YOY, free cash flow growth % YOY, capital expenditure as a % of sales, and more.
  • Financial Health: These ratios include balance sheet items as a percentage of total assets and liabilities, and liquidity/financial health metrics such as current ratio, quick ratio, debt/equity, and financial leverage.
  • Efficiency Ratios: These ratios include days sales outstanding, days inventory, inventory turnover, asset turnover and more.
  • Valuation Ratios: These ratios include price to earnings (P/E), price to sales (P/S), price to book (P/B), and price to operating cash flow.

To get the key ratios:

aapl_key_ratios <- tq_get("AAPL", get = "key.ratios")
aapl_key_ratios
## # A tibble: 7 × 2
##             section               data
##               <chr>             <list>
## 1        Financials <tibble [150 × 5]>
## 2     Profitability <tibble [170 × 5]>
## 3            Growth <tibble [160 × 5]>
## 4         Cash Flow  <tibble [50 × 5]>
## 5  Financial Health <tibble [240 × 5]>
## 6 Efficiency Ratios  <tibble [80 × 5]>
## 7  Valuation Ratios  <tibble [40 × 5]>

The ratios can be filtered and unnested to peel away the hierarchical nesting layers and access the underlying data:

aapl_key_ratios %>%
    filter(section == "Valuation Ratios") %>%
    unnest()
## # A tibble: 40 × 6
##             section      sub.section group          category       date
##               <chr>            <chr> <dbl>             <chr>     <date>
## 1  Valuation Ratios Valuation Ratios    86 Price to Earnings 2007-12-31
## 2  Valuation Ratios Valuation Ratios    86 Price to Earnings 2008-12-31
## 3  Valuation Ratios Valuation Ratios    86 Price to Earnings 2009-12-31
## 4  Valuation Ratios Valuation Ratios    86 Price to Earnings 2010-12-31
## 5  Valuation Ratios Valuation Ratios    86 Price to Earnings 2011-12-30
## 6  Valuation Ratios Valuation Ratios    86 Price to Earnings 2012-12-31
## 7  Valuation Ratios Valuation Ratios    86 Price to Earnings 2013-12-31
## 8  Valuation Ratios Valuation Ratios    86 Price to Earnings 2014-12-31
## 9  Valuation Ratios Valuation Ratios    86 Price to Earnings 2015-12-31
## 10 Valuation Ratios Valuation Ratios    86 Price to Earnings 2016-12-30
## # ... with 30 more rows, and 1 more variables: value <dbl>

Once we have a section, we can quickly visualize the ratios:

aapl_key_ratios %>%
    filter(section == "Valuation Ratios") %>%
    unnest() %>%
    ggplot(aes(x = date, y = value)) + 
    geom_line(aes(col = forcats::fct_reorder2(category, date, value))) +
    labs(title = "10-Year Historical Valuation Ratios for AAPL", x = "", 
         y = "", col = "") 

The data source is Morningstar.

Economic Data:

A wealth of economic data can be extracted from the Federal Reserve Economic Data (FRED) database. The WTI Crude Oil Prices are shown below.

wti_price_usd <- tq_get("DCOILWTICO", get = "economic.data")
wti_price_usd 
## # A tibble: 2,612 × 2
##          date price
##        <date> <dbl>
## 1  2007-01-01    NA
## 2  2007-01-02 60.77
## 3  2007-01-03 58.31
## 4  2007-01-04 55.65
## 5  2007-01-05 56.29
## 6  2007-01-08 56.08
## 7  2007-01-09 55.65
## 8  2007-01-10 53.95
## 9  2007-01-11 51.91
## 10 2007-01-12 52.96
## # ... with 2,602 more rows

FRED contains over 10,000 data sets that are free to use. See the FRED categories to narrow down the database and find data codes.

Exchange Rates:

Exchange rates are entered as currency pairs using "/" notation (e.g. "EUR/USD"), with get = "exchange.rates".

eur_usd <- tq_get("EUR/USD", get = "exchange.rates", from = "2000-01-01")
eur_usd 
## # A tibble: 1,827 × 2
##          date exchange.rate
##        <date>         <dbl>
## 1  2012-01-07       1.27195
## 2  2012-01-08       1.27151
## 3  2012-01-09       1.27272
## 4  2012-01-10       1.27795
## 5  2012-01-11       1.27334
## 6  2012-01-12       1.27520
## 7  2012-01-13       1.27777
## 8  2012-01-14       1.26809
## 9  2012-01-15       1.26760
## 10 2012-01-16       1.26576
## # ... with 1,817 more rows

The data source is Oanda, and a list of currencies to compare can be found on Oanda’s currency converter. It may make more sense to get this data from FRED (see Economic Data) since Oanda limits the retrieval period to five years.

Metal Prices:

Metal prices are very similar to stock prices. Set get = "metal.prices" along with the appropriate commodity symbol (e.g. XAU (gold), XAG (silver), XPD (palladium), or XPT (platinum)).

plat_price_eur <- tq_get("plat", get = "metal.prices", 
                         from = "2000-01-01", base.currency = "EUR")
plat_price_eur 
## # A tibble: 1,827 × 2
##          date   price
##        <date>   <dbl>
## 1  2012-01-07 1104.61
## 2  2012-01-08 1097.12
## 3  2012-01-09 1096.08
## 4  2012-01-10 1091.59
## 5  2012-01-11 1095.54
## 6  2012-01-12 1093.95
## 7  2012-01-13 1091.75
## 8  2012-01-14 1100.09
## 9  2012-01-15 1171.90
## 10 2012-01-16 1173.60
## # ... with 1,817 more rows

The data source is Oanda. It may make more sense to get this data from FRED (see Economic Data) since Oanda limits the retrieval period to five years.

Transform and Mutate Quantitative Data

Transform and mutate functions enable the xts, quantmod and TTR functions to shine (see Leverage the Quantitative Power of xts, quantmod and TTR):

Transform Quantitative Data, tq_transform():

Transforms the results of tq_get(). The result is typically a different shape than the input (hence “transformed”), although this is not a requirement. An example is periodicity aggregation from daily to monthly.

fb_prices <- tq_get("FB") 
fb_prices %>%
    tq_transform(ohlc_fun = OHLCV, transform_fun = to.monthly)
## # A tibble: 57 × 6
##        date  open  high   low close    volume
##       <chr> <dbl> <dbl> <dbl> <dbl>     <dbl>
## 1  May 2012 28.55 29.67 26.83 29.60 111639200
## 2  Jun 2012 31.92 31.99 30.76 31.10  19526900
## 3  Jul 2012 23.37 23.37 21.61 21.71  56179400
## 4  Aug 2012 18.68 18.70 18.03 18.06  58764200
## 5  Sep 2012 20.57 21.95 20.50 21.66  65486000
## 6  Oct 2012 20.82 21.50 20.73 21.11  99378200
## 7  Nov 2012 27.26 28.00 26.76 28.00 127049600
## 8  Dec 2012 26.20 26.99 26.11 26.62  60374500
## 9  Jan 2013 29.15 31.47 28.74 30.98 190744900
## 10 Feb 2013 26.84 27.30 26.34 27.25  83027800
## # ... with 47 more rows

Let’s go through what happened. ohlc_fun is one of the quantmod Open, High, Low, Close (OHLC) functions (see ?quantmod::OHLC); it selects a column or set of columns from the data, which is then passed to the transform_fun. In the example above, OHLCV selects the full set of prices and volumes from the data and sends them to the transform function, to.monthly, which converts the periodicity from daily to monthly. Additional arguments can be passed to the transform_fun by way of ....
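As a quick illustration of passing arguments through ..., the sketch below forwards period and type to quantmod::periodReturn, using the fb_prices tibble from above. Treat it as a sketch rather than vignette output, since results depend on the data retrieved.

```r
# Forward period and type through ... to quantmod::periodReturn;
# Ad() selects the adjusted close column
fb_prices %>%
    tq_transform(ohlc_fun = Ad, transform_fun = periodReturn,
                 period = "yearly", type = "log")
```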

Mutate Quantitative Data, tq_mutate():

Adds a column or set of columns to the tibble with the calculated attributes (hence the original tibble is returned, mutated with the additional columns). An example is getting the MACD from Cl (close price), which mutates the original input by adding MACD and Signal columns.

fb_prices %>%
    tq_mutate(ohlc_fun = Cl, mutate_fun = MACD)
## # A tibble: 1,167 × 9
##          date  open  high   low close    volume adjusted  macd signal
##        <date> <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl>
## 1  2012-05-18 42.05 45.00 38.00 38.23 573576400    38.23    NA     NA
## 2  2012-05-21 36.53 36.66 33.00 34.03 168192700    34.03    NA     NA
## 3  2012-05-22 32.61 33.59 30.94 31.00 101786600    31.00    NA     NA
## 4  2012-05-23 31.37 32.50 31.36 32.00  73600000    32.00    NA     NA
## 5  2012-05-24 32.95 33.21 31.77 33.03  50237200    33.03    NA     NA
## 6  2012-05-25 32.90 32.95 31.11 31.91  37149800    31.91    NA     NA
## 7  2012-05-29 31.48 31.69 28.65 28.84  78063400    28.84    NA     NA
## 8  2012-05-30 28.70 29.55 27.86 28.19  57267900    28.19    NA     NA
## 9  2012-05-31 28.55 29.67 26.83 29.60 111639200    29.60    NA     NA
## 10 2012-06-01 28.89 29.15 27.39 27.72  41855500    27.72    NA     NA
## # ... with 1,157 more rows

Note that a mutation can occur if, and only if, the result has the same structure as the original tibble. In other words, the calculation must return the same number of rows and row names (or date fields); otherwise the mutation cannot be performed.
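For example, TTR::SMA returns one value per input row (leading NAs pad the initial window), so it satisfies this requirement; a minimal sketch using fb_prices from above:

```r
# SMA preserves the row count, so tq_mutate can bind it as a new column
fb_prices %>%
    tq_mutate(ohlc_fun = Cl, mutate_fun = SMA, n = 50)
```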

xy Variants, tq_transform_xy() and tq_mutate_xy():

Enables working with:

  1. Transformation functions that require two primary inputs (e.g. EVWMA, VWAP, etc)
  2. Data that is not in OHLC format.

Transformation with two primary inputs:

EVWMA (exponential volume-weighted moving average) requires two inputs, price and volume, which cannot be selected with a single OHLC function. To work with these columns, we can switch to the xy variants, tq_transform_xy() and tq_mutate_xy(). The only difference is that instead of an ohlc_fun argument, you use the x and y arguments to pass the columns needed, based on the transform_fun or mutate_fun documentation.

fb_prices %>%
    tq_mutate_xy(x = close, y = volume, mutate_fun = EVWMA)
## # A tibble: 1,167 × 8
##          date  open  high   low close    volume adjusted    v1
##        <date> <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl> <dbl>
## 1  2012-05-18 42.05 45.00 38.00 38.23 573576400    38.23    NA
## 2  2012-05-21 36.53 36.66 33.00 34.03 168192700    34.03    NA
## 3  2012-05-22 32.61 33.59 30.94 31.00 101786600    31.00    NA
## 4  2012-05-23 31.37 32.50 31.36 32.00  73600000    32.00    NA
## 5  2012-05-24 32.95 33.21 31.77 33.03  50237200    33.03    NA
## 6  2012-05-25 32.90 32.95 31.11 31.91  37149800    31.91    NA
## 7  2012-05-29 31.48 31.69 28.65 28.84  78063400    28.84    NA
## 8  2012-05-30 28.70 29.55 27.86 28.19  57267900    28.19    NA
## 9  2012-05-31 28.55 29.67 26.83 29.60 111639200    29.60    NA
## 10 2012-06-01 28.89 29.15 27.39 27.72  41855500    27.72 27.72
## # ... with 1,157 more rows

Working with non-OHLC data:

Returns from FRED, Oanda, and other sources do not have the open, high, low, close, and volume (OHLCV) format. The following example shows how to transform WTI Crude daily prices to monthly prices. Since we only have a single column to pass, set x = price and leave y = NULL (the default). This sends the price column to the to.period transformation function.

wti_prices <- tq_get("DCOILWTICO", get = "economic.data") 
wti_prices %>%    
    tq_transform_xy(x = price, transform_fun = to.period,
                    period = "months")
## # A tibble: 121 × 2
##          date price
##        <dttm> <dbl>
## 1  2007-01-31 58.17
## 2  2007-02-28 61.78
## 3  2007-03-30 65.94
## 4  2007-04-30 65.78
## 5  2007-05-31 64.02
## 6  2007-06-29 70.47
## 7  2007-07-31 78.20
## 8  2007-08-31 73.98
## 9  2007-09-28 81.64
## 10 2007-10-31 94.16
## # ... with 111 more rows

Coercing Time Series Objects To and From Tibble

Sometimes you want to work with a tibble and other times with an xts object. The as_tibble() and as_xts() functions are the key.

Coerce from time-series to tibble, as_tibble():

The tidyquant::as_tibble() function includes a preserve_row_names argument, which is useful when coercing one of the many time formats (e.g. xts, zoo, timeSeries, ts) or matrix objects that contain valuable information in the row names. This makes bridging the gap between the various quantitative analysis packages and the tidyverse much easier.

Let’s start with an xts object.

# Create xts object from a matrix
vals = matrix(c(500, 504, 503))
date = c("2016-01-01", "2016-01-02", "2016-01-03") 
rownames(vals) <- date
time_series_xts <- as_xts(vals)
time_series_xts
##            [,1]
## 2016-01-01  500
## 2016-01-02  504
## 2016-01-03  503

We can easily coerce to a tibble by setting preserve_row_names = TRUE. Note the returned column is row.names with class character.

time_series_tbl <- as_tibble(time_series_xts, preserve_row_names = TRUE)
time_series_tbl
## # A tibble: 3 × 2
##    row.names    V1
##        <chr> <dbl>
## 1 2016-01-01   500
## 2 2016-01-02   504
## 3 2016-01-03   503

Converting to a Date class takes one extra step with lubridate.

time_series_tbl <- time_series_tbl %>%
    mutate(row.names = lubridate::ymd(row.names))
time_series_tbl
## # A tibble: 3 × 2
##    row.names    V1
##       <date> <dbl>
## 1 2016-01-01   500
## 2 2016-01-02   504
## 3 2016-01-03   503

Coerce from tibble to xts, as_xts():

We can convert back to xts with the tidyquant as_xts() function. Make sure to set the date column (date_col) argument to the column name containing the date (date_col = row.names). The date column must be in a date format (inherits either Date or POSIXct classes).

time_series_xts <- time_series_tbl %>%
    as_xts(date_col = row.names)
time_series_xts
##             V1
## 2016-01-01 500
## 2016-01-02 504
## 2016-01-03 503

Working in the tidyverse

You probably already know and love tidyverse packages like ggplot2, dplyr, tidyr, purrr, readr, and tibble, along with lubridate for working with dates and datetimes. tidyquant works solely with tibbles, so all of the tidyverse functionality remains intact.

A simple example inspired by Kan Nishida’s blog shows the dplyr and lubridate capability: say we want the growth of the stock over the past year. We can do this with a few dplyr operations.

Getting the last year is simple with dplyr and lubridate. We first select the date and adjusted price (adjusted for stock splits). We then filter using lubridate date functions.

aapl_prices %>%
    select(date, adjusted) %>%
    filter(date >= today() - years(1))
## # A tibble: 253 × 2
##          date adjusted
##        <date>    <dbl>
## 1  2016-01-07 94.35077
## 2  2016-01-08 94.84967
## 3  2016-01-11 96.38550
## 4  2016-01-12 97.78438
## 5  2016-01-13 95.27031
## 6  2016-01-14 97.35395
## 7  2016-01-15 95.01597
## 8  2016-01-19 94.55621
## 9  2016-01-20 94.68337
## 10 2016-01-21 94.20404
## # ... with 243 more rows

We can also get a baseline price using the first() function. Adding to our workflow, this looks like:

aapl_prices %>%
    select(date, adjusted) %>%
    filter(date >= today() - years(1)) %>%
    mutate(baseline = first(adjusted))
## # A tibble: 253 × 3
##          date adjusted baseline
##        <date>    <dbl>    <dbl>
## 1  2016-01-07 94.35077 94.35077
## 2  2016-01-08 94.84967 94.35077
## 3  2016-01-11 96.38550 94.35077
## 4  2016-01-12 97.78438 94.35077
## 5  2016-01-13 95.27031 94.35077
## 6  2016-01-14 97.35395 94.35077
## 7  2016-01-15 95.01597 94.35077
## 8  2016-01-19 94.55621 94.35077
## 9  2016-01-20 94.68337 94.35077
## 10 2016-01-21 94.20404 94.35077
## # ... with 243 more rows

Columns for growth and growth percent versus the baseline can now be added. We tack on a final select statement to remove the unnecessary columns. The final workflow looks like this:

aapl_growth <- aapl_prices %>%
    select(date, adjusted) %>%
    filter(date >= today() - years(1)) %>%
    mutate(baseline = first(adjusted),
           growth = adjusted - baseline,
           growth.pct = growth / baseline) %>%
    select(-(baseline:growth))
aapl_growth
## # A tibble: 253 × 3
##          date adjusted   growth.pct
##        <date>    <dbl>        <dbl>
## 1  2016-01-07 94.35077  0.000000000
## 2  2016-01-08 94.84967  0.005287736
## 3  2016-01-11 96.38550  0.021565601
## 4  2016-01-12 97.78438  0.036391934
## 5  2016-01-13 95.27031  0.009746004
## 6  2016-01-14 97.35395  0.031829958
## 7  2016-01-15 95.01597  0.007050287
## 8  2016-01-19 94.55621  0.002177364
## 9  2016-01-20 94.68337  0.003525186
## 10 2016-01-21 94.20404 -0.001555144
## # ... with 243 more rows

And, we can quickly plot using ggplot2.

aapl_growth %>%
    ggplot(aes(x = date, y = growth.pct)) + 
    geom_line() +
    labs(title = "AAPL: Growth Over One Year", x = "", y = "Growth") +
    scale_y_continuous(labels = scales::percent)

Leverage the Quantitative Power of xts, quantmod and TTR

You may already know and love xts / zoo, quantmod, and TTR, which is why the core functionality remains fully intact. Using tq_transform() and tq_mutate(), we can apply the zoo, xts, quantmod, and TTR functions. Entering tq_transform_fun_options() returns a list of the transform functions from each package. We’ll briefly discuss these options by package.

tq_transform_fun_options() %>% str()
## List of 4
##  $ zoo     : chr [1:14] "rollapply" "rollapplyr" "rollmax" "rollmax.default" ...
##  $ xts     : chr [1:27] "apply.daily" "apply.monthly" "apply.quarterly" "apply.weekly" ...
##  $ quantmod: chr [1:25] "allReturns" "annualReturn" "ClCl" "dailyReturn" ...
##  $ TTR     : chr [1:61] "adjRatios" "ADX" "ALMA" "aroon" ...

zoo Functionality

# Get zoo functions that work with tq_transform and tq_mutate
tq_transform_fun_options()$zoo
##  [1] "rollapply"          "rollapplyr"         "rollmax"           
##  [4] "rollmax.default"    "rollmaxr"           "rollmean"          
##  [7] "rollmean.default"   "rollmeanr"          "rollmedian"        
## [10] "rollmedian.default" "rollmedianr"        "rollsum"           
## [13] "rollsum.default"    "rollsumr"

The zoo functions that are compatible are listed above. Generally speaking, these are the:

  • Roll Apply Functions:
    • A generic function for applying a function to rolling margins.
    • Form: rollapply(data, width, FUN, ..., by = 1, by.column = TRUE, fill = if (na.pad) NA, na.pad = FALSE, partial = FALSE, align = c("center", "left", "right"), coredata = TRUE).
    • Options include rollmax, rollmean, rollmedian, rollsum, etc.
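To see rollapply in action, the sketch below computes a 30-day rolling mean of the close price; fill = NA pads the result so the row count matches the input and the mutation succeeds (a sketch, assuming the fb_prices tibble from earlier):

```r
# 30-day rolling mean of the close; fill = NA keeps the same number of rows
fb_prices %>%
    tq_mutate(ohlc_fun = Cl, mutate_fun = rollapply,
              width = 30, FUN = mean, fill = NA)
```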

xts Functionality

# Get xts functions that work with tq_transform and tq_mutate
tq_transform_fun_options()$xts
##  [1] "apply.daily"     "apply.monthly"   "apply.quarterly"
##  [4] "apply.weekly"    "apply.yearly"    "diff.xts"       
##  [7] "lag.xts"         "period.apply"    "period.max"     
## [10] "period.min"      "period.prod"     "period.sum"     
## [13] "periodicity"     "to.daily"        "to.hourly"      
## [16] "to.minutes"      "to.minutes10"    "to.minutes15"   
## [19] "to.minutes3"     "to.minutes30"    "to.minutes5"    
## [22] "to.monthly"      "to.period"       "to.quarterly"   
## [25] "to.weekly"       "to.yearly"       "to_period"

The xts functions that are compatible are listed above. Generally speaking, these are the:

  • Period Apply Functions:
    • Apply a function to a time segment (e.g. max, min, mean, etc).
    • Form: apply.daily(x, FUN, ...).
    • Options include apply.daily, weekly, monthly, quarterly, yearly.
  • To-Period Functions:
    • Convert a time series to time series of lower periodicity (e.g. convert daily to monthly periodicity).
    • Form: to.period(x, period = 'months', k = 1, indexAt, name = NULL, OHLC = TRUE, ...).
    • Options include to.minutes, hourly, daily, weekly, monthly, quarterly, yearly.
    • Note 1 (Important): The return structure is different for to.period and the to.monthly (to.weekly, to.quarterly, etc) forms. to.period returns a date, while to.monthly returns a character MON YYYY. Best to use to.period if you want to work with time series via lubridate.

quantmod Functionality

# Get quantmod functions that work with tq_transform and tq_mutate
tq_transform_fun_options()$quantmod
##  [1] "allReturns"      "annualReturn"    "ClCl"           
##  [4] "dailyReturn"     "Delt"            "HiCl"           
##  [7] "Lag"             "LoCl"            "LoHi"           
## [10] "monthlyReturn"   "Next"            "OpCl"           
## [13] "OpHi"            "OpLo"            "OpOp"           
## [16] "periodReturn"    "quarterlyReturn" "seriesAccel"    
## [19] "seriesDecel"     "seriesDecr"      "seriesHi"       
## [22] "seriesIncr"      "seriesLo"        "weeklyReturn"   
## [25] "yearlyReturn"

The quantmod functions that are compatible are listed above. Generally speaking, these are the:

  • Percentage Change (Delt) and Lag Functions
    • Delt: Delt(x1, x2 = NULL, k = 0, type = c("arithmetic", "log"))
      • Variations of Delt: ClCl, HiCl, LoCl, LoHi, OpCl, OpHi, OpLo, OpOp
      • Form: OpCl(OHLC)
    • Lag: Lag(x, k = 1) / Next: Next(x, k = 1) (Can also use dplyr::lag and dplyr::lead)
  • Period Return Functions:
    • Get the arithmetic or logarithmic returns for various periodicities, which include daily, weekly, monthly, quarterly, and yearly.
    • Form: periodReturn(x, period = 'monthly', subset = NULL, type = 'arithmetic', leading = TRUE, ...)
  • Series Functions:
    • Return values that describe the series. Options include describing the increases/decreases, accelerations/decelerations, and hi/low.
    • Forms: seriesHi(x), seriesIncr(x, thresh = 0, diff. = 1L), seriesAccel(x)

TTR Functionality

# Get TTR functions that work with tq_transform and tq_mutate
tq_transform_fun_options()$TTR
##  [1] "adjRatios"          "ADX"                "ALMA"              
##  [4] "aroon"              "ATR"                "BBands"            
##  [7] "CCI"                "chaikinAD"          "chaikinVolatility" 
## [10] "CLV"                "CMF"                "CMO"               
## [13] "DEMA"               "DonchianChannel"    "DPO"               
## [16] "DVI"                "EMA"                "EMV"               
## [19] "EVWMA"              "GMMA"               "growth"            
## [22] "HMA"                "KST"                "lags"              
## [25] "MACD"               "MFI"                "momentum"          
## [28] "OBV"                "PBands"             "ROC"               
## [31] "rollSFM"            "RSI"                "runCor"            
## [34] "runCov"             "runMAD"             "runMax"            
## [37] "runMean"            "runMedian"          "runMin"            
## [40] "runPercentRank"     "runSD"              "runSum"            
## [43] "runVar"             "SAR"                "SMA"               
## [46] "SMI"                "stoch"              "TDI"               
## [49] "TRIX"               "ultimateOscillator" "VHF"               
## [52] "VMA"                "volatility"         "VWAP"              
## [55] "VWMA"               "wilderSum"          "williamsAD"        
## [58] "WMA"                "WPR"                "ZigZag"            
## [61] "ZLEMA"

Here’s a brief description of the most popular functions from TTR:

  • Welles Wilder’s Directional Movement Index:
    • ADX(HLC, n = 14, maType, ...)
  • Bollinger Bands:
    • BBands(HLC, n = 20, maType, sd = 2, ...): Bollinger Bands
  • Rate of Change / Momentum:
    • ROC(x, n = 1, type = c("continuous", "discrete"), na.pad = TRUE): Rate of Change
    • momentum(x, n = 1, na.pad = TRUE): Momentum
  • Moving Averages (maType):
    • SMA(x, n = 10, ...): Simple Moving Average
    • EMA(x, n = 10, wilder = FALSE, ratio = NULL, ...): Exponential Moving Average
    • DEMA(x, n = 10, v = 1, wilder = FALSE, ratio = NULL): Double Exponential Moving Average
    • WMA(x, n = 10, wts = 1:n, ...): Weighted Moving Average
    • EVWMA(price, volume, n = 10, ...): Elastic, Volume-Weighted Moving Average
    • ZLEMA(x, n = 10, ratio = NULL, ...): Zero Lag Exponential Moving Average
    • VWAP(price, volume, n = 10, ...): Volume-Weighted Moving Average Price
    • VMA(x, w, ratio = 1, ...): Variable-Length Moving Average
    • HMA(x, n = 20, ...): Hull Moving Average
    • ALMA(x, n = 9, offset = 0.85, sigma = 6, ...): Arnaud Legoux Moving Average
  • MACD Oscillator:
    • MACD(x, nFast = 12, nSlow = 26, nSig = 9, maType, percent = TRUE, ...)
  • Relative Strength Index:
    • RSI(price, n = 14, maType, ...)
  • runFun:
    • runSum(x, n = 10, cumulative = FALSE): returns sums over an n-period moving window.
    • runMin(x, n = 10, cumulative = FALSE): returns minimums over an n-period moving window.
    • runMax(x, n = 10, cumulative = FALSE): returns maximums over an n-period moving window.
    • runMean(x, n = 10, cumulative = FALSE): returns means over an n-period moving window.
    • runMedian(x, n = 10, non.unique = "mean", cumulative = FALSE): returns medians over an n-period moving window.
    • runCov(x, y, n = 10, use = "all.obs", sample = TRUE, cumulative = FALSE): returns covariances over an n-period moving window.
    • runCor(x, y, n = 10, use = "all.obs", sample = TRUE, cumulative = FALSE): returns correlations over an n-period moving window.
    • runVar(x, y = NULL, n = 10, sample = TRUE, cumulative = FALSE): returns variances over an n-period moving window.
    • runSD(x, n = 10, sample = TRUE, cumulative = FALSE): returns standard deviations over an n-period moving window.
    • runMAD(x, n = 10, center = NULL, stat = "median", constant = 1.4826, non.unique = "mean", cumulative = FALSE): returns median/mean absolute deviations over an n-period moving window.
    • wilderSum(x, n = 10): returns a Welles Wilder style weighted sum over an n-period moving window.
  • Stochastic Oscillator / Stochastic Momentum Index:
    • stoch(HLC, nFastK = 14, nFastD = 3, nSlowD = 3, maType, bounded = TRUE, smooth = 1, ...): Stochastic Oscillator
    • SMI(HLC, n = 13, nFast = 2, nSlow = 25, nSig = 9, maType, bounded = TRUE, ...): Stochastic Momentum Index
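
To make one of these concrete: the fast %K line that stoch() computes is essentially the position of the close within the trailing n-period high-low range. Here is a minimal base R sketch of that idea (an illustration only; TTR::stoch() also handles smoothing and the %D lines):

```r
# Fast %K: where the close sits within the trailing n-period high-low range
# (illustration only -- TTR::stoch() adds smoothing, %D, and more)
fast_k <- function(high, low, close, n = 14) {
    k <- rep(NA_real_, length(close))
    for (i in n:length(close)) {
        hh <- max(high[(i - n + 1):i])  # highest high in window
        ll <- min(low[(i - n + 1):i])   # lowest low in window
        k[i] <- (close[i] - ll) / (hh - ll)
    }
    k
}

fast_k(c(10, 12, 11), c(8, 9, 9), c(9, 11, 10), n = 3)
## [1]  NA  NA 0.5
```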

Quantitative Power In Action

We’ll go through some examples, but first let’s get some data. The default for tq_get() is get = "stock.prices", so all we need is to give x a stock symbol.

AAPL <- tq_get("AAPL")

Example 1: Getting the Max Close Price for Each Quarter

The xts::apply.quarterly() function that is part of the period apply group can be used to apply functions by quarterly time segments. Because we are seeking a return structure that is on a different time scale than the input (quarterly versus daily), we need to use a transform function. We select tq_transform and pass the close price using OHLC format via ohlc_fun = Cl, and we send this subset of the data to the apply.quarterly function via the transform_fun argument. Looking at the documentation for apply.quarterly, we see that we can pass a function to the argument, FUN. We want the maximum values, so we set FUN = max. The result is the quarters returned as a date and the maximum closing price during the quarter returned as a double.

AAPL %>%
    tq_transform(ohlc_fun = Cl, transform_fun = apply.quarterly, FUN = max)
## # A tibble: 41 × 2
##          date  close
##        <dttm>  <dbl>
## 1  2007-03-30  97.10
## 2  2007-06-29 125.09
## 3  2007-09-28 154.50
## 4  2007-12-31 199.83
## 5  2008-03-31 194.93
## 6  2008-06-30 189.96
## 7  2008-09-30 179.55
## 8  2008-12-31 111.04
## 9  2009-03-31 109.87
## 10 2009-06-30 144.67
## # ... with 31 more rows

Note that as an alternative you could use the xy form via tq_transform_xy(), replacing ohlc_fun = Cl with x = close.
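
Assuming the default close column name from tq_get(), the xy form of the same transform would look like this (same result as above):

```r
AAPL %>%
    tq_transform_xy(x = close, transform_fun = apply.quarterly, FUN = max)
```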

Example 2: Getting Daily Log Returns

The quantmod::periodReturn() function generates returns by periodicity. We have a few options here. Normally I go with a transform function, tq_transform, because the periodReturn function accepts different periodicity options, and any periodicity other than daily changes the number of rows, which breaks a mutation. But in our situation the period returns periodicity is the same as the stock prices periodicity (both daily), so we can use either. We want to use the adjusted closing prices column (adjusted for stock splits, which can otherwise make a stock appear to perform poorly when a split occurs), so we set ohlc_fun = Ad. Researching the periodReturn function, we find that it accepts type = "log" and period = "daily", which returns the daily log returns.

AAPL %>%
    tq_transform(ohlc_fun = Ad, transform_fun = periodReturn, 
                 type = "log", period = "daily")
## # A tibble: 2,522 × 2
##          date daily.returns
##        <dttm>         <dbl>
## 1  2007-01-03   0.000000000
## 2  2007-01-04   0.021952915
## 3  2007-01-05  -0.007146656
## 4  2007-01-08   0.004926215
## 5  2007-01-09   0.079799676
## 6  2007-01-10   0.046745766
## 7  2007-01-11  -0.012448172
## 8  2007-01-12  -0.012393888
## 9  2007-01-16   0.025872526
## 10 2007-01-17  -0.022390960
## # ... with 2,512 more rows
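
As a sanity check on the arithmetic, a daily log return is just log(P_t / P_(t-1)), which base R computes with diff(log(prices)) (synthetic prices below, not AAPL data):

```r
prices <- c(100, 102, 101)        # synthetic adjusted closes (hypothetical)
log_returns <- diff(log(prices))  # log(P_t / P_(t-1)) for each day
round(log_returns, 6)
## [1]  0.019803 -0.009852
```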

Example 3: Adding MACD and Bollinger Bands to an OHLC data set

In reviewing the available options in the TTR package, we see that the MACD and BBands functions will get us where we need to be. In researching the documentation, we find that the return is in the same periodicity as the input and that the functions work with OHLC inputs, so we can use tq_mutate(). MACD requires a price, so we select the close using Cl; BBands requires high, low, and close prices, so we use HLC. We can chain the inputs together using the pipe (%>%) since mutate just adds columns. The result is a tibble containing the MACD and Bollinger Band results.

AAPL %>%
    tq_mutate(Cl, MACD) %>%
    tq_mutate(HLC, BBands)
## # A tibble: 2,522 × 13
##          date  open  high   low close    volume adjusted  macd signal
##        <date> <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl>
## 1  2007-01-03 86.29 86.58 81.90 83.80 309579900 10.90416    NA     NA
## 2  2007-01-04 84.05 85.95 83.82 85.66 211815100 11.14619    NA     NA
## 3  2007-01-05 85.77 86.20 84.40 85.05 208685400 11.06681    NA     NA
## 4  2007-01-08 85.96 86.53 85.28 85.47 199276700 11.12147    NA     NA
## 5  2007-01-09 86.45 92.98 85.15 92.57 837324600 12.04533    NA     NA
## 6  2007-01-10 94.75 97.80 93.45 97.00 738220000 12.62176    NA     NA
## 7  2007-01-11 95.94 96.78 95.10 95.80 360063200 12.46562    NA     NA
## 8  2007-01-12 94.59 95.06 93.23 94.62 328172600 12.31207    NA     NA
## 9  2007-01-16 95.68 97.25 95.45 97.10 311019100 12.63477    NA     NA
## 10 2007-01-17 97.56 97.60 94.82 94.95 411565000 12.35501    NA     NA
## # ... with 2,512 more rows, and 4 more variables: dn <dbl>, mavg <dbl>,
## #   up <dbl>, pctb <dbl>

Note that for the MACD, we could have used tq_mutate_xy(), setting x = close. However, for the BBands, we are forced to use tq_mutate() because of the HLC input.

Example 4: Getting the Percentage Difference Between Open and Close from Zero to Five Periods

We can’t use the OpCl function for this task since it only returns the percentage difference for a period lag of zero. We keep digging and find the Delt function from quantmod. In researching the function, we see that Delt takes one or two price inputs, a series of lags k, and a type of difference, either arithmetic or log. We will set x = open, y = close, and k = 0:5 to get lags of zero through five periods. The default type = "arithmetic" is acceptable, so there is no need to specify it. The result is the percentage difference between the open and close prices for periods zero to five.

AAPL %>%
    tq_mutate_xy(x = open, y = close, mutate_fun = Delt, k = 0:5) %>%
    select(-c(high, low, volume, adjusted))
## # A tibble: 2,522 × 9
##          date  open close delt.0.arithmetic delt.1.arithmetic
##        <date> <dbl> <dbl>             <dbl>             <dbl>
## 1  2007-01-03 86.29 83.80     -0.0288561482                NA
## 2  2007-01-04 84.05 85.66      0.0191552288      -0.007300974
## 3  2007-01-05 85.77 85.05     -0.0083945785       0.011897632
## 4  2007-01-08 85.96 85.47     -0.0057003026      -0.003497726
## 5  2007-01-09 86.45 92.57      0.0707923631       0.076896291
## 6  2007-01-10 94.75 97.00      0.0237466810       0.122035785
## 7  2007-01-11 95.94 95.80     -0.0014592141       0.011081837
## 8  2007-01-12 94.59 94.62      0.0003171688      -0.013758568
## 9  2007-01-16 95.68 97.10      0.0148411267       0.026535542
## 10 2007-01-17 97.56 94.95     -0.0267528282      -0.007629630
## # ... with 2,512 more rows, and 4 more variables: delt.2.arithmetic <dbl>,
## #   delt.3.arithmetic <dbl>, delt.4.arithmetic <dbl>,
## #   delt.5.arithmetic <dbl>

For comparison we’ll inspect the output from the OpCl() function using tq_mutate(). We send OHLC prices to the OpCl function. As expected the OpCl.. column returned is the same as Delt.0.arithmetic from above.

AAPL %>%
    tq_mutate(OHLC, OpCl) %>%
    select(-c(high, low, volume, adjusted))
## # A tibble: 2,522 × 4
##          date  open close        opcl..
##        <date> <dbl> <dbl>         <dbl>
## 1  2007-01-03 86.29 83.80 -0.0288561482
## 2  2007-01-04 84.05 85.66  0.0191552288
## 3  2007-01-05 85.77 85.05 -0.0083945785
## 4  2007-01-08 85.96 85.47 -0.0057003026
## 5  2007-01-09 86.45 92.57  0.0707923631
## 6  2007-01-10 94.75 97.00  0.0237466810
## 7  2007-01-11 95.94 95.80 -0.0014592141
## 8  2007-01-12 94.59 94.62  0.0003171688
## 9  2007-01-16 95.68 97.10  0.0148411267
## 10 2007-01-17 97.56 94.95 -0.0267528282
## # ... with 2,512 more rows
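
For reference, the arithmetic type of Delt at lag k computes (y[t] - x[t-k]) / x[t-k]. A base R sketch, checked against the first row of the output above (an illustration, not quantmod's implementation):

```r
# Arithmetic percentage difference between y and x lagged k periods
# (hypothetical helper for illustration; quantmod::Delt handles more cases)
delt_arith <- function(x, y, k = 0) {
    n <- length(y)
    lagged_x <- c(rep(NA_real_, k), x[seq_len(n - k)])  # x shifted back k periods
    (y - lagged_x) / lagged_x
}

round(delt_arith(86.29, 83.80, k = 0), 7)  # first row, lag 0
## [1] -0.0288561
```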

Example 5: Get the 5, 10, 15-Day Rolling Minimum and Maximum Values of the Adjusted Prices

Rolling functions come from the zoo package. In reviewing the available options, we see that rollmax is present, but there is no rollmin. However, the generic rollapply function works with any function, so let’s use that. In reviewing the documentation, the rollapply() function takes three primary arguments: data, width and FUN. The data is passed using the ohlc_fun argument, so we don’t need to worry about it. The width is the number of periods over which to apply the function; for our situation this is 5, 10, and 15. The FUN is the function to apply; for our situation this is min and max. We want the result to add columns to our data set, so we use the tq_mutate() function, which allows us to pipe (%>%) each mutation. Putting it all together:

AAPL %>%
    tq_mutate(Ad, rollapply, width = 5, FUN = min) %>%
    tq_mutate(Ad, rollapply, width = 10, FUN = min) %>%
    tq_mutate(Ad, rollapply, width = 15, FUN = min) %>%
    tq_mutate(Ad, rollapply, width = 5, FUN = max) %>%
    tq_mutate(Ad, rollapply, width = 10, FUN = max) %>%
    tq_mutate(Ad, rollapply, width = 15, FUN = max)
## # A tibble: 2,522 × 13
##          date  open  high   low close    volume adjusted rollapply
##        <date> <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>     <dbl>
## 1  2007-01-03 86.29 86.58 81.90 83.80 309579900 10.90416        NA
## 2  2007-01-04 84.05 85.95 83.82 85.66 211815100 11.14619        NA
## 3  2007-01-05 85.77 86.20 84.40 85.05 208685400 11.06681        NA
## 4  2007-01-08 85.96 86.53 85.28 85.47 199276700 11.12147        NA
## 5  2007-01-09 86.45 92.98 85.15 92.57 837324600 12.04533  10.90416
## 6  2007-01-10 94.75 97.80 93.45 97.00 738220000 12.62176  11.06681
## 7  2007-01-11 95.94 96.78 95.10 95.80 360063200 12.46562  11.06681
## 8  2007-01-12 94.59 95.06 93.23 94.62 328172600 12.31207  11.12147
## 9  2007-01-16 95.68 97.25 95.45 97.10 311019100 12.63477  12.04533
## 10 2007-01-17 97.56 97.60 94.82 94.95 411565000 12.35501  12.31207
## # ... with 2,512 more rows, and 5 more variables: rollapply.1 <dbl>,
## #   rollapply.2 <dbl>, rollapply.3 <dbl>, rollapply.4 <dbl>,
## #   rollapply.5 <dbl>

Note that the new column names for the rolling minimums and maximums are just the mutate_fun name with a sequential suffix added, which is likely not descriptive enough for our applications. These can be renamed using the rename function (e.g. rename(rolling.min.5 = rollapply) for the five period rolling minimum).
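
Conceptually, each of these mutations is a simple windowed computation. The rolling minimum that rollapply(width = 5, FUN = min) produces can be sketched in base R as follows (a hypothetical helper for illustration, not zoo's implementation):

```r
# Rolling minimum over a trailing window, NA-padded like rollapply's output
roll_min <- function(x, width) {
    out <- rep(NA_real_, length(x))
    for (i in width:length(x)) {
        out[i] <- min(x[(i - width + 1):i])  # min over the trailing window
    }
    out
}

roll_min(c(3, 1, 4, 1, 5, 9, 2), width = 3)
## [1] NA NA  1  1  1  1  2
```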

Designed to be Used and Scaled with the tidyverse

Each function has one primary input and one output. This allows chaining operations with the pipe (%>%), and mapping to extend to lists of many stocks, exchange rates, metals, economic data, financial statements, etc. The rationale behind this is simple: let the function handle the operation, let the tidyverse handle the iteration.

Rather than explain, let’s go through a simple workflow using the tidyverse. We setup a two step workflow:

  1. Analyze a single stock
  2. Scale to many stocks

Analyze a Single Stock

In our hypothetical situation, we want to compare the mean monthly log returns (MMLR). First, let’s come up with a function to help us collect log returns. The function below performs three operations internally. It first gets the stock prices using tq_get(). Then, it transforms the stock prices to period returns using tq_transform(). We add the type = "log" and period = "monthly" arguments to ensure we retrieve a tibble of monthly log returns. Last, we take the mean of the monthly returns to get MMLR.

my_stock_analysis_fun <- function(stock.symbol) {
    period.returns <- stock.symbol %>%
        tq_get(get = "stock.prices") %>%
        tq_transform(ohlc_fun = Ad, transform_fun = periodReturn, 
                     type = "log", period = "monthly")
    mean(period.returns$monthly.returns)
}

And, let’s test it out. We now have the mean monthly log returns over the past ten years.

my_stock_analysis_fun("AAPL")
## [1] 0.01967584
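
At its core, the function reduces a stock to the mean of diff(log(prices)) at a monthly periodicity. We can sanity-check that arithmetic on synthetic month-end prices (hypothetical numbers, not AAPL data; note that periodReturn also includes a first-period return):

```r
month_end <- c(100, 110, 121)       # synthetic month-end adjusted closes
mmlr <- mean(diff(log(month_end)))  # mean monthly log return
all.equal(mmlr, log(1.1))           # both months grew exactly 10%
## [1] TRUE
```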

Extrapolate to Many Stocks using tidyverse

Now that we have one stock down, we can scale to many stocks. For brevity, we’ll randomly sample ten stocks from the S&P500 with a call to dplyr::sample_n().

set.seed(100)
stocks <- tq_get("SP500", get = "stock.index") %>%
    sample_n(10)
stocks
## # A tibble: 10 × 2
##    symbol            company
##     <chr>              <chr>
## 1     EMC                EMC
## 2    XRAY      DENTSPLY INTL
## 3     MNK   MALLINCKRODT PLC
## 4     AIG      AMERICAN INTL
## 5    INTC              INTEL
## 6     IVZ            INVESCO
## 7      SE     SPECTRA ENERGY
## 8    FLIR       FLIR SYSTEMS
## 9       L              LOEWS
## 10    CNP CENTERPOINT ENERGY

We can now apply our analysis function to the stocks using dplyr::mutate and purrr::map_dbl. The mutate() function adds a column to our tibble, and the map_dbl() function maps our my_stock_analysis_fun to our tibble of stocks using the symbol column.

stocks <- stocks %>%
    mutate(mmlr = map_dbl(symbol, my_stock_analysis_fun)) %>%
    arrange(desc(mmlr))
stocks
## # A tibble: 10 × 3
##    symbol            company         mmlr
##     <chr>              <chr>        <dbl>
## 1    INTC              INTEL  0.007399353
## 2    FLIR       FLIR SYSTEMS  0.007324195
## 3     EMC                EMC  0.007208655
## 4     CNP CENTERPOINT ENERGY  0.007100799
## 5      SE     SPECTRA ENERGY  0.006802450
## 6    XRAY      DENTSPLY INTL  0.005952766
## 7     IVZ            INVESCO  0.004606504
## 8     MNK   MALLINCKRODT PLC  0.003611393
## 9       L              LOEWS  0.001577270
## 10    AIG      AMERICAN INTL -0.023182725

And, we’re done! We now have the MMLR for 10 years of stock data for 10 stocks. And, we can easily extend this to larger lists or stock indexes. For example, the entire S&P500 could be analyzed by removing the sample_n() call that follows tq_get("SP500", get = "stock.index").

Function tq_get() Designed to Handle Errors Gracefully

Eventually you will run into a stock index, stock symbol, FRED data code, etc that cannot be retrieved. Possible reasons are:

  • The website changes
  • An index becomes out of date
  • A company goes private
  • A stock ticker symbol changes
  • Yahoo / FRED just doesn’t like your stock symbol / FRED code

This becomes painful when scaling if the functions return errors. So, the tq_get() function is designed to handle errors gracefully: when an error is generated, a NA value is returned along with a gentle warning. There are pros and cons to this approach that you may not agree with, but I believe it helps in the long run. Just be aware of what happens:

  • Pros: Long running scripts are not interrupted because of one error

  • Cons: Errors flow downstream if not looking at warnings and not reviewing results
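
The pattern behind this behavior is a tryCatch() that converts an error into a warning plus an NA return value. A toy sketch of the idea (not tidyquant's actual implementation):

```r
# Toy illustration: return NA with a warning instead of stopping on error
safe_fetch <- function(symbol) {
    tryCatch({
        if (symbol == "BAD APPLE") stop("symbol not found")  # simulated failure
        data.frame(symbol = symbol)                          # simulated success
    }, error = function(e) {
        warning("Error at stock symbol ", symbol, ": ", conditionMessage(e))
        NA
    })
}

is.na(suppressWarnings(safe_fetch("BAD APPLE")))
## [1] TRUE
```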

With tq_get(), Bad Apples Fail Gracefully:

Let’s see an example of mapping tq_get() to a long list of stocks with one BAD APPLE.

stock_list_with_one_bad_apple <- tibble( 
    symbol = c("AAPL", "GOOG", "AMZN", "FB", "BAD APPLE",
               "AVGO", "SWKS","NVDA", "V", "MA")
)
stock_list_with_one_bad_apple <- stock_list_with_one_bad_apple %>%
    mutate(stock.prices = map(.x = symbol, ~ tq_get(x = .x, get = "stock.prices")))
## Warning in value[[3L]](cond): Error at stock symbol BAD APPLE during call
## to quantmod::getSymbols.

We get warned that there was an issue in the operation. With that said, we still get the full list of stocks.

stock_list_with_one_bad_apple
## # A tibble: 10 × 2
##       symbol         stock.prices
##        <chr>               <list>
## 1       AAPL <tibble [2,522 × 7]>
## 2       GOOG <tibble [2,522 × 7]>
## 3       AMZN <tibble [2,522 × 7]>
## 4         FB <tibble [1,167 × 7]>
## 5  BAD APPLE            <lgl [1]>
## 6       AVGO <tibble [1,869 × 7]>
## 7       SWKS <tibble [2,522 × 7]>
## 8       NVDA <tibble [2,522 × 7]>
## 9          V <tibble [2,218 × 7]>
## 10        MA <tibble [2,522 × 7]>

Say hypothetically we didn’t recognize the error message. An error shows up during the next operation. As an example, we’ll attempt to get yearly period returns using tq_transform. The operation is wrapped in a tryCatch() statement to enable printing the error message.

tryCatch({
    stock_list_with_one_bad_apple %>%
    mutate(annual.returns = map(.x = stock.prices, 
                                ~ tq_transform(x = .x,
                                               ohlc_fun = Ad, 
                                               transform_fun = periodReturn, 
                                               period = "yearly")
                                )
           )
}, error = function(e) {
    print(e)
})
## <Rcpp::eval_error in eval(substitute(expr), envir, enclos): argument "data" is missing, with no default>

The operation grinds to a halt because the BAD APPLE row passed its stock.prices value of NA to the tq_transform() function. The error message tells us that data is not a tibble or data.frame.

The rationale behind the error handling approach is that long-running scripts should not fail during minor issues. For example, if you have a list of 3000 stocks and the 3000th is bad, the program could take 20+ minutes to fail. This is disheartening. We allow tq_get() to continue to fetch data even if an error is encountered. Failure occurs during tq_transform() and tq_mutate() to prevent the error from getting too far downstream.

Recognizing how tq_get() works (and gracefully fails), we can adjust our workflow. It’s a good idea to collect stock information in one independent step, review any warnings / errors, and remove “bad apples” if present before moving on to any transformations or mutations.

Here’s an example of a good workflow:

stock_list_with_one_bad_apple <- tibble( 
    symbol = c("AAPL", "GOOG", "AMZN", "FB", "BAD APPLE",
               "AVGO", "SWKS","NVDA", "V", "MA")
    ) %>%
    # Step 1: Get stock prices
    mutate(stock.prices = map(.x = symbol, ~ tq_get(x = .x, get = "stock.prices")),
           class = map_chr(.x = stock.prices, ~ class(.x)[[1]])) %>%
    # Step 2: Filter out errors; errors have a class of "logical"
    filter(class != "logical") %>%
    select(-class) %>%
    # Step 3: Perform period returns
    mutate(annual.returns = map(.x = stock.prices, 
                                ~ tq_transform(data = .x,
                                               ohlc_fun = Ad, 
                                               transform_fun = periodReturn, 
                                               period = "yearly")
                                )
           )
stock_list_with_one_bad_apple
## # A tibble: 9 × 3
##   symbol         stock.prices    annual.returns
##    <chr>               <list>            <list>
## 1   AAPL <tibble [2,522 × 7]> <tibble [11 × 2]>
## 2   GOOG <tibble [2,522 × 7]> <tibble [11 × 2]>
## 3   AMZN <tibble [2,522 × 7]> <tibble [11 × 2]>
## 4     FB <tibble [1,167 × 7]>  <tibble [6 × 2]>
## 5   AVGO <tibble [1,869 × 7]>  <tibble [9 × 2]>
## 6   SWKS <tibble [2,522 × 7]> <tibble [11 × 2]>
## 7   NVDA <tibble [2,522 × 7]> <tibble [11 × 2]>
## 8      V <tibble [2,218 × 7]> <tibble [10 × 2]>
## 9     MA <tibble [2,522 × 7]> <tibble [11 × 2]>

Fall Back for Stock Indexes:

There’s a fallback for the stock indexes too. Since the source, www.marketvolume.com, could change over time, an option is provided to pull stored data within the tidyquant package. The downside is that the data is only as accurate as the last update to tidyquant. Here’s how to get the stock indexes locally if for some reason the website is down or has changed.

tq_get("SP500", get = "stock.index", use_fallback = TRUE)
## Using fallback dataset last downloaded 2016-12-23.
## # A tibble: 501 × 2
##    symbol                   company
##     <chr>                     <chr>
## 1     MMM                        3M
## 2     ABT       ABBOTT LABORATORIES
## 3    ABBV                ABBVIE INC
## 4     ACN                 ACCENTURE
## 5    ATVI       ACTIVISION BLIZZARD
## 6     AYI             ACUITY BRANDS
## 7    ADBE             ADOBE SYSTEMS
## 8     AAP        ADVANCE AUTO PARTS
## 9     AET                     AETNA
## 10    AMG AFFILIATED MANAGERS GROUP
## # ... with 491 more rows

Recap

Hopefully you now see how tidyquant helps to integrate the best quantitative financial analysis packages with the tidyverse. With a few easy-to-use core functions, you can efficiently leverage the quantitative power of xts and zoo, quantmod and TTR with the data management infrastructure and scalability of the tidyverse.