tibbletime
on its own has useful functions for manipulating time-based tibbles, but one of its most useful features is how nicely it plays with dplyr
. Traditionally, performing grouped analysis over a time period with dplyr
(like quarterly / monthly summaries) is doable, but it could be easier and typically requires use of the lubridate
package along with the creation of multiple columns to group on. Below is an example of a monthly summary from a daily dataset.
library(tibbletime)
library(dplyr)
library(lubridate)
series <- create_series('2013' ~ '2017', 'daily', class = "Date") %>%
mutate(var = rnorm(1826))
series
## # A time tibble: 1,826 x 2
## # Index: date
## date var
## <date> <dbl>
## 1 2013-01-01 -0.0243
## 2 2013-01-02 -0.917
## 3 2013-01-03 1.46
## 4 2013-01-04 -0.303
## 5 2013-01-05 -1.07
## 6 2013-01-06 -0.827
## 7 2013-01-07 -0.655
## 8 2013-01-08 -0.316
## 9 2013-01-09 1.22
## 10 2013-01-10 -1.33
## # ... with 1,816 more rows
series %>%
mutate(year = year(date), month = month(date)) %>%
group_by(year, month) %>%
summarise(mean_var = mean(var))
## # A tibble: 60 x 3
## year month mean_var
## <dbl> <dbl> <dbl>
## 1 2013 1.00 -0.266
## 2 2013 2.00 -0.0204
## 3 2013 3.00 0.181
## 4 2013 4.00 -0.0841
## 5 2013 5.00 0.606
## 6 2013 6.00 -0.221
## 7 2013 7.00 -0.183
## 8 2013 8.00 0.126
## 9 2013 9.00 -0.318
## 10 2013 10.0 -0.376
## # ... with 50 more rows
This get’s more difficult the more granular you go. Getting 5-minute summaries from minute or second data requires grouping on year, month, day, hour and minute columns.
With tibbletime
, rather than creating new columns to group on, you manipulate your original date column into something that corresponds to the period you are summarising at. The tibbletime
way to do this is with collapse_by()
.
series %>%
collapse_by("monthly") %>%
group_by(date) %>%
summarise(mean_var = mean(var))
## # A time tibble: 60 x 2
## # Index: date
## date mean_var
## <date> <dbl>
## 1 2013-01-31 -0.266
## 2 2013-02-28 -0.0204
## 3 2013-03-31 0.181
## 4 2013-04-30 -0.0841
## 5 2013-05-31 0.606
## 6 2013-06-30 -0.221
## 7 2013-07-31 -0.183
## 8 2013-08-31 0.126
## 9 2013-09-30 -0.318
## 10 2013-10-31 -0.376
## # ... with 50 more rows
While collapse_by()
directly manipulates the index column, the lower level collapse_index()
function can be used inside of a call to mutate()
to modify the index column and then save it as a new column. This can be useful if you don’t want to lose the original index column.
This works for more granular series too. Below we aggregate 5 second level data up to hourly. This is working with a faily sizable ~19 million row data set.
second_series <- create_series('2013' ~ '2015', '5 second')
second_series %>%
mutate(var = rnorm(nrow(second_series))) %>%
collapse_by("hourly") %>%
group_by(date) %>%
summarise(mean_var = mean(var))
## # A time tibble: 26,280 x 2
## # Index: date
## date mean_var
## <dttm> <dbl>
## 1 2013-01-01 00:59:55 0.0102
## 2 2013-01-01 01:59:55 -0.0132
## 3 2013-01-01 02:59:55 0.0907
## 4 2013-01-01 03:59:55 0.00358
## 5 2013-01-01 04:59:55 0.0331
## 6 2013-01-01 05:59:55 0.0316
## 7 2013-01-01 06:59:55 0.0676
## 8 2013-01-01 07:59:55 0.0325
## 9 2013-01-01 08:59:55 0.00153
## 10 2013-01-01 09:59:55 -0.00630
## # ... with 26,270 more rows
One really powerful benefit of working in the tidyverse
is being able to manipulate multiple series at once. Essentially we can create multiple layers of groupings, one for the stocks we are working with (like Facebook and Apple), and one for the period you want to summarise your data at (daily, yearly, etc).
Below we use create_series()
to create two dummy hourly price series, combine them, and calculate the OHLC (Open, High, Low, Close) prices per day by first collapsing to daily with collapse_by()
to have something to group on.
set.seed(123)
# Create price series of hourly movements for apple and facebook stock.
apple <- create_series('2014' ~ '2016', period = '1 hour') %>%
mutate(price = 100 + cumsum(rnorm(26304, mean = 0, sd = .5)))
facebook <- create_series('2014' ~ '2016', period = '1 hour') %>%
mutate(price = 150 + cumsum(rnorm(26304, mean = 0, sd = .5)))
# Bind them together and create a symbol column to group on
price_series <- bind_rows(list(apple = apple, facebook = facebook), .id = "symbol") %>%
as_tbl_time(date) %>%
group_by(symbol)
# Collapse to daily and transform to OHLC (Open, High, Low, Close), a
# common financial transformation
price_series %>%
collapse_by("daily") %>%
group_by(date, add = TRUE) %>%
summarise(
open = first(price),
high = max(price),
low = min(price),
close = last(price)
) %>%
slice(1:5)
## # A time tibble: 10 x 6
## # Index: date
## # Groups: symbol [2]
## symbol date open high low close
## <chr> <dttm> <dbl> <dbl> <dbl> <dbl>
## 1 apple 2014-01-01 23:00:00 99.7 102 99.6 99.9
## 2 apple 2014-01-02 23:00:00 99.6 102 98.7 101
## 3 apple 2014-01-03 23:00:00 101 103 101 101
## 4 apple 2014-01-04 23:00:00 102 104 101 103
## 5 apple 2014-01-05 23:00:00 104 105 101 101
## 6 facebook 2014-01-01 23:00:00 150 152 149 149
## 7 facebook 2014-01-02 23:00:00 149 151 148 149
## 8 facebook 2014-01-03 23:00:00 150 150 147 148
## 9 facebook 2014-01-04 23:00:00 148 150 144 145
## 10 facebook 2014-01-05 23:00:00 145 146 144 146