Often with time series you want to aggregate your dataset to a less granular period. An example of this might be moving from a daily series to a monthly series to look at broader trends in your data. as_period() allows you to do exactly this.
The period argument in as_period() specifies the transformation you want. It is a character string with the general format "frequency period", where frequency is a number like 1 or 2, and period is an interval like weekly or yearly. There must be a space between the two.
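To make the "frequency period" format concrete, here is a minimal base R sketch that splits such a string into its two parts. The helper parse_period_sketch() is hypothetical and is not part of tibbletime; it only illustrates the format described above, under the assumption that a bare period such as "monthly" implies a frequency of 1.

```r
# Hypothetical helper (not tibbletime's parser): split a period string
# such as "2 m" into its frequency and period parts.
parse_period_sketch <- function(period) {
  parts <- strsplit(trimws(period), "\\s+")[[1]]
  if (length(parts) == 1) {
    # Assumption: a bare period such as "monthly" means a frequency of 1
    list(frequency = 1, period = parts)
  } else if (length(parts) == 2) {
    list(frequency = as.numeric(parts[1]), period = parts[2])
  } else {
    stop("Expected 'frequency period', for example '2 month'")
  }
}

parse_period_sketch("2 m")      # frequency 2, period "m"
parse_period_sketch("monthly")  # frequency 1, period "monthly"
```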
library(tibbletime)
library(dplyr)
# Facebook stock prices.
data(FB)
# Convert FB to tbl_time
FB <- as_tbl_time(FB, index = date)
# FANG stock prices
data(FANG)
# Convert FANG to tbl_time and group
FANG <- as_tbl_time(FANG, index = date) %>%
  group_by(symbol)
To see this in action, transform the daily FB data set to monthly data.
as_period(FB, '1 month')
## # A time tibble: 48 x 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28.0 69846400 28.0
## 2 FB 2013-02-01 31.0 31.0 29.6 29.7 85856700 29.7
## 3 FB 2013-03-01 27.0 28.1 26.8 27.8 54064800 27.8
## 4 FB 2013-04-01 25.6 25.9 25.3 25.5 22249300 25.5
## 5 FB 2013-05-01 27.8 27.9 27.3 27.4 64567600 27.4
## 6 FB 2013-06-03 24.3 24.3 23.7 23.8 35733800 23.8
## 7 FB 2013-07-01 25.0 25.1 24.6 24.8 20582200 24.8
## 8 FB 2013-08-01 37.3 38.3 36.9 37.5 106066500 37.5
## 9 FB 2013-09-03 41.8 42.2 41.5 41.9 48774900 41.9
## 10 FB 2013-10-01 50.0 51.0 49.5 50.4 98114000 50.4
## # ... with 38 more rows
# Additionally, the following are equivalent
# as_period(FB, 'monthly')
# as_period(FB, 'm')
# as_period(FB, '1 m')
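Conceptually, the monthly conversion keeps the first available row in each calendar month. The following base R sketch shows that idea on made-up toy data (not the FB data); it is an illustration of the concept, not how as_period() is implemented.

```r
# Toy daily data, invented for illustration
daily <- data.frame(
  date  = as.Date("2013-01-02") + 0:59,
  value = seq_len(60)
)

# Label each row with its calendar month, then keep the first row per month
month_id <- format(daily$date, "%Y-%m")
monthly  <- daily[!duplicated(month_id), ]

monthly$date  # 2013-01-02, 2013-02-01, 2013-03-01
```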
You aren’t restricted to only 1 month periods. Maybe you wanted every 2 months?
as_period(FB, '2 m')
## # A time tibble: 24 x 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28.0 69846400 28.0
## 2 FB 2013-03-01 27.0 28.1 26.8 27.8 54064800 27.8
## 3 FB 2013-05-01 27.8 27.9 27.3 27.4 64567600 27.4
## 4 FB 2013-07-01 25.0 25.1 24.6 24.8 20582200 24.8
## 5 FB 2013-09-03 41.8 42.2 41.5 41.9 48774900 41.9
## 6 FB 2013-11-01 50.8 52.1 49.7 49.8 95033000 49.8
## 7 FB 2014-01-02 54.8 55.2 54.2 54.7 43195500 54.7
## 8 FB 2014-03-03 67.0 68.1 66.5 67.4 56824100 67.4
## 9 FB 2014-05-01 60.4 62.3 60.2 61.2 82429000 61.2
## 10 FB 2014-07-01 67.6 68.4 67.4 68.1 33243000 68.1
## # ... with 14 more rows
Or maybe every 25 days? Note that consecutive dates in the result are not exactly 25 days apart. This is because the data set is not completely regular (there are gaps due to weekends and holidays). as_period() chooses the first date it can find in the period specified.
as_period(FB, '25 d')
## # A time tibble: 59 x 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28.0 69846400 28.0
## 2 FB 2013-01-28 31.9 32.5 31.8 32.5 59682500 32.5
## 3 FB 2013-02-21 28.3 28.5 27.2 27.3 49642300 27.3
## 4 FB 2013-03-18 26.4 26.8 25.8 26.5 26653700 26.5
## 5 FB 2013-04-12 28.0 28.0 27.2 27.4 28697400 27.4
## 6 FB 2013-05-07 27.5 27.8 26.8 26.9 41259100 26.9
## 7 FB 2013-06-03 24.3 24.3 23.7 23.8 35733800 23.8
## 8 FB 2013-06-26 24.5 24.6 24.0 24.2 29890300 24.2
## 9 FB 2013-07-22 26.0 26.1 25.7 26.0 27526300 26.0
## 10 FB 2013-08-15 36.4 37.1 36.0 36.6 56521100 36.6
## # ... with 49 more rows
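The effect of an irregular series can be sketched in base R. The example below builds a toy weekday-only calendar (holidays are ignored, which is a simplifying assumption), cuts it into 25-day windows starting at the first date, and keeps the first date found in each window. It illustrates the idea rather than the actual as_period() internals.

```r
# Toy calendar: 100 consecutive days with weekends removed
all_days <- as.Date("2013-01-02") + 0:99
weekdays_only <- all_days[!(format(all_days, "%u") %in% c("6", "7"))]

# Assign each date to a 25-day window counted from the first date
bin <- as.integer(weekdays_only - min(weekdays_only)) %/% 25

# Keep the first available date in each window
first_per_bin <- weekdays_only[!duplicated(bin)]
first_per_bin  # 2013-01-02, 2013-01-28, 2013-02-21, 2013-03-18
```

The second window starts on 2013-01-27, a Sunday, so the first date available in it is 2013-01-28.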
start_date argument
By default, the date that starts the first group is calculated as:
1. Find the minimum date in your dataset.
2. Floor that date to the period that you specified.
In the 1 month example above, 2013-01-02 is the first date in the series, and because “monthly” was chosen, the first group is defined as (2013-01-01 to 2013-01-31).
Occasionally this is not what you want. Consider what would happen if you changed the period to “every 2 days”. The first date is 2013-01-02, but because “daily” is chosen, this isn’t floored to 2013-01-01, so the groups are (2013-01-02, 2013-01-03), (2013-01-04, 2013-01-05) and so on. If you wanted the first group to be (2013-01-01, 2013-01-02), you can use the start_date argument.
# Without start_date
as_period(FB, '2 d')
## # A time tibble: 607 x 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28.0 69846400 28.0
## 2 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400 28.8
## 3 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4
## 4 FB 2013-01-08 29.5 29.6 28.9 29.1 45871300 29.1
## 5 FB 2013-01-10 30.6 31.5 30.3 31.3 95316400 31.3
## 6 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800 31.0
## 7 FB 2013-01-16 30.2 30.4 29.5 29.8 75332700 29.8
## 8 FB 2013-01-18 30.3 30.4 29.3 29.7 49631500 29.7
## 9 FB 2013-01-22 29.8 30.9 29.7 30.7 55243300 30.7
## 10 FB 2013-01-24 31.3 31.5 30.8 31.1 43845100 31.1
## # ... with 597 more rows
# With start_date
as_period(FB, '2 d', start_date = "2013-01-01")
## # A time tibble: 619 x 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28.0 69846400 28.0
## 2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8
## 3 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4
## 4 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700 30.6
## 5 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
## 6 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800 31.0
## 7 FB 2013-01-15 30.6 31.7 29.9 30.1 173242600 30.1
## 8 FB 2013-01-17 30.1 30.4 30.0 30.1 40256700 30.1
## 9 FB 2013-01-22 29.8 30.9 29.7 30.7 55243300 30.7
## 10 FB 2013-01-23 31.1 31.5 30.8 30.8 48899800 30.8
## # ... with 609 more rows
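The shift in grouping can be reproduced with a small base R sketch. The helper two_day_starts() is hypothetical (it is not a tibbletime function); it assigns each date to a 2-day window counted from a chosen origin and keeps the first date in each window. The six dates are the first trading days that appear in the outputs above.

```r
dates <- as.Date(c("2013-01-02", "2013-01-03", "2013-01-04",
                   "2013-01-07", "2013-01-08", "2013-01-09"))

# Hypothetical helper: first date in each 2-day window counted from `origin`
two_day_starts <- function(dates, origin) {
  bin <- as.integer(dates - origin) %/% 2
  dates[!duplicated(bin)]
}

# Windows start at the first date in the data (the default behaviour described above)
two_day_starts(dates, origin = min(dates))
# 2013-01-02, 2013-01-04, 2013-01-07, 2013-01-08

# Windows start one day earlier, as with start_date = "2013-01-01"
two_day_starts(dates, origin = as.Date("2013-01-01"))
# 2013-01-02, 2013-01-03, 2013-01-07, 2013-01-09
```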
side argument
By default, the first date per period is returned. If you want the end of each period instead, specify the side = "end" argument.
as_period(FB, 'y', side = "end")
## # A time tibble: 4 x 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-12-31 54.1 54.9 53.9 54.7 43076200 54.7
## 2 FB 2014-12-31 79.5 79.8 77.9 78.0 19935400 78.0
## 3 FB 2015-12-31 106 106 105 105 18298700 105
## 4 FB 2016-12-30 117 117 115 115 18600100 115
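The difference between the two sides can be sketched in base R with duplicated(), which marks repeats from the start by default and from the end with fromLast = TRUE. The toy data below is invented for illustration, and the approach is only an analogue of side = "start" and side = "end", not the package's implementation.

```r
# Toy daily data spanning a year boundary
daily <- data.frame(
  date  = as.Date("2013-12-27") + 0:9,
  value = 1:10
)
year_id <- format(daily$date, "%Y")

# Analogue of side = "start": first row in each year
start_rows <- daily[!duplicated(year_id), ]

# Analogue of side = "end": last row in each year
end_rows <- daily[!duplicated(year_id, fromLast = TRUE), ]

start_rows$date  # 2013-12-27, 2014-01-01
end_rows$date    # 2013-12-31, 2014-01-05
```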
One of the neat things about working in the tidyverse is that these functions can also work with grouped datasets. Here we transform the daily series of the 4 FANG stocks to a periodicity of every 2 years.
FANG %>%
  as_period('2 y')
## # A time tibble: 8 x 8
## # Index: date
## # Groups: symbol [4]
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28.0 69846400 28.0
## 2 FB 2015-01-02 78.6 78.9 77.7 78.4 18177500 78.4
## 3 AMZN 2013-01-02 256 258 253 257 3271000 257
## 4 AMZN 2015-01-02 313 315 307 309 2783200 309
## 5 NFLX 2013-01-02 95.2 95.8 90.7 92.0 19431300 13.1
## 6 NFLX 2015-01-02 344 352 341 349 13475000 49.8
## 7 GOOG 2013-01-02 719 727 717 723 5101500 361
## 8 GOOG 2015-01-02 529 531 524 525 1447500 525
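The grouped behaviour amounts to applying the same period selection separately within each group. A rough base R sketch of that split-apply-combine idea follows, using invented symbols and prices. The helper first_per_two_years() is hypothetical, and counting 2-year windows from each group's earliest year is an assumption made for the illustration.

```r
# Toy grouped data: two made-up symbols, four dates each
prices <- data.frame(
  symbol = rep(c("AAA", "BBB"), each = 4),
  date   = rep(as.Date(c("2013-01-02", "2013-06-03",
                         "2015-01-02", "2015-06-01")), times = 2),
  close  = 1:8
)

# Hypothetical helper: first row in each 2-year window within one group
first_per_two_years <- function(df) {
  yr  <- as.integer(format(df$date, "%Y"))
  bin <- (yr - min(yr)) %/% 2
  df[!duplicated(bin), ]
}

# Apply the helper to each symbol separately, then recombine
result <- do.call(rbind, lapply(split(prices, prices$symbol), first_per_two_years))
result
```

Each symbol contributes one row per 2-year window, which mirrors the two rows per stock in the output above.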