A common task in financial analyses is to perform a rolling
calculation. This might be a single value like a rolling mean or
standard deviation, or it might be more complicated like a rolling
linear regression. To account for this flexibility,
tibbletime has the rollify() function. This
function allows you to turn any function into a rolling version
of itself.
In the tidyverse, this type of function is known as an
adverb because it modifies an existing function, and functions
are typically given verb names.
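As a rough illustration of the adverb idea, the sketch below wraps an existing function so that it returns a default value instead of failing. The helper name with_default() is hypothetical and is not part of tibbletime; purrr::possibly() is a similar, more complete tool.

```r
# Hypothetical adverb: takes a function `f` and returns a modified version
# of it that yields `default` whenever `f` throws an error.
with_default <- function(f, default) {
  function(...) {
    tryCatch(f(...), error = function(e) default)
  }
}

# The "verb" is log(); the adverb produces a more forgiving variant of it.
safe_log <- with_default(log, default = NA_real_)

safe_log(1)    # behaves like log(1)
safe_log("a")  # log("a") would error; the wrapped version returns the default
```

The input is a function and the output is a new function, which is the same pattern rollify() follows.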
To calculate a rolling average, picture a column in a data frame where you take the average of the values in rows 1-5, then in rows 2-6, then in 3-7, and so on until you reach the end of the dataset. This type of 5-period moving window is a rolling calculation, and is often used to smooth out noise in a dataset.
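The window arithmetic described above can be written out by hand in base R. This is only a sketch on a made-up vector, not how tibbletime computes the result.

```r
# Made-up input purely for illustration.
x <- c(1, 2, 3, 4, 5, 6, 7)
window <- 5

# Positions before the first complete window stay NA.
roll <- rep(NA_real_, length(x))

# Position i holds the mean of the `window` values ending at i:
# rows 1-5, then rows 2-6, then rows 3-7.
for (i in window:length(x)) {
  roll[i] <- mean(x[(i - window + 1):i])
}

roll
```

The first four entries are NA because no complete 5-value window ends there, which matches the leading NA values in the rolling output later in this document.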
Let’s see how to do this with rollify().
# The function to use at each step is `mean`.
# The window size is 5
rolling_mean <- rollify(mean, window = 5)
rolling_mean
## function (...) 
## {
##     roller(..., .f = .f, window = window, unlist = unlist, na_value = na_value)
## }
## <bytecode: 0x10e4bd4c0>
## <environment: 0x10e4b7ce8>
We now have a rolling version of the function mean().
You use it much as you would use mean() itself, for example
on a column inside dplyr::mutate().
mutate(FB, mean_5 = rolling_mean(adjusted))
## # A tibble: 1,008 × 6
##    symbol date        open close adjusted mean_5
##    <chr>  <date>     <dbl> <dbl>    <dbl>  <dbl>
##  1 FB     2013-01-02  27.4  28       28     NA  
##  2 FB     2013-01-03  27.9  27.8     27.8   NA  
##  3 FB     2013-01-04  28.0  28.8     28.8   NA  
##  4 FB     2013-01-07  28.7  29.4     29.4   NA  
##  5 FB     2013-01-08  29.5  29.1     29.1   28.6
##  6 FB     2013-01-09  29.7  30.6     30.6   29.1
##  7 FB     2013-01-10  30.6  31.3     31.3   29.8
##  8 FB     2013-01-11  31.3  31.7     31.7   30.4
##  9 FB     2013-01-14  32.1  31.0     31.0   30.7
## 10 FB     2013-01-15  30.6  30.1     30.1   30.9
## # ℹ 998 more rows
You can create multiple versions of the rolling function if you need to calculate the mean at multiple window lengths.
rolling_mean_2 <- rollify(mean, window = 2)
rolling_mean_3 <- rollify(mean, window = 3)
rolling_mean_4 <- rollify(mean, window = 4)
FB %>% mutate(
  rm2 = rolling_mean_2(adjusted),
  rm3 = rolling_mean_3(adjusted),
  rm4 = rolling_mean_4(adjusted)
)
## # A tibble: 1,008 × 8
##    symbol date        open close adjusted   rm2   rm3   rm4
##    <chr>  <date>     <dbl> <dbl>    <dbl> <dbl> <dbl> <dbl>
##  1 FB     2013-01-02  27.4  28       28    NA    NA    NA  
##  2 FB     2013-01-03  27.9  27.8     27.8  27.9  NA    NA  
##  3 FB     2013-01-04  28.0  28.8     28.8  28.3  28.2  NA  
##  4 FB     2013-01-07  28.7  29.4     29.4  29.1  28.6  28.5
##  5 FB     2013-01-08  29.5  29.1     29.1  29.2  29.1  28.8
##  6 FB     2013-01-09  29.7  30.6     30.6  29.8  29.7  29.5
##  7 FB     2013-01-10  30.6  31.3     31.3  30.9  30.3  30.1
##  8 FB     2013-01-11  31.3  31.7     31.7  31.5  31.2  30.7
##  9 FB     2013-01-14  32.1  31.0     31.0  31.3  31.3  31.1
## 10 FB     2013-01-15  30.6  30.1     30.1  30.5  30.9  31.0
## # ℹ 998 more rows
rollify() is built using pieces from the
purrr package. One of those is the ability to accept an
anonymous function using the ~ function syntax.
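The ~ shorthand can be tried on its own with purrr::as_mapper(), which turns a formula into an ordinary function. This is only a sketch of the syntax; how rollify() handles the formula internally is not shown here and should not be inferred from it.

```r
library(purrr)

# One argument: refer to it as .x
one_arg <- as_mapper(~ mean(.x))

# Two arguments: refer to them as .x and .y
two_args <- as_mapper(~ mean(.x + .y))

# More arguments: refer to them by position as ..1, ..2, ..3, and so on
three_args <- as_mapper(~ ..1 + ..2 + ..3)

one_arg(c(1, 2, 3))
two_args(c(1, 2), c(3, 5))
three_args(1, 2, 3)
```

Each result is a plain function, so it can be called directly to check that the placeholders refer to the arguments you expect.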
The documentation, ?rollify, gives a thorough
walkthrough of the different forms you can pass to
rollify(), but let’s see a few more examples.
# Rolling mean, but with function syntax
rolling_mean <- rollify(.f = ~mean(.x), window = 5)
mutate(FB, mean_5 = rolling_mean(adjusted))
## # A tibble: 1,008 × 6
##    symbol date        open close adjusted mean_5
##    <chr>  <date>     <dbl> <dbl>    <dbl>  <dbl>
##  1 FB     2013-01-02  27.4  28       28     NA  
##  2 FB     2013-01-03  27.9  27.8     27.8   NA  
##  3 FB     2013-01-04  28.0  28.8     28.8   NA  
##  4 FB     2013-01-07  28.7  29.4     29.4   NA  
##  5 FB     2013-01-08  29.5  29.1     29.1   28.6
##  6 FB     2013-01-09  29.7  30.6     30.6   29.1
##  7 FB     2013-01-10  30.6  31.3     31.3   29.8
##  8 FB     2013-01-11  31.3  31.7     31.7   30.4
##  9 FB     2013-01-14  32.1  31.0     31.0   30.7
## 10 FB     2013-01-15  30.6  30.1     30.1   30.9
## # ℹ 998 more rows
You can create anonymous functions (functions without a name) on the fly.
# 5-period average of the sum of 2 columns (open + close)
rolling_avg_sum <- rollify(~ mean(.x + .y), window = 5)
mutate(FB, avg_sum = rolling_avg_sum(open, close))
## # A tibble: 1,008 × 6
##    symbol date        open close adjusted avg_sum
##    <chr>  <date>     <dbl> <dbl>    <dbl>   <dbl>
##  1 FB     2013-01-02  27.4  28       28      NA  
##  2 FB     2013-01-03  27.9  27.8     27.8    NA  
##  3 FB     2013-01-04  28.0  28.8     28.8    NA  
##  4 FB     2013-01-07  28.7  29.4     29.4    NA  
##  5 FB     2013-01-08  29.5  29.1     29.1    56.9
##  6 FB     2013-01-09  29.7  30.6     30.6    57.9
##  7 FB     2013-01-10  30.6  31.3     31.3    59.1
##  8 FB     2013-01-11  31.3  31.7     31.7    60.4
##  9 FB     2013-01-14  32.1  31.0     31.0    61.4
## 10 FB     2013-01-15  30.6  30.1     30.1    61.8
## # ℹ 998 more rows
To pass optional arguments (anything other than .x or .y) to the
function being rolled, specify them on the non-rolling form
inside the call to rollify().
For instance, say our dataset had NA values, but we
still wanted to calculate an average. We need to specify
na.rm = TRUE as an argument to mean().
FB$adjusted[1] <- NA
# Do this
rolling_mean_na <- rollify(~mean(.x, na.rm = TRUE), window = 5)
FB %>% mutate(mean_na = rolling_mean_na(adjusted))
## # A tibble: 1,008 × 6
##    symbol date        open close adjusted mean_na
##    <chr>  <date>     <dbl> <dbl>    <dbl>   <dbl>
##  1 FB     2013-01-02  27.4  28       NA      NA  
##  2 FB     2013-01-03  27.9  27.8     27.8    NA  
##  3 FB     2013-01-04  28.0  28.8     28.8    NA  
##  4 FB     2013-01-07  28.7  29.4     29.4    NA  
##  5 FB     2013-01-08  29.5  29.1     29.1    28.8
##  6 FB     2013-01-09  29.7  30.6     30.6    29.1
##  7 FB     2013-01-10  30.6  31.3     31.3    29.8
##  8 FB     2013-01-11  31.3  31.7     31.7    30.4
##  9 FB     2013-01-14  32.1  31.0     31.0    30.7
## 10 FB     2013-01-15  30.6  30.1     30.1    30.9
## # ℹ 998 more rows
Say our rolling function wraps a custom summary_df() function.
This function calculates a 5-number summary (mean, sd, min, max,
median) and returns it as a tidy data frame.
We won’t be able to use the rolling version of this out of the box.
dplyr::mutate() will complain that an incorrect number of
values were returned since rollify() attempts to unlist at
each call. Essentially, each call would be returning 5 values instead of
1. What we need is to be able to create a list-column. To do this,
specify unlist = FALSE in the call to
rollify().
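To make the list-column idea concrete, here is a base R sketch in which each row stores a two-value result (the range of a 3-value window). The data and names are made up, and the loop stands in for the rolling step; it is not tibbletime's implementation.

```r
# Made-up input purely for illustration.
x <- c(1, 2, 3, 4, 5, 6)
window <- 3

# One list element per row, so each element may hold more than one value.
out <- vector("list", length(x))
for (i in seq_along(x)) {
  if (i < window) {
    out[[i]] <- NA  # no complete window yet
  } else {
    out[[i]] <- range(x[(i - window + 1):i])  # two values: min and max
  }
}

df <- data.frame(x = x)
df$window_range <- out  # a list of length nrow(df) becomes a list-column

lengths(df$window_range)
```

An ordinary column needs exactly one value per row, so results like these only fit once they are wrapped in a list.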
# Our data frame summary
summary_df <- function(x) {
  data.frame(  
    rolled_summary_type = c("mean", "sd",  "min",  "max",  "median"),
    rolled_summary_val  = c(mean(x), sd(x), min(x), max(x), median(x))
  )
}
# A rolling version, with unlist = FALSE
rolling_summary <- rollify(~summary_df(.x), window = 5, 
                           unlist = FALSE)
FB_summarised <- mutate(FB, summary_list_col = rolling_summary(adjusted))
FB_summarised
## # A tibble: 1,008 × 4
##    symbol date       adjusted summary_list_col
##    <chr>  <date>        <dbl> <list>          
##  1 FB     2013-01-02     28   <lgl [1]>       
##  2 FB     2013-01-03     27.8 <lgl [1]>       
##  3 FB     2013-01-04     28.8 <lgl [1]>       
##  4 FB     2013-01-07     29.4 <lgl [1]>       
##  5 FB     2013-01-08     29.1 <df [5 × 2]>    
##  6 FB     2013-01-09     30.6 <df [5 × 2]>    
##  7 FB     2013-01-10     31.3 <df [5 × 2]>    
##  8 FB     2013-01-11     31.7 <df [5 × 2]>    
##  9 FB     2013-01-14     31.0 <df [5 × 2]>    
## 10 FB     2013-01-15     30.1 <df [5 × 2]>    
## # ℹ 998 more rows
The neat thing is that after removing the NA values at
the beginning, the list-column can be unnested using
tidyr::unnest(), giving us a nice tidy 5-period rolling
summary.
FB_summarised %>%
  filter(!is.na(summary_list_col)) %>%
  unnest(cols = summary_list_col)
## # A tibble: 5,020 × 5
##    symbol date       adjusted rolled_summary_type rolled_summary_val
##    <chr>  <date>        <dbl> <chr>                            <dbl>
##  1 FB     2013-01-08     29.1 mean                            28.6  
##  2 FB     2013-01-08     29.1 sd                               0.700
##  3 FB     2013-01-08     29.1 min                             27.8  
##  4 FB     2013-01-08     29.1 max                             29.4  
##  5 FB     2013-01-08     29.1 median                          28.8  
##  6 FB     2013-01-09     30.6 mean                            29.1  
##  7 FB     2013-01-09     30.6 sd                               1.03 
##  8 FB     2013-01-09     30.6 min                             27.8  
##  9 FB     2013-01-09     30.6 max                             30.6  
## 10 FB     2013-01-09     30.6 median                          29.1  
## # ℹ 5,010 more rows
The last example was a little clunky because to unnest we had to
remove the first few missing rows manually. If those missing values were
empty data frames then unnest() would have known how to
handle them. Luckily, the na_value argument will allow us
to specify a value to fill the NA spots at the beginning of
the roll.
rolling_summary <- rollify(~summary_df(.x), window = 5, 
                           unlist = FALSE, na_value = data.frame())
FB_summarised <- mutate(FB, summary_list_col = rolling_summary(adjusted))
FB_summarised
## # A tibble: 1,008 × 4
##    symbol date       adjusted summary_list_col
##    <chr>  <date>        <dbl> <list>          
##  1 FB     2013-01-02     28   <df [0 × 0]>    
##  2 FB     2013-01-03     27.8 <df [0 × 0]>    
##  3 FB     2013-01-04     28.8 <df [0 × 0]>    
##  4 FB     2013-01-07     29.4 <df [0 × 0]>    
##  5 FB     2013-01-08     29.1 <df [5 × 2]>    
##  6 FB     2013-01-09     30.6 <df [5 × 2]>    
##  7 FB     2013-01-10     31.3 <df [5 × 2]>    
##  8 FB     2013-01-11     31.7 <df [5 × 2]>    
##  9 FB     2013-01-14     31.0 <df [5 × 2]>    
## 10 FB     2013-01-15     30.1 <df [5 × 2]>    
## # ℹ 998 more rows
Now unnesting directly:
FB_summarised %>% unnest(cols = summary_list_col)
## # A tibble: 5,020 × 5
##    symbol date       adjusted rolled_summary_type rolled_summary_val
##    <chr>  <date>        <dbl> <chr>                            <dbl>
##  1 FB     2013-01-08     29.1 mean                            28.6  
##  2 FB     2013-01-08     29.1 sd                               0.700
##  3 FB     2013-01-08     29.1 min                             27.8  
##  4 FB     2013-01-08     29.1 max                             29.4  
##  5 FB     2013-01-08     29.1 median                          28.8  
##  6 FB     2013-01-09     30.6 mean                            29.1  
##  7 FB     2013-01-09     30.6 sd                               1.03 
##  8 FB     2013-01-09     30.6 min                             27.8  
##  9 FB     2013-01-09     30.6 max                             30.6  
## 10 FB     2013-01-09     30.6 median                          29.1  
## # ℹ 5,010 more rows
Finally, if you want to actually keep those first few NA rows in the unnest, you can pass a data frame that is initialized with the same column names as the rest of the values.
rolling_summary <- rollify(~summary_df(.x), window = 5, 
                           unlist = FALSE, 
                           na_value = data.frame(rolled_summary_type = NA,
                                                 rolled_summary_val  = NA))
FB_summarised <- mutate(FB, summary_list_col = rolling_summary(adjusted))
FB_summarised %>% unnest(cols = summary_list_col)
## # A tibble: 5,024 × 5
##    symbol date       adjusted rolled_summary_type rolled_summary_val
##    <chr>  <date>        <dbl> <chr>                            <dbl>
##  1 FB     2013-01-02     28   <NA>                            NA    
##  2 FB     2013-01-03     27.8 <NA>                            NA    
##  3 FB     2013-01-04     28.8 <NA>                            NA    
##  4 FB     2013-01-07     29.4 <NA>                            NA    
##  5 FB     2013-01-08     29.1 mean                            28.6  
##  6 FB     2013-01-08     29.1 sd                               0.700
##  7 FB     2013-01-08     29.1 min                             27.8  
##  8 FB     2013-01-08     29.1 max                             29.4  
##  9 FB     2013-01-08     29.1 median                          28.8  
## 10 FB     2013-01-09     30.6 mean                            29.1  
## # ℹ 5,014 more rows
A final use of this flexible function is to calculate rolling regressions.
A very fictitious example is to perform a rolling regression on the
FB dataset of the form
close ~ high + low + volume. Notice that we have 4 columns
to pass here. This is more complicated than a .x and
.y example, but have no fear. The arguments can be
specified in order as ..1, ..2, … for as far
as is required, or you can pass a freshly created anonymous function.
The latter is what we will do so we can preserve the names of the
variables in the regression.
Again, since this returns a linear model object, we will specify
unlist = FALSE. Unfortunately there is no easy default NA
value to pass here.
# Reset FB
data(FB)
rolling_lm <- rollify(.f = function(close, high, low, volume) {
                              lm(close ~ high + low + volume)
                           }, 
                      window = 5, 
                      unlist = FALSE)
FB_reg <- mutate(FB, roll_lm = rolling_lm(close, high, low, volume))
FB_reg
## # A tibble: 1,008 × 9
##    symbol date        open  high   low close    volume adjusted roll_lm  
##    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl> <list>   
##  1 FB     2013-01-02  27.4  28.2  27.4  28    69846400     28   <lgl [1]>
##  2 FB     2013-01-03  27.9  28.5  27.6  27.8  63140600     27.8 <lgl [1]>
##  3 FB     2013-01-04  28.0  28.9  27.8  28.8  72715400     28.8 <lgl [1]>
##  4 FB     2013-01-07  28.7  29.8  28.6  29.4  83781800     29.4 <lgl [1]>
##  5 FB     2013-01-08  29.5  29.6  28.9  29.1  45871300     29.1 <lm>     
##  6 FB     2013-01-09  29.7  30.6  29.5  30.6 104787700     30.6 <lm>     
##  7 FB     2013-01-10  30.6  31.5  30.3  31.3  95316400     31.3 <lm>     
##  8 FB     2013-01-11  31.3  32.0  31.1  31.7  89598000     31.7 <lm>     
##  9 FB     2013-01-14  32.1  32.2  30.6  31.0  98892800     31.0 <lm>     
## 10 FB     2013-01-15  30.6  31.7  29.9  30.1 173242600     30.1 <lm>     
## # ℹ 998 more rows
To get some useful information about the regressions, we will use
broom::tidy() and apply it to each regression using a
mutate() + map() combination.
FB_reg %>%
  filter(!is.na(roll_lm)) %>%
  mutate(tidied = purrr::map(roll_lm, broom::tidy)) %>%
  unnest(tidied) %>%
  select(symbol, date, term, estimate, std.error, statistic, p.value)
## # A tibble: 4,016 × 7
##    symbol date       term         estimate     std.error statistic p.value
##    <chr>  <date>     <chr>           <dbl>         <dbl>     <dbl>   <dbl>
##  1 FB     2013-01-08 (Intercept) -2.84e- 1 10.2           -0.0279    0.982
##  2 FB     2013-01-08 high         7.09e- 1  1.95           0.364     0.778
##  3 FB     2013-01-08 low          2.70e- 1  2.16           0.125     0.921
##  4 FB     2013-01-08 volume       1.12e- 8  0.0000000266   0.422     0.746
##  5 FB     2013-01-09 (Intercept) -5.95e+ 0  7.48          -0.796     0.572
##  6 FB     2013-01-09 high         2.08e+ 0  1.88           1.10      0.468
##  7 FB     2013-01-09 low         -9.20e- 1  1.75          -0.526     0.692
##  8 FB     2013-01-09 volume      -1.45e-10  0.0000000168  -0.00861   0.995
##  9 FB     2013-01-10 (Intercept)  9.55e- 1  4.46           0.214     0.866
## 10 FB     2013-01-10 high         7.17e- 1  1.30           0.553     0.679
## # ℹ 4,006 more rows
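For intuition about what each <lm> element above represents, the loop below fits one regression per window by hand. It is a base R sketch on the built-in mtcars data, with rows treated as an ordered sequence purely for illustration. The window size and model formula are arbitrary choices, and it does not describe how rollify() is implemented.

```r
# Arbitrary window size chosen for the sketch.
window <- 10
n <- nrow(mtcars)

# One slope per row; rows before the first complete window stay NA.
slopes <- rep(NA_real_, n)

for (i in window:n) {
  # Fit the model on the `window` rows ending at row i.
  fit <- lm(mpg ~ wt, data = mtcars[(i - window + 1):i, ])
  slopes[i] <- coef(fit)[["wt"]]
}

head(slopes, 12)
```

Keeping the whole model in a list-column, as the rollify() example does, lets you extract any statistic later instead of fixing a single coefficient up front as this loop does.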