The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Examples of using runstats package

Marta Karas

2019-11-14

runstats

Package runstats provides methods for fast computation of running sample statistics for time series. The methods utilize Convolution Theorem to compute convolutions via Fast Fourier Transform (FFT). Implemented running statistics include:

mean,
standard deviation,
variance,
covariance,
correlation,
euclidean distance.

Installation

# devtools::install_github("martakarass/runstats")
install.packages("runstats")

Usage

library(runstats)

## Example: running correlation
x0 <- sin(seq(0, 2 * pi * 5, length.out = 1000))
x  <- x0 + rnorm(1000, sd = 0.1)
pattern <- x0[1:100]
out1 <- RunningCor(x, pattern)
out2 <- RunningCor(x, pattern, circular = TRUE)

## Example: running mean
x <- cumsum(rnorm(1000))
out1 <- RunningMean(x, W = 100)
out2 <- RunningMean(x, W = 100, circular = TRUE)

Running statistics

To better explain the details of running statistics, package’s function runstats.demo(func.name) allows to visualize how the output of each running statistics method is generated. To run the demo, use func.name being one of the methods’ names:

"RunningMean",
"RunningSd",
"RunningVar",
"RunningCov",
"RunningCor",
"RunningL2Norm".

## Example: demo for running correlation method  
runstats.demo("RunningCor")

## Example: demo for running mean method 
runstats.demo("RunningMean")

Performance

We use rbenchmark to measure elapsed time of RunningCov execution, for different lengths of time-series x and fixed length of the shorter pattern y.

library(rbenchmark)

set.seed (20181010)
x.N.seq <- 10^(3:7)
x.list  <- lapply(x.N.seq, function(N) runif(N))
y <- runif(100)

## Benchmark execution time of RunningCov 
out.df <- data.frame()
for (x.tmp in x.list){
  out.df.tmp <- benchmark("runstats" = runstats::RunningCov(x.tmp, y),
                          replications = 10,
                          columns = c("test", "replications", "elapsed",
                                      "relative", "user.self", "sys.self"))
  out.df.tmp$x_length <- length(x.tmp)
  out.df.tmp$pattern_length <- length(y)
  out.df <- rbind(out.df, out.df.tmp)
}

knitr::kable(out.df)

test	replications	elapsed	relative	user.self	sys.self	x_length	pattern_length
runstats	10	0.005	1	0.004	0.001	1000	100
runstats	10	0.023	1	0.018	0.004	10000	100
runstats	10	0.194	1	0.158	0.037	100000	100
runstats	10	1.791	1	1.656	0.125	1000000	100
runstats	10	20.234	1	17.660	2.514	10000000	100

Compare with a conventional method

To compare RunStats performance with “conventional” loop-based way of computing running covariance in R, we use rbenchmark package to measure elapsed time of RunStats::RunningCov and running covariance implemented with sapply loop, for different lengths of time-series x and fixed length of the shorter time-series y.

## Conventional approach 
RunningCov.sapply <- function(x, y){
  l_x <- length(x)
  l_y <- length(y)
  sapply(1:(l_x - l_y + 1), function(i){
    cov(x[i:(i+l_y-1)], y)
  })
}

set.seed (20181010)

out.df2 <- data.frame()
for (x.tmp in x.list[c(1,2,3,4)]){
  out.df.tmp <- benchmark("conventional" = RunningCov.sapply(x.tmp, y),
                          "runstats" = runstats::RunningCov(x.tmp, y),
                          replications = 10,
                          columns = c("test", "replications", "elapsed",
                                      "relative", "user.self", "sys.self"))
  out.df.tmp$x_length <- length(x.tmp)
  out.df2 <- rbind(out.df2, out.df.tmp)
}

Benchmark results

library(ggplot2)

plt1 <- 
  ggplot(out.df2, aes(x = x_length, y = elapsed, color = test)) + 
  geom_line() + geom_point(size = 3) + scale_x_log10() + 
  theme_minimal(base_size = 14) + 
  labs(x = "Vector length of x",
       y = "Elapsed [s]", color = "Method", 
       title = "Running covariance rbenchmark") + 
  theme(legend.position = "bottom")
plt2 <- 
  plt1 + 
  scale_y_log10() + 
  labs(y = "Log of elapsed [s]")

cowplot::plot_grid(plt1, plt2, nrow = 1, labels = c('A', 'B'))

Platform information

sessioninfo::platform_info()
#>  setting  value                       
#>  version  R version 3.5.2 (2018-12-20)
#>  os       macOS Mojave 10.14.2        
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  C                           
#>  ctype    en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2019-11-14

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.