Introduction to Pattern Sequence based Forecasting (PSF) algorithm

Neeraj Bokde, Gualberto Asencio-Cortes and Francisco Martinez-Alvarez

2016-08-19

Introduction

The Algorithm Pattern Sequence based Forecasting (PSF) was first proposed by Martinez Alvarez, et al., 2008 and then modified and suggested improvement by Martinez Alvarez, et al., 2011. The technical detailes are mentioned in referenced articles. PSF algorithm consists of various statistical operations like:

Example

This section discusses about the examples to introduce the use of the PSF package and to compare it with auto.arima() and ets() functions, which are well accepted functions in the R community working over time series forecasting techniques. The data used in this example are ’nottem’ and ’sunspots’ which are the standard time series dataset available in R. The ’nottem’ dataset is the average air temperatures at Nottingham Castle in degrees Fahrenheit, collected for 20 years, on monthly basis.

Similarly, ’sunspots’ dataset is mean relative sunspot numbers from 1749 to 1983, measured on monthly basis. First of all, the psf() function from PSF package is used to forecast the future values. For both datasets, all the recorded values except for the final year are considered as training data, and the last year is used for testing purposes. The predicted values for final year with psf() function for both datasets are now discussed.

Install library

Download the Package and install with instruction:

library(PSF)

Prediction results for ’nottem’ dataset with psf() function.

a <- psf(data = nottem, n.ahead = 12)
a
## $predictions
##  [1] 38.97692 38.71538 42.49231 46.32308 52.91538 57.97692 61.87692
##  [8] 60.19231 57.03846 49.42308 43.23846 40.21538
## 
## $k
## [1] 2
## 
## $w
## [1] 1

Prediction results for ’sunspots’ dataset with psf() function.

b <- psf(data = sunspots, n.ahead = 48)
b
## $predictions
##  [1] 52.625000 49.275000 48.050000 56.975000 45.675000 45.950000 37.550000
##  [8] 45.050000 46.000000 45.200000 39.325000 39.750000 34.100000 27.766667
## [15] 27.533333 36.700000 35.933333 36.233333 26.066667 28.666667 27.366667
## [22] 30.166667 23.000000 21.033333 21.366667 18.666667 20.133333 11.766667
## [29] 21.833333 17.700000 24.666667 29.366667  8.833333 14.866667 32.000000
## [36] 22.266667 16.225000 19.100000 14.200000 13.275000 20.350000 16.575000
## [43] 10.475000 21.475000 17.425000 27.075000 30.925000 28.325000
## 
## $k
## [1] 2
## 
## $w
## [1] 10

To represent the prediction performance in plot format, the psf_plot() function is used as shown in the following code.

Plot for ‘nottem’ dataset

psf_plot(data = nottem, predictions = a$predictions)

Plot for ‘sunspots’ dataset

psf_plot(data = sunspots, predictions = b$predictions)

Comparison of psf() with auto.arima() and ets() functions:

Example below shows the comparisons for psf(), auto.arima() and ets() functions when using the Root Mean Square Error (RMSE) parameter as metric, for ’sunspots’ dataset. In order to avail more accurate and robust comparison results, error values are calculated for 5 times and the mean value of error values for methods under comparison are also shown. These values clearly state that ‘psf()’ function is able to outperform the comparative time series prediction methods. Additionally, the reader might want to refer to the results published in the original work Martinez Alvarez et al. (2011), in which it was shown that PSF outperformed many different methods when applied to electricity prices and demand forecasting.

library(PSF)
library(forecast)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: timeDate
## This is forecast 7.1
options(warn=-1)
  
## Consider data `sunspots` with removal of last years's readings
# Training Data
x <- sunspots[1:2772]

# Test Data
y <- sunspots[2773:2820]

PSF <- NULL
ARIMA <- NULL
ETS <- NULL

for(i in 1:5)
{
  set.seed(i)
  
  # for PSF
  a <- psf(data = x, n.ahead = 48)$predictions
  
  # for ARIMA
  b <- forecast(auto.arima(x), 48)$mean
  
  # for ets
  c <- as.numeric(forecast(ets(x), 48)$mean)
  
  ## For Error Calculations
  # Error for PSF
  PSF[i] <- sqrt(mean((y - a)^2))
  # Error for ARIMA
  ARIMA[i] <- sqrt(mean((y - b)^2))
  # Error for ETS
  ETS[i] <- sqrt(mean((y - c)^2))

}

## Error values for PSF
  PSF
## [1] 61.08512 61.08512 61.08512 61.08512 61.08512
  mean(PSF)
## [1] 61.08512
## Error values for ARIMA
  ARIMA
## [1] 103.0719 103.0719 103.0719 103.0719 103.0719
  mean(ARIMA)
## [1] 103.0719
## Error values for ETS
  ETS
## [1] 70.66554 70.66554 70.66554 70.66554 70.66554
  mean(ETS)
## [1] 70.66554

References

Martínez-Álvarez, F., Troncoso, A., Riquelme, J.C. and Ruiz, J.S.A., 2008, December. LBF: A labeled-based forecasting algorithm and its application to electricity price time series. In Data Mining, 2008. ICDM’08. Eighth IEEE International Conference on (pp. 453-461). IEEE.

Martinez Alvarez, F., Troncoso, A., Riquelme, J.C. and Aguilar Ruiz, J.S., 2011. Energy time series forecasting based on pattern sequence similarity. Knowledge and Data Engineering, IEEE Transactions on, 23(8), pp.1230-1243.