Generation of candidates

Amin Bazaz, Heraldo Borges, Eduardo Ogasawara

2018-08-06

This document explains how motif candidates are generated. The dataset is first prepared, and the process is then run on it. The final goal is to find the candidates and gather all the information associated with them.

Presentation of the process

The process consists of the following steps:

  1. Normalization
  2. Symbolic Aggregate approXimation (SAX)
  3. Partitioning spatial-time series into blocks
  4. Combination of blocked spatial-time series
  5. Combined Series Approach

Description of each step

Normalization

This step is essential to put all the data on the same scale. The normalization uses the z-score method, which subtracts the mean from each value and divides the result by the standard deviation.

# First rows and columns of the raw example dataset
head(STMotif::example_dataset[, 1:10])
#>        1    2    3     4     5     6    7     8    9   10
#> 360  737 1350  869   750  1138   758 1006  1095   99  -83
#> 361  283  565  504   317  1849   944  -80  -895 -936  906
#> 362 -118 -375 -564  -803   870   472 -922 -1009 -698  741
#> 363 -696 -844 -654 -1303  -474  -591 -262  1034 1012  376
#> 364 -251 -622  -14  -587 -1108 -1401  404  1545 1696  247
#> 365  645  -10   -4   411  -858 -1261 -574  -329 -367 -680
# Same rows after z-score normalization, rounded to two decimals
head(round(STMotif::STSNormalization(vector = as.matrix(STMotif::example_dataset)), digits = 2)[, 1:10])
#>         1     2     3     4     5     6     7     8     9    10
#> 360  0.21  0.39  0.25  0.21  0.33  0.22  0.29  0.32  0.02 -0.04
#> 361  0.07  0.16  0.14  0.08  0.54  0.27 -0.04 -0.28 -0.29  0.26
#> 362 -0.05 -0.12 -0.18 -0.25  0.25  0.13 -0.29 -0.31 -0.22  0.21
#> 363 -0.22 -0.26 -0.21 -0.40 -0.15 -0.19 -0.09  0.30  0.29  0.10
#> 364 -0.09 -0.20 -0.02 -0.19 -0.34 -0.43  0.11  0.45  0.50  0.06
#> 365  0.18 -0.01 -0.01  0.11 -0.27 -0.39 -0.18 -0.11 -0.12 -0.22
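The z-score normalization can be sketched in base R. This is a minimal illustration and not the package's `STSNormalization` implementation. The helper name `zscore_all` is hypothetical. The sketch also assumes that a single mean and a single standard deviation are computed over the whole matrix rather than per column, which is what the output above suggests, since every column is scaled by roughly the same factor.

```r
# Hypothetical helper: z-score over every value of a numeric matrix
# (assumption: global statistics, not per-column ones)
zscore_all <- function(x) {
  mu <- mean(x, na.rm = TRUE)  # one mean for the whole dataset
  s  <- sd(x, na.rm = TRUE)    # one standard deviation for the whole dataset
  (x - mu) / s                 # keeps the matrix dimensions
}

# Small made-up matrix: rows are time steps, columns are spatial positions
m <- matrix(c(2, 4, 6, 8, 10, 12), nrow = 3)
z <- zscore_all(m)
round(z, 2)
```

After this transformation the values have mean 0 and standard deviation 1, which is the property the SAX step relies on.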

Symbolic Aggregate approXimation (SAX)

The values of the normalized subsequences tend to be normally distributed. The discretization therefore splits the Gaussian curve into intervals of equal probability, and each interval corresponds to one letter. To encode the values, the number of letters in the alphabet must be given.
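The discretization described above can be sketched in base R. The breakpoints are the quantiles that cut the standard normal distribution into `a` equally likely intervals, and each value receives the letter of the interval it falls in. The helper names and the parameter `a` (alphabet size) are hypothetical. The sketch also omits the segment-averaging (PAA) step of the classic SAX formulation, so each value is encoded individually.

```r
# Hypothetical helper: breakpoints splitting N(0, 1) into `a` intervals
# of equal probability
sax_breakpoints <- function(a) {
  qnorm(seq_len(a - 1) / a)
}

# Hypothetical helper: map each normalized value to a letter,
# "a" being the lowest interval (requires a <= 26)
sax_encode <- function(x, a) {
  idx <- findInterval(x, sax_breakpoints(a)) + 1
  letters[idx]
}

sax_breakpoints(3)
sax_encode(c(-1.2, 0.1, 0.9), a = 3)
```

With `a = 3`, the breakpoints are about -0.43 and 0.43. The three sample values therefore map to `"a"`, `"b"` and `"c"`.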

SAX Encoding with 3 letters


Partitioning spatial-time series into blocks

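This step divides the original dataset into blocks. The shape of the blocks is set by two parameters: the Spatial Slice, which is the number of series (columns) each block spans, and the Time Slice, which is the number of observations (rows) each block spans.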
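Here is a minimal sketch of the partitioning. It assumes that rows are time steps and columns are spatial positions. The helper `make_blocks` is hypothetical, as are the parameter names `sb` (Spatial Slice) and `tb` (Time Slice). This sketch simply drops any trailing rows or columns that do not fill a complete block; the package may handle them differently.

```r
# Hypothetical helper: split a matrix (rows = time, columns = space)
# into blocks of `tb` rows by `sb` columns; incomplete trailing
# blocks are dropped in this sketch
make_blocks <- function(x, sb, tb) {
  n_t <- nrow(x) %/% tb  # number of complete blocks along time
  n_s <- ncol(x) %/% sb  # number of complete blocks along space
  blocks <- list()
  for (i in seq_len(n_t)) {
    for (j in seq_len(n_s)) {
      rows <- ((i - 1) * tb + 1):(i * tb)
      cols <- ((j - 1) * sb + 1):(j * sb)
      blocks[[length(blocks) + 1]] <- x[rows, cols, drop = FALSE]
    }
  }
  blocks
}

m <- matrix(1:24, nrow = 6)          # 6 time steps x 4 spatial positions
b <- make_blocks(m, sb = 2, tb = 3)  # 4 blocks of 3 x 2
length(b)
```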

Blocks creation


Combination of blocked spatial-time series

The goal of this step is to turn each block into a time series. The spatial-time series contained in a block are combined into a single time series. Existing motif discovery tools for time series can then be used to find the candidates.
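One way to combine a block is sketched below. It assumes that the series are concatenated column by column into one longer time series. The name `combine_block` is hypothetical.

```r
# Hypothetical helper: combine the series of one block into a single
# time series by concatenating its columns one after another
combine_block <- function(block) {
  as.vector(block)  # column-major order: column 1, then column 2, ...
}

block <- matrix(c(1, 2, 3, 10, 20, 30), nrow = 3)  # 3 time steps x 2 series
combine_block(block)  # 1 2 3 10 20 30
```

With plain concatenation, a window can straddle the boundary between two series. A real implementation may therefore need to filter out such matches.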

Combining the spatial-time series within the block


Combined Series Approach

A motif discovery algorithm can now be applied to each combined series to find the candidates.
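The motif discovery algorithm itself is not detailed here. As a rough illustration only, the sketch below replaces it with a naive exact-match search. It slides a window of length `w` over a SAX-encoded series and keeps the words that occur at least `min_count` times, together with their starting positions. The function `find_candidates` and its parameters are hypothetical, and this is not the package's algorithm.

```r
# Hypothetical, naive candidate search over a SAX-encoded series:
# keep every word of length `w` that occurs at least `min_count` times
find_candidates <- function(sax, w, min_count = 2) {
  n <- length(sax) - w + 1
  if (n < 1) return(list())  # series shorter than the window
  words <- vapply(seq_len(n),
                  function(i) paste(sax[i:(i + w - 1)], collapse = ""),
                  character(1))
  pos <- split(seq_len(n), words)  # start positions of each word
  pos[lengths(pos) >= min_count]   # keep only repeated words
}

s <- c("a", "b", "c", "a", "b", "c", "c")
find_candidates(s, w = 3)
```

In the sample sequence, only the word `abc` repeats (at positions 1 and 4), so it is the single candidate returned.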

Application of motif discovery algorithm
