The purpose of this documentation is to explain how the Normalization & SAX indexing step works. The goal of this step is to prepare the dataset for the subsequent processing steps.
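This step is essential to ensure that all data are on the same scale. Normalization is performed with the Z-score method, which subtracts the mean from each value and divides the result by the standard deviation, so the normalized data have mean 0 and standard deviation 1.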
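As an illustration, Z-score normalization can be sketched in base R as below. This is a minimal sketch, not the STMotif source: the helper z_normalize is hypothetical, and the input vector is simply the first column of the six example rows printed further down.

```r
# Hypothetical helper (not part of STMotif): Z-score normalization.
# Subtract the mean and divide by the standard deviation, ignoring NAs.
z_normalize <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# First column of the six example rows (row names 360 to 365)
x <- c(737, 283, -118, -696, -251, 645)
z <- z_normalize(x)

# After normalization the mean is (numerically) 0 and the standard deviation is 1
mean(z)
sd(z)
```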
The values of normalized subsequences tend to follow a normal distribution. Therefore, SAX (Symbolic Aggregate approXimation) divides the area under the Gaussian curve into intervals of equal probability and assigns one letter to each interval. To encode the values, the number of letters in the alphabet must be given (the parameter a of NormSAX).
Figure: SAX encoding with an alphabet of 3 letters
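The equal-probability intervals can be computed from the quantiles of the standard normal distribution: for an alphabet of a letters, the cut points are qnorm(1/a), ..., qnorm((a-1)/a). For the 3-letter case this gives breakpoints of about -0.43 and 0.43. The sketch below uses the hypothetical helpers sax_breakpoints and sax_encode (not STMotif functions) and assumes the input is already Z-score normalized.

```r
# Hypothetical helpers (not part of STMotif): SAX discretization.
# Cut the standard normal curve into `a` intervals of equal probability.
sax_breakpoints <- function(a) {
  qnorm(seq_len(a - 1) / a)  # a - 1 cut points, each interval has area 1/a
}

# Map each normalized value to the letter of its interval (a <= 26).
sax_encode <- function(z, a) {
  idx <- findInterval(z, sax_breakpoints(a)) + 1  # interval index 1..a
  letters[idx]
}

sax_breakpoints(3)
sax_encode(c(-1.2, -0.1, 0.2, 0.9), a = 3)
```

With a = 3, values below about -0.43 become "a", values between the two breakpoints become "b", and values above about 0.43 become "c".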
# First six rows and first ten columns of the raw example dataset
head(STMotif::example_dataset[,1:10])
#> 1 2 3 4 5 6 7 8 9 10
#> 360 737 1350 869 750 1138 758 1006 1095 99 -83
#> 361 283 565 504 317 1849 944 -80 -895 -936 906
#> 362 -118 -375 -564 -803 870 472 -922 -1009 -698 741
#> 363 -696 -844 -654 -1303 -474 -591 -262 1034 1012 376
#> 364 -251 -622 -14 -587 -1108 -1401 404 1545 1696 247
#> 365 645 -10 -4 411 -858 -1261 -574 -329 -367 -680
# Z-score normalization followed by SAX encoding with an alphabet of 7 letters (a = 7)
head(NormSAX(D = STMotif::example_dataset, a = 7)[,1:10])
#> 1 2 3 4 5 6 7 8 9 10
#> 1 e f e e f e f f d d
#> 2 d e e d f f d b b e
#> 3 d c c c e e b b c e
#> 4 c b c b c c c f f e
#> 5 c c d c b b e f f d
#> 6 e d d e b b c c c c
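The two steps can be combined on a matrix to produce a letter matrix of the same shape, similar in form to the NormSAX output above. This is only a sketch: norm_sax_sketch is a hypothetical function, and it assumes a single global mean and standard deviation over all values, which may differ from how NormSAX actually normalizes. The input is a small 3 x 3 excerpt of the example values.

```r
# Hypothetical function (not the STMotif implementation).
# Assumption: one global mean/sd is computed over the whole matrix.
norm_sax_sketch <- function(D, a) {
  m <- as.matrix(D)
  z <- (m - mean(m, na.rm = TRUE)) / sd(m, na.rm = TRUE)  # Z-score
  breaks <- qnorm(seq_len(a - 1) / a)                     # equiprobable cuts
  # findInterval returns a vector in column-major order; rebuild the matrix
  matrix(letters[findInterval(z, breaks) + 1],
         nrow = nrow(m), ncol = ncol(m))
}

# Small excerpt of the example values (rows 360 to 362, columns 1 to 3)
D <- matrix(c( 737, 1350,  869,
               283,  565,  504,
              -118, -375, -564), nrow = 3, byrow = TRUE)
res <- norm_sax_sketch(D, a = 7)
res
```

Keeping the result as a matrix preserves the position of each value, so every letter still corresponds to the same row and column of the input.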