The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
R Package for detecting data leakages in time series forecasting competitions.
The development version from GitHub with:
# install.packages("devtools")
::install_github("thiyangt/tsdataleaks")
devtoolslibrary(tsdataleaks)
To demonstrate the package functions, I created a small data set with 4 time series.
set.seed(2020)
<- rnorm(15)
a <- rnorm(10)
d <- list(
lst a = a,
b = c(a[10:15]+rep(8,6), rnorm(10), a[1:5], a[1:5]),
c = c(rnorm(10), -a[1:5]),
d = d,
e = d)
find_dataleaks
:
Exploit data leakslibrary(tsdataleaks)
library(magrittr)
library(tidyverse)
library(viridis)
# h - I assume test period length is 5 and took that as wind size, h.
<- find_dataleaks(lstx = lst, h=5, cutoff=1)
f1
f1$a
.id start end2 b 2 6
$b
.id start end1 a 1 5
2 b 17 21
4 c 11 15
$c
.id start end1 a 1 5
2 b 17 21
3 b 22 26
$d
.id start end5 e 6 10
$e
.id start end4 d 6 10
Interpretation: The first element in the list means the last 5
observations of the time series a
correlates with time
series b
observarion from 2 to 6.
viz_dataleaks
:
Visualise the data leaksviz_dataleaks(f1)
1]] [[
[[2]]
[[2]]$a
.id start end
2 b 2 6
[[2]]$b
.id start end
1 a 1 5
2 b 17 21
4 c 11 15
[[2]]$c
.id start end
1 a 1 5
2 b 17 21
3 b 22 26
[[2]]$d
.id start end
5 e 6 10
[[2]]$e
.id start end
4 d 6 10
reason_dataleaks
Display the reasons for data leaks and evaluate usefulness of data leaks towards the winning of the competition
<- reason_dataleaks(lstx = lst, finddataleaksout = f1, h=5)
r1
r11]]
[[
series1 .id start end dist_mean dist_sd is.useful.leak dist_cor1 a b 2 6 -8.0 0.0 useful 1
2 b a 1 5 0.0 0.0 useful 1
3 b b 17 21 0.0 0.0 useful 1
4 b c 11 15 -1.7 2.6 not useful -1
5 c a 1 5 1.7 2.6 useful -1
6 c b 17 21 1.7 2.6 useful -1
7 c b 22 26 1.7 2.6 not useful -1
8 d e 6 10 0.0 0.0 not useful 1
9 e d 6 10 0.0 0.0 not useful 1
reason1 add constant
2 exact match
3 exact match
4 multiply by -1 or negative constant value
5 multiply by -1 or negative constant value
6 multiply by -1 or negative constant value
7 multiply by -1 or negative constant value
8 exact match
9 exact match
2]] [[
= rnorm(15)
a <- list(
lst
a,c(a[10:15], rnorm(10), a[1:5], a[1:5]),
c(rnorm(10), a[1:5])
)<- find_dataleaks(lst, h=5) f1
viz_dataleaks(f1)
#> [[1]]
#>
#> [[2]]
#> [[2]]$`1`
#> .id start end
#> 2 2 2 6
#>
#> [[2]]$`2`
#> .id start end
#> 1 1 1 5
#> 2 2 17 21
#> 4 3 11 15
#>
#> [[2]]$`3`
#> .id start end
#> 1 1 1 5
#> 2 2 17 21
#> 3 2 22 26
reason_dataleaks(lst, f1, h=5)
#> [[1]]
#> series1 .id start end dist_mean dist_sd is.useful.leak dist_cor reason
#> 1 1 2 2 6 0 0 useful 1 exact match
#> 2 2 1 1 5 0 0 useful 1 exact match
#> 3 2 2 17 21 0 0 useful 1 exact match
#> 4 2 3 11 15 0 0 not useful 1 exact match
#> 5 3 1 1 5 0 0 useful 1 exact match
#> 6 3 2 17 21 0 0 useful 1 exact match
#> 7 3 2 22 26 0 0 not useful 1 exact match
#>
#> [[2]]
library(Mcomp)
data("M1")
<- subset(M1, "yearly")
M1Y <- lapply(M1Y, function(temp){temp$x})
M1Y_x <- find_dataleaks(M1Y_x, h=6, cutoff = 1)
m1y_f1
m1y_f1#> $YAF17
#> .id start end
#> 22 YAM6 9 14
#>
#> $YAM6
#> .id start end
#> 16 YAF17 16 21
#>
#> $YAM28
#> .id start end
#> 78 YAI21 16 21
#>
#> $YAB3
#> .id start end
#> 18 YAM2 14 19
#>
#> $YAB4
#> .id start end
#> 17 YAM1 15 20
#>
#> $YAI21
#> .id start end
#> 43 YAM28 16 21
#>
#> $YAG29
#> .id start end
#> 137 YAC15 6 11
viz_dataleaks(m1y_f1)
#> [[1]]
#>
#> [[2]]
#> [[2]]$YAF17
#> .id start end
#> 22 YAM6 9 14
#>
#> [[2]]$YAM6
#> .id start end
#> 16 YAF17 16 21
#>
#> [[2]]$YAM28
#> .id start end
#> 78 YAI21 16 21
#>
#> [[2]]$YAB3
#> .id start end
#> 18 YAM2 14 19
#>
#> [[2]]$YAB4
#> .id start end
#> 17 YAM1 15 20
#>
#> [[2]]$YAI21
#> .id start end
#> 43 YAM28 16 21
#>
#> [[2]]$YAG29
#> .id start end
#> 137 YAC15 6 11
reason_dataleaks(M1Y_x, m1y_f1, h=6, ang=90)
#> [[1]]
#> series1 .id start end dist_mean dist_sd is.useful.leak dist_cor
#> 1 YAF17 YAM6 9 14 5.4 0.4 not useful 1
#> 2 YAM6 YAF17 16 21 -5.4 0.4 not useful 1
#> 3 YAM28 YAI21 16 21 0.0 0.0 not useful 1
#> 4 YAB3 YAM2 14 19 0.0 0.0 useful 1
#> 5 YAB4 YAM1 15 20 0.0 0.0 useful 1
#> 6 YAI21 YAM28 16 21 0.0 0.0 not useful 1
#> 7 YAG29 YAC15 6 11 -36815.7 6159.2 useful 1
#> reason
#> 1 other transformation
#> 2 other transformation
#> 3 exact match
#> 4 exact match
#> 5 exact match
#> 6 exact match
#> 7 other transformation
#>
#> [[2]]
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.