The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
kFoldMaskTensor
)In this vignette, we introduce a cross-validation approach using mask tensor (Fujita 2018; Owen 2009) to estimate the optimal rank parameter for matrix/tensor decomposition. Mask matrix/tensor has the same size of data matrix/tensor and contains only 0 or 1 elements, 0 means not observed value, and 1 means observed value. In this approach, only non-zero and non-NA elements are randomly specified as 0 and estimated by other elements reconstructed by the result of matrix/tensor decomposition. This can be considered a cross-validation approach because the elements specified as 1 are the training dataset and the elements specified as 0 are the test dataset.
Here, we use these packages.
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Here, we use three types of demo data as follows:
To set mask matrix/tensor, here we use kFoldMaskTensor
,
which divides only the elements other than NA and 0 into k
mask matrices/tensors. In NMF
, the mask matrix can be
specified as M
and the rank parameter (J
) with
the smallest test error is considered the optimal rank. Here, three mask
matrices are prepared for each rank, and the average is used to estimate
the optimal rank.
out_NMF <- expand.grid(replicate=1:3, rank=factor(1:10), value=0)
count <- 1
for(i in 1:10){
masks_NMF <- kFoldMaskTensor(data_matrix, k=3)
for(j in 1:3){
out_NMF[count, 3] <- rev(
NMF(data_matrix,
M = masks_NMF[[j]],
J = i)$TestRecError)[1]
count <- count + 1
}
}
Looking at the average test error for each rank, the optimal rank appears to be around 8 to 10 (with some variation depending on random seeds).
ggplot(out_NMF, aes(x=rank, y=value)) +
geom_point() +
stat_summary(fun = mean, geom = "point", shape=21, size=3, fill="blue") +
stat_summary(fun = mean, geom = "line", colour = "blue", aes(group=1)) +
xlab("Rank") +
ylab("Test Reconstruction Error")
## # A tibble: 10 × 2
## rank Avg
## <fct> <dbl>
## 1 1 4193.
## 2 2 3332.
## 3 3 2119.
## 4 4 1364.
## 5 5 1313.
## 6 6 1313.
## 7 7 1312.
## 8 8 934.
## 9 9 710.
## 10 10 732.
## # A tibble: 1 × 2
## rank Avg
## <fct> <dbl>
## 1 9 710.
Same as NMF
, mask matrix M
can be specified
in NMTF
, and the rank parameter (rank
) with
the smallest test error is considered the optimal rank. The following
code is time-consuming and should be executed in your own
environment.
out_NMTF <- expand.grid(replicate=1:3, rank2=1:10, rank1=1:10, value=0)
rank_NMTF <- paste0(out_NMTF$rank1, "-", out_NMTF$rank2)
out_NMTF <- cbind(out_NMTF, rank=factor(rank_NMTF, levels=unique(rank_NMTF)))
count <- 1
for(i in 1:10){
for(j in 1:10){
masks_NMTF <- kFoldMaskTensor(data_matrix, k=3)
for(k in 1:3){
out_NMTF[count, 4] <- rev(
NMTF(data_matrix,
M = masks_NMTF[[k]],
rank = c(i, j))$TestRecError)[1]
count <- count + 1
}
}
}
ggplot(out_NMTF, aes(x=rank, y=value)) +
geom_point() +
stat_summary(fun = mean, geom = "point", shape=21, size=3, fill="blue") +
stat_summary(fun = mean, geom = "line", colour = "blue", aes(group=1)) +
xlab("Rank") +
ylab("Test Reconstruction Error") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Same as NMF
, mask matrices M
can be
specified in siNMF
, and the rank parameter (J
)
with the smallest test error is considered the optimal rank. The
following code is time-consuming and should be executed in your own
environment.
out_siNMF <- expand.grid(replicate=1:3, rank=factor(1:10), value=0)
count <- 1
for(i in 1:10){
masks_siNMF <- lapply(1:3, function(x){
list(
kFoldMaskTensor(data_matrices[[1]], k=3)[[x]],
kFoldMaskTensor(data_matrices[[2]], k=3)[[x]],
kFoldMaskTensor(data_matrices[[3]], k=3)[[x]])
})
for(j in 1:3){
out_siNMF[count, 3] <- rev(
siNMF(data_matrices,
M = masks_siNMF[[j]],
J = i)$TestRecError)[1]
count <- count + 1
}
}
Same as NMF
, mask matrices M
can be
specified in jNMF
, and the rank parameter (J
)
with the smallest test error is considered the optimal rank. The
following code is time-consuming and should be executed in your own
environment.
out_jNMF <- expand.grid(replicate=1:3, rank=factor(1:10), value=0)
count <- 1
for(i in 1:10){
masks_jNMF <- lapply(1:3, function(x){
list(
kFoldMaskTensor(data_matrices[[1]], k=3)[[x]],
kFoldMaskTensor(data_matrices[[2]], k=3)[[x]],
kFoldMaskTensor(data_matrices[[3]], k=3)[[x]])
})
for(j in 1:3){
out_jNMF[count, 3] <- rev(
jNMF(data_matrices,
M = masks_jNMF[[j]],
J = i)$TestRecError)[1]
count <- count + 1
}
}
Same as NMF
, mask tensor M
can be specified
in NTF
, and the rank parameter (rank
) with the
smallest test error is considered the optimal rank. The following code
is time-consuming and should be executed in your own environment.
out_NTF <- expand.grid(replicate=1:3, rank=factor(1:10), value=0)
count <- 1
for(i in 1:10){
masks_NTF <- kFoldMaskTensor(data_tensor, k=3)
for(j in 1:3){
out_NTF[count, 3] <- rev(
NTF(data_tensor,
M = masks_NTF[[j]],
rank = i)$TestRecError)[1]
count <- count + 1
}
}
Same as NMF
, mask tensor M
can be specified
in NTD
, and the rank parameter (rank
) with the
smallest test error is considered the optimal rank. The following code
is time-consuming and should be executed in your own environment.
out_NTD <- expand.grid(replicate=1:3, rank3=1:5, rank2=1:5, rank1=1:5, value=0)
rank_NTD <- paste0(out_NTD$rank1, "-", out_NTD$rank2,
"-", out_NTD$rank3)
out_NTD <- cbind(out_NTD, rank=factor(rank_NTD, levels=unique(rank_NTD)))
count <- 1
for(i in 1:5){
for(j in 1:5){
for(k in 1:5){
masks_NTD <- kFoldMaskTensor(data_tensor, k=3)
for(k in 1:3){
out_NTD[count, 5] <- rev(
NTD(data_tensor,
M = masks_NTD[[k]],
rank = c(i, j, k))$TestRecError)[1]
count <- count + 1
}
}
}
}
ggplot(out_NTD, aes(x=rank, y=value)) +
geom_point() +
stat_summary(fun = mean, geom = "point", shape=21, size=3, fill="blue") +
stat_summary(fun = mean, geom = "line", colour = "blue", aes(group=1)) +
xlab("Rank") +
ylab("Test Reconstruction Error") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
## R version 4.3.0 (2023-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dplyr_1.1.4 ggplot2_3.4.2 rTensor_1.4.8 nnTensor_1.3.0
##
## loaded via a namespace (and not attached):
## [1] viridis_0.6.3 sass_0.4.6 utf8_1.2.3 generics_0.1.3
## [5] tcltk_4.3.0 digest_0.6.31 magrittr_2.0.3 evaluate_0.21
## [9] grid_4.3.0 RColorBrewer_1.1-3 fastmap_1.1.1 maps_3.4.1
## [13] jsonlite_1.8.5 misc3d_0.9-1 gridExtra_2.3 fansi_1.0.4
## [17] spam_2.9-1 viridisLite_0.4.2 scales_1.2.1 jquerylib_0.1.4
## [21] cli_3.6.1 rlang_1.1.1 munsell_0.5.0 withr_2.5.0
## [25] cachem_1.0.8 yaml_2.3.7 tools_4.3.0 colorspace_2.1-0
## [29] vctrs_0.6.5 R6_2.5.1 lifecycle_1.0.3 plot3D_1.4
## [33] MASS_7.3-60 pkgconfig_2.0.3 pillar_1.9.0 bslib_0.5.0
## [37] gtable_0.3.3 glue_1.6.2 Rcpp_1.0.10 fields_14.1
## [41] xfun_0.39 tibble_3.2.1 tidyselect_1.2.1 highr_0.10
## [45] knitr_1.43 farver_2.1.1 htmltools_0.5.5 rmarkdown_2.22
## [49] labeling_0.4.2 tagcloud_0.6 dotCall64_1.0-2 compiler_4.3.0
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.