The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
dSVD
)In this vignette, we consider approximating a matrix as a product of two low-rank matrices (a.k.a., factor matrices).
Test data is available from toyModel
.
You will see that there are five blocks in the data matrix as follows.
Here, we introduce the ternary regularization to take {-1,0,1} values in \(U\) as below:
\[
X \approx U V' \ \mathrm{s.t.}\ U \in \{-1,0,1\},
\] where \(X\) (\(N \times M\)) is a data matrix, \(U\) (\(N \times
J\)) is a ternary score matrix, and \(V\) (\(M \times
J\)) is a loading matrix. In dcTensor
package, the
object function is optimized by combining gradient-descent algorithm
(Tsuyuzaki 2020) and ternary
regularization.
In STMF, a rank parameter \(J\)
(\(\leq \min(N, M)\)) is needed to be
set in advance. Other settings such as the number of iterations
(num.iter
) are also available. For the details of arguments
of dSVD, see ?dSVD
. After the calculation, various objects
are returned by dSVD
. STMF is achieved by specifying the
ternary regularization parameter as a large value like the below:
## List of 6
## $ U : num [1:100, 1:5] 0.00592 0.00582 0.00626 0.00641 0.00611 ...
## $ V : num [1:300, 1:5] 89.8 94.8 93.6 101 87.6 ...
## $ RecError : Named num [1:101] 1.00e-09 4.24e+05 3.67e+05 3.63e+05 3.65e+05 ...
## ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
## $ TrainRecError: Named num [1:101] 1.00e-09 4.24e+05 3.67e+05 3.63e+05 3.65e+05 ...
## ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
## $ TestRecError : Named num [1:101] 1e-09 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 ...
## ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
## $ RelChange : Named num [1:101] 1.00e-09 9.70e-01 1.55e-01 1.25e-02 4.53e-03 ...
## ..- attr(*, "names")= chr [1:101] "offset" "1" "2" "3" ...
The reconstruction error (RecError
) and relative error
(RelChange
, the amount of change from the reconstruction
error in the previous step) can be used to diagnose whether the
calculation is converged or not.
layout(t(1:2))
plot(log10(out_STMF$RecError[-1]), type="b", main="Reconstruction Error")
plot(log10(out_STMF$RelChange[-1]), type="b", main="Relative Change")
The product of \(U\) and \(V\) shows whether the original data is
well-recovered by dSVD
.
recX <- out_STMF$U %*% t(out_STMF$V)
layout(t(1:2))
image.plot(X, main="Original Data", legend.mar=8)
image.plot(recX, main="Reconstructed Data (STMF)", legend.mar=8)
The histograms of \(U\) and \(V\) show that \(U\) looks ternary but \(V\) does not.
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] nnTensor_1.2.0 fields_15.2 viridisLite_0.4.2 spam_2.9-1
## [5] dcTensor_1.3.0
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.4 jsonlite_1.8.7 highr_0.10 compiler_4.3.1
## [5] maps_3.4.1 Rcpp_1.0.11 plot3D_1.4 tagcloud_0.6
## [9] jquerylib_0.1.4 scales_1.2.1 yaml_2.3.7 fastmap_1.1.1
## [13] ggplot2_3.4.3 R6_2.5.1 tcltk_4.3.1 knitr_1.43
## [17] MASS_7.3-60 dotCall64_1.0-2 misc3d_0.9-1 tibble_3.2.1
## [21] munsell_0.5.0 pillar_1.9.0 bslib_0.5.1 RColorBrewer_1.1-3
## [25] rlang_1.1.1 utf8_1.2.3 cachem_1.0.8 xfun_0.40
## [29] sass_0.4.7 cli_3.6.1 magrittr_2.0.3 digest_0.6.33
## [33] grid_4.3.1 rTensor_1.4.8 lifecycle_1.0.3 vctrs_0.6.3
## [37] evaluate_0.21 glue_1.6.2 fansi_1.0.4 colorspace_2.1-0
## [41] rmarkdown_2.24 pkgconfig_2.0.3 tools_4.3.1 htmltools_0.5.6
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.