Wavelet-based multivariate spectral analysis

Joris Chau

2017-12-09

Introduction

In multivariate time series analysis, the second-order behavior of a multivariate time series is studied by means of the autocovariance matrices in the time domain, or the spectral density matrix in the frequency domain. A non-degenerate spectral density matrix is necessarily a curve (or surface) of Hermitian positive definite (HPD) matrices, and one generally constrains a spectral matrix estimator to preserve this property. This ensures that the estimator can be interpreted as a covariance or spectral matrix, and it avoids computational issues in, e.g., simulation or bootstrapping.

In (Chau and von Sachs), we develop intrinsic wavelet transforms and perform nonparametric wavelet regression for curves or surfaces in the non-Euclidean space of HPD matrices, exploiting the geometric structure of the space as a Riemannian manifold. Nonlinear wavelet denoising in the Riemannian manifold allows one to capture local smoothness behavior of the spectral matrix across frequency, as well as varying degrees of smoothness across components of the spectral matrix, while guaranteeing an HPD spectral estimator. Moreover, and in contrast to existing approaches, the wavelet-based spectral estimator in the space of HPD matrices endowed with a specific invariant Riemannian metric is equivariant under a change of basis (i.e. change of coordinate system) of the given time series. For more details we refer to (Chau and von Sachs).
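
The equivariance property rests on the fact that the affine-invariant Riemannian distance between two HPD matrices does not change when both are transformed by the same congruence \(A \mapsto G A G^*\). The base R sketch below checks this numerically. The helpers herm_fun(), riem_dist() and rand_hpd() are written for this illustration only and are not functions of the pdSpecEst package.

```r
## Apply a scalar function to a Hermitian matrix through its eigendecomposition
herm_fun <- function(A, fun) {
  e <- eigen(A, symmetric = TRUE)           # Hermitian eigendecomposition
  e$vectors %*% diag(fun(e$values), nrow(A)) %*% Conj(t(e$vectors))
}

## Affine-invariant Riemannian distance: || Log(A^{-1/2} B A^{-1/2}) ||_F
riem_dist <- function(A, B) {
  A_isqrt <- herm_fun(A, function(x) 1 / sqrt(x))
  M <- A_isqrt %*% B %*% A_isqrt
  M <- (M + Conj(t(M))) / 2                 # symmetrize against rounding error
  ev <- eigen(M, symmetric = TRUE, only.values = TRUE)$values
  sqrt(sum(log(ev)^2))
}

## Random HPD matrix (toy data)
rand_hpd <- function(d) {
  X <- matrix(complex(real = rnorm(d^2), imaginary = rnorm(d^2)), d)
  X %*% Conj(t(X)) + diag(d)
}

set.seed(1)
A <- rand_hpd(3)
B <- rand_hpd(3)
G <- matrix(complex(real = rnorm(9), imaginary = rnorm(9)), 3)  # change of basis

## Riemannian distance before and after the change of basis
d1 <- riem_dist(A, B)
d2 <- riem_dist(G %*% A %*% Conj(t(G)), G %*% B %*% Conj(t(G)))

## Euclidean (Frobenius) distance before and after, for comparison
frob <- function(M) sqrt(sum(Mod(M)^2))
e1 <- frob(A - B)
e2 <- frob(G %*% (A - B) %*% Conj(t(G)))

c(riemannian = d1, riemannian_transformed = d2)
c(euclidean = e1, euclidean_transformed = e2)
```

The two Riemannian distances agree up to rounding error, whereas the Euclidean distances generally differ.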

In this vignette, we demonstrate how to use the pdSpecEst package to perform intrinsic wavelet-based spectral estimation for stationary and nonstationary time series by linear or nonlinear wavelet denoising of the (time-varying) periodogram matrices in the intrinsic manifold wavelet domain. In addition to (time-varying) spectral matrix estimation, we consider fast fuzzy clustering for (time-varying) spectral matrices of replicated multivariate time series using the sparse representations of the spectral matrices in the intrinsic manifold wavelet domain.

Shiny app

A demo Shiny app for intrinsic wavelet-based (time-varying) spectral estimation and clustering is available at https://jchau.shinyapps.io/pdSpecEst/. The app allows the user to test, tune and time the wavelet-based (time-varying) spectral matrix estimation or wavelet-based spectral matrix clustering procedures on simulated multivariate time series data. The estimated spectral matrices (resp. their wavelet representations) and/or cluster assignments may be compared to the true generating spectral matrices (resp. their wavelet representations) and/or clusters, or, in the case of the wavelet-based spectral estimate, to a benchmark multitaper spectral estimate.

Stationary (1D) spectral matrix estimation

First, we demonstrate how to perform 1D spectral matrix estimation for stationary multivariate \(d\)-dimensional time series with pdSpecEst1D(). In this case the target spectrum is a curve of HPD matrices in the Riemannian manifold.

Simulate time series data with rExamples()

With rExamples() we simulate multivariate time series observations from an example HPD spectral matrix. Examples include: (i) a (\(3 \times 3\)) heaviSine HPD spectral matrix consisting of smooth sinusoids with a break, (ii) a (\(3 \times 3\)) bumps HPD spectral matrix containing peaks and bumps of various degrees of smoothness, (iii) a (\(3 \times 3\)) two-cats HPD spectral matrix visualizing the contour of two side-by-side cats, with inhomogeneous smoothness across frequency, and (iv) a (\(2 \times 2\)) Gaussian HPD spectral matrix consisting of smooth random Gaussian functions. The time series observations are generated via the transfer function of the example spectral matrix and complex normal random variates by means of its Cramér representation. In addition, the function returns the true generating spectral matrix of the time series observations and an initial noisy multitaper HPD periodogram obtained with pdPgram(). By default the periodogram is pre-smoothed using \(d\) (dimension of the time series) Slepian tapering functions to guarantee that the periodogram is everywhere HPD.

library(pdSpecEst)
## Generate example time series and periodogram
set.seed(123)
d <- 3
n <- 2^9 
example <- rExamples(2 * n, example = "bumps")
freq <- example$freq
str(example)
#> List of 4
#>  $ f   : cplx [1:3, 1:3, 1:512] 1.00+0.00i -3.05e-16+1.61e-15i -1.05e-15+1.19e-15i ...
#>  $ freq: num [1:512] 0 0.00614 0.01227 0.01841 0.02454 ...
#>  $ per : cplx [1:3, 1:3, 1:512] 0.346+0i -0.268+0.178i -0.313-0.049i ...
#>  $ ts  : cplx [1:1024, 1:3] 0.92+1.5i 1.86+1.21i 3.21+1.53i ...
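
To see why pre-smoothing with at least \(d\) tapers is needed, note that a raw periodogram matrix at a single frequency is the outer product of one DFT vector and therefore has rank one. The base R sketch below illustrates this on toy white noise. It substitutes sine tapers for the Slepian tapers mentioned above to avoid extra dependencies, and the helpers raw_pgram() and sine_taper_pgram() are hypothetical names that do not reproduce the internals of pdPgram().

```r
set.seed(1)
n <- 64
d <- 3
X <- matrix(rnorm(n * d), n, d)             # toy series, one column per component

## Raw periodogram at Fourier index k: outer product of the DFT vector
raw_pgram <- function(X, k) {
  x <- mvfft(X)[k, ] / sqrt(2 * pi * nrow(X))
  outer(x, Conj(x))
}

## Average of B tapered periodograms, using orthonormal sine tapers
sine_taper_pgram <- function(X, k, B) {
  n <- nrow(X)
  P <- matrix(0i, ncol(X), ncol(X))
  for (b in seq_len(B)) {
    h <- sqrt(2 / (n + 1)) * sin(pi * b * seq_len(n) / (n + 1))
    x <- mvfft(h * X)[k, ] / sqrt(2 * pi)   # h multiplies every column of X
    P <- P + outer(x, Conj(x)) / B
  }
  P
}

## Eigenvalues in decreasing order
ev_raw <- eigen(raw_pgram(X, 10), symmetric = TRUE, only.values = TRUE)$values
ev_mt  <- eigen(sine_taper_pgram(X, 10, B = d), symmetric = TRUE,
                only.values = TRUE)$values

ev_raw   # one positive eigenvalue, the others numerically zero
ev_mt    # all eigenvalues strictly positive
```

With fewer than \(d\) tapers the averaged matrix would still be rank deficient, which is consistent with the default of \(d\) tapers described above.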

Below we plot the simulated time series data and the generating spectral matrix in the frequency domain.

Figure 1: Generating (3 x 3)-dimensional spectral matrix

WavTransf1D() transforms the multivariate spectrum (i.e. a curve of HPD matrices) to the intrinsic manifold wavelet domain. By default the order of the intrinsic average-interpolation (AI) refinement scheme is order = 5 and the space of HPD matrices is equipped with the invariant Riemannian metric metric = 'Riemannian', but this can also be one of: 'logEuclidean', 'Cholesky', 'rootEuclidean' or 'Euclidean'.

wt.f <- WavTransf1D(example$f, periodic = TRUE)

Below we plot the Frobenius norms of the Hermitian matrix-valued wavelet coefficients across scale-locations from the finest wavelet scale \(j = J - 1\) to the coarsest wavelet scale \(j = 0\), with \(J = \log_2(n)\).
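
As a minimal sketch of how such norms can be computed, the base R code below assumes that the coefficients are stored as a list with one (\(d \times d \times n_j\)) array per scale, which is the layout of the component D in the str() output of pdSpecEst1D() further below. The toy pyramid D_toy and the helpers rand_herm() and frob_norms() are made up for this illustration.

```r
## Frobenius norm of every (d x d) coefficient, scale by scale
frob_norms <- function(D) {
  lapply(D, function(Dj) apply(Dj, 3, function(M) sqrt(sum(Mod(M)^2))))
}

## Random Hermitian matrix (toy data)
rand_herm <- function(d) {
  X <- matrix(complex(real = rnorm(d^2), imaginary = rnorm(d^2)), d)
  (X + Conj(t(X))) / 2
}

## Toy pyramid with 2^j Hermitian (2 x 2) coefficients at scale j = 0, ..., 3
set.seed(1)
D_toy <- lapply(0:3, function(j) {
  array(unlist(lapply(seq_len(2^j), function(k) rand_herm(2))),
        dim = c(2, 2, 2^j))
})
names(D_toy) <- paste0("D.scale", 0:3)

norms <- frob_norms(D_toy)
lengths(norms)   # number of coefficients per scale
```

Each list element of norms is a numeric vector with one non-negative norm per scale-location, which can then be plotted per scale.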

Wavelet-denoised spectral estimator with pdSpecEst1D()

pdSpecEst1D() computes the HPD wavelet-denoised spectral matrix estimator by: (i) applying the intrinsic 1D AI wavelet transform (WavTransf1D) to an initial noisy HPD spectral estimate, (ii) (tree-structured) thresholding of the wavelet coefficients (pdCART) and (iii) applying the intrinsic inverse 1D AI wavelet transform (InvWavTransf1D). The complete estimation procedure is described in detail in (Chau and von Sachs).

f.hat <- pdSpecEst1D(example$per)
str(f.hat)
#> List of 5
#>  $ f           : cplx [1:3, 1:3, 1:512] 0.743-0i 0.367+0.699i -0.928+0.264i ...
#>  $ D           :List of 7
#>   ..$ D.scale0: cplx [1:3, 1:3, 1:3] -0.68-0i 0.061+0.355i -0.253+0.284i ...
#>   ..$ D.scale1: cplx [1:3, 1:3, 1:4] 0+0i 0+0i 0+0i ...
#>   ..$ D.scale2: cplx [1:3, 1:3, 1:6] 0+0i 0+0i 0+0i ...
#>   ..$ D.scale3: cplx [1:3, 1:3, 1:10] 0+0i 0+0i 0+0i ...
#>   ..$ D.scale4: cplx [1:3, 1:3, 1:18] 0+0i 0+0i 0+0i ...
#>   ..$ D.scale5: cplx [1:3, 1:3, 1:34] 0+0i 0+0i 0+0i ...
#>   ..$ D.scale6: cplx [1:3, 1:3, 1:66] 0+0i 0+0i 0+0i ...
#>  $ M0          : cplx [1:3, 1:3, 1:5] 1.3195+0i -0.0064-0.1817i 0.0973-0.0783i ...
#>  $ tree.weights:List of 6
#>   ..$ : logi [1:2] TRUE TRUE
#>   ..$ : logi [1:4] TRUE TRUE TRUE TRUE
#>   ..$ : logi [1:8] TRUE TRUE FALSE FALSE FALSE FALSE ...
#>   ..$ : logi [1:16] TRUE TRUE TRUE FALSE FALSE FALSE ...
#>   ..$ : logi [1:32] FALSE FALSE FALSE FALSE FALSE FALSE ...
#>   ..$ : logi [1:64] FALSE FALSE FALSE FALSE FALSE FALSE ...
#>  $ alpha.opt   : num 1

Nonlinear tree-structured wavelet thresholding

The noise is removed by nonlinear tree-structured thresholding of the wavelet coefficients based on the trace of the whitened coefficients through minimization of a complexity penalized residual sum of squares (CPRESS) criterion via the fast tree-pruning algorithm in (Donoho). The sparsity parameter is set equal to alpha times the universal threshold, where the noise standard deviation (homogeneous across scales) of the traces of the whitened coefficients is determined via the median absolute deviation (MAD) of the coefficients at the finest wavelet scale. If the thresholding policy is set to policy = 'universal', the sparsity parameter is set equal to the universal threshold. If the thresholding policy is set to policy = 'cv', a data-adaptive sparsity parameter is determined via a more time-consuming two-fold cross-validation procedure as in (Nason) relying on the chosen metric for the space of HPD matrices. Note that the two-fold cross-validation procedure works best if the noisy HPD matrix-valued observations are (approximately) independent in the frequency domain.
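
The role of alpha and of the MAD noise estimate can be sketched in base R as follows. This is a simplified illustration only: it applies plain hard thresholding to a vector of toy scalar coefficients, not the tree-structured CPRESS minimization used by the package, and universal_threshold() and hard_threshold() are hypothetical helper names.

```r
## alpha times the universal threshold sigma * sqrt(2 * log(n)),
## with sigma estimated by the MAD of the finest-scale coefficients
universal_threshold <- function(w_fine, n, alpha = 1) {
  sigma_hat <- mad(w_fine)                  # scaled for consistency at the Gaussian
  alpha * sigma_hat * sqrt(2 * log(n))
}

## Keep coefficients that exceed the threshold in absolute value, zero the rest
hard_threshold <- function(w, lambda) ifelse(abs(w) > lambda, w, 0)

set.seed(1)
n <- 256
signal <- c(rep(0, n - 6), rep(4, 6))       # sparse toy signal
w <- signal + rnorm(n)                      # noisy coefficients, noise sd = 1

lambda <- universal_threshold(w, n, alpha = 1)
w_thr <- hard_threshold(w, lambda)

lambda
sum(w_thr != 0)   # number of coefficients that survive
```

Values of alpha below one keep more coefficients and values above one yield a sparser representation.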

Linear wavelet thresholding

It is also possible to perform linear thresholding of wavelet scales using pdSpecEst1D() by setting the argument alpha = 0 (i.e. no nonlinear thresholding) and the argument jmax to the maximum wavelet scale we wish to keep in the intrinsic inverse AI wavelet transform. For instance, if jmax = 5 the wavelet coefficients at scales \(j\) with \(j \leq 5\) will not be altered, but all wavelet coefficients at scales \(j > 5\) will be set to zero.
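
The effect of jmax can be mimicked in base R on a list of coefficient arrays indexed by scale. The sketch below uses a toy pyramid and a hypothetical helper linear_threshold(). The commented call at the end uses only the arguments alpha and jmax described above and is not run here.

```r
## Set all coefficients at scales j > jmax to zero, leave coarser scales untouched
linear_threshold <- function(D, jmax) {
  for (j in seq_along(D) - 1) {             # scales j = 0, 1, ..., length(D) - 1
    if (j > jmax) D[[j + 1]][] <- 0         # keeps the array dimensions
  }
  D
}

## Toy pyramid with 2^j (2 x 2) coefficients at scale j = 0, ..., 3
set.seed(1)
D_toy <- lapply(0:3, function(j) array(rnorm(4 * 2^j) + 0i, dim = c(2, 2, 2^j)))
names(D_toy) <- paste0("D.scale", 0:3)

D_lin <- linear_threshold(D_toy, jmax = 1)
sapply(D_lin, function(Dj) all(Mod(Dj) == 0))   # TRUE for scales 2 and 3 only

## Corresponding package call, as described in the text (not run):
## f.lin <- pdSpecEst1D(example$per, alpha = 0, jmax = 5)
```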

Estimation results

The figures below show the Frobenius norms of the Hermitian matrix-valued wavelet coefficients of the initial noisy HPD periodogram matrix and the denoised HPD spectral matrix estimator in the intrinsic manifold wavelet domain.

The figure below shows the target spectral matrix (dashed lines) given by example$f and the estimated spectral matrix (continuous lines) obtained from f.hat$f at the frequencies example$freq. The spectral estimator captures both the smooth spectral matrix behavior in the high-frequency range and the localized peaks in the low-frequency range, while guaranteeing positive definiteness of the estimator.

Figure 2: target spectral matrix (dashed lines) and estimated spectral matrix (continuous lines).

Depth-based bootstrap confidence regions with pdConfInt1D()

Given the wavelet-based spectral matrix estimator, we can assess its variability with pdConfInt1D(), which uses a parametric bootstrap procedure to construct depth-based confidence regions based on the intrinsic manifold data depths developed in (Chau, Ombao, and von Sachs). The bootstrap procedure exploits the data generating process of a stationary time series via its Cramér representation and substitutes the true generating transfer functions with the transfer functions based on a consistent spectral matrix estimator.

The following code constructs intrinsic depth-based simultaneous confidence regions at \(100(1-\alpha)\%\) confidence levels with \(\alpha \in \{0.1, 0.05, 0.01\}\) over the second half of the frequency domain (\(\omega \in (\pi/2, \pi]\)) based on the Log-Euclidean metric and the manifold spatial depth.

## Not run
boot.ci <- pdConfInt1D(f.hat$f, alpha = c(0.1, 0.05, 0.01), ci.region = c(0.5, 1), 
                       boot.samples = 1E3, f.0 = example$f)

The depth-based confidence balls (maximum, minimum depths and radii) are given by the component depth.CI. In particular, a given curve of HPD matrices is covered by the confidence ball if its manifold data depth with respect to the cloud of bootstrap spectral estimates is above the minimum depths given by the component depth.CI. The code line above also checks whether the (true) target spectral matrix example$f is covered by the confidence regions at the different confidence levels.

#> The depth-based confidence balls:
#> $spatial
#> , , ci.region.1
#> 
#>        max-depth min-depth   radius
#> 90%-CI         1 0.2428679 1.579807
#> 95%-CI         1 0.2295909 1.611627
#> 99%-CI         1 0.2119668 1.649800
#> Verification that the target spectral matrix is covered:
#> $spatial
#>        ci.region.1
#> 90%-CI        TRUE
#> 95%-CI        TRUE
#> 99%-CI        TRUE
#> Depth of the target spectrum w.r.t. cloud of bootstrap spectral estimates
#> $spatial
#> ci.region.1 
#>    0.426467
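
The coverage rule can be retraced by hand from the printed numbers. The base R snippet below copies the minimum depths and the depth of the target spectrum from the output above and applies the rule that a curve is covered when its depth lies above the minimum depth of the confidence ball. The object names min_depth and depth_f0 are chosen for this illustration.

```r
## Values copied from the printed output above
min_depth <- c("90%-CI" = 0.2428679, "95%-CI" = 0.2295909, "99%-CI" = 0.2119668)
depth_f0 <- 0.426467                        # depth of the target spectrum

## Covered if the depth exceeds the minimum depth of the confidence ball
covered <- depth_f0 > min_depth
covered
```

All three comparisons are TRUE, in agreement with the verification table printed by pdConfInt1D().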

Nonstationary (2D) spectral matrix estimation

Second, we demonstrate how to perform 2D spectral matrix estimation for nonstationary multivariate \(d\)-dimensional time series with pdSpecEst2D(). In this case the target spectrum is a surface of HPD matrices in the Riemannian manifold and it suffices to replace the suffix -1D used in the functions above by the suffix -2D.

Simulate pseudo time-varying periodograms with rExamples2D()

rExamples2D() simulates noisy HPD pseudo time-varying periodogram observations from an example HPD time-varying spectral matrix. Examples include: (i) a (\(d \times d\)) smiley HPD spectral matrix consisting of constant surfaces of random HPD matrices in the shape of a smiley face, (ii) a (\(d \times d\)) tvar HPD spectral matrix generated from a time-varying vector auto-regressive process of order 1 with a random time-varying coefficient matrix (\(\Phi\)), (iii) a (\(d \times d\)) generally smooth HPD spectral matrix containing a pronounced peak in the center of the discretized time-frequency grid and (iv) a (\(d \times d\)) facets HPD spectral matrix consisting of several facets generated from random geodesic surfaces. Instead of generating nonstationary time series observations via the Cramér representation based on the transfer function of the example spectral matrix as in rExamples(), which is relatively time-consuming, we directly generate pseudo HPD periodogram observations as independent complex random HPD Wishart matrices centered around the generating HPD spectral matrix. Informally, such random matrix behavior corresponds to the asymptotic distribution of the actual HPD periodogram observations (obtained with pdPgram2D()) of a multivariate time series with the given generating HPD spectral matrix.

## Generate example pseudo time-varying periodogram observations
set.seed(17)
d <- 2
n <- c(2^7, 2^7) 
example <- rExamples2D(n, d, example = "smiley", snr = 0.5)
tf.grid <- example$tf.grid ## time-frequency grid
str(example)
#> List of 3
#>  $ f      : cplx [1:2, 1:2, 1:128, 1:128] 3.78+0i 2.98-0.93i 2.98+0.93i ...
#>  $ tf.grid:List of 2
#>   ..$ time     : num [1:128] 0.00781 0.01562 0.02344 0.03125 0.03906 ...
#>   ..$ frequency: num [1:128] 0.00391 0.00781 0.01172 0.01562 0.01953 ...
#>  $ per    : cplx [1:2, 1:2, 1:128, 1:128] 1.43-0i 1.09-0.1i 1.09+0.1i ...
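
The idea of a complex Wishart matrix centered around a given HPD matrix can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used by rExamples2D(): the helper names herm_sqrt() and rcwishart(), the number of degrees of freedom B and the toy matrix f are all hypothetical.

```r
## Hermitian square root through the eigendecomposition
herm_sqrt <- function(A) {
  e <- eigen(A, symmetric = TRUE)
  e$vectors %*% diag(sqrt(e$values), nrow(A)) %*% Conj(t(e$vectors))
}

## One draw W = f^{1/2} (Z Z^* / B) f^{1/2}, with Z a (d x B) matrix of
## standard complex normal variates, so that E[W] = f
rcwishart <- function(f, B) {
  d <- nrow(f)
  Z <- matrix(complex(real = rnorm(d * B), imaginary = rnorm(d * B)), d, B) / sqrt(2)
  f_sqrt <- herm_sqrt(f)
  W <- f_sqrt %*% (Z %*% Conj(t(Z)) / B) %*% f_sqrt
  (W + Conj(t(W))) / 2                      # symmetrize against rounding error
}

set.seed(1)
f <- matrix(c(2, 0.5 - 0.5i, 0.5 + 0.5i, 1), 2, 2)   # toy HPD matrix

W <- rcwishart(f, B = 2)
ev_W <- eigen(W, symmetric = TRUE, only.values = TRUE)$values
ev_W                                        # positive when B >= d

## The average over many draws approaches f
W_bar <- Reduce(`+`, replicate(2000, rcwishart(f, B = 2), simplify = FALSE)) / 2000
max(Mod(W_bar - f))
```

Choosing B at least as large as the dimension \(d\) keeps each draw positive definite with probability one.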

The figures below show the matrix-logarithms of the (true) target time-varying spectrum and the matrix-logarithms of the noisy HPD pseudo-periodogram observations. Here we only display the auto-spectrum and upper-diagonal cross-spectrum components of the spectral matrix, as the lower-diagonal cross-spectrum components are the complex conjugates of the upper-diagonal components.

Figure 3: Matrix-log's of target time-varying spectrum

Figure 4: Matrix-log's of pseudo-periodogram matrix

Wavelet-denoised time-varying spectral estimator with pdSpecEst2D()

pdSpecEst2D() computes the HPD wavelet-denoised spectral matrix estimator by: (i) applying the intrinsic 2D AI wavelet transform (WavTransf2D) to an initial noisy HPD spectral estimate, (ii) (tree-structured) thresholding of the wavelet coefficients (pdCART) and (iii) applying the intrinsic inverse 2D AI wavelet transform (InvWavTransf2D). By default the marginal refinement orders of the forward and inverse 2D wavelet transform are order = c(3, 3) and the space of HPD matrices is equipped with the invariant Riemannian metric metric = 'Riemannian', but this can also be one of: 'logEuclidean', 'Cholesky', 'rootEuclidean' or 'Euclidean'.

f.hat <- pdSpecEst2D(example$per, order = c(1, 1), progress = FALSE)
str(f.hat)
#> List of 5
#>  $ f           : cplx [1:2, 1:2, 1:128, 1:128] 3.35+0i 2.52-0.76i 2.52+0.76i ...
#>  $ D           :List of 6
#>   ..$ D.scale0: cplx [1:2, 1:2, 1:2, 1:2] -0.014+0i 0.0208+0.0171i 0.0208-0.0171i ...
#>   ..$ D.scale1: cplx [1:2, 1:2, 1:4, 1:4] 0.48+0i 0.558-0.202i 0.558+0.202i ...
#>   ..$ D.scale2: cplx [1:2, 1:2, 1:8, 1:8] 0+0i 0+0i 0+0i ...
#>   ..$ D.scale3: cplx [1:2, 1:2, 1:16, 1:16] 0+0i 0+0i 0+0i ...
#>   ..$ D.scale4: cplx [1:2, 1:2, 1:32, 1:32] 0+0i 0+0i 0+0i ...
#>   ..$ D.scale5: cplx [1:2, 1:2, 1:64, 1:64] 0+0i 0+0i 0+0i ...
#>  $ M0          : cplx [1:2, 1:2, 1, 1] 1.949-0i 0.905-0.204i 0.905+0.204i ...
#>  $ tree.weights:List of 5
#>   ..$ : logi [1:4, 1:4] TRUE TRUE TRUE TRUE TRUE TRUE ...
#>   ..$ : logi [1:8, 1:8] FALSE TRUE TRUE TRUE TRUE TRUE ...
#>   ..$ : logi [1:16, 1:16] FALSE FALSE FALSE FALSE FALSE FALSE ...
#>   ..$ : logi [1:32, 1:32] FALSE FALSE FALSE FALSE FALSE FALSE ...
#>   ..$ : logi [1:64, 1:64] FALSE FALSE FALSE FALSE FALSE FALSE ...
#>  $ D.raw       :List of 6
#>   ..$ D.scale0: cplx [1:2, 1:2, 1:2, 1:2] -0.014+0i 0.0208+0.0171i 0.0208-0.0171i ...
#>   ..$ D.scale1: cplx [1:2, 1:2, 1:4, 1:4] 0.48+0i 0.558-0.202i 0.558+0.202i ...
#>   ..$ D.scale2: cplx [1:2, 1:2, 1:8, 1:8] 0.138+0i 0.138-0.084i 0.138+0.084i ...
#>   ..$ D.scale3: cplx [1:2, 1:2, 1:16, 1:16] 0.0341-0i 0.041-0.0146i 0.041+0.0146i ...
#>   ..$ D.scale4: cplx [1:2, 1:2, 1:32, 1:32] -0.00472-0i -0.01616+0.0187i -0.01616-0.0187i ...
#>   ..$ D.scale5: cplx [1:2, 1:2, 1:64, 1:64] -0.025+0i -0.0321-0.0192i -0.0321+0.0192i ...

Nonlinear tree-structured wavelet thresholding

The noise is removed by nonlinear tree-structured thresholding of the wavelet coefficients based on the trace of the whitened coefficients in the same way as in pdSpecEst1D() outlined above. For thresholding of 2D wavelet coefficients on a non-square time-frequency grid, there is a discrepancy between the constant noise variance of the traces of the whitened coefficients of the first \(|J_1 - J_2|\) scales and the remaining scales, where \(J_1 = \log_2(n_1)\) and \(J_2 = \log_2(n_2)\) with \(n_1\) and \(n_2\) the (dyadic) number of observations in each marginal direction of the 2D rectangular time-frequency grid. The reason is that the variances of the traces of the whitened coefficients are not homogeneous between: (i) scales at which the 1D wavelet refinement scheme is applied and (ii) scales at which the 2D wavelet refinement scheme is applied. To correct for this discrepancy, the variances of the coefficients at the 2D wavelet scales are normalized by the noise variance determined from the finest wavelet scale. The variances of the coefficients at the 1D wavelet scales are normalized using the theoretical noise variance of the traces of the whitened coefficients for a grid of complex random Wishart matrices, which corresponds to the distributional behavior of the pseudo HPD periodogram matrices, or the asymptotic distributional behavior of the actual HPD periodogram matrices. Note that if the 2D time-frequency grid is a square grid, i.e. \(n_1 = n_2\), the variances of the traces of the whitened coefficients are again homogeneous across all wavelet scales.
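
As a small worked example of the scale count \(|J_1 - J_2|\), the base R snippet below compares a hypothetical non-square grid with the square grid used in this vignette. The helper n_1d_scales() and the grid size \(2^8 \times 2^6\) are chosen for illustration only.

```r
## Number of scales affected by the discrepancy: |J1 - J2| = |log2(n1) - log2(n2)|
n_1d_scales <- function(n) abs(diff(log2(n)))

n_rect   <- c(2^8, 2^6)                     # hypothetical non-square grid
n_square <- c(2^7, 2^7)                     # grid used in this vignette

n_1d_scales(n_rect)     # 2
n_1d_scales(n_square)   # 0
```

For the square grid no scales are affected, so no separate normalization of the 1D scales is required.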

Linear wavelet thresholding

Analogous to the function pdSpecEst1D(), linear thresholding of wavelet scales is performed by setting the argument alpha = 0 (i.e. no nonlinear thresholding) and the argument jmax to the maximum wavelet scale we wish to keep in the intrinsic inverse 2D AI wavelet transform.

Estimation result

The figure below shows the estimated time-varying spectral matrix obtained from f.hat$f at the time-frequency grid example$tf.grid. The spectral estimator is able to capture both the smooth parts and the non-smooth edges in the spectrum, while guaranteeing positive definiteness of the estimator as in the stationary case.

Figure 5: Matrix-log's of estimated time-varying spectrum

Stationary (1D) and nonstationary (2D) spectral matrix clustering

The intrinsic 1D or 2D AI wavelet transforms WavTransf1D() and WavTransf2D() can also be used for fast clustering of multivariate (time-varying) spectral matrices based on their sparse representations in the intrinsic manifold wavelet domain. To be precise, in the intrinsic manifold wavelet domain we combine both: (i) denoising by thresholding wavelet coefficients, and (ii) clustering of spectral matrices based on sparse representations (i.e. the non-zero wavelet coefficients). Such an approach allows for significant reduction of the computational effort of clustering estimated spectral matrices in comparison to a more naive approach, where we first compute individual spectral estimates and subsequently execute a clustering procedure based on integrated distances between spectral estimates in the frequency domain.

Wavelet-based spectral matrix clustering with pdSpecClust1D() and pdSpecClust2D()

pdSpecClust1D() and pdSpecClust2D() perform clustering of multivariate spectral matrices (resp. time-varying spectral matrices) via a two-step fuzzy clustering algorithm. The input consists of initial noisy HPD spectral matrix estimates (e.g. multitaper HPD periodograms obtained with pdPgram(), resp. pdPgram2D()) for \(S\) different subjects. For each subject \(s = 1,\ldots,S\), thresholded wavelet coefficients in the intrinsic manifold wavelet domain are calculated internally using pdSpecEst1D(), resp. pdSpecEst2D(), as described above. The \(S\) subjects are assigned to \(K\) different clusters in a probabilistic fashion according to a two-step procedure:

  1. In a first step, an intrinsic fuzzy c-medoids algorithm with fuzziness parameter \(m\) is applied to the subject-specific coarsest midpoints at scale \(j = 0\), relying on the metric with which the space of HPD matrices is equipped.
  2. In a second step, a weighted fuzzy c-means algorithm based on the Euclidean distance (between Hermitian matrices), also with fuzziness parameter \(m\), is applied to the subject-specific non-zero thresholded whitened wavelet coefficients. Here, the Euclidean distance function is an appropriate distance measure between the whitened wavelet coefficients as they are elements of the real vector space of Hermitian matrices.
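
To illustrate how the fuzziness parameter \(m\) turns distances into probabilistic cluster assignments, the base R sketch below implements the standard (unweighted) fuzzy c-means membership update. The weighted version used inside the package may differ in detail, and the helper fuzzy_membership() and the toy distance matrix are hypothetical.

```r
## Standard fuzzy c-means memberships from an (S x K) matrix of distances
## between S subjects and K cluster centers:
## u_sk = d_sk^(-2 / (m - 1)) / sum_l d_sl^(-2 / (m - 1))
fuzzy_membership <- function(dist, m = 2) {
  w <- dist^(-2 / (m - 1))                  # assumes strictly positive distances
  w / rowSums(w)
}

## Toy distances of 3 subjects to 2 cluster centers
dist_toy <- matrix(c(1, 3,
                     2, 2,
                     4, 1), nrow = 3, byrow = TRUE)

u <- fuzzy_membership(dist_toy, m = 2)
u            # rows: (0.9, 0.1), (0.5, 0.5), (1/17, 16/17)
rowSums(u)   # each row sums to one
```

Values of \(m\) close to one give nearly hard assignments, while larger values of \(m\) pull the memberships towards equal probabilities.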

The maximum scale taken into account in the clustering procedure is set to the minimum of the argument jmax and the wavelet scale \(j\) for which the proportion of nonzero thresholded wavelet coefficients (averaged across subjects) is smaller than the argument d.jmax.

Example of 1D spectral matrix clustering

Below we simulate stationary two-dimensional time series data for ten different subjects from two slightly different vARMA(2,2) (vector-ARMA) processes. Here, the first group of five subjects shares the same spectrum and the second group of five subjects shares a slightly different spectrum. We use pdSpecClust1D() to assign the different subjects to \(K=2\) clusters in a probabilistic fashion. Note that the true clusters are formed by the first group of five subjects and the last group of five subjects.

## Fix parameters
set.seed(123)
Phi1 <- array(c(0.5, 0, 0, 0.6, rep(0, 4)), dim = c(2, 2, 2))
Phi2 <- array(c(0.7, 0, 0, 0.4, rep(0, 4)), dim = c(2, 2, 2))
Theta <- array(c(0.5, -0.7, 0.6, 0.8, rep(0, 4)), dim = c(2, 2, 2))
Sigma <- matrix(c(1, 0.71, 0.71, 2), nrow = 2)

## Generate periodogram data for 10 subjects
pgram <- function(Phi) pdPgram(rARMA(2^10, 2, Phi, Theta, Sigma)$X)$P
P <- array(c(replicate(5, pgram(Phi1)), replicate(5, pgram(Phi2))), dim=c(2,2,2^9,10))

pdSpecClust1D(P, K = 2, metric = "logEuclidean")$cl.prob
#>              Cluster1    Cluster2
#> Subject1  0.004691701 0.995308299
#> Subject2  0.019086379 0.980913621
#> Subject3  0.035205375 0.964794625
#> Subject4  0.014209533 0.985790467
#> Subject5  0.035534099 0.964465901
#> Subject6  0.987094137 0.012905863
#> Subject7  0.984333526 0.015666474
#> Subject8  0.993840412 0.006159588
#> Subject9  0.996371332 0.003628668
#> Subject10 0.998297824 0.001702176
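
The cluster probabilities can be converted into hard cluster labels by assigning each subject to its most probable cluster. The base R snippet below does this for the probabilities printed above, which are copied by hand into a matrix for this illustration.

```r
## Cluster probabilities copied from the printed output above
cl.prob <- matrix(c(0.004691701, 0.995308299,
                    0.019086379, 0.980913621,
                    0.035205375, 0.964794625,
                    0.014209533, 0.985790467,
                    0.035534099, 0.964465901,
                    0.987094137, 0.012905863,
                    0.984333526, 0.015666474,
                    0.993840412, 0.006159588,
                    0.996371332, 0.003628668,
                    0.998297824, 0.001702176),
                  ncol = 2, byrow = TRUE,
                  dimnames = list(paste0("Subject", 1:10),
                                  c("Cluster1", "Cluster2")))

## Most probable cluster per subject
hard <- apply(cl.prob, 1, which.max)
hard
```

Subjects 1 to 5 are assigned to cluster 2 and subjects 6 to 10 to cluster 1, which recovers the two true groups. The cluster labels themselves are arbitrary.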

To conclude, we note again that an interactive demo Shiny app to test and tune the different estimation procedures detailed in this vignette is available at https://jchau.shinyapps.io/pdSpecEst/.

References

Chau, J., and R. von Sachs. “Positive Definite Multivariate Spectral Estimation: A Geometric Wavelet Approach.” http://arxiv.org/abs/1701.03314.

Chau, J., H. Ombao, and R. von Sachs. “Data Depth and Rank-Based Tests for Covariance and Spectral Density Matrices.” http://arxiv.org/abs/1706.08289.

Donoho, D.L. “CART and Best-Ortho-Basis: A Connection.” Annals of Statistics 25 (5).

Nason, G.P. “Wavelet Shrinkage Using Cross-Validation.” Journal of the Royal Statistical Society (Series B) 58.