The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Overview of the ksm package

Leo Belzile

2026-06-06

The ksm package provides functionalities for the kernel density estimation for random symmetric positive definite matrices. It provides C++ wrappers for estimating the density, with additional functionalities for estimating the optimal bandwidth according to the least square cross validation (LSCV) and the leave-one-out cross (LCV) validation criteria. Three kernels are implemented: the Gaussian (smnorm), log-Gaussian (smlnorm) and Wishart (Wishart).

Evaluation of the kernel and bandwidth optimization

set.seed(2025)
library(ksm)
#> 
#> Attaching package: 'ksm'
#> The following object is masked from 'package:stats':
#> 
#>     rWishart
d <- 4L
# Equicorrelation matrix
S <- diag(rep(0.5, d)) + matrix(0.5, d, d)
# Generate simulated data from an inverse Wishart (d by d by n array).
samp <- rinvWishart(n = 100, df = 10, S)
dim(samp)
#> [1]   4   4 100
# Optimize the bandwidth for kernel using leave-one-out cross validation
b <- bandwidth_optim(
  x = samp, 
  kernel = "smlnorm", 
  criterion = "lcv")
# Evaluate kernel with a new psd matrix data point
newsamp <- rinvWishart(n = 2, df = 10, S = S)
# Kernel density (default to log scale)
kdens_symmat(
  x = newsamp, 
  xs = samp, 
  kernel = "smlnorm", 
  b = b)
#> [1] 15.710544  3.460046

We can also consider stationary data and exclude observations at lag \(h\) to account for serial dependence. This is illustrated below with a time series of realized variance covariance matrix between two assets over 250 days.

data(realvar, package = "ksm")
dim(realvar)
#> [1]   2   2 250
# Select suitable lag for least square cross validation
h <- ceiling(dim(realvar)[3]^0.25)
# Calculate optimal bandwidth
bopt <- bandwidth_optim(
  x = realvar, 
  kernel = "Wishart", 
  criterion = "lscv",
  h = h)
# Evaluate at multiple values of bandwidth
bseq <- seq(0.5 * bopt, 2 * bopt, length.out = 101)
lscv <- ksm::lscv_kdens_symmat(
  x = realvar, 
  b = bseq, 
  h = h, 
  kernel = "Wishart")
with(lscv, 
     plot(x = b, 
          y = lscv,
          type = "l", 
          panel.first = {
            abline(v = bopt, lty = 2, col = "grey")
            },
          xlab = "bandwidth",
          ylab = "least square cross validation"
          )
     )

Integrating over positive definite matrices

We can also make integration in dimensions \(d \in \{2, 3\}\) to evaluate numerically the performance of kernels on simulated data. The function integrate_spd performs a transformation of inputs for numerical integration with routines from the cubature package. This is useful for checking correct implementation: below, we check that the density of a Wishart integrates to unity over the space of positive definite matrices.

Different methods for integration are provided: we recommend in particular cuhre, suave or hcubature. The lb argument controls the smaller value for the eigenvalues, which sometimes returns matrices that are close to being non-positive definite. If such an error mistake arises, consider setting lb to a small value.

# Check that Wishart density integrates to unity
(int <- integrate_spd(
  dim = 2L,
  neval = 1e5L,
  f = function(x, S){
   dWishart(x, df = 10, S = S, log = FALSE)},
  S = diag(2),lb = 0, method = "hcubature"))
#> $integral
#> [1] 0.9999433
#> 
#> $error
#> [1] 0.0009986894
#> 
#> $neval
#> [1] 30723
#> 
#> $returnCode
#> [1] 0
isTRUE(all.equal(int$integral, 1, tol = 2*int$error))
#> [1] TRUE

The main use for the integrate_spd function is to calculate the integrated squared error for data simulated from some density. We do this here for different models that generate independent and identically distributed matrices from a mixture of Wishart densities. The function here relies on predefined models from functions simu_rdens and simu_fdens used in the simulation studies of the paper for the independent and identically distributed scenario.

# Integrated squared error
ISE <- function(
  S,
  x,
  bandwidth,
  model = 1:6,
  kernel = c("Wishart", "smlnorm", "smnorm"),
  ...
) {
  model <- match.arg(as.character(model), choices = 1:6)
  kernel <- match.arg(kernel)
  dim <- ncol(x)
  # Compute the squared difference between the kernel and the target density
  (
    ksm::kdens_symmat(
      x = S,
      xs = x,
      b = bandwidth,
      kernel = kernel,
      log = FALSE
    ) -
      ksm::simu_fdens(S, model = model, d = dim)
  )^2
}

# Calculate numerical integral
d <- 2L
xs <- simu_rdens(n = 100, model = 2, d = d)
b <- bandwidth_optim(
  x = xs, 
  criterion = "lcv", 
  kernel = "smnorm")
# Computing ISE
ISE(
  S = xs[,,1, drop = FALSE], 
  x = xs, 
  bandwidth = b, 
  model = 2, 
  kernel = "smnorm")
#> [1] 8.525459e-05
# Next integrate over the space of psd matrices
ISE_int <- try(
            ksm::integrate_spd(
              f = function(S) {
                ISE(
                  S,
                  x = xs,
                  kernel = "smnorm",  # (only for lcv)
                  bandwidth = b,
                  model = 2
                )
              },
              dim = d,
              method = "hcubature",
              lb = 0,
              ub = Inf,
              neval = 1e6L
            ),
            silent = TRUE
          )
ISE_int
#> $integral
#> [1] 0.002425163
#> 
#> $error
#> [1] 2.42419e-06
#> 
#> $neval
#> [1] 66561
#> 
#> $returnCode
#> [1] 0

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.