The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Title: Mixture-of-Experts Wishart Models for Covariance Data
Version: 1.0
Description: Methods for maximum likelihood and Bayesian estimation for the Wishart mixture model and the mixture-of-experts Wishart (MoE-Wishart) model. The package provides four inference algorithms for these models, each implemented using the expectation–maximization (EM) algorithm for maximum likelihood estimation and a fully Bayesian approach via Gibbs-within-Metropolis–Hastings sampling.
Maintainer: Zhi Zhao <zhi.zhao@medisin.uio.no>
URL: https://github.com/zhizuio/moewishart
BugReports: https://github.com/zhizuio/moewishart/issues
License: GPL-3
VignetteBuilder: knitr
Depends: R (≥ 4.1.0)
Encoding: UTF-8
RoxygenNote: 7.3.2
Imports: utils, stats, loo
Suggests: rmarkdown, knitr
NeedsCompilation: no
Packaged: 2026-02-16 10:10:46 UTC; zhiz
Author: The Tien Mai [aut], Zhi Zhao [aut, cre]
Repository: CRAN
Date/Publication: 2026-02-19 19:50:02 UTC

Information criteria for Wishart mixtures and MoE models

Description

Compute AIC, BIC, and ICL for EM fits; and PSIS-LOO expected log predictive density (elpd_loo) for Bayesian fits. Supports mixturewishart (finite mixture) and moewishart (MoE with covariates in gating).

Usage

computeIC(fit)

Arguments

fit

A fitted object returned by mixturewishart() or moewishart().

Details

For EM fits:

Parameter counting k:

For Bayesian fits:

Notes:

Value

- For method="em": a list with fields AIC, BIC, ICL. - For method="bayes": a list with fields ICL and elpd of class '"loo"' as returned by [loo::loo()] that contains fields 'estimates', 'pointwise', 'diagnostics'

Examples


# Bayesian example (MoE)

# simulate data
set.seed(123)
n <- 500 # subjects
p <- 2
# True gating coefficients (last column zero)
set.seed(123)
Xq <- 3
K <- 3
betas <- matrix(runif(Xq * K, -2, 2), nrow = Xq, ncol = K)
betas[, K] <- 0
dat <- simData(n, p,
  Xq = 3, K = 3, betas = betas,
  pis = c(0.35, 0.40, 0.25),
  nus = c(8, 16, 3)
)
set.seed(123)
fit <- moewishart(
  dat$S,
  X = cbind(1, dat$X), K = 3,
  mh_sigma = c(0.2, 0.1, 0.1), # RW-MH variances (length K)
  mh_beta = c(0.2, 0.2), # RW-MH variances (length K-1)
  niter = 100, burnin = 50
)
computeIC(fit)


density of Wishart distribution

Description

Compute the (log) density of a p-dimensional Wishart distribution W_p(\nu, \Sigma) at an SPD matrix S. Returns either the log-density or the density depending on logarithm.

Usage

dWishart(S, nu, Sigma, detS_val = NULL, logarithm = TRUE)

Arguments

S

Numeric p \times p SPD matrix at which to evaluate the density.

nu

Numeric. Degrees of freedom \nu (must exceed p-1).

Sigma

Numeric p \times p SPD scale matrix \Sigma.

detS_val

Optional numeric. Precomputed \log|S| to reuse; if NULL, it is computed internally.

logarithm

Logical. If TRUE, return log-density; otherwise return density.

Details

Let S \sim W_p(\nu, \Sigma) with degrees of freedom \nu and scale matrix \Sigma (SPD). The density is:

f(S \mid \nu, \Sigma) = \frac{|S|^{(\nu - p - 1)/2}\, \exp\{-\tfrac{1}{2}\,\mathrm{tr}(\Sigma^{-1}S)\}} {2^{\nu p/2}\,|\Sigma|^{\nu/2}\,\Gamma_p(\nu/2)},

where \Gamma_p(\cdot) is the multivariate gamma function and p is the dimension.

Note that (i) detS_val can be supplied to avoid recomputing \log|S|, which is useful inside EM/MCMC loops, and (ii) small diagonal jitter is added internally to S and \Sigma when computing determinants or solves for numerical stability.

Constraints: (i) S and \Sigma must be SPD, and (ii) the Wishart requires \nu > p - 1.

Value

A numeric scalar: the log-density if logarithm = TRUE, otherwise the density.

Examples


set.seed(123)
p <- 3
# Construct an SPD Sigma
A <- matrix(rnorm(p * p), p, p)
Sigma <- crossprod(A) + diag(p) * 0.5
# Draw a Wishart matrix using base stats::rWishart()
W <- drop(rWishart(1, df = p + 5, Sigma = Sigma))
# Evaluate log-density at W
dWishart(W, nu = p + 5, Sigma = Sigma, logarithm = TRUE)


Log multivariate gamma function

Description

Compute the log of the multivariate gamma function \log \Gamma_p(a) for dimension p and parameter a.

Usage

lmvgamma(a, p)

Arguments

a

Numeric. Argument of \Gamma_p(\cdot) (often \nu/2 in Wishart contexts).

p

Integer. Dimension p of the multivariate gamma.

Details

The multivariate gamma function \Gamma_p(a) is defined by:

\Gamma_p(a) = \pi^{\,p(p-1)/4} \prod_{j=1}^{p} \Gamma\!\left(a + \frac{1-j}{2}\right).

Constraints: (i) p \in \{1,2,\dots\} (positive integer), and (ii) a > (p-1)/2 to keep all gamma terms finite (as in the Wishart normalization constant).

Value

A numeric scalar equal to \log \Gamma_p(a).

Examples


# Dimension
p <- 3
# Evaluate log multivariate gamma at a = nu/2
nu <- p + 5
lmvgamma(a = nu / 2, p = p)


EM/Bayesian estimation for Wishart mixture model

Description

Fit finite mixtures of Wishart-distributed SPD matrices using either a Bayesian sampler or the EM algorithm. The input S_list is a list of p \times p SPD matrices. Under component k, S_i \mid z_i=k \sim W_p(\nu_k, \Sigma_k) with degrees of freedom \nu_k and SPD scale matrix \Sigma_k. Mixture weights \pi_k sum to 1.

Usage

mixturewishart(
  S_list,
  K,
  niter = 3000,
  burnin = 1000,
  method = "bayes",
  thin = 1,
  alpha = NULL,
  nu0 = NULL,
  Psi0 = NULL,
  init_pi = NULL,
  init_nu = NULL,
  init_Sigma = NULL,
  marginal.z = TRUE,
  estimate_nu = TRUE,
  nu_prior_a = 2,
  nu_prior_b = 0.1,
  mh_sigma = 1,
  n_restarts = 3,
  restart_iters = 20,
  tol = 1e-06,
  verbose = TRUE
)

Arguments

S_list

List of length n of SPD matrices, each p \times p. These are the observed matrices modeled by a mixture of Wisharts.

K

Integer. Number of mixture components.

niter

Integer. Total iterations. Bayesian mode: total MCMC iterations (including burn-in). EM mode: maximum EM iterations (alias to maxiter).

burnin

Integer. Number of burn-in iterations (Bayesian mode).

method

Character; one of c("bayes","em"). Selects sampler or optimizer.

thin

Integer. Thinning interval for saving draws (Bayesian).

alpha

Numeric vector length K (Dirichlet prior on \pi) or NULL to default to rep(1, K) (Bayesian).

nu0

Numeric. Inverse-Wishart prior df for \Sigma_k (Bayesian). Default: p + 2.

Psi0

Numeric p \times p SPD matrix. Inverse-Wishart prior scale for \Sigma_k (Bayesian). Default: diag(p).

init_pi

Optional numeric vector length K summing to 1. EM initialization for mixture weights. If NULL, random or data-driven initialization is used.

init_nu

Optional numeric vector length K of initial degrees of freedom. Used in both modes.

init_Sigma

Optional list of K SPD matrices (each p \times p). EM initialization for \Sigma_k.

marginal.z

Logical. If TRUE, integrates out \pi when sampling z (collapsed step) in Bayesian mode. If FALSE, samples z conditional on current \pi.

estimate_nu

Logical. If TRUE, estimate/update \nu_k (MH in Bayesian mode; Newton/EM in EM). If FALSE, \nu_k are fixed.

nu_prior_a

Numeric. Prior hyperparameter a for \nu_k (Bayesian), used when estimate_nu = TRUE.

nu_prior_b

Numeric. Prior hyperparameter b for \nu_k (Bayesian), used when estimate_nu = TRUE.

mh_sigma

Numeric scalar or length-K vector. Proposal sd for MH updates on \log(\nu_k) (Bayesian, when estimating \nu).

n_restarts

Integer. Number of random restarts for EM. Ignored in Bayesian mode.

restart_iters

Integer. Number of short EM iterations per restart used to select a good initialization. Ignored in Bayesian mode.

tol

Numeric. Convergence tolerance on absolute change of log-likelihood (EM), also used internally elsewhere.

verbose

Logical. If TRUE, print progress information.

Details

Mixture mixture model: p(S_i) = \sum_{k=1}^K \pi_k \, f_W(S_i \mid \nu_k, \Sigma_k).

Algorithms:

  1. method = "bayes": Samples latent labels z, weights \pi, component scales \Sigma_k, and optionally \nu_k. Uses a Dirichlet prior for \pi, inverse- Wishart prior for \Sigma_k, and a prior on \nu_k when estimate_nu = TRUE. Degrees-of-freedom are updated via MH on \log(\nu_k) with proposal sd mh_sigma. Can integrate out \pi when sampling z if marginal.z = TRUE.

  2. method = "em": Maximizes the observed-data log- likelihood via EM. The E-step computes responsibilities via Wishart log-densities. The M-step updates \pi_k and \Sigma_k; optionally updates \nu_k when estimate_nu = TRUE. Supports multiple random restarts.

Note that (i) All matrices in S_list must be SPD. Small ridge terms may be added internally for stability, and (ii) Multiple EM restarts are recommended for robustness on difficult datasets.

Value

A list whose structure depends on method:

Examples


# simulate data
set.seed(123)
n <- 500 # subjects
p <- 2
dat <- simData(n, p,
  K = 3,
  pis = c(0.35, 0.40, 0.25),
  nus = c(8, 16, 3)
)

set.seed(123)
fit <- mixturewishart(
  dat$S,
  K = 3,
  mh_sigma = c(0.2, 0.1, 0.1), # tune this for MH acceptance 20-40%
  niter = 100, burnin = 50
)

# Posterior means for degrees of freedom of Wishart distributions:
nu_mcmc <- fit$nu[-c(1:fit$burnin), ]
colMeans(nu_mcmc)


EM/Bayesian estimation for Wishart MoE model

Description

Fit a mixture-of-experts model for symmetric positive-definite (SPD) matrices with covariate-dependent mixing proportions (gating network). Components are Wishart-distributed. Supports Bayesian sampling and EM-based maximum-likelihood estimation.

Usage

moewishart(
  S_list,
  X,
  K,
  niter = 3000,
  burnin = 1000,
  method = "bayes",
  thin = 1,
  nu0 = NULL,
  Psi0 = NULL,
  init_nu = NULL,
  estimate_nu = TRUE,
  nu_prior_a = 2,
  nu_prior_b = 0.1,
  mh_sigma = 0.1,
  mh_beta = 0.05,
  sigma_beta = 10,
  init = NULL,
  tol = 1e-06,
  ridge = 1e-08,
  verbose = TRUE
)

Arguments

S_list

List of length n of SPD matrices, each p \times p. These are the observed responses modeled by the MoE.

X

Numeric matrix n \times q of covariates for the gating network. Include an intercept column if desired.

K

Integer. Number of mixture components (experts).

niter

Integer. Total iterations. Bayesian mode: total MCMC iterations (including burn-in). EM mode: maximum EM iterations.

burnin

Integer. Number of burn-in iterations (Bayesian mode).

method

Character; one of c("bayes", "em"). Selects sampler or optimizer.

thin

Integer. Thinning interval for saving draws (Bayesian).

nu0

Numeric. Inverse-Wishart prior df for \Sigma_k (Bayesian). Default: p + 2 if NULL.

Psi0

Numeric p \times p SPD matrix. Inverse-Wishart prior scale for \Sigma_k (Bayesian). Default: diag(p) if NULL.

init_nu

Optional numeric vector length K of initial dfs \nu_k. Used for initialization.

estimate_nu

Logical. If TRUE, estimate \nu_k (MH in Bayesian; Newton/EM in EM). If FALSE, keep \nu_k fixed at init_nu.

nu_prior_a

Numeric. Prior hyperparameter a for \nu_k (Bayesian), used when estimate_nu = TRUE.

nu_prior_b

Numeric. Prior hyperparameter b for \nu_k (Bayesian), used when estimate_nu = TRUE.

mh_sigma

Numeric scalar or length-K vector. Proposal sd for MH updates on \log(\nu_k) (Bayesian, when estimating \nu).

mh_beta

Numeric scalar or length-K-1 vector. Proposal sd for MH updates of the free B columns (Bayesian).

sigma_beta

Numeric. Prior sd of the Gaussian prior on B (Bayesian).

init

Optional list with fields for EM initialization, e.g., beta, Sigma, nu. See return structure.

tol

Numeric. Convergence tolerance on absolute change of log-likelihood (EM), also used internally.

ridge

Numeric. Small diagonal ridge added to \Sigma_k updates in EM for numerical stability.

verbose

Logical. If TRUE, print progress information.

Details

MoE-Wishart Model:

Algorithms:

  1. Bayesian (method = "bayes"): Metropolis-within-Gibbs sampler for z, \Sigma_k, optional \nu_k, and B. Gaussian priors on B with sd sigma_beta. Proposals use mh_sigma for \log(\nu_k) and mh_beta for B.

  2. EM (method = "em"): E-step responsibilities using Wishart log-densities and softmax gating. M-step updates \Sigma_k, optional \nu_k, and B via weighted multinomial logistic regression (BFGS).

Note that: (i) include an intercept column in X; none is added by default, and (ii) all S_list elements must be SPD. A small ridge may be added for stability.

Value

A list whose fields depend on method:

Examples


# simulate data
set.seed(123)
n <- 500 # subjects
p <- 2
# True gating coefficients (last column zero)
set.seed(123)
Xq <- 3
K <- 3
betas <- matrix(runif(Xq * K, -2, 2), nrow = Xq, ncol = K)
betas[, K] <- 0
dat <- simData(n, p,
  Xq = 3, K = 3, betas = betas,
  pis = c(0.35, 0.40, 0.25),
  nus = c(8, 16, 3)
)

set.seed(123)
fit <- moewishart(
  dat$S,
  X = cbind(1, dat$X), K = 3,
  mh_sigma = c(0.2, 0.1, 0.1), # RW-MH variances (length K)
  mh_beta = c(0.2, 0.2), # RW-MH variances (length K-1)
  niter = 500, burnin = 200
)

# Posterior means for degrees of freedom of Wishart distributions:
nu_mcmc <- fit$nu[-c(1:fit$burnin), ]
colMeans(nu_mcmc)


Dirichlet random sampling

Description

Generate random draws from a Dirichlet distribution with parameter vector \alpha \in \mathbb{R}_+^K. Each draw is a length-K probability vector on the simplex.

Usage

rdirichlet(n, alpha)

Arguments

n

Integer. Number of independent Dirichlet draws to generate.

alpha

Numeric vector of positive concentration parameters \alpha = (\alpha_1,\dots,\alpha_K). Its length K defines the dimension of the simplex.

Details

Definition: If Y_k \sim \mathrm{Gamma}(\alpha_k, 1) independently for k=1,\dots,K (shape \alpha_k, rate 1), then the normalized vector X_k = Y_k / \sum_{j=1}^K Y_j follows \mathrm{Dirichlet}(\alpha).

Note that alpha must be a numeric vector with strictly positive entries.

Value

A numeric matrix of size n \times K, where each row is an independent Dirichlet draw that sums to 1.

Examples


set.seed(123)
# 3-dimensional Dirichlet with asymmetric concentration
alpha <- c(2, 5, 3)
rdirichlet(5, alpha)


Fast sampler for the inverse-Wishart distribution

Description

Draw a random sample from an inverse-Wishart distribution \mathcal{IW}_p(\nu, \Psi) using the identity S \sim \mathcal{IW}_p(\nu, \Psi) iff S^{-1} \sim W_p(\nu, \Psi^{-1}). This implementation accepts \Psi^{-1} directly for speed.

Usage

sampleIW(nu, Psi_inv)

Arguments

nu

Numeric. Degrees of freedom \nu of the inverse-Wishart (must exceed p - 1).

Psi_inv

Numeric p \times p SPD matrix equal to \Psi^{-1}, the inverse of the inverse-Wishart scale matrix.

Details

Sampling scheme:

Parameterization:

Note that: (i) internally calls rWishart(1, df = nu, Sigma = Psi_inv), and (ii) returns solve(W); if numerical issues arise, consider adding a small ridge to \Psi^{-1} prior to sampling.

Value

A p \times p SPD matrix S distributed as \mathcal{IW}_p(\nu, \Psi).

Examples


set.seed(123)
p <- 3
# Construct an SPD scale matrix Psi
A <- matrix(rnorm(p * p), p, p)
Psi <- crossprod(A) + diag(p) * 0.5
Psi_inv <- solve(Psi)

# Draw one inverse-Wishart sample
S <- sampleIW(nu = p + 5, Psi_inv = Psi_inv)
S


Simulate data from a Wishart mixture or mixture-of-experts model

Description

Generate synthetic SPD matrices from either: (i) a finite mixture of Wishart components with fixed mixing proportions, or (ii) a mixture-of-experts (MoE) where mixing proportions depend on covariates via a softmax gating model.

Usage

simData(
  n = 200,
  p = 2,
  Xq = 0,
  K = NA,
  betas = NULL,
  pis = c(0.4, 0.6),
  nus = c(8, 12),
  Sigma = NULL
)

Arguments

n

Integer. Number of observations to simulate.

p

Integer. Dimension of the Wishart distribution (matrix size p \times p).

Xq

Integer. Number of covariates for the gating network (MoE case). If Xq = 0, a standard mixture (no covariates) is simulated.

K

Integer. Number of latent components. Required when Xq > 0. If Xq = 0, defaults to length(pis).

betas

Numeric matrix Xq \times K of gating coefficients used when Xq > 0. If NULL, random coefficients are generated and the last column is set to zero (reference class).

pis

Numeric vector of length K giving fixed mixture proportions when Xq = 0. Ignored when Xq > 0.

nus

Numeric vector length K, degrees of freedom \nu_k for each component (must exceed p - 1).

Sigma

Optional list length K of SPD scale matrices \Sigma_k (each p \times p). If NULL, defaults are generated based on K and p.

Details

Models:

Simulation steps:

  1. Construct pis:

    • If Xq = 0, replicate the provided pis over n rows.

    • If Xq > 0, generate X ~ N(0, I) and compute softmax probabilities using betas (last column set to zero by default identifiability).

  2. If Sigma is not provided, create default \Sigma_k matrices (SPD) depending on K and p.

  3. Sample labels z_i \sim \mathrm{Categorical}(\pi_i).

  4. Draw S_i from W_p(\nu_{z_i}, \Sigma_{z_i}) via rWishart.

Note that: (i) in the MoE case, no intercept is automatically added to X. Use Xq to include desired covariates; the default betas is randomly generated with betas[, K] = 0, and (ii) provided Sigma must be a list of SPD p \times p matrices. Provided nus must exceed p - 1.

Value

A list with the following elements:

Examples


# simulate data from mixture model (no covariates)
set.seed(123)
n <- 200 # subjects
p <- 10
dat <- simData(n, p,
  K = 3,
  pis = c(0.35, 0.40, 0.25),
  nus = c(8, 12, 3)
)
str(dat)

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.