| Type: | Package | 
| Title: | Bayesian Sparse Estimation of a Covariance Matrix | 
| Version: | 1.0.3 | 
| Date: | 2025-08-14 | 
| Maintainer: | Kyeongwon Lee <kwlee1718@gmail.com> | 
| Author: | Kwangmin Lee [aut], Kyeongwon Lee [aut, cre], Kyoungjae Lee [aut], Seongil Jo [aut], Jaeyong Lee [ctb] | 
| Depends: | R (≥ 4.2) | 
| Description: | Bayesian estimations of a covariance matrix for multivariate normal data. Assumes that the covariance matrix is sparse or band matrix and positive-definite. Methods implemented include the beta-mixture shrinkage prior (Lee et al. (2022) <doi:10.1016/j.jmva.2022.105067>), screened beta-mixture prior (Lee et al. (2024) <doi:10.1214/24-BA1495>), and post-processed posteriors for banded and sparse covariances (Lee et al. (2023) <doi:10.1214/22-BA1333>; Lee and Lee (2023) <doi:10.1016/j.jeconom.2023.105475>). This software has been developed using funding supported by Basic Science Research Program through the National Research Foundation of Korea ('NRF') funded by the Ministry of Education ('RS-2023-00211979', 'NRF-2022R1A5A7033499', 'NRF-2020R1A4A1018207' and 'NRF-2020R1C1C1A01013338'). | 
| Imports: | GIGrvg, coda, progress, BayesFactor, MASS, mvnfast, matrixcalc, matrixStats, purrr, dplyr, RSpectra, Matrix, plyr, CholWishart, magrittr, future, furrr, ks, ggplot2, ggmcmc, caret, FinCovRegularization, mvtnorm, stats, patchwork, reshape2, future.apply | 
| Suggests: | hdbinseg, POET, tidyquant, tidyr, timetk, quantmod | 
| License: | GPL-2 | 
| LazyLoad: | yes | 
| URL: | https://github.com/statjs/bspcov | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.2 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-08-14 02:52:08 UTC; runner | 
| LazyData: | true | 
| LazyDataCompression: | xz | 
| Repository: | CRAN | 
| Date/Publication: | 2025-08-18 19:20:30 UTC | 
SP500 dataset
Description
The S&P 500 dataset from State Street Global Advisors with the collection period from Jan 2013 to Nov 2023.
Format
list
Source
State Street Global Advisors.
Examples
data("SP500")
names(SP500)
Bayesian Estimation of a Banded Covariance Matrix
Description
Provides a post-processed posterior for Bayesian inference of a banded covariance matrix.
Usage
bandPPP(X, k, eps, prior = list(), nsample = 2000)
Arguments
| X | a n  | 
| k | a scalar value (natural number) specifying the bandwidth of covariance matrix. | 
| eps | a small positive number decreasing to  | 
| prior | a list giving the prior information.
The list includes the following parameters (with default values in parentheses):
 | 
| nsample | a scalar value giving the number of the post-processed posterior samples. | 
Details
Lee, Lee, and Lee (2023+) proposed a two-step procedure generating samples from the post-processed posterior for Bayesian inference of a banded covariance matrix:
- Initial posterior computing step: Generate random samples from the following initial posterior obtained by using the inverse-Wishart prior - IW_p(B_0, \nu_0)- \Sigma \mid X_1, \ldots, X_n \sim IW_p(B_0 + nS_n, \nu_0 + n),- where - S_n = n^{-1}\sum_{i=1}^{n}X_iX_i^\top.
- Post-processing step: Post-process the samples generated from the initial samples - \Sigma_{(i)} := \left\{\begin{array}{ll}B_{k}(\Sigma^{(i)}) + \left[\epsilon_n - \lambda_{\min}\{B_{k}(\Sigma^{(i)})\}\right]I_p, & \mbox{ if } \lambda_{\min}\{B_{k}(\Sigma^{(i)})\} < \epsilon_n, \\ B_{k}(\Sigma^{(i)}), & \mbox{ otherwise }, \end{array}\right.
where \Sigma^{(1)}, \ldots, \Sigma^{(N)} are the initial posterior samples,
\epsilon_n is a small positive number decreasing to 0 as n \rightarrow \infty,
and B_k(B) denotes the k-band operation given as
B_{k}(B) = \{b_{ij}I(|i - j| \le k)\} \mbox{ for any } B = (b_{ij}) \in R^{p \times p}.
For more details, see Lee, Lee and Lee (2023+).
Value
| Sigma | a nsample  | 
| p | dimension of covariance matrix. | 
Author(s)
Kwangmin Lee
References
Lee, K., Lee, K., and Lee, J. (2023+), "Post-processes posteriors for banded covariances", Bayesian Analysis, DOI: 10.1214/22-BA1333.
See Also
cv.bandPPP estimate
Examples
n <- 25
p <- 50
Sigma0 <- diag(1, p)
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0)
res <- bspcov::bandPPP(X,2,0.01,nsample=100)
Bayesian Sparse Covariance Estimation
Description
Provides a Bayesian sparse and positive definite estimate of a covariance matrix via the beta-mixture shrinkage prior.
Usage
bmspcov(
  X,
  Sigma,
  prior = list(),
  nsample = list(),
  nchain = 1,
  seed = NULL,
  do.parallel = FALSE,
  show_progress = TRUE
)
Arguments
| X | a n  | 
| Sigma | an initial guess for Sigma. | 
| prior | a list giving the prior information.
The list includes the following parameters (with default values in parentheses):
 | 
| nsample | a list giving the MCMC parameters.
The list includes the following integers (with default values in parentheses):
 | 
| nchain | number of MCMC chains to run. Default is 1. | 
| seed | random seed for reproducibility. If NULL, no seed is set. For multiple chains, each chain gets seed + i - 1. | 
| do.parallel | logical indicating whether to run multiple chains in parallel using future.apply. Default is FALSE. When TRUE, automatically sets up a multisession plan with nchain workers if no parallel plan is already configured. | 
| show_progress | logical indicating whether to show progress bars during MCMC sampling. Default is TRUE. Automatically set to FALSE for individual chains when running in parallel. | 
Details
Lee, Jo and Lee (2022) proposed the beta-mixture shrinkage prior for estimating a sparse and positive definite covariance matrix.
The beta-mixture shrinkage prior for \Sigma = (\sigma_{jk}) is defined as
 \pi(\Sigma) = \frac{\pi^{u}(\Sigma)I(\Sigma \in C_p)}{\pi^{u}(\Sigma \in C_p)}, ~ C_p = \{\mbox{ all } p \times p \mbox{ positive definite matrices }\},
where \pi^{u}(\cdot) is the unconstrained prior given by
\pi^{u}(\sigma_{jk} \mid \rho_{jk}) = N\left(\sigma_{jk} \mid 0, \frac{\rho_{jk}}{1 - \rho_{jk}}\tau_1^2\right)
\pi^{u}(\rho_{jk}) = Beta(\rho_{jk} \mid a, b), ~ \rho_{jk} = 1 - 1/(1 + \phi_{jk})
\pi^{u}(\sigma_{jj}) = Exp(\sigma_{jj} \mid \lambda).
For more details, see Lee, Jo and Lee (2022).
Value
| Sigma | a nmc  | 
| Phi | a nmc  | 
| p | dimension of covariance matrix. | 
| mcmctime | elapsed time for MCMC sampling. For multiple chains, this becomes a list. | 
| nchain | number of chains used. | 
| burnin | number of burn-in samples discarded. | 
| nmc | number of MCMC samples retained for analysis. | 
Author(s)
Kyoungjae Lee, Seongil Jo, and Kyeongwon Lee
References
Lee, K., Jo, S., and Lee, J. (2022), "The beta-mixture shrinkage prior for sparse covariances with near-minimax posterior convergence rate", Journal of Multivariate Analysis, 192, 105067.
See Also
sbmspcov
Examples
set.seed(1)
n <- 20
p <- 5
# generate a sparse covariance matrix:
True.Sigma <- matrix(0, nrow = p, ncol = p)
diag(True.Sigma) <- 1
Values <- -runif(n = p*(p-1)/2, min = 0.2, max = 0.8)
nonzeroIND <- which(rbinom(n=p*(p-1)/2,1,prob=1/p)==1)
zeroIND = (1:(p*(p-1)/2))[-nonzeroIND]
Values[zeroIND] <- 0
True.Sigma[lower.tri(True.Sigma)] <- Values
True.Sigma[upper.tri(True.Sigma)] <- t(True.Sigma)[upper.tri(True.Sigma)]
if(min(eigen(True.Sigma)$values) <= 0){
  delta <- -min(eigen(True.Sigma)$values) + 1.0e-5
  True.Sigma <- True.Sigma + delta*diag(p)
}
# generate a data
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = True.Sigma)
# compute sparse, positive covariance estimator:
fout <- bspcov::bmspcov(X = X, Sigma = diag(diag(cov(X))))
post.est.m <- bspcov::estimate(fout)
sqrt(mean((post.est.m - True.Sigma)^2))
sqrt(mean((cov(X) - True.Sigma)^2))
# Multiple chains example with parallel processing:
# fout_multi <- bspcov::bmspcov(X = X, Sigma = diag(diag(cov(X))), 
#                               nchain = 4, do.parallel = TRUE)
# post.est.multi <- bspcov::estimate(fout_multi)
# When do.parallel = TRUE, the function automatically sets up 
# a multisession plan with nchain workers for parallel execution.
colon dataset
Description
The colon cancer dataset, which includes gene expression values from 22 colon tumor tissues and 40 non-tumor tissues.
Format
data.frame
Source
http://genomics-pubs.princeton.edu/oncology/affydata/.
Examples
data("colon")
head(colon)
CV for Bayesian Estimation of a Banded Covariance Matrix
Description
Performs leave-one-out cross-validation (LOOCV) to calculate the predictive log-likelihood for a post-processed posterior of a banded covariance matrix and selects the optimal parameters.
Usage
cv.bandPPP(X, kvec, epsvec, prior = list(), nsample = 2000, ncores = 2)
Arguments
| X | a n  | 
| kvec | a vector of natural numbers specifying the bandwidth of covariance matrix. | 
| epsvec | a vector of small positive numbers decreasing to  | 
| prior | a list giving the prior information.
The list includes the following parameters (with default values in parentheses):
 | 
| nsample | a scalar value giving the number of the post-processed posterior samples. | 
| ncores | a scalar value giving the number of CPU cores. | 
Details
The predictive log-likelihood for each k and \epsilon_n is estimated as follows:
 \sum_{i=1}^n \log S^{-1} \sum_{s=1}^S p(X_i \mid B_k^{(\epsilon_n)}(\Sigma_{i,s})),
 
where X_i is the ith observation, \Sigma_{i,s} is the sth posterior sample based on (X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n), and B_k^{(\epsilon_n)} represents the banding post-processing function.
For more details, see (3) in Lee, Lee and Lee (2023+).
Value
| elpd | a M  | 
Author(s)
Kwangmin Lee
References
Lee, K., Lee, K., and Lee, J. (2023+), "Post-processes posteriors for banded covariances", Bayesian Analysis, DOI: 10.1214/22-BA1333.
Gelman, A., Hwang, J., and Vehtari, A. (2014). "Understanding predictive information criteria for Bayesian models." Statistics and computing, 24(6), 997-1016.
See Also
bandPPP
Examples
Sigma0 <- diag(1,50)
X <- mvtnorm::rmvnorm(25,sigma = Sigma0)
kvec <- 1:2
epsvec <- c(0.01,0.05)
res <- bspcov::cv.bandPPP(X,kvec,epsvec,nsample=10,ncores=4)
plot(res)
CV for Bayesian Estimation of a Sparse Covariance Matrix
Description
Performs cross-validation to estimate spectral norm error for a post-processed posterior of a sparse covariance matrix.
Usage
cv.thresPPP(
  X,
  thresvec,
  epsvec,
  prior = NULL,
  thresfun = "hard",
  nsample = 2000,
  ncores = 2
)
Arguments
| X | a n  | 
| thresvec | a vector of real numbers specifying the parameter of the threshold function. | 
| epsvec | a vector of small positive numbers decreasing to  | 
| prior | a list giving the prior information.
The list includes the following parameters (with default values in parentheses):
 | 
| thresfun | a string to specify the type of threshold function.  | 
| nsample | a scalar value giving the number of the post-processed posterior samples. | 
| ncores | a scalar value giving the number of CPU cores. | 
Details
Given a set of train data and validation data, the spectral norm error for each \gamma and \epsilon_n is estimated as follows:
 ||\hat{\Sigma}(\gamma,\epsilon_n)^{(train)} - S^{(val)}||_2
 
where \hat{\Sigma}(\gamma,\epsilon_n)^{(train)} is the estimate for the covariance based on the train data and S^{(val)} is the sample covariance matrix based on the validation data.
The spectral norm error is estimated by the 10-fold cross-validation.
For more details, see the first paragraph on page 9 in Lee and Lee (2023).
Value
| CVdf | a M  | 
Author(s)
Kwangmin Lee
References
Lee, K. and Lee, J. (2023), "Post-processes posteriors for sparse covariances", Journal of Econometrics, 236(3), 105475.
See Also
thresPPP
Examples
Sigma0 <- diag(1,50)
X <- mvtnorm::rmvnorm(25,sigma = Sigma0)
thresvec <- c(0.01,0.1)
epsvec <- c(0.01,0.1)
res <- bspcov::cv.thresPPP(X,thresvec,epsvec,nsample=100)
plot(res)
Point-estimate of posterior distribution
Description
Compute the point estimate (mean) to describe posterior distribution. For multiple chains, combines all chains to compute a more robust estimate.
Usage
estimate(object, ...)
## S3 method for class 'bspcov'
estimate(object, ...)
Arguments
| object | an object from bandPPP, bmspcov, sbmspcov, and thresPPP. | 
| ... | additional arguments for estimate. | 
Value
| Sigma | the point estimate (mean) of covariance matrix. For multiple chains, uses combined samples from all chains. | 
Author(s)
Seongil Jo and Kyeongwon Lee
See Also
plot.postmean.bspcov
Examples
n <- 25
p <- 50
Sigma0 <- diag(1, p)
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0)
res <- bspcov::bandPPP(X,2,0.01,nsample=100)
est <- bspcov::estimate(res)
Plot Diagnostics of Posterior Samples and Cross-Validation
Description
Provides a trace plot of posterior samples and a plot of a learning curve for cross-validation. For multiple chains, each chain is displayed in a different color for convergence assessment.
Usage
## S3 method for class 'bspcov'
plot(x, ..., cols, rows)
Arguments
| x | an object from bmspcov, sbmspcov, cv.bandPPP, and cv.thresPPP. | 
| ... | additional arguments for ggplot2. | 
| cols | a scalar or a vector including specific column indices for the trace plot. | 
| rows | a scalar or a vector including specific row indices greater than or equal to columns indices for the trace plot. | 
Value
| plot | a plot for diagnostics of posterior samples x. For multiple chains, returns colored trace plots with each chain in a different color. | 
Author(s)
Seongil Jo and Kyeongwon Lee
Examples
set.seed(1)
n <- 100
p <- 20
# generate a sparse covariance matrix:
True.Sigma <- matrix(0, nrow = p, ncol = p)
diag(True.Sigma) <- 1
Values <- -runif(n = p*(p-1)/2, min = 0.2, max = 0.8)
nonzeroIND <- which(rbinom(n=p*(p-1)/2,1,prob=1/p)==1)
zeroIND = (1:(p*(p-1)/2))[-nonzeroIND]
Values[zeroIND] <- 0
True.Sigma[lower.tri(True.Sigma)] <- Values
True.Sigma[upper.tri(True.Sigma)] <- t(True.Sigma)[upper.tri(True.Sigma)]
if(min(eigen(True.Sigma)$values) <= 0){
  delta <- -min(eigen(True.Sigma)$values) + 1.0e-5
  True.Sigma <- True.Sigma + delta*diag(p)
}
# generate a data
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = True.Sigma)
# compute sparse, positive covariance estimator:
fout <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X))))
plot(fout, cols = c(1, 3, 4), rows = c(1, 3, 4))
plot(fout, cols = 1, rows = 1:3)
# Cross-Validation for Banded Structure
Sigma0 <- diag(1,50)
X <- mvtnorm::rmvnorm(25,sigma = Sigma0)
kvec <- 1:2
epsvec <- c(0.01,0.05)
res <- bspcov::cv.bandPPP(X,kvec,epsvec,nsample=10,ncores=4)
plot(res)
Plot method for postmean.bspcov objects
Description
Create heatmap visualization for posterior mean estimate of sparse covariance matrix. Provides flexible visualization options with customizable aesthetics and labeling.
Usage
## S3 method for class 'postmean.bspcov'
plot(
  x,
  title = NULL,
  color_limits = NULL,
  color_low = "black",
  color_high = "white",
  base_size = 14,
  legend_position = "bottom",
  x_label = "Variable",
  y_label = "Variable",
  show_values = FALSE,
  ...
)
Arguments
| x | an object of class  | 
| title | character string for plot title. If NULL, auto-generated title is used. | 
| color_limits | optional vector of length 2 specifying color scale limits. If NULL, computed from data. | 
| color_low | color for low values in heatmap. Default is "black". | 
| color_high | color for high values in heatmap. Default is "white". | 
| base_size | base font size for plot theme. Default is 14. | 
| legend_position | position of legend. Default is "bottom". | 
| x_label | label for x-axis. Default is "Variable". | 
| y_label | label for y-axis. Default is "Variable". | 
| show_values | logical indicating whether to display values on tiles. Default is FALSE. | 
| ... | additional arguments passed to plotting functions. | 
Value
A ggplot object showing heatmap visualization of the posterior mean covariance matrix.
Author(s)
Seongil Jo, Kyeongwon Lee
See Also
estimate, plot.bspcov, plot.quantile.bspcov
Examples
# Example with simulated data
n <- 25
p <- 50
Sigma0 <- diag(1, p)
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0)
res <- bspcov::thresPPP(X, eps=0.01, thres=list(value=0.5,fun='hard'), nsample=100)
est <- bspcov::estimate(res)
# Basic plot
plot(est)
# Plot with custom color scheme
plot(est, color_low = "blue", color_high = "red")
Plot method for quantile.bspcov objects
Description
Create visualization plots for posterior quantiles of covariance matrices. Produces heatmap visualizations showing uncertainty across different quantile levels.
Usage
## S3 method for class 'quantile.bspcov'
plot(
  x,
  type = "heatmap",
  titles = NULL,
  ncol = 3,
  color_limits = NULL,
  color_low = "black",
  color_high = "white",
  base_size = 14,
  legend_position = "bottom",
  x_label = "Variable",
  y_label = "Variable",
  width = NULL,
  height = 6,
  ...
)
Arguments
| x | an object of class  | 
| type | character string specifying plot type. Options are "heatmap" (default), "uncertainty", or "comparison". | 
| titles | character vector of titles for each quantile plot. If NULL, auto-generated titles are used. | 
| ncol | number of columns in the plot layout. Default is 3. | 
| color_limits | optional vector of length 2 specifying color scale limits. If NULL, computed from data. | 
| color_low | color for low values in heatmap. Default is "black". | 
| color_high | color for high values in heatmap. Default is "white". | 
| base_size | base font size for plot theme. Default is 14. | 
| legend_position | position of legend. Default is "bottom". | 
| x_label | label for x-axis. Default is "Variable". | 
| y_label | label for y-axis. Default is "Variable". | 
| width | plot width when saving. Default is calculated based on number of quantiles. | 
| height | plot height when saving. Default is 6. | 
| ... | additional arguments passed to plotting functions. | 
Value
A ggplot object (single quantile) or patchwork object (multiple quantiles) showing heatmap visualizations.
Author(s)
Kyeongwon Lee
See Also
quantile, plot.bspcov, plot.postmean.bspcov
Examples
# Example with simulated data
n <- 25
p <- 50
Sigma0 <- diag(1, p)
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0)
res <- bspcov::bandPPP(X, 2, 0.01, nsample = 100)
quant <- quantile(res)
# Plot uncertainty visualization
plot(quant)
# Plot with custom titles and labels
plot(quant, titles = c("Lower Bound", "Median", "Upper Bound"),
     x_label = "Variable", y_label = "Variable")
Preprocess SP500 data
Description
The proc_SP500 function preprocesses the SP500 stock data by calculating monthly
returns for selected sectors and generating idiosyncratic errors.
Usage
proc_SP500(SP500data, sectors)
Arguments
| SP500data | A data frame containing SP500 stock data with columns including: 
 | 
| sectors | A character vector specifying the sectors to include in the analysis. | 
Details
- Calculates monthly returns for each stock in the specified sectors 
- Estimates the number of factors via - hdbinseg::get.factor.model(ic="ah")
- Uses - POET::POET()to extract factor loadings/factors and form idiosyncratic errors
Value
A list with components:
| Uhat | A matrix of idiosyncratic errors. | 
| Khat | Estimated number of factors. | 
| factorparthat | Estimated factor returns. | 
| sectornames | Sector for each column of  | 
Examples
data("SP500")
set.seed(1234)
sectors <- c("Energy", "Financials", "Materials",
   "Real Estate", "Utilities", "Information Technology")
Uhat <- proc_SP500(SP500, sectors)$Uhat
PPPres <- thresPPP(Uhat, eps = 0, thres = list(value = 0.0020, fun = "hard"), nsample = 100)
postmean <- estimate(PPPres)
diag(postmean) <- 0 # hide color for diagonal
plot(postmean)
Preprocess Colon Gene Expression Data
Description
The proc_colon function preprocesses colon gene expression data by:
- Log transforming the raw counts. 
- Performing two-sample t-tests for each gene between normal and tumor samples. 
- Selecting the top 50 genes by absolute t-statistic. 
- Returning the filtered expression matrix and sample indices/groups. 
Usage
proc_colon(colon, tissues)
Arguments
| colon | A numeric matrix of raw colon gene expression values (genes × samples). Rows are genes; columns are samples. | 
| tissues | A numeric vector indicating tissue type per sample: positive for normal, negative for tumor. | 
Value
A list with components:
- X
- A numeric matrix (samples x 50 genes) of selected, log‐transformed expression values. 
- normal_idx
- Integer indices of normal‐tissue columns in the original data. 
- tumor_idx
- Integer indices of tumor‐tissue columns in the original data. 
- group
- Integer vector of length - ncol(colon), with 1 = normal, 2 = tumor.
Examples
data("colon")
data("tissues")
set.seed(1234)
colon_data <- proc_colon(colon, tissues)
X <- colon_data$X
foo <- bmspcov(X, Sigma = cov(X))
sigmah <- estimate(foo)
Quantiles of posterior distribution
Description
Compute quantiles to describe posterior distribution. For multiple chains, combines all chains to compute more robust quantiles.
Usage
## S3 method for class 'bspcov'
quantile(x, probs = c(0.025, 0.5, 0.975), ...)
Arguments
| x | an object from bandPPP, bmspcov, sbmspcov, and thresPPP. | 
| probs | numeric vector of probabilities with values in [0,1]. Default is c(0.025, 0.5, 0.975). | 
| ... | additional arguments for quantile. | 
Value
| quantiles | a list containing quantile matrices for each probability level. For multiple chains, uses combined samples from all chains. | 
Author(s)
Kyeongwon Lee
See Also
estimate, plot.postmean.bspcov
Examples
# Example with simulated data
n <- 25
p <- 50
Sigma0 <- diag(1, p)
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0)
res <- bspcov::bandPPP(X, 2, 0.01, nsample=100)
quant <- quantile(res)
# Get 95% credible intervals
quant <- quantile(res, probs = c(0.025, 0.975))
Save quantile plot to file
Description
Convenience function to save quantile plots with appropriate dimensions.
Usage
save_quantile_plot(x, filename, width = NULL, height = 6, ...)
Arguments
| x | an object of class  | 
| filename | filename to save the plot. | 
| width | plot width. If NULL, calculated based on number of quantiles. | 
| height | plot height. Default is 6. | 
| ... | additional arguments passed to  | 
Examples
# Example with simulated data
n <- 25
p <- 50
Sigma0 <- diag(1, p)
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0)
res <- bspcov::bandPPP(X, 2, 0.01, nsample = 100)
quant <- quantile(res)
Bayesian Sparse Covariance Estimation using Sure Screening
Description
Provides a Bayesian sparse and positive definite estimate of a covariance matrix via screened beta-mixture prior.
Usage
sbmspcov(
  X,
  Sigma,
  cutoff = list(),
  prior = list(),
  nsample = list(),
  nchain = 1,
  seed = NULL,
  do.parallel = FALSE,
  show_progress = TRUE
)
Arguments
| X | a n  | 
| Sigma | an initial guess for Sigma. | 
| cutoff | a list giving the information for the threshold.
The list includes the following parameters (with default values in parentheses):
 | 
| prior | a list giving the prior information.
The list includes the following parameters (with default values in parentheses):
 | 
| nsample | a list giving the MCMC parameters.
The list includes the following integers (with default values in parentheses):
 | 
| nchain | number of MCMC chains to run. Default is 1. | 
| seed | random seed for reproducibility. If NULL, no seed is set. For multiple chains, each chain gets seed + i - 1. | 
| do.parallel | logical indicating whether to run multiple chains in parallel using future.apply. Default is FALSE. When TRUE, automatically sets up a multisession plan with nchain workers if no parallel plan is already configured. | 
| show_progress | logical indicating whether to show progress bars during MCMC sampling. Default is TRUE. Automatically set to FALSE for individual chains when running in parallel. | 
Details
Lee, Jo, Lee, and Lee (2023+) proposed the screened beta-mixture shrinkage prior for estimating a sparse and positive definite covariance matrix.
The screened beta-mixture shrinkage prior for \Sigma = (\sigma_{jk}) is defined as
 \pi(\Sigma) = \frac{\pi^{u}(\Sigma)I(\Sigma \in C_p)}{\pi^{u}(\Sigma \in C_p)}, ~ C_p = \{\mbox{ all } p \times p \mbox{ positive definite matrices }\},
where \pi^{u}(\cdot) is the unconstrained prior given by
\pi^{u}(\sigma_{jk} \mid \psi_{jk}) = N\left(\sigma_{jk} \mid 0, \frac{\psi_{jk}}{1 - \psi_{jk}}\tau_1^2\right), ~ \psi_{jk} = 1 - 1/(1 + \phi_{jk})
\pi^{u}(\psi_{jk}) = Beta(\psi_{jk} \mid a, b) ~ \mbox{for } (j, k) \in S_r(\hat{R})
\pi^{u}(\sigma_{jj}) = Exp(\sigma_{jj} \mid \lambda),
where S_r(\hat{R}) = \{(j,k) : 1 < j < k \le p, |\hat{\rho}_{jk}| > r\}, \hat{\rho}_{jk} are sample correlations, and r is a threshold given by user.
For more details, see Lee, Jo, Lee and Lee (2022+).
Value
| Sigma | a nmc  | 
| p | dimension of covariance matrix. | 
| Phi | a nmc  | 
| INDzero | a list including indices of off-diagonal elements screened by sure screening. For multiple chains, this becomes a list of lists. | 
| cutoff | the cutoff value specified by FNR-approach. For multiple chains, this becomes a list. | 
| mcmctime | elapsed time for MCMC sampling. For multiple chains, this becomes a list. | 
| nchain | number of chains used. | 
| burnin | number of burn-in samples discarded. | 
| nmc | number of MCMC samples retained for analysis. | 
Author(s)
Kyoungjae Lee, Seongil Jo, and Kyeongwon Lee
References
Lee, K., Jo, S., Lee, K., and Lee, J. (2023+), "Scalable and optimal Bayesian inference for sparse covariance matrices via screened beta-mixture prior", arXiv:2206.12773.
See Also
bmspcov
Examples
set.seed(1)
n <- 20
p <- 5
# generate a sparse covariance matrix:
True.Sigma <- matrix(0, nrow = p, ncol = p)
diag(True.Sigma) <- 1
Values <- -runif(n = p*(p-1)/2, min = 0.2, max = 0.8)
nonzeroIND <- which(rbinom(n=p*(p-1)/2,1,prob=1/p)==1)
zeroIND = (1:(p*(p-1)/2))[-nonzeroIND]
Values[zeroIND] <- 0
True.Sigma[lower.tri(True.Sigma)] <- Values
True.Sigma[upper.tri(True.Sigma)] <- t(True.Sigma)[upper.tri(True.Sigma)]
if(min(eigen(True.Sigma)$values) <= 0){
  delta <- -min(eigen(True.Sigma)$values) + 1.0e-5
  True.Sigma <- True.Sigma + delta*diag(p)
}
# generate a data
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = True.Sigma)
# compute sparse, positive covariance estimator:
fout <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X))))
post.est.m <- bspcov::estimate(fout)
sqrt(mean((post.est.m - True.Sigma)^2))
sqrt(mean((cov(X) - True.Sigma)^2))
# Multiple chains example with parallel processing:
# fout_multi <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X))), 
#                                nchain = 4, seed = 123, do.parallel = TRUE)
# post.est.multi <- bspcov::estimate(fout_multi)
# summary(fout_multi, cols = 1, rows = 1:3)  # Shows MCMC diagnostics
# plot(fout_multi, cols = 1, rows = 1:3)     # Shows colored trace plots
# When do.parallel = TRUE, the function automatically sets up 
# a multisession plan with nchain workers for parallel execution.
Summary of Posterior Distribution
Description
Provides the summary statistics for posterior samples of covariance matrix.
Usage
## S3 method for class 'bspcov'
summary(object, cols, rows, quantiles = c(0.025, 0.25, 0.5, 0.75, 0.975), ...)
Arguments
| object | an object from bandPPP, bmspcov, sbmspcov, and thresPPP. | 
| cols | a scalar or a vector including specific column indices. | 
| rows | a scalar or a vector including specific row indices greater than or equal to columns indices. | 
| quantiles | a numeric vector of quantiles to compute. Default is c(0.025, 0.25, 0.5, 0.75, 0.975). | 
| ... | additional arguments for the summary function. | 
Value
| summary | a table of summary statistics including empirical mean, standard deviation, and quantiles for posterior samples. For multiple chains, also includes effective sample size (n_eff) and R-hat diagnostics. | 
Note
If both cols and rows are vectors, they must have the same length.
Author(s)
Seongil Jo and Kyeongwon Lee
Examples
# Example with simulated data
set.seed(1)
n <- 20
p <- 5
# generate a sparse covariance matrix:
True.Sigma <- matrix(0, nrow = p, ncol = p)
diag(True.Sigma) <- 1
Values <- -runif(n = p*(p-1)/2, min = 0.2, max = 0.8)
nonzeroIND <- which(rbinom(n=p*(p-1)/2,1,prob=1/p)==1)
zeroIND = (1:(p*(p-1)/2))[-nonzeroIND]
Values[zeroIND] <- 0
True.Sigma[lower.tri(True.Sigma)] <- Values
True.Sigma[upper.tri(True.Sigma)] <- t(True.Sigma)[upper.tri(True.Sigma)]
if(min(eigen(True.Sigma)$values) <= 0){
  delta <- -min(eigen(True.Sigma)$values) + 1.0e-5
  True.Sigma <- True.Sigma + delta*diag(p)
}
# generate a data
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = True.Sigma)
# compute sparse, positive covariance estimator:
fout <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X))))
summary(fout, cols = c(1, 3, 4), rows = c(1, 3, 4))
summary(fout, cols = 1, rows = 1:p)
# Custom quantiles:
summary(fout, cols = 1, rows = 1:3, quantiles = c(0.05, 0.5, 0.95))
Bayesian Estimation of a Sparse Covariance Matrix
Description
Provides a post-processed posterior (PPP) for Bayesian inference of a sparse covariance matrix.
Usage
thresPPP(X, eps, thres = list(), prior = list(), nsample = 2000)
Arguments
| X | a n  | 
| eps | a small positive number decreasing to  | 
| thres | a list giving the information for thresholding PPP procedure.
The list includes the following parameters (with default values in parentheses):
 | 
| prior | a list giving the prior information.
The list includes the following parameters (with default values in parentheses):
 | 
| nsample | a scalar value giving the number of the post-processed posterior samples. | 
Details
Lee and Lee (2023) proposed a two-step procedure generating samples from the post-processed posterior for Bayesian inference of a sparse covariance matrix:
- Initial posterior computing step: Generate random samples from the following initial posterior obtained by using the inverse-Wishart prior - IW_p(B_0, \nu_0)- \Sigma \mid X_1, \ldots, X_n \sim IW_p(B_0 + nS_n, \nu_0 + n),- where - S_n = n^{-1}\sum_{i=1}^{n}X_iX_i^\top.
- Post-processing step: Post-process the samples generated from the initial samples - \Sigma_{(i)} := \left\{\begin{array}{ll}H_{\gamma_n}(\Sigma^{(i)}) + \left[\epsilon_n - \lambda_{\min}\{H_{\gamma_n}(\Sigma^{(i)})\}\right]I_p, & \mbox{ if } \lambda_{\min}\{H_{\gamma_n}(\Sigma^{(i)})\} < \epsilon_n, \\ H_{\gamma_n}(\Sigma^{(i)}), & \mbox{ otherwise }, \end{array}\right.
where \Sigma^{(1)}, \ldots, \Sigma^{(N)} are the initial posterior samples,
\epsilon_n is a positive real number, and H_{\gamma_n}(\Sigma) denotes the generalized threshodling operator given as
(H_{\gamma_n}(\Sigma))_{ij} = \left\{\begin{array}{ll}\sigma_{ij}, & \mbox{ if } i = j, \\
h_{\gamma_n}(\sigma_{ij}), & \mbox{ if } i \neq j, \end{array}\right.
where \sigma_{ij} is the (i,j) element of \Sigma and h_{\gamma_n}(\cdot) is a generalized thresholding function.
For more details, see Lee and Lee (2023).
Value
| Sigma | a nsample  | 
| p | dimension of covariance matrix. | 
Author(s)
Kwangmin Lee
References
Lee, K. and Lee, J. (2023), "Post-processes posteriors for sparse covariances", Journal of Econometrics.
See Also
cv.thresPPP
Examples
n <- 25
p <- 50
Sigma0 <- diag(1, p)
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0)
res <- bspcov::thresPPP(X, eps=0.01, thres=list(value=0.5,fun='hard'), nsample=100)
est <- bspcov::estimate(res)
tissues dataset
Description
The tissues data of colon cancer dataset, which includes gene expression values from 22 colon tumor tissues and 40 non-tumor tissues.
Format
numeric
Source
http://genomics-pubs.princeton.edu/oncology/affydata/.
Examples
data("tissues")
head(tissues)