The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
This vignette demonstrates a complete workflow for using the BCFM (Bayesian Clustering Factor Models) package. We’ll use a simulated dataset that comes with the package to illustrate:
For this tutorial, we use a simulated dataset included in the BCFM package. The dataset contains 200 observations, 20 variables, and 5 time points, with a known structure of 4 groups and 3 latent factors.
# Load the simulated dataset
data("sim.data", package = "BCFM")
# Examine the structure
str(sim.data)
#> 'data.frame': 200 obs. of 20 variables:
#> $ V1 : num -0.131 -1.553 1.32 -1.078 -2.31 ...
#> $ V2 : num -0.19 -0.844 -1.358 0.999 -3.974 ...
#> $ V3 : num -1.4138 -1.8041 0.267 -0.0431 -0.1291 ...
#> $ V4 : num 0.466 -0.592 1.224 -0.853 -1.623 ...
#> $ V5 : num 0.591 1.446 0.154 -0.865 2.988 ...
#> $ V6 : num 0.0941 -1.5791 1.0666 -1.0235 -3.6528 ...
#> $ V7 : num 0.1117 0.0602 0.4311 -0.822 0.1544 ...
#> $ V8 : num -0.391 -1.544 -0.803 0.252 -1.14 ...
#> $ V9 : num 0.2517 0.0907 0.4541 0.0939 0.5604 ...
#> $ V10: num 0.4264 0.0708 0.4772 0.0174 -0.5723 ...
#> $ V11: num -0.0713 1.0613 -0.0336 -0.2751 3.0106 ...
#> $ V12: num 0.244 -0.591 0.517 -0.327 -2.487 ...
#> $ V13: num 0.424 1.532 -0.359 0.834 -0.112 ...
#> $ V14: num 0.0833 1.7325 1.8849 -2.0999 4.8114 ...
#> $ V15: num 0.696 0.717 0.111 0.247 0.648 ...
#> $ V16: num -0.0427 -1.108 1.2177 -1.1383 -3.4454 ...
#> $ V17: num -0.237 0.708 0.75 -1.155 0.565 ...
#> $ V18: num 0.078 0.252 1.533 -0.861 0.251 ...
#> $ V19: num -0.398 0.761 -0.24 0.509 0.775 ...
#> $ V20: num 0.12 0.447 1.179 -1.217 1.402 ...
# Extract dimensions
n_obs <- dim(sim.data)[1] # 200 observations
n_vars <- dim(sim.data)[2] # 20 variables
cat("Dataset dimensions:\n")
#> Dataset dimensions:
cat(" Observations:", n_obs, "\n")
#> Observations: 200
cat(" Variables:", n_vars, "\n")
#> Variables: 20The first step is to identify the optimal number of groups (clusters)
and latent factors. We use BCFM.model.selection() to fit
models across a grid of possible configurations and compare them using
IC.
# Specify variables to use for clustering
cluster.vars <- paste0("V", 1:n_vars)
# Create output directory for results (using tempdir() for CRAN compliance)
output_dir <- file.path(tempdir(), "BCFM_analysis")
# Run model selection
# Note: grouplist and factorlist can be vectors (e.g., c(2, 3, 4)) or sequences (e.g., 2:4)
# The function will fit a model for each combination of groups and factors
BCFM.model.selection(
data = sim.data,
cluster.vars = cluster.vars,
grouplist = 4, # Number of groups
factorlist = 3, # Number of factors
n.iter = 100, # Number of MCMC iterations (use 50000+ for real analysis)
burnin = 10, # Burnin for IC calculation (use 10000+ for real analysis)
every = 10, # Retains the output of every "every" MCMC iterations
cluster.size = 0.01, # Minimum proportion required for each cluster (default 0.05)
output_dir = output_dir, # Specify where to save results
seed = 123 # Optional seed for reproducibility
)
#> Created output directory: /var/folders/gy/w0gwgb4d1m95fnkc05slvsnr0000gn/T//RtmpKzt4l2/BCFM_analysis
#> Saving results to: /var/folders/gy/w0gwgb4d1m95fnkc05slvsnr0000gn/T//RtmpKzt4l2/BCFM_analysis
#> Factor: 3, Group: 4
#> Running BCFM
#> Done
#> Time: 0.2201362 secsNote: For demonstration, we use minimal iterations (100) and burnin (10). For actual analysis, use
n.iter = 50000andburnin = 10000or higher to ensure convergence.
Output Directory: The
output_dirparameter specifies where model results will be saved. By default, results are saved totempdir(). For your own analyses, you can specify any directory where you have write permissions.
After running model selection, check the output directory:
# Check what files were created
list.files(output_dir)
#> [1] "IC.Rdata" "results-covarianceF-g4-f3.Rdata"
# Load IC results (if multiple models were fitted)
load(file.path(output_dir, "IC.Rdata"))
print(IC.matrix)
#> F3
#> G4 5527.836
# Load the fitted model results
load(file.path(output_dir, "results-covarianceF-g4-f3.Rdata"))
# Examine the structure
str(SDresult$Result, max.level = 1)
#> List of 8
#> $ X : num [1:10, 1:200, 1:3, 1] 1.76 2.17 1.97 2.05 1.91 ...
#> $ mu : num [1:10, 1:4, 1:3] 2.29 2.32 2.02 2.28 2.36 ...
#> $ Omega : num [1:10, 1:4, 1:3, 1:3] 1.76 2.19 2.13 2.06 2.09 ...
#> $ B : num [1:10, 1:20, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
#> $ sigma2: num [1:10, 1:20] 0.2369 0.0965 0.0877 0.0886 0.0817 ...
#> $ tau : num [1:10, 1:3] 0.535 0.308 0.431 0.22 0.326 ...
#> $ Z : num [1:10, 1:200, 1] 1 1 1 1 1 1 1 1 1 1 ...
#> $ probs : num [1:10, 1:4] 0.376 0.505 0.527 0.45 0.445 ...When testing multiple configurations (e.g.,
grouplist = 2:4, factorlist = 2:4), an
IC.Rdata file is created for model comparison.
The BCFM package provides comprehensive visualization functions to explore your results.
The factor loadings matrix B shows the relationship between observed variables and latent factors. Each column represents a latent factor, and each row represents a variable. Higher absolute values indicate stronger associations.
# Check dimensions and calculate posterior mean of B (after burnin)
n_iter <- dim(SDresult$Result$B)[1]
burnin <- min(100, floor(n_iter * 0.2)) # Use 20% as burnin or 100, whichever is smaller
cat("Total iterations:", n_iter, "\n")
#> Total iterations: 10
cat("Using burnin:", burnin, "\n")
#> Using burnin: 2
# Calculate posterior mean after burnin
B.matrix <- apply(SDresult$Result$B[burnin:n_iter, , ], MARGIN = c(2, 3), FUN = mean)
B.matrix <- as.data.frame(B.matrix)
# Set variable names
rownames(B.matrix) <- paste0("Var", 1:nrow(B.matrix))
# Set informative column names for factors
colnames(B.matrix) <- paste0("Factor ", 1:ncol(B.matrix))
# Print the factor loadings matrix
knitr::kable(round(B.matrix, 3),
caption = "Posterior Mean Factor Loadings",
align = 'c')| Factor 1 | Factor 2 | Factor 3 | |
|---|---|---|---|
| Var1 | 1.000 | 0.000 | 0.000 |
| Var2 | 0.105 | 1.000 | 0.000 |
| Var3 | -0.148 | 0.188 | 1.000 |
| Var4 | 0.921 | -0.299 | -0.062 |
| Var5 | 0.669 | -0.254 | -0.071 |
| Var6 | -0.154 | -0.578 | -0.237 |
| Var7 | 0.225 | -0.195 | 0.167 |
| Var8 | 0.027 | 0.334 | 0.232 |
| Var9 | 0.039 | -0.161 | -0.034 |
| Var10 | 0.147 | -0.013 | 0.028 |
| Var11 | -0.522 | -0.415 | -0.016 |
| Var12 | 0.645 | 0.039 | -0.359 |
| Var13 | -0.206 | 0.262 | -0.544 |
| Var14 | -0.165 | -1.159 | 0.103 |
| Var15 | 0.171 | -0.223 | -0.443 |
| Var16 | 0.978 | -0.084 | -0.273 |
| Var17 | 0.024 | -0.372 | -0.033 |
| Var18 | 0.256 | -0.387 | -0.086 |
| Var19 | -0.307 | 0.014 | -0.083 |
| Var20 | 0.086 | -0.513 | -0.054 |
Visualize the mean profiles for each cluster:
Explore the posterior distributions of cluster means:
Examine the posterior distributions of cluster membership probabilities:
Visualize the cluster membership assignments over MCMC iterations:
The BCFM package includes several other plotting functions to explore your results in depth:
ggplot_B.trace() - Check convergence of factor
loadingsggplot_B.CI() - Examine credible intervals for
loadingsggplot_omega.density() - Visualize within-cluster
covariance structuresggplot_sigma2.CI() - Examine error variancesggplot_tau.CI() - Examine factor variancesggplot_variability() - Understand variance
decomposition across factorsSee the function documentation for details.
The heatmap shows how observations are assigned to clusters. Stable assignments (consistent colors across iterations) indicate good convergence and clear cluster structure.
The loading matrix B shows which variables are associated with each latent factor. Large credible intervals that include zero suggest weak associations.
The latent profiles plot reveals the distinctive patterns of each cluster across variables, helping to interpret what makes each cluster unique.
The trace plots for factor loadings and cluster probabilities help assess MCMC convergence. Look for:
You can directly access various components of the fitted model:
# Cluster assignments (most likely cluster for each observation)
cluster_assignments <- SDresult$Result$Z
# Loading matrix
loadings <- SDresult$Result$B
# Cluster means
cluster_means <- SDresult$Result$mu
# Cluster probabilities
cluster_probs <- SDresult$Result$probs
# Factor scores
factor_scores <- SDresult$Result$X
# Show dimensions
cat("Dimensions of key components:\n")
#> Dimensions of key components:
cat(" B (loadings):", dim(loadings), "\n")
#> B (loadings): 10 20 3
cat(" mu (means):", dim(cluster_means), "\n")
#> mu (means): 10 4 3
cat(" X (scores):", dim(factor_scores), "\n")
#> X (scores): 10 200 3 1This vignette covered:
BCFM.model.selection() to fit modelsFor more details, see ?BCFM.model.selection and
individual function help pages.
sessionInfo()
#> R version 4.4.3 (2025-02-28)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS 26.2
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
#>
#> locale:
#> [1] C/C.UTF-8/C.UTF-8/C/C.UTF-8/C.UTF-8
#>
#> time zone: America/New_York
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_4.0.2 tidyr_1.3.2 dplyr_1.2.0 BCFM_1.0.0
#>
#> loaded via a namespace (and not attached):
#> [1] sass_0.4.10 generics_0.1.4 rstatix_0.7.3
#> [4] lattice_0.22-6 digest_0.6.39 magrittr_2.0.4
#> [7] evaluate_1.0.5 grid_4.4.3 RColorBrewer_1.1-3
#> [10] mvtnorm_1.3-3 fastmap_1.2.0 jsonlite_2.0.0
#> [13] backports_1.5.0 Formula_1.2-5 gridExtra_2.3
#> [16] purrr_1.2.1 scales_1.4.0 jquerylib_0.1.4
#> [19] abind_1.4-8 mnormt_2.1.2 cli_3.6.5
#> [22] rlang_1.1.7 LaplacesDemon_16.1.6 RcppArmadillo_15.2.3-1
#> [25] withr_3.0.2 cachem_1.1.0 yaml_2.3.12
#> [28] tools_4.4.3 parallel_4.4.3 ggsignif_0.6.4
#> [31] ggpubr_0.6.2 broom_1.0.12 vctrs_0.7.1
#> [34] R6_2.6.1 lifecycle_1.0.5 car_3.1-5
#> [37] psych_2.6.1 fastmatrix_0.6-6 pkgconfig_2.0.3
#> [40] pillar_1.11.1 bslib_0.10.0 gtable_0.3.6
#> [43] glue_1.8.0 Rcpp_1.1.1 xfun_0.56
#> [46] tibble_3.3.1 tidyselect_1.2.1 knitr_1.51
#> [49] farver_2.1.2 htmltools_0.5.9 nlme_3.1-167
#> [52] labeling_0.4.3 rmarkdown_2.30 carData_3.0-6
#> [55] compiler_4.4.3 S7_0.2.1These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.