The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Getting Started with BCFM: A Complete Workflow

Allison Tegge, Virginia Tech

February 06, 2026

Introduction

This vignette demonstrates a complete workflow for using the BCFM (Bayesian Clustering Factor Models) package. We’ll use a simulated dataset that comes with the package to illustrate:

Data preparation and exploration
Model selection across different numbers of groups and factors
Fitting the optimal model
Comprehensive visualization of results

Load and Explore the Data

For this tutorial, we use a simulated dataset included in the BCFM package. The dataset contains 200 observations, 20 variables, and 5 time points, with a known structure of 4 groups and 3 latent factors.

# Load the simulated dataset
data("sim.data", package = "BCFM")

# Examine the structure
str(sim.data)
#> 'data.frame':    200 obs. of  20 variables:
#>  $ V1 : num  -0.131 -1.553 1.32 -1.078 -2.31 ...
#>  $ V2 : num  -0.19 -0.844 -1.358 0.999 -3.974 ...
#>  $ V3 : num  -1.4138 -1.8041 0.267 -0.0431 -0.1291 ...
#>  $ V4 : num  0.466 -0.592 1.224 -0.853 -1.623 ...
#>  $ V5 : num  0.591 1.446 0.154 -0.865 2.988 ...
#>  $ V6 : num  0.0941 -1.5791 1.0666 -1.0235 -3.6528 ...
#>  $ V7 : num  0.1117 0.0602 0.4311 -0.822 0.1544 ...
#>  $ V8 : num  -0.391 -1.544 -0.803 0.252 -1.14 ...
#>  $ V9 : num  0.2517 0.0907 0.4541 0.0939 0.5604 ...
#>  $ V10: num  0.4264 0.0708 0.4772 0.0174 -0.5723 ...
#>  $ V11: num  -0.0713 1.0613 -0.0336 -0.2751 3.0106 ...
#>  $ V12: num  0.244 -0.591 0.517 -0.327 -2.487 ...
#>  $ V13: num  0.424 1.532 -0.359 0.834 -0.112 ...
#>  $ V14: num  0.0833 1.7325 1.8849 -2.0999 4.8114 ...
#>  $ V15: num  0.696 0.717 0.111 0.247 0.648 ...
#>  $ V16: num  -0.0427 -1.108 1.2177 -1.1383 -3.4454 ...
#>  $ V17: num  -0.237 0.708 0.75 -1.155 0.565 ...
#>  $ V18: num  0.078 0.252 1.533 -0.861 0.251 ...
#>  $ V19: num  -0.398 0.761 -0.24 0.509 0.775 ...
#>  $ V20: num  0.12 0.447 1.179 -1.217 1.402 ...

# Extract dimensions
n_obs <- dim(sim.data)[1]      # 200 observations
n_vars <- dim(sim.data)[2]     # 20 variables

cat("Dataset dimensions:\n")
#> Dataset dimensions:
cat("  Observations:", n_obs, "\n")
#>   Observations: 200
cat("  Variables:", n_vars, "\n")
#>   Variables: 20

Model Selection

The first step is to identify the optimal number of groups (clusters) and latent factors. We use BCFM.model.selection() to fit models across a grid of possible configurations and compare them using IC.

# Specify variables to use for clustering
cluster.vars <- paste0("V", 1:n_vars)

# Create output directory for results (using tempdir() for CRAN compliance)
output_dir <- file.path(tempdir(), "BCFM_analysis")

# Run model selection
# Note: grouplist and factorlist can be vectors (e.g., c(2, 3, 4)) or sequences (e.g., 2:4)
# The function will fit a model for each combination of groups and factors
BCFM.model.selection(
  data = sim.data,
  cluster.vars = cluster.vars,
  grouplist = 4,          # Number of groups
  factorlist = 3,         # Number of factors
  n.iter = 100,           # Number of MCMC iterations (use 50000+ for real analysis)
  burnin = 10,            # Burnin for IC calculation (use 10000+ for real analysis)
  every = 10,             # Retains the output of every "every" MCMC iterations
  cluster.size = 0.01,    # Minimum proportion required for each cluster (default 0.05)
  output_dir = output_dir, # Specify where to save results
  seed = 123              # Optional seed for reproducibility
)
#> Created output directory: /var/folders/gy/w0gwgb4d1m95fnkc05slvsnr0000gn/T//RtmpKzt4l2/BCFM_analysis
#> Saving results to: /var/folders/gy/w0gwgb4d1m95fnkc05slvsnr0000gn/T//RtmpKzt4l2/BCFM_analysis
#> Factor: 3, Group: 4
#> Running BCFM
#> Done
#> Time: 0.2201362 secs

Note: For demonstration, we use minimal iterations (100) and burnin (10). For actual analysis, use n.iter = 50000 and burnin = 10000 or higher to ensure convergence.

Output Directory: The output_dir parameter specifies where model results will be saved. By default, results are saved to tempdir(). For your own analyses, you can specify any directory where you have write permissions.

Examining Output Files

After running model selection, check the output directory:

# Check what files were created
list.files(output_dir)
#> [1] "IC.Rdata"                        "results-covarianceF-g4-f3.Rdata"

# Load IC results (if multiple models were fitted)
load(file.path(output_dir, "IC.Rdata"))
print(IC.matrix)
#>          F3
#> G4 5527.836

# Load the fitted model results
load(file.path(output_dir, "results-covarianceF-g4-f3.Rdata"))

# Examine the structure
str(SDresult$Result, max.level = 1)
#> List of 8
#>  $ X     : num [1:10, 1:200, 1:3, 1] 1.76 2.17 1.97 2.05 1.91 ...
#>  $ mu    : num [1:10, 1:4, 1:3] 2.29 2.32 2.02 2.28 2.36 ...
#>  $ Omega : num [1:10, 1:4, 1:3, 1:3] 1.76 2.19 2.13 2.06 2.09 ...
#>  $ B     : num [1:10, 1:20, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ sigma2: num [1:10, 1:20] 0.2369 0.0965 0.0877 0.0886 0.0817 ...
#>  $ tau   : num [1:10, 1:3] 0.535 0.308 0.431 0.22 0.326 ...
#>  $ Z     : num [1:10, 1:200, 1] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ probs : num [1:10, 1:4] 0.376 0.505 0.527 0.45 0.445 ...

When testing multiple configurations (e.g., grouplist = 2:4, factorlist = 2:4), an IC.Rdata file is created for model comparison.

Visualization

The BCFM package provides comprehensive visualization functions to explore your results.

Factor Loadings Matrix

The factor loadings matrix B shows the relationship between observed variables and latent factors. Each column represents a latent factor, and each row represents a variable. Higher absolute values indicate stronger associations.

# Check dimensions and calculate posterior mean of B (after burnin)
n_iter <- dim(SDresult$Result$B)[1]
burnin <- min(100, floor(n_iter * 0.2))  # Use 20% as burnin or 100, whichever is smaller

cat("Total iterations:", n_iter, "\n")
#> Total iterations: 10
cat("Using burnin:", burnin, "\n")
#> Using burnin: 2

# Calculate posterior mean after burnin
B.matrix <- apply(SDresult$Result$B[burnin:n_iter, , ], MARGIN = c(2, 3), FUN = mean)
B.matrix <- as.data.frame(B.matrix)

# Set variable names
rownames(B.matrix) <- paste0("Var", 1:nrow(B.matrix))

# Set informative column names for factors
colnames(B.matrix) <- paste0("Factor ", 1:ncol(B.matrix))

# Print the factor loadings matrix
knitr::kable(round(B.matrix, 3), 
             caption = "Posterior Mean Factor Loadings",
             align = 'c')

Posterior Mean Factor Loadings
	Factor 1	Factor 2	Factor 3
Var1	1.000	0.000	0.000
Var2	0.105	1.000	0.000
Var3	-0.148	0.188	1.000
Var4	0.921	-0.299	-0.062
Var5	0.669	-0.254	-0.071
Var6	-0.154	-0.578	-0.237
Var7	0.225	-0.195	0.167
Var8	0.027	0.334	0.232
Var9	0.039	-0.161	-0.034
Var10	0.147	-0.013	0.028
Var11	-0.522	-0.415	-0.016
Var12	0.645	0.039	-0.359
Var13	-0.206	0.262	-0.544
Var14	-0.165	-1.159	0.103
Var15	0.171	-0.223	-0.443
Var16	0.978	-0.084	-0.273
Var17	0.024	-0.372	-0.033
Var18	0.256	-0.387	-0.086
Var19	-0.307	0.014	-0.083
Var20	0.086	-0.513	-0.054

Latent Profiles

Visualize the mean profiles for each cluster:

ggplot_latent.profiles(Gibbs = SDresult$Result)

Cluster Means

Explore the posterior distributions of cluster means:

ggplot_mu.density(Gibbs = SDresult$Result, add.legend = TRUE)

Cluster Probabilities

Density Plots

Examine the posterior distributions of cluster membership probabilities:

ggplot_probs.density(Gibbs = SDresult$Result)

Trace Plots

Check convergence of cluster probabilities:

ggplot_probs.trace(Gibbs = SDresult$Result)

Cluster Assignments Heatmap

Visualize the cluster membership assignments over MCMC iterations:

ggplot_Zit.heatmap(Gibbs = SDresult$Result)

Additional Visualizations

The BCFM package includes several other plotting functions to explore your results in depth:

ggplot_B.trace() - Check convergence of factor loadings
ggplot_B.CI() - Examine credible intervals for loadings
ggplot_omega.density() - Visualize within-cluster covariance structures
ggplot_sigma2.CI() - Examine error variances
ggplot_tau.CI() - Examine factor variances
ggplot_variability() - Understand variance decomposition across factors

See the function documentation for details.

Interpreting Results

Cluster Assignments

The heatmap shows how observations are assigned to clusters. Stable assignments (consistent colors across iterations) indicate good convergence and clear cluster structure.

Factor Loadings

The loading matrix B shows which variables are associated with each latent factor. Large credible intervals that include zero suggest weak associations.

Cluster Characteristics

The latent profiles plot reveals the distinctive patterns of each cluster across variables, helping to interpret what makes each cluster unique.

Convergence Diagnostics

The trace plots for factor loadings and cluster probabilities help assess MCMC convergence. Look for:

No trends over iterations (stationary)
Good mixing (values move freely)
Similar patterns across chains

Variance Components

Sigma² (Error variances): Larger values indicate more unexplained variance for those variables
Tau (Factor variances): Shows the relative importance of each latent factor
Variance decomposition: Illustrates how much variance each factor explains

Accessing Model Components

You can directly access various components of the fitted model:

# Cluster assignments (most likely cluster for each observation)
cluster_assignments <- SDresult$Result$Z

# Loading matrix
loadings <- SDresult$Result$B

# Cluster means
cluster_means <- SDresult$Result$mu

# Cluster probabilities
cluster_probs <- SDresult$Result$probs

# Factor scores
factor_scores <- SDresult$Result$X

# Show dimensions
cat("Dimensions of key components:\n")
#> Dimensions of key components:
cat("  B (loadings):", dim(loadings), "\n")
#>   B (loadings): 10 20 3
cat("  mu (means):", dim(cluster_means), "\n")
#>   mu (means): 10 4 3
cat("  X (scores):", dim(factor_scores), "\n")
#>   X (scores): 10 200 3 1

Summary

This vignette covered:

Model Selection - Using BCFM.model.selection() to fit models
Output Examination - Exploring result files and IC matrices
Factor Analysis - Interpreting loadings, convergence, and variance
Clustering - Visualizing cluster profiles and assignments
Diagnostics - Assessing model fit and convergence

For more details, see ?BCFM.model.selection and individual function help pages.

Session Info

sessionInfo()
#> R version 4.4.3 (2025-02-28)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS 26.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] C/C.UTF-8/C.UTF-8/C/C.UTF-8/C.UTF-8
#> 
#> time zone: America/New_York
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_4.0.2 tidyr_1.3.2   dplyr_1.2.0   BCFM_1.0.0   
#> 
#> loaded via a namespace (and not attached):
#>  [1] sass_0.4.10            generics_0.1.4         rstatix_0.7.3         
#>  [4] lattice_0.22-6         digest_0.6.39          magrittr_2.0.4        
#>  [7] evaluate_1.0.5         grid_4.4.3             RColorBrewer_1.1-3    
#> [10] mvtnorm_1.3-3          fastmap_1.2.0          jsonlite_2.0.0        
#> [13] backports_1.5.0        Formula_1.2-5          gridExtra_2.3         
#> [16] purrr_1.2.1            scales_1.4.0           jquerylib_0.1.4       
#> [19] abind_1.4-8            mnormt_2.1.2           cli_3.6.5             
#> [22] rlang_1.1.7            LaplacesDemon_16.1.6   RcppArmadillo_15.2.3-1
#> [25] withr_3.0.2            cachem_1.1.0           yaml_2.3.12           
#> [28] tools_4.4.3            parallel_4.4.3         ggsignif_0.6.4        
#> [31] ggpubr_0.6.2           broom_1.0.12           vctrs_0.7.1           
#> [34] R6_2.6.1               lifecycle_1.0.5        car_3.1-5             
#> [37] psych_2.6.1            fastmatrix_0.6-6       pkgconfig_2.0.3       
#> [40] pillar_1.11.1          bslib_0.10.0           gtable_0.3.6          
#> [43] glue_1.8.0             Rcpp_1.1.1             xfun_0.56             
#> [46] tibble_3.3.1           tidyselect_1.2.1       knitr_1.51            
#> [49] farver_2.1.2           htmltools_0.5.9        nlme_3.1-167          
#> [52] labeling_0.4.3         rmarkdown_2.30         carData_3.0-6         
#> [55] compiler_4.4.3         S7_0.2.1

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.