The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Getting started with bpgmm

bpgmm fits Bayesian parsimonious Gaussian mixture models for model-based clustering. It targets three posterior inference goals described by Lu, Li, and Love (2021): the partition of observations, the number of clusters, and the cluster covariance structure.

The vignettes have separate roles:

Article Use it for
data-preparation matrix orientation, scaling, and choosing m_range and q_new
model-and-sampler formulas from the paper and their package arguments
examples inspecting one fitted object
model-selection RJMCMC summaries for \(m\) and covariance model \(v\)
variable-prioritization exploratory variable rankings from posterior output
posterior-diagnostics independent chains, traces, and co-clustering

Fit a model

Input data should be a numeric matrix with variables in rows and observations in columns. The small example below has two variables and eight observations. The example is short so the vignette runs quickly; applied analyses should use larger burn and niter values.

library(bpgmm)

set.seed(2026)

X <- cbind(
  matrix(rnorm(8, mean = -2, sd = 0.2), nrow = 2),
  matrix(rnorm(8, mean = 2, sd = 0.2), nrow = 2)
)
known_labels <- rep(1:2, each = 4)

The scatter plot gives a direct check of the simulated partition. Each point is one observation, and the colors show the reference labels used later for the ARI calculation.

plot(
  X[1, ], X[2, ],
  col = c("#0072B2", "#D55E00")[known_labels],
  pch = 19,
  xlab = "Variable 1",
  ylab = "Variable 2",
  main = "Small two-cluster example",
  asp = 1
)
legend(
  "topleft",
  legend = paste("Reference", sort(unique(known_labels))),
  col = c("#0072B2", "#D55E00"),
  pch = 19,
  bty = "n"
)

Scatter plot of the small two-cluster example.

fit_log <- capture.output({
  fit <- pgmm_rjmcmc(
    X = X,
    m_init = 2,
    m_range = c(1, 3),
    q_new = 1,
    burn = 1,
    niter = 3,
    constraint = model_to_constraint("UUU"),
    m_step = 0,
    v_step = 0,
    verbose = FALSE
  )
})
tail(fit_log, 1)
#> character(0)

The call fixes \(m = 2\) and the covariance model so the example is fast. The important arguments are:

Summarize posterior samples

summary <- summarize_pgmm_rjmcmc(fit, true_cluster = known_labels)

summary$allocation
#> [1] 2 2 2 2 1 1 1 1
summary$n_clusters
#> 
#> 2 
#> 3
summary$n_constraints
#> 
#> UUU 
#>   3
summary$ari
#> [1] 1

If a reference partition is available, pass it as true_cluster to calculate the adjusted Rand index.

The summary allocation can be plotted back on the original coordinates. This plot is a first check before longer chains and convergence diagnostics.

plot(
  X[1, ], X[2, ],
  col = c("#009E73", "#CC79A7", "#E69F00")[summary$allocation],
  pch = 19,
  xlab = "Variable 1",
  ylab = "Variable 2",
  main = "Posterior modal allocation",
  asp = 1
)
text(X[1, ], X[2, ], labels = seq_along(summary$allocation), pos = 3, cex = 0.75)
legend(
  "topleft",
  legend = paste("Cluster", sort(unique(summary$allocation))),
  col = c("#009E73", "#CC79A7", "#E69F00")[sort(unique(summary$allocation))],
  pch = 19,
  bty = "n"
)

Scatter plot colored by posterior modal allocation.

Common early mistakes

Common early problems are:

Naming convention

Starting with version 1.2.0, the public API uses snake_case names throughout. Use pgmm_rjmcmc(), summarize_pgmm_rjmcmc(), and snake_case argument names such as m_init, m_range, and q_new.

Citation

If you use bpgmm in published work, please cite the package and the methodology paper:

citation("bpgmm")
#> To cite package 'bpgmm' in publications use:
#> 
#>   Lu X, Li Y, Love T (2021). On Bayesian Analysis of Parsimonious
#>   Gaussian Mixture Models. Journal of Classification, 38, 576-593.
#>   doi:10.1007/s00357-021-09391-8
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {On Bayesian Analysis of Parsimonious Gaussian Mixture Models},
#>     author = {Xiang Lu and Yaoxiang Li and Tanzy Love},
#>     journal = {Journal of Classification},
#>     year = {2021},
#>     volume = {38},
#>     pages = {576--593},
#>     doi = {10.1007/s00357-021-09391-8},
#>   }

Lu, X., Li, Y., & Love, T. (2021). On Bayesian Analysis of Parsimonious Gaussian Mixture Models. Journal of Classification, 38, 576-593. https://doi.org/10.1007/s00357-021-09391-8

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.