The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Getting Started with clmstan

Introduction

clmstan is an R package for fitting Cumulative Link Models (CLMs) using Bayesian inference via CmdStan. It provides: - 11 link functions: 5 standard + 6 flexible parametric links - 3 threshold structures: flexible, equidistant, symmetric - Bayesian estimation: Full posterior inference with MCMC - Pre-compiled Stan models: Fast execution via the instantiate package

Comparison with ordinal::clm()

Feature ordinal::clm() clmstan
Inference Maximum likelihood Bayesian (MCMC)
Uncertainty Asymptotic SE Posterior distribution
Link functions 5 standard 11 (5 standard + 6 flexible)
Estimation of link parameters No Yes

Installation

Before using clmstan, you need to install CmdStan:

# Step 1: Install cmdstanr
install.packages("cmdstanr",
                 repos = c("https://stan-dev.r-universe.dev",
                           getOption("repos")))

# Step 2: Install CmdStan (only needed once)
library(cmdstanr)
install_cmdstan()

# Step 3: Install clmstan
# devtools::install_github("t-momozaki/clmstan")

Quick Start

Fit a cumulative link model in just a few lines:

library(clmstan)
library(ordinal)
data(wine)

# Fit a model with logit link
fit <- clm_stan(rating ~ temp + contact, data = wine,
                chains = 4, iter = 2000, warmup = 1000)

# View results
print(fit)
summary(fit)

The wine dataset contains ratings (1-5) of 72 wine samples with two predictors: temperature (cold/warm) and contact (yes/no).

For ordinal::clm() Users

If you’re familiar with the ordinal package, the transition to clmstan is straightforward:

API Correspondence

ordinal::clm() clmstan::clm_stan()
clm(y ~ x, data) clm_stan(y ~ x, data)
link = "logit" link = "logit"
threshold = "flexible" threshold = "flexible"
summary(fit) summary(fit)
coef(fit) coef(fit)

Example Comparison

library(ordinal)
library(clmstan)
data(wine)

# ordinal::clm (frequentist)
fit_freq <- clm(rating ~ temp + contact, data = wine, link = "logit")

# clmstan::clm_stan (Bayesian)
fit_bayes <- clm_stan(rating ~ temp + contact, data = wine, link = "logit",
                      chains = 4, iter = 2000, warmup = 1000)

# Compare coefficients
coef(fit_freq)
coef(fit_bayes)

Key Differences

  1. Posterior distributions: clmstan provides full posterior samples, enabling Bayesian inference (credible intervals, posterior predictive checks)
  2. Reproducibility: Set set.seed() before clm_stan() for reproducible results
  3. Computation time: MCMC sampling is slower than MLE, but provides richer inference

Threshold Structures

clmstan supports three threshold parameterizations:

# View available threshold structures
supported_thresholds()

Flexible (Default)

Each threshold is freely estimated (K-1 parameters for K categories):

fit_flex <- clm_stan(rating ~ temp, data = wine, threshold = "flexible",
                     chains = 2, iter = 1000, warmup = 500)

Equidistant

Thresholds are equally spaced (2 parameters: start + interval):

# Useful for Likert scales with assumed equal intervals
fit_equi <- clm_stan(rating ~ temp, data = wine, threshold = "equidistant",
                     chains = 2, iter = 1000, warmup = 500)

Symmetric

Thresholds are symmetric around zero (useful for scales with a neutral center):

fit_sym <- clm_stan(rating ~ temp, data = wine, threshold = "symmetric",
                    chains = 2, iter = 1000, warmup = 500)

Prior Specification

clmstan uses weakly informative default priors, but you can customize them.

Default Priors

Parameter Default Prior
Coefficients (beta) normal(0, 2.5)
Thresholds (c) normal(0, 10)
t-link df gamma(2, 0.1)
GEV xi normal(0, 2)
SP r gamma(0.5, 0.5)
AEP theta1, theta2 gamma(2, 1)

Available Distributions

normal(mu, sigma)      # Normal distribution
gamma(alpha, beta)     # Gamma distribution (shape, rate)
student_t(df, mu, sigma)  # Student-t distribution
cauchy(mu, sigma)      # Cauchy distribution
flat()                 # Flat (improper) prior

Prior Classes

Class Description Compatible Distributions
"b" Regression coefficients normal, student_t, cauchy, flat
"Intercept" Thresholds (flexible) normal, student_t, cauchy, flat
"c1" First threshold (equidistant) normal, student_t, cauchy, flat
"d" Interval (equidistant) gamma
"cpos" Positive thresholds (symmetric) normal, student_t, cauchy, flat
"df" t-link degrees of freedom gamma
"xi" GEV shape parameter normal, student_t, cauchy
"r" SP skewness parameter gamma
"theta1", "theta2" AEP shape parameters gamma

Legacy API (clm_prior)

For backward compatibility:

fit <- clm_stan(rating ~ temp, data = wine,
                prior = clm_prior(beta_sd = 1, c_sd = 5),
                chains = 2, iter = 1000, warmup = 500)

Interpreting Results

Basic Methods

fit <- clm_stan(rating ~ temp + contact, data = wine,
                chains = 4, iter = 2000, warmup = 1000)

# Quick overview
print(fit)

# Detailed summary with credible intervals
summary(fit)
summary(fit, probs = c(0.025, 0.5, 0.975))

# Extract point estimates
coef(fit)                    # Posterior mean (default)
coef(fit, type = "median")   # Posterior median

Diagnostic Plots

# Trace plots (check mixing)
plot(fit, type = "trace")

# Posterior density
plot(fit, type = "dens")

# Posterior intervals
plot(fit, type = "intervals")

# Autocorrelation (check for high autocorrelation)
plot(fit, type = "acf")

# Select specific parameters
plot(fit, type = "trace", pars = c("beta[1]", "beta[2]"))

Convergence Diagnostics

# Quick summary
diagnostics(fit)

# Detailed output
diagnostics(fit, detail = TRUE)

Key diagnostics to check: - Rhat: Should be < 1.01 (values > 1.01 indicate poor convergence) - ESS (bulk/tail): Should be > 400 (low values suggest inefficient sampling) - Divergences: Should be 0 (divergences indicate sampling problems)

Prediction

Category Prediction

fit <- clm_stan(rating ~ temp + contact, data = wine,
                chains = 4, iter = 2000, warmup = 1000)

# Predict most likely category
pred_class <- predict(fit, type = "class")
head(pred_class)

# Prediction for new data
newdata <- data.frame(
  temp = factor("warm", levels = levels(wine$temp)),
  contact = factor("yes", levels = levels(wine$contact))
)
predict(fit, newdata = newdata, type = "class")

Probability Prediction

# Predicted probabilities for each category
pred_probs <- predict(fit, type = "probs")
head(pred_probs)

# Alternative: fitted values
fitted_vals <- fitted(fit)
head(fitted_vals)

Posterior Predictive Distribution

# Draw from posterior predictive distribution
y_rep <- posterior_predict(fit)
dim(y_rep)  # draws x observations

# Uncertainty in predictions
pred_draws <- predict(fit, type = "class", summary = FALSE)

Model Comparison

Use LOO-CV (Leave-One-Out Cross-Validation) for model comparison:

# Fit competing models
fit1 <- clm_stan(rating ~ temp, data = wine, link = "logit",
                 chains = 4, iter = 2000, warmup = 1000)
fit2 <- clm_stan(rating ~ temp + contact, data = wine, link = "logit",
                 chains = 4, iter = 2000, warmup = 1000)
fit3 <- clm_stan(rating ~ temp + contact, data = wine, link = "probit",
                 chains = 4, iter = 2000, warmup = 1000)

# Compute LOO-CV
loo1 <- loo(fit1)
loo2 <- loo(fit2)
loo3 <- loo(fit3)

# View results
print(loo1)

# Pareto k diagnostic plot
plot(loo1)

# Compare models
loo::loo_compare(loo1, loo2, loo3)

# WAIC is also available
waic(fit1)

Interpreting LOO results: - Lower elpd_loo (expected log pointwise predictive density) indicates better fit - Pareto k values > 0.7 suggest unreliable LOO estimates for those observations

Practical Workflow

Here’s a recommended workflow for ordinal data analysis:

library(clmstan)
library(ordinal)
data(wine)

# Step 1: Fit baseline model
set.seed(123)
fit_base <- clm_stan(rating ~ temp + contact, data = wine,
                     link = "logit", threshold = "flexible",
                     chains = 4, iter = 2000, warmup = 1000)

# Step 2: Check convergence
diagnostics(fit_base)
plot(fit_base, type = "trace")

# Step 3: Review results
summary(fit_base)

# Step 4: Try alternative link functions
fit_probit <- clm_stan(rating ~ temp + contact, data = wine,
                       link = "probit",
                       chains = 4, iter = 2000, warmup = 1000)

fit_gev <- clm_stan(rating ~ temp + contact, data = wine,
                    link = "gev", link_param = list(xi = "estimate"),
                    chains = 4, iter = 2000, warmup = 1000)

# Step 5: Compare models
loo_base <- loo(fit_base)
loo_probit <- loo(fit_probit)
loo_gev <- loo(fit_gev)
loo::loo_compare(loo_base, loo_probit, loo_gev)

# Step 6: Final model interpretation
summary(fit_base)
plot(fit_base, type = "intervals")

Troubleshooting

CmdStan Not Found

# Check CmdStan installation
cmdstanr::cmdstan_path()
cmdstanr::cmdstan_version()

# If not found, install CmdStan:
cmdstanr::install_cmdstan()

Convergence Issues

If you see Rhat > 1.01 or low ESS:

# Increase iterations and warmup
fit <- clm_stan(rating ~ temp, data = wine,
                chains = 4,
                iter = 4000,
                warmup = 2000)

# Increase adapt_delta (helps with divergences)
fit <- clm_stan(rating ~ temp, data = wine,
                chains = 4,
                iter = 2000,
                warmup = 1000,
                adapt_delta = 0.95)

# Increase max_treedepth
fit <- clm_stan(rating ~ temp, data = wine,
                chains = 4,
                iter = 2000,
                warmup = 1000,
                max_treedepth = 12)

Divergent Transitions

Divergent transitions indicate that the sampler had difficulty exploring the posterior:

  1. Increase adapt_delta: Try 0.95, 0.99, or even 0.999
  2. Reparameterize: Try different threshold structures
  3. Simplify priors: Use more informative priors
  4. Check data: Look for separation or outliers
# High adapt_delta
fit <- clm_stan(rating ~ temp, data = wine,
                chains = 4,
                iter = 2000,
                warmup = 1000,
                adapt_delta = 0.99)

Memory Issues

For large datasets:

# Reduce number of chains
fit <- clm_stan(rating ~ temp, data = wine,
                chains = 2,
                iter = 2000,
                warmup = 1000)

# Run chains in parallel (default)
fit <- clm_stan(rating ~ temp, data = wine,
                chains = 4,
                parallel_chains = 4)

References and Resources

Package Resources

Statistical References

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.