Getting Started with clmstan

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Introduction

clmstan is an R package for fitting Cumulative Link Models (CLMs) using Bayesian inference via CmdStan. It provides: - 11 link functions: 5 standard + 6 flexible parametric links - 3 threshold structures: flexible, equidistant, symmetric - Bayesian estimation: Full posterior inference with MCMC - Pre-compiled Stan models: Fast execution via the instantiate package

Comparison with ordinal::clm()

Feature	ordinal::clm()	clmstan
Inference	Maximum likelihood	Bayesian (MCMC)
Uncertainty	Asymptotic SE	Posterior distribution
Link functions	5 standard	11 (5 standard + 6 flexible)
Estimation of link parameters	No	Yes

Installation

Before using clmstan, you need to install CmdStan:

# Step 1: Install cmdstanr
install.packages("cmdstanr",
                 repos = c("https://stan-dev.r-universe.dev",
                           getOption("repos")))

# Step 2: Install CmdStan (only needed once)
library(cmdstanr)
install_cmdstan()

# Step 3: Install clmstan
# devtools::install_github("t-momozaki/clmstan")

Quick Start

Fit a cumulative link model in just a few lines:

library(clmstan)
library(ordinal)
data(wine)

# Fit a model with logit link
fit <- clm_stan(rating ~ temp + contact, data = wine,
                chains = 4, iter = 2000, warmup = 1000)

# View results
print(fit)
summary(fit)

The wine dataset contains ratings (1-5) of 72 wine samples with two predictors: temperature (cold/warm) and contact (yes/no).

For ordinal::clm() Users

If you’re familiar with the ordinal package, the transition to clmstan is straightforward:

API Correspondence

ordinal::clm()	clmstan::clm_stan()
`clm(y ~ x, data)`	`clm_stan(y ~ x, data)`
`link = "logit"`	`link = "logit"`
`threshold = "flexible"`	`threshold = "flexible"`
`summary(fit)`	`summary(fit)`
`coef(fit)`	`coef(fit)`

Example Comparison

library(ordinal)
library(clmstan)
data(wine)

# ordinal::clm (frequentist)
fit_freq <- clm(rating ~ temp + contact, data = wine, link = "logit")

# clmstan::clm_stan (Bayesian)
fit_bayes <- clm_stan(rating ~ temp + contact, data = wine, link = "logit",
                      chains = 4, iter = 2000, warmup = 1000)

# Compare coefficients
coef(fit_freq)
coef(fit_bayes)

Key Differences

Posterior distributions: clmstan provides full posterior samples, enabling Bayesian inference (credible intervals, posterior predictive checks)
Reproducibility: Set set.seed() before clm_stan() for reproducible results
Computation time: MCMC sampling is slower than MLE, but provides richer inference

Link Functions

clmstan supports 11 link functions, more than any other CLM package.

Standard Links (5)

# View available link functions
supported_links("standard")

# Logit (default) - proportional odds model
fit_logit <- clm_stan(rating ~ temp, data = wine, link = "logit",
                      chains = 2, iter = 1000, warmup = 500)

# Probit - normal distribution
fit_probit <- clm_stan(rating ~ temp, data = wine, link = "probit",
                       chains = 2, iter = 1000, warmup = 500)

# Complementary log-log - asymmetric (right-skewed)
fit_cloglog <- clm_stan(rating ~ temp, data = wine, link = "cloglog",
                        chains = 2, iter = 1000, warmup = 500)

# Log-log - asymmetric (left-skewed)
fit_loglog <- clm_stan(rating ~ temp, data = wine, link = "loglog",
                       chains = 2, iter = 1000, warmup = 500)

# Cauchit - heavy tails
fit_cauchit <- clm_stan(rating ~ temp, data = wine, link = "cauchit",
                        chains = 2, iter = 1000, warmup = 500)

Flexible Links (6)

Flexible links have parameters that can be fixed or estimated. For theoretical background, see Wang & Dey (2011) for GEV, Jiang & Dey (2015) for SP, and Naranjo et al. (2015) for AEP.

# View flexible link functions
supported_links("flexible")

# t-link with fixed df (heavier tails than logit)
fit_t8 <- clm_stan(rating ~ temp, data = wine, link = "tlink",
                   link_param = list(df = 8),
                   chains = 2, iter = 1000, warmup = 500)

# t-link with estimated df (data-driven tail behavior)
fit_t_est <- clm_stan(rating ~ temp, data = wine, link = "tlink",
                      link_param = list(df = "estimate"),
                      chains = 2, iter = 1000, warmup = 500)

# GEV link with estimated shape parameter
fit_gev <- clm_stan(rating ~ temp, data = wine, link = "gev",
                    link_param = list(xi = "estimate"),
                    chains = 2, iter = 1000, warmup = 500)

# Aranda-Ordaz link (asymmetric)
fit_ao <- clm_stan(rating ~ temp, data = wine, link = "aranda_ordaz",
                   link_param = list(lambda = "estimate"),
                   chains = 2, iter = 1000, warmup = 500)

# Symmetric Power link (skewness adjustment)
fit_sp <- clm_stan(rating ~ temp, data = wine, link = "sp",
                   base = "logit",
                   link_param = list(r = "estimate"),
                   chains = 2, iter = 1000, warmup = 500)

# Log-gamma link
fit_lg <- clm_stan(rating ~ temp, data = wine, link = "log_gamma",
                   link_param = list(lambda = "estimate"),
                   chains = 2, iter = 1000, warmup = 500)

# AEP link (asymmetric exponential power)
fit_aep <- clm_stan(rating ~ temp, data = wine, link = "aep",
                    link_param = list(theta1 = "estimate", theta2 = "estimate"),
                    chains = 2, iter = 1000, warmup = 500)

Link Function Summary

Link	Parameters	Special Cases
tlink	df > 0	df = Inf -> probit
aranda_ordaz	lambda > 0	lambda = 1 -> logit
gev	xi (real)	xi = 0 -> loglog
sp	r > 0, base	r = 1 -> base distribution
log_gamma	lambda (real)	lambda = 0 -> probit
aep	theta1, theta2 > 0	Both = 2 -> similar to probit

Threshold Structures

clmstan supports three threshold parameterizations:

# View available threshold structures
supported_thresholds()

Flexible (Default)

Each threshold is freely estimated (K-1 parameters for K categories):

fit_flex <- clm_stan(rating ~ temp, data = wine, threshold = "flexible",
                     chains = 2, iter = 1000, warmup = 500)

Equidistant

Thresholds are equally spaced (2 parameters: start + interval):

# Useful for Likert scales with assumed equal intervals
fit_equi <- clm_stan(rating ~ temp, data = wine, threshold = "equidistant",
                     chains = 2, iter = 1000, warmup = 500)

Symmetric

Thresholds are symmetric around zero (useful for scales with a neutral center):

fit_sym <- clm_stan(rating ~ temp, data = wine, threshold = "symmetric",
                    chains = 2, iter = 1000, warmup = 500)

Prior Specification

clmstan uses weakly informative default priors, but you can customize them.

Default Priors

Parameter	Default Prior
Coefficients (beta)	normal(0, 2.5)
Thresholds (c)	normal(0, 10)
t-link df	gamma(2, 0.1)
GEV xi	normal(0, 2)
SP r	gamma(0.5, 0.5)
AEP theta1, theta2	gamma(2, 1)

Using prior() Function (Recommended)

The prior() function provides a brms-like interface:

# Single prior
fit <- clm_stan(rating ~ temp, data = wine,
                prior = prior(normal(0, 1), class = "b"),
                chains = 2, iter = 1000, warmup = 500)

# Multiple priors
fit <- clm_stan(rating ~ temp, data = wine,
                prior = c(
                  prior(normal(0, 1), class = "b"),
                  prior(normal(0, 5), class = "Intercept")
                ),
                chains = 2, iter = 1000, warmup = 500)

# Prior for link parameters
fit_gev <- clm_stan(rating ~ temp, data = wine, link = "gev",
                    link_param = list(xi = "estimate"),
                    prior = prior(normal(0, 0.5), class = "xi"),
                    chains = 2, iter = 1000, warmup = 500)

Available Distributions

normal(mu, sigma)      # Normal distribution
gamma(alpha, beta)     # Gamma distribution (shape, rate)
student_t(df, mu, sigma)  # Student-t distribution
cauchy(mu, sigma)      # Cauchy distribution
flat()                 # Flat (improper) prior

Prior Classes

Class	Description	Compatible Distributions
`"b"`	Regression coefficients	normal, student_t, cauchy, flat
`"Intercept"`	Thresholds (flexible)	normal, student_t, cauchy, flat
`"c1"`	First threshold (equidistant)	normal, student_t, cauchy, flat
`"d"`	Interval (equidistant)	gamma
`"cpos"`	Positive thresholds (symmetric)	normal, student_t, cauchy, flat
`"df"`	t-link degrees of freedom	gamma
`"xi"`	GEV shape parameter	normal, student_t, cauchy
`"r"`	SP skewness parameter	gamma
`"theta1"`, `"theta2"`	AEP shape parameters	gamma

Legacy API (clm_prior)

For backward compatibility:

fit <- clm_stan(rating ~ temp, data = wine,
                prior = clm_prior(beta_sd = 1, c_sd = 5),
                chains = 2, iter = 1000, warmup = 500)

Interpreting Results

Basic Methods

fit <- clm_stan(rating ~ temp + contact, data = wine,
                chains = 4, iter = 2000, warmup = 1000)

# Quick overview
print(fit)

# Detailed summary with credible intervals
summary(fit)
summary(fit, probs = c(0.025, 0.5, 0.975))

# Extract point estimates
coef(fit)                    # Posterior mean (default)
coef(fit, type = "median")   # Posterior median

Diagnostic Plots

# Trace plots (check mixing)
plot(fit, type = "trace")

# Posterior density
plot(fit, type = "dens")

# Posterior intervals
plot(fit, type = "intervals")

# Autocorrelation (check for high autocorrelation)
plot(fit, type = "acf")

# Select specific parameters
plot(fit, type = "trace", pars = c("beta[1]", "beta[2]"))

Convergence Diagnostics

# Quick summary
diagnostics(fit)

# Detailed output
diagnostics(fit, detail = TRUE)

Key diagnostics to check: - Rhat: Should be < 1.01 (values > 1.01 indicate poor convergence) - ESS (bulk/tail): Should be > 400 (low values suggest inefficient sampling) - Divergences: Should be 0 (divergences indicate sampling problems)

Prediction

Category Prediction

fit <- clm_stan(rating ~ temp + contact, data = wine,
                chains = 4, iter = 2000, warmup = 1000)

# Predict most likely category
pred_class <- predict(fit, type = "class")
head(pred_class)

# Prediction for new data
newdata <- data.frame(
  temp = factor("warm", levels = levels(wine$temp)),
  contact = factor("yes", levels = levels(wine$contact))
)
predict(fit, newdata = newdata, type = "class")

Probability Prediction

# Predicted probabilities for each category
pred_probs <- predict(fit, type = "probs")
head(pred_probs)

# Alternative: fitted values
fitted_vals <- fitted(fit)
head(fitted_vals)

Posterior Predictive Distribution

# Draw from posterior predictive distribution
y_rep <- posterior_predict(fit)
dim(y_rep)  # draws x observations

# Uncertainty in predictions
pred_draws <- predict(fit, type = "class", summary = FALSE)

Model Comparison

Use LOO-CV (Leave-One-Out Cross-Validation) for model comparison:

# Fit competing models
fit1 <- clm_stan(rating ~ temp, data = wine, link = "logit",
                 chains = 4, iter = 2000, warmup = 1000)
fit2 <- clm_stan(rating ~ temp + contact, data = wine, link = "logit",
                 chains = 4, iter = 2000, warmup = 1000)
fit3 <- clm_stan(rating ~ temp + contact, data = wine, link = "probit",
                 chains = 4, iter = 2000, warmup = 1000)

# Compute LOO-CV
loo1 <- loo(fit1)
loo2 <- loo(fit2)
loo3 <- loo(fit3)

# View results
print(loo1)

# Pareto k diagnostic plot
plot(loo1)

# Compare models
loo::loo_compare(loo1, loo2, loo3)

# WAIC is also available
waic(fit1)

Interpreting LOO results: - Lower elpd_loo (expected log pointwise predictive density) indicates better fit - Pareto k values > 0.7 suggest unreliable LOO estimates for those observations

Practical Workflow

Here’s a recommended workflow for ordinal data analysis:

library(clmstan)
library(ordinal)
data(wine)

# Step 1: Fit baseline model
set.seed(123)
fit_base <- clm_stan(rating ~ temp + contact, data = wine,
                     link = "logit", threshold = "flexible",
                     chains = 4, iter = 2000, warmup = 1000)

# Step 2: Check convergence
diagnostics(fit_base)
plot(fit_base, type = "trace")

# Step 3: Review results
summary(fit_base)

# Step 4: Try alternative link functions
fit_probit <- clm_stan(rating ~ temp + contact, data = wine,
                       link = "probit",
                       chains = 4, iter = 2000, warmup = 1000)

fit_gev <- clm_stan(rating ~ temp + contact, data = wine,
                    link = "gev", link_param = list(xi = "estimate"),
                    chains = 4, iter = 2000, warmup = 1000)

# Step 5: Compare models
loo_base <- loo(fit_base)
loo_probit <- loo(fit_probit)
loo_gev <- loo(fit_gev)
loo::loo_compare(loo_base, loo_probit, loo_gev)

# Step 6: Final model interpretation
summary(fit_base)
plot(fit_base, type = "intervals")

Troubleshooting

CmdStan Not Found

# Check CmdStan installation
cmdstanr::cmdstan_path()
cmdstanr::cmdstan_version()

# If not found, install CmdStan:
cmdstanr::install_cmdstan()

Convergence Issues

If you see Rhat > 1.01 or low ESS:

# Increase iterations and warmup
fit <- clm_stan(rating ~ temp, data = wine,
                chains = 4,
                iter = 4000,
                warmup = 2000)

# Increase adapt_delta (helps with divergences)
fit <- clm_stan(rating ~ temp, data = wine,
                chains = 4,
                iter = 2000,
                warmup = 1000,
                adapt_delta = 0.95)

# Increase max_treedepth
fit <- clm_stan(rating ~ temp, data = wine,
                chains = 4,
                iter = 2000,
                warmup = 1000,
                max_treedepth = 12)

Divergent Transitions

Divergent transitions indicate that the sampler had difficulty exploring the posterior:

Increase adapt_delta: Try 0.95, 0.99, or even 0.999
Reparameterize: Try different threshold structures
Simplify priors: Use more informative priors
Check data: Look for separation or outliers

# High adapt_delta
fit <- clm_stan(rating ~ temp, data = wine,
                chains = 4,
                iter = 2000,
                warmup = 1000,
                adapt_delta = 0.99)

Memory Issues

For large datasets:

# Reduce number of chains
fit <- clm_stan(rating ~ temp, data = wine,
                chains = 2,
                iter = 2000,
                warmup = 1000)

# Run chains in parallel (default)
fit <- clm_stan(rating ~ temp, data = wine,
                chains = 4,
                parallel_chains = 4)

References and Resources

Package Resources

Statistical References

Agresti, A. (2010). Analysis of Ordinal Categorical Data (2nd ed.)
Christensen, R.H.B. (2019). ordinal: Regression Models for Ordinal Data
Jiang, X., & Dey, D.K. (2015). Symmetric power link with ordinal response model. In S.K. Upadhyay et al. (Eds.), Current trends in Bayesian methodology with applications (pp. 335-346). CRC Press.
Naranjo, L., Perez, C.J., & Martin, J. (2015). Bayesian analysis of some models that use the asymmetric exponential power distribution. Statistics and Computing, 25(3), 497-514.
Wang, X., & Dey, D.K. (2011). Generalized extreme value regression for ordinal response data. Environmental and Ecological Statistics, 18(4), 619-634.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.

Getting Started with clmstan

Introduction

Comparison with ordinal::clm()

Installation

Quick Start

For ordinal::clm() Users

API Correspondence

Example Comparison

Key Differences

Link Functions

Standard Links (5)

Flexible Links (6)

Link Function Summary

Threshold Structures

Flexible (Default)

Equidistant

Symmetric

Prior Specification

Default Priors

Using prior() Function (Recommended)

Available Distributions

Prior Classes

Legacy API (clm_prior)

Interpreting Results

Basic Methods

Diagnostic Plots

Convergence Diagnostics

Prediction

Category Prediction

Probability Prediction

Posterior Predictive Distribution

Model Comparison

Practical Workflow

Troubleshooting

CmdStan Not Found

Convergence Issues

Divergent Transitions

Memory Issues

References and Resources

Package Resources

Related Packages

Statistical References