clmstan is an R package for fitting Cumulative Link Models (CLMs) using Bayesian inference via CmdStan. It provides:

- 11 link functions: 5 standard + 6 flexible parametric links
- 3 threshold structures: flexible, equidistant, symmetric
- Bayesian estimation: full posterior inference with MCMC
- Pre-compiled Stan models: fast execution via the instantiate package
| Feature | ordinal::clm() | clmstan |
|---|---|---|
| Inference | Maximum likelihood | Bayesian (MCMC) |
| Uncertainty | Asymptotic SE | Posterior distribution |
| Link functions | 5 standard | 11 (5 standard + 6 flexible) |
| Estimation of link parameters | No | Yes |
Before using clmstan, you need to install CmdStan:
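A minimal sketch of one common route, using the cmdstanr package to download and build CmdStan (check clmstan's installation notes for the exact requirements):

# Install cmdstanr from the Stan R-universe repository
install.packages("cmdstanr",
                 repos = c("https://stan-dev.r-universe.dev", getOption("repos")))
# Verify the C++ toolchain, then download and build CmdStan
cmdstanr::check_cmdstan_toolchain()
cmdstanr::install_cmdstan()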
Fit a cumulative link model in just a few lines:
library(clmstan)
library(ordinal)
data(wine)
# Fit a model with logit link
fit <- clm_stan(rating ~ temp + contact, data = wine,
chains = 4, iter = 2000, warmup = 1000)
# View results
print(fit)
summary(fit)

The wine dataset contains ratings (1-5) of 72 wine samples with two predictors: temperature (cold/warm) and contact (yes/no).
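For a quick look at the data (wine ships with the ordinal package):

# Outcome distribution and the two predictors
table(wine$rating)
with(wine, table(temp, contact))
str(wine[, c("rating", "temp", "contact")])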
If you’re familiar with the ordinal package, the transition to clmstan is straightforward:
| ordinal::clm() | clmstan::clm_stan() |
|---|---|
| clm(y ~ x, data) | clm_stan(y ~ x, data) |
| link = "logit" | link = "logit" |
| threshold = "flexible" | threshold = "flexible" |
| summary(fit) | summary(fit) |
| coef(fit) | coef(fit) |
library(ordinal)
library(clmstan)
data(wine)
# ordinal::clm (frequentist)
fit_freq <- clm(rating ~ temp + contact, data = wine, link = "logit")
# clmstan::clm_stan (Bayesian)
fit_bayes <- clm_stan(rating ~ temp + contact, data = wine, link = "logit",
chains = 4, iter = 2000, warmup = 1000)
# Compare coefficients
coef(fit_freq)
coef(fit_bayes)

Note: call set.seed() before clm_stan() for reproducible results.

clmstan supports 11 link functions, more than any other CLM package.
# View available link functions
supported_links("standard")
# Logit (default) - proportional odds model
fit_logit <- clm_stan(rating ~ temp, data = wine, link = "logit",
chains = 2, iter = 1000, warmup = 500)
# Probit - normal distribution
fit_probit <- clm_stan(rating ~ temp, data = wine, link = "probit",
chains = 2, iter = 1000, warmup = 500)
# Complementary log-log - asymmetric (right-skewed)
fit_cloglog <- clm_stan(rating ~ temp, data = wine, link = "cloglog",
chains = 2, iter = 1000, warmup = 500)
# Log-log - asymmetric (left-skewed)
fit_loglog <- clm_stan(rating ~ temp, data = wine, link = "loglog",
chains = 2, iter = 1000, warmup = 500)
# Cauchit - heavy tails
fit_cauchit <- clm_stan(rating ~ temp, data = wine, link = "cauchit",
chains = 2, iter = 1000, warmup = 500)

Flexible links have parameters that can be fixed or estimated. For theoretical background, see Wang & Dey (2011) for GEV, Jiang & Dey (2015) for SP, and Naranjo et al. (2015) for AEP.
# View flexible link functions
supported_links("flexible")
# t-link with fixed df (heavier tails than logit)
fit_t8 <- clm_stan(rating ~ temp, data = wine, link = "tlink",
link_param = list(df = 8),
chains = 2, iter = 1000, warmup = 500)
# t-link with estimated df (data-driven tail behavior)
fit_t_est <- clm_stan(rating ~ temp, data = wine, link = "tlink",
link_param = list(df = "estimate"),
chains = 2, iter = 1000, warmup = 500)
# GEV link with estimated shape parameter
fit_gev <- clm_stan(rating ~ temp, data = wine, link = "gev",
link_param = list(xi = "estimate"),
chains = 2, iter = 1000, warmup = 500)
# Aranda-Ordaz link (asymmetric)
fit_ao <- clm_stan(rating ~ temp, data = wine, link = "aranda_ordaz",
link_param = list(lambda = "estimate"),
chains = 2, iter = 1000, warmup = 500)
# Symmetric Power link (skewness adjustment)
fit_sp <- clm_stan(rating ~ temp, data = wine, link = "sp",
base = "logit",
link_param = list(r = "estimate"),
chains = 2, iter = 1000, warmup = 500)
# Log-gamma link
fit_lg <- clm_stan(rating ~ temp, data = wine, link = "log_gamma",
link_param = list(lambda = "estimate"),
chains = 2, iter = 1000, warmup = 500)
# AEP link (asymmetric exponential power)
fit_aep <- clm_stan(rating ~ temp, data = wine, link = "aep",
link_param = list(theta1 = "estimate", theta2 = "estimate"),
chains = 2, iter = 1000, warmup = 500)

| Link | Parameters | Special Cases |
|---|---|---|
| tlink | df > 0 | df = Inf -> probit |
| aranda_ordaz | lambda > 0 | lambda = 1 -> logit |
| gev | xi (real) | xi = 0 -> loglog |
| sp | r > 0, base | r = 1 -> base distribution |
| log_gamma | lambda (real) | lambda = 0 -> probit |
| aep | theta1, theta2 > 0 | Both = 2 -> similar to probit |
clmstan supports three threshold parameterizations (see the sketch after this list):

- flexible (default): each threshold is freely estimated (K-1 parameters for K categories)
- equidistant: thresholds are equally spaced (2 parameters: start + interval)
- symmetric: thresholds are constrained to be symmetric around a central value
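A minimal sketch of selecting each parameterization via the threshold argument (the argument and the "flexible" value appear in the migration table above; "equidistant" and "symmetric" are taken from the feature list and are otherwise assumptions):

# Flexible (default): K-1 freely estimated thresholds
fit_flex <- clm_stan(rating ~ temp, data = wine, threshold = "flexible",
                     chains = 2, iter = 1000, warmup = 500)
# Equidistant: first threshold plus a common spacing
fit_equi <- clm_stan(rating ~ temp, data = wine, threshold = "equidistant",
                     chains = 2, iter = 1000, warmup = 500)
# Symmetric: thresholds mirrored around a central value (assumed behavior)
fit_symm <- clm_stan(rating ~ temp, data = wine, threshold = "symmetric",
                     chains = 2, iter = 1000, warmup = 500)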
clmstan uses weakly informative default priors, but you can customize them.
| Parameter | Default Prior |
|---|---|
| Coefficients (beta) | normal(0, 2.5) |
| Thresholds (c) | normal(0, 10) |
| t-link df | gamma(2, 0.1) |
| GEV xi | normal(0, 2) |
| SP r | gamma(0.5, 0.5) |
| AEP theta1, theta2 | gamma(2, 1) |
The prior() function provides a brms-like interface:
# Single prior
fit <- clm_stan(rating ~ temp, data = wine,
prior = prior(normal(0, 1), class = "b"),
chains = 2, iter = 1000, warmup = 500)
# Multiple priors
fit <- clm_stan(rating ~ temp, data = wine,
prior = c(
prior(normal(0, 1), class = "b"),
prior(normal(0, 5), class = "Intercept")
),
chains = 2, iter = 1000, warmup = 500)
# Prior for link parameters
fit_gev <- clm_stan(rating ~ temp, data = wine, link = "gev",
link_param = list(xi = "estimate"),
prior = prior(normal(0, 0.5), class = "xi"),
chains = 2, iter = 1000, warmup = 500)

| Class | Description | Compatible Distributions |
|---|---|---|
"b" |
Regression coefficients | normal, student_t, cauchy, flat |
"Intercept" |
Thresholds (flexible) | normal, student_t, cauchy, flat |
"c1" |
First threshold (equidistant) | normal, student_t, cauchy, flat |
"d" |
Interval (equidistant) | gamma |
"cpos" |
Positive thresholds (symmetric) | normal, student_t, cauchy, flat |
"df" |
t-link degrees of freedom | gamma |
"xi" |
GEV shape parameter | normal, student_t, cauchy |
"r" |
SP skewness parameter | gamma |
"theta1", "theta2" |
AEP shape parameters | gamma |
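As a sketch of classes not covered by the examples above, assuming prior() accepts the equidistant-threshold classes "c1" and "d" exactly as listed in this table:

# Equidistant thresholds with priors on the first threshold and the spacing
fit_eq <- clm_stan(rating ~ temp, data = wine,
                   threshold = "equidistant",
                   prior = c(
                     prior(normal(0, 5), class = "c1"),
                     prior(gamma(2, 1), class = "d")
                   ),
                   chains = 2, iter = 1000, warmup = 500)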
fit <- clm_stan(rating ~ temp + contact, data = wine,
chains = 4, iter = 2000, warmup = 1000)
# Quick overview
print(fit)
# Detailed summary with credible intervals
summary(fit)
summary(fit, probs = c(0.025, 0.5, 0.975))
# Extract point estimates
coef(fit) # Posterior mean (default)
coef(fit, type = "median") # Posterior median

# Trace plots (check mixing)
plot(fit, type = "trace")
# Posterior density
plot(fit, type = "dens")
# Posterior intervals
plot(fit, type = "intervals")
# Autocorrelation (check for high autocorrelation)
plot(fit, type = "acf")
# Select specific parameters
plot(fit, type = "trace", pars = c("beta[1]", "beta[2]"))Key diagnostics to check: - Rhat: Should be < 1.01 (values > 1.01 indicate poor convergence) - ESS (bulk/tail): Should be > 400 (low values suggest inefficient sampling) - Divergences: Should be 0 (divergences indicate sampling problems)
fit <- clm_stan(rating ~ temp + contact, data = wine,
chains = 4, iter = 2000, warmup = 1000)
# Predict most likely category
pred_class <- predict(fit, type = "class")
head(pred_class)
# Prediction for new data
newdata <- data.frame(
temp = factor("warm", levels = levels(wine$temp)),
contact = factor("yes", levels = levels(wine$contact))
)
predict(fit, newdata = newdata, type = "class")

Use LOO-CV (Leave-One-Out Cross-Validation) for model comparison:
# Fit competing models
fit1 <- clm_stan(rating ~ temp, data = wine, link = "logit",
chains = 4, iter = 2000, warmup = 1000)
fit2 <- clm_stan(rating ~ temp + contact, data = wine, link = "logit",
chains = 4, iter = 2000, warmup = 1000)
fit3 <- clm_stan(rating ~ temp + contact, data = wine, link = "probit",
chains = 4, iter = 2000, warmup = 1000)
# Compute LOO-CV
loo1 <- loo(fit1)
loo2 <- loo(fit2)
loo3 <- loo(fit3)
# View results
print(loo1)
# Pareto k diagnostic plot
plot(loo1)
# Compare models
loo::loo_compare(loo1, loo2, loo3)
# WAIC is also available
waic(fit1)

Interpreting LOO results:

- Higher elpd_loo (expected log pointwise predictive density) indicates better predictive performance
- Pareto k values > 0.7 suggest unreliable LOO estimates for those observations
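To see which observations drive high Pareto k values, the loo package's own helpers can be applied to the returned object (assuming loo() returns a standard loo object):

# Indices and values of observations with Pareto k above 0.7
loo::pareto_k_ids(loo1, threshold = 0.7)
loo::pareto_k_values(loo1)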
Here’s a recommended workflow for ordinal data analysis:
library(clmstan)
library(ordinal)
data(wine)
# Step 1: Fit baseline model
set.seed(123)
fit_base <- clm_stan(rating ~ temp + contact, data = wine,
link = "logit", threshold = "flexible",
chains = 4, iter = 2000, warmup = 1000)
# Step 2: Check convergence
diagnostics(fit_base)
plot(fit_base, type = "trace")
# Step 3: Review results
summary(fit_base)
# Step 4: Try alternative link functions
fit_probit <- clm_stan(rating ~ temp + contact, data = wine,
link = "probit",
chains = 4, iter = 2000, warmup = 1000)
fit_gev <- clm_stan(rating ~ temp + contact, data = wine,
link = "gev", link_param = list(xi = "estimate"),
chains = 4, iter = 2000, warmup = 1000)
# Step 5: Compare models
loo_base <- loo(fit_base)
loo_probit <- loo(fit_probit)
loo_gev <- loo(fit_gev)
loo::loo_compare(loo_base, loo_probit, loo_gev)
# Step 6: Final model interpretation
summary(fit_base)
plot(fit_base, type = "intervals")

If you see Rhat > 1.01 or low ESS:
# Increase iterations and warmup
fit <- clm_stan(rating ~ temp, data = wine,
chains = 4,
iter = 4000,
warmup = 2000)
# Increase adapt_delta (helps with divergences)
fit <- clm_stan(rating ~ temp, data = wine,
chains = 4,
iter = 2000,
warmup = 1000,
adapt_delta = 0.95)
# Increase max_treedepth
fit <- clm_stan(rating ~ temp, data = wine,
chains = 4,
iter = 2000,
warmup = 1000,
max_treedepth = 12)

Divergent transitions indicate that the sampler had difficulty exploring the posterior:
- Increase adapt_delta: try 0.95, 0.99, or even 0.999
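For example, re-running the same model with a stricter adaptation target:

# Higher adapt_delta forces smaller step sizes, which often removes divergences
fit <- clm_stan(rating ~ temp, data = wine,
                chains = 4, iter = 2000, warmup = 1000,
                adapt_delta = 0.99)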