When data are clustered (e.g., students within schools, patients within hospitals), observations are not independent. Ignoring this clustering can lead to underestimated standard errors, overly narrow interval estimates, and inflated Type I error rates.
The bglmm() function extends bglm() to
handle clustered data by incorporating random
effects.
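To see why clustering matters, here is a small base-R sketch (hypothetical numbers, not the study data used below): students within a school share that school's random intercept, so observed school means vary far more than independence would predict.

```r
set.seed(7)
J <- 20; m <- 15                          # 20 schools, 15 students each
school <- rep(1:J, each = m)
u <- rnorm(J, sd = 1.5)                   # school-level random intercepts
y <- rbinom(J * m, 1, plogis(u[school]))  # students in a school share u[j]
school_means <- tapply(y, school, mean)
# Observed between-school variance of the pass rate
var_between <- var(school_means)
# Variance a school mean would have if all students were independent
p <- mean(y)
var_indep <- p * (1 - p) / m
c(observed = var_between, independent = var_indep)
```

The observed between-school variance greatly exceeds what a simple binomial model with independent observations implies, which is exactly the extra variation the random effects in bglmm() absorb.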
Use bglmm() when your data are clustered (multiple observations per school, hospital, or other grouping).
Use bglm() when observations are independent and no clustering is present.
We’ll analyze a study comparing two teaching methods across 20 schools, with math and reading pass rates as outcomes and student baseline ability as a covariate.
library(bmco)
set.seed(2020)
n_schools <- 20
n_per_school <- 15
n_total <- n_schools * n_per_school
# Generate random intercepts
school_intercepts_math <- rnorm(n_schools, mean = 0, sd = 1.50)
school_intercepts_reading <- rnorm(n_schools, mean = 0, sd = 1.50)
# Cluster randomization: whole schools receive the same method.
school_method <- rep(c("traditional", "new_method"), each = n_schools/2)
study_data <- data.frame(
school_id = factor(rep(1:n_schools, each = n_per_school)),
method = rep(school_method, each = n_per_school),
baseline_ability = rnorm(n_total, mean = 50, sd = 10),
stringsAsFactors = FALSE
)
# Standardize covariate
study_data$baseline_ability_std <- scale(study_data$baseline_ability)[, 1]
study_data$math_pass <- NA
study_data$reading_pass <- NA
p_math <- p_reading <- rep(NA, nrow(study_data))
for (i in 1:nrow(study_data)) {
school <- as.numeric(study_data$school_id[i])
is_new <- ifelse(study_data$method[i] == "new_method", 1, 0)
p_math[i] <- plogis(-0.50 + 0.75 * is_new + 0.10 * study_data$baseline_ability_std[i] +
0.20 * is_new * study_data$baseline_ability_std[i] +
school_intercepts_math[school])
p_reading[i] <- plogis(-0.50 + 0.80 * is_new + 0.05 * study_data$baseline_ability_std[i] +
0.15 * is_new * study_data$baseline_ability_std[i] +
school_intercepts_reading[school])
study_data$math_pass[i] <- rbinom(1, 1, p_math[i])
study_data$reading_pass[i] <- rbinom(1, 1, p_reading[i])
}

set.seed(2025)
result <- bglmm(
data = study_data,
grp = "method",
grp_a = "traditional",
grp_b = "new_method",
id_var = "school_id",
x_var = "baseline_ability",
y_vars = c("math_pass", "reading_pass"),
x_method = "Empirical",
x_def = c(-Inf, Inf),
fixed = c("method", "baseline_ability", "method_baseline_ability"),
random = c("Intercept"),
test = "right_sided",
rule = "All",
n_burn = 200,
n_it = 500,
n_thin = 1
)

print(result)
#>
#> Bayesian Multilevel Multivariate Logistic Regression
#> ======================================================
#>
#> Group mean y1 mean y2
#> traditional 0.416 0.546
#> new_method 0.537 0.634
#> J = 20 clusters n(traditional) = 150 n(new_method) = 150
#>
#> Posterior probability P(new_method > traditional) [All rule]: 0.980
#> Marginalization: Empirical over [-Inf, Inf]
#> MPSRF: fixed = 10.9836 random = 1.3948 variance = 5.4158
#>
#> Use summary() for full coefficient tables, priors and MCMC diagnostics.
summary(result)
#>
#> Bayesian Multilevel Multivariate Logistic Regression
#> ======================================================
#>
#> Multilevel Structure: J = 20 clusters n(traditional) = 150 n(new_method) = 150
#>
#> Group Estimates:
#> Group mean y1 sd y1 mean y2 sd y2
#> traditional 0.416 0.030 0.546 0.033
#> new_method 0.537 0.031 0.634 0.032
#>
#> Treatment Effect:
#> Delta mean (y1, y2): 0.121, 0.088
#> Delta SE (y1, y2): 0.002, 0.002
#> Posterior probability P(new_method > traditional): 0.980
#>
#> Fixed Effects (mean [SD]):
#> b11 b10 b01
#> method 3.693 [5.041] 2.346 [3.905] 3.596 [4.233]
#> baseline_ability 0.021 [0.032] 0.045 [0.024] -0.005 [0.027]
#> method_baseline_ability 0.233 [0.205] 0.155 [0.194] 0.216 [0.191]
#>
#> Random Effects (population mean [SD]):
#> g11 g10 g01
#> Intercept -0.241 [2.605] -1.000 [2.675] 1.010 [2.475]
#>
#> Variance Components (posterior mean):
#> y1=1, y2=1 (b11):
#> Intercept
#> Intercept 1029.96
#> y1=1, y2=0 (b10):
#> Intercept
#> Intercept 544.973
#> y1=0, y2=1 (b01):
#> Intercept
#> Intercept 712.445
#>
#> Prior Specification:
#> Fixed effects -- Normal prior:
#> Mean: 0, 0, 0
#> Variance: 10, 10, 10
#> Random effects -- Normal prior:
#> Mean: 0
#> Variance: 10
#> Covariance -- Inverse-Wishart: df = 1
#>
#> Marginalization:
#> Method: Empirical (Sub)population: [-Inf, Inf]
#> Decision rule: All
#>
#> MCMC Convergence Diagnostics:
#> Fixed effects -- MPSRF: 10.9836
#> Parameter ESS Rhat
#> b11_method[1] 36.9 7.0948
#> b11_baseline_ability[2] 10.7 1.2712
#> b11_method_baseline_ability[3] 12.9 5.4743
#> b10_method[1] 41.2 3.5012
#> b10_baseline_ability[2] 17.5 1.1125
#> b10_method_baseline_ability[3] 19.8 5.4723
#> b01_method[1] 19.1 2.4033
#> b01_baseline_ability[2] 11.4 1.6787
#> b01_method_baseline_ability[3] 10.6 5.5842
#>
#> Random effects -- MPSRF: 1.3948
#> Parameter ESS Rhat
#> g11_Intercept[1] 565.3 1.1999
#> g10_Intercept[1] 515.1 1.5451
#> g01_Intercept[1] 512.2 1.2905
#>
#> Variance components -- MPSRF: 5.4158
#> Parameter ESS Rhat
#> g11_InterceptIntercept 89.5 4.4046
#> g10_InterceptIntercept 106.7 4.9981
#> g01_InterceptIntercept 305.3 4.5160

The posterior probability indicates the probability that the new teaching method improves both math and reading outcomes (rule = “All”), accounting for:
Like bglm(), you can define subpopulations:
Both outcomes must favor the new method:
At least one outcome must favor the new method:
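The difference between these two decision criteria can be illustrated directly on posterior draws of the treatment effects; a minimal sketch with simulated draws standing in for bglmm() output:

```r
set.seed(1)
# Simulated posterior draws of the treatment effect on each outcome
delta <- cbind(math    = rnorm(4000, mean = 0.12, sd = 0.05),
               reading = rnorm(4000, mean = 0.09, sd = 0.05))
# "All" rule: both outcomes must favor the new method
p_all <- mean(delta[, "math"] > 0 & delta[, "reading"] > 0)
# At-least-one criterion: any outcome favors the new method
p_any <- mean(delta[, "math"] > 0 | delta[, "reading"] > 0)
c(All = p_all, AtLeastOne = p_any)
```

By construction the at-least-one probability is never smaller than the “All” probability, so it is the more liberal decision criterion.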
bglmm() uses weakly informative priors by default. You can customize these priors for:

- Fixed effects (b_mu0, b_sigma0)
- Random effects (g_mu0, g_sigma0)
- Random-effects covariance (nu0, tau0)

Fixed effects have a multivariate normal prior:
\[\beta \sim N(\mu_0, \Sigma_0)\]
# Default
p_fixed <- 2
b_mu0 <- rep(0, p_fixed) # Zero prior mean
b_sigma0 <- solve(diag(1e1, p_fixed)) # Small prior precision, i.e. large prior variance
                                      # on the log-odds scale (weakly informative);
                                      # solve() converts the variance matrix to the
                                      # precision matrix expected as input

where p_fixed is the number of fixed effects. In our example, we use:
- Treatment effect (method)
- Covariate effect (baseline_ability)
- Interaction term (method_baseline_ability)
Random effects means have a multivariate normal prior:
\[\gamma \sim N(\mu_g, \Sigma_g)\]
Default:
p_random <- 2
g_mu0 <- rep(0, p_random) # Prior mean
g_sigma0 <- solve(diag(1e1, p_random)) # Weakly informative prior variance;
                                       # solve() converts the variance matrix to
                                       # the precision matrix expected as input

where p_random is the number of random effects. In our example, we use:
- Intercept
- Group effect
The covariance matrix has an inverse-Wishart prior:
\[\Sigma \sim \text{InvWishart}(\nu_0, \mathbf{T}_0)\]
Default:
Important:
- nu0 must be ≥ p_random
- tau0 must be positive definite (never use diag(0, p_random))
- Smaller nu0 = less informative
- Larger tau0 diagonal values = expect more between-cluster variation
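These constraints can be verified in base R before fitting; a small sketch independent of bglmm():

```r
p_random <- 2
nu0  <- p_random            # smallest valid degrees of freedom
tau0 <- diag(0.1, p_random) # small but positive-definite scale matrix
# chol() succeeds only for positive-definite matrices
stopifnot(nu0 >= p_random,
          !inherits(try(chol(tau0), silent = TRUE), "try-error"))
# diag(0, p_random) is singular, so its Cholesky factorization fails:
inherits(try(chol(diag(0, p_random)), silent = TRUE), "try-error")
```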
Compare results under different priors:
# -----------------------------------------------------------------
# Three priors that pull in clearly different directions.
# -----------------------------------------------------------------
# 1. Weakly informative (neutral starting point)
# Diffuse on fixed effects, diffuse IW for random effects.
result_weak <- bglmm(
# ... other arguments ...
b_mu0 = rep(0, p_fixed),
b_sigma0 = solve(diag(10, p_fixed)),
g_mu0 = rep(0, p_random),
g_sigma0 = solve(diag(10, p_random)),
nu0 = p_random,
  tau0 = diag(0.1, p_random)
)
# 2. Skeptical prior
# Tight variance so the prior resists positive or negative evidence.
# Also a small tau0, implying homogeneity between schools.
result_skeptical <- bglmm(
# ... other arguments ...
b_mu0 = rep(0, p_fixed),
b_sigma0 = solve(diag(c(1, 1), p_fixed)), # Tight — hard(er) to overcome
g_mu0 = rep(0, p_random),
g_sigma0 = solve(diag(c(1, 1), p_random)), # Tight — hard(er) to overcome
nu0 = p_random + 4, # More informative IW
  tau0 = diag(0.05, p_random)           # Expects small school variation
)

Interpretation:
Default priors: Use for exploratory analysis or when no prior information exists
Informative priors: Use when:
Skeptical priors: Use for:
Prior sensitivity: Check:
Don’t: Use tau0 = diag(0, p) (singular matrix)
Do: Use tau0 = diag(1e-1, p) or diag(0.5, p)
Don’t: Set nu0 < p (too few degrees of freedom)
Do: Use nu0 ≥ p
Don’t: Use extremely small prior variances without justification
Do: Use weakly informative priors unless you have strong evidence
Don’t: Ignore prior sensitivity when sample size is small
Do: Report results under multiple priors for transparency
For Bayesian logistic regression priors: - Gelman, A., Jakulin, A., Pittau, M. G. & Su, Y.-S. (2008). A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics, 2(4), 1360-1383.
For inverse-Wishart priors: - Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515-534.
The function internally checks the multivariate potential scale reduction factor (MPSRF) and warns if it exceeds 1.1. Additional MCMC diagnostics are returned when return_diagnostics = TRUE. When return_diagnostic_plots = TRUE, bglmm() returns diagnostic plots as well.
set.seed(2030)
result_diag <- bglmm(
data = study_data,
grp = "method",
grp_a = "traditional",
grp_b = "new_method",
id_var = "school_id",
x_var = "baseline_ability",
y_vars = c("math_pass", "reading_pass"),
x_method = "Value",
x_def = 50,
n_burn = 200,
n_it = 500,
n_thin = 1,
start = c(0.5, 1),
return_diagnostics = TRUE
)

Key diagnostics:
# Check convergence
cat("Random effects convergence:\n")
#> Random effects convergence:
print(result_diag$diags$g$convergence)
#> $mpsrf
#> [1] 6.27427
#>
#> $psrf
#> Point est. Upper C.I.
#> g11_Intercept[1] 1.368093 2.748530
#> g11_method[2] 1.146185 1.151254
#> g10_Intercept[1] 1.135531 1.218255
#> g10_method[2] 2.271145 7.825439
#> g01_Intercept[1] 1.683002 3.983673
#> g01_method[2] 3.646306 8.278269
cat("\nRandom effects covariance convergence:\n")
#>
#> Random effects covariance convergence:
print(result_diag$diags$tau$convergence)
#> $mpsrf
#> [1] 5.167838
#>
#> $psrf
#> Point est. Upper C.I.
#> g11_InterceptIntercept 4.983014 29.190062
#> g11_methodIntercept 5.274581 21.553906
#> g11_methodmethod 1.201419 1.731316
#> g10_InterceptIntercept 2.539835 13.514819
#> g10_methodIntercept 2.659497 14.379082
#> g10_methodmethod 2.672124 14.473397
#> g01_InterceptIntercept 4.175515 20.831837
#> g01_methodIntercept 4.298845 20.442173
#> g01_methodmethod 4.247503 19.116352

Available plots:
plot(result_diag) # All
plot(result_diag, type = "trace") # Trace only
plot(result_diag, type = "density") # Density only
plot(result_diag, which = "fixed") # Fixed effects
plot(result_diag, which = "random") # Random effects
plot(result_diag, which = "variance") # Variance components
plot(result_diag, which = "all") # All
# Combine type + which
plot(result_diag, type = "trace", which = "all") # Trace for all parameters
plot(result_diag, type = "density", which = "variance") # Variance densities

If convergence issues occur:

- Increase n_burn (more warmup)
- Increase n_it (more samples)

If autocorrelation is high, the effective sample size is usually much lower than n_it:

- Increase n_it

For more information on MCMC diagnostics, consult:
Brooks, S. P. & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7(4), 434–455.
Gelman, A. & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.
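The link between autocorrelation and effective sample size noted above can be sketched in base R with an AR(1) chain (an illustration only; bglmm() computes ESS through its own diagnostics):

```r
set.seed(42)
n   <- 5000
rho <- 0.9                      # strong autocorrelation
x   <- numeric(n)
for (t in 2:n) x[t] <- rho * x[t - 1] + rnorm(1)
r1 <- cor(x[-1], x[-n])         # lag-1 autocorrelation of the chain
# First-order approximation: ESS = n * (1 - r1) / (1 + r1)
ess_approx <- n * (1 - r1) / (1 + r1)
round(ess_approx)               # far fewer effective draws than n
```

With lag-1 autocorrelation near 0.9, the 5,000 stored draws carry the information of only a few hundred independent samples, which is why increasing n_it is the usual remedy.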
# Treatment effect samples
str(result_samples$samples$delta)
# Compute custom summaries
cat("Treatment effect quantiles:\n")
print(apply(result_samples$samples$delta, 2, quantile, probs = c(0.025, 0.5, 0.975)))
# Probability that effect on math > 0.1
cat("\nP(delta_math > 0.1):", mean(result_samples$samples$delta[, 1] > 0.1), "\n")

For reliable multilevel modeling:
Problem: Too few clusters might hinder reliable random effect estimation

Solutions:

- Collect more clusters if possible
- Use bglm() instead (ignores clustering), but only if appropriate
Problem: MPSRF > 1.1

Solutions:

1. Increase burn-in: n_burn = 2000 or higher
2. Increase iterations: n_it = 5000 or higher
3. Reduce thinning: n_thin = 1 (keeps more samples)
4. Check data quality (outliers, sufficient variation)
Problem: Large number of clusters or long chains

Solutions:

- Start with a small n_it to verify the model runs
- Increase n_thin to reduce stored samples: n_thin = 10
- Run overnight for production analyses
- Typical times: 5-10 minutes for 10 clusters, 10k iterations
bglmm() extends Bayesian multivariate analysis to clustered data by incorporating cluster-specific random effects.
Choose your analysis:

- bmvb(): Full sample, no clustering
- bglm(): Subgroup analysis, no clustering
- bglmm(): Subgroup analysis + clustering

All three functions support multivariate binary outcomes and provide posterior probabilities for treatment effects.
For more details on statistical methodology or when using the
bglmm()-function, please cite:
Kavelaars, X., Mulder, J., & Kaptein, M. (2023). Bayesian multilevel, multivariate logistic regression for superiority decision-making under observable treatment heterogeneity. BMC Medical Research Methodology, 23(1). DOI: 10.1186/s12874-023-02034-z.