This appendix documents the current directional_tail()
diagnostic, how to call it directly, and how it is used inside
summary.glmb().
The diagnostic measures prior–posterior disagreement in a multivariate direction (Evans and Moshonov 2006) rather than coefficient-by-coefficient only. It reports:
- mahalanobis_shift: the standardized prior-posterior shift magnitude in whitened space
- p_directional: the directional tail probability, \(P(\delta^\top Z \le 0)\)
- delta: the posterior mean shift vector in whitened coordinates

Let \(\mu_0\) denote the reference vector (prior mean by default, or a null/reference mode), and let \(\beta^{(m)}\) be posterior draws.
directional_tail() computes a whitening map from posterior
precision so that
\[ Z^{(m)} = W\left(\beta^{(m)} - \mu_0\right), \]
with \(W^\top W = \text{Prec}_{post}\). In this scale, Euclidean distance corresponds to Mahalanobis distance in the original coefficient space.
Define
\[ \delta = E[Z \mid y], \qquad d = \|\delta\|_2. \]
The function returns \(d\) as
mahalanobis_shift, and the directional-tail probability
\[ p_{\text{dir}} = P\!\left(\delta^\top Z \le 0 \mid y\right), \]
estimated by posterior draws.
Intuition: \(\delta\) is the posterior mean shift away from the reference in whitened space. The event \(\delta^\top Z \le 0\) is the half-space opposite the direction of disagreement, so \(p_{\text{dir}}\) quantifies how much posterior mass lies “against” that shift.
Under a Gaussian approximation in whitened space, \[ Z \mid y \approx N(\delta, I_p). \] Then, with \(u = \delta/\|\delta\|\), \[ u^\top Z \sim N(d, 1), \qquad p_{\text{dir}} = P(u^\top Z \le 0) = \Phi(-d). \] So the directional tail is a one-dimensional tail probability driven entirely by the standardized distance \(d\).
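These quantities are straightforward to reproduce from simulated draws. A minimal sketch in base R (with MASS for sampling; B, mu_post, and Sigma_post are illustrative placeholders, not directional_tail() internals):

set.seed(1)
p <- 2
mu0        <- c(0, 0)                       # reference vector
mu_post    <- c(0.8, 0.5)                   # toy posterior mean
Sigma_post <- matrix(c(0.20, 0.06,
                       0.06, 0.10), p, p)   # toy posterior covariance
B <- MASS::mvrnorm(1e4, mu_post, Sigma_post)  # posterior draws

# Whitening map: chol() returns upper-triangular W with t(W) %*% W = Prec_post
W <- chol(solve(Sigma_post))
Z <- t(W %*% (t(B) - mu0))                  # whitened draws Z^(m)

delta <- colMeans(Z)                        # posterior mean shift delta
d     <- sqrt(sum(delta^2))                 # mahalanobis_shift
p_dir <- mean(Z %*% delta <= 0)             # P(delta' Z <= 0 | y), by Monte Carlo

c(d = d, p_dir = p_dir, phi_minus_d = pnorm(-d))  # p_dir ~ pnorm(-d) here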
For \(p=1\), whitening reduces to scalar standardization and \[ d = \frac{\mu_{post} - \mu_0}{\sigma_{post}}, \] so \(d\) is the posterior z/t-type contrast against \(\mu_0\). Thus directional_tail() plays the same role as a one-sided standardized test in the univariate case, while still extending naturally to multivariate models.
The squared shift \[ d^2 = \delta^\top \delta \] is the whitened quadratic distance from the reference. This is the same geometric core as multivariate Wald statistics (and, after scaling, Hotelling/F forms): larger \(d^2\) means stronger global departure from the reference.
Difference in emphasis:

- Wald/F-style tests aggregate evidence over all directions through the quadratic form \(d^2\).
- directional_tail() uses directional half-space evidence along \(\delta\).

Under the Gaussian approximation, both are monotone in \(d\): as \(d\) increases, quadratic-test evidence strengthens and \(p_{\text{dir}}=\Phi(-d)\) decreases.
In this sense, directional_tail() is a directional
companion to multivariate Wald/F-style testing: it retains the same
Mahalanobis geometry but reports evidence on an interpretable one-sided
directional probability scale.
This is the key conceptual difference from classical p-values (Bernardo and Smith 1994).
In a classical t-test, the p-value is a sampling probability under the null: \[ P_{H_0}\!\left(T(Y^{rep}) \ge T(y_{obs})\right), \] that is, how extreme the observed statistic would be in repeated samples if \(H_0\) were true.
By contrast, the directional-tail quantity here is a posterior probability conditional on the observed data: \[ p_{\text{dir}} = P\!\left(\delta^\top Z \le 0 \mid y,\ \text{model},\ \text{prior},\ \mu_0\right). \]
So the probability being assessed is not a prior probability and not a frequentist repeated-sampling probability. It is posterior mass, after updating by the data.
Practical interpretation:

- p_directional near 0: posterior mass is concentrated in the direction away from the reference \(\mu_0\), indicating stronger directional disagreement.
- p_directional near 0.5: weak directional separation from the reference.

For summary.glmb(), this means:

- dir_tail (vs prior): posterior directional disagreement with the prior mean.
- dir_tail_null (vs null): posterior directional disagreement with the null/intercept-only reference used in the summary routine.

In the one-parameter case, the directional tail reduces to a one-sided posterior tail area relative to \(\mu_0\): \[ p_{\text{dir}} = P\!\left(\operatorname{sign}(\delta)\,(\beta - \mu_0) \le 0 \mid y\right). \]
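A scalar sketch of this reduction (toy draws, not package internals):

set.seed(2)
beta_draws <- rnorm(1e4, mean = 1.2, sd = 0.5)  # toy posterior draws
mu0 <- 0

d <- (mean(beta_draws) - mu0) / sd(beta_draws)  # standardized shift
# delta' Z <= 0 reduces to sign(d) * (beta - mu0) <= 0:
p_dir <- if (d >= 0) mean(beta_draws <= mu0) else mean(beta_draws >= mu0)

c(d = d, p_dir = p_dir, gaussian_approx = pnorm(-abs(d)))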
This is closely related to well-known Bayesian sign/tail summaries
(including posterior sign probability / “probability of direction” style
summaries in applied Bayesian reporting (Gelman
et al. 2013)). In that language, for a scalar parameter,
p_directional is essentially the opposite-side posterior
mass, and the sign probability is approximately \(1 - p_{\text{dir}}\).
The main contribution of directional_tail() is the
multivariate extension: it uses whitening plus
projection onto the posterior shift direction to provide a single
coherent directional-tail probability even when coefficients are
correlated and inference is genuinely multivariate.
Because the targets differ (posterior mass vs repeated-sampling
extremeness), p_directional is not numerically identical to
a classical p-value. Still, the two are often comparable through the
standardized shift magnitude \(d\).
In the scalar Gaussian approximation: \[ p_{\text{dir}} \approx \Phi(-|d|), \qquad p_{\text{post,2s}} \approx 2\Phi(-|d|), \] while a classical two-sided p-value is typically \[ p_{\text{classical,2s}} \approx 2\{1-F_t(|t|;\nu)\} \] (or \(2\Phi(-|z|)\) asymptotically).
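A quick numeric comparison at the same standardized shift (illustrative values \(d = t = 2\), \(\nu = 10\)):

d  <- 2.0   # standardized posterior shift
tt <- 2.0   # classical t-statistic
nu <- 10    # classical residual degrees of freedom

p_dir          <- pnorm(-abs(d))             # directional posterior tail
p_post_2s      <- 2 * pnorm(-abs(d))         # two-sided posterior analogue
p_classical_2s <- 2 * pt(-abs(tt), df = nu)  # classical two-sided p-value
c(p_dir = p_dir, p_post_2s = p_post_2s, p_classical_2s = p_classical_2s)
# ~0.023, ~0.046, ~0.073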
When the prior is strong relative to the likelihood, two effects dominate:

- Center shrinkage: the posterior mean is pulled toward the prior mean, which reduces the shift when the reference is the prior mean but can move the center relative to other references.
- Scale reduction: posterior standard errors are smaller than classical ones, which inflates the standardized shift for any fixed center.

Consequences for p_directional vs classical p-values:

- When center shrinkage toward the reference dominates, p_directional tends to be larger (weaker apparent evidence against the reference) than classical p-values.
- When scale reduction dominates, p_directional can be smaller (stronger evidence in that direction) than classical tails.
- directional_tail() summarizes the net directional result along \(\delta\).

As prior precision becomes small relative to the information in the data:

- the posterior mean approaches the maximum-likelihood estimate, and
- the posterior covariance approaches the classical sampling covariance.
Hence in the univariate case, p_directional approaches a
one-sided likelihood-based tail measure, and \(2\,p_{\text{dir}}\) approaches the familiar
two-sided z/t-style tail (asymptotically).
In the multivariate case, \(d^2\) approaches the corresponding Wald quadratic form (and its Hotelling/F scaling in finite samples), so Bayesian directional and classical quadratic tests tend to agree more closely in large-sample/weak-prior regimes.
For one coefficient, the posterior-vs-classical comparison can be decomposed into a center effect and a scale effect. Write \[ t_{\text{post}} = \frac{\hat\beta_{\text{post}}-\mu_0}{\text{SE}_{\text{post}}}, \qquad t_{\text{class}} = \frac{\hat\beta_{\text{MLE}}-\mu_0}{\text{SE}_{\text{class}}}. \] Then \[ t_{\text{post}} = t_{\text{class}} \times \underbrace{\frac{\hat\beta_{\text{post}}-\mu_0}{\hat\beta_{\text{MLE}}-\mu_0}}_{\text{center/shrinkage factor } \kappa} \times \underbrace{\frac{\text{SE}_{\text{class}}}{\text{SE}_{\text{post}}}}_{\text{scale factor}}. \]
In conjugate normal-gamma style parameterizations, a common heuristic is \[ \text{SE}_{\text{post}} \approx \text{SE}_{\text{class}} \sqrt{\frac{n-p}{n+n_{\text{prior}}}}, \] so \[ \frac{\text{SE}_{\text{class}}}{\text{SE}_{\text{post}}} \approx \sqrt{\frac{n+n_{\text{prior}}}{n-p}}. \] If one also uses a simple shrinkage approximation \(\kappa \approx \frac{n}{n+n_{\text{prior}}}\), then \[ t_{\text{post}} \approx t_{\text{class}} \cdot \frac{n}{n+n_{\text{prior}}} \cdot \sqrt{\frac{n+n_{\text{prior}}}{n-p}}. \]
Using \(n_{\text{prior}} = n\,\frac{\text{pwt}}{1-\text{pwt}}\), this can be rewritten as \[ t_{\text{post}} \approx t_{\text{class}} \cdot \sqrt{1-\text{pwt}} \cdot \sqrt{\frac{n}{n-p}}. \]
This expression shows why posterior directional tails can be either larger or smaller than classical tails:

- the shrinkage factor \(\sqrt{1-\text{pwt}}\) pulls the posterior t-statistic down, while
- the degrees-of-freedom factor \(\sqrt{n/(n-p)}\) pushes it up.

A useful threshold from this heuristic: the net factor exceeds one exactly when \[ n_{\text{prior}} < \frac{p\,n}{n-p}, \] which is approximately \(p\) for \(n \gg p\). Once the prior pseudo-sample size exceeds roughly the dimension, center shrinkage dominates the scale effect and posterior tails are larger than their classical counterparts.
This is the scalar analogue of the multivariate directional-tail story: center and scale are both active, and the final probability is driven by the standardized shift magnitude after whitening.
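As a concrete sketch of the heuristic, the hypothetical helper below (not a package function) maps a classical t-statistic to its approximate posterior counterpart:

# Map a classical t-statistic to its approximate posterior counterpart
# under the conjugate normal-gamma heuristic above.
t_post_approx <- function(t_class, n, p, n_prior) {
  kappa <- n / (n + n_prior)              # center/shrinkage factor
  scale <- sqrt((n + n_prior) / (n - p))  # standard-error adjustment factor
  t_class * kappa * scale
}

# With n = 100, p = 5 the crossover is at n_prior = p * n / (n - p) ~ 5.3:
t_post_approx(2.0, n = 100, p = 5, n_prior = 1)   # ~2.04: mild inflation
t_post_approx(2.0, n = 100, p = 5, n_prior = 50)  # ~1.68: shrinkage dominates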
Package example (Ex_directional_tail.R)

The package example already demonstrates the intended workflow and interpretation. In particular, it shows:

- fitting a model (lmb/glmb),
- calling directional_tail(fit),
- inspecting mahalanobis_shift, p_directional, and the draw-level objects.

A compact usage sketch from that example is:
example("directional_tail", package = "glmbayes")
# Core objects used in interpretation:
dt <- directional_tail(fit)
dt$mahalanobis_shift
dt$p_directional
# Same quantities are also surfaced in summary output:
s <- summary(fit)
s$dir_tail
s$dir_tail_null

The example script (Ex_directional_tail.R) also includes two useful diagnostic visualizations:
dt <- directional_tail(fit)
Z <- dt$draws$Z
flag <- dt$draws$is_tail
delta <- dt$delta
w <- delta
plot(
Z,
col = ifelse(flag, "red", "blue"),
pch = 19,
xlab = "Z1",
ylab = "Z2",
main = "Directional Tail Diagnostic (Whitened Space)"
)
# Decision boundary orthogonal to direction vector
abline(a = 0, b = -w[1] / w[2], col = "darkgreen", lty = 2)
# Radius corresponding to Mahalanobis shift
r <- sqrt(sum(delta^2))
symbols(
delta[1], delta[2],
circles = r,
inches = FALSE,
add = TRUE,
lwd = 2,
fg = "gray"
)
points(0, 0, pch = 4, col = "black", lwd = 2) # reference center in whitened space
points(delta[1], delta[2], pch = 3, col = "purple", lwd = 2) # posterior shift

The second plot shows the same draws in the raw coefficient space:

B <- dt$draws$B
flag <- dt$draws$is_tail
mu0 <- as.numeric(fit$Prior$mean)
mu_post <- colMeans(B)
plot(
B,
col = ifelse(flag, "red", "blue"),
pch = 19,
xlab = "Coefficient 1",
ylab = "Coefficient 2",
main = "Directional Tail Diagnostic (Raw Coefficient Space)"
)
points(mu0[1], mu0[2], pch = 4, col = "black", cex = 1.5) # reference center
points(mu_post[1], mu_post[2], pch = 3, col = "darkgreen", cex = 1.5) # posterior center
legend(
"topright",
legend = c("Tail draws", "Non-tail draws", "Reference", "Posterior"),
col = c("red", "blue", "black", "darkgreen"),
pch = c(19, 19, 4, 3),
bty = "n"
)

How directional_tail() is called

The function is exported and can be called directly on a fitted
object (glmb or lmb):
fit <- glmb(
counts ~ outcome + treatment,
family = poisson(),
pfamily = dNormal(mu = mu, Sigma = V)
)
dt_prior <- directional_tail(fit) # reference = prior mean
dt_prior
print(dt_prior)

To compare against a user-specified reference (for example a
null/intercept-only mode), pass mu0 explicitly:
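For example (a sketch; mu0_null is an illustrative zero reference with the same length as the prior mean mu used above):

mu0_null <- rep(0, length(mu))  # illustrative null reference, one entry per coefficient
dt_null  <- directional_tail(fit, mu0 = mu0_null)
print(dt_null)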
How summary.glmb() uses directional tail

summary.glmb() computes two directional diagnostics internally:

- directional_tail(object): posterior vs prior mean
- directional_tail(object, mu0 = null_est_full): posterior vs null reference

These are returned in the summary object:

- dir_tail
- dir_tail_null

The print method displays a compact “Directional Tail Summaries” table with:

- the Mahalanobis shift (vs Null, vs Prior)
- the directional tail probability (vs Null, vs Prior)

The package example inst/examples/Ex_directional_tail.R
provides a full reproducible script. A minimal call pattern is:
ps <- Prior_Setup(weight ~ group, family = gaussian(), data = dat2)
fit <- lmb(weight ~ group, dNormal(ps$mu, ps$Sigma, dispersion = ps$dispersion), data = dat2, n = 10000)
dt <- directional_tail(fit)
print(dt)

** 1.0 Why the Bayesian tail can be smaller

There are three distinct, concrete reasons the Bayesian posterior tail probability for a coefficient can be smaller than the classical t-tail probability.
*** 1.1 Different effective degrees of freedom (\(\nu\))

In the conjugate parameterization above, the posterior degrees of freedom are \(\nu = 2\alpha_{0} + n\). If the prior has positive \(\alpha_{0}\) (an informative prior on the variance), the posterior \(\nu\) will be larger than the classical \(n - p\). Increasing \(\nu\) makes the Student-t closer to normal (lighter tails), so the tail probability at the same \(t\) value is smaller.
- Even for vaguely informative priors, the prior contributes “pseudo-observations” to the posterior degrees of freedom. This is often the main cause of smaller Bayesian tails.
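A quick base-R illustration of the df effect: the two-sided tail at the same t-value shrinks as \(\nu\) grows.

2 * pt(-2, df = c(5, 10, 30, 100, Inf))
# ~0.102 0.073 0.055 0.048 0.046  (df = Inf recovers the normal tail)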
*** 1.2 Different standard error (scale) used in the denominator
The Bayesian marginal variance for \(\beta_{j}\) is \(\frac{\beta_{n}}{\alpha_{n}} V_{n}[j,j]\), where \(\alpha_{n}\) and \(\beta_{n}\) are the posterior shape and rate for the error variance and \(V_{n}\) is the posterior scale matrix.
Two effects happen here:
(a) \(V_{n}\) generally shrinks relative to \((X^{T}X)^{-1}\) because of the prior precision \(V_{0}^{-1}\), and (b) the posterior scale factor \(\frac{\beta_{n}}{\alpha_{n}}\) can be smaller than the classical estimate \(\hat{s}^{2}\), depending on the prior hyperparameters and the data (e.g., when the prior favors smaller variance).
Either effect reduces the denominator and can increase the raw \(t\) value; combined with a larger \(\nu\), the net tail probability can still be smaller.
So you must compare both the numerator (posterior mean \(m_{n,j}\) vs. the OLS estimate \(\hat{\beta}_{j}\)) and the denominator (posterior scale vs. \(\hat{s}\sqrt{(X^{T}X)^{-1}_{jj}}\)). Prior shrinkage often makes \(m_{n,j}\) closer to zero and \(V_{n}[j,j]\) smaller, both of which reduce evidence against \(H_{0}\).
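To see both effects on toy data, here is a compact conjugate-update sketch (the data and the hyperparameters a0, b0, V0inv are illustrative; an and bn follow the standard normal-gamma update, matching \(\alpha_{n}, \beta_{n}\) above):

set.seed(3)
n <- 50; p <- 2
X <- cbind(1, rnorm(n))
y <- drop(X %*% c(1, 0.5) + rnorm(n))

mu0   <- c(0, 0)          # prior mean
V0inv <- diag(4, p)       # prior precision V_0^{-1}
a0 <- 2; b0 <- 2          # prior shape/rate for the error variance

Ln <- crossprod(X) + V0inv                      # posterior precision (given variance)
Vn <- solve(Ln)                                 # shrinks vs. solve(crossprod(X))
mn <- Vn %*% (crossprod(X, y) + V0inv %*% mu0)  # shrunk posterior mean m_n
an <- a0 + n / 2
bn <- b0 + 0.5 * drop(crossprod(y) + t(mu0) %*% V0inv %*% mu0 - t(mn) %*% Ln %*% mn)

post_var  <- (bn / an) * diag(Vn)               # Bayesian marginal variances
class_var <- diag(vcov(lm(y ~ X - 1)))          # classical s^2 (X'X)^{-1} diagonal
rbind(post_var, class_var)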
For Bayesian linear models with normal-gamma priors, the standard errors used in t-statistic calculations need to be adjusted relative to the classical estimates. Under the heuristic above, the adjusted standard error is
\[ \text{Std\_err}_t = \text{Std\_err} \cdot \sqrt{ \frac{n - p}{n + n_{\text{prior}}} } \]
which leads to:
\[ t_{\text{adj},i} = \frac{n}{n + n_{\text{prior}}} \, t_{\text{unadjusted},i} \, \sqrt{\frac{n + n_{\text{prior}}}{n - p}}. \]
where the first factor shows the shrinkage effect (lowering the t-value) and the second the impact of the standard-error adjustment (which boosts the t-value).
Equivalently,
\[ t_{\text{adj},i} = \sqrt{\frac{n}{n + n_{\text{prior}}}} \cdot \sqrt{\frac{n}{n - p}} \cdot t_{\text{unadjusted},i}, \]
and, using \(1 - \text{pwt} = \frac{n}{n + n_{\text{prior}}}\),
\[ t_{\text{adj},i} = \sqrt{1-\text{pwt}} \cdot \sqrt{\frac{n}{n - p}} \cdot t_{\text{unadjusted},i}. \]
It can be seen from the above that the t-statistic in the Bayesian context is larger than the unadjusted t-statistic (i.e., inflated) whenever
\[ n_{\text{prior}} < p \cdot \frac{n}{n - p} \]
which implies tail probabilities smaller than their classical counterparts.
Here \(\text{pwt}\) denotes the prior weight, with \(n_{\text{prior}} = n\,\frac{\text{pwt}}{1-\text{pwt}}\), so that \[ n + n_{\text{prior}} = n + n\left(\frac{\text{pwt}}{1 - \text{pwt}}\right) = \frac{n(1 - \text{pwt}) + n\,\text{pwt}}{1 - \text{pwt}} = \frac{n}{1 - \text{pwt}}. \]
If only the standard-error adjustment is applied (without the center-shrinkage factor), the adjusted t-statistic is \[ t_{\text{adj},i} = t_{\text{unadjusted},i} \cdot \sqrt{\frac{n + n_{\text{prior}}}{n - p}}. \]