Given an observed aggregate price index \(\mathrm{cpi}_t\) and a matrix of known
sectoral aggregation weights \(W_{t,k}\) (the value-added, or VAB, shares), the
goal is to recover the \(K\) latent sectoral price indices \(\varphi_{t,k}\) that make up
the aggregate. The sectoral indices then feed a downstream nested
Ornstein–Uhlenbeck model (bayesianOU) as the market price
\(\varphi\).
The disaggregation is genuinely Bayesian: the aggregate enters as evidence (an observation density), and the sectoral indices come out as a posterior with credible intervals, not as a single deterministic re-weighting.
The 0.1.x family advertised “MCMC-free Bayesian disaggregation”, but the aggregate CPI never entered the computation (F1): the “posterior” was derived from the prior weight matrix alone, the Dirichlet concentration cancelled on renormalization (F2), the temporal pattern cancelled too (F3), an “efficiency” term was a fixed constant (F4), there were no recovery tests (F5), and a correlation helper opportunistically picked whichever of the Pearson and Spearman coefficients was larger (F6). That foundational defect, not using the data, cannot be patched within a deterministic re-weighting; the fix is a model that conditions on the aggregate. The deterministic family has been removed, and two honest Bayesian engines replace it.
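F2 and F3 are easier to see once the old weights are written as a row-renormalised product. The 0.1.x code is not reproduced here. The sketch below assumes, hypothetically, that the old "posterior" weights were formed by multiplying the prior shares by a concentration \(\gamma\) and by a per-period temporal factor, then renormalising each period's row. Under that assumption, any factor that is constant across sectors within a period divides out, and the CPI never appears anywhere.

```r
# Hypothetical reconstruction of a 0.1.x-style re-weighting (NOT the package code).
renormalise_rows <- function(M) M / rowSums(M)   # each period's shares sum to 1

set.seed(2)
n_t <- 5; n_k <- 3
W_prior <- renormalise_rows(matrix(runif(n_t * n_k), n_t, n_k))

gamma   <- 50                       # Dirichlet-style concentration (F2)
pattern <- 1 + 0.1 * seq_len(n_t)   # temporal pattern, one factor per period (F3)

# `W_prior * pattern` scales row t by pattern[t] (column-major recycling)
W_post <- renormalise_rows(gamma * W_prior * pattern)

all.equal(W_post, W_prior)  # both factors cancel, and no CPI was ever used
```

A factor that varied across sectors within a period would survive the renormalisation. That is why a genuine fix has to introduce information, namely the aggregate observation, rather than reshuffle the prior.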
Latent state in logs, with a random walk plus drift and partial
pooling: \[
\log \varphi_{t,k} = \log \varphi_{t-1,k} + \delta_k +
\tau_k\,\eta_{t,k},
\qquad \eta_{t,k}\sim\mathcal N(0,1),
\] with \(\delta_k \sim \mathcal
N(\delta_\mu,\delta_\sigma)\) and \(\log\tau_k \sim \mathcal N(\mu_{\log\tau},
\sigma_{\log\tau})\) (the drift and the innovation scale are
pooled across sectors). The cross-section at \(t=1\) is anchored at the aggregate level
with an estimable dispersion \(\omega_{\text{struct}}\) (the genuine concentration
parameter that the old Dirichlet \(\gamma\) failed to provide): \[
\log\varphi_{1,k} = \log(\text{phi1\_center}) +
\omega_{\text{struct}}\,z_k .
\] The aggregate is the genuine observation: \[
\mathrm{cpi}_t \sim \mathrm{Student\text{-}t}\!\left(\nu,\
\textstyle\sum_k W_{t,k}\,\varphi_{t,k},\ \sigma\right),
\] (Gaussian if student_obs = FALSE).
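Forward-simulating the model is a quick way to see what each parameter does. The base-R sketch below draws from the generative process as written above. The function name `sim_logrw_sketch` and every default parameter value are hypothetical, and it is not the package's `simulate_disagg()`. It also assumes that \(z_k\) is standard normal, like \(\eta_{t,k}\). The argument `student_obs` only mirrors the switch named above.

```r
# Forward simulation of the log random walk with drift and the aggregate observation.
# Hypothetical defaults; z_k and eta_{t,k} are assumed to be N(0, 1).
sim_logrw_sketch <- function(n_t, W, phi1_center = 100, omega_struct = 0.05,
                             delta_mu = 0.02, delta_sigma = 0.01,
                             mu_log_tau = log(0.02), sigma_log_tau = 0.3,
                             sigma = 0.5, nu = 5, student_obs = TRUE) {
  n_k   <- ncol(W)
  delta <- rnorm(n_k, delta_mu, delta_sigma)            # pooled sector drifts
  tau   <- exp(rnorm(n_k, mu_log_tau, sigma_log_tau))   # pooled innovation scales
  log_phi <- matrix(NA_real_, n_t, n_k)
  log_phi[1, ] <- log(phi1_center) + omega_struct * rnorm(n_k)  # t = 1 anchor
  for (t in seq_len(n_t)[-1]) {
    log_phi[t, ] <- log_phi[t - 1, ] + delta + tau * rnorm(n_k)
  }
  phi <- exp(log_phi)                    # positive by construction
  mu  <- rowSums(W * phi)                # sum_k W[t, k] * phi[t, k]
  eps <- if (student_obs) rt(n_t, df = nu) else rnorm(n_t)
  list(phi = phi, mu = mu, cpi = mu + sigma * eps)  # location-scale observation
}

set.seed(1)
n_t <- 30; n_k <- 4
W <- matrix(runif(n_t * n_k), n_t, n_k)
W <- W / rowSums(W)                      # hypothetical shares, rows sum to 1
sk <- sim_logrw_sketch(n_t, W)
```

Working in logs makes positivity automatic. The Student-t draw scaled by `sigma` gives the heavy-tailed observation that buys robustness to aggregate outliers.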
The aggregate \(\sum_k W\varphi\) is strongly identified by the observation density. The per-sector split is only weakly identified: at each period one linear combination of the \(K\) sectors is pinned by the CPI, and the remaining \(K-1\) directions are governed by the cross-sectional prior plus temporal smoothness. So the per-sector intervals are honestly wide and prior-influenced. This is not a defect to hide — it is the correct uncertainty, and it is precisely why we feed the full posterior draws (not a point estimate) to the OU by multiple imputation: the sectoral uncertainty is propagated, not faked away.
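The identification argument can be checked in a few lines. Take a single period with hypothetical weights `w`. Any perturbation of the sectoral levels that lies in the \((K-1)\)-dimensional subspace orthogonal to `w` leaves the aggregate unchanged, and therefore leaves the likelihood unchanged as well. Only the prior and temporal smoothness can tell such configurations apart.

```r
w   <- c(0.4, 0.3, 0.2, 0.1)   # one period's weights (hypothetical)
phi <- c(101, 98, 105, 99)     # one period's sectoral levels (hypothetical)

# Complete QR of the K x 1 matrix w: the first column of Q spans w, and
# the remaining K - 1 columns span the directions the CPI cannot see.
Q <- qr.Q(qr(matrix(w, ncol = 1)), complete = TRUE)
N <- Q[, -1, drop = FALSE]

set.seed(3)
phi_alt <- phi + drop(N %*% rnorm(ncol(N), sd = 5))

c(original = sum(w * phi), perturbed = sum(w * phi_alt))  # same aggregate
```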
disaggregate_conjugate(): a linear-Gaussian random walk in levels with the same
aggregate observation. Its exact posterior is obtained by the Kalman filter and
RTS smoother, with no MCMC, and joint posterior draws come from the Durbin–Koopman
simulation smoother. This is the correct realization of the original
“MCMC-free posterior” idea.

disaggregate_statespace(): the richer model above (log scale ⇒ positivity,
Student-t ⇒ robustness to aggregate outliers, hierarchical pooling), which is not
conjugate and therefore needs HMC.

Both are Bayesian. Closed form buys speed and exactness at the cost of a simpler (Gaussian, linear) model; MCMC buys richness at the cost of sampling.
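The closed-form route rests on the standard Kalman filter and Rauch–Tung–Striebel smoother. The base-R sketch below applies them to a \(K\)-dimensional random walk in levels that is observed only through \(\mathrm{cpi}_t = \sum_k W_{t,k}\varphi_{t,k} + \varepsilon_t\). It is not the package's implementation. It assumes known noise variances `q` and `r`, has no drift, and returns only smoothed means and covariances; joint draws would additionally need a simulation smoother. The name `kalman_rts` and all inputs are hypothetical.

```r
# Kalman filter + RTS smoother for phi_t = phi_{t-1} + N(0, q I),
# cpi_t = sum_k W[t, k] * phi_t[k] + N(0, r). Sketch only; q and r assumed known.
kalman_rts <- function(y, W, q, r, m0, P0) {
  n_t <- length(y); n_k <- ncol(W)
  Q <- diag(q, n_k)
  m_pred <- m_filt <- matrix(0, n_t, n_k)
  P_pred <- P_filt <- array(0, c(n_k, n_k, n_t))
  loglik <- 0
  for (t in seq_len(n_t)) {
    mp <- if (t == 1) m0 else m_filt[t - 1, ]
    Pp <- if (t == 1) P0 else P_filt[, , t - 1] + Q
    w  <- W[t, ]
    Pw <- drop(Pp %*% w)
    f  <- sum(w * Pw) + r                # innovation variance (scalar)
    v  <- y[t] - sum(w * mp)             # innovation
    k  <- Pw / f                         # Kalman gain, length K
    m_pred[t, ] <- mp; P_pred[, , t] <- Pp
    m_filt[t, ] <- mp + k * v
    P_filt[, , t] <- Pp - tcrossprod(k) * f
    loglik <- loglik + dnorm(v, 0, sqrt(f), log = TRUE)
  }
  m_s <- m_filt; P_s <- P_filt
  for (t in rev(seq_len(n_t - 1))) {     # backward RTS pass
    J <- P_filt[, , t] %*% solve(P_pred[, , t + 1])  # identity transition
    m_s[t, ] <- m_filt[t, ] + drop(J %*% (m_s[t + 1, ] - m_pred[t + 1, ]))
    P_s[, , t] <- P_filt[, , t] +
      J %*% (P_s[, , t + 1] - P_pred[, , t + 1]) %*% t(J)
  }
  list(mean = m_s, cov = P_s, loglik = loglik)
}

# Toy data: three random-walk sectors seen only through a noisy aggregate
set.seed(4)
n_t <- 40; n_k <- 3
phi_true <- 100 + apply(matrix(rnorm(n_t * n_k), n_t, n_k), 2, cumsum)
W <- matrix(runif(n_t * n_k), n_t, n_k); W <- W / rowSums(W)
y <- rowSums(W * phi_true) + rnorm(n_t, sd = 0.1)
fit <- kalman_rts(y, W, q = 1, r = 0.01, m0 = rep(100, n_k), P0 = diag(100, n_k))

agg    <- rowSums(W * fit$mean)           # smoothed aggregate
sd_agg <- sapply(seq_len(n_t), function(t)
  sqrt(drop(W[t, ] %*% fit$cov[, , t] %*% W[t, ])))
sd_sector1 <- sqrt(fit$cov[1, 1, ])
c(cor_agg_cpi = cor(agg, y), mean_sd_agg = mean(sd_agg),
  mean_sd_sector1 = mean(sd_sector1))
```

On this toy data the smoothed aggregate should track the observed series closely, while the per-sector posterior spread stays much wider. This is the identification pattern described above.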
```r
library(BayesianDisaggregation)
sim <- simulate_disagg(T = 30, K = 4, seed = 1) # synthetic CPI + VAB weights
bl <- disaggregate_conjugate(sim$cpi, sim$W, n_draws = 100, seed = 1)
bl
#> <disagg_conjugate> closed-form linear-Gaussian baseline (Kalman/RTS)
#> periods T = 30, sectors K = 4, joint draws = 100
#> aggregate Gaussian log-likelihood = -64.56
## the smoothed aggregate tracks the CPI tightly (aggregate is well identified)
round(cor(bl$agg_summary[, "median"], sim$cpi), 4)
#> [1] 0.999
## joint posterior draws: the [T, K, draws] contract consumed by the nested OU
dim(bl$phi_draws)
#> [1] 30 4 100
```

```r
fit <- disaggregate_statespace(sim$cpi, sim$W, chains = 4, iter = 2000, warmup = 1000)
fit$diagnostics      # rhat_max, divergences
dim(fit$phi_draws)   # T x K x draws
str(fit$phi_summary) # median, q2.5, q97.5 (T x K each)
## couple to the nested OU (uncertainty propagated by Rubin's rule):
## bayesianOU::fit_ou_nested_mi(phi_draws = fit$phi_draws, X = Phi_index, ...)
```

From Excel directly, reusing the bundled readers:
The model is about index levels, so the CPI must be
a level series (FRED units “Index source base”, aggregation “Average”
for annual data — never a rate-of-change), re-indexed to the
same base as the production prices it will be compared
against (e.g. 1982–1984 = 100 via the project’s
convert_to_index). Feeding a percent-change series here is
a category error: the aggregate would not be on the same scale as \(\sum_k W\varphi\).
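The preparation step can be sketched in base R: average the monthly levels within each year, then rescale so that the 1982–1984 mean equals 100. Rebasing a level series only multiplies it by a constant, so the series stays a level series. The helpers `annual_average` and `rebase_index` are hypothetical stand-ins, not the project's `convert_to_index`, and the monthly numbers are made up for illustration.

```r
# Hypothetical helpers; the project's own convert_to_index is not reproduced here.
annual_average <- function(x, year) tapply(x, year, mean)  # FRED-style "Average"

rebase_index <- function(x, years, base_years = 1982:1984) {
  stopifnot(length(x) == length(years), all(base_years %in% years))
  100 * x / mean(x[years %in% base_years])  # base-period mean becomes 100
}

# Made-up monthly index levels, for illustration only
year    <- rep(1980:1986, each = 12)
monthly <- seq(80, 110, length.out = length(year))
annual  <- annual_average(monthly, year)          # one level per year, named by year
cpi_lvl <- rebase_index(annual, as.integer(names(annual)))
round(cpi_lvl, 2)
```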
disaggregate_statespace()$phi_draws (or
disaggregate_conjugate(..., n_draws = M)$phi_draws) is a
[T, K, M] array, exactly the multiple-imputation input of
bayesianOU::fit_ou_nested_mi(). The OU refits once per
imputation and combines the analyses by Rubin’s rule, so the
disaggregation uncertainty becomes part of the OU posterior.
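For reference, the textbook form of Rubin's rules for a scalar estimand is sketched below. The pooled variance adds the average within-imputation variance to an inflated between-imputation variance, and that between-imputation term is where the disaggregation uncertainty enters. This is not bayesianOU's code. Its combination step may differ, for instance by pooling posterior draws across imputations instead of combining moments. The function `rubin_combine` and the toy "analysis" (the time-mean of sector 1) are hypothetical.

```r
# Textbook Rubin's rules for one scalar estimand across M imputations.
rubin_combine <- function(est, var_within) {
  M     <- length(est)
  qbar  <- mean(est)                  # pooled point estimate
  ubar  <- mean(var_within)           # average within-imputation variance
  b     <- var(est)                   # between-imputation variance
  total <- ubar + (1 + 1 / M) * b     # total variance
  df    <- (M - 1) * (1 + ubar / ((1 + 1 / M) * b))^2  # classic Rubin df
  list(estimate = qbar, within = ubar, between = b, total = total, df = df)
}

# Illustration on a hypothetical [T, K, M] draws array with a toy analysis
set.seed(5)
phi_draws <- array(rnorm(30 * 4 * 20, mean = 100, sd = 2), dim = c(30, 4, 20))
est        <- apply(phi_draws[, 1, ], 2, mean)   # one estimate per imputation
var_within <- apply(phi_draws[, 1, ], 2, var) / dim(phi_draws)[1]
rubin_combine(est, var_within)
```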