Evidence-based Bayesian disaggregation


What this package does

Given an observed aggregate price index $\mathrm{cpi}_t$ and a matrix of (known) sectoral aggregation weights $W_{t,k}$ — value-added (VAB) shares — the goal is to recover the $K$ latent sectoral price indices $\varphi_{t,k}$ that the aggregate is made of. The sectoral indices then feed a downstream nested Ornstein–Uhlenbeck model (bayesianOU) as the market price $\varphi$.
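
As a minimal sketch of the aggregation identity behind this setup, the toy example below builds the noise-free aggregate $\sum_k W_{t,k}\,\varphi_{t,k}$ from a small weight matrix and a small matrix of sectoral indices. All numbers are made up for illustration, and the object names (W, phi, cpi) are not taken from the package.

```r
# Toy dimensions: 3 periods, 2 sectors (illustrative numbers only)
W <- matrix(c(0.6, 0.4,
              0.5, 0.5,
              0.3, 0.7), nrow = 3, byrow = TRUE)    # VAB shares, rows sum to 1
phi <- matrix(c(100, 110,
                104, 112,
                108, 120), nrow = 3, byrow = TRUE)  # latent sectoral indices

# Noise-free aggregate at each period: cpi_t = sum_k W[t, k] * phi[t, k]
cpi <- rowSums(W * phi)
cpi
#> [1] 104.0 108.0 116.4
```

Disaggregation runs this map backwards: cpi and W are observed, phi is not, and many different phi matrices reproduce the same cpi.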

The disaggregation is genuinely Bayesian: the aggregate enters as evidence (an observation density), and the sectoral indices come out as a posterior with credible intervals, not as a single deterministic re-weighting.

Why “evidence-based”: the F1–F6 history

The 0.1.x family advertised “MCMC-free Bayesian disaggregation”, but the aggregate CPI never entered the computation (F1): the “posterior” was derived from the prior weight matrix alone, the Dirichlet concentration cancelled on renormalization (F2), the temporal pattern cancelled too (F3), an “efficiency” term was a fixed constant (F4), there were no recovery tests (F5), and a correlation helper opportunistically picked whichever of Pearson/Spearman was larger (F6). That foundational defect — not using the data — cannot be patched within a deterministic re-weighting; the fix is a model that conditions on the aggregate. The deterministic family has been removed; two honest Bayesian engines replace it.

The model (state-space, “Model A”)

Latent state in logs, with a random walk plus drift and partial pooling:
\[ \log \varphi_{t,k} = \log \varphi_{t-1,k} + \delta_k + \tau_k\,\eta_{t,k}, \qquad \eta_{t,k}\sim\mathcal N(0,1), \]
with $\delta_k \sim \mathcal N(\delta_\mu,\delta_\sigma)$ and $\log\tau_k \sim \mathcal N(\mu_{\log\tau}, \sigma_{\log\tau})$ (the drift and the innovation scale are pooled across sectors). The cross-section at $t=1$ is anchored at the aggregate level with an estimable dispersion $\omega_{\text{struct}}$ (the real concentration the old Dirichlet $\gamma$ failed to be):
\[ \log\varphi_{1,k} = \log(\text{phi1\_center}) + \omega_{\text{struct}}\,z_k . \]
The aggregate is the genuine observation:
\[ \mathrm{cpi}_t \sim \mathrm{Student\text{-}t}\!\left(\nu,\ \textstyle\sum_k W_{t,k}\,\varphi_{t,k},\ \sigma\right) \]
(Gaussian if student_obs = FALSE).
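
To make the generative structure concrete, here is a base-R sketch that simulates one data set from the equations above. The hyperparameter values are hypothetical and chosen only for illustration, $z_k$ is assumed to be standard normal, and the second argument of each normal is read as a standard deviation. This is not the package's simulate_disagg(), whose internals are not shown here.

```r
set.seed(1)
T_len <- 30; K <- 4                 # T_len rather than T, which aliases TRUE in R

# Hypothetical hyperparameters (illustration only)
delta_mu <- 0.02;  delta_sigma <- 0.01          # pooled drift
mu_log_tau <- log(0.03); sigma_log_tau <- 0.3   # pooled innovation scale
omega_struct <- 0.1; phi1_center <- 100         # cross-section at t = 1
nu <- 5; sigma <- 0.5                           # observation density

# Sector-level parameters drawn from the pooling distributions
delta <- rnorm(K, delta_mu, delta_sigma)
tau   <- exp(rnorm(K, mu_log_tau, sigma_log_tau))

# Random walk with drift in logs, anchored at phi1_center
log_phi <- matrix(NA_real_, T_len, K)
log_phi[1, ] <- log(phi1_center) + omega_struct * rnorm(K)  # z_k ~ N(0, 1) assumed
for (t in 2:T_len) {
  log_phi[t, ] <- log_phi[t - 1, ] + delta + tau * rnorm(K)
}
phi <- exp(log_phi)                 # positive by construction

# Known weights: random positive entries, rows normalized to sum to 1
W <- matrix(rgamma(T_len * K, shape = 5), T_len, K)
W <- W / rowSums(W)

# Student-t observation: location sum_k W * phi, scale sigma, nu degrees of freedom
cpi <- rowSums(W * phi) + sigma * rt(T_len, df = nu)
```

Working on the log scale is what guarantees positive sectoral indices, and the heavy-tailed rt() noise is what lets occasional aggregate outliers pass without dragging the latent paths.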

Identification, honestly (rigour by layers)

The aggregate $\sum_k W\varphi$ is strongly identified by the observation density. The per-sector split is only weakly identified: at each period one linear combination of the $K$ sectors is pinned by the CPI, and the remaining $K-1$ directions are governed by the cross-sectional prior plus temporal smoothness. So the per-sector intervals are honestly wide and prior-influenced. This is not a defect to hide — it is the correct uncertainty, and it is precisely why we feed the full posterior draws (not a point estimate) to the OU by multiple imputation: the sectoral uncertainty is propagated, not faked away.
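
The weak identification can be checked numerically. The sketch below, with made-up weights and indices for a single period, builds a direction d with $\sum_k w_k d_k = 0$ and shows that moving the sectoral split along d leaves the aggregate unchanged, so the CPI alone cannot distinguish the two splits.

```r
w   <- c(0.4, 0.3, 0.2, 0.1)    # weights for one period (K = 4), illustrative
phi <- c(105, 98, 110, 102)     # one candidate sectoral split, illustrative

# Take any vector v and remove its projection onto w,
# so that the remainder d satisfies sum(w * d) = 0.
v <- c(1, -2, 0.5, 3)
d <- v - w * sum(w * v) / sum(w * w)

phi_alt <- phi + 2 * d          # a clearly different sectoral split

sum(w * phi)                    # aggregate under the first split
sum(w * phi_alt)                # same aggregate, up to floating-point error
all.equal(sum(w * phi), sum(w * phi_alt))
#> [1] TRUE
```

With K = 4 there are three such independent directions per period; only the prior and the temporal smoothness of the random walk constrain them.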

Two engines, one trade-off

Closed-form (conjugate) — disaggregate_conjugate(). A linear-Gaussian random walk in levels with the same aggregate observation; its exact posterior is the Kalman filter + RTS smoother, with no MCMC. Joint posterior draws come from the Durbin–Koopman simulation smoother. This is the correct realization of the original “MCMC-free posterior” idea.
MCMC — disaggregate_statespace(). The richer model above (log scale ⇒ positivity, Student-t ⇒ robustness to aggregate outliers, hierarchical pooling), which is not conjugate and therefore needs HMC.

Both are Bayesian. Closed form buys speed and exactness at the cost of a simpler (Gaussian, linear) model; MCMC buys richness at the cost of sampling.
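
For intuition about the closed-form route, the following is a minimal base-R Kalman filter for a K-dimensional random walk in levels observed through a single weighted sum. It is a sketch under stated assumptions, not the code inside disaggregate_conjugate(): the function name kalman_aggregate and its arguments are hypothetical, the noise variances Q and r are taken as known, and only the forward filter is shown (no RTS smoother, no simulation smoother).

```r
# State:       x_t = x_{t-1} + delta + eta_t,  eta_t ~ N(0, Q)
# Observation: y_t = sum_k W[t, k] * x_t[k] + eps_t,  eps_t ~ N(0, r)
kalman_aggregate <- function(y, W, m0, P0, Q, r, delta = 0) {
  T_len <- length(y); K <- ncol(W)
  m <- matrix(NA_real_, T_len, K)        # filtered state means
  loglik <- 0
  m_pred <- m0; P_pred <- P0             # prior acts as the prediction for t = 1
  for (t in seq_len(T_len)) {
    w <- W[t, ]
    v <- y[t] - sum(w * m_pred)                   # scalar innovation
    S <- drop(t(w) %*% P_pred %*% w) + r          # innovation variance
    G <- drop(P_pred %*% w) / S                   # Kalman gain (length K)
    m_filt <- m_pred + G * v                      # measurement update
    P_filt <- P_pred - tcrossprod(G) * S
    m[t, ] <- m_filt
    loglik <- loglik + dnorm(v, mean = 0, sd = sqrt(S), log = TRUE)
    m_pred <- m_filt + delta                      # time update for t + 1
    P_pred <- P_filt + Q
  }
  list(filtered_mean = m, P_last = P_filt, loglik = loglik)
}

# Usage on synthetic data (all values illustrative)
set.seed(1)
T_len <- 30; K <- 3
W <- matrix(rgamma(T_len * K, shape = 5), T_len, K); W <- W / rowSums(W)
x <- 100 + apply(matrix(rnorm(T_len * K, mean = 0.5, sd = 1), T_len, K), 2, cumsum)
y <- rowSums(W * x) + rnorm(T_len, sd = 0.2)

kf <- kalman_aggregate(y, W, m0 = rep(100, K), P0 = diag(25, K),
                       Q = diag(1, K), r = 0.04, delta = 0.5)
agg_filt <- rowSums(W * kf$filtered_mean)   # filtered aggregate
cor(agg_filt, y)
```

Because the observation is scalar, each update needs no matrix inversion, only a division by S; this is part of why the conjugate engine is fast.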

A runnable example (closed-form engine)

library(BayesianDisaggregation)

sim <- simulate_disagg(T = 30, K = 4, seed = 1)   # synthetic CPI + VAB weights
bl  <- disaggregate_conjugate(sim$cpi, sim$W, n_draws = 100, seed = 1)
bl
#> <disagg_conjugate>  closed-form linear-Gaussian baseline (Kalman/RTS)
#>   periods T = 30, sectors K = 4, joint draws = 100
#>   aggregate Gaussian log-likelihood = -64.56

## the smoothed aggregate tracks the CPI tightly (aggregate is well identified)
round(cor(bl$agg_summary[, "median"], sim$cpi), 4)
#> [1] 0.999

## joint posterior draws: the [T, K, draws] contract consumed by the nested OU
dim(bl$phi_draws)
#> [1]  30   4 100
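
A [T, K, draws] array like the one above can be collapsed into pointwise summaries with apply(). The sketch below uses a simulated stand-in array so that it runs without the package; with a real fit one would presumably substitute bl$phi_draws.

```r
set.seed(1)
# Stand-in for bl$phi_draws: a [T, K, draws] array of positive values
phi_draws <- array(exp(rnorm(30 * 4 * 100, mean = log(100), sd = 0.05)),
                   dim = c(30, 4, 100))

# Collapse the draws dimension into per-period, per-sector summaries
phi_med <- apply(phi_draws, c(1, 2), median)
phi_lo  <- apply(phi_draws, c(1, 2), quantile, probs = 0.025)
phi_hi  <- apply(phi_draws, c(1, 2), quantile, probs = 0.975)

dim(phi_med)
#> [1] 30  4
```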

The MCMC engine (sketch, not evaluated here)

fit <- disaggregate_statespace(sim$cpi, sim$W, chains = 4, iter = 2000, warmup = 1000)
fit$diagnostics                 # rhat_max, divergences
dim(fit$phi_draws)              # T x K x draws
str(fit$phi_summary)            # median, q2.5, q97.5 (T x K each)

## couple to the nested OU (uncertainty propagated by Rubin's rule):
## bayesianOU::fit_ou_nested_mi(phi_draws = fit$phi_draws, X = Phi_index, ...)

From Excel directly, reusing the bundled readers:

cpi_file <- system.file("extdata", "CPI.xlsx", package = "BayesianDisaggregation")
w_file   <- system.file("extdata", "WEIGHTS.xlsx", package = "BayesianDisaggregation")
fit <- disaggregate_from_files(cpi_file, w_file, chains = 2, iter = 1000)

Data note: comparing index vs index

The model is about index levels, so the CPI must be a level series (FRED units “Index source base”, aggregation “Average” for annual data — never a rate-of-change), re-indexed to the same base as the production prices it will be compared against (e.g. 1982–1984 = 100 via the project’s convert_to_index). Feeding a percent-change series here is a category error: the aggregate would not be on the same scale as $\sum_k W\varphi$.
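
The two operations mentioned above can be sketched in a few lines of base R. The helper rebase_index below is hypothetical (it is not the project's convert_to_index, whose signature is not shown here), and the series is made up.

```r
# Hypothetical helper: rebase a level series so that its mean over
# the chosen base years equals 100.
rebase_index <- function(x, years, base_years) {
  base_mean <- mean(x[years %in% base_years])
  100 * x / base_mean
}

years     <- 1980:1986
cpi_level <- c(200, 210, 220, 230, 240, 250, 260)   # illustrative level series

cpi_8284 <- rebase_index(cpi_level, years, base_years = 1982:1984)
mean(cpi_8284[years %in% 1982:1984])
#> [1] 100

# If only a percent-change series is available, rebuild a level series first
pct_change <- 100 * diff(cpi_level) / head(cpi_level, -1)
level_back <- cpi_level[1] * cumprod(c(1, 1 + pct_change / 100))
all.equal(level_back, cpi_level)
#> [1] TRUE
```

Rebasing changes only the scale of the series, so it preserves every growth rate while putting the CPI on the same footing as the sectoral price indices.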

Coupling to the nested OU

disaggregate_statespace()$phi_draws (or disaggregate_conjugate(..., n_draws = M)$phi_draws) is a [T, K, M] array, exactly the multiple-imputation input of bayesianOU::fit_ou_nested_mi(). The OU refits once per imputation and combines the analyses by Rubin’s rule, so the disaggregation uncertainty becomes part of the OU posterior.
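
For readers unfamiliar with the combination step, the sketch below shows the textbook scalar form of Rubin's pooling: the pooled estimate is the mean across imputations, and the total variance adds the between-imputation spread to the average within-imputation variance. The function pool_rubin and the numbers are hypothetical; how bayesianOU implements the combination internally is not shown in this document.

```r
# Pool M per-imputation estimates and standard errors of one scalar parameter
pool_rubin <- function(est, se) {
  M     <- length(est)
  q_bar <- mean(est)                  # pooled point estimate
  u_bar <- mean(se^2)                 # average within-imputation variance
  b     <- var(est)                   # between-imputation variance
  t_var <- u_bar + (1 + 1 / M) * b    # total variance
  c(estimate = q_bar, se = sqrt(t_var))
}

# Illustrative results from M = 5 refits, one per draw of phi
est <- c(0.52, 0.47, 0.55, 0.50, 0.46)
se  <- c(0.10, 0.11, 0.09, 0.10, 0.12)
pooled <- pool_rubin(est, se)
pooled
```

The pooled standard error is never smaller than the average within-imputation one: the extra term is exactly the disaggregation uncertainty carried into the downstream analysis.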
