How to read this manual.
Sections 1–3 develop the theory (with equations); Sections 4–6 give diagnostics and metrics; Sections 7–8 provide reproducible code: a fast synthetic demo (evaluated on knit) and a full real-data pipeline (disabled by default for speed; enable it by setting eval=TRUE). All code is consistent with the functions exported by the BayesianDisaggregation package.
# Global chunk defaults
knitr::opts_chunk$set(
echo = TRUE, message = FALSE, warning = FALSE,
fig.width = 9, fig.height = 6
)
# Libraries
suppressPackageStartupMessages({
library(BayesianDisaggregation)
library(dplyr)
library(tidyr)
library(ggplot2)
library(readr)
library(openxlsx)
})
# Logging verbosity from the package
log_enable("INFO")
set.seed(2024)
We observe an aggregate index (e.g., CPI) by period \(t=1,\dots,T\), and we want a sectoral disaggregation into \(K\) components whose period-wise shares lie on the unit simplex:
\[ W_t = (w_{t1},\dots,w_{tK}),\qquad w_{tk}\ge 0,\quad \sum_{k=1}^K w_{tk}=1. \]
We start with a prior weight matrix \(P\in\mathbb{R}^{T\times K}\) (rows on the simplex), and construct a likelihood of sectors \(L\in\Delta^{K-1}\) (a non-negative vector summing to one). A temporal profile then spreads \(L\) to \(LT\in\mathbb{R}^{T\times K}\). Finally, a deterministic update rule combines \(P\) and \(LT\) to obtain the posterior \(W\).
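For concreteness, here is a tiny helper (hypothetical, not exported by the package) that checks whether a weight matrix satisfies these simplex constraints:
# Hypothetical helper: check that every row of a weight matrix lies on the simplex
on_simplex <- function(W, tol = 1e-8) {
  all(is.finite(W)) && all(W >= -tol) && all(abs(rowSums(W) - 1) < tol)
}
on_simplex(matrix(c(0.2, 0.8,
                    0.5, 0.5), nrow = 2, byrow = TRUE))  # TRUE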
Let \(P\) be validated (finite, non-negative, row sums \(\approx 1\); small deviations renormalized). We center columns over time:
\[ X = P - \mathbf{1}\,\bar p^\top,\quad \bar p = \frac{1}{T}\sum_{t=1}^T P_{t\cdot}. \]
Compute the SVD \(X = U\Sigma V^\top\). Let \(v\) denote the first right singular vector (PC1 loadings). We map to non-negative salience via absolute values and normalize:
\[ \ell_k = |v_k|,\qquad L_k = \frac{\ell_k}{\sum_j \ell_j}. \]
If PC1 is degenerate (near-zero variance or identical columns), we fall back to column means of \(P\) (renormalized). This is implemented in compute_L_from_P() (see Section 7). Diagnostics are attached to L as attributes: "pc1_loadings", "explained_var", and "fallback".
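The following sketch mirrors this construction directly from the equations (centering, SVD, absolute PC1 loadings, renormalization); it is illustrative rather than the package implementation, and the last line compares it against compute_L_from_P().
# Sketch of the PC1-based likelihood, written from the equations above
set.seed(42)
T <- 12; K <- 5
P <- matrix(rexp(T * K), nrow = T); P <- P / rowSums(P)

X  <- scale(P, center = TRUE, scale = FALSE)  # X = P - 1 %*% t(p_bar)
sv <- svd(X)
v1 <- sv$v[, 1]                               # first right singular vector (PC1 loadings)
L_hand <- abs(v1) / sum(abs(v1))              # ell_k = |v_k|, normalized to the simplex

# Should typically match the exported helper (up to its degenerate-PC1 fallback)
L_pkg <- compute_L_from_P(P)
round(rbind(by_hand = L_hand, package = as.numeric(L_pkg)), 4)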
We create a time-varying matrix \(LT\) by applying a non-negative weight profile \(w_t\) and row-renormalizing:
\[ LT_{t,k} \propto w_t L_k,\qquad \sum_k LT_{t,k}=1. \]
Built-in patterns:
- constant: \(w_t=1\)
- recent: linearly increasing in \(t\) (more weight to recent periods)
- linear: affine ramp between endpoints
- bell: symmetric Gaussian-like bump around \(T/2\)
Given \(P\) and \(LT\) (both row-wise on the simplex), we define four deterministic updates:
Weighted (convex blend with mixing weight \(\lambda\)):
\[ W = \mathsf{norm}_1\{\lambda P + (1-\lambda)LT\}. \]
Multiplicative (element-wise product, row-renormalized):
\[ W = \mathsf{norm}_1\{P\odot LT\}. \]
Dirichlet (pseudo-count update with concentration scale \(\gamma\), rows normalized):
\[ \alpha_{\text{post}} = \frac{P}{\gamma} + \frac{LT}{\gamma},\qquad W_{t\cdot} = \frac{\alpha_{\text{post},t\cdot}}{\sum_k \alpha_{\text{post},tk}}. \]
Adaptive (volatility-driven blend; \(\sigma_k\) is the prior volatility of sector \(k\) over time and \(\bar\sigma\) its average):
\[ \phi_k=\min\!\Big(\frac{\sigma_k}{\bar\sigma},\,0.8\Big),\quad W_{t\cdot}=\mathsf{norm}_1\{(1-\phi)\odot P_{t\cdot} + \phi\odot LT_{t\cdot}\}. \]
All four updates are exposed as posterior_*() functions in the package (e.g., posterior_weighted(), posterior_multiplicative(), posterior_adaptive()); the sketch below checks the weighted rule against its exported counterpart.
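As a notation check, this sketch builds LT with spread_likelihood(), applies the weighted rule by hand, and compares the result with posterior_weighted(); the hand-rolled update is purely illustrative.
# Sketch: weighted update by hand vs. the exported posterior_weighted()
set.seed(11)
T <- 8; K <- 4
P  <- matrix(rexp(T * K), nrow = T); P <- P / rowSums(P)
L  <- compute_L_from_P(P)
LT <- spread_likelihood(L, T_periods = T, pattern = "recent")

lambda <- 0.7
W_hand <- lambda * P + (1 - lambda) * as.matrix(LT)
W_hand <- W_hand / rowSums(W_hand)            # norm_1 over each row

W_pkg <- posterior_weighted(P, LT, lambda = lambda)
max(abs(W_hand - as.matrix(W_pkg)))           # expected to be ~0 (up to rounding)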
Define prior/posterior temporal means:
\[ \bar p = \frac{1}{T}\sum_t P_{t\cdot},\qquad \bar w = \frac{1}{T}\sum_t W_{t\cdot}. \]
Let \(\rho(\cdot,\cdot)\) be a robust correlation (the maximum of |Pearson| and |Spearman|). The coherence score scales the increment \(\Delta\rho=\max(0,\rho(\bar w,L)-\rho(\bar p,L))\):
\[ \text{coherence}=\min\{1,\ \text{const} + \text{mult}\cdot\Delta\rho\}. \]
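A minimal sketch of this formula with the defaults used later (const = 0.5, mult = 3.0); robust_cor() below is a hypothetical stand-in for the package's internal robust correlation, not its actual implementation.
# Sketch of the coherence score from the formula above (illustrative only)
robust_cor <- function(x, y) {
  max(abs(cor(x, y, method = "pearson")),
      abs(cor(x, y, method = "spearman")))
}

coherence_by_hand <- function(P, W, L, const = 0.5, mult = 3.0) {
  p_bar <- colMeans(P)                          # prior temporal means
  w_bar <- colMeans(W)                          # posterior temporal means
  delta <- max(0, robust_cor(w_bar, L) - robust_cor(p_bar, L))
  min(1, const + mult * delta)                  # bounded in [const, 1]
}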
\[ S_{\text{num}}=\exp\{-a\cdot\overline{|\sum_k W_{tk}-1|} - b\cdot \#(W<0)\}. \]
\[ S_{\text{tmp}} = \frac{1}{1+\kappa\cdot \overline{|\Delta W|}},\quad \overline{|\Delta W|}=\frac{1}{K}\sum_k \frac{1}{T-1}\sum_{t}|W_{t+1,k}-W_{t,k}|. \]
\[ S_{\text{comp}} = 0.6\,S_{\text{num}} + 0.4\,S_{\text{tmp}}. \]
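The sketch below computes the three stability quantities directly from the formulas, using the constants that appear in the demos (a = 1000, b = 10, kappa = 50); it illustrates the definitions and is not the exported stability_composite().
# Sketch of the stability components from the formulas above (illustrative only)
stability_by_hand <- function(W, a = 1000, b = 10, kappa = 50) {
  # Numerical: mean |row sum - 1| plus a penalty for negative entries
  row_dev <- mean(abs(rowSums(W) - 1))
  n_neg   <- sum(W < 0)
  s_num   <- exp(-a * row_dev - b * n_neg)

  # Temporal: mean absolute period-to-period change, averaged over sectors
  d_bar <- mean(abs(diff(W)))          # diff() takes lagged row differences of a matrix
  s_tmp <- 1 / (1 + kappa * d_bar)

  c(numerical = s_num, temporal = s_tmp, composite = 0.6 * s_num + 0.4 * s_tmp)
}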
These are exposed as the package functions coherence_score() and stability_composite(); interpretability (next) is computed by interpretability_score().
Two principles: the posterior should preserve the prior's cross-sector structure, and sector-level shifts should remain plausible relative to the prior.
Implementation:
\[ \text{pres}=\max\{0,\rho(\bar p,\bar w)\},\qquad r_k=\frac{|\,\bar w_k-\bar p_k\,|}{\bar p_k+\epsilon},\quad \text{plaus}= \frac{1}{1+2\cdot Q_{0.9}(r_k)}. \]
Then \(\text{interp}=0.6\,\text{pres}+0.4\,\text{plaus}\).
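A sketch of this score written from the formula; the plain Pearson correlation for \(\rho\) and the small constant eps are illustrative assumptions, and the exported interpretability_score() remains the authoritative version.
# Sketch of the interpretability score from the formula above (illustrative only)
interpretability_by_hand <- function(P, W, eps = 1e-8) {
  p_bar <- colMeans(P); w_bar <- colMeans(W)
  pres  <- max(0, cor(p_bar, w_bar))              # prior preservation (Pearson here)
  r     <- abs(w_bar - p_bar) / (p_bar + eps)     # relative sector-level shifts
  plaus <- 1 / (1 + 2 * quantile(r, 0.9))         # penalize the 90th percentile shift
  unname(0.6 * pres + 0.4 * plaus)
}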
The convenience pipeline (bayesian_disaggregate) chains the full workflow:
- read_cpi() and read_weights_matrix() (Excel inputs)
- compute_L_from_P(P) and spread_likelihood(L, T, pattern)
- posterior update (weighted / multiplicative / dirichlet / adaptive)
- save_results() and a one-file workbook for the “best” config
# Example signature (see Section 8 for real data):
# bayesian_disaggregate(path_cpi, path_weights,
# method = c("weighted","multiplicative","dirichlet","adaptive"),
# lambda = 0.7, gamma = 0.1,
# coh_mult = 3.0, coh_const = 0.5,
# stab_a = 1000, stab_b = 10, stab_kappa = 50,
# likelihood_pattern = "recent")
Heatmap of posterior \(W\): each cell is a sector share in a year; rows are years, columns are sectors. Read it as: darker tiles = larger sector share; horizontal smoothness indicates temporal stability; vertical patterns (bands) show persistent sectoral importance.
Top-sectors lines: for the most relevant sectors by average share, lines track the sector’s share over time. Read it as: consistent levels = stability; trend changes coincide with macro structure shifts.
Sectoral CPI sheet: \(\hat Y_{t,k} = \text{CPI}_t \times W_{t,k}\). Read it as: dollarized (or index-scaled) decomposition of the aggregate.
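As a toy illustration of this decomposition (synthetic numbers only; the real-data chunk in Section 8 builds the Sector_CPI sheet with dplyr), multiplying the aggregate by the posterior shares reproduces the aggregate row-wise because each row of \(W\) sums to one:
# Toy illustration: sectoral values are the aggregate times the posterior shares
cpi <- c(100.0, 102.5, 104.1)                 # aggregate index for T = 3 periods
W   <- matrix(c(0.50, 0.30, 0.20,
                0.48, 0.32, 0.20,
                0.45, 0.35, 0.20),
              nrow = 3, byrow = TRUE)         # posterior shares, rows sum to 1
Y_hat <- cpi * W                              # element (t, k) is CPI_t * W_{t,k}
rowSums(Y_hat)                                # recovers cpi, since each row of W sums to 1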
This chunk synthesizes a small example you can knit safely.
# Synthetic prior matrix (rows on simplex)
T <- 10; K <- 6
set.seed(123)
P <- matrix(rexp(T*K), nrow = T)
P <- P / rowSums(P)
# Likelihood vector from P (PCA/SVD; robust with fallback)
L <- compute_L_from_P(P)
# Spread over time with "recent" pattern
LT <- spread_likelihood(L, T_periods = T, pattern = "recent")
# Try a couple of posteriors
W_weighted <- posterior_weighted(P, LT, lambda = 0.7)
W_adaptive <- posterior_adaptive(P, LT)
# Metrics for adaptive
coh <- coherence_score(P, W_adaptive, L)
stab <- stability_composite(W_adaptive, a = 1000, b = 10, kappa = 50)
intr <- interpretability_score(P, W_adaptive)
eff <- 0.65   # efficiency held fixed for this synthetic demo
comp <- 0.30*coh + 0.25*stab + 0.25*intr + 0.20*eff   # composite score (weights sum to 1)
data.frame(coherence = coh, stability = stab, interpretability = intr,
efficiency = eff, composite = comp) %>% round(4)
##     coherence stability interpretability efficiency composite
## 90%         1    0.7537           0.6887       0.65    0.7906
Switch to eval=TRUE after setting your paths; by default we keep this chunk off so the manual renders quickly on any machine.
# === Create synthetic data for CRAN-compliant demo ===
demo_dir <- tempdir()
# Create synthetic CPI data (mimicking your structure)
set.seed(123)
cpi_demo <- data.frame(
Year = 2000:2010,
CPI = cumsum(c(100, rnorm(10, 0.5, 2))) # Random walk starting at 100
)
cpi_file <- file.path(demo_dir, "synthetic_cpi.xlsx")
openxlsx::write.xlsx(cpi_demo, cpi_file)
# Create synthetic weights matrix (mimicking VAB weights structure)
set.seed(456)
years <- 2000:2010
sectors <- c("Agriculture", "Manufacturing", "Services", "Construction", "Mining")
weights_demo <- data.frame(Year = years)
for(sector in sectors) {
weights_demo[[sector]] <- runif(length(years), 0.05, 0.35)
}
# Normalize rows to sum to 1 (simplex constraint)
weights_demo[, -1] <- weights_demo[, -1] / rowSums(weights_demo[, -1])
weights_file <- file.path(demo_dir, "synthetic_weights.xlsx")
openxlsx::write.xlsx(weights_demo, weights_file)
# Use synthetic data paths
path_cpi <- cpi_file
path_w <- weights_file
out_dir <- demo_dir
cat("Using synthetic data for CRAN demo:\n")
cat("CPI file:", path_cpi, "\n")
cat("Weights file:", path_w, "\n")
cat("Output directory:", out_dir, "\n")
# --- Base run (robust defaults) ---
base_res <- bayesian_disaggregate(
path_cpi = path_cpi,
path_weights = path_w,
method = "adaptive",
lambda = 0.7, # recorded in metrics; not used by "adaptive"
gamma = 0.1,
coh_mult = 3.0,
coh_const = 0.5,
stab_a = 1000,
stab_b = 10,
stab_kappa = 60,
likelihood_pattern = "recent"
)
xlsx_base <- save_results(base_res, out_dir = file.path(out_dir, "base"))
print(base_res$metrics)
# --- Minimal grid search for demo (reduced size) ---
n_cores <- 1 # Use single core for CRAN compliance
grid_df <- expand.grid(
method = c("weighted", "adaptive"), # Reduced methods
lambda = c(0.5, 0.7), # Reduced options
gamma = 0.1, # Single option
coh_mult = 3.0, # Single option
coh_const = 0.5, # Single option
stab_a = 1000,
stab_b = 10,
stab_kappa = 60, # Single option
likelihood_pattern = "recent", # Single option
KEEP.OUT.ATTRS = FALSE,
stringsAsFactors = FALSE
)
grid_res <- run_grid_search(
path_cpi = path_cpi,
path_weights = path_w,
grid_df = grid_df,
n_cores = n_cores
)
write.csv(grid_res, file.path(out_dir, "grid_results.csv"), row.names = FALSE)
best_row <- grid_res %>% arrange(desc(composite)) %>% slice(1)
print("Best configuration from grid search:")
print(best_row)
# --- Re-run the best configuration for clean export ---
best_res <- bayesian_disaggregate(
path_cpi = path_cpi,
path_weights = path_w,
method = best_row$method,
lambda = if (!is.na(best_row$lambda)) best_row$lambda else 0.7,
gamma = if (!is.na(best_row$gamma)) best_row$gamma else 0.1,
coh_mult = best_row$coh_mult,
coh_const = best_row$coh_const,
stab_a = best_row$stab_a,
stab_b = best_row$stab_b,
stab_kappa = best_row$stab_kappa,
likelihood_pattern = best_row$likelihood_pattern
)
xlsx_best <- save_results(best_res, out_dir = file.path(out_dir, "best"))
# --- One Excel with everything (including hyperparameters) ---
sector_summary <- tibble(
Sector = colnames(best_res$posterior)[-1],
prior_mean = colMeans(as.matrix(best_res$prior[, -1])),
posterior_mean = colMeans(as.matrix(best_res$posterior[, -1]))
)
wb <- createWorkbook()
addWorksheet(wb, "Hyperparameters"); writeData(wb, "Hyperparameters", best_row)
addWorksheet(wb, "Metrics"); writeData(wb, "Metrics", best_res$metrics)
addWorksheet(wb, "Prior_P"); writeData(wb, "Prior_P", best_res$prior)
addWorksheet(wb, "Posterior_W"); writeData(wb, "Posterior_W", best_res$posterior)
addWorksheet(wb, "Likelihood_t"); writeData(wb, "Likelihood_t", best_res$likelihood_t)
addWorksheet(wb, "Likelihood_L"); writeData(wb, "Likelihood_L", best_res$likelihood)
addWorksheet(wb, "Sector_Summary"); writeData(wb, "Sector_Summary", sector_summary)
for (sh in c("Hyperparameters","Metrics","Prior_P","Posterior_W",
"Likelihood_t","Likelihood_L","Sector_Summary")) {
freezePane(wb, sh, firstRow = TRUE)
addFilter(wb, sh, rows = 1, cols = 1:ncol(readWorkbook(wb, sh)))
setColWidths(wb, sh, cols = 1:200, widths = "auto")
}
# --- Add sectoral CPI (aggregate times posterior weights) ---
W_post <- best_res$posterior # Year + sectors
cpi_df <- read_cpi(path_cpi) # Year, CPI
sector_cpi <- dplyr::left_join(W_post, cpi_df, by = "Year") %>%
dplyr::mutate(dplyr::across(-c(Year, CPI), ~ .x * CPI))
# Quality check: sector sums vs CPI
check_sum <- sector_cpi %>%
dplyr::mutate(row_sum = rowSums(dplyr::across(-c(Year, CPI))),
diff = CPI - row_sum)
cat("Quality check (first 5 rows):\n")
print(head(check_sum, 5))
addWorksheet(wb, "Sector_CPI")
writeData(wb, "Sector_CPI", sector_cpi)
freezePane(wb, "Sector_CPI", firstRow = TRUE)
addFilter(wb, "Sector_CPI", rows = 1, cols = 1:ncol(sector_cpi))
setColWidths(wb, "Sector_CPI", cols = 1:200, widths = "auto")
excel_onefile <- file.path(out_dir, "best", "Best_Full_Output_withSectorCPI.xlsx")
saveWorkbook(wb, excel_onefile, overwrite = TRUE)
cat("Full results saved to:", excel_onefile, "\n")
# --- Quick plots (saved as PNGs) ---
dir_plots <- file.path(out_dir, "best", "plots")
if (!dir.exists(dir_plots)) dir.create(dir_plots, recursive = TRUE)
W_long <- best_res$posterior %>%
pivot_longer(-Year, names_to = "Sector", values_to = "Weight")
p_heat <- ggplot(W_long, aes(Year, Sector, fill = Weight)) +
geom_tile() + scale_fill_viridis_c() +
labs(title = "Posterior weights (W): heatmap", x = "Year", y = "Sector", fill = "Share") +
theme_minimal(base_size = 11) + theme(axis.text.y = element_text(size = 6))
ggsave(file.path(dir_plots, "posterior_heatmap.png"), p_heat, width = 12, height = 9, dpi = 220)
top_sectors <- best_res$posterior %>%
summarise(across(-Year, mean)) %>%
pivot_longer(everything(), names_to = "Sector", values_to = "MeanShare") %>%
arrange(desc(MeanShare)) %>% slice(1:3) %>% pull(Sector) # Reduced to top 3 for demo
p_lines <- best_res$posterior %>%
select(Year, all_of(top_sectors)) %>%
pivot_longer(-Year, names_to = "Sector", values_to = "Weight") %>%
ggplot(aes(Year, Weight, color = Sector)) +
geom_line(linewidth = 0.9) +
labs(title = "Top 3 sectors by average share (posterior W)", y = "Share", x = "Year") +
theme_minimal(base_size = 11)
ggsave(file.path(dir_plots, "posterior_topSectors.png"), p_lines, width = 11, height = 6, dpi = 220)
cat("Demo completed successfully. All files written to temporary directory.\n")
Practical recommendations:
- Prefer method="adaptive" when prior sector volatilities are heterogeneous; otherwise weighted with \(\lambda\in[0.7,0.9]\) is strong and often tops the grid.
- The coherence defaults (mult=3.0, const=0.5) produce a bounded, interpretable 0–1 score that emphasizes improvement over the prior.
# Example: invariants on a fresh synthetic run
T <- 6; K <- 5
set.seed(7)
P <- matrix(rexp(T*K), nrow = T); P <- P / rowSums(P)
L <- compute_L_from_P(P)
LT <- spread_likelihood(L, T, "recent")
W <- posterior_multiplicative(P, LT)
# Invariants
stopifnot(all(abs(rowSums(P) - 1) < 1e-12))
stopifnot(all(abs(rowSums(LT) - 1) < 1e-12))
stopifnot(all(abs(rowSums(W) - 1) < 1e-12))
c(
coherence = coherence_score(P, W, L),
stability = stability_composite(W),
interpret = interpretability_score(P, W)
) %>% round(4)
##     coherence     stability interpret.90%
##        1.0000        0.6459        0.6245
## R version 4.4.2 (2024-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26100)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=C LC_CTYPE=Spanish_Spain.utf8
## [3] LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C
## [5] LC_TIME=Spanish_Spain.utf8
##
## time zone: America/Costa_Rica
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] openxlsx_4.2.8 readr_2.1.5
## [3] ggplot2_4.0.0 tidyr_1.3.1
## [5] dplyr_1.1.4 BayesianDisaggregation_0.1.2
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 jsonlite_2.0.0 compiler_4.4.2 tidyselect_1.2.1
## [5] Rcpp_1.1.0 zip_2.3.3 jquerylib_0.1.4 scales_1.4.0
## [9] yaml_2.3.10 fastmap_1.2.0 R6_2.6.1 generics_0.1.4
## [13] knitr_1.50 iterators_1.0.14 tibble_3.3.0 tzdb_0.5.0
## [17] RColorBrewer_1.1-3 bslib_0.9.0 pillar_1.11.0 rlang_1.1.5
## [21] cachem_1.1.0 stringi_1.8.7 xfun_0.53 S7_0.2.0
## [25] sass_0.4.10 cli_3.6.5 withr_3.0.2 magrittr_2.0.4
## [29] digest_0.6.37 foreach_1.5.2 grid_4.4.2 rstudioapi_0.17.1
## [33] hms_1.1.3 lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.5
## [37] glue_1.8.0 farver_2.1.2 codetools_0.2-20 rmarkdown_2.29
## [41] purrr_1.1.0 tools_4.4.2 pkgconfig_2.0.3 htmltools_0.5.8.1
Notes
- The real-data chunk is set to eval=FALSE so the manual renders anywhere. Flip it to TRUE on your machine to run fully against your Excel files.
- The “best one-file” export includes Hyperparameters, Metrics, Prior_P, Posterior_W, Likelihood_t, Likelihood_L, Sector_Summary, and Sector_CPI, with frozen headers and filters for quick analysis.
- Plots are written to .../best/plots/ and match the interpretation guidance in Section 6.