| Title: | Bayesian Methods for Economic Data Disaggregation |
|---|---|
| Description: | Implements a novel Bayesian disaggregation framework that combines Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) dimension reduction of prior weight matrices with deterministic Bayesian updating rules. The method provides Markov Chain Monte Carlo (MCMC) free posterior estimation with built-in diagnostic metrics. While based on established PCA (Jolliffe, 2002) <doi:10.1007/b98835> and Bayesian principles (Gelman et al., 2013) <doi:10.1201/b16018>, the specific integration for economic disaggregation represents an original methodological contribution. |
| Authors: | José Mauricio Gómez Julián [aut, cre] |
| Maintainer: | José Mauricio Gómez Julián <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.2 |
| Built: | 2026-05-22 06:24:59 UTC |
| Source: | https://github.com/IsadoreNabi/BayesianDisaggregation |
Performs Bayesian disaggregation of an aggregated time series (e.g., CPI)
into components using one of four deterministic update rules:
weighted, multiplicative, dirichlet, adaptive.
bayesian_disaggregate( path_cpi, path_weights, method = c("weighted", "multiplicative", "dirichlet", "adaptive"), lambda = 0.7, gamma = 0.1, coh_mult = 3, coh_const = 0.5, stab_a = 1000, stab_b = 10, stab_kappa = 50, likelihood_pattern = "recent" )bayesian_disaggregate( path_cpi, path_weights, method = c("weighted", "multiplicative", "dirichlet", "adaptive"), lambda = 0.7, gamma = 0.1, coh_mult = 3, coh_const = 0.5, stab_a = 1000, stab_b = 10, stab_kappa = 50, likelihood_pattern = "recent" )
path_cpi |
Path to the CPI Excel file. Must contain at least the
columns |
path_weights |
Path to the Excel file with the baseline weight matrix
(prior): either |
method |
Disaggregation method: |
lambda |
Weight for the |
gamma |
Uncertainty factor for the |
coh_mult |
Multiplier for the coherence increment |
coh_const |
Constant offset for coherence, truncated to |
stab_a |
Sensitivity for row-sum deviation penalty |
stab_b |
Sensitivity for negative-values penalty (count of negatives). |
stab_kappa |
Sensitivity for temporal variation (average |
likelihood_pattern |
Temporal spreading pattern for the likelihood:
|
Assumptions: (i) prior/posterior rows lie on the simplex; (ii) no MCMC is used,
updates are analytic/deterministic; (iii) read_* helpers coerce benign
formatting issues and error on malformed inputs.
A list with:
yearsInteger vector of years used.
industriesCharacter vector of sector/column names.
priorTibble prior with Year then sectors.
likelihood_tTibble likelihood over time (same shape as prior).
likelihoodTibble Sector, L (length ).
posteriorTibble posterior (rows sum to 1).
metricsTibble with hyperparameters + coherence, stability,
interpretability, efficiency, composite, T, K.
read_cpi, read_weights_matrix,
compute_L_from_P, spread_likelihood,
posterior_weighted, posterior_multiplicative,
posterior_dirichlet, posterior_adaptive,
coherence_score, stability_composite,
interpretability_score
# Minimal synthetic run (no files): T <- 6; K <- 4 P <- matrix(rep(1/K, T*K), nrow = T) L <- runif(K); L <- L/sum(L) LT <- spread_likelihood(L, T, "recent") W <- posterior_weighted(P, LT, lambda = 0.7)# Minimal synthetic run (no files): T <- 6; K <- 4 P <- matrix(rep(1/K, T*K), nrow = T) L <- runif(K); L <- L/sum(L) LT <- spread_likelihood(L, T, "recent") W <- posterior_weighted(P, LT, lambda = 0.7)
Measures how much the posterior improves alignment with the
sectoral signal relative to the prior . We compute the
correlation increment using
robust_cor (chooses Pearson/Spearman by larger absolute value), then
map it to with mult and const:
coherence_score(P, W, L, mult = 3, const = 0.5)coherence_score(P, W, L, mult = 3, const = 0.5)
P |
Prior matrix ( |
W |
Posterior matrix ( |
L |
Likelihood vector (length |
mult |
Non-negative multiplier applied to the correlation increment (default |
const |
Constant offset in |
Scalar coherence score in .
T <- 6; K <- 4 P <- matrix(runif(T*K), T); P <- P/rowSums(P) W <- matrix(runif(T*K), T); W <- W/rowSums(W) L <- runif(K); L <- L/sum(L) coherence_score(P, W, L)T <- 6; K <- 4 P <- matrix(runif(T*K), T); P <- P/rowSums(P) W <- matrix(runif(T*K), T); W <- W/rowSums(W) L <- runif(K); L <- L/sum(L) coherence_score(P, W, L)
Builds a sectoral likelihood (length ) from the prior weights
matrix by taking the absolute value of the
first right singular vector of the centered matrix (no scaling), then
normalizing to the unit simplex. Includes input validation, optional row
renormalization, and a safe fallback when PC1 is degenerate.
compute_L_from_P(P, renormalize_rows = TRUE, tol = 1e-12)compute_L_from_P(P, renormalize_rows = TRUE, tol = 1e-12)
P |
Numeric matrix |
renormalize_rows |
Logical; if |
tol |
Numeric tolerance for simplex checks (default |
Validation: P must be a finite, non-negative numeric matrix.
Each row must either (i) already sum to 1 within tol or (ii) be renormalizable
within tol. Rows with (near-)zero sums are not renormalizable and raise an error.
Missing values are not allowed.
Algorithm (exactly as implemented):
Center columns over time: X <- scale(P, center = TRUE, scale = FALSE).
Compute SVD: sv <- svd(X).
Take the first right singular vector (first column of V matrix); set .
If or PC1 is degenerate, fall back to column means of P (over time) and renormalize.
Otherwise, .
Numeric vector of length (non-negative, sums to 1). Attributes:
"pc1_loadings": signed PC1 loadings .
"explained_var": fraction of variance explained by PC1.
"fallback": TRUE if column-mean fallback was used.
spread_likelihood, bayesian_disaggregate
set.seed(123) T <- 10; K <- 4 P <- matrix(rexp(T*K), nrow = T); P <- P / rowSums(P) L <- compute_L_from_P(P) stopifnot(length(L) == K, all(L >= 0), abs(sum(L) - 1) < 1e-12)set.seed(123) T <- 10; K <- 4 P <- matrix(rexp(T*K), nrow = T); P <- P / rowSums(P) L <- compute_L_from_P(P) stopifnot(length(L) == K, all(L >= 0), abs(sum(L) - 1) < 1e-12)
Balances two ideas: (i) preservation of the average sectoral structure
(correlation between colMeans(P) y colMeans(W); truncated at 0),
and (ii) plausibility of relative changes
, summarized by the 90th
percentile (or the maximum). The score is
interpretability_score(P, W, use_q90 = TRUE)interpretability_score(P, W, use_q90 = TRUE)
P |
Prior matrix ( |
W |
Posterior matrix ( |
use_q90 |
If |
Scalar interpretability score in .
T <- 6; K <- 5 P <- matrix(runif(T*K), T); P <- P/rowSums(P) W <- matrix(runif(T*K), T); W <- W/rowSums(W) interpretability_score(P, W)T <- 6; K <- 5 P <- matrix(runif(T*K), T); P <- P/rowSums(P) W <- matrix(runif(T*K), T); W <- W/rowSums(W) interpretability_score(P, W)
Sets the package-wide logging verbosity.
log_enable(level = "INFO")log_enable(level = "INFO")
level |
Character scalar. One of "TRACE", "DEBUG", "INFO", "WARN", "ERROR". |
No return value, called for side effects
Penalizes numerical issues in : (i) deviation of each row sum from 1,
measured by the mean absolute deviation, and (ii) count of negative entries.
The score is .
numerical_stability_exp(W, a = 1000, b = 10)numerical_stability_exp(W, a = 1000, b = 10)
W |
Posterior matrix ( |
a |
Sensitivity for row-sum deviation (default |
b |
Sensitivity for negative counts (default |
Scalar numerical stability score in .
W <- matrix(runif(20), 5); W <- W/rowSums(W) numerical_stability_exp(W)W <- matrix(runif(20), 5); W <- W/rowSums(W) numerical_stability_exp(W)
Ajusta el *mixing* por sector según la volatilidad histórica del prior:
,
donde es la desviación estándar relativa de la columna .
Sectores más volátiles reciben más peso de . Devuelve
renormalizado por filas.
posterior_adaptive(P, LT)posterior_adaptive(P, LT)
P |
Prior matrix ( |
LT |
Likelihood matrix ( |
Posterior matrix (), filas suman 1.
posterior_weighted, posterior_multiplicative,
posterior_dirichlet
set.seed(1) T <- 8; K <- 5 P <- matrix(runif(T*K), T); P <- P / rowSums(P) LT <- matrix(runif(T*K), T); LT <- LT / rowSums(LT) W <- posterior_adaptive(P, LT)set.seed(1) T <- 8; K <- 5 P <- matrix(runif(T*K), T); P <- P / rowSums(P) LT <- matrix(runif(T*K), T); LT <- LT / rowSums(LT) W <- posterior_adaptive(P, LT)
Interpreta y como proporciones convertidas a pseudo-cuentas
con concentración total . El posterior medio es
.
posterior_dirichlet(P, LT, gamma = 0.1)posterior_dirichlet(P, LT, gamma = 0.1)
P |
Prior matrix ( |
LT |
Likelihood matrix ( |
gamma |
Positive uncertainty factor (default |
Posterior matrix (), filas suman 1.
posterior_weighted, posterior_multiplicative,
posterior_adaptive
T <- 6; K <- 3 P <- matrix(runif(T*K), T); P <- P / rowSums(P) LT <- matrix(runif(T*K), T); LT <- LT / rowSums(LT) W <- posterior_dirichlet(P, LT, gamma = 0.1)T <- 6; K <- 3 P <- matrix(runif(T*K), T); P <- P / rowSums(P) LT <- matrix(runif(T*K), T); LT <- LT / rowSums(LT) W <- posterior_dirichlet(P, LT, gamma = 0.1)
Computes (producto elemento a elemento) y
renormaliza cada fila a la simplex. Útil cuando prior y likelihood deben
reforzarse mutuamente.
posterior_multiplicative(P, LT)posterior_multiplicative(P, LT)
P |
Prior matrix ( |
LT |
Likelihood matrix ( |
Posterior matrix (), filas suman 1.
posterior_weighted, posterior_dirichlet,
posterior_adaptive
T <- 4; K <- 4 P <- matrix(runif(T*K), T); P <- P / rowSums(P) LT <- matrix(runif(T*K), T); LT <- LT / rowSums(LT) W <- posterior_multiplicative(P, LT)T <- 4; K <- 4 P <- matrix(runif(T*K), T); P <- P / rowSums(P) LT <- matrix(runif(T*K), T); LT <- LT / rowSums(LT) W <- posterior_multiplicative(P, LT)
Computes row-wise and renormalizes each
row to the unit simplex (suma 1). Expects matrices no negativas con las mismas
dimensiones .
posterior_weighted(P, LT, lambda = 0.7)posterior_weighted(P, LT, lambda = 0.7)
P |
Prior matrix ( |
LT |
Likelihood matrix ( |
lambda |
Mixing weight in |
Posterior matrix (), filas suman 1.
posterior_multiplicative, posterior_dirichlet,
posterior_adaptive
T <- 5; K <- 3 P <- matrix(runif(T*K), T); P <- P / rowSums(P) LT <- matrix(runif(T*K), T); LT <- LT / rowSums(LT) W <- posterior_weighted(P, LT, 0.6) stopifnot(all(abs(rowSums(W)-1) < 1e-12))T <- 5; K <- 3 P <- matrix(runif(T*K), T); P <- P / rowSums(P) LT <- matrix(runif(T*K), T); LT <- LT / rowSums(LT) W <- posterior_weighted(P, LT, 0.6) stopifnot(all(abs(rowSums(W)-1) < 1e-12))
Loads and normalizes a CPI time series from an Excel worksheet. The function
detects the date/year column and the CPI/value column by pattern-matching on
lower-cased header names, parses localized numerics (via to_num_commas()),
collapses duplicate years by averaging, and returns a clean, sorted data frame.
read_cpi(path_cpi)read_cpi(path_cpi)
path_cpi |
Character path to the CPI Excel file. |
Column detection. Headers are lower-cased and matched with:
Date/year: patterns "date|fecha|year|anio|ano".
CPI/value: patterns "cpi|indice|price".
If either column cannot be identified, the function errors.
Cleaning.
Year is extracted as the first 4 digits of the date-like column.
CPI is parsed with to_num_commas() (handles commas/thousands).
NA rows are dropped; duplicates in Year are averaged.
Output is sorted by Year ascending.
A data.frame with two columns:
Year (integer)
CPI (numeric)
read_weights_matrix, bayesian_disaggregate
# Create a temporary Excel file with sample CPI data temp_file <- tempfile(fileext = ".xlsx") df_sample <- data.frame( Fecha = c("2019-01-01", "2020-01-01", "2021-01-01", "2022-01-01"), Indice = c(95.5, 100.0, 103.2, 108.7) ) openxlsx::write.xlsx(df_sample, temp_file) # Read the CPI data df <- read_cpi(temp_file) print(df) # Verify structure stopifnot( is.data.frame(df), all(c("Year", "CPI") %in% names(df)), nrow(df) == 4 ) # Clean up unlink(temp_file)# Create a temporary Excel file with sample CPI data temp_file <- tempfile(fileext = ".xlsx") df_sample <- data.frame( Fecha = c("2019-01-01", "2020-01-01", "2021-01-01", "2022-01-01"), Indice = c(95.5, 100.0, 103.2, 108.7) ) openxlsx::write.xlsx(df_sample, temp_file) # Read the CPI data df <- read_cpi(temp_file) print(df) # Verify structure stopifnot( is.data.frame(df), all(c("Year", "CPI") %in% names(df)), nrow(df) == 4 ) # Clean up unlink(temp_file)
Loads a sector-by-year weight table, normalizes weights to the simplex per year,
and returns a list with the prior matrix P, the sector
names, and the year vector. The first column is assumed to contain sector names
(renamed to Industry); all other columns are treated as years.
read_weights_matrix(path_weights)read_weights_matrix(path_weights)
path_weights |
Character path to the weights Excel file. |
Expected layout. One sheet with:
First column: sector names (any header; renamed to Industry).
Remaining columns: years; the function extracts a 4-digit year from each
header using stringr::str_extract(Year, "\\d{4}").
Values are parsed with to_num_commas(), missing rows are dropped, and
weights are normalized within each year to sum to 1. Any absent (sector, year)
entry becomes 0 when pivoting wide. Finally, rows are re-normalized with
row_norm1() for numerical safety.
Safeguards.
Rows with all-missing/zero after parsing are dropped by the filters.
If no valid year columns are found, the function errors.
A list with:
P numeric matrix of prior weights (rows sum to 1).
industriesCharacter vector of sector names (length ).
yearsInteger vector of years (length ).
read_cpi, bayesian_disaggregate
# Create a temporary Excel file with sample weights temp_file <- tempfile(fileext = ".xlsx") df_sample <- data.frame( Sector = c("Agriculture", "Manufacturing", "Services", "Construction"), "2019" = c(0.20, 0.35, 0.30, 0.15), "2020" = c(0.18, 0.37, 0.32, 0.13), "2021" = c(0.17, 0.38, 0.33, 0.12), "2022" = c(0.16, 0.39, 0.34, 0.11), check.names = FALSE ) openxlsx::write.xlsx(df_sample, temp_file) # Read the weights matrix w <- read_weights_matrix(temp_file) # Inspect structure str(w) print(w$P) # Verify properties stopifnot( is.matrix(w$P), nrow(w$P) == 4, # 4 years ncol(w$P) == 4, # 4 sectors all(abs(rowSums(w$P) - 1) < 1e-10), # rows sum to 1 length(w$industries) == 4, length(w$years) == 4 ) # Clean up unlink(temp_file)# Create a temporary Excel file with sample weights temp_file <- tempfile(fileext = ".xlsx") df_sample <- data.frame( Sector = c("Agriculture", "Manufacturing", "Services", "Construction"), "2019" = c(0.20, 0.35, 0.30, 0.15), "2020" = c(0.18, 0.37, 0.32, 0.13), "2021" = c(0.17, 0.38, 0.33, 0.12), "2022" = c(0.16, 0.39, 0.34, 0.11), check.names = FALSE ) openxlsx::write.xlsx(df_sample, temp_file) # Read the weights matrix w <- read_weights_matrix(temp_file) # Inspect structure str(w) print(w$P) # Verify properties stopifnot( is.matrix(w$P), nrow(w$P) == 4, # 4 years ncol(w$P) == 4, # 4 sectors all(abs(rowSums(w$P) - 1) < 1e-10), # rows sum to 1 length(w$industries) == 4, length(w$years) == 4 ) # Clean up unlink(temp_file)
Run grid search for parameter optimization (parallel PSOCK)
run_grid_search( path_cpi, path_weights, grid_df, n_cores = parallel::detectCores() - 1 )run_grid_search( path_cpi, path_weights, grid_df, n_cores = parallel::detectCores() - 1 )
path_cpi |
Path to CPI Excel file |
path_weights |
Path to weights Excel file |
grid_df |
Data frame with parameter combinations (one row = one config) |
n_cores |
Number of cores for parallel processing |
Data frame with results for all parameter combinations (ordered by composite desc)
Writes CSV extracts and a single Excel workbook with the key outputs from
bayesian_disaggregate.
save_results(res, out_dir)save_results(res, out_dir)
res |
A result object returned by |
out_dir |
Directory where files will be written. Created if missing. |
(Invisibly) the path to the Excel file written.
# Usar archivos de ejemplo incluidos en el paquete cpi_file <- system.file("extdata", "CPI.xlsx", package = "BayesianDisaggregation") weights_file <- system.file("extdata", "WEIGHTS.xlsx", package = "BayesianDisaggregation") if (nzchar(cpi_file) && nzchar(weights_file)) { res <- bayesian_disaggregate(cpi_file, weights_file) # Use tempdir() for CRAN compliance save_results(res, file.path(tempdir(), "results")) }# Usar archivos de ejemplo incluidos en el paquete cpi_file <- system.file("extdata", "CPI.xlsx", package = "BayesianDisaggregation") weights_file <- system.file("extdata", "WEIGHTS.xlsx", package = "BayesianDisaggregation") if (nzchar(cpi_file) && nzchar(weights_file)) { res <- bayesian_disaggregate(cpi_file, weights_file) # Use tempdir() for CRAN compliance save_results(res, file.path(tempdir(), "results")) }
Expands a sectoral likelihood (length ) into a
matrix by applying a temporal weight profile and then row-normalizing to the
simplex.
spread_likelihood( L, T_periods, pattern = c("constant", "recent", "linear", "bell") )spread_likelihood( L, T_periods, pattern = c("constant", "recent", "linear", "bell") )
L |
Numeric vector (length |
T_periods |
Integer; number of time periods |
pattern |
Temporal pattern for the weights across time; one of
|
Given and a non-negative time-weight vector , the function
replicates across rows and applies elementwise, then
row-normalizes using row_norm1():
Patterns:
"constant": .
"recent": linearly increasing in (more weight to later periods).
"linear": affine ramp from first to last period.
"bell": symmetric bell-shaped profile centered at .
Numeric matrix ; each row sums to 1.
compute_L_from_P, bayesian_disaggregate
set.seed(1) K <- 5; T <- 8 L <- runif(K); L <- L / sum(L) LT <- spread_likelihood(L, T, "recent") stopifnot(nrow(LT) == T, ncol(LT) == K, all(abs(rowSums(LT) - 1) < 1e-12))set.seed(1) K <- 5; T <- 8 L <- runif(K); L <- L / sum(L) LT <- spread_likelihood(L, T, "recent") stopifnot(nrow(LT) == T, ncol(LT) == K, all(abs(rowSums(LT) - 1) < 1e-12))
Convex combination of numerical and temporal stability:
stability_composite(W, a = 1000, b = 10, kappa = 50)stability_composite(W, a = 1000, b = 10, kappa = 50)
W |
Posterior matrix ( |
a |
Sensitivity for row-sum deviation in numerical part (default |
b |
Sensitivity for negatives in numerical part (default |
kappa |
Sensitivity for temporal smoothness (default |
Scalar composite stability score in .
numerical_stability_exp, temporal_stability
W <- matrix(runif(20), 5); W <- W/rowSums(W) stability_composite(W)W <- matrix(runif(20), 5); W <- W/rowSums(W) stability_composite(W)
Rewards smooth posteriors by penalizing average absolute period-to-period
changes per sector. Maps to via .
temporal_stability(W, kappa = 50)temporal_stability(W, kappa = 50)
W |
Posterior matrix ( |
kappa |
Sensitivity to average absolute change (default |
Scalar temporal stability score in . If , returns 0.8.
W <- matrix(runif(30), 6); W <- W/rowSums(W) temporal_stability(W, kappa = 40)W <- matrix(runif(30), 6); W <- W/rowSums(W) temporal_stability(W, kappa = 40)