The SignalY package provides a comprehensive framework for signal extraction from panel data, integrating multiple complementary methodologies:
The package distinguishes between latent structure (the underlying data-generating process) and phenomenological dynamics (observed variability), providing automated technical interpretation of results.
# Install from CRAN (when available)
install.packages("SignalY")
# Install development version from GitHub
# remotes::install_github("username/SignalY")
# For Bayesian methods (Horseshoe, HP-GC), install cmdstanr:
install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
cmdstanr::install_cmdstan()
library(SignalY)
# Generate example data
set.seed(42)
n <- 100
p <- 20
# Create correlated predictors with factor structure
factors <- matrix(rnorm(n * 2), n, 2)
loadings <- matrix(runif(p * 2, -1, 1), p, 2)
X <- factors %*% t(loadings) + matrix(rnorm(n * p, 0, 0.5), n, p)
colnames(X) <- paste0("X", 1:p)
# True signal depends on 5 predictors
true_beta <- c(rep(1, 5), rep(0, 15))
Y <- X %*% true_beta + rnorm(n, 0, 0.5)
# Combine into data frame
data <- data.frame(Y = Y, X)
# Run comprehensive analysis
result <- signal_analysis(
data = data,
y_formula = "Y",
methods = "all",
verbose = TRUE
)
# View results
print(result)
summary(result)
plot(result)
SignalY operationalizes the distinction between latent structure and phenomenological dynamics:
This distinction maps onto statistical concepts:
| Concept | Statistical Implementation |
|---|---|
| Signal | Trend component, common factors |
| Noise | Idiosyncratic shocks, measurement error |
| Sparsity | Few variables carry information |
| Persistence | Unit roots, long memory |
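To make the "Noise" and "Persistence" rows concrete, the following base-R sketch contrasts white noise with a random walk (a unit-root process) using the lag-1 sample autocorrelation. It does not call SignalY; the variable names and the helper `lag1` are illustrative assumptions.

```r
set.seed(123)
n_obs <- 500

noise       <- rnorm(n_obs)           # idiosyncratic shocks: no memory
random_walk <- cumsum(rnorm(n_obs))   # unit root: shocks accumulate

# Lag-1 sample autocorrelation as a rough persistence measure
# (hypothetical helper, not part of SignalY)
lag1 <- function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]

rho_noise <- lag1(noise)
rho_rw    <- lag1(random_walk)
print(c(noise = rho_noise, random_walk = rho_rw))
```

The white-noise autocorrelation should sit near zero, while the random walk's should be close to one, which is the pattern a unit root test formalizes.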
| Question | Recommended Method |
|---|---|
| What are the dominant frequencies? | Wavelets, EMD |
| Which predictors matter? | Horseshoe regression |
| Is there common factor structure? | PCA, DFM |
| Is the series stationary? | Unit root battery |
| What’s the trend vs cycle? | HP-GC filter |
# Apply wavelet decomposition
Y <- sin(seq(0, 4*pi, length.out = 128)) + rnorm(128, 0, 0.3)
wav_result <- filter_wavelet(
x = Y,
filter = "la8", # Daubechies least asymmetric, filter length 8
levels = c(3, 4), # Combine D3 + D4 (business cycle frequencies)
first_diff = FALSE
)
# Examine results
names(wav_result)
# [1] "combined" "smooth" "details" "level_variance" "filter" "levels"
# Combined signal captures 8-32 period cycles (for annual data)
plot(wav_result$combined, type = "l", main = "Wavelet-Filtered Signal")
The wavelet filter choice matters:
- "la8": Least asymmetric, filter length 8 (recommended for economic data)
- "d4": Daubechies 4, more compact support
- "haar": Simplest, for discontinuous signals
Level selection corresponds to periodicities:
| Level | Period Range (annual data) |
|---|---|
| D1 | 2-4 years |
| D2 | 4-8 years |
| D3 | 8-16 years |
| D4 | 16-32 years |
| D5 | 32-64 years |
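The table follows the dyadic rule that detail level j covers periods from 2^j to 2^(j+1) sampling intervals. A minimal base-R sketch of that rule is shown here; the helper `level_periods` is hypothetical and not part of SignalY.

```r
# Hypothetical helper, not part of SignalY: period band of detail level j
# under the dyadic rule 2^j to 2^(j + 1) sampling intervals.
level_periods <- function(levels) {
  data.frame(
    level = paste0("D", levels),
    lower = 2^levels,
    upper = 2^(levels + 1)
  )
}

bands <- level_periods(1:5)
print(bands)

# Band covered when D3 and D4 are combined, as in levels = c(3, 4)
d3_d4    <- bands[bands$level %in% c("D3", "D4"), c("lower", "upper")]
combined <- range(unlist(d3_d4))
print(combined)
```

The bands are expressed in sampling intervals, so with quarterly rather than annual data the same D3 + D4 combination would span 8 to 32 quarters, that is, 2 to 8 years.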
EMD is data-adaptive, requiring no basis function specification:
emd_result <- filter_emd(
x = Y,
max_imf = 10,
boundary = "wave"
)
# IMFs are ordered from highest to lowest frequency
matplot(emd_result$imf[, 1:3], type = "l",
main = "First 3 Intrinsic Mode Functions")
# Residue captures the trend
plot(emd_result$residue, type = "l", main = "EMD Residue (Trend)")
The Grant-Chan embedded HP filter provides:
# Requires cmdstanr
hpgc_result <- filter_hpgc(
x = Y,
lambda = NULL, # Auto-select via DIC
n_chains = 4,
n_iter = 2000
)
# Plot the original series against the Bayesian trend
plot(Y, type = "l", col = "gray")
lines(hpgc_result$trend, col = "blue", lwd = 2)
legend("topright", c("Original", "Bayesian Trend"),
col = c("gray", "blue"), lwd = c(1, 2))
The Horseshoe prior provides:
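# Regenerate the response so that it matches the rows of X
# (Y was overwritten by the 128-point series in the wavelet example)
Y <- as.numeric(X %*% true_beta + rnorm(nrow(X), 0, 0.5))
# Fit Horseshoe regression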
hs_fit <- fit_horseshoe(
y = Y,
X = X,
p0 = 5, # Expected non-zeros (can be NULL for auto)
n_chains = 4,
n_iter = 2000,
n_warmup = 1000,
adapt_delta = 0.95, # Target acceptance (increase if divergences)
use_qr = TRUE # QR decomposition for multicollinearity
)
# Examine shrinkage
print(hs_fit)
# Key outputs:
# - beta: Coefficient estimates
# - kappa: Shrinkage factors (0 = no shrinkage, 1 = full shrinkage)
# - m_eff: Effective number of non-zeros
# Select variables by shrinkage threshold
selection <- select_by_shrinkage(hs_fit, threshold = 0.5)
which(selection$selected) # Indices of selected predictors
The shrinkage factor \(\kappa_j\) quantifies how much coefficient \(j\) is pulled toward zero:
\[\kappa_j = \frac{1}{1 + \tau^2 \lambda_j^2 / c^2}\]
| \(\kappa\) | Interpretation |
|---|---|
| 0.0-0.1 | Strong signal, minimal shrinkage |
| 0.1-0.3 | Moderate signal |
| 0.3-0.5 | Weak signal |
| 0.5-0.7 | Likely noise |
| 0.7-1.0 | Strong shrinkage, essentially zero |
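As a rough illustration of how the formula and the table fit together, the sketch below evaluates \(\kappa_j\) for a few made-up values of \(\tau\), \(\lambda_j\) and \(c\), then bins the results with the table's ranges. It uses base R only; `shrinkage_factor`, `c_slab` and the numeric inputs are assumptions for illustration, not output from `fit_horseshoe()`.

```r
# Shrinkage factor from the formula above; tau, lambda and c_slab are
# made-up illustrative values, not output from a fitted model.
shrinkage_factor <- function(tau, lambda, c_slab) {
  1 / (1 + tau^2 * lambda^2 / c_slab^2)
}

tau    <- 0.1                  # global scale (assumed)
lambda <- c(50, 20, 5, 0.1)    # local scales (assumed)
c_slab <- 1                    # the constant c in the formula (assumed)

kappa <- shrinkage_factor(tau, lambda, c_slab)

# Bin kappa using the ranges from the interpretation table
labels <- c("Strong signal", "Moderate signal", "Weak signal",
            "Likely noise", "Strong shrinkage")
category <- cut(kappa, breaks = c(0, 0.1, 0.3, 0.5, 0.7, 1),
                labels = labels, include.lowest = TRUE)

print(data.frame(lambda, kappa = round(kappa, 3), category))
```

Larger local scales \(\lambda_j\) give smaller \(\kappa_j\), so coefficients with large local scales escape shrinkage while the rest are pulled toward zero.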
pca_result <- pca_bootstrap(
X = X,
n_components = NULL, # Auto-select
rotation = "varimax", # Or "none", "oblimin"
n_boot = 1000,
block_length = NULL, # Auto: sqrt(T)
significance_level = 0.05
)
# Key outputs
pca_result$variance_explained # Proportion by component
pca_result$loadings # Variable loadings
pca_result$entropy # Entropy of loadings (concentration measure)
pca_result$loadings_significant # Bootstrap significance
# Interpretation
# Low entropy = concentrated loadings (few variables dominate)
# High entropy = diffuse loadings (many variables contribute)
DFMs capture dynamic common factor structure:
dfm_result <- estimate_dfm(
X = X,
n_factors = NULL, # Auto-select via Bai-Ng IC
max_factors = 10,
var_lags = 1, # VAR lags for factor dynamics
ic_criterion = "bai_ng_2"
)
# Key outputs
dfm_result$n_factors # Optimal number
dfm_result$factors # Estimated factors (T x r)
dfm_result$loadings # Factor loadings (p x r)
dfm_result$variance_explained
dfm_result$var_coefficients # VAR transition matrix
ur_result <- test_unit_root(
x = Y,
tests = c("adf", "ers", "kpss", "pp")
)
# Synthesized conclusion
ur_result$synthesis
# $conclusion: "stationary", "trend_stationary", "difference_stationary", or "inconclusive"
# $confidence: "high", "medium", "low"
# $evidence: Detailed reasoning
# Individual test results
ur_result$tests$adf_none
ur_result$tests$ers_dfgls
| Conclusion | Meaning | Implications |
|---|---|---|
| Stationary | Mean-reverting, transient shocks | Use levels in regression |
| Trend-stationary | Deterministic trend + stationary | Detrend before analysis |
| Difference-stationary | Stochastic trend (unit root) | First-difference or cointegration |
| Inconclusive | Mixed evidence | Use theory, consider both |
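One common way to reach conclusions like these is to cross-check a test whose null hypothesis is a unit root (such as ADF) against one whose null is stationarity (such as KPSS). The sketch below shows that idea in base R. It is a simplified, hypothetical rule: `combine_tests` is not a SignalY function, it does not reproduce the package's actual synthesis logic, and it omits the trend-stationary case.

```r
# Hypothetical decision rule, not SignalY's actual synthesis logic.
# adf_p:  p-value of a test whose null is a unit root (e.g. ADF)
# kpss_p: p-value of a test whose null is stationarity (e.g. KPSS)
combine_tests <- function(adf_p, kpss_p, alpha = 0.05) {
  reject_unit_root    <- adf_p  < alpha
  reject_stationarity <- kpss_p < alpha
  if (reject_unit_root && !reject_stationarity) {
    "stationary"
  } else if (!reject_unit_root && reject_stationarity) {
    "difference_stationary"
  } else {
    "inconclusive"
  }
}

print(combine_tests(adf_p = 0.01, kpss_p = 0.40))
print(combine_tests(adf_p = 0.60, kpss_p = 0.01))
print(combine_tests(adf_p = 0.01, kpss_p = 0.01))

# Acting on a difference-stationary conclusion: take first differences
set.seed(1)
random_walk <- cumsum(rnorm(200))
differenced <- diff(random_walk)   # one observation shorter than the input
```

When the two tests disagree in either direction the rule returns "inconclusive", which matches the table's advice to fall back on theory and consider both specifications.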
signal_analysis() orchestrates all methods:
result <- signal_analysis(
data = data,
y_formula = "Y", # Or formula: Y ~ X1 + X2 + X3
time_var = NULL, # Time variable name
group_var = NULL, # Panel group variable
methods = "all", # Or c("wavelet", "horseshoe", "pca")
# Method-specific configuration
filter_config = list(
wavelet_filter = "la8",
wavelet_levels = c(3, 4),
hpgc_n_chains = 4
),
horseshoe_config = list(
p0 = NULL, # Auto-calibrate
n_chains = 4,
adapt_delta = 0.95,
kappa_threshold = 0.5
),
pca_config = list(
rotation = "none",
n_boot = 1000
),
dfm_config = list(
ic_criterion = "bai_ng_2"
),
# General options
na_action = "interpolate", # Or "omit", "fail"
standardize = TRUE,
first_difference = FALSE,
verbose = TRUE,
seed = 42
)
# Access components
result$filters$wavelet
result$horseshoe
result$pca
result$dfm
result$unitroot
result$interpretation
The interpretation component synthesizes results:
result$interpretation$signal_characteristics
# $smoothness: Variance of second differences
# $smoothness_interpretation: "Very smooth", "Moderately volatile", etc.
# $trend_share: Proportion of variance from trend
result$interpretation$variable_selection
# $sparsity_ratio: Proportion shrunk to zero
# $n_selected: Number of selected predictors
# $top_predictors: Data frame of top variables
result$interpretation$factor_structure
# $pc1_entropy: Shannon entropy of PC1 loadings
# $topology_interpretation: "Concentrated", "Diffuse", etc.
result$interpretation$persistence
# $conclusion: Stationarity type
# $interpretation: Plain-language description
result$interpretation$overall_summary
# Combined narrative synthesis
Not every analysis needs every method:
Statistical significance ≠ theoretical validity:
| Method | Complexity | Memory | Time (n=1000, p=100) |
|---|---|---|---|
| Wavelets | O(n log n) | Low | < 1 sec |
| EMD | O(n × IMFs) | Low | 1-5 sec |
| HP-GC | O(n × iter × chains) | Medium | 30-60 sec |
| Horseshoe | O(n × p × iter × chains) | High | 1-5 min |
| PCA | O(n × p²) | Medium | < 1 sec |
| DFM | O(n × p × factors) | Medium | 1-10 sec |
Stan-based methods (Horseshoe, HP-GC) automatically parallelize across chains.
Piironen, J., & Vehtari, A. (2017). Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics, 11(2), 5018-5051.
Bai, J., & Ng, S. (2002). Determining the Number of Factors in Approximate Factor Models. Econometrica, 70(1), 191-221.
Grant, A. L., & Chan, J. C. C. (2017). A Bayesian Model Comparison for Trend-Cycle Decompositions of Output. Journal of Money, Credit and Banking, 49(2-3), 525-552.
Percival, D. B., & Walden, A. T. (2000). Wavelet Methods for Time Series Analysis. Cambridge University Press.
Huang, N. E., et al. (1998). The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society A, 454, 903-995.