The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Dynamic Factor Model

Greg Veramendi

Overview

An example of how to estimate the dynamics of a latent factor. Assume that indicators are observed at two time points and you want to estimate how the period-2 value depends on the period-1 value. You need two things:

  1. A measurement system that is invariant across time (same loadings, thresholds / residual SDs across the two periods) so the factor scale is comparable, and
  2. A structural equation linking the period-2 factor to the period-1 factor: f₂ = α + β·f₁ + ε.

factorana supports this with a two-stage workflow, and two helper functions make the setup a few lines:

This vignette walks through a simulated example where α, β, and σ²_ε are known, shows the recovery, and explains one subtlety of the workflow (why the anchor-period intercepts are the right ones to carry into Stage 2).

Simulate data

One latent construct, three linear indicators per period, two periods. The DGP:

\[ f_1 \sim N(0,\sigma^2_{f_1}), \qquad f_2 = \alpha + \beta\,f_1 + \varepsilon, \quad \varepsilon \sim N(0,\sigma^2_\varepsilon) \]

\[ Y_{t,m} = \tau_m + \lambda_m\,f_t + u_{t,m}, \quad u_{t,m} \sim N(0,\sigma_m^2) \]

with measurement parameters τ_m, λ_m, σ_m shared across the two periods.

library(factorana)

set.seed(41)
n <- 1500

# Structural parameters (what we want to recover)
true_var_f1   <- 1.0
true_alpha    <- 0.4
true_beta     <- 0.6
true_sigma_e2 <- 0.5

# Shared measurement parameters
item_int   <- c(1.5, 1.0, 0.8)
item_load  <- c(1.0, 0.9, 1.1)   # first loading fixed to 1 in the model
item_sigma <- c(0.7, 0.75, 0.65)

f1  <- rnorm(n, 0, sqrt(true_var_f1))
eps <- rnorm(n, 0, sqrt(true_sigma_e2))
f2  <- true_alpha + true_beta * f1 + eps

gen_Y <- function(f, i) {
  item_int[i] + item_load[i] * f + rnorm(length(f), 0, item_sigma[i])
}

dat <- data.frame(
  intercept = 1,
  eval      = 1L,
  Y_t1_m1 = gen_Y(f1, 1), Y_t1_m2 = gen_Y(f1, 2), Y_t1_m3 = gen_Y(f1, 3),
  Y_t2_m1 = gen_Y(f2, 1), Y_t2_m2 = gen_Y(f2, 2), Y_t2_m3 = gen_Y(f2, 3)
)

The wide-format data has columns named <prefix><item>: Y_t1_m1, Y_t1_m2, Y_t1_m3 for wave 1, and Y_t2_m1, … for wave 2.

Stage 1: define_dynamic_measurement()

One function call builds the measurement model: a 2-factor independent system (one factor per period) with loadings and sigmas tied across periods, intercepts left free per period.

dyn <- define_dynamic_measurement(
  data                 = dat,
  items                = c("m1", "m2", "m3"),
  period_prefixes      = c("Y_t1_", "Y_t2_"),
  model_type           = "linear",
  evaluation_indicator = "eval"
)
# The wrapper generates 5 equality constraints: 2 for loadings (items
# m2, m3; item m1's loading is fixed to 1 on its factor slot) and 3
# for sigmas.
length(dyn$equality_constraints)
#> [1] 5

Estimate Stage 1 with estimate_model_rcpp() exactly as for any model system:

ctrl <- define_estimation_control(n_quad_points = 8, num_cores = 1)
result_stage1 <- estimate_model_rcpp(
  model_system = dyn$model_system,
  data         = dat,
  control      = ctrl,
  optimizer    = "nlminb",
  parallel     = FALSE,
  verbose      = FALSE
)
result_stage1$convergence
#> [1] 0

Inspect the intercepts: the wave-1 intercepts should match the DGP τ_m (because E[f₁] = 0 by convention). The wave-2 intercepts are shifted by λ_m · E[f₂], which is the mean-drift artefact we are about to discard.

est <- result_stage1$estimates
tab <- data.frame(
  m        = 1:3,
  DGP_tau  = item_int,
  wave_1   = round(c(est["Y_t1_m1_intercept"],
                     est["Y_t1_m2_intercept"],
                     est["Y_t1_m3_intercept"]), 3),
  wave_2   = round(c(est["Y_t2_m1_intercept"],
                     est["Y_t2_m2_intercept"],
                     est["Y_t2_m3_intercept"]), 3)
)
knitr::kable(tab, row.names = FALSE)
m DGP_tau wave_1 wave_2
1 1.5 1.488 1.905
2 1.0 1.000 1.332
3 0.8 0.778 1.234

Why we do not pool the intercepts

A natural alternative is to stack period-1 and period-2 rows into one long-format dataset and fit a single-factor CFA. The resulting intercepts would be the pooled means of Y across both periods, biased by λ_m · E[f₂]/2 (because the pooled mean averages the period-1 and period-2 factor means). Plugging those into Stage 2 under-estimates α by roughly E[f₂]/2.

The wrapper avoids that: Stage 1 estimates period-specific intercepts, then build_dynamic_previous_stage() carries only the anchor period’s intercepts into Stage 2.

Stage 2: build_dynamic_previous_stage() + SE_linear

dummy <- build_dynamic_previous_stage(
  dyn           = dyn,
  stage1_result = result_stage1,
  data          = dat,
  anchor_period = 1L
)

fm_stage2 <- define_factor_model(
  n_factors        = 2,
  n_types          = 1,
  factor_structure = "SE_linear"
)
ms_stage2 <- define_model_system(
  components     = list(),       # measurement components prepended from previous_stage
  factor         = fm_stage2,
  previous_stage = dummy
)
#> Two-stage SE estimation: Stage 1 (independent) -> Stage 2 (SE_linear)
#>   Measurement parameters (loadings, thresholds) will be fixed from Stage 1
#>   Factor distribution parameters will use Stage 2 structure
#>   Fixing 16 measurement parameters, ignoring 2 factor distribution parameters

init_s2 <- initialize_parameters(ms_stage2, dat, verbose = FALSE)
init_s2$init_params["factor_var_1"]    <- unname(dummy$estimates["factor_var_1"])
init_s2$init_params["se_intercept"]    <- 0.0
init_s2$init_params["se_linear_1"]     <- 0.5
init_s2$init_params["se_residual_var"] <- 0.5

result_stage2 <- estimate_model_rcpp(
  model_system = ms_stage2,
  data         = dat,
  init_params  = init_s2$init_params,
  control      = ctrl,
  optimizer    = "nlminb",
  parallel     = FALSE,
  verbose      = FALSE
)
result_stage2$convergence
#> [1] 0

Recovery

est <- result_stage2$estimates
se  <- result_stage2$std_errors
ps  <- c("factor_var_1", "se_intercept", "se_linear_1", "se_residual_var")

tab <- data.frame(
  param = ps,
  true  = c(true_var_f1, true_alpha, true_beta, true_sigma_e2),
  est   = round(unname(est[ps]), 4),
  se    = round(unname(se[ps]),  4)
)
tab$z <- round((tab$est - tab$true) / tab$se, 2)
knitr::kable(tab, row.names = FALSE)
param true est se z
factor_var_1 1.0 0.9104 0.0325 -2.76
se_intercept 0.4 0.4036 0.0217 0.17
se_linear_1 0.6 0.5880 0.0236 -0.51
se_residual_var 0.5 0.5051 0.0261 0.20

se_intercept (the structural α) recovers essentially exactly, which is the whole point of the workflow. The slope β, residual variance σ²_ε, and input factor variance σ²_f₁ also recover within their standard errors.

Notes on generalisation

Where to go next

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.