The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

HHBayes

R-CMD-check License: MIT

Bayesian Household Transmission Modeling in R

HHBayes is an R package for simulating and estimating infectious disease transmission within households. It couples a stochastic, age-structured household simulator with Bayesian inference via Stan, enabling researchers to study within-household spread, evaluate intervention strategies, and reconstruct transmission chains from longitudinal diagnostic data.


Why HHBayes?

Household transmission studies generate rich but complex data — repeated tests on every member, overlapping infections, reinfections, and age-dependent viral kinetics. HHBayes provides an integrated pipeline:

  1. Simulate realistic outbreaks with configurable household structures, contact patterns, and interventions.
  2. Prepare observed or simulated data for Bayesian analysis, with automatic imputation of infection/recovery windows and viral load curves.
  3. Fit a Stan model that jointly estimates transmission rates, role-specific susceptibility/infectivity, covariate effects, and viral load scaling.
  4. Analyze results with built-in tools for attack rates, transmission chain reconstruction, and publication-ready plots.

Installation

# install.packages("devtools")
devtools::install_github("keli5734/HHBayes")

Prerequisites

HHBayes depends on RStan, which requires a working C++ toolchain. See the RStan Getting Started Guide for platform-specific setup.


Quick Start

library(HHBayes)
library(rstan)
library(ggpubr)

# 0. Generate surveillance dataset
study_start <- "2024-07-01"
study_end   <- "2025-06-30"
dates_weekly <- seq(from = as.Date(study_start), to = as.Date(study_end), by = "week")
surveillance_data <- data.frame(
  date = dates_weekly,
  # Random epidemic curve (low start, peak middle, low end)
  cases = 0.1 + 100 * exp(-0.0002 * (as.numeric(dates_weekly - mean(dates_weekly)))^2) + abs(rnorm( length(dates_weekly),mean = 0, sd = 10))
)

# 1. Simulate 50 households with ODE-based viral dynamics
sim <- simulate_multiple_households_comm(
  n_households = 50,
  viral_testing = "viral load",
  start_date = "2024-07-01",
  end_date   = "2025-06-30",
  seed = 123,
  surveillance_df = surveillance_data
)

# 2. Summarize attack rates
rates <- summarize_attack_rates(sim)
rates$primary_by_role

# 3. Prepare data and fit the Bayesian model
stan_input <- prepare_stan_data(sim$diagnostic_df,
                                use_vl_data      = 1,
                                surveillance_df  = surveillance_data,
                                study_start_date = as.Date("2024-07-01"),
                                study_end_date   = as.Date("2025-06-30"),
                                seed = 123)
  
options(mc.cores = parallel::detectCores())

fit <- fit_household_model(stan_input, iter = 2000, chains = 4)

# 4. Visualize
plot_posterior_distributions(fit)

Full Pipeline Walkthrough

The sections below walk through a complete analysis: from defining a study scenario, through simulation, data preparation, Bayesian model fitting, and visualization.

Step 1: Define the Study Timeline and Surveillance Data

The simulation is anchored to a calendar period. You can optionally supply external surveillance data (a dataframe with date and cases columns) to drive time-varying community importation. The case counts are automatically normalized and interpolated to daily resolution.

study_start <- "2024-07-01"
study_end   <- "2025-06-30"

# Weekly surveillance data with a mid-study epidemic peak
dates_weekly <- seq(as.Date(study_start), as.Date(study_end), by = "week")
surveillance_data <- data.frame(
  date  = dates_weekly,
  cases = 0.1 + 100 * exp(-0.0002 * (as.numeric(dates_weekly - mean(dates_weekly)))^2) +
    abs(rnorm(length(dates_weekly), 0, 10))
)

If no surveillance data is provided, you can supply custom seasonal_forcing_list vectors or leave it flat (constant community force).

Step 2: Define the Contact Matrix

By default, all household members contact each other equally. A role_mixing_matrix lets you specify differential contact weights between roles. Each cell represents the relative contact intensity between a source (row) and target (column):

role_mixing_weights <- matrix(c(
# Target: Infant Toddler Adult Elderly
          0.0,   0.5,    1.0,  0.5,    # Source: Infant  (high contact with adults)
          0.5,   0.9,    0.7,  0.5,    # Source: Toddler (high peer contact)
          1.0,   0.7,    0.6,  0.7,    # Source: Adult    
          0.5,   0.5,    0.7,  0.0     # Source: Elderly (limited infant contact)
), nrow = 4, byrow = TRUE,
   dimnames = list(
     c("infant", "toddler", "adult", "elderly"),
     c("infant", "toddler", "adult", "elderly")))

This matrix is expanded internally to an N x N individual-level contact matrix for each household, based on the role of each member. For example, in a household with roles c("adult", "adult", "infant"), the adult-to-infant weight (1.0) is applied to those pairs.

Step 3: Define Household Structure

Each simulated household is assembled randomly according to a demographic profile. The household_profile_list controls the probability of different compositions:

household_profile <- list(
  prob_adults   = c(0, 0, 1),        # 0% chance of 0 or one adult, 100% chance of two
  prob_infant   = 1.0,               # 100% chance of one infant
  prob_siblings = c(0, 0.8, 0.2),    # 80% chance of one toddler, 20% chance of two
  prob_elderly  = c(0.7, 0.1, 0.2)   # 70% no elderly, 10% one, 20% two
)
Field Values Meaning
prob_adults c(P0, P1, P2) Probability of 0, 1, or 2 adults
prob_infant 0 to 1 Probability of an infant being present
prob_siblings c(P0, P1, P2) Probability of 0, 1, or 2 toddler siblings
prob_elderly c(P0, P1, P2) Probability of 0, 1, or 2 elderly members

Step 4: Define Intervention Strategies

Interventions (vaccination, masking, prophylaxis, etc.) are specified as a list via covariates_config. Each entry defines an intervention with a name, what it affects, its efficacy, and role-specific coverage probabilities:

sim_config <- list(
  list(
    name      = "vacc_status",
    efficacy  = 0.5,                # 50% reduction
    effect_on = "both",             # "susceptibility", "infectivity", or "both"
    coverage  = list(
      infant  = 0.00,
      toddler = 0.30,
      adult   = 0.80,
      elderly = 0.90
    )
  )
)

How efficacy works: For each person, a Bernoulli draw (based on coverage) determines if they receive the intervention. If they do, their susceptibility and/or infectivity is multiplied by (1 - efficacy). Multiple interventions stack multiplicatively.

Setting efficacy = 0 (as in the baseline example) creates the covariate column without any actual effect — useful for testing the pipeline.

Step 5: Run the Simulation

sim_res <- simulate_multiple_households_comm(
  n_households    = 50,
  viral_testing   = "viral load",    # or "Ct"
  infectious_shape = 10,
  infectious_scale = 1,              # Mean infectious period = 10 days
  waning_shape    = 6,
  waning_scale    = 10,              # Mean immunity = 60 days
  surveillance_interval = 4,         # Test every 4 days
  start_date      = study_start,
  end_date        = study_end,
  surveillance_df = surveillance_data,
  covariates_config      = sim_config,
  household_profile_list = household_profile,
  role_mixing_matrix     = role_mixing_weights,
  seed = 123
)

The output is a list with two dataframes:

Output Description
sim_res$hh_df One row per person per infection episode: household ID, person ID, role, infection time, recovery times, and covariate assignments.
sim_res$diagnostic_df Simulated test results at each surveillance time point: viral load or Ct value, binary test result, and episode ID.

Step 6: Summarize Attack Rates

rates <- summarize_attack_rates(sim_res)

rates$primary_overall      # Overall primary attack rate
rates$primary_by_role      # Attack rate by age group
rates$reinf_overall        # Overall reinfection summary
rates$reinf_by_role        # Reinfections by age group

Step 7: Plot the Epidemic Curve

Overlays simulated infections (stacked bars by age group) with the surveillance line:

plot_epidemic_curve(sim_res, surveillance_data,
                    start_date_str = study_start, bin_width = 7)

Step 8: Prepare Data for Stan

This step transforms the diagnostic dataframe into the structured list that Stan expects. It handles column renaming, infection window imputation, viral load gap-filling, covariate matrix construction, and prior specification.

Important: Covariates from the simulation live in hh_df, but Stan needs them joined to diagnostic_df:

# Extract 1-row-per-person covariate table
person_covariates <- sim_res$hh_df %>%
  dplyr::select(hh_id, person_id, vacc_status) %>%
  dplyr::distinct()

# Join to diagnostic data
df_for_stan <- sim_res$diagnostic_df %>%
  dplyr::left_join(person_covariates, by = c("hh_id", "person_id"))

Define priors for the Bayesian model. Supported distributions are "normal", "uniform", and "lognormal":

my_priors <- list(
  beta1      = list(dist = "normal",  params = c(-5, 1)),
  beta2      = list(dist = "normal",  params = c(-5, 1)),
  alpha      = list(dist = "normal",  params = c(-7, 1)),
  phi_role   = list(dist = "normal",  params = c(0, 1)),
  kappa_role = list(dist = "normal",  params = c(0, 1)),
  vl_midpoint = list(dist = "normal", params = c(4, 1.0)),  # adjust if Ct values
  vl_slope    = list(dist = "normal", params = c(1.0,  0.5))   # adjust if Ct values
)

Supply viral load imputation parameters (used to fill gaps in observed viral data):

VL_params_list <- list(
  adult   = list(v_p = 4.14, t_p = 5.09, lambda_g = 2.31, lambda_d = 2.71),
  infant  = list(v_p = 5.84, t_p = 4.09, lambda_g = 2.82, lambda_d = 1.01),
  toddler = list(v_p = 5.84, t_p = 4.09, lambda_g = 2.82, lambda_d = 1.01),
  elderly = list(v_p = 2.95, t_p = 5.10, lambda_g = 3.15, lambda_d = 0.87)
)

Build the Stan input:

stan_input <- prepare_stan_data(
  df_clean         = df_for_stan,
  surveillance_df  = surveillance_data,
  study_start_date = as.Date(study_start),
  study_end_date   = as.Date(study_end),
  use_vl_data      = TRUE,
  use_curve_logic  = FALSE,         
  delta = 0,                        
  imputation_params = VL_params_list,
  priors           = my_priors,
  role_mixing_matrix = role_mixing_weights)

Step 9: Fit the Bayesian Model

fit_household_model() runs Hamiltonian Monte Carlo (HMC) sampling via rstan::sampling(). You can exclude internal bookkeeping parameters to keep the output clean:

options(mc.cores = parallel::detectCores())

fit <- fit_household_model(
  stan_input,
  pars = c("log_phi_by_role_raw",
           "log_kappa_by_role_raw",
           "log_beta1",
           "log_beta2",
           "log_alpha_comm",
           "g_curve_est",
           "V_term_calc"),
  include = FALSE,
  iter    = 2000,
  warmup  = 1000,
  chains  = 4,
  seed    = 123,
  init_fun = NULL,
  core    = 4
)

print(fit, probs = c(0.025, 0.5, 0.975))

Step 10: Visualize Results

# Posterior distributions of susceptibility (phi) and infectivity (kappa) by role
plot_posterior_distributions(fit)

# Reconstruct who-infected-whom transmission chains
chains <- reconstruct_transmission_chains(fit, stan_input, min_prob_threshold = 0.001)

# Plot a specific household's infection timeline with transmission arrows
plot_household_timeline(chains, stan_input, target_hh_id = 11)

Simulation Parameter Reference

Transmission

Parameter Default Description
beta1 0.008 Baseline within-household transmission rate (contact-driven).
beta2 0.008 Viral-load-dependent transmission rate. Total per-infector force = beta1 + beta2 * f(VL).
delta 0 Household size scaling. Force scaled by (1/max(n_h,1))^delta in larger households.
alpha_comm_by_role 5e-4 Daily community acquisition rate.
phi_by_role c(adult=1, infant=3, toddler=4, elderly=1) Susceptibility multipliers by role.
kappa_by_role c(adult=1, infant=1, toddler=1.2, elderly=1) Infectivity multipliers by role.

Infectious Period and Immunity

Parameter Default Description
infectious_shape 3 Gamma shape for infectious duration.
infectious_scale 1 Gamma scale for infectious duration. Mean = shape x scale.
waning_shape 16 Gamma shape for immunity waning.
waning_scale 10 Gamma scale for immunity waning. Mean = 160 days.
max_infections Inf Max infections per person. Set to 1 to disable reinfection.

Viral Dynamics

Parameter Default Description
viral_testing "viral load" "viral load" (log10 scale) or "Ct" (cycle threshold).
model_type "empirical" "empirical" (parametric curves) or "ODE" (within-host ODE).
vl_midpoint 3.0 or 40 Reference log10 VL for infectiousness scaling or Ct at 50% infectiousness (sigmoid)
vl_slope 2.5 or 2 Power-law exponent for log10 VL scaling. or Steepness of Ct-infectiousness sigmoid.

Testing and Surveillance

Parameter Default Description
surveillance_interval 1 Days between routine tests.
test_daily FALSE Switch to daily testing after first household detection.
perfect_detection TRUE If FALSE, detection depends on viral load thresholds.
detect_threshold_log10 1e-6 Min log10 VL for positive test.
detect_threshold_Ct 40 Max Ct for positive test (set 35-40 for realistic PCR).

Data Preparation & Model Control

Parameter Default Description
use_vl_data TRUE Use actual viral load data vs. generation interval curve.
use_curve_logic TRUE When use_vl_data = FALSE, estimate generation interval curve vs. constant infectivity.
delta 0 Household size scaling. Force scaled by (1/max(n_h,1))^delta.

Viral Load Parameters (when vl_type = 1)

Parameter Default Description
vl_midpoint 3.0 Reference log10 VL for infectiousness scaling.
vl_slope 2.5 Power-law exponent for VL scaling: (VL/vl_midpoint)^vl_slope.

Ct Parameters (when vl_type = 0)

Parameter Default Description
vl_midpoint 35.0 Ct value at 50% infectiousness.
vl_slope 2.0 Steepness of Ct-infectiousness sigmoid.

Key Functions

Function Description
simulate_multiple_households_comm() Simulate multi-household outbreaks
prepare_stan_data() Format data for Stan model
fit_household_model() Fit the Bayesian model via HMC
summarize_attack_rates() Attack rates and reinfection summaries
reconstruct_transmission_chains() Posterior who-infected-whom probabilities
fill_missing_viral_data() Impute missing Ct / viral load
plot_posterior_distributions() Violin plots of phi and kappa
plot_covariate_effects() Forest plot of intervention effects
plot_epidemic_curve() Epidemic curve overlaid with surveillance
plot_household_timeline() Per-household timeline with transmission arrows
prepare_stan_data() Format data for Stan with conditional parameter setup

Model Overview

The daily probability of infection for susceptible individual \(i\) in household \(h\) at time \(t\) is:

\[P(\text{infection}_{ih}(t)) = 1 - \exp\left(-\lambda_{ih}(t)\right)\]

where the total force of infection combines community and household sources:

\[\lambda_{ih}(t) = \phi_{r_i} \cdot \exp(\boldsymbol{\beta}_{\text{susc}} \cdot \mathbf{X}_{\text{susc},i}) \cdot \left( \alpha_{\text{comm}} \cdot S(t) + (\frac{1}{\max(1,n_h)})^{\delta} \cdot \sum_{j \in h, j \ne i} C_{ij} \cdot \kappa_{r_j} \cdot \exp(\boldsymbol{\beta}_{\text{inf}} \cdot \mathbf{X}_{\text{inf},j}) \cdot \left(\beta_1 + \beta_2 \cdot f(VL_j(t))\right) \right)\]

Parameter Description
\(\phi_{r_i}\) Role-specific susceptibility multiplier
\(\kappa_{r_j}\) Role-specific infectivity multiplier
\(\alpha_{\text{comm}}\) Baseline Community risk
\(S(t)\) Seasonal forcing term
\(\delta\) Household size scaling
\(C_{ij}\) Contact weight between persons \(i\) and \(j\)
\(f(VL)\) Viral load scaling function (power-law or sigmoid)
\(\boldsymbol{\beta}_{\text{susc}}\) Log-linear coefficients for susceptibility covariates
\(\boldsymbol{\beta}_{\text{inf}}\) Log-linear coefficients for infectivity covariates
\(\mathbf{X}_{\text{susc},i}\) Covariate vector for person \(i\) (susceptibility)
\(\mathbf{X}_{\text{inf},j}\) Covariate vector for person \(j\) (infectivity)

Individuals progress through S -> I -> R -> S with Gamma-distributed duration. Inference is performed via HMC in Stan (adapt_delta = 0.95, max_treedepth = 15).


How Covariate Effects Work

The simulation applies fixed efficacies during data generation, but the Stan model estimates log-linear coefficients to recover those effects:

Simulation time:

# Vaccination reduces susceptibility by 50%
if (vaccinated) susceptibility *= 0.5

Stan estimation:

# Stan estimates beta_susc ≈ -0.693, because exp(-0.693) ≈ 0.5  
susceptibility = phi_role * exp(beta_susc * vacc_status)

Interpretation: - β = 0: No effect (exp(0) = 1)
- β = -0.693: 50% reduction (exp(-0.693) ≈ 0.5) - β = +0.693: 2× increase (exp(0.693) ≈ 2.0) - Multiple covariates: Effects multiply on natural scale, add on log scale


Viral Component Scenarios

The model supports four scenarios for modeling infectiousness, controlled by use_vl_data and vl_type:

Scenario 1: Log10 Viral Load Data (use_vl_data = TRUE, vl_type = 1)

v_component = (max(0, VL) / vl_midpoint)^vl_slope

Estimated parameters: vl_midpoint, vl_slope

Scenario 2: Ct Value Data (use_vl_data = TRUE, vl_type = 0)

v_component = 1 / (1 + exp((vl_midpoint-Ct) / vl_slope))

Estimated parameters: vl_midpoint, vl_slope

Scenario 3: Generation Interval Curve (use_vl_data = FALSE, use_curve_logic = TRUE)

v_component = g_curve[days_since_infection]

Where g_curve is a normalized Gamma distribution. Estimated parameters: gen_shape, gen_rate

Scenario 4: Constant Infectiousness (use_vl_data = FALSE, use_curve_logic = FALSE)

v_component = 1.0  # No viral dynamics

No additional parameters estimated.

The final infectiousness is always: beta1 + beta2 * v_component


Estimated Parameters

The Stan model jointly estimates a core set of parameters, plus optional parameters that activate based on the data you supply to prepare_stan_data().

Estimated Parameters

The Stan model uses conditional parameter estimation - different parameters are estimated based on your data configuration:

Always Estimated

Parameter Description
beta1, beta2 Baseline and viral-dependent transmission rates
alpha_comm Community importation rate
phi_role Role-specific susceptibility multipliers
kappa_role Role-specific infectivity multipliers
vl_midpoint, vl_slope Log10 viral load scaling or Ct value scaling

Conditionally Estimated

Parameter Description When Estimated
gen_shape, gen_rate Generation interval curve use_vl_data = FALSE AND use_curve_logic = TRUE
beta_susc Susceptibility covariate effects covariates_susceptibility provided
beta_inf Infectivity covariate effects covariates_infectivity provided

Flexible Prior Support

All parameters support flexible priors ("normal", "uniform", "lognormal"):

my_priors <- list(
  phi_role   = list(dist = "normal", params = c(0, 0.5)),    # Tighter susceptibility prior
  kappa_role = list(dist = "lognormal", params = c(0, 0.3)), # LogNormal infectivity prior
)

Example

# Baseline: no covariates → estimates core params + VL curve only
stan_input <- prepare_stan_data(
  ...,
  use_vl_data = 1,
  covariates_susceptibility = NULL,
  covariates_infectivity    = NULL
)

# With covariates → also estimates vaccine/masking effects
stan_input <- prepare_stan_data(
  ...,
  use_vl_data = 1,
  covariates_susceptibility = c("vacc_status"),
  covariates_infectivity    = c("vacc_status", "masked")
)

When K_susc = 0 or K_inf = 0, the corresponding coefficient vectors do not enter the model. This means the same Stan model adapts automatically — no need to switch between different model files.


Acknowledgments

This work was supported by a grant from the National Institutes of Health (R01AI137093). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Authors

Citation

If you use HHBayes in your work, please cite:

Ke Li & Yiren Hou (2026). HHBayes: Bayesian Household Transmission Modeling in R. GitHub: https://github.com/keli5734/HHBayes

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.