
weightflow

Declarative, pipeable survey weighting in base R — from design weights to calibrated, variance-ready weights.

weightflow builds survey weights by chaining hierarchical adjustments with a tidymodels-style API, and estimates the variance of the resulting estimates with a bootstrap that re-applies the whole recipe on each replicate. It has no hard dependencies (base R only, R >= 4.1) and bridges to survey/srvyr for design-based inference.

Installation

# install.packages("remotes")
remotes::install_github("jpferreira33/weightflow")

The idea

A recipe is inert: building it computes nothing. prep() walks the steps in order and estimates the cascade of factors; collect_weights() extracts the final weights. Separating define from apply makes the whole process reproducible and auditable, and it is exactly what lets the bootstrap re-run the entire cascade per replicate.

library(weightflow)

recipe <- weighting_spec(sample_survey, base_weights = pw) |>
  step_unknown_eligibility(unknown = unknown_elig, by = "region") |>
  step_nonresponse(respondent = responded, method = "weighting_class",
                   by = c("region", "sex")) |>
  step_calibrate(method = "raking",
                 margins = list(region = c(table(population$region)),
                                sex    = c(table(population$sex))))

fitted <- prep(recipe)              # estimate the cascade
summary(fitted)                     # per-stage diagnostics + Kish deff
wts    <- collect_weights(fitted)   # data.frame with .weight
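
The define-then-apply separation can be illustrated with a toy in base R. This is only a sketch of the idea, not weightflow's implementation: toy_spec(), toy_step() and toy_prep() are hypothetical names, and each step is reduced to a function that returns a multiplicative factor.

```r
# Toy illustration only: toy_spec(), toy_step() and toy_prep() are
# hypothetical names, not weightflow functions.
toy_spec <- function(data, weights) {
  list(data = data, weights = weights, steps = list())
}

# Adding a step only records a function; nothing is evaluated yet.
toy_step <- function(spec, name, fun) {
  spec$steps[[name]] <- fun
  spec
}

# "Preparing" walks the recorded steps in order and multiplies the factors.
toy_prep <- function(spec) {
  w <- spec$weights
  for (name in names(spec$steps)) {
    w <- w * spec$steps[[name]](spec$data, w)
  }
  w
}

df <- data.frame(responded = c(1, 1, 0, 1))

spec <- toy_spec(df, weights = c(10, 10, 10, 10)) |>
  toy_step("nonresponse", function(d, w) {
    # respondents absorb the weight of nonrespondents; nonrespondents get 0
    ifelse(d$responded == 1, sum(w) / sum(w[d$responded == 1]), 0)
  }) |>
  toy_step("rescale", function(d, w) rep(100 / sum(w), length(w)))

final <- toy_prep(spec)   # computation happens only here
```

Because the spec is just data until toy_prep() is called, the same spec can be re-run unchanged on a resampled dataset, which is the property the bootstrap relies on.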

What it does

Adjustment steps, applied in the order you pipe them:

| Step | What it does |
| --- | --- |
| step_unknown_eligibility() | Redistribute unknown-eligibility cases among the known ones (person- or household-level via cluster). |
| step_drop_ineligible() | Zero out out-of-scope units. |
| step_select_within() | Within-household selection (unequal prob or equal n_eligible). |
| step_nonresponse() | Weighting classes or propensity (logit / CART / random forest), person- or household-level. |
| step_calibrate() | Raking, post-stratification, linear/GREG; bounded (Deville-Särndal) and integrative (one weight per household) options. |
| step_model_calibration() | Wu-Sitter model calibration with working models for the outcomes. |
| step_trim(), step_trim_weights() | Manual or automatic survey-style trimming, insertable anywhere. |
| step_round(), step_rescale() | Integer rounding and rescaling to a size or total. |
| step_assert() | Quality checkpoint on deff, weight ratio or effective n. |

Eligibility and response accept 0/1 dummy columns or any logical condition.
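
To make two of these adjustments concrete, here is a minimal base-R sketch of a weighting-class nonresponse adjustment followed by raking. It does not call weightflow and is not the package's implementation; the toy data, the column names and the population margins are all made up for illustration.

```r
# Hypothetical toy sample; values and column names are illustrative only.
smp <- data.frame(
  region    = c("N", "N", "N", "N", "S", "S", "S", "S"),
  sex       = c("F", "F", "M", "M", "F", "F", "M", "M"),
  responded = c(1, 1, 1, 0, 1, 0, 1, 1),
  w         = rep(100, 8)
)

# 1. Weighting classes (here: region). Within each class, respondents are
#    inflated by (total weight) / (respondent weight); nonrespondents get 0.
nr_factor <- ave(smp$w, smp$region, FUN = sum) /
  ave(smp$w * smp$responded, smp$region, FUN = sum)
smp$w_nr <- smp$w * nr_factor * smp$responded

# 2. Raking: cycle over the margins, scaling weights so that each weighted
#    category total matches its (made-up) population total.
resp    <- smp[smp$responded == 1, ]
margins <- list(region = c(N = 420, S = 380), sex = c(F = 410, M = 390))

w <- resp$w_nr
for (iter in 1:100) {
  for (v in names(margins)) {
    current <- vapply(split(w, resp[[v]]), sum, numeric(1))
    g <- margins[[v]][names(current)] / current      # one factor per category
    w <- w * unname(g[as.character(resp[[v]])])
  }
}
resp$w_cal <- w
```

The nonresponse step preserves the total weight within each class, and after raking the weighted totals by region and by sex agree with the supplied margins up to numerical tolerance.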

Diagnostics and reporting: summary() and plot() show the per-stage cascade with the Kish design effect (deff = 1 + CV², where CV is the coefficient of variation of the weights) and the effective sample size. weight_factors() returns the per-unit, per-step factors. report_weighting() writes a self-contained HTML report (pipeline diagram, variables used, per-stage summaries and per-step visuals) with no graphics device or server required.
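
As a quick check of the deff formula, the Kish design effect and the effective sample size can be computed by hand in base R. The helper below, kish_summary(), is a hypothetical name for illustration; it uses the population form of the variance (divisor n), which makes 1 + CV² equal to n Σw² / (Σw)². The package may use a different convention.

```r
# Hypothetical helper, not part of weightflow.
kish_summary <- function(w) {
  n    <- length(w)
  cv2  <- mean((w - mean(w))^2) / mean(w)^2   # squared CV, divisor n
  deff <- 1 + cv2
  c(n = n, deff = deff, n_eff = n / deff)
}

equal   <- kish_summary(c(2, 2, 2, 2))   # equal weights: deff is 1
unequal <- kish_summary(c(1, 1, 3, 3))   # unequal weights: deff above 1
```

With equal weights the effective sample size equals n; the more the weights vary, the further it drops below n.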

Variance estimation (see the Variance estimation article):

boot <- bootstrap_weights(recipe, replicates = 500, strata = "region", psu = "psu")
boot_mean(boot, "income")           # estimate, SE and CI
as_svydesign(fitted, ids = "psu", strata = "region")   # survey linearization
collect_replicate_weights(boot)     # replicate weights, ready for srvyr

The bootstrap resamples PSUs within strata (Rao-Wu rescaling bootstrap) and re-applies the recipe on each replicate, so the replicate weights carry the variability of every adjustment.
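
The resampling part can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's code: it uses the common Rao-Wu variant that draws n_h - 1 PSUs with replacement in each stratum, the toy design and rao_wu_replicate() are hypothetical, and the per-replicate re-application of the recipe is only marked by a comment.

```r
set.seed(1)

# Hypothetical toy design: 2 strata, 3 PSUs each, 2 units per PSU.
d <- data.frame(
  stratum = rep(c("A", "B"), each = 6),
  psu     = rep(1:6, each = 2),
  w       = rep(c(10, 20), each = 6),
  y       = c(3, 5, 4, 6, 2, 7, 8, 9, 7, 10, 6, 11)
)

rao_wu_replicate <- function(d) {
  w_star <- numeric(nrow(d))
  for (h in unique(d$stratum)) {
    in_h <- d$stratum == h
    psus <- unique(d$psu[in_h])
    n_h  <- length(psus)
    # draw n_h - 1 PSUs with replacement and count how often each appears
    draws <- psus[sample.int(n_h, n_h - 1, replace = TRUE)]
    r     <- table(factor(draws, levels = psus))
    # rescaling factor n_h / (n_h - 1) times the number of draws of the PSU
    mult  <- (n_h / (n_h - 1)) * as.numeric(r[as.character(d$psu[in_h])])
    w_star[in_h] <- d$w[in_h] * mult
  }
  # a full pipeline would re-apply every adjustment step to w_star here
  w_star
}

reps      <- replicate(200, rao_wu_replicate(d))    # units x replicates
theta_hat <- weighted.mean(d$y, d$w)
theta_rep <- apply(reps, 2, function(w) weighted.mean(d$y, w))
se        <- sqrt(mean((theta_rep - theta_hat)^2))  # bootstrap standard error
```

In this balanced toy design every replicate reproduces the original weight total, because the multipliers average to one within each stratum.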

Example data

Three bundled datasets: population (the frame), sample_survey (take-all roster) and sample_one (multistage select-one design), all with stratum, PSU and design weight, so the full pipeline and the variance methods run natively.

Extending

apply_step() is the internal S3 generic behind each step. To add an adjustment, define a step_*() constructor (inert) and its apply_step.<class>() method — nothing else changes.
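
The constructor-plus-method pattern looks roughly like this in base R. The sketch uses a stand-in generic, toy_apply_step(), and a made-up step, toy_step_cap(), because the actual arguments of apply_step() are not documented here; both names and the argument list are assumptions.

```r
# Stand-in generic: weightflow's real apply_step() may take other arguments.
toy_apply_step <- function(step, data, w) UseMethod("toy_apply_step")

# Inert constructor: stores parameters and tags the object with a class.
toy_step_cap <- function(max_ratio) {
  structure(list(max_ratio = max_ratio), class = "toy_step_cap")
}

# Method: cap each weight at max_ratio times the median weight.
toy_apply_step.toy_step_cap <- function(step, data, w) {
  pmin(w, step$max_ratio * median(w))
}

w      <- c(1, 2, 2, 3, 20)
capped <- toy_apply_step(toy_step_cap(max_ratio = 3), data = NULL, w = w)
```

Dispatch on the step's class is what keeps the rest of the pipeline unchanged when a new adjustment is added.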

License

MIT © Juan Pablo Ferreira
