Declarative, pipeable survey weighting in base R — from design weights to calibrated, variance-ready weights.
weightflow builds survey weights by chaining
hierarchical adjustments with a tidymodels-style API, and
estimates their variances with a bootstrap that re-applies the whole
recipe on each replicate. It has no hard dependencies
(base R, R >= 4.1) and bridges to
survey/srvyr for design-based inference.
```r
# install.packages("remotes")
remotes::install_github("jpferreira33/weightflow")
```

A recipe is inert: building it computes nothing.
prep() walks the steps in order and estimates the
cascade of factors; collect_weights() extracts the final
weights. Separating define from apply makes the whole
process reproducible and auditable, and it is exactly what lets the
bootstrap re-run the entire cascade per replicate.
```r
library(weightflow)

recipe <- weighting_spec(sample_survey, base_weights = pw) |>
  step_unknown_eligibility(unknown = unknown_elig, by = "region") |>
  step_nonresponse(respondent = responded, method = "weighting_class",
                   by = c("region", "sex")) |>
  step_calibrate(method = "raking",
                 margins = list(region = c(table(population$region)),
                                sex    = c(table(population$sex))))

fitted <- prep(recipe)             # estimate the cascade
summary(fitted)                    # per-stage diagnostics + Kish deff
wts    <- collect_weights(fitted)  # data.frame with .weight
```

Adjustment steps, applied in the order you pipe them:
| Step | What it does |
|---|---|
| step_unknown_eligibility() | Redistribute unknown-eligibility cases among the known ones (person- or household-level via cluster). |
| step_drop_ineligible() | Zero out out-of-scope units. |
| step_select_within() | Within-household selection (unequal prob or equal n_eligible). |
| step_nonresponse() | Weighting classes or propensity (logit / CART / random forest), person- or household-level. |
| step_calibrate() | Raking, post-stratification, linear/GREG; bounded (Deville-Särndal) and integrative (one weight per household) options. |
| step_model_calibration() | Wu-Sitter model calibration with working models for the outcomes. |
| step_trim(), step_trim_weights() | Manual or automatic survey-style trimming, insertable anywhere. |
| step_round(), step_rescale() | Integer rounding and rescaling to a size or total. |
| step_assert() | Quality checkpoint on deff, weight ratio or effective n. |
Eligibility and response accept 0/1 dummy columns or any logical condition.
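To make two of the adjustments above concrete, here is a minimal base-R sketch of a weighting-class nonresponse factor followed by raking to two margins. It does not use weightflow and is not the package's implementation: the toy data, the population totals and the helper `rake()` are invented for illustration.

```r
# Toy file (hypothetical data): design weight, response flag, two margins.
toy <- data.frame(
  region    = c("N", "N", "N", "S", "S", "S", "S", "S"),
  sex       = c("F", "M", "F", "M", "F", "M", "F", "M"),
  pw        = c(10, 10, 10, 20, 20, 20, 20, 20),
  responded = c(1, 1, 0, 1, 1, 0, 1, 0)
)

# Weighting classes: within each region, respondents absorb the weight of
# the nonrespondents (factor = class total / respondent total).
class_total <- tapply(toy$pw, toy$region, sum)
resp_total  <- tapply(toy$pw * toy$responded, toy$region, sum)
nr_factor   <- as.numeric((class_total / resp_total)[toy$region])
toy$w_nr    <- toy$pw * nr_factor * toy$responded

# Raking (iterative proportional fitting): rescale the weights to each
# margin in turn until every margin matches its target.
rake <- function(w, groups, targets, tol = 1e-8, max_iter = 100) {
  for (iter in seq_len(max_iter)) {
    max_gap <- 0
    for (v in names(targets)) {
      current <- tapply(w, groups[[v]], sum)
      ratio   <- targets[[v]][names(current)] / current
      w       <- w * as.numeric(ratio[as.character(groups[[v]])])
      max_gap <- max(max_gap, abs(ratio - 1))
    }
    if (max_gap < tol) break  # all margins already matched this pass
  }
  w
}

resp    <- toy[toy$responded == 1, ]
targets <- list(region = c(N = 40, S = 90), sex = c(F = 70, M = 60))  # assumed totals
resp$w_cal <- rake(resp$w_nr, resp[c("region", "sex")], targets)

tapply(resp$w_cal, resp$region, sum)
tapply(resp$w_cal, resp$sex, sum)
```

The nonresponse step preserves the total design weight within each class, and the raked weights reproduce both sets of margins up to the tolerance.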
Diagnostics and reporting: summary()
and plot() show the per-stage cascade with the Kish
design effect (deff = 1 + CV²) and effective sample size;
weight_factors() returns the per-unit, per-step factors;
report_weighting() writes a self-contained HTML report —
pipeline diagram, variables used, per-stage summaries and per-step
visuals — with no graphics device or server required.
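The Kish design effect quoted above can be checked by hand. The sketch below is plain base R with hypothetical helper names (`kish_deff()`, `effective_n()`); it assumes the CV is computed with the population variance (divisor n), under which 1 + CV² equals n·Σw² / (Σw)² exactly. Whether weightflow uses that divisor is not stated here.

```r
# Kish design effect from unequal weighting: n * sum(w^2) / sum(w)^2
kish_deff <- function(w) length(w) * sum(w^2) / sum(w)^2

# Kish effective sample size: n / deff = sum(w)^2 / sum(w^2)
effective_n <- function(w) sum(w)^2 / sum(w^2)

w   <- c(10, 10, 20, 20, 40)                  # illustrative weights
cv2 <- mean((w - mean(w))^2) / mean(w)^2      # squared CV, population variance

kish_deff(w)     # 1.3
1 + cv2          # 1.3, the same quantity written as 1 + CV^2
effective_n(w)   # 5 / 1.3
```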
Variance estimation (see the Variance estimation article):
```r
boot <- bootstrap_weights(recipe, replicates = 500, strata = "region", psu = "psu")
boot_mean(boot, "income")                             # estimate, SE and CI
as_svydesign(fitted, ids = "psu", strata = "region")  # survey linearization
collect_replicate_weights(boot)                       # replicate weights, ready for srvyr
```

The bootstrap resamples PSUs within strata (Rao-Wu rescaling bootstrap) and re-applies the recipe on each replicate, so the replicate weights carry the variability of every adjustment.
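For readers unfamiliar with the Rao-Wu rescaling bootstrap, the resampling step alone can be sketched in base R as follows. This is an illustration under stated assumptions, not weightflow's code: the toy design, the outcome `y` and the helper `rao_wu_replicate()` are hypothetical, the common choice of drawing n_h - 1 PSUs per stratum is assumed, and the sketch stops at the rescaled design weights rather than re-applying any adjustments.

```r
set.seed(1)

# Toy design (hypothetical): 2 strata, 3 PSUs each, 2 units per PSU.
design <- data.frame(
  stratum = rep(c("A", "B"), each = 6),
  psu     = rep(1:6, each = 2),
  pw      = rep(c(10, 20), each = 6),
  y       = c(3, 5, 4, 6, 8, 7, 10, 12, 9, 11, 14, 13)
)

# One Rao-Wu replicate: in each stratum draw n_h - 1 PSUs with replacement
# and rescale the weights by n_h / (n_h - 1) * (times the PSU was drawn).
rao_wu_replicate <- function(data, strata, psu, weight) {
  w_rep <- numeric(nrow(data))
  for (h in unique(data[[strata]])) {
    in_h  <- data[[strata]] == h
    psus  <- unique(data[[psu]][in_h])
    n_h   <- length(psus)
    stopifnot(n_h >= 2)  # a single-PSU stratum cannot be resampled this way
    drawn <- psus[sample.int(n_h, n_h - 1, replace = TRUE)]
    hits  <- table(factor(drawn, levels = psus))
    r     <- as.numeric(hits[as.character(data[[psu]][in_h])])
    w_rep[in_h] <- data[[weight]][in_h] * n_h / (n_h - 1) * r
  }
  w_rep
}

# Replicate weights (one column per replicate) and a bootstrap SE for a mean.
reps      <- replicate(200, rao_wu_replicate(design, "stratum", "psu", "pw"))
theta_hat <- sum(design$pw * design$y) / sum(design$pw)
theta_rep <- colSums(reps * design$y) / colSums(reps)
boot_se   <- sqrt(mean((theta_rep - theta_hat)^2))
```

In this balanced toy design every replicate reproduces the original weight total, because the draw counts in each stratum sum to n_h - 1 and are scaled back up by n_h / (n_h - 1).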
Three bundled datasets: population (the frame),
sample_survey (take-all roster) and sample_one
(multistage select-one design), all with stratum, PSU and design weight,
so the full pipeline and the variance methods run natively.
apply_step() is the internal S3 generic behind each
step. To add an adjustment, define a step_*() constructor
(inert) and its apply_step.<class>() method — nothing
else changes.
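The constructor-plus-method pattern can be illustrated with a self-contained toy. Everything below is hypothetical: `toy_apply_step()` stands in for the package's internal generic, whose real arguments are not documented here and may differ, and `step_cap_demo()` is an invented step that caps weights at a maximum.

```r
# Toy stand-in for the generic (assumed signature, not weightflow's).
toy_apply_step <- function(step, weights, data) UseMethod("toy_apply_step")

# Hypothetical constructor: inert, it only records its arguments.
step_cap_demo <- function(max_weight) {
  structure(list(max_weight = max_weight),
            class = c("step_cap_demo", "toy_step"))
}

# The method does the actual work when the step is applied.
toy_apply_step.step_cap_demo <- function(step, weights, data) {
  pmin(weights, step$max_weight)
}

# Minimal driver: walk the steps in order, dispatching on each step's class.
toy_prep <- function(weights, data, steps) {
  for (s in steps) weights <- toy_apply_step(s, weights, data)
  weights
}

out <- toy_prep(c(5, 12, 30), data = NULL, steps = list(step_cap_demo(20)))
out  # 5 12 20
```

The driver never names a specific step, which is why adding a new class and its method is enough to extend the pipeline.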
General framework
Nonresponse
Calibration
Design effect and trimming
Variance estimation
MIT © Juan Pablo Ferreira