The goal of surveysd is to combine all the steps needed to use calibrated bootstrapping with custom estimation functions. This vignette covers the usage of the most important functions. For insight into the theory behind this package, refer to vignette("methodology").
A test data set based on data(eusilc, package = "laeken") can be created with demo.eusilc().
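library(surveysd)
set.seed(1234)
# simulate two reference periods (n = 2) with readable column names
eusilc <- demo.eusilc(n = 2, prettyNames = TRUE)
# first five rows of selected columns (data.table syntax)
eusilc[1:5, .(year, povertyRisk, gender, pWeight)]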
year | povertyRisk | gender | pWeight |
---|---|---|---|
2010 | FALSE | female | 504.5696 |
2010 | FALSE | male | 504.5696 |
2010 | FALSE | male | 504.5696 |
2010 | FALSE | female | 493.3824 |
2010 | FALSE | male | 493.3824 |
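Because demo.eusilc(n = 2) produces data for two reference periods, a quick sanity check is to count the observations and sum the weights per year. This is a minimal sketch in data.table syntax; the summary column names n and N are chosen here for illustration. The counts should agree with the n and N columns of the estimates reported later in this vignette.

```r
library(surveysd)
library(data.table)  # demo.eusilc() returns a data.table

set.seed(1234)
eusilc <- demo.eusilc(n = 2, prettyNames = TRUE)

# sample size (n) and weighted population size (N) per reference period
period_sizes <- eusilc[, .(n = .N, N = sum(pWeight)), by = year]
period_sizes
```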
Use stratified resampling without replacement to generate 10 bootstrap samples. These samples are consistent with respect to the reference periods.
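The resampling step can be sketched with draw.bootstrap(). The call below assumes the column names produced by demo.eusilc(prettyNames = TRUE): households are identified by hid, design weights are stored in pWeight, strata are given by region, and the reference period by year. It adds the replicate weights as new columns w1, ..., w10 and yields the dat_boot object used in the calibration step.

```r
library(surveysd)
library(data.table)

set.seed(1234)
eusilc <- demo.eusilc(n = 2, prettyNames = TRUE)

# draw 10 replicates, stratified by region and resampled on household level (hid);
# period = "year" keeps the replicates consistent across reference periods
dat_boot <- draw.bootstrap(eusilc, REP = 10, hid = "hid", weights = "pWeight",
                           strata = "region", period = "year")

# the replicate weights are stored in the new columns w1, ..., w10
dat_boot[1:5, .(year, hid, pWeight, w1, w2, w3)]
```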
Calibrate each sample according to the distribution of gender (on person level) and region (on household level).
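# calibrate the replicate weights: gender on person level, region on household level;
# epsP and epsH are the convergence limits for the two levels
dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region",
                          epsP = 1e-2, epsH = 2.5e-2, verbose = TRUE)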
## Convergence reached in 3 steps
##
## Convergence reached in 3 steps
## Convergence reached in 2 steps
##
## Convergence reached in 2 steps
## Convergence reached in 3 steps
##
## Convergence reached in 3 steps
## Convergence reached in 2 steps
## Convergence reached in 3 steps
## Convergence reached in 2 steps
##
## Convergence reached in 2 steps
year | povertyRisk | gender | pWeight | w1 | w2 | w3 | w4 |
---|---|---|---|---|---|---|---|
2010 | FALSE | female | 504.5696 | 1024.316 | 1.492932 | 1.452077 | 1.472965 |
2010 | FALSE | male | 504.5696 | 1024.316 | 1.492932 | 1.452077 | 1.472965 |
2010 | FALSE | male | 504.5696 | 1024.316 | 1.492932 | 1.452077 | 1.472965 |
2011 | FALSE | female | 504.5696 | 1023.818 | 1.538197 | 1.493209 | 1.501502 |
2011 | FALSE | male | 504.5696 | 1023.818 | 1.538197 | 1.493209 | 1.501502 |
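To see what the calibration achieves, one can compare the totals of the original weights with those of the calibrated replicate weights, per year and gender. This sketch assumes that recalib() takes its person-level targets from pWeight when no explicit totals (conP) are supplied. Under that assumption, the relative differences should be small, roughly within epsP. The summary column names are illustrative.

```r
library(surveysd)
library(data.table)

set.seed(1234)
eusilc <- demo.eusilc(n = 2, prettyNames = TRUE)
dat_boot <- draw.bootstrap(eusilc, REP = 10, hid = "hid", weights = "pWeight",
                           strata = "region", period = "year")
dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region",
                          epsP = 1e-2, epsH = 2.5e-2)

# person-level totals by year and gender: original weights vs. two replicates
gender_check <- dat_boot_calib[, .(total_pWeight = sum(pWeight),
                                   total_w1 = sum(w1),
                                   total_w2 = sum(w2)),
                               by = .(year, gender)]

# relative deviation of each replicate total from the original total
gender_check[, `:=`(relDiff1 = total_w1 / total_pWeight - 1,
                    relDiff2 = total_w2 / total_pWeight - 1)]
gender_check
```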
Estimate the share of persons at risk of poverty per period and gender.
err.est <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio, group = "gender")
err.est$Estimates
year | n | N | gender | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|
2010 | 7267 | 3979572 | male | 12.02660 | 0.5870804 |
2010 | 7560 | 4202650 | female | 16.73351 | 0.7460045 |
2010 | 14827 | 8182222 | NA | 14.44422 | 0.6613327 |
2011 | 7267 | 3979572 | male | 12.81921 | 0.6050084 |
2011 | 7560 | 4202650 | female | 16.62488 | 0.7344174 |
2011 | 14827 | 8182222 | NA | 14.77393 | 0.6622016 |
The output contains estimates (val_povertyRisk) as well as standard errors (stE_povertyRisk), both measured in percent. Rows where gender is NA hold the estimate for the whole population of the respective period.
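The standard error reflects the spread of the estimate across the calibrated replicates. The sketch below recomputes the 2010 estimate for the whole population by hand: weightedRatio() with the original weights gives the point estimate, and the sample standard deviation of the estimates obtained with w1, ..., w10 gives a bootstrap standard error. Treating the reported stE as exactly this standard deviation is an assumption. Small differences can arise if calc.stError() applies further adjustments.

```r
library(surveysd)
library(data.table)

set.seed(1234)
eusilc <- demo.eusilc(n = 2, prettyNames = TRUE)
dat_boot <- draw.bootstrap(eusilc, REP = 10, hid = "hid", weights = "pWeight",
                           strata = "region", period = "year")
dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region",
                          epsP = 1e-2, epsH = 2.5e-2)

d2010 <- dat_boot_calib[year == 2010]

# point estimate: weighted share (in percent) using the original weights
point <- weightedRatio(d2010$povertyRisk, d2010$pWeight)

# replicate estimates: the same statistic, once per calibrated bootstrap weight
rep_cols <- paste0("w", 1:10)
rep_est <- sapply(rep_cols, function(w) weightedRatio(d2010$povertyRisk, d2010[[w]]))

# bootstrap standard error: sample standard deviation of the replicate estimates
se_manual <- sd(rep_est)
c(estimate = point, stE = se_manual)
```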
Estimate the share of persons at risk of poverty per period for each region, each gender, and each combination of both.
# one grouping per list element: gender, region, and the gender-region combination
group <- list("gender", "region", c("gender", "region"))
err.est <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio, group = group)
head(err.est$Estimates)
year | n | N | gender | region | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|---|
2010 | 261 | 122741.8 | male | Burgenland | 17.414524 | 3.814464 |
2010 | 288 | 137822.2 | female | Burgenland | 21.432598 | 3.228845 |
2010 | 359 | 182732.9 | male | Vorarlberg | 12.973259 | 1.862122 |
2010 | 374 | 194622.1 | female | Vorarlberg | 19.883637 | 3.101161 |
2010 | 440 | 253143.7 | male | Salzburg | 9.156964 | 1.804527 |
2010 | 484 | 282307.3 | female | Salzburg | 17.939382 | 2.579708 |
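Custom estimators plug into calc.stError() through fun, which expects a function with arguments x (the variable) and w (the weights). The function weightedMeanIncome() below is a hypothetical example that computes the weighted mean of eqIncome; it assumes that demo.eusilc(prettyNames = TRUE) provides eqIncome as a numeric column.

```r
library(surveysd)
library(data.table)

set.seed(1234)
eusilc <- demo.eusilc(n = 2, prettyNames = TRUE)
dat_boot <- draw.bootstrap(eusilc, REP = 10, hid = "hid", weights = "pWeight",
                           strata = "region", period = "year")
dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region",
                          epsP = 1e-2, epsH = 2.5e-2)

# hypothetical custom estimator: weighted mean of x, ignoring missing values;
# calc.stError() evaluates it with the original and with each replicate weight
weightedMeanIncome <- function(x, w) {
  ok <- !is.na(x)
  sum(x[ok] * w[ok]) / sum(w[ok])
}

err.inc <- calc.stError(dat_boot_calib, var = "eqIncome", fun = weightedMeanIncome,
                        group = "gender")
err.inc$Estimates
```

The output has the same layout as for povertyRisk, with the columns val_eqIncome and stE_eqIncome, now measured in the units of eqIncome rather than in percent.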