The Introduction to synthACS vignette briefly mentions the split() and combine_smsm() functionality in Sections 3.2 and 3.4, respectively. There, we note that deriving the sample synthetic micro data is a memory-intensive process and advise using synthACS on a high-performance machine. Of course, such a machine is not always available, which is when split() and combine_smsm() are needed.
This vignette provides a brief illustration of these two functions, using the same example data as the introductory vignette:
library(data.table)
library(acs)
library(synthACS)
library(retry)
ca_geo <- geo.make(state = "CA", county = "*")
ca_dat_SMSM <- pull_synth_data(2014, 5, ca_geo)
split() and combine_smsm()
The split() and combine_smsm() functions are used, respectively, to break a large spatial microsimulation task into a set of smaller, less memory-intensive tasks and to recombine the results. Together they implement the well-known “split-apply-combine” strategy for data analysis (Wickham 2011). Here, the “apply” step is intentionally performed sequentially, rather than inside another function, in order to minimize RAM usage and allow a garbage-collection step between intensive in-memory function calls.
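As a generic illustration of this pattern, the sketch below splits a toy vector into four chunks, applies a stand-in function to each chunk in a plain for loop, calls gc() between iterations, and recombines the pieces. The function expensive_step() is hypothetical and merely stands in for memory-intensive calls such as derive_synth_datasets(); only base R is used, so the base split() here is not synthACS's method.

```r
# Hypothetical stand-in for a memory-intensive step (not part of synthACS)
expensive_step <- function(chunk) {
  data.frame(id = chunk, value = chunk^2)
}

# split: divide 100 ids into 4 chunks of 25 (base R split(), not synthACS's method)
chunks <- split(1:100, rep(1:4, each = 25))

# apply: process chunks one at a time, collecting garbage in between
results <- vector("list", length = length(chunks))
for (i in seq_along(chunks)) {
  results[[i]] <- expensive_step(chunks[[i]])
  gc()  # release memory that is no longer referenced before the next chunk
}

# combine: stack the per-chunk results into a single data frame
combined <- do.call(rbind, results)
```

Because each iteration runs at the top level of the loop, objects from the previous chunk that are overwritten or removed can be reclaimed before the next expensive call starts.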
The syntax for both is straightforward:
split(<object>, n_splits = N)
combine_smsm(<object1>, <object2>, ..., <objectk>)
split() takes a larger macroACS class object and splits it into n_splits smaller macroACS objects. Similarly, combine_smsm() takes several smaller smsm_set objects and combines them into a single, larger smsm_set class object.
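The exact rule synthACS uses to assign geographies to groups is not described here. As a hypothetical illustration of the general idea only, the base R sketch below partitions the 58 California counties (represented simply by the indices 1 to 58) into n_splits = 20 contiguous groups of nearly equal size.

```r
n_counties <- 58   # California has 58 counties
n_splits   <- 20

# cut() with labels = FALSE assigns each index to one of n_splits equal-width bins
group_id <- cut(seq_len(n_counties), breaks = n_splits, labels = FALSE)

# base R split() turns the bin assignments into a list of index vectors
groups <- split(seq_len(n_counties), group_id)

lengths(groups)  # each group holds 2 or 3 counties
```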
An example of this is provided below:
# split()
n_splits <- 20
split_ca_dat <- split(ca_dat_SMSM, n_splits = n_splits)
tmp_opts <- vector("list", length = n_splits)

for (i in 1:n_splits) {
  # Section 3.3 of introduction: SMSM via simulated annealing
  # derive synthetic datasets
  tmp_synth <- derive_synth_datasets(split_ca_dat[[i]], leave_cores = 0)

  # create constraints for simulated annealing
  a <- all_geog_constraint_age(tmp_synth, method = "macro.table")
  g <- all_geog_constraint_gender(tmp_synth, method = "macro.table")
  m <- all_geog_constraint_marital_status(tmp_synth, method = "macro.table")
  r <- all_geog_constraint_race(tmp_synth, method = "synthetic")
  e <- all_geog_constraint_edu(tmp_synth, method = "synthetic")

  cll <- all_geogs_add_constraint(attr_name = "age", attr_total_list = a,
                                  macro_micro = tmp_synth)
  cll <- all_geogs_add_constraint(attr_name = "gender", attr_total_list = g,
                                  macro_micro = tmp_synth, constraint_list_list = cll)
  cll <- all_geogs_add_constraint(attr_name = "marital_status", attr_total_list = m,
                                  macro_micro = tmp_synth, constraint_list_list = cll)
  cll <- all_geogs_add_constraint(attr_name = "race", attr_total_list = r,
                                  macro_micro = tmp_synth, constraint_list_list = cll)
  cll <- all_geogs_add_constraint(attr_name = "edu_attain", attr_total_list = e,
                                  macro_micro = tmp_synth, constraint_list_list = cll)

  # anneal
  tmp_opts[[i]] <- all_geog_optimize_microdata(tmp_synth, seed = 6550L, verbose = TRUE,
                                               constraint_list_list = cll, p_accept = 0.4,
                                               max_iter = 10000L)
}
# create the string needed for combine_smsm()
# (paste0() has no sep argument; collapse = ", " joins the pieces without a trailing comma)
paste0("tmp_opts[[", 1:n_splits, "]]", collapse = ", ")
# [1] "tmp_opts[[1]], tmp_opts[[2]], tmp_opts[[3]], tmp_opts[[4]], tmp_opts[[5]],
# tmp_opts[[6]], tmp_opts[[7]], tmp_opts[[8]], tmp_opts[[9]], tmp_opts[[10]],
# tmp_opts[[11]], tmp_opts[[12]], tmp_opts[[13]], tmp_opts[[14]], tmp_opts[[15]],
# tmp_opts[[16]], tmp_opts[[17]], tmp_opts[[18]], tmp_opts[[19]], tmp_opts[[20]]"
# copy and paste the resulting string into the combine_smsm() call
opt_ca <- combine_smsm(tmp_opts[[1]], tmp_opts[[2]], tmp_opts[[3]], tmp_opts[[4]], tmp_opts[[5]],
                       tmp_opts[[6]], tmp_opts[[7]], tmp_opts[[8]], tmp_opts[[9]], tmp_opts[[10]],
                       tmp_opts[[11]], tmp_opts[[12]], tmp_opts[[13]], tmp_opts[[14]],
                       tmp_opts[[15]], tmp_opts[[16]], tmp_opts[[17]], tmp_opts[[18]],
                       tmp_opts[[19]], tmp_opts[[20]])
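Copying and pasting the generated string works, but the same call can be built programmatically with base R's do.call(), which passes each element of a list as a separate argument. Assuming combine_smsm() accepts its smsm_set objects through ..., as the syntax shown earlier suggests, do.call(combine_smsm, tmp_opts) should be equivalent to the explicit call above. The sketch below demonstrates the pattern with a hypothetical stand-in, combine_parts(), so that it runs without synthACS.

```r
# Hypothetical stand-in with the same calling style as combine_smsm(...)
combine_parts <- function(...) {
  parts <- list(...)
  unlist(parts, use.names = FALSE)
}

tmp_parts <- list(1:3, 4:6, 7:9)

# explicit call, as produced by copy-and-paste
explicit <- combine_parts(tmp_parts[[1]], tmp_parts[[2]], tmp_parts[[3]])

# do.call() passes each list element as a separate positional argument
programmatic <- do.call(combine_parts, tmp_parts)

identical(explicit, programmatic)  # TRUE
```

Under that assumption, the hedged equivalent for this vignette would be opt_ca <- do.call(combine_smsm, tmp_opts), which avoids editing the call by hand whenever n_splits changes.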