WDL and WIG Model Specs

In this vignette, I will show how to set up the control parameters (hyper-parameters) needed for the WDL and WIG models.

wdl_specs() returns a list of lists with five parts: wdl_control, tokenizer_control, word2vec_control, barycenter_control, and optimizer_control. wig_specs() additionally takes wig_control, which is described first.
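As a rough picture of this layout, the sketch below hand-builds a nested list with the five part names. It is not the output of wdl_specs(): parts whose defaults are not stated in this vignette are left as empty lists, and `specs` is a hypothetical variable name.

```r
# Hand-built stand-in for the list-of-lists layout (NOT the real wdl_specs() output).
specs <- list(
  wdl_control        = list(),  # defaults not listed in this vignette
  tokenizer_control  = list(),  # options forwarded to the tokenizer
  word2vec_control   = list(type = "cbow", dim = 10, min_count = 1),
  barycenter_control = list(with_grad = TRUE),
  optimizer_control  = list(
    optimizer = "adamw", lr = .005, decay = .01,
    beta1 = .9, beta2 = .999, eps = 1e-8
  )
)

# Each part is itself a list, so a single option is reached with two `$`.
names(specs)
specs$optimizer_control$lr
```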

`wig_control`

These options are needed only for wig_specs(). The defaults are:

wig_control = list(
  group_unit = "month",
  svd_method = "docs",
  standardize = TRUE
)

group_unit sets the time level at which documents are grouped; it is passed to lubridate::floor_date() as the unit argument. The default is “month”, which yields a monthly time-series index. Any other value accepted by the unit argument of lubridate::floor_date() can be used.
svd_method can be either “docs” or “topics”. With “docs”, the truncated SVD (TSVD) is applied to the reconstructed documents to obtain the index directly, whereas with “topics” the TSVD is applied to the topics matrix before the index is constructed. The latter is the approach originally proposed in Xie (2020).
standardize is a boolean indicating whether to standardize the resulting index to mean 100 and standard deviation 1. It defaults to TRUE, following Baker et al. (2016) and Xie (2020).
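To make group_unit and standardize concrete, here is a minimal sketch on made-up data. The dates and raw values are hypothetical, averaging within each month is an assumption made for illustration, and the rescaling formula is one plausible reading of "mean 100 and standard deviation 1", not necessarily the package's internal code.

```r
library(lubridate)

# Hypothetical document dates and a made-up raw index, for illustration only.
dates <- as.Date(c("2020-01-03", "2020-01-28", "2020-02-14", "2020-03-09"))
raw_index <- c(0.8, 1.1, 1.4, 0.7)

# group_unit = "month": every date is floored to the first day of its month.
groups <- floor_date(dates, unit = "month")

# Assumed aggregation: average the raw values within each month.
monthly <- tapply(raw_index, groups, mean)

# standardize = TRUE, under the assumed formula: rescale to mean 100, sd 1.
standardized <- (monthly - mean(monthly)) / sd(monthly) + 100

mean(standardized)  # 100, up to floating-point error
sd(standardized)    # 1, up to floating-point error
```

Replacing "month" with another unit accepted by floor_date(), such as "quarter", changes only the grouping step.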

`wdl_control`

These options control the WDL model itself and are used by both wdl_specs() and wig_specs().

num_topics: number of topics for the topic modeling
batch_size: batch size used during training
epochs: number of epochs (i.e. passes over the training data)
shuffle: bool, whether to shuffle the input data randomly
verbose: bool, whether to print out useful diagnostic information
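The following sketch shows how these options fit together. The values are illustrative and are not the package defaults (which this vignette does not list); the corpus size is hypothetical, and the batch count assumes the last, smaller batch is kept.

```r
# Illustrative values only; these are NOT the package defaults.
wdl_control <- list(
  num_topics = 4,
  batch_size = 64,
  epochs     = 5,
  shuffle    = TRUE,
  verbose    = FALSE
)

# With a hypothetical corpus of 1,000 documents:
n_docs <- 1000

# Batches per pass over the data (assuming a final partial batch is kept).
batches_per_epoch <- ceiling(n_docs / wdl_control$batch_size)  # 16

# Total number of optimizer updates across all epochs.
total_updates <- batches_per_epoch * wdl_control$epochs        # 80
```

A smaller batch_size or more epochs both raise the number of updates, which is worth keeping in mind when tuning the learning rate.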

`tokenizer_control`

Arguments for tokenizers::tokenize_word_stems().
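As a hedged illustration, the sketch below builds a tokenizer_control list using the stopwords argument of tokenizers::tokenize_word_stems() and forwards it with do.call(). The sample sentences are made up, and forwarding via do.call() is an assumption about how such a list could be used, not a description of the package internals.

```r
library(tokenizers)  # tokenize_word_stems() also needs the SnowballC package

# Made-up example documents.
docs <- c(
  "The markets were reacting to uncertain policies.",
  "Policy uncertainty affects investment decisions."
)

# Options named after arguments of tokenize_word_stems().
tokenizer_control <- list(stopwords = c("the", "to", "were"))

# Forward the options alongside the text; the result is a list with
# one character vector of stemmed tokens per document.
tokens <- do.call(tokenize_word_stems, c(list(x = docs), tokenizer_control))

lengths(tokens)
```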

`word2vec_control`

Arguments for word2vec::word2vec(), but with the following default parameters:

type = "cbow"
dim = 10
min_count = 1
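One way to override a single embedding option while keeping the rest is utils::modifyList(), sketched below. The `defaults` list simply restates the three values above; `window` is a genuine word2vec::word2vec() argument, and the new values are arbitrary examples.

```r
# The three defaults stated in this section.
defaults <- list(type = "cbow", dim = 10, min_count = 1)

# Override the embedding dimension and add a context window size;
# entries not mentioned in the second list are kept unchanged.
word2vec_control <- utils::modifyList(defaults, list(dim = 50, window = 5))

str(word2vec_control)
```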

`barycenter_control`

Identical to barycenter_control in the barycenter() function, but with the default

with_grad = TRUE

`optimizer_control`

Parameters to control the optimizer (SGD, Adam, AdamW).

optimizer_control = list(
  optimizer = "adamw",
  lr = .005,
  decay = .01,
  beta1 = .9,
  beta2 = .999,
  eps = 1e-8
)

The default optimizer is AdamW (“adamw”), but you can also choose vanilla SGD (“sgd”) or vanilla Adam (“adam”). You can also tune the learning rate lr as part of your hyper-parameter search.
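A small sketch of preparing candidate settings for such a search: it copies the default list shown above and varies only lr, or only the optimizer. The grid values and the variable names are hypothetical, and how each candidate is then passed to the model is left out.

```r
# Defaults as listed above.
base_optimizer <- list(
  optimizer = "adamw", lr = .005, decay = .01,
  beta1 = .9, beta2 = .999, eps = 1e-8
)

# Hypothetical grid of learning rates to try.
lr_grid <- c(.001, .005, .01)

# One optimizer_control list per candidate; only lr differs.
candidates <- lapply(lr_grid, function(lr) {
  utils::modifyList(base_optimizer, list(lr = lr))
})

# Switching the optimizer works the same way.
sgd_control <- utils::modifyList(base_optimizer, list(optimizer = "sgd"))
```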

Most users should leave the other defaults untouched unless they know exactly what they are doing. For reference, see Section 7.1 in Xie (2025) and the references therein.
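For readers curious about what decay, beta1, beta2, and eps do, here is a minimal sketch of the textbook AdamW update with decoupled weight decay. It is a generic illustration, not the package's implementation; `adamw_step`, the toy objective, and the starting values are all hypothetical.

```r
# One textbook AdamW update (generic sketch, not the package's code).
adamw_step <- function(theta, grad, state, ctrl) {
  t <- state$t + 1
  # Exponential moving averages of the gradient and its square.
  m <- ctrl$beta1 * state$m + (1 - ctrl$beta1) * grad
  v <- ctrl$beta2 * state$v + (1 - ctrl$beta2) * grad^2
  # Bias correction for the zero-initialized averages.
  m_hat <- m / (1 - ctrl$beta1^t)
  v_hat <- v / (1 - ctrl$beta2^t)
  # Adaptive step plus decoupled weight decay; eps guards against division by zero.
  theta <- theta - ctrl$lr * (m_hat / (sqrt(v_hat) + ctrl$eps) + ctrl$decay * theta)
  list(theta = theta, state = list(m = m, v = v, t = t))
}

ctrl <- list(lr = .005, decay = .01, beta1 = .9, beta2 = .999, eps = 1e-8)

# Toy objective f(theta) = 0.5 * sum(theta^2), whose gradient is theta itself.
theta <- c(1, -2)
state <- list(m = 0, v = 0, t = 0)
for (i in 1:100) {
  out <- adamw_step(theta, grad = theta, state, ctrl)
  theta <- out$theta
  state <- out$state
}
theta  # both entries have moved toward zero
```

Because the adaptive term is roughly scale-free, each step changes a parameter by about lr, which is one reason lr is usually the first setting to tune.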

References

Baker, S. R., Bloom, N., & Davis, S. J. (2016). Measuring economic policy uncertainty. The Quarterly Journal of Economics, 131(4), 1593–1636. https://doi.org/10.1093/qje/qjw024

Peyré, G., & Cuturi, M. (2019). Computational Optimal Transport: With Applications to Data Science. Foundations and Trends® in Machine Learning, 11(5–6), 355–607. https://doi.org/10.1561/2200000073

Schmitz, M. A., Heitz, M., Bonneel, N., Ngolè, F., Coeurjolly, D., Cuturi, M., Peyré, G., & Starck, J.-L. (2018). Wasserstein dictionary learning: Optimal transport-based unsupervised nonlinear dictionary learning. SIAM Journal on Imaging Sciences, 11(1), 643–678. https://doi.org/10.1137/17M1140431

Xie, F. (2020). Wasserstein index generation model: Automatic generation of time-series index with application to economic policy uncertainty. Economics Letters, 186, 108874. https://doi.org/10.1016/j.econlet.2019.108874

Xie, F. (2025). Deriving the Gradients of Some Popular Optimal Transport Algorithms (No. arXiv:2504.08722). arXiv. https://doi.org/10.48550/arXiv.2504.08722
