The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
This vignette documents some “best practices” for anticlustering
using the R package anticlust
. In many cases, the
suggestions pertain to overriding the default values of arguments of
anticlustering()
, which seems to be a difficult decision
for users. However, I advise you: Do not stick with the defaults; check
out the results of different anticlustering specifications; repeat the
process; play around; read the documentation (especially
?anticlustering
); change arguments arbitrarily; compare the
output. Nothing can break.1
This document uses somewhat imperative language; nuance and explanations are given in the package documentation, the other vignettes, and the papers by Papenberg and Klau (2021; https://doi.org/10.1037/met0000301) and Papenberg (2024; https://doi.org/10.1111/bmsp.12315). Note that deciding which anticlustering objective to use usually requires substantial content considerations and cannot be reduced to “which one is better”. However, some hints are given below.
method = "local-maximum"
instead of the default
method = "exchange"
. It is unambiguously
better.repetitions
.standardize = TRUE
instead of the default
standardize = FALSE
.2objective = "diversity"
when the
group sizes are not equal (preferably, use
objective = "kplus"
or
objective = "average-diversity"
).objective = "variance"
.objective = "kplus"
over
objective = "variance"
(or check out the function
kplus_anticlustering()
).objective = "kplus"
instead of the default
objective = "diversity"
.standardize = TRUE
.Papenberg, M., & Klau, G. W. (2021). Using anticlustering to partition data sets into equivalent parts. Psychological Methods, 26(2), 161–174. https://doi.org/10.1037/met0000301.
Papenberg, M. (2024). K-plus Anticlustering: An Improved k-means Criterion for Maximizing Between-Group Similarity. British Journal of Mathematical and Statistical Psychology, 77 (1), 80–102. https://doi.org/10.1111/bmsp.12315
Well, actually your R session can break if you use an
optimal method (method = "ilp"
) with a data set that is too
large.↩︎
You might ask why standardize = TRUE
is not
the default. Actually, there are two reasons. First, the argument was
not always available in anticlust
and changing the default
behaviour of a function when releasing a new version is oftentimes
undesirable. Second, it seems like a big decision to me to just change
users’ data by default (which is done when standardizing the data). In
doubt, just compare the results of using standardize = TRUE
and standardize = FALSE
and decide for yourself which you
like best. Standardization may not be the best choice in all settings.↩︎
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.