This vignette demonstrates the methods of splitting data that are
supported by the splithalfr package. Each splitting method is
illustrated by calling by_split with the right arguments,
printing to the terminal what data is in each of the two parts produced
by a split. For a comprehensive review of each splitting method, see
Pronk et al. (2021).
We’ll use this example dataset with eight trials of one participant, each trial having a condition and an rt variable.
ds <- data.frame(
  participant = rep(1, 8),
  condition = rep(c("a", "b"), each = 4),
  rt = 100 * 1:8
)
First-second splitting assigns trials of the first half of rows to
one part and trials of the second half of rows to the other (Green et al., 2016;
Webb, Shavelson, & Haertel, 1996; Williams & Kaufmann, 2012). For this
splitting method, set method to first_second.
dummy <- by_split(
  ds,
  ds$participant,
  method = "first_second",
  function(ds) { print(ds) },
  ncores = 1,
  verbose = FALSE
)
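To make the assignment rule concrete, here is a minimal base-R sketch of a first-second split on the same example data. It is only an illustration of the idea, not how splithalfr implements it; the names half, first_part, and second_part are ours, and giving the extra row to the first part when the number of rows is odd is an assumption.

```r
# Illustration only: first-second splitting in base R, not splithalfr internals.
ds <- data.frame(
  participant = rep(1, 8),
  condition = rep(c("a", "b"), each = 4),
  rt = 100 * 1:8
)

n <- nrow(ds)
# Assumption: with an odd number of rows, the extra row goes to the first part.
half <- ceiling(n / 2)

first_rows <- seq_len(half)
second_rows <- setdiff(seq_len(n), first_rows)

first_part <- ds[first_rows, ]
second_part <- ds[second_rows, ]

print(first_part)
print(second_part)
```

With eight rows, rows 1 to 4 end up in one part and rows 5 to 8 in the other.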
Odd-even splitting assigns trials with an odd row number to one part
and trials with an even row number to the other (Green et al., 2016;
Webb, Shavelson, & Haertel, 1996; Williams & Kaufmann, 2012). For this
splitting method, set method to odd_even.
dummy <- by_split(
  ds,
  ds$participant,
  method = "odd_even",
  function(ds) { print(ds) },
  ncores = 1,
  verbose = FALSE
)
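The odd-even rule can be sketched in base R as follows. Again, this is an illustration rather than the package's own code, and the variable names are hypothetical.

```r
# Illustration only: odd-even splitting in base R, not splithalfr internals.
ds <- data.frame(
  participant = rep(1, 8),
  condition = rep(c("a", "b"), each = 4),
  rt = 100 * 1:8
)

row_number <- seq_len(nrow(ds))

# Rows 1, 3, 5, ... form one part; rows 2, 4, 6, ... form the other.
odd_part <- ds[row_number %% 2 == 1, ]
even_part <- ds[row_number %% 2 == 0, ]

print(odd_part)
print(even_part)
```

Note that, unlike first-second splitting, each part here contains trials from both conditions, because the conditions are blocked in the example data.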
Permutated splitting is also known as random splitting (Kopp, Lange, &
Steinke, 2021), bootstrapped splitting (Parsons, Kruijt, & Fox, 2019),
and random sample of split halves (Williams & Kaufmann, 2012). It assigns
trials to each part via random sampling without replacement. This
splitting method is the default, but you can make it explicit by setting
method to random. In practice, random splits are averaged over many
replications, but for illustration we’re only printing one.
dummy <- by_split(
  ds,
  ds$participant,
  method = "random",
  replications = 1,
  function(ds) { print(ds) },
  ncores = 1,
  verbose = FALSE
)
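As a rough base-R sketch of what sampling without replacement means here: half of the row numbers are drawn at random for one part, and the remaining rows form the other part. This is an illustration under our own assumptions (the seed, the variable names, and rounding down when the number of rows is odd), not the package's implementation.

```r
# Illustration only: permutated (random) splitting in base R.
ds <- data.frame(
  participant = rep(1, 8),
  condition = rep(c("a", "b"), each = 4),
  rt = 100 * 1:8
)

set.seed(1)  # arbitrary seed, only to make the illustration reproducible
n <- nrow(ds)

# Draw half of the row numbers without replacement for the first part.
in_first <- sort(sample(n, floor(n / 2)))
# Every row that was not drawn goes to the second part.
in_second <- setdiff(seq_len(n), in_first)

part_1 <- ds[in_first, ]
part_2 <- ds[in_second, ]

print(part_1)
print(part_2)
```

Each trial appears in exactly one of the two parts; which part it lands in changes from one replication to the next.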
Monte Carlo splitting assigns trials to each part by sampling with
replacement (Williams & Kaufmann, 2012). To construct parts of any
length, use the split_p argument and set replace to TRUE. The example
below constructs two parts of the same length as the original dataset by
setting split_p to 1.
dummy <- by_split(
  ds,
  ds$participant,
  method = "random",
  replace = TRUE,
  split_p = 1,
  replications = 1,
  function(ds) { print(ds) },
  ncores = 1,
  verbose = FALSE
)
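The idea of sampling with replacement can be sketched in base R as below. We assume here that the two parts are drawn independently of each other and that the part length is split_p times the number of rows; both are assumptions made for illustration, not statements about splithalfr's internals.

```r
# Illustration only: Monte Carlo splitting in base R.
ds <- data.frame(
  participant = rep(1, 8),
  condition = rep(c("a", "b"), each = 4),
  rt = 100 * 1:8
)

set.seed(1)  # arbitrary seed, only to make the illustration reproducible
n <- nrow(ds)
split_p <- 1                      # proportion of the dataset per part
part_size <- round(split_p * n)   # assumption: part length is split_p * n

# Assumption: each part is an independent draw with replacement,
# so a trial can occur several times in a part, or in both parts.
part_1 <- ds[sample(n, part_size, replace = TRUE), ]
part_2 <- ds[sample(n, part_size, replace = TRUE), ]

print(part_1)
print(part_2)
```

Because trials are drawn with replacement, the parts are no longer limited to half the dataset and may overlap.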
If a split is stratified by a variable, then trials are separately
assigned to each part for each level of that variable (Green et al.,
2016). For example, if splits are stratified by ds$condition, the trials
with condition a and the trials with condition b are split separately.
Stratification can be used in combination with any of the methods above.
For illustration, we combine it with first-second splitting.
dummy <- by_split(
  ds,
  ds$participant,
  method = "first_second",
  stratification = ds$condition,
  function(ds) { print(ds) },
  ncores = 1,
  verbose = FALSE
)
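A stratified first-second split can be sketched in base R by first dividing the data by condition, splitting each stratum on its own, and then recombining the halves. This is an illustration of the concept only; the helper structure and names are hypothetical.

```r
# Illustration only: stratified first-second splitting in base R.
ds <- data.frame(
  participant = rep(1, 8),
  condition = rep(c("a", "b"), each = 4),
  rt = 100 * 1:8
)

# One data frame per level of the stratification variable.
strata <- split(ds, ds$condition)

# Apply the first-second rule within each stratum.
halves <- lapply(strata, function(s) {
  rows <- seq_len(nrow(s))
  first_rows <- seq_len(ceiling(nrow(s) / 2))
  list(
    first = s[first_rows, ],
    second = s[setdiff(rows, first_rows), ]
  )
})

# Recombine the per-stratum halves into the two parts.
part_1 <- do.call(rbind, lapply(halves, `[[`, "first"))
part_2 <- do.call(rbind, lapply(halves, `[[`, "second"))

print(part_1)
print(part_2)
```

Compared with the unstratified first-second split, each part now contains two trials of condition a and two of condition b.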
In a subsampled split, a subset of the trials is randomly sampled without replacement and then split (see the supplementary materials of Hedge, Powell, & Sumner, 2018). Subsampling only works well with splitting methods that use random sampling (permutated and Monte Carlo). Because the subsampling procedure already randomizes which trials are selected for splitting, splitting methods that assign trials to parts based on their row number, such as first-second and odd-even, should give results that are similar to permutated splitting. Any stratification is applied to both the subsampling and the splitting.
dummy <- by_split(
  ds,
  ds$participant,
  method = "random",
  stratification = ds$condition,
  subsample_p = 0.5,
  function(ds) { print(ds) },
  ncores = 1,
  verbose = FALSE
)
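The two-step procedure (subsample, then split, both within strata) can be sketched in base R as follows. This is a sketch under our own assumptions: the subsample size is rounded from subsample_p times the stratum size, the split is permutated, and the helper split_stratum is hypothetical rather than part of splithalfr.

```r
# Illustration only: stratified subsampling followed by a permutated split.
ds <- data.frame(
  participant = rep(1, 8),
  condition = rep(c("a", "b"), each = 4),
  rt = 100 * 1:8
)

set.seed(1)  # arbitrary seed, only to make the illustration reproducible
subsample_p <- 0.5

# Hypothetical helper: subsample one stratum, then split the subsample.
split_stratum <- function(s) {
  # Step 1: keep a random subset of the stratum, without replacement.
  n_keep <- round(subsample_p * nrow(s))
  kept <- s[sample(nrow(s), n_keep), ]
  # Step 2: permutated split of the kept trials.
  rows <- seq_len(nrow(kept))
  in_first <- sample(nrow(kept), floor(nrow(kept) / 2))
  list(
    first = kept[in_first, ],
    second = kept[setdiff(rows, in_first), ]
  )
}

halves <- lapply(split(ds, ds$condition), split_stratum)
part_1 <- do.call(rbind, lapply(halves, `[[`, "first"))
part_2 <- do.call(rbind, lapply(halves, `[[`, "second"))

print(part_1)
print(part_2)
```

With subsample_p set to 0.5, two of the four trials per condition are kept, so each part ends up with one trial of condition a and one of condition b.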