This package currently provides two functions that offer robust
alternatives to the t_TOST function.
The Wilcoxon group of tests (which includes the Mann-Whitney U-test) provides a
non-parametric test of differences between groups, or within samples,
based on ranks. This is a test of location shift, that is, a difference
in the center of the distribution (in parametric tests, the location is
the mean). With TOST, two separate directional tests of location shift
determine whether the shift lies within the equivalence bounds
(equivalence) or outside them (minimal effect). The exact calculations
can be explored via the documentation of the wilcox.test function.
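As a rough illustration of the two directional tests, the sketch below calls base R's wilcox.test twice on the sleep data, once against each bound. The bounds of -0.5 and 0.5 and the variable names are chosen only for this example, and the sketch assumes the normal approximation with continuity correction (exact = FALSE); it is not a description of the package internals.

```r
# Two one-sided Wilcoxon tests against hypothetical equivalence bounds.
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]
low_eqbound <- -0.5
high_eqbound <- 0.5

# Lower bound: is the location shift greater than low_eqbound?
tost_lower <- wilcox.test(x, y, mu = low_eqbound,
                          alternative = "greater", exact = FALSE)

# Upper bound: is the location shift less than high_eqbound?
tost_upper <- wilcox.test(x, y, mu = high_eqbound,
                          alternative = "less", exact = FALSE)

# Equivalence is claimed only if both tests reject,
# so the overall p-value is the larger of the two.
p_tost <- max(tost_lower$p.value, tost_upper$p.value)
p_tost
```

Because both one-sided tests must be significant, a single large p-value is enough to prevent a claim of equivalence.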
In the TOSTER package, we accomplish this with the wilcox_TOST
function. Overall, this function operates very similarly to the t_TOST
function. However, the standardized mean difference (SMD) is not
calculated. Instead, the rank-biserial correlation is calculated for all
types of comparisons (e.g., two sample, one sample, and paired samples).
Also, there is no plotting capability at this time for the output of
this function.
As an example, we can use the sleep data to make a non-parametric comparison of equivalence.
data('sleep')
library(TOSTER)
test1 = wilcox_TOST(formula = extra ~ group,
                    data = sleep,
                    paired = FALSE,
                    low_eqbound = -.5,
                    high_eqbound = .5)
print(test1)
##
## Wilcoxon rank sum test with continuity correction
## Hypothesis Tested: Equivalence
## Equivalence Bounds (raw):-0.500 & 0.500
## Alpha Level:0.05
## The equivalence test was non-significant W = 20.000, p = 8.94e-01
## The null hypothesis test was non-significant W = 25.500, p = 6.93e-02
## NHST: don't reject null significance hypothesis that the effect is equal to zero
## TOST: don't reject null equivalence hypothesis
##
## TOST Results
## statistic p.value
## NHST 25.5 0.06932758
## TOST Lower 34.0 0.89385308
## TOST Upper 20.0 0.01287404
##
## Effect Sizes
## estimate lower.ci upper.ci conf.level
## Median of Differences -1.346388 -3.3999651 -0.09995341 0.9
## rank-biserial correlation -0.490000 -0.7492521 -0.10053222 0.9
The standardized effect size reported for the wilcox_TOST procedure is
the rank-biserial correlation. This is a fairly intuitive measure of
effect size with an interpretation similar to that of the common
language effect size (Kerby 2014). However, instead of assuming
normality and equal variances, the rank-biserial correlation is based on
the number of favorable (positive) and unfavorable (negative) pairs,
determined from their respective ranks.
Overall, the correlation is calculated as the proportion of favorable pairs minus the proportion of unfavorable pairs.
\[ r_{biserial} = f_{pairs} - u_{pairs} \]
The Fisher approximation is used to calculate the confidence intervals.
For paired samples the standard error is calculated as the following:
\[ SE_r = \frac {\sqrt{(2 \cdot nd^3 + 3 \cdot nd^2 + nd) / 6}} {(nd^2 + nd) / 2} \]
where nd is the number of paired differences not equal to zero.
For independent samples, the standard error is calculated as the following:
\[ SE_r = \sqrt{\frac {(n1 + n2 + 1)} { (3 \cdot n1 \cdot n2)}} \]
The confidence intervals can then be calculated by transforming the estimate.
\[ r_z = atanh(r_{biserial}) \]
Then the confidence interval can be calculated and back transformed.
\[ r_{CI} = tanh(r_z \pm Z_{(1 - \alpha / 2)} \cdot SE_r) \]
We hope a bootstrapped version of this confidence interval will be added in future updates.
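The independent-samples calculation described above can be sketched in base R as follows. The helper name rank_biserial_ci is hypothetical and is not a TOSTER function; the sketch assumes a 90% confidence level (to match the output shown earlier) and counts favorable and unfavorable pairs from all pairwise differences between the two groups.

```r
# Hypothetical helper: rank-biserial correlation with a Fisher-transformed CI
# for two independent samples (base R only).
rank_biserial_ci <- function(x, y, conf.level = 0.9) {
  n1 <- length(x)
  n2 <- length(y)
  d <- outer(x, y, "-")        # every pairwise difference x_i - y_j
  f_pairs <- mean(d > 0)       # proportion of favorable pairs
  u_pairs <- mean(d < 0)       # proportion of unfavorable pairs
  r <- f_pairs - u_pairs

  # Standard error for independent samples, on the Fisher z scale
  se <- sqrt((n1 + n2 + 1) / (3 * n1 * n2))
  z_crit <- qnorm(1 - (1 - conf.level) / 2)

  # Transform, build the interval, and back-transform
  ci <- tanh(atanh(r) + c(-1, 1) * z_crit * se)
  c(estimate = r, lower.ci = ci[1], upper.ci = ci[2])
}

x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]
res <- rank_biserial_ci(x, y)
res
```

For the sleep data this should give an estimate of -0.49, with limits close to those in the Effect Sizes table above. Note that the transformation is undefined when the estimate is exactly 1 or -1, so a fuller implementation would need to handle that case.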
The bootstrap is a simulation-based technique, built on resampling
with replacement, designed for statistical estimation and inference.
Bootstrapping techniques are useful because they are considered
somewhat robust to violations of the assumptions of a simple t-test.
Therefore, we added a bootstrap option, boot_t_TOST, to the
package to provide another robust alternative to the t_TOST function.
In this function we provide a percentile bootstrap solution outlined
by Efron and Tibshirani (1993) (see chapter 16, page 220). The
bootstrapped p-values are derived from the “studentized” version of a
test of mean differences outlined by Efron and Tibshirani (1993).
Overall, the results should be similar to the results of t_TOST.
However, for paired samples, the Cohen’s d(rm) effect size cannot be
calculated at this time.
Form B bootstrap data sets from x* and y*, wherein x* is sampled with replacement from \(\tilde x_1,\tilde x_2, ... \tilde x_n\) and y* is sampled with replacement from \(\tilde y_1,\tilde y_2, ... \tilde y_n\).
t is then evaluated on each bootstrap sample, with the mean of each sample (x or y) and the overall average (z) subtracted from each.
\[ t(z^{*b}) = \frac {(\bar x^*-\bar x - \bar z) - (\bar y^*-\bar y - \bar z)}{\sqrt {(sd_y^*)^2/n_y + (sd_x^*)^2/n_x}} \]
\[ p_{boot} = \frac {\#\{t(z^{*b}) \ge t_{sample}\}}{B} \]
The same process is completed for the one sample case but with the one sample solution for the equation outlined by \(t(z^{*b})\). The paired sample case in this bootstrap procedure is equivalent to the one sample solution because the test is based on the difference scores.
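To make the resampling steps concrete, here is a minimal base R sketch of a studentized bootstrap p-value for a two-sample test of zero mean difference. It is not the code used by boot_t_TOST: the helper names welch_t and boot_p_two_sample are hypothetical, the groups are treated as independent, and the recentering x - mean(x) + mean(z) follows the form given by Efron and Tibshirani (1993), which is an assumption about how the tilde values are defined.

```r
# Hypothetical helpers; base R only.
welch_t <- function(a, b) {
  (mean(a) - mean(b)) / sqrt(var(a) / length(a) + var(b) / length(b))
}

boot_p_two_sample <- function(x, y, B = 999) {
  z_bar <- mean(c(x, y))
  # Recenter both groups on the pooled mean so the null hypothesis holds
  x_tilde <- x - mean(x) + z_bar
  y_tilde <- y - mean(y) + z_bar

  t_obs <- welch_t(x, y)
  # Evaluate t on B pairs of resamples drawn with replacement
  t_boot <- replicate(B, welch_t(sample(x_tilde, replace = TRUE),
                                 sample(y_tilde, replace = TRUE)))

  # Proportion of bootstrap statistics at least as large as the observed one;
  # na.rm guards against the rare resample with zero variance
  list(t_obs = t_obs, p_upper = mean(t_boot >= t_obs, na.rm = TRUE))
}

set.seed(42)
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]
res <- boot_p_two_sample(x, y, B = 999)
res
```

The p-value here is one-sided, matching the direction of the inequality in the formula above; a test in the other direction would count the bootstrap statistics less than or equal to the observed value instead.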
Again, we can use the sleep data to see the bootstrapped results. Notice that the plots show how resampling via bootstrapping reveals the instability of Hedges’ g(z).
data('sleep')
test1 = boot_t_TOST(formula = extra ~ group,
                    data = sleep,
                    paired = TRUE,
                    low_eqbound = -.5,
                    high_eqbound = .5,
                    R = 999)
print(test1)
##
## Bootstrapped Paired t-test
## Hypothesis Tested: Equivalence
## Equivalence Bounds (raw):-0.500 & 0.500
## Alpha Level:0.05
## The equivalence test was non-significant, t(9) = -2.777, p = 1e+00
## The null hypothesis test was significant, t(9) = -4.062, p = 0e+00
## NHST: reject null significance hypothesis that the effect is equal to zero
## TOST: don't reject null equivalence hypothesis
##
## TOST Results
## t SE df p.value
## t-test -4.062128 0.3889587 9 0
## TOST Lower -2.776644 0.3889587 9 1
## TOST Upper -5.347611 0.3889587 9 0
##
## Effect Sizes
## estimate SE lower.ci upper.ci conf.level
## Raw -1.580000 0.3709668 -2.210000 -1.0300000 0.9
## Hedges' g(z) -1.230152 0.7414507 -2.818478 -0.9184013 0.9
##
## Note: percentile boostrap method utilized.
plot(test1)