This vignette documents the use of the MANOVA.RM
package for the analysis of semi-parametric repeated measures designs and multivariate data.
The package consists of two important functions, which will be explained in detail below. Both functions calculate the Wald-type statistic
(WTS) and the ANOVA-type statistic (ATS) based on means. Furthermore, different resampling approaches are provided in order to improve the
small sample behavior of the WTS. These test statistics can be used for arbitrary semi-parametric designs, even with unequal covariance matrices among groups
and small sample sizes.
RM
functionThe RM
function calculates the above mentioned test statistics in a repeated measures design with an arbitrary number of crossed whole-plot and
sub-plot factors.
The resampling methods provided are a permutation procedure, a parametric bootstrap approach and a
wild bootstrap using Rademacher weights. The wild bootstrap is also implemented for the ATS.
For illustration purposes, we consider the data set o2cons
, which is included in MANOVA.RM
.
library(MANOVA.RM)
data(o2cons)
The data set contains measurements of the oxygen consumption of leukocytes in the presence and absence of inactivated staphylococci at three consecutive time points. Due to the study design, both time and staphylococci are sub-plot factors while the treatment (Verum vs. Placebo) is a whole-plot factor.
head(o2cons)
## O2 Staphylococci Time Group Subject
## 1 1.48 1 6 P 1
## 2 2.81 1 12 P 1
## 3 3.56 1 18 P 1
## 4 1.04 0 6 P 2
## 5 2.07 0 12 P 2
## 6 2.81 0 18 P 2
We will now analyze this data using the RM
function.
The RM
function takes as arguments:
formula
: A formula consisting of the outcome variable on the left hand side of a ~ operator and the factor
variables of interest on the right hand side. An interaction term must be specified.data
: A data.frame, list or environment containing the variables in formula
.subject
: The column name of the subjects variable in the data frame.no.subf
: The number of sub-plot factors, default is 1.iter
: The number of iterations for the resampling approach. Default value is 10000.alpha
: The significance level, default is 0.05.resampling
: The resampling method, one of 'Perm', 'paramBS' or 'WildBS'. Default is
set to 'Perm'.set.seed(1234)
model1 <- RM(O2 ~ Group * Staphylococci * Time, data = o2cons,
subject = "Subject", no.subf = 2, iter = 10000, resampling = "Perm", CPU = 1)
summary(model1)
## Call:
## O2 ~ Group * Staphylococci * Time
##
## Descriptive:
## Group Staphylococci Time n Means Lower 95 % CI Upper 95 % CI
## 1 P 0 6 12 1.321667 1.150408 1.492926
## 5 P 0 12 12 2.430000 2.195672 2.664328
## 9 P 0 18 12 3.425000 3.123488 3.726512
## 3 P 1 6 12 1.618333 1.478675 1.757991
## 7 P 1 12 12 2.434167 2.164383 2.703950
## 11 P 1 18 12 3.526667 3.272821 3.780512
## 2 V 0 6 12 1.394167 1.200641 1.587692
## 6 V 0 12 12 2.570000 2.355010 2.784990
## 10 V 0 18 12 3.676667 3.374016 3.979317
## 4 V 1 6 12 1.655833 1.471327 1.840340
## 8 V 1 12 12 2.799167 2.500171 3.098162
## 12 V 1 18 12 4.029167 3.801804 4.256529
##
## Wald-Type Statistic (WTS):
## Test statistic df p-value
## Group 11.167304 1 8.325153e-04
## Staphylococci 20.400635 1 6.280894e-06
## Group:Staphylococci 2.554304 1 1.099942e-01
## Time 4113.057018 2 0.000000e+00
## Group:Time 24.105270 2 5.829176e-06
## Staphylococci:Time 4.334106 2 1.145146e-01
## Group:Staphylococci:Time 4.302876 2 1.163168e-01
##
## ANOVA-Type Statistic (ATS):
## Test statistic df1 df2 p-value
## Group 11.167304 1.000000 316.2776 9.323112e-04
## Staphylococci 20.400635 1.000000 Inf 6.280894e-06
## Group:Staphylococci 2.554304 1.000000 Inf 1.099942e-01
## Time 960.208241 1.524477 Inf 0.000000e+00
## Group:Time 5.393468 1.524477 Inf 9.237194e-03
## Staphylococci:Time 2.365958 1.982999 Inf 9.434742e-02
## Group:Staphylococci:Time 2.147250 1.982999 Inf 1.172659e-01
##
## p-values resampling:
## Perm (WTS) Perm (ATS)
## Group 0.0035 NA
## Staphylococci 0.0001 NA
## Group:Staphylococci 0.1200 NA
## Time 0.0000 NA
## Group:Time 0.0003 NA
## Staphylococci:Time 0.1541 NA
## Group:Staphylococci:Time 0.1496 NA
The output consists of four parts: model1$Descriptive
gives an overview of the descriptive statistics: The number of observations,
mean and confidence intervals (based on quantiles of the t-distribution) are displayed for each factor level combination.
model1$WTS
contains the results for the Wald-type test: The test statistic, degree of freedom and p-values based on the asymptotic \(\chi^2\) distribution
are displayed. Note that the \(\chi^2\) approximation is very liberal for small sample sizes.
model1$ATS
contains the corresponding results based on the ATS.
This test statistic tends to rather
conservative decisions in the case of small sample sizes and is even asymptotically only an approximation, thus not providing an asymptotic level \(\alpha\) test.
Finally, model1$resampling
contains the p-values based on
the chosen resampling approach. For the ATS, only the wild bootstrap procedure can be applied.
Due to the above mentioned issues for small sample sizes, the resampling procedure is recommended for such situations.
We consider the data set EEG
from the MANOVA.RM
package: At the Department of Neurology, University Clinic of Salzburg, 160 patients were diagnosed
with either AD, MCI, or SCC, based on neuropsychological diagnostics (Bathke et al.(2015), preprint). This data set contains z-scores for
brain rate and Hjorth complexity,
each measured at frontal, temporal and central electrode positions and averaged across hemispheres. In addition to standardization, complexity
values were multiplied by -1 in order to make them more easily comparable to brain rate
values: For brain rate we know that the values decrease with age and pathology, while
Hjorth complexity values are known to increase with age and pathology.
The three between-subjects factors considered were sex (men vs. women), diagnosis (AD
vs. MCI vs. SCC), and age (\(< 70\) vs. \(>= 70\) years). Additionally, the within-subjects factors region (frontal, temporal, central) and
feature (brain rate, complexity) structure the response vector.
data(EEG)
set.seed(987)
EEG_model <- RM(resp ~ sex * diagnosis * feature * region,
data = EEG, subject = "id", no.subf = 2, resampling = "WildBS",
iter = 1000, alpha = 0.01, CPU = 1)
summary(EEG_model)
## Call:
## resp ~ sex * diagnosis * feature * region
##
## Descriptive:
## sex diagnosis feature region n Means Lower 99 % CI
## 1 M AD brainrate central 12 -1.00974303 -4.88050009
## 13 M AD brainrate frontal 12 -1.00676081 -4.99056034
## 25 M AD brainrate temporal 12 -0.98728648 -4.49344448
## 7 M AD complexity central 12 -1.48789095 -10.05319599
## 19 M AD complexity frontal 12 -1.08562580 -6.90620385
## 31 M AD complexity temporal 12 -1.32044015 -7.20316997
## 3 M MCI brainrate central 27 -0.44742815 -1.59051263
## 15 M MCI brainrate frontal 27 -0.46371997 -1.64609499
## 27 M MCI brainrate temporal 27 -0.50628015 -1.58436017
## 9 M MCI complexity central 27 -0.25680596 -1.13870013
## 21 M MCI complexity frontal 27 -0.45922121 -1.99708310
## 33 M MCI complexity temporal 27 -0.49000571 -1.79618083
## 5 M SCC brainrate central 20 0.45927248 -0.41381197
## 17 M SCC brainrate frontal 20 0.24296492 -0.67023546
## 29 M SCC brainrate temporal 20 0.40875170 -1.21026961
## 11 M SCC complexity central 20 0.34866657 -0.06998078
## 23 M SCC complexity frontal 20 0.09532784 -1.03651243
## 35 M SCC complexity temporal 20 0.31400312 -0.59757724
## 2 W AD brainrate central 24 -0.29374893 -1.97838450
## 14 W AD brainrate frontal 24 -0.15918975 -1.81317962
## 26 W AD brainrate temporal 24 -0.28539451 -1.77644839
## 8 W AD complexity central 24 -0.12774393 -1.37182217
## 20 W AD complexity frontal 24 0.02573385 -1.21242033
## 32 W AD complexity temporal 24 -0.19371702 -1.67017970
## 4 W MCI brainrate central 30 -0.10647305 -1.07581790
## 16 W MCI brainrate frontal 30 -0.07356190 -1.03211025
## 28 W MCI brainrate temporal 30 -0.06924853 -1.06377356
## 10 W MCI complexity central 30 0.09398357 -0.46428563
## 22 W MCI complexity frontal 30 0.13130291 -0.76790741
## 34 W MCI complexity temporal 30 0.12144023 -0.65193792
## 6 W SCC brainrate central 47 0.53736580 -0.04918933
## 18 W SCC brainrate frontal 47 0.54829110 -0.06244934
## 30 W SCC brainrate temporal 47 0.55891259 -0.01510047
## 12 W SCC complexity central 47 0.38428656 0.10967160
## 24 W SCC complexity frontal 47 0.40347289 -0.03793270
## 36 W SCC complexity temporal 47 0.50641224 0.13237975
## Upper 99 % CI
## 1 2.8610140
## 13 2.9770387
## 25 2.5188715
## 7 7.0774141
## 19 4.7349522
## 31 4.5622897
## 3 0.6956563
## 15 0.7186551
## 27 0.5717999
## 9 0.6250882
## 21 1.0786407
## 33 0.8161694
## 5 1.3323569
## 17 1.1561653
## 29 2.0277730
## 11 0.7673139
## 23 1.2271681
## 35 1.2255835
## 2 1.3908866
## 14 1.4948001
## 26 1.2056594
## 8 1.1163343
## 20 1.2638880
## 32 1.2827457
## 4 0.8628718
## 16 0.8849864
## 28 0.9252765
## 10 0.6522528
## 22 1.0305132
## 34 0.8948184
## 6 1.1239209
## 18 1.1590315
## 30 1.1329256
## 12 0.6589015
## 24 0.8448785
## 36 0.8804447
##
## Wald-Type Statistic (WTS):
## Test statistic df p-value
## sex 9.97296133 1 1.588558e-03
## diagnosis 42.38284398 2 6.261558e-10
## sex:diagnosis 3.77699775 2 1.512988e-01
## feature 0.08646315 1 7.687226e-01
## sex:feature 2.16726982 1 1.409763e-01
## diagnosis:feature 5.31686865 2 7.005782e-02
## sex:diagnosis:feature 1.73538341 2 4.199197e-01
## region 0.06961597 2 9.657908e-01
## sex:region 0.87591221 2 6.453541e-01
## diagnosis:region 6.12148594 4 1.902575e-01
## sex:diagnosis:region 1.53151907 4 8.210440e-01
## feature:region 0.65247268 2 7.216346e-01
## sex:feature:region 0.42264779 2 8.095118e-01
## diagnosis:feature:region 7.14232065 4 1.285557e-01
## sex:diagnosis:feature:region 2.27379685 4 6.855437e-01
##
## ANOVA-Type Statistic (ATS):
## Test statistic df1 df2 p-value
## sex 9.97296133 1.000000 657.4156 1.661290e-03
## diagnosis 13.12350327 1.343095 657.4156 5.927116e-05
## sex:diagnosis 1.90378291 1.343095 657.4156 1.635926e-01
## feature 0.08646315 1.000000 Inf 7.687226e-01
## sex:feature 2.16726982 1.000000 Inf 1.409763e-01
## diagnosis:feature 1.43695004 1.561987 Inf 2.380325e-01
## sex:diagnosis:feature 1.03088198 1.561987 Inf 3.416028e-01
## region 0.01784709 1.610689 Inf 9.650363e-01
## sex:region 0.37086704 1.610689 Inf 6.440475e-01
## diagnosis:region 1.09114816 2.045826 Inf 3.368500e-01
## sex:diagnosis:region 0.37621401 2.045826 Inf 6.912290e-01
## feature:region 0.12552423 1.420760 Inf 8.099026e-01
## sex:feature:region 0.07709516 1.420760 Inf 8.636365e-01
## diagnosis:feature:region 0.82935062 1.624110 Inf 4.146809e-01
## sex:diagnosis:feature:region 0.61120071 1.624110 Inf 5.098169e-01
##
## p-values resampling:
## WildBS (WTS) WildBS (ATS)
## sex 0.000 0.000
## diagnosis 0.000 0.000
## sex:diagnosis 0.128 0.116
## feature 0.778 0.778
## sex:feature 0.168 0.168
## diagnosis:feature 0.061 0.250
## sex:diagnosis:feature 0.445 0.366
## region 0.976 0.985
## sex:region 0.688 0.723
## diagnosis:region 0.181 0.308
## sex:diagnosis:region 0.871 0.819
## feature:region 0.811 0.921
## sex:feature:region 0.886 0.955
## diagnosis:feature:region 0.129 0.522
## sex:diagnosis:feature:region 0.747 0.636
We find significant effects at level \(\alpha = 0.01\) of the whole-plot factors sex and diagnosis, while none of the sub-plot factors or interactions become significant.
The RM()
function is equipped with a plotting option, displaying the calculated means along with \((1-\alpha)\) confidence intervals.
The plot
function takes an RM
object as an argument. In addition, the factor of interest may be specified. If this argument is
omitted in a two- or higher-way layout, the user is asked to specify the factor for plotting. Furthermore, additional graphical parameters
can be used to customize the plots. The optional argument legendpos
specifies the position of the legend in higher-way layouts.
plot(EEG_model, factor = "sex", main = "Effect of sex on EEG values")
plot(EEG_model, factor = "sex:diagnosis", legendpos = "topleft", col = c(4, 2))
plot(EEG_model, factor = "sex:diagnosis:feature", legendpos = "center")
MANOVA
functionThe MANOVA
function calculates the above mentioned test statistics for multivariate data in a design with crossed or nested factors.
The resampling methods provided are a parametric bootstrap approach and a
wild bootstrap using Rademacher weights. The wild bootstrap is also implemented for the ATS, while the parametric approach works only for the WTS,
see Konietschke et al. (2015) for details.
Note that only balanced nested designs (i.e., the same number of factor levels \(b\) for each level of the factor \(A\)) with up to three factors are
implemented. Designs involving both crossed and nested factors are not implemented.
We again consider the data set EEG
from the MANOVA.RM
package, but now we ignore the sub-plot factor structure. Therefore, we are
now in a multivariate setting with 6 measurements per patient and three crossed factors sex, age and diagnosis. Due to the small number of subjects in
some groups (e.g., only 2 male patients aged \(<\) 70 were diagnosed with AD) we restrict our analyses to two factors at a time.
The analysis of this example is shown below.
The MANOVA
function takes as arguments:
formula
: A formula consisting of the outcome variable on the left hand side of a ~ operator and the factor
variables of interest on the right hand side. An interaction term must be specified.data
: A data.frame, list or environment containing the variables in formula
.subject
: The column name of the subjects variable in the data frame.iter
: The number of iterations for the resampling procedure. Default value is 10000.alpha
: The significance level, default is 0.05.resampling
: The resampling method, one of 'paramBS' and 'WildBS'. Default is
set to 'paramBS'.data(EEG)
set.seed(987)
EEG_MANOVA <- MANOVA(resp ~ sex * diagnosis,
data = EEG, subject = "id", resampling = "paramBS",
iter = 1000, alpha = 0.01, CPU = 1)
summary(EEG_MANOVA)
## Call:
## resp ~ sex * diagnosis
##
## Descriptive:
## sex diagnosis n Mean 1 Mean 2 Mean 3 Mean 4
## 1 M AD 12 -0.98728648 -1.0067608 -1.0097430 -1.3204402
## 3 M MCI 27 -0.50628015 -0.4637200 -0.4474281 -0.4900057
## 5 M SCC 20 0.40875170 0.2429649 0.4592725 0.3140031
## 2 W AD 24 -0.28539451 -0.1591898 -0.2937489 -0.1937170
## 4 W MCI 30 -0.06924853 -0.0735619 -0.1064731 0.1214402
## 6 W SCC 47 0.55891259 0.5482911 0.5373658 0.5064122
## Mean 5 Mean 6
## 1 -1.08562580 -1.48789095
## 3 -0.45922121 -0.25680596
## 5 0.09532784 0.34866657
## 2 0.02573385 -0.12774393
## 4 0.13130291 0.09398357
## 6 0.40347289 0.38428656
##
## Wald-Type Statistic (WTS):
## Test statistic df p-value
## sex 12.604176 6 4.977046e-02
## diagnosis 55.158000 12 1.695621e-07
## sex:diagnosis 9.790162 12 6.343637e-01
##
## ANOVA-Type Statistic (ATS):
## Test statistic df1 df2 p-value
## sex 7.333559 1.796816 Inf 1.047708e-03
## diagnosis 9.624691 2.323563 Inf 2.244646e-05
## sex:diagnosis 1.536449 2.323563 Inf 2.116279e-01
##
## p-values resampling:
## paramBS (WTS) paramBS (ATS)
## sex 0.122 NA
## diagnosis 0.001 NA
## sex:diagnosis 0.753 NA
The MANOVA.RM
package is equipped with an optional graphical user interface, which is based on RGtk2
. The GUI may be started in R
(if RGtk2
is installed) using the
command GUI.RM()
and GUI.MANOVA()
for repeated measures designs and multivariate data, respectively.
GUI.MANOVA()
The user can specify the data location
(either directly or via the “load data” button), the formula, the number of iterations for the resampling approach and
the significance level. Furthermore, one needs to specify the number of sub-plot factors (for the repeated measures design only),
the 'subject' variable in the
data frame and the resampling method.
Additionally, one can specify whether or not headers are
included in the data file, and which separator (e.g., ',' for *.csv files) and character symbols are used for decimals
in the data file.
The GUI for RM
also provides a plotting option, which generates a new window
for specifying the factors to be plotted (in higher-way layouts) along with a few plotting
parameters.