Introduction

This vignette documents the use of the MANOVA.RM package for the analysis of semi-parametric repeated measures designs and multivariate data. The package provides two main functions, RM and MANOVA, which are explained in detail below. Both functions calculate the Wald-type statistic (WTS) and the ANOVA-type statistic (ATS) based on means. In addition, several resampling approaches are provided to improve the small-sample behavior of the WTS. These test statistics can be used for arbitrary semi-parametric designs, even with unequal covariance matrices among groups and small sample sizes.

The RM function

The RM function calculates the above-mentioned test statistics in a repeated measures design with an arbitrary number of crossed whole-plot and sub-plot factors. Three resampling methods are provided: a permutation procedure, a parametric bootstrap approach and a wild bootstrap using Rademacher weights. Of these, only the wild bootstrap is also implemented for the ATS.
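
To make the wild bootstrap idea concrete, the following base-R sketch resamples a studentized one-sample statistic by multiplying the centered observations with random signs (Rademacher weights). It is a simplified illustration on made-up data, not the code used inside RM; the names x, t_stat and B are hypothetical.

```r
# Toy sketch of a wild bootstrap with Rademacher weights
# (illustration only; not MANOVA.RM's internal implementation).
set.seed(1)
x <- c(1.2, 0.8, 1.9, 1.4, 0.6, 1.7, 1.1, 1.5)  # hypothetical sample
n <- length(x)

# Studentized statistic for H0: mean = mu0
t_stat <- function(y, mu0 = 0) sqrt(length(y)) * (mean(y) - mu0) / sd(y)

t_obs <- t_stat(x, mu0 = 1)

# Center the data, then flip signs at random:
# each weight is +1 or -1 with probability 1/2
z <- x - mean(x)
B <- 2000
t_boot <- replicate(B, {
  w <- sample(c(-1, 1), n, replace = TRUE)  # Rademacher weights
  t_stat(w * z)                             # statistic of the sign-flipped residuals
})

# Two-sided resampling p-value
p_wild <- mean(abs(t_boot) >= abs(t_obs))
p_wild
```

The observed statistic is compared with the distribution of the resampled statistics, which mimics the null distribution because the sign-flipped residuals have conditional expectation zero.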

Data Example 1 (One whole-plot and two sub-plot factors)

For illustration purposes, we consider the data set o2cons, which is included in MANOVA.RM.

library(MANOVA.RM)
data(o2cons)

The data set contains measurements of the oxygen consumption of leukocytes in the presence and absence of inactivated staphylococci at three consecutive time points. Due to the study design, both time and staphylococci are sub-plot factors while the treatment (Verum vs. Placebo) is a whole-plot factor.

head(o2cons)
##     O2 Staphylococci Time Group Subject
## 1 1.48             1    6     P       1
## 2 2.81             1   12     P       1
## 3 3.56             1   18     P       1
## 4 1.04             0    6     P       2
## 5 2.07             0   12     P       2
## 6 2.81             0   18     P       2
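
Whether a factor is a whole-plot or a sub-plot factor can be read off data in this long format: a whole-plot factor is constant within each subject, whereas a sub-plot factor varies within subjects. A small sketch on a hypothetical toy data frame (the column names mimic o2cons, the layout and values are made up):

```r
# Hypothetical long-format data: 4 subjects, each measured under
# 2 Staphylococci conditions x 3 time points; subjects 1-2 are in group P
toy <- expand.grid(Time = c(6, 12, 18), Staphylococci = c(0, 1), Subject = 1:4)
toy$Group <- ifelse(toy$Subject <= 2, "P", "V")

# A factor varies within subjects if some subject shows more than one level
varies_within <- sapply(c("Group", "Staphylococci", "Time"), function(f) {
  any(tapply(toy[[f]], toy$Subject, function(v) length(unique(v))) > 1)
})
varies_within
# Group is FALSE (whole-plot); Staphylococci and Time are TRUE (sub-plot)
```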

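We will now analyze this data using the RM function. The call below passes the model formula, the data frame (data), the name of the column identifying the subjects (subject), the number of sub-plot factors (no.subf), the number of resampling iterations (iter), the resampling method (resampling) and the number of CPU cores to use (CPU). Note that in both examples of this vignette the sub-plot factors appear last in the formula. A seed is set beforehand to make the resampling results reproducible.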

set.seed(1234)
model1 <- RM(O2 ~ Group * Staphylococci * Time, data = o2cons, 
             subject = "Subject", no.subf = 2, iter = 10000, resampling = "Perm", CPU = 1)
summary(model1)
## Call: 
## O2 ~ Group * Staphylococci * Time
## 
## Descriptive:
##    Group Staphylococci Time  n    Means Lower 95 % CI Upper 95 % CI
## 1      P             0    6 12 1.321667      1.150408      1.492926
## 5      P             0   12 12 2.430000      2.195672      2.664328
## 9      P             0   18 12 3.425000      3.123488      3.726512
## 3      P             1    6 12 1.618333      1.478675      1.757991
## 7      P             1   12 12 2.434167      2.164383      2.703950
## 11     P             1   18 12 3.526667      3.272821      3.780512
## 2      V             0    6 12 1.394167      1.200641      1.587692
## 6      V             0   12 12 2.570000      2.355010      2.784990
## 10     V             0   18 12 3.676667      3.374016      3.979317
## 4      V             1    6 12 1.655833      1.471327      1.840340
## 8      V             1   12 12 2.799167      2.500171      3.098162
## 12     V             1   18 12 4.029167      3.801804      4.256529
## 
## Wald-Type Statistic (WTS):
##                          Test statistic df      p-value
## Group                         11.167304  1 8.325153e-04
## Staphylococci                 20.400635  1 6.280894e-06
## Group:Staphylococci            2.554304  1 1.099942e-01
## Time                        4113.057018  2 0.000000e+00
## Group:Time                    24.105270  2 5.829176e-06
## Staphylococci:Time             4.334106  2 1.145146e-01
## Group:Staphylococci:Time       4.302876  2 1.163168e-01
## 
## ANOVA-Type Statistic (ATS):
##                          Test statistic      df1      df2      p-value
## Group                         11.167304 1.000000 316.2776 9.323112e-04
## Staphylococci                 20.400635 1.000000      Inf 6.280894e-06
## Group:Staphylococci            2.554304 1.000000      Inf 1.099942e-01
## Time                         960.208241 1.524477      Inf 0.000000e+00
## Group:Time                     5.393468 1.524477      Inf 9.237194e-03
## Staphylococci:Time             2.365958 1.982999      Inf 9.434742e-02
## Group:Staphylococci:Time       2.147250 1.982999      Inf 1.172659e-01
## 
## p-values resampling:
##                          Perm (WTS) Perm (ATS)
## Group                        0.0035         NA
## Staphylococci                0.0001         NA
## Group:Staphylococci          0.1200         NA
## Time                         0.0000         NA
## Group:Time                   0.0003         NA
## Staphylococci:Time           0.1541         NA
## Group:Staphylococci:Time     0.1496         NA

The output consists of four parts:

model1$Descriptive gives an overview of the descriptive statistics: the number of observations, the mean and confidence intervals (based on quantiles of the t-distribution) are displayed for each factor level combination.

model1$WTS contains the results for the Wald-type test: the test statistic, degrees of freedom and p-values based on the asymptotic \(\chi^2\) distribution. Note that the \(\chi^2\) approximation is very liberal for small sample sizes.

model1$ATS contains the corresponding results based on the ATS. This test statistic tends to yield rather conservative decisions for small sample sizes. Its reference distribution is only an approximation even asymptotically, so the ATS does not provide an asymptotic level \(\alpha\) test.

model1$resampling contains the p-values based on the chosen resampling approach. For the ATS, only the wild bootstrap procedure can be applied, which is why the Perm (ATS) column above contains NA. Because of the small-sample issues mentioned above, the resampling procedure is recommended in such situations.
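
The relation between the two statistics can be sketched in base R for the simplest case, a single group of subjects measured at three time points, testing for a time effect. The vignette does not spell out the formulas; the sketch assumes the usual definitions (WTS as a quadratic form with a generalized inverse, ATS as a quadratic form scaled by a trace, with an estimated numerator degree of freedom). All names and data are hypothetical, and this is not the package's internal code.

```r
# Sketch: WTS and ATS for a time effect in one group (toy data)
set.seed(42)
n <- 12; t_pts <- 3
# rows = subjects, columns = time points; column means shifted to 1.3, 2.4, 3.4
X <- matrix(rnorm(n * t_pts), n, t_pts) + rep(c(1.3, 2.4, 3.4), each = n)

xbar <- colMeans(X)
V    <- cov(X)                                         # unstructured covariance estimate
P    <- diag(t_pts) - matrix(1 / t_pts, t_pts, t_pts)  # centering matrix: H0 of equal time means

# Moore-Penrose inverse via SVD (P V P is singular by construction)
pinv <- function(A, tol = 1e-10) {
  s <- svd(A)
  keep <- s$d > tol * max(s$d)
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

PVP   <- P %*% V %*% P
wts   <- n * drop(t(xbar) %*% P %*% pinv(PVP) %*% P %*% xbar)
p_wts <- pchisq(wts, df = t_pts - 1, lower.tail = FALSE)  # chi-square with df = rank(P)

PV    <- P %*% V
ats   <- n * drop(t(xbar) %*% P %*% xbar) / sum(diag(PV))
f     <- sum(diag(PV))^2 / sum(diag(PV %*% PV))           # estimated numerator df
p_ats <- pf(ats, df1 = f, df2 = Inf, lower.tail = FALSE)  # F(f, Inf) approximation

c(WTS = wts, p_WTS = p_wts, ATS = ats, df1 = f, p_ATS = p_ats)
```

Under these definitions the estimated degree of freedom f lies between 1 and the rank of the contrast matrix, which matches the non-integer df1 values and df2 = Inf reported for the sub-plot effects in the ATS table above.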

Data Example 2 (Two sub-plot and two whole-plot factors)

We consider the data set EEG from the MANOVA.RM package: At the Department of Neurology, University Clinic of Salzburg, 160 patients were diagnosed with either AD, MCI, or SCC, based on neuropsychological diagnostics (Bathke et al. (2015), preprint). This data set contains z-scores for brain rate and Hjorth complexity, each measured at frontal, temporal and central electrode positions and averaged across hemispheres. In addition to standardization, complexity values were multiplied by -1 in order to make them more easily comparable to brain rate values: brain rate values are known to decrease with age and pathology, while Hjorth complexity values are known to increase with age and pathology. The data set contains three between-subjects factors: sex (men vs. women), diagnosis (AD vs. MCI vs. SCC), and age (\(< 70\) vs. \(\geq 70\) years). Additionally, the within-subjects factors region (frontal, temporal, central) and feature (brain rate, complexity) structure the response vector. The model below uses the two whole-plot factors sex and diagnosis together with the two sub-plot factors feature and region.

data(EEG)
set.seed(987)
EEG_model <- RM(resp ~ sex * diagnosis * feature * region, 
                     data = EEG, subject = "id", no.subf = 2, resampling = "WildBS",
                     iter = 1000,  alpha = 0.01, CPU = 1)
summary(EEG_model)
## Call: 
## resp ~ sex * diagnosis * feature * region
## 
## Descriptive:
##    sex diagnosis    feature   region  n       Means Lower 99 % CI
## 1    M        AD  brainrate  central 12 -1.00974303   -4.88050009
## 13   M        AD  brainrate  frontal 12 -1.00676081   -4.99056034
## 25   M        AD  brainrate temporal 12 -0.98728648   -4.49344448
## 7    M        AD complexity  central 12 -1.48789095  -10.05319599
## 19   M        AD complexity  frontal 12 -1.08562580   -6.90620385
## 31   M        AD complexity temporal 12 -1.32044015   -7.20316997
## 3    M       MCI  brainrate  central 27 -0.44742815   -1.59051263
## 15   M       MCI  brainrate  frontal 27 -0.46371997   -1.64609499
## 27   M       MCI  brainrate temporal 27 -0.50628015   -1.58436017
## 9    M       MCI complexity  central 27 -0.25680596   -1.13870013
## 21   M       MCI complexity  frontal 27 -0.45922121   -1.99708310
## 33   M       MCI complexity temporal 27 -0.49000571   -1.79618083
## 5    M       SCC  brainrate  central 20  0.45927248   -0.41381197
## 17   M       SCC  brainrate  frontal 20  0.24296492   -0.67023546
## 29   M       SCC  brainrate temporal 20  0.40875170   -1.21026961
## 11   M       SCC complexity  central 20  0.34866657   -0.06998078
## 23   M       SCC complexity  frontal 20  0.09532784   -1.03651243
## 35   M       SCC complexity temporal 20  0.31400312   -0.59757724
## 2    W        AD  brainrate  central 24 -0.29374893   -1.97838450
## 14   W        AD  brainrate  frontal 24 -0.15918975   -1.81317962
## 26   W        AD  brainrate temporal 24 -0.28539451   -1.77644839
## 8    W        AD complexity  central 24 -0.12774393   -1.37182217
## 20   W        AD complexity  frontal 24  0.02573385   -1.21242033
## 32   W        AD complexity temporal 24 -0.19371702   -1.67017970
## 4    W       MCI  brainrate  central 30 -0.10647305   -1.07581790
## 16   W       MCI  brainrate  frontal 30 -0.07356190   -1.03211025
## 28   W       MCI  brainrate temporal 30 -0.06924853   -1.06377356
## 10   W       MCI complexity  central 30  0.09398357   -0.46428563
## 22   W       MCI complexity  frontal 30  0.13130291   -0.76790741
## 34   W       MCI complexity temporal 30  0.12144023   -0.65193792
## 6    W       SCC  brainrate  central 47  0.53736580   -0.04918933
## 18   W       SCC  brainrate  frontal 47  0.54829110   -0.06244934
## 30   W       SCC  brainrate temporal 47  0.55891259   -0.01510047
## 12   W       SCC complexity  central 47  0.38428656    0.10967160
## 24   W       SCC complexity  frontal 47  0.40347289   -0.03793270
## 36   W       SCC complexity temporal 47  0.50641224    0.13237975
##    Upper 99 % CI
## 1      2.8610140
## 13     2.9770387
## 25     2.5188715
## 7      7.0774141
## 19     4.7349522
## 31     4.5622897
## 3      0.6956563
## 15     0.7186551
## 27     0.5717999
## 9      0.6250882
## 21     1.0786407
## 33     0.8161694
## 5      1.3323569
## 17     1.1561653
## 29     2.0277730
## 11     0.7673139
## 23     1.2271681
## 35     1.2255835
## 2      1.3908866
## 14     1.4948001
## 26     1.2056594
## 8      1.1163343
## 20     1.2638880
## 32     1.2827457
## 4      0.8628718
## 16     0.8849864
## 28     0.9252765
## 10     0.6522528
## 22     1.0305132
## 34     0.8948184
## 6      1.1239209
## 18     1.1590315
## 30     1.1329256
## 12     0.6589015
## 24     0.8448785
## 36     0.8804447
## 
## Wald-Type Statistic (WTS):
##                              Test statistic df      p-value
## sex                              9.97296133  1 1.588558e-03
## diagnosis                       42.38284398  2 6.261558e-10
## sex:diagnosis                    3.77699775  2 1.512988e-01
## feature                          0.08646315  1 7.687226e-01
## sex:feature                      2.16726982  1 1.409763e-01
## diagnosis:feature                5.31686865  2 7.005782e-02
## sex:diagnosis:feature            1.73538341  2 4.199197e-01
## region                           0.06961597  2 9.657908e-01
## sex:region                       0.87591221  2 6.453541e-01
## diagnosis:region                 6.12148594  4 1.902575e-01
## sex:diagnosis:region             1.53151907  4 8.210440e-01
## feature:region                   0.65247268  2 7.216346e-01
## sex:feature:region               0.42264779  2 8.095118e-01
## diagnosis:feature:region         7.14232065  4 1.285557e-01
## sex:diagnosis:feature:region     2.27379685  4 6.855437e-01
## 
## ANOVA-Type Statistic (ATS):
##                              Test statistic      df1      df2      p-value
## sex                              9.97296133 1.000000 657.4156 1.661290e-03
## diagnosis                       13.12350327 1.343095 657.4156 5.927116e-05
## sex:diagnosis                    1.90378291 1.343095 657.4156 1.635926e-01
## feature                          0.08646315 1.000000      Inf 7.687226e-01
## sex:feature                      2.16726982 1.000000      Inf 1.409763e-01
## diagnosis:feature                1.43695004 1.561987      Inf 2.380325e-01
## sex:diagnosis:feature            1.03088198 1.561987      Inf 3.416028e-01
## region                           0.01784709 1.610689      Inf 9.650363e-01
## sex:region                       0.37086704 1.610689      Inf 6.440475e-01
## diagnosis:region                 1.09114816 2.045826      Inf 3.368500e-01
## sex:diagnosis:region             0.37621401 2.045826      Inf 6.912290e-01
## feature:region                   0.12552423 1.420760      Inf 8.099026e-01
## sex:feature:region               0.07709516 1.420760      Inf 8.636365e-01
## diagnosis:feature:region         0.82935062 1.624110      Inf 4.146809e-01
## sex:diagnosis:feature:region     0.61120071 1.624110      Inf 5.098169e-01
## 
## p-values resampling:
##                              WildBS (WTS) WildBS (ATS)
## sex                                 0.000        0.000
## diagnosis                           0.000        0.000
## sex:diagnosis                       0.128        0.116
## feature                             0.778        0.778
## sex:feature                         0.168        0.168
## diagnosis:feature                   0.061        0.250
## sex:diagnosis:feature               0.445        0.366
## region                              0.976        0.985
## sex:region                          0.688        0.723
## diagnosis:region                    0.181        0.308
## sex:diagnosis:region                0.871        0.819
## feature:region                      0.811        0.921
## sex:feature:region                  0.886        0.955
## diagnosis:feature:region            0.129        0.522
## sex:diagnosis:feature:region        0.747        0.636

We find significant effects at level \(\alpha = 0.01\) of the whole-plot factors sex and diagnosis, while none of the sub-plot factors or interactions are significant.
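
This conclusion can be reproduced by comparing the resampling p-values with the chosen level. The sketch below hard-codes the WildBS (WTS) column from the summary output above; the object names p_wild_wts and significant are hypothetical.

```r
# Wild bootstrap p-values for the WTS, copied from the summary output above
p_wild_wts <- c(
  "sex" = 0.000, "diagnosis" = 0.000, "sex:diagnosis" = 0.128,
  "feature" = 0.778, "sex:feature" = 0.168, "diagnosis:feature" = 0.061,
  "sex:diagnosis:feature" = 0.445, "region" = 0.976, "sex:region" = 0.688,
  "diagnosis:region" = 0.181, "sex:diagnosis:region" = 0.871,
  "feature:region" = 0.811, "sex:feature:region" = 0.886,
  "diagnosis:feature:region" = 0.129, "sex:diagnosis:feature:region" = 0.747
)

alpha <- 0.01
significant <- names(p_wild_wts)[p_wild_wts < alpha]
significant
```

With a fitted object, the same table should be available as EEG_model$resampling, by analogy with model1$resampling in the first example.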

Plotting

The RM() function is equipped with a plotting option, displaying the calculated means along with \((1-\alpha)\) confidence intervals. The plot function takes an RM object as an argument. In addition, the factor of interest may be specified. If this argument is omitted in a two- or higher-way layout, the user is asked to specify the factor for plotting. Furthermore, additional graphical parameters can be used to customize the plots. The optional argument legendpos specifies the position of the legend in higher-way layouts.

plot(EEG_model, factor = "sex", main = "Effect of sex on EEG values")

plot(EEG_model, factor = "sex:diagnosis", legendpos = "topleft", col = c(4, 2))

plot(EEG_model, factor = "sex:diagnosis:feature", legendpos = "center")
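
The quantities shown in such a plot are means with \((1-\alpha)\) confidence intervals. A minimal base-R sketch of how one such interval can be computed for a single factor level combination, assuming t-quantile-based intervals as described for the Descriptive table (the data and the names x, half and ci are hypothetical, and the package's exact computation may differ):

```r
# Sketch: mean with a (1 - alpha) t-based confidence interval for one cell
x     <- c(1.48, 1.04, 1.21, 1.37, 1.29, 1.55, 1.10, 1.42)  # hypothetical measurements
alpha <- 0.05
n     <- length(x)

# Half-width: t-quantile times the standard error of the mean
half <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
ci   <- c(n = n, Mean = mean(x), Lower = mean(x) - half, Upper = mean(x) + half)
ci
```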

The MANOVA function

The MANOVA function calculates the above-mentioned test statistics for multivariate data in a design with crossed or nested factors. The resampling methods provided are a parametric bootstrap approach and a wild bootstrap using Rademacher weights. The wild bootstrap is also implemented for the ATS, while the parametric approach works only for the WTS; see Konietschke et al. (2015) for details. Note that only balanced nested designs (i.e., the same number \(b\) of levels of the nested factor within each level of the factor \(A\)) with up to three factors are implemented. Designs involving both crossed and nested factors are not implemented.
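
The parametric bootstrap for the WTS can be sketched in base R for the simplest multivariate case, two groups with possibly unequal covariance matrices. The sketch draws bootstrap samples from centered normal distributions with the estimated group covariance matrices and recomputes the statistic each time. It is a simplified illustration on simulated data under these assumptions, not the package's internal code; all names (wts2, rmvn0, X1, X2) are hypothetical.

```r
# Sketch of a parametric bootstrap for a two-group Wald-type statistic (toy data)
set.seed(7)
p  <- 3
X1 <- matrix(rnorm(10 * p), 10, p)          # group 1: 10 subjects, p outcomes
X2 <- matrix(rnorm(15 * p, sd = 2), 15, p)  # group 2: larger variance, different n

# Wald-type statistic for equal mean vectors, allowing unequal covariances
wts2 <- function(A, B) {
  d <- colMeans(A) - colMeans(B)
  S <- cov(A) / nrow(A) + cov(B) / nrow(B)
  drop(t(d) %*% solve(S, d))
}
w_obs <- wts2(X1, X2)

# Draw n rows from N(0, V) using the Cholesky factor of V
rmvn0 <- function(n, V) matrix(rnorm(n * ncol(V)), n) %*% chol(V)

V1 <- cov(X1); V2 <- cov(X2)
w_boot <- replicate(1000, wts2(rmvn0(nrow(X1), V1), rmvn0(nrow(X2), V2)))

p_asym <- pchisq(w_obs, df = p, lower.tail = FALSE)  # asymptotic chi-square p-value
p_boot <- mean(w_boot >= w_obs)                      # parametric bootstrap p-value
c(WTS = w_obs, asymptotic = p_asym, paramBS = p_boot)
```

Comparing the two p-values on small samples illustrates why the resampling version is preferred: the \(\chi^2\) approximation tends to be liberal when group sizes are small.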

Data Example MANOVA (Two crossed factors)

We again consider the data set EEG from the MANOVA.RM package, but now we ignore the sub-plot factor structure. Therefore, we are now in a multivariate setting with 6 measurements per patient and three crossed factors sex, age and diagnosis. Due to the small number of subjects in some groups (e.g., only 2 male patients aged \(<\) 70 were diagnosed with AD) we restrict our analyses to two factors at a time. The analysis of this example is shown below.

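The call below passes to the MANOVA function the model formula, the data frame (data), the name of the column identifying the subjects (subject), the resampling method (resampling), the number of resampling iterations (iter), the significance level (alpha) and the number of CPU cores to use (CPU):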

data(EEG)
set.seed(987)
EEG_MANOVA <- MANOVA(resp ~ sex * diagnosis, 
                     data = EEG, subject = "id", resampling = "paramBS", 
                     iter = 1000,  alpha = 0.01, CPU = 1)
summary(EEG_MANOVA)
## Call: 
## resp ~ sex * diagnosis
## 
## Descriptive:
##   sex diagnosis  n      Mean 1     Mean 2     Mean 3     Mean 4
## 1   M        AD 12 -0.98728648 -1.0067608 -1.0097430 -1.3204402
## 3   M       MCI 27 -0.50628015 -0.4637200 -0.4474281 -0.4900057
## 5   M       SCC 20  0.40875170  0.2429649  0.4592725  0.3140031
## 2   W        AD 24 -0.28539451 -0.1591898 -0.2937489 -0.1937170
## 4   W       MCI 30 -0.06924853 -0.0735619 -0.1064731  0.1214402
## 6   W       SCC 47  0.55891259  0.5482911  0.5373658  0.5064122
##        Mean 5      Mean 6
## 1 -1.08562580 -1.48789095
## 3 -0.45922121 -0.25680596
## 5  0.09532784  0.34866657
## 2  0.02573385 -0.12774393
## 4  0.13130291  0.09398357
## 6  0.40347289  0.38428656
## 
## Wald-Type Statistic (WTS):
##               Test statistic df      p-value
## sex                12.604176  6 4.977046e-02
## diagnosis          55.158000 12 1.695621e-07
## sex:diagnosis       9.790162 12 6.343637e-01
## 
## ANOVA-Type Statistic (ATS):
##               Test statistic      df1 df2      p-value
## sex                 7.333559 1.796816 Inf 1.047708e-03
## diagnosis           9.624691 2.323563 Inf 2.244646e-05
## sex:diagnosis       1.536449 2.323563 Inf 2.116279e-01
## 
## p-values resampling:
##               paramBS (WTS) paramBS (ATS)
## sex                   0.122            NA
## diagnosis             0.001            NA
## sex:diagnosis         0.753            NA

Optional GUI

The MANOVA.RM package is equipped with an optional graphical user interface, which is based on RGtk2. If RGtk2 is installed, the GUI may be started in R using the commands GUI.RM() and GUI.MANOVA() for repeated measures designs and multivariate data, respectively.

GUI.MANOVA()

The user can specify the data location (either directly or via the “load data” button), the formula, the number of iterations for the resampling approach and the significance level. Furthermore, one needs to specify the number of sub-plot factors (for the repeated measures design only), the 'subject' variable in the data frame and the resampling method. Additionally, one can specify whether or not headers are included in the data file, and which separator (e.g., ',' for *.csv files) and character symbols are used for decimals in the data file. The GUI for RM also provides a plotting option, which generates a new window for specifying the factors to be plotted (in higher-way layouts) along with a few plotting parameters.