Using dabestr

How to create estimation plots

Joses Ho

2019-06-26

Create Data

For this vignette, we will create and use a synthetic dataset.

library(dplyr)

set.seed(54321)

N <- 40
c1 <- rnorm(N, mean = 100, sd = 25)
c2 <- rnorm(N, mean = 100, sd = 50)
g1 <- rnorm(N, mean = 120, sd = 25)
g2 <- rnorm(N, mean = 80, sd = 50)
g3 <- rnorm(N, mean = 100, sd = 12)
g4 <- rnorm(N, mean = 100, sd = 50)
gender <- c(rep('Male', N/2), rep('Female', N/2))
id <- 1:N


wide.data <- 
  tibble::tibble(
    Control1 = c1, Control2 = c2,
    Group1 = g1, Group2 = g2, Group3 = g3, Group4 = g4,
    Gender = gender, ID = id)


my.data   <- 
  wide.data %>%
  tidyr::gather(key = Group, value = Measurement, -ID, -Gender)

head(my.data)
## # A tibble: 6 x 4
##   Gender    ID Group    Measurement
##   <chr>  <int> <chr>          <dbl>
## 1 Male       1 Control1        95.5
## 2 Male       2 Control1        76.8
## 3 Male       3 Control1        80.4
## 4 Male       4 Control1        58.7
## 5 Male       5 Control1        89.8
## 6 Male       6 Control1        72.6

This is a tidy dataset: each observation (datapoint) is a row, and each variable (or associated metadata) is a column. dabestr requires data in this form, as do other popular R packages for data visualization and analysis.
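To make the wide-versus-tidy distinction concrete, here is a minimal base R sketch of the same reshaping on a three-row toy table. The values and the names `wide_toy` and `long_toy` are made up for illustration and are not part of the vignette's dataset.

```r
# Toy wide table: one row per subject, one column per group (made-up values).
wide_toy <- data.frame(
  ID       = 1:3,
  Control1 = c(10, 12, 11),
  Group1   = c(14, 15, 13)
)

# Tidy (long) form: one row per observation, with the group stored in a column.
long_toy <- data.frame(
  ID          = rep(wide_toy$ID, times = 2),
  Group       = rep(c("Control1", "Group1"), each = nrow(wide_toy)),
  Measurement = c(wide_toy$Control1, wide_toy$Group1)
)

long_toy
```

Each subject now appears once per group, which is the layout that the Group and Measurement columns of my.data follow.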

The Gardner-Altman Two Group Estimation Plot

Unpaired

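The dabest function is the main workhorse of the dabestr package. To set up a two-group (Gardner-Altman) analysis, supply the data, the column holding the group labels, the column holding the measurements, and an idx vector naming the control group followed by the test group: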

library(dabestr)

two.group.unpaired <- 
  my.data %>%
  dabest(Group, Measurement, 
         # The idx below passes "Control1" as the control group, 
         # and "Group1" as the test group. The mean difference
         # will be computed as mean(Group1) - mean(Control1).
         idx = c("Control1", "Group1"), 
         paired = FALSE)

# Calling the object automatically prints out a summary.
two.group.unpaired 
## DABEST (Data Analysis with Bootstrap Estimation) v0.2.1
## =======================================================
## 
## Variable: Measurement 
## 
## Unpaired mean difference of Group1 (n=40) minus Control1 (n=40)
##  19.2 [95CI  7.16; 30.4]
## 
## 
## 5000 bootstrap resamples.
## All confidence intervals are bias-corrected and accelerated.

To create a two-group estimation plot (aka a Gardner-Altman plot), simply use plot(dabest.object).

Advanced R users may be interested to learn that dabest returns an object of class dabest. The package provides an S3 plot method for dabest objects, and this method produces the estimation plot.
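For readers unfamiliar with S3 dispatch, the following sketch shows the mechanism on a toy class. The class `toy_estimate` and its constructor are hypothetical and are not part of dabestr; the sketch uses format and print methods rather than plot so that it runs without a graphics device, but a plot method is found in exactly the same way.

```r
# Hypothetical constructor: tag a plain list with a class attribute.
new_toy_estimate <- function(difference) {
  structure(list(difference = difference), class = "toy_estimate")
}

# S3 methods are ordinary functions named <generic>.<class>.
format.toy_estimate <- function(x, ...) {
  sprintf("toy estimate: %.1f", x$difference)
}

print.toy_estimate <- function(x, ...) {
  cat(format(x), "\n")
  invisible(x)
}

est <- new_toy_estimate(19.2)
est_label <- format(est)  # dispatches to format.toy_estimate
print(est)                # dispatches to print.toy_estimate
```

Calling a generic on the object is enough; R selects the method from the object's class.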

plot(two.group.unpaired, color.column = Gender)

This is known as a Gardner-Altman estimation plot, after Martin J. Gardner and Douglas Altman, who were the first to publish it in 1986.

The key features of the Gardner-Altman estimation plot are:

  1. All data points are plotted.
  2. The mean difference (the effect size) and its 95% confidence interval (95% CI) are displayed as a point estimate and a vertical bar, respectively, on a separate but aligned axis.

The estimation plot produced by dabest differs from the one first introduced by Gardner and Altman in one important aspect. dabest derives the 95% CI through nonparametric bootstrap resampling. This enables visualization of the confidence interval as a graded sampling distribution.

The 95% CI presented is bias-corrected and accelerated (i.e. a BCa bootstrap). You can read more about bootstrap resampling and BCa correction in this vignette.
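The idea of bootstrapping a mean difference can be sketched in a few lines of base R. Note the assumptions: this sketch uses freshly simulated vectors rather than my.data, and it computes a plain percentile interval, which is simpler than the BCa interval that dabest reports, so it is not a reimplementation of the package.

```r
set.seed(1)

# Simulated stand-ins for a control group and a test group.
control <- rnorm(40, mean = 100, sd = 25)
test    <- rnorm(40, mean = 120, sd = 25)

observed_diff <- mean(test) - mean(control)

# Resample each group with replacement and recompute the mean difference.
n_boot <- 5000
boot_diffs <- replicate(n_boot, {
  mean(sample(test, replace = TRUE)) - mean(sample(control, replace = TRUE))
})

# Percentile 95% interval (NOT bias-corrected and accelerated).
percentile_ci <- quantile(boot_diffs, probs = c(0.025, 0.975))

observed_diff
percentile_ci
```

The vector boot_diffs is the resampling distribution that an estimation plot draws as a curve next to the point estimate.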

Paired

If you have paired or repeated observations, you must specify the id.col, a column in the data that indicates the identity of each paired observation. This will produce a Tufte slopegraph instead of a swarmplot.

two.group.paired <- 
  my.data %>%
  dabest(Group, Measurement, 
         idx = c("Control1", "Group1"), 
         paired = TRUE, id.col = ID)

# The summary indicates this is a paired comparison. 
two.group.paired 
## DABEST (Data Analysis with Bootstrap Estimation) v0.2.1
## =======================================================
## 
## Variable: Measurement 
## 
## Paired mean difference of Group1 (n=40) minus Control1 (n=40)
##  19.2 [95CI  7.45; 31]
## 
## 
## 5000 bootstrap resamples.
## All confidence intervals are bias-corrected and accelerated.
plot(two.group.paired, color.column = Gender)
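What pairing changes can be sketched in base R: observations are matched by ID, and the within-subject differences are resampled instead of the two groups separately. This sketch rests on assumptions: it uses a small simulated table, the object names are illustrative, and the interval is a percentile interval rather than the BCa interval dabest computes.

```r
set.seed(2)

# Simulated long-format data: 10 subjects measured in two conditions.
long_toy <- data.frame(
  ID          = rep(1:10, times = 2),
  Group       = rep(c("Control1", "Group1"), each = 10),
  Measurement = c(rnorm(10, mean = 100, sd = 10), rnorm(10, mean = 110, sd = 10))
)

control <- long_toy[long_toy$Group == "Control1", ]
test    <- long_toy[long_toy$Group == "Group1", ]

# Align the test rows to the control rows by subject ID.
test <- test[match(control$ID, test$ID), ]

# One difference per subject; these are what get resampled.
paired_diffs <- test$Measurement - control$Measurement

boot_means <- replicate(5000, mean(sample(paired_diffs, replace = TRUE)))
paired_ci  <- quantile(boot_means, probs = c(0.025, 0.975))

mean(paired_diffs)
paired_ci
```

The point estimate equals the unpaired mean difference, which is why both summaries above report 19.2; only the interval differs, because the resampling unit is the subject rather than the individual observation.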

The Cumming Estimation Plot

Multi-two group

To create a multi-two group plot, supply a list to idx, with each element of the list corresponding to one two-group comparison.

multi.two.group.unpaired <- 
  my.data %>%
  dabest(Group, Measurement, 
         idx = list(c("Control1", "Group1"), 
                    c("Control2", "Group2")),
         paired = FALSE
         )

multi.two.group.unpaired 
## DABEST (Data Analysis with Bootstrap Estimation) v0.2.1
## =======================================================
## 
## Variable: Measurement 
## 
## Unpaired mean difference of Group1 (n=40) minus Control1 (n=40)
##  19.2 [95CI  7.16; 30.4]
## 
## Unpaired mean difference of Group2 (n=40) minus Control2 (n=40)
##  -23.9 [95CI  -44.8; -3.1]
## 
## 
## 5000 bootstrap resamples.
## All confidence intervals are bias-corrected and accelerated.
plot(multi.two.group.unpaired, color.column = Gender)

This is a Cumming estimation plot. It is heavily influenced by the plot designs of Geoff Cumming in his 2012 text Understanding the New Statistics. The effect size and 95% CIs are plotted on a separate axis that is now positioned below the raw data. In addition, summary measurements are displayed as gapped lines to the right of each group. These vertical lines are equivalent to conventional mean ± standard deviation error bars. Here, the mean of each group is indicated as a gap in the line, drawing inspiration from Edward Tufte’s dictum to maximise the data-ink ratio.

By default, dabest plots the mean ± standard deviation of each group as a gapped line beside each group. Setting group.summaries = 'median_quartiles' plots the median and the 25th and 75th percentiles of each group instead. If group.summaries = NULL, the summaries are not shown.

plot(multi.two.group.unpaired, color.column = Gender, 
     group.summaries = "median_quartiles")
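The two kinds of group summary can be computed by hand to see what each gapped line represents. This is a base R sketch on one simulated group; the names `mean_sd` and `median_quartiles` are illustrative, and the code does not call dabestr.

```r
set.seed(3)

# One simulated group of 40 measurements.
x <- rnorm(40, mean = 100, sd = 25)

# Default summary: the gap marks the mean, the line ends mark mean -/+ one SD.
mean_sd <- c(lower = mean(x) - sd(x), centre = mean(x), upper = mean(x) + sd(x))

# Alternative summary: the gap marks the median, the line ends mark the quartiles.
median_quartiles <- quantile(x, probs = c(0.25, 0.5, 0.75))

mean_sd
median_quartiles
```

The median-and-quartiles summary is the more robust choice when a group is skewed or contains outliers.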

Multi-paired

One can also produce a multi-paired plot.

multi.two.group.paired <- 
  my.data %>%
  dabest(Group, Measurement, 
         idx = list(c("Control1", "Group1"), 
                    c("Control2", "Group2")),
         paired = TRUE, id.col = ID
         )

multi.two.group.paired 
## DABEST (Data Analysis with Bootstrap Estimation) v0.2.1
## =======================================================
## 
## Variable: Measurement 
## 
## Paired mean difference of Group1 (n=40) minus Control1 (n=40)
##  19.2 [95CI  7.45; 31]
## 
## Paired mean difference of Group2 (n=40) minus Control2 (n=40)
##  -23.9 [95CI  -42.9; -4.61]
## 
## 
## 5000 bootstrap resamples.
## All confidence intervals are bias-corrected and accelerated.
plot(multi.two.group.paired, color.column = Gender, slopegraph = TRUE)

Shared Control

If you supply a character vector to idx with more than 2 groups, a shared control plot will be produced.

shared.control <- 
  my.data %>%
  dabest(Group, Measurement, 
         idx = c("Control2", "Group2", "Group4"),
         paired = FALSE
         )

shared.control 
## DABEST (Data Analysis with Bootstrap Estimation) v0.2.1
## =======================================================
## 
## Variable: Measurement 
## 
## Unpaired mean difference of Group2 (n=40) minus Control2 (n=40)
##  -23.9 [95CI  -44.8; -3.1]
## 
## Unpaired mean difference of Group4 (n=40) minus Control2 (n=40)
##  -4.54 [95CI  -27.4; 17.8]
## 
## 
## 5000 bootstrap resamples.
## All confidence intervals are bias-corrected and accelerated.
plot(shared.control, color.column = Gender, rawplot.type = "swarmplot")
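In a shared control layout, every test group is compared against the same first group. A minimal base R sketch of those point estimates follows; it uses simulated vectors rather than my.data, and the list `groups` is an illustrative stand-in, not a dabestr structure.

```r
set.seed(4)

# Simulated groups; the first element plays the role of the shared control.
groups <- list(
  Control2 = rnorm(40, mean = 100, sd = 50),
  Group2   = rnorm(40, mean = 80,  sd = 50),
  Group4   = rnorm(40, mean = 100, sd = 50)
)

control_name <- names(groups)[1]
test_names   <- names(groups)[-1]

# Mean difference of each test group minus the shared control.
mean_diffs <- vapply(
  test_names,
  function(g) mean(groups[[g]]) - mean(groups[[control_name]]),
  numeric(1)
)

mean_diffs
```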

Multi-group plot

multi.group <- 
  my.data %>%
  dabest(Group, Measurement, 
         idx = list(c("Control1", "Group1", "Group3"), 
                    c("Control2", "Group2", "Group4")),
         paired = FALSE
        )

multi.group 
## DABEST (Data Analysis with Bootstrap Estimation) v0.2.1
## =======================================================
## 
## Variable: Measurement 
## 
## Unpaired mean difference of Group1 (n=40) minus Control1 (n=40)
##  19.2 [95CI  7.16; 30.4]
## 
## Unpaired mean difference of Group3 (n=40) minus Control1 (n=40)
##  0.83 [95CI  -9.17; 9.99]
## 
## Unpaired mean difference of Group2 (n=40) minus Control2 (n=40)
##  -23.9 [95CI  -44.8; -3.1]
## 
## Unpaired mean difference of Group4 (n=40) minus Control2 (n=40)
##  -4.54 [95CI  -27.4; 17.8]
## 
## 
## 5000 bootstrap resamples.
## All confidence intervals are bias-corrected and accelerated.
plot(multi.group, color.column = Gender)
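The idx layouts used so far (a single vector, or a list of vectors) each imply a set of test-minus-control comparisons: within every vector, the first group is the control and each remaining group is compared against it. The helper below, `expand_idx`, is a hypothetical illustration of that rule and is not taken from dabestr's source.

```r
# Hypothetical helper: list the (control, test) pairs implied by an idx value.
expand_idx <- function(idx) {
  # Treat a bare character vector as a list with a single element.
  if (!is.list(idx)) idx <- list(idx)
  pairs <- lapply(idx, function(grp) {
    data.frame(control = grp[1], test = grp[-1], stringsAsFactors = FALSE)
  })
  do.call(rbind, pairs)
}

comparison_pairs <- expand_idx(
  list(c("Control1", "Group1", "Group3"),
       c("Control2", "Group2", "Group4"))
)

comparison_pairs
```

The four rows correspond, in order, to the four mean differences printed in the multi.group summary above.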

Controlling Aesthetics

You can control several graphical aspects of the estimation plot.

Use the rawplot.ylim and effsize.ylim parameters to supply custom y-limits for the rawplot and the delta plot, respectively.

plot(multi.group, color.column = Gender,
     rawplot.ylim = c(-100, 200),
     effsize.ylim = c(-60, 60)
    )

You can control the size of the dots used to draw the raw data with rawplot.markersize. The default size (in points) is 2.

To obtain an aesthetically pleasing plot, you should use this option in tandem with the rawplot.groupwidth option, which sets the maximum amount that each group of datapoints is allowed to spread in the x-direction. The default is 0.3.

plot(multi.group, color.column = Gender,
     rawplot.markersize = 1,
     rawplot.groupwidth = 0.4
    )

The rawplot.ylabel and effsize.ylabel parameters control the y-axis titles for the rawplot and the delta plot, respectively.

plot(multi.group, color.column = Gender,
     rawplot.ylabel = "Rawplot Title?",
     effsize.ylabel = "My delta plot!"
    )

The palette parameter accepts any ggplot2 palette. The default palette is “Set2”.

plot(multi.group, color.column = Gender,
     palette = "Dark2" # The default is "Set2".
     )