The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

The paired function

Ethan Heinzen, Beth Atkinson, Jason Sinnwell

Introduction

Another one of the most common tables in medical literature includes summary statistics for a set of variables paired across two time points. Locally at Mayo, the SAS macro %paired was written to create summary tables with a single call. With the increasing interest in R, we have developed the function paired() to create similar tables within the R environment.

This vignette is light on purpose; paired() piggybacks off of tableby, so most documentation there applies here, too.

Simple Example

The first step when using the paired() function is to load the arsenal package. We can’t use mockstudy here because we need a dataset with paired observations, so we’ll create our own dataset.

library(arsenal)
dat <- data.frame(
  tp = paste0("Time Point ", c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)),
  id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6),
  Cat = c("A", "A", "A", "B", "B", "B", "B", "A", NA, "B"),
  Fac = factor(c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A")),
  Num = c(1, 2, 3, 4, 4, 3, 3, 4, 0, NA),
  Ord = ordered(c("I", "II", "II", "III", "III", "III", "I", "III", "II", "I")),
  Lgl = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE),
  Dat = as.Date("2018-05-01") + c(1, 1, 2, 2, 3, 4, 5, 6, 3, 4),
  stringsAsFactors = FALSE
)

To create a simple table stratified by time point, use a formula= statement to specify the variables that you want summarized and the id= argument to specify the paired observations.

p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id, signed.rank.exact = FALSE)
summary(p)
Time Point 1 (N=4) Time Point 2 (N=4) Difference (N=4) p value
Cat 1.000
   A 2 (50.0%) 2 (50.0%) 1 (50.0%)
   B 2 (50.0%) 2 (50.0%) 1 (50.0%)
Fac 0.261
   A 2 (50.0%) 1 (25.0%) 2 (100.0%)
   B 1 (25.0%) 2 (50.0%) 1 (100.0%)
   C 1 (25.0%) 1 (25.0%) 1 (100.0%)
Num 0.391
   Mean (SD) 2.750 (1.258) 3.250 (0.957) 0.500 (1.000)
   Range 1.000 - 4.000 2.000 - 4.000 -1.000 - 1.000
Ord 0.174
   I 2 (50.0%) 0 (0.0%) 2 (100.0%)
   II 1 (25.0%) 1 (25.0%) 1 (100.0%)
   III 1 (25.0%) 3 (75.0%) 0 (0.0%)
Lgl 1.000
   FALSE 2 (50.0%) 1 (25.0%) 2 (100.0%)
   TRUE 2 (50.0%) 3 (75.0%) 1 (50.0%)
Dat 0.182
   Median 2018-05-03 2018-05-04 0.500
   Range 2018-05-02 - 2018-05-06 2018-05-02 - 2018-05-07 0.000 - 1.000

The third column shows the difference between time point 1 and time point 2. For categorical variables, it reports the percent of observations from time point 1 which changed in time point 2.

NAs

Note that by default, observations which do not have both timepoints are removed. This is easily changed using the na.action = na.paired("<arg>") argument. For example:

p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id,
            signed.rank.exact = FALSE, na.action = na.paired("fill"))
summary(p)
Time Point 1 (N=6) Time Point 2 (N=6) Difference (N=6) p value
Cat 1.000
   N-Miss 2 1 2
   A 2 (50.0%) 2 (40.0%) 1 (50.0%)
   B 2 (50.0%) 3 (60.0%) 1 (50.0%)
Fac 0.261
   N-Miss 1 1 2
   A 2 (40.0%) 2 (40.0%) 2 (100.0%)
   B 1 (20.0%) 2 (40.0%) 1 (100.0%)
   C 2 (40.0%) 1 (20.0%) 1 (100.0%)
Num 0.391
   N-Miss 1 2 2
   Mean (SD) 2.200 (1.643) 3.250 (0.957) 0.500 (1.000)
   Range 0.000 - 4.000 2.000 - 4.000 -1.000 - 1.000
Ord 0.174
   N-Miss 1 1 2
   I 2 (40.0%) 1 (20.0%) 2 (100.0%)
   II 2 (40.0%) 1 (20.0%) 1 (100.0%)
   III 1 (20.0%) 3 (60.0%) 0 (0.0%)
Lgl 1.000
   N-Miss 1 1 2
   FALSE 3 (60.0%) 2 (40.0%) 2 (100.0%)
   TRUE 2 (40.0%) 3 (60.0%) 1 (50.0%)
Dat 0.182
   N-Miss 1 1 2
   Median 2018-05-04 2018-05-05 0.500
   Range 2018-05-02 - 2018-05-06 2018-05-02 - 2018-05-07 0.000 - 1.000

For more details, see the help page for na.paired().

Available Function Options

Testing options

The tests used to calculate p-values differ by the variable type, but can be specified explicitly in the formula statement or in the control function.

The following tests are accepted:

paired.control settings

A quick way to see what arguments are possible to utilize in a function is to use the args() command. Settings involving the number of digits can be set in paired.control or in summary.tableby.

args(paired.control)
## function (diff = TRUE, numeric.test = "paired.t", cat.test = "mcnemar", 
##     ordered.test = "signed.rank", date.test = "paired.t", mcnemar.correct = TRUE, 
##     signed.rank.exact = NULL, signed.rank.correct = TRUE, ...) 
## NULL

summary.tableby settings

Since the “paired” object inherits “tableby”, the summary.tableby function is what’s actually used to format and print the table.

args(arsenal:::summary.tableby)
## function (object, ..., labelTranslations = NULL, text = FALSE, 
##     title = NULL, pfootnote = FALSE, term.name = "") 
## NULL

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.