The tableby function

Beth Atkinson, Jason Sinnwell, Shannon McDonnell and Greg Dougherty

09 March, 2017

Introduction

One of the most common tables in medical literature includes summary statistics for a set of variables, often stratified by some group (e.g. treatment arm). Locally at Mayo, the SAS macros %table and %summary were written to create summary tables with a single call. With the increasing interest in R, we have developed the function tableby to create similar tables within the R environment.

In developing the tableby function, the goal was to bring the best features of these macros into an R function. However, the task was not simply to duplicate all the functionality, but rather to make use of R’s strengths (modeling, method dispatch, flexibility in function definition and output format) and make a tool that fits the needs of R users. Additionally, the results needed to fit within the general reproducible research framework so the tables could be displayed within an R markdown report.

This report provides step-by-step directions for using the functions associated with tableby. All functions presented here are available within the arsenal package. An assumption is made that users are somewhat familiar with R markdown documents. For those who are new to the topic, a good initial resource is available at rmarkdown.rstudio.com.

Simple Example

The first step when using the tableby function is to load the arsenal package. All the examples in this report use a dataset called mockstudy made available by Paul Novotny which includes a variety of types of variables (character, numeric, factor, ordered factor, survival) to use as examples.

require(arsenal)
require(knitr)
require(survival)
data(mockstudy) ##load data
dim(mockstudy)  ##look at how many subjects and variables are in the dataset 
## [1] 1499   14
# help(mockstudy) ##learn more about the dataset and variables
str(mockstudy) ##quick look at the data
## 'data.frame':    1499 obs. of  14 variables:
##  $ case       : int  110754 99706 105271 105001 112263 86205 99508 90158 88989 90515 ...
##  $ age        : atomic  67 74 50 71 69 56 50 57 51 63 ...
##   ..- attr(*, "label")= chr "Age in Years"
##  $ arm        : atomic  F: FOLFOX A: IFL A: IFL G: IROX ...
##   ..- attr(*, "label")= chr "Treatment Arm"
##  $ sex        : Factor w/ 2 levels "Male","Female": 1 2 2 2 2 1 1 1 2 1 ...
##  $ race       : atomic  Caucasian Caucasian Caucasian Caucasian ...
##   ..- attr(*, "label")= chr "Race"
##  $ fu.time    : int  922 270 175 128 233 120 369 421 387 363 ...
##  $ fu.stat    : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ ps         : int  0 1 1 1 0 0 0 0 1 1 ...
##  $ hgb        : num  11.5 10.7 11.1 12.6 13 10.2 13.3 12.1 13.8 12.1 ...
##  $ bmi        : atomic  25.1 19.5 NA 29.4 26.4 ...
##   ..- attr(*, "label")= chr "Body Mass Index (kg/m^2)"
##  $ alk.phos   : int  160 290 700 771 350 569 162 152 231 492 ...
##  $ ast        : int  35 52 100 68 35 27 16 12 25 18 ...
##  $ mdquality.s: int  NA 1 1 1 NA 1 1 1 1 1 ...
##  $ age.ord    : Ord.factor w/ 8 levels "10-19"<"20-29"<..: 6 7 4 7 6 5 4 5 5 6 ...

To create a simple table stratified by treatment arm, use a formula statement to specify the variables that you want summarized. The example below uses age (a continuous variable) and sex (a factor).

tab1 <- tableby(arm ~ sex + age, data=mockstudy)

If you want to take a quick look at the table, you can use summary on your tableby object and the table will print out as text in your R console window. If you use summary without any options you will see a number of &nbsp; statements, which translate to “space” in HTML.

Pretty text version of table

If you want a nicer version in your console window then add the text=TRUE option.

summary(tab1, text=TRUE)
## 
## ---------------------------------------------------------------------------------------------------------------------------
##                          A: IFL (N=428)      F: FOLFOX (N=691)   G: IROX (N=380)     Total (N=1499)      p value           
## ----------------------- ------------------- ------------------- ------------------- ------------------- -------------------
## Sex                                                                                                                   0.190
##    Male                 277 (64.7%)         411 (59.5%)         228 (60%)           916 (61.1%)        
##    Female               151 (35.3%)         280 (40.5%)         152 (40%)           583 (38.9%)        
## Age in Years                                                                                                          0.614
##    Mean (SD)            59.7 (11.4)         60.3 (11.6)         59.8 (11.5)         60 (11.5)          
##    Q1, Q3               53, 68              52, 69              52, 68              52, 68             
##    Range                27 - 88             19 - 88             26 - 85             19 - 88            
## ---------------------------------------------------------------------------------------------------------------------------

Pretty Rmarkdown version of table

For the table to look nice within an R markdown (knitr) report, you just need to specify results="asis" when creating the R chunk. This changes the layout slightly (compresses it) and bolds the variable names.

summary(tab1)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Sex 0.190
    Male 277 (64.7%) 411 (59.5%) 228 (60%) 916 (61.1%)
    Female 151 (35.3%) 280 (40.5%) 152 (40%) 583 (38.9%)
Age in Years 0.614
    Mean (SD) 59.7 (11.4) 60.3 (11.6) 59.8 (11.5) 60 (11.5)
    Q1, Q3 53, 68 52, 69 52, 68 52, 68
    Range 27 - 88 19 - 88 26 - 85 19 - 88

Data frame version of table

If you want a data.frame version, simply use as.data.frame.

as.data.frame(tab1)
##           term variable A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
## 1          Sex      sex                                                                   0.190
## 2         Male      sex    277 (64.7%)       411 (59.5%)       228 (60%)    916 (61.1%)        
## 3       Female      sex    151 (35.3%)       280 (40.5%)       152 (40%)    583 (38.9%)        
## 4 Age in Years      age                                                                   0.614
## 5    Mean (SD)      age    59.7 (11.4)       60.3 (11.6)     59.8 (11.5)      60 (11.5)        
## 6       Q1, Q3      age         53, 68            52, 69          52, 68         52, 68        
## 7        Range      age        27 - 88           19 - 88         26 - 85        19 - 88

Summaries using standard R code

## base R frequency example
tmp <- table(Gender=mockstudy$sex, "Study Arm"=mockstudy$arm)
tmp
##         Study Arm
## Gender   A: IFL F: FOLFOX G: IROX
##   Male      277       411     228
##   Female    151       280     152
# Note: R applies the continuity correction by default only to 2x2 tables (it is not used in %table)
chisq.test(tmp) 
## 
##  Pearson's Chi-squared test
## 
## data:  tmp
## X-squared = 3.3168, df = 2, p-value = 0.1904
## gmodels frequency example
#require(gmodels)
#CrossTable(mockstudy$sex, mockstudy$arm, prop.r=F, prop.t=F, 
#           prop.chisq=F, chisq=T, dnn=c('Gender','Study Arm'))

## base R numeric summary example
tapply(mockstudy$age, mockstudy$arm, summary)
## $`A: IFL`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   27.00   53.00   61.00   59.67   68.00   88.00 
## 
## $`F: FOLFOX`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    19.0    52.0    61.0    60.3    69.0    88.0 
## 
## $`G: IROX`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   26.00   52.00   61.00   59.76   68.00   85.00
summary(aov(age ~ arm, data=mockstudy))
##               Df Sum Sq Mean Sq F value Pr(>F)
## arm            2    129    64.7   0.487  0.614
## Residuals   1496 198628   132.8
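
R's chisq.test applies Yates' continuity correction only when the table is 2 x 2, so for the 2 x 3 sex-by-arm table above the correct argument has no effect. The sketch below checks this in base R; the counts are copied from the frequency table printed above, and the object names (counts, with_corr, without_corr, two_by_two) are only illustrative.

```r
# Counts copied from the sex-by-arm frequency table shown above
counts <- matrix(c(277, 151, 411, 280, 228, 152), nrow = 2,
                 dimnames = list(Gender = c("Male", "Female"),
                                 "Study Arm" = c("A: IFL", "F: FOLFOX", "G: IROX")))

with_corr    <- chisq.test(counts, correct = TRUE)
without_corr <- chisq.test(counts, correct = FALSE)

# For a 2 x 3 table the correction is not applied, so both calls agree
all.equal(with_corr$p.value, without_corr$p.value)

# For a 2 x 2 table (first two arms only) the correction does change the result
two_by_two <- counts[, 1:2]
p_corrected   <- chisq.test(two_by_two, correct = TRUE)$p.value
p_uncorrected <- chisq.test(two_by_two, correct = FALSE)$p.value
c(corrected = p_corrected, uncorrected = p_uncorrected)
```

This is why the chi-square p-value from base R matches the one reported by tableby for sex in this example.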

Modifying Output

Add labels

In the above example, age is shown with a label (Age in Years), but sex is listed “as is” with lower case letters. This is because the data was created in SAS and in the SAS dataset, age had a label but sex did not. The label is stored as an attribute within R.

## Look at one variable's label
attr(mockstudy$age,'label')
## [1] "Age in Years"
## See all the variables with a label
unlist(lapply(mockstudy,'attr','label'))
##                        age                        arm                       race 
##             "Age in Years"            "Treatment Arm"                     "Race" 
##                        bmi 
## "Body Mass Index (kg/m^2)"
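
As a minimal sketch of how such labels behave, the base R code below attaches a label attribute to one column of a small data frame and then lists which columns have labels. The data frame toy is hypothetical and only stands in for mockstudy.

```r
# Hypothetical toy data frame standing in for mockstudy
toy <- data.frame(age = c(67, 74, 50),
                  sex = c("Male", "Female", "Female"))

# Attach a label to one column only
attr(toy$age, "label") <- "Age in Years"

# Collect the label attribute of every column (NULL where none is set)
labs <- lapply(toy, attr, "label")
unlist(labs)   # unlist drops the NULL entries, leaving only labelled columns

# Names of the columns that still lack a label
has_label <- !vapply(labs, is.null, logical(1))
names(toy)[!has_label]
```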

If you want to add labels to other variables, there are a couple of options. First, you could add labels to the variables in your dataset.

attr(mockstudy$sex,'label')  <- 'Gender'

tab1 <- tableby(arm ~ sex + age, data=mockstudy)
summary(tab1)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Gender 0.190
    Male 277 (64.7%) 411 (59.5%) 228 (60%) 916 (61.1%)
    Female 151 (35.3%) 280 (40.5%) 152 (40%) 583 (38.9%)
Age in Years 0.614
    Mean (SD) 59.7 (11.4) 60.3 (11.6) 59.8 (11.5) 60 (11.5)
    Q1, Q3 53, 68 52, 69 52, 68 52, 68
    Range 27 - 88 19 - 88 26 - 85 19 - 88

Another option is to add labels after you have created the table.

mylabels <- list( sex = "SEX", age ="Age, yrs")
summary(tab1, labelTranslations = mylabels)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
SEX 0.190
    Male 277 (64.7%) 411 (59.5%) 228 (60%) 916 (61.1%)
    Female 151 (35.3%) 280 (40.5%) 152 (40%) 583 (38.9%)
Age, yrs 0.614
    Mean (SD) 59.7 (11.4) 60.3 (11.6) 59.8 (11.5) 60 (11.5)
    Q1, Q3 53, 68 52, 69 52, 68 52, 68
    Range 27 - 88 19 - 88 26 - 85 19 - 88

Alternatively, you can check the variable labels and manipulate them with a function called labels, which works on the tableby object.

labels(tab1)
##             arm             sex             age 
## "Treatment Arm"        "Gender"  "Age in Years"
labels(tab1) <- c(arm="Treatment Assignment", age="Baseline Age (yrs)")
labels(tab1)
##                    arm                    sex                    age 
## "Treatment Assignment"               "Gender"   "Baseline Age (yrs)"
summary(tab1)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Gender 0.190
    Male 277 (64.7%) 411 (59.5%) 228 (60%) 916 (61.1%)
    Female 151 (35.3%) 280 (40.5%) 152 (40%) 583 (38.9%)
Baseline Age (yrs) 0.614
    Mean (SD) 59.7 (11.4) 60.3 (11.6) 59.8 (11.5) 60 (11.5)
    Q1, Q3 53, 68 52, 69 52, 68 52, 68
    Range 27 - 88 19 - 88 26 - 85 19 - 88

Change summary statistics globally

Currently the default behavior is to summarize continuous variables with the number of missing values, mean (SD), 25th and 75th percentiles, and minimum and maximum values, with an ANOVA p-value (equivalent to an equal-variance t-test when there are two groups). For categorical variables the default is to show the number of missing values and the count (column percent), with a chi-square p-value. This behavior can be modified using the tableby.control function. In fact, you can save your standard settings and reuse them for future tables. Note that setting test=FALSE and total=FALSE results in the p-value column and the total column not being printed.

mycontrols  <- tableby.control(test=FALSE, total=FALSE,
                               numeric.test="kwt", cat.test="chisq",
                               numeric.stats=c("N", "median", "q1q3"),
                               cat.stats=c("countpct"),
                               stats.labels=list(N='Count', median='Median', q1q3='Q1,Q3'))
tab2 <- tableby(arm ~ sex + age, data=mockstudy, control=mycontrols)
summary(tab2)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380)
Gender
    Male 277 (64.7%) 411 (59.5%) 228 (60%)
    Female 151 (35.3%) 280 (40.5%) 152 (40%)
Age in Years
    Count 428 691 380
    Median 61 61 61
    Q1,Q3 53, 68 52, 69 52, 68

You can also change these settings directly in the tableby call.

tab3 <- tableby(arm ~ sex + age, data=mockstudy, test=FALSE, total=FALSE, 
                numeric.stats=c("median","q1q3"), numeric.test="kwt")
summary(tab3)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380)
Gender
    Male 277 (64.7%) 411 (59.5%) 228 (60%)
    Female 151 (35.3%) 280 (40.5%) 152 (40%)
Age in Years
    median 61 61 61
    Q1, Q3 53, 68 52, 69 52, 68
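
The equivalence between ANOVA and the equal-variance t-test for two groups can be checked directly in base R. The sketch below uses simulated data; the data frame toy and its columns grp and x are hypothetical and are not part of mockstudy.

```r
set.seed(1)

# Hypothetical two-group data (not from mockstudy)
toy <- data.frame(
  grp = factor(rep(c("A", "B"), times = c(12, 15))),
  x   = c(rnorm(12, mean = 60, sd = 10), rnorm(15, mean = 63, sd = 10))
)

# One-way ANOVA p-value
p_anova <- summary(aov(x ~ grp, data = toy))[[1]][["Pr(>F)"]][1]

# Two-sample t-test assuming equal variances
p_ttest <- t.test(x ~ grp, data = toy, var.equal = TRUE)$p.value

# With exactly two groups the two p-values coincide
all.equal(p_anova, p_ttest)
```

With three or more groups there is no single t-test to compare against, which is why the ANOVA is the default test for continuous variables.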

Change summary statistics within the formula

In addition to modifying summary options globally, it is possible to modify the test and summary statistics for specific variables within the formula statement. For example, both the kwt (Kruskal-Wallis rank-based) and anova (asymptotic analysis of variance) tests apply to numeric variables and we can use one for the variable “age” and another for the variable “ast”. A list of all the options is shown at the end of the vignette.

The tests function can do a quick check on what tests were performed on each variable in tableby.

tab.test <- tableby(arm ~ kwt(age) + anova(bmi) + kwt(ast), data=mockstudy)
tests(tab.test)
##                     Variable    p.value                       Method
## age             Age in Years 0.63906143 Kruskal-Wallis rank sum test
## bmi Body Mass Index (kg/m^2) 0.89165522           Linear Model ANOVA
## ast                      ast 0.03902803 Kruskal-Wallis rank sum test
summary(tab.test)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Age in Years 0.639
    Mean (SD) 59.7 (11.4) 60.3 (11.6) 59.8 (11.5) 60 (11.5)
    Q1, Q3 53, 68 52, 69 52, 68 52, 68
    Range 27 - 88 19 - 88 26 - 85 19 - 88
Body Mass Index (kg/m^2) 0.892
    N-Miss 9 20 4 33
    Mean (SD) 27.3 (5.55) 27.2 (5.17) 27.1 (5.75) 27.2 (5.43)
    Q1, Q3 23.6, 30.6 23.7, 30.1 23.2, 29.6 23.5, 30.1
    Range 14.1 - 53 16.6 - 49.1 15.4 - 60.2 14.1 - 60.2
ast 0.039
    N-Miss 69 141 56 266
    Mean (SD) 37.3 (28) 35.2 (26.7) 35.7 (25.8) 35.9 (26.8)
    Q1, Q3 21, 42 19, 40 20, 41 20, 41
    Range 10 - 205 7 - 174 5 - 176 5 - 205
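
One practical difference between the two tests is that the Kruskal-Wallis test depends only on ranks, while the ANOVA depends on the scale of the data. The base R sketch below illustrates this on simulated, skewed data; the data frame toy and its columns are hypothetical, and the calls are the standard stats functions rather than the code tableby runs internally.

```r
set.seed(42)

# Hypothetical skewed lab values in three arms (not from mockstudy)
toy <- data.frame(
  arm = factor(rep(c("A", "F", "G"), each = 20)),
  ast = c(rexp(20, rate = 1 / 35), rexp(20, rate = 1 / 30), rexp(20, rate = 1 / 33))
)

# Rank-based test and linear-model ANOVA on the original scale
p_kwt   <- kruskal.test(ast ~ arm, data = toy)$p.value
p_anova <- anova(lm(ast ~ arm, data = toy))[["Pr(>F)"]][1]

# The same tests after a log transformation
p_kwt_log   <- kruskal.test(log(ast) ~ arm, data = toy)$p.value
p_anova_log <- anova(lm(log(ast) ~ arm, data = toy))[["Pr(>F)"]][1]

# Ranks are unchanged by log(), so the Kruskal-Wallis p-value is identical;
# the ANOVA p-value generally is not
c(kwt = p_kwt, kwt_log = p_kwt_log, anova = p_anova, anova_log = p_anova_log)
```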

Summary statistics for any individual variable can also be modified, but it must be done as secondary arguments to the test function. The function names must be strings that are functions already written for tableby, built-in R functions like mean and range, or user-defined functions.

tab.test <- tableby(arm ~ kwt(ast, "Nmiss2","median") + anova(age, "N","mean") +
                    kwt(bmi, "Nmiss","median"), data=mockstudy)
summary(tab.test)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
ast 0.039
    N-Miss 69 141 56 266
    median 29 25.5 27 27
Age in Years 0.614
    N 428 691 380 1499
    mean 59.7 60.3 59.8 60
Body Mass Index (kg/m^2) 0.641
    N-Miss 9 20 4 33
    median 26.2 26.5 26 26.3

Modifying the look & feel in Word documents

You can easily create Word versions of tableby output via an Rmarkdown report and the default options will give you a reasonable table in Word - just select the “Knit Word” option in RStudio.

The functionality listed in this next paragraph is coming soon but needs an upgraded version of RStudio. If you want to modify the fonts used for the table, then you’ll need to add an extra line to the header at the beginning of your file. You can take the WordStylesReference01.docx file and modify the fonts (storing the format preferences in your project directory). To see how this works, run your report once using WordStylesReference01.docx and then WordStylesReference02.docx.

output:
  word_document:
    reference_docx: /projects/bsi/gentools/R/lib320/arsenal/doc/WordStylesReference01.docx

For more information on changing the look/feel of your Word document, see the Rmarkdown documentation website.

Additional Examples

Here are multiple examples showing how to use some of the different options.

1. Summarize without a group/by variable

tab.noby <- tableby(~ bmi + sex + age, data=mockstudy)
summary(tab.noby)
Overall (N=1499)
Body Mass Index (kg/m^2)
    N-Miss 33
    Mean (SD) 27.2 (5.43)
    Q1, Q3 23.5, 30.1
    Range 14.1 - 60.2
Gender
    Male 916 (61.1%)
    Female 583 (38.9%)
Age in Years
    Mean (SD) 60 (11.5)
    Q1, Q3 52, 68
    Range 19 - 88

2. Display footnotes indicating which “test” was used

summary(tab.test) #, pfootnote=TRUE)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
ast 0.039
    N-Miss 69 141 56 266
    median 29 25.5 27 27
Age in Years 0.614
    N 428 691 380 1499
    mean 59.7 60.3 59.8 60
Body Mass Index (kg/m^2) 0.641
    N-Miss 9 20 4 33
    median 26.2 26.5 26 26.3

3. Summarize an ordered factor

When comparing groups of ordered data there are a couple of options. The default uses a general independence test available from the coin package. For two-group comparisons, this is essentially the Armitage trend test. The other option is to specify the Kruskal-Wallis test. The example below shows both options.

mockstudy$age.ordnew <- ordered(c("a",NA,as.character(mockstudy$age.ord[-(1:2)])))
table(mockstudy$age.ord, mockstudy$sex)
##        
##         Male Female
##   10-19    1      0
##   20-29    8     11
##   30-39   37     30
##   40-49  127     83
##   50-59  257    179
##   60-69  298    170
##   70-79  168    101
##   80-89   20      9
table(mockstudy$age.ordnew, mockstudy$sex)
##        
##         Male Female
##   10-19    1      0
##   20-29    8     11
##   30-39   37     30
##   40-49  127     83
##   50-59  257    179
##   60-69  297    170
##   70-79  168    100
##   80-89   20      9
##   a        1      0
class(mockstudy$age.ord)
## [1] "ordered" "factor"
summary(tableby(sex ~ age.ordnew, data = mockstudy)) #, pfootnote = TRUE)
Male (N=916) Female (N=583) Total (N=1499) p value
age.ordnew 0.040
    N-Miss 0 1 1
    10-19 1 (0.109%) 0 (0%) 1 (0.067%)
    20-29 8 (0.873%) 11 (1.89%) 19 (1.27%)
    30-39 37 (4.04%) 30 (5.15%) 67 (4.47%)
    40-49 127 (13.9%) 83 (14.3%) 210 (14%)
    50-59 257 (28.1%) 179 (30.8%) 436 (29.1%)
    60-69 297 (32.4%) 170 (29.2%) 467 (31.2%)
    70-79 168 (18.3%) 100 (17.2%) 268 (17.9%)
    80-89 20 (2.18%) 9 (1.55%) 29 (1.94%)
    a 1 (0.109%) 0 (0%) 1 (0.067%)
summary(tableby(sex ~ kwt(age.ord), data = mockstudy)) #, pfootnote = TRUE)
Male (N=916) Female (N=583) Total (N=1499) p value
age.ord 0.067
    10-19 1 (0.109%) 0 (0%) 1 (0.067%)
    20-29 8 (0.873%) 11 (1.89%) 19 (1.27%)
    30-39 37 (4.04%) 30 (5.15%) 67 (4.47%)
    40-49 127 (13.9%) 83 (14.2%) 210 (14%)
    50-59 257 (28.1%) 179 (30.7%) 436 (29.1%)
    60-69 298 (32.5%) 170 (29.2%) 468 (31.2%)
    70-79 168 (18.3%) 101 (17.3%) 269 (17.9%)
    80-89 20 (2.18%) 9 (1.54%) 29 (1.93%)
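
For comparison, a Kruskal-Wallis test on an ordered factor can be sketched in base R by converting the levels to integer ranks. The counts below are copied from the age.ord by sex table above; the reconstruction from counts and the object names are only illustrative, and this is not necessarily the exact computation tableby performs.

```r
# Counts copied from table(mockstudy$age.ord, mockstudy$sex) shown above
levs   <- c("10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89")
male   <- c(1, 8, 37, 127, 257, 298, 168, 20)
female <- c(0, 11, 30, 83, 179, 170, 101, 9)

# Rebuild one row per subject from the counts
age.ord <- factor(c(rep(levs, times = male), rep(levs, times = female)),
                  levels = levs, ordered = TRUE)
sex <- factor(rep(c("Male", "Female"), times = c(sum(male), sum(female))),
              levels = c("Male", "Female"))

# Kruskal-Wallis test on the integer codes of the ordered levels
kt <- kruskal.test(as.integer(age.ord), sex)
kt
```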

4. Summarize a survival variable

First look at the information that is presented by the survfit function, then see how the same results can be seen with tableby. The default is to show the median survival (time at which the probability of survival = 50%).

survfit(Surv(fu.time, fu.stat)~sex, data=mockstudy)
## Call: survfit(formula = Surv(fu.time, fu.stat) ~ sex, data = mockstudy)
## 
##              n events median 0.95LCL 0.95UCL
## sex=Male   916    829    550     515     590
## sex=Female 583    527    543     511     575
survdiff(Surv(fu.time, fu.stat)~sex, data=mockstudy)
## Call:
## survdiff(formula = Surv(fu.time, fu.stat) ~ sex, data = mockstudy)
## 
##              N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=Male   916      829      830  0.000370  0.000956
## sex=Female 583      527      526  0.000583  0.000956
## 
##  Chisq= 0  on 1 degrees of freedom, p= 0.975
summary(tableby(sex ~ Surv(fu.time, fu.stat), data=mockstudy))
Male (N=916) Female (N=583) Total (N=1499) p value
Surv(fu.time, fu.stat) 0.975
    Events 829 527 1356
    medSurv 550 543 546

It is also possible to obtain summaries of the percent survival at certain time points (say, the probability of surviving 1 year).

summary(survfit(Surv(fu.time/365.25, fu.stat)~sex, data=mockstudy), times=1:5)
## Call: survfit(formula = Surv(fu.time/365.25, fu.stat) ~ sex, data = mockstudy)
## 
##                 sex=Male 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1    626     286   0.6870  0.0153       0.6576       0.7177
##     2    309     311   0.3437  0.0158       0.3142       0.3761
##     3    152     151   0.1748  0.0127       0.1516       0.2015
##     4     57      61   0.0941  0.0104       0.0759       0.1168
##     5     24      16   0.0628  0.0095       0.0467       0.0844
## 
##                 sex=Female 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1    380     202   0.6531  0.0197       0.6155        0.693
##     2    190     189   0.3277  0.0195       0.2917        0.368
##     3     95      90   0.1701  0.0157       0.1420        0.204
##     4     51      32   0.1093  0.0133       0.0861        0.139
##     5     18      12   0.0745  0.0126       0.0534        0.104
summary(tableby(sex ~ Surv(fu.time/365.25, fu.stat), data=mockstudy, times=1:5, surv.stats=c("NeventsSurv","NriskSurv")))
## Warning in tableby(sex ~ Surv(fu.time/365.25, fu.stat), data = mockstudy, : unused arguments: times
Male (N=916) Female (N=583) Total (N=1499) p value
Surv(fu.time/365.25, fu.stat) 0.975
NeventsSurv 0.975
    1 286 (68.7) 202 (65.3) 488 (67.4)
    2 597 (34.4) 391 (32.8) 988 (33.7)
    3 748 (17.5) 481 (17) 1229 (17.3)
    4 809 (9.41) 513 (10.9) 1322 (10.1)
    5 825 (6.28) 525 (7.45) 1350 (6.78)
NriskSurv 0.975
    1 626 380 1006
    2 309 190 499
    3 152 95 247
    4 57 51 108
    5 24 18 42

5. Summarize date variables

Date variables by default are summarized with the number of missing values, the median, and the range. For example purposes we’ve created a random date. Missing values are introduced for impossible February dates.

set.seed(100)
N <- nrow(mockstudy)
mockstudy$dtentry <- mdy.Date(month=sample(1:12,N,replace=TRUE), day=sample(1:29,N,replace=TRUE), 
                              year=sample(2005:2009,N,replace=TRUE))
summary(tableby(sex ~ dtentry, data=mockstudy))
Male (N=916) Female (N=583) Total (N=1499) p value
dtentry 0.554
    N-Miss 3 2 5
    median 2007-06-16 2007-06-15 2007-06-15
    Range 2005-01-03 - 2009-12-27 2005-01-01 - 2009-12-28 2005-01-01 - 2009-12-28
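
The way impossible dates turn into missing values can be illustrated in base R alone. The sketch below is an assumption-laden stand-in for the mdy.Date call above: it builds the dates with as.Date and an explicit format, and the vector names and the sample size n are hypothetical.

```r
set.seed(100)
n <- 200

# Random month, day and year, allowing day 29 in every month
month <- sample(1:12, n, replace = TRUE)
day   <- sample(1:29, n, replace = TRUE)
year  <- sample(2005:2009, n, replace = TRUE)

# With an explicit format, as.Date returns NA for dates that do not exist
dtentry <- as.Date(sprintf("%04d-%02d-%02d", year, month, day),
                   format = "%Y-%m-%d")

# Only February 29 in a non-leap year is impossible here (2008 is a leap year)
impossible <- month == 2 & day == 29 & year != 2008
sum(is.na(dtentry))

# The default date summaries: number missing, median and range
median(dtentry, na.rm = TRUE)
range(dtentry, na.rm = TRUE)
```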

6. Summarize multiple variables without typing them out

Often one wants to summarize a number of variables. Instead of typing by hand each individual variable, an alternative approach is to create a formula using the paste command with the collapse="+" option.

## create a vector specifying the variable names
myvars <- names(mockstudy)

## select the 8th through the 10th variables
## paste them together, separated by the + sign
RHS <- paste(myvars[8:10], collapse="+")
RHS
## [1] "ps+hgb+bmi"

## create a formula using the as.formula function
as.formula(paste('arm ~ ', RHS))
## arm ~ ps + hgb + bmi

## use the formula in the tableby function
summary(tableby(as.formula(paste('arm ~', RHS)), data=mockstudy))
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
ps 0.903
    N-Miss 69 141 56 266
    Mean (SD) 0.529 (0.597) 0.547 (0.595) 0.537 (0.606) 0.539 (0.598)
    Q1, Q3 0, 1 0, 1 0, 1 0, 1
    Range 0 - 2 0 - 2 0 - 2 0 - 2
hgb 0.639
    N-Miss 69 141 56 266
    Mean (SD) 12.3 (1.69) 12.4 (1.76) 12.4 (1.68) 12.3 (1.72)
    Q1, Q3 11, 13.4 11.1, 13.6 11.2, 13.6 11.1, 13.5
    Range 9.06 - 17.3 9 - 18.2 9 - 17 9 - 18.2
Body Mass Index (kg/m^2) 0.892
    N-Miss 9 20 4 33
    Mean (SD) 27.3 (5.55) 27.2 (5.17) 27.1 (5.75) 27.2 (5.43)
    Q1, Q3 23.6, 30.6 23.7, 30.1 23.2, 29.6 23.5, 30.1
    Range 14.1 - 53 16.6 - 49.1 15.4 - 60.2 14.1 - 60.2

These steps can also be done using the formulize function.

## The formulize function does the paste and as.formula steps
tmp <- formulize('arm',myvars[8:10])
tmp
## arm ~ ps + hgb + bmi
## <environment: 0x674d838>

## More complex formulas could also be written using formulize
tmp2 <- formulize('arm',c('ps','hgb^2','bmi'))

## use the formula in the tableby function
summary(tableby(tmp, data=mockstudy))
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
ps 0.903
    N-Miss 69 141 56 266
    Mean (SD) 0.529 (0.597) 0.547 (0.595) 0.537 (0.606) 0.539 (0.598)
    Q1, Q3 0, 1 0, 1 0, 1 0, 1
    Range 0 - 2 0 - 2 0 - 2 0 - 2
hgb 0.639
    N-Miss 69 141 56 266
    Mean (SD) 12.3 (1.69) 12.4 (1.76) 12.4 (1.68) 12.3 (1.72)
    Q1, Q3 11, 13.4 11.1, 13.6 11.2, 13.6 11.1, 13.5
    Range 9.06 - 17.3 9 - 18.2 9 - 17 9 - 18.2
Body Mass Index (kg/m^2) 0.892
    N-Miss 9 20 4 33
    Mean (SD) 27.3 (5.55) 27.2 (5.17) 27.1 (5.75) 27.2 (5.43)
    Q1, Q3 23.6, 30.6 23.7, 30.1 23.2, 29.6 23.5, 30.1
    Range 14.1 - 53 16.6 - 49.1 15.4 - 60.2 14.1 - 60.2
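
Base R also offers reformulate, which builds the same kind of formula from a character vector without paste. The sketch below is a minimal illustration; the vector vars simply repeats the three variable names used above.

```r
# Variable names to place on the right-hand side
vars <- c("ps", "hgb", "bmi")

# Two-sided formula with arm as the grouping variable
f <- reformulate(vars, response = "arm")
f

# One-sided formula, for a table without a by-variable
f_noby <- reformulate(vars)
f_noby
```

The resulting formula objects can be passed to tableby in the same way as the output of as.formula or formulize.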

7. Subset the dataset used in the analysis

Here are two ways to get the same result (limit the analysis to subjects with age > 50 who are in the F: FOLFOX treatment group).

newdata <- subset(mockstudy, subset=age>50 & arm=='F: FOLFOX', select = c(sex,ps:bmi))
dim(mockstudy)
## [1] 1499   16
table(mockstudy$arm)
## 
##    A: IFL F: FOLFOX   G: IROX 
##       428       691       380
dim(newdata)
## [1] 557   4
names(newdata)
## [1] "sex" "ps"  "hgb" "bmi"
summary(tableby(sex ~ ., data=newdata))
Male (N=333) Female (N=224) Total (N=557) p value
ps 0.652
    N-Miss 64 44 108
    Mean (SD) 0.554 (0.6) 0.528 (0.602) 0.543 (0.6)
    Q1, Q3 0, 1 0, 1 0, 1
    Range 0 - 2 0 - 2 0 - 2
hgb <0.001
    N-Miss 64 44 108
    Mean (SD) 12.7 (1.92) 12.1 (1.4) 12.5 (1.76)
    Q1, Q3 11.3, 14 11, 12.9 11.2, 13.7
    Range 9 - 18.2 9.1 - 15.9 9 - 18.2
bmi 0.650
    N-Miss 9 6 15
    Mean (SD) 27.5 (4.78) 27.3 (5.51) 27.5 (5.08)
    Q1, Q3 24.4, 30.2 23.3, 30.4 24, 30.4
    Range 17.9 - 47.5 16.6 - 49.1 16.6 - 49.1
summary(tableby(sex ~ ps + hgb + bmi, subset=age>50 & arm=="F: FOLFOX", data=mockstudy))
Male (N=333) Female (N=224) Total (N=557) p value
ps 0.652
    N-Miss 64 44 108
    Mean (SD) 0.554 (0.6) 0.528 (0.602) 0.543 (0.6)
    Q1, Q3 0, 1 0, 1 0, 1
    Range 0 - 2 0 - 2 0 - 2
hgb <0.001
    N-Miss 64 44 108
    Mean (SD) 12.7 (1.92) 12.1 (1.4) 12.5 (1.76)
    Q1, Q3 11.3, 14 11, 12.9 11.2, 13.7
    Range 9 - 18.2 9.1 - 15.9 9 - 18.2
bmi 0.650
    N-Miss 9 6 15
    Mean (SD) 27.5 (4.78) 27.3 (5.51) 27.5 (5.08)
    Q1, Q3 24.4, 30.2 23.3, 30.4 24, 30.4
    Range 17.9 - 47.5 16.6 - 49.1 16.6 - 49.1

8. Create combinations of variables on the fly

## create a variable combining the levels of mdquality.s and sex
with(mockstudy, table(interaction(mdquality.s,sex)))
## 
##   0.Male   1.Male 0.Female 1.Female 
##       77      686       47      437
summary(tableby(arm ~ interaction(mdquality.s,sex), data=mockstudy))
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
interaction(mdquality.s, sex) 0.493
    N-Miss 55 156 41 252
    0.Male 29 (7.77%) 31 (5.79%) 17 (5.01%) 77 (6.17%)
    1.Male 214 (57.4%) 285 (53.3%) 187 (55.2%) 686 (55%)
    0.Female 12 (3.22%) 21 (3.93%) 14 (4.13%) 47 (3.77%)
    1.Female 118 (31.6%) 198 (37%) 121 (35.7%) 437 (35%)
## create a new grouping variable with combined levels of mdquality.s and sex
summary(tableby(interaction(mdquality.s, sex) ~  age + bmi, data=mockstudy, subset=arm=="F: FOLFOX"))
0.Male (N=31) 1.Male (N=285) 0.Female (N=21) 1.Female (N=198) Total (N=535) p value
Age 0.190
    Mean (SD) 63.1 (11.7) 60.7 (11.8) 60.8 (10.1) 58.9 (11.4) 60.2 (11.6)
    Q1, Q3 56, 72 53, 69 51, 67 51, 68 52, 69
    Range 41 - 82 19 - 88 42 - 81 29 - 83 19 - 88
bmi 0.894
    N-Miss 0 6 1 5 12
    Mean (SD) 26.6 (5.09) 27.4 (4.7) 27.4 (4.9) 27.3 (5.67) 27.3 (5.1)
    Q1, Q3 22.8, 29.2 24.3, 30.2 23.7, 29.6 23.1, 30.4 23.9, 30.3
    Range 20.2 - 41.8 17.9 - 47.5 19.8 - 39.4 16.8 - 44.8 16.8 - 47.5

9. Transform variables on the fly

Certain transformations need to be surrounded by I() so that R knows to treat it as a variable transformation and not some special model feature. If the transformation includes any of the symbols / - + ^ * then surround the new variable by I().

trans <- tableby(arm ~ I(age/10) + log(bmi) + factor(mdquality.s, levels=0:1, labels=c('N','Y')),
                 data=mockstudy)
summary(trans)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
I(age/10) 0.614
    Mean (SD) 5.97 (1.14) 6.03 (1.16) 5.98 (1.15) 6 (1.15)
    Q1, Q3 5.3, 6.8 5.2, 6.9 5.2, 6.8 5.2, 6.8
    Range 2.7 - 8.8 1.9 - 8.8 2.6 - 8.5 1.9 - 8.8
log(bmi) 0.811
    N-Miss 9 20 4 33
    Mean (SD) 3.29 (0.197) 3.29 (0.183) 3.28 (0.2) 3.28 (0.192)
    Q1, Q3 3.16, 3.42 3.17, 3.41 3.14, 3.39 3.16, 3.41
    Range 2.64 - 3.97 2.81 - 3.89 2.74 - 4.1 2.64 - 4.1
factor(mdquality.s, levels = 0:1, labels = c("N", "Y")) 0.694
    N-Miss 55 156 41 252
    N 41 (11%) 52 (9.72%) 31 (9.14%) 124 (9.94%)
    Y 332 (89%) 483 (90.3%) 308 (90.9%) 1123 (90.1%)

The labels for these variables aren’t exactly what we’d like, so we can modify them after the fact. Instead of typing out the very long variable names, you can modify specific labels by position.

labels(trans)
##                                                           arm 
##                                               "Treatment Arm" 
##                                                     I(age/10) 
##                                                   "I(age/10)" 
##                                                      log(bmi) 
##                                                    "log(bmi)" 
##       factor(mdquality.s, levels = 0:1, labels = c("N", "Y")) 
## "factor(mdquality.s, levels = 0:1, labels = c(\"N\", \"Y\"))"
labels(trans)[2:4] <- c('Age per 10 yrs', 'log(BMI)', 'MD Quality')
labels(trans)
##                                                     arm 
##                                         "Treatment Arm" 
##                                               I(age/10) 
##                                        "Age per 10 yrs" 
##                                                log(bmi) 
##                                              "log(BMI)" 
## factor(mdquality.s, levels = 0:1, labels = c("N", "Y")) 
##                                            "MD Quality"
summary(trans)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Age per 10 yrs 0.614
    Mean (SD) 5.97 (1.14) 6.03 (1.16) 5.98 (1.15) 6 (1.15)
    Q1, Q3 5.3, 6.8 5.2, 6.9 5.2, 6.8 5.2, 6.8
    Range 2.7 - 8.8 1.9 - 8.8 2.6 - 8.5 1.9 - 8.8
log(BMI) 0.811
    N-Miss 9 20 4 33
    Mean (SD) 3.29 (0.197) 3.29 (0.183) 3.28 (0.2) 3.28 (0.192)
    Q1, Q3 3.16, 3.42 3.17, 3.41 3.14, 3.39 3.16, 3.41
    Range 2.64 - 3.97 2.81 - 3.89 2.74 - 4.1 2.64 - 4.1
MD Quality 0.694
    N-Miss 55 156 41 252
    N 41 (11%) 52 (9.72%) 31 (9.14%) 124 (9.94%)
    Y 332 (89%) 483 (90.3%) 308 (90.9%) 1123 (90.1%)

Note that if we had not changed mdquality.s to a factor, it would have been summarized as though it were a continuous variable.

class(mockstudy$mdquality.s)
## [1] "integer"

summary(tableby(arm~mdquality.s, data=mockstudy))
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
mdquality.s 0.695
    N-Miss 55 156 41 252
    Mean (SD) 0.89 (0.313) 0.903 (0.297) 0.909 (0.289) 0.901 (0.299)
    Q1, Q3 1, 1 1, 1 1, 1 1, 1
    Range 0 - 1 0 - 1 0 - 1 0 - 1

Another option would be to specify the test and summary statistics. In fact, if I had a set of variables coded 0/1 and that was all I was summarizing, then I could change the global option for continuous variables to use the chi-square test and show countpct.

summary(tableby(arm ~ chisq(mdquality.s, "Nmiss","countpct"), data=mockstudy))
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
mdquality.s 0.694
    N-Miss 55 156 41 252
    0 41 (11) 52 (9.72) 31 (9.14) 124 (9.94)
    1 332 (89) 483 (90.3) 308 (90.9) 1123 (90.1)
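
To see why I() matters for the transformations in this section, it helps to look at how R parses formula operators. The base R sketch below uses terms and model.frame on a hypothetical toy data frame; it illustrates general formula behavior rather than anything specific to tableby.

```r
# Without I(), + separates two terms
attr(terms(~ age + bmi), "term.labels")      # "age" "bmi"

# With I(), the sum is treated as a single arithmetic expression
attr(terms(~ I(age + bmi)), "term.labels")   # "I(age + bmi)"

# Without I(), ^ is a formula operator, so age^2 collapses to age
attr(terms(~ age^2), "term.labels")          # "age"

# Inside I(), the arithmetic is actually carried out on the data
toy <- data.frame(age = c(67, 74, 50))       # hypothetical values
mf  <- model.frame(~ I(age / 10), data = toy)
as.numeric(mf[[1]])                          # 6.7 7.4 5.0
```

Functions such as log() and factor() are not formula operators, which is why they can be used in the formula without I().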

10. Change the ordering of the variables or delete a variable

mytab <- tableby(arm ~ sex + alk.phos + age, data=mockstudy)
mytab2 <- mytab[c('age','sex','alk.phos')]
summary(mytab2)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Age in Years 0.614
    Mean (SD) 59.7 (11.4) 60.3 (11.6) 59.8 (11.5) 60 (11.5)
    Q1, Q3 53, 68 52, 69 52, 68 52, 68
    Range 27 - 88 19 - 88 26 - 85 19 - 88
Gender 0.190
    Male 277 (64.7%) 411 (59.5%) 228 (60%) 916 (61.1%)
    Female 151 (35.3%) 280 (40.5%) 152 (40%) 583 (38.9%)
alk.phos 0.226
    N-Miss 69 141 56 266
    Mean (SD) 176 (129) 162 (122) 174 (139) 169 (128)
    Q1, Q3 89, 217 85, 195 87.8, 210 86, 207
    Range 11 - 858 10 - 1014 7 - 982 7 - 1014
summary(mytab[c('age','sex')], nsmall = 2)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Age in Years 0.614
    Mean (SD) 59.67 (11.36) 60.3 (11.63) 59.76 (11.5) 59.99 (11.52)
    Q1, Q3 53, 68 52, 69 52, 68 52, 68
    Range 27 - 88 19 - 88 26 - 85 19 - 88
Gender 0.190
    Male 277 (64.72%) 411 (59.48%) 228 (60%) 916 (61.11%)
    Female 151 (35.28%) 280 (40.52%) 152 (40%) 583 (38.89%)
summary(mytab[c(3,1)], nsmall = 3)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Age in Years 0.614
    Mean (SD) 59.673 (11.365) 60.301 (11.632) 59.763 (11.499) 59.985 (11.519)
    Q1, Q3 53, 68 52, 69 52, 68 52, 68
    Range 27 - 88 19 - 88 26 - 85 19 - 88
Gender 0.190
    Male 277 (64.72%) 411 (59.479%) 228 (60%) 916 (61.107%)
    Female 151 (35.28%) 280 (40.521%) 152 (40%) 583 (38.893%)

11. Merge two tableby objects together

It is possible to combine two tableby objects so that they print out together.

## demographics
tab1 <- tableby(arm ~ sex + age, data=mockstudy,
                control=tableby.control(numeric.stats=c("Nmiss","meansd"), total=FALSE))
## lab data
tab2 <- tableby(arm ~ hgb + alk.phos, data=mockstudy,
                control=tableby.control(numeric.stats=c("Nmiss","median","q1q3"),
                                        numeric.test="kwt", total=FALSE))
names(tab1$x)

## [1] "sex" "age"

names(tab2$x)

## [1] "hgb" "alk.phos"

tab12 <- merge(tab1,tab2)
class(tab12)

## [1] "tableby"

names(tab12$x)

## [1] "sex" "age" "hgb" "alk.phos"

summary(tab12) #, pfootnote=TRUE)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) p value
Gender 0.190
    Male 277 (64.7%) 411 (59.5%) 228 (60%)
    Female 151 (35.3%) 280 (40.5%) 152 (40%)
Age in Years 0.614
    Mean (SD) 59.7 (11.4) 60.3 (11.6) 59.8 (11.5)
hgb 0.570
    N-Miss 69 141 56
    median 12.1 12.2 12.4
    Q1, Q3 11, 13.4 11.1, 13.6 11.2, 13.6
alk.phos 0.104
    N-Miss 69 141 56
    median 133 116 122
    Q1, Q3 89, 217 85, 195 87.8, 210

12. Add a title to the table

When creating a PDF, the tables are automatically numbered and the title appears below the table. In Word and HTML, the titles are not numbered and appear above the table.

t1 <- tableby(arm ~ sex + age, data=mockstudy)
summary(t1, title='Demographics')
Demographics
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Gender 0.190
    Male 277 (64.7%) 411 (59.5%) 228 (60%) 916 (61.1%)
    Female 151 (35.3%) 280 (40.5%) 152 (40%) 583 (38.9%)
Age in Years 0.614
    Mean (SD) 59.7 (11.4) 60.3 (11.6) 59.8 (11.5) 60 (11.5)
    Q1, Q3 53, 68 52, 69 52, 68 52, 68
    Range 27 - 88 19 - 88 26 - 85 19 - 88

13. Modify how missing values are displayed

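Depending on the report you are writing, missing values can be reported in several ways: show the number of non-missing observations ("N"), always show the number of missing values ("Nmiss2"), show the number of missing values only when there are some ("Nmiss", the default), or show neither.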

## look at how many missing values there are for each variable
apply(is.na(mockstudy),2,sum)
##        case         age         arm         sex        race     fu.time     fu.stat          ps 
##           0           0           0           0           7           0           0         266 
##         hgb         bmi    alk.phos         ast mdquality.s     age.ord  age.ordnew     dtentry 
##         266          33         266         266         252           0           1           5
## Show how many subjects have each variable (non-missing)
summary(tableby(sex ~ ast + age, data=mockstudy,
                control=tableby.control(numeric.stats=c("N","median"), total=FALSE)))
Male (N=916) Female (N=583) p value
ast 0.921
    N 754 479
    median 27 27
Age in Years 0.048
    N 916 583
    median 61 60
## Always list the number of missing values
summary(tableby(sex ~ ast + age, data=mockstudy,
                control=tableby.control(numeric.stats=c("Nmiss2","median"), total=FALSE)))
Male (N=916) Female (N=583) p value
ast 0.921
    N-Miss 162 104
    median 27 27
Age in Years 0.048
    N-Miss 0 0
    median 61 60
## Only show the missing values if there are some (default)
summary(tableby(sex ~ ast + age, data=mockstudy, 
                control=tableby.control(numeric.stats=c("Nmiss","mean"),total=FALSE)))
Male (N=916) Female (N=583) p value
ast 0.921
    N-Miss 162 104
    mean 35.9 36
Age in Years 0.048
    mean 60.5 59.2
## Don't show N at all
summary(tableby(sex ~ ast + age, data=mockstudy, 
                control=tableby.control(numeric.stats=c("mean"),total=FALSE)))
Male (N=916) Female (N=583) p value
ast 0.921
    mean 35.9 36
Age in Years 0.048
    mean 60.5 59.2
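
The N and N-Miss rows can be cross-checked in base R. The sketch below uses a small made-up data frame (toy is not part of mockstudy); colSums(is.na(x)) gives the same counts as the apply(is.na(x), 2, sum) call used above and is typically faster.

```r
## Toy data frame (made up for illustration, not part of mockstudy)
toy <- data.frame(
  age = c(61, 54, NA, 70),
  ast = c(27, NA, NA, 35),
  sex = c("Male", "Female", "Female", NA),
  stringsAsFactors = FALSE
)

n_missing <- colSums(is.na(toy))    # what "Nmiss" / "Nmiss2" report
n_present <- colSums(!is.na(toy))   # what "N" reports
n_missing
n_present
```

For every column the two counts add up to the number of rows, which is a quick sanity check on a table that shows only one of them.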

14. Modify the number of digits used

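Within the tableby.control function there are 4 options for controlling the number of digits shown: digits, digits.test, nsmall and nsmall.pct. The same options can also be passed directly to summary, as in the example below.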

summary(tableby(arm ~ sex + age + fu.time, data=mockstudy), digits=4, digits.test=2, nsmall.pct=1)
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Gender 0.19
    Male 277 (64.7%) 411 (59.5%) 228 (60.0%) 916 (61.1%)
    Female 151 (35.3%) 280 (40.5%) 152 (40.0%) 583 (38.9%)
Age in Years 0.61
    Mean (SD) 59.67 (11.36) 60.3 (11.63) 59.76 (11.5) 59.99 (11.52)
    Q1, Q3 53, 68 52, 69 52, 68 52, 68
    Range 27 - 88 19 - 88 26 - 85 19 - 88
fu.time <0.01
    Mean (SD) 553.6 (419.6) 731.2 (487.7) 607.2 (435.5) 649.1 (462.5)
    Q1, Q3 255.5, 724.2 345, 1046 306.5, 807 309.5, 878.5
    Range 9 - 2170 0 - 2472 17 - 2118 0 - 2472

It is important to understand how R treats the digits argument, which sets the number of significant digits rather than the number of decimal places. Here are some summaries for the constant pi. Note that with digits=4, the number of digits after the decimal point changes after multiplying pi by 10 or 100. The nsmall option, in contrast, specifies the minimum number of digits after the decimal point. The two can be used together (see the help file for format for more details on how that works).

format(pi, digits=1)
## [1] "3"
format(pi, digits=3)
## [1] "3.14"
format(pi, digits=4)
## [1] "3.142"
format(pi*10, digits=4)
## [1] "31.42"
format(pi*100, digits=4)
## [1] "314.2"
format(pi*100, nsmall=4)
## [1] "314.1593"
format(pi*100, nsmall=2, digits=4)
## [1] "314.16"
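
To see the two arguments side by side, the sketch below formats the same three values under each setting. It uses only base R; vapply is used so that each value is formatted on its own, because format applied to a whole vector picks one common format for all elements.

```r
x <- c(pi, pi * 10, pi * 100)

## digits = 4: four significant digits, so the decimals shrink as x grows
sig4 <- vapply(x, format, character(1), digits = 4)
sig4     # "3.142" "31.42" "314.2"

## digits = 4 plus nsmall = 2: at least two decimals are always kept
both <- vapply(x, format, character(1), digits = 4, nsmall = 2)
both     # "3.142" "31.42" "314.16"
```

Only the last value changes between the two calls, since it is the only one where four significant digits leave fewer than two decimal places.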

15. Create a user-defined summary statistic

For purposes of this example, the code below creates a trimmed mean function (which trims 10% of the observations from each end) and uses it to summarize the data. Note the use of the ... argument, which tells R to pass extra arguments on; this is required for user-defined functions. In this case, na.rm=TRUE is passed to myfunc. The weights argument is also required, even though it isn’t passed on to the internal function in this particular example.

myfunc <- function(x, weights=rep(1,length(x)), ...){
  mean(x, trim=.1, ...)
}

summary(tableby(sex ~ hgb, data=mockstudy, 
                control=tableby.control(numeric.stats=c("Nmiss","myfunc"), numeric.test="kwt",
                    stats.labels=list(Nmiss='Missing values', myfunc="Trimmed Mean, 10%"))))
Male (N=916) Female (N=583) Total (N=1499) p value
hgb <0.001
    Missing values 162 104 266
    Trimmed Mean, 10% 12.6 11.9 12.3
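
The role of the ... argument can be checked outside of tableby. The sketch below repeats the definition of myfunc and calls it directly on a made-up vector (the values are for illustration only) with and without na.rm.

```r
myfunc <- function(x, weights = rep(1, length(x)), ...) {
  mean(x, trim = 0.1, ...)
}

x <- c(1:9, 100, NA)   # made-up values: one outlier and one missing value

res_na   <- myfunc(x)                 # NA: the missing value is not removed
res_trim <- myfunc(x, na.rm = TRUE)   # 5.5: NA dropped, then 1 and 100 trimmed
res_mean <- mean(x, na.rm = TRUE)     # 14.5: untrimmed mean, pulled up by 100
c(res_na, res_trim, res_mean)
```

Without ... in the signature, the call myfunc(x, na.rm = TRUE) would fail with an unused-argument error, which is why user-defined statistics need it.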

16. Use case-weights for creating summary statistics

When comparing groups, they are often unbalanced with respect to nuisance variables such as age and sex. The tableby function allows you to create weighted summary statistics. If this option is used then p-values are not calculated (test=FALSE).

##create fake group that is not balanced by age/sex 
set.seed(200)
mockstudy$fake_arm <- ifelse(mockstudy$age>60 & mockstudy$sex=='Female',sample(c('A','B'),replace=T, prob=c(.2,.8)),
                            sample(c('A','B'),replace=T, prob=c(.8,.4)))

mockstudy$agegp <- cut(mockstudy$age, breaks=c(18,50,60,70,90), right=FALSE)

## create weights based on agegp and sex distribution
tab1 <- with(mockstudy,table(agegp, sex))
tab2 <- with(mockstudy, table(agegp, sex, fake_arm))
tab2
## , , fake_arm = A
## 
##          sex
## agegp     Male Female
##   [18,50)   73     62
##   [50,60)  128     94
##   [60,70)  139      7
##   [70,90)  102      0
## 
## , , fake_arm = B
## 
##          sex
## agegp     Male Female
##   [18,50)   79     48
##   [50,60)  130     84
##   [60,70)  156    166
##   [70,90)  109    122
gpwts <- rep(tab1, length(unique(mockstudy$fake_arm)))/tab2
gpwts[gpwts>50] <- 30

## apply weights to subjects
index <- with(mockstudy, cbind(as.numeric(agegp), as.numeric(sex), as.numeric(as.factor(fake_arm)))) 
mockstudy$wts <- gpwts[index]

## show weights by treatment arm group
tapply(mockstudy$wts,mockstudy$fake_arm, summary)
## $A
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.774   1.894   2.069   2.276   2.082  24.710 
## 
## $B
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.042   1.924   1.677   1.985   2.292
orig <- tableby(fake_arm ~ age + sex + Surv(fu.time/365, fu.stat), data=mockstudy, test=FALSE)
summary(orig, title='No Case Weights used')
No Case Weights used
A (N=605) B (N=894) Total (N=1499)
Age in Years
    Mean (SD) 57.4 (11.6) 61.7 (11.1) 60 (11.5)
    Q1, Q3 50, 66 55, 70 52, 68
    Range 22 - 85 19 - 88 19 - 88
Gender
    Male 442 (73.1%) 474 (53%) 916 (61.1%)
    Female 163 (26.9%) 420 (47%) 583 (38.9%)
Surv(fu.time/365, fu.stat)
    Events 554 802 1356
    medSurv 1.5 1.49 1.5
tab1 <- tableby(fake_arm ~ age + sex + Surv(fu.time/365, fu.stat), data=mockstudy, weights=wts)
summary(tab1, title='Case Weights used')
Case Weights used
A (N=605) B (N=894) Total (N=1499)
Age in Years
    Mean (SD) 58 (10.9) 60.2 (11.4) 59.1 (11.2)
    Q1, Q3 51, 65 53, 68 52, 67
    Range 22 - 85 19 - 88 19 - 88
Gender
    Male 916 (66.5%) 916 (61.1%) 1832 (63.7%)
    Female 461 (33.5%) 583 (38.9%) 1044 (36.3%)
Surv(fu.time/365, fu.stat)
    Events 1252 1348 2599
    medSurv 1.53 1.5 1.53
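
The way the weights above are built (overall cell count divided by the within-group cell count, then looked up per subject with an index matrix) can be followed more easily on a small case. This is a minimal sketch on made-up data with a single balancing variable; toy, wts_tab and female_share are illustrative names, not part of arsenal.

```r
## Two groups of 10 subjects that are unbalanced by sex (made-up data)
toy <- data.frame(
  grp = rep(c("A", "B"), each = 10),
  sex = c(rep(c("Male", "Female"), times = c(8, 2)),
          rep(c("Male", "Female"), times = c(4, 6))),
  stringsAsFactors = FALSE
)

overall <- table(toy$sex)                     # Female 8, Male 12
by_grp  <- unclass(table(toy$sex, toy$grp))   # sex (rows) by group (columns)

## weight = overall count / within-group count for the same sex;
## the length-2 vector is recycled down each column of by_grp
wts_tab <- as.vector(overall) / by_grp

## look up each subject's weight with a two-column character index matrix
toy$wts <- wts_tab[cbind(toy$sex, toy$grp)]

## weighted share of females in each group
female_share <- tapply(toy$wts * (toy$sex == "Female"), toy$grp, sum) /
  tapply(toy$wts, toy$grp, sum)
female_share
```

The unweighted shares of females are 0.2 in group A and 0.6 in group B; after weighting both groups show 0.4, the share in the combined sample.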

17. Create your own p-value and add it to the table

When using weighted summary statistics, it is often desirable to then show a p-value from a model that corresponds to the weighted analysis. It is possible to add your own p-value and modify the column title for that new p-value. Another use for this would be to add standardized differences or confidence intervals instead of a p-value.

To add the p-value you simply need to create a data frame and use the function modpval.tableby. The first 2 columns in the data frame are required and are the variable name and the new p-value. The third column can be used to indicate what method was used to calculate the p-value. If you specify use.pname=TRUE then the name of the column holding the p-value will also be used as the column title in the tableby summary.

mypval <- data.frame(variable=c('age','sex','Surv(fu.time/365, fu.stat)'), 
                     adj.pvalue=c(.953,.811,.01), 
                     method=c('Age/Sex adjusted model results'))
tab2 <- modpval.tableby(tab1, mypval, use.pname=TRUE)
summary(tab2, title='Case Weights used, p-values added') #, pfootnote=TRUE)
Case Weights used, p-values added
A (N=605) B (N=894) Total (N=1499) adj.pvalue
Age in Years 0.953
    Mean (SD) 58 (10.9) 60.2 (11.4) 59.1 (11.2)
    Q1, Q3 51, 65 53, 68 52, 67
    Range 22 - 85 19 - 88 19 - 88
Gender 0.811
    Male 916 (66.5%) 916 (61.1%) 1832 (63.7%)
    Female 461 (33.5%) 583 (38.9%) 1044 (36.3%)
Surv(fu.time/365, fu.stat) 0.010
    Events 1252 1348 2599
    medSurv 1.53 1.5 1.53

18. For two-level categorical variables, only display one level.

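If the cat.simplify option is set to TRUE then only the second level of a two-level categorical variable is displayed. In the example below sex has the levels “Male” and “Female”, and “Female” is the second level, hence only the percentage of females is shown in the table. Similarly, “mdquality.s” was turned into a factor and “1” is the second level, hence only the row for that level is shown (along with the number of missing values).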

levels(mockstudy$sex)

## [1] "Male" "Female"

table2 <- tableby(arm~sex + factor(mdquality.s), data=mockstudy, cat.simplify=TRUE)
summary(table2, labelTranslations=c(sex="Female", "factor(mdquality.s)"="MD Quality"))
A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
Female 151 (35.3%) 280 (40.5%) 152 (40%) 583 (38.9%) 0.190
MD Quality 0.694
    N-Miss 55 156 41 252
    1 332 (89%) 483 (90.3%) 308 (90.9%) 1123 (90.1%)

19. Use tableby within an Sweave document

For those users who wish to create tables within an Sweave document, the following code seems to work.

\documentclass{article}

\usepackage{longtable}
\usepackage{pdfpages}

\begin{document}

\section{Read in Data}
<<echo=TRUE>>=
require(arsenal)
require(knitr)
require(rmarkdown)
data(mockstudy)

tab1 <- tableby(arm~sex+age, data=mockstudy)
@

\section{Convert Summary.Tableby to LaTeX}
<<echo=TRUE, results='hide', message=FALSE>>=
capture.output(summary(tab1), file="Test.md")

## Convert R Markdown Table to LaTeX
render("Test.md", pdf_document(keep_tex=TRUE))
@ 

\includepdf{Test.pdf}

\end{document}

20. Export tableby object to a .CSV file

When looking at multiple variables it is sometimes useful to export the results to a csv file. The as.data.frame function creates a data frame object that can be exported or further manipulated within R.

tab1 <- tableby(arm~sex+age, data=mockstudy)
summary(tab1, text=T)
## 
## ---------------------------------------------------------------------------------------------------------------------------
##                          A: IFL (N=428)      F: FOLFOX (N=691)   G: IROX (N=380)     Total (N=1499)      p value           
## ----------------------- ------------------- ------------------- ------------------- ------------------- -------------------
## Gender                                                                                                                0.190
##    Male                 277 (64.7%)         411 (59.5%)         228 (60%)           916 (61.1%)        
##    Female               151 (35.3%)         280 (40.5%)         152 (40%)           583 (38.9%)        
## Age in Years                                                                                                          0.614
##    Mean (SD)            59.7 (11.4)         60.3 (11.6)         59.8 (11.5)         60 (11.5)          
##    Q1, Q3               53, 68              52, 69              52, 68              52, 68             
##    Range                27 - 88             19 - 88             26 - 85             19 - 88            
## ---------------------------------------------------------------------------------------------------------------------------
tmp <- as.data.frame(tab1)
tmp
##           term variable A: IFL (N=428) F: FOLFOX (N=691) G: IROX (N=380) Total (N=1499) p value
## 1       Gender      sex                                                                   0.190
## 2         Male      sex    277 (64.7%)       411 (59.5%)       228 (60%)    916 (61.1%)        
## 3       Female      sex    151 (35.3%)       280 (40.5%)       152 (40%)    583 (38.9%)        
## 4 Age in Years      age                                                                   0.614
## 5    Mean (SD)      age    59.7 (11.4)       60.3 (11.6)     59.8 (11.5)      60 (11.5)        
## 6       Q1, Q3      age         53, 68            52, 69          52, 68         52, 68        
## 7        Range      age        27 - 88           19 - 88         26 - 85        19 - 88
# write.csv(tmp, '/my/path/here/mymodel.csv')
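
The export step itself is plain base R. The sketch below writes a data frame to a temporary file and reads it back; the small data frame is a stand-in for as.data.frame(tab1) with a few values copied from the output above, and the column names total and pvalue as well as the tempfile() path are placeholders.

```r
## Stand-in for as.data.frame(tab1): a few rows copied from the output above
tmp <- data.frame(
  term     = c("Gender", "Male", "Female"),
  variable = c("sex", "sex", "sex"),
  total    = c("", "916 (61.1%)", "583 (38.9%)"),
  pvalue   = c("0.190", "", ""),
  stringsAsFactors = FALSE
)

csv_file <- tempfile(fileext = ".csv")        # placeholder path
write.csv(tmp, csv_file, row.names = FALSE)   # drop the 1, 2, 3 row labels

## Read everything back as text so that blank cells stay blank
back <- read.csv(csv_file, colClasses = "character")
back
```

Reading the file back with colClasses = "character" keeps formatted cells such as "0.190" exactly as written; without it, R would convert that column to numbers and turn the blank cells into NA.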

21. Write tableby object to a separate Word or HTML file

## write to an HTML document
tab1 <- tableby(arm ~ sex + age, data=mockstudy)
write2html(tab1, "~/trash.html")

## write to a Word document
write2word(tab1, "~/trash.doc", title="My table in Word")

Available Function Options

Summary statistics

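The default summary statistics, by variable type, are:

numeric.stats: continuous variables show Nmiss, meansd, q1q3, range
cat.stats: categorical and factor variables show Nmiss, countpct
ordered.stats: ordered factors show Nmiss, countpct
surv.stats: survival variables show Nevents, medSurv
date.stats: date variables show Nmiss, median, range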

Any summary statistic standardly defined in R (e.g. mean, median, sd, med, range) can be specified; however, there are also a number of extra functions defined specifically for the tableby function.

Testing options

The tests used to calculate p-values differ by the variable type, but can be specified explicitly in the formula statement or in the control function.

The following tests are accepted:

tableby.control settings

A quick way to see which arguments a function accepts is to use the args() command. Settings involving the number of digits can be set in tableby.control or in summary.tableby.

args(tableby.control)
## function (test = TRUE, total = TRUE, test.pname = NULL, cat.simplify = FALSE, 
##     numeric.test = "anova", cat.test = "chisq", ordered.test = "trend", 
##     surv.test = "logrank", date.test = "kwt", numeric.stats = c("Nmiss", 
##         "meansd", "q1q3", "range"), cat.stats = c("Nmiss", "countpct"), 
##     ordered.stats = c("Nmiss", "countpct"), surv.stats = c("Nevents", 
##         "medSurv"), date.stats = c("Nmiss", "median", "range"), 
##     stats.labels = list(Nmiss = "N-Miss", Nmiss2 = "N-Miss", 
##         meansd = "Mean (SD)", q1q3 = "Q1, Q3", range = "Range", 
##         countpct = "Count (Pct)", Nevents = "Events", medsurv = "Median Survival"), 
##     digits = 3, digits.test = NULL, nsmall = NULL, nsmall.pct = NULL, 
##     ...) 
## NULL
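
When the argument names or defaults are needed inside code rather than on screen, base R's formals() returns them as a list. A small sketch, using stats::sd as a stand-in so that it runs without arsenal loaded:

```r
args(sd)                          # prints the argument list of sd

arg_names <- names(formals(sd))   # argument names as a character vector
arg_names                         # "x" "na.rm"

formals(sd)$na.rm                 # default value of one argument: FALSE
```

The same pattern should apply to tableby.control, for example names(formals(tableby.control)) to list its settings.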

Settings:

summary.tableby settings

The summary.tableby function has options that modify how the table appears (such as adding a title or modifying labels).

args(arsenal:::summary.tableby)
## function (object, title = NULL, labelTranslations = NULL, digits = NA, 
##     nsmall = NA, nsmall.pct = NA, digits.test = NA, text = FALSE, 
##     removeBlanks = text, labelSize = 1.2, test = NA, test.pname = NA, 
##     pfootnote = NA, total = NA, ...) 
## NULL

Settings: