This package implements the concept of a grammar of tables. Given a simple formula expression and a data frame, it creates a rich summary table in a variety of formats. It is designed for extensibility at each step of the process, so that one is not limited by the author's choice of table statistics or output format. The grammar, however, is an integral part of the package and as such is not modifiable.
Here’s an example similar to summaryM from Hmisc to get us started:
summary_table("drug ~ bili + albumin + stage::Categorical + protime + sex + age + spiders", pbc)
=====================================================================================================================
N D-penicillamine placebo not randomized Test Statistic
(N=154) (N=158) (N=106)
---------------------------------------------------------------------------------------------------------------------
Serum Bilirubin (mg/dl) 418 0.72 *1.30* 3.60 0.80 *1.40* 3.20 0.72 *1.40* 3.08 F_{2,415}=0.03, P=0.972
Albumin (gm/dl) 418 3.34 *3.54* 3.78 3.21 *3.56* 3.83 3.12 *3.47* 3.72 F_{2,415}=2.13, P=0.120
Histologic Stage, Ludwig Criteria 412 X^2_6=5.33, P=0.502
1 0.026 4/154 0.076 12/158 0.047 5/106
2 0.208 32/154 0.222 35/158 0.236 25/106
3 0.416 64/154 0.354 56/158 0.330 35/106
4 0.351 54/154 0.348 55/158 0.330 35/106
Prothrombin Time (sec.) 416 10.0 *10.6* 11.4 10.0 *10.6* 11.0 10.1 *10.6* 11.0 F_{2,413}=0.23, P=0.795
sex : female 418 0.903 139/154 0.867 137/158 0.925 98/106 X^2_2=2.38, P=0.304
Age 418 41.4 *48.1* 55.8 43.0 *51.9* 58.9 46.0 *53.0* 61.0 F_{2,415}=6.10, P=0.002
spiders : present 312 0.292 45/154 0.285 45/158 X^2_1=0.02, P=0.885
=====================================================================================================================
Or render the same table directly as an R Markdown pipe table:
rmd(summary_table("drug ~ bili[2] + albumin + stage::Categorical + protime + sex + age + spiders", pbc))
| | N | D-penicillamine | placebo | not randomized | Test Statistic |
|---|---|---|---|---|---|
| | | (N=154) | (N=158) | (N=106) | |
| Serum Bilirubin (mg/dl) | 418 | 0.72 1.30 3.60 | 0.80 1.40 3.20 | 0.72 1.40 3.08 | F_{2,415}=0.03, P=0.972 |
| Albumin (gm/dl) | 418 | 3.34 3.54 3.78 | 3.21 3.56 3.83 | 3.12 3.47 3.72 | F_{2,415}=2.13, P=0.120 |
| Histologic Stage, Ludwig Criteria | 412 | | | | χ^2_6=5.33, P=0.502 |
| 1 | | 0.026 4/154 | 0.076 12/158 | 0.047 5/106 | |
| 2 | | 0.208 32/154 | 0.222 35/158 | 0.236 25/106 | |
| 3 | | 0.416 64/154 | 0.354 56/158 | 0.330 35/106 | |
| 4 | | 0.351 54/154 | 0.348 55/158 | 0.330 35/106 | |
| Prothrombin Time (sec.) | 416 | 10.0 10.6 11.4 | 10.0 10.6 11.0 | 10.1 10.6 11.0 | F_{2,413}=0.23, P=0.795 |
| sex : female | 418 | 0.903 139/154 | 0.867 137/158 | 0.925 98/106 | χ^2_2=2.38, P=0.304 |
| Age | 418 | 41.4 48.1 55.8 | 43.0 51.9 58.9 | 46.0 53.0 61.0 | F_{2,415}=6.10, P=0.002 |
| spiders : present | 312 | 0.292 45/154 | 0.285 45/158 | | χ^2_1=0.02, P=0.885 |
Notice that stage wasn’t stored as a factor (i.e., a Categorical variable) in the data, so a type specifier is added in the formula to have it treated as Categorical. No preconversion is applied to the data frame, nor is the type guessed from the number of unique values. The formula specification provides full, direct control of typing.
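Why explicit typing matters can be seen with a small base R sketch. It does not use the package at all; the `stage` values below are made up for illustration, and the two "views" are just ordinary base R summaries standing in for a numerical and a categorical treatment of the same column.

```r
# Made-up stage codes, stored as plain numbers (not a factor)
stage <- c(1, 2, 3, 4, 4, 3, 2, 3)

is.factor(stage)   # FALSE: nothing in the data says "categorical"

# Treated as a number, the natural summary is a set of quantiles
numeric_view <- quantile(stage, c(0.25, 0.5, 0.75))

# Treated as a category, the natural summary is a count per level
categorical_view <- table(factor(stage))

print(numeric_view)
print(categorical_view)
```

Both summaries are valid R, and the data alone cannot say which one is intended, which is the reason the type is stated in the formula rather than inferred.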
It also supports HTML5, with styled fragments:
html5(summary_table("drug ~ bili[2] + albumin + stage::Categorical + protime + sex + age + spiders", pbc),
fragment=TRUE, inline="hmisc.css", caption = "HTML5 Table Hmisc Style", id="tbl2")
| | N | D-penicillamine | placebo | not randomized | Test Statistic |
| | | 154 | 158 | 106 | |
| Serum Bilirubin mg/dl | 418 | 0.72 1.30 3.60 | 0.80 1.40 3.20 | 0.72 1.40 3.08 | F_{2,415} = 0.03, P = 0.972¹ |
| Albumin gm/dl | 418 | 3.34 3.54 3.78 | 3.21 3.56 3.83 | 3.12 3.47 3.72 | F_{2,415} = 2.13, P = 0.120¹ |
| Histologic Stage, Ludwig Criteria | 412 | | | | χ^2_6 = 5.33, P = 0.502² |
| 1 | | 0.026 2.6% 4/154 | 0.076 7.6% 12/158 | 0.047 4.7% 5/106 | |
| 2 | | 0.208 20.8% 32/154 | 0.222 22.2% 35/158 | 0.236 23.6% 25/106 | |
| 3 | | 0.416 41.6% 64/154 | 0.354 35.4% 56/158 | 0.330 33.0% 35/106 | |
| 4 | | 0.351 35.1% 54/154 | 0.348 34.8% 55/158 | 0.330 33.0% 35/106 | |
| Prothrombin Time sec. | 416 | 10.0 10.6 11.4 | 10.0 10.6 11.0 | 10.1 10.6 11.0 | F_{2,413} = 0.23, P = 0.795¹ |
| sex : female | 418 | 0.903 90.3% 139/154 | 0.867 86.7% 137/158 | 0.925 92.5% 98/106 | χ^2_2 = 2.38, P = 0.304² |
| Age | 418 | 41.4 48.1 55.8 | 43.0 51.9 58.9 | 46.0 53.0 61.0 | F_{2,415} = 6.10, P = 0.002¹ |
| spiders : present | 312 | 0.292 29.2% 45/154 | 0.285 28.5% 45/158 | | χ^2_1 = 0.02, P = 0.885² |
Fragments can have localized style sheets, scoped by the given id:
html5(summary_table("drug ~ bili[2] + albumin + stage::Categorical + protime + sex + age + spiders", pbc),
fragment=TRUE, inline="nejm.css", caption = "HTML5 Table NEJM Style", id="tbl3")
| | N | D-penicillamine | placebo | not randomized | Test Statistic |
| | | 154 | 158 | 106 | |
| Serum Bilirubin mg/dl | 418 | 0.72 1.30 3.60 | 0.80 1.40 3.20 | 0.72 1.40 3.08 | F_{2,415} = 0.03, P = 0.972¹ |
| Albumin gm/dl | 418 | 3.34 3.54 3.78 | 3.21 3.56 3.83 | 3.12 3.47 3.72 | F_{2,415} = 2.13, P = 0.120¹ |
| Histologic Stage, Ludwig Criteria | 412 | | | | χ^2_6 = 5.33, P = 0.502² |
| 1 | | 0.026 2.6% 4/154 | 0.076 7.6% 12/158 | 0.047 4.7% 5/106 | |
| 2 | | 0.208 20.8% 32/154 | 0.222 22.2% 35/158 | 0.236 23.6% 25/106 | |
| 3 | | 0.416 41.6% 64/154 | 0.354 35.4% 56/158 | 0.330 33.0% 35/106 | |
| 4 | | 0.351 35.1% 54/154 | 0.348 34.8% 55/158 | 0.330 33.0% 35/106 | |
| Prothrombin Time sec. | 416 | 10.0 10.6 11.4 | 10.0 10.6 11.0 | 10.1 10.6 11.0 | F_{2,413} = 0.23, P = 0.795¹ |
| sex : female | 418 | 0.903 90.3% 139/154 | 0.867 86.7% 137/158 | 0.925 92.5% 98/106 | χ^2_2 = 2.38, P = 0.304² |
| Age | 418 | 41.4 48.1 55.8 | 43.0 51.9 58.9 | 46.0 53.0 61.0 | F_{2,415} = 6.10, P = 0.002¹ |
| spiders : present | 312 | 0.292 29.2% 45/154 | 0.285 28.5% 45/158 | | χ^2_1 = 0.02, P = 0.885² |
A cell transform passed through the after argument can adjust cells before rendering; here p-values are reformatted to four digits for a Lancet-style table.
# Lancet uses 4-digit p-values
p_digits_4 <- cell_transform(function(cell) {
if("p" %in% names(cell)) cell$p <- form(cell$p, "%1.4f")
cell
})
html5(summary_table("drug ~ bili[2] + albumin + stage::Categorical + protime + sex + age + spiders", pbc,
after=p_digits_4),
fragment=TRUE, inline="lancet.css", caption = "HTML5 Table Lancet Style", id="tbl4"
)
| | N | D-penicillamine | placebo | not randomized | Test Statistic |
| | | 154 | 158 | 106 | |
| Serum Bilirubin mg/dl | 418 | 0.72 1.30 3.60 | 0.80 1.40 3.20 | 0.72 1.40 3.08 | F_{2,415} = 0.03, P = 0.9725¹ |
| Albumin gm/dl | 418 | 3.34 3.54 3.78 | 3.21 3.56 3.83 | 3.12 3.47 3.72 | F_{2,415} = 2.13, P = 0.1200¹ |
| Histologic Stage, Ludwig Criteria | 412 | | | | χ^2_6 = 5.33, P = 0.5024² |
| 1 | | 0.026 2.6% 4/154 | 0.076 7.6% 12/158 | 0.047 4.7% 5/106 | |
| 2 | | 0.208 20.8% 32/154 | 0.222 22.2% 35/158 | 0.236 23.6% 25/106 | |
| 3 | | 0.416 41.6% 64/154 | 0.354 35.4% 56/158 | 0.330 33.0% 35/106 | |
| 4 | | 0.351 35.1% 54/154 | 0.348 34.8% 55/158 | 0.330 33.0% 35/106 | |
| Prothrombin Time sec. | 416 | 10.0 10.6 11.4 | 10.0 10.6 11.0 | 10.1 10.6 11.0 | F_{2,413} = 0.23, P = 0.7947¹ |
| sex : female | 418 | 0.903 90.3% 139/154 | 0.867 86.7% 137/158 | 0.925 92.5% 98/106 | χ^2_2 = 2.38, P = 0.3039² |
| Age | 418 | 41.4 48.1 55.8 | 43.0 51.9 58.9 | 46.0 53.0 61.0 | F_{2,415} = 6.10, P = 0.0024¹ |
| spiders : present | 312 | 0.292 29.2% 45/154 | 0.285 28.5% 45/158 | | χ^2_1 = 0.02, P = 0.8853² |
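The effect of moving from three to four p-value digits can be checked with base R alone. The sketch below uses `sprintf()` rather than the package's `form()` helper, on the assumption that both interpret a format string such as "%1.4f" the same way; the three unrounded p-values are the ones that appear in the index listing of this document (bili, albumin, stage).

```r
# Unrounded p-values as shown in the table index (bili, albumin, stage)
p_values <- c(0.972480132693603, 0.119955914166202, 0.50235045718865)

# Default-style three digits versus Lancet-style four digits
three_digits <- sprintf("%1.3f", p_values)
four_digits  <- sprintf("%1.4f", p_values)

print(three_digits)
print(four_digits)
```

The formatted strings line up with the P columns of the Hmisc/NEJM-style tables and the Lancet-style table respectively.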
It is also capable of producing an index of contents inside a table for traceability.
index(summary_table("drug ~ bili + albumin + stage::Categorical + protime + sex + age + spiders", pbc))[1:20,]
key src
[1,] "MTI1" "Table:bili:drug[D-penicillamine]:N"
[2,] "ODI3" "Table:bili:drug[placebo]:N"
[3,] "Zjg4" "Table:bili:drug[not randomized]:N"
[4,] "ZjZm" "Table:bili:drug:N"
[5,] "ZDYw" "Table:bili:drug[D-penicillamine]:quantile"
[6,] "ZGI1" "Table:bili:drug[placebo]:quantile"
[7,] "OGM4" "Table:bili:drug[not randomized]:quantile"
[8,] "YTI1" "Table:bili:drug:F"
[9,] "ZDhi" "Table:albumin:drug:N"
[10,] "YzEy" "Table:albumin:drug[D-penicillamine]:quantile"
[11,] "ODBm" "Table:albumin:drug[placebo]:quantile"
[12,] "MzQy" "Table:albumin:drug[not randomized]:quantile"
[13,] "ZDlm" "Table:albumin:drug:F"
[14,] "ODZk" "Table:stage:drug:N"
[15,] "MjUx" "Table:stage:drug:htest"
[16,] "NTUx" "Table:stage[1]:drug[D-penicillamine]:fraction"
[17,] "Zjdl" "Table:stage[1]:drug[placebo]:fraction"
[18,] "Yjk2" "Table:stage[1]:drug[not randomized]:fraction"
[19,] "YWQy" "Table:stage[2]:drug[D-penicillamine]:fraction"
[20,] "NmEy" "Table:stage[2]:drug[placebo]:fraction"
value
[1,] "154"
[2,] "158"
[3,] "106"
[4,] "418"
[5,] "1.3 [0.725, 3.6]"
[6,] "1.4 [0.8, 3.2]"
[7,] "1.4 [0.725, 3.075]"
[8,] "F=0.0279075093333664, p = 0.972480132693603"
[9,] "418"
[10,] "3.545 [3.3425, 3.7775]"
[11,] "3.565 [3.2125, 3.83]"
[12,] "3.47 [3.125, 3.72]"
[13,] "F=2.13150432275865, p = 0.119955914166202"
[14,] "412"
[15,] "chisq=5.32908004628618, p=0.50235045718865"
[16,] "0.026 4/154"
[17,] "0.076 12/158"
[18,] "0.047 5/106"
[19,] "0.208 32/154"
[20,] "0.222 35/158"
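To make the idea of a traceable index concrete, here is a minimal base R sketch that builds similar src/value pairs by hand for a tiny made-up data frame. It is only an illustration: the data, the helper code, and the exact string layout are assumptions modelled on the listing above, the package's own index is generated internally, and the key column is left out because how keys are derived is not described here.

```r
# Tiny made-up data set: a grouping factor g and a numeric x
d <- data.frame(
  g = factor(c("a", "a", "b", "b", "b")),
  x = c(1.0, 3.0, 2.0, 4.0, 6.0)
)

# One N entry and one quantile entry per group, each tagged with its source path
idx <- do.call(rbind, lapply(levels(d$g), function(lv) {
  xs <- d$x[d$g == lv]
  data.frame(
    src   = c(sprintf("Table:x:g[%s]:N", lv),
              sprintf("Table:x:g[%s]:quantile", lv)),
    value = c(as.character(length(xs)),
              sprintf("%g [%g, %g]", median(xs),
                      quantile(xs, 0.25), quantile(xs, 0.75))),
    stringsAsFactors = FALSE
  )
}))

print(idx)
```

Each value can be traced back to the variable, group, and statistic that produced it, which is the property the index is meant to provide.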
x <- round(rnorm(375, 79, 10))
y <- round(rnorm(375, 80, 9))
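# Caution: rbinom() returns 0/1, so used as an index this only sets y[1] to NA (hence N = 374 below);
# use rbinom(375, 1, prob=0.05) == 1 to make about 5% of the values missing
y[rbinom(375, 1, prob=0.05)] <- NA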
attr(x, "label") <- "Global score, 3m"
attr(y, "label") <- "Global score, 12m"
html5(summary_table(1 ~ x+y,
data.frame(x=x, y=y),
after=hmisc_intercept_cleanup),
fragment=TRUE, inline="lancet.css", caption="", id="tbl5")
| | N | All |
| Global score, 3m | 375 | 72 79 86 |
| Global score, 12m | 374 | 74 80 86 |
The Hmisc default style recognizes three types: Categorical, Binomial, and Numerical. For each pairing of a row type with a column type, a function is provided to generate the corresponding rows and columns. As mentioned before, the user can declare any type in a formula and is not limited to the Hmisc defaults. This is completely customizable, which will be covered later.
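The pairing of types with summary functions can be pictured as a two-level lookup. The sketch below is plain base R and not the package's internal structure: the `dispatch` list, `summarize_pair()`, and the two placeholder summary functions are hypothetical names introduced only for illustration.

```r
# Hypothetical placeholder summaries (the package supplies its own)
summarize_numeric_by_group <- function(row, col) tapply(row, col, median, na.rm = TRUE)
summarize_cross_tab        <- function(row, col) table(row, col)

# Lookup keyed first by row type, then by column type
dispatch <- list(
  Numerical   = list(Categorical = summarize_numeric_by_group),
  Categorical = list(Categorical = summarize_cross_tab)
)

# Pick the function registered for a (row type, column type) pair and apply it
summarize_pair <- function(row, col, row_type, col_type) {
  fn <- dispatch[[row_type]][[col_type]]
  if (is.null(fn)) stop("no summary registered for ", row_type, " x ", col_type)
  fn(row, col)
}

medians  <- summarize_pair(iris$Sepal.Width, iris$Species, "Numerical", "Categorical")
crosstab <- summarize_pair(iris$Sepal.Length > 5.1, iris$Species, "Categorical", "Categorical")

print(medians)
print(crosstab)
```

Registering a function under a new type name in such a lookup is all it would take to support a user-declared type, which is the kind of flexibility the paragraph above describes.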
Let’s cover the phases of table generation.
The formula drug ~ stage::Categorical is a Categorical × Categorical pairing, which references summarize_chisq for compiling. One can easily specify different compilers for a formula and get very different results. Note: multiplication (*) cannot be applied in the previous phase, because that requires knowing what multiplication means semantically. In one context it might be an interaction, in another simple multiplication. Handling multiplicative terms can be tricky. Once compiling is finished, the result is a table object composed of cells (a list of lists), each of which is one of a variety of S3 types.
A simple example of using an intercept in a formula, with some post-processing to remove undesired columns:
d1 <- iris
d1$A <- d1$Sepal.Length > 5.1
attr(d1$A,"label") <- "Sepal Length > 5.1"
tbl1 <- summary_table(
Species + 1 ~ A + Sepal.Width,
data = d1,
after = list(drop_statistics, function(tbl) del_col(tbl, 6))
)
html5(tbl1,
fragment=TRUE, inline="nejm.css", caption = "Example All Summary", id="tbl1")
| | N | setosa | versicolor | virginica | All |
| | | 50 | 50 | 50 | 150 |
| Sepal Length > 5.1 : TRUE | 150 | 0.280 28.0% 14/50 | 0.920 92.0% 46/50 | 0.980 98.0% 49/50 | 0.727 72.7% 109/150 |
| Sepal.Width | 150 | 3.20 3.40 3.68 | 2.52 2.80 3.00 | 2.80 3.00 3.18 | 2.80 3.00 3.30 |
The library is designed to be extensible, in the hope that more useful summary functions can generate results in a wide variety of formats. This is done by the translator functions, which, given a row and column from a formula, process the data into a table.
This example shows how to create a function that, given a row and column, constructs summary entries for a table.
### Make up some data, which has events nested within an id
n <- 1000
df <- data.frame(id = sample(1:250, n*3, replace=TRUE), event = as.factor(rep(c("A", "B","C"), n)))
attr(df$id, "label") <- "ID"
### Now create custom function for counting events with a category
summarize_count <- function(table, row, column)
{
### Getting Data for row column ast nodes, assuming no factors
datar <- row$data
datac <- column$data
### Grabbing categories
col_categories <- levels(datac)
n_labels <- lapply(col_categories, FUN=function(cat_name){
x <- datar[datac == cat_name]
# Worst interface complexity example. Work in progress to simplify
tg(tg_N(length(unique(x))), row, column, subcol=cat_name)
})
# Test a poisson model
test <- aov(glm(x ~ treatment,
aggregate(datar, by=list(id=datar, treatment=datac), FUN=length),
family=poisson))
# Build the table
table %>%
# Create Headers
row_header(derive_label(row)) %>%
col_header("N", col_categories, "Test Statistic") %>%
col_header("", n_labels, "" ) %>%
# Add the First column of summary data as an N value
add_col(tg_N(length(unique(datar)))) %>%
# Now add quantiles for the counts
table_builder_apply(col_categories, FUN=
function(tbl, cat_name) {
# Compute each data set
x <- datar[datac == cat_name]
xx <- aggregate(x, by=list(x), FUN=length)$x
# Add a column that is a quantile
add_col(tbl, tg_quantile(xx, row$format, na.rm=TRUE))
}) %>%
# Now add a statistical test for the final column
add_col(test)
}
summary_table(event ~ id["%1.0f"], df, summarize_count)
===========================================================
N A B C Test Statistic
(N=247) (N=240) (N=242)
-----------------------------------------------------------
ID 250 3 *4* 5 3 *4* 6 3 *4* 5 F_{2,726}=0.23, P=0.798
===========================================================
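The counting logic inside summarize_count can be tried on its own, without any table-building calls. The following base R sketch regenerates similar made-up data (with a seed added for reproducibility, so the numbers will not match the table above) and computes, per event category, the number of distinct ids and the quartiles of events per id. The names `count_quartiles` and `n_ids` are introduced here for illustration.

```r
set.seed(42)  # added for reproducibility; not part of the original example

n  <- 1000
df <- data.frame(
  id    = sample(1:250, n * 3, replace = TRUE),
  event = as.factor(rep(c("A", "B", "C"), n))
)

# Quartiles of the number of events per id, within each category
count_quartiles <- sapply(levels(df$event), function(cat_name) {
  ids    <- df$id[df$event == cat_name]
  counts <- as.vector(table(ids))  # events per observed id
  quantile(counts, c(0.25, 0.5, 0.75), na.rm = TRUE)
})

# Distinct ids per category, the quantity shown as (N=...) in the sub-header
n_ids <- sapply(levels(df$event), function(cat_name) {
  length(unique(df$id[df$event == cat_name]))
})

print(count_quartiles)
print(n_ids)
```

Here `table(ids)` yields the same per-id counts as the `aggregate(x, by=list(x), FUN=length)$x` call in the function above, just in a more compact form.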