
Introduction to sumExtras

library(sumExtras)
library(gtsummary)
library(dplyr)

# Apply the recommended JAMA theme
use_jama_theme()

Overview

If you’ve worked with gtsummary before, you’re familiar with the typical workflow of building summary tables: creating a base table with tbl_summary(), then progressively adding features like overall columns, p-values, and formatting tweaks. While gtsummary’s modular approach provides flexibility, the same sequence of functions appears repeatedly in analysis scripts.

sumExtras streamlines this process by providing convenience functions that apply commonly used formatting patterns in a single step. The package addresses three main pain points:

  1. Repetitive styling workflows - Combining multiple formatting steps into one function call
  2. Inconsistent missing value displays - Standardizing how NA values appear across tables
  3. Manual variable labeling - Automating label assignment from data dictionaries

This vignette will get you started with the core functions. For more specialized workflows, see vignette("labeling") and vignette("styling").

The extras() Function

The signature function of this package, extras(), consolidates the most common table enhancements into a single step. At minimum, it adds bold labels, removes the “Characteristic” header, and standardizes missing value display. With default settings, it also adds an overall column and p-values.

Basic Usage

Standard gtsummary workflow

trial |>
  tbl_summary(by = trt) |>
  add_overall() |>
  add_p() |>
  bold_labels() |>
  modify_header(label ~ "")

Equivalent using extras()

trial |>
  tbl_summary(by = trt) |>
  extras()
Output of the standard workflow:

                          Overall             Drug A              Drug B              p-value [2]
                          N = 200 [1]         N = 98 [1]          N = 102 [1]
Age                       47 (38, 57)         46 (37, 60)         48 (39, 56)         0.7
    Unknown               11                  7                   4
Marker Level (ng/mL)      0.64 (0.22, 1.41)   0.84 (0.23, 1.60)   0.52 (0.18, 1.21)   0.085
    Unknown               10                  6                   4
T Stage                                                                               0.9
    T1                    53 (27%)            28 (29%)            25 (25%)
    T2                    54 (27%)            25 (26%)            29 (28%)
    T3                    43 (22%)            22 (22%)            21 (21%)
    T4                    50 (25%)            23 (23%)            27 (26%)
Grade                                                                                 0.9
    I                     68 (34%)            35 (36%)            33 (32%)
    II                    68 (34%)            32 (33%)            36 (35%)
    III                   64 (32%)            31 (32%)            33 (32%)
Tumor Response            61 (32%)            28 (29%)            33 (34%)            0.5
    Unknown               7                   3                   4
Patient Died              112 (56%)           52 (53%)            60 (59%)            0.4
Months to Death/Censor    22.4 (15.9, 24.0)   23.5 (17.4, 24.0)   21.2 (14.5, 24.0)   0.14
[1] Median (Q1, Q3); n (%)
[2] Wilcoxon rank sum test; Pearson’s Chi-squared test

Output using extras():

                          Overall             Drug A              Drug B              p-value [2]
                          N = 200 [1]         N = 98 [1]          N = 102 [1]
Age                       47 (38, 57)         46 (37, 60)         48 (39, 56)         0.718
    Unknown               11                  7                   4
Marker Level (ng/mL)      0.64 (0.22, 1.41)   0.84 (0.23, 1.60)   0.52 (0.18, 1.21)   0.085
    Unknown               10                  6                   4
T Stage                                                                               0.866
    T1                    53 (27%)            28 (29%)            25 (25%)
    T2                    54 (27%)            25 (26%)            29 (28%)
    T3                    43 (22%)            22 (22%)            21 (21%)
    T4                    50 (25%)            23 (23%)            27 (26%)
Grade                                                                                 0.871
    I                     68 (34%)            35 (36%)            33 (32%)
    II                    68 (34%)            32 (33%)            36 (35%)
    III                   64 (32%)            31 (32%)            33 (32%)
Tumor Response            61 (32%)            28 (29%)            33 (34%)            0.530
    Unknown               7                   3                   4
Patient Died              112 (56%)           52 (53%)            60 (59%)            0.412
Months to Death/Censor    22.4 (15.9, 24.0)   23.5 (17.4, 24.0)   21.2 (14.5, 24.0)   0.145
[1] Median (Q1, Q3); n (%)
[2] Wilcoxon rank sum test; Pearson’s Chi-squared test

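Both approaches produce essentially the same table (in the output above, the only visible difference is that extras() reports p-values to three decimal places), but extras() requires less code and ensures consistency across your analysis.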

Customizing Output

You can control which features are applied using the function arguments:

# Table without p-values
trial |>
  tbl_summary(by = trt) |>
  extras(pval = FALSE)

                          Overall             Drug A              Drug B
                          N = 200 [1]         N = 98 [1]          N = 102 [1]
Age                       47 (38, 57)         46 (37, 60)         48 (39, 56)
    Unknown               11                  7                   4
Marker Level (ng/mL)      0.64 (0.22, 1.41)   0.84 (0.23, 1.60)   0.52 (0.18, 1.21)
    Unknown               10                  6                   4
T Stage
    T1                    53 (27%)            28 (29%)            25 (25%)
    T2                    54 (27%)            25 (26%)            29 (28%)
    T3                    43 (22%)            22 (22%)            21 (21%)
    T4                    50 (25%)            23 (23%)            27 (26%)
Grade
    I                     68 (34%)            35 (36%)            33 (32%)
    II                    68 (34%)            32 (33%)            36 (35%)
    III                   64 (32%)            31 (32%)            33 (32%)
Tumor Response            61 (32%)            28 (29%)            33 (34%)
    Unknown               7                   3                   4
Patient Died              112 (56%)           52 (53%)            60 (59%)
Months to Death/Censor    22.4 (15.9, 24.0)   23.5 (17.4, 24.0)   21.2 (14.5, 24.0)
[1] Median (Q1, Q3); n (%)

# Table without overall column
trial |>
  tbl_summary(by = trt) |>
  extras(overall = FALSE)

                          Drug A              Drug B              p-value [2]
                          N = 98 [1]          N = 102 [1]
Age                       46 (37, 60)         48 (39, 56)         0.718
    Unknown               7                   4
Marker Level (ng/mL)      0.84 (0.23, 1.60)   0.52 (0.18, 1.21)   0.085
    Unknown               6                   4
T Stage                                                           0.866
    T1                    28 (29%)            25 (25%)
    T2                    25 (26%)            29 (28%)
    T3                    22 (22%)            21 (21%)
    T4                    23 (23%)            27 (26%)
Grade                                                             0.871
    I                     35 (36%)            33 (32%)
    II                    32 (33%)            36 (35%)
    III                   31 (32%)            33 (32%)
Tumor Response            28 (29%)            33 (34%)            0.530
    Unknown               3                   4
Patient Died              52 (53%)            60 (59%)            0.412
Months to Death/Censor    23.5 (17.4, 24.0)   21.2 (14.5, 24.0)   0.145
[1] Median (Q1, Q3); n (%)
[2] Wilcoxon rank sum test; Pearson’s Chi-squared test

# Overall column as last column (default is to set it as first)
trial |>
  tbl_summary(by = trt) |>
  extras(last = TRUE)

                          Drug A              Drug B              Overall             p-value [2]
                          N = 98 [1]          N = 102 [1]         N = 200 [1]
Age                       46 (37, 60)         48 (39, 56)         47 (38, 57)         0.718
    Unknown               7                   4                   11
Marker Level (ng/mL)      0.84 (0.23, 1.60)   0.52 (0.18, 1.21)   0.64 (0.22, 1.41)   0.085
    Unknown               6                   4                   10
T Stage                                                                               0.866
    T1                    28 (29%)            25 (25%)            53 (27%)
    T2                    25 (26%)            29 (28%)            54 (27%)
    T3                    22 (22%)            21 (21%)            43 (22%)
    T4                    23 (23%)            27 (26%)            50 (25%)
Grade                                                                                 0.871
    I                     35 (36%)            33 (32%)            68 (34%)
    II                    32 (33%)            36 (35%)            68 (34%)
    III                   31 (32%)            33 (32%)            64 (32%)
Tumor Response            28 (29%)            33 (34%)            61 (32%)            0.530
    Unknown               3                   4                   7
Patient Died              52 (53%)            60 (59%)            112 (56%)           0.412
Months to Death/Censor    23.5 (17.4, 24.0)   21.2 (14.5, 24.0)   22.4 (15.9, 24.0)   0.145
[1] Median (Q1, Q3); n (%)
[2] Wilcoxon rank sum test; Pearson’s Chi-squared test

For projects with consistent table formatting requirements, you can define styling parameters once and reuse them:

# Define standard table settings for a project
standard_table_args <- list(
  pval = TRUE,
  overall = TRUE,
  last = TRUE
)

# Apply consistently across multiple tables
trial |>
  select(age, grade, stage, trt) |>
  tbl_summary(by = trt) |>
  extras(.args = standard_table_args)

                          Drug A              Drug B              Overall             p-value [2]
                          N = 98 [1]          N = 102 [1]         N = 200 [1]
Age                       46 (37, 60)         48 (39, 56)         47 (38, 57)         0.718
    Unknown               7                   4                   11
Grade                                                                                 0.871
    I                     35 (36%)            33 (32%)            68 (34%)
    II                    32 (33%)            36 (35%)            68 (34%)
    III                   31 (32%)            33 (32%)            64 (32%)
T Stage                                                                               0.866
    T1                    28 (29%)            25 (25%)            53 (27%)
    T2                    25 (26%)            29 (28%)            54 (27%)
    T3                    22 (22%)            21 (21%)            43 (22%)
    T4                    23 (23%)            27 (26%)            50 (25%)
[1] Median (Q1, Q3); n (%)
[2] Wilcoxon rank sum test; Pearson’s Chi-squared test
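
When one table in a project needs to deviate from the shared settings, the settings list can be adjusted rather than retyped. The following is a minimal sketch, assuming that extras() accepts any list with the same pval, overall, and last entries through .args as in the example above; the names demographics_tbl, outcomes_tbl, and no_pval_args are hypothetical and chosen only for illustration.

```r
library(sumExtras)
library(gtsummary)
library(dplyr)

# Shared settings, as in the example above
standard_table_args <- list(
  pval = TRUE,
  overall = TRUE,
  last = TRUE
)

# Two tables that reuse the same settings unchanged
demographics_tbl <- trial |>
  select(age, grade, trt) |>
  tbl_summary(by = trt) |>
  extras(.args = standard_table_args)

outcomes_tbl <- trial |>
  select(response, death, trt) |>
  tbl_summary(by = trt) |>
  extras(.args = standard_table_args)

# A one-off variant: base R's modifyList() overrides a single entry
# and leaves the shared list itself untouched
no_pval_args <- modifyList(standard_table_args, list(pval = FALSE))

descriptive_tbl <- trial |>
  select(age, marker, trt) |>
  tbl_summary(by = trt) |>
  extras(.args = no_pval_args)
```

Keeping the shared list unmodified means later tables still pick up the project defaults.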

Cleaning Missing Values

One subtle but important aspect of table presentation is how missing or undefined values are displayed. gtsummary tables can show various representations of missing data: “0 (NA%)”, “NA (NA)”, “NA, NA”, etc. These inconsistencies create visual clutter and make tables harder to scan.

The clean_table() function (which is called automatically by extras()) standardizes all zero (0 (0%)) or missing value representations to “—”:

Without cleaning

trial_missing |>
  tbl_summary(by = trt)

With clean_table()

trial_missing |>
  tbl_summary(by = trt) |>
  clean_table()

Output without cleaning:

Characteristic            Drug A              Drug B
                          N = 98 [1]          N = 102 [1]
age                       46 (37, 60)         NA (NA, NA)
    Unknown               7                   102
marker                    NA (NA, NA)         0.52 (0.18, 1.21)
    Unknown               98                  4
T Stage
    T1                    28 (29%)            25 (25%)
    T2                    25 (26%)            29 (28%)
    T3                    22 (22%)            21 (21%)
    T4                    23 (23%)            27 (26%)
Grade
    I                     35 (36%)            33 (32%)
    II                    32 (33%)            36 (35%)
    III                   31 (32%)            33 (32%)
Tumor Response            28 (29%)            33 (34%)
    Unknown               3                   4
Patient Died              52 (53%)            60 (59%)
Months to Death/Censor    23.5 (17.4, 24.0)   21.2 (14.5, 24.0)
[1] Median (Q1, Q3); n (%)

Output with clean_table():

Characteristic            Drug A              Drug B
                          N = 98 [1]          N = 102 [1]
age                       46 (37, 60)
    Unknown               7                   102
marker                                        0.52 (0.18, 1.21)
    Unknown               98                  4
T Stage
    T1                    28 (29%)            25 (25%)
    T2                    25 (26%)            29 (28%)
    T3                    22 (22%)            21 (21%)
    T4                    23 (23%)            27 (26%)
Grade
    I                     35 (36%)            33 (32%)
    II                    32 (33%)            36 (35%)
    III                   31 (32%)            33 (32%)
Tumor Response            28 (29%)            33 (34%)
    Unknown               3                   4
Patient Died              52 (53%)            60 (59%)
Months to Death/Censor    23.5 (17.4, 24.0)   21.2 (14.5, 24.0)
[1] Median (Q1, Q3); n (%)
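
The trial_missing data frame used above is not defined in the code shown in this vignette. Judging from the Unknown counts in the output (all 102 Drug B patients lack age, all 98 Drug A patients lack marker), something like the following would produce a comparable data set. This is a reconstruction under that assumption, not necessarily how the original was built.

```r
library(gtsummary)
library(dplyr)

# Assumed reconstruction of trial_missing:
# blank out age for every Drug B patient and marker for every Drug A patient
trial_missing <- trial |>
  mutate(
    age    = replace(age, trt == "Drug B", NA),
    marker = replace(marker, trt == "Drug A", NA)
  )

# Missing counts per treatment arm, to compare with the Unknown rows above
trial_missing |>
  group_by(trt) |>
  summarise(
    age_missing    = sum(is.na(age)),
    marker_missing = sum(is.na(marker))
  )
```

Because each variable is entirely missing in one arm, tbl_summary() has nothing to summarise in that cell, which is what triggers the "NA (NA, NA)" display.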

You can also use clean_table() independently if you prefer to build tables step-by-step:

trial_missing |>
  tbl_summary(by = trt) |>
  add_overall() |>
  add_p() |>
  clean_table()

Characteristic            Overall             Drug A              Drug B              p-value [2]
                          N = 200 [1]         N = 98 [1]          N = 102 [1]
age                       46 (37, 60)         46 (37, 60)
    Unknown               109                 7                   102
marker                    0.52 (0.18, 1.21)                       0.52 (0.18, 1.21)
    Unknown               102                 98                  4
T Stage                                                                               0.9
    T1                    53 (27%)            28 (29%)            25 (25%)
    T2                    54 (27%)            25 (26%)            29 (28%)
    T3                    43 (22%)            22 (22%)            21 (21%)
    T4                    50 (25%)            23 (23%)            27 (26%)
Grade                                                                                 0.9
    I                     68 (34%)            35 (36%)            33 (32%)
    II                    68 (34%)            32 (33%)            36 (35%)
    III                   64 (32%)            31 (32%)            33 (32%)
Tumor Response            61 (32%)            28 (29%)            33 (34%)            0.5
    Unknown               7                   3                   4
Patient Died              112 (56%)           52 (53%)            60 (59%)            0.4
Months to Death/Censor    22.4 (15.9, 24.0)   23.5 (17.4, 24.0)   21.2 (14.5, 24.0)   0.14
[1] Median (Q1, Q3); n (%)
[2] NA; Pearson’s Chi-squared test; Wilcoxon rank sum test
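
To make the idea behind the cleaning step concrete, here is a minimal base R sketch of pattern-based replacement on cell strings. It is an illustration only: the pattern list and the clean_cells() helper are hypothetical, and the real clean_table() operates on gtsummary table objects and may match a different set of patterns.

```r
# Hypothetical patterns for "missing-looking" cells; anchored with ^ and $
# so that ordinary values such as "10 (0%)" are left alone
missing_patterns <- c(
  "^NA \\(NA, NA\\)$",
  "^NA \\(NA\\)$",
  "^NA, NA$",
  "^0 \\(NA%\\)$",
  "^0 \\(0%\\)$"
)

# Hypothetical helper: replace any matching cell with a placeholder
# (the em dash mentioned in the text above)
clean_cells <- function(cells, placeholder = "\u2014") {
  is_missing <- grepl(paste(missing_patterns, collapse = "|"), cells)
  ifelse(is_missing, placeholder, cells)
}

cells <- c("46 (37, 60)", "NA (NA, NA)", "0 (NA%)", "NA (NA)", "0 (0%)", "28 (29%)")
clean_cells(cells)
```

Only the four missing-style strings are replaced; "46 (37, 60)" and "28 (29%)" pass through unchanged.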

Quick Start: Automatic Labeling

One of the most time-consuming aspects of creating publication-ready tables is labeling variables with human-readable descriptions. sumExtras provides a streamlined labeling system using data dictionaries:

# Create a simple dictionary
dictionary <- tibble::tribble(
  ~Variable,    ~Description,
  "trt",        "Chemotherapy Treatment",
  "age",        "Age at Enrollment (years)",
  "marker",     "Marker Level (ng/mL)",
  "stage",      "T Stage",
  "grade",      "Tumor Grade"
)

# Apply labels automatically
trial |>
  tbl_summary(by = trt, include = c(age, grade, marker)) |>
  add_auto_labels(dictionary = dictionary) |>
  extras()

                          Overall             Drug A              Drug B              p-value [2]
                          N = 200 [1]         N = 98 [1]          N = 102 [1]
Age                       47 (38, 57)         46 (37, 60)         48 (39, 56)         0.718
    Unknown               11                  7                   4
Grade                                                                                 0.871
    I                     68 (34%)            35 (36%)            33 (32%)
    II                    68 (34%)            32 (33%)            36 (35%)
    III                   64 (32%)            31 (32%)            33 (32%)
Marker Level (ng/mL)      0.64 (0.22, 1.41)   0.84 (0.23, 1.60)   0.52 (0.18, 1.21)   0.085
    Unknown               10                  6                   4
[1] Median (Q1, Q3); n (%)
[2] Wilcoxon rank sum test; Pearson’s Chi-squared test
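
For comparison, the same dictionary can be applied by hand through the label argument of tbl_summary(). This sketch uses only gtsummary and base R; table_vars and label_list are illustrative names, and the result is not claimed to match add_auto_labels(), which may follow its own rules about which label wins when a variable already carries one.

```r
library(gtsummary)

# Same dictionary as in the example above
dictionary <- tibble::tribble(
  ~Variable,    ~Description,
  "trt",        "Chemotherapy Treatment",
  "age",        "Age at Enrollment (years)",
  "marker",     "Marker Level (ng/mL)",
  "stage",      "T Stage",
  "grade",      "Tumor Grade"
)

# Variables that will appear as rows in the table
table_vars <- c("age", "grade", "marker")

# Turn the dictionary into a named list and keep only the table's variables,
# since label entries should refer to variables that are in the table
label_list <- setNames(as.list(dictionary$Description), dictionary$Variable)
label_list <- label_list[table_vars]

manual_tbl <- trial |>
  tbl_summary(
    by = trt,
    include = all_of(table_vars),
    label = label_list
  )
```

The manual route requires subsetting the dictionary for every table, which is the bookkeeping that a dictionary-driven helper is meant to remove.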

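The add_auto_labels() function is flexible about what the dictionary contains: as the example shows, the dictionary may include entries (such as stage) for variables that a given table does not use.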

What About More Complex Labeling Workflows?

The labeling system is much more powerful than this basic example suggests.

For comprehensive coverage of these workflows and real-world examples, see vignette("labeling").

Basic Theme Setup

sumExtras is designed to work best with the JAMA compact theme. Use use_jama_theme() to apply this theme to all gtsummary tables in your session:

# Apply JAMA compact theme (typically done once at the beginning)
use_jama_theme()
#> Setting theme "Compact"
#> Applied JAMA compact theme to {gtsummary}

This is equivalent to calling gtsummary::set_gtsummary_theme(gtsummary::theme_gtsummary_compact("jama")) but provides a more convenient interface. You can reset to the default gtsummary theme at any time with gtsummary::reset_gtsummary_theme().
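
A short sketch of switching a theme on and off within one session, using only gtsummary functions. It applies the plain compact theme rather than use_jama_theme(), and assumes that compact styling is applied when the table is converted with as_gt(), so the conversion happens while the theme is still active.

```r
library(gtsummary)

# Activate the compact theme for the tables that follow
theme_gtsummary_compact()

# Build and convert the table while the theme is active
compact_gt <- trial |>
  tbl_summary(by = trt, include = c(age, grade)) |>
  as_gt()

# Return to gtsummary defaults so later tables are unaffected
reset_gtsummary_theme()

default_gt <- trial |>
  tbl_summary(by = trt, include = c(age, grade)) |>
  as_gt()
```

Resetting at the end of a script is a reasonable habit when the same R session goes on to produce tables for a different output style.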

For information about matching gt table styles, creating styled group headers, and advanced formatting techniques, see vignette("styling").

Putting It All Together

Here’s a simple workflow demonstrating how these core functions work together:

# 1. Define your dictionary (typically done once per project)
my_dictionary <- tibble::tribble(
  ~Variable,    ~Description,
  "trt",        "Chemotherapy Treatment",
  "age",        "Age at Enrollment (years)",
  "marker",     "Marker Level (ng/mL)",
  "stage",      "T Stage",
  "grade",      "Tumor Grade",
  "response",   "Tumor Response"
)

# 2. Set the recommended theme (once per session)
use_jama_theme()
#> Setting theme "Compact"
#> Applied JAMA compact theme to {gtsummary}

# 3. Create a clean, labeled table with one function call
trial |>
  select(age, marker, stage, grade, response, trt) |>
  tbl_summary(
    by = trt,
    missing = "no"
  ) |>
  add_auto_labels(dictionary = my_dictionary) |>
  extras()

                          Overall             Drug A              Drug B              p-value [2]
                          N = 200 [1]         N = 98 [1]          N = 102 [1]
Age                       47 (38, 57)         46 (37, 60)         48 (39, 56)         0.718
Marker Level (ng/mL)      0.64 (0.22, 1.41)   0.84 (0.23, 1.60)   0.52 (0.18, 1.21)   0.085
T Stage                                                                               0.866
    T1                    53 (27%)            28 (29%)            25 (25%)
    T2                    54 (27%)            25 (26%)            29 (28%)
    T3                    43 (22%)            22 (22%)            21 (21%)
    T4                    50 (25%)            23 (23%)            27 (26%)
Grade                                                                                 0.871
    I                     68 (34%)            35 (36%)            33 (32%)
    II                    68 (34%)            32 (33%)            36 (35%)
    III                   64 (32%)            31 (32%)            33 (32%)
Tumor Response            61 (32%)            28 (29%)            33 (34%)            0.530
[1] Median (Q1, Q3); n (%)
[2] Wilcoxon rank sum test; Pearson’s Chi-squared test

That’s it! With just a few lines of code, you have a publication-ready table with automatic labeling, clean missing values, bold labels, an overall column, and p-values.
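
If the same pipeline is repeated for many tables, it can be wrapped in a small project-level function. The sketch below is one possible shape; project_table() and its arguments are hypothetical and not part of sumExtras, and it assumes that add_auto_labels() and extras() behave as in the workflow above.

```r
library(sumExtras)
library(gtsummary)
library(dplyr)

# Hypothetical project helper (not part of sumExtras)
# data:       a data frame
# vars:       character vector of variables to summarise
# by:         name of the grouping column, as a string
# dictionary: data frame with Variable and Description columns
# table_args: settings passed on to extras() via .args
project_table <- function(data, vars, by, dictionary,
                          table_args = list(pval = TRUE, overall = TRUE)) {
  data |>
    select(all_of(c(vars, by))) |>
    tbl_summary(by = all_of(by), missing = "no") |>
    add_auto_labels(dictionary = dictionary) |>
    extras(.args = table_args)
}

my_dictionary <- tibble::tribble(
  ~Variable,    ~Description,
  "age",        "Age at Enrollment (years)",
  "marker",     "Marker Level (ng/mL)",
  "response",   "Tumor Response"
)

efficacy_tbl <- project_table(
  trial,
  vars = c("age", "marker", "response"),
  by = "trt",
  dictionary = my_dictionary
)
```

Passing column names as strings with all_of() keeps the helper simple; a version that accepts bare column names would need tidy evaluation.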

Next Steps

This vignette covered the essential functions to get you started quickly. For more advanced usage, see vignette("labeling") and vignette("styling").

For detailed information about individual functions, see their help documentation, for example ?extras, ?clean_table, ?add_auto_labels, and ?use_jama_theme.

The package is designed to reduce repetitive code while maintaining the flexibility of gtsummary’s modular approach. Use as much or as little as fits your workflow.
