Introduction to sumExtras

Overview

If you’ve worked with gtsummary before, you’re familiar with the typical workflow of building summary tables: creating a base table with tbl_summary(), then progressively adding features like overall columns, p-values, and formatting tweaks. While gtsummary’s modular approach provides flexibility, the same sequence of functions appears repeatedly in analysis scripts.

sumExtras streamlines this process by providing convenience functions that apply commonly used formatting patterns in a single step. The package addresses three main pain points:

Repetitive styling workflows - Combining multiple formatting steps into one function call
Inconsistent missing value displays - Standardizing how NA values appear across tables
Manual variable labeling - Automating label assignment from data dictionaries

This vignette will get you started with the core functions. For more specialized workflows, see:

vignette("labeling") - Comprehensive guide to automatic variable labeling across tables and plots
vignette("styling") - Advanced table styling and formatting techniques

The extras() Function

The signature function of this package, extras(), consolidates the most common table enhancements into a single step. At minimum, it adds bold labels, removes the “Characteristic” header, and standardizes missing value display. With default settings, it also adds an overall column and p-values.

Basic Usage

Standard gtsummary workflow

trial |>
  tbl_summary(by = trt) |>
  add_overall() |>
  add_p() |>
  bold_labels() |>
  modify_header(label ~ "")

Equivalent using extras()

trial |>
  tbl_summary(by = trt) |>
  extras()

Output of the standard gtsummary workflow:

	Overall N = 200¹	Drug A N = 98¹	Drug B N = 102¹	p-value²
Age	47 (38, 57)	46 (37, 60)	48 (39, 56)	0.7
Unknown	11	7	4
Marker Level (ng/mL)	0.64 (0.22, 1.41)	0.84 (0.23, 1.60)	0.52 (0.18, 1.21)	0.085
Unknown	10	6	4
T Stage				0.9
T1	53 (27%)	28 (29%)	25 (25%)
T2	54 (27%)	25 (26%)	29 (28%)
T3	43 (22%)	22 (22%)	21 (21%)
T4	50 (25%)	23 (23%)	27 (26%)
Grade				0.9
I	68 (34%)	35 (36%)	33 (32%)
II	68 (34%)	32 (33%)	36 (35%)
III	64 (32%)	31 (32%)	33 (32%)
Tumor Response	61 (32%)	28 (29%)	33 (34%)	0.5
Unknown	7	3	4
Patient Died	112 (56%)	52 (53%)	60 (59%)	0.4
Months to Death/Censor	22.4 (15.9, 24.0)	23.5 (17.4, 24.0)	21.2 (14.5, 24.0)	0.14
¹ Median (Q1, Q3); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test

Output using extras():

	Overall N = 200¹	Drug A N = 98¹	Drug B N = 102¹	p-value²
Age	47 (38, 57)	46 (37, 60)	48 (39, 56)	0.718
Unknown	11	7	4
Marker Level (ng/mL)	0.64 (0.22, 1.41)	0.84 (0.23, 1.60)	0.52 (0.18, 1.21)	0.085
Unknown	10	6	4
T Stage				0.866
T1	53 (27%)	28 (29%)	25 (25%)
T2	54 (27%)	25 (26%)	29 (28%)
T3	43 (22%)	22 (22%)	21 (21%)
T4	50 (25%)	23 (23%)	27 (26%)
Grade				0.871
I	68 (34%)	35 (36%)	33 (32%)
II	68 (34%)	32 (33%)	36 (35%)
III	64 (32%)	31 (32%)	33 (32%)
Tumor Response	61 (32%)	28 (29%)	33 (34%)	0.530
Unknown	7	3	4
Patient Died	112 (56%)	52 (53%)	60 (59%)	0.412
Months to Death/Censor	22.4 (15.9, 24.0)	23.5 (17.4, 24.0)	21.2 (14.5, 24.0)	0.145
¹ Median (Q1, Q3); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test

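Both approaches produce the same table layout (in the output shown above, only the number of digits displayed for p-values differs), but extras() requires less code and keeps formatting consistent across your analysis.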

Customizing Output

You can control which features are applied using the function arguments:

# Table without p-values
trial |>
  tbl_summary(by = trt) |>
  extras(pval = FALSE)

	Overall N = 200¹	Drug A N = 98¹	Drug B N = 102¹
Age	47 (38, 57)	46 (37, 60)	48 (39, 56)
Unknown	11	7	4
Marker Level (ng/mL)	0.64 (0.22, 1.41)	0.84 (0.23, 1.60)	0.52 (0.18, 1.21)
Unknown	10	6	4
T Stage
T1	53 (27%)	28 (29%)	25 (25%)
T2	54 (27%)	25 (26%)	29 (28%)
T3	43 (22%)	22 (22%)	21 (21%)
T4	50 (25%)	23 (23%)	27 (26%)
Grade
I	68 (34%)	35 (36%)	33 (32%)
II	68 (34%)	32 (33%)	36 (35%)
III	64 (32%)	31 (32%)	33 (32%)
Tumor Response	61 (32%)	28 (29%)	33 (34%)
Unknown	7	3	4
Patient Died	112 (56%)	52 (53%)	60 (59%)
Months to Death/Censor	22.4 (15.9, 24.0)	23.5 (17.4, 24.0)	21.2 (14.5, 24.0)
¹ Median (Q1, Q3); n (%)


# Table without overall column
trial |>
  tbl_summary(by = trt) |>
  extras(overall = FALSE)

	Drug A N = 98¹	Drug B N = 102¹	p-value²
Age	46 (37, 60)	48 (39, 56)	0.718
Unknown	7	4
Marker Level (ng/mL)	0.84 (0.23, 1.60)	0.52 (0.18, 1.21)	0.085
Unknown	6	4
T Stage			0.866
T1	28 (29%)	25 (25%)
T2	25 (26%)	29 (28%)
T3	22 (22%)	21 (21%)
T4	23 (23%)	27 (26%)
Grade			0.871
I	35 (36%)	33 (32%)
II	32 (33%)	36 (35%)
III	31 (32%)	33 (32%)
Tumor Response	28 (29%)	33 (34%)	0.530
Unknown	3	4
Patient Died	52 (53%)	60 (59%)	0.412
Months to Death/Censor	23.5 (17.4, 24.0)	21.2 (14.5, 24.0)	0.145
¹ Median (Q1, Q3); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test


# Place the overall column last (by default it is placed first)
trial |>
  tbl_summary(by = trt) |>
  extras(last = TRUE)

	Drug A N = 98¹	Drug B N = 102¹	Overall N = 200¹	p-value²
Age	46 (37, 60)	48 (39, 56)	47 (38, 57)	0.718
Unknown	7	4	11
Marker Level (ng/mL)	0.84 (0.23, 1.60)	0.52 (0.18, 1.21)	0.64 (0.22, 1.41)	0.085
Unknown	6	4	10
T Stage				0.866
T1	28 (29%)	25 (25%)	53 (27%)
T2	25 (26%)	29 (28%)	54 (27%)
T3	22 (22%)	21 (21%)	43 (22%)
T4	23 (23%)	27 (26%)	50 (25%)
Grade				0.871
I	35 (36%)	33 (32%)	68 (34%)
II	32 (33%)	36 (35%)	68 (34%)
III	31 (32%)	33 (32%)	64 (32%)
Tumor Response	28 (29%)	33 (34%)	61 (32%)	0.530
Unknown	3	4	7
Patient Died	52 (53%)	60 (59%)	112 (56%)	0.412
Months to Death/Censor	23.5 (17.4, 24.0)	21.2 (14.5, 24.0)	22.4 (15.9, 24.0)	0.145
¹ Median (Q1, Q3); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test

For projects with consistent table formatting requirements, you can define styling parameters once and reuse them:

# Define standard table settings for a project
standard_table_args <- list(
  pval = TRUE,
  overall = TRUE,
  last = TRUE
)

# Apply consistently across multiple tables
trial |>
  select(age, grade, stage, trt) |>
  tbl_summary(by = trt) |>
  extras(.args = standard_table_args)

	Drug A N = 98¹	Drug B N = 102¹	Overall N = 200¹	p-value²
Age	46 (37, 60)	48 (39, 56)	47 (38, 57)	0.718
Unknown	7	4	11
Grade				0.871
I	35 (36%)	33 (32%)	68 (34%)
II	32 (33%)	36 (35%)	68 (34%)
III	31 (32%)	33 (32%)	64 (32%)
T Stage				0.866
T1	28 (29%)	25 (25%)	53 (27%)
T2	25 (26%)	29 (28%)	54 (27%)
T3	22 (22%)	21 (21%)	43 (22%)
T4	23 (23%)	27 (26%)	50 (25%)
¹ Median (Q1, Q3); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test
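When a single table needs to deviate from the project defaults, one option is to derive a variant of the settings list instead of retyping every argument. The sketch below uses base R's modifyList() for this. The name no_pval_args is purely illustrative, and the final call is left as a comment because it assumes the .args interface shown above.

```r
# Project-wide defaults, as defined above
standard_table_args <- list(
  pval = TRUE,
  overall = TRUE,
  last = TRUE
)

# Derive a variant for one table: same settings, but without p-values.
# modifyList() replaces entries with matching names and keeps the rest.
no_pval_args <- modifyList(standard_table_args, list(pval = FALSE))

# Inspect the result; the original list is left unchanged
str(no_pval_args)

# The variant would then be passed the same way as the defaults:
# trial |> tbl_summary(by = trt) |> extras(.args = no_pval_args)
```

Keeping the defaults in one list and deriving variants from it means a later change to the project standard only has to be made in one place.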

Cleaning Missing Values

One subtle but important aspect of table presentation is how missing or undefined values are displayed. gtsummary tables can show various representations of missing data: “0 (NA%)”, “NA (NA)”, “NA, NA”, etc. These inconsistencies create visual clutter and make tables harder to scan.

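The clean_table() function, which extras() calls automatically, replaces all-zero cells such as “0 (0%)” and every missing-value representation with “—”. The examples below use trial_missing, a variant of trial (its construction is not shown) in which age is missing for all Drug B patients and marker is missing for all Drug A patients: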

Without cleaning

trial_missing |>
  tbl_summary(by = trt)

With clean_table()

trial_missing |>
  tbl_summary(by = trt) |>
  clean_table()

Output without cleaning:

Characteristic	Drug A N = 98¹	Drug B N = 102¹
age	46 (37, 60)	NA (NA, NA)
Unknown	7	102
marker	NA (NA, NA)	0.52 (0.18, 1.21)
Unknown	98	4
T Stage
T1	28 (29%)	25 (25%)
T2	25 (26%)	29 (28%)
T3	22 (22%)	21 (21%)
T4	23 (23%)	27 (26%)
Grade
I	35 (36%)	33 (32%)
II	32 (33%)	36 (35%)
III	31 (32%)	33 (32%)
Tumor Response	28 (29%)	33 (34%)
Unknown	3	4
Patient Died	52 (53%)	60 (59%)
Months to Death/Censor	23.5 (17.4, 24.0)	21.2 (14.5, 24.0)
¹ Median (Q1, Q3); n (%)

Output with clean_table():

Characteristic	Drug A N = 98¹	Drug B N = 102¹
age	46 (37, 60)	—
Unknown	7	102
marker	—	0.52 (0.18, 1.21)
Unknown	98	4
T Stage
T1	28 (29%)	25 (25%)
T2	25 (26%)	29 (28%)
T3	22 (22%)	21 (21%)
T4	23 (23%)	27 (26%)
Grade
I	35 (36%)	33 (32%)
II	32 (33%)	36 (35%)
III	31 (32%)	33 (32%)
Tumor Response	28 (29%)	33 (34%)
Unknown	3	4
Patient Died	52 (53%)	60 (59%)
Months to Death/Censor	23.5 (17.4, 24.0)	21.2 (14.5, 24.0)
¹ Median (Q1, Q3); n (%)
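To make the idea concrete, the replacement can be pictured as a lookup over cell strings. The following is only a sketch in base R, not the sumExtras implementation: clean_cells() is a hypothetical helper, and its fixed list of strings is taken from the examples in this section rather than from the package source.

```r
# Hypothetical helper: NOT how clean_table() is implemented, only an
# illustration of swapping "empty-looking" cell strings for one placeholder.
clean_cells <- function(x, placeholder = "\u2014") {
  # Strings treated as empty; drawn from the examples in this section
  empty_strings <- c("0 (NA%)", "0 (0%)", "NA (NA)", "NA, NA", "NA (NA, NA)")
  x[x %in% empty_strings] <- placeholder
  x
}

# Cells as they might appear in one column of a summary table
cells <- c("46 (37, 60)", "NA (NA, NA)", "0 (NA%)", "7")
cleaned <- clean_cells(cells)
print(cleaned)
```

Cells that carry real values, such as the count in the Unknown row, pass through unchanged, which matches the behavior visible in the cleaned table above.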

You can also use clean_table() on its own if you prefer to build tables step by step:

trial_missing |>
  tbl_summary(by = trt) |>
  add_overall() |>
  add_p() |>
  clean_table()

Characteristic	Overall N = 200¹	Drug A N = 98¹	Drug B N = 102¹	p-value²
age	46 (37, 60)	46 (37, 60)	—
Unknown	109	7	102
marker	0.52 (0.18, 1.21)	—	0.52 (0.18, 1.21)
Unknown	102	98	4
T Stage				0.9
T1	53 (27%)	28 (29%)	25 (25%)
T2	54 (27%)	25 (26%)	29 (28%)
T3	43 (22%)	22 (22%)	21 (21%)
T4	50 (25%)	23 (23%)	27 (26%)
Grade				0.9
I	68 (34%)	35 (36%)	33 (32%)
II	68 (34%)	32 (33%)	36 (35%)
III	64 (32%)	31 (32%)	33 (32%)
Tumor Response	61 (32%)	28 (29%)	33 (34%)	0.5
Unknown	7	3	4
Patient Died	112 (56%)	52 (53%)	60 (59%)	0.4
Months to Death/Censor	22.4 (15.9, 24.0)	23.5 (17.4, 24.0)	21.2 (14.5, 24.0)	0.14
¹ Median (Q1, Q3); n (%)
² NA; Pearson’s Chi-squared test; Wilcoxon rank sum test

Quick Start: Automatic Labeling

One of the most time-consuming aspects of creating publication-ready tables is labeling variables with human-readable descriptions. sumExtras provides a streamlined labeling system using data dictionaries:

# Create a simple dictionary
dictionary <- tibble::tribble(
  ~Variable,    ~Description,
  "trt",        "Chemotherapy Treatment",
  "age",        "Age at Enrollment (years)",
  "marker",     "Marker Level (ng/mL)",
  "stage",      "T Stage",
  "grade",      "Tumor Grade"
)

# Apply labels automatically
trial |>
  tbl_summary(by = trt, include = c(age, grade, marker)) |>
  add_auto_labels(dictionary = dictionary) |>
  extras()

	Overall N = 200¹	Drug A N = 98¹	Drug B N = 102¹	p-value²
Age	47 (38, 57)	46 (37, 60)	48 (39, 56)	0.718
Unknown	11	7	4
Grade				0.871
I	68 (34%)	35 (36%)	33 (32%)
II	68 (34%)	32 (33%)	36 (35%)
III	64 (32%)	31 (32%)	33 (32%)
Marker Level (ng/mL)	0.64 (0.22, 1.41)	0.84 (0.23, 1.60)	0.52 (0.18, 1.21)	0.085
Unknown	10	6	4
¹ Median (Q1, Q3); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test

The add_auto_labels() function adapts to several common situations:

Pass a dictionary explicitly, or let it find one in your environment automatically
Works with pre-labeled data (from haven, Hmisc, or manual labeling)
Manual labels in tbl_summary() always override automatic ones
Compatible with regression tables via tbl_regression()
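Conceptually, a data dictionary is a lookup table from variable names to descriptions. The base R sketch below illustrates that lookup only; it is not how add_auto_labels() is implemented. lookup_labels() is a hypothetical helper, and falling back to the variable name when no entry exists is an assumption of this sketch rather than documented package behavior.

```r
# A dictionary with the same two columns used above (a plain data.frame
# here so the sketch needs no extra packages)
dictionary <- data.frame(
  Variable    = c("trt", "age", "marker"),
  Description = c("Chemotherapy Treatment",
                  "Age at Enrollment (years)",
                  "Marker Level (ng/mL)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: find a label for each variable name and fall back
# to the name itself when the dictionary has no matching entry.
lookup_labels <- function(vars, dictionary) {
  idx <- match(vars, dictionary$Variable)
  labels <- dictionary$Description[idx]
  labels[is.na(idx)] <- vars[is.na(idx)]
  stats::setNames(labels, vars)
}

# "death" is not in the dictionary, so it keeps its variable name
resolved <- lookup_labels(c("age", "marker", "death"), dictionary)
print(resolved)
```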

What About More Complex Labeling Workflows?

The labeling system is much more powerful than this basic example suggests. You can:

Use one dictionary for both gtsummary tables and ggplot2 visualizations
Control label priority when you have multiple label sources
Set up cross-package workflows with apply_labels_from_dictionary()
Understand how R’s label attribute system works under the hood

For comprehensive coverage of these workflows and real-world examples, see vignette("labeling").
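As a brief preview of the label attribute mentioned above, the sketch below sets and reads a "label" attribute with base R only. It uses a small stand-in data frame rather than the trial data, and it illustrates the attribute mechanism in general, not any sumExtras function.

```r
# A small stand-in data frame (not the trial data set)
df <- data.frame(age = c(34, 51, 47), marker = c(0.2, 1.1, 0.7))

# Attach a label to one column as a plain attribute
attr(df$age, "label") <- "Age at Enrollment (years)"

# Reading it back: a column without a label returns NULL
attr(df$age, "label", exact = TRUE)
attr(df$marker, "label", exact = TRUE)

# Collect a label for every column, using the column name when none is set
labels <- vapply(names(df), function(v) {
  lab <- attr(df[[v]], "label", exact = TRUE)
  if (is.null(lab)) v else lab
}, character(1))
print(labels)
```

Because the label travels with the column as an attribute, any tool that reads it can pick up the same text, which is what makes a single label source usable across tables and plots.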

Basic Theme Setup

sumExtras is designed to work best with the JAMA compact theme. Use use_jama_theme() to apply this theme to all gtsummary tables in your session:

# Apply JAMA compact theme (typically done once at the beginning)
use_jama_theme()
#> Setting theme "Compact"
#> Applied JAMA compact theme to {gtsummary}

This is equivalent to calling gtsummary::set_gtsummary_theme(gtsummary::theme_gtsummary_compact("jama")) but provides a more convenient interface. You can reset to the default gtsummary theme at any time with gtsummary::reset_gtsummary_theme().

For information about matching gt table styles, creating styled group headers, and advanced formatting techniques, see vignette("styling").

Putting It All Together

Here’s a simple workflow demonstrating how these core functions work together:

# 1. Define your dictionary (typically done once per project)
my_dictionary <- tibble::tribble(
  ~Variable,    ~Description,
  "trt",        "Chemotherapy Treatment",
  "age",        "Age at Enrollment (years)",
  "marker",     "Marker Level (ng/mL)",
  "stage",      "T Stage",
  "grade",      "Tumor Grade",
  "response",   "Tumor Response"
)

# 2. Set the recommended theme (once per session)
use_jama_theme()
#> Setting theme "Compact"
#> Applied JAMA compact theme to {gtsummary}

# 3. Create a clean, labeled table with one function call
trial |>
  select(age, marker, stage, grade, response, trt) |>
  tbl_summary(
    by = trt,
    missing = "no"
  ) |>
  add_auto_labels(dictionary = my_dictionary) |>
  extras()

	Overall N = 200¹	Drug A N = 98¹	Drug B N = 102¹	p-value²
Age	47 (38, 57)	46 (37, 60)	48 (39, 56)	0.718
Marker Level (ng/mL)	0.64 (0.22, 1.41)	0.84 (0.23, 1.60)	0.52 (0.18, 1.21)	0.085
T Stage				0.866
T1	53 (27%)	28 (29%)	25 (25%)
T2	54 (27%)	25 (26%)	29 (28%)
T3	43 (22%)	22 (22%)	21 (21%)
T4	50 (25%)	23 (23%)	27 (26%)
Grade				0.871
I	68 (34%)	35 (36%)	33 (32%)
II	68 (34%)	32 (33%)	36 (35%)
III	64 (32%)	31 (32%)	33 (32%)
Tumor Response	61 (32%)	28 (29%)	33 (34%)	0.530
¹ Median (Q1, Q3); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test

That’s it! With just a few lines of code, you have a publication-ready table with automatic labeling, clean missing values, bold labels, an overall column, and p-values.

Next Steps

This vignette covered the essential functions to get you started quickly. For more advanced usage:

vignette("labeling") - Learn about the complete labeling system, including cross-package workflows with ggplot2, controlling label priority, working with pre-labeled data, and real-world analysis examples
vignette("styling") - Explore advanced styling techniques including group headers, background colors, text formatting, and creating visually polished tables

For detailed information about individual functions, see their help documentation:

?extras - Main styling function
?clean_table - Missing value standardization
?add_auto_labels - Automatic variable labeling
?use_jama_theme - Apply JAMA compact theme

The package is designed to reduce repetitive code while maintaining the flexibility of gtsummary’s modular approach. Use as much or as little as fits your workflow.