This document uses the `rmarkdown::html_vignette` theme.
Below are examples using the recommended styles for R Markdown rendering. The styles available in summarytools are the same as pander's. For `freq()` and `descr()` (and `ctable()`, although with caveats), the `rmarkdown` style is recommended; for `dfSummary()`, `grid` is recommended.
The knitr chunk option `results = 'asis'` must be specified to get good results. This can be done globally via `opts_chunk$set(results = 'asis')`, or in individual chunks.
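A minimal setup chunk for the global approach might look like the sketch below, placed near the top of the .Rmd file. It assumes knitr is installed; the `requireNamespace()` guard is only there so the snippet also runs harmlessly outside a knitting session.

```r
# Setup chunk: make every chunk emit its output "as is", so the markdown
# or HTML produced by summarytools is passed through to the document.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(results = "asis")
}

# To apply the option to one chunk only, put it in that chunk's header
# instead, e.g. {r, results = 'asis'}
```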
The following summarytools global options have been set:

```r
#st_options('omit.headings', TRUE)
st_options('bootstrap.css', FALSE)
st_options('footnote', NA)
```
To generate tables using summarytools' own HTML rendering, the YAML header of the .Rmd document must point to the package's summarytools.css file. This can be achieved in several ways; the current vignette uses this configuration:
```yaml
output:
  rmarkdown::html_vignette:
    css:
      - !expr system.file("rmarkdown/templates/html_vignette/resources/vignette.css", package = "rmarkdown")
      - !expr system.file("includes/stylesheets/summarytools.css", package = "summarytools")
```
An alternative is to point directly to the copy of summarytools.css on your system:
```yaml
---
title: "RMarkdown using summarytools"
output:
  html_document:
    css: C:/R/win-library/3.4/summarytools/includes/stylesheets/summarytools.css
---
```
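The path above is specific to one machine and R version. Rather than typing it by hand, you can ask R where the stylesheet lives; this sketch reuses the `system.file()` call from the `!expr` entries of the first configuration and assumes, as that configuration does, that the file sits under `includes/stylesheets/` in the installed package.

```r
# Locate summarytools.css inside the installed summarytools package.
# system.file() returns "" when the file (or the package) cannot be found.
css_path <- system.file("includes/stylesheets/summarytools.css",
                        package = "summarytools")

if (nzchar(css_path)) {
  # Copy this value after `css:` in the YAML header
  cat("css:", css_path, "\n")
} else {
  message("summarytools.css not found: is summarytools installed?")
}
```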
Starting with `freq()`, we'll review the recommended methods and styles to get going with summarytools in R Markdown documents.
`freq()` is best used with `style = 'rmarkdown'`; HTML rendering is also possible.
```r
freq(tobacco$gender, style = 'rmarkdown')
```
Variable: tobacco$gender
Type: Factor (unordered)
| | Freq | % Valid | % Valid Cum. | % Total | % Total Cum. |
---|---|---|---|---|---|
F | 489 | 50.00 | 50.00 | 48.90 | 48.90 |
M | 489 | 50.00 | 100.00 | 48.90 | 97.80 |
<NA> | 22 | | | 2.20 | 100.00 |
Total | 1000 | 100.00 | 100.00 | 100.00 | 100.00 |
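The two sets of percentages differ only in their denominator: % Valid divides by the 978 non-missing values, % Total by all 1000 rows. The base R sketch below rebuilds a vector with the counts shown above (it is not the actual `tobacco` data) and reproduces those columns; it illustrates the arithmetic only and is not how summarytools computes them internally.

```r
# Vector with the same counts as tobacco$gender above: 489 F, 489 M, 22 NA
gender <- factor(rep(c("F", "M", NA), times = c(489, 489, 22)))

counts  <- table(gender)             # NA excluded by default: F = 489, M = 489
n_valid <- sum(counts)               # 978 non-missing values
n_total <- length(gender)            # 1000 rows, missing values included

pct_valid <- 100 * counts / n_valid  # "% Valid" column
pct_total <- 100 * counts / n_total  # "% Total" column

round(cbind(Freq          = counts,
            pct_valid     = pct_valid,
            pct_valid_cum = cumsum(pct_valid),
            pct_total     = pct_total,
            pct_total_cum = cumsum(pct_total)), 2)

# The <NA> row only has a % Total: 22 / 1000
100 * sum(is.na(gender)) / n_total   # 2.2
```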
```r
print(freq(tobacco$gender), method = 'render')
```
| | | Valid | | Total | |
---|---|---|---|---|---|
gender | Freq | % | % Cumul | % | % Cumul |
F | 489 | 50.00 | 50.00 | 48.90 | 48.90 |
M | 489 | 50.00 | 100.00 | 48.90 | 97.80 |
<NA> | 22 | | | 2.20 | 100.00 |
Total | 1000 | 100.00 | 100.00 | 100.00 | 100.00 |
If you find the table too large, you can use `table.classes = 'st-small'`; an example is provided further below.
Tables with headings spanning two rows, such as the one produced by `ctable()`, are not fully supported in markdown (yet), but the result is getting close to acceptable.
```r
ctable(tobacco$gender, tobacco$smoker, style = 'rmarkdown')
```
Variables: gender * smoker
Data Frame: tobacco
smoker | Yes | No | Total |
---|---|---|---|
gender | | | |
F | 147 (30.06%) | 342 (69.94%) | 489 (100.00%) |
M | 143 (29.24%) | 346 (70.76%) | 489 (100.00%) |
<NA> | 8 (36.36%) | 14 (63.64%) | 22 (100.00%) |
Total | 298 (29.80%) | 702 (70.20%) | 1000 (100.00%) |
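The percentages in this cross-table are row proportions: each count is divided by its row total (147 / 489 = 30.06%), while the Total row divides by the grand total. The base R sketch below rebuilds the counts by hand (again, not the real `tobacco` data) and reproduces them with `prop.table()`.

```r
# Counts from the gender x smoker table above (rows: gender, columns: smoker)
counts <- matrix(c(147, 342,
                   143, 346,
                     8,  14),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(gender = c("F", "M", "<NA>"),
                                 smoker = c("Yes", "No")))

# Row proportions in percent: each row sums to 100
row_pct <- round(100 * prop.table(counts, margin = 1), 2)
row_pct
# F: 30.06 / 69.94, M: 29.24 / 70.76, <NA>: 36.36 / 63.64

# The Total row uses the grand total (1000) as denominator
round(100 * colSums(counts) / sum(counts), 2)   # Yes 29.8, No 70.2
```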
For best results with `ctable()`, use HTML rendering:
```r
print(ctable(tobacco$gender, tobacco$smoker), method = 'render')
```
smoker | |||
---|---|---|---|
gender | Yes | No | Total |
F | 147 (30.06%) | 342 (69.94%) | 489 (100.00%) |
M | 143 (29.24%) | 346 (70.76%) | 489 (100.00%) |
<NA> | 8 (36.36%) | 14 (63.64%) | 22 (100.00%) |
Total | 298 (29.80%) | 702 (70.20%) | 1000 (100.00%) |
`descr()` is also best used with `style = 'rmarkdown'`, and HTML rendering is also supported.
```r
descr(tobacco, style = 'rmarkdown')
```
Non-numerical variable(s) ignored: gender, age.gr, smoker, diseased, disease
Data Frame: tobacco
N: 1000
| | age | BMI | cigs.per.day | samp.wgts |
---|---|---|---|---|
Mean | 49.60 | 25.73 | 6.78 | 1.00 |
Std.Dev | 18.29 | 4.49 | 11.88 | 0.08 |
Min | 18.00 | 8.83 | 0.00 | 0.86 |
Q1 | 34.00 | 22.93 | 0.00 | 0.86 |
Median | 50.00 | 25.62 | 0.00 | 1.04 |
Q3 | 66.00 | 28.65 | 11.00 | 1.05 |
Max | 80.00 | 39.44 | 40.00 | 1.06 |
MAD | 23.72 | 4.18 | 0.00 | 0.01 |
IQR | 32.00 | 5.72 | 11.00 | 0.19 |
CV | 2.71 | 5.73 | 0.57 | 11.92 |
Skewness | -0.04 | 0.02 | 1.54 | -1.04 |
SE.Skewness | 0.08 | 0.08 | 0.08 | 0.08 |
Kurtosis | -1.26 | 0.26 | 0.90 | -0.90 |
N.Valid | 975.00 | 974.00 | 965.00 | 1000.00 |
Pct.Valid | 97.50 | 97.40 | 96.50 | 100.00 |
We'll use `table.classes = 'st-small'` to show how it affects the table's size (compare with the `freq()` table rendered earlier).
```r
print(descr(tobacco), method = 'render', table.classes = 'st-small')
```
Non-numerical variable(s) ignored: gender, age.gr, smoker, diseased, disease
| | age | BMI | cigs.per.day | samp.wgts |
---|---|---|---|---|
Mean | 49.60 | 25.73 | 6.78 | 1.00 |
Std.Dev | 18.29 | 4.49 | 11.88 | 0.08 |
Min | 18.00 | 8.83 | 0.00 | 0.86 |
Q1 | 34.00 | 22.93 | 0.00 | 0.86 |
Median | 50.00 | 25.62 | 0.00 | 1.04 |
Q3 | 66.00 | 28.65 | 11.00 | 1.05 |
Max | 80.00 | 39.44 | 40.00 | 1.06 |
MAD | 23.72 | 4.18 | 0.00 | 0.01 |
IQR | 32.00 | 5.72 | 11.00 | 0.19 |
CV | 2.71 | 5.73 | 0.57 | 11.92 |
Skewness | -0.04 | 0.02 | 1.54 | -1.04 |
SE.Skewness | 0.08 | 0.08 | 0.08 | 0.08 |
Kurtosis | -1.26 | 0.26 | 0.90 | -0.90 |
N.Valid | 975 | 974 | 965 | 1000 |
Pct.Valid | 97.50 | 97.40 | 96.50 | 100.00 |
For `dfSummary()`, the `grid` style gives good results, although the histograms are not shown. This has to do with an unresolved issue, but we're working hard to find a solution. Don't forget to specify `plain.ascii = FALSE`, or the table will not be formatted properly.
```r
dfSummary(tobacco, style = 'grid', plain.ascii = FALSE)
```
tobacco
N: 1000
No | Variable | Stats / Values | Freqs (% of Valid) | Text Graph | Valid | Missing |
---|---|---|---|---|---|---|
1 | gender | 1. F | 489 (50.0%) | IIIIIIIIIIIIIIII | 978 | 22 |
2 | age | mean (sd) : 49.6 (18.29) | 63 distinct val. | | 975 | 25 |
3 | age.gr | 1. 18-34 | 258 (26.5%) | IIIIIIIIIIIII | 975 | 25 |
4 | BMI | mean (sd) : 25.73 (4.49) | 974 distinct val. | | 974 | 26 |
5 | smoker | 1. Yes | 298 (29.8%) | IIIIII | 1000 | 0 |
6 | cigs.per.day | mean (sd) : 6.78 (11.88) | 37 distinct val. | | 965 | 35 |
7 | diseased | 1. Yes | 224 (22.4%) | IIII | 1000 | 0 |
8 | disease | 1. Hypertension | 36 (16.2%) | IIIIIIIIIIIIIIII | 222 | 778 |
9 | samp.wgts | mean (sd) : 1 (0.08) | 0.86!: 267 (26.7%) | IIIIIIIIIIIII IIIIIIIIIIII IIIIIIIIIIIIIIII IIIIIII | 1000 | 0 |
Although the results are not as neat as those of an HTML report generated directly from the R interpreter (the transparency of the graphs is lost in translation), HTML rendering is still the best method for `dfSummary()`.
```r
print(dfSummary(tobacco, graph.magnif = 0.75), method = 'render')
```
No | Variable | Stats / Values | Freqs (% of Valid) | Graph | Valid | Missing |
---|---|---|---|---|---|---|
1 | gender [factor] | 1. F 2. M | 489 (50.0%) 489 (50.0%) | | 978 (97.8%) | 22 (2.2%) |
2 | age [numeric] | mean (sd) : 49.6 (18.29) min < med < max : 18 < 50 < 80 IQR (CV) : 32 (0.37) | 63 distinct val. | | 975 (97.5%) | 25 (2.5%) |
3 | age.gr [factor] | 1. 18-34 2. 35-50 3. 51-70 4. 71 + | 258 (26.5%) 241 (24.7%) 317 (32.5%) 159 (16.3%) | | 975 (97.5%) | 25 (2.5%) |
4 | BMI [numeric] | mean (sd) : 25.73 (4.49) min < med < max : 8.83 < 25.62 < 39.44 IQR (CV) : 5.72 (0.17) | 974 distinct val. | | 974 (97.4%) | 26 (2.6%) |
5 | smoker [factor] | 1. Yes 2. No | 298 (29.8%) 702 (70.2%) | | 1000 (100%) | 0 (0%) |
6 | cigs.per.day [numeric] | mean (sd) : 6.78 (11.88) min < med < max : 0 < 0 < 40 IQR (CV) : 11 (1.75) | 37 distinct val. | | 965 (96.5%) | 35 (3.5%) |
7 | diseased [factor] | 1. Yes 2. No | 224 (22.4%) 776 (77.6%) | | 1000 (100%) | 0 (0%) |
8 | disease [character] | 1. Hypertension 2. Cancer 3. Cholesterol 4. Heart 5. Pulmonary 6. Musculoskeletal 7. Diabetes 8. Hearing 9. Digestive 10. Hypotension [ 3 others ] | 36 (16.2%) 34 (15.3%) 21 (9.5%) 20 (9.0%) 20 (9.0%) 19 (8.6%) 14 (6.3%) 14 (6.3%) 12 (5.4%) 11 (5.0%) 21 (9.4%) | | 222 (22.2%) | 778 (77.8%) |
9 | samp.wgts [numeric] | mean (sd) : 1 (0.08) min < med < max : 0.86 < 1.04 < 1.06 IQR (CV) : 0.19 (0.08) | 0.86! : 267 (26.7%) 1.04! : 249 (24.9%) 1.05! : 324 (32.4%) 1.06! : 160 (16.0%) ! rounded | | 1000 (100%) | 0 (0%) |