The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
Arguably one of the most popular features of GraphPad Prism is adding p-values to plots. Indeed in Prism 9, GraphPad have added a feature to automatically perform pairwise comparisons and add the resulting p-values with brackets to the graph.
ggprism
includes the add_pvalue()
function
to add p-values with or without brackets to ggplots. This vignette will
go through the many ways in which this function can be used.
This function is a re-written version of
stat_pvalue_manual()
from the ggpubr
package, which itself is based on the geom_signif()
function from the ggsignif
package. Compared to stat_pvalue_manual()
, the
add_pvalue()
function is: easier to use, more robust with
less dependencies, and has more customisable brackets.
To add significance brackets to a plot, you need a minimal data.frame with 4 columns and a number of rows corresponding to the number of brackets you want to add. The 4 columns should correspond to these 4 function arguments:
"group1"
)"group2"
)"label"
)"y.position"
)For grouped or faceted data you’ll also need a column which is named according to the grouping variable. See the Many more examples section for help with this/examples.
Let’s see how this works in practice. First we’ll plot the
sleep
data set.
str(sleep)
#> 'data.frame': 20 obs. of 3 variables:
#> $ extra: num 0.7 -1.6 -0.2 -1.2 -0.1 3.4 3.7 0.8 0 2 ...
#> $ group: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
#> $ ID : Factor w/ 10 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
# create a jitter plot of the sleep data set
# and indicate the means
p <- ggplot(sleep, aes(x = group, y = extra)) +
geom_jitter(aes(shape = group), width = 0.1) +
stat_summary(geom = "crossbar", fun = mean, colour = "red", width = 0.2) +
theme_prism() +
theme(legend.position = "none")
p
Next we’ll perform a t-test and obtain a p-value for the difference between the two means.
# perform a t-test and obtain the p-value
result <- t.test(extra ~ group, data = sleep)$p.value
result <- signif(result, digits = 3)
result
#> [1] 0.0794
Now we’ll construct a p-value data.frame for
add_pvalue()
to use.
And finally we’ll add this p-value to our plot. Because we have used
the default column names (see above) in our p-value table we don’t
necessarily have to specify any arguments of add_pvalue()
.
However, here we’ll do it for clarity’s sake. Additionally, if your
p-value table has special column names, you will need to specify them in
add_pvalue()
.
# add p-value brackets
p1 <- p + add_pvalue(df_p_val,
xmin = "group1",
xmax = "group2",
label = "label",
y.position = "y.position")
# change column names to something silly
colnames(df_p_val) <- c("apple", "banana", "some_label", "some_y_position")
# add p-value brackets again
p2 <- p + add_pvalue(df_p_val,
xmin = "apple",
xmax = "banana",
label = "some_label",
y.position = "some_y_position")
p1 + p2
# return column names back to default
colnames(df_p_val) <- c("group1", "group2", "label", "y.position")
You can easily change how the bracket and label looks. You can make
the label a glue
expression. You can also change the tip
length of the bracket. Lastly, you can flip the label when using
coord_flip()
.
# change bracket and label aesthetics
p1 <- p + add_pvalue(df_p_val,
colour = "red", # label
label.size = 8, # label
fontface = "bold", # label
fontfamily = "serif", # label
angle = 45, # label
hjust = 1, # label
vjust = 2, # label
bracket.colour = "blue", # bracket
bracket.size = 1, # bracket
linetype = "dashed", # bracket
lineend = "round") # bracket
# use glue expression for label
p2 <- p + add_pvalue(df_p_val, label = "p = {label}")
# make bracket tips longer and use coord_flip
p3 <- p + add_pvalue(df_p_val, tip.length = 0.15, coord.flip = TRUE) +
coord_flip()
# change bracket tips independently
# (make one side disappear and the other longer)
p4 <- p + add_pvalue(df_p_val, tip.length = c(0.2, 0))
(p1 + p2) / (p3 + p4)
Even if you don’t want brackets, add_pvalue()
is also
useful for adding significance text to plots with the correct/automatic
positioning.
In the example above, if you wanted the text but not the bracket, you
can just use the remove.bracket
argument. In this case, you
must use the x
argument to change the x position of the
text.
# position label above "group1"
p1 <- p + add_pvalue(df_p_val, label = "p = {label}",
remove.bracket = TRUE, x = 1)
# position label between x = 1 and x = 2
p2 <- p + add_pvalue(df_p_val, label = "p = {label}",
remove.bracket = TRUE, x = 1.5)
p1 + p2
Here is another example of ‘text only’ plot using the
ToothGrowth
data set.
str(ToothGrowth)
#> 'data.frame': 60 obs. of 3 variables:
#> $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
#> $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
#> $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
# create a box plot of the ToothGrowth data set
p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
geom_boxplot(aes(fill = dose), colour = "black") +
theme_prism() +
theme(legend.position = "none")
p
Next we’ll perform two t-tests and compare the means against
dose = 0.5
as a reference group. Then we’ll correct the
p-values for multiple testing.
# compare means again reference
result1 <- t.test(len ~ dose,
data = subset(ToothGrowth, dose %in% c(0.5, 1.0)))$p.value
result2 <- t.test(len ~ dose,
data = subset(ToothGrowth, dose %in% c(0.5, 2.0)))$p.value
# Benjamini-Hochberg correction for multiple testing
result <- p.adjust(c(result1, result2), method = "BH")
We can now construct a p-value table. Note that in this case we don’t
need to to specify a "group2"
column for xmax. This is
because text-only p-value annotations just have an x position
(x
) and not an x range (xmin
and
xmax
).
# don't need group2 column (i.e. xmax)
# instead just specify x position in the same way as y.position
df_p_val <- data.frame(
group1 = c(0.5, 0.5),
group2 = c(1, 2),
x = c(2, 3),
label = signif(result, digits = 3),
y.position = c(35, 35)
)
Then we add the p-values to the plot. As before, you can change how the labels look quite easily.
p1 <- p + add_pvalue(df_p_val,
xmin = "group1",
x = "x",
label = "label",
y.position = "y.position")
p2 <- p + add_pvalue(df_p_val,
xmin = "group1",
x = "x",
label = "p = {label}",
y.position = "y.position",
label.size = 3.2,
fontface = "bold")
p1 + p2
If you want the label number format to look nicer you can provide a
column name to label
with plotmath expressions and
set parse = TRUE
. This works with or without brackets.
# plotmath expression to have superscript exponent
df_p_val$p.exprs <- paste0("P==1*x*10^", round(log10(df_p_val$label), 0))
# as above but with italics
df_p_val$p.exprs.ital <- lapply(
paste(round(log10(df_p_val$label), 0)),
function(x) bquote(italic("P = 1x10"^.(x)))
)
p1 <- p + add_pvalue(df_p_val,
xmin = "group1",
x = "x",
label = "p.exprs",
y.position = "y.position",
parse = TRUE)
p2 <- p + add_pvalue(df_p_val,
xmin = "group1",
x = "x",
label = "p.exprs.ital",
y.position = "y.position",
parse = TRUE)
p1 + p2
rstatix
packageAs add_pvalue()
is ultimately just a rewritten version
of stat_pvalue_manual()
, it works well with the rstatix
package.
With rstatix
,
you can perform the statistical test and create the p-value table with
the appropriate x and y position automatically, in a single step.
Here we will use add_pvalue()
and rstatix
to
show many more examples of how to add p-values to different plots.
Compare mean len
depending on supp
. Error
bars indicate 1 standard deviation from the mean.
df_p_val <- rstatix::t_test(ToothGrowth, len ~ supp) %>%
rstatix::add_x_position()
p <- ggplot(ToothGrowth, aes(x = factor(supp), y = len)) +
stat_summary(geom = "col", fun = mean) +
stat_summary(geom = "errorbar",
fun = mean,
fun.min = function(x) mean(x) - sd(x),
fun.max = function(x) mean(x) + sd(x),
width = 0.3) +
theme_prism() +
coord_cartesian(ylim = c(0, 35)) +
scale_y_continuous(breaks = seq(0, 35, 5), expand = c(0, 0))
# normal plot
p + add_pvalue(df_p_val, y.position = 30)
Compare mean len
of each dose to
dose = 0.5
. Error bars indicate 1 standard deviation from
the mean.
df_p_val <- rstatix::t_test(ToothGrowth, len ~ dose, ref.group = "0.5") %>%
rstatix::add_xy_position()
p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
stat_summary(geom = "col", fun = mean) +
stat_summary(geom = "errorbar",
fun = mean,
fun.min = function(x) mean(x) - sd(x),
fun.max = function(x) mean(x) + sd(x),
width = 0.3) +
theme_prism() +
coord_cartesian(ylim = c(0, 40)) +
scale_y_continuous(breaks = seq(0, 40, 5), expand = c(0, 0))
# with brackets
p1 <- p + add_pvalue(df_p_val, label = "p.adj.signif")
# without brackets
p2 <- p + add_pvalue(df_p_val, label = "p.adj.signif", remove.bracket = TRUE)
p1 + p2
Now, compare overall mean len
(base mean) to the mean
len
for each dose
. Error bars indicate 1
standard deviation from the mean.
df_p_val <- rstatix::t_test(ToothGrowth, len ~ dose, ref.group = "all")
p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
stat_summary(geom = "col", fun = mean) +
stat_summary(geom = "errorbar",
fun = mean,
fun.min = function(x) mean(x) - sd(x),
fun.max = function(x) mean(x) + sd(x),
width = 0.3) +
theme_prism() +
coord_cartesian(ylim = c(0, 40)) +
scale_y_continuous(breaks = seq(0, 40, 5), expand = c(0, 0))
p + add_pvalue(df_p_val,
label = "p.adj.signif",
y.position = 35)
Now, compare the mean len
for each dose
to
some arbitrary value, say 26
in this case. Error bars
indicate 1 standard deviation from the mean.
df_p_val <- ToothGrowth %>%
rstatix::group_by(factor(dose)) %>%
rstatix::t_test(len ~ 1, mu = 26) %>%
rstatix::adjust_pvalue(p.col = "p", method = "holm") %>%
rstatix::add_significance(p.col = "p.adj")
p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
stat_summary(geom = "col", fun = mean) +
stat_summary(geom = "errorbar",
fun = mean,
fun.min = function(x) mean(x) - sd(x),
fun.max = function(x) mean(x) + sd(x),
width = 0.3) +
theme_prism() +
coord_cartesian(ylim = c(0, 40)) +
scale_y_continuous(breaks = seq(0, 40, 5), expand = c(0, 0))
# remember xmin and x are referring to the column dames in df_p_val
p + add_pvalue(df_p_val,
xmin = "group1",
x = "factor(dose)",
y = 37,
label = "p.adj.signif")
Compare mean len
across the 3 different
dose
. Use the bracket.shorten
argument to
slightly shorten side-by-side brackets.
df_p_val <- rstatix::t_test(ToothGrowth, len ~ dose)
p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
geom_violin(trim = FALSE) +
geom_boxplot(width = 0.2) +
theme_prism() +
coord_cartesian(ylim = c(0, 45)) +
scale_y_continuous(breaks = seq(0, 45, 5), expand = c(0, 0))
p + add_pvalue(df_p_val,
y.position = c(44, 41, 44),
bracket.shorten = c(0.025, 0, 0.025))
Pairwise comparisons between groups of the
ToothGrowth
data set, grouped according to
supp
. The boxplots and the brackets are automatically
coloured according to supp
. Three important points for this
graph:
supp
) and
the column must contain the groups to group by (in this case
"OJ" or "VC"
.geom_boxplot()
) and
not in the ggplot()
function.step.group.by = "supp"
to automatically
change the bracket spacing between different groups.df_p_val <- ToothGrowth %>%
rstatix::group_by(supp) %>%
rstatix::t_test(len ~ dose) %>%
rstatix::add_xy_position()
p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
geom_boxplot(aes(fill = supp)) +
theme_prism()
# remember colour and step.group.by are referring to a column name in df_p_val
p + add_pvalue(df_p_val,
label = "p = {p.adj}",
colour = "supp",
fontface = "bold",
step.group.by = "supp",
step.increase = 0.1,
tip.length = 0,
bracket.colour = "black",
show.legend = FALSE)
Pairwise comparisons within groups of the
ToothGrowth
data set, grouped according to
supp
.
df_p_val <- ToothGrowth %>%
rstatix::group_by(dose) %>%
rstatix::t_test(len ~ supp) %>%
rstatix::adjust_pvalue(p.col = "p", method = "bonferroni") %>%
rstatix::add_significance(p.col = "p.adj") %>%
rstatix::add_xy_position(x = "dose", dodge = 0.8) # important for positioning!
p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
geom_boxplot(aes(fill = supp)) +
theme_prism() +
coord_cartesian(ylim = c(0, 40))
p + add_pvalue(df_p_val,
xmin = "xmin",
xmax = "xmax",
label = "p = {p.adj}",
tip.length = 0)
Pairwise comparisons within groups and between
groups of the ToothGrowth
data set, grouped according
to supp
. You can use bracket.nudge.y
to
slightly adjust the overall y position of the brackets instead of having
to redefine df_p_val2
.
df_p_val1 <- ToothGrowth %>%
rstatix::group_by(dose) %>%
rstatix::t_test(len ~ supp) %>%
rstatix::adjust_pvalue(p.col = "p", method = "bonferroni") %>%
rstatix::add_significance(p.col = "p.adj") %>%
rstatix::add_xy_position(x = "dose", dodge = 0.8) # important for positioning!
df_p_val2 <- rstatix::t_test(ToothGrowth, len ~ dose,
p.adjust.method = "bonferroni") %>%
rstatix::add_xy_position()
p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
geom_boxplot(aes(fill = supp)) +
theme_prism() +
coord_cartesian(ylim = c(0, 45))
p + add_pvalue(df_p_val1,
xmin = "xmin",
xmax = "xmax",
label = "p = {p.adj}",
tip.length = 0) +
add_pvalue(df_p_val2,
label = "p = {p.adj}",
tip.length = 0.01,
bracket.nudge.y = 2,
step.increase = 0.015)
Facet according to dose
and then compare mean
len
between either supp
. It is important that
the p-value table must have a column with the same name
as the faceting variable (in this case "dose"
).
df_p_val <- ToothGrowth %>%
rstatix::group_by(dose) %>%
rstatix::t_test(len ~ supp) %>%
rstatix::add_xy_position()
p <- ggplot(ToothGrowth, aes(x = factor(supp), y = len)) +
geom_boxplot(width = 0.2) +
facet_wrap(
~ dose, scales = "free",
labeller = labeller(dose = function(x) paste("dose =", x))
) +
theme_prism()
p + add_pvalue(df_p_val)
Facet according to supp
and then compare mean
len
between the three dose
. It is important
that the p-value table must have a column with the same
name as the faceting variable (in this case "supp"
).
df_p_val <- ToothGrowth %>%
rstatix::group_by(supp) %>%
rstatix::t_test(len ~ dose) %>%
rstatix::add_xy_position()
p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
geom_boxplot(width = 0.4) +
facet_wrap(~ supp, scales = "free") +
theme_prism()
p + add_pvalue(df_p_val)
Facet according to some particular grouping variable called
grp
and dose
, and then compare mean
len
between either supp
. It is important that
the p-value table must have columns with the same names
as the two faceting variables (in this case "grp"
and
"dose"
).
# add a grouping variable to ToothGrowth
tg <- ToothGrowth
tg$dose <- factor(tg$dose)
tg$grp <- factor(rep(c("grp1", "grp2"), 30))
# construct the p-value table by hand
df_p_val <- data.frame(
group1 = c("OJ", "OJ"),
group2 = c("VC", "VC"),
p.adj = c(0.0449, 0.00265),
y.position = c(22, 27),
grp = c("grp1", "grp2"),
dose = c("0.5", "1")
)
p <- ggplot(tg, aes(x = factor(supp), y = len)) +
geom_boxplot(width = 0.4) +
facet_wrap(grp ~ dose, scales = "free") +
theme_prism()
p + add_pvalue(df_p_val, bracket.nudge.y = 3)
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.