SimplifyStats provides a set of functions to simplify the process of 1) generating descriptive statistics for the numeric variables of multiple groups and 2) performing hypothesis testing between all combinations of groups.
The function group_summarize generates descriptive statistics for each unique combination of the grouping variables.
library(SimplifyStats)
# Generate data.
set.seed(8)
df <- iris
# Modify df to demonstrate additional functionality.
## Add an NA.
df$Sepal.Length[1] <- NA
## Add another grouping variable.
df$Condition <- sample(c("untreated","treated"), nrow(df), replace = TRUE)
# Generate descriptive statistics.
group_summarize(
  df,
  group_cols = c("Species","Condition"),
  var_cols = c("Sepal.Length","Sepal.Width"),
  na.rm = TRUE
)
#> Pairwise comparisons were performed on:
#> Grouping variables: Species Condition
#> Variables of interest: Sepal.Length Sepal.Width
#>
#> $Sepal.Length
#> # A tibble: 6 x 16
#> Species Condition N Mean StdDev StdErr Min Quartile1 Median
#> <fct> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa untreated 25 5.00 0.375 0.0749 4.4 4.8 5
#> 2 setosa treated 24 5.00 0.343 0.0701 4.3 4.77 5
#> 3 versicolor untreated 26 6.00 0.562 0.110 4.9 5.6 5.8
#> 4 versicolor treated 24 5.87 0.465 0.0949 5 5.57 5.95
#> 5 virginica treated 27 6.77 0.634 0.122 5.8 6.3 6.5
#> 6 virginica untreated 23 6.38 0.583 0.122 4.9 6.05 6.4
#> # ... with 7 more variables: Quartile3 <dbl>, Max <dbl>, PropNA <dbl>,
#> # Kurtosis <dbl>, Skewness <dbl>, `Jarque-Bera_p.value` <dbl>,
#> # `Shapiro-Wilk_p.value` <dbl>
#>
#> $Sepal.Width
#> # A tibble: 6 x 16
#> Species Condition N Mean StdDev StdErr Min Quartile1 Median
#> <fct> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa untreated 26 3.38 0.399 0.0782 2.3 3.12 3.4
#> 2 setosa treated 24 3.48 0.359 0.0733 2.9 3.2 3.45
#> 3 versicolor untreated 26 2.81 0.290 0.0568 2.3 2.6 2.8
#> 4 versicolor treated 24 2.73 0.339 0.0693 2 2.48 2.8
#> 5 virginica treated 27 3.05 0.379 0.0729 2.2 2.8 3
#> 6 virginica untreated 23 2.89 0.218 0.0455 2.5 2.8 2.9
#> # ... with 7 more variables: Quartile3 <dbl>, Max <dbl>, PropNA <dbl>,
#> # Kurtosis <dbl>, Skewness <dbl>, `Jarque-Bera_p.value` <dbl>,
#> # `Shapiro-Wilk_p.value` <dbl>
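A few of the columns above can be cross-checked with base R alone. The sketch below is not the package's implementation; it assumes that N counts non-missing values (which would explain 25 versus 26 for setosa/untreated) and that StdErr is StdDev / sqrt(N). The helper summarize_one is a hypothetical name introduced only for this illustration, and the random Condition assignment may differ across R versions.

```r
# Base-R cross-check of N, Mean, StdDev and StdErr for one variable.
# Assumptions (not taken from the package source): N counts non-missing
# values and StdErr = StdDev / sqrt(N).
set.seed(8)
df <- iris
df$Sepal.Length[1] <- NA
df$Condition <- sample(c("untreated", "treated"), nrow(df), replace = TRUE)

# Hypothetical helper: summarize one numeric vector after dropping NAs.
summarize_one <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  c(N = n, Mean = mean(x), StdDev = sd(x), StdErr = sd(x) / sqrt(n))
}

# Split the variable by every Species x Condition combination,
# then build one row of statistics per group.
groups <- split(df$Sepal.Length, list(df$Species, df$Condition), drop = TRUE)
check <- t(sapply(groups, summarize_one))
check
```

Each row of check corresponds to one Species.Condition group, so it can be compared by eye against the matching row of the $Sepal.Length tibble.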
The function pairwise_stats can be used to perform a hypothesis test between all unique pairs of groups.
# Perform pairwise t-tests.
pairwise_stats(
  df,
  group_cols = c("Species","Condition"),
  var_col = "Sepal.Length",
  t.test
)
#> Pairwise comparisons were performed on:
#> Grouping variables: Species Condition
#> Variable of interest: Sepal.Length
#>
#> # A tibble: 15 x 14
#> A.Species A.Condition B.Species B.Condition estimate estimate1
#> <fct> <chr> <fct> <chr> <dbl> <dbl>
#> 1 setosa untreated setosa treated -0.000167 5.00
#> 2 setosa untreated versicolor untreated -0.992 5.00
#> 3 setosa untreated versicolor treated -0.867 5.00
#> 4 setosa untreated virginica treated -1.76 5.00
#> 5 setosa untreated virginica untreated -1.37 5.00
#> 6 setosa treated versicolor untreated -0.992 5.00
#> 7 setosa treated versicolor treated -0.867 5.00
#> 8 setosa treated virginica treated -1.76 5.00
#> 9 setosa treated virginica untreated -1.37 5.00
#> 10 versicolor untreated versicolor treated 0.125 6.00
#> 11 versicolor untreated virginica treated -0.771 6.00
#> 12 versicolor untreated virginica untreated -0.382 6.00
#> 13 versicolor treated virginica treated -0.896 5.87
#> 14 versicolor treated virginica untreated -0.507 5.87
#> 15 virginica treated virginica untreated 0.388 6.77
#> # ... with 8 more variables: estimate2 <dbl>, statistic <dbl>,
#> # p.value <dbl>, parameter <dbl>, conf.low <dbl>, conf.high <dbl>,
#> # method <fct>, alternative <fct>
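The 15 rows follow from the six Species x Condition groups, since choose(6, 2) = 15. As a rough illustration of the idea, and not the package's own code, the same set of comparisons can be sketched in base R with combn and t.test. The helper run_pair and the result name manual are hypothetical; row order and column layout will differ from the tibble above.

```r
# Base-R sketch of all pairwise Welch t-tests between groups.
# This only illustrates the idea; it is not how pairwise_stats is implemented.
set.seed(8)
df <- iris
df$Sepal.Length[1] <- NA
df$Condition <- sample(c("untreated", "treated"), nrow(df), replace = TRUE)

# One numeric vector per Species x Condition combination.
groups <- split(df$Sepal.Length, list(df$Species, df$Condition), drop = TRUE)

# Every unordered pair of group names.
pairs <- combn(names(groups), 2, simplify = FALSE)

# Hypothetical helper: run one t-test and keep a few fields.
run_pair <- function(p) {
  tt <- t.test(groups[[p[1]]], groups[[p[2]]])  # t.test drops NA values
  data.frame(
    A = p[1],
    B = p[2],
    estimate = unname(tt$estimate[1] - tt$estimate[2]),  # mean(A) - mean(B)
    statistic = unname(tt$statistic),
    p.value = tt$p.value
  )
}

manual <- do.call(rbind, lapply(pairs, run_pair))
manual
```

Because t.test defaults to var.equal = FALSE, these are Welch tests; no multiple-comparison correction is applied in this sketch.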