SimplifyStats Vignette

Zachary Colburn

2018-07-01

SimplifyStats provides a set of functions to simplify the process of 1) generating descriptive statistics for the numeric variables of multiple groups and 2) performing hypothesis testing between all combinations of groups.

Generate group-wise descriptive statistics

The function group_summarize generates descriptive statistics for each group defined by a unique combination of the grouping variables.

library(SimplifyStats)

# Generate data.
set.seed(8)
df <- iris

# Modify df to demonstrate additional functionality.
## Add an NA.
df$Sepal.Length[1] <- NA
## Add another grouping variable.
df$Condition <- sample(c("untreated","treated"), nrow(df), replace = TRUE)

# Generate descriptive statistics.
group_summarize(
  df, 
  group_cols = c("Species","Condition"), 
  var_cols = c("Sepal.Length","Sepal.Width"), 
  na.rm = TRUE
)
#> Pairwise comparisons were performed on:
#>   Grouping variables: Species Condition 
#>   Variables of interest: Sepal.Length Sepal.Width 
#> 
#> $Sepal.Length
#> # A tibble: 6 x 16
#>   Species    Condition     N  Mean StdDev StdErr   Min Quartile1 Median
#>   <fct>      <chr>     <int> <dbl>  <dbl>  <dbl> <dbl>     <dbl>  <dbl>
#> 1 setosa     untreated    25  5.00  0.375 0.0749   4.4      4.8    5   
#> 2 setosa     treated      24  5.00  0.343 0.0701   4.3      4.77   5   
#> 3 versicolor untreated    26  6.00  0.562 0.110    4.9      5.6    5.8 
#> 4 versicolor treated      24  5.87  0.465 0.0949   5        5.57   5.95
#> 5 virginica  treated      27  6.77  0.634 0.122    5.8      6.3    6.5 
#> 6 virginica  untreated    23  6.38  0.583 0.122    4.9      6.05   6.4 
#> # ... with 7 more variables: Quartile3 <dbl>, Max <dbl>, PropNA <dbl>,
#> #   Kurtosis <dbl>, Skewness <dbl>, `Jarque-Bera_p.value` <dbl>,
#> #   `Shapiro-Wilk_p.value` <dbl>
#> 
#> $Sepal.Width
#> # A tibble: 6 x 16
#>   Species    Condition     N  Mean StdDev StdErr   Min Quartile1 Median
#>   <fct>      <chr>     <int> <dbl>  <dbl>  <dbl> <dbl>     <dbl>  <dbl>
#> 1 setosa     untreated    26  3.38  0.399 0.0782   2.3      3.12   3.4 
#> 2 setosa     treated      24  3.48  0.359 0.0733   2.9      3.2    3.45
#> 3 versicolor untreated    26  2.81  0.290 0.0568   2.3      2.6    2.8 
#> 4 versicolor treated      24  2.73  0.339 0.0693   2        2.48   2.8 
#> 5 virginica  treated      27  3.05  0.379 0.0729   2.2      2.8    3   
#> 6 virginica  untreated    23  2.89  0.218 0.0455   2.5      2.8    2.9 
#> # ... with 7 more variables: Quartile3 <dbl>, Max <dbl>, PropNA <dbl>,
#> #   Kurtosis <dbl>, Skewness <dbl>, `Jarque-Bera_p.value` <dbl>,
#> #   `Shapiro-Wilk_p.value` <dbl>
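
To make the meaning of the summary columns concrete, the following is a minimal base R sketch that reproduces a few of them (N, Mean, StdDev, StdErr, Median) for the same grouping. It is not the package's implementation; the helper name describe is hypothetical, and StdErr is assumed to be the standard deviation divided by the square root of N, which is consistent with the values printed above. On R 3.6.0 or later, the default sampling algorithm differs from the one used when this output was generated, so group sizes may not match exactly.

```r
# Base R sketch of group-wise descriptive statistics.
# Assumption: this mirrors a subset of group_summarize's columns; it is not
# the package's own code.
set.seed(8)
df <- iris
df$Sepal.Length[1] <- NA
df$Condition <- sample(c("untreated", "treated"), nrow(df), replace = TRUE)

# Hypothetical helper: summarize one numeric vector after dropping NAs
# (the equivalent of na.rm = TRUE).
describe <- function(x) {
  x <- x[!is.na(x)]
  c(
    N      = length(x),
    Mean   = mean(x),
    StdDev = sd(x),
    StdErr = sd(x) / sqrt(length(x)),
    Median = median(x)
  )
}

# One group per unique Species/Condition combination.
groups <- split(df$Sepal.Length, interaction(df$Species, df$Condition, drop = TRUE))

# One row per group, one column per statistic.
stats <- t(sapply(groups, describe))
stats
```

Because the NA is removed before counting, the N values sum to 149 rather than 150.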

Perform pair-wise hypothesis testing

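The function pairwise_stats applies a hypothesis test, here t.test, to every pairwise combination of the groups defined by the grouping variables.

# Perform pairwise t-tests on Sepal.Length.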
pairwise_stats(
  df, 
  group_cols = c("Species","Condition"), 
  var_col = "Sepal.Length", 
  t.test
)
#> Pairwise comparisons were performed on:
#>   Grouping variables: Species Condition 
#>   Variable of interest:  Sepal.Length 
#> 
#> # A tibble: 15 x 14
#>    A.Species  A.Condition B.Species  B.Condition  estimate estimate1
#>    <fct>      <chr>       <fct>      <chr>           <dbl>     <dbl>
#>  1 setosa     untreated   setosa     treated     -0.000167      5.00
#>  2 setosa     untreated   versicolor untreated   -0.992         5.00
#>  3 setosa     untreated   versicolor treated     -0.867         5.00
#>  4 setosa     untreated   virginica  treated     -1.76          5.00
#>  5 setosa     untreated   virginica  untreated   -1.37          5.00
#>  6 setosa     treated     versicolor untreated   -0.992         5.00
#>  7 setosa     treated     versicolor treated     -0.867         5.00
#>  8 setosa     treated     virginica  treated     -1.76          5.00
#>  9 setosa     treated     virginica  untreated   -1.37          5.00
#> 10 versicolor untreated   versicolor treated      0.125         6.00
#> 11 versicolor untreated   virginica  treated     -0.771         6.00
#> 12 versicolor untreated   virginica  untreated   -0.382         6.00
#> 13 versicolor treated     virginica  treated     -0.896         5.87
#> 14 versicolor treated     virginica  untreated   -0.507         5.87
#> 15 virginica  treated     virginica  untreated    0.388         6.77
#> # ... with 8 more variables: estimate2 <dbl>, statistic <dbl>,
#> #   p.value <dbl>, parameter <dbl>, conf.low <dbl>, conf.high <dbl>,
#> #   method <fct>, alternative <fct>
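
The 15 rows above correspond to the choose(6, 2) unordered pairs of the six Species/Condition groups. As a rough illustration of that enumeration, here is a base R sketch that builds the pairs with combn and runs t.test on each. It is not the package's implementation; the column names A, B, estimate, and p.value are chosen for this example only, and the exact numbers depend on the R version's sampling algorithm.

```r
# Base R sketch of pairwise hypothesis testing across all group combinations.
# Assumption: this only illustrates the idea behind pairwise_stats.
set.seed(8)
df <- iris
df$Condition <- sample(c("untreated", "treated"), nrow(df), replace = TRUE)

# One numeric vector per unique Species/Condition combination.
groups <- split(df$Sepal.Length, interaction(df$Species, df$Condition, drop = TRUE))

# Every unordered pair of group names.
pairs <- combn(names(groups), 2, simplify = FALSE)

# Run Welch's t-test (the t.test default) on each pair and collect the results.
results <- do.call(rbind, lapply(pairs, function(p) {
  tt <- t.test(groups[[p[1]]], groups[[p[2]]])
  data.frame(
    A        = p[1],
    B        = p[2],
    estimate = unname(tt$estimate[1] - tt$estimate[2]),  # difference in means
    p.value  = tt$p.value
  )
}))
results
```

The p-values in this sketch are unadjusted; with this many comparisons, a correction such as p.adjust(results$p.value, method = "holm") may be appropriate.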