edar

Lifecycle: experimental

The goal of edar is to provide some convenient functions for common tasks in exploratory data analysis.

Citation

Sou T (2025). edar: Convenient Functions for Exploratory Data Analysis. R package version 0.0.3.9000, https://github.com/soutomas/edar.

citation("edar")
#> To cite package 'edar' in publications use:
#> 
#>   Sou T (2025). _edar: Convenient Functions for Exploratory Data
#>   Analysis_. R package version 0.0.3.9000, <https://github.com/soutomas/edar>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {edar: Convenient Functions for Exploratory Data Analysis},
#>     author = {Tomas Sou},
#>     year = {2025},
#>     note = {R package version 0.0.3.9000},
#>     url = {https://github.com/soutomas/edar},
#>   }

Installation

You can install the development version of edar from GitHub with:

# install.packages("pak")
pak::pak("soutomas/edar")

Example

A common task is to generate a quick summary of the variables in a dataset.

library(edar)

# Data 
dat = mtcars |> dplyr::mutate(vs=factor(vs), am=factor(am))

# Summary for continuous variables in a data frame. 
dat |> summ_by()
#> Dropped: vs am
#> Adding missing grouping variables: `name`
#> # A tibble: 9 × 10
#>   name      n   nNA   Mean    Med      SD   Min    P25    P75    Max
#>   <chr> <int> <int>  <dbl>  <dbl>   <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1 carb     32     0   2.81   2      1.62   1      2      4      8   
#> 2 cyl      32     0   6.19   6      1.79   4      4      8      8   
#> 3 disp     32     0 231.   196.   124.    71.1  121.   326    472   
#> 4 drat     32     0   3.60   3.70   0.535  2.76   3.08   3.92   4.93
#> 5 gear     32     0   3.69   4      0.738  3      3      4      5   
#> 6 hp       32     0 147.   123     68.6   52     96.5  180    335   
#> 7 mpg      32     0  20.1   19.2    6.03  10.4   15.4   22.8   33.9 
#> 8 qsec     32     0  17.8   17.7    1.79  14.5   16.9   18.9   22.9 
#> 9 wt       32     0   3.22   3.32   0.978  1.51   2.58   3.61   5.42

# Summary of a selected variable after grouping.
dat |> summ_by("mpg",vs)
#> Adding missing grouping variables: `vs`
#> # A tibble: 2 × 10
#>   vs        n   nNA  Mean   Med    SD   Min   P25   P75   Max
#>   <fct> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0        18     0  16.6  15.6  3.86  10.4  14.8  19.1  26  
#> 2 1        14     0  24.6  22.8  5.38  17.8  21.4  29.6  33.9
dat |> summ_by("mpg",vs,am)
#> Adding missing grouping variables: `vs`, `am`
#> # A tibble: 4 × 11
#> # Groups:   vs [2]
#>   vs    am        n   nNA  Mean   Med    SD   Min   P25   P75   Max
#>   <fct> <fct> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0     0        12     0  15.0  15.2  2.77  10.4  14.0  16.6  19.2
#> 2 0     1         6     0  19.8  20.4  4.01  15    16.8  21    26  
#> 3 1     0         7     0  20.7  21.4  2.47  17.8  18.6  22.2  24.4
#> 4 1     1         7     0  28.4  30.4  4.76  21.4  25.0  31.4  33.9

# Summary for categorical variables in a data frame. 
dat |> summ_cat()
#> Dropped: mpg cyl disp hp drat wt qsec gear carb
#> $vs
#>     vs  n percent
#>      0 18  0.5625
#>      1 14  0.4375
#>  Total 32  1.0000
#> 
#> $am
#>     am  n percent
#>      0 19 0.59375
#>      1 13 0.40625
#>  Total 32 1.00000

# Summary for a selected categorical variable.
dat |> summ_cat("vs")
#> Dropped: mpg cyl disp hp drat wt qsec gear carb
#>     vs  n percent
#>      0 18  0.5625
#>      1 14  0.4375
#>  Total 32  1.0000
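
For readers who want to see what the columns in these summaries correspond to, the following is a minimal base R sketch that reproduces the `mpg` by `vs` table and the `vs` frequency table. It is not how edar is implemented. The helper `summ_one()` is hypothetical, and it assumes that `n` counts non-missing values, that `nNA` counts missing values, and that `P25`/`P75` are quantiles computed with R's default method.

```r
# Base R sketch of the summaries shown above; not edar's implementation.

# Hypothetical helper: one row of summary statistics for a numeric vector.
# Assumption: n counts non-missing values and nNA counts missing ones.
summ_one <- function(x) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  data.frame(
    n = sum(!is.na(x)), nNA = sum(is.na(x)),
    Mean = mean(x, na.rm = TRUE), Med = stats::median(x, na.rm = TRUE),
    SD = stats::sd(x, na.rm = TRUE), Min = min(x, na.rm = TRUE),
    P25 = q[1], P75 = q[2], Max = max(x, na.rm = TRUE)
  )
}

# Continuous summary of mpg within each level of vs.
groups <- split(mtcars$mpg, mtcars$vs)
by_vs <- do.call(rbind, lapply(groups, summ_one))
by_vs <- cbind(vs = names(groups), by_vs)
rownames(by_vs) <- NULL
print(by_vs)

# Categorical summary of vs: counts, proportions, and a total row.
counts <- table(mtcars$vs)
cat_vs <- data.frame(
  vs = names(counts),
  n = as.integer(counts),
  percent = as.numeric(counts) / sum(counts)
)
cat_vs <- rbind(cat_vs, data.frame(vs = "Total", n = sum(cat_vs$n), percent = 1))
print(cat_vs)
```

With the built-in `mtcars` data there are no missing values, so the choice between counting all rows and counting non-missing values makes no difference here.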

Results can be viewed directly as a flextable object.

# Show data frame in a flextable object. 
dat |> summ_by("mpg",vs) |> ft()

It is often helpful to add a label in the output indicating the source file.

# A label indicating the current source file can be easily generated. 
lab = label_src(1)
# A source label can be directly added to the flextable output. 
dat |> summ_cat("am") |> ft(src=1)
# A source label can be easily added to a ggplot object. 
library(ggplot2)
p = ggplot(mtcars, aes(mpg, wt)) + geom_point() 
p |> ggsrc()
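
As a point of comparison, a source label can also be attached with ggplot2 alone by setting a plot caption. This sketch only illustrates the idea; it does not claim to show what `ggsrc()` does internally, and the file name `analysis.R` is a hypothetical placeholder.

```r
library(ggplot2)

# Hypothetical source label; in practice this would name the script
# that produced the plot.
src_label <- "Source: analysis.R"

p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()

# labs(caption = ...) places the text below the plot panel.
p_labelled <- p + labs(caption = src_label)

# Call print(p_labelled) to draw the plot.
```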
