In this document, we will introduce you to functions for exploring and visualizing categorical data.
We have modified the mtcars data to create a new data set, mtcarz. The only difference between the two data sets is the variable types: cyl, vs, am, gear and carb are stored as factors in mtcarz, as the output below shows.
str(mtcarz)
#> 'data.frame': 32 obs. of 11 variables:
#> $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
#> $ disp: num 160 160 108 258 360 ...
#> $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
#> $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#> $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
#> $ qsec: num 16.5 17 18.6 19.4 17 ...
#> $ vs : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
#> $ am : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
#> $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
#> $ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
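The source does not show how mtcarz was built. As a minimal sketch, assuming it differs from mtcars only in the five factor columns listed above, a similar data set can be derived in base R. The name mtcarz_sketch is hypothetical and is used here to avoid confusion with the data set shipped with the package.

```r
# Hypothetical reconstruction: start from the built-in mtcars data.
mtcarz_sketch <- mtcars

# Columns that str(mtcarz) reports as factors.
cat_vars <- c("cyl", "vs", "am", "gear", "carb")

# Convert each of them from numeric to factor; all other columns stay numeric.
mtcarz_sketch[cat_vars] <- lapply(mtcarz_sketch[cat_vars], factor)

# Inspect the resulting variable types.
str(mtcarz_sketch)
```

Unlike the str() output above, this sketch keeps the car names of mtcars as row names.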
The ds_cross_table() function creates two-way tables of categorical variables.
ds_cross_table(mtcarz, cyl, gear)
#> Cell Contents
#> |---------------|
#> | Frequency |
#> | Percent |
#> | Row Pct |
#> | Col Pct |
#> |---------------|
#>
#> Total Observations: 32
#>
#> ----------------------------------------------------------------------------
#> | | gear |
#> ----------------------------------------------------------------------------
#> | cyl | 3 | 4 | 5 | Row Total |
#> ----------------------------------------------------------------------------
#> | 4 | 1 | 8 | 2 | 11 |
#> | | 0.031 | 0.25 | 0.062 | |
#> | | 0.09 | 0.73 | 0.18 | 0.34 |
#> | | 0.07 | 0.67 | 0.4 | |
#> ----------------------------------------------------------------------------
#> | 6 | 2 | 4 | 1 | 7 |
#> | | 0.062 | 0.125 | 0.031 | |
#> | | 0.29 | 0.57 | 0.14 | 0.22 |
#> | | 0.13 | 0.33 | 0.2 | |
#> ----------------------------------------------------------------------------
#> | 8 | 12 | 0 | 2 | 14 |
#> | | 0.375 | 0 | 0.062 | |
#> | | 0.86 | 0 | 0.14 | 0.44 |
#> | | 0.8 | 0 | 0.4 | |
#> ----------------------------------------------------------------------------
#> | Column Total | 15 | 12 | 5 | 32 |
#> | | 0.468 | 0.375 | 0.155 | |
#> ----------------------------------------------------------------------------
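If you want the above result as a tibble, use ds_twoway_table(). Combinations with no observations (here, 8 cylinders with 4 gears) do not appear in the tibble, so it has 8 rows rather than 9.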
ds_twoway_table(mtcarz, cyl, gear)
#> Joining with `by = join_by(cyl, gear, count)`
#> # A tibble: 8 × 6
#> cyl gear count percent row_percent col_percent
#> <fct> <fct> <int> <dbl> <dbl> <dbl>
#> 1 4 3 1 0.0312 0.0909 0.0667
#> 2 4 4 8 0.25 0.727 0.667
#> 3 4 5 2 0.0625 0.182 0.4
#> 4 6 3 2 0.0625 0.286 0.133
#> 5 6 4 4 0.125 0.571 0.333
#> 6 6 5 1 0.0312 0.143 0.2
#> 7 8 3 12 0.375 0.857 0.8
#> 8 8 5 2 0.0625 0.143 0.4
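The count, percent, row_percent and col_percent values above can be cross-checked with base R. The following is only a sketch of the arithmetic, not the package's implementation: it builds the factors directly from mtcars and uses table() and prop.table().

```r
# Categorical versions of the two variables, built from the base mtcars data.
cyl  <- factor(mtcars$cyl)
gear <- factor(mtcars$gear)

counts  <- table(cyl, gear)                # frequency of each cyl/gear combination
percent <- prop.table(counts)              # share of all observations
row_pct <- prop.table(counts, margin = 1)  # share within each cyl level (rows)
col_pct <- prop.table(counts, margin = 2)  # share within each gear level (columns)

# The four quantities for 4 cylinders and 4 gears.
c(count       = counts["4", "4"],
  percent     = percent["4", "4"],
  row_percent = row_pct["4", "4"],
  col_percent = col_pct["4", "4"])
```

Each row of row_pct and each column of col_pct sums to 1, which is a quick way to tell the two apart when reading a cross table.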
A plot() method has been defined for the output of ds_cross_table(), which will generate a plot of the two-way table.
The ds_freq_table() function creates frequency tables.
ds_freq_table(mtcarz, cyl)
#> Variable: cyl
#> -----------------------------------------------------------------------
#> Levels Frequency Cum Frequency Percent Cum Percent
#> -----------------------------------------------------------------------
#> 4 11 11 34.38 34.38
#> -----------------------------------------------------------------------
#> 6 7 18 21.88 56.25
#> -----------------------------------------------------------------------
#> 8 14 32 43.75 100
#> -----------------------------------------------------------------------
#> Total 32 - 100.00 -
#> -----------------------------------------------------------------------
A plot() method has been defined which will create a bar plot.
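To make the columns of the frequency table concrete, the same quantities can be computed in base R. This is a sketch under the assumption that Percent is the share of all observations expressed in percent and that the cumulative columns are running totals; the column names of freq_tab are illustrative, not the package's.

```r
# Categorical version of cyl, built from the base mtcars data.
cyl  <- factor(mtcars$cyl)
freq <- as.vector(table(cyl))  # counts per level, in level order

freq_tab <- data.frame(
  level         = levels(cyl),
  frequency     = freq,
  cum_frequency = cumsum(freq),                         # running total of counts
  percent       = round(100 * freq / sum(freq), 2),     # share of all observations
  cum_percent   = round(100 * cumsum(freq) / sum(freq), 2)
)

print(freq_tab)
```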
The ds_auto_freq_table() function creates multiple one-way tables by creating a frequency table for each categorical variable in a data set. You can also specify a subset of variables if you do not want all the variables in the data set to be used.
ds_auto_freq_table(mtcarz)
#> Variable: cyl
#> -----------------------------------------------------------------------
#> Levels Frequency Cum Frequency Percent Cum Percent
#> -----------------------------------------------------------------------
#> 4 11 11 34.38 34.38
#> -----------------------------------------------------------------------
#> 6 7 18 21.88 56.25
#> -----------------------------------------------------------------------
#> 8 14 32 43.75 100
#> -----------------------------------------------------------------------
#> Total 32 - 100.00 -
#> -----------------------------------------------------------------------
#>
#> Variable: vs
#> -----------------------------------------------------------------------
#> Levels Frequency Cum Frequency Percent Cum Percent
#> -----------------------------------------------------------------------
#> 0 18 18 56.25 56.25
#> -----------------------------------------------------------------------
#> 1 14 32 43.75 100
#> -----------------------------------------------------------------------
#> Total 32 - 100.00 -
#> -----------------------------------------------------------------------
#>
#> Variable: am
#> -----------------------------------------------------------------------
#> Levels Frequency Cum Frequency Percent Cum Percent
#> -----------------------------------------------------------------------
#> 0 19 19 59.38 59.38
#> -----------------------------------------------------------------------
#> 1 13 32 40.62 100
#> -----------------------------------------------------------------------
#> Total 32 - 100.00 -
#> -----------------------------------------------------------------------
#>
#> Variable: gear
#> -----------------------------------------------------------------------
#> Levels Frequency Cum Frequency Percent Cum Percent
#> -----------------------------------------------------------------------
#> 3 15 15 46.88 46.88
#> -----------------------------------------------------------------------
#> 4 12 27 37.5 84.38
#> -----------------------------------------------------------------------
#> 5 5 32 15.62 100
#> -----------------------------------------------------------------------
#> Total 32 - 100.00 -
#> -----------------------------------------------------------------------
#>
#> Variable: carb
#> -----------------------------------------------------------------------
#> Levels Frequency Cum Frequency Percent Cum Percent
#> -----------------------------------------------------------------------
#> 1 7 7 21.88 21.88
#> -----------------------------------------------------------------------
#> 2 10 17 31.25 53.12
#> -----------------------------------------------------------------------
#> 3 3 20 9.38 62.5
#> -----------------------------------------------------------------------
#> 4 10 30 31.25 93.75
#> -----------------------------------------------------------------------
#> 6 1 31 3.12 96.88
#> -----------------------------------------------------------------------
#> 8 1 32 3.12 100
#> -----------------------------------------------------------------------
#> Total 32 - 100.00 -
#> -----------------------------------------------------------------------
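The idea of tabulating every categorical variable at once can be sketched in base R: keep only the factor columns, then apply table() to each. This illustrates the approach and is not the package's code; dat is a hypothetical stand-in for mtcarz built from mtcars.

```r
# Hypothetical stand-in for mtcarz: mtcars with five columns converted to factors.
dat <- mtcars
cat_vars <- c("cyl", "vs", "am", "gear", "carb")
dat[cat_vars] <- lapply(dat[cat_vars], factor)

# Keep only the factor columns, then build one frequency table per column.
factor_cols <- Filter(is.factor, dat)
freq_list   <- lapply(factor_cols, table)

names(freq_list)  # variables that were tabulated
freq_list$carb    # counts for a single variable

# Restricting the tables to a subset of variables.
subset_list <- lapply(dat[c("cyl", "gear")], table)
```

Because the numeric columns are filtered out before tabulating, the result contains one table for each of the five variables shown in the output above, in the same order.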
The ds_auto_cross_table() function creates multiple two-way tables by creating a cross table for each unique pair of categorical variables in a data set. You can also specify a subset of variables if you do not want all the variables in the data set to be used.
ds_auto_cross_table(mtcarz, cyl, gear, am)
#> Cell Contents
#> |---------------|
#> | Frequency |
#> | Percent |
#> | Row Pct |
#> | Col Pct |
#> |---------------|
#>
#> Total Observations: 32
#>
#> cyl vs gear
#> ----------------------------------------------------------------------------
#> | | gear |
#> ----------------------------------------------------------------------------
#> | cyl | 3 | 4 | 5 | Row Total |
#> ----------------------------------------------------------------------------
#> | 4 | 1 | 8 | 2 | 11 |
#> | | 0.031 | 0.25 | 0.062 | |
#> | | 0.09 | 0.73 | 0.18 | 0.34 |
#> | | 0.07 | 0.67 | 0.4 | |
#> ----------------------------------------------------------------------------
#> | 6 | 2 | 4 | 1 | 7 |
#> | | 0.062 | 0.125 | 0.031 | |
#> | | 0.29 | 0.57 | 0.14 | 0.22 |
#> | | 0.13 | 0.33 | 0.2 | |
#> ----------------------------------------------------------------------------
#> | 8 | 12 | 0 | 2 | 14 |
#> | | 0.375 | 0 | 0.062 | |
#> | | 0.86 | 0 | 0.14 | 0.44 |
#> | | 0.8 | 0 | 0.4 | |
#> ----------------------------------------------------------------------------
#> | Column Total | 15 | 12 | 5 | 32 |
#> | | 0.468 | 0.375 | 0.155 | |
#> ----------------------------------------------------------------------------
#>
#>
#> cyl vs am
#> -------------------------------------------------------------
#> | | am |
#> -------------------------------------------------------------
#> | cyl | 0 | 1 | Row Total |
#> -------------------------------------------------------------
#> | 4 | 3 | 8 | 11 |
#> | | 0.094 | 0.25 | |
#> | | 0.27 | 0.73 | 0.34 |
#> | | 0.16 | 0.62 | |
#> -------------------------------------------------------------
#> | 6 | 4 | 3 | 7 |
#> | | 0.125 | 0.094 | |
#> | | 0.57 | 0.43 | 0.22 |
#> | | 0.21 | 0.23 | |
#> -------------------------------------------------------------
#> | 8 | 12 | 2 | 14 |
#> | | 0.375 | 0.062 | |
#> | | 0.86 | 0.14 | 0.44 |
#> | | 0.63 | 0.15 | |
#> -------------------------------------------------------------
#> | Column Total | 19 | 13 | 32 |
#> | | 0.594 | 0.406 | |
#> -------------------------------------------------------------
#>
#>
#> gear vs am
#> -------------------------------------------------------------
#> | | am |
#> -------------------------------------------------------------
#> | gear | 0 | 1 | Row Total |
#> -------------------------------------------------------------
#> | 3 | 15 | 0 | 15 |
#> | | 0.469 | 0 | |
#> | | 1 | 0 | 0.47 |
#> | | 0.79 | 0 | |
#> -------------------------------------------------------------
#> | 4 | 4 | 8 | 12 |
#> | | 0.125 | 0.25 | |
#> | | 0.33 | 0.67 | 0.38 |
#> | | 0.21 | 0.62 | |
#> -------------------------------------------------------------
#> | 5 | 0 | 5 | 5 |
#> | | 0 | 0.156 | |
#> | | 0 | 1 | 0.16 |
#> | | 0 | 0.38 | |
#> -------------------------------------------------------------
#> | Column Total | 19 | 13 | 32 |
#> | | 0.594 | 0.406 | |
#> -------------------------------------------------------------
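The three tables above correspond to the three unique pairs that can be formed from cyl, gear and am. A minimal base R sketch of that pairing step, assuming each pair is taken once without regard to order, uses combn(). The names dat, pairs and cross_list are illustrative and not part of the package.

```r
vars <- c("cyl", "gear", "am")

# Hypothetical stand-in for the selected mtcarz columns, converted to factors.
dat <- mtcars[vars]
dat[] <- lapply(dat, factor)

# All unique unordered pairs of the selected variables.
pairs <- combn(vars, 2, simplify = FALSE)

# One two-way frequency table per pair, labelled "x vs y".
cross_list <- lapply(pairs, function(p) {
  table(dat[[p[1]]], dat[[p[2]]], dnn = p)
})
names(cross_list) <- vapply(pairs, paste, character(1), collapse = " vs ")

names(cross_list)
cross_list[["gear vs am"]]
```

With k variables this yields k * (k - 1) / 2 tables, so the amount of output grows quickly when no subset is specified.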