The moderncor_cat() function provides a unified
interface for computing association measures between categorical
(factor) variables. All measures require the DescTools
package.
moderncor_cat() accepts two factor (or
character/numeric-as-categorical) vectors:
set.seed(42)
x <- factor(sample(c("A", "B", "C"), 100, replace = TRUE))
y <- factor(sample(c("X", "Y"), 100, replace = TRUE))
moderncor_cat(x, y, method = "cramers_v")
#>
#> Cramer's V
#>
#> Estimate: 0.0173
#> Statistic: 0.03
#> P-value: 0.9851
#> Sample size (n): 100
The output is an S3 object of class "moderncor_cat" with the same structure as moderncor() output:
- $estimate: the association coefficient
- $statistic: the chi-square test statistic (for nominal methods)
- $p.value: the p-value (for nominal methods; NULL for ordinal methods)
- $n: the sample size
- $method_label: human-readable method name
available_methods_cat()
#> method label package type
#> 1 cramers_v Cramer's V DescTools nominal
#> 2 phi Phi Coefficient DescTools nominal
#> 3 gamma Goodman-Kruskal Gamma DescTools ordinal
#> 4 somers_d Somers' D DescTools ordinal
#> 5 contingency Contingency Coefficient DescTools nominal
#> 6 tschuprow Tschuprow's T DescTools nominal
Methods fall into two categories:
Nominal measures are appropriate when categories have no natural ordering. They are all based on the chi-square statistic and return a p-value.
Cramér’s V is the most widely used measure of nominal association. It ranges from 0 (no association) to 1 (perfect association) and is symmetric:
moderncor_cat(x, y, method = "cramers_v")
#>
#> Cramer's V
#>
#> Estimate: 0.0173
#> Statistic: 0.03
#> P-value: 0.9851
#> Sample size (n): 100
For a 2×2 table, Cramér’s V equals the absolute value of the Phi coefficient.
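To make the link to the chi-square statistic concrete, here is a minimal base R sketch that computes Cramér’s V by hand as \(\sqrt{\chi^2 / (n \cdot (k - 1))}\), where \(k\) is the smaller of the number of rows and columns. It uses only stats::chisq.test() on two mtcars columns and is not necessarily how moderncor_cat() or DescTools computes the value internally.

```r
# Hand computation of Cramér's V from the Pearson chi-square statistic.
# Illustrative only; not the package's internal implementation.
tab <- table(gear = mtcars$gear, am = mtcars$am)

# Some expected counts are small here, so chisq.test() warns about the
# approximation; the statistic itself is still well defined.
chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)

n <- sum(tab)        # sample size
k <- min(dim(tab))   # smaller of the number of rows and columns

v <- sqrt(chi2 / (n * (k - 1)))
v
```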
The Phi coefficient is designed for 2×2 contingency tables. For larger tables it can exceed 1, so prefer Cramér’s V in that case:
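As a hedged illustration in base R, the sketch below computes Phi for a 2×2 table in two ways (the signed cross-product form and \(\sqrt{\chi^2 / n}\)), then shows \(\sqrt{\chi^2 / n}\) exceeding 1 for a perfectly associated 3×3 table. The tables are made-up counts chosen for the example, and the code does not call moderncor_cat().

```r
# 2x2 table with made-up counts (filled column by column).
tab2 <- matrix(c(30, 10, 20, 40), nrow = 2)
n2 <- sum(tab2)

# Signed Phi from the cross-product of the cells.
phi_signed <- (tab2[1, 1] * tab2[2, 2] - tab2[1, 2] * tab2[2, 1]) /
  sqrt(prod(rowSums(tab2)) * prod(colSums(tab2)))

# Phi from the uncorrected chi-square statistic; equals abs(phi_signed).
chi2_2 <- unname(chisq.test(tab2, correct = FALSE)$statistic)
phi_chi <- sqrt(chi2_2 / n2)

# Perfectly associated 3x3 table: sqrt(chi2 / n) is no longer bounded by 1.
tab3 <- diag(20, 3)
phi_3 <- sqrt(unname(chisq.test(tab3, correct = FALSE)$statistic) / sum(tab3))

c(phi_signed = phi_signed, phi_chi = phi_chi, phi_3 = phi_3)
```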
The contingency coefficient (Pearson’s C) is bounded between 0 and \(\sqrt{(k-1)/k}\), where \(k\) is the smaller of the number of rows and columns. Because its maximum depends on the table dimensions, values are not comparable across tables of different sizes:
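The size-dependent upper bound can be checked numerically. The following base R sketch builds perfectly associated \(k \times k\) tables (all counts on the diagonal), computes \(C = \sqrt{\chi^2 / (\chi^2 + n)}\) by hand, and compares it with \(\sqrt{(k-1)/k}\). The helper max_c() and the cell count of 50 are hypothetical choices for this example.

```r
# Largest attainable contingency coefficient for a k x k table,
# obtained from a perfectly associated (diagonal) table.
# max_c() is a hypothetical helper, not part of any package.
max_c <- function(k, per_cell = 50) {
  tab <- diag(per_cell, k)
  chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  sqrt(chi2 / (chi2 + n))
}

ks <- 2:5
observed <- sapply(ks, max_c)
bound <- sqrt((ks - 1) / ks)

data.frame(k = ks, observed = observed, bound = bound)
```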
Tschuprow’s T is similar to Cramér’s V but normalizes by the geometric mean of \(r-1\) and \(c-1\), where \(r\) and \(c\) are the numbers of row and column categories. It is symmetric and ranges from 0 to 1, reaching 1 only for square tables:
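For reference, the two normalizations can be written side by side using the standard textbook definitions, with \(\chi^2\) the Pearson chi-square statistic and \(n\) the sample size:

\[
V = \sqrt{\frac{\chi^2}{n \cdot \min(r-1,\; c-1)}},
\qquad
T = \sqrt{\frac{\chi^2}{n \cdot \sqrt{(r-1)(c-1)}}}
\]

Since \(\sqrt{(r-1)(c-1)} \ge \min(r-1,\; c-1)\), it follows that \(T \le V\), with equality exactly when \(r = c\).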
Ordinal measures are appropriate when categories have a natural ordering (e.g., Likert scales, severity grades). They do not return p-values by default.
Goodman-Kruskal Gamma (\(\gamma\)) measures the tendency for pairs of observations to be concordant (both variables increase together) vs. discordant. It ranges from −1 to 1 and is symmetric:
# Simulate ordinal survey data
set.seed(1)
quality <- factor(sample(c("Low", "Medium", "High"), 100, replace = TRUE,
                         prob = c(0.3, 0.4, 0.3)),
                  levels = c("Low", "Medium", "High"), ordered = TRUE)
satisfaction <- factor(sample(c("Dissatisfied", "Neutral", "Satisfied"), 100,
                              replace = TRUE, prob = c(0.3, 0.4, 0.3)),
                       levels = c("Dissatisfied", "Neutral", "Satisfied"), ordered = TRUE)
moderncor_cat(quality, satisfaction, method = "gamma")
#>
#> Goodman-Kruskal Gamma
#>
#> Estimate: 0.0808
#> Sample size (n): 100
Somers’ D is an asymmetric ordinal measure: it measures the predictability of y from x (but not vice versa). Values range from −1 to 1:
moderncor_cat(quality, satisfaction, method = "somers_d")
#>
#> Somers' D
#>
#> Estimate: 0.0548
#> Sample size (n): 100
Note that swapping x and y gives a different result:
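To see where the asymmetry comes from, the sketch below counts concordant and discordant pairs by hand for a small made-up 2×3 table, then derives Gamma and both directions of Somers’ D. The helper pair_counts() and the counts are hypothetical, and which direction moderncor_cat() reports for a given argument order is not assumed here.

```r
# Count concordant and discordant pairs in an ordered contingency table.
# pair_counts() is a hypothetical helper written for this illustration.
pair_counts <- function(tab) {
  tab <- as.matrix(tab)
  nr <- nrow(tab)
  nc <- ncol(tab)
  conc <- 0
  disc <- 0
  for (i in seq_len(nr - 1)) {
    below <- (i + 1):nr
    for (j in seq_len(nc)) {
      # Cells below and to the right form concordant pairs with cell (i, j).
      if (j < nc) conc <- conc + tab[i, j] * sum(tab[below, (j + 1):nc])
      # Cells below and to the left form discordant pairs with cell (i, j).
      if (j > 1) disc <- disc + tab[i, j] * sum(tab[below, 1:(j - 1)])
    }
  }
  c(concordant = conc, discordant = disc)
}

# Made-up counts: rows are the levels of one ordered variable,
# columns the levels of the other.
tab <- matrix(c(10, 5, 2,
                 3, 8, 12), nrow = 2, byrow = TRUE)
pc <- pair_counts(tab)
n <- sum(tab)
s <- unname(pc["concordant"] - pc["discordant"])

# Gamma ignores ties and is symmetric.
gk_gamma <- s / unname(pc["concordant"] + pc["discordant"])

# Somers' D divides by the pairs not tied on the predictor,
# so the result depends on which variable plays that role.
untied_rows <- (n^2 - sum(rowSums(tab)^2)) / 2
untied_cols <- (n^2 - sum(colSums(tab)^2)) / 2
d_col_given_row <- s / untied_rows
d_row_given_col <- s / untied_cols

c(gamma = gk_gamma, d_col_given_row = d_col_given_row,
  d_row_given_col = d_row_given_col)
```

The numerator is the same in all three measures; only the denominator changes, which is why Gamma is symmetric while the two Somers’ D values differ.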
Pass a data.frame of factor columns to compute pairwise
associations across all pairs:
df <- data.frame(
  cyl = factor(mtcars$cyl),
  gear = factor(mtcars$gear),
  am = factor(mtcars$am)
)
res_mat <- moderncor_cat(df, method = "cramers_v")
res_mat
#>
#> Cramer's V
#>
#> Association Matrix (n = 32):
#>
#> cyl gear am
#> cyl 1.0000 0.5309 0.5226
#> gear 0.5309 1.0000 0.8090
#> am 0.5226 0.8090 1.0000
#>
#> P-value Matrix:
#>
#> cyl gear am
#> cyl 0.0000 0.0012 0.0126
#> gear 0.0012 0.0000 0.0000
#> am 0.0126 0.0000 0.0000
The result is a matrix of association coefficients. For nominal methods, the associated p-value matrix is also stored in $p.value:
res_mat$p.value
#> cyl gear am
#> cyl 0.000000000 1.214066e-03 1.264661e-02
#> gear 0.001214066 0.000000e+00 2.830889e-05
#> am 0.012646605 2.830889e-05 0.000000e+00
Use as.data.frame() to convert to tidy format:
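The exact columns returned by the package’s as.data.frame() method are not shown here. As a rough base R sketch of what a tidy (long) layout looks like, the code below reshapes a plain symmetric matrix into one row per variable pair. The matrix values are copied from the printed association matrix, and the column names var1, var2, and estimate are hypothetical.

```r
# Plain matrix holding the printed Cramér's V values (rounded to 4 digits).
vars <- c("cyl", "gear", "am")
m <- matrix(c(1.0000, 0.5309, 0.5226,
              0.5309, 1.0000, 0.8090,
              0.5226, 0.8090, 1.0000),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# Keep each unordered pair once by taking the upper triangle.
idx <- which(upper.tri(m), arr.ind = TRUE)

tidy <- data.frame(
  var1 = rownames(m)[idx[, "row"]],
  var2 = colnames(m)[idx[, "col"]],
  estimate = m[idx],
  stringsAsFactors = FALSE
)
tidy
```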
The use argument controls how missing values are handled, mirroring the interface of moderncor():
- "complete.obs" (default): remove all rows with any NA before computing
- "pairwise.complete.obs": remove NAs per pair
- "everything": propagate NAs (returns NA for any pair with missing values)

| Situation | Recommended method |
|---|---|
| Two unordered categorical variables (general) | cramers_v |
| Two binary variables (2×2 table) | phi |
| Two ordered categorical (Likert) variables | gamma |
| Predicting one ordered variable from another | somers_d |
| Comparing association across different table sizes | cramers_v or tschuprow |
For continuous variables, use moderncor() instead. See
vignette("introduction") for a full overview.