The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Introduction to moderncor

The moderncor package provides a single unified interface for computing a wide variety of classical and modern correlation coefficients. This guide introduces the core features of the package.

Installation and Setup

Once installed, you can load the package as follows:

library(moderncor)

Basic Usage with Vectors

Let’s generate some synthetic data with a non-linear parabolic relationship where \(y = x^2 + \epsilon\):

set.seed(123)
x <- runif(100, -1, 1)
y <- x^2 + rnorm(100, sd = 0.1)

Because the relationship is non-linear and symmetric, classical Pearson correlation will fail to capture the dependence:

moderncor(x, y, method = "pearson")
#> 
#>    Pearson Product-Moment Correlation 
#> 
#>   Estimate:  0.0168
#>   Statistic: 0.1667
#>   P-value:   0.868
#>   Sample size (n): 100

With moderncor, you can compute distance correlation (dcor) or Chatterjee’s Xi correlation (xi) using the same interface to capture the non-linear relationship:

# Distance Correlation (captures non-linear dependencies)
moderncor(x, y, method = "dcor")
#> 
#>    Distance Correlation 
#> 
#>   Estimate:  0.4673
#>   Statistic: 0.4673
#>   P-value:   0.005
#>   Sample size (n): 100
# Chatterjee's Xi (captures functional dependence)
moderncor(x, y, method = "xi")
#> 
#>    Chatterjee's Xi Correlation 
#> 
#>   Estimate:  0.6829
#>   P-value:   < 2.2e-16
#>   Sample size (n): 100

Classical Methods

moderncor supports Pearson, Spearman, and Kendall correlations via the same interface as base R cor():

moderncor(x, y, method = "spearman")
#> 
#>    Spearman Rank Correlation 
#> 
#>   Estimate:  -0.0105
#>   Statistic: 168404
#>   P-value:   0.9171
#>   Sample size (n): 100
moderncor(x, y, method = "kendall")
#> 
#>    Kendall Rank Correlation 
#> 
#>   Estimate:  -0.0129
#>   Statistic: -0.1906
#>   P-value:   0.8488
#>   Sample size (n): 100

Matrix and Data Frame Input

If you pass a matrix or a data.frame to moderncor(), it will compute the pairwise correlation matrix of the columns:

# Compute Spearman correlation matrix for iris dataset
res_mat <- moderncor(iris[, 1:4], method = "spearman")
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
res_mat
#> 
#>    Spearman Rank Correlation 
#> 
#>   Correlation Matrix (n = 150):
#> 
#>              Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Sepal.Length       1.0000     -0.1668       0.8819      0.8343
#> Sepal.Width       -0.1668      1.0000      -0.3096     -0.2890
#> Petal.Length       0.8819     -0.3096       1.0000      0.9377
#> Petal.Width        0.8343     -0.2890       0.9377      1.0000
#> 
#>   P-value Matrix:
#> 
#>              Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Sepal.Length       0.0000      0.0414        0e+00       0e+00
#> Sepal.Width        0.0414      0.0000        1e-04       3e-04
#> Petal.Length       0.0000      0.0001        0e+00       0e+00
#> Petal.Width        0.0000      0.0003        0e+00       0e+00

Tidy Output using as.data.frame

You can convert the output of moderncor() to a tidy data frame using as.data.frame(). This is particularly useful for correlation matrices:

# Convert correlation matrix to tidy data frame
df <- as.data.frame(res_mat)
head(df)
#>           var1         var2          r      p.value
#> 1  Sepal.Width Sepal.Length -0.1667777 4.136799e-02
#> 2 Petal.Length Sepal.Length  0.8818981 3.443087e-50
#> 3  Petal.Width Sepal.Length  0.8342888 4.189447e-40
#> 4 Sepal.Length  Sepal.Width -0.1667777 4.136799e-02
#> 5 Petal.Length  Sepal.Width -0.3096351 1.153938e-04
#> 6  Petal.Width  Sepal.Width -0.2890317 3.342981e-04

This returns a data frame containing the variables being compared (var1 and var2), the correlation coefficient (r), and p-values (p.value) if they were calculated.

Controlling P-value Computation

For large datasets, calculating p-values for modern methods (such as MIC, HSIC, or Mutual Information) can be slow because they rely on permutation tests. You can disable p-value calculations by setting p_value = FALSE for a significant speedup:

# Compute only the estimate, without p-values
moderncor(x, y, method = "dcor", p_value = FALSE)
#> 
#>    Distance Correlation 
#> 
#>   Estimate:  0.4673
#>   Sample size (n): 100

Robust Correlations

Robust correlations are less sensitive to outliers than classical methods. moderncor provides three robust correlation methods.

Biweight Midcorrelation

Biweight midcorrelation down-weights observations far from the median using a biweight function. It requires no additional packages:

set.seed(42)
x_out <- c(rnorm(95), rnorm(5, mean = 10))  # 5% outliers
y_out <- c(rnorm(95), rnorm(5, mean = 10))

moderncor(x_out, y_out, method = "biweight")
#> 
#>    Biweight Midcorrelation 
#> 
#>   Estimate:  0.1045
#>   Statistic: 1.0405
#>   P-value:   0.3007
#>   Sample size (n): 100

Compare with Pearson, which is strongly influenced by outliers:

moderncor(x_out, y_out, method = "pearson")
#> 
#>    Pearson Product-Moment Correlation 
#> 
#>   Estimate:  0.8485
#>   Statistic: 15.8745
#>   P-value:   < 2.2e-16
#>   Sample size (n): 100

Percentage Bend Correlation

Percentage bend correlation trims a specified proportion of the most extreme values (requires the WRS2 package):

moderncor(x_out, y_out, method = "percentage_bend")
#> 
#>    Percentage Bend Correlation 
#> 
#>   Estimate:  0.1893
#>   Statistic: 1.9082
#>   P-value:   0.0593
#>   Sample size (n): 100

Winsorized Correlation

Winsorized correlation replaces extreme values with the nearest non-extreme values (requires WRS2):

moderncor(x_out, y_out, method = "winsorized")
#> 
#>    Winsorized Correlation 
#> 
#>   Estimate:  0.12
#>   Statistic: 1.1964
#>   P-value:   0.2364
#>   Sample size (n): 100

Ordinal Correlations

Ordinal correlations are designed for ordered categorical (Likert-scale) data. They model the data as discretized versions of underlying continuous normal distributions.

Polychoric Correlation

Polychoric correlation is appropriate when both variables are ordinal with more than two categories (requires psych):

# Simulate ordinal data (e.g., Likert scale responses)
set.seed(1)
z1 <- rnorm(200)
z2 <- 0.7 * z1 + rnorm(200, sd = sqrt(1 - 0.7^2))
x_ord <- cut(z1, breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE)
y_ord <- cut(z2, breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE)

moderncor(x_ord, y_ord, method = "polychoric")
#> 
#>    Polychoric Correlation 
#> 
#>   Estimate:  0.6411
#>   Sample size (n): 200

Tetrachoric Correlation

Tetrachoric correlation is the special case of polychoric for binary (0/1) data (requires psych):

x_bin <- as.integer(z1 > 0)
y_bin <- as.integer(z2 > 0)

moderncor(x_bin, y_bin, method = "tetrachoric")
#> 
#>    Tetrachoric Correlation 
#> 
#>   Estimate:  0.5538
#>   Sample size (n): 200

Partial and Semi-Partial Correlations

Partial and semi-partial correlations measure the relationship between two variables while controlling for one or more confounding variables (requires ppcor).

Partial Correlation

Partial correlation removes the influence of z from both x and y:

set.seed(7)
z <- rnorm(100)
x_p <- 0.6 * z + rnorm(100, sd = 0.8)  # x correlates with z
y_p <- 0.6 * z + rnorm(100, sd = 0.8)  # y correlates with z

# Raw correlation (inflated by shared z)
moderncor(x_p, y_p, method = "pearson")
#> 
#>    Pearson Product-Moment Correlation 
#> 
#>   Estimate:  0.4122
#>   Statistic: 4.4794
#>   P-value:   2.028e-05
#>   Sample size (n): 100
# Partial correlation controlling for z
moderncor(x_p, y_p, method = "partial", z = z)
#> 
#>    Partial Correlation (Pearson) 
#> 
#>   Estimate:  0.1383
#>   Statistic: 1.375
#>   P-value:   0.1723
#>   Sample size (n): 100

Semi-Partial Correlation

Semi-partial correlation removes the influence of z from y only (also requires ppcor):

moderncor(x_p, y_p, method = "semi_partial", z = z)
#> 
#>    Semi-partial Correlation (Pearson) 
#> 
#>   Estimate:  0.1097
#>   Statistic: 1.0867
#>   P-value:   0.2799
#>   Sample size (n): 100

The method_partial argument selects which base correlation to use ("pearson", "spearman", or "kendall"):

moderncor(x_p, y_p, method = "partial", z = z, method_partial = "spearman")
#> 
#>    Partial Correlation (Spearman) 
#> 
#>   Estimate:  0.1169
#>   Statistic: 1.1588
#>   P-value:   0.2494
#>   Sample size (n): 100

Nonparametric Dependence Measures

Ball Correlation

Ball correlation is a nonparametric measure of dependence based on ball covariance (requires Ball):

moderncor(x, y, method = "ball")
#> 
#>    Ball Correlation 
#> 
#>   Estimate:  0.1317
#>   Statistic: 0.0044
#>   P-value:   0.01
#>   Sample size (n): 100

Bergsma-Dassios Tau*

Bergsma-Dassios \(\tau^*\) is a nonparametric measure of association that equals zero if and only if x and y are independent (requires TauStar):

moderncor(x, y, method = "tau_star")
#> 
#>    Bergsma-Dassios Tau* 
#> 
#>   Estimate:  0.0994
#>   Statistic: 0.0994
#>   P-value:   1.395e-07
#>   Sample size (n): 100

Querying Available Methods

To see all supported correlation methods and their required packages:

available_methods()
#>             method                                         label  package
#> 1          pearson            Pearson Product-Moment Correlation    stats
#> 2         spearman                     Spearman Rank Correlation    stats
#> 3          kendall                      Kendall Rank Correlation    stats
#> 4             dcor                          Distance Correlation   energy
#> 5              mic         Maximal Information Coefficient (MIC)  minerva
#> 6             hsic Hilbert-Schmidt Independence Criterion (HSIC)    dHSIC
#> 7               xi                   Chatterjee's Xi Correlation    XICOR
#> 8        hoeffding                                 Hoeffding's D    Hmisc
#> 9      mutual_info                            Mutual Information infotheo
#> 10        biweight                       Biweight Midcorrelation built-in
#> 11 percentage_bend                   Percentage Bend Correlation     WRS2
#> 12      winsorized                        Winsorized Correlation     WRS2
#> 13      polychoric                        Polychoric Correlation    psych
#> 14     tetrachoric                       Tetrachoric Correlation    psych
#> 15         partial                           Partial Correlation    ppcor
#> 16    semi_partial                      Semi-partial Correlation    ppcor
#> 17            ball                              Ball Correlation     Ball
#> 18        tau_star                          Bergsma-Dassios Tau*  TauStar
#>           type
#> 1      classic
#> 2      classic
#> 3      classic
#> 4       modern
#> 5       modern
#> 6       modern
#> 7       modern
#> 8       modern
#> 9  information
#> 10      robust
#> 11      robust
#> 12      robust
#> 13     ordinal
#> 14     ordinal
#> 15     partial
#> 16     partial
#> 17       other
#> 18       other

To get details on a specific method:

method_info("dcor")
#> $method
#> [1] "dcor"
#> 
#> $label
#> [1] "Distance Correlation"
#> 
#> $package
#> [1] "energy"
#> 
#> $description
#> [1] "Measures both linear and nonlinear dependence. Zero if and only if independent."
#> 
#> $range
#> [1] "[0, 1]"
#> 
#> $assumptions
#> [1] "Continuous variables."

Categorical Association Measures

For categorical variables (factors or contingency tables), use moderncor_cat(). See vignette("categorical") for a full introduction to categorical association measures.

# Quick preview: Cramér's V for two factor variables
moderncor_cat(factor(mtcars$cyl), factor(mtcars$gear), method = "cramers_v")
#> 
#>    Cramer's V 
#> 
#>   Estimate:  0.5309
#>   Statistic: 18.0364
#>   P-value:   0.001214
#>   Sample size (n): 32

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.