The moderncor package provides a single unified
interface for computing a wide variety of classical and modern
correlation coefficients. This guide introduces the core features of the
package.
Let’s generate some synthetic data with a non-linear parabolic relationship where \(y = x^2 + \epsilon\):
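A minimal sketch of such a simulation, assuming 100 points, x drawn uniformly from [-1, 1], and Gaussian noise with standard deviation 0.1. The seed, range, and noise level are illustrative choices and not necessarily those behind the output shown in this guide, so the numbers will differ:

```r
# Illustrative simulation; seed, sample range, and noise level are assumptions
set.seed(123)
n <- 100
x <- runif(n, min = -1, max = 1)   # symmetric around zero
y <- x^2 + rnorm(n, sd = 0.1)      # parabola plus Gaussian noise
```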
Because the relationship is non-linear and symmetric, classical Pearson correlation will fail to capture the dependence:
moderncor(x, y, method = "pearson")
#>
#> Pearson Product-Moment Correlation
#>
#> Estimate: 0.0168
#> Statistic: 0.1667
#> P-value: 0.868
#> Sample size (n): 100
With moderncor, you can compute distance correlation
(dcor) or Chatterjee’s Xi correlation (xi)
using the same interface to capture the non-linear relationship:
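To make the idea concrete, Chatterjee’s Xi can be sketched in a few lines of base R for data without ties. The helper name xi_sketch is hypothetical and this is not the implementation moderncor uses (available_methods() lists XICOR as the backing package). It only illustrates the rank-based formula \(\xi_n = 1 - 3\sum_{i=1}^{n-1} |r_{i+1} - r_i| / (n^2 - 1)\), where \(r_i\) are the ranks of y after sorting the pairs by x. The simulated data are an assumption as well:

```r
# Sketch of Chatterjee's Xi for data without ties (hypothetical helper)
xi_sketch <- function(x, y) {
  n <- length(x)
  r <- rank(y[order(x)])               # ranks of y, with pairs sorted by x
  1 - 3 * sum(abs(diff(r))) / (n^2 - 1)
}

set.seed(123)                          # arbitrary seed
x <- runif(100, min = -1, max = 1)
y <- x^2 + rnorm(100, sd = 0.1)        # non-linear, symmetric relationship

cor(x, y)          # Pearson: near zero for a symmetric parabola
xi_sketch(x, y)    # Xi: well above zero because y depends on x
```

Because Xi only looks at how much the ranks of y jump between neighbouring x values, it does not need the relationship to be monotone.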
moderncor supports Pearson, Spearman, and Kendall
correlations via the same interface as base R cor():
moderncor(x, y, method = "spearman")
#>
#> Spearman Rank Correlation
#>
#> Estimate: -0.0105
#> Statistic: 168404
#> P-value: 0.9171
#> Sample size (n): 100
moderncor(x, y, method = "kendall")
#>
#> Kendall Rank Correlation
#>
#> Estimate: -0.0129
#> Statistic: -0.1906
#> P-value: 0.8488
#> Sample size (n): 100
If you pass a matrix or a data.frame to
moderncor(), it will compute the pairwise correlation
matrix of the columns:
# Compute Spearman correlation matrix for iris dataset
res_mat <- moderncor(iris[, 1:4], method = "spearman")
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
#> Warning in cor.test.default(x, y, method = "spearman", alternative =
#> alternative, : cannot compute exact p-value with ties
res_mat
#>
#> Spearman Rank Correlation
#>
#> Correlation Matrix (n = 150):
#>
#>              Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Sepal.Length       1.0000     -0.1668       0.8819      0.8343
#> Sepal.Width       -0.1668      1.0000      -0.3096     -0.2890
#> Petal.Length       0.8819     -0.3096       1.0000      0.9377
#> Petal.Width        0.8343     -0.2890       0.9377      1.0000
#>
#> P-value Matrix:
#>
#>              Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Sepal.Length       0.0000      0.0414        0e+00       0e+00
#> Sepal.Width        0.0414      0.0000        1e-04       3e-04
#> Petal.Length       0.0000      0.0001        0e+00       0e+00
#> Petal.Width        0.0000      0.0003        0e+00       0e+00
You can convert the output of moderncor() to a tidy data
frame using as.data.frame(). This is particularly useful
for correlation matrices:
# Convert correlation matrix to tidy data frame
df <- as.data.frame(res_mat)
head(df)
#>           var1         var2          r      p.value
#> 1  Sepal.Width Sepal.Length -0.1667777 4.136799e-02
#> 2 Petal.Length Sepal.Length  0.8818981 3.443087e-50
#> 3  Petal.Width Sepal.Length  0.8342888 4.189447e-40
#> 4 Sepal.Length  Sepal.Width -0.1667777 4.136799e-02
#> 5 Petal.Length  Sepal.Width -0.3096351 1.153938e-04
#> 6  Petal.Width  Sepal.Width -0.2890317 3.342981e-04
This returns a data frame containing the variables being compared
(var1 and var2), the correlation coefficient
(r), and p-values (p.value) if they were
calculated.
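The long layout can be reproduced for the coefficients alone in base R, which may help when checking results. This is a sketch rather than the package's own conversion: it uses stats::cor(), so it carries no p-values, and the column names var1, var2, and r are simply copied from the output above:

```r
# Spearman correlation matrix for the four numeric iris columns
cor_mat <- cor(iris[, 1:4], method = "spearman")

# Matrix -> long format: one row per (row variable, column variable) pair
tidy <- as.data.frame(as.table(cor_mat), stringsAsFactors = FALSE)
names(tidy) <- c("var1", "var2", "r")

# Drop the diagonal (each variable paired with itself)
tidy <- tidy[tidy$var1 != tidy$var2, ]
rownames(tidy) <- NULL

head(tidy)
```

Each pair appears twice, once in each order, which matches the row pattern in the output above.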
For large datasets, calculating p-values for modern methods (such as
MIC, HSIC, or Mutual Information) can be slow because they rely on
permutation tests. You can disable p-value calculations by setting
p_value = FALSE for a significant speedup:
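The cost is easy to see from how a permutation test works: the statistic is recomputed once per shuffle of the data. A generic base R sketch follows; the helper perm_p_value, its arguments, and the choice of 199 permutations are hypothetical, and moderncor's internal procedure may differ:

```r
# Generic permutation p-value: re-evaluates `stat` once per permutation
perm_p_value <- function(x, y, stat, n_perm = 199) {
  observed <- stat(x, y)
  # Shuffling y breaks any dependence on x while keeping both marginals
  permuted <- replicate(n_perm, stat(x, sample(y)))
  (1 + sum(permuted >= observed)) / (n_perm + 1)
}

set.seed(1)                                   # arbitrary seed
x <- rnorm(200)
y <- x^2 + rnorm(200, sd = 0.5)
abs_spearman <- function(a, b) abs(cor(a, b, method = "spearman"))

system.time(stat_only <- abs_spearman(x, y))         # 1 evaluation
system.time(p <- perm_p_value(x, y, abs_spearman))   # 200 evaluations
```

Skipping the p-value corresponds to the first call only, which is why the saving grows with the cost of the statistic and the number of permutations.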
Robust correlations are less sensitive to outliers than classical
methods. moderncor provides three robust correlation
methods.
Biweight midcorrelation down-weights observations far from the median using a biweight function. It requires no additional packages:
set.seed(42)
x_out <- c(rnorm(95), rnorm(5, mean = 10)) # 5% outliers
y_out <- c(rnorm(95), rnorm(5, mean = 10))
moderncor(x_out, y_out, method = "biweight")
#>
#> Biweight Midcorrelation
#>
#> Estimate: 0.1045
#> Statistic: 1.0405
#> P-value: 0.3007
#> Sample size (n): 100
Compare with Pearson, which is strongly influenced by outliers:
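A base R cross-check of that comparison, repeating the simulation above. stats::cor() stands in for moderncor() here so the snippet runs without extra packages, and the object names r_all and r_clean are illustrative:

```r
set.seed(42)
x_out <- c(rnorm(95), rnorm(5, mean = 10))   # same simulation as above
y_out <- c(rnorm(95), rnorm(5, mean = 10))

r_all   <- cor(x_out, y_out)                 # Pearson on all 100 points
r_clean <- cor(x_out[1:95], y_out[1:95])     # Pearson without the 5 outliers

c(all = r_all, clean = r_clean)
```

The five shared outliers near (10, 10) pull Pearson's r strongly upward even though the remaining 95 points are independent draws.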
Percentage bend correlation down-weights a specified proportion of the most
extreme values (requires the WRS2 package):
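The third robust method listed by available_methods(), winsorized correlation, is simple enough to sketch in base R: the most extreme values in each tail are replaced by the nearest retained value before Pearson's r is computed. The helpers winsorize and wincor_sketch and the 20% default are assumptions for illustration, not the WRS2 or moderncor implementation:

```r
# Winsorize: pull the g smallest and g largest values in to the nearest kept value
winsorize <- function(v, tr = 0.2) {
  n <- length(v)
  g <- floor(tr * n)          # number of values replaced in each tail
  s <- sort(v)
  pmin(pmax(v, s[g + 1]), s[n - g])
}

# Winsorized correlation: Pearson's r on the winsorized variables
wincor_sketch <- function(x, y, tr = 0.2) {
  cor(winsorize(x, tr), winsorize(y, tr))
}

set.seed(42)
x_out <- c(rnorm(95), rnorm(5, mean = 10))   # 5% outliers, as above
y_out <- c(rnorm(95), rnorm(5, mean = 10))

cor(x_out, y_out)             # Pearson on the raw data
wincor_sketch(x_out, y_out)   # Pearson after winsorizing 20% per tail
```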
Ordinal correlations are designed for ordered categorical (Likert-scale) data. They model the data as discretized versions of underlying continuous normal distributions.
Polychoric correlation is appropriate when both variables are ordinal
with more than two categories (requires psych):
# Simulate ordinal data (e.g., Likert scale responses)
set.seed(1)
z1 <- rnorm(200)
z2 <- 0.7 * z1 + rnorm(200, sd = sqrt(1 - 0.7^2))
x_ord <- cut(z1, breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE)
y_ord <- cut(z2, breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE)
moderncor(x_ord, y_ord, method = "polychoric")
#>
#> Polychoric Correlation
#>
#> Estimate: 0.6411
#> Sample size (n): 200
Tetrachoric correlation is the special case of polychoric for binary
(0/1) data (requires psych):
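For intuition, a 2x2 table can be turned into a rough tetrachoric estimate with the classical cosine approximation \(r \approx \cos\big(\pi / (1 + \sqrt{ad/(bc)})\big)\), where a and d are the concordant cell counts and b and c the discordant ones. The sketch below is base R only. tetra_approx is a hypothetical helper, and psych estimates the latent correlation numerically, so its results will differ somewhat:

```r
# Cosine approximation to the tetrachoric correlation (hypothetical helper)
tetra_approx <- function(x, y) {
  tab <- table(factor(x, levels = 0:1), factor(y, levels = 0:1))
  n00 <- as.numeric(tab[1, 1]); n01 <- as.numeric(tab[1, 2])
  n10 <- as.numeric(tab[2, 1]); n11 <- as.numeric(tab[2, 2])
  # Requires all four cells to be non-empty
  cos(pi / (1 + sqrt((n00 * n11) / (n01 * n10))))
}

# Binary data obtained by thresholding latent normals with correlation 0.7
set.seed(1)
z1 <- rnorm(200)
z2 <- 0.7 * z1 + rnorm(200, sd = sqrt(1 - 0.7^2))
x_bin <- as.integer(z1 > 0)
y_bin <- as.integer(z2 > 0)

cor(x_bin, y_bin)            # phi coefficient on the 0/1 codes
tetra_approx(x_bin, y_bin)   # estimate of the latent correlation
```

The phi coefficient is computed on the dichotomized codes and therefore tends to understate the latent correlation, which is the motivation for tetrachoric estimates.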
Partial and semi-partial correlations measure the relationship
between two variables while controlling for one or more confounding
variables (requires ppcor).
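As a sketch of what controlling for a variable means here, both quantities can be computed in base R by regressing the confounder out with lm() and correlating residuals: from both variables for the partial correlation, and from the second variable only for the semi-partial one. This is the textbook linear construction with illustrative object names, simulated in the same way as the example in this section. It is not a call into ppcor or moderncor:

```r
set.seed(7)
z   <- rnorm(100)
x_p <- 0.6 * z + rnorm(100, sd = 0.8)   # x correlates with z
y_p <- 0.6 * z + rnorm(100, sd = 0.8)   # y correlates with z

res_x <- resid(lm(x_p ~ z))             # x with z regressed out
res_y <- resid(lm(y_p ~ z))             # y with z regressed out

r_raw     <- cor(x_p, y_p)              # includes the shared dependence on z
r_partial <- cor(res_x, res_y)          # z removed from both x and y
r_semi    <- cor(x_p, res_y)            # z removed from y only

# Rank-based variant: the same construction applied to ranks
r_partial_spearman <- cor(resid(lm(rank(x_p) ~ rank(z))),
                          resid(lm(rank(y_p) ~ rank(z))))

c(raw = r_raw, partial = r_partial, semi_partial = r_semi)
```

The semi-partial coefficient is never larger in magnitude than the partial one, because x still carries the variance it shares with z.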
Partial correlation removes the influence of z from
both x and y:
set.seed(7)
z <- rnorm(100)
x_p <- 0.6 * z + rnorm(100, sd = 0.8) # x correlates with z
y_p <- 0.6 * z + rnorm(100, sd = 0.8) # y correlates with z
# Raw correlation (inflated by shared z)
moderncor(x_p, y_p, method = "pearson")
#>
#> Pearson Product-Moment Correlation
#>
#> Estimate: 0.4122
#> Statistic: 4.4794
#> P-value: 2.028e-05
#> Sample size (n): 100
Semi-partial correlation removes the influence of z from
y only (also requires ppcor):
moderncor(x_p, y_p, method = "semi_partial", z = z)
#>
#> Semi-partial Correlation (Pearson)
#>
#> Estimate: 0.1097
#> Statistic: 1.0867
#> P-value: 0.2799
#> Sample size (n): 100
The method_partial argument selects which base
correlation to use ("pearson", "spearman", or
"kendall"):
Ball correlation is a nonparametric measure of dependence based on
ball covariance (requires Ball):
Bergsma-Dassios \(\tau^*\) is a
nonparametric measure of association that equals zero if and only if
x and y are independent (requires
TauStar):
To see all supported correlation methods and their required packages:
available_methods()
#>             method                                         label  package
#> 1          pearson            Pearson Product-Moment Correlation    stats
#> 2         spearman                     Spearman Rank Correlation    stats
#> 3          kendall                      Kendall Rank Correlation    stats
#> 4             dcor                          Distance Correlation   energy
#> 5              mic         Maximal Information Coefficient (MIC)  minerva
#> 6             hsic Hilbert-Schmidt Independence Criterion (HSIC)    dHSIC
#> 7               xi                   Chatterjee's Xi Correlation    XICOR
#> 8        hoeffding                                 Hoeffding's D    Hmisc
#> 9      mutual_info                            Mutual Information infotheo
#> 10        biweight                       Biweight Midcorrelation built-in
#> 11 percentage_bend                   Percentage Bend Correlation     WRS2
#> 12      winsorized                        Winsorized Correlation     WRS2
#> 13      polychoric                        Polychoric Correlation    psych
#> 14     tetrachoric                       Tetrachoric Correlation    psych
#> 15         partial                           Partial Correlation    ppcor
#> 16    semi_partial                      Semi-partial Correlation    ppcor
#> 17            ball                              Ball Correlation     Ball
#> 18        tau_star                          Bergsma-Dassios Tau*  TauStar
#>           type
#> 1      classic
#> 2      classic
#> 3      classic
#> 4       modern
#> 5       modern
#> 6       modern
#> 7       modern
#> 8       modern
#> 9  information
#> 10      robust
#> 11      robust
#> 12      robust
#> 13     ordinal
#> 14     ordinal
#> 15     partial
#> 16     partial
#> 17       other
#> 18       other
To get details on a specific method:
method_info("dcor")
#> $method
#> [1] "dcor"
#>
#> $label
#> [1] "Distance Correlation"
#>
#> $package
#> [1] "energy"
#>
#> $description
#> [1] "Measures both linear and nonlinear dependence. Zero if and only if independent."
#>
#> $range
#> [1] "[0, 1]"
#>
#> $assumptions
#> [1] "Continuous variables."
For categorical variables (factors or contingency tables), use
moderncor_cat(). See vignette("categorical")
for a full introduction to categorical association measures.