
Finding the nearest proper correlation matrix

Kevin Wright

2021-04-29

Consider the following matrix, as might arise from calculating covariance based on pairwise-complete data.

vv <- matrix(c(100.511, 159.266, 3.888, 59.964, 37.231, 32.944, 68.845,
               159.266, 277.723, 6.161, 95.017, 58.995, 52.203, 109.09, 3.888,
               6.161, 99.831, 2.32, 1.44, 1.274, 2.663, 59.964, 95.017, 2.32,
               35.774, 22.212, 19.655, 41.073, 37.231, 58.995, 1.44, 22.212,
               40.432, 12.203, 25.502, 32.944, 52.203, 1.274, 19.655, 12.203,
               10.798, 22.566, 68.845, 109.09, 2.663, 41.073, 25.502, 22.566,
               96.217), nrow=7, byrow=TRUE)
print(vv)
##         [,1]    [,2]   [,3]   [,4]   [,5]   [,6]    [,7]
## [1,] 100.511 159.266  3.888 59.964 37.231 32.944  68.845
## [2,] 159.266 277.723  6.161 95.017 58.995 52.203 109.090
## [3,]   3.888   6.161 99.831  2.320  1.440  1.274   2.663
## [4,]  59.964  95.017  2.320 35.774 22.212 19.655  41.073
## [5,]  37.231  58.995  1.440 22.212 40.432 12.203  25.502
## [6,]  32.944  52.203  1.274 19.655 12.203 10.798  22.566
## [7,]  68.845 109.090  2.663 41.073 25.502 22.566  96.217

This is not a proper covariance matrix: a valid covariance matrix is positive semi-definite, but this one has a negative eigenvalue.

eigen(vv)$values
## [1]  4.808047e+02  9.965048e+01  4.595154e+01  2.657509e+01  8.304329e+00
## [6]  6.685001e-04 -8.147905e-04
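
The eigenvalue check above can be wrapped in a small helper. The following is only a sketch in base R: the function name is_psd, its relative tolerance, and the 3x3 example matrix are assumptions made for illustration and are not part of the original analysis.

```r
# Hypothetical helper: TRUE when no eigenvalue is meaningfully negative.
# The tolerance is relative to the largest eigenvalue in absolute value.
is_psd <- function(m, tol = 1e-8) {
  ev <- eigen(m, symmetric = TRUE, only.values = TRUE)$values
  all(ev >= -tol * max(abs(ev)))
}

# Illustrative symmetric matrix with eigenvalues 1.9, 1.9 and -0.8,
# so it cannot be a covariance or correlation matrix.
bad <- matrix(c( 1.0, 0.9, -0.9,
                 0.9, 1.0,  0.9,
                -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)

is_psd(bad)      # FALSE
is_psd(diag(3))  # TRUE
```

A relative tolerance is used so that tiny negative eigenvalues caused purely by floating-point rounding are not reported as failures.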

If we use the cov2cor() function to convert the covariance matrix to a correlation matrix, the largest value is slightly larger than 1.0, which is impossible for a true correlation.

cc <- cov2cor(vv)
max(cc) # 1.000041
## [1] 1.000041
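
To see how a value above 1 can arise, recall that cov2cor() divides each covariance by the product of the two corresponding standard deviations. The sketch below reproduces that scaling by hand on a small 2x2 matrix; the matrix is an illustrative assumption chosen so that the off-diagonal covariance (6.1) exceeds the product of the standard deviations (2 * 3 = 6).

```r
# Illustrative 2x2 "covariance" matrix that no real data set could produce:
# the covariance 6.1 is larger than sqrt(4) * sqrt(9) = 6.
v <- matrix(c(4.0, 6.1,
              6.1, 9.0), nrow = 2, byrow = TRUE)

# Manual scaling: r[i, j] = v[i, j] / (sd[i] * sd[j])
sds <- sqrt(diag(v))
manual <- v / outer(sds, sds)

all.equal(manual, cov2cor(v))  # TRUE
max(manual) > 1                # TRUE: the "correlation" is 6.1 / 6
```

Pairwise-complete covariances can behave this way because each entry may be computed from a different subset of observations.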

If this matrix is passed to the corrgram() function, it issues a warning that the input is not a correlation matrix and then calculates pairwise correlations of the columns, resulting in a nonsensical graph.

Several packages provide functions that adjust such a matrix into a proper, positive-definite correlation matrix. Two are shown here.

psych

require(psych)
## Loading required package: psych
## Warning: package 'psych' was built under R version 4.0.5
cc2 <- psych::cor.smooth(cc)
## Warning in psych::cor.smooth(cc): Matrix was not positive definite, smoothing
## was done
max(cc2)
## [1] 1
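
For intuition about what this kind of smoothing involves, one simple approach is to clip the offending eigenvalues and rescale the result. The sketch below uses base R only. The function name clip_to_cor, the floor eps, and the example matrix are assumptions for illustration, and the code is not claimed to match the internals of psych::cor.smooth().

```r
# Hypothetical helper: raise eigenvalues below eps up to eps, rebuild the
# matrix, then rescale it so the diagonal is exactly 1.
clip_to_cor <- function(m, eps = 1e-6) {
  e <- eigen(m, symmetric = TRUE)
  vals <- pmax(e$values, eps)
  fixed <- e$vectors %*% diag(vals, nrow = length(vals)) %*% t(e$vectors)
  fixed <- (fixed + t(fixed)) / 2   # remove rounding asymmetry
  cov2cor(fixed)
}

# Illustrative matrix with one negative eigenvalue (-0.8)
bad <- matrix(c( 1.0, 0.9, -0.9,
                 0.9, 1.0,  0.9,
                -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)

good <- clip_to_cor(bad)
max(abs(good)) <= 1   # TRUE
min(eigen(good, symmetric = TRUE, only.values = TRUE)$values) > 0   # TRUE
```

Clipping gives a valid correlation matrix but not necessarily the nearest one; dedicated routines typically iterate to find a closer solution.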

sfsmisc

library(sfsmisc)
## Warning: package 'sfsmisc' was built under R version 4.0.5
# nearcor() tests symmetry with identical(), so it reports that cc is not symmetric
isSymmetric(cc) # TRUE
## [1] TRUE
identical(cc, t(cc)) # FALSE
## [1] FALSE
# round slightly to make it symmetric
cc3 <- nearcor(round(cc,12))$cor
max(cc3)
## [1] 1
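
The difference between the two symmetry tests comes down to tolerance: isSymmetric() allows a small numerical discrepancy, while identical() requires exact equality. The sketch below shows this on a tiny made-up matrix and also shows averaging a matrix with its transpose, which is offered here as an assumed alternative to rounding and is not what the code above does.

```r
# Illustrative 2x2 matrix whose off-diagonal entries differ by about 1e-15
m <- matrix(c(1, 0.5, 0.5 + 1e-15, 1), nrow = 2)

isSymmetric(m)       # TRUE: equal within numerical tolerance
identical(m, t(m))   # FALSE: the entries are not exactly equal

# Averaging with the transpose yields exact symmetry
s <- (m + t(m)) / 2
identical(s, t(s))   # TRUE
```

Averaging leaves the diagonal untouched and shifts each off-diagonal pair to its midpoint, so it changes the values by no more than half of the original discrepancy.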

After converting the matrix to a valid correlation matrix, an accurate corrgram can be created:

require(corrgram)
## Loading required package: corrgram
corrgram(cc2, lower=panel.cor)
