The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
The basic idea of calculating the importance of attributes in a linear regression is according to the coefficients in the regression. However, when we put too many independent variables to regress, we can not promise that all those independent variables are independently distributed, commonly speaking. On other words, it may have great possibility that several attributes are collinearity, which also known as highly correlated. In an example context, we can easily remove the highly correlated attributes and then do the regression. However, in real world business cases, all the attributes we selected are important and meaningful, thus we can not remove the attributes which are highly correlated randomly. Therefore, we need to find out how to calculating the importance of attributes when several attributes are collinearity.
Shapley Value regression is also called Shapley regression, Shapley Value analysis, Kruskal analysis, and dominance analysis, and incremental R-squared analysis. Apart from using it while independent variables are moderately to highly correlated in linear regression, it also can be used when computing the contribution of each predictors in machine learning.
This package only has one function shapleyvalue
, and you can use it to analyze the relative importance of attributes in linear regression.
Here, we use the bulit-in dataset Boston
in package MASS
. In this demo, medv
as dependent variable, nox
, rm
, age
, dis
as four predictors, and we want to find out the importance of each predictor.
library(ShapleyValue)
<- Boston
data head(data) %>%
kbl() %>%
kable_classic(full_width = F, html_font = "Cambria")
crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | black | lstat | medv |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.00632 | 18 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296 | 15.3 | 396.90 | 4.98 | 24.0 |
0.02731 | 0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.90 | 9.14 | 21.6 |
0.02729 | 0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
0.03237 | 0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
0.06905 | 0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.90 | 5.33 | 36.2 |
0.02985 | 0 | 2.18 | 0 | 0.458 | 6.430 | 58.7 | 6.0622 | 3 | 222 | 18.7 | 394.12 | 5.21 | 28.7 |
<- data$medv
y <- as.data.frame(data[,5:8])
x <- shapleyvalue(y,x)
value %>%
value kbl() %>%
kable_classic(full_width = F, html_font = "Cambria")
nox | rm | age | dis | |
---|---|---|---|---|
Shapley Value | 0.0836 | 0.3938 | 0.0573 | 0.0272 |
Standardized Shapley Value | 0.1488 | 0.7009 | 0.1020 | 0.0483 |
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.