This example demonstrates how to use the breakDown package for models created with the caret package.
First we will generate some data.
library(caret)
set.seed(2)
training <- twoClassSim(50, linearVars = 2)
trainX <- training[, -ncol(training)]
trainY <- training$Class
head(training)
#> TwoFactor1 TwoFactor2 Linear1 Linear2 Nonlinear1 Nonlinear2 Nonlinear3
#> 1 -0.6561702 -1.6480450 1.0744594 0.9758906 0.2342843 0.6805653 0.6920055
#> 2 -0.9849973 1.4598834 0.2605978 -0.1694232 0.1381283 0.7460168 0.5599569
#> 3 2.3722541 1.7069944 -0.3142720 0.7221918 -0.6920591 0.4642024 0.3426912
#> 4 -2.2067173 -0.6972704 -0.7496301 -0.8444186 -0.9303336 0.1374181 0.2344975
#> 5 0.5166671 -0.7228376 -0.8621983 1.2772937 0.9959069 0.8143796 0.4296028
#> 6 1.3331262 -0.9929323 2.0480403 -1.3431105 0.6711474 0.8321613 0.7367007
#> Class
#> 1 Class1
#> 2 Class2
#> 3 Class1
#> 4 Class2
#> 5 Class1
#> 6 Class1
Now we are ready to train a model. Let’s train a glm model with caret.
cctrl1 <- trainControl(method = "cv", number = 3, returnResamp = "all",
                       classProbs = TRUE,
                       summaryFunction = twoClassSummary)
test_class_cv_model <- train(trainX, trainY,
                             method = "glm",
                             trControl = cctrl1,
                             metric = "ROC",
                             preProc = c("center", "scale"))
test_class_cv_model
#> Generalized Linear Model
#>
#> 50 samples
#> 7 predictor
#> 2 classes: 'Class1', 'Class2'
#>
#> Pre-processing: centered (7), scaled (7)
#> Resampling: Cross-Validated (3 fold)
#> Summary of sample sizes: 33, 34, 33
#> Resampling results:
#>
#> ROC Sens Spec
#> 0.7771991 0.7175926 0.8009259
To use breakDown we need a function that calculates scores/predictions for a single observation. By default, the predict() function returns the predicted class, so we add the type = "prob" argument to get scores instead. Since there are two scores for each observation (one per class), we need to extract one of them.
predict.fun <- function(model, x) predict(model, x, type = "prob")[,1]
testing <- twoClassSim(10, linearVars = 2)
predict.fun(test_class_cv_model, testing[1,])
#> [1] 0.9807632
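Selecting the score column by position works here because Class1 is the first column of the probability table. A slightly more explicit variant picks the column by class name. The sketch below is only an illustration: make_prob_fun, prob_class1 and the toy_model stand-in (with its predict method) are hypothetical names introduced here so that the snippet runs in base R without caret. With a real caret model, predict(model, x, type = "prob") likewise returns a data frame with one column per class level.

```r
# Hypothetical stand-in for a fitted classifier, so the sketch runs in base R.
# Its predict() method mimics the shape of a type = "prob" result:
# a data frame with one probability column per class.
toy_model <- structure(list(), class = "toy_model")
predict.toy_model <- function(object, newdata, type = "prob", ...) {
  p <- plogis(newdata$Linear1)  # arbitrary score, for illustration only
  data.frame(Class1 = p, Class2 = 1 - p)
}

# Hypothetical helper: build a predict function that returns the
# probability of one named class instead of relying on column order.
make_prob_fun <- function(class_name) {
  function(model, x) predict(model, x, type = "prob")[, class_name]
}

prob_class1 <- make_prob_fun("Class1")

# One toy observation; plogis(0) is 0.5
new_obs <- data.frame(Linear1 = 0)
prob_class1(toy_model, new_obs)
```

Because prob_class1 has the same (model, x) signature as predict.fun, it could be used in its place wherever a predict function for a single class is needed.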
Now we are ready to call the broken() function.
library("breakDown")
explain_2 <- broken(test_class_cv_model, testing[1,], data = trainX, predict.function = predict.fun)
explain_2
#> contribution
#> (Intercept) 0.500
#> + TwoFactor2 = -2.15297519239414 0.330
#> + Linear2 = 1.21347759171666 0.103
#> + Nonlinear2 = 0.938861106755212 0.037
#> + Nonlinear3 = 0.198311409447342 0.016
#> + Linear1 = -1.59104698624311 0.006
#> + Nonlinear1 = -0.693807001691312 -0.001
#> + TwoFactor1 = -1.5957842151878 -0.009
#> final_prognosis 0.981
#> baseline: 0
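As a quick sanity check on the table above, the individual contributions should add up to final_prognosis, apart from rounding in the displayed values. This is a minimal sketch in base R, with the numbers copied by hand from the printed output. The vector name contributions is only illustrative and is not part of breakDown.

```r
# Contributions copied from the printed explain_2 table (rounded to 3 digits)
contributions <- c(
  "(Intercept)" = 0.500,
  TwoFactor2    = 0.330,
  Linear2       = 0.103,
  Nonlinear2    = 0.037,
  Nonlinear3    = 0.016,
  Linear1       = 0.006,
  Nonlinear1    = -0.001,
  TwoFactor1    = -0.009
)

# Sum of the rounded contributions, to compare with final_prognosis (0.981)
total <- sum(contributions)
total

# Running total, showing how each variable moves the score step by step
cumsum(contributions)
```

The rounded values sum to 0.982, which differs from the reported 0.981 only in the last digit because each row is rounded separately.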
Finally, we can plot the explanation by calling plot() on the result.