How to use the breakDown package for models created with caret

Przemyslaw Biecek

2024-03-11

This example demonstrates how to use the breakDown package for models created with the caret package.

First we will generate some data.

library(caret)

set.seed(2)
training <- twoClassSim(50, linearVars = 2)
trainX <- training[, -ncol(training)]
trainY <- training$Class

head(training)
#>   TwoFactor1 TwoFactor2    Linear1    Linear2 Nonlinear1 Nonlinear2 Nonlinear3
#> 1 -0.6561702 -1.6480450  1.0744594  0.9758906  0.2342843  0.6805653  0.6920055
#> 2 -0.9849973  1.4598834  0.2605978 -0.1694232  0.1381283  0.7460168  0.5599569
#> 3  2.3722541  1.7069944 -0.3142720  0.7221918 -0.6920591  0.4642024  0.3426912
#> 4 -2.2067173 -0.6972704 -0.7496301 -0.8444186 -0.9303336  0.1374181  0.2344975
#> 5  0.5166671 -0.7228376 -0.8621983  1.2772937  0.9959069  0.8143796  0.4296028
#> 6  1.3331262 -0.9929323  2.0480403 -1.3431105  0.6711474  0.8321613  0.7367007
#>    Class
#> 1 Class1
#> 2 Class2
#> 3 Class1
#> 4 Class2
#> 5 Class1
#> 6 Class1

Now we are ready to train a model. Let’s train a glm model with caret.

cctrl1 <- trainControl(method = "cv", number = 3, returnResamp = "all",
                       classProbs = TRUE, 
                       summaryFunction = twoClassSummary)

test_class_cv_model <- train(trainX, trainY, 
                             method = "glm", 
                             trControl = cctrl1,
                             metric = "ROC", 
                             preProc = c("center", "scale"))
test_class_cv_model
#> Generalized Linear Model 
#> 
#> 50 samples
#>  7 predictor
#>  2 classes: 'Class1', 'Class2' 
#> 
#> Pre-processing: centered (7), scaled (7) 
#> Resampling: Cross-Validated (3 fold) 
#> Summary of sample sizes: 33, 34, 33 
#> Resampling results:
#> 
#>   ROC        Sens       Spec     
#>   0.7771991  0.7175926  0.8009259
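The printed summary shows metrics averaged over the three folds. As a minimal sketch, assuming the standard `results` and `resample` components of a caret `train` object, the per-fold values behind those averages can be inspected directly. The names `ctrl` and `fit` are ours, and the model is refit here only so that the snippet runs on its own.

```r
library(caret)

set.seed(2)
training <- twoClassSim(50, linearVars = 2)

# Same resampling setup as above: 3-fold CV with class probabilities,
# keeping the results of every resample
ctrl <- trainControl(method = "cv", number = 3, returnResamp = "all",
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

fit <- train(training[, -ncol(training)], training$Class,
             method = "glm",
             trControl = ctrl,
             metric = "ROC",
             preProc = c("center", "scale"))

fit$results    # metrics averaged over the folds
fit$resample   # one row per fold
```

With only 50 observations the fold-level values can differ noticeably from one another, so the averaged ROC is best read as a rough estimate.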

To use breakDown, we need a function that calculates a score (prediction) for a single observation. By default, the predict() function for a caret model returns the predicted class rather than a numeric score.

So we add the type = "prob" argument to get scores. Since there are two scores for each observation (one per class), we need to extract one of them.

predict.fun <- function(model, x) predict(model, x, type = "prob")[,1]
testing <- twoClassSim(10, linearVars = 2)
predict.fun(test_class_cv_model, testing[1,])
#> [1] 0.9807632
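Selecting the first column by position depends on the order of the class levels. As a small sketch, assuming that predict() with type = "prob" returns a data frame with one column per class level (which is how caret behaves for this model type, as far as we know), the score can be selected by name instead. The helper `predict_class1` and the object `fit` are hypothetical names introduced for this illustration, and the model uses default resampling to keep the snippet short.

```r
library(caret)

set.seed(2)
training <- twoClassSim(50, linearVars = 2)
trainX <- training[, -ncol(training)]
trainY <- training$Class

# Same kind of model as above, fitted with caret's default resampling
fit <- train(trainX, trainY, method = "glm")

# Hypothetical helper: pick the score for "Class1" by column name
predict_class1 <- function(model, x) {
  predict(model, x, type = "prob")[, "Class1"]
}

probs <- predict(fit, trainX[1:3, ], type = "prob")
colnames(probs)                      # one column per class level
predict_class1(fit, trainX[1:3, ])   # scores for "Class1" only
```

Selecting by name makes it explicit which class the breakDown contributions refer to.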

Now we are ready to call the broken() function.

library("breakDown")
explain_2 <- broken(test_class_cv_model, testing[1,], data = trainX, predict.function = predict.fun)
explain_2
#>                                   contribution
#> (Intercept)                              0.500
#> + TwoFactor2 = -2.15297519239414         0.330
#> + Linear2 = 1.21347759171666             0.103
#> + Nonlinear2 = 0.938861106755212         0.037
#> + Nonlinear3 = 0.198311409447342         0.016
#> + Linear1 = -1.59104698624311            0.006
#> + Nonlinear1 = -0.693807001691312       -0.001
#> + TwoFactor1 = -1.5957842151878         -0.009
#> final_prognosis                          0.981
#> baseline:  0
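The table can be read as an additive decomposition: starting from the intercept, each row adds its contribution until the final prognosis is reached. The following sketch simply re-adds the rounded values printed above in base R; it does not call breakDown, and the vector `contributions` is typed in by hand for illustration.

```r
# Contributions copied from the printed output above (rounded to 3 decimals)
contributions <- c(
  "(Intercept)" = 0.500,
  TwoFactor2    = 0.330,
  Linear2       = 0.103,
  Nonlinear2    = 0.037,
  Nonlinear3    = 0.016,
  Linear1       = 0.006,
  Nonlinear1    = -0.001,
  TwoFactor1    = -0.009
)
final_prognosis <- 0.981

# Running total after each variable is added
cumsum(contributions)

# The total should match final_prognosis up to accumulated rounding error
total <- sum(contributions)
abs(total - final_prognosis)
```

Any small gap between the total and the final prognosis comes from rounding each printed contribution to three decimals.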

Finally, we plot the explanation.

library(ggplot2)
plot(explain_2) + ggtitle("breakDown plot for caret/glm model")
