In this vignette we present plots for evaluating classification models.
We work on the titanic dataset from the DALEX package.
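# Drop rows with missing values
titanic <- na.omit(DALEX::titanic)
# Recode the two-level factor survived as 0/1 (first level -> 0, second level -> 1)
titanic$survived <- as.numeric(titanic$survived) - 1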
head(titanic)
## gender age class embarked country fare sibsp parch survived
## 1 male 42 3rd Southampton United States 7.11 0 0 0
## 2 male 13 3rd Southampton United States 20.05 0 2 0
## 3 male 16 3rd Southampton United States 20.05 1 1 0
## 4 female 39 3rd Southampton England 20.05 1 1 1
## 5 female 16 3rd Southampton Norway 7.13 0 0 1
## 6 male 25 3rd Southampton United States 7.13 0 0 1
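The recoding of survived relies on how R stores factors: as.numeric() on a factor returns the integer level codes (1, 2, ...), not the labels. The sketch below uses a small made-up factor, assuming the level order "no", "yes", to show why subtracting 1 yields a 0/1 target, and a comparison-based alternative that does not depend on level order.

```r
# Made-up factor for illustration; the level order "no", "yes" is assumed
survived <- factor(c("no", "yes", "yes", "no"), levels = c("no", "yes"))

# as.numeric() on a factor gives the level codes: 1 for "no", 2 for "yes"
codes <- as.numeric(survived)

# Subtracting 1 shifts the codes to 0/1, the encoding glm(family = binomial) expects
survived01 <- codes - 1
print(survived01)

# Alternative that names the positive class explicitly and ignores level order
survived01_alt <- as.numeric(survived == "yes")
print(survived01_alt)
```

Both calls print 0 1 1 0 here; the second form is safer if the level order of the factor is not known in advance.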
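We fit two models: a logistic regression (glm) and a support vector machine (svm). Because survived is numeric, svm() from e1071 fits an eps-regression model rather than a classifier, so its predictions are not confined to [0, 1].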
model_glm <- glm(survived~., data = titanic, family = binomial)
library(e1071)
model_svm <- svm(survived~., data = titanic)
The first step is to create an explainer object with the DALEX package. An explainer wraps a model together with its metadata (data, target variable, predict function and residual function) and can be used to audit the model.
exp_glm <- DALEX::explain(model_glm, data = titanic, y = titanic$survived)
## Preparation of a new explainer is initiated
## -> model label : lm (default)
## -> data : 2099 rows 9 cols
## -> target variable : 2099 values
## -> data : A column identical to the target variable `y` has been found in the `data`. (WARNING)
## -> data : It is highly recommended to pass `data` without the target variable column
## -> predict function : yhat.glm will be used (default)
## -> predicted values : numerical, min = 9.814966e-09 , mean = 0.3244402 , max = 1
## -> residual function : difference between y and yhat (default)
## -> residuals : numerical, min = -0.9614217 , mean = -1.68201e-09 , max = 0.9666502
## A new explainer has been created!
exp_svm <- DALEX::explain(model_svm, data = titanic, y = titanic$survived, label = "svm")
## Preparation of a new explainer is initiated
## -> model label : svm
## -> data : 2099 rows 9 cols
## -> target variable : 2099 values
## -> data : A column identical to the target variable `y` has been found in the `data`. (WARNING)
## -> data : It is highly recommended to pass `data` without the target variable column
## -> predict function : yhat.svm will be used (default)
## -> predicted values : numerical, min = -0.05516344 , mean = 0.2523206 , max = 1.059265
## -> residual function : difference between y and yhat (default)
## -> residuals : numerical, min = -1.035725 , mean = 0.0721196 , max = 1.015941
## A new explainer has been created!
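To make the idea of an explainer concrete, here is a minimal base R sketch of the bundle that the explain() log describes: model, data, target, label, predict function, predictions and residuals computed as y minus yhat. The helper make_explainer() is hypothetical and is not how DALEX is implemented; it uses the built-in mtcars data rather than titanic, and passes the data without the target column, as the log recommends.

```r
# Hypothetical helper, NOT DALEX's implementation: it only bundles a model
# with the kind of metadata listed in the explain() log
make_explainer <- function(model, data, y, label,
                           predict_function = function(m, newdata) {
                             predict(m, newdata, type = "response")
                           }) {
  y_hat <- predict_function(model, data)
  list(
    model = model,
    data = data,
    y = y,
    label = label,
    predict_function = predict_function,
    y_hat = y_hat,
    residuals = y - y_hat  # residual = difference between y and yhat
  )
}

# Illustration on the built-in mtcars data (not the titanic models above)
fit <- glm(am ~ wt, data = mtcars, family = binomial)
features <- mtcars[, "wt", drop = FALSE]  # data passed without the target column
exp_demo <- make_explainer(fit, data = features, y = mtcars$am, label = "glm")

summary(exp_demo$y_hat)
summary(exp_demo$residuals)
```

For a logistic regression with an intercept, the residuals y - yhat average to (numerically) zero, which is consistent with the near-zero residual mean reported for the glm explainer above.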
The second step is to create an auditor_model_evaluation object, which can then be used to validate the model.
library(auditor)
eva_glm <- model_evaluation(exp_glm)
eva_svm <- model_evaluation(exp_svm)
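Evaluation charts such as the ROC curve are built from confusion matrices: each point of a ROC curve summarizes the true and false positive rates at one cut-off. The base R sketch below computes one such matrix on made-up scores with an arbitrary 0.5 cut-off; it illustrates the concept and is not what model_evaluation() computes internally.

```r
# Made-up toy values for illustration only (not titanic results)
y     <- c(0, 0, 1, 1, 0, 1, 0, 1)                   # observed classes
y_hat <- c(0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7)  # predicted probabilities

# Turn scores into class predictions with a 0.5 cut-off (an arbitrary choice)
predicted <- as.numeric(y_hat >= 0.5)

# Cross-tabulate observed against predicted classes
print(table(observed = y, predicted = predicted))

tp <- sum(predicted == 1 & y == 1)  # true positives
fp <- sum(predicted == 1 & y == 0)  # false positives
fn <- sum(predicted == 0 & y == 1)  # false negatives
tn <- sum(predicted == 0 & y == 0)  # true negatives

tpr <- tp / (tp + fn)  # sensitivity: the y-axis of a ROC curve
fpr <- fp / (fp + tn)  # 1 - specificity: the x-axis of a ROC curve
print(c(tpr = tpr, fpr = fpr))
```

Here the cut-off gives tpr = 0.75 and fpr = 0.25; moving the cut-off traces out the rest of the curve.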
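The auditor_model_evaluation objects can be passed to plot() to draw evaluation charts such as the ROC curve and the LIFT chart; each chart also has a dedicated plotting function.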
plot(eva_glm, eva_svm, type = "roc")
# or
# plot_roc(eva_glm, eva_svm)
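A ROC curve sweeps the cut-off over all distinct scores and plots the true positive rate against the false positive rate; the area under it (AUC) can be approximated with the trapezoidal rule. The sketch below does this in base R on made-up data; roc_points() and auc_trapezoid() are hypothetical helpers, not functions from auditor.

```r
# Made-up toy values for illustration only
y     <- c(0, 0, 1, 1, 0, 1, 0, 1)
score <- c(0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7)

# Hypothetical helper: sweep the cut-off over every distinct score, highest first
roc_points <- function(y, score) {
  thresholds <- sort(unique(score), decreasing = TRUE)
  tpr <- sapply(thresholds, function(t) sum(score >= t & y == 1) / sum(y == 1))
  fpr <- sapply(thresholds, function(t) sum(score >= t & y == 0) / sum(y == 0))
  # Prepend (0, 0): a cut-off above every score predicts no positives
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

# Hypothetical helper: area under the curve by the trapezoidal rule
auc_trapezoid <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

roc <- roc_points(y, score)
auc <- auc_trapezoid(roc)
print(auc)  # 0.875
```

With no tied scores, this AUC equals the share of (positive, negative) pairs in which the positive case gets the higher score: 14 of 16 pairs here.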
plot(eva_glm, eva_svm, type = "lift")
# or
# plot_lift(eva_glm, eva_svm)
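One common way to build a lift (cumulative gains) table is to sort observations by predicted score, then, for each depth (share of observations targeted), record the share of all positives captured and the lift, i.e. captured share divided by depth, equivalently the precision among targeted cases divided by the overall positive rate. The base R sketch below, with the hypothetical helper gains_table() and made-up data, shows that construction; auditor's LIFT chart may use different axis definitions.

```r
# Made-up toy values for illustration only
y     <- c(0, 0, 1, 1, 0, 1, 0, 1)
score <- c(0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7)

# Hypothetical helper: cumulative gains and lift by targeting depth
gains_table <- function(y, score) {
  ord <- order(score, decreasing = TRUE)            # ties are broken by position
  y_sorted <- y[ord]
  depth <- seq_along(y_sorted) / length(y_sorted)   # share of observations targeted
  captured <- cumsum(y_sorted) / sum(y_sorted)      # share of all positives captured
  data.frame(depth = depth, captured = captured, lift = captured / depth)
}

g <- gains_table(y, score)
print(g)
```

A random ordering has lift about 1 at every depth; here the top three scores are all positives, so the lift starts at 2 (twice the base rate of 0.5) and falls to 1 once everyone is targeted.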
Other methods and plots are described in the following vignettes: