To illustrate how auditor can be applied to regression problems, we will use the artificial dataset dragons, available in the DALEX package. Our goal is to predict the life length of dragons.
dragons <- DALEX::dragons
head(dragons)
## year_of_birth height weight scars colour year_of_discovery
## 1 -1291 59.40365 15.32391 7 red 1700
## 2 1589 46.21374 11.80819 5 red 1700
## 3 1528 49.17233 13.34482 6 red 1700
## 4 1645 48.29177 13.27427 5 green 1700
## 5 -8 49.99679 13.08757 1 red 1700
## 6 915 45.40876 11.48717 2 red 1700
## number_of_lost_teeth life_length
## 1 25 1368.4331
## 2 28 1377.0474
## 3 38 1603.9632
## 4 33 1434.4222
## 5 18 985.4905
## 6 20 969.5682
lm_model <- lm(life_length ~ ., data = dragons)
library("randomForest")
set.seed(59)
rf_model <- randomForest(life_length ~ ., data = dragons)
Each analysis begins by creating an explainer object with the DALEX package. This object can then be used to audit the model.
lm_exp <- DALEX::explain(lm_model, label = "lm", data = dragons, y = dragons$life_length)
## Preparation of a new explainer is initiated
## -> model label : lm
## -> data : 2000 rows 8 cols
## -> target variable : 2000 values
## -> predict function : yhat.lm will be used (default)
## -> predicted values : numerical, min = 540.9447 , mean = 1370.986 , max = 3925.691
## -> residual function : difference between y and yhat (default)
## -> residuals : numerical, min = -108.2062 , mean = -3.701928e-12 , max = 113.8603
## A new explainer has been created!
rf_exp <- DALEX::explain(rf_model, label = "rf", data = dragons, y = dragons$life_length)
## Preparation of a new explainer is initiated
## -> model label : rf
## -> data : 2000 rows 8 cols
## -> target variable : 2000 values
## -> predict function : yhat.randomForest will be used (default)
## -> predicted values : numerical, min = 610.9752 , mean = 1370.181 , max = 3292.296
## -> residual function : difference between y and yhat (default)
## -> residuals : numerical, min = -135.4756 , mean = 0.8047108 , max = 720.0888
## A new explainer has been created!
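The explainer output above reports that residuals are computed as the difference between y and yhat. A minimal base R sketch of that relationship follows. It uses the built-in mtcars data as a stand-in for dragons so that it runs without DALEX; the mtcars model and its variables are illustrative assumptions, not part of the original analysis.

```r
# Illustrative stand-in model; mtcars replaces dragons so the sketch needs only base R.
fit <- lm(mpg ~ wt + hp, data = mtcars)

y     <- mtcars$mpg
y_hat <- unname(predict(fit, newdata = mtcars))

# Residuals as the difference between observed and predicted values.
res <- y - y_hat

# For least squares with an intercept, residuals average to (numerically) zero,
# which mirrors the near-zero mean residual reported for the lm explainer.
mean(res)
range(res)
```

A random forest has no such guarantee, which is consistent with the non-zero mean residual reported for the rf explainer.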
Model performance measures can be plotted together to make models easy to compare.
The model_performance() function computes the chosen model performance measures. On the plot, a result further from the center means better model performance.
library(auditor)
lm_mp <- model_performance(lm_exp)
rf_mp <- model_performance(rf_exp)
lm_mp
## Model label: lm
## score name
## mae 3.334652e+01 mae
## mse 1.656454e+03 mse
## rec 3.330139e+01 rec
## rroc 3.310782e+09 rroc
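The mae and mse rows are the mean absolute error and the mean squared error of the residuals. As a minimal sketch of these two formulas, the example below uses a made-up residual vector; the numbers are illustrative and are not taken from the dragons models.

```r
# Hypothetical residuals (observed minus predicted); not from the dragons data.
res <- c(-2, 1, 4, -3)

mae <- mean(abs(res))  # mean absolute error
mse <- mean(res^2)     # mean squared error

c(mae = mae, mse = mse)
```

Because mse squares each residual, a few large errors raise it much more than they raise mae. This is one way the two measures can rank models differently.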
Results of the model_performance() function for multiple models can be drawn together on one plot.
The table parameter indicates whether the table with scores should be generated.
On the plot, scores are inverted and scaled to [0,1].
plot(lm_mp, rf_mp)
## _name_ _label_ _value_ scaled
## 1 inv\nmae lm 3.33e+01 1.000
## 2 inv\nmae rf 2.79e+01 0.835
## 3 inv\nmse lm 1.66e+03 1.000
## 4 inv\nmse rf 2.39e+03 1.443
## 5 inv\nrec lm 3.33e+01 1.000
## 6 inv\nrec rf 2.77e+01 0.831
## 7 inv\nrroc lm 3.31e+09 1.000
## 8 inv\nrroc rf 4.75e+09 1.434
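Judging only from the printed table, the scaled column appears to be each score divided by the score of the first model (lm) for the same measure. The sketch below reproduces that ratio for two of the measures. It is an assumption read off the output, not a description of auditor's internals. The input values are the rounded ones shown in the table, so the results match the printed column only approximately.

```r
# Values copied from the printed table (already rounded there).
scores <- data.frame(
  name  = c("mae", "mae", "mse", "mse"),
  label = c("lm", "rf", "lm", "rf"),
  value = c(33.3, 27.9, 1660, 2390)
)

# Assumption: the baseline for each measure is the score of the first model, "lm".
is_lm     <- scores$label == "lm"
lm_values <- setNames(scores$value[is_lm], scores$name[is_lm])

scores$scaled <- scores$value / unname(lm_values[scores$name])
scores
```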
It is also possible to define a function that computes a custom model performance measure.
new_score <- function(object) sum(sqrt(abs(object$residuals)))
lm_mp <- model_performance(lm_exp,
                           score = c("mae", "mse", "rec", "rroc"),
                           new_score = new_score)
rf_mp <- model_performance(rf_exp,
                           score = c("mae", "mse", "rec", "rroc"),
                           new_score = new_score)
plot(lm_mp, rf_mp)
## _name_ _label_ _value_ scaled
## 1 inv\nmae lm 3.33e+01 1.000
## 2 inv\nmae rf 2.79e+01 0.835
## 3 inv\nmse lm 1.66e+03 1.000
## 4 inv\nmse rf 2.39e+03 1.443
## 5 inv\nrec lm 3.33e+01 1.000
## 6 inv\nrec rf 2.77e+01 0.831
## 7 inv\nrroc lm 3.31e+09 1.000
## 8 inv\nrroc rf 4.75e+09 1.434
## 9 inv\nnew_score lm 1.07e+04 1.000
## 10 inv\nnew_score rf 9.25e+03 0.864
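To see what the custom measure computes, it can be called on a small stand-in object. The sketch below uses a plain list with a residuals field in place of a real explainer; mock_explainer is a hypothetical name introduced only for this illustration.

```r
# Same definition as in the text: sum of square roots of absolute residuals.
new_score <- function(object) sum(sqrt(abs(object$residuals)))

# Hypothetical stand-in for an explainer: a plain list with a residuals field.
mock_explainer <- list(residuals = c(-4, 1, 9))

new_score(mock_explainer)  # sqrt(4) + sqrt(1) + sqrt(9) = 6
```

Because this measure is a sum rather than a mean, its value grows with the number of observations. Comparisons are therefore only meaningful between models evaluated on the same data, as is the case here.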
Other methods and plots are described in vignettes: