library(bayesrules)
For Bayesian model evaluation, the bayesrules package has three functions, prediction_summary(), classification_summary(), and naive_classification_summary(), as well as their cross-validation counterparts prediction_summary_cv(), classification_summary_cv(), and naive_classification_summary_cv(), respectively.
| Functions | Response | Model |
|---|---|---|
| prediction_summary(), prediction_summary_cv() | Quantitative | rstanreg |
| classification_summary(), classification_summary_cv() | Binary | rstanreg |
| naive_classification_summary(), naive_classification_summary_cv() | Categorical | naiveBayes |
Given a set of observed data including a quantitative response variable y and an rstanreg model of y, prediction_summary() returns 4 measures of the posterior prediction quality.

1. Median absolute prediction error (mae) measures the typical difference between the observed y values and their posterior predictive medians (stable = TRUE) or means (stable = FALSE).
2. Scaled mae (mae_scaled) measures the typical number of absolute deviations (stable = TRUE) or standard deviations (stable = FALSE) that observed y values fall from their predictive medians (stable = TRUE) or means (stable = FALSE).
3. and 4. within_50 and within_90 report the proportion of observed y values that fall within their posterior prediction intervals. Although 50% and 90% are the default probability levels for these intervals, they can be changed with the prob_inner and prob_outer arguments. The example below shows the 60% and 80% posterior prediction intervals.
# Data generation
example_data <- data.frame(x = sample(1:100, 20))
example_data$y <- example_data$x*3 + rnorm(20, 0, 5)
# rstanreg model
example_model <- rstanarm::stan_glm(y ~ x, data = example_data, refresh = FALSE)
# Prediction Summary
prediction_summary(example_model, example_data,
prob_inner = 0.6, prob_outer = 0.80,
stable = TRUE)
       mae mae_scaled within_60 within_80
1 2.405058  0.8680121       0.6       0.9
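These summaries can be reproduced by hand from a matrix of posterior predictive draws. The sketch below substitutes a simulated draws matrix for real rstanarm output (an assumption for illustration; in practice the draws would come from the model's posterior) and shows how mae and mae_scaled are computed under stable = TRUE:

```r
# Simulated stand-in for posterior predictive draws:
# one column per observation, one row per posterior draw
set.seed(84735)
y          <- c(10, 20, 30)   # observed responses
pred_draws <- sapply(y, function(m) rnorm(500, mean = m, sd = 2))

# Posterior predictive median for each observation
pred_medians <- apply(pred_draws, 2, median)

# mae: median absolute difference between observed values and their
# predictive medians (stable = TRUE uses medians rather than means)
mae <- median(abs(y - pred_medians))

# mae_scaled: the same errors, scaled by each observation's median
# absolute deviation around its predictive median
pred_mads  <- apply(pred_draws, 2, mad)
mae_scaled <- median(abs(y - pred_medians) / pred_mads)
```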
Similarly, prediction_summary_cv() returns the 4 cross-validated measures of a model’s posterior prediction quality for each fold, as well as a pooled result across folds. The k argument represents the number of folds to use for cross-validation.
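The cross-validation mechanics behind this can be sketched in base R. The version below is a simplification: it uses lm() as a fast stand-in for refitting the rstanarm model on each training fold, and pools the fold-level errors by averaging:

```r
set.seed(84735)
dat   <- data.frame(x = sample(1:100, 20))
dat$y <- dat$x * 3 + rnorm(20, 0, 5)

k     <- 2
folds <- sample(rep(1:k, length.out = nrow(dat)))  # random fold labels

fold_mae <- sapply(1:k, function(i) {
  train <- dat[folds != i, ]                # fit on k-1 folds
  test  <- dat[folds == i, ]                # evaluate on the held-out fold
  fit   <- lm(y ~ x, data = train)          # stand-in for stan_glm()
  median(abs(test$y - predict(fit, test)))  # held-out MAE for fold i
})

pooled_mae <- mean(fold_mae)                # pooled result, as in $cv
```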
prediction_summary_cv(model = example_model, data = example_data,
k = 2, prob_inner = 0.6, prob_outer = 0.80)
$folds
  fold      mae mae_scaled within_60 within_80
1    1 2.848569  0.5061318       0.7       1.0
2    2 2.732608  0.5569211       0.6       0.9

$cv
       mae mae_scaled within_60 within_80
1 2.790589  0.5315264      0.65      0.95
Given a set of observed data including a binary response variable y and an rstanreg model of y, the classification_summary() function returns summaries of the model’s posterior classification quality. These summaries include a confusion matrix as well as estimates of the model’s sensitivity, specificity, and overall accuracy. The cutoff argument represents the probability cutoff to classify a new case as positive.
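The role of cutoff can be shown directly: given each case's posterior probability of Y = 1, classify it as 1 when that probability meets the cutoff, then tabulate classifications against the observed y. The probabilities below are made up for illustration; classification_summary() obtains them from the model's posterior:

```r
y_obs <- c(0, 0, 0, 1, 1, 1)                    # observed binary outcomes
p_hat <- c(0.10, 0.40, 0.65, 0.30, 0.80, 0.95)  # hypothetical posterior P(Y = 1)

cutoff  <- 0.5
y_class <- as.numeric(p_hat >= cutoff)          # classify as 1 above the cutoff

confusion <- table(y = y_obs, classification = y_class)

sensitivity <- mean(y_class[y_obs == 1] == 1)   # true positive rate: 2/3
specificity <- mean(y_class[y_obs == 0] == 0)   # true negative rate: 2/3
overall     <- mean(y_class == y_obs)           # overall accuracy: 4/6
```

Raising the cutoff trades sensitivity for specificity: fewer cases are classified as positive, so fewer true positives are caught but fewer false positives are made.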
# Data generation
x <- rnorm(20)
z <- 3*x
prob <- 1/(1+exp(-z))
y <- rbinom(20, 1, prob)
example_data <- data.frame(x = x, y = y)
# rstanreg model
example_model <- rstanarm::stan_glm(y ~ x, data = example_data,
                                    family = binomial, refresh = FALSE)
# Classification Summary
classification_summary(model = example_model, data = example_data, cutoff = 0.5)
$confusion_matrix
    0 1
y 0 9 1
  1 2 8

$accuracy_rates
sensitivity      0.80
specificity      0.90
overall_accuracy 0.85
The classification_summary_cv() function returns the same measures, but as cross-validated estimates. The k argument represents the number of folds to use for cross-validation.
classification_summary_cv(model = example_model, data = example_data, k = 2, cutoff = 0.5)
$folds
  fold sensitivity specificity overall_accuracy
1    1         0.6         1.0              0.8
2    2         1.0         0.8              0.9

$cv
  sensitivity specificity overall_accuracy
1         0.8         0.9             0.85
Given a set of observed data including a categorical response variable y and a naiveBayes model of y, the naive_classification_summary() function returns summaries of the model’s posterior classification quality. These summaries include a confusion matrix as well as an estimate of the model’s overall accuracy.
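Underlying a naiveBayes model is Bayes' Rule: the posterior probability of each class is proportional to its prior probability times the likelihood of the observed predictor, with a Normal likelihood assumed for a quantitative predictor. A hand-rolled sketch, using hypothetical class means, standard deviations, and priors for bill length:

```r
# Hypothetical class-conditional Normal parameters for bill_length_mm
classes <- c("Adelie", "Chinstrap", "Gentoo")
mu      <- c(Adelie = 38.8, Chinstrap = 48.8, Gentoo = 47.5)
sigma   <- c(Adelie = 2.7,  Chinstrap = 3.3,  Gentoo = 3.1)
prior   <- c(Adelie = 0.44, Chinstrap = 0.20, Gentoo = 0.36)

bill <- 50  # a new penguin's bill length (mm)

# Bayes' Rule: posterior is proportional to prior * likelihood; normalize
unnorm    <- prior * dnorm(bill, mean = mu, sd = sigma)
posterior <- unnorm / sum(unnorm)

# Classify as the species with the highest posterior probability
classification <- names(which.max(posterior))
```

With these made-up parameters, a 50 mm bill is classified as Gentoo: the Chinstrap likelihood is slightly higher, but the Gentoo prior outweighs it.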
# Data
data(penguins_bayes, package = "bayesrules")

# naiveBayes model
example_model <- e1071::naiveBayes(species ~ bill_length_mm, data = penguins_bayes)
# Naive Classification Summary
naive_classification_summary(model = example_model, data = penguins_bayes, y = "species")
$confusion_matrix
species      Adelie       Chinstrap Gentoo
  Adelie     95.39% (145) 0.00% (0)  4.61% (7)
  Chinstrap   5.88% (4)   8.82% (6) 85.29% (58)
  Gentoo      6.45% (8)   4.84% (6) 88.71% (110)

$overall_accuracy
[1] 0.7587209
Similarly, naive_classification_summary_cv() returns the cross-validated confusion matrix. The k argument represents the number of folds to use for cross-validation.
naive_classification_summary_cv(model = example_model, data = penguins_bayes,
y = "species", k = 2)
$folds
  fold    Adelie Chinstrap    Gentoo overall_accuracy
1    1 0.9634146   0.09375 0.8965517        0.7790698
2    2 0.9428571   0.00000 0.9242424        0.7383721

$cv
species      Adelie       Chinstrap Gentoo
  Adelie     95.39% (145) 0.00% (0)  4.61% (7)
  Chinstrap   5.88% (4)   4.41% (3) 89.71% (61)
  Gentoo      6.45% (8)   2.42% (3) 91.13% (113)