library(bayesrules)
For Bayesian model evaluation, the bayesrules package has three functions, prediction_summary(), classification_summary(), and naive_classification_summary(), as well as their cross-validation counterparts prediction_summary_cv(), classification_summary_cv(), and naive_classification_summary_cv(), respectively.
| Functions | Response | Model |
|---|---|---|
| prediction_summary(), prediction_summary_cv() | Quantitative | rstanreg |
| classification_summary(), classification_summary_cv() | Binary | rstanreg |
| naive_classification_summary(), naive_classification_summary_cv() | Categorical | naiveBayes |
Given a set of observed data including a quantitative response variable y and an rstanreg model of y, prediction_summary() returns 4 measures of the posterior prediction quality.

1. Median absolute prediction error (mae) measures the typical difference between the observed y values and their posterior predictive medians (stable = TRUE) or means (stable = FALSE).
2. Scaled mae (mae_scaled) measures the typical number of absolute deviations (stable = TRUE) or standard deviations (stable = FALSE) that observed y values fall from their predictive medians (stable = TRUE) or means (stable = FALSE).
3. and 4. within_50 and within_90 report the proportion of observed y values that fall within their posterior prediction intervals. Although 50% and 90% are the default probability levels for these intervals, they can be changed with the prob_inner and prob_outer arguments. The example below shows the 60% and 80% posterior prediction intervals.
# Data generation
example_data <- data.frame(x = sample(1:100, 20))
example_data$y <- example_data$x*3 + rnorm(20, 0, 5)
# rstanreg model
example_model <- rstanarm::stan_glm(y ~ x, data = example_data, refresh = FALSE)
# Prediction Summary
prediction_summary(example_model, example_data,
prob_inner = 0.6, prob_outer = 0.80,
stable = TRUE)
       mae mae_scaled within_60 within_80
1 2.405058  0.8680121       0.6       0.9
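These summaries can be reproduced by hand from a matrix of posterior predictive draws. The sketch below substitutes a simulated draws matrix for real rstanarm output (an assumption for illustration; in practice the draws would come from the model's posterior) and shows how mae and mae_scaled are computed under stable = TRUE:

```r
# Simulated stand-in for posterior predictive draws:
# one column per observation, one row per posterior draw
set.seed(84735)
y          <- c(10, 20, 30)   # observed responses
pred_draws <- sapply(y, function(m) rnorm(500, mean = m, sd = 2))

# Posterior predictive median for each observation
pred_medians <- apply(pred_draws, 2, median)

# mae: median absolute difference between observed values and their
# predictive medians (stable = TRUE uses medians rather than means)
mae <- median(abs(y - pred_medians))

# mae_scaled: the same errors, scaled by each observation's median
# absolute deviation around its predictive median
pred_mads  <- apply(pred_draws, 2, mad)
mae_scaled <- median(abs(y - pred_medians) / pred_mads)
```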
Similarly, prediction_summary_cv() returns the 4 cross-validated measures of a model’s posterior prediction quality for each fold, as well as a pooled result across folds. The k argument represents the number of folds to use for cross-validation.
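The cross-validation mechanics behind this can be sketched in base R. The version below is a simplification: it uses lm() as a fast stand-in for refitting the rstanarm model on each training fold, and pools the fold-level errors by averaging:

```r
set.seed(84735)
dat   <- data.frame(x = sample(1:100, 20))
dat$y <- dat$x * 3 + rnorm(20, 0, 5)

k     <- 2
folds <- sample(rep(1:k, length.out = nrow(dat)))  # random fold labels

fold_mae <- sapply(1:k, function(i) {
  train <- dat[folds != i, ]                # fit on k-1 folds
  test  <- dat[folds == i, ]                # evaluate on the held-out fold
  fit   <- lm(y ~ x, data = train)          # stand-in for stan_glm()
  median(abs(test$y - predict(fit, test)))  # held-out MAE for fold i
})

pooled_mae <- mean(fold_mae)                # pooled result, as in $cv
```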
prediction_summary_cv(model = example_model, data = example_data,
k = 2, prob_inner = 0.6, prob_outer = 0.80)
$folds
  fold      mae mae_scaled within_60 within_80
1    1 2.848569  0.5061318       0.7       1.0
2    2 2.732608  0.5569211       0.6       0.9

$cv
       mae mae_scaled within_60 within_80
1 2.790589  0.5315264      0.65      0.95
Given a set of observed data including a binary response variable y and an rstanreg model of y, the classification_summary() function returns summaries of the model’s posterior classification quality. These summaries include a confusion matrix as well as estimates of the model’s sensitivity, specificity, and overall accuracy. The cutoff argument represents the probability cutoff to classify a new case as positive.
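The role of cutoff can be shown directly: given each case's posterior probability of Y = 1, classify it as 1 when that probability meets the cutoff, then tabulate classifications against the observed y. The probabilities below are made up for illustration; classification_summary() obtains them from the model's posterior:

```r
y_obs <- c(0, 0, 0, 1, 1, 1)                    # observed binary outcomes
p_hat <- c(0.10, 0.40, 0.65, 0.30, 0.80, 0.95)  # hypothetical posterior P(Y = 1)

cutoff  <- 0.5
y_class <- as.numeric(p_hat >= cutoff)          # classify as 1 above the cutoff

confusion <- table(y = y_obs, classification = y_class)

sensitivity <- mean(y_class[y_obs == 1] == 1)   # true positive rate: 2/3
specificity <- mean(y_class[y_obs == 0] == 0)   # true negative rate: 2/3
overall     <- mean(y_class == y_obs)           # overall accuracy: 4/6
```

Raising the cutoff trades sensitivity for specificity: fewer cases are classified as positive, so fewer true positives are caught but fewer false positives are made.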
# Data generation
x <- rnorm(20)
z <- 3*x
prob <- 1/(1+exp(-z))
y <- rbinom(20, 1, prob)
example_data <- data.frame(x = x, y = y)
# rstanreg model
example_model <- rstanarm::stan_glm(y ~ x, data = example_data,
                                    family = binomial, refresh = FALSE)
# Classification Summary
classification_summary(model = example_model, data = example_data, cutoff = 0.5)
$confusion_matrix
    0 1
y 0 9 1
  1 2 8

$accuracy_rates
sensitivity      0.80
specificity      0.90
overall_accuracy 0.85
The classification_summary_cv() function returns the same measures, but as cross-validated estimates. The k argument represents the number of folds to use for cross-validation.
classification_summary_cv(model = example_model, data = example_data, k = 2, cutoff = 0.5)
$folds
  fold sensitivity specificity overall_accuracy
1    1         0.6         1.0              0.8
2    2         1.0         0.8              0.9

$cv
  sensitivity specificity overall_accuracy
1         0.8         0.9             0.85
Given a set of observed data including a categorical response variable y and a naiveBayes model of y, the naive_classification_summary() function returns summaries of the model’s posterior classification quality. These summaries include a confusion matrix as well as an estimate of the model’s overall accuracy.
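Underlying a naiveBayes model is Bayes' Rule: the posterior probability of each class is proportional to its prior probability times the likelihood of the observed predictor, with a Normal likelihood assumed for a quantitative predictor. A hand-rolled sketch, using hypothetical class means, standard deviations, and priors for bill length:

```r
# Hypothetical class-conditional Normal parameters for bill_length_mm
classes <- c("Adelie", "Chinstrap", "Gentoo")
mu      <- c(Adelie = 38.8, Chinstrap = 48.8, Gentoo = 47.5)
sigma   <- c(Adelie = 2.7,  Chinstrap = 3.3,  Gentoo = 3.1)
prior   <- c(Adelie = 0.44, Chinstrap = 0.20, Gentoo = 0.36)

bill <- 50  # a new penguin's bill length (mm)

# Bayes' Rule: posterior is proportional to prior * likelihood; normalize
unnorm    <- prior * dnorm(bill, mean = mu, sd = sigma)
posterior <- unnorm / sum(unnorm)

# Classify as the species with the highest posterior probability
classification <- names(which.max(posterior))
```

With these made-up parameters, a 50 mm bill is classified as Gentoo: the Chinstrap likelihood is slightly higher, but the Gentoo prior outweighs it.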
# Data
data(penguins_bayes, package = "bayesrules")

# naiveBayes model
example_model <- e1071::naiveBayes(species ~ bill_length_mm, data = penguins_bayes)
# Naive Classification Summary
naive_classification_summary(model = example_model, data = penguins_bayes, y = "species")
$confusion_matrix
species      Adelie       Chinstrap Gentoo
  Adelie     95.39% (145) 0.00% (0)  4.61% (7)
  Chinstrap   5.88% (4)   8.82% (6) 85.29% (58)
  Gentoo      6.45% (8)   4.84% (6) 88.71% (110)

$overall_accuracy
[1] 0.7587209
Similarly, naive_classification_summary_cv() returns the cross-validated confusion matrix. The k argument represents the number of folds to use for cross-validation.
naive_classification_summary_cv(model = example_model, data = penguins_bayes,
y = "species", k = 2)
$folds
  fold    Adelie Chinstrap    Gentoo overall_accuracy
1    1 0.9634146   0.09375 0.8965517        0.7790698
2    2 0.9428571   0.00000 0.9242424        0.7383721

$cv
species      Adelie       Chinstrap Gentoo
  Adelie     95.39% (145) 0.00% (0)  4.61% (7)
  Chinstrap   5.88% (4)   4.41% (3) 89.71% (61)
  Gentoo      6.45% (8)   2.42% (3) 91.13% (113)