For Bayesian model evaluation, the bayesrules package has three functions,
prediction_summary(), classification_summary(), and
naive_classification_summary(), along with their respective cross-validation
counterparts prediction_summary_cv(), classification_summary_cv(), and
naive_classification_summary_cv().
| Functions | Response | Model |
|---|---|---|
| prediction_summary(), prediction_summary_cv() | Quantitative | rstanreg |
| classification_summary(), classification_summary_cv() | Binary | rstanreg |
| naive_classification_summary(), naive_classification_summary_cv() | Categorical | naiveBayes |
Given a set of observed data including a quantitative response
variable y and an rstanreg model of y, prediction_summary()
returns 4 measures of the model's posterior prediction quality:

1. Median absolute prediction error (mae) measures the typical difference between the observed y values and their posterior predictive medians (stable = TRUE) or means (stable = FALSE).
2. Scaled mae (mae_scaled) measures the typical number of absolute deviations (stable = TRUE) or standard deviations (stable = FALSE) that observed y values fall from their predictive medians (stable = TRUE) or means (stable = FALSE).
3. and 4. within_50 and within_90 report the proportion of observed y values that fall within their posterior prediction intervals, the probability levels of which are set by the user.

Although 50% and 90% are the defaults for the posterior prediction intervals,
these probability levels can be changed with the prob_inner and
prob_outer arguments. The example below shows the 60% and 80% posterior
prediction intervals.
library(bayesrules)

# Data generation
example_data <- data.frame(x = sample(1:100, 20))
example_data$y <- example_data$x * 3 + rnorm(20, 0, 5)

# rstanreg model
example_model <- rstanarm::stan_glm(y ~ x, data = example_data, refresh = FALSE)

# Prediction summary
prediction_summary(example_model, example_data,
                   prob_inner = 0.6, prob_outer = 0.80,
                   stable = TRUE)
mae mae_scaled within_60 within_80
1 3.710897   0.936282       0.6      0.85

Similarly, prediction_summary_cv() returns the 4 cross-validated measures of a
model's posterior prediction quality for each fold as well as a pooled result.
The k argument represents the number of folds to use for cross-validation.
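For illustration, a cross-validated summary of the model above could be obtained as follows. This is a minimal sketch that reuses example_model and example_data, assumes prediction_summary_cv() takes the same model and data arguments as prediction_summary(), and uses k = 2 folds only to keep the example fast.

# Cross-validated prediction summary with 2 folds (sketch)
prediction_summary_cv(model = example_model, data = example_data, k = 2)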
Given a set of observed data including a binary response variable y
and an rstanreg model of y, the classification_summary()
function returns summaries of the model’s posterior classification
quality. These summaries include a confusion matrix as
well as estimates of the model’s sensitivity,
specificity, and overall accuracy. The
cutoff argument represents the probability cutoff to
classify a new case as positive.
# Data generation
x <- rnorm(20)
z <- 3*x
prob <- 1/(1+exp(-z))
y <- rbinom(20, 1, prob)
example_data <- data.frame(x = x, y = y)
# rstanreg model
example_model <- rstanarm::stan_glm(y ~ x, data = example_data,
                                    family = binomial, refresh = FALSE)

# Classification summary
classification_summary(model = example_model, data = example_data, cutoff = 0.5)
$confusion_matrix
y 0 1
0 6 3
1 1 10
$accuracy_rates
sensitivity 0.9090909
specificity 0.6666667
overall_accuracy 0.8000000

The classification_summary_cv() function returns the same measures but for
cross-validated estimates. The k argument represents the number of folds to
use for cross-validation.
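As an illustration, a two-fold cross-validated classification summary could be requested as below. This is a minimal sketch that reuses example_model and example_data from above and assumes classification_summary_cv() accepts the same model, data, and cutoff arguments alongside k.

# Cross-validated classification summary with 2 folds (sketch)
classification_summary_cv(model = example_model, data = example_data,
                          cutoff = 0.5, k = 2)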
Given a set of observed data including a categorical response
variable y and a naiveBayes model of y, the
naive_classification_summary() function returns summaries
of the model’s posterior classification quality. These summaries include
a confusion matrix as well as an estimate of the
model’s overall accuracy.
# Data
data(penguins_bayes, package = "bayesrules")
# naiveBayes model
example_model <- e1071::naiveBayes(species ~ bill_length_mm, data = penguins_bayes)
# Naive Classification Summary
naive_classification_summary(model = example_model, data = penguins_bayes,
                             y = "species")
$confusion_matrix
species Adelie Chinstrap Gentoo
Adelie 95.39% (145) 0.00% (0) 4.61% (7)
Chinstrap 5.88% (4) 8.82% (6) 85.29% (58)
Gentoo 6.45% (8) 4.84% (6) 88.71% (110)
$overall_accuracy
[1] 0.7587209

Similarly, naive_classification_summary_cv() returns the cross-validated
confusion matrix. The k argument represents the number of folds to use for
cross-validation.
naive_classification_summary_cv(model = example_model, data = penguins_bayes,
                                y = "species", k = 2)
$folds
fold Adelie Chinstrap Gentoo overall_accuracy
1 1 0.9864865 0.34482759 0.7826087 0.7965116
2 2 0.8974359 0.05128205 0.9272727 0.7151163
$cv
species Adelie Chinstrap Gentoo
Adelie 94.08% (143) 0.00% (0) 5.92% (9)
Chinstrap 5.88% (4) 17.65% (12) 76.47% (52)
Gentoo 8.87% (11) 6.45% (8) 84.68% (105)