This vignette provides a brief overview of the classification metrics in {SLmetrics}. The classification interface is broadly divided into two methods: foo.cmatrix() and foo.factor(). The former calculates the metric from a confusion matrix, while the latter calculates the same metric from two vectors: a vector of actual values and a vector of predicted values. Both are vectors of [factor] values.
Throughout this vignette, the following data will be used:
# 1) seed
set.seed(1903)

# 2) actual values
actual <- factor(
  x = sample(c("A", "B", "C"), size = 10, replace = TRUE)
)

# 3) predicted values
predicted <- factor(
  x = sample(c("A", "B", "C"), size = 10, replace = TRUE)
)

# 4) sample weights
weights <- runif(
  n = length(actual)
)
Assume that the predicted values come from a trained machine learning model. This vignette introduces a subset of the metrics available in {SLmetrics}; see the online documentation for more details and other metrics.
The accuracy of the model can be evaluated using the accuracy() function as follows:
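A minimal sketch, assuming {SLmetrics} is attached and using the actual and predicted vectors defined above:

```r
# 1) calculate accuracy from the two factor vectors
accuracy(
  actual = actual,
  predicted = predicted
)
#> [1] 0.3
```

The value agrees with the diagonal of the confusion matrix shown further below: 3 of 10 predictions are correct.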
Many classification metrics have different names yet compute the same underlying value. For example, recall is also known as the true positive rate or sensitivity. These metrics can be calculated as follows:
# 1) calculate recall
recall(
  actual = actual,
  predicted = predicted
)
#>         A         B         C 
#> 0.3333333 0.2500000 0.3333333

# 2) calculate sensitivity
sensitivity(
  actual = actual,
  predicted = predicted
)
#>         A         B         C 
#> 0.3333333 0.2500000 0.3333333

# 3) calculate true positive rate
tpr(
  actual = actual,
  predicted = predicted
)
#>         A         B         C 
#> 0.3333333 0.2500000 0.3333333
By default, all classification functions calculate the class-wise performance metrics where possible. The performance metrics can also be aggregated into micro and macro averages using the micro parameter:
# 1) macro average
recall(
  actual = actual,
  predicted = predicted,
  micro = FALSE
)
#> [1] 0.3055556

# 2) micro average
recall(
  actual = actual,
  predicted = predicted,
  micro = TRUE
)
#> [1] 0.3
Calculating multiple performance metrics using separate calls to foo.factor() can be inefficient because each function reconstructs the underlying confusion matrix. A more efficient approach is to construct the confusion matrix once and then pass it to your chosen metric function. To do this, you can use the cmatrix() function:
# 1) confusion matrix
confusion_matrix <- cmatrix(
  actual = actual,
  predicted = predicted
)

# 2) summarise confusion matrix
summary(
  confusion_matrix
)
#> Confusion Matrix (3 x 3)
#> ================================================================================
#>   A B C
#> A 1 0 2
#> B 1 1 2
#> C 1 1 1
#> ================================================================================
#> Overall Statistics (micro average)
#>  - Accuracy:          0.30
#>  - Balanced Accuracy: 0.31
#>  - Sensitivity:       0.30
#>  - Specificity:       0.65
#>  - Precision:         0.30
Now you can pass the confusion matrix directly into the metric functions:
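A sketch of this, assuming {SLmetrics} is attached and using the confusion_matrix object created above; the foo.cmatrix() methods return the same values as their foo.factor() counterparts:

```r
# 1) calculate accuracy from the confusion matrix
accuracy(confusion_matrix)
#> [1] 0.3

# 2) calculate recall from the confusion matrix
recall(confusion_matrix)
#>         A         B         C 
#> 0.3333333 0.2500000 0.3333333
```
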
Weighted classification metrics can be calculated using the weighted.foo method, which has an interface similar to the unweighted versions above. Below is an example showing how to compute a weighted version of recall:
# 1) calculate weighted recall
weighted.recall(
  actual = actual,
  predicted = predicted,
  w = weights
)
#>         A         B         C 
#> 0.3359073 0.3027334 0.4245202

# 2) calculate weighted sensitivity
weighted.sensitivity(
  actual = actual,
  predicted = predicted,
  w = weights
)
#>         A         B         C 
#> 0.3359073 0.3027334 0.4245202

# 3) calculate weighted true positive rate
weighted.tpr(
  actual = actual,
  predicted = predicted,
  w = weights
)
#>         A         B         C 
#> 0.3359073 0.3027334 0.4245202
A small disclaimer applies to weighted metrics: it is not possible to pass a weighted confusion matrix directly into a weighted.foo() method. Consider the following example:
# 1) calculate weighted confusion matrix
weighted_confusion_matrix <- weighted.cmatrix(
  actual = actual,
  predicted = predicted,
  w = weights
)

# 2) calculate weighted accuracy
try(
  weighted.accuracy(weighted_confusion_matrix)
)
#> Error in UseMethod(generic = "weighted.accuracy", object = ..1) : 
#>   no applicable method for 'weighted.accuracy' applied to an object of class "cmatrix"
This approach throws an error. Instead, pass the weighted confusion matrix into the unweighted function that uses a confusion matrix interface (i.e., foo.cmatrix()). For example:
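A sketch, assuming the weighted_confusion_matrix object created above; since the weights are already baked into the matrix, the plain confusion-matrix method yields the weighted metric:

```r
# 1) weighted accuracy via the confusion matrix interface
accuracy(weighted_confusion_matrix)
```
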
This returns the same weighted accuracy as if it were calculated directly:
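For comparison, a sketch of the direct calculation, assuming weighted.accuracy's factor interface mirrors that of weighted.recall shown above:

```r
# 1) calculate weighted accuracy directly from the vectors and weights
weighted.accuracy(
  actual = actual,
  predicted = predicted,
  w = weights
)
```
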