The `prcbench` package is a testing workbench for evaluating precision-recall curves. It requires a simple three-step process to evaluate tools that create precision-recall plots.

1. Tool selection by using the tool interface
2. Test data selection/creation by using the test data interface
   - Select predefined test data for the accuracy evaluation
   - Define randomly generated test data for the running-time evaluation
3. Run an evaluation function with the selected tools and test data sets
   - Accuracy evaluation of precision-recall curves
   - Running-time evaluation of precision-recall curves
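A minimal end-to-end sketch of these three steps, using the predefined tool interfaces and the predefined c1 test set described in the sections below:

```r
library(prcbench)

## Step 1: select tools via the tool interface
toolset <- create_toolset(c("ROCR", "precrec"))

## Step 2: select predefined test data for the accuracy evaluation
testset <- create_testset("curve", "c1")

## Step 3: run the accuracy evaluation
scores <- run_evalcurve(testset, toolset)
scores
```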
In addition to the predefined tools and test data sets, the `prcbench` package provides helper functions for users to define their own tools and test data sets.

- User-defined tool interface
- User-defined test data interface
  - User-defined test data for the accuracy evaluation
  - User-defined test data for the running-time evaluation
The `prcbench` package provides predefined interfaces for the following five tools that calculate precision-recall curves.
Tool | Language | Link |
---|---|---|
precrec | R | Tool web site, CRAN |
ROCR | R | Tool web site, CRAN |
PRROC | R | CRAN |
AUCCalculator | Java | Tool web site |
PerfMeas | R | CRAN |
The `create_toolset` function generates a tool set with a combination of the five tools.
```r
library(prcbench)

## A single tool
toolsetA <- create_toolset("ROCR")

## Multiple tools
toolsetB <- create_toolset(c("PerfMeas", "PRROC"))

## Tool sets can be manually combined to a single set
toolsetAB <- c(toolsetA, toolsetB)
```
The `create_toolset` function takes two additional arguments: `calc_auc` and `store_res`.

- `calc_auc` decides whether tools calculate AUC scores (AUC calculation is optional for the running-time evaluation but not necessary for the accuracy evaluation of precision-recall curves).
- `store_res` decides whether tools store the calculated curves (the actual curves are required for the accuracy evaluation of precision-recall curves).
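For instance, a tool set intended only for running-time benchmarking can calculate AUCs without storing the curves. This is only a sketch; see `help(create_toolset)` for the exact defaults of these arguments.

```r
## A tool set that calculates AUCs but does not store the curves
## (sufficient for the running-time evaluation, but not for the accuracy evaluation)
toolsetAUC <- create_toolset("precrec", calc_auc = TRUE, store_res = FALSE)
```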
The following six tool sets are predefined, each with a different combination of tools and values for these arguments.
Set name | Tools | calc_auc | store_res |
---|---|---|---|
def5 | ROCR, AUCCalculator, PerfMeas, PRROC, precrec | TRUE | TRUE |
auc5 | ROCR, AUCCalculator, PerfMeas, PRROC, precrec | TRUE | FALSE |
crv5 | ROCR, AUCCalculator, PerfMeas, PRROC, precrec | FALSE | TRUE |
def4 | ROCR, AUCCalculator, PerfMeas, precrec | TRUE | TRUE |
auc4 | ROCR, AUCCalculator, PerfMeas, precrec | TRUE | FALSE |
crv4 | ROCR, AUCCalculator, PerfMeas, precrec | FALSE | TRUE |
```r
## Use 'set_names'
toolsetC <- create_toolset(set_names = "auc5")

## Multiple sets are automatically combined to a single set
toolsetD <- create_toolset(set_names = c("auc5", "crv4"))
```
The `prcbench` package provides two different types of test data sets.

- `curve`: evaluates the accuracy of precision-recall curves
- `bench`: measures running times of creating precision-recall curves

The `create_testset` function offers both types of test data by setting the first argument as either "curve" or "bench".
The `create_testset` function takes predefined set names for curve evaluation. These data sets contain pre-calculated precision and recall values, which serve as the reference against which the results of the specified tools are compared. The following four test sets are currently available.
name | #scores&labels | #pos labels | #neg labels | expected #points | expected start | expected end |
---|---|---|---|---|---|---|
c1 | 4 | 2 | 2 | 6 | (0, 1) | (1, 0.5) |
c2 | 4 | 2 | 2 | 6 | (0, 0.5) | (1, 0.5) |
c3 | 4 | 2 | 2 | 6 | (0, 0) | (1, 0.5) |
c4 | 8 | 4 | 4 | 9 | (0, 1) | (1, 0.5) |
```r
## C1 test set
testset2A <- create_testset("curve", "c1")

## C2 test set
testset2B <- create_testset("curve", "c2")

## Test data sets can be manually combined to a single set
testset2AB <- c(testset2A, testset2B)

## Multiple sets are automatically combined to a single set
testset2C <- create_testset("curve", c("c1", "c2"))
```
The `create_testset` function uses a naming convention for randomly generated benchmark data. The format is a prefix ('b' or 'i') followed by the total number of samples in the dataset. The prefix 'b' indicates a balanced dataset, whereas 'i' indicates an imbalanced dataset. The number can take a suffix 'k' or 'm', indicating thousands or millions, respectively.
```r
## A balanced data set with 50 positives and 50 negatives
testset1A <- create_testset("bench", "b100")

## An imbalanced data set with 2500 positives and 7500 negatives
testset1B <- create_testset("bench", "i10k")

## Test data sets can be manually combined to a single set
testset1AB <- c(testset1A, testset1B)

## Multiple sets are automatically combined to a single set
testset1C <- create_testset("bench", c("i10", "b10"))
```
The `prcbench` package currently provides two different types of performance evaluation.

- Accuracy evaluation of precision-recall curves
- Running-time evaluation of precision-recall curves
The `run_evalcurve` function evaluates precision-recall curves with the following five test cases. The basic idea is that the function returns the full score as long as the points generated by a tool match the manually calculated recall and precision values.
Test case | Description |
---|---|
fpoint | Check the first point |
int_pts | Check the intermediate points |
epoint | Check the end point |
x_range | Evaluate a range between two recall values |
y_range | Evaluate a range between two precision values |
The `run_evalcurve` function calculates the scores of the test cases and summarizes them in a data frame.
```r
## Evaluate precision-recall curves for ROCR and precrec with the c1 test set
testset <- create_testset("curve", "c1")
toolset <- create_toolset(c("ROCR", "precrec"))
scores <- run_evalcurve(testset, toolset)
scores
##   testset toolset toolname score
## 1      c1    ROCR     ROCR   5/8
## 2      c1 precrec  precrec   8/8
```
The result of each test case can be displayed by specifying `data_type = "all"` in the `print` function.
```r
## Print all results
print(scores, data_type = "all")
##    testset toolset toolname testitem testcat success total
## 1       c1    ROCR     ROCR  x_range      Rg       1     1
## 2       c1    ROCR     ROCR  y_range      Rg       1     1
## 3       c1    ROCR     ROCR   fpoint      SE       0     1
## 4       c1    ROCR     ROCR   intpts      Ip       2     4
## 5       c1    ROCR     ROCR   epoint      SE       1     1
## 6       c1 precrec  precrec  x_range      Rg       1     1
## 7       c1 precrec  precrec  y_range      Rg       1     1
## 8       c1 precrec  precrec   fpoint      SE       1     1
## 9       c1 precrec  precrec   intpts      Ip       4     4
## 10      c1 precrec  precrec   epoint      SE       1     1
```
The `autoplot` function shows a plot with the result of the `run_evalcurve` function.
```r
## ggplot2 is necessary to use autoplot
library(ggplot2)

## Plot base points and the result of precrec on c1, c2, and c3 test sets
testset <- create_testset("curve", c("c1", "c2", "c3"))
toolset <- create_toolset("precrec")
scores1 <- run_evalcurve(testset, toolset)
autoplot(scores1)

## Plot the results of PerfMeas and PRROC on c1, c2, and c3 test sets
toolset <- create_toolset(c("PerfMeas", "PRROC"))
scores2 <- run_evalcurve(testset, toolset)
autoplot(scores2, base_plot = FALSE)
```
The `run_benchmark` function internally calls the `microbenchmark` function provided by the microbenchmark package. It takes a test set and a tool set and returns the result of `microbenchmark`.
```r
## Run microbenchmark for auc5 on b10
testset <- create_testset("bench", "b10")
toolset <- create_toolset(set_names = "auc5")
res <- run_benchmark(testset, toolset)
res
##   testset toolset      toolname   min    lq mean median   uq   max neval
## 1     b10    auc5 AUCCalculator 1.883 2.238 4.78  2.733 3.41 13.62     5
## 2     b10    auc5         PRROC 0.137 0.140 0.17  0.145 0.15  0.26     5
## 3     b10    auc5      PerfMeas 0.057 0.059 0.10  0.077 0.12  0.21     5
## 4     b10    auc5          ROCR 1.537 1.543 1.62  1.547 1.58  1.90     5
## 5     b10    auc5       precrec 3.558 3.627 3.73  3.630 3.79  4.03     5
```
In addition to the five predefined tools, users can add new tool interfaces for their own tools to run benchmarking and curve evaluation. The `create_usrtool` function takes the name of the tool and a function for calculating a precision-recall curve.
```r
## Create a new tool set for 'xyz'
toolname <- "xyz"
calcfunc <- create_example_func()
toolsetU <- create_usrtool(toolname, calcfunc)

## User-defined tools can be combined with predefined tools
toolsetA <- create_toolset("ROCR")
toolsetU2 <- c(toolsetA, toolsetU)
```
Like the predefined tool sets, user-defined tool sets can be used for both `run_benchmark` and `run_evalcurve`.
```r
## Curve evaluation
testset3 <- create_testset("curve", "c2")
scores3 <- run_evalcurve(testset3, toolsetU2)
autoplot(scores3, base_plot = FALSE)
```
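The same user-defined tool set can also be passed to `run_benchmark`. A brief sketch using the predefined b10 benchmark data set (the variable names here are only illustrative):

```r
## Running-time evaluation with the user-defined tool set
testset4 <- create_testset("bench", "b10")
res4 <- run_benchmark(testset4, toolsetU2)
res4
```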
The `create_example_func` function creates an example for the second argument of the `create_usrtool` function. The actual function should take a `testset` generated by the `create_testset` function and return a list with three elements: `x`, `y`, and `auc`.
```r
## Show an example of the second argument
calcfunc <- create_example_func()
print(calcfunc)
## function (single_testset)
## {
##     scores <- single_testset$get_scores()
##     list(x = seq(0, 1, 1/length(scores)), y = seq(0, 1, 1/length(scores)),
##         auc = 0.5)
## }
## <bytecode: 0x5612b4b88920>
## <environment: 0x5612b5cea558>
```
The `create_testset` function produces a `testset` as either a `TestDataB` or a `TestDataC` object. See the help files of these R6 classes, `help(TestDataB)` and `help(TestDataC)`, for the methods that can be used in the precision-recall calculation.
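As an illustration, a user-defined calculation function might compute the curve itself instead of returning fixed values. The sketch below is only an assumption-laden example: it uses `get_scores()` (shown in the printed example above) and assumes the test set object also exposes a `get_labels()` method; check the `TestDataB`/`TestDataC` help files for the methods that actually exist.

```r
## A hypothetical user-defined calculation function.
## Assumes the testset object provides get_scores() and get_labels();
## see help(TestDataB) for the methods actually available.
my_calcfunc <- function(single_testset) {
  scores <- single_testset$get_scores()
  labels <- single_testset$get_labels()

  ## Sort by decreasing score and accumulate true/false positives
  ord <- order(scores, decreasing = TRUE)
  labels <- labels[ord]
  tp <- cumsum(labels == 1)
  fp <- cumsum(labels != 1)
  recall <- tp / sum(labels == 1)
  precision <- tp / (tp + fp)

  ## Return the three required elements: x (recall), y (precision), and auc
  ## (the AUC here is a rough rectangular approximation)
  list(x = recall, y = precision,
       auc = sum(diff(c(0, recall)) * precision))
}

## Use it in place of the example function
toolsetV <- create_usrtool("myTool", my_calcfunc)
```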
The `prcbench` package also supports user-defined test data interfaces. The `create_usrdata` function creates two types of test datasets.

- User-defined test data for the accuracy evaluation
- User-defined test data for the running-time evaluation
The first argument of the `create_usrdata` function should be "curve" to create a test dataset for the accuracy evaluation. Scores and labels as well as pre-calculated recall and precision values are required. These pre-calculated values are compared with the corresponding values produced by the specified tools.
```r
## Create a test dataset 'c5' for curve evaluation
testsetC <- create_usrdata("curve",
  scores = c(0.1, 0.2), labels = c(1, 0),
  tsname = "c5", base_x = c(0.0, 1.0),
  base_y = c(0.0, 0.5)
)
```
It can be used in the same way as the predefined test datasets selected by `create_testset`.
```r
## Run curve evaluation for ROCR and precrec on the user-defined test dataset
toolset2 <- create_toolset(c("ROCR", "precrec"))
scores2 <- run_evalcurve(testsetC, toolset2)
autoplot(scores2, base_plot = FALSE)
```
The first argument of the `create_usrdata` function should be "bench" to create a test dataset for the running-time evaluation. Scores and labels are also required.
```r
## Create a test dataset 'b5' for benchmarking
testsetB <- create_usrdata("bench",
  scores = c(0.1, 0.2), labels = c(1, 0),
  tsname = "b5"
)
```
It can be used in the same way as the test datasets generated by `create_testset`.
```r
## Run microbenchmark for ROCR and precrec on the user-defined test dataset
toolset <- create_toolset(c("ROCR", "precrec"))
res <- run_benchmark(testsetB, toolset)
res
##   testset toolset toolname min  lq mean median  uq max neval
## 1      b5    ROCR     ROCR 1.5 1.6  1.6    1.6 1.7 1.8     5
## 2      b5 precrec  precrec 3.6 3.7  3.8    3.8 3.9 4.3     5
```
See our website, Classifier evaluation with imbalanced datasets, for useful tips on performance evaluation of binary classifiers. In addition, we have summarized potential pitfalls of ROC plots with imbalanced datasets. See our paper, "The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets", for more details.