Introduction

About ROC Curves

The Receiver Operating Characteristic (ROC) curve is used to assess the accuracy of a continuous measurement for predicting a binary outcome. In medicine, ROC curves have a long history of use for evaluating diagnostic tests in radiology and general diagnostics. ROC curves have also been used for a long time in signal detection theory.

The accuracy of a diagnostic test can be evaluated by considering the two possible types of errors: false positives, and false negatives. For a continuous measurement that we denote as \(M\), convention dictates that a test positive is defined as \(M\) exceeding some fixed threshold \(c\): \(M > c\). In reference to the binary outcome that we denote as \(D\), a good outcome of the test is when the test is positive among an individual who truly has a disease: \(D = 1\). A bad outcome is when the test is positive among an individual who does not have the disease \(D = 0\).

Formally, for a fixed cutoff \(c\), the true positive fraction is the probability of a test positive among the diseased population:

\[ TPF(c) = P\{ M > c | D = 1 \} \]

and the false positive fraction is the probability of a test positive among the healthy population:

\[ FPF(c) = P\{ M > c | D = 0 \} \]

Since the cutoff \(c\) is not usually fixed in advance, we can plot the TPF against the FPF for all possible values of \(c\). This is exactly what the ROC curve is, \(FPF(c)\) on the \(x\) axis and \(TPF(c)\) along the \(y\) axis.

Motivation

In the medical literature, ROC curves are commonly plotted without the cutoff values displayed. Other problems with ROC curve plots are abundant in the medical literature. We aim to solve some of these problems by providing a plotting interface for the ROC curve that comes with sensible defaults. It is easy to create interactive ROC curves for local or web-based use. The next section details the usage of the plotROC package.

Usage

Shiny application

I created a shiny application in order to make the features more accessible to non-R users. A limited subset of the functions of the plotROC package can be performed on an example dataset or on data that users upload to the website. Resulting plots can be saved to the users’ machine as a pdf or as a stand-alone html file. It can be used in any modern web browser with no other dependencies at the website here: http://sachsmc.shinyapps.io/plotROC.

Installation and loading

plotROC can be installed from the CRAN, or installed from github.

install.packages("plotROC")
devtools::install_github("sachsmc/plotROC")
library(plotROC)

Quick start

After installing, the interactive Shiny application can be run locally.

shiny_plotROC()

Command line basic usage

I start by creating an example data set. The marker I generate is moderately accurate for predicting disease status.

D.ex <- rbinom(100, size = 1, prob = .5)
M.ex <- rnorm(100, mean = D.ex)

Next I use the calculate_roc function to compute the empirical ROC curve. The disease status need not be coded as 0/1, but if it is not, calculate_roc assumes (with a warning) that the lowest value in sort order signifies disease-free status. This returns a dataframe with three columns: the cutoff values, the TPF and the FPF.

roc.estimate <- calculate_roc(M.ex, D.ex)
str(roc.estimate)
## 'data.frame':    100 obs. of  3 variables:
##  $ c  : num  -2.62 -1.85 -1.64 -1.55 -1.54 ...
##  $ TPF: num  1 1 1 1 1 ...
##  $ FPF: num  0.982 0.964 0.946 0.929 0.911 ...

The rocdata is passed to the ggroc function with an optional label. This creates a ggplot object of the ROC curve using the ggplot2 package.

single.rocplot <- ggroc(roc.estimate, label = "Example")

The myrocplot object can be used to create an interactive plot and display it in the Rstudio viewer or default web browser by passing it to the plot_interactive_roc function. Give the function an optional path to an html file as an argument called file to save the interactive plot as a complete web page. Hovering over the display shows the cutoff value at the point nearest to the cursor. Clicking makes the cutoff label stick until the next click, and if confidence regions are available, clicks will also display those as grey rectangles.

plot_interactive_roc(single.rocplot)

An interactive ROC plot can be exported by using the export_interactive_roc function, which returns a character string containing the necessary HTML and JavaScript. The character string can be copy-pasted into an html document, or better yet, incorporated directly into a dynamic document using knitr (knitr homepage).

In a knitr document, it is necessary to use the cat function on the results and use the chunk options results = 'asis' and fig.keep='none' so that the interactive plot is displayed correctly. For documents that contain multiple interactive plots, it is necessary to assign each plot a unique name using the prefix argument of export_interactive_roc. This is necessary to ensure that the JavaScript code manipulates the correct svg elements. The next code block shows an example knitr chunk that can be used in an .Rmd document to display an interactive plot.

```{r int-no, fig.keep='none', results = 'asis'}
cat(
  export_interactive_roc(single.rocplot, 
                        prefix = "a")
  )
```

The result is shown below:

cat(
  export_interactive_roc(single.rocplot, 
                        prefix = "a")
  )
Example 0.00 0.10 0.25 0.50 0.75 0.90 1.00 0.00 0.10 0.25 0.50 0.75 0.90 1.00 False positive fraction True positive fraction

The same ggroc object that I called myrocplot can be used to generate an ROC plot suitable for use in print. It annotates the cutoff values and is completely in black and white. A simple example with the default options is shown below.

plot_journal_roc(single.rocplot)

Illustration of ROC curve plot generated by plotROC for use in print.

Multiple ROC curves

If you have multiple tests of different types measured on the same subjects, you can use the calculate_multi_roc function to compute the empirical ROC curve for each test. It returns a list of data frames with the estimates and cutoff values. Then the multi_ggroc function creates the appropriate type of ggplot object. Confidence regions are not supported for multiple curves at the time of writing.

D.ex <- rbinom(100, 1, .5)

paired.data <- data.frame(M1 = rnorm(100, mean = D.ex), 
                       M2 = rnorm(100, mean = D.ex, sd = .4), 
                       M3 = runif(100), D = D.ex)

estimate.list <- calculate_multi_roc(paired.data, c("M1", "M2", "M3"), "D")

Labels can be added easily with the label option of multi_ggroc. The length of the label element should match the number of plotted curves. The multi_ggroc object can be passed to the plot_journal_roc and the export_interactive_roc functions, as desired. The resulting plot is shown next.

multi.rocplot <- multi_ggroc(estimate.list, label = c("M1", "M2", "M3"))
plot_journal_roc(multi.rocplot)

Illustration of plot with multiple curves.

Both plot_journal_roc and export_interactive_roc support a number of options to customize the look of the plots. By default, multiple curves are distinguished by different line types and direct labels. For multiple ROC curves that are similar to one another, those defaults can make it difficult to interpret the plots. Therefore, we also support colors and legends. The x- and y-axis can be changed by passing options to ggroc or multi_ggroc. The next code block illustrates the available options.

colorplot <- multi_ggroc(estimate.list, 
                         xlabel = "1 - Specificity", 
                         ylabel = "Sensitivity")
cat(
  export_interactive_roc(colorplot, lty = rep(1, 3), prefix = "multi3",
                         color = c("black", "purple", "orange"), 
                         legend = TRUE)
  )
0.00 0.10 0.25 0.50 0.75 0.90 1.00 0.00 0.10 0.25 0.50 0.75 0.90 1.00 1 - Specificity Sensitivity A B C

Advanced options

Click to view confidence region

I use the ci = TRUE option in calculcate_roc and ggroc to compute confidence regions for points on the ROC curve using the Clopper and Pearson (1934) exact method. Briefly, exact confidence intervals are calculated for the \(FPF\) and \(TPF\) separately, each at level \(1 - \sqrt{1 - \alpha}\). Based on result 2.4 from Pepe (2003), the cross-product of these intervals yields a \(100 * (1 - \alpha)\) percent rectangular confidence region for the pair. The significance level can be specified using the alpha option.

roc.ci <- calculate_roc(paired.data$M1, paired.data$D, ci = TRUE, alpha = 0.05)
ci.rocplot <- ggroc(roc.ci, label = "CI Example", ci = TRUE)

For interactive plots, the confidence regions are automatically detected. When the user clicks on the ROC curve, the confidence region for the TPF and FPF is overlaid using a grey rectangle. The label and region stick until the next click.

cat(
  export_interactive_roc(ci.rocplot, 
                         prefix = "aci")
  )
CI Example 0.00 0.10 0.25 0.50 0.75 0.90 1.00 0.00 0.10 0.25 0.50 0.75 0.90 1.00 False positive fraction True positive fraction

For use in print, I pass a small vector of cutoff locations at which to display the confidence regions. This is shown in below.

plot_journal_roc(ci.rocplot, n.cuts = 10, 
                 ci.at = c(-.5, .5, 2.1))

Illustration of plot with exact confidence regions.

Themes and annotations

plotROC uses the ggplot2 package to create the objects to be plotted. Therefore, themes and annotations can be added in the usual ggplot2 way. A plot_journal_roc figure with a new theme, title, axis label, and AUC annotation is shown below.

library(ggplot2)
plot_journal_roc(ci.rocplot, n.cuts = 10, 
                 ci.at = c(-.5, .5, 2.1)) + 
  theme_grey() + 
  geom_abline(intercept = 0, slope = 1, color = "white") + 
  ggtitle("Themes and annotations") + 
  annotate("text", x = .75, y = .25, 
           label = "AUC = 0.80") +
  scale_x_continuous("1 - Specificity", breaks = seq(0, 1, by = .1))

Using ggplot2 themes and annotations with plotROC objects.

Other estimation methods

By default calculate_roc computes the empirical ROC curve. There are other estimation methods out there, as I have summarized in the introduction. Any estimation method can be used, as long as the cutoff, the TPF and the FPF are returned. Then you can simply pass those values in a data frame to the ggroc function. New in this latest verison is the ability to use the ROCR package directly.

library(ROCR)
D.ex <- rbinom(100, 1, .5)
M.ex <- rnorm(100, mean = D.ex, sd = .5)

rocr.est <- performance(prediction(M.ex, D.ex), "tpr", "fpr")
rocr.plot <- ggroc(rocr.est, label = "ROCR object")

cat(
  export_interactive_roc(rocr.plot, prefix = "rocr")
  )
ROCR object 0.00 0.10 0.25 0.50 0.75 0.90 1.00 0.00 0.10 0.25 0.50 0.75 0.90 1.00 False positive fraction True positive fraction

For another example, let us use the binormal method to create a smooth curve. This approach assumes that the test distribution is normal conditional on disease status.

mu1 <- mean(M.ex[D.ex == 1])
mu0 <- mean(M.ex[D.ex == 0])
s1 <- sd(M.ex[D.ex == 1])
s0 <- sd(M.ex[D.ex == 0])
c.ex <- seq(min(M.ex), max(M.ex), length.out = 300)

binorm.roc <- data.frame(c = c.ex, 
                             FPF = pnorm((mu0 - c.ex)/s0), 
                             TPF = pnorm((mu1 - c.ex)/s1)
                             )

Then I can pass this data.frame to the ggroc function as before. The example is shown in figure .

binorm.plot <- ggroc(binorm.roc, label = "Binormal")
cat(
  export_interactive_roc(binorm.plot, prefix = "binorm")
  )
Binormal 0.00 0.10 0.25 0.50 0.75 0.90 1.00 0.00 0.10 0.25 0.50 0.75 0.90 1.00 False positive fraction True positive fraction

Another potential use of this approach is for plotting time-dependent ROC curves for time-to-event outcomes estimated as desribed in Heagerty, Lumley, and Pepe (2000).

Acknowledgements

This package would not be possible without the following: