The receiver operating characteristic (ROC) curve is a widely used method for comparing binary classifiers in many fields. However, many real-world problems involve multiple classes (e.g., the tumor, node, and metastasis staging system for cancer), which calls for a strategy to evaluate multiclass classifiers. This package aims to fill that gap by calculating multiclass ROC-AUC with confidence intervals and generating publication-quality figures of multiclass ROC curves.
library(multiROC)
data(test_data)
head(test_data)
#> G1_true G2_true G3_true G1_pred_m1 G2_pred_m1 G3_pred_m1 G1_pred_m2
#> 1 1 0 0 0.8566867 0.1169520 0.02636133 0.4371601
#> 2 1 0 0 0.8011788 0.1505448 0.04827643 0.3075236
#> 3 1 0 0 0.8473608 0.1229815 0.02965766 0.3046363
#> 4 1 0 0 0.8157730 0.1422322 0.04199482 0.2378494
#> 5 1 0 0 0.8069553 0.1472971 0.04574766 0.4067347
#> 6 1 0 0 0.6894488 0.2033285 0.10722271 0.1063048
#> G2_pred_m2 G3_pred_m2
#> 1 0.1443851 0.41845482
#> 2 0.5930025 0.09947397
#> 3 0.4101367 0.28522698
#> 4 0.5566147 0.20553591
#> 5 0.2355822 0.35768312
#> 6 0.4800507 0.41364450
This example dataset contains predictions from two classifiers (m1, m2) for three groups (G1, G2, G3).
res <- multi_roc(test_data, force_diag = TRUE)
The function multi_roc is the core function for calculating multiclass ROC-AUC.
Arguments of multi_roc:
true_pred is the dataset that contains both the true labels and the corresponding predicted scores. True label columns (0 - Negative, 1 - Positive) should be named XX_true (e.g., S1_true, S2_true), and predicted score columns (continuous) should be named XX_pred_YY (e.g., S1_pred_SVM, S2_pred_RF). Predicted scores can be probabilities in [0, 1] or other continuous values. For each classifier, the number of score columns should equal the number of true-label groups.
If force_diag equals TRUE, the true positive rate (TPR) and false positive rate (FPR) curve will be forced to pass through (0, 0) and (1, 1).
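The naming convention above can be illustrated with a small, made-up dataset. This is only a sketch in base R: the group names (S1, S2, S3), the classifier label RF, and the random scores are all hypothetical and chosen purely to show the expected column layout.

```r
# Hypothetical toy data: three groups (S1, S2, S3) and one classifier labeled "RF".
set.seed(1)
n <- 6
labels <- factor(c("S1", "S2", "S3", "S1", "S2", "S3"))

# One-hot encode the true labels into XX_true columns (1 = positive, 0 = negative).
true_cols <- sapply(levels(labels), function(g) as.integer(labels == g))
colnames(true_cols) <- paste0(levels(labels), "_true")

# Made-up continuous scores, one XX_pred_YY column per group.
scores <- matrix(runif(n * 3), nrow = n)
scores <- scores / rowSums(scores)  # normalize each row so scores resemble probabilities
colnames(scores) <- paste0(levels(labels), "_pred_RF")

# Combine into a single data frame: true-label columns first, then score columns.
toy_data <- data.frame(true_cols, scores)
print(colnames(toy_data))
```

A data frame shaped like toy_data follows the layout described for true_pred, with one score column per true-label group for the single classifier.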
Outputs of multi_roc:
Specificity contains a list of specificities for each group under each classifier.
Sensitivity contains a list of sensitivities for each group under each classifier.
AUC contains a list of AUC values for each group under each classifier. The micro-average ROC/AUC is calculated by stacking all groups together, which converts the multiclass problem into a single binary classification. The macro-average ROC/AUC is calculated by averaging the results of all groups (one vs. rest), with linear interpolation between the points of each ROC curve.
Methods shows names of different classifiers.
Groups shows names of different groups.
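To make the difference between the two averages concrete, here is a minimal base-R sketch. It is not the package's implementation: binary_auc is a hypothetical helper using the rank-based (Mann-Whitney) AUC, the truth and score matrices are made up, and the macro value here is the plain mean of the one-vs-rest AUCs, whereas multi_roc averages interpolated curves, so the numbers can differ slightly.

```r
# Hypothetical helper: rank-based (Mann-Whitney) AUC for a binary problem.
# Tied scores receive average ranks, so a tie counts as half a correct ordering.
binary_auc <- function(truth, score) {
  n_pos <- sum(truth == 1)
  n_neg <- sum(truth == 0)
  r <- rank(score)
  (sum(r[truth == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up one-hot truth and score matrices: rows are samples, columns are groups.
truth <- rbind(c(1, 0, 0), c(1, 0, 0), c(0, 1, 0),
               c(0, 1, 0), c(0, 0, 1), c(0, 0, 1))
score <- rbind(c(0.7, 0.2, 0.1), c(0.4, 0.5, 0.1), c(0.3, 0.6, 0.1),
               c(0.2, 0.5, 0.3), c(0.1, 0.2, 0.7), c(0.3, 0.3, 0.4))

# One-vs-rest AUC for each group, then their unweighted mean (macro-style).
per_group <- sapply(seq_len(ncol(truth)),
                    function(j) binary_auc(truth[, j], score[, j]))
macro_auc <- mean(per_group)

# Micro-style: stack every column into one long binary problem.
micro_auc <- binary_auc(as.vector(truth), as.vector(score))

print(per_group)
print(c(macro = macro_auc, micro = micro_auc))
```

The macro-style value weights every group equally, while the micro-style value pools all sample-group pairs, so groups with more samples have more influence on it.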
unlist(res$AUC)
#> m1.G1 m1.G2 m1.G3 m1.macro m1.micro m2.G1 m2.G2
#> 0.7233607 0.5276190 0.9751462 0.7420609 0.8824221 0.3237705 0.3723810
#> m2.G3 m2.macro m2.micro
#> 0.4020468 0.3665670 0.4174394
This vector shows, for each classifier (m1 and m2), the AUC of each group (G1, G2, G3) followed by the macro-average and micro-average AUC.
# Statistic for boot::boot: recompute all AUCs on the resampled rows given by idx.
multi_roc_auc <- function(true_pred_data, idx) {
  results <- multi_roc(true_pred_data[idx, ])$AUC
  results <- unlist(results)
  return(results)
}
res_boot <- boot::boot(data=test_data, statistic=multi_roc_auc, R=1000)
res_boot_ci <- boot::boot.ci(res_boot, type='bca', index=4)
res_boot_ci
#> BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
#> Based on 1000 bootstrap replicates
#>
#> CALL :
#> boot::boot.ci(boot.out = res_boot, type = "bca", index = 4)
#>
#> Intervals :
#> Level BCa
#> 95% ( 0.6520, 0.8311 )
#> Calculations and Intervals on Original Scale
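As an example, we set index = 4 to calculate the 95% CI of the macro-average AUC of classifier m1, based on 1000 bootstrap replicates. The index refers to the position of the value in the vector returned by unlist(res$AUC), where m1.macro is the fourth element.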
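Rather than counting positions by hand, the index can be looked up by name. The sketch below is self-contained: the AUC values are copied from the printed output of unlist(res$AUC) above, while the nested list layout and the variable names auc_list, auc_vec, and macro_m1_index are assumptions made for illustration.

```r
# AUC values copied from the printed output of unlist(res$AUC);
# the nested list structure is reconstructed here for illustration only.
auc_list <- list(
  m1 = list(G1 = 0.7233607, G2 = 0.5276190, G3 = 0.9751462,
            macro = 0.7420609, micro = 0.8824221),
  m2 = list(G1 = 0.3237705, G2 = 0.3723810, G3 = 0.4020468,
            macro = 0.3665670, micro = 0.4174394)
)

# unlist() joins the nested names with ".", giving names such as "m1.macro".
auc_vec <- unlist(auc_list)

# Position of the macro-average AUC of classifier m1 in the flattened vector.
macro_m1_index <- which(names(auc_vec) == "m1.macro")
print(macro_m1_index)
```

In the bootstrap workflow, the same lookup on names(unlist(res$AUC)) should give the value to pass as index to boot::boot.ci, which avoids errors if the number of groups or classifiers changes.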
n_method <- length(unique(res$Methods))
n_group <- length(unique(res$Groups))
col_names <- c("Specificity", "Sensitivity", "Group", "AUC", "Method")

# Empty data frame that will collect one block of ROC points per curve.
res_df <- data.frame(Specificity = numeric(0), Sensitivity = numeric(0),
                     Group = character(0), AUC = numeric(0), Method = character(0))

for (i in seq_len(n_method)) {
  # Per-group (one vs. rest) curves for classifier i.
  for (j in seq_len(n_group)) {
    temp_data_1 <- data.frame(Specificity = res$Specificity[[i]][j],
                              Sensitivity = res$Sensitivity[[i]][j],
                              Group = unique(res$Groups)[j],
                              AUC = res$AUC[[i]][j],
                              Method = unique(res$Methods)[i])
    colnames(temp_data_1) <- col_names
    res_df <- rbind(res_df, temp_data_1)
  }

  # Macro-average curve: stored right after the per-group entries.
  temp_data_2 <- data.frame(Specificity = res$Specificity[[i]][n_group + 1],
                            Sensitivity = res$Sensitivity[[i]][n_group + 1],
                            Group = "Macro",
                            AUC = res$AUC[[i]][n_group + 1],
                            Method = unique(res$Methods)[i])

  # Micro-average curve: stored after the macro-average entry.
  temp_data_3 <- data.frame(Specificity = res$Specificity[[i]][n_group + 2],
                            Sensitivity = res$Sensitivity[[i]][n_group + 2],
                            Group = "Micro",
                            AUC = res$AUC[[i]][n_group + 2],
                            Method = unique(res$Methods)[i])

  colnames(temp_data_2) <- col_names
  colnames(temp_data_3) <- col_names
  res_df <- rbind(res_df, temp_data_2)
  res_df <- rbind(res_df, temp_data_3)
}
ggplot2::ggplot(res_df, ggplot2::aes(x = 1-Specificity, y=Sensitivity)) +
  ggplot2::geom_path(ggplot2::aes(color = Group, linetype=Method)) +
  ggplot2::geom_segment(ggplot2::aes(x = 0, y = 0, xend = 1, yend = 1),
                        colour='grey', linetype = 'dotdash') +
  ggplot2::theme_bw() +
  ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5),
                 legend.justification=c(1, 0),
                 legend.position=c(.95, .05),
                 legend.title=ggplot2::element_blank(),
                 legend.background = ggplot2::element_rect(fill=NULL, size=0.5,
                                                           linetype="solid", colour ="black"))