This vignette is intended for those seeking a quick transition from Windows JAFROC to RJafroc. It is assumed that the user is familiar with the JAFROC data format and can analyze a dataset using the Windows program. First, let me describe the structure in R of an RJafroc dataset. Later I will tell you how to read a JAFROC format file to create an RJafroc dataset.
Let us start with a predefined dataset, dataset03, corresponding to the Franken ROC data. Let us examine the structure of this dataset.
str(dataset03)
#> List of 8
#> $ NL : num [1:2, 1:4, 1:100, 1] 3 3 4 3 3 ...
#> $ LL : num [1:2, 1:4, 1:67, 1] 5 5 4 4 5 4 4 5 2 2 ...
#> $ lesionNum : int [1:67] 1 1 1 1 1 1 1 1 1 1 ...
#> $ lesionID : num [1:67, 1] 1 1 1 1 1 1 1 1 1 1 ...
#> $ lesionWeight: num [1:67, 1] 1 1 1 1 1 1 1 1 1 1 ...
#> $ dataType : chr "ROC"
#> $ modalityID : Named chr [1:2] "0" "1"
#> ..- attr(*, "names")= chr [1:2] "0" "1"
#> $ readerID : Named chr [1:4] "0" "1" "2" "3"
#> ..- attr(*, "names")= chr [1:4] "0" "1" "2" "3"
It shows a list with 8 members. The false positive ratings are contained in NL, an array with dimensions [1:2,1:4,1:100,1]. The first index corresponds to treatments, and since the dataset has 2 treatments, the corresponding dimension is 2. The second index corresponds to readers, and since the dataset has 4 readers, the corresponding dimension is 4. The third index corresponds to the total number of cases. Since the dataset has 100 cases, the corresponding dimension is 100. But, as you can see from the code below, the entries in this array for cases 34 through 100 are -Inf.
dataset03$NL[1,1,34:100,1]
#> [1] -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [15] -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [29] -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [43] -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf
#> [57] -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf
This is because in the ROC paradigm false positives are not possible on diseased cases. So the actual FP ratings are contained in the first 33 elements of the array. How did I know that there are 33 non-diseased cases? This can be understood in several ways.
LL is an array with dimensions [1:2,1:4,1:67,1]. This implies 67 diseased cases, and by subtraction from 100, there must be 33 non-diseased cases.
lesionNum is a vector of length 67, and lesionID and lesionWeight are arrays with dimensions [1:67,1], all containing ones. Again, these imply 67 diseased cases.
lesionNum, lesionID and lesionWeight, while not needed for ROC data, are needed, as we shall see later, for the FROC paradigm.
The dataType list member is the character string "ROC", characterizing the ROC dataset. Alternatives are "FROC" and "LROC".
dataset03$dataType
#> [1] "ROC"
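The case counts discussed above follow directly from the array dimensions. Here is a minimal sketch in base R; the list toy is a made-up stand-in (not part of RJafroc) with the same array layout as dataset03, and its rating values are invented for illustration.

```r
# Hypothetical stand-in with the same array layout as dataset03;
# the rating values are made up for illustration.
toy <- list(
  NL = array(-Inf, dim = c(2, 4, 100, 1)),  # treatments x readers x all cases
  LL = array(3,    dim = c(2, 4, 67, 1))    # treatments x readers x diseased cases
)
toy$NL[, , 1:33, 1] <- 2  # only non-diseased cases can carry FP ratings

K  <- dim(toy$NL)[3]  # total number of cases
K2 <- dim(toy$LL)[3]  # number of diseased cases
K1 <- K - K2          # number of non-diseased cases

# NL entries beyond the non-diseased cases should all be -Inf
allMissing <- all(toy$NL[, , (K1 + 1):K, 1] == -Inf)
c(K1 = K1, K2 = K2)
allMissing
```

With RJafroc loaded, replacing toy by dataset03 should give the same counts, since the str() output shows the same dimensions.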
The modalityID list member is a named character vector with two entries, "0" and "1", corresponding to the two treatments (i.e., modalities). These can be longer strings, if you please, that label the two treatments.
dataset03$modalityID
#> 0 1
#> "0" "1"
The readerID list member is a named character vector with four entries, "0", "1", "2" and "3", corresponding to the four readers. These can be longer strings that label the four readers.
dataset03$readerID
#> 0 1 2 3
#> "0" "1" "2" "3"
Here are the actual ratings for cases 1:34.
dataset03$NL[1,1,1:34,1]
#> [1] 3 -Inf 2 2 2 2 2 4 -Inf -Inf 4 2 -Inf 2
#> [15] 4 2 -Inf 2 -Inf 2 4 2 3 2 2 2 4 3
#> [29] 2 2 2 5 3 -Inf
This says that for treatment 1 and reader 1, (non-diseased) case 1 was rated 3, case 3 was rated 2, case 8 was rated 4, etc. An entry of -Inf corresponds to a case rated 1; that is, -Inf is equivalent to a 1-rating. The reason for this is that the ratings are ordered labels. As far as the ordering is concerned, nothing is changed by replacing -Inf with 1 and vice-versa.
As another example, for treatment 2 and reader 3,
dataset03$NL[2,3,1:34,1]
#> [1] 3 -Inf 2 2 2 2 4 4 2 3 2 2 -Inf 3
#> [15] 2 4 2 3 2 2 2 2 2 4 2 2 -Inf 2
#> [29] 2 2 2 4 2 -Inf
As you can see, there are no cases that are explicitly rated 1, so changing the -Inf to 1 does not change the ordering of the ratings.
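To see why only the ordering matters, here is a small sketch in base R. The helper wilcoxonFom is a hypothetical illustration of the empirical (Wilcoxon) figure of merit, not an RJafroc function, and the rating vectors are invented. It compares every non-diseased rating with every diseased rating, so the result depends only on which rating is larger, not on the labels themselves.

```r
# Hypothetical helper: fraction of (non-diseased, diseased) pairs that are
# correctly ordered, with ties counted as one half.
wilcoxonFom <- function(fp, tp) {
  mean(outer(fp, tp, function(x, y) (x < y) + 0.5 * (x == y)))
}

# Made-up ratings; -Inf marks the lowest rating, and no case is rated 1
fp <- c(3, -Inf, 2, 2, 4, -Inf)
tp <- c(5, 5, 4, -Inf, 2, 3)

# Same ratings with -Inf replaced by 1
fpOne <- replace(fp, fp == -Inf, 1)
tpOne <- replace(tp, tp == -Inf, 1)

fomInf <- wilcoxonFom(fp, tp)
fomOne <- wilcoxonFom(fpOne, tpOne)
all.equal(fomInf, fomOne)
```

The two figures of merit agree because every pairwise comparison has the same outcome under either labeling.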
There is a file, includedRocData.xlsx, that is part of the package installation. Since it is a system file, one must get its full path as follows.
fileName <- "includedRocData.xlsx"
sysFileName <- system.file(paste0("extdata/",fileName), package = "RJafroc", mustWork = TRUE)
Next, one uses DfReadDataFile() as follows, assuming it is a JAFROC format file.
ds <- DfReadDataFile(sysFileName)
Now ds is the desired dataset.
str(ds)
#> List of 8
#> $ NL : num [1:2, 1:5, 1:114, 1] 1 3 2 3 2 2 1 2 3 2 ...
#> $ LL : num [1:2, 1:5, 1:45, 1] 5 5 5 5 5 5 5 5 5 5 ...
#> $ lesionNum : int [1:45] 1 1 1 1 1 1 1 1 1 1 ...
#> $ lesionID : num [1:45, 1] 1 1 1 1 1 1 1 1 1 1 ...
#> $ lesionWeight: num [1:45, 1] 1 1 1 1 1 1 1 1 1 1 ...
#> $ dataType : chr "ROC"
#> $ modalityID : Named chr [1:2] "0" "1"
#> ..- attr(*, "names")= chr [1:2] "0" "1"
#> $ readerID : Named chr [1:5] "0" "1" "2" "3" ...
#> ..- attr(*, "names")= chr [1:5] "0" "1" "2" "3" ...
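For a file of your own, exported from Windows JAFROC, no system.file() lookup is needed; the path is passed to DfReadDataFile() directly. Here is a cautious sketch: the wrapper readJafrocFile and the file name myStudy.xlsx are hypothetical placeholders, and the wrapper simply returns NULL when the package or the file is missing.

```r
# Hypothetical helper: returns NULL instead of failing when RJafroc
# is not installed or the file does not exist.
readJafrocFile <- function(path) {
  if (!requireNamespace("RJafroc", quietly = TRUE)) return(NULL)
  if (!file.exists(path)) return(NULL)
  RJafroc::DfReadDataFile(path)
}

# "myStudy.xlsx" is a placeholder for a workbook in JAFROC format
myDs <- readJafrocFile("myStudy.xlsx")
if (is.null(myDs)) message("File or package not available; nothing was read.")
```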
Analysis is illustrated for dataset03, but one could have used the newly created dataset ds.
This illustrates the StSignificanceTesting() function. The significance testing method is specified as "DBMH" and the figure of merit FOM is specified as "Wilcoxon".
ret <- StSignificanceTesting(dataset03, method = "DBMH", FOM = "Wilcoxon")
print(ret)
#> $fomArray
#> Rdr - 0 Rdr - 1 Rdr - 2 Rdr - 3
#> Trt - 0 0.8534600 0.8649932 0.8573044 0.8152420
#> Trt - 1 0.8496156 0.8435097 0.8401176 0.8143374
#>
#> $anovaY
#> Source SS DF MS
#> 1 T 0.02356541 1 0.023565410
#> 2 R 0.20521800 3 0.068406000
#> 3 C 52.52839868 99 0.530589886
#> 4 TR 0.01506079 3 0.005020264
#> 5 TC 6.41004881 99 0.064747968
#> 6 RC 39.24295381 297 0.132131158
#> 7 TRC 22.66007764 297 0.076296558
#> 8 Total 121.08532315 799 NA
#>
#> $anovaYi
#> Source DF 0 1
#> 1 R 3 0.04926635 0.02415991
#> 2 C 99 0.29396753 0.30137032
#> 3 RC 297 0.10504787 0.10337984
#>
#> $varComp
#> varComp
#> Var(R) 3.775568e-05
#> Var(C) 5.125091e-02
#> Var(T*R) -7.127629e-04
#> Var(T*C) -2.887147e-03
#> Var(R*C) 2.791730e-02
#> Var(Error) 7.629656e-02
#>
#> $fRRRC
#> [1] 4.694058
#>
#> $ddfRRRC
#> [1] 3
#>
#> $pRRRC
#> [1] 0.1188379
#>
#> $ciDiffTrtRRRC
#> Treatment Estimate StdErr DF t Pr > t CI Lower
#> 1 0 - 1 0.01085482 0.005010122 3 2.166577 0.1188379 -0.005089627
#> CI Upper
#> 1 0.02679926
#>
#> $ciAvgRdrEachTrtRRRC
#> Treatment Area StdErr DF CI Lower CI Upper
#> 1 0 0.8477499 0.02440215 70.12179 0.7990828 0.8964170
#> 2 1 0.8368951 0.02356642 253.64403 0.7904843 0.8833058
#>
#> $fFRRC
#> [1] 0.363956
#>
#> $ndf
#> [1] 1
#>
#> $ddfFRRC
#> [1] 99
#>
#> $pFRRC
#> [1] 0.547697
#>
#> $ciDiffTrtFRRC
#> Treatment Estimate StdErr DF t Pr > t CI Lower
#> 1 0 - 1 0.01085482 0.01799277 99 0.6032876 0.547697 -0.02484675
#> CI Upper
#> 1 0.04655638
#>
#> $ciAvgRdrEachTrtFRRC
#> Treatment Area StdErr DF CI Lower CI Upper
#> 1 0 0.8477499 0.02710939 99 0.7939590 0.9015408
#> 2 1 0.8368951 0.02744860 99 0.7824311 0.8913591
#>
#> $ssAnovaEachRdr
#> Source DF 0 1 2 3
#> 1 T 1 7.389761e-04 0.02307702 0.01476929 4.091217e-05
#> 2 C 99 2.018360e+01 22.12074893 21.21043057 2.825657e+01
#> 3 TC 99 9.064315e+00 7.94764631 6.06166901 5.996496e+00
#>
#> $msAnovaEachRdr
#> Source DF 0 1 2 3
#> 1 T 1 0.0007389761 0.02307702 0.01476929 4.091217e-05
#> 2 C 99 0.2038747746 0.22344191 0.21424677 2.854199e-01
#> 3 TC 99 0.0915587344 0.08027926 0.06122898 6.057067e-02
#>
#> $ciDiffTrtEachRdr
#> Reader Treatment Estimate StdErr DF t Pr > t
#> 1 0 0 - 1 0.0038444143 0.04279223 99 0.08983908 0.9285966
#> 2 1 0 - 1 0.0214834916 0.04006975 99 0.53615233 0.5930559
#> 3 2 0 - 1 0.0171867933 0.03499399 99 0.49113552 0.6244176
#> 4 3 0 - 1 0.0009045681 0.03480536 99 0.02598933 0.9793182
#> CI Lower CI Upper
#> 1 -0.08106465 0.08875348
#> 2 -0.05802359 0.10099057
#> 3 -0.05224888 0.08662247
#> 4 -0.06815683 0.06996596
#>
#> $fRRFC
#> [1] 4.694058
#>
#> $ddfRRFC
#> [1] 3
#>
#> $pRRFC
#> [1] 0.1188379
#>
#> $ciDiffTrtRRFC
#> Treatment Estimate StdErr DF t Pr > t CI Lower
#> 1 0 - 1 0.01085482 0.005010122 3 2.166577 0.1188379 -0.005089627
#> CI Upper
#> 1 0.02679926
#>
#> $ciAvgRdrEachTrtRRFC
#> Treatment Area StdErr DF CI Lower CI Upper
#> 1 0 0.8477499 0.01109801 3 0.8124311 0.8830687
#> 2 1 0.8368951 0.00777173 3 0.8121620 0.8616282
The function returns a long, unwieldy list. Let us consider its members one by one. The function UtilOutputReport() can generate an Excel file report, making it much easier to visualize the results. This is described in another vignette.
fomArray contains the [1:2,1:4] FOM values.
ret$fomArray
#> Rdr - 0 Rdr - 1 Rdr - 2 Rdr - 3
#> Trt - 0 0.8534600 0.8649932 0.8573044 0.8152420
#> Trt - 1 0.8496156 0.8435097 0.8401176 0.8143374
This shows the 2 x 4 array of FOM values.
anovaY, where the Y denotes that these are pseudovalue based, is the ANOVA table.
ret$anovaY
#> Source SS DF MS
#> 1 T 0.02356541 1 0.023565410
#> 2 R 0.20521800 3 0.068406000
#> 3 C 52.52839868 99 0.530589886
#> 4 TR 0.01506079 3 0.005020264
#> 5 TC 6.41004881 99 0.064747968
#> 6 RC 39.24295381 297 0.132131158
#> 7 TRC 22.66007764 297 0.076296558
#> 8 Total 121.08532315 799 NA
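As a quick consistency check on this table, the sketch below (base R only, with the values transcribed by hand from the output above) confirms that each mean square is the sum of squares divided by its degrees of freedom, and that the degrees of freedom follow from the 2 treatment, 4 reader, 100 case layout. The variable names are my own.

```r
# Values transcribed from the anovaY table shown above
anovaY <- data.frame(
  Source = c("T", "R", "C", "TR", "TC", "RC", "TRC"),
  SS = c(0.02356541, 0.20521800, 52.52839868, 0.01506079,
         6.41004881, 39.24295381, 22.66007764),
  DF = c(1, 3, 99, 3, 99, 297, 297),
  MS = c(0.023565410, 0.068406000, 0.530589886, 0.005020264,
         0.064747968, 0.132131158, 0.076296558)
)
I <- 2; J <- 4; K <- 100  # treatments, readers, cases

# Degrees of freedom implied by the factorial layout
dfExpected <- c(I - 1, J - 1, K - 1, (I - 1) * (J - 1),
                (I - 1) * (K - 1), (J - 1) * (K - 1),
                (I - 1) * (J - 1) * (K - 1))

msFromSS <- anovaY$SS / anovaY$DF  # should reproduce the MS column
totalSS  <- sum(anovaY$SS)         # should match the Total row
totalDF  <- sum(anovaY$DF)         # should equal I * J * K - 1
```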
anovaYi is the ANOVA table for individual treatments.
ret$anovaYi
#> Source DF 0 1
#> 1 R 3 0.04926635 0.02415991
#> 2 C 99 0.29396753 0.30137032
#> 3 RC 297 0.10504787 0.10337984
The 0 and 1 headers come from the treatment names.
varComp contains the variance components (needed for sample size estimation).
ret$varComp
#> varComp
#> Var(R) 3.775568e-05
#> Var(C) 5.125091e-02
#> Var(T*R) -7.127629e-04
#> Var(T*C) -2.887147e-03
#> Var(R*C) 2.791730e-02
#> Var(Error) 7.629656e-02
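These variance components can be recovered from the mean squares in the anovaY table. The sketch below uses the standard expected-mean-squares relations for a fully crossed treatment x reader x case design; I am assuming these are the relations behind the output, and the numbers do agree with it to the printed precision. The mean squares are transcribed by hand from the table.

```r
# Mean squares transcribed from the anovaY table
ms <- c(T = 0.023565410, R = 0.068406000, C = 0.530589886,
        TR = 0.005020264, TC = 0.064747968, RC = 0.132131158,
        TRC = 0.076296558)
I <- 2; J <- 4; K <- 100  # treatments, readers, cases

# Solve the expected-mean-squares equations for each component
varComp <- c(
  "Var(R)"     = (ms[["R"]] - ms[["TR"]] - ms[["RC"]] + ms[["TRC"]]) / (I * K),
  "Var(C)"     = (ms[["C"]] - ms[["TC"]] - ms[["RC"]] + ms[["TRC"]]) / (I * J),
  "Var(T*R)"   = (ms[["TR"]] - ms[["TRC"]]) / K,
  "Var(T*C)"   = (ms[["TC"]] - ms[["TRC"]]) / J,
  "Var(R*C)"   = (ms[["RC"]] - ms[["TRC"]]) / I,
  "Var(Error)" = ms[["TRC"]]
)
varComp
```

Note that the treatment-reader and treatment-case components come out negative here; they are estimates obtained by subtraction, so this can happen.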
fRRRC is the F-statistic for testing the null hypothesis (NH) that the treatments have identical FOMs. RRRC means random-reader random-case generalization.
ret$fRRRC
#> [1] 4.694058
ddfRRRC is the denominator degrees of freedom of the F-statistic.
ret$ddfRRRC
#> [1] 3
pRRRC is the p-value of the test.
ret$pRRRC
#> [1] 0.1188379
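The random-reader random-case F-statistic, its denominator degrees of freedom and the p-value can be cross-checked against the anovaY mean squares. The sketch below uses the DBM F-ratio with Hillis' degrees of freedom as I understand it; treat the formulas as an assumption about what the function does internally. The mean squares are transcribed by hand, and the results reproduce the printed values.

```r
# Mean squares transcribed from the anovaY table
msT <- 0.023565410; msTR <- 0.005020264
msTC <- 0.064747968; msTRC <- 0.076296558
I <- 2; J <- 4  # treatments, readers

# Denominator of the F-ratio; the max() term is zero for this dataset
den <- msTR + max(msTC - msTRC, 0)

fRRRC   <- msT / den
ddfRRRC <- den^2 / (msTR^2 / ((I - 1) * (J - 1)))
pRRRC   <- pf(fRRRC, I - 1, ddfRRRC, lower.tail = FALSE)

c(F = fRRRC, ddf = ddfRRRC, p = pRRRC)
```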
ciDiffTrtRRRC is the 95% confidence interval of reader-averaged differences between treatments.
ret$ciDiffTrtRRRC
#> Treatment Estimate StdErr DF t Pr > t CI Lower
#> 1 0 - 1 0.01085482 0.005010122 3 2.166577 0.1188379 -0.005089627
#> CI Upper
#> 1 0.02679926
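The entries of this confidence interval can be reproduced by hand. In the sketch below the estimate is the difference of the reader-averaged FOMs, and the standard error formula sqrt(2 * MS(TR) / (J * K)) is my assumption about the computation (valid here because MS(TC) is smaller than MS(TRC)); it does reproduce the printed StdErr. All inputs are transcribed from the output shown earlier.

```r
# FOM values transcribed from fomArray (rows: treatments, columns: readers)
fomArray <- matrix(c(0.8534600, 0.8649932, 0.8573044, 0.8152420,
                     0.8496156, 0.8435097, 0.8401176, 0.8143374),
                   nrow = 2, byrow = TRUE)
msTR <- 0.005020264  # MS(TR) from anovaY
J <- 4; K <- 100     # readers, cases
ddf <- 3             # denominator degrees of freedom (RRRC)

est    <- mean(fomArray[1, ]) - mean(fomArray[2, ])  # reader-averaged difference
stdErr <- sqrt(2 * msTR / (J * K))
ci     <- est + c(-1, 1) * qt(0.975, ddf) * stdErr   # 95% confidence interval

c(Estimate = est, StdErr = stdErr, Lower = ci[1], Upper = ci[2])
```

The interval includes zero, consistent with the p-value exceeding 0.05.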
ciAvgRdrEachTrtRRRC is the 95% confidence interval of reader-averaged FOMs for each treatment.
ret$ciAvgRdrEachTrtRRRC
#> Treatment Area StdErr DF CI Lower CI Upper
#> 1 0 0.8477499 0.02440215 70.12179 0.7990828 0.8964170
#> 2 1 0.8368951 0.02356642 253.64403 0.7904843 0.8833058
fFRRC is the F-statistic for fixed-reader random-case analysis.
ret$fFRRC
#> [1] 0.363956
ndf is the numerator degrees of freedom of the F-statistic, always one less than the number of treatments.
ret$ndf
#> [1] 1
ddfFRRC is the denominator degrees of freedom of the F-statistic, for fixed-reader random-case analysis.
ret$ddfFRRC
#> [1] 99
pFRRC is the p-value for fixed-reader random-case analysis.
ret$pFRRC
#> [1] 0.547697
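The fixed-reader random-case quantities can be checked the same way. In this sketch the F-ratio is taken as MS(T) / MS(TC) with (I - 1)(K - 1) denominator degrees of freedom; this is my reading of the method rather than something stated in the output, but it reproduces the printed values. The mean squares are transcribed by hand from anovaY.

```r
# Mean squares transcribed from the anovaY table
msT <- 0.023565410; msTC <- 0.064747968
I <- 2; K <- 100  # treatments, cases

ndf     <- I - 1
fFRRC   <- msT / msTC
ddfFRRC <- (I - 1) * (K - 1)
pFRRC   <- pf(fFRRC, ndf, ddfFRRC, lower.tail = FALSE)

c(F = fFRRC, ndf = ndf, ddf = ddfFRRC, p = pFRRC)
```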
ciDiffTrtFRRC is the 95% CI of reader-averaged differences between treatments for fixed-reader random-case analysis.
ret$ciDiffTrtFRRC
#> Treatment Estimate StdErr DF t Pr > t CI Lower
#> 1 0 - 1 0.01085482 0.01799277 99 0.6032876 0.547697 -0.02484675
#> CI Upper
#> 1 0.04655638
ciAvgRdrEachTrtFRRC is the 95% CI of reader-averaged FOMs of each treatment for fixed-reader random-case analysis.
ret$ciAvgRdrEachTrtFRRC
#> Treatment Area StdErr DF CI Lower CI Upper
#> 1 0 0.8477499 0.02710939 99 0.7939590 0.9015408
#> 2 1 0.8368951 0.02744860 99 0.7824311 0.8913591
ssAnovaEachRdr is the sum of squares ANOVA for each reader.
ret$ssAnovaEachRdr
#> Source DF 0 1 2 3
#> 1 T 1 7.389761e-04 0.02307702 0.01476929 4.091217e-05
#> 2 C 99 2.018360e+01 22.12074893 21.21043057 2.825657e+01
#> 3 TC 99 9.064315e+00 7.94764631 6.06166901 5.996496e+00
msAnovaEachRdr is the mean squares ANOVA for each reader.
ret$msAnovaEachRdr
#> Source DF 0 1 2 3
#> 1 T 1 0.0007389761 0.02307702 0.01476929 4.091217e-05
#> 2 C 99 0.2038747746 0.22344191 0.21424677 2.854199e-01
#> 3 TC 99 0.0915587344 0.08027926 0.06122898 6.057067e-02
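ciDiffTrtEachRdr is the CI for treatment differences for each reader, for fixed-reader random-case analysis.
ret$ciDiffTrtEachRdr
#> Reader Treatment Estimate StdErr DF t Pr > t
#> 1 0 0 - 1 0.0038444143 0.04279223 99 0.08983908 0.9285966
#> 2 1 0 - 1 0.0214834916 0.04006975 99 0.53615233 0.5930559
#> 3 2 0 - 1 0.0171867933 0.03499399 99 0.49113552 0.6244176
#> 4 3 0 - 1 0.0009045681 0.03480536 99 0.02598933 0.9793182
#> CI Lower CI Upper
#> 1 -0.08106465 0.08875348
#> 2 -0.05802359 0.10099057
#> 3 -0.05224888 0.08662247
#> 4 -0.06815683 0.06996596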
fRRFC is the F-statistic for random-reader fixed-case analysis.
ret$fRRFC
#> [1] 4.694058
ddfRRFC is the denominator degrees of freedom for random-reader fixed-case analysis.
ret$ddfRRFC
#> [1] 3
pRRFC is the p-value for random-reader fixed-case analysis.
ret$pRRFC
#> [1] 0.1188379
ciDiffTrtRRFC is the CI for reader-averaged inter-treatment FOM differences for random-reader fixed-case analysis.
ret$ciDiffTrtRRFC
#> Treatment Estimate StdErr DF t Pr > t CI Lower
#> 1 0 - 1 0.01085482 0.005010122 3 2.166577 0.1188379 -0.005089627
#> CI Upper
#> 1 0.02679926
ciAvgRdrEachTrtRRFC is the CI for reader-averaged FOMs of each treatment for random-reader fixed-case analysis.
ret$ciAvgRdrEachTrtRRFC
#> Treatment Area StdErr DF CI Lower CI Upper
#> 1 0 0.8477499 0.01109801 3 0.8124311 0.8830687
#> 2 1 0.8368951 0.00777173 3 0.8121620 0.8616282