4) Error Correction

Florian Berding and Julia Pargmann

2022-11-08

1 Introduction

Literature regarding content analysis often presents the estimation of reliability as part of the development phase of a coding scheme, for example to inform a revision (Krippendorff, 2019; Kuckartz, 2018; Mayring, 2015; Schreier, 2012). If the reliability is considered sufficient, the main study starts. Often, the reliability of the codings of this main study is not checked any further, as it is assumed that the reliability estimates from the development phase hold for the entire main study. Sometimes, however, researchers discuss their codings and, when the coding scheme is unclear, assign a category to a coding unit by agreeing on the relevant category.

With the Iota Concept, the reliability of a coding scheme can be taken into account more explicitly during the main study. It provides the opportunity for error correction, which is not possible with traditional measures such as Percentage Agreement, Cohen’s Kappa, or Krippendorff’s Alpha.

The error correction of the Iota Concept is based on two ideas. First, the ratings of at least two raters form a pattern for every coding unit, and the Assignment Error Matrix describes how likely each true category is to produce such a pattern. These patterns therefore give hints as to which true category may be the source of the observation. Second, involving additional raters provides more information on a coding unit, which can improve the assignments. This idea is similar to the use of multiple items in questionnaires or tests, where each item can be considered a test for the phenomenon of interest and additional tests are used to reduce errors.
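The logic behind these two ideas can be sketched in a few lines of base R. The following is only a minimal illustration of the underlying reasoning (Bayes' theorem applied to an observed rating pattern), with a made-up Assignment Error Matrix and made-up class sizes; it is not the estimation procedure implemented in iotarelr:

```r
# Toy Assignment Error Matrix (all numbers invented for illustration):
# aem[i, j] is the probability that a unit truly in category i
# is assigned to category j by a rater.
aem <- matrix(
  c(0.8, 0.1, 0.1,   # true "poor"
    0.1, 0.8, 0.1,   # true "average"
    0.1, 0.1, 0.8),  # true "good"
  nrow = 3, byrow = TRUE,
  dimnames = list(c("poor", "average", "good"),
                  c("poor", "average", "good"))
)
# Invented relative sizes of the true categories
class_sizes <- c(poor = 0.3, average = 0.5, good = 0.2)

# P(true category | observed pattern), assuming raters err independently
posterior <- function(pattern, aem, class_sizes) {
  lik <- apply(aem[, pattern, drop = FALSE], 1, prod)  # P(pattern | true cat)
  joint <- class_sizes * lik
  joint / sum(joint)
}

# For this pattern, "average" comes out as the most plausible true category
posterior(c("average", "average", "good"), aem, class_sizes)
```

Each rater's assignment acts like one error-prone "item"; multiplying the corresponding entries of the Assignment Error Matrix combines the information of all raters into one probability per true category.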

In this vignette, we continue the written exams example from the first vignette and show how the error correction can be applied.

2 Using the error correction of the Iota Concept

Applying the error correction of the Iota Concept requires that all coding units of the main study are rated by at least two raters. The error correction can be requested with the function est_expected_categories(). This function calculates the probability that a coding unit belongs to a specific true category, given the observed pattern of ratings. To illustrate the error correction, a look into the data set is helpful.

library(iotarelr)
head(iotarelr_written_exams)
#>   Coder A Coder B Coder C    Sex
#> 1 average average    good female
#> 2 average    poor average   male
#> 3    poor average    poor female
#> 4 average average average female
#> 5    poor average    good female
#> 6    poor    poor average female

The first 6 rows of the data set show that the three raters do not agree on all coding units. They agree completely only on exam 4, while they disagree at least partially on the other exams. For example, two raters consider exam 1 to be average while one rater considers this exam to be good. Thus, there seems to be some kind of error, and it is not clear which category should be assigned to exam 1.

To solve this problem, we must first estimate the Assignment Error Matrix. In the next step, we pass the estimated Assignment Error Matrix to the function est_expected_categories() and use the ratings as our data source. The results are saved in the object expected_categories.

res_iota2<-compute_iota2(
  data=iotarelr_written_exams[c("Coder A","Coder B","Coder C")],
  random_starts = 2,
  trace = FALSE)
expected_categories<-est_expected_categories(
  data=iotarelr_written_exams[c("Coder A","Coder B","Coder C")],
  aem=res_iota2$categorical_level$raw_estimates$assignment_error_matrix)
head(expected_categories)
#>   Coder A Coder B Coder C prob_average  prob_good  prob_poor expected_category
#> 1 average average    good 4.842758e-01 0.36360946 0.15211472           average
#> 2 average    poor average 3.142772e-07 0.27618301 0.72381668              poor
#> 3    poor average    poor 7.501796e-15 0.12925125 0.87074875              poor
#> 4 average average average 9.169925e-01 0.04110215 0.04190537           average
#> 5    poor average    good 3.273213e-08 0.48183913 0.51816084              poor
#> 6    poor    poor average 7.501796e-15 0.12925125 0.87074875              poor

The resulting object contains the ratings and additional columns. These columns report the probability that a coding unit belongs to a specific true category, given the observed pattern. For exams 3 and 6, for example, the probability that these exams are truly poor ones is about 87.1%. The chance that they represent truly good exams is about 12.9%, and the probability that they are truly average exams is nearly 0%.

For exam number 1, the probability of being an average exam is about 48.4%, of being a good exam about 36.4%, and of being a poor exam about 15.2%. Thus, it is most plausible to assign exam 1 to the category “average”. The most plausible category is always presented in the last column.

If the ratings were done by only one rater, these kinds of errors would not become visible. For example, if the exams had been rated only by rater C, exams 1 and 5 would have been assigned to the category “good”, although it is more plausible to assign exam 1 to “average” and exam 5 to “poor”.
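The value of additional raters can be made concrete with a small self-contained toy example (again with invented numbers, not estimates from the exams data). With the made-up Assignment Error Matrix below, a single rating of “good” points to the true category “good”, but the full pattern of three raters makes “average” more plausible:

```r
# Invented Assignment Error Matrix: rows are true categories,
# columns are assigned categories.
aem <- matrix(
  c(0.8, 0.1, 0.1,   # true "poor"
    0.3, 0.6, 0.1,   # true "average"
    0.1, 0.2, 0.7),  # true "good"
  nrow = 3, byrow = TRUE,
  dimnames = list(c("poor", "average", "good"),
                  c("poor", "average", "good"))
)
sizes <- c(poor = 1, average = 1, good = 1) / 3  # equal class sizes

# P(true category | observed pattern), assuming independent rater errors
posterior <- function(pattern) {
  joint <- sizes * apply(aem[, pattern, drop = FALSE], 1, prod)
  joint / sum(joint)
}

round(posterior("good"), 3)                        # "good" is most plausible
round(posterior(c("poor", "average", "good")), 3)  # now "average" wins
```

The extra ratings act as additional error-prone tests of the same coding unit: they can overturn the conclusion that a single rating would suggest.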

3 Conclusions

Estimating the most likely true category has several advantages: the reliability of the coding scheme is explicitly taken into account during the main study, the information provided by several raters is combined instead of being discarded, and assignments that are implausible in the light of the Assignment Error Matrix can be corrected. In this way, the quality of the data entering subsequent analyses can be improved.

References