Hi Marta, I don’t see the screenshot, but in this case I think your description is sufficient.
Figure 2 of the Holle & Rein BRM paper illustrates how the annotations created by two raters are matched. It contains a few examples of an annotation by one rater (R1) overlapping multiple annotations by the other rater (R2); in that case only one of the R2 annotations, the best match, is linked to the R1 annotation. The other overlapping annotations are categorized as “unmatched” or “nil”.
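Just to make that idea concrete, here is a rough Python sketch of such a matching step. The interval format and the “largest temporal overlap wins” rule are my own simplifications for illustration, not the exact criterion EasyDIAg uses:

```python
def overlap(a, b):
    """Length of the temporal overlap between two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def match_annotations(r1, r2):
    """For each R1 annotation, pick the overlapping R2 annotation with the
    largest overlap as its 'best match'; leftover R2 annotations are unmatched."""
    matches = {}   # index in r1 -> index in r2 (best match)
    used = set()
    for i, a in enumerate(r1):
        candidates = [(overlap(a, b), j) for j, b in enumerate(r2)
                      if j not in used and overlap(a, b) > 0]
        if candidates:
            _, j = max(candidates)   # largest overlap wins (a simplification)
            matches[i] = j
            used.add(j)
    unmatched_r2 = [j for j in range(len(r2)) if j not in used]
    return matches, unmatched_r2

# One R1 annotation overlapping two R2 annotations:
r1 = [(0.0, 2.0)]
r2 = [(0.0, 0.5), (0.6, 2.0)]
print(match_annotations(r1, r2))   # ({0: 1}, [0])
```

In this toy example the second R2 annotation becomes the best match and the first one ends up unmatched, which is roughly what the figure shows.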
In your case all, or all but one, of the annotations on the second tier are unmatched; the two raters disagree strongly on the segmentation. It is as if they have been looking at different events or phenomena.
In a ‘traditional’ kappa calculation such situations cannot occur: the samples are given, and both raters categorize the same set of samples. In the ‘modified kappa’ procedure (of ELAN and EasyDIAg) the raters more or less have to identify the samples first and then apply a category to each sample. If the segmentation produced by one rater is very different from that produced by the other, there can be no high agreement and therefore no high kappa value.
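To see why a diverging segmentation pulls the kappa value down, here is a small sketch of a plain Cohen’s kappa calculation in which unmatched annotations are paired with an extra “nil” category. This is only a simplified stand-in for the table that ELAN/EasyDIAg actually build, just to show the effect:

```python
from collections import Counter

def cohens_kappa(pairs):
    """pairs: list of (category_r1, category_r2), one per matched or unmatched unit."""
    n = len(pairs)
    observed = sum(1 for a, b in pairs if a == b) / n
    r1_counts = Counter(a for a, _ in pairs)
    r2_counts = Counter(b for _, b in pairs)
    expected = sum(r1_counts[c] * r2_counts[c]
                   for c in set(r1_counts) | set(r2_counts)) / n**2
    return (observed - expected) / (1 - expected)

# Mostly mismatched segmentation: most annotations get paired with "nil"
pairs = [("gesture", "nil"), ("gesture", "nil"),
         ("nil", "gesture"), ("nil", "gesture"),
         ("gesture", "gesture")]
print(round(cohens_kappa(pairs), 2))   # about -0.67
```

Even though both raters apply the same category, most units are paired with “nil”, so the resulting kappa is low (negative, even) despite there being no disagreement about the labels themselves.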
I’m not an expert in this field, by the way, and I’m probably not the best person to explain these things.