Calculating inter-rater reliability by calculating the ratio of overlap and total extent


I would love a bit more information on how the measures in the output of this analysis are derived and how one should interpret them. Is there a source that I have maybe overlooked?

Is there a cut-off above which the average overlap/extent ratio might be said to indicate good or very good inter-rater reliability in terms of segmentation? For example, is the output below considered high?

Average overlap/extent ratio: 0.8182
Overall average overlap/extent ratio: 0.8182

Thanks for your help,

Hi Nicky,

That method for calculating a segmentation agreement ratio is still there for internal use, but since it doesn't take chance agreement into account and is not an accepted measure in any field, it cannot be used in publications. It is also not possible to say in general whether a given ratio should be considered high: 0.8182 seems quite good, but it depends on the type of research and the type of events being annotated.
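For what it's worth, the general idea behind an overlap/extent ratio for a single pair of segments can be sketched as intersection length divided by union length. This is a hypothetical reconstruction for illustration only; the actual tool may match and aggregate segments differently, and the function name and millisecond units here are assumptions.

```python
def overlap_extent_ratio(seg_a, seg_b):
    """Ratio of temporal overlap to total extent of two segments.

    Each segment is a (begin, end) pair, e.g. in milliseconds.
    Hypothetical sketch: overlap = intersection, extent = union span.
    """
    begin_a, end_a = seg_a
    begin_b, end_b = seg_b
    overlap = max(0, min(end_a, end_b) - max(begin_a, begin_b))
    extent = max(end_a, end_b) - min(begin_a, begin_b)
    return overlap / extent if extent > 0 else 0.0

# Two annotators mark roughly the same event, shifted by 100 ms
print(round(overlap_extent_ratio((1000, 2000), (1100, 2100)), 4))  # 0.8182
```

Averaging such per-pair ratios over all matched segments would then give an overall figure like the one in your output, but again, how pairs are matched (and how unmatched segments are counted) can change the result considerably.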