MIRA '99 (MIRA)
14th - 16th April 1999
The agreement (or disagreement) between relevance judges is becoming an increasingly important issue, for two reasons. First, new kinds of relevance judgment expression are in use: to the classical dichotomous judgment, researchers have added scalar and weighted judgments and orders of various kinds. Second, new media are being introduced: judging the relevance of an image is far quicker than judging that of a text, so human judgments can be obtained more easily.
This paper presents a coherent account of the disagreement between relevance judges and groups of judges. Judgment expressions of different kinds are taken into account, grouped into two categories. The first category, score judgments, comprises the more classical dichotomous, scalar, and weighted judgments. The second, order judgments, comprises total (or linear) and partial (or weak) orders, each with or without equality. The paper proposes a uniform notation for describing relevance judgments of each kind; describes some of the problems that arise when one tries to measure the disagreement between judges operationally; proposes a measure of the disagreement between two judges expressing judgments of the same kind; discusses the disagreement of a group of more than two judges; and, finally, sketches some experimental activity inspired by this study.