Assuming that there are no double-negative ratings, the kappa statistic would be 0.129 (95% confidence interval (95% CI): −0.058 to 0.208; all confidence intervals were obtained by the bootstrap method described in Appendix 2); agreement would be no better than expected by chance. What is a reasonable estimate of the number of potential lesion sites? The highest number of lesions detected in a single patient was 17, suggesting that the number of potential lesion sites per patient was at least 17. A patient without lesions must therefore count for at least 17 double-negative evaluations. In that case, the total number of sites assessed would be 84 × 17 = 1428 and, by subtraction, the number of double-negative evaluations would be 1179, and the kappa statistic would be far higher at 0.789 (95% CI: 0.696 to 0.868). However, 95 distinct lesion sites were identified in the sample. If the number of potential lesion sites per patient is 95, the total number of sites would be 7980, the number of double-negative evaluations 7731, and the kappa statistic 0.815 (95% CI: 0.731 to 0.884). It can be assumed, though, that the universe of possible lesion sites is larger than the few observed in this sample. Figure 1 (solid line) shows the kappa statistic as the assumed number of potential sites per patient varies from 17 to 200; the horizontal line corresponds to the free-response kappa of 0.820 (95% CI: 0.737 to 0.888). This example shows that kappa is underestimated when potentially unlimited negative ratings are ignored or undercounted.
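The effect described above can be sketched numerically. Only the grand totals come from the text (249 observed lesion sites, since 1428 − 1179 = 249 and 7980 − 7731 = 249); the split of those 249 sites into agreement and disagreement cells below is a hypothetical stand-in, so the printed values will not match the article's estimates. The qualitative behavior, however, mirrors Figure 1: kappa rises toward a plateau as the assumed count of double-negative sites grows.

```python
def cohen_kappa(a, b, c, d):
    """Cohen's kappa from a 2x2 agreement table:
    a = both raters positive, d = both negative, b/c = disagreements."""
    n = a + b + c + d
    p_obs = (a + d) / n                      # observed agreement
    p_yes = ((a + b) / n) * ((a + c) / n)    # chance agreement on "positive"
    p_no = ((c + d) / n) * ((b + d) / n)     # chance agreement on "negative"
    p_chance = p_yes + p_no
    return (p_obs - p_chance) / (1 - p_chance)

# Hypothetical split of the 249 observed positive sites; d is the
# number of double-negative sites implied by each assumed universe
# of potential lesion sites (0, 17 per patient, 95 per patient).
a, b, c = 150, 50, 49
for d in (0, 1179, 7731):
    print(d, round(cohen_kappa(a, b, c, d), 3))
```

Note that with d = 0 this formulation of kappa can never exceed zero, which is one way to see why ignoring the negative cell depresses the statistic.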

[1] Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159-74.

Suppose you are analyzing data on a group of 50 people applying for a grant. Each grant proposal was read by two readers, and each reader answered either "yes" or "no" to the proposal. Suppose the counts of agreements and disagreements were as follows, with A and B denoting the readers, the cells on the main diagonal of the table (a and d) counting agreements, and the off-diagonal cells (b and c) counting disagreements: we find that the second case shows greater similarity between A and B than the first. Indeed, although the percentage of observed agreement is the same, the percentage of agreement that would occur "by chance" is much higher in the first case (0.54 vs. 0.46). Good agreement between raters is a desirable property of any diagnostic method. Agreement is generally assessed with the kappa statistic [1], which quantifies the extent to which the agreement observed between raters exceeds the agreement expected by chance alone. Evaluating the kappa statistic requires that the numbers of positive (or abnormal) and negative (or normal) ratings be known for all raters.
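The two cases can be reproduced with a short sketch. The cell counts below are illustrative values chosen to yield the quoted chance-agreement percentages (0.54 and 0.46) under identical observed agreement; they are not taken from the original tables.

```python
def agreement_stats(a, b, c, d):
    """Observed agreement, chance agreement, and Cohen's kappa
    for a 2x2 table of two readers: a = both say yes, d = both
    say no, b/c = the two kinds of disagreement."""
    n = a + b + c + d
    p_obs = (a + d) / n
    p_chance = ((a + b) / n) * ((a + c) / n) + ((c + d) / n) * ((b + d) / n)
    kappa = (p_obs - p_chance) / (1 - p_chance)
    return p_obs, p_chance, kappa

# Illustrative counts: observed agreement is 0.60 in both cases,
# but chance agreement differs (0.54 vs 0.46), so kappa differs too.
case1 = agreement_stats(45, 15, 25, 15)   # chance agreement 0.54, kappa ≈ 0.130
case2 = agreement_stats(25, 35, 5, 35)    # chance agreement 0.46, kappa ≈ 0.259
print(case1)
print(case2)
```

The second table yields the larger kappa: the same raw agreement counts for more when it is less likely to arise by chance.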

This is not the case when raters report only positive findings and do not report the number of negative ratings.