International Journal of Academic Medicine

BIOSTATISTICS
Year: 2016  |  Volume: 2  |  Issue: 2  |  Page: 217-219

Understanding the calculation of the kappa statistic: A measure of inter-observer reliability


Sidharth S Mishra, Nitika 
 Department of Community Medicine, School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh, India

Correspondence Address:
Nitika
School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh
India

Abstract

It is common practice to assess the consistency of diagnostic ratings in terms of “agreement beyond chance.” The kappa coefficient is a popular index of agreement for binary and categorical ratings. This article focuses on the calculation of the unweighted kappa statistic, providing a stepwise approach supplemented with an example, so that health care personnel may better understand the purpose of the kappa statistic and how to calculate it. The following core competencies are addressed in this article: Medical knowledge.






 Introduction



It is common practice to assess the consistency of diagnostic ratings in terms of “agreement beyond chance,” specifically, agreement between two clinicians under two different conditions or agreement among multiple clinicians under one condition.[1] To this end, a relevant statistical technique such as Cohen's kappa is used, which is a common index of agreement for binary and categorical ratings.[2] If the categories are unordered, the unweighted kappa statistic (K) is appropriate. If the categories are ordered, as they are in most rating scales in clinical, psychological, and epidemiological research, the weighted kappa statistic (K[w]) is preferable.[3] While there are many modifications and variants of the kappa statistic, this article focuses on the calculation of the unweighted kappa statistic by providing a stepwise approach supplemented with an example.
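As a brief practical aside (not part of the original article), both forms of kappa are available in standard statistical software; the sketch below uses the cohen_kappa_score function from scikit-learn on hypothetical ordinal ratings from two observers, where the weights argument switches between the unweighted statistic K and the weighted statistic K(w).

from sklearn.metrics import cohen_kappa_score

# Hypothetical ordinal ratings from two observers (0 = normal, 1 = borderline, 2 = abnormal)
rater_1 = [0, 1, 2, 1, 0, 2, 2, 1, 0, 1]
rater_2 = [0, 1, 1, 1, 0, 2, 2, 0, 0, 2]

k_unweighted = cohen_kappa_score(rater_1, rater_2)                    # K, categories treated as unordered
k_weighted = cohen_kappa_score(rater_1, rater_2, weights="linear")    # K(w), partial credit for near-misses

print(f"Unweighted kappa: {k_unweighted:.2f}")
print(f"Linear-weighted kappa: {k_weighted:.2f}")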

 Estimating the Kappa Statistic



Step 1: Calculate the percentages of each row and column out of the grand total of all four cells [Table 1].

Step 2: Calculate the percentage of observed agreement

Observed agreement (%) = [(A + D)/Grand total] × 100

Step 3: Calculate the percentage of agreement expected by chance alone.

Agreement is present in two cells: cell A, in which both observers give a positive rating, and cell D, in which both observers give a negative rating. Here, “a” is the expected value for cell A, and “d” is the expected value for cell D.

For each cell, the expected value is calculated as:

Expected value of a cell = (Row total × Column total)/Grand total

That is, a = (Row total for cell A × Column total for cell A)/Grand total

The same method is followed to calculate d [Table 2].

Percentage agreement expected by chance is

Agreement expected by chance (%) = [(a + d)/Grand total] × 100

Step 4: Calculate the kappa statistic.

Kappa (K) = (Observed agreement % - Agreement expected by chance %)/(100 - Agreement expected by chance %)

Step 5 (inference): Landis and Koch [4] suggested that a kappa value of more than 0.75 represents excellent agreement beyond chance, whereas a value below 0.40 represents poor agreement; a kappa value in the range of 0.40–0.75 represents intermediate to good agreement.
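To make the arithmetic of Steps 1–4 concrete, the following short Python sketch (not part of the original article; the 2 × 2 cell counts are hypothetical) implements the calculation exactly as described above.

def unweighted_kappa(A, B, C, D):
    # A = both observers positive, D = both observers negative,
    # B and C = the two discordant cells of the 2 x 2 table.
    grand_total = A + B + C + D

    # Step 2: percentage of observed agreement
    observed_pct = (A + D) / grand_total * 100

    # Step 3: expected values for the two concordant cells,
    # each obtained as (row total x column total) / grand total
    a = (A + B) * (A + C) / grand_total
    d = (C + D) * (B + D) / grand_total
    expected_pct = (a + d) / grand_total * 100

    # Step 4: kappa
    return (observed_pct - expected_pct) / (100 - expected_pct)

# Hypothetical table: A = 40, B = 10, C = 10, D = 40
print(f"{unweighted_kappa(40, 10, 10, 40):.2f}")   # prints 0.60

By the guideline quoted in Step 5, the value of 0.60 obtained for this hypothetical table would indicate intermediate to good agreement.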

 Example of Estimation of the Kappa Statistic



Suppose that 100 patients suffering from pancreatic carcinoma underwent contrast-enhanced computed tomography of the abdomen and that two radiologists reviewed the reports [Table 3].

 Solution to Problem



Step 1: Calculate the percentages of each row and column out of the grand total of all four cells [Table 4].

Step 2: Calculate the percentage of observed agreement

Observed agreement (%) = [(A + D)/100] × 100, with A and D taken from Table 4

Step 3: Calculate the percentage of agreement expected by chance alone [Table 5].

For each cell, the expected value is calculated as:

Expected value of a cell = (Row total × Column total)/Grand total

That is, a = (45 × 55)/100 = 24.75

Similarly, b = 20.25, c = 30.25, and d = 24.75

Percentage agreement expected by chance alone

Agreement expected by chance (%) = [(24.75 + 24.75)/100] × 100 = 49.5%

Step 4: Calculate the kappa statistic.

Kappa (K) = (Observed agreement % - 49.5)/(100 - 49.5)

Step 5 (inference): The resulting kappa value indicates intermediate to good agreement (0.40–0.75).
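As a cross-check (and not part of the original solution), the expected values quoted in Step 3 can be reproduced from the marginal totals they imply. Since Tables 3–5 are not reproduced in this text version, the row and column totals below (45/55 and 55/45) are inferred from the stated values of b, c, and d; the observed concordant counts A and D remain in Table 4 and are not assumed here.

# Inferred marginal totals for the example table (assumption: deduced from
# b = 20.25, c = 30.25, and d = 24.75 as quoted in the text).
grand_total = 100
row_totals = (45, 55)
col_totals = (55, 45)

a = row_totals[0] * col_totals[0] / grand_total   # expected value for cell A
b = row_totals[0] * col_totals[1] / grand_total
c = row_totals[1] * col_totals[0] / grand_total
d = row_totals[1] * col_totals[1] / grand_total

chance_pct = (a + d) / grand_total * 100
print(a, b, c, d, chance_pct)   # 24.75 20.25 30.25 24.75 49.5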

 Conclusion



The kappa statistic is a frequently used measure of inter-observer reliability, but its manual calculation may cause confusion. The aim of this article is to help health care personnel better understand the purpose of the kappa statistic and how to calculate it.

Acknowledgment

We would like to thank Dr. Reshmi Mishra and Dr. Tushar Subhadarshan Mishra.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

References

1. Cao H, Sen PK, Peery AF, Dellon ES. Assessing agreement with multiple raters on correlated kappa statistics. Biom J. doi: 10.1002/bimj.201500029.
2. Barnhart HX, Williamson JM. Weighted least-squares approach for comparing correlated kappa. Biometrics 2002;58:1012-9.
3. Ludbrook J. Statistical techniques for comparing measurers and methods of measurement: A critical review. Clin Exp Pharmacol Physiol 2002;29:527-36.
4. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-74.