BIOSTATISTICS
Year: 2016 | Volume: 2 | Issue: 2 | Page: 217-219
Understanding the calculation of the kappa statistic: A measure of inter-observer reliability
Sidharth S Mishra, Nitika
Department of Community Medicine, School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh, India
Correspondence Address:
Nitika, School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh, India
Abstract
It is common practice to assess the consistency of diagnostic ratings in terms of “agreement beyond chance.” The kappa coefficient is a popular index of agreement for binary and categorical ratings. This article focuses on the calculation of the unweighted kappa statistic, providing a stepwise approach supplemented with an example. The aim is that health care personnel may better understand the purpose of the kappa statistic and how to calculate it.
The following core competencies are addressed in this article: Medical knowledge.
How to cite this article:
Mishra SS, Nitika. Understanding the calculation of the kappa statistic: A measure of inter-observer reliability. Int J Acad Med 2016;2:217-219.
Introduction
It is common practice to assess the consistency of diagnostic ratings in terms of “agreement beyond chance,” specifically, the agreement between two clinicians under two different conditions or the agreement among multiple clinicians under one condition.[1] To this end, we consider a relevant statistical technique such as Cohen's kappa, which is a common index of agreement for binary and categorical ratings.[2] If the categories are unordered, the unweighted kappa statistic (K) is appropriate. If the categories are ordered – as they are in most rating scales in clinical, psychological, and epidemiological research – the weighted kappa statistic (K[w]) is preferable.[3] While there are many modifications and variants of the kappa statistic, this article focuses on the calculation of the unweighted kappa statistic by providing a stepwise approach supplemented with an example.
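In routine practice, the kappa statistic is rarely worked out by hand; most statistical software reports it directly. As a minimal sketch (not part of the original article), assuming the Python library scikit-learn is available and that the two raters' categorical ratings are stored as parallel lists, the unweighted coefficient could be obtained as follows; the ratings shown are purely illustrative.

from sklearn.metrics import cohen_kappa_score

# Illustrative ratings from two hypothetical raters (not data from this article)
rater1 = ["present", "present", "absent", "absent", "present", "absent"]
rater2 = ["present", "absent", "absent", "absent", "present", "present"]

# Unweighted kappa, appropriate for unordered categories
print(cohen_kappa_score(rater1, rater2))

# For ordered categories, a weighted kappa can be requested by passing
# weights="linear" or weights="quadratic" with numeric/ordinal ratings.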
Estimating the Kappa Statistic
Step 1: Calculate the totals of each row and column as percentages of the grand total of all four cells [Table 1].{Table 1}
Step 2: Calculate the percentage of observed agreement (P_o), that is, the proportion of cases on which both observers agree:
\[ P_o = \frac{A + D}{\text{grand total}} \times 100 \]
Step 3: Calculate the percentage of agreement expected by chance alone.
Agreement is present in two cells: cell A, in which both observers rate the finding as present, and cell D, in which both observers rate it as absent. Here, “a” is the expected value for cell A, and “d” is the expected value for cell D.
The expected value for each of these cells is calculated as
\[ \text{expected cell value} = \frac{\text{row total} \times \text{column total}}{\text{grand total}} \]
That is,
\[ a = \frac{\text{row total of cell A} \times \text{column total of cell A}}{\text{grand total}} \]
The same method is followed to calculate d [Table 2].{Table 2}
The percentage of agreement expected by chance (P_e) is
\[ P_e = \frac{a + d}{\text{grand total}} \times 100 \]
Step 4: Calculate the kappa statistic as
\[ \kappa = \frac{P_o - P_e}{100 - P_e} \]
Step 5 (inference): Landis and Koch [4] suggested that a kappa value greater than 0.75 represents excellent agreement beyond chance, whereas a value below 0.40 represents poor agreement. A kappa value in the range of 0.40–0.75 represents intermediate to good agreement.
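The five steps above can be collected into a short routine. The sketch below is an illustration added for this walkthrough, not part of the original article: it assumes the four cell counts of a 2 × 2 table are known, with A and D as the agreement cells and B and C as the disagreement cells, and the counts in the usage line are made up for demonstration.

def unweighted_kappa(a_count, b_count, c_count, d_count):
    # Step 1: row totals, column totals, and the grand total of all four cells
    n = a_count + b_count + c_count + d_count
    # Step 2: percentage of observed agreement
    observed = (a_count + d_count) / n * 100
    # Step 3: expected values of the agreement cells,
    # (row total x column total) / grand total, and the chance agreement
    a_expected = (a_count + b_count) * (a_count + c_count) / n
    d_expected = (c_count + d_count) * (b_count + d_count) / n
    chance = (a_expected + d_expected) / n * 100
    # Step 4: kappa = (observed - chance) / (100 - chance)
    return (observed - chance) / (100 - chance)

# Made-up counts: A = 35, B = 10, C = 15, D = 40
print(unweighted_kappa(35, 10, 15, 40))   # prints 0.5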
Example of Estimation of the Kappa Statistic
Suppose that 100 patients suffering from pancreatic carcinoma underwent contrast-enhanced computed tomography of the abdomen and that two radiologists reviewed the reports [Table 3].{Table 3}
Solution to the Problem
Step 1: Calculate the totals of each row and column as percentages of the grand total of all four cells [Table 4].{Table 4}
Step 2: Calculate the percentage of observed agreement
\[ P_o = \frac{A + D}{100} \times 100 \]
where A and D are the counts in the two agreement cells of [Table 4].
Step 3: Calculation of the percentage of agreement expected by chance alone [Table 5].{Table 5}
The expected value for each cell is calculated as
\[ \text{expected cell value} = \frac{\text{row total} \times \text{column total}}{\text{grand total}} \]
That is,
\[ a = \frac{45 \times 55}{100} = 24.75 \]
Similarly, b = 20.25, c = 30.25, and d = 24.75
The percentage of agreement expected by chance alone is
\[ P_e = \frac{24.75 + 24.75}{100} \times 100 = 49.5\% \]
Step 4:
\[ \kappa = \frac{P_o - 49.5}{100 - 49.5} \]
where P_o is the percentage of observed agreement from Step 2.
Step 5 (inference): The resulting kappa value lies in the range of 0.40–0.75, indicating intermediate to good agreement.
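As a cross-check of Step 3 of this example, the expected values and the chance agreement of 49.5% can be reproduced from the marginal totals implied by the expected values quoted above (row totals of 45 and 55 and column totals of 55 and 45; which radiologist forms the rows and which category comes first is assumed here). The observed counts A and D themselves appear only in [Table 4], so the final kappa is left in terms of the observed percentage of agreement.

n = 100
row_totals = (45, 55)   # assumed: radiologist 1 rated 45 positive, 55 negative
col_totals = (55, 45)   # assumed: radiologist 2 rated 55 positive, 45 negative

a = row_totals[0] * col_totals[0] / n   # expected value of cell A -> 24.75
b = row_totals[0] * col_totals[1] / n   # -> 20.25
c = row_totals[1] * col_totals[0] / n   # -> 30.25
d = row_totals[1] * col_totals[1] / n   # -> 24.75

chance = (a + d) / n * 100              # percentage agreement by chance -> 49.5
print(a, b, c, d, chance)

# Kappa would then be (observed% - 49.5) / (100 - 49.5), with the observed
# percentage of agreement computed from the counts in [Table 4].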
Conclusion
The kappa statistic is a frequently used measure of inter-observer reliability, but its manual calculation may cause confusion. The aim of this article is to help health care personnel better understand the purpose of the kappa statistic and how to calculate it.
Acknowledgment
We would like to thank Dr. Reshmi Mishra and Dr. Tushar Subhadarshan Mishra.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References
1. Cao H, Sen PK, Peery AF, Dellon ES. Assessing agreement with multiple raters on correlated kappa statistics. Biom J. doi: 10.1002/bimj.201500029.
2. Barnhart HX, Williamson JM. Weighted least-squares approach for comparing correlated kappa. Biometrics 2002;58:1012-9.
3. Ludbrook J. Statistical techniques for comparing measurers and methods of measurement: A critical review. Clin Exp Pharmacol Physiol 2002;29:527-36.
4. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-74.