

BIOSTATISTICS 

Year : 2016  Volume : 2  Issue : 2  Page : 217-219

Understanding the calculation of the kappa statistic: A measure of interobserver reliability
Sidharth S Mishra, Nitika
Department of Community Medicine, School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh, India
Date of Submission  22-Feb-2016
Date of Acceptance  29-Mar-2016
Date of Web Publication  28-Dec-2016
Correspondence Address: Nitika, School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh, India
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/2455-5568.196883
It is common practice to assess the consistency of diagnostic ratings in terms of “agreement beyond chance.” The kappa coefficient is a popular index of agreement for binary and categorical ratings. This article focuses on calculating the unweighted kappa statistic, using a stepwise approach supplemented with an example. The aim is to help health care personnel better understand the purpose of the kappa statistic and how to calculate it. The following core competency is addressed in this article: Medical knowledge.
Keywords: Interrater agreement, kappa coefficient, unweighted kappa
How to cite this article: Mishra SS, Nitika. Understanding the calculation of the kappa statistic: A measure of interobserver reliability. Int J Acad Med 2016;2:217-9
How to cite this URL: Mishra SS, Nitika. Understanding the calculation of the kappa statistic: A measure of interobserver reliability. Int J Acad Med [serial online] 2016 [cited 2020 Aug 6];2:217-9. Available from: http://www.ijamweb.org/text.asp?2016/2/2/217/196883
Introduction   
It is common practice to assess the consistency of diagnostic ratings in terms of “agreement beyond chance,” specifically, agreement between two clinicians under two different conditions or agreement among multiple clinicians under one condition.^{[1]} Cohen's kappa is a common index of agreement for binary and categorical ratings.^{[2]} If the categories are unordered, the unweighted kappa statistic ($\kappa$) is appropriate. If the categories are ordered, as they are in most rating scales in clinical, psychological, and epidemiological research, the weighted kappa statistic ($\kappa_w$) is preferable.^{[3]} Although there are many modifications and variants of the kappa statistic, this article focuses on calculating the unweighted kappa statistic, using a stepwise approach supplemented with an example.
Estimating the Kappa Statistic   
Step 1: Calculate the percentages of each row and column out of the grand total of all four cells [Table 1].
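Step 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: the function name `marginal_percentages`, the `[[A, B], [C, D]]` list layout (rows for one rater, columns for the other), and the counts are all hypothetical.

```python
def marginal_percentages(table):
    """Row and column totals of a 2x2 table as percentages of the grand total.

    `table` is [[A, B], [C, D]] with raw counts (assumed layout):
    rows are one rater's categories, columns the other rater's.
    """
    (a, b), (c, d) = table
    n = a + b + c + d  # grand total of all four cells
    rows = [100 * (a + b) / n, 100 * (c + d) / n]
    cols = [100 * (a + c) / n, 100 * (b + d) / n]
    return rows, cols


# Hypothetical counts, not taken from the article.
rows, cols = marginal_percentages([[40, 10], [5, 45]])
print(rows, cols)  # [50.0, 50.0] [45.0, 55.0]
```

Each set of marginal percentages sums to 100, which is a quick check on the arithmetic.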
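Step 2: Calculate the percentage of observed agreement, that is, the percentage of subjects whom both raters place in the same category (cells A and D):
$$P_o = \frac{A + D}{N} \times 100$$
where $N = A + B + C + D$ is the grand total of all four cells.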
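A matching sketch of Step 2, under the same assumptions (hypothetical function name, `[[A, B], [C, D]]` layout, made-up counts). Only the diagonal cells A and D count as agreement.

```python
def observed_agreement(table):
    """Percentage of subjects on which both raters agree (cells A and D)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    # Multiply before dividing so whole-number percentages stay exact.
    return 100 * (a + d) / n


p_o = observed_agreement([[40, 10], [5, 45]])  # hypothetical counts
print(p_o)  # 85.0
```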
Step 3: Calculate the percentage of agreement expected by chance alone.
Agreement occurs in two cells: A, in which both raters give a positive rating, and D, in which both raters give a negative rating. Let “a” denote the expected value for cell A and “d” the expected value for cell D.
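The expected value of each cell is the product of its row percentage and its column percentage, divided by 100. That is,
$$a = \frac{R_1 \times C_1}{100}$$
where $R_1$ and $C_1$ are the percentages (from Step 1) of the row and the column that contain cell A. The same method gives d [Table 2]:
$$d = \frac{R_2 \times C_2}{100}$$
where $R_2$ and $C_2$ are the percentages of the row and the column that contain cell D.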
The percentage of agreement expected by chance is therefore $P_e = a + d$.
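Step 3 in code, again a sketch with a hypothetical function name and made-up counts in the assumed `[[A, B], [C, D]]` layout. Each expected cell is its row percentage times its column percentage divided by 100, and only the two agreement cells are summed.

```python
def chance_agreement(table):
    """Expected cells a and d under chance alone, and P_e = a + d (percent)."""
    (a_obs, b_obs), (c_obs, d_obs) = table
    n = a_obs + b_obs + c_obs + d_obs
    r1 = 100 * (a_obs + b_obs) / n  # percentage of the row containing A
    r2 = 100 * (c_obs + d_obs) / n  # percentage of the row containing D
    c1 = 100 * (a_obs + c_obs) / n  # percentage of the column containing A
    c2 = 100 * (b_obs + d_obs) / n  # percentage of the column containing D
    a_exp = r1 * c1 / 100
    d_exp = r2 * c2 / 100
    return a_exp, d_exp, a_exp + d_exp


result = chance_agreement([[40, 10], [5, 45]])  # hypothetical counts
print(result)  # (22.5, 27.5, 50.0)
```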
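Step 4: Calculate the kappa statistic:
$$\kappa = \frac{P_o - P_e}{100 - P_e}$$
where $P_o$ is the percentage of observed agreement (Step 2) and $P_e$ is the percentage of agreement expected by chance (Step 3).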
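Step 4 as a small function, a sketch assuming both inputs are percentages as in the steps above. The guard against $P_e = 100$ is an added assumption, since the formula divides by zero there.

```python
def kappa(p_o, p_e):
    """Unweighted kappa from percentage observed (p_o) and chance (p_e) agreement."""
    if p_e == 100:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (p_o - p_e) / (100 - p_e)


# Hypothetical inputs: 85% observed agreement, 50% expected by chance.
print(kappa(85.0, 50.0))  # 0.7
```

Working in proportions instead of percentages, $(p_o - p_e)/(1 - p_e)$, gives the same value.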
Step 5 (inference): Landis and Koch^{[4]} suggested that a kappa value above 0.75 represents excellent agreement beyond chance, a value below 0.40 represents poor agreement, and a value in the range of 0.40–0.75 represents intermediate to good agreement.
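The cut-offs in Step 5 can be written as a lookup. This is a minimal sketch; the function name and the treatment of the boundary values 0.40 and 0.75 (both placed in the intermediate to good band) are assumptions.

```python
def interpret_kappa(k):
    """Map a kappa value to the three bands quoted in Step 5."""
    if k > 0.75:
        return "excellent"
    if k < 0.40:
        return "poor"
    return "intermediate to good"  # 0.40 <= k <= 0.75


print(interpret_kappa(0.7))  # intermediate to good
```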
Example of Estimation of the Kappa Statistic   
Suppose that 100 patients with pancreatic carcinoma underwent contrast-enhanced computed tomography of the abdomen and that two radiologists reviewed the reports [Table 3].
Solution to Problem   
Step 1: Calculate the percentages of each row and column out of the grand total of all four cells [Table 4].
Step 2: Calculate the percentage of observed agreement
Step 3: Calculate the percentage of agreement expected by chance alone [Table 5].
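For each cell, the expected value is the product of its row and column percentages, divided by 100. For cell A, that is,
$$a = \frac{45 \times 55}{100} = 24.75$$
Similarly, $b = 20.25$, $c = 30.25$, and $d = 24.75$.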
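The expected values for the example can be cross-checked with a short Python sketch. It assumes, as Step 3 implies, that the four expected cells are percentages of the grand total and therefore sum to 100; the marginal percentages are recovered from the expected cells rather than read from Table 4, so the row/column labels here are an assumption about that table's layout.

```python
# Expected percentages reported for the example (Step 3).
b, c, d = 20.25, 30.25, 24.75

# The four expected cells share the observed margins, so they sum to 100%.
a = 100 - (b + c + d)

# Marginal percentages implied by the expected cells.
row1, row2 = a + b, c + d
col1, col2 = a + c, b + d

p_e = a + d  # percentage agreement expected by chance
print(a, p_e)                  # 24.75 49.5
print(row1, row2, col1, col2)  # 45.0 55.0 55.0 45.0
print(row1 * col1 / 100)       # 24.75, matching a
```

Recomputing a from its row and column percentages reproduces the same 24.75, which confirms that b, c, and d are consistent with one set of margins.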
The percentage of agreement expected by chance alone is $P_e = a + d = 24.75 + 24.75 = 49.5$.
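Step 4: $\kappa = \frac{P_o - P_e}{100 - P_e} = \frac{P_o - 49.5}{50.5}$, where $P_o$ is the percentage of observed agreement from Step 2.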
Step 5 (inference): The kappa value lies in the range of 0.40–0.75, which indicates intermediate to good agreement between the two radiologists.
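With the example's chance agreement of 49.5% (a + d = 24.75 + 24.75), the kappa formula can be inverted to see which observed agreement percentages are consistent with an intermediate to good result. This is an illustrative back-calculation under the 0.40 and 0.75 cut-offs from Step 5, not data from Table 3; the function name is hypothetical.

```python
P_E = 49.5  # chance agreement for the example, in percent


def observed_for_kappa(k, p_e=P_E):
    """Invert kappa = (p_o - p_e) / (100 - p_e) to solve for p_o."""
    return p_e + k * (100 - p_e)


low = observed_for_kappa(0.40)
high = observed_for_kappa(0.75)
print(round(low, 3), round(high, 3))  # 69.7 87.375
```

So any observed agreement between roughly 69.7% and 87.4% would yield a kappa in the intermediate to good band for this table.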
Conclusion   
The kappa statistic is a frequently used measure of interobserver reliability, but its manual calculation may cause confusion. The aim of this article is to help health care personnel better understand the purpose of the kappa statistic and how to calculate it.
Acknowledgment
We would like to thank Dr. Reshmi Mishra and Dr. Tushar Subhadarshan Mishra.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References   
1.  Cao H, Sen PK, Peery AF, Dellon ES. Assessing agreement with multiple raters on correlated kappa statistics. Biom J. doi: 10.1002/bimj.201500029.
2.  Barnhart HX, Williamson JM. Weighted least-squares approach for comparing correlated kappa. Biometrics 2002;58:1012-9.
3.  Ludbrook J. Statistical techniques for comparing measurers and methods of measurement: A critical review. Clin Exp Pharmacol Physiol 2002;29:527-36.
4.  Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-74.
[Table 1], [Table 2], [Table 3], [Table 4], [Table 5]
