|Year : 2020 | Volume
| Issue : 2 | Page : 83-90
Modeling and forecasting of confirmed and recovered cases of COVID-19 in India
Anuradha Gautam1, Jayant Jha2, Ankit Kumar Singh3
1 Department of Community and Family Medicine, AIIMS Bhopal, Madhya Pradesh, India
2 Theoretical Neuroscience Group, Institute of Neuroscience Systems, Aix-Marseille University, Marseille, France
3 Member of Indian Forest Service, Ministry of Environment, Forest and Climate Change, New Delhi, India
|Date of Submission||30-Apr-2020|
|Date of Acceptance||04-May-2020|
|Date of Web Publication||29-Jun-2020|
Dr. Anuradha Gautam
Department of Community and Family Medicine, AIIMS, Bhopal, Madhya Pradesh
Source of Support: None, Conflict of Interest: None
Context: The novel coronavirus was first reported in the last few weeks of 2019 in Wuhan city, China. The spread and outbreaks of the disease require an epidemiological analysis in the shortest possible time and increased awareness of effective interventions.
Aims: This article proposes an objective approach to predicting the continuation of the COVID-19 cases in India using a simple, but powerful time-series method.
Settings and Design: Cumulative confirmed and cumulative recovered cases of COVID-19 in India are used to forecast the prevalence for the coming 3 weeks.
Subjects and Methods: The model is built to predict the number of confirmed cases and recovered cases based on the data available from March 14, 2020, to April 26, 2020.
Statistical Analysis Used: The autoregressive-integrated moving average model was applied to predict the number of confirmed cases and recovered cases of COVID-19 during the next 3 weeks.
Results: Our forecasts suggest a continuing increase in the confirmed COVID-19 cases with sizable associated uncertainty assuming that the data used are reliable and that the future will continue to follow the past pattern of the disease.
Conclusions: The timeline of a live-forecasting exercise with potential implications for planning and decision making is described.
The following core competencies are addressed in this article: Medical knowledge, Practice-based learning and improvement, Systems-based practice.
Keywords: Autoregressive-integrated moving average, COVID-19, forecast, India, pandemic, predictive model
|How to cite this article:|
Gautam A, Jha J, Singh AK. Modeling and forecasting of confirmed and recovered cases of COVID-19 in India. Int J Acad Med 2020;6:83-90
|How to cite this URL:|
Gautam A, Jha J, Singh AK. Modeling and forecasting of confirmed and recovered cases of COVID-19 in India. Int J Acad Med [serial online] 2020 [cited 2021 Jun 25];6:83-90. Available from: https://www.ijam-web.org/text.asp?2020/6/2/83/287956
| Introduction|| |
The novel coronavirus was first reported in the last few weeks of 2019 in Wuhan city, China, with the emergence of several cases of pneumonia of unknown etiology. Coronaviruses are a large family of enveloped viruses with single-stranded RNA and a crown-like appearance on their surface. The outbreak started as a local epidemic in China but subsequently spread worldwide, transmitted by international travelers. Currently, COVID-19 is affecting 210 countries and territories worldwide. On March 11, 2020, the WHO declared the virus outbreak a pandemic.
On 30th January, India reported its first case of COVID-19 in Kerala, which rose to three cases by 3rd February; all were students who had returned from Wuhan, China. No significant rise in cases was seen in the rest of February. On 4th March, 22 new cases came to light, including those of an Italian tourist group with 14 infected members.
As of April 26, 2020, according to the latest World Health Organization report, 2,804,796 confirmed COVID-19 cases, including 193,710 deaths, were reported worldwide, and a total of 26,496 confirmed COVID-19 cases were reported in India. Cases were increasing day by day because of human-to-human transmission through the inhalation of respiratory droplets and through asymptomatic carriers. The incubation period of COVID-19 can extend up to 14 days, and during this period, the virus can be transmitted to other people.
The rapid spread of the COVID-19 virus and outbreaks of the disease require an epidemiological analysis in the shortest possible time and increased awareness of effective interventions. While no vaccine or treatment is available to control the effect of COVID-19, the focus has been on restricting the spread through national lockdowns and quarantines. Therefore, to limit the loss of lives, the governments of most countries are imposing several restrictions on their citizens. The important concerns for people and health policy-makers are: (a) the trend of this COVID-19 outbreak and (b) the prediction of future cases. In this article, we aim to address these concerns by developing a data-driven time series model for confirmed and recovered COVID-19 cases to forecast COVID-19 prevalence in India for the coming 3 weeks.
| Subjects and Methods|| |
This time series study forecasts the cumulative number of confirmed cases and recovered cases of COVID-19 in India. The studied data include the number of confirmed and recovered cases of COVID-19 from March 14, 2020, to April 26, 2020, and are used to predict the prevalence in India for the next 3 weeks. Data were taken from covid19india.org. The dataset provides the number of new cases, deceased cases, and recovered cases on a daily basis and also cumulatively.
The autoregressive-integrated moving average (ARIMA) model is used to forecast the time series of case counts. ARIMA is one of the most widely used prediction models for epidemic time series.
ARIMA models are applied in some cases where the data show evidence of nonstationarity, where an initial differencing step (corresponding to the “integrated” part of the model) can be applied one or more times to eliminate the nonstationarity.
ARIMA models are generally denoted ARIMA (p, d, q), where the parameters p, d, and q are nonnegative integers: p is the order (number of time lags) of the autoregressive model, d is the degree of differencing (the number of times the data have had past values subtracted), and q is the order of the moving-average model.
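To make the role of d concrete, the following is a minimal sketch in plain Python, not the authors' R code. The helper names `difference` and `forecast_arima_020` and the input series are hypothetical and for illustration only. It assumes an ARIMA (0, 2, 0) model without a constant term, for which the point forecast simply carries the last observed daily increment forward.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (the 'integrated' step)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def forecast_arima_020(series, horizon):
    """Point forecasts for ARIMA(0, 2, 0) with no constant.

    The model is y_t = 2*y_{t-1} - y_{t-2} + e_t, so the h-step forecast
    is the last value plus h times the last observed increment.
    """
    last = series[-1]
    slope = series[-1] - series[-2]
    return [last + h * slope for h in range(1, horizon + 1)]


# Hypothetical cumulative counts, for illustration only (not the study data).
cumulative = [100, 120, 150, 190, 240]

first_diff = difference(cumulative, 1)    # daily new cases
second_diff = difference(cumulative, 2)   # change in daily new cases
forecast = forecast_arima_020(cumulative, 3)

print(first_diff)   # [20, 30, 40, 50]
print(second_diff)  # [10, 10, 10]
print(forecast)     # [290, 340, 390]
```

With d = 2, the model works on the day-to-day change in new cases, which is why a steadily accelerating cumulative curve can become roughly stationary after two rounds of differencing.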
The goodness of fit is evaluated by thorough inspection of the residuals and the Ljung-Box test for residual autocorrelation. The level of significance is set at 0.05 for our model.
Training set and test set
The data are divided into a training set and a test set. The overall data span 44 days (from March 14, 2020, to April 26, 2020), of which the first 35 days' data (79.5%) are used as the training set for the predictive model, and the last 9 days' data (20.5%) are used as the test set. The model is fitted to the training set using ARIMA, and the prediction errors are calculated on the test set to evaluate the performance of the forecast.
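The split described above can be checked with a short sketch. The variable names are illustrative assumptions; only the dates and the 35/9 split come from the text.

```python
from datetime import date, timedelta

# Study window stated in the text.
start, end = date(2020, 3, 14), date(2020, 4, 26)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Chronological split: the first 35 days train the model, the rest test it.
n_train = 35
train_days, test_days = days[:n_train], days[n_train:]

train_share = round(100 * len(train_days) / len(days), 1)
test_share = round(100 * len(test_days) / len(days), 1)

print(len(days), len(train_days), len(test_days))  # 44 35 9
print(train_days[-1], test_days[0])                # 2020-04-17 2020-04-18
print(train_share, test_share)                     # 79.5 20.5
```

The split is chronological rather than random because a forecast must be evaluated on observations that come after the data used for fitting.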
We applied the “AUTOARIMA” package in R to select the values of (p, d, q) and to forecast the number of cases in India. Two ARIMA models, one for confirmed and one for recovered COVID-19 cases, were built. The statistical analysis was performed using “R” version 1.2.5.
The optimal ARIMA model parameters were chosen using the Akaike information criterion (AIC). For the best-fitted model, we also present a measure of forecast accuracy, i.e., the mean absolute percentage error (MAPE). We examine the autocorrelation function (ACF) of the residuals. We also tested the common statistical assumptions for the validity of the model, i.e., normality and independence, through residual diagnostics.
| Results|| |
The ARIMA model is used for predicting the “cumulative-confirmed cases” and “cumulative-recovered cases” of COVID-19 in India for the next 21 days. [Figure 1] shows the trend of cumulative number of confirmed and recovered cases in India from March 14, 2020, to April 26, 2020. The preliminary analysis of data shows that the cumulative COVID-19-confirmed cases and recovered cases have been increasing regularly since the start of the pandemic.
|Figure 1: Cumulative-confirmed cases progression (Left) and Cumulative recovered case progression (Right) of COVID-19 in India (from 14th March to 26th April 2020)|
Based on AIC value, for confirmed cases ARIMA (1, 2, 0) was selected as the best ARIMA model, while for recovered cases ARIMA (0, 2, 0) was selected as the best ARIMA model. We have considered MAPE parameter as a measure of accuracy. It is defined as:
Accuracy % =100 − MAPE
[Table 1] presents the accuracy parameters. The prediction error is assessed to validate the model. The ARIMA (1, 2, 0) model for confirmed cases and the ARIMA (0, 2, 0) model for recovered cases are validated with an accuracy of 95.70% and 89.36%, respectively.
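As an illustration of the accuracy measure, the sketch below computes MAPE and the derived accuracy on a small hypothetical test set. The function names and the numbers are assumptions for illustration and are not the values behind [Table 1]. Under the same definition, the reported accuracies of 95.70% and 89.36% correspond to MAPE values of 4.30% and 10.64%.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


def accuracy_percent(actual, predicted):
    """Accuracy as defined in the text: 100 minus MAPE."""
    return 100 - mape(actual, predicted)


# Hypothetical observed and forecast values, for illustration only.
actual = [200, 250, 400]
predicted = [190, 260, 380]

print(round(mape(actual, predicted), 2))              # 4.67
print(round(accuracy_percent(actual, predicted), 2))  # 95.33
```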
|Table 1: Autoregressive-integrated moving average model accuracy parameters|
Analysis of residuals
Residuals are what is left over after fitting a model, i.e., the differences between the observed values and the fitted model values. We use the residuals to check the validity of the fitted model. We used the checkresiduals() function in R, which provides the following:
- Plot the residuals over time
- ACF plot of residuals
- QQ plot and histogram of residuals along with normal distribution curve to compare them
- Carry out a Ljung-Box test with correct degrees of freedom.
Residual analysis for confirmed cases ARIMA (1, 2, 0) model
The results for confirmed cases are given in Box 1 and [Figure 2].
Residual analysis for recovered cases ARIMA (0, 2, 0) model
The results for recovered cases are given in Box 2 and [Figure 3].
Observations from the analysis of residuals for confirmed and recovered cases are as follows:
- The Ljung-Box Q* statistic gives P > 0.05 [Box 1] and [Box 2], i.e., the test is nonsignificant and the residuals cannot be distinguished from white noise
- The ACF of the residuals [Figure 2] and [Figure 3] shows that there is no significant autocorrelation
- The histogram [Figure 2] and [Figure 3] shows that the mean of residuals is close to zero, but the right tail is a bit long. The deviation from normality is not substantial
- The QQ Plot [Figure 4] and [Figure 5] also supports the assumption of normality for the errors. Thus, the errors are approximately normal, apart from a few outliers in the tails
- The time series of residuals [Figure 2] and [Figure 3] is almost symmetric about zero.
Overall, it can be concluded that the residual analysis shows no major departure from the model assumptions, and therefore, we use the selected models for forecasting.
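For readers who want to see what the Ljung-Box statistic measures, the following is a minimal sketch of the Q statistic in plain Python. It is an illustration under stated assumptions, not the R routine used in the study: the function names and the residual series are hypothetical, and the P value (which requires the chi-square distribution, with degrees of freedom reduced by the number of fitted ARMA parameters) is not computed here.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic: Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    total = sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )
    return n * (n + 2) * total


# Hypothetical, strongly alternating residuals (clearly not white noise).
residuals = [1, -1, 1, -1, 1, -1, 1, -1]
q = ljung_box_q(residuals, max_lag=1)

print(q)          # 8.75
print(q > 3.841)  # True: exceeds the 5% chi-square critical value for 1 df
```

A small Q (and hence P > 0.05) means the residual autocorrelations are jointly compatible with white noise, which is the outcome reported for both fitted models.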
The forecast for confirmed cases is presented in [Table 2] and for recovered cases is presented in [Table 3] with 80% as well as 95% confidence intervals (CIs). According to the forecast, the number of confirmed COVID-19 cases is expected to reach 52,989 in the coming 3 weeks. Furthermore, the recovered cases are expected to increase considerably.
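The widening of the 80% and 95% limits with the forecast horizon can be sketched for the simplest of the two selected models. The example assumes an ARIMA (0, 2, 0) model without a constant, Gaussian errors, and a known residual standard deviation; the function name and all numbers are hypothetical and do not reproduce [Table 2] or [Table 3].

```python
from statistics import NormalDist


def interval_arima_020(last, slope, sigma, horizon, level):
    """Forecast interval for ARIMA(0, 2, 0) without a constant.

    The psi-weights of (1 - B)^-2 are 1, 2, 3, ..., so the h-step
    forecast variance is sigma^2 * (1^2 + 2^2 + ... + h^2).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    point = last + horizon * slope
    se = sigma * sum(j * j for j in range(1, horizon + 1)) ** 0.5
    return point - z * se, point, point + z * se


# Hypothetical inputs: last cumulative count, last daily increment,
# and residual standard deviation.
last, slope, sigma = 1000, 50, 10

for h in (1, 7, 21):
    lo80, point, hi80 = interval_arima_020(last, slope, sigma, h, 0.80)
    lo95, _, hi95 = interval_arima_020(last, slope, sigma, h, 0.95)
    print(h, point, round(hi80 - lo80, 1), round(hi95 - lo95, 1))
```

Because the uncertainty accumulates through two levels of integration, the interval width grows much faster than the horizon itself, which is consistent with the sizable uncertainty noted for the 3-week forecast.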
|Table 2: Forecasted confirmed COVID-19 cases and their lower and upper limits for 21 days (April 27, 2020 to May 17, 2020) with 80% and 95% confidence interval|
|Table 3: Forecasted recovered COVID-19 cases and their lower and upper limits for 21 days (April 27, 2020 to May 17, 2020) with 80% and 95% confidence interval|
[Figure 6] summarizes all the cumulative data (actual value, training dataset, test dataset, and forecasted value) for confirmed and recovered cases. It can be seen from [Figure 6] that the predicted values closely follow the actual values. Thus, our models fit the data well. Based on our predictions, as evident from [Figure 6] and [Figure 7], the number of cumulative-confirmed cases and cumulative-recovered cases will follow an upward trend in the near future.
|Figure 6: Actual versus predicted data for confirmed (left) and recovered cases (right)|
|Figure 7: Predictions with confidence intervals of confirmed case model (left), and recovered case model (right) (Black line: Actual data, Blue line: 21-day forecast, Grey zone: 80% of confidence interval, White zone: 95% of confidence intervals)|
| Discussion|| |
The forecast for confirmed cases and the recovered cases are presented in [Table 2] and [Table 3], respectively, with 80% as well as 95% CI.
According to the forecast, the number of confirmed COVID-19 cases is expected to increase considerably in the coming 21 days. In India, the first case was reported on 30th January, and confirmed cases started increasing rapidly from March 4, 2020. A lockdown was announced from 25th March, so it is suspected that the increase in confirmed cases may be associated with large social gatherings that took place just before the lockdown. Many of the people who attended were not tested for COVID-19 or are untraceable, partly because some of them are asymptomatic. Such cases may cause further transmission and lead to a higher number of infections.
The quarantine period of 14 days is also very important for breaking the chain of infection. However, in some cases, people did not follow the 14-day self-isolation period after returning from abroad, as prescribed by the government authorities. This may be a major cause of infection.
In our study, we have assumed that the data are accurate and that recent patterns will continue to apply (including precautionary measures). It is noteworthy that the number of cases in India is still quite low compared with the developed world. This becomes significant when we take into account the relatively large population of India and the state of its healthcare facilities. So far, the prevalence of COVID-19 has remained under control. At present (April 26, 2020), the total number of confirmed patients and deaths in India is 26,496 and 824, respectively. Nevertheless, the number of cases in India is on the rise. Advance knowledge of prevalence through forecasting of COVID-19 in India may help in controlling the disease.
The results of our study indicate that if the spreading pattern continues as before, there will be a sharp increase in the number of cumulative cases in the coming days. Based on our ARIMA predictions, the number of cumulative-confirmed cases would be 52,989 and the number of recovered cases would be 15,237 on May 17, 2020.
It can be observed from the global trend that early population-based intervention measures, such as lockdown and social distancing, have helped contain the prevalence, with a lower number of new confirmed cases and a significantly reduced magnitude of exponential growth.
In Europe, Sweden, the United Kingdom (UK), and the Netherlands acted relatively slowly. Each of the three implemented “herd immunity” strategies, in the early stages of their epidemics, which sought reliance mainly on voluntary compliance; albeit later, the UK and the Netherlands switched to more aggressive responses, including nation-wide lockdown.
Meanwhile, Germany and Greece stand out as countries that adopted aggressive and early population-based interventions and control strategies compared with Italy, Spain, France, and USA which implemented similar measures, including lockdown, but later in their epidemics.
It is also anticipated that efforts such as the lockdown may affect this prediction and that cases may start to decline after approximately a month. We must admit that the fitted model does not account for such a decline after the peak is reached. However, a comparative study with some of the highly infected countries and countries in the South-east Asia region indicates that India can still control the situation if prevention measures such as quarantine and city sanitization are strictly followed.
The analysis of residuals points toward the effect of the lockdown. For confirmed cases, the residual plot [Figure 2] shows no variation until the 12th day, i.e., March 26, 2020. After 26th March, the variation begins to increase. This may be because the effect of the lockdown begins to kick in after that day, as the variation increases considerably in the later part. However, the variance observed in the later part of the residuals may also be due to the lockdown being followed differently across the country. For recovered cases, the variation is higher still; this may be due to the increased focus of medical staff and health-care professionals on fighting the COVID-19 epidemic. However, the slightly heavy tail in the residuals [Figure 3] may indicate that the COVID-19 crisis will take considerable time to subside, even with lockdown and social distancing norms.
The propagation of infectious diseases often follows a life-cycle pattern, from the outbreak to the acceleration phase, inflection point, deceleration phase, and eventual ending. In case of India, the data [Figure 1] suggest that we are currently in an acceleration phase.
An information system that can monitor the epidemiological situation and the performance of the control measures is, however, crucial for an efficient outbreak response and should be implemented as early as possible. This can be done using mobile phone technology and cloud platforms. For instance, the Aarogya Setu app was developed by the Government of India to spread awareness and mitigate the risk of contracting COVID-19 by tracking users and letting them know whether they have crossed paths with an infected person. The app scans a database of known cases that are geotagged across the country.
At the same time, adaptive and countering behaviors of agents in the population, including individuals (avoiding physical contact) and governments (locking down cities), need to be strictly adhered to for faster decay of the curve and a shorter life cycle of COVID-19 infection prevalence.
To manage the tail of the epidemic effectively, the strategies need to be deployed starting now. The priority at that stage of the epidemic will be the early detection of possible re-emergence of transmission, in relation to the importation of cases from areas still experiencing active transmission. Making use of technological innovations such as Aarogya Setu may help to optimize the management of the tail of the epidemic.
As the prevalence of COVID-19 declines, the pressure on clinical and laboratory expertise will decrease gradually. There will be rise in the demand for field epidemiologists to assist local public health experts and support community workers to engage in active surveillance and to monitor remaining transmission chains in affected communities.
Fundamental core competencies in public health serve to: unify public health organizations and agencies; determine direction; develop strategic plans; facilitate communication across programmatic and organizational lines; boost overall organization and workforce performance; and develop the knowledge and skills of the public health workforce. A lack of such competencies may endanger the lives of the larger population and undermine the ability to positively influence the health and well-being of society. This makes a good case for requiring public servants and political executives to have fundamental core competency in public health to be considered viable candidates for holding public office.
- The ARIMA model includes only past data on confirmed and recovered cases as the dependent variable, so it can forecast simple ups and downs without a change in the overall trend, but it is more accurate than regressive models
- Data were taken from covid19india.org; any mismatch or delay in the reporting of data could result in incorrect forecasts. COVID-19 cases in India are still rising because the pandemic reached India quite late. Thus, only a small amount of data is available for the study
- The model does not take into account the socioeconomic data. Individual level data are not available, so there is a lack of information regarding sociodemographic, travel history of individual, any comorbidities, and/or risk factor for COVID-19. The effects of government stringency and restrictive measures are also not taken into account as variables
- Some asymptomatic cases are not traced and thus are not reported in any data. Such people may remain hidden and infect their families. Such underreporting may affect the outcome of the study
- It is observed that in countries such as China, confirmed cases reached the peak and then declined. However, our model is based on the data when the trend is in increasing phase. Thus, the fitted model will not be able to capture the decline.
| Conclusions|| |
ARIMA models can be viewed as a decision-support system to program the health monitoring system at national and regional level. The main advantages of ARIMA forecasting approach are surely its ease of application and interpretation.
The results of the present study suggest that COVID-19 epidemic in India will probably increase, in terms of cumulative cases, in the next 21 days, i.e., till May 17, 2020.
The COVID-19 pandemic may be controlled if sufficient and adequate measures are taken. The continuation of the restrictive measures taken by the government, i.e., the lockdown all over India, may mitigate the size of the epidemic. Indian citizens should strictly comply with social distancing norms and government-notified rules, such as traffic and travel restrictions, a complete ban on gatherings, and the partial closure of commercial activities, so that the spread of COVID-19 can be controlled.
Our results may help decision makers to decide upon the restrictive measures and plan the requirements for additional health facilities.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
Ethical conduct of research
The datasets used in this study are available from covid19india.org and have been made available under an open access license. Institutional Review Board approval was not required for this manuscript. Applicable EQUATOR network (https://www.equator-network.org/) reporting guidelines were followed.
| References|| |
Moftakhar L, Seif M. The exponentially increasing rate of patients infected with COVID-19 in Iran. Arch Iran Med 2020;23:235-8.
Chintalapudi N, Battineni G, Amenta F. COVID-19 virus outbreak forecasting of registered and recovered cases after sixty day lockdown in Italy: A data driven model approach. 2020;(January):19–21
Jerlicz M, Kostowski W, Bidzinski A, Hauptmann M. Effects of lesions in the ventral noradrenergic bundle on behavior and response to psychotropic drugs in rats. Pharmacol Biochem Behav 1978;9:721-4.
Gupta S, Raghuwanshi GS, Chanda A. Effect of weather on COVID-19 spread in the US: A prediction model for India in 2020. Sci Total Environ 2020;728.
Tiwari S, Kumar S, Guleria K. Outbreak trends of corona virus (COVID-19) in India: A Prediction. Disaster Med Public Health Prep 2020(May):1-9.
Perone G. An ARIMA model to forecast the spread of COVID-2019 epidemic in Italy. ( first version on SSRN 31 March). SSRN Electron J. 2020;(March)
Notation for ARIMA Models, Time Series Forecasting System. SAS Institute; 2020. p. 1-5.
Forecasting: Principles and Practice. 8.9 Seasonal ARIMA Models. Hyndman R.2018. p. 1-14.
Tandon H, Ranjan P, Chakraborty T, Suhag V. Coronavirus (COVID-19): ARIMA Based Time-Series Analysis to Forecast Near Future; January, 2020. p. 1-11.
Predictive Monitoring of COVID-19. Luo J.; 2020. p. 1-10.
Banerjea A. Govt Launches 'Aarogya Setu', A Coronavirus Tracker App: All You Need to Know; 2020. p. 1-11. Available from: https://www.mygov.in/aarogya-setu-app. [Last accessed on 2020 Apr 29]
Kaasik-Aaslav K, Coulombier D. The tail of the epidemic and the challenge of tracing the very last Ebola case. Euro Surveill 2015;20. pii: 21075.