Spatial modeling for COVID-19 analysis: An Indian case study

The coronavirus disease 2019 (COVID-19) outbreak in India, from January 31, 2020, to June 15, 2020, has reached over 3,32,424 reported confirmed cases. The aim of this study is to predict and explore the spatial distribution of COVID-19 data for India using three models: geographically weighted regression (GWR), generalized linear regression (GLR), and ordinary least squares (OLS). In this paper, the swift rise in COVID-19 cases after the lockdown period is examined. This is explored using ArcGIS, with the confirmed cases of June 15, 2020, as the response variable and the confirmed cases on earlier dates, i.e., March 15, April 7, April 12, May 12, and June 1, 2020, as the explanatory variables. The dataset of confirmed cases is classified into three cases, i.e., case-1: June 15, 2020 vs March 15 and April 7, 2020; case-2: June 15, 2020 vs April 12, May 12, and June 1, 2020; and case-3: June 15, 2020 vs all the dates mentioned. The prediction using GWR gave the closest values for June 16, 2020. The AICc of GWR (618.9038) was the lowest among the GWR, GLR, and OLS models. The day-wise increase and the samples tested per day in twelve different states are analyzed using STATA. The number of tests varies from state to state, depending on the population and the testing labs available. The percentage for each slope is obtained as m1 (-5.714%), m2 (39.393%), m3 (6.521%), and m4 (46.938%).


Introduction
Coronavirus disease 2019 (COVID-19) is a global pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The first observed case of COVID-19 in India, which originated from China, was reported on January 30, 2020. The virus has spread rapidly across the whole country, especially in Maharashtra, which has the highest number of confirmed cases at 107958 (June 15, 2020). COVID-19 has a significant correlation with air quality and with average and minimum temperature [1]. The two transmission modes of the coronavirus are respiratory and contact. Sanitation and a hygienic environment are crucial to protect human health during this infectious COVID-19 outbreak. The World Health Organization (WHO) has stated that ensuring proper and frequent handwashing practices in communities, homes, and health care settings will help prevent and reduce human-to-human transmission of the COVID-19 virus [2]. On physical examination, patients were found to have dry mucous membranes, difficulty breathing, sore throat, headache, or cough [3]. All of this led to lockdowns in many countries, including India, where the lockdown ran from March 25, 2020, to May 31, 2020, in four phases. In a few areas within containment zones, the lockdown was extended up to June 30, 2020, as a fifth phase. This study focuses on spatial data, i.e., confirmed cases and testing samples. The confirmed cases are used to understand the rate of increase per day in every state and thereby in India. The testing samples per day in every state are also examined.
A few studies on geographically weighted regression (GWR) were reviewed for this study, and the approach was implemented on Indian COVID-19 data. Mollalo applied different models to COVID-19 data in the US, using a dataset that included thirty-five environmental, socioeconomic, behavioral, topographic, and demographic variables. Five models were used: three global models, namely ordinary least squares (OLS), the spatial lag model (SLM), and the spatial error model (SEM), and two local models, namely GWR and multiscale GWR (MGWR). MGWR achieved the highest goodness-of-fit with the most parsimonious model compared to the others. The spatial variability of MGWR in different countries can reflect different behavior of COVID-19 cases in response to the explanatory variables [4].
Wang used GWR to examine the relationship between the frequency index of extreme precipitation and other climatic extreme indices in China, including the frequency of warm days, warm nights, cold days, and cold nights. Statistical tests showed that the regression relationship was significantly spatially non-stationary and that the explanatory variables exhibited significant spatial inconsistency. GWR has also been applied to ecological inference, to address problems related to inference about individuals [5]. Calvo & Escolar proposed a GWR approach for resolving the complications of spatial aggregation bias and spatial autocorrelation that affect all well-known approaches to ecological inference. The estimation process is theoretically intuitive and computationally simple, and it shows that applying GWR to Goodman's and King's ecological inference methods results in unbiased and consistent local estimates of ecological data that exhibit extreme spatial heterogeneity [6]. In a GWR study of house price data, in which both power and rotation parameters were varied to generate different Minkowski distances, local collinearity was shown to be both negatively and positively affected by the choice of distance metric. The results indicate that distance metric choice can provide a useful extra tuning component to address local collinearity issues in spatially varying coefficient modelling, and that understanding the interaction of distance metric and collinearity can provide insight into the nature and structure of the data relationships [7].
Franch-Pardo reviewed sixty-three scientific articles on geospatial and spatial-statistical analysis of COVID-19. The studies are grouped into the following categories of disease mapping: spatiotemporal analysis, health and social geography, environmental variables, data mining, and web-based mapping. It was made clear that the spatiotemporal dynamics of COVID-19 call for very strong decision making, planning, and community action. It was also emphasized that the challenges need to be addressed from an interdisciplinary perspective, with proactive planning, international solidarity, and a global perspective, to fight COVID-19 [8]. Gupta used long-term climatic data of air temperature (V1), rainfall (V1), actual evapotranspiration (V1), solar radiation (V1), specific humidity, and wind speed, together with topographic altitude and population density at the regional level, to examine the spatial association with the number of COVID-19 infections. Their results showed that the Variable Importance of Projection, obtained through the PLS technique, had very high significance across all the V1 variables [9].
Boulos & Geraghty (2020) discuss disease mapping and social media reactions to disease spread, predictive risk mapping using population travel data, and tracing and mapping super-spreader trajectories and contacts across space and time. The study shows how GIS and mapping dashboards can support the fight against infectious disease outbreaks and epidemics [10]. Krishnakumar & Rana give good insights into an effective approach to ending the worldwide threat of COVID-19 in India [11]. Pulla (2020) notes that transmission of COVID-19 by asymptomatic people would reduce the effectiveness of airport screening and quarantine measures. It was reported that India would have between around 100 000 and 1.3 million confirmed cases of COVID-19 by the middle of May if the virus continued to spread at its then-current rate [12].

Data and methodology
The study uses three spatial models, namely ordinary least squares (OLS), geographically weighted regression (GWR), and generalized linear regression (GLR). The details of each model are discussed in the Spatial regression models section. The day-to-day data were collected from the Ministry of Health and Family Welfare (Table 1) and analyzed with spatial models in ArcGIS Pro. The testing sample data, state-wise and for all of India, were obtained from the Statista and ICMR websites. The testing data were found to vary with the testing labs available in each state. The total numbers of samples tested as on June 15, 2020, in a few states, such as TN (729002), MH (671348), RJ (609296), and AP (567375), were compared with the confirmed cases in TN (44661), MH (107958), RJ (12694), and AP (6163). The time series graphs of the day-wise increase in India and in twelve different states were obtained using STATA 12 IC. The two-way graphs of samples tested and confirmed cases for the states were also produced in STATA 12 IC. The study covers the increase in COVID-19 confirmed cases after the lockdown period, which is analyzed using ArcGIS with the confirmed cases of June 15, 2020, as the response variable and the confirmed cases on March 15, April 7, April 12, May 12, and June 1, 2020, as the explanatory variables.
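The comparison of samples tested with confirmed cases can be made concrete with a short calculation. The following is a minimal sketch that uses only the four state figures quoted above; the function name `confirmed_share` is illustrative and is not part of the study's workflow.

```python
# Samples tested and confirmed cases as on June 15, 2020, as quoted in the text.
tested = {"TN": 729002, "MH": 671348, "RJ": 609296, "AP": 567375}
confirmed = {"TN": 44661, "MH": 107958, "RJ": 12694, "AP": 6163}


def confirmed_share(state):
    """Confirmed cases as a percentage of samples tested for one state."""
    return 100.0 * confirmed[state] / tested[state]


shares = {state: confirmed_share(state) for state in tested}

# List the states from the highest to the lowest share of confirmed cases.
for state, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{state}: {pct:.2f}% of tested samples confirmed")
```

Among these four states, MH has the largest share of confirmed cases per sample tested and AP the smallest, although all four tested a broadly similar number of samples.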

Spatial regression models

Ordinary least squares
Ward & Gleditsch discuss OLS as a linear regression approach that examines the relationship between a dependent variable and a set of explanatory variables. It is represented with the following notation (eq. 1):

$$y_i = \beta_0 + x_i \beta + \varepsilon_i \qquad (1)$$

where i represents any country, $y_i$ is the confirmed cases (dependent variable), $\beta_0$ is the intercept of the model, $x_i$ is the vector of selected explanatory variables, $\beta$ is the vector of regression coefficients, and $\varepsilon_i$ is the random error term [13]. Depending on the nature of the spatial dependence, OLS will be either inefficient with incorrect standard errors or biased and inconsistent [14]. If spatial dependence exists among the data, it violates the assumptions about the error term [15,16].
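The OLS notation described above (intercept β0, coefficient vector β, and error term εi) can be illustrated with a small numerical example. This is a minimal sketch on synthetic data, assuming NumPy is available; the data, the variable names, and the "true" coefficients are hypothetical and are not the study's dataset or its ArcGIS Pro workflow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 30 regions, two explanatory variables (earlier case counts).
n = 30
X = rng.uniform(0, 1000, size=(n, 2))
true_beta0, true_beta = 50.0, np.array([2.0, 0.5])
y = true_beta0 + X @ true_beta + rng.normal(0, 10, size=n)

# Design matrix with a leading column of ones for the intercept beta_0.
A = np.column_stack([np.ones(n), X])

# Least-squares estimate of (beta_0, beta).
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Residuals are the estimates of the error terms epsilon_i.
residuals = y - A @ coef

print("intercept:", coef[0])
print("slopes:", coef[1:])
```

Because the model includes an intercept, the residuals sum to zero up to rounding error, which is a quick sanity check on the fit.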

Generalized linear regression
GLR is a regression model used to generate predictions or to model a dependent variable in relation to a set of explanatory variables. Its predictions can be used to examine and quantify relationships among features. The tool can fit continuous (OLS), binary (logistic), and count (Poisson) models. A count model assumes that the mean and variance of the dependent variable are equal; moreover, the values of the dependent variable cannot be negative or contain decimals. The notation of GLR is given in eq. 2:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + e_i \qquad (2)$$

where $\beta_0$ is the intercept, $\beta_1, \beta_2, \dots, \beta_n$ are the slope coefficients of the explanatory variables $x_1, x_2, \dots, x_n$, respectively, $e_i$ is the error term, and $y$ is the dependent variable [17].
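The count (Poisson) variant mentioned above can be sketched as follows. This is a minimal illustration, assuming NumPy, synthetic data, and a log link fitted by iteratively reweighted least squares; this is a standard fitting technique and is not claimed to be the procedure used inside ArcGIS Pro. All names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical count data: one explanatory variable and a log-linear mean.
n = 200
x = rng.uniform(0, 2, size=n)
A = np.column_stack([np.ones(n), x])       # intercept and slope columns
true_beta = np.array([0.5, 1.2])
y = rng.poisson(np.exp(A @ true_beta))     # non-negative integer counts

# Starting values from a least-squares fit to log(y + 0.5).
beta, *_ = np.linalg.lstsq(A, np.log(y + 0.5), rcond=None)

# Iteratively reweighted least squares for a Poisson model with log link.
for _ in range(50):
    eta = A @ beta                         # linear predictor
    mu = np.exp(eta)                       # fitted means
    z = eta + (y - mu) / mu                # working response
    AtW = A.T * mu                         # weights equal the mean (variance = mean)
    new_beta = np.linalg.solve(AtW @ A, AtW @ z)
    converged = np.max(np.abs(new_beta - beta)) < 1e-10
    beta = new_beta
    if converged:
        break

print("estimated coefficients:", beta)
```

The working weights equal the fitted means because the Poisson model assumes that the variance equals the mean, which is the assumption stated in the text.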

Geographically weighted regression
GWR is a spatial technique used mostly in geography and in many other disciplines. GWR builds a local model of the variable to be predicted by fitting a regression equation to each feature in the dataset. It should be noted that GWR is not an appropriate method for small datasets and does not work with multipoint data. The notation of GWR is given in eq. 3 [4]:

$$y_i = \beta_{i0} + \sum_{j} \beta_{ij} X_{ij} + \varepsilon_i \qquad (3)$$

where i is a country, $y_i$ is the value of the confirmed cases, $\beta_{i0}$ is the intercept, $\beta_{ij}$ is the jth regression parameter, $X_{ij}$ is the value of the jth explanatory parameter, and $\varepsilon_i$ is a random error term.
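The idea of fitting a separate regression at each feature can be sketched with a distance-weighted least-squares fit. This is a minimal illustration, assuming NumPy, synthetic coordinates, and a fixed Gaussian kernel with an arbitrarily chosen bandwidth; the kernel, the bandwidth, and all names are assumptions for illustration and do not reproduce the settings of the ArcGIS Pro GWR tool.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 200 locations, one explanatory variable, and a slope
# that drifts from west to east (spatial non-stationarity).
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.uniform(0, 5, size=n)
local_slope = 1.0 + 0.3 * coords[:, 0]
y = 2.0 + local_slope * x + rng.normal(0, 0.1, size=n)

A = np.column_stack([np.ones(n), x])    # columns for beta_i0 and beta_i1
bandwidth = 2.0                         # assumed fixed Gaussian bandwidth


def gwr_coefficients(i):
    """Weighted least squares centred on location i."""
    d = np.linalg.norm(coords - coords[i], axis=1)   # distances to location i
    w = np.exp(-0.5 * (d / bandwidth) ** 2)          # Gaussian kernel weights
    AtW = A.T * w                                    # same as A.T @ diag(w)
    return np.linalg.solve(AtW @ A, AtW @ y)


# One coefficient pair (beta_i0, beta_i1) per location.
betas = np.array([gwr_coefficients(i) for i in range(n)])
print("local slopes range from", betas[:, 1].min(), "to", betas[:, 1].max())
```

Unlike a global model, which returns a single coefficient vector, this returns one coefficient vector per location, so spatial variation in the relationship shows up directly in the estimates.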

Findings and results
The models were generated on the datasets obtained from https://www.mohfw.gov.in/, and the analysis was performed in ArcGIS Pro. The results of the OLS model (Figure 1) summarize the coefficient, t-statistic, and p-value, along with the VIFs, for the assumed explanatory variables (Tables 2, 3 and 4); the selected variables have relatively low multicollinearity according to the Variance Inflation Factors (VIFs), and all of the explanatory variables were positively associated with confirmed cases (p < 0.01). In case-1, the p-value for Con-June-1 (0.0000), with a VIF of 66.16, is much better than that of the other explanatory variables; similarly, in case-2, the p-value of Con-Apr-7 (0.00001) indicates a good fit, as does that of Con-June-1 (0.0000) in case-3. The GLR model (Figure 2), with the summary of the three different models, is shown in Tables 5, 6 and 7. The z-score for the intercept is 1720.1645.
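The Variance Inflation Factor reported with the OLS results can be computed by regressing each explanatory variable on the others, with VIF_j = 1 / (1 - R_j^2). The following is a minimal sketch, assuming NumPy and synthetic predictors; the function `vif` and the data are hypothetical and are not the ArcGIS Pro implementation.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (n samples x p variables)."""
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus all the other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)


rng = np.random.default_rng(3)

# Hypothetical predictors: b closely tracks a, while c is unrelated to both.
a = rng.normal(size=100)
b = a + rng.normal(scale=0.1, size=100)
c = rng.normal(size=100)

factors = vif(np.column_stack([a, b, c]))
print(factors)
```

Larger values indicate that a variable is more nearly a linear combination of the others; in this example the two closely related predictors receive much larger factors than the unrelated one.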

Relationship charts: Relationships between Variables

On comparing the three models, the AICc of GWR (618.9038) is better than that of GLR (81132) and OLS (641.192). The Adj R-sq value varies from 0.0 to 1.0, and the Adj R-sq of GWR is much nearer to 1 than that of the OLS and GLR models. Hence, GWR is the better-fitting model for the COVID-19 data in this study. The predicted values of GWR for Case-1 are given in Table 12.

Daywise increase in states
Figure 10 presents the graphs for 12 different states, with MH and TN having the maximum cases, and KL and KA being the states where the curve has flattened. The graphs indicate the day-wise increase in the number of confirmed cases. Figure 11a shows the curves for all the states in a single graph, whereas Figure 11b shows the day-wise increase in the total cases in India. Figure 12

Tested samples in few states
The

Lockdown period graphs in India
The lockdown period (Figure 13) across India was initially divided into four phases; later, a fifth phase, named the unlock period (Figure 14), was announced. After the fourth lockdown, a few working sectors and malls were opened under strict guidelines of social distancing and frequent sanitizing to prevent the spread of COVID-19. June 1 to June 30, 2020, was declared the fifth lockdown period in the containment zones. Public transport, however, has not yet resumed.
Based on the graphs obtained, the slope is calculated as in Table 13. The percentage for each slope is obtained as m1 (-5.714%), m2 (39.393%), m3 (6.521%), and m4 (46.938%). … work along with the day-wise increase and number of people tested in a few states. The lockdown graph with the slope is the highlight of this study (Figure 15).
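The text does not state how the percentages m1 to m4 are derived from the slopes in Table 13. The following is a minimal sketch of one possible reading, assuming that each percentage is the relative change between the slope values of two successive periods, truncated to three decimals. The five slope values are hypothetical: they are not taken from Table 13 and were chosen only because their successive relative changes coincide with the quoted percentages.

```python
import math

# Hypothetical slope values for five successive periods. They are NOT taken
# from Table 13; they are chosen only because their successive relative
# changes coincide with the percentages quoted in the text.
slopes = [35, 33, 46, 49, 72]


def percent_change(previous, current):
    """Relative change from previous to current, in percent."""
    return 100.0 * (current - previous) / previous


def truncate(value, decimals=3):
    """Drop digits beyond the given number of decimals (no rounding)."""
    factor = 10 ** decimals
    return math.trunc(value * factor) / factor


changes = [truncate(percent_change(a, b)) for a, b in zip(slopes, slopes[1:])]
for k, m in enumerate(changes, start=1):
    print(f"m{k} = {m}%")
```

Under this assumed reading, a negative value such as m1 corresponds to a period whose slope is lower than that of the preceding period, and a positive value to a steeper one.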

Conclusion
The GWR model achieved the highest goodness-of-fit compared with the OLS and GLR models, as the results for confirmed cases and the findings of the study show. GWR obtained AICc (618.9038) and Adj-R² (0.9974), whereas GLR achieved AICc (81132) and Adj-R² (0.0034), and OLS produced AICc (641.1929) and Adj-R² (0.9941). As stated earlier in the discussion, this is because GWR is a local model, whereas GLR and OLS are global models. However, the spatial variability of GWR, OLS, or GLR in different countries may reflect different behavior of COVID-19 cases in response to the selected explanatory variables. This study will help in decisions about setting up well-equipped testing labs in the states of India where few testing labs are available.

Vol. 8 | Issue S1 | December 2020
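The model comparison reported in the conclusion rests on two criteria: a lower AICc and a higher adjusted R². The following is a minimal sketch of that selection step, using the reported values. The helper functions show textbook forms of AICc (for a least-squares fit with Gaussian errors) and adjusted R²; they are included for illustration under that assumption and are not necessarily the formulas ArcGIS Pro applies, particularly for GWR.

```python
import math


def aic_gaussian(rss, n, k):
    """AIC, up to an additive constant, for a least-squares fit with Gaussian errors."""
    return n * math.log(rss / n) + 2 * k


def aicc_gaussian(rss, n, k):
    """Small-sample corrected AIC; requires n > k + 1."""
    return aic_gaussian(rss, n, k) + (2 * k * (k + 1)) / (n - k - 1)


def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p explanatory variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Values reported in the conclusion of this study.
reported = {
    "GWR": {"aicc": 618.9038, "adj_r2": 0.9974},
    "GLR": {"aicc": 81132.0, "adj_r2": 0.0034},
    "OLS": {"aicc": 641.1929, "adj_r2": 0.9941},
}

# Lower AICc is better; higher adjusted R-squared is better.
best_by_aicc = min(reported, key=lambda name: reported[name]["aicc"])
best_by_adj_r2 = max(reported, key=lambda name: reported[name]["adj_r2"])

print("lowest AICc:", best_by_aicc)
print("highest adjusted R-squared:", best_by_adj_r2)
```

Both criteria select the same model for the reported values, which matches the conclusion drawn in the text.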

Conflicts of interest
Authors declare no conflicts of interest.