Original Research
2020 December
Volume : 8 Issue : S1


Spatial modeling for COVID-19 analysis: An Indian case study

Iyyanki M, Prisilla J, Kandle S

Pdf Page Numbers :- 19-32

Muralikrishna Iyyanki1, Jayanthi Prisilla2,*, and Sudarshan Kandle3

 

1Former Dr. Raja Ramanna Distinguished Fellow DRDO and Director R&D JNT University, Hyderabad, Telangana, India

2The Airport Authority of India, Shamshabad, Hyderabad, Telangana 500409, India

3Department of Geography, Osmania University, Amberpet, Hyderabad, Telangana 500007

 

*Corresponding author: Prisilla Jayanthi, The Airport Authority of India, Shamshabad, Hyderabad, Telangana 500409, India. Email: prisillaj28@gmail.com

 

Received 11 August 2020; Revised 26 October 2020; Accepted 19 November 2020; Published 27 November 2020

 

Citation: Iyyanki M, Prisilla J, Kandle S. Spatial modeling for COVID-19 analysis: An Indian case study. J Med Sci Res. 2020; 8(S1):19-32. DOI: http://dx.doi.org/10.17727/JMSR.2020/8S1-3

 

Copyright: © 2020 Iyyanki M et al. Published by KIMS Foundation and Research Center. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



Abstract

The coronavirus disease 2019 (COVID-19) outbreak in India, from January 31, 2020 to June 15, 2020, has reached over 3,32,424 reported confirmed cases. The aim of this study is to predict and explore the spatial distribution of COVID-19 data in India using three models: geographically weighted regression (GWR), generalized linear regression (GLR), and ordinary least squares (OLS). A swift rise in COVID-19 cases is observed after the lockdown period. This rise is explored using ArcGIS, with the confirmed cases of June 15, 2020 as the response variable and the confirmed cases of March 15, April 7, April 12, May 12, and June 1, 2020 as the explanatory variables. The confirmed cases in the dataset are classified into three cases, i.e., case-1: June 15, 2020 vs March 15 and April 7, 2020; case-2: June 15, 2020 vs April 12, May 12 and June 1, 2020; and case-3: June 15, 2020 vs all the dates mentioned. The prediction using GWR gave the closest values for June 16, 2020. The AICc of GWR (618.9038) was the lowest among the GWR, GLR and OLS models. The day-wise increase and the samples tested per day in twelve different states are analyzed using STATA. The number of tests varies from state to state, depending on the population and the testing labs available. The percentage for each slope is obtained as m1 (-5.714%), m2 (39.393%), m3 (6.521%) and m4 (46.938%).

 

Keywords: COVID-19; GIS; spatial data; spatial models; testing samples

Full Text

1. Introduction

Coronavirus disease 2019 (COVID-19) is a global pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The first observed case of COVID-19 in India, on January 30, 2020, originated from China. The virus has spread rapidly across the whole country, especially in Maharashtra, which has the highest number of confirmed cases at 107958 (June 15, 2020). COVID-19 has a significant correlation with air quality and with average and minimum temperature [1]. The two transmission modes of the coronavirus are respiratory and contact. Sanitation and hygienic environments are crucial to protect human health during this infectious outbreak. The World Health Organization (WHO) has stated that ensuring good and frequent hand-washing practices in communities, homes, and health care facilities will help prevent and reduce human-to-human transmission of the COVID-19 virus [2]. On physical examination, patients were found to have dry mucous membranes, difficulty breathing, sore throat, headache, or cough [3]. All of this led to lockdowns in many countries, including India, where the lockdown ran from March 25, 2020 to May 31, 2020 in four phases. In a few containment zones, the lockdown was extended up to June 30, 2020 as a fifth phase. This study focuses on spatial data, i.e., confirmed cases and testing samples. The confirmed cases are used to understand the rate of increase per day in every state and thereby in India. The testing samples per day in every state are also examined.

 

A few studies on geographically weighted regression (GWR) were reviewed for this study, and GWR was implemented on Indian COVID-19 data. Mollalo applied different models to COVID-19 data in the US, using a dataset that included thirty-five environmental, socioeconomic, behavioral, topographic and demographic variables. The five models comprised three global models, namely ordinary least squares (OLS), the spatial lag model (SLM) and the spatial error model (SEM), and two local models, namely GWR and multiscale GWR (MGWR). MGWR achieved the highest goodness-of-fit and was the most parsimonious model compared to the others. The spatial variability of MGWR in different countries can reflect different behavior of COVID-19 cases in response to the explanatory variables [4].

 

Wang performed GWR to examine the relationship between the index of frequency of extreme precipitation and other climatic extreme indices in China, including the frequency of warm days, warm nights, cold days, and cold nights. Based on statistical tests, the regression relationship showed significant spatial non-stationarity, and the explanatory variables exhibited significant spatial inconsistency. GWR has also been implemented for ecological inference, to address problems in inferring individual-level behavior [5]. Calvo & Escolar proposed a GWR approach to the problems of spatial aggregation bias and spatial autocorrelation that affect all well-known approaches to ecological inference. The estimation procedure is theoretically intuitive and computationally simple, and they show that applying the GWR approach to Goodman's and King's ecological inference methods yields unbiased and consistent local estimates of ecological data that exhibit extreme spatial heterogeneity [6]. In a study applying GWR to house price data, with power and rotation parameters varied to generate different Minkowski distances, local collinearity was shown to be both negatively and positively affected by the choice of distance metric. The results indicate that distance metric choice can provide a useful extra tuning component for addressing local collinearity in spatially varying coefficient modelling, and that understanding the interaction between distance metric and collinearity can provide insight into the nature and structure of the data relationships [7].

 

Franch-Pardo carried out an assessment of sixty-three scientific articles on geospatial and spatial-statistical analysis of COVID-19. The articles are grouped into the categories of disease mapping: spatiotemporal analysis, health and social geography, environmental variables, data mining, and web-based mapping. It was noted that the spatiotemporal dynamics of COVID-19 call for very strong decision making, planning and community action. The assessment also emphasized that the challenges need to be addressed from an interdisciplinary perspective, with proactive planning, international solidarity and a global perspective, to fight COVID-19 [8]. Gupta used long-term climatic data of air temperature (V1), rainfall (V1), actual evapotranspiration (V1), solar radiation (V1), specific humidity and wind speed, together with topographic altitude and population density at the regional level, to examine the spatial association with the number of COVID-19 infections. Their results, based on Variable Importance of Projection through the PLS technique, showed very high significance for all the V1 variables [9].

 

Boulos & Geraghty (2020) discuss disease mapping and social media reactions to disease spread, predictive risk mapping using population travel data, and the tracing and mapping of super-spreader trajectories and contacts across space and time. The study describes how GIS and mapping dashboards can support the fight against infectious disease outbreaks and epidemics [10]. Krishnakumar & Rana give insights into an effective approach to ending the threat of COVID-19 in India [11]. Pulla (2020) notes that transmission of COVID-19 by asymptomatic people would reduce the effectiveness of airport screening and quarantine measures. It was reported that India would have between around 100 000 and 1.3 million confirmed cases of COVID-19 by the middle of May if the virus continued to spread at its then-current rate [12].

 

2. Data and methodology

The study includes three spatial models, namely ordinary least squares (OLS), geographically weighted regression (GWR), and generalized linear regression (GLR). The details of each model are discussed in the Spatial regression models section. The day-to-day data were collected from the Ministry of Health and Family Welfare (Table 1) and analyzed using ArcGIS Pro with the spatial models. The testing sample data, state-wise and for India as a whole, were obtained from the Statista and ICMR websites. The testing data were found to vary with the testing labs available in each state. The total numbers of samples tested as on June 15, 2020 in a few states, namely TN (729002), MH (671348), RJ (609296) and AP (567375), were updated and compared with the confirmed cases in TN (44661), MH (107958), RJ (12694) and AP (6163). The time series graphs of the day-wise increase in India and in twelve different states were obtained using STATA 12 IC. The two-way graphs of samples tested and confirmed cases for the states were also produced in STATA 12 IC. The study covers the increase in COVID-19 confirmed cases after the lockdown period, analyzed using ArcGIS with the confirmed cases of June 15, 2020 as the response variable and the confirmed cases of March 15, April 7, April 12, May 12 and June 1, 2020 as the explanatory variables. For this study, the confirmed cases in the dataset are classified into three cases, i.e., case-1: June 15, 2020 vs March 15 and April 7, 2020; case-2: June 15, 2020 vs April 12, May 12 and June 1, 2020; and case-3: June 15, 2020 vs March 15, April 7, April 12, May 12 and June 1, 2020.
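The comparison of samples tested with confirmed cases can be made concrete as a simple ratio per state. The sketch below uses only the four pairs of figures quoted in the paragraph above; the ratio itself (cumulative confirmed cases divided by cumulative samples tested) is an illustrative quantity chosen here and is not a statistic reported by the study.

```python
# Samples tested and confirmed cases as on June 15, 2020, as quoted in the text.
tested = {"TN": 729002, "MH": 671348, "RJ": 609296, "AP": 567375}
confirmed = {"TN": 44661, "MH": 107958, "RJ": 12694, "AP": 6163}


def confirmed_share(state: str) -> float:
    """Confirmed cases as a fraction of samples tested for one state."""
    return confirmed[state] / tested[state]


shares = {state: confirmed_share(state) for state in tested}

# Print states from the highest to the lowest share.
for state, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{state}: {100 * share:.2f}% of tested samples confirmed")
```

Because several samples may come from one person, this ratio is only a rough indicator, but it shows how similar testing volumes can correspond to very different case counts.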

 

2.1 Spatial regression models

2.1.1 Ordinary least square

Ward & Gleditsch describe OLS as a linear regression approach that examines the relationship between a dependent variable and a set of explanatory variables, represented with the following notation (eq. 1):

$y_i = \beta_0 + x_i\beta + \varepsilon_i$ ------------------ (1)

 

where i represents any country, yi is the confirmed cases (dependent variable), β0 is the intercept of the model, xi is the vector of selected explanatory variables, β is the vector of regression coefficients, and εi is the random error term [13]. Depending on the nature of the spatial dependence, OLS will be either inefficient with incorrect standard errors, or biased and inconsistent [14]. If spatial dependence exists among the data, it violates the assumptions about the error term [15, 16].
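As a minimal sketch of how a model of this form is estimated, the following Python example fits the intercept and coefficients by least squares with NumPy. The data are synthetic and the function name `fit_ols` is hypothetical; this is not the ArcGIS Pro tool used in the study, only an illustration of the same linear model.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return [beta_0, beta_1, ..., beta_k] that minimize the sum of squared residuals."""
    design = np.column_stack([np.ones(len(y)), X])  # column of ones for the intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


# Hypothetical data: 35 regions, two explanatory variables (not the study's data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1000, size=(35, 2))
y = 50.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 5.0, size=35)

beta_hat = fit_ols(X, y)
residuals = y - (beta_hat[0] + X @ beta_hat[1:])
print("estimated [beta_0, beta_1, beta_2]:", np.round(beta_hat, 3))
```

The residuals computed at the end are the estimates of the error term; spatial dependence would show up as residuals that are similar for neighbouring regions.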

 

2.1.2 Generalized linear regression

GLR is a regression model used to generate predictions or to model a dependent variable in relation to a set of explanatory variables. Its predictions can be used to examine and quantify relationships among features. The tool can fit continuous (OLS), binary (logistic), and count (Poisson) models. A count model assumes that the mean and variance of the dependent variable are equal, and the values of the dependent variable cannot be negative or contain decimals. The notation of GLR is given in eq. 2.

 

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + e_i$ ------------ (2)

 

where β0 is the intercept, β1, β2, …, βn are the slope coefficients of the explanatory variables x1, x2, …, xn, respectively, ei is the error term, and y is the dependent variable [17].
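The count (Poisson) option mentioned above can be sketched as follows. This is a generic textbook procedure (Newton-Raphson, also known as iteratively reweighted least squares, with a log link), written under the assumption that the logarithm of the expected count is linear in the explanatory variables. The function name `fit_poisson` and the synthetic data are hypothetical, and the sketch is not the ArcGIS Pro implementation.

```python
import numpy as np


def fit_poisson(X: np.ndarray, y: np.ndarray, n_iter: int = 50,
                tol: float = 1e-10) -> np.ndarray:
    """Fit log(E[y]) = beta_0 + X @ beta by Newton-Raphson (IRLS)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(design.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(design @ beta)                    # current fitted means
        gradient = design.T @ (y - mu)                # score vector
        hessian = design.T @ (design * mu[:, None])   # Fisher information X^T diag(mu) X
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Synthetic counts (not the study's data): one explanatory variable.
rng = np.random.default_rng(1)
X = rng.uniform(0, 2, size=(200, 1))
y = rng.poisson(np.exp(1.0 + 0.8 * X[:, 0]))

beta_hat = fit_poisson(X, y)
print("estimated [beta_0, beta_1]:", np.round(beta_hat, 3))
```

In a model of this kind the coefficients act on the log scale, so a one-unit increase in an explanatory variable multiplies the expected count by exp(beta).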

 

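Table 1: COVID-19 data as on June 15, 2020.

| S. No. | State | Confirmed cases | Recovered | Death |
|---|---|---|---|---|
| 1 | AN Islands | 38 | 33 | 0 |
| 2 | AP | 6163 | 3314 | 84 |
| 3 | AR | 91 | 7 | 0 |
| 4 | AS | 4049 | 1960 | 8 |
| 5 | BR | 6470 | 4170 | 39 |
| 6 | CH | 352 | 293 | 5 |
| 7 | CG | 1662 | 763 | 8 |
| 8 | DD | 36 | 2 | 0 |
| 9 | DL | 41182 | 15823 | 1327 |
| 10 | GA | 564 | 74 | 0 |
| 11 | GJ | 23544 | 16325 | 1477 |
| 12 | HR | 7208 | 3003 | 88 |
| 13 | HP | 518 | 337 | 7 |
| 14 | JK | 5041 | 2389 | 59 |
| 15 | JH | 1745 | 905 | 8 |
| 16 | KA | 7000 | 3955 | 86 |
| 17 | KL | 2461 | 1102 | 19 |
| 18 | LA | 549 | 80 | 1 |
| 19 | MP | 10802 | 7677 | 459 |
| 20 | MH | 107958 | 50978 | 3950 |
| 21 | MN | 458 | 91 | 0 |
| 22 | ML | 44 | 25 | 1 |
| 23 | MZ | 112 | 1 | 0 |
| 24 | NL | 168 | 88 | 0 |
| 25 | OD | 3909 | 2708 | 11 |
| 26 | PY | 194 | 91 | 5 |
| 27 | PB | 3140 | 2356 | 67 |
| 28 | RJ | 12694 | 9566 | 292 |
| 29 | SK | 68 | 4 | 0 |
| 30 | TN | 44661 | 24547 | 435 |
| 31 | TS | 4974 | 2377 | 185 |
| 32 | TR | 1076 | 315 | 1 |
| 33 | UK | 1819 | 1111 | 24 |
| 34 | UP | 13615 | 8268 | 399 |
| 35 | WB | 11087 | 5060 | 475 |
| Total | | 332424 | 169798 | 9520 |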
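The totals in Table 1 allow a few derived rates to be worked out directly. The short sketch below computes the share of confirmed cases that had recovered, the share that had died, and the remaining active cases. It assumes that active cases equal confirmed minus recovered minus deaths; these derived figures are illustrative and are not reported in the study.

```python
# Totals from Table 1 (as on June 15, 2020).
confirmed, recovered, deaths = 332424, 169798, 9520

recovery_share = recovered / confirmed   # fraction of confirmed cases recovered
fatality_share = deaths / confirmed      # fraction of confirmed cases that died
active = confirmed - recovered - deaths  # assumption: no other case outcomes

print(f"recovered: {100 * recovery_share:.2f}% of confirmed cases")
print(f"deaths:    {100 * fatality_share:.2f}% of confirmed cases")
print(f"active:    {active}")
```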

 

2.1.3 Geographically weighted regression

GWR is a spatial technique widely used in geography and many other disciplines. GWR builds a local model of the variable to be predicted by fitting a regression equation to each feature in the dataset. It should be noted that GWR is not an appropriate method for small datasets and does not work with multipoint data. The notation of GWR is given in eq. 3 [4].

 

$y_i = \beta_{i0} + \sum_{j} \beta_{ij} X_{ij} + \varepsilon_i$ --- (3)

 

where i is a country, yi is the value of the confirmed cases, βi0 is the intercept, βij is the jth regression parameter, Xij is the value of the jth explanatory parameter, and εi is a random error term.
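The idea of fitting a separate regression at each location can be sketched in a few lines of Python. The example below is a minimal illustration, assuming a fixed Gaussian kernel with a hand-picked bandwidth and synthetic point data; the function name `gwr_coefficients` is hypothetical, and the ArcGIS Pro tool used in the study offers kernel and bandwidth choices that are not reproduced here.

```python
import numpy as np


def gwr_coefficients(coords: np.ndarray, X: np.ndarray, y: np.ndarray,
                     bandwidth: float) -> np.ndarray:
    """Return an (n, k+1) array of local coefficients, one row per location."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    betas = np.empty((n, design.shape[1]))
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        weights = np.exp(-0.5 * (dist / bandwidth) ** 2)  # Gaussian kernel
        xtw = design.T * weights                          # X^T W without forming W
        betas[i] = np.linalg.solve(xtw @ design, xtw @ y)  # weighted least squares
    return betas


# Synthetic data (not the study's): the true slope grows from west to east.
rng = np.random.default_rng(2)
coords = rng.uniform(0, 1, size=(400, 2))
x = rng.uniform(0, 10, size=400)
true_slope = 1.0 + 2.0 * coords[:, 0]
y = 3.0 + true_slope * x + rng.normal(0, 0.1, size=400)

betas = gwr_coefficients(coords, x[:, None], y, bandwidth=0.2)
print("local slope range:", betas[:, 1].min().round(2), "to", betas[:, 1].max().round(2))
```

With a very large bandwidth every observation receives nearly the same weight and the local estimates collapse to the single global OLS fit, which is one way to see GWR as a local generalization of OLS.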

 

3. Findings and results

The models are generated on the datasets obtained from https://www.mohfw.gov.in/, and the analysis was performed in ArcGIS Pro. The results of the OLS model (Figure 1) summarize the coefficient, t-statistic, p-value and variance inflation factor (VIF) of each explanatory variable (Tables 2, 3 and 4). The selected variables are described as having relatively low multicollinearity on the basis of their VIFs, although several VIFs in Tables 2 and 4 are large (for example, 72.7011 for Con-May-12 in case-1), and most explanatory variables were significantly associated with confirmed cases (p < 0.01). In case-1, Con-June-1 has the smallest p-value (0.0000), with a VIF of 66.1686; similarly, in case-II, Con-Apr-7 (p = 0.00001) is the best-fitting explanatory variable; and in case-III, Con-June-1 (p = 0.0000) again has the smallest p-value.

 

Table 2: Summary of OLS model on explanatory variables – Case-1.

| Var | Coeff. | T-statistic | P-Value | VIF |
|---|---|---|---|---|
| Intercept | 189.2513 | 0.6132 | 0.5443 | --- |
| Con-Mar-15 | -173.9137 | -2.7939 | 0.0089 | 3.1445 |
| Con-Apr-7 | 23.9996 | 3.1082 | 0.0041 | 15.8673 |
| Con-Apr-12 | -0.9229 | -0.2325 | 0.8177 | 30.5448 |
| Con-May-12 | -2.0101 | -3.9724 | 0.0004 | 72.7011 |
| Con-June-1 | 2.2400 | 12.9313 | 0.0000 | 66.1686 |

 

Table 3: Summary of OLS model on explanatory variables – Case-II.

| Var | Coeff. | T-statistic | P-Value | VIF |
|---|---|---|---|---|
| Intercept | -1320.4522 | -0.5970 | 0.5546 | ---- |
| Con-Mar-15 | 587.1253 | 1.6602 | 0.1063 | 1.8876 |
| Con-Apr-7 | 101.1918 | 5.1819 | 0.00001 | 1.8876 |

 

Table 4: Summary of OLS model on explanatory variables – Case-III.

| Var | Coeff. | T-statistic | P-Value | VIF |
|---|---|---|---|---|
| Intercept | 187.4760 | 0.5468 | 0.5883 | ---- |
| Con-Apr-12 | 10.04214 | 5.0953 | 0.00002 | 5.8852 |
| Con-May-12 | -2.2176 | -4.0508 | 0.0003 | 6.4994 |
| Con-June-1 | 2.1562 | 11.6117 | 0.0000 | 59.4165 |
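The VIF column in Tables 2 to 4 measures how well each explanatory variable is predicted by the others: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on the remaining explanatory variables. The sketch below computes this with NumPy on synthetic data; the function name `vif` and the three made-up columns are hypothetical and stand in for case counts on successive dates, which tend to be strongly correlated.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the remaining columns."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        fitted = others @ np.linalg.lstsq(others, target, rcond=None)[0]
        ss_res = np.sum((target - fitted) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        out[j] = ss_tot / ss_res  # algebraically equal to 1 / (1 - R^2)
    return out


# Synthetic columns (not the study's data): b closely tracks a, c is unrelated.
rng = np.random.default_rng(3)
a = rng.uniform(0, 100, size=35)
b = 3.0 * a + rng.normal(0, 5.0, size=35)
c = rng.uniform(0, 100, size=35)

vifs = vif(np.column_stack([a, b, c]))
print("VIFs for a, b, c:", np.round(vifs, 2))
```

When a model has exactly two explanatory variables, both VIFs reduce to 1 / (1 - r^2) for their mutual correlation r and are therefore equal, which matches the identical value of 1.8876 for both variables in Table 3.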

 

The summary of the GLR model (Figure 2) for the three cases is shown in Tables 5, 6 and 7. In Table 5, the z-score is 1720.1645 for the intercept, -28.6294 for Con-Mar-15, 58.2404 for Con-Apr-7, 52.3568 for Con-Apr-12, 166.8258 for Con-May-12 and -143.532 for Con-June-1. Z-scores are expressed in standard deviations, and both z-scores and p-values are associated with the standard normal distribution. In Table 6, the z-scores are 2199.1819 for the intercept, 51.5807 for Con-Mar-15 and 575.7223 for Con-Apr-7; in Table 7, they are 1769.2154 for the intercept, 359.7744 for Con-Apr-12, 168.4036 for Con-May-12 and -180.429 for Con-June-1.

 

Figure 1: OLS model for confirmed cases in India.

 

Table 5: Summary of GLR model on explanatory variables – Case-I.

 

| Var. | Coeff. | SE | z-score | P-value | VIF |
|---|---|---|---|---|---|
| Intercept | 7.5277 | 0.0043 | 1720.1645 | 0.0000 | -- |
| Con-Mar-15 | -0.0117 | 0.0004 | -28.6294 | 0.0000 | 3.1445 |
| Con-Apr-7 | 0.0030 | 0.00005 | 58.2404 | 0.0000 | 15.8673 |
| Con-Apr-12 | 0.0014 | 0.00003 | 52.3568 | 0.0000 | 30.5448 |
| Con-May-12 | 0.0004 | 0.000003 | 166.8258 | 0.0000 | 72.7011 |
| Con-June-1 | -0.0001 | 0.000001 | -143.532 | 0.0000 | 66.1686 |
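The z-scores in Table 5 can be related to the other columns through the usual definition z = coefficient / standard error, with a two-sided p-value taken from the standard normal distribution. The sketch below recomputes both from the printed values. Because the printed coefficients and standard errors are rounded, the recomputed ratios are only expected to agree roughly with the reported z-scores; the assumption that the tool uses exactly this definition is ours.

```python
import math

# (coefficient, standard error, reported z-score) as printed in Table 5.
table5 = {
    "Intercept": (7.5277, 0.0043, 1720.1645),
    "Con-Mar-15": (-0.0117, 0.0004, -28.6294),
    "Con-Apr-7": (0.0030, 0.00005, 58.2404),
    "Con-Apr-12": (0.0014, 0.00003, 52.3568),
    "Con-May-12": (0.0004, 0.000003, 166.8258),
    "Con-June-1": (-0.0001, 0.000001, -143.532),
}

recomputed = {}
for name, (coef, se, reported_z) in table5.items():
    z = coef / se                                # z-score from rounded table values
    p = math.erfc(abs(z) / math.sqrt(2))         # two-sided standard normal p-value
    recomputed[name] = (z, p)
    print(f"{name}: coef/SE = {z:.1f}, reported z = {reported_z}, p = {p:.4f}")
```

Every recomputed z-score has the same sign as the reported one, and all are far enough from zero that the p-value rounds to 0.0000, as in the table.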

 

Table 6: Summary of GLR model on explanatory variables –Case-II.

 

Var.

Coeff.

SE

z-score

P-value

VIF

Intercept

7.7134

0.0035

2199.1819

0.0000

--

Con-Mar-15      

0.0082

0.0001

51.5807

0.0000

1.8876

 Con-Apr-7      

0.0070

0.00001