Load data from geopackage
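The layer can be read with sf's st_read; a minimal sketch (the path is an assumption, adjust it to your setup):

library(sf)
# read the charging stations layer from the geopackage
charging_stations <- st_read("data/HH.gpkg", layer = "chargingStations")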
## Reading layer `chargingStations' from data source `/home/slautenb/Documents/lehre/HD/ws_2020_21/heikaLab/R4UrbanDataLab_2020/data/HH.gpkg' using driver `GPKG'
## Simple feature collection with 1061 features and 12 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 552777.7 ymin: 5921782 xmax: 581876.5 ymax: 5954451
## projected CRS: ETRS89 / UTM zone 32N
What drives the clustered pattern of charging stations? One assumption could be that it is related to population density: the more people inhabit a place, the higher the number of charging stations. This is of course overly simplistic, but it is a start.
Read population shapefile
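A sketch of the read plus a quick summary (the path is again an assumption):

# read the population grid polygons from the shapefile
pop <- st_read("data/popPoly.shp")
summary(pop)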
## Reading layer `popPoly' from data source `/home/slautenb/Documents/lehre/HD/ws_2020_21/heikaLab/R4UrbanDataLab_2020/data/popPoly.shp' using driver `ESRI Shapefile'
## Simple feature collection with 7163 features and 6 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 522696.9 ymin: 5898155 xmax: 616696.9 ymax: 5975155
## projected CRS: ETRS89 / UTM zone 32N
## id left top right
## Min. : 1 Min. :523197 Min. :5899655 Min. :524197
## 1st Qu.:1832 1st Qu.:546197 1st Qu.:5918655 1st Qu.:547197
## Median :3623 Median :570197 Median :5937655 Median :571197
## Mean :3629 Mean :569821 Mean :5937938 Mean :570821
## 3rd Qu.:5424 3rd Qu.:593197 3rd Qu.:5956655 3rd Qu.:594197
## Max. :7237 Max. :616197 Max. :5975655 Max. :617197
## bottom pop_total geometry
## Min. :5898655 Min. : -1.0 POLYGON :7163
## 1st Qu.:5917655 1st Qu.: -1.0 epsg:25832 : 0
## Median :5936655 Median : 30.0 +proj=utm ...: 0
## Mean :5936938 Mean : 471.8
## 3rd Qu.:5955655 3rd Qu.: 248.0
## Max. :5974655 Max. :20088.0
-1 represents missing data, so these values need to be set to NA.
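A sketch of the recoding, assuming the population polygons are stored in pop:

# -1 encodes missing population counts; recode to NA
pop$pop_total[pop$pop_total == -1] <- NA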
## tmap mode set to interactive viewing
tm_shape(pop) +
tm_fill(col="pop_total", convert2density = TRUE, alpha=.5, palette="plasma", style="kmeans") +
tm_shape(charging_stations) +
tm_dots(alpha=.5)
To study the relationship between the two variables we need to combine them. A straightforward approach is a spatial join that counts the number of charging points (one charging station can have more than one charging point) per vector grid cell of the population data set.
First we join the polygon information to the charging stations.
Add a simple ID to the charging stations since gml_id is cumbersome.
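A sketch of both steps; statID matches the field shown in the output below, and st_join attaches the attributes of the intersecting grid cell to each station:

# simple identifier per charging station
charging_stations$statID <- paste0("charge_", seq_len(nrow(charging_stations)))
# spatial join: add the population grid cell attributes to each station
chargingStationsPoly <- st_join(charging_stations, pop)
head(chargingStationsPoly)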
## Simple feature collection with 6 features and 19 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 563600.1 ymin: 5924303 xmax: 571126.4 ymax: 5938687
## projected CRS: ETRS89 / UTM zone 32N
## gml_id standort anzahl_ladepunkte ladesaeule_status
## 1 APP_STROMNETZ_EMOBILITY_7035387 398 2 teilweise belegt
## 2 APP_STROMNETZ_EMOBILITY_7035388 441 2 frei
## 3 APP_STROMNETZ_EMOBILITY_7035389 305 2 teilweise belegt
## 4 APP_STROMNETZ_EMOBILITY_7035390 476 2 frei
## 5 APP_STROMNETZ_EMOBILITY_7035391 052 3 belegt
## 6 APP_STROMNETZ_EMOBILITY_7035392 076 3 frei
## adresse koordinaten typ ladepunkt stecker
## 1 Am Kaiserkai 53 N53.541290 E9.987390 AC <NA> Typ2/Schuko
## 2 Neuenfelder Straße 19 N53.497900 E10.002170 AC <NA> Typ2/Schuko
## 3 Bellealliancestraße 40 N53.568036 E9.960359 AC <NA> Typ2/Schuko
## 4 Wallgraben 46 N53.463710 E9.978910 AC <NA> Typ2/Schuko
## 5 Wandsbeker Zollstraße 15 N53.574780 E10.074180 DC <NA> CHAdeMO
## 6 Grasweg 6 N53.592820 E10.002150 AC <NA> Typ2
## status authmethod_1 authmethod_2 statID id left top
## 1 belegt Öffentlich zugänglich RFID-Karte charge_1 3277 565196.9 5933655
## 2 frei Öffentlich zugänglich RFID-Karte charge_2 3359 566196.9 5928655
## 3 frei Öffentlich zugänglich RFID-Karte charge_3 3120 563196.9 5936655
## 4 frei Öffentlich zugänglich RFID-Karte charge_4 3285 565196.9 5925655
## 5 belegt Öffentlich zugänglich RFID-Karte charge_5 3735 571196.9 5937655
## 6 frei Öffentlich zugänglich RFID-Karte charge_6 3348 566196.9 5939655
## right bottom pop_total geom
## 1 566196.9 5932655 790 POINT (565431.5 5932941)
## 2 567196.9 5927655 2214 POINT (566478.7 5928127)
## 3 564196.9 5935655 13242 POINT (563600.1 5935892)
## 4 566196.9 5924655 578 POINT (564988.1 5924303)
## 5 572196.9 5936655 9534 POINT (571126.4 5936750)
## 6 567196.9 5938655 10461 POINT (566328.9 5938687)
We drop the geometries here since they would only get in the way during the reshaping.
chargingStationsPolyWide <- chargingStationsPoly %>%
st_drop_geometry() %>%
pivot_wider( id_cols = id, names_from = statID, values_from = anzahl_ladepunkte)
Next we simply add up the numbers across rows, ignoring NA values:
chargingStationsPolyWide$sumChargingPoints <- chargingStationsPolyWide %>%
select(-one_of("id")) %>%
rowSums(na.rm=TRUE)
Drop the other columns
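A sketch, keeping only the polygon id and the aggregated count:

chargingStationsPolyWide <- chargingStationsPolyWide %>%
  select(id, sumChargingPoints)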
We can join the resulting aggregated data frame to the polygon data set based on the polygon id. Let us check if that is a unique id:
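A quick check (assuming the population polygons are stored in pop):

length(unique(pop$id))
nrow(pop)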
## [1] 7163
## [1] 7163
Yes, no duplicates.
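The join itself, as a sketch (popCharging is an assumed name for the result):

# attach the aggregated charging point counts to the population polygons
popCharging <- left_join(pop, chargingStationsPolyWide, by = "id")
dim(popCharging)
head(popCharging)
summary(popCharging)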
## [1] 7163 8
## Simple feature collection with 6 features and 7 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 522696.9 ymin: 5969155 xmax: 523696.9 ymax: 5975155
## projected CRS: ETRS89 / UTM zone 32N
## id left top right bottom pop_total sumChargingPoints
## 1 1 523196.9 5975655 524196.9 5974655 3 NA
## 2 2 523196.9 5974655 524196.9 5973655 7 NA
## 3 3 523196.9 5973655 524196.9 5972655 10 NA
## 4 4 523196.9 5972655 524196.9 5971655 18 NA
## 5 5 523196.9 5971655 524196.9 5970655 10 NA
## 6 6 523196.9 5970655 524196.9 5969655 34 NA
## geometry
## 1 POLYGON ((522696.9 5975155,...
## 2 POLYGON ((522696.9 5974155,...
## 3 POLYGON ((522696.9 5973155,...
## 4 POLYGON ((522696.9 5972155,...
## 5 POLYGON ((522696.9 5971155,...
## 6 POLYGON ((522696.9 5970155,...
## id left top right
## Min. : 1 Min. :523197 Min. :5899655 Min. :524197
## 1st Qu.:1832 1st Qu.:546197 1st Qu.:5918655 1st Qu.:547197
## Median :3623 Median :570197 Median :5937655 Median :571197
## Mean :3629 Mean :569821 Mean :5937938 Mean :570821
## 3rd Qu.:5424 3rd Qu.:593197 3rd Qu.:5956655 3rd Qu.:594197
## Max. :7237 Max. :616197 Max. :5975655 Max. :617197
##
## bottom pop_total sumChargingPoints geometry
## Min. :5898655 Min. : 3.0 Min. : 4.00 POLYGON :7163
## 1st Qu.:5917655 1st Qu.: 22.0 1st Qu.: 4.00 epsg:25832 : 0
## Median :5936655 Median : 104.0 Median : 8.00 +proj=utm ...: 0
## Mean :5936938 Mean : 663.6 Mean :11.99
## 3rd Qu.:5955655 3rd Qu.: 480.0 3rd Qu.:16.00
## Max. :5974655 Max. :20088.0 Max. :56.00
## NA's :2067 NA's :6969
We have a lot of NA values, which indicate - in our case here - that no charging points are present, so we should set them to zero.
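A sketch, again assuming the joined object is named popCharging:

# NA means no charging station fell into the grid cell
popCharging$sumChargingPoints[is.na(popCharging$sumChargingPoints)] <- 0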
For pop_total we have real missing values. For the regression analysis it is best to drop those rows.
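A sketch of the filtering step; popChargingCleaned is the name used in the models below:

popChargingCleaned <- popCharging %>%
  filter(!is.na(pop_total))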
So it looks as if we have a relationship between the two variables. Let's fit a simple linear model to the data.
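The model call (lmModel is the name we use later when extracting the residuals):

lmModel <- lm(sumChargingPoints ~ pop_total, data = popChargingCleaned)
summary(lmModel)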
##
## Call:
## lm(formula = sumChargingPoints ~ pop_total, data = popChargingCleaned)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.099 -0.107 0.271 0.358 42.846
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.806e-01 3.728e-02 -10.21 <2e-16 ***
## pop_total 1.259e-03 2.228e-05 56.51 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.443 on 5094 degrees of freedom
## Multiple R-squared: 0.3853, Adjusted R-squared: 0.3852
## F-statistic: 3193 on 1 and 5094 DF, p-value: < 2.2e-16
So we have a significant positive effect of population on the number of charging points. For a grid cell with 1000 more inhabitants we expect on average 1.259 more charging points.
What does the regression line look like?
ggplot(popChargingCleaned, mapping= aes(x=pop_total, y= sumChargingPoints)) +
geom_point(alpha=.25) +
geom_smooth(method="lm") +
labs(title="Charging stations Hamburg", subtitle = "1 sqkm grid cells") +
xlab("Population") + ylab("Number of charging points")
## `geom_smooth()` using formula 'y ~ x'
The regression line does not fit the data particularly well.
What do the residuals look like?
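A quick look via the standard lm diagnostic plots (a sketch):

par(mfrow = c(2, 2))
plot(lmModel)  # residuals vs fitted, QQ plot, scale-location, residuals vs leverage (Cook's distance)
par(mfrow = c(1, 1))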
Long story short: the residuals are not normally distributed and a few grid cells have a large Cook's distance (influential points).
Another option would be to fit a generalized linear model with a Poisson distribution and a log link:
poisModel <- glm(sumChargingPoints ~ pop_total, data=popChargingCleaned, family=poisson)
summary(poisModel)
##
## Call:
## glm(formula = sumChargingPoints ~ pop_total, family = poisson,
## data = popChargingCleaned)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -19.4121 -0.6592 -0.6255 -0.6181 14.2015
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.662e+00 2.900e-02 -57.3 <2e-16 ***
## pop_total 3.660e-04 2.816e-06 130.0 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 16759.2 on 5095 degrees of freedom
## Residual deviance: 8609.3 on 5094 degrees of freedom
## AIC: 9384.3
##
## Number of Fisher Scoring iterations: 6
ggplot(popChargingCleaned, mapping= aes(x=pop_total, y= sumChargingPoints)) +
geom_point(alpha=.25) +
geom_smooth(method="glm", method.args = list(family = "poisson")) +
labs(title="Charging stations Hamburg", subtitle = "1 sqkm grid cells, Poisson GLM") +
xlab("Population") + ylab("Number of charging points")
## `geom_smooth()` using formula 'y ~ x'
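We compare the two models by their AIC:

AIC(lmModel)
AIC(poisModel)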
## [1] 23569.92
## [1] 9384.344
The Poisson model fits the data better, indicated by the smaller AIC. However, it looks unrealistic for large population values. We might think about transforming the predictor, e.g. by a square root transformation.
poisModel <- glm(sumChargingPoints ~ sqrt(pop_total), data=popChargingCleaned, family=poisson)
summary(poisModel)
##
## Call:
## glm(formula = sumChargingPoints ~ sqrt(pop_total), family = poisson,
## data = popChargingCleaned)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -11.4747 -0.5059 -0.3777 -0.3270 14.2358
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.1664820 0.0469653 -67.42 <2e-16 ***
## sqrt(pop_total) 0.0577123 0.0005307 108.74 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 16759.2 on 5095 degrees of freedom
## Residual deviance: 6548.9 on 5094 degrees of freedom
## AIC: 7323.9
##
## Number of Fisher Scoring iterations: 6
AIC value smaller -> better fit.
ggplot(popChargingCleaned, mapping= aes(x=pop_total, y= sumChargingPoints)) +
geom_point(alpha=.25) +
geom_smooth(method="glm", formula= y ~ sqrt(x), method.args = list(family = "poisson")) +
labs(title="Charging stations Hamburg", subtitle = "1 sqkm grid cells, Poisson GLM") +
xlab("Population") + ylab("Number of charging points")
Rerun the analysis using the data from Berlin. The population data is stored in the geopackage Berlin.gpkg, the name of the layer is population.
Take care: the Berlin data does not contain information on the number of charging points. The simplest solution is to add a field count, set it to 1 and use that instead of the number of charging points. Also, the attribute fields are named differently: the id of the population data set is named idPoly, while the charging stations have a field id as a unique identifier.
Define neighborhood as polygons sharing at least one point - queen neighborhood as the default.
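A sketch using spdep, building the neighborhood on the cleaned polygon data set (pop.nb is the name used below):

library(spdep)
# queen contiguity: polygons sharing at least one boundary point are neighbors
pop.nb <- poly2nb(popChargingCleaned, queen = TRUE)
pop.nb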
## Neighbour list object:
## Number of regions: 5096
## Number of nonzero links: 31556
## Percentage nonzero weights: 0.1215131
## Average number of links: 6.192308
## 6 regions with no links:
## 1487 5912 6828 6934 7153 7163
Six regions without links were reported!
Convert to sp objects. This is needed for some analyses. sp and sf are two packages designed to work with spatial data. sf is more recent and, from my perspective, easier to work with and faster. Since sp has been around for a longer time, more packages accept it as input (and not sf objects). While a nuisance, converting from one representation to the other takes little effort and is lossless.
Attach the residuals to the polygon data set:
popChargingCleaned$residLm <- residuals(lmModel)
popChargingCleaned$residPoissonSqrt <- residuals(poisModel)
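With the residuals attached we convert the sf object to sp; a one-line sketch (popChargingCleanedSp is the name used in the plots below):

popChargingCleanedSp <- as(popChargingCleaned, "Spatial")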
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum Unknown based on GRS80 ellipsoid in CRS definition
## Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum European Terrestrial Reference System 1989 in CRS
## definition
plot(popChargingCleanedSp, border="grey")
plot(pop.nb, coords=coordinates(popChargingCleanedSp)[,1:2], col="brown", add=TRUE, points=FALSE)
The neighbor list is symmetric.
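We can check this with spdep:

is.symmetric.nb(pop.nb)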
## [1] TRUE
Create a spatial weights matrix from the neighbors using row standardization.
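A sketch; zero.policy = TRUE is needed because six regions have no neighbors:

pop.lw <- nb2listw(pop.nb, style = "W", zero.policy = TRUE)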
As an alternative we could use e.g. the 8 nearest neighbors
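A sketch, using the polygon label points of the sp object:

pop.k8nb <- knn2nb(knearneigh(coordinates(popChargingCleanedSp), k = 8))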
plot(popChargingCleanedSp, border="grey")
plot(pop.k8nb, coords=coordinates(popChargingCleanedSp), col="red", add=TRUE, cex=.7)
The neighbor list is not symmetric.
## [1] FALSE
Create the weights list.
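Again a one-line sketch:

pop.k8lw <- nb2listw(pop.k8nb, style = "W")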
With the k-nearest-neighbors definition all grid cells have the same number of neighbors:
par(mfrow=c(1,2))
hist(unlist(pop.lw$weights), las=1, xlab="weights", main="Contiguity nb, W", col="blue")
hist(unlist(pop.k8lw$weights), las=1, xlab="weights", main="nn-8 nb, W", col="blue")
We could also calculate distances and inversely weight by distance before row-standardization.
dlist <- nbdists(pop.k8nb, coordinates(popChargingCleanedSp))
dlist <- lapply(dlist, function(x) 1/x)
pop.k8lwDist <- nb2listw(pop.k8nb, glist=dlist, style = "W")
hist(unlist(pop.k8lwDist$weights), las=1, xlab="weights", main="nn-8 nb, idw, W", col="blue")
The idw (inverse distance weighted) approach can be applied to the other neighborhood definitions as well.
Moran’s I is calculated based on a neighborhood definition (weights list), the number of zones (typically the number of features) and the global sum of weights. Convenience functions help to get the required parameters.
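As a sketch, for the residuals of the linear model (length gives the number of zones, Szero the global sum of weights):

n  <- length(pop.nb)   # number of zones
S0 <- Szero(pop.lw)    # global sum of weights
moran(popChargingCleaned$residLm, listw = pop.lw, n = n, S0 = S0, zero.policy = TRUE)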
lm.morantest provides the corresponding test statistic for regression residuals. As usual, don't just get excited about p-values; also look at the absolute numbers.
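The call for the linear model looks like this (a sketch; the analogous call with poisModel presumably produced the second result below):

lm.morantest(lmModel, listw = pop.lw, zero.policy = TRUE)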
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = sumChargingPoints ~ pop_total, data =
## popChargingCleaned)
## weights: pop.lw
##
## Moran I statistic standard deviate = 59.959, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 0.5006840535 -0.0003277572 0.0000698220
##
## Global Moran I for regression residuals
##
## data:
## model: glm(formula = sumChargingPoints ~ sqrt(pop_total), family =
## poisson, data = popChargingCleaned)
## weights: pop.lw
##
## Moran I statistic standard deviate = 45.506, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 3.887178e-01 -1.574914e-03 7.356117e-05
This indicates strong positive spatial autocorrelation for both the linear model as well as the GLM.
A correlogram basically calculates Moran's I for different distance classes. We use the function correlog from the package ncf for this purpose. The method works on distances and does not consider a weights matrix W.
increment specifies the size of the bins (distance bands) for the calculation and is measured in the coordinates of the spatial data (meters for our data set).
resamp allows the specification of the number of iterations for a permutation test.
The results contain the value of Moran's I in the slot correlation, the average distance of each distance class in mean.of.class, the number of pairs within each distance class in n, and the p-value (if resamp > 0) in p.
Here I plot the first 20 distance bands, scaling the point size by the number of pairs in each distance band.
correlogResidLm <- ncf::correlog(coordinates(popChargingCleanedSp)[,1], coordinates(popChargingCleanedSp)[,2], popChargingCleanedSp$residLm,
na.rm=TRUE, increment=500, resamp=0)
par(mar=c(5,7, 4.1, 0.1))
plot(correlogResidLm$correlation[1:20] ~ correlogResidLm$mean.of.class[1:20] , type="b", pch=16,
cex=sqrt(correlogResidLm$n)/70, lwd=1.5, xlab="distance ", ylab="", cex.lab=1, cex.axis=1,
las=1, main="Linear model, Hamburg")
abline(h=0)
mtext(text = "Moran's I", side = 2, line = 5, cex=1)
We see how spatial autocorrelation decreases with distance between grid cells.
First we source a script file that helps us with plotting.
## Loading required package: maptools
## Checking rgeos availability: TRUE
mylocalMI(plotvar= popChargingCleanedSp$residLm, nb = pop.lw, shape = popChargingCleanedSp,
main = "LISA on residuals of linear model, Hamburg", verbose = FALSE, zero.policy = TRUE)
Clearly, there is a cluster of high residuals in the center, surrounded by cells with low residuals, as well as several clusters of low residuals.
## tmap mode set to interactive viewing
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
So we clearly have a problem with spatial autocorrelation. Let's look at different ways to deal with it.
We fit a spatial lag model that has the form:
\(y = \rho W y + X \beta + \epsilon\)
The operation is computationally intensive, so you might want to get a cup of coffee while you wait for the results. Setting method="SE_classic" uses an approximation and is much faster; it should be fine for our case.
lagModel2 <- lagsarlm(sumChargingPoints ~ pop_total, data=popChargingCleaned, listw=pop.lw,
zero.policy=TRUE, method="SE_classic")
summary(lagModel2)
##
## Call:
## lagsarlm(formula = sumChargingPoints ~ pop_total, data = popChargingCleaned,
## listw = pop.lw, method = "SE_classic", zero.policy = TRUE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.4662575 -0.0083005 0.1732037 0.2114703 33.8309176
##
## Type: lag
## Regions with no neighbours included:
## 1487 5912 6828 6934 7153 7163
## Coefficients: (numerical Hessian approximate standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.2143e-01 2.6245e-02 -8.4371 < 2.2e-16
## pop_total 5.2420e-04 1.5712e-05 33.3636 < 2.2e-16
##
## Rho: 0.707, LR test value: 2941, p-value: < 2.22e-16
## Approximate (numerical Hessian) standard error: 4.0254e-06
## z-value: 175630, p-value: < 2.22e-16
## Wald statistic: 3.0848e+10, p-value: < 2.22e-16
##
## Log likelihood: -10311.45 for lag model
## ML residual variance (sigma squared): 2.9729, (sigma: 1.7242)
## Number of observations: 5096
## Number of parameters estimated: 4
## AIC: 20631, (AIC for lm: 23570)
This would be the more intensive computation, which for the data at hand leads to very similar results.
lagModel <- lagsarlm(sumChargingPoints ~ pop_total, data=popChargingCleaned, listw=pop.lw, zero.policy=TRUE)
summary(lagModel)
Interpretation of the regression coefficients is not straightforward. We need to distinguish between the direct, the indirect (via the lagged response) and the total effect. The total effect of a unit change in one of the predictors is: \(\hat{\beta}/ (1- \hat{\rho})\)
Using the full weights matrix is computationally intensive for a matrix of our size:
We will speed up the computation by providing a vector of traces of powers of the spatial weights matrix, created with trW, for approximate impact measures.
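A sketch using trW and impacts from spatialreg:

W <- as(pop.lw, "CsparseMatrix")   # sparse representation of the weights
trMat <- trW(W, type = "mult")     # traces of powers of W
impacts(lagModel2, tr = trMat)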
## Impact measures (lag, trace):
## Direct Indirect Total
## pop_total 0.0006064485 0.001182576 0.001789024
We see that the total effect is nearly three times as big as the direct effect. This reflects the large \(\rho\) value.
We now fit an lm model augmented with spatially lagged RHS variables. The idea is to use lagged versions of the predictors in addition to the predictors themselves.
slxModel <- lmSLX(sumChargingPoints ~ pop_total, data=popChargingCleaned, listw=pop.lw, zero.policy=TRUE)
summary(slxModel)
##
## Call:
## lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))),
## data = as.data.frame(x), weights = weights)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.206 -0.177 0.322 0.482 44.504
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.6074666 0.0375937 -16.16 <2e-16 ***
## pop_total 0.0006282 0.0000379 16.58 <2e-16 ***
## lag.pop_total 0.0009433 0.0000467 20.20 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.351 on 5093 degrees of freedom
## Multiple R-squared: 0.4309, Adjusted R-squared: 0.4307
## F-statistic: 1928 on 2 and 5093 DF, p-value: < 2.2e-16
## Impact measures (SLX, estimable):
## Direct Indirect Total
## pop_total 0.000628213 0.0009432578 0.001571471
popChargingCleaned$residLag <- residuals(lagModel2)
popChargingCleaned$residSlx <- residuals(slxModel)
## tmap mode set to interactive viewing
tm_shape(popChargingCleaned) +
tm_fill(col="residLag", palette = "-RdBu", midpoint=0, alpha=0.5) +
tm_layout(title="Residuals spatial lag model, Hamburg")
## tmap mode set to interactive viewing
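Presumably the second map showed the SLX residuals; a sketch mirroring the map above:

tm_shape(popChargingCleaned) +
  tm_fill(col="residSlx", palette = "-RdBu", midpoint=0, alpha=0.5) +
  tm_layout(title="Residuals spatially lagged X model, Hamburg")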
If we compare the distributions of the residuals we see that the patterns differ: the residuals of the spatial lag model are more randomly distributed, which is beneficial.
If we want to stick with the count regression approach (which is more suitable), the spatial eigenvector mapping approach is the tool at hand. Unfortunately, it is again computationally intensive.
sevmPoiss <- ME(sumChargingPoints ~ sqrt(pop_total), data=popChargingCleaned, family=poisson, zero.policy = TRUE, listw=pop.lw)
## Warning: glm.fit: fitted rates numerically 0 occurred
## Warning: glm.fit: fitted rates numerically 0 occurred
## Warning: glm.fit: fitted rates numerically 0 occurred
## Warning: glm.fit: fitted rates numerically 0 occurred
## Warning: glm.fit: fitted rates numerically 0 occurred
## Warning: glm.fit: fitted rates numerically 0 occurred
## Warning: glm.fit: fitted rates numerically 0 occurred
## Eigenvector ZI pr(ZI)
## 0 NA NA 0.01
## 1 695 NA 0.01
## 2 118 NA 0.01
## 3 617 NA 0.01
## 4 256 NA 0.01
## 5 539 NA 0.01
## 6 71 NA 0.01
## 7 814 NA 0.09
sevmPoissModel <- glm(sumChargingPoints ~ sqrt(pop_total) + fitted(sevmPoiss), data=popChargingCleaned, family=poisson)
summary(sevmPoissModel)
##
## Call:
## glm(formula = sumChargingPoints ~ sqrt(pop_total) + fitted(sevmPoiss),
## family = poisson, data = popChargingCleaned)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -9.4185 -0.5059 -0.3354 -0.2436 12.2322
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.523e+00 5.414e-02 -65.075 <2e-16 ***
## sqrt(pop_total) 5.607e-02 6.324e-04 88.663 <2e-16 ***
## fitted(sevmPoiss)vec695 6.257e-01 1.275e+00 0.491 0.624
## fitted(sevmPoiss)vec118 2.739e+01 1.510e+00 18.143 <2e-16 ***
## fitted(sevmPoiss)vec617 1.961e+01 1.701e+00 11.528 <2e-16 ***
## fitted(sevmPoiss)vec256 -1.990e+01 1.571e+00 -12.666 <2e-16 ***
## fitted(sevmPoiss)vec539 -2.060e+01 1.257e+00 -16.386 <2e-16 ***
## fitted(sevmPoiss)vec71 -3.895e+01 1.939e+00 -20.093 <2e-16 ***
## fitted(sevmPoiss)vec814 -1.679e+01 1.809e+00 -9.282 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 16759.2 on 5095 degrees of freedom
## Residual deviance: 4950.9 on 5087 degrees of freedom
## AIC: 5739.9
##
## Number of Fisher Scoring iterations: 6
Presumably we have missed a lot of other predictors that are now partially absorbed into different eigenvectors. It might be worth plotting and investigating the eigenvectors that made it into the model. Therefore, we attach the selected eigenvectors to the sf object and plot them.
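A sketch; the eigenvector names (vec71, vec118, ...) follow from the coefficient names in the model summary, and which ones to map is a free choice:

# fitted(sevmPoiss) returns the selected eigenvectors as a matrix
eigenVecs <- as.data.frame(fitted(sevmPoiss))
popChargingCleaned <- cbind(popChargingCleaned, eigenVecs)
tm_shape(popChargingCleaned) +
  tm_fill(col = c("vec71", "vec118", "vec256"), midpoint = 0, palette = "-RdBu", alpha = 0.5)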
Most of the selected spatial eigenvectors represent rather short-range spatial autocorrelation patterns, which presumably capture the spillover effects that we modelled in the spatial lag model. In addition we see a few relatively long-range spatial structures (vec 71, 118 and 256).
Try the following changes to the analysis:
How do the estimated regression coefficients change if the 8-nearest neighbors are used in the idw version both for the spatial lag and the spatial lagged X model?
Rerun the analysis using the data from Berlin. The population data is stored in the geopackage Berlin.gpkg, the name of the layer is population.