Load data from geopackage
## Reading layer `chargingStations_berlin' from data source `/home/slautenb/Documents/lehre/HD/ws_2020_21/heikaLab/R4UrbanDataLab_2020/data/chargingStations_berlin.shp' using driver `ESRI Shapefile'
## Simple feature collection with 486 features and 20 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 373818.6 ymin: 5807999 xmax: 407135.9 ymax: 5829471
## projected CRS: ETRS89 / UTM zone 33N
What drives the clustered pattern of charging stations? One assumption could be that it is related to population density. The more people inhabit a plce the higher the number of charing statiosn. This is of course overly simplistic but it is a start.
Read population shapefile
## Reading layer `population' from data source `/home/slautenb/Documents/lehre/HD/ws_2020_21/heikaLab/R4UrbanDataLab_2020/data/Berlin.gpkg' using driver `GPKG'
## Simple feature collection with 805 features and 2 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 372713.8 ymin: 5807488 xmax: 407713.8 ymax: 5830488
## projected CRS: ETRS89 / UTM zone 33N
## pop_total idPoly geom
## Min. : 0.229 Min. : 1 POLYGON :805
## 1st Qu.: 826.406 1st Qu.:202 epsg:NA : 0
## Median : 2751.180 Median :403 +proj=utm ...: 0
## Mean : 3999.399 Mean :403
## 3rd Qu.: 5724.300 3rd Qu.:604
## Max. :20881.894 Max. :805
## NA's :14
-1 represents mising data so these need to be set to NA
## tmap mode set to interactive viewing
tm_shape(pop) + tm_fill(col="pop_total", convert2density = TRUE, alpha=.5, pallette="plasma", style="kmeans") + tm_shape(charging_stations) + tm_dots(alpha=.5)
To study the relationship between the two variables we need to combine them. A straight forward approach is a spatial join that counts the number of charging places (one charging station could have more than one charging point) per vector grid cell of the population data set.
First we join the polygon information to the charging stations.
Add a simple ID to the charging stations since gml_id is cumbersome.
## Simple feature collection with 6 features and 23 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 389462.1 ymin: 5820993 xmax: 391281.2 ymax: 5821719
## projected CRS: ETRS89 / UTM zone 33N
## id quelle p1_kw steckertyp p2_kw steckert_1 p3_kw steckert_2 p4_kw
## 1 1 bnetza 11 AC Steckdose Typ 2 11 <NA> NA <NA> NA
## 2 2 bnetza 11 AC Steckdose Typ 2 11 <NA> NA <NA> NA
## 3 3 bnetza 11 AC Steckdose Typ 2 11 <NA> NA <NA> NA
## 4 4 bnetza 11 AC Steckdose Typ 2 11 <NA> NA <NA> NA
## 5 5 bnetza 11 AC Steckdose Typ 2 11 <NA> NA <NA> NA
## 6 6 bnetza 22 AC Steckdose Typ 2 22 <NA> NA <NA> NA
## inbetrieb_ x_wert y_wert betreiber
## 1 2017/03/21 00:00:00.000 391281.2 5821719 Allego GmbH
## 2 2017/03/21 00:00:00.000 391281.2 5821719 Allego GmbH
## 3 2017/03/13 00:00:00.000 390867.9 5820993 Allego GmbH
## 4 2017/01/26 00:00:00.000 389462.1 5821715 Allego GmbH
## 5 2016/12/21 00:00:00.000 389951.3 5821431 Allego GmbH
## 6 2010/04/14 00:00:00.000 391180.4 5821101 innogy eMobility Solutions GmbH
## ladeeinr_a plz_ort adresse ladeart bemerkunge
## 1 Normalladeeinrichtung 10115 Berlin Anklamer Str. 12 <NA> <NA>
## 2 Normalladeeinrichtung 10115 Berlin Anklamer Str. 12 <NA> <NA>
## 3 Normalladeeinrichtung 10115 Berlin Borsigstr. 1 <NA> <NA>
## 4 Normalladeeinrichtung 10115 Berlin Kieler Str. 5 <NA> <NA>
## 5 Normalladeeinrichtung 10115 Berlin Habersaathstr. 34 <NA> <NA>
## 6 Normalladeeinrichtung 10115 Berlin Torstra�e 165 <NA> <NA>
## anschlussl steckert_3 statID pop_total idPoly
## 1 22 AC Steckdose Typ 2 charge_1 14295.598 299
## 2 22 AC Steckdose Typ 2 charge_2 14295.598 299
## 3 22 AC Steckdose Typ 2 charge_3 11397.844 334
## 4 22 AC Steckdose Typ 2 charge_4 6535.285 297
## 5 22 AC Steckdose Typ 2 charge_5 4881.026 333
## 6 44 AC Steckdose Typ 2 charge_6 11397.844 334
## geometry
## 1 POINT (391281.2 5821719)
## 2 POINT (391281.2 5821719)
## 3 POINT (390867.9 5820993)
## 4 POINT (389462.1 5821715)
## 5 POINT (389951.3 5821431)
## 6 POINT (391180.4 5821101)
We will drop geometries here since they tend to mix up things. We have no information on the number of charging points so we will simply use the count
chargingStationsPoly$count <- 1
chargingStationsPolyWide <- chargingStationsPoly %>%
st_drop_geometry() %>%
pivot_wider( id_cols = idPoly, names_from = statID, values_from = count)
## # A tibble: 6 x 10
## idPoly charge_1 charge_2 charge_3 charge_4 charge_5 charge_6 charge_7 charge_8
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 299 1 1 NA NA NA NA NA NA
## 2 334 NA NA 1 NA NA 1 1 NA
## 3 297 NA NA NA 1 NA NA NA 1
## 4 333 NA NA NA NA 1 NA NA NA
## 5 368 NA NA NA NA NA NA NA NA
## 6 404 NA NA NA NA NA NA NA NA
## # … with 1 more variable: charge_9 <dbl>
Next we simply add up numbers across rows, ignoring NA values
chargingStationsPolyWide$sumChargingStations <- chargingStationsPolyWide %>%
select(-one_of("idPoly")) %>%
rowSums(na.rm=TRUE)
## # A tibble: 6 x 4
## idPoly sumChargingStations charge_1 charge_2
## <int> <dbl> <dbl> <dbl>
## 1 299 6 1 1
## 2 334 4 NA NA
## 3 297 3 NA NA
## 4 333 3 NA NA
## 5 368 8 NA NA
## 6 404 5 NA NA
Drop the other columns
## # A tibble: 6 x 2
## idPoly sumChargingStations
## <int> <dbl>
## 1 299 6
## 2 334 4
## 3 297 3
## 4 333 3
## 5 368 8
## 6 404 5
We can join the resulting aggregated data frame to the polygon data set based on glm_id. Let us check if that is a qunique id:
## [1] 805
## [1] 805
Yes, no dublicates.
## [1] 805 4
## Simple feature collection with 6 features and 3 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 372713.8 ymin: 5829488 xmax: 378713.8 ymax: 5830488
## projected CRS: ETRS89 / UTM zone 33N
## idPoly pop_total sumChargingStations geometry
## 1 1 659.475547 NA POLYGON ((372713.8 5830488,...
## 2 2 491.451880 NA POLYGON ((373713.8 5830488,...
## 3 3 194.851756 NA POLYGON ((374713.8 5830488,...
## 4 4 2.889551 NA POLYGON ((375713.8 5830488,...
## 5 5 386.571960 NA POLYGON ((376713.8 5830488,...
## 6 6 1120.683779 NA POLYGON ((377713.8 5830488,...
## idPoly pop_total sumChargingStations geometry
## Min. : 1 Min. : 0.229 Min. : 1.000 POLYGON :805
## 1st Qu.:202 1st Qu.: 826.406 1st Qu.: 1.000 epsg:NA : 0
## Median :403 Median : 2751.180 Median : 2.000 +proj=utm ...: 0
## Mean :403 Mean : 3999.399 Mean : 2.531
## 3rd Qu.:604 3rd Qu.: 5724.300 3rd Qu.: 3.000
## Max. :805 Max. :20881.894 Max. :36.000
## NA's :14 NA's :613
We have a lot of NA values which indicate - in our case here - that no charging points are present. So we should set them to zero.
For pop_total we have real missing values. For the regression analysis it is best to drop those rows.
The relationship looks different from the one from Hamburg. There are some interesting outliers.
##
## Call:
## lm(formula = sumChargingStations ~ pop_total, data = popChargingCleaned)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.352 -0.511 -0.058 0.201 33.707
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.407e-01 8.100e-02 -2.971 0.00306 **
## pop_total 2.138e-04 1.424e-05 15.016 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.62 on 789 degrees of freedom
## Multiple R-squared: 0.2223, Adjusted R-squared: 0.2213
## F-statistic: 225.5 on 1 and 789 DF, p-value: < 2.2e-16
So we have a signisifcant positive effect of population on the number of chargng points. For grid cells with 1000 inhabitants more we expect on average 2.132 more charging stations.
How does the regression line looks like?
ggplot(popChargingCleaned, mapping= aes(x=pop_total, y= sumChargingStations)) + geom_point(alpha=.25) + geom_smooth(method="lm") + labs(title="Charging stations Berlin", subtitle = "1sqm grid cells") + xlab("Population") + ylab("Number of charging stations")
## `geom_smooth()` using formula 'y ~ x'
How do the residulas look like?
Long story short: not normally distributed (thought not as bad as for Hamburg), a few grid cells with large cook’s distance (influential points).
Another option would be to fit a generalized linear model with a Poisson distribution and a log link:
poisModel <- glm(sumChargingStations ~ pop_total, data=popChargingCleaned, family=poisson)
summary(poisModel)
##
## Call:
## glm(formula = sumChargingStations ~ pop_total, family = poisson,
## data = popChargingCleaned)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.6867 -0.8051 -0.6393 -0.5709 12.0378
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.822e+00 8.582e-02 -21.23 <2e-16 ***
## pop_total 2.063e-04 7.513e-06 27.45 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1710.0 on 790 degrees of freedom
## Residual deviance: 1063.3 on 789 degrees of freedom
## AIC: 1565.2
##
## Number of Fisher Scoring iterations: 6
ggplot(popChargingCleaned, mapping= aes(x=pop_total, y= sumChargingStations)) + geom_point(alpha=.25) + geom_smooth(method="glm", method.args = list(family = "poisson")) + labs(title="Charging stations Berlin", subtitle = "1sqm grid cells, Poisson GLM") + xlab("Population") + ylab("Number of charging stations")
## `geom_smooth()` using formula 'y ~ x'
## [1] 3012.243
## [1] 1565.221
The Poisson model fits the data better, indicated by the smaller AIC. HWe might think about transforming the predictor, e.g. by a square root transformation.
poisModel <- glm(sumChargingStations ~ sqrt(pop_total), data=popChargingCleaned, family=poisson)
summary(poisModel)
##
## Call:
## glm(formula = sumChargingStations ~ sqrt(pop_total), family = poisson,
## data = popChargingCleaned)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1408 -0.8106 -0.5149 -0.3159 11.5776
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.282084 0.147313 -22.28 <2e-16 ***
## sqrt(pop_total) 0.037337 0.001482 25.20 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1710.01 on 790 degrees of freedom
## Residual deviance: 994.35 on 789 degrees of freedom
## AIC: 1496.2
##
## Number of Fisher Scoring iterations: 6
## [1] 1496.236
AIC value smaller -> better fit.
ggplot(popChargingCleaned, mapping= aes(x=pop_total, y= sumChargingStations)) + geom_point(alpha=.25) + geom_smooth(method="glm", formula= y ~ sqrt(x), method.args = list(family = "poisson")) + labs(title="Charging stations Berlin", subtitle = "1sqm grid cells, Poisson GLM") + xlab("Population") + ylab("Number of charging stations")
Define neighborhood as polygons sharing at least one point - queen neighborhood as the default.
## Neighbour list object:
## Number of regions: 791
## Number of nonzero links: 5908
## Percentage nonzero weights: 0.9442511
## Average number of links: 7.469027
No regions without links were reported!
Convert to sp objects. This is needed for some analysis. sp and sf are two packes designed to work with spatial data. sf is more recent and from my perspective easier to work with and faster. Sinc sp has been around for a longer time it has more packages accepting it as an input (and not sf objects). While a nuisance, it is not a big effort to convert from one representation to another and it is a lossless conversion.
Get the residuals to the polygon data set:
popChargingCleaned$residLm <- residuals(lmModel)
popChargingCleaned$residPoissonSqrt <- residuals(poisModel)
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum Unknown based on GRS80 ellipsoid in CRS definition
## Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum European Terrestrial Reference System 1989 in CRS
## definition
plot(popChargingCleanedSp, border="grey")
plot(pop.nb, coords=coordinates(popChargingCleanedSp)[,1:2], col="brown", add=TRUE, points=FALSE)
The neighbor list is symetric.
## [1] TRUE
Create spatial weight Matrix from the neighbors using row standardization
As an alternative we could use e.g. the 8 nearest neighbors
plot(popChargingCleanedSp, border="grey")
plot(pop.k8nb, coords=coordinates(popChargingCleanedSp), col="red", add=TRUE, cex=.7)
The neighbor list is not symetric.
## [1] FALSE
Create weighted list.
All census tracks have the same number of neighbors
par(mfrow=c(1,2))
hist(unlist(pop.lw$weights), las=1, xlab="weights", main="Continguity nb, W", col="blue")
hist(unlist(pop.k8lw$weights), las=1, xlab="weights", main="nn-8 nb, W", col="blue")
We could also calculate distances and inversely weight by distance before row-standardization.
dlist <- nbdists(pop.k8nb, coordinates(popChargingCleanedSp))
dlist <- lapply(dlist, function(x) 1/x)
pop.k8lwDist <- nb2listw(pop.k8nb, glist=dlist, style = "W")
hist(unlist(pop.k8lwDist$weights), las=1, xlab="weights", main="nn-8 nb, idw, W", col="blue")
The idw (inverse distance weighted) approach can be applied to the other neighborhood definitions as well.
Moran’s I is calculated based on a neighborhood definition (weighted list) as well as the definition of the number of zones (typically the number of features) and the global sum of weights. Convenience functions help to get the required parameter.
moran.test provides the according test statistics. Like usual, don’t get just excited about p-values but also look at the absolute numbers.
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = sumChargingStations ~ pop_total, data =
## popChargingCleaned)
## weights: pop.lw
##
## Moran I statistic standard deviate = 7.1768, p-value = 3.569e-13
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 0.1305577551 -0.0022004182 0.0003421887
Significant but low spatial autocorrelation.
##
## Global Moran I for regression residuals
##
## data:
## model: glm(formula = sumChargingStations ~ sqrt(pop_total), family =
## poisson, data = popChargingCleaned)
## weights: pop.lw
##
## Moran I statistic standard deviate = 13.861, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 0.2710026154 -0.0072753282 0.0004030688
This indicates strong positive spatial autocorrelation for both the linear model as well as the GLM. However, the absolute size for the linear model is relatively week.
A correlogram basically calculates Moran’s for different distance classes. We use the function correlog from package ncf for this purpose. The method works on distance and does not consider a weight matrix W.
increment specifies the size of the bins (distance bands) for the calculation and is measured in the coordinates of the spatial data (meters for our data set).
resamp allows the specification of the number of iterations for a permutation test.
The results contain the value of Moran’s I in slot correlation, the average of the distances for eac distance class in * mean.of.class$, the number of pairs within each distance class in n, the p-value (if resamp > 0) in p.
Here I plot the first 20 distance bands and reflecting a scaled version of the number of pairs in the distance band for the point size .
correlogResidLm <- ncf::correlog(coordinates(popChargingCleanedSp)[,1], coordinates(popChargingCleanedSp)[,2], popChargingCleanedSp$residLm,
na.rm=TRUE, increment=500, resamp=0)
par(mar=c(5,7, 4.1, 0.1))
plot(correlogResidLm$correlation[1:20] ~ correlogResidLm$mean.of.class[1:20] , type="b", pch=16, cex=sqrt(correlogResidLm$n)/70, lwd=1.5, xlab="distance ", ylab="", cex.lab=1, cex.axis=1, las=1, main="Linear model, Berlin")
abline(h=0)
mtext(text = "Moran's I", side = 2, line = 5, cex=1)
We see how spatial autocorrelation decreases with distance between grid cells.
correlogResidPoisSqrt <- ncf::correlog(coordinates(popChargingCleanedSp)[,1], coordinates(popChargingCleanedSp)[,2], popChargingCleanedSp$residPoissonSqrt,
na.rm=TRUE, increment=500, resamp=0)
par(mar=c(5,7, 4.1, 0.1))
plot(correlogResidPoisSqrt$correlation[1:20] ~ correlogResidPoisSqrt$mean.of.class[1:20] , type="b", pch=16, cex=sqrt(correlogResidPoisSqrt$n)/70, lwd=1.5, xlab="distance ", ylab="", cex.lab=1, cex.axis=1, las=1, main="Poisson model sqrt trnaformation, Berlin")
abline(h=0)
mtext(text = "Moran's I", side = 2, line = 5, cex=1)
First we source a script file that helps us with plotting.
## Loading required package: maptools
## Checking rgeos availability: TRUE
mylocalMI(plotvar= popChargingCleanedSp$residLm, nb = pop.lw, shape = popChargingCleanedSp, main = "LISA on residuals of linear model, Berlin", verbose = FALSE, zero.policy = TRUE)
Clearly, there is a cluster of high residuals in the center and two small clusters with low residuals. surrounded by cells with low residuals.
## tmap mode set to interactive viewing
## Warning in sp::proj4string(obj): CRS object has comment, which is lost in output
So for sure we have a problem with spatial autocorrelation. Let’s look at different possibilities how to deal with it.
We fit a spatial lag model that has the form:
\(y = \rho W y + X \beta + \epsilon\)
The operation is computational intense so you potentially can get a cup of coffee while you wait for the resuls. Setting method=“SE_classic” is much faster by an approximation used and might be fine for our case.
lagModel2 <- lagsarlm(sumChargingStations ~ pop_total, data=popChargingCleaned, listw=pop.lw, zero.policy=TRUE, method="SE_classic")
summary(lagModel2)
##
## Call:
## lagsarlm(formula = sumChargingStations ~ pop_total, data = popChargingCleaned,
## listw = pop.lw, method = "SE_classic", zero.policy = TRUE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.61518 -0.45280 -0.07562 0.17764 33.44959
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.0726e-01 7.7910e-02 -2.6602 0.007809
## pop_total 1.5194e-04 1.6197e-05 9.3803 < 2.2e-16
##
## Rho: 0.348, LR test value: 45.56, p-value: 1.4805e-11
## Asymptotic standard error: 0.052755
## z-value: 6.5966, p-value: 4.2071e-11
## Wald statistic: 43.515, p-value: 4.2071e-11
##
## Log likelihood: -1480.341 for lag model
## ML residual variance (sigma squared): 2.4272, (sigma: 1.558)
## Number of observations: 791
## Number of parameters estimated: 4
## AIC: 2968.7, (AIC for lm: 3012.2)
## LM test for residual autocorrelation
## test value: 14.021, p-value: 0.00018079
This would be the more intense computaion - which lead for the data at hands to very similar results.
lagModel <- lagsarlm(sumChargingStations ~ pop_total, data=popChargingCleaned, listw=pop.lw, zero.policy=TRUE)
summary(lagModel)
Interpretation of the regression coefficients is not straight forward. We need to distinguish between the direct, the indicrect (via the lagged response) and the total effect The total effect of a unit change in one of the predictors is: \(\hat{\beta}/ (1- \hat{\rho})\)
Using the full weight matrix is computational intense for a weight matrix of our size:
We will speed up computation by proving a vector of traces of powers of the spatial weights matrix created using trW, for approximate impact measures
## Impact measures (lag, trace):
## Direct Indirect Total
## pop_total 0.0001549312 7.81008e-05 0.000233032
The see that the total effect is nearly 1.5 times as big as the direct effects.
We now fit a lm model augmented with the spatially lagged RHS variables. The idea is to use a lagged version of the predictors in addition to the predictors.
slxModel <- lmSLX(sumChargingStations ~ pop_total, data=popChargingCleaned, listw=pop.lw, zero.policy=TRUE)
summary(slxModel)
##
## Call:
## lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))),
## data = as.data.frame(x), weights = weights)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.724 -0.581 -0.061 0.297 33.497
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.053e-01 9.079e-02 -4.464 9.2e-06 ***
## pop_total 1.016e-04 3.216e-05 3.158 0.001648 **
## lag.pop_total 1.525e-04 3.925e-05 3.885 0.000111 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.606 on 788 degrees of freedom
## Multiple R-squared: 0.2369, Adjusted R-squared: 0.2349
## F-statistic: 122.3 on 2 and 788 DF, p-value: < 2.2e-16
## Impact measures (SLX, estimable):
## Direct Indirect Total
## pop_total 0.0001015569 0.0001524842 0.0002540411
popChargingCleaned$residLag <- residuals(lagModel2)
popChargingCleaned$residSlx <- residuals(slxModel)
## tmap mode set to interactive viewing
tm_shape(popChargingCleaned) + tm_fill(col="residLag", palette = "-RdBu", midpoint=0, alpha=0.5) + tm_layout(title="Residuals spatial lag model, Berlin")
## tmap mode set to interactive viewing
If we want to use the count regression approach (which is more suitable), the spatial eigenvector mapping approach is the tool at hands. Unfortunately, it is again computational intense.
sevmPoiss <- ME(sumChargingStations ~ sqrt(pop_total), data=popChargingCleaned, family=poisson, zero.policy = TRUE, listw=pop.lw)
print(sevmPoiss)
## Eigenvector ZI pr(ZI)
## 0 NA NA 0.01
## 1 10 NA 0.02
## 2 28 NA 0.39
sevmPoissModel <- glm(sumChargingStations ~ sqrt(pop_total) + fitted(sevmPoiss), data=popChargingCleaned, family=poisson)
summary(sevmPoissModel)
##
## Call:
## glm(formula = sumChargingStations ~ sqrt(pop_total) + fitted(sevmPoiss),
## family = poisson, data = popChargingCleaned)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.0407 -0.7163 -0.5142 -0.3293 10.0131
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.884923 0.143510 -20.103 < 2e-16 ***
## sqrt(pop_total) 0.028309 0.001569 18.044 < 2e-16 ***
## fitted(sevmPoiss)vec10 16.900378 1.480986 11.412 < 2e-16 ***
## fitted(sevmPoiss)vec28 4.715084 1.175176 4.012 6.01e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1710.01 on 790 degrees of freedom
## Residual deviance: 836.43 on 787 degrees of freedom
## AIC: 1342.3
##
## Number of Fisher Scoring iterations: 6
Presumably we have missed a lot of other predictors that are now partially absorbed into different eigenvectors. It might be worth to plot and investigate the eigenvectors that made it into the model. Therefore, we attach the selected eigenvectors to the sf object and plot them.
These eigenvectors reval intresting pattern which should be investigated to explore further hypothesis.
## tmap mode set to interactive viewing
tm_shape(popChargingCleanedSelEV) + tm_fill(col="vec28", alpha=.5, palette = "-RdBu", midpoint=0) + tm_shape(charging_stations) + tm_dots()
## tmap mode set to interactive viewing