A helper function that simplifies the communication with the API. It handles the processing of the returned data and ensures that a properly formatted data.frame is returned.

```
# requires RCurl (postForm), RJSONIO (fromJSON) and readr (parse_datetime)
getOsomeStat <- function(uri, valueFieldName = "value", ...)
{
  # send the POST request and parse the JSON response
  results <- postForm(uri, binary = FALSE, .encoding = "utf-8", ...)
  resultList <- RJSONIO::fromJSON(results, simplify = TRUE)
  resultsDf <- data.frame(do.call("rbind", resultList$result))
  # make sure the right data types are used
  # for users we have fromTimestamp and toTimestamp fields, not timestamp
  if (length(grep(x = names(resultsDf), pattern = "timestamp")) > 0)
  {
    resultsDf$timestamp <- parse_datetime(as.character(resultsDf$timestamp))
  }
  if (length(grep(x = names(resultsDf), pattern = "fromTimestamp")) > 0)
  {
    resultsDf$fromTimestamp <- parse_datetime(as.character(resultsDf$fromTimestamp))
  }
  if (length(grep(x = names(resultsDf), pattern = "toTimestamp")) > 0)
  {
    resultsDf$toTimestamp <- parse_datetime(as.character(resultsDf$toTimestamp))
  }
  # convert the value field to numeric and rename it
  resultsDf$value <- as.numeric(as.character(resultsDf$value))
  idxValueField <- which(names(resultsDf) == "value")
  names(resultsDf)[idxValueField] <- valueFieldName
  return(resultsDf)
}
```

To estimate how complete the OSM data are, we can look at the historical development of OSM contributions.

```
## Reading layer `limaCallao' from data source `/home/slautenb/Documents/lehre/HD/ws_2020_21/heikaLab/R4UrbanDataLab_2020/data_day5/lima.gpkg' using driver `GPKG'
## Simple feature collection with 2 features and 7 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -77.25156 ymin: -12.51993 xmax: -76.62082 ymax: -11.57244
## geographic CRS: WGS 84
```
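The queries below use a bounding-box string (`limaBboxStr`) and a time parameter (`theTime`) that are not defined in this section. A minimal sketch of how they could be built from the bbox printed above; the exact coordinates and the monthly ISO-8601 interval are assumptions:

```r
# ASSUMPTION: coordinates copied from the st_read() bbox output above
bboxLima <- c(xmin = -77.25156, ymin = -12.51993, xmax = -76.62082, ymax = -11.57244)
# the ohsome API expects bboxes as "lon1,lat1,lon2,lat2"
limaBboxStr <- paste(bboxLima, collapse = ",")
# ASSUMPTION: monthly snapshots in ISO-8601 interval notation
theTime <- "2008-01-01/2019-12-01/P1M"
limaBboxStr  # "-77.25156,-12.51993,-76.62082,-11.57244"
```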

For highways, we query the development of road length over time:

`resHighwayTotal <- getOsomeStat(uri = "https://api.ohsome.org/v1/elements/length/", bboxes = limaBboxStr, filter = "highway=* and type:way", time = theTime, valueFieldName = "highway_length_total")`

Let's also retrieve separate information for primary, secondary, and tertiary roads, as well as for residential roads:

```
resHighwayPrimary <- getOsomeStat(uri = "https://api.ohsome.org/v1/elements/length/", bboxes = limaBboxStr, filter = "highway=primary and type:way", time = theTime, valueFieldName = "highwayPrimary_length_total")
resHighwaySecondary <- getOsomeStat(uri = "https://api.ohsome.org/v1/elements/length/", bboxes = limaBboxStr, filter = "highway=secondary and type:way", time = theTime, valueFieldName = "highwaySecondary_length_total")
resHighwayTertiary <- getOsomeStat(uri = "https://api.ohsome.org/v1/elements/length/", bboxes = limaBboxStr, filter = "highway=tertiary and type:way", time = theTime, valueFieldName = "highwayTertiary_length_total")
resHighwayResidential <- getOsomeStat(uri = "https://api.ohsome.org/v1/elements/length/", bboxes = limaBboxStr, filter = "highway=residential and type:way", time = theTime, valueFieldName = "highwayResidential_length_total")
```

Combine the data.frames. They all have the same row order since the time steps are exactly the same:

```
resHighwayTs <- cbind(resHighwayTotal,
                      resHighwayPrimary$highwayPrimary_length_total,
                      resHighwaySecondary$highwaySecondary_length_total,
                      resHighwayTertiary$highwayTertiary_length_total,
                      resHighwayResidential$highwayResidential_length_total)
names(resHighwayTs) <- c("timestamp", "total_highway_length", "primary_highway_length",
                         "secondary_highway_length", "tertiary_highway_length",
                         "residential_highway_length")
```
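The `cbind` above relies on all result data.frames having exactly the same row order. A more defensive variant joins on the timestamp instead; a sketch with toy data, since the API results are not reproduced here:

```r
# toy stand-ins for two of the API results
a <- data.frame(timestamp = as.Date(c("2012-01-01", "2012-02-01")), total = c(10, 12))
b <- data.frame(timestamp = as.Date(c("2012-02-01", "2012-01-01")), primary = c(5, 4))
# merge() matches rows by timestamp, so the row order of b does not matter
ab <- merge(a, b, by = "timestamp")
ab$primary  # 4 5 -- aligned with the timestamps, not the original row order
```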

For routing applications it is important that the road network is relatively complete. Therefore, we add up the four main road types (excluding motorways and trunk roads for now).
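The summed column (referenced below as `roadLength4cls`) is not shown in the original chunk; a sketch with a toy data.frame whose column names match those used above:

```r
# toy stand-in; the real resHighwayTs comes from the cbind above
resHighwayTs <- data.frame(primary_highway_length     = c(100, 110),
                           secondary_highway_length   = c(50, 55),
                           tertiary_highway_length    = c(30, 33),
                           residential_highway_length = c(200, 220))
cls4 <- c("primary_highway_length", "secondary_highway_length",
          "tertiary_highway_length", "residential_highway_length")
# sum the four main road classes per time step
resHighwayTs$roadLength4cls <- rowSums(resHighwayTs[, cls4])
resHighwayTs$roadLength4cls  # 380 418
```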

First, we change the table from the wide format to the long format, since this dramatically simplifies handling in ggplot.

```
resHighwayTs_long <- pivot_longer(resHighwayTs, 2:7, names_to = "type", values_to = "length" )
resHighwayTs_long$length <- resHighwayTs_long$length / 1000 # convert to km
```

Reorder the factor levels for nicer plotting:

`resHighwayTs_long$type <- factor(resHighwayTs_long$type, levels = c("total_highway_length", "primary_highway_length", "secondary_highway_length", "tertiary_highway_length", "residential_highway_length", "roadLength4cls"))`

`ggplot(resHighwayTs_long, mapping=aes(x=timestamp, y=length)) + geom_area() + facet_wrap(facets = vars(type), scales="free_y") + xlab("") + ylab("Length [km]") + labs(title = "OSM contributions Lima", subtitle = "Highway")`

We see that the total length of the road network continues to increase. However, the slope has decreased: the process may not have levelled off yet, but it seems to have started to. We also observe a strong jump in total highway length around 2012, which was presumably caused by an import.

We also observe that the road length for tertiary highways dropped drastically in between, and that primary highways decreased between 2012 and 2015. Potentially this is due to a reclassification of highways. This might become clearer if we stack the different components.

```
resHighwayTs_long %>%
  filter(type != "total_highway_length" & type != "roadLength4cls") %>%
  ggplot(mapping = aes(x = timestamp, y = length, fill = type)) +
  geom_area(position = "stack", show.legend = TRUE, alpha = .7) +
  scale_fill_discrete(name = "Highway class",
                      labels = c("Primary", "Secondary", "Tertiary", "Residential")) +
  xlab("") + ylab("Length [km]") +
  labs(title = "OSM contributions Lima", subtitle = "Highway")
```

The potential import in 2012 mainly affected residential roads, and in 2013 we see another strong increase in the length of residential highways. The drop in tertiary highways in 2014 seems to be a reclassification towards secondary highways.

We can try to fit a saturation curve (a logistic curve in this example) to the data and assess how well it describes them.

We use a parametrization of the logistic function for which a self-starter function exists:

\(y = \frac{Asym} {1+e^{\frac{x_{mid} - x}{scal}}}\)

- \(Asym\) is the asymptotic value the function converges to
- \(x_{mid}\) is the x-value (here time, in seconds since 1970-01-01) at which the curve has its turning point (half-saturation point)
- \(scal\) is a scale parameter that is inversely related to the slope of the curve
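A quick numeric check of the half-saturation property: at \(x = x_{mid}\) the exponent is zero, so the curve is at exactly half of \(Asym\). The parameter values below are arbitrary:

```r
logisCurve <- function(x, Asym, xmid, scal) Asym / (1 + exp((xmid - x) / scal))
# at x = xmid the denominator is 1 + exp(0) = 2
logisCurve(x = 5, Asym = 1000, xmid = 5, scal = 2)  # 500
```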

For the fitting of non-linear functions we need to provide suitable start values for all parameters to be estimated; otherwise the fit might not converge. For some commonly used functions R provides self-starter functions that estimate suitable start values from the data.
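For example, `stats::getInitial` can be called directly to inspect the start values that `SSlogis` would derive. Shown here on synthetic logistic data, since the OSM data are not reproduced in this snippet:

```r
set.seed(1)
x <- seq(-5, 5, by = 0.25)
# logistic curve with Asym = 10, xmid = 1, scal = 1.5, plus a little noise
y <- 10 / (1 + exp((1 - x) / 1.5)) + rnorm(length(x), sd = 0.05)
d <- data.frame(x, y)
# the self-starter estimates Asym, xmid and scal from the data
ini <- getInitial(y ~ SSlogis(x, Asym, xmid, scal), data = d)
round(ini, 2)
```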

```
g <- nls(length ~ SSlogis(input=as.numeric(timestamp), Asym, xmid, scal), data = resHighwayTs_long, subset= type == "total_highway_length")
summary(g)
```

```
##
## Formula: length ~ SSlogis(input = as.numeric(timestamp), Asym, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 2.154e+04 4.391e+02 49.06 <2e-16 ***
## xmid 1.350e+09 4.526e+06 298.26 <2e-16 ***
## scal 6.767e+07 3.697e+06 18.30 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1594 on 146 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 1.542e-06
```

```
resHighwayTs_long %>%
  filter(type == "total_highway_length") %>%
  ggplot(mapping = aes(x = timestamp, y = length, fill = type)) +
  geom_area() +
  xlab("") + ylab("Length [km]") +
  labs(title = "OSM contributions Lima", subtitle = "Highway") +
  geom_smooth(method = "nls", formula = y ~ SSlogis(input = x, Asym, xmid, scal),
              se = FALSE, col = "black", lty = 2, size = 1)
```

The single saturation curve does not fit too well. A double logistic curve will presumably fit the data better.

The double logistic curve is the sum of two logistic curves, whereby the second curve is close to zero until a threshold. It can be expressed as follows:

\[ y= \frac{a_1} {1+ e^{k_1*( b_1- x)}} + \frac{a_2-a_1} {1+ e^{k_2*( b_2- x)}} \]

- \(a_1\) represents the saturation value of the first logistic curve (the y-value at which the first curve saturates)
- \(a_2\) is the saturation value of the combined curve; \(a_2 - a_1\) is the contribution of the second curve
- \(b_1\) represents the date (in the internal representation, i.e. seconds since 1970-01-01) at which the first curve reaches its turning point
- \(b_2\) represents the date at which the second curve reaches its turning point
- \(k_1\) and \(k_2\) describe the steepness of the first and the second curve
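A quick numeric check of the combined asymptote with arbitrary parameter values: for \(x\) far beyond both turning points, both logistic terms saturate and the curve approaches \(a_2\):

```r
dblLogis <- function(x, a1, a2, b1, b2, k1, k2) {
  a1 / (1 + exp(k1 * (b1 - x))) + (a2 - a1) / (1 + exp(k2 * (b2 - x)))
}
# far beyond both turning points: a1 + (a2 - a1) = a2
dblLogis(x = 1e6, a1 = 4000, a2 = 25000, b1 = 100, b2 = 500, k1 = 0.1, k2 = 0.1)  # 25000
```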

For the double logistic curve there is no self-starter function, which implies that we have to supply start values ourselves.

```
doubleLog <- function(x, a1, a2, b1, b2, k1, k2)
{
  y <- a1 / (1 + exp(k1 * (b1 - x))) + (a2 - a1) / (1 + exp(k2 * (b2 - x)))
  res <- data.frame(y, x = as.Date.POSIXct(x))
  return(res)
}
doubleLog(x = 1525125600, a1 = 6000, a2 = 2500, k1 = 10^6, k2 = 10^6,
          b1 = as.numeric(as.POSIXct("2015-01-01")), b2 = as.numeric(as.POSIXct("2018-05-01")))
```

```
## # A tibble: 1 x 2
## y x
## <dbl> <date>
## 1 4250 2018-04-30
```

Let's plot the function for different parameter values. First varying \(a_1\) and \(a_2\):

```
xstart <- as.numeric(as.POSIXct("2008-01-01"))
xend <- as.numeric(as.POSIXct("2019-12-01"))
params <- crossing(a1 = c(4000, 10000), a2 = c(25000, 20000), k1 = 10^-8, k2 = 10^-8,
                   b1 = as.numeric(as.POSIXct("2015-01-01")),
                   b2 = as.numeric(as.POSIXct("2018-05-01")))
params %>%
  dplyr::mutate(res = purrr::pmap(., .f = doubleLog, x = seq(xstart, xend, by = 10^4))) %>%
  tidyr::unnest(cols = c(res)) %>%
  ggplot(aes(x = x, y = y, colour = factor(a1))) +
  facet_wrap(~a2, ncol = 2) + geom_line()
```

Changing the slope coefficients:

```
params <- crossing(a1 = 4000, a2 = 25000, k1 = c((1:3) * 10^-8), k2 = (1:3) * 10^-8,
                   b1 = as.numeric(as.POSIXct("2015-01-01")),
                   b2 = as.numeric(as.POSIXct("2018-05-01")))
params %>%
  dplyr::mutate(res = purrr::pmap(., .f = doubleLog, x = seq(xstart, xend, by = 10^4))) %>%
  tidyr::unnest(cols = c(res)) %>%
  ggplot(aes(x = x, y = y, colour = factor(k1))) +
  facet_wrap(~k2, ncol = 2) + geom_line()
```
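Putting this together: fitting the double logistic with `nls` requires explicit, hand-picked start values. A sketch on synthetic data (the real fit would use `resHighwayTs_long`; the start values below are assumptions in the right order of magnitude):

```r
dbl <- function(x, a1, a2, b1, b2, k1, k2) {
  a1 / (1 + exp(k1 * (b1 - x))) + (a2 - a1) / (1 + exp(k2 * (b2 - x)))
}
set.seed(42)
x <- seq(0, 100, by = 1)
# synthetic double-logistic data with known parameters plus noise
y <- dbl(x, a1 = 4000, a2 = 25000, b1 = 30, b2 = 70, k1 = 0.2, k2 = 0.15) +
  rnorm(length(x), sd = 100)
# no self-starter available, so start values must be supplied explicitly
fit <- nls(y ~ dbl(x, a1, a2, b1, b2, k1, k2),
           start = list(a1 = 3000, a2 = 20000, b1 = 25, b2 = 65, k1 = 0.1, k2 = 0.1))
round(coef(fit))
```

If the start values are too far off, `nls` fails with a convergence error, which is why plotting the curve for candidate parameters (as above) is a useful first step.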