Take-home Exercise 5

Visualising and analysing social areas and traffic bottlenecks in the city of Engagement, Ohio, USA.

Huan Li (https://linkedin.com/in/huan-li-ab7498124/), SMU, SCIS, Master of IT in Business (https://scis.smu.edu.sg/master-it-business/about-mitb-main)
05-30-2022

1. Overview

Using the VAST Challenge 2022 dataset, we will explore and characterize the distinct areas of the city, characterize the travel patterns to identify potential bottlenecks or hazards, and examine how these patterns change over time. The work was carried out in RStudio, and the main packages used are sf, tmap and tidyverse.

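Questions to be addressed are: what are the distinct social areas of the city, where are its busiest areas, and are there traffic bottlenecks that should be addressed?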

2. Data Preparation

2.1 Installing libraries

Before we get started, we need to ensure that the required R packages are installed. If they are, we load them. If they are not, we install them and then load them into the R environment.

sf is an R package designed to handle geospatial data as simple feature objects.

The code chunk below will do the trick.

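packages = c('sf','tmap','tidyverse','clock',
             'lubridate','sftime','rmarkdown')
for (p in packages){
  # require() returns FALSE if the package cannot be loaded
  if(!require(p, character.only = TRUE)){
    install.packages(p)
    # load the freshly installed package
    library(p, character.only = TRUE)
  }
}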

2.2 Importing wkt data

Well-known text (WKT) is a human readable representation for spatial objects like points, lines, or enclosed areas on a map.
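
To make the WKT format concrete, here is a minimal sketch that parses two WKT strings with st_as_sfc() of sf package and converts them back to text. The coordinates are made up for illustration and are not taken from the challenge data.

```r
library(sf)

# Hypothetical WKT strings (illustrative only, not from the dataset)
wkt <- c("POINT (350 4595)",
         "POLYGON ((0 0, 100 0, 100 100, 0 100, 0 0))")

# st_as_sfc() parses WKT text into a simple feature geometry column
geoms <- st_as_sfc(wkt)

# Inspect the geometry types, then convert the geometries back to WKT text
st_geometry_type(geoms)
st_as_text(geoms)
```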

In this step, we import geospatial data in wkt format into R and save the imported data as simple feature objects by using sf package.

In the code chunk below, read_sf() of sf package is used to parse Schools.csv, Apartments.csv, Buildings.csv, Employers.csv, Jobs.csv, Participants.csv, Pubs.csv, and Restaurants.csv into R, with the location column as the candidate geometry column.

schools <- read_sf("data/Schools.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")

apartments <- read_sf("data/Apartments.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")

buildings <- read_sf("data/Buildings.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")

employers <- read_sf("data/Employers.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")

jobs <- read_sf("data/Jobs.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")

participants <- read_sf("data/Participants.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")

pubs <- read_sf("data/Pubs.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")

restaurants <- read_sf("data/Restaurants.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
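
As a rough illustration of what the GEOM_POSSIBLE_NAMES option does, the sketch below writes a tiny CSV with a WKT location column to a temporary file and reads it back with read_sf(). The file content and the pubId values are hypothetical and only mimic the layout of the challenge files.

```r
library(sf)

# Hypothetical miniature file mimicking the layout of the challenge CSVs
tmp <- tempfile(fileext = ".csv")
writeLines(c("pubId,location",
             "1,POINT (10 20)",
             "2,POINT (30 40)"), tmp)

# Tell the CSV driver that the "location" column may hold WKT geometry
toy_pubs <- read_sf(tmp, options = "GEOM_POSSIBLE_NAMES=location")

st_geometry_type(toy_pubs)  # geometry type of each feature
st_crs(toy_pubs)            # a plain CSV carries no coordinate reference system
```

The missing coordinate reference system is consistent with the "CRS: NA" line in the printed output of the imported data.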

It is always a good practice to examine the imported data frame before further analysis is performed.

Let’s take a look at the datasets.

print(buildings)
Simple feature collection with 1042 features and 4 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -4762.191 ymin: -30.08359 xmax: 2650 ymax: 7850.037
CRS:           NA
# A tibble: 1,042 x 5
   buildingId                       location buildingType maxOccupancy
   <chr>                           <POLYGON> <chr>        <chr>       
 1 1          ((350.0639 4595.666, 390.0633~ Commercial   ""          
 2 2          ((-1926.973 2725.611, -1948.1~ Residental   "12"        
 3 3          ((685.6846 1552.131, 645.9985~ Commercial   ""          
 4 4          ((-976.7845 4542.382, -1053.2~ Commercial   ""          
 5 5          ((1259.306 3572.727, 1299.255~ Residental   "2"         
 6 6          ((478.8969 1082.484, 473.6596~ Commercial   ""          
 7 7          ((-1920.823 615.7447, -1960.8~ Residental   ""          
 8 8          ((-3302.657 5394.354, -3301.5~ Commercial   ""          
 9 9          ((-600.5789 4429.228, -495.95~ Commercial   ""          
10 10         ((-68.75908 5379.924, -28.782~ Residental   "5"         
# ... with 1,032 more rows, and 1 more variable: units <chr>
print(apartments)
Simple feature collection with 1517 features and 5 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -4616.828 ymin: 22.16098 xmax: 2488.067 ymax: 7829.905
CRS:           NA
# A tibble: 1,517 x 6
   apartmentId rentalCost maxOccupancy numberOfRooms
   <chr>       <chr>      <chr>        <chr>        
 1 1           768.16     2            4            
 2 2           1014.55    2            1            
 3 3           1057.39    4            3            
 4 4           1259.1     4            3            
 5 5           411.5      1            4            
 6 6           859.58     3            2            
 7 7           982.11     3            4            
 8 8           980.05     4            1            
 9 9           433.45     1            3            
10 10          1104.33    3            4            
# ... with 1,507 more rows, and 2 more variables: location <POINT>,
#   buildingId <chr>

2.3 Data Wrangling

logs_selected <- read_rds("data/rds/logs_selected.rds")
print(logs_selected)
Simple feature collection with 244271 features and 13 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -4616.828 ymin: 35.4377 xmax: 2630 ymax: 7836.546
CRS:           NA
# A tibble: 244,271 x 14
   timestamp            currentLocation participantId currentMode
 * <chr>                        <POINT> <chr>         <chr>      
 1 2022-03-01T05:~  (-1613.47 1032.372) 651           Transport  
 2 2022-03-01T05:~  (-4586.943 7246.79) 683           Transport  
 3 2022-03-01T05:~ (-4583.826 7612.941) 728           Transport  
 4 2022-03-01T05:~ (-1318.575 1217.017) 651           Transport  
 5 2022-03-01T05:~ (-4219.829 7379.372) 683           Transport  
 6 2022-03-01T05:~ (-4317.212 7382.659) 728           Transport  
 7 2022-03-01T05:~  (-1171.728 1502.45) 651           Transport  
 8 2022-03-01T05:~  (-4186.85 6980.755) 683           Transport  
 9 2022-03-01T05:~ (-4199.135 7076.865) 728           Transport  
10 2022-03-01T05:~ (-3581.037 7172.023) 619           Transport  
# ... with 244,261 more rows, and 10 more variables:
#   hungerStatus <chr>, sleepStatus <chr>, apartmentId <chr>,
#   availableBalance <chr>, jobId <chr>, financialStatus <chr>,
#   dailyFoodBudget <chr>, weeklyExtraBudget <chr>, Timestamp <dttm>,
#   day <int>

3. Visualisations and Insights

3.1 Distinct Social Areas

Characterize the distinct social areas of the city of Engagement, Ohio USA.

3.1.1 Building Types Map

buildingType <- tm_shape(buildings)+
tm_polygons(col = "buildingType",
           palette="Accent",
           border.col = "black",
           border.alpha = .5,
           border.lwd = 0.5)+
tm_layout(main.title = "Building Types Map",
          main.title.position = "center",
          main.title.size = 1,
          frame = FALSE)+
tm_compass(size = 2,
           position = c('right', 'top'))

buildingType

Insights

3.1.2 Facility Map

label <- c('Restaurant', 'Pub', 'Employer', 'Apartment', 'School')
color <- c('blue', 'green', "red", 'purple', 'yellow')

facilitiesMap <- tm_shape(buildings)+
tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1) +
tm_shape(pubs) +
  tm_dots(col = "green", size = 0.3, alpha= 0.8) +
tm_shape(restaurants) +
  tm_dots(col = "blue", size = 0.3, alpha= 0.8) +
tm_shape(schools) +
  tm_dots(col = "yellow", size = 0.3, alpha= 0.8)+
tm_shape(employers) +
  tm_dots(col = "red") +
tm_shape(apartments) +
  tm_dots(col = "purple") +
tm_add_legend(title = 'Facilities',
              type = 'symbol',
              border.col = NA,
              labels = label,
              col = color) +
tm_layout(main.title = 'Facilities Map of Engagement City, Ohio USA',
          main.title.size = 1,
          frame = FALSE) +
tm_compass(size = 2,
           position = c('right', 'top'))+
tm_credits('Source: VAST Challenge 2022')

facilitiesMap

Insights

tmap_arrange(buildingType, facilitiesMap, widths = c(1))

Insights

3.2 Traffic Situation

Where are the busiest areas in Engagement? Are there traffic bottlenecks that should be addressed?

3.2.1 General Traffic Situation

Computing the hexagons

In the code chunk below, st_make_grid() of sf package is used to create hexagons.

hex <- st_make_grid(buildings, 
                    cellsize=100, 
                    square=FALSE) %>%
  st_sf() %>%
  rowid_to_column('hex_id')
plot(hex)
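
The cellsize argument controls how fine the hexagon grid is. The sketch below, which uses a hypothetical 1000 x 1000 square study area rather than the buildings layer, compares the number of cells produced by two cell sizes.

```r
library(sf)

# Hypothetical square study area with no CRS, like the source data
area <- st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 1000, ymax = 1000)))

# square = FALSE requests hexagonal cells instead of square ones
coarse <- st_make_grid(area, cellsize = 200, square = FALSE)
fine   <- st_make_grid(area, cellsize = 100, square = FALSE)

# A smaller cellsize yields more, smaller hexagons
length(coarse)
length(fine)
```

A finer grid shows more local detail but leaves fewer points per cell, so the choice of cellsize is a trade-off that is worth checking against the density of the movement logs.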

Performing point in polygon count

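In the code chunk below, st_join() of sf package is used to assign each event point to the hexagon that contains it, and count() of dplyr package is then used to count the number of event points in each hexagon.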

points_in_hex <- st_join(logs_selected, 
                        hex, 
                        join=st_within) %>%
  st_set_geometry(NULL) %>%
  count(name='pointCount', hex_id)
head(points_in_hex)
# A tibble: 6 x 2
  hex_id pointCount
   <int>      <int>
1    169         35
2    212         56
3    225         21
4    226         94
5    227         22
6    228         45
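
The same join-then-count pattern can be checked on a toy example where the answer is known in advance. In this sketch the two square cells, the three points, and the cell_id column are all hypothetical.

```r
library(sf)
library(dplyr)

# Two hypothetical unit-square cells side by side
cells <- st_sf(
  cell_id = 1:2,
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
    st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
  )
)

# Three hypothetical points: two inside the first cell, one inside the second
pts <- st_sf(
  geometry = st_sfc(st_point(c(0.2, 0.5)),
                    st_point(c(0.7, 0.5)),
                    st_point(c(1.5, 0.5)))
)

# Attach the containing cell to each point, drop geometry, count per cell
counts <- st_join(pts, cells, join = st_within) %>%
  st_set_geometry(NULL) %>%
  count(cell_id, name = "pointCount")
counts
```

Note that cells containing no points do not appear in the result at all, which is why a join back to the full grid is needed.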

Performing relational join

In the code chunk below, left_join() of dplyr package is used to perform a left-join by using hex as the target table and points_in_hex as the join table. The join ID is hex_id.

hex_combined <- hex %>%
  left_join(points_in_hex, 
            by = 'hex_id') %>%
  replace(is.na(.), 0)
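
The effect of the left join can be seen on plain tibbles without geometry. The sketch below uses hypothetical ids and counts, and swaps in replace_na() of tidyr package on the single count column as an alternative to replace(is.na(.), 0), so that only pointCount is touched.

```r
library(dplyr)
library(tidyr)

# Hypothetical grid of four cells, of which only two received points
cells  <- tibble(hex_id = 1:4)
counts <- tibble(hex_id = c(2L, 4L), pointCount = c(56L, 94L))

combined <- cells %>%
  left_join(counts, by = "hex_id") %>%            # unmatched cells get NA
  mutate(pointCount = replace_na(pointCount, 0L)) # turn NA into a zero count
combined
```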

Plotting the hexagon binning map

In the code chunk below, tmap package is used to create the hexagon binning map.

traffic <- tm_shape(hex_combined %>%
                      filter(pointCount > 0))+
  tm_fill("pointCount",
          n = 8,
          style = "quantile") +
  tm_borders(alpha = 0.1)+
  tm_layout(main.title = 'Traffic of Engagement City, Ohio USA',
            main.title.size = 1,
            frame = FALSE)
traffic
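
The idea behind style = "quantile" with n = 8 is to choose class breaks so that each colour class holds roughly the same number of hexagons. The base R sketch below illustrates that idea on hypothetical counts. It is not a reproduction of how tmap computes its breaks internally.

```r
# Hypothetical, strongly skewed point counts for 16 hexagons
counts <- c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597)

# Nine break points delimit eight quantile classes
breaks <- quantile(counts, probs = seq(0, 1, length.out = 9))

# Assign each count to its class and tabulate the class sizes
bins <- cut(counts, breaks = breaks, include.lowest = TRUE)
table(bins)
```

Because traffic counts tend to be heavily skewed, quantile classes keep a few very busy hexagons from pushing every other cell into the lowest colour class, at the cost of class intervals with very unequal widths.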

tmap_arrange(facilitiesMap,traffic, widths = c(1))

Insights

4. Conclusion

Well-known text (WKT) is a human readable representation for spatial objects like points, lines, or enclosed areas on a map, which makes it helpful for geospatial visualisation.

During this exercise, we learned how to import geospatial data in wkt format into R and save the imported data as simple feature objects by using sf package, to map geospatial data by using tmap package, to process movement data by using sf and tidyverse packages, and to visualise movement data by using tmap package.