1. Overview

With reference to Challenge 2 of VAST Challenge 2022, we are required to reveal the daily routines of two selected participant of the city of Engagement, Ohio USA. For the purpose of this take-home exercise,ggplot2 and ViSIElse and other appropriate visual analytics methods will be used.

2. Data Preparation

2.1 Installing and loading the required libraries

Before we get started, it is important for us to ensure that the required R packages have been installed. If yes, we will load the R packages. If they have yet to be installed, we will install the R packages and load them onto R environment.

The chunk code on the right will do the trick.

packages = c('scales', 'viridis', 
             'lubridate', 'ggthemes', 
             'gridExtra', 'tidyverse', 
             'readxl', 'knitr','tmap',
             'timetk','calendR','sf',
             'data.table', 'ViSiElse')
for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

2.2 Importing the dataset

Read .csv

log1 <- read_csv('data/ParticipantStatusLogs1.csv')
log2 <- read_csv('data/ParticipantStatusLogs2.csv')

Find the Participants with the Highest and Lowest Wage

financial <- read_rds("data/rds/FinancialPivot.rds")

financialCalculate <- financial %>% 
  group_by(participantId) %>% 
  summarise(wage = mean(Wage))
participantHighest <- financialCalculate %>% 
  filter(wage == max(financialCalculate$wage))
participantLowest <- financialCalculate %>% 
  filter(wage == min(financialCalculate$wage))

selected <- c(participantHighest$participantId,participantLowest$participantId)
selected_log1 <- log1 %>% 
  filter(participantId %in% selected)
selected_log2 <- log2 %>% 
  filter(participantId %in% selected)
print(selected_log1)
print(selected_log2)

Save to .rds Format

Read .rds File

log1 <- read_rds("data/rds/selected_log1.rds")
log2 <- read_rds("data/rds/selected_log2.rds")
logs <- rbind(log1,log2)
max(log1$timestamp)

[1] "2022-03-06 18:50:00 UTC"

It is always a good practice to examine the imported data frame before further analysis is performed.

Let’s take an overview of the datasets.

print(logs)

# A tibble: 6,717 x 12
   timestamp           currentLocation       participantId currentMode
   <dttm>              <chr>                         <dbl> <chr>      
 1 2022-03-01 00:00:00 POINT (-1076.5842362~            18 AtHome     
 2 2022-03-01 00:00:00 POINT (-3374.8893316~           719 AtHome     
 3 2022-03-01 00:05:00 POINT (-1076.5842362~            18 AtHome     
 4 2022-03-01 00:05:00 POINT (-3374.8893316~           719 AtHome     
 5 2022-03-01 00:10:00 POINT (-1076.5842362~            18 AtHome     
 6 2022-03-01 00:10:00 POINT (-3374.8893316~           719 AtHome     
 7 2022-03-01 00:15:00 POINT (-1076.5842362~            18 AtHome     
 8 2022-03-01 00:15:00 POINT (-3374.8893316~           719 AtHome     
 9 2022-03-01 00:20:00 POINT (-1076.5842362~            18 AtHome     
10 2022-03-01 00:20:00 POINT (-3374.8893316~           719 AtHome     
# ... with 6,707 more rows, and 8 more variables: hungerStatus <chr>,
#   sleepStatus <chr>, apartmentId <dbl>, availableBalance <dbl>,
#   jobId <dbl>, financialStatus <chr>, dailyFoodBudget <dbl>,
#   weeklyExtraBudget <dbl>

Let us examine the structure of log1 simple feature data.frame by using glimpse().

glimpse(logs)

Rows: 6,717
Columns: 12
$ timestamp         <dttm> 2022-03-01 00:00:00, 2022-03-01 00:00:00,~
$ currentLocation   <chr> "POINT (-1076.5842362188507 2994.728261285~
$ participantId     <dbl> 18, 719, 18, 719, 18, 719, 18, 719, 18, 71~
$ currentMode       <chr> "AtHome", "AtHome", "AtHome", "AtHome", "A~
$ hungerStatus      <chr> "JustAte", "JustAte", "JustAte", "JustAte"~
$ sleepStatus       <chr> "Sleeping", "Awake", "Sleeping", "Awake", ~
$ apartmentId       <dbl> 482, 972, 482, 972, 482, 972, 482, 972, 48~
$ availableBalance  <dbl> 1633.116, 1088.465, 1633.116, 1088.465, 16~
$ jobId             <dbl> 278, 892, 278, 892, 278, 892, 278, 892, 27~
$ financialStatus   <chr> "Stable", "Stable", "Stable", "Stable", "S~
$ dailyFoodBudget   <dbl> 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12~
$ weeklyExtraBudget <dbl> 2146.3299, 422.2288, 2146.3299, 422.2288, ~

2.3 Transform Date Format

logs$date <- as.Date(logs$timestamp)
logs$day <- weekdays(logs$date)
logs$year <- year(ymd(logs$date))
print(logs)

# A tibble: 6,717 x 15
   timestamp           currentLocation       participantId currentMode
   <dttm>              <chr>                         <dbl> <chr>      
 1 2022-03-01 00:00:00 POINT (-1076.5842362~            18 AtHome     
 2 2022-03-01 00:00:00 POINT (-3374.8893316~           719 AtHome     
 3 2022-03-01 00:05:00 POINT (-1076.5842362~            18 AtHome     
 4 2022-03-01 00:05:00 POINT (-3374.8893316~           719 AtHome     
 5 2022-03-01 00:10:00 POINT (-1076.5842362~            18 AtHome     
 6 2022-03-01 00:10:00 POINT (-3374.8893316~           719 AtHome     
 7 2022-03-01 00:15:00 POINT (-1076.5842362~            18 AtHome     
 8 2022-03-01 00:15:00 POINT (-3374.8893316~           719 AtHome     
 9 2022-03-01 00:20:00 POINT (-1076.5842362~            18 AtHome     
10 2022-03-01 00:20:00 POINT (-3374.8893316~           719 AtHome     
# ... with 6,707 more rows, and 11 more variables:
#   hungerStatus <chr>, sleepStatus <chr>, apartmentId <dbl>,
#   availableBalance <dbl>, jobId <dbl>, financialStatus <chr>,
#   dailyFoodBudget <dbl>, weeklyExtraBudget <dbl>, date <date>,
#   day <chr>, year <int>

3. Daily Routines Visualisation

3.1 Routines on Monday (2022-03-07)

Day <- 'Monday'
Monday <- logs %>% 
  filter(day == Day)
Monday$endtime <- Monday$timestamp + minutes(5)
Monday

# A tibble: 576 x 16
   timestamp           currentLocation       participantId currentMode
   <dttm>              <chr>                         <dbl> <chr>      
 1 2022-03-07 00:00:00 POINT (-1076.5842362~            18 AtHome     
 2 2022-03-07 00:00:00 POINT (-3210.1724903~           719 AtRecreati~
 3 2022-03-07 00:05:00 POINT (-1076.5842362~            18 AtHome     
 4 2022-03-07 00:05:00 POINT (-3210.1724903~           719 AtRecreati~
 5 2022-03-07 00:10:00 POINT (-1076.5842362~            18 AtHome     
 6 2022-03-07 00:10:00 POINT (-3210.1724903~           719 AtRecreati~
 7 2022-03-07 00:15:00 POINT (-1076.5842362~            18 AtHome     
 8 2022-03-07 00:15:00 POINT (-3210.1724903~           719 AtRecreati~
 9 2022-03-07 00:20:00 POINT (-1076.5842362~            18 AtHome     
10 2022-03-07 00:20:00 POINT (-3210.1724903~           719 AtRecreati~
# ... with 566 more rows, and 12 more variables: hungerStatus <chr>,
#   sleepStatus <chr>, apartmentId <dbl>, availableBalance <dbl>,
#   jobId <dbl>, financialStatus <chr>, dailyFoodBudget <dbl>,
#   weeklyExtraBudget <dbl>, date <date>, day <chr>, year <int>,
#   endtime <dttm>

p1 <- ggplot()+
  geom_segment(data=Monday,
               mapping=aes(x=timestamp, xend=endtime,
                           y=currentMode, yend = currentMode, 
                           color= currentMode),
               size=6) +
  facet_grid(.~participantId~.) +
  theme(legend.position = 'right',
        legend.title=element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        text = element_text(size=10)) +
  labs(title="Monday's Routines of Participant with Highest/Lowest Eage",
       subtitle=Day)

p1

Prticipant ID 18 is the one having the highest monthly average wage; Prticipant ID 719 is the one having the Lowest monthly average wage. Figure above shows that on monday, the one with the highest monthly average wage get out to work earlier and spend no time on recreation.

p2 <- ggplot()+
  geom_segment(data=Monday,
               mapping=aes(x=timestamp, xend=endtime,
                           y=hungerStatus, yend = hungerStatus, 
                           color= hungerStatus),
               size=6) +
  facet_grid(.~participantId~.) +
  theme(legend.position = 'right',
        legend.title=element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        text = element_text(size=10))

p2

Figure above shows that on monday, the participant having the highest monthly average wage does not have dinner and have a longer starving time.

p3 <- ggplot()+
  geom_segment(data=Monday,
               mapping=aes(x=timestamp, xend=endtime,
                           y=sleepStatus, yend = sleepStatus, 
                           color= sleepStatus),
               size=6) +
  facet_grid(.~participantId~.) +
  theme(legend.position = 'right',
        legend.title=element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        text = element_text(size=10))

p3

Figure above shows that on monday, the participant having the highest monthly average wage fall asleep earlier and get up earlier.

3.2 Routines on Sunday (2022-03-06)

Day <- 'Sunday'
Sunday <- logs %>% 
  filter(day == Day)
Sunday$endtime <- Sunday$timestamp + minutes(5)
Sunday

# A tibble: 576 x 16
   timestamp           currentLocation       participantId currentMode
   <dttm>              <chr>                         <dbl> <chr>      
 1 2022-03-06 00:00:00 POINT (-1076.5842362~            18 AtHome     
 2 2022-03-06 00:00:00 POINT (-3375.8230436~           719 Transport  
 3 2022-03-06 00:05:00 POINT (-1076.5842362~            18 AtHome     
 4 2022-03-06 00:05:00 POINT (-3374.8893316~           719 AtHome     
 5 2022-03-06 00:10:00 POINT (-1076.5842362~            18 AtHome     
 6 2022-03-06 00:10:00 POINT (-3374.4549804~           719 Transport  
 7 2022-03-06 00:15:00 POINT (-1076.5842362~            18 AtHome     
 8 2022-03-06 00:15:00 POINT (-3373.7155699~           719 Transport  
 9 2022-03-06 00:20:00 POINT (-1076.5842362~            18 AtHome     
10 2022-03-06 00:20:00 POINT (-3567.6895099~           719 Transport  
# ... with 566 more rows, and 12 more variables: hungerStatus <chr>,
#   sleepStatus <chr>, apartmentId <dbl>, availableBalance <dbl>,
#   jobId <dbl>, financialStatus <chr>, dailyFoodBudget <dbl>,
#   weeklyExtraBudget <dbl>, date <date>, day <chr>, year <int>,
#   endtime <dttm>

p4 <- ggplot()+
  geom_segment(data=Sunday,
               mapping=aes(x=timestamp, xend=endtime,
                           y=currentMode, yend = currentMode, 
                           color= currentMode),
               size=6) +
  facet_grid(.~participantId~.) +
  theme(legend.position = 'right',
        legend.title=element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        text = element_text(size=10)) +
  labs(title="Sunday's Routines of Participant with Highest/Lowest Eage",
       subtitle=Day)

p4

Figure above shows that on Sunday, compared with the one with the lowest average wage, the participant having the highest monthly average wage spend more time at home and less time hanging out for pleasure.

p5 <- ggplot()+
  geom_segment(data=Sunday,
               mapping=aes(x=timestamp, xend=endtime,
                           y=hungerStatus, yend = hungerStatus, 
                           color= hungerStatus),
               size=6) +
  facet_grid(.~participantId~.) +
  theme(legend.position = 'right',
        legend.title=element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        text = element_text(size=10))

p5

Figure above shows that even on Sunday, the participant having the highest monthly average wage still on a diet and does not have dinner, while the other participant will eat food every time he/she feel hungry.

p6 <- ggplot()+
  geom_segment(data=Sunday,
               mapping=aes(x=timestamp, xend=endtime,
                           y=sleepStatus, yend = sleepStatus, 
                           color= sleepStatus),
               size=6) +
  facet_grid(.~participantId~.) +
  theme(legend.position = 'right',
        legend.title=element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        text = element_text(size=10))

p6

From figures above, we can see that the participant with highest wage(ID18) always have a good habbit , no matter it is monday or weekends.

4. Conclusions

ViSiElse package is specially designed for visualising behavioral observation over time.
Creative spirit is important in visual analytics.After exploration, we can find that ggplot2 also can be used to visualize daily routines.
There is more than one way to visulise what we want, keep trying to find an appropriate method.

Take-home Exercise 4