Switching between space and time: Spatio-temporal analysis with cubble

H. Sherry Zhang

The Australian spatial econometrics and statistics workshop
Monash University, Australia
2023 Feb 17

Hi!

  • A final-year PhD student in the Department of Econometrics and Business Statistics

  • My research centres on exploring multivariate spatio-temporal data with data wrangling and visualisation tools.

  • Find me on

    • Twitter: huizezhangsh,
    • GitHub: huizezhang-sherry, and
    • https://huizezhangsh.netlify.app/

Spatio-temporal data

People can mean a whole range of different things when they call their data spatio-temporal!

Today's focus is on vector data.

Example of vector data

Physical sensors that measure the temperature, rainfall, and wind speed & direction

But vector data are more than just points

This simplified Victoria polygon contains 360 points!

# A tibble: 1 × 2
  NAME                                                             geometry
  <chr>                                                       <POLYGON [°]>
1 Victoria ((140.9657 -38.05599, 140.9711 -37.79145, 140.9739 -37.46209, 1…

Temporal data

A wide table 😢

year-by-month table

Year Jan Feb Mar
1946 26.663 23.598 26.931
1947 21.439 21.089 23.709
1948 21.937 20.035 23.590

A long table 😄

Each time stamp forms a row; each variable forms a column.

Year month value
1946 Jan 26.663
1946 Feb 23.598
1946 Mar 26.931
1947 Jan 21.439
1947 Feb 21.089
1947 Mar 23.709
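A minimal base-R sketch of the reshaping from the wide year-by-month table to the long table, using the first three years and months above as toy input. In the tidyverse, `tidyr::pivot_longer()` does the same job; base R is used here only to keep the sketch dependency-free.

```r
# Wide table: one row per year, one column per month
wide <- data.frame(
  Year = c(1946, 1947, 1948),
  Jan  = c(26.663, 21.439, 21.937),
  Feb  = c(23.598, 21.089, 20.035),
  Mar  = c(26.931, 23.709, 23.590)
)
months <- c("Jan", "Feb", "Mar")

# Long table: one row per (year, month) time stamp.
# t() puts each year's months in one column, so as.vector() reads the
# values year by year: 1946 Jan, 1946 Feb, 1946 Mar, 1947 Jan, ...
long <- data.frame(
  Year  = rep(wide$Year, each = length(months)),
  month = rep(months, times = nrow(wide)),
  value = as.vector(t(as.matrix(wide[, months])))
)
head(long)
```

The long form has one row per time stamp (3 years × 3 months = 9 rows), so a new variable such as rainfall becomes one extra column instead of another set of month columns.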

When the data has both spatial and temporal dimensions

  • Store everything in one long table, with the spatial variables repeated on every row? With daily data and large spatial objects, that means a lot of duplication.

  • Sometimes we want per-station summaries, where ideally each station forms a row.

  • Other times, we want to work with the temporal variables in the long form.

  • That is a lot of extra wrangling just to arrange spatio-temporal data into a format convenient for both spatial and temporal operations!
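To make the duplication concrete, here is a small base-R sketch with two hypothetical stations and made-up daily values for 2020 (`stations_toy` and `ts_toy` are illustrative, not the data used in these slides). Joining the station attributes onto the daily table repeats them on every row, while a per-station summary collapses back to one row per station.

```r
# Hypothetical toy data: 2 stations with daily records for 2020
stations_toy <- data.frame(
  id   = c("A", "B"),
  long = c(123.0, 114.1),
  lat  = c(-16.5, -22.2),
  name = c("station a", "station b")
)
dates <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
ts_toy <- data.frame(
  id   = rep(stations_toy$id, each = length(dates)),
  date = rep(dates, times = nrow(stations_toy)),
  tmax = rep(seq(20, 35, length.out = length(dates)), times = 2)  # made-up
)

# One long table: spatial variables are repeated on every daily row
long <- merge(ts_toy, stations_toy, by = "id")
nrow(long)                     # 732 rows (2 stations x 366 days)
sum(long$name == "station a")  # the same name is stored 366 times

# Per-station summary: one row per station
aggregate(tmax ~ id, data = ts_toy, FUN = mean)
```

With a polygon instead of a single point, every one of those repeated rows would carry the whole geometry.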

Cubble: a spatio-temporal vector data structure


Cubble is a nested object built on tibble that allows easy pivoting between the spatial and temporal forms.

Australian weather station data:

stations
# A tibble: 30 × 6
  id            lat  long  elev name                       wmo_id
  <chr>       <dbl> <dbl> <dbl> <chr>                       <dbl>
1 ASN00060139 -31.4  153.   4.2 port macquarie airport aws  94786
2 ASN00068228 -34.4  151.  10   bellambi aws                94749
3 ASN00017123 -28.1  140.  37.8 moomba airport              95481
4 ASN00081049 -36.4  145. 114   tatura inst sustainable ag  95836
5 ASN00018201 -32.5  138.  14   port augusta aero           95666
# … with 25 more rows

ts
# A tibble: 10,632 × 5
  id          date        prcp  tmax  tmin
  <chr>       <date>     <dbl> <dbl> <dbl>
1 ASN00003057 2020-01-01     0  36.7  26.9
2 ASN00003057 2020-01-02    41  34.2  24  
3 ASN00003057 2020-01-03     0  35    25.4
4 ASN00003057 2020-01-04    40  29.1  25.4
5 ASN00003057 2020-01-05  1640  27.3  24.3
# … with 10,627 more rows

Cast your data into a cubble

(weather <- as_cubble(
  list(spatial = stations, temporal = ts),
  key = id, index = date, coords = c(long, lat)
))
# cubble:   id [30]: nested form
# bbox:     [114.09, -41.88, 152.87, -11.65]
# temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
  id            lat  long  elev name              wmo_id ts                
  <chr>       <dbl> <dbl> <dbl> <chr>              <dbl> <list>            
1 ASN00003057 -16.5  123.     7 cygnet bay         94201 <tibble [316 × 4]>
2 ASN00005007 -22.2  114.     5 learmonth airport  94302 <tibble [363 × 4]>
3 ASN00005084 -21.5  115.     5 thevenard island   94303 <tibble [366 × 4]>
4 ASN00010515 -32.1  117.   199 beverley           95615 <tibble [354 × 4]>
5 ASN00012314 -27.8  121.   497 leinster aero      95448 <tibble [366 × 4]>
# … with 25 more rows
  • The spatial data (stations) can also be an sf object, and the temporal data (ts) a tsibble object.

Switch between the two forms

long form

(weather_long <- weather %>% 
  face_temporal())
# cubble:  date, id [30]: long form
# bbox:    [114.09, -41.88, 152.87, -11.65]
# spatial: lat [dbl], long [dbl], elev [dbl],
#   name [chr], wmo_id [dbl]
  id          date        prcp  tmax  tmin
  <chr>       <date>     <dbl> <dbl> <dbl>
1 ASN00003057 2020-01-01     0  36.7  26.9
2 ASN00003057 2020-01-02    41  34.2  24  
3 ASN00003057 2020-01-03     0  35    25.4
4 ASN00003057 2020-01-04    40  29.1  25.4
5 ASN00003057 2020-01-05  1640  27.3  24.3
# … with 10,627 more rows

Back to the nested form:

(weather_back <- weather_long %>% 
   face_spatial())
# cubble:   id [30]: nested form
# bbox:     [114.09, -41.88, 152.87, -11.65]
# temporal: date [date], prcp [dbl], tmax [dbl],
#   tmin [dbl]
  id         lat  long  elev name  wmo_id ts      
  <chr>    <dbl> <dbl> <dbl> <chr>  <dbl> <list>  
1 ASN0000… -16.5  123.     7 cygn…  94201 <tibble>
2 ASN0000… -22.2  114.     5 lear…  94302 <tibble>
3 ASN0000… -21.5  115.     5 thev…  94303 <tibble>
4 ASN0001… -32.1  117.   199 beve…  95615 <tibble>
5 ASN0001… -27.8  121.   497 lein…  95448 <tibble>
# … with 25 more rows
identical(weather_back, weather)
[1] TRUE
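Conceptually, the nested form stores each station's time series in a list column, and switching faces amounts to stacking those series and splitting them again. The base-R sketch below mimics that round trip on hypothetical toy data; `to_long()` is a made-up helper, and this only illustrates the idea, not how cubble implements `face_temporal()` or `face_spatial()`.

```r
# Hypothetical toy versions of the station and daily tables
stations_toy <- data.frame(id = c("A", "B"), long = c(123.0, 114.1),
                           lat = c(-16.5, -22.2))
ts_toy <- data.frame(
  id   = c("A", "A", "A", "B", "B"),
  date = as.Date("2020-01-01") + c(0, 1, 2, 0, 1),
  tmax = c(36.7, 34.2, 35.0, 30.1, 31.4)
)

# Nested form: one row per station, its time series in a list column
nested <- stations_toy
nested$ts <- split(ts_toy[, c("date", "tmax")], ts_toy$id)[nested$id]

# Long form: stack the nested series, tagging each row with its station id
to_long <- function(nested) {
  pieces <- lapply(seq_len(nrow(nested)), function(i) {
    data.frame(id = nested$id[i], nested$ts[[i]])
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}
long <- to_long(nested)

# Back to the nested form: split the long table by id again
nested_back <- stations_toy
nested_back$ts <- split(long[, c("date", "tmax")], long$id)[nested_back$id]
```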

Access variables in the other form

Reference temporal variables with $

weather %>% 
  mutate(avg_tmax = mean(ts$tmax, na.rm = TRUE))
# cubble:   id [30]: nested form
# bbox:     [114.09, -41.88, 152.87, -11.65]
# temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
  id            lat  long  elev name              wmo_id ts                 avg_tmax
  <chr>       <dbl> <dbl> <dbl> <chr>              <dbl> <list>                <dbl>
1 ASN00003057 -16.5  123.     7 cygnet bay         94201 <tibble [316 × 4]>     32.4
2 ASN00005007 -22.2  114.     5 learmonth airport  94302 <tibble [363 × 4]>     33.2
3 ASN00005084 -21.5  115.     5 thevenard island   94303 <tibble [366 × 4]>     30.7
4 ASN00010515 -32.1  117.   199 beverley           95615 <tibble [354 × 4]>     26.4
5 ASN00012314 -27.8  121.   497 leinster aero      95448 <tibble [366 × 4]>     29.6
# … with 25 more rows
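In the nested form each row carries its own time series, so `ts$tmax` inside `mutate()` refers to that station's series, and the result above is one `avg_tmax` per station. A base-R sketch of the same per-row computation on a hypothetical two-station toy table:

```r
# Hypothetical toy nested table: one row per station, series in a list column
nested <- data.frame(id = c("A", "B"))
nested$ts <- list(
  data.frame(tmax = c(36.7, 34.2, NA)),
  data.frame(tmax = c(30.1, 31.4))
)

# One summary per station: apply mean() to each station's own series
nested$avg_tmax <- vapply(nested$ts,
                          function(d) mean(d$tmax, na.rm = TRUE),
                          numeric(1))
nested$avg_tmax
```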

Move spatial variables into the long form

weather_long %>% unfold(long, lat)
# cubble:  date, id [30]: long form
# bbox:    [114.09, -41.88, 152.87, -11.65]
# spatial: lat [dbl], long [dbl], elev [dbl], name [chr], wmo_id [dbl]
  id          date        prcp  tmax  tmin  long   lat
  <chr>       <date>     <dbl> <dbl> <dbl> <dbl> <dbl>
1 ASN00003057 2020-01-01     0  36.7  26.9  123. -16.5
2 ASN00003057 2020-01-02    41  34.2  24    123. -16.5
3 ASN00003057 2020-01-03     0  35    25.4  123. -16.5
4 ASN00003057 2020-01-04    40  29.1  25.4  123. -16.5
5 ASN00003057 2020-01-05  1640  27.3  24.3  123. -16.5
# … with 10,627 more rows
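`unfold(long, lat)` copies the selected spatial variables onto every row of the long form, much like a join on the key `id`. A base-R sketch of that join with hypothetical toy tables, as an analogy rather than cubble's implementation:

```r
# Hypothetical toy tables
stations_toy <- data.frame(id = c("A", "B"), long = c(123.0, 114.1),
                           lat = c(-16.5, -22.2), elev = c(7, 5))
long_toy <- data.frame(
  id   = c("A", "A", "B"),
  date = as.Date("2020-01-01") + c(0, 1, 0),
  tmax = c(36.7, 34.2, 30.1)
)

# Copy only the requested spatial columns (long, lat) onto every daily row;
# elev stays behind in the spatial table
unfolded <- merge(long_toy, stations_toy[, c("id", "long", "lat")],
                  by = "id", sort = FALSE)
unfolded
```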

Explore temporal pattern across space

Why do you need a glyph map?


Glyph map transformation

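# Template: DATA and the upper-case names are placeholders.
# x_major, y_major: where each glyph sits on the map (e.g. long, lat)
# x_minor, y_minor: what is drawn inside each glyph (e.g. month, tmax)
DATA %>%
  ggplot() +
  geom_glyph(
    aes(x_major = X_MAJOR, x_minor = X_MINOR,
        y_major = Y_MAJOR, y_minor = Y_MINOR)) +
  ...  # further layers, e.g. a map background and a theme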
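A glyph map places each glyph at its major coordinates (here longitude and latitude) and draws the minor variables (time and value) inside a small box around that point. The sketch below shows one linear form of that transformation, assuming the minor values are rescaled to [-1, 1] and then scaled by half the glyph width and height. `rescale11()` and `glyph_coords()` are hypothetical helpers, not cubble functions, and `geom_glyph()` may differ in details such as how values are rescaled across glyphs; the monthly tmax values are made up.

```r
# Rescale a numeric vector linearly onto [-1, 1]
rescale11 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  2 * (x - rng[1]) / (rng[2] - rng[1]) - 1
}

# Place each minor (within-glyph) point relative to its major (map) location;
# the glyph spans `width` horizontally and `height` vertically around it
glyph_coords <- function(x_major, x_minor, y_major, y_minor, width, height) {
  data.frame(
    x = x_major + rescale11(x_minor) * width / 2,
    y = y_major + rescale11(y_minor) * height / 2
  )
}

# Hypothetical example: one station's monthly tmax drawn around (123, -16.5)
tmax <- c(33, 34, 33.5, 32, 29, 27, 26.5, 28, 31, 33, 34.5, 35)
xy <- glyph_coords(x_major = 123, x_minor = 1:12,
                   y_major = -16.5, y_minor = tmax,
                   width = 2, height = 0.7)
range(xy$x)  # 122 124
range(xy$y)  # -16.85 -16.15
```

With `width = 2, height = 0.7` (the values used in the next example), each glyph occupies 2 degrees of longitude and 0.7 degrees of latitude.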

Average maximum temperature on the map

library(cubble)
library(dplyr)
library(ggplot2)

# Nested form: one row per station
cb <- as_cubble(
  list(spatial = stations, temporal = ts),
  key = id, index = date, coords = c(long, lat)
)

set.seed(0927)
cb_glyph <- cb %>%
  slice_sample(n = 20) %>%                        # sample 20 stations
  face_temporal() %>%                             # switch to the long form
  mutate(month = lubridate::month(date)) %>%
  group_by(month) %>% 
  summarise(tmax = mean(tmax, na.rm = TRUE)) %>%  # monthly average of tmax
  unfold(long, lat)                               # bring coordinates along

# oz_simp: a simplified Australia outline (sf object), not defined on this slide
ggplot() +
  geom_sf(data = oz_simp, 
          fill = "grey95", 
          color = "white") +
  geom_glyph(
    data = cb_glyph,
    aes(x_major = long, x_minor = month,
        y_major = lat, y_minor = tmax),
    width = 2, height = 0.7) + 
  ggthemes::theme_map()

Remark

  • Nowadays, data collection can take many forms and the research process begins long before a cleaned dataset is available for modeling.

  • I hope you view data wrangling as just as important a part of research as your model.

  • Research on data tools makes it easier to reproduce results with more recent data in the future, without having to hire a new RA to redo the data preparation your previous RA has already done (if you ever hire one).

Acknowledgements

  • The slides are made with Quarto, available at sherryzhang-monashspatial2023.netlify.app
  • All the materials used to prepare the slides are available at https://github.com/huizezhang-sherry/MONASHspatial2023

Reference