The Australian spatial econometrics and statistics workshop
Monash University, Australia
2023 Feb 17
A final year PhD student in the Department of Econometrics and Business Statistics
My research centers on exploring multivariate spatio-temporal data with data wrangling and visualisation tools.
Find me on huizezhangsh, huizezhang-sherry, and https://huizezhangsh.netlify.app/
People can talk about a whole range of different things when they refer to their data as spatio-temporal!
The focus of today will be on vector data.
Physical sensors that measure the temperature, rainfall, and wind speed & direction
This simplified Victoria polygon contains 360 points!
# A tibble: 1 × 2
NAME geometry
<chr> <POLYGON [°]>
1 Victoria ((140.9657 -38.05599, 140.9711 -37.79145, 140.9739 -37.46209, 1…
year-by-month table
Year | Jan | Feb | Mar | … |
---|---|---|---|---|
1946 | 26.663 | 23.598 | 26.931 | … |
1947 | 21.439 | 21.089 | 23.709 | … |
1948 | 21.937 | 20.035 | 23.590 | … |
time stamp forms rows, variable forms columns.
Year | month | value |
---|---|---|
1946 | Jan | 26.663 |
1946 | Feb | 23.598 |
1946 | Mar | 26.931 |
… | … | … |
1947 | Jan | 21.439 |
1947 | Feb | 21.089 |
1947 | Mar | 23.709 |
… | … | … |
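
The reshaping from the year-by-month table to the long table can be sketched with `pivot_longer()` from tidyr. This is only a minimal illustration using the values visible in the tables above; the object names `wide` and `long_tbl` are made up for the example.

```r
library(tidyr)

# The three years and three months visible in the wide table above
wide <- data.frame(
  Year = c(1946, 1947, 1948),
  Jan  = c(26.663, 21.439, 21.937),
  Feb  = c(23.598, 21.089, 20.035),
  Mar  = c(26.931, 23.709, 23.590)
)

# One row per (Year, month) pair: the month columns collapse into
# a "month" column of names and a "value" column of measurements
long_tbl <- pivot_longer(
  wide,
  cols      = c(Jan, Feb, Mar),
  names_to  = "month",
  values_to = "value"
)

print(long_tbl)
```

The wide form is compact to read, while the long form is usually what grouping and plotting functions expect.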
Should everything go in one long table, with the spatial variables repeated on every row? With daily data and large spatial objects, that would mean a lot of duplication.
Sometimes we want to make per-station summaries, where ideally each station forms a row.
Other times, we want to work on the temporal variables in the long form.
It takes a lot of extra work to arrange spatio-temporal data in a format that is convenient for both spatial and temporal operations!
Cubble is a nested object built on tibble that allows easy pivoting between the spatial and temporal forms.
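
To make the idea of a nested object concrete, here is a rough sketch using plain tidyr rather than cubble itself. The toy stations `A` and `B` and all their values are invented for illustration, and `nest()`/`unnest()` only mimic the two forms; they are not a description of how cubble is implemented.

```r
library(tidyr)
library(tibble)

# Hypothetical long table: spatial variables (lat, long) repeat on every row
long_tbl <- tibble(
  id   = rep(c("A", "B"), each = 3),
  lat  = rep(c(-37.8, -33.9), each = 3),
  long = rep(c(145.0, 151.2), each = 3),
  date = rep(as.Date("2020-01-01") + 0:2, times = 2),
  tmax = c(30, 32, 31, 25, 27, 26)
)

# Nested form: one row per station, the time series tucked into a list-column
nested_tbl <- nest(long_tbl, ts = c(date, tmax))

# Back to the long form: the list-column is expanded again
back <- unnest(nested_tbl, ts)

print(nested_tbl)
```

In the nested table the coordinates are stored once per station, which is the duplication saving described above.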
# A tibble: 30 × 6
id lat long elev name wmo_id
<chr> <dbl> <dbl> <dbl> <chr> <dbl>
1 ASN00060139 -31.4 153. 4.2 port macquarie airport aws 94786
2 ASN00068228 -34.4 151. 10 bellambi aws 94749
3 ASN00017123 -28.1 140. 37.8 moomba airport 95481
4 ASN00081049 -36.4 145. 114 tatura inst sustainable ag 95836
5 ASN00018201 -32.5 138. 14 port augusta aero 95666
# … with 25 more rows
(weather <- as_cubble(
list(spatial = stations, temporal = ts),
key = id, index = date, coords = c(long, lat)
))
# cubble: id [30]: nested form
# bbox: [114.09, -41.88, 152.87, -11.65]
# temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
id lat long elev name wmo_id ts
<chr> <dbl> <dbl> <dbl> <chr> <dbl> <list>
1 ASN00003057 -16.5 123. 7 cygnet bay 94201 <tibble [316 × 4]>
2 ASN00005007 -22.2 114. 5 learmonth airport 94302 <tibble [363 × 4]>
3 ASN00005084 -21.5 115. 5 thevenard island 94303 <tibble [366 × 4]>
4 ASN00010515 -32.1 117. 199 beverley 95615 <tibble [354 × 4]>
5 ASN00012314 -27.8 121. 497 leinster aero 95448 <tibble [366 × 4]>
# … with 25 more rows
The spatial data (stations) can be an sf object and the temporal data (ts) can be a tsibble object.
long form:
# cubble: date, id [30]: long form
# bbox: [114.09, -41.88, 152.87, -11.65]
# spatial: lat [dbl], long [dbl], elev [dbl],
# name [chr], wmo_id [dbl]
id date prcp tmax tmin
<chr> <date> <dbl> <dbl> <dbl>
1 ASN00003057 2020-01-01 0 36.7 26.9
2 ASN00003057 2020-01-02 41 34.2 24
3 ASN00003057 2020-01-03 0 35 25.4
4 ASN00003057 2020-01-04 40 29.1 25.4
5 ASN00003057 2020-01-05 1640 27.3 24.3
# … with 10,627 more rows
back to the nested form:
# cubble: id [30]: nested form
# bbox: [114.09, -41.88, 152.87, -11.65]
# temporal: date [date], prcp [dbl], tmax [dbl],
# tmin [dbl]
id lat long elev name wmo_id ts
<chr> <dbl> <dbl> <dbl> <chr> <dbl> <list>
1 ASN0000… -16.5 123. 7 cygn… 94201 <tibble>
2 ASN0000… -22.2 114. 5 lear… 94302 <tibble>
3 ASN0000… -21.5 115. 5 thev… 94303 <tibble>
4 ASN0001… -32.1 117. 199 beve… 95615 <tibble>
5 ASN0001… -27.8 121. 497 lein… 95448 <tibble>
# … with 25 more rows
[1] TRUE
Reference temporal variables with $
# cubble: id [30]: nested form
# bbox: [114.09, -41.88, 152.87, -11.65]
# temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
id lat long elev name wmo_id ts avg_tmax
<chr> <dbl> <dbl> <dbl> <chr> <dbl> <list> <dbl>
1 ASN00003057 -16.5 123. 7 cygnet bay 94201 <tibble [316 × 4]> 32.4
2 ASN00005007 -22.2 114. 5 learmonth airport 94302 <tibble [363 × 4]> 33.2
3 ASN00005084 -21.5 115. 5 thevenard island 94303 <tibble [366 × 4]> 30.7
4 ASN00010515 -32.1 117. 199 beverley 95615 <tibble [354 × 4]> 26.4
5 ASN00012314 -27.8 121. 497 leinster aero 95448 <tibble [366 × 4]> 29.6
# … with 25 more rows
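
The call that produced the `avg_tmax` column is not shown above. As a hedged sketch of the general idea, the same kind of per-station summary can be written with a row-wise tibble in plain dplyr, where `ts$tmax` reaches into the nested series of the current row. The toy data and the names `nested_tbl` and `with_avg` are assumptions for illustration, not the talk's own code.

```r
library(dplyr)
library(tibble)

# Hypothetical nested table: one row per station, series in a list-column
nested_tbl <- tibble(
  id = c("A", "B"),
  ts = list(
    tibble(tmax = c(30, 32, NA)),
    tibble(tmax = c(25, 27, 26))
  )
)

# In a row-wise data frame, `ts` refers to the single nested tibble of
# the current row, so `ts$tmax` is that station's temperature series
with_avg <- nested_tbl %>%
  rowwise() %>%
  mutate(avg_tmax = mean(ts$tmax, na.rm = TRUE)) %>%
  ungroup()

print(with_avg)
```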
Move spatial variables into the long form
# cubble: date, id [30]: long form
# bbox: [114.09, -41.88, 152.87, -11.65]
# spatial: lat [dbl], long [dbl], elev [dbl], name [chr], wmo_id [dbl]
id date prcp tmax tmin long lat
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ASN00003057 2020-01-01 0 36.7 26.9 123. -16.5
2 ASN00003057 2020-01-02 41 34.2 24 123. -16.5
3 ASN00003057 2020-01-03 0 35 25.4 123. -16.5
4 ASN00003057 2020-01-04 40 29.1 25.4 123. -16.5
5 ASN00003057 2020-01-05 1640 27.3 24.3 123. -16.5
# … with 10,627 more rows
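
Conceptually, moving spatial variables into the long form resembles a join of the station table onto the time series by the key. The sketch below shows that idea with plain dplyr; the toy tables `spatial_tbl` and `temporal_tbl` are hypothetical, and this is not a claim about how cubble performs the step internally.

```r
library(dplyr)
library(tibble)

# Hypothetical station metadata (one row per station)
spatial_tbl <- tibble(
  id   = c("A", "B"),
  lat  = c(-37.8, -33.9),
  long = c(145.0, 151.2),
  elev = c(10, 20)
)

# Hypothetical daily series (several rows per station)
temporal_tbl <- tibble(
  id   = rep(c("A", "B"), each = 2),
  date = rep(as.Date("2020-01-01") + 0:1, times = 2),
  tmax = c(30, 32, 25, 27)
)

# Attach only the coordinates to each daily record, matching on the key `id`
unfolded <- temporal_tbl %>%
  left_join(select(spatial_tbl, id, long, lat), by = "id")

print(unfolded)
```

Only the variables that are needed (here `long` and `lat`) are copied across, so the long table does not carry every spatial column.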
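library(cubble)
library(dplyr)
library(ggplot2)

# `stations` and `ts` are the station metadata and daily series shown earlier
cb <- as_cubble(
  list(spatial = stations, temporal = ts),
  key = id, index = date, coords = c(long, lat)
)

set.seed(0927)
# Sample 20 stations, average tmax by month in the long form,
# then bring long and lat back in to position each glyph on the map
cb_glyph <- cb %>%
  slice_sample(n = 20) %>%
  face_temporal() %>%
  mutate(month = lubridate::month(date)) %>%
  group_by(month) %>%
  summarise(tmax = mean(tmax, na.rm = TRUE)) %>%
  unfold(long, lat)

# `oz_simp` is assumed to be a simplified sf map of Australia defined elsewhere
ggplot() +
  geom_sf(data = oz_simp,
          fill = "grey95",
          color = "white") +
  geom_glyph(
    data = cb_glyph,
    aes(x_major = long, x_minor = month,
        y_major = lat, y_minor = tmax),
    width = 2, height = 0.7) +
  ggthemes::theme_map()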
Nowadays, data collection can take many forms and the research process begins long before a cleaned dataset is available for modeling.
I hope you view data wrangling as just as important a part of the work as your model.
With research on creating data tools, you can more easily reproduce results with more recent data in the future, without hiring a new RA to redo the data preparation your previous RA has already done (if you ever hire one).
Wickham, H., Hofmann, H., Wickham, C., & Cook, D. (2012). Glyph‐maps for visually exploring temporal patterns in climate data and models. Environmetrics, 23(5), 382-393: https://vita.had.co.nz/papers/glyph-maps.pdf