![Page 1: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/1.jpg)
Hadley Wickham
Stat405
Visualising time & space
Wednesday, 21 October 2009
![Page 2: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/2.jpg)
1. New data: baby names by state
2. Visualise time (done!)
3. Visualise time conditional on space
4. Visualise space
5. Visualise space conditional on time
![Page 3: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/3.jpg)
CC BY http://www.flickr.com/photos/the_light_show/2586781132
Baby names by state
Top 100 male and female baby names for each state, 1960–2008.
480,000 records (100 * 50 * 2 * 48)
Slightly different variables: state, year, name, sex and number.
To keep the data manageable, we will look at the top 25 names of all time.
![Page 4: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/4.jpg)
Subset
Easier to compare states if we have proportions. To calculate proportions, need the number of births, but could only find birth data from 1981 onwards.
Then selected 30 names that occurred fairly frequently, and had interesting patterns.
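As a rough illustration of the proportion step described above, here is a minimal sketch in base R. The toy numbers and the `births` data frame (including its column name) are made up for illustration; the slides do not show how the real births data is laid out.

```r
# Toy data: counts of one name, per state and year (all numbers invented)
counts <- data.frame(
  state  = c("TX", "TX", "VT", "VT"),
  year   = c(1981, 1982, 1981, 1982),
  name   = "Matthew",
  number = c(3000, 2475, 135, 80)
)

# Hypothetical total births for the same states and years
births <- data.frame(
  state  = c("TX", "TX", "VT", "VT"),
  year   = c(1981, 1982, 1981, 1982),
  births = c(150000, 165000, 4500, 4000)
)

# Join on state and year, then divide so that large and small states
# are on a common scale
props <- merge(counts, births, by = c("state", "year"))
props$prop <- props$number / props$births
props
```

With raw counts Texas would dwarf Vermont; with proportions the two states can be compared directly.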
![Page 5: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/5.jpg)
Aaron Alex Allison Alyssa Angela Ashley Carlos Chelsea Christian Eric Evan Gabriel Jacob Jared Jennifer Jonathan Juan Katherine Kelsey Kevin Matthew Michelle Natalie Nicholas Noah Rebecca Sara Sarah Taylor Thomas
![Page 6: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/6.jpg)
Getting started
library(ggplot2)
library(plyr)

# Load the selected names, keeping strings as character vectors
bnames <- read.csv("interesting-names.csv", stringsAsFactors = FALSE)
# Start with a single name
matthew <- subset(bnames, name == "Matthew")
![Page 7: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/7.jpg)
Time | Space
![Page 8: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/8.jpg)
[Figure: line plot of prop (0.01 to 0.04) against year (1985 to 2005)]
![Page 9: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/9.jpg)
[Figure: line plot of prop (0.01 to 0.04) against year (1985 to 2005), one line per state]
qplot(year, prop, data = matthew, geom = "line", group = state)
![Page 10: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/10.jpg)
[Figure: prop (0.01 to 0.04) against year (1985 to 2005), one panel per state, AK to WY including DC]
![Page 11: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/11.jpg)
[Figure: prop (0.01 to 0.04) against year (1985 to 2005), one panel per state, AK to WY including DC]
last_plot() + facet_wrap(~ state)
![Page 12: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/12.jpg)
Your turn
Pick some names out of the list and explore. What do you see?
Can you write a function that plots the trend for a given name?
![Page 13: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/13.jpg)
show_name <- function(name) {
  # Keep only the rows for the requested name
  one_name <- bnames[bnames$name == name, ]
  qplot(year, prop, data = one_name, geom = "line", group = state)
}

show_name("Jessica")
show_name("Aaron")
show_name("Juan") + facet_wrap(~ state)
![Page 14: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/14.jpg)
[Figure: line plot of prop (0.01 to 0.04) against year (1985 to 2005)]
![Page 15: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/15.jpg)
[Figure: line plot of prop (0.01 to 0.04) against year (1985 to 2005)]
qplot(year, prop, data = matthew, geom = "line", group = state) +
  geom_smooth(aes(group = 1), se = FALSE, size = 3)
![Page 16: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/16.jpg)
[Figure: same plot and code as the previous slide, with an annotation on aes(group = 1):]
aes(group = 1) overrides the per-state grouping, so we only get one smooth for the whole dataset.
![Page 17: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/17.jpg)
Two useful tools
Smoothing: can be easier to perceive the overall trend after smoothing each individual series.
Indexing: remove initial differences by indexing to the first value.
Not that useful here, but good tools to have in your toolbox.
![Page 18: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/18.jpg)
library(mgcv)

# Fit a GAM smooth of y on x and return the fitted values
smooth <- function(y, x) {
  as.numeric(predict(gam(y ~ s(x))))
}

# Smooth each state's series separately
matthew <- ddply(matthew, "state", transform,
  prop_s = smooth(prop, year))

qplot(year, prop_s, data = matthew, geom = "line", group = state)
![Page 19: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/19.jpg)
# Divide every value by the value in the earliest year
index <- function(y, x) {
  y / y[order(x)[1]]
}

matthew <- ddply(matthew, "state", transform,
  prop_i = index(prop, year),
  prop_si = index(prop_s, year))

qplot(year, prop_i, data = matthew, geom = "line", group = state)
qplot(year, prop_si, data = matthew, geom = "line", group = state)
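To see what indexing does to a single series, here is a small sketch with made-up values. The `year` and `prop` vectors are invented for illustration and deliberately out of order, to show that `order(x)[1]` picks the earliest year rather than the first row.

```r
index <- function(y, x) {
  y / y[order(x)[1]]
}

# Made-up series, not sorted by year
year <- c(1983, 1981, 1982)
prop <- c(0.03, 0.02, 0.025)

# The earliest year is 1981, so every value is divided by 0.02
indexed <- index(prop, year)
indexed
# 1.50 1.00 1.25
```

After indexing, every series equals 1 in its first year, so the lines show relative change rather than absolute level.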
![Page 20: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/20.jpg)
Your turn
Create a plot to show all names simultaneously. Does smoothing every name in every state make it easier to see patterns?
Hint: run the R code on the next slide to eliminate name and state combinations with 10 or fewer years of data.
![Page 21: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/21.jpg)
# Keep only name/state combinations with more than 10 years of data
longterm <- ddply(bnames, c("name", "state"), function(df) {
  if (nrow(df) > 10) {
    df
  }
})
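The filter above keeps a name and state combination only when it has more than 10 rows (one row per year), so a group with exactly 10 years is dropped. As a sketch of the same idea without plyr, base R's `ave()` can attach the group size to every row. The `toy` data frame is made up for illustration and stands in for `bnames`.

```r
# Made-up data: "Juan" has 12 years in TX, "Evan" only 3
toy <- data.frame(
  name  = rep(c("Juan", "Evan"), times = c(12, 3)),
  state = "TX",
  year  = c(1981:1992, 1981:1983)
)

# Number of rows in each name/state group, repeated for every row
n_years <- ave(toy$year, toy$name, toy$state, FUN = length)

# Same rule as the ddply version: keep groups with more than 10 rows
longterm_toy <- toy[n_years > 10, ]
unique(as.character(longterm_toy$name))
# "Juan"
```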
![Page 22: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/22.jpg)
qplot(year, prop, data = bnames, geom = "line", group = state, alpha = I(1 / 4)) +
  facet_wrap(~ name)

# Smooth every name in every state
longterm <- ddply(longterm, c("name", "state"), transform,
  prop_s = smooth(prop, year))

qplot(year, prop_s, data = longterm, geom = "line", group = state, alpha = I(1 / 4)) +
  facet_wrap(~ name)
# Give each name its own y scale
last_plot() + facet_wrap(~ name, scales = "free_y")
![Page 23: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/23.jpg)
Space
![Page 24: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/24.jpg)
Spatial plots
Choropleth map: map colour of areas to value.
Proportional symbol map: map size of symbols to value.
![Page 25: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/25.jpg)
juan2000 <- subset(bnames, name == "Juan" & year == 2000)

# Turn map data into normal data frame
library(maps)
states <- map_data("state")
states$state <- state.abb[match(states$region, tolower(state.name))]

# Merge and then restore original order
choropleth <- merge(juan2000, states, by = "state", all.y = TRUE)
choropleth <- choropleth[order(choropleth$order), ]

# Plot with polygons
qplot(long, lat, data = choropleth, geom = "polygon", fill = prop, group = group)
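The abbreviation lookup in the code above relies on the built-in `state.name` and `state.abb` vectors, which cover the 50 states only. Here is a small base-R sketch of what the `match()` call does. The `regions` vector is made up for illustration, and "district of columbia" is assumed to be how the map data spells that region.

```r
# Hypothetical region names, lower-case as in the map data
regions <- c("texas", "new york", "district of columbia")

# Find each region's position in state.name, then take the abbreviation
# at the same position in state.abb
abb <- state.abb[match(regions, tolower(state.name))]
abb
# "TX" "NY" NA   (state.name has no entry for DC)

# One possible patch (an assumption, not from the slides): label DC by hand
abb[regions == "district of columbia"] <- "DC"
abb
# "TX" "NY" "DC"
```

Any region left as NA will not match a state in the merge, so its polygons end up with a missing prop.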
![Page 26: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/26.jpg)
[Figure: choropleth map, long (−120 to −70) against lat (30 to 45), states filled by prop (legend 0.004 to 0.01)]
![Page 27: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/27.jpg)
[Figure: choropleth map, long (−120 to −70) against lat (30 to 45), states filled by prop (legend 0.004 to 0.01)]
What’s the problem with this map?
How could we fix it?
![Page 28: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/28.jpg)
ggplot(choropleth, aes(long, lat, group = group)) +
  geom_polygon(fill = "white", colour = "grey50") +
  geom_polygon(aes(fill = prop))
![Page 29: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/29.jpg)
[Figure: choropleth map from the code on the previous slide, long (−120 to −70) against lat (30 to 45), states filled by prop (legend 0.004 to 0.01)]
![Page 30: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/30.jpg)
Problems?
What are the problems with this sort of plot?
Take one minute to brainstorm some possible issues.
![Page 31: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/31.jpg)
Problems
Big areas are the most striking. But in the US (as in most countries) big areas tend to be the least populated. The most populated areas tend to be small and dense, e.g. the East coast.
(Another, computational, problem: need to push around a lot of data to create these plots.)
![Page 32: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/32.jpg)
[Figure: proportional symbol map, long (−120 to −70) against lat (30 to 45), one point per state sized and coloured by prop (legend 0.004 to 0.010)]
![Page 33: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/33.jpg)
# Approximate each state's centre by the midpoint of its bounding box
mid_range <- function(x) mean(range(x))
centres <- ddply(states, c("state"), summarise,
  lat = mid_range(lat),
  long = mid_range(long))

bubble <- merge(juan2000, centres, by = "state")
qplot(long, lat, data = bubble, size = prop, colour = prop)

# Draw state outlines underneath the points
ggplot(bubble, aes(long, lat)) +
  geom_polygon(aes(group = group), data = states, fill = NA, colour = "grey50") +
  geom_point(aes(size = prop, colour = prop))
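A small sketch of why the mid-range is used for the centres rather than the mean. The `long` values below are made up and stand in for the boundary points of one state: a boundary with many points along one edge pulls the mean towards that edge, while the mid-range depends only on the extremes.

```r
mid_range <- function(x) mean(range(x))

# Made-up longitudes: four points crowded near the west edge, one in the east
long <- c(-10, -9, -8, -7, 0)

centre_mid  <- mid_range(long)  # midpoint of -10 and 0
centre_mean <- mean(long)       # pulled towards the crowded edge

centre_mid
# -5
centre_mean
# -6.8
```

The mid-range is only a rough centre. For irregularly shaped states it can still land in an odd place, which is a limitation of this simple approach.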
![Page 34: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/34.jpg)
Your turn
Replicate either a choropleth or a proportional symbol map with the name of your choice.
![Page 35: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/35.jpg)
Space | Time
![Page 36: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/36.jpg)
![Page 37: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/37.jpg)
Your turn
Try to create this plot yourself. What is the main difference between this plot and the previous one?
![Page 38: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/38.jpg)
juan <- subset(bnames, name == "Juan")
bubble <- merge(juan, centres, by = "state")

ggplot(bubble, aes(long, lat)) +
  geom_polygon(aes(group = group), data = states, fill = NA, colour = "grey50") +
  geom_point(aes(size = prop, colour = prop)) +
  facet_wrap(~ year)
![Page 39: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/39.jpg)
Aside: geographic data
Boundaries for most countries available from: http://gadm.org
To use with ggplot2, use the fortify function to convert the boundaries to an ordinary data frame.
Will also need to install the sp package.
![Page 40: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/40.jpg)
# install.packages("sp")
library(sp)
load(url("http://gadm.org/data/rda/CHE_adm1.RData"))

head(as.data.frame(gadm))
ch <- fortify(gadm, region = "ID_1")
str(ch)

qplot(long, lat, group = group, data = ch, geom = "polygon", colour = I("white"))
![Page 41: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/41.jpg)
![Page 42: 15 Time Space](https://reader034.vdocuments.us/reader034/viewer/2022051412/54c6631d4a79591e088b45ae/html5/thumbnails/42.jpg)
This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.