social data science
Data Visualization

Sebastian Barfort
August 08, 2016

University of Copenhagen
Department of Economics



Who’s ahead in the polls?


What values are displayed in this chart?


The answer may surprise you


goals

What are you trying to accomplish?

1. Who’s the audience?

· Exploratory (use defaults) vs. explanatory (customize)
· Raw data vs. model results

2. Graphs should be self-explanatory
3. A graph is a narrative - should convey key point(s)


three basic principles

Schwabish (2014)

1. Show the data (many graphs show too much)
2. Reduce the clutter
3. Integrate text and graph


your turn

Discuss in groups

1. Look at the eight transformed graphs in Schwabish (2014). What do you think about the transformations?

2. Are there other objectives of data visualization not mentioned by Schwabish?

3. If you were to improve on one of the eight transformed graphs, which one would you choose and how would you change it?


other considerations

Color palette

Is your data numeric, binary, categorical, text?

Should you truncate your y axis?

Remember

Visualizing data is not just a matter of good taste

Basic perceptual processes play a very strong role


some things i think work well

Small multiples

Tufte

“Illustrations of postage-stamp size are indexed by category or a label, sequenced over time like the frames of a movie, or ordered by a quantitative variable not used in the single image itself.”


“lollipops” and “dumbbells”


slopegraphs


data visualization software

Static

1. Base R
2. lattice
3. ggplot2

Interactive

1. D3 (JavaScript)
2. htmlwidgets (R)
3. Tableau (commercial)


ggplot2


why?

Positives

· Great defaults
· Intuitive
· Extremely well documented

Negatives

· Slow
· Limited functionality


grammar of graphics

ggplot2 is based on the grammar of graphics, the idea that you can build every graph from the same few components: a dataset, a set of geoms (visual marks that represent data points), and a coordinate system.

You start with your data, and then you assign a geometry to elements of that data, such as circle size to population, then you draw those geometries based upon some scaling of your data. When you think about visualization this way it helps you develop a better understanding of the data itself and think of proper ways to visualize it.
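
As a rough illustration of these components, the sketch below builds one plot step by step. The `cities` data frame and all of its values are made up for this example; only the ggplot2 functions are real.

```r
library(ggplot2)

# Hypothetical toy data: three cities with invented values.
cities <- data.frame(
  city       = c("A", "B", "C"),
  income     = c(30, 45, 38),
  life_exp   = c(78, 82, 80),
  population = c(0.5, 2.0, 1.2)  # millions
)

# 1. Data plus aesthetic mappings: which column drives which visual property.
p <- ggplot(cities, aes(x = income, y = life_exp, size = population))

# 2. Geom: the visual mark drawn for each row (here, a circle).
p <- p + geom_point()

# 3. Scale: how population values are translated into circle areas.
p <- p + scale_size_area(max_size = 10)

# 4. Coordinate system: Cartesian is the default, stated here for clarity.
p <- p + coord_cartesian()

# print(p) draws the plot.
```

Each `+` adds one component, so any step can be swapped (a different geom, a different scale) without touching the others.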


a pew visualization of partisanship

The Pew Research Center did an interesting visualization of political polarization in the U.S.

Polarization

The new survey finds that as ideological consistency has become more common, it has become increasingly aligned with partisanship. Looking at 10 political values questions tracked since 1994, more Democrats now give uniformly liberal responses, and more Republicans give uniformly conservative responses than at any point in the last 20 years.

Bob Rudis recreated their plot in ggplot2. Let’s see how.


basics

ggplot works by building your plot piece by piece.

First we need to load some data into R.

library("readr")
# Build the URL of the raw CSV file on GitHub from its parts
gh.link = "https://raw.githubusercontent.com/"
user.repo = "sebastianbarfort/sds_summer/"
branch = "gh-pages/"
link = "data/polarization.csv"
data.link = paste0(gh.link, user.repo, branch, link)
df = read_csv(data.link)
names(df)

## [1] "x" "year" "party" "pct"


inspecting the data

## [1] "x" "year" "party" "pct"

  x   year  party   pct
-10   1994  Dem    0.57
 -9   1994  Dem    1.60
 -8   1994  Dem    1.89
 -7   1994  Dem    3.49
 -6   1994  Dem    3.96
 -5   1994  Dem    6.56


ggplot2 basics

ggplot works by building your plot piece by piece.

Then we tell ggplot what pieces of the data frame we are interested in.

We create an object called p containing this information.

Here, x = x and y = pct say what will go on the x and the y axes.

These are aesthetic mappings that connect pieces of the data to things we can actually see on a plot.


library("ggplot2")
# Map x to the horizontal axis and pct to the vertical axis
p = ggplot(data = df, aes(x = x, y = pct))


aesthetics, but no geoms

The plot is so far just a frame with no actual information

[Figure: empty plot frame; x axis: x (-10 to 10), y axis: pct (0 to 10)]
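
One way to see why the frame is empty is to inspect the plot object itself. The sketch below uses a made-up three-row data frame (`toy`) in place of the polarization data, so it runs without a download; the variable names are chosen only for this example.

```r
library(ggplot2)

# Hypothetical stand-in for the polarization data (values invented).
toy <- data.frame(x = c(-1, 0, 1), pct = c(2, 5, 3))

p <- ggplot(data = toy, aes(x = x, y = pct))

# The aesthetic mappings are stored, but no layer exists yet.
mapped          <- names(p$mapping)
n_layers_before <- length(p$layers)

# Adding a geom creates the first layer, which is what actually gets drawn.
p <- p + geom_line()
n_layers_after <- length(p$layers)

mapped           # "x" "y"
n_layers_before  # 0
n_layers_after   # 1
```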


let’s add a line

p = p + geom_line()

[Figure: line plot produced by geom_line(); x axis: x (-10 to 10), y axis: pct (0 to 10)]

problem

We need to inform ggplot that Democrats and Republicans are two separate groups.

This can be done using fill, group, shape, size or color.

p = ggplot(data = df,
           aes(x = x, y = pct,
               fill = party, color = party))
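
To check what mapping a discrete variable does to the grouping, `layer_data()` can be used to look at the group ggplot2 assigns to each row. This is a minimal sketch on an invented data frame (`toy`), not on the polarization data.

```r
library(ggplot2)

# Hypothetical data: three x positions for each of two parties.
toy <- data.frame(
  x     = rep(c(-1, 0, 1), times = 2),
  pct   = c(2, 5, 3, 1, 4, 6),
  party = rep(c("Dem", "REP"), each = 3)
)

# Without a discrete aesthetic, every row falls into one group,
# so geom_line() draws a single path through all six points.
p_single <- ggplot(toy, aes(x = x, y = pct)) + geom_line()

# Mapping color to party splits the rows into one group per party.
p_split <- ggplot(toy, aes(x = x, y = pct, color = party)) + geom_line()

n_groups_single <- length(unique(layer_data(p_single, 1)$group))
n_groups_split  <- length(unique(layer_data(p_split, 1)$group))

n_groups_single  # 1
n_groups_split   # 2
```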


try again

p = p + geom_line()

[Figure: line plot with one line per party; x axis: x (-10 to 10), y axis: pct (0 to 10); legend: party (Dem, REP)]

thoughts

1. Shading below the line
2. Filling should be transparent
3. Custom filling
4. Axes should be modified
5. Subset by year
6. Axis title
7. Background
8. Legend


1. shading below the line

p = p + geom_ribbon(aes(ymin = 0, ymax = pct))

[Figure: lines with filled ribbons below them; x axis: x (-10 to 10), y axis: pct (0 to 10); legend: party (Dem, REP)]

2. filling should be transparent

p = p + geom_ribbon(aes(ymin = 0, ymax = pct),
                    alpha = .5)
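
Because `+` appends layers, running step 1 and step 2 in sequence on the same `p` would leave the opaque ribbon from step 1 underneath the transparent one. A minimal sketch of the difference, on a made-up data frame (`toy`), might look like this:

```r
library(ggplot2)

# Hypothetical data: three x positions for each of two parties.
toy <- data.frame(
  x     = rep(c(-1, 0, 1), times = 2),
  pct   = c(2, 5, 3, 1, 4, 6),
  party = rep(c("Dem", "REP"), each = 3)
)

base <- ggplot(toy, aes(x = x, y = pct, fill = party, color = party)) +
  geom_line()

# Adding the ribbon twice stacks an opaque layer under the transparent one.
stacked <- base +
  geom_ribbon(aes(ymin = 0, ymax = pct)) +
  geom_ribbon(aes(ymin = 0, ymax = pct), alpha = .5)

# Starting again from the base plot keeps only the transparent ribbon.
single <- base +
  geom_ribbon(aes(ymin = 0, ymax = pct), alpha = .5)

length(stacked$layers)  # 3: line, opaque ribbon, transparent ribbon
length(single$layers)   # 2: line, transparent ribbon
```

If the fill does not look transparent, rebuilding `p` from the `ggplot()` call before adding the ribbon with `alpha` is one way to rule this out.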

[Figure: lines with ribbons drawn at alpha = .5; x axis: x (-10 to 10), y axis: pct (0 to 10); legend: party (Dem, REP)]

3. custom filling

p = p +
  scale_color_manual(
    name = NULL,
    values = c(Dem = "#728ea2", REP = "#cf6a5d"),
    labels = c(Dem = "Democrats", REP = "Republicans")) +
  scale_fill_manual(
    name = NULL,
    values = c(Dem = "#728ea2", REP = "#cf6a5d"),
    labels = c(Dem = "Democrats", REP = "Republicans")) +
  guides(color = "none",
         fill = guide_legend(override.aes = list(alpha = 1)))

[Figure: ribbons in the custom colors; x axis: x (-10 to 10), y axis: pct (0 to 10); legend: Democrats, Republicans]

4. axes should be modified

p = p +
  scale_x_continuous(
    expand = c(0, 0), breaks = c(-8, 0, 8),
    labels = c("Consistently\nliberal",
               "Mixed",
               "Consistently\nconservative")) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 12))

[Figure: y axis from 0 to 12; x axis labelled Consistently liberal, Mixed, Consistently conservative; axis titles: x, pct; legend: Democrats, Republicans]

5. subset by year

p = p + facet_wrap(~ year, ncol = 2,
                   scales = "free_x")
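
Faceting is how ggplot2 produces small multiples: the same plot is repeated once per value of the faceting variable. The sketch below shows this on invented data (`toy`, with two made-up survey years) and counts the resulting panels.

```r
library(ggplot2)

# Hypothetical data: 3 x positions, 2 parties, 2 years (12 rows).
toy <- expand.grid(x = c(-1, 0, 1),
                   party = c("Dem", "REP"),
                   year = c(1994, 2014))
toy$pct <- c(2, 5, 3, 1, 4, 6, 6, 3, 1, 1, 3, 7)

p <- ggplot(toy, aes(x = x, y = pct, color = party)) +
  geom_line() +
  facet_wrap(~ year, ncol = 2)

# Each row of the layer data records the panel it is drawn in.
n_panels <- length(unique(layer_data(p, 1)$PANEL))
n_panels  # 2: one panel per year
```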

[Figure: one panel per year (1994, 1999, 2001, 2004, 2014), each with y axis from 0 to 12 and x axis labelled Consistently liberal, Mixed, Consistently conservative; axis titles: x, pct; legend: Democrats, Republicans]

6. axis title

p = p + labs(x = NULL,
             y = NULL,
             title = "Polarization, 1994-2014")


7. background

p = p + theme_minimal()


8. legend

p = p +
  theme(legend.position = c(0.75, 0.1)) +
  theme(legend.direction = "horizontal") +
  theme(axis.text.y = element_blank())
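
A note for readers on recent ggplot2 releases: from version 3.5.0, passing coordinates directly to `legend.position` is deprecated in favour of `legend.position = "inside"` together with `legend.position.inside`. The sketch below picks whichever form matches the installed version; the `toy` data frame is invented for the example.

```r
library(ggplot2)

# Hypothetical data frame standing in for the polarization data.
toy <- data.frame(
  x     = rep(c(-1, 0, 1), times = 2),
  pct   = c(2, 5, 3, 1, 4, 6),
  party = rep(c("Dem", "REP"), each = 3)
)

p <- ggplot(toy, aes(x = x, y = pct, fill = party)) +
  geom_ribbon(aes(ymin = 0, ymax = pct), alpha = .5)

# Choose the legend placement syntax that matches the installed version.
if (utils::packageVersion("ggplot2") >= "3.5.0") {
  inside <- theme(legend.position = "inside",
                  legend.position.inside = c(0.75, 0.1))
} else {
  inside <- theme(legend.position = c(0.75, 0.1))
}

p <- p + inside + theme(legend.direction = "horizontal")
```

The coordinates are fractions of the plot area, so `c(0.75, 0.1)` places the legend towards the bottom right.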


output

[Figure: final plot titled "Polarization, 1994-2014" with panels for 1994, 1999, 2001, 2004 and 2014; x axis labelled Consistently liberal, Mixed, Consistently conservative; legend: Democrats, Republicans]


visualizing polls

How do you visualize polls?

New polls generate a lot of news stories

But much of it is probably noise

How to deal with this visually?


berlingske


alternative

A poll is not really a number

It’s a distribution

Usually, that distribution is normal

So let’s draw the distribution
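
As a hedged sketch of what "drawing the distribution" can mean, the code below turns a single poll estimate into a normal density curve. The estimate of 23.6% and the sample size of 1000 are assumptions made for this example, and the standard error uses the usual simple-random-sample formula, which real polls only approximate.

```r
library(ggplot2)

# Hypothetical poll: 23.6% support among n = 1000 respondents.
estimate <- 23.6
n        <- 1000
p_hat    <- estimate / 100

# Standard error of a proportion, expressed in percentage points.
se <- 100 * sqrt(p_hat * (1 - p_hat) / n)

# Evaluate the normal density on a grid of plausible support levels.
grid <- seq(estimate - 4 * se, estimate + 4 * se, length.out = 200)
poll <- data.frame(
  support = grid,
  density = dnorm(grid, mean = estimate, sd = se)
)

p <- ggplot(poll, aes(x = support, y = density)) +
  geom_line()
```

The curve peaks at the reported estimate and spreads out in proportion to the standard error, so a smaller sample gives a visibly wider curve.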


read data

library("readr")
gh.link = "https://raw.githubusercontent.com/"
user.repo = "sebastianbarfort/sds_summer/"
branch = "gh-pages/"
link = "data/polls_tidy.csv"
data.link = paste0(gh.link, user.repo, branch, link)
df = read_csv(data.link)


First five rows/columns

predicted   density    estimate  ci       Parti
12.83318    6.60e-06   23.6      2.39159  Socialdemokraterne
12.92999    7.90e-06   23.6      2.39159  Socialdemokraterne
13.02681    9.50e-06   23.6      2.39159  Socialdemokraterne
13.12363    1.14e-05   23.6      2.39159  Socialdemokraterne
13.22045    1.36e-05   23.6      2.39159  Socialdemokraterne


your turn

Inspect the dataset

What do you think the variables predicted and density mean?


visualization

p = ggplot(df, aes(x = predicted, y = density))


p = p + geom_line()


p = ggplot(df, aes(x = predicted, y = density,
                   group = group, color = Parti))
p = p + geom_line()


p = ggplot(df, aes(x = predicted, y = density,
                   group = group, color = Parti))
p = p +
  geom_line() +
  facet_grid(party.ord ~ .,
             scales = "free_y",
             switch = "y")


new strategy

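# Split the data: all earlier polls vs. the most recent poll
df.1 = subset(df, date != max(date))
df.2 = subset(df, date == max(date))
p = ggplot()
# Draw the earlier polls as faint grey curves, one row of panels per party
p = p +
  geom_line(data = df.1,
            aes(x = predicted, y = density,
                group = group),
            color = "grey50", alpha = .25) +
  facet_grid(party.ord ~ .,
             scales = "free_y", switch = "y")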


add latest poll

p = p + geom_line(data = df.2,
                  aes(x = predicted, y = density,
                      group = group),
                  colour = "red", size = 1)
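
The two steps above rely on giving each layer its own `data` argument instead of one data frame for the whole plot. The sketch below reproduces that pattern on invented polls for a single party; the helper `make_poll()`, the dates and the estimates are all hypothetical. It uses `linewidth`, which assumes ggplot2 3.4.0 or later (older versions use `size` for lines).

```r
library(ggplot2)

# Hypothetical helper: one normal density curve per poll date.
make_poll <- function(date, estimate, se = 1.3) {
  grid <- seq(15, 32, length.out = 100)
  data.frame(
    date      = as.Date(date),
    group     = date,
    predicted = grid,
    density   = dnorm(grid, mean = estimate, sd = se)
  )
}

polls <- rbind(
  make_poll("2016-06-01", 22.8),
  make_poll("2016-07-01", 24.1),
  make_poll("2016-08-01", 23.6)
)

# Older polls become background context; the latest poll is highlighted.
older  <- subset(polls, date != max(date))
latest <- subset(polls, date == max(date))

p <- ggplot() +
  geom_line(data = older,
            aes(x = predicted, y = density, group = group),
            color = "grey50", alpha = .25) +
  geom_line(data = latest,
            aes(x = predicted, y = density, group = group),
            color = "red", linewidth = 1)
```

Setting `color` outside `aes()` fixes it for the whole layer, which is why no legend is needed here.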


details i

p = p +
  scale_x_continuous(breaks = scales::pretty_breaks(10)) +
  theme_minimal() +
  theme(strip.text.y = element_text(angle = 180,
                                    face = "bold",
                                    vjust = 0.75,
                                    hjust = 1),
        axis.text.y = element_blank()) +
  labs(y = NULL, x = NULL)


output

[Figure: one row of density curves per party (Socialdemokraterne, Dansk Folkeparti, Venstre, Enhedslisten, Liberal Alliance, Alternativet, Radikale Venstre, SF, Konservative); x axis from 0 to 35]


data

library("readr")
gh.link = "https://raw.githubusercontent.com/"
user.repo = "sebastianbarfort/sds_summer/"
branch = "gh-pages/"
link = "data/basket_tidy.csv"
data.link = paste0(gh.link, user.repo, branch, link)
df = read_csv(data.link)


MP  3P  agg.three  agg.minutes  games  player
35  5   5          35           1      Stephen Curry, 2015-16
27  4   9          62           2      Stephen Curry, 2015-16
35  8   17         97           3      Stephen Curry, 2015-16
28  4   21         125          4      Stephen Curry, 2015-16
32  7   28         157          5      Stephen Curry, 2015-16
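
In the rows shown, `agg.three` and `agg.minutes` are the running totals of `3P` and `MP`. The sketch below rebuilds them from the five rows in the table with `cumsum()`; how the course data were actually prepared is not shown in the slides, so this is only one plausible way to derive them.

```r
# First five games, copied from the table above.
curry <- data.frame(
  MP   = c(35, 27, 35, 28, 32),
  `3P` = c(5, 4, 8, 4, 7),
  check.names = FALSE  # keep the column name "3P" as it is
)

# Running totals over games played.
curry$agg.three   <- cumsum(curry$`3P`)
curry$agg.minutes <- cumsum(curry$MP)
curry$games       <- seq_len(nrow(curry))

curry$agg.three    # 5 9 17 21 28
curry$agg.minutes  # 35 62 97 125 157

# With several players in one data frame, accumulate within each player:
# df$agg.three <- ave(df$`3P`, df$player, FUN = cumsum)
```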


plot

p = ggplot(data = df,
           aes(x = games,
               y = agg.three,
               color = as.factor(steph.ind),
               group = player))


add steps

p = p + geom_step(size = 1)

[Figure: step plot; x axis: games (0 to 80), y axis: agg.three (0 to 400); legend: as.factor(steph.ind) with levels 0, 1, 2]

fix scales, etc.

p = p +
  scale_x_continuous(expand = c(0, 0),
                     limits = c(0, 160),
                     breaks = c(0, 20, 40, 60, 80)) +
  scale_y_continuous(expand = c(0, 0)) +
  theme_minimal()

[Figure: step plot with adjusted scales; x axis: games (breaks at 0 to 80), y axis: agg.three (0 to 400); legend: as.factor(steph.ind) with levels 0, 1, 2]

fix color and axis labels

p = p +
  scale_color_manual(
    values = c("grey50", "grey50", "darkblue")) +
  labs(x = "Games",
       y = "# of threes",
       title = "Stephen Curry's 3-Point Record")


theme changes

p = p +
  theme(legend.position = "none",
        axis.ticks.x = element_blank(),
        axis.line.x = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())

[Figure: step plot titled "Stephen Curry's 3-Point Record"; x axis: Games (0 to 80), y axis: # of threes (0 to 400)]

labels

https://github.com/slowkow/ggrepel


preparation

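library("dplyr")  # provides %>% and filter()
# Keep the last point of Curry's 2015-16 line, where the label will sit
data.max = df %>%
  filter(player == "Stephen Curry, 2015-16") %>%
  filter(agg.three == max(agg.three))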


label

library("ggrepel")
p = p +
  geom_text_repel(
    data = data.max,
    aes(label = player,
        color = as.factor(steph.ind)),
    size = 3,
    nudge_x = 24,
    segment.size = 0
  )


output


load data

library("readr")
gh.link = "https://raw.githubusercontent.com/"
user.repo = "sebastianbarfort/sds_summer/"
branch = "gh-pages/"
link = "data/armslist.csv"
data.link = paste0(gh.link, user.repo, branch, link)
df = read_csv(data.link)


create ggplot

p = ggplot(data = df, aes(x = price))


your turn

Discuss in groups

1. What is interesting about these data?
2. How would you visualize the distribution of the price variable?
3. Search the ggplot2 documentation and create a visualization
4. Discuss how you would improve your visualization
5. Look at Bob Rudis’ plots. Would you change anything? Do you think they work?
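
One possible starting point for question 2 is a histogram. The sketch below uses simulated, right-skewed prices (the `listings` data frame is invented; it is not the armslist data, whose actual shape is not shown in these slides) and a log scale on the x axis, which is a common choice when a few very large values dominate.

```r
library(ggplot2)

set.seed(1)
# Simulated prices from a log-normal distribution; NOT the armslist data.
listings <- data.frame(price = rlnorm(500, meanlog = 6, sdlog = 1))

p <- ggplot(data = listings, aes(x = price)) +
  geom_histogram(bins = 30) +
  scale_x_log10()

# The binned counts that the histogram will draw.
binned <- layer_data(p, 1)
```

Swapping `geom_histogram()` for `geom_density()` gives a smoothed alternative for the same mapping.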


Maps


making maps

There are many ways to make maps in R

1. easy: use an existing package
2. hard: learn how to work with shapefiles (we don’t have time for this today, but I strongly recommend reading these notes on the topic)


map packages

There are many useful packages for making maps in R

· maps: all kinds of maps
· ggcounty: generate United States county maps
· ggmap: extends ggplot2 for maps
· mapDK: maps of Denmark
