data visualization - statistical horizons · data visualization kieran healy sample slides. using...

25
Data Visualization Kieran Healy, Ph.D. Upcoming Seminar: December 1-2, 2017, Philadelphia, Pennsylvania

Upload: others

Post on 31-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

Data Visualization Kieran Healy, Ph.D.

Upcoming Seminar: December 1-2, 2017, Philadelphia, Pennsylvania

Page 2: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

DATA VISUALIZATIONKieran Healy

Sample Slides

Page 3: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

Using ggplot

Page 4: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

HOW IT WORKS

Page 5: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

gdp lifexp pop continent

340 65 31 Euro

227 51 200 Amer

909 81 80 Euro

126 40 20 Asia

Page 6: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

AsiaEuroAmer

0-3536-100

>100

log GDP

Life E

xpec

tancy

A Gapminder Plot

Continent

Population

Page 7: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

gdp lifexp pop continent

340 65 31 Euro

227 51 200 Amer

909 81 80 Euro

126 40 20 Asia

1. Tidy Data

x=gdp y=lifexp size=pop color=continent

2. Mapping 3. Geomgeom_point()ggplot(mapping = aes(x = …))

Page 8: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

x

y y

log10 x

AsiaEuroAmer

0-3536-100

>100

log GDP

Life E

xpec

tancy

A Gapminder Plot

4. Coordinate System

5. Scales 6. Labels & Guides

Continent

Population

Page 9: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

AsiaEuroAmer

0-3536-100

>100

log GDP

Life E

xpec

tancy

A Gapminder Plot

Continent

Population

Page 10: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

LAYER BY LAYER

Page 11: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

head(gapminder)

###Atibble:6×6##countrycontinentyearlifeExppopgdpPercap##<fctr><fctr><int><dbl><int><dbl>##1AfghanistanAsia195228.8018425333779.4453##2AfghanistanAsia195730.3329240934820.8530##3AfghanistanAsia196231.99710267083853.1007##4AfghanistanAsia196734.02011537966836.1971##5AfghanistanAsia197236.08813079460739.9811##6AfghanistanAsia197738.43814880372786.1134

dim(gapminder)##[1]17046

Page 12: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

p<-ggplot(data=gapminder)

Create a ggplot objectData is gapminder table

Page 13: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

p<-ggplot(data=gapminder,mapping=aes(x=gdpPercap,y=lifeExp))

Mapping: tell ggplot the relationships you want to see

Page 14: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

•The mapping=aes(...) instruction links variables to things you will see on the plot.

•The x and y values are the most obvious ones.•Other aesthetic mappings can include, e.g., color, shape, and size.

•These mappings are not directly specifying what specific, e.g., colors or shapes will be on the plot. Rather they say which variables in the data will be represented by, e.g., colors and shapes on the plot.

Page 15: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

p+geom_point()

Add a geom layer to the plot

Page 16: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909
Page 17: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

p+geom_smooth()

Try a different geom

Page 18: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909
Page 19: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

This process is literally additive

p+geom_point()+geom_smooth(method='gam')+scale_x_log10(labels=scales::dollar)

Page 20: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

p+geom_point()+geom_smooth(method=“lm”)

Every geom is a function. It can take arguments.

Page 21: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909
Page 22: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

Keep Layering

p<-ggplot(data=gapminder,mapping=aes(x=gdpPercap,y=lifeExp))p+geom_point()+geom_smooth(method='gam')+scale_x_log10()

Page 23: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909
Page 24: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909

p+geom_point()+geom_smooth(method='gam')+scale_x_log10(labels=scales::dollar)+labs(x="GDPPerCapita",y="LifeExpectancyinYears",title="EconomicGrowthandLifeExpectancy",subtitle="Datapointsarecountry-years")

Page 25: Data Visualization - Statistical Horizons · DATA VISUALIZATION Kieran Healy Sample Slides. Using ggplot. HOW IT WORKS. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909