TRANSCRIPT
-
Data Science: Data Visualization Boot Camp
Relationship: Bubble Plot
Chuck Cartledge, PhD
24 January 2020
-
Type Sample data Hands on Q & A Conclusion References Files
Table of contents (1 of 1)
1 Type
    Uses
    General considerations
2 Sample data
3 Hands on
4 Q & A
5 Conclusion
6 References
7 Files
-
A definition
“A bubble graph is a variation of a point or line graph where the data points (dots) have been replaced by circles (bubbles). The major advantage of a bubble graph versus a point or line graph is the ability to encode one or more additional variables by means of the bubble symbol. Bubble graphs might be two or three dimensional, . . . ”
R. L. Harris [1]
-
R supplied data set (1 of 2)
Included in the R package ggplot2.
“This dataset contains a subset of the fuel economy data that the EPA makes available on fueleconomy.gov. It contains only models which had a new release every year between 1999 and 2008 - this was used as a proxy for the popularity of the car.”
H. Wickham [2]
library(ggplot2)
?mpg
head(mpg)
Resulting in:
-
R supplied data set (2 of 2)
# A tibble: 6 x 11
  manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class
1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     comp
2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     comp
3 audi         a4      2    2008     4 manual(m6) f        20    31 p     comp
4 audi         a4      2    2008     4 auto(av)   f        21    30 p     comp
5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     comp
6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     comp
-
More recent mileage data
Downloaded from: https://www.fueleconomy.gov/feg/download.shtml
Described at: https://www.fueleconomy.gov/feg/ws/index.shtml#vehicle
We will:
1 Extract csv data from a zip file (39,865 rows)
2 Select certain makes (attempt to replicate the sample data)
3 Display different data for selected makes/models
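The first two steps above can be sketched as follows. This is a minimal sketch, not the presenter's code: the helper names `read_zipped_csv` and `select_makes` are hypothetical, and the archive and member names mentioned in the comments (`vehicles.csv.zip`, `vehicles.csv`) are assumptions about the download. The `make` and `highway08` columns are the ones used by the plotting code later in the deck; the toy rows exist only so the example runs without the download.

```r
# Hypothetical helper: read one CSV member directly out of a zip archive.
# unz() opens a connection to a file inside the archive; read.csv() opens
# and closes that connection itself.
read_zipped_csv <- function(zip_path, csv_name) {
  con <- unz(zip_path, csv_name)
  read.csv(con, stringsAsFactors = FALSE)
}

# Hypothetical helper: keep only the rows whose make is in the wanted set.
select_makes <- function(vehicles, makes) {
  vehicles[vehicles$make %in% makes, , drop = FALSE]
}

# Assumed usage with the real download (file names are assumptions):
#   vehicles <- read_zipped_csv("vehicles.csv.zip", "vehicles.csv")

# Toy stand-in with made-up values, used here only to exercise select_makes().
toy <- data.frame(
  make      = c("Honda", "Ford", "Audi", "Honda"),
  highway08 = c(36, 28, 30, 34),
  stringsAsFactors = FALSE
)
selected <- select_makes(toy, c("Honda", "Ford"))
print(selected)
```

Keeping the selection in its own function makes it easy to change the list of makes without touching the file-reading step.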
-
The first codes. (1 of 3)
-
The first codes. (2 of 3)
rm(list=ls())
library(ggplot2)
data(mpg, package="ggplot2")
mpg_select
-
The first codes. (3 of 3)
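# Scatter plot, coloured by manufacturer
g + geom_point(aes(col=manufacturer))
# Jitter the points to reduce overplotting
g + geom_jitter(aes(col=manufacturer))
# Bubble plot: point size encodes highway mpg, one linear fit per manufacturer
g + geom_jitter(aes(col=manufacturer, size=hwy)) +
  geom_smooth(aes(col=manufacturer), method="lm", se=FALSE)
# Same plot with readable legend titles
g + geom_jitter(aes(col=manufacturer, size=hwy)) +
  geom_smooth(aes(col=manufacturer), method="lm", se=FALSE) +
  labs(size = "Highway\n mpg",
       colour = "Brand")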
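The plotting calls above rely on `mpg_select` and `g`. One way they might be defined is sketched below, under stated assumptions: the list of manufacturers is illustrative only, and the axis mapping (`displ` against `cty`) and labels are borrowed from the labels used in the second example rather than confirmed for this one.

```r
library(ggplot2)
data(mpg, package = "ggplot2")

# Assumption: an illustrative subset of manufacturers, not the presenter's list.
makes <- c("audi", "ford", "honda", "hyundai")
mpg_select <- mpg[mpg$manufacturer %in% makes, ]

# Assumption: displacement on x and city mileage on y.
g <- ggplot(mpg_select, aes(x = displ, y = cty)) +
  labs(title = "Bubble chart",
       x = "Engine displacement (liters)",
       y = "City mpg")

# Bubble plot: colour encodes manufacturer, bubble size encodes highway mpg.
bubble <- g +
  geom_jitter(aes(col = manufacturer, size = hwy)) +
  geom_smooth(aes(col = manufacturer), method = "lm", se = FALSE) +
  labs(size = "Highway\nmpg", colour = "Brand")
```

Printing `bubble` (or calling `print(bubble)` inside a script) draws the plot.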
-
The second codes. (1 of 4)
-
The second codes. (2 of 4)
rm(list=ls())
library(ggplot2)
saveFileName
-
The second codes. (3 of 4)
  labs(subtitle="mpg: Displacement vs City Mileage",
       title="Bubble chart",
       x="Engine displacement (liters)",
       y="City mpg",
       color="Manufacturer")
g + geom_point()
g + geom_point(aes(col=make))
g + geom_jitter(aes(col=make))
g + geom_jitter(aes(col=make, size=highway08)) +
  geom_smooth(aes(col=make), method="lm", se=FALSE) +
  labs(size = "Highway\n mpg",
       colour = "Brand")
-
The second codes. (4 of 4)
-
The third codes. (1 of 4)
-
The third codes. (2 of 4)
rm(list=ls())
library(ggplot2)
saveFileName
-
The third codes. (3 of 4)
" and average CO2 over time"),
       title="Bubble chart",
       x="Year",
       y="City mpg",
       caption = paste0("Idea taken from \"Practical",
                        " Statistics for Data",
                        " Scientists\", Bruce and Bruce."
                        )
       )
g + geom_point()
g + geom_jitter()
g + geom_jitter(aes(size=co2,
                    shape=as.factor(cylinders)
                    ), alpha=0.5) +
  geom_smooth(colour="green", method="lm", se=FALSE) +
-
The third codes. (4 of 4)
  labs(size = "CO2\nmeasurements",
       shape = "Number\nof cylinders"
       )
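To try the final call end to end without the fuel economy download, the sketch below uses a small stand-in data frame. Everything about the data is an assumption: the name `honda_toy`, the column name `city`, and all values are made up purely so the code runs. Only `co2`, `cylinders`, the axis labels, and the layer settings come from the code above.

```r
library(ggplot2)

# Hypothetical stand-in data with made-up values (not real measurements).
honda_toy <- data.frame(
  year      = rep(2010:2015, each = 2),
  city      = c(24, 18, 25, 18, 26, 19, 27, 19, 28, 20, 29, 20),
  co2       = c(350, 450, 345, 445, 340, 440, 330, 435, 320, 430, 310, 425),
  cylinders = rep(c(4, 6), times = 6)
)

g <- ggplot(honda_toy, aes(x = year, y = city)) +
  labs(title = "Bubble chart", x = "Year", y = "City mpg")

# Bubble size encodes CO2, point shape encodes the number of cylinders,
# and a single green linear trend line is drawn without a confidence band.
co2_plot <- g +
  geom_jitter(aes(size = co2, shape = as.factor(cylinders)), alpha = 0.5) +
  geom_smooth(colour = "green", method = "lm", se = FALSE) +
  labs(size = "CO2\nmeasurements", shape = "Number\nof cylinders")
```

Converting `cylinders` with `as.factor()` matters here because the shape aesthetic needs a discrete variable.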
-
Hands-on
1 The supervisor would like to see the effect of different “default” themes on the first plot. Show how to use the gray, linedraw, and classic themes.
2 The CO2 plot displays data for Hondas only. Change the data selection command to include Fords, and discuss how the resulting plot could be improved.
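For the first exercise, one possible starting point is sketched below. It assumes the full `mpg` data set rather than the presenter's selection, and the names `base_plot` and `themed` are hypothetical. `theme_gray()`, `theme_linedraw()`, and `theme_classic()` are complete themes shipped with ggplot2.

```r
library(ggplot2)
data(mpg, package = "ggplot2")

# Assumption: a bubble plot on the whole mpg data set as the "first plot".
base_plot <- ggplot(mpg, aes(x = displ, y = cty)) +
  geom_jitter(aes(col = manufacturer, size = hwy))

# Adding a complete theme replaces the default look of the whole plot.
themed <- list(
  gray     = base_plot + theme_gray(),
  linedraw = base_plot + theme_linedraw(),
  classic  = base_plot + theme_classic()
)

# Print each variant in turn, e.g. themed$classic
```

Storing the variants in a named list makes it easy to show them to the supervisor one after another.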
-
Q & A time.
Q: How many Harvard MBAs does it take to screw in a light bulb?
A: Just one. He grasps it firmly and the universe revolves around him.
-
What have we covered?
Bubble plots:
Require slightly more thought and consideration than scatter plots
Are used to show three or more related data sets
Are good for showing gross differences in the third dimension
Next: Columnar histograms (how grouping data can show patterns)
-
References (1 of 1)
[1] Robert L. Harris, Information Graphics: A Comprehensive Illustrated Reference, Oxford University Press, 2000.
[2] Hadley Wickham, ggplot2: Elegant Graphics for Data Analysis, Springer-Verlag New York, 2009.
-
Files of interest
1 Code snippet to create images in this presentation
2 Extract Federal fuel data