an introduction to ggplot2 · 2016. 7. 11. · an introduction to ggplot2 claude renaux seminar for...
TRANSCRIPT
An introduction to ggplot2
Claude Renaux
Seminar for Statistics, ETH Zurich
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 1 / 33
Table of Contents
1 Overview
2 Selection of geoms
3 Multiple Layers and Faceting
4 Aesthetics
5 Tailoring Graphics
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 2 / 33
ggplot2: plots of one variable
In this section we will . . .
. . . get started with ggplot2
. . . look at plots of one variable
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 3 / 33
Must have
Useful cheatsheet: https://www.rstudio.com/resources/cheatsheets/ (pickData Visualisation with ggplot2)
Source: link above. This image is under Creative Commons licence.
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 4 / 33
Why ggplot2?
To mention some advantages:
nice labels
nice colors
small margins
beautiful faceting or multipanel plots
very powerful and flexible: we’ll have a glimpse at the grammar ofgraphics
we can change or update plots
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 5 / 33
Why ggplot2?
To mention some disadvantages:
ggplot2 can only deal with data.frames
default plots of model outputs are normally not possible
ggplot2 is not optimized for the best speed performance
3-D plots are not possible
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 6 / 33
Overview: plots of one variable
One continuous variable
histogram: geom_histogram()
densities: geom_density()
frequency plot: geom_freqpoly()
One discrete variable
barplot: geom_bar()
pie plot: different coordinate system of barplot...
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 7 / 33
Package: ggplot2
There are two important functions.
qplot: similar to base plotting functions
ggplot: look at this function closer
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 8 / 33
Let’s have a look — mpg data set
Let’s look at the mpg data set from ggplot2. It contains 234 observationsabout the fuel efficiency of 38 popular cars in 1999 and 2008.> ?mpg
> str(mpg)
Let’s look at the democode.
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 9 / 33
ggplot2: plots of two variables
In this section we will . . .
. . . glimpse at grammar of graphics underlying ggplot2
. . . create plots with two variables
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 10 / 33
Glimpse at grammar of graphics
The ”gg” in ggplot2 stands for grammar of graphics which is based onWilkinson’s (2005) grammar of graphics.
The layered grammar is useful because . . .
it is a generic way of creating a plot
do not relay on specific or customized graphic for a particular problem
iteratively update or create a plot
add layers
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 11 / 33
Glimpse at grammar of graphics
Idea: all the plots can be build from the same components
data set
geoms (geometric object) — visually representing observations
aes (aesthetics) — aesthetics properties of the geometric objects
scale — defining how the data is plotted
coordinate system
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 12 / 33
Glimpse at grammar of graphics
Source: https://www.rstudio.com/resources/cheatsheets/ of the Data Visualization
Cheat Sheet. This image is under Creative Commons licence.
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 13 / 33
Glimpse at grammar of graphics
Source: https://www.rstudio.com/resources/cheatsheets/ of the Data Visualization
Cheat Sheet. This image is under Creative Commons licence.
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 14 / 33
Overview: plots of two variables
Two continuous variables
scatter plot: geom_point()
scatter plot using jitter: geom_jitter()
smoother: geom_smooth()
Discrete x and continuous y
boxplot: geom_boxplot()
bar plot: geom_bar(stat = "identity")
Continuous function like time series
line plot: geom_line()
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 15 / 33
Let’s have a look — mpg data set
Let’s look at the democode.
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 16 / 33
ggplot2: multiple layers and faceting
In this section we will . . .
. . . look at multiple layers
. . . consider faceting or multipanel conditioning plots
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 17 / 33
Multiple layers
Add multiple layers> # variable cty is the city miles per gallon
> ggplot(data = mpg, aes(x = hwy, y = cty)) +
+ geom_jitter() +
+ geom_smooth() +
+ geom_rug(sides = "bl", position = "jitter")
●
●
●
●
●
●●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●●●
●●
●
●
●
●●
●
●
●
●●
●●
●
●●
●●
●
●
●
● ●●
●
●●
● ●
●
●
●
●
●●
●
●●
●
●
●●
●
●●●
●
●
●●
●●●
●●
● ●●
●
●
●●
●
●
● ●●●
●
●
●
●
●
●
●
●●
●
● ●
●●
●●
●● ●
●●
●●
●●
● ●●
●
●
●
● ●
● ●
●●
●●
●
●●●
●
●
●●
●●
●
●●
●●
●
●
●
●
●●
●
●●
●●
●
●
●
● ●●
●●
●●
●
●
●●
●
●
●● ●●
● ●
●
●● ●
●
●● ●
●●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
10
20
30
20 30 40hwy
cty
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 18 / 33
Faceting
Let’s look a multi-panel plots> # subset of the mpg data set
> mpg.small <- subset(mpg, manufacturer %in%
+ c("ford", "land rover", "toyota",
+ "chevrolet", "honda", "volkswagen"))
> ggplot(data = mpg.small, aes(x = hwy, y = cty)) +
+ geom_jitter() +
+ facet_wrap( ~ manufacturer)
●
●
●●●
●●
●●●
●
● ●
●
●
●
● ●●
●●●
●●
● ●●●●●●● ●
●
●
●●●
●● ● ●●
●
●
●●
●●
●●●
●
●●●
●
●●
●●●
●
●● ●●
●●●
● ● ●●
●● ●
● ●
●
●
●
●
●
●●
●
● ●●●
●●
●●
●
●
●
●
●●●●
●●
●
●
●
● ● ●●
●●
●
●
●●
chevrolet ford honda
land rover toyota volkswagen
10
15
20
25
30
35
10
15
20
25
30
35
20 30 40 20 30 40 20 30 40hwy
cty
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 19 / 33
Let’s have a look at the aesthetics
In this section we will have a look at the aesthetics . . .
. . . size
. . . shape
. . . color
. . . and combine them
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 20 / 33
size
> ggplot(data = mpg, aes(x = hwy, y = cty, size = displ)) +
+ geom_jitter()
> # displ. engine displacement, in litres
●
●
●
●
●
● ●●
●
●
●
●
●●
●●
●●
●
●
●●
●
●●
●●●
●
● ●
●
●
●
● ●
●●●
●●
●●
●
●●
●●●●
●
● ●●
●
●●
●●
●
●
●
●
●●
●
●●●
●
● ●
●
● ●● ●
●●
●
●●●●●
●●●
●
●
●●
●●
● ● ●●●
●
●
●
●
●
●
●
●
●
●●
● ●
●●
●●●
●●
●
●
●●
●●
●
●
●●
● ●●●● ●●
●
●●●●
●
●
● ●
●●●
●●
●
●
●
●
●
●
● ●
●
●●
●●
●
●
●
●●●
● ●
●
●
●
●●●
●
●
●● ● ●
●●●
● ●●
●
●●●
●●
●
●
●
●
●
●
●●
● ●●●
●
●
●●
●
●
●
●
●
●●●
●●
●
●
●
●
● ●
●
●
●
●
●
●
●
10
15
20
25
30
35
20 30 40hwy
cty
displ●
●
●
●●●
2
3
4
5
6
7
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 21 / 33
shape
> ggplot(data = mpg, aes(x = hwy, y = cty, shape = factor(cyl))) +
+ geom_jitter(size = 3)
> # one can set a fixed size for all the points
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●●
● ●
●●
●●
●
●
● ●
●●
●
●●
●
●
●● ●
● ●●
●
●●
●● ●●● ● ●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
10
15
20
25
30
35
20 30 40hwy
cty
factor(cyl)
● 4
5
6
8
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 22 / 33
color
> ggplot(data = mpg, aes(x = hwy, y = cty, color = factor(cyl))) +
+ geom_jitter(size = 3)
●
●
●
●
●
●●●
●
●●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●●●
● ●
●
●
●
● ●●
●●
●●
●●
●
●●●●
●●
●
● ●●
●
●●
●●
●
●
●
●
●
●
●
●●
●
●
● ●
●
●●●
●
●
●●
●●●●●
●● ●
●
●
●●
●●
●● ●●●
●
●
●
●
●
●●
●
●
● ●
● ●
●●●● ●
●●
●
●
●●
●●
●
●
●
●
● ●
● ●● ●●
●
●●●●
●
●
● ●
●●●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●●
●●
●
●
●
●
●●●
●
●●●●
●●
●
● ●●
●
●● ●
● ●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
● ●●
●●
●
●
●
●
10
20
30
10 20 30 40hwy
cty
factor(cyl)
●
●
●
●
4
5
6
8
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 23 / 33
Combine them
> ggplot(data = mpg, aes(x = hwy, y = cty, color = factor(cyl),
+ shape = factor(cyl), size = displ)) +
+ geom_jitter()
> # nice, there is only one combined legend for shape and color
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●●
●●
●●
●
●
● ●
●●
●
●
●
●
●
●●●
● ●
●
●
●●
●●
●●● ● ●
●
● ●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
10
15
20
25
30
35
20 30 40hwy
cty
displ●
●
●
●●●
2
3
4
5
6
7
factor(cyl)● 4
5
6
8
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 24 / 33
The logic . . .
In this section we will look at. . .
. . . where to put what?
. . . the order of the layers
. . . how to save a plot
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 25 / 33
Where to put what?
All the arguments specified in the function ggplot() are passed to allthe layers.
Holds unless specified otherwise in a layer.
Arguments specified in a layer effect only the corresponding layer.
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 26 / 33
Where to put what?
Basic plot to start with> ggplot(data = mpg, aes(x = hwy, y = cty)) + geom_jitter() + geom_smooth()
Color both points and smoothers per group of drv⇒ three smoothers are fitted> ggplot(data = mpg, aes(x = hwy, y = cty, color = drv)) + geom_jitter() +
+ geom_smooth()
●
●
●
●
●
●●●
●
●
●
●
● ●
●●
●
●
●
●
●
●
●
●●
●
●●●
● ●
●
●
●
●●
●●
●
●●●●
●
●●
●●
●●
●● ●●
●
●●
●●
●
●
●
●
●
●
●
●●
●
●
● ●
●
● ●●
●
●
●
●●●●
●●
●●●
●
●
●●
●
●
● ● ●●
●
●
●●
●
●
●
●
●
●
●●
● ●
●●
●●
●
●●
●●
●●
● ●
●
●
●●
● ●
●●
● ●●
●
●
●●●
●
●
● ●
●●
●●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●●
●
●
●● ● ●
●●●
●● ●
●
●●
●
●●
●
●
●
●
●
●
●
●
● ●●●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
10
20
30
20 30 40hwy
cty
drv●
●
●
4
f
r
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 27 / 33
Where to put what?
Color the points per group of drv⇒ one smoother is fitted> ggplot(data = mpg, aes(x = hwy, y = cty)) + geom_jitter(aes(color = drv)) +
+ geom_smooth()
●
●●
●
●
● ●●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●●
●●●
●
●●
●
●
●
●●
●●
●
●●
●●
●
●●
●●
●
●
●
● ●●
●
●●
●●
●
●
●
●
●
●
●
●●
●
●
●●
●
●●●
●
●
●
●
●●●●
●
●● ●
●
●
●●
●
●
●● ●●
●
●
●
●
●
●
●
●
●
●
● ●
● ●
●●●● ●
●●
●
●
●●
●●
●
●
●●
● ●●●
●●
●
●
●
●●●
●
●
●●
●●
●●●
●
●
●
●
●
●●
●
●
●●
●●
●
●
●
● ●●● ●
●
●
●
●●●
●
●
●● ● ●
●●
●
● ● ●
●
●● ●
● ●
●
●
●
●
●
●●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
10
20
30
20 30 40hwy
cty
drv●
●
●
4
f
r
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 28 / 33
Where to put what?
Color the smoothers per groug drv
⇒ three smoothers are fitted> ggplot(data = mpg, aes(x = hwy, y = cty)) + geom_jitter() +
+ geom_smooth(aes(color = drv))
●
●
●
●
●
●●●
●
●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
● ●
●
●
●
●●
●●
●
●●
●●
●
●●
● ●●
●
●●
●●
●
●
●
●
●
●
●
●●
●
●
● ●
●
●●
●
●
●
●
●
●●●
●●
● ● ●
●
●
●●
●●
● ● ●●
●
●
●●
●
●
●●
●
●
●●
● ●
●●
●●●
●●
●
●●●
● ●●
●
●●
● ●●●
● ●●●
●
●●●
●
●
●●
●●
●●●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
● ●●
● ●
●
●
●
●
●●
●
●
●● ●●
●●
●
● ● ●
●
● ●●
●●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
● ●
●
●
●
●
●
●
●
10
20
30
20 30 40hwy
cty
drv
4
f
r
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 29 / 33
Does the order of the layers matter?
Plot the points first as a layer and then add the layer with boxplots:> ggplot(data = mpg, aes(x = class, y = cty)) +
+ geom_jitter(alpha = 0.4) +
+ geom_boxplot(aes(fill = class))
Plot the boxplot first and afterwards add the layer of points:> ggplot(data = mpg, aes(x = class, y = cty)) +
+ geom_boxplot(aes(fill = class)) +
+ geom_jitter(alpha = 0.4)
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 30 / 33
How to save a plot?
The plot to be saved> v <- ggplot(data = mpg, aes(x = class, y = cty)) + geom_boxplot() +
+ geom_jitter(alpha = 0.3)
Plots which are saved in a object can be saved by ggsave().> ggsave(filename = "cool-boxplot-II.png", plot = v)
> # Cool, `ggsave()` recognizes the output format!
> # (pdf, png, jpg, eps, svg)
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 31 / 33
How to save a plot?
Control the width & height and change the path> # the save of the figure can be controlled by
> ggsave(filename = "cool-boxplot-III.jpg", plot = v, width = 5, height = 4,
+ path = "/path/of/figures/")
Alternatively don’t forget print the plot> pdf("cool-boxplot-IV.pdf")
> print(v)
> dev.off()
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 32 / 33
Book Recommendations: R graphics & ggplot
R Graphics CookbookWinston Chang, O’Reilly Media, 2012
and its online companion:http://www.cookbook-r.com/Graphs/
ggplot2: Elegant Graphics for Data Analysis (Use R!)Hadley Wickham, Springer, 2009
and its online companion:http://ggplot2.org/book/
Seminar for Statistics, ETH Zurich (SfS) An introduction to ggplot2 33 / 33