data scientist agill@mango-solutions - londonr
• Produce standard graphics and understand the range of visualisations available in ggplot2
• Use aesthetics (colour, shape, size) to add information to a visualisation
• Create analytical visualisations by groups (small multiples)
• Use R functionality to export high resolution graphics
install.packages("tidyverse")
library(tidyverse)
github.com/rfordatascience/tidytuesday
https://ig.ft.com/sites/visual-history-of-womens-tennis/
bit.ly/GrandSlams
grand_slams <-
read_csv("grand_slams.csv")
ggplot(data = grand_slams,
mapping = aes(
x = tournament_date,
y = rolling_win_count)) +
geom_point()
• ggplot(): the function used to create the skeleton structure of a graphic
• data = grand_slams: defines the data that will be the basis of our plot
• mapping = ...: how do the variables in our data relate to aesthetics in the plot?
• aes(): the helper function used to define how variables relate to plot elements
• geom_point(): defines the type of plot; geom functions are added as layers
• ggplot() creates a plot skeleton when we define:
1. what data we want to use
2. how to map variables in the data to aesthetics in the plot
• A geom function defines the type of plot we want to create
• Geoms are added as layers with "+"
• We can include multiple elements by continuing to add them as layers
www.rstudio.com/resources/cheatsheets/
ggplot(data = grand_slams,
mapping = aes(
x = tournament_date,
y = rolling_win_count)) +
geom_point() +
geom_hline(aes(yintercept = 10),
colour = "red")
• Using the mpg data (in the ggplot2 package) create a scatter plot of city miles per gallon against highway miles per gallon
• Add a smooth line to this plot
• Can you figure out how to change this to use linear regression as the smoothing method?
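One possible approach to this exercise, offered as a sketch rather than the only answer. It assumes that "city against highway" means cty on the y-axis and hwy on the x-axis, and it relies on the method argument of geom_smooth() to switch from the default smoother to a linear regression fit.

```r
library(ggplot2)

# mpg ships with ggplot2: cty = city miles per gallon, hwy = highway miles per gallon
p <- ggplot(data = mpg,
            mapping = aes(x = hwy, y = cty)) +
  geom_point() +
  # method = "lm" replaces the default smoother with a linear regression line
  geom_smooth(method = "lm")

p
```

Leaving out method = "lm" gives the default smooth line asked for in the second bullet.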
• Anything that defines the look and feel of the plot:
– x, y, z etc
– colour, fill
– shape, linetype
– size (inc. line size)
– alpha (aka. opacity)
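A minimal sketch of the two ways an aesthetic can be used, based on the mpg data from ggplot2 (the choice of displ, hwy and class is only for illustration). An aesthetic placed inside aes() is mapped to a variable and gets a legend; the same aesthetic placed outside aes() is set to a single fixed value.

```r
library(ggplot2)

# Mapped: colour varies with the class variable, so it goes inside aes()
p_mapped <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = class))

# Set: one fixed colour and opacity for every point, so they go outside aes()
p_set <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(colour = "steelblue", alpha = 0.5)

p_mapped
p_set
```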
more_than_10_wins <-
grand_slams %>%
group_by(name) %>%
filter(any(
rolling_win_count > 10))
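The group_by() plus filter(any(...)) pattern keeps every row for a player who passed 10 wins at any point, not only the rows above 10. A small sketch with made-up data (the toy data frame and its values are hypothetical, not taken from grand_slams) makes the behaviour visible:

```r
library(dplyr)

# Hypothetical toy data: player A passes 10 wins, player B never does
toy <- data.frame(
  name = c("A", "A", "B", "B"),
  rolling_win_count = c(9, 11, 3, 4))

kept <- toy %>%
  group_by(name) %>%
  # any() is evaluated once per player, so whole players are kept or dropped
  filter(any(rolling_win_count > 10)) %>%
  ungroup()

kept
```

Both rows for player A survive, including the one with 9 wins, which is what lets the plot show a full career trajectory.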
ggplot(data = more_than_10_wins,
mapping = aes(
x = tournament_date,
y = rolling_win_count)) +
geom_point(aes(colour = name))
• Using the scatter plot of cty against hwy, colour the points by drv (whether front, rear or 4 wheel drive).
• How does the plot differ if you define the colour in the ggplot function as opposed to the geom_point layer?
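A sketch of one way to explore the second question. With only a geom_point layer the two versions look the same; the difference shows up once another layer is added, so this example assumes an extra geom_smooth layer purely for illustration.

```r
library(ggplot2)

# Colour defined in ggplot(): every layer inherits it,
# so the regression line is also split and coloured by drv
p_global <- ggplot(mpg, aes(x = hwy, y = cty, colour = drv)) +
  geom_point() +
  geom_smooth(method = "lm")

# Colour defined in geom_point(): only the points use it,
# so there is a single regression line for all cars
p_local <- ggplot(mpg, aes(x = hwy, y = cty)) +
  geom_point(aes(colour = drv)) +
  geom_smooth(method = "lm")

p_global
p_local
```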
www.rstudio.com/resources/cheatsheets/
• Counting is done automatically for us from the raw data
ggplot(data = more_than_10_wins,
aes(x = name)) +
geom_bar()
• Change bar colour with fill
ggplot(data = more_than_10_wins,
aes(x = name)) +
geom_bar(aes(fill = grand_slam))
• Create from pre-counted data with stat = "identity"
• Side by side categories with position = "dodge"
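The two options above can be sketched together using the mpg data. The class_counts summary table is a hypothetical intermediate created here only to stand in for pre-counted data.

```r
library(ggplot2)
library(dplyr)

# Count the cars ourselves, so the heights of the bars are already in the data
class_counts <- mpg %>%
  count(class, year = factor(year))

p <- ggplot(class_counts, aes(x = class, y = n, fill = year)) +
  # stat = "identity" uses n as the bar height instead of counting rows;
  # position = "dodge" places the bars for each year side by side
  geom_bar(stat = "identity", position = "dodge")

p
```

geom_col() is a shorthand for geom_bar(stat = "identity") and would work equally well here.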
• Use geom_path() to join points in the order they appear in the data, geom_line() to join them in order of their x-axis value
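A small sketch of the difference between the two geoms, using a made-up data frame (hypothetical values) whose x values are deliberately out of order:

```r
library(ggplot2)

# Hypothetical toy data: rows are not sorted by x
d <- data.frame(x = c(1, 3, 2, 4),
                y = c(1, 2, 3, 4))

# geom_line() joins the points from left to right along the x-axis
p_line <- ggplot(d, aes(x = x, y = y)) + geom_line()

# geom_path() joins the points in row order, so the line doubles back
p_path <- ggplot(d, aes(x = x, y = y)) + geom_path()

p_line
p_path
```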
more_than_10_wins <- more_than_10_wins %>%
group_by(name) %>%
mutate(
first_win = min(tournament_date),
days_since_first = as.numeric(
tournament_date - first_win))
ggplot(data = more_than_10_wins,
mapping =
aes(x = days_since_first,
y = rolling_win_count)) +
geom_line()
• To draw a separate line for each group, we need to define the group aesthetic
ggplot(data = more_than_10_wins,
mapping = aes(
x = days_since_first,
y = rolling_win_count)) +
geom_line(aes(group = name))
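To see what the group aesthetic changes without needing the grand slams file, here is a sketch on made-up data (the players and counts are hypothetical):

```r
library(ggplot2)

# Hypothetical toy data: two players, three observations each
toy <- data.frame(
  name = rep(c("A", "B"), each = 3),
  days_since_first = rep(c(0, 100, 200), times = 2),
  rolling_win_count = c(1, 2, 3, 1, 3, 6))

# Without group: one zig-zag line drawn through all six points
p_single <- ggplot(toy, aes(x = days_since_first, y = rolling_win_count)) +
  geom_line()

# With group: one line per player
p_grouped <- ggplot(toy, aes(x = days_since_first, y = rolling_win_count)) +
  geom_line(aes(group = name))

p_single
p_grouped
```

Mapping colour or linetype to name would also split the lines, because discrete mappings define groups implicitly.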
• Create a bar chart of the number of cars in each class
• Update the plot so that you can compare the years (remember you will need to use fill, and the variable should be a factor)
• Can you update your plot so that bars for each year appear side by side?
• Create multiple plots that can be easily compared
• In ggplot2 this is faceting
• We can facet into either a grid structure (facet_grid) or a table structure (facet_wrap)
• The most appropriate choice depends on the data
ggplot(data = more_than_10_wins,
mapping =
aes(x = days_since_first,
y = rolling_win_count)) +
geom_line(aes(colour = name)) +
facet_grid(rows = vars(gender))
• rows = vars(gender): says we want one row of panels for each category of these variables
• We could also use cols, or both rows and cols
• Use the vars helper function to define the variables in the data
ggplot(data = more_than_10_wins,
mapping =
aes(x = days_since_first,
y = rolling_win_count)) +
geom_line() +
facet_wrap(vars(name))
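A sketch of using both rows and cols in facet_grid, shown on the mpg data so that it runs without the grand slams file. The choice of drv and year as faceting variables is only an illustration.

```r
library(ggplot2)

p <- ggplot(data = mpg,
            mapping = aes(x = hwy, y = cty)) +
  geom_point() +
  # one row of panels per drive type, one column of panels per model year
  facet_grid(rows = vars(drv), cols = vars(year))

p
```

facet_grid lays out every combination of the row and column variables, while facet_wrap takes a single sequence of panels and wraps it over several rows, which tends to suit a variable with many categories.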
• Create a scatter plot of cty against hwy as previously.
• Create a faceted version of this plot, splitting by class. Try both the facet_grid and facet_wrap functions. Which is more suitable for this graphic?
• Set the labels
• Consider the scales
• Think about the theme
bbc.github.io/rcookbook/
• Use the labs function to set:
– x, y axis labels
– legend titles (colour, shape, size etc.)
– title, subtitle
– caption
ggplot(data = more_than_10_wins,
mapping = aes(
x = days_since_first,
y = rolling_win_count)) +
geom_line(aes(colour = name)) +
facet_grid(rows = vars(gender)) +
labs(x = "Number of Days Since First Title",
y = "Total Number of Grand Slam Titles",
colour = "Player") +
scale_colour_viridis_d() +
theme_bw()
• The scale_* family can set:
– Exact choice of colours
– Break points on axes
– Labels on legends and axes
– Much more!
• Some default functions exist to help
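A short sketch of what the scale functions can control, using the mpg data. The particular colours, break points and legend labels are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = hwy, y = cty, colour = drv)) +
  geom_point() +
  # break points on the x-axis: a tick every 5 miles per gallon
  scale_x_continuous(breaks = seq(10, 45, by = 5)) +
  # exact choice of colours and the labels shown in the legend
  scale_colour_manual(
    values = c("4" = "darkorange", "f" = "steelblue", "r" = "grey30"),
    labels = c("4" = "Four-wheel", "f" = "Front-wheel", "r" = "Rear-wheel"))

p
```

Ready-made scales such as scale_colour_viridis_d(), used in the example above, avoid having to pick colours by hand.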
• The theme function sets:
– backgrounds & borders
– grid lines
– axis text rotation
– legend position
– title positions
– Over 80 graphic elements!
• A selection of ready-made themes is available (e.g. theme_bw())
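A sketch of adjusting individual theme elements on top of a ready-made theme, using the mpg data. The specific settings are illustrative choices, not recommendations.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class, y = hwy, colour = drv)) +
  geom_point() +
  # start from a complete theme, then adjust individual elements
  theme_bw() +
  theme(
    legend.position = "bottom",                         # legend position
    axis.text.x = element_text(angle = 45, hjust = 1),  # axis text rotation
    panel.grid.minor = element_blank())                 # remove minor grid lines

p
```

The order matters: a complete theme such as theme_bw() replaces all theme settings, so any theme() adjustments should come after it.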
• Use function that reflects the file type (png, jpeg, pdf, etc.)
• Control:
– Width & Height
– Quality (resolution)
• Need to control graphics devices
png(filename = "TotalWinsByPlayer.png",
width = 600,
height = 350,
res = 100)
# Code to create plot goes here
dev.off()
• png(...): open a connection to a new graphics device (a place to send your plot)
• Create the plot!
• dev.off(): close the connection to the graphics device so the plot is written to the file
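A runnable sketch of the full export sequence, using the mpg data. The file names and the temporary output directory are assumptions made so the example runs anywhere. The pixel dimensions are three times those above, paired with res = 300, which keeps the same 6 by 3.5 inch layout at a higher resolution.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = hwy, y = cty)) +
  geom_point()

# Hypothetical output location: a file in the session's temporary directory
out_file <- file.path(tempdir(), "mpg_scatter.png")

# 1. Open the graphics device: 1800 x 1050 pixels at 300 ppi
png(filename = out_file, width = 1800, height = 1050, res = 300)
# 2. Create the plot (print() is needed inside scripts, loops and functions)
print(p)
# 3. Close the device so the file is written
dev.off()

# Alternative: ggsave() opens and closes the device for us (sizes in inches)
out_file_ggsave <- file.path(tempdir(), "mpg_scatter_ggsave.png")
ggsave(filename = out_file_ggsave, plot = p,
       width = 6, height = 3.5, dpi = 300)
```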
• Using one of the graphics you have created today, set the labels to be appropriate for the graphic
• Export a png of your graphic
Cheat Sheet
www.rstudio.com/resources/cheatsheets
Practice Data
github.com/rfordatascience/tidytuesday
In Production Example
bbc.github.io/rcookbook/
Aimee Gott