dplyr - about trestle · dplyr + mysql • dplyr views mysql as just another data source •...
TRANSCRIPT
![Page 1: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/1.jpg)
dplyrJeff Allen
Dallas R Users Group7/11/15
@trestleJeffCode for talk: http://tres.tl/dplyrcode
![Page 2: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/2.jpg)
My Background
• Computer Scientist
• First encountered R as a programming language (2007)
• Only later used it for data analysis
• Now a Software Engineer at RStudio (2013)
![Page 3: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/3.jpg)
Your Background
• New to R?
• Intermediate-Advanced R user?
• Used dplyr before?
![Page 4: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/4.jpg)
R Consortium
• Support R Core with development and finances
• Organized by Linux Foundation
• New R-forge, documentation, etc.
• https://www.r-consortium.org/
![Page 5: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/5.jpg)
dplyr
• Open-source R package
• From Hadley Wickham (ggplot2, plyr, devtools, …)
• Grammar of data manipulation• Operates on data.frames
![Page 6: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/6.jpg)
http://www.londonr.org/Presentations/Hadley%20Wickham%20-%20bigr-data-londonr.pdf
Tidy Transform
Visualize
Model
![Page 7: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/7.jpg)
http://www.londonr.org/Presentations/Hadley%20Wickham%20-%20bigr-data-londonr.pdf
Tidy Transform
Visualize
Model
tidyrdplyr
ggplot2ggvis
The rest of R…
![Page 8: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/8.jpg)
Motivation
• Unified syntax, captures 90% of data transformation tasks
• Consistent interface (great for “piping”)
• Performance (up to100x in certain cases)
• More to come…
![Page 9: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/9.jpg)
Data Intake
• At simplest: A special data.frame
• All the same properties of a data.frame
• tbl_df(myDataFrame)
![Page 10: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/10.jpg)
Fundamental verbs• select - subset columns
• filter - subset rows
• mutate - add new columns
• arrange - re-order rows
• summarize - reduce to single row
• group_by - “bin” data
![Page 11: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/11.jpg)
Fundamental verbs• select - subset columns
• filter - subset rows
• mutate - add new columns
• arrange - re-order rows
• summarize - reduce to single row
• group_by - “bin” data
![Page 12: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/12.jpg)
select
• Take a subset of columns
• Use column names without quotes
• “-“ to exclude a variable
• starts_with(), ends_with(), matches(), …
![Page 13: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/13.jpg)
Fundamental verbs• select - subset columns
• filter - subset rows
• mutate - add new columns
• arrange - re-order rows
• summarize - reduce to single row
• group_by - “bin” data
![Page 14: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/14.jpg)
filter
• Take a subset of rows
• Use regular R Boolean vector logic
![Page 15: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/15.jpg)
Fundamental verbs• select - subset columns
• filter - subset rows
• mutate - add new columns
• arrange - re-order rows
• summarize - reduce to single row
• group_by - “bin” data
![Page 16: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/16.jpg)
mutate
• Add new columns
• Potentially based on existing columns
![Page 17: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/17.jpg)
A Brief Interruption• Pipes offer an alternative syntax to nest functions
• Comes from the magrittr package
baz(foo(a=1),b=2
)
foo(a=1) %>% baz(b=2)==
foo(a=1) %>% baz( ,b=2)}
![Page 18: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/18.jpg)
tumble_after(
broke(
fell_down(
fetch(
went_up(jack_jill, "hill"),
"water"),
jack
),
"crown"),
"jill"
)
![Page 19: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/19.jpg)
jack_jill %>%
went_up("hill") %>%
fetch("water") %>%
fell_down("jack") %>%
broke("crown") %>%
tumble_after("jill")
![Page 20: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/20.jpg)
Fundamental verbs• select - subset columns
• filter - subset rows
• mutate - add new columns
• arrange - re-order rows
• summarize - reduce to single row
• group_by - “bin” data
![Page 21: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/21.jpg)
arrange
• Sort rows
• Use desc() to sort in decrementing order
![Page 22: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/22.jpg)
Fundamental verbs• select - subset columns
• filter - subset rows
• mutate - add new columns
• arrange - re-order rows
• summarize - reduce to single row
• group_by - “bin” data
![Page 23: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/23.jpg)
summarize
• Aggregate data into a single row
• Provide a summarization function for each column you want to keep
• Special functions like n() to get the count
![Page 24: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/24.jpg)
Fundamental verbs• select - subset columns
• filter - subset rows
• mutate - add new columns
• arrange - re-order rows
• summarize - reduce to single row
• group_by - “bin” data
![Page 25: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/25.jpg)
group_by
• Bin data into independent sets
• By itself, doesn’t change the data
• Perform further actions — such as summarize() — independently on each group
![Page 26: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/26.jpg)
nycflights13
![Page 27: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/27.jpg)
Joins
• Bind data from two tables together
• left_join(), right_join(), inner_join(), full_join(), …
• Concatenates columns together for rows that have corresponding keys
![Page 28: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/28.jpg)
JoinsUser Age Dept
joe 41 QA
kim 39 IT
steve 32 IT
Dept Room#
IT 307
QA 410
![Page 29: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/29.jpg)
JoinsUser Age Dept
joe 41 QA
kim 39 IT
steve 32 IT
Dept Room#
IT 307
QA 410
![Page 30: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/30.jpg)
Joins
User Age Dept Room#
joe 41 QA 410
kim 39 IT 307
steve 32 IT 307
User Age Dept
joe 41 QA
kim 39 IT
steve 32 IT
Dept Room#
IT 307
QA 410
![Page 31: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/31.jpg)
Join Key Collisions
![Page 32: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/32.jpg)
User Age Dept
joe 41 QA
kim 39 IT
steve 32 IT
Dept Room#
IT 307
QA 410
![Page 33: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/33.jpg)
User Age Dept
joe 41 QA
kim 39 IT
steve 32 IT
Dept Room# Age
IT 307 15
QA 410 7
![Page 34: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/34.jpg)
User Age Dept
joe 41 QA
kim 39 IT
steve 32 IT
Dept Room# Age
IT 307 15
QA 410 7
User Age Dept Room#
![Page 35: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/35.jpg)
User Age Dept
joe 41 QA
kim 39 IT
steve 32 IT
Dept Room# Age
IT 307 15
QA 410 7
User Age Dept Room#
joe 41 QA 410
kim 39 IT 307
steve 32 IT 307
by=“Dept”
![Page 36: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/36.jpg)
Data Sources
✔ Local data.frame or data.table
⃞ Local SQLite database
⃞ Remote MySQL/PostgreSQL database
⃞ Google BigQuery, Amazon RedShift, MonetDB
![Page 37: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/37.jpg)
dplyr + MySQL
• dplyr views MySQL as just another data source
• translate_sql() does the behind-the-scenes magic
• Converts what it can to a SQL query
• Runs everything else locally in R
![Page 38: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/38.jpg)
Lazy Evaluation
• dplyr avoids executing queries until it absolutely has to
• Use explain() to ask the RDBMS about the execution plan for this query.
• Use collect() to force evaluation
![Page 39: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/39.jpg)
What’s Next?
![Page 40: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/40.jpg)
ggvis
![Page 41: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/41.jpg)
ggvis• Successor to ggplot2
• Same “grammar of graphics.” Updated syntax
• Of the Web — runs in a browser
• Built-in reactivity
• Pipeable, like dplyr
• http://ggvis.rstudio.com/
![Page 42: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/42.jpg)
leaflet
![Page 43: dplyr - About Trestle · dplyr + MySQL • dplyr views MySQL as just another data source • translate_sql() does the behind-the-scenes magic • Converts what it can to a SQL query](https://reader035.vdocuments.us/reader035/viewer/2022081402/5f154f1163fb1f6e816e0bd8/html5/thumbnails/43.jpg)
leaflet
• R package for creating interactive maps
• A new major release recently
• Trivial to use