2014 july use_r
R In Production: the products
Yasmin Lucero, PhD
Senior Statistician, Gravity-AOL
useR! 2014
Outline
• Internal products
  • 1. one-off analysis
  • 2. automated reports
  • 3. internal R packages
  • 4. internal dashboards
• External products
  • 1. customer facing web-app
  • 2. analytical backend service
• Ops and managing an R environment
Internal Product 1: one-off analytical product
http://rpubs.com/nathanesau1/21383
Nathan Esau
Hilary Parker
Internal Product 2: Automated reports
Thursday morning: Automated Business Reporting with R (Zhengying (Doro) Lour)
R + bash + email
R + markdown + web server
Internal Product 3: The Internal R package
• Data APIs
• Business specific metrics
• Custom plotting functions
• Custom data manipulation utilities
Thursday Morning: An R tools platform in Cosmetic Industry (Jean-Francois Collin)
Internal Product 4: The internal dashboard
Gravity-AOL
External Product 1: Customer facing web app
Wednesday afternoon: Rapid Prototyping with R/Shiny at McKinsey (Aaron Horowitz)
http://www.showmeshiny.com/
External Product 2: analytical back-end
Wed afternoon:
• Deploying R into Business Intelligence and Real-time Applications (Louis Bajuk-Yorgan)
• Zillow’s Big Data and Real-time Services in R (Yeng Bun)
[Diagram: CARD.com. Labels: Artwork & Brands; Bank Partner; Transactions; CARD.COM Site / App; CARD.COM AdTech Platform; APIs; RTB Ad Xchgs; CARD.COM Analytics Platform; Members; Visitors. Numbered steps 1, 2, 3. Cycle labels: predict, deploy, learn.]
Details: card.com/useR-2014
More good example applications:
• http://blog.revolutionanalytics.com/2014/06/how-data-driven-companies-use-r-to-compete.html
Ops: Managing an R Environment
• Overall: not complex, but there are pain points:
  • R library management
    • CRAN, non-CRAN and internal packages
    • Version management
    • Dependency management (pulling all dependencies)
  • Non-R dependencies (especially C++ and Java)
  • Hardware specifications: How much RAM is enough?
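One way to make the version-management pain point concrete is a small snapshot of what is installed on a machine. The sketch below uses only base R; the `required` vector and the `pkg_versions()` helper are hypothetical names for illustration, and base packages stand in for a real dependency list so the example runs anywhere.

```r
# Hypothetical list of packages a team depends on; base packages are used
# here only so that the sketch runs on any R installation.
required <- c("stats", "utils", "tools")

# Hypothetical helper: report whether each package is installed and at
# which version.
pkg_versions <- function(pkgs) {
  installed <- rownames(installed.packages())
  data.frame(
    package   = pkgs,
    installed = pkgs %in% installed,
    version   = vapply(pkgs, function(p) {
      if (p %in% installed) as.character(packageVersion(p)) else NA_character_
    }, character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

snapshot <- pkg_versions(required)
print(snapshot)

# Pulling all (recursive) dependencies of a package needs a reachable
# CRAN mirror, so it is left commented out:
# deps <- tools::package_dependencies("ggplot2", db = available.packages(),
#                                     recursive = TRUE)
```

Comparing such a snapshot between a development machine and a production server is one simple way to spot version drift.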
Conclusion: Why R?
• Plotting
• Rich analytical library
• More than a DSL: end to end functionality from data APIs to web apps
• Solid IDE support
• Sturdy, stable, easy-to-support platform
• Rapid prototyping
Thanks.
Tools: plotting
• Major frameworks
  • Base graphics
  • lattice
  • ggplot2
• Useful utilities
  • grid/gridExtra/gtable
  • latticeExtra
  • Color: RColorBrewer/munsell/colorspace/dichromat
  • gplots (the ‘g’ school)
  • plotrix
• Custom plots
  • plot.ts
  • maps
  • igraph (network visualization)
  • ggmap
  • ggvis: interactive graphics
  • rCharts: interactive graphics, wraps js libraries, not on CRAN yet (look on github)
  • rgl (3d)/scatterplot3d
  • vcd (categorical data)
Tools: data manipulation
• Base R features
  • Data structures: the data.frame
  • Vectorized data manipulation: apply, tapply, lapply…
  • Data structures: ts
  • Comprehensive, elegant missing data handling (NA)
• Packages
  • Wickham school: reshape2/plyr/dplyr/tidyr
  • data.table
  • Time series: zoo, xts, lubridate
  • Spatial data tools: sp/maptools
  • The ‘G’ school: gdata
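The base R features listed above (the data.frame, the apply family, NA handling) can be seen together in a few lines. This is a minimal sketch on made-up data; the column names `site` and `clicks` are illustrative only.

```r
# Toy data; names and values are invented for illustration
visits <- data.frame(
  site   = c("a", "a", "b", "b", "b"),
  clicks = c(10, NA, 3, 5, 4),
  stringsAsFactors = FALSE
)

# tapply: grouped summary, dropping missing values explicitly
mean_clicks <- tapply(visits$clicks, visits$site, mean, na.rm = TRUE)

# sapply: apply a function to every column of the data.frame
n_missing <- sapply(visits, function(col) sum(is.na(col)))

# NA propagates through computations unless you opt out
total_raw   <- sum(visits$clicks)
total_clean <- sum(visits$clicks, na.rm = TRUE)

print(mean_clicks)
print(n_missing)
```

Here `total_raw` is NA while `total_clean` is the sum of the observed values, which is the behaviour the "missing data handling" bullet refers to.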
Tools: Data interfaces
• Connections: read.table(); url()
• DBI: RPostgreSQL; RMySQL; RSQLite; …
• RODBC; RJDBC: (vertica, redshift)
• Native: rredis; rmongodb; prestodb; RCassandra; RHadoop; …
• yaml, XML, rjson, RJSONIO
• MS Excel: xlsx, XLConnect
• SAS, SYSTAT, SPSS, Stata…: foreign
• RCurl
• RProtoBuf: Efficient cross-language data serialization in R
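As a small illustration of the connections item, read.table() accepts any connection object. The sketch below reads from an in-memory text connection so it runs offline; the data and the commented-out URL are made up, and the same call would accept a url() connection instead.

```r
# In-memory stand-in for a remote file; contents are invented
csv_text <- "id,value\n1,3.5\n2,NA\n3,7.25"

con <- textConnection(csv_text)
dat <- read.table(con, header = TRUE, sep = ",")
close(con)  # read.table() does not close a connection that was already open

str(dat)

# Same pattern with a remote source (hypothetical URL, not run):
# dat <- read.table(url("http://example.com/data.csv"), header = TRUE, sep = ",")
```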
Tools: Package development
• Package development:
  • package.skeleton(); tools (base package)
  • pkgKitten (CRAN): improvements to package.skeleton
  • devtools (CRAN): miscellaneous and very useful tools
  • gtools: various R programming tools
  • roxygen2 (CRAN): literate documentation
  • testthat/testR: unit testing
• IDEs: RStudio, Eclipse (StatET), Tinn-R, Emacs ESS, …
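To show what package.skeleton() produces, here is a minimal sketch that scaffolds a package in a temporary directory. The package name `internaltools` and the function `clicks_per_visit()` are hypothetical, chosen only to echo the "business specific metrics" idea from the internal package slide.

```r
# Hypothetical business metric that would live in an internal package
clicks_per_visit <- function(clicks, visits) clicks / visits

# Scaffold into a temporary directory so nothing is written to the
# working directory
pkg_root <- tempfile("pkgdemo")
dir.create(pkg_root)

# Put the function in its own environment and hand it to package.skeleton()
env <- new.env()
assign("clicks_per_visit", clicks_per_visit, envir = env)

utils::package.skeleton(
  name        = "internaltools",   # hypothetical package name
  list        = "clicks_per_visit",
  environment = env,
  path        = pkg_root
)

# Inspect what was generated (DESCRIPTION, NAMESPACE, R/, man/, ...)
print(list.files(file.path(pkg_root, "internaltools")))
```

The generated files are templates; the DESCRIPTION and the man pages still need to be filled in by hand or replaced with roxygen2 comments.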
Tools: Web development & reporting
• Shiny
• Interactive documents
  • knitr
  • Sweave
Tools: parallel computing
• parallel: lots of features formerly distributed among packages have recently been collected into this base R package
• Revolution Analytics
• Map-Reduce: rmr/RHadoop
• H2O (0xdata)
• SparkR (not on CRAN yet, look on github)
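A minimal sketch of the base parallel package, assuming a machine that can start two local worker processes. The bootstrap workload and the name `boot_mean()` are invented for illustration.

```r
library(parallel)

# Toy workload: one bootstrap resample mean per task
boot_mean <- function(i, x) mean(sample(x, replace = TRUE))

x <- c(2, 4, 6, 8, 10)

# Start two local workers (assumption: local socket clusters are allowed)
cl <- makeCluster(2)
clusterSetRNGStream(cl, iseed = 42)   # reproducible random streams on the workers

# parLapply() is the cluster counterpart of lapply()
res <- parLapply(cl, 1:100, boot_mean, x = x)
stopCluster(cl)

boot_means <- unlist(res)
summary(boot_means)
```

On Unix-alikes, mclapply() from the same package offers a fork-based alternative with an lapply-like interface.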
Tools: big or out of memory computing
• dplyr: supports database backed data structures
• ff: supports file based data
• biglm/bigmemory: shared memory matrices
• HadoopStreaming
Tools: memory profiling
• lineprof
• profr
• proftools
• object.size()
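Of the tools above, object.size() is in base R (package utils) and gives a quick estimate of how much memory an object occupies. A small sketch on made-up vectors:

```r
# Two vectors of the same length but different storage types
ints <- integer(1e6)   # 4 bytes per element
dbls <- numeric(1e6)   # 8 bytes per element

size_int <- object.size(ints)
size_dbl <- object.size(dbls)

# Human-readable output
print(size_int, units = "Mb")
print(size_dbl, units = "Mb")

# The sizes are numeric underneath, so they can be compared directly
ratio <- as.numeric(size_dbl) / as.numeric(size_int)
```

The ratio comes out close to 2, which is a quick sanity check when deciding how much RAM a data set will need.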