Introduction to R

Visualization and Analysis of Big Data with the R Programming Language Michael E. Driscoll, Ph.D. Presented to Amyris April 2009

Upload: michael-driscoll

Post on 03-Dec-2014


Page 1: Introduction To R

Visualization and Analysis of Big Data with the R Programming Language

Michael E. Driscoll, Ph.D.
Presented to Amyris
April 2009

Page 2: Introduction To R
Page 3: Introduction To R

“The sexy job in the next ten years will be statisticians.”

– Hal Varian, Chief Economist, Google

Page 4: Introduction To R

What is R?

What can it do?
• data manipulation
• statistics
• visualization

Why is it different?
• created by statisticians
• free, open source
• extensible via packages

Page 5: Introduction To R

What is R?

Data Manipulation
• database connectivity
• slicing & dicing data cubes

Statistical Analysis
• hypothesis testing
• model fitting
• clustering
• machine learning

Data Visualization
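The three capabilities on this slide can be sketched in a few lines of base R. The example below is only an illustration: it uses the built-in `mtcars` data set, which is an assumption chosen for convenience and not data from the talk.

```r
# Built-in data set shipped with base R; not data from the presentation.
data(mtcars)

# Data manipulation: slice rows and columns, then aggregate by group.
four_cyl <- mtcars[mtcars$cyl == 4, c("mpg", "wt")]
mean_mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)

# Hypothesis testing: do automatic and manual cars differ in mpg?
mpg_test <- t.test(mpg ~ am, data = mtcars)

# Model fitting: linear regression of mpg on weight.
fit <- lm(mpg ~ wt, data = mtcars)

# Visualization: scatter plot with the fitted line, written to a temporary PDF.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(mpg ~ wt, data = mtcars, pch = 19)
abline(fit, col = "red")
dev.off()
```

Typing `summary(fit)` or `mpg_test` at the prompt prints the fitted coefficients and the test result.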

Page 6: Introduction To R
Page 7: Introduction To R

I. Taming Microarray Data with Bioconductor

http://www.bioconductor.org

Visualization of hybridization artifacts

Statistical analysis
• fit models for the distributions of expression values
• test hypotheses about outliers
• cluster genes with similar patterns

Page 8: Introduction To R

1 million transactions during this presentation

Page 9: Introduction To R

II. Clustering Product Purchases

Which products are ordered together?

Statistical analysis
• every customer has a history of product purchases
• hierarchically cluster products and customers
• other approaches (depending on goals): singular value decomposition
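A minimal sketch of the clustering idea on this slide, using base R's `dist()`, `hclust()`, and `svd()`. The purchase matrix, the product names, and the choice of binary distance with average linkage are all hypothetical, picked for illustration. The talk does not say which distance or linkage was used.

```r
# Hypothetical purchase histories: rows are customers, columns are products,
# and 1 means the customer bought the product at least once.
purchases <- matrix(
  c(1, 1, 0, 0,
    1, 1, 0, 0,
    1, 1, 1, 0,
    0, 0, 1, 1,
    0, 0, 1, 1,
    0, 1, 1, 1),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("cust", 1:6),
                  c("bread", "butter", "chips", "salsa"))
)

# Cluster products: transpose so each product is a row, then use the
# "binary" (Jaccard-style) distance, which ignores joint absences.
product_dist <- dist(t(purchases), method = "binary")
product_tree <- hclust(product_dist, method = "average")
product_groups <- cutree(product_tree, k = 2)

# Cluster customers the same way, on the untransposed matrix.
customer_tree <- hclust(dist(purchases, method = "binary"), method = "average")
customer_groups <- cutree(customer_tree, k = 2)

# An alternative view: singular value decomposition of the purchase matrix.
purchase_svd <- svd(purchases)
```

Calling `plot(product_tree)` draws the dendrogram, which is one way to read off which products tend to be ordered together.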

Page 10: Introduction To R

2 billion clicks during this presentation

Page 11: Introduction To R

III. Optimizing Online Advertising

How confident are we that B beats A?

Statistical analysis
• estimate posterior distributions for click rates from observed data
• test hypothesis that the click rate of a given ad A is greater than for ad B
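One common way to answer the question on this slide is a Beta-Binomial model with Monte Carlo draws from each posterior. The sketch below assumes a uniform Beta(1, 1) prior and made-up impression and click counts. The talk does not state which prior or data were used.

```r
set.seed(42)  # fixed seed so the simulation is reproducible

# Hypothetical observed data: impressions and clicks for two ads.
impressions_a <- 10000; clicks_a <- 120
impressions_b <- 10000; clicks_b <- 150

# With a uniform Beta(1, 1) prior, the posterior for each click rate is
# Beta(1 + clicks, 1 + impressions - clicks).
n_draws <- 100000
rate_a <- rbeta(n_draws, 1 + clicks_a, 1 + impressions_a - clicks_a)
rate_b <- rbeta(n_draws, 1 + clicks_b, 1 + impressions_b - clicks_b)

# Monte Carlo estimate of the posterior probability that B beats A.
prob_b_beats_a <- mean(rate_b > rate_a)

# 95% credible interval for the difference in click rates (B minus A).
diff_interval <- quantile(rate_b - rate_a, c(0.025, 0.975))
```

`prob_b_beats_a` is a direct answer to "how confident are we that B beats A?", and `diff_interval` shows how large the difference plausibly is.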

Page 12: Introduction To R
Page 13: Introduction To R

IV. A Tale of Two Pitchers

Hamels

Webb

Page 14: Introduction To R

R Nuts and Bolts

“The best thing about R is that it was developed by statisticians. The worst thing about R is that… it was developed by statisticians.”

– Bo Cowgill, Google

Page 15: Introduction To R

Data Manipulation

Getting Data In
SQL
• MySQL
• ODBC (Oracle, MS-SQL)
Excel
Matlab

Getting Data Out
Data formats:
• Delimited (CSV, Excel)
• Matlab
Graphic formats:
• Vector (PDF, EPS, SVG)
• Raster (PNG, TIFF)

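library(RMySQL)  # provides the MySQL driver for DBI
driver <- dbDriver("MySQL")
# open a connection to the database
con <- dbConnect(driver, user = "tgardner", password = "julien05",
                 host = "data.amyris.com", dbname = "biofx")
# send the query, then fetch all rows (n = -1) into a data frame
resultSet <- dbSendQuery(con, "SELECT * FROM assay")
data <- fetch(resultSet, n = -1)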
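The "Getting Data Out" half of this slide can be sketched with base R alone. The data frame below is hypothetical and stands in for the result of the database query. The column names and values are invented for illustration.

```r
# Hypothetical assay results standing in for the database query above.
assay <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  titer     = c(1.2, 3.4, 2.8)
)

# Data out: write a delimited (CSV) file, then read it back in.
csv_file <- tempfile(fileext = ".csv")
write.csv(assay, csv_file, row.names = FALSE)
assay_again <- read.csv(csv_file, stringsAsFactors = FALSE)

# Graphics out: open a vector (PDF) device, draw, and close the device.
# Swapping pdf() for png() would produce a raster image instead.
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file, width = 6, height = 4)
barplot(assay$titer, names.arg = assay$sample_id, ylab = "titer")
dev.off()
```

The same open-draw-close pattern applies to every graphics device, so switching output formats usually means changing only the first call.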

Page 16: Introduction To R

Statistical Methods

Page 17: Introduction To R

Extending R with Packages

CRAN http://cran.r-project.org

• ~ 2000 packages
• organized by field
• easy to install
> install.packages("lattice")

Page 18: Introduction To R

R Packages: Beautiful Colors with Colorspace

library("colorspace")
red <- LAB(50, 64, 64)
blue <- LAB(50, -48, -48)
# the first argument is the share of the second color; 0.5 is an even blend
mixcolor(0.5, red, blue)

Page 19: Introduction To R

R Packages: Creating Panel Plots with Lattice

library("lattice")
xyplot(x ~ y | pitch_type, data = gameday)
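The slide's `gameday` data is not included in the transcript, so the sketch below builds a hypothetical stand-in with the same column names (`x`, `y`, `pitch_type`) and random values, purely to show how the conditioning formula produces one panel per pitch type.

```r
library(lattice)  # ships with standard R distributions

set.seed(1)
# Hypothetical pitch locations; the column names follow the slide's formula.
gameday <- data.frame(
  x = rnorm(90),
  y = rnorm(90),
  pitch_type = rep(c("fastball", "curveball", "changeup"), each = 30)
)

# One scatter panel per pitch type. The formula reads "vertical ~ horizontal",
# so x ~ y puts x on the vertical axis, as on the slide.
panel_plot <- xyplot(x ~ y | pitch_type, data = gameday)

# Lattice plots are objects: they draw only when printed to a device.
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
print(panel_plot)
dev.off()
```

At an interactive prompt the `print()` call is implicit, which is why the one-line version on the slide works when typed directly.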

Page 20: Introduction To R

Getting Started

Download at R-project.org
http://www.r-project.org

Choose a UI
• Emacs – ESS
• JGR – Java GUI for R
• Rattle

Page 21: Introduction To R

Getting Help

Books
• Modern Applied Statistics with S, W.N. Venables & B.D. Ripley
• Use R series includes 20 volumes: http://www.springer.com/series/6991

Online
• use inline help
> ?plot
• search / post at R-help: http://tolstoy.newcastle.edu.au/R

Page 22: Introduction To R

Data

Desktop

Page 23: Introduction To R

Which is Easier?

Coding or Clicking

Page 24: Introduction To R

R-Based Dashboards

A Simple Script

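# setContentType() is provided by rApache
setContentType("text/html")
# draw a plot into a PNG file that the web server can serve
png("/var/www/hello.png")
plot(sample(100, 100), col = 1:8, pch = 19)
dev.off()
# emit the HTML page that embeds the image
cat("<html>")
cat("<body>")
cat("<h1>hello world</h1>")
cat('<img src="../hello.png">')
cat("</body>")
cat("</html>")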

Download Jeff Horner’s rApache at http://biostat.mc.vanderbilt.edu/rapache/

Page 25: Introduction To R

R-Based Dashboards

http://labs.dataspora.com/gameday

Page 26: Introduction To R
Page 27: Introduction To R

Contacting Us

350 Townsend St, Suite 270
San Francisco, [email protected]