Introduction to R

Post on 03-Dec-2014

Visualization and Analysis of Big Data with the R Programming Language

Michael E. Driscoll, Ph.D.
Presented to Amyris
April 2009

“The sexy job in the next ten years will be statisticians.”

– Hal Varian, Chief Economist, Google

What can it do?
• data manipulation
• statistics
• visualization

Why is it different?
• created by statisticians
• free, open source
• extensible via packages

What is R?

Statistical Analysis

• hypothesis testing
• model fitting
• clustering
• machine learning
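As a rough illustration of the first two items, here is a minimal base R sketch. The data are simulated for the example (they are not from the presentation), and the Welch t-test and simple linear regression are just two common choices among many.

```r
set.seed(1)

# Simulated data (hypothetical): two groups with different true means.
control <- rnorm(50, mean = 10, sd = 2)
treated <- rnorm(50, mean = 12, sd = 2)

# Hypothesis testing: Welch two-sample t-test for a difference in means.
test <- t.test(treated, control)
print(test$p.value)

# Model fitting: least-squares fit of a straight line to noisy data.
x <- seq(1, 20)
y <- 3 + 2 * x + rnorm(20, sd = 1)
fit <- lm(y ~ x)
print(coef(fit))
```

Both `t.test` and `lm` return list-like objects, so individual results such as the p-value or the fitted coefficients can be extracted and reused in later steps.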

Data Visualization

What is R?

Data Manipulation

• database connectivity
• slicing & dicing data cubes

Statistical analysis

• fit models for the distributions of expression values

• test hypotheses about outliers

• cluster genes with similar patterns

Visualization of hybridization artifacts

I. Taming Microarray Data with Bioconductor

http://www.bioconductor.org

1 million transactions during this presentation

Statistical analysis

• every customer has a history of product purchases

• hierarchically cluster products and customers

• other approaches (depending on goals): singular value decomposition
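The clustering step above can be sketched in base R. The purchase matrix here is invented for illustration (customers in rows, products in columns), and the binary distance with average linkage is one reasonable choice, not necessarily the one used in the presentation.

```r
# Toy purchase matrix (hypothetical data): 1 = customer bought the product.
purchases <- matrix(
  c(1, 1, 0, 0,
    1, 1, 0, 0,
    0, 0, 1, 1,
    0, 1, 1, 1,
    1, 1, 0, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("cust", 1:5),
                  c("bread", "butter", "nails", "hammer"))
)

# Cluster products: transpose so each row is a product, then compare
# purchase patterns with a binary (Jaccard-style) distance.
product_dist <- dist(t(purchases), method = "binary")
product_tree <- hclust(product_dist, method = "average")

# Cut the tree into two groups of products that tend to be bought together.
product_groups <- cutree(product_tree, k = 2)
print(product_groups)

# Customers can be clustered the same way, without the transpose.
customer_tree <- hclust(dist(purchases, method = "binary"), method = "average")
```

Calling `plot(product_tree)` draws the dendrogram, which is often the quickest way to see which products group together.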

Which products are ordered together?

II. Clustering Product Purchases

2 billion clicks during this presentation

Statistical analysis

• estimate posterior distributions for click rates from observed data

• test hypothesis that the click-rate of a given ad A is greater than for ad B

How confident are we that B beats A?
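One way to answer that question is a Beta-Binomial model, sketched below in base R. The click counts are made up, and the uniform Beta(1, 1) prior is an assumption chosen for simplicity; the presentation does not say which model or prior was used.

```r
# Hypothetical click data for two ads (not from the presentation).
clicks_a <- 40; views_a <- 1000
clicks_b <- 55; views_b <- 1000

# With a uniform Beta(1, 1) prior, the posterior for each click rate is
# Beta(1 + clicks, 1 + non-clicks).
set.seed(42)
n_draws <- 100000
rate_a <- rbeta(n_draws, 1 + clicks_a, 1 + views_a - clicks_a)
rate_b <- rbeta(n_draws, 1 + clicks_b, 1 + views_b - clicks_b)

# Monte Carlo estimate of the posterior probability that B beats A.
prob_b_beats_a <- mean(rate_b > rate_a)
print(prob_b_beats_a)

# 95% credible intervals for each click rate, from the exact posteriors.
ci_a <- qbeta(c(0.025, 0.975), 1 + clicks_a, 1 + views_a - clicks_a)
ci_b <- qbeta(c(0.025, 0.975), 1 + clicks_b, 1 + views_b - clicks_b)
```

The estimate `prob_b_beats_a` reads directly as "how confident are we that B beats A", given the data and the assumed prior.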

III. Optimizing Online Advertising

IV. A Tale of Two Pitchers

Hamels

Webb

“The best thing about R is that it was developed by statisticians. The worst thing about R is that… it was developed by statisticians.”

– Bo Cowgill, Google

R Nuts and Bolts

Data Manipulation

Getting Data In
SQL
• MySQL
• ODBC (Oracle, MS-SQL)
Excel
Matlab

Getting Data Out
Data formats:
• Delimited (CSV, Excel)
• Matlab
Graphic formats:
• Vector (PDF, EPS, SVG)
• Raster (PNG, TIFF)
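A minimal base R sketch of both kinds of output follows. The data frame and file names are hypothetical, and files are written to a temporary directory so the example has no side effects.

```r
# Hypothetical data frame standing in for real results.
results <- data.frame(sample = c("a", "b", "c"), value = c(1.5, 2.25, 3))

# Data formats: write a delimited (CSV) file.
csv_file <- file.path(tempdir(), "results.csv")
write.csv(results, csv_file, row.names = FALSE)

# Graphic formats: open a vector (PDF) device, draw, then close the device.
pdf_file <- file.path(tempdir(), "results.pdf")
pdf(pdf_file)
plot(results$value, pch = 19)
dev.off()

# Read the CSV back to confirm the round trip.
round_trip <- read.csv(csv_file)
```

Raster output follows the same open, draw, close pattern with `png()` or `tiff()` in place of `pdf()`, provided the local R build supports those devices.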

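library(RMySQL)  # provides the "MySQL" driver for DBI
driver <- dbDriver("MySQL")
con <- dbConnect(driver, user = "tgardner", password = "julien05",
                 host = "data.amyris.com", dbname = "biofx")
resultSet <- dbSendQuery(con, "SELECT * FROM assay")
data <- fetch(resultSet, n = -1)  # n = -1 fetches all remaining rows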

Statistical Methods

Extending R with Packages

CRAN http://cran.r-project.org

• ~ 2000 packages
• organized by field
• easy to install
> install.packages("lattice")

R Packages: Beautiful Colors with Colorspace

library("colorspace")
red <- LAB(50, 64, 64)
blue <- LAB(50, -48, -48)
mixcolor(10, red, blue)

R Packages: Creating Panel Plots with Lattice

library("lattice")
xyplot(x ~ y | pitch_type, data = gameday)
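To try the panel plot without the original data, the sketch below builds a stand-in `gameday` data frame. The column names `x`, `y`, and `pitch_type` come from the call above; the values and the pitch type labels are invented for illustration.

```r
library(lattice)

# Hypothetical stand-in for the `gameday` data used in the slides.
set.seed(7)
gameday <- data.frame(
  x = rnorm(60),
  y = rnorm(60),
  pitch_type = rep(c("FF", "CU", "CH"), each = 20)
)

# One panel per pitch type, using the same formula as the call above.
panel_plot <- xyplot(x ~ y | pitch_type, data = gameday)

# Lattice plots are objects; printing one to an open device renders it.
pdf(file.path(tempdir(), "pitches.pdf"))
print(panel_plot)
dev.off()
```

The `| pitch_type` part of the formula is what splits the data into panels, one per level of the conditioning variable.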

Getting Started

Download at R-project.org
Choose a UI
• Emacs (ESS)
• JGR (Java GUI for R)
• Rattle

http://www.r-project.org

Getting Help

Online
• use inline help
> ?plot
• search / post at R-help
http://tolstoy.newcastle.edu.au/R

Books
• Modern Applied Statistics with S, W.N. Venables & B.D. Ripley
• Use R series includes 20 volumes
http://www.springer.com/series/6991

Data

Desktop

Which is Easier?

Coding or Clicking

R-Based Dashboards

A Simple Script

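setContentType("text/html")
# Draw a plot into a PNG file that the web server can serve
png("/var/www/hello.png")
plot(sample(100, 100), col = 1:8, pch = 19)
dev.off()
# Emit the HTML page that embeds the image
cat("<html>")
cat("<body>")
cat("<h1>hello world</h1>")
cat('<img src="../hello.png">')
cat("</body>")
cat("</html>")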

Download Jeff Horner’s Rapache at http://biostat.mc.vanderbilt.edu/rapache/

R-Based Dashboards

http://labs.dataspora.com/gameday

Contacting Us

350 Townsend St, Suite 270
San Francisco, CA
415-860-4347
inquire@dataspora.com
