R for Pirates. EscConf, October 27, 2011
![Page 1: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/1.jpg)
R for Pirates
Mandi Walls
@lnxchk
EscConf, Boston, MA
October 27, 2011
![Page 2: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/2.jpg)
whoami
• stats misfit
• R tinkerer
• large-farm runner
• not a professional statistician :D
![Page 3: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/3.jpg)
What is R
• Scripting language for stats work
• Inspired by earlier S (for statistics) developed at AT&T
• FOSS
• Syntax inherits through Algol family, so looks somewhat like C/C++
![Page 4: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/4.jpg)
What Does R Do?
• Manipulate data
• Complex Modeling and Computation
• Graphics and Visualization
![Page 5: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/5.jpg)
Why R?
• WHY NOT!?
![Page 6: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/6.jpg)
But Other Math Stuff!
• Mathematica
• MATLAB
• Minitab
• Maple
• Excel (yes. shutup h8rs. ask your CFOs what they use)
• R provides sophisticated statistical and modeling capabilities, and is extensible through your own code
![Page 7: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/7.jpg)
Get R
• Available for Linux, Mac, Windows
• http://www.r-project.org/
![Page 8: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/8.jpg)
Fire!
• R console on Mac
• Interactive interpreter for your R needs
• Can also run from the command line: R
![Page 9: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/9.jpg)
R Basics
• R considers all elements to be vectors
• A single number is a one-element vector
• Use <- for assignment
• Use c() to concatenate values into a vector
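A minimal sketch of the basics above, assuming a fresh R session. The variable names x and v are only for illustration.

```r
# Assign a single number; R stores it as a one-element vector
x <- 8
length(x)       # number of elements in x
is.vector(x)    # a bare number still counts as a vector

# Build a longer vector by concatenating values with c()
v <- c(1, 2, 3)
v <- c(v, 4)    # "append" by concatenating onto the existing vector
v
```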
![Page 10: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/10.jpg)
Let’s see that again
![Page 11: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/11.jpg)
Practice Datasets
• data()
• shows the sample sets included with your R
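As a rough illustration, the sketch below lists the bundled datasets and loads one of them. It assumes the standard datasets package is installed, which is the case for a default R install; mtcars is just one convenient choice.

```r
# Capture the list of datasets instead of paging it to the console
available <- data()
head(available$results[, c("Package", "Item", "Title")])

# Load one dataset by name and peek at the first rows
data(mtcars)
head(mtcars)
```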
![Page 12: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/12.jpg)
Functions
• Looks familiar!
• Let’s see one!
• “evencount” counts the number of even ints in a vector
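The slide's own code did not survive in this transcript, so the following is only a sketch of what an "evencount" function could look like, not necessarily the version shown in the talk.

```r
# Count the even integers in a vector, looping element by element
evencount <- function(x) {
  k <- 0
  for (n in x) {
    if (n %% 2 == 0) k <- k + 1   # %% is the modulo operator
  }
  k                               # the last expression is the return value
}

evencount(c(1, 2, 4, 7, 10))

# The same count written with vectorized operations instead of a loop
sum(c(1, 2, 4, 7, 10) %% 2 == 0)
```

The loop version reads like C; the vectorized one-liner is the more common R style.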
![Page 13: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/13.jpg)
![Page 14: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/14.jpg)
Datatypes
• Vectors, the important ones
• Scalars are really single-element vectors
• Character strings
• Matrices, rectangular arrays of numbers
• Lists
• Tables, useful for data transitions and temp work
![Page 15: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/15.jpg)
Vectors
• R’s most-used data structure
• All elements in a vector must have the same mode or data type
• To add values to a vector, you concatenate into it with the c() function
• Many mathematical functions can be applied to a whole vector at once; vectors can also be traversed like arrays
• Index starts at 1, not 0!
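The points above can be sketched as follows. The values and the names loads and mixed are made up for illustration.

```r
loads <- c(3.79, 3.11, 2.94)
loads <- c(loads, 4.81)    # add a value by concatenating into the vector

loads[1]                   # indexing starts at 1, so this is the first element
length(loads)
mean(loads)                # a math function applied to the whole vector
loads * 2                  # arithmetic is applied element by element

# Mixing types forces everything to a single mode
mixed <- c(1, "a")
mode(mixed)

# Traversing a vector like an array
for (l in loads) print(l)
```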
![Page 16: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/16.jpg)
Scalars
• One-element vectors
> x <- 8
> x[1]
[1] 8
• also climb your rigging
©Disney.
![Page 17: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/17.jpg)
Character Strings
• Single-element vectors with mode character
> y <- "abc"
> length(y)
[1] 1
> mode(y)
[1] "character"
• Can do normal string things, like
> t <- paste("yo","dawg")
> t
[1] "yo dawg"
> u <- strsplit(t,"")
> u
[[1]]
[1] "y" "o" " " "d" "a" "w" "g"
![Page 18: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/18.jpg)
Matrices
• Two-dimensional array
> m <- rbind(c(1,4),c(2,2))
> m
     [,1] [,2]
[1,]    1    4
[2,]    2    2
> m[1,2]
[1] 4
> m[1,]
[1] 1 4
![Page 19: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/19.jpg)
Lists
• Contain elements of different types
• Have a particular syntax
> x <- list(u=2, v="abc")
> x
$u
[1] 2

$v
[1] "abc"

> x$u
[1] 2
![Page 20: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/20.jpg)
Data Frames
• Matrices are limited to only a single type for all elements
• A data frame can contain different types of data, can be read in from a file or created in realtime
> df <- data.frame(list(kids=c("Olivia","Madison"),ages=c(10,8)))
> df
     kids ages
1  Olivia   10
2 Madison    8
> df$ages
[1] 10 8
![Page 21: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/21.jpg)
Putting R to Work
• Read in a log file:
> access <- read.table("access.log", header=FALSE)
> head(access)
V1 V2 V3 V4 V5 V6 V7 V8
1 192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] GET /menu/menu.js HTTP/1.1 401 401
2 192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] GET /menu/menu.js HTTP/1.1 200 1970
3 192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] GET /menu/menu.css HTTP/1.1 200 2258
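To try this without a log file on disk, here is a small sketch that feeds read.table a few lines directly. The lines are modeled on the sample above and assume the usual access-log layout, where the request is wrapped in double quotes so that it lands in a single column.

```r
# Hypothetical log lines in the usual access-log layout
log_lines <- c(
  '192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] "GET /menu/menu.js HTTP/1.1" 401 401',
  '192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] "GET /menu/menu.js HTTP/1.1" 200 1970',
  '192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] "GET /menu/menu.css HTTP/1.1" 200 2258'
)

# text= reads from a character vector instead of a file
access <- read.table(text = log_lines, header = FALSE, stringsAsFactors = FALSE)

ncol(access)          # columns V1 through V8
access[, 7]           # column 7 holds the return codes
table(access[, 7])    # count of each return code
```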
![Page 22: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/22.jpg)
Fun with Plots
• This plot series is going to make use of the “return codes” from the access log
• We’ll do a series of plots that gradually get more sophisticated
• This is a basic histogram of the data; it’s not much fun
![Page 23: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/23.jpg)
Barplot
barplot(table(access[,7]))
![Page 24: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/24.jpg)
Barplot v2
barplot(table(access[,7]),ylab="Number of Pages",xlab="Return Code",main="Plot of Return Codes")
![Page 25: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/25.jpg)
Barplot v3
barplot(table(access[,7]),ylab="Number of Pages",xlab="Return Code",main="Plot of Return Codes", col=heat.colors(length(table(access[,7]))))
![Page 26: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/26.jpg)
Barplot v4
Source: wikipedia, http://en.wikipedia.org/wiki/Bar_%28establishment%29
![Page 27: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/27.jpg)
Writing Graphical Output to Files
• Set up the output target by calling a graphics function:
• pdf(), png(), jpeg(), etc.
• jpeg("/var/www/images/returncodes-date.jpg")
• Call the plot function you have chosen, then call dev.off()
• Can be used in batch mode to create graphics from your data
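The open-plot-close sequence can be sketched like this. The return codes are made up, the output goes to a temporary directory rather than a web root, and pdf() is used only because it works on any R build; jpeg() or png() follow the same pattern.

```r
# Hypothetical return codes standing in for access[,7]
codes <- c(200, 200, 401, 404, 200)

# 1. Open the output device, pointing it at a file
outfile <- file.path(tempdir(), "returncodes.pdf")
pdf(outfile)

# 2. Call the plot function; it draws into the file, not the screen
barplot(table(codes), xlab = "Return Code", ylab = "Number of Pages",
        main = "Plot of Return Codes")

# 3. Close the device so the file is flushed to disk
invisible(dev.off())

file.exists(outfile)
```

Forgetting dev.off() is the usual reason for an empty or truncated image file.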
![Page 28: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/28.jpg)
Shopping is Hard, Let’s Do Math
• Read in some load averages (one-min)
loadavg<-read.table("load_avg.txt")
head(loadavg)
    V1
1 3.79
2 3.11
3 2.94
4 4.81
![Page 29: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/29.jpg)
Summary Stats
• Summarize the data with one function call
• Gives the min, max, mean, median, and quartiles
summary(loadavg)
       V1
 Min.   :0.760
 1st Qu.:1.390
 Median :1.970
 Mean   :2.302
 3rd Qu.:3.080
 Max.   :5.070
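A small sketch of the same call on a handful of values. The data frame here is a made-up stand-in for the real load_avg.txt, so the numbers will not match the slide.

```r
# Hypothetical one-minute load averages, in a one-column data frame
loadavg <- data.frame(V1 = c(3.79, 3.11, 2.94, 4.81, 0.76, 1.39))

# summary() on a numeric column returns the six summary statistics
s <- summary(loadavg$V1)
s
names(s)

# The individual pieces are also available on their own
median(loadavg$V1)
quantile(loadavg$V1)
```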
![Page 30: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/30.jpg)
Summary Stats as Boxplot
![Page 31: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/31.jpg)
Same Thing, 3 Datacenters
> cpu<-read.table("cpu")
> head(cpu)
V1 V2
1 3.78 smq
2 2.57 smq
3 3.69 smq
4 0.86 smq
• Looks like there are outliers. That could spell trouble! You found them with R awesomeness. Hooray!
boxplot(cpu[,1] ~ cpu[,2], xlab="Load Average at Time t, by Datacenter", ylab="One-Minute Load Average", main="Box Plot of One-Minute Load Average, FEs", col=topo.colors(3))
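As a rough sketch of how the outliers can also be pulled out as numbers, boxplot() returns its statistics invisibly. The data below is made up: "smq" comes from the sample above, "abc" is a hypothetical second datacenter, and the plot is written to a temporary PDF so the example runs without a display.

```r
# Hypothetical load averages for two datacenters
cpu <- data.frame(
  V1 = c(3.78, 2.57, 3.69, 0.86, 1.2, 1.4, 1.3, 9.5, 1.1),
  V2 = factor(c("smq", "smq", "smq", "smq", "abc", "abc", "abc", "abc", "abc"))
)

pdf(file.path(tempdir(), "cpu-boxplot.pdf"))
# The formula V1 ~ V2 draws one box per level of V2
bp <- boxplot(V1 ~ V2, data = cpu,
              xlab = "Datacenter", ylab = "One-Minute Load Average",
              col = topo.colors(2))
invisible(dev.off())

bp$names   # group labels, in plotting order
bp$out     # values drawn as outlier points
bp$group   # which box each outlier belongs to
```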
![Page 32: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/32.jpg)
Running R in Your Workflow
• The little bit of boxplotting we did earlier, in a script:
[mandi@mandi ~]$ cat sample.R
#!/usr/bin/env Rscript
cpu<-read.table("cpu")
jpeg("./sample.jpg")
boxplot(cpu[,1] ~ cpu[,2], xlab="Load Average at Time t, by Datacenter", ylab="One-Minute Load Average", main="Box Plot of One-Minute Load Average, FEs", col=heat.colors(3))
dev.off()
[mandi@mandi ~]$ Rscript sample.R > /dev/null
[mandi@mandi ~]$ ls -l sample.jpg
-rw-rw-r-- 1 mandi staff 20137 Oct 24 20:44 sample.jpg
![Page 33: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/33.jpg)
Hey!
• I made a graph with a script!
![Page 34: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/34.jpg)
What Else?
• R can read data input from a variety of files with regular formats
• R can also fetch data from the internet using the url() function
• R has a number of functions available for dealing with reading data, creating data frames or other structures, and converting string text into numerical data modes
• Extended packages provide support for structured data formats like JSON.
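A minimal sketch of two of these points: converting strings to numbers, and reading from a connection. It uses an in-memory text connection with made-up data so that it runs offline; a connection created with url() can be handed to the same reader functions, but that is not run here because it needs network access.

```r
# Strings that look like numbers are still mode "character" until converted
raw <- c("3.79", "3.11", "2.94")
mode(raw)
nums <- as.numeric(raw)
mode(nums)
mean(nums)

# Read CSV data from a connection rather than a file name
con <- textConnection("load,dc\n3.79,smq\n3.11,smq")
df <- read.csv(con)
close(con)
str(df)
```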
![Page 35: R for Pirates. ESCCONF October 27, 2011](https://reader034.vdocuments.us/reader034/viewer/2022050614/554bca96b4c905706a8b465a/html5/thumbnails/35.jpg)
References
• http://www.slideshare.net/dataspora/an-interactive-introduction-to-r-programming-language-for-statistics
• http://www.harding.edu/fmccown/R/
• Art of R Programming, Norman Matloff, Copyright 2011 No Starch Press
• Statistical Analysis with R, John M. Quick, Copyright 2011 Packt Publishing