
Page 1: An Introduction to R - Yale Universityjay/Brazil/Campinas/IntroToR/IntroToR.pdfJohn W. Emerson An Introduction to R. Minimal Sufficient R Example 1: Nationalistic Bias in Olympic

Minimal Sufficient R | Example 1: Nationalistic Bias in Olympic Judging | Example 2: Gambling on College Basketball | Conclusion

An Introduction to R

John W. Emerson

http://www.stat.yale.edu/~jay/
Associate Professor of Statistics, Yale University
(Professor Emerson prefers to be called “Jay”)

These slides complement the talk; they may be a useful reference, but they may also be confusing on their own, without the benefit of discussion. They are not intended as a substitute for an exhaustive, book-like presentation of the language and its syntax, if you prefer that approach.

Please feel free to ask questions along the way!

Intended audience: R newbies who have some prior programming/scripting experience.

http://www.stat.yale.edu/~jay/Brazil/Campinas/IntroToR/

John W. Emerson An Introduction to R

Outline

1 Minimal Sufficient R
    Preliminaries
    Objects and simple manipulations
    Loops (and related loop-like abilities)
    Conditionals
    Functions

2 Example 1: Nationalistic Bias in Olympic Judging

3 Example 2: Gambling on College Basketball

4 Conclusion

Preliminaries | Objects and simple manipulations | Loops (and related loop-like abilities) | Conditionals | Functions

Preliminaries: R? Why? What is it? Is there a GUI?

- R is the lingua franca of statistics.
- It is a language and environment for statistical programming that is ideal for interactive data analysis and graphics, and much, much more.
- It is extended by a large collection of packages.
- If you want a GUI, there are some options. But that misses the point: a GUI does not lead to reproducible research. Don’t go there.

Preliminaries: available resources

- The R Project: http://www.r-project.org/
- Official documentation: http://cran.r-project.org/manuals.html
- Contributed documentation: http://cran.r-project.org/other-docs.html
- Other resources linked on CRAN: Frequently Asked Questions (FAQs), the R Journal, a Wiki, Books, etc.
- Another R community site: http://crantastic.org/
- Sweave: http://www.statistik.uni-muenchen.de/~leisch/Sweave/
- Reproducible research: http://cran.r-project.org/web/views/ReproducibleResearch.html

Preliminaries: the command line interface

- The command line is your window into R: > is the prompt, and + indicates the continuation of a command.
- Escape (Esc) in Windows or Control-C in Linux/Mac can “kill” the command you are typing or (often) a task that is running.
- The arrow keys scroll through the history of commands, so you can edit a recent command and try again instead of re-typing it.
- Find a good plain-text editor (Windows Notepad or WordPad, Linux vi or gedit, etc.) for developing scripts; the scripts represent the true work and value of any project.
- Do not use Microsoft Word! It does screwy things with quotes.
- At the most basic level, the command line is a calculator:

> 1 + 2 - 3 * (4 + 5)/6^7

[1] 2.999904
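
The calculator example relies on R's operator precedence: ^ is applied before * and /, which are applied before + and -. A minimal sketch (the names a and b are only for illustration) checks this by writing the same expression fully parenthesized:

```r
# The slide's expression, and the same expression with every grouping made explicit
a <- 1 + 2 - 3 * (4 + 5)/6^7
b <- (1 + 2) - ((3 * (4 + 5)) / (6^7))
print(a)         # 2.999904
print(a == b)    # TRUE: the explicit grouping is the one R uses
# ^ also binds more tightly than unary minus
print(-2^2)      # -4, i.e. -(2^2)
print((-2)^2)    # 4
```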

Vectors and the major atomic types; assignments

> x <- c(2, 4, 8, 16, 32)
> y = rep("Y", length(x))
> y[3] <- "MIT"
> z <- y == "MIT"
> x

[1] 2 4 8 16 32

> y

[1] "Y" "Y" "MIT" "Y" "Y"

> z

[1] FALSE FALSE TRUE FALSE FALSE

QUESTION: what is the result of x[z]?

Vectors and the major atomic types; assignments

What is the result of x[z]? Try it!

> x

[1] 2 4 8 16 32

> z

[1] FALSE FALSE TRUE FALSE FALSE

> x[z]

[1] 8
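
The three vectors x, y and z have three different atomic types. A short sketch using typeof() and which() (neither appears on the slides) makes the types visible and shows what c() does when types are mixed:

```r
x <- c(2, 4, 8, 16, 32)
y <- rep("Y", length(x))
y[3] <- "MIT"
z <- y == "MIT"
print(typeof(x))    # "double"    (numeric)
print(typeof(y))    # "character" (string)
print(typeof(z))    # "logical"
print(which(z))     # 3: the position(s) where z is TRUE
print(x[which(z)])  # 8, the same as x[z]
# c() coerces mixed inputs to the most general type present
print(c(1, TRUE))   # 1 1: TRUE becomes the number 1
print(c(1, "a"))    # "1" "a": everything becomes character
```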

Vectors and subsetting

> x <- 2^(1:5)
> x

[1] 2 4 8 16 32

> x[c(2, 4)]

[1] 4 16

> x[-c(1, 3, 5)]

[1] 4 16

> x[c(FALSE, TRUE, FALSE, TRUE, FALSE)]

[1] 4 16

> x[c(F, T, F, T, F)]

[1] 4 16
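
A few more subsetting rules are worth knowing early. This sketch reuses x from the slide and shows standard base R behavior: a short logical index is recycled, a position past the end gives NA, a logical test is the usual index, and positive and negative positions cannot be mixed:

```r
x <- 2^(1:5)                          # 2 4 8 16 32
print(x[c(TRUE, FALSE)])              # 2 8 32: the short logical index is recycled
print(x[7])                           # NA: position 7 does not exist
print(x[x > 10])                      # 16 32: a logical test used directly as the index
mixed <- try(x[c(-1, 2)], silent = TRUE)
print(inherits(mixed, "try-error"))   # TRUE: mixing positive and negative positions is an error
```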

Vector logical comparisons

> x <- runif(4)
> x

[1] 0.3555607 0.4259428 0.5169406 0.3673003

> y <- rep(c(TRUE, FALSE), 2)
> y

[1] TRUE FALSE TRUE FALSE

> (x > 0.5)

[1] FALSE FALSE TRUE FALSE

QUESTION: what is the result of (x > 0.5) & y?

Vector logical comparisons

What is the result of (x > 0.5) & y? Try it!

> x

[1] 0.3555607 0.4259428 0.5169406 0.3673003

> (x > 0.5)

[1] FALSE FALSE TRUE FALSE

> y

[1] TRUE FALSE TRUE FALSE

> (x > 0.5) & y

[1] FALSE FALSE TRUE FALSE
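
Besides &, R has | (or) and ! (not), which also work element by element, and && and ||, which expect single TRUE/FALSE values and belong in if() conditions. A sketch using the numbers printed above (hard-coded here, because runif() gives different values on every run unless set.seed() is used):

```r
x <- c(0.3555607, 0.4259428, 0.5169406, 0.3673003)  # values from the slide
y <- rep(c(TRUE, FALSE), 2)
print((x > 0.5) & y)   # FALSE FALSE  TRUE FALSE
print((x > 0.5) | y)   #  TRUE FALSE  TRUE FALSE
print(!y)              # FALSE  TRUE FALSE  TRUE
# && and || take single values; any() and all() reduce a logical vector to one value
if (any(x > 0.5) && all(x > 0)) cat("at least one value exceeds 0.5\n")
```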

Missing values: be cautious...

> x[2] <- NA
> x

[1] 0.7176185 NA 0.3800352 0.7774452

> x == NA # Never do this, because...

[1] NA NA NA NA  # "I don't know" and anything is
                 # "I really don't know"

> x[x > 0.5] # Be careful when there are NA values

[1] 0.7176185 NA 0.7774452

> x[!is.na(x) & x > 0.5]

[1] 0.7176185 0.7774452

> subset(x, x > 0.5)

[1] 0.7176185 0.7774452
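
Many summary functions propagate NA unless told otherwise. A small sketch with the values from the slide (hard-coded, since they came from runif()) shows which(), which simply skips NA comparisons, and the na.rm argument; neither is on the slide:

```r
x <- c(0.7176185, NA, 0.3800352, 0.7774452)
print(sum(is.na(x)))          # 1: the number of missing values
print(x[which(x > 0.5)])      # which() ignores NA comparisons, so no NA leaks through
print(mean(x))                # NA
print(mean(x, na.rm = TRUE))  # mean of the three observed values
```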

Quick review of some points thus far

- Basic atomic types: numeric, logical, string (text; type character)
- I recommend <- instead of = for assignments
- Reserve = for named function arguments (discussed later)
- Integers, logical vectors, and negative numbers can all be used for indexing subsets
- is.na is your friend when there are missing values
- Use & for element-wise logical comparisons of vectors
- Use == for element-wise comparison of values (but not with NA values)
- R is case-sensitive; x and X are different objects!

Objects: data.frame for data sets

See data() for a list of data sets accompanying R, such as CO2; see help("CO2") for more information on this data set. It is a data.frame object:
> is.data.frame(CO2)

[1] TRUE

> dim(CO2)

[1] 84 5

> CO2[sample(1:nrow(CO2), 3), ]

   Plant        Type  Treatment conc uptake
14   Qn2      Quebec nonchilled 1000   44.3
12   Qn2      Quebec nonchilled  500   40.6
50   Mn2 Mississippi nonchilled   95   12.0
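
A few more ways to look inside a data.frame, using only base R on the built-in CO2 data. set.seed() is an addition (it is not on the slide) so that the same random rows come back on each run; sample(n, k) is shorthand for sample(1:n, k):

```r
print(names(CO2))                 # the column names
print(c(nrow(CO2), ncol(CO2)))    # 84 5, the same as dim(CO2)
print(head(CO2, 3))               # the first three rows
set.seed(1)                       # make sample() reproducible
rows <- sample(nrow(CO2), 3)      # three row numbers drawn from 1:84
print(CO2[rows, c("Plant", "conc", "uptake")])  # those rows, selected columns by name
```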

Accessing subsets of a data.frame

> summary(CO2$conc)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     95     175     350     435     675    1000

> table(CO2[, 2])

     Quebec Mississippi
         42          42

> plot(CO2$conc, CO2$uptake,
+      main = "CO2 Concentration vs Uptake",
+      xlab = "Concentration of Ambient CO2 (mL/L)",
+      ylab = "CO2 Uptake (umol/m^2 sec)")
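
Rows can be selected with a logical condition in the first index and columns by name in the second. A sketch on CO2 (the names quebec and top are only for illustration); note that CO2[, 2] above is the Type column:

```r
quebec <- CO2[CO2$Type == "Quebec", ]           # rows where the test is TRUE, all columns
print(nrow(quebec))                             # 42, matching table(CO2[, 2])
print(table(CO2$Type, CO2$Treatment))           # 21 rows in each Type x Treatment cell
top <- CO2[CO2$conc == 1000, c("Plant", "uptake")]
print(top)                                      # one row per plant at the highest concentration
```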

Our first plot

[Figure: scatterplot titled "Carbon Dioxide Uptake in Grass Plants"; x-axis: Concentration of Ambient CO2 (mL/L); y-axis: CO2 Uptake (umol/m^2 sec)]

A package and factors (output truncated, sorry)

The term pure factor is my own, meaning there is no mixture of character and numeric values in the factor levels; R’s factors (categorical variables) may have ordered levels, however.
> install.packages("YaleToolkit")  # OUTPUT OMITTED
> library(YaleToolkit)
> whatis(CO2)[, 1:4]  # Only a few columns shown here

  variable.name           type missing distinct.values
1         Plant ordered factor       0              12
2          Type    pure factor       0               2
3     Treatment    pure factor       0               2
4          conc        numeric       0               7
5        uptake        numeric       0              76

> levels(CO2$Type)

[1] "Quebec" "Mississippi"
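
Factor levels carry their own order, which is not necessarily alphabetical. A sketch using base R only (type2 is an illustrative name) shows the classes involved and how to set the level order explicitly, for example to choose the baseline level in a model:

```r
print(class(CO2$Plant))   # "ordered" "factor"
print(class(CO2$Type))    # "factor"
print(nlevels(CO2$Type))  # 2
# Re-create the factor with the levels in the order you want
type2 <- factor(CO2$Type, levels = c("Mississippi", "Quebec"))
print(levels(type2))      # "Mississippi" "Quebec"
print(table(type2))       # same counts; only the level order differs
```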

Advice on using factor objects in R

A factor is stored as a vector of integer level codes and an associated set of factor level labels. There are strengths and weaknesses to this approach, so be careful. For example, the integer codes can be useful:
> as.numeric(factor(c("C", "A", "A", "B")))

[1] 3 1 1 2

I recommend avoiding factors unless:
- You get to the point where you are fitting models (with lm(), for example)
- You are using formula notation (~) with some plot routines
- You really know what you are doing

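I recommend the as.is = TRUE option to read.table() and read.csv(), discussed later, to produce vectors of strings (type character) rather than factors in data frames. (Since R 4.0.0 the default is stringsAsFactors = FALSE, so newer versions of R already keep strings as character.)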
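
The integer codes are also the classic trap with factors whose labels look like numbers: as.numeric() returns the codes, not the values. A short sketch with made-up labels:

```r
f <- factor(c("10", "5", "20", "5"))   # numbers stored as factor labels
print(levels(f))                        # "10" "20" "5": levels sort as strings
print(as.numeric(f))                    # 1 3 2 3: the integer codes, not the values
print(as.numeric(as.character(f)))      # 10 5 20 5: convert the labels instead
print(as.numeric(levels(f))[f])         # 10 5 20 5: same result, converting each level once
```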

Lists

A list is an ordered collection of objects called components. These components can be named and are always numbered. A data.frame is actually a list where each component is of the same length.
> x <- list(name = "Jay", father = "John", numcars = 3,
+           cars = c("Camry", "Accord", "Miata"))
> length(x)

[1] 4

> x$father

[1] "John"

> x[[2]]

[1] "John"

> x[2]

$father
[1] "John"
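
The difference between [[ ]] and [ ] is the main thing to remember about lists: [[ ]] (or $) extracts one component, while [ ] returns a smaller list. A sketch with the same x:

```r
x <- list(name = "Jay", father = "John", numcars = 3,
          cars = c("Camry", "Accord", "Miata"))
print(class(x[[2]]))   # "character": the component itself
print(class(x[2]))     # "list": a list of length 1 holding that component
print(x[["cars"]])     # [[ ]] also accepts a component name
x$numcars <- 2         # $ can replace (or add) a component by name
print(names(x))        # "name" "father" "numcars" "cars"
```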

Lists

> x <- list(name = "Jay", father = "John", numcars = 3,
+           cars = c("Camry", "Accord", "Miata"))
> x[[1]]

[1] "Jay"

> x[1:2]

$name
[1] "Jay"

$father
[1] "John"

Question: which is ok? x[[4]][1:2] or x[4][[1:2]]?

Answer: bafflingly, both work, but...

> x <- list(name = "Jay", father = "John", numcars = 3,
+           cars = c("Camry", "Accord", "Miata"))
> x[[4]][1:2]  # No problem, this makes sense.

[1] "Camry" "Accord"

> x[4][[1:2]]  # This doesn't seem well-defined to me

[1] "Accord"

> x[4][[2]]  # This error seems reasonable to me

Error in x[4][[2]] : subscript out of bounds
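
The puzzle has a documented explanation: when [[ ]] is given a vector of indices it indexes recursively, so x[[c(i, j)]] means x[[i]][[j]]. Here x[4] is a list of length 1, so x[4][[1:2]] is x[4][[1]][[2]], the second car, while x[4][[2]] asks for a second component that does not exist. A sketch:

```r
x <- list(name = "Jay", father = "John", numcars = 3,
          cars = c("Camry", "Accord", "Miata"))
print(x[4][[1:2]])      # "Accord"
print(x[4][[1]][[2]])   # "Accord": the same thing, written out step by step
print(x[[c(4, 2)]])     # "Accord": 4th component of x, then its 2nd element
```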

Matrices (see also: help("array"))

> x <- matrix(rnorm(6, 10, 3), 2, 3)
> x

         [,1]     [,2]      [,3]
[1,] 13.14202 12.26642  6.992394
[2,] 12.72067 14.76067 12.868565

> y <- (x - mean(x))/sd(x)
> y

          [,1]      [,2]       [,3]
[1,] 3.4130422 0.0340064 -2.9102096
[2,] 0.3376671 8.8457165  0.1789235

> solve(y %*% t(y))

              [,1]          [,2]
[1,]  0.0497308580 -0.0005916098
[2,] -0.0005916098  0.0127633223
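
Because rnorm() draws new numbers on each run, printed results like these are hard to compare across runs. A sketch that fixes the seed (set.seed() is an addition, not on the slide) and checks properties instead of printed digits: the standardized entries have mean 0 and standard deviation 1, and solve() returns the inverse of the 2 x 2 product:

```r
set.seed(42)                        # reproducible draws
x <- matrix(rnorm(6, mean = 10, sd = 3), nrow = 2, ncol = 3)
y <- (x - mean(x))/sd(x)            # standardize all six entries together
print(mean(y))                      # 0, up to floating-point error
print(sd(y))                        # 1
m <- y %*% t(y)                     # (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix
print(dim(m))                       # 2 2
print(round(solve(m) %*% m, 10))    # the 2 x 2 identity matrix
```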

Matrices: underlying storage (important for C/C++)

An R matrix is fundamentally a vector with column-major storage (column 1 followed by column 2 and so on).
> x <- matrix(1:6, nrow = 2, ncol = 3)
> x

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

> as.numeric(x)

[1] 1 2 3 4 5 6

> y <- matrix(1:6, nrow = 2, byrow = TRUE)
> y

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
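
Column-major storage means a single index counts down the columns: in a matrix with nrow rows, element [i, j] is element i + (j - 1) * nrow of the underlying vector (C code would use the 0-based version, i + j * nrow). A sketch; i, j and pos are illustrative names:

```r
x <- matrix(1:6, nrow = 2, ncol = 3)
i <- 2; j <- 3
pos <- i + (j - 1) * nrow(x)   # 2 + 2 * 2 = 6
print(x[i, j])                 # 6
print(x[pos])                  # 6: the same element, by its position in the vector
dim(x) <- c(3, 2)              # same storage, new shape: refilled column by column
print(x)
```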

Graphics: one slide summary, with apologies...

- Basics: plot(), hist(), boxplot()
- Scatterplot matrix (or pairs plot): pairs()
- Quantile plots: qqnorm(), qqplot()
- Mosaic plots (for graphical displays of contingency tables): mosaicplot()

The above are for base R graphics. Also available:
- With R: grid (low-level), lattice (higher-level, built on grid)
- Additional package from CRAN: ggplot2
- Others, partly or entirely external to R, that I’ve used: Mondrian, iPlots eXtreme
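
A quick tour of the base graphics functions listed above, applied to CO2. This sketch draws to a PDF file in a temporary directory (the file name is arbitrary) so it also runs without a screen; in an interactive session you can drop the pdf() and dev.off() lines:

```r
path <- file.path(tempdir(), "co2-plots.pdf")
pdf(path)                                     # open a file-based graphics device
hist(CO2$uptake, main = "CO2 uptake")         # distribution of one variable
boxplot(uptake ~ Type, data = CO2)            # formula notation: uptake split by Type
pairs(CO2[, c("conc", "uptake")])             # scatterplot matrix
qqnorm(CO2$uptake); qqline(CO2$uptake)        # normal quantile plot with reference line
mosaicplot(table(CO2$Type, CO2$Treatment), main = "Type by Treatment")
invisible(dev.off())                          # close the file
```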

Help!

Help is worth a formal comment:
- help("lm") is equivalent to ?lm
- help.search("regression"): fuzzy matching of the word regression in the help pages, more forgiving than help()
- Each help page contains See also: advice, which can lead to related (and often helpful) commands.
- Each help page contains examples at the bottom, which can be copied into the command window and explored. This is underappreciated, and very, very useful!
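
The help examples can also be run directly rather than copied. A sketch: example() runs the Examples section of a help page, and apropos() lists visible objects whose names match a pattern (case-insensitively by default). The help() and help.search() calls are left as comments because they open a pager or browser in an interactive session; ??regression is shorthand for help.search("regression").

```r
# help("lm")                 # the same page as ?lm
# help.search("regression")  # the same search as ??regression
example("mean")              # runs the Examples section of help("mean")
found <- apropos("mean")     # names such as "mean" and "colMeans"
print(head(found))
```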

Loops (and related loop-like abilities)

> x <- matrix(rnorm(16), 4, 4)
> cmeans <- rep(0, ncol(x))
> for (i in 1:ncol(x)) {
+   cmeans[i] <- mean(x[, i])
+ }
> cmeans

[1] -0.3988443 0.2522960 0.7115236 -0.5938051

> apply(x, 2, mean)

[1] -0.3988443 0.2522960 0.7115236 -0.5938051
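
Three equivalent ways to get column means, as a sketch with a fixed seed (not on the slide). seq_len(ncol(x)) is a slightly safer loop index than 1:ncol(x), since 1:0 would loop over c(1, 0); colMeans() is the specialised built-in:

```r
set.seed(1)
x <- matrix(rnorm(16), nrow = 4, ncol = 4)
cmeans <- numeric(ncol(x))                    # preallocate the result
for (i in seq_len(ncol(x))) {
  cmeans[i] <- mean(x[, i])
}
print(all.equal(cmeans, apply(x, 2, mean)))   # TRUE: MARGIN = 2 means "by column"
print(all.equal(cmeans, colMeans(x)))         # TRUE
print(apply(x, 1, mean))                      # MARGIN = 1 gives row means instead
```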

Other apply-like functions

See also help("lapply") (similar to sapply() but returning a list rather than a vector) and help("tapply") (which I think of as a tabular or conditional type of apply() over cells of a contingency table defined by one or more categorical variables).
> x <- list(A = rnorm(10), B = rexp(5))
> length(x)

[1] 2

> length(x$A)

[1] 10

> sapply(x, length)

 A  B
10  5
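
A sketch contrasting the apply-like functions mentioned above. vapply() is an addition not on the slide (a stricter sapply() that declares the result type), and the tapply() calls use the built-in CO2 data:

```r
set.seed(1)
x <- list(A = rnorm(10), B = rexp(5))
str(lapply(x, length))                   # a list with components A = 10 and B = 5
print(sapply(x, length))                 # simplified to a named integer vector
print(vapply(x, length, integer(1)))     # same, but errors if a result is not one integer
print(tapply(CO2$uptake, CO2$Type, mean))                       # mean uptake per Type
print(tapply(CO2$uptake, list(CO2$Type, CO2$Treatment), mean))  # 2 x 2 table of means
```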

More loops

See also help("repeat") and help("while")

- The break statement can be used to terminate any loop and is the only way to terminate a repeat loop.
- It can be convenient to loop over the elements of a set rather than over integer indices.

> people <- c("Jay", "John", NA, "Amy")
> for (n in people) {
+   if (is.na(n)) break
+   cat("Hello, this person is", n, "\n")
+ }

Hello, this person is Jay
Hello, this person is John
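
Sketches of the other loop forms mentioned above, with made-up numbers: while checks its condition before each pass, repeat runs until break, and next (not on the slide) skips to the following iteration instead of ending the loop:

```r
total <- 0
k <- 0
while (total < 10) {          # stops once 1 + 2 + 3 + 4 = 10 is reached
  k <- k + 1
  total <- total + k
}
cat("k =", k, "total =", total, "\n")

v <- 100
halvings <- 0
repeat {
  v <- v / 2
  halvings <- halvings + 1
  if (v < 1) break            # without break, repeat never ends
}
cat("halvings =", halvings, "\n")   # 100 / 2^7 = 0.78125 is the first value below 1

for (n in c("Jay", NA, "Amy")) {
  if (is.na(n)) next          # skip the missing name and keep going
  cat("Hello,", n, "\n")
}
```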

Conditionals (also see help("ifelse"))

> x <- c(1, 2, 3, NA, 5)
> is.na(x)

[1] FALSE FALSE FALSE TRUE FALSE

> any(is.na(x))

[1] TRUE

> if (any(is.na(x))) {
+   x <- x[!is.na(x)]
+   cat("Missing value(s) removed.\n")
+ } else {
+   cat("No missing values found.\n")
+ }

Missing value(s) removed.

> x

[1] 1 2 3 5
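
if() needs a single TRUE or FALSE, while ifelse() works element by element. A sketch with the same starting x as the slide; note that in R 4.2.0 and later, giving if() a condition of length greater than one is an error:

```r
x <- c(1, 2, 3, NA, 5)
print(ifelse(is.na(x), 0, x))          # 1 2 3 0 5: replace missing values
print(ifelse(x > 2, "big", "small"))   # an NA test gives an NA result
res <- tryCatch(if (x > 2) "big" else "small",
                error = function(e) "if() needs a single TRUE/FALSE")
print(res)                             # the error message above, in R >= 4.2.0
```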

Functions: a trimmed mean?

> f <- function(x, trim = 1, na.rm = FALSE) {
+   if (na.rm) {
+     x <- x[!is.na(x)]
+   } else if (any(is.na(x))) return(NA)
+   x <- sort(x)
+   x <- x[-c(1:trim)]
+   x <- sort(x, decreasing = TRUE)
+   return(mean(x[-c(1:trim)]))
+ }
> f(c(1, 2, 3, 2, 3, 10, NA))

[1] NA

> f(c(1, 2, 3, 2, 3, 10, NA), na.rm = TRUE)

[1] 2.5
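
Two cautions about f(): with trim = 0, -c(1:0) is -c(1, 0), which still drops one value at each end, and base R's mean() already has a trim argument, given as a fraction trimmed from each end rather than a count. A sketch of a variant (f2 is a hypothetical name) that keeps positions trim + 1 through n - trim of the sorted data, so trim = 0 gives the ordinary mean:

```r
f2 <- function(x, trim = 1, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  } else if (anyNA(x)) {
    return(NA)
  }
  x <- sort(x)
  n <- length(x)
  stopifnot(trim >= 0, 2 * trim < n)   # at least one value must remain
  mean(x[(trim + 1):(n - trim)])
}
v <- c(1, 2, 3, 2, 3, 10, NA)
print(f2(v))                               # NA
print(f2(v, na.rm = TRUE))                 # 2.5, as on the slide
print(f2(v, trim = 0, na.rm = TRUE))       # 3.5, the ordinary mean
print(mean(v, trim = 0.2, na.rm = TRUE))   # 2.5: floor(6 * 0.2) = 1 value trimmed per end
```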

Function scoping; call by value; lazy evaluation

> g <- function(x) {
+   x <- x + 1
+   y <- y + 1
+   return(c(x, y))
+ }
> x <- 10
> y <- 100
> g(x)

[1] 11 101

QUESTION: at this point, what are the values of x and y?

Function scoping answer: 10 and 100, unchanged!

> g <- function(x) {
+   x <- x + 1
+   y <- y + 1
+   return(c(x, y))
+ }
> x <- 10
> y <- 100
> g(x)

[1] 11 101

> x

[1] 10

> y

[1] 100
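
The slide title also mentions lazy evaluation: an argument is not evaluated until the function body uses it. A sketch with hypothetical function names lazy() and bump(), also showing call by value with a vector argument:

```r
lazy <- function(a, b) {
  a * 2                                    # b is never used, so it is never evaluated
}
print(lazy(3, stop("this is never run")))  # 6, and no error

bump <- function(v) {
  v[1] <- 99                               # changes the function's own copy only
  v
}
original <- c(1, 2, 3)
bumped <- bump(original)
print(original)                            # 1 2 3: unchanged in the caller
print(bumped)                              # 99 2 3
```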

Function scoping, continued

> g <- function(x) {
+   x <<- x + 1
+   return(x)
+ }
> x <- 1
> g(x)

[1] 1

> x

[1] 2
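
Changing the global workspace from inside a function, as above, is usually best avoided. A more typical use of <<- is a closure that keeps private state; a sketch with a hypothetical make_counter():

```r
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # updates count in make_counter()'s environment
    count
  }
}
counter <- make_counter()
first <- counter()
second <- counter()
print(c(first, second))   # 1 2
```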

Dot dot dot (though this seems pretty silly)

> h <- function(x, func = mean, ...) {
+   return(func(x, ...))
+ }
> x <- sample(10, 10, replace = TRUE)
> x[2] <- NA
> x

[1] 8 NA 5 1 1 2 4 9 6 1

> h(x)

[1] NA

> h(x, na.rm = TRUE)

[1] 4.111111

> h(x, sd, na.rm = TRUE)

[1] 3.100179
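
Inside a function, ... can also be captured with list(...) and inspected. A sketch; show_dots() is a hypothetical helper, and quantile() stands in as another function that accepts extra arguments through h():

```r
h <- function(x, func = mean, ...) {
  return(func(x, ...))
}
show_dots <- function(...) {
  args <- list(...)                    # everything passed through ...
  nm <- names(args)
  c(n = length(args), named = if (is.null(nm)) 0L else sum(nm != ""))
}
print(show_dots(1, 2, na.rm = TRUE))   # n = 3, named = 1
print(h(c(1, NA, 3), na.rm = TRUE))    # 2: na.rm is passed on to mean()
print(h(c(1, NA, 3), quantile, probs = 0.5, na.rm = TRUE))  # the median, via quantile()
```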

Nationalistic Bias in Olympic Judging

See http://www.stat.yale.edu/~jay/Intro2R/:
> x <- read.csv("Diving2000.csv", as.is = TRUE)
> dim(x)

[1] 10787 10

> x[1:3, ]

  Event Round    Diver Country Rank DiveNo Difficulty
1 M3mSB Final XIONG Ni     CHN    1      1        3.1
2 M3mSB Final XIONG Ni     CHN    1      1        3.1
3 M3mSB Final XIONG Ni     CHN    1      1        3.1
  JScore                   Judge JCountry
1    8.0 RUIZ-PEDREGUERA Rolando      CUB
2    9.0             GEAR Dennis      NZL
3    8.5           BOYS Beverley      CAN

Gambling on College Basketball

The script “scrapegoldsheet.txt” shows off R’s regular expressions and is a great data scraping/cleaning example.
> source("scrapegoldsheet.txt")
> x <- read.csv("cbb2006.csv", as.is = TRUE)
> dim(x)

[1] 2473 2

> head(x)

  spread gamespread
1  -11.0        -36
2   -7.5         -6
3  -15.5         -8
4  -12.0         -5
5  -17.5        -33
6  -19.5        -45
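
The scraping script itself is not reproduced here. As a generic sketch of the kind of regular-expression work involved, here are base R's grepl(), regexpr(), regmatches() and gsub() applied to invented text lines (not real gold sheet data):

```r
lines <- c("Team A   -11.0   Team B",     # invented example lines
           "Team C  -7.5  Team D",
           "no spread on this line")
pattern <- "-?[0-9]+\\.[0-9]+"            # optional minus sign, digits, point, digits
print(grepl(pattern, lines))              # TRUE TRUE FALSE: which lines contain a number
spread <- as.numeric(regmatches(lines, regexpr(pattern, lines)))
print(spread)                             # -11.0 -7.5: the first match in each matching line
print(gsub("\\s+", " ", lines[1]))        # collapse runs of whitespace to one space
```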

Concluding advice

- You can’t break R, so try it out!
- Got a question? Try starting with the help examples!
- When you’re done, should you save the workspace? NO!
- Document all work related to a given problem in a script, and keep it together with the data set in a dedicated folder. Be organized.

Thanks

- Bell Laboratories (Rick Becker, John Chambers and Allan Wilks), for development of the S language
- Ross Ihaka and Robert Gentleman, for their work and unselfish vision for R
- The R Core team
- Bill Venables and David Smith, for An Introduction to R
- John Hartigan, for years of teaching and mentoring
- John Emerson (my father, Middlebury College), for getting me started in statistics
- All my students, for their willingness to argue with me

Saving the workspace?

You have done some fantastic work. Should you save the workspace? NO! Don’t do it:

- Your fantastic work should be saved in a script.
- The simulated data are reproducible (provided the script sets the random seed with set.seed()).
- Most analysis with real data can be reproduced almost instantly from the data set using the script.
- Perhaps an intermediate version of the data (cleaned from messy raw data, for example) could be saved (and I recommend using CSV files for this purpose).
- Perhaps the result of some extremely time-intensive work could be saved... but this is rare and should be the exception rather than the rule.
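
Following that advice, a sketch of saving an intermediate data set as CSV and reading it back. The subset and the file name are made up for illustration, and the file goes to a temporary directory:

```r
cleaned <- CO2[CO2$conc >= 500, c("Plant", "Type", "conc", "uptake")]
path <- file.path(tempdir(), "co2-cleaned.csv")
write.csv(cleaned, path, row.names = FALSE)   # plain text, readable by any tool
back <- read.csv(path, as.is = TRUE)          # strings come back as character
print(dim(back))                              # 36 4: 12 plants x 3 concentrations
# At the end of a session, q(save = "no") quits without saving the workspace.
```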
