Notas sobre R
Francesc Carmona, Jordi Ocaña and Alex Sánchez
Departament d'Estadística, Universitat de Barcelona
3 October 2007
Outline
- Presentation, overview, references
- Introduction
- Data
- Conditional executions
- Functions
- Lexical Scoping and Computing on the Language
Goals
- Review and deepen R concepts; get into the “details”.
- Present and discuss “advanced” programming tools and concepts:
  - Object-oriented programming
  - Computing on the language, environments, evaluation
  - Debugging and optimization
- Work on aspects that are not usually covered (Sweave, packages, Tcl/Tk).
References and links
- The R language manuals: “An Introduction to R”, “The R Language Definition”, “Writing R Extensions”, “R Installation and Administration”, “R Data Import/Export”, “R Internals”
- S Programming, by W. N. Venables and B. D. Ripley
- R manuals, courses and tutorials by Thomas Girke
- Programming in R (Vincent Zoonekynd)
- R Help & R Coding Conventions, Henrik Bengtsson, Lund University
- Rtips, Paul Johnson, University of Kansas
- Computing for Statisticians. John Scott. U. of Auckland
Review of basic concepts
- When R starts, a workspace (.GlobalEnv) is created, in which variables are stored and manipulated.
- The assignment operator, <-, creates a binding between a symbol and a value.
- rm() breaks the binding and removes the symbol from the environment; the value itself is not touched and is only reclaimed later if nothing else refers to it.
- ls() lists the objects in a specified environment, by default the current one (.GlobalEnv at top level).
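The points above can be tried out in a small sketch. It uses a scratch environment (the name env is only illustrative) so that nothing in .GlobalEnv is touched, and it assumes nothing beyond base R.

```r
# A scratch environment, so the demo does not touch .GlobalEnv
env <- new.env()

# Bind the symbol x to a value inside env
assign("x", c(1, 2, 3), envir = env)

# y now refers to the same value from the calling environment
y <- get("x", envir = env)

objects_before <- ls(env)        # "x"

# rm() removes the binding x from env ...
rm("x", envir = env)
objects_after <- ls(env)         # character(0)

# ... but the value is still reachable through y
y
```

Removing the symbol does not destroy the value as long as another binding (here y) still refers to it.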
Special Values
- NULL (is.null): the null object, which has length zero (an empty list() is a different object)
- NA (is.na): missing data for atomic types
- Inf (is.finite / is.infinite): used to represent infinity
- NaN (is.nan): "not a number", the result of indeterminate operations such as 0/0
- Use typeof() to check the type of a given value
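> x <- 1/0
> typeof(x)         # "double": Inf is stored as a double
> y <- list()
> is.null(y)        # FALSE: an empty list is not NULL
> z <- "NA"
> is.na(z)          # FALSE: the string "NA" is not a missing value
> is.character(z)   # TRUE
> z2 <- NA
> is.na(z2)         # TRUE
> is.character(z2)  # FALSE: a bare NA is logical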
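The slide lists NaN but the example does not exercise it. As a small complement, the sketch below applies each test function to one vector holding all the special numeric values; the vector name vals is only illustrative.

```r
vals <- c(1, NA, NaN, Inf, -Inf)

missing_flag  <- is.na(vals)        # TRUE for both NA and NaN
nan_flag      <- is.nan(vals)       # TRUE only for NaN
finite_flag   <- is.finite(vals)    # TRUE only for ordinary numbers
infinite_flag <- is.infinite(vals)  # TRUE for Inf and -Inf

# NaN arises from indeterminate operations
indeterminate <- 0 / 0

# NULL has length zero
null_length <- length(NULL)
```

Note that is.na() also reports NaN as missing, so is.nan() is needed to tell the two apart.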
Atomic vectors
- The simplest (aka “atomic”) data structures in R are vectors.
- Atomic vectors can contain integers, doubles, logicals or character strings.
- Character vectors in the S language are vectors of character strings, not vectors of characters.
- Vectors can be created in many different ways.
> x <- c(1, 2, 3, 5, 7)
> y <- seq(1, 9, by = 2)
> z <- integer(3)
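To make the remarks about element types concrete, here is a short sketch that inspects two of the vectors created above and shows that a character vector counts strings, not characters. The names words and mixed are illustrative only.

```r
x <- c(1, 2, 3, 5, 7)
z <- integer(3)

type_x <- typeof(x)   # "double": numeric literals are doubles by default
type_z <- typeof(z)   # "integer": integer(3) creates three integer zeros

# A character vector holds whole strings as its elements
words <- c("alpha", "be")
n_elements <- length(words)   # 2 strings
n_chars    <- nchar(words)    # 5 and 2 characters

# Mixing types in c() coerces everything to the most general type
mixed <- c(1, "a", TRUE)      # "1" "a" "TRUE"
```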
Numerical computing
- The only numbers that can be represented exactly in R's numeric type are integers and fractions whose denominator is a power of 2.
- As a result, two floating point numbers will not reliably be equal unless they have been computed by the same algorithm, and not always even then.
> a <- sqrt(2)
> a * a == 2
[1] FALSE
> a * a - 2
[1] 4.440892e-16
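A common way around this, sketched here with base R only, is to compare with a tolerance instead of ==. The explicit tolerance 1e-9 is an arbitrary choice for illustration.

```r
a <- sqrt(2)

exact_equal <- a * a == 2                    # FALSE: rounding error
near_equal  <- isTRUE(all.equal(a * a, 2))   # TRUE: equal up to a small tolerance
manual_tol  <- abs(a * a - 2) < 1e-9         # TRUE: explicit tolerance

# Fractions with a power-of-2 denominator are exact, others are not
exact_binary   <- 0.5 + 0.25 == 0.75         # TRUE
inexact_decimal <- 0.1 + 0.2 == 0.3          # FALSE
```

all.equal() returns a description of the difference rather than FALSE when the values differ, which is why it is wrapped in isTRUE().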
Factors
- Factors reflect the S language's roots in statistical applications.
- A factor is useful when a potentially large collection of data contains relatively few, discrete levels.
- Factors are not plain vectors but objects of class factor: an integer vector of codes plus an attribute named levels.
Factors examples
- Factor creation
> set.seed(123)
> x <- sample(letters[1:5], 10, replace = TRUE)
> y <- factor(x)
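The internal structure described above (integer codes plus a levels attribute) can be inspected directly. This sketch uses a small fixed vector instead of sample() so the results do not depend on the random number generator; the names grp and f are illustrative.

```r
grp <- c("b", "a", "c", "a", "b", "a")
f <- factor(grp)

lev   <- levels(f)       # "a" "b" "c": sorted unique values
codes <- as.integer(f)   # 2 1 3 1 2 1: position of each value in levels
cls   <- class(f)        # "factor"
store <- typeof(f)       # "integer": the underlying storage

# Counts per level
counts <- table(f)
```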
Lists
- Lists can be used to store items that are not all of the same type.
- Lists can be of any length, and the elements of a list can be named, or not.
> (y <- list(a = 1, 17, b = 4:5, c = "a"))
$a
[1] 1
[[2]]
[1] 17
$b
[1] 4 5
$c
[1] "a"
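As a follow-up to the list printed above, this sketch shows the usual ways of getting elements back out, by name or by position. It relies only on base R.

```r
y <- list(a = 1, 17, b = 4:5, c = "a")

element_names <- names(y)   # "a" "" "b" "c": the second element is unnamed
by_name  <- y$b             # 4 5
by_index <- y[[2]]          # 17

# Single brackets return a sub-list, double brackets return the element
sub_list <- y["b"]
element  <- y[["b"]]

# Each element keeps its own type
types <- sapply(y, typeof)
```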
Conditional executions
- These are based on the evaluation of logical expressions.
  - Comparison operators: == (equal), != (not equal), >= (greater than or equal), etc.
  - Logical operators: & (and), | (or) and ! (not).
- Three selection instructions:
  - if, which operates on scalar elements.
  - ifelse, which operates elementwise on vectors.
  - switch, to make a multiple selection.
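The operators and switch listed above are not illustrated later in this section, so here is a minimal sketch. The function name centre and its type argument are hypothetical, introduced only for this example.

```r
x <- c(2, 7, 10)

# Comparison and logical operators work elementwise on vectors
both   <- x >= 5 & x != 10     # FALSE TRUE FALSE
either <- !(x == 2) | x > 8    # FALSE TRUE TRUE

# switch() selects one branch by name; the unnamed last argument is the default
centre <- function(v, type) {
  switch(type,
         mean   = mean(v),
         median = median(v),
         stop("unknown type: ", type))
}

med <- centre(x, "median")     # 7
```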
If example
- The if statement operates on length-one logical vectors.
Syntax: if (cond) cmd1 else cmd2, where cmd1 runs when cond is TRUE and cmd2 otherwise
> x <- 10
> z <- if (x == 0) {
+     1
+ } else {
+     2
+ }
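Because if expects a single TRUE or FALSE, a comparison on a longer vector has to be collapsed first (in R 4.2.0 and later a longer condition is an error). A minimal sketch using any() and all():

```r
x <- c(0, 3, 5)

# any() collapses the elementwise comparison to one logical value
has_zero <- if (any(x == 0)) "yes" else "no"

# all() does the same when every element must satisfy the condition
all_positive <- if (all(x > 0)) "yes" else "no"
```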
Ifelse example
- ifelse operates elementwise on vectors.
Syntax
ifelse(test, true_value, false_value)
> x <- 1:10
> print(ifelse(x < 5 | x > 8, x, 0))
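For reference, the sketch below repeats the call above, stores its result, and adds a second use of ifelse that returns labels instead of numbers. The names kept and parity are illustrative.

```r
x <- 1:10

# Keep values below 5 or above 8, replace the rest by 0
kept <- ifelse(x < 5 | x > 8, x, 0)           # 1 2 3 4 0 0 0 0 9 10

# The two alternatives can be of any common type, here character
parity <- ifelse(x %% 2 == 0, "even", "odd")
```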
Loops
- The most commonly used loop structures in R are 'for', 'while' and 'apply' loops.
- Less common are 'repeat' loops.
- The 'break' statement is used to break out of loops.
- 'next' halts the processing of the current iteration and advances the looping index.
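Since repeat, break and next are not shown elsewhere in this section, here is a minimal sketch that combines the three to collect the odd numbers below 10. The variable names are illustrative.

```r
i <- 0
odd <- numeric(0)

repeat {
  i <- i + 1
  if (i > 9) break          # leave the loop entirely
  if (i %% 2 == 0) next     # skip even numbers, go to the next iteration
  odd <- c(odd, i)
}

odd   # 1 3 5 7 9
```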
for Loops
- For loop: flexible, but slow when looping over a large number of elements (e.g. thousands of rows or columns)
Syntax
for(variable in sequence) statements
for Examples (1)
- Computing row means
> mydf <- iris
> myve <- NULL
> for (i in seq_len(nrow(mydf))) {
+     myve <- c(myve, mean(as.vector(as.matrix(mydf[i, 1:3]))))
+ }
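As a point of comparison, and not part of the original example, the same row means can be computed with a preallocated result vector or with the built-in rowMeans(). This sketch checks that both give the same numbers.

```r
mydf <- iris

# Preallocate the result instead of growing it with c() on every iteration
loop_means <- numeric(nrow(mydf))
for (i in seq_len(nrow(mydf))) {
  loop_means[i] <- mean(unlist(mydf[i, 1:3]))
}

# Vectorized alternative: one call, no explicit loop
vec_means <- rowMeans(mydf[, 1:3])

same <- isTRUE(all.equal(loop_means, unname(vec_means)))
```

Growing a vector inside a loop copies it repeatedly, which is one reason for loops are often slow on large data.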
for Examples (2)
- Conditions nested in for
> x <- 1:10
> z <- NULL
> for (i in 1:length(x)) {
+     if (x[i] < 5) {
+         z <- c(z, x[i] - 1)
+     } else {
+         z <- c(z, x[i]/x[i])
+     }
+ }
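The loop above applies an if/else to each element, which is the case ifelse is meant for. A sketch of the vectorized equivalent, with a check that both versions agree (z_loop and z_vec are illustrative names):

```r
x <- 1:10

# Loop version, with a preallocated result
z_loop <- numeric(length(x))
for (i in seq_along(x)) {
  z_loop[i] <- if (x[i] < 5) x[i] - 1 else x[i] / x[i]
}

# Vectorized version
z_vec <- ifelse(x < 5, x - 1, x / x)

z_vec   # 0 1 2 3 1 1 1 1 1 1
```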
While Loops
- While loop: similar to the for loop, but the iterations are controlled by a conditional statement.
Syntax
while(condition) statements
- The loop continues until condition returns FALSE.
> z <- 0
> while (z < 5) {
+     z <- z + 2
+     print(z)
+ }
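A while loop is the natural choice when the number of iterations is not known in advance. As a small illustration, this sketch finds the smallest power of 2 that is at least 1000; the limit and the variable names are arbitrary.

```r
limit <- 1000
p <- 1   # current power of 2
k <- 0   # exponent

# Keep doubling until the limit is reached
while (p < limit) {
  p <- p * 2
  k <- k + 1
}

c(power = p, exponent = k)   # 1024 and 10
```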
Apply: Vectorized Loops ...
- The 'apply' function family is intended to apply functions to all the values of a given data structure “at once”, that is, without using indexes.
Syntax
apply(X, MARGIN, FUN, ...)
where
- X: an array, matrix or data.frame;
- MARGIN: 1 for rows, 2 for columns, c(1,2) for both;
- FUN: the function to apply;
- ...: optional further arguments passed to FUN
apply examples-1
- Example for a single operation
> apply(airquality, 2, mean)
    Ozone   Solar.R      Wind      Temp     Month       Day
       NA        NA  9.957516 77.882353  6.993464 15.803922
> apply(airquality, 2, mean, na.rm = TRUE)
     Ozone    Solar.R       Wind       Temp      Month        Day
 42.129310 185.931507   9.957516  77.882353   6.993464  15.803922
> apply(airquality, 2, function(s) sum(!is.na(s)))
  Ozone Solar.R    Wind    Temp   Month     Day
    116     146     153     153     153     153
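The examples above only use MARGIN = 2. To show the effect of each MARGIN value, here is a sketch on a small matrix whose sums are easy to check by hand.

```r
# 2 x 3 matrix filled by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, 1, sum)   # 9 12
col_sums <- apply(m, 2, sum)   # 3 7 11

# MARGIN = c(1, 2) applies the function to every single cell
scaled <- apply(m, c(1, 2), function(v) v * 10)
```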
Two approaches to using apply (1)
- Two-step approach: first define the function, then use it in the apply loop (does the same as the for loop with nested conditions above)
> x <- 1:10
> z <- NULL
> test <- function(x) {
+     if (x < 5) {
+         x - 1
+     } else {
+         x/x
+     }
+ }
> apply(as.matrix(x), 1, test)
Two approaches to using apply (2)
- One-step approach: does the same as above, but the function is defined inside the apply call
> apply(as.matrix(x), 1, function(x) {
+     if (x < 5) {
+         x - 1
+     } else {
+         x/x
+     }
+ })
[1] 0 1 2 3 1 1 1 1 1 1
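For a plain vector, the conversion with as.matrix() can be avoided by using sapply, which is introduced further on. This sketch, an alternative rather than the approach of the slides, checks that both calls return the same values.

```r
test <- function(v) {
  if (v < 5) v - 1 else v / v
}

x <- 1:10

via_apply  <- apply(as.matrix(x), 1, test)   # treats x as a one-column matrix
via_sapply <- sapply(x, test)                # iterates over the vector directly

via_sapply   # 0 1 2 3 1 1 1 1 1 1
```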
Other applys: tapply
- tapply applies a function to each group of a ragged array, that is, to groups of values of possibly different sizes. The grouping is defined by a factor (or a vector that is coerced to one).
Syntax
tapply(vector, factor, FUN)
> tapply(iris[, 1], factor(iris[, 5]), mean)
> apply(iris[, 1:4], 2, function(x) tapply(x, iris[, 5], mean))
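To see what “ragged” means in practice, here is a sketch with made-up values (height and group are illustrative data, not taken from any dataset) in which the two groups have different sizes.

```r
height <- c(150, 160, 170, 180, 175)
group  <- factor(c("a", "b", "a", "b", "b"))

# One result per level of the grouping factor
group_means <- tapply(height, group, mean)     # a: 160, b: 515/3
group_sizes <- tapply(height, group, length)   # a: 2,   b: 3
```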
Other applys: sapply and lapply
- lapply and sapply both apply a function to the elements of a vector or list. lapply returns a list, while sapply returns, where possible, a more readable vector or matrix structure.
Syntax
lapply(list, FUN); lapply(vector, FUN)
Syntax
sapply(list, FUN); sapply(vector, FUN)
sapply and lapply examples (1)
- Application on a vector
> z <- seq(1, 10, by = 2)
> myMat <- matrix(runif(100), ncol = 10)
> lapply(z, function(x) mean(myMat[x, ]))
> sapply(z, function(x) mean(myMat[x, ]))
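The difference in return type is easier to see on a list with fixed contents. In this sketch the list scores is illustrative data; the same function is applied with lapply and sapply, and a third call shows sapply producing a matrix.

```r
scores <- list(a = 1:3, b = c(10, 20), c = 5)

as_list <- lapply(scores, mean)   # list with elements a, b, c
as_vec  <- sapply(scores, mean)   # named numeric vector: a = 2, b = 15, c = 5

# When every result has length 2, sapply simplifies to a 2 x 3 matrix
ranges <- sapply(scores, range)
```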