
Notas sobre R (Notes on R)

Francesc Carmona, Jordi Ocaña and Alex Sánchez

Departament d'Estadística, Universitat de Barcelona

3 October 2007

Outline

- Presentation, overview, references

- Introduction

- Data

- Conditional executions

- Functions

- Lexical Scoping and Computing on the Language

Goals

- Review and deepen R concepts; get into the "details".

- Present and discuss "advanced" programming tools and concepts:

  - Object-oriented programming

  - Computing on the language, environments, evaluation

  - Debugging and optimization

- Work on topics that are not usually covered (Sweave, packages, Tcl/Tk).

References and links

- The R language manuals: "An Introduction to R", "The R Language Definition", "Writing R Extensions", "R Installation and Administration", "R Data Import/Export", "R Internals"

- S Programming, by W. N. Venables and B. D. Ripley

- R manuals, courses and tutorials by Thomas Girke

- Programming in R (Vincent Zoonekynd)

- R Help & R Coding Conventions, Henrik Bengtsson, Lund University

- Rtips, Paul Johnson, University of Kansas

- Computing for Statisticians, John Scott, U. of Auckland


Review of basic concepts

- When R starts, a workspace (.GlobalEnv) is created; it is used to store and manipulate variables.

- The assignment operator, <-, creates a binding between a symbol and a value.

- rm() breaks the binding and removes the symbol from the environment, but it does not modify the value itself, which may still be reachable through other bindings.

- ls() lists the objects in a specified environment; by default this is the current environment, which is .GlobalEnv at the top level.

> ls()

> ls(.GlobalEnv)

> sapply(1:5, ls)

> ?ls
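As a minimal sketch of bindings, rm() and ls(), the following works in a scratch environment instead of .GlobalEnv, so that it does not touch the workspace. The names e, x and y are illustrative assumptions, not part of the slides.

```r
# 'e', 'x' and 'y' are hypothetical names used only for this illustration
e <- new.env()                      # an empty environment
assign("x", c(1, 2, 3), envir = e)  # like x <- c(1, 2, 3), but inside e
y <- get("x", envir = e)            # a second binding to the same value
before <- ls(e)                     # names currently bound in e
rm("x", envir = e)                  # break the binding x in e
after <- ls(e)                      # e is empty again
print(before)
print(after)
print(y)                            # the value is still reachable through y
```

Removing x only removes the name from e; the vector itself is untouched and remains available as y.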

Special Values

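- NULL (is.null): the null object, of length zero, often used where an empty list or vector would otherwise appear. A zero-length list(), however, is not NULL.

- NA (is.na): missing data for atomic types.

- Inf (is.finite / is.infinite): used to represent infinity.

- NaN (is.nan): "not a number", the result of indeterminate operations such as 0/0.

- Use typeof() to check the type of a given value.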

> x <- 1/0

> typeof(x)

> y <- list()

> is.null(y)

> z <- "NA"

> is.na(z)

> is.character(z)

> z2 <- NA

> is.na(z2)

> is.character(z2)
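The following sketch annotates what these checks return, under the assumption of a standard R session; it adds a few related cases (NULL itself, NaN, and the typed missing value NA_character_) that are not on the slide.

```r
x <- 1 / 0
typeof(x)                    # "double": Inf is an ordinary double value
is.finite(x)                 # FALSE

is.null(list())              # FALSE: an empty list is not NULL
is.null(NULL)                # TRUE

is.na("NA")                  # FALSE: this is just a character string
is.character("NA")           # TRUE
is.na(NA)                    # TRUE
typeof(NA)                   # "logical": a bare NA is a logical constant
is.character(NA)             # FALSE
is.character(NA_character_)  # TRUE: the missing value of character type

is.nan(0 / 0)                # TRUE
is.na(0 / 0)                 # TRUE: NaN also counts as missing
is.nan(NA)                   # FALSE
```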


Atomic vectors

- Vectors are the simplest ("atomic") data structures in R.

- Atomic vectors can contain integers, doubles, logicals or character strings (and also complex numbers or raw bytes).

- Character vectors in the S language are vectors of character strings, not vectors of characters.

- Vectors can be created in many different ways.

> x <- c(1, 2, 3, 5, 7)

> y <- seq(1, 9, by = 2)

> z <- integer(3)
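A short sketch of what these constructors produce, plus two points from the list above: all elements of an atomic vector share one type, and a character vector holds whole strings. The names mixed and s are illustrative.

```r
x <- c(1, 2, 3, 5, 7)
z <- integer(3)
typeof(x)                 # "double"
typeof(z)                 # "integer"
z                         # 0 0 0

# Mixing types in c() coerces every element to one common type
mixed <- c(1, "a", TRUE)
typeof(mixed)             # "character"
mixed                     # "1" "a" "TRUE"

# A character vector holds strings, not single characters
s <- "hello"
length(s)                 # 1: one string
nchar(s)                  # 5: number of characters in that string
```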

Numerical computing

- The only numbers that can be represented exactly in R's numeric type are integers and fractions whose denominator is a power of 2.

- As a result, two floating point numbers will not reliably be equal unless they have been computed by the same algorithm, and not always even then.

> a <- sqrt(2)

> a * a == 2

[1] FALSE

> a * a - 2

[1] 4.440892e-16
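In practice, floating point values are usually compared up to a tolerance. The sketch below shows two common ways to do this; the threshold 1e-9 is an arbitrary choice made for the illustration.

```r
a <- sqrt(2)
a * a == 2                    # FALSE: exact comparison fails
isTRUE(all.equal(a * a, 2))   # TRUE: equal up to a small relative tolerance
abs(a * a - 2) < 1e-9         # TRUE: explicit tolerance (1e-9 is arbitrary)

0.5 + 0.25 == 0.75            # TRUE: denominators are powers of 2
0.1 + 0.2 == 0.3              # FALSE: 0.1, 0.2 and 0.3 are not exact
```

all.equal() returns a description of the difference rather than FALSE when the values differ, which is why it is wrapped in isTRUE().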

Factors

- Factors reflect the S language's roots in statistical applications.

- A factor is useful when a potentially large collection of data contains relatively few, discrete levels.

- A factor is not a plain vector but an object of class factor: an integer vector of codes together with an attribute named levels.

Factor examples

- Factor creation

> set.seed(123)

> x = sample(letters[1:5], 10, replace = TRUE)

> y = factor(x)
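To see the integer codes and the levels attribute described above, here is a small sketch on a fixed vector, so that the result does not depend on the random sample. The data and the name f are illustrative.

```r
x <- c("b", "a", "c", "a", "b")   # illustrative data
f <- factor(x)

levels(f)        # "a" "b" "c": the distinct values, sorted
as.integer(f)    # 2 1 3 1 2: position of each element in levels(f)
typeof(f)        # "integer": the underlying storage
class(f)         # "factor"
table(f)         # counts per level
```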

Lists

- Lists can be used to store items that are not all of the same type.

- Lists can be of any length, and the elements of a list can be named, or not.

> (y <- list(a = 1, 17, b = 4:5, c = "a"))

$a
[1] 1

[[2]]
[1] 17

$b
[1] 4 5

$c
[1] "a"
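Elements of this list can be reached by name or by position. The sketch below shows the usual accessors and the difference between [[ ]] (the element itself) and [ ] (a sub-list).

```r
y <- list(a = 1, 17, b = 4:5, c = "a")

names(y)        # "a" ""  "b" "c": the unnamed element has an empty name
length(y)       # 4
y$a             # 1
y[[2]]          # 17: access by position
y[["b"]]        # 4 5: the element itself, an integer vector
y["b"]          # a list of length one that contains the element b
```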


Conditional executions

- These are based on the evaluation of logical expressions.

- Comparison operators: == (equal), != (not equal), >= (greater than or equal), etc.

- Logical operators: & (and), | (or) and ! (not).

- Three selection instructions:

  - if, which operates on scalar elements.

  - ifelse, which operates elementwise on vectors.

  - switch, to make a multiple selection.
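The slides give examples of if and ifelse; as a complement, here is a minimal sketch of switch together with the elementwise logical operators. The function describe and its labels are hypothetical, invented for the illustration.

```r
# 'describe' is a hypothetical helper, not part of the slides
describe <- function(kind) {
  switch(kind,
         mean   = "arithmetic mean",
         median = "middle value",
         "unknown statistic")   # the unnamed last argument is the default
}
describe("mean")     # "arithmetic mean"
describe("mode")     # "unknown statistic"

x <- c(1, 5, 10)
x > 2 & x < 8        # FALSE TRUE FALSE: & works element by element
!(x == 5)            # TRUE FALSE TRUE
```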

If example

- The if statement operates on length-one logical vectors.

Syntax

if (cond) cmd1 else cmd2

> x <- 10

> z <- if (x == 0) {

+ 1

+ } else {

+ 2

+ }
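Because if expects a single TRUE or FALSE, a vector condition has to be reduced first, typically with any() or all(); in R 4.2.0 and later a condition of length greater than one is an error. A small sketch with illustrative data:

```r
x <- c(3, 8, 12)                 # illustrative data
# x > 10 has length three, so reduce it to a single value
label <- if (any(x > 10)) "has large values" else "all small"
label                            # "has large values"

positive <- if (all(x > 0)) "all positive" else "some non-positive"
positive                         # "all positive"
```

As in the slide's example, if returns the value of the branch that was evaluated, so its result can be assigned directly.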

Ifelse example

- The ifelse function operates elementwise on vectors.

Syntax

ifelse(test, true_value, false_value)

> x <- 1:10

> print(ifelse(x < 5 | x > 8, x, 0))
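For reference, this sketch stores the result of the call above and adds a second case in which both alternatives are scalars that get recycled. The names kept and parity are illustrative.

```r
x <- 1:10
kept <- ifelse(x < 5 | x > 8, x, 0)
kept                  # 1 2 3 4 0 0 0 0 9 10

# Scalar alternatives are recycled to the length of the test
parity <- ifelse(x %% 2 == 0, "even", "odd")
parity[1:4]           # "odd" "even" "odd" "even"
```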

Loops

- The most commonly used loop structures in R are for, while and apply loops.

- Less common are repeat loops.

- The break statement is used to exit a loop, and

- next stops processing the current iteration and advances to the next one.
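A minimal sketch of repeat combined with break and next; it sums the odd numbers up to 7. The variable names are illustrative.

```r
total <- 0
i <- 0
repeat {
  i <- i + 1
  if (i > 7) break          # leave the loop entirely
  if (i %% 2 == 0) next     # skip even numbers and start the next iteration
  total <- total + i        # reached only for odd i up to 7
}
total                       # 16, that is 1 + 3 + 5 + 7
```

A repeat loop has no condition of its own, so it must contain a break to terminate.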

for Loops

- for loop: flexible, but slow when looping over a large number of elements (e.g. thousands of rows or columns).

Syntax

for(variable in sequence) statements

for Examples (1)

- Computing row means

> mydf <- iris

> myve <- NULL

> for (i in 1:length(mydf[, 1])) {

+ myve <- c(myve, mean(as.vector(as.matrix(mydf[i, 1:3]))))

+ }
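Growing myve with c() copies the vector at every iteration. As a hedged alternative sketch, the result can be preallocated, or the loop can be replaced by rowMeans(); the name vec is introduced here only for the comparison.

```r
mydf <- iris
n <- nrow(mydf)

# Preallocate the result instead of growing it inside the loop
myve <- numeric(n)
for (i in seq_len(n)) {
  myve[i] <- mean(unlist(mydf[i, 1:3]))
}

# Vectorized equivalent: row means of the first three columns
vec <- rowMeans(mydf[, 1:3])

all.equal(myve, unname(vec))   # TRUE: both approaches agree
```

seq_len(n) is used instead of 1:n so that the loop body is skipped when the data frame has no rows.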

for Examples (2)

- Conditions nested in for

> x <- 1:10

> z <- NULL

> for (i in 1:length(x)) {

+ if (x[i] < 5) {

+ z <- c(z, x[i] - 1)

+ }

+ else {

+ z <- c(z, x[i]/x[i])

+ }

+ }
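The same result can be obtained without an explicit loop. The sketch below repeats the loop (as z_loop) and compares it with a vectorized ifelse call (z_vec); both names are introduced here for the comparison.

```r
x <- 1:10

# Loop version, as in the slide
z_loop <- NULL
for (i in seq_along(x)) {
  if (x[i] < 5) {
    z_loop <- c(z_loop, x[i] - 1)
  } else {
    z_loop <- c(z_loop, x[i] / x[i])
  }
}

# Vectorized version: one call handles all elements
z_vec <- ifelse(x < 5, x - 1, x / x)

z_vec                      # 0 1 2 3 1 1 1 1 1 1
all.equal(z_loop, z_vec)   # TRUE
```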

While Loops

- while loop: similar to the for loop, but the iterations are controlled by a conditional statement.

Syntax

while(condition) statements

- The loop continues until the condition returns FALSE.

> z <- 0

> while (z < 5) {

+ z <- z + 2

+ print(z)

+ }
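The loop above prints 2, 4 and 6, and stops with z equal to 6 because the condition is tested only at the start of each iteration. A while loop is the natural choice when the number of iterations is not known in advance, as in this sketch that counts doublings; n and steps are illustrative names.

```r
z <- 0
while (z < 5) {
  z <- z + 2
}
z                     # 6: the test z < 5 fails only after the third pass

# How many doublings does it take to exceed 1000?
n <- 1
steps <- 0
while (n <= 1000) {
  n <- n * 2
  steps <- steps + 1
}
steps                 # 10
n                     # 1024
```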

Apply: Vectorized Loops ...

- The apply family of functions is intended to apply a function to all the values of a given data structure "at once", that is, without using indexes.

Syntax

apply(X, MARGIN, FUN, ...)

where

- X: an array, matrix or data.frame;

- MARGIN: 1 for rows, 2 for columns, c(1, 2) for both;

- FUN: the function to apply;

- ...: optional arguments passed on to FUN.
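A minimal sketch of the MARGIN argument and of passing extra arguments to FUN, on a small matrix whose sums are easy to check by hand. The matrices m and m2 are illustrative.

```r
m <- matrix(1:6, nrow = 2)   # rows are (1, 3, 5) and (2, 4, 6)

apply(m, 1, sum)             # 9 12: one value per row
apply(m, 2, sum)             # 3 7 11: one value per column

# Extra arguments after FUN are passed on to it
m2 <- m
m2[1, 1] <- NA
apply(m2, 1, sum)                # NA 12
apply(m2, 1, sum, na.rm = TRUE)  # 8 12
```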

apply examples-1

- Example for a single operation

> apply(airquality, 2, mean)

    Ozone   Solar.R      Wind      Temp     Month       Day
       NA        NA  9.957516 77.882353  6.993464 15.803922

> apply(airquality, 2, mean, na.rm = TRUE)

     Ozone    Solar.R       Wind       Temp      Month        Day
 42.129310 185.931507   9.957516  77.882353   6.993464  15.803922

> apply(airquality, 2, function(s) sum(!is.na(s)))

  Ozone Solar.R    Wind    Temp   Month     Day
    116     146     153     153     153     153

Two approaches to using apply (1)

- Two-step approach: first define the function, then use it in the apply loop (this does the same as the for loop in "for Examples (2)" above).

> x <- 1:10

> z <- NULL

> test <- function(x) {

+ if (x < 5) {

+ x - 1

+ }

+ else {

+ x/x

+ }

+ }

> apply(as.matrix(x), 1, test)

Two approaches to using apply (2)

- One-step approach: does the same as above, but the function is defined inside the apply call.

> apply(as.matrix(x), 1, function(x) {

+ if (x < 5) {

+ x - 1

+ }

+ else {

+ x/x

+ }

+ })

[1] 0 1 2 3 1 1 1 1 1 1
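Since x is a plain vector, the conversion to a one-column matrix can be avoided. As an alternative sketch, sapply() or vapply() can call the same function on each element directly; vapply() additionally checks that every result is a single number.

```r
x <- 1:10
test <- function(x) {
  if (x < 5) {
    x - 1
  } else {
    x / x
  }
}

via_apply  <- apply(as.matrix(x), 1, test)   # as in the slides
via_sapply <- sapply(x, test)                # no matrix conversion needed
via_vapply <- vapply(x, test, numeric(1))    # result type is declared

via_sapply                                   # 0 1 2 3 1 1 1 1 1 1
```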

Other applys: tapply

- tapply applies a function to each group of a ragged array, that is, to subsets of a vector that may have different lengths. The grouping is defined by a factor (or a vector that can be coerced to one).

Syntax

tapply(vector, factor, FUN)

> tapply(iris[, 1], factor(iris[, 5]), mean)

> apply(iris[, 1:4], 2, function(x) tapply(x, iris[, 5], mean))
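To make the grouping explicit, here is a small sketch with groups of different sizes whose means can be checked by hand. The vectors values and group are illustrative data.

```r
values <- c(1, 2, 3, 4, 5, 6)               # illustrative data
group  <- c("a", "a", "b", "b", "b", "c")   # groups of size 2, 3 and 1

group_means <- tapply(values, group, mean)
group_means                                 # a = 1.5, b = 4, c = 6

group_sizes <- tapply(values, group, length)
group_sizes                                 # a = 2, b = 3, c = 1
```

The result is named by the levels of the grouping variable, so individual groups can be extracted by name, for example group_means["b"].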

Other applys: sapply and lapply

- lapply and sapply both apply a function to each element of a vector or list. lapply always returns a list, while sapply simplifies the result to a more readable vector or matrix when possible.

Syntax

lapply(list, FUN); lapply(vector, FUN)

sapply(list, FUN); sapply(vector, FUN)

sapply and lapply examples (1)

- Application on a vector

> z <- seq(1, 10, by = 2)

> myMat <- matrix(runif(100), ncol = 10)

> lapply(z, function(x) mean(myMat[x, ]))

> sapply(z, function(x) mean(myMat[x, ]))

sapply and lapply examples (2)

- Application on a list

> x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE, FALSE,

+ FALSE, TRUE))

> lapply(x, mean)

> lapply(x, quantile, probs = 1:3/4)
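To make the difference between the two return types concrete, this sketch runs the same calls through both functions and inspects the shape of each result. The names as_list, as_vec and as_mat are introduced for the illustration.

```r
x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE, FALSE, FALSE, TRUE))

as_list <- lapply(x, mean)    # a list with one element per component of x
as_vec  <- sapply(x, mean)    # simplified to a named numeric vector
as_mat  <- sapply(x, quantile, probs = 1:3/4)   # simplified to a matrix

is.list(as_list)              # TRUE
as_list$a                     # 5.5
names(as_vec)                 # "a" "beta" "logic"
as_vec[["logic"]]             # 0.5
dim(as_mat)                   # 3 3: one row per quantile, one column per component
```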