Reinhard Furrer, UZH I-Math, 12. 2. 2014 NZZ.ch Introduction to R


Page 1: Introduction to R · 2016. 2. 29. · Reinhard Furrer, UZH I-Math, 12. 2. 2014 NZZ.ch Introduction to R. Contents 2 I Basics I Data handling and storing I Plotting I Linear models

Reinhard Furrer, UZH

I-Math, 12. 2. 2014 NZZ.ch

Introduction to R

Contents

• Basics
• Data handling and storing
• Plotting
• Linear models
• Simple programming tricks


Part 1

Basics

• What is R?
• The R-environment
• Getting started
• R rules

What is R?

• R is a language and environment for statistical computing and graphics.
• R provides a wide variety of statistical and graphical techniques, and is highly extensible.
• R produces well-designed publication-quality plots with a careful choice of default values.
• R is available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in source code form.

What is R?

Crude classification:

• Symbolic software:
  – Mathematica
  – Maple
  – Magma
  – . . .
• Numeric software:
  – MATLAB, Octave
  – NCL, IDL
  – . . .
  – R

The R-environment: micro

• R is an integrated suite of software facilities
• Emphasis on statistical analysis and graphical display
• Perform an entire analysis from raw data to reports
• Essentially command line interpreted, links to precompiled code are possible

The R-environment: macro

Due to licence:

• freely available: cran.r-project.org
• huge community
• many packages (>5100): cran.r-project.org/web/packages/
• abundant documentation in the form of FAQs (cran.r-project.org/doc/FAQ/R-FAQ.html), manuals (cran.r-project.org/manuals.html or cran.r-project.org/other-docs.html), wikis, books, . . . see www.r-project.org
• several mailing lists: www.r-project.org/mail.html

The R-environment: macro

Slides are mainly based on the following sources:

• An Introduction to R (IR): cran.r-project.org/doc/manuals/R-intro.pdf
• The R Primer (RP): www.stat.washington.edu/cggreen/rprimer/
• The R Inferno (RI): www.burns-stat.com/pages/Tutor/R_inferno.pdf

and some 10 years of personal use . . .

Getting started: install R

Done through “The Comprehensive R Archive Network” (CRAN):

cran.r-project.org

Easy to follow instructions in Chapter 1 of RP:

www.stat.washington.edu/cggreen/rprimer/

Getting started: run R (Linux)

Launch R in your console:

furrer@furrer-laptop:~/teaching/intro2R> R

R version 2.15.0 (2012-03-30)
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i686-pc-linux-gnu (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

Getting started: run R

RStudio

Runs under Windows, Linux, OS X (free; AGPLv3) rstudio.org

Getting started: run R

Tinn-R (Tinn stands for the recursive acronym ’Tinn is not Notepad’)

Runs under Windows (free; GPL) sciviews.org/Tinn-R

Getting started: run R

EMACS environment for R (and other statistics software)

Runs under Windows, Linux, OS X (GPL)

Getting started

> pi

[1] 3.141593

> cos( pi)

[1] -1

> 2 + 2.3

[1] 4.3

> sqrt( -1) # Oops

[1] NaN

> myvar <- exp( -2.3) # Assigning

> print( myvar)

[1] 0.1002588

> print( myvar, digits=16)

[1] 0.1002588437228037

RStudio

Hands-on tasks 1

1. Open RStudio and familiarize yourself with it.

2. What is the 15th digit of π?

3. Interpret the result of sin( pi).

Getting started

> nrcyclones <- c(6, 5, 4, 6, 6, 3, 12, 7, 4, 2, 6, 7, 4)

> # "c" is a function... creating a vector out of its elements

> summary( nrcyclones)

Min. 1st Qu. Median Mean 3rd Qu. Max.

2.000 4.000 6.000 5.538 6.000 12.000

> hist( nrcyclones)

[Figure: "Histogram of nrcyclones"; x-axis: nrcyclones, y-axis: Frequency]

Getting started

> plot( nrcyclones, type="b")

[Figure: nrcyclones plotted against Index, points joined by lines]

> cor( nrcyclones[-1], nrcyclones[-13])

[1] -0.1113836

Getting started

> par( mfrow=c(1,2))

> acf( nrcyclones)

> pacf( nrcyclones)

[Figure: two panels, each titled "Series nrcyclones"; left panel: ACF against Lag, right panel: Partial ACF against Lag]

Getting started

> help( acf)
acf                   package:stats                   R Documentation

Auto- and Cross- Covariance and -Correlation Function Estimation

Description:

     The function 'acf' computes (and by default plots) estimates of
     the autocovariance or autocorrelation function.  Function 'pacf'
     is the function used for the partial autocorrelations.  Function
     'ccf' computes the cross-correlation or cross-covariance of two
     univariate series.

Usage:

     acf(x, lag.max = NULL,
         type = c("correlation", "covariance", "partial"),
         plot = TRUE, na.action = na.fail, demean = TRUE, ...)

     pacf(x, lag.max, plot, na.action, ...)

     ## Default S3 method:
     pacf(x, lag.max = NULL, plot = TRUE, na.action = na.fail,
          ...)

Getting started: getting help

Various possibilities:

> ?mean # Shortcut for help( mean)

> ?"%*%" # The quotes are required!

> help.start() # Interactive html-based help!

Further illustrative help is accessed via:

> example("image") # example code in the help of "image"

> demo("image") # run the demo "image"

> demo() # lists all available demos

We hardly use the following command:

> q()

R rules

• R is case-sensitive.
• Variable names, function names, etc., should contain only alphanumeric characters (A-Z, a-z, 0-9), the “.” (and “_”). They cannot be a reserved word or start with a digit or “_”.
• Commands are separated by semicolons (“;”) or by a newline. Commands are grouped with curly braces ({ }).
• # is the comment sign. The remainder of the line is ignored.
• If a command is not complete at the end of a line, R will give a continuation prompt, “+ ”, on subsequent lines until the command is complete.
• As long as they are matched, single quotes (') and double quotes (") are equivalent.
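The rules above can be tried out in a short session. This is only a minimal sketch; all object names (myvar, Myvar, my.var, my_var, grouped, ...) are made up for illustration.

```r
# Names are case-sensitive: these are two different objects.
myvar <- 1
Myvar <- 2

# Dots and underscores are allowed inside names.
my.var <- 3
my_var <- 4

# Two commands on one line, separated by a semicolon.
a <- 1; b <- 2

# Curly braces group commands; the value of the group is its last expression.
grouped <- { tmp <- a + b; tmp * 2 }

# Matched single and double quotes create the same string.
same_string <- identical('text', "text")

# An incomplete command continues on the next line
# (the interactive console shows the "+ " prompt there).
total <- 1 +
  2
```

When typed interactively, the last command is the one that triggers the continuation prompt.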

R rules: reserved words

The reserved words in R’s parser are:

if, else, repeat, while, for, in, next, break, function

TRUE, FALSE, NULL, Inf, NaN, NA and the type-specific NAs (NA_integer_, NA_real_, NA_complex_, NA_character_).

... and its derivatives (..1, ..2, etc.), which are used to refer to arguments passed down from an enclosing function.

There are (unprotected) shortcuts T and F, for TRUE and FALSE:

> T

[1] TRUE

> T <- F # How not to do it!!

> T

[1] FALSE

R rules: functions and operators

Most R statements are composed of functions and operators:

> y <- sqrt(2 + 2)

is evaluated as the + operator, followed by the sqrt function and then the assignment operator.

Functions are of the form function( list of arguments )

Operators are of the form lhs operator rhs
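To make the distinction concrete, the statement above can be rewritten so that every operator appears as an ordinary function call. This is just an illustrative sketch; the names inner, root and y2 are not from the slides.

```r
# The statement from the slide.
y <- sqrt(2 + 2)

# The same computation with each operator written as a function call.
# Backticks let an operator name be used like a function name.
inner <- `+`(2, 2)     # same as 2 + 2
root  <- sqrt(inner)   # the sqrt function applied to 4
`<-`(y2, root)         # same as y2 <- root

identical(y, y2)
```

Seen this way, an operator is a function with a special calling syntax.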

Hands-on tasks 2

1. What are operators and what are functions in the following calls:

2 + 1

sin( pi)

2 + cos( 0)

2. What does the function median calculate?

3. Notice the difference between ?mean, ?"mean" and ?in, ?"in".

4. Create a variable named my1var containing log( 3).

5. Which of the following are valid variable names:

yo, beHappy!, I_am_2, myvar;val, getvar1, getvar$char.

6.* Many operators can be used as functions: "operator"(lhs, rhs).

Compare: 2 + 2 and "+"(2,2)

R rules: syntax

R has the following operators (highest to lowest):

:: :::            access variables in a name space
$ @               component / slot extraction
[ [[              indexing
^                 exponentiation (right to left)
- +               unary minus and plus
:                 sequence operator
%any%             special operators
* /               multiply, divide
+ -               (binary) add, subtract
< > <= >= == !=   ordering and comparison
!                 negation
& &&              and
| ||              or
~                 as in formulae
-> ->>            rightwards assignment
<- <<-            assignment (right to left)
=                 assignment (right to left)
?                 help (unary and binary)
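A few consequences of this ordering, as a small sketch (the variable names are chosen for illustration only):

```r
# "^" binds tighter than unary minus.
p1 <- -2^2            # read as -(2^2)

# "^" groups from right to left.
p2 <- 2^3^2           # read as 2^(3^2)

# ":" binds tighter than binary "+".
n  <- 3
s1 <- 1:n + 1         # read as (1:n) + 1
s2 <- 1:(n + 1)       # parentheses change the result

# "!" binds tighter than "&".
l1 <- !TRUE & FALSE   # read as (!TRUE) & FALSE
l2 <- !(TRUE & FALSE)
```

When in doubt, parentheses cost nothing and make the intent explicit.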

Hands-on tasks 3

1. Compare:

1:-3

1:(-3)

-1:3

-(1:3)

2. Compare:

2^1/2

2^(1/2)

3.* Be aware of floating point arithmetic:

pi==3.14159265358979

pi==3.141592653589793

pi==3.141592653589793116


Part 2

Data handling and storage

• Objects
• Indexing
• Functions
• Reading from files

Objects

R uses the following “core” objects:

• vectors
• matrices
• arrays
• factors
• lists
• data frames
• functions

Objects: vectors

Intrinsic attributes: mode and length

> v <- 1:4

> v

[1] 1 2 3 4

The mode is one of logical, numeric, complex, character (or raw).

> length( v)

[1] 4

> mode( v)

[1] "numeric"

> mode( 1i) # to give another example

[1] "complex"

The mode numeric has storage mode integer or double.
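The difference between mode and storage mode can be inspected with storage.mode(). A minimal sketch, with made-up names v_int and v_dbl:

```r
v_int <- 1:4             # created with ":", stored as integer
v_dbl <- c(1, 2, 3, 4)   # plain numeric literals, stored as double

mode(v_int)              # "numeric"
mode(v_dbl)              # "numeric"
storage.mode(v_int)      # "integer"
storage.mode(v_dbl)      # "double"
```

Both vectors print identically; only the internal representation differs.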

Hands-on tasks 4

1. All elements of a vector are of the same mode.

What is the mode of c("char", pi), c(2,1i)?

2. Interpret the result of sqrt(-1) and sqrt(-1+0i)

3. is.integer and as.integer query and coerce to integer format.

What is the output of length (two ways to verify)?

4.* Compare the results of identical(1,1.0) and identical( as.integer(1),1.0)

5.* What is the result and storage mode of 3L, 3L*1, 3L*1L, 3L/1L, 3L/3L?

Objects: vectors: generation

Concatenation operator:

> v <- c( 1, 2, 3, 4)

Generate sequences (several additional possibilities exist):

> seq( 4) # identical to 1:4

[1] 1 2 3 4

> seq( 1, 12, by=2)

[1] 1 3 5 7 9 11

> seq( 1, by=2, length.out=12)

[1] 1 3 5 7 9 11 13 15 17 19 21 23

> rep( 1:4, 2) # identical to rep.int( 1:4, 2)

[1] 1 2 3 4 1 2 3 4

> rep( 1:4, each=2)

[1] 1 1 2 2 3 3 4 4

> rep( 1:4, 2:5) # identical to rep( 1:4, times=2:5)

[1] 1 1 2 2 2 3 3 3 3 4 4 4 4 4

Hands-on tasks 5

1. Interpret the output of the following calls:

seq( from=1, to=13, by=2)

seq( from=1, to=13, length.out=3)

seq( from=1, by=2, length.out=3)

seq( from=1, to=12, by=2, length.out=3)

2. What calls generate the sequence: 1, 4, 4, 7, 7, 7, 10, 10, 10,

10, 13, 13, 13, 13, 13?

3. Create a sequence containing TRUE and FALSE according to the

parity of the last sequence.

4. Why is it not advisable to use the command: c <- c(1, 2, 3, 4)?

Objects: matrices

A vector with (minimal) attribute dim

> m <- matrix( 1:16, 4, 4)

> m

[,1] [,2] [,3] [,4]

[1,] 1 5 9 13

[2,] 2 6 10 14

[3,] 3 7 11 15

[4,] 4 8 12 16

> length( m)

[1] 16

> attributes( m)

$dim

[1] 4 4

Objects: matrices

A matrix can contain additional attributes

> rownames( m) <- paste( "r", 1:4, sep="")

> attributes( m)

$dim

[1] 4 4

$dimnames

$dimnames[[1]]

[1] "r1" "r2" "r3" "r4"

$dimnames[[2]]

NULL

The function attr( object, name) can be used to specify an attribute:

> attr( m, "dim") <- c(2, 8) # What is the result?

Objects: matrices: generation

> m1 <- matrix( 1:8, nrow=4, ncol=4, byrow=TRUE) # recycling

> m2 <- diag( 1:4)

> m3 <- cbind( 1:3, 2:4, 1)

> m3

[,1] [,2] [,3]

[1,] 1 2 1

[2,] 2 3 1

[3,] 3 4 1

> t( m3) # transpose

[,1] [,2] [,3]

[1,] 1 2 3

[2,] 2 3 4

[3,] 1 1 1

Hands-on tasks 6

1. What is the effect of dim( m) <- c( 2, 8)? Try other values.

2. What is the result of

matrix( 1:7, nrow=4, ncol=4)

diag( m1)

rbind( 1:3, 2:4, 1)

cbind( rbind( 1:2, 3:4), 0) ?

3. Construct a block diagonal matrix with 2 blocks of sizes 2×2.

Objects: arrays

Arrays are higher-dimensional “matrices”

> a <- array( 1:24, c( 3, 4, 2))

> a

, , 1

[,1] [,2] [,3] [,4]

[1,] 1 4 7 10

[2,] 2 5 8 11

[3,] 3 6 9 12

, , 2

[,1] [,2] [,3] [,4]

[1,] 13 16 19 22

[2,] 14 17 20 23

[3,] 15 18 21 24

Hands-on tasks 7

1. What is the length of a?

2. What are its attributes?

3. aperm is the generalization of t.

Trace the elements of aperm(a,c(2,1,3)) and aperm(a,c(3,2,1)).

Objects: factors

Strange concept, neither numeric nor character.

> as.factor( 1:3)

[1] 1 2 3

Levels: 1 2 3

> as.factor( 1:3) + 1

[1] NA NA NA

Used in the context of categorical data.
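A factor stores integer codes together with a set of labels (the levels), which explains the behaviour above. The following sketch uses made-up data (size, f) to illustrate this and a common pitfall when converting back to numbers:

```r
# A hypothetical categorical variable.
size <- factor(c("small", "large", "small", "medium"))

levels(size)       # the distinct categories, sorted alphabetically
as.integer(size)   # the internal integer codes pointing into levels(size)
table(size)        # counts per category

# Pitfall: a factor whose labels look like numbers.
f <- factor(c("10", "20", "10"))
codes  <- as.integer(f)                 # the codes, not the values
values <- as.numeric(as.character(f))   # the values behind the labels
```

Going through as.character() first is the usual way to recover the original numbers.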

Objects: lists

A vector whose elements can be of ‘any’ type.

> l <- list(1:2, as.factor(1:2), paste(1:2))

> l

[[1]]

[1] 1 2

[[2]]

[1] 1 2

Levels: 1 2

[[3]]

[1] "1" "2"

> length(l)

[1] 3

Objects: data frames

Matrix-like structures, in which the columns can be of different types.

> d <- data.frame( m)

> d

X1 X2 X3 X4 X5 X6 X7 X8

1 1 3 5 7 9 11 13 15

2 2 4 6 8 10 12 14 16

> attributes( d)

$names

[1] "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8"

$row.names

[1] 1 2

$class

[1] "data.frame"
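The data frame d above has only numeric columns. To illustrate columns of different types, here is a small sketch with invented data (df and its column names are hypothetical):

```r
# stringsAsFactors is given explicitly because its default changed in R 4.0.0.
df <- data.frame(
  id     = 1:3,
  name   = c("a", "b", "c"),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

sapply(df, class)   # one class per column
dim(df)             # number of rows and columns
```

Each column is a vector of a single type; the types may differ between columns.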

Objects: data frames

While rownames and colnames are for matrices, names and row.names are

for data frames.

> names( d)

[1] "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8"

> row.names( d)

[1] "1" "2"

Luckily, the former work as well:

> colnames( d)

[1] "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8"

> rownames( d)

[1] "1" "2"

In general, work with dimnames.

Hands-on tasks 8

1. Can factors be ordered?

2. What is the difference between l[1] and l[[1]] ?

(use is.list(..) to probe the result).

3. Internally, a data.frame is a list with class data.frame .

Check d[[3]] .

4. What is the length of d? Is the result intuitive?

Objects: functions

R is built upon itself. Many of the functions are “visible”:

> sd
function (x, na.rm = FALSE)
sqrt(var(if (is.vector(x)) x else as.double(x), na.rm = na.rm))
<bytecode: 0x25d9408>
<environment: namespace:stats>

More later . . .

Objects: coercion and testing

An object type obj usually has three associated functions:

obj() , as.obj() , and is.obj() .

> is.matrix( a)

[1] FALSE

> as.matrix( v) # here equivalent to "matrix(v)"

[,1]

[1,] 1

[2,] 2

[3,] 3

[4,] 4
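The same trio exists for the basic vector types. A short sketch with character and numeric (the names x, x_num, empty_chr and empty_num are illustrative):

```r
x <- c("1", "2", "3")

is.character(x)          # test: TRUE
is.numeric(x)            # test: FALSE

x_num <- as.numeric(x)   # coerce the strings to numbers
is.numeric(x_num)        # TRUE

# The constructor creates a fresh object of the given type and length.
empty_chr <- character(2)   # two empty strings
empty_num <- numeric(2)     # two zeros
```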

Hands-on tasks 9

1. Notice the difference between matrix( a, nrow=3)

and as.matrix( a, nrow=3)

2. What is the result of c( 0, NULL, 3),

is.array( m), is.matrix( m)

is.array( a), is.matrix( a)

3. Not all coercions work. What is the result of

as.integer( pi)

as.integer( 2i)

as.numeric( "a")

Objects: summary

Source: RI

Indexing

Basically, extraction is done via the [ operator:

> v

[1] 1 2 3 4

> v[1]

[1] 1

> v[-c(2:3)] # or v[-c(2,3)] or v[-(2:3)]

[1] 1 4

Similarly, replacement is done via the [<- operator:

> v[ 1] <- 1.1

> v[-c(2:3)] <- c(2.2, 3.3)

> v

[1] 2.2 2.0 3.0 3.3

Indexing: vectors

Extraction is done via the [ operator:

> v

[1] 2.2 2.0 3.0 3.3

> v[ c(1, 4)]

[1] 2.2 3.3

> v[-c(1, 4)]

[1] 2 3

> v[c(TRUE, FALSE, TRUE, FALSE)]

[1] 2.2 3.0

> v[c(TRUE, FALSE, TRUE)] # note the recycling!

[1] 2.2 3.0 3.3

Extraction for (very) long vectors:

> tail( v, 2)

[1] 3.0 3.3

> head( v, -1)

[1] 2.2 2.0 3.0

Indexing: matrices

> m <- matrix( 1:16, 4, 4)

> m[2, 3]

[1] 10

> m[1,]

[1] 1 5 9 13

> m[,1]

[1] 1 2 3 4

> m[ c(1,8,12)] # ordered columnwise

[1] 1 8 12

> m[ c(1,2,4), c(4,2,1)] # note the ordering

[,1] [,2] [,3]

[1,] 13 5 1

[2,] 14 6 2

[3,] 16 8 4

> m[cbind( c(1,2,4), c(4,2,1))] # What is the result when using rbind?

[1] 13 6 4

Indexing: matrices

If the matrix has appropriate dimnames attributes:

> rownames( m) <- paste( "r", 1:4, sep="")

> m

[,1] [,2] [,3] [,4]

r1 1 5 9 13

r2 2 6 10 14

r3 3 7 11 15

r4 4 8 12 16

> m["r1",]

[1] 1 5 9 13

> m[,1, drop=FALSE]

[,1]

r1 1

r2 2

r3 3

r4 4

Indexing: matrices

Extract or replace the diagonal values:

> n <- min( dim( m))

> diag( m)

[1] 1 6 11 16

> diag( m) <- -(1:n)

How to extract the values above the diagonal?

> m[ (1:(n-1))*(n+1)]

[1] 5 10 15

> m

[,1] [,2] [,3] [,4]

r1 -1 5 9 13

r2 2 -2 10 14

r3 3 7 -3 15

r4 4 8 12 -4

Hands-on tasks 10

1. Suppose that m only has rownames, interpret the result of m[,"c1"].

2. Use diag to extract the values above the diagonal.

3. Set the values of m below the diagonal to -1.

4. Compare m[cbind( c(1,2,4), c(4,2,1))] and the result when using

rbind instead?

Indexing: lists

Extraction is done via the [, [[ and $ operators:

> l[[1]]

[1] 1 2

> l[1]

[[1]]

[1] 1 2

> ll <- list( a=2, b=3, cde=10)

> ll$a

[1] 2

> ll$c # note the partial matching

[1] 10

Indexing: data frames

Column extraction is also possible with the $ operator:

> d$X1 # a data frame is primarily a list!

[1] 1 2

> d[,1]

[1] 1 2

> d[,"X1"]

[1] 1 2

Similarly:

> d[1,]

X1 X2 X3 X4 X5 X6 X7 X8

1 1 3 5 7 9 11 13 15

> d["1",]

X1 X2 X3 X4 X5 X6 X7 X8

1 1 3 5 7 9 11 13 15

Indexing: other details

• Matrices are stored column-wise.
• Arrays are stored in the same way, with the first index varying fastest.
• Objects can have length zero, e.g. v[0].
• Indexing starts at one, but an index vector may consist entirely of negative values (which exclude elements).
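These details can be checked directly. A small sketch; the objects are redefined here so the block runs on its own, and their values are chosen only for illustration:

```r
m <- matrix(1:6, nrow = 2)     # filled column by column
m[3]                           # third stored element: row 1, column 2

a <- array(1:24, c(3, 4, 2))
a[2, 1, 1]                     # stored second: the first index varies fastest
a[1, 2, 1]                     # stored fourth
a[1, 1, 2]                     # stored thirteenth

v <- c(10, 20, 30, 40)
length(v[0])                   # a zero-length vector
v[-1]                          # everything except the first element
v[-(1:2)]                      # everything except the first two elements
```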

Hands-on tasks 11

1. What happens if ll <- list( a=2, b=3, cd=10, ce=12) is indexed

with ll$c?

2. What elements are extracted with m[1:6], a[1:4*2]?

3. Let exist <- 1:14. What elements are extracted with exist[-c(1:3)],

exist[c(1:3)]? What is the result of exist[-1:3]

4.* Examine the code

nonexist[2] <- 1

nonexist <- numeric(0)

length(nonexist)

nonexist[0]

nonexist[1]

nonexist[2] <- 1

nonexist

Functions

Example:

> x <- mean( x, trim=.1)

General structure:

> res <- fcn( defarg1, defarg2,..., optarg1, optarg2, ...)

• res may be NULL.
• Required arguments need to be in order.
• Optional arguments are name matched.
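The three points can be seen with mean() and cat(). This is a sketch with made-up data (x) and result names (avg1, avg2, avg3, res):

```r
x <- c(1, 2, 3, 4, 100)

avg1 <- mean(x)                  # optional arguments left at their defaults
avg2 <- mean(x, trim = 0.2)      # optional argument matched by name
avg3 <- mean(trim = 0.2, x = x)  # named arguments may come in any order

# Some functions are called for their side effect and return NULL.
res <- cat("hello\n")
is.null(res)
```

With trim = 0.2 the smallest and the largest of the five values are dropped before averaging.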

Functions: “Math” group

Math(x, ...): abs, sign, sqrt, floor, ceiling, trunc

round, signif, exp, log, expm1, log1p

cos, sin, tan, acos, asin, atan

cosh, sinh, tanh, acosh, asinh, atanh

lgamma, gamma, digamma, trigamma

cumsum, cumprod, cummax, cummin

Ops(e1, e2): "+", "-", "*", "/", "^", "%%", "%/%"

"&", "|", "!"

"==", "!=", "<", "<=", ">=", ">"

Summary(..., na.rm=FALSE): all, any, sum, prod, min, max, range

Complex(z): Arg, Conj, Im, Mod, Re

Hands-on tasks 12

1. What is the result of min( c( 1, 3, NA)) ?

Is there a difference from min( 1, 3, NA) ?

How can you obtain the result 1?

2. What is the result of 17 %% 7 and 17 %/% 7 ? Why?

3.* It is possible to define functions without a function name:

(function(x,y) { z <- x**2 + y**2; x+y+z } )(0:7, 1)

Functions: matrices

For matrices, special operators are defined:

> m1 <- m2 <- matrix(1, 2, 2)

> m1[2, 2] <- 2

> m1 %*% m2

[,1] [,2]

[1,] 2 2

[2,] 3 3

> solve( m1)

[,1] [,2]

[1,] 2 -1

[2,] -1 1

> det( m1)

[1] 1

Functions: matrices: factorization

> svd( m1) # X = U D V'
$d
[1] 2.618034 0.381966

$u
           [,1]       [,2]
[1,] -0.5257311 -0.8506508
[2,] -0.8506508  0.5257311

$v
           [,1]       [,2]
[1,] -0.5257311 -0.8506508
[2,] -0.8506508  0.5257311

> chol( m1) # X = R' R
     [,1] [,2]
[1,]    1    1
[2,]    0    1

> eigen( m1) # X = G D G' ## We see eigen and chol again!
$values
[1] 2.618034 0.381966

$vectors
          [,1]       [,2]
[1,] 0.5257311 -0.8506508
[2,] 0.8506508  0.5257311

Functions: matrices: factorization

> qr( m1)
$qr
           [,1]       [,2]
[1,] -1.4142136 -2.1213203
[2,]  0.7071068  0.7071068

$rank
[1] 2

$qraux
[1] 1.7071068 0.7071068

$pivot
[1] 1 2

attr(,"class")
[1] "qr"

There are several additional functions associated: qr.qy, qr.qty, . . .
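Each factorization can be verified by multiplying the factors back together. The sketch below rebuilds the matrix m1 from the preceding slides so that it runs on its own; qr.Q() and qr.R() are two of the helper functions that extract the factors from a qr object. The names X_svd, X_chol, X_eig and X_qr are illustrative.

```r
m1 <- matrix(c(1, 1, 1, 2), 2, 2)   # same values as m1 on the slides

s <- svd(m1)
X_svd <- s$u %*% diag(s$d) %*% t(s$v)                    # U D V'

R <- chol(m1)
X_chol <- t(R) %*% R                                     # R' R

e <- eigen(m1)
X_eig <- e$vectors %*% diag(e$values) %*% t(e$vectors)   # G D G'

q <- qr(m1)
X_qr <- qr.Q(q) %*% qr.R(q)                              # Q R (no pivoting occurred here)

# all.equal() compares up to a small numerical tolerance.
all.equal(X_svd, m1)
all.equal(X_chol, m1)
all.equal(X_eig, m1)
all.equal(X_qr, m1)
```

Exact equality (==) is not a suitable check here because of floating point rounding.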

Hands-on tasks 13

Let M <- m1 %*% t( m1)

1. What is the eigendecomposition of M ?

2. What are the singular values of the same matrix?

3. Propose several approaches to construct an inverse of

M + diag( 2)

4. How can you calculate the trace of an arbitrary matrix A ?

Functions: probability distributions

General construct of prefix and root.

• prefix: d density, p CDF, q quantile, r random numbers
• root: beta, binom, pois, norm, t, and many more

For example:

> runif( 5)

[1] 0.2282756 0.1472576 0.8364201 0.8430635 0.0640814

> dnorm( 0)

[1] 0.3989423

> qt( 0.975, df=1)

[1] 12.7062

Parameters are “quite” standard, consult the help.
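For a single root, the four prefixes fit together as follows. A sketch for the normal distribution; the variable names and the parameter values (mean = 10, sd = 2) are arbitrary choices for illustration:

```r
d0 <- dnorm(0)                      # density of N(0, 1) at 0
p  <- pnorm(1.96)                   # P(X <= 1.96), roughly 0.975
q  <- qnorm(0.975)                  # the 97.5% quantile, roughly 1.96
x  <- rnorm(5, mean = 10, sd = 2)   # five random draws, different on every run

# The q-function is the inverse of the p-function.
back <- pnorm(qnorm(0.3))
```

The same pattern applies to the other roots, e.g. dbinom, pbinom, qbinom and rbinom.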

Functions: apply

Applying a function to margins of an array or matrix.

> d

X1 X2 X3 X4 X5 X6 X7 X8

1 1 3 5 7 9 11 13 15

2 2 4 6 8 10 12 14 16

> apply( d, 2, mean)

X1 X2 X3 X4 X5 X6 X7 X8

1.5 3.5 5.5 7.5 9.5 11.5 13.5 15.5

> apply( d, 1, range)

[,1] [,2]

[1,] 1 2

[2,] 15 16

> apply( d, 1, function(x, tr) { x[2] - mean(x, trim=tr)}, tr=.4)

[1] -5 -5
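The calls above can be reproduced with a small sketch. The construction of `d` is an assumption that merely matches the printed values; the slides do not show how `d` was created.

```r
# Assumed construction of d: 2 rows, 8 columns named X1..X8
d <- data.frame(matrix(1:16, nrow = 2))

col_means <- apply(d, 2, mean)   # margin 2: one value per column
row_range <- apply(d, 1, range)  # margin 1: one column of results per row
trimmed   <- apply(d, 1, function(x, tr) x[2] - mean(x, trim = tr), tr = 0.4)

# For plain means, colMeans() gives the same result as apply(d, 2, mean)
fast_means <- colMeans(d)
```

Extra arguments such as `tr` are passed on to the function after the margin argument.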

Hands-on tasks 14

1. Draw a normal sample of size 100 and draw a histogram of the

sample.

What is the mean and standard deviation of the sample?

2. Repeat the previous exercise 1000 times and calculate the mean

of the means and the standard deviations.

3. How do the results compare to the ones from your peers?

Is there a way to “homogenize” the procedure?

Reading from files: data files

I Several possibilities of reading ASCII files:

> read.table(file, header = FALSE, sep = "")

> read.csv(file, header = TRUE, sep = ",", quote="\"")

> scan(file, ...)

I scan is a powerful (complex) alternative.

I Fixed-width formatted files are read with read.fwf.

I Common open source storage formats are supported:

netCDF, GRIB, HDF, . . .

(specific packages need to be loaded).

I Base R cannot read Excel files directly (proprietary format); use a contributed package such as readxl, or export to CSV first.
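A minimal sketch of the reading functions, using temporary files so that it is self-contained. The file contents and column names are made up for illustration.

```r
# Hypothetical example data, written to a temporary CSV file
tmp <- tempfile(fileext = ".csv")
dat <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.7))
write.csv(dat, tmp, row.names = FALSE)

a <- read.csv(tmp)                              # header = TRUE, sep = "," by default
b <- read.table(tmp, header = TRUE, sep = ",")  # same result, arguments spelled out

# Fixed-width file: two fields of widths 3 and 2
fw <- tempfile()
writeLines(c("abc12", "def34"), fw)
f <- read.fwf(fw, widths = c(3, 2))             # columns are named V1 and V2
```

`read.csv` is a wrapper around `read.table` with different defaults, which is why both calls return the same data frame.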

Reading from files: R code/objects

I R “source code” is read and evaluated with source("filename.R")

I R data files are read with load("file.RData")

I To save R objects use

> save.image()

> save(..., file="file.RData") # symbols or character strings

Note the save.image question when quitting R.

I data() lists all the available datasets in the search path (directly available).

data( package=.packages( all.available=TRUE)) lists all the available datasets in all installed packages.

I data( name, package="packagename") loads name from the package packagename.
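A short round trip with save and load, as a sketch. The object `x` and the temporary file name are placeholders chosen for this example.

```r
x <- c(1, 2, 3)
f <- tempfile(fileext = ".RData")
save(x, file = f)   # write the object x to an R data file
rm(x)               # remove it from the workspace
load(f)             # restores x under its original name

# A dataset shipped with a package can be loaded by name
data(swiss, package = "datasets")
```

Note that `load` restores objects under the names they were saved with and silently overwrites existing objects of the same name.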

Hands-on tasks 15

Three datasets, data1.dat, data2.dat and data3.dat, are available at www.math.uzh.ch/furrer/software/workshop/ (use the entire link).

1. Download the datasets and look at the content thereof.

What are the differences?

2. Load these three datasets into R, by properly keeping column and

row names of the original data.

Try to specify directly the URL instead of the filename, what

do you notice?

3. Save one of the datasets in R-native format.

4.? Are there ways to reduce the file size?


Part 3

Plotting


I Plotting in R

I High-level plotting (HLP) functions

I Low-level plotting (LLP) functions

I Interactive graphics functions

I Graphical parameters

Plotting in R

R distinguishes different plotting type functions:

I High-level plotting (HLP) functions create a new plot on the

graphics device, possibly with axes, labels, titles and so on.

I Low-level plotting (LLP) functions add more information to an

existing plot, such as extra points, lines and labels.

I Interactive graphics functions allow you to interactively add information to, or extract information from, an existing plot, using a pointing device such as a mouse.

R maintains a list of graphical parameters which can be manipulated

to customize your plots.

Plotting in R: workflow

General workflow:

1. Choosing a device (screen, PDF file, . . . )

2. Setting graphical parameters

3. Calling a high-level plotting function

4. Calling low-level plotting functions

5. More calls to high-level and low-level functions

6. Closing the device

Simplest example (i.e., point 3 only):

> plot(0)

Plotting in R: workflow: example

> x <- rnorm( 100) # 100 random numbers

> pdf( "figure1.pdf") # Output to a PDF file

> par( mfrow=c(1, 2)) # Two panels for this plot

> hist( x) # high-level call

> abline( v=mean( x)) # low-level call

> qqnorm( x) # second high-level call

> dev.off() # close the device

produces:

[Figure: two panels. Left: "Histogram of x" (x against Frequency) with a vertical line at the mean. Right: "Normal Q-Q Plot" (Theoretical Quantiles against Sample Quantiles).]

Plotting in R: workflow

I If no device is open, the default one will be used (usually screen).

I When producing files, dev.off() is required.

I Each new high-level plot overwrites the current area, unless specified otherwise (usually, add=TRUE).

I Several devices can be open, only one is active. Use dev.cur()

and dev.set(), to inquire and set the active device.
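The device handling described above can be sketched as follows. The temporary file name is a placeholder for this example.

```r
f <- tempfile(fileext = ".pdf")
pdf(f)                  # open a file device; it becomes the active one
id <- dev.cur()         # named integer identifying the active device
plot(1:10)              # high-level call: starts a new plot
abline(h = 5, lty = 2)  # low-level call: adds to the existing plot
dev.off()               # close the device so that the file is complete
```

With several devices open, `dev.list()` shows all of them and `dev.set(which)` selects the active one.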

HLP functions

Common high-level plotting functions:

plot(x, y)         most basic plotting command, flexible
hist(x)            histogram (specify breaks for discrete data)
boxplot(x)         boxplot of one or several variables
qqnorm(y)          quantile-quantile plot (empirical vs normal)
qqplot(x, y)       quantile-quantile plot (empirical vs arbitrary)
pairs(x)           scatterplots for multidimensional data
curve(expr)        plots a function
image(x, y, z)     z = f(x, y) is provided in a matrix
contour(x, y, z)   z = f(x, y) is provided in a matrix
persp(x, y, z)     basic 3D plotting with shading

Hands-on tasks 16

1. Draw a random sample of size 15 from a normal distribution.

Plot a histogram and superimpose the true density.

2. Repeat the experiment 100 times and superimpose a histogram

of the means.

HLP functions: 3D plotting

Consider $X_1, \dots, X_n \overset{\text{iid}}{\sim} \mathcal{N}(\mu, \sigma^2)$.

Investigate the likelihood function $L(\mu, \sigma) = \prod_{i=1}^{n} f_X(x_i; \mu, \sigma)$.

For numerical stability, we work with the log-likelihood.

> mu <- 2

> sigma <- 2

> n <- 20

> x <- rnorm(n,mu,sigma)

> loglikelihood <- function(pars, x) {

+ return( sum( dnorm( x, pars[1], pars[2], log=TRUE) ) )

+ }
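As a sanity check, the function can be compared with the closed form of the normal log-likelihood, $\ell(\mu, \sigma) = -\frac{n}{2}\log(2\pi) - n \log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$. This is a sketch with an arbitrary seed; the sample therefore differs from the one on the slides.

```r
loglikelihood <- function(pars, x) {
  sum(dnorm(x, pars[1], pars[2], log = TRUE))
}

set.seed(2)                       # arbitrary seed
x <- rnorm(20, mean = 2, sd = 2)
mu <- 2; sigma <- 2; n <- length(x)

# Closed form of the normal log-likelihood at (mu, sigma)
closed <- -n / 2 * log(2 * pi) - n * log(sigma) -
  sum((x - mu)^2) / (2 * sigma^2)
via_dnorm <- loglikelihood(c(mu, sigma), x)

# The plain likelihood is a product of n small numbers
lik <- prod(dnorm(x, mu, sigma))
```

For n = 20 the product is still representable, but it shrinks geometrically with n, which is the numerical motivation for summing log densities instead.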

HLP functions: 3D plotting

Evaluate the log-likelihood over a grid

> ns <- 50

> m <- seq( 1, to=4, length=ns)

> s <- seq( 1, to=5, length=ns)

> grid <- expand.grid( m, s)

> ll <- apply( grid, 1, loglikelihood, x=x) # What is ll?

> llmat <- matrix( ll, ns) # What is dim(llmat)? Why?

> image( m, s, llmat)

[Figure: image plot of llmat with axes m (1 to 4) and s (1 to 5).]
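Why matrix( ll, ns) lines up with image( m, s, llmat) can be seen on a tiny grid. The values below are hypothetical and chosen only to make the ordering visible.

```r
m <- 1:3                       # hypothetical small grid: 3 values of m
s <- c(10, 20)                 # and 2 values of s
grid <- expand.grid(m, s)      # 6 rows; the first variable (m) varies fastest
val <- apply(grid, 1, sum)     # one value per grid row, here m + s
mat <- matrix(val, length(m))  # filled column by column
```

Because `expand.grid` varies its first argument fastest and `matrix` fills column by column, rows of `mat` correspond to `m` and columns to `s`, which is the layout `image`, `contour` and `persp` expect.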

HLP functions: 3D plotting

> ncol <- 64

> mx <- unlist( grid[ which.max( ll),])

> image( m, s, llmat, col=topo.colors(ncol),

+ xlab=expression(mu), ylab=expression(sigma))

> abline( v=mx[1], h=mx[2])

[Figure: the same image in topo.colors with axes µ and σ; a vertical and a horizontal line mark the grid maximum.]

HLP functions: 3D plotting

> image( m, s, llmat, col=topo.colors(64),

+ xlab=expression(mu), ylab=expression(sigma))

> abline( v=mx[1], h=mx[2])

> box()

> contour( m, s, llmat, add=TRUE)

[Figure: the image with axes µ and σ, overlaid with contour lines labeled from −70 to −40.]

HLP functions: 3D plotting

> persp( m, s, llmat)

[Figure: perspective plot of llmat over m and s.]

HLP functions: 3D plotting

> persp( m, s, llmat, phi=45, theta=30)

[Figure: perspective plot of llmat over m and s, viewed with phi=45 and theta=30.]

HLP functions: 3D plotting

> persp( m, s, llmat, phi=45, theta=30, axes=FALSE, box=FALSE)

HLP functions: 3D plotting

> zfacet <- llmat[-1,-1]+llmat[-1,-ns]+llmat[-ns,-1]+llmat[-ns,-ns]

> facetcol <- cut( zfacet, ncol)

> brcol <- colorRampPalette( c("white","yellow", "red") )

> persp( m, s, llmat, phi=45, theta=30, axes=FALSE, box=FALSE,

+ col=brcol( ncol)[facetcol]) -> out

HLP functions: 3D plotting

> persp( m, s, llmat, phi=45, theta=30, axes=FALSE, box=FALSE,

+ col=brcol(ncol)[facetcol], border=NA)

> points( trans3d(mx[1], mx[2], max( llmat), out), cex=4, col=4,

+ pch=4)

HLP functions: 3D plotting

Grid search delivers maximum:

> c( value=max(ll), mx)

value Var1 Var2

-39.924118 2.408163 1.816327

Numerical optimum is at:

> par <- optim( mx, function( theta) -loglikelihood( theta, x),

+ method="L-BFGS-B", lower=c( -Inf, 0))

> c( value=-par[["value"]], par[["par"]])

value Var1 Var2

-39.913951 2.381047 1.780258

> par[4] # _ALWAYS_ check!

$convergence

[1] 0

For a maximization, set control$fnscale to a negative value.
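The two routes, minimizing the negative log-likelihood or maximizing via fnscale, can be compared in a short sketch. The seed, the starting values c(1, 1) and the small positive lower bound for sigma are assumptions made for this example.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
x <- rnorm(20, mean = 2, sd = 2)
loglikelihood <- function(pars, x) {
  sum(dnorm(x, pars[1], pars[2], log = TRUE))
}

# Route 1: minimize the negative log-likelihood
fit_min <- optim(c(1, 1), function(theta) -loglikelihood(theta, x),
                 method = "L-BFGS-B", lower = c(-Inf, 1e-6))

# Route 2: maximize directly by setting a negative fnscale
fit_max <- optim(c(1, 1), loglikelihood, x = x,
                 method = "L-BFGS-B", lower = c(-Inf, 1e-6),
                 control = list(fnscale = -1))
```

In the second route `fit_max$value` is already the maximized log-likelihood, so no sign flip is needed. In both cases the `convergence` component should be checked; 0 indicates success.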

Hands-on tasks 17

Draw a random sample of size two from a normal density.

1. Plot the log-likelihood as a function of x1 and x2.

2. Plot the log-likelihood as a function of µ and σ.

LLP functions

Common low-level plotting functions:

points, lines      similar to plot
title              main/sub above/below the panel
abline             v, h, or intercept/slope
text               like points with text instead
mtext              quite flexible
legend             flexible through many parameters
axis               add additional axis (see xaxt, yaxt)
box                around the panel
arrows, segments   . . .
polygon            . . .
rect               . . .

Interactive graphics function

I locator(n=512):

gets n coordinates of the graphics cursor when left mouse button

is pressed.

I identify(x, y, n=length(x)):

after a left mouse button click, reads the position and searches

the closest point among x,y. Returns the index of the points.

I Both functions quit when pressing any other button.

I For more interaction, use package rgl.

Graphical parameters

The function par queries and sets plotting parameters (similar to options for “system” parameters).

> par("bty") # Frame is a rectangle

[1] "o"

> par(bty="n") # no frame/box is drawn

> par("bty")

[1] "n"

Many options are available, see for example:

> par()

?par is my most frequent help call.
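A common pattern is to store the previous settings when changing them and to restore them afterwards. The sketch below uses a null PDF device so that nothing is written to disk; the chosen parameters are arbitrary.

```r
pdf(NULL)                       # null device: no file is created
old <- par(bty = "n", las = 1)  # par() returns the previous values invisibly
new_bty <- par("bty")           # now "n"
par(old)                        # restore the saved settings
restored_bty <- par("bty")      # back to the default "o"
dev.off()
```

Graphical parameters belong to a device, so settings made on one device do not carry over to a newly opened one.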

Graphical parameters

Further parameters:

adj            text adjustment (.5 is default, centering)
bg, fg         background and foreground (default) color
cex, cex.*     magnification of text and symbols relative to the default
col, col.*     color specification (numbers 0:7, words, rgb hex string)
las            rotation style of axis labels
lty            line type (1=solid, 2=dashed, 3=dotted, ...)
lwd            line width
mfrow, mfcol   array of subplots filled by row/column
new            if TRUE the next HLP will not clean the frame
pch            specifying the symbol used for points
pty            if "s" use square plotting area
xaxs, yaxs     "i" for precise axis bounds
xaxt, yaxt     "n" to suppress axis drawing
xlog, ylog     if TRUE use logarithmic scale

where “*” is one of: axis, lab, main, sub

Graphical parameters

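mai and omi (in inches) or mar and oma (in lines of text) set the figure margins and the outer margins.

mgp (defaults to c(3,1,0)) sets the margin lines for the axis title, axis labels and axis line.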

Graphical parameters: example

> sample <- rt(100, df=2)

> boxplot( sample)

[Figure: boxplot of sample with default settings.]

Graphical parameters: example

> par(bty="l", col=5, col.main=2, cex=2)

> boxplot( sample, main="Boxplot")

[Figure: the boxplot redrawn with the new settings, titled "Boxplot".]

Graphical parameters: example

> par(bty="l", col.main=2, col.axis=4, cex=2, mai=c(.1,.7,.5,.1),

+ mgp=c(3,.8,0), adj=1, las=1, pch="-")

> boxplot( sample, main="Boxplot", col=5)

[Figure: the resulting boxplot, titled "Boxplot".]

Graphical parameters: example

> par(bty="l", col.main=2, col.axis=4, cex=2, mai=c(.1,.7,.5,.1),

+ mgp=c(3,.8,0), adj=1, las=1, pch="-")

> boxplot( sample, col=5)

> title("Boxplot", adj=.5)

[Figure: the resulting boxplot, with the title "Boxplot" added by title().]

Hands-on tasks 18

Create the following plot. The data is available at:

www.math.uzh.ch/furrer/software/workshop/wheat.csv

[Figure: plot titled "Durum" with two vertical axes, "US production (mio bushel)" and "Price (USD per bushel)", over the seasons 2008/09 to 2011/12.]

Hands-on tasks 19

Create the following plot.

[Figure: two panels plotting f(x) against x for x from −3 to 3. Left: e^x and ln(x). Right: cosh(x) and arcosh(x).]


Part 4

Linear models


I A regression example

I Objects of class formula

I lm object

I Another regression example

I Other uses of formula objects

A regression example

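Suppose we have a response $y$ for a set of predictors $x_1, \dots, x_p$.

Assume a linear model

$y_i = \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i, \quad \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2), \quad i = 1, \dots, n,$

in matrix notation $y = X\beta + \varepsilon$.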

Given response and predictors “solve” the regression problem:

I What are the estimates β̂?

I Which predictors are significant?

I Is the model adequate?

I . . .

A regression example

Artificial data, so we know the “truth”:

> n <- 10

> x <- runif( n, -1, 2)

> beta <- c( 1, 1)

> sigma <- .5

> y <- beta[1] + beta[2]*x + rnorm( n, sd=sigma)

> plot( x, y)

[Figure: scatterplot of y against x.]

A regression example

A linear model is fitted with

> lm1 <- lm( y~x)
> summary( lm1)
Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-1.1663 -0.3133  0.1224  0.3003  0.6425

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.9993     0.2257   4.426 0.002208 **
x             1.0673     0.2031   5.255 0.000769 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.577 on 8 degrees of freedom
Multiple R-squared:  0.7754,    Adjusted R-squared:  0.7473
F-statistic: 27.62 on 1 and 8 DF,  p-value: 0.000769

A regression example

> coef( lm1)

(Intercept) x

0.9992531 1.0673167

> fitted( lm1)

1 2 3 4 5 6 7

0.7820819 1.1234585 1.7661843 2.8399724 0.5777119 2.8085353 2.9567395

8 9 10

2.0477779 1.9463282 0.1297729

> resid( lm1)

1 2 3 4 5

-0.39579007 0.23662769 0.32153818 0.17254164 -0.12536026

6 7 8 9 10

0.64252432 0.07220797 -0.37600485 -1.16633597 0.61805134
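The extractor functions can be cross-checked against the least-squares formula $\hat\beta = (X^\top X)^{-1} X^\top y$. This is a sketch on freshly simulated data with an arbitrary seed, so the numbers differ from the output above.

```r
set.seed(3)                  # arbitrary seed
n <- 10
x <- runif(n, -1, 2)
y <- 1 + 1 * x + rnorm(n, sd = 0.5)
lm1 <- lm(y ~ x)

X <- cbind(1, x)             # design matrix with an intercept column
betahat <- solve(crossprod(X), crossprod(X, y))  # solves X'X b = X'y
res <- y - X %*% betahat     # residuals by hand
```

`lm` itself uses a QR decomposition rather than the normal equations, which is numerically more stable; for well-conditioned data both give the same estimates.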

A regression example

> par( mfrow=c(2, 2))

> plot( lm1)

[Figure: the four diagnostic panels Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours); observations 6, 9 and 10 are labeled.]

A regression example

> pre <- predict( lm1, newdata=data.frame(x=0))

> pre

1

0.9992531

> plot( x, y)

> points( 0, pre, col=2, cex=2)

[Figure: scatterplot of y against x with the prediction at x = 0 marked.]

A regression example

> new <- data.frame( x = seq(-2, 3, by=0.25))

> pred.w.plim <- predict( lm1, new, interval="prediction")

> pred.w.clim <- predict( lm1, new, interval="confidence")

> plot( x, y)

> points( 0, pre, col=2, cex=2)

> matlines( new$x, cbind(pred.w.clim, pred.w.plim[,-1]), lty=1)

[Figure: scatterplot of y against x with the fitted line, the confidence bands and the prediction bands.]

Objects of class formula

General structure: LHS ~ RHS

I ~ is used to define a model formula

I LHS is usually a single vector, the response

I RHS is of the form

op1 term1 op2 term2 ...

where op_i is either + or - and term_i is a formula expression consisting of factors, vectors or matrices connected by formula operators.

I Examples of formula operators are in RI p52.

I I(object): object is treated as is; this inhibits the interpretation of operators as model operators.

I offset(object): a term in the linear model with known coefficient (= 1).
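How a formula is expanded can be inspected with model.matrix. The data frame and its column names below are hypothetical.

```r
set.seed(42)                 # arbitrary seed
dat <- data.frame(x = runif(20), z = runif(20))
dat$y <- 1 + 2 * dat$x^2 + rnorm(20, sd = 0.1)

cn_add    <- colnames(model.matrix(y ~ x + z, dat))      # main effects
cn_noint  <- colnames(model.matrix(y ~ x + z - 1, dat))  # intercept removed
cn_inter  <- colnames(model.matrix(y ~ x * z, dat))      # with interaction
cn_square <- colnames(model.matrix(y ~ I(x^2), dat))     # arithmetic via I()
```

Without `I()`, the `^` in `y ~ x^2` is read as a formula operator (crossing to second order), so the model would contain only `x` itself.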

lm object

Generic functions for lm objects:

plot

print

summary

residuals resid

coef

predict

add1

drop1

step

deviance

formula

anova

vcov

kappa

effects

There exist some more . . .

Another regression example

> pairs( swiss, panel = panel.smooth, main = "swiss data",

+ col = 3 + (swiss$Catholic > 50), gap=0)

[Figure: scatterplot matrix "swiss data" of Fertility, Agriculture, Examination, Education, Catholic and Infant.Mortality, with smoothed lines; the point color distinguishes Catholic > 50.]

Another regression example

> summary( lmswiss <- lm(Fertility ~ . , data = swiss))
Call:
lm(formula = Fertility ~ ., data = swiss)

Residuals:
     Min       1Q   Median       3Q      Max
-15.2743  -5.2617   0.5032   4.1198  15.3213

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      66.91518   10.70604   6.250 1.91e-07 ***
Agriculture      -0.17211    0.07030  -2.448  0.01873 *
Examination      -0.25801    0.25388  -1.016  0.31546
Education        -0.87094    0.18303  -4.758 2.43e-05 ***
Catholic          0.10412    0.03526   2.953  0.00519 **
Infant.Mortality  1.07705    0.38172   2.822  0.00734 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.165 on 41 degrees of freedom
Multiple R-squared:  0.7067,    Adjusted R-squared:  0.671
F-statistic: 19.76 on 5 and 41 DF,  p-value: 5.594e-10

Another regression example

> drop1( lmswiss, test="F")

Single term deletions

Model:

Fertility ~ Agriculture + Examination + Education + Catholic +

Infant.Mortality

Df Sum of Sq RSS AIC F value Pr(>F)

<none> 2105.0 190.69

Agriculture 1 307.72 2412.8 195.10 5.9934 0.018727 *

Examination 1 53.03 2158.1 189.86 1.0328 0.315462

Education 1 1162.56 3267.6 209.36 22.6432 2.431e-05 ***

Catholic 1 447.71 2552.8 197.75 8.7200 0.005190 **

Infant.Mortality 1 408.75 2513.8 197.03 7.9612 0.007336 **

---

Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

Another regression example

> add1( lm( Fertility ~ 1, data=swiss), ~ Agriculture +

+ Examination + Education + Catholic + Infant.Mortality)

Single term additions

Model:

Fertility ~ 1

Df Sum of Sq RSS AIC

<none> 7178.0 238.34

Agriculture 1 894.8 6283.1 234.09

Examination 1 2994.4 4183.6 214.97

Education 1 3162.7 4015.2 213.04

Catholic 1 1543.3 5634.7 228.97

Infant.Mortality 1 1245.5 5932.4 231.39

Other uses of formula objects

I Functions like plot or boxplot can be fed with a formula object.

I Generalized linear models, extensions of linear models:

glm( formula, family = gaussian, data, weights, subset, ...)
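Both uses can be sketched with the swiss data. The choice of predictors and the helper column `majority` are assumptions made for this example.

```r
# With the default gaussian family, glm reproduces the lm estimates
fit_lm  <- lm(Fertility ~ Education + Catholic, data = swiss)
fit_glm <- glm(Fertility ~ Education + Catholic, family = gaussian,
               data = swiss)

# A formula in a plotting function: response ~ grouping variable
swiss2 <- swiss
swiss2$majority <- swiss2$Catholic > 50  # hypothetical grouping column
pdf(NULL)                                # null device, nothing is written
bp <- boxplot(Fertility ~ majority, data = swiss2)
dev.off()
```

Other families (for example `binomial` or `poisson`) use the same formula interface; only the `family` argument changes.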


Part 5

Programming tricks


I Search path

I Scripting

I Functions

I Writing packages

I Customize the environment

I Writing documents

Search path

R objects of a session are stored in environments.

The global environment is called the workspace.

> ls()

[1] "a" "beta" "brcol" "d"

[5] "facetcol" "grid" "l" "ll"

[9] "llmat" "lm1" "loglikelihood" "m"

[13] "m1" "m2" "m3" "mu"

[17] "mx" "myvar" "n" "ncol"

[21] "nrcyclones" "ns" "par" "s"

[25] "sample" "sigma" "v" "x"

[29] "y" "zfacet"

> rm( m1, m2, m3, facet, loglikelihood, nrcyclones, facetcol, grid,

+ llmat, zfacet, ncol, mx, brcol, ll, myvar, lm1, sample)

> ls()

[1] "a" "beta" "d" "l" "m" "mu" "n" "ns"

[9] "par" "s" "sigma" "v" "x" "y"

Search path

To list all environments or databases:

> search()

[1] ".GlobalEnv" "package:stats" "package:graphics"

[4] "package:grDevices" "package:utils" "package:datasets"

[7] "package:methods" "Autoloads" "package:base"

Variables are searched for in the databases until an appropriate match

is found.

Search path: data frames

attach allows you to put the “columns” of the argument in your

“search path”, i.e., they are directly accessible.

> X1

Error in try(X1) : object 'X1' not found

> attach( d) # reverse is done with a detach(d)

> X1

[1] 1 2

> search()

[1] ".GlobalEnv" "d" "package:stats"

[4] "package:graphics" "package:grDevices" "package:utils"

[7] "package:datasets" "package:methods" "Autoloads"

[10] "package:base"

> detach( d)

> search()[1:3]

[1] ".GlobalEnv" "package:stats" "package:graphics"

Hands-on tasks 20

1. What does the command rm( list=ls()) do?

2. Attach d, change an entry in X1, then attach d again.

What do you notice?

Scripting

I Save R commands in a file.

File is executed with source( filename ),

where filename is a character string.

I Scripting is faster than line by line evaluation.

I Better programming practice compared to history re-evaluation!

I Make use of #.

I Add plenty of spaces or newlines to structure the code.

Scripting: flow control

I if-statements:

> if(condition) expr

> if(condition) cons.expr else alt.expr

I Control:

> stop('message')

> warning('message') # evaluation is continued

I Loops:

> for(var in seq) expr

> while(condition) expr

> repeat expr # needs a break

Most loops can be avoided by “vectorizing” the commands.
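The constructs above, side by side, in a small sketch. The function `sum_to` is a hypothetical helper that adds the integers 1 to n in three different ways.

```r
# Hypothetical helper: sum of the integers 1..n, computed with three loops
sum_to <- function(n) {
  if (n < 0) stop("n must be non-negative")  # aborts with an error
  total_for <- 0
  for (i in seq_len(n)) total_for <- total_for + i
  total_while <- 0; i <- 0
  while (i < n) { i <- i + 1; total_while <- total_while + i }
  total_repeat <- 0; i <- 0
  repeat {
    if (i >= n) break                        # repeat needs an explicit break
    i <- i + 1
    total_repeat <- total_repeat + i
  }
  c(for_loop = total_for, while_loop = total_while,
    repeat_loop = total_repeat)
}

res <- sum_to(10)
vectorized <- sum(1:10)  # the loop-free equivalent
```

`seq_len(n)` is used instead of `1:n` so that the `for` loop is skipped entirely when n is 0.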

Scripting: flow control: vectorizing

Instead of:

> rns <- matrix(0, 90, 100)

> sol <- numeric( 90)

> for ( i in 1:90) {

+ rns[i,] <- rnorm(100)

+ sol[i] <- mean( rns[i,])

+ }

> rns

Use:

> rns <- array( rnorm( 90*100), c(90,100))

> sol <- apply( rns, 1, mean)
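To check that the loop and the vectorized form agree, both can be applied to the same matrix. This is a sketch only: the seed is arbitrary, and rowMeans() is mentioned as a further base R alternative that does not appear on the slide.

```r
set.seed(1)                                  # arbitrary seed, for reproducibility only
rns <- array(rnorm(90 * 100), c(90, 100))    # 90 x 100 matrix of N(0,1) draws

# explicit loop over the rows
sol_loop <- numeric(90)
for (i in 1:90) sol_loop[i] <- mean(rns[i, ])

# apply() over margin 1 (rows), as on the slide
sol_apply <- apply(rns, 1, mean)

# rowMeans() is a specialised base function for this particular case
sol_rowmeans <- rowMeans(rns)

all.equal(sol_loop, sol_apply)
all.equal(sol_loop, sol_rowmeans)
```

Note that the two versions on the slide fill the matrix in a different order (row by row in the loop, column by column in array()), so with a common seed their entries differ even though both are valid samples.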

Hands-on tasks 21

1. Convince yourself that if ( cond ) expr and if(cond)expr

are equivalent (note the spaces).

2. Create a script executing a few commands and evaluate the script.

E.g. drawing 1000 random numbers from a gamma distribution,

plotting the histogram and indicating the mean and median with

vertical lines.

3. Implement a statement causing an error in the last call. What do

you notice?

Functions

• A function is defined by an assignment of the form

> functionname <- function(arg_1, arg_2, ...) expression

expression is usually a series of R expressions (evaluations) grouped

by { and }.

• The last (evaluated) expression is returned.

• It is recommended to use an explicit return() or invisible().
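A small sketch of the two ways of returning a value mentioned above. The function names range_width and report are hypothetical and chosen for illustration only.

```r
# Explicit return(): width of the range of a numeric vector.
range_width <- function(x, na.rm = FALSE) {
  r <- range(x, na.rm = na.rm)
  return(r[2] - r[1])
}

# invisible(): the value is returned but not auto-printed at the prompt.
report <- function(x) {
  cat("n =", length(x), "\n")
  invisible(length(x))
}

w <- range_width(c(3, 10, 7))   # 10 - 3 = 7
n <- report(1:4)                # prints the count; n holds the value 4
```

invisible() is convenient for functions called mainly for a side effect, such as printing or plotting, whose return value is still useful when assigned.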

Functions

Example:

two functions that transform between Cartesian (x, y) and polar

coordinates (θ, ρ):

> cart2polar <- function(x) {

+ return( cbind( atan2(x[,2], x[,1]), sqrt( x[,1]^2 + x[,2]^2)))

+ }

> polar2cart <- function(x) {

+ return( cbind( x[,2]*cos(x[,1]), x[,2]*sin(x[,1])) )

+ }

> n <- 1500

> po <- cbind( runif(n, 0, 2*pi), runif( n, 0, 1))

Functions

> par( pty="s")

> plot( polar2cart( po))

[Figure: scatter plot of the n = 1500 points polar2cart( po); horizontal axis polar2cart(po)[,1], vertical axis polar2cart(po)[,2], both ranging from −1.0 to 1.0.]

Functions

Some input checking might be useful:

> cart2polar <- function(x) {

+ if ((length(dim(x))!=2) || (dim(x)[2]!=2))

+ stop("Need a nx2 matrix/array")

+ return( cbind(atan2(x[,2],x[,1]), sqrt( x[,1]^2+x[,2]^2)))

+ }

> cart2polar(rep(1,2))

Error in cart2polar(rep(1, 2)) : Need a nx2 matrix/array

> cart2polar(cbind(1,2))

[,1] [,2]

[1,] 1.107149 2.236068
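One way to gain confidence in the two functions is a round trip: convert Cartesian points to polar coordinates and back, then compare. The sketch below repeats the definitions from the slides so that it runs on its own; the seed and the names xy and xy_back are assumptions made for illustration.

```r
cart2polar <- function(x) {
  if ((length(dim(x)) != 2) || (dim(x)[2] != 2))
    stop("Need a nx2 matrix/array")
  return(cbind(atan2(x[, 2], x[, 1]), sqrt(x[, 1]^2 + x[, 2]^2)))
}
polar2cart <- function(x) {
  return(cbind(x[, 2] * cos(x[, 1]), x[, 2] * sin(x[, 1])))
}

set.seed(42)                                       # arbitrary seed
n  <- 1500
po <- cbind(runif(n, 0, 2 * pi), runif(n, 0, 1))   # (theta, rho) as on the slide
xy <- polar2cart(po)

# Cartesian -> polar -> Cartesian should reproduce the points.
xy_back <- polar2cart(cart2polar(xy))
all.equal(xy, xy_back)
```

The comparison is done in Cartesian coordinates on purpose: atan2() returns angles in (−π, π], whereas po holds angles in [0, 2π), so the angles themselves can differ by 2π even when the points coincide.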

Hands-on tasks 22

1. Extend the function cart2polar such that an optional argument

allows scaling of the coordinates.

2. Extend the function polar2cart such that input in degrees is

possible.

Packages

• All R functions and datasets are stored in packages.

• Only when a package is loaded are its contents available.

This is done both for efficiency and to aid package developers,

who are protected from name clashes with other code.

• Packages come along with help files for each function and dataset!

• A few packages are standard and loaded by default:

stats, graphics, grDevices, utils, datasets, methods, base.

• More than 3800 packages are publicly available on CRAN,

and the number increases daily.

Packages

• To see which packages are installed at your site, issue

> library()

• To see which packages are currently loaded, use

> search()

• To load a package, use

> library( abind)

• To detach a package from the search path (this does not uninstall it), use

> detach( package:abind)

• A basic description of the package is often given by

> help( "package.name")

RStudio
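The load/detach cycle can be tried without installing anything. The sketch below uses splines, which ships with R but is not attached by default, instead of abind; this substitution is an assumption made so that the example runs on a plain R installation.

```r
library(splines)                                    # attach the package
attached <- "package:splines" %in% search()         # now on the search path

detach("package:splines")                           # remove it from the search path
still_attached <- "package:splines" %in% search()   # no longer listed
```

After detach() the package remains installed and can be attached again with library() at any time.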

Packages: namespaces

Packages have a NAMESPACE

:: accessing public (exported) objects

::: accessing private (non-exported) objects

Works for not-loaded packages as well!

> exists( "diag.spam")

[1] FALSE

> spam::diag.spam( 1)

[,1]

[1,] 1

Class 'spam'

> spam::.spam.addsparsefull

Error : '.spam.addsparsefull' is not an exported object from 'namespace:spam'

> # The following would work:

> # spam:::.spam.addsparsefull
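The same behaviour can be reproduced with packages that ship with R, in case spam is not installed. In this sketch, tools::file_ext and the non-exported S3 method t.test.default from stats are stand-ins chosen for illustration; they do not appear on the slide.

```r
# :: reaches an exported object of a package that is not attached.
ext <- tools::file_ext("analysis.R")

# t.test.default is not exported from 'stats', so :: fails ...
res    <- try(stats::t.test.default, silent = TRUE)
failed <- inherits(res, "try-error")

# ... while ::: reaches the non-exported object.
f <- stats:::t.test.default
is.function(f)
```

Relying on ::: in one's own code is fragile, since non-exported objects may change without notice between package versions.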

Packages: writing packages

• Disseminate R code (globally or locally)

• Thorough code and documentation checking

Documentation:

cran.r-project.org/doc/manuals/R-exts.html

cran.r-project.org/doc/contrib/Leisch-CreatingPackages.pdf

Customize the environment

Within RStudio, set preferences (→ Tools → Options)

Customize the environment

Global and local initialization files (Section 10.8 in RI).

• global: file taken from the R_PROFILE environment variable

• local: .Rprofile in any directory

Launching R executes (“sources”)

1. site profile

2. user profile (local or home)

3. .RData

4. .First()
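To see which initialization files a session would pick up, the relevant settings can be inspected from within R. This is a sketch based on the rules documented in ?Startup; R_PROFILE_USER and the default site file Rprofile.site are not mentioned on the slide and are included here as additional context.

```r
# Environment variables pointing to a site and a user profile;
# an empty string means the variable is not set.
site_env <- Sys.getenv("R_PROFILE")
user_env <- Sys.getenv("R_PROFILE_USER")

# Default location of the site profile when R_PROFILE is unset.
site_default <- file.path(R.home("etc"), "Rprofile.site")

# Candidate user profiles: current directory first, then home directory.
candidates <- c(file.path(getwd(), ".Rprofile"),
                path.expand("~/.Rprofile"))
existing <- candidates[file.exists(candidates)]
existing
```

An empty result for existing simply means that no user profile is present in either location.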

Customize the environment

Example:

> .First <- function() {

+ library( spam)

+ source( "/home/furrer/R/usefulfcn.R")

+ options( width=120)

+ }

Similarly, before closing R, .Last() is executed:

> .Last <- function() {

+ cat( "Thanks for using R - good night or enjoy your coffee\n")

+ }

Customize the environment: ESS

ESS: Emacs Speaks Statistics

Emacs environment for R (and other statistics software)

Writing documents

Sweave() merges LaTeX with R code and R code output

within one document.

Structure of a LaTeX file with embedded R code:

<<tag, eval=TRUE, echo=TRUE, fig=TRUE>>=

plot( x, y, xlab='Diameter', ylab='Height')

@

This prints and evaluates the code and includes the figure.

Documentation:

stat.ethz.ch/R-manual/R-devel/library/utils/doc/Sweave.pdf
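The chunk structure above can be tried end to end with a minimal file. In the sketch below, the file name demo.Rnw, the surrounding LaTeX skeleton and the use of the built-in trees dataset for x and y are assumptions made for illustration; only the chunk header and the plot() call follow the slide.

```r
# Write a minimal Sweave source file into a fresh temporary directory.
rnw_dir <- tempfile("sweave-demo")
dir.create(rnw_dir)
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Diameter against height:",
  "<<tag, eval=TRUE, echo=TRUE, fig=TRUE>>=",
  "x <- trees$Girth",
  "y <- trees$Height",
  "plot( x, y, xlab='Diameter', ylab='Height')",
  "@",
  "\\end{document}"
), file.path(rnw_dir, "demo.Rnw"))

# Sweave() writes its output into the working directory.
old_wd <- setwd(rnw_dir)
Sweave("demo.Rnw")
setwd(old_wd)

tex_file <- file.path(rnw_dir, "demo.tex")
file.exists(tex_file)
```

Sweave() only produces the .tex file and the figure; turning the result into a PDF requires a separate LaTeX run, which is not part of this sketch.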

This presentation has been prepared with Sweave and the LaTeX

package pfuef.