introduction to r · 2016. 2. 29. · reinhard furrer, uzh i-math, 12. 2. 2014 nzz.ch introduction...
Reinhard Furrer, UZH
I-Math, 12. 2. 2014 NZZ.ch
Introduction to R
Contents
• Basics
• Data handling and storing
• Plotting
• Linear models
• Simple programming tricks
Part 1
Basics
• What is R?
• The R-environment
• Getting started
• R rules
What is R?
• R is a language and environment for statistical computing and graphics.
• R provides a wide variety of statistical and graphical techniques, and is highly extensible.
• R produces well-designed publication-quality plots with a careful choice of default values.
• R is available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in source code form.
What is R?
Crude classification:
• Symbolic software:
  – Mathematica
  – Maple
  – Magma
  – . . .
• Numeric software:
  – MATLAB, Octave
  – NCL, IDL
  – . . .
  – R
The R-environment: micro
• R is an integrated suite of software facilities
• Emphasis on statistical analysis and graphical display
• Perform an entire analysis from raw data to reports
• Essentially command-line interpreted; links to precompiled code are possible
The R-environment: macro
Due to the licence:
• freely available: cran.r-project.org
• huge community
• many packages (>5100): cran.r-project.org/web/packages/
• abundant documentation in form of:
  FAQs (cran.r-project.org/doc/FAQ/R-FAQ.html), manuals (cran.r-project.org/manuals.html or cran.r-project.org/other-docs.html), wikis, books, . . . see www.r-project.org
• several mailing lists: www.r-project.org/mail.html
The R-environment: macro
Slides are mainly based on the following sources:
• An Introduction to R (IR):
  cran.r-project.org/doc/manuals/R-intro.pdf
• The R Primer (RP):
  www.stat.washington.edu/cggreen/rprimer/
• The R Inferno (RI):
  www.burns-stat.com/pages/Tutor/R_inferno.pdf
and some 10 years of personal use . . .
Getting started: install R
Done through “The Comprehensive R Archive Network” (CRAN):
cran.r-project.org
Easy to follow instructions in Chapter 1 of RP:
www.stat.washington.edu/cggreen/rprimer/
Getting started: run R (Linux)
Launch R in your console:
furrer@furrer-laptop:~/teaching/intro2R> R

R version 2.15.0 (2012-03-30)
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i686-pc-linux-gnu (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>
Getting started: run R
RStudio
Runs under Windows, Linux, OS X (free; AGPLv3) rstudio.org
Getting started: run R
Tinn-R (Tinn stands for the recursive acronym ’Tinn is not Notepad’)
Runs under Windows (free; GPL) sciviews.org/Tinn-R
Getting started: run R
EMACS environment for R (and other statistics software)
Runs under Windows, Linux, OS X (GPL)
Getting started
> pi
[1] 3.141593
> cos( pi)
[1] -1
> 2 + 2.3
[1] 4.3
> sqrt( -1) # Oops
[1] NaN
> myvar <- exp( -2.3) # Assigning
> print( myvar)
[1] 0.1002588
> print( myvar, digits=16)
[1] 0.1002588437228037
RStudio
Hands-on tasks 1
1. Open RStudio and familiarize yourself with it.
2. What is the 15th digit of π?
3. Interpret the result of sin( pi).
Getting started
> nrcyclones <- c(6, 5, 4, 6, 6, 3, 12, 7, 4, 2, 6, 7, 4)
> # "c" is a function... creating a vector out of its elements
> summary( nrcyclones)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.000 4.000 6.000 5.538 6.000 12.000
> hist( nrcyclones)
[Figure: Histogram of nrcyclones; x-axis nrcyclones, y-axis Frequency]
Getting started
> plot( nrcyclones, type="b")
[Figure: nrcyclones plotted against Index, points connected by lines]
> cor( nrcyclones[-1], nrcyclones[-13])
[1] -0.1113836
Getting started
> par( mfrow=c(1,2))
> acf( nrcyclones)
> pacf( nrcyclones)
[Figure: two panels for the series nrcyclones, ACF against Lag (left) and Partial ACF against Lag (right)]
Getting started
> help( acf)
acf                    package:stats                    R Documentation

Auto- and Cross- Covariance and -Correlation Function Estimation

Description:

     The function 'acf' computes (and by default plots) estimates of
     the autocovariance or autocorrelation function.  Function 'pacf'
     is the function used for the partial autocorrelations.  Function
     'ccf' computes the cross-correlation or cross-covariance of two
     univariate series.

Usage:

     acf(x, lag.max = NULL,
         type = c("correlation", "covariance", "partial"),
         plot = TRUE, na.action = na.fail, demean = TRUE, ...)

     pacf(x, lag.max, plot, na.action, ...)

     ## Default S3 method:
     pacf(x, lag.max = NULL, plot = TRUE, na.action = na.fail,
          ...)
Getting started: getting help
Various possibilities:
> ?mean # Shortcut for help( mean)
> ?"%*%" # The quotes are required!
> help.start() # Interactive html-based help!
Further illustrative help is accessed via:
> example("image") # example code in the help of "image"
> demo("image") # run the demo "image"
> demo() # lists all available demos
We hardly use the following command:
> q()
R rules
• R is case-sensitive.
• Variable names, function names, etc., should contain only
  alphanumeric characters (A-Z, a-z, 0-9), the “.” (and “_”).
  They cannot be a reserved word or start with a digit or “_”.
• Commands are separated by semicolons (“;”) or by a newline.
  Commands are grouped with curly braces ({ }).
• # is the comment sign. The remainder of the line is ignored.
• If a command is not complete at the end of a line, R gives
  a continuation prompt, “+ ”, on subsequent lines until the
  command is complete.
• As long as they are matched, single quotes (’) and double quotes (") are
  equivalent.
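The rules above can be tried out in a few lines. This is only an illustrative sketch; the names a, b, A, total and same are made up for the example.

```r
# Two commands on one line, separated by a semicolon
a <- 1; b <- 2

# Names are case-sensitive: 'A' is a different object from 'a'
A <- 10

# Curly braces group commands; the value of the group is its last expression
total <- {
  tmp <- a + b   # everything after '#' is ignored
  tmp * A
}

# Matched single and double quotes produce the same string
same <- identical('text', "text")
```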
R rules: reserved words
The reserved words in R’s parser are:
if, else, repeat, while, for, in, next, break, function
TRUE, FALSE, NULL, Inf, NaN, NA and NA-specific types.
... and ...-derivatives, which are used to refer to arguments
passed down from an enclosing function.
There are (unprotected) short cuts T and F, for TRUE and FALSE:
> T
[1] TRUE
> T <- F # How not to do it!!
> T
[1] FALSE
R rules: functions and operators
Most R statements are composed of functions and operators:
> y <- sqrt(2 + 2)
consists of the + operator, followed by the square-root function sqrt, and then the
assignment operator <-.
Functions are of the form function( list of arguments )
Operators are of the form lhs operator rhs
Hands-on tasks 2
1. What are operators and what are functions in the following calls:
2 + 1
sin( pi)
2 + cos( 0)
2. What does the function median calculate?
3. Notice the difference between ?mean, ?"mean" and ?in, ?"in".
4. Create a variable named my1var containing log( 3).
5. Which of the following are valid variable names:
yo, beHappy!, I_am_2, myvar;val, getvar1, getvar$char.
6.? Many operators can be used as functions: "operator"(lhs, rhs).
Compare: 2 + 2 and "+"(2,2)
R rules: syntax
R has the following operators (highest to lowest):
:: :::              access variables in a name space
$ @                 component / slot extraction
[ [[                indexing
^                   exponentiation (right to left)
- +                 unary minus and plus
:                   sequence operator
%any%               special operators
* /                 multiply, divide
+ -                 (binary) add, subtract
< > <= >= == !=     ordering and comparison
!                   negation
& &&                and
| ||                or
~                   as in formulae
-> ->>              rightwards assignment
<- <<-              assignment (right to left)
=                   assignment (right to left)
?                   help (unary and binary)
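A few expressions show how the precedence order plays out in practice. This is a small sketch; the variable names p1 to p6 are only for illustration.

```r
p1 <- -2^2           # ^ binds tighter than unary minus: -(2^2)
p2 <- (-2)^2         # parentheses force the other reading
p3 <- 2^3^2          # ^ groups right to left: 2^(3^2)
p4 <- 1:3 * 2        # : binds tighter than *: (1:3) * 2
p5 <- 1:(3 * 2)      # with parentheses: the sequence 1, ..., 6
p6 <- !TRUE & FALSE  # ! binds tighter than &: (!TRUE) & FALSE
```

When in doubt, adding parentheses costs nothing and makes the intention explicit.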
Hands-on tasks 3
1. Compare:
1:-3
1:(-3)
-1:3
-(1:3)
2. Compare:
2^1/2
2^(1/2)
3.? Be aware of floating point arithmetic:
pi==3.14159265358979
pi==3.141592653589793
pi==3.141592653589793116
Part 2
Data handling and storage
• Objects
• Indexing
• Functions
• Reading from files
Objects
R uses the following “core” objects:
• vectors
• matrices
• arrays
• factors
• lists
• data frames
• functions
Objects: vectors
Intrinsic attributes: mode and length
> v <- 1:4
> v
[1] 1 2 3 4
The mode is one of logical, numeric, complex, character (or raw).
> length( v)
[1] 4
> mode( v)
[1] "numeric"
> mode( 1i) # to give another example
[1] "complex"
The mode numeric has storage mode integer or double.
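The distinction between mode and storage mode can be checked directly. This is a minimal sketch; v_int and v_dbl are names chosen for the example.

```r
v_int <- 1:4             # created with ':', stored as integer
v_dbl <- c(1, 2, 3, 4)   # numeric literals are doubles by default

mode(v_int); mode(v_dbl)                   # both report "numeric"
storage.mode(v_int); storage.mode(v_dbl)   # "integer" and "double"
```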
Hands-on tasks 4
1. All elements of a vector are of the same mode.
What is the mode of c("char", pi), c(2,1i)?
2. Interpret the result of sqrt(-1) and sqrt(-1+0i)
3. is.integer and as.integer query and coerce to integer format.
What is the storage mode of the output of length (two ways to verify)?
4.? Compare the results of identical(1,1.0) and
identical( as.integer(1),1.0)
5.? What is the result and storage mode of 3L, 3L*1, 3L*1L, 3L/1L,
3L/3L?
Objects: vectors: generation
Concatenation operator:
> v <- c( 1, 2, 3, 4)
Generate sequences (several additional possibilities exist):
> seq( 4) # identical to 1:4
[1] 1 2 3 4
> seq( 1, 12, by=2)
[1] 1 3 5 7 9 11
> seq( 1, by=2, length.out=12)
[1] 1 3 5 7 9 11 13 15 17 19 21 23
> rep( 1:4, 2) # identical to rep.int( 1:4, 2)
[1] 1 2 3 4 1 2 3 4
> rep( 1:4, each=2)
[1] 1 1 2 2 3 3 4 4
> rep( 1:4, 2:5) # identical to rep( 1:4, times=2:5)
[1] 1 1 2 2 2 3 3 3 3 4 4 4 4 4
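Among the additional possibilities, seq_len and seq_along are worth knowing, because 1:n counts down when n is zero. A short sketch with made-up values:

```r
n <- 0
down <- 1:n            # counts down to 0: the two elements 1 and 0
none <- seq_len(n)     # empty integer vector, usually what is intended

x <- c(2.5, 7, 1)
idx <- seq_along(x)    # one index per element of x

# seq can also count down; rev reverses the result
up <- rev(seq(10, 1, by = -3))
```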
Hands-on tasks 5
1. Interpret the output of the following calls:
seq( from=1, to=13, by=2)
seq( from=1, to=13, length.out=3)
seq( from=1, by=2, length.out=3)
seq( from=1, to=12, by=2, length.out=3)
2. What calls generate the sequence: 1, 4, 4, 7, 7, 7, 10, 10, 10,
10, 13, 13, 13, 13, 13?
3. Create a sequence containing TRUE and FALSE according to the
parity of the last sequence.
4. Why is it not advisable to use the command: c <- c(1, 2, 3, 4)?
Objects: matrices
A vector with (minimal) attribute dim
> m <- matrix( 1:16, 4, 4)
> m
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
> length( m)
[1] 16
> attributes( m)
$dim
[1] 4 4
Objects: matrices
A matrix can contain additional attributes
> rownames( m) <- paste( "r", 1:4, sep="")
> attributes( m)
$dim
[1] 4 4
$dimnames
$dimnames[[1]]
[1] "r1" "r2" "r3" "r4"
$dimnames[[2]]
NULL
The function attr( object, name) can be used to specify an attribute:
> attr( m, "dim") <- c(2, 8) # What is the result?
Objects: matrices: generation
> m1 <- matrix( 1:8, nrow=4, ncol=4, byrow=TRUE) # recycling
> m2 <- diag( 1:4)
> m3 <- cbind( 1:3, 2:4, 1)
> m3
[,1] [,2] [,3]
[1,] 1 2 1
[2,] 2 3 1
[3,] 3 4 1
> t( m3) # transpose
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 4
[3,] 1 1 1
Hands-on tasks 6
1. What is the effect of dim( m) <- c( 2, 8)? Try other values.
2. What is the result of
matrix( 1:7, nrow=4, ncol=4)
diag( m1)
rbind( 1:3, 2:4, 1)
cbind( rbind( 1:2, 3:4), 0) ?
3. Construct a block diagonal matrix with 2 blocks of sizes 2×2.
Objects: arrays
Arrays are higher-dimensional “matrices”
> a <- array( 1:24, c( 3, 4, 2))
> a
, , 1
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
, , 2
[,1] [,2] [,3] [,4]
[1,] 13 16 19 22
[2,] 14 17 20 23
[3,] 15 18 21 24
Hands-on tasks 7
1. What is the length of a?
2. What are its attributes?
3. aperm is the generalization of t.
Trace the elements of aperm(a,c(2,1,3)) and aperm(a,c(3,2,1)).
Objects: factors
Strange concept, neither numeric nor character.
> as.factor( 1:3)
[1] 1 2 3
Levels: 1 2 3
> as.factor( 1:3) + 1
[1] NA NA NA
Used in the context of categorical data.
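A sketch of how a factor behaves with categorical data. The values are made up; size and f are illustrative names.

```r
size <- factor(c("low", "high", "low", "mid"),
               levels = c("low", "mid", "high"))
levels(size)               # the categories, in the order given
table(size)                # counts per level
codes <- as.integer(size)  # internal integer codes, following the level order

# Numbers stored as factor levels: convert via character to get them back
f <- as.factor(c(10, 20, 10))
f_codes  <- as.integer(f)                # the codes, not the original values
f_values <- as.numeric(as.character(f))  # the original values
```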
Objects: lists
A vector whose elements can be of ‘any’ type.
> l <- list(1:2, as.factor(1:2), paste(1:2))
> l
[[1]]
[1] 1 2
[[2]]
[1] 1 2
Levels: 1 2
[[3]]
[1] "1" "2"
> length(l)
[1] 3
Objects: data frames
Matrix-like structures, in which the columns can be of different types.
> d <- data.frame( m)
> d
X1 X2 X3 X4 X5 X6 X7 X8
1 1 3 5 7 9 11 13 15
2 2 4 6 8 10 12 14 16
> attributes( d)
$names
[1] "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8"
$row.names
[1] 1 2
$class
[1] "data.frame"
Objects: data frames
While rownames and colnames are for matrices, names and row.names are
for data frames.
> names( d)
[1] "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8"
> row.names( d)
[1] "1" "2"
Luckily, the former work as well:
> colnames( d)
[1] "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8"
> rownames( d)
[1] "1" "2"
In general, work with dimnames.
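To illustrate columns of different types and the use of dimnames, here is a small sketch with a made-up data frame (df and its columns are hypothetical):

```r
df <- data.frame(id    = 1:3,
                 name  = c("a", "b", "c"),
                 valid = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)
col_classes <- sapply(df, class)   # one class per column
dim(df)                            # rows and columns, as for a matrix
dn <- dimnames(df)                 # row names and column names in one list
```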
Hands-on tasks 8
1. Can factors be ordered?
2. What is the difference between l[1] and l[[1]] ?
(use is.list(..) to probe the result).
3. Internally, a data.frame is a list with class data.frame .
Check d[[3]] .
4. What is the length of d? Is the result intuitive?
Objects: functions
R is built upon itself. Many of the functions are “visible”:
> sd
function (x, na.rm = FALSE)
sqrt(var(if (is.vector(x)) x else as.double(x), na.rm = na.rm))
<bytecode: 0x25d9408>
<environment: namespace:stats>
More later . . .
Objects: coercion and testing
An object type obj usually has three associated functions:
obj() , as.obj() , and is.obj() .
> is.matrix( a)
[1] FALSE
> as.matrix( v) # here equivalent to "matrix(v)"
[,1]
[1,] 1
[2,] 2
[3,] 3
[4,] 4
Hands-on tasks 9
1. Notice the difference between matrix( a, nrow=3)
and as.matrix( a, nrow=3)
2. What is the result of c( 0, NULL, 3),
is.array( m), is.matrix( m)
is.array( a), is.matrix( a)
3. Not all coercions work. What is the result of
as.integer( pi)
as.integer( 2i)
as.numeric( "a")
Objects: summary
[Figure: summary of the object types. Source: RI]
Indexing
Basically, extraction is done via the [ operator:
> v
[1] 1 2 3 4
> v[1]
[1] 1
> v[-c(2:3)] # or v[-c(2,3)] or v[-(2:3)]
[1] 1 4
Similarly, replacement is done via the [<- operator:
> v[ 1] <- 1.1
> v[-c(2:3)] <- c(2.2, 3.3)
> v
[1] 2.2 2.0 3.0 3.3
Indexing: vectors
Extraction is done via the [ operator:
> v
[1] 2.2 2.0 3.0 3.3
> v[ c(1, 4)]
[1] 2.2 3.3
> v[-c(1, 4)]
[1] 2 3
> v[c(TRUE, FALSE, TRUE, FALSE)]
[1] 2.2 3.0
> v[c(TRUE, FALSE, TRUE)] # note the recycling!
[1] 2.2 3.0 3.3
Extraction for (very) long vectors:
> tail( v, 2)
[1] 3.0 3.3
> head( v, -1)
[1] 2.2 2.0 3.0
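Logical index vectors are most often produced by a comparison rather than typed by hand. A minimal sketch, reusing the values of v from above (big and pos are illustrative names):

```r
v <- c(2.2, 2.0, 3.0, 3.3)
v > 2.5                 # a logical vector, one entry per element
big <- v[v > 2.5]       # the elements satisfying the condition
pos <- which(v > 2.5)   # their positions
v[v > 2.5] <- 0         # replacement works with the same kind of index
```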
Indexing: matrices
> m <- matrix( 1:16, 4, 4)
> m[2, 3]
[1] 10
> m[1,]
[1] 1 5 9 13
> m[,1]
[1] 1 2 3 4
> m[ c(1,8,12)] # ordered columnwise
[1] 1 8 12
> m[ c(1,2,4), c(4,2,1)] # note the ordering
[,1] [,2] [,3]
[1,] 13 5 1
[2,] 14 6 2
[3,] 16 8 4
> m[cbind( c(1,2,4), c(4,2,1))] # What is the result when using rbind?
[1] 13 6 4
Indexing: matrices
If the matrix has appropriate dimnames attributes:
> rownames( m) <- paste( "r", 1:4, sep="")
> m
[,1] [,2] [,3] [,4]
r1 1 5 9 13
r2 2 6 10 14
r3 3 7 11 15
r4 4 8 12 16
> m["r1",]
[1] 1 5 9 13
> m[,1, drop=FALSE]
[,1]
r1 1
r2 2
r3 3
r4 4
Indexing: matrices
Extract or replace the diagonal values:
> n <- min( dim( m))
> diag( m)
[1] 1 6 11 16
> diag( m) <- -(1:n)
How to extract the values above the diagonal?
> m[ (1:(n-1))*(n+1)]
[1] 5 10 15
> m
[,1] [,2] [,3] [,4]
r1 -1 5 9 13
r2 2 -2 10 14
r3 3 7 -3 15
r4 4 8 12 -4
Hands-on tasks 10
1. Suppose that m only has rownames, interpret the result of m[,"c1"].
2. Use diag to extract the values above the diagonal.
3. Set the values of m below the diagonal to -1.
4. Compare m[cbind( c(1,2,4), c(4,2,1))] and the result when using
rbind instead?
Indexing: lists
Extraction is done via the [, [[, $ operators:
> l[[1]]
[1] 1 2
> l[1]
[[1]]
[1] 1 2
> ll <- list( a=2, b=3, cde=10)
> ll$a
[1] 2
> ll$c # note the partial matching
[1] 10
Indexing: data frames
Column extraction is also possible with $ operator:
> d$X1 # a data frame is primarily a list!
[1] 1 2
> d[,1]
[1] 1 2
> d[,"X1"]
[1] 1 2
Similarly:
> d[1,]
X1 X2 X3 X4 X5 X6 X7 X8
1 1 3 5 7 9 11 13 15
> d["1",]
X1 X2 X3 X4 X5 X6 X7 X8
1 1 3 5 7 9 11 13 15
Indexing: other details
• Matrices are stored column-wise.
• Arrays are stored along the indices, with the first index varying fastest.
• Objects can have length zero, e.g. v[0].
• Indexing starts at one; an index vector may also consist entirely of negative values, which exclude elements.
Hands-on tasks 11
1. What happens if ll <- list( a=2, b=3, cd=10, ce=12) is indexed
with ll$c?
2. What elements are extracted with m[1:6], a[1:4*2]?
3. Let exist <- 1:14. What elements are extracted with exist[-c(1:3)],
exist[c(1:3)]? What is the result of exist[-1:3]
4.? Examine the code
nonexist[2] <- 1
nonexist <- numeric(0)
length(nonexist)
nonexist[0]
nonexist[1]
nonexist[2] <- 1
nonexist
Functions
Example:
> x <- mean( x, trim=.1)
General structure:
> res <- fcn( defarg1, defarg2,..., optarg1, optarg2, ...)
• res may be NULL
• Required arguments need to be in order.
• Optional arguments are name matched.
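Argument matching can be seen with a small user-defined function. The function f below is hypothetical and exists only for this illustration: x is required, power and shift are optional with defaults.

```r
f <- function(x, power = 2, shift = 0) {
  x^power + shift
}
r1 <- f(3)                 # defaults are used
r2 <- f(3, 3)              # second positional argument is 'power'
r3 <- f(3, shift = 1)      # 'shift' matched by name, 'power' keeps its default
r4 <- f(shift = 1, x = 3)  # named arguments may come in any order
r5 <- f(3, sh = 1)         # partial matching of argument names also works
```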
Functions: “Math” group
Math(x, ...): abs, sign, sqrt, floor, ceiling, trunc
round, signif, exp, log, expm1, log1p
cos, sin, tan, acos, asin, atan
cosh, sinh, tanh, acosh, asinh, atanh
lgamma, gamma, digamma, trigamma
cumsum, cumprod, cummax, cummin
Ops(e1, e2): "+", "-", "*", "/", "^", "%%", "%/%"
"&", "|", "!"
"==", "!=", "<", "<=", ">=", ">"
Summary(..., na.rm=FALSE): all, any, sum, prod, min, max, range
Complex(z): Arg, Conj, Im, Mod, Re
Hands-on tasks 12
1. What is the result of min( c( 1, 3, NA)) ?
Is there a difference to min( 1, 3, NA) ?
How to get the result of 1?
2. What is the result of 17 %% 7 and 17 %/% 7 ? Why?
3.? It is possible to define functions without a function name:
(function(x,y) { z <- x**2 + y**2; x+y+z } )(0:7, 1)
Functions: matrices
For matrices, special operators are defined:
> m1 <- m2 <- matrix(1, 2, 2)
> m1[2, 2] <- 2
> m1 %*% m2
[,1] [,2]
[1,] 2 2
[2,] 3 3
> solve( m1)
[,1] [,2]
[1,] 2 -1
[2,] -1 1
> det( m1)
[1] 1
Functions: matrices: factorization
> svd( m1) # X = U D V'
$d
[1] 2.618034 0.381966

$u
           [,1]       [,2]
[1,] -0.5257311 -0.8506508
[2,] -0.8506508  0.5257311

$v
           [,1]       [,2]
[1,] -0.5257311 -0.8506508
[2,] -0.8506508  0.5257311

> chol( m1) # X = R' R
     [,1] [,2]
[1,]    1    1
[2,]    0    1
> eigen( m1) # X = G D G' ## We see eigen and chol again!
$values
[1] 2.618034 0.381966

$vectors
          [,1]       [,2]
[1,] 0.5257311 -0.8506508
[2,] 0.8506508  0.5257311
Functions: matrices: factorization
> qr( m1)
$qr
           [,1]       [,2]
[1,] -1.4142136 -2.1213203
[2,]  0.7071068  0.7071068

$rank
[1] 2

$qraux
[1] 1.7071068 0.7071068

$pivot
[1] 1 2

attr(,"class")
[1] "qr"
There are several additional functions associated: qr.qy, qr.qty, . . .
Hands-on tasks 13
Let M <- m1 %*% t( m1)
1. What is the eigendecomposition of M ?
2. What are the singular values of the same matrix?
3. Propose several approaches to construct an inverse of
M + diag( 2)
4. How can you calculate the trace of an arbitrary matrix A ?
Functions: probability distributions
General construct of prefix and root.
• prefix: d density, p CDF, q quantile, r random numbers
• root: beta, binom, pois, norm, t, and many more
For example:
> runif( 5)
[1] 0.2282756 0.1472576 0.8364201 0.8430635 0.0640814
> dnorm( 0)
[1] 0.3989423
> qt( 0.975, df=1)
[1] 12.7062
Parameters are “quite” standard, consult the help.
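The four prefixes can be seen side by side for the root norm. A brief sketch; the variable names are illustrative only.

```r
p975  <- pnorm(1.96)                  # CDF: P(X <= 1.96) for X ~ N(0, 1)
q975  <- qnorm(0.975)                 # quantile: the inverse of the CDF
dens  <- dnorm(0, mean = 1, sd = 2)   # density of N(1, 2^2) evaluated at 0
draws <- rnorm(3, mean = 1, sd = 2)   # three random numbers from N(1, 2^2)

pnorm(q975)                           # applying the CDF to the quantile returns 0.975
```

Note that the normal functions are parameterized by the standard deviation sd, not the variance.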
Functions: apply
Applying a function to margins of an array or matrix.
> d
X1 X2 X3 X4 X5 X6 X7 X8
1 1 3 5 7 9 11 13 15
2 2 4 6 8 10 12 14 16
> apply( d, 2, mean)
X1 X2 X3 X4 X5 X6 X7 X8
1.5 3.5 5.5 7.5 9.5 11.5 13.5 15.5
> apply( d, 1, range)
[,1] [,2]
[1,] 1 2
[2,] 15 16
> apply( d, 1, function(x, tr) { x[2] - mean(x, trim=tr)}, tr=.4)
[1] -5 -5
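For common summaries there are dedicated functions that give the same result as apply. The sketch below rebuilds a data frame with the same values as d and compares the two approaches (col_means and row_sums are illustrative names):

```r
d <- data.frame(matrix(1:16, 2, 8))   # same values as the 'd' shown above

col_means <- apply(d, 2, mean)        # margin 2: one value per column
same <- all.equal(col_means, colMeans(d))

row_sums <- apply(d, 1, sum)          # margin 1: one value per row
sapply(d, mean)                       # a data frame is a list of columns
```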
Hands-on tasks 14
1. Draw a normal sample of size 100 and draw a histogram of the
sample.
What is the mean and standard deviation of the sample?
2. Repeat the previous exercise 1000 times and calculate the mean
of the means and the standard deviations.
3. How do the results compare to the ones from your peers?
Is there a way to “homogenize” the procedure?
Reading from files: data files
• Several possibilities of reading ASCII files:
> read.table(file, header = FALSE, sep = "")
> read.csv(file, header = TRUE, sep = ",", quote="\"")
> scan(file, ...)
• scan is a powerful (complex) alternative.
• Fixed-width formatted files are read with read.fwf.
• Common open source storage formats are supported:
  netCDF, GRIB, HDF, . . .
  (specific packages need to be loaded).
• Directly reading Excel files is not possible with base R (non-free software).
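A round trip through a temporary file shows the reading functions at work. This is a sketch with made-up data; orig, back and back2 are illustrative names.

```r
# Write a small table to a temporary file, then read it back
tmp  <- tempfile(fileext = ".csv")
orig <- data.frame(site = c("A", "B"), temp = c(12.5, 14.1))
write.csv(orig, tmp, row.names = FALSE)

back  <- read.csv(tmp, header = TRUE)               # comma-separated, with header
back2 <- read.table(tmp, header = TRUE, sep = ",")  # same file via read.table
str(back)                                           # inspect the column types
unlink(tmp)                                         # remove the temporary file
```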
Reading from files: R code/objects
• R “source code” is read and evaluated with source("filename.R")
• R data files are read with load("file.RData")
• To save R objects use
> save.image()
> save(..., file="file.RData") # symbols or character strings
  Note the save.image question when quitting R.
• data() lists all the available datasets in the search path (directly available).
  data( package=.packages( all.available=TRUE)) lists all the available datasets.
• data( name, package="packagename") loads name from the package packagename.
Hands-on tasks 15
On www.math.uzh.ch/furrer/software/workshop/ the three datasets
data1.dat, data2.dat and data3.dat are deposited (use entire link).
1. Download the datasets and look at the content thereof.
What are the differences?
2. Load these three datasets into R, by properly keeping column and
row names of the original data.
Try to specify directly the URL instead of the filename, what
do you notice?
3. Save one of the datasets in R-native format.
4.? Are there ways to reduce the file size?
Part 3
Plotting
• Plotting in R
• High-level plotting (HLP) functions
• Low-level plotting (LLP) functions
• Interactive graphics functions
• Graphical parameters
Plotting in R
R distinguishes different types of plotting functions:
• High-level plotting (HLP) functions create a new plot on the
  graphics device, possibly with axes, labels, titles and so on.
• Low-level plotting (LLP) functions add more information to an
  existing plot, such as extra points, lines and labels.
• Interactive graphics functions allow you to interactively add information
  to, or extract information from, an existing plot, using a
  pointing device such as a mouse.
R maintains a list of graphical parameters which can be manipulated
to customize your plots.
Plotting in R: workflow
General workflow:
1. Choosing a device (screen, PDF file, . . . )
2. Setting graphical parameters
3. Calling a high-level plotting function
4. Calling low-level plotting functions
5. More calls to high-level and low-level functions
6. Closing the device
Simplest example (i.e., point 3 only):
> plot(0)
Plotting in R: workflow: example
> x <- rnorm( 100) # 100 random numbers
> pdf( "figure1.pdf") # Output to a PDF file
> par( mfrow=c(1, 2)) # Two panels for this plot
> hist( x) # high-level call
> abline( v=mean( x)) # low-level call
> qqnorm( x) # second high-level call
> dev.off() # close the device
produces:
[Figure: two panels, Histogram of x with Frequency on the y-axis (left) and Normal Q-Q Plot of Sample Quantiles against Theoretical Quantiles (right)]
Plotting in R: workflow
• If no device is open, the default one will be used (usually the screen).
• When producing files, dev.off() is required.
• Each new high-level plot overwrites the current area, unless
  specified otherwise (usually with add=TRUE).
• Several devices can be open, but only one is active. Use dev.cur()
  and dev.set() to query and set the active device.
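The points above can be followed in a short session that writes to a temporary PDF file. This is a sketch; f1, first and ok are names chosen for the example.

```r
f1 <- tempfile(fileext = ".pdf")
pdf(f1)                            # open a PDF device; it becomes the active one
first <- dev.cur()                 # name and number of the active device
plot(1:10)                         # high-level call: starts a new plot
curve(sqrt(x), 1, 10, add = TRUE)  # add=TRUE draws into the existing plot
dev.off()                          # close the device so that the file is complete
ok <- file.exists(f1)
unlink(f1)                         # remove the temporary file
```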
HLP functions
Common high-level plotting functions:
plot(x, y)        most basic plotting command, flexible
hist(x)           histogram (specify breaks for discrete data)
boxplot(x)        boxplot of one or several variables
qqnorm(y)         quantile-quantile plot (empirical vs normal)
qqplot(x, y)      quantile-quantile plot (empirical vs arbitrary)
pairs(x)          scatterplots for multidimensional data
curve(expr)       plots a function
image(x, y, z)    z = f(x, y) is provided in a matrix
contour(x, y, z)  z = f(x, y) is provided in a matrix
persp(x, y, z)    basic 3D plotting with shading
Hands-on tasks 16
1. Draw a random sample of size 15 from a normal distribution.
Plot a histogram and superimpose the true density.
2. Repeat the experiment 100 times and superimpose a histogram
of the means.
HLP functions: 3D plotting
Consider $X_1, \dots, X_n \overset{\text{iid}}{\sim} \mathcal{N}(\mu, \sigma^2)$.
Investigate the likelihood function $L(\mu, \sigma) = \prod_{i=1}^{n} f_X(x_i; \mu, \sigma)$.
For numerical stability, we work with the log-likelihood.
> mu <- 2
> sigma <- 2
> n <- 20
> x <- rnorm(n,mu,sigma)
> loglikelihood <- function(pars, x) {
+ return( sum( dnorm( x, pars[1], pars[2], log=T) ) )
+ }
HLP functions: 3D plotting
Evaluate the log-likelihood over a grid
> ns <- 50
> m <- seq( 1, to=4, length=ns)
> s <- seq( 1, to=5, length=ns)
> grid <- expand.grid( m, s)
> ll <- apply( grid, 1, loglikelihood, x=x) # What is ll?
> llmat <- matrix( ll, ns) # What is dim(llmat)? Why?
> image( m, s, llmat)
[Figure: image plot of llmat, with m on the x-axis and s on the y-axis]
HLP functions: 3D plotting
> ncol <- 64
> mx <- unlist( grid[ which.max( ll),])
> image( m, s, llmat, col=topo.colors(ncol),
+ xlab=expression(mu), ylab=expression(sigma))
> abline( v=mx[1], h=mx[2])
[Figure: image plot of llmat in topo colors, axes labelled µ and σ, with a vertical and a horizontal line through the grid maximum]
HLP functions: 3D plotting
> image( m, s, llmat, col=topo.colors(64),
+ xlab=expression(mu), ylab=expression(sigma))
> abline( v=mx[1], h=mx[2])
> box()
> contour( m, s, llmat, add=T)
[Figure: the same image plot with axes µ and σ, a box, and superimposed contour lines labelled from −70 to −40]
HLP functions: 3D plotting
> persp( m, s, llmat)
[Figure: perspective plot of llmat over m and s, default viewing angle]
HLP functions: 3D plotting
> persp( m, s, llmat, phi=45, theta=30)
[Figure: perspective plot of llmat over m and s, viewed with phi=45 and theta=30]
HLP functions: 3D plotting
> persp( m, s, llmat, phi=45, theta=30, axes=FALSE, box=FALSE)
HLP functions: 3D plotting
> zfacet <- llmat[-1,-1]+llmat[-1,-ns]+llmat[-ns,-1]+llmat[-ns,-ns]
> facetcol <- cut( zfacet, ncol)
> brcol <- colorRampPalette( c("white","yellow", "red") )
> persp( m, s, llmat, phi=45, theta=30, axes=FALSE, box=FALSE,
+ col=brcol( ncol)[facetcol]) -> out
HLP functions: 3D plotting
> persp( m, s, llmat, phi=45, theta=30, axes=FALSE, box=FALSE,
+ col=brcol(ncol)[facetcol], border=NA)
> points( trans3d(mx[1], mx[2], max( llmat), out), cex=4, col=4,
+ pch=4)
HLP functions: 3D plotting
Grid search delivers maximum:
> c( value=max(ll), mx)
value Var1 Var2
-39.924118 2.408163 1.816327
Numerical optimum is at:
> par <- optim( mx, function( theta) -loglikelihood( theta, x),
+ method="L-BFGS-B", lower=c( -Inf, 0))
> c( value=-par[["value"]], par[["par"]])
value Var1 Var2
-39.913951 2.381047 1.780258
> par[4] # _ALWAYS_ check!
$convergence
[1] 0
For a maximization, set control$fnscale to a negative value.
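The fnscale route avoids wrapping the function in a negation. The sketch below uses a small made-up sample instead of random draws, starts the search at an arbitrarily chosen point, and uses a small positive lower bound for σ (an assumption made here so that the density is never evaluated at σ = 0). The result is compared with the closed-form maximum likelihood estimates.

```r
x <- c(1.2, 3.4, 0.5, 2.8, 4.1, 1.9, 2.2, 3.0)   # made-up data

loglikelihood <- function(pars, x) {
  sum(dnorm(x, pars[1], pars[2], log = TRUE))
}

# A negative fnscale makes optim maximize the function directly
fit <- optim(c(2, 2), loglikelihood, x = x,
             method = "L-BFGS-B", lower = c(-Inf, 1e-6),
             control = list(fnscale = -1))

fit$par                                           # numerical maximizer (mu, sigma)
mle <- c(mean(x), sqrt(mean((x - mean(x))^2)))    # closed-form ML estimates
fit$convergence                                   # 0 indicates success
```

With fnscale = -1 the reported fit$value is the maximized log-likelihood itself, so no sign flip is needed afterwards.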
Hands-on tasks 17
Draw a random sample of size two from a normal density.
1. Plot the log-likelihood as a function of x1 and x2.
2. Plot the log-likelihood as a function of µ and σ.
LLP functions
Common low-level plotting functions:
points, lines     similar to plot
title             main/sub above/below the panel
abline            v, h, or intercept/slope
text              like points with text instead
mtext             quite flexible
legend            flexible through many parameters
axis              add additional axis (see xaxt, yaxt)
box               around the panel
arrows, segments  . . .
polygon           . . .
rect              . . .
Interactive graphics function
• locator(n=512):
  gets n coordinates of the graphics cursor when the left mouse button
  is pressed.
• identify(x, y, n=length(x)):
  after a left mouse button click, reads the position and searches for
  the closest point among x,y. Returns the indices of the points.
• Both functions quit when any other button is pressed.
• For more interaction, use the package rgl.
Graphical parameters
The function par queries and sets plotting parameters (similar to
options for “system” parameters).
> par("bty") # Frame is a rectangle
[1] "o"
> par(bty="n") # no frame/box is drawn
> par("bty")
[1] "n"
Many options are available, see for example:
> par()
?par is my most frequent help call.
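Because par changes persist for the device, it is common to save the old values and restore them afterwards. A sketch, using a null PDF device so that no file is written (old, b1 and b2 are illustrative names):

```r
pdf(NULL)                        # null PDF device: nothing is written to disk
old <- par(bty = "n", las = 1)   # par() returns the previous values
b1 <- par("bty")                 # the new setting
plot(1:10)                       # drawn without a box, with horizontal axis labels
par(old)                         # restore the saved values
b2 <- par("bty")                 # back to the default
dev.off()
```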
Graphical parameters
Further parameters:
adj           text adjustment (.5 is default, centering)
bg, fg        background and foreground (default) color
cex, cex.*    magnification of text and symbols relative to the default
col, col.*    color specification (numbers 0:7, words, rgb hex string)
las           rotation style of axis labels
lty           line type (1=solid, 2=dashed, 3=dotted, ...)
lwd           line width
mfrow, mfcol  array of subplots filled by row/column
new           if TRUE the next HLP will not clean the frame
pch           specifying the symbol used for points
pty           if s use square plotting area
xaxs, yaxs    i for precise axis bounds
xaxt, yaxt    n to suppress axis drawing
xlog, ylog    if TRUE use logarithmic scale
where “*” stands for one of: axis, lab, main, sub
Graphical parameters
mai and omi (in inches), or mar and oma (in ’lines’):
As well as mgp (defaults to c(3,1,0)) . . .
Graphical parameters: example
> sample <- rt(100, df=2)
> boxplot( sample)
[Figure: boxplot of sample with default settings]
Graphical parameters: example
> par(bty="l", col=5, col.main=2, cex=2)
> boxplot( sample, main="Boxplot")
[Figure: boxplot of sample with title “Boxplot”]
Graphical parameters: example
> par(bty="l", col.main=2, col.axis=4, cex=2, mai=c(.1,.7,.5,.1),
+ mgp=c(3,.8,0), adj=1, las=1, pch="-")
> boxplot( sample, main="Boxplot", col=5)
[Figure: boxplot of sample with title “Boxplot”, outliers drawn as “-”, horizontal axis labels]
Graphical parameters: example
> par(bty="l", col.main=2, col.axis=4, cex=2, mai=c(.1,.7,.5,.1),
+ mgp=c(3,.8,0), adj=1, las=1, pch="-")
> boxplot( sample, col=5)
> title("Boxplot", adj=.5)
[Figure: boxplot of sample with title “Boxplot”, outliers drawn as “-”, horizontal axis labels]
Hands-on tasks 18
Create the following plot. The data is available at:
www.math.uzh.ch/furrer/software/workshop/wheat.csv
[Figure: “Durum”, seasons 2008/09 to 2011/12 on the x-axis, with two y-axes: US production (mio bushel) and Price (USD per bushel)]
Hands-on tasks 19
Create the following plot.
[Figure: two panels of f(x) against x for x from −3 to 3; the first shows e^x and ln(x), the second cosh(x) and arcosh(x)]
Part 4
Linear models
• A regression example
• Objects of class formula
• lm object
• Another regression example
• Other uses of formula objects
A regression example
Suppose we have a response $y$ for a set of predictors $x_1, \dots, x_p$.
Assume a linear model
$y_i = \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i, \quad \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2), \quad i = 1, \dots, n,$
in matrix notation $y = X\beta + \varepsilon$.
Given response and predictors “solve” the regression problem:
• What are the estimates $\hat{\beta}$?
• Which predictors are significant?
• Is the model adequate?
• . . .
A regression example
Artificial data, so we know the “truth”:
> n <- 10
> x <- runif( n, -1, 2)
> beta <- c( 1, 1)
> sigma <- .5
> y <- beta[1] + beta[2]*x + rnorm( n, sd=sigma)
> plot( x, y)
[Figure: scatterplot of y against x]
A regression example
A linear model is fitted with
> lm1 <- lm( y~x)
> summary( lm1)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-1.1663 -0.3133  0.1224  0.3003  0.6425

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.9993     0.2257   4.426 0.002208 **
x             1.0673     0.2031   5.255 0.000769 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.577 on 8 degrees of freedom
Multiple R-squared: 0.7754, Adjusted R-squared: 0.7473
F-statistic: 27.62 on 1 and 8 DF, p-value: 0.000769
A regression example
> coef( lm1)
(Intercept) x
0.9992531 1.0673167
> fitted( lm1)
1 2 3 4 5 6 7
0.7820819 1.1234585 1.7661843 2.8399724 0.5777119 2.8085353 2.9567395
8 9 10
2.0477779 1.9463282 0.1297729
> resid( lm1)
1 2 3 4 5
-0.39579007 0.23662769 0.32153818 0.17254164 -0.12536026
6 7 8 9 10
0.64252432 0.07220797 -0.37600485 -1.16633597 0.61805134
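The coefficients reported by coef can be reproduced from the matrix notation y = Xβ + ε by solving the normal equations (X'X) β = X'y. The sketch below uses a small fixed data set made up for this purpose (not the random data of the slides), so the numbers differ from the output above.

```r
n <- 10
x <- seq(-1, 2, length.out = n)   # fixed design, chosen for illustration
y <- 1 + 1 * x + c(0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1)

X <- cbind(1, x)                  # design matrix with an intercept column
beta_hat <- solve(crossprod(X), crossprod(X, y))   # solves (X'X) b = X'y

beta_hat
coef(lm(y ~ x))                   # the same values from lm()
```

lm itself relies on a QR decomposition rather than the normal equations, which is numerically more stable; the explicit solve is shown only to connect the output to the formula.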
A regression example
> par( mfrow=c(2, 2))
> plot( lm1)
[Figure: the four diagnostic plots of lm1: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours); observations 6, 9 and 10 are labelled]
A regression example
> pre <- predict( lm1, newdata=data.frame(x=0))
> pre
1
0.9992531
> plot( x, y)
> points( 0, pre, col=2, cex=2)
[Figure: scatterplot of y against x with the predicted value at x = 0 added as a larger red point]
A regression example
> new <- data.frame( x = seq(-2, 3, by=0.25))
> pred.w.plim <- predict( lm1, new, interval="prediction")
> pred.w.clim <- predict( lm1, new, interval="confidence")
> plot( x, y)
> points( 0, pre, col=2, cex=2)
> matlines( new$x, cbind(pred.w.clim, pred.w.plim[,-1]), lty=1)
[Figure: scatterplot of y against x with the fitted line, confidence limits and prediction limits]
Objects of class formula
112
General structure: LHS ~ RHS
I ~ is used to define a model formula
I LHS is usually a single vector, the response
I RHS is of the form
op_1 term_1 op_2 term_2 ...
where each op_i is either + or - and each term_i is a formula expression consisting
of factors, vectors or matrices connected by formula operators.
I Examples of formula operators are in IR p52.
I I(object): the object is treated as is, which inhibits the interpretation of operators as
model operators.
I offset(object): a term in a linear model with known coefficient (=1)
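The effect of the formula operators, I() and offset() can be inspected through the design matrix. The sketch below uses a small hypothetical data frame d (an assumption for illustration) and model.matrix() to show which columns a formula generates.

```r
# Hypothetical data frame, used only to expand the formulas
d <- data.frame(y = c(1, 3, 2, 5), x = c(0, 1, 2, 3), z = c(1, 0, 1, 0))

f <- y ~ x + I(x^2) - 1             # "- 1" removes the intercept
class(f)                            # "formula"
colnames(model.matrix(f, d))        # "x" "I(x^2)"

# Without I(), ^ is a formula operator, so x^2 reduces to the term x
colnames(model.matrix(y ~ x^2, d))  # "(Intercept)" "x"

# offset(z): z enters with coefficient fixed at 1, so no coefficient is estimated for it
fit <- lm(y ~ x + offset(z), data = d)
names(coef(fit))                    # "(Intercept)" "x"
```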
lm object
113
Generic functions for lm objects:
plot
summary
residuals resid
coef
predict
add1
drop1
step
deviance
formula
anova
vcov
kappa
effects
There exist some more . . .
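To see which methods are available in a given session, methods() can be queried for the class. The sketch below uses the cars data set from the datasets package as a stand-in example (an assumption; it is not used on the slides) and checks two of the listed generics against quantities computed by hand.

```r
fit <- lm(dist ~ speed, data = cars)

# List the methods registered for class "lm" in the current session
methods(class = "lm")

# deviance() of an lm fit is the residual sum of squares
rss <- sum(resid(fit)^2)
all.equal(deviance(fit), rss)

# vcov() is the covariance matrix of the coefficient estimates;
# the square roots of its diagonal are the standard errors in summary()
se <- sqrt(diag(vcov(fit)))
all.equal(se, summary(fit)$coefficients[, "Std. Error"])
```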
Another regression example
114
> pairs( swiss, panel = panel.smooth, main = "swiss data",
+ col = 3 + (swiss$Catholic > 50), gap=0)
[Figure: scatterplot matrix titled "swiss data" of the variables Fertility, Agriculture, Examination, Education, Catholic and Infant.Mortality, with a smooth curve in each panel and points coloured according to whether Catholic exceeds 50.]
Another regression example
115
> summary( lmswiss <- lm(Fertility ~ . , data = swiss))
Call:
lm(formula = Fertility ~ ., data = swiss)
Residuals:
     Min       1Q   Median       3Q      Max
-15.2743  -5.2617   0.5032   4.1198  15.3213
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      66.91518   10.70604   6.250 1.91e-07 ***
Agriculture      -0.17211    0.07030  -2.448  0.01873 *
Examination      -0.25801    0.25388  -1.016  0.31546
Education        -0.87094    0.18303  -4.758 2.43e-05 ***
Catholic          0.10412    0.03526   2.953  0.00519 **
Infant.Mortality  1.07705    0.38172   2.822  0.00734 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.165 on 41 degrees of freedom
Multiple R-squared: 0.7067, Adjusted R-squared: 0.671
F-statistic: 19.76 on 5 and 41 DF, p-value: 5.594e-10
Another regression example
116
> drop1( lmswiss, test="F")
Single term deletions
Model:
Fertility ~ Agriculture + Examination + Education + Catholic +
Infant.Mortality
Df Sum of Sq RSS AIC F value Pr(>F)
<none> 2105.0 190.69
Agriculture 1 307.72 2412.8 195.10 5.9934 0.018727 *
Examination 1 53.03 2158.1 189.86 1.0328 0.315462
Education 1 1162.56 3267.6 209.36 22.6432 2.431e-05 ***
Catholic 1 447.71 2552.8 197.75 8.7200 0.005190 **
Infant.Mortality 1 408.75 2513.8 197.03 7.9612 0.007336 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
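For terms with one degree of freedom, the F test of drop1() is equivalent to the t test reported by summary(): the F value is the squared t value, and the p-values coincide. A short sketch that verifies this on the swiss model (the object names d1, tvals and Fvals are introduced here for illustration):

```r
lmswiss <- lm(Fertility ~ ., data = swiss)

d1 <- drop1(lmswiss, test = "F")

# t values of the five regressors (the intercept row is dropped)
tvals <- summary(lmswiss)$coefficients[-1, "t value"]

# F values of the single term deletions (the <none> row is dropped)
Fvals <- d1[-1, "F value"]

all.equal(unname(tvals^2), Fvals)   # TRUE
```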
Another regression example
117
> add1( lm( Fertility ~ 1, data=swiss), ~ Agriculture +
+ Examination + Education + Catholic + Infant.Mortality)
Single term additions
Model:
Fertility ~ 1
Df Sum of Sq RSS AIC
<none> 7178.0 238.34
Agriculture 1 894.8 6283.1 234.09
Examination 1 2994.4 4183.6 214.97
Education 1 3162.7 4015.2 213.04
Catholic 1 1543.3 5634.7 228.97
Infant.Mortality 1 1245.5 5932.4 231.39
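Repeated calls to add1() can be automated with step(), which is also listed among the generics for lm objects. The following is a sketch of a forward selection by AIC starting from the intercept-only model; the names null_fit and sel are assumptions, and the selected model is not claimed to be the one the author would choose.

```r
null_fit <- lm(Fertility ~ 1, data = swiss)

# Forward selection: at each step the term with the lowest AIC is added
sel <- step(null_fit,
            scope = ~ Agriculture + Examination + Education + Catholic +
              Infant.Mortality,
            direction = "forward", trace = 0)

formula(sel)        # formula of the selected model
names(coef(sel))    # terms in the order they were added
```

In the add1() table above, Education gives the lowest AIC (213.04), so it is the first term that forward selection adds.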
Other uses of formula objects
118
I Functions like plot or boxplot can be fed with a formula object.
I Generalized linear models, extensions of linear models:
glm( formula, family = gaussian, data, weights, subset, ...)
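Both points can be illustrated briefly. The sketch below assumes the ToothGrowth and swiss data sets from the datasets package as examples; plot = FALSE is used so that boxplot() only returns the group statistics instead of drawing.

```r
# A formula passed to boxplot(): len is split by the levels of dose
b <- boxplot(len ~ dose, data = ToothGrowth, plot = FALSE)
b$names   # group labels derived from the right-hand side of the formula
b$n       # number of observations per group

# With the default gaussian family, glm() gives the same coefficient
# estimates as lm() for the same formula
fit_lm  <- lm(Fertility ~ Education + Catholic, data = swiss)
fit_glm <- glm(Fertility ~ Education + Catholic, family = gaussian,
               data = swiss)
all.equal(coef(fit_lm), coef(fit_glm))
```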
119
Part 5
Programming tricks
120
I Search path
I Scripting
I Functions
I Writing packages
I Customize the environment
I Writing documents
Search path
121
R objects of a session are stored in environments.
The global environment is called the workspace.
> ls()
[1] "a" "beta" "brcol" "d"
[5] "facetcol" "grid" "l" "ll"
[9] "llmat" "lm1" "loglikelihood" "m"
[13] "m1" "m2" "m3" "mu"
[17] "mx" "myvar" "n" "ncol"
[21] "nrcyclones" "ns" "par" "s"
[25] "sample" "sigma" "v" "x"
[29] "y" "zfacet"
> rm( m1, m2, m3, facet, loglikelihood, nrcyclones, facetcol, grid,
+ llmat, zfacet, ncol, mx, brcol, ll, myvar, lm1, sample)
> ls()
[1] "a" "beta" "d" "l" "m" "mu" "n" "ns"
[9] "par" "s" "sigma" "v" "x" "y"
Search path
122
To list all environments or databases:
> search()
[1] ".GlobalEnv" "package:stats" "package:graphics"
[4] "package:grDevices" "package:utils" "package:datasets"
[7] "package:methods" "Autoloads" "package:base"
Variables are searched for in the databases until an appropriate match
is found.
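The lookup order explains why an object in the workspace can mask a function of the same name in a package. A small sketch (it assumes the code is run at top level, so that the assignment lands in the workspace):

```r
# find() reports where on the search path a name is defined
find("lm")                 # "package:stats"

# A workspace object with the same name as a package function masks it,
# because .GlobalEnv comes first on the search path
sd <- function(x) "my own sd"
find("sd")                 # lists every position where "sd" is found
sd(1:3)                    # the workspace version is used

# After removing it, the version in package:stats is found again
rm(sd)
sd(1:3)                    # 1
```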
Search path: data frames
123
attach allows you to put the “columns” of the argument in your
“search path”, i.e., they are directly accessible.
> X1
Error in try(X1) : object 'X1' not found
> attach( d) # reverse is done with a detach(d)
> X1
[1] 1 2
> search()
[1] ".GlobalEnv" "d" "package:stats"
[4] "package:graphics" "package:grDevices" "package:utils"
[7] "package:datasets" "package:methods" "Autoloads"
[10] "package:base"
> detach( d)
> search()[1:3]
[1] ".GlobalEnv" "package:stats" "package:graphics"
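An alternative that leaves the search path untouched is with(), which evaluates an expression with the columns of a data frame directly accessible. The sketch below uses a hypothetical data frame d with a column X1 (the second column X2 is an assumption for illustration).

```r
# Hypothetical stand-in for the data frame d of the slides
d <- data.frame(X1 = 1:2, X2 = c("a", "b"))

n_before <- length(search())

# Columns are visible only inside the call to with()
res <- with(d, X1 * 10)
res

# The search path is unchanged, so nothing needs to be detached
length(search()) == n_before
```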
Hands-on tasks 20
124
1. What does the command rm( list=ls()) do?
2. Attach d, change an entry in X1, then attach d again.
What do you notice?
Scripting
125
I Save R commands in a file.
File is executed with source( filename ),
where filename is a character string.
I Scripting is faster than line by line evaluation.
I Better programming practice compared to history re-evaluation!
I Make use of #.
I Add plenty of spaces or newlines to structure the code.
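As a minimal sketch of the workflow, the code below writes a small script to a temporary file and executes it with source(). The file content and the names script and env are assumptions for illustration; local = env is used here only so that the script's objects do not clutter the workspace.

```r
# Write a few commented R commands to a temporary script file
script <- tempfile(fileext = ".R")
writeLines(c(
  "# draw a sample and summarise it",
  "set.seed(42)",
  "z <- rnorm(100)",
  "z_mean <- mean(z)"
), script)

# Execute the script; its objects are created in env
env <- new.env()
source(script, local = env)

ls(env)                      # objects created by the script
get("z_mean", envir = env)   # value computed inside the script
```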
Scripting: flow control
126
I if-statements:
> if(condition) expr
> if(condition) cons.expr else alt.expr
I Control:
> stop('message')
> warning('message') # evaluation is continued
I Loops:
> for(var in seq) expr
> while(condition) expr
> repeat expr # needs a break
Most loops can be avoided by “vectorizing” the commands.
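The constructs above can be sketched in a few lines. The vector x and the function count_values are hypothetical examples, not part of the slides.

```r
x <- c(3, -1, 4)

# if / else: warning() signals a problem but evaluation continues,
# whereas stop() would abort the function at that point
count_values <- function(v) {
  if (any(v < 0)) {
    warning("negative values present")
  } else {
    message("all values are non-negative")
  }
  length(v)
}
n <- suppressWarnings(count_values(x))   # n is 3; the warning is silenced here

# for loop over the elements of a vector
total <- 0
for (v in x) total <- total + v

# while loop: smallest k with 2^k >= 100
k <- 0
while (2^k < 100) k <- k + 1

# repeat needs an explicit break
i <- 0
repeat {
  i <- i + 1
  if (i >= 3) break
}

# The for loop above is better written in vectorized form
identical(total, sum(x))
```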
Scripting: flow control: vectorizing
127
Instead of:
> rns <- matrix(0, 90, 100)
> sol <- numeric( 90)
> for ( i in 1:90) {
+ rns[i,] <- rnorm(100)
+ sol[i] <- mean( rns[i,])
+ }
> rns
Use:
> rns <- array( rnorm( 90*100), c(90,100))
> sol <- apply( rns, 1, mean)
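That the loop and the vectorized version compute the same row means can be verified on one common matrix. In this sketch the seed and the names sol_loop, sol_apply and sol_rowmeans are assumptions; rowMeans() is a further, specialised alternative to apply().

```r
set.seed(3)
rns <- array(rnorm(90 * 100), c(90, 100))

# Explicit loop over the rows
sol_loop <- numeric(90)
for (i in 1:90) sol_loop[i] <- mean(rns[i, ])

# Vectorized alternatives
sol_apply    <- apply(rns, 1, mean)
sol_rowmeans <- rowMeans(rns)

all.equal(sol_loop, sol_apply)
all.equal(sol_apply, sol_rowmeans)
```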
Hands-on tasks 21
128
1. Convince yourself that if ( cond ) expr and if(cond)expr
are equivalent (note the spaces).
2. Create a script executing a few commands and evaluate the script.
E.g. drawing 1000 random numbers from a gamma distribution,
plotting the histogram and indicating the mean and median with
vertical lines.
3. Implement a statement causing an error in the last call. What do
you notice?
Functions
129
I A function is defined by an assignment of the form
> functionname <- function(arg_1, arg_2, ...) expression
expression is usually a series of R expressions (evaluations) grouped
by { and }.
I The last (evaluated) expression is returned.
I It is recommended to return the result explicitly with return() or invisible().
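The difference between return() and invisible() only shows at the prompt: both hand back the value, but an invisible result is not printed automatically. A small sketch with two hypothetical functions:

```r
square_visible <- function(x) {
  return(x^2)
}

square_quiet <- function(x) {
  invisible(x^2)
}

square_visible(3)      # the result is printed at the prompt
square_quiet(3)        # nothing is printed

y <- square_quiet(3)   # the value is still returned and can be assigned
y

# withVisible() reports whether a result would be printed
withVisible(square_quiet(3))$visible
```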
Functions
130
Example:
two functions that transform Cartesian coordinates (x, y) to polar coordinates
(θ, ρ) and back:
> cart2polar <- function(x) {
+ return( cbind( atan2(x[,2], x[,1]), sqrt( x[,1]^2 + x[,2]^2)))
+ }
> polar2cart <- function(x) {
+ return( cbind( x[,2]*cos(x[,1]), x[,2]*sin(x[,1])) )
+ }
> n <- 1500
> po <- cbind( runif(n, 0, 2*pi), runif( n, 0, 1))
Functions
131
> par( pty="s")
> plot( polar2cart( po))
[Figure: scatter plot of polar2cart( po) on a square plotting region; the 1500 points lie within the unit disc, both axes run from -1.0 to 1.0 and are labelled polar2cart(po)[,1] and polar2cart(po)[,2].]
Functions
132
Some checking of the input might be useful:
> cart2polar <- function(x) {
+ if ((length(dim(x))!=2) || (dim(x)[2]!=2))
+ stop("Need a nx2 matrix/array")
+ return( cbind(atan2(x[,2],x[,1]), sqrt( x[,1]^2+x[,2]^2)))
+ }
> cart2polar(rep(1,2))
Error in cart2polar(rep(1, 2)) : Need a nx2 matrix/array
> cart2polar(cbind(1,2))
[,1] [,2]
[1,] 1.107149 2.236068
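A quick consistency check is a round trip: transforming Cartesian points to polar coordinates and back should return the original points. The sketch repeats the two function definitions so that it is self-contained; the test points xy and the seed are assumptions. tryCatch() is used to capture the error message of the input check instead of aborting.

```r
cart2polar <- function(x) {
  if ((length(dim(x)) != 2) || (dim(x)[2] != 2))
    stop("Need a nx2 matrix/array")
  return(cbind(atan2(x[, 2], x[, 1]), sqrt(x[, 1]^2 + x[, 2]^2)))
}

polar2cart <- function(x) {
  return(cbind(x[, 2] * cos(x[, 1]), x[, 2] * sin(x[, 1])))
}

# Hypothetical test points in the square [-1, 1] x [-1, 1]
set.seed(4)
xy <- cbind(runif(5, -1, 1), runif(5, -1, 1))

# Round trip: Cartesian -> polar -> Cartesian
back <- polar2cart(cart2polar(xy))
all.equal(back, xy)

# The input check raises an error for a plain vector; capture its message
msg <- tryCatch(cart2polar(rep(1, 2)),
                error = function(e) conditionMessage(e))
msg
```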
Hands-on tasks 22
133
1. Extend the function cart2polar such that an optional argument
allows scaling of the coordinates.
2. Extend the function polar2cart such that degrees as input are
possible.
Packages
134
I All R functions and datasets are stored in packages.
I Only when a package is loaded are its contents available.
This is done both for efficiency and to aid package developers,
who are protected from name clashes with other code.
I Packages come along with help files for each function and dataset!
I A few packages are standard and loaded by default:
stats, graphics, grDevices, utils, datasets, methods, base.
I There are > 3800 packages publicly available on CRAN.
Daily increasing . . .
Packages
135
I To see which packages are installed at your site, issue
> library()
I To see which packages are currently loaded, use
> search()
I To load a package, use
> library( abind)
I To remove a package from the search path, use
> detach( package:abind)
I A basic description of the package is often given by
> help( "package.name")
RStudio
Packages: namespaces
136
Packages have a NAMESPACE
:: accessing public (exported) objects
::: accessing private (non-exported) objects
Works for not-loaded packages as well!
> exists( "diag.spam")
[1] FALSE
> spam::diag.spam( 1)
[,1]
[1,] 1
Class 'spam'
> spam::.spam.addsparsefull
Error : '.spam.addsparsefull' is not an exported object from 'namespace:spam'
> # The following would work:
> # spam:::.spam.addsparsefull
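The same behaviour can be reproduced with packages that ship with R, in case spam is not installed. This sketch relies on two facts about the standard distribution: tools is installed but not attached by default, and plot.lm is a registered method in stats that is not exported.

```r
# :: reaches an exported function of a package that is not attached
tools::file_ext("report.pdf")                   # "pdf"

# plot.lm is not among the exports of stats ...
"plot.lm" %in% getNamespaceExports("stats")     # FALSE

# ... so :: fails, which is captured here as a condition object
res <- tryCatch(stats::plot.lm, error = function(e) e)
inherits(res, "error")                          # TRUE

# ::: reaches the non-exported object
is.function(stats:::plot.lm)                    # TRUE
```

In everyday code the generic plot() should be called on the lm object; ::: is mainly useful for inspecting internals.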
Packages: writing packages
137
I Disseminate R code (globally or locally)
I Thorough code and documentation checking
Documentation:
cran.r-project.org/doc/manuals/R-exts.html
cran.r-project.org/doc/contrib/Leisch-CreatingPackages.pdf
Customize the environment
138
Within RStudio, set preferences (→ Tools → Options)
Customize the environment
139
Global and local initialization files (Section 10.8 in IR).
I global: file taken from the R_PROFILE environment variable
I local: .Rprofile in any directory
Launching R executes (“sources”)
1. site profile
2. user profile (local or home)
3. .RData
4. .First()
Customize the environment
140
Example:
> .First <- function() {
+ library( spam)
+ source( "/home/furrer/R/usefulfcn.R")
+ options( width=120)
+ }
Similarly, before closing R, .Last() is executed:
> .Last <- function() {
+ cat( "Thanks for using R - good night or enjoy your coffee\n")
+ }
Customize the environment: ESS
141
ESS: Emacs Speaks Statistics
Emacs environment for R (and other statistics software)
Writing documents
142
Sweave() merges LaTeX with R code and R code output
within one document.
Structure of a LaTeX file with embedded R code:
<<tag, eval=TRUE, echo=TRUE, fig=TRUE>>=
plot( x, y, xlab='Diameter', ylab='Height')
@
This prints and evaluates the code and includes the figure.
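The whole cycle can be tried from within R. The following sketch writes a minimal Sweave file to a temporary directory and processes it; the file name example.Rnw, the chunk label and the chunk content are assumptions, and the chunk has no figure so that the example stays small. Only the .tex file is produced here; compiling it to PDF needs a LaTeX installation.

```r
# Content of a minimal Sweave file with one code chunk
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of 1, 2, 3 is computed below.",
  "<<mean-chunk, eval=TRUE, echo=TRUE>>=",
  "mean(c(1, 2, 3))",
  "@",
  "\\end{document}"
)

# Sweave() writes its output to the working directory
old_wd <- setwd(tempdir())
writeLines(rnw, "example.Rnw")
utils::Sweave("example.Rnw", quiet = TRUE)
tex <- readLines("example.tex")
setwd(old_wd)

# The chunk is replaced by its echoed input and its output
grep("Sinput|Soutput", tex, value = TRUE)
```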
Documentation:
stat.ethz.ch/R-manual/R-devel/library/utils/doc/Sweave.pdf
This presentation has been prepared with Sweave and the LaTeX package
pfuef.