R Basics Statistics Graphics Programming Nonlinear Models Examples

Statistical Computing Using R

Peter Dalgaard

Department of Biostatistics, University of Copenhagen

March 2008

Practicalities

- 4 hours, 1:00pm–5:00pm
- Breaks?
- Lectures are theory alternating with demos
- Planned for about 30 slides per hour, i.e. a reasonably relaxed pace (actually, a bit too many ...)
- No exercises; will skip installation information

Basics of R

- What is R?
- Interacting with R
- Extended user interfaces
- The R language
- Dealing with R’s workspace
- Reading data
- Data handling tasks

Key Points about R

- Environment built around the programming language R (an Open Source dialect of the S language).
- R is Free Software, and runs on a variety of platforms (I’ll be using Linux for my own convenience).
- Command-line execution based on function calls
- Extensible with user functions
- Workspace containing data and functions
- Graphics devices
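
As a minimal sketch of the points about function calls, user functions, and the workspace: the function name `sem` and the data below are made up for illustration and do not come from the slides.

```r
# Everything typed at the command line is a function call, even arithmetic.
`+`(2, 2)                          # the same as typing 2 + 2

# A user-defined function (hypothetical name) extends the system.
sem <- function(x) sd(x) / sqrt(length(x))

heights <- c(172, 180, 165, 191)   # made-up data
sem(heights)

# Both the data and the function are now objects in the workspace.
ls()
```

Once defined, `sem` is called in the same way as any built-in function.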

Interacting with R

- Command line interface (CLI)
- The basic mode of interaction is “read – evaluate – print”
  - User types an expression at the command line,
  - R evaluates it
  - ... and prints the result
- Batch variation: read commands from a file
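
The two modes can be sketched as follows. The script file is a temporary file created only for this illustration; `source()` is one way to read commands from a file inside a running session, and running the same file with `Rscript` or `R CMD BATCH` from the shell is another.

```r
# Interactive use: type an expression, R evaluates it and prints the value.
2 + 2

# Batch variation: put the commands in a file and let R read them.
script <- tempfile(fileext = ".R")   # hypothetical script file
writeLines(c("x <- c(1, 2, 3)", "mean(x)"), script)

# source() reads and evaluates the file; echo = TRUE shows each command
# together with its printed result, much like an interactive session.
source(script, echo = TRUE)
```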

Extended Interfaces

- Windows, Macintosh GUI, JGR: Fairly simple extensions of the CLI, mostly offloading some tasks to a menu interface
- Script editing: The ability to work with multiple lines of R code, save them to a file for later use, etc. A simple script editor is built into the R GUI in recent versions.
- External editor interfaces: TINN-R and R-WinEdt add syntax highlighting. Highly recommended.
- R embedded in a text editor (ESS – Emacs Speaks Statistics). Popular on Unix/Linux systems.
- Fully graphical interfaces: Rcmdr, pmg, RKWard, SciViews, ...

Demo 1

2+2
log(10)
help(log)
summary(airquality)
demo(graphics) # pretty pictures...

Basic Vector Types

- R is a vector-based language; data types include
  - Numeric (integer/double) vectors
  - Character (string) vectors
  - Logical vectors
- These types are combined and extended to form more complex objects
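
A short sketch of the basic vector types; the variable names are chosen only for illustration.

```r
num <- c(1.5, 2, 3.25)        # numeric (double)
int <- 1:3                    # numeric (integer)
chr <- c("a", "b", "c")       # character
lgl <- c(TRUE, FALSE, TRUE)   # logical

typeof(num)   # "double"
typeof(int)   # "integer"
typeof(chr)   # "character"
typeof(lgl)   # "logical"

# A single value is just a vector of length 1.
length(42)

# Mixing types in c() coerces everything to the most general type.
c(1, "a", TRUE)
```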

Basic operations

- Standard arithmetic is vectorized: x + y adds each element of x to the corresponding element of y
- Recycling: If operating on two vectors of different length, the shorter one is replicated (with a warning if the longer length is not a whole multiple of the shorter)
- c: concatenate, e.g. c(7, 9, 13)
- seq: sequences, e.g. seq(1, 9, 2); short form: 1:5 is the same as seq(1,5,1)
- rep: replication, e.g. rep(1:3, 3:1) gives 1 1 1 2 2 3
- sum, mean, range, ...
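
The operations listed above can be tried out as follows; the vectors `x` and `y` are made up for the example.

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20, 30, 40)

x + y              # element-wise: 11 22 33 44
x + c(100, 200)    # shorter vector recycled: 101 202 103 204
x * 2              # the single value 2 is recycled to length 4

c(7, 9, 13)        # concatenate values into one vector
seq(1, 9, 2)       # 1 3 5 7 9
1:5                # short form for seq(1, 5, 1)
rep(1:3, 3:1)      # 1 1 1 2 2 3

sum(x)
mean(x)
range(x)           # minimum and maximum
```

An expression such as `1:3 + 1:2` still returns a result, but with a warning, because 3 is not a multiple of 2.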

Demo 2

x <- round(rnorm(10,mean=20,sd=5)) # simulate data
x
mean(x)
m <- mean(x)
m
x - m # notice recycling
(x - m)^2
sum((x - m)^2)
sqrt(sum((x - m)^2)/9)
sd(x)

Attributes

- Attributes extend the basic vector types in various ways
- attributes(x) shows them
- Names, set with names(x) <- c("Huey","Dewey","Louie")
- Dimensions (dim())
- Dimnames
- Classes (S3, S4)
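
A minimal sketch of names, dimensions, and dimnames as attributes; the values and the row and column labels are invented for the example.

```r
x <- c(3, 1, 2)
names(x) <- c("Huey", "Dewey", "Louie")
attributes(x)          # a list with one component, $names
x["Dewey"]             # names can then be used for indexing

m <- 1:6
dim(m) <- c(2, 3)      # a dim attribute turns the vector into a 2 x 3 matrix
dimnames(m) <- list(c("r1", "r2"), c("a", "b", "c"))
attributes(m)          # now shows $dim and $dimnames
m
```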

Classed Objects

- In R, objects can have classes
- These are used as the basis for function dispatch
- I.e. the same (generic) function can have different methods for different classes
- Print methods are a prototypical example
- There are two object systems, based (roughly) on S version 3 and version 4. We shall return to this later.
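
Dispatch on a print method can be sketched with the S3 system. The class name "temperature" and its method are hypothetical and serve only to illustrate the mechanism.

```r
# A hypothetical S3 class: a number plus a class attribute.
temp <- structure(21.5, class = "temperature")

# A method for the generic print(), selected by the class of its argument.
print.temperature <- function(x, ...) {
  cat(format(unclass(x)), "degrees Celsius\n")
  invisible(x)
}

print(temp)     # dispatches to print.temperature
temp            # printing at the command line uses the same method
unclass(temp)   # with the class removed, the default method is used
```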

Factors

- Factors are used to describe groupings (the term originates from factorial designs)
- Basically, these are just integer codes plus a set of names for the levels
- They have class "factor", making them (a) print nicely and (b) maintain consistency
- A factor can also be ordered (class "ordered"), signifying that there is a natural sort order on the levels
- In model specifications, factors play a fundamental role by indicating that a variable should be treated as a classification rather than as a quantitative variable (similar to a CLASS statement in SAS)
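
The following sketch shows the integer codes and level names behind a factor, and an ordered factor. The group labels are made-up data.

```r
# Made-up treatment groups, stored as character strings.
grp <- c("placebo", "drug", "drug", "placebo", "drug")
f <- factor(grp, levels = c("placebo", "drug"))

f                # prints the labels and the set of levels
levels(f)        # "placebo" "drug"
as.integer(f)    # the underlying codes: 1 2 2 1 2
table(f)         # counts per level

# An ordered factor carries a natural ordering of its levels.
dose <- factor(c("low", "high", "medium"),
               levels = c("low", "medium", "high"), ordered = TRUE)
dose < "high"    # TRUE FALSE TRUE
```

In a model formula such as `y ~ f`, a factor is treated as a grouping variable, not as a number.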

Indexing

- R has several useful indexing mechanisms:
  - a[5] single element
  - a[5:7] several elements
  - a[-6] all except the 6th
  - a[b>200] logical index
  - a["name"] by name
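
Each indexing form can be tried on a small named vector. The values, the names, and the choice of `b` as a copy of `a` are assumptions made for the example.

```r
a <- c(first = 110, second = 250, third = 90, fourth = 310,
       fifth = 205, sixth = 180, seventh = 400)
b <- a   # hypothetical second vector of the same length

a[5]          # single element
a[5:7]        # several elements
a[-6]         # all except the 6th
a[b > 200]    # elements for which the condition is TRUE
a["third"]    # by name
```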

Lists

- A vector where the elements can have different types
- Functions often return lists
- lst <- list(A=rnorm(5), B="hello")
- Special indexing:
  - lst$A
  - lst[[1]] first element
  - lst[1] list containing the first element
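
The difference between the indexing forms is easiest to see by checking what each one returns. The call to `t.test()` is added here only as one example of a function that returns a list.

```r
lst <- list(A = rnorm(5), B = "hello")

lst$A          # the numeric vector stored under the name A
lst[["B"]]     # extraction by name with double brackets
lst[[1]]       # the first element itself (a numeric vector)
lst[1]         # a list of length 1 containing the first element

class(lst[[1]])   # "numeric"
class(lst[1])     # "list"

# Functions often return lists; t.test() is one example.
res <- t.test(lst$A)
names(res)        # the components of the result
res$p.value
```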

Matrices/Tables/Arrays

- Used in matrix calculus and as input to, e.g., chisq.test(). Results of tabulation.
- Vectors with dimensions
- Dimnames can be added for nicer printing
- Matrices: Generate with matrix
- Indexing methods include [i,j], [i,], [,j]
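
A minimal sketch of these points, assuming a made-up 2 x 2 table of counts (the dimnames and numbers are purely illustrative):

```r
# Hypothetical 2 x 2 table of counts, filled column by column
tab <- matrix(c(12, 5, 7, 9), nrow = 2,
              dimnames = list(treatment = c("active", "placebo"),
                              outcome   = c("improved", "not improved")))
tab
dim(tab)                      # a matrix is a vector with a dim attribute

tab["active", "improved"]     # [i,j]: single element
tab[1, ]                      # [i,]: first row, as a named vector
tab[, 2]                      # [,j]: second column

chisq.test(tab)               # a matrix of counts is accepted directly
```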


Data frames

- Like data sets in other statistical systems
- Technically: Lists of vectors/factors of same length
- Row names (must be unique)
- Indexed like matrices (Beware, though: Data frames are not matrices)
- Generate from read operation or with data.frame
- Many sample data frames are available using data()
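
The dual nature of a data frame (a list of columns that is indexed like a matrix) can be sketched as follows. The data and column names are invented for the example.

```r
# Hypothetical data; the column names are made up for illustration
d <- data.frame(id     = 1:4,
                sex    = factor(c("F", "M", "F", "M")),
                weight = c(61.2, 80.5, NA, 75.0))

d[2, ]                 # matrix-style indexing: second row
d[d$sex == "F", ]      # rows selected by a logical condition
d$weight               # list-style access to one column
d[["weight"]]          # the same column, extracted as from a list

is.list(d)             # TRUE: technically a list of columns
is.matrix(d)           # FALSE: columns may have different types
mean(d$weight, na.rm = TRUE)
```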


Demo 3

airquality[1:10,]
airquality$Month
airquality[airquality$Month==5,]
oz <- airquality[airquality$Month==5,]$Ozone
mean(oz)
mean(oz, na.rm=TRUE)


The workspace

- The global environment contains R objects created on the command line.
- There is an additional search path of loaded packages and attached data frames.
- When you request an object by name, R looks first in the global environment, and if it doesn’t find it there, it continues along the search path.
- The search path is maintained by library(), attach(), and detach()
- Notice that objects in the global environment may mask objects in packages and attached data frames
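
Masking can be seen in a tiny sketch using the built-in constant pi. Redefining pi is done here only to illustrate the lookup order and is not something you would normally want to do.

```r
search()                 # ".GlobalEnv" comes first, then the attached packages

pi <- 3                  # creates an object named pi in the global environment...
pi                       # ...which masks the constant pi from package base
base::pi                 # the package version is still reachable with ::

rm(pi)                   # remove the global copy
pi                       # the base version is found again
```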


Demo 4

attach(airquality)
mean(Ozone, na.rm=TRUE)
tapply(Ozone, Month, mean, na.rm=TRUE)
detach()
search()
library(ISwR)
data(intake) # From ISwR
ls()
attach(intake)
search()
ls("intake") # show variables in data frame
post - pre
rm(intake) # remove data frame
detach() # remove from search path


A Common Mistake

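attach(juul2)
sex <- factor(sex)
tapply(height, sex, mean)
detach()
attach(subset(juul2, age > 25))
sex <- factor(sex)
tapply(height, sex, mean)

You get an error saying that height and sex are of different length. What went wrong?

The first sex <- factor(sex) left a full-length copy of sex in the global environment. Second time around, that copy was found in the global environment before the sex column of the attached data frame, while height came from the shorter subset.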
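
One possible way to sidestep the problem is to avoid attach() and evaluate expressions with with(), so that no copy of sex ends up in the global environment. This is only a sketch: the data frame dd is a made-up stand-in for juul2, and the coding 1 = "M", 2 = "F" is an assumption for illustration.

```r
# Hypothetical stand-in for juul2 (the real data are in the ISwR package)
dd <- data.frame(age    = c(12, 30, 41, 15, 28, 35),
                 sex    = c(1, 2, 1, 2, 1, 2),
                 height = c(150, 168, 181, 160, 179, 165))

# Recode once, inside the data frame, so no copy lands in the global environment
dd$sex <- factor(dd$sex, levels = 1:2, labels = c("M", "F"))

# with() evaluates the expression using the columns of the given data frame
all_ages <- with(dd, tapply(height, sex, mean))
adults   <- with(subset(dd, age > 25), tapply(height, sex, mean))

all_ages
adults
```

Because both tapply() calls take height and sex from the same data frame, the lengths always agree.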


Reading Data, Overview

- Simple data vectors can be read using scan()
- Data frames can be read from most reasonably structured text file formats (space-separated columns, tab- and comma-delimited files) using read.table() or read.delim().
- The foreign package can read files from Stata, SAS export libraries, SPSS, Epi-Info, Minitab, and some S-PLUS versions.
- For spreadsheets and databases, the quick and easy way is to export to a delimited file, but you can also work via ODBC connections and database access packages
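
As a rough sketch of the first two routes, the text= argument lets scan() and the read.table() family read from inline strings instead of files. The values below are invented and stand in for, e.g., a spreadsheet exported as comma-delimited text.

```r
# scan() reads a simple vector; text= supplies the input inline instead of a file
x <- scan(text = "1.2 3.4 5.6")

# Comma-delimited data, as a spreadsheet export might look (hypothetical content)
csv_lines <- c("id,dose,response",
               "1,0.5,12.1",
               "2,1.0,15.8",
               "3,2.0,NA")
d <- read.csv(text = csv_lines)   # header=TRUE and sep="," are the defaults
str(d)
```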


The Simplest Way to Read Data

- This is what you’d normally want to do:
  - Have data in a plain text file
  - Columns separated by whitespace
  - Missing values coded as the string "NA"
  - Preferably have a row of variable names at the top
- Use d <- read.table("myfile", header=TRUE)
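
The recipe can be tried end to end without any external file. The sketch below writes a small file to a temporary location first; the contents and variable names are made up.

```r
# Write a small whitespace-separated file to a temporary location
# (the file contents and variable names are made up for illustration)
myfile <- tempfile(fileext = ".txt")
writeLines(c("subject height weight",
             "a       172    68.5",
             "b       165    NA",
             "c       180    77.0"),
           myfile)

d <- read.table(myfile, header = TRUE)
d
sapply(d, class)     # column types are guessed from the contents
unlink(myfile)       # clean up the temporary file
```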


Demo 5

fname <- "/home/pd/Rlibrary/ISwR/rawdata/thuesen.txt"
file.show(fname)
thu <- read.table(fname, header=TRUE)
thu

(You can give the literal filename to read.table, but don’t forget the quotes.)


Options and Details

- read.table has quite a few options and details
- Different codings of missing values (na.strings)
- Different decimal separators (dec argument)
- Text strings can be quoted if they contain embedded blanks
- You may skip lines, read a limited number of lines, and more. Please consult the manual page for details.
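
Several of these options can be combined in one call. The following is a sketch on invented input: two leading lines to skip, semicolon-separated fields, decimal commas, and missing values coded as "." or "-99".

```r
# Hypothetical input: two leading lines to skip, semicolon-separated fields,
# decimal commas, and missing values coded as "." or "-99"
lines <- c("Exported 2008-03-01",
           "Units: mmol/l",
           "name;glucose;velocity",
           "\"Anna Marie\";15,3;1,76",   # quoted string with an embedded blank
           "Bo;.;1,34",
           "Carl;10,8;-99",
           "Dora;8,1;1,27")

d <- read.table(text = lines,
                header = TRUE,
                sep = ";",                  # field separator
                dec = ",",                  # decimal comma
                na.strings = c(".", "-99"), # codes that mean "missing"
                skip = 2,                   # ignore the first two lines
                nrows = 3)                  # read at most three data rows
d
```

With nrows = 3 the last line of the input is never read, which is handy for peeking at the top of a large file.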


Some Basic Functions

- Constructors of simple objects
- Single-column modifications
- Modifying and subsetting data frames


Constructors

- R deals with many kinds of objects besides data sets
- Need to have ways of constructing them from the command line
- We have (briefly) seen the c and list functions
- Notice the naming forms c(boys=1.2, girls=1.1)
- Extracting and setting names with names(x)
- For matrices and arrays, use the (surprise) matrix and array functions. data.frame for data frames.
- It is also fairly common to construct a matrix from its columns using cbind


Demo 6

x <- c(boys = 1.2, girls = 1.1)
x
names(x)
names(x) <- c("M", "F")
x
matrix(1:4,ncol=2)
cbind(x=0:3,"exp(x)"=exp(0:3))


The factor Function

- This is typically used when read.table gets it wrong
  - E.g. group codes read as numeric
  - Or read as factors, but with levels in the wrong order (e.g. c("rare", "medium", "well-done") sorted alphabetically.)
- Notice the slightly confusing use of levels and labels arguments.
  - levels are the value codes on input
  - labels are the value codes on output (and become the levels of the resulting factor)
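
Both situations can be sketched on small made-up vectors. The variable names and the labels "low", "mid", "high" are assumptions for illustration only.

```r
# Hypothetical steak orders, read in as character strings
doneness <- c("medium", "rare", "well-done", "rare", "medium")

f_default <- factor(doneness)
levels(f_default)    # sorted alphabetically, so "medium" comes before "rare"

# levels= lists the input codes in the order we want
f_ordered <- factor(doneness, levels = c("rare", "medium", "well-done"))
levels(f_ordered)

# Numeric group codes: levels= are the codes found in the data,
# labels= are the names they should carry in the result
grp <- c(2, 1, 3, 1)
f_grp <- factor(grp, levels = 1:3, labels = c("low", "mid", "high"))
f_grp
levels(f_grp)        # the labels have become the levels
```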


Demo 7

aq <- airquality
aq$Month <- factor(aq$Month, levels=5:9,
                   labels=month.name[5:9])
aq$Month
levels(aq$Month) <- month.abb[5:9]
aq$Month


The cut Function

- The cut function converts a numerical variable into groups according to a set of break points
- Notice that the number of breaks is one more than the number of intervals
- Notice also that the intervals are left-open, right-closed by default (right=FALSE changes that)
- . . . and that the lowest endpoint is not included by default (set include.lowest=TRUE if it bothers you)
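
The endpoint rules are easiest to see on values that fall exactly on the break points. A small sketch with invented ages:

```r
# Hypothetical ages, including values that fall exactly on the break points
age <- c(10, 11.5, 12, 13.9, 14, 16)
breaks <- seq(10, 16, 2)            # 4 break points give 3 intervals

g_default <- cut(age, breaks)       # (10,12] (12,14] (14,16]; age 10 becomes NA
table(g_default, useNA = "ifany")

g_lowest <- cut(age, breaks, include.lowest = TRUE)   # first interval is [10,12]
table(g_lowest)

g_left <- cut(age, breaks, right = FALSE)   # [10,12) [12,14) [14,16); age 16 becomes NA
table(g_left, useNA = "ifany")
```

Checking table(..., useNA = "ifany") against length(age) is a quick way to spot observations that silently dropped out at an endpoint.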


Demo 8

age <- subset(juul, age >= 10 & age <= 16)$age
range(age)
agegr <- cut(age, seq(10,16,2), right=FALSE, include.lowest=TRUE)
length(age)
table(agegr)
agegr2 <- cut(age, seq(10,16,2), right=FALSE)
table(agegr2)


Working with Dates

- Dates are usually read as character or factor variables
- Use the as.Date function to convert them to objects of class "Date"
- If data are not in a standard format (YYYY-MM-DD) you must supply a format specification
  > as.Date("11/3-1959",format="%d/%m-%Y")
  [1] "1959-03-11"
- You can calculate differences between Date objects. The result is an object of class "difftime", with a unit of days. You need as.numeric to get the actual number.
  > as.numeric(Sys.Date()-as.Date("1959-3-11"),units="days")
  [1] 17903
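
Since Sys.Date() changes from day to day, a sketch with fixed dates may be easier to reproduce. The admission and discharge dates below are invented.

```r
# Hypothetical admission and discharge dates; the first vector is in a non-standard format
admitted   <- as.Date(c("03/01-2008", "28/02-2008"), format = "%d/%m-%Y")
discharged <- as.Date(c("2008-01-10", "2008-03-02"))  # standard format needs no format=

stay <- discharged - admitted      # an object of class "difftime"
class(stay)
units(stay)                        # "days"
as.numeric(stay, units = "days")   # plain numbers: 7 and 3 (2008 is a leap year)
```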


Sorting Things

- Sorting is not used quite as much in R as in other packages, because few procedures rely on presorted data.
- However, it is easy enough: sort(x)
- To put y in the order of x: y[order(x)]
- or to sort an entire data frame: mydata[order(sex,age),]
- Notice the semantics of the order function. It is not the same as ranking; it is “the number of the observation that should go here”.
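The difference between order and rank is easiest to see on a tiny example. The vectors and the small data frame below are made up for illustration.

```r
x <- c(30, 10, 20)
y <- c("c", "a", "b")

sort(x)              # the values in increasing order
o <- order(x)        # 2 3 1: observation 2 goes first, then 3, then 1
r <- rank(x)         # 3 1 2: x[1] is the largest, x[2] the smallest
y.sorted <- y[o]     # y rearranged into the order of x

# Sorting a data frame by sex, then by age within sex
mydata <- data.frame(sex = c("F", "M", "F"), age = c(40, 25, 31))
mydata.sorted <- mydata[order(mydata$sex, mydata$age), ]
mydata.sorted
```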

34 / 114

Modifying and Subsetting Data Frames

- The syntax for indexing data frames easily gets heavy:
    airquality[airquality$Month == 5 & airquality$Ozone > 50,]
- The subset function uses nonstandard evaluation to allow you to say subset(airquality, Month == 5 & Ozone > 50). I.e., it evaluates the second argument within the data frame.
- The transform function is similar. It allows you to define new variables or modify old ones using code like
    juulnew <- transform(juul,
        sex=factor(sex, labels=c("M","F")),
        tanner=factor(tanner))
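As a sketch, the two ways of subsetting can be compared on the built-in airquality data. One practical difference, not mentioned on the slide, is that logical indexing keeps a row of NAs wherever the condition evaluates to NA, while subset treats NA as FALSE. The names hot1, hot2 and logOzone are illustrative.

```r
# Explicit indexing: a missing Ozone value in May gives an all-NA row
hot1 <- airquality[airquality$Month == 5 & airquality$Ozone > 50, ]

# subset() evaluates the condition inside the data frame and drops NA cases
hot2 <- subset(airquality, Month == 5 & Ozone > 50)

nrow(hot1)
nrow(hot2)

# transform() modifies one column and adds another in a single call
aq <- transform(airquality, Month = factor(Month), logOzone = log(Ozone))
str(aq)
```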

35 / 114

Statistics

- Descriptive statistics
- Classical tests
- Modeling

36 / 114

Simple Descriptives

- mean, median, sd, etc.
- quantile(x,p) where p is a vector of proportions
- (actually, there are nine different types of quantiles)
- summary gives some key quantities for a variable, depending on its type. This also works on entire data frames.
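A short sketch of these functions on the built-in airquality data. The choice of type = 6 is only one example of the alternative quantile definitions; the default is type 7.

```r
x <- airquality$Ozone          # contains missing values

m  <- mean(x, na.rm = TRUE)
md <- median(x, na.rm = TRUE)
s  <- sd(x, na.rm = TRUE)

p  <- c(0.25, 0.5, 0.75)
q7 <- quantile(x, p, na.rm = TRUE)             # default definition (type 7)
q6 <- quantile(x, p, na.rm = TRUE, type = 6)   # one of the other eight types

summary(x)             # key quantities for a numeric variable
summary(airquality)    # the same, column by column, for a data frame
```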

37 / 114

Tabulation

- For simple tables of discrete variables, use the table function, as in table(sex,tanner)
- For tables of descriptives the first choice is tapply, for example tapply(age, tanner, mean, na.rm=TRUE)
- Explanation: age is split according to the groups defined by tanner, and mean is called on each piece with the extra argument na.rm=TRUE, i.e. mean(age, na.rm=TRUE) is evaluated within each group.
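The same pattern can be sketched on the built-in airquality data, used here as a stand-in because the juul data from the slide are not part of base R. The last lines check one cell of the tapply result by hand.

```r
# Cross-tabulate month against a derived yes/no variable
tab <- with(airquality, table(Month, HighOzone = Ozone > 50))
tab

# Mean ozone per month; na.rm = TRUE is passed on to mean()
m <- with(airquality, tapply(Ozone, Month, mean, na.rm = TRUE))
m

# The entry for May, computed directly
m5 <- mean(airquality$Ozone[airquality$Month == 5], na.rm = TRUE)
```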

38 / 114

Neater Tables

- Some variations over tapply are given by the by and aggregate functions
- Multiway tables are often hard to read and use for presentation purposes. Look into the ftable function (“flattened” tables) and Martyn Plummer’s stat.table function in the Epi package.
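A sketch of the base R alternatives, using the built-in airquality and Titanic data. stat.table is left out because it needs the Epi package to be installed.

```r
# aggregate() returns a data frame with one row per group
agg <- aggregate(Ozone ~ Month, data = airquality, FUN = mean)
agg

# by() applies a function to the rows of a data frame belonging to each group
b <- by(airquality[, c("Ozone", "Temp")], airquality$Month,
        colMeans, na.rm = TRUE)
b

# ftable() flattens a 4-way table into a single two-dimensional layout
ft <- ftable(Titanic, row.vars = 1:2)
ft
```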

39 / 114

Some Standard Tests

- Continuous data by group: t.test, wilcox.test, oneway.test, kruskal.test
- Categorical data: prop.test, chisq.test, fisher.test
- Correlations: cor.test, with options for nonparametrics
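The calls below sketch how these tests might be invoked on built-in data sets (sleep, airquality, mtcars). The counts passed to prop.test are made up, and exact = FALSE is set only to avoid warnings about ties.

```r
# Continuous data by group
tt <- t.test(extra ~ group, data = sleep)
wt <- wilcox.test(extra ~ group, data = sleep, exact = FALSE)
ow <- oneway.test(Ozone ~ Month, data = airquality)
kw <- kruskal.test(Ozone ~ Month, data = airquality)

# Categorical data
pt  <- prop.test(x = c(15, 25), n = c(50, 50))   # hypothetical counts
tab <- table(am = mtcars$am, vs = mtcars$vs)
ch  <- chisq.test(tab)
fi  <- fisher.test(tab)

# Correlation, nonparametric version
ct <- cor.test(~ Wind + Temp, data = airquality,
               method = "spearman", exact = FALSE)

# All of these return objects of class "htest" with a common layout
tt$p.value
ct$estimate
```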

40 / 114

Demo 9

attach(intake)
t.test(pre, post, paired=TRUE)
detach()

41 / 114

Demo 10

caesar.shoe
chisq.test(caesar.shoe)
fisher.test(caesar.shoe)
x <- caesar.shoe[1,]
n <- margin.table(caesar.shoe,2)
n
prop.trend.test(x,n)

42 / 114

Modeling Tools: Overview

- Model formulas
- Model objects and summaries
- Comparing models
- Evaluating model fit
- Generalized linear models

43 / 114

Model Formulas

- Linear model, y = Xβ + ε
- In practice something like
    y = β0 + β1 × height + β2 × 1(type=2) + β3 × 1(type=3) + ε
- Model formula:
    y = height + type
  (Interpretation depends on whether variables are categorical or continuous)
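To see how a formula turns into the design matrix X, one can call model.matrix on a small data frame. The data below are made up; with R's default treatment contrasts the factor type contributes the indicator columns 1(type=2) and 1(type=3).

```r
d <- data.frame(height = c(150, 160, 170, 180, 165, 175),
                type   = factor(c(1, 2, 3, 1, 2, 3)))

# One column per coefficient: beta0, beta1, beta2, beta3
X <- model.matrix(~ height + type, data = d)
X
colnames(X)
```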

44 / 114

Model Formulas in R

- R representation y ~ height + type where type is a factor
- Interactions a:b, a*b = a + b + a:b
- Algebra (a:(b + c) = a:b + a:c etc.)
- Notice special interpretation of operators
- Special items: offset, -1 (no intercept)
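The formula algebra can be checked without any data by asking terms() how a formula expands. The variable names a, b, c and y are placeholders that are never evaluated here.

```r
# a*b expands to main effects plus interaction
t1 <- attr(terms(y ~ a * b), "term.labels")

# The distributive rule for the interaction operator
t2 <- attr(terms(y ~ a:(b + c)), "term.labels")

# -1 removes the intercept
i0 <- attr(terms(y ~ a - 1), "intercept")

# Because operators have a special meaning inside a formula,
# ordinary arithmetic has to be protected with I()
t3 <- attr(terms(y ~ I(a + b)), "term.labels")

t1; t2; i0; t3
```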

45 / 114

Fitting Linear Models

aq <- transform(airquality, Month=factor(Month))
fit.aq <- lm(log(Ozone) ~ Solar.R + Wind +
             Temp + Month, data=aq)

- lm generates a fitted model object
- Extract information from model object
- Fit other models based on model object

46 / 114

Inspecting Model Objects

- Extract information about the fit
- summary(fit.aq)
- fitted(fit.aq), resid(fit.aq)
- anova(fit.aq, fit.aq2)
- plot(fit.aq) – diagnostics
- predict(fit.aq, newdata)
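A sketch of these extractor functions applied to the airquality model fitted earlier. The reduced model fit.aq2 and the newdata values are illustrative assumptions; note that a factor in newdata needs the same levels as in the fitted data.

```r
aq <- transform(airquality, Month = factor(Month))
fit.aq  <- lm(log(Ozone) ~ Solar.R + Wind + Temp + Month, data = aq)
fit.aq2 <- update(fit.aq, ~ . - Month)      # same model without Month

coef(summary(fit.aq))        # coefficient table from the summary
head(fitted(fit.aq))         # fitted values on the log scale
head(resid(fit.aq))          # residuals

anova(fit.aq2, fit.aq)       # F test comparing the two nested models

# Prediction for one hypothetical day in July
newdata <- data.frame(Solar.R = 200, Wind = 10, Temp = 80,
                      Month = factor("7", levels = levels(aq$Month)))
pred <- predict(fit.aq, newdata)
exp(pred)                    # back-transformed to the ozone scale
```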

47 / 114

Model Search

- anova(fit.aq): “Type I” sums of squares
- drop1(fit.aq) (“Type III”), add1
- step(fit.aq): AIC/BIC criteria
- update(fit.aq, ...) modifies a previous model
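A sketch of the model search tools on the same airquality model. Incomplete rows are removed up front, since step() requires every candidate model to use the same observations; passing k = log(n) to step() is the usual way of obtaining a BIC-type criterion. The names fit.small, fit.aic and fit.bic are illustrative.

```r
aq <- na.omit(transform(airquality, Month = factor(Month)))
fit.aq <- lm(log(Ozone) ~ Solar.R + Wind + Temp + Month, data = aq)

anova(fit.aq)                   # sequential tests, terms added in order
drop1(fit.aq, test = "F")       # each term dropped from the full model

fit.small <- update(fit.aq, ~ . - Month)
add1(fit.small, ~ . + Month, test = "F")   # test for adding Month back

fit.aic <- step(fit.aq, trace = 0)                      # AIC, k = 2
fit.bic <- step(fit.aq, k = log(nrow(aq)), trace = 0)   # BIC-type penalty
formula(fit.aic)
formula(fit.bic)
```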

48 / 114

Demo 11

aq <- transform(airquality, Month=factor(Month))
fit.aq <- lm(log(Ozone) ~ Solar.R + Wind +
             Temp + Month, data=aq)
fit.aq2 <- update(fit.aq, ~ . - Month)
summary(fit.aq)
plot(fit.aq)
drop1(fit.aq, test="F")
anova(fit.aq, fit.aq2)

49 / 114

Generalized Linear Models

- Statistical distribution (exponential) family
- Link function transforming mean to linear scale
- Deviance
- Examples: Binomial, Poisson, Gaussian (σ known, in principle)
- Canonical link functions: logit, log, identity
- Fit using glm in R
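The default link of each family object in R is the canonical one, which can be inspected directly. The Poisson regression below is fitted to simulated counts, so the data and the coefficient values 0.5 and 1.2 are purely illustrative.

```r
# Canonical links are the defaults of the family objects
binomial()$link
poisson()$link
gaussian()$link

# Simulated counts with log(mean) linear in x
set.seed(1)
x <- runif(200)
y <- rpois(200, lambda = exp(0.5 + 1.2 * x))

fit <- glm(y ~ x, family = poisson)
coef(fit)          # estimates on the linear (log) scale
deviance(fit)      # residual deviance of the fitted model
```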

50 / 114

Demo 12

no.yes <- c("No","Yes")
smoking <- gl(2, 1, 8, no.yes)
obesity <- gl(2, 2, 8, no.yes)
snoring <- gl(2, 4, 8, no.yes)
n.tot <- c(60,17,8,2,187,85,51,23)
n.hyp <- c(5,2,1,0,35,13,15,8)
data.frame(smoking,obesity,snoring,n.tot,n.hyp)
hyp.tbl <- cbind(n.hyp,n.tot-n.hyp)
glm.hyp <- glm(hyp.tbl~smoking+obesity+snoring,
               family=binomial("logit"))
summary(glm.hyp)

51 / 114

Graphics

- The standard interface
- Customizing plots
- Graphics parameters
- Math on plots
- Grid and lattice graphics

52 / 114

Standard R Graphics

- Ink on paper model; once something is drawn it cannot be erased.
- Sensible default plots
- Arguments can override defaults
- Options to turn off various elements of plots (e.g. the axes)
- Functions to add elements.

53 / 114

Types of Plotting Functions

- High level
    - Create a new page of plots with reasonable default appearance.
- Low level
    - Draw elements of a plot on an existing page:
        - Draw title, subtitle, axes, legend . . .
        - Add points, lines, text, math expressions . . .
- Interactive
    - Querying mouse position (locator), highlighting points (identify)
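A sketch of how a high-level call and several low-level calls combine in the ink-on-paper model. The plot is written to a temporary PDF file so that the example also runs non-interactively; the file name, labels and text position are illustrative choices.

```r
out <- tempfile(fileext = ".pdf")
pdf(out)

# High level: start a new plot, with the default axes turned off
plot(airquality$Wind, airquality$Ozone, axes = FALSE,
     xlab = "Wind", ylab = "Ozone")

# Low level: add elements to the existing plot
axis(1); axis(2); box()
title(main = "Ozone vs. wind")
abline(lm(Ozone ~ Wind, data = airquality), lty = 2)
text(15, 150, expression(hat(y) == hat(beta)[0] + hat(beta)[1] * x))
legend("topright", legend = "least squares", lty = 2)

# Interactive functions such as locator() and identify() only make
# sense on a screen device, so they are not called here
dev.off()
```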

54 / 114

Managing graphics devices

- You can have several graphics devices open at the same time: dev.list() shows them
- Graphics output only goes to the current device: dev.cur()
- Turn off a graphics device with dev.off()
- Print the current plot: dev.print()
- Make a copy of the current plot using another device: dev.copy(), dev.copy2eps
- However, best results are obtained by plotting directly to the target device
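A sketch of working with two devices at once. Two PDF devices writing to temporary files are used so that the example runs without a screen; dev.set(), which is not mentioned above, is the base R function for switching the current device.

```r
f1 <- tempfile(fileext = ".pdf")
f2 <- tempfile(fileext = ".pdf")

pdf(f1); d1 <- dev.cur()     # first device, now current
pdf(f2); d2 <- dev.cur()     # second device becomes current

dev.list()                   # both devices are listed

plot(1:10)                   # drawn on the current device (d2)
dev.set(d1)                  # switch back to the first device
hist(airquality$Temp)        # drawn on d1

dev.off(d1)                  # close each device to finish its file
dev.off(d2)
```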

55 / 114

The plot() Function

A generic function: does the right thing based on the class of its arguments

- Plot sequential values of a numeric variable: x <- rnorm(100); plot(x)
- Bar chart for factors: plot(cut(x,5))
- Time series plot: plot(ts(x))
- Density plot: plot(density(x))
- Two-way plots: plot(y ~ x)
  - Scatterplot if y, x are both numeric
  - Series of boxplots if x is a factor
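
To see the dispatch at work, the following sketch calls plot() on objects of different classes. The data are simulated and the names g and y are made up for the example; the off-screen device on the first line is only there so the code also runs as a script.

```r
if (!interactive()) pdf(NULL)        # off-screen device when run as a script

set.seed(1)                          # arbitrary seed, for reproducibility only
x <- rnorm(100)
g <- cut(x, 3)                       # a factor with 3 levels
y <- 2 * x + rnorm(100)              # made-up response variable

class(x)                             # "numeric"
class(g)                             # "factor"
class(density(x))                    # "density"

op <- par(mfrow = c(2, 2))           # 2 x 2 layout; old settings saved in op
plot(x)                              # numeric vector: values against index
plot(g)                              # factor: bar chart of level counts
plot(y ~ x)                          # numeric x: scatterplot
plot(y ~ g)                          # factor x: one boxplot per level
par(op)                              # restore the previous layout
```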

Further High-level Plotting Functions

- Histogram: hist(x)
- Boxplot: boxplot(x)
- Barplot: barplot(x) (x can be a matrix)
- Plot multiple variables: matplot(x) (x is a matrix)
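
A small sketch of these four functions on simulated data; the matrix m and its dimnames are invented for the example. Several of the functions also return something useful invisibly, which is captured here.

```r
if (!interactive()) pdf(NULL)        # off-screen device when run as a script

set.seed(2)                          # arbitrary seed
x <- rnorm(50)
m <- matrix(rpois(6, 10), nrow = 2,
            dimnames = list(c("a", "b"), c("g1", "g2", "g3")))

op <- par(mfrow = c(2, 2))
h  <- hist(x)                 # returns breaks, counts, etc. invisibly
b  <- boxplot(x)              # returns the statistics behind the box
bp <- barplot(m)              # matrix input: one stacked bar per column
matplot(t(m), type = "b")     # one curve per column of t(m), i.e. per row of m
par(op)
```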

Basic x-y Plots

- The plot function with one or two numeric arguments
- Scatterplot or line plot (or both) depending on the type argument: "l" for lines, "p" for points (the default), "b" for both, plus quite a few more
- Also: formula interface, plot(y~x), with arguments similar to the modeling functions like lm
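
The effect of the type argument, and of the formula interface with a data argument, might be sketched as follows. The data frame d is a made-up example.

```r
if (!interactive()) pdf(NULL)        # off-screen device when run as a script

x <- seq(0, 2 * pi, length.out = 25)
d <- data.frame(x = x, y = sin(x))   # hypothetical example data

op <- par(mfrow = c(2, 2))
plot(d$x, d$y)                       # default type = "p": points
plot(d$x, d$y, type = "l")           # lines
plot(d$x, d$y, type = "b")           # both points and lines
plot(y ~ x, data = d, type = "h")    # formula interface; "h" gives vertical bars
par(op)
```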

Customizing Plots

- Most plotting functions take optional parameters to change the appearance of the plot
- Most of these parameters can be supplied to the par() function, which changes the default behaviour of subsequent plotting functions
- Look them up via help(par)! Here are some of the more commonly used:
  - Point and line characteristics: pch, col, lty, lwd
  - Multiframe layout: mfrow, mfcol
  - Axes: xlim, ylim, xaxt, yaxt, log
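
One common pattern, shown here as a sketch rather than a rule, is to save the value returned by par() and restore it afterwards. Some settings are given to par() and others directly to the plotting call; the particular values below are arbitrary.

```r
if (!interactive()) pdf(NULL)        # off-screen device when run as a script

# par() returns the previous values of the parameters it changes
op <- par(mfrow = c(1, 2), pch = 16, lwd = 2)

x <- 1:20
plot(x, x^2, col = "blue")           # picks up pch = 16 from par()
plot(x, x^2, type = "l", lty = "dashed",
     log = "y", xlim = c(1, 25))     # per-plot arguments

par(op)                              # put the old settings back
```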

Adding to Plots

- points(), lines() add points and (poly-)lines
- text() adds text strings at given coordinates
- mtext() adds margin text. The position is given in plotting coordinates along one edge and in lines of text in the perpendicular direction.
- abline() draws a line given by coefficients (a and b) or by a fitted linear model
- axis() adds an axis to one edge of the plot region. Allows some options not otherwise available.
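
The functions above could be combined along these lines. This is only a sketch on simulated data; the labels, positions and the use of lowess() for the polyline are choices made for the example.

```r
if (!interactive()) pdf(NULL)        # off-screen device when run as a script

set.seed(3)                          # arbitrary seed
x <- runif(30, 0, 10)
y <- 1 + 0.5 * x + rnorm(30)
fit <- lm(y ~ x)

plot(x, y, xaxt = "n")                            # leave out the default x axis
abline(fit)                                       # line from a fitted linear model
abline(a = 1, b = 0.5, lty = "dashed")            # line from intercept and slope
points(mean(x), mean(y), pch = 19, col = "red")   # one extra point
lines(lowess(x, y), col = "blue")                 # polyline through smoothed values
text(2, max(y), "a note", adj = 0)                # text at plot coordinates
mtext("margin text", side = 3, line = 0.5, at = 5)
axis(1, at = c(0, 5, 10), labels = c("low", "mid", "high"))
```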

Approach to Customization

- Start with something that looks nearly right
- Modify parameters (using par() settings or plotting arguments)
- Add more graphics elements. Notice that there are graphics parameters that turn things off, e.g. plot(x, y, xaxt="n"), so that you can add completely customized axes.
- Multiframe layouts
- In emergency: overplot using par(new=TRUE)

Demo 13

par(mfrow=c(2,2))
matplot(intake)
matplot(t(intake))
matplot(t(intake),type="b")
matplot(t(intake),type="b",pch=1:11,col="black",
        lty="solid", xaxt="n")
axis(1,at=1:2,labels=names(intake))

Margins

- R sometimes seems to leave too much empty space around plots
- There is a good reason for it: You might want to put something there (titles, axes).
- This is controlled by the mar parameter. By default, it is c(5,4,4,2)+0.1
- The units are lines of text, so depend on the setting of pointsize and cex
- The mtext function is designed to write in the margins of the plot
- There is also an outer margin settable via the oma parameter. Useful for adding overall titles etc. to multiframe plots
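
As an illustration of the oma parameter, this sketch reserves two lines of outer margin at the top of a two-panel layout and writes an overall title there with mtext(). The margin sizes and titles are arbitrary choices for the example.

```r
if (!interactive()) pdf(NULL)        # off-screen device when run as a script

op <- par(mfrow = c(1, 2),
          mar = c(4, 4, 2, 1) + 0.1,   # tighter than the default c(5,4,4,2)+0.1
          oma = c(0, 0, 2, 0))         # two lines of outer margin at the top

plot(1:10, main = "left panel")
plot(10:1, main = "right panel")
mtext("Overall title", side = 3, outer = TRUE, line = 0.5, cex = 1.2)

par(op)                                # restore the previous margins
```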

Demo 14

x <- runif(50,0,2)
y <- runif(50,0,2)
plot(x, y, main="Main title", sub="subtitle",
     xlab="x-label", ylab="y-label")
text(0.6,0.6,"text at (0.6,0.6)")
abline(h=.6,v=.6)
for (side in 1:4) mtext(-1:4,side=side,at=.7,line=-1:4)
mtext(paste("side",1:4), side=1:4, line=-1,font=2)

[Figure: output of Demo 14. Scatterplot with both axes running from 0.0 to 2.0, showing the main title, subtitle, x-label and y-label, the string "text at (0.6,0.6)", margin text -1 to 4 on each side, and the bold labels "side 1" to "side 4".]

Math on Plots

- Sort of like TeX
- Works on unevaluated expressions (quote(alpha), expression(alpha))
- Special conventions: ^, [] for super/subscript, special names alpha, sum, int
- See help(plotmath) and demo(plotmath)
- Manipulating the unevaluated expressions (“Computing on the Language”) is somewhat tricky. The bquote function is handy.
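
A few of these conventions in use, as a minimal sketch. The curve, the positions of the labels and the variable n are arbitrary; the point is only how expression() and bquote() are passed where a text label is expected.

```r
if (!interactive()) pdf(NULL)        # off-screen device when run as a script

curve(x^2, from = 0, to = 2,
      xlab = expression(alpha),                  # special name: Greek letter
      ylab = expression(alpha^2),                # ^ gives a superscript
      main = expression(f(alpha) == alpha^2))    # == is typeset as an equals sign

text(0.5, 2.5, expression(x[i]))                 # [] gives a subscript

n <- 5
lab <- bquote(n == .(n))                         # .(n) is replaced by the value of n
text(0.5, 3, lab)
```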

Demo 15

y <- rnorm(25)
curve(dnorm(x, mean(y), sd(y)), from=-3, to=3)
rug(y)
abline(h=0)
title(main=bquote(paste(mu==.(mean(y)), " ",
                        sigma==.(sd(y)))))

Grid and Lattice Graphics

- Standard R graphics allow graphs to be arranged in an m × n gridded layout.
- The grid package allows arbitrary viewports and creates graph objects (“grobs”) which can be modified before they are printed.
- The lattice package uses grid for a structural approach to multiframe graphs (and more)
  - This is mostly compatible with S-PLUS’s Trellis graphics
  - Model formulas, y~x|g1*g2*...
  - Shingles: Partially overlapping intervals used for conditioning plots
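
The viewport and grob ideas can be tried directly with the grid package. The sketch below is one possible illustration: the viewport position, the colours and the label are arbitrary, and the rectangle is deliberately edited before anything is drawn.

```r
library(grid)
if (!interactive()) pdf(NULL)        # off-screen device when run as a script

grid.newpage()

# A viewport placed in the lower-left part of the page (units default to "npc")
pushViewport(viewport(x = 0.25, y = 0.25, width = 0.4, height = 0.4))

r <- rectGrob(gp = gpar(col = "black"))            # a grob: created, not drawn
r <- editGrob(r, gp = gpar(col = "red", lwd = 3))  # modified before drawing
grid.draw(r)                                       # now it appears
grid.text("inside the viewport")

popViewport()                                      # back to the whole page
```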

Demo 16

library(lattice)

trellis.par.set(theme = col.whitebg())
myplot <- xyplot(log(Ozone)~Solar.R | equal.count(Temp),
                 group=Month, data=airquality,
                 ylab=list(label=expression("log"*O[3]),cex=2),
                 xlab=list(cex=2))

myplot # Note: no plot until object is printed!

Panel Functions

- What goes inside each panel of a Lattice plot is controlled by a panel function
- There are a number of standard functions: panel.xyplot, panel.lmline, etc. (38 of them, currently)
- You can write your own panel functions, most often by combining some of the standard ones

xyplot(log(Ozone)~Solar.R | equal.count(Temp), ...
       ...
       panel=function(x,y,...){
           panel.xyplot(x,y,...)
           panel.lmline(x,y,type="l")
       })
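
For readers who want something to run, here is one complete variant of such a call. It is a sketch, not the slide's full code: the arguments elided on the slide are simply left out, and number = 4 for equal.count() is an arbitrary choice.

```r
library(lattice)
if (!interactive()) pdf(NULL)        # off-screen device when run as a script

p <- xyplot(log(Ozone) ~ Solar.R | equal.count(Temp, number = 4),
            data = airquality,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)   # the standard scatterplot panel
              panel.lmline(x, y)        # plus a least-squares line per panel
            })

print(p)   # a lattice plot is drawn only when the object is printed
```

The panel function receives the x and y values belonging to one panel at a time, so the regression line is fitted separately within each temperature interval.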

Writing R functions

- With experience, you will soon be writing your own R functions
  - Mostly, for ad-hoc tasks, collecting common functionality
  - Or, because they are required input for certain tasks (e.g., optimizers!)
- R is a full programming language with mathematical functionality
  - Flow control
  - Scoping, local variables
  - Matrix algebra
- Expressions inside functions work just like on the command line, except that the result is not printed.
- User-written functions are not substantially different from system functions, making R very smoothly extensible.
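
As an example of a function written because an optimizer needs it, the sketch below defines a negative log-likelihood for an exponential rate and hands it to optimize(). The function name negloglik, the simulated data and the search interval are all assumptions made for the illustration.

```r
# Negative log-likelihood of an exponential rate, as an ordinary R function
negloglik <- function(rate, x) {
  if (rate <= 0) return(Inf)          # flow control: reject invalid rates
  n <- length(x)                      # local variable, not visible outside
  -(n * log(rate) - rate * sum(x))    # the last value is the return value
}

set.seed(4)                           # arbitrary seed
x <- rexp(200, rate = 2)              # simulated data

# optimize() varies the first argument; x is passed on to negloglik()
fit <- optimize(negloglik, interval = c(0.01, 10), x = x)

fit$minimum                           # numerical estimate of the rate
1 / mean(x)                           # closed-form estimate, for comparison
```

The two printed values should agree closely, since 1/mean(x) is the exact maximum likelihood estimate of the rate in this model.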

71 / 114

Page 276: Statistical Computing Using R - staff.pubhealth.ku.dk

R Basics Statistics Graphics Programming Nonlinear Models Examples

Writing R functions

I With experience, you will soon be writing your own Rfunctions

I Mostly, for ad-hoc tasks, collecting common functionalityI Or, because they are required input for certain tasks (e.g.,

optimizers!)I R is a full programming language with mathematical

functionalityI Flow controlI Scoping, local variablesI Matrix algebra

I Expressions inside functions work just like on thecommand line, except that the result is not printed.

I User-written functions are not substantially different fromsystem functions, making R very smoothly extensible.


Argument Matching

- logit <- function(p) log(p/(1-p))
- logit(0.5)
- Formal arguments (p)
- Actual arguments (0.5)
- Positional matching: plot(x,y)
- Keyword matching: t.test(x ~ g, mu=2, alternative="less")
- Partial matching: t.test(x ~ g, mu=2, alt="l")
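
The three matching styles can be compared on a small user-written function. This is only a sketch: `shrink` and its arguments `center` and `divisor` are hypothetical names invented for the example.

```r
logit <- function(p) log(p / (1 - p))
logit(0.5)        # actual argument 0.5 matched to formal argument p by position
logit(p = 0.5)    # the same call with keyword matching

# Hypothetical function with two defaulted arguments.
shrink <- function(x, center = 0, divisor = 1) center + (x - center) / divisor

by_position <- shrink(10, 4, 2)                   # 4 + (10 - 4) / 2
by_keyword  <- shrink(10, divisor = 2, center = 4) # keywords may come in any order
by_partial  <- shrink(10, div = 2, cen = 4)        # unambiguous prefixes are enough
```

Partial matching is handy interactively, but full argument names are usually easier to read in code that is kept.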


Flow control

- if/else
- switch()
- for loops
- repeat, while
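
A short sketch of the less familiar constructs, `switch()`, `while` and `repeat`. The function `centre` and the variables are hypothetical, chosen only to keep the example self-contained.

```r
# switch(): choose a branch by name; the unnamed last argument is the fallback.
centre <- function(x, type = "mean") {
  switch(type,
         mean   = mean(x),
         median = median(x),
         stop("unknown type: ", type))
}
centre(c(1, 2, 10), "median")

# while: the condition is tested before each iteration.
value <- 10
halvings <- 0
while (value >= 1) {
  value <- value / 2
  halvings <- halvings + 1
}

# repeat: loops until an explicit break (here, Newton iteration for sqrt(2)).
r <- 1
repeat {
  r_new <- (r + 2 / r) / 2
  if (abs(r_new - r) < 1e-12) break
  r <- r_new
}
```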


Conditional Expressions

    if (paired)
        xok <- yok <- complete.cases(x, y)
    else {
        yok <- !is.na(y)
        xok <- !is.na(x)
    }
    twopi <- if(clockwise) -2*pi else 2*pi

- Notice that the condition is a scalar. It doesn’t vectorize; only one branch is taken. (Compare the ifelse() function)
- Conditions can be combined using && and || operators
- An if expression does have a result, which can be used as in the 2nd example above
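
To make the scalar/vector distinction concrete, here is a small sketch with made-up data (`x`, `empty` and the labels are illustrative only). It contrasts `ifelse()` with `if`, and shows `&&` skipping its right-hand side.

```r
x <- c(-2, 0, 3)

# ifelse() is vectorized: one result per element of the condition.
sign_label <- ifelse(x < 0, "neg", "non-neg")

# if takes a single TRUE/FALSE and yields a value that can be assigned.
msg <- if (length(x) > 0 && all(is.finite(x))) "ok" else "bad input"

# && evaluates its right-hand side only when needed, so the guard below
# never touches empty[1].
empty <- NULL
safe <- !is.null(empty) && empty[1] > 0
```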


The for loop

    for (i in 1:n) ....
    for (i in names) ....
    for (ns in list(...)) ...
    for (pkg in getOption("defaultPackages")) {....

- Notice that the loop is over a vector or a list
- Inside the body of a for loop, the loop variable takes on the value of each element in turn
- Even when it is a numeric sequence, the entire vector is stored in memory. Fortunately, this is rarely a problem in practice
- You can skip to the next element with next or exit the loop completely with break.
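
A sketch of a loop over a list that uses both `next` and `break`; the list `values` is invented test data.

```r
# Illustrative data: a list mixing numbers, a string and a missing value.
values <- list(3, "skip me", 5, NA_real_, 7)

total <- 0
for (v in values) {          # v takes on each list element in turn
  if (!is.numeric(v)) next   # skip non-numeric elements
  if (is.na(v)) break        # stop completely at the first missing value
  total <- total + v
}
total                        # 3 + 5; the trailing 7 is never reached

# Looping over names works the same way.
heights <- c(anna = 1.70, bo = 1.82)
for (nm in names(heights)) cat(nm, "is", heights[[nm]], "m tall\n")
```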


Not Using for Loops

- Many applications of for loops have the following structure
  - Allocate a list/vector to hold the results
  - Loop, saving the results of each iteration in turn
- (A common buglet is that people extend the vector on every iteration, which can become terribly inefficient)
- This structure can be abstracted into a single function call

      lapply(lst, fun)

  which causes fun to be applied to each element of lst, and returns a list of the results. (Further arguments can be added and will be passed on to fun)
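
The two versions can be put side by side. This is a sketch on invented data (`samples`); it shows the preallocated loop, the equivalent `lapply()` call, and an extra argument (`na.rm = TRUE`) being passed on to the applied function.

```r
samples <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, NA, 7))

# Loop version: allocate the result list first, then fill it in.
means_loop <- vector("list", length(samples))
names(means_loop) <- names(samples)
for (i in seq_along(samples)) {
  means_loop[[i]] <- mean(samples[[i]], na.rm = TRUE)
}

# lapply version: na.rm = TRUE is passed on to mean() for every element.
means_apply <- lapply(samples, mean, na.rm = TRUE)
```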


Further Apply-functions

- lapply – list-apply
- sapply – simplifying apply
- tapply – tabulating apply
- apply, sweep – along slices of tables
- replicate – repeat expression
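
One call to each of these, as a sketch on small invented inputs (`m`, `grp` and the simulated means are illustrative only):

```r
m <- matrix(1:6, nrow = 2)             # columns are (1,2), (3,4), (5,6)

col_sums <- apply(m, 2, sum)           # apply sum over columns (margin 2)
centered <- sweep(m, 2, colMeans(m))   # subtract each column's mean

lens <- sapply(list(a = 1:3, b = 1:5), length)   # simplified to a named vector

grp <- c("x", "y", "x", "y")
grp_means <- tapply(c(1, 2, 3, 4), grp, mean)    # mean within each group

set.seed(1)                                      # for reproducibility
sims <- replicate(100, mean(rnorm(10)))          # re-evaluate an expression 100 times
```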


Class-based programming

- Lots of routines with similar functionality:
  - summary
  - print
  - plot
- Code reusability and simplification using higher level abstractions
- (Compare old-style numerical libraries, with 5 types of exponential functions and who knows how many different kinds of matrix products.)
- Sometimes referred to as “object-oriented” programming, but somewhat different from C++ and Java
- Two versions currently in use, S3 and S4, based (mainly) on the corresponding S versions.
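
A sketch of what this looks like from the user's side: the single function `summary` does something different for each class of input. The data are illustrative; `cars` is a data set shipped with R.

```r
x <- c(1.5, 2.5, 4.0)
f <- factor(c("a", "b", "a"))
fit <- lm(dist ~ speed, data = cars)

num_summary <- summary(x)     # quartiles and mean for a numeric vector
fac_summary <- summary(f)     # counts per level for a factor
fit_summary <- summary(fit)   # coefficient table etc. for a linear model

class(fit)                    # the class decides which method is used
class(fit_summary)
```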


S3 classes

- In the S3 class system, a class is simply a character vector attached to an object
- The class is accessed using class(), which also has an assignment form
- The object is often a list, but not invariably
- Notice that objects can have multiple classes; use inherits to query whether an object belongs to a given class.
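
A minimal sketch of these mechanics. The class names `myfit`, `result` and `myvector` are hypothetical and have no methods attached.

```r
# A list with a two-element class vector (hypothetical class names).
obj <- list(estimate = 1.2, se = 0.3)
class(obj) <- c("myfit", "result")    # assignment form of class()

class(obj)                            # the character vector just attached
is_result <- inherits(obj, "result")  # TRUE: "result" is in the class vector
is_lm     <- inherits(obj, "lm")      # FALSE

# The underlying object need not be a list.
z <- structure(1:3, class = "myvector")
```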


Writing S3 methods

- An S3 method is basically just a function which encodes the class in its name, like t.test.formula or t.test.default
- The base function t.test is just a call to UseMethod("t.test")
- The logic of UseMethod is simply to go through the class vector and call the first matching method, or possibly a default method
- A simple inheritance mechanism is obtained by allowing methods to call NextMethod
- It is good programming practice to delegate all output to specific print methods and have functions return classed objects. Conversely, avoid calculating in print methods.
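
The pieces fit together roughly as follows. Everything here (the generic `area`, the classes `circle` and `shape`, the constructor) is hypothetical and only meant as a sketch of the mechanism.

```r
# Generic: all it does is dispatch on the class of its first argument.
area <- function(shape, ...) UseMethod("area")
area.default <- function(shape, ...) stop("don't know how to compute an area")
area.circle  <- function(shape, ...) pi * shape$r^2

# Constructor returning a classed object; no output is produced here.
circle <- function(r) structure(list(r = r), class = c("circle", "shape"))

# Output is delegated to print methods, which do no calculation.
print.shape <- function(x, ...) {
  cat("<shape of class", class(x)[1], ">\n")
  invisible(x)
}
print.circle <- function(x, ...) {
  cat("radius:", x$r, "\n")
  NextMethod()   # move on to the next class in the vector: print.shape
}

unit_area <- area(circle(1))
print(circle(2))
```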


What is wrong with S3 classes

- A large part of the code base of R (and S) is written using S3 classes
- By and large, this works well
- However, problems have shown up over time
- There is no definition of which kind of object can belong to a given class, which leaves an opportunity for inconsistency
  - E.g. a programmer may want to use the foo method for bar objects, so takes a list containing the components needed for foo and sets the class to bar.
  - What about other methods for bar objects?
  - What if the author of the bar code decides to change the implementation?
- You can only dispatch on one argument (the first). This is not sufficient for things like matrix multiplication.
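
Both problems are easy to reproduce. In this sketch the class `interval` and the generics `width` and `combine` are hypothetical; the point is only that S3 accepts a malformed object without complaint and ignores the class of the second argument.

```r
# Hypothetical class "interval", expected to have components lower and upper.
width <- function(x, ...) UseMethod("width")
width.interval <- function(x, ...) x$upper - x$lower

good <- structure(list(lower = 1, upper = 4), class = "interval")
bad  <- structure(list(lo = 1, hi = 4), class = "interval")  # accepted unchecked

w_good <- width(good)   # 3
w_bad  <- width(bad)    # zero-length result, no error: NULL - NULL

# Dispatch looks at the first argument only.
combine <- function(a, b) UseMethod("combine")
combine.interval <- function(a, b) "interval method"
combine.default  <- function(a, b) "default method"

first  <- combine(good, 1)   # "interval method"
second <- combine(1, good)   # "default method": the class of b is ignored
```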


S4 classes

- (The methods package)
- S4 formalizes the concept of a class in a way that attempts to enforce consistency
- S4 classes have slots which have specified classes. These are accessed using the @ operator, which is somewhat like $, but an attempt to change a slot to something of the wrong class gives an error
- Inheritance is handled explicitly: A class can contain other classes
- Methods can dispatch on multiple arguments (signatures)
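
The points above can be sketched as follows. The class names `Person` and `Student` and the generic `combine` are assumptions made up for this illustration, not part of any package.

```r
library(methods)

# Hypothetical classes: "Person" with two typed slots, "Student" extending it
setClass("Person", slots = c(name = "character", age = "numeric"))
setClass("Student", contains = "Person", slots = c(school = "character"))

s <- new("Student", name = "Ann", age = 30, school = "KU")
s@age                            # slot access with @
is_person <- is(s, "Person")     # inheritance was declared via 'contains'

# Assigning a value of the wrong class to a slot is rejected
bad <- try(s@age <- "thirty", silent = TRUE)
rejected <- inherits(bad, "try-error")

# Dispatch on the classes of two arguments (hypothetical generic 'combine')
setGeneric("combine", function(x, y) standardGeneric("combine"))
setMethod("combine", signature("numeric", "character"),
          function(x, y) "numeric-character")
setMethod("combine", signature("character", "numeric"),
          function(x, y) "character-numeric")
combo <- combine(1, "a")
```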

Using S4 in simple cases

- There’s some amount of “red tape” involved, but it isn’t hard to emulate S3 functionality
- Define classes using setClass
- Create an instance using new
- Create new generic functions using setGeneric (existing S3 methods are automatically converted)
- Create methods using setMethod
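
The four steps can be sketched like this. The class `Circle` and the generic `area` are hypothetical names chosen for the example; `summary` is the ordinary R function, used to show an existing function being turned into a generic.

```r
library(methods)

# 1. Define a class (hypothetical "Circle" with one numeric slot)
setClass("Circle", slots = c(r = "numeric"))

# 2. Create an instance
c1 <- new("Circle", r = 2)

# 3. Create a new generic (hypothetical name 'area')
setGeneric("area", function(shape, ...) standardGeneric("area"))

# 4. Create a method for it
setMethod("area", "Circle", function(shape, ...) pi * shape@r^2)

# An existing function can be made generic as well;
# the original function then serves as the default method
setGeneric("summary")
setMethod("summary", "Circle",
          function(object, ...) cat("Circle with radius", object@r, "\n"))

a <- area(c1)
summary(c1)                        # uses the Circle method
s_default <- summary(c(1, 2, 3))   # still handled by the original summary()
```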

Nonlinear Models

- Nonlinear regression
- Generic likelihood analysis

Nonlinear Regression – nls

- Fits nonlinear models using least squares
- Self-starting models allow automatic determination of start values for iteration
- Writing self-starting models

Simple Usage of nls

nlsout <- nls(y ~ A*exp(-alpha*t),start=list(A=2,alpha=0.05))

- Right side of model formula is arithmetic expression (no special interpretation for factors, etc.)
- Notice that this is a vectorized expression (very useful for ODE solvers)
- The start argument defines which are parameters to be estimated and starting values for the iterative algorithm
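
A self-contained sketch of such a fit. The data are simulated here as an assumption for illustration (the slides' own data are not shown), and the true values and starting values are arbitrary choices.

```r
set.seed(1)

# Simulated data (illustrative assumption): exponential decay plus noise
d <- data.frame(t = seq(0, 20, by = 2))
d$y <- 5 * exp(-0.2 * d$t) + rnorm(nrow(d), sd = 0.2)

# Names listed in 'start' are parameters; everything else (y, t) is data
fit <- nls(y ~ A * exp(-alpha * t), data = d,
           start = list(A = 3, alpha = 0.1))

est <- coef(fit)
est
```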

Summary Output

> summary(nlsout)

Formula: y ~ A * exp(-alpha * t)

Parameters:
      Estimate Std. Error t value Pr(>|t|)
A      4.75402    0.29408  16.166 5.88e-08 ***
alpha  0.18364    0.02023   9.079 7.95e-06 ***

Residual standard error: 0.3892 on 9 degrees of freedom

Correlation of Parameter Estimates:
           A
alpha 0.6724

Profiling

par(mfrow=c(2,1))
plot(profile(nlsout))

- Calculate profile t statistics, i.e. signed values of √∆SSD/SE(θ̂) for varying values of θ, maximized over other parameters and signed according to which side of θ̂ you’re on.
- Plots of |t| with indication of approximate confidence levels (.99, .95, .90, .80, .50)

Profile Plots

[Figure: two panels showing the profile statistic τ plotted against A and against alpha.]

Confidence Intervals

> confint(nlsout)
Waiting for profiling to be done...
           2.5%     97.5%
A     4.0931837 5.4381261
alpha 0.1395008 0.2342045

- The same procedure as in profile plots, but showing results numerically
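
A sketch of profiling and profile-based intervals on simulated data (the data and starting values are assumptions for illustration). The last lines compute ordinary Wald intervals from the standard errors for comparison; that comparison is an addition here, not something shown in the slides.

```r
set.seed(2)

# Simulated exponential-decay data (illustrative assumption)
d <- data.frame(t = seq(0, 20, by = 2))
d$y <- 5 * exp(-0.2 * d$t) + rnorm(nrow(d), sd = 0.2)
fit <- nls(y ~ A * exp(-alpha * t), data = d,
           start = list(A = 3, alpha = 0.1))

pr <- profile(fit)     # profile t statistics; plot(pr) would draw them
ci <- confint(fit)     # intervals obtained by profiling (95% by default)

# Wald intervals: estimate +/- t quantile times standard error
se <- summary(fit)$coefficients[, "Std. Error"]
wald <- coef(fit) + outer(se, c(-1, 1)) * qt(0.975, df.residual(fit))
```

In older R versions `confint()` for `nls` fits relies on the MASS package being installed.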

Selfstarting Models

- How to get starting values?
- Mostly an art, but can be worked out for typical situations
- Typical tricks:
  - transform to linearity
  - calculate “landmarks” as function of parameters (AUC, initial slope, position and value of maximum), estimate them empirically and solve for parameters
- Idea: Store algorithm for starting value within model object
- Standard models supplied: SSfol, etc.
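
As an illustration, `SSfol` can be used with the `Theoph` data that ship with R, along the lines of the example in its help page. This is a sketch of typical usage, not code from the slides.

```r
# Theoph ships with R (package datasets); use the first subject only
Theoph.1 <- Theoph[Theoph$Subject == 1, ]

# Starting values computed by the self-starting model itself
init <- getInitial(conc ~ SSfol(Dose, Time, lKe, lKa, lCl), data = Theoph.1)

# No 'start' argument needed: nls() obtains starting values from SSfol
fm1 <- nls(conc ~ SSfol(Dose, Time, lKe, lKa, lCl), data = Theoph.1)
coef(fm1)
```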

Direct Likelihood Approaches

- Ideas about generic likelihood software have been around for a long time, e.g. in the “Lexical scoping” paper (Gentleman and Ihaka, JCGS 2000)
- The concrete occasion was a question on R-help by Vincent Philion in July 2003

Philion’s problem (lightly edited)

Hello and thank you for your interest in this problem.
"real life data" would look like this:

x     y
0     28
0.03  21
0.1   11
0.3   15
...
100   0

Where X is dose and Y is response.
the relation is linear for log(resp) = b log(dose) + intcpt

Response for dose 0 is a "control" = Ymax. So, What I want is
the dose for 50 percent response. For instance, in example 1:

Ymax = 28 (this is also an observation with Poisson error)

So I want dose for response = 14 = approx. 0.3

Model

What Philion effectively suggested was
- $Y_i \sim \mathrm{Pois}(\lambda(x_i))$
- $\lambda(0) = y_{\max}$
- $\log \lambda(x) = \alpha + \beta \log x$ for $x > 0$
- This is standard log-linear and can be fitted with glm()
- . . . but it is a strange model!

Alternative model

- Avoid strange behaviour as $x \to 0$
- E.g.: $\lambda(x) = y_{\max}/(1 + x/k)$
- alias $1/\lambda(x) = 1/y_{\max} + \frac{1}{k\,y_{\max}} \times x$
- I.e., this is an inverse-linear Poisson model
- glm(y ~ x, poisson("inverse")) + reparameterization
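
The reparameterization step can be worked out as follows. The symbols $\beta_0$ and $\beta_1$ for the intercept and slope of the fitted inverse-link model are notation introduced here, not taken from the slides.

```latex
\begin{align*}
\frac{1}{\lambda(x)} &= \frac{1 + x/k}{y_{\max}}
  = \underbrace{\frac{1}{y_{\max}}}_{\beta_0}
  + \underbrace{\frac{1}{k\,y_{\max}}}_{\beta_1}\, x, \\
y_{\max} &= \frac{1}{\beta_0}, \qquad
k = \frac{\beta_0}{\beta_1}.
\end{align*}
```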

Problem

- $\lambda(x) = y_{\max}/(1 + x/k)$
- For large $x$ this is proportional to $x^{-1}$
- . . . but not an arbitrary inverse power law ($x^{-\beta}$) as in the original question
- Possible fix: $\lambda(x) = y_{\max}/(1 + x/k)^{\beta}$
- . . . but this is no longer a generalized linear model

This is silly!

- Why do we let the existence of standard model families interfere with a sensible choice of model?
- When all you have is a hammer. . .
- But we do have other tools!
- E.g. quite powerful optimizers like optim
- Why not just write out the likelihood and maximize it?

The mle function

- Basically just a wrapper for optim ("BFGS" method by default).
- Supply − log L and the routine does the rest (hopefully).
- Nice methods for summary display, likelihood profiling (√LRT, possibly signed), and approximate confidence intervals
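
A toy sketch of the idea, kept deliberately simpler than the dose-response problem: simulated Poisson counts with a single mean, parameterized on the log scale. The data, the name `llam` and the starting value are assumptions for illustration.

```r
library(stats4)

set.seed(42)
y <- rpois(50, lambda = 4)     # simulated counts (illustrative assumption)

# Minus log-likelihood; the mean is parameterized on the log scale
# so that the optimizer can move freely over the real line
mll <- function(llam) -sum(dpois(y, exp(llam), log = TRUE))

fit <- mle(mll, start = list(llam = 1))
lambda_hat <- exp(coef(fit)[["llam"]])   # should agree with mean(y)
```

For this model the maximum likelihood estimate is known to be the sample mean, which gives a simple check on the optimizer; `summary(fit)`, `profile(fit)` and `confint(fit)` then work as described above.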

How to use it

> library(stats4)
> mll <- function(ymax, k, la)
+     with(e1,
+          -sum(dpois(response,
+                     ymax / (1 + dose/k)^exp(la),
+                     log=TRUE)))
> fit <- mle(mll, start=list(ymax=28, k=.3, la=0))

Output

> summary(fit)
Maximum likelihood estimation

Call:
mle(minuslogl = mll, start = list(ymax = 28, k = 0.3, la = 0))

Coefficients:
       Estimate Std. Error
ymax 24.4735208  4.7498215
k     0.1884905  0.2735927
la   -0.2285072  0.4696129

-2 log L: 33.80755
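
Philion's original question was the dose giving 50 percent response. Under the model used in `mll`, solving ymax / (1 + x/k)^β = ymax / 2 with β = exp(la) gives x = k (2^(1/β) − 1). A small sketch plugging in the estimates printed above; the variable names are illustrative.

```r
# Estimates copied from the summary output above
k  <- 0.1884905
la <- -0.2285072

beta <- exp(la)                  # exponent in ymax / (1 + x/k)^beta

# Solve ymax / (1 + x/k)^beta = ymax / 2 for x
ed50 <- k * (2^(1 / beta) - 1)
ed50                             # roughly 0.26, near Philion's guess of 0.3
```

This is a point estimate only; it says nothing about the uncertainty, which is considerable given the standard errors above.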

Profiling

> par(mfrow=c(3,1))
> plot(profile(fit, del=.05))

[Figure: three panels showing the profile statistic z plotted against ymax, k and la.]

Code Examples

- Alias “the buffer zone”
- Simulation and permutation tests
- A somewhat elaborate plot
- Digging into model objects
- Time splitting

Simple simulation

summary(glm.hyp)
glm.hyp <- update(glm.hyp, ~ . - smoking)
summary(glm.hyp)
p0 <- predict(glm.hyp, type="response")
n <- rowSums(hyp.tbl)
(y <- rbinom(8, prob=p0, size=n))

glm(cbind(y,n-y)~ smoking+obesity+snoring,binomial)
sim <- replicate(1000, {
    y <- rbinom(8, prob=p0, size=n)
    coef(glm(cbind(y,n-y)~ smoking+obesity+snoring,binomial))[2]
})
summary(glm(hyp.tbl ~ smoking+obesity+snoring,binomial))
sum(abs(sim)>.06777)
qqnorm(sim)

Permutation tests

t.test(extra~group, data=sleep)
sim <- replicate(1000,
    t.test(extra~sample(group), data=sleep)$p.value)
sum(sim < 0.0794)
# Takes a bit too long on the laptop
# sim <- with(sleep,combn(20, 10, function(i)
#     t.test(extra[i], extra[-i])$p.value))
# sum(sim < 0.0794)/length(sim) # 0.07820

A Plotting Example – preliminaries

(available <- aggregate(!is.na(alkfos), list(alkfos$grp), sum))
alkfos.pctchange <- (sweep(alkfos[-1], 1, alkfos$c0, "/") - 1)*100
(means <- aggregate(alkfos.pctchange, list(alkfos$grp),
                    mean, na.rm=TRUE))
(sds <- aggregate(alkfos.pctchange, list(alkfos$grp),
                  sd, na.rm=TRUE))
available <- as.matrix(available[-(1:2)])
means <- as.matrix(means[-1])
sds <- as.matrix(sds[-1])
sems <- sds/sqrt(available)

time <- c(0,3,6,9,12,18,24)

upr <- means + sems
lwr <- means - sems

The Actual Plotting

par(mar=.1 + c(8,4,4,2))
ylim <- range(upr,lwr)
plot(time,means[1,], type="b", ylim=ylim, xaxt="n",
     ylab="alkaline phosphatase")
time2 <- time + 0.25
points(time2,means[2,], type="b")
segments(time,upr[1,],time,lwr[1,])
segments(time2,upr[2,],time2,lwr[2,])
axis(1,at=time)
mtext(available[1,],side=1, line=5, at=time)
mtext(available[2,],side=1, line=6, at=time)

All Pairwise Differences

aq <- airquality
aq$Month <- factor(aq$Month, labels=month.abb[5:9])
fit.aq <- lm(Ozone ~ Wind + Month, data=aq)
(V <- vcov(fit.aq))
where <- 3:6
(M0 <- V[where, where])
(M <- cbind(0,rbind(0,M0)))
(m <- c(0,coef(fit.aq)[where]))
(D <- outer(m, m, "-"))
DI <- diag(M)
(VarD <- outer(DI, DI, "+") - 2*M)
D/sqrt(VarD)

A more general approach

(T <- contr.treatment(month.abb[5:9]))
M
T %*% M0 %*% (t(T))
## Same thing works for *any* contrast matrix

## How to get "where" and "contr.treatment" from fit.aq?
fit.aq$contrasts[["Month"]]
## so you can find the contrast matrix using something like
T <- get(fit.aq$contrasts[["Month"]])(levels(aq$Month))

## To find which part of the vcov matrix belongs to Month
fit.aq$assign
## ...tells you that term #2 is in 3:6, so you just
## need to know that Month is term #2

attr(terms(fit.aq),"term.labels")
(pos <- match("Month", attr(terms(fit.aq),"term.labels")))
where <- fit.aq$assign==pos
vcov(fit.aq)[where, where]

Time Splitting

- Split survival data into bands according to some time scale
- Used in survival analysis and epidemiology
- Vector of (delayed-entry) survival times
- Vector of break points

“The SAS Way”

(pseudocode)

loop over individuals (implicit)
loop over intervals
{
    if overlap with survival time
    {
        trim survival time to interval
        output modified case
    }
}
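For comparison, a literal R translation of this pseudocode might look as follows. It is only a sketch on a toy data frame: the data, the column name event and the object names are hypothetical, and the censoring rule (the event is set to 0 when follow-up continues past the end of the band) is an assumption.

```r
# Toy data (hypothetical): age at entry, age at exit, event indicator
d <- data.frame(id = 1:3, agein = c(58, 62, 66), ageout = c(63, 73, 68),
                event = c(1, 0, 1))
breaks <- c(55, 60, 65, 70, 75)

rows <- list()
for (i in seq_len(nrow(d))) {                      # loop over individuals
    for (j in seq_len(length(breaks) - 1)) {       # loop over intervals
        lo <- breaks[j]
        hi <- breaks[j + 1]
        if (d$agein[i] < hi && d$ageout[i] > lo) { # overlap with survival time?
            piece <- d[i, ]
            piece$agein <- max(d$agein[i], lo)     # trim to the interval
            piece$ageout <- min(d$ageout[i], hi)
            if (d$ageout[i] > hi) piece$event <- 0 # censored at end of band
            rows[[length(rows) + 1]] <- piece      # "output modified case"
        }
    }
}
split.loop <- do.call("rbind", rows)
split.loop
```

Every row is handled one at a time, which is what makes this pattern slow in R for large data sets.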


The R Way

You might mimic the SAS strategy, but it is inefficient in R.
Here’s another idea:

loop over intervals
{
    select subjects that overlap with interval
    trim times to interval
    keep resulting vector
}
stick all vectors together

That way, we can utilize vectorization of the selection and trimming tasks.
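A sketch of this strategy on a toy data frame: the only explicit loop runs over the intervals, and selection and trimming act on whole vectors. The data, the column name event and the object names are hypothetical.

```r
# Toy data (hypothetical): age at entry, age at exit, event indicator
d <- data.frame(id = 1:3, agein = c(58, 62, 66), ageout = c(63, 73, 68),
                event = c(1, 0, 1))
breaks <- c(55, 60, 65, 70, 75)

pieces <- lapply(seq_len(length(breaks) - 1), function(j) {
    lo <- breaks[j]
    hi <- breaks[j + 1]
    entry <- pmax(d$agein, lo)            # trim all subjects at once
    exit <- pmin(d$ageout, hi)
    valid <- entry < exit                 # subjects that overlap the interval
    out <- d[valid, ]
    out$event[d$ageout[valid] > hi] <- 0  # censored at the end of the band
    out$agein <- entry[valid]
    out$ageout <- exit[valid]
    out
})
split.vec <- do.call("rbind", pieces)     # stick all pieces together
split.vec[order(split.vec$id, split.vec$agein), ]
```

The number of R-level iterations equals the number of bands, not the number of subjects times the number of bands.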


Trimming to a Single Interval

head(nickel)
entry <- pmax(nickel$agein, 60)
exit <- pmin(nickel$ageout, 65)
valid <- (entry < exit)
entry <- entry[valid]
exit <- exit[valid]
cens <- (nickel$ageout[valid] > 65)
nickel60 <- nickel[valid,]
nickel60$icd[cens] <- 0
nickel60$agein <- entry
nickel60$ageout <- exit
head(nickel60)


Trimming as Function

trim <- function(start)
{
    end <- start + 5
    entry <- pmax(nickel$agein, start)
    exit <- pmin(nickel$ageout, end)
    valid <- (entry < exit)
    cens <- (nickel$ageout[valid] > end)
    result <- nickel[valid,]
    result$icd[cens] <- 0
    result$agein <- entry[valid]
    result$ageout <- exit[valid]
    result
}


Processing All Time Bands

head(trim(60))
nickel.expand <- do.call("rbind", lapply(seq(20,95,5), trim))
head(nickel.expand)
subset(nickel.expand, id==4)
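The do.call("rbind", lapply(...)) idiom used above may deserve a short illustration: lapply() returns a list of data frames, and do.call() passes the list elements as separate arguments to rbind(). The toy list below is hypothetical and stands in for the list of trimmed data frames.

```r
# lapply() yields a list with one small data frame per band
pieces <- lapply(1:3, function(i) data.frame(band = i, value = i^2))

# do.call() calls rbind(pieces[[1]], pieces[[2]], pieces[[3]])
stacked <- do.call("rbind", pieces)
stacked

# The explicit call gives the same result, but needs the list length in advance
identical(stacked, rbind(pieces[[1]], pieces[[2]], pieces[[3]]))
```

With do.call() the number of bands does not have to be known when the code is written.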
