stat 437 lecture notes 1

Post on 21-Dec-2021


Stat 437 Lecture Notes 1

Xiongzhi Chen

Washington State University

Contents

Set up RStudio
• Install R and RStudio
• RStudio: a snapshot
• RStudio

Objects in R: I
• Scalars in R
• Vectors in R: I
• Vectors in R: II
• The seq command
• Matrices in R: I
• Matrices in R: II
• Matrices in R: III
• Matrices in R: IV
• Data frames in R: I
• Data frames in R: II
• Data frames in R: III
• Data frames in R: IV

Objects in R: II
• Character vectors in R
• Strings in R
• Factors in R: I
• Factors in R: II
• Logic operators in R: I
• Logic operators in R: II
• Logic operators in R: III
• Logic operators in R: IV
• Lists in R: I
• Lists in R: II
• Lists in R: III
• Set operations in R: I
• Set operations in R: II
• “Coerce” in R
• length and dim

R markdown
• Install R markdown
• Create an R markdown file
• Structure of a markdown file
• A sample markdown file
• Basic syntax: I
• Basic syntax: II
• Basic syntax: III
• Basic syntax: IV
• Latex in markdown

Data visualization
• Why data visualization?
• R packages for visualization
• Basic principles for plotting

Scatter plot, density plot, boxplot, bar plot
• Scatter plot matrix
• Density plot
• Boxplot
• Scatter plot matrix: ggpairs
• Bar plot

Visualization with factors
• Look into iris data set
• Faceting with 1 factor
• Non-faceting with 1 factor
• Faceting with 2 factors
• Visualization with 3 factors

Mathematical expressions
• Math expressions in R
• Subset of diamonds data
• Base layer
• Math symbols in axis titles
• Math symbols in legend title
• Subset of diamonds data
• Base layer
• Math symbols in legend labels
• Math symbols in strip names
• Math symbols in plot

Other ggplot2 tweaks
• Not covered
• License and session Information

Set up RStudio

Install R and RStudio

• RStudio (free version) at: https://www.rstudio.com/products/rstudio/download/
• R at: https://www.r-project.org/
• Install an R package with install.packages("package_name")
• Install the R packages “tidyverse”, “ggplot2”, “markdown”, “igraph”, “plotly”, “ggmap”
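As a rough sketch of the installation step, the snippet below works out which of the packages listed above are not yet in the local library. The variable names (pkgs, installed, missing_pkgs) are illustrative and not part of the notes, and the install call itself is left commented out because it needs internet access.

```r
# Packages named in the notes
pkgs <- c("tidyverse", "ggplot2", "markdown", "igraph", "plotly", "ggmap")

# Names of packages already present in the local library
installed <- rownames(installed.packages())

# Packages that still need to be installed
missing_pkgs <- setdiff(pkgs, installed)
missing_pkgs

# Uncomment to install; this needs internet access and a CRAN mirror
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

Checking first avoids reinstalling packages that are already available.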

RStudio: a snapshot

RStudio

• Upper Left panel: R scripts, R markdown file, R project file, View data, etc

• Lower Left panel: R console, R markdown log, etc

• Upper Right panel: R workspace, History, etc

• Lower Right panel: Files in working directory, Plots, Help, etc

Objects in R: I

Scalars in R

> x = 3 # assign value 3 to variable x
> y = 2
> x+y # addition
[1] 5
> x*y # multiplication
[1] 6
> x/y # division
[1] 1.5
> x%%y # modulo
[1] 1
> x^y # exponentiation
[1] 9
> x/0
[1] Inf
> 0/0 # undefined
[1] NaN

Vectors in R: I

> z = c(1,2,3) # a vector of 3 components
> v = c(5,6,7)
> z+v # vector addition
[1]  6  8 10
> z*v # paired componentwise product
[1]  5 12 21
> z/v # paired componentwise division
[1] 0.2000000 0.3333333 0.4285714
> z%*%v # inner product
     [,1]
[1,]   38
> 2*z # scalar-vector multiplication
[1] 2 4 6

Vectors in R: II

> z = c(1,2,3)
> v = c(5,6,7)
> z[1] # access the 1st component of z
[1] 1
> t(v) # transpose of vector
     [,1] [,2] [,3]
[1,]    5    6    7
> z%*%t(v) # outer product
     [,1] [,2] [,3]
[1,]    5    6    7
[2,]   10   12   14
[3,]   15   18   21
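The inner and outer products above can also be computed without matrix multiplication. The following is a minimal sketch using sum() and outer(); the names inner and outer_zv are illustrative and do not appear in the notes.

```r
z <- c(1, 2, 3)
v <- c(5, 6, 7)

# Inner product as a plain number rather than a 1-by-1 matrix
inner <- sum(z * v)

# Outer product via outer(); z %o% v is shorthand for the same operation
outer_zv <- outer(z, v)

# Both agree with the matrix-product forms
all.equal(inner, drop(z %*% v))
all.equal(outer_zv, z %*% t(v))
```

Using sum(z * v) is convenient when the result is needed as a scalar in later arithmetic.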


The seq command

Signature (default method, as given in the R help page):
seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)),
    length.out = NULL, along.with = NULL, ...)

Usage:
> seq(0,1,by=0.1)
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
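The other arguments in the signature can be illustrated the same way. This is a small sketch, with illustrative variable names, of length.out together with the related helpers seq_along() and seq_len() from base R.

```r
# Five equally spaced points from 0 to 1; the step size is computed for you
grid <- seq(0, 1, length.out = 5)
grid

# Indices 1, 2, ..., n for an existing vector
w <- c("a", "b", "c")
idx <- seq_along(w)
idx

# seq_len(n) returns an empty vector when n is 0, whereas 1:0 gives c(1, 0)
empty <- seq_len(0)
length(empty)
```

seq_len() and seq_along() are usually safer than 1:n inside loops, because they handle the empty case correctly.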

Matrices in R: I

> matrix(1:6,nrow=2,ncol=3) # a 2-by-3 matrix
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> x = c(1,3,5) # a 3-component vector
> y = c(2,4,6) # a 3-component vector
> # stack x and y as 2 rows to obtain a 2-by-3 matrix
> rbind(x,y)
  [,1] [,2] [,3]
x    1    3    5
y    2    4    6
> # stack x and y as 2 columns to obtain a 3-by-2 matrix
> cbind(x,y)
     x y
[1,] 1 2
[2,] 3 4
[3,] 5 6

Matrices in R: II

> x=matrix(1:6,nrow=2,ncol=3) # a 2-by-3 matrix
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> x[,1] # 1st column of x
[1] 1 2
> x[2,] # 2nd row of x
[1] 2 4 6
> x[1,2] # (1,2)-entry of x
[1] 3
> t(x) # transpose of x
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6

Matrices in R: III

> x=matrix(1:6,nrow=2,ncol=3) # a 2-by-3 matrix
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> y = rbind(c(0,1,0),c(1,1,1))
> y
     [,1] [,2] [,3]
[1,]    0    1    0
[2,]    1    1    1
> x %*%t(y) # matrix product of x (2-by-3) and t(y) (3-by-2)
     [,1] [,2]
[1,]    3    9
[2,]    4   12

Matrices in R: IV

> x=matrix(1:6,nrow=2,ncol=3) # a 2-by-3 matrix
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> y = rbind(c(0,1,0),c(1,1,1))
> y
     [,1] [,2] [,3]
[1,]    0    1    0
[2,]    1    1    1
> x + y # matrix addition
     [,1] [,2] [,3]
[1,]    1    4    5
[2,]    3    5    7
> 2*x # scalar multiplication
     [,1] [,2] [,3]
[1,]    2    6   10
[2,]    4    8   12

Data frames in R: I

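> x <- data.frame("SN" = 1:2, "Age" = c(21,15),
+                 "Name" = c("John","Dora"))
> x
  SN Age Name
1  1  21 John
2  2  15 Dora
> x$SN # access SN
[1] 1 2
> x[,1] # access SN
[1] 1 2
> class(x$SN) # check object type for SN
[1] "integer"
> class(x$Name) # check object type for Name
[1] "factor"

Note: this output is from R 3.5.0. Since R 4.0.0, data.frame() no longer converts strings to factors by default, so class(x$Name) returns "character" unless stringsAsFactors = TRUE is set.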

Data frames in R: II

> x <- data.frame("SN" = 1:2, "Age" = c(21,15),
+                 "Name" = c("John","Dora"))
> x
  SN Age Name
1  1  21 John
2  2  15 Dora
> x$SN[2] # access the 2nd entry of SN
[1] 2
> x[1,2] # access the 1st entry of Age
[1] 21

Caution: do not transpose a data.frame when it contains different types of objects
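A short sketch of why the caution matters, using the same small data frame as above (the name tx is illustrative): transposing forces every entry into a single type.

```r
x <- data.frame(SN = 1:2, Age = c(21, 15), Name = c("John", "Dora"))

# t() first converts the data frame to a matrix; a matrix holds one type,
# so the numeric columns are turned into character strings
tx <- t(x)

is.data.frame(tx)  # FALSE: the result is a matrix
is.character(tx)   # TRUE: the numbers are now strings
dim(tx)            # 3 rows (the former columns) and 2 columns
```

After the transpose, arithmetic on the former SN and Age values no longer works without converting them back to numbers.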

Data frames in R: III

Import (malaria related death) data as data.frame:
> Y = read.csv("dataMalyria.csv",header = TRUE,sep=",",
+              colClasses=c("country"=NA,"percent"="numeric",
+                           "labels"=NA))
> head(Y)
     country percent labels
1    Lesotho       0    <1%
2  Mauritius       0    <1%
3 Seychelles       0    <1%
4 Cabo Verde       0    <1%
5    Algeria       0    <1%
6      Egypt       0    <1%

Data frames in R: IV

Import (malaria related death) data as data.frame:
> str(Y) # object structure of Y
'data.frame': 53 obs. of 3 variables:
 $ country: Factor w/ 53 levels "Algeria","Angola",..: 25 32 41 7 1 15 27 33 50 47 ...
 $ percent: num 0 0 0 0 0 0 0 0 0 0 ...
 $ labels : Factor w/ 5 levels " <1% "," 1-4% ",..: 1 1 1 1 1 1 1 1 1 1 ...

> dim(Y) # dimension of Y
[1] 53  3
> Y$id = 1:53 # append a column to Y
> Y[1:3,] # display the first 3 rows of Y
     country percent labels id
1    Lesotho       0    <1%  1
2  Mauritius       0    <1%  2
3 Seychelles       0    <1%  3
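The id column above is built from the hard-coded row count 53. A minimal sketch of the same step that reads the row count from the data instead is given below; since dataMalyria.csv is not available here, a three-row stand-in data frame is assumed.

```r
# Stand-in for the imported data (first three rows shown in the notes)
Y <- data.frame(country = c("Lesotho", "Mauritius", "Seychelles"),
                percent = c(0, 0, 0))

# Row count taken from the data, so the code still works if rows change
Y$id <- seq_len(nrow(Y))

dim(Y)  # rows and columns after adding id
Y
```

With the real file, the same line Y$id <- seq_len(nrow(Y)) would produce the ids 1 to 53.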

Objects in R: II

Character vectors in R

> w = c("a","b","c") # a vector of 3 character components
> w[2] # access the 2nd component
[1] "b"
> # 1st 10 upper case letters in the alphabet
> LETTERS[seq( from = 1, to = 10 )]
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"

> # 1st 10 lower case letters in the alphabet
> letters[seq( from = 1, to = 10 )]
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

> Q = c("Go","WSU","Cougs","!")
> Q
[1] "Go"    "WSU"   "Cougs" "!"
> # concatenate two character vectors
> c(w,Q)
[1] "a"     "b"     "c"     "Go"    "WSU"   "Cougs" "!"

Strings in R

> w = "Go cougs!"
> w
[1] "Go cougs!"
>
> v = "Data analytics"
> v
[1] "Data analytics"
>
> # concatenate two strings
> paste(w,v,sep = " ")
[1] "Go cougs! Data analytics"

Factors in R: I

> grades = c("A","F","D","C","B") # character vector
> grades
[1] "A" "F" "D" "C" "B"
> class(grades)
[1] "character"
> gradesF = factor(grades) # gradesF is now a factor
> gradesF
[1] A F D C B
Levels: A B C D F
> class(gradesF)
[1] "factor"
> # levels of the factor "gradesF"
> levels(gradesF)
[1] "A" "B" "C" "D" "F"
> # levels are ordered alphabetically

Factors in R: II

> x = c(1,3,2) # numeric vector
> b = factor(x) # change x into a factor
> b
[1] 1 3 2
Levels: 1 2 3
> levels(b) # levels are ordered from smallest to largest
[1] "1" "2" "3"
> # relabel levels of b
> d = factor(x,labels = c("3Level","1Level","2Level"))
> d
[1] 3Level 2Level 1Level
Levels: 3Level 1Level 2Level
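By default the levels are sorted, which is not always the order that is wanted. The sketch below, which reuses the grades vector from the previous slide, shows one way to set the level order explicitly; the levels and ordered arguments of factor() are not used in the notes, and the name gradesO is illustrative.

```r
grades <- c("A", "F", "D", "C", "B")

# Explicit level order from worst to best; ordered = TRUE enables comparisons
gradesO <- factor(grades, levels = c("F", "D", "C", "B", "A"), ordered = TRUE)

levels(gradesO)   # levels in the order given, not alphabetical
gradesO >= "C"    # TRUE for A, C and B; FALSE for F and D
```

An explicit level order also controls the order in which categories appear on plot axes and in legends.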

Logic operators in R: I

> x = 0 # assign 0 to x
> x >0
[1] FALSE
> x == 0
[1] TRUE
> !x # return TRUE
[1] TRUE
> y = 1
> y >= 1
[1] TRUE
> !y # return FALSE
[1] FALSE
> x & y # "and"; return FALSE
[1] FALSE
> x | y # "or"; return TRUE
[1] TRUE

Logic operators in R: II

> x = 1
> y = -1
> x >0 & y > 0 # "and"
[1] FALSE
> x > 0 | y > 0 # "or"
[1] TRUE
> x >0 & !(y>0)
[1] TRUE

Logic operators in R: III

> x = c(1,2,3) # a 3-component vector
> x >0 # returns a 3-component logic vector
[1] TRUE TRUE TRUE
> x > 2 # returns a 3-component logic vector
[1] FALSE FALSE  TRUE
> # return indices of entries of x that are greater than 2
> which(x>2)
[1] 3
> # take the subvector of x whose entries are not smaller than 2
> x[x >=2]
[1] 2 3

Logic operators in R: IV

> x = c(1,2,3) # a 3-component vector
> y = c(-1,4,-1) # a 3-component vector
> # compare x and y entrywise; return a 3-component vector
> x > y
[1]  TRUE FALSE  TRUE
> x == y
[1] FALSE FALSE FALSE
> x >= y
[1]  TRUE FALSE  TRUE
> any(x>y)
[1] TRUE
> all(x>y)
[1] FALSE
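Entrywise comparisons can be combined and then used for subsetting or counting. The following is a small sketch with the same x and y; the names keep and n_greater are illustrative, and xor() is a base R function not used in the notes.

```r
x <- c(1, 2, 3)
y <- c(-1, 4, -1)

# Entries of x where x exceeds y and x is at least 2
keep <- x[x > y & x >= 2]
keep

# Number of positions where x exceeds y: TRUE counts as 1, FALSE as 0
n_greater <- sum(x > y)
n_greater

# xor() is TRUE where exactly one of the two conditions holds
xor(x > 2, y > 0)
```

Here only the third entry satisfies both conditions, so keep contains the single value 3.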

Lists in R: I

> x = vector("list",3) # a list with 3 components
> # assign a vector to its 1st component
> x[[1]] = c(1,2,3)
> # assign a string to its 2nd component
> x[[2]] = "Second part of x"
> # assign a matrix to its 3rd component
> x[[3]] = matrix(1:6,nrow=3)
> x
[[1]]
[1] 1 2 3

[[2]]
[1] "Second part of x"

[[3]]
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

Lists in R: II

> x = vector("list",3) # a list with 3 components
> x[[1]] = c(1,2,3)
> x[[2]] = "Second part of x"
> x[[3]] = matrix(1:6,nrow=3)
> x[[2]] # show 2nd component of x
[1] "Second part of x"

Lists in R: III

> a = c(1,2,3)
> b = "Second part of x"
> c = matrix(1:6,nrow=3)
> y = list("vector" = a, "string" = b, "matrix" = c)
> y
$vector
[1] 1 2 3

$string
[1] "Second part of x"

$matrix
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
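For a named list such as y, components can be reached by name as well as by position. This sketch rebuilds the same list and contrasts double brackets, the dollar sign and single brackets; the name sub is illustrative.

```r
y <- list(vector = c(1, 2, 3),
          string = "Second part of x",
          matrix = matrix(1:6, nrow = 3))

# [[ ]] and $ extract the component itself
y[["vector"]]   # a numeric vector
y$string        # a character string

# [ ] keeps the list wrapper and returns a sub-list
sub <- y["vector"]
class(sub)

# Component names
names(y)
```

The distinction matters in practice: y[["vector"]] + 1 works, while y["vector"] + 1 fails because a list is not numeric.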

Set operations in R: I

> x = c(1,2,3) # a 3-component vector
> 1 %in% x # check membership
[1] TRUE
> c(2,3) %in% x
[1] TRUE TRUE
> y = c("stat","115","lecture")
> "stat" %in% y
[1] TRUE
> "time" %in% y
[1] FALSE

Set operations in R: II

> x = c(1,2,3) # a 3-component vector
> y = c(-1,4,-1) # a 3-component vector
> union(x, y)
[1]  1  2  3 -1  4
> intersect(x, y)
numeric(0)
> setdiff(x, y)
[1] 1 2 3

“Coerce” in R

• as.numeric coerces an object to be numeric
• as.factor coerces an object to be a factor
• as.matrix . . .
• as.logical . . .
• as.data.frame . . .
• and so on . . .
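A few of these coercions are sketched below with made-up inputs. The last part shows a common pitfall: applying as.numeric directly to a factor returns the internal level codes rather than the values the labels spell out. The names f, codes and values are illustrative.

```r
# Strings that look like numbers become numbers
as.numeric(c("1.5", "2"))

# Zero becomes FALSE, any other number becomes TRUE
as.logical(c(0, 1, 2))

# Caution: as.numeric() on a factor returns the internal level codes
f <- factor(c("10", "20", "10"))
codes <- as.numeric(f)                  # 1 2 1
values <- as.numeric(as.character(f))   # 10 20 10
codes
values
```

Going through as.character first is the usual way to recover the original numbers from a factor.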

length and dim

• length returns the number of components of a vector
> a = 1:10
> length(a)
[1] 10

• dim returns the dimension of a matrix or data frame
> x=dim(matrix(1:6,nrow=3,ncol=2))
> x
[1] 3 2
> x[1]
[1] 3
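The two functions behave differently across object types, which the following sketch illustrates with a small matrix m and data frame df (both names are illustrative); nrow() and ncol() are base R shortcuts not used in the notes.

```r
m <- matrix(1:6, nrow = 3, ncol = 2)
df <- data.frame(SN = 1:2, Age = c(21, 15))

nrow(m)      # same as dim(m)[1]
ncol(m)      # same as dim(m)[2]
length(m)    # number of entries in the matrix
length(df)   # number of columns of a data frame
dim(1:10)    # NULL: a plain vector has no dim attribute
```

Note that length applied to a data frame counts columns, not rows; use nrow for the number of observations.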


R markdown

Install R markdown

> install.packages("markdown")
> install.packages("knitr")

In RStudio, follow “Tools > Global Options > Sweave”, and set “Weave Rnw files using” to “knitr”

More details and video tutorial at: Course website

Create an R markdown file

In RStudio, follow “File > New File > R markdown . . . ”

More details and video tutorial at: Course website

Structure of a markdown file

• Header (that typesets the output document)
• Main body (that contains the contents)

  – R chunk (that contains R code)
  – Text chunk (that contains non-code text or latex commands)

More details and video tutorial at: Course website

A sample markdown file

Basic syntax: I

Online tutorial: https://rmarkdown.rstudio.com/authoring_basics.html

Online tutorial: https://bookdown.org/yihui/rmarkdown/r-code.html

Basic syntax: II

Some things to go over carefully:

• Adjust figure size in the output document when the figure is generated by an R chunk

• Enable current R chunk to use results produced by previous R chunks

• Basic latex commands

Basic syntax: III

To adjust figure size when the figure is generated by an R chunk:

• use fig.width and fig.height to set graphical device size as in

{r eval=TRUE,fig.width = 3,fig.height=4}

• use out.width and out.height to set output size as in

{r eval=TRUE,out.width = 5,out.height=6}

More details at: https://bookdown.org/yihui/rmarkdown/r-code.html


Basic syntax: IV

To enable the current R chunk to use results produced by previous R chunks:

• name a chunk as “chunk1” and cache results as in

{r chunk1,eval=TRUE,cache=TRUE}

• use dependson= to refer to “chunk1” as in

{r chunk2,dependson="chunk1",eval=TRUE}

More details at: https://yihui.name/knitr/options/

Latex in markdown

• To include latex packages, add - \usepackage{package_name} in the header, such as:

header-includes:
- \usepackage{bbm}
- \usepackage{amssymb}
- \usepackage{amsmath}
- \usepackage{graphicx,float}

• For Latex commands, please use a quick reference: https://wch.github.io/latexsheet/

• Caution: not all Latex commands work in markdown

Data visualization

Why data visualization?

Data visualization

• provides preliminary understanding of data
• helps present and disseminate knowledge
• is a relatively under-developed subject of data science

R packages for visualization

• ggplot2: create plots
• GGally: extend ggplot2
• ggmap: provide maps
• igraph: produce graphs
• plotly: create interactive web-based plots
• Other specialized packages

Basic principles for plotting

• data usually need to be a data frame
• build plot layer by layer
• basic components of a plot command:

  – data, mapping, scales
  – geometric objects, coordinate system
  – facet, statistical transformations
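The layer-by-layer idea can be sketched with the built-in iris data. The example assumes ggplot2 is installed; the particular choice of layers (points, a linear smoother, facets, a theme) is illustrative rather than taken from the notes.

```r
library(ggplot2)

# Data and aesthetic mapping; no geometric object yet
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length))

# Add components one at a time
p <- p + geom_point(aes(colour = Species))   # geometric object
p <- p + geom_smooth(method = "lm")          # statistical transformation
p <- p + facet_wrap(~Species)                # facet
p <- p + theme_bw()                          # appearance

p  # printing the object draws the plot
```

Because each step returns a plot object, intermediate versions can be stored and reused, which is how p1, p2 and p3 are built later in these notes.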

Scatter plot, density plot, boxplot, bar plot

Scatter plot matrix

A scatter plot can be used to show any “visible” relationship between two variables.

Iris data:

• 4 variables: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
• species: setosa, versicolor, and virginica
• 150 observations for each variable

Scatter plot matrix

> pairs(iris[,1:4], col=iris$Species)

[Figure: scatter plot matrix of Sepal.Length, Sepal.Width, Petal.Length and Petal.Width from the iris data, with points colored by species.]

Density plot

A density plot can be used to:

• visually check model assumptions
• visually compare a response’s behavior under different conditions

Example: iris data set

Density plot

> library(ggplot2)
> ggplot(iris, aes(x=Sepal.Length, color=Species)) +
+   geom_density(linetype = "dashed") + theme_bw()

[Figure: dashed density curves of Sepal.Length, one per species (setosa, versicolor, virginica); x-axis Sepal.Length, y-axis density.]

Boxplot

A boxplot does not present the full distributional information that a density plot does, but it can be used to visually check:

• median of data
• range of data
• skewness of data
• outliers in data
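The quantities a boxplot displays can also be computed directly. This is a minimal base R sketch for Sepal.Length in the iris data; the names by_species, five_num and outliers are illustrative.

```r
# Split Sepal.Length by species (iris ships with base R)
by_species <- split(iris$Sepal.Length, iris$Species)

# Five-number summary per species: min, lower hinge, median, upper hinge, max
five_num <- sapply(by_species, fivenum)

# Points a default boxplot would flag as outliers
# (more than 1.5 times the box length beyond the hinges)
outliers <- lapply(by_species, function(v) boxplot.stats(v)$out)

five_num
outliers
```

Comparing the distance from the median to each hinge gives a rough numeric check of the skewness that the plot shows visually.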

Boxplot

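> library(ggplot2)
> ggplot(iris, aes(x=Species,y=Sepal.Length))+geom_boxplot()+
+   theme_bw()+stat_summary(fun.y=mean,geom="point",shape=23,size=4)

Note: fun.y works with ggplot2 3.1.0, the version used for these notes. From ggplot2 3.3.0 on it is deprecated in favor of fun, as in stat_summary(fun=mean, ...).

[Figure: boxplots of Sepal.Length for each species (setosa, versicolor, virginica), with the group mean marked by a diamond.]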

Scatter plot matrix: ggpairs

> library(GGally)
> ggpairs(iris, aes(colour = Species, alpha = 0.4))

[Figure: ggpairs plot matrix of Sepal.Length, Sepal.Width, Petal.Length, Petal.Width and Species, colored by species.]

Correlations printed in the upper panels (overall; setosa; versicolor; virginica):

• Sepal.Length and Sepal.Width: -0.118; 0.743; 0.526; 0.457
• Sepal.Length and Petal.Length: 0.872; 0.267; 0.754; 0.864
• Sepal.Length and Petal.Width: 0.818; 0.278; 0.546; 0.281
• Sepal.Width and Petal.Length: -0.428; 0.178; 0.561; 0.401
• Sepal.Width and Petal.Width: -0.366; 0.233; 0.664; 0.538
• Petal.Length and Petal.Width: 0.963; 0.332; 0.787; 0.322

Bar plot

> library(ggplot2)
> ggplot(mpg, aes(x=drv,y=hwy,fill=class))+theme_bw()+
+   geom_bar(stat='identity', position='dodge')

[Figure: dodged bar plot of hwy against drv (4, f, r), filled by class (2seater, compact, midsize, minivan, pickup, subcompact, suv).]

Visualization with factors

Look into iris data set

> library(ggplot2)
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Faceting with 1 factor

> library(ggplot2)
> ggplot(iris, aes(x=Sepal.Length,y=Petal.Length))+
+   theme_bw()+geom_point()+
+   facet_wrap(~Species,nrow=1)

[Figure: scatter plots of Petal.Length against Sepal.Length in three side-by-side panels: setosa, versicolor, virginica.]

Non-faceting with 1 factor

> library(ggplot2)
> ggplot(iris, aes(x=Sepal.Length,y=Petal.Length,
+                  shape=Species,colour=Species))+
+   theme_bw()+geom_point()

[Figure: single scatter plot of Petal.Length against Sepal.Length, with species (setosa, versicolor, virginica) distinguished by point shape and color.]

Faceting with 2 factors

> library(ggplot2)
> head(diamonds)
# A tibble: 6 x 10
  carat cut   color clarity depth table price     x     y     z
  <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23  Ideal E     SI2      61.5    55   326  3.95  3.98  2.43
2 0.21  Prem~ E     SI1      59.8    61   326  3.89  3.84  2.31
3 0.23  Good  E     VS1      56.9    65   327  4.05  4.07  2.31
4 0.290 Prem~ I     VS2      62.4    58   334  4.2   4.23  2.63
5 0.31  Good  J     SI2      63.3    58   335  4.34  4.35  2.75
6 0.24  Very~ J     VVS2     62.8    57   336  3.94  3.96  2.48

Faceting with 2 factors

> library(ggplot2)
> diamondsA = diamonds[diamonds$color %in% c("E","J","G"), ]
> ggplot(diamondsA, aes(x=carat,y=price))+theme_bw()+
+   geom_point(aes(colour=depth))+facet_grid(color~cut)

[Figure: price against carat in a grid of panels, rows color (E, G, J), columns cut (Fair, Good, Very Good, Premium, Ideal), points colored by depth.]

Visualization with 3 factors

> library(ggplot2)
> ggplot(diamondsA, aes(x=carat,y=price))+theme_bw()+
+   geom_point(aes(colour=clarity))+facet_grid(color~cut)

[Figure: price against carat in a grid of panels, rows color (E, G, J), columns cut (Fair, Good, Very Good, Premium, Ideal), points colored by clarity (I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF).]

Mathematical expressions

Math expressions in R

• Plotmath documentation

• expression and paste commands
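As a small sketch of how such labels are built, the snippet below creates one label with expression and paste and a second with bquote, which is an alternative from base R that is not covered in these notes. The names lab1, lab2 and a are illustrative, and base graphics is used only to keep the example free of extra packages.

```r
# A plotmath label built with expression(); pi is drawn as the Greek letter
lab1 <- expression(paste("carat ", pi["1,m"]))

# bquote() substitutes the current value of a variable wrapped in .()
a <- 0.05
lab2 <- bquote(alpha == .(a))

# Both can be passed to base graphics as axis titles
plot(1:10, (1:10)^2, xlab = lab1, ylab = lab2)
```

expression is enough for fixed labels; bquote helps when a computed number has to appear inside the label.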

Subset of diamonds data

Use dplyr and piping:
> library(dplyr)
> dB = diamonds %>%
+   filter(color %in% c("E","J","G")) %>%
+   filter(cut %in% c("Ideal","Premium"))
> head(dB)
# A tibble: 6 x 10
  carat cut   color clarity depth table price     x     y     z
  <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  0.23 Ideal E     SI2      61.5    55   326  3.95  3.98  2.43
2  0.21 Prem~ E     SI1      59.8    61   326  3.89  3.84  2.31
3  0.23 Ideal J     VS1      62.8    56   340  3.93  3.9   2.46
4  0.31 Ideal J     SI2      62.2    54   344  4.35  4.37  2.71
5  0.2  Prem~ E     SI2      60.2    62   345  3.79  3.75  2.27
6  0.32 Prem~ E     I1       60.9    58   345  4.38  4.42  2.68

Base layer

> library(ggplot2)
> p1 = ggplot(dB, aes(x=carat,y=price))+theme_bw()+
+   geom_point(aes(colour=depth))+facet_grid(cut~color)
> p1

[Figure: price against carat, rows cut (Premium, Ideal), columns color (E, G, J), points colored by depth.]

Math symbols in axis titles

Create math expressions for axis titles:

> xs = expression(paste("carat ", pi["1,m"], sep=""))
> ys = expression(paste("price ", gamma^2, sep=""))
> ms = c("Price vs cara")

Math symbols in axis titles

> p2=p1 + ggtitle(ms)+xlab(xs)+ylab(ys)+
+   theme(plot.title = element_text(hjust = 0.5))
> p2

[Figure: the same grid as p1 with centered title "Price vs cara", x-axis title "carat π1,m" (1,m as subscript) and y-axis title "price γ2" (2 as superscript).]

Math symbols in legend title

> p2 + labs(col = expression(paste("my ",lambda, sep="")))

[Figure: the same plot as p2 with the legend title changed from "depth" to "my λ".]

Subset of diamonds data

> library(dplyr)
> dC = diamonds %>% filter(color %in% c("E","J","G")) %>%
+   filter(cut %in% c("Ideal","Premium")) %>%
+   filter(clarity %in% c("SI1","VS1"))
> head(dC)
# A tibble: 6 x 10
  carat cut   color clarity depth table price     x     y     z
  <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  0.21 Prem~ E     SI1      59.8    61   326  3.89  3.84  2.31
2  0.23 Ideal J     VS1      62.8    56   340  3.93  3.9   2.46
3  0.33 Ideal J     SI1      61.1    56   403  4.49  4.55  2.76
4  0.23 Ideal G     VS1      61.9    54   404  3.93  3.95  2.44
5  0.31 Prem~ G     SI1      61.8    58   553  4.35  4.32  2.68
6  0.7  Ideal E     SI1      62.5    57  2757  5.7   5.72  3.57

Base layer

> library(ggplot2)
> p3 = ggplot(dC, aes(x=carat,y=price))+theme_bw()+
+   geom_point(aes(colour=clarity))+facet_grid(cut~color)
> p3

[Figure: price against carat, rows cut (Premium, Ideal), columns color (E, G, J), points colored by clarity (SI1, VS1).]

Math symbols in legend labels

> library(ggplot2)
> p3+scale_color_discrete(labels =
+     c(expression(italic(omega)),"Any"))

[Figure: the same plot as p3 with the clarity legend labels shown as "ω" and "Any".]

Math symbols in strip names

Command:

• Both factors

facet_grid(factorA ~ factorB, labeller = label_parsed)

• One factor

facet_grid(factorA ~ factorB,labeller = labeller(factorA=label_parsed))

Math symbols in strip names

Create math expressions for levels of factors:
> ColorFStg = c(expression(paste(pi[0],"=", 0.5,sep="")),
+               expression(paste(lambda[z],"=", 0.6,sep="")),
+               expression(paste(zeta[0],"=", 0.7,sep="")))
> dC$colorF = dC$color
> dC$colorF = factor(dC$color, labels =ColorFStg)
> dC[,c(1:4,7,11)] %>% group_by(colorF) %>% slice(1)
# A tibble: 3 x 6
# Groups:   colorF [3]
  carat cut     color clarity price colorF
  <dbl> <ord>   <ord> <ord>   <int> <ord>
1  0.21 Premium E     SI1       326 "paste(pi[0], \"=\", 0.5, ~
2  0.23 Ideal   G     VS1       404 "paste(lambda[z], \"=\", 0~
3  0.23 Ideal   J     VS1       340 "paste(zeta[0], \"=\", 0.7~

Math symbols in strip names

> ggplot(dC, aes(x=carat,y=price))+theme_bw()+
+   geom_point(aes(colour=clarity))+
+   facet_grid(cut~colorF,labeller = label_parsed)

[Figure: price against carat, rows cut (Premium, Ideal), column strips labeled π0=0.5, λz=0.6, ζ0=0.7 (0 and z as subscripts), points colored by clarity (SI1, VS1).]

Math symbols in plot

> ggplot(dC, aes(x=carat,y=price))+theme_bw()+
+   geom_line(aes(linetype=clarity))+
+   facet_grid(cut~colorF,labeller = label_parsed)

[Figure: price against carat drawn as lines, rows cut (Premium, Ideal), column strips labeled π0=0.5, λz=0.6, ζ0=0.7 (0 and z as subscripts), line type by clarity (SI1, VS1).]

Other ggplot2 tweaks

Not covered

The following have not been covered:

• some statistical transforms: stat_XXX
• lines, shapes for x-y plot: geom_XXX
• axis, legend and strip adjustment: theme
• figure margin adjustment: margin

Information on these can be found in the ggplot2 book or at https://stackoverflow.com

License and session Information

License

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

other attached packages:
[1] bindrcpp_0.2.2 dplyr_0.7.8    GGally_1.4.0   ggplot2_3.1.0
[5] knitr_1.21

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0         RColorBrewer_1.1-2 pillar_1.3.1
 [4] compiler_3.5.0     plyr_1.8.4         highr_0.7
 [7] bindr_0.1.1        tools_3.5.0        digest_0.6.18
[10] viridisLite_0.3.0  evaluate_0.12      tibble_1.4.2
[13] gtable_0.2.0       pkgconfig_2.0.2    rlang_0.3.0.1
[16] cli_1.0.1          rstudioapi_0.8     yaml_2.2.0
[19] xfun_0.4           withr_2.1.2        stringr_1.3.1
[22] grid_3.5.0         tidyselect_0.2.5   reshape_0.8.8
[25] glue_1.3.0         R6_2.3.0           fansi_0.4.0
[28] rmarkdown_1.11     reshape2_1.4.3     purrr_0.2.5
[31] magrittr_1.5       scales_1.0.0       codetools_0.2-15
[34] htmltools_0.3.6    assertthat_0.2.0   colorspace_1.3-2
[37] labeling_0.3       utf8_1.1.4         stringi_1.2.4
[40] lazyeval_0.2.1     munsell_0.5.0      crayon_1.3.4
