Download - R introduction v2
![Page 1: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/1.jpg)
THE WRIGHT LAB COMPUTATION LUNCHES
An introduction to R Gene expression from DEVA to differential expression Handling massively parallel sequencing data
![Page 2: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/2.jpg)
R: data frames, plots and linear models
Martin Johnsson
![Page 3: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/3.jpg)
statistical environment
![Page 4: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/4.jpg)
free open source
![Page 5: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/5.jpg)
scripting language
![Page 6: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/6.jpg)
great packages (limma for microarrays, R/qtl for QTL
mapping etc)
![Page 7: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/7.jpg)
You need to write code!
![Page 8: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/8.jpg)
Why scripting?
harder the first time easier the next 20 times … necessity for moderately large data ( )
![Page 9: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/9.jpg)
R-project.org
Rstudio.com
![Page 10: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/10.jpg)
Interface is immaterial
ssh into a server, RStudio, alternatives very few platform-dependent elements that the end user needs to worry about
![Page 11: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/11.jpg)
Scripting
write your code down in a .R file run it with source ( ) ## comments
![Page 12: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/12.jpg)
if it runs without intervention from start to finish, you’re
actually programming
![Page 13: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/13.jpg)
Help!
within R: ? and tab your favourite search engine ask (Stack Exchange, R-help mailing list, package mailing lists)
![Page 14: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/14.jpg)
First task: make a new script that imports a data set and takes a subset
of the data
![Page 15: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/15.jpg)
Reading in data
Excel sheet to data.frame one sheet at a time clear formatting short succinct column names export text
read.table
![Page 16: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/16.jpg)
Subsetting
logical operators ==, !=, >, <, ! subset(data, column1==1) subset(data, column1==1 & column2>2)
![Page 17: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/17.jpg)
Indexing with [ ]
first three rows: data[c(1,2,3),] two columns: data[,c(2,4)] ranges: data[,1:3]
![Page 18: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/18.jpg)
Variables
columns in expressions data$column1 + data$column2 log10(data$column2)
assignment arrow data$new.column <- log10(data$column2) new.data <- subset(data, column1==10)
![Page 19: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/19.jpg)
Exercise: Start a new script and save it as unicorn_analysis.R. Import unicorn_data.csv.
Take a subset that only includes green unicorns.
![Page 20: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/20.jpg)
RStudio: File > New > R Script
data <- read.csv("unicorn_data.csv") green.unicorns <- subset(data, colour=="green")
![Page 21: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/21.jpg)
Anatomy of a function call
function.name(parameters) mean(data$column) mean(x=data$column) mean(x=data$column, na.rm=T) ?mean
![Page 22: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/22.jpg)
mean(exp(log(x)))
![Page 23: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/23.jpg)
programming in R == stringing together functions
and writing new ones
![Page 24: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/24.jpg)
Using a package (ggplot2) to make statistical graphics.
![Page 25: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/25.jpg)
install.packages("ggplot2") library(ggplot2)
![Page 26: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/26.jpg)
qplot(x=x.var, y=y.var, data=data) only x: histogram x and y numeric: scatterplot
or set geometry (geom) yourself
![Page 27: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/27.jpg)
geoms: point line boxplot jitter – scattered points tile – heatmap and many more
![Page 28: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/28.jpg)
Exercise: Make a scatterplot of weight and horn length in green unicorns.
Write all code in the unicorn_analysis.R script.
Save the plots as variables so you can refer back to them.
![Page 29: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/29.jpg)
green.scatterplot <- qplot(x=weight, y=horn.length, data=green.unicorns)
24
27
30
33
250 300 350 400 450weight
horn.length
![Page 30: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/30.jpg)
Exercise: Make a boxplot of horn length versus diet.
![Page 31: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/31.jpg)
qplot(x=diet, y=weight, data=unicorn.data, geom="boxplot")
25
30
35
40
candy flowersdiet
horn.length
![Page 32: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/32.jpg)
Small multiples
split the plot into multiple subplots useful for looking at patterns qplot(x=x.var, y=y.var, data=data, facets=~variable) facets=variable1~variable2
![Page 33: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/33.jpg)
Exercise: Again, make a boxplot of diet and horn length, but separated
into small multiples by colour.
![Page 34: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/34.jpg)
qplot(x=diet, y=horn.length, data=data, geom="boxplot", facets=~colour)
green pink
25
30
35
40
candy flowers candy flowersdiet
horn.length
![Page 35: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/35.jpg)
Comparing means with linear models
![Page 36: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/36.jpg)
Wilkinson–Rogers notation
one predictor: y ~ x additive model: y ~ x1 + x2 interactions: y ~ x1 + x2 + x1:y2 or y ~ x1 * x2 factors: y ~ factor(x)
![Page 37: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/37.jpg)
Student’s t-test
![Page 38: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/38.jpg)
t.test(variable ~ grouping, data=data) alternative: two sided, less, greater var.equal paired
geoms: boxplot, jitter
![Page 39: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/39.jpg)
Carl Friedrich Gauss least squares estimation
![Page 40: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/40.jpg)
25
30
35
40
250 300 350 400 450weight
horn.length
![Page 41: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/41.jpg)
linear model y = a + b x + e, e ~ N(0, sigma) lm(y ~ x, data=some.data)
formula data summary( ) function
![Page 42: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/42.jpg)
Exercise: Make a regression of horn length and body weight.
![Page 43: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/43.jpg)
model <- lm(horn.length ~ weight, data=data) summary(model) Call: lm(formula = horn.length ~ weight, data = data) Residuals: Min 1Q Median 3Q Max -6.5280 -2.0230 -0.1902 2.5459 7.3620 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 20.41236 3.85774 5.291 1.76e-05 *** weight 0.03153 0.01093 2.886 0.00793 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 3.447 on 25 degrees of freedom (3 observations deleted due to missingness) Multiple R-squared: 0.2499, Adjusted R-squared: 0.2199 F-statistic: 8.327 on 1 and 25 DF, p-value: 0.007932
![Page 44: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/44.jpg)
What have we actually fitted? model.matrix(horn.length ~ weight, data) What were the results? coef(model) How uncertain? confint(model)
![Page 45: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/45.jpg)
Plotting the model
regression equation y = a + x b a is the intercept b is the slope of the line
pull out coefficients with coef( ) a plot with two layers: scatterplot with added geom_abline( )
![Page 46: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/46.jpg)
scatterplot <- qplot(x=weight, y=horn.length, data=data) a <- coef(model)[1] b <- coef(model)[2] scatterplot + geom_abline(intercept=a, slope=b)
25
30
35
40
250 300 350 400 450weight
horn.length
![Page 47: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/47.jpg)
A regression diagnostic
the linear model needs several assumptions, particularly linearity and equal error variance the residuals vs fitted plot can help spot gross deviations
![Page 48: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/48.jpg)
qplot(x=fitted(model), y=residuals(model))
-4
0
4
8
28 30 32 34fitted(model)
residuals(model)
![Page 49: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/49.jpg)
Photo: Peter (anemoneprojectors), CC:BY-SA-2.0 http://www.flickr.com/people/anemoneprojectors/
![Page 50: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/50.jpg)
analysis of variance aov(formula, data=some.data)
drop1(aov.object, test="F") F-tests (Type II SS)
post-hoc tests pairwise.t.test TukeyHSD
![Page 51: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/51.jpg)
Exercise: Perform analysis of variance on weight and the effects of diet while
controlling for colour.
![Page 52: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/52.jpg)
We want a two-way anova with an F-test for diet. model.int <- aov(weight ~ diet * colour, data=data) drop1(model.int, test="F") model.add <- aov(weight ~ diet + colour, data=data) drop1(model.add, test="F") Single term deletions Model: weight ~ diet + colour Df Sum of Sq RSS AIC F value Pr(>F) <none> 85781 223.72 diet 1 471.1 86252 221.87 0.1318 0.71975 colour 1 13479.7 99260 225.66 3.7714 0.06396 . --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
![Page 53: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/53.jpg)
Black magic
none of the is really R’s fault, but things that come up along the way
![Page 54: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/54.jpg)
Black magic
none of the is really R’s fault, but things that come up along the way missing values: na.rm=T, na.exclude( )
![Page 55: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/55.jpg)
Black magic
none of the is really R’s fault, but things that come up along the way missing values: na.rm=T, na.exclude( ) type I, II and III sums of squares in Anova
![Page 56: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/56.jpg)
Black magic
none of the is really R’s fault, but things that come up along the way missing values: na.rm=T, na.exclude( ) type I, II and III sums of squares in anova floating-point arithmetic, e.g. sin(pi)
![Page 57: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/57.jpg)
Reading
Daalgard, Introductory statistics with R, electronic resource at the library Faraway, The linear model in R Gelman & Hill, Data analysis using regression and multilevel/hierarchical models Wickham, ggplot2 book tons of tutorials online, for instance http://martinsbioblogg.wordpress.com/a-slightly-different-introduction-to-r/
![Page 58: R introduction v2](https://reader034.vdocuments.us/reader034/viewer/2022051513/54622ce7b1af9f92238b4da4/html5/thumbnails/58.jpg)
Exercise
More (and some of the same) analysis of the unicorn data set. Use the R documentation and google. I will post solutions.