![Page 1: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/1.jpg)
Regression & Correlation
Analysis of Biological Data
Ryan McEwan and Julia Chapman
Department of Biology
University of Dayton
ryan.mcewan@udayton.edu
![Page 2: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/2.jpg)
Simple linear regression is a standard technique in the Analysis of Biological Data:
The main idea is assessing the relationship between two variables, assuming that the relationship is directional and linear…and assuming that one variable is a driver of the relationship.
The Response variable (plotted on Y) is assumed to respond in a linear relationship to changes in the Predictor variable (plotted on X).
The reverse is not assumed in this analysis (that Y drives X). Think heart rate and exercise. Other examples?
![Page 3: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/3.jpg)
But if you have a cloud of points…where do you put the line?
![Page 4: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/4.jpg)
Best fit lines & “Least Squares” regression
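The idea is to drive the line through the cloud in the position that minimizes the distance between the points and the line. "Least squares" means the line minimizes the sum of the squared vertical distances from each point to the line.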
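A minimal sketch of the least-squares arithmetic, in plain Python rather than R, just to show where the line comes from. The heart rate and exercise numbers are made up for illustration, and the function name `least_squares` is hypothetical.

```python
def least_squares(x, y):
    """Return (slope, intercept) of the line that minimizes the
    sum of squared vertical distances between the points and the line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up example: predictor on X, response on Y
exercise_minutes = [0, 5, 10, 15, 20]
heart_rate = [62, 75, 91, 104, 118]

slope, intercept = least_squares(exercise_minutes, heart_rate)
print(f"heart_rate = {intercept:.2f} + {slope:.2f} * exercise_minutes")
```

The fitted line always passes through the point (mean of X, mean of Y), which is why the intercept is computed from the two means.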
![Page 5: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/5.jpg)
Regression residuals
You can generate a table of residuals…a new data set!
How much does each point deviate from the regression line?
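As a rough illustration of the residual table, the sketch below fits a line and then subtracts the predicted value from each observed value. It is plain Python with made-up data; `fit_line` and `residual_table` are hypothetical helper names.

```python
def fit_line(x, y):
    """Least-squares slope and intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_table(x, y):
    """Residual = observed Y minus the Y predicted by the regression line."""
    slope, intercept = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Made-up example data
exercise_minutes = [0, 5, 10, 15, 20]
heart_rate = [62, 75, 91, 104, 118]

residuals = residual_table(exercise_minutes, heart_rate)
for xi, res in zip(exercise_minutes, residuals):
    print(xi, round(res, 2))
```

Points above the line get positive residuals, points below get negative ones, and for a least-squares line with an intercept the residuals sum to zero.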
![Page 6: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/6.jpg)
Detrending… a scientific siren song
![Page 7: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/7.jpg)
Regression lines can have varying slopes from a single Y intercept.
![Page 8: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/8.jpg)
Regression lines can have identical slopes, but different Y intercepts.
![Page 9: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/9.jpg)
![Page 10: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/10.jpg)
We will be running a test of this sort in R. The thing I want you to understand is that the statistical test, and the P-value it generates, relates to the null hypothesis of NO SLOPE: that the line is in fact flat. That would mean the response variable is NOT changing in relation to the predictor.
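To make the "no slope" null hypothesis concrete, here is a sketch that uses a permutation test. This is a swapped-in technique chosen because it needs no statistical tables; it is not the t-test that R reports for a regression. The idea: if the line is truly flat, shuffling the response values should produce slopes as steep as the observed one fairly often. The data and function names are made up.

```python
from itertools import permutations


def slope_of(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def permutation_p_value(x, y):
    """Fraction of all re-orderings of y whose slope is at least as
    steep (in either direction) as the observed slope."""
    observed = abs(slope_of(x, y))
    total = 0
    at_least_as_steep = 0
    for shuffled in permutations(y):
        total += 1
        # Small tolerance guards against floating-point rounding
        if abs(slope_of(x, shuffled)) >= observed - 1e-12:
            at_least_as_steep += 1
    return at_least_as_steep / total


# Made-up example data (6 points, so 720 re-orderings)
exercise_minutes = [0, 5, 10, 15, 20, 25]
heart_rate = [62, 75, 91, 104, 118, 130]

p_value = permutation_p_value(exercise_minutes, heart_rate)
print(p_value)
```

Enumerating every re-ordering is only practical for very small data sets; it is used here so the result is exact and repeatable.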
![Page 11: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/11.jpg)
![Page 12: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/12.jpg)
…ruut row…
![Page 13: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/13.jpg)
IMPORTANT! The P-value from a regression tells you whether the line is statistically distinguishable from flat…it does not tell you how much variation is captured!
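A quick sketch of the "how much variation is captured" side of the story, using r² computed as 1 minus (residual sum of squares / total sum of squares). Both made-up data sets below are built around the same underlying trend but with different amounts of scatter; the names are hypothetical and the code is plain Python, not R.

```python
def r_squared(x, y):
    """Proportion of the variation in y captured by the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


x = list(range(10))
# Alternating up/down scatter so the example is repeatable
wiggle = [1 if i % 2 == 0 else -1 for i in range(10)]
tight_y = [2 * xi + 0.5 * w for xi, w in zip(x, wiggle)]  # little scatter
noisy_y = [2 * xi + 6 * w for xi, w in zip(x, wiggle)]    # lots of scatter

r2_tight = r_squared(x, tight_y)
r2_noisy = r_squared(x, noisy_y)
print(round(r2_tight, 3), round(r2_noisy, 3))
```

The trend is the same in both cases, but the line captures far less of the variation in the noisy data set.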
![Page 14: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/14.jpg)
![Page 15: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/15.jpg)
It may be more useful to calculate a confidence interval
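One way this might look for the slope, as a sketch: the interval is slope ± t × (standard error of the slope). The critical value 2.776 is the standard two-sided 95% t value for 4 degrees of freedom (n − 2 with 6 points), hard-coded here as an assumption so the example needs no statistics library; it would be different for other sample sizes. Data and names are made up.

```python
import math

# Two-sided 95% critical value of Student's t for 4 degrees of freedom.
# Assumption: only valid for n = 6 points (df = n - 2 = 4).
T_CRITICAL_95_DF4 = 2.776


def slope_confidence_interval(x, y, t_critical):
    """Return (lower, slope, upper) for the least-squares slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope: sqrt(residual variance / Sxx)
    standard_error = math.sqrt(ss_res / (n - 2) / sxx)
    margin = t_critical * standard_error
    return slope - margin, slope, slope + margin


# Made-up example data
exercise_minutes = [0, 5, 10, 15, 20, 25]
heart_rate = [62, 75, 91, 104, 118, 130]

lower, slope, upper = slope_confidence_interval(
    exercise_minutes, heart_rate, T_CRITICAL_95_DF4
)
print(round(lower, 3), round(slope, 3), round(upper, 3))
```

If the interval excludes zero, that agrees with rejecting the flat-line null hypothesis, and the width of the interval shows how precisely the slope is estimated.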
![Page 16: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/16.jpg)
You might wish to have replicate values
![Page 17: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/17.jpg)
Your relationship might not be linear!
Polynomial Regression
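A minimal sketch of polynomial regression, assuming a quadratic curve: the same least-squares idea, but with an x² term added. It builds the normal equations and solves them with a small Gaussian elimination routine. The helper names and data are made up, and the code is plain Python rather than R.

```python
def solve_linear_system(a, b):
    """Solve a * coeffs = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - tail) / m[r][r]
    return coeffs


def fit_polynomial(x, y, degree=2):
    """Least-squares coefficients [b0, b1, ..., b_degree] for
    y = b0 + b1*x + ... + b_degree * x**degree."""
    powers = range(degree + 1)
    a = [[sum(xi ** (i + j) for xi in x) for j in powers] for i in powers]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in powers]
    return solve_linear_system(a, b)


# Made-up hump-shaped data generated from y = 1 + 2x - 0.5x^2
x = [0, 1, 2, 3, 4, 5, 6]
y = [1 + 2 * xi - 0.5 * xi ** 2 for xi in x]

b0, b1, b2 = fit_polynomial(x, y, degree=2)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

A straight line fitted to these data would miss the hump entirely; the quadratic term is what lets the curve bend.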
![Page 18: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/18.jpg)
Regression Diagnostics!
A stepwise process of adding factors to the regression. Testing P value, r², etc.
If you are going to take this on, you need to grind! Read, analyze, read some more
![Page 19: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/19.jpg)
Correlation is a related form of analysis, but is different in one fundamental way…a correlation is testing for a relationship between two factors, but NOT ASSUMING one causes the other.
Thus, no predictor and response
![Page 20: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/20.jpg)
You would use a correlation analysis if you are not making assumptions about one factor driving another.
Pearson correlation for normally distributed data
Spearman (rank) correlation for non-normally distributed data.
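A small sketch of the two coefficients, in plain Python with made-up data. Spearman is computed here as a Pearson correlation on the ranks of the data (tied values get their average rank). The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest) upward; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up data: y always increases with x, but not in a straight line
x = [1, 2, 3, 4, 5, 6]
y = [xi ** 3 for xi in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(round(r_pearson, 3), round(r_spearman, 3))
```

Neither function treats one variable as the predictor: swapping `x` and `y` gives the same coefficient, which is the point of using correlation instead of regression.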
![Page 21: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/21.jpg)
Logistic regression:
To be used if your response variable is categorical (for example, yes/no or present/absent)……
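A rough sketch of what a logistic regression does with a yes/no response, assuming a single numeric predictor. It fits the curve by simple gradient ascent on the log-likelihood, which is an illustrative choice and not necessarily how a statistics package fits the model. The data, variable names, learning rate, and step count are all made up.

```python
import math


def sigmoid(z):
    """Convert a value on the linear scale into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, learning_rate=0.1, steps=20000):
    """Fit P(y = 1) = sigmoid(intercept + slope * x) by gradient ascent."""
    intercept, slope = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        # Error = observed outcome minus currently predicted probability
        errors = [yi - sigmoid(intercept + slope * xi) for xi, yi in zip(x, y)]
        intercept += learning_rate * sum(errors) / n
        slope += learning_rate * sum(e * xi for e, xi in zip(errors, x)) / n
    return intercept, slope


# Made-up data: body size (predictor) and presence/absence (1/0 response)
body_size = [1, 2, 3, 4, 5, 6, 7, 8]
present = [0, 0, 0, 1, 0, 1, 1, 1]

intercept, slope = fit_logistic(body_size, present)
p_small = sigmoid(intercept + slope * 1)
p_large = sigmoid(intercept + slope * 8)
print(round(p_small, 3), round(p_large, 3))
```

Instead of predicting the response value itself, the fitted curve predicts the probability of the "yes" category at each value of the predictor.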
![Page 22: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/22.jpg)
Caution 1: Correlation is not causation!
![Page 23: Regression & Correlation Analysis of Biological Data Ryan McEwan and Julia Chapman Department of Biology University of Dayton ryan.mcewan@udayton.edu](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c925503460f9494d74f/html5/thumbnails/23.jpg)
Extrapolation is dangerous!!