Simple Linear Regression
One reason for assessing correlation is to identify a variable that could be used to predict another variable.
If that is your intention, the sample used to assess the correlation must be ‘representative’ of the population within which you wish to make the predictions.
Assume you would like to predict the scores that each of you would get on Exam 2 in the course.
If we assume that you are a sample from the ‘population’ of students who have taken this course, we could look at past performance to predict future performance.
If all you knew was that you were students from the same population taking the same course – what would you predict for your Exam 2 scores?
[Figure: Exam 2 scores for individual students on a scale from 70 to 100, with a line at the mean for Exam 2 based on previous students (85.7)]
Given no other information, the best guess would be that each student gets the mean Exam 2 grade. But if a variable related to Exam 2 scores were available for each student, it could be used to improve the predictions.
For example, it turns out that grades on Exam 1 and grades on Exam 2 are significantly correlated (r = .64), so they share some variance, and knowing a student’s Exam 1 grade should help predict her Exam 2 grade.
Note ‘outlier’
Note that the original data set produced this scatter plot.
What is the problem?
Note the ‘outlier’: it reduces r from .64 to .52.
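To see how a single outlier can pull r down, here is a minimal sketch. The scores are made up for illustration (they are not the class data, so they will not reproduce .64 and .52), and `pearson_r` is a hypothetical helper written for this example.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squares and cross-products."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up scores with a strong positive linear trend.
x_scores = [60, 65, 70, 75, 80, 85, 90, 95]
y_scores = [70, 68, 75, 80, 78, 86, 84, 92]

# The same data plus one outlier: a high X score paired with a low Y score.
x_with_outlier = x_scores + [95]
y_with_outlier = y_scores + [60]

r_without = pearson_r(x_scores, y_scores)
r_with = pearson_r(x_with_outlier, y_with_outlier)

print(f"r without outlier: {r_without:.2f}")
print(f"r with outlier:    {r_with:.2f}")
```

One point that runs against the trend is enough to lower r noticeably, which is why the scatter plot should be inspected before the correlation is used for prediction.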
Simple Linear Regression is just an application of Pearson’s r (a coefficient of the strength and direction of linear association), so the assumptions are the same.
It involves finding the linear relationship between X and Y that minimizes the squared differences between actual Y scores and predicted scores Yp (predicted from X).
The line of best fit minimizes errors of prediction. The formula for points on the line is
Y = a + bX
a is the ‘intercept’: the value of Y when X = 0
b is the slope of the line: the change in Y for each one-unit change in X
So all predicted scores (Yp) fall on the line of best fit.
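As a small illustration of the formula, the sketch below computes predicted scores from a given intercept and slope. The values a = 38.54 and b = 0.56 come from the Exam 2 example in these slides; the function name `predict` and the list of Exam 1 scores are made up for illustration.

```python
def predict(x, a, b):
    """Return the predicted score Yp = a + bX for a single X value."""
    return a + b * x


# Intercept and slope from the Exam 2 example in these slides.
A = 38.54
B = 0.56

# Hypothetical Exam 1 scores, chosen only to show the arithmetic.
exam1_scores = [60, 70, 80, 90, 100]
predicted = [predict(x, A, B) for x in exam1_scores]

for x, yp in zip(exam1_scores, predicted):
    print(f"Exam 1 = {x:3d}  ->  predicted Exam 2 = {yp:.2f}")
```

Each 10-point step in Exam 1 moves the prediction by 10 * b = 5.6 points, and X = 0 returns the intercept.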
[Figure: straight line plotted on X and Y axes, marking the intercept a (Y when X = 0), the slope, Mean X, and Mean Y]
In regression language:
a is the ‘regression constant’
b is the ‘regression coefficient’, based on r and the variability of Y relative to the variability of X
If r = 1 (or -1), there is a perfect straight-line relationship
If r is less than 1 in absolute value, the predictions miss by a residual, and the equation becomes
Y = a + bX + residual, where Yp = a + bX
[Figure: scores scattered around a straight line on X and Y axes, marking the intercept a (Y when X = 0), the slope, Mean X, and Mean Y]
The Line of Best Fit would minimize deviations of scores from the regression line.
The regression line of ‘best fit’ minimizes those errors of prediction. The least squares regression line minimizes
$\sum (Y_{actual} - Y_p)^2$
$b_y = r\,(SD_y / SD_x)$
If X and Y are converted to z scores, both SDs = 1, so $b_y = r$.
$b_y$ can also be found by $Cov_{xy} / Var_x$.
Even though r is the correlation of X and Y, b will vary depending on which one is used to predict the other; this changes which SD goes on the top and which on the bottom of the ratio.
$a_y = \text{Mean } Y - b_y\,(\text{Mean } X)$, the value of Yp when X = 0
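The formulas above can be checked numerically. The sketch below uses the seven IQ and GPA pairs from the handout example in these slides (GPA as X, IQ as Y); the helper names `covariance` and `pearson_r` are made up for this illustration, and sample (N - 1) statistics are assumed throughout.

```python
from statistics import mean, stdev

# IQ and GPA values from the handout example (GPA is X, IQ is Y).
gpa = [3.80, 3.00, 3.40, 3.50, 2.80, 2.40, 2.50]
iq = [120.0, 115.0, 110.0, 105.0, 100.0, 95.0, 90.0]


def covariance(xs, ys):
    """Sample covariance (divides by N - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_r(xs, ys):
    """Pearson r as the covariance divided by the product of the SDs."""
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))


r = pearson_r(gpa, iq)

# Slope for predicting IQ from GPA, computed two equivalent ways.
b_from_r = r * stdev(iq) / stdev(gpa)                # b_y = r (SD_y / SD_x)
b_from_cov = covariance(gpa, iq) / stdev(gpa) ** 2   # b_y = Cov_xy / Var_x

# Intercept: a_y = Mean Y - b_y (Mean X).
a = mean(iq) - b_from_r * mean(gpa)

# Reversing the roles (predicting GPA from IQ) flips the SD ratio.
b_gpa_from_iq = r * stdev(gpa) / stdev(iq)

print(f"r = {r:.3f}, b = {b_from_r:.3f}, a = {a:.3f}")
print(f"slope for predicting GPA from IQ = {b_gpa_from_iq:.4f}")
```

The two slope calculations agree, and the slope and intercept match the SPSS output for this example (B = 16.993, constant = 53.049).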
Partitioning the Variability in Y
$SS_{Total} = \sum (Y - \text{Mean } Y)^2$
Variability of Y scores from the mean
Separated into
$SS_{regression} = \sum (Y_p - \text{Mean } Y)^2$
Improvement in predictions when using X (variability in Y explained by X), rather than assuming everyone gets the Mean
$SS_{residual} = \sum (Y - Y_p)^2$
Degree to which predictions do not match the actual scores (prediction errors that have been minimized)
$SS_{regression} / SS_{Total} = r^2$: proportion of total variance in Y accounted for (explained) by X
$SS_{residual} / SS_{Total} = 1 - r^2$: proportion of unexplained variance in Y (errors)
Variance of errors = $SS_{residual} / df = MS_{residual}$ (df = N - 2 in simple linear regression)
Standard Error of Estimate = $\sqrt{MS_{residual}}$: the typical amount by which a predicted score deviates from the actual score
That is, across the values of X, what is the typical deviation of actual from predicted scores of Y?
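The partition can be verified with a short calculation. This sketch again uses the IQ and GPA values from the handout example in these slides; the variable names are made up for the illustration, and df = N - 2 is assumed for the residual.

```python
from math import sqrt
from statistics import mean

# IQ and GPA values from the handout example (GPA is X, IQ is Y).
gpa = [3.80, 3.00, 3.40, 3.50, 2.80, 2.40, 2.50]
iq = [120.0, 115.0, 110.0, 105.0, 100.0, 95.0, 90.0]

mx, my = mean(gpa), mean(iq)
sxy = sum((x - mx) * (y - my) for x, y in zip(gpa, iq))
sxx = sum((x - mx) ** 2 for x in gpa)

# Least squares slope and intercept.
b = sxy / sxx
a = my - b * mx
predicted = [a + b * x for x in gpa]

# Partition the variability in Y.
ss_total = sum((y - my) ** 2 for y in iq)
ss_regression = sum((yp - my) ** 2 for yp in predicted)
ss_residual = sum((y - yp) ** 2 for y, yp in zip(iq, predicted))

r_squared = ss_regression / ss_total
df_residual = len(iq) - 2            # N - 2 for one predictor
ms_residual = ss_residual / df_residual
see = sqrt(ms_residual)              # Standard Error of Estimate

print(f"SS total = {ss_total:.2f}")
print(f"SS regression = {ss_regression:.2f}, SS residual = {ss_residual:.2f}")
print(f"r squared = {r_squared:.3f}, SEE = {see:.3f}")
```

The regression and residual sums of squares add up to the total, and the results agree with the R Square (.692) and Std. Error of the Estimate (6.568) in the SPSS output for this example.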
Standard Error of Estimate
For predicting the score of any individual, the SEE for that prediction can be estimated from
$SEE_{est} = SEE \cdot \sqrt{1 + \frac{1}{N} + \frac{(X - M_x)^2}{(N - 1)\,S_x^2}}$
The error is higher as the score on X deviates from the Mean X, and when a smaller sample size was used for making the estimate.
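A minimal sketch of this formula follows, using the IQ and GPA values from the handout example in these slides. The function name `see_individual` is made up for the illustration; it relies on the identity that (N - 1) times the variance of X equals the sum of squared deviations of X.

```python
from math import sqrt
from statistics import mean

# IQ and GPA values from the handout example (GPA is X, IQ is Y).
gpa = [3.80, 3.00, 3.40, 3.50, 2.80, 2.40, 2.50]
iq = [120.0, 115.0, 110.0, 105.0, 100.0, 95.0, 90.0]

n = len(gpa)
mx, my = mean(gpa), mean(iq)
sxx = sum((x - mx) ** 2 for x in gpa)   # equals (N - 1) * Sx^2
sxy = sum((x - mx) * (y - my) for x, y in zip(gpa, iq))
b = sxy / sxx
a = my - b * mx

# Standard Error of Estimate: sqrt(SS residual / (N - 2)).
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(gpa, iq))
see = sqrt(ss_residual / (n - 2))


def see_individual(x_new):
    """SEE for predicting one individual's Y at X = x_new."""
    return see * sqrt(1 + 1 / n + (x_new - mx) ** 2 / sxx)


see_at_mean = see_individual(mx)
see_at_high_gpa = see_individual(3.80)

print(f"SEE = {see:.3f}")
print(f"SEE for an individual at Mean X:    {see_at_mean:.3f}")
print(f"SEE for an individual at GPA = 3.8: {see_at_high_gpa:.3f}")
```

Even at Mean X the individual SEE exceeds the overall SEE because of the 1/N term, and it grows as X moves away from the mean.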
Example in Handout Packet: Predicting IQ from GPA

|  | IQ | GPA |
| --- | --- | --- |
|  | 120.00 | 3.80 |
|  | 115.00 | 3.00 |
|  | 110.00 | 3.40 |
|  | 105.00 | 3.50 |
|  | 100.00 | 2.80 |
|  | 95.00 | 2.40 |
|  | 90.00 | 2.50 |
| Mean | 105.00 | 3.06 |
[Figure: Linear Regression scatter plot of iq (Y axis, 90 to 120) against gpa (X axis, 2.50 to 3.50), with fitted line iq = 53.05 + 16.99 * gpa, R-Square = 0.69]
Mean IQ = 105: this would be your best ‘guess’ for every person if you had no useful predictor
Mean GPA = 3.06
Improvement in Prediction using GPA
Residual: distance of a point from the line (much greater for some points than for others)
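Descriptive Statistics

|  | Mean | Std. Deviation | N |
| --- | --- | --- | --- |
| iq | 105.0000 | 10.80123 | 7 |
| gpa | 3.0571 | .52870 | 7 |

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| --- | --- | --- | --- | --- |
| 1 | .832 (a) | .692 | .630 | 6.56802 |

a. Predictors: (Constant), gpa

Coefficients (a)

| Model 1 | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig. |
| --- | --- | --- | --- | --- | --- |
| (Constant) | 53.049 | 15.702 |  | 3.378 | .020 |
| gpa | 16.993 | 5.072 | .832 | 3.351 | .020 |

a. Dependent Variable: iq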
IQ = 53.05 + 16.99 (GPA) + error, so Predicted IQ = 53.05 + 16.99 (GPA)
Approximate 95% CIs are ± 2(6.57) around the predicted IQ for those with a given GPA
Although listed as R and R² in SPSS regression output, in Simple Linear Regression analyses these are just the Pearson r and r².
SPSS will report an ANOVA Table with Regression output, but for Simple Linear Regression all you need to report is the t-value (which has df = n - p - 1, where p = number of predictors) that tests the significance of the single predictor (gpa) in the Model (that is, whether r is reliably different from 0).
Note that the Standardized Coefficient, beta, which is the regression coefficient when all variables are standardized (z-scores), is the same as r.
Adjusted R² is adjusted for the sample size and the number of predictors in the model. Since the sample value will be an inflated estimate of R² in the population, use adjusted R² when applying results to the population.
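The t-value, beta, and adjusted R² in the output can be reproduced by hand. The sketch below uses the IQ and GPA values from the handout example and standard textbook formulas (standard error of the slope = SEE / sqrt(sum of squared X deviations); adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1)). These formulas are assumed here, not quoted from the slides, and the variable names are made up for the illustration.

```python
from math import sqrt
from statistics import mean

# IQ and GPA values from the handout example (GPA is X, IQ is Y).
gpa = [3.80, 3.00, 3.40, 3.50, 2.80, 2.40, 2.50]
iq = [120.0, 115.0, 110.0, 105.0, 100.0, 95.0, 90.0]

n = len(iq)
p = 1  # number of predictors

mx, my = mean(gpa), mean(iq)
sxx = sum((x - mx) ** 2 for x in gpa)
syy = sum((y - my) ** 2 for y in iq)
sxy = sum((x - mx) * (y - my) for x, y in zip(gpa, iq))

b = sxy / sxx
r = sxy / sqrt(sxx * syy)
r_squared = r ** 2

# SS residual = SS total * (1 - r^2); SEE uses df = n - p - 1.
see = sqrt(syy * (1 - r_squared) / (n - p - 1))

# t-test for the slope, computed two equivalent ways.
se_b = see / sqrt(sxx)
t_slope = b / se_b
t_from_r = r * sqrt(n - 2) / sqrt(1 - r_squared)

# Standardized coefficient: b rescaled by SD_x / SD_y.
beta = b * sqrt(sxx / (n - 1)) / sqrt(syy / (n - 1))

adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

print(f"SE of b = {se_b:.3f}, t = {t_slope:.3f} (df = {n - p - 1})")
print(f"beta = {beta:.3f}, r = {r:.3f}")
print(f"adjusted R squared = {adjusted_r_squared:.3f}")
```

With one predictor, the t-test of the slope and the t-test of r give the same value, and beta equals r, which is the point made in the notes above.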
[Figure: Linear Regression with 95.00% Mean Prediction Interval; iq (Y axis, 90 to 130) against gpa (X axis, 2.50 to 3.50); iq = 53.05 + 16.99 * gpa, R-Square = 0.69]
Note that the accuracy of predictions decreases as you move away from the means.
We can be 95% confident that the ‘true’ line of best fit lies within these CIs.
If you consider all the lines that might fall within the intervals, you can see that variability increases as you move away from the ‘center’ (Mx, My).
Predicting grades for PSYC 6102 Exam 2 from Exam 1 scores
r = +.64
Usual convention is to report regression constant and coefficient to 3 decimal places
[Figure: Linear Regression with 95.00% Mean Prediction Interval; exam2 (Y axis, 70 to 90) against exam1 (X axis, 60 to 100)]
Exam2 = 38.54 + .56 * Exam1
R² = .41
Exam 2 score = 38.54 + .56 (Exam 1) + error, so Predicted Exam 2 score = 38.54 + .56 (Exam 1)
Approximate 95% CI, at best, ± 2(5.15)
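Descriptive Statistics

|  | Mean | Std. Deviation | N |
| --- | --- | --- | --- |
| exam2 | 86.1545 | 6.65560 | 110 |
| exam1 | 84.9727 | 7.57163 | 110 |

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| --- | --- | --- | --- | --- |
| 1 | .637 (a) | .406 | .401 | 5.15176 |

a. Predictors: (Constant), exam1

Coefficients (a)

| Model 1 | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig. |
| --- | --- | --- | --- | --- | --- |
| (Constant) | 38.542 | 5.559 |  | 6.933 | .000 |
| exam1 | .560 | .065 | .637 | 8.598 | .000 |

a. Dependent Variable: exam2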
Note that the 95% CIs are much wider for predicting an individual’s Exam 2 score than for predicting the typical Exam 2 score (on the line).
[Figure: Linear Regression with 95.00% Mean Prediction Interval and 95.00% Individual Prediction Interval; exam2 (Y axis, 60 to 100) against exam1 (X axis, 60 to 100); exam2 = 38.54 + 0.56 * exam1, R-Square = 0.41]
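The difference between the two intervals can be sketched from the summary values in the SPSS output for the exam data. The individual formula is the one given earlier in these slides; the mean-prediction formula (the same expression without the leading 1) is the standard counterpart and is an assumption here. The multiplier of 2 is a rough stand-in for the exact t critical value, and the function names are made up for the illustration.

```python
from math import sqrt

# Summary values from the SPSS output for the exam data in these slides.
N = 110
SEE = 5.15176        # Std. Error of the Estimate
MEAN_X = 84.9727     # mean of exam1
SD_X = 7.57163       # standard deviation of exam1
A = 38.542           # regression constant
B = 0.560            # regression coefficient

MULTIPLIER = 2.0     # rough stand-in for the 95% t critical value (assumption)


def se_mean_prediction(x):
    """Standard error for the typical (mean) Exam 2 score at exam1 = x."""
    return SEE * sqrt(1 / N + (x - MEAN_X) ** 2 / ((N - 1) * SD_X ** 2))


def se_individual_prediction(x):
    """Standard error for one individual's Exam 2 score at exam1 = x."""
    return SEE * sqrt(1 + 1 / N + (x - MEAN_X) ** 2 / ((N - 1) * SD_X ** 2))


for x in (60, 85, 100):
    yp = A + B * x
    half_mean = MULTIPLIER * se_mean_prediction(x)
    half_individual = MULTIPLIER * se_individual_prediction(x)
    print(f"exam1 = {x:3d}: predicted exam2 = {yp:.1f}, "
          f"mean interval +/- {half_mean:.1f}, "
          f"individual interval +/- {half_individual:.1f}")
```

Both intervals widen as exam1 moves away from its mean, but the individual interval is never narrower than about ± 2(5.15), which is why that figure is the approximate CI "at best".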
Partial Correlation logic
Finding the correlation of X and Y after ‘partialling’ out the relationship of each with Z:
Predict X from Z; for each person, find the residual (the amount the prediction missed by)
Predict Y from Z; for each person, find the residual (the amount the prediction missed by)
Correlate the two sets of residuals
The result is the relationship of X and Y after removing the relationship each has with Z
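The residual logic can be sketched in a few lines. The X, Y, and Z values below are made up purely for illustration, and `pearson_r` and `residuals` are hypothetical helper names. As a check, the result is compared with the usual shortcut formula for a first-order partial correlation, which is not part of the slides.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def residuals(ys, xs):
    """Residuals (actual minus predicted) from regressing ys on xs."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Made-up scores for eight people on three variables.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2, 1, 4, 3, 6, 5, 8, 7]
y = [1, 3, 2, 5, 4, 7, 6, 9]

# Steps 1 and 2: remove what Z predicts from X and from Y.
x_resid = residuals(x, z)
y_resid = residuals(y, z)

# Step 3: correlate the two sets of residuals.
partial_r = pearson_r(x_resid, y_resid)

# Cross-check with the shortcut formula.
r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
partial_formula = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

print(f"r_xy = {r_xy:.3f}")
print(f"partial r (residual method) = {partial_r:.3f}")
print(f"partial r (shortcut formula) = {partial_formula:.3f}")
```

The two routes give the same value, which shows that correlating the residuals is exactly what ‘partialling out’ Z means.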