Lecture slides stats1.13.l07.air
Statistics One
Lecture 7: Introduction to Regression
Three segments
• Overview
• Calculation of regression coefficients
• Assumptions
Lecture 7 ~ Segment 1
Regression: Overview
Regression: Overview
• Important concepts & topics
– Simple regression vs. multiple regression
– Regression equation
– Regression model
Regression: Overview
• Regression: a statistical analysis used to predict scores on an outcome variable from scores on one or more predictor variables
– Simple regression: one predictor variable
– Multiple regression: multiple predictor variables
Regression: Overview
• Example: IMPACT (see Lab 2)
– An online assessment tool used to investigate the effects of sports-related concussion
• http://www.impacttest.com
IMPACT example
• IMPACT provides data on 6 variables:
– Verbal memory
– Visual memory
– Visual motor speed
– Reaction time
– Impulse control
– Symptom score
IMPACT: Correlations pre-injury
IMPACT: Correlations post-injury
IMPACT example
• For this example, assume:
– Symptom Score is the outcome variable
– Simple regression example:
• Predict Symptom Score from just one variable
– Multiple regression example:
• Predict Symptom Score from two variables
Regression equation
• Y = m + bX + e
– Y is a linear function of X
– m = intercept
– b = slope
– e = error (residual)
Regression equation
• Y = B0 + B1X1 + e
– Y is a linear function of X1
– B0 = intercept = regression constant
– B1 = slope = regression coefficient
– e = error (residual)
Model R and R2
• R = multiple correlation coefficient
– R = rŶY
– The correlation between the predicted scores and the observed scores
• R2
– The percentage of variance in Y explained by the model
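These definitions can be checked numerically. A minimal sketch in Python (the course itself uses R), with made-up predicted and observed scores rather than the IMPACT data:

```python
# Sketch: R is the correlation between predicted and observed scores,
# and R^2 is the proportion of variance in Y the model explains.
# The data here are made up for illustration (not the IMPACT sample).
from statistics import mean

def pearson_r(a, b):
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) *
           sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

y_obs  = [2.0, 4.0, 5.0, 4.0, 5.0]   # observed outcome scores
y_pred = [2.8, 3.4, 4.0, 4.6, 5.2]   # model's predicted scores

R = pearson_r(y_pred, y_obs)
R_squared = R ** 2                    # proportion of variance explained
```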
IMPACT example
• Y = B0 + B1X1 + e
– Let Y = Symptom Score
– Let X1 = Impulse Control
– Solve for B0 and B1
– In R, function lm
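The fit that R's lm produces can be sketched by hand: the slope is the sum of cross products over the SS of the predictor, and the intercept follows from the means. Data below are hypothetical, not the actual IMPACT scores:

```python
# Hand computation of the OLS intercept B0 and slope B1 for one
# predictor; equivalent in spirit to R's lm(y ~ x). Data are made up.
from statistics import mean

def simple_lm(x, y):
    mx, my = mean(x), mean(y)
    sp_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # sum of cross products
    ss_x = sum((xi - mx) ** 2 for xi in x)                      # SS of the predictor
    b1 = sp_xy / ss_x        # slope (regression coefficient)
    b0 = my - b1 * mx        # intercept (regression constant)
    return b0, b1

impulse_control = [2, 4, 6, 8, 10]     # hypothetical X1
symptom_score = [25, 30, 27, 35, 38]   # hypothetical Y
b0, b1 = simple_lm(impulse_control, symptom_score)
```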
IMPACT example
Ŷ = 20.48 + 1.43(X)
r = .40
R2 = 16%
Regression model
• The regression model is used to model or predict future behavior
– The model is just the regression equation
– Later in the course we will discuss more complex models that consist of a set of regression equations
Regression: It gets better
• The goal is to produce better models so we can generate more accurate predictions
– Add more predictor variables, and/or…
– Develop better predictor variables
IMPACT example
• Y = B0 + B1X1 + B2X2 + e
– Let Y = Symptom Score
– Let X1 = Impulse Control
– Let X2 = Verbal Memory
– Solve for B0, B1, and B2
– In R, function lm
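In R, lm fits both predictors at once. As a sketch of what that fit computes, the normal equations (XᵀX)B = Xᵀy can be solved directly; the data below are made up, not the IMPACT sample:

```python
# Multiple regression with two predictors by solving the normal
# equations (X'X)B = X'y; equivalent in spirit to lm(y ~ x1 + x2).
# Data are hypothetical.

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def multiple_lm(x1, x2, y):
    X = [[1.0, a, b] for a, b in zip(x1, x2)]      # design matrix with intercept
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    return solve(XtX, Xty)                          # [B0, B1, B2]

x1 = [2, 4, 6, 8, 10]          # hypothetical Impulse Control
x2 = [1, 3, 2, 5, 4]           # hypothetical Verbal Memory
y  = [25, 30, 27, 35, 38]      # hypothetical Symptom Score
b0, b1, b2 = multiple_lm(x1, x2, y)
```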
IMPACT example
Ŷ = 4.13 + 1.48(X1) + 0.22(X2)
R2 = 22%
Model R and R2
• R = multiple correlation coefficient
– R = rŶY
– The correlation between the predicted scores and the observed scores
• R2
– The percentage of variance in Y explained by the model
IMPACT example
R2 = 22%
rŶY = .47
Segment summary
• Important concepts & topics
– Simple regression vs. multiple regression
– Regression equation
– Regression model
END SEGMENT
Lecture 7 ~ Segment 2
Calculation of regression coefficients
Estimation of coefficients
• Regression equation:
– Y = B0 + B1X1 + e
– Ŷ = B0 + B1X1
• (Y – Ŷ) = e (residual)
Estimation of coefficients
• The values of the coefficients (e.g., B1) are estimated such that the regression model yields optimal predictions
– Minimize the residuals!
Estimation of coefficients
• Ordinary Least Squares estimation
– Minimize the sum of squared (SS) residuals
– SS.RESIDUAL = Σ(Y – Ŷ)²
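OLS picks the coefficients that make this sum as small as possible. A quick numerical check (Python rather than the course's R, on made-up data): nudging either coefficient away from the OLS solution can only increase SS.RESIDUAL.

```python
# Sketch of the OLS criterion: the fitted line minimizes
# SS.RESIDUAL = sum((Y - Y_hat)^2). Data are made up.
from statistics import mean

def ss_residual(x, y, b0, b1):
    """Sum of squared residuals for a candidate line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

mx, my = mean(x), mean(y)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx

ols = ss_residual(x, y, b0, b1)   # minimal sum of squared residuals
```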
Estimation of coefficients
• Sum of squared deviation scores (SS) in variable Y
– SS.Y = Σ(Y – Ȳ)²
Estimation of coefficients
• Sum of squared deviation scores (SS) in variable X
– SS.X = Σ(X – X̄)²
Estimation of coefficients
• Sum of Cross Products
– SP.XY = Σ(X – X̄)(Y – Ȳ)
Estimation of coefficients
• The Sum of Cross Products determines the SS of the Model
– SS.MODEL = SP.XY² / SS.X
Estimation of coefficients
• SS.RESIDUAL = (SS.Y – SS.MODEL)
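These pieces form an exact variance decomposition: for an OLS fit with an intercept, SS.Y splits into SS.MODEL (the SS of the predicted scores about the mean) plus SS.RESIDUAL. A numeric check on made-up data:

```python
# Verify SS.Y = SS.MODEL + SS.RESIDUAL for an OLS fit with intercept.
# Data are made up for illustration.
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2.0, 4.5, 4.0, 7.0, 8.5]

mx, my = mean(x), mean(y)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx
y_hat = [b0 + b1 * xi for xi in x]

ss_y        = sum((yi - my) ** 2 for yi in y)               # total SS
ss_model    = sum((yh - my) ** 2 for yh in y_hat)           # SS explained by the model
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
```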
Estimation of coefficients
• Formula for the unstandardized coefficient
– B1 = r × (SDy / SDx)
Estimation of coefficients
• Formula for the standardized coefficient
– If X and Y are standardized, then:
• SDy = SDx = 1
• B = r × (SDy / SDx)
• β = r
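Both formulas can be confirmed numerically on made-up data: the unstandardized slope equals r × (SDy / SDx), and after z-scoring X and Y the refitted slope equals r itself.

```python
# Check: B1 = r * (SDy / SDx), and the standardized slope beta = r.
# Data are made up; population SDs (pstdev) are used consistently.
from statistics import mean, pstdev

def slope(x, y):
    """OLS slope for a simple regression."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y)) /
            sum((a - mx) ** 2 for a in x))

def zscores(v):
    """Standardize a variable to mean 0, SD 1."""
    m, s = mean(v), pstdev(v)
    return [(vi - m) / s for vi in v]

x = [3, 5, 7, 9, 11]
y = [10, 14, 13, 18, 20]

zx, zy = zscores(x), zscores(y)
r = sum(a * b for a, b in zip(zx, zy)) / len(x)  # Pearson's r via z-scores
b1 = slope(x, y)                                 # unstandardized coefficient
beta = slope(zx, zy)                             # standardized coefficient
```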
Segment summary
• Important concepts
– Regression equation and model
– Ordinary least squares estimation
– Unstandardized regression coefficients
– Standardized regression coefficients
END SEGMENT
Lecture 7 ~ Segment 3
Assumptions
Assumptions
• Assumptions of linear regression
– Normal distribution for Y
– Linear relationship between X and Y
– Homoscedasticity
Assumptions
• Assumptions of linear regression
– Reliability of X and Y
– Validity of X and Y
– Random and representative sampling
Assumptions
• Assumptions of linear regression
– Normal distribution for Y
– Linear relationship between X and Y
– Homoscedasticity
Anscombe’s quartet
• Regression equation for all 4 examples:
– Ŷ = 3.00 + 0.50(X1)
Anscombe’s quartet
• To test assumptions, save residuals
– Y = B0 + B1X1 + e
– e = (Y – Ŷ)
Anscombe’s quartet
• Then examine a scatterplot with
– X on the X-axis
– Residuals on the Y-axis
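As a sketch of this check, here is the residual computation for Anscombe's first data set in Python (the course does this in R); the fit recovers Ŷ = 3.00 + 0.50(X) from the earlier slide, and the saved residuals, paired with X, are what the diagnostic scatterplot displays.

```python
# Fit Anscombe's first data set, save residuals e = Y - Y_hat, and
# pair them with X for the diagnostic plot (printed rather than plotted).
from statistics import mean

x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

mx, my = mean(x), mean(y)
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx            # close to Y_hat = 3.00 + 0.50(X)

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
for xi, ei in zip(x, residuals):
    print(f"x = {xi:2d}  residual = {ei:+.3f}")
```

For the other three Anscombe sets the same fit comes out nearly identical, but the residual plots look very different, which is the point of the quartet.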
Segment summary
• Assumptions when interpreting r
– Normal distribution for Y
– Linear relationship between X and Y
– Homoscedasticity
– Examine residuals to evaluate assumptions
END SEGMENT
END LECTURE 7