![Page 1: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/1.jpg)
Simple Linear Regression (SLR)
CHE1147
Saed Sayad
University of Toronto
![Page 2: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/2.jpg)
Types of Correlation
[Figure: three scatter plots showing positive correlation, negative correlation, and no correlation]
![Page 3: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/3.jpg)
Simple linear regression describes the linear relationship between a predictor variable, plotted on the x-axis, and a response variable, plotted on the y-axis
[Figure: scatter plot with the Independent Variable (X) on the horizontal axis and the Dependent Variable (Y) on the vertical axis]
![Page 4: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/4.jpg)
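$$Y = \beta_0 + \beta_1 X$$
[Figure: straight line in the X-Y plane with intercept $\beta_0$; a 1.0-unit increase in X changes Y by the slope $\beta_1$]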
![Page 5: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/5.jpg)
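$$Y = \beta_0 + \beta_1 X$$
[Figure: straight line in the X-Y plane with the intercept $\beta_0$ and the slope $\beta_1$ per 1.0 unit of X marked]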
![Page 6: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/6.jpg)
[Figure: scatter plot of Y against X]
![Page 7: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/7.jpg)
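[Figure: scatter plot of Y against X with a fitted line; the residuals $\varepsilon$ are marked between the points and the line]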
![Page 8: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/8.jpg)
Fitting data to a linear model
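$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ are the residuals.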
![Page 9: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/9.jpg)
How to fit data to a linear model?
The Ordinary Least Squares (OLS) Method
![Page 10: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/10.jpg)
Least Squares Regression
Model line: $\hat{Y} = \beta_0 + \beta_1 X$
Residual (ε) = $Y - \hat{Y}$
Sum of squares of residuals = $\sum (Y - \hat{Y})^2$
• we must find values of $\beta_0$ and $\beta_1$ that minimise $\sum (Y - \hat{Y})^2$
![Page 11: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/11.jpg)
Regression Coefficients
$$b_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sigma_{xy}}{\sigma_x^2}$$
$$b_0 = \bar{Y} - b_1 \bar{X}$$
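As a minimal sketch of these two formulas, the snippet below computes the slope as the ratio of the cross-product sum to the sum of squares of X, then obtains the intercept from the two means. The data values and the function name `fit_slr` are hypothetical and chosen only for illustration; they do not come from the slides.

```python
# Hypothetical illustration data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_slr(x, y):
    """Return (b0, b1) for the least-squares line Y = b0 + b1 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sum of squares of X and sum of cross-products of X and Y.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1


b0, b1 = fit_slr(x, y)
print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}")
```

For this small data set the means are 3 and 4, the cross-product sum is 6 and the sum of squares of X is 10, so the fitted line is Y = 2.2 + 0.6 X.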
![Page 12: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/12.jpg)
Required Statistics
$n$ = number of observations
$$\bar{X} = \frac{\sum X}{n}$$
$$\bar{Y} = \frac{\sum Y}{n}$$
![Page 13: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/13.jpg)
Descriptive Statistics
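$$\mathrm{Var}(Y) = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}$$
$$\mathrm{Var}(X) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$
$$\mathrm{Covar}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$$
The numerators are labelled $S_{yy}$ (SST), $S_{xx}$ and $S_{xy}$ respectively.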
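The short sketch below, using hypothetical data that is not from the slides, computes the sample variances and covariance with the n - 1 divisor and shows that the divisor cancels when the slope is formed as covariance over variance. All variable names are illustrative assumptions.

```python
# Hypothetical illustration data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross-products (the numerators).
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample variances and covariance divide the numerators by n - 1.
var_x = s_xx / (n - 1)
var_y = s_yy / (n - 1)
cov_xy = s_xy / (n - 1)

# The (n - 1) factors cancel, so both ratios give the same slope.
slope_from_cov = cov_xy / var_x
slope_from_sums = s_xy / s_xx
print(var_x, var_y, cov_xy, slope_from_cov)
```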
![Page 14: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/14.jpg)
Regression Statistics
$$SST = \sum (Y - \bar{Y})^2$$
$$SSR = \sum (\hat{Y} - \bar{Y})^2$$
$$SSE = \sum (Y - \hat{Y})^2$$
![Page 15: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/15.jpg)
[Figure: diagram of the variance of Y]
Variance to be explained by predictors (SST)
![Page 16: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/16.jpg)
[Figure: overlapping regions for Y and X1]
Variance explained by X1 (SSR)
Variance NOT explained by X1 (SSE)
![Page 17: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/17.jpg)
Regression Statistics
$$SST = SSR + SSE$$
![Page 18: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/18.jpg)
Regression Statistics
$$R^2 = \frac{SSR}{SST}$$
Coefficient of Determination, used to judge the adequacy of the regression model
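A small sketch may help tie the three sums of squares to the coefficient of determination. It fits a line to hypothetical data (not from the slides), computes the total, regression and error sums of squares, and forms R squared as the explained share of the total. Variable names are illustrative.

```python
# Hypothetical illustration data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained by the line
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # left unexplained

r2 = ssr / sst
print(f"SST={sst:.2f} SSR={ssr:.2f} SSE={sse:.2f} R^2={r2:.2f}")
```

Here the total of 6.0 splits into 3.6 explained and 2.4 unexplained, so the line accounts for 60% of the variation in Y.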
![Page 19: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/19.jpg)
Regression Statistics
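$$R = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
$$R = \pm\sqrt{R^2}$$
(taking the sign of the slope $b_1$)
Correlation
measures the strength of the linear association between two variables.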
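As an illustrative sketch with hypothetical data (not from the slides), the correlation can be computed from the same sums used for the slope. Squaring it recovers the coefficient of determination of the simple linear regression, and its sign matches the sign of the slope.

```python
import math

# Hypothetical illustration data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = s_xy / math.sqrt(s_xx * s_yy)   # correlation coefficient
b1 = s_xy / s_xx                    # slope, shares the sign of r
r_squared = r ** 2                  # equals SSR / SST for this fit
print(f"R = {r:.4f}, R^2 = {r_squared:.4f}, slope = {b1:.2f}")
```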
![Page 20: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/20.jpg)
Regression Statistics
Standard Error for the regression model
$$SSE = \sum (Y - \hat{Y})^2$$
$$S_e^2 = \frac{SSE}{n-2} = MSE$$
$$S_e = \sqrt{S_e^2}$$
![Page 21: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/21.jpg)
ANOVA
ANOVA to test significance of regression
$$H_0: \beta_1 = 0 \qquad H_A: \beta_1 \neq 0$$

| Source | df | SS | MS | F | P-value |
|---|---|---|---|---|---|
| Regression | 1 | SSR | MSR = SSR / df | MSR / MSE | P(F) |
| Residual | n-2 | SSE | MSE = SSE / df | | |
| Total | n-1 | SST | | | |

If P(F) < α then we know that we get significantly better prediction of Y from the regression model than by just predicting the mean of Y.
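The ANOVA quantities for a simple linear regression can be sketched as follows. The data and the function name `anova_slr` are hypothetical. The sketch stops at the F statistic, because turning F into P(F) requires the F distribution with (1, n - 2) degrees of freedom, which is assumed to come from a table or a statistics library.

```python
# Hypothetical illustration data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def anova_slr(x, y):
    """Return the ANOVA quantities for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    df_reg, df_res = 1, n - 2
    msr = ssr / df_reg
    mse = sse / df_res
    return {
        "df_reg": df_reg, "df_res": df_res, "df_tot": n - 1,
        "ssr": ssr, "sse": sse, "sst": ssr + sse,
        "msr": msr, "mse": mse, "f": msr / mse,
    }


table = anova_slr(x, y)
print(table)
```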
![Page 22: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/22.jpg)
Hypothesis Tests for Regression Coefficients
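$$H_0: \beta_i = 0 \qquad H_1: \beta_i \neq 0$$
$$t_{(n-k-1)} = \frac{b_i - \beta_i}{S(b_i)}$$
where $k$ is the number of predictors ($k = 1$ for simple linear regression).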
![Page 23: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/23.jpg)
Hypothesis Tests for Regression Coefficients
$$H_0: \beta_1 = 0 \qquad H_A: \beta_1 \neq 0$$
$$t_{(n-k-1)} = \frac{b_1 - \beta_1}{S(b_1)} = \frac{b_1 - \beta_1}{\sqrt{S_e^2 / S_{xx}}}$$
![Page 24: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/24.jpg)
Confidence Interval on Regression Coefficients
Confidence Interval for $\beta_1$
$$b_1 - t_{\alpha/2,(n-k-1)}\sqrt{\frac{S_e^2}{S_{xx}}} \;\le\; \beta_1 \;\le\; b_1 + t_{\alpha/2,(n-k-1)}\sqrt{\frac{S_e^2}{S_{xx}}}$$
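The following sketch computes the standard error of the slope, the t statistic for the null hypothesis of a zero slope, and the confidence interval. The data and the function name `slope_inference` are hypothetical. The critical value is passed in as a parameter; the value 3.182 used here is the two-sided 5% point of the t distribution with 3 degrees of freedom, taken from a standard t table.

```python
import math

# Hypothetical illustration data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def slope_inference(x, y, t_crit):
    """Return (b1, se_b1, t_stat, (lower, upper)) for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e2 = sse / (n - 2)              # mean squared error
    se_b1 = math.sqrt(s_e2 / s_xx)    # standard error of the slope
    t_stat = (b1 - 0.0) / se_b1       # tested value under H0 is 0
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return b1, se_b1, t_stat, ci


T_CRIT = 3.182  # t table value for alpha/2 = 0.025 and 3 degrees of freedom
b1, se_b1, t_stat, ci = slope_inference(x, y, T_CRIT)
print(f"b1={b1:.3f} se={se_b1:.4f} t={t_stat:.4f} CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

With only five points the interval contains zero and the t statistic is below the critical value, so this toy data set would not reject the null hypothesis at the 5% level.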
![Page 25: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/25.jpg)
Hypothesis Tests on Regression Coefficients
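$$H_0: \beta_0 = 0 \qquad H_A: \beta_0 \neq 0$$
$$t_{(n-k-1)} = \frac{b_0 - \beta_0}{S(b_0)} = \frac{b_0 - \beta_0}{\sqrt{S_e^2\left(\dfrac{1}{n} + \dfrac{\bar{X}^2}{S_{xx}}\right)}}$$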
![Page 26: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/26.jpg)
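Confidence Interval on Regression Coefficients
Confidence Interval for the intercept
$$b_0 - t_{\alpha/2,(n-k-1)}\sqrt{S_e^2\left(\frac{1}{n} + \frac{\bar{X}^2}{S_{xx}}\right)} \;\le\; \beta_0 \;\le\; b_0 + t_{\alpha/2,(n-k-1)}\sqrt{S_e^2\left(\frac{1}{n} + \frac{\bar{X}^2}{S_{xx}}\right)}$$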
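A matching sketch for the intercept is given below, again with hypothetical data and a hypothetical function name, `intercept_inference`. The standard error of the intercept combines the mean squared error with the term 1/n plus the squared mean of X over the sum of squares of X. The critical value 3.182 is the t table entry for a two-sided 5% test with 3 degrees of freedom.

```python
import math

# Hypothetical illustration data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def intercept_inference(x, y, t_crit):
    """Return (b0, se_b0, t_stat, (lower, upper)) for the intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e2 = sse / (n - 2)
    # Standard error of the intercept.
    se_b0 = math.sqrt(s_e2 * (1.0 / n + x_bar ** 2 / s_xx))
    t_stat = (b0 - 0.0) / se_b0       # tested value under H0 is 0
    ci = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
    return b0, se_b0, t_stat, ci


T_CRIT = 3.182  # t table value for alpha/2 = 0.025 and 3 degrees of freedom
b0, se_b0, t_stat, ci = intercept_inference(x, y, T_CRIT)
print(f"b0={b0:.3f} se={se_b0:.4f} t={t_stat:.4f} CI=({ci[0]:.3f}, {ci[1]:.3f})")
```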
![Page 27: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/27.jpg)
Hypothesis Test on the Correlation Coefficient
$$H_0: \rho = 0 \qquad H_A: \rho \neq 0$$
$$t_0 = \frac{R\sqrt{n-2}}{\sqrt{1-R^2}}$$
We would reject the null hypothesis if $|t_0| > t_{\alpha/2,\,n-2}$.
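The test statistic can be sketched in a few lines. The data and the function name `correlation_t` are hypothetical, and the critical value 3.182 (two-sided 5%, 3 degrees of freedom) is assumed to come from a t table. For a simple linear regression this statistic coincides with the t statistic for the slope.

```python
import math

# Hypothetical illustration data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def correlation_t(x, y):
    """Return (r, t0) for testing whether the correlation is zero."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = s_xy / math.sqrt(s_xx * s_yy)
    t0 = r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)
    return r, t0


T_CRIT = 3.182  # t table value for alpha/2 = 0.025 and n - 2 = 3
r, t0 = correlation_t(x, y)
reject_h0 = abs(t0) > T_CRIT
print(f"R={r:.4f} t0={t0:.4f} reject H0: {reject_h0}")
```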
![Page 28: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/28.jpg)
Diagnostic Tests For Regressions
Expected distribution of residuals for a linear model with normal distribution of residuals (errors).
[Figure: residuals $\varepsilon_i$ plotted against $Y_i$]
![Page 29: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/29.jpg)
Diagnostic Tests For Regressions
Residuals for a non-linear fit
[Figure: residuals $\varepsilon_i$ plotted against $Y_i$]
![Page 30: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/30.jpg)
Diagnostic Tests For Regressions
Residuals for a quadratic function or polynomial
[Figure: residuals $\varepsilon_i$ plotted against $Y_i$]
![Page 31: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/31.jpg)
Diagnostic Tests For Regressions
Residuals are not homogeneous (increasing in variance)
[Figure: residuals $\varepsilon_i$ plotted against $Y_i$]
![Page 32: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/32.jpg)
Regression – important points
1. Ensure that the range of values sampled for the predictor variable is large enough to capture the full range of responses by the response variable.
![Page 33: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/33.jpg)
[Figure: two scatter plots of Y against X]
![Page 34: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/34.jpg)
Regression – important points
2. Ensure that the distribution of predictor values is approximately uniform within the sampled range.
![Page 35: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/35.jpg)
[Figure: two scatter plots of Y against X]
![Page 36: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/36.jpg)
Assumptions of Regression
1. The linear model correctly describes the functional relationship between X and Y.
![Page 37: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/37.jpg)
Assumptions of Regression
1. The linear model correctly describes the functional relationship between X and Y.
[Figure: plot of Y against X]
![Page 38: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/38.jpg)
Assumptions of Regression
2. The X variable is measured without error
[Figure: plot of Y against X]
![Page 39: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/39.jpg)
Assumptions of Regression
3. For any given value of X, the sampled Y values are independent
4. Residuals (errors) are normally distributed.
5. Variances are constant along the regression line.
![Page 40: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/40.jpg)
Multiple Linear Regression (MLR)
![Page 41: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/41.jpg)
The linear model with a single predictor variable X can easily be extended to two or more predictor variables.
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$$
![Page 42: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/42.jpg)
[Figure: overlapping regions for Y, X1 and X2]
Unique variance explained by X1
Unique variance explained by X2
Common variance explained by X1 and X2
Variance NOT explained by X1 and X2
![Page 43: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/43.jpg)
[Figure: diagram of Y, X1 and X2]
A “good” model
![Page 44: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/44.jpg)
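Partial Regression Coefficients (slopes): the regression coefficient of a predictor X after controlling for the influence of the other predictors on both X and Y (that is, holding all other predictors constant).
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$$
where $\beta_0$ is the intercept, $\beta_1, \dots, \beta_p$ are the partial regression coefficients, and $\varepsilon$ are the residuals.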
![Page 45: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/45.jpg)
The matrix algebra of
Ordinary Least Squares
Intercept and Slopes: $\hat{\beta} = (X'X)^{-1} X'Y$
Predicted Values: $\hat{Y} = X\hat{\beta}$
Residuals: $Y - \hat{Y}$
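The matrix form can be sketched with NumPy as below. The data are hypothetical (six observations, two predictors) and do not come from the slides. The sketch solves the normal equations with `np.linalg.solve` instead of forming the inverse explicitly, which gives the same estimates and is the usual numerical choice.

```python
import numpy as np

# Hypothetical data: 6 observations, 2 predictors (not from the slides).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Intercept and slopes: solve (X'X) b = X'Y.
b = np.linalg.solve(X.T @ X, X.T @ y)

# Predicted values and residuals.
y_hat = X @ b
residuals = y - y_hat

print("coefficients:", b)
print("residuals:", residuals)
```

A quick sanity check on any least-squares fit is that the residuals are orthogonal to every column of the design matrix, so `X.T @ residuals` should be numerically zero.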
![Page 46: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/46.jpg)
Regression Statistics
How good is our model?
$$SST = \sum (Y - \bar{Y})^2$$
$$SSR = \sum (\hat{Y} - \bar{Y})^2$$
$$SSE = \sum (Y - \hat{Y})^2$$
![Page 47: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/47.jpg)
Regression Statistics
$$R^2 = \frac{SSR}{SST}$$
Coefficient of Determination, used to judge the adequacy of the regression model
![Page 48: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/48.jpg)
Regression Statistics
Adjusted $R^2$ is not biased by the number of predictors!
$$R^2_{adj} = 1 - \frac{n-1}{n-k-1}\,(1 - R^2)$$
n = sample size
k = number of independent variables
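A one-function sketch of the adjustment follows. The function name `adjusted_r2` and the example inputs are hypothetical. It shows how the same R squared is penalised more heavily as the number of predictors grows relative to the sample size.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for sample size n and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)


# Same R^2 of 0.6 with 5 observations and 1 predictor ...
one_predictor = adjusted_r2(0.6, n=5, k=1)
# ... and with 5 observations and 2 predictors.
two_predictors = adjusted_r2(0.6, n=5, k=2)
print(one_predictor, two_predictors)
```

With one predictor the adjusted value is 1 - (4/3)(0.4), about 0.467; with two predictors it drops to 1 - (4/2)(0.4) = 0.2.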
![Page 49: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/49.jpg)
Regression Statistics
Standard Error for the regression model
$$SSE = \sum (Y - \hat{Y})^2$$
$$S_e^2 = \frac{SSE}{n-k-1} = MSE$$
$$S_e = \sqrt{S_e^2}$$
![Page 50: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/50.jpg)
ANOVA
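ANOVA to test significance of regression
$$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$$
$$H_A: \beta_i \neq 0 \text{ for at least one } i$$

| Source | df | SS | MS | F | P-value |
|---|---|---|---|---|---|
| Regression | k | SSR | MSR = SSR / df | MSR / MSE | P(F) |
| Residual | n-k-1 | SSE | MSE = SSE / df | | |
| Total | n-1 | SST | | | |

If P(F) < α then we know that we get significantly better prediction of Y from the regression model than by just predicting the mean of Y.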
![Page 51: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/51.jpg)
Hypothesis Tests for Regression Coefficients
$$H_0: \beta_i = 0 \qquad H_1: \beta_i \neq 0$$
$$t_{(n-k-1)} = \frac{b_i - \beta_i}{S(b_i)}$$
![Page 52: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/52.jpg)
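Hypothesis Tests for Regression Coefficients
$$H_0: \beta_i = 0 \qquad H_A: \beta_i \neq 0$$
$$t_{(n-k-1)} = \frac{b_i - \beta_i}{S(b_i)} = \frac{b_i - \beta_i}{\sqrt{S_e^2 C_{ii}}}$$
Here $C_{ii}$ is the diagonal element of $(X'X)^{-1}$ corresponding to $b_i$. Compare simple linear regression, where the variance of the slope is $S_e^2 / S_{xx}$.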
![Page 53: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/53.jpg)
Confidence Interval on Regression Coefficients
Confidence Interval for $\beta_i$
$$b_i - t_{\alpha/2,(n-k-1)}\sqrt{S_e^2 C_{ii}} \;\le\; \beta_i \;\le\; b_i + t_{\alpha/2,(n-k-1)}\sqrt{S_e^2 C_{ii}}$$
![Page 54: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/54.jpg)
$$\hat{\beta} = (X'X)^{-1} X'Y$$
![Page 55: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/55.jpg)
$$\hat{\beta} = (X'X)^{-1} X'Y$$
![Page 56: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/56.jpg)
$$\hat{\beta} = (X'X)^{-1} X'Y$$
![Page 57: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/57.jpg)
$$t_{(n-k-1)} = \frac{b_i - \beta_i}{S(b_i)} = \frac{b_i - \beta_i}{\sqrt{S_e^2 C_{ii}}}$$
![Page 58: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/58.jpg)
Diagnostic Tests For Regressions
Expected distribution of residuals for a linear model with normal distribution of residuals (errors).
[Figure: "X Residual Plot", residuals $\varepsilon_i$ (vertical axis) plotted against $X_i$ (horizontal axis)]
![Page 59: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/59.jpg)
Standardized Residuals
$$d_i = \frac{e_i}{\sqrt{S_e^2}}$$
[Figure: "Standard Residuals" plot of the standardized residuals by observation]
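As a rough sketch, standardized residuals divide each residual by the square root of the mean squared error. The data below are hypothetical, and the cut-off of 2 used to flag unusual points is an assumed rule of thumb rather than something stated in the slides.

```python
import math

# Hypothetical illustration data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Raw residuals e_i = y_i - y_hat_i.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Mean squared error with n - 2 degrees of freedom (one predictor).
s_e2 = sum(e ** 2 for e in residuals) / (n - 2)

# Standardized residuals d_i = e_i / sqrt(S_e^2).
standardized = [e / math.sqrt(s_e2) for e in residuals]

# Assumed rule of thumb: flag observations with |d_i| > 2.
flagged = [i for i, d in enumerate(standardized) if abs(d) > 2.0]
print([round(d, 3) for d in standardized], flagged)
```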
![Page 60: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/60.jpg)
Model Selection
Avoiding predictors (Xs) that do not contribute significantly to model prediction
![Page 61: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/61.jpg)
Model Selection
- Forward selection: the ‘best’ predictor variables are entered, one by one.
- Backward elimination: the ‘worst’ predictor variables are eliminated, one by one.
![Page 62: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/62.jpg)
Forward Selection
![Page 63: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/63.jpg)
Backward Elimination
![Page 64: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/64.jpg)
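Model Selection: The General Case
$$H_0: \beta_{q+1} = \beta_{q+2} = \dots = \beta_k = 0$$
$$H_1: \text{at least one is not zero}$$
$$F = \frac{\left[SSE(x_1,\dots,x_q) - SSE(x_1,\dots,x_q,x_{q+1},\dots,x_k)\right]/(k-q)}{SSE(x_1,\dots,x_q,x_{q+1},\dots,x_k)/(n-k-1)}$$
Reject $H_0$ if $F > F_{\alpha,\,k-q,\,n-k-1}$.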
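The partial F statistic compares the error sum of squares of a reduced model (the first q predictors) with that of the full model (all k predictors). The sketch below uses hypothetical data with q = 1 and k = 2, and a hypothetical helper `sse` built on `np.linalg.lstsq`. The critical value of the F distribution is not computed here and is assumed to come from a table or a statistics library.

```python
import numpy as np

# Hypothetical data: 6 observations, 2 predictors (not from the slides).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
n = len(y)
ones = np.ones(n)


def sse(X, y):
    """Error sum of squares of the least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


q, k = 1, 2
X_reduced = np.column_stack([ones, x1])      # intercept + first q predictors
X_full = np.column_stack([ones, x1, x2])     # intercept + all k predictors

sse_reduced = sse(X_reduced, y)
sse_full = sse(X_full, y)

# Extra sum of squares per added predictor, over the full-model MSE.
f_stat = ((sse_reduced - sse_full) / (k - q)) / (sse_full / (n - k - 1))
print(f"SSE reduced={sse_reduced:.4f} full={sse_full:.4f} F={f_stat:.3f}")
```

When only one predictor is added, as here, this F statistic equals the square of the t statistic for that predictor's coefficient in the full model.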
![Page 65: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/65.jpg)
Multicollinearity
The degree of correlation between Xs.
A high degree of multicollinearity produces unacceptable uncertainty (large variance) in regression coefficient estimates (i.e., large sampling variation).
- Estimates of slopes are imprecise, and even the signs of the coefficients may be misleading.
- t-tests may fail to reveal significant factors.
![Page 66: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/66.jpg)
Scatter Plot
![Page 67: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/67.jpg)
Multicollinearity
If the F-test for significance of regression is significant, but tests on the individual regression coefficients are not, multicollinearity may be present.
Variance Inflation Factors (VIFs) are very useful measures of multicollinearity. If any VIF exceeds 5, multicollinearity is a problem.
$$VIF(\beta_i) = C_{ii} = \frac{1}{1 - R_i^2}$$
where $R_i^2$ is the coefficient of determination from regressing $X_i$ on the other predictors.
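One way to sketch the VIF computation is to regress each predictor on the remaining ones and convert the resulting R squared. The function name `vif` and the data are hypothetical, and the threshold of 5 is the one quoted on the slide.

```python
import numpy as np

# Hypothetical predictors: 6 observations, 2 columns (not from the slides).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
predictors = np.column_stack([x1, x2])


def vif(X):
    """VIF for each column of X (predictors only, no intercept column)."""
    n, k = X.shape
    factors = []
    for i in range(k):
        target = X[:, i]
        # Regress predictor i on an intercept plus all other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        sse = resid @ resid
        sst = ((target - target.mean()) ** 2).sum()
        r2_i = 1.0 - sse / sst
        factors.append(1.0 / (1.0 - r2_i))
    return np.array(factors)


vifs = vif(predictors)
problem = vifs > 5.0   # threshold quoted on the slide
print(vifs, problem)
```

With exactly two predictors both VIFs are equal, since each reduces to 1 / (1 - r^2) where r is the correlation between the two columns.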
![Page 68: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/68.jpg)
Model Evaluation
Prediction Error Sum of Squares (leave-one-out)
$$PRESS = \sum_{i=1}^{n} \left(y_i - \hat{y}_{(i)}\right)^2$$
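A direct, brute-force sketch of leave-one-out PRESS for a simple linear regression is shown below: each observation is dropped in turn, the line is refitted on the rest, and the squared error of predicting the dropped point is accumulated. The data and the function names `fit_line` and `press` are hypothetical.

```python
# Hypothetical illustration data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_line(x, y):
    """Least-squares intercept and slope for Y = b0 + b1 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1


def press(x, y):
    """Sum of squared leave-one-out prediction errors."""
    total = 0.0
    for i in range(len(x)):
        # Refit without observation i.
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        b0, b1 = fit_line(x_rest, y_rest)
        # Predict the held-out point and accumulate its squared error.
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total


b0_all, b1_all = fit_line(x, y)
sse_all = sum((yi - (b0_all + b1_all * xi)) ** 2 for xi, yi in zip(x, y))
press_value = press(x, y)
print(f"SSE = {sse_all:.4f}, PRESS = {press_value:.4f}")
```

PRESS is larger than the ordinary error sum of squares because each point is predicted by a model that never saw it, which makes it a more honest measure of predictive error.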
![Page 69: Simple Linear Regression (SLR) CHE1147 Saed Sayad University of Toronto](https://reader033.vdocuments.us/reader033/viewer/2022050708/56649f3e5503460f94c5e57b/html5/thumbnails/69.jpg)
Thank You!