![Page 1: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/1.jpg)
Correlation and Regression Analysis
• Many engineering design and analysis problems involve factors that are interrelated and dependent, e.g., (1) runoff volume and rainfall; (2) evaporation, temperature, and wind speed; (3) peak discharge, drainage area, and rainfall intensity; (4) crop yield, irrigation water, and fertilizer.
• Because of the inherent complexity of system behavior and an incomplete understanding of the processes involved, the relationships among the relevant factors or variables are established empirically or semi-empirically.
• Regression analysis is a widely used statistical tool for investigating the relationship between two or more variables that are related in a non-deterministic fashion.
• If a variable Y is related to several variables X1, X2, …, XK, their relationship can be expressed, in general, as
Y = g(X1, X2, …, XK)
where g(.) = general expression for a function;
Y = Dependent (or response) variable;
X1, X2,…, XK = Independent (or explanatory) variables.
![Page 2: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/2.jpg)
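Correlation
• When a problem involves two dependent random variables, the degree of linear dependence between the two can be measured by the correlation coefficient ρ(X,Y), which is defined as
$$\rho(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \, \sigma_Y}$$
where Cov(X,Y) is the covariance between random variables X and Y, defined as
$$\mathrm{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)] = E[XY] - \mu_X \mu_Y$$
where μ and σ denote the mean and standard deviation of the subscripted variable, −∞ < Cov(X,Y) < ∞, and −1 ≤ ρ(X,Y) ≤ 1.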
• Various correlation coefficients have been developed in statistics for measuring the degree of association between random variables. The one defined above is called the Pearson product-moment correlation coefficient, or simply the correlation coefficient.
• If the two random variables X and Y are independent, then ρ(X,Y) = Cov(X,Y) = 0. However, the converse is not necessarily true.
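The point that zero correlation does not imply independence can be illustrated numerically. The following is a minimal sketch on made-up data (the values of `x` and the helper name `pearson` are assumptions for illustration, not taken from the slides): `y` is completely determined by `x` through a quadratic, yet the product-moment correlation is zero because the relationship has no linear component.

```python
import math

def pearson(xs, ys):
    """Sample product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: x is symmetric about zero, y depends on x exactly.
x = [-3, -2, -1, 0, 1, 2, 3]
y_quadratic = [v ** 2 for v in x]        # perfect nonlinear dependence
y_linear = [-2 * v + 5 for v in x]       # perfect linear dependence, opposite direction

r_nonlinear = pearson(x, y_quadratic)    # no linear association
r_linear = pearson(x, y_linear)          # perfect negative linear association

print(r_nonlinear, r_linear)
```

The symmetric choice of `x` is what makes the cross-products cancel exactly; with asymmetric data the quadratic case would generally show some nonzero linear correlation.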
![Page 3: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/3.jpg)
Cases of Correlation
• Perfectly linearly correlated in opposite direction
• Strongly and positively correlated in linear fashion
• Perfectly correlated in nonlinear fashion, but uncorrelated linearly
• Uncorrelated in linear fashion
![Page 4: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/4.jpg)
Calculation of Correlation Coefficient
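• Given a set of n paired sample observations (xi, yi), i = 1, 2, …, n, of two random variables, the sample correlation coefficient (r) can be calculated as
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
where $\bar{x}$ and $\bar{y}$ are the sample means.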
![Page 5: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/5.jpg)
Auto-correlation
• Consider the following daily stream flows (in 1000 m³) in June 2001 at Chung Mei Upper Station (610 ha), located upstream of a river feeding Plover Cove Reservoir. Determine the 1-day auto-correlation coefficient, i.e., ρ(Qt, Qt+1).

| Day (t) | Flow Q(t) | Day (t) | Flow Q(t) | Day (t) | Flow Q(t) |
|---|---|---|---|---|---|
| 1 | 8.35 | 11 | 313.89 | 21 | 20.06 |
| 2 | 6.78 | 12 | 480.88 | 22 | 17.52 |
| 3 | 6.32 | 13 | 151.28 | 23 | 116.13 |
| 4 | 17.36 | 14 | 83.92 | 24 | 68.25 |
| 5 | 191.62 | 15 | 44.58 | 25 | 280.22 |
| 6 | 82.33 | 16 | 36.58 | 26 | 347.53 |
| 7 | 524.45 | 17 | 33.65 | 27 | 771.30 |
| 8 | 196.77 | 18 | 26.39 | 28 | 124.20 |
| 9 | 785.09 | 19 | 22.98 | 29 | 58.00 |
| 10 | 562.05 | 20 | 21.92 | 30 | 44.08 |

• 29 pairs: {(Qt, Qt+1)} = {(Q1, Q2), (Q2, Q3), …, (Q29, Q30)}
• Relevant sample statistics: n = 29; $\bar{Q}_t = 186.22$; $S_{Q_t} = 230.06$; $\bar{Q}_{t+1} = 187.45$; $S_{Q_{t+1}} = 229.17$
• The 1-day auto-correlation is 0.439.
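The sample statistics quoted for this example can be checked with a short script. This is a sketch rather than the original calculation: it applies the product-moment formula to the 29 lagged pairs and, since the slides do not state which divisor was used for the cross-product term, it reports the coefficient for both the n − 1 and the n conventions. The two values differ by exactly the factor (n − 1)/n, and the n convention appears to correspond to the quoted value. Variable names are illustrative.

```python
import math

# Daily flows (1000 m^3), June 2001, days 1..30, copied from the example data.
flows = [
    8.35, 6.78, 6.32, 17.36, 191.62, 82.33, 524.45, 196.77, 785.09, 562.05,
    313.89, 480.88, 151.28, 83.92, 44.58, 36.58, 33.65, 26.39, 22.98, 21.92,
    20.06, 17.52, 116.13, 68.25, 280.22, 347.53, 771.30, 124.20, 58.00, 44.08,
]

def mean(v):
    return sum(v) / len(v)

def sample_std(v):
    """Sample standard deviation with the n - 1 divisor."""
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

q_t = flows[:-1]     # Q_1 .. Q_29
q_t1 = flows[1:]     # Q_2 .. Q_30
n = len(q_t)         # 29 pairs

m_t, m_t1 = mean(q_t), mean(q_t1)
s_t, s_t1 = sample_std(q_t), sample_std(q_t1)
cross = sum((a - m_t) * (b - m_t1) for a, b in zip(q_t, q_t1))

r_pearson = cross / ((n - 1) * s_t * s_t1)   # n - 1 divisor throughout
r_n_divisor = cross / (n * s_t * s_t1)       # cross-products averaged over n

print(f"means: {m_t:.2f}, {m_t1:.2f}; std devs: {s_t:.2f}, {s_t1:.2f}")
print(f"lag-1 autocorrelation: {r_pearson:.3f} (n-1), {r_n_divisor:.3f} (n)")
```

Larger lags can be examined the same way by pairing `flows[:-k]` with `flows[k:]`, at the cost of fewer pairs per estimate.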
![Page 6: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/6.jpg)
[Figure: Chung Mei Upper Daily Flow. x-axis: Day; y-axis: Flow (1000 cubic meters)]
[Figure: Autocorrelation for June 2001 Daily Flows at Chung Mei Upper, HK. x-axis: Time lags (Days); y-axis: Autocorrelation]
[Figure: Scatter plot of Q(t+1) against Q(t), both in 1000 m^3]
![Page 7: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/7.jpg)
Regression Models
• Due to the presence of uncertainties, a deterministic functional relationship is generally not very appropriate or realistic.
• The deterministic model form can be modified to account for uncertainties in the model as
Y = g(X1, X2, …, XK) + ε
where ε = model error term with E(ε) = 0, Var(ε) = σ².
• In engineering applications, functional forms commonly used for establishing empirical relationships are
– Additive: Y = β0 + β1X1 + β2X2 + … + βKXK + ε
– Multiplicative: $Y = \beta_0 X_1^{\beta_1} X_2^{\beta_2} \cdots X_K^{\beta_K}$
![Page 8: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/8.jpg)
Least Square Method
Suppose that there are n pairs of data, {(xi, yi)}, i = 1, 2, …, n, and a plot of these data appears as
[Figure: scatter plot of y against x]
What is a plausible mathematical model describing the relation between x and y?
![Page 9: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/9.jpg)
Least Square Method
Consider an arbitrary straight line, ŷ = β0 + β1 x, to be fitted through these data points. The question is: which line is the most representative?
[Figure: data points and the line ŷ = β0 + β1 x, with intercept β0 and slope β1. At each xi, the vertical distance between the observed yi and the fitted ŷi is the error (residual) ei = yi − ŷi]
![Page 10: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/10.jpg)
Least Square Criterion
• What are the values of β0 and β1 such that the resulting line “best” fits the data points?
• But wait! What goodness-of-fit criterion should be used to choose among all possible combinations of β0 and β1?
• The least squares (LS) criterion states that the sum of the squares of the errors (or residuals, deviations) should be a minimum. Mathematically, the LS criterion can be written as:
$$\text{Minimize } D = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
• Are there any other criteria that could be used?
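To make the criterion concrete, the sketch below evaluates the sum of squared errors D for two candidate lines on a small made-up data set, alongside the sum of absolute errors, which is one well-known alternative criterion. The data, the candidate coefficients, and the function names are hypothetical and serve only to illustrate the comparison.

```python
def sum_sq_errors(b0, b1, xs, ys):
    """D(b0, b1): sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

def sum_abs_errors(b0, b1, xs, ys):
    """Alternative criterion: sum of absolute residuals."""
    return sum(abs(y - (b0 + b1 * x)) for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Two arbitrary candidate lines given as (b0, b1).
candidates = {"line A": (0.0, 2.0), "line B": (1.0, 1.5)}

scores = {
    name: (sum_sq_errors(b0, b1, xs, ys), sum_abs_errors(b0, b1, xs, ys))
    for name, (b0, b1) in candidates.items()
}
for name, (d_sq, d_abs) in scores.items():
    print(f"{name}: D = {d_sq:.3f}, sum |e| = {d_abs:.3f}")
```

Under either criterion the line with the smaller score fits these points better; the LS criterion has the practical advantage that its minimizer can be found in closed form.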
![Page 11: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/11.jpg)
Normal Equations for LS Criterion
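• The necessary conditions for the minimum value of D are:
$$\frac{\partial D}{\partial \beta_0} = 0 \quad \text{and} \quad \frac{\partial D}{\partial \beta_1} = 0$$
• Expanding the above equations:
$$\frac{\partial D}{\partial \beta_0} = 2\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)(-1) = 0$$
$$\frac{\partial D}{\partial \beta_1} = 2\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)(-x_i) = 0$$
which simplify to
$$\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0 \;\Rightarrow\; \sum_{i=1}^{n} y_i - n\beta_0 - \beta_1 \sum_{i=1}^{n} x_i = 0$$
$$\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)\,x_i = 0 \;\Rightarrow\; \sum_{i=1}^{n} x_i y_i - \beta_0 \sum_{i=1}^{n} x_i - \beta_1 \sum_{i=1}^{n} x_i^2 = 0$$
• Normal equations:
$$n\beta_0 + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
$$\beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$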
![Page 12: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/12.jpg)
LS Solution (2 Unknowns)
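$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2} = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2}$$
$$\hat{\beta}_0 = \frac{\sum_{i=1}^{n} y_i}{n} - \hat{\beta}_1 \frac{\sum_{i=1}^{n} x_i}{n} = \bar{y} - \hat{\beta}_1 \bar{x}$$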
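The two-unknown LS solution, slope = (Σxy − n·x̄·ȳ)/(Σx² − n·x̄²) and intercept = ȳ − slope·x̄, translates directly into code. The sketch below is a minimal illustration on made-up data; the function name `fit_line` and the numbers are assumptions, not part of the original example.

```python
def fit_line(xs, ys):
    """Return (b0_hat, b1_hat) for y = b0 + b1 * x using the closed-form LS solution."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope estimate.
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0_hat, b1_hat = fit_line(xs, ys)
print(f"fitted line: y = {b0_hat:.3f} + {b1_hat:.3f} x")
```

For data with large x values, the centered form Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² is algebraically identical and tends to lose less precision to rounding.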
![Page 13: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/13.jpg)
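Fitting a Polynomial Eq. by LS Method
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_k x_i^k + \varepsilon_i, \quad i = 1, 2, \ldots, n$$
LS criterion:
$$\text{minimize } D = \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i - \beta_2 x_i^2 - \cdots - \beta_k x_i^k\right)^2$$
Set $\partial D / \partial \beta_j = 0$ for $j = 0, 1, 2, \ldots, k$.
Normal Equations are:
$$\begin{aligned}
n\beta_0 + \beta_1 \sum_{i=1}^{n} x_i + \cdots + \beta_k \sum_{i=1}^{n} x_i^k &= \sum_{i=1}^{n} y_i \\
\beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 + \cdots + \beta_k \sum_{i=1}^{n} x_i^{k+1} &= \sum_{i=1}^{n} x_i y_i \\
&\;\;\vdots \\
\beta_0 \sum_{i=1}^{n} x_i^k + \beta_1 \sum_{i=1}^{n} x_i^{k+1} + \cdots + \beta_k \sum_{i=1}^{n} x_i^{2k} &= \sum_{i=1}^{n} x_i^k y_i
\end{aligned}$$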
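The polynomial normal equations form a (k+1) × (k+1) linear system whose coefficients are power sums of x. The sketch below assembles that system and solves it with a small Gaussian-elimination routine so that it needs no third-party library; the helper names and the noise-free test data are assumptions for illustration. For high degrees or widely scaled x this system becomes ill-conditioned, so the approach is meant as a demonstration rather than a robust solver.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; rows are copied so the inputs stay unchanged.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_polynomial(xs, ys, k):
    """Return [b0, b1, ..., bk] of the degree-k LS polynomial."""
    # Row j of the normal equations: sum_m b_m * sum_i x_i^(j+m) = sum_i x_i^j * y_i
    a = [[sum(x ** (j + m) for x in xs) for m in range(k + 1)] for j in range(k + 1)]
    b = [sum((x ** j) * y for x, y in zip(xs, ys)) for j in range(k + 1)]
    return solve_linear_system(a, b)

# Hypothetical noise-free quadratic: y = 1 - 2x + 0.5x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1.0 - 2.0 * x + 0.5 * x ** 2 for x in xs]

coeffs = fit_polynomial(xs, ys, 2)
print(coeffs)
```

Because the data here contain no error term, the fitted coefficients should reproduce the generating polynomial up to rounding.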
![Page 14: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/14.jpg)
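Fitting a Linear Function of Several Variables
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
LS criterion:
$$\text{Minimize } D = \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik}\right)^2$$
Set $\partial D / \partial \beta_j = 0$ for $j = 0, 1, 2, \ldots, k$.
Normal equations:
$$\begin{aligned}
n\beta_0 + \beta_1 \sum_{i=1}^{n} x_{i1} + \cdots + \beta_k \sum_{i=1}^{n} x_{ik} &= \sum_{i=1}^{n} y_i \\
\beta_0 \sum_{i=1}^{n} x_{i1} + \beta_1 \sum_{i=1}^{n} x_{i1}^2 + \cdots + \beta_k \sum_{i=1}^{n} x_{i1} x_{ik} &= \sum_{i=1}^{n} x_{i1} y_i \\
&\;\;\vdots \\
\beta_0 \sum_{i=1}^{n} x_{ik} + \beta_1 \sum_{i=1}^{n} x_{ik} x_{i1} + \cdots + \beta_k \sum_{i=1}^{n} x_{ik}^2 &= \sum_{i=1}^{n} x_{ik} y_i
\end{aligned}$$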
![Page 15: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/15.jpg)
Matrix Form of Multiple Regression by LS
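$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1k} \\
1 & x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nk}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} +
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$
(Note: $x_{ij}$ = i-th observation of the j-th independent variable)
or y = Xβ + ε in short.
LS criterion is:
$$\min_{\boldsymbol{\beta}} D = \sum_{i=1}^{n} \varepsilon_i^2 = \boldsymbol{\varepsilon}'\boldsymbol{\varepsilon} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$$
Set $\partial D / \partial \boldsymbol{\beta} = \mathbf{0}$, which results in:
$$\mathbf{X}'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0}$$
The LS solutions are:
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$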
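The step from the criterion to the solution can be written out explicitly. The following is a short worked derivation, assuming X has full column rank so that X'X is invertible (an assumption not stated on the slide):

$$\begin{aligned}
D &= (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})
   = \mathbf{y}'\mathbf{y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} \\
\frac{\partial D}{\partial \boldsymbol{\beta}} &= -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\boldsymbol{\beta} \\
\frac{\partial D}{\partial \boldsymbol{\beta}}\bigg|_{\hat{\boldsymbol{\beta}}} = \mathbf{0}
  &\;\Rightarrow\; \mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}
  \;\Rightarrow\; \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}
\end{aligned}$$

The middle expression, X'Xβ̂ = X'y, is the set of normal equations in matrix form; with a single independent variable it reduces to the two scalar normal equations of the straight-line case.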
![Page 16: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/16.jpg)
Measure of Goodness-of-Fit
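R² = Coefficient of Determination
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \hat{\varepsilon}_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
where $\hat{\varepsilon}_i = y_i - \hat{y}_i$ is the i-th residual.
= 1 − proportion of variation in the dependent variable, y, unexplained by the regression equation;
= proportion of variation in the dependent variable, y, explained by the regression equation.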
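As a quick numerical illustration, the coefficient of determination can be computed from the observations and the fitted values as 1 minus the ratio of the residual sum of squares to the total sum of squares about the mean. The sketch below uses made-up data and a line fitted to those same data; all names and numbers are hypothetical.

```python
def r_squared(ys, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, y_hat))   # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)              # total variation about the mean
    if sst == 0:
        raise ValueError("y is constant; R^2 is undefined")
    return 1.0 - sse / sst

# Hypothetical data and a straight line fitted to them by least squares.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat = [0.05 + 1.99 * x for x in xs]

r2 = r_squared(ys, y_hat)
print(f"R^2 = {r2:.4f}")
```

A perfect fit gives R² = 1, while predicting the sample mean for every observation gives R² = 0.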
![Page 17: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/17.jpg)
Example 1 (LS Method)
![Page 18: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/18.jpg)
Example 1 (LS Method)
![Page 19: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/19.jpg)
LS Example
![Page 20: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/20.jpg)
LS Example (Matrix Approach)
![Page 21: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/21.jpg)
LS Example (by Minitab w/ β0)
![Page 22: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/22.jpg)
LS Example (by Minitab w/o β0)
![Page 23: Correlation and Regression Analysis Many engineering design and analysis problems involve factors that are interrelated and dependent. E.g., (1) runoff](https://reader036.vdocuments.us/reader036/viewer/2022062511/5517bdd355034616658b4690/html5/thumbnails/23.jpg)
LS Example (Output Plots)