31. regression analysis 1-1-11
TRANSCRIPT
8/2/2019 31. Regression Analysis 1-1-11
http://slidepdf.com/reader/full/31-regression-analysis-1-1-11 1/27
Regression Analysis
Least-Squares Linear Regression
Enables the fit of a linear or exponential function to data.
The goal in regression analysis is the development of a statistical model that can be used to predict the values of a dependent or response variable from the values of the independent variable(s).
Linear fits are the most common. For exponential functions, the data must be transformed first.
Method of Least Squares
If we have N pairs of data (xi, yi), we seek to fit a straight line through the data of the form:

y = a0 + a1·x

Determine the constants a0 and a1 such that the distance between the actual y data and the fitted/predicted line is minimized. Each xi is assumed to be error-free; all the error is assumed to be in the y values.

a1 = [ N·Σxiyi − Σxi·Σyi ] / [ N·Σxi² − (Σxi)² ]

a0 = [ Σyi·Σxi² − Σxi·Σxiyi ] / [ N·Σxi² − (Σxi)² ]
Manual Calculation Method

Seeking an equation with the form y = a0 + a1·x; the result is y = 0.879 + 0.540·x.

Raw Data

 yi     xi     xi·yi    xi²
 1.2    1      1.2      1
 2      1.6    3.2      2.56
 2.4    3.4    8.16     11.56
 3.5    4      14       16
 3.5    5.2    18.2     27.04
Sum: 12.6   15.2   44.76    58.16

a0 = (12.6·58.16 − 15.2·44.76) / (5·58.16 − 15.2²) = 0.879

a1 = (5·44.76 − 15.2·12.6) / (5·58.16 − 15.2²) = 0.540
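The manual calculation above can be reproduced with a short script (a minimal sketch, not part of the original slides; `least_squares` is a hypothetical helper name):

```python
# Normal-equation formulas from the "Method of Least Squares" slide,
# checked against the worked example above.
def least_squares(x, y):
    """Return (a0, a1) for the fit y = a0 + a1*x."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    denom = n * sxx - sx ** 2
    a1 = (n * sxy - sx * sy) / denom
    a0 = (sy * sxx - sx * sxy) / denom
    return a0, a1

x = [1, 1.6, 3.4, 4, 5.2]
y = [1.2, 2, 2.4, 3.5, 3.5]
a0, a1 = least_squares(x, y)
print(round(a0, 3), round(a1, 3))  # 0.878 0.54 (the slide rounds a0 to 0.879)
```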
How good is the fit?
The Coefficient of Determination (R²) measures the goodness of fit: the proportion of the variation of the y values associated with the variation in the x variable in the regression. It is the ratio of the explained variation to the total variation.

R² = 1: perfect fit (good prediction)
R² = 0: no correlation between x and y

For engineering data, R² will normally be quite high (0.8-0.9 or higher). A low value might indicate that some important variable was not considered but is affecting the results.

R² = 1 − Σ(a·xi + b − yi)² / Σ(yi − ȳ)²

Excel function: RSQ(yi's, xi's)

where ȳ = average of the yi's
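The definition above can be checked directly (a minimal sketch, using the rounded a0, a1 from the manual-calculation example):

```python
# R^2 = 1 - (residual variation)/(total variation), per the slide.
x = [1, 1.6, 3.4, 4, 5.2]
y = [1.2, 2, 2.4, 3.5, 3.5]
a0, a1 = 0.8779, 0.5402          # fit from the manual-calculation slide
y_bar = sum(y) / len(y)
ss_res = sum((a0 + a1 * xi - yi) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot
print(round(r2, 3))  # 0.883
```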
Standard Error of Estimate (SEE)

The standard error of estimate (SEE or Syx) is a statistical measure of how well the best-fit line represents the data. It is, effectively, the standard deviation of the differences between the data points and the best-fit line, and it provides an estimate of the scatter/random error in the data about the fitted line. This is analogous to the standard deviation for sample data, and it has the same units as y.

Two degrees of freedom are lost to calculate the coefficients a0 and a1:

sey = SEE = Syx = sqrt[ Σ(yi − ŷi)² / (N − 2) ]

Excel function: STEYX(yi's, xi's)

where
yi = actual value of y for a given xi
ŷi = predicted value of y for a given xi
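A minimal sketch of the SEE calculation, again using the rounded fit from the manual-calculation example:

```python
import math

# SEE = sqrt( sum (y_i - yhat_i)^2 / (N - 2) ); two degrees of freedom are
# lost because a0 and a1 were estimated from the same data.
x = [1, 1.6, 3.4, 4, 5.2]
y = [1.2, 2, 2.4, 3.5, 3.5]
a0, a1 = 0.8779, 0.5402
see = math.sqrt(sum((yi - (a0 + a1 * xi)) ** 2
                    for xi, yi in zip(x, y)) / (len(x) - 2))
print(round(see, 3))  # 0.392
```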
Linear Regression Assumptions
Variation in the data is assumed to be normally distributed and due to random causes.
Random variation is assumed to exist in the y values, while the x values are error-free.
Since error has been minimized in the y direction, an erroneous conclusion may be made if x is estimated based on a value for y.
For power-law or exponential relationships, the data needs to be transformed before carrying out linear regression analysis.
(As we will discuss later, the method of least squares can also be applied to nonlinear functional relationships.)
Linear Regression Example
Trendline: y = 0.9977x + 0.0295, R² = 0.9993

[Figure: output (volts) vs. length (cm), 0.00-3.00 on both axes, with linear trendline.]

Use Excel:
Chart >> Add Trendline to obtain the coefficients
Functions RSQ() and STEYX() to determine R² and SEE
Regression Analysis using Excel Analysis Tools
Linear regression is a standard feature of statistical programs and most spreadsheet programs. It is only necessary to input the x and y data; the remaining calculations are performed immediately.

Excel "Regression Analysis" macro:
Performs linear regression only
Non-linear relationships must be transformed
Calculates the slope, intercept, SEE, and the upper and lower confidence intervals for the slope and intercept
Does not produce any graphical output on the user's plot
Does not update automatically
The user must interpret the results
Linear Regression in Excel 2008
Torque, N-m (Y)   RPM (X)   Y Predicted    Residual       Residual/SEE = Residual/sey
4.89              100       4.998433207     0.108433207    0.17558474
4.77              201       4.559896053    -0.210103947   -0.340219088
3.79              298       4.138726707     0.348726707    0.564689451
3.76              402       3.687163697    -0.072836303   -0.117943051
2.84              500       3.261652399     0.421652399    0.682777249
4.12              601       2.823115245    -1.296884755   -2.100031702   ← Outlier
2.05              699       2.397603947     0.347603947    0.562871377
1.61              799       1.963408745     0.353408745    0.572271025

=LINEST(A2:A9,B2:B9,TRUE,TRUE) output:
m1 = -0.004341952    b   = 5.432628409
se1 = 0.000954031    seb = 0.481645161
r²  = 0.775391233    sey = 0.617554846
F   = 20.71311576    df  = 6

Y = m1·X + b
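The first row of the LINEST output (slope m1 and intercept b) can be reproduced in plain Python (a minimal sketch, not part of the slides):

```python
# Least-squares slope and intercept for the torque-vs-RPM data above.
torque = [4.89, 4.77, 3.79, 3.76, 2.84, 4.12, 2.05, 1.61]   # Y
rpm    = [100, 201, 298, 402, 500, 601, 699, 799]           # X
n = len(rpm)
sx, sy = sum(rpm), sum(torque)
sxy = sum(x * y for x, y in zip(rpm, torque))
sxx = sum(x * x for x in rpm)
m1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
b = (sy - m1 * sx) / n
print(round(m1, 7), round(b, 5))  # -0.004342 5.43263
```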
Linear Regression Example: Omit Outlier
Torque, N-m (Y)   RPM (X)   Y Predicted    Residual       Residual/SEE = Residual/sey
4.89              100       5.000219168     0.110219168    0.504559919
4.77              201       4.504157858    -0.265842142   -1.21696881
3.79              298       4.02774254      0.23774254     1.088334807
3.76              402       3.516946736    -0.243053264   -1.112646171
2.84              500       3.03561992      0.19561992     0.895506407
2.05              699       2.058231795     0.008231795    0.037683406
1.61              799       1.567081983    -0.042918017   -0.196469559

LINEST output (outlier omitted):
m1     = -0.004911498    b        = 5.49136898
se1    = 0.000348477     seb      = 0.170606738
r²     = 0.975447633     sey      = 0.218446143
F      = 198.6463557     df       = 5
ss_reg = 9.479149271     ss_resid = 0.238593586
Uncertainties on Regression
Confidence Interval for the Regression Line
SEE = sey = 0.218446143
TINV(α=0.05, ν=5) = 2.570581835
95% C.I. = TINV(α=0.05, ν=5)·SEE/SQRT(7) = 0.212239784

Prediction Band for the Regression Line
95% P.I. = TINV(α=0.05, ν=5)·SEE = 0.561533687

Uncertainty in Slope
Δm1 = TINV(0.05,5)·se1 = 0.000895789

Uncertainty in Intercept
Δb = TINV(0.05,5)·seb = 0.438558582
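These interval calculations can be sketched in plain Python. The t-value is taken from the slide (Excel's two-tailed TINV(0.05, 5)); with SciPy it could equivalently be computed as `scipy.stats.t.ppf(1 - 0.05/2, 5)`:

```python
# Interval calculations from the slide, using the outlier-free fit (n = 7).
sey = 0.218446143                 # SEE from the outlier-free fit
se1, seb = 0.000348477, 0.170606738
tval = 2.570581835                # TINV(0.05, 5), from the slide
ci_line = tval * sey / 7 ** 0.5   # 95% C.I. on the regression line
pb_line = tval * sey              # 95% prediction band
dm1 = tval * se1                  # uncertainty in the slope
db = tval * seb                   # uncertainty in the intercept
print(round(ci_line, 4), round(pb_line, 4))  # 0.2122 0.5615
```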
Regression Line Confidence Intervals & Prediction Band
Not only do you want to obtain a curve-fit relationship, but you also want to establish a confidence interval in the equation, i.e., a measure of the random uncertainty in the curve fit.

ν = N − 2 in the determination of the t-value. Two degrees of freedom are lost because m1 and b are determined.

CI = ŷ ± t(α,ν)·SEE/√N = ± t(α,ν)·Syx/√N = ± t(α,ν)·sey/√N

where t(α,ν) = TINV(α, ν) (two-sided t-table), α = 1 − P

PB = ŷ ± t(α,ν)·SEE = ± t(α,ν)·Syx = ± t(α,ν)·sey
[Figure: torque (N-m) vs. RPM (0-1000), showing the data, the least-squares fit, the ±95% confidence interval, and the ±95% prediction band.]
Regression Line Confidence Interval & Prediction Band
CI in curve fit = ± t(α/2, n−2)·sey·sqrt[ 1/n + (x* − x̄)²/Sxx ]  ≈  ± t(α/2, n−2)·sey/√n

Prediction band = ± t(α/2, n−2)·sey·sqrt[ (n+1)/n + (x* − x̄)²/Sxx ]  ≈  ± t(α/2, n−2)·sey

The exact (left-hand) forms are more accurate: they are at a minimum at the mean x and flare out at the low and high extremes. The right-hand forms are approximate.
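The pointwise half-widths above can be sketched as follows (a minimal illustration; `band_halfwidths` is a hypothetical helper name, not from the slides):

```python
import math

# Exact (pointwise) interval half-widths: the factor
# sqrt(1/n + (x* - x_bar)^2/Sxx) is minimal at the mean x and grows
# toward the extremes; (n+1)/n replaces 1/n for the prediction band.
def band_halfwidths(x, sey, tval, x_star):
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    g = (x_star - x_bar) ** 2 / sxx
    ci = tval * sey * math.sqrt(1 / n + g)          # confidence interval
    pb = tval * sey * math.sqrt((n + 1) / n + g)    # prediction band
    return ci, pb

# RPM values from the outlier-free torque example; tval = TINV(0.05, 5)
x = [100, 201, 298, 402, 500, 699, 799]
ci_mid, pb_mid = band_halfwidths(x, 0.218446143, 2.570581835, sum(x) / len(x))
ci_end, pb_end = band_halfwidths(x, 0.218446143, 2.570581835, 799)
print(ci_mid < ci_end, pb_mid < pb_end)  # True True
```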
Summations Used in Statistics & Regression
Variable                                          Expression

Sample standard deviation                         Sx = [ (1/(N−1))·Σ(xi − x̄)² ]^(1/2)

Sum of squares used in regression
analysis for evaluating CI & PI                   Sxx = Σ(xi − x̄)²

Standard error of estimate                        sey = SEE = Syx = [ Σ(yi − ŷi)² / (N − 2) ]^(1/2)
                                                  where ŷi is the value predicted at x = xi
CI in slope and intercept
Slope, m:      CI in slope = ± t(α/2, ν)·se1
Intercept, b:  CI in intercept = ± t(α/2, ν)·seb

Note 1: ν = n − 2.
Note 2: m and b are not independent variables. Therefore, do not apply RSS to y = mx + b to determine Δy; instead, use the CI for the curve fit.
Outliers in x-y Data Sets
The method involves computing the ratio of the residuals (predicted − actual) to the standard error of estimate (sey = SEE):

1. Residuals = ypredicted − yactual at each xi.
2. Plot the ratio residual/SEE for each xi. These are the "standardized residuals".
3. Standardized residuals exceeding ±2 may be considered outliers. Assuming the residuals are normally distributed, you can expect 95% of them to be in the range ±2 (that is, within 2 standard deviations of the best-fit line).
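The steps above can be sketched as follows (a minimal illustration, applied to the full torque data set from the earlier LINEST slide):

```python
# Standardized residuals = (predicted - actual)/sey; flag points beyond +/-2.
torque = [4.89, 4.77, 3.79, 3.76, 2.84, 4.12, 2.05, 1.61]
rpm = [100, 201, 298, 402, 500, 601, 699, 799]
m1, b, sey = -0.004341952, 5.432628409, 0.617554846   # LINEST results
std_resid = [((m1 * x + b) - y) / sey for x, y in zip(rpm, torque)]
outliers = [x for x, r in zip(rpm, std_resid) if abs(r) > 2]
print(outliers)  # [601]
```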
Linear Regression with Data Transformation
Data Transformation
Commonly, test data do not show an approximately linear relationship between the dependent (Y) and independent (X) variables, and a direct linear regression is not useful.
The form of the relationship expected between the dependent and independent variables is often known.
The data needs to be transformed prior to performing a linear regression.
Transformations can often be accomplished by taking the logarithms (or natural logarithms) of one or both sides of the equation.
Common Transformations
Relationship    Plot Method                              Transformed Intercept, b    Transformed Slope, m1

y = α·x^γ       Log y vs. Log x (log-log plot):          Log(α)                      γ
                Log(y) = Log(α) + γ·Log(x)
                Ln y vs. Ln x (log-log paper):           Ln(α)                       γ
                Ln(y) = Ln(α) + γ·Ln(x)

y = α·e^(γx)    Log y vs. x (semi-log plot):             Log(α)                      γ·Log(e)
                Log(y) = Log(α) + γ·Log(e)·x
                Ln y vs. x (semi-log plot):              Ln(α)                       γ
                Ln(y) = Ln(α) + γ·x
Regression with Transformation Example

A velocity probe provides a voltage output that is related to velocity, U, by the form E = δ + ε·U^ρ, where δ, ε, and ρ are constants.

U (ft/s)   Ei (V)
0          3.19
10         3.99
20         4.3
30         4.48
40         4.65

[Figure: output voltage (VDC) vs. velocity (ft/s) on linear axes, and the same data on log-log axes.]
Data Relationship Transformation
E = δ + ε·U^ρ   (E = δ = 3.19 at U = 0)

Log(E − 3.19) = Log(ε·U^ρ)
Log(E − 3.19) = Log(ε) + Log(U^ρ) = Log(ε) + ρ·Log(U)

Let's transform the data:

U (ft/s)   Ei (V)   X = Log(U)   Y = Log(Ei − 3.19)
0          3.19
10         3.99     1.00         -0.097
20         4.3      1.30          0.045
30         4.48     1.48          0.111
40         4.65     1.60          0.164

Perform the regression on the transformed data:
Y = m1·X + b
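The transformed fit can be sketched directly (a minimal illustration, not part of the slides):

```python
import math

# X = log10(U), Y = log10(E - 3.19); ordinary least squares on (X, Y).
# The slope is rho and 10**intercept is epsilon in E = 3.19 + eps*U**rho.
U = [10, 20, 30, 40]
E = [3.99, 4.3, 4.48, 4.65]
X = [math.log10(u) for u in U]
Y = [math.log10(e - 3.19) for e in E]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
m1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
     sum((x - x_bar) ** 2 for x in X)
b = y_bar - m1 * x_bar
rho, eps = m1, 10 ** b
print(round(rho, 3), round(b, 3))  # 0.432 -0.525
```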
Solution (Excel 2004 Output)
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.998723855
R Square            0.997449339
Adjusted R Square   0.996174009
Standard Error      0.01
Observations        4

(t value = 3.18, t·SEE = 0.02)

ANOVA
             df   SS            MS         F          Significance F
Regression   1    0.038118269   0.038118   782.1106   0.00127614
Residual     2    9.74754E-05   4.87E-05
Total        3    0.038215745

              Coefficients   Standard Error   t Stat     P-value    Lower 95%     Upper 95%
Intercept     -0.525         0.021056315      -24.9274   0.001605   -0.61547736   -0.4342812
X Variable 1   0.432         0.015438034       27.96624  0.001276    0.36531922    0.49816831

t(α,ν) = TINV(0.05, 2) = 4.3026
Y = -0.525 + 0.432·X
SEE = 0.0070
Regression with Transformation & Uncertainty

Example 4.10

[Figure: E (V) vs. U (ft/s), 0-50 ft/s, showing the data, the fitted curve, and the uncertainty bounds.]

E = 3.19 + 0.298·U^0.432

b = Log(ε) = -0.525, so ε = 10^b = 0.298

Y predicted   Y+        Y-        Transformed back:   E      E+     E-
                                                      3.19   3.19   3.19
-0.0931       -0.0781   -0.1082                       4.00   4.03   3.97
 0.0368        0.0519    0.0218                       4.28   4.32   4.24
 0.1129        0.1279    0.0978                       4.49   4.53   4.44
 0.1668        0.1818    0.1518                       4.66   4.71   4.61
Multiple and Polynomial Regression
Regression analysis can also be performed in situations where there is more than one independent variable (multiple regression) or for polynomials of an independent variable (polynomial regression).

Polynomial regression seeks the form:
Y = b + m1·x + m2·x² + … + mk·x^k

Multiple regression seeks a function of the form:
Y = b + m1·x̂1 + m2·x̂2 + m3·x̂3 + … + mk·x̂k

where x̂ may represent several independent variables. For example:
x̂1 = x1,  x̂2 = x2,  x̂3 = x1·x2
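The multiple-regression form above amounts to ordinary least squares on a design matrix whose columns are the chosen x̂ terms. A minimal sketch with made-up data (the coefficients 2, 3, -1.5, 0.5 are illustrative assumptions, not from the slides):

```python
import numpy as np

# Multiple regression Y = b + m1*x1 + m2*x2 + m3*(x1*x2) via least squares:
# the design matrix columns are the x-hat terms from the slide.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 1, 50)
x2 = rng.uniform(0, 1, 50)
y = 2.0 + 3.0 * x1 - 1.5 * x2 + 0.5 * x1 * x2          # exact, no noise
A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
# coef recovers b = 2, m1 = 3, m2 = -1.5, m3 = 0.5
print(coef)
```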
Linear Regression in Excel 2004
Input the result values (the dependent variable)
Input the desired confidence level
Input the independent variable
Excel 2004 Linear Regression Output
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.99964308
R Square            0.99928628     ← R²
Adjusted R Square   0.99910785
Standard Error      0.02788582     ← SEE = sey
Observations        6              ← N

ANOVA
             df   SS           MS           F            Significance F
Regression   1    4.35502286   4.35502286   5600.45805   1.9107E-07
Residual     4    0.00311048   0.00077762
Total        5    4.35813333

              Coefficients   Standard Error   t Stat       P-value      Lower 95%     Upper 95%
Intercept     0.02952381     0.02018228       1.46285828   0.21733392   -0.02651117   0.08555879
X Variable 1  0.99771429     0.01333197       74.8362082   1.9107E-07    0.9606988    1.03472978

The Intercept coefficient is "b"; the X Variable 1 coefficient is the slope "m1".

The Lower 95% and Upper 95% columns are the bounds for the coefficients. To obtain the ± bound, simply subtract the lower from the upper and divide by two.