Regression Analysis

Page 1:

Regression Analysis


Page 2:

Least-Squares Linear Regression

- Enables fitting a linear or exponential function to data.
- The goal in regression analysis is the development of a statistical model that can be used to predict the values of a dependent or response variable from the values of the independent variable(s).
- Linear fits are the most common.
- For exponential functions, the data must be transformed.

Page 3:

Method of Least Squares

- If we have N pairs of data (xi, yi), we seek to fit a straight line through the data of the form:

y = a_0 + a_1 x

- Determine the constants a0 and a1 such that the distance between the actual y data and the fitted/predicted line is minimized. Each xi is assumed to be error free; all the error is assumed to be in the y values.

a_0 = \frac{\sum x_i \sum x_i y_i - \sum x_i^2 \sum y_i}{\left(\sum x_i\right)^2 - N\sum x_i^2}

a_1 = \frac{\sum x_i \sum y_i - N\sum x_i y_i}{\left(\sum x_i\right)^2 - N\sum x_i^2}

Page 4:

Manual Calculation Method 

- Seeking an equation of the form y = a0 + a1x; result: y = 0.879 + 0.540x

Raw Data

        yi      xi      xi·yi    xi²
        1.2     1       1.2      1
        2       1.6     3.2      2.56
        2.4     3.4     8.16     11.56
        3.5     4       14       16
        3.5     5.2     18.2     27.04
Sum     12.6    15.2    44.76    58.16

a_0 = \frac{(15.2)(44.76) - (58.16)(12.6)}{(15.2)^2 - (5)(58.16)} = 0.879

a_1 = \frac{(15.2)(12.6) - (5)(44.76)}{(15.2)^2 - (5)(58.16)} = 0.540
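As a quick numerical check of the hand calculation above, here is a minimal Python sketch (numpy assumed available; variable names are illustrative) that evaluates the same closed-form sums:

    import numpy as np

    # Data from the table above
    x = np.array([1.0, 1.6, 3.4, 4.0, 5.2])
    y = np.array([1.2, 2.0, 2.4, 3.5, 3.5])
    N = len(x)

    Sx, Sy = x.sum(), y.sum()                # 15.2, 12.6
    Sxy, Sxx = (x * y).sum(), (x**2).sum()   # 44.76, 58.16

    denom = Sx**2 - N * Sxx
    a0 = (Sx * Sxy - Sxx * Sy) / denom       # ~0.879
    a1 = (Sx * Sy - N * Sxy) / denom         # ~0.540
    print(a0, a1)                            # matches y = 0.879 + 0.540x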

Page 5:

How good is the fit? 

The Coefficient of Determination (R²) measures the goodness of fit: the proportion of the variation in the y values associated with the variation in the x variable in the regression, i.e., the ratio of the explained variation to the total variation.

- R² = 1: perfect fit (good prediction)
- R² = 0: no correlation between x and y
- For engineering data, R² will normally be quite high (0.8-0.90 or higher).
- A low value might indicate that some important variable was not considered but is affecting the results.

R^2 = 1 - \frac{\sum\left(a x_i + b - y_i\right)^2}{\sum\left(y_i - \bar{y}\right)^2}   (Excel function RSQ(yi's, xi's))

where \bar{y} = average of the yi's.
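A short Python sketch of the R² formula above, applied (for illustration only) to the Page 4 data and coefficients; numpy assumed:

    import numpy as np

    x = np.array([1.0, 1.6, 3.4, 4.0, 5.2])
    y = np.array([1.2, 2.0, 2.4, 3.5, 3.5])
    a0, a1 = 0.879, 0.540                    # fit from Page 4

    y_hat = a0 + a1 * x
    ss_res = np.sum((y_hat - y)**2)          # unexplained variation
    ss_tot = np.sum((y - y.mean())**2)       # total variation
    r2 = 1 - ss_res / ss_tot                 # the quantity Excel's RSQ(y, x) reports for a least-squares fit
    print(r2)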

Page 6:

Standard Error of Estimate (SEE)

- The standard error of estimate (SEE or Syx) is a statistical measure of how well the best-fit line represents the data. It is, effectively, the standard deviation of the differences between the data points and the best-fit line.
- It provides an estimate of the scatter/random error in the data about the fitted line. This is analogous to the standard deviation for sample data.
- It has the same units as y.
- Two degrees of freedom are lost to calculate the coefficients a0 and a1.

s_{ey} = SEE = S_{yx} = \sqrt{\frac{\sum\left(y_i - \hat{y}_i\right)^2}{N-2}}   (Excel function STEYX(yi's, xi's))

where y_i = actual value of y for a given x_i, and \hat{y}_i = predicted value of y for a given x_i.
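The same example data can illustrate the SEE formula; a minimal Python sketch, assuming numpy:

    import numpy as np

    x = np.array([1.0, 1.6, 3.4, 4.0, 5.2])
    y = np.array([1.2, 2.0, 2.4, 3.5, 3.5])
    y_hat = 0.879 + 0.540 * x                         # predicted values from the Page 4 fit

    N = len(y)
    see = np.sqrt(np.sum((y - y_hat)**2) / (N - 2))   # the quantity Excel's STEYX(y, x) returns
    print(see)                                        # has the units of y; 2 degrees of freedom lost to a0 and a1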

Page 7:

Linear Regression Assumptions

- Variation in the data is assumed to be normally distributed and due to random causes.
- Random variation is assumed to exist in the y values, while the x values are error free.
- Since error has been minimized in the y direction, an erroneous conclusion may be made if x is estimated based on a value for y.
- For power-law or exponential relationships, the data need to be transformed before carrying out linear regression analysis.
- (As we will discuss later, the method of least squares can also be applied to nonlinear functional relationships.)

Page 8:

Linear Regression Example

[Figure: sensor Output (Volts) vs. Length (cm) with linear trendline y = 0.9977x + 0.0295, R² = 0.9993]

- Use Excel
- Chart >> Add Trendline to obtain the coefficients
- Functions RSQ() and STEYX() to determine R² and SEE

Page 9:

Regression Analysis using Excel Analysis Tools

- Linear regression is a standard feature of statistical programs and most spreadsheet programs. It is only necessary to input the x and y data; the remaining calculations are performed immediately.
- Excel "Regression Analysis" macro:
  - Performs linear regression only
  - Non-linear relationships must be transformed
  - Calculates the slope, intercept, SEE, and the upper and lower confidence intervals for the slope and intercept
  - Does not produce any graphical output on the user's plot
  - Does not update automatically
  - The user must interpret the results

Page 10:

Linear Regression in Excel 2008 

Torque, N-m (Y)   RPM (X)   Y Predicted    Residual       Residual/SEE = Residual/sey
4.89              100       4.998433207    0.108433207     0.17558474
4.77              201       4.559896053   -0.210103947    -0.340219088
3.79              298       4.138726707    0.348726707     0.564689451
3.76              402       3.687163697   -0.072836303    -0.117943051
2.84              500       3.261652399    0.421652399     0.682777249
4.12              601       2.823115245   -1.296884755    -2.100031702   <- Outlier
2.05              699       2.397603947    0.347603947     0.562871377
1.61              799       1.963408745    0.353408745     0.572271025

Fitted model: Y = m1·X + b, from =LINEST(A2:A9,B2:B9,TRUE,TRUE):

m1  = -0.004341952    b   = 5.432628409
se1 =  0.000954031    seb = 0.481645161
r^2 =  0.775391233    sey = 0.617554846
F   =  20.71311576    df  = 6
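For readers working outside Excel, a hedged Python sketch (numpy assumed) that reproduces the quantities in the LINEST array above from the torque/RPM data; the standard-error and F expressions are the usual simple-linear-regression formulas rather than anything specific to LINEST:

    import numpy as np

    # Torque (Y) vs RPM (X) data from the table above, outlier included
    X = np.array([100, 201, 298, 402, 500, 601, 699, 799], dtype=float)
    Y = np.array([4.89, 4.77, 3.79, 3.76, 2.84, 4.12, 2.05, 1.61])
    n = len(X)

    m1, b = np.polyfit(X, Y, 1)                     # least-squares slope and intercept
    Y_hat = m1 * X + b

    df = n - 2
    sey = np.sqrt(np.sum((Y - Y_hat)**2) / df)      # SEE, third row of the LINEST array
    Sxx = np.sum((X - X.mean())**2)
    se1 = sey / np.sqrt(Sxx)                        # standard error of the slope
    seb = sey * np.sqrt(np.sum(X**2) / (n * Sxx))   # standard error of the intercept
    r2 = 1 - np.sum((Y - Y_hat)**2) / np.sum((Y - Y.mean())**2)
    F = r2 / (1 - r2) * df                          # F statistic for the regression

    print(m1, b, se1, seb, r2, sey, F, df)          # compare with the LINEST output above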

Page 11:

Linear Regression Example: Omit Outlier 

Torque, N-m (Y)   RPM (X)   Y Predicted    Residual       Residual/SEE = Residual/sey
4.89              100       5.000219168    0.110219168     0.504559919
4.77              201       4.504157858   -0.265842142    -1.21696881
3.79              298       4.02774254     0.23774254      1.088334807
3.76              402       3.516946736   -0.243053264    -1.112646171
2.84              500       3.03561992     0.19561992      0.895506407
2.05              699       2.058231795    0.008231795     0.037683406
1.61              799       1.567081983   -0.042918017    -0.196469559

LINEST array output with the outlier omitted:

m1     = -0.004911498    b        = 5.49136898
se1    =  0.000348477    seb      = 0.170606738
r^2    =  0.975447633    sey      = 0.218446143
F      =  198.6463557    df       = 5
ss_reg =  9.479149271    ss_resid = 0.238593586   (fifth row of the LINEST array: regression and residual sums of squares)

Page 12:

Uncertainties on Regression

Confidence Interval for the Regression Line
  SEE = sey                                     0.218446143
  TINV(α=0.05, ν=5)                             2.570581835
  95% C.I. = TINV(α=0.05, ν=5)·SEE/SQRT(7)      0.212239784

Prediction Band for the Regression Line
  95% P.I. = TINV(α=0.05, ν=5)·SEE              0.561533687

Uncertainty in Slope
  Δm1 = TINV(0.05, 5)·se1                       0.000895789

Uncertainty in Intercept
  Δb = TINV(0.05, 5)·seb                        0.438558582
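A minimal Python sketch of the same arithmetic, with scipy's t-distribution standing in for Excel's TINV; the numeric inputs are the outlier-free results quoted above:

    import numpy as np
    from scipy import stats

    sey, se1, seb = 0.218446143, 0.000348477, 0.170606738   # from the Page 11 fit
    n, nu = 7, 5                                             # 7 points, nu = n - 2

    t = stats.t.ppf(1 - 0.05 / 2, nu)    # two-sided 95% t-value, as TINV(0.05, 5) ~ 2.5706

    ci_line = t * sey / np.sqrt(n)       # ~0.212, 95% C.I. half-width on the fitted line
    pi_line = t * sey                    # ~0.562, 95% prediction-band half-width
    dm1 = t * se1                        # ~0.000896, uncertainty in the slope
    db  = t * seb                        # ~0.439, uncertainty in the intercept
    print(ci_line, pi_line, dm1, db)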

Page 13:

Regression Line Confidence Intervals & Prediction Band 

- Not only do you want to obtain a curve-fit relationship, you also want to establish a confidence interval in the equation, a measure of the random uncertainty in the curve fit.
- ν = N - 2 in the determination of the t-value. Two degrees of freedom are lost because m1 and b are determined.

CI = \hat{y} \pm t_{\alpha,\nu}\frac{SEE}{\sqrt{N}} = \pm t_{\alpha,\nu}\frac{S_{yx}}{\sqrt{N}} = \pm t_{\alpha,\nu}\frac{s_{ey}}{\sqrt{N}}

where t_{\alpha,\nu} = TINV(\alpha, \nu) (two-sided t-table) and \alpha = 1 - P.

PB = \hat{y} \pm t_{\alpha,\nu}\, SEE = \pm t_{\alpha,\nu}\, S_{yx} = \pm t_{\alpha,\nu}\, s_{ey}

[Figure: Torque (N-m) vs. RPM showing the data, the torque least-squares fit, the -95% and +95% confidence intervals, and the -95% and +95% prediction bands]

Page 14:


Regression Line Confidence Interval & Prediction Band 

CI\ \text{in curve fit} = \pm t_{\alpha/2,\,n-2}\; s_{ey}\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} \approx \pm t_{\alpha/2,\,n-2}\;\frac{s_{ey}}{\sqrt{n}}

\Delta y_{\text{Prediction Band}} = \pm t_{\alpha/2,\,n-2}\; s_{ey}\sqrt{\frac{n+1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} \approx \pm t_{\alpha/2,\,n-2}\; s_{ey}

The left-hand forms are more accurate: the bands are at their minimum at the mean x and flare out at the low and high extremes. The right-hand forms are the approximations used on the previous page.
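A sketch of the exact (x-dependent) interval formulas above in Python, assuming numpy and scipy and reusing the outlier-free torque data for illustration:

    import numpy as np
    from scipy import stats

    X = np.array([100, 201, 298, 402, 500, 699, 799], dtype=float)
    Y = np.array([4.89, 4.77, 3.79, 3.76, 2.84, 2.05, 1.61])
    n = len(X)

    m1, b = np.polyfit(X, Y, 1)
    sey = np.sqrt(np.sum((Y - (m1 * X + b))**2) / (n - 2))
    Sxx = np.sum((X - X.mean())**2)
    t = stats.t.ppf(0.975, n - 2)

    x_star = np.linspace(X.min(), X.max(), 50)    # points at which to evaluate the bands
    ci = t * sey * np.sqrt(1/n + (x_star - X.mean())**2 / Sxx)          # exact CI half-width
    pb = t * sey * np.sqrt((n + 1)/n + (x_star - X.mean())**2 / Sxx)    # exact prediction-band half-width
    y_fit = m1 * x_star + b
    # y_fit +/- ci and y_fit +/- pb trace out the curves plotted on the previous page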

Page 15:

Summations Used in Statistics & Regression

Sample standard deviation:

S_x = \left[\frac{1}{N-1}\sum\left(x_i - \bar{x}\right)^2\right]^{1/2}

Sum of squares used in regression analysis for evaluating CI & PI:

S_{xx} = \sum\left(x_i - \bar{x}\right)^2

Standard error of estimate:

s_{ey} = SEE = S_{yx} = \left[\frac{\sum\left(y_i - y_{\text{predicted at } x=x_i}\right)^2}{N-2}\right]^{1/2}

Page 16:

CI in slope and intercept 

Slope, m:

CI\ \text{in slope} = \pm t_{\alpha/2,\nu} \cdot se_1

Intercept, b:

CI\ \text{in intercept} = \pm t_{\alpha/2,\nu} \cdot se_b

Note 1: ν = n - 2.
Note 2: m and b are not independent variables. Therefore, do not apply RSS to y = mx + b to determine Δy. Instead, use the CI for the curve fit.

Page 17:

Outliers in x-y Data Sets

- The method involves computing the ratio of the residuals (predicted - actual) to the standard error of estimate (sey = SEE); a sketch follows this list.
1. Residuals = y_predicted - y_actual at each xi.
2. Plot the ratio residuals/SEE for each xi. These are the "standardized residuals".
3. Standardized residuals exceeding ±2 may be considered outliers. Assuming the residuals are normally distributed, you can expect 95% of the residuals to lie in the range ±2 (that is, within 2 standard deviations of the best-fit line).
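A minimal Python sketch of the standardized-residual check, assuming numpy and using the full torque data set from Page 10:

    import numpy as np

    X = np.array([100, 201, 298, 402, 500, 601, 699, 799], dtype=float)
    Y = np.array([4.89, 4.77, 3.79, 3.76, 2.84, 4.12, 2.05, 1.61])
    n = len(X)

    m1, b = np.polyfit(X, Y, 1)
    residuals = (m1 * X + b) - Y                      # predicted - actual, per step 1
    sey = np.sqrt(np.sum(residuals**2) / (n - 2))     # SEE

    standardized = residuals / sey                    # step 2: standardized residuals
    outliers = np.abs(standardized) > 2               # step 3: flags the point at 601 RPM
    print(np.column_stack([X, standardized]), outliers)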

Page 18:

Linear Regression with Data Transformation


Page 19:

Data Transformation

- Commonly, test data do not show an approximately linear relationship between the dependent (Y) and independent (X) variables, and a direct linear regression is not useful.
- The form of the relationship expected between the dependent and independent variables is often known.
- The data need to be transformed prior to performing a linear regression.
- Transformations often can be accomplished by taking the logarithms or natural logarithms of one or both sides of the equation.

Page 20:

Common Transformations

Relationship    Plot Method                        Transformed Equation               Transformed Intercept, b   Transformed Slope, m1
y = αx^γ        Log y vs. Log x (log-log plot)     Log(y) = Log(α) + γ·Log(x)         Log(α)                     γ
                Ln y vs. Ln x (log-log paper)      Ln(y) = Ln(α) + γ·Ln(x)            Ln(α)                      γ
y = αe^(γx)     Log y vs. x (semi-log plot)        Log(y) = Log(α) + γ·Log(e)·x       Log(α)                     γ·Log(e)
                Ln y vs. x (semi-log plot)         Ln(y) = Ln(α) + γx                 Ln(α)                      γ
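As an illustration of the first row of the table, a small Python sketch (numpy assumed, synthetic data) that fits a power law y = αx^γ by regressing Log(y) on Log(x):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.array([1.0, 2.0, 5.0, 10.0, 20.0, 50.0])
    y = 2.5 * x**0.7 * (1 + 0.02 * rng.standard_normal(x.size))   # alpha = 2.5, gamma = 0.7, plus noise

    # Linear regression on the transformed variables: Log(y) = Log(alpha) + gamma*Log(x)
    gamma, log_alpha = np.polyfit(np.log10(x), np.log10(y), 1)
    alpha = 10**log_alpha
    print(alpha, gamma)    # recovers ~2.5 and ~0.7; slope = gamma, intercept = Log(alpha)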

Page 21:

Regression with Transformation

- Example: A velocity probe provides a voltage output that is related to velocity, U, by the form E = δ + εU^ρ
- δ, ε, and ρ are constants

U (ft/s)   Ei (V)
0          3.19
10         3.99
20         4.3
30         4.48
40         4.65

[Figures: Output Voltage (VDC) vs. Velocity (ft/s), plotted on linear axes and on log-log axes]

Page 22:

Data Relationship Transformation

E = δ + εU^ρ   (E = δ = 3.19 at U = 0)
Log(E - 3.19) = Log(εU^ρ)
Log(E - 3.19) = Log(ε) + Log(U^ρ) = Log(ε) + ρ·Log(U)

Let's transform the data: X = Log(U), Y = Log(E - 3.19)

U (ft/s)   Ei (V)   X       Y
0          3.19
10         3.99     1.00    -0.097
20         4.3      1.30     0.045
30         4.48     1.48     0.111
40         4.65     1.60     0.164

Perform the regression on the transformed data: Y = m1·X + b, with slope m1 = ρ and intercept b = Log(ε).
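A minimal Python sketch of this transformation and regression, assuming numpy; it should recover roughly the ρ and ε reported on the following pages:

    import numpy as np

    # Velocity-probe example: E = delta + eps * U**rho, with delta = 3.19 (the reading at U = 0)
    U = np.array([10.0, 20.0, 30.0, 40.0])
    E = np.array([3.99, 4.3, 4.48, 4.65])

    X = np.log10(U)             # transformed independent variable
    Y = np.log10(E - 3.19)      # transformed dependent variable

    rho, log_eps = np.polyfit(X, Y, 1)    # slope m1 = rho, intercept b = Log(eps)
    eps = 10**log_eps
    print(rho, eps)             # ~0.432 and ~0.298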

Page 23:

Solution (Excel 2004 Output)

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.998723855
R Square              0.997449339
Adjusted R Square     0.996174009
Standard Error        0.01            (t value = 3.18, t·SEE = 0.02)
Observations          4

ANOVA
             df   SS            MS         F          Significance F
Regression   1    0.038118269   0.038118   782.1106   0.00127614
Residual     2    9.74754E-05   4.87E-05
Total        3    0.038215745

               Coefficients   Standard Error   t Stat     P-value    Lower 95%     Upper 95%
Intercept      -0.525         0.021056315      -24.9274   0.001605   -0.61547736   -0.4342812
X Variable 1    0.432         0.015438034       27.96624  0.001276    0.36531922    0.49816831

t_{α,ν} = TINV(0.05, 2) = 4.3026
Y = -0.525 + 0.432X
SEE = 0.0070

Page 24:

Regression with Transformation & Uncertainty  

Example 4.10

[Figure: E (V) vs. U (ft/s) showing the data and the back-transformed fit with its confidence limits]

E = 3.19 + 0.298·U^0.432

b = Log(ε) = -0.525, so ε = 10^(-0.525) = 0.298

Y predicted   Y+        Y-        Transform it back again:   E      E+     E-
(U = 0)                                                      3.19   3.19   3.19
-0.0931       -0.0781   -0.1082                              4.00   4.03   3.97
 0.0368        0.0519    0.0218                              4.28   4.32   4.24
 0.1129        0.1279    0.0978                              4.49   4.53   4.44
 0.1668        0.1818    0.1518                              4.66   4.71   4.61
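A short Python sketch of the back-transformation, assuming numpy and taking the fitted intercept, slope, SEE, and t-value from the preceding pages as given:

    import numpy as np

    U = np.array([10.0, 20.0, 30.0, 40.0])
    b, m1 = -0.525, 0.432                      # intercept and slope of the transformed fit
    half_ci = 4.3026 * 0.0070 / np.sqrt(4)     # t(0.05, 2) * SEE / sqrt(n), roughly 0.015

    Y = b + m1 * np.log10(U)                   # predicted Y = Log(E - 3.19)
    for y in (Y, Y + half_ci, Y - half_ci):
        E = 3.19 + 10**y                       # undo the transform: E = 3.19 + eps * U**rho
        print(np.round(E, 2))                  # reproduces the E, E+, E- columns above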

Page 25:

Multiple and Polynomial Regression

- Regression analysis can also be performed in situations where there is more than one independent variable (multiple regression) or for polynomials of an independent variable (polynomial regression).
- Polynomial regression seeks the form:

Y = b + m_1 x + m_2 x^2 + \dots + m_k x^k

- Multiple regression seeks a function of the form:

Y = b + m_1\hat{x}_1 + m_2\hat{x}_2 + m_3\hat{x}_3 + \dots + m_k\hat{x}_k

where \hat{x} may represent several independent variables, for example \hat{x}_1 = x_1, \hat{x}_2 = x_2, \hat{x}_3 = x_1 x_2.
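Hedged Python sketches of both forms (numpy assumed; the data and coefficients are synthetic placeholders, not values from the slides):

    import numpy as np

    rng = np.random.default_rng(1)

    # Polynomial regression: Y = b + m1*x + m2*x**2
    x = np.linspace(0, 5, 20)
    y = 1.0 + 0.5 * x - 0.2 * x**2 + 0.05 * rng.standard_normal(x.size)
    m2, m1, b = np.polyfit(x, y, 2)          # np.polyfit returns the highest-order coefficient first

    # Multiple regression with an interaction term: Y = b + m1*x1 + m2*x2 + m3*(x1*x2)
    x1 = rng.uniform(0, 1, 30)
    x2 = rng.uniform(0, 1, 30)
    Y = 2.0 + 1.5 * x1 - 0.8 * x2 + 0.6 * x1 * x2 + 0.02 * rng.standard_normal(30)
    A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])   # design matrix [1, x1, x2, x1*x2]
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)               # [b, m1, m2, m3]
    print(np.round(coef, 3))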

Page 26:

Linear Regression in Excel 2004

- Input the result values
- Input the desired confidence level
- Input the independent variable

Page 27:

Excel 2004 Linear Regression Output 

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.99964308
R Square              0.99928628      <- R²
Adjusted R Square     0.99910785
Standard Error        0.02788582      <- SEE = sey
Observations          6               <- N

ANOVA
             df   SS           MS           F            Significance F
Regression   1    4.35502286   4.35502286   5600.45805   1.9107E-07
Residual     4    0.00311048   0.00077762
Total        5    4.35813333

               Coefficients   Standard Error   t Stat       P-value      Lower 95%     Upper 95%
Intercept      0.02952381     0.02018228       1.46285828   0.21733392   -0.02651117   0.08555879    <- intercept "b"
X Variable 1   0.99771429     0.01333197       74.8362082   1.9107E-07    0.9606988    1.03472978    <- slope "m1"

The Lower 95% and Upper 95% columns give the lower and upper bounds for the coefficients. To obtain the ± bound, simply subtract the lower from the upper and divide by two.