Regression and Correlation Jake Blanchard Fall 2010



Introduction

We can use regression to find relationships between random variables.

This does not necessarily imply causation.

Correlation can be used to measure predictability.

Regression with Constant Variance

Linear regression: $E(Y \mid X = x) = \alpha + \beta x$

In general, the variance is a function of $x$.

If we assume the variance is constant, the analysis is simplified.

Define the total error $\Delta^2$ as the sum of the squares of the errors.

Linear Regression

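$$\Delta^2 = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2$$

$$\frac{\partial \Delta^2}{\partial \alpha} = \sum_{i=1}^{n} 2\,(y_i - \alpha - \beta x_i)(-1) = 0$$

$$\frac{\partial \Delta^2}{\partial \beta} = \sum_{i=1}^{n} 2\,(y_i - \alpha - \beta x_i)(-x_i) = 0$$

solve

$$\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}$$

$$\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$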
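The closed-form least-squares estimates for the model $E(Y \mid X = x) = \alpha + \beta x$ can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `fit_line` and the sample data are hypothetical, chosen so that the points lie exactly on $y = 1 + 2x$.

```python
def fit_line(x, y):
    """Least-squares estimates (alpha, beta) for E(Y|X=x) = alpha + beta*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta = s_xy / s_xx
    # Intercept: the fitted line passes through (x_bar, y_bar).
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Hypothetical data lying exactly on y = 1 + 2x (not from the slides).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
alpha, beta = fit_line(x, y)
print(alpha, beta)
```

Because the sample points are noise-free, the fit should recover an intercept of 1 and a slope of 2 up to floating-point rounding.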

Variance in Regression Analysis

Relevant variance is conditional: Var(Y|X=x)

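$$s_{Y|x}^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2$$

$$s_{Y|x}^2 = \frac{1}{n-2} \left[ \sum_{i=1}^{n} (y_i - \bar{y})^2 - \hat{\beta}^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]$$

$$r^2 = 1 - \frac{s_{Y|x}^2}{s_Y^2}$$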
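As a rough check on the conditional variance $s_{Y|x}^2$ (residual sum of squares divided by $n-2$) and the variance-reduction measure $r^2 = 1 - s_{Y|x}^2 / s_Y^2$, the sketch below computes both from raw data. The function name `conditional_variance` and the five data points are hypothetical and only serve to exercise the formulas.

```python
def conditional_variance(x, y):
    """Return (s2_y_given_x, s2_y, r2) for a straight-line least-squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    # Conditional variance: squared residuals about the line, n-2 dof.
    sse = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    s2_y_given_x = sse / (n - 2)
    # Unconditional sample variance of y, n-1 dof.
    s2_y = s_yy / (n - 1)
    r2 = 1.0 - s2_y_given_x / s2_y
    return s2_y_given_x, s2_y, r2


# Hypothetical, slightly noisy data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.0]
s2_cond, s2_y, r2 = conditional_variance(x, y)
print(s2_cond, s2_y, r2)
```

For data this close to a straight line, the conditional variance is much smaller than the variance of $y$ alone, so $r^2$ comes out close to 1.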

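Confidence Intervals

Regression coefficients are t-distributed with n-2 dof.

The statistic below is thus t-distributed with n-2 dof:

$$\frac{\hat{y}_i - \mu_{Y|x_i}}{s_{Y|x} \sqrt{\dfrac{1}{n} + \dfrac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2}}}$$

where $\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$ is the fitted value at $x_i$.

And the confidence interval is

$$\langle \mu_{Y|x_i} \rangle_{1-\alpha} = \hat{y}_i \pm t_{(1-\alpha/2),\,n-2}\; s_{Y|x} \sqrt{\frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2}}$$

(In the subscript $1-\alpha$, $\alpha$ denotes the significance level, not the regression intercept.)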

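Example 8.1

Data for compressive strength ($q$) of stiff clay as a function of “blow counts” ($N$), with $n = 10$.

$$\bar{N} = 18.7, \qquad \bar{q} = 2.123$$

$$s_N^2 = \frac{1}{n-1} \left( \sum N_i^2 - n \bar{N}^2 \right) = 95.12$$

$$s_q^2 = \frac{1}{n-1} \left( \sum q_i^2 - n \bar{q}^2 \right) = 1.22$$

$$\hat{\beta} = \frac{\sum N_i q_i - n \bar{N} \bar{q}}{\sum N_i^2 - n \bar{N}^2} = 0.112$$

$$\hat{\alpha} = \bar{q} - \hat{\beta} \bar{N} = 0.029$$

$$s_{q|N}^2 = \frac{\Delta^2}{n-2} = \frac{0.305}{8} = 0.038$$

where $\Delta^2$ is the sum of the squared errors about the fitted line.

95% confidence interval at $N_i = 4$:

$$t_{0.975,\,8} = 2.306$$

$$\hat{q}_i = 0.029 + 0.112 \times 4 = 0.477$$

$$\langle \mu_{q|N} \rangle_{0.95} = 0.477 \pm 2.306 \sqrt{0.038} \sqrt{\frac{1}{10} + \frac{(4 - 18.7)^2}{4353 - 10 \times 18.7^2}} = (0.21,\ 0.744)$$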
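The interval at $N = 4$ can be re-derived from the summary values quoted in Example 8.1 ($n = 10$, $\bar{N} = 18.7$, $\sum N_i^2 = 4353$, $\hat{\alpha} = 0.029$, $\hat{\beta} = 0.112$, $s_{q|N}^2 = 0.038$, $t_{0.975,8} = 2.306$). The sketch below is only an arithmetic check: the function name `mean_response_interval` and its parameter names are hypothetical, and the critical t value is passed in rather than computed because the Python standard library has no t-distribution quantile.

```python
import math


def mean_response_interval(x0, alpha, beta, s2_cond, n, x_bar, sum_x2, t_crit):
    """Confidence interval for the mean response at x0 (constant variance)."""
    y_hat = alpha + beta * x0
    # Sum of squared deviations of x, written as sum(x^2) - n * x_bar^2.
    s_xx = sum_x2 - n * x_bar ** 2
    half_width = (t_crit * math.sqrt(s2_cond)
                  * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / s_xx))
    return y_hat - half_width, y_hat + half_width


# Summary statistics as quoted in Example 8.1.
lo, hi = mean_response_interval(
    x0=4, alpha=0.029, beta=0.112, s2_cond=0.038,
    n=10, x_bar=18.7, sum_x2=4353, t_crit=2.306,
)
print(round(lo, 3), round(hi, 3))
```

The result should agree with the interval of roughly 0.21 to 0.744 given in the example, up to rounding of the inputs.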

Plot

Correlation Estimate

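$$\hat{\rho} = \frac{1}{n-1} \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{s_x s_y} = \frac{1}{n-1} \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{s_x s_y}$$

$$\hat{\rho} = \hat{\beta}\, \frac{s_x}{s_y}$$

$$\hat{\rho}^2 = 1 - \frac{n-2}{n-1} \frac{s_{Y|x}^2}{s_Y^2}$$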
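To make the correlation estimate concrete, the sketch below computes $\hat{\rho}$ as the sample covariance divided by $s_x s_y$, and also as $\hat{\beta}\, s_x / s_y$, so that the two routes can be compared. The function name `correlation` and the data are hypothetical.

```python
import math


def correlation(x, y):
    """Sample correlation: covariance over the product of standard deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    return cov / (s_x * s_y), s_x, s_y


# Hypothetical data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.0]
rho, s_x, s_y = correlation(x, y)

# Same quantity via the regression slope: rho = beta * s_x / s_y.
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
beta = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        / sum((xi - x_bar) ** 2 for xi in x))
rho_from_slope = beta * s_x / s_y
print(rho, rho_from_slope)
```

Both routes should give the same value up to rounding, since they are algebraically equivalent.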

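Regression with Non-Constant Variance

Now relax the assumption of constant variance.

Observations in regions with large conditional variance are weighted less.

$$E(Y \mid X = x) = \alpha + \beta x$$

$$\mathrm{Var}(Y \mid X = x) = \sigma^2 g^2(x)$$

weights:

$$w_i = \frac{1}{\mathrm{Var}(Y \mid X = x_i)} = \frac{1}{\sigma^2 g^2(x_i)}$$

$$\Delta^2 = \sum_{i=1}^{n} w_i (y_i - \alpha - \beta x_i)^2$$

$$\hat{\alpha} = \frac{\sum_{i=1}^{n} w_i y_i - \hat{\beta} \sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

$$\hat{\beta} = \frac{\sum_{i=1}^{n} w_i \sum_{i=1}^{n} w_i x_i y_i - \sum_{i=1}^{n} w_i y_i \sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i \sum_{i=1}^{n} w_i x_i^2 - \left( \sum_{i=1}^{n} w_i x_i \right)^2}$$

The common factor $\sigma^2$ cancels in both estimates, so the known weights $w_i' = \sigma^2 w_i = 1 / g^2(x_i)$ can be used in place of $w_i$.

$$s^2 = \frac{\sum_{i=1}^{n} w_i' (y_i - \hat{\alpha} - \hat{\beta} x_i)^2}{n-2}$$

$$s_{Y|x}^2 = s^2 g^2(x)$$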
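A weighted least-squares fit with weights $1/g^2(x_i)$ can be sketched as follows. This is an illustration under stated assumptions rather than the lecture's own implementation: the names `fit_weighted_line` and `scale_variance` are hypothetical, the data are made up to lie exactly on $y = 0.5 + 0.6x$, and $g(x) = x$ is assumed.

```python
def fit_weighted_line(x, y, w):
    """Weighted least-squares estimates (alpha, beta) for E(Y|x) = alpha + beta*x."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    beta = (sw * swxy - swy * swx) / (sw * swxx - swx ** 2)
    alpha = (swy - beta * swx) / sw
    return alpha, beta


def scale_variance(x, y, w, alpha, beta):
    """Estimate s^2 from weighted squared residuals with n-2 dof."""
    n = len(x)
    return sum(wi * (yi - alpha - beta * xi) ** 2
               for wi, xi, yi in zip(w, x, y)) / (n - 2)


# Hypothetical data on the line y = 0.5 + 0.6x, with g(x) = x assumed.
x = [1.0, 2.0, 4.0, 5.0]
y = [1.1, 1.7, 2.9, 3.5]
w = [1.0 / xi ** 2 for xi in x]  # w_i' = 1 / g(x_i)^2
alpha, beta = fit_weighted_line(x, y, w)
s2 = scale_variance(x, y, w, alpha, beta)
print(alpha, beta, s2)
```

With equal weights the formulas reduce to the ordinary least-squares estimates, which is a convenient sanity check on any implementation.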

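Example (8.2)

Data for maximum settlement ($x$) of storage tanks and maximum differential settlement ($y$).

From looking at the data, assume $g(x) = x$ (that is, the standard deviation of $y$ increases linearly with $x$).

$$\mathrm{Var}(Y \mid X = x) = \sigma^2 x^2$$

Weights (dropping the common factor $\sigma^2$):

$$w_i' = \frac{1}{x_i^2}$$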

Example (8.2) continued

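$$\bar{x} = 1.65, \qquad \bar{y} = 1.11$$

$$s_x = 0.923, \qquad s_y = 0.627$$

$$\hat{\alpha} = 0.045, \qquad \hat{\beta} = 0.65$$

$$s^2 = 0.0589, \qquad s_{Y|x} = 0.243\,x$$

$$\hat{\rho}_{xy} = 0.96$$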

Multiple Regression

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki}$$
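For a model of the form $y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki}$, one common way to obtain the least-squares coefficients is to solve the normal equations $(X^T X)\,\beta = X^T y$. The slides do not say which method is intended, so the sketch below is an assumption: `fit_multiple` and `solve_linear_system` are hypothetical helper names, the solver is plain Gaussian elimination, and the data are made up to satisfy $y = 1 + 2x_1 - 3x_2$ exactly.

```python
def solve_linear_system(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unchanged.
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta


def fit_multiple(rows, y):
    """rows[i] = [x_1i, ..., x_ki]; returns [beta_0, beta_1, ..., beta_k]."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(r[j] * r[k] for r in design) for k in range(p)]
           for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(design, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data satisfying y = 1 + 2*x1 - 3*x2 exactly.
rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 3.0, -2.0, 0.0, 2.0]
coeffs = fit_multiple(rows, y)
print(coeffs)
```

The normal equations can be poorly conditioned when predictors are nearly collinear, so a library routine based on QR or SVD is usually preferable for real data.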

“Nonlinear” Regression

$$E(Y \mid x) = \alpha + \beta\, g(x)$$

Use LINEST in Excel
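Outside Excel, a model of the form $E(Y \mid x) = \alpha + \beta\, g(x)$ can be fitted by transforming $x$ to $x' = g(x)$ and applying ordinary linear regression to $(x', y)$, since the model is still linear in $\alpha$ and $\beta$. The sketch below assumes $g(x) = \ln x$ purely for illustration; the function name `fit_transformed` and the data are hypothetical.

```python
import math


def fit_transformed(x, y, g):
    """Fit E(Y|x) = alpha + beta*g(x) by linear regression on g(x)."""
    z = [g(xi) for xi in x]  # transformed predictor x' = g(x)
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    beta = s_zy / s_zz
    alpha = y_bar - beta * z_bar
    return alpha, beta


# Hypothetical data generated from y = 2 + 3*ln(x), with g = ln assumed.
x = [1.0, 2.0, 4.0, 8.0]
y = [2.0 + 3.0 * math.log(xi) for xi in x]
alpha, beta = fit_transformed(x, y, math.log)
print(alpha, beta)
```

The quotation marks around "Nonlinear" in the slide title are consistent with this view: the relationship is nonlinear in $x$ but linear in the coefficients.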