Gu Yuxian Wang Weinan Beijing National Day School Research Project For Linear Regression

Upload: henry-mcgee

Post on 01-Jan-2016


Page 1: Gu Yuxian Wang Weinan Beijing National Day School

Gu Yuxian Wang Weinan Beijing National Day School

Research Project For Linear Regression

Part 1 The Simple Linear Regression

• Given two variables $X$ and $Y$.
• $x_1, x_2, \dots, x_n$ are measured without error.
• $y_1, y_2, \dots, y_n$ are measured with error.
• So we can let $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$.
• We can use the least squares estimators and the maximum likelihood estimator to estimate the parameters $\beta_0$ and $\beta_1$.

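The Least Squares Estimators

• Let $\Delta = \sum_{i=1}^{n} (y_i - \hat y_i)^2 = \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2$.
• All we need to do is to minimize $\Delta$.
• Let $\dfrac{\partial \Delta}{\partial \beta_0} = 0$, $\dfrac{\partial \Delta}{\partial \beta_1} = 0$.
• Solve the equations:

$$\hat\beta_1 = \frac{S_{XY}}{S_{XX}}, \qquad \hat\beta_0 = \bar y - \frac{S_{XY}}{S_{XX}}\,\bar x$$

where

$$S_{XY} = \sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y), \qquad S_{XX} = \sum_{i=1}^{n} (x_i - \bar x)^2, \qquad S_{YY} = \sum_{i=1}^{n} (y_i - \bar y)^2$$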
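The closed-form least squares solution can be sketched in a few lines of Python. This is only an illustration: the function names and the sample data are assumptions made for the example and do not come from the slides.

```python
def sums_of_squares(xs, ys):
    """Return (x_bar, y_bar, S_XX, S_XY, S_YY) for paired samples."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return x_bar, y_bar, s_xx, s_xy, s_yy


def least_squares(xs, ys):
    """Least squares estimates (beta0_hat, beta1_hat) for y = beta0 + beta1 * x."""
    x_bar, y_bar, s_xx, s_xy, _ = sums_of_squares(xs, ys)
    beta1 = s_xy / s_xx             # slope: S_XY / S_XX
    beta0 = y_bar - beta1 * x_bar   # intercept: y_bar - slope * x_bar
    return beta0, beta1


# Made-up illustrative data (hypothetical, not from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
beta0_hat, beta1_hat = least_squares(xs, ys)
print(beta0_hat, beta1_hat)
```

For this data $S_{XX} = 10$ and $S_{XY} = 6$, so the slope is $0.6$ and the intercept is $4 - 0.6 \cdot 3 = 2.2$, up to floating-point rounding.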

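The Maximum Likelihood Estimator

• Assume that $\varepsilon_i \sim N(0, \sigma^2)$.
• So

$$F_{Y_i}(y) = P(Y_i \le y) = P(\beta_0 + \beta_1 x_i + \varepsilon_i \le y) = P(\varepsilon_i \le y - \beta_0 - \beta_1 x_i) = F_{\varepsilon_i}(y - \beta_0 - \beta_1 x_i)$$

$$f_{Y_i}(y) = f_{\varepsilon_i}(y - \beta_0 - \beta_1 x_i) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}}$$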

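• The likelihood function is $L = \prod_{i=1}^{n} f_{Y_i}(y_i)$, so

$$\ln L = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

• Compute $\dfrac{\partial \ln L}{\partial \beta_0}$ and $\dfrac{\partial \ln L}{\partial \beta_1}$.
• Solve $\dfrac{\partial \ln L}{\partial \beta_0} = 0$, $\dfrac{\partial \ln L}{\partial \beta_1} = 0$.
• We get

$$\hat\beta_1 = \frac{S_{XY}}{S_{XX}}, \qquad \hat\beta_0 = \bar y - \frac{S_{XY}}{S_{XX}}\,\bar x$$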
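As a rough numerical check that the maximum likelihood and least squares estimates coincide, the sketch below evaluates $\ln L$ at the least squares values and at a few nearby points. The data, the fixed value `sigma = 1.0` and the perturbation size are assumptions chosen for illustration only.

```python
import math


def log_likelihood(xs, ys, beta0, beta1, sigma):
    """ln L for Y_i = beta0 + beta1 * x_i + eps_i with eps_i ~ N(0, sigma^2)."""
    n = len(xs)
    rss = sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))
    return -n / 2 * math.log(2 * math.pi * sigma ** 2) - rss / (2 * sigma ** 2)


# Made-up data (hypothetical); its least squares estimates are 2.2 and 0.6.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

best = log_likelihood(xs, ys, b0, b1, 1.0)
# Log-likelihood at the eight neighbouring grid points around (b0, b1).
nearby = [
    log_likelihood(xs, ys, b0 + d0, b1 + d1, 1.0)
    for d0 in (-0.1, 0.0, 0.1)
    for d1 in (-0.1, 0.0, 0.1)
    if (d0, d1) != (0.0, 0.0)
]
print(all(best > value for value in nearby))
```

For a fixed $\sigma$, maximizing $\ln L$ is the same as minimizing the residual sum of squares, which is why every perturbed point scores lower.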

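Efficiency Analysis

• They are unbiased.

$$E(\hat\beta_1) = E\left[\frac{\sum_{i=1}^{n} (x_i - \bar x) Y_i}{S_{XX}}\right] = \frac{\sum_{i=1}^{n} (x_i - \bar x) E(Y_i)}{S_{XX}} = \frac{\sum_{i=1}^{n} (x_i - \bar x)(\beta_0 + \beta_1 x_i)}{S_{XX}} = \frac{\beta_1 \left(\sum_{i=1}^{n} x_i^2 - n\bar x^2\right)}{S_{XX}} = \beta_1$$

$$E(\hat\beta_0) = E(\bar Y) - \bar x\, E(\hat\beta_1) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) - \beta_1 \bar x = \beta_0 + \beta_1 \bar x - \beta_1 \bar x = \beta_0$$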
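Unbiasedness can also be illustrated by simulation: averaged over many simulated samples, the estimates should be close to the true parameters. The sketch below is a hedged illustration only. The true values, the noise level, the design points and the number of repetitions are all assumptions chosen for the example.

```python
import random


def least_squares(xs, ys):
    """Least squares estimates (beta0_hat, beta1_hat)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta1 = s_xy / s_xx
    return y_bar - beta1 * x_bar, beta1


rng = random.Random(0)                           # fixed seed for repeatability
true_beta0, true_beta1, sigma = 1.0, 2.0, 1.0    # assumed "true" model
xs = [float(i) for i in range(10)]               # x is measured without error
reps = 2000

sum_b0 = sum_b1 = 0.0
for _ in range(reps):
    # Simulate Y_i = beta0 + beta1 * x_i + eps_i with eps_i ~ N(0, sigma^2).
    ys = [true_beta0 + true_beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    b0, b1 = least_squares(xs, ys)
    sum_b0 += b0
    sum_b1 += b1

mean_b0, mean_b1 = sum_b0 / reps, sum_b1 / reps
print(mean_b0, mean_b1)   # both should be close to 1.0 and 2.0
```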

Part 2 Errors-in-Variables (EIV) Regression Model

• Now the measurements of $X$ are not accurate either.
• There are two ways to measure the errors.
• The orthogonal regression and the geometric mean regression.

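The Orthogonal Regression (OR)

• The distances between the regression line and the points are

$$\frac{|y_i - \beta_0 - \beta_1 x_i|}{\sqrt{1 + \beta_1^2}}$$

• To minimize

$$\Delta = \sum_{i=1}^{n} \frac{(y_i - \beta_0 - \beta_1 x_i)^2}{1 + \beta_1^2}$$

compute and solve $\dfrac{\partial \Delta}{\partial \beta_0} = 0$, $\dfrac{\partial \Delta}{\partial \beta_1} = 0$.
• We get

$$\hat\beta_1 = \frac{S_{YY} - S_{XX} + \sqrt{(S_{YY} - S_{XX})^2 + 4 S_{XY}^2}}{2 S_{XY}}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$$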
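A minimal sketch of the orthogonal regression slope, $\hat\beta_1 = \dfrac{S_{YY} - S_{XX} + \sqrt{(S_{YY} - S_{XX})^2 + 4S_{XY}^2}}{2S_{XY}}$, is given below. The function name and both data sets are hypothetical, and the code assumes $S_{XY} \neq 0$.

```python
import math


def orthogonal_regression(xs, ys):
    """Return (beta0_hat, beta1_hat) from the closed-form OR solution."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    diff = s_yy - s_xx
    # Closed-form slope; this line divides by S_XY, so S_XY must be nonzero.
    beta1 = (diff + math.sqrt(diff ** 2 + 4 * s_xy ** 2)) / (2 * s_xy)
    return y_bar - beta1 * x_bar, beta1


# Points lying exactly on y = 1 + 2x (made-up data): the line is recovered.
exact = orthogonal_regression([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])

# Noisy made-up data: here S_XY/S_XX = 0.6 and S_YY/S_XY = 1.0,
# and the OR slope falls between these two values.
noisy = orthogonal_regression([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
print(exact, noisy)
```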

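The Geometric Mean Regression (GMR)

• The area is $\dfrac{(y_i - \beta_0 - \beta_1 x_i)^2}{2|\beta_1|}$.
• To minimize

$$\Delta = \sum_{i=1}^{n} \frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2|\beta_1|}$$

compute and solve $\dfrac{\partial \Delta}{\partial \beta_0} = 0$, $\dfrac{\partial \Delta}{\partial \beta_1} = 0$;
we get

$$\hat\beta_1 = \pm\sqrt{\frac{S_{YY}}{S_{XX}}} \ \text{(with the sign of } S_{XY}\text{)}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$$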
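The name "geometric mean" can be checked numerically: the GMR slope $\sqrt{S_{YY}/S_{XX}}$ is the geometric mean of the slope from regressing $Y$ on $X$, $S_{XY}/S_{XX}$, and the slope implied by regressing $X$ on $Y$, $S_{YY}/S_{XY}$. The sketch below uses hypothetical names and made-up data.

```python
import math


def geometric_mean_regression(xs, ys):
    """Return (beta0_hat, beta1_hat, s_xx, s_xy, s_yy) for the GMR line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Magnitude sqrt(S_YY / S_XX), with the sign taken from S_XY.
    beta1 = math.copysign(math.sqrt(s_yy / s_xx), s_xy)
    return y_bar - beta1 * x_bar, beta1, s_xx, s_xy, s_yy


# Made-up illustrative data (hypothetical).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
beta0_hat, beta1_hat, s_xx, s_xy, s_yy = geometric_mean_regression(xs, ys)

slope_y_on_x = s_xy / s_xx   # least squares slope when only Y has error
slope_x_on_y = s_yy / s_xy   # slope when only X has error
print(beta1_hat, math.sqrt(slope_y_on_x * slope_x_on_y))
```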

Parametric Method

• Assume

$$X = \xi + \delta, \qquad Y = \beta_0 + \beta_1 \xi + \varepsilon$$

$$\xi \sim N(\mu, \tau^2), \qquad \delta \sim N(0, \sigma_\delta^2), \qquad \varepsilon \sim N(0, \sigma_\varepsilon^2)$$

X and Y follow a bivariate normal distribution.

• We use the moment generating function (mgf) to derive the distribution of X and Y:

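• Since $\xi$, $\delta$, $\varepsilon$ are independent, we can separate the mgf:

$$M_{X,Y}(t_1, t_2) = E\left(e^{t_1 X + t_2 Y}\right) = E\left(e^{t_1(\xi + \delta) + t_2(\beta_0 + \beta_1 \xi + \varepsilon)}\right) = e^{\beta_0 t_2}\, M_\xi(t_1 + \beta_1 t_2)\, M_\delta(t_1)\, M_\varepsilon(t_2)$$

$$= e^{\beta_0 t_2}\; e^{\mu(t_1 + \beta_1 t_2) + \frac{1}{2}\tau^2 (t_1 + \beta_1 t_2)^2}\; e^{\frac{1}{2}\sigma_\delta^2 t_1^2}\; e^{\frac{1}{2}\sigma_\varepsilon^2 t_2^2}$$

$$= \exp\left\{\mu t_1 + (\beta_0 + \beta_1 \mu) t_2 + \frac{1}{2}\left[(\tau^2 + \sigma_\delta^2) t_1^2 + 2\beta_1 \tau^2 t_1 t_2 + (\beta_1^2 \tau^2 + \sigma_\varepsilon^2) t_2^2\right]\right\}$$

• This is the mgf of the bivariate normal distribution

$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left(\begin{pmatrix} \mu \\ \beta_0 + \beta_1 \mu \end{pmatrix},\ \begin{pmatrix} \tau^2 + \sigma_\delta^2 & \beta_1 \tau^2 \\ \beta_1 \tau^2 & \beta_1^2 \tau^2 + \sigma_\varepsilon^2 \end{pmatrix}\right)$$

• Method of moments estimator (MOME)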

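$$E(X) = \mu = \bar x = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$E(Y) = \beta_0 + \beta_1 \mu = \bar y = \frac{1}{n}\sum_{i=1}^{n} y_i$$

$$D(X) = \tau^2 + \sigma_\delta^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2 = \frac{S_{XX}}{n}$$

$$D(Y) = \beta_1^2 \tau^2 + \sigma_\varepsilon^2 = \frac{1}{n}\sum_{i=1}^{n} y_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} y_i\right)^2 = \frac{S_{YY}}{n}$$

$$Cov(X, Y) = E(XY) - E(X)E(Y) = \beta_1 \tau^2 = \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar x \bar y = \frac{S_{XY}}{n}$$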

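• We get, with $\lambda = \sigma_\varepsilon^2 / \sigma_\delta^2$:

$$\hat\beta_1 = \frac{S_{YY} - \lambda S_{XX} + \sqrt{(S_{YY} - \lambda S_{XX})^2 + 4\lambda S_{XY}^2}}{2 S_{XY}}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$$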
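The moment equations can be solved numerically once the variance ratio is fixed. The sketch below assumes the ratio $\lambda = \sigma_\varepsilon^2/\sigma_\delta^2$ is known, uses the symbol names $\mu$, $\tau^2$, $\sigma_\delta^2$, $\sigma_\varepsilon^2$ for the parameters of the bivariate normal model (these labels, the function name and the data are assumptions for illustration), and then checks that the fitted variances reproduce the chosen ratio.

```python
import math


def eiv_moment_estimates(xs, ys, lam):
    """Method-of-moments estimates, given lam = var(eps) / var(delta)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    diff = s_yy - lam * s_xx
    beta1 = (diff + math.sqrt(diff ** 2 + 4 * lam * s_xy ** 2)) / (2 * s_xy)
    tau2 = s_xy / (n * beta1)                  # from Cov(X, Y) = beta1 * tau^2
    return {
        "mu": x_bar,                           # from E(X) = mu
        "beta0": y_bar - beta1 * x_bar,        # from E(Y) = beta0 + beta1 * mu
        "beta1": beta1,
        "tau2": tau2,
        "sigma_delta2": s_xx / n - tau2,                 # from D(X)
        "sigma_eps2": s_yy / n - beta1 ** 2 * tau2,      # from D(Y)
    }


# Made-up data (hypothetical) with S_XX = 10, S_XY = 6, S_YY = 6.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
est = eiv_moment_estimates(xs, ys, 2.0)
print(est)
```

With $\lambda = 2$ this data gives $\hat\beta_1 = 2/3$, $\hat\tau^2 = 1.8$, $\hat\sigma_\delta^2 = 0.2$ and $\hat\sigma_\varepsilon^2 = 0.4$, up to rounding, so the variance ratio is indeed 2.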

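Special Situation for MLE

With $\lambda = \sigma_\varepsilon^2 / \sigma_\delta^2$:

• The Orthogonal Regression (OR): when $\lambda = 1$,

$$\hat\beta_1 = \frac{S_{YY} - \lambda S_{XX} + \sqrt{(S_{YY} - \lambda S_{XX})^2 + 4\lambda S_{XY}^2}}{2 S_{XY}} = \frac{S_{YY} - S_{XX} + \sqrt{(S_{YY} - S_{XX})^2 + 4 S_{XY}^2}}{2 S_{XY}}$$

• The Geometric Mean Regression (GMR): when $\lambda = \dfrac{S_{YY}}{S_{XX}}$ (taking $S_{XY} > 0$),

$$\hat\beta_1 = \frac{S_{YY} - \lambda S_{XX} + \sqrt{(S_{YY} - \lambda S_{XX})^2 + 4\lambda S_{XY}^2}}{2 S_{XY}} = \sqrt{\frac{S_{YY}}{S_{XX}}}$$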

• When $\sigma_\varepsilon^2 = 0$ ($\lambda = 0$): $\hat\beta_1 = \dfrac{S_{YY}}{S_{XY}}$
–This is when Y has no error.

• When $\sigma_\delta^2 = 0$ ($\lambda \to \infty$): $\hat\beta_1 = \dfrac{S_{XY}}{S_{XX}}$
–This is when X has no error, so we get the same answer as our first discussion.
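These special cases can be checked numerically with the general slope formula $\hat\beta_1(\lambda) = \dfrac{S_{YY} - \lambda S_{XX} + \sqrt{(S_{YY} - \lambda S_{XX})^2 + 4\lambda S_{XY}^2}}{2S_{XY}}$. The sketch below is illustrative only: the values of $S_{XX}$, $S_{XY}$, $S_{YY}$ are made up, and a large finite $\lambda$ stands in for the limit $\lambda \to \infty$.

```python
import math


def eiv_slope(s_xx, s_xy, s_yy, lam):
    """General EIV slope for a given variance ratio lam (requires S_XY != 0)."""
    diff = s_yy - lam * s_xx
    return (diff + math.sqrt(diff ** 2 + 4 * lam * s_xy ** 2)) / (2 * s_xy)


# Made-up sums of squares (hypothetical), with S_XY > 0.
s_xx, s_xy, s_yy = 10.0, 6.0, 6.0

or_slope = eiv_slope(s_xx, s_xy, s_yy, 1.0)             # lam = 1: OR
gmr_slope = eiv_slope(s_xx, s_xy, s_yy, s_yy / s_xx)    # lam = S_YY/S_XX: GMR
y_exact_slope = eiv_slope(s_xx, s_xy, s_yy, 0.0)        # lam = 0: Y has no error
x_exact_slope = eiv_slope(s_xx, s_xy, s_yy, 1e6)        # large lam: X nearly exact

print(or_slope, gmr_slope, y_exact_slope, x_exact_slope)
```

For this data the four slopes are ordered $S_{XY}/S_{XX} < \text{OR} < \text{GMR} < S_{YY}/S_{XY}$. That order is a property of this particular example, not a general rule.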

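Another Estimator

• We want an estimator that (1) covers all cases (like the MLE) (2) without assuming distributions (like OR and GMR).
• Calculate, with $\hat y_i = \beta_0 + \beta_1 x_i$, $\hat x_i = (y_i - \beta_0)/\beta_1$ and a weight $c \in [0, 1]$,

$$\Delta = \sum_{i=1}^{n} \left[(1 - c)(y_i - \hat y_i)^2 + c\,(x_i - \hat x_i)^2\right] = \sum_{i=1}^{n} \left(1 - c + \frac{c}{\beta_1^2}\right) [y_i - (\beta_0 + \beta_1 x_i)]^2$$

• Setting $\dfrac{\partial \Delta}{\partial \beta_0} = 0$, $\dfrac{\partial \Delta}{\partial \beta_1} = 0$ gives $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$ and

$$(1 - c) S_{XX} \beta_1^4 - (1 - c) S_{XY} \beta_1^3 + c\, S_{XY} \beta_1 - c\, S_{YY} = 0$$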

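Prove $\lambda$ is 1-1 to $\beta_1$

Let

$$\beta_1(\lambda) = \frac{S_{YY} - \lambda S_{XX} + \sqrt{(S_{YY} - \lambda S_{XX})^2 + 4\lambda S_{XY}^2}}{2 S_{XY}}$$

Then

$$\frac{d\beta_1(\lambda)}{d\lambda} = \frac{1}{2 S_{XY}} \left[-S_{XX} + \frac{2 S_{XY}^2 - S_{XX}(S_{YY} - \lambda S_{XX})}{\sqrt{(S_{YY} - \lambda S_{XX})^2 + 4\lambda S_{XY}^2}}\right]$$

Since $S_{XX} S_{YY} - S_{XY}^2 \ge 0$,

$$S_{XX}^2 \left[(S_{YY} - \lambda S_{XX})^2 + 4\lambda S_{XY}^2\right] - \left[2 S_{XY}^2 - S_{XX}(S_{YY} - \lambda S_{XX})\right]^2 = 4 S_{XY}^2 \left(S_{XX} S_{YY} - S_{XY}^2\right) \ge 0$$

so the bracket is never positive.

So $\beta_1(\lambda)$ is monotonic in $\lambda$ (decreasing when $S_{XY} > 0$), and

$$\beta_1(0) = \frac{S_{YY}}{S_{XY}}, \qquad \beta_1(\infty) = \frac{S_{XY}}{S_{XX}}$$

We get, for $S_{XY} > 0$,

$$\lambda \in [0, \infty) \ \overset{\text{1-1}}{\longleftrightarrow}\ \beta_1 \in \left(\frac{S_{XY}}{S_{XX}}, \frac{S_{YY}}{S_{XY}}\right]$$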

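Prove $c$ is 1-1 to $\beta_1$

Let

$$f(\beta_1) = (1 - c) S_{XX} \beta_1^4 - (1 - c) S_{XY} \beta_1^3 + c\, S_{XY} \beta_1 - c\, S_{YY}$$

We have (taking $S_{XY} > 0$)

$$f\left(\frac{S_{XY}}{S_{XX}}\right) = -\frac{c\,(S_{XX} S_{YY} - S_{XY}^2)}{S_{XX}} \le 0, \qquad f\left(\frac{S_{YY}}{S_{XY}}\right) = \frac{(1 - c)\, S_{YY}^3\, (S_{XX} S_{YY} - S_{XY}^2)}{S_{XY}^4} \ge 0$$

So there is at least one root for

$$(1 - c) S_{XX} \beta_1^4 - (1 - c) S_{XY} \beta_1^3 + c\, S_{XY} \beta_1 - c\, S_{YY} = 0$$

with $\beta_1 \in \left[\dfrac{S_{XY}}{S_{XX}}, \dfrac{S_{YY}}{S_{XY}}\right]$.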

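We can prove $f'(\beta_1) > 0$ when $\beta_1 \in \left[\dfrac{S_{XY}}{S_{XX}}, \dfrac{S_{YY}}{S_{XY}}\right]$.

So there is ONLY one root for

$$(1 - c) S_{XX} \beta_1^4 - (1 - c) S_{XY} \beta_1^3 + c\, S_{XY} \beta_1 - c\, S_{YY} = 0$$

when $\beta_1 \in \left[\dfrac{S_{XY}}{S_{XX}}, \dfrac{S_{YY}}{S_{XY}}\right]$.

Then we have

$$c \in [0, 1] \ \overset{\text{1-1}}{\longleftrightarrow}\ \beta_1 \in \left[\frac{S_{XY}}{S_{XX}}, \frac{S_{YY}}{S_{XY}}\right]$$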
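Because the quartic has exactly one root in $\left[S_{XY}/S_{XX},\ S_{YY}/S_{XY}\right]$ and changes sign there, that root can be found by bisection. The sketch below assumes $S_{XY} > 0$ and writes the quartic as $f(\beta_1) = (1-c)\,\beta_1^3 (S_{XX}\beta_1 - S_{XY}) + c\,(S_{XY}\beta_1 - S_{YY})$, which is the form obtained by weighting vertical squared errors with $1-c$ and horizontal ones with $c$. The function name, the iteration count and the numbers are illustrative assumptions.

```python
def slope_for_weight(s_xx, s_xy, s_yy, c, iterations=100):
    """Root of the quartic in [S_XY/S_XX, S_YY/S_XY] by bisection (needs S_XY > 0)."""

    def f(b):
        # (1-c) S_XX b^4 - (1-c) S_XY b^3 + c S_XY b - c S_YY, factored.
        return (1 - c) * b ** 3 * (s_xx * b - s_xy) + c * (s_xy * b - s_yy)

    lo, hi = s_xy / s_xx, s_yy / s_xy
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:      # f is increasing on the interval: root lies to the right
            lo = mid
        else:               # root lies at mid or to the left
            hi = mid
    return (lo + hi) / 2


# Made-up sums of squares (hypothetical), with S_XY > 0.
s_xx, s_xy, s_yy = 10.0, 6.0, 6.0
weights = [0.0, 0.25, 0.5, 0.75, 1.0]
slopes = [slope_for_weight(s_xx, s_xy, s_yy, c) for c in weights]
print(slopes)
```

In this example the slope moves from $S_{XY}/S_{XX} = 0.6$ at $c = 0$ to $S_{YY}/S_{XY} = 1$ at $c = 1$ and increases with $c$ in between, which is what the 1-1 correspondence predicts.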

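Another Estimator Again

• The angle $\theta$: measure the distance $d_\theta$ from each point to the line along a direction that makes the angle $\theta$ with the $x$-axis. With $d$ the vertical distance and $\beta_1 = \tan\alpha$,

$$d_\theta = \frac{d \cos\alpha}{\sin(\theta - \alpha)}, \qquad d_\theta^2 = \frac{d^2}{(\sin\theta - \beta_1 \cos\theta)^2}$$

• Let

$$\Delta = \sum_{i=1}^{n} \frac{[y_i - (\beta_0 + \beta_1 x_i)]^2}{(\sin\theta - \beta_1 \cos\theta)^2}$$

Compute and solve $\dfrac{\partial \Delta}{\partial \beta_0} = 0$, $\dfrac{\partial \Delta}{\partial \beta_1} = 0$.

• We get

$$\hat\beta_1 = \frac{S_{YY} \cot\theta - S_{XY}}{S_{XY} \cot\theta - S_{XX}}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$$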
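A short sketch of the angle-based slope follows. It assumes that $\theta$ is the angle between the measuring direction and the $x$-axis, and it multiplies the numerator and denominator of $\hat\beta_1 = \dfrac{S_{YY}\cot\theta - S_{XY}}{S_{XY}\cot\theta - S_{XX}}$ by $-\sin\theta$ so that $\theta = 0$ can be evaluated without an infinite cotangent. The function name and the numbers are hypothetical.

```python
import math


def angle_slope(s_xx, s_xy, s_yy, theta):
    """Slope when errors are measured along a direction at angle theta to the x-axis."""
    # Same ratio as (S_YY cot(theta) - S_XY) / (S_XY cot(theta) - S_XX),
    # rewritten with sin and cos to avoid dividing by sin(theta).
    numerator = s_xy * math.sin(theta) - s_yy * math.cos(theta)
    denominator = s_xx * math.sin(theta) - s_xy * math.cos(theta)
    return numerator / denominator


# Made-up sums of squares (hypothetical).
s_xx, s_xy, s_yy = 10.0, 6.0, 6.0

vertical = angle_slope(s_xx, s_xy, s_yy, math.pi / 2)   # errors measured vertically
horizontal = angle_slope(s_xx, s_xy, s_yy, 0.0)         # errors measured horizontally
print(vertical, horizontal)
```

Under this convention $\theta = 90^\circ$ reproduces the least squares slope $S_{XY}/S_{XX}$ (only $Y$ has error) and $\theta = 0$ gives $S_{YY}/S_{XY}$ (only $X$ has error).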

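Part 3 Multiple Linear Regression

The Least Squares Estimators

• Similar to simple linear regression, with $p$ predictors and $n$ observations:

$$Y_i = \beta_0 + \beta_1 x_i^{(1)} + \beta_2 x_i^{(2)} + \dots + \beta_p x_i^{(p)} + \varepsilon_i$$

$$\Delta = \sum_{i=1}^{n} \left[y_i - \left(\beta_0 + \beta_1 x_i^{(1)} + \beta_2 x_i^{(2)} + \dots + \beta_p x_i^{(p)}\right)\right]^2$$

• Compute $\dfrac{\partial \Delta}{\partial \beta_0} = 0,\ \dfrac{\partial \Delta}{\partial \beta_1} = 0,\ \dots,\ \dfrac{\partial \Delta}{\partial \beta_p} = 0$.

• We will get a group of equations: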

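$$\begin{aligned}
n\beta_0 + \beta_1 \sum_{i=1}^{n} x_i^{(1)} + \beta_2 \sum_{i=1}^{n} x_i^{(2)} + \dots + \beta_p \sum_{i=1}^{n} x_i^{(p)} &= \sum_{i=1}^{n} y_i \\
\beta_0 \sum_{i=1}^{n} x_i^{(1)} + \beta_1 \sum_{i=1}^{n} x_i^{(1)} x_i^{(1)} + \beta_2 \sum_{i=1}^{n} x_i^{(1)} x_i^{(2)} + \dots + \beta_p \sum_{i=1}^{n} x_i^{(1)} x_i^{(p)} &= \sum_{i=1}^{n} x_i^{(1)} y_i \\
\beta_0 \sum_{i=1}^{n} x_i^{(2)} + \beta_1 \sum_{i=1}^{n} x_i^{(2)} x_i^{(1)} + \beta_2 \sum_{i=1}^{n} x_i^{(2)} x_i^{(2)} + \dots + \beta_p \sum_{i=1}^{n} x_i^{(2)} x_i^{(p)} &= \sum_{i=1}^{n} x_i^{(2)} y_i \\
&\ \ \vdots \\
\beta_0 \sum_{i=1}^{n} x_i^{(p)} + \beta_1 \sum_{i=1}^{n} x_i^{(p)} x_i^{(1)} + \beta_2 \sum_{i=1}^{n} x_i^{(p)} x_i^{(2)} + \dots + \beta_p \sum_{i=1}^{n} x_i^{(p)} x_i^{(p)} &= \sum_{i=1}^{n} x_i^{(p)} y_i
\end{aligned}$$

Assume its coefficient matrix is $(a_{ij})_{(p+1)\times(p+1)}$.

The solution is

$$\begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_p \end{pmatrix} = (a_{ij})_{(p+1)\times(p+1)}^{-1} \begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i^{(1)} y_i \\ \vdots \\ \sum_{i=1}^{n} x_i^{(p)} y_i \end{pmatrix}$$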
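The group of equations can be built and solved directly. The sketch below forms the coefficient matrix and the right-hand side from the data and solves the system by Gaussian elimination with partial pivoting, which is a swapped-in technique: the slides only state the solution through the inverse matrix. All names and the data are illustrative assumptions, and the code assumes the coefficient matrix is nonsingular.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def multiple_least_squares(rows, ys):
    """rows[i] holds the predictor values of observation i; returns [beta0, beta1, ...]."""
    design = [[1.0] + list(row) for row in rows]   # leading 1 for the intercept
    size = len(design[0])
    # Coefficient matrix a_jk = sum_i d_ij * d_ik and right-hand side sum_i d_ij * y_i.
    a = [[sum(d[j] * d[k] for d in design) for k in range(size)] for j in range(size)]
    b = [sum(d[j] * y for d, y in zip(design, ys)) for j in range(size)]
    return solve_linear_system(a, b)


# Made-up data lying exactly on y = 1 + 2*x1 - 3*x2 (hypothetical).
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
ys = [1.0 + 2.0 * x1 - 3.0 * x2 for x1, x2 in rows]
betas = multiple_least_squares(rows, ys)
print(betas)   # close to [1.0, 2.0, -3.0]
```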

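Errors-in-Variables (EIV) Regression Model (Two Variables)

$$Z_i = \beta_0 + \beta_1 X_i + \beta_2 Y_i$$

• The Orthogonal Regression (OR): minimize

$$\Delta = \sum_{i=1}^{n} \frac{[z_i - (\beta_0 + \beta_1 x_i + \beta_2 y_i)]^2}{1 + \beta_1^2 + \beta_2^2}$$

• The Geometric Mean Regression (GMR1) (the volume): minimize

$$\Delta = \sum_{i=1}^{n} \frac{|z_i - (\beta_0 + \beta_1 x_i + \beta_2 y_i)|^3}{6\,|\beta_1 \beta_2|}$$

• The Geometric Mean Regression (GMR2) (the sum of areas): minimize

$$\Delta = \sum_{i=1}^{n} \frac{1}{2}\left(\frac{1}{|\beta_1|} + \frac{1}{|\beta_2|} + \frac{1}{|\beta_1 \beta_2|}\right) [z_i - (\beta_0 + \beta_1 x_i + \beta_2 y_i)]^2$$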
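The three criteria for the plane $z = \beta_0 + \beta_1 x + \beta_2 y$ can be compared by evaluating them for a candidate plane: the squared perpendicular distance (OR), the volume $|r|^3 / (6|\beta_1\beta_2|)$ (GMR1) and the sum of areas $\tfrac{1}{2}\left(\tfrac{1}{|\beta_1|} + \tfrac{1}{|\beta_2|} + \tfrac{1}{|\beta_1\beta_2|}\right) r^2$ (GMR2), where $r$ is the residual in the $z$ direction. The sketch below only evaluates the objectives and does not minimize them. The function names, the use of absolute values, and the single test point are assumptions for illustration.

```python
def residuals(points, b0, b1, b2):
    """Residuals z_i - (b0 + b1 * x_i + b2 * y_i) in the z direction."""
    return [z - (b0 + b1 * x + b2 * y) for x, y, z in points]


def delta_or(points, b0, b1, b2):
    """Sum of squared perpendicular distances to the plane."""
    rss = sum(r ** 2 for r in residuals(points, b0, b1, b2))
    return rss / (1 + b1 ** 2 + b2 ** 2)


def delta_gmr_volume(points, b0, b1, b2):
    """Sum of tetrahedron volumes |r|^3 / (6 |b1 b2|)."""
    cubes = sum(abs(r) ** 3 for r in residuals(points, b0, b1, b2))
    return cubes / (6 * abs(b1 * b2))


def delta_gmr_area(points, b0, b1, b2):
    """Sum of the three right-triangle areas spanned by the axis-parallel legs."""
    weight = 0.5 * (1 / abs(b1) + 1 / abs(b2) + 1 / abs(b1 * b2))
    return weight * sum(r ** 2 for r in residuals(points, b0, b1, b2))


# One made-up point (hypothetical) and the candidate plane z = 2x + 4y.
points = [(0.0, 0.0, 2.0)]
print(delta_or(points, 0.0, 2.0, 4.0))          # 4 / 21
print(delta_gmr_volume(points, 0.0, 2.0, 4.0))  # 8 / 48
print(delta_gmr_area(points, 0.0, 2.0, 4.0))    # 0.5 * 0.875 * 4
```

Here the residual is 2, so the axis-parallel legs from the point to the plane have lengths 2 (in $z$), 1 (in $x$) and 0.5 (in $y$). The volume is $\tfrac{1}{6} \cdot 2 \cdot 1 \cdot 0.5$ and the areas add up to $1 + 0.5 + 0.25 = 1.75$.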


Thanks!!!!!!