Gu Yuxian, Wang Weinan, Beijing National Day School
Research Project For Linear Regression
Part 1 Simple Linear Regression
• Given two variables X and Y:
• $x_1, x_2, \dots, x_n$ are measured without error;
• $y_1, y_2, \dots, y_n$ are measured with error.
• So we can let $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where $\varepsilon_i$ is the random error.
• We can use the least squares estimators and the maximum likelihood estimators to estimate the parameters $\beta_0$ and $\beta_1$.
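The Least Squares Estimators
• Let $\Delta = \sum_{i=1}^{n}(y_i - \hat y_i)^2 = \sum_{i=1}^{n}[y_i - (\beta_0 + \beta_1 x_i)]^2$. All we need to do is to minimize $\Delta$.
• Let $\dfrac{\partial\Delta}{\partial\beta_0} = 0$ and $\dfrac{\partial\Delta}{\partial\beta_1} = 0$, and solve the equations. We get
$$\hat\beta_1 = \frac{S_{XY}}{S_{XX}}, \qquad \hat\beta_0 = \bar y - \frac{S_{XY}}{S_{XX}}\,\bar x,$$
where
$$S_{XY} = \sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y), \qquad S_{XX} = \sum_{i=1}^{n}(x_i - \bar x)^2, \qquad S_{YY} = \sum_{i=1}^{n}(y_i - \bar y)^2.$$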
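The closed-form estimates can be checked with a short pure-Python sketch. The helper names `sums_of_squares` and `least_squares` are our own (hypothetical), and the toy data are made up; points lying on an exact line should be recovered exactly.

```python
def sums_of_squares(xs, ys):
    """Return (S_XX, S_YY, S_XY) around the sample means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xx, s_yy, s_xy


def least_squares(xs, ys):
    """Least squares estimates (beta0_hat, beta1_hat) for Y = beta0 + beta1*x + eps."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx, _, s_xy = sums_of_squares(xs, ys)
    b1 = s_xy / s_xx              # beta1_hat = S_XY / S_XX
    b0 = y_bar - b1 * x_bar       # beta0_hat = y_bar - beta1_hat * x_bar
    return b0, b1


# Points on the exact line y = 1 + 2x are recovered exactly.
print(least_squares([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```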
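The Maximum Likelihood Estimator
• Assume that $\varepsilon_i \sim N(0, \sigma^2)$.
• So
$$F_Y(y_i) = P(Y_i \le y_i) = P(\beta_0 + \beta_1 x_i + \varepsilon_i \le y_i) = P(\varepsilon_i \le y_i - \beta_0 - \beta_1 x_i) = F_\varepsilon(y_i - \beta_0 - \beta_1 x_i),$$
$$f_Y(y_i) = f_\varepsilon(y_i - \beta_0 - \beta_1 x_i) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}}.$$
• The likelihood function is $L = \prod_{i=1}^{n} f_Y(y_i)$, so
$$\ln L = -n\ln\left(\sqrt{2\pi}\,\sigma\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2.$$
• Compute $\dfrac{\partial \ln L}{\partial\beta_0}$ and $\dfrac{\partial \ln L}{\partial\beta_1}$.
• Solve $\dfrac{\partial \ln L}{\partial\beta_0} = 0$, $\dfrac{\partial \ln L}{\partial\beta_1} = 0$.
• We get the same estimators as least squares, because for any fixed $\sigma$, maximizing $\ln L$ over $\beta_0, \beta_1$ is the same as minimizing $\Delta$:
$$\hat\beta_1 = \frac{S_{XY}}{S_{XX}}, \qquad \hat\beta_0 = \bar y - \frac{S_{XY}}{S_{XX}}\,\bar x.$$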
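A quick numerical check that the least squares estimates also maximize $\ln L$: the sketch evaluates $\ln L$ at the least squares estimates and at nearby values with $\sigma$ held at 1. The data, the fixed $\sigma$ and the helper names are illustrative assumptions, not part of the original project.

```python
import math


def log_likelihood(b0, b1, sigma, xs, ys):
    """ln L for Y_i = b0 + b1*x_i + eps_i with eps_i ~ N(0, sigma^2)."""
    n = len(xs)
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return -n * math.log(math.sqrt(2 * math.pi) * sigma) - rss / (2 * sigma ** 2)


def least_squares(xs, ys):
    """Closed-form least squares estimates (beta0_hat, beta1_hat)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.2, 1.1, 1.9, 3.2, 3.9]   # illustrative data
b0, b1 = least_squares(xs, ys)
best = log_likelihood(b0, b1, 1.0, xs, ys)

# Moving (b0, b1) away from the least squares solution lowers ln L.
for d0, d1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert log_likelihood(b0 + d0, b1 + d1, 1.0, xs, ys) < best
```

Since $\sigma$ only enters through terms that do not involve $\beta_0, \beta_1$ (or as a positive scale on the residual sum), the same comparison holds for any fixed $\sigma > 0$.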
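Efficiency Analysis
• They are unbiased. Because $\sum_{i=1}^{n}(x_i - \bar x) = 0$, we can write $\hat\beta_1 = \frac{1}{S_{XX}}\sum_{i=1}^{n}(x_i - \bar x)Y_i$; also $\sum_{i=1}^{n}(x_i - \bar x)x_i = S_{XX}$. So
$$E(\hat\beta_1) = \frac{1}{S_{XX}}\sum_{i=1}^{n}(x_i - \bar x)E(Y_i) = \frac{1}{S_{XX}}\sum_{i=1}^{n}(x_i - \bar x)(\beta_0 + \beta_1 x_i) = \frac{\beta_0}{S_{XX}}\sum_{i=1}^{n}(x_i - \bar x) + \frac{\beta_1}{S_{XX}}\sum_{i=1}^{n}(x_i - \bar x)x_i = \beta_1,$$
$$E(\hat\beta_0) = E(\bar Y) - \bar x\,E(\hat\beta_1) = \frac{1}{n}\sum_{i=1}^{n}(\beta_0 + \beta_1 x_i) - \beta_1\bar x = \beta_0 + \beta_1\bar x - \beta_1\bar x = \beta_0.$$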
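Unbiasedness can also be illustrated by simulation. This sketch assumes true values $\beta_0 = 1$, $\beta_1 = 2$, $\sigma = 1$ and design points $x = 0, \dots, 9$ (all chosen for illustration), draws many samples, and averages the estimates. A Monte Carlo average only illustrates the result; it does not prove it.

```python
import random


def least_squares(xs, ys):
    """Closed-form least squares estimates (beta0_hat, beta1_hat)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def average_estimates(beta0, beta1, sigma, xs, reps, seed=0):
    """Average (beta0_hat, beta1_hat) over `reps` simulated samples."""
    rng = random.Random(seed)
    sum0 = sum1 = 0.0
    for _ in range(reps):
        # Fresh normal errors for every replication; the x values stay fixed.
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        b0, b1 = least_squares(xs, ys)
        sum0 += b0
        sum1 += b1
    return sum0 / reps, sum1 / reps


mean_b0, mean_b1 = average_estimates(1.0, 2.0, 1.0, list(range(10)), reps=2000)
print(mean_b0, mean_b1)   # both close to the true values 1.0 and 2.0
```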
Part 2 Errors-in-Variables (EIV) Regression Model
• This model is used when the measurements of X are not accurate either.
• There are two ways to measure the errors: the orthogonal regression and the geometric mean regression.
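The Orthogonal Regression (OR)
• The distance between the point $(x_i, y_i)$ and the regression line $y = \beta_0 + \beta_1 x$ is
$$d_i = \frac{|y_i - \beta_0 - \beta_1 x_i|}{\sqrt{1 + \beta_1^2}}.$$
• To minimize $\Delta = \sum_{i=1}^{n} d_i^2 = \dfrac{1}{1+\beta_1^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2$, compute and solve $\dfrac{\partial\Delta}{\partial\beta_0} = 0$, $\dfrac{\partial\Delta}{\partial\beta_1} = 0$. The first gives $\beta_0 = \bar y - \beta_1\bar x$, and the second then reduces to $S_{XY}\beta_1^2 + (S_{XX} - S_{YY})\beta_1 - S_{XY} = 0$.
• We get (the root with the "+" sign gives the minimum)
$$\hat\beta_1 = \frac{S_{YY} - S_{XX} + \sqrt{(S_{YY} - S_{XX})^2 + 4S_{XY}^2}}{2S_{XY}}, \qquad \hat\beta_0 = \bar y - \hat\beta_1\bar x.$$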
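A sketch of the orthogonal regression formula, checked against a brute-force scan of slopes (with $\beta_0 = \bar y - \beta_1\bar x$ at each slope). The helper names, the toy data and the scan range are assumptions for illustration.

```python
import math


def sums_of_squares(xs, ys):
    """Return (S_XX, S_YY, S_XY) around the sample means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return (sum((x - x_bar) ** 2 for x in xs),
            sum((y - y_bar) ** 2 for y in ys),
            sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)))


def orthogonal_fit(xs, ys):
    """Orthogonal regression (beta0_hat, beta1_hat); assumes S_XY != 0."""
    s_xx, s_yy, s_xy = sums_of_squares(xs, ys)
    b1 = (s_yy - s_xx + math.sqrt((s_yy - s_xx) ** 2 + 4 * s_xy ** 2)) / (2 * s_xy)
    b0 = sum(ys) / len(ys) - b1 * sum(xs) / len(xs)
    return b0, b1


def perpendicular_ss(b0, b1, xs, ys):
    """Sum of squared perpendicular distances to the line y = b0 + b1*x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (1 + b1 ** 2)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.2, 1.1, 1.9, 3.2, 3.9]   # toy data
b0, b1 = orthogonal_fit(xs, ys)

# Brute-force scan of slopes 0.0 .. 2.0 in steps of 1e-4.
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
grid = [k / 10000 for k in range(20001)]
best = min(grid, key=lambda s: perpendicular_ss(y_bar - s * x_bar, s, xs, ys))
print(b1, best)   # agree to within the grid step
```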
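The Geometric Mean Regression (GMR)
• The horizontal and vertical segments from $(x_i, y_i)$ to the line form a right triangle whose area is
$$\frac{1}{2}\,|y_i - \beta_0 - \beta_1 x_i|\cdot\left|\frac{y_i - \beta_0}{\beta_1} - x_i\right| = \frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2|\beta_1|}.$$
• To minimize $\Delta = \dfrac{1}{2|\beta_1|}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2$, compute and solve $\dfrac{\partial\Delta}{\partial\beta_0} = 0$, $\dfrac{\partial\Delta}{\partial\beta_1} = 0$. We get
$$\hat\beta_1 = \sqrt{\frac{S_{YY}}{S_{XX}}}, \qquad \hat\beta_0 = \bar y - \hat\beta_1\bar x$$
for $S_{XY} > 0$ (in general $\hat\beta_1$ takes the sign of $S_{XY}$).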
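The GMR slope in the same hypothetical pure-Python style. Besides a brute-force scan of the triangle-area criterion, the sketch checks that $\sqrt{S_{YY}/S_{XX}}$ equals the geometric mean of the Y-on-X slope $S_{XY}/S_{XX}$ and the reciprocal X-on-Y slope $S_{YY}/S_{XY}$, which is one way to see where the name comes from. Names, data and the scan range are illustrative.

```python
import math


def sums_of_squares(xs, ys):
    """Return (S_XX, S_YY, S_XY) around the sample means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return (sum((x - x_bar) ** 2 for x in xs),
            sum((y - y_bar) ** 2 for y in ys),
            sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)))


def gmr_slope(xs, ys):
    """GMR slope sqrt(S_YY / S_XX), given the sign of S_XY."""
    s_xx, s_yy, s_xy = sums_of_squares(xs, ys)
    return math.copysign(math.sqrt(s_yy / s_xx), s_xy)


def triangle_area_sum(b0, b1, xs, ys):
    """Total area of the right triangles between the points and y = b0 + b1*x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (2 * abs(b1))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.2, 1.1, 1.9, 3.2, 3.9]   # toy data
s_xx, s_yy, s_xy = sums_of_squares(xs, ys)
b_gmr = gmr_slope(xs, ys)

# Geometric mean of the two least squares slopes.
geo_mean = math.sqrt((s_xy / s_xx) * (s_yy / s_xy))

# Brute-force scan of positive slopes 0.0001 .. 2.0.
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
grid = [k / 10000 for k in range(1, 20001)]
best = min(grid, key=lambda s: triangle_area_sum(y_bar - s * x_bar, s, xs, ys))
print(b_gmr, geo_mean, best)
```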
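Parametric Method
• Assume the observations are a true value $\xi$ plus independent normal errors:
$$X = \xi + \delta, \qquad Y = \beta_0 + \beta_1\xi + \varepsilon,$$
$$\xi \sim N(\mu, \sigma^2), \qquad \delta \sim N(0, \sigma_\delta^2), \qquad \varepsilon \sim N(0, \sigma_\varepsilon^2).$$
Then X and Y follow a bivariate normal distribution.
• We use the moment generating function (mgf) to derive the distribution of X and Y:
$$M_{X,Y}(t_1, t_2) = E\left(e^{t_1X + t_2Y}\right) = e^{\beta_0 t_2}\, M_\xi(t_1 + \beta_1 t_2)\, M_\delta(t_1)\, M_\varepsilon(t_2)$$
$$= \exp\left\{\beta_0 t_2 + \mu(t_1 + \beta_1 t_2) + \tfrac12\sigma^2(t_1 + \beta_1 t_2)^2 + \tfrac12\sigma_\delta^2 t_1^2 + \tfrac12\sigma_\varepsilon^2 t_2^2\right\}.$$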
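• Since $\xi$, $\delta$, $\varepsilon$ are independent, we can separate the mgf into the product of their individual mgfs.
• Reading off the mean vector and covariance matrix, the bivariate normal distribution is
$$\begin{pmatrix}X\\ Y\end{pmatrix} \sim N\left(\begin{pmatrix}\mu\\ \beta_0 + \beta_1\mu\end{pmatrix}, \begin{pmatrix}\sigma^2 + \sigma_\delta^2 & \beta_1\sigma^2\\ \beta_1\sigma^2 & \beta_1^2\sigma^2 + \sigma_\varepsilon^2\end{pmatrix}\right).$$
• Method of moments estimator (MOME): replace the population moments by the sample moments,
$$E(X) = \frac1n\sum_{i=1}^{n} x_i = \bar x, \qquad E(Y) = \frac1n\sum_{i=1}^{n} y_i = \bar y,$$
$$D(X) = \frac1n\sum_{i=1}^{n} x_i^2 - \bar x^2 = \frac{S_{XX}}{n}, \qquad D(Y) = \frac1n\sum_{i=1}^{n} y_i^2 - \bar y^2 = \frac{S_{YY}}{n},$$
$$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = \frac1n\sum_{i=1}^{n} x_iy_i - \bar x\,\bar y = \frac{S_{XY}}{n}.$$
Matching these with the distribution above gives
$$\mu = \bar x, \quad \beta_0 + \beta_1\mu = \bar y, \quad \sigma^2 + \sigma_\delta^2 = \frac{S_{XX}}{n}, \quad \beta_1^2\sigma^2 + \sigma_\varepsilon^2 = \frac{S_{YY}}{n}, \quad \beta_1\sigma^2 = \frac{S_{XY}}{n}.$$
These are five equations in six unknowns ($\mu, \beta_0, \beta_1, \sigma^2, \sigma_\delta^2, \sigma_\varepsilon^2$), so one more assumption is needed.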
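• Assuming the ratio of the error variances $\lambda = \sigma_\varepsilon^2/\sigma_\delta^2$ is known, we get (taking the root with the same sign as $S_{XY}$, so that $\sigma^2 = S_{XY}/(n\hat\beta_1) > 0$):
$$\hat\beta_1 = \frac{S_{YY} - \lambda S_{XX} + \sqrt{(S_{YY} - \lambda S_{XX})^2 + 4\lambda S_{XY}^2}}{2S_{XY}}, \qquad \hat\beta_0 = \bar y - \hat\beta_1\bar x.$$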
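A simulation sketch of the parametric model. All settings are assumptions chosen for illustration: $\mu = 0$, $\sigma = 2$, $\sigma_\delta = \sigma_\varepsilon = 0.5$ (so $\lambda = 1$), $\beta_0 = 1$, $\beta_1 = 2$ and $n = 20000$. It compares the moment-based slope (the hypothetical helper `mome_slope`) with the least squares slope $S_{XY}/S_{XX}$, which ignores the error in X and, by the covariance matrix above, tends to $\beta_1\sigma^2/(\sigma^2 + \sigma_\delta^2)$.

```python
import math
import random


def sums_of_squares(xs, ys):
    """Return (S_XX, S_YY, S_XY) around the sample means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return (sum((x - x_bar) ** 2 for x in xs),
            sum((y - y_bar) ** 2 for y in ys),
            sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)))


def mome_slope(xs, ys, lam):
    """Moment estimator of beta1 for a known lam = sigma_eps^2 / sigma_delta^2."""
    s_xx, s_yy, s_xy = sums_of_squares(xs, ys)
    a = s_yy - lam * s_xx
    return (a + math.sqrt(a * a + 4 * lam * s_xy ** 2)) / (2 * s_xy)


rng = random.Random(1)
beta0, beta1 = 1.0, 2.0
xs, ys = [], []
for _ in range(20000):
    xi = rng.gauss(0.0, 2.0)                              # true value, sigma = 2
    xs.append(xi + rng.gauss(0.0, 0.5))                   # X = xi + delta
    ys.append(beta0 + beta1 * xi + rng.gauss(0.0, 0.5))   # Y = b0 + b1*xi + eps

s_xx, _, s_xy = sums_of_squares(xs, ys)
print(mome_slope(xs, ys, lam=1.0))   # close to the true slope 2
print(s_xy / s_xx)                   # attenuated, near 2 * 4 / 4.25 (about 1.88)
```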
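Special Situations for MLE
With $\lambda = \sigma_\varepsilon^2/\sigma_\delta^2$ in the estimator above:
• The Orthogonal Regression (OR): $\sigma_\varepsilon^2 = \sigma_\delta^2$, i.e. $\lambda = 1$, gives
$$\hat\beta_1 = \frac{S_{YY} - S_{XX} + \sqrt{(S_{YY} - S_{XX})^2 + 4S_{XY}^2}}{2S_{XY}}.$$
• The Geometric Mean Regression (GMR): $\lambda = S_{YY}/S_{XX}$, so $S_{YY} - \lambda S_{XX} = 0$ and
$$\hat\beta_1 = \frac{\sqrt{4\lambda S_{XY}^2}}{2S_{XY}} = \sqrt{\frac{S_{YY}}{S_{XX}}} \qquad (S_{XY} > 0).$$
• $\sigma_\varepsilon^2 = 0$ ($\lambda = 0$): $\hat\beta_1 = \dfrac{S_{YY}}{S_{XY}}$. This is when Y has no error.
• $\sigma_\delta^2 = 0$ ($\lambda \to \infty$): $\hat\beta_1 = \dfrac{S_{XY}}{S_{XX}}$. This is when X has no error, so we get the same answer as our first discussion.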
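The four special situations can be checked numerically by plugging the corresponding $\lambda$ into the general formula. The toy data, the helper names and the stand-in value $\lambda = 10^6$ for $\lambda \to \infty$ are assumptions for illustration.

```python
import math


def sums_of_squares(xs, ys):
    """Return (S_XX, S_YY, S_XY) around the sample means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return (sum((x - x_bar) ** 2 for x in xs),
            sum((y - y_bar) ** 2 for y in ys),
            sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)))


def mle_slope(s_xx, s_yy, s_xy, lam):
    """General slope for a known lam = sigma_eps^2 / sigma_delta^2 (S_XY > 0 assumed)."""
    a = s_yy - lam * s_xx
    return (a + math.sqrt(a * a + 4 * lam * s_xy ** 2)) / (2 * s_xy)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.2, 1.1, 1.9, 3.2, 3.9]   # toy data
s_xx, s_yy, s_xy = sums_of_squares(xs, ys)

or_slope = (s_yy - s_xx + math.sqrt((s_yy - s_xx) ** 2 + 4 * s_xy ** 2)) / (2 * s_xy)
cases = [
    ("OR, lam = 1", mle_slope(s_xx, s_yy, s_xy, 1.0), or_slope),
    ("GMR, lam = S_YY/S_XX", mle_slope(s_xx, s_yy, s_xy, s_yy / s_xx), math.sqrt(s_yy / s_xx)),
    ("Y has no error, lam = 0", mle_slope(s_xx, s_yy, s_xy, 0.0), s_yy / s_xy),
    ("X has no error, lam large", mle_slope(s_xx, s_yy, s_xy, 1e6), s_xy / s_xx),
]
for name, general, special in cases:
    print(f"{name}: {general:.6f} vs {special:.6f}")
```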
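Another Estimator
• We want an estimator that (1) covers all values of $\lambda$ (like the MLE) and (2) needs no distributional assumptions (like OR and GMR).
• Weight the squared vertical distance by $c$ and the squared horizontal distance by $1-c$, with $c \in [0, 1]$, and calculate
$$\Delta = \sum_{i=1}^{n}\left[c(y_i - \hat y_i)^2 + (1-c)(x_i - \hat x_i)^2\right] = \left(c + \frac{1-c}{\beta_1^2}\right)\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2,$$
where $\hat y_i = \beta_0 + \beta_1 x_i$ and $\hat x_i = (y_i - \beta_0)/\beta_1$.
• Solving $\dfrac{\partial\Delta}{\partial\beta_0} = 0$, $\dfrac{\partial\Delta}{\partial\beta_1} = 0$ gives $\hat\beta_0 = \bar y - \hat\beta_1\bar x$ and the quartic equation
$$cS_{XX}\beta_1^4 - cS_{XY}\beta_1^3 + (1-c)S_{XY}\beta_1 - (1-c)S_{YY} = 0.$$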
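• Let $\hat\beta_1(\lambda)$ be the MLE slope as a function of $\lambda = \sigma_\varepsilon^2/\sigma_\delta^2$:
$$\hat\beta_1(\lambda) = \frac{S_{YY} - \lambda S_{XX} + \sqrt{(S_{YY} - \lambda S_{XX})^2 + 4\lambda S_{XY}^2}}{2S_{XY}}.$$
Then
$$\frac{d\hat\beta_1}{d\lambda} = \frac{1}{2S_{XY}}\left[\frac{2S_{XY}^2 - S_{XX}(S_{YY} - \lambda S_{XX})}{\sqrt{(S_{YY} - \lambda S_{XX})^2 + 4\lambda S_{XY}^2}} - S_{XX}\right].$$
• Since
$$\left[2S_{XY}^2 - S_{XX}(S_{YY} - \lambda S_{XX})\right]^2 - S_{XX}^2\left[(S_{YY} - \lambda S_{XX})^2 + 4\lambda S_{XY}^2\right] = 4S_{XY}^2\left(S_{XY}^2 - S_{XX}S_{YY}\right) \le 0$$
by the Cauchy-Schwarz inequality, the bracket is never positive.
• So, for $S_{XY} > 0$, $\hat\beta_1(\lambda)$ is decreasing in $\lambda$ (strictly, unless all points lie on one line), from $\hat\beta_1(0) = \dfrac{S_{YY}}{S_{XY}}$ towards $\hat\beta_1(+\infty) = \dfrac{S_{XY}}{S_{XX}}$.
• We get a one-to-one correspondence
$$\lambda \in [0, +\infty) \;\longleftrightarrow\; \hat\beta_1 \in \left(\frac{S_{XY}}{S_{XX}}, \frac{S_{YY}}{S_{XY}}\right].$$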
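• Prove that $c \in [0, 1]$ corresponds one-to-one to $\hat\beta_1 \in \left[\dfrac{S_{XY}}{S_{XX}}, \dfrac{S_{YY}}{S_{XY}}\right]$ (assuming $S_{XY} > 0$).
• Let $f(\beta_1) = cS_{XX}\beta_1^4 - cS_{XY}\beta_1^3 + (1-c)S_{XY}\beta_1 - (1-c)S_{YY}$. Then
$$f\left(\frac{S_{XY}}{S_{XX}}\right) = \frac{(1-c)\left(S_{XY}^2 - S_{XX}S_{YY}\right)}{S_{XX}} \le 0, \qquad f\left(\frac{S_{YY}}{S_{XY}}\right) = \frac{cS_{YY}^3\left(S_{XX}S_{YY} - S_{XY}^2\right)}{S_{XY}^4} \ge 0.$$
So there is at least one root of $f$ in this interval.
• On the interval, $\beta_1 \ge S_{XY}/S_{XX} > 0$ gives $4S_{XX}\beta_1 - 3S_{XY} \ge S_{XY} > 0$, so we can prove
$$f'(\beta_1) = c\beta_1^2\left(4S_{XX}\beta_1 - 3S_{XY}\right) + (1-c)S_{XY} > 0.$$
So there is ONLY one root of $f(\beta_1) = 0$ in the interval.
• Conversely, solving $f(\beta_1) = 0$ for $c$ gives
$$c = \frac{S_{YY} - S_{XY}\beta_1}{\beta_1^3\left(S_{XX}\beta_1 - S_{XY}\right) + \left(S_{YY} - S_{XY}\beta_1\right)},$$
and both brackets are $\ge 0$ on the interval, so $c \in [0, 1]$; $c = 1$ gives $\beta_1 = S_{XY}/S_{XX}$ and $c = 0$ gives $\beta_1 = S_{YY}/S_{XY}$.
• Then we have the one-to-one correspondence $c \in [0, 1] \longleftrightarrow \hat\beta_1 \in \left[\dfrac{S_{XY}}{S_{XX}}, \dfrac{S_{YY}}{S_{XY}}\right]$. This interval contains every slope produced by some $\lambda$, so choosing $c$ plays the role of choosing $\lambda$, without any distributional assumption.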
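A sketch of the weighted estimator: for a given $c$ it finds the unique root of $f$ on $[S_{XY}/S_{XX}, S_{YY}/S_{XY}]$ by bisection (valid because $f(\text{left}) \le 0 \le f(\text{right})$ and $f$ is increasing there), and it inverts the formula for $c$ to pick the weight that reproduces the $\lambda = 1$ slope. Bisection is our own choice of root finder; the function names and data are hypothetical, and the code assumes $S_{XY} > 0$.

```python
import math


def sums_of_squares(xs, ys):
    """Return (S_XX, S_YY, S_XY) around the sample means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return (sum((x - x_bar) ** 2 for x in xs),
            sum((y - y_bar) ** 2 for y in ys),
            sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)))


def weighted_slope(xs, ys, c, iters=200):
    """Root of c*Sxx*b^4 - c*Sxy*b^3 + (1-c)*Sxy*b - (1-c)*Syy on [Sxy/Sxx, Syy/Sxy]."""
    s_xx, s_yy, s_xy = sums_of_squares(xs, ys)

    def f(b):
        return c * s_xx * b**4 - c * s_xy * b**3 + (1 - c) * s_xy * b - (1 - c) * s_yy

    lo, hi = s_xy / s_xx, s_yy / s_xy      # f(lo) <= 0 <= f(hi)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:                     # f is increasing: the root is left of mid
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def weight_for_slope(xs, ys, b):
    """Solve the quartic for c: the weight whose root is the slope b."""
    s_xx, s_yy, s_xy = sums_of_squares(xs, ys)
    num = s_yy - s_xy * b
    return num / (b**3 * (s_xx * b - s_xy) + num)


def mle_slope(xs, ys, lam):
    """Parametric-method slope for a known lam = sigma_eps^2 / sigma_delta^2."""
    s_xx, s_yy, s_xy = sums_of_squares(xs, ys)
    a = s_yy - lam * s_xx
    return (a + math.sqrt(a * a + 4 * lam * s_xy ** 2)) / (2 * s_xy)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.2, 1.1, 1.9, 3.2, 3.9]   # toy data

# Pick the c that reproduces the lam = 1 (orthogonal) slope, then recover the slope.
b_or = mle_slope(xs, ys, 1.0)
c = weight_for_slope(xs, ys, b_or)
print(c, weighted_slope(xs, ys, c), b_or)
```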
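Another Estimator Again
• Measure the distance from $(x_i, y_i)$ to the line along a fixed direction making the angle $\theta$ ($0 < \theta < \pi$) with the x-axis. Applying the law of sines to the triangle formed with the vertical segment gives
$$d_i = \frac{|y_i - \beta_0 - \beta_1 x_i|}{|\sin\theta - \beta_1\cos\theta|}.$$
• Let $\Delta = \sum_{i=1}^{n} d_i^2 = \dfrac{\sum_{i=1}^{n}[y_i - (\beta_0 + \beta_1 x_i)]^2}{(\sin\theta - \beta_1\cos\theta)^2}$. Compute and solve $\dfrac{\partial\Delta}{\partial\beta_0} = 0$, $\dfrac{\partial\Delta}{\partial\beta_1} = 0$.
• We get
$$\hat\beta_1 = \frac{S_{XY} - \cot\theta\, S_{YY}}{S_{XX} - \cot\theta\, S_{XY}}, \qquad \hat\beta_0 = \bar y - \hat\beta_1\bar x.$$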
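A sketch of the angle-based estimator with a brute-force check of $\Delta$ at $\theta = 3\pi/4$ (a direction perpendicular to a line of slope 1). Setting $\theta = \pi/2$ (vertical distances) should give back the least squares slope $S_{XY}/S_{XX}$. Names, data and the scan range are illustrative assumptions.

```python
import math


def sums_of_squares(xs, ys):
    """Return (S_XX, S_YY, S_XY) around the sample means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return (sum((x - x_bar) ** 2 for x in xs),
            sum((y - y_bar) ** 2 for y in ys),
            sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)))


def oblique_slope(xs, ys, theta):
    """beta1_hat = (S_XY - cot(theta) S_YY) / (S_XX - cot(theta) S_XY), 0 < theta < pi."""
    s_xx, s_yy, s_xy = sums_of_squares(xs, ys)
    k = math.cos(theta) / math.sin(theta)   # cot(theta)
    return (s_xy - k * s_yy) / (s_xx - k * s_xy)


def oblique_ss(b0, b1, theta, xs, ys):
    """Sum of squared distances to y = b0 + b1*x measured along the angle theta."""
    denom = (math.sin(theta) - b1 * math.cos(theta)) ** 2
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / denom


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.2, 1.1, 1.9, 3.2, 3.9]   # toy data
theta = 3 * math.pi / 4
b1 = oblique_slope(xs, ys, theta)

# Brute-force scan of slopes 0.5 .. 1.5 in steps of 1e-4.
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
grid = [0.5 + k / 10000 for k in range(10001)]
best = min(grid, key=lambda s: oblique_ss(y_bar - s * x_bar, s, theta, xs, ys))
print(b1, best)
```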
Part 3 Multiple Linear Regression
The Least Squares Estimators
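• Similar to simple linear regression, with $p$ explanatory variables $x^{(1)}, \dots, x^{(p)}$ let
$$Y_i = \beta_0 + \beta_1 x_i^{(1)} + \beta_2 x_i^{(2)} + \cdots + \beta_p x_i^{(p)} + \varepsilon_i, \qquad i = 1, \dots, n.$$
• Compute
$$\Delta = \sum_{i=1}^{n}\left[y_i - \left(\beta_0 + \beta_1 x_i^{(1)} + \beta_2 x_i^{(2)} + \cdots + \beta_p x_i^{(p)}\right)\right]^2$$
and set $\dfrac{\partial\Delta}{\partial\beta_0} = 0, \dfrac{\partial\Delta}{\partial\beta_1} = 0, \dots, \dfrac{\partial\Delta}{\partial\beta_p} = 0$.
• We will get a group of equations (all sums run over $i = 1, \dots, n$):
$$\begin{cases}
n\beta_0 + \beta_1\sum x_i^{(1)} + \cdots + \beta_p\sum x_i^{(p)} = \sum y_i\\
\beta_0\sum x_i^{(1)} + \beta_1\sum \left(x_i^{(1)}\right)^2 + \cdots + \beta_p\sum x_i^{(1)}x_i^{(p)} = \sum x_i^{(1)}y_i\\
\qquad\vdots\\
\beta_0\sum x_i^{(p)} + \beta_1\sum x_i^{(p)}x_i^{(1)} + \cdots + \beta_p\sum \left(x_i^{(p)}\right)^2 = \sum x_i^{(p)}y_i
\end{cases}$$
• Assume its coefficient matrix is $A = (a_{ij})_{(p+1)\times(p+1)}$ and that $A$ is invertible. The solution is
$$\begin{pmatrix}\hat\beta_0\\ \hat\beta_1\\ \vdots\\ \hat\beta_p\end{pmatrix} = A^{-1}\begin{pmatrix}\sum y_i\\ \sum x_i^{(1)}y_i\\ \vdots\\ \sum x_i^{(p)}y_i\end{pmatrix}.$$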
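The normal equations can be assembled and solved directly. This is a minimal pure-Python sketch that uses Gaussian elimination with partial pivoting rather than forming $A^{-1}$ explicitly (a standard numerical choice, not necessarily the authors'); the function names and the exact-plane test data are hypothetical.

```python
def normal_equations(X, y):
    """Build A = (a_ij) and the right-hand side of the normal equations.

    X holds one row [x_i^(1), ..., x_i^(p)] per observation.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]   # column of 1s for beta_0
    k = len(rows[0])                                    # k = p + 1 unknowns
    A = [[sum(r[j] * r[l] for r in rows) for l in range(k)] for j in range(k)]
    rhs = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(k)]
    return A, rhs


def solve(A, rhs):
    """Solve A beta = rhs by Gaussian elimination with partial pivoting."""
    k = len(A)
    M = [row[:] + [b] for row, b in zip(A, rhs)]        # augmented matrix
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for j in range(col, k + 1):
                M[r][j] -= factor * M[col][j]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):                      # back substitution
        beta[r] = (M[r][k] - sum(M[r][j] * beta[j] for j in range(r + 1, k))) / M[r][r]
    return beta


# Exact data from y = 1 + 2*x1 - 3*x2, so the fit should return [1, 2, -3].
X = [[0, 1], [1, 0], [2, 2], [3, 1], [4, 3], [5, 2]]
y = [-2, 3, -1, 4, 0, 5]
A, rhs = normal_equations(X, y)
beta = solve(A, rhs)
print(beta)
```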
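Errors-in-Variables (EIV) Regression Model (Two Variables)
The model is $Z_i = \beta_0 + \beta_1 X_i + \beta_2 Y_i$, fitted to the points $(x_i, y_i, z_i)$.
• The Orthogonal Regression (OR): minimize the sum of squared distances to the plane,
$$\sum_{i=1}^{n}\frac{[z_i - (\beta_0 + \beta_1 x_i + \beta_2 y_i)]^2}{1 + \beta_1^2 + \beta_2^2}.$$
• The Geometric Mean Regression (GMR1) (the volume): the segments from each point to the plane parallel to the three axes span a tetrahedron; minimize the total volume
$$\sum_{i=1}^{n}\frac{|z_i - (\beta_0 + \beta_1 x_i + \beta_2 y_i)|^3}{6|\beta_1\beta_2|}.$$
• The Geometric Mean Regression (GMR2) (the sum of areas): minimize the total area of the three right-triangle faces meeting at each point,
$$\frac{1}{2}\left(\frac{1}{|\beta_1|} + \frac{1}{|\beta_2|} + \frac{1}{|\beta_1\beta_2|}\right)\sum_{i=1}^{n}[z_i - (\beta_0 + \beta_1 x_i + \beta_2 y_i)]^2.$$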
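A sketch that evaluates the three criteria for a candidate plane and cross-checks the closed forms against the geometry. The legs $a, b, c$ of each corner tetrahedron are the distances from the point to the plane parallel to the x-, y- and z-axes; the volume is $abc/6$, the three right-triangle faces give the area sum, and the perpendicular distance $h$ satisfies $1/h^2 = 1/a^2 + 1/b^2 + 1/c^2$. The points, the candidate $(\beta_0, \beta_1, \beta_2)$ and all names are assumptions for illustration; the code assumes $\beta_1, \beta_2 \ne 0$ and no point lying on the plane.

```python
def residual(b, x, y, z):
    """Signed z-direction residual z - (b0 + b1*x + b2*y)."""
    b0, b1, b2 = b
    return z - (b0 + b1 * x + b2 * y)


def legs(b, x, y, z):
    """Distances from (x, y, z) to the plane measured parallel to the x-, y- and z-axes."""
    b0, b1, b2 = b
    leg_x = abs(x - (z - b0 - b2 * y) / b1)   # move along x until the plane is hit
    leg_y = abs(y - (z - b0 - b1 * x) / b2)   # move along y
    leg_z = abs(residual(b, x, y, z))         # move along z
    return leg_x, leg_y, leg_z


def or_criterion(b, pts):
    _, b1, b2 = b
    return sum(residual(b, *p) ** 2 for p in pts) / (1 + b1 ** 2 + b2 ** 2)


def gmr1_criterion(b, pts):
    _, b1, b2 = b
    return sum(abs(residual(b, *p)) ** 3 for p in pts) / (6 * abs(b1 * b2))


def gmr2_criterion(b, pts):
    _, b1, b2 = b
    factor = 0.5 * (1 / abs(b1) + 1 / abs(b2) + 1 / abs(b1 * b2))
    return factor * sum(residual(b, *p) ** 2 for p in pts)


pts = [(0.0, 1.0, 2.5), (1.0, 0.0, 1.0), (2.0, 2.0, 4.0), (3.0, 1.0, 3.5)]  # toy data
b = (0.5, 1.5, -0.8)                                                    # candidate plane

all_legs = [legs(b, *p) for p in pts]
volume = sum(lx * ly * lz / 6 for lx, ly, lz in all_legs)
area = sum((lx * ly + ly * lz + lz * lx) / 2 for lx, ly, lz in all_legs)
perp_sq = sum(1 / (1 / lx**2 + 1 / ly**2 + 1 / lz**2) for lx, ly, lz in all_legs)
print(volume, gmr1_criterion(b, pts))
print(area, gmr2_criterion(b, pts))
print(perp_sq, or_criterion(b, pts))
```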
Thanks!