MODEL IS WRONG?!
S. Eguchi, ISM & GUAS
What is MODEL?
No Model is True!
● Features of interest can be reflected in the model
● Patterns of interest can be incorporated into the model
● Observations can only be made to finite precision
Cf. J. K. Lindsey, "Parametric Statistical Inference"
Asymptotics on correct model
Large sample asymptotics
Asymptotic consistency, normality
Asymptotic efficiency
(Higher-order asymptotics)
Non-parametric asymptotics
Outline
● Near-Model
● Bridge between parametrics and non-parametrics
● Non-efficiency under Near model
Near model
$X_1, \ldots, X_n \sim g(x)$
$\min_\theta \mathrm{KL}(g, f(\cdot,\theta)) = O(n^{-\alpha})$, $\alpha \ge 0$
parametric
non-parametric
near-parametric
$M = \{ f(x,\theta) : \theta \in \Theta \}$
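As a rough numerical illustration of the near-model idea, the sketch below approximates $\min_\theta \mathrm{KL}(g, f(\cdot,\theta))$ on a grid when the model $M$ is taken to be the normal family. The contaminated-normal density `mixture`, the grid limits, and all function names are assumptions made for this example; none of them come from the slides. For the normal family the KL-closest member matches the mean and variance of $g$, so the code uses moment matching in place of a numerical search.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def kl_to_normal_family(g_vals, grid):
    """Approximate min over (mean, sd) of KL(g, N(mean, sd^2)) on a uniform grid.

    For the normal family the KL-closest member has the same mean and
    variance as g, so no numerical optimisation is needed.
    """
    dx = grid[1] - grid[0]
    g_vals = g_vals / (g_vals.sum() * dx)      # renormalise g on the grid
    mean = np.sum(grid * g_vals) * dx
    sd = np.sqrt(np.sum((grid - mean) ** 2 * g_vals) * dx)
    f_vals = normal_pdf(grid, mean, sd)
    kl = np.sum(g_vals * np.log(g_vals / f_vals)) * dx
    return kl, mean, sd

grid = np.linspace(-12.0, 12.0, 4801)

def mixture(eps):
    # Hypothetical "true" density g: N(0, 1) contaminated by N(0, 3^2).
    return (1.0 - eps) * normal_pdf(grid, 0.0, 1.0) + eps * normal_pdf(grid, 0.0, 3.0)

kl_far, _, _ = kl_to_normal_family(mixture(0.2), grid)
kl_near, _, _ = kl_to_normal_family(mixture(0.02), grid)
kl_exact, mean0, sd0 = kl_to_normal_family(mixture(0.0), grid)
print(kl_far, kl_near, kl_exact)
```

Shrinking the contamination weight `eps` moves $g$ toward the model, and the minimized divergence shrinks with it; a near-model sequence lets this divergence go to zero at a rate tied to $n$.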
Tubular Neighborhood
$N_\epsilon(M) = \{ g : \min_\theta \mathrm{KL}(g, f(\cdot,\theta)) \le \epsilon \}$
Density estimation
$Y_1, \ldots, Y_n \overset{\mathrm{iid}}{\sim} g(y)$
Estimate g(y)
Kernel estimate
$\hat g(y,h) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{y - y_i}{h}\right)$
$\hat g(y,h) \to g(y)$ a.e. as $n \to \infty$, $h \to 0$
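A minimal sketch of the kernel estimate $\hat g(y,h)$, assuming a Gaussian kernel $K$ (the slides do not say which kernel is used). The simulated sample, the bandwidth `h=0.4`, and the function names are illustrative choices only.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kernel_density(y_grid, sample, h):
    """g_hat(y, h) = 1/(n h) * sum_i K((y - y_i) / h), evaluated on y_grid."""
    y_grid = np.asarray(y_grid, dtype=float)
    sample = np.asarray(sample, dtype=float)
    u = (y_grid[:, None] - sample[None, :]) / h   # one row per grid point
    return gaussian_kernel(u).sum(axis=1) / (sample.size * h)

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=500)   # simulated data, not from the slides
y_grid = np.linspace(-6.0, 6.0, 601)
g_hat = kernel_density(y_grid, sample, h=0.4)

# Riemann-sum check that the estimate integrates to roughly one.
area = g_hat.sum() * (y_grid[1] - y_grid[0])
print(area)
```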
Local Likelihood
The main body
$L_{\mathrm{main}}(\theta, y, h) = \frac{1}{n} \sum_{i=1}^{n} K\left(\frac{y_i - y}{h}\right) \log f(y_i, \theta)$
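$L(\theta, y, h) = L_{\mathrm{main}}(\theta, y, h)$ plus an average over $i = 1, \ldots, n$ of terms in $K\{(y_i - y)/h\}$ and $E[K\{(z - y)/h\}]$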
Localization versions
(Eguchi, Copas, 1998)
Local likelihood density estimate
$\tilde\theta(y,h) = \arg\max_\theta L_{\mathrm{main}}(\theta, y, h)$
Maximum Local Likelihood Estimator
The density estimator
$\tilde f(y,h) = Z_h^{-1} f(y, \tilde\theta(y,h))$ ($Z_h$: normalizing const)
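The two steps above (maximize the kernel-weighted log-likelihood at each $y$, then renormalize) can be sketched as follows. This is an illustration under strong assumptions that are not in the slides: the model $f(y,\theta)$ is $N(\theta, \mathrm{sd}^2)$ with a fixed `sd`, the kernel is Gaussian, only the main body of the local likelihood is maximized, and $Z_h$ is computed by a Riemann sum on the grid. With a fixed `sd`, the maximizer over $\theta$ is simply the kernel-weighted mean of the data.

```python
import numpy as np

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def local_theta(y_grid, sample, h):
    """theta_tilde(y, h) for the N(theta, sd^2) model with fixed sd.

    Maximising sum_i K((y_i - y)/h) * log f(y_i, theta) over theta gives
    the kernel-weighted mean of the sample.
    """
    u = (sample[None, :] - y_grid[:, None]) / h
    w = np.exp(-0.5 * u ** 2)                  # Gaussian kernel weights (assumption)
    return (w * sample[None, :]).sum(axis=1) / w.sum(axis=1)

def local_likelihood_density(y_grid, sample, h, sd):
    """f_tilde(y, h) = f(y, theta_tilde(y, h)) / Z_h on a uniform grid."""
    theta = local_theta(y_grid, sample, h)
    unnormalised = normal_pdf(y_grid, theta, sd)
    z_h = unnormalised.sum() * (y_grid[1] - y_grid[0])   # normalising constant Z_h
    return unnormalised / z_h, theta

rng = np.random.default_rng(1)
# Simulated bimodal data; the slides do not give the data set.
sample = np.concatenate([rng.normal(-2.5, 1.0, 200), rng.normal(2.5, 1.0, 200)])
y_grid = np.linspace(-7.5, 7.5, 601)
dx = y_grid[1] - y_grid[0]

f_local, theta_local = local_likelihood_density(y_grid, sample, h=1.5, sd=2.0)
f_global, theta_global = local_likelihood_density(y_grid, sample, h=1e3, sd=2.0)
```

A very large `h` makes all kernel weights nearly equal, so `theta_global` is close to the sample mean at every grid point and the estimate reduces to the global parametric fit; a moderate `h` lets the fitted parameter follow the data locally.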
[Figure: surface plot of $\tilde f(y,h)$ over $y \in [-5, 5]$ and $h \in [3, 8]$; $h_{\mathrm{opt}} = 3.65$]
Global vs Local likelihood
[Figure: two density panels on $y \in [-7.5, 7.5]$: Global ($h = \infty$) and Local ($h_{\mathrm{opt}} = 3.65$)]
Regression function
Estimate $\mu(x) = E(Y|x)$
$\min_\beta E\{\mathrm{KL}(\mu(x), m(\beta^T x))\} = O(n^{-\alpha})$
$d = \dim(x)$, $p = \dim(\beta)$
GLM: $m(\beta^T x)$
Cf. Eguchi, Kim, Park (2002)
Bridge of nonpara / parametric
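Optimal bandwidth $h_{\mathrm{opt}}$ in three regimes of $\alpha$ (thresholds depend on $p$ and $d$):
● $h_{\mathrm{opt}}$ of polynomial order in $n$ ($h$ large)
● $h_{\mathrm{opt}}$ of polynomial order in $n$ ($h$ small)
● $h_{\mathrm{opt}} = O(1)$ ($h$ finite)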
$\min_\beta E\{\mathrm{KL}(\mu(x), m(x, \beta))\} = O(n^{-\alpha})$
Discriminant Analysis
Input vector, label
Logistic model
Almost logistic model
A class of loss functions
For given data
Estimate the score
Logistic loss
Error rate
Medical screening
where
Empirical loss
For training data
score
Estimating function
IRLS
where
Logistic
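A minimal IRLS sketch for the logistic special case only, where the estimating function is $\sum_i (y_i - p_i) x_i$ with $p_i = 1/(1 + e^{-\beta^T x_i})$. The weights that the slides use for the wider class of loss functions are not recoverable from the transcript, so they are not attempted here. The simulated data, `beta_true`, and the function names are assumptions for illustration.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def irls_logistic(X, y, n_iter=25, tol=1e-10):
    """Fit beta in P(y = 1 | x) = sigmoid(beta^T x) by IRLS (Newton steps)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)                       # IRLS weights
        grad = X.T @ (y - p)                    # logistic estimating function
        hess = X.T @ (X * w[:, None])           # weighted Gram matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept plus one covariate
beta_true = np.array([-0.5, 1.0])                        # hypothetical true parameter
y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)

beta_hat = irls_logistic(X, y)
score = X @ beta_hat            # linear score beta_hat^T x for each observation
print(beta_hat)
```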
Asymptotic efficiency
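$\mathrm{var}_A(\hat\beta) = \frac{1}{n} J^{-1}(\beta) V(\beta) J^{-1}(\beta)$
Cramér-Rao type
$\mathrm{var}_A(\hat\beta) \ge \frac{1}{n} [E\{p(1-p)\, x x^T\}]^{-1}$
(equality for the logistic loss).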
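To make the sandwich form concrete, the sketch below computes plug-in versions of $\frac{1}{n} J^{-1} V J^{-1}$ and of the bound $\frac{1}{n}[E\{p(1-p) x x^T\}]^{-1}$ for the logistic estimating function on simulated data. Evaluating both at a known `beta_true` (rather than at a fitted value), the simulation design, and the function names are assumptions made to keep the example short.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def sandwich_variance(X, y, beta):
    """Plug-in estimates of (1/n) J^{-1} V J^{-1} and (1/n) J^{-1} for logistic loss."""
    n = X.shape[0]
    p = sigmoid(X @ beta)
    scores = X * (y - p)[:, None]               # per-observation estimating function
    V = scores.T @ scores / n                   # empirical variance of the scores
    J = X.T @ (X * (p * (1.0 - p))[:, None]) / n   # empirical E{p(1-p) x x^T}
    J_inv = np.linalg.inv(J)
    return J_inv @ V @ J_inv / n, J_inv / n

rng = np.random.default_rng(2)
n = 20000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])               # hypothetical true parameter
y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)

var_sandwich, var_bound = sandwich_variance(X, y, beta_true)
ratio = np.diag(var_sandwich) / np.diag(var_bound)
print(ratio)
```

When the data really follow the logistic model, $V$ and $J$ estimate the same matrix, so the ratio of the diagonals should be close to one, up to simulation noise.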
Risk under correct model
Expected D-loss
$\mathrm{Risk}(\hat\beta, D) = E\{D(\hat\beta)\}$
Under the correct model, $\mathrm{Risk}(\hat\beta_1, D)$ and $\mathrm{Risk}(\hat\beta_0, D)$ are compared up to $o(n^{-1})$
Risk under near model
$\beta_1 = \arg\min_\beta D_1(\beta)$
$\mathrm{Risk}(\hat\beta_1, D) = D(\beta_1) + \frac{1}{2} \mathrm{tr}\{\mathrm{var}_A(\hat\beta_1)\, \mathrm{Hesse}(D)\}$
λ-family
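Target risk: $\mathrm{risk}(\cdot, D)$
λ-family: $D_\lambda(\beta) = (1 - \lambda) D_0(\beta) + \lambda D(\beta)$
$\hat\beta_\lambda = \arg\min_\beta D_\lambda^{\mathrm{emp}}(\beta)$, score: $\hat\beta_\lambda^T x$
$\mathrm{Risk}(\hat\beta_1, D) = D(\beta_1) + \frac{1}{2} \mathrm{tr}\{\mathrm{var}_A(\hat\beta_1)\, \mathrm{Hesse}(D)\}$
(Proof) $\lambda_{\mathrm{opt}}$
(Eguchi, Copas, 2002)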
Some analysis
False positive rate: 0.435%, 0.423%
$\lambda_{\mathrm{opt}} = 0.15$
Conclusions
● Near-Model
● Bridge between parametrics and non-parametrics
● Non-efficiency under Near model
α-neighborhood
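Three ranges of $\alpha$ between 0 and 1, with boundaries depending on $p$ and $d$
$\mathrm{Risk} = \mathrm{var} + \mathrm{bias}^2$
At $h_{\mathrm{opt}}$ the risk is of polynomial order in $n$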
Future project??
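$X_1, \ldots, X_n \sim g(x)$
$\min_{\theta_k} \mathrm{KL}(g, f_k(\cdot,\theta_k)) = O(n^{-\alpha_k})$
$M_k = \{ f_k(x,\theta_k) : \theta_k \in \Theta_k \}$
$M_k \subset M_{k+1}$, $\alpha_k \le \alpha_{k+1}$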
[Figure: four panels on the horizontal range $[-3, 3]$; two plots against $h$ annotated with $h_{\mathrm{opt}} = 0.19$ and $h_{\mathrm{opt}} = 1.3$]