
MODEL IS WRONG ?!

S. Eguchi, ISM & GUAS

What is MODEL?

No Model is True !

● Features of interest can be reflected in the Model

● Patterns of interest can be incorporated into the Model

● Observations can only be made to finite precision

Cf. J. K. Lindsey, "Parametric Statistical Inference"

Asymptotics on correct model

Large sample asymptotics

Asymptotic consistency, normality

Asymptotic efficiency

(Higher-order asymptotics)

Non-parametric asymptotics

Outline

● Near-Model

● Bridge between parametrics and non-parametrics

● Non-efficiency under Near model

Near model

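$(X_1, \ldots, X_n) \sim g(x)$

$M = \{ f(x, \theta) : \theta \in \Theta \}$

$\min_{\theta} \mathrm{KL}(g, f(\cdot, \theta)) = O(n^{-\alpha})$

● parametric ($g \in M$)

● near-parametric ($\alpha > 0$)

● non-parametric ($\alpha = 0$)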

Tubular Neighborhood

$M \subset N_{\varepsilon}(M)$

$N_{\varepsilon}(M) = \{ g : \min_{\theta} \mathrm{KL}(g, f(\cdot, \theta)) \le \varepsilon \}$
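A minimal numerical sketch of the distance that defines the neighborhood, assuming the working model $M$ is the normal family (an assumption made only for illustration). For that family the minimizing member matches the mean and variance of $g$, so $\min_\theta \mathrm{KL}(g, f(\cdot,\theta))$ can be computed by quadrature. The contaminated density `contaminated` and all function names are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def integrate(values, dx):
    """Trapezoid rule on an equally spaced grid."""
    return float(dx * (values.sum() - 0.5 * (values[0] + values[-1])))

def min_kl_to_normal_family(g_values, grid):
    """Return (min KL, mean, variance) for the normal family.

    Assumption: M is the normal family, so the KL minimizer is the
    normal density with the same mean and variance as g.
    """
    dx = grid[1] - grid[0]
    mean = integrate(grid * g_values, dx)
    var = integrate((grid - mean) ** 2 * g_values, dx)
    f_values = normal_pdf(grid, mean, np.sqrt(var))
    integrand = np.zeros_like(g_values)
    mask = g_values > 0
    integrand[mask] = g_values[mask] * np.log(g_values[mask] / f_values[mask])
    return integrate(integrand, dx), mean, var

grid = np.linspace(-12.0, 12.0, 4801)

def contaminated(eps):
    """Hypothetical g: N(0,1) contaminated by a fraction eps of N(3,1)."""
    return (1.0 - eps) * normal_pdf(grid, 0.0, 1.0) + eps * normal_pdf(grid, 3.0, 1.0)

for eps in (0.0, 0.05, 0.2):
    kl, mean, var = min_kl_to_normal_family(contaminated(eps), grid)
    print(f"eps={eps:.2f}  min KL={kl:.5f}  mean={mean:.3f}  var={var:.3f}")
```

With `eps = 0` the density lies in the model and the distance is numerically zero; any `eps > 0` puts $g$ at a positive distance from $M$.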

Density estimation

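$(Y_1, \ldots, Y_n) \overset{\text{iid}}{\sim} g(y)$

Estimate $g(y)$

Kernel estimate

$\hat g(y, h) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{y - y_i}{h}\right)$

$\hat g(y, h) \to g(y)$ a.e. as $n \to \infty$, $h \to 0$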
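A short sketch of the kernel estimate $\hat g(y, h) = \frac{1}{nh}\sum_i K((y - y_i)/h)$, assuming a Gaussian kernel $K$ (the slides do not fix the kernel). The function names and the simulated sample are illustrative only.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kernel_density_estimate(y, sample, h):
    """Evaluate (1 / (n h)) * sum_i K((y - y_i) / h) at each point of y."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    sample = np.asarray(sample, dtype=float)
    u = (y[:, None] - sample[None, :]) / h   # shape (len(y), n)
    return gaussian_kernel(u).sum(axis=1) / (sample.size * h)

rng = np.random.default_rng(0)
sample = rng.normal(size=500)                # simulated data, for illustration
points = np.linspace(-4.0, 4.0, 9)
print(kernel_density_estimate(points, sample, h=0.4))
```

Smaller `h` follows the data more closely at the cost of higher variance; larger `h` smooths more.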

Local Likelihood

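The main body

$L_{\text{main}}(\theta, y, h) = \frac{1}{n} \sum_{i=1}^{n} K\!\left(\frac{y - y_i}{h}\right) \log f(y_i, \theta)$

Localization versions (Eguchi, Copas, 1998)

$L_{\text{main}}(\theta, y, h)$ adjusted by a term in $K\!\left(\frac{y - y_i}{h}\right)$ and $E_{\theta}\!\left[ K\!\left(\frac{y - z}{h}\right) \right]$, averaged over $i = 1, \ldots, n$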

Local likelihood density estimate

$\tilde\theta(y, h) = \arg\max_{\theta} L_{\text{main}}(\theta, y, h)$

Maximum Local Likelihood Estimator

The density estimator

$\tilde f(y, h) = Z_h^{-1} f(y, \tilde\theta(y, h))$ ($Z_h$: normalizing const)
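A sketch of the local likelihood density estimate under two assumptions that are not in the slides: the working model $f(y, \theta)$ is normal with $\theta = (\mu, \sigma^2)$, and the kernel is Gaussian. Maximizing the kernel-weighted log-likelihood $\frac{1}{n}\sum_i K((y - y_i)/h) \log f(y_i, \theta)$ then has a closed form (weighted mean and weighted variance), and $Z_h$ is obtained by numerical integration over a grid. All names are hypothetical.

```python
import numpy as np

def local_gaussian_fit(y, sample, h):
    """Maximize the kernel-weighted normal log-likelihood at each point of y.

    Returns the local mean and local variance (closed-form maximizers).
    """
    y = np.atleast_1d(np.asarray(y, dtype=float))
    sample = np.asarray(sample, dtype=float)
    log_w = -0.5 * ((y[:, None] - sample[None, :]) / h) ** 2
    # Subtract the row maximum before exponentiating to avoid underflow;
    # the maximizers only depend on the weights up to a common factor.
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    mu = (w * sample[None, :]).sum(axis=1)
    var = (w * (sample[None, :] - mu[:, None]) ** 2).sum(axis=1)
    return mu, var

def local_likelihood_density(grid, sample, h):
    """Plug the local fit into the model density and normalize by Z_h."""
    mu, var = local_gaussian_fit(grid, sample, h)
    unnormalized = np.exp(-0.5 * (grid - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
    dx = grid[1] - grid[0]
    z_h = dx * (unnormalized.sum() - 0.5 * (unnormalized[0] + unnormalized[-1]))
    return unnormalized / z_h

rng = np.random.default_rng(1)
# Simulated bimodal data, for illustration only.
sample = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(2.0, 1.0, 200)])
grid = np.linspace(-15.0, 15.0, 3001)
f_local = local_likelihood_density(grid, sample, h=1.5)
f_global = local_likelihood_density(grid, sample, h=1e6)
print(f_local.max(), f_global.max())
```

A very large `h` gives equal weights and reproduces the global normal fit, while a moderate `h` lets the fitted parameters vary with `y`.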

[Figure: surface plot of $\tilde f(y, h)$ for $y$ from -5 to 5 and $h$ from 3 to 8, vertical scale 0 to 0.15; $h_{\text{opt}} = 3.65$]

Global vs Local likelihood

[Figure: two panels over $y$ from -7.5 to 7.5. Left: Global ($h = \infty$). Right: Local ($h_{\text{opt}} = 3.65$)]

Regression function

Estimate $\mu(x) = E(Y \mid x)$

$\min_{\beta} E\{ \mathrm{KL}(\mu(x), m(\beta^{T} x)) \} = O(n^{-\alpha})$

$d = \dim(x)$, $p = \dim(\beta)$

GLM: $m(\beta^{T} x)$

Cf.Eguchi,Kim,Park (2002)

Bridge of nonpara / parametric

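Optimal bandwidth $h_{\text{opt}}$ in three regimes, with boundaries depending on $p$ and $d$:

● $h_{\text{opt}}$ grows as a power of $n$ ($h$ large)

● $h_{\text{opt}}$ shrinks as a power of $n$ ($h$ small)

● $h_{\text{opt}} = O(1)$ ($h$ finite)

$\min_{\beta} E\{ \mathrm{KL}(\mu(x), m(x, \beta)) \} = O(n^{-\alpha})$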

Discriminant Analysis

Input vector label

Logistic model

Almost logistic model

A class of loss functions

For given data

Estimate the score

Logistic loss

Error rate

Medical screening

where

Empirical loss

For training data

score

Estimating function

IRLS

where

Logistic
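A minimal sketch of IRLS for the plain logistic loss only, assuming labels coded 0/1 and a design matrix `X` that already contains an intercept column. The slides apply IRLS to a wider class of losses; the extra weighting that class would introduce is not shown here. The function name and the simulated data are hypothetical.

```python
import numpy as np

def irls_logistic(X, y, n_iter=25, tol=1e-10):
    """Fit the logistic model P(y = 1 | x) = 1 / (1 + exp(-beta^T x)) by IRLS.

    Each iteration solves a weighted least-squares system with weights
    p (1 - p), which is the Newton step for the logistic log-likelihood.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        weights = p * (1.0 - p)
        score = X.T @ (y - p)                       # estimating function
        information = X.T @ (X * weights[:, None])  # sum of p (1 - p) x x^T
        step = np.linalg.solve(information, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

rng = np.random.default_rng(1)
n = 4000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])                   # simulated truth, for illustration
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ beta_true)))).astype(float)
beta_hat = irls_logistic(X, y)
print(beta_hat)
```

The fitted score for a new input `x` is `x @ beta_hat`.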

Asymptotic efficiency

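$\mathrm{var}_A(\hat\beta) = \frac{1}{n} J(\beta)^{-1} V(\beta) J(\beta)^{-1}$

Cramér-Rao type

$\mathrm{var}_A(\hat\beta) \ge \frac{1}{n} \left[ E\{ p(1-p)\, x x^{T} \} \right]^{-1}$ (equality for the logistic loss)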
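A short worked step showing why the logistic loss attains the bound, assuming labels $y \in \{0, 1\}$ and a correct logistic model with $p = P(y = 1 \mid x)$. Here $s$ denotes the logistic score, a symbol introduced only for this derivation.

```latex
\begin{aligned}
s(\beta) &= (y - p)\,x, \qquad p = \frac{1}{1 + e^{-\beta^{T} x}},\\
V(\beta) &= E\{ s\, s^{T} \} = E\{ (y - p)^{2}\, x x^{T} \} = E\{ p(1-p)\, x x^{T} \},\\
J(\beta) &= E\{ -\partial s / \partial \beta^{T} \} = E\{ p(1-p)\, x x^{T} \},\\
\frac{1}{n}\, J(\beta)^{-1} V(\beta) J(\beta)^{-1}
  &= \frac{1}{n} \left[ E\{ p(1-p)\, x x^{T} \} \right]^{-1}.
\end{aligned}
```

Because $V = J$ for the logistic score under the correct model, the sandwich form collapses to the bound itself.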

Risk under correct model

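$\mathrm{Risk}(\hat\beta, D) = E\{ D(\hat\beta) \}$ (expected D-loss)

Under the correct model, $\mathrm{Risk}(\hat\beta_1, D)$ and $\mathrm{Risk}(\hat\beta_0, D)$ are compared up to an $o(n^{-1})$ remainder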

Let

Risk under near model

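$\beta_1 = \arg\min_{\beta} D_1(\beta)$

$\mathrm{Risk}(\hat\beta_1, D) \approx D(\beta_1) + \frac{1}{2} \mathrm{tr}\{ \mathrm{var}_A(\hat\beta_1)\, \mathrm{Hesse}(D) \}$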

where

Let

Let

λ-family

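Target risk: $\mathrm{risk}(\cdot, D)$

$D_{\lambda}(\beta) = (1 - \lambda)\, D_0(\beta) + \lambda\, D(\beta)$

$\hat\beta_{\lambda} = \arg\min_{\beta} D_{\lambda}^{\text{emp}}(\beta)$, score $\hat\beta_{\lambda}^{T} x$

$\mathrm{Risk}(\hat\beta_1, D) \approx D(\beta_1) + \frac{1}{2} \mathrm{tr}\{ \mathrm{var}_A(\hat\beta_1)\, \mathrm{Hesse}(D) \}$

$\lambda_{\text{opt}}$ (Proof)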

(Eguchi, Copas, 2002)

Some analysis

False positive rate

0.435% 0.423%

$\lambda_{\text{opt}} = 0.15$

Conclusions

● Near-Model

● Bridge between parametrics and non-parametrics

● Non-efficiency under Near model

α-neighborhood

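Three ranges of $\alpha$, with boundaries depending on $p$ and $d$

$\mathrm{Risk} = \mathrm{var} + \mathrm{bias}^{2}$

Optimal choice (opt) as a function of $\alpha$, for departures of order $O(n^{-\alpha})$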

Future project??

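$(X_1, \ldots, X_n) \sim g(x)$

$\min_{\theta_k} \mathrm{KL}(g, f_k(\cdot, \theta_k)) = O(n^{-\alpha_k})$

$M_k = \{ f_k(x, \theta_k) : \theta_k \in \Theta_k \}$

$M_k \subset M_{k+1}$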

[Figures: four curve panels over the range -3 to 3; two panels plotted against the bandwidth $h$, marked $h_{\text{opt}} = 0.19$ and $h_{\text{opt}} = 1.3$]
