On multiple-case diagnostics in linear subspace method

Kuniyoshi HAYASHI 1, Hiroyuki MINAMI 2, Masahiro MIZUTA 2

1 Graduate School of Information Science and Technology, Hokkaido University
2 Information Initiative Center, Hokkaido University

19th International Conference on Computational Statistics, Paris, France, August 22-27


Outline

1 Introduction
2 Sensitivity analysis in linear subspace method
3 Multiple-case diagnostics with clustering
4 Numerical example
5 Concluding remarks


Introduction

Sensitivity analysis in pattern recognition

Single-case diagnostics in linear subspace method (Hayashi et al., 2008)

Multiple-case diagnostics in linear subspace method

We show its effectiveness through a simulation study.


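Linear subspace method

CLAss-Featuring Information Compression (CLAFIC; Watanabe, 1967)

Training observations: $\mathbf{x}_i^k \in R^p$ $(i = 1, \dots, n_k;\ k = 1, \dots, K)$

Class 1: $\mathbf{x}_1^1, \dots, \mathbf{x}_{n_1}^1$
...
Class k: $\mathbf{x}_1^k, \dots, \mathbf{x}_{n_k}^k$
...
Class K: $\mathbf{x}_1^K, \dots, \mathbf{x}_{n_K}^K$

Test observation: $\mathbf{x}^*$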

[CLAFIC]

Autocorrelation matrix of the training data in the k-th class:

$$\hat{G}_k = \frac{1}{n_k} \sum_{i=1}^{n_k} \mathbf{x}_i^k \mathbf{x}_i^{kT}$$

Eigenvalues: $\hat{\lambda}_1^k \ge \hat{\lambda}_2^k \ge \dots \ge \hat{\lambda}_p^k \ge 0$

Eigenvectors: $\hat{\mathbf{u}}_s^k$ $(s = 1, \dots, p)$

A projection matrix in the k-th class:

$$\hat{P}_k = \sum_{s=1}^{p_k} \hat{\mathbf{u}}_s^k \hat{\mathbf{u}}_s^{kT} \quad (1 \le p_k \le p)$$

Classification of $\mathbf{x}^*$: $\arg\max_k \{ \mathbf{x}^{*T} \hat{P}_k \mathbf{x}^* \}$
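The CLAFIC rule above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names `fit_clafic` and `classify`, the toy data, and the choice of one basis vector per class are assumptions made for the example and do not come from the slides.

```python
import numpy as np

def fit_clafic(class_data, dims):
    """Return one projection matrix P_k per class.

    class_data: list of (n_k, p) arrays, rows are observations x_i^k.
    dims: subspace dimensions p_k, one per class (assumed to be given).
    """
    projections = []
    for X, p_k in zip(class_data, dims):
        G = X.T @ X / X.shape[0]               # autocorrelation matrix G_k
        eigvals, eigvecs = np.linalg.eigh(G)   # eigenvalues in ascending order
        order = np.argsort(eigvals)[::-1]      # reorder to descending
        U = eigvecs[:, order[:p_k]]            # leading p_k eigenvectors
        projections.append(U @ U.T)            # P_k = sum_s u_s u_s^T
    return projections

def classify(x, projections):
    """Assign x to the class maximising x^T P_k x (0-based class index)."""
    scores = [x @ P @ x for P in projections]
    return int(np.argmax(scores))

# Hypothetical toy data: class 0 lies near the first axis, class 1 near the second.
rng = np.random.default_rng(0)
X0 = np.column_stack([rng.normal(5, 1, 20), rng.normal(0, 0.1, 20)])
X1 = np.column_stack([rng.normal(0, 0.1, 20), rng.normal(5, 1, 20)])
P = fit_clafic([X0, X1], dims=[1, 1])
label = classify(np.array([4.0, 0.2]), P)
```

Here the test point lies close to the first axis, so its projection onto the class-0 subspace is the longer one.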

CLAFIC

[Figure: a test observation $\mathbf{x}^*$ and its projections onto the subspaces of Class1 and Class2, labelled $\mathbf{x}^{*T} P_1 \mathbf{x}^*$ and $\mathbf{x}^{*T} P_2 \mathbf{x}^*$]

$\mathbf{x}^{*T} P_1 \mathbf{x}^* > \mathbf{x}^{*T} P_2 \mathbf{x}^*$, so $\mathbf{x}^*$ is assigned to Class1.

[Discriminant score and Average]

Discriminant score for $\mathbf{x}_i^k$:

$$\hat{z}_i^k = \mathbf{x}_i^{kT} \hat{Q}_k \mathbf{x}_i^k \quad (1 \le i \le n_k),$$

where

$$\hat{Q}_k = \hat{P}_k - \frac{1}{K} \sum_{l=1}^{K} \hat{P}_l.$$

Average discriminant score:

$$\hat{Z}_k = \frac{1}{n_k} \sum_{i=1}^{n_k} \hat{z}_i^k.$$
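As a rough numerical illustration of the discriminant score and its class average, the sketch below assumes the reading $\hat{Q}_k = \hat{P}_k - \frac{1}{K}\sum_l \hat{P}_l$ and uses hand-made projection matrices. The helper names `q_matrices` and `average_scores` and the data are hypothetical.

```python
import numpy as np

def q_matrices(projections):
    """Q_k = P_k - (1/K) * sum_l P_l for every class k (assumed form)."""
    mean_P = sum(projections) / len(projections)
    return [P - mean_P for P in projections]

def average_scores(class_data, Q):
    """Z_k = (1/n_k) * sum_i x_i^T Q_k x_i for every class k."""
    Z = []
    for X, Qk in zip(class_data, Q):
        z = np.einsum("ij,jk,ik->i", X, Qk, X)   # z_i = x_i^T Q_k x_i
        Z.append(z.mean())
    return Z

# Hypothetical two-class example with orthogonal one-dimensional subspaces.
P = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
X = [np.array([[2.0, 0.0], [4.0, 0.0]]), np.array([[0.0, 3.0]])]
Z = average_scores(X, q_matrices(P))   # [5.0, 4.5]
```

A large positive average score means that the observations of a class are, on average, much better represented by their own subspace than by the subspaces of the classes overall.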

[Sample influence function (SIF)]

$$\mathrm{SIF}(\mathbf{x}_j^g; \mathrm{vech}(\hat{Q}_k)) = -(n_g - 1) \cdot \{ \mathrm{vech}(\hat{Q}_{k(jg)}) - \mathrm{vech}(\hat{Q}_k) \},$$

where $\mathrm{vech}(\hat{Q}_{k(jg)})$ means $\mathrm{vech}(\hat{Q}_k)$ without the j-th observation in the g-th class.

[Empirical influence function (EIF)]

$$
\mathrm{EIF}(\mathbf{x}_j^g; \mathrm{vech}(\hat{Q}_k)) =
\begin{cases}
\mathrm{vech}\Big( \big(1 - \tfrac{1}{K}\big) \sum_{s=1}^{p_g} \sum_{t=p_g+1}^{p} (\hat{\lambda}_s^g - \hat{\lambda}_t^g)^{-1}\, \hat{\mathbf{u}}_s^{gT} \hat{G}_g^{(jg)} \hat{\mathbf{u}}_t^g\, (\hat{\mathbf{u}}_s^g \hat{\mathbf{u}}_t^{gT} + \hat{\mathbf{u}}_t^g \hat{\mathbf{u}}_s^{gT}) \Big) & (g = k), \\
\mathrm{vech}\Big( -\tfrac{1}{K} \sum_{s=1}^{p_g} \sum_{t=p_g+1}^{p} (\hat{\lambda}_s^g - \hat{\lambda}_t^g)^{-1}\, \hat{\mathbf{u}}_s^{gT} \hat{G}_g^{(jg)} \hat{\mathbf{u}}_t^g\, (\hat{\mathbf{u}}_s^g \hat{\mathbf{u}}_t^{gT} + \hat{\mathbf{u}}_t^g \hat{\mathbf{u}}_s^{gT}) \Big) & (g \ne k),
\end{cases}
$$

where $p_g$ is the number of the basis vectors in the g-th class.

Case of $\mathrm{SIF}(\mathbf{x}_j^g; \mathrm{vech}(\hat{Q}_k))$:

$$\hat{Z}_k^{(jg)} = \frac{1}{n_k} \sum_{i=1}^{n_k} \mathbf{x}_i^{kT} \hat{Q}_k^{(jg)} \mathbf{x}_i^k, \qquad \hat{Q}_k^{(jg)} = -(n_g - 1) \cdot (\hat{Q}_{k(jg)} - \hat{Q}_k).$$

Case of $\mathrm{EIF}(\mathbf{x}_j^g; \mathrm{vech}(\hat{Q}_k))$:

$$\hat{Z}_k^{(jg)} = \frac{1}{n_k} \sum_{i=1}^{n_k} \mathbf{x}_i^{kT} \hat{Q}_k^{(jg)} \mathbf{x}_i^k \quad (g \ne k),$$

$$\hat{Q}_k^{(jg)} = -\frac{1}{K} \sum_{s=1}^{p_g} \sum_{t=p_g+1}^{p} (\hat{\lambda}_s^g - \hat{\lambda}_t^g)^{-1}\, \hat{\mathbf{u}}_s^{gT} \hat{G}_g^{(jg)} \hat{\mathbf{u}}_t^g\, (\hat{\mathbf{u}}_s^g \hat{\mathbf{u}}_t^{gT} + \hat{\mathbf{u}}_t^g \hat{\mathbf{u}}_s^{gT}).$$
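The SIF case can be approximated numerically by deleting one observation at a time and refitting the subspaces. The sketch below is an assumption-laden illustration, not the authors' code: the helper names, the random toy data, the subspace dimensions, and the form $\hat{Q}_k = \hat{P}_k - \frac{1}{K}\sum_l \hat{P}_l$ are all choices made for this example.

```python
import numpy as np

def projection(X, dim):
    """Projection matrix onto the leading `dim` eigenvectors of X^T X / n."""
    G = X.T @ X / X.shape[0]
    vals, vecs = np.linalg.eigh(G)
    U = vecs[:, np.argsort(vals)[::-1][:dim]]
    return U @ U.T

def q_matrix(class_data, dims, k):
    """Q_k = P_k - (1/K) * sum_l P_l (assumed form)."""
    P = [projection(X, d) for X, d in zip(class_data, dims)]
    return P[k] - sum(P) / len(P)

def sif_scores(class_data, dims, g, k):
    """Z_k^{(jg)} for every observation j of class g, using the SIF of Q_k."""
    Xk = class_data[k]
    n_g = class_data[g].shape[0]
    Q_full = q_matrix(class_data, dims, k)
    out = []
    for j in range(n_g):
        reduced = list(class_data)
        reduced[g] = np.delete(class_data[g], j, axis=0)   # drop x_j^g
        Q_del = q_matrix(reduced, dims, k)
        Q_sif = -(n_g - 1) * (Q_del - Q_full)              # SIF in matrix form
        out.append(np.einsum("ij,jk,ik->i", Xk, Q_sif, Xk).mean())
    return np.array(out)

# Hypothetical data: two classes of 8 observations in 3 dimensions.
rng = np.random.default_rng(1)
X0 = rng.normal(size=(8, 3)) + np.array([4.0, 0.0, 0.0])
X1 = rng.normal(size=(8, 3)) + np.array([0.0, 4.0, 0.0])
scores = sif_scores([X0, X1], dims=[1, 1], g=0, k=0)
```

Plotting `scores` against the observation index gives a single-case index plot; observations with unusually large absolute values are candidates for closer inspection.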

Single-case diagnostics

Influence of a single observation on $\hat{Z}_k$

[Figure: index plot of $\hat{Z}_k^{(jg)}$; horizontal axis: Index]

Multiple-case diagnostics

Influence of multiple observations on $\hat{Z}_k$ with $\hat{Z}_k^{(i,j\,g)}$ $(i = 1, \dots, n_g;\ j = 1, \dots, n_g)$

[Figure: plot of $\hat{Z}_k^{(i,j\,g)}$ against Index $i$ and Index $j$]

Multiple-case diagnostics

$\mathbf{w}_0^g = (1, 1, \dots, 1)^T$: weights of unperturbed observations in the g-th class

$\mathrm{vech}(\hat{Q}_k)$: maximum likelihood estimator

$L(\mathrm{vech}(\hat{Q}_k) \mid \mathbf{w}_0^g)$: maximum log likelihood

$\mathbf{w}^g = \mathbf{w}_0^g + t\mathbf{h}$ $(\|\mathbf{h}\| = 1)$: weights of perturbed case in the g-th class

$L(\mathrm{vech}(\hat{Q}_{k,\mathbf{w}}) \mid \mathbf{w}^g)$: maximum log likelihood in the g-th perturbed class

Based on Cook (1986) and Tanaka (1994),

$$D(\mathbf{w}^g) = 2\,[\, L(\mathrm{vech}(\hat{Q}_k) \mid \mathbf{w}_0^g) - L(\mathrm{vech}(\hat{Q}_{k,\mathbf{w}}) \mid \mathbf{w}_0^g) \,]$$

$$D(t) = D(0) + \frac{dD}{dt}(0)\, t + \frac{1}{2} \frac{d^2 D}{dt^2}(0)\, t^2 + O(t^3).$$

$$D(t) \cong \mathrm{Const} \cdot \mathbf{h}^T \left[ -\frac{\partial^2 L}{\partial \mathbf{w}^g \partial \mathbf{w}^{gT}} \right]_{\mathbf{w}^g = \mathbf{w}_0^g} \mathbf{h} \cdot t^2.$$

$$C_{\mathbf{h}} = \mathbf{h}^T \left[ -\frac{\partial^2 L}{\partial \mathbf{w}^g \partial \mathbf{w}^{gT}} \right]_{\mathbf{w}^g = \mathbf{w}_0^g} \mathbf{h}.$$
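The step from the Taylor expansion of $D(t)$ to the quadratic form is the standard local-influence argument of Cook (1986). A short worked version is sketched below, under the assumption that $\theta$ stands for $\mathrm{vech}(Q_k)$, that $\hat{\theta}_t$ is the estimate under the weights $\mathbf{w}_0^g + t\mathbf{h}$, and that $L$ is smooth near its maximiser $\hat{\theta} = \hat{\theta}_0$; the shorthand symbols are introduced here only for illustration.

```latex
\begin{align*}
D(t) &= 2\,[\,L(\hat{\theta}) - L(\hat{\theta}_t)\,], \qquad \hat{\theta}_0 = \hat{\theta}, \\
D(0) &= 0, \\
\frac{dD}{dt}(0) &= -2 \left.\frac{\partial L}{\partial \theta^{T}}\right|_{\theta = \hat{\theta}}
                    \left.\frac{d\hat{\theta}_t}{dt}\right|_{t=0} = 0
  \quad \text{(the score vanishes at the maximum)}, \\
\frac{d^{2}D}{dt^{2}}(0) &= 2 \left(\frac{d\hat{\theta}_t}{dt}\right)^{T}
  \left[-\frac{\partial^{2} L}{\partial \theta\, \partial \theta^{T}}\right]_{\theta = \hat{\theta}}
  \left(\frac{d\hat{\theta}_t}{dt}\right)\Bigg|_{t=0}, \\
D(t) &= \left(\frac{d\hat{\theta}_t}{dt}\right)^{T}
  \left[-\frac{\partial^{2} L}{\partial \theta\, \partial \theta^{T}}\right]_{\theta = \hat{\theta}}
  \left(\frac{d\hat{\theta}_t}{dt}\right)\Bigg|_{t=0} t^{2} + O(t^{3}),
  \qquad
  \left.\frac{d\hat{\theta}_t}{dt}\right|_{t=0}
  = \left.\frac{\partial \hat{\theta}}{\partial \mathbf{w}^{gT}}\right|_{\mathbf{w}^g = \mathbf{w}_0^g} \mathbf{h}.
\end{align*}
```

Because the constant and linear terms vanish, the direction $\mathbf{h}$ enters only through the quadratic form, which is why the curvature $C_{\mathbf{h}}$ is the quantity examined.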

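$$C_{\mathbf{h}} = \mathbf{h}^T \left[ \frac{\partial\, \mathrm{vech}(\hat{Q}_k(\mathbf{w}^g))^T}{\partial \mathbf{w}^g} \right] \left[ -\frac{\partial^2 L}{\partial\, \mathrm{vech}(Q_k)\, \partial\, \mathrm{vech}(Q_k)^T} \right] \left[ \frac{\partial\, \mathrm{vech}(\hat{Q}_k(\mathbf{w}^g))}{\partial \mathbf{w}^{gT}} \right] \mathbf{h},$$

where $\partial\, \mathrm{vech}(\hat{Q}_k(\mathbf{w}^g)) / \partial \mathbf{w}^{gT}$ and $[-\partial^2 L / \partial\, \mathrm{vech}(Q_k)\, \partial\, \mathrm{vech}(Q_k)^T]$ are evaluated at $\mathbf{w}^g = \mathbf{w}_0^g$ and $\mathrm{vech}(Q_k) = \mathrm{vech}(\hat{Q}_k)$, respectively.

By

$$\mathrm{EIF}(\mathbf{x}_r^g; \mathrm{vech}(\hat{Q}_k)) = n_g \cdot \left. \frac{\partial\, \mathrm{vech}(\hat{Q}_k(\mathbf{w}^g))}{\partial w_r^g} \right|_{\mathbf{w}^g = \mathbf{w}_0^g}$$

and

$$\mathrm{acov}(\mathrm{vech}(\hat{Q}_k)) = E\left[ -\frac{\partial^2 L}{\partial\, \mathrm{vech}(Q_k)\, \partial\, \mathrm{vech}(Q_k)^T} \right]^{-1},$$

we estimate $C_{\mathbf{h}}$ as

$$C_{\mathbf{h}} \cong \mathbf{h}^T [\mathrm{EIF}_{gk}]^T\, [\widehat{\mathrm{acov}}(\mathrm{vech}(\hat{Q}_k))]^{-1}\, [\mathrm{EIF}_{gk}]\, \mathbf{h},$$

where $[\mathrm{EIF}_{gk}] = \{ \mathrm{EIF}(\mathbf{x}_r^g; \mathrm{vech}(\hat{Q}_k)) \}$ and

$$\widehat{\mathrm{acov}}(\mathrm{vech}(\hat{Q}_k)) = \left[ -\frac{\partial^2 L}{\partial\, \mathrm{vech}(\hat{Q}_k)\, \partial\, \mathrm{vech}(\hat{Q}_k)^T} \right]^{-1}.$$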

$C_{\mathbf{h}}$: Cook's curvature

We solve the following eigenvalue problem.

$$\left( [\mathrm{EIF}_{gk}]^T\, [\widehat{\mathrm{acov}}(\mathrm{vech}(\hat{Q}_k))]^{-1}\, [\mathrm{EIF}_{gk}] - \lambda I \right) \mathbf{h} = 0.$$

Here, we estimate $\widehat{\mathrm{acov}}(\mathrm{vech}(\hat{Q}_k))$ with the jackknife method:

$$\widehat{\mathrm{acov}}(\mathrm{vech}(\hat{Q}_k)) \cong V_{JACK} / N,$$

$$V_{JACK} = \frac{1}{N-1} \sum_{i=1}^{N} \left( \mathrm{vech}(\hat{Q}_{k,N-1,i}) - \frac{1}{N} \sum_{j=1}^{N} \mathrm{vech}(\hat{Q}_{k,N-1,j}) \right) \left( \mathrm{vech}(\hat{Q}_{k,N-1,i}) - \frac{1}{N} \sum_{j=1}^{N} \mathrm{vech}(\hat{Q}_{k,N-1,j}) \right)^T,$$

where we put $\mathrm{vech}(\hat{Q}_{k,N-1,r})$ $(r = 1, \dots, N)$ as $\mathrm{vech}(\hat{Q}_{k(rg)} - \hat{Q}_k)$.

1 Get the eigenvectors with $\left( [\mathrm{EIF}_{gk}]^T\, [\widehat{\mathrm{acov}}(\mathrm{vech}(\hat{Q}_k))]^{-1}\, [\mathrm{EIF}_{gk}] - \lambda I \right) \mathbf{h} = 0$
2 Plot them in a lower dimensional space
3 Detect observations that have similar influence
4 Evaluate the values of $\hat{Z}_k^{(Jg)}$ $(J \subseteq \{1, \dots, n_g\})$
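Step 1 of the procedure amounts to a symmetric eigenvalue problem. A minimal sketch follows, assuming the influence vectors have already been stacked as columns of a matrix `E` and that an estimate `V` of the asymptotic covariance is available. The function name, the use of a pseudo-inverse (chosen here because jackknife estimates can be singular), and the toy numbers are assumptions for illustration.

```python
import numpy as np

def influence_directions(E, V):
    """Solve (E^T V^+ E - lambda I) h = 0.

    E: (m, n_g) matrix; column r is the influence vector of observation r.
    V: (m, m) estimate of acov(vech(Q_k)); the pseudo-inverse is an assumption
       made so that a singular estimate does not break the computation.
    Returns eigenvalues in descending order and the matching eigenvectors (columns).
    """
    M = E.T @ np.linalg.pinv(V) @ E
    M = (M + M.T) / 2                      # enforce symmetry numerically
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Hypothetical example: observations 0 and 1 have identical influence vectors.
E = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])
V = np.eye(2)
vals, vecs = influence_directions(E, V)    # eigenvalues 4, 2, 0
```

In this toy case the two observations with the same influence vector receive the same component in the eigenvector of the second eigenvalue, which is the kind of pattern that step 3 looks for.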


Multiple-case diagnostics with clustering

We cannot always represent influence directions in low dimensions.

We solve the problem with clustering.

Eigenvalues of $[\mathrm{EIF}_{gk}]^T\, [\widehat{\mathrm{acov}}(\mathrm{vech}(\hat{Q}_k))]^{-1}\, [\mathrm{EIF}_{gk}]$: $\xi_1 \ge \xi_2 \ge \dots \ge \xi_{n_g} \ge 0$

A component of the eigenvectors: $\mathbf{a}_{ij}$ $(i, j = 1, \dots, n_g)$

Euclidean norm

$$d_{jj'} = \sum_{i=1}^{n_g} (\mathbf{a}_{ij} - \mathbf{a}_{ij'})^T (\mathbf{a}_{ij} - \mathbf{a}_{ij'}) \quad (j, j' = 1, \dots, n_g).$$

Apply Ward's method to $D = (d_{jj'})$

Reduce the number of combinations of deleting observations based on the result
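The clustering step can be illustrated with a small NumPy-only version of Ward's criterion, which repeatedly merges the pair of clusters whose union increases the within-cluster sum of squares the least. Working on the coordinates directly is equivalent to working on the squared Euclidean distances $d_{jj'}$. The function name, the layout `A[i, j] = a_ij` with columns indexing observations, and the toy numbers are assumptions for this sketch.

```python
import numpy as np

def ward_clusters(points, n_clusters):
    """Agglomerate the rows of `points` with Ward's criterion until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = points[clusters[a]], points[clusters[b]]
                diff = ca.mean(axis=0) - cb.mean(axis=0)
                # Increase in within-cluster sum of squares if a and b are merged.
                cost = len(ca) * len(cb) / (len(ca) + len(cb)) * (diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical eigenvector components: A[i, j] = a_ij, columns index observations.
A = np.array([[0.0, 0.1, 5.0, 5.1, 10.0],
              [0.0, 0.0, 5.0, 5.0, 0.0]])
groups = ward_clusters(A.T, n_clusters=3)   # [[0, 1], [2, 3], [4]]
```

Observations that end up in the same cluster have similar influence directions and can then be deleted together as one unit.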


Numerical example

[Simulation]

Classes : Group1, Group2
Distribution : Multivariate normal distribution
Number of variables : 6
Number of observations : 10
Threshold of linear subspace method : 0.997

We compared the result of the proposed method with the result of deleting observations in all possible combinations.
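The slides do not say how the threshold 0.997 is used. One common convention in subspace methods, assumed here purely for illustration, is to take the smallest number of leading eigenvectors whose cumulative share of the eigenvalue sum reaches the threshold. The function name and toy matrix below are hypothetical.

```python
import numpy as np

def subspace_dimension(X, threshold=0.997):
    """Smallest p_k whose cumulative eigenvalue ratio reaches `threshold` (assumed rule)."""
    G = X.T @ X / X.shape[0]                          # autocorrelation matrix
    eigvals = np.sort(np.linalg.eigvalsh(G))[::-1]    # descending eigenvalues
    ratio = np.cumsum(eigvals) / eigvals.sum()
    dim = int(np.searchsorted(ratio, threshold)) + 1  # first index reaching threshold
    return min(dim, len(eigvals))

# Hypothetical data whose autocorrelation eigenvalues are proportional to 100, 1, 0.01.
X = np.diag([10.0, 1.0, 0.1])
dim_default = subspace_dimension(X)          # cumulative ratios: 0.990, 0.9999, 1.0
dim_loose = subspace_dimension(X, 0.95)
```

With the default threshold the first eigenvalue alone is not enough, so two basis vectors are kept; with the looser threshold one suffices.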

Group1 :

$$\boldsymbol{\mu}_1 = [\,105.857 \;\; 127.241 \;\; 0.037 \;\; 1.030 \;\; 18.244 \;\; -12.352\,]^T$$

$$
\Sigma_1 =
\begin{pmatrix}
8676.883 & 789.092 & -0.384 & -21.393 & -525.731 & 245.875 \\
789.092 & 1674.438 & -0.208 & -9.090 & 209.257 & -13.106 \\
-0.384 & -0.208 & 0.000 & 0.001 & 0.076 & -0.045 \\
-21.393 & -9.090 & 0.001 & 2.171 & 12.127 & -5.694 \\
-525.731 & 209.257 & 0.076 & 12.127 & 250.459 & -101.179 \\
245.875 & -13.106 & -0.045 & -5.694 & -101.179 & 44.963
\end{pmatrix}
$$

Data     [,1]      [,2]     [,3]    [,4]     [,5]     [,6]
[1,]     3.251     74.731   0.04    2.868    19.71    -15.56
[2,]     -18.51    87.221   0.038   0.316    17.876   -13.239
[3,]     162.092   130.641  0.03    0.835    -2.196   -2.52
[4,]     120.625   175.7    0.033   3.239    39.097   -17.958
[5,]     70.893    104.692  0.064   -0.848   12.563   -10.372
[6,]     61.044    139.198  0.044   1.157    14.358   -10.993
[7,]     277.224   98.071   0.051   1.425    18.551   -15.139
[8,]     40.983    188.004  0.042   0.915    48.729   -23.67
[9,]     212.658   175.569  0.008   -1.482   -2.177   -1.233
[10,]    128.307   98.585   0.018   1.873    15.924   -12.834

Group2 :

$$\boldsymbol{\mu}_2 = [\,97.492 \;\; 131.578 \;\; 0.000 \;\; -0.089 \;\; 1.325 \;\; 1.314\,]^T$$

$$
\Sigma_2 =
\begin{pmatrix}
5736.272 & -84.440 & -1.018 & 17.550 & 145.029 & 77.281 \\
-84.440 & 1934.002 & -0.125 & 10.827 & 171.388 & 311.496 \\
-1.018 & -0.125 & 0.001 & 0.002 & -0.025 & 0.119 \\
17.550 & 10.827 & 0.002 & 18.123 & 32.238 & 6.407 \\
145.029 & 171.388 & -0.025 & 32.238 & 92.732 & 50.900 \\
77.281 & 311.496 & 0.119 & 6.407 & 50.900 & 101.041
\end{pmatrix}
$$

Data     [,1]      [,2]     [,3]     [,4]     [,5]      [,6]
[1,]     90.704    96.139   -0.021   -0.622   -0.666    -3.766
[2,]     39.285    95.397   0.026    3.493    0.681     1.991
[3,]     31.19     243.481  -0.001   1.623    8.291     16.302
[4,]     -18.781   113.201  0.011    2.88     7.173     -4.303
[5,]     193.847   161.998  0.024    0.688    15.552    22.948
[6,]     122.154   110.92   0.024    4.865    7.591     -4.361
[7,]     174.834   111.309  0.007    -0.382   -7.5      -1.789
[8,]     48.081    122.082  0.024    -9.056   -15.391   -2.228
[9,]     87.225    122.684  -0.013   -5.637   -9.465    -4.696
[10,]    206.376   138.573  -0.082   1.261    6.988     -6.954

All possible combinations

[Figure: $\hat{Z}_k^{(J1)}$ $(k = 1, 2;\ J \subseteq \{1, \dots, n_1\})$; panels: Average scores in Group1, Average scores in Group2]

Large influential subset: {7, 10}

All possible combinations

[Figure: $\hat{Z}_k^{(J2)}$ $(k = 1, 2;\ J \subseteq \{1, \dots, n_2\})$; panels: Average scores in Group1, Average scores in Group2]

Large influential subset: {5, 8}

Proposed method

Clusters of influence directions

[Figure: two panels of clusters of influence directions; each panel: 8 clusters, 1023 → 255 $(= 2^8 - 1)$]
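The reduction from 1023 to 255 candidate deletions is consistent with treating each cluster as a single unit: 10 observations have $2^{10} - 1$ non-empty subsets, while 8 clusters have $2^8 - 1$ non-empty unions. The sketch below enumerates such unions. The grouping of the observations into clusters is hypothetical (only the pair {7, 10} is taken from the slides), and the function name is an assumption.

```python
from itertools import combinations

def candidate_subsets(clusters):
    """All non-empty unions of whole clusters, each returned as a sorted list."""
    out = []
    for r in range(1, len(clusters) + 1):
        for combo in combinations(clusters, r):
            out.append(sorted(i for c in combo for i in c))
    return out

n_all = 2 ** 10 - 1   # every non-empty subset of 10 observations: 1023

# Hypothetical grouping of observations 1..10 into 8 clusters.
clusters = [[1, 2], [3], [4], [5], [6], [7, 10], [8], [9]]
subsets = candidate_subsets(clusters)   # 2**8 - 1 = 255 candidates
```

Each candidate subset is then deleted in turn and the average discriminant scores are recomputed, which is far cheaper than visiting all 1023 subsets.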

Proposed method

[Figure: panels: Average scores in Group1, Average scores in Group2]

Large influential subset: {7, 10}

Proposed method

[Figure: panels: Average scores in Group1, Average scores in Group2]

Large influential subset: {5, 8}

Case 2

[Simulation]

Classes : Group1, Group2
Distribution : Multivariate normal distribution
Number of variables : 6
Number of observations : 100
Threshold of linear subspace method : 0.997

Group1 :

$$\boldsymbol{\mu}_1 = [\,105.857 \;\; 127.241 \;\; 0.037 \;\; 1.030 \;\; 18.244 \;\; -12.352\,]^T$$

$$
\Sigma_1 =
\begin{pmatrix}
8676.883 & 789.092 & -0.384 & -21.393 & -525.731 & 245.875 \\
789.092 & 1674.438 & -0.208 & -9.090 & 209.257 & -13.106 \\
-0.384 & -0.208 & 0.000 & 0.001 & 0.076 & -0.045 \\
-21.393 & -9.090 & 0.001 & 2.171 & 12.127 & -5.694 \\
-525.731 & 209.257 & 0.076 & 12.127 & 250.459 & -101.179 \\
245.875 & -13.106 & -0.045 & -5.694 & -101.179 & 44.963
\end{pmatrix}
$$

Group2 :

$$\boldsymbol{\mu}_2 = [\,97.492 \;\; 131.578 \;\; 0.000 \;\; -0.089 \;\; 1.325 \;\; 1.314\,]^T$$

$$
\Sigma_2 =
\begin{pmatrix}
5736.272 & -84.440 & -1.018 & 17.550 & 145.029 & 77.281 \\
-84.440 & 1934.002 & -0.125 & 10.827 & 171.388 & 311.496 \\
-1.018 & -0.125 & 0.001 & 0.002 & -0.025 & 0.119 \\
17.550 & 10.827 & 0.002 & 18.123 & 32.238 & 6.407 \\
145.029 & 171.388 & -0.025 & 32.238 & 92.732 & 50.900 \\
77.281 & 311.496 & 0.119 & 6.407 & 50.900 & 101.041
\end{pmatrix}
$$

Setting up 3 outliers that have similar influence direction: {42, 93, 94}

Our multiple-case diagnostics in Group1

[Figure: clusters of influence directions in Group1]

19 clusters

$1.267651 \times 10^{30}$ → 524287

3 outliers: {42, 93, 94}


Concluding remarks

We proposed a multiple-case diagnostic method with clustering for the linear subspace method.

In a simulation study, we confirmed the effectiveness of our method.

In the future, we would like to report the results of further simulation studies, including Monte Carlo simulations.


Thank you for your attention!
