a linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...a linear-time...

87
A linear-time adaptive nonparametric two-sample test Zolt´ an Szab´ o (CMAP, ´ Ecole Polytechnique) Wittawat Jitkrittum Kacper Chwialkowski Arthur Gretton Signal Processing and Machine Learning Seminar Marseilles March 24, 2017 Zolt´ anSzab´o A linear-time adaptive nonparametric two-sample test

Upload: lenguyet

Post on 10-Mar-2018

217 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

A linear-time adaptive nonparametric two-sampletest

Zoltan Szabo (CMAP, Ecole Polytechnique)

Wittawat Jitkrittum Kacper Chwialkowski Arthur Gretton

Signal Processing and Machine Learning SeminarMarseilles

March 24, 2017

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 2: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Motivating examples

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 3: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Motivating example-1: NLP

Given: two categories of documents (Bayesian inference,neuroscience).

Task:

test their distinguishability,most discriminative words Ñ interpretability.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 4: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Motivating example-2: computer vision

Given: two sets of faces (happy, angry).

Task:

check if they are different,determine the most discriminative features/regions.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 5: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

One-page summary

We propose a nonparametric t-test.

It gives a reason why H0 is rejected.

It is

adaptive Ñ high test power.fast (linear time).

Paper, code:

NIPS [Jitkrittum et al., 2016].

https://github.com/wittawatj/interpretable-test.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 6: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

One-page summary

We propose a nonparametric t-test.

It gives a reason why H0 is rejected.

It is

adaptive Ñ high test power.fast (linear time).

Paper, code:

NIPS [Jitkrittum et al., 2016].

https://github.com/wittawatj/interpretable-test.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 7: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Two-sample test, distribution features

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 8: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

What is a two-sample test?

Given:

X “ txiuni“1i.i.d.„ P, Y “ tyjunj“1

i.i.d.„ Q.

Example: xi = i th happy face, yj = j th sad face.

Problem: using X , Y test

H0 : P “ Q, vs

H1 : P ‰ Q.

Assume X ,Y Ă Rd .

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 9: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

What is a two-sample test?

Given:

X “ txiuni“1i.i.d.„ P, Y “ tyjunj“1

i.i.d.„ Q.

Example: xi = i th happy face, yj = j th sad face.

Problem: using X , Y test

H0 : P “ Q, vs

H1 : P ‰ Q.

Assume X ,Y Ă Rd .

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 10: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

What is a two-sample test?

Given:

X “ txiuni“1i.i.d.„ P, Y “ tyjunj“1

i.i.d.„ Q.

Example: xi = i th happy face, yj = j th sad face.

Problem: using X , Y test

H0 : P “ Q, vs

H1 : P ‰ Q.

Assume X ,Y Ă Rd .

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 11: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Ingredients of two-sample test

Test statistic: λn “ λnpX ,Y q, random.Significance level: α “ 0.01.Under H0: PH0p λn ď Tα

looomooon

correctly accepting H0

q “ 1´ α.

Under H1: PH1pTα ă λnq “ Ppcorrectly rejecting H0q =: power.

0 20 40 60 800

0.01

0.02

0.03

0.04

0.05

0.06

λn

PH0

(

λn

)

PH1

(

λn

)

λn

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 12: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Ingredients of two-sample test

Test statistic: λn “ λnpX ,Y q, random.Significance level: α “ 0.01.Under H0: PH0p λn ď Tα

looomooon

correctly accepting H0

q “ 1´ α.

Under H1: PH1pTα ă λnq “ Ppcorrectly rejecting H0q =: power.

0 20 40 60 800

0.01

0.02

0.03

0.04

0.05

0.06

λn

PH0

(

λn

)

PH1

(

λn

)

λn

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 13: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Towards representations of distributions: EX

Given: 2 Gaussians with different means.

Solution: t-test.

−6 −4 −2 0 2 4 60

0.1

0.2

0.3

0.4pdf

Two Gaussian variables: different means

PQ

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 14: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Towards representations of distributions: EX 2

Setup: 2 Gaussians; same means, different variances.

Idea: look at the 2nd-order features of RVs.

ϕx “ x2 ñ difference in EX 2.

−6 −4 −2 0 2 4 60

0.1

0.2

0.3

0.4

pdf

Two Gaussian variables: different variances

PQ

10−1

100

101

102

0

0.2

0.4

0.6

0.8

1

1.2

1.4

pd

f

Pdf−s of X2

PQ

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 15: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Towards representations of distributions: EX 2

Setup: 2 Gaussians; same means, different variances.

Idea: look at the 2nd-order features of RVs.

ϕx “ x2 ñ difference in EX 2.

−6 −4 −2 0 2 4 60

0.1

0.2

0.3

0.4

pdf

Two Gaussian variables: different variances

PQ

10−1

100

101

102

0

0.2

0.4

0.6

0.8

1

1.2

1.4

pd

f

Pdf−s of X2

PQ

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 16: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Towards representations of distributions: further moments

Setup: a Gaussian and a Laplacian distribution.

Challenge: their means and variances are the same.

Idea: look at higher-order features.

−6 −4 −2 0 2 4 60

0.1

0.2

0.3

0.4

0.5

0.6

0.7p

df

Gaussian & Laplacian variables

PQ

Let us consider feature representations!

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 17: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Kernel: similarity between features

Given: x and x1 objects (images or texts).

Question: how similar they are?

Define features of the objects:

ϕx : features of x,

ϕx1 : features of x1.

Kernel: inner product of these features

kpx, x1q :“ 〈ϕx, ϕx1〉 .

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 18: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Kernel: similarity between features

Given: x and x1 objects (images or texts).

Question: how similar they are?

Define features of the objects:

ϕx : features of x,

ϕx1 : features of x1.

Kernel: inner product of these features

kpx, x1q :“ 〈ϕx, ϕx1〉 .

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 19: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Kernel: similarity between features

Given: x and x1 objects (images or texts).

Question: how similar they are?

Define features of the objects:

ϕx : features of x,

ϕx1 : features of x1.

Kernel: inner product of these features

kpx, x1q :“ 〈ϕx, ϕx1〉 .

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 20: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Kernel examples on Rd (γ ą 0, p P Z`)

Polynomial kernel:

kpx, yq “ p〈x, y〉` γqp.

Gaussian kernel:

kpx, yq “ e´γ}x´y}22 .

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 21: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Towards distribution features

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 22: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Towards distribution features

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 23: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Towards distribution features

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 24: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Towards distribution features

{MMD2pP,Qq “ ĚKP,P `ĘKQ,Q ´ 2ĘKP,Q (without diagonals in ĚKP,P, ĘKQ,Q)

:{MMD illustration credit: Arthur Gretton

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 25: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Kernel Ñ distribution feature

Kernel recall: kpx, x1q “ 〈ϕx, ϕx1〉.

Feature of P (mean embedding):

µP :“ Ex„Prϕxs.

Previous quantity: unbiased estimate of

MMD2pP,Qq “ }µP ´ µQ}2 .

Valid test [Gretton et al., 2012]. Challenges:

1 Threshold choice: ’ugly’ asymptotics of n {MMD2pP,Pq.2 Test statistic: quadratic time complexity.3 Witness P Hpkq: can be hard to interpret.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 26: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Kernel Ñ distribution feature

Kernel recall: kpx, x1q “ 〈ϕx, ϕx1〉.Feature of P (mean embedding):

µP :“ Ex„Prϕxs.

Previous quantity: unbiased estimate of

MMD2pP,Qq “ }µP ´ µQ}2 .

Valid test [Gretton et al., 2012]. Challenges:

1 Threshold choice: ’ugly’ asymptotics of n {MMD2pP,Pq.2 Test statistic: quadratic time complexity.3 Witness P Hpkq: can be hard to interpret.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 27: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Kernel Ñ distribution feature

Kernel recall: kpx, x1q “ 〈ϕx, ϕx1〉.Feature of P (mean embedding):

µP :“ Ex„Prϕxs.

Previous quantity: unbiased estimate of

MMD2pP,Qq “ }µP ´ µQ}2 .

Valid test [Gretton et al., 2012]. Challenges:

1 Threshold choice: ’ugly’ asymptotics of n {MMD2pP,Pq.2 Test statistic: quadratic time complexity.3 Witness P Hpkq: can be hard to interpret.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 28: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Kernel Ñ distribution feature

Kernel recall: kpx, x1q “ 〈ϕx, ϕx1〉.Feature of P (mean embedding):

µP :“ Ex„Prϕxs.

Previous quantity: unbiased estimate of

MMD2pP,Qq “ }µP ´ µQ}2 .

Valid test [Gretton et al., 2012]. Challenges:

1 Threshold choice: ’ugly’ asymptotics of n {MMD2pP,Pq.2 Test statistic: quadratic time complexity.3 Witness P Hpkq: can be hard to interpret.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 29: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Linear-time tests

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 30: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Linear-time 2-sample test

Recall:

MMDpP,Qq “ }µP ´ µQ}Hpkq .

Changing [Chwialkowski et al., 2015] this to

ρpP,Qq :“

g

f

f

e

1

J

Jÿ

j“1

rµPpvjq ´ µQpvjqs2

with random tvjuJj“1 test locations.

ρ is a metric (a.s.). How do we estimate it? Distribution under H0?

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 31: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

What is a random metric?

In short

It is a metric almost surely.

In other words,

ρpP,Qq ě 0, ρpP,Qq “ 0 ô P “ Q almost surely.

ρpP,Qq “ ρpQ,Pq almost surely.

ρpP,Qq ď ρpP,Dq ` ρpD,Qq almost surely.

V “ tvjuJj“1 Ă Rd : reason of randomness.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 32: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

What is a random metric?

In short

It is a metric almost surely.

In other words,

ρpP,Qq ě 0, ρpP,Qq “ 0 ô P “ Q almost surely.

ρpP,Qq “ ρpQ,Pq almost surely.

ρpP,Qq ď ρpP,Dq ` ρpD,Qq almost surely.

V “ tvjuJj“1 Ă Rd : reason of randomness.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 33: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

What is a random metric?

In short

It is a metric almost surely.

In other words,

ρpP,Qq ě 0, ρpP,Qq “ 0 ô P “ Q almost surely.

ρpP,Qq “ ρpQ,Pq almost surely.

ρpP,Qq ď ρpP,Dq ` ρpD,Qq almost surely.

V “ tvjuJj“1 Ă Rd : reason of randomness.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 34: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

What is a random metric?

In short

It is a metric almost surely.

In other words,

ρpP,Qq ě 0, ρpP,Qq “ 0 ô P “ Q almost surely.

ρpP,Qq “ ρpQ,Pq almost surely.

ρpP,Qq ď ρpP,Dq ` ρpD,Qq almost surely.

V “ tvjuJj“1 Ă Rd : reason of randomness.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 35: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

What is a random metric?

In short

It is a metric almost surely.

In other words,

ρpP,Qq ě 0, ρpP,Qq “ 0 ô P “ Q almost surely.

ρpP,Qq “ ρpQ,Pq almost surely.

ρpP,Qq ď ρpP,Dq ` ρpD,Qq almost surely.

V “ tvjuJj“1 Ă Rd : reason of randomness.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 36: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Result

Theorem

If k is

bounded: supx,x1 kpx, x1q ď Bk ă 8,

analytic: x ÞÑ kpx, yq is analytic for any y P Rd .

characteristic: µ is injective,

then

ρpP,Qq :“

g

f

f

e

1

J

Jÿ

j“1

rµPpvjq ´ µQpvjqs2

is a metric a.s. w.r.t. tvjuJj“1.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 37: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Result

Theorem

If k is

bounded: supx,x1 kpx, x1q ď Bk ă 8,

analytic: x ÞÑ kpx, yq is analytic for any y P Rd .

characteristic: µ is injective,

then

ρpP,Qq :“

g

f

f

e

1

J

Jÿ

j“1

rµPpvjq ´ µQpvjqs2

is a metric a.s. w.r.t. tvjuJj“1.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 38: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Result

Theorem

If k is

bounded: supx,x1 kpx, x1q ď Bk ă 8,

analytic: x ÞÑ kpx, yq is analytic for any y P Rd .

characteristic: µ is injective,

then

ρpP,Qq :“

g

f

f

e

1

J

Jÿ

j“1

rµPpvjq ´ µQpvjqs2

is a metric a.s. w.r.t. tvjuJj“1.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 39: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Result

Theorem

If k is

bounded: supx,x1 kpx, x1q ď Bk ă 8,

analytic: x ÞÑ kpx, yq is analytic for any y P Rd .

characteristic: µ is injective,

then

ρpP,Qq :“

g

f

f

e

1

J

Jÿ

j“1

rµPpvjq ´ µQpvjqs2

is a metric a.s. w.r.t. tvjuJj“1.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 40: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Why do analytic features work? – proof idea

µ is injective to analytic functions:

k : bounded, analytic ñ elements of Hk : analytic.k : characteristic, bounded ñ µ “ µk : well-defined, injective.

µ: characteristic ñ for P ‰ Q, f :“ µP ´ µQ ‰ 0.

f : analytic, thus

ρpP,Qq “

g

f

f

e

Jÿ

j“1

rµPpvjq ´ µQpvjqs2

is a metric, a.s. w.r.t. (vji .i .d .„ ) m ! λ. Reason: for an

analytic f ‰ 0, mtv : f pvq “ 0u “ 0.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 41: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Why do analytic features work? – proof idea

µ is injective to analytic functions:

k : bounded, analytic ñ elements of Hk : analytic.k : characteristic, bounded ñ µ “ µk : well-defined, injective.

µ: characteristic ñ for P ‰ Q, f :“ µP ´ µQ ‰ 0.

f : analytic, thus

ρpP,Qq “

g

f

f

e

Jÿ

j“1

rµPpvjq ´ µQpvjqs2

is a metric, a.s. w.r.t. (vji .i .d .„ ) m ! λ. Reason: for an

analytic f ‰ 0, mtv : f pvq “ 0u “ 0.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 42: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Why do analytic features work? – proof idea

µ is injective to analytic functions:

k : bounded, analytic ñ elements of Hk : analytic.k : characteristic, bounded ñ µ “ µk : well-defined, injective.

µ: characteristic ñ for P ‰ Q, f :“ µP ´ µQ ‰ 0.

f : analytic, thus

ρpP,Qq “

g

f

f

e

Jÿ

j“1

rµPpvjq ´ µQpvjqs2

is a metric, a.s. w.r.t. (vji .i .d .„ ) m ! λ. Reason: for an

analytic f ‰ 0, mtv : f pvq “ 0u “ 0.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 43: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Estimation

Compute

pρ2pP,Qq “1

J

Jÿ

j“1

rµPpvjq ´ µQpvjqs2,

where µPpvq “1n

řni“1 kpxi , vq. Example using kpx, vq “ e´

}x´v}2

2σ2 :

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 44: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Estimation – continued

pρ2pP,Qq “1

J

Jÿ

j“1

rµPpvjq ´ µQpvjqs2

“1

J

Jÿ

j“1

«

1

n

nÿ

i“1

kpxi , vjq ´1

n

nÿ

i“1

kpyi , vjq

ff2

“1

J

Jÿ

j“1

pznq2j “

1

JzTn zn,

where zn “1n

řni“1 rkpxi , vjq ´ kpyi , vjqs

Jj“1

loooooooooooooomoooooooooooooon

“:zi

P RJ .

Good news: estimation is linear in n!

Bad news: intractable null distr. =?n pρ2pP,Pq w

ÝÑ sum of Jcorrelated χ2.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 45: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Estimation – continued

pρ2pP,Qq “1

J

Jÿ

j“1

rµPpvjq ´ µQpvjqs2

“1

J

Jÿ

j“1

«

1

n

nÿ

i“1

kpxi , vjq ´1

n

nÿ

i“1

kpyi , vjq

ff2

“1

J

Jÿ

j“1

pznq2j “

1

JzTn zn,

where zn “1n

řni“1 rkpxi , vjq ´ kpyi , vjqs

Jj“1

loooooooooooooomoooooooooooooon

“:zi

P RJ .

Good news: estimation is linear in n!

Bad news: intractable null distr. =?n pρ2pP,Pq w

ÝÑ sum of Jcorrelated χ2.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 46: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Estimation – continued

pρ2pP,Qq “1

J

Jÿ

j“1

rµPpvjq ´ µQpvjqs2

“1

J

Jÿ

j“1

«

1

n

nÿ

i“1

kpxi , vjq ´1

n

nÿ

i“1

kpyi , vjq

ff2

“1

J

Jÿ

j“1

pznq2j “

1

JzTn zn,

where zn “1n

řni“1 rkpxi , vjq ´ kpyi , vjqs

Jj“1

loooooooooooooomoooooooooooooon

“:zi

P RJ .

Good news: estimation is linear in n!

Bad news: intractable null distr. =?n pρ2pP,Pq w

ÝÑ sum of Jcorrelated χ2.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 47: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Estimation – continued

pρ2pP,Qq “1

J

Jÿ

j“1

rµPpvjq ´ µQpvjqs2

“1

J

Jÿ

j“1

«

1

n

nÿ

i“1

kpxi , vjq ´1

n

nÿ

i“1

kpyi , vjq

ff2

“1

J

Jÿ

j“1

pznq2j “

1

JzTn zn,

where zn “1n

řni“1 rkpxi , vjq ´ kpyi , vjqs

Jj“1

loooooooooooooomoooooooooooooon

“:zi

P RJ .

Good news: estimation is linear in n!

Bad news: intractable null distr. =?n pρ2pP,Pq w

ÝÑ sum of Jcorrelated χ2.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 48: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Normalized version gives tractable null

Modified test statistic:

λn “ nzTn Σ´1n zn,

where Σn “ cov ptziuni“1q.

Under H0:

λnwÝÑ χ2pJq. ñ Easy to get the p1´ αq-quantile!

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 49: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Our idea

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 50: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Idea

Until this point: test locations (V) are fixed.

Instead: choose θ “ tV, σu to

maximize lower bound on the test power.

Theorem (Lower bound on power, for large n)

Test power ě Lpλnq; L: explicit function, increasing.

Here,

λn “ nµTΣ´1µ: population version of λn.µ “ Exyrz1s, Σ “ Exy

pz1 ´ µqpz1 ´ µqT‰

.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 51: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Idea

Until this point: test locations (V) are fixed.

Instead: choose θ “ tV, σu to

maximize lower bound on the test power.

Theorem (Lower bound on power, for large n)

Test power ě Lpλnq; L: explicit function, increasing.

Here,

λn “ nµTΣ´1µ: population version of λn.µ “ Exyrz1s, Σ “ Exy

pz1 ´ µqpz1 ´ µqT‰

.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 52: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Non-convexity, informative features

2D problem:

P :“ N p0, Iq, Q :“ N pe1, Iq.

V “ tv1, v2u. Fix v1 to s.

v2 ÞÑ λnptv1, v2uq: contourplot.

Nearby locations: do notincrease discrimininability.

Non-convexity: reveals multipleways to capture the difference.

v2 λtrn/2(v1, v2)

0

20

40

60

80

100

120

140

160

v2 λtrn/2(v1, v2)

128

136

144

152

160

168

176

184

192

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 53: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Non-convexity, informative features

2D problem:

P :“ N p0, Iq, Q :“ N pe1, Iq.

V “ tv1, v2u. Fix v1 to s.

v2 ÞÑ λnptv1, v2uq: contourplot.

Nearby locations: do notincrease discrimininability.

Non-convexity: reveals multipleways to capture the difference.

v2 λtrn/2(v1, v2)

0

20

40

60

80

100

120

140

160

v2 λtrn/2(v1, v2)

128

136

144

152

160

168

176

184

192

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 54: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Convergence of the λn estimator

But λn is unknown. Split pX ,Y q into pXtr ,Ytr q and pXte ,Yteq.

1 Locations, kernel parameter: θ “ arg maxθ λtrn2pθq.

2 Test statistic: λten2

`

θ˘

.

Theorem (Guarantee on objective approximation, γn Ñ 0)

supV,K

ˇ

ˇzTn pΣn ` γnq´1zn ´ µTΣ´1µ

ˇ

ˇ “ O`

n´14

˘

.

Examples:

K “"

kσpx, yq “ e´}x´y}2

2σ2 : σ ą 0

*

,

K “!

kApx, yq “ e´px´yqTApx´yq : A ą 0

)

.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 55: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Convergence of the λn estimator

But λn is unknown. Split pX ,Y q into pXtr ,Ytr q and pXte ,Yteq.

1 Locations, kernel parameter: θ “ arg maxθ λtrn2pθq.

2 Test statistic: λten2

`

θ˘

.

Theorem (Guarantee on objective approximation, γn Ñ 0)

supV,K

ˇ

ˇzTn pΣn ` γnq´1zn ´ µTΣ´1µ

ˇ

ˇ “ O`

n´14

˘

.

Examples:

K “"

kσpx, yq “ e´}x´y}2

2σ2 : σ ą 0

*

,

K “!

kApx, yq “ e´px´yqTApx´yq : A ą 0

)

.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 56: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Convergence of the λn estimator

But λn is unknown. Split pX ,Y q into pXtr ,Ytr q and pXte ,Yteq.

1 Locations, kernel parameter: θ “ arg maxθ λtrn2pθq.

2 Test statistic: λten2

`

θ˘

.

Theorem (Guarantee on objective approximation, γn Ñ 0)

supV,K

ˇ

ˇzTn pΣn ` γnq´1zn ´ µTΣ´1µ

ˇ

ˇ “ O`

n´14

˘

.

Examples:

K “"

kσpx, yq “ e´}x´y}2

2σ2 : σ ą 0

*

,

K “!

kApx, yq “ e´px´yqTApx´yq : A ą 0

)

.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 57: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Convergence of the λn estimator

But λn is unknown. Split pX ,Y q into pXtr ,Ytr q and pXte ,Yteq.

1 Locations, kernel parameter: θ “ arg maxθ λtrn2pθq.

2 Test statistic: λten2

`

θ˘

.

Theorem (Guarantee on objective approximation, γn Ñ 0)

supV,K

ˇ

ˇzTn pΣn ` γnq´1zn ´ µTΣ´1µ

ˇ

ˇ “ O`

n´14

˘

.

Examples:

K “"

kσpx, yq “ e´}x´y}2

2σ2 : σ ą 0

*

,

K “!

kApx, yq “ e´px´yqTApx´yq : A ą 0

)

.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 58: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Proof idea

Lower bound on the test power:

|λn ´ λn| À }zn ´ µ}2 ` }Σn ´Σ}F .

Bound the r.h.s. by Hoeffding inequality ñ Pp|λn ´ λn| ě tq.By reparameterization: Ppλn ě Tαq bound.

Uniformly λn « λn:

Reduction to bounding supV,K

}zn ´ µ}2, supV,K

}Σn ´Σ}F .

Empirical processes, Dudley entropy bound.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 59: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Proof idea

Lower bound on the test power:

|λn ´ λn| À }zn ´ µ}2 ` }Σn ´Σ}F .

Bound the r.h.s. by Hoeffding inequality ñ Pp|λn ´ λn| ě tq.By reparameterization: Ppλn ě Tαq bound.

Uniformly λn « λn:

Reduction to bounding supV,K

}zn ´ µ}2, supV,K

}Σn ´Σ}F .

Empirical processes, Dudley entropy bound.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 60: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Numerical demos

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 61: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Parameter settings

Gaussian kernel (σ). α “ 0.01. J “ 1. Repeat 500 trials.Report

PprejectH0q «#times λn ą Tα holds

#trials.

Compare 4 methods

ME-full: Optimize V and Gaussian bandwidth σ.ME-grid: Optimize σ. Random V [Chwialkowski et al., 2015].MMD-quad: Test with quadratic-time MMD [Gretton et al., 2012].MMD-lin: Test with linear-time MMD [Gretton et al., 2012].

Optimize kernels to power in MMD-lin, MMD-quad.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 62: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

NLP: discrimination of document categories

5903 NIPS papers (1988-2015).Keyword-based category assignment into 4 groups:

Bayesian inference, Deep learning, Learning theory, Neuroscience

d “ 2000 nouns. TF-IDF representation.

Problem nte ME-full ME-grid MMD-quad MMD-lin1. Bayes-Bayes 215 .012 .018 .022 .008

2. Bayes-Deep 216 .954 .034 .906 .262

3. Bayes-Learn 138 .990 .774 1.00 .238

4. Bayes-Neuro 394 1.00 .300 .952 .972

5. Learn-Deep 149 .956 .052 .876 .500

6. Learn-Neuro 146 .960 .572 1.00 .538

Performance of ME-full rOpnqs is comparable to MMD-quad rOpn2qs.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 63: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

NLP: most/least discriminative words

Aggregating over trials; example: ’Bayes-Neuro’.

Most discriminative words:

spike, markov, cortex, dropout, recurr, iii, gibb.

learned test locations: highly interpretable,’markov’, ’gibb’ (ð Gibbs): Bayesian inference,’spike’, ’cortex’: key terms in neuroscience.

Least dicriminative ones:

circumfer, bra, dominiqu, rhino, mitra, kid, impostor.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 64: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

NLP: most/least discriminative words

Aggregating over trials; example: ’Bayes-Neuro’.

Most discriminative words:

spike, markov, cortex, dropout, recurr, iii, gibb.

learned test locations: highly interpretable,’markov’, ’gibb’ (ð Gibbs): Bayesian inference,’spike’, ’cortex’: key terms in neuroscience.

Least dicriminative ones:

circumfer, bra, dominiqu, rhino, mitra, kid, impostor.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 65: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Distinguish positive/negative emotions

Karolinska Directed Emotional Faces (KDEF) [Lundqvist et al., 1998].70 actors = 35 females and 35 males.d “ 48ˆ 34 “ 1632. Grayscale. Pixel features.

` :happy neutral surprised

´ :afraid angry disgusted

Problem nte ME-full ME-grid MMD-quad MMD-lin˘ vs. ˘ 201 .010 .012 .018 .008

` vs. ´ 201 .998 .656 1.00 .578

Learned test location (averaged) =

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 66: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Summary

We proposed a nonparametric t-test:

linear time,adaptive Ñ high-power (« ’MMD-quad’),

2 demos: discriminating

documents of different categories,positive/negative emotions.

Extension (independence testing):

https://arxiv.org/abs/1610.04782

https://github.com/wittawatj/fsic-test

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 67: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Summary

We proposed a nonparametric t-test:

linear time,adaptive Ñ high-power (« ’MMD-quad’),

2 demos: discriminating

documents of different categories,positive/negative emotions.

Extension (independence testing):

https://arxiv.org/abs/1610.04782

https://github.com/wittawatj/fsic-test

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 68: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Thank you for the attention!

Acknowledgements: This work was supported by the GatsbyCharitable Foundation.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 69: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Contents

Characteristic functions, infinite J.

Number of locations (J).

MMD: IPM representation.

Estimation of MMD2.

Computational complexity: pJ, n, dq-dependence.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 70: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Characteristic functions, infinite J

Characteristic functions – poor choice:

ρ2pP,Qq :“

g

f

f

e

1

J

Jÿ

j“1

rφPpvjq ´ φQpvjqs2.

[Moulines et al., 2007]:

ρ3pP,Qq :“nxnyn

›C´

12 pµQ ´ µPq

Hk

,

C “nx

nx ` nyCxx `

nynx ` ny

Cyy : pooled covariance operator.

Computational cost: high (cubic).

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 71: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Characteristic functions, infinite J

Characteristic functions – poor choice:

ρ2pP,Qq :“

g

f

f

e

1

J

Jÿ

j“1

rφPpvjq ´ φQpvjqs2.

[Moulines et al., 2007]:

ρ3pP,Qq :“nxnyn

›C´

12 pµQ ´ µPq

Hk

,

C “nx

nx ` nyCxx `

nynx ` ny

Cyy : pooled covariance operator.

Computational cost: high (cubic).

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 72: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Characteristic functions, infinite J

Characteristic functions – poor choice:

ρ2pP,Qq :“

g

f

f

e

1

J

Jÿ

j“1

rφPpvjq ´ φQpvjqs2.

[Moulines et al., 2007]:

ρ3pP,Qq :“nxnyn

›C´

12 pµQ ´ µPq

Hk

,

C “nx

nx ` nyCxx `

nynx ` ny

Cyy : pooled covariance operator.

Computational cost: high (cubic).

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 73: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Smoothed characteristic functions

ψPptq “

ż

Rd

φPpωq`pt ´ ωqdω, t P Rd ,

ρ4pP,Qq :“

g

f

f

e

1

J

Jÿ

j“1

rψPpvjq ´ ψQpvjqs2.

It

works,

is more sensitive to differences in the frequency domain.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 74: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Smoothed characteristic functions

ψPptq “

ż

Rd

φPpωq`pt ´ ωqdω, t P Rd ,

ρ4pP,Qq :“

g

f

f

e

1

J

Jÿ

j“1

rψPpvjq ´ ψQpvjqs2.

It

works,

is more sensitive to differences in the frequency domain.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 75: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Number of locations (J)

Small J:

often enough to detect the difference of P & Q.few distinguishing regions to reject H0.faster test.

Very large J:

test power need not increase monotonically in J (morelocations ñ statistic can gain in variance).defeats the purpose of a linear-time test.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 76: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Number of locations (J)

Small J:

often enough to detect the difference of P & Q.few distinguishing regions to reject H0.faster test.

Very large J:

test power need not increase monotonically in J (morelocations ñ statistic can gain in variance).defeats the purpose of a linear-time test.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 77: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

MMD: IPM representation

MMD2pP,Qq “ }µP ´ µQ}2Hpkq

«

sup}f }Hpkqď1

〈µP ´ µQ, f 〉Hpkq

ff2

p˚q“

«

sup}f }Hpkqď1

Ex„Pf pxq ´ Ey„Qf pyq

ff2

.

p˚q in details:

〈µP, f 〉Hpkq “⟨ż

kp¨, xqdPpxq, f⟩

Hpkq

ż

〈kp¨, xq, f 〉Hpkqlooooooomooooooon

“f pxq

dPpxq

“ Ex„Pf pxq.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 78: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

MMD: IPM representation

MMD2pP,Qq “ }µP ´ µQ}2Hpkq “

«

sup}f }Hpkqď1

〈µP ´ µQ, f 〉Hpkq

ff2

p˚q“

«

sup}f }Hpkqď1

Ex„Pf pxq ´ Ey„Qf pyq

ff2

.

p˚q in details:

〈µP, f 〉Hpkq “⟨ż

kp¨, xqdPpxq, f⟩

Hpkq

ż

〈kp¨, xq, f 〉Hpkqlooooooomooooooon

“f pxq

dPpxq

“ Ex„Pf pxq.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 79: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

MMD: IPM representation

MMD2pP,Qq “ }µP ´ µQ}2Hpkq “

«

sup}f }Hpkqď1

〈µP ´ µQ, f 〉Hpkq

ff2

p˚q“

«

sup}f }Hpkqď1

Ex„Pf pxq ´ Ey„Qf pyq

ff2

.

p˚q in details:

〈µP, f 〉Hpkq “⟨ż

kp¨, xqdPpxq, f⟩

Hpkq

ż

〈kp¨, xq, f 〉Hpkqlooooooomooooooon

“f pxq

dPpxq

“ Ex„Pf pxq.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 80: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

MMD: IPM representation

MMD2pP,Qq “ }µP ´ µQ}2Hpkq “

«

sup}f }Hpkqď1

〈µP ´ µQ, f 〉Hpkq

ff2

p˚q“

«

sup}f }Hpkqď1

Ex„Pf pxq ´ Ey„Qf pyq

ff2

.

p˚q in details:

〈µP, f 〉Hpkq “⟨ż

kp¨, xqdPpxq, f⟩

Hpkq

ż

〈kp¨, xq, f 〉Hpkqlooooooomooooooon

“f pxq

dPpxq

“ Ex„Pf pxq.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 81: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

MMD: IPM representation

MMD2pP,Qq “ }µP ´ µQ}2Hpkq “

«

sup}f }Hpkqď1

〈µP ´ µQ, f 〉Hpkq

ff2

p˚q“

«

sup}f }Hpkqď1

Ex„Pf pxq ´ Ey„Qf pyq

ff2

.

p˚q in details:

〈µP, f 〉Hpkq “⟨ż

kp¨, xqdPpxq, f⟩

Hpkq

ż

〈kp¨, xq, f 〉Hpkqlooooooomooooooon

“f pxq

dPpxq

“ Ex„Pf pxq.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 82: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

MMD: IPM representation

MMD2pP,Qq “ }µP ´ µQ}2Hpkq “

«

sup}f }Hpkqď1

〈µP ´ µQ, f 〉Hpkq

ff2

p˚q“

«

sup}f }Hpkqď1

Ex„Pf pxq ´ Ey„Qf pyq

ff2

.

p˚q in details:

〈µP, f 〉Hpkq “⟨ż

kp¨, xqdPpxq, f⟩

Hpkq

ż

〈kp¨, xq, f 〉Hpkqlooooooomooooooon

“f pxq

dPpxq

“ Ex„Pf pxq.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 83: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Estimation of MMD2

Squared difference between feature means:

MMD2pP,Qq “ }µP ´ µQ}2H “ 〈µP ´ µQ, µP ´ µQ〉H“ 〈µP, µP〉H ` 〈µQ, µQ〉H ´ 2 〈µP, µQ〉H“ EP,Pkpx , x

1q ` EQ,Qkpy , y1q ´ 2EP,Qkpx , yq.

Unbiased empirical estimate for txiuni“1 „ P, tyju

nj“1 „ Q:

{MMD2pP,Qq “ ĚKP,P `ĘKQ,Q ´ 2ĘKP,Q.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 84: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Estimation of MMD2

Squared difference between feature means:

MMD2pP,Qq “ }µP ´ µQ}2H “ 〈µP ´ µQ, µP ´ µQ〉H“ 〈µP, µP〉H ` 〈µQ, µQ〉H ´ 2 〈µP, µQ〉H“ EP,Pkpx , x

1q ` EQ,Qkpy , y1q ´ 2EP,Qkpx , yq.

Unbiased empirical estimate for txiuni“1 „ P, tyju

nj“1 „ Q:

{MMD2pP,Qq “ ĚKP,P `ĘKQ,Q ´ 2ĘKP,Q.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 85: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Computational complexity

Optimization & testing: linear in n.

Testing: O`

ndJ ` nJ2 ` J3˘

.

Optimization: O`

ndJ2 ` J3˘

per gradient ascent.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 86: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Chwialkowski, K., Ramdas, A., Sejdinovic, D., and Gretton, A.(2015).Fast Two-Sample Testing with Analytic Representations ofProbability Measures.In Neural Information Processing Systems (NIPS), pages1981–1989.

Gretton, A., Borgwardt, K., Rasch, M., Scholkopf, B., andSmola, A. (2012).A kernel two-sample test.Journal of Machine Learning Research, 13:723–773.

Jitkrittum, W., Szabo, Z., Chwialkowski, K., and Gretton, A.(2016).Interpretable distribution features with maximum testingpower.In Neural Information Processing Systems (NIPS).

Lundqvist, D., Flykt, A., and Ohman, A. (1998).The Karolinska directed emotional faces-KDEF.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test

Page 87: A linear-time adaptive nonparametric two-sample testzoltan.szabo/talks/invited...A linear-time adaptive nonparametric two-sample test Zolt an Szab o (CMAP, Ecole Polytechnique) Wittawat

Technical report, ISBN 91-630-7164-9.

Moulines, E., Bach, F. R., and Harchaoui, Z. (2007).Testing for homogenity with kernel Fisher discriminantanalysis.In Neural Information Processing Systems (NIPS), pages609–616.

Zoltan Szabo A linear-time adaptive nonparametric two-sample test