
Learning Fast-Mixing Models for Structured Prediction

Jacob Steinhardt Percy Liang

Stanford University

{jsteinhardt,pliang}@cs.stanford.edu

July 8, 2015

J. Steinhardt & P. Liang (Stanford) Fast-Mixing Models July 8, 2015 1 / 11

Structured Prediction Task

x: b d s a d b n n n f a a s s j j j

z: b # # a # # n-n-n # a-a # # n-n a

y: b a n a n a

Goal: fit maximum likelihood model pθ(z | x). Two routes:

Use simple model u, exact inference

Use expressive model, Gibbs sampling (transition kernel A)

Can we get the best of both worlds?


Strong Doeblin Chains

Definition (Doeblin, 1940)

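A chain $\tilde{A}$ is strong Doeblin with parameter $\varepsilon$ if

$\tilde{A}(z_t \mid z_{t-1}) = \varepsilon\, u(z_t) + (1-\varepsilon)\, A(z_t \mid z_{t-1})$

for some distribution u and transition kernel A.

u A A A u A u A A · · ·

All strong Doeblin chains mix quickly:

Proposition

If $\tilde{A}$ is strong Doeblin with parameter $\varepsilon$, then its mixing time is at most $1/\varepsilon$.

Moreover, the stationary distribution is $A^T u$ (A applied T times to u), where $T \sim \mathrm{Geometric}(\varepsilon)$.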
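
The stationary-distribution claim can be checked numerically on a toy chain. The sketch below is illustrative only: the 3-state kernel `A`, the distribution `u`, and `eps = 0.2` are made-up values, and it assumes T is supported on {0, 1, 2, ...} with P(T = t) = ε(1 − ε)^t, which the slide does not spell out. Under that assumption the stationary distribution is the series Σ_t ε(1 − ε)^t A^t u, and the code checks that this series is a fixed point of the mixture kernel εu + (1 − ε)A.

```python
def push(A, v):
    """Advance distribution v one step; A[i][j] = P(next = j | prev = i)."""
    n = len(v)
    return [sum(v[i] * A[i][j] for i in range(n)) for j in range(n)]


def doeblin_step(A, u, eps, v):
    """One step of the mixture kernel eps*u + (1-eps)*A applied to distribution v."""
    Av = push(A, v)
    return [eps * ui + (1 - eps) * ai for ui, ai in zip(u, Av)]


def geometric_mixture(A, u, eps, terms=500):
    """Truncated series sum_{t>=0} eps*(1-eps)^t * (A^t u)."""
    pi = [0.0] * len(u)
    v, w = list(u), eps
    for _ in range(terms):
        pi = [p + w * x for p, x in zip(pi, v)]
        v = push(A, v)
        w *= 1 - eps
    return pi


# Toy values (hypothetical, for illustration only).
A = [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.1, 0.9]]
u = [1.0, 0.0, 0.0]
eps = 0.2

pi = geometric_mixture(A, u, eps)
# Largest change in pi after one more step of the mixture kernel.
gap = max(abs(a - b) for a, b in zip(pi, doeblin_step(A, u, eps, pi)))
print(pi, gap)
```

The printed gap should be at the level of floating-point round-off, since applying the mixture kernel to the series only shifts its terms by one index.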


A Strong Doeblin Family

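Let $\theta$ parameterize a distribution $u_\theta$ and transition matrix $A_\theta$.

$\tilde{A}_\theta = \varepsilon u_\theta + (1-\varepsilon) A_\theta$

Three model families:

$\{u_\theta\}_{\theta\in\Theta}$

$\mathcal{F} = \{\pi_\theta\}_{\theta\in\Theta}$, where $\pi_\theta$ is the stationary distribution of $A_\theta$

$\tilde{\mathcal{F}} = \{\tilde{\pi}_\theta\}_{\theta\in\Theta}$, where $\tilde{\pi}_\theta$ is the stationary distribution of $\tilde{A}_\theta$

$\tilde{\mathcal{F}}$ parameterizes computationally tractable distributions!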


Strategy

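Parameterize strong Doeblin distributions $\tilde{\pi}_\theta$

Maximize log-likelihood: $L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \log \tilde{\pi}_\theta(z^{(i)})$

Issue: hard to compute $\nabla L(\theta)$

Insight: interpret Markov chain as latent variable model:

$p_\theta:\; z_1 \xrightarrow{A_\theta} z_2 \xrightarrow{A_\theta} \cdots \xrightarrow{A_\theta} z_T$

Observe: $\tilde{\pi}_\theta(z) = p_\theta(z_T = z)$, $T \sim \mathrm{Geometric}(\varepsilon)$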
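
The latent-variable view suggests a direct sampler: draw a chain length T, start from u, and take T − 1 steps of A. The following is a minimal sketch, not the authors' implementation. It assumes z1 ∼ u and T supported on {1, 2, ...} with P(T = t) = ε(1 − ε)^(t − 1), neither of which is stated on the slide; `sample_doeblin` and the toy kernel are hypothetical names and values.

```python
import random


def sample_geometric(eps, rng):
    """Number of trials up to and including the first success (support 1, 2, ...)."""
    t = 1
    while rng.random() >= eps:
        t += 1
    return t


def sample_categorical(probs, rng):
    """Draw an index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def sample_doeblin(A, u, eps, rng):
    """Draw z_T: start at z_1 ~ u, then apply A (A[i] = next-state distribution) T-1 times."""
    z = sample_categorical(u, rng)
    for _ in range(sample_geometric(eps, rng) - 1):
        z = sample_categorical(A[z], rng)
    return z


# Toy two-state chain whose kernel deterministically flips the state (hypothetical).
A = [[0.0, 1.0],
     [1.0, 0.0]]
u = [1.0, 0.0]
rng = random.Random(0)
draws = [sample_doeblin(A, u, 0.5, rng) for _ in range(20000)]
freq0 = draws.count(0) / len(draws)
print(freq0)
```

For this toy chain z_T = 0 exactly when T is odd, which has probability 1/(2 − ε) = 2/3 at ε = 0.5, so the printed frequency should land near 0.667.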


Learning Updates

Recall latent variable model:

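$p_\theta:\; z_1 \xrightarrow{A_\theta} z_2 \xrightarrow{A_\theta} \cdots \xrightarrow{A_\theta} z_T$

For any fixed z,

$\frac{\partial \log p_\theta(z_T = z)}{\partial\theta} = \mathbb{E}_{z_{1:T-1} \sim p_\theta(\cdot \mid z_T = z)} \left[ \frac{\partial \log p_\theta(z_{1:T})}{\partial\theta} \right]$

Upshot: just need to sample trajectories that end at z.

$\Longrightarrow$ importance sampling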
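
The gradient identity follows from a standard marginalization argument. The steps below are a sketch that assumes T is held fixed, the state space is discrete, and differentiation can be exchanged with the sum; the second line uses the fact that the derivative of p equals p times the derivative of log p.

```latex
\begin{align*}
\frac{\partial \log p_\theta(z_T = z)}{\partial\theta}
  &= \frac{1}{p_\theta(z_T = z)} \sum_{z_{1:T-1}}
     \frac{\partial p_\theta(z_{1:T-1}, z_T = z)}{\partial\theta} \\
  &= \sum_{z_{1:T-1}} \frac{p_\theta(z_{1:T-1}, z_T = z)}{p_\theta(z_T = z)} \,
     \frac{\partial \log p_\theta(z_{1:T-1}, z_T = z)}{\partial\theta} \\
  &= \mathbb{E}_{z_{1:T-1} \sim p_\theta(\cdot \mid z_T = z)}
     \left[ \frac{\partial \log p_\theta(z_{1:T})}{\partial\theta} \right].
\end{align*}
```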


Experiments: Task

Task from before:

z: b # # a # # n-n-n # a-a # # n-n a

y: b a n a n a

Note y is a deterministic function y = f (z).

Goal: learn model p(z | x) that maximizes

$p(y \mid x) = \sum_{z \in f^{-1}(y)} p(z \mid x)$
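
As a concrete reading of the example, f appears to drop the `#` placeholders and collapse each hyphen-joined run (such as `n-n-n`) to a single character. The sketch below encodes that reading; it is inferred from the single example on the slide, and the function name `collapse` is hypothetical.

```python
def collapse(z):
    """Map a space-separated z string to y: skip '#' tokens and keep the first
    character group of each remaining token (so a run like 'n-n-n' becomes 'n')."""
    out = []
    for token in z.split():
        if token == "#":
            continue
        out.append(token.split("-")[0])
    return " ".join(out)


z = "b # # a # # n-n-n # a-a # # n-n a"
print(collapse(z))  # b a n a n a
```

Many different z strings map to the same y under this function, which is why the objective sums p(z | x) over the whole preimage of y.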


Experiments: Setup

Models:

u(z | x) (bigram, DP): z1 z2 z3 z4

A(zt | zt−1,x) (dictionary, Gibbs): z1 z2 z3 z4

Comparisons:

basic: Gibbs sampling (A) (compute gradients assuming exact inference)

uθ-Gibbs: Gibbs with random restarts from u

Doeblin: our method


Experiments: Results

[Figure: Swipe Typing (Test Accuracy). x-axis: training passes; y-axis: accuracy; curves: Doeblin, uθ-Gibbs, basic.]

Discussion

Summary:

Strong Doeblin property enables fast mixing

Interpolates between tractability and expressivity

Provides better learning updates at training time

Also in paper:

Theoretical analysis of strong Doeblin family

Multi-stage Doeblin chains


Discussion

Related work:

Policy gradient (Sutton et al., 1999)

Inference-aware learning (Barbu, 2009; Domke, 2011; Stoyanov et al., 2011; Huang et al., 2012)

Strong Doeblin analysis (Doeblin, 1940; Propp & Wilson, 1996; Corcoran & Tweedie, 1998)

Future work:

Explore other tractable families

Learn multi-stage chains

Reproducible experiments on CodaLab: codalab.org/worksheets
