
Page 1: A Spectral Algorithm for Learning Hidden Markov Models

Introduction Learning Algorithm Learning Guarantees Conclusion

A Spectral Algorithm for Learning Hidden Markov Models

Based on Daniel Hsu, Sham Kakade, and Tong Zhang, arXiv:0811.4413

Kentaro Hanaki

New York University

May 5, 2015

Spectral Algorithm for Learning HMM New York University

Page 2

Table of Contents

Introduction

Learning Algorithm

Learning Guarantees

Conclusion


Page 4

What is an HMM?

[Graphical model: hidden chain h1 → h2 → h3 → h4 → h5; each ht emits observation xt]

- Probabilistic model for sequential observations Pr[x_1, ..., x_T]

- Observations are generated from their underlying hidden states

- Hidden states follow the Markov assumption

- Low-dimensional hidden states make the dynamics easier to understand

- Applications: speech recognition, NLP (PoS tagging, NER, MT, ...), etc.

Page 5

Parameters of HMM

[Graphical model as before, annotated with the transition distribution T = Pr(ht | ht−1) between hidden states and the emission distribution O = Pr(xt | ht)]

- Pr[x_1, ..., x_T] = Σ_{h_1,...,h_T} Pr[h_1] (Π_{t=2}^T Pr[h_t | h_{t−1}]) (Π_{t=1}^T Pr[x_t | h_t])

- Observation matrix: O_{i,a} = Pr[x_t = i | h_t = a]

- Transition matrix: T_{a,b} = Pr[h_t = a | h_{t−1} = b]

- Prior for the initial hidden state: π_a = Pr[h_1 = a]
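These definitions can be sanity-checked numerically. A minimal sketch (the toy values of T, O and π below are mine, not from the paper) computes Pr[x_1, ..., x_T] both by brute-force marginalization over hidden sequences and with the standard forward recursion; the two must agree.

```python
import itertools
import numpy as np

# Toy HMM: m = 2 hidden states, n = 3 observation symbols (made-up numbers).
T = np.array([[0.7, 0.2],   # T[a, b] = Pr[h_t = a | h_{t-1} = b]  (columns sum to 1)
              [0.3, 0.8]])
O = np.array([[0.5, 0.1],   # O[i, a] = Pr[x_t = i | h_t = a]      (columns sum to 1)
              [0.3, 0.3],
              [0.2, 0.6]])
pi = np.array([0.6, 0.4])   # pi[a] = Pr[h_1 = a]

def joint_bruteforce(xs):
    """Pr[x_1..x_T] by summing over all hidden sequences."""
    m, total = T.shape[0], 0.0
    for hs in itertools.product(range(m), repeat=len(xs)):
        p = pi[hs[0]] * O[xs[0], hs[0]]
        for t in range(1, len(xs)):
            p *= T[hs[t], hs[t - 1]] * O[xs[t], hs[t]]
        total += p
    return total

def joint_forward(xs):
    """Forward recursion: alpha_t(a) = Pr[x_1..x_t, h_t = a]."""
    alpha = pi * O[xs[0]]
    for x in xs[1:]:
        alpha = O[x] * (T @ alpha)
    return alpha.sum()

xs = [0, 2, 1, 2]
assert np.isclose(joint_bruteforce(xs), joint_forward(xs))
```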

Page 6

Training HMM — Traditional Approaches

Natural loss for probabilistic model is negative log likelihood (NLL)

1. Find the global minimum of the NLL
   - The NLL is not convex due to the presence of hidden states
   - Solving the non-convex optimization problem is extremely hard

2. Heuristically minimize the NLL with the EM algorithm
   - The NLL is convex once the hidden states are fixed
   - E-step: infer and fix the hidden states given the current parameters
   - M-step: minimize the NLL with respect to the parameters
   - Tends to get stuck in local minima
   - Works well in practice, but learning guarantees are hard to analyze
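For HMMs, the EM alternation described above is the classical Baum–Welch algorithm. A minimal single-sequence sketch (the random toy data and initialization are mine); the in-loop assertion checks the one property EM does guarantee, that the likelihood never decreases:

```python
import numpy as np

rng = np.random.default_rng(0)

# A short synthetic observation sequence over n = 3 symbols (toy data, mine).
xs = rng.integers(0, 3, size=40)
m, n = 2, 3

# Random column-stochastic initialization: columns of T and O sum to 1.
T = rng.random((m, m)); T /= T.sum(axis=0)
O = rng.random((n, m)); O /= O.sum(axis=0)
pi = np.full(m, 1.0 / m)

def forward_backward(xs, T, O, pi):
    """alpha[t, a] = Pr[x_1..x_t, h_t = a]; beta[t, a] = Pr[x_{t+1}..x_L | h_t = a]."""
    L = len(xs)
    alpha, beta = np.zeros((L, m)), np.ones((L, m))
    alpha[0] = pi * O[xs[0]]
    for t in range(1, L):
        alpha[t] = O[xs[t]] * (T @ alpha[t - 1])
    for t in range(L - 2, -1, -1):
        beta[t] = T.T @ (O[xs[t + 1]] * beta[t + 1])
    return alpha, beta, alpha[-1].sum()        # last value: likelihood Pr[x_1..x_L]

prev_ll = -np.inf
for _ in range(20):                            # EM iterations
    alpha, beta, lik = forward_backward(xs, T, O, pi)
    ll = np.log(lik)
    assert ll >= prev_ll - 1e-9                # EM never decreases the likelihood
    prev_ll = ll
    gamma = alpha * beta / lik                 # gamma[t, a] = Pr[h_t = a | x_1..x_L]
    xi = np.zeros((m, m))                      # expected transition counts, b -> a
    for t in range(len(xs) - 1):
        xi += T * np.outer(O[xs[t + 1]] * beta[t + 1], alpha[t]) / lik
    pi = gamma[0]                              # M-step: re-estimate from expected counts
    T = xi / gamma[:-1].sum(axis=0)
    O = np.zeros((n, m))
    for t, x in enumerate(xs):
        O[x] += gamma[t]
    O /= gamma.sum(axis=0)
```

Nothing here certifies closeness to the global optimum: a different seed can converge to a different stationary point, which is exactly the weakness the spectral algorithm avoids.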

Page 7

Contribution of the Paper

The paper proposes an efficient training method that admits a unique solution and comes with learning guarantees


Page 9

Overview of the Algorithm

- Learn a model that predicts joint probabilities of observations

- Learning is hard due to the presence of hidden states

- Two-step solution that never refers to hidden states directly:
  - Map to the observation space, where probabilities can be estimated
  - Find a subspace in which the dynamics are tractable

Page 10

Observation Operator Representation

In an HMM, the joint probability can be written as

Pr[x_1, ..., x_t] = 1_m^T A_{x_t} · · · A_{x_1} π

where A_x is the observation operator

A_x = T diag(O_{x,1}, ..., O_{x,m})
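This operator form is easy to verify numerically: a toy check (parameters are mine, not from the paper) that 1_m^T A_{x_t} ⋯ A_{x_1} π matches brute-force marginalization for every length-3 observation string.

```python
import itertools
import numpy as np

# Toy HMM parameters (mine): columns of T and O sum to 1.
T = np.array([[0.7, 0.2], [0.3, 0.8]])
O = np.array([[0.5, 0.1], [0.3, 0.3], [0.2, 0.6]])
pi = np.array([0.6, 0.4])
m, n = 2, 3

A = [T @ np.diag(O[x]) for x in range(n)]   # A_x = T diag(O_{x,1}, ..., O_{x,m})

def joint_operator(xs):
    """Pr[x_1..x_t] = 1^T A_{x_t} ... A_{x_1} pi."""
    v = pi
    for x in xs:
        v = A[x] @ v
    return v.sum()

def joint_bruteforce(xs):
    """Same probability by summing over all hidden sequences."""
    total = 0.0
    for hs in itertools.product(range(m), repeat=len(xs)):
        p = pi[hs[0]] * O[xs[0], hs[0]]
        for t in range(1, len(xs)):
            p *= T[hs[t], hs[t - 1]] * O[xs[t], hs[t]]
        total += p
    return total

for xs in itertools.product(range(n), repeat=3):
    assert np.isclose(joint_operator(list(xs)), joint_bruteforce(list(xs)))
```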

Page 11

Proof

Proof for t = 2 (the generalization is straightforward):

Pr[x_1 = i, x_2 = j]
  = Σ_{a,b} Pr[x_1 = i, x_2 = j | h_1 = a, h_2 = b] Pr[h_1 = a, h_2 = b]
  = Σ_{a,b} Pr[x_1 = i, x_2 = j | h_1 = a, h_2 = b] T_{b,a} π_a
  = Σ_{a,b} Pr[x_1 = i | h_1 = a] Pr[x_2 = j | h_2 = b] T_{b,a} π_a
  = Σ_{a,b} O_{j,b} T_{b,a} O_{i,a} π_a
  = 1_m^T T diag(O_{j,1}, ..., O_{j,m}) T diag(O_{i,1}, ..., O_{i,m}) π
  = 1_m^T A_j A_i π

(the leading 1_m^T T equals 1_m^T because the columns of T sum to one)

Page 12

Observation Operator Representation

Problem: estimating π and A_x requires inferring hidden states

- π: prior over hidden states

- A_x: transitions between hidden states

Solution: map everything into a space in which

- each component can be estimated from observations

- the dynamics are as simple as in the hidden-state space

Page 13

Mapping to Subspace

- First map everything into the observation space using O:

  π → Oπ,  1_m → O 1_m,  A_x → O A_x O^{−1}

- But O^{−1} is not well-defined, and the dynamics are complicated in this space

- Instead, find U such that U^T O is invertible (i.e., the hidden-state dynamics are preserved), and set

  π → b_1 = (U^T O) π,  1_m → b_∞ = ((U^T O)^T)^{−1} 1_m

  A_x → B_x = (U^T O) A_x (U^T O)^{−1}

  Pr[x_1, ..., x_t] = b_∞^T B_{x_t} · · · B_{x_1} b_1

Page 14

Estimating b_1, b_∞ and B_x

b_1, b_∞ and B_x can be estimated using

(P_1)_i = Pr[x_1 = i]

(P_{2,1})_{i,j} = Pr[x_2 = i, x_1 = j]

(P_{3,x,1})_{i,j} = Pr[x_3 = i, x_2 = x, x_1 = j]

Lemma. b_1, b_∞ and B_x can be expressed as

b_1 = U^T P_1

b_∞ = (P_{2,1}^T U)^+ P_1

B_x = (U^T P_{3,x,1})(U^T P_{2,1})^+
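With exact population matrices there is no sampling error, so the lemma can be verified to machine precision. A sketch (toy parameters are mine) that forms P_1, P_{2,1}, P_{3,x,1} in closed form and recovers joint probabilities from b_1, b_∞ and B_x:

```python
import numpy as np

# Toy HMM (mine): columns of T and O sum to 1.
T = np.array([[0.7, 0.2], [0.3, 0.8]])
O = np.array([[0.5, 0.1], [0.3, 0.3], [0.2, 0.6]])
pi = np.array([0.6, 0.4])
m, n = 2, 3
A = [T @ np.diag(O[x]) for x in range(n)]          # observation operators

# Exact (population) observable matrices.
P1 = O @ pi                                        # P1[i]    = Pr[x1 = i]
P21 = O @ T @ np.diag(pi) @ O.T                    # P21[i,j] = Pr[x2 = i, x1 = j]
P3x1 = [O @ A[x] @ T @ np.diag(pi) @ O.T           # P3x1[x][i,j] = Pr[x3 = i, x2 = x, x1 = j]
        for x in range(n)]

# U: top-m left singular vectors of P21.
U = np.linalg.svd(P21)[0][:, :m]

# The lemma's observable parameters.
b1 = U.T @ P1
binf = np.linalg.pinv(P21.T @ U) @ P1
B = [U.T @ P3x1[x] @ np.linalg.pinv(U.T @ P21) for x in range(n)]

def joint_observable(xs):
    """Pr[x_1..x_t] = binf^T B_{x_t} ... B_{x_1} b1."""
    v = b1
    for x in xs:
        v = B[x] @ v
    return binf @ v

def joint_true(xs):
    """Reference value via the observation operators."""
    v = pi
    for x in xs:
        v = A[x] @ v
    return v.sum()

xs = [2, 0, 1]
assert np.isclose(joint_observable(xs), joint_true(xs))
```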

Page 15

Finding U

Lemma. Assume π > 0 elementwise and that O and T have column rank m (i.e., no two hidden states are identical). If U is the matrix of left singular vectors of P_{2,1} corresponding to its non-zero singular values, then U^T O is invertible.

Page 16

Proof for the Existence of (U^T O)^{−1}

P_{2,1} can be rewritten as

P_{2,1} = O T diag(π) O^T

so rank(P_{2,1}) ≤ rank(O). Also, T diag(π) O^T has full row rank by the assumptions, which implies

O = P_{2,1} (T diag(π) O^T)^+

and therefore rank(O) ≤ rank(P_{2,1}). So

rank(O) = rank(P_{2,1}) = rank(U) = m

and U^T O has rank m, hence is invertible.

Page 17

Learning Algorithm

1. Randomly sample N observation triples (x_1, x_2, x_3) and estimate P_1, P_{2,1} and P_{3,x,1}

2. Compute the SVD of P_{2,1} and let U be the matrix of left singular vectors corresponding to the m largest singular values

3. Compute the model parameters b_1, b_∞ and B_x
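The three steps translate directly into code. A sketch of the full algorithm on a toy HMM (the parameters, sample size and tolerance are mine); with finite N the recovered probabilities are only approximate, which is exactly what the guarantees in the next section quantify:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy HMM (mine): columns of T and O sum to 1; m = 2 states, n = 3 symbols.
T = np.array([[0.7, 0.2], [0.3, 0.8]])
O = np.array([[0.5, 0.1], [0.3, 0.3], [0.2, 0.6]])
pi = np.array([0.6, 0.4])
m, n, N = 2, 3, 1_000_000

def sample_from(cols, cond):
    """Draw one sample per entry of `cond`, using column cols[:, cond[i]] as the distribution."""
    out = np.empty(len(cond), dtype=int)
    for b in range(cols.shape[1]):
        idx = np.flatnonzero(cond == b)
        out[idx] = rng.choice(cols.shape[0], size=len(idx), p=cols[:, b])
    return out

# Step 1: sample N observation triples (x1, x2, x3) and estimate P1, P21, P3x1.
h1 = rng.choice(m, size=N, p=pi)
h2 = sample_from(T, h1)
h3 = sample_from(T, h2)
x1, x2, x3 = sample_from(O, h1), sample_from(O, h2), sample_from(O, h3)

P1 = np.bincount(x1, minlength=n) / N
P21 = np.bincount(x2 * n + x1, minlength=n * n).reshape(n, n) / N
P3x1 = [np.bincount(x3[x2 == x] * n + x1[x2 == x], minlength=n * n).reshape(n, n) / N
        for x in range(n)]

# Step 2: SVD of P21; U = left singular vectors of the m largest singular values.
U = np.linalg.svd(P21)[0][:, :m]

# Step 3: model parameters b1, b_inf, Bx.
b1 = U.T @ P1
binf = np.linalg.pinv(P21.T @ U) @ P1
B = [U.T @ P3x1[x] @ np.linalg.pinv(U.T @ P21) for x in range(n)]

def joint_est(xs):
    v = b1
    for x in xs:
        v = B[x] @ v
    return binf @ v

def joint_true(xs):
    v = pi
    for x in xs:
        v = T @ (O[x] * v)       # apply A_x = T diag(O_x)
    return v.sum()

# Total-variation error over all length-2 strings shrinks as N grows.
tv = sum(abs(joint_est([i, j]) - joint_true([i, j])) for i in range(n) for j in range(n))
assert tv < 0.1
```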


Page 19

Learning Bound for Joint Probabilities

Theorem. There exists a constant C > 0 such that

Pr[ Σ_{x_1,...,x_t} | Pr[x_1, ..., x_t] − P̂r[x_1, ..., x_t] | ≥ ε ]
  ≤ exp( − σ_m(O)^2 σ_m(P_{2,1})^4 N ε^2 / ( C (1 + n_0(ε) σ_m(P_{2,1})^2) t^2 ) )

where

n_0(ε) = min{ k : ε(k) ≤ ε/4 }

ε(k) = min{ Σ_{j∈S} Pr[x_2 = j] : S ⊂ [n], |S| = n − k }

and σ_m denotes the m-th largest singular value.

Note: the bound gets looser as the sequence length t grows!

Page 20

Brief Idea of the Proof

1. Compute the sampling error for P_1, P_{2,1} and P_{3,x,1}

2. Compute how the sampling error propagates to the accuracy
   - Compute the approximation error for U
   - Compute the approximation error for b_1, b_∞ and B_x

Page 21

Learning Bound for Conditional Probabilities

The conditional probability is

Pr[x_t | x_{t−1}, ..., x_1] = b_∞^T B_{x_t} b_t / Σ_x b_∞^T B_x b_t

with the belief update

b_{t+1} = B_{x_t} b_t / (b_∞^T B_{x_t} b_t)

The conditional probability also has a learning guarantee:

- The bound is independent of the length of the sequence

- It involves two more parameters than the joint probability bound:
  - α: smallest entry of the transition operators A_x
  - γ: error for the hidden states, measured in the observation space
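A sketch of the conditional-probability recursion (toy parameters are mine; with exact population matrices the recursion must reproduce the true conditionals computed by the forward algorithm):

```python
import numpy as np

# Toy HMM (mine); exact population quantities, so the two computations agree.
T = np.array([[0.7, 0.2], [0.3, 0.8]])
O = np.array([[0.5, 0.1], [0.3, 0.3], [0.2, 0.6]])
pi = np.array([0.6, 0.4])
m, n = 2, 3
A = [T @ np.diag(O[x]) for x in range(n)]

P1 = O @ pi
P21 = O @ T @ np.diag(pi) @ O.T
P3x1 = [O @ A[x] @ T @ np.diag(pi) @ O.T for x in range(n)]
U = np.linalg.svd(P21)[0][:, :m]
b1 = U.T @ P1
binf = np.linalg.pinv(P21.T @ U) @ P1
B = [U.T @ P3x1[x] @ np.linalg.pinv(U.T @ P21) for x in range(n)]

def conditionals_observable(xs):
    """Pr[x_t | x_{t-1}, ..., x_1] via b_{t+1} = B_x b_t / (binf^T B_x b_t)."""
    b, out = b1, []
    for x in xs:
        num = binf @ (B[x] @ b)
        den = sum(binf @ (B[y] @ b) for y in range(n))
        out.append(num / den)
        b = B[x] @ b / num            # belief update
    return out

def conditionals_forward(xs):
    """True conditionals from the forward recursion over hidden states."""
    alpha, out = pi, []               # alpha = Pr[h_t | x_1, ..., x_{t-1}]
    for x in xs:
        w = O[x] * alpha              # w[a] = Pr[x_t, h_t = a | past]
        out.append(w.sum())           # Pr[x_t | x_1, ..., x_{t-1}]
        alpha = T @ (w / w.sum())
    return out

xs = [2, 0, 1, 1, 2]
assert np.allclose(conditionals_observable(xs), conditionals_forward(xs))
```

With estimated rather than exact parameters, the belief update renormalizes at every step, which is why the conditional-probability bound does not accumulate error with the sequence length.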


Page 23

Conclusion

- Spectral learning for HMMs yields a unique solution

- The joint probability is estimated without inferring hidden states

- The learning guarantee for the joint probability depends on the sequence length, but the guarantee for the conditional probability does not
