

QUIZ!! In HMMs ...
T/F: ... the emissions are hidden. FALSE
T/F: ... observations are independent given no evidence. FALSE
T/F: ... each variable X_i has its own (different) CPT. FALSE
T/F: ... typically none of the variables X_i are observed. TRUE
T/F: ... can be solved with variable elimination up to any time t. TRUE
T/F: In the forward algorithm, you must normalize each iteration. FALSE

[Figure: HMM with hidden states X1 ... X4 and observations E1 ... E4]

CSE 511a: Artificial Intelligence, Spring 2013
Lecture 20: HMMs and Particle Filters
04/15/2012
Robert Pless, via Kilian Q. Weinberger, with slides from Dan Klein (UC Berkeley)

Announcements
Project 4 is up! Due in one week!!

Contest Coming Soon

Recap: Reasoning Over Time

Stationary Markov models
[Figure: Markov chain X1 → X2 → X3 → X4; two-state (rain/sun) transition diagram with probabilities 0.7 and 0.3]

Hidden Markov models
[Figure: HMM with hidden states X1 ... X5 and observations E1 ... E5]

Emission probabilities:
X      E             P(E|X)
rain   umbrella      0.9
rain   no umbrella   0.1
sun    umbrella      0.2
sun    no umbrella   0.8

Exact Inference

Just like Variable Elimination

Evidence e1 = umbrella is observed at time 1.

Initial distribution:
X1     P(X1)
rain   0.1
sun    0.9

Emission probability (for the observed e1 = umbrella):
X      P(e1|X)
rain   0.9
sun    0.2

Joining these gives:
X1     P(X1, e1)
rain   0.09
sun    0.18

Transition probability:
Xt     Xt+1   P(Xt+1|Xt)
rain   rain   0.9
rain   sun    0.1
sun    rain   0.2
sun    sun    0.8

Joining the transition factor gives:
X1     X2     P(X2, X1, e1)
rain   rain   0.081
rain   sun    0.009
sun    rain   0.036
sun    sun    0.144

Marginalize out X1:
X2     P(X2, e1)
rain   0.117
sun    0.153

Incorporating the emission probability for e2 gives P(X2, e1, e2), and the same join/marginalize steps continue for X3 and X4 with e3 and e4.
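To make the elimination steps concrete, here is a minimal Python sketch that reproduces the numbers above. The variable names (prior, trans, emit) and the choice of evidence at each step are illustrative assumptions, not from the slides.

```python
# Umbrella HMM from the example: two states, umbrella observations.
STATES = ["rain", "sun"]

prior = {"rain": 0.1, "sun": 0.9}                                   # P(X1)
emit = {("rain", "umbrella"): 0.9, ("rain", "no umbrella"): 0.1,
        ("sun", "umbrella"): 0.2, ("sun", "no umbrella"): 0.8}      # P(E | X)
trans = {("rain", "rain"): 0.9, ("rain", "sun"): 0.1,
         ("sun", "rain"): 0.2, ("sun", "sun"): 0.8}                 # P(X_t+1 | X_t)

# Incorporate the first observation e1 = umbrella: P(X1, e1)
b = {x: prior[x] * emit[(x, "umbrella")] for x in STATES}
print(b)   # rain ≈ 0.09, sun ≈ 0.18

# Join the transition factor and marginalize out X1: P(X2, e1)
b = {x2: sum(trans[(x1, x2)] * b[x1] for x1 in STATES) for x2 in STATES}
print(b)   # rain ≈ 0.117, sun ≈ 0.153

# Incorporate e2 = umbrella: P(X2, e1, e2), and so on for X3, X4
b = {x: b[x] * emit[(x, "umbrella")] for x in STATES}
print(b)   # rain ≈ 0.1053, sun ≈ 0.0306
```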

The Forward Algorithm
We are given evidence at each time step and want to know the belief P(X_t | e_1:t).
We can derive the following updates:

We can normalize as we go if we want to have P(x|e) at each time step, or just once at the end

Lines 1 to 2: marginalization over x_t-1.

Lines 2 to 3: you get line 3 by factoring the joint probability into components. By the HMM definition, the evidence e_t depends only on x_t, and x_t depends only on x_t-1.
Lines 3 to 4: rearranging terms.
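The four-line derivation these notes describe does not survive in the transcript; a reconstruction under the standard HMM factorization would read:

```latex
\begin{align*}
P(x_t, e_{1:t})
  &= \sum_{x_{t-1}} P(x_{t-1}, x_t, e_{1:t}) \\
  &= \sum_{x_{t-1}} P(x_{t-1}, e_{1:t-1})\, P(x_t \mid x_{t-1})\, P(e_t \mid x_t) \\
  &= P(e_t \mid x_t) \sum_{x_{t-1}} P(x_t \mid x_{t-1})\, P(x_{t-1}, e_{1:t-1})
\end{align*}
```

The last line is the recursion: multiply the previous step's values by the transition probabilities, sum over the previous state, then weight by the evidence likelihood.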

Online Belief Updates
Every time step, we start with the current P(X | evidence).
We update for time (elapse time), then we update for evidence (observe); the two update equations are written out on the Filtering slide below.

The forward algorithm does both at once (and doesn't normalize).
Problem: space is |X| and time is |X|^2 per time step.

Why |X|^2 per step? The update computes a value for every state X_t can take, and each of those values requires summing over all states X_t-1 can take, so |X|^2 operations.

Best Explanation Queries
Query: the most likely sequence, argmax over x_1:t of P(x_1:t | e_1:t).
[Figure: HMM with hidden states X1 ... X5 and observations E1 ... E5]


So far, we have seen inference approaches that estimate the state given evidence observed over time.

The other type of HMM query is an explanatory query.

State Path Trellis
State trellis: a graph of states and transitions over time.

Each arc represents some transition.
Each arc has a weight.
Each path is a sequence of states.
The product of the weights on a path is that sequence's probability.
We can think of the forward algorithm as computing the sum over all paths in this graph.

[Figure: trellis with states sun and rain at each of four time steps]

The Viterbi algorithm computes the single best path rather than the sum over all paths.

Viterbi Algorithm
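The recurrence on the Viterbi slide is not in the transcript; it replaces the sum in the forward update with a max: m_t(x_t) = P(e_t | x_t) max over x_t-1 of P(x_t | x_t-1) m_t-1(x_t-1). Below is a minimal Python sketch under the same assumed umbrella HMM; the names and the observation sequence are illustrative.

```python
# Illustrative Viterbi sketch: max-product analogue of the forward algorithm,
# with back-pointers so the best path can be recovered.
STATES = ["rain", "sun"]
prior = {"rain": 0.1, "sun": 0.9}
trans = {("rain", "rain"): 0.9, ("rain", "sun"): 0.1,
         ("sun", "rain"): 0.2, ("sun", "sun"): 0.8}
emit = {("rain", "umbrella"): 0.9, ("rain", "no umbrella"): 0.1,
        ("sun", "umbrella"): 0.2, ("sun", "no umbrella"): 0.8}

def viterbi(observations):
    """Most likely hidden state sequence for the given observations."""
    m = {x: prior[x] * emit[(x, observations[0])] for x in STATES}
    backpointers = []
    for e in observations[1:]:
        prev, m, bp = m, {}, {}
        for x_t in STATES:
            # best predecessor (a max instead of the forward algorithm's sum)
            x_best = max(STATES, key=lambda x_prev: prev[x_prev] * trans[(x_prev, x_t)])
            bp[x_t] = x_best
            m[x_t] = prev[x_best] * trans[(x_best, x_t)] * emit[(x_t, e)]
        backpointers.append(bp)
    # follow back-pointers from the best final state
    path = [max(m, key=m.get)]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return list(reversed(path))

print(viterbi(["umbrella", "umbrella", "no umbrella"]))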

[Figure: trellis with states sun and rain at each of four time steps]

Example

http://www.cs.umb.edu/~srevilak/viterbi/

Filtering
Filtering is the inference process of finding a distribution over X_T given e_1 through e_T: P(X_T | e_1:T).
We first compute P(X_1 | e_1).
For each t from 2 to T, we start from P(X_t-1 | e_1:t-1) and then:
Elapse time: compute P(X_t | e_1:t-1).

Observe: compute P(X_t | e_1:t-1, e_t) = P(X_t | e_1:t).
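Written as equations (they are not visible in the transcript; this is the standard form, consistent with the forward update above):

```latex
\begin{align*}
\text{Elapse time:}\quad & P(x_t \mid e_{1:t-1}) = \sum_{x_{t-1}} P(x_t \mid x_{t-1})\, P(x_{t-1} \mid e_{1:t-1}) \\
\text{Observe:}\quad     & P(x_t \mid e_{1:t}) \propto P(e_t \mid x_t)\, P(x_t \mid e_{1:t-1})
\end{align*}
```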


Recap: Reasoning Over Time

Stationary Markov models
[Figure: Markov chain X1 → X2 → X3 → X4; two-state (rain/sun) transition diagram with probabilities 0.7 and 0.3]

Hidden Markov models
[Figure: HMM with hidden states X1 ... X5 and observations E1 ... E5]
Emission probabilities P(E|X): rain/umbrella 0.9, rain/no umbrella 0.1, sun/umbrella 0.2, sun/no umbrella 0.8

Recap: Filtering

Elapse time: compute P( Xt | e1:t-1 )

Observe: compute P( Xt | e1:t )

[Figure: two time slices X1 → X2 with observations E1, E2]

Belief:

Prior on X1 → Observe → Elapse time → Observe


Approximate Inference

Particle Filtering
Sometimes |X| is too big to use exact inference:
|X| may be too big to even store B(X), e.g. when X is continuous.
|X|^2 may be too big to do updates.

Solution: approximate inference.
Track samples of X, not all values.
Samples are called particles.
Time per step is linear in the number of samples.
But: the number needed may be large.
In memory: a list of particles, not states.

This is how robot localization works in practice.
[Figure: 3x3 grid of belief values (0.0, 0.1, 0.0, 0.0, 0.0, 0.2, 0.0, 0.2, 0.5)]

Representation: Particles
Our representation of P(X) is now a list of N particles (samples).
Generally, N << |X|.
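The transcript cuts off here. As a companion to the description above, here is a minimal particle-filter sketch for the same assumed umbrella HMM; the helper names and the resample-every-step choice are illustrative, not from the slides.

```python
import random

STATES = ["rain", "sun"]
trans = {("rain", "rain"): 0.9, ("rain", "sun"): 0.1,
         ("sun", "rain"): 0.2, ("sun", "sun"): 0.8}
emit = {("rain", "umbrella"): 0.9, ("rain", "no umbrella"): 0.1,
        ("sun", "umbrella"): 0.2, ("sun", "no umbrella"): 0.8}

def particle_filter_step(particles, evidence):
    """One elapse-time + observe + resample step over a list of particles."""
    # Elapse time: push each particle through the transition model by sampling
    moved = [random.choices(STATES, weights=[trans[(p, x)] for x in STATES])[0]
             for p in particles]
    # Observe: weight each particle by the likelihood of the evidence
    weights = [emit[(p, evidence)] for p in moved]
    # Resample: draw N new particles in proportion to their weights
    return random.choices(moved, weights=weights, k=len(particles))

# Start with N particles drawn from a uniform prior (illustrative choice)
N = 1000
particles = [random.choice(STATES) for _ in range(N)]
for e in ["umbrella", "umbrella", "no umbrella"]:
    particles = particle_filter_step(particles, e)

# Approximate belief after the three observations: fraction of particles per state
print({x: particles.count(x) / N for x in STATES})
```

Each step touches only the N particles, so the time per step is linear in N rather than |X|^2.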