Chapter 3 (Part 3): Maximum-Likelihood and Bayesian Parameter Estimation (Section 3.10)


Pattern Classification

All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Chapter 3 (Part 3): Maximum-Likelihood and Bayesian Parameter Estimation (Section 3.10)

• Hidden Markov Model: Extension of Markov Chains


• Hidden Markov Model (HMM)

• Interaction of the visible states with the hidden states

$\sum_k b_{jk} = 1$ for all $j$, where $b_{jk} = P(v_k(t) \mid \omega_j(t))$ (a minimal parameter sketch follows this list).

• 3 problems are associated with this model

• The evaluation problem

• The decoding problem

• The learning problem
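As a concrete anchor for this notation, here is a minimal sketch (Python with NumPy, not taken from the slides) of a three-state, three-symbol HMM with made-up parameter values; it checks the normalisation constraints, including the constraint above that each row of B sums to 1.

import numpy as np

# A minimal, illustrative HMM with 3 hidden states and 3 visible symbols.
# All numbers are made up for illustration; a_ij = P(omega_j(t+1) | omega_i(t)),
# b_jk = P(v_k(t) | omega_j(t)).
A = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])    # transition probabilities
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])    # emission probabilities
pi = np.array([0.6, 0.3, 0.1])     # initial state probabilities

# Each row must be a probability distribution; in particular sum_k b_jk = 1 for all j.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)

These illustrative arrays are reused in the later sketches for the evaluation, decoding, and learning problems.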


• The evaluation problem

This is the problem of computing the probability that the model produces a sequence $V^T$ of visible states:

$$P(V^T) = \sum_{r=1}^{r_{\max}} P(V^T \mid \omega_r^T)\, P(\omega_r^T)$$

where each $r$ indexes a particular sequence $\omega_r^T = \{\omega(1), \omega(2), \ldots, \omega(T)\}$ of $T$ hidden states, and

$$P(V^T \mid \omega_r^T) = \prod_{t=1}^{T} P(v(t) \mid \omega(t)) \qquad (1)$$

$$P(\omega_r^T) = \prod_{t=1}^{T} P(\omega(t) \mid \omega(t-1)) \qquad (2)$$


Using equations (1) and (2), we can write:

$$P(V^T) = \sum_{r=1}^{r_{\max}} \prod_{t=1}^{T} P(v(t) \mid \omega(t))\, P(\omega(t) \mid \omega(t-1)) \qquad (3)$$

Interpretation: The probability that we observe the particular sequence of $T$ visible states $V^T$ is equal to the sum over all $r_{\max}$ possible sequences of hidden states of the conditional probability that the system has made a particular transition, multiplied by the probability that it then emitted the visible symbol in our target sequence.

Example: Let $\omega_1, \omega_2, \omega_3$ be the hidden states and $v_1, v_2, v_3$ the visible states, and let $V^3 = \{v_1, v_2, v_3\}$ be the observed sequence of visible states. Then

$$P(\{v_1, v_2, v_3\}) = P(\omega_1)\,P(v_1 \mid \omega_1)\,P(\omega_2 \mid \omega_1)\,P(v_2 \mid \omega_2)\,P(\omega_3 \mid \omega_2)\,P(v_3 \mid \omega_3) + \ldots$$

(the sum contains one term for each of the $3^3 = 27$ possible hidden-state sequences).


First possibility: $\omega_1\,(t=1) \rightarrow \omega_2\,(t=2) \rightarrow \omega_3\,(t=3)$, emitting $v_1, v_2, v_3$.

Second possibility: $\omega_2\,(t=1) \rightarrow \omega_3\,(t=2) \rightarrow \omega_1\,(t=3)$, emitting $v_1, v_2, v_3$:

$$P(\{v_1, v_2, v_3\}) = P(\omega_2)\,P(v_1 \mid \omega_2)\,P(\omega_3 \mid \omega_2)\,P(v_2 \mid \omega_3)\,P(\omega_1 \mid \omega_3)\,P(v_3 \mid \omega_1) + \ldots$$

Therefore:

$$P(\{v_1, v_2, v_3\}) = \sum_{\substack{\text{possible sequences} \\ \text{of hidden states}}} \prod_{t=1}^{3} P(v(t) \mid \omega(t))\, P(\omega(t) \mid \omega(t-1))$$
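As a brute-force sanity check on Eq. (3), the sketch below enumerates all $3^3 = 27$ hidden-state sequences for the observed sequence $\{v_1, v_2, v_3\}$, using the illustrative (made-up) parameters introduced earlier.

import itertools
import numpy as np

# Illustrative parameters (same made-up values as the earlier sketch).
A = np.array([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.4, 0.5]])
B = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]])
pi = np.array([0.6, 0.3, 0.1])
V = [0, 1, 2]          # observed symbols v1, v2, v3 (0-indexed)

p_total = 0.0
for omega in itertools.product(range(3), repeat=len(V)):    # one term per hidden path
    p = pi[omega[0]] * B[omega[0], V[0]]                     # P(omega(1)) * P(v(1)|omega(1))
    for t in range(1, len(V)):
        p *= A[omega[t - 1], omega[t]] * B[omega[t], V[t]]   # transition * emission
    p_total += p

print("P(V^T) =", p_total)

This is exactly the computation that becomes infeasible for long sequences, since the number of terms grows as $c^T$; that is what motivates the forward procedure below.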


Evaluation

• HMM forward

• HMM backward

• Example 3.

• HMM Forward
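The HMM forward procedure can be sketched as the following recursion (a minimal illustration, not the book's pseudocode; the function name and parameter values are assumptions). It computes the same $P(V^T)$ in $O(c^2 T)$ time instead of summing over all $c^T$ paths.

import numpy as np

def forward(V, A, B, pi):
    """Return P(V) and the table alpha[t, j] = P(v(1..t), omega(t) = j)."""
    T, c = len(V), len(pi)
    alpha = np.zeros((T, c))
    alpha[0] = pi * B[:, V[0]]                       # initialisation
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, V[t]]   # recursion over hidden states
    return alpha[-1].sum(), alpha                    # P(V) = sum_j alpha[T-1, j]

# Illustrative (made-up) parameters, matching the earlier sketches.
A = np.array([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.4, 0.5]])
B = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]])
pi = np.array([0.6, 0.3, 0.1])

p, _ = forward([0, 1, 2], A, B, pi)      # observe v1, v2, v3
print("P(V^T) via forward recursion:", p)   # agrees with the brute-force sum above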


Left-to-Right model (speech recognition)
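A left-to-right (Bakis) model constrains the transition matrix so that a state can only remain where it is or move to a later state, never backwards; a minimal sketch with made-up values:

import numpy as np

# Illustrative left-to-right transition matrix: zero probability of moving back.
A_left_right = np.array([[0.6, 0.3, 0.1],
                         [0.0, 0.7, 0.3],
                         [0.0, 0.0, 1.0]])    # final state is absorbing
assert np.allclose(np.tril(A_left_right, k=-1), 0.0)   # no backward transitions
assert np.allclose(A_left_right.sum(axis=1), 1.0)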


• The decoding problem (optimal state sequence)

Given a sequence of visible states $V^T$, the decoding problem is to find the most probable sequence of hidden states.

This problem can be expressed mathematically as: find the single "best" sequence of hidden states $\hat\omega(1), \hat\omega(2), \ldots, \hat\omega(T)$ such that

$$\hat\omega(1), \hat\omega(2), \ldots, \hat\omega(T) = \arg\max_{\omega(1), \omega(2), \ldots, \omega(T)} P\big[\omega(1), \omega(2), \ldots, \omega(T), v(1), v(2), \ldots, v(T) \mid \lambda\big]$$

Note that the summation has disappeared, since we now want only the single best case.


where $\lambda = [\pi, A, B]$:

$\pi_i = P(\omega(1) = \omega_i)$ (initial state probabilities)

$A = \{a_{ij}\}$, $a_{ij} = P(\omega(t+1) = \omega_j \mid \omega(t) = \omega_i)$ (transition probabilities)

$B = \{b_{jk}\}$, $b_{jk} = P(v(t) = v_k \mid \omega(t) = \omega_j)$ (emission probabilities)

In the preceding example, this computation corresponds to the selection of the best path amongst:

$\{\omega_1(t=1), \omega_2(t=2), \omega_3(t=3)\}$, $\{\omega_2(t=1), \omega_3(t=2), \omega_1(t=3)\}$,
$\{\omega_3(t=1), \omega_1(t=2), \omega_2(t=3)\}$, $\{\omega_3(t=1), \omega_2(t=2), \omega_1(t=3)\}$,
$\{\omega_2(t=1), \omega_1(t=2), \omega_3(t=3)\}$
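A minimal Viterbi-style sketch of this decoding computation (the function name, interface, and parameter values are illustrative assumptions): it returns the single best hidden path for an observed sequence, given $\lambda = [\pi, A, B]$.

import numpy as np

def decode(V, A, B, pi):
    """Return the most probable hidden-state sequence for V and its probability."""
    T, c = len(V), len(pi)
    delta = np.zeros((T, c))            # delta[t, j]: best path probability ending in j at t
    psi = np.zeros((T, c), dtype=int)   # back-pointers to the best predecessor
    delta[0] = pi * B[:, V[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A            # scores[i, j]: arrive in j from i
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, V[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                     # follow the back-pointers
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[-1].max())

# Illustrative (made-up) parameters; state indices 0, 1, 2 stand for omega_1, omega_2, omega_3.
A = np.array([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.4, 0.5]])
B = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]])
pi = np.array([0.6, 0.3, 0.1])
print(decode([0, 1, 2], A, B, pi))      # best hidden path for v1, v2, v3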


Decoding

• HMM decoding

• Example 4

• Might yield an invalid path (e.g., a sequence containing a disallowed transition)


• The learning problem (parameter estimation)

This third problem consists of determining a method to adjust the model parameters $\lambda = [\pi, A, B]$ to satisfy a certain optimization criterion. We need to find the best model $\hat\lambda = [\hat\pi, \hat A, \hat B]$, i.e., the one that maximizes the probability of the observation sequence:

$$\max_{\hat\lambda} P(V^T \mid \hat\lambda)$$

We use an iterative procedure such as Baum-Welch or a gradient-based method to find this local optimum.
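A compact sketch of one Baum-Welch (EM) re-estimation step, written here as a self-contained illustration (routine names and parameter values are assumptions, not the book's pseudocode). It recomputes $\pi$, $A$, and $B$ from the forward and backward tables so that $P(V^T \mid \lambda)$ cannot decrease.

import numpy as np

def alphas(V, A, B, pi):
    """Forward table: alpha[t, j] = P(v(1..t), omega(t) = j)."""
    T, c = len(V), len(pi)
    a = np.zeros((T, c))
    a[0] = pi * B[:, V[0]]
    for t in range(1, T):
        a[t] = (a[t - 1] @ A) * B[:, V[t]]
    return a

def betas(V, A, B):
    """Backward table: beta[t, i] = P(v(t+1..T) | omega(t) = i)."""
    T, c = len(V), A.shape[0]
    b = np.ones((T, c))
    for t in range(T - 2, -1, -1):
        b[t] = A @ (B[:, V[t + 1]] * b[t + 1])
    return b

def baum_welch_step(V, A, B, pi):
    """One EM update of (pi, A, B); repeating it converges to a local optimum."""
    T, c = len(V), len(pi)
    alpha, beta = alphas(V, A, B, pi), betas(V, A, B)
    p_V = alpha[-1].sum()                                    # P(V^T | lambda)
    gamma = alpha * beta / p_V                               # gamma[t, i] = P(omega(t)=i | V)
    xi = np.zeros((T - 1, c, c))                             # xi[t, i, j] = P(omega(t)=i, omega(t+1)=j | V)
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, V[t + 1]] * beta[t + 1])[None, :] / p_V
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.array([gamma[np.array(V) == k].sum(axis=0) for k in range(B.shape[1])]).T
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B

# Illustrative (made-up) starting parameters and observation sequence.
A = np.array([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.4, 0.5]])
B = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]])
pi = np.array([0.6, 0.3, 0.1])
pi, A, B = baum_welch_step([0, 1, 1, 2, 0], A, B, pi)        # one re-estimation pass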


Learning

• The Forward-Backward Algorithm

• Baum-Welch Algorithm

