TRANSCRIPT
-
Hidden Markov Models with Confidence
Giovanni Cherubin and Ilia Nouretdinov
21 April 2016
@gchers
-
Applications
• Speech: speech recognition, speech synthesis
• Biology: DNA analysis, gene prediction
• Information Security: cryptanalysis, IDSs, password recovery, software piracy detection, risk evaluation, side-channel attacks
• Other fields: handwriting recognition, time series analysis, activity recognition
-
Hidden Markov Models
• Specific case of Bayesian network
• “Hidden” Markov process St, and observable variable Ot
[Diagram: chain S1 → S2 → … → SL, each state St emitting an observation Ot]
-
Some notation
Markov process St:
P(St = st | St-1 = st-1, …, S1 = s1) = P(St = st | St-1 = st-1)
Variable Ot, whose value depends on the current state st.
[Diagram: chain S1 → S2 → … → SL, each state St emitting an observation Ot]
-
Some notation
Discrete HMMs: Ot takes values in a finite set
Continuous HMMs: Ot is continuous
-
Some notation
Initial probabilities: π = {πi}
Transition probabilities: A = {aij}
Emission densities: B = {bi}
-
Some notation
λ = (A, B, π) : HMM
i = 1, 2, …, N : states
x = (o1, o2, …, oL) : observation sequence
h = (s1, s2, …, sL) : state sequence
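To make the notation concrete, here is a minimal Python sketch (all parameters made up for illustration, not taken from the talk) of an HMM λ = (A, B, π) with Gaussian emissions, sampling a state sequence h and an observation sequence x:

```python
import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.6, 0.4])        # initial probabilities pi_i
A = np.array([[0.7, 0.3],        # transition probabilities a_ij
              [0.2, 0.8]])
mu = np.array([-1.0, 2.0])       # Gaussian emission densities b_i:
sigma = np.array([1.0, 0.5])     # b_i ~ N(mu_i, sigma_i)

def sample(L):
    """Draw h = (s1, ..., sL) and x = (o1, ..., oL) from lambda = (A, B, pi)."""
    s = rng.choice(2, p=pi)
    h, x = [s], [rng.normal(mu[s], sigma[s])]
    for _ in range(L - 1):
        s = rng.choice(2, p=A[s])    # Markov property: s_t depends only on s_{t-1}
        h.append(s)
        x.append(rng.normal(mu[s], sigma[s]))
    return x, h

x, h = sample(5)
```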
-
Setting
Learning under fully observable data, and decoding
Training set: (x1, h1), (x2, h2), …, (xn, hn)
Test object: xn+1
Predict a set Y of candidates for hn+1
-
Standard Approach
• Training using Maximum Likelihood
• Decoding using List-Viterbi algorithm
-
Standard Approach
Training: estimate the following from data using ML
πi = P(S1 = i) for 1 ≤ i ≤ N
aij = P(St = j | St-1 = i) for 1 ≤ i ≤ N, 1 ≤ j ≤ N
bi: assume a distribution (e.g. bi ~ N(μ, σ)) and estimate its parameters from the observations emitted while the hidden sequence is in the i-th state
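A hedged sketch of this training step, assuming fully observed (x, h) pairs and Gaussian emissions: counting and normalising gives the ML estimates of π and A, and per-state sample statistics give the emission parameters.

```python
import numpy as np

def fit_mle(pairs, N):
    """ML estimates of (pi, A, B) from a list of (observations, states)."""
    pi = np.zeros(N)
    trans = np.zeros((N, N))
    per_state = [[] for _ in range(N)]
    for x, h in pairs:
        pi[h[0]] += 1                    # count initial states
        for t in range(1, len(h)):
            trans[h[t - 1], h[t]] += 1   # count transitions i -> j
        for o, s in zip(x, h):
            per_state[s].append(o)       # observations emitted in state s
    pi /= pi.sum()
    trans /= trans.sum(axis=1, keepdims=True)
    mu = np.array([np.mean(v) for v in per_state])
    sigma = np.array([np.std(v) for v in per_state])
    return pi, trans, (mu, sigma)
```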
-
Standard Approach
Decoding: Viterbi algorithm
[Trellis diagram: one column per observation (H, e, …, o), one row per candidate state (A, …, z)]
Vt(i): probability of the most likely path ending in the i-th state at time t
V1(i): probability of starting in the i-th state, given the observation
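A standard Viterbi sketch matching the recursion above; V[t, i] plays the role of Vt(i), and emit is an assumed helper returning the emission density bi(o).

```python
import numpy as np

def viterbi(x, pi, A, emit):
    """Most likely state path for observations x; emit(i, o) = b_i(o)."""
    N, L = len(pi), len(x)
    V = np.zeros((L, N))                 # V[t, i] = V_t(i)
    back = np.zeros((L, N), dtype=int)
    V[0] = pi * np.array([emit(i, x[0]) for i in range(N)])   # V_1(i)
    for t in range(1, L):
        for j in range(N):
            scores = V[t - 1] * A[:, j]
            back[t, j] = int(np.argmax(scores))
            V[t, j] = scores[back[t, j]] * emit(j, x[t])
    path = [int(np.argmax(V[-1]))]       # backtrack from the best final state
    for t in range(L - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```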
-
Standard Approach
Decoding: List-Viterbi algorithm
[Trellis diagram: one column per observation (H, e, …, o), one row per candidate state (A, …, z)]
Vt(i): probabilities of the k most likely paths ending in the i-th state at time t (a vector)
V1(i): probability of starting from the i-th state, given the observation
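List-Viterbi extends the recursion to keep the k best paths per trellis cell. As a hedged stand-in rather than the algorithm itself, this brute-force sketch enumerates all N^L paths and returns the k most likely, which is the same output on tiny problems:

```python
import itertools

def k_best_paths(x, pi, A, emit, k):
    """The k most likely state paths for x (exhaustive; tiny L only)."""
    N, L = len(pi), len(x)
    scored = []
    for h in itertools.product(range(N), repeat=L):
        p = pi[h[0]] * emit(h[0], x[0])        # joint probability of (h, x)
        for t in range(1, L):
            p *= A[h[t - 1], h[t]] * emit(h[t], x[t])
        scored.append((p, h))
    scored.sort(key=lambda s: -s[0])           # most likely first
    return scored[:k]
```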
-
Confident Prediction for HMMs
• Predict a list of candidate state sequences using Conformal Prediction (CP)
• Rank the candidates by their likelihood
-
Conformal Prediction
• Statistical framework to make predictions
• Allows hedging predictions
• Works for many underlying algorithms
-
Conformal Prediction
“Non-conformity measure”: a function A scoring how unusual a new object is with respect to a bag of examples, e.g. A({H, ·, H, …}, ·) = 0.2
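One possible non-conformity measure, as an illustrative choice rather than the authors': score a new object by its distance from the bag average.

```python
import numpy as np

def ncm(bag, z):
    """Non-conformity of z w.r.t. a bag of examples: distance from the mean."""
    return abs(z - np.mean(bag))

ncm([1.0, 1.2, 0.9], 3.0)   # large score: 3.0 conforms poorly with the bag
```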
-
Conformal Prediction
Inputs:
• A: non-conformity measure
• “Training set”: {(·, a), (·, H), …}
• New object
• ε: significance level
Output:
• Y: prediction set; the probability of error is smaller than or equal to ε
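A minimal sketch of this step, assuming exchangeability and an illustrative interface where the non-conformity measure scores an (object, label) pair against the rest of the bag: each candidate label gets a p-value, and labels whose p-value exceeds ε enter Y.

```python
import numpy as np

def conformal_predict(train, z_new, labels, ncm, eps):
    """train: list of (object, label) pairs; returns the prediction set Y."""
    Y = set()
    for y in labels:
        bag = train + [(z_new, y)]     # tentatively label the new object
        alphas = [ncm([b for b in bag if b is not ex], ex) for ex in bag]
        # p-value: fraction of examples at least as non-conforming as (z_new, y)
        p = np.mean([a >= alphas[-1] for a in alphas])
        if p > eps:
            Y.add(y)
    return Y   # P(true label not in Y) <= eps under exchangeability
```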
-
Confident Prediction for HMMs
Run one CP per position of the sequence (L CPs in total); each outputs a set of candidate states for its position:
Y1 = {A, H}, Y2 = {a, e, i}, …, YL
Y = Y1 × Y2 × … × YL
P(hn+1 ∉ Y) ≤ ε
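A hedged sketch of this construction, reusing the conformal_predict sketch from the previous slide: one CP per position t, here assumed to be trained on the t-th (observation, state) pairs of the training sequences, followed by the Cartesian product. How the per-position significance level eps_t must be chosen so that the product set Y is valid at the overall level ε follows the authors' construction; it is left as a parameter here.

```python
import itertools

def cp_hmm_predict(train_pairs, x_new, states, ncm, eps_t):
    """Prediction set Y for the hidden sequence of x_new."""
    Ys = []
    for t in range(len(x_new)):
        # training examples for position t: (observation, hidden state)
        train_t = [(x[t], h[t]) for x, h in train_pairs]
        Ys.append(conformal_predict(train_t, x_new[t], states, ncm, eps_t))
    return list(itertools.product(*Ys))   # Y = Y_1 x Y_2 x ... x Y_L
```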
-
Confident Prediction for HMMs
Ranking
Estimate A, π from data using Maximum Likelihood
σ(h) = P(h | A, π) for h ∈ Y
Sort Y with respect to these scores
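A sketch of the ranking step: σ(h) = P(h | A, π) is the probability of the state sequence under the estimated Markov chain (initial probability times the transition probabilities), and Y is sorted by it.

```python
def rank(Y, pi, A):
    """Sort candidate state sequences by sigma(h) = P(h | A, pi)."""
    def score(h):
        p = pi[h[0]]
        for t in range(1, len(h)):
            p *= A[h[t - 1], h[t]]
        return p
    return sorted(Y, key=score, reverse=True)   # most likely first
```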
-
Experimental Setting
Continuous HMM with three hidden states
We consider the following cases:
• Emissions have a Normal distribution. The Standard Approach assumes the correct distribution.
• Emissions have a GMM distribution. The Standard Approach (erroneously) assumes a Normal distribution.
[Plots: the two emission densities]
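A sketch of the two emission regimes, with made-up parameters: Gaussian emissions per state (the Standard Approach's assumption holds) versus a two-component Gaussian mixture per state (the assumption is violated).

```python
import numpy as np

rng = np.random.default_rng(1)
centres = [-2.0, 0.0, 2.0]               # one centre per hidden state

def emit_normal(s):
    """Optimal case: the fitted Normal matches the true emission."""
    return rng.normal(centres[s], 0.5)

def emit_gmm(s):
    """Assumptions violation: the true emission is a two-component mixture,
    but the Standard Approach still fits a single Normal."""
    offset = rng.choice([-1.0, 1.0])
    return rng.normal(centres[s] + offset, 0.3)
```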
-
Validity
[Plot: cumulative error vs. number of examples]
-
Cumulative Error
[Plot: cumulative error vs. number of examples, optimal case]
-
Cumulative Error
[Plots: cumulative error vs. number of examples, optimal case and assumptions violation]
-
Average Position

Optimal case:
Method               Average Position
Standard Approach    58
CP-HMM, ε = 0.01     917
CP-HMM, ε = 0.05     208
CP-HMM, ε = 0.1      70

Assumptions violation:
Method               Average Position
Standard Approach    294
CP-HMM, ε = 0.01     1067
CP-HMM, ε = 0.05     337
CP-HMM, ε = 0.1      146
-
Comparison

Standard Approach: assumes an emission PDF; if the wrong distribution is assumed, it can perform badly.
Confident Prediction for HMMs: only assumes exchangeability of emissions.

Standard Approach: needs k to be specified.
Confident Prediction for HMMs: validity guaranteed for the required accuracy.
-
Future Work
• Applications: speech recognition, cryptanalysis, biology
• Noisy data
• Other non-conformity measures
• Probabilities instead of predictions (e.g. Venn Machines)
-
Thank you
-
References
• [VGS05] V. Vovk, A. Gammerman, G. Shafer. “Algorithmic Learning in a Random World”. 2005.
• [V67] A.J. Viterbi. “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm”. 1967.
• [F73] G.D. Forney Jr. “The Viterbi algorithm”. 1973.
• [R89] L.R. Rabiner. “A tutorial on hidden Markov models and selected applications in speech recognition”. 1989.
-