TRANSCRIPT
-
Hidden Markov Models with Confidence
Giovanni Cherubin and Ilia Nouretdinov
21 April 2016
@gchers
-
Applications
• Speech: speech recognition, speech synthesis
• Biology: DNA analysis, gene prediction
• Information Security: cryptanalysis, IDSs, password recovery, software piracy detection, risk evaluation, side-channel attacks
• Other fields: handwriting recognition, time series analysis, activity recognition
-
Hidden Markov Models
• Specific case of Bayesian network
• “Hidden” Markov process St, and observable variable Ot
[Diagram: chain S1 → S2 → … → SL, each state St emitting an observation Ot]
-
Some notation
Markov process St:
P(St = st | St-1 = st-1, …, S1 = s1) = P(St = st | St-1 = st-1)
Variable Ot, whose value depends on the current state st.
[Diagram: chain S1 → S2 → … → SL, each state St emitting an observation Ot]
-
Some notation
Discrete HMMs: Ot takes values in a finite set
Continuous HMMs: Ot is continuous
-
Some notation
Initial probabilities: π = {πi}
Transition probabilities: A = {aij}
Emission densities: B = {bi}
-
Some notation
λ = (A, B, π) : HMM
i = 1, 2, …, N : states
x = (o1, o2, …, oL) : observation sequence
h = (s1, s2, …, sL) : state sequence
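To make the notation concrete, here is a minimal Python sketch (all parameters made up for illustration, not taken from the talk) of an HMM λ = (A, B, π) with Gaussian emissions, sampling a state sequence h and an observation sequence x:

```python
import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.6, 0.4])        # initial probabilities pi_i
A = np.array([[0.7, 0.3],        # transition probabilities a_ij
              [0.2, 0.8]])
mu = np.array([-1.0, 2.0])       # Gaussian emission densities b_i:
sigma = np.array([1.0, 0.5])     # b_i ~ N(mu_i, sigma_i)

def sample(L):
    """Draw h = (s1, ..., sL) and x = (o1, ..., oL) from lambda = (A, B, pi)."""
    s = rng.choice(2, p=pi)
    h, x = [s], [rng.normal(mu[s], sigma[s])]
    for _ in range(L - 1):
        s = rng.choice(2, p=A[s])    # Markov property: s_t depends only on s_{t-1}
        h.append(s)
        x.append(rng.normal(mu[s], sigma[s]))
    return x, h

x, h = sample(5)
```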
-
Setting
Learning under fully observable data, and decoding
Training set: (x1, h1), (x2, h2), …, (xn, hn)
Test object: xn+1
Predict a set Y of candidates for hn+1
-
Standard Approach
• Training using Maximum Likelihood
• Decoding using List-Viterbi algorithm
-
Standard Approach
Training: estimate the following from data using ML
πi = P(S1 = i) for 1 ≤ i ≤ N
aij = P(St = j | St-1 = i) for 1 ≤ i ≤ N, 1 ≤ j ≤ N
bi: assume a distribution (e.g. bi ~ N(μ, σ)) and estimate its parameters from the observations emitted while the hidden sequence is in the i-th state
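A hedged sketch of this training step, assuming fully observed (x, h) pairs and Gaussian emissions: counting and normalising gives the ML estimates of π and A, and per-state sample statistics give the emission parameters.

```python
import numpy as np

def fit_mle(pairs, N):
    """ML estimates of (pi, A, B) from a list of (observations, states)."""
    pi = np.zeros(N)
    trans = np.zeros((N, N))
    per_state = [[] for _ in range(N)]
    for x, h in pairs:
        pi[h[0]] += 1                    # count initial states
        for t in range(1, len(h)):
            trans[h[t - 1], h[t]] += 1   # count transitions i -> j
        for o, s in zip(x, h):
            per_state[s].append(o)       # observations emitted in state s
    pi /= pi.sum()
    trans /= trans.sum(axis=1, keepdims=True)
    mu = np.array([np.mean(v) for v in per_state])
    sigma = np.array([np.std(v) for v in per_state])
    return pi, trans, (mu, sigma)
```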
-
Standard Approach
Decoding: Viterbi algorithm
[Trellis diagram: one column per observation (H, e, …, o), one row per candidate state (A, …, z)]
Vt(i): probability of the most likely path ending in the i-th state at time t
V1(i): probability of starting in the i-th state, given the observation
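A standard Viterbi sketch matching the recursion above; V[t, i] plays the role of Vt(i), and emit is an assumed helper returning the emission density bi(o).

```python
import numpy as np

def viterbi(x, pi, A, emit):
    """Most likely state path for observations x; emit(i, o) = b_i(o)."""
    N, L = len(pi), len(x)
    V = np.zeros((L, N))                 # V[t, i] = V_t(i)
    back = np.zeros((L, N), dtype=int)
    V[0] = pi * np.array([emit(i, x[0]) for i in range(N)])   # V_1(i)
    for t in range(1, L):
        for j in range(N):
            scores = V[t - 1] * A[:, j]
            back[t, j] = int(np.argmax(scores))
            V[t, j] = scores[back[t, j]] * emit(j, x[t])
    path = [int(np.argmax(V[-1]))]       # backtrack from the best final state
    for t in range(L - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```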
-
Standard Approach
Decoding: List-Viterbi algorithm
[Trellis diagram: one column per observation (H, e, …, o), one row per candidate state (A, …, z)]
Vt(i): probabilities of the k most likely paths ending in the i-th state at time t (a vector)
V1(i): probability of starting from the i-th state, given the observation
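List-Viterbi extends the recursion to keep the k best paths per trellis cell. As a hedged stand-in rather than the algorithm itself, this brute-force sketch enumerates all N^L paths and returns the k most likely, which is the same output on tiny problems:

```python
import itertools

def k_best_paths(x, pi, A, emit, k):
    """The k most likely state paths for x (exhaustive; tiny L only)."""
    N, L = len(pi), len(x)
    scored = []
    for h in itertools.product(range(N), repeat=L):
        p = pi[h[0]] * emit(h[0], x[0])        # joint probability of (h, x)
        for t in range(1, L):
            p *= A[h[t - 1], h[t]] * emit(h[t], x[t])
        scored.append((p, h))
    scored.sort(key=lambda s: -s[0])           # most likely first
    return scored[:k]
```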
-
Confident Prediction for HMMs
• Predict a list of candidate state sequences using Conformal Prediction (CP)
• Rank the candidates by their likelihood
-
Conformal Prediction
• Statistical framework to make predictions
• Allows hedging predictions
• Works for many underlying algorithms
-
Conformal Prediction
“Non-conformity measure”: a function A scoring how unusual a new object is with respect to a bag of examples, e.g. A({H, ·, H, …}, ·) = 0.2
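One possible non-conformity measure, as an illustrative choice rather than the authors': score a new object by its distance from the bag average.

```python
import numpy as np

def ncm(bag, z):
    """Non-conformity of z w.r.t. a bag of examples: distance from the mean."""
    return abs(z - np.mean(bag))

ncm([1.0, 1.2, 0.9], 3.0)   # large score: 3.0 conforms poorly with the bag
```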
-
Conformal Prediction
Inputs:
• A: non-conformity measure
• “Training set”: {(·, a), (·, H), …}
• New object
• ε: significance level
Output:
• Y: prediction set; the probability of error is smaller than or equal to ε
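A minimal sketch of this step, assuming exchangeability and an illustrative interface where the non-conformity measure scores an (object, label) pair against the rest of the bag: each candidate label gets a p-value, and labels whose p-value exceeds ε enter Y.

```python
import numpy as np

def conformal_predict(train, z_new, labels, ncm, eps):
    """train: list of (object, label) pairs; returns the prediction set Y."""
    Y = set()
    for y in labels:
        bag = train + [(z_new, y)]     # tentatively label the new object
        alphas = [ncm([b for b in bag if b is not ex], ex) for ex in bag]
        # p-value: fraction of examples at least as non-conforming as (z_new, y)
        p = np.mean([a >= alphas[-1] for a in alphas])
        if p > eps:
            Y.add(y)
    return Y   # P(true label not in Y) <= eps under exchangeability
```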
-
Confident Prediction for HMMs
Run one CP per position of the sequence (L CPs in total); each outputs a set of candidate states for its position:
Y1 = {A, H}, Y2 = {a, e, i}, …, YL
Y = Y1 × Y2 × … × YL
P(hn+1 ∉ Y) ≤ ε
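A hedged sketch of this construction, reusing the conformal_predict sketch from the previous slide: one CP per position t, here assumed to be trained on the t-th (observation, state) pairs of the training sequences, followed by the Cartesian product. How the per-position significance level eps_t must be chosen so that the product set Y is valid at the overall level ε follows the authors' construction; it is left as a parameter here.

```python
import itertools

def cp_hmm_predict(train_pairs, x_new, states, ncm, eps_t):
    """Prediction set Y for the hidden sequence of x_new."""
    Ys = []
    for t in range(len(x_new)):
        # training examples for position t: (observation, hidden state)
        train_t = [(x[t], h[t]) for x, h in train_pairs]
        Ys.append(conformal_predict(train_t, x_new[t], states, ncm, eps_t))
    return list(itertools.product(*Ys))   # Y = Y_1 x Y_2 x ... x Y_L
```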
-
Confident Prediction for HMMs
Ranking
Estimate A, π from data using Maximum Likelihood
σ(h) = P(h | A, π) for h ∈ Y
Sort Y with respect to these scores
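A sketch of the ranking step: σ(h) = P(h | A, π) is the probability of the state sequence under the estimated Markov chain (initial probability times the transition probabilities), and Y is sorted by it.

```python
def rank(Y, pi, A):
    """Sort candidate state sequences by sigma(h) = P(h | A, pi)."""
    def score(h):
        p = pi[h[0]]
        for t in range(1, len(h)):
            p *= A[h[t - 1], h[t]]
        return p
    return sorted(Y, key=score, reverse=True)   # most likely first
```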
-
Experimental Setting
Continuous HMM with three hidden states
We consider the following cases:
• Emissions have a Normal distribution. The Standard Approach assumes the correct distribution.
• Emissions have a GMM distribution. The Standard Approach (erroneously) assumes a Normal distribution.
[Plots: the two emission densities]
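A sketch of the two emission regimes, with made-up parameters: Gaussian emissions per state (the Standard Approach's assumption holds) versus a two-component Gaussian mixture per state (the assumption is violated).

```python
import numpy as np

rng = np.random.default_rng(1)
centres = [-2.0, 0.0, 2.0]               # one centre per hidden state

def emit_normal(s):
    """Optimal case: the fitted Normal matches the true emission."""
    return rng.normal(centres[s], 0.5)

def emit_gmm(s):
    """Assumptions violation: the true emission is a two-component mixture,
    but the Standard Approach still fits a single Normal."""
    offset = rng.choice([-1.0, 1.0])
    return rng.normal(centres[s] + offset, 0.3)
```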
-
Validity
[Plot: cumulative error vs. number of examples]
-
Cumulative Error
[Plot: cumulative error vs. number of examples, optimal case]
-
Cumulative Error
[Plots: cumulative error vs. number of examples, optimal case and assumptions violation]
-
Average Position

Optimal case:
Method               Average Position
Standard Approach    58
CP-HMM, ε = 0.01     917
CP-HMM, ε = 0.05     208
CP-HMM, ε = 0.1      70

Assumptions violation:
Method               Average Position
Standard Approach    294
CP-HMM, ε = 0.01     1067
CP-HMM, ε = 0.05     337
CP-HMM, ε = 0.1      146
-
Comparison

Standard Approach: assumes an emission PDF; if the wrong distribution is assumed, it can perform badly.
Confident Prediction for HMMs: only assumes exchangeability of emissions.

Standard Approach: needs k to be specified.
Confident Prediction for HMMs: validity guaranteed for the required accuracy.
-
Future Work
• Applications: speech recognition, cryptanalysis, biology
• Noisy data
• Other non-conformity measures
• Probabilities instead of predictions (e.g. Venn Machines)
-
Thank you
-
References
• [VGS05] V. Vovk, A. Gammerman, G. Shafer. “Algorithmic Learning in a Random World”. 2005.
• [V67] A.J. Viterbi. “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm”. 1967.
• [F73] G.D. Forney Jr. “The Viterbi algorithm”. 1973.
• [R89] L.R. Rabiner. “A tutorial on hidden Markov models and selected applications in speech recognition”. 1989.
-