
  • Hidden Markov Models with Confidence

    Giovanni Cherubin and Ilia Nouretdinov

    21 April 2016

    [email protected]

    @gchers

    mailto:[email protected]?subject=

  • Applications

    • Speech: speech recognition, speech synthesis

    • Biology: DNA analysis, gene prediction

    • Information Security: cryptanalysis, IDSs, password recovery, software piracy detection, risk evaluation, side-channel attacks

    • Other fields: handwriting recognition, time series analysis, activity recognition

  • Hidden Markov Models

    • Specific case of Bayesian network

    • “Hidden” Markov process St, and observable variable Ot

    [Diagram: Markov chain S1 → S2 → … → SL, each state St emitting an observation Ot]


  • Some notation

    Markov process St:

    P(St = st | St−1 = st−1, …, S1 = s1) = P(St = st | St−1 = st−1)

    Variable Ot, whose value depends on the current state st.

    [Diagram: S1 → S2 → … → SL, with emissions O1, O2, …, OL]

  • Some notation

    Discrete HMMs: Ot takes values in a finite set

    Continuous HMMs: Ot is continuous

  • Some notation

    Initial probabilities: πi

    Transition probabilities: A = {aij}

    Emission densities: B = {bi}

  • Some notation

    λ = (A, B, π) : HMM

    i = 1, 2, …, N : states

    x = (o1, o2, …, oL) : observation sequence

    h = (s1, s2, …, sL) : state sequence
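    As a concrete sketch of this notation (all numbers below are illustrative assumptions, not values from the talk), a small discrete HMM λ = (A, B, π) can be written directly as arrays:

    import numpy as np

    # Illustrative 2-state HMM λ = (A, B, π); all numbers are made up.
    N = 2                                  # number of hidden states
    pi = np.array([0.6, 0.4])              # initial probabilities π_i
    A = np.array([[0.7, 0.3],              # transition probabilities a_ij
                  [0.4, 0.6]])
    # Discrete emissions over 3 symbols: B[i, k] = P(O_t = k | S_t = i)
    B = np.array([[0.5, 0.4, 0.1],
                  [0.1, 0.3, 0.6]])

    def sample(L, rng=np.random.default_rng(0)):
        """Sample an (observation, state) sequence of length L from λ."""
        s = rng.choice(N, p=pi)
        h, x = [s], [rng.choice(B.shape[1], p=B[s])]
        for _ in range(L - 1):
            s = rng.choice(N, p=A[s])
            h.append(s)
            x.append(rng.choice(B.shape[1], p=B[s]))
        return x, h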

  • Setting

    Learning under fully observable data and decoding

    Training set: (x1, h1), (x2, h2), …, (xn, hn)

    Test object: xn+1

    Predict a set of candidates for hn+1 : Y

  • Standard Approach

    • Training using Maximum Likelihood

    • Decoding using List-Viterbi algorithm

  • Standard Approach

    Training: estimate the following from data using ML

    πi = P(S1 = i) for 1 ≤ i ≤ N

    aij = P(St = j | St−1 = i) for 1 ≤ i ≤ N, 1 ≤ j ≤ N

    bi :

    • Assume a distribution (e.g.: bi ~ N(μ, σ))

    • Estimate its parameters from the data observed when the hidden sequence is in the i-th state
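    A minimal sketch of this training step, assuming fully observed (x, h) pairs, states labelled 0 … N−1, and Normal emissions (the function name is mine):

    import numpy as np

    def train_ml(sequences, N):
        """ML estimates of π, A and per-state Normal emission parameters
        by counting over fully observed (x, h) pairs. Assumes every state
        occurs in the training data."""
        pi = np.zeros(N)
        A = np.zeros((N, N))
        per_state = [[] for _ in range(N)]   # observations grouped by state
        for x, h in sequences:
            pi[h[0]] += 1
            for t in range(1, len(h)):
                A[h[t - 1], h[t]] += 1
            for o, s in zip(x, h):
                per_state[s].append(o)
        pi /= pi.sum()
        A /= A.sum(axis=1, keepdims=True)
        # b_i ~ N(μ_i, σ_i), fitted per state
        mus = np.array([np.mean(obs) for obs in per_state])
        sigmas = np.array([np.std(obs) for obs in per_state])
        return pi, A, mus, sigmas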

  • Standard Approach

    Decoding: Viterbi algorithm

    [Trellis diagram: for each observed symbol (H, e, …, o), a column of candidate states (A, …, z)]

    Vt(i): probability of the most likely path ending in the i-th state at time t

    V1(i): probability of starting in the i-th state, given the observation
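    A compact sketch of Viterbi decoding under these definitions, in log space for numerical stability; it assumes the discrete λ = (A, B, π) arrays from the notation sketch above and is not the authors' code:

    import numpy as np

    def viterbi(x, pi, A, B):
        """Most likely state sequence for observations x under λ = (A, B, π).

        V[t, i] is the log-probability of the best path ending in state i
        at time t; back-pointers recover the argmax path."""
        L, N = len(x), len(pi)
        logA, logB = np.log(A), np.log(B)
        V = np.full((L, N), -np.inf)
        back = np.zeros((L, N), dtype=int)
        V[0] = np.log(pi) + logB[:, x[0]]          # V1(i)
        for t in range(1, L):
            scores = V[t - 1][:, None] + logA      # every i → j extension
            back[t] = scores.argmax(axis=0)
            V[t] = scores.max(axis=0) + logB[:, x[t]]
        # Trace the best path back from the final column
        path = [int(V[-1].argmax())]
        for t in range(L - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]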

  • Standard Approach

    Decoding: List-Viterbi algorithm

    [Trellis diagram, as above]

    Vt(i): probabilities of the k most likely paths ending in the i-th state at time t (vector)

    V1(i): probability of starting in the i-th state, given the observation
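    A sketch of the parallel List-Viterbi recursion, where each trellis cell keeps the k best partial paths instead of one (a hedged reconstruction, not the authors' implementation):

    import heapq
    import numpy as np

    def list_viterbi(x, pi, A, B, k):
        """k most likely state sequences under λ = (A, B, π).

        V[i] holds the k best (log-prob, path) pairs ending in state i;
        merging all one-step extensions keeps the lists exact."""
        L, N = len(x), len(pi)
        logA, logB = np.log(A), np.log(B)
        # V1(i): one candidate per state at t = 1
        V = [[(np.log(pi[i]) + logB[i, x[0]], [i])] for i in range(N)]
        for t in range(1, L):
            V_new = []
            for j in range(N):
                cands = [(lp + logA[i, j] + logB[j, x[t]], path + [j])
                         for i in range(N) for lp, path in V[i]]
                V_new.append(heapq.nlargest(k, cands, key=lambda c: c[0]))
            V = V_new
        best = heapq.nlargest(k, (c for Vi in V for c in Vi), key=lambda c: c[0])
        return [(path, lp) for lp, path in best]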

  • Confident Prediction for HMMs

    • Predict list of candidate state sequences using CP

    • Rank the list with respect to their likelihood

  • Conformal Prediction

    • Statistical framework to make predictions

    • Allows hedging predictions

    • Works for many underlying algorithms

  • Conformal Prediction

    “Non-conformity measure”: a function A that scores how unusual an example is with respect to a bag of examples, e.g.:

    A({H, ·, H, …}, ·) = 0.2

  • Conformal Prediction

    [Diagram: CP receives a “training set” {(·, a), (·, H), …}, a new object, a non-conformity measure A, and a significance level ε, and outputs a prediction set Y]

    The probability of error is smaller than or equal to ε
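    A minimal sketch of how such a predictor forms Y, assuming a generic nonconformity measure A(bag, example) (the function and parameter names are mine):

    def conformal_set(train, x_new, labels, A, eps):
        """Prediction set Y: keep every label whose conformal p-value
        exceeds the significance level eps.

        train: list of (object, label) pairs
        A:     nonconformity measure A(bag, example) -> score"""
        Y = []
        for y in labels:
            extended = train + [(x_new, y)]
            # Score each example against the bag of all the others
            scores = [A(extended[:i] + extended[i + 1:], ex)
                      for i, ex in enumerate(extended)]
            # p-value: fraction of scores at least as large as the new one
            p = sum(s >= scores[-1] for s in scores) / len(scores)
            if p > eps:
                Y.append(y)
        return Y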

  • Confident Prediction for HMMs

    [Diagram: one CP is run for each of the L positions of the sequence, producing per-position prediction sets, e.g. Y1 = {A, H}, Y2 = {a, e, i}, …, YL]

    Y = Y1 x Y2 x … x YL

    P(hn+1 ∉ Y) ≤ ε
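    A sketch of this construction, reusing conformal_set from the CP sketch above; the per-position training pairs and the ε/L split (so that a union bound gives overall error at most ε) are my assumptions, as the slide only states the final guarantee:

    from itertools import product

    def cp_hmm_predict(train, x_new, labels, A, eps):
        """Prediction set for the whole hidden sequence hn+1."""
        L = len(x_new)
        Ys = []
        for t in range(L):
            # Per-position training set: (observation at t, state at t)
            train_t = [(x[t], h[t]) for x, h in train]
            Ys.append(conformal_set(train_t, x_new[t], labels, A, eps / L))
        return list(product(*Ys))      # Y = Y1 x Y2 x … x YL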

  • Confident Prediction for HMMs

    Ranking

    Estimate A, π from data using Maximum Likelihood

    σ(h) = P(h | A, π) for h ∈ Y

    Sort Y with respect to these scores
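    A short sketch of this ranking step, with σ(h) computed in log space (not the authors' code):

    import numpy as np

    def rank_candidates(Y, pi, A):
        """Sort candidate state sequences by σ(h) = P(h | A, π),
        most likely first."""
        def log_score(h):
            lp = np.log(pi[h[0]])
            for prev, cur in zip(h, h[1:]):
                lp += np.log(A[prev, cur])
            return lp
        return sorted(Y, key=log_score, reverse=True)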

  • Experimental Setting

    Continuous HMM with three hidden states

    We consider the following cases:

    • Emissions have Normal distribution. The Standard Approach assumes the correct distribution.

    • Emissions have GMM distribution. The Standard Approach (erroneously) assumes Normal distribution.

    [Plots: Normal and GMM emission densities]
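    For concreteness, per-state emission samplers for the two cases might look as follows; the means, scales, and mixture weights are invented for illustration, since the talk does not give them:

    import numpy as np

    rng = np.random.default_rng(1)

    def normal_emission(state):
        """Case 1: one Normal per hidden state (illustrative parameters)."""
        return rng.normal(loc=[-2.0, 0.0, 2.0][state], scale=1.0)

    def gmm_emission(state):
        """Case 2: a two-component Gaussian mixture per hidden state."""
        comp = rng.choice(2, p=[0.5, 0.5])
        mu = [[-3.0, -1.0], [-0.5, 0.5], [1.0, 3.0]][state][comp]
        return rng.normal(loc=mu, scale=0.5)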

  • Validity

    [Plot: cumulative error vs. number of examples]

  • Cumulative Error

    [Plots: cumulative error vs. number of examples, for the optimal case and under assumptions violation]

  • Average Position

    Optimal case:

    Method               Average Position
    Standard Approach    58
    CP-HMM, ε = 0.01     917
    CP-HMM, ε = 0.05     208
    CP-HMM, ε = 0.1      70

    Assumptions violation:

    Method               Average Position
    Standard Approach    294
    CP-HMM, ε = 0.01     1067
    CP-HMM, ε = 0.05     337
    CP-HMM, ε = 0.1      146

  • Comparison

    Standard Approach:

    • Assumes an emission PDF; if the wrong distribution is assumed, it can perform badly

    • Needs k to be specified

    Confident Prediction for HMMs:

    • Only assumes exchangeability of emissions

    • Validity guaranteed for the required accuracy

  • Future Work

    • Applications: speech recognition, cryptanalysis, biology

    • Noisy data

    • Other nonconformity measures

    • Probabilities instead of predictions (e.g.: Venn machines)

  • Thank you

