Discriminative Training and
Machine Learning Approaches
Machine Learning Lab, Dept. of CSIE, NCKU
Chih-Pin Liao
Discriminative Training
Our Concerns
- Feature extraction and HMM modeling should be jointly performed; a common objective function should be considered.
- To alleviate model confusion and improve recognition performance, the HMM should be estimated with a discriminative criterion built from statistical theory.
- Model parameters should be calculated rapidly, without applying a descent algorithm.
Minimum Classification Error (MCE)
- MCE is a popular discriminative training algorithm developed for speech recognition and extended to other pattern recognition applications.
- Rather than maximizing the likelihood of the observed data, MCE aims to directly minimize classification errors.
- A gradient descent algorithm is used to estimate the HMM parameters.
MCE Training Procedure
Procedure for training discriminative models {λ_j} from observations X:
- Discriminant function: g(X, λ_j) = log P(X | λ_j)
- Anti-discriminant function: G(X, λ_j) = log [ (1/(C−1)) Σ_{c≠j} exp(η log P(X | λ_c)) ]^{1/η}
- Misclassification measure: d(X, λ_j) = −g(X, λ_j) + G(X, λ_j)
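The discriminant, anti-discriminant, and misclassification measures can be sketched in a few lines of Python. This is a minimal illustration: each class model is represented only by its log-likelihood log P(X | λ_c), and the smoothing exponent η is an assumption of this sketch.

```python
import math

def g(log_liks, j):
    """Discriminant function: log-likelihood of X under class model j."""
    return log_liks[j]

def G(log_liks, j, eta=1.0):
    """Anti-discriminant function: smoothed log-likelihood of competitors."""
    others = [ll for c, ll in enumerate(log_liks) if c != j]
    return (1.0 / eta) * math.log(
        sum(math.exp(eta * ll) for ll in others) / len(others))

def d(log_liks, j, eta=1.0):
    """Misclassification measure: positive when competitors beat class j."""
    return -g(log_liks, j) + G(log_liks, j, eta)
```

A negative d(X, λ_j) indicates the sample is correctly classified as class j; a positive value indicates a classification error.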
Expected Loss
- The loss function is calculated by mapping d(X, λ_j) into the range between zero and one through a sigmoid function:
  l(X, λ_j) = 1 / (1 + exp(−d(X, λ_j)))
- Minimize the expected loss, i.e., the expected classification error, to find the discriminative model:
  λ̂ = argmin_λ E_X[l(X, λ)] = argmin_λ E_X[ Σ_{j=1}^C l(X, λ_j) 1(X ∈ C_j) ]
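The loss mapping can be sketched as follows; this sketch assumes the plain sigmoid (no extra slope or offset parameters) and approximates the expectation by an average over training samples.

```python
import math

def sigmoid_loss(d_value):
    """Map the misclassification measure d(X, lambda_j) into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-d_value))

def empirical_expected_loss(d_values):
    """Approximate E_X[l(X, lambda)] by averaging over a sample batch."""
    return sum(sigmoid_loss(d) for d in d_values) / len(d_values)
```

A correctly classified sample (d < 0) contributes a loss near 0; a badly misclassified one (d >> 0) contributes a loss near 1, so the average approximates the classification error rate.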
Hypothesis Test
Likelihood Ratio Test
- A new training criterion is derived from hypothesis test theory: we test the null hypothesis H_0 against the alternative hypothesis H_1.
- The optimal solution is obtained by a likelihood ratio test, according to the Neyman-Pearson lemma:
  LR = P(X | H_0) / P(X | H_1)
- A higher likelihood ratio implies stronger confidence toward accepting the null hypothesis.
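As an illustration, the test reduces to comparing the log-likelihood ratio against a threshold. The two Gaussian hypotheses here are an assumption of this sketch, not from the slides.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def log_likelihood_ratio(x, h0=(0.0, 1.0), h1=(3.0, 1.0)):
    """log LR = log P(x|H0) - log P(x|H1); large values favor H0."""
    return gaussian_loglik(x, *h0) - gaussian_loglik(x, *h1)

def accept_h0(x, log_threshold=0.0):
    """Accept the null hypothesis when the log ratio clears the threshold."""
    return log_likelihood_ratio(x) >= log_threshold
```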
Hypotheses in HMM Training
- Null and alternative hypotheses:
  H_0: observations X are from the target HMM state j
  H_1: observations X are not from the target HMM state j
- We develop discriminative HMM parameters for the target state against the non-target states.
- The problem turns into verifying the goodness of the alignment of the data to the corresponding HMM states.
Maximum Confidence Hidden Markov Model
Maximum Confidence HMM
- The MCHMM is estimated by maximizing the log-likelihood ratio, or confidence measure:
  Λ_MC = argmax_Λ LLR(X | Λ) = argmax_Λ [ log P(X | Λ) − log P(X | Λ̄) ]
- The parameter set Λ = {ω_jk, μ_jk, Σ_jk, W} consists of the HMM parameters and the transformation matrix W.
Hybrid Parameter Estimation
- The expectation-maximization (EM) algorithm is applied to tackle the missing-data problem in maximum confidence estimation.
- E-step:
  Q(Λ | Λ') = E[ LLR(X, S | Λ) | X, Λ' ] = Σ_S P(S | X, Λ') LLR(X, S | Λ)
            = Σ_{t=1}^T Σ_{j=1}^C P(s_t = j | X, Λ') [ log P(x_t | λ_j) − log( (1/(C−1)) Σ_{c≠j} P(x_t | λ_c) ) ]
Expectation Function
With Gaussian mixture observation densities and linearly transformed features W x_t, the expectation function expands to

Q(Λ | Λ') = Σ_{t=1}^T Σ_{j=1}^C Σ_{k=1}^K γ_t(j, k) { [ log ω_jk + log |W| − (d/2) log 2π − (1/2) log |Σ_jk| − (1/2)(W x_t − μ_jk)^T Σ_jk^{-1} (W x_t − μ_jk) ]
  − (1/(C−1)) Σ_{c≠j} [ log ω_ck + log |W| − (d/2) log 2π − (1/2) log |Σ_ck| − (1/2)(W x_t − μ_ck)^T Σ_ck^{-1} (W x_t − μ_ck) ] }

which we denote Q_g({ω_jk, μ_jk, Σ_jk}, W | Λ').
MC Estimates of HMM Parameters
Setting the derivatives of Q to zero yields closed-form maximum confidence estimates. With the discriminative weight

ζ_t(j, k) = γ_t(j, k) − (1/(C−1)) Σ_{c≠j} γ_t(c, k),

the mixture weight and mean vector are re-estimated as

ω̂_jk = Σ_{t=1}^T ζ_t(j, k) / Σ_{t=1}^T Σ_{k'=1}^K ζ_t(j, k')

μ̂_jk = Σ_{t=1}^T ζ_t(j, k) W x_t / Σ_{t=1}^T ζ_t(j, k)
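The discriminative mean re-estimation weights each transformed frame W x_t by its target-state posterior minus the average posterior of the C−1 competing classes. A NumPy sketch (array names and shapes are illustrative assumptions):

```python
import numpy as np

def mc_mean(features, gamma_target, gamma_comp, W):
    """Maximum confidence mean estimate for one mixture component.
    features:     (T, d) raw observations x_t
    gamma_target: (T,)   posteriors gamma_t(j, k) of the target class
    gamma_comp:   (T, C-1) posteriors gamma_t(c, k) of competing classes
    W:            (d, d) feature transformation matrix"""
    # Discriminative weights: target minus averaged competitor posteriors.
    zeta = gamma_target - gamma_comp.mean(axis=1)
    x_tilde = features @ W.T                  # transformed features W x_t
    return (zeta[:, None] * x_tilde).sum(axis=0) / zeta.sum()
```

With all competitor posteriors set to zero and W = I, this reduces to the ordinary maximum likelihood weighted mean.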
MC Estimates of HMM Parameters (cont.)
The covariance matrix is re-estimated analogously, with target and competitor outer products:

Σ̂_jk = [ Σ_{t=1}^T γ_t(j, k) (W x_t − μ_jk)(W x_t − μ_jk)^T − (1/(C−1)) Σ_{t=1}^T Σ_{c≠j} γ_t(c, k) (W x_t − μ_ck)(W x_t − μ_ck)^T ] / Σ_{t=1}^T ζ_t(j, k)
MC Estimate of Transformation Matrix
There is no closed-form solution for W; it is updated iteratively by gradient ascent:

W^(i+1) = W^(i) + ε ∂Q_g(W)/∂W |_{W = W^(i)}

where the gradient accumulates over all states and mixture components,

∂Q_g(W)/∂W |_{W = W^(i)} = Σ_{j=1}^C Σ_{k=1}^K [ T ((W^(i))^T)^{-1} − Φ_jk^(i) W^(i) ]

with T the number of frames and Φ_jk^(i) the weighted second-order statistics of component (j, k) at iteration i.
Training Flow
1. Take the training features from face images; initialize W; apply uniform segmentation and estimate the initial HMM parameters.
2. Extract features from the observations with the estimated W.
3. Estimate the transformation matrix W with the GPD algorithm,
   W^(t+1) = W^(t) + ε ∂Q(W | Λ)/∂W,
   and repeat until W converges.
4. Transform the HMM parameters with W; run Viterbi decoding.
5. If the overall procedure has not converged, return to step 2; otherwise output the MC-based HMM parameters.
MC Classification Rule
Let Y denote an input test image. We apply the same criterion to identify the most likely category for Y:

c_MC = argmax_c LLR(Y | Λ_c)
Summary
- A new maximum confidence HMM framework was proposed.
- The hypothesis test principle was used to build the training criterion.
- Discriminative feature extraction and HMM modeling were performed under the same criterion.

Reference: Jen-Tzung Chien and Chih-Pin Liao, "Maximum Confidence Hidden Markov Modeling for Face Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 4, pp. 606-616, April 2008.
Machine Learning Approaches
Introduction
- Conditional random fields (CRFs):
  - relax the usual conditional independence assumption of the likelihood model;
  - enforce the homogeneity of the labeling variables conditioned on the observation.
- Due to the weak assumptions of the CRF model and its discriminative nature, it:
  - allows arbitrary relationships among the data;
  - may require fewer resources to train its parameters.
CRF models have shown better performance than hidden Markov models (HMMs) and maximum entropy Markov models (MEMMs) in:
- language and text processing problems
- object recognition problems
- image and video segmentation
- tracking problems in video sequences
Generative & Discriminative Model
Two Classes of Models
- Generative model (HMM): models the distribution of the states and observations through P(X | S) P(S).
- Direct model (MEMM and CRF): models the posterior probability P(S | X) directly.
- In either case, decoding selects Ŝ = argmax_S P(S | X).

[Figure: chain-structured graphical models of the HMM, MEMM, and CRF over states s_{t-1}, s_t, s_{t+1} and observations x_{t-1}, x_t, x_{t+1}]
Comparisons of the Two Kinds of Models
Generative model (HMM):
- Uses the Bayes rule approximation.
- Assumes that observations are independent; multiple overlapping features are not modeled.
- The model is evaluated through the recursive Viterbi algorithm:
  α_t(s) = [ Σ_{s'∈S} α_{t-1}(s') P(s | s') ] P(x_t | s)
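The recursion can be sketched as follows. Written with a sum it gives the forward algorithm for P(X); replacing the sum by a max gives Viterbi. The state and observation spaces here are illustrative.

```python
import numpy as np

def hmm_forward(pi, A, B, obs):
    """Total observation probability P(X) via the forward recursion
    alpha_t(s) = [sum_{s'} alpha_{t-1}(s') P(s|s')] P(x_t|s).
    pi:  (S,)   initial state probabilities
    A:   (S, S) transition matrix, A[s', s] = P(s | s')
    B:   (S, V) emission matrix, B[s, v] = P(v | s)
    obs: sequence of observation indices"""
    alpha = pi * B[:, obs[0]]
    for x in obs[1:]:
        alpha = (alpha @ A) * B[:, x]
    return alpha.sum()
```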
Direct model (MEMM and CRF):
- Models the posterior probability directly.
- Dependencies among observations are flexibly modeled.
- The model is evaluated through the recursive Viterbi algorithm:
  α_t(s) = Σ_{s'∈S} α_{t-1}(s') P(s | s', x_t)
Hidden Markov Model & Maximum Entropy Markov Model
HMM for Human Motion Recognition
An HMM is defined by:
- the transition probability p(s_t | s_{t-1})
- the observation probability p(x_t | s_t)
Maximum Entropy Markov Model
An MEMM is defined by p(s_t | s_{t-1}, x_t), which replaces both the transition and the observation probability of the HMM.
Maximum Entropy Criterion
- Definition of the feature functions:
  f_i(c_t, s_t) = 1 if b(c_t) = b̄ and s_t = s̄, and 0 otherwise,
  where c_t = {x_t, s_{t-1}} is the context at time t and b(·) is an attribute of the context.
- Constrained optimization problem: ∀ f_i: E[f_i] = Ẽ[f_i], where
  - empirical expectation: Ẽ[f_i] = (1/N) Σ_{j=1}^N f_i(c_j, s_j)
  - model expectation: E[f_i] = Σ_{c∈C, s∈V} p̃(c) p(s | c) f_i(c, s)
Solution of MEMM
- Lagrange multipliers are used for the constrained optimization:
  Λ(p, λ) = H(p(s | c)) + Σ_i λ_i (E[f_i] − Ẽ[f_i])
  where {λ_i} are the model parameters and the conditional entropy is
  H(p(s | c)) = − Σ_{c∈C, s∈V} p̃(c) p(s | c) log p(s | c)
- The solution is obtained as
  p(s | c) = (1/Z(c)) exp( Σ_i λ_i f_i(c, s) ), with Z(c) = Σ_{s'∈S} exp( Σ_i λ_i f_i(c, s') )
GIS Algorithm
The generalized iterative scaling (GIS) algorithm optimizes the maximum mutual information (MMI) criterion:
- Step 1: Calculate the empirical expectation Ẽ[f_i] = (1/N) Σ_{j=1}^N f_i(c_j, s_j).
- Step 2: Start from an initial value λ_i^(0) = 1.
- Step 3: Calculate the model expectation E^(current)[f_i] = (1/N) Σ_{c∈C, s∈V} p(s | c) f_i(c, s).
- Step 4: Update the model parameters:
  λ_i^(new) = λ_i^(current) + log( Ẽ[f_i] / E^(current)[f_i] )
- Repeat steps 3 and 4 until convergence.
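The GIS loop can be sketched on a toy problem. This sketch assumes binary feature functions and adds the usual slack feature so the per-sample feature sum is a constant C; as in standard GIS descriptions, the update is scaled by 1/C.

```python
import math

def gis(samples, feats, n_iter=200):
    """Generalized iterative scaling for p(s|c) = exp(sum_i l_i f_i(c,s)) / Z(c).
    samples: list of (context, label); feats: list of binary functions f(c, s).
    Assumes sum_i f_i(c, s) equals a constant C for every (c, s)."""
    labels = sorted({s for _, s in samples})
    C = sum(f(samples[0][0], samples[0][1]) for f in feats)
    lam = [0.0] * len(feats)
    # Step 1: empirical expectations from the training pairs.
    emp = [sum(f(c, s) for c, s in samples) / len(samples) for f in feats]
    for _ in range(n_iter):
        # Step 3: model expectations under the current parameters.
        model = [0.0] * len(feats)
        for c, _ in samples:
            scores = [math.exp(sum(l * f(c, s) for l, f in zip(lam, feats)))
                      for s in labels]
            z = sum(scores)
            for i, f in enumerate(feats):
                model[i] += sum(w / z * f(c, s)
                                for w, s in zip(scores, labels)) / len(samples)
        # Step 4: multiplicative update in the log domain.
        lam = [l + math.log(e / m) / C if e > 0 and m > 0 else l
               for l, e, m in zip(lam, emp, model)]
    return lam
```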
Conditional Random Field
Conditional Random Field
Definition:
- Let G = (V, E) be a graph such that S = (S_v)_{v∈V}.
- When conditioned on X, the variables S_v obey the Markov property:
  p(S_v | X, S_w, w ≠ v) = p(S_v | X, S_w, w ~ v)
  where w ~ v means w and v are neighbors in G.
- Then (X, S) is a conditional random field.
CRF Model Parameters
- The undirected graphical structure can be used to factorize p(S | X) into a normalized product of potential functions:
  p(S | X) ∝ exp( Σ_{e∈E, i} λ_i f_i(e, S|_e, X) + Σ_{v∈V, j} μ_j g_j(v, S|_v, X) )
- Consider the graph as a linear-chain structure.
- Model parameter set: {λ_1, λ_2, ...; μ_1, μ_2, ...}
- Feature function set: {f_1, f_2, ...; g_1, g_2, ...}
CRF Parameter Estimation
- We can rewrite and maximize the posterior probability
  p(S | X) = (1/Z(X)) exp( Σ_k θ_k F_k(S, X) )
  where θ = {θ_1, θ_2, ...} = {λ_1, λ_2, ...; μ_1, μ_2, ...} and {F_1, F_2, ...} = {f_1, f_2, ...; g_1, g_2, ...}.
- The log posterior probability over the training pairs (S^(j), X^(j)) is
  L(θ) = Σ_j [ Σ_k θ_k F_k(S^(j), X^(j)) − log Z(X^(j)) ]
Parameter Updating by the GIS Algorithm
- Differentiating the log posterior probability with respect to parameter θ_j:
  ∂L(θ)/∂θ_j = E_{p̃(S,X)}[F_j(S, X)] − E_{p(S|X)}[F_j(S, X)]
- Setting this derivative to zero yields the constraint of the maximum entropy model.
- This estimation has no closed-form solution, so we can use the GIS algorithm.
CRF vs. MEMM

Differences:
- Objective function: the CRF maximizes the posterior probability p(S | X) with a Gibbs distribution; the MEMM maximizes entropy under constraints, modeling p(s_t | s_{t-1}, x_t).
- Complexity of the normalization term: full computation is O(|S|^N); dynamic programming reduces it to O(|S|^2 N); an N-best approximation costs O(k), and keeping only the top hypothesis costs O(1).
- Inference in the model.

Similarities:
- Feature functions defined over state-observation and state-state pairs.
- Parameters are the weights of the feature functions.
- Both use a Gibbs distribution.
Summary and Future Works
- We construct a complex CRF with cycles for better modeling of contextual dependency; a graphical model algorithm is applied.
- In the future, a variational inference algorithm will be developed to improve the calculation of the conditional probability.
- The posterior probability can then be calculated directly by an approximating approach.

Reference: Chih-Pin Liao and Jen-Tzung Chien, "Graphical Modeling of Conditional Random Fields for Human Motion Recognition," in Proc. ICASSP 2008, pp. 1969-1972.
Thanks for your attention and discussion.