Context Dependent Classification

DESCRIPTION

Context-dependent classification. Remember: Bayes rule. Here, the class to which a feature vector belongs depends on: its own value, the values of the other features, and an existing relation among the various classes.

TRANSCRIPT
Slide 1
CONTEXT DEPENDENT CLASSIFICATION
Remember: Bayes rule

Here: the class to which a feature vector belongs depends on:
- its own value
- the values of the other features
- an existing relation among the various classes

$$x \rightarrow \omega_i: \quad P(\omega_i \mid x) > P(\omega_j \mid x), \quad \forall j \neq i$$
Slide 2
This interrelation demands the classification to be performed simultaneously for all available feature vectors.

Thus, we will assume that the training vectors $x_1, x_2, \ldots, x_N$ occur in sequence, one after the other, and we will refer to them as observations.
Slide 3
The Context Dependent Bayesian Classifier

Let $X: \{x_1, x_2, \ldots, x_N\}$ be the sequence of observations.
Let $\omega_i, \; i = 1, 2, \ldots, M$ be the available classes.
Let $\Omega_i: \omega_{i_1}, \omega_{i_2}, \ldots, \omega_{i_N}$ be a sequence of classes; there are $M^N$ of those.

Thus, the Bayesian rule can equivalently be stated as
$$X \rightarrow \Omega_i: \quad P(\Omega_i \mid X) > P(\Omega_j \mid X), \quad \forall j \neq i, \; i, j = 1, 2, \ldots, M^N$$

Markov chain models (for class dependence):
$$P(\omega_{i_k} \mid \omega_{i_{k-1}}, \omega_{i_{k-2}}, \ldots, \omega_{i_1}) = P(\omega_{i_k} \mid \omega_{i_{k-1}})$$
Slide 4
Now remember:
$$P(\Omega_i) = P(\omega_{i_1}, \omega_{i_2}, \ldots, \omega_{i_N}) = P(\omega_{i_N} \mid \omega_{i_{N-1}}, \ldots, \omega_{i_1}) \, P(\omega_{i_{N-1}}, \ldots, \omega_{i_1})$$
or
$$P(\Omega_i) = P(\omega_{i_1}) \prod_{k=2}^{N} P(\omega_{i_k} \mid \omega_{i_{k-1}})$$

Assume the observations $x_i$ to be statistically mutually independent, and the pdf in one class independent of the others. Then
$$p(X \mid \Omega_i) = \prod_{k=1}^{N} p(x_k \mid \omega_{i_k})$$
Slide 5
From the above, the Bayes rule is readily seen to be equivalent to
$$P(\Omega_i \mid X) > P(\Omega_j \mid X) \;\Leftrightarrow\; P(\Omega_i) \, p(X \mid \Omega_i) > P(\Omega_j) \, p(X \mid \Omega_j)$$
that is, it rests on
$$p(X \mid \Omega_i) \, P(\Omega_i) = P(\omega_{i_1}) \, p(x_1 \mid \omega_{i_1}) \prod_{k=2}^{N} P(\omega_{i_k} \mid \omega_{i_{k-1}}) \, p(x_k \mid \omega_{i_k})$$

To find the above maximum by brute force, we need $O(NM^N)$ operations!
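To make that cost concrete, here is a minimal brute-force sketch that scores every one of the $M^N$ class sequences with the product above (log-domain for numerical safety; the toy interface and all names are assumptions, not part of the slides):

```python
import itertools
import numpy as np

def brute_force_classify(x, P_init, P_trans, emission_pdfs):
    """Score every sequence Omega = (i_1, ..., i_N) by log P(Omega)p(X|Omega)
    and return the best one: O(N M^N) operations, as stated above."""
    M, N = len(P_init), len(x)
    best_score, best_seq = -np.inf, None
    for seq in itertools.product(range(M), repeat=N):   # all M^N sequences
        score = np.log(P_init[seq[0]] * emission_pdfs[seq[0]](x[0]))
        for k in range(1, N):
            score += np.log(P_trans[seq[k - 1]][seq[k]]
                            * emission_pdfs[seq[k]](x[k]))
        if score > best_score:
            best_score, best_seq = score, seq
    return best_seq, best_score
```

The Viterbi algorithm, introduced next, reduces this to $O(NM^2)$ by dynamic programming.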
Slide 6
The Viterbi Algorithm
Slide 7
Thus, each Ω_i corresponds to one path through the trellis diagram. One of them is the optimum (e.g., the black one). The classes along the optimal path determine the classes to which the successive observations are assigned.

To each transition corresponds a cost. For our case:
$$\hat d(\omega_{i_k}, \omega_{i_{k-1}}) = P(\omega_{i_k} \mid \omega_{i_{k-1}}) \, p(x_k \mid \omega_{i_k})$$
$$\hat d(\omega_{i_1}, \omega_{i_0}) \equiv P(\omega_{i_1}) \, p(x_1 \mid \omega_{i_1})$$
$$\hat D = \prod_{k=1}^{N} \hat d(\omega_{i_k}, \omega_{i_{k-1}}) = p(X \mid \Omega_i) \, P(\Omega_i)$$
Slide 8
Equivalently,
$$D = \ln \hat D = \sum_{k=1}^{N} d(\omega_{i_k}, \omega_{i_{k-1}})$$
where
$$d(\omega_{i_k}, \omega_{i_{k-1}}) = \ln \hat d(\omega_{i_k}, \omega_{i_{k-1}})$$

Define the cost up to a node $k$:
$$D(\omega_{i_k}) = \sum_{r=1}^{k} d(\omega_{i_r}, \omega_{i_{r-1}})$$
Slide 9
Bellman's principle now states:
$$D_{\max}(\omega_{i_k}) = \max_{i_{k-1}} \left[ D_{\max}(\omega_{i_{k-1}}) + d(\omega_{i_k}, \omega_{i_{k-1}}) \right], \quad i_{k-1} = 1, 2, \ldots, M$$
with $D_{\max}(\omega_{i_0}) = 0$.

The optimal path terminates at $\omega_{i_N^*}$:
$$i_N^* = \arg\max_{i_N} D_{\max}(\omega_{i_N})$$

Complexity: $O(NM^2)$
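A compact sketch of this dynamic-programming recursion (log-domain throughout; the interface and names are assumptions):

```python
import numpy as np

def viterbi(log_P_init, log_P_trans, log_emis):
    """Bellman recursion over the trellis.
    log_P_init: (M,) log P(omega_i); log_P_trans: (M, M) with [i, j] = log P(j|i);
    log_emis: (N, M) with [k, j] = log p(x_k|omega_j). Returns the best path."""
    N, M = log_emis.shape
    D = log_P_init + log_emis[0]             # D_max at the first node
    back = np.zeros((N, M), dtype=int)       # argmax bookkeeping for backtracking
    for k in range(1, N):
        scores = D[:, None] + log_P_trans    # D_max(i_{k-1}) + transition term
        back[k] = np.argmax(scores, axis=0)
        D = scores[back[k], np.arange(M)] + log_emis[k]
    path = [int(np.argmax(D))]               # terminate at i_N^*
    for k in range(N - 1, 0, -1):
        path.append(int(back[k][path[-1]]))
    return path[::-1], float(np.max(D))
```

With M states and N observations, the loop does $O(NM^2)$ work, matching the complexity above.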
Slide 10
Channel Equalization

The problem:
$$x_k = f(I_k, I_{k-1}, \ldots, I_{k-n+1}) + \eta_k$$
$$\boldsymbol{x}_k = [x_k, x_{k-1}, \ldots, x_{k-l+1}]^T$$
The task is to recover $\hat I_{k-r}$ (or $\hat I_k$):
$$\boldsymbol{x}_k \rightarrow \text{equalizer} \rightarrow \hat I_{k-r}$$
Slide 11
Example:
$$x_k = 0.5 I_k + I_{k-1} + \eta_k$$
$$l = 2, \quad \boldsymbol{x}_k = [x_k, x_{k-1}]^T$$

In $\boldsymbol{x}_k$ three input symbols are involved: $I_k$, $I_{k-1}$, $I_{k-2}$.
Slide 12
| $I_k$ | $I_{k-1}$ | $I_{k-2}$ | $x_k$ | $x_{k-1}$ | state |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | ω1 |
| 0 | 0 | 1 | 0 | 1 | ω2 |
| 0 | 1 | 0 | 1 | 0.5 | ω3 |
| 0 | 1 | 1 | 1 | 1.5 | ω4 |
| 1 | 0 | 0 | 0.5 | 0 | ω5 |
| 1 | 0 | 1 | 0.5 | 1 | ω6 |
| 1 | 1 | 0 | 1.5 | 0.5 | ω7 |
| 1 | 1 | 1 | 1.5 | 1.5 | ω8 |
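The noise-free entries of this table follow directly from the channel model; a quick sketch to regenerate them (names are assumptions):

```python
from itertools import product

# Noise-free outputs of x_k = 0.5*I_k + I_{k-1}; each triple
# (I_k, I_{k-1}, I_{k-2}) defines one state omega_1 .. omega_8.
for idx, (ik, ik1, ik2) in enumerate(product((0, 1), repeat=3), start=1):
    xk = 0.5 * ik + ik1
    xk_prev = 0.5 * ik1 + ik2
    print(f"omega_{idx}: I=({ik},{ik1},{ik2})  x_k={xk}  x_(k-1)={xk_prev}")
```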
Slide 13
Not all transitions are allowed. If
$$(I_k, I_{k-1}, I_{k-2}) = (0, 0, 1)$$
then
$$(I_{k+1}, I_k, I_{k-1}) = (0, 0, 0) \text{ or } (1, 0, 0)$$
Then
$$P(\omega_i \mid \omega_2) = \begin{cases} \dfrac{1}{2}, & i = 1, 5 \\ 0, & \text{otherwise} \end{cases}$$
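The full set of allowed transitions can be enumerated the same way; a minimal sketch (assuming equiprobable input bits, names hypothetical):

```python
from itertools import product

states = list(product((0, 1), repeat=3))    # (I_k, I_{k-1}, I_{k-2}) triples

# omega_j -> omega_i is allowed iff the new triple is the old one shifted
# by one symbol; each allowed transition then has probability 1/2.
P_trans = {}
for j, (ik, ik1, ik2) in enumerate(states, start=1):
    for i, (nk, nk1, nk2) in enumerate(states, start=1):
        allowed = (nk1, nk2) == (ik, ik1)   # next state remembers (I_k, I_{k-1})
        P_trans[(i, j)] = 0.5 if allowed else 0.0

print([i for i in range(1, 9) if P_trans[(i, 2)] > 0])   # -> [1, 5] for omega_2
```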
Slide 14
• In this context, ωi are related to states. Given the current state and the transmitted bit, Ik, we determine the next state. The probabilities P(ωi|ωj) define the state dependence model.
The transition cost, for all allowable transitions:
$$d(\omega_{i_k}, \omega_{i_{k-1}}) = d(\boldsymbol{x}_k, \omega_{i_k}) \equiv (\boldsymbol{x}_k - \boldsymbol{\mu}_{i_k})^T (\boldsymbol{x}_k - \boldsymbol{\mu}_{i_k})$$
where $\boldsymbol{\mu}_{i_k}$ is the centroid (the noise-free observation vector) associated with state $\omega_{i_k}$.
Slide 15
Assume:
- Noise white and Gaussian
- A channel impulse response estimate $\hat f$ to be available

Then
$$x_k = \hat f(I_k, \ldots, I_{k-n+1}) + \eta_k$$
$$d(\omega_{i_k}, \omega_{i_{k-1}}) = \ln p(x_k \mid \omega_{i_k}) = \ln p_\eta(\eta_k) \propto -\left( x_k - \hat f(I_k, \ldots, I_{k-n+1}) \right)^2$$

The states are determined by the values of the binary variables $I_{k-1}, \ldots, I_{k-n+1}$. For $n = 3$, there will be 4 states.
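Putting the pieces together, a toy maximum-likelihood sequence equalizer under these assumptions, reusing the viterbi sketch from above (a simplified scalar-observation version of the example; all names and parameter values are assumptions):

```python
import numpy as np

# Toy setup for x_k = 0.5*I_k + I_{k-1} + eta_k with white Gaussian noise.
rng = np.random.default_rng(0)
N, sigma = 200, 0.1
I = rng.integers(0, 2, size=N)                      # transmitted bits
x = 0.5 * I + np.concatenate(([0], I[:-1])) + sigma * rng.normal(size=N)

states = [(a, b) for a in (0, 1) for b in (0, 1)]   # (I_k, I_{k-1}) pairs
M = len(states)
# Gaussian noise => log-emission is (up to a constant) minus the squared error
log_emis = np.array([[-(x[k] - (0.5 * a + b)) ** 2 for (a, b) in states]
                     for k in range(N)])
# A transition is allowed only if I_k carries over into the next state's I_{k-1}
log_trans = np.full((M, M), -np.inf)
for i, (a, _) in enumerate(states):
    for j, (_, b2) in enumerate(states):
        if b2 == a:
            log_trans[i, j] = np.log(0.5)           # equiprobable input bits
log_init = np.full(M, -np.log(M))

path, _ = viterbi(log_init, log_trans, log_emis)
I_hat = np.array([states[s][0] for s in path])
print("bit errors:", int((I_hat != I).sum()))
```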
Slide 16
Hidden Markov Models
In the channel equalization problem, the states are observable and can be "learned" during the training period.

Now we shall assume that states are not observable and can only be inferred from the training data.

Applications:
- Speech and Music Recognition
- OCR
- Blind Equalization
- Bioinformatics
Slide 17
An HMM is a stochastic finite state automaton that generates the observation sequence x1, x2, ..., xN.

We assume that the observation sequence is produced as a result of successive transitions between states: upon arrival at a state, an observation is emitted.
Slide 18
This type of modeling is used for nonstationary stochastic processes that undergo distinct transitions among a set of different stationary processes.
Slide 19
Examples of HMM:
- The single coin case: Assume a coin that is tossed behind a curtain. All that is available to us is the outcome, i.e., H or T. Assume the two states to be S = 1 for H and S = 2 for T. This is also an example of a random experiment with observable states. The model is characterized by a single parameter, e.g., P(H). Note that
P(1|1) = P(H)
P(2|1) = P(T) = 1 − P(H)
Slide 20
- The two-coins case: For this case, we observe a sequence of H or T. However, we have no way of knowing which coin was tossed. Identify one state for each coin. This is an example where the states are not observable. H or T can be emitted from either state. The model depends on four parameters: P1(H), P2(H), P(1|1), P(2|2).
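Since the states here are hidden, a short simulation may help build intuition; a minimal sketch of the two-coin model (all parameter values are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
P_H = [0.9, 0.2]        # P1(H), P2(H): heads probability of each coin
P_stay = [0.7, 0.6]     # P(1|1), P(2|2): probability of keeping the same coin

state, observations = 0, []
for _ in range(10):
    observations.append("H" if rng.random() < P_H[state] else "T")
    if rng.random() >= P_stay[state]:   # otherwise switch to the other coin
        state = 1 - state
print("".join(observations))  # only H/T is visible; the coin used is hidden
```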
Slide 21
- The three-coins case example is shown below.
- Note that in all previous examples, specifying the model is equivalent to knowing:
  - The probability of each observation (H, T) being emitted from each state.
  - The transition probabilities among states, P(i|j).
Slide 22
A general HMM model is characterized by the following set of parameters:
- $K$, the number of states
- $p(x \mid i), \; i = 1, 2, \ldots, K$
- $P(i \mid j), \; i, j = 1, 2, \ldots, K$
- $P(i), \; i = 1, 2, \ldots, K$, the initial state probabilities $P(\cdot)$
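For the sketches that follow, it is convenient to fix a minimal container for this parameter set (a convention of these examples, not of the slides; note that A[i, j] stores P(j | i)):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class HMM:
    """S = {P(i|j), P(x|i), P(i)} for K states and discrete x in {0..r-1}."""
    A: np.ndarray   # (K, K) transition matrix, A[i, j] = P(j | i)
    B: np.ndarray   # (K, r) emission matrix,   B[i, x] = P(x | i)
    pi: np.ndarray  # (K,)   initial state probabilities P(i)
```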
Slide 23
That is:
$$S = \{ P(i \mid j), \; p(x \mid i), \; P(i), \; K \}$$

What is the problem in Pattern Recognition?
- Given M reference patterns, each described by an HMM, find the parameters S for each of them (training).
- Given an unknown pattern, find which one of the M known patterns it matches best (recognition).
Slide 24
Recognition: any path method
- Assume the M models to be known (M classes).
- A sequence of observations, X, is given.
- Assume observations to be emissions upon the arrival at successive states.
- Decide in favor of the model S* (from the M available) according to the Bayes rule
$$S^* = \arg\max_S P(S \mid X)$$
or, for equiprobable patterns,
$$S^* = \arg\max_S p(X \mid S)$$
Slide 25
For each model S there is more than one possible set of successive state transitions Ω_i, each with probability P(Ω_i|S). Thus:
$$p(X \mid S) = \sum_i p(X, \Omega_i \mid S) = \sum_i p(X \mid \Omega_i, S) \, P(\Omega_i \mid S)$$

For the efficient computation of the above, DEFINE
$$\alpha(i_k) \equiv p(x_1, \ldots, x_k, i_k \mid S)$$
which obeys the recursion
$$\alpha(i_{k+1}) = \sum_{i_k} \underbrace{\alpha(i_k)}_{\text{history}} \; \underbrace{P(i_{k+1} \mid i_k) \, p(x_{k+1} \mid i_{k+1})}_{\text{local activity}}$$
Slide 26
Observe that
$$P(X \mid S) = \sum_{i_N = 1}^{K} \alpha(i_N)$$
Compute this for each model S.
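A direct, unscaled sketch of this forward recursion, using the HMM container from above (function names are assumptions; production code would rescale alpha at each step to avoid underflow):

```python
import numpy as np

def forward(hmm: HMM, x: np.ndarray) -> np.ndarray:
    """alpha[k, i] = p(x[:k+1], state i at time k | S), unscaled."""
    N, K = len(x), len(hmm.pi)
    alpha = np.zeros((N, K))
    alpha[0] = hmm.pi * hmm.B[:, x[0]]              # initialization
    for k in range(1, N):
        # history (alpha) times local activity (transition * emission);
        # (alpha @ A)[j] = sum_i alpha[i] P(j|i) since A[i, j] = P(j|i)
        alpha[k] = (alpha[k - 1] @ hmm.A) * hmm.B[:, x[k]]
    return alpha

def likelihood(hmm: HMM, x: np.ndarray) -> float:
    """P(X|S) = sum over terminal states of alpha(i_N)."""
    return float(forward(hmm, x)[-1].sum())
```

likelihood() is exactly the any-path score used for recognition on the previous slides.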
Slide 27
Some more quantities:
$$\beta(i_k) \equiv p(x_{k+1}, x_{k+2}, \ldots, x_N \mid i_k, S)$$
with the backward recursion
$$\beta(i_k) = \sum_{i_{k+1}} \beta(i_{k+1}) \, P(i_{k+1} \mid i_k) \, p(x_{k+1} \mid i_{k+1})$$
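The matching backward recursion, under the same conventions and caveats as the forward sketch (beta at the final step is set to one, a standard convention the slides do not spell out):

```python
import numpy as np

def backward(hmm: HMM, x: np.ndarray) -> np.ndarray:
    """beta[k, i] = p(x[k+1:] | state i at time k, S), unscaled."""
    N, K = len(x), len(hmm.pi)
    beta = np.ones((N, K))                  # beta(i_N) = 1 by convention
    for k in range(N - 2, -1, -1):
        # beta[k][i] = sum_j P(j|i) P(x_{k+1}|j) beta[k+1][j]
        beta[k] = hmm.A @ (hmm.B[:, x[k + 1]] * beta[k + 1])
    return beta
```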
Slide 28
Training
- The philosophy: given a training set X, known to belong to the specific model, estimate the unknown parameters of S so that the output of the model, e.g.
$$p(X \mid S) = \sum_{i_N = 1}^{K} \alpha(i_N)$$
is maximized.

This is an ML estimation problem with missing data.
Slide 29
Assumption: the data x are discrete:
$$x \in \{1, 2, \ldots, r\}, \quad p(x \mid i) \rightarrow P(x \mid i)$$

Definitions:
$$\xi_k(i, j) \equiv \frac{\alpha(i_k = i) \, P(j \mid i) \, P(x_{k+1} \mid j) \, \beta(i_{k+1} = j)}{P(X \mid S)}$$
$$\gamma_k(i) \equiv \frac{\alpha(i_k = i) \, \beta(i_k = i)}{P(X \mid S)}$$
Slide 30
The Algorithm:
- Initial conditions for all the unknown parameters. Compute P(X|S).
- Step 1: From the current estimates of the model parameters, reestimate the new model $\bar S$ from
$$\bar P(j \mid i) = \frac{\sum_{k=1}^{N-1} \xi_k(i, j)}{\sum_{k=1}^{N-1} \gamma_k(i)} \quad \left( \frac{\#\text{ of transitions from } i \text{ to } j}{\#\text{ of transitions from } i} \right)$$
$$\bar P(\bar r \mid i) = \frac{\sum_{k=1,\, x_k = \bar r}^{N} \gamma_k(i)}{\sum_{k=1}^{N} \gamma_k(i)} \quad \left( \frac{\#\text{ of times of being at state } i \text{ with } x = \bar r}{\#\text{ of times of being at state } i} \right)$$
$$\bar P(i) = \gamma_1(i)$$
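Under these formulas, one re-estimation sweep might look like the following sketch, built on the forward/backward sketches above (unscaled, single training sequence; all names are assumptions):

```python
import numpy as np

def baum_welch_step(hmm: HMM, x: np.ndarray) -> HMM:
    """One EM re-estimation of S = (A, B, pi) from one observation sequence."""
    N, K = len(x), len(hmm.pi)
    alpha, beta = forward(hmm, x), backward(hmm, x)
    PX = alpha[-1].sum()                              # P(X|S)
    gamma = alpha * beta / PX                         # gamma[k, i]
    # xi[k, i, j] = alpha_k(i) P(j|i) P(x_{k+1}|j) beta_{k+1}(j) / P(X|S)
    xi = (alpha[:-1, :, None] * hmm.A[None, :, :]
          * (hmm.B[:, x[1:]].T * beta[1:])[:, None, :]) / PX
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(hmm.B)
    for r in range(hmm.B.shape[1]):                   # relative emission counts
        B_new[:, r] = gamma[x == r].sum(axis=0) / gamma.sum(axis=0)
    return HMM(A=A_new, B=B_new, pi=gamma[0])         # pi_new(i) = gamma_1(i)
```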
Slide 31
- Step 3: Compute $P(X \mid \bar S)$. If $P(X \mid \bar S) - P(X \mid S) > \epsilon$, set $S = \bar S$ and go to step 2; otherwise stop.
- Remarks:
  - Each iteration improves the model: $P(X \mid \bar S) \ge P(X \mid S)$.
  - The algorithm converges to a maximum (local or global).
  - The algorithm is an implementation of the EM algorithm.