Pattern Recognition: Statistical and Neural
1
Pattern Recognition: Statistical and Neural
Lonnie C. Ludeman
Lecture 10
Sept 28, 2005
Nanjing University of Science & Technology
2
Review 3: MAP, MPE, Bayes, Neyman-Pearson Classification Rules

Likelihood ratio test: decide C1 if l(x) > N and decide C2 if l(x) < N, where l(x) is the likelihood ratio and N is the threshold.

N_MAP = P(C2) / P(C1)

N_MPE = P(C2) / P(C1)

N_BAYES = [ (C22 - C12) P(C2) ] / [ (C11 - C21) P(C1) ]

N_NP: chosen so that the false-alarm constraint ∫_{R1(N_NP)} p(x | C2) dx = α0 is met
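As a quick numeric check (with assumed priors and costs, not values from the lecture): take P(C1) = 0.7, P(C2) = 0.3, and costs C11 = C22 = 0, C12 = 1, C21 = 2. Then N_MAP = N_MPE = 0.3/0.7 ≈ 0.43, while N_BAYES = [(0 - 1)(0.3)] / [(0 - 2)(0.7)] = 0.3/1.4 ≈ 0.21; the higher cost C21 of missing class C1 lowers the threshold, so C1 is decided more readily.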
3
Lecture 10 Topics
1. Gaussian Random Variables and Vectors
2. General Gaussian Problem: 2-Class Case
Special Cases: Quadratic and Linear Classifiers
3. Mahalanobis Distance
4. General Gaussian: M-Class Case
Special Cases: Quadratic and Linear Classifiers
4
Gaussian (Normal) Random Variable: X ~ N(m, σ²)

X is a Gaussian (Normal) random variable if its probability density function p_X(x) is given by

p_X(x) = (1 / (√(2π) σ)) exp{ -(x - m)² / (2σ²) }

where m = mean value and σ² = variance.
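As a small numeric illustration (not from the lecture; the helper name gaussian_pdf is our own), the density above can be evaluated and sanity-checked in Python:

```python
import numpy as np

def gaussian_pdf(x, m, sigma):
    # p_X(x) = (1 / (sqrt(2*pi) * sigma)) * exp(-(x - m)^2 / (2 * sigma^2))
    return np.exp(-(x - m) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

# Sanity check: the density should integrate to ~1 over a wide interval.
xs = np.linspace(-10.0, 10.0, 20001)
dx = xs[1] - xs[0]
print((gaussian_pdf(xs, m=1.0, sigma=2.0) * dx).sum())  # ≈ 1.0
```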
5
General Gaussian Density: x ~ N(M, K)
The random vector X is normally (Gaussian) distributed if its density function p(x) is given by

p(x) = (1 / ((2π)^(N/2) |K|^(1/2))) exp{ -½ (x - M)^T K^(-1) (x - M) }

where

x = [x1, x2, … , xN]^T   Pattern Vector
M = [m1, m2, … , mN]^T   Mean Vector

      | k11 k12 … k1N |
K  =  | k21 k22 … k2N |   Covariance Matrix
      | …             |
      | kN1 kN2 … kNN |
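A minimal sketch of this density in numpy (our own illustration; it assumes K is positive definite so that K^(-1) and |K| exist):

```python
import numpy as np

def gaussian_density(x, M, K):
    # p(x) = (2*pi)^(-N/2) * |K|^(-1/2) * exp(-0.5 * (x - M)^T K^{-1} (x - M))
    N = len(M)
    d = x - M
    quad = d @ np.linalg.solve(K, d)  # (x - M)^T K^{-1} (x - M)
    norm = (2 * np.pi) ** (N / 2) * np.sqrt(np.linalg.det(K))
    return np.exp(-0.5 * quad) / norm

M = np.array([0.0, 1.0])
K = np.array([[2.0, 0.5],
              [0.5, 1.0]])
print(gaussian_density(np.array([0.5, 0.5]), M, K))
```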
6
Properties of the Covariance Matrix

For j, k = 1, 2, … , N:

k_jk = E[ (x_j - m_j)(x_k - m_k) ]   Covariance
k_jj = E[ (x_j - m_j)² ]             Component Variance

K is a positive definite matrix; equivalently, K has positive eigenvalues.
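These properties can be checked numerically (an illustrative sketch with made-up parameters, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.0, 1.0, -1.0],
                            cov=[[2.0, 0.3, 0.0],
                                 [0.3, 1.0, 0.2],
                                 [0.0, 0.2, 0.5]],
                            size=10_000)

K_hat = np.cov(X, rowvar=False)      # k_jk = E[(x_j - m_j)(x_k - m_k)]
eigvals = np.linalg.eigvalsh(K_hat)  # symmetric matrix -> real eigenvalues
print(eigvals)                       # all positive: K_hat is positive definite
```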
7
General Gaussian Problem: 2-Class Case
The random vector X is normally (Gaussian) distributed under both classes
C1: X ~ N(M1, K1)
C2: X ~ N(M2, K2)

p(x | C1) = (1 / ((2π)^(N/2) |K1|^(1/2))) exp{ -½ (x - M1)^T K1^(-1) (x - M1) }

p(x | C2) = (1 / ((2π)^(N/2) |K2|^(1/2))) exp{ -½ (x - M2)^T K2^(-1) (x - M2) }
8
General Gaussian Framework: 2-Class Case

A. Assumptions:

C1: X ~ N(M1, K1), with prior P(C1)
C2: X ~ N(M2, K2), with prior P(C2)

B. Performance Measure: MAP, P(error), Risk, P_D, Minimax

C. Optimum Classification: minimize or maximize the chosen measure
9
Optimum Decision Rule: 2-Class Gaussian
Derivation of the optimum decision rule, which is a likelihood ratio test with threshold determined by the type of performance measure:

p(x | C1) / p(x | C2)
  = [ |K2|^(1/2) exp{ -½ (x - M1)^T K1^(-1) (x - M1) } ] / [ |K1|^(1/2) exp{ -½ (x - M2)^T K2^(-1) (x - M2) } ]

Decide C1 if this ratio is > T, and decide C2 if it is < T.
10
Optimum Decision Rule: 2-Class Gaussian

Taking logarithms and multiplying by 2, the test becomes:

decide C1 if  -(x - M1)^T K1^(-1) (x - M1) + (x - M2)^T K2^(-1) (x - M2) > T1
decide C2 if  the statistic is < T1

where

T1 = 2 ln T + ln |K1| - ln |K2|

and T is the optimum threshold for the type of performance measure used. The statistic is quadratic in x: Quadratic Processing.
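A numpy sketch of this quadratic rule (our own illustration; the threshold T would come from one of the criteria on the next slide, here simply assumed to be 1):

```python
import numpy as np

def quadratic_test(x, M1, K1, M2, K2, T):
    # Statistic: -(x - M1)^T K1^{-1} (x - M1) + (x - M2)^T K2^{-1} (x - M2)
    d1, d2 = x - M1, x - M2
    stat = -d1 @ np.linalg.solve(K1, d1) + d2 @ np.linalg.solve(K2, d2)
    # Threshold: T1 = 2 ln T + ln|K1| - ln|K2|
    T1 = 2 * np.log(T) + np.linalg.slogdet(K1)[1] - np.linalg.slogdet(K2)[1]
    return "C1" if stat > T1 else "C2"

M1, K1 = np.array([0.0, 0.0]), np.eye(2)
M2, K2 = np.array([2.0, 2.0]), 2 * np.eye(2)
print(quadratic_test(np.array([0.5, 0.2]), M1, K1, M2, K2, T=1.0))  # "C1"
```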
11
T = N_MAP or N_BAYES or N_MPE or N_NP, where

N_MAP = P(C2) / P(C1)

N_MPE = P(C2) / P(C1)

N_BAYES = [ (C22 - C12) P(C2) ] / [ (C11 - C21) P(C1) ]

N_NP: chosen so that ∫_{R1(N_NP)} p(x | C2) dx = α0
12
Mahalanobis Distance: Definition

Given two N-vectors x and y, the Mahalanobis distance d_MAH(x, y) is defined by

d_MAH(x, y) = [ (x - y)^T A^(-1) (x - y) ]^(1/2)

If A = the identity matrix, then

d_MAH(x, y) = d_EUCLIDEAN(x, y) = [ (x - y)^T (x - y) ]^(1/2)
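A direct transcription into numpy (illustrative; the square root matches the d_MAH² notation used on later slides):

```python
import numpy as np

def d_mah(x, y, A):
    # Mahalanobis distance: sqrt((x - y)^T A^{-1} (x - y))
    d = x - y
    return np.sqrt(d @ np.linalg.solve(A, d))

x, y = np.array([1.0, 2.0]), np.array([0.0, 0.0])
A = np.array([[4.0, 0.0],
              [0.0, 1.0]])
print(d_mah(x, y, A))          # sqrt(1/4 + 4) ≈ 2.06
print(d_mah(x, y, np.eye(2)))  # Euclidean case: sqrt(1 + 4) ≈ 2.24
```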
13
2-Class Gaussian, Special Case 1: K1 = K2 = K (Equal Covariance Matrices)

decide C1 if (M1 - M2)^T K^(-1) x > T2
decide C2 if (M1 - M2)^T K^(-1) x < T2

where

T2 = ln T + ½ (M1^T K^(-1) M1 - M2^T K^(-1) M2)

and T is the optimum threshold for the type of performance measure used. The statistic is linear in x: Linear Processing.
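A sketch of this linear rule (our own illustration, again with an assumed threshold T):

```python
import numpy as np

def linear_test(x, M1, M2, K, T):
    # Statistic: (M1 - M2)^T K^{-1} x; since K is symmetric, this equals [K^{-1}(M1 - M2)]^T x.
    w = np.linalg.solve(K, M1 - M2)
    T2 = np.log(T) + 0.5 * (M1 @ np.linalg.solve(K, M1) - M2 @ np.linalg.solve(K, M2))
    return "C1" if w @ x > T2 else "C2"

M1, M2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
K = np.array([[1.0, 0.2],
              [0.2, 1.0]])
print(linear_test(np.array([1.5, 1.5]), M1, M2, K, T=1.0))  # "C2": past the midpoint toward M2
```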
14
2-Class Gaussian, Case 2: K1 = K2 = K = σ² I (Equal Scaled Identity Covariance Matrices)

decide C1 if (M1 - M2)^T x > T3
decide C2 if (M1 - M2)^T x < T3

where

T3 = σ² ln T + ½ (M1^T M1 - M2^T M2)

and T is the optimum threshold for the type of performance measure used. Linear Processing.
15
2-Class Gaussian, Case 3: K1 = K2 = K = σ² I, with MPE or Bayes (0,1) costs and P(C1) = P(C2)

decide C1 if (M1 - M2)^T x > T4
decide C2 if (M1 - M2)^T x < T4

where

T4 = ½ (M1^T M1 - M2^T M2)

Here the optimum threshold is T = 1, so the ln T term vanishes. Linear Processing.
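In this case the rule reduces to a minimum-distance classifier: decide C1 exactly when x is closer to M1 than to M2. A tiny numeric check (our own numbers):

```python
import numpy as np

M1, M2 = np.array([0.0, 0.0]), np.array([4.0, 0.0])
x = np.array([1.0, 3.0])

T4 = 0.5 * (M1 @ M1 - M2 @ M2)  # = -8.0
stat = (M1 - M2) @ x            # = -4.0
rule = "C1" if stat > T4 else "C2"

nearest = "C1" if np.sum((x - M1) ** 2) < np.sum((x - M2) ** 2) else "C2"
print(rule, nearest)            # both "C1": the two rules agree
```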
16
General Gaussian: M-Class Case

A. Assumptions:

C1: X ~ N(M1, K1), with prior P(C1)
C2: X ~ N(M2, K2), with prior P(C2)

p(x | C1) = (1 / ((2π)^(N/2) |K1|^(1/2))) exp{ -½ (x - M1)^T K1^(-1) (x - M1) }

p(x | C2) = (1 / ((2π)^(N/2) |K2|^(1/2))) exp{ -½ (x - M2)^T K2^(-1) (x - M2) }
17
… and so on, up to

C_M: X ~ N(M_M, K_M), with prior P(C_M)

p(x | C_M) = (1 / ((2π)^(N/2) |K_M|^(1/2))) exp{ -½ (x - M_M)^T K_M^(-1) (x - M_M) }

B. Performance Measure: P(error)

C. Decision Rule: Minimum P(error)
18
General Gaussian: M-Class Case

C. Optimum MPE Decision Rule Derivation

Select class Ck if p(x | Ck) P(Ck) > p(x | Cj) P(Cj) for all j ≠ k,

where

p(x | Cj) P(Cj) = [ P(Cj) / ((2π)^(N/2) |Kj|^(1/2)) ] exp{ -½ (x - Mj)^T Kj^(-1) (x - Mj) }
19
M-Class General Gaussian (Continued)

Define an equivalent statistic S_j(x) for j = 1, 2, … , M (the factor (2π)^(N/2), common to all classes, is dropped):

S_j(x) = P(Cj) exp{ -½ (x - Mj)^T Kj^(-1) (x - Mj) } / |Kj|^(1/2)

Another equivalent statistic, Q_j(x) = -2 ln S_j(x), for j = 1, 2, … , M:

Q_j(x) = (x - Mj)^T Kj^(-1) (x - Mj) - 2 ln P(Cj) + ln |Kj|
       = d_MAH²(x, Mj) + bias

Select class Cj if Q_j(x) is MINIMUM. This is a quadratic operation on the observation vector x.
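A compact numpy sketch of the Q_j rule (our own illustration with made-up class parameters):

```python
import numpy as np

def classify_quadratic(x, means, covs, priors):
    # Q_j(x) = (x - Mj)^T Kj^{-1} (x - Mj) - 2 ln P(Cj) + ln|Kj|; pick the minimum.
    Q = []
    for Mj, Kj, Pj in zip(means, covs, priors):
        d = x - Mj
        Q.append(d @ np.linalg.solve(Kj, d) - 2 * np.log(Pj) + np.linalg.slogdet(Kj)[1])
    return int(np.argmin(Q))  # index of the selected class

means = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
covs = [np.eye(2), 2 * np.eye(2), np.diag([1.0, 0.5])]
priors = [0.5, 0.25, 0.25]
print(classify_quadratic(np.array([0.5, 2.5]), means, covs, priors))  # 2
```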
20
M-Class Gaussian, Case 1: K1 = K2 = … = K_M = K

Define an equivalent statistic S'_j(x) for j = 1, 2, … , M (the common factor |K|^(1/2) now drops out as well):

S'_j(x) = P(Cj) exp{ -½ (x - Mj)^T K^(-1) (x - Mj) }

Define an equivalent statistic S''_j(x) for j = 1, 2, … , M:

S''_j(x) = (x - Mj)^T K^(-1) (x - Mj) - 2 ln P(Cj)
21
Gaussian M-Class, Case 1: K1 = K2 = … = K_M = K

Equivalent Decision Rule: compute

(x - Mj)^T K^(-1) (x - Mj) - 2 ln P(Cj)  =  d_MAH²(x, Mj) + bias

and select the class Cj with the minimum value.
22
Case 1a: K1 = K2 = … = K_M = K (Continued)

Expanding the quadratic form:

(x - Mj)^T K^(-1) (x - Mj) = x^T K^(-1) x - x^T K^(-1) Mj - Mj^T K^(-1) x + Mj^T K^(-1) Mj

The term x^T K^(-1) x is the same for each class, and x^T K^(-1) Mj = Mj^T K^(-1) x are the same terms (K^(-1) is symmetric). So select class Cj if the following is minimum:

-2 Mj^T K^(-1) x + Mj^T K^(-1) Mj - 2 ln P(Cj)
23
M-Class Gaussian, Case 1: K1 = K2 = … = K_M = K

Equivalent Rule: select class Cj if L_j(x) is MAXIMUM, where

L_j(x) = Mj^T K^(-1) x - ½ Mj^T K^(-1) Mj + ln P(Cj)
         (dot product)    (bias)

This is a linear operation on the observation vector x.
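The L_j rule in a few lines of numpy (illustrative; the weights Mj^T K^(-1) and biases could be precomputed once per class):

```python
import numpy as np

def classify_linear(x, means, K, priors):
    # L_j(x) = Mj^T K^{-1} x - 0.5 Mj^T K^{-1} Mj + ln P(Cj); pick the maximum.
    Kinv = np.linalg.inv(K)
    L = [Mj @ Kinv @ x - 0.5 * Mj @ Kinv @ Mj + np.log(Pj)
         for Mj, Pj in zip(means, priors)]
    return int(np.argmax(L))

means = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
K = np.array([[1.0, 0.3],
              [0.3, 1.0]])
priors = [1 / 3, 1 / 3, 1 / 3]
print(classify_linear(np.array([2.0, 0.5]), means, K, priors))  # 1: closest mean in the K metric
```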
24
Summary
1. Gaussian Random Variables and Vectors
2. General Gaussian Problem: 2-Class Case
Special Cases: Quadratic and Linear Classifiers
3. Mahalanobis Distance
4. General Gaussian: M-Class Case
Special Cases: Quadratic and Linear Classifiers
25
End of Lecture 10