Pattern Recognition: Statistical and Neural

Lonnie C. Ludeman, Lecture 10, Sept 28, 2005
Nanjing University of Science & Technology


TRANSCRIPT

Page 1: Pattern Recognition: Statistical and Neural

Pattern Recognition: Statistical and Neural

Lonnie C. Ludeman

Lecture 10

Sept 28, 2005

Nanjing University of Science & Technology

Page 2: Pattern Recognition: Statistical and Neural

Review 3: MAP, MPE, Bayes, Neyman-Pearson Classification Rules

Likelihood ratio test: compare the likelihood ratio l(x) to a threshold N:

$$ l(\mathbf{x}) \;\underset{C_2}{\overset{C_1}{\gtrless}}\; N $$

with thresholds

$$ N_{MAP} = \frac{P(C_2)}{P(C_1)}, \qquad N_{MPE} = \frac{P(C_2)}{P(C_1)}, \qquad N_{BAYES} = \frac{(C_{22} - C_{12})\,P(C_2)}{(C_{11} - C_{21})\,P(C_1)} $$

and N_NP chosen so that

$$ \alpha_0 = \int_{R_1(N_{NP})} p(\mathbf{x} \mid C_2)\, d\mathbf{x} $$
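As a quick worked example (all numbers here are illustrative, not from the lecture, and the usual convention that Cij is the cost of deciding Ci when Cj is true is assumed), the thresholds are just ratios of the priors and cost differences:

```python
# Illustrative priors and costs (C_ij = cost of deciding C_i when C_j is true).
P1, P2 = 0.7, 0.3
C11, C22 = 0.0, 0.0           # no cost for correct decisions
C12, C21 = 2.0, 1.0           # unequal error costs

N_MAP = P2 / P1                                    # equals N_MPE as well
N_BAYES = ((C22 - C12) * P2) / ((C11 - C21) * P1)  # = (-2*0.3)/(-1*0.7) ~= 0.857
print(N_MAP, N_BAYES)
```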

Page 3: Pattern Recognition: Statistical and Neural

Lecture 10 Topics

1. Gaussian Random Variables and Vectors

2. General Gaussian Problem: 2-Class Case

Special Cases: Quadratic and Linear Classifiers

3. Mahalanobis Distance

4. General Gaussian: M-Class Case

Special Cases: Quadratic and Linear Classifiers

Page 4: Pattern Recognition: Statistical and Neural

Gaussian (Normal) Random Variable: X ~ N(m, σ²)

X is a Gaussian (normal) random variable if its probability density function pX(x) is given by

$$ p_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(x - m)^2}{2\sigma^2} \right\} $$

where m is the mean value and σ² the variance of the random variable X.
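A minimal Python sketch of this density; the mean m and variance sigma2 values below are arbitrary illustration choices:

```python
import numpy as np

def gaussian_pdf(x, m, sigma2):
    """Univariate Gaussian density p_X(x) for X ~ N(m, sigma2)."""
    return np.exp(-(x - m) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

# Example: N(1, 4) evaluated at a few points; the peak value at x = m
# is 1 / (sqrt(2*pi) * sigma) ~= 0.1995 for sigma = 2.
x = np.array([-1.0, 1.0, 3.0])
print(gaussian_pdf(x, m=1.0, sigma2=4.0))
```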

Page 5: Pattern Recognition: Statistical and Neural

General Gaussian Density: X ~ N(M, K)

The random vector X is normal (Gaussian) distributed if its density function p(x) is given by

$$ p(\mathbf{x}) = \frac{1}{(2\pi)^{N/2}\,|K|^{1/2}} \exp\left( -\tfrac{1}{2} (\mathbf{x} - M)^T K^{-1} (\mathbf{x} - M) \right) $$

where x = [x1, x2, …, xN]ᵀ is the pattern vector, M = [m1, m2, …, mN]ᵀ is the mean vector, and

$$ K = \begin{bmatrix} k_{11} & k_{12} & \cdots & k_{1N} \\ k_{21} & k_{22} & \cdots & k_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ k_{N1} & k_{N2} & \cdots & k_{NN} \end{bmatrix} $$

is the covariance matrix.
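A minimal numerical sketch of this density, with M, K, and x chosen purely for illustration; the direct formula is cross-checked against SciPy's standard multivariate_normal.pdf:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_density(x, M, K):
    """General Gaussian density p(x) for x ~ N(M, K), K an N x N covariance."""
    N = len(M)
    d = x - M
    quad = d @ np.linalg.inv(K) @ d                 # (x - M)^T K^-1 (x - M)
    norm = (2 * np.pi) ** (N / 2) * np.sqrt(np.linalg.det(K))
    return np.exp(-0.5 * quad) / norm

M = np.array([0.0, 1.0])                            # illustrative mean vector
K = np.array([[2.0, 0.5], [0.5, 1.0]])              # illustrative covariance
x = np.array([0.5, 0.5])
print(gaussian_density(x, M, K))                    # direct formula
print(multivariate_normal(mean=M, cov=K).pdf(x))    # SciPy reference, same value
```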

Page 6: Pattern Recognition: Statistical and Neural

Properties of the Covariance Matrix

For j, k = 1, 2, …, N:

$$ k_{jk} = E[\,(x_j - m_j)(x_k - m_k)\,] \quad \text{Covariance} $$

$$ k_{jj} = E[\,(x_j - m_j)^2\,] \quad \text{Component Variance} $$

K is a positive definite matrix, so K has positive eigenvalues.
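A short sketch, under illustrative parameters, of estimating K from samples and confirming that its eigenvalues are positive:

```python
import numpy as np

rng = np.random.default_rng(0)
# 500 samples of a 3-dimensional Gaussian (illustrative mean and covariance).
X = rng.multivariate_normal(mean=[0, 0, 0],
                            cov=[[2.0, 0.3, 0.0],
                                 [0.3, 1.0, 0.2],
                                 [0.0, 0.2, 0.5]],
                            size=500)

K_hat = np.cov(X, rowvar=False)       # sample estimate of k_jk
eigvals = np.linalg.eigvalsh(K_hat)   # symmetric matrix -> real eigenvalues
print(K_hat.round(2))
print("all eigenvalues positive:", np.all(eigvals > 0))
```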

Page 7: Pattern Recognition: Statistical and Neural

General Gaussian Problem: 2-Class Case

The random vector X is normally (Gaussian) distributed under both classes:

C1: X ~ N(M1, K1)
C2: X ~ N(M2, K2)

$$ p(\mathbf{x} \mid C_1) = \frac{1}{(2\pi)^{N/2}\,|K_1|^{1/2}} \exp\left( -\tfrac12 (\mathbf{x} - M_1)^T K_1^{-1} (\mathbf{x} - M_1) \right) $$

$$ p(\mathbf{x} \mid C_2) = \frac{1}{(2\pi)^{N/2}\,|K_2|^{1/2}} \exp\left( -\tfrac12 (\mathbf{x} - M_2)^T K_2^{-1} (\mathbf{x} - M_2) \right) $$

Page 8: Pattern Recognition: Statistical and Neural

General Gaussian Framework: 2-Class Case

A. Assumptions:

C1: X ~ N(M1, K1), with prior P(C1)
C2: X ~ N(M2, K2), with prior P(C2)

B. Performance Measure: MAP, P(error), Risk, P_D, MiniMax

C. Optimum Classification: Min or Max

Page 9: Pattern Recognition: Statistical and Neural

Optimum Decision Rule: 2-Class Gaussian

Derivation of the optimum decision rule, which is a likelihood ratio test with the threshold determined by the type of performance measure:

$$ l(\mathbf{x}) = \frac{p(\mathbf{x} \mid C_1)}{p(\mathbf{x} \mid C_2)} = \frac{|K_2|^{1/2}\, \exp\left( -\tfrac12 (\mathbf{x} - M_1)^T K_1^{-1} (\mathbf{x} - M_1) \right)}{|K_1|^{1/2}\, \exp\left( -\tfrac12 (\mathbf{x} - M_2)^T K_2^{-1} (\mathbf{x} - M_2) \right)} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; T $$

(the common factor (2π)^{N/2} cancels in the ratio)

Page 10: Pattern Recognition: Statistical and Neural

Optimum Decision Rule: 2-Class Gaussian (continued)

Taking 2 ln( ) of both sides gives the equivalent rule:

$$ \text{if } -(\mathbf{x} - M_1)^T K_1^{-1} (\mathbf{x} - M_1) + (\mathbf{x} - M_2)^T K_2^{-1} (\mathbf{x} - M_2) \;\underset{C_2}{\overset{C_1}{\gtrless}}\; T_1 $$

where

$$ T_1 = 2 \ln T + \ln |K_1| - \ln |K_2| $$

and T is the optimum threshold for the type of performance measure used.

K1 ≠ K2: Quadratic Processing
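A minimal sketch of this quadratic rule, with class means, covariances, and threshold T chosen purely for illustration:

```python
import numpy as np

def classify_quadratic(x, M1, K1, M2, K2, T=1.0):
    """2-class Gaussian rule: quadratic statistic vs. threshold T1.
    Returns 1 for class C1, 2 for class C2."""
    d1, d2 = x - M1, x - M2
    stat = -(d1 @ np.linalg.inv(K1) @ d1) + (d2 @ np.linalg.inv(K2) @ d2)
    T1 = 2 * np.log(T) + np.log(np.linalg.det(K1)) - np.log(np.linalg.det(K2))
    return 1 if stat > T1 else 2

# Illustrative class parameters (unequal covariances -> quadratic boundary).
M1, K1 = np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]])
M2, K2 = np.array([2.0, 2.0]), np.array([[2.0, 0.4], [0.4, 1.5]])
print(classify_quadratic(np.array([0.2, -0.1]), M1, K1, M2, K2))  # -> 1
print(classify_quadratic(np.array([2.1, 1.8]), M1, K1, M2, K2))   # -> 2
```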

Page 11: Pattern Recognition: Statistical and Neural

The threshold T is determined by the type of performance measure:

$$ T = N_{MAP} \text{ or } N_{BAYES} \text{ or } N_{MPE} \text{ or } N_{NP} $$

with the thresholds as in Review 3:

$$ N_{MAP} = N_{MPE} = \frac{P(C_2)}{P(C_1)}, \qquad N_{BAYES} = \frac{(C_{22} - C_{12})\,P(C_2)}{(C_{11} - C_{21})\,P(C_1)}, \qquad \alpha_0 = \int_{R_1(N_{NP})} p(\mathbf{x} \mid C_2)\, d\mathbf{x} $$

Page 12: Pattern Recognition: Statistical and Neural

Mahalanobis Distance: Definition

Given two N-vectors x and y, the squared Mahalanobis distance d²MAH(x, y) is defined by

$$ d_{MAH}^2(\mathbf{x}, \mathbf{y}) = (\mathbf{x} - \mathbf{y})^T A^{-1} (\mathbf{x} - \mathbf{y}) $$

If A = the identity matrix, then

$$ d_{MAH}^2(\mathbf{x}, \mathbf{y}) = d_{EUCLIDEAN}^2(\mathbf{x}, \mathbf{y}) = (\mathbf{x} - \mathbf{y})^T (\mathbf{x} - \mathbf{y}) $$
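A minimal numpy sketch of this definition (the vectors x, y and the matrix A below are illustrative); note that it returns the quadratic form itself, i.e. the squared distance:

```python
import numpy as np

def mahalanobis_sq(x, y, A):
    """Squared Mahalanobis distance (x - y)^T A^-1 (x - y)."""
    d = x - y
    return d @ np.linalg.inv(A) @ d

x, y = np.array([1.0, 2.0]), np.array([0.0, 0.0])   # illustrative vectors
A = np.array([[4.0, 0.0], [0.0, 1.0]])              # illustrative matrix
print(mahalanobis_sq(x, y, A))           # 1/4 + 4 = 4.25
print(mahalanobis_sq(x, y, np.eye(2)))   # A = I: squared Euclidean = 5.0
```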

Page 13: Pattern Recognition: Statistical and Neural

2-Class Gaussian: Special Case 1: K1 = K2 = K (Equal Covariance Matrices)

$$ \text{if } (M_1 - M_2)^T K^{-1} \mathbf{x} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; T_2 $$

where

$$ T_2 = \ln T + \tfrac12 \left( M_1^T K^{-1} M_1 - M_2^T K^{-1} M_2 \right) $$

and T is the optimum threshold for the type of performance measure used.

Linear Processing
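A minimal sketch of this linear rule under illustrative parameters; T = 1 here corresponds to the MPE rule with equal priors:

```python
import numpy as np

def classify_linear(x, M1, M2, K, T=1.0):
    """Special Case 1 (K1 = K2 = K): decide C1 if (M1 - M2)^T K^-1 x > T2."""
    Kinv = np.linalg.inv(K)
    stat = (M1 - M2) @ Kinv @ x
    T2 = np.log(T) + 0.5 * (M1 @ Kinv @ M1 - M2 @ Kinv @ M2)
    return 1 if stat > T2 else 2

# Illustrative class means and shared covariance matrix.
M1, M2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
K = np.array([[1.0, 0.3], [0.3, 1.0]])
print(classify_linear(np.array([0.4, 0.3]), M1, M2, K))  # -> 1
print(classify_linear(np.array([1.8, 1.7]), M1, M2, K))  # -> 2
```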

Page 14: Pattern Recognition: Statistical and Neural

2-Class Gaussian: Case 2: K1 = K2 = K = σ²I (Equal Scaled Identity Covariance Matrices)

$$ \text{if } (M_1 - M_2)^T \mathbf{x} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; T_3 $$

where

$$ T_3 = \sigma^2 \ln T + \tfrac12 \left( M_1^T M_1 - M_2^T M_2 \right) $$

and T is the optimum threshold for the type of performance measure used.

Linear Processing

Page 15: Pattern Recognition: Statistical and Neural

2-Class Gaussian: Case 3: K1 = K2 = K = σ²I, MPE or Bayes with 0-1 costs and P(C1) = P(C2)

$$ \text{if } (M_1 - M_2)^T \mathbf{x} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; T_4 $$

where

$$ T_4 = \tfrac12 \left( M_1^T M_1 - M_2^T M_2 \right) $$

Under these assumptions T = 1, so the σ² ln T term of Case 2 vanishes.

Linear Processing

Page 16: Pattern Recognition: Statistical and Neural

General Gaussian: M-Class Case

A. Assumptions:

C1: X ~ N(M1, K1), with prior P(C1)
C2: X ~ N(M2, K2), with prior P(C2)

$$ p(\mathbf{x} \mid C_1) = \frac{1}{(2\pi)^{N/2}\,|K_1|^{1/2}} \exp\left( -\tfrac12 (\mathbf{x} - M_1)^T K_1^{-1} (\mathbf{x} - M_1) \right) $$

$$ p(\mathbf{x} \mid C_2) = \frac{1}{(2\pi)^{N/2}\,|K_2|^{1/2}} \exp\left( -\tfrac12 (\mathbf{x} - M_2)^T K_2^{-1} (\mathbf{x} - M_2) \right) $$

Page 17: Pattern Recognition: Statistical and Neural

and so on through class M:

CM: X ~ N(MM, KM), with prior P(CM)

$$ p(\mathbf{x} \mid C_M) = \frac{1}{(2\pi)^{N/2}\,|K_M|^{1/2}} \exp\left( -\tfrac12 (\mathbf{x} - M_M)^T K_M^{-1} (\mathbf{x} - M_M) \right) $$

B. Performance Measure: P(error)

C. Decision Rule: Minimum P(error)

Page 18: Pattern Recognition: Statistical and Neural

General Gaussian: M-Class Case

C. Optimum MPE Decision Rule Derivation

Select class Ck if

$$ p(\mathbf{x} \mid C_k)\, P(C_k) > p(\mathbf{x} \mid C_j)\, P(C_j) \quad \text{for all } j \neq k $$

where

$$ p(\mathbf{x} \mid C_j)\, P(C_j) = \frac{P(C_j)}{(2\pi)^{N/2}\,|K_j|^{1/2}} \exp\left\{ -\tfrac12 (\mathbf{x} - M_j)^T K_j^{-1} (\mathbf{x} - M_j) \right\} $$

Page 19: Pattern Recognition: Statistical and Neural

M-Class General Gaussian (continued)

Define the equivalent statistic Sj(x) for j = 1, 2, …, M:

$$ S_j(\mathbf{x}) = P(C_j) \exp\left\{ -\tfrac12 (\mathbf{x} - M_j)^T K_j^{-1} (\mathbf{x} - M_j) \right\} / \,|K_j|^{1/2} $$

and another equivalent statistic Qj(x) for j = 1, 2, …, M:

$$ Q_j(\mathbf{x}) = \underbrace{(\mathbf{x} - M_j)^T K_j^{-1} (\mathbf{x} - M_j)}_{d_{MAH}^2(\mathbf{x},\, M_j)} \; \underbrace{- \, 2 \ln P(C_j) + \ln |K_j|}_{\text{Bias}} $$

Select class Cj if Qj(x) is MINIMUM: a quadratic operation on the observation vector x.
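A minimal sketch of the minimum-Qj rule, with three illustrative classes (means, covariances, and priors chosen arbitrarily):

```python
import numpy as np

def classify_m_class(x, means, covs, priors):
    """Pick the class minimizing Q_j(x) = d_MAH^2(x, M_j) - 2 ln P(C_j) + ln|K_j|.
    Returns the 1-based class index."""
    Q = []
    for Mj, Kj, Pj in zip(means, covs, priors):
        d = x - Mj
        Q.append(d @ np.linalg.inv(Kj) @ d
                 - 2 * np.log(Pj) + np.log(np.linalg.det(Kj)))
    return int(np.argmin(Q)) + 1

means = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
covs = [np.eye(2), 2 * np.eye(2), np.array([[1.0, 0.5], [0.5, 1.0]])]
priors = [0.5, 0.25, 0.25]
print(classify_m_class(np.array([2.8, 0.2]), means, covs, priors))  # -> 2
```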

Page 20: Pattern Recognition: Statistical and Neural

M-Class Gaussian: Case 1: K1 = K2 = … = KM = K

Define the equivalent statistic Sj′(x) for j = 1, 2, …, M:

$$ S_j'(\mathbf{x}) = P(C_j) \exp\left\{ -\tfrac12 (\mathbf{x} - M_j)^T K^{-1} (\mathbf{x} - M_j) \right\} $$

and the equivalent statistic Sj″(x) for j = 1, 2, …, M:

$$ S_j''(\mathbf{x}) = (\mathbf{x} - M_j)^T K^{-1} (\mathbf{x} - M_j) - 2 \ln P(C_j) $$

Page 21: Pattern Recognition: Statistical and Neural

Gaussian M-Class: Case 1: K1 = K2 = … = KM = K, Equivalent Decision Rule

Compute

$$ (\mathbf{x} - M_j)^T K^{-1} (\mathbf{x} - M_j) - 2 \ln P(C_j) $$

for each class, i.e. $d_{MAH}^2(\mathbf{x}, M_j)$ plus a bias term, and select the class Cj with the minimum value.

Page 22: Pattern Recognition: Statistical and Neural

Case 1a: K1 = K2 = … = KM = K (Continued)

Expand the quadratic form:

$$ (\mathbf{x} - M_j)^T K^{-1} (\mathbf{x} - M_j) = \mathbf{x}^T K^{-1} \mathbf{x} - \mathbf{x}^T K^{-1} M_j - M_j^T K^{-1} \mathbf{x} + M_j^T K^{-1} M_j $$

The term xᵀK⁻¹x is the same for each class and can be dropped, and the two cross terms are equal scalars since K⁻¹ is symmetric. Select class Cj if the following is minimum:

$$ -2 M_j^T K^{-1} \mathbf{x} + M_j^T K^{-1} M_j - 2 \ln P(C_j) $$

Page 23: Pattern Recognition: Statistical and Neural

M-Class Gaussian, Case 1: K1 = K2 = … = KM = K, Equivalent Rule

Select class Cj if Lj(x) is MAXIMUM, where

$$ L_j(\mathbf{x}) = \underbrace{M_j^T K^{-1} \mathbf{x}}_{\text{Dot Product}} \; \underbrace{- \tfrac12 M_j^T K^{-1} M_j + \ln P(C_j)}_{\text{Bias}} $$

A linear operation on the observation vector x.
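A minimal sketch of the maximum-Lj rule under illustrative shared-covariance parameters:

```python
import numpy as np

def classify_linear_m_class(x, means, K, priors):
    """Equal-covariance case: pick the class maximizing
    L_j(x) = M_j^T K^-1 x - 0.5 M_j^T K^-1 M_j + ln P(C_j)."""
    Kinv = np.linalg.inv(K)
    L = [Mj @ Kinv @ x - 0.5 * (Mj @ Kinv @ Mj) + np.log(Pj)
         for Mj, Pj in zip(means, priors)]
    return int(np.argmax(L)) + 1   # 1-based class index

# Illustrative parameters: three classes sharing one covariance matrix K.
means = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
K = np.array([[1.0, 0.2], [0.2, 1.0]])
priors = [1 / 3, 1 / 3, 1 / 3]
print(classify_linear_m_class(np.array([0.3, 2.6]), means, K, priors))  # -> 3
```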

Page 24: Pattern Recognition: Statistical and Neural

Summary

1. Gaussian Random Variables and Vectors

2. General Gaussian Problem: 2-Class Case

Special Cases: Quadratic and Linear Classifiers

3. Mahalanobis Distance

4. General Gaussian: M-Class Case

Special Cases: Quadratic and Linear Classifiers

Page 25: Pattern Recognition: Statistical and Neural


End of Lecture 10