Pattern Recognition: Statistical and Neural

Lonnie C. Ludeman, Lecture 10, Sept 28, 2005
Nanjing University of Science & Technology


TRANSCRIPT

Page 1: Pattern Recognition: Statistical and Neural

Pattern Recognition: Statistical and Neural

Lonnie C. Ludeman

Lecture 10

Sept 28, 2005

Nanjing University of Science & Technology

Page 2: Pattern Recognition: Statistical and Neural

Review 3: MAP, MPE, Bayes, Neyman-Pearson Classification Rules

Likelihood ratio test: compare the likelihood ratio l(x) to a threshold N:

$$ l(\mathbf{x}) \;\underset{C_2}{\overset{C_1}{\gtrless}}\; N $$

with thresholds

$$ N_{MAP} = \frac{P(C_2)}{P(C_1)}, \qquad N_{MPE} = \frac{P(C_2)}{P(C_1)}, \qquad N_{BAYES} = \frac{(C_{22} - C_{12})\,P(C_2)}{(C_{11} - C_{21})\,P(C_1)} $$

and N_NP chosen so that

$$ \alpha_0 = \int_{R_1(N_{NP})} p(\mathbf{x} \mid C_2)\, d\mathbf{x} $$
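As a quick worked example (all numbers here are illustrative, not from the lecture, and the usual convention that Cij is the cost of deciding Ci when Cj is true is assumed), the thresholds are just ratios of the priors and cost differences:

```python
# Illustrative priors and costs (C_ij = cost of deciding C_i when C_j is true).
P1, P2 = 0.7, 0.3
C11, C22 = 0.0, 0.0           # no cost for correct decisions
C12, C21 = 2.0, 1.0           # unequal error costs

N_MAP = P2 / P1                                    # equals N_MPE as well
N_BAYES = ((C22 - C12) * P2) / ((C11 - C21) * P1)  # = (-2*0.3)/(-1*0.7) ~= 0.857
print(N_MAP, N_BAYES)
```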

Page 3: Pattern Recognition: Statistical and Neural

Lecture 10 Topics

1. Gaussian Random Variables and Vectors

2. General Gaussian Problem: 2-Class Case

Special Cases: Quadratic and Linear Classifiers

3. Mahalanobis Distance

4. General Gaussian: M-Class Case

Special Cases: Quadratic and Linear Classifiers

Page 4: Pattern Recognition: Statistical and Neural

Gaussian (Normal) Random Variable: X ~ N(m, σ²)

X is a Gaussian (normal) random variable if its probability density function pX(x) is given by

$$ p_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(x - m)^2}{2\sigma^2} \right\} $$

where m is the mean value and σ² the variance of the random variable X.
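A minimal Python sketch of this density; the mean m and variance sigma2 values below are arbitrary illustration choices:

```python
import numpy as np

def gaussian_pdf(x, m, sigma2):
    """Univariate Gaussian density p_X(x) for X ~ N(m, sigma2)."""
    return np.exp(-(x - m) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

# Example: N(1, 4) evaluated at a few points; the peak value at x = m
# is 1 / (sqrt(2*pi) * sigma) ~= 0.1995 for sigma = 2.
x = np.array([-1.0, 1.0, 3.0])
print(gaussian_pdf(x, m=1.0, sigma2=4.0))
```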

Page 5: Pattern Recognition: Statistical and Neural

General Gaussian Density: X ~ N(M, K)

The random vector X is normal (Gaussian) distributed if its density function p(x) is given by

$$ p(\mathbf{x}) = \frac{1}{(2\pi)^{N/2}\,|K|^{1/2}} \exp\left( -\tfrac{1}{2} (\mathbf{x} - M)^T K^{-1} (\mathbf{x} - M) \right) $$

where x = [x1, x2, …, xN]ᵀ is the pattern vector, M = [m1, m2, …, mN]ᵀ is the mean vector, and

$$ K = \begin{bmatrix} k_{11} & k_{12} & \cdots & k_{1N} \\ k_{21} & k_{22} & \cdots & k_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ k_{N1} & k_{N2} & \cdots & k_{NN} \end{bmatrix} $$

is the covariance matrix.
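A minimal numerical sketch of this density, with M, K, and x chosen purely for illustration; the direct formula is cross-checked against SciPy's standard multivariate_normal.pdf:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_density(x, M, K):
    """General Gaussian density p(x) for x ~ N(M, K), K an N x N covariance."""
    N = len(M)
    d = x - M
    quad = d @ np.linalg.inv(K) @ d                 # (x - M)^T K^-1 (x - M)
    norm = (2 * np.pi) ** (N / 2) * np.sqrt(np.linalg.det(K))
    return np.exp(-0.5 * quad) / norm

M = np.array([0.0, 1.0])                            # illustrative mean vector
K = np.array([[2.0, 0.5], [0.5, 1.0]])              # illustrative covariance
x = np.array([0.5, 0.5])
print(gaussian_density(x, M, K))                    # direct formula
print(multivariate_normal(mean=M, cov=K).pdf(x))    # SciPy reference, same value
```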

Page 6: Pattern Recognition: Statistical and Neural

Properties of the Covariance Matrix

For j, k = 1, 2, …, N:

$$ k_{jk} = E[\,(x_j - m_j)(x_k - m_k)\,] \quad \text{Covariance} $$

$$ k_{jj} = E[\,(x_j - m_j)^2\,] \quad \text{Component Variance} $$

K is a positive definite matrix, so K has positive eigenvalues.
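A short sketch, under illustrative parameters, of estimating K from samples and confirming that its eigenvalues are positive:

```python
import numpy as np

rng = np.random.default_rng(0)
# 500 samples of a 3-dimensional Gaussian (illustrative mean and covariance).
X = rng.multivariate_normal(mean=[0, 0, 0],
                            cov=[[2.0, 0.3, 0.0],
                                 [0.3, 1.0, 0.2],
                                 [0.0, 0.2, 0.5]],
                            size=500)

K_hat = np.cov(X, rowvar=False)       # sample estimate of k_jk
eigvals = np.linalg.eigvalsh(K_hat)   # symmetric matrix -> real eigenvalues
print(K_hat.round(2))
print("all eigenvalues positive:", np.all(eigvals > 0))
```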

Page 7: Pattern Recognition: Statistical and Neural

General Gaussian Problem: 2-Class Case

The random vector X is normally (Gaussian) distributed under both classes:

C1: X ~ N(M1, K1)
C2: X ~ N(M2, K2)

$$ p(\mathbf{x} \mid C_1) = \frac{1}{(2\pi)^{N/2}\,|K_1|^{1/2}} \exp\left( -\tfrac12 (\mathbf{x} - M_1)^T K_1^{-1} (\mathbf{x} - M_1) \right) $$

$$ p(\mathbf{x} \mid C_2) = \frac{1}{(2\pi)^{N/2}\,|K_2|^{1/2}} \exp\left( -\tfrac12 (\mathbf{x} - M_2)^T K_2^{-1} (\mathbf{x} - M_2) \right) $$

Page 8: Pattern Recognition: Statistical and Neural

General Gaussian Framework: 2-Class Case

A. Assumptions:

C1: X ~ N(M1, K1), with prior P(C1)
C2: X ~ N(M2, K2), with prior P(C2)

B. Performance Measure: MAP, P(error), Risk, P_D, MiniMax

C. Optimum Classification: Min or Max

Page 9: Pattern Recognition: Statistical and Neural

Optimum Decision Rule: 2-Class Gaussian

Derivation of the optimum decision rule, which is a likelihood ratio test with the threshold determined by the type of performance measure:

$$ l(\mathbf{x}) = \frac{p(\mathbf{x} \mid C_1)}{p(\mathbf{x} \mid C_2)} = \frac{|K_2|^{1/2}\, \exp\left( -\tfrac12 (\mathbf{x} - M_1)^T K_1^{-1} (\mathbf{x} - M_1) \right)}{|K_1|^{1/2}\, \exp\left( -\tfrac12 (\mathbf{x} - M_2)^T K_2^{-1} (\mathbf{x} - M_2) \right)} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; T $$

(the common factor (2π)^{N/2} cancels in the ratio)

Page 10: Pattern Recognition: Statistical and Neural

Optimum Decision Rule: 2-Class Gaussian (continued)

Taking 2 ln( ) of both sides gives the equivalent rule:

$$ \text{if } -(\mathbf{x} - M_1)^T K_1^{-1} (\mathbf{x} - M_1) + (\mathbf{x} - M_2)^T K_2^{-1} (\mathbf{x} - M_2) \;\underset{C_2}{\overset{C_1}{\gtrless}}\; T_1 $$

where

$$ T_1 = 2 \ln T + \ln |K_1| - \ln |K_2| $$

and T is the optimum threshold for the type of performance measure used.

K1 ≠ K2: Quadratic Processing
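A minimal sketch of this quadratic rule, with class means, covariances, and threshold T chosen purely for illustration:

```python
import numpy as np

def classify_quadratic(x, M1, K1, M2, K2, T=1.0):
    """2-class Gaussian rule: quadratic statistic vs. threshold T1.
    Returns 1 for class C1, 2 for class C2."""
    d1, d2 = x - M1, x - M2
    stat = -(d1 @ np.linalg.inv(K1) @ d1) + (d2 @ np.linalg.inv(K2) @ d2)
    T1 = 2 * np.log(T) + np.log(np.linalg.det(K1)) - np.log(np.linalg.det(K2))
    return 1 if stat > T1 else 2

# Illustrative class parameters (unequal covariances -> quadratic boundary).
M1, K1 = np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]])
M2, K2 = np.array([2.0, 2.0]), np.array([[2.0, 0.4], [0.4, 1.5]])
print(classify_quadratic(np.array([0.2, -0.1]), M1, K1, M2, K2))  # -> 1
print(classify_quadratic(np.array([2.1, 1.8]), M1, K1, M2, K2))   # -> 2
```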

Page 11: Pattern Recognition: Statistical and Neural

The threshold T is determined by the type of performance measure:

$$ T = N_{MAP} \text{ or } N_{BAYES} \text{ or } N_{MPE} \text{ or } N_{NP} $$

with the thresholds as in Review 3:

$$ N_{MAP} = N_{MPE} = \frac{P(C_2)}{P(C_1)}, \qquad N_{BAYES} = \frac{(C_{22} - C_{12})\,P(C_2)}{(C_{11} - C_{21})\,P(C_1)}, \qquad \alpha_0 = \int_{R_1(N_{NP})} p(\mathbf{x} \mid C_2)\, d\mathbf{x} $$

Page 12: Pattern Recognition: Statistical and Neural

Mahalanobis Distance: Definition

Given two N-vectors x and y, the squared Mahalanobis distance d²MAH(x, y) is defined by

$$ d_{MAH}^2(\mathbf{x}, \mathbf{y}) = (\mathbf{x} - \mathbf{y})^T A^{-1} (\mathbf{x} - \mathbf{y}) $$

If A = the identity matrix, then

$$ d_{MAH}^2(\mathbf{x}, \mathbf{y}) = d_{EUCLIDEAN}^2(\mathbf{x}, \mathbf{y}) = (\mathbf{x} - \mathbf{y})^T (\mathbf{x} - \mathbf{y}) $$
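A minimal numpy sketch of this definition (the vectors x, y and the matrix A below are illustrative); note that it returns the quadratic form itself, i.e. the squared distance:

```python
import numpy as np

def mahalanobis_sq(x, y, A):
    """Squared Mahalanobis distance (x - y)^T A^-1 (x - y)."""
    d = x - y
    return d @ np.linalg.inv(A) @ d

x, y = np.array([1.0, 2.0]), np.array([0.0, 0.0])   # illustrative vectors
A = np.array([[4.0, 0.0], [0.0, 1.0]])              # illustrative matrix
print(mahalanobis_sq(x, y, A))           # 1/4 + 4 = 4.25
print(mahalanobis_sq(x, y, np.eye(2)))   # A = I: squared Euclidean = 5.0
```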

Page 13: Pattern Recognition: Statistical and Neural

2-Class Gaussian: Special Case 1: K1 = K2 = K (Equal Covariance Matrices)

$$ \text{if } (M_1 - M_2)^T K^{-1} \mathbf{x} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; T_2 $$

where

$$ T_2 = \ln T + \tfrac12 \left( M_1^T K^{-1} M_1 - M_2^T K^{-1} M_2 \right) $$

and T is the optimum threshold for the type of performance measure used.

Linear Processing
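A minimal sketch of this linear rule under illustrative parameters; T = 1 here corresponds to the MPE rule with equal priors:

```python
import numpy as np

def classify_linear(x, M1, M2, K, T=1.0):
    """Special Case 1 (K1 = K2 = K): decide C1 if (M1 - M2)^T K^-1 x > T2."""
    Kinv = np.linalg.inv(K)
    stat = (M1 - M2) @ Kinv @ x
    T2 = np.log(T) + 0.5 * (M1 @ Kinv @ M1 - M2 @ Kinv @ M2)
    return 1 if stat > T2 else 2

# Illustrative class means and shared covariance matrix.
M1, M2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
K = np.array([[1.0, 0.3], [0.3, 1.0]])
print(classify_linear(np.array([0.4, 0.3]), M1, M2, K))  # -> 1
print(classify_linear(np.array([1.8, 1.7]), M1, M2, K))  # -> 2
```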

Page 14: Pattern Recognition: Statistical and Neural

2-Class Gaussian: Case 2: K1 = K2 = K = σ²I (Equal Scaled Identity Covariance Matrices)

$$ \text{if } (M_1 - M_2)^T \mathbf{x} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; T_3 $$

where

$$ T_3 = \sigma^2 \ln T + \tfrac12 \left( M_1^T M_1 - M_2^T M_2 \right) $$

and T is the optimum threshold for the type of performance measure used.

Linear Processing

Page 15: Pattern Recognition: Statistical and Neural

2-Class Gaussian: Case 3: K1 = K2 = K = σ²I, MPE or Bayes with 0-1 costs and P(C1) = P(C2)

$$ \text{if } (M_1 - M_2)^T \mathbf{x} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; T_4 $$

where

$$ T_4 = \tfrac12 \left( M_1^T M_1 - M_2^T M_2 \right) $$

Under these assumptions T = 1, so the σ² ln T term of Case 2 vanishes.

Linear Processing

Page 16: Pattern Recognition: Statistical and Neural

General Gaussian: M-Class Case

A. Assumptions:

C1: X ~ N(M1, K1), with prior P(C1)
C2: X ~ N(M2, K2), with prior P(C2)

$$ p(\mathbf{x} \mid C_1) = \frac{1}{(2\pi)^{N/2}\,|K_1|^{1/2}} \exp\left( -\tfrac12 (\mathbf{x} - M_1)^T K_1^{-1} (\mathbf{x} - M_1) \right) $$

$$ p(\mathbf{x} \mid C_2) = \frac{1}{(2\pi)^{N/2}\,|K_2|^{1/2}} \exp\left( -\tfrac12 (\mathbf{x} - M_2)^T K_2^{-1} (\mathbf{x} - M_2) \right) $$

Page 17: Pattern Recognition: Statistical and Neural

and so on through class M:

CM: X ~ N(MM, KM), with prior P(CM)

$$ p(\mathbf{x} \mid C_M) = \frac{1}{(2\pi)^{N/2}\,|K_M|^{1/2}} \exp\left( -\tfrac12 (\mathbf{x} - M_M)^T K_M^{-1} (\mathbf{x} - M_M) \right) $$

B. Performance Measure: P(error)

C. Decision Rule: Minimum P(error)

Page 18: Pattern Recognition: Statistical and Neural

General Gaussian: M-Class Case

C. Optimum MPE Decision Rule Derivation

Select class Ck if

$$ p(\mathbf{x} \mid C_k)\, P(C_k) > p(\mathbf{x} \mid C_j)\, P(C_j) \quad \text{for all } j \neq k $$

where

$$ p(\mathbf{x} \mid C_j)\, P(C_j) = \frac{P(C_j)}{(2\pi)^{N/2}\,|K_j|^{1/2}} \exp\left\{ -\tfrac12 (\mathbf{x} - M_j)^T K_j^{-1} (\mathbf{x} - M_j) \right\} $$

Page 19: Pattern Recognition: Statistical and Neural

M-Class General Gaussian (continued)

Define the equivalent statistic Sj(x) for j = 1, 2, …, M:

$$ S_j(\mathbf{x}) = P(C_j) \exp\left\{ -\tfrac12 (\mathbf{x} - M_j)^T K_j^{-1} (\mathbf{x} - M_j) \right\} / \,|K_j|^{1/2} $$

and another equivalent statistic Qj(x) for j = 1, 2, …, M:

$$ Q_j(\mathbf{x}) = \underbrace{(\mathbf{x} - M_j)^T K_j^{-1} (\mathbf{x} - M_j)}_{d_{MAH}^2(\mathbf{x},\, M_j)} \; \underbrace{- \, 2 \ln P(C_j) + \ln |K_j|}_{\text{Bias}} $$

Select class Cj if Qj(x) is MINIMUM: a quadratic operation on the observation vector x.
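A minimal sketch of the minimum-Qj rule, with three illustrative classes (means, covariances, and priors chosen arbitrarily):

```python
import numpy as np

def classify_m_class(x, means, covs, priors):
    """Pick the class minimizing Q_j(x) = d_MAH^2(x, M_j) - 2 ln P(C_j) + ln|K_j|.
    Returns the 1-based class index."""
    Q = []
    for Mj, Kj, Pj in zip(means, covs, priors):
        d = x - Mj
        Q.append(d @ np.linalg.inv(Kj) @ d
                 - 2 * np.log(Pj) + np.log(np.linalg.det(Kj)))
    return int(np.argmin(Q)) + 1

means = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
covs = [np.eye(2), 2 * np.eye(2), np.array([[1.0, 0.5], [0.5, 1.0]])]
priors = [0.5, 0.25, 0.25]
print(classify_m_class(np.array([2.8, 0.2]), means, covs, priors))  # -> 2
```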

Page 20: Pattern Recognition: Statistical and Neural

M-Class Gaussian: Case 1: K1 = K2 = … = KM = K

Define the equivalent statistic Sj′(x) for j = 1, 2, …, M:

$$ S_j'(\mathbf{x}) = P(C_j) \exp\left\{ -\tfrac12 (\mathbf{x} - M_j)^T K^{-1} (\mathbf{x} - M_j) \right\} $$

and the equivalent statistic Sj″(x) for j = 1, 2, …, M:

$$ S_j''(\mathbf{x}) = (\mathbf{x} - M_j)^T K^{-1} (\mathbf{x} - M_j) - 2 \ln P(C_j) $$

Page 21: Pattern Recognition: Statistical and Neural

Gaussian M-Class: Case 1: K1 = K2 = … = KM = K, Equivalent Decision Rule

Compute

$$ (\mathbf{x} - M_j)^T K^{-1} (\mathbf{x} - M_j) - 2 \ln P(C_j) $$

for each class, i.e. $d_{MAH}^2(\mathbf{x}, M_j)$ plus a bias term, and select the class Cj with the minimum value.

Page 22: Pattern Recognition: Statistical and Neural

Case 1a: K1 = K2 = … = KM = K (Continued)

Expand the quadratic form:

$$ (\mathbf{x} - M_j)^T K^{-1} (\mathbf{x} - M_j) = \mathbf{x}^T K^{-1} \mathbf{x} - \mathbf{x}^T K^{-1} M_j - M_j^T K^{-1} \mathbf{x} + M_j^T K^{-1} M_j $$

The term xᵀK⁻¹x is the same for each class and can be dropped, and the two cross terms are equal scalars since K⁻¹ is symmetric. Select class Cj if the following is minimum:

$$ -2 M_j^T K^{-1} \mathbf{x} + M_j^T K^{-1} M_j - 2 \ln P(C_j) $$

Page 23: Pattern Recognition: Statistical and Neural

M-Class Gaussian, Case 1: K1 = K2 = … = KM = K, Equivalent Rule

Select class Cj if Lj(x) is MAXIMUM, where

$$ L_j(\mathbf{x}) = \underbrace{M_j^T K^{-1} \mathbf{x}}_{\text{Dot Product}} \; \underbrace{- \tfrac12 M_j^T K^{-1} M_j + \ln P(C_j)}_{\text{Bias}} $$

A linear operation on the observation vector x.
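A minimal sketch of the maximum-Lj rule under illustrative shared-covariance parameters:

```python
import numpy as np

def classify_linear_m_class(x, means, K, priors):
    """Equal-covariance case: pick the class maximizing
    L_j(x) = M_j^T K^-1 x - 0.5 M_j^T K^-1 M_j + ln P(C_j)."""
    Kinv = np.linalg.inv(K)
    L = [Mj @ Kinv @ x - 0.5 * (Mj @ Kinv @ Mj) + np.log(Pj)
         for Mj, Pj in zip(means, priors)]
    return int(np.argmax(L)) + 1   # 1-based class index

# Illustrative parameters: three classes sharing one covariance matrix K.
means = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
K = np.array([[1.0, 0.2], [0.2, 1.0]])
priors = [1 / 3, 1 / 3, 1 / 3]
print(classify_linear_m_class(np.array([0.3, 2.6]), means, K, priors))  # -> 3
```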

Page 24: Pattern Recognition: Statistical and Neural

Summary

1. Gaussian Random Variables and Vectors

2. General Gaussian Problem: 2-Class Case

Special Cases: Quadratic and Linear Classifiers

3. Mahalanobis Distance

4. General Gaussian: M-Class Case

Special Cases: Quadratic and Linear Classifiers

Page 25: Pattern Recognition: Statistical and Neural


End of Lecture 10