Classification Algorithms, Lecture 17

Page 1: Classification Algorithms, Lecture 17

Page 2: Probability Theory: Apples and Oranges

Pick the red box with probability 40%, or the blue box with probability 60%.
Red box: 2 apples, 6 oranges.
Blue box: 3 apples, 1 orange.
Any piece of fruit in the chosen box is equally likely to be picked.

From Bishop, PRML.

Page 3: Probability Theory: Apples and Oranges

Same setup: red box (40%) with 2 apples and 6 oranges; blue box (60%) with 3 apples and 1 orange.

1. What is the overall probability that the selection will pick an apple?

From Bishop, PRML.

Page 4: Probability Theory: Apples and Oranges

Same setup: red box (40%) with 2 apples and 6 oranges; blue box (60%) with 3 apples and 1 orange.

2. Given that we have chosen an orange, what is the probability that the box we chose was the blue one?

From Bishop, PRML.

Page 5: Probability Theory

Consider N trials in which two variables X and Y are observed:
Total number of trials: N
Number of instances where X = x_i (column count): c_i
Number of instances where Y = y_j (row count): r_j

From Bishop, PRML.

Page 6: Probability Theory

Marginal probability: $p(X = x_i) = c_i / N$

From Bishop, PRML.

Page 7: Probability Theory

Marginal probability: $p(X = x_i) = c_i / N$
Conditional probability: $p(Y = y_j \mid X = x_i) = n_{ij} / c_i$, where $n_{ij}$ is the number of instances with both X = x_i and Y = y_j

From Bishop, PRML.

Page 8: Probability Theory

Marginal probability: $p(X = x_i) = c_i / N$
Conditional probability: $p(Y = y_j \mid X = x_i) = n_{ij} / c_i$
Joint probability: $p(X = x_i, Y = y_j) = n_{ij} / N$

From Bishop, PRML.

Page 9: Probability Theory

Sum rule: $p(X = x_i) = \sum_j p(X = x_i, Y = y_j)$

From Bishop, PRML.

Page 10: Probability Theory

Sum rule: $p(X = x_i) = \sum_j p(X = x_i, Y = y_j)$
Product rule: $p(X = x_i, Y = y_j) = p(X = x_i \mid Y = y_j)\, p(Y = y_j)$

From Bishop, PRML.

Page 11: The Rules of Probability

Sum rule: $p(X) = \sum_Y p(X, Y)$
Product rule: $p(X, Y) = p(X \mid Y)\, p(Y)$

From Bishop, PRML.

Page 12: Bayes' Theorem

$p(Y \mid X)\, p(X) = p(X \mid Y)\, p(Y)$, i.e. $p(Y \mid X) = \dfrac{p(X \mid Y)\, p(Y)}{p(X)}$

posterior ∝ likelihood × prior

From Bishop, PRML.

Page 13: Probability Theory: Apples and Oranges

Pick the red box with probability 40%, or the blue box with probability 60%.
Red box: 2 apples, 6 oranges.
Blue box: 3 apples, 1 orange.

1. What is the overall probability that the selection will pick an apple?

From Bishop, PRML.

Page 14: Sum Rule and Product Rule at Work

P(B = r) = 4/10
P(B = b) = 6/10
P(F = a | B = r) = 1/4
P(F = o | B = r) = 3/4
P(F = a | B = b) = 3/4
P(F = o | B = b) = 1/4

Using the product rule $p(X, Y) = p(X \mid Y)\, p(Y)$ and the sum rule:

$P(F = a) = P(F = a \mid B = r)\, P(B = r) + P(F = a \mid B = b)\, P(B = b) = \tfrac{1}{4}\cdot\tfrac{4}{10} + \tfrac{3}{4}\cdot\tfrac{6}{10} = \tfrac{11}{20}$

$P(F = o) = 1 - P(F = a) = \tfrac{9}{20}$
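As a quick numerical check of the calculation above, here is a minimal sketch (not from the slides) that applies the sum and product rules to the same box and fruit probabilities:

```python
# Sum and product rules for the apples-and-oranges example (exact arithmetic).
from fractions import Fraction as F

p_box = {"r": F(4, 10), "b": F(6, 10)}          # prior P(B)
p_fruit_given_box = {                            # conditional P(F | B)
    "r": {"a": F(1, 4), "o": F(3, 4)},
    "b": {"a": F(3, 4), "o": F(1, 4)},
}

# Sum rule over boxes; each term uses the product rule P(F, B) = P(F | B) P(B).
p_apple = sum(p_fruit_given_box[b]["a"] * p_box[b] for b in p_box)
p_orange = sum(p_fruit_given_box[b]["o"] * p_box[b] for b in p_box)
print(p_apple, p_orange)   # 11/20 9/20
```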

Page 15: Probability Theory: Apples and Oranges

Pick the red box with probability 40%, or the blue box with probability 60%.
Red box: 2 apples, 6 oranges.
Blue box: 3 apples, 1 orange.

2. Given that we have chosen an orange, what is the probability that the box we chose was the blue one?

Page 16: Sum Rule and Product Rule at Work

P(B = r) = 4/10
P(B = b) = 6/10
P(F = a | B = r) = 1/4
P(F = o | B = r) = 3/4
P(F = a | B = b) = 3/4
P(F = o | B = b) = 1/4

By Bayes' theorem:

$P(B = b \mid F = o) = \dfrac{P(F = o \mid B = b)\, P(B = b)}{P(F = o)} = \tfrac{1}{4}\cdot\tfrac{6}{10}\cdot\tfrac{20}{9} = \tfrac{1}{3}$
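The posterior can be checked numerically as well; the following sketch (again not part of the slides) applies Bayes' theorem directly:

```python
# Bayes' theorem for the posterior over boxes, given that an orange was drawn.
from fractions import Fraction as F

p_box = {"r": F(4, 10), "b": F(6, 10)}
p_orange_given_box = {"r": F(3, 4), "b": F(1, 4)}

p_orange = sum(p_orange_given_box[b] * p_box[b] for b in p_box)        # evidence, 9/20
p_blue_given_orange = p_orange_given_box["b"] * p_box["b"] / p_orange  # posterior
print(p_blue_given_orange)   # 1/3
```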

Page 17: Classification

Likelihood ratio test:
- Assume we are to classify an object based on the evidence provided by a measurement (or feature vector) x.
- A reasonable decision rule is the following: "Choose the class that is most probable given the observed feature vector x."
- More formally: evaluate the posterior probability of each class, P(Ci | x), and choose the class with the largest P(Ci | x).

Page 18: Classification

Likelihood ratio test:
- Let us examine the implications of this decision rule for a 2-class problem.
- In this case the decision rule becomes:

$\text{if } P(C_1 \mid x) > P(C_2 \mid x) \Rightarrow x \in C_1, \quad \text{otherwise } x \in C_2$

Page 19: Classification

Likelihood ratio test:
- Let us examine the implications of this decision rule for a 2-class problem.
- In this case the decision rule becomes: if $P(C_1 \mid x) > P(C_2 \mid x)$ then $x \in C_1$, otherwise $x \in C_2$.
- More compactly:

$P(C_1 \mid x) \;\underset{C_2}{\overset{C_1}{\gtrless}}\; P(C_2 \mid x)$

Page 20: Classification

Likelihood ratio test:
- Let us examine the implications of this decision rule for a 2-class problem.
- More compactly: $P(C_1 \mid x) \;\underset{C_2}{\overset{C_1}{\gtrless}}\; P(C_2 \mid x)$
- From Bayes' rule:

$\dfrac{P(x \mid C_1)\, P(C_1)}{P(x)} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; \dfrac{P(x \mid C_2)\, P(C_2)}{P(x)}$

Page 21: Classification

Likelihood ratio test:
- Let us examine the implications of this decision rule for a 2-class problem.
- More compactly: $P(C_1 \mid x) \;\underset{C_2}{\overset{C_1}{\gtrless}}\; P(C_2 \mid x)$
- From Bayes' rule:

$\dfrac{P(x \mid C_1)\, P(C_1)}{P(x)} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; \dfrac{P(x \mid C_2)\, P(C_2)}{P(x)}$

Cancelling P(x) on both sides:

$P(x \mid C_1)\, P(C_1) \;\underset{C_2}{\overset{C_1}{\gtrless}}\; P(x \mid C_2)\, P(C_2)
\quad\Longrightarrow\quad
\Lambda(x) = \dfrac{P(x \mid C_1)}{P(x \mid C_2)} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; \dfrac{P(C_2)}{P(C_1)}$

Page 22: Classification

Likelihood ratio test:
- Let us examine the implications of this decision rule for a 2-class problem.
- More compactly: $P(C_1 \mid x) \;\underset{C_2}{\overset{C_1}{\gtrless}}\; P(C_2 \mid x)$
- From Bayes' rule:

$\dfrac{P(x \mid C_1)\, P(C_1)}{P(x)} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; \dfrac{P(x \mid C_2)\, P(C_2)}{P(x)}
\quad\Longrightarrow\quad
\Lambda(x) = \dfrac{P(x \mid C_1)}{P(x \mid C_2)} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; \dfrac{P(C_2)}{P(C_1)}$

- The term $\Lambda(x)$ is called the likelihood ratio.

Page 23: An Example

Likelihood ratio test:

$P(x \mid C_1) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x-4)^2} \qquad
P(x \mid C_2) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x-10)^2}$

(Let us assume equal priors: P(C1) = P(C2).)

From Gutierrez-Osuna.

Page 24: An Example

Likelihood ratio test with equal priors:

$\Lambda(x) = \dfrac{P(x \mid C_1)}{P(x \mid C_2)} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; 1$

$\Lambda(x) = \dfrac{\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x-4)^2}}{\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x-10)^2}}
= \dfrac{e^{-\frac{1}{2}(x-4)^2}}{e^{-\frac{1}{2}(x-10)^2}} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; 1$

Taking logs:

$\log \Lambda(x) = -\tfrac{1}{2}(x-4)^2 + \tfrac{1}{2}(x-10)^2 \;\underset{C_2}{\overset{C_1}{\gtrless}}\; 0$

Page 25: An Example

$\Lambda(x) = \dfrac{P(x \mid C_1)}{P(x \mid C_2)} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; 1
\quad\Longrightarrow\quad
\log \Lambda(x) = -\tfrac{1}{2}(x-4)^2 + \tfrac{1}{2}(x-10)^2 \;\underset{C_2}{\overset{C_1}{\gtrless}}\; 0$

Solving for x gives the decision threshold:

$x \;\underset{C_2}{\overset{C_1}{\lessgtr}}\; 7$

i.e. choose C1 if x < 7 and C2 if x > 7.
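A minimal sketch of this likelihood-ratio test in code; the means (4 and 10), unit variances, and equal priors are those of the example above, and the printed decisions confirm the threshold at x = 7:

```python
import math

def gaussian_pdf(x, mu, sigma=1.0):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

def lrt_classify(x, mu1=4.0, mu2=10.0, prior1=0.5, prior2=0.5):
    """Choose C1 if the likelihood ratio exceeds P(C2)/P(C1), else C2."""
    ratio = gaussian_pdf(x, mu1) / gaussian_pdf(x, mu2)
    return "C1" if ratio > prior2 / prior1 else "C2"

print(lrt_classify(6.9), lrt_classify(7.1))   # C1 C2: the decision boundary is at x = 7
```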

Page 26: Variants

Maximum A Posteriori (MAP) criterion:

$\Lambda(x) = \dfrac{P(x \mid C_1)\, P(C_1)/P(x)}{P(x \mid C_2)\, P(C_2)/P(x)}
= \dfrac{P(C_1 \mid x)}{P(C_2 \mid x)} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; 1$

Maximum Likelihood (ML) criterion (assumes equal priors, $P(C_1) = P(C_2)$):

$\Lambda(x) = \dfrac{P(x \mid C_1)}{P(x \mid C_2)} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; 1$

Page 27: Discriminant Functions

- All the decision rules we have presented in this lecture have the same structure: at each point x in feature space, choose the class Ci that maximizes (or minimizes) some measure gi(x).
- This structure can be formalized with a set of discriminant functions gi(x), i = 1..C, and the decision rule "assign x to class Ci if gi(x) > gj(x) for all j ≠ i".
- Therefore, we can visualize the decision rule as a network or machine that computes C discriminant functions and selects the category corresponding to the largest discriminant; a code sketch of this structure follows below.

Criterion: Discriminant function
MAP: gi(x) = P(Ci | x)
ML:  gi(x) = P(x | Ci)

[Figure: features x1, x2, ..., xd feed discriminant functions g1(x), g2(x), ..., gC(x); the class assignment selects the maximum gi(x).]

From Gutierrez-Osuna.
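The following sketch (not from the slides) illustrates the argmax-over-discriminants structure; the two example discriminants are hypothetical stand-ins for whatever gi(x) the chosen criterion defines:

```python
import numpy as np

def classify(x, discriminants):
    """Assign x to the class whose discriminant g_i(x) is largest."""
    scores = [g(x) for g in discriminants]
    return int(np.argmax(scores))

# Hypothetical discriminants: unnormalized Gaussian log-likelihoods around two means.
g1 = lambda x: -0.5 * float(np.sum((x - np.array([0.0, 0.0])) ** 2))
g2 = lambda x: -0.5 * float(np.sum((x - np.array([3.0, 3.0])) ** 2))

print(classify(np.array([0.5, 0.2]), [g1, g2]))   # 0, i.e. the first class
```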

Page 28: Quadratic Classifiers

Bayes classifiers for normally distributed classes:
- Case 1: Σi = σ²I
- Case 2: Σi = Σ (Σ diagonal)
- Case 3: Σi = Σ (Σ non-diagonal)
- Case 4: Σi = σi²I
- Case 5: Σi ≠ Σj (general case)

From Gutierrez-Osuna. From Duda, Hart and Stork.

Page 29: Quadratic Classifiers

- Bayes classifiers for normally distributed classes.
- As we will show, for classes that are normally distributed, this family of discriminant functions can be reduced to very simple expressions.

$\text{choose } C_i \text{ if } g_i(x) > g_j(x)\ \forall j \neq i, \quad \text{where } g_i(x) = P(C_i \mid x) \text{ (MAP)}$

[Figure: the discriminant-function network from Page 27, repeated.]

From Gutierrez-Osuna.

Page 30: Quadratic Classifiers

Bayes classifiers for normally distributed classes:

$\text{choose } C_i \text{ if } g_i(x) > g_j(x)\ \forall j \neq i, \quad \text{where } g_i(x) = P(C_i \mid x) \text{ (MAP)}$

Gaussian distribution:

$P(x) = \dfrac{1}{(2\pi)^{D/2}\, |\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$

where $\mu$ is the D-dimensional mean vector, $\Sigma$ is the D×D covariance matrix, and $|\Sigma|$ is the determinant of the covariance matrix.

Bayes' rule:

$g_i(x) = P(C_i \mid x) = \dfrac{P(x \mid C_i)\, P(C_i)}{P(x)}
= \dfrac{1}{(2\pi)^{D/2}\, |\Sigma_i|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)\right) P(C_i)\, \dfrac{1}{P(x)}$

Page 31: Quadratic Classifiers

Bayes' rule for the Gaussian distribution (after eliminating constants that are the same for every class):

$g_i(x) = |\Sigma_i|^{-1/2} \exp\!\left(-\tfrac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)\right) P(C_i)$

Taking natural logs (the logarithm is a monotonically increasing function, so the ordering of the discriminants is unchanged):

$g_i(x) = -\tfrac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \tfrac{1}{2}\log|\Sigma_i| + \log P(C_i)$

This is called the quadratic discriminant function.
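As a sketch, the quadratic discriminant above can be evaluated directly from a class mean, covariance, and prior; the values below are made up for illustration:

```python
import numpy as np

def quadratic_discriminant(x, mu, cov, prior):
    """g_i(x) = -1/2 (x-mu)^T Sigma_i^-1 (x-mu) - 1/2 log|Sigma_i| + log P(C_i)."""
    diff = x - mu
    maha = float(diff @ np.linalg.inv(cov) @ diff)
    return -0.5 * maha - 0.5 * np.log(np.linalg.det(cov)) + np.log(prior)

# Hypothetical two-class problem: pick the class with the larger discriminant.
x = np.array([1.0, 2.0])
g1 = quadratic_discriminant(x, np.array([0.0, 0.0]), np.eye(2), 0.5)
g2 = quadratic_discriminant(x, np.array([4.0, 4.0]), 2.0 * np.eye(2), 0.5)
print("C1" if g1 > g2 else "C2")
```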

Page 32: Case 1: Σi = σ²I

This situation occurs when the features are statistically independent with the same variance for all classes. In this case the quadratic discriminant function becomes:

$g_i(x) = -\tfrac{1}{2}(x-\mu_i)^T (\sigma^2 I)^{-1} (x-\mu_i) - \tfrac{1}{2}\log|\sigma^2 I| + \log P(C_i)$

$g_i(x) = -\dfrac{1}{2\sigma^2}(x-\mu_i)^T (x-\mu_i) - \tfrac{1}{2} D \log\sigma^2 + \log P(C_i)$

Dropping the second term, which is the same for all classes:

$g_i(x) = -\dfrac{1}{2\sigma^2}(x-\mu_i)^T (x-\mu_i) + \log P(C_i)$

Page 33: Case 1: Σi = σ²I

Expanding the quadratic term:

$g_i(x) = -\dfrac{1}{2\sigma^2}\left(x^T x - x^T\mu_i - \mu_i^T x + \mu_i^T\mu_i\right) + \log P(C_i)
= -\dfrac{1}{2\sigma^2}\left(x^T x - 2\mu_i^T x + \mu_i^T\mu_i\right) + \log P(C_i)$

Ignoring $x^T x$, as it is constant for all classes:

$g_i(x) = -\dfrac{1}{2\sigma^2}\left(-2\mu_i^T x + \mu_i^T\mu_i\right) + \log P(C_i)$

Page 34: Case 1: Σi = σ²I

From the expansion on the previous page, the discriminant can be written in linear form:

$g_i(x) = w_i^T x + w_{i0}, \quad \text{where} \quad
w_i = \dfrac{\mu_i}{\sigma^2}, \qquad
w_{i0} = -\dfrac{1}{2\sigma^2}\mu_i^T\mu_i + \log P(C_i)$

Page 35: Case 1: Σi = σ²I

Discriminant function form:

$g_i(x) = w_i^T x + w_{i0}, \quad \text{where} \quad
w_i = \dfrac{\mu_i}{\sigma^2}, \qquad
w_{i0} = -\dfrac{1}{2\sigma^2}\mu_i^T\mu_i + \log P(C_i)$

- Since the discriminant is linear, the decision boundaries gi(x) = gj(x) will be hyperplanes.
- If we assume equal priors, the log P(Ci) term can be dropped:

$g_i(x) = -\dfrac{1}{2\sigma^2}(x-\mu_i)^T (x-\mu_i)$

- This is the nearest mean classifier; a small code sketch follows below.
- If the variance is one (σ² = 1), the distance becomes the Euclidean distance.
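A small sketch of the resulting nearest-mean classifier; equal priors and σ² = 1 are assumed, and the class means are made up:

```python
import numpy as np

def nearest_mean(x, means):
    """Case 1 with equal priors: assign x to the class with the closest mean."""
    dists = [float(np.sum((x - mu) ** 2)) for mu in means]
    return int(np.argmin(dists))

means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]   # hypothetical class means
print(nearest_mean(np.array([1.0, 1.5]), means))        # 0, i.e. the first class
```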

Page 36: Case 1: Σi = σ²I (illustration).

From Gutierrez-Osuna.

Page 37: Case 2: Σi = Σ (Σ diagonal)

The classes still have the same covariance matrix, but the features are allowed to have different variances. In this case the quadratic discriminant function becomes

$g_i(x) = -\tfrac{1}{2}\sum_{k=1}^{D} \dfrac{(x_k - \mu_{i,k})^2}{\sigma_k^2} + \log P(C_i)$

Expanding each square and eliminating the $x_k^2$ terms, which are constant for all classes, leaves a linear discriminant.

- This discriminant is linear, so the decision boundaries gi(x) = gj(x) will also be hyperplanes.
- The loci of constant probability are hyper-ellipses aligned with the feature axes.
- Note that the only difference from the previous classifier is that the distance along each axis is normalized by the variance of that axis.

Page 38: Case 2: Σi = Σ (Σ diagonal) (illustration).

From Gutierrez-Osuna.

Page 39: Case 3: Σi = Σ (Σ non-diagonal)

In this case all the classes have the same covariance matrix, but it is no longer diagonal. The quadratic discriminant becomes

$g_i(x) = -\tfrac{1}{2}(x-\mu_i)^T \Sigma^{-1} (x-\mu_i) - \tfrac{1}{2}\log|\Sigma| + \log P(C_i)$

Eliminating the constant log|Σ| term:

$g_i(x) = -\tfrac{1}{2}(x-\mu_i)^T \Sigma^{-1} (x-\mu_i) + \log P(C_i)$

- The quadratic term is called the Mahalanobis distance.
- The Mahalanobis distance is a vector distance that uses a Σ⁻¹ norm:

$\|x - y\|^2_{\Sigma^{-1}} = (x-y)^T \Sigma^{-1} (x-y)$

- Σ⁻¹ can be thought of as a stretching factor on the space.
- Note that for an identity covariance matrix (Σ = I), the Mahalanobis distance becomes the familiar Euclidean distance.
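A brief sketch of the Mahalanobis distance as a Σ⁻¹-weighted norm; the covariance matrix below is illustrative, and the identity-covariance call shows the reduction to the Euclidean distance:

```python
import numpy as np

def mahalanobis_sq(x, y, cov):
    """Squared Mahalanobis distance (x - y)^T Sigma^-1 (x - y)."""
    diff = x - y
    return float(diff @ np.linalg.inv(cov) @ diff)

cov = np.array([[2.0, 0.5], [0.5, 1.0]])            # hypothetical shared covariance
x, mu = np.array([1.0, 0.0]), np.array([0.0, 0.0])
print(mahalanobis_sq(x, mu, cov))                   # Sigma^-1-weighted squared distance
print(mahalanobis_sq(x, mu, np.eye(2)))             # 1.0: Euclidean distance for Sigma = I
```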

Page 40: Case 3: Σi = Σ (Σ non-diagonal)

Expansion of the quadratic term in the discriminant yields

$g_i(x) = -\tfrac{1}{2}\left(x^T\Sigma^{-1}x - 2\mu_i^T\Sigma^{-1}x + \mu_i^T\Sigma^{-1}\mu_i\right) + \log P(C_i)$

Removing the term $x^T\Sigma^{-1}x$, which is constant for all classes, and reorganizing terms, we obtain

$g_i(x) = \mu_i^T\Sigma^{-1}x - \tfrac{1}{2}\mu_i^T\Sigma^{-1}\mu_i + \log P(C_i)$

$g_i(x) = w_i^T x + w_{i0}, \quad \text{where} \quad
w_i = \Sigma^{-1}\mu_i, \qquad
w_{i0} = -\tfrac{1}{2}\mu_i^T\Sigma^{-1}\mu_i + \log P(C_i)$

- This discriminant is linear, so the decision boundaries will also be hyperplanes.
- The constant probability loci are hyper-ellipses aligned with the eigenvectors of Σ.
- If we can assume equal priors, the classifier becomes a minimum (Mahalanobis) distance classifier:

$g_i(x) = -\tfrac{1}{2}(x-\mu_i)^T\Sigma^{-1}(x-\mu_i)$

Page 41: Case 3: Σi = Σ (Σ non-diagonal)

Summary of the previous page:

$g_i(x) = \mu_i^T\Sigma^{-1}x - \tfrac{1}{2}\mu_i^T\Sigma^{-1}\mu_i + \log P(C_i)
= w_i^T x + w_{i0}, \quad
w_i = \Sigma^{-1}\mu_i, \quad
w_{i0} = -\tfrac{1}{2}\mu_i^T\Sigma^{-1}\mu_i + \log P(C_i)$

- This discriminant is linear, so the decision boundaries will also be hyperplanes.
- The constant probability loci are hyper-ellipses aligned with the eigenvectors of Σ.
- With equal priors, the classifier becomes a minimum (Mahalanobis) distance classifier: $g_i(x) = -\tfrac{1}{2}(x-\mu_i)^T\Sigma^{-1}(x-\mu_i)$.

Page 42: Case 3: Σi = Σ (Σ non-diagonal) (illustration).

From Gutierrez-Osuna.

Page 43: Case 4: Σi = σi²I

In this case each class has a different covariance matrix, which is proportional to the identity matrix. The quadratic discriminant becomes

$g_i(x) = -\tfrac{1}{2}(x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i) - \tfrac{1}{2}\log|\Sigma_i| + \log P(C_i)$

$g_i(x) = -\dfrac{1}{2\sigma_i^2}(x-\mu_i)^T(x-\mu_i) - \tfrac{1}{2} D \log\sigma_i^2 + \log P(C_i)$

This expression cannot be reduced further, so:
- The decision boundaries are quadratic (hyper-ellipses).
- The loci of constant probability are hyper-spheres centered on the class means.

Page 44: Case 4: Σi = σi²I (illustration).

From Gutierrez-Osuna.

Page 45: Case 5: Σi ≠ Σj (general case)

We already derived the expression for the general case at the beginning of this discussion:

$g_i(x) = -\tfrac{1}{2}(x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i) - \tfrac{1}{2}\log|\Sigma_i| + \log P(C_i)$

Reorganizing terms in a quadratic form yields

$g_i(x) = x^T W_i x + w_i^T x + w_{i0}, \quad \text{where} \quad
W_i = -\tfrac{1}{2}\Sigma_i^{-1}, \qquad
w_i = \Sigma_i^{-1}\mu_i, \qquad
w_{i0} = -\tfrac{1}{2}\mu_i^T\Sigma_i^{-1}\mu_i - \tfrac{1}{2}\log|\Sigma_i| + \log P(C_i)$

- The loci of constant probability for each class are hyper-ellipses, oriented with the eigenvectors of Σi for that class.
- The decision boundaries are again quadratic: hyper-ellipses or hyper-paraboloids.
- Notice that the quadratic expression in the discriminant is proportional to the Mahalanobis distance using the class-conditional covariance Σi.
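As a sketch, the general-case discriminant can be precomputed in its (Wi, wi, wi0) form and then evaluated per point; all class parameters below are hypothetical:

```python
import numpy as np

def quadratic_params(mu, cov, prior):
    """Case 5: parameters of g_i(x) = x^T W_i x + w_i^T x + w_i0."""
    cov_inv = np.linalg.inv(cov)
    W = -0.5 * cov_inv
    w = cov_inv @ mu
    w0 = -0.5 * (mu @ cov_inv @ mu) - 0.5 * np.log(np.linalg.det(cov)) + np.log(prior)
    return W, w, float(w0)

def g(x, params):
    W, w, w0 = params
    return float(x @ W @ x + w @ x + w0)

# Hypothetical two-class problem with different class covariances.
p1 = quadratic_params(np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 2.0]]), 0.5)
p2 = quadratic_params(np.array([3.0, 3.0]), np.array([[2.0, 0.3], [0.3, 1.0]]), 0.5)
x = np.array([1.0, 1.0])
print("C1" if g(x, p1) > g(x, p2) else "C2")
```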

Page 46: Case 5: Σi ≠ Σj (illustration).

From Gutierrez-Osuna.

Page 47: Naïve Bayes Classifier: An Example

Day  Outlook   Temperature  Humidity  Wind    Play
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

From Machine Learning, Mitchell.

Page 48: Naïve Bayes Classifier

New instance: Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong.

Predict the target value: Play = yes or Play = no.

Page 49: Naïve Bayes Classifier

New instance: Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong. Predict the target value: Play = yes or Play = no.

Using Bayes' rule we can write:

$P(\text{Play}=\text{yes} \mid \text{sunny},\text{cool},\text{high},\text{strong}) =
\dfrac{P(\text{sunny},\text{cool},\text{high},\text{strong} \mid \text{Play}=\text{yes})\, P(\text{Play}=\text{yes})}
{\sum_{\text{play}_i \in \{\text{yes},\text{no}\}} P(\text{sunny},\text{cool},\text{high},\text{strong} \mid \text{Play}=\text{play}_i)\, P(\text{Play}=\text{play}_i)}$

$P(\text{Play}=\text{no} \mid \text{sunny},\text{cool},\text{high},\text{strong}) =
\dfrac{P(\text{sunny},\text{cool},\text{high},\text{strong} \mid \text{Play}=\text{no})\, P(\text{Play}=\text{no})}
{\sum_{\text{play}_i \in \{\text{yes},\text{no}\}} P(\text{sunny},\text{cool},\text{high},\text{strong} \mid \text{Play}=\text{play}_i)\, P(\text{Play}=\text{play}_i)}$

Page 50: Naïve Bayes Classifier

New instance: Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong. Predict the target value: Play = yes or Play = no.

More generally, we can write the most probable target value as:

$v_{MAP} = \underset{v_j \in V}{\operatorname{argmax}}\; \dfrac{P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j)}{P(a_1, a_2, \ldots, a_n)}
= \underset{v_j \in V}{\operatorname{argmax}}\; P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j)$

Page 51: Naïve Bayes Classifier

New instance: Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong. Predict the target value: Play = yes or Play = no.

More generally, we can write the most probable target value as:

$v_{MAP} = \underset{v_j \in V}{\operatorname{argmax}}\; P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j)$

The naïve Bayes classifier is based on the simplifying assumption that the attributes/features are conditionally independent given the target value:

$P(a_1, a_2, \ldots, a_n \mid v_j) = P(a_1 \mid v_j)\, P(a_2 \mid v_j) \cdots P(a_n \mid v_j) = \prod_i P(a_i \mid v_j)$

Page 52: Naïve Bayes Classifier

New instance: Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong. Predict the target value: Play = yes or Play = no.

Combining the MAP rule with the conditional independence assumption gives the naïve Bayes decision rule:

$v_{MAP} = \underset{v_j \in V}{\operatorname{argmax}}\; P(v_j) \prod_i P(a_i \mid v_j)$

Page 53: Naïve Bayes Classifier: An Example

(Training data: the PlayTennis table from Page 47. From Machine Learning, Mitchell.)

New instance: Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong.

Page 54: Naïve Bayes Classifier

New instance: Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong.

From the training data:
P(Play = yes) = 9/14
P(Play = no) = 5/14
P(Wind = strong | Play = yes) = 3/9
P(Wind = strong | Play = no) = 3/5
(and similarly for the other attributes)

P(yes) P(sunny|yes) P(cool|yes) P(high|yes) P(strong|yes) = ?
P(no) P(sunny|no) P(cool|no) P(high|no) P(strong|no) = ?

Page 55: Naïve Bayes Classifier

New instance: Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong.

P(Play = yes) = 9/14
P(Play = no) = 5/14
P(Wind = strong | Play = yes) = 3/9
P(Wind = strong | Play = no) = 3/5

P(yes) P(sunny|yes) P(cool|yes) P(high|yes) P(strong|yes) = 0.0053
P(no) P(sunny|no) P(cool|no) P(high|no) P(strong|no) = 0.0206

Naïve Bayes classifier prediction: PlayTennis = no.
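The two scores above can be recomputed directly from the table on Page 47; the following sketch estimates every needed probability by counting and reproduces the prediction:

```python
from collections import Counter

# PlayTennis training data (Outlook, Temperature, Humidity, Wind, Play) from Page 47.
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),     ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),  ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
query = ("Sunny", "Cool", "High", "Strong")

class_counts = Counter(row[-1] for row in data)
scores = {}
for label, n_label in class_counts.items():
    score = n_label / len(data)                      # prior P(Play = label)
    for i, value in enumerate(query):                # times the product of P(a_i | label)
        n_match = sum(1 for row in data if row[-1] == label and row[i] == value)
        score *= n_match / n_label
    scores[label] = score

print(scores)                        # {'No': 0.0206, 'Yes': 0.0053} (approximately)
print(max(scores, key=scores.get))   # 'No'
```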

Page 56: Non-parametric Density Estimation

From Gutierrez-Osuna.

Page 57: Nearest Neighbor Classifier

From Gutierrez-Osuna.

Page 58: Nearest Neighbor Classifier

From Gutierrez-Osuna.

Page 59: Nearest Neighbor Classifier

From Gutierrez-Osuna.