
Page 1:

Pattern Classification

All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Page 2:

Chapter 2 (Part 1): Bayesian Decision Theory

(Sections 1-4)

1. Introduction – Bayesian Decision Theory (pure statistics, probabilities known, optimal decision)

2. Bayesian Decision Theory–Continuous Features

3. Minimum-Error-Rate Classification

4. Classifiers, Discriminant Functions and Decision Surfaces

Page 3:

1. Introduction

• The sea bass/salmon example

• State of nature, prior

• State of nature is a random variable

• The catch of salmon and sea bass is equiprobable

• P(ω1) = P(ω2) (uniform priors)

• P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity)

Page 4:

• Decision rule with only the prior information

• Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2

• Use of the class-conditional information

• P(x | ω1) and P(x | ω2) describe the difference in lightness between populations of sea bass and salmon
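A minimal sketch of the prior-only rule (the prior values are illustrative assumptions, not from the slides):

```python
# Decide using only the priors; the observation x is never consulted.
# Prior values are illustrative assumptions.
P_w1 = 2/3   # P(omega_1), e.g. sea bass
P_w2 = 1/3   # P(omega_2), e.g. salmon

decision = "omega_1" if P_w1 > P_w2 else "omega_2"
print(decision)   # omega_1 -- always the same answer, whatever the fish looks like
```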

Page 5:

Page 6:

• Posterior, likelihood, evidence

• P(ωj | x) = P(x | ωj) P(ωj) / P(x)   (Bayes rule)

• Posterior = (Likelihood × Prior) / Evidence

• where, in the case of two categories, the evidence is

$P(x) = \sum_{j=1}^{2} P(x \mid \omega_j)\, P(\omega_j)$
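A small numeric sketch of Bayes rule for two categories (the priors and likelihood values are made-up illustrative numbers):

```python
# Posterior = (likelihood * prior) / evidence, for two categories.
# All numeric values are illustrative assumptions, not values from the book.
priors = {"w1": 2/3, "w2": 1/3}        # P(omega_j)
likelihoods = {"w1": 0.5, "w2": 1.5}   # P(x | omega_j) at one observed x

evidence = sum(likelihoods[w] * priors[w] for w in priors)   # P(x)
posteriors = {w: likelihoods[w] * priors[w] / evidence for w in priors}

print(posteriors)                 # {'w1': 0.4, 'w2': 0.6}
print(sum(posteriors.values()))   # 1.0 -- the posteriors always sum to one
```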

Page 7:

Page 8:

• Decision given the posterior probabilities

x is an observation for which:

if P(ω1 | x) > P(ω2 | x), the true state of nature = ω1

if P(ω1 | x) < P(ω2 | x), the true state of nature = ω2

Therefore, whenever we observe a particular x, the probability of error is:

P(error | x) = P(ω1 | x) if we decide ω2

P(error | x) = P(ω2 | x) if we decide ω1

Page 9:

• Minimizing the probability of error

• Decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2

Therefore:

P(error | x) = min [P(ω1 | x), P(ω2 | x)]

(Bayes decision)
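Continuing the numeric sketch above, the Bayes decision picks the larger posterior, and P(error | x) is the smaller one (illustrative values, not from the slides):

```python
# Bayes decision for one observation, given the two posteriors.
# Posterior values carry over from the illustrative sketch above.
posteriors = {"w1": 0.4, "w2": 0.6}

decide = max(posteriors, key=posteriors.get)   # decide the larger posterior
p_error = min(posteriors.values())             # P(error | x) = min of the two

print(decide, p_error)   # w2 0.4
```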

Page 10:

2. Bayesian Decision Theory – Continuous Features

• Generalization of the preceding ideas

1. Use of more than one feature

2. Use of more than two states of nature

3. Allowing actions other than deciding the state of nature

4. Introduce a loss function that is more general than the probability of error

Page 11:

• Allowing actions other than classification primarily allows the possibility of rejection

• Refusing to make a decision in close or bad cases!

• The loss function states how costly each action is

Page 12:

Let {ω1, ω2, …, ωc} be the set of c states of nature

(or “categories”)

Let {α1, α2, …, αa} be the set of a possible actions

Let λ(αi | ωj) be the loss incurred for taking action αi when the state of nature is ωj

Page 13:

Overall risk

R = sum of all R(αi | x) for i = 1, …, a

Minimizing R ⟺ minimizing R(αi | x) for i = 1, …, a

Conditional risk

$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x)$
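A sketch of the conditional-risk computation with a hypothetical loss matrix (all numbers are illustrative assumptions):

```python
import numpy as np

# lam[i, j] = loss for taking action alpha_i when the true state is omega_j.
# Loss matrix and posteriors are illustrative assumptions.
lam = np.array([[0.0, 2.0],
                [1.0, 0.0]])
posteriors = np.array([0.4, 0.6])   # P(omega_j | x)

cond_risk = lam @ posteriors        # R(alpha_i | x) for each action
best_action = np.argmin(cond_risk)  # Bayes: take the minimum-risk action

print(cond_risk)     # [1.2 0.4]
print(best_action)   # 1 -> take action alpha_2
```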

Page 14:

Select the action αi for which R(αi | x) is minimum

R is then minimum, and R in this case is called the Bayes risk = the best performance that can be achieved!

Page 15:

• Two-category classification

α1: deciding ω1

α2: deciding ω2

λij = λ(αi | ωj):

the loss incurred for deciding ωi when the true state of nature is ωj

Conditional risk:

R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x)

R(α2 | x) = λ21 P(ω1 | x) + λ22 P(ω2 | x)

Page 16:

Our rule is the following:

if R(α1 | x) < R(α2 | x)

action α1: “decide ω1” is taken

This results in the equivalent rule:

decide ω1 if:

(λ21 − λ11) P(x | ω1) P(ω1) > (λ12 − λ22) P(x | ω2) P(ω2)

and decide ω2 otherwise
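The step from the risk comparison to this rule is short; a sketch of the algebra, using the two-category conditional risks from the previous slide and Bayes rule:

```latex
% Expand R(alpha_1 | x) < R(alpha_2 | x) with the two-category conditional risks:
\lambda_{11}P(\omega_1 \mid x) + \lambda_{12}P(\omega_2 \mid x)
   < \lambda_{21}P(\omega_1 \mid x) + \lambda_{22}P(\omega_2 \mid x)
% Collect the posterior terms:
(\lambda_{21}-\lambda_{11})\,P(\omega_1 \mid x) > (\lambda_{12}-\lambda_{22})\,P(\omega_2 \mid x)
% Multiply both sides by the evidence P(x) and apply Bayes rule:
(\lambda_{21}-\lambda_{11})\,P(x \mid \omega_1)P(\omega_1) > (\lambda_{12}-\lambda_{22})\,P(x \mid \omega_2)P(\omega_2)
```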

Page 17:

Likelihood ratio:

The preceding rule is equivalent to the following rule:

if $\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} > \frac{\lambda_{12}-\lambda_{22}}{\lambda_{21}-\lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}$

then take action α1 (decide ω1)

otherwise take action α2 (decide ω2)
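A sketch of this likelihood-ratio test (loss values, priors, and likelihoods are the illustrative assumptions used in the earlier sketches):

```python
# Likelihood-ratio test for the two-category case.
# All numeric values are illustrative assumptions.
lam11, lam12, lam21, lam22 = 0.0, 2.0, 1.0, 0.0   # lambda_ij = loss(decide i | true j)
P_w1, P_w2 = 2/3, 1/3                             # priors
px_w1, px_w2 = 0.5, 1.5                           # P(x | omega_j) at the observed x

threshold = (lam12 - lam22) / (lam21 - lam11) * (P_w2 / P_w1)
ratio = px_w1 / px_w2

decision = "alpha_1 (decide omega_1)" if ratio > threshold else "alpha_2 (decide omega_2)"
print(ratio, threshold, decision)   # 0.333... 1.0 alpha_2 (decide omega_2)
```

Note the threshold depends only on the losses and priors, not on x, which is exactly the optimal-decision property stated on the next slide.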

Page 18:

Optimal decision property

“If the likelihood ratio exceeds a threshold value independent of the input pattern x, we can take optimal actions”

Page 19:

3. Minimum-Error-Rate Classification

• Actions are decisions on classes. If action αi is taken and the true state of nature is ωj, then:

the decision is correct if i = j and in error if i ≠ j

• Seek a decision rule that minimizes the probability of error, which is the error rate

Page 20:

• Introduction of the zero-one loss function:

$\lambda(\alpha_i, \omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases} \qquad i, j = 1, \dots, c$

Therefore, the conditional risk is:

$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x) = \sum_{j \neq i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x)$

“The risk corresponding to this loss function is the average probability of error”

Page 21:

• Minimizing the risk requires maximizing P(ωi | x)

(since R(αi | x) = 1 − P(ωi | x))

• For minimum error rate:

• Decide ωi if P(ωi | x) > P(ωj | x) for all j ≠ i
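Under zero-one loss, the minimum-risk action and the maximum-posterior class coincide; a small numeric check (the posteriors are illustrative):

```python
import numpy as np

# Zero-one loss: 0 on the diagonal, 1 everywhere else.
c = 3
lam = np.ones((c, c)) - np.eye(c)
posteriors = np.array([0.2, 0.5, 0.3])   # illustrative P(omega_j | x)

risk = lam @ posteriors                  # R(alpha_i | x) = 1 - P(omega_i | x)
print(risk)                              # [0.8 0.5 0.7]
print(np.argmin(risk) == np.argmax(posteriors))   # True -- same decision
```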

Page 22:

• Decision regions and the zero-one loss function; therefore:

Let $\theta_\lambda = \frac{\lambda_{12}-\lambda_{22}}{\lambda_{21}-\lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}$; then decide ω1 if $\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} > \theta_\lambda$

• If λ is the zero-one loss function, which means

$\lambda = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, then $\theta_\lambda = \frac{P(\omega_2)}{P(\omega_1)} = \theta_a$

• If instead $\lambda = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}$, then $\theta_\lambda = \frac{2\,P(\omega_2)}{P(\omega_1)} = \theta_b$
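A quick sketch of the two thresholds (the priors are illustrative assumptions):

```python
# Thresholds induced by two different loss matrices; priors are illustrative.
P_w1, P_w2 = 2/3, 1/3

def theta(lam12, lam22, lam21, lam11):
    """theta_lambda = (lam12 - lam22) / (lam21 - lam11) * P(w2) / P(w1)."""
    return (lam12 - lam22) / (lam21 - lam11) * P_w2 / P_w1

theta_a = theta(1, 0, 1, 0)   # zero-one loss
theta_b = theta(2, 0, 1, 0)   # doubled loss for deciding omega_1 when truth is omega_2
print(theta_a, theta_b)       # 0.5 1.0 -- theta_b raises the bar for deciding omega_1
```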

Page 23:

Page 24:

4. Classifiers, Discriminant Functions and Decision Surfaces

• The multi-category case

• Set of discriminant functions gi(x), i = 1,…, c

• The classifier assigns a feature vector x to class ωi

if:

gi(x) > gj(x) for all j ≠ i
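A minimal sketch of a discriminant-based classifier; the linear discriminant functions here are arbitrary stand-ins, not from the slides:

```python
import numpy as np

# Assign x to the class whose discriminant g_i(x) is largest.
# These linear discriminants are arbitrary illustrative stand-ins.
def classify(x, weights, biases):
    g = weights @ x + biases   # g_i(x) for i = 1, ..., c
    return int(np.argmax(g))   # index of the winning class omega_i

weights = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
biases = np.array([0.0, 0.0, 0.5])
print(classify(np.array([0.2, 0.7]), weights, biases))   # 1 -> class omega_2
```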

Page 25:

Page 26:

• Let gi(x) = −R(αi | x)

(max. discriminant corresponds to min. risk!)

• For the minimum error rate, we take gi(x) = P(ωi | x)

(max. discriminant corresponds to max. posterior!)

gi(x) = P(x | ωi) P(ωi)

gi(x) = ln P(x | ωi) + ln P(ωi)

(ln: natural logarithm!)
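A sketch using the log form with hypothetical 1-D Gaussian class-conditional densities; the densities and priors are assumptions for illustration only:

```python
import math

# g_i(x) = ln P(x | omega_i) + ln P(omega_i), with hypothetical Gaussian
# class-conditional densities; all parameters are illustrative assumptions.
def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

params = [(4.0, 1.0), (7.0, 1.5)]   # (mean, std) of P(x | omega_i)
priors = [2/3, 1/3]                 # P(omega_i)

def g(x, i):
    mu, sigma = params[i]
    return math.log(gauss_pdf(x, mu, sigma)) + math.log(priors[i])

x = 5.5
print(max(range(2), key=lambda i: g(x, i)))   # 0 -> decide omega_1 at this x
```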

Page 27:

• Feature space divided into c decision regions

if gi(x) > gj(x) for all j ≠ i, then x is in Ri

(Ri means: assign x to ωi)

• The two-category case

A classifier is a “dichotomizer” that has two discriminant functions g1 and g2

Let g(x) ≡ g1(x) − g2(x)

Decide ω1 if g(x) > 0; otherwise decide ω2

Page 28:

• The computation of g(x)

$g(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x)$

$g(x) = \ln \frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}$
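A sketch of the dichotomizer using the log form, reusing the illustrative likelihoods and priors from the earlier sketches:

```python
import math

# g(x) = ln[P(x|w1)/P(x|w2)] + ln[P(w1)/P(w2)]; decide omega_1 iff g(x) > 0.
# Likelihoods and priors are the illustrative values used above.
px_w1, px_w2 = 0.5, 1.5
P_w1, P_w2 = 2/3, 1/3

g = math.log(px_w1 / px_w2) + math.log(P_w1 / P_w2)
print(g > 0)   # False -> decide omega_2, matching the posterior comparison above
```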

Page 29: