Bayesian Classification
with a brief introduction to pattern recognition
Modified from slides by Michael L. Raymer, Ph.D.
8/29/03 M. Raymer – WSU, FBS 2
The pattern recognition paradigm
• Fruit on an assembly line: oranges, grapefruit, lemons, cherries, apples
• Sensors measure: red intensity, yellow intensity, mass (kg), approximate volume
• At the end of the line, a gate switches to deposit the fruit into the correct bin
Training the algorithm
Sensors, scales, etc. measure each example fruit, producing a feature vector and a label:
Red = 2.125, Yellow = 6.143, Mass = 134.32, Volume = 24.21 → Apple
Training (2)
Many such labeled examples (feature vector → label) are collected:
Red = 2.125, Yellow = 6.143, Mass = 134.32, Volume = 24.21 → Apple
…and all of them are fed to the classifier for training.
Testing
A new, unlabeled fruit arrives:
Red = 2.125, Yellow = 6.143, Mass = 134.32, Volume = 24.21 → ?
The trained classifier must output a predicted label.
Pattern Matrix

        V1    V2    V3    V4    V5    Class
Ex 1    3.06  2.05  6.39  7.84  6.75    1
Ex 2    8.25  0.72  2.52  0.50  9.08    1
Ex 3    2.72  9.32  5.68  7.83  7.86    1
Ex 4    7.37  1.30  2.97  0.61  3.49    2
Ex 5    0.73  1.46  6.60  6.08  0.78    2
Ex 6    4.85  5.08  4.87  8.06  8.65    2
Ex 7    5.89  1.23  6.38  2.81  6.84    3
Ex 8    0.52  6.57  4.08  3.62  0.59    3
Ex 9    5.66  3.65  6.87  6.90  7.93    3
Ex 10   3.92  0.73  1.01  3.57  2.47    4
Ex 11   8.84  1.42  2.79  3.40  3.19    4
Ex 12   5.63  4.32  8.08  0.82  4.74    4
Nearest Neighbor Classification
[Scatter plot: red intensity (normalized) vs. mass (normalized), axes 0–10; an unlabeled query point “?” is assigned the class of its nearest training point.]
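The nearest-neighbor rule can be sketched in a few lines of Python; the training points below are hypothetical illustrations, not the slide's data:

```python
import math

def nearest_neighbor(query, examples):
    """Return the label of the training example closest to `query`,
    using Euclidean distance over the feature vectors."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return min(examples, key=lambda ex: dist(ex[0], query))[1]

# Hypothetical (mass, red intensity) training points, normalized to 0-10.
training = [
    ((1.3, 8.1), "cherry"),
    ((6.0, 7.5), "apple"),
    ((7.2, 2.1), "grapefruit"),
]
label = nearest_neighbor((5.5, 7.0), training)  # the apple is closest
```

With k = 1 this is exactly the picture above; a k-NN variant would take a majority vote over the k smallest distances.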
Evaluating Accuracy
[Scatter plot: red intensity (normalized) vs. mass (normalized), axes 0–10, showing training data and testing data as separate point sets.]
Problems with KNN classifiers
• Lots of memorization
• Slow (lots of distance calculations)
• Incorrect features cause problems
• Features are assumed to all be of equal importance in classification
• Odd exemplars (e.g. green/yellow apples) cause problems
• What value for k?
Distributions
• Bayesian classifiers start with an estimate of the distribution of the features
[Left: binomial distribution (discrete), P(N) vs. N = # heads in 20 tosses. Right: Gaussian distribution (continuous), P(N) vs. N.]
Density Estimation
• Parametric: assume a Gaussian (e.g.) distribution; estimate the parameters (μ, σ).
• Non-parametric: histogram sampling; bin size is critical; Gaussian smoothing can help.
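Both estimation styles can be sketched as follows; the sample diameters and the bin count are illustrative assumptions, not slide data:

```python
import math

samples = [2.9, 3.1, 3.3, 3.0, 3.4, 3.2, 3.1, 3.3]  # hypothetical apple diameters

# Parametric: assume a Gaussian and estimate mu and sigma from the data.
mu = sum(samples) / len(samples)
sigma = math.sqrt(sum((x - mu) ** 2 for x in samples) / len(samples))

def gaussian_pdf(x):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Non-parametric: histogram sampling.  The bin size (here hi-lo over 4 bins)
# is critical: too coarse blurs the shape, too fine leaves empty bins.
def histogram_density(x, data, bins=4):
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    def bin_of(v):
        return min(int((v - lo) / width), bins - 1)
    count = sum(1 for d in data if bin_of(d) == bin_of(x))
    return count / (len(data) * width)
```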
The Gaussian distribution

Univariate:

    f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)

Multivariate (d-dimensional):

    f(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\tfrac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^{\mathsf{T}} \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right)

A parametric Bayesian classifier must estimate \mu and \Sigma from the training samples.
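These densities translate directly into code. A minimal sketch (not from the slides); to keep the linear algebra trivial, the multivariate version assumes a diagonal covariance matrix:

```python
import math

def univariate_gaussian(x, mu, sigma):
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)"""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def multivariate_gaussian(x, mu, cov):
    """d-dimensional Gaussian density.  This sketch assumes `cov` is a
    *diagonal* d x d covariance matrix, so the determinant and the inverse
    in the quadratic form are trivial to compute."""
    d = len(x)
    det = 1.0
    quad = 0.0
    for i in range(d):
        det *= cov[i][i]
        quad += (x[i] - mu[i]) ** 2 / cov[i][i]
    norm = 1.0 / ((2 * math.pi) ** (d / 2) * math.sqrt(det))
    return norm * math.exp(-0.5 * quad)
```

With a diagonal covariance the multivariate density factors into a product of univariate ones, which is exactly the independence assumption used later in the slides.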
Making decisions
• Once you have the distributions for each feature and each class…
• You can ask questions like: if I have an apple, what is the probability that the diameter will be between 3.2 and 3.5 inches?
More decisions…

Non-parametric: sum the histogram counts over all bins representing 3.1 through 3.5 inches.
[Histogram: count vs. diameter.]

Parametric:

    \int_{3.1}^{3.5} \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-(x-\mu)^2 / 2\sigma^2} \, dx
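The parametric version of this question is a difference of Gaussian CDFs, computable with the error function; the values mu = 3.3 and sigma = 0.2 below are made-up apple parameters, not from the slides:

```python
import math

def gaussian_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def interval_probability(lo, hi, mu, sigma):
    """P(lo <= X <= hi): the integral of the Gaussian density over [lo, hi]."""
    return gaussian_cdf(hi, mu, sigma) - gaussian_cdf(lo, mu, sigma)

# Hypothetical apple-diameter distribution: mean 3.3", std 0.2".
p = interval_probability(3.1, 3.5, mu=3.3, sigma=0.2)  # about 0.68 (the +/- 1 sigma band)
```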
A Simple Example
• You are given a fruit with a diameter of 4" – is it a pear or an apple?
• To begin, we need to know the distributions of diameters for pears and apples.
Maximum Likelihood
[Class-conditional distributions P(x \mid \text{apple}) and P(x \mid \text{pear}) plotted against diameter x, from 1" to 6".]
What are we asking?
• If the fruit is an apple, how likely is it to have a diameter of 4"?
• If the fruit is a xenofruit from planet Xircon, how likely is it to have a diameter of 4"?
Is this the right question to ask?
A Key Problem
• We based this decision on P(x \mid \text{pear}) (the class-conditional probability)
• What we really want to use is P(\text{pear} \mid x) (the posterior probability)
• What if we found the fruit in a pear orchard?
• We need to know the prior probability of finding an apple or a pear!
Statistical decisions…
• If a fruit has a diameter of 4", how likely is it to be an apple?
[Venn diagram: the set of apples overlapping the set of 4" fruit.]
“Inverting” the question
• Given an apple, what is the probability that it will have a diameter of 4"?   P(x{=}4.0 \mid \text{apple})
• Given a 4" diameter fruit, what is the probability that it is an apple?   P(\text{apple} \mid x{=}4.0)
Prior Probabilities
• Prior probability + evidence → posterior probability
• Without evidence, what is the “prior probability” that a fruit is an apple?
The heart of it all
• Bayes Rule:

    P(\text{class} \mid \text{evidence}) = \frac{P(\text{evidence} \mid \text{class}) \, P(\text{class})}{\sum_{\text{all classes}} P(\text{evidence} \mid \text{class}) \, P(\text{class})}

Applied to the fruit example:

    P(\text{apple} \mid d{=}4'') = \frac{p(d{=}4'' \mid \text{apple}) \, P(\text{apple})}{p(d{=}4'' \mid \text{apple}) \, P(\text{apple}) + p(d{=}4'' \mid \text{pear}) \, P(\text{pear})}
Bayes Rule

    P(\omega_j \mid x) = \frac{p(x \mid \omega_j) \, P(\omega_j)}{\sum_{k=1}^{c} p(x \mid \omega_k) \, P(\omega_k)}

or

    P(\omega_j \mid x) = \frac{p(x \mid \omega_j) \, P(\omega_j)}{p(x)}
Example Revisited
• Is it an ordinary apple or an uncommon pear?

    p(d{=}4'' \mid \text{apple}) = 0.4      P(\text{apple}) = 0.1
    p(d{=}4'' \mid \text{pear}) = 0.05      P(\text{pear}) = 0.9
Bayes Rule Example

    P(\text{apple} \mid d{=}4'')
      = \frac{p(d{=}4'' \mid \text{apple}) \, P(\text{apple})}{p(d{=}4'' \mid \text{apple}) \, P(\text{apple}) + p(d{=}4'' \mid \text{pear}) \, P(\text{pear})}
      = \frac{0.4 \times 0.1}{0.4 \times 0.1 + 0.05 \times 0.9}
      = \frac{0.04}{0.085} \approx 0.47
Bayes Rule Example

    P(\text{pear} \mid d{=}4'')
      = \frac{p(d{=}4'' \mid \text{pear}) \, P(\text{pear})}{p(d{=}4'' \mid \text{apple}) \, P(\text{apple}) + p(d{=}4'' \mid \text{pear}) \, P(\text{pear})}
      = \frac{0.05 \times 0.9}{0.4 \times 0.1 + 0.05 \times 0.9}
      = \frac{0.045}{0.085} \approx 0.53
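The two posteriors above can be reproduced with a small Bayes-rule helper (the function is a generic sketch; the numbers are the slides'):

```python
def posterior(likelihoods, priors):
    """Bayes rule: P(class | x) for each class, given the class-conditional
    likelihoods p(x | class) and the prior probabilities P(class)."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())  # the shared denominator
    return {c: joint[c] / evidence for c in joint}

# Slide numbers: p(d=4"|apple)=0.4, p(d=4"|pear)=0.05, P(apple)=0.1, P(pear)=0.9.
post = posterior({"apple": 0.4, "pear": 0.05}, {"apple": 0.1, "pear": 0.9})
# post["apple"] is about 0.47, post["pear"] about 0.53
```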
Solution
(A base-rate example: here the evidence is a positive test result, “pos”, and the classes are guilt and innocence.)

    P(\text{guilt} \mid \text{pos})
      = \frac{p(\text{pos} \mid \text{guilt}) \, P(\text{guilt})}{p(\text{pos} \mid \text{guilt}) \, P(\text{guilt}) + p(\text{pos} \mid \text{innocent}) \, P(\text{innocent})}
      = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.0001 \times 0.999}
      = \frac{0.00099}{0.00099 + 0.0000999} \approx 0.908
Marginal Distributions
[Per-feature class-conditional distributions: P(x_1 \mid \text{apple}) vs. P(x_1 \mid \text{pear}), and P(x_2 \mid \text{apple}) vs. P(x_2 \mid \text{pear}).]
Combining Marginals
• Assuming independent features:

    P(x \mid \omega_j) = P(x_1 \mid \omega_j) \, P(x_2 \mid \omega_j) \cdots P(x_d \mid \omega_j)

• If we assume independence and use Bayes rule, we have a Naïve Bayes decision maker (classifier).
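The marginal product is a one-liner; the feature tables below are made-up discretized illustrations, not the slides' data:

```python
from math import prod

def naive_class_conditional(x, marginals):
    """P(x | class) = P(x1|class) * P(x2|class) * ... * P(xd|class)
    under the naive independence assumption.  `x` is a tuple of (already
    discretized) feature values; `marginals[i]` maps value -> P(value | class)."""
    return prod(marginals[i][v] for i, v in enumerate(x))

# Hypothetical per-feature tables for one class ("apple"):
apple_marginals = [
    {"red": 0.7, "green": 0.3},     # color marginal
    {"small": 0.2, "medium": 0.8},  # size marginal
]
p = naive_class_conditional(("red", "medium"), apple_marginals)  # 0.7 * 0.8 = 0.56
```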
Bayes Decision Rule

    Predict class \omega_i such that P(\omega_i \mid x) \ge P(\omega_j \mid x) for all j

• Provably optimal (minimum error rate) when the assumed model is correct; here, when the features (evidence) really do follow Gaussian distributions and really are independent.
Likelihood Ratios
• When deciding between two possibilities, we don’t need the exact probabilities; we only need to know which one is greater.
• The denominator is the same for all the classes, so it can be eliminated. This is especially useful when there are many possible classes.
Likelihood Ratio Example
Both posteriors share the same denominator:

    P(\text{pear} \mid d{=}4'') = \frac{p(d{=}4'' \mid \text{pear}) \, P(\text{pear})}{p(d{=}4'' \mid \text{apple}) \, P(\text{apple}) + p(d{=}4'' \mid \text{pear}) \, P(\text{pear})}

    P(\text{apple} \mid d{=}4'') = \frac{p(d{=}4'' \mid \text{apple}) \, P(\text{apple})}{p(d{=}4'' \mid \text{apple}) \, P(\text{apple}) + p(d{=}4'' \mid \text{pear}) \, P(\text{pear})}
Likelihood Ratio Example
So it suffices to compare the numerators, i.e. to check whether the ratio exceeds 1:

    \frac{p(d{=}4'' \mid \text{pear}) \, P(\text{pear})}{p(d{=}4'' \mid \text{apple}) \, P(\text{apple})}
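A minimal sketch of the ratio test, using the slides' apple/pear numbers:

```python
def likelihood_ratio(lik_1, prior_1, lik_2, prior_2):
    """Ratio of the two Bayes-rule numerators.  The shared denominator
    (the evidence) cancels, so a ratio > 1 favors class 1."""
    return (lik_1 * prior_1) / (lik_2 * prior_2)

# Slide numbers: p(d=4"|pear)=0.05, P(pear)=0.9 vs. p(d=4"|apple)=0.4, P(apple)=0.1.
r = likelihood_ratio(0.05, 0.9, 0.4, 0.1)  # 0.045 / 0.04 = 1.125 > 1, so predict pear
```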
In-class example: Oranges vs. Grapefruit
[Four histograms (bin vs. frequency): red intensity and mass for oranges; red intensity and mass for grapefruit.]
Example (cont’d)
• After observing several hundred fruit pass down the assembly line, we observe that 72% are oranges and 28% are grapefruit.
• Fruit ‘x’: red intensity = 8.2, mass = 7.6
What shall we predict for the class of fruit ‘x’?
The whole enchilada

    P(\text{orange} \mid 8.2, 7.6) = \frac{p(8.2, 7.6 \mid \text{orange}) \, P(\text{orange})}{p(8.2, 7.6 \mid \text{orange}) \, P(\text{orange}) + p(8.2, 7.6 \mid \text{grapefruit}) \, P(\text{grapefruit})}

and…

    p(8.2, 7.6 \mid \text{orange}) = P(\text{red}{=}8.2 \mid \text{orange}) \, P(\text{mass}{=}7.6 \mid \text{orange})   (naïve assumption)

Repeat for grapefruit and predict the more probable class.
The whole enchilada (2)

    P(\text{orange} \mid 8.2, 7.6)
      = \frac{P(\text{red}{=}8.2 \mid \text{orange}) \, P(\text{mass}{=}7.6 \mid \text{orange}) \, P(\text{orange})}{\sum_{f \in \{\text{orange}, \text{grapefruit}\}} P(\text{red}{=}8.2 \mid f) \, P(\text{mass}{=}7.6 \mid f) \, P(f)}
      = \frac{0.08 \times 0.12 \times 0.72}{0.08 \times 0.12 \times 0.72 + 0.19 \times 0.20 \times 0.28}
      \approx 0.39
The whole enchilada (3)

    P(\text{grapefruit} \mid 8.2, 7.6)
      = \frac{P(\text{red}{=}8.2 \mid \text{grapefruit}) \, P(\text{mass}{=}7.6 \mid \text{grapefruit}) \, P(\text{grapefruit})}{\sum_{f \in \{\text{orange}, \text{grapefruit}\}} P(\text{red}{=}8.2 \mid f) \, P(\text{mass}{=}7.6 \mid f) \, P(f)}
      = \frac{0.19 \times 0.20 \times 0.28}{0.08 \times 0.12 \times 0.72 + 0.19 \times 0.20 \times 0.28}
      \approx 0.61
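The full computation can be verified in a few lines (a sketch; the function name is mine, the numbers are the slides'):

```python
def naive_posteriors(feature_likelihoods, priors):
    """Naive Bayes posterior for each class: multiply the per-feature
    marginals P(x_i | class) by the prior, then normalize over classes."""
    score = {}
    for c in priors:
        p = priors[c]
        for lik in feature_likelihoods[c]:
            p *= lik
        score[c] = p
    total = sum(score.values())
    return {c: s / total for c, s in score.items()}

# Slide numbers: P(red=8.2|orange)=0.08, P(mass=7.6|orange)=0.12, P(orange)=0.72;
# P(red=8.2|grapefruit)=0.19, P(mass=7.6|grapefruit)=0.20, P(grapefruit)=0.28.
post = naive_posteriors(
    {"orange": [0.08, 0.12], "grapefruit": [0.19, 0.20]},
    {"orange": 0.72, "grapefruit": 0.28},
)
# post["orange"] is about 0.39, post["grapefruit"] about 0.61
```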
Conclusion

    P(\text{orange} \mid x) \approx 0.39
    P(\text{grapefruit} \mid x) \approx 0.61

Predict that fruit ‘x’ is a grapefruit, despite the relative scarcity of grapefruits on the conveyor belt.
Abbreviated
• Since the denominator is the same for all classes, we can just compare:

    P(\text{red}{=}8.2 \mid \text{orange}) \, P(\text{mass}{=}7.6 \mid \text{orange}) \, P(\text{orange})

and

    P(\text{red}{=}8.2 \mid \text{grapefruit}) \, P(\text{mass}{=}7.6 \mid \text{grapefruit}) \, P(\text{grapefruit})
Likelihood comparison

    P(\text{red}{=}8.2 \mid \text{orange}) \, P(\text{mass}{=}7.6 \mid \text{orange}) \, P(\text{orange}) = 0.08 \times 0.12 \times 0.72 \approx 0.0069

    P(\text{red}{=}8.2 \mid \text{grapefruit}) \, P(\text{mass}{=}7.6 \mid \text{grapefruit}) \, P(\text{grapefruit}) = 0.19 \times 0.20 \times 0.28 \approx 0.0106