Bayesian Classification
with a brief introduction to pattern recognition
Modified from slides by Michael L. Raymer, Ph.D.
8/29/03 M. Raymer – WSU, FBS 2
The pattern recognition paradigm
• Fruit on an assembly line: oranges, grapefruit, lemons, cherries, apples
• Sensors measure: red intensity, yellow intensity, mass (kg), approximate volume
• At the end of the line, a gate switches to deposit the fruit into the correct bin
Training the algorithm
Sensors, scales, etc. measure each example fruit, producing a feature vector and a label:
Red = 2.125, Yellow = 6.143, Mass = 134.32, Volume = 24.21 → Apple
Training (2)
Many such labeled examples (feature vector → label) are collected:
Red = 2.125, Yellow = 6.143, Mass = 134.32, Volume = 24.21 → Apple
…and all of them are fed to the classifier for training.
Testing
A new, unlabeled fruit arrives:
Red = 2.125, Yellow = 6.143, Mass = 134.32, Volume = 24.21 → ?
The trained classifier must output a predicted label.
Pattern Matrix

        V1    V2    V3    V4    V5    Class
Ex 1    3.06  2.05  6.39  7.84  6.75    1
Ex 2    8.25  0.72  2.52  0.50  9.08    1
Ex 3    2.72  9.32  5.68  7.83  7.86    1
Ex 4    7.37  1.30  2.97  0.61  3.49    2
Ex 5    0.73  1.46  6.60  6.08  0.78    2
Ex 6    4.85  5.08  4.87  8.06  8.65    2
Ex 7    5.89  1.23  6.38  2.81  6.84    3
Ex 8    0.52  6.57  4.08  3.62  0.59    3
Ex 9    5.66  3.65  6.87  6.90  7.93    3
Ex 10   3.92  0.73  1.01  3.57  2.47    4
Ex 11   8.84  1.42  2.79  3.40  3.19    4
Ex 12   5.63  4.32  8.08  0.82  4.74    4
Nearest Neighbor Classification
[Scatter plot: red intensity (normalized) vs. mass (normalized), axes 0–10; an unlabeled query point “?” is assigned the class of its nearest training point.]
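The nearest-neighbor rule can be sketched in a few lines of Python; the training points below are hypothetical illustrations, not the slide's data:

```python
import math

def nearest_neighbor(query, examples):
    """Return the label of the training example closest to `query`,
    using Euclidean distance over the feature vectors."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return min(examples, key=lambda ex: dist(ex[0], query))[1]

# Hypothetical (mass, red intensity) training points, normalized to 0-10.
training = [
    ((1.3, 8.1), "cherry"),
    ((6.0, 7.5), "apple"),
    ((7.2, 2.1), "grapefruit"),
]
label = nearest_neighbor((5.5, 7.0), training)  # the apple is closest
```

With k = 1 this is exactly the picture above; a k-NN variant would take a majority vote over the k smallest distances.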
Evaluating Accuracy
[Scatter plot: red intensity (normalized) vs. mass (normalized), axes 0–10, showing training data and testing data as separate point sets.]
Problems with KNN classifiers
• Lots of memorization
• Slow (lots of distance calculations)
• Incorrect features cause problems
• Features are assumed to all be of equal importance in classification
• Odd exemplars (e.g. green/yellow apples) cause problems
• What value for k?
Distributions
• Bayesian classifiers start with an estimate of the distribution of the features
[Left: binomial distribution (discrete), P(N) vs. N = # heads in 20 tosses. Right: Gaussian distribution (continuous), P(N) vs. N.]
Density Estimation
• Parametric: assume a Gaussian (e.g.) distribution; estimate the parameters (μ, σ).
• Non-parametric: histogram sampling; bin size is critical; Gaussian smoothing can help.
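Both estimation styles can be sketched as follows; the sample diameters and the bin count are illustrative assumptions, not slide data:

```python
import math

samples = [2.9, 3.1, 3.3, 3.0, 3.4, 3.2, 3.1, 3.3]  # hypothetical apple diameters

# Parametric: assume a Gaussian and estimate mu and sigma from the data.
mu = sum(samples) / len(samples)
sigma = math.sqrt(sum((x - mu) ** 2 for x in samples) / len(samples))

def gaussian_pdf(x):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Non-parametric: histogram sampling.  The bin size (here hi-lo over 4 bins)
# is critical: too coarse blurs the shape, too fine leaves empty bins.
def histogram_density(x, data, bins=4):
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    def bin_of(v):
        return min(int((v - lo) / width), bins - 1)
    count = sum(1 for d in data if bin_of(d) == bin_of(x))
    return count / (len(data) * width)
```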
The Gaussian distribution

Univariate:

    f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)

Multivariate (d-dimensional):

    f(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\tfrac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^{\mathsf{T}} \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right)

A parametric Bayesian classifier must estimate \mu and \Sigma from the training samples.
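These densities translate directly into code. A minimal sketch (not from the slides); to keep the linear algebra trivial, the multivariate version assumes a diagonal covariance matrix:

```python
import math

def univariate_gaussian(x, mu, sigma):
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)"""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def multivariate_gaussian(x, mu, cov):
    """d-dimensional Gaussian density.  This sketch assumes `cov` is a
    *diagonal* d x d covariance matrix, so the determinant and the inverse
    in the quadratic form are trivial to compute."""
    d = len(x)
    det = 1.0
    quad = 0.0
    for i in range(d):
        det *= cov[i][i]
        quad += (x[i] - mu[i]) ** 2 / cov[i][i]
    norm = 1.0 / ((2 * math.pi) ** (d / 2) * math.sqrt(det))
    return norm * math.exp(-0.5 * quad)
```

With a diagonal covariance the multivariate density factors into a product of univariate ones, which is exactly the independence assumption used later in the slides.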
Making decisions
• Once you have the distributions for each feature and each class…
• You can ask questions like: if I have an apple, what is the probability that the diameter will be between 3.2 and 3.5 inches?
More decisions…

Non-parametric: sum the histogram counts over all bins representing 3.1 through 3.5 inches.
[Histogram: count vs. diameter.]

Parametric:

    \int_{3.1}^{3.5} \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-(x-\mu)^2 / 2\sigma^2} \, dx
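The parametric version of this question is a difference of Gaussian CDFs, computable with the error function; the values mu = 3.3 and sigma = 0.2 below are made-up apple parameters, not from the slides:

```python
import math

def gaussian_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def interval_probability(lo, hi, mu, sigma):
    """P(lo <= X <= hi): the integral of the Gaussian density over [lo, hi]."""
    return gaussian_cdf(hi, mu, sigma) - gaussian_cdf(lo, mu, sigma)

# Hypothetical apple-diameter distribution: mean 3.3", std 0.2".
p = interval_probability(3.1, 3.5, mu=3.3, sigma=0.2)  # about 0.68 (the +/- 1 sigma band)
```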
A Simple Example
• You are given a fruit with a diameter of 4" – is it a pear or an apple?
• To begin, we need to know the distributions of diameters for pears and apples.
Maximum Likelihood
[Class-conditional distributions P(x \mid \text{apple}) and P(x \mid \text{pear}) plotted against diameter x, from 1" to 6".]
What are we asking?
• If the fruit is an apple, how likely is it to have a diameter of 4"?
• If the fruit is a xenofruit from planet Xircon, how likely is it to have a diameter of 4"?
Is this the right question to ask?
A Key Problem
• We based this decision on P(x \mid \text{pear}) (the class-conditional probability)
• What we really want to use is P(\text{pear} \mid x) (the posterior probability)
• What if we found the fruit in a pear orchard?
• We need to know the prior probability of finding an apple or a pear!
Statistical decisions…
• If a fruit has a diameter of 4", how likely is it to be an apple?
[Venn diagram: the set of apples overlapping the set of 4" fruit.]
“Inverting” the question
• Given an apple, what is the probability that it will have a diameter of 4"?   P(x{=}4.0 \mid \text{apple})
• Given a 4" diameter fruit, what is the probability that it is an apple?   P(\text{apple} \mid x{=}4.0)
Prior Probabilities
• Prior probability + evidence → posterior probability
• Without evidence, what is the “prior probability” that a fruit is an apple?
The heart of it all
• Bayes Rule:

    P(\text{class} \mid \text{evidence}) = \frac{P(\text{evidence} \mid \text{class}) \, P(\text{class})}{\sum_{\text{all classes}} P(\text{evidence} \mid \text{class}) \, P(\text{class})}

Applied to the fruit example:

    P(\text{apple} \mid d{=}4'') = \frac{p(d{=}4'' \mid \text{apple}) \, P(\text{apple})}{p(d{=}4'' \mid \text{apple}) \, P(\text{apple}) + p(d{=}4'' \mid \text{pear}) \, P(\text{pear})}
Bayes Rule

    P(\omega_j \mid x) = \frac{p(x \mid \omega_j) \, P(\omega_j)}{\sum_{k=1}^{c} p(x \mid \omega_k) \, P(\omega_k)}

or

    P(\omega_j \mid x) = \frac{p(x \mid \omega_j) \, P(\omega_j)}{p(x)}
Example Revisited
• Is it an ordinary apple or an uncommon pear?

    p(d{=}4'' \mid \text{apple}) = 0.4      P(\text{apple}) = 0.1
    p(d{=}4'' \mid \text{pear}) = 0.05      P(\text{pear}) = 0.9
Bayes Rule Example

    P(\text{apple} \mid d{=}4'')
      = \frac{p(d{=}4'' \mid \text{apple}) \, P(\text{apple})}{p(d{=}4'' \mid \text{apple}) \, P(\text{apple}) + p(d{=}4'' \mid \text{pear}) \, P(\text{pear})}
      = \frac{0.4 \times 0.1}{0.4 \times 0.1 + 0.05 \times 0.9}
      = \frac{0.04}{0.085} \approx 0.47
Bayes Rule Example

    P(\text{pear} \mid d{=}4'')
      = \frac{p(d{=}4'' \mid \text{pear}) \, P(\text{pear})}{p(d{=}4'' \mid \text{apple}) \, P(\text{apple}) + p(d{=}4'' \mid \text{pear}) \, P(\text{pear})}
      = \frac{0.05 \times 0.9}{0.4 \times 0.1 + 0.05 \times 0.9}
      = \frac{0.045}{0.085} \approx 0.53
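The two posteriors above can be reproduced with a small Bayes-rule helper (the function is a generic sketch; the numbers are the slides'):

```python
def posterior(likelihoods, priors):
    """Bayes rule: P(class | x) for each class, given the class-conditional
    likelihoods p(x | class) and the prior probabilities P(class)."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())  # the shared denominator
    return {c: joint[c] / evidence for c in joint}

# Slide numbers: p(d=4"|apple)=0.4, p(d=4"|pear)=0.05, P(apple)=0.1, P(pear)=0.9.
post = posterior({"apple": 0.4, "pear": 0.05}, {"apple": 0.1, "pear": 0.9})
# post["apple"] is about 0.47, post["pear"] about 0.53
```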
Solution
(A base-rate example: here the evidence is a positive test result, “pos”, and the classes are guilt and innocence.)

    P(\text{guilt} \mid \text{pos})
      = \frac{p(\text{pos} \mid \text{guilt}) \, P(\text{guilt})}{p(\text{pos} \mid \text{guilt}) \, P(\text{guilt}) + p(\text{pos} \mid \text{innocent}) \, P(\text{innocent})}
      = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.0001 \times 0.999}
      = \frac{0.00099}{0.00099 + 0.0000999} \approx 0.908
Marginal Distributions
[Per-feature class-conditional distributions: P(x_1 \mid \text{apple}) vs. P(x_1 \mid \text{pear}), and P(x_2 \mid \text{apple}) vs. P(x_2 \mid \text{pear}).]
Combining Marginals
• Assuming independent features:

    P(x \mid \omega_j) = P(x_1 \mid \omega_j) \, P(x_2 \mid \omega_j) \cdots P(x_d \mid \omega_j)

• If we assume independence and use Bayes rule, we have a Naïve Bayes decision maker (classifier).
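The marginal product is a one-liner; the feature tables below are made-up discretized illustrations, not the slides' data:

```python
from math import prod

def naive_class_conditional(x, marginals):
    """P(x | class) = P(x1|class) * P(x2|class) * ... * P(xd|class)
    under the naive independence assumption.  `x` is a tuple of (already
    discretized) feature values; `marginals[i]` maps value -> P(value | class)."""
    return prod(marginals[i][v] for i, v in enumerate(x))

# Hypothetical per-feature tables for one class ("apple"):
apple_marginals = [
    {"red": 0.7, "green": 0.3},     # color marginal
    {"small": 0.2, "medium": 0.8},  # size marginal
]
p = naive_class_conditional(("red", "medium"), apple_marginals)  # 0.7 * 0.8 = 0.56
```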
Bayes Decision Rule

    Predict class \omega_i such that P(\omega_i \mid x) \ge P(\omega_j \mid x) for all j

• Provably optimal (minimum error rate) when the assumed model is correct; here, when the features (evidence) really do follow Gaussian distributions and really are independent.
Likelihood Ratios
• When deciding between two possibilities, we don’t need the exact probabilities; we only need to know which one is greater.
• The denominator is the same for all the classes, so it can be eliminated. This is especially useful when there are many possible classes.
Likelihood Ratio Example
Both posteriors share the same denominator:

    P(\text{pear} \mid d{=}4'') = \frac{p(d{=}4'' \mid \text{pear}) \, P(\text{pear})}{p(d{=}4'' \mid \text{apple}) \, P(\text{apple}) + p(d{=}4'' \mid \text{pear}) \, P(\text{pear})}

    P(\text{apple} \mid d{=}4'') = \frac{p(d{=}4'' \mid \text{apple}) \, P(\text{apple})}{p(d{=}4'' \mid \text{apple}) \, P(\text{apple}) + p(d{=}4'' \mid \text{pear}) \, P(\text{pear})}
Likelihood Ratio Example
So it suffices to compare the numerators, i.e. to check whether the ratio exceeds 1:

    \frac{p(d{=}4'' \mid \text{pear}) \, P(\text{pear})}{p(d{=}4'' \mid \text{apple}) \, P(\text{apple})}
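A minimal sketch of the ratio test, using the slides' apple/pear numbers:

```python
def likelihood_ratio(lik_1, prior_1, lik_2, prior_2):
    """Ratio of the two Bayes-rule numerators.  The shared denominator
    (the evidence) cancels, so a ratio > 1 favors class 1."""
    return (lik_1 * prior_1) / (lik_2 * prior_2)

# Slide numbers: p(d=4"|pear)=0.05, P(pear)=0.9 vs. p(d=4"|apple)=0.4, P(apple)=0.1.
r = likelihood_ratio(0.05, 0.9, 0.4, 0.1)  # 0.045 / 0.04 = 1.125 > 1, so predict pear
```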
In-class example: Oranges vs. Grapefruit
[Four histograms (bin vs. frequency): red intensity and mass for oranges; red intensity and mass for grapefruit.]
Example (cont’d)
• After observing several hundred fruit pass down the assembly line, we observe that 72% are oranges and 28% are grapefruit.
• Fruit ‘x’: red intensity = 8.2, mass = 7.6
What shall we predict for the class of fruit ‘x’?
The whole enchilada

    P(\text{orange} \mid 8.2, 7.6) = \frac{p(8.2, 7.6 \mid \text{orange}) \, P(\text{orange})}{p(8.2, 7.6 \mid \text{orange}) \, P(\text{orange}) + p(8.2, 7.6 \mid \text{grapefruit}) \, P(\text{grapefruit})}

and…

    p(8.2, 7.6 \mid \text{orange}) = P(\text{red}{=}8.2 \mid \text{orange}) \, P(\text{mass}{=}7.6 \mid \text{orange})   (naïve assumption)

Repeat for grapefruit and predict the more probable class.
The whole enchilada (2)

    P(\text{orange} \mid 8.2, 7.6)
      = \frac{P(\text{red}{=}8.2 \mid \text{orange}) \, P(\text{mass}{=}7.6 \mid \text{orange}) \, P(\text{orange})}{\sum_{f \in \{\text{orange}, \text{grapefruit}\}} P(\text{red}{=}8.2 \mid f) \, P(\text{mass}{=}7.6 \mid f) \, P(f)}
      = \frac{0.08 \times 0.12 \times 0.72}{0.08 \times 0.12 \times 0.72 + 0.19 \times 0.20 \times 0.28}
      \approx 0.39
The whole enchilada (3)

    P(\text{grapefruit} \mid 8.2, 7.6)
      = \frac{P(\text{red}{=}8.2 \mid \text{grapefruit}) \, P(\text{mass}{=}7.6 \mid \text{grapefruit}) \, P(\text{grapefruit})}{\sum_{f \in \{\text{orange}, \text{grapefruit}\}} P(\text{red}{=}8.2 \mid f) \, P(\text{mass}{=}7.6 \mid f) \, P(f)}
      = \frac{0.19 \times 0.20 \times 0.28}{0.08 \times 0.12 \times 0.72 + 0.19 \times 0.20 \times 0.28}
      \approx 0.61
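The full computation can be verified in a few lines (a sketch; the function name is mine, the numbers are the slides'):

```python
def naive_posteriors(feature_likelihoods, priors):
    """Naive Bayes posterior for each class: multiply the per-feature
    marginals P(x_i | class) by the prior, then normalize over classes."""
    score = {}
    for c in priors:
        p = priors[c]
        for lik in feature_likelihoods[c]:
            p *= lik
        score[c] = p
    total = sum(score.values())
    return {c: s / total for c, s in score.items()}

# Slide numbers: P(red=8.2|orange)=0.08, P(mass=7.6|orange)=0.12, P(orange)=0.72;
# P(red=8.2|grapefruit)=0.19, P(mass=7.6|grapefruit)=0.20, P(grapefruit)=0.28.
post = naive_posteriors(
    {"orange": [0.08, 0.12], "grapefruit": [0.19, 0.20]},
    {"orange": 0.72, "grapefruit": 0.28},
)
# post["orange"] is about 0.39, post["grapefruit"] about 0.61
```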
Conclusion

    P(\text{orange} \mid x) \approx 0.39
    P(\text{grapefruit} \mid x) \approx 0.61

Predict that fruit ‘x’ is a grapefruit, despite the relative scarcity of grapefruits on the conveyor belt.
Abbreviated
• Since the denominator is the same for all classes, we can just compare:

    P(\text{red}{=}8.2 \mid \text{orange}) \, P(\text{mass}{=}7.6 \mid \text{orange}) \, P(\text{orange})

and

    P(\text{red}{=}8.2 \mid \text{grapefruit}) \, P(\text{mass}{=}7.6 \mid \text{grapefruit}) \, P(\text{grapefruit})
Likelihood comparison

    P(\text{red}{=}8.2 \mid \text{orange}) \, P(\text{mass}{=}7.6 \mid \text{orange}) \, P(\text{orange}) = 0.08 \times 0.12 \times 0.72 \approx 0.0069

    P(\text{red}{=}8.2 \mid \text{grapefruit}) \, P(\text{mass}{=}7.6 \mid \text{grapefruit}) \, P(\text{grapefruit}) = 0.19 \times 0.20 \times 0.28 \approx 0.0106