Bayesian Decision Theory - Hacettepe Üniversitesi, course VBM687 (2017-03-17)
TRANSCRIPT
Bayesian Decision Theory
Slides are adapted from Jason Corso, George Bebis and Sargur Srihari,
based on the content from Duda, Hart & Stork.
Motivation
Bayesian Decision Theory
• Fundamental statistical approach to pattern classification
• Quantifies trade-offs between classification decisions using probabilities and the costs of those decisions
• Assumes all relevant probabilities are known
It is the theory of decision making when all underlying probability distributions are known.
It is optimal given that the distributions are known.
For two classes w1 and w2, the prior probabilities for an unknown new observation are:
P(w1): the probability that the new observation belongs to class 1
P(w2): the probability that the new observation belongs to class 2
P(w1) + P(w2) = 1
The priors reflect our prior knowledge. They give our decision rule when no feature on the new object is available:
Classify as class 1 if P(w1) > P(w2)
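As a minimal sketch (the prior values below are illustrative, not from the slides), the priors-only rule and its error probability can be written as:

```python
# Priors-only decision rule: with no feature available, pick the class
# with the larger prior; the error probability is the smaller prior.
def decide_from_priors(p_w1, p_w2):
    assert abs(p_w1 + p_w2 - 1.0) < 1e-9  # exclusivity and exhaustivity
    return 1 if p_w1 > p_w2 else 2

print(decide_from_priors(0.7, 0.3))  # -> 1
print(min(0.7, 0.3))                 # P(error) = 0.3
```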
Bayesian Decision Theory
• Design classifiers to make decisions subject to minimizing an expected "risk".
• The simplest risk is the classification error (i.e., assuming that misclassification costs are equal).
• When misclassification costs are not equal, the risk can include the cost associated with different misclassifications.
Example
Terminology - consider the sea bass/salmon example
• State of nature ω (class label): e.g., ω1 for sea bass, ω2 for salmon
• Probabilities P(ω1) and P(ω2) (priors): e.g., prior knowledge of how likely it is to get a sea bass or a salmon
• Probability density function p(x) (evidence): e.g., how frequently we will measure a pattern with feature value x
(e.g., x corresponds to lightness)
Prior Probability
• The catch of salmon and sea bass is equiprobable
• P(w1) = P(w2) (uniform priors)
• P(w1) + P( w2) = 1 (exclusivity and exhaustivity)
Prior Probability
Decision Rule from only Priors
P(error) = P(ω1) if we decide ω2
P(error) = P(ω2) if we decide ω1
Favours the most likely class:
P(error) = min[P(ω1), P(ω2)]
Features and Feature spaces
Class-conditional probability density p(x/ωj) (likelihood):
• The class-conditional probability density function is the probability density function for x, our feature, given that the state of nature is ωj
• e.g., how frequently we will measure a pattern with feature value x given that the pattern belongs to class ωj
• p(x | w1) and p(x | w2) describe the difference in lightness between populations of sea bass and salmon
Conditional probability P(ωj /x) (posterior):
• e.g., the probability that the fish belongs to class ωj given feature x.
Decision Rule Using Conditional Probabilities
• Using Bayes' rule:

P(ωj /x) = p(x/ωj) P(ωj) / p(x)    (posterior = likelihood × prior / evidence)

where p(x) = Σ_{j=1}^{2} p(x/ωj) P(ωj)  (i.e., a scale factor so that the posteriors sum to 1)

Decide ω1 if P(ω1 /x) > P(ω2 /x); otherwise decide ω2
or
Decide ω1 if p(x/ω1)P(ω1) > p(x/ω2)P(ω2); otherwise decide ω2
or
Decide ω1 if p(x/ω1)/p(x/ω2) > P(ω2)/P(ω1); otherwise decide ω2   (likelihood ratio vs. threshold)
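A minimal sketch showing that the three forms of the decision rule agree (the densities and priors below are illustrative, not from the slides):

```python
# Illustrative densities and priors at one observed feature value x.
P1, P2 = 0.6, 0.4                 # priors P(w1), P(w2)
px_w1, px_w2 = 0.3, 0.5           # likelihoods p(x|w1), p(x|w2)

# Posterior via Bayes' rule; the evidence p(x) is the normalizing sum.
post1 = px_w1 * P1 / (px_w1 * P1 + px_w2 * P2)

rule_posterior = post1 > 1.0 - post1          # compare posteriors
rule_joint = px_w1 * P1 > px_w2 * P2          # compare likelihood * prior
rule_ratio = (px_w1 / px_w2) > (P2 / P1)      # likelihood ratio vs. threshold
print(rule_posterior, rule_joint, rule_ratio)  # all three agree (False here: decide w2)
```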
Decision Rule Using Conditional Probabilities (cont’d)
Figure: class-conditional densities p(x/ωj) and the resulting posteriors P(ωj /x) for priors P(ω1) = 2/3 and P(ω2) = 1/3.
Probability of error
• x is an observation for which:
• if P(ω1 | x) > P(ω2 | x) True state of nature = ω1
• if P(ω1 | x) < P(ω2 | x) True state of nature = ω2
Bayes Decision Rule (with equal costs)
Where do Probabilities come from?
• There are two competitive answers:
(1) Relative frequency (objective) approach.
• Probabilities can only come from experiments.
(2) Bayesian (subjective) approach.
• Probabilities may reflect degree of belief and can be based on opinion.
Example (objective approach)
• Classify cars as costing more or less than $50K:
• Classes: C1 if price > $50K, C2 if price ≤ $50K
• Features: x, the height of a car
• Use the Bayes’ rule to compute the posterior probabilities:
• We need to estimate p(x/C1), p(x/C2), P(C1), P(C2)
P(Ci /x) = p(x/Ci) P(Ci) / p(x)
Example (cont’d)
• Collect data
• Ask drivers how much their car was and measure height.
• Determine prior probabilities P(C1), P(C2)
• e.g., 1209 samples: #C1 = 221, #C2 = 988

P(C1) = 221/1209 ≈ 0.183
P(C2) = 988/1209 ≈ 0.817
Example (cont’d)
• Determine class-conditional probabilities (likelihood) p(x/Ci)
• Discretize car height into bins and use a normalized histogram
Example (cont’d)
• Calculate the posterior probability for each bin:
P(C1 /x = 1.0) = p(x = 1.0 /C1) P(C1) / [p(x = 1.0 /C1) P(C1) + p(x = 1.0 /C2) P(C2)]
             = 0.2081 × 0.183 / (0.2081 × 0.183 + 0.0597 × 0.817) ≈ 0.438

(Figure: posteriors P(Ci /x) per bin.)
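A sketch of the car-height example: priors come from the sample counts, and the posterior for a bin follows from Bayes' rule. The likelihood values 0.2081 and 0.0597 are the slide's normalized-histogram values at the bin x = 1.0:

```python
# Priors estimated from counts (221 of 1209 cars cost more than $50K).
p_c1 = 221 / 1209          # P(C1)
p_c2 = 988 / 1209          # P(C2)

def posterior_c1(lik_c1, lik_c2):
    """P(C1|x) for one bin, given class-conditional likelihoods there."""
    evidence = lik_c1 * p_c1 + lik_c2 * p_c2
    return lik_c1 * p_c1 / evidence

print(round(posterior_c1(0.2081, 0.0597), 3))  # -> 0.438, as on the slide
```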
A More General Theory
• Use more than one features.
• Allow more than two categories.
• Allow actions other than classifying the input to one of the possible categories (e.g., rejection) - refusing to make a decision in close or bad cases.
• Employ a more general error function (i.e., expected "risk") by associating a "cost" (based on a "loss" function) with different errors.
• Note that the loss function states how costly each action taken is.
Loss Function
If action αi is taken and the true state of nature is wj, then the decision is correct if i = j and in error if i ≠ j. Seek a decision rule that minimizes the probability of error, which is the error rate.
Expected loss
Overall Risk
• Suppose α(x) is a general decision rule that determines which action α1, α2, …, αl to take for every x.
• The overall risk is defined as:

R = ∫ R(α(x) /x) p(x) dx

Clearly, we want the rule α(·) that minimizes R(α(x)/x) for all x.

• The Bayes rule minimizes R by:
(i) Computing R(αi /x) for every αi given an x
(ii) Choosing the action αi with the minimum R(αi /x)

• The resulting minimum R* is called the Bayes risk and is the best (i.e., optimum) performance that can be achieved.
Example: Two-category classification
• Define
• α1: decide ω1
• α2: decide ω2
• λij = λ(αi /ωj): the loss incurred for deciding ωi when the true state of nature is ωj
• The conditional risks are:

R(αi /x) = Σ_{j=1}^{c} λ(αi /ωj) P(ωj /x)

i.e., R(α1 /x) = λ11 P(ω1 /x) + λ12 P(ω2 /x)
     R(α2 /x) = λ21 P(ω1 /x) + λ22 P(ω2 /x)
Example: Two-category classification
• Minimum-risk decision rule: decide ω1 if R(α1 /x) < R(α2 /x), equivalently if

(λ21 − λ11) p(x/ω1) P(ω1) > (λ12 − λ22) p(x/ω2) P(ω2)

or (i.e., using the likelihood ratio)

p(x/ω1) / p(x/ω2) > (λ12 − λ22) P(ω2) / [(λ21 − λ11) P(ω1)]   (likelihood ratio vs. threshold)
Two-Category Decision Theory: Chopping Machine
α1 = chop
α2 = DO NOT chop
w1 = NO hand in machine
w2 = hand in machine
λ11 = λ(α1 | w1) = $0.00
λ12 = λ(α1 | w2) = $100.00
λ21 = λ(α2 | w1) = $0.01
λ22 = λ(α2 | w2) = $0.01
Therefore our rule becomes:
(λ21 − λ11) P(x | w1) P(w1) > (λ12 − λ22) P(x | w2) P(w2)
0.01 P(x | w1) P(w1) > 99.99 P(x | w2) P(w2)
Our rule is the following:
if R(α1 | x) < R(α2 | x), action α1 ("decide w1") is taken.
This results in the equivalent rule: decide w1 if
(λ21 − λ11) P(x | w1) P(w1) > (λ12 − λ22) P(x | w2) P(w2)
and decide w2 otherwise.
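The chopping-machine numbers can be plugged into this rule directly. A small illustrative check (the likelihood values passed in are made up; the losses are the slide's):

```python
# Dollar losses from the chopping-machine slide.
l11, l12, l21, l22 = 0.00, 100.00, 0.01, 0.01

def chop(p_x_w1, p_w1, p_x_w2, p_w2):
    """True iff the minimum-risk rule decides a1 = chop."""
    return (l21 - l11) * p_x_w1 * p_w1 > (l12 - l22) * p_x_w2 * p_w2

# Even with 99:1 evidence that no hand is present, the loss asymmetry
# (99.99 vs 0.01) keeps the machine from chopping.
print(chop(0.99, 0.5, 0.01, 0.5))  # -> False
```

With equal priors, the likelihood ratio p(x|w1)/p(x|w2) must exceed 99.99/0.01 = 9999 before chopping becomes the lower-risk action.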
Special Case: Zero-One Loss Function
• Assign the same loss to all errors:
λ(αi /ωj) = 0 if i = j, and 1 if i ≠ j
• All errors are equally costly.
• The conditional risk corresponding to this loss function:

R(αi | x) = Σ_{j=1}^{c} λ(αi |ωj) P(ωj | x) = Σ_{j≠i} P(ωj | x) = 1 − P(ωi | x)

The risk corresponding to this loss function is the average probability of error.
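A minimal sketch of this identity: under zero-one loss the conditional risk of each action is one minus that class's posterior, so minimizing risk is the same as maximizing the posterior (posterior values below are illustrative):

```python
# Under zero-one loss, R(a_i|x) = 1 - P(w_i|x).
def conditional_risks(posteriors):
    return [1.0 - p for p in posteriors]

post = [0.6, 0.3, 0.1]              # illustrative posteriors P(w_j|x)
risks = conditional_risks(post)
best = risks.index(min(risks))      # min-risk action
print(best)                          # -> 0, the class with the largest posterior
```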
Special Case: Zero-One Loss Function (cont'd)
• The decision rule becomes: decide ωi if P(ωi /x) > P(ωj /x) for all j ≠ i
or
decide ωi if p(x/ωi)P(ωi) > p(x/ωj)P(ωj) for all j ≠ i
• The overall risk turns out to be the average probability of error!
Loss function
Let λ(αi | wj) denote the loss for deciding class i when the true class is j.
In minimizing the risk, we decide class one if
λ(α1 | w1) p(w1 |x) + λ(α1 | w2) p(w2 |x) < λ(α2 | w1) p(w1 |x) + λ(α2 | w2) p(w2 |x)
Rearranging, we have
p(x | w1) / p(x | w2) > (λ12 − λ22) P(w2) / [(λ21 − λ11) P(w1)]
Loss function
Example:

λ = [0 1; 1 0]  (zero-one loss), then θλ = P(ω2)/P(ω1) = θa

λ = [0 2; 1 0], then θλ = 2 P(ω2)/P(ω1) = θb
Example
Assuming zero-one loss:
Decide ω1 if p(x/ω1)/p(x/ω2) > P(ω2)/P(ω1); otherwise decide ω2, i.e., threshold
θa = P(ω2) / P(ω1)

Assuming a general loss:
θb = (λ12 − λ22) P(ω2) / [(λ21 − λ11) P(ω1)]

(assume λ12 > λ21, so θb > θa and the decision regions shift)
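A small sketch comparing the two thresholds (the priors and losses below are illustrative; the loss matrix matches the second example above):

```python
# Likelihood-ratio thresholds for the two-category rule.
def theta_a(P1, P2):
    """Zero-one loss threshold."""
    return P2 / P1

def theta_b(P1, P2, l12, l21, l11=0.0, l22=0.0):
    """General-loss threshold (lambda_12 - lambda_22)P2 / ((lambda_21 - lambda_11)P1)."""
    return (l12 - l22) * P2 / ((l21 - l11) * P1)

P1, P2 = 2 / 3, 1 / 3
print(theta_a(P1, P2))                       # -> 0.5
print(theta_b(P1, P2, l12=2.0, l21=1.0))     # -> 1.0, larger l12 raises the bar
# Decide w1 iff p(x|w1)/p(x|w2) exceeds the chosen threshold.
```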
Lecture note for Stat 231: Pattern Recognition and Machine Learning
Diagram of pattern classification: procedure of pattern recognition and decision making

subjects → Observables X → Features x → Inner belief w → Action α

X --- all the observables using existing sensors and instruments
x --- a set of features selected from components of X, or linear/non-linear functions of X
w --- our inner belief/perception about the subject class
α --- the action that we take for x

We denote the three spaces by:
x = (x1, x2, ..., xd) ∈ Ω_d   (x is a vector)
w ∈ Ω_C = {w1, w2, ..., wk}   (w is the index of the class)
α ∈ Ω_α
Examples
Ex 1: Fish classification
X=I is the image of fish,
x =(brightness, length, fin#, ….)
w is our belief of what the fish type is,
Ω_C = {"sea bass", "salmon", "trout", …}
α is a decision for the fish type;
in this case the action space equals the class space:
{"sea bass", "salmon", "trout", …}
Ex 2: Medical diagnosis
X= all the available medical tests, imaging scans that a
doctor can order for a patient
x =(blood pressure, glucose level, cough, x-ray….)
w is an illness type,
Ω_C = {"Flu", "cold", "TB", "pneumonia", "lung cancer", …}
α is a decision for treatment,
Ω_α = {"Tylenol", "Hospitalize", …}
Tasks
Pipeline: subjects → (control, sensors) Observables X → (selecting informative features) Features x → (statistical inference) Inner belief w → (risk/cost minimization) Decision α
In Bayesian decision theory, we are concerned with the last three steps in the big ellipse, assuming that the observables are given and the features are selected.
Bayesian Decision Theory
Features x → (statistical inference) Inner belief p(w|x) → (risk/cost minimization) Decision α(x)

Two probability tables:
a). Prior p(w)
b). Likelihood p(x|w)
A risk/cost function (a two-way table): λ(α | w)

The belief on the class w is computed by the Bayes rule:

p(w | x) = p(x | w) p(w) / p(x)

The risk is computed by:

R(αi | x) = Σ_{j=1}^{k} λ(αi | wj) p(wj | x)
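A minimal sketch of these two tables plus the risk computation (all numbers are illustrative; dictionaries stand in for the slide's tables):

```python
# Prior table p(w), likelihood table p(x|w), and loss table lambda(a|w).
prior = {"w1": 0.7, "w2": 0.3}
lik = {("x", "w1"): 0.2, ("x", "w2"): 0.6}
loss = {("a1", "w1"): 0.0, ("a1", "w2"): 1.0,
        ("a2", "w1"): 1.0, ("a2", "w2"): 0.0}

def posterior(x):
    """Belief p(w|x) via the Bayes rule."""
    evid = sum(lik[(x, w)] * prior[w] for w in prior)
    return {w: lik[(x, w)] * prior[w] / evid for w in prior}

def risk(a, x):
    """Conditional risk R(a|x) = sum_j lambda(a|w_j) p(w_j|x)."""
    post = posterior(x)
    return sum(loss[(a, w)] * post[w] for w in prior)

best = min(["a1", "a2"], key=lambda a: risk(a, "x"))
print(best)  # -> a2: despite the larger prior on w1, the likelihood favors w2
```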
Decision Rule
A decision rule is a mapping function from the feature space to the set of actions: α(x): Ω_d → Ω_α

A decision is made to minimize the average cost / risk,

R = ∫ R(α(x) | x) p(x) dx

It is minimized when our decision is made to minimize the cost / risk for each instance x:

α(x) = argmin_α R(α | x) = argmin_α Σ_{j=1}^{k} λ(α | wj) p(wj | x)

(We will show that randomized decisions won't be optimal.)
Bayesian error
In a special case, like fish classification, the action is classification; we assume a 0/1 loss:

λ(αi | wj) = 0 if i = j;  λ(αi | wj) = 1 if i ≠ j

The risk for classifying x to class i is:

R(αi | x) = Σ_{j≠i} p(wj | x) = 1 − p(wi | x)

The optimal decision is to choose the class that has the maximum posterior probability:

α(x) = argmin_i (1 − p(wi | x)) = argmax_i p(wi | x)

The total risk for a decision rule, in this case, is called the Bayesian error:

R = p(error) = ∫ p(error | x) p(x) dx = ∫ (1 − p(w_{α(x)} | x)) p(x) dx
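A numerical sketch of this integral for two 1D Gaussian classes with equal priors (the means and variance are illustrative; the integral is approximated by a Riemann sum):

```python
import math

def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bayes_error(mu1, mu2, sigma, prior1=0.5, n=20001, lo=-10.0, hi=10.0):
    """p(error) = integral of min_j p(x, w_j) dx, on a uniform grid."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        j1 = gauss(x, mu1, sigma) * prior1        # joint p(x, w1)
        j2 = gauss(x, mu2, sigma) * (1 - prior1)  # joint p(x, w2)
        total += min(j1, j2) * h                  # error mass at x
    return total

print(round(bayes_error(0.0, 2.0, 1.0), 3))  # ≈ 0.159, i.e. Phi(-1) for this overlap
```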
An example of fish classification
Decision/classification Boundaries
Discriminant Functions
• Assign x to the class with the maximum discriminant: α(x) = argmax {g1(x), g2(x), …, gk(x)}
• Decision surface defined by gi(x) = gj(x)
Bayes Discriminants
Uniqueness of discriminants
Decision Regions and Boundaries
• Discriminants divide the feature space in decision regions R1, R2, …, Rc, separated by decision boundaries.
The decision boundary is defined by: g1(x) = g2(x)
Case of two categories
• More common to use a single discriminant function (dichotomizer) instead of two:
• Examples:
g(x) = P(ω1 /x) − P(ω2 /x)

g(x) = ln [p(x/ω1) / p(x/ω2)] + ln [P(ω1) / P(ω2)]

Decide ω1 if g(x) > 0; otherwise decide ω2.
Discriminant function for discrete features
Discrete features: x = [x1, x2, …, xd]t, xi ∈ {0, 1}
pi = P(xi = 1 | w1)
qi = P(xi = 1 | w2)
Assuming conditionally independent features, the likelihoods will be:
P(x | w1) = Π_{i=1}^{d} pi^xi (1 − pi)^(1−xi)
P(x | w2) = Π_{i=1}^{d} qi^xi (1 − qi)^(1−xi)
Discriminant function for discrete features
The discriminant function:
The likelihood ratio:
g(x) = Σ_{i=1}^{d} wi xi + w0

where wi = ln [pi (1 − qi) / (qi (1 − pi))],  i = 1, …, d

and w0 = Σ_{i=1}^{d} ln [(1 − pi) / (1 − qi)] + ln [P(w1) / P(w2)]
Discriminant function for discrete features
So the decision surface is again a hyperplane.
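A small sketch of this linear discriminant for binary features (the pi, qi, and priors below are illustrative). g(x) > 0 decides w1, and g(x) equals the log of p(x|w1)P(w1) / p(x|w2)P(w2), which the test below cross-checks:

```python
import math

p = [0.8, 0.7, 0.6]   # p_i = P(x_i = 1 | w1)
q = [0.3, 0.4, 0.5]   # q_i = P(x_i = 1 | w2)
P1, P2 = 0.5, 0.5     # priors

# Weights and bias from the slide's formulas.
w = [math.log(p[i] * (1 - q[i]) / (q[i] * (1 - p[i]))) for i in range(3)]
w0 = sum(math.log((1 - p[i]) / (1 - q[i])) for i in range(3)) + math.log(P1 / P2)

def g(x):
    """Linear discriminant; the surface g(x) = 0 is a hyperplane."""
    return sum(w[i] * x[i] for i in range(3)) + w0

print(g([1, 1, 1]) > 0)  # -> True: all features "on" favors w1 here
```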
The Univariate Normal Density
• Easy to work with analytically
• Many processes are asymptotically Gaussian
• Handwritten characters and speech sounds can be viewed as ideal prototypes corrupted by a random process (central limit theorem)
Univariate Normal Density
Entropy
Multivariate density: N(μ, Σ)
• Multivariate normal density in d dimensions:

p(x) = 1 / ((2π)^{d/2} |Σ|^{1/2}) · exp[ −(1/2) (x − μ)t Σ⁻¹ (x − μ) ]

where:
x = (x1, x2, …, xd)t (t stands for the transpose of a vector)
μ = (μ1, μ2, …, μd)t is the mean vector
Σ is the d×d covariance matrix
|Σ| and Σ⁻¹ are the determinant and inverse of Σ, respectively
• The covariance matrix is always symmetric and positive semidefinite; we assume Σ is positive definite so the determinant of Σ is strictly positive
• The multivariate normal density is completely specified by d + d(d+1)/2 parameters
• If variables x1 and x2 are statistically independent, then the covariance of x1 and x2 is zero.
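A sketch evaluating this density for d = 2 with a hand-coded 2×2 inverse and determinant, so no linear-algebra library is needed (mean and covariance below are illustrative):

```python
import math

def mvn_pdf_2d(x, mu, cov):
    """2D normal density p(x) for mean mu and 2x2 covariance cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c                               # |Sigma|
    inv = [[d / det, -b / det], [-c / det, a / det]]  # Sigma^{-1}
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Squared Mahalanobis distance (x - mu)^t Sigma^{-1} (x - mu).
    m2 = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
          + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    # For d = 2, (2*pi)^{d/2} = 2*pi.
    return math.exp(-0.5 * m2) / (2 * math.pi * math.sqrt(det))

# At the mean the density is 1 / (2*pi*sqrt(|Sigma|)).
print(mvn_pdf_2d([0, 0], [0, 0], [[1, 0], [0, 1]]))  # -> 1/(2*pi) ≈ 0.1592
```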
Multivariate Normal Density
The Covariance Matrix
Mahalanobis Distance
Reminder of some results for random vectors
Let Σ be a k×k symmetric matrix; then it has k pairs of eigenvalues and eigenvectors, and Σ can be decomposed as:
Σ = λ1 e1 e1' + λ2 e2 e2' + … + λk ek ek' = P Λ P'
Positive-definite matrix:
x' Σ x > 0 for all x ≠ 0, equivalently λ1 ≥ λ2 ≥ … ≥ λk > 0
Note: x' Σ x = λ1 (x' e1)² + … + λk (x' ek)²
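The spectral decomposition above is easy to verify numerically; a short numpy sketch (the matrix entries are made-up illustrative values):

```python
import numpy as np

# A symmetric positive-definite covariance matrix (illustrative values)
S = np.array([[2.0, 0.8],
              [0.8, 1.0]])

# eigh is for symmetric matrices: orthonormal eigenvectors in the columns of P
lam, P = np.linalg.eigh(S)

# Spectral decomposition: S = sum_i lam_i * e_i e_i'  (= P diag(lam) P')
S_rebuilt = sum(l * np.outer(e, e) for l, e in zip(lam, P.T))

# Positive definiteness <=> all eigenvalues are strictly positive
pos_def = bool(np.all(lam > 0))
```

Rebuilding S from its eigenpairs and checking the eigenvalue signs exercises both facts stated on the slide.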
Normal density
Whitening transform:
P: eigenvector matrix
Λ: diagonal eigenvalue matrix
Aw = P Λ^(-1/2)
Then, using Σ = λ1 e1 e1' + … + λk ek ek' = P Λ P':
Aw' Σ Aw = Λ^(-1/2) P' Σ P Λ^(-1/2)
         = Λ^(-1/2) P' P Λ P' P Λ^(-1/2)
         = I
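The identity Aw' Σ Aw = I can be checked in a few lines of numpy; a minimal sketch with an illustrative covariance matrix:

```python
import numpy as np

# Illustrative symmetric positive-definite covariance
S = np.array([[2.0, 0.8],
              [0.8, 1.0]])
lam, P = np.linalg.eigh(S)

# Whitening transform: A_w = P Lambda^{-1/2}
Aw = P @ np.diag(lam ** -0.5)

# After whitening, the transformed covariance should be the identity
I = Aw.T @ S @ Aw
```

In other words, applying Aw to data with covariance Σ yields uncorrelated features with unit variance.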
Linear Combinations of Normals
General Discriminant Function for Multivariate Gaussian Density
Recall the minimum error rate discriminant:
gi(x) = ln p(x|ωi) + ln P(ωi)
with p(x|ωi) ~ N(μi, Σi)
Simple Case: Statistically Independent Features with Same Variance
A classifier that uses linear discriminant functions is called a "linear machine".
The decision surfaces for a linear machine are pieces of hyperplanes defined by:
gi(x) = gj(x)
Multivariate Gaussian Density: Case I
• Σi = σ²I (diagonal matrix)
• Features are statistically independent
• Each feature has the same variance
Case 1: Σi = σ²I
Case 1: Σi = σ²I
i.e., with equal priors, x0 is the midpoint between the two means. The decision surface is a hyperplane, perpendicular to the line between the means.
Case 1: Σi = σ²I
• Properties of the decision boundary:
• It passes through x0
• It is orthogonal to the line linking the means.
• If σ is very small, the position of the boundary is insensitive to P(ωi) and P(ωj)
When the P(ωi) are equal, the discriminant becomes:
gi(x) = -||x - μi||²
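With equal priors and Σi = σ²I, classification reduces to picking the nearest mean in Euclidean distance. A minimal numpy sketch (the means and query points are made-up illustrative values):

```python
import numpy as np

# Hypothetical class means; Sigma_i = sigma^2 I and equal priors assumed
means = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]

def classify(x):
    # g_i(x) = -||x - mu_i||^2 : maximizing g_i == minimizing squared distance
    dists = [np.sum((x - m) ** 2) for m in means]
    return int(np.argmin(dists))

label = classify(np.array([1.0, 1.0]))  # nearer to means[0]
```

Note the decision boundary here is the perpendicular bisector of the segment joining the two means, as the slide states.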
Case 1: Σi = σ²I
With unequal prior probabilities, the decision boundary shifts toward the less likely mean.
Multivariate Gaussian Density: Case 2
• Σi= Σ
Recall
Case 2: Σi = Σ
Case 2: Σi = Σ
Properties of the hyperplane (decision boundary) for equal but asymmetric Gaussian distributions:
• It passes through x0
• It is not orthogonal to the line between the means.
• If P(ωi) and P(ωj) are not equal, then x0 shifts away from the more likely mean.
Case 2: Σi = Σ
• When the P(ωi) are equal, the discriminant becomes the (negative, scaled) squared Mahalanobis distance:
gi(x) = -(1/2)(x - μi)^t Σ^-1 (x - μi)
• This is a minimum Mahalanobis distance classifier
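The Mahalanobis distance classifier can be sketched directly in numpy (the shared covariance, means, and query point below are illustrative values, not from the slides):

```python
import numpy as np

# Shared covariance for all classes (Case 2) and hypothetical class means
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]

def mahalanobis_sq(x, mu):
    # Squared Mahalanobis distance (x - mu)' Sigma^-1 (x - mu)
    d = x - mu
    return d @ Sigma_inv @ d

def classify(x):
    # With equal priors, choose the class whose mean is nearest
    # in Mahalanobis distance
    return int(np.argmin([mahalanobis_sq(x, m) for m in means]))

label = classify(np.array([0.5, 0.5]))
```

Unlike Case 1, distances are measured relative to the shared covariance, so the decision hyperplane is generally not orthogonal to the line between the means.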
Multivariate Gaussian Density: Case 3
• Σi = arbitrary
• The decision surfaces are hyperquadrics; e.g., hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, etc.
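In Case 3 each class keeps its own covariance, so the discriminant gi(x) = -(1/2)(x - μi)^t Σi^-1 (x - μi) - (1/2) ln|Σi| + ln P(ωi) is quadratic in x. A minimal numpy sketch (all means, covariances, and priors below are made-up illustrative values):

```python
import numpy as np

# Per-class (mean, covariance, prior) triples -- illustrative values only
params = [
    (np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]]), 0.5),
    (np.array([3.0, 3.0]), np.array([[2.0, 0.5], [0.5, 1.5]]), 0.5),
]

def g(x, mu, Sigma, prior):
    # Quadratic discriminant for a Gaussian class-conditional density
    d = x - mu
    return (-0.5 * d @ np.linalg.inv(Sigma) @ d
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

def classify(x):
    return int(np.argmax([g(x, *p) for p in params]))

label = classify(np.array([0.2, -0.1]))
```

Because the quadratic terms no longer cancel between classes, the resulting boundaries are hyperquadrics rather than hyperplanes.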
Case 3: Σi = arbitrary
Non-linear decision boundaries
Case 3: Σi= arbitrary
Case 3: Σi= arbitrary
Example - Case 3
P(ω1) = P(ω2)
The decision boundary does not pass through the midpoint of μ1 and μ2.
Signal Detection Theory
Receiver Operating Characteristics
ROC
Example: Person Authentication
• Authenticate a person using biometrics (e.g., fingerprints).
• There are two possible distributions (i.e., classes): Authentic (A) and Impostor (I)
Example: Person Authentication
• Possible decisions:
• (1) correct acceptance (true positive): X belongs to A, and we decide A
• (2) incorrect acceptance (false positive): X belongs to I, and we decide A
• (3) correct rejection (true negative): X belongs to I, and we decide I
• (4) incorrect rejection (false negative): X belongs to A, and we decide I
Error vs Threshold
x* (threshold)
ROC Curve
FAR: False Accept Rate (False Positive)
FRR: False Reject Rate (False Negative)
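Sweeping the threshold x* over the two score distributions trades FAR against FRR; each threshold gives one point of the ROC curve. A minimal numpy sketch (the Gaussian score distributions and their parameters are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 1-D match scores: impostors score low, authentics score high
impostor = rng.normal(0.0, 1.0, 5000)
authentic = rng.normal(3.0, 1.0, 5000)

def far_frr(threshold):
    far = float(np.mean(impostor >= threshold))   # impostors accepted
    frr = float(np.mean(authentic < threshold))   # authentics rejected
    return far, frr

# Sweeping the threshold x* traces out the ROC curve
thresholds = np.linspace(-3.0, 6.0, 91)
curve = [far_frr(t) for t in thresholds]

# Raising the threshold lowers FAR and raises FRR
far_low, frr_low = far_frr(0.5)
far_high, frr_high = far_frr(2.5)
```

The point where FAR = FRR (the equal error rate) is a common single-number summary of such a system.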
False Negatives vs Positives