comp24111 machine learning naïve bayes classifier ke chen
Post on 18-Jan-2018
241 Views
Preview:
DESCRIPTION
TRANSCRIPT
COMP24111 Machine Learning
Naïve Bayes Classifier Ke Chen
COMP24111 Machine Learning2
Outline
• Background• Probability Basics• Probabilistic Classification• Naïve Bayes • Example: Play Tennis• Relevant Issues• Conclusions
COMP24111 Machine Learning3
Background• There are three methods to establish a classifier a) Model a classification rule directly
Examples: k-NN, decision trees, perceptron, SVM b) Model the probability of class memberships given input data Example: perceptron with the cross-entropy cost
c) Make a probabilistic model of data within each class Examples: naive Bayes, model based classifiers
• a) and b) are examples of discriminative classification• c) is an example of generative classification• b) and c) are both examples of probabilistic
classification
COMP24111 Machine Learning4
Probability Basics
• Prior, conditional and joint probability for random
variables– Prior probability: – Conditional probability: – Joint probability: – Relationship:– Independence:
• Bayesian Rule
)| ,)( 121 XP(XX|XP 2
)()()( )(
XXXP
CPC|P|CP
)(XP
) )( ),,( 22 ,XP(XPXX 11 XX)()|()()|() 2211122 XPXXPXPXXP,XP(X1
)()() ),()|( ),()|( 212121212 XPXP,XP(XXPXXPXPXXP 1
EvidencePriorLikelihoodPosterior
COMP24111 Machine Learning5
Probability Basics
• Quiz: We have two six-sided dice. When they are tolled, it could
end up with the following occurance: (A) dice 1 lands on side “3”, (B) dice 2 lands on side “1”, and (C) Two dice sum to eight. Answer the following questions:
? equals ),( Is 8)?),( 7)?),( 6)?)|( 5)?)|( 4)
? 3)? 2)? )( )1
P(C)P(A)CAPCAPBAPACPBAP
P(C) P(B)
AP
COMP24111 Machine Learning6
Probabilistic Classification
• Establishing a probabilistic model for classification
– Discriminative model),, , )( 1 n1L X(Xc,,cC|CP XX
),,,( 21 nxxx x
Discriminative Probabilistic Classifier
1x 2x nx
)|( 1 xcP )|( 2 xcP )|( xLcP
COMP24111 Machine Learning7
Probabilistic Classification
• Establishing a probabilistic model for classification
(cont.)– Generative model
),, , )( 1 n1L X(Xc,,cCC|P XX
GenerativeProbabilistic Model
for Class 1
)|( 1cP x
1x 2x nx
GenerativeProbabilistic Model
for Class 2
)|( 2cP x
1x 2x nx
GenerativeProbabilistic Model
for Class L
)|( LcP x
1x 2x nx
),,,( 21 nxxx x
COMP24111 Machine Learning8
Probabilistic Classification
• MAP classification rule
– MAP: Maximum A Posterior– Assign x to c* if
• Generative classification with the MAP rule– Apply Bayesian rule to convert them into posterior
probabilities
– Then apply the MAP rule
Lc,,cccc|cCP|cCP 1** , )( )( xXxX
LicCPcC|P
PcCPcC|P|cCP
ii
iii
,,2,1 for )()(
)()()( )(
xXxX
xXxX
COMP24111 Machine Learning9
Naïve Bayes
• Bayes classification
Difficulty: learning the joint probability • Naïve Bayes classification
– Assumption that all input attributes are conditionally independent!
– MAP classification rule: for
)()|,,()()( )( 1 CPCXXPCPC|P|CP n XX
)|,,( 1 CXXP n
)|()|()|( )|,,()|(
)|,,();,,|()|,,,(
21
21
22121
CXPCXPCXPCXXPCXP
CXXPCXXXPCXXXP
n
n
nnn
Lnn ccccccPcxPcxPcPcxPcxP ,, , ),()]|()|([)()]|()|([ 1*
1***
1
),,,( 21 nxxx x
COMP24111 Machine Learning10
Naïve Bayes
• Naïve Bayes Algorithm (for discrete input attributes)
– Learning Phase: Given a training set S,
Output: conditional probability tables; for elements
– Test Phase: Given an unknown instance , Look up tables to assign the label c* to X’ if
; in examples with)|( estimate)|(̂
),1 ;,,1( attribute each of value attribute every For ; in examples with)( estimate)(̂
of value target each For 1
S
S
ijkjijkj
jjjk
ii
Lii
cCxXPcCxXP
N,knj XxcCPcCP
)c,,c(c c
Lnn ccccccPcaPcaPcPcaPcaP ,, , ),(̂)]|(̂)|(̂[)(̂)]|(̂)|(̂[ 1*
1***
1
),,( 1 naa XLNX jj ,
COMP24111 Machine Learning11
Example
• Example: Play Tennis
COMP24111 Machine Learning12
Example
• Learning Phase
Outlook Play=Yes
Play=No
Sunny 2/9 3/5Overcas
t4/9 0/5
Rain 3/9 2/5
Temperature
Play=Yes Play=No
Hot 2/9 2/5Mild 4/9 2/5Cool 3/9 1/5
Humidity Play=Yes
Play=No
High 3/9 4/5Normal 6/9 1/5
Wind Play=Yes
Play=No
Strong 3/9 3/5Weak 6/9 2/5
P(Play=Yes) = 9/14P(Play=No) = 5/14
COMP24111 Machine Learning13
Example
• Test Phase
– Given a new instance, x’=(Outlook=Sunny, Temperature=Cool, Humidity=High,
Wind=Strong)– Look up tables
– MAP rule
P(Outlook=Sunny|Play=No) = 3/5P(Temperature=Cool|Play==No) = 1/5P(Huminity=High|Play=No) = 4/5P(Wind=Strong|Play=No) = 3/5P(Play=No) = 5/14
P(Outlook=Sunny|Play=Yes) = 2/9=0.22P(Temperature=Cool|Play=Yes) = 3/9=0.33P(Huminity=High|Play=Yes) = 3/9=0.33P(Wind=Strong|Play=Yes) = 3/9=0.33P(Play=Yes) = 9/14=0.64
P(Yes|x’): [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) = 0.0053 P(No|x’): [P(Sunny|No) P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) = 0.0206
Given the fact P(Yes|x’) < P(No|x’), we label x’ to be “No”.
COMP24111 Machine Learning14
Example
• Test Phase
– Given a new instance, x’=(Outlook=Sunny, Temperature=Cool, Humidity=High,
Wind=Strong)– Look up tables
– MAP rule
P(Outlook=Sunny|Play=No) = 3/5P(Temperature=Cool|Play==No) = 1/5P(Huminity=High|Play=No) = 4/5P(Wind=Strong|Play=No) = 3/5P(Play=No) = 5/14
P(Outlook=Sunny|Play=Yes) = 2/9P(Temperature=Cool|Play=Yes) = 3/9P(Huminity=High|Play=Yes) = 3/9P(Wind=Strong|Play=Yes) = 3/9P(Play=Yes) = 9/14
P(Yes|x’): [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) = 0.0053 P(No|x’): [P(Sunny|No) P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) = 0.0206
Given the fact P(Yes|x’) < P(No|x’), we label x’ to be “No”.
COMP24111 Machine Learning15
Relevant Issues
• Violation of Independence Assumption
– For many real world tasks,– Nevertheless, naïve Bayes works surprisingly well
anyway!• Zero conditional probability Problem
– If no example contains the attribute value– In this circumstance, during test – For a remedy, conditional probabilities estimated with
)|()|( )|,,( 11 CXPCXPCXXP nn
0)|(̂ , ijkjjkj cCaXPaX0)|(̂)|(̂)|(̂ 1 inijki cxPcaPcxP
)1 examples, virtual"" of (number prior to weight:) of values possible for /1 (usually, estimate prior :
whichfor examples training of number : C and whichfor examples training of number :
)|(̂
mmXttpp
cCncaXn
mnmpncCaXP
j
i
ijkjc
cijkj
COMP24111 Machine Learning16
Relevant Issues
• Continuous-valued Input Attributes
– Numberless values for an attribute – Conditional probability modeled with the normal
distribution
– Learning Phase: Output: normal distributions and – Test Phase:
• Calculate conditional probabilities with all the normal distributions• Apply the MAP rule to make a decision
ijji
ijji
ji
jij
jiij
cCcX
XcCXP
whichfor examples of X values attribute of deviation standard :C whichfor examples of values attribute of (avearage) mean :
2)(
exp21)|(̂ 2
2
Ln ccCXX ,, ),,,( for 11 XLn
),,( for 1 nXX XLicCP i ,,1 )(
COMP24111 Machine Learning17
Conclusions• Naïve Bayes based on the independence assumption
– Training is very easy and fast; just requiring considering each attribute in each class separately
– Test is straightforward; just looking up tables or calculating conditional probabilities with normal distributions
• A popular generative model– Performance competitive to most of state-of-the-art classifiers
even in presence of violating independence assumption– Many successful applications, e.g., spam mail filtering– A good candidate of a base learner in ensemble learning– Apart from classification, naïve Bayes can do more…
top related