COMP24111 Machine Learning

Naïve Bayes Classifier

Ke Chen

Outline

• Background
• Probability Basics
• Probabilistic Classification
• Naïve Bayes
• Example: Play Tennis
• Relevant Issues
• Conclusions

Background

• There are three methods to establish a classifier:
  a) Model a classification rule directly
     Examples: k-NN, decision trees, perceptron, SVM
  b) Model the probability of class memberships given input data
     Example: perceptron with the cross-entropy cost
  c) Make a probabilistic model of data within each class
     Examples: naïve Bayes, model-based classifiers
• a) and b) are examples of discriminative classification
• c) is an example of generative classification
• b) and c) are both examples of probabilistic classification

Probability Basics

• Prior, conditional and joint probability for random variables
  – Prior probability: $P(X)$
  – Conditional probability: $P(X_1 \mid X_2)$, $P(X_2 \mid X_1)$
  – Joint probability: $\mathbf{X} = (X_1, X_2)$, $P(\mathbf{X}) = P(X_1, X_2)$
  – Relationship: $P(X_1, X_2) = P(X_2 \mid X_1) P(X_1) = P(X_1 \mid X_2) P(X_2)$
  – Independence: $P(X_2 \mid X_1) = P(X_2)$, $P(X_1 \mid X_2) = P(X_1)$, $P(X_1, X_2) = P(X_1) P(X_2)$
• Bayesian Rule
  $P(C \mid \mathbf{X}) = \frac{P(\mathbf{X} \mid C) P(C)}{P(\mathbf{X})}$, that is, Posterior = (Likelihood × Prior) / Evidence

Probability Basics

• Quiz: We have two six-sided dice. When they are rolled, the following events can occur: (A) die 1 lands on side “3”, (B) die 2 lands on side “1”, and (C) the two dice sum to eight. Answer the following questions:
  1) $P(A) = ?$
  2) $P(B) = ?$
  3) $P(C) = ?$
  4) $P(A \mid B) = ?$
  5) $P(C \mid A) = ?$
  6) $P(A, B) = ?$
  7) $P(A, C) = ?$
  8) Is $P(A, C)$ equal to $P(A) P(C)$?
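The quiz can be checked by brute force. The sketch below enumerates the 36 equally likely outcomes of two fair dice and counts the ones matching each event; the helper names `prob` and `cond` are illustrative choices, not part of the lecture.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (die 1, die 2) of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate over outcomes."""
    hits = [o for o in outcomes if event(o)]
    return Fraction(len(hits), len(outcomes))

def cond(event, given):
    """Conditional probability P(event | given) = P(event, given) / P(given)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

A = lambda o: o[0] == 3          # die 1 lands on "3"
B = lambda o: o[1] == 1          # die 2 lands on "1"
C = lambda o: o[0] + o[1] == 8   # the two dice sum to eight

p_a, p_b, p_c = prob(A), prob(B), prob(C)
p_a_given_b = cond(A, B)
p_c_given_a = cond(C, A)
p_ab = prob(lambda o: A(o) and B(o))
p_ac = prob(lambda o: A(o) and C(o))

for name, value in [("P(A)", p_a), ("P(B)", p_b), ("P(C)", p_c),
                    ("P(A|B)", p_a_given_b), ("P(C|A)", p_c_given_a),
                    ("P(A,B)", p_ab), ("P(A,C)", p_ac)]:
    print(name, "=", value)
print("P(A,C) == P(A)P(C)?", p_ac == p_a * p_c)
```

Using exact fractions avoids rounding noise when comparing $P(A, C)$ with $P(A) P(C)$, which is the independence test from the previous slide.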

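Probabilistic Classification

• Establishing a probabilistic model for classification
  – Discriminative model: $P(C \mid \mathbf{X})$, where $C = c_1, \ldots, c_L$ and $\mathbf{X} = (X_1, \ldots, X_n)$
  – [Figure: a discriminative probabilistic classifier takes the input $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and outputs $P(c_1 \mid \mathbf{x}), P(c_2 \mid \mathbf{x}), \ldots, P(c_L \mid \mathbf{x})$]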

• Establishing a probabilistic model for classification (cont.)
  – Generative model: $P(\mathbf{X} \mid C)$, where $C = c_1, \ldots, c_L$ and $\mathbf{X} = (X_1, \ldots, X_n)$
  – [Figure: one generative probabilistic model per class; the model for class $i$ takes the input $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and outputs $P(\mathbf{x} \mid c_i)$, for $i = 1, \ldots, L$]

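• MAP classification rule
  – MAP: Maximum A Posteriori
  – Assign $\mathbf{x}$ to $c^*$ if
    $P(C = c^* \mid \mathbf{X} = \mathbf{x}) > P(C = c \mid \mathbf{X} = \mathbf{x})$ for all $c \neq c^*$, $c = c_1, \ldots, c_L$
• Generative classification with the MAP rule
  – Apply the Bayesian rule to convert the class-conditional likelihoods into posterior probabilities:
    $P(C = c_i \mid \mathbf{X} = \mathbf{x}) = \frac{P(\mathbf{X} = \mathbf{x} \mid C = c_i) P(C = c_i)}{P(\mathbf{X} = \mathbf{x})} \propto P(\mathbf{X} = \mathbf{x} \mid C = c_i) P(C = c_i)$, for $i = 1, 2, \ldots, L$
  – Then apply the MAP rule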
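The two steps of generative classification with the MAP rule can be sketched in a few lines. The class names and the likelihood and prior values below are made up purely for illustration; the evidence $P(\mathbf{X} = \mathbf{x})$ is obtained by summing likelihood times prior over all classes.

```python
def map_classify(likelihoods, priors):
    """Return the MAP class and the posteriors.

    likelihoods[c] = P(X = x | C = c) for the observed x
    priors[c]      = P(C = c)
    """
    # Unnormalised posteriors: likelihood times prior for every class.
    scores = {c: likelihoods[c] * priors[c] for c in priors}
    # Evidence P(X = x) by the law of total probability.
    evidence = sum(scores.values())
    posteriors = {c: s / evidence for c, s in scores.items()}
    # MAP rule: pick the class with the largest posterior.
    best = max(posteriors, key=posteriors.get)
    return best, posteriors

# Hypothetical numbers, not taken from the lecture.
priors = {"c1": 0.7, "c2": 0.3}
likelihoods = {"c1": 0.1, "c2": 0.4}

best, posteriors = map_classify(likelihoods, priors)
print(best, posteriors)
```

Because the evidence is the same for every class, dropping the normalisation would not change the chosen label; it is kept here only so the outputs are proper probabilities.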

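Naïve Bayes

• Bayes classification
  $P(C \mid \mathbf{X}) \propto P(\mathbf{X} \mid C) P(C) = P(X_1, \ldots, X_n \mid C) P(C)$
  Difficulty: learning the joint probability $P(X_1, \ldots, X_n \mid C)$
• Naïve Bayes classification
  – Assumption that all input attributes are conditionally independent given the class!
    $P(X_1, X_2, \ldots, X_n \mid C) = P(X_1 \mid X_2, \ldots, X_n, C) P(X_2, \ldots, X_n \mid C)$
    $= P(X_1 \mid C) P(X_2, \ldots, X_n \mid C)$
    $= P(X_1 \mid C) P(X_2 \mid C) \cdots P(X_n \mid C)$
  – MAP classification rule: for $\mathbf{x} = (x_1, x_2, \ldots, x_n)$,
    $[P(x_1 \mid c^*) \cdots P(x_n \mid c^*)] P(c^*) > [P(x_1 \mid c) \cdots P(x_n \mid c)] P(c)$, $c \neq c^*$, $c = c_1, \ldots, c_L$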
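To see why learning the joint probability is hard, it helps to count how many numbers must be estimated per class. A full table for $P(X_1, \ldots, X_n \mid C = c)$ over discrete attributes has one entry per value combination (minus one, since the entries sum to 1), whereas the naïve Bayes factorisation needs only one small table per attribute. The sketch below uses the attribute sizes of the Play Tennis example (3, 3, 2 and 2 values); the function names are illustrative.

```python
from math import prod

def joint_table_size(value_counts):
    """Free parameters of the full P(X1, ..., Xn | C = c) for one class."""
    return prod(value_counts) - 1

def naive_table_size(value_counts):
    """Free parameters of P(X1 | c), ..., P(Xn | c) taken together for one class."""
    return sum(k - 1 for k in value_counts)

# Outlook, Temperature, Humidity, Wind have 3, 3, 2 and 2 values.
play_tennis = [3, 3, 2, 2]
print(joint_table_size(play_tennis), naive_table_size(play_tennis))

# With 20 binary attributes the gap becomes dramatic.
binary_20 = [2] * 20
print(joint_table_size(binary_20), naive_table_size(binary_20))
```

The joint count grows exponentially with the number of attributes, while the naïve Bayes count grows linearly.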

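Naïve Bayes

• Naïve Bayes Algorithm (for discrete input attributes)
  – Learning Phase: Given a training set S,
      For each target value $c_i$ ($c_i = c_1, \ldots, c_L$)
        $\hat{P}(C = c_i) \leftarrow$ estimate $P(C = c_i)$ with examples in S;
        For every attribute value $x_{jk}$ of each attribute $X_j$ ($j = 1, \ldots, n$; $k = 1, \ldots, N_j$)
          $\hat{P}(X_j = x_{jk} \mid C = c_i) \leftarrow$ estimate $P(X_j = x_{jk} \mid C = c_i)$ with examples in S;
    Output: conditional probability tables; for $X_j$, $N_j \times L$ elements
  – Test Phase: Given an unknown instance $\mathbf{X}' = (a'_1, \ldots, a'_n)$,
    look up tables to assign the label $c^*$ to $\mathbf{X}'$ if
    $[\hat{P}(a'_1 \mid c^*) \cdots \hat{P}(a'_n \mid c^*)] \hat{P}(c^*) > [\hat{P}(a'_1 \mid c) \cdots \hat{P}(a'_n \mid c)] \hat{P}(c)$, $c \neq c^*$, $c = c_1, \ldots, c_L$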
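A minimal sketch of the two phases, assuming that "estimate with examples in S" means plain relative frequencies. The function names `learn` and `classify` and the tiny fruit data set are hypothetical, chosen only to make the example runnable.

```python
from collections import Counter

def learn(examples, labels):
    """Learning phase: estimate P(C = c) and P(X_j = value | C = c) by counting."""
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / len(labels) for c in class_counts}
    # counts[(j, value, c)] = number of examples with X_j = value and C = c
    counts = Counter()
    for x, c in zip(examples, labels):
        for j, value in enumerate(x):
            counts[(j, value, c)] += 1
    cond = {key: n / class_counts[key[2]] for key, n in counts.items()}
    return priors, cond

def classify(x, priors, cond):
    """Test phase: look up the tables and apply the MAP rule."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for j, value in enumerate(x):
            # A combination never seen in training has estimated probability 0.
            score *= cond.get((j, value, c), 0.0)
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy training set: (colour, shape) -> fruit.
examples = [("red", "round"), ("yellow", "round"), ("yellow", "round"), ("red", "round"),
            ("yellow", "long"), ("yellow", "long"), ("yellow", "round"), ("yellow", "long")]
labels = ["apple"] * 4 + ["banana"] * 4

priors, cond = learn(examples, labels)
label, scores = classify(("yellow", "round"), priors, cond)
print(label, scores)
```

The conditional probability table is stored as a dictionary keyed by (attribute index, value, class), so unseen combinations simply default to zero at lookup time.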

Example

• Example: Play Tennis

Example

• Learning Phase

  Outlook        Play=Yes   Play=No
  Sunny          2/9        3/5
  Overcast       4/9        0/5
  Rain           3/9        2/5

  Temperature    Play=Yes   Play=No
  Hot            2/9        2/5
  Mild           4/9        2/5
  Cool           3/9        1/5

  Humidity       Play=Yes   Play=No
  High           3/9        4/5
  Normal         6/9        1/5

  Wind           Play=Yes   Play=No
  Strong         3/9        3/5
  Weak           6/9        2/5

  P(Play=Yes) = 9/14
  P(Play=No) = 5/14
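The learning phase is just counting. The training table itself does not appear in this text, so the sketch below assumes the 14-row Play Tennis data set commonly used in textbooks; that choice is an assumption, although counting over these rows does reproduce the probabilities listed above.

```python
from collections import Counter
from fractions import Fraction

ATTRIBUTES = ("Outlook", "Temperature", "Humidity", "Wind")

# Assumed training set (Outlook, Temperature, Humidity, Wind, Play).
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

# Class counts and priors P(Play = c).
class_counts = Counter(row[-1] for row in DATA)
prior = {c: Fraction(n, len(DATA)) for c, n in class_counts.items()}

# pair_counts[(attribute, value, c)] = examples with that value in class c.
pair_counts = Counter(
    (attr, row[j], row[-1]) for row in DATA for j, attr in enumerate(ATTRIBUTES)
)

def cond_prob(attr, value, c):
    """Relative-frequency estimate of P(attr = value | Play = c)."""
    return Fraction(pair_counts[(attr, value, c)], class_counts[c])

print(prior["Yes"], prior["No"])
print(cond_prob("Outlook", "Sunny", "Yes"), cond_prob("Outlook", "Overcast", "No"))
```

Note that `Fraction` reduces results automatically, so 3/9 is displayed as 1/3.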

Example

• Test Phase
  – Given a new instance, x’ = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
  – Look up tables
      P(Outlook=Sunny|Play=Yes) = 2/9 = 0.22        P(Outlook=Sunny|Play=No) = 3/5
      P(Temperature=Cool|Play=Yes) = 3/9 = 0.33     P(Temperature=Cool|Play=No) = 1/5
      P(Humidity=High|Play=Yes) = 3/9 = 0.33        P(Humidity=High|Play=No) = 4/5
      P(Wind=Strong|Play=Yes) = 3/9 = 0.33          P(Wind=Strong|Play=No) = 3/5
      P(Play=Yes) = 9/14 = 0.64                     P(Play=No) = 5/14
  – MAP rule
      P(Yes|x’) ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
      P(No|x’) ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206
    Given the fact P(Yes|x’) < P(No|x’), we label x’ to be “No”.
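The comparison behind this decision can be reproduced from the looked-up values alone. The sketch below multiplies the four conditional probabilities by the prior for each class, using exact fractions; the dictionary layout is an illustrative choice.

```python
from fractions import Fraction as F
from math import prod

# Looked-up values for x' = (Sunny, Cool, High, Strong), in attribute order.
tables = {
    "Yes": {"prior": F(9, 14), "cond": [F(2, 9), F(3, 9), F(3, 9), F(3, 9)]},
    "No":  {"prior": F(5, 14), "cond": [F(3, 5), F(1, 5), F(4, 5), F(3, 5)]},
}

# Unnormalised posterior for each class: product of conditionals times prior.
scores = {c: t["prior"] * prod(t["cond"]) for c, t in tables.items()}
label = max(scores, key=scores.get)

for c, s in scores.items():
    print(c, s, round(float(s), 4))
print("label:", label)

# Optional: normalise to obtain proper posterior probabilities.
total = sum(scores.values())
posterior_no = scores["No"] / total
print("P(No|x') =", round(float(posterior_no), 3))
```

The two scores are not probabilities until they are divided by their sum, but the MAP decision only depends on which one is larger.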

Relevant Issues

• Violation of Independence Assumption
  – For many real world tasks, $P(X_1, \ldots, X_n \mid C) \neq P(X_1 \mid C) \cdots P(X_n \mid C)$
  – Nevertheless, naïve Bayes works surprisingly well anyway!
• Zero conditional probability Problem
  – If no training example of class $c_i$ contains the attribute value $X_j = a_{jk}$, then $\hat{P}(X_j = a_{jk} \mid C = c_i) = 0$
  – In this circumstance, during test $\hat{P}(x_1 \mid c_i) \cdots \hat{P}(a_{jk} \mid c_i) \cdots \hat{P}(x_n \mid c_i) = 0$
  – For a remedy, conditional probabilities are estimated with
    $\hat{P}(X_j = a_{jk} \mid C = c_i) = \frac{n_c + m p}{n + m}$
    $n_c$: number of training examples for which $X_j = a_{jk}$ and $C = c_i$
    $n$: number of training examples for which $C = c_i$
    $p$: prior estimate (usually, $p = 1/t$ for $t$ possible values of $X_j$)
    $m$: weight to prior (number of "virtual" examples, $m \geq 1$)
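As a small illustration of the remedy, the sketch below applies the m-estimate to the Outlook counts for Play=No from the Play Tennis example, where Overcast never occurs. The choice m = 3 is an arbitrary assumption for demonstration; p = 1/3 follows the "1/t" convention for the three Outlook values.

```python
from fractions import Fraction

def m_estimate(n_c, n, p, m):
    """Smoothed estimate (n_c + m*p) / (n + m) of P(X_j = a_jk | C = c_i)."""
    return (n_c + m * p) / (n + m)

# Outlook counts among the 5 examples with Play=No: Sunny 3, Overcast 0, Rain 2.
counts_no = {"Sunny": 3, "Overcast": 0, "Rain": 2}
n = sum(counts_no.values())
p = Fraction(1, len(counts_no))  # uniform prior estimate, 1/t with t = 3
m = 3                            # assumed weight: three "virtual" examples

smoothed = {value: m_estimate(n_c, n, p, m) for value, n_c in counts_no.items()}
print(smoothed)
```

The smoothed estimates still sum to 1 over the values of the attribute, and the zero for Overcast is replaced by a small positive probability, so a single unseen value no longer wipes out the whole product at test time.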

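Relevant Issues

• Continuous-valued Input Attributes
  – An attribute can take infinitely many values
  – Conditional probability modeled with the normal distribution
    $\hat{P}(X_j \mid C = c_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{ji}} \exp\left(-\frac{(X_j - \mu_{ji})^2}{2\sigma_{ji}^2}\right)$
    $\mu_{ji}$: mean (average) of attribute values $X_j$ of examples for which $C = c_i$
    $\sigma_{ji}$: standard deviation of attribute values $X_j$ of examples for which $C = c_i$
  – Learning Phase: for $\mathbf{X} = (X_1, \ldots, X_n)$, $C = c_1, \ldots, c_L$
    Output: $n \times L$ normal distributions and $P(C = c_i)$, $i = 1, \ldots, L$
  – Test Phase: for $\mathbf{X}' = (X'_1, \ldots, X'_n)$
    • Calculate conditional probabilities with all the normal distributions
    • Apply the MAP rule to make a decision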
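A minimal sketch of naïve Bayes with normally distributed attributes. The data set is hypothetical, and the sample standard deviation (`statistics.stdev`, dividing by N−1) is an assumption, since the slides do not say which estimator is meant.

```python
import math
from statistics import mean, stdev

def gaussian_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def learn(examples, labels):
    """Learning phase: a prior and one (mu, sigma) pair per attribute, for each class."""
    model = {}
    for c in set(labels):
        rows = [x for x, y in zip(examples, labels) if y == c]
        # zip(*rows) iterates over attribute columns within this class.
        params = [(mean(col), stdev(col)) for col in zip(*rows)]
        model[c] = (len(rows) / len(labels), params)
    return model

def classify(x, model):
    """Test phase: evaluate the normal densities and apply the MAP rule."""
    scores = {
        c: prior * math.prod(gaussian_pdf(v, mu, s) for v, (mu, s) in zip(x, params))
        for c, (prior, params) in model.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical two-attribute data, three examples per class.
examples = [(1.0, 2.1), (1.2, 1.9), (0.8, 2.0),
            (3.0, 0.9), (3.2, 1.1), (2.8, 1.0)]
labels = ["a", "a", "a", "b", "b", "b"]

model = learn(examples, labels)
label, scores = classify((1.1, 2.0), model)
print(label, scores)
```

For long attribute vectors the product of densities can underflow; summing log-densities instead is a common workaround, though it is not covered in these slides.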

Conclusions

• Naïve Bayes is based on the independence assumption
  – Training is very easy and fast; it only requires considering each attribute in each class separately
  – Testing is straightforward; it only requires looking up tables or calculating conditional probabilities with normal distributions
• A popular generative model
  – Performance is competitive with most state-of-the-art classifiers, even when the independence assumption is violated
  – Many successful applications, e.g., spam mail filtering
  – A good candidate for a base learner in ensemble learning
  – Apart from classification, naïve Bayes can do more…
