The Naïve Bayes Classifier - svivek.com
TRANSCRIPT
Machine Learning
The Naïve Bayes Classifier
Today’s lecture
• The naïve Bayes Classifier
• Learning the naïve Bayes Classifier
• Practical concerns
Where are we?
We have seen Bayesian learning
– Using a probabilistic criterion to select a hypothesis
– Maximum a posteriori and maximum likelihood learning
• Question: What is the difference between them?
We could also learn functions that predict probabilities of outcomes
– Different from using a probabilistic criterion to learn
Maximum a posteriori (MAP) prediction as opposed to MAP learning
MAP prediction
Let’s use the Bayes rule for predicting y given an input x

$$P(Y = y \mid X = \mathbf{x}) = \frac{P(X = \mathbf{x} \mid Y = y)\, P(Y = y)}{P(X = \mathbf{x})}$$

The left-hand side is the posterior probability of the label being y for this input x

Predict y for the input x using

$$\arg\max_y \frac{P(X = \mathbf{x} \mid Y = y)\, P(Y = y)}{P(X = \mathbf{x})} = \arg\max_y P(X = \mathbf{x} \mid Y = y)\, P(Y = y)$$

The denominator $P(X = \mathbf{x})$ does not depend on y, so it can be dropped from the argmax
• $P(X = \mathbf{x} \mid Y = y)$: likelihood of observing this input x when the label is y
• $P(Y = y)$: prior probability of the label being y
All we need are these two sets of probabilities

Don’t confuse with MAP learning, which finds a hypothesis by

$$h_{MAP} = \arg\max_h P(D \mid h)\, P(h)$$
Example: Tennis again

Prior
Play tennis   P(Play tennis)
Yes           0.3
No            0.7
Without any other information, what is the prior probability that I should play tennis?

Likelihood
Temperature   Wind     P(T, W | Tennis = Yes)
Hot           Strong   0.15
Hot           Weak     0.4
Cold          Strong   0.1
Cold          Weak     0.35
On days that I do play tennis, what is the probability that the temperature is T and the wind is W?

Temperature   Wind     P(T, W | Tennis = No)
Hot           Strong   0.4
Hot           Weak     0.1
Cold          Strong   0.3
Cold          Weak     0.2
On days that I don’t play tennis, what is the probability that the temperature is T and the wind is W?
Input: Temperature = Hot (H), Wind = Weak (W)
Should I play tennis?
argmax_y P(H, W | play?) P(play?)
P(H, W | Yes) P(Yes) = 0.4 × 0.3 = 0.12
P(H, W | No) P(No) = 0.1 × 0.7 = 0.07
MAP prediction = Yes
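The tennis computation can be reproduced in a few lines. This is a minimal sketch: the probabilities are copied from the prior and likelihood tables of the example, while the names `prior`, `likelihood` and `map_predict` are made up for this illustration.

```python
# Probabilities copied from the tennis example tables.
prior = {"Yes": 0.3, "No": 0.7}
likelihood = {
    "Yes": {("Hot", "Strong"): 0.15, ("Hot", "Weak"): 0.4,
            ("Cold", "Strong"): 0.1, ("Cold", "Weak"): 0.35},
    "No": {("Hot", "Strong"): 0.4, ("Hot", "Weak"): 0.1,
           ("Cold", "Strong"): 0.3, ("Cold", "Weak"): 0.2},
}

def map_predict(x):
    """Return the MAP label and the unnormalized scores P(x | y) P(y)."""
    scores = {y: likelihood[y][x] * prior[y] for y in prior}
    return max(scores, key=scores.get), scores

label, scores = map_predict(("Hot", "Weak"))
print(label, scores)
```

The scores are not posteriors because P(x) was dropped. Dividing each score by their sum recovers the posterior, here 0.12 / 0.19, or about 0.63 for Yes.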
How hard is it to learn probabilistic models?

      O   T   H   W   Play?
 1    S   H   H   W   -
 2    S   H   H   S   -
 3    O   H   H   W   +
 4    R   M   H   W   +
 5    R   C   N   W   +
 6    R   C   N   S   -
 7    O   C   N   S   +
 8    S   M   H   W   -
 9    S   C   N   W   +
10    R   M   N   W   +
11    S   M   N   S   +
12    O   M   H   S   +
13    O   H   N   W   +
14    R   M   H   S   -

Outlook: S(unny), O(vercast), R(ainy)
Temperature: H(ot), M(edium), C(ool)
Humidity: H(igh), N(ormal), L(ow)
Wind: S(trong), W(eak)

We need to learn
1. The prior P(Play?)
2. The likelihoods P(X | Play?)
Prior P(Play?)
• A single number (Why only one?)
Likelihood P(X | Play?)
• There are 4 features, taking 3, 3, 3 and 2 values respectively (Outlook, Temperature, Humidity, Wind)
• For each value of Play? (+/-), we need a value for each possible assignment: P(x1, x2, x3, x4 | Play?)
• (3 ⋅ 3 ⋅ 3 ⋅ 2 − 1) = 53 parameters in each case: one for each assignment, minus one because the probabilities sum to 1
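To make the count concrete, the sketch below computes the number of possible inputs from the feature legend and compares it with the 14 training examples. The variable names are made up for this illustration, and each example is written as a four-letter string in the order Outlook, Temperature, Humidity, Wind.

```python
from math import prod

# Number of values each feature can take, from the legend:
# Outlook (S, O, R), Temperature (H, M, C), Humidity (H, N, L), Wind (S, W).
values_per_feature = [3, 3, 3, 2]

assignments = prod(values_per_feature)               # possible inputs x
free_params_per_label = assignments - 1              # probabilities sum to 1
total_likelihood_params = 2 * free_params_per_label  # labels + and -

# The 14 training examples (O, T, H, W), labels omitted.
examples = [
    "SHHW", "SHHS", "OHHW", "RMHW", "RCNW", "RCNS", "OCNS",
    "SMHW", "SCNW", "RMNW", "SMNS", "OMHS", "OHNW", "RMHS",
]
distinct_seen = len(set(examples))

print(assignments, free_params_per_label, total_likelihood_params, distinct_seen)
```

With only 14 distinct inputs observed out of 54 possible ones, most entries of the full likelihood table would have to be estimated from no data at all.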
In general
Prior P(Y)
• If there are k labels, then k − 1 parameters (why not k?)
Likelihood P(X | Y)
• If there are d Boolean features:
  • We need a value for each possible $P(x_1, x_2, \cdots, x_d \mid y)$ for each y
  • $k(2^d - 1)$ parameters
Need a lot of data to estimate this many numbers!
High model complexity
If there is very limited data, high variance in the parameters
How can we deal with this?
Answer: Make independence assumptions
Recall: Conditional independence
Suppose X, Y and Z are random variables
X is conditionally independent of Y given Z if the probability distribution of X is independent of the value of Y when Z is observed

$$P(X \mid Y, Z) = P(X \mid Z)$$

Or equivalently

$$P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)$$
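A quick numerical check of the definition, as a sketch: the three small distributions below are hypothetical numbers chosen only for illustration. The joint is built so that X and Y are conditionally independent given Z, and the code then confirms that P(X | Y, Z) computed from the joint matches P(X | Z).

```python
from itertools import product

# Hypothetical distributions over binary variables, for illustration only.
p_z = {0: 0.6, 1: 0.4}
p_x_given_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # p_x_given_z[z][x]
p_y_given_z = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.5, 1: 0.5}}  # p_y_given_z[z][y]

# Build a joint in which X and Y are conditionally independent given Z.
joint = {
    (x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
    for x, y, z in product((0, 1), repeat=3)
}

def p_x_given_yz(x, y, z):
    """P(X = x | Y = y, Z = z), computed from the joint table."""
    return joint[(x, y, z)] / sum(joint[(xx, y, z)] for xx in (0, 1))

# P(X | Y, Z) should equal P(X | Z) for every value of Y.
max_gap = max(
    abs(p_x_given_yz(x, y, z) - p_x_given_z[z][x])
    for x, y, z in product((0, 1), repeat=3)
)
print(max_gap)
```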
Modeling the features
$P(x_1, x_2, \cdots, x_d \mid y)$ required $k(2^d - 1)$ parameters
What if all the features were conditionally independent given the label?
The Naïve Bayes Assumption
That is,

$$P(x_1, x_2, \cdots, x_d \mid y) = P(x_1 \mid y)\, P(x_2 \mid y) \cdots P(x_d \mid y)$$

Requires only d numbers for each label. kd parameters overall. Not bad!
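The two parameter counts can be compared directly. A small sketch for Boolean features, with function names made up for this illustration:

```python
def full_joint_params(k, d):
    """Likelihood parameters without independence assumptions: k(2^d - 1)."""
    return k * (2 ** d - 1)

def naive_bayes_params(k, d):
    """Likelihood parameters under the naive Bayes assumption: k * d."""
    return k * d

# Two labels, growing number of Boolean features.
for d in (4, 10, 20, 30):
    print(d, full_joint_params(2, d), naive_bayes_params(2, d))
```

The first count grows exponentially in d while the second grows linearly, which is the saving the independence assumption buys.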
The Naïve Bayes Classifier
Assumption: Features are conditionally independent given the label Y
To predict, we need two sets of probabilities
– Prior $P(y)$
– For each $x_j$, we have the likelihood $P(x_j \mid y)$
Decision rule

$$h_{NB}(\mathbf{x}) = \arg\max_y P(y)\, P(x_1, x_2, \cdots, x_d \mid y) = \arg\max_y P(y) \prod_j P(x_j \mid y)$$
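The decision rule can be sketched for binary features as follows. Two things here are assumptions rather than part of the slides: the parameter values are hypothetical, and the code sums logarithms instead of multiplying probabilities. The log is monotonic, so the argmax is unchanged, and sums avoid underflow when there are many features.

```python
import math

def nb_predict(x, prior, likelihood):
    """Naive Bayes decision rule for binary features.

    prior[y] is P(y); likelihood[y][j] is P(x_j = 1 | y).
    Scores are sums of logs, which has the same argmax as the product.
    """
    scores = {}
    for y, p_y in prior.items():
        score = math.log(p_y)
        for j, x_j in enumerate(x):
            p_one = likelihood[y][j]
            score += math.log(p_one if x_j == 1 else 1.0 - p_one)
        scores[y] = score
    return max(scores, key=scores.get)

# Hypothetical parameters for two labels and three binary features.
prior = {"+": 0.4, "-": 0.6}
likelihood = {"+": [0.8, 0.3, 0.9], "-": [0.2, 0.5, 0.4]}

print(nb_predict([1, 0, 1], prior, likelihood))
print(nb_predict([0, 1, 0], prior, likelihood))
```

The sketch assumes every probability lies strictly between 0 and 1. A parameter of exactly 0 or 1 would make `math.log` fail on some inputs.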
Decision boundaries of naïve Bayes
What is the decision boundary of the naïve Bayes classifier?
Consider the two class case. We predict the label to be + if

$$P(y = +) \prod_j P(x_j \mid y = +) > P(y = -) \prod_j P(x_j \mid y = -)$$

or, equivalently, if

$$\frac{P(y = +) \prod_j P(x_j \mid y = +)}{P(y = -) \prod_j P(x_j \mid y = -)} > 1$$
Taking log and simplifying, we get

$$\log \frac{P(y = - \mid \mathbf{x})}{P(y = + \mid \mathbf{x})} = \mathbf{w}^T \mathbf{x} + b$$

This is a linear function in the feature space!
Easy to prove. See note on course website
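One way to see the linearity, sketched here for binary features $x_j \in \{0, 1\}$ only (an assumption; the slide defers the proof to the course note, which may argue differently). The symbols $\pi$, $\alpha_j$ and $\beta_j$ are introduced just for this sketch: $\pi = P(y = +)$, $\alpha_j = P(x_j = 1 \mid y = +)$, $\beta_j = P(x_j = 1 \mid y = -)$, all assumed to lie strictly between 0 and 1.

```latex
\begin{align*}
\log \frac{P(y = - \mid \mathbf{x})}{P(y = + \mid \mathbf{x})}
  &= \log \frac{1 - \pi}{\pi}
     + \sum_j \log \frac{P(x_j \mid y = -)}{P(x_j \mid y = +)} \\
  &= \log \frac{1 - \pi}{\pi}
     + \sum_j \left[ x_j \log \frac{\beta_j}{\alpha_j}
     + (1 - x_j) \log \frac{1 - \beta_j}{1 - \alpha_j} \right] \\
  &= \sum_j x_j \underbrace{\log \frac{\beta_j (1 - \alpha_j)}{\alpha_j (1 - \beta_j)}}_{w_j}
     + \underbrace{\log \frac{1 - \pi}{\pi}
     + \sum_j \log \frac{1 - \beta_j}{1 - \alpha_j}}_{b}
\end{align*}
```

The first line uses Bayes rule (the common factor $P(\mathbf{x})$ cancels in the ratio) together with the naïve Bayes assumption. The second uses $P(x_j \mid y = +) = \alpha_j^{x_j} (1 - \alpha_j)^{1 - x_j}$, and likewise for $\beta_j$.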
Today’s lecture
• The naïve Bayes Classifier
• Learning the naïve Bayes Classifier
• Practical concerns
Learning the naïve Bayes Classifier
• What is the hypothesis function h defined by?
  – A collection of probabilities
    • Prior for each label: $P(y)$
    • Likelihoods for feature $x_j$ given a label: $P(x_j \mid y)$
Suppose we have a dataset $D = \{(\mathbf{x}_i, y_i)\}$ with m examples, and we want to learn the classifier in a probabilistic way
– What is a probabilistic criterion to select the hypothesis?
A note on convention for this section:
• Examples in the dataset are indexed by the subscript i (e.g. $\mathbf{x}_i$)
• Features within an example are indexed by the subscript j
• The j-th feature of the i-th example will be $x_{ij}$
Maximum likelihood estimation

$$h_{ML} = \arg\max_h P(D \mid h)$$

Here h is defined by all the probabilities used to construct the naïve Bayes decision
Given a dataset $D = \{(\mathbf{x}_i, y_i)\}$ with m examples

$$h_{ML} = \arg\max_h \prod_{i=1}^{m} P(\mathbf{x}_i, y_i \mid h)$$

Each example in the dataset is independent and identically distributed, so we can represent $P(D \mid h)$ as this product
Each factor asks “What probability would this particular h assign to the pair $(\mathbf{x}_i, y_i)$?”

$$h_{ML} = \arg\max_h \prod_{i=1}^{m} P(\mathbf{x}_i \mid y_i, h)\, P(y_i \mid h) = \arg\max_h \prod_{i=1}^{m} P(y_i \mid h) \prod_j P(x_{ij} \mid y_i, h)$$

The last step uses the Naïve Bayes assumption; $x_{ij}$ is the j-th feature of $\mathbf{x}_i$
How do we proceed? The logarithm is monotonic, so we can maximize the log of the product instead

$$h_{ML} = \arg\max_h \sum_{i=1}^{m} \log P(y_i \mid h) + \sum_{i=1}^{m} \sum_j \log P(x_{ij} \mid y_i, h)$$
![Page 52: The Naïve Bayes Classifier - svivek.com · Let’s be use the Bayes rule for predicting ygiven an input x Predict yfor the input x using 9 Don’t confuse with MAP learning: finds](https://reader030.vdocuments.us/reader030/viewer/2022041206/5d5eb29188c993a0128bace4/html5/thumbnails/52.jpg)
LearningthenaïveBayesClassifier
Maximumlikelihoodestimation
52
Whatnext?
![Page 53: The Naïve Bayes Classifier - svivek.com · Let’s be use the Bayes rule for predicting ygiven an input x Predict yfor the input x using 9 Don’t confuse with MAP learning: finds](https://reader030.vdocuments.us/reader030/viewer/2022041206/5d5eb29188c993a0128bace4/html5/thumbnails/53.jpg)
LearningthenaïveBayesClassifier
Maximumlikelihoodestimation
53
Whatnext?
Weneedtomakeamodelingassumptionaboutthefunctionalformoftheseprobabilitydistributions
![Page 54: The Naïve Bayes Classifier - svivek.com · Let’s be use the Bayes rule for predicting ygiven an input x Predict yfor the input x using 9 Don’t confuse with MAP learning: finds](https://reader030.vdocuments.us/reader030/viewer/2022041206/5d5eb29188c993a0128bace4/html5/thumbnails/54.jpg)
LearningthenaïveBayesClassifier
Maximumlikelihoodestimation
54
Forsimplicity,supposetherearetwolabels1 and0 andallfeaturesarebinary
• Prior:P(y=1)=p andP(y=0)=1– p
![Page 55: The Naïve Bayes Classifier - svivek.com · Let’s be use the Bayes rule for predicting ygiven an input x Predict yfor the input x using 9 Don’t confuse with MAP learning: finds](https://reader030.vdocuments.us/reader030/viewer/2022041206/5d5eb29188c993a0128bace4/html5/thumbnails/55.jpg)
LearningthenaïveBayesClassifier
Maximumlikelihoodestimation
55
Forsimplicity,supposetherearetwolabels1 and0 andallfeaturesarebinary
• Prior:P(y=1)=p andP(y=0)=1– p
• Likelihood foreachfeaturegivenalabel• P(xj =1|y=1)=aj andP(xj =0 |y=1)=1– aj• P(xj =1|y=0)=bj andP(xj =0 |y=0)=1- bj
![Page 56: The Naïve Bayes Classifier - svivek.com · Let’s be use the Bayes rule for predicting ygiven an input x Predict yfor the input x using 9 Don’t confuse with MAP learning: finds](https://reader030.vdocuments.us/reader030/viewer/2022041206/5d5eb29188c993a0128bace4/html5/thumbnails/56.jpg)
LearningthenaïveBayesClassifier
Maximumlikelihoodestimation
56
Forsimplicity,supposetherearetwolabels1 and0 andallfeaturesarebinary
• Prior:P(y=1)=p andP(y=0)=1– p
• Likelihood foreachfeaturegivenalabel• P(xj =1|y=1)=aj andP(xj =0 |y=1)=1– aj• P(xj =1|y=0)=bj andP(xj =0 |y=0)=1- bj
![Page 57: The Naïve Bayes Classifier - svivek.com · Let’s be use the Bayes rule for predicting ygiven an input x Predict yfor the input x using 9 Don’t confuse with MAP learning: finds](https://reader030.vdocuments.us/reader030/viewer/2022041206/5d5eb29188c993a0128bace4/html5/thumbnails/57.jpg)
Learning the naïve Bayes Classifier
Maximum likelihood estimation
For simplicity, suppose there are two labels 1 and 0 and all features are binary
• Prior: P(y = 1) = p and P(y = 0) = 1 - p
• Likelihood for each feature given a label
  – P(x_j = 1 | y = 1) = a_j and P(x_j = 0 | y = 1) = 1 - a_j
  – P(x_j = 1 | y = 0) = b_j and P(x_j = 0 | y = 0) = 1 - b_j
h consists of p, all the a's and b's
• Prior: P(y = 1) = p and P(y = 0) = 1 - p
[z] is called the indicator function or the Iverson bracket
Its value is 1 if the argument z is true and zero otherwise
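With the Iverson bracket, the prior's contribution to the likelihood of the training data can be written compactly. The following is a sketch in assumed notation: the training set has m independently drawn examples, and y_i is the label of example i (m and y_i are introduced here for illustration).

$$
\prod_{i=1}^{m} P(y_i) \;=\; \prod_{i=1}^{m} p^{[y_i = 1]} \,(1 - p)^{[y_i = 0]}
$$

Each example contributes a factor p when its label is 1 and a factor 1 - p when its label is 0, because exactly one of the two brackets equals 1.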
Likelihood for each feature given a label
• P(x_j = 1 | y = 1) = a_j and P(x_j = 0 | y = 1) = 1 - a_j
• P(x_j = 1 | y = 0) = b_j and P(x_j = 0 | y = 0) = 1 - b_j
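The four cases above can be folded into a single expression with indicator brackets. This is a sketch in assumed notation: x_{ij} is the j-th feature of example i, y_i is its label, and d is the number of features (none of these symbols appear on the slide).

$$
P(x_{ij} \mid y_i) \;=\; a_j^{[x_{ij}=1,\; y_i=1]} \,(1-a_j)^{[x_{ij}=0,\; y_i=1]} \; b_j^{[x_{ij}=1,\; y_i=0]} \,(1-b_j)^{[x_{ij}=0,\; y_i=0]}
$$

Under the naïve Bayes assumption, the log-likelihood of the data is then a sum of independent terms:

$$
\log P(D \mid h) \;=\; \sum_{i=1}^{m} \Big( \log P(y_i) + \sum_{j=1}^{d} \log P(x_{ij} \mid y_i) \Big)
$$

Because each parameter appears in its own group of terms, p, each a_j, and each b_j can be maximized separately.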
Substituting and deriving the argmax, we get
P(y = 1) = p
P(x_j = 1 | y = 1) = a_j
P(x_j = 1 | y = 0) = b_j
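For reference, the standard closed-form maximum likelihood estimates for this model are ratios of counts. The notation is assumed rather than taken from the slide: m is the number of training examples, x_{ij} is the j-th feature of example i, and [·] is the indicator bracket.

$$
\hat{p} = \frac{\sum_{i} [y_i = 1]}{m}, \qquad
\hat{a}_j = \frac{\sum_{i} [x_{ij} = 1,\; y_i = 1]}{\sum_{i} [y_i = 1]}, \qquad
\hat{b}_j = \frac{\sum_{i} [x_{ij} = 1,\; y_i = 0]}{\sum_{i} [y_i = 0]}
$$

In words: the prior is the fraction of examples with label 1, and each likelihood is the fraction of examples with a given label in which the feature is on.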
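Let’s learn a naïve Bayes classifier

| #  | O | T | H | W | Play? |
|----|---|---|---|---|-------|
| 1  | S | H | H | W | - |
| 2  | S | H | H | S | - |
| 3  | O | H | H | W | + |
| 4  | R | M | H | W | + |
| 5  | R | C | N | W | + |
| 6  | R | C | N | S | - |
| 7  | O | C | N | S | + |
| 8  | S | M | H | W | - |
| 9  | S | C | N | W | + |
| 10 | R | M | N | W | + |
| 11 | S | M | N | S | + |
| 12 | O | M | H | S | + |
| 13 | O | H | N | W | + |
| 14 | R | M | H | S | - |

P(Play = +) = 9/14    P(Play = -) = 5/14
P(O = S | Play = +) = 2/9
P(O = R | Play = +) = 3/9
P(O = O | Play = +) = 4/9
And so on, for other attributes and also for Play = -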
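The counting above can be reproduced mechanically. The following is a minimal sketch, not a reference implementation: the names `TABLE`, `FEATURES`, and `estimate` are made up for illustration, and exact fractions are used so the results match the hand-computed values.

```python
from collections import Counter
from fractions import Fraction

# The 14 rows of the table: O, T, H, W, Play?
TABLE = """
S H H W -
S H H S -
O H H W +
R M H W +
R C N W +
R C N S -
O C N S +
S M H W -
S C N W +
R M N W +
S M N S +
O M H S +
O H N W +
R M H S -
"""
DATA = [tuple(line.split()) for line in TABLE.strip().splitlines()]
FEATURES = ["O", "T", "H", "W"]


def estimate(data):
    """Maximum likelihood estimates by counting and normalizing."""
    label_counts = Counter(row[-1] for row in data)
    prior = {y: Fraction(c, len(data)) for y, c in label_counts.items()}

    # Count how often each (feature, value) pair occurs with each label.
    pair_counts = Counter()
    for row in data:
        y = row[-1]
        for name, value in zip(FEATURES, row[:-1]):
            pair_counts[(name, value, y)] += 1

    likelihood = {
        key: Fraction(count, label_counts[key[2]])
        for key, count in pair_counts.items()
    }
    return prior, likelihood


prior, likelihood = estimate(DATA)
print(prior["+"], prior["-"])        # 9/14 5/14
print(likelihood[("O", "S", "+")])   # 2/9
```

Note that a pair that never occurs in the data, such as O = O with Play = -, has no entry in `likelihood` at all; plain counting gives it probability zero.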
Naïve Bayes: Learning and Prediction
• Learning
  – Count how often features occur with each label. Normalize to get likelihoods
  – Priors from fraction of examples with each label
  – Generalizes to multiclass
• Prediction
  – Use learned probabilities to find highest scoring label
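Learning and prediction together fit in a few lines. The sketch below is one possible arrangement, with hypothetical function names `train` and `predict`; it uses the 14-row play table from the earlier example and scores each label as prior times the product of per-feature likelihoods, with no smoothing.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def train(examples):
    """examples: list of (feature_tuple, label). Learns by counting."""
    label_counts = Counter(y for _, y in examples)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    for x, y in examples:
        for j, v in enumerate(x):
            value_counts[(j, y)][v] += 1
    prior = {y: Fraction(c, len(examples)) for y, c in label_counts.items()}

    def likelihood(j, v, y):
        # Unseen (value, label) pairs get count 0, hence probability 0.
        return Fraction(value_counts[(j, y)][v], label_counts[y])

    return prior, likelihood


def predict(x, prior, likelihood):
    """Return the highest scoring label and the score of every label."""
    scores = {}
    for y, p in prior.items():
        score = p
        for j, v in enumerate(x):
            score *= likelihood(j, v, y)
        scores[y] = score
    return max(scores, key=scores.get), scores


ROWS = """S H H W -|S H H S -|O H H W +|R M H W +|R C N W +|R C N S -|O C N S +
S M H W -|S C N W +|R M N W +|S M N S +|O M H S +|O H N W +|R M H S -"""
EXAMPLES = [
    (tuple(r.split()[:4]), r.split()[4])
    for r in ROWS.replace("\n", "|").split("|")
]

prior, likelihood = train(EXAMPLES)
label, scores = predict(("S", "C", "H", "S"), prior, likelihood)
print(label)                     # -
print(scores["+"], scores["-"])  # 1/189 18/875
```

The same code handles more than two labels without change, since it loops over whatever labels appear in the training data.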
Today’s lecture
• The naïve Bayes Classifier
• Learning the naïve Bayes Classifier
• Practical concerns + an example
Important caveats with Naïve Bayes
1. Features need not be conditionally independent given the label
   – Just because we assume that they are doesn’t mean that that’s how they behave in nature
   – We made a modeling assumption because it makes computation and learning easier
2. Not enough training data to get good estimates of the probabilities from counts
1. Features are not conditionally independent given the label
All bets are off if the naïve Bayes assumption is not satisfied
And yet, very often used in practice because of simplicity
Works reasonably well even when the assumption is violated
2. Not enough training data to get good estimates of the probabilities from counts
The basic operation for learning likelihoods is counting how often a feature occurs with a label.
What if we never see a particular feature with a particular label?
Eg: Suppose we never observe Temperature = cold with PlayTennis = Yes
Should we treat those counts as zero? But that will make the probabilities zero
Answer: Smoothing
• Add fake counts (very small numbers so that the counts are not zero)
• The Bayesian interpretation of smoothing: Priors on the hypothesis (MAP learning)
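One common way to add fake counts is add-alpha (Laplace) smoothing: add alpha to every count and alpha times the number of possible feature values to the denominator, so the estimates still sum to one. This is a sketch of that particular scheme, which the slides do not name explicitly; the function name and the default alpha = 1 are assumptions. The counts in the usage lines come from the play table earlier in the lecture, where Outlook = O never occurs with Play = -.

```python
def smoothed_likelihood(count, label_total, num_values, alpha=1.0):
    """Add-alpha estimate of P(feature = value | label).

    count:       times the value was seen with the label
    label_total: number of examples with the label
    num_values:  number of distinct values the feature can take
    alpha:       fake count added to every value (assumed default: 1)
    """
    return (count + alpha) / (label_total + alpha * num_values)


# Outlook given Play = -: S seen 3 times, R seen 2 times, O never, out of 5.
outlook_counts = {"S": 3, "R": 2, "O": 0}
smoothed = {
    v: smoothed_likelihood(c, label_total=5, num_values=3)
    for v, c in outlook_counts.items()
}
print(smoothed)  # {'S': 0.5, 'R': 0.375, 'O': 0.125}
```

The unseen value now gets a small nonzero probability, so a single unseen feature value can no longer force the whole product for a label to zero.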
Example: Classifying text
• Instance space: Text documents
• Labels: Spam or NotSpam
• Goal: To learn a function that can predict whether a new document is Spam or NotSpam
How would you build a Naïve Bayes classifier?
Let us brainstorm
• How to represent documents?
• How to estimate probabilities?
• How to classify?
1. Represent documents by a vector of words
   A sparse vector consisting of one feature per word
2. Learning from N labeled documents
   1. Priors
   2. For each word w in vocabulary:
      How often does a word occur with a label?
      Smoothing
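The recipe above can be sketched as a small program. Several choices here are assumptions rather than something the slides specify: documents are split on whitespace, word likelihoods are smoothed word-count ratios with add-one smoothing, words outside the training vocabulary are ignored at prediction time, and scores are summed in log space to avoid underflow. The four training documents are invented toy data.

```python
import math
from collections import Counter


def train(docs, alpha=1.0):
    """docs: list of (text, label). Returns log priors, log likelihoods, vocabulary."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in docs:
        word_counts[label].update(text.lower().split())

    vocab = set().union(*word_counts.values())
    log_prior = {y: math.log(c / len(docs)) for y, c in label_counts.items()}

    log_like = {}
    for y, counts in word_counts.items():
        # Smoothed denominator: all word occurrences with label y, plus fake counts.
        total = sum(counts.values()) + alpha * len(vocab)
        log_like[y] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_like, vocab


def classify(text, log_prior, log_like, vocab):
    """Return the label with the highest log score for the document."""
    words = [w for w in text.lower().split() if w in vocab]
    scores = {
        y: log_prior[y] + sum(log_like[y][w] for w in words)
        for y in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy corpus, for illustration only.
DOCS = [
    ("win money now", "Spam"),
    ("cheap money offer", "Spam"),
    ("meeting schedule today", "NotSpam"),
    ("project meeting notes", "NotSpam"),
]
model = train(DOCS)
print(classify("money offer now", *model))   # Spam
print(classify("meeting notes", *model))     # NotSpam
```

A variant closer to the binary-feature model from earlier in the lecture would record only whether each word is present in a document, not how many times it occurs.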
Continuous features
• So far, we have been looking at discrete features
  – P(x_j | y) is a Bernoulli trial (i.e. a coin toss)
• We could model P(x_j | y) with other distributions too
  – This is a separate assumption from the independence assumption that naive Bayes makes
  – Eg: For real valued features, (X_j | Y) could be drawn from a normal distribution
• Exercise: Derive the maximum likelihood estimate when the features are assumed to be drawn from the normal distribution
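As a check for the exercise, here is the well-known result in assumed notation: each feature j has its own mean and variance per label k, x_{ij} is the j-th feature of example i, and [·] is the indicator bracket. The per-class parameterization is one common modeling choice, not the only one.

$$
P(x_j \mid y = k) = \frac{1}{\sqrt{2\pi\sigma_{jk}^2}} \exp\!\left( -\frac{(x_j - \mu_{jk})^2}{2\sigma_{jk}^2} \right)
$$

$$
\hat{\mu}_{jk} = \frac{\sum_i [y_i = k]\, x_{ij}}{\sum_i [y_i = k]}, \qquad
\hat{\sigma}_{jk}^2 = \frac{\sum_i [y_i = k]\, (x_{ij} - \hat{\mu}_{jk})^2}{\sum_i [y_i = k]}
$$

The estimates are the sample mean and the (biased) sample variance of the feature over the examples with label k, which mirrors the count-and-normalize rule for discrete features.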
Summary: Naïve Bayes
• Independence assumption
  – All features are independent of each other given the label
• Maximum likelihood learning: Learning is simple
  – Generalizes to real valued features
• Prediction via MAP estimation
  – Generalizes to beyond binary classification
• Important caveats to remember
  – Smoothing
  – Independence assumption may not be valid
• Decision boundary is linear for binary classification
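The last point can be seen by writing out the log odds for the binary-feature model with prior p and likelihoods a_j, b_j. This is a sketch that assumes all parameters lie strictly between 0 and 1; the symbols w_0 and w_j are introduced here for illustration.

$$
\log \frac{P(y = 1 \mid x)}{P(y = 0 \mid x)}
= \log \frac{p}{1 - p} + \sum_{j} \left( x_j \log \frac{a_j}{b_j} + (1 - x_j) \log \frac{1 - a_j}{1 - b_j} \right)
= w_0 + \sum_{j} w_j x_j
$$

$$
w_j = \log \frac{a_j (1 - b_j)}{b_j (1 - a_j)}, \qquad
w_0 = \log \frac{p}{1 - p} + \sum_{j} \log \frac{1 - a_j}{1 - b_j}
$$

The classifier predicts label 1 exactly when this linear function of x is positive, so the decision boundary is a hyperplane.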