UVA CS 4501: Machine Learning
Lecture 17: Naïve Bayes Classifier for Text Classification
Dr. Yanjun Qi, University of Virginia, Department of Computer Science


Page 1:

UVA CS 4501: Machine Learning
Lecture 17: Naïve Bayes Classifier for Text Classification

Dr. Yanjun Qi
University of Virginia
Department of Computer Science

Page 2: Where are we? → Three major sections for classification

• We can divide the large variety of classification approaches into roughly three major types:

1. Discriminative: directly estimate a decision rule/boundary, e.g., SVM, NN, decision tree
2. Generative: build a generative statistical model, e.g., naïve Bayes classifier, Bayesian networks
3. Instance-based classifiers: use the observations directly (no model), e.g., K nearest neighbors

Page 3: Today: Naïve Bayes Classifier for Text

✓ Dictionary-based vector space representation of a text article
✓ Multivariate Bernoulli vs. multinomial
✓ Multivariate Bernoulli naïve Bayes classifier
  - Testing
  - Training with maximum likelihood estimation for estimating parameters
✓ Multinomial naïve Bayes classifier
  - Testing
  - Training with maximum likelihood estimation for estimating parameters
  - Multinomial naïve Bayes classifier as a conditional stochastic language model (extra)

Page 4: Text document classification, e.g., spam email filtering

• Input: document D
• Output: the predicted class C, where c is from {c_1, ..., c_L}

• E.g., spam filtering task: classify email as 'Spam' or 'Other':  P(C = spam | D)

Example spam email:
From: "" <[email protected]>
Subject: real estate is the only way... gemoalvgkay
Anyone can buy real estate with no money down. Stop paying rent TODAY! Change your life NOW by taking a simple course! Click Below to order: http://www.wholesaledaily.com/sales/nmd.htm

Page 5: Naive Bayes is Not So Naive

• Naive Bayes won 1st and 2nd place in the KDD-CUP 97 competition out of 16 systems.
  Goal: financial services industry direct mail response prediction: predict if the recipient of mail will actually respond to the advertisement (750,000 records).

• A good, dependable baseline for text classification (but not the best)!

• For most text categorization tasks, there are many relevant features and many irrelevant ones.

Page 6: Text classification tasks

• Input: document D
• Output: the predicted class C, where c is from {c_1, ..., c_L}

Text classification examples:
• Classify email as 'Spam' or 'Other'.
• Classify web pages as 'Student', 'Faculty', 'Other'.
• Classify news stories into topics 'Sports', 'Politics', ...
• Classify business names by industry.
• Classify movie reviews as 'Favorable', 'Unfavorable', 'Neutral'.
• ... and many more.

Page 7: (figure slide; content not captured in the transcript)

Page 8: Text Categorization / Classification (Sec. 13.1)

• Given:
  - A representation of a text document d
    - Issue: how to represent text documents.
    - Usually some type of high-dimensional space: bag of words.
  - A fixed set of output classes: C = {c_1, c_2, ..., c_J}

Page 9: The bag-of-words representation

Example document:
"I love this movie! It's sweet, but with satirical humor. The dialogue is great and the adventure scenes are fun... It manages to be whimsical and romantic while laughing at the conventions of the fairy tale genre. I would recommend it to just about anyone. I've seen it several times, and I'm always happy to see it again whenever I have a friend who hasn't seen it yet."

f( document ) = c

Page 10: The bag-of-words representation

f( word counts of the document ) = c

word        frequency
great       2
love        2
recommend   1
laugh       1
happy       1
...         ...

Page 11: Representing text: a list of words → Dictionary

(Same movie-review example document as on Page 9.)

word        frequency
great       2
love        2
recommend   1
laugh       1
happy       1
...         ...

Common refinements: remove stop words, stemming, collapsing multiple occurrences of words into one, ...

Page 12: 'Bag of words' representation of text

Bag-of-words representation: represent text as a vector of word frequencies.

(Same movie-review example document and word-frequency table as on Page 11.)

Page 13: Another "bag of words" representation of text → each dictionary word as a Boolean

Bag-of-words representation: represent text as a vector of Booleans indicating whether each word exists in the document or not.

(Same movie-review example document as on Page 9.)

word        Boolean
great       Yes
love        Yes
recommend   Yes
laugh       Yes
happy       Yes
hate        No
...         ...
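To make the two representations above concrete, here is a minimal Python sketch (not part of the original slides; the tiny dictionary and tokenizer are illustrative assumptions) that builds both the frequency vector and the Boolean presence vector for a document:

from collections import Counter
import re

dictionary = ["great", "love", "recommend", "laugh", "happy", "hate"]   # toy dictionary

def tokenize(text):
    # very simple tokenizer: lowercase, keep alphabetic characters and apostrophes
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(text, vocab):
    counts = Counter(tokenize(text))
    freq_vector = [counts[w] for w in vocab]        # multinomial-style counts
    bool_vector = [counts[w] > 0 for w in vocab]    # Bernoulli-style presence
    return freq_vector, bool_vector

doc = "I love this movie! The dialogue is great and I would recommend it. Great fun, I'm happy."
print(bag_of_words(doc, dictionary))
# ([2, 1, 1, 0, 1, 0], [True, True, True, False, True, False])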

Page 14: Bag of words

• What simplifying assumption are we making?

We assume word order is not important.

Page 15: Bag-of-words representation of a collection of documents

(Figure: a document-by-word matrix X for a collection of documents, with a class label column C;
e.g., X(i, j) = frequency of word j in document i.)

Page 16: Unknown Words

• How to handle words in the test corpus that did not occur in the training data, i.e., out-of-vocabulary (OOV) words?

• Train a model that includes an explicit symbol for an unknown word (<UNK>).
  - Choose a vocabulary in advance and replace other (i.e., not-in-vocabulary) words in the corpus with <UNK>.
  - Very often, <UNK> is also used to replace rare words.
  - Replace the first occurrence of each word in the training data with <UNK>.
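A minimal Python sketch of the first strategy above (illustrative only; the min_count cutoff is an assumption and could be raised to also map rare words to <UNK>):

from collections import Counter

UNK = "<UNK>"

def build_vocab(train_docs, min_count=1):
    # choose the vocabulary in advance from the training data
    counts = Counter(w for doc in train_docs for w in doc)
    return {w for w, c in counts.items() if c >= min_count} | {UNK}

def map_oov(doc, vocab):
    # replace not-in-vocabulary tokens with the explicit <UNK> symbol
    return [w if w in vocab else UNK for w in doc]

train = [["the", "boy", "likes", "the", "dog"], ["the", "dog", "barks"]]
vocab = build_vocab(train)
print(map_oov(["the", "cat", "likes", "milk"], vocab))
# ['the', '<UNK>', 'likes', '<UNK>']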

Page 17: Today: Naïve Bayes Classifier for Text (outline repeated from Page 3)

Page 18: 'Bag of words' → what probability model?

Pr(D = d | C = c_i) = ?

(Same movie-review example document, reduced to its dictionary word list: great, love, recommend, laugh, happy, ...)

Page 19: 'Bag of words' → what probability model?

Pr(D | C = c) = ?

Two previous models:
  Pr(W_1 = n_1, W_2 = n_2, ..., W_k = n_k | C = c)
  Pr(W_1 = true, W_2 = false, ..., W_k = true | C = c)

Page 20: Naïve probabilistic models of text documents

Pr(D | C = c) = ?   Two previous models:

  Multinomial distribution:            Pr(W_1 = n_1, W_2 = n_2, ..., W_k = n_k | C = c)
  Multivariate Bernoulli distribution: Pr(W_1 = true, W_2 = false, ..., W_k = true | C = c)

Page 21: Text classification with the naïve Bayes classifier

• Multinomial vs. multivariate Bernoulli?

• The multinomial model is almost always more effective in text applications!

(Adapted from Manning's text categorization tutorial)

Page 22: Experiment: multinomial vs. multivariate Bernoulli

• McCallum & Nigam (1998) did some experiments to see which is better
• Task: determine whether a university web page is {student, faculty, other_stuff}
• Train on ~5,000 hand-labeled web pages (Cornell, Washington, U. Texas, Wisconsin)
• Crawl and classify a new site (CMU)

(Adapted from Manning's text categorization tutorial)

Page 23: Multinomial vs. multivariate Bernoulli

(Results figure not captured in the transcript.)

Page 24: Today: Naïve Bayes Classifier for Text (outline repeated from Page 3)

Page 25: Model 1: Multivariate Bernoulli

• Model 1: Multivariate Bernoulli
  - One feature X_w for each word w in the dictionary
  - X_w = true in document d if w appears in d
  - Naive Bayes assumption: given the document's topic class label, the appearance of one word in the document tells us nothing about the chances that another word appears.

Pr(W_1 = true, W_2 = false, ..., W_k = true | C = c)

(Same movie-review example document with its Boolean word vector: great Yes, love Yes, recommend Yes, laugh Yes, happy Yes, hate No, ...)

Page 26: Model 1: Multivariate Bernoulli (repeat of Page 25)

Page 27: Model 1: Multivariate Bernoulli Naïve Bayes Classifier

• Conditional independence assumption: features (word presence) are independent of each other given the class variable.

• The multivariate Bernoulli model is appropriate for binary feature variables.

word        True/False
great       Yes
love        Yes
recommend   Yes
laugh       Yes
happy       Yes
hate        No
...         ...

Page 28: Model 1: Multivariate Bernoulli

Pr(W_1 = true, W_2 = false, ..., W_k = true | C = c)
  = P(W_1 = true | C) · P(W_2 = false | C) · ... · P(W_k = true | C)

This factorization is the naïve part.

(Example graphical model: C = Flu, with children W_1, ..., W_6 standing for runny nose, sinus, fever, muscle-ache, running, ...)

Page 29: Review: Bernoulli distribution, e.g., coin flips

• You flip a coin
  - Heads with probability p
  - Binary random variable
  - Bernoulli trial with success probability p

• You flip k coins
  - How many heads would you expect?
  - Number of heads X: discrete random variable
  - Binomial distribution with parameters k and p

Each word-presence feature is such a binary variable: Pr(W_i = true | C = c)

Page 30: (figure slide; content not captured in the transcript)

Page 31: Review: Deriving the maximum likelihood estimate for a Bernoulli

Minimize the negative log-likelihood → MLE parameter estimation:

-l(p) = -log(L(p)) = -log[ p^x (1-p)^(n-x) ]
      = -log(p^x) - log((1-p)^(n-x))
      = -x log(p) - (n-x) log(1-p)

p̂ = x / n

The proportion of positives! i.e., the relative frequency of a binary event.
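For completeness, the step the slide leaves implicit (setting the derivative of the negative log-likelihood to zero) is:

d/dp [ -x log(p) - (n - x) log(1 - p) ] = -x/p + (n - x)/(1 - p) = 0
⇒ x(1 - p) = (n - x) p ⇒ p̂ = x / n

For example, observing x = 7 heads in n = 10 flips gives p̂ = 0.7.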

Page 32: Parameter estimation

• Multivariate Bernoulli model:

  P̂(w_i = true | c_j) = fraction of documents of label c_j in which word w_i appears

• Smoothing to avoid overfitting

(Adapted from Manning's text categorization tutorial)
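A minimal Python sketch of this estimate (illustrative, not from the slides; add-α smoothing with α = 1 is an assumption, since the slide only says "smoothing"):

from collections import defaultdict

def train_bernoulli_nb(docs, labels, vocab, alpha=1.0):
    # docs: list of token lists; labels: parallel list of class labels; vocab: dictionary words
    classes = sorted(set(labels))
    n_docs = {c: labels.count(c) for c in classes}
    prior = {c: n_docs[c] / len(labels) for c in classes}

    # per class: in how many documents does each word appear at least once?
    doc_freq = {c: defaultdict(int) for c in classes}
    for tokens, c in zip(docs, labels):
        for w in set(tokens):                      # presence, not frequency
            doc_freq[c][w] += 1

    # P(w = true | c): smoothed fraction of class-c documents containing w
    cond = {c: {w: (doc_freq[c][w] + alpha) / (n_docs[c] + 2 * alpha) for w in vocab}
            for c in classes}
    return prior, cond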

Page 33: Testing stage (look-up operations)

(Figure not captured in the transcript.)

Page 34: Underflow prevention: log space

• Multiplying lots of probabilities, which are between 0 and 1, can result in floating-point underflow.
• Since log(xy) = log(x) + log(y), it is better to perform all computations by summing logs of probabilities rather than multiplying probabilities.
• The class with the highest final un-normalized log probability score is still the most probable.
• Note that the model is now just a max of a sum of weights:

  c_NB = argmax_{c_j ∈ C} { log P(c_j) + Σ_{i ∈ dictionary} log P(x_i | c_j) }

(Adapted from Manning's text categorization tutorial)
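Continuing the hypothetical Bernoulli sketch from Page 32, prediction done entirely with sums of logs looks like this (an illustrative sketch; for the Bernoulli model, absent dictionary words contribute log(1 - p)):

import math

def predict_bernoulli_nb(tokens, prior, cond, vocab):
    present = set(tokens)
    best_class, best_score = None, float("-inf")
    for c in prior:
        # sum logs of probabilities instead of multiplying them, to avoid underflow
        score = math.log(prior[c])
        for w in vocab:
            p = cond[c][w]
            score += math.log(p) if w in present else math.log(1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class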

Page 35: Today: Naïve Bayes Classifier for Text (outline repeated from Page 3)

Page 36: Model 2: Multinomial Naïve Bayes - 'bag of words' representation of text

The word-frequency vector can be represented as a multinomial distribution.

Words = like colored balls; there are K possible types of them (i.e., from a dictionary of K words).
Document = contains N words; each word occurs n_i times (like a bag of N colored balls).

Pr(W_1 = n_1, ..., W_k = n_k | C = c)

word        frequency
great       2
love        2
recommend   1
laugh       1
happy       1
...         ...

Pages 37-38: Model 2: Multinomial Naïve Bayes (repeat of Page 36)

Page 39: Multinomial distribution

• The multinomial distribution is a generalization of the binomial distribution.

• The binomial distribution counts successes of an event (for example, heads in coin tosses).
  - The parameters: N (number of trials), p (the probability of success of the event)

• The multinomial counts the number of occurrences of each of a set of events (for example, how many times each side of a die comes up in a set of rolls).
  - The parameters: N (number of trials), θ_1, ..., θ_k (the probability of success for each category)

A binomial distribution is the multinomial distribution with k = 2 and θ_1 = p, θ_2 = 1 - θ_1.

Page 40: Multinomial distribution (repeat of Page 39)

Page 41: Multinomial distribution for text classification

• W_1, W_2, ..., W_k are variables

P(W_1 = n_1, ..., W_k = n_k | c, N, θ_{1,c}, ..., θ_{k,c})
  = [ N! / (n_1! n_2! ... n_k!) ] · θ_{1,c}^{n_1} · θ_{2,c}^{n_2} · ... · θ_{k,c}^{n_k}

where Σ_{i=1..k} n_i = N and Σ_{i=1..k} θ_{i,c} = 1.

N! / (n_1! n_2! ... n_k!) is the number of possible orderings of the N balls (order-invariant selections); note the events are independent.

A binomial distribution is the multinomial distribution with k = 2 and θ_1 = p, θ_2 = 1 - θ_1.
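As a small sanity check on this formula (an illustrative sketch, not from the slides; the θ values below are made up), the multinomial probability of the toy count vector from Page 36 can be computed directly:

from math import factorial, prod

def multinomial_pmf(counts, thetas):
    # counts: n_1..n_k ; thetas: theta_{1,c}..theta_{k,c} for one class (must sum to 1)
    n = sum(counts)                                             # N = total number of words
    coeff = factorial(n) // prod(factorial(c) for c in counts)  # number of orderings of the N "balls"
    return coeff * prod(t ** c for t, c in zip(thetas, counts))

# hypothetical class-conditional probabilities for (great, love, recommend, laugh, happy, other)
thetas = [0.05, 0.05, 0.02, 0.02, 0.02, 0.84]
counts = [2, 2, 1, 1, 1, 0]      # the word-frequency vector from the example document
print(multinomial_pmf(counts, thetas))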

Page 42: Multinomial distribution for text classification (repeat of Page 41)

Page 43: Model 2: Multinomial Naïve Bayes - 'bag of words' - TESTING stage

Given a document's word-frequency vector (great 2, love 2, recommend 1, laugh 1, happy 1, ...):

argmax_c P(W_1 = n_1, ..., W_k = n_k, c)
  = argmax_c { p(c) · θ_{1,c}^{n_1} · θ_{2,c}^{n_2} · ... · θ_{k,c}^{n_k} }
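In practice this argmax is evaluated in log space (Page 34), and the multinomial coefficient can be dropped because it is the same for every class (see Page 60). A minimal sketch with hypothetical parameter dictionaries:

import math

def predict_multinomial_nb(counts, priors, thetas):
    # counts: {word: n_i}; priors: {class: p(c)}; thetas: {class: {word: theta_{i,c}}}
    # score(c) = log p(c) + sum_i n_i * log(theta_{i,c})
    def score(c):
        return math.log(priors[c]) + sum(n * math.log(thetas[c][w]) for w, n in counts.items())
    return max(priors, key=score)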

Page 44: Today: Naïve Bayes Classifier for Text (outline repeated from Page 3)

Page 45: (figure slide; content not captured in the transcript)

Page 46: Deriving the maximum likelihood estimate for the multinomial distribution

LIKELIHOOD (a function of θ):

argmax_{θ_1,...,θ_k} P(d_1, ..., d_T | θ_1, ..., θ_k)
  = argmax_{θ_1,...,θ_k} Π_{t=1..T} P(d_t | θ_1, ..., θ_k)
  = argmax_{θ_1,...,θ_k} Π_{t=1..T} [ N_{d_t}! / (n_{1,d_t}! n_{2,d_t}! ... n_{k,d_t}!) ] · θ_1^{n_{1,d_t}} θ_2^{n_{2,d_t}} ... θ_k^{n_{k,d_t}}
  = argmax_{θ_1,...,θ_k} Π_{t=1..T} θ_1^{n_{1,d_t}} θ_2^{n_{2,d_t}} ... θ_k^{n_{k,d_t}}

subject to Σ_{i=1..k} θ_i = 1

Page 47: Deriving the maximum likelihood estimate for the multinomial distribution

argmax_{θ_1,...,θ_k} log(L(θ))
  = argmax_{θ_1,...,θ_k} log( Π_{t=1..T} θ_1^{n_{1,d_t}} θ_2^{n_{2,d_t}} ... θ_k^{n_{k,d_t}} )
  = argmax_{θ_1,...,θ_k} Σ_{t=1..T} n_{1,d_t} log(θ_1) + Σ_{t=1..T} n_{2,d_t} log(θ_2) + ... + Σ_{t=1..T} n_{k,d_t} log(θ_k)

subject to Σ_{i=1..k} θ_i = 1

Constrained optimization → MLE estimator:

θ_i = Σ_{t=1..T} n_{i,d_t} / ( Σ_{t=1..T} n_{1,d_t} + Σ_{t=1..T} n_{2,d_t} + ... + Σ_{t=1..T} n_{k,d_t} )
    = Σ_{t=1..T} n_{i,d_t} / Σ_{t=1..T} N_{d_t}

→ i.e., we can create a mega-document by concatenating all documents d_1 to d_T
→ and use the relative frequency of word w_i in the mega-document.
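The "constrained optimization" step that the slide leaves implicit can be carried out with a Lagrange multiplier (a standard derivation, included here for completeness):

L(θ, λ) = Σ_i ( Σ_{t=1..T} n_{i,d_t} ) log(θ_i) + λ ( 1 - Σ_i θ_i )

∂L/∂θ_i = ( Σ_t n_{i,d_t} ) / θ_i - λ = 0   ⇒   θ_i = ( Σ_t n_{i,d_t} ) / λ

Enforcing Σ_i θ_i = 1 gives λ = Σ_i Σ_t n_{i,d_t} = Σ_t N_{d_t}, hence θ_i = Σ_t n_{i,d_t} / Σ_t N_{d_t}, as stated above.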

Pages 48-51: Deriving the maximum likelihood estimate for the multinomial distribution (stepwise repeats of Page 47)

Page 52: Deriving the maximum likelihood estimate for the multinomial Bayes classifier (same likelihood derivation as Page 46)

Page 53: Parameter estimation

• Multinomial model:

  P̂(X_i = w | c_j) = fraction of times word w appears across all documents of class c_j

• Can create a mega-document for class j by concatenating all documents of this class
• Use the frequency of w in the mega-document

(Adapted from Manning's text categorization tutorial)

Page 54: Multinomial: learning algorithm for parameter estimation with MLE

• From the training corpus, extract the Vocabulary
• Calculate the required P(c_j) and P(w_k | c_j) terms:
  - For each c_j in C do
    - docs_j ← subset of documents for which the target class is c_j
    - P(c_j) ← |docs_j| / (total # documents)
    - Text_j ← a single document of length n_j containing all of docs_j concatenated
    - for each word w_k in Vocabulary:
      - n_{k,j} ← number of occurrences of w_k in Text_j (n_j is the length of Text_j)
      - P(w_k | c_j) ← (n_{k,j} + α) / (n_j + α·|Vocabulary|),  e.g., α = 1

i.e., the (smoothed) relative frequency with which word w_k appears across all documents of class c_j.
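A minimal Python sketch of this training loop (illustrative names, not from the slides; α = 1 as in the example on the slide):

from collections import Counter

def train_multinomial_nb(docs, labels, alpha=1.0):
    # docs: list of token lists; labels: parallel list of class labels
    vocab = sorted({w for doc in docs for w in doc})
    classes = sorted(set(labels))
    priors, cond = {}, {}
    for c in classes:
        class_docs = [doc for doc, y in zip(docs, labels) if y == c]
        priors[c] = len(class_docs) / len(docs)               # P(c_j) = |docs_j| / total
        # Text_j: concatenate all documents of class c and count word occurrences
        counts = Counter(w for doc in class_docs for w in doc)
        n_j = sum(counts.values())                            # length of Text_j
        denom = n_j + alpha * len(vocab)
        cond[c] = {w: (counts[w] + alpha) / denom for w in vocab}   # P(w_k | c_j)
    return vocab, priors, cond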

Page 55: Multinomial: learning algorithm for parameter estimation with MLE (repeat of Page 54)

Page 56: Naive Bayes is Not So Naive

• Naïve Bayes: first and second place in the KDD-CUP 97 competition, among 16 (then) state-of-the-art algorithms.
  Goal: financial services industry direct mail response prediction model: predict if the recipient of mail will actually respond to the advertisement (750,000 records).
• Robust to irrelevant features: irrelevant features cancel each other out without affecting results; decision trees, in contrast, can suffer heavily from this.
• Very good in domains with many equally important features; decision trees suffer from fragmentation in such cases, especially if there is little data.
• A good, dependable baseline for text classification (but not the best)!
• Optimal if the independence assumptions hold: if the assumed independence is correct, then it is the Bayes-optimal classifier for the problem.
• Very fast: learning requires one counting pass over the data; testing is linear in the number of attributes and the document-collection size.
• Low storage requirements.

Page 57: References

- Prof. Andrew Moore's review tutorial
- Prof. Ke Chen's naïve Bayes slides
- Prof. Carlos Guestrin's recitation slides
- Prof. Raymond J. Mooney and Jimmy Lin's slides about language models
- Prof. Manning's text categorization tutorial

Page 58: Today: Naïve Bayes Classifier for Text (outline repeated from Page 3)

Page 59: The Big Picture

(Figure: a model, i.e., the data-generating process, on one side and the observed data on the other; "Probability" goes from model to data, while "Estimation / learning / inference / data mining" goes from data back to model.)

But how to specify a model?
→ Build a generative model that approximates how the data is produced.

Page 60: Model 2: Multinomial Naïve Bayes - 'bag of words' representation of text

word        frequency
grain(s)    3
oilseed(s)  2
total       3
wheat       1
maize       1
soybean     1
tonnes      1
...         ...

Can be represented as a multinomial distribution.

Words = like colored balls; there are K possible types of them (i.e., from a dictionary of K words).
Document = contains N words; each word occurs n_i times (like a bag of N colored balls).

Pr(W_1 = n_1, ..., W_k = n_k | C = c)

WHY is this naïve???

P(W_1 = n_1, ..., W_k = n_k | c, N, θ_1, ..., θ_k) = [ N! / (n_1! n_2! ... n_k!) ] · θ_1^{n_1} θ_2^{n_2} ... θ_k^{n_k}

(The multinomial coefficient is the same for every class, so it can normally be left out in practical calculations.)

Page 61: Main question: WHY is the multinomial model of text a naïve probabilistic model?

Page 62: Multinomial Naïve Bayes as → a generative model that approximates how a text string is produced

• Stochastic language models:
  - Model the probability of generating strings (each word in turn, following the sequential ordering in the string) in the language (commonly, all strings over the dictionary Σ).
  - E.g., a unigram model:

Model C_1:
  the    0.2
  a      0.1
  boy    0.01
  dog    0.01
  said   0.03
  likes  0.02

String:       the    boy    likes   the    dog
Probability:  0.2    0.01   0.02    0.2    0.01

Multiply all five terms: P(s | C_1) = 0.00000008

(Adapted from Manning's text categorization tutorial)
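A tiny sketch that reproduces the number on the slide (illustrative; real implementations sum logs as on Page 34 rather than multiplying raw probabilities):

unigram_c1 = {"the": 0.2, "a": 0.1, "boy": 0.01, "dog": 0.01, "said": 0.03, "likes": 0.02}

def string_probability(words, model):
    p = 1.0
    for w in words:
        p *= model[w]          # each position is generated independently from the unigram model
    return p

print(string_probability("the boy likes the dog".split(), unigram_c1))   # ≈ 8e-08, i.e. 0.00000008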

Page 63: (repeat of Page 62, writing the product as P(d | C_1) = 0.00000008)

Page 64: Multinomial Naïve Bayes as conditional stochastic language models

• Model the conditional probability of generating any string from two possible class models:

word      Model C1    Model C2
the       0.2         0.2
boy       0.01        0.0001
said      0.0001      0.03
likes     0.0001      0.02
black     0.0001      0.1
dog       0.0005      0.01
garden    0.01        0.0001

String:    dog      boy      likes    black    the
under C1:  0.0005   0.01     0.0001   0.0001   0.2
under C2:  0.01     0.0001   0.02     0.1      0.2

With P(C2) = P(C1):  P(d | C2) P(C2) > P(d | C1) P(C1)
→ d is more likely to be from class C2.

Page 65: A physical metaphor

• Colored balls are randomly drawn (with replacement): a string-of-words model.

(Figure: P(string of four balls) = P(ball) P(ball) P(ball) P(ball), with colored balls standing for words.)

Page 66: Unigram language model → more generally: generating a language string from a probabilistic model

• Unigram language models:
  By the chain rule, P(w_1 w_2 w_3 w_4) = P(w_1) P(w_2 | w_1) P(w_3 | w_1 w_2) P(w_4 | w_1 w_2 w_3).
  NAÏVE: assume conditional independence at each position of the string, so
  P(w_1 w_2 w_3 w_4) = P(w_1) P(w_2) P(w_3) P(w_4).   Easy. Effective!

• Also could be bigram (or, generally, n-gram) language models:
  P(w_1 w_2 w_3 w_4) = P(w_1) P(w_2 | w_1) P(w_3 | w_2) P(w_4 | w_3)

(Adapted from Manning's text categorization tutorial)

Page 67: Unigram language model (repeat of Page 66)

Page 68: Multinomial Naïve Bayes = a class-conditional unigram language model

• Think of X_i as the word at the i-th position in the document string.
• Effectively, the probability of each class is computed as a class-specific unigram language model.

(Graphical model: class → X1, X2, X3, X4, X5, X6)

(Adapted from Manning's text categorization tutorial)

Page 69: (example repeated from Page 62)

String:       the    boy    likes   the    dog
Probability:  0.2    0.01   0.02    0.2    0.01

Page 70: Using multinomial Naive Bayes classifiers to classify text: basic method

• Attributes are text positions, values are words:

  c_NB = argmax_{c_j ∈ C} P(c_j) Π_i P(x_i | c_j)
       = argmax_{c_j ∈ C} P(c_j) P(x_1 = "the" | c_j) · ... · P(x_n = "the" | c_j)

• Still too many possibilities
• Use the same parameters for a word across positions
• Result is the bag-of-words model (over word tokens)

Page 71: Multinomial Naïve Bayes: classifying step

• Positions ← all word positions in the current document which contain tokens found in Vocabulary

• Return c_NB, where

  c_NB = argmax_{c_j ∈ C} P(c_j) Π_{i ∈ positions} P(x_i | c_j)

This equals Pr(W_1 = n_1, ..., W_k = n_k | C = c_j) (leaving out the multinomial coefficient).

Easy to implement; no need to construct the bag-of-words vector explicitly!

(Adapted from Manning's text categorization tutorial)
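A minimal sketch of this classifying step in log space (illustrative; the toy parameters below are hand-made stand-ins for what the training step on Page 54 would produce):

import math

def classify_multinomial_nb(tokens, vocab, priors, cond):
    vocab_set = set(vocab)
    positions = [w for w in tokens if w in vocab_set]    # keep only positions whose token is in Vocabulary
    def log_score(c):
        # log P(c_j) + sum over positions of log P(x_i | c_j)
        return math.log(priors[c]) + sum(math.log(cond[c][w]) for w in positions)
    return max(priors, key=log_score)

# toy parameters (hypothetical)
vocab = ["great", "love", "hate", "boring"]
priors = {"pos": 0.5, "neg": 0.5}
cond = {"pos": {"great": 0.4, "love": 0.3, "hate": 0.2, "boring": 0.1},
        "neg": {"great": 0.1, "love": 0.2, "hate": 0.3, "boring": 0.4}}
print(classify_multinomial_nb(["love", "this", "great", "movie"], vocab, priors, cond))   # 'pos'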

Page 72: Multinomial Naïve Bayes: classifying step (repeat of Page 71)

Page 73: Multinomial Bayes: time complexity

(Notation: T = number of documents, |V| = dictionary size, |C| = number of classes.)

• Training time: O(T·L_d + |C|·|V|), where L_d is the average length of a document in D.
  - Assumes V and all D_i, n_i, and n_{k,j} are pre-computed in O(T·L_d) time during one pass through all of the data.
  - |C|·|V| = complexity of computing all probability values (loop over words and classes).
  - Generally just O(T·L_d), since usually |C|·|V| < T·L_d.
• Test time: O(|C|·L_t), where L_t is the average length of a test document.
  - Very efficient overall; linearly proportional to the time needed just to read in all the words.
  - Plus, robust in practice.

(Adapted from Manning's text categorization tutorial)