
Page 1:

The Multinomial Distribution · Multinomial document model · Naive Bayes Classifier · Wrap-up

Naive Bayes Classifiers and Document Classification

Brandon Malone

Much of this material is adapted from notes by Hiroshi Shimodaira

Many of the images were taken from the Internet

January 24, 2014

Brandon Malone · Naive Bayes Classifiers and Document Classification

Page 2:

Document Classification

Suppose we have a large number of books. Some are about fantasy, some are about technology, and some are about the high seas.

We are given a new book. How can we (automatically) tell which topic the book belongs to?

Page 3:

The Naive Bayes Classifier

Figure: a class node C with arrows to attributes A1, A2, A3, ..., Am

What are the conditional independencies asserted by this structure?

All of the attributes (the Ai's, sometimes called "features") are independent, given the class.

If all variables are binary, how many parameters do we need?

1 for the class, plus 2 for each attribute.
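The parameter count above can be sanity-checked with a small sketch (the probabilities below are made-up illustrative numbers, and `joint_prob` is a hypothetical helper, not anything from the slides): under the naive Bayes structure, 1 + 2 × 3 = 7 parameters determine a full joint distribution over 2^4 = 16 outcomes.

```python
from itertools import product

def joint_prob(c, attrs, p_c, p_attr_given_c):
    """Pr(C=c, A1, ..., Am) = Pr(C=c) * prod_i Pr(Ai | C=c), the naive Bayes factorization."""
    prob = p_c if c else 1 - p_c
    for a, p in zip(attrs, p_attr_given_c[c]):
        prob *= p if a else 1 - p
    return prob

# 1 parameter for the class plus 2 per attribute: 7 numbers in total.
p_c = 0.4                                          # Pr(C = 1)
p_attr = {0: [0.1, 0.5, 0.3], 1: [0.8, 0.6, 0.2]}  # Pr(Ai = 1 | C = c)
total = sum(joint_prob(c, a, p_c, p_attr)
            for c in (0, 1) for a in product((0, 1), repeat=3))
print(round(total, 10))  # 1.0: a valid joint distribution
```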


Page 7:

1 The Multinomial Distribution

2 Multinomial document model

3 Naive Bayes Classifier

4 Wrap-up

Page 8:

Counting distinct permutations

How many distinct sequences can we make?

There are 16 letters, so there are 16! ≈ 2 × 10^13 permutations.

We can choose the "I"s in 4! different ways but have the same permutation.

n! / (n_1! n_2! ... n_d!) = n! / (n_M! n_I! n_S! n_P! n_T! n_A! n_E!) = 16! / (1! 4! 5! 2! 2! 1! 1!) ≈ 1.8 × 10^9
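The arithmetic on this slide can be verified directly; a minimal sketch (the letter counts 1, 4, 5, 2, 2, 1, 1 are read off the subscripts M, I, S, P, T, A, E in the formula, and `distinct_permutations` is a hypothetical helper name):

```python
from math import factorial, prod

def distinct_permutations(counts):
    """Number of distinct orderings of a multiset: n! / (n_1! n_2! ... n_d!)."""
    n = sum(counts)
    return factorial(n) // prod(factorial(c) for c in counts)

# Letter counts from the slide: M=1, I=4, S=5, P=2, T=2, A=1, E=1 (16 letters total)
counts = [1, 4, 5, 2, 2, 1, 1]
print(distinct_permutations(counts))  # 1816214400, i.e. ≈ 1.8 × 10^9
```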


Page 12:

Creating a distribution for the items

Suppose we now attach probabilities to each of the d items:

sum_{t=1}^{d} p_t = 1, with p_t > 0 for all t

We can view creating our sequence as a series of independent draws from this distribution.

If order is important, then the probability of our example is

p_M × p_I × p_S × p_S × ... × p_E = p_M^{n_M} × p_I^{n_I} × p_S^{n_S} × ... × p_E^{n_E} = prod_{t=1}^{d} p_t^{n_t}

What if order is not important?

Page 13:

Creating a distribution for the items

What if order is not important?

Say n = (n_1, ..., n_d) gives the number of each item we drew. Then

P(n) = P(drawing n one way) × (number of ways to draw n)

P(n) = prod_{t=1}^{d} p_t^{n_t} × n! / (n_1! n_2! ... n_d!)

This is called the multinomial distribution.
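The formula translates directly into code; a minimal sketch with made-up inputs (`multinomial_pmf` is an illustrative name, not from the slides):

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(n) = n!/(n_1! ... n_d!) * prod_t p_t^{n_t}, the multinomial distribution."""
    n = sum(counts)
    coeff = factorial(n) // prod(factorial(c) for c in counts)
    return coeff * prod(p ** c for c, p in zip(counts, probs))

# A fair six-sided die rolled 6 times, each face appearing exactly once:
print(multinomial_pmf([1] * 6, [1 / 6] * 6))  # 6!/6^6 ≈ 0.0154
```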


Page 15:

Estimating the probabilities from data

Suppose we roll a die 6 times, and we get...

What probabilities might we attach to each number?

p_t = n_t / sum_{u=1}^{d} n_u

These are called the maximum likelihood parameters.


Page 18:

The zero probability problem

Suppose we use the maximum likelihood parameters. What is the probability of rolling a 3?

If we never observed a 3, the maximum likelihood estimate assigns it probability 0, and then any sequence containing a 3 has probability 0. A simple correction is to add a "pseudocount" to each item:

p_t = (n_t + 1) / (d + sum_{u=1}^{d} n_u)

This is sometimes called "smoothing," and we will return to this problem.
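A sketch of both estimators (pseudocount = 0 recovers the maximum likelihood parameters; the die counts below are invented for illustration, and `estimate_probs` is a hypothetical helper name):

```python
def estimate_probs(counts, pseudocount=1):
    """p_t = (n_t + pseudocount) / (pseudocount * d + sum_u n_u); pseudocount=0 is the MLE."""
    d, total = len(counts), sum(counts)
    return [(c + pseudocount) / (total + pseudocount * d) for c in counts]

# Five die rolls in which a 3 never appears:
counts = [1, 2, 0, 1, 1, 0]
print(estimate_probs(counts, pseudocount=0))  # MLE assigns p_3 = 0
print(estimate_probs(counts))                 # smoothed: p_3 = 1/11 > 0
```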


Page 20:

Documents as bags of words

We can view documents as a bag of words, in which we discard the order among words and simply count occurrences.

So a document D_i is represented as n_i = (n_{i,1}, ..., n_{i,d}), where n_{i,t} gives the count of word t in D_i.

Our vocabulary consists of d words.
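A minimal bag-of-words sketch (the vocabulary and sentence are invented, and out-of-vocabulary words are simply dropped here):

```python
def bag_of_words(document, vocabulary):
    """Count vector n_i over a fixed vocabulary; words outside it are ignored."""
    index = {word: t for t, word in enumerate(vocabulary)}
    counts = [0] * len(vocabulary)
    for word in document.lower().split():
        if word in index:
            counts[index[word]] += 1
    return counts

vocab = ["dragon", "sword", "robot", "ship"]
print(bag_of_words("the dragon took the sword from the dragon", vocab))  # [2, 1, 0, 0]
```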

Page 21:

A generative model for a document

Suppose we want to create a document (bag of words) of K words.

Further, suppose we are given the distribution for the vocabulary (p_t for each word).

A simple technique is to draw from the distribution K times.

Figure: A simple generative model using plate notation (a plate over k = 1..K containing the node w_k)
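Drawing K words independently from the vocabulary distribution is a one-liner in most languages; a sketch (the vocabulary, probabilities, and `generate_document` name are all illustrative):

```python
import random

def generate_document(K, vocabulary, probs, seed=0):
    """Draw K words i.i.d. from the vocabulary distribution (p_t for each word)."""
    rng = random.Random(seed)
    return rng.choices(vocabulary, weights=probs, k=K)

vocab = ["dragon", "sword", "robot", "ship"]
doc = generate_document(10, vocab, [0.4, 0.4, 0.1, 0.1])
print(doc)  # ten words, mostly "dragon" and "sword"
```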



Page 25:

Documents about a topic

Suppose I want to write a book about fantasy.

Am I likely to use the same words as if I were writing a book about technology?

Maybe... but probably not.

So the probability distribution of my words depends upon the topic of my book.


Page 27:

A generative model for documents about a topic

Suppose we want to create a document of K words about fantasy.

Further, suppose we are given the distribution for the vocabulary given that the topic is fantasy (P(w_t | C = fantasy) for each word).

A simple technique is to draw from the distribution K times.

Figure: A conditional generative model using plate notation (node C pointing into a plate over k = 1..K containing w_k)

We assume the word probabilities are independent given the topic!

This is called the (multinomial) naive Bayes classifier.

Page 28:

A generative model for documents about a topic

Suppose we want to create a document of K words about fantasy.

Further, suppose we are given the distribution for the vocabulary given that the topic is fantasy (P(w_t | C = fantasy) for each word).

A simple technique is to draw from the distribution K times.

Figure: A conditional generative model as an explicit graphical model (node C with arrows to w_1, w_2, w_3, ..., w_K)

We assume the word probabilities are independent given the topic!

This is called the (multinomial) naive Bayes classifier.

Page 29:

Reasoning forward about documents

Suppose we are given a naive Bayes classifier (Pr(w_t | C) for all words and topics, and Pr(C) for all topics).

Further, suppose we are given a document D_i = n_i and are told that it is about fantasy.

What is the likelihood of this document, Pr(n_i | C = fantasy)?

Pr(n_i | C = fantasy) = Pr(drawing n_i one way | C = fantasy) × (number of ways to draw n_i)

= prod_{t=1}^{d} Pr(w_t | C = fantasy)^{n_{i,t}} × n! / (n_1! n_2! ... n_d!)
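This likelihood is usually computed in log space, since the product of many small word probabilities underflows; a sketch with invented parameters (`log_likelihood` is an illustrative helper, and lgamma(x + 1) stands in for log x!):

```python
from math import exp, lgamma, log

def log_likelihood(counts, word_probs):
    """log Pr(n_i | C): log multinomial coefficient plus sum_t n_{i,t} * log Pr(w_t | C)."""
    n = sum(counts)
    log_coeff = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)  # log n!/(n_1!...n_d!)
    return log_coeff + sum(c * log(p) for c, p in zip(counts, word_probs) if c > 0)

# Two words from a topic with Pr(w_1|C) = 0.7 and Pr(w_2|C) = 0.3:
# Pr((1, 1) | C) = 2 * 0.7 * 0.3 = 0.42
print(round(exp(log_likelihood([1, 1], [0.7, 0.3])), 4))  # 0.42
```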


Page 32:

Reasoning backward about documents

Suppose we are given a naive Bayes classifier (Pr(w_t | C) for all words and topics, and Pr(C) for all topics).

Further, suppose we are given a document D_i = n_i.

What is the posterior probability that this document is about fantasy, Pr(C = fantasy | n_i)?

Pr(C = fantasy | n_i) = Pr(n_i | C = fantasy) × Pr(C = fantasy) / Pr(n_i)

= [ prod_{t=1}^{d} Pr(w_t | C = fantasy)^{n_{i,t}} × n!/(n_1! n_2! ... n_d!) × Pr(C = fantasy) ] / Pr(n_i)

Do we need to know the exact probability for classification? No: Pr(n_i) is the same for every topic, so

Pr(C = fantasy | n_i) ∝ prod_{t=1}^{d} Pr(w_t | C = fantasy)^{n_{i,t}} × n!/(n_1! n_2! ... n_d!) × Pr(C = fantasy)

The topic of n_i is argmax_k Pr(C = k | n_i).
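For classification only the argmax matters, so the multinomial coefficient and Pr(n_i), which are identical across topics, can both be dropped; a sketch with invented parameters (`classify` and the two-topic model are illustrative):

```python
from math import log

def classify(counts, class_priors, word_probs):
    """argmax_k [ log Pr(C=k) + sum_t n_t * log Pr(w_t|C=k) ].
    The multinomial coefficient and Pr(n_i) are the same for every topic,
    so they can be dropped from the argmax."""
    best_topic, best_score = None, float("-inf")
    for topic, prior in class_priors.items():
        score = log(prior) + sum(
            c * log(p) for c, p in zip(counts, word_probs[topic]) if c > 0
        )
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic

priors = {"fantasy": 0.5, "technology": 0.5}
probs = {"fantasy": [0.6, 0.3, 0.1], "technology": [0.1, 0.2, 0.7]}  # vocab: dragon, ship, robot
print(classify([3, 1, 0], priors, probs))  # fantasy
```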


Page 37:

Learning naive Bayes classifiers from data

Suppose we are given N documents and their topics. How can we learn a naive Bayes classifier from this?

Pr(C = k): the (smoothed) proportion of documents which belong to topic k:

Pr(C = k) = (N_k + 1) / (N + T)

Pr(w_t | C = k): the (smoothed) proportion of times w_t appears in documents from topic k:

Pr(w_t | C = k) = (1 + number of times w_t appears in documents from topic k) / (d + number of words in all documents from topic k)

Pr(w_t | C = k) = (1 + sum_{i : z_i = k} n_{i,t}) / (d + sum_{s=1}^{d} sum_{i : z_i = k} n_{i,s})

Notation:

N_k — the number of documents from topic k

T — the number of topics

z_i — an indicator which gives the topic k of document n_i

n_{i,t} — the number of times word w_t appears in document n_i
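The two smoothed estimators translate into a short training routine; a sketch over count vectors with invented data (`train_nbc` is an illustrative name):

```python
def train_nbc(documents, topics, d):
    """Smoothed naive Bayes estimates from labeled count vectors:
    Pr(C=k) = (N_k + 1) / (N + T) and
    Pr(w_t|C=k) = (1 + sum_{i: z_i=k} n_{i,t}) / (d + total words in topic k)."""
    N, labels = len(documents), sorted(set(topics))
    T = len(labels)
    priors, word_probs = {}, {}
    for k in labels:
        docs_k = [n for n, z in zip(documents, topics) if z == k]
        priors[k] = (len(docs_k) + 1) / (N + T)
        word_counts = [sum(col) for col in zip(*docs_k)]
        total = sum(word_counts)
        word_probs[k] = [(1 + c) / (d + total) for c in word_counts]
    return priors, word_probs

# Tiny corpus over a 3-word vocabulary (made-up counts for illustration)
docs = [[3, 1, 0], [2, 2, 0], [0, 1, 4]]
labels = ["fantasy", "fantasy", "technology"]
priors, word_probs = train_nbc(docs, labels, d=3)
print(priors)  # {'fantasy': 0.6, 'technology': 0.4}
```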

Page 38:

Class work

Convert the documents in the corpus on the handout into their bag-of-words representation.

Construct the naive Bayes classifier for the corpus.

Calculate the likelihood, or conditional distribution, for each document in the corpus (Pr(n_i | C = z_i)).

Calculate the posterior probability, or classification distribution, for the unlabeled documents (Pr(C = k | n_i)).

Page 39:

Recap

During this section, we discussed:

The multinomial distribution

Estimating the (smoothed) parameters for a multinomial distribution from data

The multinomial bag-of-words representation of text documents

Independence assumptions in a naive Bayes classifier (NBC)

Calculating likelihood using an NBC

Calculating posterior probability using an NBC

Learning an NBC from data

Page 40:

Next in probabilistic models

Markov models for modeling time series and sequences

Hidden Markov models for gene prediction

Figure: an HMM with hidden states S1, S2, S3, ..., Sn and observations O1, O2, O3, ..., On

The forward-backward algorithm for finding, for each hidden variable, its individually most likely instantiation