
Naive Bayesian Classifier Example, m-estimate of probability

Relevant Readings: Section 6.9.1

CS495 - Machine Learning, Fall 2009


Training data for function playTennis [Table 3.2, Mitchell]

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

Classifying a new instance

- Consider the instance: (Sunny, Cool, High, Strong)
- We will use Naive Bayes to classify it (v = Yes/No)
- v = argmax_{b ∈ {Yes,No}} Pr(b) · ∏_i Pr(a_i | b)
- v = argmax_{b ∈ {Yes,No}} Pr(b) · Pr(Outlook = Sunny | b) · Pr(Temperature = Cool | b) · Pr(Humidity = High | b) · Pr(Wind = Strong | b)
- So just try b = Yes and b = No and see which comes out higher
- We can estimate each term using the data, for example:
  - Pr(Yes) = 9/14
  - Pr(No) = 5/14
  - Pr(Outlook = Sunny | Yes) = 2/9
  - Pr(Outlook = Sunny | No) = 3/5
- We end up with 9/14 · 2/9 · 3/9 · 3/9 · 3/9 ≈ .0053 for Yes, and 5/14 · 3/5 · 1/5 · 4/5 · 3/5 ≈ .0206 for No
- We thus predict that No is the output
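
The arithmetic above is easy to check mechanically. The following is a minimal Python sketch, not from the slides: the data encoding and the function name score are our own illustration. It tabulates the counts from Table 3.2 and reproduces both scores.

# Minimal naive Bayes check for the playTennis example (Table 3.2, Mitchell).
# Illustrative sketch only; not code from the lecture.

data = [  # (Outlook, Temperature, Humidity, Wind, PlayTennis)
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Strong", "No"),
]

def score(instance, label):
    rows = [r for r in data if r[-1] == label]
    s = len(rows) / len(data)              # Pr(label), e.g. 9/14 for Yes
    for i, value in enumerate(instance):
        match = sum(1 for r in rows if r[i] == value)
        s *= match / len(rows)             # Pr(a_i = value | label)
    return s

instance = ("Sunny", "Cool", "High", "Strong")
for label in ("Yes", "No"):
    print(label, score(instance, label))
# Prints roughly 0.0053 for Yes and 0.0206 for No, so we predict No.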


m-estimate of probability

- Note that we estimated conditional probabilities Pr(A | B) by n_c/n, where n_c is the number of times A ∧ B happened and n is the number of times B happened in the training data
- This can cause trouble if n_c = 0: a single zero factor (e.g., Pr(Outlook = Overcast | No) = 0/5 in the table above) wipes out the entire product for that class
- To avoid this, we fix the following numbers p and m beforehand:
  - A nonzero prior estimate p for Pr(A | B), and
  - A number m that says how confident we are of our prior estimate p, as measured in number of samples
- Then instead of using n_c/n for the estimate, use (n_c + m·p)/(n + m)
- Just think of this as adding a bunch of samples to start the whole process
- If we don't have any knowledge of p, assume the attribute is uniformly distributed over all possible values
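
As a sketch of the formula (our own illustration, with m = 3 chosen arbitrarily): Outlook has 3 possible values, so the uniform prior is p = 1/3, and the troublesome raw estimate Pr(Outlook = Overcast | No) = 0/5 becomes (0 + 3·(1/3))/(5 + 3) = 1/8.

# m-estimate of a conditional probability (illustrative sketch).
# p is the prior estimate for Pr(A | B); m is its weight, measured
# in "virtual samples" mixed into the observed counts.

def m_estimate(n_c, n, p, m):
    # Raw estimate would be n_c / n; the m-estimate behaves as if we
    # had seen m extra samples distributed according to the prior p.
    return (n_c + m * p) / (n + m)

# Example: Pr(Outlook = Overcast | No). Uniform prior p = 1/3 over the
# 3 Outlook values; m = 3 is an arbitrary choice for illustration.
print(m_estimate(0, 5, 1/3, 3))   # (0 + 3*(1/3)) / (5 + 3) = 0.125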
