announcementslwehbe/10701_s19/files/lecture_5.pdf · 2019-02-05 · 10-701 introduction to machine...
TRANSCRIPT
10-701 Introduction to Machine Learning (PhD) Lecture 5: Naive Bayes
Leila Wehbe Carnegie Mellon University
Machine Learning Department
Slides based on Tom Mitchell’s 10-701 Spring 2016 material
Announcements• HW2 was released, due on February
15th
• Projects • Teams + project due February 11th (let
us know if you have problems!) • Midway Report due March 6th • Poster Session on April 17th,
9am-2pm • Final Report due May 1st
You want to build a classifier for new customers by learning P(Y|X)
O: Is older than 35
years
I: Has Personal
Income
S: Is a
Student
J: Birthday before
July 1st
Y: Buys computer
0 1 0 0 00 1 0 1 01 1 0 1 11 1 0 1 11 0 1 0 11 0 1 1 00 0 1 0 10 1 0 0 00 0 1 1 10 1 1 0 10 0 1 1 11 1 0 1 10 1 1 0 11 1 0 0 0
0 1 1 1 ?
New customer i:
Want to find P(Yi = 0| O=0, I = 0, S = 1, J = 1)
How many parameters must we estimate?
Suppose X =(X1,X2) where Xi and Y are boolean RV’s
To estimate P(Y| X1, X2)
X1 X2 P(Y = 1 | X1,X2) P(Y = 0 | X1,X2)
0 0 0.1 0.9
1 0 0.24 0.76
0 1 0.54 0.46
1 1 0.23 0.77
How many parameters must we estimate?
Suppose X =(X1,X2) where Xi and Y are boolean RV’s
To estimate P(Y| X1, X2)
X1 X2 P(Y = 1 | X1,X2) P(Y = 0 | X1,X2)
0 0 0.1 0.9
1 0 0.24 0.76
0 1 0.54 0.46
1 1 0.23 0.77
4 parameters: P(Y=0|X) = 1-P(Y=1|X)
How many parameters must we estimate?
Suppose X =(X1,… Xn) where Xi and Y are boolean RV’s
To estimate P(Y| X1, X2, … Xn)
X1 X2 X3 … XN
P(Y = 1 | X)
P(Y = 0 | X)
0 0 0 … 0 0.1 0.91 0 0 … … 0.24 0.76… … … … … … …… … … … … … …… … … … … … …1 1 1 1 1 0.52 0.48
How many parameters must we estimate?
Suppose X =(X1,… Xn) where Xi and Y are boolean RV’s
To estimate P(Y| X1, X2, … Xn)
X1 X2 X3 … XN
P(Y = 1 | X)
P(Y = 0 | X)
0 0 0 … 0 0.1 0.91 0 0 … … 0.24 0.76… … … … … … …… … … … … … …… … … … … … …1 1 1 1 1 0.52 0.48
2n rows!
How many parameters must we estimate?
Suppose X =(X1,… Xn) where Xi and Y are boolean RV’s
To estimate P(Y| X1, X2, … Xn)
If we have 30 boolean Xi’s: P(Y | X1, X2, … X30) ~ 1Billion
X1 X2 X3 … XN
P(Y = 1 | X)
P(Y = 0 | X)
0 0 0 … 0 0.1 0.91 0 0 … … 0.24 0.76… … … … … … …… … … … … … …… … … … … … …1 1 1 1 1 0.52 0.48
2n rows!
Bayes Rule
Which is shorthand for:
Equivalently:
Can we reduce params using Bayes Rule?
Suppose X =(X1,… Xn) where Xi and Y are boolean RV’s
How many parameters to define P(X1,… Xn | Y)?
Can we reduce params using Bayes Rule?
Suppose X =(X1,… Xn) where Xi and Y are boolean RV’s
How many parameters to define P(X1,… Xn | Y)?
Y X1 X2 … XNP(X|Y)
0 0 0 … 0 0.10 1 0 … … 0.02… … … … … …… … … … … …… … … … … …0 1 1 1 1 0.16
2n rows!
-1 because all the probabilities sum to 1
Can we reduce params using Bayes Rule?
Suppose X =(X1,… Xn) where Xi and Y are boolean RV’s
How many parameters to define P(X1,… Xn | Y)?
Y X1 X2 … XNP(X|Y)
0 0 0 … 0 0.10 1 0 … … 0.02… … … … … …… … … … … …… … … … … …0 1 1 1 1 0.16
Y X1 X2 … XNP(X|Y)
1 0 0 … 01 1 0 … …… … … … …… … … … …… … … … …1 1 1 1 1
2n -1 Should I compute these as well?
Can we reduce params using Bayes Rule?
Suppose X =(X1,… Xn) where Xi and Y are boolean RV’s
How many parameters to define P(X1,… Xn | Y)?
Y X1 X2 … XNP(X|Y)
0 0 0 … 0 0.10 1 0 … … 0.02… … … … … …… … … … … …… … … … … …0 1 1 1 1 0.16
Y X1 X2 … XNP(X|Y)
1 0 0 … 01 1 0 … …… … … … …… … … … …… … … … …1 1 1 1 1
2n -1 2n -1
Can we reduce params using Bayes Rule?
Suppose X =(X1,… Xn) where Xi and Y are boolean RV’s
How many parameters to define P(X1,… Xn | Y)?
How many parameters to define P(Y)?
2*(2n -1)
If n = 30, 2 billion
Can we reduce params using Bayes Rule?
Suppose X =(X1,… Xn) where Xi and Y are boolean RV’s
How many parameters to define P(X1,… Xn | Y)?
How many parameters to define P(Y)?
2*(2n -1)
If n = 30, 2 billion
1
Naïve Bayes
Naïve Bayes assumes
i.e., that Xi and Xj are conditionally independent given Y, for all i≠j
Conditional IndependenceDefinition: X is conditionally independent of Y given Z, if
the probability distribution governing X is independent of the value of Y, given the value of Z
Which we often write
E.g.,
P(thunder|raining,lightning) = P(thunder|lightning)
Naïve Bayes uses assumption that the Xi are conditionally independent, given Y. E.g.,
Given this assumption, then:
Naïve Bayes uses assumption that the Xi are conditionally independent, given Y. E.g.,
Given this assumption, then:
Chain rule
Naïve Bayes uses assumption that the Xi are conditionally independent, given Y. E.g.,
Given this assumption, then:
Conditional independence
Naïve Bayes uses assumption that the Xi are conditionally independent, given Y. E.g.,
Given this assumption, then:
in general:
Naïve Bayes uses assumption that the Xi are conditionally independent, given Y. E.g.,
Given this assumption, then:
in general:
How many parameters to describe P(X1…Xn|Y)? P(Y)? • Without conditional indep assumption? • With conditional indep assumption?
2(2n -1) and 1
Naïve Bayes uses assumption that the Xi are conditionally independent, given Y. E.g.,
Given this assumption, then:
in general:
How many parameters to describe P(X1…Xn|Y)? P(Y)? • Without conditional indep assumption? • With conditional indep assumption?
2(2n -1) and 1 2n and 1
Bayes rule:
Assuming conditional independence among Xi’s:
So, to pick most probable Y for Xnew = ( X1, …, Xn )
Naïve Bayes in a Nutshell
(estimate in training)
(testing)
Naïve Bayes Algorithm – discrete Xi
• Train Naïve Bayes (examples) for each* value yk
estimate for each* value xij of each attribute Xi
estimate • Classify (Xnew)
* probabilities must sum to 1, so need estimate only v-1 of these, where v is the number of values, which is 2 in the binary case
Let’s train! Buy computer? P(Y|O,S,J)• O= Is older than 35 • S= Is a student
• J = Birthday before July 1 • Y= buys computer
What probability parameters must we estimate?
O: Is older than 35
years
S: Is a
Student
J: Birthday before
July 1st
Y: Buys computer
0 0 0 00 0 1 01 0 1 11 0 1 11 1 0 11 1 1 00 1 0 10 0 0 00 1 1 10 1 0 10 1 1 11 0 1 10 1 0 11 0 0 0
Let’s train! Buy computer? P(Y|O,S,J)• O= Is older than 35 • S= Is a student
• J = Birthday before July 1 • Y= buys computer
What probability parameters must we estimate?
P(Y=1) : P(O=1 | Y=1) : P(O=1 | Y=0) : P(S=1 | Y=1) : P(S=1 | Y=0) : P(J=1 | Y=1) : P(J=1 | Y=0) :
P(Y=0) : P(O=0 | Y=1) : P(O=0 | Y=0) : P(S=0 | Y=1) : P(S=0 | Y=0) : P(J=0 | Y=1) : P(J=0 | Y=0) :
O S J Y
1 0 0 00 0 1 01 0 1 11 0 1 11 1 0 11 1 1 00 1 0 11 0 0 00 1 1 10 1 0 10 1 1 11 0 1 10 1 0 11 0 0 0
Let’s train! Buy computer? P(Y|O,S,J)• O= Is older than 35 • S= Is a student
• J = Birthday before July 1 • Y= buys computer
What probability parameters must we estimate?
P(Y=1) : P(O=1 | Y=1) : P(O=1 | Y=0) : P(S=1 | Y=1) : P(S=1 | Y=0) : P(J=1 | Y=1) : P(J=1 | Y=0) :
P(Y=0) : P(O=0 | Y=1) : P(O=0 | Y=0) : P(S=0 | Y=1) : P(S=0 | Y=0) : P(J=0 | Y=1) : P(J=0 | Y=0) :
5/149/14O S J Y
1 0 0 00 0 1 01 0 1 11 0 1 11 1 0 11 1 1 00 1 0 11 0 0 00 1 1 10 1 0 10 1 1 11 0 1 10 1 0 11 0 0 0
Let’s train! Buy computer? P(Y|O,S,J)• O= Is older than 35 • S= Is a student
• J = Birthday before July 1 • Y= buys computer
What probability parameters must we estimate?
P(Y=1) : P(O=1 | Y=1) : P(O=1 | Y=0) : P(S=1 | Y=1) : P(S=1 | Y=0) : P(J=1 | Y=1) : P(J=1 | Y=0) :
P(Y=0) : P(O=0 | Y=1) : P(O=0 | Y=0) : P(S=0 | Y=1) : P(S=0 | Y=0) : P(J=0 | Y=1) : P(J=0 | Y=0) :
5/149/144/9
O S J Y
1 0 0 00 0 1 01 0 1 11 0 1 11 1 0 11 1 1 00 1 0 11 0 0 00 1 1 10 1 0 10 1 1 11 0 1 10 1 0 11 0 0 0
Let’s train! Buy computer? P(Y|O,S,J)• O= Is older than 35 • S= Is a student
• J = Birthday before July 1 • Y= buys computer
What probability parameters must we estimate?
P(Y=1) : P(O=1 | Y=1) : P(O=1 | Y=0) : P(S=1 | Y=1) : P(S=1 | Y=0) : P(J=1 | Y=1) : P(J=1 | Y=0) :
P(Y=0) : P(O=0 | Y=1) : P(O=0 | Y=0) : P(S=0 | Y=1) : P(S=0 | Y=0) : P(J=0 | Y=1) : P(J=0 | Y=0) :
5/149/144/9 5/9
O S J Y
1 0 0 00 0 1 01 0 1 11 0 1 11 1 0 11 1 1 00 1 0 11 0 0 00 1 1 10 1 0 10 1 1 11 0 1 10 1 0 11 0 0 0
Let’s train! Buy computer? P(Y|O,S,J)• O= Is older than 35 • S= Is a student
• J = Birthday before July 1 • Y= buys computer
What probability parameters must we estimate?
P(Y=1) : P(O=1 | Y=1) : P(O=1 | Y=0) : P(S=1 | Y=1) : P(S=1 | Y=0) : P(J=1 | Y=1) : P(J=1 | Y=0) :
P(Y=0) : P(O=0 | Y=1) : P(O=0 | Y=0) : P(S=0 | Y=1) : P(S=0 | Y=0) : P(J=0 | Y=1) : P(J=0 | Y=0) :
5/149/144/9 5/92/5 3/5
O S J Y
1 0 0 00 0 1 01 0 1 11 0 1 11 1 0 11 1 1 00 1 0 11 0 0 00 1 1 10 1 0 10 1 1 11 0 1 10 1 0 11 0 0 0
Let’s train! Buy computer? P(Y|O,S,J)• O= Is older than 35 • S= Is a student
• J = Birthday before July 1 • Y= buys computer
What probability parameters must we estimate?
P(Y=1) : P(O=1 | Y=1) : P(O=1 | Y=0) : P(S=1 | Y=1) : P(S=1 | Y=0) : P(J=1 | Y=1) : P(J=1 | Y=0) :
P(Y=0) : P(O=0 | Y=1) : P(O=0 | Y=0) : P(S=0 | Y=1) : P(S=0 | Y=0) : P(J=0 | Y=1) : P(J=0 | Y=0) :
5/149/144/9 5/94/5 1/56/9 3/91/5 4/5
O S J Y
1 0 0 00 0 1 01 0 1 11 0 1 11 1 0 11 1 1 00 1 0 11 0 0 00 1 1 10 1 0 10 1 1 11 0 1 10 1 0 11 0 0 0
Let’s train! Buy computer? P(Y|O,S,J)• O= Is older than 35 • S= Is a student
• J = Birthday before July 1 • Y= buys computer
What probability parameters must we estimate?
P(Y=1) : P(O=1 | Y=1) : P(O=1 | Y=0) : P(S=1 | Y=1) : P(S=1 | Y=0) : P(J=1 | Y=1) : P(J=1 | Y=0) :
P(Y=0) : P(O=0 | Y=1) : P(O=0 | Y=0) : P(S=0 | Y=1) : P(S=0 | Y=0) : P(J=0 | Y=1) : P(J=0 | Y=0) :
O S J Y
1 0 0 00 0 1 01 0 1 11 0 1 11 1 0 11 1 1 00 1 0 11 0 0 00 1 1 10 1 0 10 1 1 11 0 1 10 1 0 11 0 0 0
5/149/144/9 5/94/5 1/56/9 3/91/5 4/55/9 4/92/5 3/5
Let’s test! Buy computer? P(Y|O,S,J)• O= Is older than 35 • S= Is a student
• J = Birthday before July 1 • Y= buys computer
What probability parameters must we estimate?
P(Y=1) : P(O=1 | Y=1) : P(O=1 | Y=0) : P(S=1 | Y=1) : P(S=1 | Y=0) : P(J=1 | Y=1) : P(J=1 | Y=0) :
P(Y=0) : P(O=0 | Y=1) : P(O=0 | Y=0) : P(S=0 | Y=1) : P(S=0 | Y=0) : P(J=0 | Y=1) : P(J=0 | Y=0) :
5/149/144/9 5/94/5 1/56/9 3/91/5 4/55/9 4/92/5 3/5
0 1 1 ?
Let’s test! Buy computer? P(Y|O,S,J)• O= Is older than 35 • S= Is a student
• J = Birthday before July 1 • Y= buys computer
What probability parameters must we estimate?
P(Y=1) : P(O=1 | Y=1) : P(O=1 | Y=0) : P(S=1 | Y=1) : P(S=1 | Y=0) : P(J=1 | Y=1) : P(J=1 | Y=0) :
P(Y=0) : P(O=0 | Y=1) : P(O=0 | Y=0) : P(S=0 | Y=1) : P(S=0 | Y=0) : P(J=0 | Y=1) : P(J=0 | Y=0) :
0 1 1 ?
5/149/144/9 5/94/5 1/56/9 3/91/5 4/55/9 4/92/5 3/5
P(O=0,S=1,J=1|Y=0) * P(Y=0) = 1/5 * 1/5 * 2/5 * 5/14 = 0.0057
Let’s test! Buy computer? P(Y|O,S,J)• O= Is older than 35 • S= Is a student
• J = Birthday before July 1 • Y= buys computer
What probability parameters must we estimate?
P(Y=1) : P(O=1 | Y=1) : P(O=1 | Y=0) : P(S=1 | Y=1) : P(S=1 | Y=0) : P(J=1 | Y=1) : P(J=1 | Y=0) :
P(Y=0) : P(O=0 | Y=1) : P(O=0 | Y=0) : P(S=0 | Y=1) : P(S=0 | Y=0) : P(J=0 | Y=1) : P(J=0 | Y=0) :
0 1 1 ?
5/149/144/9 5/94/5 1/56/9 3/91/5 4/55/9 4/92/5 3/5
P(O=0,S=1,J=1|Y=0) * P(Y=0) = 1/5 * 1/5 * 2/5 * 5/14 = 0.0057
P(O=0,S=1,J=1|Y=1) * P(Y=1) = 5/9 * 6/9 * 5/9 * 9/14 = 0.13
Let’s test! Buy computer? P(Y|O,S,J)• O= Is older than 35 • S= Is a student
• J = Birthday before July 1 • Y= buys computer
What probability parameters must we estimate?
P(Y=1) : P(O=1 | Y=1) : P(O=1 | Y=0) : P(S=1 | Y=1) : P(S=1 | Y=0) : P(J=1 | Y=1) : P(J=1 | Y=0) :
P(Y=0) : P(O=0 | Y=1) : P(O=0 | Y=0) : P(S=0 | Y=1) : P(S=0 | Y=0) : P(J=0 | Y=1) : P(J=0 | Y=0) :
0 1 1 ?
5/149/144/9 5/94/5 1/56/9 3/91/5 4/55/9 4/92/5 3/5
P(O=0,S=1,J=1|Y=0) * P(Y=0) = 1/5 * 1/5 * 2/5 * 5/14 = 0.0057
P(O=0,S=1,J=1|Y=1) * P(Y=1) = 5/9 * 6/9 * 5/9 * 9/14 = 0.13
Can already pick label 1
Let’s test! Buy computer? P(Y|O,S,J)• O= Is older than 35 • S= Is a student
• J = Birthday before July 1 • Y= buys computer
What probability parameters must we estimate?
P(Y=1) : P(O=1 | Y=1) : P(O=1 | Y=0) : P(S=1 | Y=1) : P(S=1 | Y=0) : P(J=1 | Y=1) : P(J=1 | Y=0) :
P(Y=0) : P(O=0 | Y=1) : P(O=0 | Y=0) : P(S=0 | Y=1) : P(S=0 | Y=0) : P(J=0 | Y=1) : P(J=0 | Y=0) :
0 1 1 ?
5/149/144/9 5/94/5 1/56/9 3/91/5 4/55/9 4/92/5 3/5
P(O=0,S=1,J=1|Y=0) * P(Y=0) = 1/5 * 1/5 * 2/5 * 5/14 = 0.0057
P(O=0,S=1,J=1|Y=1) * P(Y=1) = 5/9 * 6/9 * 5/9 * 9/14 = 0.13
P (X) =X
i
P (X|Y = i)P (Y = i)<latexit sha1_base64="vmS1q7EwCzdmHDZj7XUc+voZ32M=">AAACBnicdVDLSgMxFM34rPU16lKEYBHazZCp1dZFoeDGZQX7kHYYMmnahmYeJBmhjF258VfcuFDErd/gzr8x01ZQ0QMhJ+fcy809XsSZVAh9GAuLS8srq5m17PrG5ta2ubPblGEsCG2QkIei7WFJOQtoQzHFaTsSFPsepy1vdJ76rRsqJAuDKzWOqOPjQcD6jGClJdc8qOfbBViFXRn7LoP6dXtdZQVN0ss1c8hCpyW7aENknSC7UkQzclY+hraFpsiBOequ+d7thST2aaAIx1J2bBQpJ8FCMcLpJNuNJY0wGeEB7WgaYJ9KJ5muMYFHWunBfij0CRScqt87EuxLOfY9XeljNZS/vVT8y+vEql9xEhZEsaIBmQ3qxxyqEKaZwB4TlCg+1gQTwfRfIRligYnSyWV1CF+bwv9Js2jZyLIvS7laZR5HBuyDQ5AHNiiDGrgAddAABNyBB/AEno1749F4MV5npQvGvGcP/IDx9gn76ZY9</latexit><latexit sha1_base64="vmS1q7EwCzdmHDZj7XUc+voZ32M=">AAACBnicdVDLSgMxFM34rPU16lKEYBHazZCp1dZFoeDGZQX7kHYYMmnahmYeJBmhjF258VfcuFDErd/gzr8x01ZQ0QMhJ+fcy809XsSZVAh9GAuLS8srq5m17PrG5ta2ubPblGEsCG2QkIei7WFJOQtoQzHFaTsSFPsepy1vdJ76rRsqJAuDKzWOqOPjQcD6jGClJdc8qOfbBViFXRn7LoP6dXtdZQVN0ss1c8hCpyW7aENknSC7UkQzclY+hraFpsiBOequ+d7thST2aaAIx1J2bBQpJ8FCMcLpJNuNJY0wGeEB7WgaYJ9KJ5muMYFHWunBfij0CRScqt87EuxLOfY9XeljNZS/vVT8y+vEql9xEhZEsaIBmQ3qxxyqEKaZwB4TlCg+1gQTwfRfIRligYnSyWV1CF+bwv9Js2jZyLIvS7laZR5HBuyDQ5AHNiiDGrgAddAABNyBB/AEno1749F4MV5npQvGvGcP/IDx9gn76ZY9</latexit><latexit sha1_base64="vmS1q7EwCzdmHDZj7XUc+voZ32M=">AAACBnicdVDLSgMxFM34rPU16lKEYBHazZCp1dZFoeDGZQX7kHYYMmnahmYeJBmhjF258VfcuFDErd/gzr8x01ZQ0QMhJ+fcy809XsSZVAh9GAuLS8srq5m17PrG5ta2ubPblGEsCG2QkIei7WFJOQtoQzHFaTsSFPsepy1vdJ76rRsqJAuDKzWOqOPjQcD6jGClJdc8qOfbBViFXRn7LoP6dXtdZQVN0ss1c8hCpyW7aENknSC7UkQzclY+hraFpsiBOequ+d7thST2aaAIx1J2bBQpJ8FCMcLpJNuNJY0wGeEB7WgaYJ9KJ5muMYFHWunBfij0CRScqt87EuxLOfY9XeljNZS/vVT8y+vEql9xEhZEsaIBmQ3qxxyqEKaZwB4TlCg+1gQTwfRfIRligYnSyWV1CF+bwv9Js2jZyLIvS7laZR5HBuyDQ5AHNiiDGrgAddAABNyBB/AEno1749F4MV5npQvGvGcP/IDx9gn76ZY9</latexit><latexit sha1_base64="vmS1q7EwCzdmHDZj7XUc+voZ32M=">AAACBnicdVDLSgMxFM34rPU16lKEYBHazZCp1dZFoeDGZQX7kHYYMmnahmYeJBmhjF258VfcuFDErd/gzr8x01ZQ0QMhJ+fcy809XsSZVAh9GAuLS8srq5m17PrG5ta2ubPblGEsCG2QkIei7WFJOQtoQzHFaTsSFPsepy1vdJ76rRsqJAuDKzWOqOPjQcD6jGClJdc8qOfbBViFXRn7LoP6dXtdZQVN0ss1c8hCpyW7aENknSC7UkQzclY+hraFpsiBOequ+d7thST2aaAIx1J2bBQpJ8FCMcLpJNuNJY0wGeEB7WgaYJ9KJ5muMYFHWunBfij0CRScqt87EuxLOfY9XeljNZS/vVT8y+vEql9xEhZEsaIBmQ3qxxyqEKaZwB4TlCg+1gQTwfRfIRligYnSyWV1CF+bwv9Js2jZyLIvS7laZR5HBuyDQ5AHNiiDGrgAddAABNyBB/AEno1749F4MV5npQvGvGcP/IDx9gn76ZY9</latexit>
Let’s test! Buy computer? P(Y|O,S,J)• O= Is older than 35 • S= Is a student
• J = Birthday before July 1 • Y= buys computer
What probability parameters must we estimate?
P(Y=1) : P(O=1 | Y=1) : P(O=1 | Y=0) : P(S=1 | Y=1) : P(S=1 | Y=0) : P(J=1 | Y=1) : P(J=1 | Y=0) :
P(Y=0) : P(O=0 | Y=1) : P(O=0 | Y=0) : P(S=0 | Y=1) : P(S=0 | Y=0) : P(J=0 | Y=1) : P(J=0 | Y=0) :
0 1 1 ?
5/149/144/9 5/94/5 1/56/9 3/91/5 4/55/9 4/92/5 3/5
P(Y=0|O=0,S=1,J=1) = 0.04
P(Y=1|O=0,S=1,J=1) = 0.96 Choice is Label 1
Assume we have a lot of data• We estimate these probabilities
• We obtain 68% test accuracy. • What does this mean?
• Our estimated P(Y=1) was 64.5%, let’s say this is actually the truth.
• Is 68% impressive?
• What is chance performance?
• What happens if I flip an unbiased coin?
• If P(Y=1)>0.5, then we can just predict 1 all the time! • What will be the accuracy?
• What happens if you predict Y=1 with probability 0.645 ==> this is called probability matching in cognitive science
Naïve Bayes: Point #1
Often the Xi are not really conditionally independent
• We use Naïve Bayes in many cases anyway, and it often works pretty well – often the right classification, even when not the right
probability (see [Domingos&Pazzani, 1996])
• What is effect on estimated P(Y|X)? – Extreme case: what if we add two copies: Xi = Xk
Extreme case: what if we add two copies: Xi = Xk
P(Y=1) P(O=1|Y=1) P(S=0|Y=1) P(J=0|Y=1) _____________________________________________________________________________ P(Y=1) P(O=1|Y=1) P(S=0|Y=1) P(J=0|Y=1) + P(Y=0) P(O=1|Y=0) P(S=0|Y=0) P(J=0|Y=0)
Naïve Bayes: Point #2
Irrelevant variables, do they affect you?
P(Y=1) P(O=1|Y=1) P(S=0|Y=1) P(J=0|Y=1) _____________________________________________________________________________ P(Y=1) P(O=1|Y=1) P(S=0|Y=1) P(J=0|Y=1) + P(Y=0) P(O=1|Y=0) P(S=0|Y=0) P(J=0|Y=0)
Naïve Bayes: Point #2
Irrelevant variables, do they affect you?
P(Y=1) P(O=1|Y=1) P(S=0|Y=1) P(J=0|Y=1) _____________________________________________________________________________ P(Y=1) P(O=1|Y=1) P(S=0|Y=1) P(J=0|Y=1) + P(Y=0) P(O=1|Y=0) P(S=0|Y=0) P(J=0|Y=0)
If J is not relevant then P(J|Y=0) = P(J|Y=1) = P(J)
Does it hurt classification?
Naïve Bayes: Point #2
Irrelevant variables, do they affect you?
P(Y=1) P(O=1|Y=1) P(S=0|Y=1) P(J=0|Y=1) _____________________________________________________________________________ P(Y=1) P(O=1|Y=1) P(S=0|Y=1) P(J=0|Y=1) + P(Y=0) P(O=1|Y=0) P(S=0|Y=0) P(J=0|Y=0)
If J is not relevant then P(J|Y=0) = P(J|Y=1) = P(J)
Does it hurt classification?
If we had the correct estimate, performance is not affected! If we have noisy estimates ==> performance is affected
Another way to view Naïve Bayes (Boolean Y, Xi’s): Decision rule: is this quantity greater or less than 1?
(is q1 larger than q0?)
Naïve Bayes: Point #3
P (Y = 1|X1...Xn)
P (Y = 0|X1...Xn)=
P (Y = 1)Q
i P (Xi|Y = 1)
P (Y = 0)Q
i P (Xi|Y = 0)<latexit sha1_base64="qPxGUhFFqiqy0kAj6kcWnWvH5J0=">AAACUXicbVFNS8MwGH5Xv+b8mnr0EhyCu5RWBHcRBl48TnCzspaSZqkLpmlJUmHU/UUPevJ/ePGgmG49zOkLgYfngzd5EmWcKe047zVrZXVtfaO+2dja3tnda+4fDFSaS0L7JOWp9CKsKGeC9jXTnHqZpDiJOL2LHq9K/e6JSsVScasnGQ0S/CBYzAjWhgqbYz+WmBSod3p/6T57oWvbtheK9rQoGWeRQZdobp55234m01HITNIL2XPJVJk2WpYcEw6bLcd2ZoP+ArcCLaimFzZf/VFK8oQKTThWaug6mQ4KLDUjnE4bfq5ohskjfqBDAwVOqAqKWSNTdGKYEYpTaY7QaMYuJgqcKDVJIuNMsB6rZa0k/9OGuY47QcFElmsqyHxRnHOkU1TWi0ZMUqL5xABMJDN3RWSMTWvafELDlOAuP/kvGJzZrmO7N+etbqeqow5HcAyn4MIFdOEaetAHAi/wAV/wXXurfVpgWXOrVasyh/BrrK0fGJSuHg==</latexit><latexit sha1_base64="qPxGUhFFqiqy0kAj6kcWnWvH5J0=">AAACUXicbVFNS8MwGH5Xv+b8mnr0EhyCu5RWBHcRBl48TnCzspaSZqkLpmlJUmHU/UUPevJ/ePGgmG49zOkLgYfngzd5EmWcKe047zVrZXVtfaO+2dja3tnda+4fDFSaS0L7JOWp9CKsKGeC9jXTnHqZpDiJOL2LHq9K/e6JSsVScasnGQ0S/CBYzAjWhgqbYz+WmBSod3p/6T57oWvbtheK9rQoGWeRQZdobp55234m01HITNIL2XPJVJk2WpYcEw6bLcd2ZoP+ArcCLaimFzZf/VFK8oQKTThWaug6mQ4KLDUjnE4bfq5ohskjfqBDAwVOqAqKWSNTdGKYEYpTaY7QaMYuJgqcKDVJIuNMsB6rZa0k/9OGuY47QcFElmsqyHxRnHOkU1TWi0ZMUqL5xABMJDN3RWSMTWvafELDlOAuP/kvGJzZrmO7N+etbqeqow5HcAyn4MIFdOEaetAHAi/wAV/wXXurfVpgWXOrVasyh/BrrK0fGJSuHg==</latexit><latexit sha1_base64="qPxGUhFFqiqy0kAj6kcWnWvH5J0=">AAACUXicbVFNS8MwGH5Xv+b8mnr0EhyCu5RWBHcRBl48TnCzspaSZqkLpmlJUmHU/UUPevJ/ePGgmG49zOkLgYfngzd5EmWcKe047zVrZXVtfaO+2dja3tnda+4fDFSaS0L7JOWp9CKsKGeC9jXTnHqZpDiJOL2LHq9K/e6JSsVScasnGQ0S/CBYzAjWhgqbYz+WmBSod3p/6T57oWvbtheK9rQoGWeRQZdobp55234m01HITNIL2XPJVJk2WpYcEw6bLcd2ZoP+ArcCLaimFzZf/VFK8oQKTThWaug6mQ4KLDUjnE4bfq5ohskjfqBDAwVOqAqKWSNTdGKYEYpTaY7QaMYuJgqcKDVJIuNMsB6rZa0k/9OGuY47QcFElmsqyHxRnHOkU1TWi0ZMUqL5xABMJDN3RWSMTWvafELDlOAuP/kvGJzZrmO7N+etbqeqow5HcAyn4MIFdOEaetAHAi/wAV/wXXurfVpgWXOrVasyh/BrrK0fGJSuHg==</latexit><latexit sha1_base64="qPxGUhFFqiqy0kAj6kcWnWvH5J0=">AAACUXicbVFNS8MwGH5Xv+b8mnr0EhyCu5RWBHcRBl48TnCzspaSZqkLpmlJUmHU/UUPevJ/ePGgmG49zOkLgYfngzd5EmWcKe047zVrZXVtfaO+2dja3tnda+4fDFSaS0L7JOWp9CKsKGeC9jXTnHqZpDiJOL2LHq9K/e6JSsVScasnGQ0S/CBYzAjWhgqbYz+WmBSod3p/6T57oWvbtheK9rQoGWeRQZdobp55234m01HITNIL2XPJVJk2WpYcEw6bLcd2ZoP+ArcCLaimFzZf/VFK8oQKTThWaug6mQ4KLDUjnE4bfq5ohskjfqBDAwVOqAqKWSNTdGKYEYpTaY7QaMYuJgqcKDVJIuNMsB6rZa0k/9OGuY47QcFElmsqyHxRnHOkU1TWi0ZMUqL5xABMJDN3RWSMTWvafELDlOAuP/kvGJzZrmO7N+etbqeqow5HcAyn4MIFdOEaetAHAi/wAV/wXXurfVpgWXOrVasyh/BrrK0fGJSuHg==</latexit>
Another way to view Naïve Bayes (Boolean Y, Xi’s): Decision rule: is this quantity greater or less than 1?
Naïve Bayes: Point #3
logP (Y = 1|X1...Xn)
P (Y = 0|X1...Xn)= log
P (Y = 1)
P (Y = 0)+
X
i
logP (Xi|Y = 1)
P (Xi|Y = 0)<latexit sha1_base64="/rYJIOwZIzM4Y+zUxADSRw01G6Y=">AAACXHicbZHPS8MwHMXTOnU/nE4FL16CQ5gIpRXBXQYDLx4nuK2yjZJm6RaWpiVJhdHtn/S2i/+KZlsnc/MLgcd7n5DkxY8Zlcq2F4Z5kDs8Os4XiqWT8ulZ5fyiI6NEYNLGEYuE6yNJGOWkrahixI0FQaHPSNefPC/z7gcRkkb8TU1jMgjRiNOAYqS05VUki0b9QCCcwlbtveHMXM+xLMv1+N08XTr2tgMb8Jdf4RtIR/ew2JdJ6NFtxPXobIOttUa9StW27NXAfeFkogqyaXmVz/4wwklIuMIMSdlz7FgNUiQUxYzMi/1EkhjhCRqRnpYchUQO0lU5c3irnSEMIqEXV3Dlbu9IUSjlNPQ1GSI1lrvZ0vwv6yUqqA9SyuNEEY7XBwUJgyqCy6bhkAqCFZtqgbCg+q4Qj5HuRen/KOoSnN0n74vOg+XYlvP6WG3Wszry4BrcgBpwwBNoghfQAm2AwQJ8G3mjYHyZObNklteoaWR7LsGfMa9+AMWMsBk=</latexit><latexit sha1_base64="/rYJIOwZIzM4Y+zUxADSRw01G6Y=">AAACXHicbZHPS8MwHMXTOnU/nE4FL16CQ5gIpRXBXQYDLx4nuK2yjZJm6RaWpiVJhdHtn/S2i/+KZlsnc/MLgcd7n5DkxY8Zlcq2F4Z5kDs8Os4XiqWT8ulZ5fyiI6NEYNLGEYuE6yNJGOWkrahixI0FQaHPSNefPC/z7gcRkkb8TU1jMgjRiNOAYqS05VUki0b9QCCcwlbtveHMXM+xLMv1+N08XTr2tgMb8Jdf4RtIR/ew2JdJ6NFtxPXobIOttUa9StW27NXAfeFkogqyaXmVz/4wwklIuMIMSdlz7FgNUiQUxYzMi/1EkhjhCRqRnpYchUQO0lU5c3irnSEMIqEXV3Dlbu9IUSjlNPQ1GSI1lrvZ0vwv6yUqqA9SyuNEEY7XBwUJgyqCy6bhkAqCFZtqgbCg+q4Qj5HuRen/KOoSnN0n74vOg+XYlvP6WG3Wszry4BrcgBpwwBNoghfQAm2AwQJ8G3mjYHyZObNklteoaWR7LsGfMa9+AMWMsBk=</latexit><latexit sha1_base64="/rYJIOwZIzM4Y+zUxADSRw01G6Y=">AAACXHicbZHPS8MwHMXTOnU/nE4FL16CQ5gIpRXBXQYDLx4nuK2yjZJm6RaWpiVJhdHtn/S2i/+KZlsnc/MLgcd7n5DkxY8Zlcq2F4Z5kDs8Os4XiqWT8ulZ5fyiI6NEYNLGEYuE6yNJGOWkrahixI0FQaHPSNefPC/z7gcRkkb8TU1jMgjRiNOAYqS05VUki0b9QCCcwlbtveHMXM+xLMv1+N08XTr2tgMb8Jdf4RtIR/ew2JdJ6NFtxPXobIOttUa9StW27NXAfeFkogqyaXmVz/4wwklIuMIMSdlz7FgNUiQUxYzMi/1EkhjhCRqRnpYchUQO0lU5c3irnSEMIqEXV3Dlbu9IUSjlNPQ1GSI1lrvZ0vwv6yUqqA9SyuNEEY7XBwUJgyqCy6bhkAqCFZtqgbCg+q4Qj5HuRen/KOoSnN0n74vOg+XYlvP6WG3Wszry4BrcgBpwwBNoghfQAm2AwQJ8G3mjYHyZObNklteoaWR7LsGfMa9+AMWMsBk=</latexit><latexit sha1_base64="/rYJIOwZIzM4Y+zUxADSRw01G6Y=">AAACXHicbZHPS8MwHMXTOnU/nE4FL16CQ5gIpRXBXQYDLx4nuK2yjZJm6RaWpiVJhdHtn/S2i/+KZlsnc/MLgcd7n5DkxY8Zlcq2F4Z5kDs8Os4XiqWT8ulZ5fyiI6NEYNLGEYuE6yNJGOWkrahixI0FQaHPSNefPC/z7gcRkkb8TU1jMgjRiNOAYqS05VUki0b9QCCcwlbtveHMXM+xLMv1+N08XTr2tgMb8Jdf4RtIR/ew2JdJ6NFtxPXobIOttUa9StW27NXAfeFkogqyaXmVz/4wwklIuMIMSdlz7FgNUiQUxYzMi/1EkhjhCRqRnpYchUQO0lU5c3irnSEMIqEXV3Dlbu9IUSjlNPQ1GSI1lrvZ0vwv6yUqqA9SyuNEEY7XBwUJgyqCy6bhkAqCFZtqgbCg+q4Qj5HuRen/KOoSnN0n74vOg+XYlvP6WG3Wszry4BrcgBpwwBNoghfQAm2AwQJ8G3mjYHyZObNklteoaWR7LsGfMa9+AMWMsBk=</latexit>
> or < 0 ?
> or < 1 ?
1� ✓ik = P̂ (Xi = 0|Y = k)<latexit sha1_base64="B/e7ypgMQQOWjtzVDDblu7ueuoI=">AAACDXicbVDLSsNAFJ3UV62vqEs3g1WoC0sigt0UCm5cVrAPaUqYTKfN0MmDmRuhxPyAG3/FjQtF3Lp35984bbPQ6oELh3Pu5d57vFhwBZb1ZRSWlldW14rrpY3Nre0dc3evraJEUtaikYhk1yOKCR6yFnAQrBtLRgJPsI43vpz6nTsmFY/CG5jErB+QUciHnBLQkmse2afYAZ8BcVM+znAdOz6BtJnhStfldev+tj4+cc2yVbVmwH+JnZMyytF0zU9nENEkYCFQQZTq2VYM/ZRI4FSwrOQkisWEjsmI9TQNScBUP519k+FjrQzwMJK6QsAz9edESgKlJoGnOwMCvlr0puJ/Xi+BYa2f8jBOgIV0vmiYCAwRnkaDB1wyCmKiCaGS61sx9YkkFHSAJR2CvfjyX9I+q9pW1b4+LzdqeRxFdIAOUQXZ6AI10BVqohai6AE9oRf0ajwaz8ab8T5vLRj5zD76BePjG6olmf0=</latexit><latexit sha1_base64="B/e7ypgMQQOWjtzVDDblu7ueuoI=">AAACDXicbVDLSsNAFJ3UV62vqEs3g1WoC0sigt0UCm5cVrAPaUqYTKfN0MmDmRuhxPyAG3/FjQtF3Lp35984bbPQ6oELh3Pu5d57vFhwBZb1ZRSWlldW14rrpY3Nre0dc3evraJEUtaikYhk1yOKCR6yFnAQrBtLRgJPsI43vpz6nTsmFY/CG5jErB+QUciHnBLQkmse2afYAZ8BcVM+znAdOz6BtJnhStfldev+tj4+cc2yVbVmwH+JnZMyytF0zU9nENEkYCFQQZTq2VYM/ZRI4FSwrOQkisWEjsmI9TQNScBUP519k+FjrQzwMJK6QsAz9edESgKlJoGnOwMCvlr0puJ/Xi+BYa2f8jBOgIV0vmiYCAwRnkaDB1wyCmKiCaGS61sx9YkkFHSAJR2CvfjyX9I+q9pW1b4+LzdqeRxFdIAOUQXZ6AI10BVqohai6AE9oRf0ajwaz8ab8T5vLRj5zD76BePjG6olmf0=</latexit><latexit sha1_base64="B/e7ypgMQQOWjtzVDDblu7ueuoI=">AAACDXicbVDLSsNAFJ3UV62vqEs3g1WoC0sigt0UCm5cVrAPaUqYTKfN0MmDmRuhxPyAG3/FjQtF3Lp35984bbPQ6oELh3Pu5d57vFhwBZb1ZRSWlldW14rrpY3Nre0dc3evraJEUtaikYhk1yOKCR6yFnAQrBtLRgJPsI43vpz6nTsmFY/CG5jErB+QUciHnBLQkmse2afYAZ8BcVM+znAdOz6BtJnhStfldev+tj4+cc2yVbVmwH+JnZMyytF0zU9nENEkYCFQQZTq2VYM/ZRI4FSwrOQkisWEjsmI9TQNScBUP519k+FjrQzwMJK6QsAz9edESgKlJoGnOwMCvlr0puJ/Xi+BYa2f8jBOgIV0vmiYCAwRnkaDB1wyCmKiCaGS61sx9YkkFHSAJR2CvfjyX9I+q9pW1b4+LzdqeRxFdIAOUQXZ6AI10BVqohai6AE9oRf0ajwaz8ab8T5vLRj5zD76BePjG6olmf0=</latexit><latexit sha1_base64="B/e7ypgMQQOWjtzVDDblu7ueuoI=">AAACDXicbVDLSsNAFJ3UV62vqEs3g1WoC0sigt0UCm5cVrAPaUqYTKfN0MmDmRuhxPyAG3/FjQtF3Lp35984bbPQ6oELh3Pu5d57vFhwBZb1ZRSWlldW14rrpY3Nre0dc3evraJEUtaikYhk1yOKCR6yFnAQrBtLRgJPsI43vpz6nTsmFY/CG5jErB+QUciHnBLQkmse2afYAZ8BcVM+znAdOz6BtJnhStfldev+tj4+cc2yVbVmwH+JnZMyytF0zU9nENEkYCFQQZTq2VYM/ZRI4FSwrOQkisWEjsmI9TQNScBUP519k+FjrQzwMJK6QsAz9edESgKlJoGnOwMCvlr0puJ/Xi+BYa2f8jBOgIV0vmiYCAwRnkaDB1wyCmKiCaGS61sx9YkkFHSAJR2CvfjyX9I+q9pW1b4+LzdqeRxFdIAOUQXZ6AI10BVqohai6AE9oRf0ajwaz8ab8T5vLRj5zD76BePjG6olmf0=</latexit>
✓ik = P̂ (Xi = 1|Y = k)<latexit sha1_base64="mi3cvEbeM0hMBeUWxZLXWgT5+x8=">AAACCnicbVDLSsNAFJ3UV62vqEs3o0Wom5KIYDeFghuXFexD2hAm00kzdPJg5kYoMWs3/oobF4q49Qvc+TdOHwttPXDhcM693HuPlwiuwLK+jcLK6tr6RnGztLW9s7tn7h+0VZxKylo0FrHsekQxwSPWAg6CdRPJSOgJ1vFGVxO/c8+k4nF0C+OEOSEZRtznlICWXPO4DwED4mZ8lOM67gcEsmaOK12X1+2Hu/rozDXLVtWaAi8Te07KaI6ma371BzFNQxYBFUSpnm0l4GREAqeC5aV+qlhC6IgMWU/TiIRMOdn0lRyfamWA/VjqigBP1d8TGQmVGoee7gwJBGrRm4j/eb0U/JqT8ShJgUV0tshPBYYYT3LBAy4ZBTHWhFDJ9a2YBkQSCjq9kg7BXnx5mbTPq7ZVtW8uyo3aPI4iOkInqIJsdIka6Bo1UQtR9Iie0St6M56MF+Pd+Ji1Foz5zCH6A+PzB2wLmWI=</latexit><latexit sha1_base64="mi3cvEbeM0hMBeUWxZLXWgT5+x8=">AAACCnicbVDLSsNAFJ3UV62vqEs3o0Wom5KIYDeFghuXFexD2hAm00kzdPJg5kYoMWs3/oobF4q49Qvc+TdOHwttPXDhcM693HuPlwiuwLK+jcLK6tr6RnGztLW9s7tn7h+0VZxKylo0FrHsekQxwSPWAg6CdRPJSOgJ1vFGVxO/c8+k4nF0C+OEOSEZRtznlICWXPO4DwED4mZ8lOM67gcEsmaOK12X1+2Hu/rozDXLVtWaAi8Te07KaI6ma371BzFNQxYBFUSpnm0l4GREAqeC5aV+qlhC6IgMWU/TiIRMOdn0lRyfamWA/VjqigBP1d8TGQmVGoee7gwJBGrRm4j/eb0U/JqT8ShJgUV0tshPBYYYT3LBAy4ZBTHWhFDJ9a2YBkQSCjq9kg7BXnx5mbTPq7ZVtW8uyo3aPI4iOkInqIJsdIka6Bo1UQtR9Iie0St6M56MF+Pd+Ji1Foz5zCH6A+PzB2wLmWI=</latexit><latexit sha1_base64="mi3cvEbeM0hMBeUWxZLXWgT5+x8=">AAACCnicbVDLSsNAFJ3UV62vqEs3o0Wom5KIYDeFghuXFexD2hAm00kzdPJg5kYoMWs3/oobF4q49Qvc+TdOHwttPXDhcM693HuPlwiuwLK+jcLK6tr6RnGztLW9s7tn7h+0VZxKylo0FrHsekQxwSPWAg6CdRPJSOgJ1vFGVxO/c8+k4nF0C+OEOSEZRtznlICWXPO4DwED4mZ8lOM67gcEsmaOK12X1+2Hu/rozDXLVtWaAi8Te07KaI6ma371BzFNQxYBFUSpnm0l4GREAqeC5aV+qlhC6IgMWU/TiIRMOdn0lRyfamWA/VjqigBP1d8TGQmVGoee7gwJBGrRm4j/eb0U/JqT8ShJgUV0tshPBYYYT3LBAy4ZBTHWhFDJ9a2YBkQSCjq9kg7BXnx5mbTPq7ZVtW8uyo3aPI4iOkInqIJsdIka6Bo1UQtR9Iie0St6M56MF+Pd+Ji1Foz5zCH6A+PzB2wLmWI=</latexit><latexit sha1_base64="mi3cvEbeM0hMBeUWxZLXWgT5+x8=">AAACCnicbVDLSsNAFJ3UV62vqEs3o0Wom5KIYDeFghuXFexD2hAm00kzdPJg5kYoMWs3/oobF4q49Qvc+TdOHwttPXDhcM693HuPlwiuwLK+jcLK6tr6RnGztLW9s7tn7h+0VZxKylo0FrHsekQxwSPWAg6CdRPJSOgJ1vFGVxO/c8+k4nF0C+OEOSEZRtznlICWXPO4DwED4mZ8lOM67gcEsmaOK12X1+2Hu/rozDXLVtWaAi8Te07KaI6ma371BzFNQxYBFUSpnm0l4GREAqeC5aV+qlhC6IgMWU/TiIRMOdn0lRyfamWA/VjqigBP1d8TGQmVGoee7gwJBGrRm4j/eb0U/JqT8ShJgUV0tshPBYYYT3LBAy4ZBTHWhFDJ9a2YBkQSCjq9kg7BXnx5mbTPq7ZVtW8uyo3aPI4iOkInqIJsdIka6Bo1UQtR9Iie0St6M56MF+Pd+Ji1Foz5zCH6A+PzB2wLmWI=</latexit>
= logP (Y = 1)
P (Y = 0)+X
i
Xilog✓i1✓i0
+ (1�Xi)1� ✓i11� ✓i0
<latexit sha1_base64="e2j9oPERXnt8bgyTVS1N9P82sIY=">AAACZ3icbZFLSwMxEMez66uur/pABC/RIrRIy0YEexEELx4rWK10y5JNs20w+yCZFcqyfkhv3r34LUzbBZ8DQ/6Z+Q1J/glSKTS47ptlLywuLa9UVp219Y3Nrer2zr1OMsV4lyUyUb2Aai5FzLsgQPJeqjiNAskfgqfraf/hmSstkvgOJikfRHQUi1AwCqbkV1/wJZbJyAsVZXmn/nhJGsVsdRsFPsWOp7PIF7hn0mB4znkw5kD9XJCiwF87t5iO1EnT0I0SJU38kybN77xfrbktdxb4ryClqKEyOn711RsmLIt4DExSrfvETWGQUwWCSV44XqZ5StkTHfG+kTGNuB7kM58KfGIqQxwmymQMeFb9PpHTSOtJFBgyojDWv3vT4n+9fgZhe5CLOM2Ax2x+UJhJDAmemo6HQnEGcmIEZUqYu2I2psYfMF/jGBPI7yf/FfdnLeK2yO157apd2lFBh+gY1RFBF+gK3aAO6iKG3i3H2rX2rA97y963D+aobZUzu+hH2EefVIK1ig==</latexit><latexit sha1_base64="e2j9oPERXnt8bgyTVS1N9P82sIY=">AAACZ3icbZFLSwMxEMez66uur/pABC/RIrRIy0YEexEELx4rWK10y5JNs20w+yCZFcqyfkhv3r34LUzbBZ8DQ/6Z+Q1J/glSKTS47ptlLywuLa9UVp219Y3Nrer2zr1OMsV4lyUyUb2Aai5FzLsgQPJeqjiNAskfgqfraf/hmSstkvgOJikfRHQUi1AwCqbkV1/wJZbJyAsVZXmn/nhJGsVsdRsFPsWOp7PIF7hn0mB4znkw5kD9XJCiwF87t5iO1EnT0I0SJU38kybN77xfrbktdxb4ryClqKEyOn711RsmLIt4DExSrfvETWGQUwWCSV44XqZ5StkTHfG+kTGNuB7kM58KfGIqQxwmymQMeFb9PpHTSOtJFBgyojDWv3vT4n+9fgZhe5CLOM2Ax2x+UJhJDAmemo6HQnEGcmIEZUqYu2I2psYfMF/jGBPI7yf/FfdnLeK2yO157apd2lFBh+gY1RFBF+gK3aAO6iKG3i3H2rX2rA97y963D+aobZUzu+hH2EefVIK1ig==</latexit><latexit sha1_base64="e2j9oPERXnt8bgyTVS1N9P82sIY=">AAACZ3icbZFLSwMxEMez66uur/pABC/RIrRIy0YEexEELx4rWK10y5JNs20w+yCZFcqyfkhv3r34LUzbBZ8DQ/6Z+Q1J/glSKTS47ptlLywuLa9UVp219Y3Nrer2zr1OMsV4lyUyUb2Aai5FzLsgQPJeqjiNAskfgqfraf/hmSstkvgOJikfRHQUi1AwCqbkV1/wJZbJyAsVZXmn/nhJGsVsdRsFPsWOp7PIF7hn0mB4znkw5kD9XJCiwF87t5iO1EnT0I0SJU38kybN77xfrbktdxb4ryClqKEyOn711RsmLIt4DExSrfvETWGQUwWCSV44XqZ5StkTHfG+kTGNuB7kM58KfGIqQxwmymQMeFb9PpHTSOtJFBgyojDWv3vT4n+9fgZhe5CLOM2Ax2x+UJhJDAmemo6HQnEGcmIEZUqYu2I2psYfMF/jGBPI7yf/FfdnLeK2yO157apd2lFBh+gY1RFBF+gK3aAO6iKG3i3H2rX2rA97y963D+aobZUzu+hH2EefVIK1ig==</latexit><latexit sha1_base64="e2j9oPERXnt8bgyTVS1N9P82sIY=">AAACZ3icbZFLSwMxEMez66uur/pABC/RIrRIy0YEexEELx4rWK10y5JNs20w+yCZFcqyfkhv3r34LUzbBZ8DQ/6Z+Q1J/glSKTS47ptlLywuLa9UVp219Y3Nrer2zr1OMsV4lyUyUb2Aai5FzLsgQPJeqjiNAskfgqfraf/hmSstkvgOJikfRHQUi1AwCqbkV1/wJZbJyAsVZXmn/nhJGsVsdRsFPsWOp7PIF7hn0mB4znkw5kD9XJCiwF87t5iO1EnT0I0SJU38kybN77xfrbktdxb4ryClqKEyOn711RsmLIt4DExSrfvETWGQUwWCSV44XqZ5StkTHfG+kTGNuB7kM58KfGIqQxwmymQMeFb9PpHTSOtJFBgyojDWv3vT4n+9fgZhe5CLOM2Ax2x+UJhJDAmemo6HQnEGcmIEZUqYu2I2psYfMF/jGBPI7yf/FfdnLeK2yO157apd2lFBh+gY1RFBF+gK3aAO6iKG3i3H2rX2rA97y963D+aobZUzu+hH2EefVIK1ig==</latexit>
P (Y = 1|X1...Xn)
P (Y = 0|X1...Xn)=
P (Y = 1)Q
i P (Xi|Y = 1)
P (Y = 0)Q
i P (Xi|Y = 0)<latexit sha1_base64="qPxGUhFFqiqy0kAj6kcWnWvH5J0=">AAACUXicbVFNS8MwGH5Xv+b8mnr0EhyCu5RWBHcRBl48TnCzspaSZqkLpmlJUmHU/UUPevJ/ePGgmG49zOkLgYfngzd5EmWcKe047zVrZXVtfaO+2dja3tnda+4fDFSaS0L7JOWp9CKsKGeC9jXTnHqZpDiJOL2LHq9K/e6JSsVScasnGQ0S/CBYzAjWhgqbYz+WmBSod3p/6T57oWvbtheK9rQoGWeRQZdobp55234m01HITNIL2XPJVJk2WpYcEw6bLcd2ZoP+ArcCLaimFzZf/VFK8oQKTThWaug6mQ4KLDUjnE4bfq5ohskjfqBDAwVOqAqKWSNTdGKYEYpTaY7QaMYuJgqcKDVJIuNMsB6rZa0k/9OGuY47QcFElmsqyHxRnHOkU1TWi0ZMUqL5xABMJDN3RWSMTWvafELDlOAuP/kvGJzZrmO7N+etbqeqow5HcAyn4MIFdOEaetAHAi/wAV/wXXurfVpgWXOrVasyh/BrrK0fGJSuHg==</latexit><latexit sha1_base64="qPxGUhFFqiqy0kAj6kcWnWvH5J0=">AAACUXicbVFNS8MwGH5Xv+b8mnr0EhyCu5RWBHcRBl48TnCzspaSZqkLpmlJUmHU/UUPevJ/ePGgmG49zOkLgYfngzd5EmWcKe047zVrZXVtfaO+2dja3tnda+4fDFSaS0L7JOWp9CKsKGeC9jXTnHqZpDiJOL2LHq9K/e6JSsVScasnGQ0S/CBYzAjWhgqbYz+WmBSod3p/6T57oWvbtheK9rQoGWeRQZdobp55234m01HITNIL2XPJVJk2WpYcEw6bLcd2ZoP+ArcCLaimFzZf/VFK8oQKTThWaug6mQ4KLDUjnE4bfq5ohskjfqBDAwVOqAqKWSNTdGKYEYpTaY7QaMYuJgqcKDVJIuNMsB6rZa0k/9OGuY47QcFElmsqyHxRnHOkU1TWi0ZMUqL5xABMJDN3RWSMTWvafELDlOAuP/kvGJzZrmO7N+etbqeqow5HcAyn4MIFdOEaetAHAi/wAV/wXXurfVpgWXOrVasyh/BrrK0fGJSuHg==</latexit><latexit sha1_base64="qPxGUhFFqiqy0kAj6kcWnWvH5J0=">AAACUXicbVFNS8MwGH5Xv+b8mnr0EhyCu5RWBHcRBl48TnCzspaSZqkLpmlJUmHU/UUPevJ/ePGgmG49zOkLgYfngzd5EmWcKe047zVrZXVtfaO+2dja3tnda+4fDFSaS0L7JOWp9CKsKGeC9jXTnHqZpDiJOL2LHq9K/e6JSsVScasnGQ0S/CBYzAjWhgqbYz+WmBSod3p/6T57oWvbtheK9rQoGWeRQZdobp55234m01HITNIL2XPJVJk2WpYcEw6bLcd2ZoP+ArcCLaimFzZf/VFK8oQKTThWaug6mQ4KLDUjnE4bfq5ohskjfqBDAwVOqAqKWSNTdGKYEYpTaY7QaMYuJgqcKDVJIuNMsB6rZa0k/9OGuY47QcFElmsqyHxRnHOkU1TWi0ZMUqL5xABMJDN3RWSMTWvafELDlOAuP/kvGJzZrmO7N+etbqeqow5HcAyn4MIFdOEaetAHAi/wAV/wXXurfVpgWXOrVasyh/BrrK0fGJSuHg==</latexit><latexit sha1_base64="qPxGUhFFqiqy0kAj6kcWnWvH5J0=">AAACUXicbVFNS8MwGH5Xv+b8mnr0EhyCu5RWBHcRBl48TnCzspaSZqkLpmlJUmHU/UUPevJ/ePGgmG49zOkLgYfngzd5EmWcKe047zVrZXVtfaO+2dja3tnda+4fDFSaS0L7JOWp9CKsKGeC9jXTnHqZpDiJOL2LHq9K/e6JSsVScasnGQ0S/CBYzAjWhgqbYz+WmBSod3p/6T57oWvbtheK9rQoGWeRQZdobp55234m01HITNIL2XPJVJk2WpYcEw6bLcd2ZoP+ArcCLaimFzZf/VFK8oQKTThWaug6mQ4KLDUjnE4bfq5ohskjfqBDAwVOqAqKWSNTdGKYEYpTaY7QaMYuJgqcKDVJIuNMsB6rZa0k/9OGuY47QcFElmsqyHxRnHOkU1TWi0ZMUqL5xABMJDN3RWSMTWvafELDlOAuP/kvGJzZrmO7N+etbqeqow5HcAyn4MIFdOEaetAHAi/wAV/wXXurfVpgWXOrVasyh/BrrK0fGJSuHg==</latexit>
Another way to view Naïve Bayes (Boolean Y, Xi’s): Decision rule: is this quantity greater or less than 1?
Naïve Bayes: Point #3
logP (Y = 1|X1...Xn)
P (Y = 0|X1...Xn)= log
P (Y = 1)
P (Y = 0)+
X
i
logP (Xi|Y = 1)
P (Xi|Y = 0)<latexit sha1_base64="/rYJIOwZIzM4Y+zUxADSRw01G6Y=">AAACXHicbZHPS8MwHMXTOnU/nE4FL16CQ5gIpRXBXQYDLx4nuK2yjZJm6RaWpiVJhdHtn/S2i/+KZlsnc/MLgcd7n5DkxY8Zlcq2F4Z5kDs8Os4XiqWT8ulZ5fyiI6NEYNLGEYuE6yNJGOWkrahixI0FQaHPSNefPC/z7gcRkkb8TU1jMgjRiNOAYqS05VUki0b9QCCcwlbtveHMXM+xLMv1+N08XTr2tgMb8Jdf4RtIR/ew2JdJ6NFtxPXobIOttUa9StW27NXAfeFkogqyaXmVz/4wwklIuMIMSdlz7FgNUiQUxYzMi/1EkhjhCRqRnpYchUQO0lU5c3irnSEMIqEXV3Dlbu9IUSjlNPQ1GSI1lrvZ0vwv6yUqqA9SyuNEEY7XBwUJgyqCy6bhkAqCFZtqgbCg+q4Qj5HuRen/KOoSnN0n74vOg+XYlvP6WG3Wszry4BrcgBpwwBNoghfQAm2AwQJ8G3mjYHyZObNklteoaWR7LsGfMa9+AMWMsBk=</latexit><latexit sha1_base64="/rYJIOwZIzM4Y+zUxADSRw01G6Y=">AAACXHicbZHPS8MwHMXTOnU/nE4FL16CQ5gIpRXBXQYDLx4nuK2yjZJm6RaWpiVJhdHtn/S2i/+KZlsnc/MLgcd7n5DkxY8Zlcq2F4Z5kDs8Os4XiqWT8ulZ5fyiI6NEYNLGEYuE6yNJGOWkrahixI0FQaHPSNefPC/z7gcRkkb8TU1jMgjRiNOAYqS05VUki0b9QCCcwlbtveHMXM+xLMv1+N08XTr2tgMb8Jdf4RtIR/ew2JdJ6NFtxPXobIOttUa9StW27NXAfeFkogqyaXmVz/4wwklIuMIMSdlz7FgNUiQUxYzMi/1EkhjhCRqRnpYchUQO0lU5c3irnSEMIqEXV3Dlbu9IUSjlNPQ1GSI1lrvZ0vwv6yUqqA9SyuNEEY7XBwUJgyqCy6bhkAqCFZtqgbCg+q4Qj5HuRen/KOoSnN0n74vOg+XYlvP6WG3Wszry4BrcgBpwwBNoghfQAm2AwQJ8G3mjYHyZObNklteoaWR7LsGfMa9+AMWMsBk=</latexit><latexit sha1_base64="/rYJIOwZIzM4Y+zUxADSRw01G6Y=">AAACXHicbZHPS8MwHMXTOnU/nE4FL16CQ5gIpRXBXQYDLx4nuK2yjZJm6RaWpiVJhdHtn/S2i/+KZlsnc/MLgcd7n5DkxY8Zlcq2F4Z5kDs8Os4XiqWT8ulZ5fyiI6NEYNLGEYuE6yNJGOWkrahixI0FQaHPSNefPC/z7gcRkkb8TU1jMgjRiNOAYqS05VUki0b9QCCcwlbtveHMXM+xLMv1+N08XTr2tgMb8Jdf4RtIR/ew2JdJ6NFtxPXobIOttUa9StW27NXAfeFkogqyaXmVz/4wwklIuMIMSdlz7FgNUiQUxYzMi/1EkhjhCRqRnpYchUQO0lU5c3irnSEMIqEXV3Dlbu9IUSjlNPQ1GSI1lrvZ0vwv6yUqqA9SyuNEEY7XBwUJgyqCy6bhkAqCFZtqgbCg+q4Qj5HuRen/KOoSnN0n74vOg+XYlvP6WG3Wszry4BrcgBpwwBNoghfQAm2AwQJ8G3mjYHyZObNklteoaWR7LsGfMa9+AMWMsBk=</latexit><latexit sha1_base64="/rYJIOwZIzM4Y+zUxADSRw01G6Y=">AAACXHicbZHPS8MwHMXTOnU/nE4FL16CQ5gIpRXBXQYDLx4nuK2yjZJm6RaWpiVJhdHtn/S2i/+KZlsnc/MLgcd7n5DkxY8Zlcq2F4Z5kDs8Os4XiqWT8ulZ5fyiI6NEYNLGEYuE6yNJGOWkrahixI0FQaHPSNefPC/z7gcRkkb8TU1jMgjRiNOAYqS05VUki0b9QCCcwlbtveHMXM+xLMv1+N08XTr2tgMb8Jdf4RtIR/ew2JdJ6NFtxPXobIOttUa9StW27NXAfeFkogqyaXmVz/4wwklIuMIMSdlz7FgNUiQUxYzMi/1EkhjhCRqRnpYchUQO0lU5c3irnSEMIqEXV3Dlbu9IUSjlNPQ1GSI1lrvZ0vwv6yUqqA9SyuNEEY7XBwUJgyqCy6bhkAqCFZtqgbCg+q4Qj5HuRen/KOoSnN0n74vOg+XYlvP6WG3Wszry4BrcgBpwwBNoghfQAm2AwQJ8G3mjYHyZObNklteoaWR7LsGfMa9+AMWMsBk=</latexit>
What happens when X has a large number of dimensions? 1� ✓ik = P̂ (Xi = 0|Y = k)
<latexit sha1_base64="B/e7ypgMQQOWjtzVDDblu7ueuoI=">AAACDXicbVDLSsNAFJ3UV62vqEs3g1WoC0sigt0UCm5cVrAPaUqYTKfN0MmDmRuhxPyAG3/FjQtF3Lp35984bbPQ6oELh3Pu5d57vFhwBZb1ZRSWlldW14rrpY3Nre0dc3evraJEUtaikYhk1yOKCR6yFnAQrBtLRgJPsI43vpz6nTsmFY/CG5jErB+QUciHnBLQkmse2afYAZ8BcVM+znAdOz6BtJnhStfldev+tj4+cc2yVbVmwH+JnZMyytF0zU9nENEkYCFQQZTq2VYM/ZRI4FSwrOQkisWEjsmI9TQNScBUP519k+FjrQzwMJK6QsAz9edESgKlJoGnOwMCvlr0puJ/Xi+BYa2f8jBOgIV0vmiYCAwRnkaDB1wyCmKiCaGS61sx9YkkFHSAJR2CvfjyX9I+q9pW1b4+LzdqeRxFdIAOUQXZ6AI10BVqohai6AE9oRf0ajwaz8ab8T5vLRj5zD76BePjG6olmf0=</latexit><latexit sha1_base64="B/e7ypgMQQOWjtzVDDblu7ueuoI=">AAACDXicbVDLSsNAFJ3UV62vqEs3g1WoC0sigt0UCm5cVrAPaUqYTKfN0MmDmRuhxPyAG3/FjQtF3Lp35984bbPQ6oELh3Pu5d57vFhwBZb1ZRSWlldW14rrpY3Nre0dc3evraJEUtaikYhk1yOKCR6yFnAQrBtLRgJPsI43vpz6nTsmFY/CG5jErB+QUciHnBLQkmse2afYAZ8BcVM+znAdOz6BtJnhStfldev+tj4+cc2yVbVmwH+JnZMyytF0zU9nENEkYCFQQZTq2VYM/ZRI4FSwrOQkisWEjsmI9TQNScBUP519k+FjrQzwMJK6QsAz9edESgKlJoGnOwMCvlr0puJ/Xi+BYa2f8jBOgIV0vmiYCAwRnkaDB1wyCmKiCaGS61sx9YkkFHSAJR2CvfjyX9I+q9pW1b4+LzdqeRxFdIAOUQXZ6AI10BVqohai6AE9oRf0ajwaz8ab8T5vLRj5zD76BePjG6olmf0=</latexit><latexit sha1_base64="B/e7ypgMQQOWjtzVDDblu7ueuoI=">AAACDXicbVDLSsNAFJ3UV62vqEs3g1WoC0sigt0UCm5cVrAPaUqYTKfN0MmDmRuhxPyAG3/FjQtF3Lp35984bbPQ6oELh3Pu5d57vFhwBZb1ZRSWlldW14rrpY3Nre0dc3evraJEUtaikYhk1yOKCR6yFnAQrBtLRgJPsI43vpz6nTsmFY/CG5jErB+QUciHnBLQkmse2afYAZ8BcVM+znAdOz6BtJnhStfldev+tj4+cc2yVbVmwH+JnZMyytF0zU9nENEkYCFQQZTq2VYM/ZRI4FSwrOQkisWEjsmI9TQNScBUP519k+FjrQzwMJK6QsAz9edESgKlJoGnOwMCvlr0puJ/Xi+BYa2f8jBOgIV0vmiYCAwRnkaDB1wyCmKiCaGS61sx9YkkFHSAJR2CvfjyX9I+q9pW1b4+LzdqeRxFdIAOUQXZ6AI10BVqohai6AE9oRf0ajwaz8ab8T5vLRj5zD76BePjG6olmf0=</latexit><latexit sha1_base64="B/e7ypgMQQOWjtzVDDblu7ueuoI=">AAACDXicbVDLSsNAFJ3UV62vqEs3g1WoC0sigt0UCm5cVrAPaUqYTKfN0MmDmRuhxPyAG3/FjQtF3Lp35984bbPQ6oELh3Pu5d57vFhwBZb1ZRSWlldW14rrpY3Nre0dc3evraJEUtaikYhk1yOKCR6yFnAQrBtLRgJPsI43vpz6nTsmFY/CG5jErB+QUciHnBLQkmse2afYAZ8BcVM+znAdOz6BtJnhStfldev+tj4+cc2yVbVmwH+JnZMyytF0zU9nENEkYCFQQZTq2VYM/ZRI4FSwrOQkisWEjsmI9TQNScBUP519k+FjrQzwMJK6QsAz9edESgKlJoGnOwMCvlr0puJ/Xi+BYa2f8jBOgIV0vmiYCAwRnkaDB1wyCmKiCaGS61sx9YkkFHSAJR2CvfjyX9I+q9pW1b4+LzdqeRxFdIAOUQXZ6AI10BVqohai6AE9oRf0ajwaz8ab8T5vLRj5zD76BePjG6olmf0=</latexit>
✓ik = P̂ (Xi = 1|Y = k)<latexit sha1_base64="mi3cvEbeM0hMBeUWxZLXWgT5+x8=">AAACCnicbVDLSsNAFJ3UV62vqEs3o0Wom5KIYDeFghuXFexD2hAm00kzdPJg5kYoMWs3/oobF4q49Qvc+TdOHwttPXDhcM693HuPlwiuwLK+jcLK6tr6RnGztLW9s7tn7h+0VZxKylo0FrHsekQxwSPWAg6CdRPJSOgJ1vFGVxO/c8+k4nF0C+OEOSEZRtznlICWXPO4DwED4mZ8lOM67gcEsmaOK12X1+2Hu/rozDXLVtWaAi8Te07KaI6ma371BzFNQxYBFUSpnm0l4GREAqeC5aV+qlhC6IgMWU/TiIRMOdn0lRyfamWA/VjqigBP1d8TGQmVGoee7gwJBGrRm4j/eb0U/JqT8ShJgUV0tshPBYYYT3LBAy4ZBTHWhFDJ9a2YBkQSCjq9kg7BXnx5mbTPq7ZVtW8uyo3aPI4iOkInqIJsdIka6Bo1UQtR9Iie0St6M56MF+Pd+Ji1Foz5zCH6A+PzB2wLmWI=</latexit><latexit sha1_base64="mi3cvEbeM0hMBeUWxZLXWgT5+x8=">AAACCnicbVDLSsNAFJ3UV62vqEs3o0Wom5KIYDeFghuXFexD2hAm00kzdPJg5kYoMWs3/oobF4q49Qvc+TdOHwttPXDhcM693HuPlwiuwLK+jcLK6tr6RnGztLW9s7tn7h+0VZxKylo0FrHsekQxwSPWAg6CdRPJSOgJ1vFGVxO/c8+k4nF0C+OEOSEZRtznlICWXPO4DwED4mZ8lOM67gcEsmaOK12X1+2Hu/rozDXLVtWaAi8Te07KaI6ma371BzFNQxYBFUSpnm0l4GREAqeC5aV+qlhC6IgMWU/TiIRMOdn0lRyfamWA/VjqigBP1d8TGQmVGoee7gwJBGrRm4j/eb0U/JqT8ShJgUV0tshPBYYYT3LBAy4ZBTHWhFDJ9a2YBkQSCjq9kg7BXnx5mbTPq7ZVtW8uyo3aPI4iOkInqIJsdIka6Bo1UQtR9Iie0St6M56MF+Pd+Ji1Foz5zCH6A+PzB2wLmWI=</latexit><latexit sha1_base64="mi3cvEbeM0hMBeUWxZLXWgT5+x8=">AAACCnicbVDLSsNAFJ3UV62vqEs3o0Wom5KIYDeFghuXFexD2hAm00kzdPJg5kYoMWs3/oobF4q49Qvc+TdOHwttPXDhcM693HuPlwiuwLK+jcLK6tr6RnGztLW9s7tn7h+0VZxKylo0FrHsekQxwSPWAg6CdRPJSOgJ1vFGVxO/c8+k4nF0C+OEOSEZRtznlICWXPO4DwED4mZ8lOM67gcEsmaOK12X1+2Hu/rozDXLVtWaAi8Te07KaI6ma371BzFNQxYBFUSpnm0l4GREAqeC5aV+qlhC6IgMWU/TiIRMOdn0lRyfamWA/VjqigBP1d8TGQmVGoee7gwJBGrRm4j/eb0U/JqT8ShJgUV0tshPBYYYT3LBAy4ZBTHWhFDJ9a2YBkQSCjq9kg7BXnx5mbTPq7ZVtW8uyo3aPI4iOkInqIJsdIka6Bo1UQtR9Iie0St6M56MF+Pd+Ji1Foz5zCH6A+PzB2wLmWI=</latexit><latexit sha1_base64="mi3cvEbeM0hMBeUWxZLXWgT5+x8=">AAACCnicbVDLSsNAFJ3UV62vqEs3o0Wom5KIYDeFghuXFexD2hAm00kzdPJg5kYoMWs3/oobF4q49Qvc+TdOHwttPXDhcM693HuPlwiuwLK+jcLK6tr6RnGztLW9s7tn7h+0VZxKylo0FrHsekQxwSPWAg6CdRPJSOgJ1vFGVxO/c8+k4nF0C+OEOSEZRtznlICWXPO4DwED4mZ8lOM67gcEsmaOK12X1+2Hu/rozDXLVtWaAi8Te07KaI6ma371BzFNQxYBFUSpnm0l4GREAqeC5aV+qlhC6IgMWU/TiIRMOdn0lRyfamWA/VjqigBP1d8TGQmVGoee7gwJBGrRm4j/eb0U/JqT8ShJgUV0tshPBYYYT3LBAy4ZBTHWhFDJ9a2YBkQSCjq9kg7BXnx5mbTPq7ZVtW8uyo3aPI4iOkInqIJsdIka6Bo1UQtR9Iie0St6M56MF+Pd+Ji1Foz5zCH6A+PzB2wLmWI=</latexit>
= logP (Y = 1)
P (Y = 0)+X
i
Xilog✓i1✓i0
+ (1�Xi)1� ✓i11� ✓i0
<latexit sha1_base64="e2j9oPERXnt8bgyTVS1N9P82sIY=">AAACZ3icbZFLSwMxEMez66uur/pABC/RIrRIy0YEexEELx4rWK10y5JNs20w+yCZFcqyfkhv3r34LUzbBZ8DQ/6Z+Q1J/glSKTS47ptlLywuLa9UVp219Y3Nrer2zr1OMsV4lyUyUb2Aai5FzLsgQPJeqjiNAskfgqfraf/hmSstkvgOJikfRHQUi1AwCqbkV1/wJZbJyAsVZXmn/nhJGsVsdRsFPsWOp7PIF7hn0mB4znkw5kD9XJCiwF87t5iO1EnT0I0SJU38kybN77xfrbktdxb4ryClqKEyOn711RsmLIt4DExSrfvETWGQUwWCSV44XqZ5StkTHfG+kTGNuB7kM58KfGIqQxwmymQMeFb9PpHTSOtJFBgyojDWv3vT4n+9fgZhe5CLOM2Ax2x+UJhJDAmemo6HQnEGcmIEZUqYu2I2psYfMF/jGBPI7yf/FfdnLeK2yO157apd2lFBh+gY1RFBF+gK3aAO6iKG3i3H2rX2rA97y963D+aobZUzu+hH2EefVIK1ig==</latexit><latexit sha1_base64="e2j9oPERXnt8bgyTVS1N9P82sIY=">AAACZ3icbZFLSwMxEMez66uur/pABC/RIrRIy0YEexEELx4rWK10y5JNs20w+yCZFcqyfkhv3r34LUzbBZ8DQ/6Z+Q1J/glSKTS47ptlLywuLa9UVp219Y3Nrer2zr1OMsV4lyUyUb2Aai5FzLsgQPJeqjiNAskfgqfraf/hmSstkvgOJikfRHQUi1AwCqbkV1/wJZbJyAsVZXmn/nhJGsVsdRsFPsWOp7PIF7hn0mB4znkw5kD9XJCiwF87t5iO1EnT0I0SJU38kybN77xfrbktdxb4ryClqKEyOn711RsmLIt4DExSrfvETWGQUwWCSV44XqZ5StkTHfG+kTGNuB7kM58KfGIqQxwmymQMeFb9PpHTSOtJFBgyojDWv3vT4n+9fgZhe5CLOM2Ax2x+UJhJDAmemo6HQnEGcmIEZUqYu2I2psYfMF/jGBPI7yf/FfdnLeK2yO157apd2lFBh+gY1RFBF+gK3aAO6iKG3i3H2rX2rA97y963D+aobZUzu+hH2EefVIK1ig==</latexit><latexit sha1_base64="e2j9oPERXnt8bgyTVS1N9P82sIY=">AAACZ3icbZFLSwMxEMez66uur/pABC/RIrRIy0YEexEELx4rWK10y5JNs20w+yCZFcqyfkhv3r34LUzbBZ8DQ/6Z+Q1J/glSKTS47ptlLywuLa9UVp219Y3Nrer2zr1OMsV4lyUyUb2Aai5FzLsgQPJeqjiNAskfgqfraf/hmSstkvgOJikfRHQUi1AwCqbkV1/wJZbJyAsVZXmn/nhJGsVsdRsFPsWOp7PIF7hn0mB4znkw5kD9XJCiwF87t5iO1EnT0I0SJU38kybN77xfrbktdxb4ryClqKEyOn711RsmLIt4DExSrfvETWGQUwWCSV44XqZ5StkTHfG+kTGNuB7kM58KfGIqQxwmymQMeFb9PpHTSOtJFBgyojDWv3vT4n+9fgZhe5CLOM2Ax2x+UJhJDAmemo6HQnEGcmIEZUqYu2I2psYfMF/jGBPI7yf/FfdnLeK2yO157apd2lFBh+gY1RFBF+gK3aAO6iKG3i3H2rX2rA97y963D+aobZUzu+hH2EefVIK1ig==</latexit><latexit sha1_base64="e2j9oPERXnt8bgyTVS1N9P82sIY=">AAACZ3icbZFLSwMxEMez66uur/pABC/RIrRIy0YEexEELx4rWK10y5JNs20w+yCZFcqyfkhv3r34LUzbBZ8DQ/6Z+Q1J/glSKTS47ptlLywuLa9UVp219Y3Nrer2zr1OMsV4lyUyUb2Aai5FzLsgQPJeqjiNAskfgqfraf/hmSstkvgOJikfRHQUi1AwCqbkV1/wJZbJyAsVZXmn/nhJGsVsdRsFPsWOp7PIF7hn0mB4znkw5kD9XJCiwF87t5iO1EnT0I0SJU38kybN77xfrbktdxb4ryClqKEyOn711RsmLIt4DExSrfvETWGQUwWCSV44XqZ5StkTHfG+kTGNuB7kM58KfGIqQxwmymQMeFb9PpHTSOtJFBgyojDWv3vT4n+9fgZhe5CLOM2Ax2x+UJhJDAmemo6HQnEGcmIEZUqYu2I2psYfMF/jGBPI7yf/FfdnLeK2yO157apd2lFBh+gY1RFBF+gK3aAO6iKG3i3H2rX2rA97y963D+aobZUzu+hH2EefVIK1ig==</latexit>
P (Y = 1|X1...Xn)
P (Y = 0|X1...Xn)=
P (Y = 1)Q
i P (Xi|Y = 1)
P (Y = 0)Q
i P (Xi|Y = 0)<latexit sha1_base64="qPxGUhFFqiqy0kAj6kcWnWvH5J0=">AAACUXicbVFNS8MwGH5Xv+b8mnr0EhyCu5RWBHcRBl48TnCzspaSZqkLpmlJUmHU/UUPevJ/ePGgmG49zOkLgYfngzd5EmWcKe047zVrZXVtfaO+2dja3tnda+4fDFSaS0L7JOWp9CKsKGeC9jXTnHqZpDiJOL2LHq9K/e6JSsVScasnGQ0S/CBYzAjWhgqbYz+WmBSod3p/6T57oWvbtheK9rQoGWeRQZdobp55234m01HITNIL2XPJVJk2WpYcEw6bLcd2ZoP+ArcCLaimFzZf/VFK8oQKTThWaug6mQ4KLDUjnE4bfq5ohskjfqBDAwVOqAqKWSNTdGKYEYpTaY7QaMYuJgqcKDVJIuNMsB6rZa0k/9OGuY47QcFElmsqyHxRnHOkU1TWi0ZMUqL5xABMJDN3RWSMTWvafELDlOAuP/kvGJzZrmO7N+etbqeqow5HcAyn4MIFdOEaetAHAi/wAV/wXXurfVpgWXOrVasyh/BrrK0fGJSuHg==</latexit><latexit sha1_base64="qPxGUhFFqiqy0kAj6kcWnWvH5J0=">AAACUXicbVFNS8MwGH5Xv+b8mnr0EhyCu5RWBHcRBl48TnCzspaSZqkLpmlJUmHU/UUPevJ/ePGgmG49zOkLgYfngzd5EmWcKe047zVrZXVtfaO+2dja3tnda+4fDFSaS0L7JOWp9CKsKGeC9jXTnHqZpDiJOL2LHq9K/e6JSsVScasnGQ0S/CBYzAjWhgqbYz+WmBSod3p/6T57oWvbtheK9rQoGWeRQZdobp55234m01HITNIL2XPJVJk2WpYcEw6bLcd2ZoP+ArcCLaimFzZf/VFK8oQKTThWaug6mQ4KLDUjnE4bfq5ohskjfqBDAwVOqAqKWSNTdGKYEYpTaY7QaMYuJgqcKDVJIuNMsB6rZa0k/9OGuY47QcFElmsqyHxRnHOkU1TWi0ZMUqL5xABMJDN3RWSMTWvafELDlOAuP/kvGJzZrmO7N+etbqeqow5HcAyn4MIFdOEaetAHAi/wAV/wXXurfVpgWXOrVasyh/BrrK0fGJSuHg==</latexit><latexit sha1_base64="qPxGUhFFqiqy0kAj6kcWnWvH5J0=">AAACUXicbVFNS8MwGH5Xv+b8mnr0EhyCu5RWBHcRBl48TnCzspaSZqkLpmlJUmHU/UUPevJ/ePGgmG49zOkLgYfngzd5EmWcKe047zVrZXVtfaO+2dja3tnda+4fDFSaS0L7JOWp9CKsKGeC9jXTnHqZpDiJOL2LHq9K/e6JSsVScasnGQ0S/CBYzAjWhgqbYz+WmBSod3p/6T57oWvbtheK9rQoGWeRQZdobp55234m01HITNIL2XPJVJk2WpYcEw6bLcd2ZoP+ArcCLaimFzZf/VFK8oQKTThWaug6mQ4KLDUjnE4bfq5ohskjfqBDAwVOqAqKWSNTdGKYEYpTaY7QaMYuJgqcKDVJIuNMsB6rZa0k/9OGuY47QcFElmsqyHxRnHOkU1TWi0ZMUqL5xABMJDN3RWSMTWvafELDlOAuP/kvGJzZrmO7N+etbqeqow5HcAyn4MIFdOEaetAHAi/wAV/wXXurfVpgWXOrVasyh/BrrK0fGJSuHg==</latexit><latexit sha1_base64="qPxGUhFFqiqy0kAj6kcWnWvH5J0=">AAACUXicbVFNS8MwGH5Xv+b8mnr0EhyCu5RWBHcRBl48TnCzspaSZqkLpmlJUmHU/UUPevJ/ePGgmG49zOkLgYfngzd5EmWcKe047zVrZXVtfaO+2dja3tnda+4fDFSaS0L7JOWp9CKsKGeC9jXTnHqZpDiJOL2LHq9K/e6JSsVScasnGQ0S/CBYzAjWhgqbYz+WmBSod3p/6T57oWvbtheK9rQoGWeRQZdobp55234m01HITNIL2XPJVJk2WpYcEw6bLcd2ZoP+ArcCLaimFzZf/VFK8oQKTThWaug6mQ4KLDUjnE4bfq5ohskjfqBDAwVOqAqKWSNTdGKYEYpTaY7QaMYuJgqcKDVJIuNMsB6rZa0k/9OGuY47QcFElmsqyHxRnHOkU1TWi0ZMUqL5xABMJDN3RWSMTWvafELDlOAuP/kvGJzZrmO7N+etbqeqow5HcAyn4MIFdOEaetAHAi/wAV/wXXurfVpgWXOrVasyh/BrrK0fGJSuHg==</latexit>
Another way to view Naïve Bayes (Boolean Y, Xi’s): Decision rule: is this quantity greater or less than 1?
Naïve Bayes: Point #3
logP (Y = 1|X1...Xn)
P (Y = 0|X1...Xn)= log
P (Y = 1)
P (Y = 0)+
X
i
logP (Xi|Y = 1)
P (Xi|Y = 0)<latexit sha1_base64="/rYJIOwZIzM4Y+zUxADSRw01G6Y=">AAACXHicbZHPS8MwHMXTOnU/nE4FL16CQ5gIpRXBXQYDLx4nuK2yjZJm6RaWpiVJhdHtn/S2i/+KZlsnc/MLgcd7n5DkxY8Zlcq2F4Z5kDs8Os4XiqWT8ulZ5fyiI6NEYNLGEYuE6yNJGOWkrahixI0FQaHPSNefPC/z7gcRkkb8TU1jMgjRiNOAYqS05VUki0b9QCCcwlbtveHMXM+xLMv1+N08XTr2tgMb8Jdf4RtIR/ew2JdJ6NFtxPXobIOttUa9StW27NXAfeFkogqyaXmVz/4wwklIuMIMSdlz7FgNUiQUxYzMi/1EkhjhCRqRnpYchUQO0lU5c3irnSEMIqEXV3Dlbu9IUSjlNPQ1GSI1lrvZ0vwv6yUqqA9SyuNEEY7XBwUJgyqCy6bhkAqCFZtqgbCg+q4Qj5HuRen/KOoSnN0n74vOg+XYlvP6WG3Wszry4BrcgBpwwBNoghfQAm2AwQJ8G3mjYHyZObNklteoaWR7LsGfMa9+AMWMsBk=</latexit><latexit sha1_base64="/rYJIOwZIzM4Y+zUxADSRw01G6Y=">AAACXHicbZHPS8MwHMXTOnU/nE4FL16CQ5gIpRXBXQYDLx4nuK2yjZJm6RaWpiVJhdHtn/S2i/+KZlsnc/MLgcd7n5DkxY8Zlcq2F4Z5kDs8Os4XiqWT8ulZ5fyiI6NEYNLGEYuE6yNJGOWkrahixI0FQaHPSNefPC/z7gcRkkb8TU1jMgjRiNOAYqS05VUki0b9QCCcwlbtveHMXM+xLMv1+N08XTr2tgMb8Jdf4RtIR/ew2JdJ6NFtxPXobIOttUa9StW27NXAfeFkogqyaXmVz/4wwklIuMIMSdlz7FgNUiQUxYzMi/1EkhjhCRqRnpYchUQO0lU5c3irnSEMIqEXV3Dlbu9IUSjlNPQ1GSI1lrvZ0vwv6yUqqA9SyuNEEY7XBwUJgyqCy6bhkAqCFZtqgbCg+q4Qj5HuRen/KOoSnN0n74vOg+XYlvP6WG3Wszry4BrcgBpwwBNoghfQAm2AwQJ8G3mjYHyZObNklteoaWR7LsGfMa9+AMWMsBk=</latexit><latexit sha1_base64="/rYJIOwZIzM4Y+zUxADSRw01G6Y=">AAACXHicbZHPS8MwHMXTOnU/nE4FL16CQ5gIpRXBXQYDLx4nuK2yjZJm6RaWpiVJhdHtn/S2i/+KZlsnc/MLgcd7n5DkxY8Zlcq2F4Z5kDs8Os4XiqWT8ulZ5fyiI6NEYNLGEYuE6yNJGOWkrahixI0FQaHPSNefPC/z7gcRkkb8TU1jMgjRiNOAYqS05VUki0b9QCCcwlbtveHMXM+xLMv1+N08XTr2tgMb8Jdf4RtIR/ew2JdJ6NFtxPXobIOttUa9StW27NXAfeFkogqyaXmVz/4wwklIuMIMSdlz7FgNUiQUxYzMi/1EkhjhCRqRnpYchUQO0lU5c3irnSEMIqEXV3Dlbu9IUSjlNPQ1GSI1lrvZ0vwv6yUqqA9SyuNEEY7XBwUJgyqCy6bhkAqCFZtqgbCg+q4Qj5HuRen/KOoSnN0n74vOg+XYlvP6WG3Wszry4BrcgBpwwBNoghfQAm2AwQJ8G3mjYHyZObNklteoaWR7LsGfMa9+AMWMsBk=</latexit><latexit sha1_base64="/rYJIOwZIzM4Y+zUxADSRw01G6Y=">AAACXHicbZHPS8MwHMXTOnU/nE4FL16CQ5gIpRXBXQYDLx4nuK2yjZJm6RaWpiVJhdHtn/S2i/+KZlsnc/MLgcd7n5DkxY8Zlcq2F4Z5kDs8Os4XiqWT8ulZ5fyiI6NEYNLGEYuE6yNJGOWkrahixI0FQaHPSNefPC/z7gcRkkb8TU1jMgjRiNOAYqS05VUki0b9QCCcwlbtveHMXM+xLMv1+N08XTr2tgMb8Jdf4RtIR/ew2JdJ6NFtxPXobIOttUa9StW27NXAfeFkogqyaXmVz/4wwklIuMIMSdlz7FgNUiQUxYzMi/1EkhjhCRqRnpYchUQO0lU5c3irnSEMIqEXV3Dlbu9IUSjlNPQ1GSI1lrvZ0vwv6yUqqA9SyuNEEY7XBwUJgyqCy6bhkAqCFZtqgbCg+q4Qj5HuRen/KOoSnN0n74vOg+XYlvP6WG3Wszry4BrcgBpwwBNoghfQAm2AwQJ8G3mjYHyZObNklteoaWR7LsGfMa9+AMWMsBk=</latexit>
= logP (Y = 1)
P (Y = 0)+
X
i
xilog✓i1✓i0
+ (1� xi)1� ✓i11� ✓i0
<latexit sha1_base64="daTz7ayQg2WPc7ful+7fQQEAHOw=">AAACZ3icbZFLSwMxEMez63t9VSsieIkWoUVaNiLopSB48VjB+qBblmyabYPZB8msWJb1Q3rz7sVvYdou+BwY8s/Mb0jyT5BKocF13yx7bn5hcWl5xVldW9/YrGxt3+okU4x3WSITdR9QzaWIeRcESH6fKk6jQPK74PFy0r974kqLJL6Bccr7ER3GIhSMgin5lRfcxjIZeqGiLO/UH9qkUUxXt1HgY+x4Oot8gZ9NGgzPOA9GHKifC1IU+GvnFpOROmkaulGipIl/0qT5nfcrNbflTgP/FaQUNVRGx6+8eoOEZRGPgUmqdY+4KfRzqkAwyQvHyzRPKXukQ94zMqYR1/186lOBj0xlgMNEmYwBT6vfJ3IaaT2OAkNGFEb6d29S/K/XyyA87+ciTjPgMZsdFGYSQ4InpuOBUJyBHBtBmRLmrpiNqPEHzNc4xgTy+8l/xe1Ji7gtcn1auzgv7VhG++gQ1RFBZ+gCXaEO6iKG3i3Hqlo71oe9ae/aezPUtsqZKvoR9sEnxYK1yg==</latexit><latexit sha1_base64="daTz7ayQg2WPc7ful+7fQQEAHOw=">AAACZ3icbZFLSwMxEMez63t9VSsieIkWoUVaNiLopSB48VjB+qBblmyabYPZB8msWJb1Q3rz7sVvYdou+BwY8s/Mb0jyT5BKocF13yx7bn5hcWl5xVldW9/YrGxt3+okU4x3WSITdR9QzaWIeRcESH6fKk6jQPK74PFy0r974kqLJL6Bccr7ER3GIhSMgin5lRfcxjIZeqGiLO/UH9qkUUxXt1HgY+x4Oot8gZ9NGgzPOA9GHKifC1IU+GvnFpOROmkaulGipIl/0qT5nfcrNbflTgP/FaQUNVRGx6+8eoOEZRGPgUmqdY+4KfRzqkAwyQvHyzRPKXukQ94zMqYR1/186lOBj0xlgMNEmYwBT6vfJ3IaaT2OAkNGFEb6d29S/K/XyyA87+ciTjPgMZsdFGYSQ4InpuOBUJyBHBtBmRLmrpiNqPEHzNc4xgTy+8l/xe1Ji7gtcn1auzgv7VhG++gQ1RFBZ+gCXaEO6iKG3i3Hqlo71oe9ae/aezPUtsqZKvoR9sEnxYK1yg==</latexit><latexit sha1_base64="daTz7ayQg2WPc7ful+7fQQEAHOw=">AAACZ3icbZFLSwMxEMez63t9VSsieIkWoUVaNiLopSB48VjB+qBblmyabYPZB8msWJb1Q3rz7sVvYdou+BwY8s/Mb0jyT5BKocF13yx7bn5hcWl5xVldW9/YrGxt3+okU4x3WSITdR9QzaWIeRcESH6fKk6jQPK74PFy0r974kqLJL6Bccr7ER3GIhSMgin5lRfcxjIZeqGiLO/UH9qkUUxXt1HgY+x4Oot8gZ9NGgzPOA9GHKifC1IU+GvnFpOROmkaulGipIl/0qT5nfcrNbflTgP/FaQUNVRGx6+8eoOEZRGPgUmqdY+4KfRzqkAwyQvHyzRPKXukQ94zMqYR1/186lOBj0xlgMNEmYwBT6vfJ3IaaT2OAkNGFEb6d29S/K/XyyA87+ciTjPgMZsdFGYSQ4InpuOBUJyBHBtBmRLmrpiNqPEHzNc4xgTy+8l/xe1Ji7gtcn1auzgv7VhG++gQ1RFBZ+gCXaEO6iKG3i3Hqlo71oe9ae/aezPUtsqZKvoR9sEnxYK1yg==</latexit><latexit sha1_base64="daTz7ayQg2WPc7ful+7fQQEAHOw=">AAACZ3icbZFLSwMxEMez63t9VSsieIkWoUVaNiLopSB48VjB+qBblmyabYPZB8msWJb1Q3rz7sVvYdou+BwY8s/Mb0jyT5BKocF13yx7bn5hcWl5xVldW9/YrGxt3+okU4x3WSITdR9QzaWIeRcESH6fKk6jQPK74PFy0r974kqLJL6Bccr7ER3GIhSMgin5lRfcxjIZeqGiLO/UH9qkUUxXt1HgY+x4Oot8gZ9NGgzPOA9GHKifC1IU+GvnFpOROmkaulGipIl/0qT5nfcrNbflTgP/FaQUNVRGx6+8eoOEZRGPgUmqdY+4KfRzqkAwyQvHyzRPKXukQ94zMqYR1/186lOBj0xlgMNEmYwBT6vfJ3IaaT2OAkNGFEb6d29S/K/XyyA87+ciTjPgMZsdFGYSQ4InpuOBUJyBHBtBmRLmrpiNqPEHzNc4xgTy+8l/xe1Ji7gtcn1auzgv7VhG++gQ1RFBZ+gCXaEO6iKG3i3Hqlo71oe9ae/aezPUtsqZKvoR9sEnxYK1yg==</latexit>
✓ik = P̂ (xi = 1|Y = k)<latexit sha1_base64="s1BRrSy+i6eIhIIR7ZaPuGDtl/I=">AAACCnicbVDLSsNAFJ34rPUVdelmtAh1UxIR7KZQcOOygn1IW8JkOm2GTCZh5kYssWs3/oobF4q49Qvc+TdOHwttPXDhcM693HuPnwiuwXG+raXlldW19dxGfnNre2fX3ttv6DhVlNVpLGLV8olmgktWBw6CtRLFSOQL1vTDy7HfvGNK81jewDBh3YgMJO9zSsBInn3UgYAB8TIejnAFdwICWW2Ei/cer7gPt5Xw1LMLTsmZAC8Sd0YKaIaaZ391ejFNIyaBCqJ123US6GZEAaeCjfKdVLOE0JAMWNtQSSKmu9nklRE+MUoP92NlSgKeqL8nMhJpPYx80xkRCPS8Nxb/89op9MvdjMskBSbpdFE/FRhiPM4F97hiFMTQEEIVN7diGhBFKJj08iYEd/7lRdI4K7lOyb0+L1TLszhy6BAdoyJy0QWqoitUQ3VE0SN6Rq/ozXqyXqx362PaumTNZg7QH1ifP52rmYI=</latexit><latexit sha1_base64="s1BRrSy+i6eIhIIR7ZaPuGDtl/I=">AAACCnicbVDLSsNAFJ34rPUVdelmtAh1UxIR7KZQcOOygn1IW8JkOm2GTCZh5kYssWs3/oobF4q49Qvc+TdOHwttPXDhcM693HuPnwiuwXG+raXlldW19dxGfnNre2fX3ttv6DhVlNVpLGLV8olmgktWBw6CtRLFSOQL1vTDy7HfvGNK81jewDBh3YgMJO9zSsBInn3UgYAB8TIejnAFdwICWW2Ei/cer7gPt5Xw1LMLTsmZAC8Sd0YKaIaaZ391ejFNIyaBCqJ123US6GZEAaeCjfKdVLOE0JAMWNtQSSKmu9nklRE+MUoP92NlSgKeqL8nMhJpPYx80xkRCPS8Nxb/89op9MvdjMskBSbpdFE/FRhiPM4F97hiFMTQEEIVN7diGhBFKJj08iYEd/7lRdI4K7lOyb0+L1TLszhy6BAdoyJy0QWqoitUQ3VE0SN6Rq/ozXqyXqx362PaumTNZg7QH1ifP52rmYI=</latexit><latexit sha1_base64="s1BRrSy+i6eIhIIR7ZaPuGDtl/I=">AAACCnicbVDLSsNAFJ34rPUVdelmtAh1UxIR7KZQcOOygn1IW8JkOm2GTCZh5kYssWs3/oobF4q49Qvc+TdOHwttPXDhcM693HuPnwiuwXG+raXlldW19dxGfnNre2fX3ttv6DhVlNVpLGLV8olmgktWBw6CtRLFSOQL1vTDy7HfvGNK81jewDBh3YgMJO9zSsBInn3UgYAB8TIejnAFdwICWW2Ei/cer7gPt5Xw1LMLTsmZAC8Sd0YKaIaaZ391ejFNIyaBCqJ123US6GZEAaeCjfKdVLOE0JAMWNtQSSKmu9nklRE+MUoP92NlSgKeqL8nMhJpPYx80xkRCPS8Nxb/89op9MvdjMskBSbpdFE/FRhiPM4F97hiFMTQEEIVN7diGhBFKJj08iYEd/7lRdI4K7lOyb0+L1TLszhy6BAdoyJy0QWqoitUQ3VE0SN6Rq/ozXqyXqx362PaumTNZg7QH1ifP52rmYI=</latexit><latexit sha1_base64="s1BRrSy+i6eIhIIR7ZaPuGDtl/I=">AAACCnicbVDLSsNAFJ34rPUVdelmtAh1UxIR7KZQcOOygn1IW8JkOm2GTCZh5kYssWs3/oobF4q49Qvc+TdOHwttPXDhcM693HuPnwiuwXG+raXlldW19dxGfnNre2fX3ttv6DhVlNVpLGLV8olmgktWBw6CtRLFSOQL1vTDy7HfvGNK81jewDBh3YgMJO9zSsBInn3UgYAB8TIejnAFdwICWW2Ei/cer7gPt5Xw1LMLTsmZAC8Sd0YKaIaaZ391ejFNIyaBCqJ123US6GZEAaeCjfKdVLOE0JAMWNtQSSKmu9nklRE+MUoP92NlSgKeqL8nMhJpPYx80xkRCPS8Nxb/89op9MvdjMskBSbpdFE/FRhiPM4F97hiFMTQEEIVN7diGhBFKJj08iYEd/7lRdI4K7lOyb0+L1TLszhy6BAdoyJy0QWqoitUQ3VE0SN6Rq/ozXqyXqx362PaumTNZg7QH1ifP52rmYI=</latexit>
1� ✓ik = P̂ (xi = 0|Y = k)<latexit sha1_base64="33YFRAM5JISW7rVcYS5ZeIJcR88=">AAACDXicbVC7SgNBFJ2NrxhfUUubwSjEwrArgmkCARvLCOYh2WWZncwmQ2YfzNwVw5ofsPFXbCwUsbW382+cJFto4oELh3Pu5d57vFhwBab5beSWlldW1/LrhY3Nre2d4u5eS0WJpKxJIxHJjkcUEzxkTeAgWCeWjASeYG1veDnx23dMKh6FNzCKmROQfsh9TgloyS0eWafYhgED4qZ8OMY1bA8IpI0xLt+7vGY+3NaGJ26xZFbMKfAisTJSQhkabvHL7kU0CVgIVBClupYZg5MSCZwKNi7YiWIxoUPSZ11NQxIw5aTTb8b4WCs97EdSVwh4qv6eSEmg1CjwdGdAYKDmvYn4n9dNwK86KQ/jBFhIZ4v8RGCI8CQa3OOSURAjTQiVXN+K6YBIQkEHWNAhWPMvL5LWWcUyK9b1ealezeLIowN0iMrIQheojq5QAzURRY/oGb2iN+PJeDHejY9Za87IZvbRHxifP9vFmh0=</latexit><latexit sha1_base64="33YFRAM5JISW7rVcYS5ZeIJcR88=">AAACDXicbVC7SgNBFJ2NrxhfUUubwSjEwrArgmkCARvLCOYh2WWZncwmQ2YfzNwVw5ofsPFXbCwUsbW382+cJFto4oELh3Pu5d57vFhwBab5beSWlldW1/LrhY3Nre2d4u5eS0WJpKxJIxHJjkcUEzxkTeAgWCeWjASeYG1veDnx23dMKh6FNzCKmROQfsh9TgloyS0eWafYhgED4qZ8OMY1bA8IpI0xLt+7vGY+3NaGJ26xZFbMKfAisTJSQhkabvHL7kU0CVgIVBClupYZg5MSCZwKNi7YiWIxoUPSZ11NQxIw5aTTb8b4WCs97EdSVwh4qv6eSEmg1CjwdGdAYKDmvYn4n9dNwK86KQ/jBFhIZ4v8RGCI8CQa3OOSURAjTQiVXN+K6YBIQkEHWNAhWPMvL5LWWcUyK9b1ealezeLIowN0iMrIQheojq5QAzURRY/oGb2iN+PJeDHejY9Za87IZvbRHxifP9vFmh0=</latexit><latexit sha1_base64="33YFRAM5JISW7rVcYS5ZeIJcR88=">AAACDXicbVC7SgNBFJ2NrxhfUUubwSjEwrArgmkCARvLCOYh2WWZncwmQ2YfzNwVw5ofsPFXbCwUsbW382+cJFto4oELh3Pu5d57vFhwBab5beSWlldW1/LrhY3Nre2d4u5eS0WJpKxJIxHJjkcUEzxkTeAgWCeWjASeYG1veDnx23dMKh6FNzCKmROQfsh9TgloyS0eWafYhgED4qZ8OMY1bA8IpI0xLt+7vGY+3NaGJ26xZFbMKfAisTJSQhkabvHL7kU0CVgIVBClupYZg5MSCZwKNi7YiWIxoUPSZ11NQxIw5aTTb8b4WCs97EdSVwh4qv6eSEmg1CjwdGdAYKDmvYn4n9dNwK86KQ/jBFhIZ4v8RGCI8CQa3OOSURAjTQiVXN+K6YBIQkEHWNAhWPMvL5LWWcUyK9b1ealezeLIowN0iMrIQheojq5QAzURRY/oGb2iN+PJeDHejY9Za87IZvbRHxifP9vFmh0=</latexit><latexit sha1_base64="33YFRAM5JISW7rVcYS5ZeIJcR88=">AAACDXicbVC7SgNBFJ2NrxhfUUubwSjEwrArgmkCARvLCOYh2WWZncwmQ2YfzNwVw5ofsPFXbCwUsbW382+cJFto4oELh3Pu5d57vFhwBab5beSWlldW1/LrhY3Nre2d4u5eS0WJpKxJIxHJjkcUEzxkTeAgWCeWjASeYG1veDnx23dMKh6FNzCKmROQfsh9TgloyS0eWafYhgED4qZ8OMY1bA8IpI0xLt+7vGY+3NaGJ26xZFbMKfAisTJSQhkabvHL7kU0CVgIVBClupYZg5MSCZwKNi7YiWIxoUPSZ11NQxIw5aTTb8b4WCs97EdSVwh4qv6eSEmg1CjwdGdAYKDmvYn4n9dNwK86KQ/jBFhIZ4v8RGCI8CQa3OOSURAjTQiVXN+K6YBIQkEHWNAhWPMvL5LWWcUyK9b1ealezeLIowN0iMrIQheojq5QAzURRY/oGb2iN+PJeDHejY9Za87IZvbRHxifP9vFmh0=</latexit>
Prevents underflow
Naïve Bayes: Point #4
If unlucky, our MLE estimate for P(Xi | Y) might be zero. (for example, Xi = birthdate. Xi = Jan_25_1992)
• Why worry about just one parameter out of many?
• What can be done to address this?
Naïve Bayes: Point #4
If unlucky, our MLE estimate for P(Xi | Y) might be zero. (for example, Xi = birthdate. Xi = Jan_25_1992)
• Why worry about just one parameter out of many?
• What can be done to address this?
P (Y = i)P (X1|Y = i)P (X2|Y = i) . . . P (Xn|Y = i)<latexit sha1_base64="DyJsZu8dRCE7q5OSN/HHZ9lolN0=">AAACFnicbZDLSgMxFIbPeK31VnXpJliEurDMFMFuhIIblxXsRdphyKSZNjSTGZKMUGqfwo2v4saFIm7FnW9jOh1EW38IfPznnCTn92POlLbtL2tpeWV1bT23kd/c2t7ZLeztN1WUSEIbJOKRbPtYUc4EbWimOW3HkuLQ57TlDy+n9dYdlYpF4kaPYuqGuC9YwAjWxvIKp/XS7QU7QfVS23Puf7Ayw24v0io1RGp4haJdtlOhRXAyKEKmulf4NFeQJKRCE46V6jh2rN0xlpoRTif5bqJojMkQ92nHoMAhVe44XWuCjo3TQ0EkzREape7viTEOlRqFvukMsR6o+drU/K/WSXRQdcdMxImmgsweChKOdISmGaEek5RoPjKAiWTmr4gMsMREmyTzJgRnfuVFaFbKjl12rs+KtWoWRw4O4QhK4MA51OAK6tAAAg/wBC/waj1az9ab9T5rXbKymQP4I+vjG5oum+M=</latexit><latexit sha1_base64="DyJsZu8dRCE7q5OSN/HHZ9lolN0=">AAACFnicbZDLSgMxFIbPeK31VnXpJliEurDMFMFuhIIblxXsRdphyKSZNjSTGZKMUGqfwo2v4saFIm7FnW9jOh1EW38IfPznnCTn92POlLbtL2tpeWV1bT23kd/c2t7ZLeztN1WUSEIbJOKRbPtYUc4EbWimOW3HkuLQ57TlDy+n9dYdlYpF4kaPYuqGuC9YwAjWxvIKp/XS7QU7QfVS23Puf7Ayw24v0io1RGp4haJdtlOhRXAyKEKmulf4NFeQJKRCE46V6jh2rN0xlpoRTif5bqJojMkQ92nHoMAhVe44XWuCjo3TQ0EkzREape7viTEOlRqFvukMsR6o+drU/K/WSXRQdcdMxImmgsweChKOdISmGaEek5RoPjKAiWTmr4gMsMREmyTzJgRnfuVFaFbKjl12rs+KtWoWRw4O4QhK4MA51OAK6tAAAg/wBC/waj1az9ab9T5rXbKymQP4I+vjG5oum+M=</latexit><latexit sha1_base64="DyJsZu8dRCE7q5OSN/HHZ9lolN0=">AAACFnicbZDLSgMxFIbPeK31VnXpJliEurDMFMFuhIIblxXsRdphyKSZNjSTGZKMUGqfwo2v4saFIm7FnW9jOh1EW38IfPznnCTn92POlLbtL2tpeWV1bT23kd/c2t7ZLeztN1WUSEIbJOKRbPtYUc4EbWimOW3HkuLQ57TlDy+n9dYdlYpF4kaPYuqGuC9YwAjWxvIKp/XS7QU7QfVS23Puf7Ayw24v0io1RGp4haJdtlOhRXAyKEKmulf4NFeQJKRCE46V6jh2rN0xlpoRTif5bqJojMkQ92nHoMAhVe44XWuCjo3TQ0EkzREape7viTEOlRqFvukMsR6o+drU/K/WSXRQdcdMxImmgsweChKOdISmGaEek5RoPjKAiWTmr4gMsMREmyTzJgRnfuVFaFbKjl12rs+KtWoWRw4O4QhK4MA51OAK6tAAAg/wBC/waj1az9ab9T5rXbKymQP4I+vjG5oum+M=</latexit><latexit sha1_base64="DyJsZu8dRCE7q5OSN/HHZ9lolN0=">AAACFnicbZDLSgMxFIbPeK31VnXpJliEurDMFMFuhIIblxXsRdphyKSZNjSTGZKMUGqfwo2v4saFIm7FnW9jOh1EW38IfPznnCTn92POlLbtL2tpeWV1bT23kd/c2t7ZLeztN1WUSEIbJOKRbPtYUc4EbWimOW3HkuLQ57TlDy+n9dYdlYpF4kaPYuqGuC9YwAjWxvIKp/XS7QU7QfVS23Puf7Ayw24v0io1RGp4haJdtlOhRXAyKEKmulf4NFeQJKRCE46V6jh2rN0xlpoRTif5bqJojMkQ92nHoMAhVe44XWuCjo3TQ0EkzREape7viTEOlRqFvukMsR6o+drU/K/WSXRQdcdMxImmgsweChKOdISmGaEek5RoPjKAiWTmr4gMsMREmyTzJgRnfuVFaFbKjl12rs+KtWoWRw4O4QhK4MA51OAK6tAAAg/wBC/waj1az9ab9T5rXbKymQP4I+vjG5oum+M=</latexit>
Estimating Parameters• Maximum Likelihood Estimate (MLE): choose θ that
maximizes probability of observed data
• Maximum a Posteriori (MAP) estimate: choose θ that is most probable given prior probability and the data
Maximum likelihood estimates:
Estimating Parameters: Y, Xi discrete-valued
MAP estimates (Beta, Dirichlet priors): Only difference:
“imaginary” examples
How many counts should we add?
Learning to classify text documents• Classify which emails are spam? • Classify which emails promise an attachment? • Classify which web pages are student home pages?
How shall we represent text documents for Naïve Bayes?
Baseline: Bag of Words Approach
aardvark 0
about 2
all 2
Africa 1
apple 0
anxious 0
...
gas 1
...
oil 1
…
Zaire 0
How can we express X?
• Y discrete valued. e.g., Spam or not • X = (X1, X2, … Xn) with n the number of words in English.
• What are the problems with this representation?
How can we express X?
• Y discrete valued. e.g., Spam or not • X = (X1, X2, … Xn) with n the number of words in English.
• What are the problems with this representation? • Some words always present • Some words very infrequent • Doesn’t count how often a word appears • Conditional independence assumption is false…
Learning to classify document: P(Y|X)the “Bag of Words” model
• Y discrete valued. e.g., Spam or not • X = (X1, X2, … Xn) = document
• Xi is a random variable describing the word at position i in the document
• possible values for Xi : any word wk in English
• Xi represents the ith word position in document
• X1 = “I”, X2 = “am”, X3 = “pleased”
• and, let’s assume the Xi are iid (indep, identically distributed)
Learning to classify document: P(Y|X)the “Bag of Words” model
• Y discrete valued. e.g., Spam or not • X = (X1, X2, … Xn) = document
• Xi is a random variable describing the word at position i in the document
• possible values for Xi : any word wk in English
Multinomial distribution
All Xi have the same distribution
P (D|✓) / ✓↵11 ✓↵2
2 . . . ✓↵nn
<latexit sha1_base64="Uk8D2QK9q1pbLGje3ccUxeuS+gU=">AAACQXicbZBLSwMxFIUzvq2vqks3wSLopswUQZeCLlxWsFro1OFOmtpgJhOSO0IZ56+58R+4c+/GhSJu3Zg+xOeFwMl37s3jxFoKi77/4E1MTk3PzM7NlxYWl5ZXyqtrZzbNDOMNlsrUNGOwXArFGyhQ8qY2HJJY8vP46nDgn19zY0WqTrGveTuBSyW6ggE6FJWb9e2jmxB7HGGHhtqkGlM62kd5UFzkIUjdgygoPmHtC9aKsJOi/XTUl6OKqFzxq/6w6F8RjEWFjKsele/dYSxLuEImwdpW4Gts52BQMMmLUphZroFdwSVvOakg4badDxMo6JYjHdpNjVsK6ZB+n8ghsbafxK4zAezZ394A/ue1Muzut3OhdIZcsdFF3UxSl9IgTtoRhjOUfSeAGeHeSlkPDDB0oZdcCMHvL/8VZ7Vq4FeDk93Kwf44jjmyQTbJNgnIHjkgx6ROGoSRW/JInsmLd+c9ea/e26h1whvPrJMf5b1/AC8Csnw=</latexit><latexit sha1_base64="Uk8D2QK9q1pbLGje3ccUxeuS+gU=">AAACQXicbZBLSwMxFIUzvq2vqks3wSLopswUQZeCLlxWsFro1OFOmtpgJhOSO0IZ56+58R+4c+/GhSJu3Zg+xOeFwMl37s3jxFoKi77/4E1MTk3PzM7NlxYWl5ZXyqtrZzbNDOMNlsrUNGOwXArFGyhQ8qY2HJJY8vP46nDgn19zY0WqTrGveTuBSyW6ggE6FJWb9e2jmxB7HGGHhtqkGlM62kd5UFzkIUjdgygoPmHtC9aKsJOi/XTUl6OKqFzxq/6w6F8RjEWFjKsele/dYSxLuEImwdpW4Gts52BQMMmLUphZroFdwSVvOakg4badDxMo6JYjHdpNjVsK6ZB+n8ghsbafxK4zAezZ394A/ue1Muzut3OhdIZcsdFF3UxSl9IgTtoRhjOUfSeAGeHeSlkPDDB0oZdcCMHvL/8VZ7Vq4FeDk93Kwf44jjmyQTbJNgnIHjkgx6ROGoSRW/JInsmLd+c9ea/e26h1whvPrJMf5b1/AC8Csnw=</latexit><latexit sha1_base64="Uk8D2QK9q1pbLGje3ccUxeuS+gU=">AAACQXicbZBLSwMxFIUzvq2vqks3wSLopswUQZeCLlxWsFro1OFOmtpgJhOSO0IZ56+58R+4c+/GhSJu3Zg+xOeFwMl37s3jxFoKi77/4E1MTk3PzM7NlxYWl5ZXyqtrZzbNDOMNlsrUNGOwXArFGyhQ8qY2HJJY8vP46nDgn19zY0WqTrGveTuBSyW6ggE6FJWb9e2jmxB7HGGHhtqkGlM62kd5UFzkIUjdgygoPmHtC9aKsJOi/XTUl6OKqFzxq/6w6F8RjEWFjKsele/dYSxLuEImwdpW4Gts52BQMMmLUphZroFdwSVvOakg4badDxMo6JYjHdpNjVsK6ZB+n8ghsbafxK4zAezZ394A/ue1Muzut3OhdIZcsdFF3UxSl9IgTtoRhjOUfSeAGeHeSlkPDDB0oZdcCMHvL/8VZ7Vq4FeDk93Kwf44jjmyQTbJNgnIHjkgx6ROGoSRW/JInsmLd+c9ea/e26h1whvPrJMf5b1/AC8Csnw=</latexit><latexit sha1_base64="Uk8D2QK9q1pbLGje3ccUxeuS+gU=">AAACQXicbZBLSwMxFIUzvq2vqks3wSLopswUQZeCLlxWsFro1OFOmtpgJhOSO0IZ56+58R+4c+/GhSJu3Zg+xOeFwMl37s3jxFoKi77/4E1MTk3PzM7NlxYWl5ZXyqtrZzbNDOMNlsrUNGOwXArFGyhQ8qY2HJJY8vP46nDgn19zY0WqTrGveTuBSyW6ggE6FJWb9e2jmxB7HGGHhtqkGlM62kd5UFzkIUjdgygoPmHtC9aKsJOi/XTUl6OKqFzxq/6w6F8RjEWFjKsele/dYSxLuEImwdpW4Gts52BQMMmLUphZroFdwSVvOakg4badDxMo6JYjHdpNjVsK6ZB+n8ghsbafxK4zAezZ394A/ue1Muzut3OhdIZcsdFF3UxSl9IgTtoRhjOUfSeAGeHeSlkPDDB0oZdcCMHvL/8VZ7Vq4FeDk93Kwf44jjmyQTbJNgnIHjkgx6ROGoSRW/JInsmLd+c9ea/e26h1whvPrJMf5b1/AC8Csnw=</latexit>
Learning to classify document: P(Y|X)the “Bag of Words” model
• Y discrete valued. e.g., Spam or not • X = (X1, X2, … Xn) = document
• Xi is a random variable describing the word at position i in the document
• possible values for Xi : any word wk in English
• Document = bag of words: the vector of counts for all wk’s – like #heads, #tails, but we have many more than 2 values – assume word probabilities are position independent
(i.i.d. rolls of a 50,000-sided die)
Naïve Bayes Algorithm – discrete Xi • Train Naïve Bayes (examples) for each value yk
estimate for each value xj of each attribute Xi
estimate
• Classify (Xnew)
prob that word xj appears in position i, given Y=yk
* Additional assumption: word probabilities are position independent
MAP estimates for bag of words
Map estimate for multinomial
✓jk =↵jk + �jk � 1P
m ↵mk +P
m �mk � 1<latexit sha1_base64="uX3PJcn417A8oAFe0wB+DSPe5bM=">AAACSHicbZDNS8MwGMbT+TXn19Sjl+AQBHG0IriLMPDicYL7gHWUNEu3uKQtyVthlP55Xjx682/w4kERb2ZbGbr5QuDheX4vSR4/FlyDbb9ahZXVtfWN4mZpa3tnd6+8f9DSUaIoa9JIRKrjE80ED1kTOAjWiRUj0hes7Y9uJnn7kSnNo/AexjHrSTIIecApAWN5Zc+FIQPipQ+jDF9jN1CEpi4R8TD3zrDrz4FzJ8OpqxPpyTkkR9mUyt0ZLHM488oVu2pPBy8LJxcVlE/DK7+4/YgmkoVABdG669gx9FKigFPBspKbaBYTOiID1jUyJJLpXjotIsMnxunjIFLmhICn7u+NlEitx9I3pCQw1IvZxPwv6yYQ1HopD+MEWEhnFwWJwBDhSau4zxWjIMZGEKq4eSumQ2K6BNN9yZTgLH55WbQuqo5dde4uK/VaXkcRHaFjdIocdIXq6BY1UBNR9ITe0Af6tJ6td+vL+p6hBSvfOUR/plD4AZUvssE=</latexit><latexit sha1_base64="uX3PJcn417A8oAFe0wB+DSPe5bM=">AAACSHicbZDNS8MwGMbT+TXn19Sjl+AQBHG0IriLMPDicYL7gHWUNEu3uKQtyVthlP55Xjx682/w4kERb2ZbGbr5QuDheX4vSR4/FlyDbb9ahZXVtfWN4mZpa3tnd6+8f9DSUaIoa9JIRKrjE80ED1kTOAjWiRUj0hes7Y9uJnn7kSnNo/AexjHrSTIIecApAWN5Zc+FIQPipQ+jDF9jN1CEpi4R8TD3zrDrz4FzJ8OpqxPpyTkkR9mUyt0ZLHM488oVu2pPBy8LJxcVlE/DK7+4/YgmkoVABdG669gx9FKigFPBspKbaBYTOiID1jUyJJLpXjotIsMnxunjIFLmhICn7u+NlEitx9I3pCQw1IvZxPwv6yYQ1HopD+MEWEhnFwWJwBDhSau4zxWjIMZGEKq4eSumQ2K6BNN9yZTgLH55WbQuqo5dde4uK/VaXkcRHaFjdIocdIXq6BY1UBNR9ITe0Af6tJ6td+vL+p6hBSvfOUR/plD4AZUvssE=</latexit><latexit sha1_base64="uX3PJcn417A8oAFe0wB+DSPe5bM=">AAACSHicbZDNS8MwGMbT+TXn19Sjl+AQBHG0IriLMPDicYL7gHWUNEu3uKQtyVthlP55Xjx682/w4kERb2ZbGbr5QuDheX4vSR4/FlyDbb9ahZXVtfWN4mZpa3tnd6+8f9DSUaIoa9JIRKrjE80ED1kTOAjWiRUj0hes7Y9uJnn7kSnNo/AexjHrSTIIecApAWN5Zc+FIQPipQ+jDF9jN1CEpi4R8TD3zrDrz4FzJ8OpqxPpyTkkR9mUyt0ZLHM488oVu2pPBy8LJxcVlE/DK7+4/YgmkoVABdG669gx9FKigFPBspKbaBYTOiID1jUyJJLpXjotIsMnxunjIFLmhICn7u+NlEitx9I3pCQw1IvZxPwv6yYQ1HopD+MEWEhnFwWJwBDhSau4zxWjIMZGEKq4eSumQ2K6BNN9yZTgLH55WbQuqo5dde4uK/VaXkcRHaFjdIocdIXq6BY1UBNR9ITe0Af6tJ6td+vL+p6hBSvfOUR/plD4AZUvssE=</latexit><latexit sha1_base64="uX3PJcn417A8oAFe0wB+DSPe5bM=">AAACSHicbZDNS8MwGMbT+TXn19Sjl+AQBHG0IriLMPDicYL7gHWUNEu3uKQtyVthlP55Xjx682/w4kERb2ZbGbr5QuDheX4vSR4/FlyDbb9ahZXVtfWN4mZpa3tnd6+8f9DSUaIoa9JIRKrjE80ED1kTOAjWiRUj0hes7Y9uJnn7kSnNo/AexjHrSTIIecApAWN5Zc+FIQPipQ+jDF9jN1CEpi4R8TD3zrDrz4FzJ8OpqxPpyTkkR9mUyt0ZLHM488oVu2pPBy8LJxcVlE/DK7+4/YgmkoVABdG669gx9FKigFPBspKbaBYTOiID1jUyJJLpXjotIsMnxunjIFLmhICn7u+NlEitx9I3pCQw1IvZxPwv6yYQ1HopD+MEWEhnFwWJwBDhSau4zxWjIMZGEKq4eSumQ2K6BNN9yZTgLH55WbQuqo5dde4uK/VaXkcRHaFjdIocdIXq6BY1UBNR9ITe0Af6tJ6td+vL+p6hBSvfOUR/plD4AZUvssE=</latexit>
MAP estimates for bag of words
Map estimate for multinomial
What β’s should we choose?
✓jk =↵jk + �jk � 1P
m ↵mk +P
m �mk � 1<latexit sha1_base64="uX3PJcn417A8oAFe0wB+DSPe5bM=">AAACSHicbZDNS8MwGMbT+TXn19Sjl+AQBHG0IriLMPDicYL7gHWUNEu3uKQtyVthlP55Xjx682/w4kERb2ZbGbr5QuDheX4vSR4/FlyDbb9ahZXVtfWN4mZpa3tnd6+8f9DSUaIoa9JIRKrjE80ED1kTOAjWiRUj0hes7Y9uJnn7kSnNo/AexjHrSTIIecApAWN5Zc+FIQPipQ+jDF9jN1CEpi4R8TD3zrDrz4FzJ8OpqxPpyTkkR9mUyt0ZLHM488oVu2pPBy8LJxcVlE/DK7+4/YgmkoVABdG669gx9FKigFPBspKbaBYTOiID1jUyJJLpXjotIsMnxunjIFLmhICn7u+NlEitx9I3pCQw1IvZxPwv6yYQ1HopD+MEWEhnFwWJwBDhSau4zxWjIMZGEKq4eSumQ2K6BNN9yZTgLH55WbQuqo5dde4uK/VaXkcRHaFjdIocdIXq6BY1UBNR9ITe0Af6tJ6td+vL+p6hBSvfOUR/plD4AZUvssE=</latexit><latexit sha1_base64="uX3PJcn417A8oAFe0wB+DSPe5bM=">AAACSHicbZDNS8MwGMbT+TXn19Sjl+AQBHG0IriLMPDicYL7gHWUNEu3uKQtyVthlP55Xjx682/w4kERb2ZbGbr5QuDheX4vSR4/FlyDbb9ahZXVtfWN4mZpa3tnd6+8f9DSUaIoa9JIRKrjE80ED1kTOAjWiRUj0hes7Y9uJnn7kSnNo/AexjHrSTIIecApAWN5Zc+FIQPipQ+jDF9jN1CEpi4R8TD3zrDrz4FzJ8OpqxPpyTkkR9mUyt0ZLHM488oVu2pPBy8LJxcVlE/DK7+4/YgmkoVABdG669gx9FKigFPBspKbaBYTOiID1jUyJJLpXjotIsMnxunjIFLmhICn7u+NlEitx9I3pCQw1IvZxPwv6yYQ1HopD+MEWEhnFwWJwBDhSau4zxWjIMZGEKq4eSumQ2K6BNN9yZTgLH55WbQuqo5dde4uK/VaXkcRHaFjdIocdIXq6BY1UBNR9ITe0Af6tJ6td+vL+p6hBSvfOUR/plD4AZUvssE=</latexit><latexit sha1_base64="uX3PJcn417A8oAFe0wB+DSPe5bM=">AAACSHicbZDNS8MwGMbT+TXn19Sjl+AQBHG0IriLMPDicYL7gHWUNEu3uKQtyVthlP55Xjx682/w4kERb2ZbGbr5QuDheX4vSR4/FlyDbb9ahZXVtfWN4mZpa3tnd6+8f9DSUaIoa9JIRKrjE80ED1kTOAjWiRUj0hes7Y9uJnn7kSnNo/AexjHrSTIIecApAWN5Zc+FIQPipQ+jDF9jN1CEpi4R8TD3zrDrz4FzJ8OpqxPpyTkkR9mUyt0ZLHM488oVu2pPBy8LJxcVlE/DK7+4/YgmkoVABdG669gx9FKigFPBspKbaBYTOiID1jUyJJLpXjotIsMnxunjIFLmhICn7u+NlEitx9I3pCQw1IvZxPwv6yYQ1HopD+MEWEhnFwWJwBDhSau4zxWjIMZGEKq4eSumQ2K6BNN9yZTgLH55WbQuqo5dde4uK/VaXkcRHaFjdIocdIXq6BY1UBNR9ITe0Af6tJ6td+vL+p6hBSvfOUR/plD4AZUvssE=</latexit><latexit sha1_base64="uX3PJcn417A8oAFe0wB+DSPe5bM=">AAACSHicbZDNS8MwGMbT+TXn19Sjl+AQBHG0IriLMPDicYL7gHWUNEu3uKQtyVthlP55Xjx682/w4kERb2ZbGbr5QuDheX4vSR4/FlyDbb9ahZXVtfWN4mZpa3tnd6+8f9DSUaIoa9JIRKrjE80ED1kTOAjWiRUj0hes7Y9uJnn7kSnNo/AexjHrSTIIecApAWN5Zc+FIQPipQ+jDF9jN1CEpi4R8TD3zrDrz4FzJ8OpqxPpyTkkR9mUyt0ZLHM488oVu2pPBy8LJxcVlE/DK7+4/YgmkoVABdG669gx9FKigFPBspKbaBYTOiID1jUyJJLpXjotIsMnxunjIFLmhICn7u+NlEitx9I3pCQw1IvZxPwv6yYQ1HopD+MEWEhnFwWJwBDhSau4zxWjIMZGEKq4eSumQ2K6BNN9yZTgLH55WbQuqo5dde4uK/VaXkcRHaFjdIocdIXq6BY1UBNR9ITe0Af6tJ6td+vL+p6hBSvfOUR/plD4AZUvssE=</latexit>
# seen “aardvark” # hallucinated “aardvark”
# seen words # hallucinated words
MAP estimates for bag of words
Map estimate for multinomial
What β’s should we choose?
✓jk =↵jk + �jk � 1P
m ↵mk +P
m �mk � 1<latexit sha1_base64="uX3PJcn417A8oAFe0wB+DSPe5bM=">AAACSHicbZDNS8MwGMbT+TXn19Sjl+AQBHG0IriLMPDicYL7gHWUNEu3uKQtyVthlP55Xjx682/w4kERb2ZbGbr5QuDheX4vSR4/FlyDbb9ahZXVtfWN4mZpa3tnd6+8f9DSUaIoa9JIRKrjE80ED1kTOAjWiRUj0hes7Y9uJnn7kSnNo/AexjHrSTIIecApAWN5Zc+FIQPipQ+jDF9jN1CEpi4R8TD3zrDrz4FzJ8OpqxPpyTkkR9mUyt0ZLHM488oVu2pPBy8LJxcVlE/DK7+4/YgmkoVABdG669gx9FKigFPBspKbaBYTOiID1jUyJJLpXjotIsMnxunjIFLmhICn7u+NlEitx9I3pCQw1IvZxPwv6yYQ1HopD+MEWEhnFwWJwBDhSau4zxWjIMZGEKq4eSumQ2K6BNN9yZTgLH55WbQuqo5dde4uK/VaXkcRHaFjdIocdIXq6BY1UBNR9ITe0Af6tJ6td+vL+p6hBSvfOUR/plD4AZUvssE=</latexit><latexit sha1_base64="uX3PJcn417A8oAFe0wB+DSPe5bM=">AAACSHicbZDNS8MwGMbT+TXn19Sjl+AQBHG0IriLMPDicYL7gHWUNEu3uKQtyVthlP55Xjx682/w4kERb2ZbGbr5QuDheX4vSR4/FlyDbb9ahZXVtfWN4mZpa3tnd6+8f9DSUaIoa9JIRKrjE80ED1kTOAjWiRUj0hes7Y9uJnn7kSnNo/AexjHrSTIIecApAWN5Zc+FIQPipQ+jDF9jN1CEpi4R8TD3zrDrz4FzJ8OpqxPpyTkkR9mUyt0ZLHM488oVu2pPBy8LJxcVlE/DK7+4/YgmkoVABdG669gx9FKigFPBspKbaBYTOiID1jUyJJLpXjotIsMnxunjIFLmhICn7u+NlEitx9I3pCQw1IvZxPwv6yYQ1HopD+MEWEhnFwWJwBDhSau4zxWjIMZGEKq4eSumQ2K6BNN9yZTgLH55WbQuqo5dde4uK/VaXkcRHaFjdIocdIXq6BY1UBNR9ITe0Af6tJ6td+vL+p6hBSvfOUR/plD4AZUvssE=</latexit><latexit sha1_base64="uX3PJcn417A8oAFe0wB+DSPe5bM=">AAACSHicbZDNS8MwGMbT+TXn19Sjl+AQBHG0IriLMPDicYL7gHWUNEu3uKQtyVthlP55Xjx682/w4kERb2ZbGbr5QuDheX4vSR4/FlyDbb9ahZXVtfWN4mZpa3tnd6+8f9DSUaIoa9JIRKrjE80ED1kTOAjWiRUj0hes7Y9uJnn7kSnNo/AexjHrSTIIecApAWN5Zc+FIQPipQ+jDF9jN1CEpi4R8TD3zrDrz4FzJ8OpqxPpyTkkR9mUyt0ZLHM488oVu2pPBy8LJxcVlE/DK7+4/YgmkoVABdG669gx9FKigFPBspKbaBYTOiID1jUyJJLpXjotIsMnxunjIFLmhICn7u+NlEitx9I3pCQw1IvZxPwv6yYQ1HopD+MEWEhnFwWJwBDhSau4zxWjIMZGEKq4eSumQ2K6BNN9yZTgLH55WbQuqo5dde4uK/VaXkcRHaFjdIocdIXq6BY1UBNR9ITe0Af6tJ6td+vL+p6hBSvfOUR/plD4AZUvssE=</latexit><latexit sha1_base64="uX3PJcn417A8oAFe0wB+DSPe5bM=">AAACSHicbZDNS8MwGMbT+TXn19Sjl+AQBHG0IriLMPDicYL7gHWUNEu3uKQtyVthlP55Xjx682/w4kERb2ZbGbr5QuDheX4vSR4/FlyDbb9ahZXVtfWN4mZpa3tnd6+8f9DSUaIoa9JIRKrjE80ED1kTOAjWiRUj0hes7Y9uJnn7kSnNo/AexjHrSTIIecApAWN5Zc+FIQPipQ+jDF9jN1CEpi4R8TD3zrDrz4FzJ8OpqxPpyTkkR9mUyt0ZLHM488oVu2pPBy8LJxcVlE/DK7+4/YgmkoVABdG669gx9FKigFPBspKbaBYTOiID1jUyJJLpXjotIsMnxunjIFLmhICn7u+NlEitx9I3pCQw1IvZxPwv6yYQ1HopD+MEWEhnFwWJwBDhSau4zxWjIMZGEKq4eSumQ2K6BNN9yZTgLH55WbQuqo5dde4uK/VaXkcRHaFjdIocdIXq6BY1UBNR9ITe0Af6tJ6td+vL+p6hBSvfOUR/plD4AZUvssE=</latexit>
Probabilities over all classes?
Constant per word?
# seen “aardvark” # hallucinated “aardvark”
# seen words # hallucinated words
MAP estimates for bag of words
Map estimate for multinomial
What β’s should we choose?
✓jk =↵jk + �jk � 1P
m ↵mk +P
m �mk � 1<latexit sha1_base64="uX3PJcn417A8oAFe0wB+DSPe5bM=">AAACSHicbZDNS8MwGMbT+TXn19Sjl+AQBHG0IriLMPDicYL7gHWUNEu3uKQtyVthlP55Xjx682/w4kERb2ZbGbr5QuDheX4vSR4/FlyDbb9ahZXVtfWN4mZpa3tnd6+8f9DSUaIoa9JIRKrjE80ED1kTOAjWiRUj0hes7Y9uJnn7kSnNo/AexjHrSTIIecApAWN5Zc+FIQPipQ+jDF9jN1CEpi4R8TD3zrDrz4FzJ8OpqxPpyTkkR9mUyt0ZLHM488oVu2pPBy8LJxcVlE/DK7+4/YgmkoVABdG669gx9FKigFPBspKbaBYTOiID1jUyJJLpXjotIsMnxunjIFLmhICn7u+NlEitx9I3pCQw1IvZxPwv6yYQ1HopD+MEWEhnFwWJwBDhSau4zxWjIMZGEKq4eSumQ2K6BNN9yZTgLH55WbQuqo5dde4uK/VaXkcRHaFjdIocdIXq6BY1UBNR9ITe0Af6tJ6td+vL+p6hBSvfOUR/plD4AZUvssE=</latexit><latexit sha1_base64="uX3PJcn417A8oAFe0wB+DSPe5bM=">AAACSHicbZDNS8MwGMbT+TXn19Sjl+AQBHG0IriLMPDicYL7gHWUNEu3uKQtyVthlP55Xjx682/w4kERb2ZbGbr5QuDheX4vSR4/FlyDbb9ahZXVtfWN4mZpa3tnd6+8f9DSUaIoa9JIRKrjE80ED1kTOAjWiRUj0hes7Y9uJnn7kSnNo/AexjHrSTIIecApAWN5Zc+FIQPipQ+jDF9jN1CEpi4R8TD3zrDrz4FzJ8OpqxPpyTkkR9mUyt0ZLHM488oVu2pPBy8LJxcVlE/DK7+4/YgmkoVABdG669gx9FKigFPBspKbaBYTOiID1jUyJJLpXjotIsMnxunjIFLmhICn7u+NlEitx9I3pCQw1IvZxPwv6yYQ1HopD+MEWEhnFwWJwBDhSau4zxWjIMZGEKq4eSumQ2K6BNN9yZTgLH55WbQuqo5dde4uK/VaXkcRHaFjdIocdIXq6BY1UBNR9ITe0Af6tJ6td+vL+p6hBSvfOUR/plD4AZUvssE=</latexit><latexit sha1_base64="uX3PJcn417A8oAFe0wB+DSPe5bM=">AAACSHicbZDNS8MwGMbT+TXn19Sjl+AQBHG0IriLMPDicYL7gHWUNEu3uKQtyVthlP55Xjx682/w4kERb2ZbGbr5QuDheX4vSR4/FlyDbb9ahZXVtfWN4mZpa3tnd6+8f9DSUaIoa9JIRKrjE80ED1kTOAjWiRUj0hes7Y9uJnn7kSnNo/AexjHrSTIIecApAWN5Zc+FIQPipQ+jDF9jN1CEpi4R8TD3zrDrz4FzJ8OpqxPpyTkkR9mUyt0ZLHM488oVu2pPBy8LJxcVlE/DK7+4/YgmkoVABdG669gx9FKigFPBspKbaBYTOiID1jUyJJLpXjotIsMnxunjIFLmhICn7u+NlEitx9I3pCQw1IvZxPwv6yYQ1HopD+MEWEhnFwWJwBDhSau4zxWjIMZGEKq4eSumQ2K6BNN9yZTgLH55WbQuqo5dde4uK/VaXkcRHaFjdIocdIXq6BY1UBNR9ITe0Af6tJ6td+vL+p6hBSvfOUR/plD4AZUvssE=</latexit><latexit sha1_base64="uX3PJcn417A8oAFe0wB+DSPe5bM=">AAACSHicbZDNS8MwGMbT+TXn19Sjl+AQBHG0IriLMPDicYL7gHWUNEu3uKQtyVthlP55Xjx682/w4kERb2ZbGbr5QuDheX4vSR4/FlyDbb9ahZXVtfWN4mZpa3tnd6+8f9DSUaIoa9JIRKrjE80ED1kTOAjWiRUj0hes7Y9uJnn7kSnNo/AexjHrSTIIecApAWN5Zc+FIQPipQ+jDF9jN1CEpi4R8TD3zrDrz4FzJ8OpqxPpyTkkR9mUyt0ZLHM488oVu2pPBy8LJxcVlE/DK7+4/YgmkoVABdG669gx9FKigFPBspKbaBYTOiID1jUyJJLpXjotIsMnxunjIFLmhICn7u+NlEitx9I3pCQw1IvZxPwv6yYQ1HopD+MEWEhnFwWJwBDhSau4zxWjIMZGEKq4eSumQ2K6BNN9yZTgLH55WbQuqo5dde4uK/VaXkcRHaFjdIocdIXq6BY1UBNR9ITe0Af6tJ6td+vL+p6hBSvfOUR/plD4AZUvssE=</latexit>
Probabilities over all classes?
Constant per word?
Prior is
Dirichlet
Posterior is also
Dirichlet
(Dirichlet is the conjugate prior for a multinomial
likelihood function)
P (✓k) =✓�1k
1k , ✓�2k
2k , . . . , ✓�mk
mk
Beta(�1k,�2k, . . . ,�mk)<latexit sha1_base64="03N+DkYPgNUd/J3uXdmRdwzpDpw=">AAACgHicbZHZSgMxFIYz41brVvXSm2ARWqh1pggWQSh642UFu0BbSybN2DCZheSMUIZ5Dt/LOx9GMN1r64HAn/87J8s5TiS4Asv6Nsyt7Z3dvcx+9uDw6Pgkd3rWVGEsKWvQUISy7RDFBA9YAzgI1o4kI74jWMvxnsa89cGk4mHwCqOI9XzyHnCXUwLa6uc+64UuDBmQfuKlRfyAu64kNJl7tpe+JV1nrtPSHFRWQGUCBiGoBfZXsNZpipNHvSksjyoti+e1i/QiTvu5vFW2JoE3hT0TeTSLej/3pU+hsc8CoIIo1bGtCHoJkcCpYGm2GysWEeqRd9bRMiA+U71k0sAUX2lngN1Q6hUAnrirFQnxlRr5js70CQzVOhub/7FODG61l/AgioEFdHqRGwsMIR5PAw+4ZBTESAtCJddvxXRI9ARAzyyrm2Cvf3lTNCtl2yrbL7f5WnXWjgy6QJeogGx0h2roGdVRA1H0Y+SNknFtmmbBvDHtaappzGrO0Z8w738BTN3EdA==</latexit><latexit sha1_base64="03N+DkYPgNUd/J3uXdmRdwzpDpw=">AAACgHicbZHZSgMxFIYz41brVvXSm2ARWqh1pggWQSh642UFu0BbSybN2DCZheSMUIZ5Dt/LOx9GMN1r64HAn/87J8s5TiS4Asv6Nsyt7Z3dvcx+9uDw6Pgkd3rWVGEsKWvQUISy7RDFBA9YAzgI1o4kI74jWMvxnsa89cGk4mHwCqOI9XzyHnCXUwLa6uc+64UuDBmQfuKlRfyAu64kNJl7tpe+JV1nrtPSHFRWQGUCBiGoBfZXsNZpipNHvSksjyoti+e1i/QiTvu5vFW2JoE3hT0TeTSLej/3pU+hsc8CoIIo1bGtCHoJkcCpYGm2GysWEeqRd9bRMiA+U71k0sAUX2lngN1Q6hUAnrirFQnxlRr5js70CQzVOhub/7FODG61l/AgioEFdHqRGwsMIR5PAw+4ZBTESAtCJddvxXRI9ARAzyyrm2Cvf3lTNCtl2yrbL7f5WnXWjgy6QJeogGx0h2roGdVRA1H0Y+SNknFtmmbBvDHtaappzGrO0Z8w738BTN3EdA==</latexit><latexit sha1_base64="03N+DkYPgNUd/J3uXdmRdwzpDpw=">AAACgHicbZHZSgMxFIYz41brVvXSm2ARWqh1pggWQSh642UFu0BbSybN2DCZheSMUIZ5Dt/LOx9GMN1r64HAn/87J8s5TiS4Asv6Nsyt7Z3dvcx+9uDw6Pgkd3rWVGEsKWvQUISy7RDFBA9YAzgI1o4kI74jWMvxnsa89cGk4mHwCqOI9XzyHnCXUwLa6uc+64UuDBmQfuKlRfyAu64kNJl7tpe+JV1nrtPSHFRWQGUCBiGoBfZXsNZpipNHvSksjyoti+e1i/QiTvu5vFW2JoE3hT0TeTSLej/3pU+hsc8CoIIo1bGtCHoJkcCpYGm2GysWEeqRd9bRMiA+U71k0sAUX2lngN1Q6hUAnrirFQnxlRr5js70CQzVOhub/7FODG61l/AgioEFdHqRGwsMIR5PAw+4ZBTESAtCJddvxXRI9ARAzyyrm2Cvf3lTNCtl2yrbL7f5WnXWjgy6QJeogGx0h2roGdVRA1H0Y+SNknFtmmbBvDHtaappzGrO0Z8w738BTN3EdA==</latexit><latexit sha1_base64="hP+6LrUf2d3tZaldqaQQvEKMXyw=">AAAB2XicbZDNSgMxFIXv1L86Vq1rN8EiuCozbnQpuHFZwbZCO5RM5k4bmskMyR2hDH0BF25EfC93vo3pz0JbDwQ+zknIvSculLQUBN9ebWd3b/+gfugfNfzjk9Nmo2fz0gjsilzl5jnmFpXU2CVJCp8LgzyLFfbj6f0i77+gsTLXTzQrMMr4WMtUCk7O6oyaraAdLMW2IVxDC9YaNb+GSS7KDDUJxa0dhEFBUcUNSaFw7g9LiwUXUz7GgUPNM7RRtRxzzi6dk7A0N+5oYkv394uKZ9bOstjdzDhN7Ga2MP/LBiWlt1EldVESarH6KC0Vo5wtdmaJNChIzRxwYaSblYkJN1yQa8Z3HYSbG29D77odBu3wMYA6nMMFXEEIN3AHD9CBLghI4BXevYn35n2suqp569LO4I+8zx84xIo4</latexit><latexit sha1_base64="BtOu40dmbNDpNla2CgWibObMJGw=">AAACdXicbZHLSgMxFIYz463WqqNbN8EitFDrTDe6EUQ3LivYC3RqyaSZNjRzITkjlGGew/dy58MIppdpa/VA4M//5Zwk53ix4Aps+8swd3b39g8Kh8Wj0vHJqXVWaqsokZS1aCQi2fWIYoKHrAUcBOvGkpHAE6zjTZ5mvPPOpOJR+ArTmPUDMgq5zykBbQ2sj2bFhTEDMkgnWRXfY9eXhKa550yyt9T1cp3VctDYAI05GEagVjjYwFpnGU4f9aayLlVbJ+e5q+NVnA2ssl2354H/CmcpymgZzYH1qavQJGAhUEGU6jl2DP2USOBUsKzoJorFhE7IiPW0DEnAVD+dNzDDV9oZYj+SeoWA5+5mRkoCpaaBp08GBMZqm83M/1gvAf+un/IwToCFdHGRnwgMEZ5NAw+5ZBTEVAtCJddvxXRM9ARAz6yom+Bsf/mvaDfqjl13XmxUQBfoElWQg27RA3pGTdRCFH0bZaNmXJumWTFvFu0yjWXfztGvMJ0fyPDDkg==</latexit><latexit sha1_base64="BtOu40dmbNDpNla2CgWibObMJGw=">AAACdXicbZHLSgMxFIYz463WqqNbN8EitFDrTDe6EUQ3LivYC3RqyaSZNjRzITkjlGGew/dy58MIppdpa/VA4M//5Zwk53ix4Aps+8swd3b39g8Kh8Wj0vHJqXVWaqsokZS1aCQi2fWIYoKHrAUcBOvGkpHAE6zjTZ5mvPPOpOJR+ArTmPUDMgq5zykBbQ2sj2bFhTEDMkgnWRXfY9eXhKa550yyt9T1cp3VctDYAI05GEagVjjYwFpnGU4f9aayLlVbJ+e5q+NVnA2ssl2354H/CmcpymgZzYH1qavQJGAhUEGU6jl2DP2USOBUsKzoJorFhE7IiPW0DEnAVD+dNzDDV9oZYj+SeoWA5+5mRkoCpaaBp08GBMZqm83M/1gvAf+un/IwToCFdHGRnwgMEZ5NAw+5ZBTEVAtCJddvxXRM9ARAz6yom+Bsf/mvaDfqjl13XmxUQBfoElWQg27RA3pGTdRCFH0bZaNmXJumWTFvFu0yjWXfztGvMJ0fyPDDkg==</latexit><latexit sha1_base64="2OOn6zPXhp7CH/LzqyoQWWX0IUQ=">AAACgHicbZHZSgMxFIYz41brVvXSm2ARWqh1pjcWQSh642UFu0BbSybNtGEyC8kZoQzzHL6Xdz6MYLrX1gOBP/93TpZznEhwBZb1bZg7u3v7B5nD7NHxyelZ7vyiqcJYUtagoQhl2yGKCR6wBnAQrB1JRnxHsJbjPU9464NJxcPgDcYR6/lkGHCXUwLa6uc+64UujBiQfuKlRfyIu64kNFl4tpe+J11nodPSAlTWQGUKBiGoJfbXsNZpipMnvSmsjiqtihe1y/QiTvu5vFW2poG3hT0XeTSPej/3pU+hsc8CoIIo1bGtCHoJkcCpYGm2GysWEeqRIetoGRCfqV4ybWCKb7QzwG4o9QoAT931ioT4So19R2f6BEZqk03M/1gnBrfaS3gQxcACOrvIjQWGEE+mgQdcMgpirAWhkuu3YjoiegKgZ5bVTbA3v7wtmpWybZXtVytfq87bkUFX6BoVkI3uUQ29oDpqIIp+jLxRMm5N0yyYd6Y9SzWNec0l+hPmwy9LncRw</latexit><latexit sha1_base64="03N+DkYPgNUd/J3uXdmRdwzpDpw=">AAACgHicbZHZSgMxFIYz41brVvXSm2ARWqh1pggWQSh642UFu0BbSybN2DCZheSMUIZ5Dt/LOx9GMN1r64HAn/87J8s5TiS4Asv6Nsyt7Z3dvcx+9uDw6Pgkd3rWVGEsKWvQUISy7RDFBA9YAzgI1o4kI74jWMvxnsa89cGk4mHwCqOI9XzyHnCXUwLa6uc+64UuDBmQfuKlRfyAu64kNJl7tpe+JV1nrtPSHFRWQGUCBiGoBfZXsNZpipNHvSksjyoti+e1i/QiTvu5vFW2JoE3hT0TeTSLej/3pU+hsc8CoIIo1bGtCHoJkcCpYGm2GysWEeqRd9bRMiA+U71k0sAUX2lngN1Q6hUAnrirFQnxlRr5js70CQzVOhub/7FODG61l/AgioEFdHqRGwsMIR5PAw+4ZBTESAtCJddvxXRI9ARAzyyrm2Cvf3lTNCtl2yrbL7f5WnXWjgy6QJeogGx0h2roGdVRA1H0Y+SNknFtmmbBvDHtaappzGrO0Z8w738BTN3EdA==</latexit><latexit sha1_base64="03N+DkYPgNUd/J3uXdmRdwzpDpw=">AAACgHicbZHZSgMxFIYz41brVvXSm2ARWqh1pggWQSh642UFu0BbSybN2DCZheSMUIZ5Dt/LOx9GMN1r64HAn/87J8s5TiS4Asv6Nsyt7Z3dvcx+9uDw6Pgkd3rWVGEsKWvQUISy7RDFBA9YAzgI1o4kI74jWMvxnsa89cGk4mHwCqOI9XzyHnCXUwLa6uc+64UuDBmQfuKlRfyAu64kNJl7tpe+JV1nrtPSHFRWQGUCBiGoBfZXsNZpipNHvSksjyoti+e1i/QiTvu5vFW2JoE3hT0TeTSLej/3pU+hsc8CoIIo1bGtCHoJkcCpYGm2GysWEeqRd9bRMiA+U71k0sAUX2lngN1Q6hUAnrirFQnxlRr5js70CQzVOhub/7FODG61l/AgioEFdHqRGwsMIR5PAw+4ZBTESAtCJddvxXRI9ARAzyyrm2Cvf3lTNCtl2yrbL7f5WnXWjgy6QJeogGx0h2roGdVRA1H0Y+SNknFtmmbBvDHtaappzGrO0Z8w738BTN3EdA==</latexit><latexit sha1_base64="03N+DkYPgNUd/J3uXdmRdwzpDpw=">AAACgHicbZHZSgMxFIYz41brVvXSm2ARWqh1pggWQSh642UFu0BbSybN2DCZheSMUIZ5Dt/LOx9GMN1r64HAn/87J8s5TiS4Asv6Nsyt7Z3dvcx+9uDw6Pgkd3rWVGEsKWvQUISy7RDFBA9YAzgI1o4kI74jWMvxnsa89cGk4mHwCqOI9XzyHnCXUwLa6uc+64UuDBmQfuKlRfyAu64kNJl7tpe+JV1nrtPSHFRWQGUCBiGoBfZXsNZpipNHvSksjyoti+e1i/QiTvu5vFW2JoE3hT0TeTSLej/3pU+hsc8CoIIo1bGtCHoJkcCpYGm2GysWEeqRd9bRMiA+U71k0sAUX2lngN1Q6hUAnrirFQnxlRr5js70CQzVOhub/7FODG61l/AgioEFdHqRGwsMIR5PAw+4ZBTESAtCJddvxXRI9ARAzyyrm2Cvf3lTNCtl2yrbL7f5WnXWjgy6QJeogGx0h2roGdVRA1H0Y+SNknFtmmbBvDHtaappzGrO0Z8w738BTN3EdA==</latexit><latexit sha1_base64="03N+DkYPgNUd/J3uXdmRdwzpDpw=">AAACgHicbZHZSgMxFIYz41brVvXSm2ARWqh1pggWQSh642UFu0BbSybN2DCZheSMUIZ5Dt/LOx9GMN1r64HAn/87J8s5TiS4Asv6Nsyt7Z3dvcx+9uDw6Pgkd3rWVGEsKWvQUISy7RDFBA9YAzgI1o4kI74jWMvxnsa89cGk4mHwCqOI9XzyHnCXUwLa6uc+64UuDBmQfuKlRfyAu64kNJl7tpe+JV1nrtPSHFRWQGUCBiGoBfZXsNZpipNHvSksjyoti+e1i/QiTvu5vFW2JoE3hT0TeTSLej/3pU+hsc8CoIIo1bGtCHoJkcCpYGm2GysWEeqRd9bRMiA+U71k0sAUX2lngN1Q6hUAnrirFQnxlRr5js70CQzVOhub/7FODG61l/AgioEFdHqRGwsMIR5PAw+4ZBTESAtCJddvxXRI9ARAzyyrm2Cvf3lTNCtl2yrbL7f5WnXWjgy6QJeogGx0h2roGdVRA1H0Y+SNknFtmmbBvDHtaappzGrO0Z8w738BTN3EdA==</latexit><latexit sha1_base64="03N+DkYPgNUd/J3uXdmRdwzpDpw=">AAACgHicbZHZSgMxFIYz41brVvXSm2ARWqh1pggWQSh642UFu0BbSybN2DCZheSMUIZ5Dt/LOx9GMN1r64HAn/87J8s5TiS4Asv6Nsyt7Z3dvcx+9uDw6Pgkd3rWVGEsKWvQUISy7RDFBA9YAzgI1o4kI74jWMvxnsa89cGk4mHwCqOI9XzyHnCXUwLa6uc+64UuDBmQfuKlRfyAu64kNJl7tpe+JV1nrtPSHFRWQGUCBiGoBfZXsNZpipNHvSksjyoti+e1i/QiTvu5vFW2JoE3hT0TeTSLej/3pU+hsc8CoIIo1bGtCHoJkcCpYGm2GysWEeqRd9bRMiA+U71k0sAUX2lngN1Q6hUAnrirFQnxlRr5js70CQzVOhub/7FODG61l/AgioEFdHqRGwsMIR5PAw+4ZBTESAtCJddvxXRI9ARAzyyrm2Cvf3lTNCtl2yrbL7f5WnXWjgy6QJeogGx0h2roGdVRA1H0Y+SNknFtmmbBvDHtaappzGrO0Z8w738BTN3EdA==</latexit><latexit sha1_base64="03N+DkYPgNUd/J3uXdmRdwzpDpw=">AAACgHicbZHZSgMxFIYz41brVvXSm2ARWqh1pggWQSh642UFu0BbSybN2DCZheSMUIZ5Dt/LOx9GMN1r64HAn/87J8s5TiS4Asv6Nsyt7Z3dvcx+9uDw6Pgkd3rWVGEsKWvQUISy7RDFBA9YAzgI1o4kI74jWMvxnsa89cGk4mHwCqOI9XzyHnCXUwLa6uc+64UuDBmQfuKlRfyAu64kNJl7tpe+JV1nrtPSHFRWQGUCBiGoBfZXsNZpipNHvSksjyoti+e1i/QiTvu5vFW2JoE3hT0TeTSLej/3pU+hsc8CoIIo1bGtCHoJkcCpYGm2GysWEeqRd9bRMiA+U71k0sAUX2lngN1Q6hUAnrirFQnxlRr5js70CQzVOhub/7FODG61l/AgioEFdHqRGwsMIR5PAw+4ZBTESAtCJddvxXRI9ARAzyyrm2Cvf3lTNCtl2yrbL7f5WnXWjgy6QJeogGx0h2roGdVRA1H0Y+SNknFtmmbBvDHtaappzGrO0Z8w738BTN3EdA==</latexit>
For code and data, see www.cs.cmu.edu/~tom/mlbook.html click on “Software and Data” For code and data, see
www.cs.cmu.edu/~tom/mlbook.html click on “Software and Data”
19 Prior probabilities
For code and data, see www.cs.cmu.edu/~tom/mlbook.html click on “Software and Data”
19 Prior probabilities
Can group over sub-classes to generate
prior counts
(hallucinate training examples)
Performance can be very good• Even when taking half of the email
• Assumption doesn’t hurt the particular problem?
• Redundancy? • Leads less examples to train?
Converges faster to asymptotic performance? (Ng and Jordan)
What if we have continuous Xi ?
Eg., image classification: Xi is real-valued ith pixel
What if we have continuous Xi ?Eg., image classification: Xi is real-valued ith pixel
Naïve Bayes requires P(Xi | Y=yk), but Xi is real (continuous)
Common approach: assume P(Xi | Y=yk) follows a Normal (Gaussian) distribution
GNB Example: Classify a person’s cognitive state, based on brain image
• reading a sentence or viewing a picture? • reading the word describing a “Tool” or “Building”? • answering the question, or getting confused?
time
Stimuli for the study:
ant
or 60 distinct exemplars, presented 6 times each
What if we have continuous Xi ?Gaussian Naïve Bayes (GNB): assume
Sometimes assume variance • is independent of Y (i.e., σi), • or independent of Xi (i.e., σk) • or both (i.e., σ)
Gaussian Naïve Bayes Algorithm – continuous Xi (but still discrete Y)
• Train Naïve Bayes (examples) for each value yk
estimate* for each attribute Xi estimate
• class conditional mean , variance
• Classify (Xnew)
* probabilities must sum to 1, so need estimate only n-1 parameters...
Estimating Parameters: Y discrete, Xi continuous
Maximum likelihood estimates: jth training example
δ()=1 if (Yj=yk) else 0
ith feature kth class
Mean activations over all training examples for Y=“bottle”
Y is the mental state (reading “house” or “bottle”) Xi are the voxel activities,
this is a plot of the µ’s defining P(Xi | Y=“bottle”)
fMRI activation
high
below average
average
Classification task: is person viewing a “tool” or “building”?
p4 p8 p6 p11 p5 p7 p10 p9 p2 p12 p3 p10
0.10.20.30.40.50.60.70.80.9
1
Participants
Cla
ssifi
catio
n ac
cura
cy
statistically significant
p<0.05
Cla
ssifi
catio
n ac
cura
cy
Where is information encoded in the brain?
Accuracies of cubical 27-voxel
classifiers centered at
each significant voxel
[0.7-0.8]
Questions to think about:
• How can we extend Naïve Bayes if just 2 of the Xi‘s are dependent?
• What error will the classifier achieve if Naïve Bayes assumption is satisfied and we have infinite training data?
• What does the decision surface of a Naïve Bayes classifier look like?
• Can you use Naïve Bayes for a combination of discrete and real-valued Xi?
We covered: • Bayes classifiers to learn P(Y|X) • MLE and MAP estimates for parameters of P • Conditional independence • Naïve Bayes ! make Bayesian learning practical • Text classification • Naïve Bayes and continuous variables Xi:
• Gaussian Naïve Bayes classifier
Next: • Learn P(Y|X) directly
• Logistic regression, Regularization, Gradient ascent • Naïve Bayes or Logistic Regression?
• Generative vs. Discriminative classifiers
Gaussian Naïve Bayes – Big Picture
Example: Y= PlayBasketball (boolean), X1=Height, X2=MLgrade
assume P(Y=1) = 0.5
Logistic RegressionIdea: • Naïve Bayes allows computing P(Y|X) by
learning P(Y) and P(X|Y)
• Why not learn P(Y|X) directly?