Naive Bayes Classification
Guido Sanguinetti
Informatics 2B — Learning and Data, Lecture 6, 10 February 2012
Informatics 2B: Learning and Data — Lecture 6: Naive Bayes Classification
Back to the fish
  x    P̂(x | M)   P̂(x | F)
  4     0.00       0.02
 ...     ...        ...
 18     0.01       0.00
 19     0.01       0.00
 20     0.00       0.00
1. What is the value of P(M | X = 4)?
2. What is the value of P(F | X = 18)?
3. You observe data point x = 20. To which class should it be assigned?
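One way to check the answers, assuming equal priors P(M) = P(F) = 0.5 (the priors are an assumption here; the table only gives the likelihood estimates):

```python
# Likelihoods P(x|M), P(x|F) read off the histogram table above.
likelihood = {
    4:  (0.00, 0.02),
    18: (0.01, 0.00),
    19: (0.01, 0.00),
    20: (0.00, 0.00),
}

def posterior_M(x, prior_M=0.5):
    """P(M | X = x) by Bayes' theorem; None if P(x) = 0."""
    pM, pF = likelihood[x]
    evidence = pM * prior_M + pF * (1 - prior_M)
    if evidence == 0:
        return None  # x was never observed in either class: posterior undefined
    return pM * prior_M / evidence

print(posterior_M(4))   # 0.0  -> P(M|X=4) = 0, so P(F|X=4) = 1
print(posterior_M(18))  # 1.0  -> P(F|X=18) = 0
print(posterior_M(20))  # None -> both likelihoods are 0: no basis to classify
```

The x = 20 case is the interesting one: both estimated likelihoods are zero, so the data give no grounds for assigning either class.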
Overview
Today’s lecture
The curse of dimensionality
Naive Bayes approximation
Introduction to text classification
Recap: Bayes’ Theorem and Pattern Recognition
Let C = c_1, . . . , c_K denote the class and X = x denote the input feature vector
Classify x as the class with the maximum posterior probability:
c* = arg max_{c_k} P(c_k | x)
Re-express this conditional probability using Bayes’ theorem:
P(c_k | x) = P(x | c_k) P(c_k) / P(x)

where P(c_k | x) is the posterior, P(x | c_k) the likelihood, and P(c_k) the prior.
The curse of dimensionality
Fish example: we constructed a histogram of lengths (m bins)
Imagine the input is (length, weight): we need a 2-d histogram (m × m bins)

And if we have a third feature, such as circumference: m^3 bins

The space of inputs grows exponentially with the number of dimensions. Bellman termed this the curse of dimensionality
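A quick illustration of the growth, taking m = 10 bins per feature (the value of m is illustrative):

```python
# Number of histogram bins needed as the dimension d grows,
# with m bins per feature.
m = 10
for d in range(1, 6):
    print(f"d = {d}: m**d = {m ** d} bins")
# With a fixed training set, most bins are empty once d grows,
# so relative-frequency estimates become unusable.
```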
Weather Example
Outlook    Temperature  Humidity  Windy  Play
sunny      hot          high      false  NO
sunny      hot          high      true   NO
overcast   hot          high      false  YES
rainy      mild         high      false  YES
rainy      cool         normal    false  YES
rainy      cool         normal    true   NO
overcast   cool         normal    true   YES
sunny      mild         high      false  NO
sunny      cool         normal    false  YES
rainy      mild         normal    false  YES
sunny      mild         normal    true   YES
overcast   mild         high      true   YES
overcast   hot          normal    false  YES
rainy      mild         high      true   NO
Weather data summary
Counts:

  Outlook    Y  N    Temperature  Y  N    Humidity  Y  N    Windy  Y  N    Play  Y  N
  sunny      2  3    hot          2  2    high      3  4    false  6  2          9  5
  overcast   4  0    mild         4  2    normal    6  1    true   3  3
  rainy      3  2    cool         3  1

Relative frequencies:

  Outlook    Y    N     Temperature  Y    N     Humidity  Y    N     Windy  Y    N     Play  Y     N
  sunny      2/9  3/5   hot          2/9  2/5   high      3/9  4/5   false  6/9  2/5         9/14  5/14
  overcast   4/9  0/5   mild         4/9  2/5   normal    6/9  1/5   true   3/9  3/5
  rainy      3/9  2/5   cool         3/9  1/5

We are given the following test example:

       Outlook  Temp.  Humidity  Windy  Play
  x1   sunny    cool   high      true   ?
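The count and frequency tables can be recomputed directly from the 14 training rows; a sketch (the function and variable names are my own):

```python
from collections import Counter

# The 14 training rows from the weather table:
# (outlook, temperature, humidity, windy, play)
data = [
    ("sunny", "hot", "high", False, "N"),  ("sunny", "hot", "high", True, "N"),
    ("overcast", "hot", "high", False, "Y"), ("rainy", "mild", "high", False, "Y"),
    ("rainy", "cool", "normal", False, "Y"), ("rainy", "cool", "normal", True, "N"),
    ("overcast", "cool", "normal", True, "Y"), ("sunny", "mild", "high", False, "N"),
    ("sunny", "cool", "normal", False, "Y"), ("rainy", "mild", "normal", False, "Y"),
    ("sunny", "mild", "normal", True, "Y"), ("overcast", "mild", "high", True, "Y"),
    ("overcast", "hot", "normal", False, "Y"), ("rainy", "mild", "high", True, "N"),
]

features = ["outlook", "temperature", "humidity", "windy"]
class_count = Counter(row[-1] for row in data)   # {'Y': 9, 'N': 5}

def rel_freq(feature, value, cls):
    """P(feature = value | play = cls) as a relative frequency."""
    i = features.index(feature)
    matches = sum(1 for row in data if row[i] == value and row[-1] == cls)
    return matches / class_count[cls]

print(rel_freq("outlook", "sunny", "Y"))   # 2/9
print(rel_freq("windy", True, "N"))        # 3/5
```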
![Page 13: Naive Bayes Classification - The University of Edinburgh · Overview Today’s lecture The curse of dimensionality Naive Bayes approximation Introduction to text classi cation Informatics](https://reader030.vdocuments.us/reader030/viewer/2022040701/5d5eb29188c993a0128bacdf/html5/thumbnails/13.jpg)
Naive Bayes
Write the likelihood as a joint distribution of the d components of x:

P(x | c_k) = P(x_1, x_2, . . . , x_d | c_k)

Naive Bayes: assume the components of the input feature vector are independent given the class:

P(x_1, x_2, . . . , x_d | c_k) ≈ P(x_1 | c_k) P(x_2 | c_k) . . . P(x_d | c_k) = ∏_{i=1}^d P(x_i | c_k)

Weather example:

P(O, T, H, W | Play) ≈ P(O | Play) · P(T | Play) · P(H | Play) · P(W | Play)
![Page 16: Naive Bayes Classification - The University of Edinburgh · Overview Today’s lecture The curse of dimensionality Naive Bayes approximation Introduction to text classi cation Informatics](https://reader030.vdocuments.us/reader030/viewer/2022040701/5d5eb29188c993a0128bacdf/html5/thumbnails/16.jpg)
Naive Bayes Approximation
Take d one-dimensional distributions rather than a single d-dimensional distribution

If each dimension can take m different values, this results in m × d relative frequencies rather than m^d
Re-express Bayes’ theorem:
P(c_k | x) = P(x | c_k) P(c_k) / P(x)
           = [∏_{i=1}^d P(x_i | c_k)] P(c_k) / [∏_{i=1}^d P(x_i)]
           ∝ P(c_k) ∏_{i=1}^d P(x_i | c_k)

c* = arg max_c P(c | x)
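Putting the pieces together, a minimal sketch of this decision rule, using the weather example's priors and conditionals (the names and dictionary layout are my own):

```python
from math import prod

# Priors and per-feature conditionals P(x_i | c), estimated earlier
# from the weather data by relative frequencies.
priors = {"Y": 9 / 14, "N": 5 / 14}
conditionals = {
    "Y": {"outlook=sunny": 2/9, "temp=cool": 3/9,
          "humidity=high": 3/9, "windy=true": 3/9},
    "N": {"outlook=sunny": 3/5, "temp=cool": 1/5,
          "humidity=high": 4/5, "windy=true": 3/5},
}

def classify(x):
    """arg max_c P(c) * prod_i P(x_i | c); P(x) is the same for every class."""
    scores = {c: priors[c] * prod(conditionals[c][xi] for xi in x)
              for c in priors}
    return max(scores, key=scores.get), scores

x1 = ["outlook=sunny", "temp=cool", "humidity=high", "windy=true"]
label, scores = classify(x1)
print(label)  # 'N'
```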
![Page 19: Naive Bayes Classification - The University of Edinburgh · Overview Today’s lecture The curse of dimensionality Naive Bayes approximation Introduction to text classi cation Informatics](https://reader030.vdocuments.us/reader030/viewer/2022040701/5d5eb29188c993a0128bacdf/html5/thumbnails/19.jpg)
Naive Bayes: Weather Example
We need much more training data to estimate P(O, T, H, W | Play) directly using relative frequencies (since most combinations of the input variables are not observed)

The training data does let us estimate P(O | Play), P(T | Play), P(H | Play), P(W | Play) using relative frequencies

For the test data:

       Outlook  Temp.  Humidity  Windy  Play
  x1   sunny    cool   high      true   ?

P(O = s | play = Y) = 2/9    P(O = s | play = N) = 3/5
P(T = c | play = Y) = 3/9    P(T = c | play = N) = 1/5
P(H = h | play = Y) = 3/9    P(H = h | play = N) = 4/5
P(W = t | play = Y) = 3/9    P(W = t | play = N) = 3/5
Naive Bayes Classification: Weather Example
P(play = Y | x) ∝ P(play = Y) · P(O = s | play = Y) · P(T = c | play = Y)
                  · P(H = h | play = Y) · P(W = t | play = Y)
                = 9/14 · (2/9 · 3/9 · 3/9 · 3/9) = 0.0053

P(play = N | x) ∝ P(play = N) · P(O = s | play = N) · P(T = c | play = N)
                  · P(H = h | play = N) · P(W = t | play = N)
                = 5/14 · (3/5 · 1/5 · 4/5 · 3/5) = 0.0206

P(play = Y | x) < P(play = N | x), so classify x as play = N
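The arithmetic can be checked directly:

```python
from math import prod

# The two unnormalised posteriors for the weather test example x1.
score_Y = (9 / 14) * prod([2/9, 3/9, 3/9, 3/9])
score_N = (5 / 14) * prod([3/5, 1/5, 4/5, 3/5])

print(round(score_Y, 4), round(score_N, 4))  # 0.0053 0.0206
assert score_N > score_Y                     # so classify x as play = N
```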
Naive Bayes for Text Classification
Text Classification
Identifying Spam
Spam?
I got your contact information from your country's information directory during my desperate search for someone who can assist me secretly and confidentially in relocating and managing some family fortunes.
Identifying Spam
Spam?
Dear Dr. Steve Renals,
The proof for your article, Combining Spectral Representations for Large-Vocabulary Continuous Speech Recognition, is ready for your review. Please access your proof via the user ID and password provided below. Kindly log in to the website within 48 HOURS of receiving this message so that we may expedite the publication process.
Identifying Spam
Spam?
Congratulations to you as we bring to your notice, the results of the First Category draws of THE HOLLAND CASINO LOTTO PROMO INT. We are happy to inform you that you have emerged a winner under the First Category, which is part of our promotional draws.
Identifying Spam
Question
How can we identify an email as spam automatically?
Text classification: classify email messages as spam or non-spam (ham), based on the words they contain
Text Classification using Bayes Theorem
Document D, with class c_k

Classify D as the class with the highest posterior probability:

P(c_k | D) = P(D | c_k) P(c_k) / P(D) ∝ P(D | c_k) P(c_k)

How do we represent D? How do we estimate P(D | c_k)?

Bernoulli document model: a document is represented by a binary feature vector, whose elements indicate the absence or presence of the corresponding word in the document

Multinomial document model: a document is represented by an integer feature vector, whose elements indicate the frequency of the corresponding word in the document
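A tiny sketch of the two representations (the vocabulary and document here are invented for illustration):

```python
# Toy vocabulary and document (both made up for the example).
vocab = ["winner", "casino", "proof", "article", "password"]
doc = "congratulations winner winner casino promo".split()

bernoulli = [1 if w in doc else 0 for w in vocab]   # presence/absence
multinomial = [doc.count(w) for w in vocab]         # word frequencies

print(bernoulli)    # [1, 1, 0, 0, 0]
print(multinomial)  # [2, 1, 0, 0, 0]
```

Note that words outside the vocabulary ("congratulations", "promo") are simply ignored in both representations.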
Summary
Naive Bayes approximation
Example: classifying multidimensional data using Naive Bayes
Next lecture: Text classification using Naive Bayes