Post on 20-Dec-2015
![Page 1: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/1.jpg)
1
Data Mining with Naïve Bayesian Methods
Instructor: Qiang Yang
Hong Kong University of Science and Technology
Thanks: Dan Weld, Eibe Frank
![Page 2: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/2.jpg)
2
How to Predict?
From a new day’s data we wish to predict the decision
Applications: text analysis, spam email classification, gene analysis, network intrusion detection
![Page 3: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/3.jpg)
3
Naïve Bayesian Models
Two assumptions: attributes are equally important and statistically independent (given the class value)
This means that knowledge about the value of a particular attribute doesn’t tell us anything about the value of another attribute (if the class is known)
Although based on assumptions that are almost never correct, this scheme works well in practice!
![Page 4: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/4.jpg)
4
Why Naïve?
Assume the attributes are independent, given the class. What does that mean?
[Figure: play shown above the attributes outlook, temp, humidity, windy]
Pr(outlook=sunny | windy=true, play=yes) = Pr(outlook=sunny | play=yes)
![Page 5: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/5.jpg)
5
Weather data set
Outlook Windy Play
overcast FALSE yes
rainy FALSE yes
rainy FALSE yes
overcast TRUE yes
sunny FALSE yes
rainy FALSE yes
sunny TRUE yes
overcast TRUE yes
overcast FALSE yes
![Page 6: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/6.jpg)
6
Is the assumption satisfied?
Counts: #(play=yes) = 9, #(sunny, yes) = 2, #(windy, yes) = 3, #(sunny, windy, yes) = 1
Pr(outlook=sunny | windy=true, play=yes) = 1/3
Pr(outlook=sunny | play=yes) = 2/9
Pr(windy=true | outlook=sunny, play=yes) = 1/2
Pr(windy=true | play=yes) = 3/9
Thus, the assumption is NOT satisfied: 1/3 ≠ 2/9 and 1/2 ≠ 3/9. But we can tolerate some errors (see later slides).
Outlook Windy Play
overcast FALSE yes
rainy FALSE yes
rainy FALSE yes
overcast TRUE yes
sunny FALSE yes
rainy FALSE yes
sunny TRUE yes
overcast TRUE yes
overcast FALSE yes
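The counts on this slide can be re-derived from the nine play=yes rows. The following is a minimal sketch; the helper `cond_prob` and the tuple layout are illustrative choices, not part of the original slides.

```python
from fractions import Fraction

# The nine play=yes rows of the table above, as (outlook, windy).
rows = [
    ("overcast", False), ("rainy", False), ("rainy", False),
    ("overcast", True), ("sunny", False), ("rainy", False),
    ("sunny", True), ("overcast", True), ("overcast", False),
]

def cond_prob(event, given=lambda r: True):
    """Relative frequency of `event` among the rows that satisfy `given`."""
    selected = [r for r in rows if given(r)]
    return Fraction(sum(1 for r in selected if event(r)), len(selected))

def is_sunny(r):
    return r[0] == "sunny"

def is_windy(r):
    return r[1]

p_sunny = cond_prob(is_sunny)                        # Pr(sunny | yes)
p_sunny_given_windy = cond_prob(is_sunny, is_windy)  # Pr(sunny | windy, yes)
p_windy = cond_prob(is_windy)                        # Pr(windy | yes)
p_windy_given_sunny = cond_prob(is_windy, is_sunny)  # Pr(windy | sunny, yes)

# Independence given the class would require each pair to be equal.
print(p_sunny, p_sunny_given_windy)
print(p_windy, p_windy_given_sunny)
```

Each pair differs, which is the violation stated on the slide. `Fraction` reduces 3/9 to 1/3 when printing.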
![Page 7: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/7.jpg)
7
Probabilities for the weather data
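| Outlook | Yes | No | Temperature | Yes | No | Humidity | Yes | No | Windy | Yes | No | Play: Yes | Play: No |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sunny | 2 | 3 | Hot | 2 | 2 | High | 3 | 4 | False | 6 | 2 | 9 | 5 |
| Overcast | 4 | 0 | Mild | 4 | 2 | Normal | 6 | 1 | True | 3 | 3 | | |
| Rainy | 3 | 2 | Cool | 3 | 1 | | | | | | | | |
| Sunny | 2/9 | 3/5 | Hot | 2/9 | 2/5 | High | 3/9 | 4/5 | False | 6/9 | 2/5 | 9/14 | 5/14 |
| Overcast | 4/9 | 0/5 | Mild | 4/9 | 2/5 | Normal | 6/9 | 1/5 | True | 3/9 | 3/5 | | |
| Rainy | 3/9 | 2/5 | Cool | 3/9 | 1/5 | | | | | | | | |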
A new day:
Outlook Temp. Humidity Windy Play
Sunny Cool High True ?
Likelihood of the two classes:
For “yes” = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
For “no” = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206
Conversion into a probability by normalization:
P(“yes”) = 0.0053 / (0.0053 + 0.0206) = 0.205
P(“no”) = 0.0206 / (0.0053 + 0.0206) = 0.795
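The calculation on this slide can be sketched in a few lines. The dictionary layout and the function names `likelihoods` and `posterior` are illustrative assumptions; the fractions are the relative frequencies from the table on this slide.

```python
from fractions import Fraction as F

# Pr[attribute = value | class], taken from the weather-data table.
cond = {
    "yes": {"outlook": {"sunny": F(2, 9), "overcast": F(4, 9), "rainy": F(3, 9)},
            "temp": {"hot": F(2, 9), "mild": F(4, 9), "cool": F(3, 9)},
            "humidity": {"high": F(3, 9), "normal": F(6, 9)},
            "windy": {False: F(6, 9), True: F(3, 9)}},
    "no": {"outlook": {"sunny": F(3, 5), "overcast": F(0, 5), "rainy": F(2, 5)},
           "temp": {"hot": F(2, 5), "mild": F(2, 5), "cool": F(1, 5)},
           "humidity": {"high": F(4, 5), "normal": F(1, 5)},
           "windy": {False: F(2, 5), True: F(3, 5)}},
}
prior = {"yes": F(9, 14), "no": F(5, 14)}

def likelihoods(instance):
    """Class prior times the product of the per-attribute conditionals."""
    out = {}
    for cls in prior:
        value = prior[cls]
        for attr, val in instance.items():
            value *= cond[cls][attr][val]
        out[cls] = value
    return out

def posterior(instance):
    """Normalize the likelihoods so that they sum to 1."""
    like = likelihoods(instance)
    total = sum(like.values())
    return {cls: v / total for cls, v in like.items()}

new_day = {"outlook": "sunny", "temp": "cool", "humidity": "high", "windy": True}
like = likelihoods(new_day)
post = posterior(new_day)
print({c: round(float(v), 4) for c, v in like.items()})
print({c: round(float(p), 3) for c, p in post.items()})
```

Exact fractions are used so that rounding happens only when printing; the rounded values agree with the 0.0053, 0.0206, 0.205 and 0.795 shown above.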
![Page 8: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/8.jpg)
8
Bayes’ rule
Probability of event H given evidence E:
$$\Pr[H \mid E] = \frac{\Pr[E \mid H]\,\Pr[H]}{\Pr[E]}$$
A priori probability of H, $\Pr[H]$: probability of event before evidence has been seen
A posteriori probability of H, $\Pr[H \mid E]$: probability of event after evidence has been seen
![Page 9: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/9.jpg)
9
Naïve Bayes for classification
Classification learning: what’s the probability of the class given an instance?
Evidence E = an instance
Event H = class value for instance (Play=yes, Play=no)
Naïve Bayes assumption: evidence can be split into independent parts (i.e. attributes of the instance are independent, given the class)
$$\Pr[H \mid E] = \frac{\Pr[E_1 \mid H]\,\Pr[E_2 \mid H]\cdots\Pr[E_n \mid H]\,\Pr[H]}{\Pr[E]}$$
![Page 10: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/10.jpg)
10
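The weather data example
Evidence E:
Outlook Temp. Humidity Windy Play
Sunny Cool High True ?
Probability for class “yes”:
$$\begin{aligned}
\Pr[\text{yes} \mid E] &= \frac{\Pr[\text{Outlook}=\text{Sunny} \mid \text{yes}] \times \Pr[\text{Temperature}=\text{Cool} \mid \text{yes}] \times \Pr[\text{Humidity}=\text{High} \mid \text{yes}] \times \Pr[\text{Windy}=\text{True} \mid \text{yes}] \times \Pr[\text{yes}]}{\Pr[E]} \\
&= \frac{\frac{2}{9} \times \frac{3}{9} \times \frac{3}{9} \times \frac{3}{9} \times \frac{9}{14}}{\Pr[E]}
\end{aligned}$$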
![Page 11: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/11.jpg)
11
The “zero-frequency problem”
What if an attribute value doesn’t occur with every class value (e.g. suppose “Humidity = High” never occurred with class “yes”)?
Probability will be zero: $\Pr[\text{Humidity}=\text{High} \mid \text{yes}] = 0$
A posteriori probability will also be zero: $\Pr[\text{yes} \mid E] = 0$ (no matter how likely the other values are!)
Remedy: add 1 to the count for every attribute value-class combination (Laplace estimator)
Result: probabilities will never be zero! (also: stabilizes probability estimates)
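In the weather data the zero actually shows up for Outlook = Overcast with class “no” (0/5 in the earlier table). A minimal sketch of the Laplace estimator on those counts follows; the function name `laplace_estimates` is an illustrative choice.

```python
from fractions import Fraction

def laplace_estimates(counts):
    """Add 1 to every value's count; the denominator grows by the number of values."""
    total = sum(counts.values()) + len(counts)
    return {value: Fraction(n + 1, total) for value, n in counts.items()}

# Outlook counts for class "no" in the weather data; Overcast never occurs.
outlook_no = {"sunny": 3, "overcast": 0, "rainy": 2}

raw = {v: Fraction(n, sum(outlook_no.values())) for v, n in outlook_no.items()}
smoothed = laplace_estimates(outlook_no)

print(raw["overcast"], smoothed["overcast"])
```

The smoothed estimates still sum to 1, and the zero for Overcast becomes 1/8.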
![Page 12: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/12.jpg)
12
Modified probability estimates
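In some cases adding a constant different from 1 might be more appropriate
Example: attribute outlook for class yes, with a constant $\mu$ spread equally over the three values
Sunny: $\frac{2 + \mu/3}{9 + \mu}$, Overcast: $\frac{4 + \mu/3}{9 + \mu}$, Rainy: $\frac{3 + \mu/3}{9 + \mu}$
Weights don’t need to be equal (as long as they sum to 1)
Sunny: $\frac{2 + \mu p_1}{9 + \mu}$, Overcast: $\frac{4 + \mu p_2}{9 + \mu}$, Rainy: $\frac{3 + \mu p_3}{9 + \mu}$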
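The weighted estimate can be sketched as a single function of the form (count + μ·p) / (total + μ). The name `m_estimate` and the unequal weights in the second call are hypothetical, chosen only for illustration; the counts 2, 4, 3 are the outlook counts for class yes.

```python
from fractions import Fraction

def m_estimate(counts, mu, weights=None):
    """(count + mu * p) / (total + mu) per value; equal weights by default."""
    if weights is None:
        weights = {v: Fraction(1, len(counts)) for v in counts}
    if sum(weights.values()) != 1:
        raise ValueError("weights must sum to 1")
    total = sum(counts.values())
    return {v: (n + mu * weights[v]) / (total + mu) for v, n in counts.items()}

outlook_yes = {"sunny": 2, "overcast": 4, "rainy": 3}

# mu = 3 with equal weights adds exactly 1 to each count (the Laplace estimator).
equal = m_estimate(outlook_yes, mu=3)

# Hypothetical unequal weights p1, p2, p3 (an assumption, not from the slides).
weights = {"sunny": Fraction(1, 2), "overcast": Fraction(1, 4), "rainy": Fraction(1, 4)}
weighted = m_estimate(outlook_yes, mu=4, weights=weights)

print(equal)
print(weighted)
```

Choosing μ equal to the number of attribute values with equal weights recovers the add-one remedy, so the Laplace estimator is a special case of this form.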
![Page 13: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/13.jpg)
13
Missing values
Training: instance is not included in frequency count for attribute value-class combination
Classification: attribute will be omitted from calculation
Example:
Outlook Temp. Humidity Windy Play
? Cool High True ?
Likelihood of “yes” = 3/9 × 3/9 × 3/9 × 9/14 = 0.0238
Likelihood of “no” = 1/5 × 4/5 × 3/5 × 5/14 = 0.0343
P(“yes”) = 0.0238 / (0.0238 + 0.0343) = 41%
P(“no”) = 0.0343 / (0.0238 + 0.0343) = 59%
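At classification time, omitting a missing attribute simply means skipping its factor. A minimal sketch follows, assuming `None` marks a missing value (a convention chosen here, not taken from the slides); the table holds only the conditionals needed for this example day.

```python
from fractions import Fraction as F

# Pr[attribute = value | class] for the values used in this example only.
cond = {
    "yes": {("outlook", "sunny"): F(2, 9), ("temp", "cool"): F(3, 9),
            ("humidity", "high"): F(3, 9), ("windy", True): F(3, 9)},
    "no": {("outlook", "sunny"): F(3, 5), ("temp", "cool"): F(1, 5),
           ("humidity", "high"): F(4, 5), ("windy", True): F(3, 5)},
}
prior = {"yes": F(9, 14), "no": F(5, 14)}

def posterior(instance):
    """Naive Bayes posterior; attributes whose value is None are left out."""
    like = {}
    for cls in prior:
        value = prior[cls]
        for attr, val in instance.items():
            if val is None:  # missing value: omit this attribute's factor
                continue
            value *= cond[cls][(attr, val)]
        like[cls] = value
    total = sum(like.values())
    return {cls: float(v / total) for cls, v in like.items()}

day = {"outlook": None, "temp": "cool", "humidity": "high", "windy": True}
post = posterior(day)
print({c: round(p, 2) for c, p in post.items()})
```

Because the same attribute is skipped for every class, the remaining likelihoods stay comparable and normalization works as before.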
![Page 14: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/14.jpg)
14
Dealing with numeric attributes
Usual assumption: attributes have a normal or Gaussian probability distribution (given the class)
The probability density function for the normal distribution is defined by two parameters:
The sample mean $\mu$:
$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$
The standard deviation $\sigma$:
$$\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \mu)^2}$$
The density function $f(x)$:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
![Page 15: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/15.jpg)
15
Statistics for the weather data
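| Outlook | Yes | No | Temperature | Yes | No | Humidity | Yes | No | Windy | Yes | No | Play: Yes | Play: No |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sunny | 2 | 3 | | 83 | 85 | | 86 | 85 | False | 6 | 2 | 9 | 5 |
| Overcast | 4 | 0 | | 70 | 80 | | 96 | 90 | True | 3 | 3 | | |
| Rainy | 3 | 2 | | 68 | 65 | | 80 | 70 | | | | | |
| | | | | … | … | | … | … | | | | | |
| Sunny | 2/9 | 3/5 | mean | 73 | 74.6 | mean | 79.1 | 86.2 | False | 6/9 | 2/5 | 9/14 | 5/14 |
| Overcast | 4/9 | 0/5 | std dev | 6.2 | 7.9 | std dev | 10.2 | 9.7 | True | 3/9 | 3/5 | | |
| Rainy | 3/9 | 2/5 | | | | | | | | | | | |

Example density value:
$$f(\text{temperature}=66 \mid \text{yes}) = \frac{1}{\sqrt{2\pi}\cdot 6.2}\, e^{-\frac{(66-73)^2}{2\cdot 6.2^2}} = 0.0340$$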
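The example density value can be checked directly from the normal density formula. A minimal sketch, with `normal_density` as an illustrative helper name and the mean 73 and standard deviation 6.2 taken from the table:

```python
import math

def normal_density(x, mean, std_dev):
    """Density of the normal distribution with the given mean and standard deviation at x."""
    z = (x - mean) / std_dev
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * std_dev)

# Temperature = 66 for class "yes": mean 73, standard deviation 6.2.
density = normal_density(66, 73, 6.2)
print(round(density, 4))
```

Note that this is a density, not a probability, so values like this are only meaningful relative to the densities computed for the other classes.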
![Page 16: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/16.jpg)
16
Classifying a new day
A new day:
Outlook Temp. Humidity Windy Play
Sunny 66 90 true ?
Missing values during training: not included in calculation of mean and standard deviation
Likelihood of “yes” = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 = 0.000036
Likelihood of “no” = 3/5 × 0.0279 × 0.0380 × 3/5 × 5/14 = 0.000136
P(“yes”) = 0.000036 / (0.000036 + 0.000136) = 20.9%
P(“no”) = 0.000136 / (0.000036 + 0.000136) = 79.1%
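Nominal and numeric attributes combine in the same product: relative frequencies for the former, density values for the latter. The sketch below uses the statistics from the previous slide; the dictionary layout and function names are illustrative assumptions.

```python
import math

def normal_density(x, mean, std_dev):
    """Density of the normal distribution with the given mean and standard deviation at x."""
    z = (x - mean) / std_dev
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * std_dev)

# Per-class statistics from the weather data: nominal frequencies for
# Outlook=Sunny and Windy=True, (mean, std dev) for the numeric attributes.
stats = {
    "yes": {"sunny": 2 / 9, "windy": 3 / 9, "prior": 9 / 14,
            "temp": (73.0, 6.2), "humidity": (79.1, 10.2)},
    "no": {"sunny": 3 / 5, "windy": 3 / 5, "prior": 5 / 14,
           "temp": (74.6, 7.9), "humidity": (86.2, 9.7)},
}

def likelihood(cls, temp, humidity):
    """Likelihood of a sunny, windy day with the given temperature and humidity."""
    s = stats[cls]
    return (s["sunny"]
            * normal_density(temp, *s["temp"])
            * normal_density(humidity, *s["humidity"])
            * s["windy"]
            * s["prior"])

like = {cls: likelihood(cls, temp=66, humidity=90) for cls in stats}
total = sum(like.values())
post = {cls: value / total for cls, value in like.items()}
print({c: round(p, 3) for c, p in post.items()})
```

Without intermediate rounding the result is roughly 21% for “yes” and 79% for “no”; small differences in the last digit relative to the slide come from the slide rounding each factor first.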
![Page 17: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/17.jpg)
17
Probability densities
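Relationship between probability and density:
$$\Pr\left[c - \frac{\varepsilon}{2} < x < c + \frac{\varepsilon}{2}\right] \approx \varepsilon \times f(c)$$
But: this doesn’t change calculation of a posteriori probabilities because $\varepsilon$ cancels out
Exact relationship:
$$\Pr[a \le x \le b] = \int_a^b f(t)\,dt$$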
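The two relationships can be compared numerically for the temperature example. This sketch assumes an interval width of 1 degree and uses the closed-form normal CDF via `math.erf` for the exact integral; both helper names are illustrative.

```python
import math

def normal_density(x, mean, std_dev):
    """Density of the normal distribution at x."""
    z = (x - mean) / std_dev
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * std_dev)

def normal_cdf(x, mean, std_dev):
    """Pr[X <= x] for a normally distributed X, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std_dev * math.sqrt(2))))

mean, std_dev = 73.0, 6.2   # temperature given "yes"
c, width = 66.0, 1.0        # interval of assumed width 1 centred on c

# Exact: integral of the density over [c - width/2, c + width/2].
exact = normal_cdf(c + width / 2, mean, std_dev) - normal_cdf(c - width / 2, mean, std_dev)
# Approximation: interval width times the density at the centre.
approx = width * normal_density(c, mean, std_dev)

print(exact, approx)
```

Since every class's likelihood is multiplied by the same interval width, the width drops out when the likelihoods are normalized, which is why the density can be used directly.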
![Page 18: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/18.jpg)
18
Example of Naïve Bayes in Weka
Use Weka Naïve Bayes Module to classify Weather.nominal.arff
![Page 19: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/19.jpg)
19
![Page 20: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/20.jpg)
20
![Page 21: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/21.jpg)
21
![Page 22: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/22.jpg)
22
![Page 23: 1 Data Mining with Naïve Bayesian Methods Instructor: Qiang Yang Hong Kong University of Science and Technology Qyang@cs.ust.hk Thanks: Dan Weld, Eibe](https://reader030.vdocuments.us/reader030/viewer/2022032800/56649d445503460f94a209fd/html5/thumbnails/23.jpg)
23
Discussion of Naïve Bayes
Naïve Bayes works surprisingly well (even if the independence assumption is clearly violated)
Why? Because classification doesn’t require accurate probability estimates as long as the maximum probability is assigned to the correct class
However: adding too many redundant attributes will cause problems (e.g. identical attributes)
Note also: many numeric attributes are not normally distributed
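The effect of redundant attributes can be illustrated on the earlier example day (Sunny, Cool, High, True). In this sketch, identical copies of the Outlook attribute are added, so the Outlook factor is multiplied in again for each copy; the function name `posterior_no` is an illustrative assumption.

```python
from fractions import Fraction as F

# Pr[value | class] for Outlook=Sunny, Temp=Cool, Humidity=High, Windy=True.
factors = {
    "yes": [F(2, 9), F(3, 9), F(3, 9), F(3, 9)],
    "no": [F(3, 5), F(1, 5), F(4, 5), F(3, 5)],
}
prior = {"yes": F(9, 14), "no": F(5, 14)}

def posterior_no(extra_copies):
    """P("no") when `extra_copies` identical copies of Outlook are added."""
    like = {}
    for cls in prior:
        value = prior[cls]
        for f in factors[cls]:
            value *= f
        # Each duplicate attribute contributes the Outlook factor once more.
        value *= factors[cls][0] ** extra_copies
        like[cls] = value
    return float(like["no"] / sum(like.values()))

results = [posterior_no(k) for k in range(4)]
for k, p in enumerate(results):
    print(k, round(p, 3))
```

With no copies the result is the 0.795 computed earlier; every added copy pushes the estimate further toward “no” even though no new information has been supplied.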