Bayesian Classification
What are Bayesian classifiers?
Statistical classifiers
Predict class membership probabilities
Based on Bayes Theorem
Naïve Bayesian classifier: computationally simple
Comparable performance with decision tree and neural network classifiers
Probabilistic learning: calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems.
Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.
Bayes Theorem
Let X be a data sample whose class label is unknown.
Let H be some hypothesis that X belongs to a class C.
For classification, determine P(H|X).
P(H|X) is the probability that H holds given the observed data sample X: the posterior probability.
Example: sample space = all fruits.
X is round and red.
H = the hypothesis that X is an apple.
P(H|X) is our confidence that X is an apple given that X is round and red.
P(H) is the prior probability of H, i.e., the probability that any given data sample is an apple, regardless of how it looks.
P(H|X) is based on more information; note that P(H) is independent of X.
What is P(X|H)? It is the probability that X is round and red, given that we know it is true that X is an apple.
Here P(X) is the prior probability: P(a data sample from our set of fruits is round and red).
Estimating Probabilities
P(X), P(H), and P(X|H) may be estimated from the given data.
Bayes Theorem:

$$P(H|X) = \frac{P(X|H)\,P(H)}{P(X)}$$

This is the form in which Bayes Theorem is used in the Naïve Bayesian classifier!
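To make the mechanics concrete, here is a minimal Python sketch of the theorem applied to the fruit example; the three probability values are hypothetical, chosen only to illustrate the arithmetic.

```python
# Bayes Theorem: P(H|X) = P(X|H) * P(H) / P(X)
# Hypothetical estimates for the fruit example (not from real data):
p_h = 0.30           # P(H): prior prob. that a random fruit is an apple
p_x_given_h = 0.90   # P(X|H): prob. of being round and red, given an apple
p_x = 0.45           # P(X): prior prob. that a random fruit is round and red

p_h_given_x = p_x_given_h * p_h / p_x  # posterior P(H|X)
print(p_h_given_x)   # 0.6: our confidence X is an apple, given round and red
```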
Naïve Bayesian Classification
Also called the Simple Bayesian Classifier.
Why "naïve"/"simple"? Class conditional independence: the effect of an attribute value on a given class is independent of the values of the other attributes.
This assumption simplifies the computations.
Steps involved:
1. Each data sample is of the form X = (x1, …, xn), where xi is the value of X for attribute Ai.
2. Suppose there are m classes Ci, i = 1, …, m. X is assigned to Ci iff
P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i,
i.e., the Bayesian classifier assigns X to the class Ci having the highest posterior probability.
The class for which P(Ci|X) is maximized is called the maximum posterior hypothesis. From Bayes Theorem:

$$P(C_i|X) = \frac{P(X|C_i)\,P(C_i)}{P(X)}$$

3. P(X) is constant across classes, so only P(X|Ci)P(Ci) need be maximized.
If the class prior probabilities are not known, assume all classes to be equally likely; otherwise maximize P(X|Ci)P(Ci), estimating the priors as P(Ci) = si/s, the fraction of training samples belonging to Ci.
Problem: computing P(X|Ci) directly is infeasible!
(Find out how you would compute it and why it is infeasible.)
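To see why, consider a rough count (the numbers are illustrative, not from the slides): estimating P(X|Ci) directly means estimating the full joint distribution of the attributes within each class,

$$\underbrace{k \times k \times \cdots \times k}_{n\ \text{attributes}} = k^n \ \text{cells per class, e.g. } 10^{20} \text{ for } k = 10,\ n = 20,$$

far more value combinations than any realistic training set can populate with reliable counts.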
4. Naïve assumption: attribute independence. Then

$$P(X|C_i) = P(x_1, \ldots, x_n \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)$$

5. To classify an unknown sample X, evaluate P(X|Ci)P(Ci) for each class Ci. Sample X is assigned to the class Ci iff
P(X|Ci)P(Ci) > P(X|Cj)P(Cj) for 1 ≤ j ≤ m, j ≠ i.
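A minimal sketch of these steps in Python, assuming categorical attributes and plain frequency-count estimates (the function and variable names are illustrative, not from the slides):

```python
from collections import Counter, defaultdict

def train(samples, labels):
    """Estimate the counts behind P(Ci) and P(xk|Ci)."""
    prior = Counter(labels)                 # si: number of samples in class Ci
    cond = defaultdict(Counter)             # (class, attr index) -> value counts
    for x, c in zip(samples, labels):
        for k, v in enumerate(x):
            cond[(c, k)][v] += 1
    return prior, cond, len(labels)

def classify(x, prior, cond, s):
    """Assign x to the class Ci maximizing P(X|Ci) * P(Ci)."""
    best_class, best_score = None, -1.0
    for c, sc in prior.items():
        score = sc / s                      # P(Ci) = si / s
        for k, v in enumerate(x):
            score *= cond[(c, k)][v] / sc   # naive product of the P(xk|Ci)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

Note that an attribute value never seen with a class makes the whole product zero; that is exactly the issue the Laplacian correction addresses a few slides below.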
EXAMPLE

Age     Income  Student  Credit_rating  Class: Buys_comp
<=30    HIGH    N        FAIR           N
<=30    HIGH    N        EXCELLENT      N
31..40  HIGH    N        FAIR           Y
>40     MEDIUM  N        FAIR           Y
>40     LOW     Y        FAIR           Y
>40     LOW     Y        EXCELLENT      N
31..40  LOW     Y        EXCELLENT      Y
<=30    MEDIUM  N        FAIR           N
<=30    LOW     Y        FAIR           Y
>40     MEDIUM  Y        FAIR           Y
<=30    MEDIUM  Y        EXCELLENT      Y
31..40  MEDIUM  N        EXCELLENT      Y
31..40  HIGH    Y        FAIR           Y
>40     MEDIUM  N        EXCELLENT      N
Unknown sample to classify:
X = (age = <=30, income = MEDIUM, student = Y, credit_rating = FAIR)
P(buys_comp=Y) = 9/14 = 0.643
P(buys_comp=N) = 5/14 = 0.357
P(age=<=30 | buys_comp=Y) = 2/9 = 0.222
P(age=<=30 | buys_comp=N) = 3/5 = 0.600
P(income=MEDIUM | buys_comp=Y) = 4/9 = 0.444
P(income=MEDIUM | buys_comp=N) = 2/5 = 0.400
P(student=Y | buys_comp=Y) = 6/9 = 0.667
P(student=Y | buys_comp=N) = 1/5 = 0.200
P(credit_rating=FAIR | buys_comp=Y) = 6/9 = 0.667
P(credit_rating=FAIR | buys_comp=N) = 2/5 = 0.400
P(X | buys_comp=Y) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X | buys_comp=N) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019
P(X | buys_comp=Y) P(buys_comp=Y) = 0.044 × 0.643 = 0.028
P(X | buys_comp=N) P(buys_comp=N) = 0.019 × 0.357 = 0.007
CONCLUSION: since 0.028 > 0.007, X buys a computer (buys_comp = Y).
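The same arithmetic, scripted as a quick check (all values are copied from the calculations above; the priors are 9/14 and 5/14):

```python
# Conditional probabilities for X, read off the training table above
p_x_given_yes = 0.222 * 0.444 * 0.667 * 0.667   # ~0.044
p_x_given_no  = 0.600 * 0.400 * 0.200 * 0.400   # ~0.019

# Multiply by the class priors P(buys_comp=Y)=9/14, P(buys_comp=N)=5/14
score_yes = p_x_given_yes * 9 / 14              # ~0.028
score_no  = p_x_given_no  * 5 / 14              # ~0.007

print("buys_comp =", "Y" if score_yes > score_no else "N")  # -> Y
```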
Naïve Bayes Classifier: Issues

Probability values of ZERO! Recall what you observed in WEKA!
What if Ak is continuous-valued? Recall what you observed in WEKA!
If there are no tuples in the training set corresponding to students for the class buys_comp=N, then
P(student=Y | buys_comp=N) = 0
Implications? That single zero wipes out the entire product P(X|Ci), whatever the other attributes say.
Solution?
The Laplacian correction (Laplace estimator).
Philosophy: we assume that the training data set is so large that adding one to each count we need would make only a negligible difference in the estimated probability values.
Example: D contains 1000 tuples of class buys_comp=Y:
income=LOW: 0 tuples
income=MEDIUM: 990 tuples
income=HIGH: 10 tuples
Without the Laplacian correction, the probabilities are 0, 0.990, and 0.010.
With the Laplacian correction: 1/1003 = 0.001, 991/1003 = 0.988, and 11/1003 = 0.011, respectively.
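A small Python sketch of the correction, using the counts from the example above:

```python
counts = {"LOW": 0, "MEDIUM": 990, "HIGH": 10}   # income counts within buys_comp=Y

total = sum(counts.values())                     # 1000
uncorrected = {v: c / total for v, c in counts.items()}

# Laplacian correction: add 1 to every count, so the denominator grows by
# the number of distinct values (here 3), giving 1003.
corrected = {v: (c + 1) / (total + len(counts)) for v, c in counts.items()}

print(uncorrected)  # {'LOW': 0.0, 'MEDIUM': 0.99, 'HIGH': 0.01}
print(corrected)    # {'LOW': ~0.001, 'MEDIUM': ~0.988, 'HIGH': ~0.011}
```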
Continuous attributes need more work than categorical attributes! A continuous attribute is typically assumed to follow a Gaussian distribution with mean μ and standard deviation σ, estimated per class, and P(xk|Ci) is read off that density. Do it yourself, and cross-check with WEKA!
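A sketch of that Gaussian estimate, assuming we fit μ and σ to the attribute's values within each class (the training values below are hypothetical):

```python
import math

def gaussian_likelihood(x, values):
    """Estimate P(x | Ci) for a continuous attribute by fitting a
    Gaussian to that attribute's training values within class Ci."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / (len(values) - 1)  # sample variance
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# e.g. ages of the buys_comp=Y tuples (hypothetical values):
print(gaussian_likelihood(35, [25, 32, 38, 41, 45, 29, 36, 48, 33]))
```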
Naïve Bayes (Summary)
Robust to isolated noise points.
Handles missing values by ignoring the instance during probability-estimate calculations (see the sketch below).
Robust to irrelevant attributes.
The independence assumption may not hold for some attributes; in that case, use other techniques such as Bayesian Belief Networks (BBN).
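One way the missing-value bullet could look in code, extending the earlier train() sketch: an instance with a missing (None) attribute value is ignored for that one estimate but still counted for all the others. This is a sketch of the stated policy, not WEKA's exact implementation.

```python
from collections import Counter, defaultdict

def train_with_missing(samples, labels):
    """Frequency counts that ignore missing (None) attribute values."""
    prior = Counter(labels)        # class counts
    cond = defaultdict(Counter)    # (class, attr index) -> value counts
    seen = Counter()               # (class, attr index) -> non-missing count
    for x, c in zip(samples, labels):
        for k, v in enumerate(x):
            if v is None:          # missing: ignore the instance here only
                continue
            cond[(c, k)][v] += 1
            seen[(c, k)] += 1
    # Estimate P(xk = v | Ci) as cond[(Ci, k)][v] / seen[(Ci, k)]
    return prior, cond, seen
```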
Probability Calculations

Age     Income  Student  Credit_rating  Class: Buys_comp
>40     LOW     Y        GOOD           Y
>40     LOW     Y        EXCELLENT      N
31..40  LOW     Y        EXCELLENT      Y
Bayesian Belief Networks
Naïve BC assumes class conditional independence; this assumption simplifies computations.
When the assumption holds true, Naïve BC is the most accurate of these classifiers.
In real problems, dependencies do exist between variables. Two methods to overcome this limitation of NBC:
Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes.
Decision trees, which reason on one attribute at a time, considering the most important attributes first.
Conditional Independence
Let X, Y, and Z denote three sets of random variables. The variables in X are said to be conditionally independent of Y, given Z, if
P(X|Y,Z) = P(X|Z)
Example: the relationship between a person's arm length and his/her reading skills. One might observe that people with longer arms tend to have higher levels of reading skills.
How do you explain this relationship?
It can be explained through a confounding factor: AGE.
A young child tends to have short arms and lacks the reading skills of an adult.
If the age of a person is fixed, the observed relationship between arm length and reading skills disappears.
We can thus conclude that arm length and reading skills are conditionally independent when the age variable is fixed:
P(reading skills | long arms, age) = P(reading skills | age)
$$P(X,Y|Z) = \frac{P(X,Y,Z)}{P(Z)} = \frac{P(X,Y,Z)}{P(Y,Z)} \cdot \frac{P(Y,Z)}{P(Z)} = P(X|Y,Z) \cdot P(Y|Z) = P(X|Z) \cdot P(Y|Z)$$

where the last step uses the conditional independence of X and Y given Z. This explains the Naïve Bayesian factorization:

$$P(X|C_i) = P(x_1, x_2, \ldots, x_n \mid C_i) = \prod_k P(x_k \mid C_i)$$
Bayesian Belief Networks
Also known as Belief Networks, Bayesian Networks, or Probabilistic Networks.
The conditional independence (CI) assumption made by NBC may be too rigid, especially for classification problems in which the attributes are somewhat correlated.
We need a more flexible approach for modeling the class conditional probabilities P(X|Ci) = P(x1, x2, …, xn|Ci): instead of requiring that all the attributes be CI given the class, a BBN allows us to specify which pairs of attributes are CI.
A Belief Network has two components:
a Directed Acyclic Graph (DAG), and
a Conditional Probability Table (CPT) for each variable.
A node in a BBN is CI of its non-descendants, if its parents are known.
[Figure: an example BBN over six Boolean variables: FamilyHistory (FH), Smoker (S), LungCancer (LC), Emphysema, PositiveXRay, and Dyspnea; FH and S are the parents of LC.]

The conditional probability table for the variable LungCancer:

       (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
LC       0.8      0.5       0.7       0.1
~LC      0.2      0.5       0.3       0.9
Bayesian Belief Networks
LungCancer is CI of Emphysema, given its parents, FH and Smoker.
A BBN has a Conditional Probability Table (CPT) for each variable in the DAG.
The CPT for a variable Y specifies the conditional distribution P(Y | Parents(Y)). For example, from the CPT for LungCancer above:
P(LC=Y | FH=Y, S=Y) = 0.8
P(LC=N | FH=N, S=N) = 0.9
Let X = (x1, x2, …, xn) be a tuple described by variables or attributes Y1, Y2, …, Yn, respectively. Each variable is CI of its non-descendants given its parents. This allows the DAG to provide a complete representation of the underlying joint probability distribution:

$$P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid Parents(Y_i))$$

where P(x1, x2, …, xn) is the probability of a particular combination of values of X, and the values P(xi | Parents(Yi)) correspond to the entries in the CPT for Yi.
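As an illustration, a short Python sketch that stores the LungCancer CPT above as a dictionary and evaluates one factor of this product; the other variables' CPTs would be handled the same way.

```python
# CPT for LungCancer, keyed by its parents' values (FamilyHistory, Smoker)
cpt_lung_cancer = {
    (True,  True):  0.8,   # P(LC=Y | FH=Y, S=Y)
    (True,  False): 0.5,   # P(LC=Y | FH=Y, S=N)
    (False, True):  0.7,   # P(LC=Y | FH=N, S=Y)
    (False, False): 0.1,   # P(LC=Y | FH=N, S=N)
}

def p_lung_cancer(lc, fh, s):
    """One factor P(xi | Parents(Yi)) of the joint: P(LC=lc | FH=fh, S=s)."""
    p_yes = cpt_lung_cancer[(fh, s)]
    return p_yes if lc else 1.0 - p_yes

print(p_lung_cancer(False, False, False))  # P(LC=N | FH=N, S=N) = 0.9
```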
A node within the network can be selected as an output node, representing a class-label attribute. There can be more than one output node. Rather than returning a single class label, the classification process can return a probability distribution giving the probability of each class.
How do we train a BBN?
Training BBN
A number of scenarios are possible:
The network topology may be given in advance or inferred from the data.
The variables may be observable or hidden (missing or incomplete data) in all or some of the training tuples.
Many algorithms exist for learning the network topology from the training data, given observable attributes.
If the network topology is known and the variables are observable, training is straightforward: just compute the CPT entries, as in the sketch below.
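A minimal sketch of that straightforward case, assuming fully observed Boolean variables; the tuples and names are hypothetical, echoing the lung cancer example:

```python
from itertools import product

def estimate_cpt(tuples, child, parents):
    """Estimate P(child=True | parent values) by relative frequency
    over fully observed training tuples (dicts: variable -> bool)."""
    cpt = {}
    for pa_vals in product([True, False], repeat=len(parents)):
        match = [t for t in tuples
                 if all(t[p] == v for p, v in zip(parents, pa_vals))]
        if match:  # parent combinations unseen in the data are left out
            cpt[pa_vals] = sum(t[child] for t in match) / len(match)
    return cpt

# Hypothetical fully observed training tuples:
data = [
    {"FH": True,  "S": True,  "LC": True},
    {"FH": True,  "S": True,  "LC": True},
    {"FH": True,  "S": True,  "LC": False},
    {"FH": False, "S": False, "LC": False},
]
print(estimate_cpt(data, "LC", ["FH", "S"]))
# {(True, True): 0.666..., (False, False): 0.0}
```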
When the topology is given but some variables are hidden: gradient descent (self-study). Such methods fall under the class of algorithms called Adaptive Probabilistic Networks.
BBNs are computationally expensive, but they provide an explicit representation of causal structure.
Domain experts can provide prior knowledge to the training process, in the form of the topology and/or conditional probability values; this leads to significant improvement in the learning process.