Bayesian Classification NG



    Bayesian Classification

What are Bayesian classifiers?

Statistical classifiers that predict class membership probabilities, based on Bayes Theorem.

The Naïve Bayesian classifier is computationally simple, with performance comparable to decision tree and neural network classifiers.


    Bayesian Classification

Probabilistic learning: calculates explicit probabilities for a hypothesis; among the most practical approaches to certain types of learning problems.

Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.


    Bayes Theorem

Let X be a data sample whose class label is unknown.

Let H be some hypothesis that X belongs to a class C.

For classification, determine P(H|X): the probability that H holds given the observed data sample X.

P(H|X) is the posterior probability.


    Bayes Theorem

Example: sample space = all fruits.

X is round and red.

H = the hypothesis that X is an apple.

P(H|X) is our confidence that X is an apple given that X is round and red.

P(H) is the prior probability of H, i.e., the probability that any given data sample is an apple, regardless of how it looks.

P(H|X) is based on more information; note that P(H) is independent of X.


    Bayes Theorem

Example: sample space = all fruits.

P(X|H)? It is the probability that X is round and red given that we know it is true that X is an apple.

Here P(X) is the prior probability = P(a data sample from our set of fruits is red and round).


    Estimating Probabilities

P(X), P(H), and P(X|H) may be estimated from the given data.

    Bayes Theorem

Use of Bayes Theorem in the Naïve Bayesian Classifier!!

P(H|X) = P(X|H) P(H) / P(X)
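To make the formula concrete, here is a minimal Python sketch of Bayes Theorem for the fruit example above; all three input probabilities are made-up illustrative numbers, not values from these slides.

    # Bayes Theorem: P(H|X) = P(X|H) * P(H) / P(X)
    p_h = 0.2           # assumed prior: fraction of fruits that are apples
    p_x_given_h = 0.9   # assumed: probability an apple is round and red
    p_x = 0.3           # assumed: probability any fruit is round and red

    p_h_given_x = p_x_given_h * p_h / p_x
    print(p_h_given_x)  # 0.6: confidence that X is an apple, given round and red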


Naïve Bayesian Classification

    Also called Simple BC

Why Naïve/Simple?

Class conditional independence: the effect of an attribute's value on a given class is independent of the values of the other attributes.

    This assumption simplifies computations


Naïve Bayesian Classification

Steps Involved

1. Each data sample is of the type X = (x1, x2, ..., xn), where xi is the value of X for attribute Ai.

2. Suppose there are m classes Ci, i = 1, ..., m. X ∈ Ci iff

P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i

i.e., the BC assigns X to the class Ci having the highest posterior probability.


Naïve Bayesian Classification

The class for which P(Ci|X) is maximized is called the maximum posterior hypothesis.

From Bayes Theorem:

P(Ci|X) = P(X|Ci) P(Ci) / P(X)

3. P(X) is constant for all classes, so only P(X|Ci) P(Ci) need be maximized.

If the class prior probabilities are not known, assume all classes to be equally likely and maximize P(X|Ci) alone; otherwise maximize P(X|Ci) P(Ci), estimating the priors as

P(Ci) = Si/S

where Si is the number of training samples in class Ci and S is the total number of training samples.

Problem: computing P(X|Ci) directly is infeasible!

(Find out how you would compute it and why it is infeasible.)


Naïve Bayesian Classification

4. Naïve assumption: attribute independence.

P(X|Ci) = P(x1, ..., xn|Ci) = ∏k P(xk|Ci)

5. In order to classify an unknown sample X, evaluate P(X|Ci) P(Ci) for each class Ci. Sample X is assigned to the class Ci iff

P(X|Ci) P(Ci) > P(X|Cj) P(Cj) for 1 ≤ j ≤ m, j ≠ i
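As a concrete illustration of steps 1-5, here is a minimal Python sketch for categorical attributes; the function names and data layout are illustrative choices, not from the slides.

    from collections import Counter, defaultdict

    def train(samples, labels):
        # Estimate the counts behind P(Ci) = Si/S and P(xk|Ci)
        priors = Counter(labels)         # Si for each class Ci
        cond = defaultdict(Counter)      # (class, attribute index) -> value counts
        for x, c in zip(samples, labels):
            for k, v in enumerate(x):
                cond[(c, k)][v] += 1
        return priors, cond, len(labels)

    def classify(x, priors, cond, total):
        # Assign x to the class Ci that maximizes P(X|Ci) * P(Ci)
        best, best_score = None, -1.0
        for c, s_i in priors.items():
            score = s_i / total                     # P(Ci) = Si/S
            for k, v in enumerate(x):
                score *= cond[(c, k)][v] / s_i      # P(xk|Ci): naive independence
            if score > best_score:
                best, best_score = c, score
        return best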


Naïve Bayesian Classification

EXAMPLE

Training data (excerpt; the full training set has 14 tuples, 9 with buys_comp = Y and 5 with buys_comp = N):

Age      Income   Student   Credit_rating   Class: Buys_comp
>40      LOW      Y         FAIR            Y
>40      LOW      Y         EXCELLENT       N
31..40   LOW      Y         EXCELLENT       Y


Naïve Bayesian Classification

EXAMPLE

X = (age <= 30, income = MEDIUM, student = Y, credit_rating = FAIR)

Which class does X belong to: buys_comp = Y or buys_comp = N?


Naïve Bayesian Classification

EXAMPLE

P(buys_comp = Y) = 9/14 = 0.643
P(buys_comp = N) = 5/14 = 0.357

P(age <= 30 | buys_comp = Y) = 2/9 = 0.222
P(age <= 30 | buys_comp = N) = 3/5 = 0.600
P(income = MEDIUM | buys_comp = Y) = 4/9 = 0.444
P(income = MEDIUM | buys_comp = N) = 2/5 = 0.400
P(student = Y | buys_comp = Y) = 6/9 = 0.667
P(student = Y | buys_comp = N) = 1/5 = 0.200
P(credit_rating = FAIR | buys_comp = Y) = 6/9 = 0.667
P(credit_rating = FAIR | buys_comp = N) = 2/5 = 0.400


Naïve Bayesian Classification

    EXAMPLE

    P(X | buys_comp=Y)=0.222*0.444*0.667*0.667=0.044

    P(X | buys_comp=N)=0.600*0.400*0.200*0.400=0.019

    P(X | buys_comp=Y)P(buys_comp=Y) = 0.044*0.643=0.028

    P(X | buys_comp=N)P(buys_comp=N) = 0.019*0.357=0.007

CONCLUSION: since 0.028 > 0.007, X is assigned to the class buys_comp = Y, i.e., X buys a computer.
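The same arithmetic in a few lines of Python, using the probability values from the example above:

    p_x_given_yes = 0.222 * 0.444 * 0.667 * 0.667   # P(X|buys_comp=Y) ~ 0.044
    p_x_given_no  = 0.600 * 0.400 * 0.200 * 0.400   # P(X|buys_comp=N) ~ 0.019

    score_yes = p_x_given_yes * 0.643               # ~ 0.028
    score_no  = p_x_given_no  * 0.357               # ~ 0.007
    print("buys_comp = Y" if score_yes > score_no else "buys_comp = N")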


Naïve Bayes Classifier: Issues

Probability values of ZERO! Recall what you observed in WEKA!

If Ak is continuous-valued! Recall what you observed in WEKA!

If there are no tuples in the training set corresponding to students for the class buys_comp = N, then

P(student = Y | buys_comp = N) = 0

Implications? Solution?


Naïve Bayes Classifier: Issues

Laplacian Correction (Laplace Estimator)

Philosophy: we assume that the training data set is so large that adding one to each count we need would make only a negligible difference in the estimated probability value.

Example: D contains 1,000 tuples of class buys_comp = Y:

income = low: 0 tuples
income = medium: 990 tuples
income = high: 10 tuples

Without the Laplacian correction, the probabilities are 0, 0.990, and 0.010.

With the Laplacian correction: 1/1003 = 0.001, 991/1003 = 0.988, and 11/1003 = 0.011, respectively.
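A short Python sketch of the correction applied to exactly these counts:

    counts = {"low": 0, "medium": 990, "high": 10}
    total = sum(counts.values())                          # 1000 tuples

    without = {v: c / total for v, c in counts.items()}   # 0, 0.990, 0.010
    # Add 1 to each of the 3 counts; the denominator grows from 1000 to 1003
    corrected = {v: (c + 1) / (total + len(counts)) for v, c in counts.items()}
    print(corrected)  # low: 1/1003 = 0.001, medium: 991/1003 = 0.988, high: 11/1003 = 0.011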


Naïve Bayes Classifier: Issues

Continuous variables need more work than categorical attributes!

A continuous attribute is typically assumed to have a Gaussian distribution with a mean μ and a standard deviation σ.

Do it yourself! And cross-check with WEKA!
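A do-it-yourself starting point in Python; the age values below are made-up, and the Gaussian density g(x, μ, σ) replaces the count-based estimate of P(xk|Ci):

    import math

    def gaussian(x, mu, sigma):
        # Gaussian density g(x, mu, sigma)
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

    ages_in_class = [25, 35, 38, 42, 30]   # made-up training values of age within one class
    mu = sum(ages_in_class) / len(ages_in_class)
    sigma = math.sqrt(sum((a - mu) ** 2 for a in ages_in_class) / len(ages_in_class))

    print(gaussian(34, mu, sigma))         # used in place of P(age = 34 | Ci)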


Naïve Bayes (Summary)

Robust to isolated noise points.

Handles missing values by ignoring the instance during probability estimate calculations.

Robust to irrelevant attributes.

The independence assumption may not hold for some attributes; use other techniques such as Bayesian Belief Networks (BBN).


Probability Calculations

Training data (excerpt):

Age      Income   Student   Credit_rating   Class: Buys_comp
>40      LOW      Y         GOOD            Y
>40      LOW      Y         EXCELLENT       N
31..40   LOW      Y         EXCELLENT       Y


    Bayesian Belief Networks

The Naïve BC assumes class conditional independence; this assumption simplifies computations.

When this assumption holds true, the Naïve BC is the most accurate compared to all other classifiers.

In real problems, dependencies do exist between variables. Two methods overcome this limitation of the NBC:

Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes.

Decision trees, which reason on one attribute at a time, considering the most important attributes first.


    Conditional Independence

    Let X, Y, & Z denote three set of randomvariables. The variables in X are said tobe conditionally independent of Y, givenZ if

    P(X|Y,Z) = P(X|Z)Rel. bet. a persons arm length and

    his/her reading skills!!One might observe that people with

    longer arms tend to have higher levels ofreading skills

    How do you explain this rel.?


    Conditional Independence

It can be explained through a confounding factor: AGE.

A young child tends to have short arms and lacks the reading skills of an adult.

If the age of a person is fixed, then the observed relationship between arm length and reading skills disappears.

We can thus conclude that arm length and reading skills are conditionally independent when the age variable is fixed:

P(reading skills | long arms, age) = P(reading skills | age)


    Conditional Independence

P(X,Y|Z) = P(X,Y,Z)/P(Z)
         = P(X,Y,Z)/P(Y,Z) x P(Y,Z)/P(Z)
         = P(X|Y,Z) x P(Y|Z)
         = P(X|Z) x P(Y|Z)   (by conditional independence)

This explains the Naïve Bayesian factorization:

P(X|Ci) = P(x1, x2, x3, ..., xn|Ci) = ∏k P(xk|Ci)


    Bayesian Belief Networks

Also known as: Belief Networks, Bayesian Networks, or Probabilistic Networks.


    Bayesian Belief Networks

The Conditional Independence (CI) assumption made by the NBC may be too rigid, especially for classification problems in which the attributes are somewhat correlated. We need a more flexible approach for modeling the class conditional probabilities:

P(X|Ci) = P(x1, x2, x3, ..., xn|Ci)

Instead of requiring that all the attributes be CI given the class, a BBN allows us to specify which pairs of attributes are CI.


    Bayesian Belief Networks

A belief network has two components:

    Directed Acyclic Graph (DAG)

    Conditional Probability Table (CPT)


    Bayesian Belief Networks

A node in a BBN is CI of its non-descendants if its parents are known.


Bayesian Belief Networks

[Figure: a Bayesian belief network over six variables: FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, and Dyspnea. FamilyHistory and Smoker are the parents of LungCancer.]

The conditional probability table for the variable LungCancer:

        (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC      0.8       0.5        0.7        0.1
~LC     0.2       0.5        0.3        0.9


    Bayesian Belief Networks

LungCancer is CI of Emphysema, given its parents, FamilyHistory and Smoker.

A BBN has a Conditional Probability Table (CPT) for each variable in the DAG.

The CPT for a variable Y specifies the conditional distribution P(Y | parents(Y)), e.g.:

P(LC = Y | FH = Y, S = Y) = 0.8
P(LC = N | FH = N, S = N) = 0.9

CPT for LungCancer:

        (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC      0.8       0.5        0.7        0.1
~LC     0.2       0.5        0.3        0.9


    Bayesian Belief Networks

Let X = (x1, x2, ..., xn) be a tuple described by variables or attributes Y1, Y2, ..., Yn respectively.

Each variable is CI of its non-descendants given its parents.

This allows the DAG to provide a complete representation of the existing joint probability distribution:

P(x1, x2, ..., xn) = ∏i P(xi | Parents(Yi))

where P(x1, x2, ..., xn) is the probability of a particular combination of values of X, and the values P(xi | Parents(Yi)) correspond to the entries in the CPT for Yi.
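A Python sketch of this factorization for the FamilyHistory/Smoker/LungCancer fragment of the network above; the LC CPT values come from the slides, while the priors for FH and S are made-up, since the slides do not give them:

    p_fh = {True: 0.1, False: 0.9}    # assumed prior for FamilyHistory (not on the slides)
    p_s  = {True: 0.3, False: 0.7}    # assumed prior for Smoker (not on the slides)
    p_lc = {(True, True): 0.8, (True, False): 0.5,    # CPT: P(LC=True | FH, S)
            (False, True): 0.7, (False, False): 0.1}

    def joint(fh, s, lc):
        # P(fh, s, lc) = P(fh) * P(s) * P(lc | Parents(LC)), with Parents(LC) = {FH, S}
        p = p_lc[(fh, s)] if lc else 1.0 - p_lc[(fh, s)]
        return p_fh[fh] * p_s[s] * p

    print(joint(True, True, True))    # 0.1 * 0.3 * 0.8 = 0.024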


    Bayesian Belief Networks

A node within the network can be selected as an output node, representing a class label attribute. There can be more than one output node.

Rather than returning a single class label, the classification process can return a probability distribution that gives the probability of each class.

Training the BBN!!


    Training BBN

A number of scenarios are possible:

The network topology may be given in advance or inferred from the data.

Variables may be observable or hidden (missing or incomplete data) in all or some of the training tuples.

Many algorithms exist for learning the network topology from the training data, given observable attributes.

If the network topology is known and the variables are observable, training is straightforward: just compute the CPT entries, as in the sketch below.
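A minimal Python sketch of that straightforward case: computing the CPT entries P(LC = True | FH, S) by counting over fully observed (and entirely made-up) training tuples:

    from collections import Counter

    # Made-up fully observed tuples: (family_history, smoker, lung_cancer)
    data = [(True, True, True), (True, True, False), (True, True, True),
            (False, True, True), (False, False, False), (False, False, False)]

    parent_counts = Counter((fh, s) for fh, s, _ in data)     # count(FH, S)
    lc_counts = Counter((fh, s) for fh, s, lc in data if lc)  # count(FH, S, LC=True)

    cpt = {cfg: lc_counts[cfg] / n for cfg, n in parent_counts.items()}
    print(cpt)  # approximately {(True, True): 0.667, (False, True): 1.0, (False, False): 0.0}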


    Training BBNs

Topology given, but some variables are hidden: Gradient Descent (self study).

This falls under the class of algorithms called Adaptive Probabilistic Networks.

BBNs are computationally expensive.

BBNs provide an explicit representation of causal structure.

Domain experts can provide prior knowledge to the training process, in the form of topology and/or conditional probability values. This leads to significant improvement in the learning process.