L2 Probabilistic Reasoning


TRANSCRIPT

  • Slide 2/39

    The process of probabilistic inference

    1. define model of problem

    2. derive posterior distributions and estimators

    3. estimate parameters from data

    4. evaluate model accuracy

  • Slide 3/39

    Axioms of probability

    Axioms (Kolmogorov):

    0 ≤ P(A) ≤ 1

    P(true) = 1

    P(false) = 0

    P(A or B) = P(A) + P(B) − P(A and B)

    Corollaries:

    - A single random variable's distribution must sum to 1:

      Σ_{i=1}^{n} P(D = d_i) = 1

    - The joint probability of a set of variables must also sum to 1.

    - If A and B are mutually exclusive:

      P(A or B) = P(A) + P(B)
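    As a quick sanity check, here is a minimal sketch in Python that verifies the sum rule and the corollaries on a toy distribution (a fair six-sided die, chosen here purely for illustration; it is not from the slides):

    from fractions import Fraction

    # Toy distribution: a fair six-sided die.
    P = {face: Fraction(1, 6) for face in range(1, 7)}

    # A single random variable's distribution must sum to 1.
    assert sum(P.values()) == 1

    # Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B).
    p = lambda event: sum(P[x] for x in event)
    A = {2, 4, 6}   # "roll is even"
    B = {1, 2}      # "roll is at most 2"
    assert p(A | B) == p(A) + p(B) - p(A & B)

    # Mutually exclusive events: the cross term vanishes.
    C = {1, 3, 5}   # "roll is odd" -- disjoint from A
    assert p(A | C) == p(A) + p(C)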

  • Slide 4/39

    Rules of probability

    Conditional probability:

    Pr(A|B) = Pr(A and B) / Pr(B),   Pr(B) > 0

    Product rule:

    Pr(B|A) Pr(A) = Pr(A and B) = Pr(A|B) Pr(B)

    Corollary (Bayes rule):

    Pr(B|A) = Pr(A|B) Pr(B) / Pr(A)

  • Slide 6/39

    Basic concepts

    Making rational decisions when faced with uncertainty:

    Probability: the precise representation of knowledge and uncertainty

    Probability theory: how to optimally update your knowledge based on new information

    Decision theory (probability theory + utility theory): how to use this information to achieve maximum expected utility

    Basic probability theory:

    - random variables
    - probability distributions (discrete) and probability densities (continuous)
    - rules of probability
    - expectation and the computation of 1st and 2nd moments
    - joint and multivariate probability distributions and densities
    - covariance and principal components

  • Slide 7/39

    The Joint Distribution

    Recipe for making a joint distribution of M variables:

    1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows).

    2. For each combination of values, say how probable it is.

    3. If you subscribe to the axioms of probability, those numbers must sum to 1.

    Example: Boolean variables A, B, C

    A  B  C  Prob
    0  0  0  0.30
    0  0  1  0.05
    0  1  0  0.10
    0  1  1  0.05
    1  0  0  0.05
    1  0  1  0.10
    1  1  0  0.25
    1  1  1  0.10

    [Venn-style area diagram of the same eight probabilities over regions A, B, C]

    All the nice-looking slides like this one are from Andrew Moore, CMU.
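    The recipe is only a few lines of code. Here is a minimal sketch in Python that builds the A, B, C joint from the table above and checks steps 1 and 3:

    from itertools import product

    # Joint distribution over Boolean A, B, C: one probability per row
    # of the 2^M truth table (values copied from the table above).
    joint = {
        (0, 0, 0): 0.30, (0, 0, 1): 0.05,
        (0, 1, 0): 0.10, (0, 1, 1): 0.05,
        (1, 0, 0): 0.05, (1, 0, 1): 0.10,
        (1, 1, 0): 0.25, (1, 1, 1): 0.10,
    }

    # Step 1: every combination of values is listed ...
    assert set(joint) == set(product((0, 1), repeat=3))

    # Step 3: ... and the probabilities sum to 1.
    assert abs(sum(joint.values()) - 1.0) < 1e-12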


  • Slide 10/39

    Using the Joint

    Once you have the JD you can ask for the probability of any logical expression E involving your attributes:

    P(E) = Σ_{rows matching E} P(row)

  • Slide 11/39

    Using the Joint

    Example: P(Poor and Male) = 0.4654

  • Slide 12/39

    Using the Joint

    Example: P(Poor) = 0.7604

  • Slide 13/39

    Inference with the Joint

    P(E1 | E2) = P(E1 and E2) / P(E2)
               = Σ_{rows matching E1 and E2} P(row) / Σ_{rows matching E2} P(row)

    Example: P(Male | Poor) = 0.4654 / 0.7604 = 0.612
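    In code this is one more helper on top of the joint. The census table behind the 0.4654 and 0.7604 figures is not reproduced in this transcript, so this sketch reuses the A, B, C joint from the earlier sketch:

    # (reuses the `joint` dict from the previous sketch)

    def prob(joint, event):
        """P(E) = sum of P(row) over the rows matching E (a predicate on rows)."""
        return sum(p for row, p in joint.items() if event(row))

    def cond_prob(joint, e1, e2):
        """P(E1|E2) = P(E1 and E2) / P(E2)."""
        return prob(joint, lambda r: e1(r) and e2(r)) / prob(joint, e2)

    # Events are predicates over a row (a, b, c).
    A = lambda r: r[0] == 1
    B = lambda r: r[1] == 1

    print(prob(joint, A))          # P(A) = 0.05 + 0.10 + 0.25 + 0.10 = 0.50
    print(cond_prob(joint, A, B))  # P(A|B) = 0.35 / 0.50 = 0.70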

  • Slide 14/39

    Continuous probability distributions

    - probability density function (pdf)
    - joint probability density
    - marginal probability
    - calculating probabilities using the pdf
    - Bayes rule

  • Slide 15/39

    A PDF of American Ages in 2000

    (more of Andrew's nice slides)

  • Slide 16/39

    A PDF of American Ages in 2000

    Let X be a continuous random variable. If p(x) is a probability density function for X, then

    P(a ≤ X ≤ b) = ∫_a^b p(x) dx

    P(30 ≤ Age ≤ 50) = ∫_30^50 p(age) d(age) = 0.36
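    The same calculation in code: the actual age pdf is not reproduced here, so this sketch stands in a hypothetical Gamma density (shape and scale picked arbitrarily); the mechanics of integrating a pdf over [a, b] are the same.

    from scipy import integrate, stats

    age = stats.gamma(a=4.0, scale=9.0)  # hypothetical stand-in pdf, mean 36

    # Numerically integrate the pdf from 30 to 50 ...
    p_numeric, _ = integrate.quad(age.pdf, 30, 50)

    # ... which must agree with the difference of the CDF at the endpoints.
    p_exact = age.cdf(50) - age.cdf(30)

    print(p_numeric, p_exact)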

  • Slide 17/39

    What does p(x) mean?

    It does not mean a probability!

    First of all, it's not a value between 0 and 1. It's just a value, and an arbitrary one at that. The value p(a) can only be compared relative to other values p(b). It indicates the relative probability of the density integrated over a small interval:

    If p(a) = α p(b)

    then lim_{h→0} P(a − h ≤ X ≤ a + h) / P(b − h ≤ X ≤ b + h) = α

  • Slide 18/39

    Expectations

    E[X] = the expected value of random variable X

         = the average value we'd see if we took a very large number of random samples of X

         = ∫_{x=−∞}^{∞} x p(x) dx

  • Slide 19/39

    Expectations (continued)

    E[X] = ∫_{x=−∞}^{∞} x p(x) dx

         = the first moment of the shape formed by the axes and the blue curve

         = the best value to choose if you must guess an unknown person's age and you'll be fined the square of your error

    E[age] = 35.897

  • Slide 20/39

    Expectation of a function

    E[f(X)] = the expected value of f(x) where x is drawn from X's distribution

            = the average value we'd see if we took a very large number of random samples of f(X)

            = ∫_{x=−∞}^{∞} f(x) p(x) dx

    Note that in general: E[f(X)] ≠ f(E[X])

    E[age²] = 1786.64
    (E[age])² = 1288.62

  • Slide 21/39

    Variance

    σ² = Var[X] = the expected squared difference between x and E[X]

                = ∫_{x=−∞}^{∞} (x − μ)² p(x) dx

                = the amount you'd expect to lose if you must guess an unknown person's age and you'll be fined the square of your error, assuming you play optimally

    Var[age] = 498.02

  • Slide 22/39

    Standard Deviation

    σ² = Var[X] = ∫_{x=−∞}^{∞} (x − μ)² p(x) dx

    σ = √Var[X] = the standard deviation = the typical deviation of X from its mean

    Var[age] = 498.02

    σ = 22.32
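    A short sketch tying the last few slides together: it computes E[X], E[X²], Var[X] and σ by numerical integration. The hypothetical Gamma density stands in for the real age pdf again, so the printed values differ from the slide's census figures.

    import numpy as np
    from scipy import integrate, stats

    age = stats.gamma(a=4.0, scale=9.0)  # hypothetical stand-in pdf

    mean, _ = integrate.quad(lambda x: x * age.pdf(x), 0, np.inf)     # E[X]
    ex2, _  = integrate.quad(lambda x: x**2 * age.pdf(x), 0, np.inf)  # E[X^2]
    var     = ex2 - mean**2                                           # Var[X]
    sigma   = np.sqrt(var)                                            # std dev

    # E[f(X)] != f(E[X]): the gap between E[X^2] and (E[X])^2 is the variance.
    print(mean, ex2, mean**2, var, sigma)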

  • Slide 23/39

    Simple example: medical test results

    Your test report for a rare disease comes back positive, and the test is 90% accurate. What's the probability that you have the disease?

    What if the test is repeated?

    This is the simplest example of reasoning by combining sources of information.

  • Slide 24/39

    How do we model the problem?

    Which is the correct description of "test is 90% accurate"?

    P(T = true) = 0.9
    P(T = true | D = true) = 0.9
    P(D = true | T = true) = 0.9

    What do we want to know?

    More compact notation:

    P(T = true | D = true) is written P(T|D)
    P(T = false | D = false) is written P(¬T|¬D)

  • Slide 25/39

    Evaluating the posterior probability through Bayesian inference

    We want P(D|T), the probability of having the disease given a positive test. Use Bayes rule to relate it to what we know, P(T|D):

    P(D|T) = P(T|D) P(D) / P(T)

    posterior = likelihood × prior / normalizing constant

    What's the prior P(D)? The disease is rare, so let's assume P(D) = 0.001.

    What about P(T)? What's the interpretation of that?

  • Slide 26/39

    Evaluating the normalizing constant

    P(D|T) = P(T|D) P(D) / P(T)        (posterior = likelihood × prior / normalizing constant)

    P(T) is the marginal probability of P(T, D) = P(T|D) P(D), so compute it with a summation:

    P(T) = Σ_{all values of D} P(T|D) P(D)

    For true-or-false propositions:

    P(T) = P(T|D) P(D) + P(T|¬D) P(¬D)

    What are P(T|¬D) and P(¬D)?
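    A minimal sketch of this computation, using the slide's numbers: sensitivity P(T|D) = 0.9, prior P(D) = 0.001, and (anticipating the next slides) a false-positive rate P(T|¬D) = 0.1:

    def posterior(p_t_given_d, p_t_given_not_d, prior):
        """P(D|T) by Bayes rule, with the normalizing constant expanded
        as P(T) = P(T|D)P(D) + P(T|~D)P(~D)."""
        evidence = p_t_given_d * prior + p_t_given_not_d * (1 - prior)
        return p_t_given_d * prior / evidence

    print(posterior(0.9, 0.1, 0.001))  # ~0.0089: still very unlikely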

  • Slide 27/39

    Refining our model of the test

    We also have to consider the negative case to incorporate all information:

    P(T|D) = 0.9
    P(T|¬D) = ?

    What should it be?

    What about P(¬D)?

  • Slide 29/39

    Same problem, different situation

    Suppose we have a test to determine if you won the lottery. It's 90% accurate.

    What is P($ = true | T = true) then?

  • Slide 30/39

    Playing around with the numbers

    P(D|T) = P(T|D) P(D) / [P(T|D) P(D) + P(T|¬D) P(¬D)]

    What if the test were 100% reliable?

    P(D|T) = (1.0 × 0.001) / (1.0 × 0.001 + 0.0 × 0.999) = 1.0

    What if the test was the same, but the disease weren't so rare, say P(D) = 0.1?

    P(D|T) = (0.9 × 0.1) / (0.9 × 0.1 + 0.1 × 0.9) = 0.5

  • Slide 31/39

    Repeating the test

    We can relax, P(D|T) = 0.0089, right? Just to be sure, the doctor recommends repeating the test.

    How do we represent this? P(D|T1, T2)

    Again, we apply Bayes rule:

    P(D|T1, T2) = P(T1, T2|D) P(D) / P(T1, T2)

    How do we model P(T1, T2|D)?

  • Slide 32/39

    Modeling repeated tests

    Easiest is to assume the tests are independent given the disease state:

    P(T1, T2|D) = P(T1|D) P(T2|D)

    We likewise treat the tests as independent for the normalizing constant:

    P(T1, T2) = P(T1) P(T2)

    Plugging these into

    P(D|T1, T2) = P(T1, T2|D) P(D) / P(T1, T2)

    we have

    P(D|T1, T2) = P(T1|D) P(T2|D) P(D) / [P(T1) P(T2)]

  • Slide 33/39

    Evaluating the normalizing constant again

    Expanding as before, we have

    P(D|T1, T2) = P(T1|D) P(T2|D) P(D) / Σ_{D∈{t,f}} P(T1|D) P(T2|D) P(D)

    Plugging in the numbers gives us

    P(D|T1, T2) = (0.9 × 0.9 × 0.001) / (0.9 × 0.9 × 0.001 + 0.1 × 0.1 × 0.999) = 0.075

    Another way to think about this:
    - What's the chance of 1 false positive from the test?
    - What's the chance of 2 false positives?

    The chance of 2 false positives (0.1 × 0.1 = 0.01) is still 10× more likely than the prior probability of having the disease (0.001).
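    The same computation as a two-test sketch, reusing the numbers above:

    def posterior_two_tests(p_t_d, p_t_not_d, prior):
        """P(D|T1,T2) after two positive tests, independent given D."""
        num = p_t_d * p_t_d * prior
        den = num + p_t_not_d * p_t_not_d * (1 - prior)
        return num / den

    print(posterior_two_tests(0.9, 0.1, 0.001))  # ~0.075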

  • Slide 34/39

    Simpler: combining information the Bayesian way

    Let's look at the equation again:

    P(D|T1, T2) = P(T1|D) P(T2|D) P(D) / [P(T1) P(T2)]

    If we rearrange slightly:

    P(D|T1, T2) = P(T2|D) / P(T2) × [P(T1|D) P(D) / P(T1)]

    We've seen this before! The bracketed factor,

    P(D|T1) = P(T1|D) P(D) / P(T1),

    is the posterior for the first test, which we just computed.

  • Slide 35/39

    The old posterior is the new prior

    We can just plug in the value of the old posterior; it plays exactly the same role as our old prior:

    P(D|T1, T2) = P(T2|D) P(D|T1) / P(T2) = P(T2|D) × 0.0089 / P(T2)

    Expanding the normalizing constant as before,

    P(D|T) = P(T|D) P(D) / [P(T|D) P(D) + P(T|¬D) P(¬D)]

    and plugging in the numbers gives the same answer:

    P(D|T1, T2) = (0.9 × 0.0089) / (0.9 × 0.0089 + 0.1 × 0.9911) = 0.075

    This is how Bayesian reasoning combines old information with new information to update our belief states.
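    The sequential view in code: apply the single-test update twice, feeding the first posterior back in as the new prior. With the independence assumptions above it reproduces the joint two-test answer.

    def update(prior, p_t_d=0.9, p_t_not_d=0.1):
        """One Bayesian update for a single positive test."""
        evidence = p_t_d * prior + p_t_not_d * (1 - prior)
        return p_t_d * prior / evidence

    belief = 0.001            # prior P(D)
    belief = update(belief)   # after test 1: ~0.0089
    belief = update(belief)   # after test 2: ~0.075
    print(belief)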

  • Slide 36/39

    Example 1.2 (Hamburgers). Consider the following fictitious scientific information: doctors find that people with Kreuzfeld-Jacob disease (KJ) almost invariably ate hamburgers, thus p(Hamburger Eater|KJ) = 0.9. The probability of an individual having KJ is currently rather low, about one in 100,000.

    1. Assuming eating lots of hamburgers is rather widespread, say p(Hamburger Eater) = 0.5, what is the probability that a hamburger eater will have Kreuzfeld-Jacob disease?

    This may be computed as

    p(KJ|Hamburger Eater) = p(Hamburger Eater, KJ) / p(Hamburger Eater)
                          = p(Hamburger Eater|KJ) p(KJ) / p(Hamburger Eater)   (1.2.1)
                          = (9/10 × 1/100000) / (1/2) = 1.8 × 10⁻⁵   (1.2.2)

    2. If the fraction of people eating hamburgers was rather small, p(Hamburger Eater) = 0.001, what is the probability that a regular hamburger eater will have Kreuzfeld-Jacob disease? Repeating the above calculation, this is given by

    (9/10 × 1/100000) / (1/1000) ≈ 1/100   (1.2.3)

    This is much higher than in scenario (1), since here we can be more sure that eating hamburgers is related to the illness.

  • Slide 37/39

    Example 1.3 (Inspector Clouseau). Inspector Clouseau arrives at the scene of a crime. The victim lies dead in the room alongside the possible murder weapon, a knife. The Butler (B) and Maid (M) are the inspector's main suspects, and the inspector has a prior belief of 0.6 that the Butler is the murderer, and a prior belief of 0.2 that the Maid is the murderer. These beliefs are independent in the sense that p(B, M) = p(B)p(M). (It is possible that both the Butler and the Maid murdered the victim, or neither.) The inspector's prior criminal knowledge can be formulated mathematically as follows:

    dom(B) = dom(M) = {murderer, not murderer},  dom(K) = {knife used, knife not used}   (1.2.4)

    p(B = murderer) = 0.6,  p(M = murderer) = 0.2   (1.2.5)

    p(knife used | B = not murderer, M = not murderer) = 0.3
    p(knife used | B = not murderer, M = murderer)     = 0.2
    p(knife used | B = murderer,     M = not murderer) = 0.6
    p(knife used | B = murderer,     M = murderer)     = 0.1
       (1.2.6)

    In addition, p(K, B, M) = p(K|B, M) p(B) p(M). Assuming that the knife is the murder weapon, what is the probability that the Butler is the murderer? (Remember that it might be that neither is the murderer.) Using b for the two states of B and m for the two states of M:

    p(B|K) = Σ_m p(B, m|K)
           = Σ_m p(B, m, K) / p(K)
           = Σ_m p(K|B, m) p(B, m) / Σ_{b,m} p(K|b, m) p(b, m)
           = p(B) Σ_m p(K|B, m) p(m) / Σ_b p(b) Σ_m p(K|b, m) p(m)   (1.2.7)
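    Evaluating (1.2.7) numerically is a short sketch; the states and numbers are exactly those of (1.2.4)-(1.2.6):

    # True means "murderer" for B and M; p_k is p(knife used | b, m).
    p_b = {True: 0.6, False: 0.4}
    p_m = {True: 0.2, False: 0.8}
    p_k = {(False, False): 0.3, (False, True): 0.2,
           (True,  False): 0.6, (True,  True): 0.1}

    def butler_term(b):
        """p(B=b) * sum over m of p(K=used | b, m) p(m)."""
        return p_b[b] * sum(p_k[(b, m)] * p_m[m] for m in (True, False))

    posterior_butler = butler_term(True) / (butler_term(True) + butler_term(False))
    print(posterior_butler)  # ~0.73: the evidence points to the Butler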

  • Slide 38/39

    Example 1.5 (Aristotle: Resolution). We can represent the statement "all apples are fruit" by p(F = tr|A = tr) = 1. Similarly, "all fruits grow on trees" may be represented by p(T = tr|F = tr) = 1. Additionally we assume that whether or not something grows on a tree depends only on whether or not it is a fruit, p(T|A, F) = p(T|F). From this we can compute

    p(T = tr|A = tr) = Σ_F p(T = tr|F, A = tr) p(F|A = tr)
                     = Σ_F p(T = tr|F) p(F|A = tr)
                     = p(T = tr|F = fa) p(F = fa|A = tr) + p(T = tr|F = tr) p(F = tr|A = tr)

    where p(F = fa|A = tr) = 0 and p(T = tr|F = tr) = p(F = tr|A = tr) = 1, so

    p(T = tr|A = tr) = 1   (1.2.16)

    In other words, we have deduced that "all apples grow on trees" is a true statement, based on the information presented. (This kind of reasoning is called resolution and is a form of transitivity: from the statements A ⇒ F and F ⇒ T we can infer A ⇒ T.)

  • Slide 39/39

    Next time

    Bayesian belief networks