Bayesian Network

Post on 19-Dec-2015


Page 1: Bayesian Network. Introduction Independence assumptions Seems to be necessary for probabilistic inference to be practical. Naïve Bayes Method Makes independence

Bayesian Network

Page 2:

Introduction

Independence assumptions seem to be necessary for probabilistic inference to be practical.

Naïve Bayes Method: makes independence assumptions that are often not true. It is also called the Idiot Bayes Method for this reason.

Bayesian Network: explicitly models the independence relationships in the data and uses them to make probabilistic inferences. Also known as: Belief Net, Bayes Net, Causal Net, ...

Page 3:

Why Bayesian Networks?

Intuitive language: can utilize causal knowledge in constructing models, and domain experts are comfortable building a network.

General-purpose "inference" algorithms, e.g., P(Bad Battery | Has Gas, Won't Start). Exact: modular specification leads to large computational efficiencies.

[Diagram: a small network over Battery, Gas, and Start.]

Page 4:

Random Variables

A random variable is a set of exhaustive and mutually exclusive possibilities.

Example: throwing a die: small {1,2}, medium {3,4}, large {5,6}

Medical data: patient's age, blood pressure

Variable vs. Event: a variable taking a value = an event.

Page 5:

Independence of Variables

An instantiation of a variable is an event. A set of variables is independent iff all possible instantiations of the variables are independent.

Example:
X: patient blood pressure {high, medium, low}
Y: patient sneezes {yes, no}
P(X=high, Y=yes) = P(X=high) x P(Y=yes)
P(X=high, Y=no) = P(X=high) x P(Y=no)
...
P(X=low, Y=yes) = P(X=low) x P(Y=yes)
P(X=low, Y=no) = P(X=low) x P(Y=no)

Conditional independence between a set of variables holds iff the conditional independence between all possible instantiations of the variables holds.
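This definition can be checked mechanically: a set of variables is independent exactly when the product rule holds for every instantiation. A minimal sketch for the blood-pressure/sneezing example, with made-up marginal probabilities (the numbers are illustrative, not from the slides):

```python
from itertools import product

# Illustrative marginals for the example variables (numbers are made up)
px = {"high": 0.2, "medium": 0.5, "low": 0.3}   # X: blood pressure
py = {"yes": 0.1, "no": 0.9}                    # Y: sneezes

# A joint built as the product of the marginals, so X and Y are independent
joint = {(x, y): px[x] * py[y] for x, y in product(px, py)}

# Variable-level independence holds iff the product rule holds for
# EVERY instantiation (every cell of the joint), as stated above.
independent = all(abs(joint[x, y] - px[x] * py[y]) < 1e-12
                  for x, y in product(px, py))
print(independent)  # True
```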

Page 6:

Bayesian Networks: Definition

Bayesian networks are directed acyclic graphs (DAGs).

Nodes in Bayesian networks represent random variables, which are normally assumed to take on discrete values.

The links of the network represent direct probabilistic influence.

The structure of the network represents the probabilistic dependence/independence relationships between the random variables represented by the nodes.

Page 7:

Example

family-out (fo) → light-on (lo)
family-out (fo) → dog-out (do)
bowel-problem (bp) → dog-out (do)
dog-out (do) → hear-bark (hb)

Page 8:

Bayesian Network: Probabilities

The nodes and links are quantified with probability distributions.

The root nodes (those with no ancestors) are assigned prior probability distributions.

The other nodes are assigned with the conditional probability distribution of the node given its parents.

Page 9:

Example

family-out (fo) → light-on (lo)
family-out (fo) → dog-out (do)
bowel-problem (bp) → dog-out (do)
dog-out (do) → hear-bark (hb)

P(fo) = .15    P(bp) = .01

P(lo | fo) = .6    P(lo | ¬fo) = .05

P(do | fo, bp) = .99    P(do | fo, ¬bp) = .90    P(do | ¬fo, bp) = .97    P(do | ¬fo, ¬bp) = .3

P(hb | do) = .7    P(hb | ¬do) = .01

Conditional Probability Tables (CPTs)

Page 10:

Noisy-OR-Gate

Exception Independence

Noisy-OR-Gate: the exceptions (inhibitors) to the causations are independent.

P(E | C1, C2) = 1 - (1 - P(E | C1))(1 - P(E | C2))

[Diagram: causes C1 and C2, each with its own independent inhibitor, feeding effect E.]
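The formula generalizes to any number of causes: the effect is absent only if every present cause's independent inhibitor fires. A small sketch (the function name is mine):

```python
def noisy_or(cause_probs):
    """P(E | the given causes are present) under a noisy-OR gate.
    cause_probs[k] = P(E | cause k alone); the exceptions (inhibitors)
    are assumed independent, so all of them must fire for E to be absent."""
    p_all_inhibited = 1.0
    for p in cause_probs:
        p_all_inhibited *= (1.0 - p)
    return 1.0 - p_all_inhibited

# Matches the two-cause formula on the slide:
# P(E | C1, C2) = 1 - (1 - P(E|C1))(1 - P(E|C2)) = 1 - 0.2 * 0.5 = 0.9
print(noisy_or([0.8, 0.5]))
```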

Page 11:

Inference in Bayesian Networks

Given a Bayesian network and its CPTs, we can compute probabilities of the following form:

P(H | E1, E2, ..., En)

where H, E1, E2, ..., En are assignments to nodes (random variables) in the network.

Example:

The probability of family-out given light-on and hear-bark:

P(fo | lo, hb).
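One way to evaluate such a query, feasible for a network this small, is inference by enumeration: sum the joint over the unobserved variables and normalize. A sketch using the CPT values given elsewhere in these slides (variable order fo, bp, lo, do, hb; True means the event holds):

```python
from itertools import product

def joint(fo, bp, lo, do, hb):
    """P(fo, bp, lo, do, hb): the product of the network's CPT entries."""
    p = (0.15 if fo else 0.85) * (0.01 if bp else 0.99)
    p_lo = 0.6 if fo else 0.05
    p *= p_lo if lo else 1 - p_lo
    p_do = {(True, True): 0.99, (True, False): 0.90,
            (False, True): 0.97, (False, False): 0.3}[(fo, bp)]
    p *= p_do if do else 1 - p_do
    p_hb = 0.7 if do else 0.01
    return p * (p_hb if hb else 1 - p_hb)

# P(fo | lo, hb): sum out bp and do, then normalize over fo.
num = sum(joint(True, bp, True, do, True)
          for bp, do in product((True, False), repeat=2))
den = sum(joint(fo, bp, True, do, True)
          for fo, bp, do in product((True, False), repeat=3))
print(round(num / den, 4))  # 0.8579
```

So the light being on and the dog barking make it quite likely the family is out.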

Page 12:

Example: Car Diagnosis

Page 13:

MammoNet

Page 14:

Applications of Bayesian Networks

[Diagram: a network of excuses for a late paper, with nodes: Student Ill, Doctor's Note, Frequent Absence, Student Disinterested, Computer Crash, Paper Late, Dog Ate It, "Dog Ate It", Power Failure, Clocks Slow, Lives in Dorm, Hungry Dog, "Computer Crash", Make up Excuse.]

Page 15:

Application of Bayesian Network: Diagnosis

Page 16:

ARCO1: Forecasting Oil Prices

Page 17:

ARCO1: Forecasting Oil Prices

Page 18:

Semantics of Belief Networks

Two ways of understanding the semantics of a belief network:

Representation of the joint probability distribution

Encoding of a collection of conditional independence statements

Page 19:

Terminology

[Diagram: a node X with parents Y1 and Y2, illustrating the terms Parent, Ancestor, Descendant, and Non-descendant.]

Page 20:

Three Types of Connections

Linear: a → b → c (e.g., family-out → dog-out → hear-bark)

Converging: a → b ← c (e.g., family-out → dog-out ← bowel-problem)

Diverging: b ← a → c (e.g., light-on ← family-out → dog-out)

Page 21:

Connection Pattern and Independence

Linear connection: the two end variables are usually dependent on each other. The middle variable renders them independent. (family-out → dog-out → hear-bark)

Converging connection: the two end variables are usually independent of each other. The middle variable renders them dependent. (family-out → dog-out ← bowel-problem)

Diverging connection: the two end variables are usually dependent on each other. The middle variable renders them independent. (light-on ← family-out → dog-out)

Page 22:

D-Separation

A variable a is d-separated from b by a set of variables E if there does not exist a d-connecting path between a and b, i.e., a path such that:

None of its linear or diverging nodes is in E.

For each of its converging nodes, either it or one of its descendants is in E.

Intuition: the influence between a and b must propagate through a d-connecting path.

Page 23:

If a and b are d-separated by E, then they are conditionally independent of each other given E:

P(a, b | E) = P(a | E) x P(b | E)
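This consequence can be spot-checked numerically. In the family-out example from these slides, light-on and dog-out are connected only divergently through family-out, so observing fo should make them independent: P(lo, do | fo) = P(lo | fo) P(do | fo). A brute-force check over the joint (a sketch; the helper names are mine):

```python
from itertools import product

def joint(fo, bp, lo, do, hb):
    """Joint probability of the family-out network from its CPTs."""
    p = (0.15 if fo else 0.85) * (0.01 if bp else 0.99)
    p_lo = 0.6 if fo else 0.05
    p *= p_lo if lo else 1 - p_lo
    p_do = {(True, True): 0.99, (True, False): 0.90,
            (False, True): 0.97, (False, False): 0.3}[(fo, bp)]
    p *= p_do if do else 1 - p_do
    p_hb = 0.7 if do else 0.01
    return p * (p_hb if hb else 1 - p_hb)

def prob(**fixed):
    """Marginal probability of the fixed assignments, by enumeration."""
    names = ("fo", "bp", "lo", "do", "hb")
    return sum(joint(*vals) for vals in product((True, False), repeat=5)
               if all(dict(zip(names, vals))[k] == v for k, v in fixed.items()))

lhs = prob(lo=True, do=True, fo=True) / prob(fo=True)
rhs = (prob(lo=True, fo=True) / prob(fo=True)) * \
      (prob(do=True, fo=True) / prob(fo=True))
print(abs(lhs - rhs) < 1e-9)  # True: lo and do are independent given fo
```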

Page 24:

Example Independence Relationships

family-out (fo) → light-on (lo)
family-out (fo) → dog-out (do)
bowel-problem (bp) → dog-out (do)
dog-out (do) → hear-bark (hb)

Page 25:

Chain Rule

A joint probability distribution can be expressed as a product of conditional probabilities:

P(X1, X2, ..., Xn)
= P(X1) x P(X2, X3, ..., Xn | X1)
= P(X1) x P(X2|X1) x P(X3, X4, ..., Xn | X1, X2)
= P(X1) x P(X2|X1) x P(X3|X1,X2) x P(X4, ..., Xn | X1, X2, X3)
...
= P(X1) x P(X2|X1) x P(X3|X1,X2) x ... x P(Xn | X1, ..., Xn-1)

This has nothing to do with any independence assumption!

Page 26:

Compute the Joint Probability

Given a Bayesian network, let X1, X2, ..., Xn be an ordering of the nodes such that only the nodes indexed lower than i may have a directed path to Xi.

Since the parents of Xi d-separate Xi from all the other nodes indexed lower than i,

P(Xi | X1, ..., Xi-1) = P(Xi | parents(Xi))

This probability is available in the Bayesian network. Therefore, P(X1, X2, ..., Xn) can be computed from the probabilities available in the Bayesian network.
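The factorization can be written down directly as code. A sketch (the dictionary-based CPT encoding is mine) that reads each factor P(Xi | parents(Xi)) off the example network's CPTs and confirms the resulting joint sums to 1:

```python
from itertools import product

# Parents and CPTs of the family-out example; cpt[v] maps a tuple of
# parent values to P(v = True | those parent values).
parents = {"fo": [], "bp": [], "lo": ["fo"], "do": ["fo", "bp"], "hb": ["do"]}
cpt = {
    "fo": {(): 0.15},
    "bp": {(): 0.01},
    "lo": {(True,): 0.6, (False,): 0.05},
    "do": {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.97, (False, False): 0.3},
    "hb": {(True,): 0.7, (False,): 0.01},
}

def joint(assign):
    """P(X1, ..., Xn) = product over i of P(Xi | parents(Xi))."""
    p = 1.0
    for var, pars in parents.items():
        p_true = cpt[var][tuple(assign[q] for q in pars)]
        p *= p_true if assign[var] else 1.0 - p_true
    return p

# Sanity check: the factorized joint sums to 1 over all 2^5 assignments.
total = sum(joint(dict(zip(parents, vals)))
            for vals in product((True, False), repeat=5))
print(round(total, 10))  # 1.0
```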

Page 27:

family-out (fo) → light-on (lo)
family-out (fo) → dog-out (do)
bowel-problem (bp) → dog-out (do)
dog-out (do) → hear-bark (hb)

P(fo) = .15    P(bp) = .01

P(lo | fo) = .6    P(lo | ¬fo) = .05

P(do | fo, bp) = .99    P(do | fo, ¬bp) = .90    P(do | ¬fo, bp) = .97    P(do | ¬fo, ¬bp) = .3

P(hb | do) = .7    P(hb | ¬do) = .01

P(fo, ¬lo, do, hb, ¬bp) = ?
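As a worked sketch (not part of the original slides), the query is just the product of the five CPT entries picked out by the assignment:

```python
# P(fo, ¬lo, do, hb, ¬bp)
#   = P(fo) P(¬bp) P(¬lo | fo) P(do | fo, ¬bp) P(hb | do)
p = 0.15 * 0.99 * (1 - 0.6) * 0.90 * 0.7
print(round(p, 6))  # 0.037422
```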

Page 28:

What can Bayesian Networks Compute?

The input to a Bayesian network evaluation algorithm is a set of evidence, e.g.,

E = { hear-bark = true, light-on = true }

The outputs of a Bayesian network evaluation algorithm are

P(Xi = v | E)

where Xi is a variable in the network.

For example, P(family-out = true | E) is the probability of the family being out given hearing the dog's bark and seeing the lights on.

Page 29:

Computation in Bayesian Networks

Computation in Bayesian networks is NP-hard. In the worst case, all known algorithms for computing the probabilities take time exponential in the size of the network.

There are two ways around the complexity barrier:

Algorithms for special subclasses of networks, e.g., singly connected networks.

Approximate algorithms.

The computation for singly connected graphs is linear in the size of the network.

Page 30:

Bayesian Network Inference

Inference: calculating P(X | Y) for some variables or sets of variables X and Y. Inference in Bayesian networks is #P-hard!

It reduces to counting: how many satisfying assignments does the corresponding Boolean circuit have?

[Diagram: inputs I1 ... I5 feeding an output node O.]

Inputs: prior probabilities of .5

P(O = true) must be (#sat. assignments) * (.5 ^ #inputs)

www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
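The counting argument is easy to reproduce for a tiny circuit. Here O is simply the OR of the five inputs (a made-up concrete choice; the slide does not specify the circuit), so counting satisfying assignments gives P(O = true) exactly as the formula above says:

```python
from itertools import product

n = 5  # number of inputs, each true with prior probability 0.5

# Count assignments that satisfy O = I1 or I2 or ... or I5
sat = sum(1 for bits in product((False, True), repeat=n) if any(bits))
p_o = sat * 0.5 ** n  # (#sat. assignments) * (0.5 ^ #inputs)
print(sat, p_o)       # 31 0.96875
```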

Page 31:

Bayesian Network Inference

But… inference is still tractable in some cases. Let's look at a special class of networks: trees / forests, in which each node has at most one parent.


Page 32:

Decomposing the probabilities

Suppose we want P(Xi | E), where E is some set of evidence variables. Let's split E into two parts:

Ei- is the part consisting of assignments to variables in the subtree rooted at Xi.

Ei+ is the rest of it.


Page 33:

Decomposing the probabilities

P(Xi | E) = P(Xi | Ei-, Ei+)
          = P(Ei- | Xi, Ei+) P(Xi | Ei+) / P(Ei- | Ei+)
          = P(Ei- | Xi) P(Xi | Ei+) / P(Ei- | Ei+)
          = α π(Xi) λ(Xi)

(The third step uses the fact that Ei- is independent of Ei+ given Xi.)

Where:
• α is a constant independent of Xi
• π(Xi) = P(Xi | Ei+)
• λ(Xi) = P(Ei- | Xi)


Page 34:

Using the decomposition for inference

We can use this decomposition to do inference as follows. First, compute λ(Xi) = P(Ei- | Xi) for all Xi recursively, using the leaves of the tree as the base case.

If Xi is a leaf:

If Xi is in E: λ(Xi) = 1 if Xi matches the evidence, 0 otherwise.

If Xi is not in E: Ei- is the null set, so P(Ei- | Xi) = 1 (constant).


Page 35:

Quick aside: “Virtual evidence”

For theoretical simplicity, but without loss of generality, let's assume that all variables in E (the evidence set) are leaves in the tree.

Why we can do this WLOG: observing an internal node Xi is equivalent to observing a new leaf child Xi' of Xi, where P(Xi' | Xi) = 1 if Xi' = Xi, 0 otherwise.

[Diagram: Xi with an attached evidence leaf Xi'.]


Page 36:

Calculating λ(Xi) for non-leaves

Suppose Xi has one child, Xj. Then:

λ(Xi) = P(Ei- | Xi)
      = Σ_Xj P(Xj, Ei- | Xi)
      = Σ_Xj P(Xj | Xi) P(Ei- | Xi, Xj)
      = Σ_Xj P(Xj | Xi) P(Ej- | Xj)
      = Σ_Xj P(Xj | Xi) λ(Xj)


Page 37:

Calculating λ(Xi) for non-leaves

Now, suppose Xi has a set of children, C. Since Xi d-separates each of its subtrees, the contribution of each subtree to λ(Xi) is independent:

λ(Xi) = P(Ei- | Xi) = Π_{Xj ∈ C} λj(Xi)

where λj(Xi) = Σ_Xj P(Xj | Xi) λ(Xj)

is the contribution to P(Ei- | Xi) of the part of the evidence lying in the subtree rooted at one of Xi's children Xj.


Page 38:

We are now λ-happy

So now we have a way to recursively compute all the λ(Xi)'s, starting from the root and using the leaves as the base case. If we want, we can think of each node in the network as an autonomous processor that passes a little "λ message" to its parent.


Page 39:

The other half of the problem

Remember, P(Xi | E) = α π(Xi) λ(Xi). Now that we have all the λ(Xi)'s, what about the π(Xi)'s?

π(Xi) = P(Xi | Ei+).

What about the root of the tree, Xr? In that case, Er+ is the null set, so π(Xr) = P(Xr). No sweat. Since we also know λ(Xr), we can compute the final P(Xr | E).

So for an arbitrary Xi with parent Xp, let's inductively assume we know π(Xp) and/or P(Xp | E). How do we get π(Xi)?


Page 40:

Computing π(Xi)

[Diagram: parent Xp with child Xi.]

π(Xi) = P(Xi | Ei+)
      = Σ_Xp P(Xi, Xp | Ei+)
      = Σ_Xp P(Xi | Xp, Ei+) P(Xp | Ei+)
      = Σ_Xp P(Xi | Xp) P(Xp | Ei+)
      = Σ_Xp P(Xi | Xp) πi(Xp)

where πi(Xp) is defined as

πi(Xp) = P(Xp | E) / λi(Xp)

(dividing out the λ contribution of Xi's own subtree, so the evidence below Xi is not double-counted; the constant is absorbed into α).


Page 41:

We’re done. Yay!

Thus we can compute all the π(Xi)'s and, in turn, all the P(Xi | E)'s. We can think of the nodes as autonomous processors passing λ and π messages to their neighbors.

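The whole λ/π scheme fits in a few lines for a three-node chain A → B → C with evidence C = true (all numbers are made up; the result is checked against brute-force enumeration):

```python
from itertools import product

pA = {True: 0.3, False: 0.7}   # prior on the root A
pB = {True: 0.8, False: 0.1}   # pB[a] = P(B=true | A=a)
pC = {True: 0.9, False: 0.2}   # pC[b] = P(C=true | B=b)

# lambda messages, propagated from the evidence leaf C = true up to the root:
lam_C = {True: 1.0, False: 0.0}  # evidence leaf: 1 if it matches E, else 0
lam_B = {b: sum((pC[b] if c else 1 - pC[b]) * lam_C[c] for c in (True, False))
         for b in (True, False)}
lam_A = {a: sum((pB[a] if b else 1 - pB[a]) * lam_B[b] for b in (True, False))
         for a in (True, False)}

# At the root, pi(A) = P(A); the belief is alpha * pi(A) * lambda(A):
unnorm = {a: pA[a] * lam_A[a] for a in (True, False)}
alpha = 1.0 / sum(unnorm.values())
belief = {a: alpha * unnorm[a] for a in (True, False)}

# Brute-force check of P(A | C=true) by enumeration over the joint:
def joint(a, b, c):
    return (pA[a] * (pB[a] if b else 1 - pB[a])
                  * (pC[b] if c else 1 - pC[b]))

num = sum(joint(True, b, True) for b in (True, False))
den = sum(joint(a, b, True) for a, b in product((True, False), repeat=2))
print(round(belief[True], 6), round(num / den, 6))  # the two agree
```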

Page 42:

Conjunctive queries

What if we want, e.g., P(A, B | C) instead of just the marginal distributions P(A | C) and P(B | C)?

Just use the chain rule: P(A, B | C) = P(A | C) P(B | A, C). Each of the latter probabilities can be computed using the technique just discussed.


Page 43:

Polytrees

The technique can be generalized to polytrees: the undirected version of the graph is still a tree, but nodes can have more than one parent.


Page 44:

Dealing with cycles

Can deal with undirected cycles in the graph in two ways:

Clustering: merge variables together (e.g., combine B and C into a single node BC).

Conditioning: instantiate a variable, once for each of its values (e.g., set A to 0, then set A to 1), and combine the results.

[Diagram: a network A → B, A → C, B → D, C → D, handled either by clustering B and C into BC or by conditioning on A.]


Page 45:

Join trees

An arbitrary Bayesian network can be transformed via some evil graph-theoretic magic into a join tree, in which a similar method can be employed.

[Diagram: a network over A, B, C, D, E, F, G transformed into a join tree with cluster nodes such as ABC, BCD, and DF.]

In the worst case the join-tree nodes must take on exponentially many combinations of values, but the method often works well in practice.
