
Page 1: Reasoning Under Uncertainty

Reasoning Under Uncertainty

Radu Marinescu, 4C @ University College Cork

Page 2: Reasoning Under Uncertainty

Why uncertainty?
• Uncertainty in medical diagnosis
  – Diseases produce symptoms
  – In diagnosis, observed symptoms => disease ID
  – Uncertainties:
    • Symptoms may not occur
    • Symptoms may not be reported
    • Diagnostic tests are not perfect (false positive, false negative)
• How do we estimate confidence? P(disease | symptoms, tests) = ?

Page 3: Reasoning Under Uncertainty

Why uncertainty?
• Uncertainty in medical decision-making
  – Physicians and patients must decide on treatments
  – Treatments may not be successful
  – Treatments may have unpleasant side effects
• Choosing treatments means weighing the risks of adverse outcomes
• People are bad at reasoning intuitively about probabilities; probability theory provides a systematic analysis

Page 4: Reasoning Under Uncertainty

Outline
• Probabilistic modeling with joint distributions
• Conditional independence and factorization
• Belief (or Bayesian) networks
  – Example networks and software
• Inference in belief networks
  – Exact inference: variable elimination, join-tree clustering, AND/OR search
  – Approximate inference: mini-clustering, belief propagation, sampling

Page 5: Reasoning Under Uncertainty

Bibliography
• Judea Pearl. "Probabilistic Reasoning in Intelligent Systems", 1988
• Stuart Russell & Peter Norvig. "Artificial Intelligence: A Modern Approach", 2002 (Ch. 13-17)
• Kevin Murphy. "A Brief Introduction to Graphical Models and Bayesian Networks". http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
• Rina Dechter. "Bucket Elimination: A Unifying Framework for Probabilistic Inference". http://www.ics.uci.edu/~csp/R48a.ps
• Rina Dechter. "Mini-Buckets: A General Scheme for Approximating Inference". http://www.ics.uci.edu/~csp/r62a.pdf
• Rina Dechter & Robert Mateescu. "AND/OR Search Spaces for Graphical Models". http://www.ics.uci.edu/~csp/r126.pdf

Page 6: Reasoning Under Uncertainty

Reasoning under uncertainty
• A problem domain is modeled by a list of (discrete) random variables: X1, X2, …, Xn
• Knowledge about the problem is represented by a joint probability distribution: P(X1, X2, …, Xn)

Page 7: Reasoning Under Uncertainty

Example
• Alarm (Pearl88)
  – Story: In Los Angeles, burglary and earthquake are common. Both can trigger an alarm. In case of alarm, two neighbors, John and Mary, may call 911.
  – Problem: estimate the probability of a burglary based on who has or has not called
  – Variables: Burglary (B), Earthquake (E), Alarm (A), JohnCalls (J), MaryCalls (M)
  – Knowledge required by the probabilistic approach in order to solve this problem: P(B, E, A, J, M)

Page 8: Reasoning Under Uncertainty

Joint probability distribution
• Defines probabilities for all possible value assignments to the variables in the set

Page 9: Reasoning Under Uncertainty

Inference with joint probability distribution

• What is the probability of burglary given that Mary called, P(B=y | M=y)?

• Compute the marginal probability:

  P(B, M) = Σ_{E,A,J} P(B, E, A, J, M)

  B M | P(B,M)
  y y | 0.000115
  y n | 0.000075
  n y | 0.00015
  n n | 0.99971

• Compute the answer (reasoning by conditioning):

  P(B=y | M=y) = P(B=y, M=y) / P(M=y) = 0.000115 / (0.000115 + 0.00015) ≈ 0.43
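The computation above is easy to mechanize. Below is a minimal Python sketch of reasoning by conditioning; it starts from the P(B,M) marginal in the table above (the full 32-entry joint is omitted), and the function and variable names are ours, not from any library.

```python
# Reasoning by conditioning on a marginal of the joint distribution.
# P_BM holds the P(B,M) table from the slide above.
P_BM = {
    ('y', 'y'): 0.000115,
    ('y', 'n'): 0.000075,
    ('n', 'y'): 0.00015,
    ('n', 'n'): 0.99971,
}

def condition_on_m(p_bm, m):
    """Return P(B | M=m) by normalizing the matching entries."""
    z = sum(p for (b, mv), p in p_bm.items() if mv == m)   # P(M=m)
    return {b: p / z for (b, mv), p in p_bm.items() if mv == m}

print(condition_on_m(P_BM, 'y'))   # {'y': 0.434..., 'n': 0.566...}
```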

Page 10: Reasoning Under Uncertainty

Advantages
• Probability theory is well-established and well understood
• In theory, one can perform arbitrary inference among the variables given a joint probability, because the joint contains information about all aspects of the relationships among the variables:
  – Diagnostic inference (from effects to causes), e.g., P(B=y | M=y)
  – Predictive inference (from causes to effects), e.g., P(M=y | B=y)
  – Combining evidence, e.g., P(B=y | J=y, M=y, E=n)
• All inference is sanctioned by probability theory and hence has clear semantics

Page 11: Reasoning Under Uncertainty

Difficulty: complexity in model construction and inference
• In the Alarm example: 32 numbers (parameters) are needed
  – Quite unnatural to assess, e.g., P(B=y, E=y, A=y, J=y, M=y)
  – Computing P(B=y | M=y) takes 29 additions
• In general, P(X1, X2, …, Xn) needs at least 2^n numbers to specify the joint probability distribution
  – Knowledge acquisition is difficult (complex, unnatural)
  – Exponential storage and inference

Page 12: Reasoning Under Uncertainty

Outline
• Probabilistic modeling with joint distributions
• Conditional independence and factorization
• Belief networks
  – Example networks and software
• Inference in belief networks
  – Exact inference
  – Approximate inference
• Miscellaneous: mixed networks, influence diagrams, etc.

Page 13: Reasoning Under Uncertainty

Chain rule and factorization
• Overcome the problem of exponential size by exploiting conditional independencies
• The chain rule of probability:

  P(X1, X2) = P(X1) P(X2 | X1)
  P(X1, X2, X3) = P(X1) P(X2 | X1) P(X3 | X1, X2)
  …
  P(X1, X2, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, X_{i-1})

• No gains yet: the number of parameters required by the factors is still O(2^n)

Page 14: Reasoning Under Uncertainty

Conditional independence
• A random variable X is conditionally independent of a set of random variables Y given a set of random variables Z if P(X | Y, Z) = P(X | Z)
• Intuitively:
  – Y tells us nothing more about X than we already know by knowing Z
  – As far as X is concerned, we can ignore Y if we know Z

Page 15: Reasoning Under Uncertainty

Conditional independence
• About P(Xi | X1, …, X_{i-1}):
  – Domain knowledge usually allows one to identify a subset pa(Xi) ⊆ {X1, …, X_{i-1}} such that, given pa(Xi), Xi is independent of all variables in {X1, …, X_{i-1}} \ pa(Xi), i.e.,

    P(Xi | X1, …, X_{i-1}) = P(Xi | pa(Xi))

• Then:

  P(X1, X2, …, Xn) = ∏_{i=1}^{n} P(Xi | pa(Xi))

• The joint distribution is factorized!
• The number of parameters may have been substantially reduced

Page 16: Reasoning Under Uncertainty

Example continued

• pa(B) = {}, pa(E) = {}, pa(A) = {B,E}, pa(J) = {A}, pa(M) = {A}
• Conditional probability tables (CPTs):

P(B, E, A, J, M) = P(B) P(E|B) P(A|B,E) P(J|B,E,A) P(M|B,E,A,J)
                 = P(B) P(E) P(A|B,E) P(J|A) P(M|A)

B | P(B)        E | P(E)
Y | .01         Y | .02
N | .99         N | .98

A B E | P(A|B,E)
Y Y Y | .95
N Y Y | .05
Y Y N | .94
N Y N | .06
Y N Y | .29
N N Y | .71
Y N N | .001
N N N | .999

J A | P(J|A)    M A | P(M|A)
Y Y | .7        Y Y | .9
N Y | .3        N Y | .1
Y N | .01       Y N | .05
N N | .99       N N | .95

Page 17: Reasoning Under Uncertainty

Example continued
• Model size reduced from 32 to 2+2+4+4+8 = 20
• Model construction is easier:
  – Fewer parameters to assess
  – Parameters are more natural to assess, e.g., P(B=y), P(J=y | A=y), P(A=y | B=y, E=y), etc.
• Inference is easier (we will see this later)

Page 18: Reasoning Under Uncertainty

Outline
• Probabilistic modeling with joint distributions
• Conditional independence and factorization
• Belief networks
  – Example networks and software
• Inference in belief networks
  – Exact inference
  – Approximate inference

Page 19: Reasoning Under Uncertainty

From factorization to belief networks
• Graphically represent the conditional independence relationships:
  – Construct a directed graph by drawing an arc from Xj to Xi iff Xj ∈ pa(Xi)
  – Also attach the CPT P(Xi | pa(Xi)) to node Xi

[Figure: the Alarm DAG — arcs B → A, E → A, A → J, A → M — with the CPTs P(B), P(E), P(A|B,E), P(J|A), P(M|A) attached to their nodes.]

Page 20: Reasoning Under Uncertainty

Formal definition
• A belief network is a directed acyclic graph (DAG), where:
  – Each node represents a random variable
  – Each node is associated with the conditional probability of the node given its parents
• It represents the joint probability distribution:

  P(X1, X2, …, Xn) = ∏_{i=1}^{n} P(Xi | pa(Xi))

• A variable is conditionally independent of its non-descendants given its parents

Page 21: Reasoning Under Uncertainty

Independences in belief networks
• 3 basic independence structures:
  1. Chain: Burglary → Alarm → JohnCalls
  2. Common descendants: Burglary → Alarm ← Earthquake
  3. Common ancestors: JohnCalls ← Alarm → MaryCalls

Page 22: Reasoning Under Uncertainty

Independences in belief networks (Burglary → Alarm → JohnCalls)

1. JohnCalls is independent of Burglary given Alarm:

   P(J | A, B) = P(J | A)
   P(J, B | A) = P(J | A) P(B | A)

Page 23: Reasoning Under Uncertainty

Independences in belief networks (Burglary → Alarm ← Earthquake)

2. Burglary is independent of Earthquake when Alarm is not known. Burglary and Earthquake become dependent given Alarm!

   P(B, E) = P(B) P(E)
   P(B, E | A) ≠ P(B | A) P(E | A)

Page 24: Reasoning Under Uncertainty

Independences in belief networks (JohnCalls ← Alarm → MaryCalls)

3. MaryCalls is independent of JohnCalls given Alarm:

   P(J | A, M) = P(J | A)
   P(J, M | A) = P(J | A) P(M | A)

Page 25: Reasoning Under Uncertainty

Independences in belief networks
• A BN models many conditional independence relations relating distant variables and sets; they are defined in terms of a graphical criterion called d-separation
• d-separation ⇒ conditional independence:
  – Let X, Y and Z be three sets of nodes
  – If X and Y are d-separated by Z, then X and Y are conditionally independent given Z: P(X | Y, Z) = P(X | Z)
• d-separation in the graph: A is d-separated from B given C if every undirected path between them is blocked
• Path blocking: 3 cases that expand on the three basic independence structures

Page 26: Reasoning Under Uncertainty

Undirected path blocking
• With a "linear" substructure X → Z → Y: blocked if Z ∈ C
• With a "wedge" substructure X ← Z → Y (common ancestor): blocked if Z ∈ C
• With a "vee" substructure X → Z ← Y (common descendant): blocked if Z and all of its descendants are not in C

Page 27: Reasoning Under Uncertainty

Example

[Figure: a 5-node DAG with arcs 1 → 2, 1 → 3, 2 → 4, 3 → 4, 4 → 5.]

• X = {2} and Y = {3} are d-separated by Z = {1}:
  – path 2–1–3 is blocked by 1 ∈ Z
  – path 2–4–3 is blocked because 4 and all its descendants are outside Z

  P(X, Y | Z) = P(X | Z) P(Y | Z)

• X = {2} and Y = {3} are not d-separated by Z = {1, 5}:
  – path 2–1–3 is blocked by 1 ∈ Z
  – path 2–4–3 is activated because 5 (which is a descendant of 4) is in Z
  – learning the value of the consequence 5 renders its causes 2 and 3 dependent

  P(X, Y | Z) ≠ P(X | Z) P(Y | Z)
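One compact way to mechanize such checks is the ancestral-moral-graph criterion, which is equivalent to the path-blocking rules above: X and Y are d-separated by Z iff Z separates X from Y in the moralized graph of the ancestral subgraph of X ∪ Y ∪ Z. A minimal sketch on the 5-node example above; the helper names are ours.

```python
# d-separation test via the moralized ancestral graph.
from itertools import combinations

def ancestors(dag, nodes):
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in dag.get(stack.pop(), ()):      # dag: node -> parents
            if p not in seen:
                seen.add(p); stack.append(p)
    return seen

def d_separated(dag, X, Y, Z):
    keep = ancestors(dag, X | Y | Z)            # ancestral subgraph
    und = {v: set() for v in keep}
    for v in keep:
        ps = [p for p in dag.get(v, ()) if p in keep]
        for p in ps:
            und[v].add(p); und[p].add(v)
        for p, q in combinations(ps, 2):        # moralize: marry parents
            und[p].add(q); und[q].add(p)
    frontier, seen = list(X), set(X)            # search avoiding Z
    while frontier:
        v = frontier.pop()
        if v in Y:
            return False
        for u in und[v] - seen:
            if u not in Z:
                seen.add(u); frontier.append(u)
    return True

dag = {2: [1], 3: [1], 4: [2, 3], 5: [4]}       # the example above
print(d_separated(dag, {2}, {3}, {1}))          # True
print(d_separated(dag, {2}, {3}, {1, 5}))       # False
```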

Page 28: Reasoning Under Uncertainty

I-mapness
• Given a probability distribution P on a set of variables {X1, …, Xn}, a belief network B representing P is a minimal I-map (Pearl88):
  – I-mapness: every d-separation condition displayed in B corresponds to a valid conditional independence relationship in P
  – Minimal: none of the arrows in B can be deleted without destroying its I-mapness

Page 29: Reasoning Under Uncertainty

Full joint distribution in BN

Rewrite the full joint probability using the product rule:

P(B,E,A,J,M) = P(J|B,E,A,M) P(B,E,A,M) = P(J|A) P(B,E,A,M)
P(B,E,A,M)   = P(M|B,E,A) P(B,E,A)     = P(M|A) P(B,E,A)
P(B,E,A)     = P(A|B,E) P(B,E)         = P(A|B,E) P(B) P(E)

⇒ P(B,E,A,J,M) = P(J|A) P(M|A) P(A|B,E) P(B) P(E)

Page 30: Reasoning Under Uncertainty

Example network

[Figure: the "alarm" network for monitoring intensive-care patients — 37 variables, including HRBP, HREKG, CATECHOL, SAO2, EXPCO2, VENTLUNG, INTUBATION, PULMEMBOLUS, ANAPHYLAXIS, LVFAILURE, HYPOVOLEMIA, CVP, BP, and others.]

The "alarm" network: Monitoring Intensive-Care Patients. 37 variables, 509 parameters (instead of 2^37).

Page 31: Reasoning Under Uncertainty

Software
• GeNIe (University of Pittsburgh) – free: http://genie.sis.pitt.edu
• SamIam (UCLA) – free: http://reasoning.cs.ucla.edu/SamIam/
• Hugin – commercial: http://www.hugin.com
• Netica – commercial: http://www.norsys.com
• UCI Lab – free but no GUI: http://graphmod.ics.uci.edu/

Page 32: Reasoning Under Uncertainty

GeNIe screenshot

Page 33: Reasoning Under Uncertainty

Applications
• Belief networks are used in:
  – Genetic linkage analysis
  – Speech recognition
  – Medical diagnosis
  – Probabilistic error-correcting coding
  – Monitoring and diagnosis in distributed systems
  – Troubleshooting (Microsoft)
  – …

Page 34: Reasoning Under Uncertainty

Outline
• Probabilistic modeling with joint distributions
• Conditional independence and factorization
• Belief networks
• Inference in belief networks
  – Exact inference
  – Approximate inference

Page 35: Reasoning Under Uncertainty

Exact inference
• Variable elimination (inference)
  – Bucket elimination
  – Bucket-tree elimination
  – Cluster-tree elimination
• Conditioning (search)
  – VE+C hybrid
  – AND/OR search (tree, graph)

Page 36: Reasoning Under Uncertainty

Belief updating

[Figure: a BN with Smoking → Bronchitis, Smoking → Lung cancer, Lung cancer → X-ray, and {Lung cancer, Bronchitis} → Dyspnoea.]

P(Lung cancer = yes | Smoking = no, Dyspnoea = yes) = ?

Page 37: Reasoning Under Uncertainty

Probabilistic inference tasks
• Belief updating:

  BEL(Xi) = P(Xi = xi | evidence)

• Most probable explanation (MPE):

  x* = argmax_x P(x, e)

• Maximum a posteriori hypothesis (MAP):

  (a1*, …, ak*) = argmax_a Σ_{X \ A} P(x, e)

Page 38: Reasoning Under Uncertainty

Belief updating: P(X | evidence) = ?

[Figure: a BN with arcs A → B, A → C, {A,B} → D, {B,C} → E, and CPTs P(A), P(B|A), P(C|A), P(D|A,B), P(E|B,C).]

P(A | E=0) = α P(A, E=0)

P(A, E=0) = Σ_{E=0,D,C,B} P(A) P(B|A) P(C|A) P(D|A,B) P(E|B,C)
          = P(A) Σ_{E=0} Σ_D Σ_C P(C|A) Σ_B P(B|A) P(D|A,B) P(E|B,C)

The innermost sum defines λB(A,D,C,E) — this is Variable Elimination.

Page 39: Reasoning Under Uncertainty

Bucket elimination

[Figure: the BN above and its moral graph — moralization "marries the parents" of each node.]

Ordering: A, E, D, C, B

Bucket B: P(E|B,C), P(D|A,B), P(B|A)
Bucket C: P(C|A)
Bucket D:
Bucket E: E=0
Bucket A: P(A)

Page 40: Reasoning Under Uncertainty

The bucket operation

ELIMINATION: multiply (∏) and sum (Σ):

bucket(B): { P(E|B,C), P(D|A,B), P(B|A) }
λB(A,C,D,E) = Σ_B P(B|A) · P(D|A,B) · P(E|B,C)

OBSERVED BUCKET (evidence B=1 — instantiate instead of summing):

bucket(B): { P(E|B,C), P(D|A,B), P(B|A), B=1 }
λB(A) = P(B=1|A)
λB(A,D) = P(D|A,B=1)
λB(E,C) = P(E|B=1,C)

Page 41: Reasoning Under Uncertainty

Multiplying functions

Page 42: Reasoning Under Uncertainty

Summing out a variable
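These two operations — multiplying functions and summing out a variable — are all a bucket needs. A minimal sketch, assuming binary domains and factors stored as (variable list, table dict) pairs; the representation and names are ours.

```python
# The two bucket operations on tabular factors.
from itertools import product as cartesian

def multiply(f, g):
    fv, ft = f
    gv, gt = g
    out_vars = list(dict.fromkeys(fv + gv))      # union, order-preserving
    table = {}
    for vals in cartesian((0, 1), repeat=len(out_vars)):
        env = dict(zip(out_vars, vals))
        table[vals] = (ft[tuple(env[v] for v in fv)]
                       * gt[tuple(env[v] for v in gv)])
    return out_vars, table

def sum_out(f, var):
    fv, ft = f
    i = fv.index(var)
    out_vars = fv[:i] + fv[i + 1:]
    table = {}
    for vals, p in ft.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p     # marginalize var away
    return out_vars, table

# Example: sum B out of P(B|A) * P(D|A,B) (toy numbers for P(D|A,B))
f1 = (['B', 'A'], {(0, 0): .4, (1, 0): .6, (0, 1): .1, (1, 1): .9})
f2 = (['D', 'A', 'B'], {v: .5 for v in cartesian((0, 1), repeat=3)})
print(sum_out(multiply(f1, f2), 'B'))            # a function on A, D
```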

Page 43: Reasoning Under Uncertainty

Bucket elimination

∑∏ is the elimination operator:

Bucket B: P(E|B,C), P(D|A,B), P(B|A)  →  λB(A,D,C,E)
Bucket C: P(C|A), λB(A,D,C,E)         →  λC(A,D,E)
Bucket D: λC(A,D,E)                   →  λD(A,E)
Bucket E: E=0, λD(A,E)                →  λE(A)
Bucket A: P(A), λE(A)                 →  P(A, E=0)

w* = 4, the "induced width" (max clique size) along the ordering.

Page 44: Reasoning Under Uncertainty

Induced graph

[Figure: the moral graph of the A, B, C, D, E network and its induced graph along the ordering (A, E, D, C, B) — eliminating a node connects all of its earlier neighbors.]

Induced width of the ordering, w*(d) = max width of the nodes in the induced graph.

Page 45: Reasoning Under Uncertainty

Complexity of elimination: O(n · exp(w*(d)))

w*(d) – induced width of the moral graph along ordering d

[Figure: the "moral" graph and two orderings — d1 = (A, E, D, C, B) with w*(d1) = 4, and d2 = (A, B, C, D, E) with w*(d2) = 2.]

Page 46: Reasoning Under Uncertainty

Finding small induced-width orderings
• NP-complete
• A tree has induced width of 1
• Greedy algorithms (see the sketch below):
  – Min-width
  – Min induced-width
  – Max-cardinality
  – Min-fill (thought to be the best)
  – Anytime min-width (via branch-and-bound)
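A minimal sketch of the min-fill heuristic, together with the induced width it yields, run on the moral graph of the running example; the helper names and adjacency-set encoding are ours.

```python
# Greedy min-fill ordering and its induced width.
# graph: node -> set of neighbors (the undirected moral graph).
def min_fill_ordering(graph):
    g = {v: set(nbrs) for v, nbrs in graph.items()}
    order, width = [], 0
    while g:
        def fill(v):                 # fill edges added by eliminating v
            nb = list(g[v])
            return sum(1 for i in range(len(nb))
                         for j in range(i + 1, len(nb))
                         if nb[j] not in g[nb[i]])
        v = min(g, key=fill)
        width = max(width, len(g[v]))           # induced width so far
        for a in g[v]:                          # connect v's neighbors
            for b in g[v]:
                if a != b:
                    g[a].add(b)
        for a in g[v]:
            g[a].discard(v)
        del g[v]
        order.append(v)                         # elimination sequence
    return order, width

# Moral graph of the 5-variable example: A-B, A-C, A-D, B-C, B-D, B-E, C-E
moral = {'A': {'B','C','D'}, 'B': {'A','C','D','E'},
         'C': {'A','B','E'}, 'D': {'A','B'}, 'E': {'B','C'}}
print(min_fill_ordering(moral))   # an ordering of induced width 2
```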

Page 47: Reasoning Under Uncertainty

MPE: Most Probable Explanation

[Figure: the Smoking / Bronchitis / Lung Cancer / X-ray / Dyspnoea network.]

(c', b', x') = argmax_{C,B,X} P(S=0, C, B, X, D=1)

P(S=0, C, B, X, D=1) = P(S=0) P(C|S=0) P(B|S=0) P(X|C) P(D=1|B,C)

Page 48: Reasoning Under Uncertainty

Applications
• Probabilistic decoding
  – A stream of bits is transmitted across a noisy channel; the problem is to recover the transmitted stream given the observed output and parity-check bits

[Figure: a coding network — transmitted bits x0…x4, parity-check bits u0…u4, received bits y0^x…y4^x (observed), and received parity-check bits y0^u…y4^u (observed).]

Page 49: Reasoning Under Uncertainty

Applications
• Medical diagnosis
  – Given some observed symptoms, determine the most likely subset of diseases that may explain the symptoms

[Figure: a two-layer network with diseases Disease1…Disease7 as parents of symptoms Symptom1…Symptom6.]

Page 50: Reasoning Under Uncertainty

Applications
• Genetic linkage analysis
  – Given the genotype information of a pedigree, infer the maximum-likelihood haplotype configuration (maternal and paternal) of the unobserved individuals

[Figure: a 3-individual pedigree with genotyped/haplotype annotations, and its belief-network encoding over two loci, with locus variables L, genotype variables X, and selector variables S.] (Fishelson & Geiger, 2002)

Page 51: Reasoning Under Uncertainty

Bucket elimination for MPE

[Figure: the A, B, C, D, E network with CPTs P(A), P(B|A), P(C|A), P(D|A,B), P(E|B,C).]

MPE = max_{A,E=0,D,C,B} P(A) P(B|A) P(C|A) P(D|A,B) P(E|B,C)
    = max_A P(A) max_{E=0} max_D max_C P(C|A) max_B P(B|A) P(D|A,B) P(E|B,C)

The innermost maximization defines λB(A,D,C,E) — Variable Elimination with max.

Page 52: Reasoning Under Uncertainty

Max out a variable

A B C | f(A,B,C)
T T T | 0.03
T T F | 0.07
T F T | 0.54
T F F | 0.36
F T T | 0.06
F T F | 0.14
F F T | 0.48
F F F | 0.32

Maxing out B:

A C | f(A,C)
T T | 0.54
T F | 0.36
F T | 0.48
F F | 0.32
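The same operation in code — a minimal sketch reproducing the table above; `max_out` is our own helper, not a library function.

```python
# Maxing a variable out of a tabular function (True/False domains).
f = {('T','T','T'): .03, ('T','T','F'): .07, ('T','F','T'): .54,
     ('T','F','F'): .36, ('F','T','T'): .06, ('F','T','F'): .14,
     ('F','F','T'): .48, ('F','F','F'): .32}   # f(A,B,C)

def max_out(table, axis):
    out = {}
    for vals, p in table.items():
        key = vals[:axis] + vals[axis + 1:]    # drop the maxed axis
        out[key] = max(out.get(key, 0.0), p)
    return out

print(max_out(f, 1))
# {('T','T'): .54, ('T','F'): .36, ('F','T'): .48, ('F','F'): .32}
```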

Page 53: Reasoning Under Uncertainty

Bucket elimination

max∏ are the elimination/combination operators:

Bucket B: P(E|B,C), P(D|A,B), P(B|A)  →  λB(A,D,C,E)
Bucket C: P(C|A), λB(A,D,C,E)         →  λC(A,D,E)
Bucket D: λC(A,D,E)                   →  λD(A,E)
Bucket E: E=0, λD(A,E)                →  λE(A)
Bucket A: P(A), λE(A)                 →  MPE value

Widths per bucket (top to bottom): 4, 3, 1, 1, 0. w* = 4, the "induced width" (max clique size).

Page 54: Reasoning Under Uncertainty

Generating the MPE tuple

Bucket B: P(E|B,C), P(D|A,B), P(B|A)
Bucket C: P(C|A), λB(A,D,C,E)
Bucket D: λC(A,D,E)
Bucket E: E=0, λD(A,E)
Bucket A: P(A), λE(A)

Assign the variables in reverse order of elimination:

a' = argmax_A P(A) · λE(A)
e' = 0
d' = argmax_D λC(a', D, e')
c' = argmax_C P(C|a') · λB(a', d', C, e')
b' = argmax_B P(e'|B, c') · P(d'|a', B) · P(B|a')

Return (a', b', c', d', e')

Page 55: Reasoning Under Uncertainty

Complexity of elimination: O(n · exp(w*(d)))

w*(d) – induced width of the moral graph along ordering d

[Figure: the "moral" graph and two orderings — d1 = (A, E, D, C, B) with w*(d1) = 4, and d2 = (A, B, C, D, E) with w*(d2) = 2.]

Page 56: Reasoning Under Uncertainty

Exact inference
• Variable elimination (inference)
  – Bucket elimination
  – Bucket-tree elimination
  – Cluster-tree elimination
• Conditioning (search)
  – VE+C hybrid
  – AND/OR search (tree, graph)

Page 57: Reasoning Under Uncertainty

From BE to bucket-tree elimination
• Motivation:
  – BE computes P(evidence) or P(X | evidence), where X is the last variable in the ordering
  – What if we need all marginal probabilities P(Xi | evidence), for every Xi ∈ {X1, X2, …, Xn}?
• Run BE n times, with Xi being the last variable each time
• Inefficient! The induced width may vary significantly from one ordering to another
• SOLUTION: Bucket-Tree Elimination (BTE)

Page 58: Reasoning Under Uncertainty

Bucket-tree elimination

[Figure: the A, B, C, D, E network and its buckets along the ordering (A, B, C, D, E), with upward messages λE(B,C) → bucket C, λD(A,B) → bucket B, λC(A,B) → bucket B, λB(A) → bucket A.]

Bucket E: P(E|B,C)                   →  λE(B,C)
Bucket D: P(D|A,B)                   →  λD(A,B)
Bucket C: P(C|A), λE(B,C)            →  λC(A,B)
Bucket B: P(B|A), λD(A,B), λC(A,B)   →  λB(A)
Bucket A: P(A), λB(A)

• Variable elimination can be viewed as message passing (elimination) on a bucket tree
• Any node (bucket) can be the root
• Complexity: time and space exponential in the induced width

Page 59: Reasoning Under Uncertainty

Bucket tree (more formally)
• Bucket tree:
  – A bucket tree has each bucket Bi as a node, and there is an arc from Bi to Bj if the function created at Bi was placed in Bj
• Graph-based definition:
  – Let Gd be the induced graph along ordering d. Each variable X together with its earlier neighbors forms a node BX. There is an arc from BX to BY if Y is the closest parent of X.

Page 60: Reasoning Under Uncertainty

Bucket tree

[Figure: the belief network; its induced graph along (A, B, C, D, E); and the resulting bucket tree — clusters {A}, {B,A}, {A,B,C}, {A,B,D}, {E,B,C}, with {B,A} below {A}, clusters {A,B,C} and {A,B,D} below {B,A}, and {E,B,C} below {A,B,C}; messages λE(B,C), λC(A,B), λD(A,B), λB(A) flow toward the root A.]

Page 61: Reasoning Under Uncertainty

Bucket-tree propagation

[Figure: a bucket u with neighbors x1, …, xn and v; messages h(x1,u), …, h(xn,u) arrive at u, and u sends h(u,v) to v.]

bucket(u) = ψ(u) ∪ {h(x1,u), h(x2,u), …, h(xn,u), h(v,u)}

Compute the message from u to v:

h(u,v) = Σ_{elim(u,v)} ∏_{f ∈ bucket(u), f ≠ h(v,u)} f,   where elim(u,v) = vars(u) − vars(v)

Page 62: Reasoning Under Uncertainty

Upward messages in the bucket-tree

[Figure: the bucket tree of the example with upward messages λE(B,C), λD(A,B), λC(A,B), λB(A) and the messages π flowing back from the root:]

π_A(A) = P(A)
π_B(A,B) = P(B|A) · π_A(A) · λ_C(A,B)    (sent to bucket D)
π_B(A,B) = P(B|A) · π_A(A) · λ_D(A,B)    (sent to bucket C)
π_C(B,C) = Σ_A P(C|A) · π_B(A,B)         (sent to bucket E)

Page 63: Reasoning Under Uncertainty

Computing marginals from the bucket-tree

[Figure: the bucket tree with clusters E,B,C : P(E|B,C); A,B,D : P(D|A,B); A,B,C : P(C|A); B,A : P(B|A); A : P(A), annotated with the λ and π messages.]

P(C | evidence) = α Σ_{A,B} P(C|A) · π_B(A,B) · λ_E(B,C)

Page 64: Reasoning Under Uncertainty

Buckets → super-buckets → clusters

[Figure: three decompositions of a network over A, B, C, D, F, G with CPTs P(A), P(B|A), P(C|A), P(D|A,B), P(F|B,C), P(G|F): the bucket tree with clusters G,F / F,B,C / D,B,A / A,B,C / B,A / A; a super-bucket tree that absorbs the small clusters {B,A} and {A}; and a coarse cluster tree with clusters G,F and A,B,C,D,F.]

Time-space trade-off!

Page 65: Reasoning Under Uncertainty

Tree decomposition
• A tree decomposition for a belief network ‹X, D, G, P› is a triple ‹T, χ, ψ›, where T = (V, E) is a tree, and χ and ψ are labeling functions associating with each vertex v ∈ V two sets χ(v) ⊆ X and ψ(v) ⊆ P such that:
  – For each function (CPT) pi ∈ P there is exactly one vertex v such that pi ∈ ψ(v) and scope(pi) ⊆ χ(v)
  – For each variable Xi ∈ X, the set {v ∈ V | Xi ∈ χ(v)} forms a connected sub-tree (running intersection property)
• A join-tree is a tree decomposition where all clusters are maximal
  – E.g., a bucket tree is a tree decomposition but not a join-tree

Page 66: Reasoning Under Uncertainty

Treewidth and separator
• The width (aka treewidth) of a tree decomposition ‹T, χ, ψ› is max_v |χ(v)|, and its hyperwidth is max_v |ψ(v)|
• Given two adjacent vertices u and v of a tree decomposition, the separator of u and v is defined as sep(u,v) = χ(u) ∩ χ(v)

Page 67: Reasoning Under Uncertainty

Finding join-tree decompositions
• Good join-trees using triangulation:
  – Create the induced graph G' along some ordering d
  – Identify all maximal cliques in G'
  – Order the cliques {C1, C2, …, Ct} by rank of the highest vertex in each clique
  – Form the join tree by connecting each Ci to a predecessor Cj (j < i) sharing the largest number of vertices with Ci

Page 68: Reasoning Under Uncertainty

Example

[Figure: the moral graph and the induced graph along (A, B, C, D, E).]

Maximal cliques: C1 = {A,B,C}, C2 = {A,B,D}, C3 = {B,C,E}

Join tree:
  C1 = {A,B,C}, ψ(C1) = {P(A), P(B|A), P(C|A)}
  C2 = {A,B,D}, ψ(C2) = {P(D|A,B)}, separator with C1: {A,B}
  C3 = {B,C,E}, ψ(C3) = {P(E|B,C)}, separator with C1: {B,C}

Treewidth = 3, separator size = 2.

Page 69: Reasoning Under Uncertainty

Tree decomposition for belief updating

[Figure: a BN over A, B, C, D, E, F, G and its tree decomposition:]

Cluster 1: ABC, ψ = {P(A), P(B|A), P(C|A,B)}
Cluster 2: BCDF, ψ = {P(D|B), P(F|C,D)}
Cluster 3: BEF, ψ = {P(E|B,F)}
Cluster 4: EFG, ψ = {P(G|E,F)}

Separators: sep(1,2) = BC, sep(2,3) = BF, sep(3,4) = EF

Page 70: Reasoning Under Uncertainty

Tree decomposition for belief updating

Cluster-tree elimination messages:

h(1,2)(B,C) = Σ_A P(A) · P(B|A) · P(C|A,B)
h(2,3)(B,F) = Σ_{C,D} P(D|B) · P(F|C,D) · h(1,2)(B,C)
h(3,4)(E,F) = Σ_B P(E|B,F) · h(2,3)(B,F)
h(4,3)(E,F) = P(G=g|E,F)                        (with evidence G = g)
h(3,2)(B,F) = Σ_E P(E|B,F) · h(4,3)(E,F)
h(2,1)(B,C) = Σ_{D,F} P(D|B) · P(F|C,D) · h(3,2)(B,F)

Time: O(exp(w+1)); Space: O(exp(sep))

Page 71: Reasoning Under Uncertainty

CTE – properties
• Correctness and completeness: algorithm CTE is correct, i.e., it computes the exact joint probability of a single variable and the evidence
• Time complexity: O(deg × (n+N) × d^(w*+1))
• Space complexity: O(N × d^sep)
  – deg = max degree of a node in T
  – n = number of variables (= number of CPTs)
  – N = number of nodes in T
  – d = maximum domain size
  – w* = induced width
  – sep = separator size

Page 72: Reasoning Under Uncertainty

Exact inference
• Variable elimination (inference)
  – Bucket elimination
  – Bucket-tree elimination
  – Cluster-tree elimination
• Conditioning (search)
  – Cycle cutset scheme
  – VE+C hybrid
  – AND/OR search (tree, graph)

Page 73: Reasoning Under Uncertainty

Conditioning

[Figure: the full OR search tree over the ordering A, B, C, D, E — each level branches on the values 0/1 — for the network with CPTs P(A), P(B|A), P(C|A), P(D|A,B), P(E|B,C).]

P(A | E=0) = P(A, E=0) / P(E=0)

Enumerate all full assignments and sum:

P(A=0)P(B=0|A=0)P(C=0|A=0)P(E=0|B=0,C=0)P(D=0|A=0,B=0)
+ P(A=0)P(B=0|A=0)P(C=0|A=0)P(E=0|B=0,C=0)P(D=1|A=0,B=0)
+ …
+ P(A=0)P(B=1|A=0)P(C=1|A=0)P(E=0|B=1,C=1)P(D=1|A=0,B=1)
= P(A=0, E=0)

Page 74: Reasoning Under Uncertainty

Conditioning

[Figure: the same OR search tree, with the subtrees under A=0 and A=1 summing to P(A=0, E=0) and P(A=1, E=0).]

P(A=0 | E=0) = P(A=0, E=0) / (P(A=0, E=0) + P(A=1, E=0))
P(A=1 | E=0) = P(A=1, E=0) / (P(A=0, E=0) + P(A=1, E=0))

Page 75: Reasoning Under Uncertainty

Conditioning + elimination

IDEA: condition until the w* of the remaining graph gets small enough!

[Figure: search over the cutset variable A at the top, elimination over the remaining network below; query P(E=0) = ?]

The spectrum: pure search corresponds to w* = 0 remaining; conditioning on a loop cutset leaves w* = 1; conditioning on a w-cutset leaves w* = w, with the rest solved by elimination.

Page 76: Reasoning Under Uncertainty

Loop-cutset method
• Condition until we get a polytree (no loops)
  – The subset of conditioning variables = loop cutset

[Figure: instantiating A (to 0 and to 1) breaks the loops of the A, B, C, D, E network, leaving a polytree in each branch.]

P(B | D=0) = P(B, A=0 | D=0) + P(B, A=1 | D=0)

The loop-cutset method is time exponential in the loop-cutset size, and linear space!

Page 77: Reasoning Under Uncertainty

w-cutset method
• Identify a w-cutset, Cw, of the network
  – Finding the smallest loop-cutset/w-cutset is NP-hard
• For each assignment of the cutset, solve the conditioned subproblem by VE
• Aggregate the solutions over all cutset assignments
• Time complexity: exp(|Cw| + w)
• Space complexity: exp(w)

Page 78: Reasoning Under Uncertainty

Interleaving Conditioning and Elimination

Page 79: Reasoning Under Uncertainty

Interleaving Conditioning and Elimination

Eliminate

Page 80: Reasoning Under Uncertainty

Interleaving Conditioning and Elimination

Page 81: Reasoning Under Uncertainty

Interleaving Conditioning and EliminationEliminate

Page 82: Reasoning Under Uncertainty

Interleaving Conditioning and Elimination

Page 83: Reasoning Under Uncertainty

Interleaving Conditioning and Elimination

Condition

Page 84: Reasoning Under Uncertainty

Interleaving Conditioning and Elimination

...

...

Page 85: Reasoning Under Uncertainty

General graphical models
• All algorithms generalize to any graphical model
  – Through general operations of combination and marginalization
  – General BE, BTE, CTE, VE+C
  – Applicable to Markov networks, to constraint optimization, to counting the number of solutions in SAT/CSP, etc.

Page 86: Reasoning Under Uncertainty

Exact inference
• Variable elimination (inference)
  – Bucket elimination
  – Bucket-tree elimination
  – Cluster-tree elimination
• Conditioning (search)
  – VE+C hybrid
  – Cycle cutset scheme
  – AND/OR search (tree, graph)

Page 87: Reasoning Under Uncertainty

Solution techniques

[Diagram: the space of solution techniques.]
• Inference (elimination):
  – Complete: variable elimination, tree clustering / bucket elimination — time exp(treewidth), space exp(treewidth)
  – Incomplete: mini-bucket(i), mini-clustering(i), belief propagation
• Search (conditioning):
  – Complete: DFS search — time exp(n), space linear; AND/OR search — time exp(treewidth · log n), space linear
  – Incomplete: stochastic local search, gradient descent
• Hybrids of the two — time exp(pathwidth), space exp(pathwidth)

Page 88: Reasoning Under Uncertainty

Exact inference
• Variable elimination (inference)
  – Bucket elimination
  – Bucket-tree elimination
  – Cluster-tree elimination
• Conditioning (search)
  – Cycle cutset
  – VE+C hybrid
  – AND/OR search spaces
    • AND/OR tree search
    • AND/OR graph search

Page 89: Reasoning Under Uncertainty

OR search space

[Figure: a network over A, B, C, D, E, F with ordering A, B, E, C, D, F, and its full OR search tree — a binary tree of depth 6 enumerating all 2^6 assignments.]

Page 90: Reasoning Under Uncertainty

AND/OR search space

[Figure: the moral graph of the 6-variable example; a DFS spanning tree with A at the root, child B, B's children E and C, and C's children D and F; and the corresponding AND/OR search tree with alternating OR levels (variables) and AND levels (values).]

Page 91: Reasoning Under Uncertainty

OR vs. AND/OR

[Figure: side-by-side comparison of the AND/OR search tree (guided by the pseudo tree above) and the OR search tree along the ordering A, B, E, C, D, F, with one solution subtree highlighted in each.]

Page 92: Reasoning Under Uncertainty

OR vs. AND/OR

[Figure: the same comparison of the two search spaces.]

AND/OR size: exp(4); OR size: exp(6)

Page 93: Reasoning Under Uncertainty

AND/OR search spaces
• The AND/OR search tree of a graphical model R relative to a spanning tree T has alternating levels of OR nodes (variables) and AND nodes (values)
• Successor function:
  – The successors of an OR node X are all of X's values that are consistent with the assignment along its path
  – The successors of an AND node ‹X,v› are all the child variables of X in T
• A solution is a consistent subtree
• Task: compute the value of the root node

[Figure: the AND/OR search tree for the 6-variable example, with pseudo tree A → B, B → {E, C}, C → {D, F}.]

Page 94: Reasoning Under Uncertainty

From DFS trees to pseudo trees

[Figure: (a) a 7-node graph; (b) a DFS tree of depth 3; (c) a pseudo tree of depth 2; (d) a chain of depth 6.]

(Freuder85, Bayardo & Miranker95)

Page 95: Reasoning Under Uncertainty

Pseudo tree vs. DFS tree

Model (DAG)  | w*    | Pseudo tree avg. depth | DFS tree avg. depth
(N=50, P=2)  | 9.54  | 16.82 | 36.03
(N=50, P=3)  | 16.1  | 23.34 | 40.6
(N=50, P=4)  | 20.91 | 28.31 | 43.19
(N=100, P=2) | 18.3  | 27.59 | 72.36
(N=100, P=3) | 30.97 | 41.12 | 80.47
(N=100, P=4) | 40.27 | 50.53 | 86.54

N = number of nodes, P = number of parents. MIN-FILL ordering. 100 instances.

Page 96: Reasoning Under Uncertainty

Finding min-depth backbone trees
• Finding a min-depth DFS tree or pseudo tree is NP-complete, but:
• Given a tree decomposition whose treewidth is w*, there exists a pseudo tree T of G whose depth m satisfies:

  m ≤ w* · log n

(Bayardo & Miranker96, Bodlaender & Gilbert91)

Page 97: Reasoning Under Uncertainty

Generating pseudo trees from bucket trees

[Figure: a network over A, B, C, D, E, F; its induced graph along the ordering d: A, B, C, E, D, F; the corresponding bucket tree (bucket-A, bucket-B, bucket-C, bucket-E, bucket-D, bucket-F, with separators such as (A), (AB), (BC), (AE), (BE), (BD), (DE), (AF), (EF)); the bucket tree used as a pseudo tree; and the resulting AND/OR search tree.]

Page 98: Reasoning Under Uncertainty

Other heuristics for pseudo trees
• Depth-first traversal of the induced graph constructed along some elimination ordering (e.g., min-fill)
  – Sometimes yields slightly different trees than those obtained from the bucket tree
• Recursive decomposition of the dual hypergraph while minimizing the separator size at each step
  – Functions (CPTs) are vertices in the dual hypergraph, while variables are hyperedges
  – Separator = set of hyperedges (i.e., variables)

Page 99: Reasoning Under Uncertainty

Quality of the pseudo trees

Network  | hypergraph width | hypergraph depth | min-fill width | min-fill depth
barley   | 7  | 13 | 7  | 23
diabetes | 7  | 16 | 4  | 77
link     | 21 | 40 | 15 | 53
mildew   | 5  | 9  | 4  | 13
munin1   | 12 | 17 | 12 | 29
munin2   | 9  | 16 | 9  | 32
munin3   | 9  | 15 | 9  | 30
munin4   | 9  | 18 | 9  | 30
water    | 11 | 16 | 10 | 15
pigs     | 11 | 20 | 11 | 26

Bayesian Networks Repository

Page 100: Reasoning Under Uncertainty

AND/OR search tree properties
• Theorem: any AND/OR search tree based on a pseudo tree is sound and complete (it expresses all and only the solutions)
• Theorem: the size of the AND/OR search tree is O(n k^m); the size of the OR search tree is O(k^n)
• Theorem: the size of the AND/OR search tree can be bounded by O(exp(w* log n))
• Related work: (Freuder85; Dechter90; Bayardo et al.96; Darwiche01; Bacchus et al.03)
• When the pseudo tree is a chain we get an OR space

Page 101: Reasoning Under Uncertainty

AND/OR vs. OR spaces

width | depth | OR time (sec.) | OR nodes | AND/OR time (sec.) | AND nodes | OR nodes
5 | 10 | 3.15 | 2,097,150 | 0.03 | 10,494 | 5,247
4 | 9  | 3.13 | 2,097,150 | 0.01 | 5,102  | 2,551
5 | 10 | 3.12 | 2,097,150 | 0.03 | 8,926  | 4,463
4 | 10 | 3.12 | 2,097,150 | 0.02 | 7,806  | 3,903
5 | 13 | 3.11 | 2,097,150 | 0.10 | 36,510 | 18,255

Random graphs with 20 nodes, 20 edges and 2 values per node.

Page 102: Reasoning Under Uncertainty

Tasks and values of nodes
• v(n) is the value of the subtree T(n) for the task:
  – Optimization (MPE): v(n) is the optimal solution in T(n)
  – Belief updating: v(n) is the probability of evidence in T(n)
• Goal: compute the value of the root node recursively using a DFS search of the AND/OR tree
• Theorem: the complexity of AND/OR DFS search is space O(n) and time O(n k^m), i.e., O(exp(w* log n))

Page 103: Reasoning Under Uncertainty

Weighted AND/OR tree (belief updating)

[Figure: the AND/OR search tree for the A, B, C, D, E network with evidence D=1, E=0; each OR–AND arc carries a weight.]

CPTs:
P(A):     A | P(A): 0 | .6 ; 1 | .4
P(B|A):   A | B=0 B=1: 0 | .4 .6 ; 1 | .1 .9
P(C|A):   A | C=0 C=1: 0 | .2 .8 ; 1 | .7 .3
P(E|A,B): A B | E=0 E=1: 0 0 | .4 .6 ; 0 1 | .5 .5 ; 1 0 | .7 .3 ; 1 1 | .2 .8   (evidence E=0)
P(D|B,C): B C | D=0 D=1: 0 0 | .2 .8 ; 0 1 | .1 .9 ; 1 0 | .3 .7 ; 1 1 | .5 .5   (evidence D=1)

w(X,x) = product of the CPTs that contain X and whose scope is fully instantiated along the path

Page 104: Reasoning Under Uncertainty

Computing node values (belief updating)

OR node:  v(A) = Σ_{i=1}^{k} w(A, i) · v(A, i)
AND node: v(‹A,0›) = ∏_{i=1}^{m} v(Xi)

NOTE:
• the value of a terminal AND node is 1
• the weight of an OR–AND arc for which no CPTs are fully instantiated is 1
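The two recurrences translate directly into mutually recursive functions. Below is a minimal sketch of depth-first AND/OR tree evaluation for belief updating; `children` encodes the pseudo tree, and `weight` must return the arc weight w(X, x) defined above — both are assumptions standing in for a concrete model. Replacing the sum by a max gives the MPE variant shown later.

```python
# Depth-first evaluation of a weighted AND/OR search tree (belief
# updating): OR nodes sum weighted child values, AND nodes multiply.

def or_value(x, children, domain, weight, assignment):
    total = 0.0
    for val in domain[x]:
        assignment[x] = val
        w = weight(x, val, assignment)          # arc weight w(X, val)
        total += w * and_value(x, children, domain, weight, assignment)
        del assignment[x]
    return total

def and_value(x, children, domain, weight, assignment):
    v = 1.0                                     # terminal AND node -> 1
    for child in children.get(x, ()):
        v *= or_value(child, children, domain, weight, assignment)
    return v

# Usage: or_value(root, children, domain, weight, {}) returns the
# probability of evidence; evidence is handled inside `weight` and by
# restricting `domain` at observed variables.
```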

Page 105: Reasoning Under Uncertainty

AND/OR tree algorithm (belief updating)

AND node: combination operator (product)
OR node: marginalization operator (summation)
Value of a node = updated belief for the sub-problem below it

[Figure: the weighted AND/OR tree from the previous slide with all node values computed bottom-up; the OR values at B are .88, .54 under A=0 and .89, .52 under A=1, giving v(A=0) = .3028 and v(A=1) = .1559.]

Result: P(D=1, E=0) = 0.3028 · 0.6 + 0.1559 · 0.4 = 0.24404

Page 106: Reasoning Under Uncertainty

Complexity of AND/OR tree search

      | AND/OR tree                    | OR tree
Space | O(n)                           | O(n)
Time  | O(n k^m) = O(n k^(w* log n))   | O(k^n)

k = domain size, m = depth of the pseudo tree, n = number of variables, w* = treewidth
(Freuder & Quinn85), (Collin, Dechter & Katz91), (Bayardo & Miranker95), (Darwiche01)

Page 107: Reasoning Under Uncertainty

Exact inference
• Variable elimination (inference)
  – Bucket elimination
  – Bucket-tree elimination
  – Cluster-tree elimination
• Conditioning (search)
  – VE+C hybrid
  – AND/OR search spaces
    • AND/OR tree search
    • AND/OR graph search

Page 108: Reasoning Under Uncertainty

From search trees to search graphs
• Any two nodes that root identical sub-trees or sub-graphs can be merged


Page 110: Reasoning Under Uncertainty

AND/OR search tree

[Figure: a 10-variable network (A, B, C, D, E, F, G, H, J, K), its pseudo tree, and the full AND/OR search tree — identical sub-trees (e.g., those rooted at G, H, J, K) appear many times.]

Page 111: Reasoning Under Uncertainty

AND/OR search graph

[Figure: the same search space after merging identical sub-trees — the AND/OR search graph is far smaller than the tree.]

Page 112: Reasoning Under Uncertainty

Merging based on context
• One way of recognizing nodes that can be merged:

  context(X) = ancestors of X in the pseudo tree that are connected to X, or to descendants of X

[Figure: the pseudo tree of the 6-variable example annotated with the context of each variable: [ ], [A], [AB], [AE], [BC], [AB].]

Page 113: Reasoning Under Uncertainty

AND/OR graph algorithm (belief updating)

[Figure: the context-minimal AND/OR graph for the A, B, C, D, E example (evidence D=1, E=0) with contexts [ ], [A], [AB], [BC], [AB]; the sub-problems below D are merged based on the context [BC] and cached.]

Cache table for D:
B C | value
0 0 | .8
0 1 | .9
1 0 | .7
1 1 | .5

Result: P(D=1, E=0) = 0.3028 · 0.6 + 0.1559 · 0.4 = 0.24404

Page 114: Reasoning Under Uncertainty

Context-minimal AND/OR graph

[Figure: a 15-variable example with pseudo tree ordering (C K H A B E J L N O D P M F G) and contexts [ ], [C], [CK], [CKL], [CKLN], [CKO], [CH], [CHA], [CHAB], [CHAE], [CEJ], [CD], [AB], [AF]; merging all nodes with the same context yields the context-minimal AND/OR search graph.]

Page 115: Reasoning Under Uncertainty

How big is the context?

Theorem: the maximum context size for a pseudo tree is equal to the treewidth of the graph along the pseudo tree.

[Figure: the same 15-variable pseudo tree with its contexts; max context size = treewidth.]

Page 116: Reasoning Under Uncertainty

Treewidth vs. pathwidth

[Figure: a 13-variable graph; a TREE decomposition with clusters ABC, BDEF, BDFG, EFH, FHK, HJ, KLM — treewidth = 3 = (max cluster size) − 1; and a CHAIN decomposition with clusters ABC, BDEFG, EFH, FHKJ, KLM — pathwidth = 4 = (max cluster size) − 1.]

Page 117: Reasoning Under Uncertainty

AND/OR graph search
• AO(i): searches depth-first, caching i-contexts
  – i = the max size of a cache table (i.e., number of variables in a context)
• i = 0: space O(n), time O(exp(w* log n))
• i = w*: space O(exp(w*)), time O(exp(w*))
• Intermediate i: space O(exp(i)), time O(exp(m_i + i)), where m_i is the depth of the remaining cutset tree (defined below)

Page 118: Reasoning Under Uncertainty

Complexity of AND/OR graph search

      | AND/OR graph | OR graph
Space | O(n k^w*)    | O(n k^pw*)
Time  | O(n k^w*)    | O(n k^pw*)

k = domain size, n = number of variables, w* = treewidth, pw* = pathwidth

w* ≤ pw* ≤ w* log n

Page 119: Reasoning Under Uncertainty

Related work
• Recursive Conditioning (RC) (Darwiche01)
  – Can be viewed as an AND/OR graph search algorithm guided by a tree
  – The guiding tree structure is called a "dtree"
• Value Elimination (VE) (Bacchus et al.03)
  – Also an AND/OR graph search algorithm, using an advanced caching scheme based on components rather than graph-based contexts
  – Can use dynamic variable orderings

Page 120: Reasoning Under Uncertainty

Exact inference
• Variable elimination (inference)
  – Bucket elimination
  – Bucket-tree elimination
  – Cluster-tree elimination
• Conditioning (search)
  – VE+C hybrid
  – AND/OR search spaces
    • AND/OR tree search
    • AND/OR graph search

Page 121: Reasoning Under Uncertainty

AND/OR w-cutset

[Figure: a 13-variable moral graph and a sequence of cutsets — a 3-cutset, a 2-cutset, and a 1-cutset — each shown together with the remaining graph once the cutset variables are removed.]

Page 122: Reasoning Under Uncertainty

AND/OR w-cutset

[Figure: the moral graph, a pseudo tree for it, and the corresponding 1-cutset tree.]

Page 123: Reasoning Under Uncertainty

Searching AND/OR graphs
• AO(i): searches depth-first, caching i-contexts
  – i = the max size of a cache table (i.e., number of variables in a context)
• i = 0: space O(n), time O(exp(w* log n))
• i = w*: space O(exp(w*)), time O(exp(w*))
• Intermediate i: space O(exp(i)), time O(exp(m_i + i))

Page 124: Reasoning Under Uncertainty

w-cutset trees over AND/OR spaces
• Definition: T_w is a w-cutset tree relative to a backbone pseudo tree T iff T_w roots T and, when removed, yields treewidth w
• Theorem: the AO(i) time complexity for backbone T is O(exp(i + m_i)) with space O(exp(i)), where m_i is the depth of the T_i tree
• Better than w-cutset: O(exp(i + c_i)), where c_i is the number of nodes in T_i

Page 125: Reasoning Under Uncertainty

Exact inference
• Variable elimination (inference)
  – Bucket elimination
  – Bucket-tree elimination
  – Cluster-tree elimination
• Conditioning (search)
  – VE+C hybrid
  – AND/OR search for Most Probable Explanations

Page 126: Reasoning Under Uncertainty

AND/OR Branch-and-Bound for MPE

MPE = max_{x1,…,xn} ∏_{i=1}^{n} P(Xi | pa(Xi))

• Solved by BE in time and space exponential in the treewidth w*
• Solved by conditioning in linear space and time exponential in the number of variables n
• It can be solved by AND/OR search:
  – Tree search: space O(n), time O(exp(w* log n))
  – Graph search: time and space O(exp(w*))

Page 127: Reasoning Under Uncertainty

Weighted AND/OR tree (MPE task)

[Figure: the same weighted AND/OR search tree as in the belief-updating case (evidence D=1, E=0), with the same CPTs P(A), P(B|A), P(C|A), P(E|A,B), P(D|B,C) and arc weights.]

w(X,x) = product of the CPTs that contain X and whose scope is fully instantiated along the path

Page 128: Reasoning Under Uncertainty

Computing node values (MPE task)

OR node:  v(A) = max_i w(A, i) · v(A, i)
AND node: v(‹A,0›) = ∏_{i=1}^{m} v(Xi)

NOTE:
• the value of a terminal AND node is 1
• the weight of an OR–AND arc for which no CPTs are fully instantiated is 1

Page 129: Reasoning Under Uncertainty

AND/OR tree algorithm (MPE task)

AND node: combination operator (product)
OR node: marginalization operator (maximization)
Value of a node = MPE value for the sub-problem below it

[Figure: the weighted AND/OR tree with values computed bottom-up by maximization; the OR values at B are .72, .40 under A=0 and .81, .45 under A=1, giving v(A=0) = .12 and v(A=1) = .081.]

Result: MPE(D=1, E=0) = max(0.12 · 0.6, 0.081 · 0.4) = 0.072

Page 130: Reasoning Under Uncertainty

Branch-and-Bound search

[Figure: an OR search tree during branch-and-bound.]

• g(n): cost of the search path to node n
• h(n): estimates the optimal cost below n
• Upper bound: UB(n) = g(n) · h(n)
• Prune below n if UB(n) ≤ LB, the lower bound given by the best solution found so far

(Lawler & Wood66)

Page 131: Reasoning Under Uncertainty

Partial solution tree

[Figure: a pseudo tree over A, B, C, D and four partial solution trees, e.g., (A=0, B=0, C=0, D=0), (A=0, B=0, C=0, D=1), (A=0, B=1, C=0, D=0), (A=0, B=1, C=0, D=1).]

Extension(T') – the set of solution trees that extend the partial solution tree T'

Page 132: Reasoning Under Uncertainty

Exact evaluation function

[Figure: an AND/OR search tree over variables A, B, C, D, E, F with functions f1(A,B,C), f2(A,B,F), f3(B,D,E) (tables on the slide); the tip nodes of the current partial solution tree T' are ‹D,0› with v(D,0) and the OR node F with v(F).]

f*(T') = w(A,0) · w(B,1) · w(C,0) · w(D,0) · v(D,0) · v(F)

Page 133: Reasoning Under Uncertainty

Heuristic evaluation function

[Figure: the same AND/OR search tree, now with heuristic estimates at the tip nodes, e.g., h(D,0) = 4 and h(F) = 5.]

f(T') = w(A,0) · w(B,1) · w(C,0) · w(D,0) · h(D,0) · h(F) ≥ f*(T'),   since h(n) ≥ v(n)

Page 134: Reasoning Under Uncertainty

AND/OR Branch-and-Bound search

[Figure: AOBB expanding the AND/OR tree; the search below the current tip is pruned when f(T') ≤ LB.]

(Marinescu and Dechter, 05)

Page 135: Reasoning Under Uncertainty

AND/OR Branch-and-Bound search
• Associate each node n with a heuristic upper bound h(n) on v(n)
• EXPAND (top-down):
  – Evaluate f(T') of the current partial solution sub-tree T', and prune the search if f(T') ≤ LB
  – Expand the tip node n by generating its successors
• PROPAGATE (bottom-up):
  – Update the value of the parent p of n
    • OR nodes: maximization
    • AND nodes: product
A simplified sketch of this loop follows.
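Below is a simplified sketch of the expand/prune loop over a plain OR space (no AND/OR decomposition, no caching), for MPE. Factors are (variable list, table) pairs; `h(i, assignment)` must upper-bound the value of the best extension and equal 1 at full assignments — the trivial h ≡ 1 is always valid for products of probabilities, and mini-bucket heuristics (next slides) give tighter ones. All names are ours.

```python
# Depth-first branch-and-bound for MPE over an OR search space.
def bnb_mpe(variables, domains, factors, h):
    idx = {v: k for k, v in enumerate(variables)}
    bucket = {v: [] for v in variables}       # a factor becomes fully
    for fv, ft in factors:                    # instantiated at its
        bucket[max(fv, key=idx.get)].append((fv, ft))  # latest variable
    best = [0.0]                              # incumbent (lower bound)

    def expand(i, assignment, g):
        if g * h(i, assignment) <= best[0]:   # prune: UB <= current LB
            return
        if i == len(variables):
            best[0] = g                       # improved incumbent
            return
        x = variables[i]
        for val in domains[x]:
            assignment[x] = val
            g2 = g
            for fv, ft in bucket[x]:          # extend the path cost g
                g2 *= ft[tuple(assignment[v] for v in fv)]
            expand(i + 1, assignment, g2)
        del assignment[x]

    expand(0, {}, 1.0)
    return best[0]

# e.g. bnb_mpe(['A', 'B'], {'A': [0, 1], 'B': [0, 1]},
#              [(['A'], {(0,): .6, (1,): .4}),
#               (['B', 'A'], {(0, 0): .4, (1, 0): .6,
#                             (0, 1): .1, (1, 1): .9})],
#              lambda i, a: 1.0)              # -> 0.36
```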

Page 136: Reasoning Under Uncertainty

How to generate heuristics
• The principle of relaxed models
  – Mini-Bucket Elimination for belief networks (Pearl86)

Page 137: Reasoning Under Uncertainty

Grid Networks (BN)

[Table: CPU time (sec.) and expanded nodes on grid networks 90-24-1 (w*=36, h=61; 576 variables, 20 evidence), 90-26-1 (35, 64; 676, 40) and 90-30-1 (38, 68; 900, 60), comparing SamIam v2.3.2 with MBE(i), BB+SMB(i), AOBB+SMB(i), BB+DMB(i) and AOBB+DMB(i) for i = 10, 14, 18, 20; "-" marks time-outs.]

Min-fill pseudo tree. Time limit 1 hour. (Sang et al.05)

(Sang et al.05)

Page 138: Reasoning Under Uncertainty

Genetic Linkage Analysis (BN)

pedigree(n, d)(w*, h)

Superlink

v. 1.6

time

SamIam

v. 2.3.2

time

MBE(i)BB+SMB(i)

AOBB+SMB(i)i=12

MBE(i)BB+SMB(i)

AOBB+SMB(i)i=16

MBE(i)BB+SMB(i)

AOBB+SMB(i)i=20

time nodes time nodes time nodesped18(1184, 5)(21, 119)

139.06

157.05

0.51--

--

4.59-

270.96

-

2,555,078

19.30-

20.27

-

7,689ped25(994, 5)(29, 53)

-

out

0.34--

--

3.20--

--

33.42-

1894.17

-

11,709,153ped30(1016, 5)(25, 51)

13095.83

out

0.31-

5563.22

-

63,068,960

2.66-

1811.34

-

20,275,620

24.88-

82.25

-

588,558ped33(581, 5)(26, 48)

-

out

0.41-

2335.28

-

32,444,818

5.28-

62.91

-

807,071

51.24-

76.47

-

320,279ped39(1272, 5)(23, 94)

322.14

out

0.52--

--

8.41-

4041.56

-

52,804,044

81.27-

141.23

-

407,280

(Fishelson&Geiger02)

Min-fill pseudo tree. Time limit 3 hours.

Page 139: Reasoning Under Uncertainty

Memory-intensive AND/OR Branch-and-Bound
• Associate each node n with a heuristic upper bound h(n) on v(n)
• EXPAND (top-down):
  – Evaluate f(T') of the current partial solution sub-tree T', and prune the search if f(T') ≤ LB
  – If not in the cache, expand the tip node n by generating its successors
• PROPAGATE (bottom-up):
  – Update the value of the parent p of n
    • OR nodes: maximization
    • AND nodes: multiplication
  – Cache the value of n, based on its context

Page 140: Reasoning Under Uncertainty

Best-first AND/OR search for MPE
• Best-first search expands first the node with the best heuristic evaluation function among all nodes encountered so far
• It never expands nodes whose cost is beyond the optimal one, unlike depth-first search algorithms (Dechter & Pearl85)
• Superior among memory-intensive algorithms employing the same heuristic function

Page 141: Reasoning Under Uncertainty

Best-first AND/OR search
• Maintains the set of best partial solution trees
• EXPAND (top-down):
  – Traces down marked connectors from the root (best partial solution tree)
  – Expands a tip node n by generating its successors n'
  – Associates each successor with a heuristic estimate h(n'); initializes v(n') = h(n')
• REVISE (bottom-up):
  – Updates node values v(n): OR nodes by maximization, AND nodes by multiplication
  – Marks the most promising solution tree from the root
  – Labels nodes as SOLVED: an OR node is SOLVED if its marked child is SOLVED; an AND node is SOLVED if all its children are SOLVED
• Terminates when the root node is SOLVED

[Specializes Nilsson's AO* (Nilsson80) to graphical models.] (Marinescu & Dechter, 07)

Page 142: Reasoning Under Uncertainty

Grid Networks (BN)

[Table: CPU time (sec.) and expanded nodes on grid networks 90-24-1, 90-34-1 and 90-38-1, comparing SamIam with MBE(i), BB-C+SMB(i), AOBB+SMB(i), AOBB-C+SMB(i) and best-first AOBF-C+SMB(i) for i = 12, 14, 16, 18; "-" marks time-outs and "out" marks memory-outs.]

Min-fill pseudo tree. Time limit 1 hour.

Page 143: Reasoning Under Uncertainty

Solving the MAP task

MAP = (a1*, …, ak*) = argmax_a Σ_{X \ A} P(x, e)

• Solved by BE in time and space exponential in the constrained induced width w*
• Solved by AND/OR search:
  – Tree search: space O(n), time O(exp(w* log n))
  – Graph search: time and space O(exp(w*))

Page 144: Reasoning Under Uncertainty

Bucket elimination for MAP

[Figure: the A, B, C, D, E network and its moral graph ("marry the parents").]

Variables A and B are the hypothesis variables; variable E is evidence.

MAP = max_{a,b} P(a, b, E=0) = max_{a,b} Σ_{c,d} P(a, b, c, d, E=0)
    = max_a P(a) max_b P(b|a) Σ_c P(c|a) P(E=0|b,c) Σ_d P(d|a,b)

Page 145: Reasoning Under Uncertainty

Bucket elimination for MAP

SUM buckets (E, D, C) are processed first, then MAX buckets (B, A):

Bucket E: P(E|B,C), E=0              →  λE(B,C)
Bucket D: P(D|A,B)                   →  λD(A,B)
Bucket C: P(C|A), λE(B,C)            →  λC(A,B)
Bucket B: P(B|A), λD(A,B), λC(A,B)   →  λB(A)
Bucket A: P(A), λB(A)                →  MAP value

Page 146: Reasoning Under Uncertainty

Bucket elimination for MAP
• The elimination order is important: SUM variables are eliminated first, followed by the MAX variables
  – Ordering A, B, C, D, E is legal
  – Ordering A, C, D, E, B is illegal
• The induced width corresponding to a legal elimination order is called the constrained induced width, cw*
  – Typically it may be far larger than the unconstrained induced width, i.e., cw* ≥ w*
• When interleaving MAX and SUM (using unconstrained orderings), the result is an upper bound on the MAP value
  – It can be used as a guiding heuristic function for search

Page 147: Reasoning Under Uncertainty

AND/OR tree algorithm for MAP

AND node: combination operator (product)
OR node: MAX for hypothesis variables, SUM otherwise

[Figure: the weighted AND/OR tree for the A, B, C, D, E example (evidence D=1, E=0), evaluated with MAX at A and B and SUM at C, D, E; v(A=0) = .162 and v(A=1) = .0936.]

Result: MAP(D=1, E=0) = max(0.162 · 0.6, 0.0936 · 0.4) = 0.0972

Page 148: Reasoning Under Uncertainty

AND/OR search for MAP
• The pseudo tree must be consistent with the constrained elimination order
• Graph search via context-based caching
• Time and space complexity:
  – Tree search: space linear, time O(exp(cw* log n))
  – Graph search: time and space O(exp(cw*))

Page 149: Reasoning Under Uncertainty

Outline
• Probabilistic modeling with joint distributions
• Conditional independence and factorization
• Belief networks
• Inference in belief networks
  – Exact inference
  – Approximate inference

Page 150: Reasoning Under Uncertainty

Approximate inference
• Mini-bucket elimination
  – Mini-clustering
• Iterative belief propagation
  – IJGP – iterative join-graph propagation
• Sampling
  – Forward sampling
  – Gibbs sampling (MCMC)
  – Importance sampling

Page 151: Reasoning Under Uncertainty

Solution techniques

[Diagram: the space of solution techniques, repeated.]
• Inference (elimination):
  – Complete: variable elimination, tree clustering / bucket elimination — time exp(treewidth), space exp(treewidth)
  – Incomplete: mini-bucket(i), mini-clustering(i), belief propagation
• Search (conditioning):
  – Complete: DFS search — time exp(n), space linear; AND/OR search — time exp(treewidth · log n), space linear
  – Incomplete: stochastic local search, gradient descent
• Hybrids of the two — time exp(pathwidth), space exp(pathwidth)

Page 152: Reasoning Under Uncertainty

Variable elimination (MPE)

Given a belief network and some evidence:

MPE = max_{A,E=0,D,C,B} P(A) P(B|A) P(C|A) P(D|A,B) P(E|B,C)
    = max_A P(A) max_{E=0} max_D max_C P(C|A) max_B P(B|A) P(D|A,B) P(E|B,C)

The innermost maximization defines λB(A,D,C,E) — Variable Elimination.

Given a belief network and some evidence

Page 153: Reasoning Under Uncertainty

Bucket elimination

max∏ is the elimination operator:

Bucket B: P(E|B,C), P(D|A,B), P(B|A)  →  λB(A,D,C,E)
Bucket C: P(C|A), λB(A,D,C,E)         →  λC(A,D,E)
Bucket D: λC(A,D,E)                   →  λD(A,E)
Bucket E: E=0, λD(A,E)                →  λE(A)
Bucket A: P(A), λE(A)                 →  MPE

Widths per bucket (top to bottom): 4, 3, 1, 1, 0. w* = 4, the "induced width" (max clique size).

Page 154: Reasoning Under Uncertainty

MBE: Mini-Bucket Elimination

• Computation in a bucket is time and space exponential in the number of variables involved (i.e., its width)
• Therefore, partition the functions in a bucket into "mini-buckets" defined on smaller numbers of variables
• The idea is similar to i-consistency: bound the size of recorded dependencies (Dechter 2003)

Page 155: Reasoning Under Uncertainty

Idea: MPE task

Split a bucket into mini-buckets => bound complexity:

  max_X ∏_{i=1}^{n} h_i  ≤  ( max_X ∏_{i=1}^{r} h_i ) · ( max_X ∏_{i=r+1}^{n} h_i )

Exponential complexity decrease: O(e^n) → O(e^r) + O(e^(n-r))

Page 156: Reasoning Under Uncertainty

MBE(i=3) in action for MPE

Mini-bucket elimination operator: max ∏ within each mini-bucket.

Bucket B: [ P(E|B,C) ] [ P(D|A,B), P(B|A) ]   →  λB(C,E), λB(A,D)   (too many variables: split)
Bucket C: P(C|A), λB(C,E)                     →  λC(A,E)            (3 variables: OK)
Bucket D: λB(A,D)                             →  λD(A)              (2 variables: OK)
Bucket E: E=0, λC(A,E)                        →  λE(A)              (2 variables: OK)
Bucket A: P(A), λD(A), λE(A)                  →  Upper Bound on the MPE value

Page 157: Reasoning Under Uncertainty

MBE(i=3) in action for MPE

Bucket B: [ P(E|B,C) ] [ P(D|A,B), P(B|A) ]
Bucket C: P(C|A), λB(C,E)
Bucket D: λB(A,D)
Bucket E: E=0, λC(A,E)
Bucket A: P(A), λD(A), λE(A)

Generate an assignment greedily, going from the last bucket (A) back to the first (B):

  a' = argmax_A P(A) · λD(A) · λE(A)
  e' = 0
  d' = argmax_D λB(a', D)
  c' = argmax_C P(C|a') · λB(C, e')
  b' = argmax_B P(e'|B, c') · P(d'|a', B) · P(B|a')

Return (a', b', c', d', e'). A Lower Bound can also be computed as the probability of this sub-optimal assignment, P(a', b', c', d', e').
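A minimal Python sketch of the MBE(i=3) upper bound on this network may help; the CPT numbers are hypothetical placeholders, and the assertion checks the defining property that the mini-bucket result upper-bounds the exact MPE:

import numpy as np

# Hypothetical binary CPTs (the schematic above does not fix the numbers).
P_A   = np.array([0.6, 0.4])                                   # [A]
P_BA  = np.array([[0.4, 0.6], [0.1, 0.9]])                     # [A,B]
P_CA  = np.array([[0.2, 0.8], [0.7, 0.3]])                     # [A,C]
P_DAB = np.array([[[0.3, 0.7], [0.5, 0.5]],
                  [[0.8, 0.2], [0.6, 0.4]]])                   # [A,B,D]
P_EBC = np.array([[[0.9, 0.1], [0.4, 0.6]],
                  [[0.2, 0.8], [0.7, 0.3]]])                   # [B,C,E]
e = 0                                                          # evidence E = 0

# Bucket B spans {A,B,C,D,E}; split it and maximize over B in each
# mini-bucket separately: max_B f·g <= (max_B f)·(max_B g) gives the bound.
lamB_C  = P_EBC[:, :, e].max(axis=0)                           # λB(C): max_B P(E=e|B,C)
lamB_AD = np.einsum('abd,ab->abd', P_DAB, P_BA).max(axis=1)    # λB(A,D)
lamC_A  = (P_CA * lamB_C[None, :]).max(axis=1)                 # λC(A): max over C
lamD_A  = lamB_AD.max(axis=1)                                  # λD(A): max over D
upper   = (P_A * lamC_A * lamD_A).max()                        # upper bound on MPE

# Exact MPE by brute force, for comparison.
exact = max(P_A[a] * P_BA[a, b] * P_CA[a, c] * P_DAB[a, b, d] * P_EBC[b, c, e]
            for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1))
assert upper >= exact - 1e-12
print(upper, exact)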

Page 158: Reasoning Under Uncertainty

MBE(i=3) for probability of evidence

Mini-bucket elimination operator: ∑ ∏ within each mini-bucket.

Bucket B: [ P(E|B,C) ] [ P(D|A,B), P(B|A) ]   →  λB(C,E), λB(A,D)   (too many variables: split)
Bucket C: P(C|A), λB(C,E)                     →  λC(A,E)            (3 variables: OK)
Bucket D: λB(A,D)                             →  λD(A)              (2 variables: OK)
Bucket E: E=0, λC(A,E)                        →  λE(A)              (2 variables: OK)
Bucket A: P(A), λD(A), λE(A)                  →  Upper Bound on P(evidence)

Page 159: Reasoning Under Uncertainty

MBE(i) for probability of evidence

• If we process all mini-buckets by summation, then we get an unnecessarily large upper bound on the probability of evidence
• Tighter upper bound:
  Process the first mini-bucket by summation and the remaining ones by maximization
• We can also get a lower bound on P(evidence):
  Process the first mini-bucket by summation and the remaining ones by minimization

Page 160: Reasoning Under Uncertainty

Properties of MBE(i)

• Controlling parameter i (called the i-bound):
  Maximum number of distinct variables in a mini-bucket
  Outputs both a lower and an upper bound
• Complexity: O(exp(i)) time and space
• As the i-bound increases, both accuracy and time complexity increase
  Clearly, if i = w*, then we have pure BE
• Possible uses of mini-bucket approximations:
  As anytime algorithms (Dechter & Rish, 1997)
  As heuristic functions for depth-first and best-first search (Kask & Dechter, 2001), (Marinescu & Dechter, 2005)

Page 161: Reasoning Under Uncertainty

Mini-Bucket Heuristics

• Static Mini-Buckets
  Pre-compiled
  Reduced overhead
  Less accurate
  Static variable ordering
• Dynamic Mini-Buckets
  Computed dynamically
  Higher overhead
  High accuracy
  Dynamic variable ordering

Page 162: Reasoning Under Uncertainty

Heuristic evaluation function

[Figure: a partially expanded AND/OR search tree over A, B, C, D, E, F; heuristic estimates are attached to the tip nodes, e.g. h(D,0) = 4 and h(F) = 5.]

A B C | f1(A,B,C)      A B F | f2(A,B,F)      B D E | f3(B,D,E)
0 0 0 |    2           0 0 0 |    3           0 0 0 |    6
0 0 1 |    5           0 0 1 |    5           0 0 1 |    4
0 1 0 |    3           0 1 0 |    1           0 1 0 |    8
0 1 1 |    5           0 1 1 |    4           0 1 1 |    5
1 0 0 |    9           1 0 0 |    6           1 0 0 |    9
1 0 1 |    3           1 0 1 |    5           1 0 1 |    3
1 1 0 |    7           1 1 0 |    6           1 1 0 |    7
1 1 1 |    2           1 1 1 |    5           1 1 1 |    4

Evaluation of the current partial solution tree T':

  f(T') = w(A,0) · w(B,1) · w(C,0) · w(D,0) · h(D,0) · h(F) ≥ f*(T')

where the heuristic at every tip node n overestimates the exact value below it: h(n) ≥ v(n).

Page 163: Reasoning Under Uncertainty

Bucket elimination (Dechter99)

Ordering: (A, B, C, D, E, F, G); buckets are processed from G down to A:

Bucket G: f(A,G), f(F,G)               →  hG(A,F)
Bucket F: f(B,F), hG(A,F)              →  hF(A,B)
Bucket E: f(B,E), f(C,E)               →  hE(B,C)
Bucket D: f(A,D), f(B,D), f(C,D)       →  hD(A,B,C)
Bucket C: f(B,C), hD(A,B,C), hE(B,C)   →  hC(A,B)
Bucket B: f(A,B), hF(A,B), hC(A,B)     →  hB(A)
Bucket A: hB(A)

The exact heuristic for the partial assignment (a, b, c):

  h*(a, b, c) = hD(a, b, c) · hE(b, c)

Page 164: Reasoning Under Uncertainty

Static mini-bucket heuristics: MBE(3)

Ordering: (A, B, C, D, E, F, G). Bucket D does not fit the i-bound and is split into mini-buckets:

Bucket G: f(A,G), f(F,G)                  →  hG(A,F)
Bucket F: f(B,F), hG(A,F)                 →  hF(A,B)
Bucket E: f(B,E), f(C,E)                  →  hE(B,C)
Bucket D: [ f(B,D), f(C,D) ] [ f(A,D) ]   →  hD(B,C), hD(A)
Bucket C: f(B,C), hD(B,C), hE(B,C)        →  hC(B)
Bucket B: f(A,B), hF(A,B), hC(B)          →  hB(A)
Bucket A: hB(A), hD(A)

  h(a, b, c) = hD(a) · hD(b, c) · hE(b, c) ≥ h*(a, b, c)

Page 165: Reasoning Under Uncertainty

Dynamic mini-bucket heuristics: MBE(3)

Ordering: (A, B, C, D, E, F, G). The functions are first conditioned on the current partial assignment (A=a, B=b), so no bucket exceeds the i-bound:

Bucket G: f(a,G), f(F,G)            →  hG(F)
Bucket F: f(b,F), hG(F)             →  hF()
Bucket E: f(b,E), f(C,E)            →  hE(C)
Bucket D: f(a,D), f(b,D), f(C,D)    →  hD(C)
Bucket C: f(b,C), hD(C), hE(C)      →  hC()
Bucket B: f(a,b), hF(), hC()        →  hB()

  h(a, b, c) = hD(c) · hE(c) = h*(a, b, c)

Page 166: Reasoning Under Uncertainty

Static vs. Dynamic Mini-Bucket Heuristics

[Figure: experimental comparison on the s1196 ISCAS'89 circuit.]

Page 167: Reasoning Under Uncertainty

Approximate inference

• Mini-Bucket Elimination
  Mini-clustering (tree decompositions)
• Iterative Belief Propagation
  IJGP – Iterative Join-Graph Propagation
• Sampling
  Forward sampling
  Gibbs sampling (MCMC)
  Importance sampling
  Particle filtering

Page 168: Reasoning Under Uncertainty

Cluster Tree Elimination (CTE)

• Correctness and completeness: algorithm CTE is correct, i.e., it computes the exact posterior joint probability of all single variables (or subsets) and the evidence
• Time complexity: O(deg · (n + N) · d^(w*+1))
• Space complexity: O(N · d^sep)

where deg = the maximum degree of a node
      n   = number of variables (= number of CPTs)
      N   = number of nodes in the tree decomposition
      d   = the maximum domain size of a variable
      w*  = the induced width
      sep = the separator size

Page 169: Reasoning Under Uncertainty

Cluster Tree Elimination - messages

Tree decomposition (a chain) for the network over A..G:

  Cluster 1: {A,B,C}    p(a), p(b|a), p(c|a,b)
  Cluster 2: {B,C,D,F}  p(d|b), p(f|c,d)          separator with 1: {B,C}
  Cluster 3: {B,E,F}    p(e|b,f)                  separator with 2: {B,F}
  Cluster 4: {E,F,G}    p(g|e,f)                  separator with 3: {E,F}

Example messages:

  h(1,2)(b,c) = Σ_a p(a) · p(b|a) · p(c|a,b)

  h(2,3)(b,f) = Σ_{c,d} p(d|b) · p(f|c,d) · h(1,2)(b,c)

with sep(2,3) = {B,F} and elim(2,3) = {C,D}.

Page 170: Reasoning Under Uncertainty

Mini-Clustering for belief updating

• Motivation:
  Time and space complexity of Cluster Tree Elimination depend on the induced width w* of the problem
  When the induced width w* is big, the CTE algorithm becomes infeasible
• The basic idea:
  Try to reduce the size of the cluster (the exponent); partition each cluster into mini-clusters with fewer variables
  Accuracy parameter i = maximum number of variables in a mini-cluster
  The idea was explored for variable elimination (MBE)

Page 171: Reasoning Under Uncertainty

Idea of Mini-Clustering

Split a cluster into mini-clusters => bound complexity.

Let cluster(u) = {h1, ..., hr, hr+1, ..., hn}. Exact CTE computes the message

  h = Σ_elim ∏_{i=1}^{n} h_i

Partitioning into the mini-clusters {h1, ..., hr} and {hr+1, ..., hn} computes instead

  g = ( Σ_elim ∏_{i=1}^{r} h_i ) · ( Σ_elim ∏_{i=r+1}^{n} h_i )

with h ≤ g. Exponential complexity decrease: O(e^n) → O(e^r) + O(e^(n-r))

Page 172: Reasoning Under Uncertainty

Mini-Clustering (MC)

Cluster Tree Elimination computes the exact message over sep(2,3) = {B,F}, elim(2,3) = {C,D}:

  h(2,3)(b,f) = Σ_{c,d} p(d|b) · p(f|c,d) · h(1,2)(b,c)

where h(1,2)(b,c) = Σ_a p(a) · p(b|a) · p(c|a,b).

Mini-Clustering with i = 3 partitions cluster 2 = {B,C,D,F} into the mini-clusters
{ p(d|b), h1(1,2)(b,c) } and { p(f|c,d) }, each defined on at most 3 variables, and sends

  h1(2,3)(b) = Σ_{c,d} p(d|b) · h1(1,2)(b,c)
  h2(2,3)(f) = max_{c,d} p(f|c,d)

[Figure: the chain tree decomposition with clusters {A,B,C}, {B,C,D,F}, {B,E,F}, {E,F,G}.]

Page 173: Reasoning Under Uncertainty

Mini-Clustering - example

Clusters: 1 = {A,B,C}, 2 = {B,C,D,F}, 3 = {B,E,F}, 4 = {E,F,G}; separators {B,C}, {B,F}, {E,F}. Evidence: G = g_e. Messages with i = 3:

H(1,2):  h1(1,2)(b,c) := Σ_a p(a) · p(b|a) · p(c|a,b)

H(2,1):  h1(2,1)(b) := Σ_{d,f} p(d|b) · h1(3,2)(b,f)
         h2(2,1)(c) := max_{d,f} p(f|c,d)

H(2,3):  h1(2,3)(b) := Σ_{c,d} p(d|b) · h1(1,2)(b,c)
         h2(2,3)(f) := max_{c,d} p(f|c,d)

H(3,2):  h1(3,2)(b,f) := Σ_e p(e|b,f) · h1(4,3)(e,f)

H(3,4):  h1(3,4)(e,f) := Σ_b p(e|b,f) · h1(2,3)(b) · h2(2,3)(f)

H(4,3):  h1(4,3)(e,f) := p(G = g_e | e,f)

Page 174: Reasoning Under Uncertainty

Mini-Clustering

• Correctness and completeness: algorithm MC(i) computes a bound (or an approximation) on the joint probability P(Xi, e) of each variable and each of its values
• Time & space complexity: O(exp(i))

Page 175: Reasoning Under Uncertainty

Approximate inference• Mini-Bucket Elimination

Mini-clustering• Iterative Belief Propagation

IJGP – Iterative Joint Graph Propagation• Sampling

Forward sampling Gibbs sampling (MCMC) Importance sampling Particle filtering

Page 176: Reasoning Under Uncertainty

Iterative Belief Propagation (IBP)

• Belief propagation is exact for poly-trees (Pearl, 1988)
• IBP: applying BP iteratively to cyclic networks
• No guarantees for convergence
• Works well for many coding networks

[Figure: one BP step on a poly-tree fragment with parents U1, U2, U3 and children X1, X2, updating BEL(U1) from the incoming π and λ messages.]

Page 177: Reasoning Under Uncertainty

Iterative Belief Propagation

[Figure, left: a belief network over A..J with CPTs P(A), P(C), P(H), P(B|A,C), P(D|A,B,E), P(E|B,C), P(F|C,D,E), P(G|H,F), P(I|F,G), P(J|H,G,I).
Figure, right: the graph IBP works on (the dual graph), with clusters A, C, H, ABC, BCE, ABDE, CDEF, FGH, FGI, GHIJ and arcs labeled by the shared variables A, AB, BC, BE, C, CDE, CE, F, FG, GH, GI, H.]

Page 178: Reasoning Under Uncertainty

Iterative Join-Graph Propagation (IJGP)

• IBP is applied to a loopy network iteratively
  Not an anytime algorithm
  When it converges, it converges very fast
• MC applies bounded inference along a tree decomposition
  MC is an anytime algorithm controlled by the i-bound
  MC converges in two passes, up and down the tree
• IJGP combines:
  The iterative feature of IBP
  The anytime feature of MC

Page 179: Reasoning Under Uncertainty

IJGP - The basic idea

• Apply Cluster Tree Elimination to any join-graph
• We commit to graphs that are minimal I-maps
• Avoid cycles as long as I-mapness is not violated
• Result: use minimal arc-labeled join-graphs

Page 180: Reasoning Under Uncertainty

IJGP - Example

[Figure: the belief network over A..J (left) and the dual graph IBP works on (right), as on the previous slide.]

Page 181: Reasoning Under Uncertainty

Arc-minimal join-graph

[Figure: the dual graph (left) and an arc-minimal join-graph (right), obtained by removing redundant arcs while keeping the subgraph of every variable connected.]

Page 182: Reasoning Under Uncertainty

Minimal arc-labeled join-graph

[Figure: the arc-minimal join-graph (left) and a minimal arc-labeled join-graph (right), obtained by shrinking arc labels, e.g. the label FG reduced to F.]

Page 183: Reasoning Under Uncertainty

Join-graph decompositions

[Figure: a) minimal arc-labeled join-graph; b) join-graph obtained by collapsing nodes of graph a), e.g. ABC and ABDE merged into ABCDE; c) minimal arc-labeled version of b).]

Page 184: Reasoning Under Uncertainty

Tree decomposition

[Figure: a) minimal arc-labeled join-graph; b) a tree decomposition for the same network, with clusters ABCDE, CDEF, FGHI, GHIJ and separators CDE, F, GHI.]

Page 185: Reasoning Under Uncertainty

Join-graphs

[Figure: a spectrum of join-graphs for the same network, from the dual graph through arc-minimal and collapsed join-graphs to a tree decomposition: less complexity at one end, more accuracy at the other.]

Page 186: Reasoning Under Uncertainty

Message propagation

Cluster 1 = {A,B,C,D,E} contains p(a), p(c), p(b|a,c), p(d|a,b,e), p(e|b,c) and the incoming message h(3,1)(b,c).

Minimal arc-labeled: sep(1,2) = {D,E}, elim(1,2) = {A,B,C}:

  h(1,2)(d,e) = Σ_{a,b,c} p(a) · p(c) · p(b|a,c) · p(d|a,b,e) · p(e|b,c) · h(3,1)(b,c)

Non-minimal arc-labeled: sep(1,2) = {C,D,E}, elim(1,2) = {A,B}:

  h(1,2)(c,d,e) = Σ_{a,b} p(a) · p(c) · p(b|a,c) · p(d|a,b,e) · p(e|b,c) · h(3,1)(b,c)

Page 187: Reasoning Under Uncertainty

Bounded decompositions

• We want arc-labeled decompositions such that:
  The cluster size (internal width) is bounded by i (the accuracy parameter)
  The width of the decomposition as a graph (external width) is as small as possible, i.e., closer to a tree
• Possible approaches to build decompositions:
  Partition-based algorithms, inspired by the mini-bucket decomposition
  Grouping-based algorithms

Page 188: Reasoning Under Uncertainty

Partition-based algorithms

a) Schematic mini-bucket(i), i = 3:

  G: (GFE)
  E: (EBF) (EF)
  F: (FCD) (BF)
  D: (DB) (CD)
  C: (CAB) (CB)
  B: (BA) (AB) (B)
  A: (A)

b) [Figure: the corresponding minimal arc-labeled join-graph decomposition, with clusters such as (GFE: P(G|F,E)), (EBF: P(E|B,F)), (FCD: P(F|C,D)), (CDB: P(D|B)), (CAB: P(C|A,B)), (BA: P(B|A)), (A: P(A)), connected by arcs labeled with the shared variables.]

Page 189: Reasoning Under Uncertainty

IJGP properties

• IJGP(i) applies BP to a minimal arc-labeled join-graph whose cluster size is bounded by i
• On join-trees, IJGP finds the exact beliefs!
• IJGP is a Generalized Belief Propagation algorithm (Yedidia, Freeman and Weiss, 2001)
• Complexity of one iteration:
  time: O(deg · (n+N) · d^(i+1))
  space: O(N · d)

Page 190: Reasoning Under Uncertainty

Random networks - KL at convergence

[Figure: KL distance at convergence vs. i-bound for IJGP, MC and IBP on random networks, N=50, K=2, P=3, w*=16, 100 instances; left panel: evidence=0, right panel: evidence=5.]

Page 191: Reasoning Under Uncertainty

Random networks - KL vs. iterations

[Figure: KL distance vs. number of iterations for IJGP(2), IJGP(10) and IBP on random networks, N=50, K=2, P=3, w*=16, 100 instances; left panel: evidence=0, right panel: evidence=5.]

Page 192: Reasoning Under Uncertainty

Random networks - Time

[Figure: runtime in seconds vs. i-bound for IJGP (20 iterations), MC and IBP (10 iterations) on random networks, N=50, K=2, P=3, evid=5, w*=16, 100 instances.]

Page 193: Reasoning Under Uncertainty

IJGP summary

• IJGP borrows the iterative feature from IBP and the anytime virtues of bounded inference from MC
• Empirical evaluation showed the potential of IJGP, which improves with iteration and most of the time with the i-bound, and scales up to large networks
• IJGP is almost always superior, often by a high margin, to IBP and MC
• Based on all our experiments, we think that IJGP provides a practical breakthrough for the task of belief updating
• #CSP: IJGP can be used to generate solution-count estimates for depth-first Branch-and-Bound search

Page 194: Reasoning Under Uncertainty

Approximate inference

• Mini-Bucket Elimination
  Mini-clustering
• Iterative Belief Propagation
  IJGP – Iterative Join-Graph Propagation
• Sampling
  Forward sampling
  Gibbs sampling (MCMC)
  Importance sampling

Page 195: Reasoning Under Uncertainty

Approximation algorithms

• Structural approximations
  Eliminate some dependencies
  Remove edges
  Mini-Bucket and Mini-Clustering approaches
• Local search
  Approach for optimization tasks: MPE, MAP
  Use your favorite MAX-CSP/WCSP/WSAT local search solver!
• Sampling
  Generate random samples and compute the values of interest from the samples, not the original network

Page 196: Reasoning Under Uncertainty

Sampling

• Input: Bayesian network with set of nodes X
• Sample = a tuple with assigned values, s = (X1=x1, X2=x2, ..., Xk=xk)
• A tuple may include all variables (except evidence) or a subset
• Sampling schemas dictate how to generate samples (tuples)
• Ideally, samples are distributed according to P(X|E)

Page 197: Reasoning Under Uncertainty

Sampling fundamentals

Given a set of variables X = {X1, X2, ..., Xn} that represents a joint probability distribution Π(X) and some function g(X), we can compute the expected value of g(X):

  E[g] = ∫ g(x) Π(x) dx

Page 198: Reasoning Under Uncertainty

Sampling from Π(X)

Given independent, identically distributed (iid) samples S1, S2, ..., ST from Π(X), it follows from the Strong Law of Large Numbers that

  ĝ = (1/T) Σ_{t=1}^{T} g(S_t)

converges to E[g]. A sample S_t is an instantiation S_t = {x1^t, x2^t, ..., xn^t}.

Page 199: Reasoning Under Uncertainty

Sampling basics

• Given random variable X, D(X) = {0, 1}
• Given P(X) = {0.3, 0.7}
• Generate k = 10 samples: 0,1,1,1,0,1,1,0,1,0
• Approximate P'(X):

  P'(X=0) = #samples(X=0) / #samples = 4/10 = 0.4
  P'(X=1) = #samples(X=1) / #samples = 6/10 = 0.6
  P'(X) = {0.4, 0.6}

Page 200: Reasoning Under Uncertainty

How to draw a sample?

• Given random variable X, D(X) = {0, 1}
• Given P(X) = {0.3, 0.7}
• Sample X ~ P(X):
  Draw a random number r ∈ [0, 1]
  If r < 0.3 then set X = 0, else set X = 1
• Can generalize to any domain size (see the sketch below)
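In code, this inverse-CDF trick looks roughly like the following sketch (the helper name is ours):

import random

def sample_discrete(probs):
    """Draw an index from a discrete distribution: split [0,1) into
    segments of length probs[i] and see where a uniform r lands."""
    r, acc = random.random(), 0.0
    for v, p in enumerate(probs):
        acc += p
        if r < acc:
            return v
    return len(probs) - 1          # guard against floating-point rounding

x = sample_discrete([0.3, 0.7])    # r < 0.3 -> X=0, else X=1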

Page 201: Reasoning Under Uncertainty

Sampling in BN

• Same idea: generate a set of T samples
• Estimate the posterior marginal P(Xi|E) from the samples
• Challenge: X is a vector and P(X) is a huge distribution represented by the BN
• Need to know:
  How to generate a new sample?
  How many samples T do we need?
  How to estimate P(E=e) and P(Xi|e)?

Page 202: Reasoning Under Uncertainty

Sampling algorithms

• Forward Sampling
• Gibbs Sampling (MCMC)
  Blocking
  Rao-Blackwellised
• Likelihood Weighting
• Importance Sampling
• Sequential Monte-Carlo (Particle Filtering) in Dynamic Bayesian Networks

Page 203: Reasoning Under Uncertainty

Forward sampling

• Forward Sampling
  Case with no evidence, E = {}
  Case with evidence, E = e

Page 204: Reasoning Under Uncertainty

Forward sampling, no evidence (Henrion 1988)

Input: Bayesian network, X = {X1,...,XN}, N = #nodes, T = #samples
Output: T samples

Process nodes in topological order (first process the ancestors of a node, then the node itself):

1. For t = 1 to T
2.   For i = 1 to N
3.     Xi ← sample xi^t from P(xi | pai)
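A minimal forward-sampling sketch, with a small hypothetical network encoded as dictionaries (the encoding is ours, not from the slides):

import random

def sample_discrete(probs):                      # inverse-CDF draw, as above
    r, acc = random.random(), 0.0
    for v, p in enumerate(probs):
        acc += p
        if r < acc:
            return v
    return len(probs) - 1

def forward_sample(nodes, parents, cpt):
    """nodes must be in topological order; cpt[X] maps a tuple of parent
    values to a distribution over X."""
    s = {}
    for X in nodes:
        pa = tuple(s[U] for U in parents[X])
        s[X] = sample_discrete(cpt[X][pa])
    return s

nodes = ['X1', 'X2', 'X3', 'X4']
parents = {'X1': (), 'X2': ('X1',), 'X3': ('X1',), 'X4': ('X2', 'X3')}
cpt = {'X1': {(): [0.6, 0.4]},
       'X2': {(0,): [0.7, 0.3], (1,): [0.2, 0.8]},
       'X3': {(0,): [0.5, 0.5], (1,): [0.9, 0.1]},
       'X4': {(0, 0): [0.5, 0.5], (0, 1): [0.4, 0.6],
              (1, 0): [0.6, 0.4], (1, 1): [0.5, 0.5]}}

samples = [forward_sample(nodes, parents, cpt) for _ in range(1000)]
p_x4 = sum(s['X4'] for s in samples) / len(samples)   # estimates P(X4=1)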

Page 205: Reasoning Under Uncertainty

Sampling a value

What does it mean to sample xi^t from P(Xi | pai)?

• Assume D(Xi) = {0,1}
• Assume P(Xi | pai) = (0.3, 0.7)
• Draw a random number r from [0,1]:
  If r falls in [0, 0.3], set Xi = 0
  If r falls in (0.3, 1], set Xi = 1

  0 |------0.3---r-----------| 1

Page 206: Reasoning Under Uncertainty

Forward sampling (example)

[Figure: network X1 → X2, X1 → X3, {X2,X3} → X4, with CPTs P(x1), P(x2|x1), P(x3|x1), P(x4|x2,x3).]

// generate sample k (no evidence)
1. Sample x1 from P(x1)
2. Sample x2 from P(x2|x1)
3. Sample x3 from P(x3|x1)
4. Sample x4 from P(x4|x2,x3)

Page 207: Reasoning Under Uncertainty

Forward Sampling - Answering Queries

Task: given T samples {S1, S2, ..., ST}, estimate P(Xi = xi):

  P̂(Xi = xi) = #samples(Xi = xi) / T

Basically, count the proportion of samples where Xi = xi.

Page 208: Reasoning Under Uncertainty

Forward sampling w/ evidence

Input: Bayesian network, X = {X1,...,XN}, N = #nodes, E = evidence, T = #samples
Output: T samples consistent with E

1. For t = 1 to T
2.   For i = 1 to N
3.     Xi ← sample xi^t from P(xi | pai)
4.     If Xi ∈ E and xi^t ≠ ei, reject the sample:
5.       set i = 1 and go to step 2
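The rejection step amounts to a thin wrapper over the forward_sample sketch above (same hypothetical network encoding):

def forward_sample_with_evidence(nodes, parents, cpt, evidence):
    """Retry forward sampling until the sample agrees with the evidence;
    this gets expensive when P(e) is small (see the performance slide)."""
    while True:
        s = forward_sample(nodes, parents, cpt)
        if all(s[X] == v for X, v in evidence.items()):
            return s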

Page 209: Reasoning Under Uncertainty

Forward sampling (example)

[Figure: the same network X1 → X2, X1 → X3, {X2,X3} → X4.]

// generate sample k, evidence: X3 = 0
1. Sample x1 from P(x1)
2. Sample x2 from P(x2|x1)
3. Sample x3 from P(x3|x1)
4. If x3 ≠ 0, reject the sample and start again from step 1; otherwise
5. Sample x4 from P(x4|x2,x3)

Page 210: Reasoning Under Uncertainty

Forward sampling: illustration

Let Y be a subset of evidence nodes s.t. Y = u. [Figure: only the samples consistent with Y = u are retained.]

Page 211: Reasoning Under Uncertainty

Forward sampling – How many samples?

Theorem: let ŝ(y) be the estimate of P(y) resulting from a randomly chosen sample set S with T samples. Then, to guarantee relative error at most ε with probability at least 1 − δ, it is enough to have

  T ≥ c / (ε² · P(y))

where the constant c grows with 1/δ. Derived from Chebyshev's bound.

Page 212: Reasoning Under Uncertainty

Forward sampling: performance

Advantages:
• P(xi | pa(xi)) is readily available
• Samples are independent!

Drawbacks:
• If evidence E is rare (P(e) is low), then we will reject most of the samples!
• Since P(y) in the sample-size bound is unknown, we must estimate it from the samples themselves!
• If P(e) is small, T will become very big!


Page 214: Reasoning Under Uncertainty

Problem: Evidence

• Forward Sampling
  High rejection rate
  Samples are independent
• Fix evidence values
  Gibbs sampling (MCMC)
  Likelihood Weighting
  Importance Sampling

Page 215: Reasoning Under Uncertainty

Sampling algorithms

• Forward Sampling
• Gibbs Sampling (MCMC)
  Blocking
  Rao-Blackwellised
• Likelihood Weighting
• Importance Sampling

Page 216: Reasoning Under Uncertainty

Gibbs Sampling

• Markov Chain Monte Carlo method
  (Gelfand and Smith, 1990; Smith and Roberts, 1993; Tierney, 1994)
• Samples are dependent and form a Markov Chain
• Samples are drawn from P'(X|e), which converges to P(X|e)
• Guaranteed to converge when all P > 0
• Methods to improve convergence:
  Blocking
  Rao-Blackwellised

Page 217: Reasoning Under Uncertainty

Gibbs Sampling (Pearl, 1988)

• A sample t ∈ [1, 2, ...] is an instantiation of all variables in the network:

  x^t = {X1 = x1^t, X2 = x2^t, ..., XN = xN^t}

• Sampling process:
  Fix the values of the observed variables e
  Instantiate the node values in sample x^0 at random
  Generate samples x^1, x^2, ..., x^T from P(X|e)
  Compute posteriors from the samples

Page 218: Reasoning Under Uncertainty

Ordered Gibbs Sampler

Generate sample x^{t+1} from x^t by processing all variables in some order:

  X1 = x1^{t+1}  sampled from  P(x1 | x2^t, x3^t, ..., xN^t, e)
  X2 = x2^{t+1}  sampled from  P(x2 | x1^{t+1}, x3^t, ..., xN^t, e)
  ...
  XN = xN^{t+1}  sampled from  P(xN | x1^{t+1}, x2^{t+1}, ..., x_{N-1}^{t+1}, e)

In short, for i = 1 to N:

  Xi = xi^{t+1}  sampled from  P(xi | x^t \ xi, e)

Page 219: Reasoning Under Uncertainty

Gibbs Sampling (Pearl, 1988)

Markov blanket of Xi:

  M_i = pa_i ∪ ch_i ∪ ( ∪_{Xj ∈ ch_i} pa_j )

Given its Markov blanket (parents, children, and their parents), Xi is independent of all other nodes.

Important:  P(xi | x^t \ xi) = P(xi | markov_i^t), and

  P(xi | x^t \ xi) ∝ P(xi | pa_i) · ∏_{Xj ∈ ch_i} P(xj | pa_j)

Page 220: Reasoning Under Uncertainty

Ordered Gibbs Sampling Algorithm

Input: X, E
Output: T samples {x^t}

• Fix evidence E
• Generate samples from P(X | E):
1. For t = 1 to T (compute samples)
2.   For i = 1 to N (loop through variables)
3.     Xi ← sample xi^t from P(Xi | markov^t \ Xi)
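A minimal ordered Gibbs sketch over binary variables, using the same dictionary encoding as the forward-sampling sketch earlier; the Markov-blanket product follows the formula on the previous slide:

import random

def gibbs_samples(nodes, parents, cpt, evidence, T, burn_in=100):
    children = {X: [Y for Y in nodes if X in parents[Y]] for X in nodes}
    # random initial state, with the evidence clamped
    state = {X: evidence.get(X, random.randint(0, 1)) for X in nodes}
    out = []
    for t in range(T + burn_in):
        for X in nodes:
            if X in evidence:
                continue
            # P(X=v | markov blanket) ∝ P(X=v|pa_X) · Π_ch P(x_ch | pa_ch)
            w = []
            for v in (0, 1):
                state[X] = v
                p = cpt[X][tuple(state[U] for U in parents[X])][v]
                for ch in children[X]:
                    p *= cpt[ch][tuple(state[U] for U in parents[ch])][state[ch]]
                w.append(p)
            state[X] = 0 if random.random() < w[0] / (w[0] + w[1]) else 1
        if t >= burn_in:
            out.append(dict(state))
    return out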

Page 221: Reasoning Under Uncertainty

Answering Queries

• Query: P(xi | e) = ?
• Method 1: count the number of samples where Xi = xi (histogram estimator):

  P̂(Xi = xi) = #samples(Xi = xi) / T

• Method 2: average the conditional probability (mixture estimator):

  P̂(Xi = xi) = (1/T) Σ_{t=1}^{T} P(Xi = xi | markov^t \ Xi)

Page 222: Reasoning Under Uncertainty

Gibbs Sampling - example

X = {X1, X2, ..., X9}, E = {X9}

[Figure: a belief network over X1..X9; X9 is observed.]

Page 223: Reasoning Under Uncertainty

Gibbs Sampling - example

Initialize the unobserved variables at random:

  X1 = x1^0, X2 = x2^0, X3 = x3^0, X4 = x4^0,
  X5 = x5^0, X6 = x6^0, X7 = x7^0, X8 = x8^0

Page 224: Reasoning Under Uncertainty

Gibbs Sampling - example

E = {X9}. Sample X1 from P(X1 | x2^0, ..., x8^0, x9), which reduces to the Markov blanket of X1:

  P(X1=0 | x2^0, x3^0, x9) = α · P(X1=0) · P(x2^0 | X1=0) · P(x3^0 | X1=0)
  P(X1=1 | x2^0, x3^0, x9) = α · P(X1=1) · P(x2^0 | X1=1) · P(x3^0 | X1=1)

Page 225: Reasoning Under Uncertainty

Gibbs Sampling - example

E = {X9}. Sample X2 from P(X2 | x1^1, x3^0, ..., x8^0, x9).

The Markov blanket for X2 is {X1, X3, X4, X5}.

Page 226: Reasoning Under Uncertainty

Gibbs Sampling: Illustration

[Figure: a Gibbs sampling trajectory through the state space.]

Page 227: Reasoning Under Uncertainty

Gibbs Sampling: Burn-In

• We want to sample from P(X | E), but the starting point is random
• Solution: throw away the first K samples ("burn-in")
• What is K? Hard to tell; use intuition
• Alternative: initialize the first sample from an approximation of P(x|e), for example by running IBP first

Page 228: Reasoning Under Uncertainty

Gibbs Sampling: Convergence

• Converges to the stationary distribution π*:
  π* = π* · P, where P is the transition kernel, p_ij = P(x_i → x_j)
• Guaranteed to converge iff the chain is:
  irreducible
  aperiodic
  ergodic (∀ i,j: p_ij > 0)

Page 229: Reasoning Under Uncertainty

Gibbs Sampling: Performance

• Advantage: guaranteed to converge to P(X|E), as long as all Pi > 0
• Disadvantage: convergence may be slow
• Problems:
  Samples are dependent!
  Statistical variance is too big in high-dimensional problems

Page 230: Reasoning Under Uncertainty

Gibbs: Speeding Convergence

Objectives:
1. Reduce dependence between samples (autocorrelation)
   Skip samples
   Randomize the variable sampling order
2. Reduce variance
   Blocking Gibbs Sampling
   Rao-Blackwellisation

Page 231: Reasoning Under Uncertainty

Skipping Samples

• Pick only every k-th sample (Geyer, 1992)
  Can reduce dependence between samples!
  Increases variance!
  Wastes samples!

Page 232: Reasoning Under Uncertainty

Randomized Variable Order

• Random Scan Gibbs Sampler
  Pick each next variable Xi for update at random with probability pi, Σi pi = 1
• In the simplest case, the pi are distributed uniformly
• In some instances, this reduces variance (MacEachern, Peruggia, 1999)

Page 233: Reasoning Under Uncertainty

Blocking

• Sample several variables together, as a block
• Example: given three variables X, Y, Z with domains of size 2, group Y and Z together to form a variable W = {Y,Z} with domain size 4. Then, given sample (x^t, y^t, z^t), compute the next sample:

  x^{t+1} ← P(x | y^t, z^t) = P(x | w^t)
  w^{t+1} = (y^{t+1}, z^{t+1}) ← P(y, z | x^{t+1})

+ Can improve convergence greatly when two variables are strongly correlated!
- The domain of the block variable grows exponentially with the number of variables in a block!

Page 234: Reasoning Under Uncertainty

Rao-Blackwellisation

• Do not sample all variables!
• Sample a subset!
• Example: given three variables X, Y, Z, sample only X and Y, summing out Z. Given sample (x^t, y^t), compute the next sample:

  x^{t+1} ← P(x | y^t)
  y^{t+1} ← P(y | x^{t+1})

Page 235: Reasoning Under Uncertainty

Rao-Blackwell Theorem

Bottom line: reducing the number of variables in a sample reduces the variance!

Page 236: Reasoning Under Uncertainty

Blocking vs. Rao-Blackwellisation

• Standard Gibbs:      P(x|y,z), P(y|x,z), P(z|x,y)   (1)
• Blocking:            P(x|y,z), P(y,z|x)             (2)
• Rao-Blackwellised:   P(x|y), P(y|x)                 (3)

Var3 < Var2 < Var1 (Liu, Wong, Kong, 1994)

[Figure: a triangle network over X, Y, Z.]

Page 237: Reasoning Under Uncertainty

Rao-Blackwellised Gibbs: Cutset Sampling

• Select C ⊆ X (possibly a cycle-cutset), |C| = m
• Fix evidence E
• Initialize the nodes with random values:
  For i = 1 to m: set Ci = ci^0
• For t = 1 to n, generate samples c^t = {C1=c1^t, C2=c2^t, ..., Cm=cm^t}:
  For i = 1 to m:
    Ci = ci^{t+1} ← P(ci | c1^{t+1}, ..., c_{i-1}^{t+1}, c_{i+1}^t, ..., cm^t, e)

Page 238: Reasoning Under Uncertainty

Cutset Sampling - generating samples

Generate sample c^{t+1} from c^t:

  C1 = c1^{t+1}  sampled from  P(c1 | c2^t, c3^t, ..., cK^t, e)
  C2 = c2^{t+1}  sampled from  P(c2 | c1^{t+1}, c3^t, ..., cK^t, e)
  ...
  CK = cK^{t+1}  sampled from  P(cK | c1^{t+1}, c2^{t+1}, ..., c_{K-1}^{t+1}, e)

In short:  Ci = ci^{t+1}  sampled from  P(ci | c^t \ ci, e)

Page 239: Reasoning Under Uncertainty

Cutset Sampling

• How to choose C?
  Special case: C is a cycle-cutset, O(N)
  General case: apply Bucket Tree Elimination (BTE), O(exp(w)), where w is the induced width of the network when the nodes in C are observed
  Pick C wisely so as to minimize w → the notion of a w-cutset

Page 240: Reasoning Under Uncertainty

w-cutset Sampling

• C = w-cutset of the network: a set of nodes such that, when C and E are instantiated, the adjusted induced width of the network is w
• The complexity of exact inference is then bounded by w!
• The cycle-cutset is a special case!

Page 241: Reasoning Under Uncertainty

Cutset Sampling - Answering Queries

• Query: ci ∈ C, P(ci | e) = ? Same as Gibbs (special case of w-cutset), computed while generating sample t:

  P̂(ci | e) = (1/T) Σ_{t=1}^{T} P(ci | c^t \ ci, e)

• Query: P(xi | e) = ? Computed after generating sample t (easy because C is a cutset):

  P̂(xi | e) = (1/T) Σ_{t=1}^{T} P(xi | c^t, e)

Page 242: Reasoning Under Uncertainty

Cutset Sampling Example

Cutset C = {X2, X5}; evidence E = x9. Initialize c^0 = {x2^0, x5^0}.

[Figure: a belief network over X1..X9 with X9 observed.]

Page 243: Reasoning Under Uncertainty

Cutset Sampling Example

Sample a new value for X2, with c^0 = {x2^0, x5^0} and evidence x9:

  x2^1 ← P(x2 | x5^0, x9) = BTE(x2, x5^0, x9) / Σ_{x2'} BTE(x2', x5^0, x9)

where each BTE(·) term is the exact joint probability computed by Bucket Tree Elimination over the rest of the network.

Page 244: Reasoning Under Uncertainty

Cutset Sampling Example

Sample a new value for X5:

  x5^1 ← P(x5 | x2^1, x9) = BTE(x2^1, x5, x9) / Σ_{x5'} BTE(x2^1, x5', x9)

yielding the new sample c^1 = {x2^1, x5^1}.

Page 245: Reasoning Under Uncertainty

Cutset Sampling Example

Query P(x2 | e) for the sampled node X2, averaging over three samples:

  P̂(x2 | x9) = (1/3) [ P(x2 | x5^0, x9) + P(x2 | x5^1, x9) + P(x2 | x5^2, x9) ]

Page 246: Reasoning Under Uncertainty

Cutset Sampling Example

Query P(x3 | e) for the non-sampled node X3, using c^1 = {x2^1, x5^1}, c^2 = {x2^2, x5^2}, c^3 = {x2^3, x5^3}:

  P̂(x3 | x9) = (1/3) [ P(x3 | x2^1, x5^1, x9) + P(x3 | x2^2, x5^2, x9) + P(x3 | x2^3, x5^3, x9) ]

Page 247: Reasoning Under Uncertainty

CPCS179 Test Results

[Figure: MSE vs. #samples (left) and vs. time in seconds (right) for cutset sampling and Gibbs sampling. Non-ergodic network (1 deterministic CPT entry); |X| = 179, |C| = 8, 2 ≤ |D(Xi)| ≤ 4, |E| = 35. Exact time = 122 sec using loop-cutset conditioning.]

Page 248: Reasoning Under Uncertainty

CPCS360b Test Results

[Figure: MSE vs. #samples (left) and vs. time in seconds (right) for cutset sampling and Gibbs sampling. Ergodic network; |X| = 360, |D(Xi)| = 2, |C| = 21, |E| = 36. Exact time > 60 min using cutset conditioning; exact values obtained via Bucket Elimination.]

Page 249: Reasoning Under Uncertainty

Sampling algorithms

• Forward Sampling
• Gibbs Sampling (MCMC)
  Blocking
  Rao-Blackwellised
• Likelihood Weighting
• Importance Sampling

Page 250: Reasoning Under Uncertainty

Likelihood Weighting
(Fung and Chang, 1990; Shachter and Peot, 1990)

• "Clamping" evidence +
• Forward sampling +
• Weighting samples by the evidence likelihood

Works well for likely evidence!

Page 251: Reasoning Under Uncertainty

Likelihood Weighting

[Figure: a chain network in which the evidence nodes (e) are clamped.]

Sample in topological order over X:

  xi ← P(Xi | pai), where P(Xi | pai) is a look-up in the CPT!

Page 252: Reasoning Under Uncertainty

Likelihood Weighting Outline

w^(t) ← 1
ForEach Xi (in topological order) do
  If Xi ∈ E (evidence Xi = ei):
    set xi^t = ei and w^(t) ← w^(t) · P(ei | pai)
  Else:
    Xi ← sample xi^t from P(Xi | pai)
EndFor

Page 253: Reasoning Under Uncertainty

Likelihood Weighting

Estimate the posterior marginals P(Xi | e):

  P̂(xi | e) = P̂(xi, e) / P̂(e) = [ Σ_{t=1}^{T} w^(t) · δ(xi, x^(t)) ] / [ Σ_{t=1}^{T} w^(t) ]

where δ(xi, x^(t)) = 1 if sample x^(t) contains xi, and 0 otherwise.

Page 254: Reasoning Under Uncertainty

Likelihood Weighting

• Converges to the exact posterior marginals
• Generates samples fast
• The sampling distribution is close to the prior (especially if E ⊆ leaf nodes)
• Increasing sampling variance:
  Convergence may be slow
  Many samples with P(x^(t)) = 0 are effectively rejected

Page 255: Reasoning Under Uncertainty

Sampling algorithms

• Forward Sampling
• Gibbs Sampling (MCMC)
  Blocking
  Rao-Blackwellised
• Likelihood Weighting
• Importance Sampling

Page 256: Reasoning Under Uncertainty

Importance Sampling Idea

• In general, it is hard to sample from the target distribution P(X|E)
• Generate samples from a sampling (proposal) distribution Q(X)
• Weigh each sample against P(X|E):

  I = ∫ f(x) P(x) dx = ∫ f(x) [ P(x) / Q(x) ] Q(x) dx

Page 257: Reasoning Under Uncertainty

Importance Sampling Theory

Let Z = X \ E. Then

  P(E=e) = Σ_{X\E} ∏_{i=1}^{n} P(Xi | pai)|_{E=e}

which simplifies to

  P(E=e) = Σ_z P(Z=z, e)

Page 258: Reasoning Under Uncertainty

Importance Sampling Theory

• Given a distribution Q, called the proposal distribution (such that P(Z=z, e) > 0 implies Q(Z=z) > 0):

  P(E=e) = Σ_z P(Z=z, e) = Σ_z Q(Z=z) · [ P(Z=z, e) / Q(Z=z) ]

By the definition of expected value, E_Q[g(Z)] = Σ_z g(z) Q(z), so

  P(E=e) = E_Q[ P(Z, e) / Q(Z) ] = E_Q[ w(Z) ]

where w(Z=z) = P(Z=z, e) / Q(Z=z) is called the importance weight.

Page 259: Reasoning Under Uncertainty

Importance Sampling Theory

Given a set of samples (z^1, ..., z^N) drawn from Q:

  P̂(E=e) = (1/N) Σ_{i=1}^{N} P(Z=z^i, e) / Q(Z=z^i) = (1/N) Σ_{i=1}^{N} w(Z=z^i)

As N → ∞, P̂(E=e) → P(E=e).

Underlying principle: approximate the average over a set of numbers by the average over a set of sampled numbers.

Page 260: Reasoning Under Uncertainty

Importance Sampling (Informally)

• Express the problem as computing the average over a set of real numbers
• Sample a subset of the numbers
• Approximate the true average by the sample average

True average:
• Average of (0.11, 0.24, 0.55, 0.77, 0.88, 0.99) = 0.59

Sample average over 2 samples:
• Average of (0.24, 0.77) = 0.505
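The same toy computation in code:

vals = [0.11, 0.24, 0.55, 0.77, 0.88, 0.99]
true_avg = sum(vals) / len(vals)       # 0.59
sample_avg = (0.24 + 0.77) / 2         # 0.505, from the 2-sample subset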

Page 261: Reasoning Under Uncertainty

How to generate samples from Q

• Express Q in product form: Q(Z) = Q(Z1) · Q(Z2|Z1) · ... · Q(Zn|Z1,...,Zn-1)
• Sample along the order Z1, ..., Zn
• Example:
  Q(Z1) = (0.2, 0.8)
  Q(Z2|Z1) = (0.2, 0.8, 0.1, 0.9)
  Q(Z3|Z1,Z2) = Q(Z3|Z1) = (0.5, 0.5, 0.3, 0.7)

  P̂(E=e) = (1/N) Σ_{i=1}^{N} P(Z=z^i, e) / Q(Z=z^i)

Page 262: Reasoning Under Uncertainty

How to sample from Q?

• Each sample Z = z:
  Sample Z1 = z1 from Q(Z1)
  Sample Z2 = z2 from Q(Z2|Z1=z1)
  Sample Z3 = z3 from Q(Z3|Z1=z1)
• Generate N such samples (z^1, ..., z^N); then

  P̂(E=e) = (1/N) Σ_{i=1}^{N} P(Z=z^i, e) / Q(Z=z^i) = (1/N) Σ_{i=1}^{N} w(Z=z^i)
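A minimal sketch of this estimator on a toy two-variable target; P(z, e) is given directly as a hypothetical table (its entries sum to P(e) = 0.6), and Q is an arbitrary product-form proposal:

import random

def sample_discrete(probs):
    r, acc = random.random(), 0.0
    for v, p in enumerate(probs):
        acc += p
        if r < acc:
            return v
    return len(probs) - 1

# Hypothetical P(Z1, Z2, e).
P_ze = {(0, 0): 0.06, (0, 1): 0.14, (1, 0): 0.24, (1, 1): 0.16}

Q1 = [0.5, 0.5]                            # Q(Z1)
Q2 = {0: [0.4, 0.6], 1: [0.7, 0.3]}        # Q(Z2|Z1)

def one_weight():
    z1 = sample_discrete(Q1)               # sample along the order Z1, Z2
    z2 = sample_discrete(Q2[z1])
    q = Q1[z1] * Q2[z1][z2]
    return P_ze[(z1, z2)] / q              # importance weight w(z) = P(z,e)/Q(z)

N = 100_000
print(sum(one_weight() for _ in range(N)) / N)   # ≈ P(e) = 0.6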

Page 263: Reasoning Under Uncertainty

Likelihood weighting

• Q = the prior distribution = the CPTs of the Bayesian network

Page 264: Reasoning Under Uncertainty

Likelihood weighting example

[Figure: network Smoking → lung Cancer, Smoking → Bronchitis, {lung Cancer, Smoking} → X-ray, {lung Cancer, Bronchitis} → Dyspnoea.]

P(S, C, B, X, D) = P(S) · P(C|S) · P(B|S) · P(X|C,S) · P(D|C,B)

Query (where 1 = true and 0 = false): P(X=1, B=0) = ?

  P(X=1, B=0) = Σ_{S,C,D} P(S) · P(C|S) · P(B=0|S) · P(X=1|C,S) · P(D|C,B=0)

Page 265: Reasoning Under Uncertainty

Likelihood weighting example

Q = prior, restricted to the unobserved variables Z = {S, C, D}:

  Q(S,C,D) = Q(S) · Q(C|S) · Q(D|C,B=0) = P(S) · P(C|S) · P(D|C,B=0)

Sample S = s from P(S), then C = c from P(C|S=s), then D = d from P(D|C=c, B=0).

The importance weight of a sample z = (s, c, d):

  w(Z=z) = P(Z=z, e) / Q(Z=z)
         = [ P(s) P(c|s) P(B=0|s) P(X=1|c,s) P(d|c,B=0) ] / [ P(s) P(c|s) P(d|c,B=0) ]
         = P(B=0|s) · P(X=1|c,s)

  P̂(E=e) = (1/N) Σ_{i=1}^{N} P(Z=z^i, e) / Q(Z=z^i)

Page 266: Reasoning Under Uncertainty

How to solve belief updating?

  P(Xi=xi | E=e) = P(Xi=xi, E=e) / P(E=e)

Numerator: the evidence is (Xi=xi, e); denominator: the evidence is e. Estimate the numerator and the denominator by importance sampling:

  P̂(Xi=xi | e) = [ Σ_{j=1}^{N} δ(xi, z^j) · w(z^j) ] / [ Σ_{j=1}^{N} w(z^j) ]

where δ(xi, z^j) = 1 iff sample z^j contains Xi = xi, and 0 otherwise.

Page 267: Reasoning Under Uncertainty

Summary