Reasoning Under Uncertainty
Radu Marinescu, 4C @ University College Cork
Why uncertainty?
• Uncertainty in medical diagnosis
  – Diseases produce symptoms
  – In diagnosis, observed symptoms => disease identification
  – Uncertainties:
    • Symptoms may not occur
    • Symptoms may not be reported
    • Diagnostic tests are not perfect (false positives, false negatives)
• How do we estimate confidence? P(disease | symptoms, tests) = ?
Why uncertainty?
• Uncertainty in medical decision-making
  – Physicians and patients must decide on treatments
  – Treatments may not be successful
  – Treatments may have unpleasant side effects
• Choosing treatments: weigh the risks of adverse outcomes
• People are BAD at reasoning intuitively about probabilities, so provide a systematic analysis
Outline
• Probabilistic modeling with joint distributions
• Conditional independence and factorization
• Belief (or Bayesian) networks
  – Example networks and software
• Inference in belief networks
  – Exact inference: variable elimination, join-tree clustering, AND/OR search
  – Approximate inference: mini-clustering, belief propagation, sampling
Bibliography
• Judea Pearl. “Probabilistic Reasoning in Intelligent Systems”, 1988
• Stuart Russell & Peter Norvig. “Artificial Intelligence: A Modern Approach”, 2002 (Ch. 13-17)
• Kevin Murphy. “A Brief Introduction to Graphical Models and Bayesian Networks”. http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
• Rina Dechter. “Bucket Elimination: A Unifying Framework for Probabilistic Inference”. http://www.ics.uci.edu/~csp/R48a.ps
• Rina Dechter. “Mini-Buckets: A General Scheme for Approximating Inference”. http://www.ics.uci.edu/~csp/r62a.pdf
• Rina Dechter & Robert Mateescu. “AND/OR Search Spaces for Graphical Models”. http://www.ics.uci.edu/~csp/r126.pdf
Reasoning under uncertainty
• A problem domain is modeled by a list of (discrete) random variables: X1, X2, …, Xn
• Knowledge about the problem is represented by a joint probability distribution: P(X1, X2, …, Xn)
Example: Alarm (Pearl, 1988)
• Story: in Los Angeles, burglaries and earthquakes are common, and both can trigger an alarm. In case of an alarm, two neighbors, John and Mary, may call 911.
• Problem: estimate the probability of a burglary based on who has or has not called
• Variables: Burglary (B), Earthquake (E), Alarm (A), JohnCalls (J), MaryCalls (M)
• Knowledge required by the probabilistic approach: the joint distribution P(B, E, A, J, M)
Joint probability distribution
• Defines probabilities for all possible value assignments to the variables in the set
Inference with the joint probability distribution
• What is the probability of burglary given that Mary called, P(B=y | M=y)?
• Compute the marginal probability:

  P(B, M) = Σ_{E,A,J} P(B, E, A, J, M)

  B  M  P(B,M)
  y  y  0.000115
  y  n  0.000075
  n  y  0.00015
  n  n  0.99971

• Compute the answer (reasoning by conditioning):

  P(B=y | M=y) = P(B=y, M=y) / P(M=y) = 0.000115 / (0.000115 + 0.00015) ≈ 0.43
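As a minimal sketch of this computation (the numbers are the ones from the marginal table above; the variable names are illustrative):

```python
# Marginal P(B, M) from the table above, keyed by (B, M).
p_bm = {
    ("y", "y"): 0.000115,
    ("y", "n"): 0.000075,
    ("n", "y"): 0.00015,
    ("n", "n"): 0.99971,
}

# P(M=y) by summing out B.
p_m_y = sum(p for (b, m), p in p_bm.items() if m == "y")

# Condition: P(B=y | M=y) = P(B=y, M=y) / P(M=y).
p_b_given_m = p_bm[("y", "y")] / p_m_y
print(round(p_b_given_m, 2))  # 0.43
```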
Advantages
• Probability theory is well established and well understood
• In theory, one can perform arbitrary inference among the variables given the joint probability, because the joint contains information about all aspects of the relationships among the variables:
  – Diagnostic inference (from effects to causes), e.g., P(B=y | M=y)
  – Predictive inference (from causes to effects), e.g., P(M=y | B=y)
  – Combining evidence, e.g., P(B=y | J=y, M=y, E=n)
• All inference is sanctioned by probability theory and hence has clear semantics
Difficulty: complexity in model construction and inference
• In the Alarm example: 32 numbers (parameters) are needed, e.g., P(B=y, E=y, A=y, J=y, M=y)
  – Quite unnatural to assess
  – Computing P(B=y | M=y) takes 29 additions
• In general, P(X1, X2, …, Xn) needs at least 2^n numbers to specify the joint probability distribution
  – Knowledge acquisition is difficult (complex, unnatural)
  – Exponential storage and inference
Outline
• Probabilistic modeling with joint distributions
• Conditional independence and factorization
• Belief networks
  – Example networks and software
• Inference in belief networks
  – Exact inference
  – Approximate inference
• Miscellaneous: mixed networks, influence diagrams, etc.
Chain rule and factorization
• Overcome the problem of exponential size by exploiting conditional independencies
• The chain rule of probability:

  P(X1, X2) = P(X1) P(X2|X1)
  P(X1, X2, X3) = P(X1) P(X2|X1) P(X3|X1,X2)
  …
  P(X1, X2, …, Xn) = Π_{i=1}^{n} P(Xi | X1, …, Xi-1)

• No gains yet: the number of parameters required by the factors is still O(2^n)
Conditional independence
• A random variable X is conditionally independent of a set of random variables Y given a set of random variables Z if P(X | Y, Z) = P(X | Z)
• Intuitively:
  – Y tells us nothing more about X than we already know by knowing Z
  – As far as X is concerned, we can ignore Y if we know Z
Conditional independence
• About P(Xi | X1, …, Xi-1): domain knowledge usually allows one to identify a subset pa(Xi) ⊆ {X1, …, Xi-1} such that, given pa(Xi), Xi is independent of all variables in {X1, …, Xi-1} \ pa(Xi), i.e.,

  P(Xi | X1, …, Xi-1) = P(Xi | pa(Xi))

• Then

  P(X1, X2, …, Xn) = Π_{i=1}^{n} P(Xi | pa(Xi))

• The joint distribution is factorized!
• The number of parameters may have been substantially reduced
Example continued
• pa(B) = {}, pa(E) = {}, pa(A) = {B,E}, pa(J) = {A}, pa(M) = {A}
• The factorization:

  P(B, E, A, J, M) = P(B) P(E|B) P(A|B,E) P(J|B,E,A) P(M|B,E,A,J)
                   = P(B) P(E) P(A|B,E) P(J|A) P(M|A)

• Conditional probability tables (CPTs):

  B  P(B)        E  P(E)
  y  .01         y  .02
  n  .99         n  .98

  A  B  E  P(A|B,E)
  y  y  y  .95
  n  y  y  .05
  y  y  n  .94
  n  y  n  .06
  y  n  y  .29
  n  n  y  .71
  y  n  n  .001
  n  n  n  .999

  J  A  P(J|A)      M  A  P(M|A)
  y  y  .7          y  y  .9
  n  y  .3          n  y  .1
  y  n  .01         y  n  .05
  n  n  .99         n  n  .95
Example continued
• Model size reduced from 32 to 2+2+4+4+8 = 20
• Model construction is easier:
  – Fewer parameters to assess
  – Parameters more natural to assess, e.g., P(B=y), P(J=y | A=y), P(A=y | B=y, E=y), etc.
• Inference is easier (we will see this later)
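As a sanity check (a sketch, not part of the slides), the 20 CPT entries above fully determine all 2^5 = 32 joint probabilities, and the factored joint still sums to 1:

```python
from itertools import product

# CPTs from the slides; dictionary keys are (child, parents...) value tuples.
P_B = {"y": 0.01, "n": 0.99}
P_E = {"y": 0.02, "n": 0.98}
P_A = {("y", "y", "y"): 0.95, ("n", "y", "y"): 0.05,   # key (A, B, E)
       ("y", "y", "n"): 0.94, ("n", "y", "n"): 0.06,
       ("y", "n", "y"): 0.29, ("n", "n", "y"): 0.71,
       ("y", "n", "n"): 0.001, ("n", "n", "n"): 0.999}
P_J = {("y", "y"): 0.7, ("n", "y"): 0.3,               # key (J, A)
       ("y", "n"): 0.01, ("n", "n"): 0.99}
P_M = {("y", "y"): 0.9, ("n", "y"): 0.1,               # key (M, A)
       ("y", "n"): 0.05, ("n", "n"): 0.95}

def joint(b, e, a, j, m):
    """P(B,E,A,J,M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)."""
    return P_B[b] * P_E[e] * P_A[(a, b, e)] * P_J[(j, a)] * P_M[(m, a)]

total = sum(joint(*vals) for vals in product("yn", repeat=5))
print(round(total, 10))  # 1.0
```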
Outline
• Probabilistic modeling with joint distributions
• Conditional independence and factorization
• Belief networks
  – Example networks and software
• Inference in belief networks
  – Exact inference
  – Approximate inference
From factorization to belief networks
• Graphically represent the conditional independence relationships:
  – Construct a directed graph by drawing an arc from Xj to Xi iff Xj ∈ pa(Xi)
  – Also attach the CPT P(Xi | pa(Xi)) to node Xi
• For the Alarm example: B → A ← E, A → J, A → M, with attached CPTs P(B), P(E), P(A|B,E), P(J|A), P(M|A)
Formal definition
• A belief network is a directed acyclic graph (DAG) in which:
  – each node represents a random variable
  – each node is associated with the conditional probability of the node given its parents
• It represents the joint probability distribution:

  P(X1, X2, …, Xn) = Π_{i=1}^{n} P(Xi | pa(Xi))

• A variable is conditionally independent of its non-descendants given its parents
Independences in belief networks
• 3 basic independence structures:
  1. Chain: Burglary → Alarm → JohnCalls
  2. Common descendants: Burglary → Alarm ← Earthquake
  3. Common ancestors: JohnCalls ← Alarm → MaryCalls
Independences in belief networks
1. Chain (Burglary → Alarm → JohnCalls): JohnCalls is independent of Burglary given Alarm

  P(J | A, B) = P(J | A)
  P(J, B | A) = P(J | A) P(B | A)
Independences in belief networks
2. Common descendants (Burglary → Alarm ← Earthquake): Burglary is independent of Earthquake when not knowing Alarm, but Burglary and Earthquake become dependent given Alarm!

  P(B, E) = P(B) P(E)
  P(B, E | A) ≠ P(B | A) P(E | A)
Independences in belief networks
3. Common ancestors (JohnCalls ← Alarm → MaryCalls): MaryCalls is independent of JohnCalls given Alarm

  P(J | A, M) = P(J | A)
  P(J, M | A) = P(J | A) P(M | A)
Independences in belief networks
• A BN models many conditional independence relations relating distant variables and sets; they are defined in terms of a graphical criterion called d-separation
• d-separation implies conditional independence: let X, Y and Z be three sets of nodes; if X and Y are d-separated by Z, then X and Y are conditionally independent given Z: P(X | Y, Z) = P(X | Z)
• d-separation in the graph: A is d-separated from B given C if every undirected path between them is blocked
• Path blocking: 3 cases that expand on the three basic independence structures
Undirected path blocking
• With a “linear” substructure (chain): X → Z → Y is blocked if Z ∈ C
• With a “wedge” substructure (common ancestors): X ← Z → Y is blocked if Z ∈ C
• With a “vee” substructure (common descendants): X → Z ← Y is blocked if neither Z nor any of its descendants is in C
Example
• Network over nodes 1-5: 1 → 2, 1 → 3, 2 → 4, 3 → 4, 4 → 5
• X = {2} and Y = {3} are d-separated by Z = {1}:
  – the path 2 ← 1 → 3 is blocked by 1 ∈ Z
  – the path 2 → 4 ← 3 is blocked because 4 and all its descendants are outside Z
  – hence P(X, Y | Z) = P(X | Z) P(Y | Z)
• X = {2} and Y = {3} are not d-separated by Z = {1, 5}:
  – the path 2 ← 1 → 3 is blocked by 1 ∈ Z
  – the path 2 → 4 ← 3 is activated because 5 (a descendant of 4) is in Z
  – learning the value of consequence 5 renders its causes 2 and 3 dependent
  – hence P(X, Y | Z) ≠ P(X | Z) P(Y | Z)
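The graphical test can be sketched with Lauritzen's moralization criterion, a standard equivalent of d-separation (this code is illustrative and not from the slides): X and Y are d-separated by Z iff Z separates X from Y in the moralized ancestral graph of X ∪ Y ∪ Z.

```python
from collections import deque

def d_separated(parents, X, Y, Z):
    """Test d-separation of node sets X, Y given Z in a DAG described as
    {node: set_of_parents}, via the moralized ancestral graph."""
    # 1. Ancestral subgraph of X ∪ Y ∪ Z.
    relevant, stack = set(), list(X | Y | Z)
    while stack:
        n = stack.pop()
        if n not in relevant:
            relevant.add(n)
            stack.extend(parents.get(n, ()))
    # 2. Moralize: keep parent-child edges, marry co-parents, drop directions.
    adj = {n: set() for n in relevant}
    for n in relevant:
        ps = [p for p in parents.get(n, ()) if p in relevant]
        for p in ps:
            adj[n].add(p); adj[p].add(n)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                adj[p].add(q); adj[q].add(p)
    # 3. Remove Z and test undirected reachability from X to Y.
    seen, queue = set(), deque(x for x in X if x not in Z)
    while queue:
        n = queue.popleft()
        if n in Y:
            return False          # connected, so NOT d-separated
        if n in seen:
            continue
        seen.add(n)
        queue.extend(m for m in adj[n] if m not in Z and m not in seen)
    return True

# The 5-node example above: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4, 4 -> 5.
pa = {1: set(), 2: {1}, 3: {1}, 4: {2, 3}, 5: {4}}
print(d_separated(pa, {2}, {3}, {1}))     # True
print(d_separated(pa, {2}, {3}, {1, 5}))  # False: 5 activates 2 -> 4 <- 3
```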
I-mapness
• Given a probability distribution P on a set of variables {X1, …, Xn}, a belief network B representing P is a minimal I-map (Pearl, 1988):
  – I-mapness: every d-separation condition displayed in B corresponds to a valid conditional independence relationship in P
  – Minimality: none of the arrows in B can be deleted without destroying its I-mapness
Full joint distribution in a BN
• For the Alarm network (B → A ← E, A → J, A → M), rewrite the full joint probability using the product rule and the network's conditional independences:

  P(B, E, A, J, M) = P(J|B,E,A,M) P(B,E,A,M)
                   = P(J|A) P(B,E,A,M)
                   = P(J|A) P(M|A) P(B,E,A)
                   = P(J|A) P(M|A) P(A|B,E) P(B,E)
                   = P(J|A) P(M|A) P(A|B,E) P(B) P(E)
Example network
• The “ALARM” network for monitoring intensive-care patients: 37 variables, 509 parameters (instead of 2^37)
• (Figure: the network over ICU variables such as heart rate, blood pressure, CO2, oxygen saturation, ventilation and intubation measurements)
Software
• GeNIe (University of Pittsburgh), free: http://genie.sis.pitt.edu
• SamIam (UCLA), free: http://reasoning.cs.ucla.edu/SamIam/
• Hugin, commercial: http://www.hugin.com
• Netica, commercial: http://www.norsys.com
• UCI Lab, free but no GUI: http://graphmod.ics.uci.edu/
GeNIe screenshot
• (Figure: a screenshot of the GeNIe graphical modeling environment)
Applications
• Belief networks are used in:
  – Genetic linkage analysis
  – Speech recognition
  – Medical diagnosis
  – Probabilistic error-correcting coding
  – Monitoring and diagnosis in distributed systems
  – Troubleshooting (Microsoft)
  – …
Outline
• Probabilistic modeling with joint distributions
• Conditional independence and factorization
• Belief networks
• Inference in belief networks
  – Exact inference
  – Approximate inference
Exact inference
• Variable elimination (inference)
  – Bucket elimination
  – Bucket-tree elimination
  – Cluster-tree elimination
• Conditioning (search)
  – VE+C hybrid
  – AND/OR search (tree, graph)
Belief updating
• (Figure: a small belief network over Smoking, Lung cancer, Bronchitis, X-ray, Dyspnoea)
• Query: P(Lung cancer = yes | Smoking = no, Dyspnoea = yes) = ?
Probabilistic inference tasks
• Belief updating:

  BEL(Xi) = P(Xi = xi | evidence)

• Most probable explanation (MPE):

  x* = argmax_x P(x, e)

• Maximum a posteriori hypothesis (MAP):

  (a1*, …, ak*) = argmax_a Σ_{X \ A} P(x, e)
Belief updating: P(X | evidence) = ?
• Network: A → B, A → C; A, B → D; B, C → E, with CPTs P(A), P(B|A), P(C|A), P(D|A,B), P(E|B,C)
• Query P(A | E=0):

  P(A | E=0) = α P(A, E=0) = α Σ_{E=0,D,C,B} P(A) P(B|A) P(C|A) P(D|A,B) P(E|B,C)
             = α P(A) Σ_{E=0} Σ_D Σ_C P(C|A) Σ_B P(B|A) P(D|A,B) P(E|B,C)

• The innermost sum over B creates a new function λB(A,D,C,E): this is Variable Elimination
Bucket elimination
• Moralize the graph (“marry parents”): connect the parents of each variable and drop edge directions
• Ordering: A, E, D, C, B
• Initial bucket contents:

  Bucket B: P(E|B,C), P(D|A,B), P(B|A)
  Bucket C: P(C|A)
  Bucket D:
  Bucket E: E=0
  Bucket A: P(A)
The bucket operation
• ELIMINATION: multiply (∏) and sum (Σ):

  bucket(B): { P(E|B,C), P(D|A,B), P(B|A) }
  λB(A,C,D,E) = Σ_B P(B|A) · P(D|A,B) · P(E|B,C)

• OBSERVED BUCKET: an observed variable is simply instantiated in each function:

  bucket(B): { P(E|B,C), P(D|A,B), P(B|A), B=1 }
  λB(A) = P(B=1|A),  λB(A,D) = P(D|A,B=1),  λB(E,C) = P(E|B=1,C)
Multiplying functions
• (Figure: a worked table of the pointwise product of two functions)

Summing out a variable
• (Figure: a worked table of summing a variable out of a function)
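Both bucket operations can be sketched as operations on tables. The representation below (a factor as a scope tuple plus a dictionary over 0/1 value tuples) and the numbers are illustrative, not from the slides:

```python
from itertools import product

def multiply(f, g):
    """Pointwise product of factors (scope, table); variables are binary (0/1)."""
    (fs, ft), (gs, gt) = f, g
    scope = fs + tuple(v for v in gs if v not in fs)
    table = {}
    for vals in product((0, 1), repeat=len(scope)):
        asg = dict(zip(scope, vals))
        table[vals] = ft[tuple(asg[v] for v in fs)] * gt[tuple(asg[v] for v in gs)]
    return scope, table

def sum_out(factor, var):
    """Eliminate `var` from a factor by summation."""
    fs, ft = factor
    keep = tuple(v for v in fs if v != var)
    out = {}
    for vals, p in ft.items():
        key = tuple(x for x, name in zip(vals, fs) if name != var)
        out[key] = out.get(key, 0.0) + p
    return keep, out

# Example: λ(A, D) = Σ_B P(B|A) · P(D|A,B), with illustrative numbers.
P_B_A = (("B", "A"), {(0, 0): 0.6, (1, 0): 0.4, (0, 1): 0.3, (1, 1): 0.7})
P_D_AB = (("D", "A", "B"), {(0, 0, 0): 0.9, (1, 0, 0): 0.1,
                            (0, 0, 1): 0.2, (1, 0, 1): 0.8,
                            (0, 1, 0): 0.5, (1, 1, 0): 0.5,
                            (0, 1, 1): 0.25, (1, 1, 1): 0.75})
scope, lam = sum_out(multiply(P_B_A, P_D_AB), "B")
print(scope)  # ('A', 'D')
# lam[(0, 0)] == 0.6*0.9 + 0.4*0.2 ≈ 0.62
```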
Bucket elimination
• Elimination operator: Σ∏; each bucket is processed in turn, and its message is placed in the bucket of the highest-ordered variable in the message's scope:

  Bucket B: P(E|B,C), P(D|A,B), P(B|A)  →  λB(A,D,C,E)
  Bucket C: P(C|A), λB(A,D,C,E)         →  λC(A,D,E)
  Bucket D: λC(A,D,E)                   →  λD(A,E)
  Bucket E: E=0, λD(A,E)                →  λE(A)
  Bucket A: P(A), λE(A)                 →  P(A, E=0)

• w* = 4: the “induced width” (max clique size)
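The whole run can be sketched end to end. The CPT numbers below are invented for illustration (the slides give none), and the factor helpers mirror the bucket operation:

```python
from itertools import product

def multiply_all(factors):
    """Product of factors; a factor is (scope, table) over binary variables."""
    scope = []
    for fs, _ in factors:
        scope += [v for v in fs if v not in scope]
    scope = tuple(scope)
    table = {}
    for vals in product((0, 1), repeat=len(scope)):
        asg = dict(zip(scope, vals))
        p = 1.0
        for fs, ft in factors:
            p *= ft[tuple(asg[v] for v in fs)]
        table[vals] = p
    return scope, table

def sum_out(factor, var):
    fs, ft = factor
    keep = tuple(v for v in fs if v != var)
    out = {}
    for vals, p in ft.items():
        key = tuple(x for x, name in zip(vals, fs) if name != var)
        out[key] = out.get(key, 0.0) + p
    return keep, out

def restrict(factor, var, value):
    """Instantiate an observed variable (an 'observed bucket')."""
    fs, ft = factor
    i = fs.index(var)
    keep = fs[:i] + fs[i + 1:]
    return keep, {v[:i] + v[i + 1:]: p for v, p in ft.items() if v[i] == value}

# Illustrative CPTs for the network A -> B, A -> C, {A,B} -> D, {B,C} -> E.
P_A = (("A",), {(0,): 0.4, (1,): 0.6})
P_B = (("B", "A"), {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8})
P_C = (("C", "A"), {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.9, (1, 1): 0.1})
P_D = (("D", "A", "B"), {(0, 0, 0): 0.6, (1, 0, 0): 0.4, (0, 0, 1): 0.1,
                         (1, 0, 1): 0.9, (0, 1, 0): 0.3, (1, 1, 0): 0.7,
                         (0, 1, 1): 0.8, (1, 1, 1): 0.2})
P_E = (("E", "B", "C"), {(0, 0, 0): 0.2, (1, 0, 0): 0.8, (0, 0, 1): 0.7,
                         (1, 0, 1): 0.3, (0, 1, 0): 0.4, (1, 1, 0): 0.6,
                         (0, 1, 1): 0.5, (1, 1, 1): 0.5})

# Ordering A, E, D, C, B: process buckets B, C, D, E, then A.
lam_B = sum_out(multiply_all([P_E, P_D, P_B]), "B")   # λB over E, C, D, A
lam_C = sum_out(multiply_all([P_C, lam_B]), "C")      # λC over A, E, D
lam_D = sum_out(lam_C, "D")                           # λD over A, E
lam_E = restrict(lam_D, "E", 0)                       # observed bucket: E=0
scope, p_A_E0 = multiply_all([P_A, lam_E])            # P(A, E=0)
print({a: round(p_A_E0[(a,)], 4) for a in (0, 1)})    # {0: 0.18, 1: 0.2268}
```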
Induced graph
• (Figure: the moral graph of the network and its induced graph along the ordering A, E, D, C, B)
• The induced width of the ordering, w*(d), is the maximum width over the nodes of the induced graph
Complexity of elimination
• O(n · exp(w*(d)))
  – w*(d) = the induced width of the moral graph along ordering d
• For the example's moral graph: ordering d1 = (A, E, D, C, B) gives w*(d1) = 4, while ordering d2 = (A, B, C, D, E) gives w*(d2) = 2
Finding small induced-width orderings
• NP-complete
• A tree has induced width 1
• Greedy algorithms:
  – Min-width
  – Min induced-width
  – Max-cardinality
  – Min-fill (thought to be the best)
  – Anytime min-width (via branch-and-bound)
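A greedy min-fill ordering can be sketched as follows (illustrative code with a made-up helper name, run here on the moral graph of the A-B-C-D-E example):

```python
def min_fill_ordering(adj):
    """Greedy min-fill elimination ordering for an undirected graph given as
    {node: set(neighbors)}; returns (elimination_order, induced_width).
    The bucket-elimination ordering d is the reverse of the elimination order."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    order, width = [], 0
    while adj:
        # Pick the node whose elimination adds the fewest fill edges.
        def fill(v):
            ns = list(adj[v])
            return sum(1 for i, a in enumerate(ns)
                       for b in ns[i + 1:] if b not in adj[a])
        v = min(adj, key=fill)
        ns = list(adj[v])
        width = max(width, len(ns))               # width = neighbors at elimination
        for i, a in enumerate(ns):                # connect v's neighbors (fill-in)
            for b in ns[i + 1:]:
                adj[a].add(b); adj[b].add(a)
        for a in ns:
            adj[a].discard(v)
        del adj[v]
        order.append(v)
    return order, width

# Moral graph of the A-B-C-D-E example network.
moral = {"A": {"B", "C", "D"}, "B": {"A", "C", "D", "E"},
         "C": {"A", "B", "E"}, "D": {"A", "B"}, "E": {"B", "C"}}
order, w = min_fill_ordering(moral)
print(w)  # 2, matching the good ordering d2 from the previous slide
```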
MPE: Most Probable Explanation
• Find the most likely assignment to the unobserved variables given the evidence (here Smoking S=0 and Dyspnoea D=1):

  (0, c', b', x', 1) = argmax_{C,B,X} P(S=0, C, B, X, D=1)

  P(S=0, C, B, X, D=1) = P(S=0) P(C|S=0) P(B|S=0) P(X|C,S=0) P(D=1|B,C)
Applications: probabilistic decoding
• A stream of bits is transmitted across a noisy channel, and the problem is to recover the transmitted stream given the observed output and parity-check bits
• (Figure: transmitted bits u0…u4 and parity-check bits x0…x4, with their received, observed copies y0^u…y4^u and y0^x…y4^x)
Applications: medical diagnosis
• Given some observed symptoms, determine the most likely subset of diseases that may explain them
• (Figure: a two-layer network with diseases Disease1…Disease7 as parents of symptoms Symptom1…Symptom6)
Applications: genetic linkage analysis
• Given the genotype information of a pedigree, infer the maximum-likelihood haplotype configuration (maternal and paternal) of the unobserved individuals (Fishelson & Geiger, 2002)
• (Figure: a small three-individual pedigree and the corresponding network of genotype (X), haplotype (L) and selector (S) variables over two loci)
Bucket elimination for MPE
• Same network (CPTs P(A), P(B|A), P(C|A), P(D|A,B), P(E|B,C)), with evidence E=0:

  MPE = max_{A,E=0,D,C,B} P(A) P(B|A) P(C|A) P(D|A,B) P(E|B,C)
      = max_A P(A) max_{E=0} max_D max_C P(C|A) max_B P(B|A) P(D|A,B) P(E|B,C)

• The innermost max creates λB(A,D,C,E): Variable Elimination with max as the elimination operator
Max out a variable

  A  B  C  f(A,B,C)
  T  T  T  0.03
  T  T  F  0.07
  T  F  T  0.54
  T  F  F  0.36
  F  T  T  0.06
  F  T  F  0.14
  F  F  T  0.48
  F  F  F  0.32

• Maxing out B gives:

  A  C  f(A,C)
  T  T  0.54
  T  F  0.36
  F  T  0.48
  F  F  0.32
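This operation can be sketched directly on the table (function and variable names follow the slide; the code itself is illustrative):

```python
def max_out(f, scope, var):
    """Eliminate `var` from factor f (a dict over value tuples for `scope`)
    by maximization instead of summation."""
    i = scope.index(var)
    out_scope = scope[:i] + scope[i + 1:]
    out = {}
    for vals, p in f.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = max(out.get(key, 0.0), p)
    return out_scope, out

# f(A, B, C) from the slide.
f = {("T", "T", "T"): 0.03, ("T", "T", "F"): 0.07,
     ("T", "F", "T"): 0.54, ("T", "F", "F"): 0.36,
     ("F", "T", "T"): 0.06, ("F", "T", "F"): 0.14,
     ("F", "F", "T"): 0.48, ("F", "F", "F"): 0.32}
scope, g = max_out(f, ("A", "B", "C"), "B")
# g matches the slide: g(T,T)=0.54, g(T,F)=0.36, g(F,T)=0.48, g(F,F)=0.32
```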
Bucket elimination (MPE)
• Elimination/combination operators: max, ∏

  Bucket B: P(E|B,C), P(D|A,B), P(B|A)  →  λB(A,D,C,E)   (width 4)
  Bucket C: P(C|A), λB(A,D,C,E)         →  λC(A,D,E)     (width 3)
  Bucket D: λC(A,D,E)                   →  λD(A,E)       (width 1)
  Bucket E: E=0, λD(A,E)                →  λE(A)         (width 1)
  Bucket A: P(A), λE(A)                 →  MPE value     (width 0)

• w* = 4: the “induced width” (max clique size)
Generating the MPE tuple
• Process the buckets in reverse order (A first), assigning each variable its maximizing value given the assignments already made:

  Bucket A: a' = argmax_A P(A) · λE(A)
  Bucket E: e' = 0 (evidence)
  Bucket D: d' = argmax_D λC(a', D, e')
  Bucket C: c' = argmax_C P(C|a') · λB(a', d', C, e')
  Bucket B: b' = argmax_B P(e'|B, c') · P(d'|a', B) · P(B|a')

• Return (a', b', c', d', e')
Complexity of elimination
• O(n · exp(w*(d)))
  – w*(d) = the induced width of the moral graph along ordering d
• For the example's moral graph: ordering d1 = (A, E, D, C, B) gives w*(d1) = 4, while ordering d2 = (A, B, C, D, E) gives w*(d2) = 2
Exact inference
• Variable elimination (inference)
  – Bucket elimination
  – Bucket-tree elimination
  – Cluster-tree elimination
• Conditioning (search)
  – VE+C hybrid
  – AND/OR search (tree, graph)
From BE to Bucket-Tree Elimination
• Motivation:
  – BE computes P(evidence) or P(X|evidence), where X is the last variable in the ordering
  – What if we need all marginal probabilities P(Xi|evidence), for every Xi ∈ {X1, X2, …, Xn}?
• Run BE n times with Xi being the last variable?
  – Inefficient! The induced width may vary significantly from one ordering to another
• SOLUTION: Bucket-Tree Elimination (BTE)
Bucket-tree elimination
• Ordering A, B, C, D, E; buckets and the messages they generate:

  Bucket E: P(E|B,C)                    →  λE(B,C)
  Bucket D: P(D|A,B)                    →  λD(A,B)
  Bucket C: P(C|A), λE(B,C)             →  λC(A,B)
  Bucket B: P(B|A), λD(A,B), λC(A,B)    →  λB(A)
  Bucket A: P(A), λB(A)

• Variable elimination can be viewed as message passing (elimination) using a bucket tree
• Any node (bucket) can be the root
• Complexity: time and space exponential in the induced width
Bucket tree (more formally)
• A bucket tree has each bucket Bi as a node, and there is an arc from Bi to Bj if the function created at Bi was placed in Bj
• Graph-based definition: let Gd be the induced graph along ordering d. Each variable X together with its earlier neighbors forms a node BX, and there is an arc from BX to BY if Y is the closest parent of X
Bucket-Tree
A
B C
ED
P(A)
P(B|A)
P(E|B,C)
P(D|A,B)
Belief network
E
D
C
B
A
Induced graph
E,B,C
A,B,D
A,B,C
B,A
A
E
D
C
B
A
λE(B,C)
λD(A,B)λC(A,B)
λB(A)
Bucket tree
P(C|A)
Bucket-tree propagation
• Let u be a bucket with neighbors x1, …, xn and v. The bucket of u contains its own functions and the messages received from all its neighbors:

  bucket(u) = ψ(u) ∪ {h(x1,u), h(x2,u), …, h(xn,u), h(v,u)}

• The message u sends to v:

  h(u,v) = Σ_{elim(u,v)} ∏_{f ∈ bucket(u) \ {h(v,u)}} f,   where elim(u,v) = vars(u) − vars(v)
Upward and downward messages in the bucket-tree
• Upward (λ) messages: λE(B,C), λD(A,B), λC(A,B), λB(A), as before
• Downward (π) messages, computed from the root A:

  πA(A) = P(A)
  πB(A,B) = P(B|A) πA(A) λC(A,B)   (sent to the bucket of D)
  πB(A,B) = P(B|A) πA(A) λD(A,B)   (sent to the bucket of C)
  πC(B,C) = Σ_A P(C|A) πB(A,B)     (sent to the bucket of E)
Computing marginals from the bucket-tree
• After both message passes, each bucket holds its own CPT plus the λ and π messages from its neighbors:

  {A}: P(A);  {B,A}: P(B|A);  {A,B,C}: P(C|A);  {A,B,D}: P(D|A,B);  {E,B,C}: P(E|B,C)

• Any marginal can then be computed locally, e.g.:

  P(C | evidence) ∝ Σ_{A,B} P(C|A) πB(A,B) λE(B,C)
Buckets → super-buckets → clusters
• Network with CPTs P(A), P(B|A), P(C|A), P(D|A,B), P(F|B,C), P(G|F)
• Bucket tree: {A} – {B,A} – {A,B,C} – {F,B,C} – {G,F}, with {D,B,A} attached to {A,B,C}; separators A; A,B; A,B; B,C; F
• Merging buckets into super-buckets (clusters) yields the two-cluster tree {A,B,C,D,F} – {G,F} with separator F
• Time-space trade-off!
Tree decomposition
• A tree decomposition for a belief network ‹X,D,G,P› is a triple ‹T,χ,ψ›, where T = (V,E) is a tree and χ and ψ are labeling functions associating with each vertex v ∈ V two sets χ(v) ⊆ X and ψ(v) ⊆ P such that:
  – for each function (CPT) pi ∈ P there is exactly one vertex v such that pi ∈ ψ(v) and scope(pi) ⊆ χ(v)
  – for each variable Xi ∈ X, the set {v ∈ V | Xi ∈ χ(v)} forms a connected sub-tree (the running intersection property)
• A join-tree is a tree decomposition where all clusters are maximal
  – e.g., a bucket tree is a tree decomposition but not a join-tree
Treewidth and separator
• The width (aka treewidth) of a tree decomposition ‹T,χ,ψ› is max|χ(v)|, and its hyperwidth is max|ψ(v)|
• Given two adjacent vertices u and v of a tree decomposition, the separator of u and v is defined as sep(u,v) = χ(u) ∩ χ(v)
Finding join-tree decompositions
• Good join trees using triangulation:
  – Create the induced graph G’ along some ordering d
  – Identify all maximal cliques in G’
  – Order the cliques {C1, C2, …, Ct} by the rank of the highest vertex in each clique
  – Form the join tree by connecting each Ci to a predecessor Cj (j < i) sharing the largest number of vertices with Ci
Example
• From the moral graph and the induced graph of the A-B-C-D-E network, the maximal cliques are:

  C1 = {A,B,C} with ψ(C1) = {P(A), P(B|A), P(C|A)}
  C2 = {A,B,D} with ψ(C2) = {P(D|A,B)}
  C3 = {B,C,E} with ψ(C3) = {P(E|B,C)}

• Join tree: C2 – C1 – C3, with separators sep(C1,C2) = {A,B} and sep(C1,C3) = {B,C}
• Treewidth = 3, separator size = 2
Tree decomposition for belief updating
• Network over A, B, C, D, E, F, G; clusters and their functions:

  1: {A,B,C}   with P(A), P(B|A), P(C|A,B)
  2: {B,C,D,F} with P(D|B), P(F|C,D)
  3: {B,E,F}   with P(E|B,F)
  4: {E,F,G}   with P(G|E,F)

• Separators: sep(1,2) = {B,C}, sep(2,3) = {B,F}, sep(3,4) = {E,F}
Tree decomposition for belief updating
• Messages between the clusters (for observed G = g):

  h(1,2)(B,C) = Σ_A P(A) P(B|A) P(C|A,B)
  h(2,3)(B,F) = Σ_{C,D} P(D|B) P(F|C,D) h(1,2)(B,C)
  h(3,4)(E,F) = Σ_B P(E|B,F) h(2,3)(B,F)
  h(4,3)(E,F) = P(G=g|E,F)
  h(3,2)(B,F) = Σ_E P(E|B,F) h(4,3)(E,F)
  h(2,1)(B,C) = Σ_{D,F} P(D|B) P(F|C,D) h(3,2)(B,F)

• Time: O(exp(w+1)); space: O(exp(sep))
CTE properties
• Correctness and completeness: algorithm CTE is correct, i.e., it computes the exact joint probability of a single variable and the evidence
• Time complexity: O(deg × (n+N) × d^(w*+1))
• Space complexity: O(N × d^sep)
  – deg = max degree of a node in T
  – n = number of variables (= number of CPTs)
  – N = number of nodes in T
  – d = maximum domain size
  – w* = induced width
  – sep = maximum separator size
Exact inference
• Variable elimination (inference)
  – Bucket elimination
  – Bucket-tree elimination
  – Cluster-tree elimination
• Conditioning (search)
  – Cycle-cutset scheme
  – VE+C hybrid
  – AND/OR search (tree, graph)
Conditioning
• Query: P(A | E=0) = P(A, E=0) / P(E=0)
• Search: enumerate a tree over the variables A, B, C, D, E (values 0/1); each full branch contributes the product of the CPT entries along it:

  P(A=0) P(B=0|A=0) P(C=0|A=0) P(E=0|B=0,C=0) P(D=0|A=0,B=0)
  P(A=0) P(B=0|A=0) P(C=0|A=0) P(E=0|B=0,C=0) P(D=1|A=0,B=0)
  …
  P(A=0) P(B=1|A=0) P(C=1|A=0) P(E=0|B=1,C=1) P(D=1|A=0,B=1)

• Summing the branches under A=0 gives Σ = P(A=0, E=0)
Conditioning
• Likewise, the branches under A=1 sum to P(A=1, E=0); then:

  P(A=0 | E=0) = P(A=0, E=0) / (P(A=0, E=0) + P(A=1, E=0))
  P(A=1 | E=0) = P(A=1, E=0) / (P(A=0, E=0) + P(A=1, E=0))
Conditioning + elimination
• IDEA: condition (search) on some variables until the induced width w* of the remaining graph gets small enough, then solve the rest by elimination
• Example query: P(E=0) = ?, searching on A and eliminating the remaining variables
• The spectrum of hybrids:
  – full search: condition on all variables (remaining w* = 0)
  – loop cutset: condition until the remaining network is a polytree (w* = 1)
  – w-cutset: condition until the remaining graph has induced width w
  – full elimination: no conditioning (w* = w)
Loop-cutset method
• Condition until we get a polytree (no loops); the subset of conditioning variables is called a loop cutset
• Example: conditioning on A (the loop cutset) in the A-B-C-D-E network cuts both loops; then, e.g.,

  P(B | D=0) = P(B, A=0 | D=0) + P(B, A=1 | D=0)

• The loop-cutset method is time exponential in the loop-cutset size and requires only linear space!
![Page 77: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/77.jpg)
w-cutset method
• Identify a w-cutset, C_w, of the network
 - Finding the smallest loop-cutset/w-cutset is NP-hard
• For each assignment of the cutset, solve the conditioned subproblem by VE
• Aggregate the solutions over all cutset assignments
• Time complexity: exp(|C_w| + w)
• Space complexity: exp(w)
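A minimal sketch of the w-cutset scheme on a made-up 4-variable network P(A)P(B|A)P(C|A)P(D|B,C) (the CPT values here are illustrative, not from the slides): the cutset {A} is searched over, and the conditioned subproblem is solved exactly for each cutset assignment. Plain enumeration stands in for variable elimination on the conditioned part, which is the step that would actually exploit the reduced width w.

```python
from itertools import product

factors = [  # (scope, table) pairs; tables indexed by value tuples over the scope
    (("A",), {(0,): .6, (1,): .4}),
    (("A", "B"), {(0, 0): .3, (0, 1): .7, (1, 0): .8, (1, 1): .2}),
    (("A", "C"), {(0, 0): .5, (0, 1): .5, (1, 0): .9, (1, 1): .1}),
    (("B", "C", "D"), {(0, 0, 0): .2, (0, 0, 1): .8, (0, 1, 0): .4, (0, 1, 1): .6,
                       (1, 0, 0): .7, (1, 0, 1): .3, (1, 1, 0): .5, (1, 1, 1): .5}),
]
domains = {v: (0, 1) for v in "ABCD"}

def joint(assignment):
    """Product of all factors under a full assignment {var: value}."""
    p = 1.0
    for scope, table in factors:
        p *= table[tuple(assignment[v] for v in scope)]
    return p

def conditioned(cutset_assignment, rest):
    """Solve the conditioned subproblem (enumeration stands in for VE)."""
    total = 0.0
    for vals in product(*(domains[v] for v in rest)):
        total += joint({**cutset_assignment, **dict(zip(rest, vals))})
    return total

def cutset_scheme(cutset, rest):
    """Search over cutset assignments and aggregate the conditioned solutions."""
    return sum(conditioned(dict(zip(cutset, vals)), rest)
               for vals in product(*(domains[v] for v in cutset)))

p_total = cutset_scheme(["A"], ["B", "C", "D"])  # sums the full joint, so it should be 1.0
```

The search component is exponential only in the cutset size |C_w|; each call to `conditioned` is where the exp(w) elimination cost would be paid.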
![Page 78: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/78.jpg)
Interleaving Conditioning and Elimination

(animation, slides 78-84: alternately Eliminate variables while the remaining induced width stays small, and Condition on a variable when it does not, repeating until the problem is solved …)
![Page 85: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/85.jpg)
General graphical models
• All algorithms generalize to any graphical model
 - Through general operations of combination and marginalization
 - General BE, BTE, CTE, VE+C
 - Applicable to Markov networks, to constraint optimization, to counting the number of solutions in SAT/CSP, etc.
![Page 86: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/86.jpg)
Exact inference
• Variable elimination (inference)
 - Bucket elimination
 - Bucket-Tree elimination
 - Cluster-Tree elimination
• Conditioning (search)
 - VE+C hybrid
 - Cycle cutset scheme
 - AND/OR search (tree, graph)
![Page 87: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/87.jpg)
Solution techniques

Inference: Elimination
• Complete: Bucket Elimination, Variable Elimination, Tree Clustering
 - Time: exp(treewidth), Space: exp(treewidth)
• Incomplete: Mini-Bucket(i), Mini-Clustering(i), Belief Propagation

Search: Conditioning
• Complete:
 - DFS search: Time exp(n), Space linear
 - AND/OR search: Time exp(treewidth·log n), Space linear
 - With caching, OR search: Time and Space exp(pathwidth); AND/OR search: Time and Space exp(treewidth)
• Incomplete: Stochastic Local Search, Gradient Descent

Hybrids of the two
![Page 88: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/88.jpg)
Exact inference
• Variable elimination (inference)
 - Bucket elimination
 - Bucket-Tree elimination
 - Cluster-Tree elimination
• Conditioning (search)
 - Cycle cutset
 - VE+C hybrid
 - AND/OR search spaces
  • AND/OR tree search
  • AND/OR graph search
![Page 89: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/89.jpg)
OR search space

Ordering: A B E C D F

(figure: the primal graph over A, B, C, D, E, F and the full OR search tree along the ordering, branching 0/1 at every level)
![Page 90: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/90.jpg)
AND/OR search space

(figure: the moral graph over A, B, C, D, E, F, a DFS tree rooted at A (B under A; C and E under B; D under C, F under E), and the corresponding AND/OR search tree with alternating OR levels (variables) and AND levels (values))
![Page 91: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/91.jpg)
OR vs. AND/OR

(figure, side by side: the OR search tree along a linear ordering vs. the AND/OR search tree along the DFS tree, for the same 6-variable problem)
![Page 92: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/92.jpg)
OR vs. AND/OR

(figure: the same comparison)

AND/OR size: exp(4), OR size: exp(6)
![Page 93: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/93.jpg)
AND/OR search spaces
• The AND/OR search tree of R relative to a spanning pseudo tree T has alternating levels of OR nodes (variables) and AND nodes (values)
• Successor function:
 - The successors of an OR node X are all of X's values that are consistent along the path (AND nodes)
 - The successors of an AND node <X,v> are all the child variables of X in T (OR nodes)
• A solution is a consistent subtree
• Task: compute the value of the root node

(figure: the primal graph, the pseudo tree, and the full AND/OR search tree for the 6-variable example)
![Page 94: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/94.jpg)
From DFS trees to pseudo trees

(figure over variables 1-7:
 (a) Graph
 (b) DFS tree, depth = 3
 (c) Pseudo tree, depth = 2
 (d) Chain, depth = 6)

(Freuder85; Bayardo & Miranker95)
![Page 95: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/95.jpg)
Pseudo tree vs. DFS tree

Model (DAG)    w*     Pseudo tree avg. depth   DFS tree avg. depth
(N=50, P=2)    9.54   16.82                    36.03
(N=50, P=3)    16.1   23.34                    40.6
(N=50, P=4)    20.91  28.31                    43.19
(N=100, P=2)   18.3   27.59                    72.36
(N=100, P=3)   30.97  41.12                    80.47
(N=100, P=4)   40.27  50.53                    86.54

N = number of nodes, P = number of parents. MIN-FILL ordering. Averages over 100 instances.
![Page 96: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/96.jpg)
Finding min-depth backbone trees
• Finding a min-depth DFS tree or pseudo tree is NP-complete, but:
• Given a tree decomposition of treewidth w*, there exists a pseudo tree T of G whose depth m satisfies:

 m ≤ w* · log n
(Bayardo & Miranker96, Bodlaender & Gilbert91)
![Page 97: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/97.jpg)
Generating pseudo trees from bucket trees

(figure: the induced graph along the ordering d = A B C E D F; the bucket tree with clusters bucket-A (A), bucket-B (AB), bucket-C (ABC), bucket-E (ABE), bucket-D (BDE), bucket-F (AEF) and separators (A), (AB), (AC) (BC), (AE) (BE), (BD) (DE), (AF) (EF); the bucket tree used as a pseudo tree; and the resulting AND/OR search tree)
![Page 98: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/98.jpg)
Other heuristics for pseudo trees
• Depth-first traversal of the induced graph constructed along some elimination ordering (e.g., min-fill)
 - Sometimes yields slightly different trees than those obtained from the bucket tree
• Recursive decomposition of the dual hypergraph, minimizing the separator size at each step
 - Functions (CPTs) are vertices in the dual hypergraph, while variables are hyperedges
 - Separator = set of hyperedges (i.e., variables)
![Page 99: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/99.jpg)
Quality of the pseudo trees

Network   hypergraph (width / depth)   min-fill (width / depth)
barley    7 / 13                       7 / 23
diabetes  7 / 16                       4 / 77
link      21 / 40                      15 / 53
mildew    5 / 9                        4 / 13
munin1    12 / 17                      12 / 29
munin2    9 / 16                       9 / 32
munin3    9 / 15                       9 / 30
munin4    9 / 18                       9 / 30
water     11 / 16                      10 / 15
pigs      11 / 20                      11 / 26
Bayesian Networks Repository
![Page 100: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/100.jpg)
AND/OR search tree properties
• Theorem: Any AND/OR search tree based on a pseudo tree is sound and complete (it expresses all and only the solutions)
• Theorem: The size of the AND/OR search tree is O(n·k^m), where m is the pseudo tree depth; the size of the OR search tree is O(k^n)
• Theorem: The size of the AND/OR search tree can be bounded by O(exp(w* · log n))
• Related to: (Freuder85; Dechter90; Bayardo et al.96; Darwiche01; Bacchus et al.03)
• When the pseudo tree is a chain we get an OR space
![Page 101: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/101.jpg)
AND/OR vs. OR spaces

              OR space                 AND/OR space
width  depth  Time (sec.)  Nodes       Time (sec.)  AND nodes  OR nodes
5      10     3.15         2,097,150   0.03         10,494     5,247
4      9      3.13         2,097,150   0.01         5,102      2,551
5      10     3.12         2,097,150   0.03         8,926      4,463
4      10     3.12         2,097,150   0.02         7,806      3,903
5      13     3.11         2,097,150   0.10         36,510     18,255

Random graphs with 20 nodes, 20 edges and 2 values per node
![Page 102: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/102.jpg)
Tasks and values of nodes
• v(n) is the value of the subtree T(n) for the task:
 - Optimization (MPE): v(n) is the optimal solution cost in T(n)
 - Belief updating: v(n) is the probability of the evidence in T(n)
• Goal: compute the value of the root node recursively, by a DFS traversal of the AND/OR tree
• Theorem: The complexity of AND/OR DFS search is: Space O(n); Time O(n·k^m), i.e., O(exp(w* · log n))
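A quick illustration of the O(n·k^m) bound, counting the nodes of the AND/OR search tree for a pseudo tree shaped like the 6-variable example (assumed here to be A - B; B - C, E; C - D; E - F), domain size k = 2, versus a chain pseudo tree, which reproduces the OR search space:

```python
def andor_nodes(children, root, k=2):
    """One OR node for the variable, k AND nodes below it, and below each
    AND node the OR subtrees of the variable's pseudo-tree children."""
    return 1 + k * (1 + sum(andor_nodes(children, c, k)
                            for c in children.get(root, ())))

tree = {"A": ["B"], "B": ["C", "E"], "C": ["D"], "E": ["F"]}    # depth m = 4
chain = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"], "E": ["F"]}  # depth 6

print(andor_nodes(tree, "A"), andor_nodes(chain, "A"))  # 81 189
```

The branchy pseudo tree (depth 4) yields 81 nodes against 189 for the chain; the gap grows exponentially with the depth difference, matching the exp(4) vs. exp(6) comparison of the earlier slides.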
![Page 103: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/103.jpg)
Weighted AND/OR tree (belief updating)

Evidence: D=1, E=0

P(A):      A=0: .6; A=1: .4
P(B|A):    A=0: (B=0 .4, B=1 .6); A=1: (B=0 .1, B=1 .9)
P(C|A):    A=0: (C=0 .2, C=1 .8); A=1: (C=0 .7, C=1 .3)
P(E|A,B):  (A,B)=(0,0): (E=0 .4, E=1 .6); (0,1): (.5, .5); (1,0): (.7, .3); (1,1): (.2, .8)
P(D|B,C):  (B,C)=(0,0): (D=0 .2, D=1 .8); (0,1): (.1, .9); (1,0): (.3, .7); (1,1): (.5, .5)

Buckets: A: P(A); B: P(B|A); C: P(C|A); E: P(E|A,B), E=0; D: P(D|B,C), D=1

w(X,x) = product of the CPTs that mention X and whose scope is fully instantiated along the path

(figure: the weighted AND/OR search tree for the pseudo tree A → B → {C, E}, C → D, with the arc weights shown)
![Page 104: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/104.jpg)
Computing node values (belief updating)

OR node:  v(A) = Σ_{i=1..k} w(A,i) · v(A,i)

AND node: v(A,0) = Π_{i=1..m} v(X_i), over the children X_1, …, X_m of <A,0>

NOTE:
• the value of a terminal AND node is 1
• the weight of an OR-AND arc for which no CPT is fully instantiated is 1
![Page 105: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/105.jpg)
AND/OR tree algorithm (belief updating)

AND node: combination operator (product)
OR node: marginalization operator (summation)
Value of a node = updated belief for the sub-problem below it

(figure: the weighted AND/OR tree of the previous slide, with evidence D=1 and E=0 and node values propagated bottom-up; the subtrees below A=0 and A=1 evaluate to .3028 and .1559)

Result: P(D=1, E=0) = 0.3028·0.6 + 0.1559·0.4 = 0.24408
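The depth-first sum-product recursion above can be sketched compactly. This assumes the pseudo tree A → B → {C, E}, C → D and uses the CPT values as transcribed here, which may be imperfectly recovered from the slide graphics, so the result is checked against brute-force enumeration rather than against the slide's printed figure:

```python
domains = {v: (0, 1) for v in "ABCDE"}
children = {"A": ["B"], "B": ["C", "E"], "C": ["D"], "D": [], "E": []}
evidence = {"D": 1, "E": 0}

P_A = {0: .6, 1: .4}
P_B_A = {(0, 0): .4, (0, 1): .6, (1, 0): .1, (1, 1): .9}
P_C_A = {(0, 0): .2, (0, 1): .8, (1, 0): .7, (1, 1): .3}
P_E_AB = {(0, 0, 0): .4, (0, 0, 1): .6, (0, 1, 0): .5, (0, 1, 1): .5,
          (1, 0, 0): .7, (1, 0, 1): .3, (1, 1, 0): .2, (1, 1, 1): .8}
P_D_BC = {(0, 0, 0): .2, (0, 0, 1): .8, (0, 1, 0): .1, (0, 1, 1): .9,
          (1, 0, 0): .3, (1, 0, 1): .7, (1, 1, 0): .5, (1, 1, 1): .5}

def weight(var, val, a):
    # w(X,x): the CPT of X becomes fully instantiated exactly at X's level
    if var == "A": return P_A[val]
    if var == "B": return P_B_A[(a["A"], val)]
    if var == "C": return P_C_A[(a["A"], val)]
    if var == "E": return P_E_AB[(a["A"], a["B"], val)]
    return P_D_BC[(a["B"], a["C"], val)]

def value(var, path):
    # OR node: sum over the values consistent with the evidence;
    # AND node: product of the child OR values
    total = 0.0
    for val in domains[var]:
        if var in evidence and val != evidence[var]:
            continue
        sub = dict(path, **{var: val})
        prod = weight(var, val, sub)
        for c in children[var]:
            prod *= value(c, sub)
        total += prod
    return total

p_evidence = value("A", {})  # P(D=1, E=0) under the CPTs as transcribed
```

The recursion visits one OR node per variable occurrence in the AND/OR tree, giving the O(n·k^m) time and O(n) space of depth-first AND/OR tree search.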
![Page 106: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/106.jpg)
Complexity of AND/OR tree search

        AND/OR tree                            OR tree
Space   O(n)                                   O(n)
Time    O(n·k^m), i.e., O(n·k^(w*·log n))      O(k^n)

(Freuder & Quinn85), (Collin, Dechter & Katz91), (Bayardo & Miranker95), (Darwiche01)

k = domain size, m = depth of the pseudo tree, n = number of variables, w* = treewidth
![Page 107: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/107.jpg)
Exact inference
• Variable elimination (inference)
 - Bucket elimination
 - Bucket-Tree elimination
 - Cluster-Tree elimination
• Conditioning (search)
 - VE+C hybrid
 - AND/OR search spaces
  • AND/OR tree search
  • AND/OR graph search
![Page 108: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/108.jpg)
From search trees to search graphs
• Any two nodes that root identical sub-trees or sub-graphs can be merged
![Page 109: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/109.jpg)
![Page 110: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/110.jpg)
AND/OR search tree

(figure: a primal graph over A, B, C, D, E, F, G, H, J, K, its pseudo tree, and the full AND/OR search tree; the subtrees over G, H and over J, K repeat many times)
![Page 111: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/111.jpg)
AND/OR search graph

(figure: the same search space after merging identical subtrees; each distinct subgraph over G, H and over J, K now appears only once, yielding the AND/OR search graph)
![Page 112: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/112.jpg)
Merging based on context
• One way of recognizing nodes that can be merged:

 context(X) = the ancestors of X in the pseudo tree that are connected, in the primal graph, to X or to descendants of X

(figure: pseudo tree over A, B, C, D, E, F with contexts context(A)=[ ], context(B)=[A], context(C)=[AB], context(E)=[AB], context(D)=[BC], context(F)=[AE]; OR nodes with the same context assignment root identical subproblems and can be merged)
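The context definition above can be sketched directly. The 4-variable chain below is a made-up example (not the slides' network), chosen so the result is easy to check by hand:

```python
def contexts(primal_edges, parent, order):
    """Return {var: context list} for a pseudo tree given as a parent map.
    `order` lists the variables root-first, so ancestors precede descendants."""
    adj = {v: set() for v in order}
    for u, v in primal_edges:
        adj[u].add(v)
        adj[v].add(u)
    kids = {v: [w for w in order if parent.get(w) == v] for v in order}

    def subtree(v):
        out = {v}
        for c in kids[v]:
            out |= subtree(c)
        return out

    def ancestors(v):
        out = []
        while parent.get(v) is not None:
            v = parent[v]
            out.append(v)
        return list(reversed(out))  # root-first

    return {v: [a for a in ancestors(v)
                if any(a in adj[d] for d in subtree(v))] for v in order}

edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]  # hypothetical primal graph
parent = {"B": "A", "C": "B", "D": "C"}                   # chain pseudo tree A-B-C-D
ctx = contexts(edges, parent, ["A", "B", "C", "D"])
# ctx["D"] == ["A", "C"]: B is an ancestor of D but touches no node in D's subtree
```

Note how the edge A-D puts A into the contexts of both C and D even though A is not adjacent to C itself.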
![Page 113: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/113.jpg)
AND/OR graph algorithm (belief updating)

Evidence: D=1, E=0

(figure: the AND/OR search graph for the 5-variable example, with contexts [ ], [A], [AB], [AB], [BC]; the OR nodes for D are merged based on context(D) = [BC])

Cache table for D (context B, C):
B C  Value
0 0  .8
0 1  .9
1 0  .7
1 1  .5

Result: P(D=1, E=0)
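Context-based caching can be sketched by memoizing the sum-product recursion on (variable, context assignment), so that, e.g., the OR node for D is computed once per assignment of its context [B, C], exactly like the cache table above. The network and contexts follow the 5-variable example, with CPT values as transcribed (possibly imperfectly) from the slides:

```python
domains = {v: (0, 1) for v in "ABCDE"}
children = {"A": ["B"], "B": ["C", "E"], "C": ["D"], "D": [], "E": []}
context = {"A": (), "B": ("A",), "C": ("A", "B"), "E": ("A", "B"), "D": ("B", "C")}
evidence = {"D": 1, "E": 0}

P_A = {0: .6, 1: .4}
P_B_A = {(0, 0): .4, (0, 1): .6, (1, 0): .1, (1, 1): .9}
P_C_A = {(0, 0): .2, (0, 1): .8, (1, 0): .7, (1, 1): .3}
P_E_AB = {(0, 0, 0): .4, (0, 0, 1): .6, (0, 1, 0): .5, (0, 1, 1): .5,
          (1, 0, 0): .7, (1, 0, 1): .3, (1, 1, 0): .2, (1, 1, 1): .8}
P_D_BC = {(0, 0, 0): .2, (0, 0, 1): .8, (0, 1, 0): .1, (0, 1, 1): .9,
          (1, 0, 0): .3, (1, 0, 1): .7, (1, 1, 0): .5, (1, 1, 1): .5}

def weight(var, val, a):
    if var == "A": return P_A[val]
    if var == "B": return P_B_A[(a["A"], val)]
    if var == "C": return P_C_A[(a["A"], val)]
    if var == "E": return P_E_AB[(a["A"], a["B"], val)]
    return P_D_BC[(a["B"], a["C"], val)]

cache = {}
calls = 0  # counts distinct subproblems actually evaluated

def value(var, path):
    global calls
    key = (var, tuple(path[a] for a in context[var]))
    if key in cache:           # this node was merged: reuse the cached value
        return cache[key]
    calls += 1
    total = 0.0
    for val in domains[var]:
        if var in evidence and val != evidence[var]:
            continue
        sub = dict(path, **{var: val})
        prod = weight(var, val, sub)
        for c in children[var]:
            prod *= value(c, sub)
        total += prod
    cache[key] = total
    return total

p = value("A", {})  # same result as the uncached tree search, fewer evaluations
```

Caching by context is sound here because the value of each subproblem depends only on its context variables; the number of cached entries is bounded by the number of context assignments, i.e., exp(w*).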
![Page 114: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/114.jpg)
Context-minimal AND/OR graph

Pseudo-tree order: (C K H A B E J L N O D P M F G)

(figure: a primal graph over the 15 variables and its pseudo tree, annotated with contexts [ ], [C], [CK], [CKL], [CKLN], [CKO], [CH], [CHA], [CHAB], [AB], [CHAE], [CEJ], [CD], [AF]; merging all nodes with equal contexts yields the context-minimal AND/OR search graph)
![Page 115: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/115.jpg)
How big is the context?

Theorem: The maximum context size for a pseudo tree equals the treewidth of the graph along that pseudo tree.

(figure: the same pseudo tree and contexts as on the previous slide)

max context size = treewidth
![Page 116: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/116.jpg)
Treewidth vs. pathwidth

(figure: a graph over A-M with two decompositions)

TREE decomposition: clusters ABC, BDEF, BDFG, EFH, FHK, HJ, KLM
 treewidth = 3 = (max cluster size) - 1

CHAIN decomposition: clusters ABC, BDEFG, EFH, FHKJ, KLM
 pathwidth = 4 = (max cluster size) - 1
![Page 117: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/117.jpg)
AND/OR graph search

• AO(i): searches depth-first, caching i-contexts
 - i = the max size of a cache table (i.e., number of variables in a context)
• The parameter i trades space for time, from i=0 to i=w*:
 - i=0: Space O(n), Time O(exp(w*·log n))
 - general i: Space O(exp(i)), Time O(exp(m_i + i))
 - i=w*: Space O(exp(w*)), Time O(exp(w*))
![Page 118: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/118.jpg)
Complexity of AND/OR graph search

        AND/OR graph   OR graph
Space   O(n·k^w*)      O(n·k^pw*)
Time    O(n·k^w*)      O(n·k^pw*)

k = domain size, n = number of variables, w* = treewidth, pw* = pathwidth

w* ≤ pw* ≤ w* · log n
![Page 119: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/119.jpg)
Related work
• Recursive Conditioning (RC) (Darwiche01)
 - Can be viewed as an AND/OR graph search algorithm guided by a tree structure called a "dtree"
• Value Elimination (VE) (Bacchus et al.03)
 - Also an AND/OR graph search algorithm, using an advanced caching scheme based on components rather than graph-based contexts
 - Can use dynamic variable orderings
![Page 120: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/120.jpg)
Exact inference
• Variable elimination (inference)
 - Bucket elimination
 - Bucket-Tree elimination
 - Cluster-Tree elimination
• Conditioning (search)
 - VE+C hybrid
 - AND/OR search spaces
  • AND/OR tree search
  • AND/OR graph search
![Page 121: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/121.jpg)
AND/OR w-cutset

(figure: a moral graph over A, B, C, D, E, F, G, H, J, K, L, M and its pseudo tree; conditioning on successively larger cutsets leaves subproblems of decreasing width — a 3-cutset, a 2-cutset, and a 1-cutset are shown)
![Page 122: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/122.jpg)
AND/OR w-cutset

(figure: the moral graph, its pseudo tree, and the 1-cutset tree; the cutset variables form the top part of the pseudo tree)
![Page 123: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/123.jpg)
Searching AND/OR graphs

• AO(i): searches depth-first, caching i-contexts
 - i = the max size of a cache table (i.e., number of variables in a context)
• The parameter i trades space for time, from i=0 to i=w*:
 - i=0: Space O(n), Time O(exp(w*·log n))
 - general i: Space O(exp(i)), Time O(exp(m_i + i))
 - i=w*: Space O(exp(w*)), Time O(exp(w*))
![Page 124: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/124.jpg)
w-cutset trees over AND/OR space

• Definition: T_w is a w-cutset tree relative to a backbone pseudo tree T iff T_w is a subtree containing the root of T and, when T_w is removed, the remaining problem has treewidth w.
• Theorem: AO(i) over backbone T runs in time O(exp(i + m_i)) and space O(exp(i)), where m_i is the depth of the T_i tree.
• This improves on the w-cutset bound O(exp(i + c_i)), where c_i is the number of nodes in T_i (and m_i ≤ c_i).
![Page 125: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/125.jpg)
Exact inference
• Variable elimination (inference)
 - Bucket elimination
 - Bucket-Tree elimination
 - Cluster-Tree elimination
• Conditioning (search)
 - VE+C hybrid
 - AND/OR search for Most Probable Explanations
![Page 126: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/126.jpg)
AND/OR Branch-and-Bound for MPE

MPE = max_{x_1, …, x_n} Π_{i=1..n} P(x_i | pa(X_i))

• Solved by BE in time and space exponential in the treewidth w*
• Solved by conditioning in linear space and time exponential in the number of variables n
• It can be solved by AND/OR search:
 - Tree search: space O(n), time O(exp(w*·log n))
 - Graph search: time and space O(exp(w*))
![Page 127: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/127.jpg)
Weighted AND/OR tree (MPE task)

(figure: the same 5-variable example as for belief updating — pseudo tree A → B → {C, E}, C → D, the CPTs P(A), P(B|A), P(C|A), P(E|A,B), P(D|B,C), evidence D=1 and E=0, and the weighted AND/OR search tree)

Buckets: A: P(A); B: P(B|A); C: P(C|A); E: P(E|A,B), E=0; D: P(D|B,C), D=1

w(X,x) = product of the CPTs that mention X and whose scope is fully instantiated along the path
![Page 128: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/128.jpg)
Computing node values (MPE task)

OR node:  v(A) = max_{i=1..k} w(A,i) · v(A,i)

AND node: v(A,0) = Π_{i=1..m} v(X_i), over the children X_1, …, X_m of <A,0>

NOTE:
• the value of a terminal AND node is 1
• the weight of an OR-AND arc for which no CPT is fully instantiated is 1
![Page 129: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/129.jpg)
AND/OR tree algorithm (MPE task)

AND node: combination operator (product)
OR node: marginalization operator (maximization)
Value of a node = MPE value for the sub-problem below it

(figure: the weighted AND/OR tree with values propagated bottom-up by maximization and product; the subtrees below A=0 and A=1 evaluate to .12 and .081)

Result: MPE(D=1, E=0) = max(0.12·0.6, 0.081·0.4) = 0.072
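The same depth-first AND/OR recursion with the summation at OR nodes replaced by maximization computes the MPE value. Pseudo tree and CPT values as in the belief-updating example (as transcribed here); evidence D=1, E=0:

```python
domains = {v: (0, 1) for v in "ABCDE"}
children = {"A": ["B"], "B": ["C", "E"], "C": ["D"], "D": [], "E": []}
evidence = {"D": 1, "E": 0}

P_A = {0: .6, 1: .4}
P_B_A = {(0, 0): .4, (0, 1): .6, (1, 0): .1, (1, 1): .9}
P_C_A = {(0, 0): .2, (0, 1): .8, (1, 0): .7, (1, 1): .3}
P_E_AB = {(0, 0, 0): .4, (0, 0, 1): .6, (0, 1, 0): .5, (0, 1, 1): .5,
          (1, 0, 0): .7, (1, 0, 1): .3, (1, 1, 0): .2, (1, 1, 1): .8}
P_D_BC = {(0, 0, 0): .2, (0, 0, 1): .8, (0, 1, 0): .1, (0, 1, 1): .9,
          (1, 0, 0): .3, (1, 0, 1): .7, (1, 1, 0): .5, (1, 1, 1): .5}

def weight(var, val, a):
    if var == "A": return P_A[val]
    if var == "B": return P_B_A[(a["A"], val)]
    if var == "C": return P_C_A[(a["A"], val)]
    if var == "E": return P_E_AB[(a["A"], a["B"], val)]
    return P_D_BC[(a["B"], a["C"], val)]

def mpe_value(var, path):
    # OR node: maximize over values consistent with the evidence;
    # AND node: product of the child OR values
    best = 0.0
    for val in domains[var]:
        if var in evidence and val != evidence[var]:
            continue
        sub = dict(path, **{var: val})
        prod = weight(var, val, sub)
        for c in children[var]:
            prod *= mpe_value(c, sub)
        best = max(best, prod)
    return best

mpe = mpe_value("A", {})  # 0.072, matching the slide's result
```

Only the marginalization operator changes (sum → max); the combination operator, the weights, and the search space are identical, which is why the two tasks share the same complexity bounds.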
![Page 130: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/130.jpg)
Branch-and-Bound search

• g(n): cost of the search path to n
• h(n): estimates the optimal cost below n
• Upper bound: UB(n) = g(n) · h(n)
• Prune the subtree below n if UB(n) ≤ LB, the lower bound given by the best solution found so far

(figure: an OR search tree with the pruning rule applied at node n)

(Lawler & Wood66)
![Page 131: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/131.jpg)
Partial solution tree

(figure: a pseudo tree with A at the root, children B and C, and D below; four partial solution trees are shown, extending to the full assignments (A=0, B=0, C=0, D=0), (A=0, B=0, C=0, D=1), (A=0, B=1, C=0, D=0), (A=0, B=1, C=0, D=1))

Extension(T') = the set of full solution trees that extend the partial solution tree T'
![Page 132: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/132.jpg)
Exact evaluation function

A B C  f1(ABC)     A B F  f2(ABF)     B D E  f3(BDE)
0 0 0  2           0 0 0  3           0 0 0  6
0 0 1  5           0 0 1  5           0 0 1  4
0 1 0  3           0 1 0  1           0 1 0  8
0 1 1  5           0 1 1  4           0 1 1  5
1 0 0  9           1 0 0  6           1 0 0  9
1 0 1  3           1 0 1  5           1 0 1  3
1 1 0  7           1 1 0  6           1 1 0  7
1 1 1  2           1 1 1  5           1 1 1  4

(figure: a partial AND/OR search tree; the tip nodes of the current partial solution tree T' are <D,0> and F)

f*(T') = w(A,0) · w(B,1) · w(C,0) · w(D,0) · v(D,0) · v(F)
![Page 133: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/133.jpg)
Heuristic evaluation function

(figure: the same partial solution tree; the exact tip values are replaced by the heuristic estimates h(D,0) = 4 and h(F) = 5; the cost functions f1(ABC), f2(ABF), f3(BDE) are as on the previous slide)

f(T') = w(A,0) · w(B,1) · w(C,0) · w(D,0) · h(D,0) · h(F) ≥ f*(T')

h(n) ≥ v(n)
![Page 134: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/134.jpg)
AND/OR Branch-and-Bound search

(figure: an AND/OR search tree explored depth-first; the subtree below the current tip is pruned when f(T') ≤ LB)

(Marinescu & Dechter05)
![Page 135: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/135.jpg)
AND/OR Branch-and-Bound search
• Associate each node n with a heuristic upper bound h(n) on v(n)
• EXPAND (top-down)
 - Evaluate f(T') of the current partial solution sub-tree T', and prune the search if f(T') ≤ LB
 - Expand the tip node n by generating its successors
• PROPAGATE (bottom-up)
 - Update the value of the parent p of n
  • OR nodes: maximization
  • AND nodes: product
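The pruning rule can be shown on a stripped-down depth-first branch-and-bound over an OR tree (no AND branching): maximize a product of per-variable weights. The weights and the heuristic below are made up for illustration; h multiplies the best possible weight of each remaining variable, so h(n) ≥ v(n), an admissible upper bound for maximization:

```python
weights = [  # weights[i][v] = weight of assigning value v to variable i (hypothetical)
    [0.6, 0.4],
    [0.9, 0.2],
    [0.3, 0.8],
    [0.5, 0.7],
]

def h(depth):
    """Upper bound on the best achievable product below this depth."""
    bound = 1.0
    for row in weights[depth:]:
        bound *= max(row)
    return bound

def bnb(depth=0, g=1.0, lb=0.0):
    """Return the best full-assignment product, pruning when g*h(n) <= LB."""
    if depth == len(weights):
        return g                      # a full assignment: its cost is g
    if g * h(depth) <= lb:
        return lb                     # prune: the bound cannot beat the incumbent
    for v in (0, 1):
        lb = max(lb, bnb(depth + 1, g * weights[depth][v], lb))
    return lb
```

Here g plays the role of f(T') accumulated so far and lb is the cost of the best solution found; the AND/OR version applies the same rule but multiplies in independent subproblems at AND nodes.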
![Page 136: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/136.jpg)
How to Generate Heuristics
• The principle of relaxed models (Pearl86)
• Mini-Bucket Elimination for belief networks
![Page 137: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/137.jpg)
Grid Networks (BN)

(table: MPE results on grid instances 90-24-1, 90-26-1, 90-30-1, each given as Grid(w*, h)(n, e) — columns report, for i = 10, 14, 18, 20, the time and number of nodes of MBE(i), BB+SMB(i), AOBB+SMB(i), BB+DMB(i) and AOBB+DMB(i), compared against SamIam v. 2.3.2; "-" marks runs that exceeded the limit)

Min-fill pseudo tree. Time limit 1 hour.
(Sang et al.05)
![Page 138: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/138.jpg)
Genetic Linkage Analysis (BN)

(table: MPE results on pedigree instances ped18, ped25, ped30, ped33, ped39, each given as pedigree(n, d)(w*, h) — columns report, for i = 12, 16, 20, the time and number of nodes of MBE(i), BB+SMB(i) and AOBB+SMB(i), compared against Superlink v. 1.6 and SamIam v. 2.3.2; "-" marks timeouts and "out" marks memory exhaustion)

(Fishelson & Geiger02)
Min-fill pseudo tree. Time limit 3 hours.
![Page 139: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/139.jpg)
Memory intensive AND/OR Branch-and-Bound

• Associate each node n with a heuristic upper bound h(n) on v(n)
• EXPAND (top-down)
 - Evaluate f(T') of the current partial solution sub-tree T', and prune the search if f(T') ≤ LB
 - If not in the cache, expand the tip node n by generating its successors
• PROPAGATE (bottom-up)
 - Update the value of the parent p of n
  • OR nodes: maximization
  • AND nodes: multiplication
 - Cache the value of n, based on its context
![Page 140: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/140.jpg)
Best-first AND/OR search for MPE
• Best-first search expands first the node with the best heuristic evaluation function among all nodes encountered so far
• Unlike depth-first search algorithms, it never expands nodes whose cost is beyond the optimal one (Dechter & Pearl85)
• It is therefore superior among memory-intensive algorithms employing the same heuristic function
![Page 141: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/141.jpg)
Best-First AND/OR Search
• Maintains the set of best partial solution trees
• EXPAND (top-down)
 - Traces down marked connectors from the root (the best partial solution tree)
 - Expands a tip node n by generating its successors n'
 - Associates each successor with the heuristic estimate h(n'); initializes v(n') = h(n')
• REVISE (bottom-up)
 - Updates node values v(n)
  • OR nodes: maximization
  • AND nodes: multiplication
 - Marks the most promising solution tree from the root
 - Labels nodes as SOLVED:
  • an OR node is SOLVED if its marked child is SOLVED
  • an AND node is SOLVED if all its children are SOLVED
• Terminates when the root node is SOLVED

[specializes Nilsson's AO* to graphical models (Nilsson80)]
(Marinescu & Dechter07)
![Page 142: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/142.jpg)
Grid Networks (BN)

[Results table, garbled in this transcript: grid instances (w*, h)(n, e): 90-24-1 (33, 111)(576, 20), 90-34-1 (45, 153)(1154, 80), and 90-38-1 (47, 163)(1444, 120), reporting time and node counts for SamIam, MBE(i), BB-C+SMB(i), AOBB+SMB(i), AOBB-C+SMB(i), and AOBF-C+SMB(i) at i-bounds 12, 14, 16, and 18]
Min-fill pseudo tree. Time limit 1 hour.
![Page 143: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/143.jpg)
Solving the MAP task
• Solved by BE in time and space exponential in the constrained induced width w*
• Solved by AND/OR search:
  Tree search: space O(n), time O(exp(w* log n))
  Graph search: time and space O(exp(w*))

MAP(a*_1, …, a*_k) = argmax_{a_1,…,a_k} Σ_{X\A} P(x, a, e)
![Page 144: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/144.jpg)
Bucket elimination for MAP
[Network diagram: variables A, B, C, D, E with CPTs P(A), P(B|A), P(C|A), P(D|A,B), P(E|B,C); moralized by marrying parents]
Variables A and B are the hypothesis variables; variable E is evidence (E = 0).

MAP = max_{a,b} P(a, b, E=0) = max_{a,b} Σ_{c,d} P(a) P(b|a) P(c|a) P(d|a,b) P(E=0|b,c)
![Page 145: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/145.jpg)
Bucket elimination for MAP

SUM buckets:
  Bucket E: P(E|B,C), E = 0            → λE(B,C)
  Bucket D: P(D|A,B)                   → λD(A,B)
  Bucket C: P(C|A), λE(B,C)            → λC(A,B)
MAX buckets:
  Bucket B: P(B|A), λD(A,B), λC(A,B)   → λB(A)
  Bucket A: P(A), λB(A)                → MAP value
![Page 146: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/146.jpg)
Bucket elimination for MAP
• Elimination order is important: SUM variables are eliminated first, followed by the MAX variables
  Ordering A, B, C, D, E is legal
  Ordering A, C, D, E, B is illegal
• The induced width corresponding to a legal elimination order is called the constrained induced width cw*
  Typically it may be far larger than the unconstrained induced width, i.e., cw* ≥ w*
• When interleaving MAX and SUM (using unconstrained orderings), the result is an upper bound on the MAP value
  Can be used as a guiding heuristic function for search
![Page 147: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/147.jpg)
AND/OR tree algorithm for MAP
AND node: Combination operator (product)
OR node: MAX for hypothesis, SUM otherwise
[AND/OR search tree figure: alternating OR and AND levels over variables A, B, E, C, D with values 0/1, annotated with the CPT weights and the propagated node values]

CPTs, with evidence D = 1 and E = 0:

P(A):   A=0: .6   A=1: .4

P(B|A):  A | B=0 B=1
         0 |  .4  .6
         1 |  .1  .9

P(C|A):  A | C=0 C=1
         0 |  .2  .8
         1 |  .7  .3

P(D|B,C):  B C | D=0 D=1
           0 0 |  .2  .8
           0 1 |  .1  .9
           1 0 |  .3  .7
           1 1 |  .5  .5

P(E|A,B):  A B | E=0 E=1
           0 0 |  .4  .6
           0 1 |  .5  .5
           1 0 |  .7  .3
           1 1 |  .2  .8

Result: MAP(D=1, E=0) = MAX(0.162 · 0.6, 0.0936 · 0.4) = 0.0972
![Page 148: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/148.jpg)
AND/OR search for MAP
• Pseudo tree must be consistent with the constrained elimination order
• Graph search via context-based caching
• Time and space complexity
  Tree search: space linear, time O(exp(cw* log n))
  Graph search: time and space O(exp(cw*))
![Page 149: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/149.jpg)
Outline• Probabilistic modeling with joint distributions• Conditional independence and factorization• Belief networks• Inference in belief networks
Exact inference Approximate inference
![Page 150: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/150.jpg)
Approximate inference• Mini-Bucket Elimination
Mini-clustering• Iterative Belief Propagation
IJGP – Iterative Joint Graph Propagation• Sampling
Forward sampling Gibbs sampling (MCMC) Importance sampling
![Page 151: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/151.jpg)
Solution techniques

[Diagram: taxonomy of solution techniques]

Search: Conditioning
  Complete: DFS search — time exp(n), space linear
            AND/OR search — time exp(treewidth · log n), space linear
            (with full caching: time and space exp(pathwidth))
  Incomplete: Stochastic Local Search, Gradient Descent
Inference: Elimination
  Complete: Tree Clustering, Variable Elimination, Bucket Elimination — time and space exp(treewidth)
  Incomplete: Mini-Clustering(i), Mini-Bucket(i), Belief Propagation
Hybrids of search and inference
![Page 152: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/152.jpg)
Variable elimination (MPE)

Given a belief network and some evidence (E = 0):

[Network diagram: variables A, B, C, D, E with CPTs P(A), P(B|A), P(C|A), P(D|A,B), P(E|B,C)]

MPE = max_{A,E=0,D,C,B} P(A) P(B|A) P(C|A) P(D|A,B) P(E|B,C)
    = max_A P(A) max_{E=0} max_D max_C P(C|A) max_B P(B|A) P(D|A,B) P(E|B,C)

Eliminating B produces the message λB(A,D,C,E).
![Page 153: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/153.jpg)
Bucket elimination (MPE)

Elimination operator: max ∏. Buckets processed in order B, C, D, E, A (widths 4, 3, 1, 1, 0):

Bucket B: P(E|B,C), P(D|A,B), P(B|A)   → λB(A,D,C,E)
Bucket C: P(C|A), λB(A,D,C,E)          → λC(A,D,E)
Bucket D: λC(A,D,E)                    → λD(A,E)
Bucket E: E = 0, λD(A,E)               → λE(A)
Bucket A: P(A), λE(A)                  → MPE

w* = 4  "induced width" (max clique size)
![Page 154: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/154.jpg)
MBE: Mini-Bucket Elimination• Computation in a bucket is time and space
exponential in the number of variables involved (i.e., width)
• Therefore, partition functions in a bucket into “mini-buckets” on smaller number of variables
• The idea is similar to i-consistency: bound the size of recorded dependencies (Dechter 2003)
![Page 155: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/155.jpg)
Idea: MPE task

Split a bucket into mini-buckets => bound complexity:

h^X = max_X ∏_{i=1}^{n} h_i  ≤  (max_X ∏_{i=1}^{r} h_i) · (max_X ∏_{i=r+1}^{n} h_i) = g^X

Exponential complexity decrease: O(e^n) → O(e^r) + O(e^{n−r})
![Page 156: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/156.jpg)
MBE(i=3) in action for MPE

Mini-buckets, elimination operator max ∏ in each:

Bucket B: { P(E|B,C) } { P(D|A,B), P(B|A) }   4 variables: split  → λB(C,E), λB(A,D)
Bucket C: P(C|A), λB(C,E), λB(A,D)            3 variables: OK     → λC(A,D,E)
Bucket D: λC(A,D,E)                           3 variables: OK     → λD(A,E)
Bucket E: E = 0, λD(A,E)                      2 variables: OK     → λE(A)
Bucket A: P(A), λE(A)                         1 variable: OK      → Upper Bound on MPE value
![Page 157: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/157.jpg)
MBE(i=3) in action for MPE — decoding

Assign variables greedily, processing buckets top-down:

a' = argmax_A P(A) · λE(A)
e' = 0
d' = argmax_D λC(a', D, e') · λB(a', D)
c' = argmax_C P(C|a') · λB(C, e')
b' = argmax_B P(e'|B, c') · P(d'|a', B) · P(B|a')

Return (a', b', c', d', e'). A Lower Bound can also be computed as the probability of the sub-optimal assignment P(a', b', c', d', e').
![Page 158: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/158.jpg)
MBE(i=3) for probability of evidence

Same partitioning, elimination operator ∑ ∏ in each mini-bucket:

Bucket B: { P(E|B,C) } { P(D|A,B), P(B|A) }   4 variables: split  → λB(C,E), λB(A,D)
Bucket C: P(C|A), λB(C,E), λB(A,D)            3 variables: OK     → λC(A,D,E)
Bucket D: λC(A,D,E)                           3 variables: OK     → λD(A,E)
Bucket E: E = 0, λD(A,E)                      2 variables: OK     → λE(A)
Bucket A: P(A), λE(A)                         1 variable: OK      → Upper Bound on P(evidence)
![Page 159: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/159.jpg)
MBE(i) for probability of evidence• If we process all mini-buckets by summation
then we get an unnecessarily large upper bound on the probability of evidence
• Tighter upper bound Process first mini-bucket by summation and
remaining ones by maximization• We can also get a lower bound on P(evidence)
Process first mini-bucket by summation and remaining ones by minimization
![Page 160: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/160.jpg)
Properties of MBE(i)• Controlling parameter i (called i-bound)
Maximum number of distinct variables in a mini-bucket Outputs both a lower and an upper bound
• Complexity: O(exp(i)) time and space• As i-bound increases, both accuracy and time complexity
increase Clearly, if i = w*, then we have pure BE
• Possible use of mini-bucket approximations As anytime algorithms (Dechter & Rish, 1997) As heuristic functions for depth-first and best-first search (Kask
& Dechter, 2001), (Marinescu & Dechter, 2005)
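The partitioning step behind MBE(i) can be sketched as follows. The greedy first-fit rule is an illustrative assumption (implementations use various partitioning heuristics); the bucket of B is the one from the slides' example:

```python
# Hedged sketch of greedy mini-bucket partitioning: place each function
# (represented by its scope) into the first mini-bucket whose combined
# scope stays within the i-bound, opening a new mini-bucket otherwise.
def partition(scopes, i_bound):
    mini_buckets = []  # list of (functions, combined scope set)
    for scope in scopes:
        for funcs, joint in mini_buckets:
            if len(joint | set(scope)) <= i_bound:
                funcs.append(scope)
                joint |= set(scope)  # in-place union keeps the stored set
                break
        else:
            mini_buckets.append(([scope], set(scope)))
    return mini_buckets

# Bucket of B from the slides: P(E|B,C), P(D|A,B), P(B|A); i = 3.
bucket_B = [("E", "B", "C"), ("D", "A", "B"), ("B", "A")]
for funcs, joint in partition(bucket_B, 3):
    print(sorted(joint), funcs)
```

On this input the routine reproduces the split from the slides: one mini-bucket { P(E|B,C) } and one { P(D|A,B), P(B|A) }, each with at most 3 distinct variables.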
![Page 161: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/161.jpg)
Mini-Bucket Heuristics
• Static Mini-Buckets
  Pre-compiled; reduced overhead; less accurate; static variable ordering
• Dynamic Mini-Buckets
  Computed dynamically; higher overhead; high accuracy; dynamic variable ordering
![Page 162: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/162.jpg)
Heuristic evaluation function

[AND/OR search tree and pseudo tree figures over variables A, B, C, D, E, F with node values; the current partial solution tree has tip nodes (D,0) and F, with heuristic estimates h(D,0) = 4 and h(F) = 5]

A B C f1(A,B,C)
0 0 0    2
0 0 1    5
0 1 0    3
0 1 1    5
1 0 0    9
1 0 1    3
1 1 0    7
1 1 1    2

A B F f2(A,B,F)
0 0 0    3
0 0 1    5
0 1 0    1
0 1 1    4
1 0 0    6
1 0 1    5
1 1 0    6
1 1 1    5

B D E f3(B,D,E)
0 0 0    6
0 0 1    4
0 1 0    8
0 1 1    5
1 0 0    9
1 0 1    3
1 1 0    7
1 1 1    4

f(T') = w(A,0) · w(B,1) · w(C,0) · w(D,0) · h(D,0) · h(F) ≥ f*(T')
h(n) ≥ v(n)
![Page 163: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/163.jpg)
Bucket elimination

[Bucket-tree figure for ordering (A, B, C, D, E, F, G) with functions f(A,B), f(B,C), f(B,F), f(A,G), f(F,G), f(B,E), f(C,E), f(A,D), f(B,D), f(C,D) and messages hG(A,F), hF(A,B), hE(B,C), hD(A,B,C), hC(A,B), hB(A)]

h*(a, b, c) = hD(a, b, c) · hE(b, c)

(Dechter, 1999)
![Page 164: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/164.jpg)
Static mini-bucket heuristics

MBE(3), ordering (A, B, C, D, E, F, G)

[Bucket-tree figure: bucket D is split into mini-buckets {f(A,D)} and {f(B,D), f(C,D)}, producing messages hD(A) and hD(B,C); other messages: hG(A,F), hF(A,B), hE(B,C), hC(B), hB(A)]

h(a, b, c) = hD(a) · hD(b, c) · hE(b, c) ≥ h*(a, b, c)
![Page 165: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/165.jpg)
Dynamic mini-bucket heuristics

MBE(3), ordering (A, B, C, D, E, F, G), conditioned on the current assignment (a, b, c)

[Bucket-tree figure: functions instantiated with a, b, c — f(a,b), f(b,C), f(b,F), f(a,G), f(F,G), f(b,E), f(C,E), f(a,D), f(b,D), f(C,D) — producing messages hG(F), hF(), hE(C), hD(C), hC(), hB()]

h(a, b, c) = hD(c) · hE(c) = h*(a, b, c)
![Page 166: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/166.jpg)
Static vs. Dynamic Mini-Bucket Heuristics
s1196 ISCAS’89 circuit.
![Page 167: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/167.jpg)
Approximate inference• Mini-Bucket Elimination
Mini-clustering (tree decompositions)• Iterative Belief Propagation
IJGP – Iterative Joint Graph Propagation• Sampling
Forward sampling Gibbs sampling (MCMC) Importance sampling Particle filtering
![Page 168: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/168.jpg)
Cluster Tree Elimination (CTE)
• Correctness and completeness: Algorithm CTE is correct, i.e., it computes the exact posterior joint probability of all single variables (or subsets) and the evidence.
• Time complexity: O(deg · (n + N) · d^(w*+1))
• Space complexity: O(N · d^sep)
where deg = the maximum degree of a node
      n   = number of variables (= number of CPTs)
      N   = number of nodes in the tree decomposition
      d   = the maximum domain size of a variable
      w*  = the induced width
      sep = the separator size
![Page 169: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/169.jpg)
Cluster Tree Elimination - messages

[Tree decomposition figure: cluster 1 = ABC with p(a), p(b|a), p(c|a,b); cluster 2 = BCDF with p(d|b), p(f|c,d); cluster 3 = BEF with p(e|b,f); cluster 4 = EFG with p(g|e,f); separators BC, BF, EF]

h(1,2)(b,c) = Σ_a p(a) p(b|a) p(c|a,b)

h(2,3)(b,f) = Σ_{c,d} p(d|b) p(f|c,d) h(1,2)(b,c)

sep(2,3) = {B,F}
elim(2,3) = {C,D}
![Page 170: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/170.jpg)
Mini-Clustering for belief updating• Motivation:
Time and space complexity of Cluster Tree Elimination depend on the induced width w* of the problem
When the induced width w* is big, CTE algorithm becomes infeasible
• The basic idea:
  Try to reduce the size of the cluster (the exponent): partition each cluster into mini-clusters with fewer variables
  Accuracy parameter i = maximum number of variables in a mini-cluster
  The idea was explored for variable elimination (MBE)
![Page 171: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/171.jpg)
Idea of Mini-Clustering

Split a cluster into mini-clusters => bound complexity:

cluster(u) = {h_1, …, h_r, h_{r+1}, …, h_n}

Exact message:  h = Σ_elim ∏_{i=1}^{n} h_i

Partition into mini-clusters {h_1, …, h_r} and {h_{r+1}, …, h_n}:

g = (Σ_elim ∏_{i=1}^{r} h_i) · (Σ_elim ∏_{i=r+1}^{n} h_i),  with h ≤ g

Exponential complexity decrease: O(e^n) → O(e^r) + O(e^{n−r})
![Page 172: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/172.jpg)
Mini-Clustering (MC)

[Tree decomposition as before: cluster 1 = ABC with p(a), p(b|a), p(c|a,b); cluster 2 = BCDF with p(d|b), p(f|c,d); cluster 3 = BEF with p(e|b,f); cluster 4 = EFG with p(g|e,f); sep(2,3) = {B,F}, elim(2,3) = {C,D}]

Cluster Tree Elimination (exact):
h(1,2)(b,c) = Σ_a p(a) p(b|a) p(c|a,b)
h(2,3)(b,f) = Σ_{c,d} p(d|b) h(1,2)(b,c) p(f|c,d)

Mini-Clustering, i = 3 — cluster 2 is split into mini-clusters { p(d|b), h(1,2)(b,c) } and { p(f|c,d) }:
h1(2,3)(b) = Σ_{c,d} p(d|b) h1(1,2)(b,c)
h2(2,3)(f) = max_{c,d} p(f|c,d)
![Page 173: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/173.jpg)
Mini-Clustering - example

Clusters: 1 = ABC, 2 = BCDF, 3 = BEF, 4 = EFG; separators BC (1,2), BF (2,3), EF (3,4)

H(1,2): h1(1,2)(b,c) = Σ_a p(a) p(b|a) p(c|a,b)
H(2,1): h1(2,1)(b) = Σ_{d,f} p(d|b) h1(3,2)(b,f)
        h2(2,1)(c) = max_{d,f} p(f|c,d)
H(2,3): h1(2,3)(b) = Σ_{c,d} p(d|b) h1(1,2)(b,c)
        h2(2,3)(f) = max_{c,d} p(f|c,d)
H(3,2): h1(3,2)(b,f) = Σ_e p(e|b,f) h1(4,3)(e,f)
H(3,4): h1(3,4)(e,f) = Σ_b p(e|b,f) h1(2,3)(b) h2(2,3)(f)
H(4,3): h1(4,3)(e,f) = Σ_g p(g|e,f)
![Page 174: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/174.jpg)
Mini-Clustering• Correctness and completeness:
Algorithm MC(i) computes a bound (or an approximation) on the joint probability P(Xi,e) of each variable and each of its values.
• Time & space complexity: O(exp(i))
![Page 175: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/175.jpg)
Approximate inference• Mini-Bucket Elimination
Mini-clustering• Iterative Belief Propagation
IJGP – Iterative Joint Graph Propagation• Sampling
Forward sampling Gibbs sampling (MCMC) Importance sampling Particle filtering
![Page 176: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/176.jpg)
Iterative Belief Propagation (IBP)• Belief propagation is exact for poly-trees (Pearl, 1988)• IBP - applying BP iteratively to cyclic networks
• No guarantees for convergence• Works well for many coding networks
[Figure: polytree fragment with parents U1, U2, U3 and children X1, X2, exchanging λ and π messages. One step: update BEL(U1)]
![Page 177: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/177.jpg)
Iterative Belief Propagation

[Figures: a belief network over variables A–J with CPTs P(A), P(C), P(H), P(B|A,C), P(D|A,B,E), P(E|B,C), P(F|C,D,E), P(G|H,F), P(I|F,G), P(J|H,G,I), and the graph IBP works on (the dual graph), with clusters A, C, H, ABC, BCE, ABDE, CDEF, FGH, FGI, GHIJ and labeled arcs]
![Page 178: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/178.jpg)
Iterative Join-Graph Propagation (IJGP)
• IBP is applied to a loopy network iteratively
  Not an anytime algorithm
  When it converges, it converges very fast
• MC applies bounded inference along a tree decomposition
  MC is an anytime algorithm controlled by the i-bound
  MC converges in two passes, up and down the tree
• IJGP combines:
  the iterative feature of IBP
  the anytime feature of MC
![Page 179: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/179.jpg)
IJGP - The basic idea Apply Cluster Tree Elimination to any join-graph
We commit to graphs that are minimal I-maps
Avoid cycles as long as I-mapness is not violated
Result: use minimal arc-labeled join-graphs
![Page 180: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/180.jpg)
IJGP - Example

[Figures: the belief network over A–J and its dual graph (clusters A, C, H, ABC, BCE, ABDE, CDEF, FGH, FGI, GHIJ with labeled arcs), on which IBP works]
![Page 181: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/181.jpg)
Arc-minimal join-graph

[Figures: the dual graph and its arc-minimal version, in which redundant arc labels (duplicate occurrences of A, C, F, H) are removed while each variable's clusters remain connected]
![Page 182: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/182.jpg)
Minimal arc-labeled join-graph

[Figures: the arc-minimal join-graph and its minimal arc-labeled version, obtained by shrinking arc labels (e.g. the FG label reduced to F) while preserving connectedness]
![Page 183: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/183.jpg)
Join-graph decompositions

[Figures: a) minimal arc-labeled join graph; b) join-graph obtained by collapsing nodes of graph a); c) minimal arc-labeled join graph over the collapsed clusters ABCDE, BCE, CDEF, FGH, FGI, GHIJ]
![Page 184: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/184.jpg)
Tree decomposition

[Figures: a) minimal arc-labeled join graph; b) tree decomposition with clusters ABCDE, CDEF, FGHI, GHIJ and separators CDE, F, GHI]
![Page 185: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/185.jpg)
Join-graphs

[Figure: a sequence of join-graphs over the same network — from the dual graph, to the minimal arc-labeled join-graph, to a collapsed join-graph, to a tree decomposition. Moving toward the tree decomposition: more accuracy; moving toward the dual graph: less complexity]
![Page 186: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/186.jpg)
Message propagation

[Figure: join-graph with cluster 1 = ABCDE containing p(a), p(c), p(b|a,c), p(d|a,b,e), p(e|b,c), receiving h(3,1)(b,c) and sending h(1,2) to cluster 2 = CDEF]

Minimal arc-labeled: sep(1,2) = {D,E}, elim(1,2) = {A,B,C}:
h(1,2)(d,e) = Σ_{a,b,c} p(a) p(c) p(b|a,c) p(d|a,b,e) p(e|b,c) h(3,1)(b,c)

Non-minimal arc-labeled: sep(1,2) = {C,D,E}, elim(1,2) = {A,B}:
h(1,2)(c,d,e) = Σ_{a,b} p(a) p(c) p(b|a,c) p(d|a,b,e) p(e|b,c) h(3,1)(b,c)
![Page 187: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/187.jpg)
Bounded decompositions• We want arc-labeled decompositions such that:
the cluster size (internal width) is bounded by i (the accuracy parameter)
the width of the decomposition as a graph (external width) is as small as possible – closer to a tree
• Possible approaches to build decompositions: partition-based algorithms - inspired by the mini-bucket
decomposition grouping-based algorithms
![Page 188: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/188.jpg)
Partition-based algorithms

a) schematic mini-bucket(i), i = 3:
G: (GFE)
E: (EBF) (EF)
F: (FCD) (BF)
D: (DB) (CD)
C: (CAB) (CB)
B: (BA) (AB) (B)
A: (A)

b) [Figure: the corresponding minimal arc-labeled join-graph decomposition, with clusters GFE, EBF, FCD, CDB, CAB, BA, A holding P(G|F,E), P(E|B,F), P(F|C,D), P(D|B), P(C|A,B), P(B|A), P(A), and arc labels EF, BF, CD, CB, AB, A, B]
![Page 189: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/189.jpg)
IJGP properties
• IJGP(i) applies BP to a minimal arc-labeled join-graph whose cluster size is bounded by i

• On join-trees, IJGP finds exact beliefs!

• IJGP is a Generalized Belief Propagation algorithm (Yedidia, Freeman and Weiss, 2001)

• Complexity of one iteration: time O(deg · (n+N) · d^(i+1)), space O(N·d)
![Page 190: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/190.jpg)
Random networks - KL at convergence

[Plots: KL distance (log scale, 1e-5 to 1e-2) vs. i-bound (1 to 11) for IJGP, MC, and IBP. Random networks, N=50, K=2, P=3, w*=16, 100 instances; left panel evidence=0, right panel evidence=5]
![Page 191: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/191.jpg)
Random networks - KL vs. iterations

[Plots: KL distance (log scale, 1e-5 to 1e-1) vs. number of iterations (0 to 35) for IJGP(2), IJGP(10), and IBP. Random networks, N=50, K=2, P=3, w*=16, 100 instances; left panel evidence=0, right panel evidence=5]
![Page 192: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/192.jpg)
Random networks - Time

[Plot: time in seconds (0 to 1.0) vs. i-bound (1 to 11) for IJGP (20 iterations), MC, and IBP (10 iterations). Random networks, N=50, K=2, P=3, evid=5, w*=16, 100 instances]
![Page 193: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/193.jpg)
IJGP summary• IJGP borrows the iterative feature from IBP and the anytime
virtues of bounded inference from MC
• Empirical evaluation showed the potential of IJGP, which improves with iteration and most of the time with i-bound, and scales up to large networks
• IJGP is almost always superior, often by a high margin, to IBP and MC
• Based on all our experiments, we think that IJGP provides a practical breakthrough for the task of belief updating

• #CSP: IJGP can also be used to generate solution-count estimates for depth-first Branch-and-Bound search
![Page 194: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/194.jpg)
Approximate inference• Mini-Bucket Elimination
Mini-clustering• Iterative Belief Propagation
IJGP – Iterative Joint Graph Propagation• Sampling
Forward sampling Gibbs sampling (MCMC) Importance sampling
![Page 195: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/195.jpg)
Approximation algorithms• Structural Approximations
Eliminate some dependencies• Remove edges• Mini-Bucket and Mini-Clustering approaches
• Local Search
  Approach for optimization tasks: MPE, MAP
  Use your favorite MAX-CSP/WCSP/WSAT local search solver!
• Sampling Generate random samples and compute values of interest
from samples, not original network
![Page 196: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/196.jpg)
Sampling• Input: Bayesian network with set of nodes X• Sample = a tuple with assigned values
s=(X1=x1,X2=x2,… ,Xk=xk)
• Tuple may include all variables (except evidence) or a subset
• Sampling schemas dictate how to generate samples (tuples)
• Ideally, samples are distributed according to P(X|E)
![Page 197: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/197.jpg)
Sampling fundamentals

Given a set of variables X = {X1, X2, …, Xn} that represent a joint probability distribution Π(X) and some function g(X), we can compute the expected value of g(X):

E_Π[g(X)] = ∫ g(x) Π(x) dx
![Page 198: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/198.jpg)
Sampling from Π(X)

Given independent, identically distributed (iid) samples S1, S2, …, ST from Π(X), it follows from the Strong Law of Large Numbers that:

ĝ = (1/T) Σ_{t=1}^{T} g(S_t)

A sample S_t is an instantiation: S_t = {x1^t, x2^t, …, xn^t}
![Page 199: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/199.jpg)
Sampling basics

• Given random variable X, D(X) = {0, 1}
• Given P(X) = {0.3, 0.7}
• Generate k=10 samples: 0,1,1,1,0,1,1,0,1,0
• Approximate P'(X):

P'(X=0) = #samples(X=0) / #samples = 4/10 = 0.4
P'(X=1) = #samples(X=1) / #samples = 6/10 = 0.6
P'(X) = {0.4, 0.6}
![Page 200: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/200.jpg)
How to draw a sample?

• Given random variable X, D(X) = {0, 1}
• Given P(X) = {0.3, 0.7}
• Sample X ∼ P(X):
  Draw a random number r ∈ [0, 1]
  If (r < 0.3) then set X=0, else set X=1
• Can generalize to any domain size
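The inverse-CDF rule above can be sketched as a small routine; the seeded run and the 10,000-draw check are illustrative:

```python
# Minimal sketch of drawing a sample from a discrete distribution: draw r
# in [0, 1) and pick the first value whose cumulative probability exceeds r.
import random

def sample_discrete(values, probs, rng):
    r = rng.random()          # r uniform in [0, 1)
    cumulative = 0.0
    for v, p in zip(values, probs):
        cumulative += p
        if r < cumulative:
            return v
    return values[-1]         # guard against floating-point round-off

rng = random.Random(0)
draws = [sample_discrete([0, 1], [0.3, 0.7], rng) for _ in range(10000)]
print(draws.count(1) / len(draws))  # should be close to 0.7
```

As the slide notes, the same loop works for any domain size, since it simply walks the cumulative distribution.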
![Page 201: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/201.jpg)
Sampling in BN
• Same idea: generate a set of samples T• Estimate posterior marginal P(Xi|E) from
samples• Challenge: X is a vector and P(X) is a huge
distribution represented by BN• Need to know:
How to generate a new sample ? How many samples T do we need ? How to estimate P(E=e) and P(Xi|e) ?
![Page 202: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/202.jpg)
Sampling algorithms
• Forward Sampling• Gibbs Sampling (MCMC)
Blocking Rao-Blackwellised
• Likelihood Weighting• Importance Sampling• Sequential Monte-Carlo (Particle Filtering) in
Dynamic Bayesian Networks
![Page 203: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/203.jpg)
Forward sampling
• Forward Sampling Case with No evidence E={} Case with Evidence E=e
![Page 204: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/204.jpg)
Forward sampling, no evidence (Henrion, 1988)

Input: Bayesian network X = {X1,…,XN}, N - #nodes, T - #samples
Output: T samples
Process nodes in topological order – first process the ancestors of a node, then the node itself:
1. For t = 1 to T
2.   For i = 1 to N
3.     Xi ← sample xi^t from P(xi | pai)
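The loop above can be sketched on a hypothetical two-node network X1 → X2 (the CPT numbers are made up), including the marginal estimate computed from the resulting samples:

```python
# Hedged sketch of forward sampling with no evidence: sample each node in
# topological order from P(xi | pai), then estimate a marginal by counting.
import random

p_x1 = [0.4, 0.6]                       # P(X1)
p_x2 = {0: [0.9, 0.1], 1: [0.2, 0.8]}   # P(X2 | X1)

def draw(probs, rng):
    return 0 if rng.random() < probs[0] else 1

def forward_sample(rng):
    x1 = draw(p_x1, rng)        # ancestors first
    x2 = draw(p_x2[x1], rng)    # then the node itself
    return x1, x2

rng = random.Random(1)
samples = [forward_sample(rng) for _ in range(20000)]
estimate = sum(x2 for _, x2 in samples) / len(samples)
print(estimate)  # should approach P(X2=1) = 0.4*0.1 + 0.6*0.8 = 0.52
```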
![Page 205: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/205.jpg)
Sampling a value

What does it mean to sample xi^t from P(Xi | pai)?

• Assume D(Xi) = {0,1}
• Assume P(Xi | pai) = (0.3, 0.7)
• Draw a random number r from [0,1]
  If r falls in [0, 0.3], set Xi = 0
  If r falls in (0.3, 1], set Xi = 1

[Number line: 0 —— 0.3 —— 1, with r marked]
![Page 206: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/206.jpg)
Forward sampling (example)

[Network: X1 → X2, X1 → X3, X2 → X4, X3 → X4; CPTs P(x1), P(x2|x1), P(x3|x1), P(x4|x2,x3)]

No evidence — generate sample k:
1. Sample x1 from P(x1)
2. Sample x2 from P(x2 | x1)
3. Sample x3 from P(x3 | x1)
4. Sample x4 from P(x4 | x2, x3)
![Page 207: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/207.jpg)
Forward Sampling - Answering Queries

Task: given T samples {S1, S2, …, ST}, estimate P(Xi = xi):

P'(Xi = xi) = #samples(Xi = xi) / T

Basically, count the proportion of samples where Xi = xi
![Page 208: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/208.jpg)
Forward sampling w/ evidence

Input: Bayesian network X = {X1,…,XN}, N - #nodes, E - evidence, T - #samples
Output: T samples consistent with E
1. For t = 1 to T
2.   For i = 1 to N
3.     Xi ← sample xi^t from P(xi | pai)
4.     If Xi in E and Xi ≠ xi, reject sample:
5.       i = 1 and go to step 2
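The reject-and-restart loop can be sketched on the same hypothetical two-node network X1 → X2, with evidence X2 = 1 (all numbers are made up):

```python
# Hedged sketch of forward sampling with evidence (rejection sampling):
# retry whenever a sampled value contradicts the evidence. This is
# wasteful when P(e) is small, as the slides point out.
import random

p_x1 = [0.4, 0.6]                       # P(X1)
p_x2 = {0: [0.9, 0.1], 1: [0.2, 0.8]}   # P(X2 | X1)

def draw(probs, rng):
    return 0 if rng.random() < probs[0] else 1

def rejection_sample(evidence_x2, rng):
    while True:                    # restart from step 1 on rejection
        x1 = draw(p_x1, rng)
        x2 = draw(p_x2[x1], rng)
        if x2 == evidence_x2:      # keep only samples consistent with E
            return x1, x2

rng = random.Random(2)
samples = [rejection_sample(1, rng) for _ in range(20000)]
estimate = sum(x1 for x1, _ in samples) / len(samples)
# Exact posterior: P(X1=1 | X2=1) = 0.6*0.8 / (0.4*0.1 + 0.6*0.8) ~ 0.923
print(estimate)
```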
![Page 209: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/209.jpg)
Forward sampling (example)

[Network: X1 → X2, X1 → X3, X2 → X4, X3 → X4; CPTs P(x1), P(x2|x1), P(x3|x1), P(x4|x2,x3)]

Evidence: X3 = 0 — generate sample k:
1. Sample x1 from P(x1)
2. Sample x2 from P(x2 | x1)
3. Sample x3 from P(x3 | x1)
4. If x3 ≠ 0, reject sample and start from 1; otherwise
5. Sample x4 from P(x4 | x2, x3)
![Page 210: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/210.jpg)
Forward sampling: illustration
Let Y be a subset of evidence nodes s.t. Y=u
![Page 211: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/211.jpg)
Forward sampling – How many samples?

Theorem: Let ŝ(y) be the estimate of P(y) resulting from a randomly chosen sample set S with T samples. Then, to guarantee relative error at most ε with probability at least 1 − δ, it is enough to have:

T ≥ c / (P(y) · ε²)

where the constant c depends on δ. Derived from Chebychev's Bound:

P( ŝ(y) ∈ [P(y) − ε, P(y) + ε] ) ≥ 1 − 2e^{−2Nε²}
![Page 212: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/212.jpg)
Forward sampling: performance

Advantages:
• P(xi | pa(xi)) is readily available
• Samples are independent!

Drawbacks:
• If evidence E is rare (P(e) is low), we will reject most of the samples!
• Since P(y) in the bound on T is unknown, it must be estimated from the samples themselves!
• If P(e) is small, T becomes very big!
![Page 213: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/213.jpg)
Problem: evidence!

• Forward Sampling → high rejection rate
• Fix evidence values →
  Gibbs sampling (MCMC)
  Likelihood Weighting
  Importance Sampling
![Page 215: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/215.jpg)
Sampling algorithms
• Forward Sampling• Gibbs Sampling (MCMC)
Blocking Rao-Blackwellised
• Likelihood Weighting• Importance Sampling
![Page 216: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/216.jpg)
Gibbs Sampling• Markov Chain Monte Carlo method
(Gelfand and Smith, 1990, Smith and Roberts, 1993, Tierney, 1994)
• Samples are dependent, form Markov Chain• Sample from P’(X|e) which converges to P(X|e)• Guaranteed to converge when all P > 0• Methods to improve convergence:
Blocking Rao-Blackwellised
![Page 217: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/217.jpg)
Gibbs Sampling (Pearl, 1988)
• A sample x^t, t ∈ {1, 2, …}, is an instantiation of all variables in the network:

x^t = {X1 = x1^t, X2 = x2^t, …, XN = xN^t}

• Sampling process:
  Fix values of observed variables e
  Instantiate node values in sample x^0 at random
  Generate samples x^1, x^2, …, x^T from P(X | e)
  Compute posteriors from samples
![Page 218: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/218.jpg)
Ordered Gibbs Sampler

Generate sample x^{t+1} from x^t (process all variables in some order):

X1 = x1^{t+1}, sampled from P(x1 | x2^t, x3^t, …, xN^t, e)
X2 = x2^{t+1}, sampled from P(x2 | x1^{t+1}, x3^t, …, xN^t, e)
…
XN = xN^{t+1}, sampled from P(xN | x1^{t+1}, x2^{t+1}, …, x_{N-1}^{t+1}, e)

In short, for i = 1 to N:

Xi = xi^{t+1}, sampled from P(xi | x^t \ xi, e)
![Page 219: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/219.jpg)
Gibbs Sampling (Pearl, 1988)

Markov blanket: given its Markov blanket (parents, children, and children's parents), Xi is independent of all other nodes:

M_i = pa_i ∪ ch_i ∪ { pa_j : X_j ∈ ch_i }

Important: P(xi | x^t \ xi) = P(xi | markov_i^t), where

P(xi | x^t \ xi) ∝ P(xi | pa_i) · ∏_{X_j ∈ ch_i} P(x_j | pa_j)
![Page 220: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/220.jpg)
Ordered Gibbs Sampling Algorithm

Input: X, E
Output: T samples {x^t}
• Fix evidence E
• Generate samples from P(X | E)
1. For t = 1 to T (compute samples)
2.   For i = 1 to N (loop through variables)
3.     Xi ← sample xi^t from P(Xi | markov^t \ Xi)
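The algorithm above can be sketched on the same hypothetical 4-node binary network used earlier (X1 → X2, X1 → X3, {X2, X3} → X4; the CPT numbers are assumptions). Instead of coding Markov-blanket products explicitly, the sketch renormalizes the full joint over the two values of the variable being updated, which yields the same full conditional.

```python
import random

# Assumed CPTs (illustrative, not from the slides)
P_X1 = [0.6, 0.4]
P_X2 = {0: [0.7, 0.3], 1: [0.2, 0.8]}
P_X3 = {0: [0.9, 0.1], 1: [0.4, 0.6]}
P_X4 = {(0, 0): [0.5, 0.5], (0, 1): [0.8, 0.2],
        (1, 0): [0.1, 0.9], (1, 1): [0.3, 0.7]}

def joint(s):
    """P(x1, x2, x3, x4) as a product of CPT entries."""
    return (P_X1[s["X1"]] * P_X2[s["X1"]][s["X2"]] *
            P_X3[s["X1"]][s["X3"]] * P_X4[(s["X2"], s["X3"])][s["X4"]])

def gibbs(evidence, T, burn_in=200):
    """Ordered Gibbs: fix evidence, init at random, sweep all free variables."""
    state = dict(evidence)
    free = [v for v in ("X1", "X2", "X3", "X4") if v not in evidence]
    for v in free:
        state[v] = random.randint(0, 1)
    samples = []
    for t in range(T + burn_in):
        for v in free:
            # Renormalizing the joint over v equals the Markov-blanket product
            weights = []
            for val in (0, 1):
                state[v] = val
                weights.append(joint(state))
            z = weights[0] + weights[1]
            state[v] = 0 if random.random() < weights[0] / z else 1
        if t >= burn_in:
            samples.append(dict(state))
    return samples

random.seed(1)
samples = gibbs({"X3": 0}, 2000)
p_hat = sum(s["X1"] for s in samples) / len(samples)  # estimates P(X1=1 | X3=0)
```

Unlike forward sampling, no sample is ever rejected; the cost is that consecutive samples are dependent.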
![Page 221: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/221.jpg)
Answering Queries

• Query: P(xi | e) = ?
• Method 1: count the # of samples where Xi = xi (histogram estimator):

P(Xi = xi) ≈ #samples(Xi = xi) / T

• Method 2: average probability (mixture estimator):

P(Xi = xi) ≈ (1/T) · Σ_{t=1}^{T} P(Xi = xi | markov^t \ Xi)
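The two estimators can be compared side by side on a toy two-variable Gibbs chain. The joint table is an assumption for the demo; the mixture estimator reuses the full conditional that the sampler already computes at each step.

```python
import random

# Assumed joint over two binary variables A, B (illustrative numbers)
P = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def conditional(var_index, other_val):
    """P(var = 1 | other) by renormalizing the joint table."""
    if var_index == 0:   # P(A = 1 | B = other_val)
        w0, w1 = P[(0, other_val)], P[(1, other_val)]
    else:                # P(B = 1 | A = other_val)
        w0, w1 = P[(other_val, 0)], P[(other_val, 1)]
    return w1 / (w0 + w1)

random.seed(2)
a, b = 0, 0
hist, mix, T = 0.0, 0.0, 5000
for _ in range(T):
    p_a1 = conditional(0, b)                 # full conditional of A
    a = 1 if random.random() < p_a1 else 0   # Gibbs update of A
    b = 1 if random.random() < conditional(1, a) else 0  # Gibbs update of B
    hist += a     # Method 1: count samples with A = 1
    mix += p_a1   # Method 2: average P(A = 1 | Markov blanket)
hist /= T
mix /= T
```

Both converge to the true marginal P(A = 1) = 0.5 here; the mixture estimator typically has lower variance because it averages probabilities rather than 0/1 indicators.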
![Page 222: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/222.jpg)
Gibbs Sampling - example
X = {X1, X2, …, X9}, E = {X9}
(figure: network over X1–X9 with evidence node X9)
![Page 223: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/223.jpg)
Gibbs Sampling - example
Initialize at random:
X1 = x1^0, X2 = x2^0, X3 = x3^0, X4 = x4^0,
X5 = x5^0, X6 = x6^0, X7 = x7^0, X8 = x8^0
![Page 224: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/224.jpg)
Gibbs Sampling - example
X1 ← P(X1 | x2^0, …, x8^0, x9), E = {X9}

P(X1 = 0 | x2^0, x3^0, x9) = α · P(X1 = 0) · P(x2^0 | X1 = 0) · P(x3^0 | X1 = 0)
P(X1 = 1 | x2^0, x3^0, x9) = α · P(X1 = 1) · P(x2^0 | X1 = 1) · P(x3^0 | X1 = 1)
![Page 225: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/225.jpg)
Gibbs Sampling - example
X2 ← P(X2 | x1^1, x3^0, …, x8^0, x9), E = {X9}

Markov blanket for X2 is {X1, X3, X4, X5}
![Page 226: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/226.jpg)
Gibbs Sampling: Illustration
![Page 227: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/227.jpg)
Gibbs Sampling: Burn-In
• We want to sample from P(X | E)
• But … the starting point is random
• Solution: throw away the first K samples
• Known as "burn-in"
• What is K? Hard to tell. Use intuition.
• Alternative: draw the first sample's values from an approximate P(x|e), for example by running IBP first
![Page 228: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/228.jpg)
Gibbs Sampling: Convergence
• Converges to the stationary distribution π*:
  π* = π* P, where P is a transition kernel with p_ij = P(x_i → x_j)
• Guaranteed to converge iff the chain is:
  irreducible
  aperiodic
  ergodic (∀ i,j: p_ij > 0)
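The fixed-point condition π* = π* P can be verified numerically on a tiny chain. The 2-state kernel below is an assumption for illustration; iterating π ← πP converges to the stationary distribution when the chain is irreducible and aperiodic.

```python
# Assumed 2-state transition kernel P (rows sum to 1)
P = [[0.9, 0.1],
     [0.3, 0.7]]

# Power iteration: pi <- pi P until it reaches the fixed point
pi = [0.5, 0.5]
for _ in range(200):
    pi = [pi[0] * P[0][0] + pi[1] * P[1][0],
          pi[0] * P[0][1] + pi[1] * P[1][1]]
# Fixed point for this kernel: pi* = (0.75, 0.25),
# since 0.75 * 0.9 + 0.25 * 0.3 = 0.75
```

In Gibbs sampling the kernel is defined implicitly by the sequence of full-conditional updates, and its stationary distribution is P(X | e).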
![Page 229: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/229.jpg)
Gibbs Sampling: Performance• Advantage:
guaranteed to converge to P(X|E), as long as Pi > 0• Disadvantage:
convergence may be slow
• Problems: Samples are dependent ! Statistical variance is too big in high-dimensional
problems
![Page 230: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/230.jpg)
Gibbs: Speeding ConvergenceObjectives:1. Reduce dependence between samples
(autocorrelation) Skip samples Randomize Variable Sampling Order
2. Reduce variance Blocking Gibbs Sampling Rao-Blackwellisation
![Page 231: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/231.jpg)
Skipping Samples• Pick only every k-th sample (Geyer, 1992)
 Can reduce dependence between samples! Increases variance! Wastes samples!
![Page 232: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/232.jpg)
Randomized Variable Order• Random Scan Gibbs Sampler
Pick each next variable Xi for update at random with probability pi, Σ_i pi = 1.
• In the simplest case, the pi are distributed uniformly. In some instances, this reduces variance (MacEachern, Peruggia, 1999)
![Page 233: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/233.jpg)
Blocking• Sample several variables together, as a block• Example: Given three variables X, Y, Z, with domains of size 2, group Y and Z together to form a variable W = {Y, Z} with domain size 4. Then, given sample (x^t, y^t, z^t), compute the next sample:

(y^{t+1}, z^{t+1}) = w^{t+1} ← P(w | x^t)
x^{t+1} ← P(x | w^{t+1})

+ Can improve convergence greatly when two variables are strongly correlated!
- Domain of the block variable grows exponentially with the # of variables in a block!
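The block update can be sketched directly. The joint below is an assumption chosen so Y and Z agree 90% of the time (the strongly-correlated case where blocking helps); the block move samples W = (Y, Z) jointly from P(y, z | x).

```python
import random

# Assumed joint over binary X, Y, Z: X is uniform, Y and Z agree w.p. 0.9
P = {}
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            P[(x, y, z)] = 0.5 * (0.45 if y == z else 0.05)

def sample_block_w(x):
    """Sample the block W = (Y, Z) jointly from P(y, z | x)."""
    weights = {(y, z): P[(x, y, z)] for y in (0, 1) for z in (0, 1)}
    total = sum(weights.values())
    r, acc = random.random() * total, 0.0
    for w, p in weights.items():
        acc += p
        if r < acc:
            return w
    return w

def sample_x(y, z):
    """Sample X from P(x | y, z)."""
    w0, w1 = P[(0, y, z)], P[(1, y, z)]
    return 1 if random.random() < w1 / (w0 + w1) else 0

random.seed(3)
x, (y, z) = 0, (0, 0)
agree, T = 0, 2000
for _ in range(T):
    y, z = sample_block_w(x)   # one block move updates Y and Z together
    x = sample_x(y, z)
    agree += (y == z)
# Fraction of samples with Y == Z should approach the true 0.9
```

A plain Gibbs sampler updating Y and Z one at a time would flip between the (y = z) modes slowly; the block move jumps between them in a single step.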
![Page 234: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/234.jpg)
Rao-Blackwellisation• Do not sample all variables!• Sample a subset!• Example: Given three variables X, Y, Z, sample only X and Y, sum out Z. Given sample (x^t, y^t), compute the next sample:

x^{t+1} ← P(x | y^t)
y^{t+1} ← P(y | x^{t+1})
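A minimal Rao-Blackwellised sketch: the joint table over X, Y, Z below is an assumption; Z is never sampled, because every conditional used by the chain has Z summed out analytically.

```python
import random

# Assumed joint over binary X, Y, Z (sums to 1)
P = {(0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.15,
     (1, 0, 0): 0.05, (1, 0, 1): 0.20, (1, 1, 0): 0.15, (1, 1, 1): 0.20}

def p_xy(x, y):
    """Collapsed marginal P(x, y): Z summed out exactly."""
    return P[(x, y, 0)] + P[(x, y, 1)]

random.seed(4)
x, y = 0, 0
count_x1, T = 0, 5000
for _ in range(T):
    # x^{t+1} ~ P(x | y^t), then y^{t+1} ~ P(y | x^{t+1}); Z never sampled
    px1 = p_xy(1, y) / (p_xy(0, y) + p_xy(1, y))
    x = 1 if random.random() < px1 else 0
    py1 = p_xy(x, 1) / (p_xy(x, 0) + p_xy(x, 1))
    y = 1 if random.random() < py1 else 0
    count_x1 += x
# Estimate of P(X = 1); the true marginal here is 0.05+0.20+0.15+0.20 = 0.60
```

Sampling in the smaller (X, Y) space removes the Monte Carlo noise that sampling Z would have contributed, which is exactly the variance reduction the Rao-Blackwell theorem promises.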
![Page 235: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/235.jpg)
Rao-Blackwell Theorem
Bottom line: reducing the number of variables in a sample reduces variance!
![Page 236: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/236.jpg)
Blocking vs. Rao-Blackwellisation
• Standard Gibbs:P(x|y,z),P(y|x,z),P(z|x,y) (1)
• Blocking:P(x|y,z), P(y,z|x) (2)
• Rao-Blackwellised:P(x|y), P(y|x) (3)
Var3 < Var2 < Var1 (Liu, Wong, Kong, 1994)
(figure: three-node network over X, Y, Z)
![Page 237: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/237.jpg)
Rao-Blackwellised Gibbs: Cutset Sampling

• Select C ⊆ X (possibly a cycle-cutset), |C| = m
• Fix evidence E
• Initialize nodes with random values:
  For i = 1 to m: Ci = ci^0
• For t = 1 to T, generate samples c^t = {C1 = c1^t, C2 = c2^t, …, Cm = cm^t}:
  For i = 1 to m:
    Ci = ci^{t+1} ← P(ci | c1^{t+1}, …, c_{i-1}^{t+1}, c_{i+1}^t, …, c_m^t, e)
![Page 238: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/238.jpg)
Cutset Sampling - generating samples

Generate sample c^{t+1} from c^t:

C1 = c1^{t+1}, sampled from P(c1 | c2^t, c3^t, …, cm^t, e)
C2 = c2^{t+1}, sampled from P(c2 | c1^{t+1}, c3^t, …, cm^t, e)
…
Cm = cm^{t+1}, sampled from P(cm | c1^{t+1}, c2^{t+1}, …, c_{m-1}^{t+1}, e)

In short, for i = 1 to m:

Ci = ci^{t+1}, sampled from P(ci | c^t \ ci, e)
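A small cutset-sampling sketch on the same assumed 4-node loopy network (X1 → X2, X1 → X3, {X2, X3} → X4, invented CPTs). With cutset C = {X2, X3}, each cutset conditional P(ci | c \ ci, e) is computed by exact summation over the remaining nodes; this stands in for the BTE step on the conditioned network.

```python
import random

# Assumed CPTs (illustrative, not from the slides)
P_X1 = [0.6, 0.4]
P_X2 = {0: [0.7, 0.3], 1: [0.2, 0.8]}
P_X3 = {0: [0.9, 0.1], 1: [0.4, 0.6]}
P_X4 = {(0, 0): [0.5, 0.5], (0, 1): [0.8, 0.2],
        (1, 0): [0.1, 0.9], (1, 1): [0.3, 0.7]}

def joint(x1, x2, x3, x4):
    return P_X1[x1] * P_X2[x1][x2] * P_X3[x1][x3] * P_X4[(x2, x3)][x4]

def p_cutset_and_e(x2, x3, e4):
    """P(x2, x3, e): sum out the non-cutset node X1 exactly (the 'BTE' step)."""
    return sum(joint(x1, x2, x3, e4) for x1 in (0, 1))

random.seed(5)
x2, x3, e4 = 0, 0, 1          # evidence: X4 = 1
count, T = 0, 3000
for _ in range(T):
    # Gibbs over the cutset only; each conditional uses exact inference
    w = [p_cutset_and_e(v, x3, e4) for v in (0, 1)]
    x2 = 1 if random.random() < w[1] / (w[0] + w[1]) else 0
    w = [p_cutset_and_e(x2, v, e4) for v in (0, 1)]
    x3 = 1 if random.random() < w[1] / (w[0] + w[1]) else 0
    count += x2
p_hat = count / T             # estimates P(X2 = 1 | X4 = 1)
```

With these assumed CPTs the exact answer is P(X2 = 1 | X4 = 1) = 0.408 / 0.631 ≈ 0.647. The chain lives in the much smaller cutset space, trading cheap per-step sampling for an exact-inference subroutine whose cost is bounded by the conditioned network's induced width.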
![Page 239: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/239.jpg)
Cutset Sampling• How to choose C?
 Special case: C is a cycle-cutset, O(N)
 General case: apply Bucket Tree Elimination (BTE), O(exp(w)), where w is the induced width of the network when the nodes in C are observed.
 Pick C wisely so as to minimize w → notion of w-cutset
![Page 240: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/240.jpg)
w-cutset Sampling• C=w-cutset of the network, a set of nodes
such that when C and E are instantiated, the adjusted induced width of the network is w
• Complexity of exact inference: bounded by w !
• Cycle-cutset is a special case!
![Page 241: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/241.jpg)
Cutset Sampling - Answering Queries
• Query: ci ∈ C, P(ci | e) = ? Same as Gibbs:

P(ci | e) ≈ (1/T) · Σ_{t=1}^{T} P(ci | c^t \ ci, e)

(computed while generating sample t; special case of w-cutset)

• Query: P(xi | e) = ?

P(xi | e) ≈ (1/T) · Σ_{t=1}^{T} P(xi | c^t, e)

(computed after generating sample t; easy because C is a cutset)
![Page 242: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/242.jpg)
Cutset Sampling Example
c^0 = {x2^0, x5^0}, E = {x9}
(figure: network over X1–X9, cutset {X2, X5}, evidence X9)
![Page 243: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/243.jpg)
Cutset Sampling Example
Sample a new value for X2 (c^0 = {x2^0, x5^0}):

x2^1 ← P(x2 | x5^0, x9), where

P(x2 | x5^0, x9) = BTE(x2, x5^0, x9) / [ BTE(x2', x5^0, x9) + BTE(x2'', x5^0, x9) ]

(x2' and x2'' are the two values of X2)
![Page 244: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/244.jpg)
Cutset Sampling Example
c^0 = {x2^0, x5^0}:
x2^1 ← P(x2 | x5^0, x9)

Sample a new value for X5:

x5^1 ← P(x5 | x2^1, x9), where

P(x5 | x2^1, x9) = BTE(x2^1, x5, x9) / [ BTE(x2^1, x5', x9) + BTE(x2^1, x5'', x9) ]

c^1 = {x2^1, x5^1}
![Page 245: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/245.jpg)
Cutset Sampling Example
Query P(x2 | e) for sampling node X2:

Sample 1: x2^1 ← P(x2 | x5^0, x9)
Sample 2: x2^2 ← P(x2 | x5^1, x9)
Sample 3: x2^3 ← P(x2 | x5^2, x9)

P(x2 | x9) ≈ (1/3) [ P(x2 | x5^0, x9) + P(x2 | x5^1, x9) + P(x2 | x5^2, x9) ]
![Page 246: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/246.jpg)
Cutset Sampling Example
Query P(x3 | e) for non-sampled node X3:

Sample 1: c^1 = {x2^1, x5^1} → P(x3 | x2^1, x5^1, x9)
Sample 2: c^2 = {x2^2, x5^2} → P(x3 | x2^2, x5^2, x9)
Sample 3: c^3 = {x2^3, x5^3} → P(x3 | x2^3, x5^3, x9)

P(x3 | x9) ≈ (1/3) [ P(x3 | x2^1, x5^1, x9) + P(x3 | x2^2, x5^2, x9) + P(x3 | x2^3, x5^3, x9) ]
![Page 247: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/247.jpg)
CPCS179 Test Results
MSE vs. #samples (left) and time (right)
Non-Ergodic (1 deterministic CPT entry), |X| = 179, |C| = 8, 2 ≤ |D(Xi)| ≤ 4, |E| = 35
Exact Time = 122 sec using Loop-Cutset Conditioning

(plots: CPCS179, n=179, |C|=8, |E|=36; MSE in [0, 0.012] vs. 100–4000 samples and vs. 0–80 sec; curves for Cutset and Gibbs)
![Page 248: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/248.jpg)
CPCS360b Test Results
MSE vs. #samples (left) and time (right)
Ergodic, |X| = 360, |D(Xi)| = 2, |C| = 21, |E| = 36
Exact Time > 60 min using Cutset Conditioning
Exact values obtained via Bucket Elimination

(plots: CPCS360b, n=360, |C|=21, |E|=36; MSE in [0, 0.00016] vs. 0–1000 samples and vs. 1–60 sec; curves for Cutset and Gibbs)
![Page 249: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/249.jpg)
Sampling algorithms
• Forward Sampling• Gibbs Sampling (MCMC)
Blocking Rao-Blackwellised
• Likelihood Weighting• Importance Sampling
![Page 250: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/250.jpg)
Likelihood Weighting(Fung and Chang, 1990; Shachter and Peot, 1990)
• “Clamping” evidence +• Forward sampling +• Weighting samples by evidence likelihood
Works well for likely evidence!
![Page 251: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/251.jpg)
Likelihood Weighting
Sample in topological order over X!

For non-evidence nodes: xi ← P(Xi | pai); P(Xi | pai) is a look-up in the CPT!
Evidence nodes e are clamped to their observed values.
![Page 252: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/252.jpg)
Likelihood Weighting Outline
w = 1
For each Xi (in topological order), do:
  If Xi ∈ E, with Xi = ei:
    w ← w · P(ei | pai)
  Else:
    Xi = xi^t ← sample from P(Xi | pai)
EndFor
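The outline above can be sketched on the same assumed 4-node network (X1 → X2, X1 → X3, {X2, X3} → X4, invented CPTs). The evidence node X4 is clamped and its CPT entry multiplied into the sample weight; no sample is ever rejected.

```python
import random

# Assumed CPTs (illustrative, not from the slides)
P_X1 = [0.6, 0.4]
P_X2 = {0: [0.7, 0.3], 1: [0.2, 0.8]}
P_X3 = {0: [0.9, 0.1], 1: [0.4, 0.6]}
P_X4 = {(0, 0): [0.5, 0.5], (0, 1): [0.8, 0.2],
        (1, 0): [0.1, 0.9], (1, 1): [0.3, 0.7]}

def bern(dist):
    return 0 if random.random() < dist[0] else 1

def lw_sample(e4):
    """One weighted sample with evidence X4 = e4, in topological order."""
    w = 1.0
    x1 = bern(P_X1)           # non-evidence: sample from the CPT
    x2 = bern(P_X2[x1])
    x3 = bern(P_X3[x1])
    w *= P_X4[(x2, x3)][e4]   # evidence: clamp X4, multiply in its likelihood
    return {"X1": x1, "X2": x2, "X3": x3, "X4": e4}, w

random.seed(6)
num = den = 0.0
for _ in range(5000):
    s, w = lw_sample(1)
    den += w
    num += w * (s["X2"] == 1)
p_hat = num / den   # weighted estimate of P(X2 = 1 | X4 = 1)
```

With these assumed CPTs the exact posterior is P(X2 = 1 | X4 = 1) ≈ 0.647; the ratio of weighted sums converges there without discarding a single sample.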
![Page 253: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/253.jpg)
Likelihood Weighting
Estimate posterior marginals P(Xi | e):

P̂(xi | e) = P̂(xi, e) / P̂(e) = [ Σ_{t=1}^{T} w^(t) · δ(xi, x^(t)) ] / [ Σ_{t=1}^{T} w^(t) ]

where δ(xi, x^(t)) = 1 if sample x^(t) contains xi, and 0 otherwise
![Page 254: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/254.jpg)
Likelihood Weighting
• Converges to the exact posterior marginals• Generates samples fast• Sampling distribution is close to the prior (especially if E ⊆ leaf nodes)• Increasing sampling variance:
 Convergence may be slow
 Many samples may receive (near-)zero weight, effectively wasted
![Page 255: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/255.jpg)
Sampling algorithms
• Forward Sampling• Gibbs Sampling (MCMC)
Blocking Rao-Blackwellised
• Likelihood Weighting• Importance Sampling
![Page 256: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/256.jpg)
Importance Sampling Idea• In general, it is hard to sample from the target distribution P(X|E)• Generate samples from a sampling (proposal) distribution Q(X)• Weigh each sample against P(X|E):

I(f) = ∫ f(x) P(x) dx = ∫ f(x) [ P(x) / Q(x) ] Q(x) dx
![Page 257: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/257.jpg)
Importance Sampling Theory

Let Z = X \ E (to simplify, write P(E = e) as a sum over Z):

P(E = e) = Σ_{z} P(Z = z, E = e), where P(Z = z, E = e) = ∏_{i=1}^{n} P(xi | pa(Xi)), evaluated at (z, e)
![Page 258: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/258.jpg)
Importance Sampling Theory
• Given a proposal distribution Q (such that P(Z = z, e) > 0 ⇒ Q(Z = z) > 0):

P(E = e) = Σ_{z ∈ Z} P(Z = z, E = e) = Σ_{z ∈ Z} [ P(Z = z, E = e) / Q(Z = z) ] · Q(Z = z)

By definition of expected value: E_Q[φ(Z)] = Σ_{z ∈ Z} φ(z) Q(z), so

P(E = e) = E_Q[ P(Z = z, E = e) / Q(Z = z) ] = E_Q[ w(Z = z) ]

w(Z = z) is called the importance weight
![Page 259: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/259.jpg)
Importance Sampling Theory

Given a set of samples (z^1, …, z^N) drawn from Q:

P̂(E = e) = (1/N) Σ_{i=1}^{N} P(Z = z^i, E = e) / Q(Z = z^i) = (1/N) Σ_{i=1}^{N} w(Z = z^i)

As N → ∞, P̂(E = e) → P(E = e)

Underlying principle: approximate an average over a set of numbers by an average over a set of sampled numbers
![Page 260: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/260.jpg)
Importance Sampling (Informally)• Express the problem as computing the average over
a set of real numbers• Sample a subset of real numbers• Approximate the true average by sample average.
True Average:• Average of (0.11, 0.24, 0.55, 0.77, 0.88,0.99)=0.59
Sample Average over 2 samples: • Average of (0.24, 0.77) = 0.505
![Page 261: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/261.jpg)
How to generate samples from Q
• Express Q in product form: Q(Z)=Q(Z1)Q(Z2|Z1)….Q(Zn|Z1,..Zn-1)
• Sample along the order Z1,..Zn
• Example: Q(Z1)=(0.2,0.8) Q(Z2|Z1)=(0.2,0.8,0.1,0.9) Q(Z3|Z1,Z2)=Q(Z3|Z1)=(0.5,0.5,0.3,0.7)
P̂(E = e) = (1/N) Σ_{i=1}^{N} P(Z = z^i, E = e) / Q(Z = z^i)
![Page 262: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/262.jpg)
How to sample from Q?

• Each sample Z = z:
  Sample Z1 = z1 from Q(Z1)
  Sample Z2 = z2 from Q(Z2 | Z1 = z1)
  Sample Z3 = z3 from Q(Z3 | Z1 = z1)
• Generate N such samples (z^1, …, z^N):

P̂(E = e) = (1/N) Σ_{i=1}^{N} P(Z = z^i, E = e) / Q(Z = z^i) = (1/N) Σ_{i=1}^{N} w(Z = z^i)
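A minimal importance-sampling sketch for P(E = e), again on the assumed 4-node network with evidence X4 = 1. To make the mechanics obvious, the proposal Q is deliberately crude: uniform over the 8 joint values of Z = (X1, X2, X3).

```python
import random

# Assumed CPTs (illustrative, not from the slides)
P_X1 = [0.6, 0.4]
P_X2 = {0: [0.7, 0.3], 1: [0.2, 0.8]}
P_X3 = {0: [0.9, 0.1], 1: [0.4, 0.6]}
P_X4 = {(0, 0): [0.5, 0.5], (0, 1): [0.8, 0.2],
        (1, 0): [0.1, 0.9], (1, 1): [0.3, 0.7]}

def p_z_and_e(x1, x2, x3, e4):
    """P(Z = z, E = e): product of CPT entries at (z, e)."""
    return P_X1[x1] * P_X2[x1][x2] * P_X3[x1][x3] * P_X4[(x2, x3)][e4]

random.seed(7)
N, total = 20000, 0.0
q = 1.0 / 8.0                      # Q(z): uniform over the 8 values of Z
for _ in range(N):
    z = (random.randint(0, 1), random.randint(0, 1), random.randint(0, 1))
    # importance weight w(z) = P(Z = z, E = e) / Q(Z = z)
    total += p_z_and_e(z[0], z[1], z[2], 1) / q
p_e_hat = total / N                # estimates P(X4 = 1)
```

With these assumed CPTs, P(X4 = 1) = 0.631 exactly. Likelihood weighting is the special case where Q is the prior (the network's own CPTs), which keeps the weights far less variable than this uniform proposal.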
![Page 263: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/263.jpg)
Likelihood weighting
• Q= Prior Distribution = CPTs of the Bayesian network
![Page 264: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/264.jpg)
Likelihood weighting example
(figure: Smoking → {lung Cancer, Bronchitis}; {lung Cancer, Smoking} → X-ray; {lung Cancer, Bronchitis} → Dyspnoea; CPTs P(S), P(C|S), P(B|S), P(X|C,S), P(D|C,B))
P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
P(X = 1, B = 0) = ? (where 1 = true and 0 = false)
P(X = 1, B = 0) = Σ_{S,C,D} P(S) P(C|S) P(B = 0|S) P(X = 1|C,S) P(D|C, B = 0)
![Page 265: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/265.jpg)
Likelihood weighting example
(same network as above)

Q = Prior:
Q(S, C, D) = Q(S) · Q(C|S) · Q(D|C, B = 0) = P(S) P(C|S) P(D|C, B = 0)

Sample S = s from P(S)
Sample C = c from P(C | S = s)
Sample D = d from P(D | C = c, B = 0)

w(Z = z) = P(Z = z, E = e) / Q(Z = z)
         = [ P(s) P(c|s) P(B = 0|s) P(X = 1|c,s) P(d|c, B = 0) ] / [ P(s) P(c|s) P(d|c, B = 0) ]
         = P(B = 0|s) · P(X = 1|c,s)
![Page 266: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/266.jpg)
How to solve belief updating?
P(Xi = xi | e) = P(Xi = xi, e) / P(e)

Numerator: evidence is (Xi = xi, e); Denominator: evidence is e
Estimate numerator and denominator by importance sampling:

P̂(Xi = xi | e) = [ Σ_{j=1}^{N} δ(xi, z^j) · w(z^j) ] / [ Σ_{j=1}^{N} w(z^j) ]

where δ(xi, z^j) = 1 iff sample z^j contains Xi = xi, and 0 otherwise
![Page 267: Reasoning Under Uncertainty](https://reader037.vdocuments.us/reader037/viewer/2022103102/5681668a550346895dda48de/html5/thumbnails/267.jpg)
Summary