TRANSCRIPT
Approximation Techniques: bounded inference
275b
SP2 2
Mini-buckets: “local inference”
The idea is similar to i-consistency: bound the size of recorded dependencies
Computation in a bucket is time and space exponential in the number of variables involved
Therefore, partition the functions in a bucket into "mini-buckets" defined over smaller numbers of variables
SP2 3
Mini-bucket approximation: MPE task
Split a bucket into mini-buckets =>bound complexity
max_X prod_{i=1..n} h_i  <=  (max_X prod_{i=1..r} h_i) * (max_X prod_{i=r+1..n} h_i) = g^X * h^X
Exponential complexity decrease: O(e^n) -> O(e^r) + O(e^(n-r))
SP2 4
Approx-mpe(i) Input: i – max number of variables allowed in a mini-bucket Output: [lower bound (probability of a sub-optimal solution), upper bound]
Example: approx-mpe(3) versus elim-mpe
[Figure: bucket-elimination schematics comparing elim-mpe (induced width w* = 4) with approx-mpe(3) (width 2).]
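As a concrete illustration (not from the slides), here is a minimal Python/numpy sketch of the bound behind the split: maximizing each mini-bucket separately can only overestimate the exact bucket maximum. The factor tables and shapes are made up.

```python
import numpy as np

# Toy bucket for variable X with two factors that both mention X.
# Exact bucket processing computes max_x f1(x,a)*f2(x,b) for every (a,b),
# which is exponential in the union of scopes.  A mini-bucket split
# maximizes each factor over x independently -- an upper bound, since
#   max_x (f * g) <= (max_x f) * (max_x g).
rng = np.random.default_rng(0)
f1 = rng.random((2, 2))   # f1(x, a)
f2 = rng.random((2, 2))   # f2(x, b)

# Exact: max over x of the product, for every (a, b) pair.
exact = np.einsum('xa,xb->xab', f1, f2).max(axis=0)   # shape (a, b)

# Mini-bucket: maximize each factor over x separately, then combine.
upper = np.outer(f1.max(axis=0), f2.max(axis=0))      # shape (a, b)

assert (upper >= exact - 1e-12).all()   # always an upper bound
print(exact, upper, sep='\n')
```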
SP2 5
Properties of approx-mpe(i) Complexity: O(exp(2i)) time and O(exp(i)) space.
Accuracy: determined by the upper/lower (U/L) bound ratio.
As i increases, both accuracy and complexity increase.
Possible uses of mini-bucket approximations:
As anytime algorithms (Dechter and Rish, 1997)
As heuristics in best-first search (Kask and Dechter, 1999)
Other tasks: similar mini-bucket approximations for belief updating, MAP and MEU (Dechter and Rish, 1997)
SP2 6
Anytime Approximation
anytime-mpe(ε)
Initialize: i = smallest feasible i-bound
While time and space resources are available:
  U <- upper bound computed by approx-mpe-U(i)
  L <- lower bound computed by approx-mpe-L(i)
  keep the best solution found so far
  if U/L <= 1 + ε, return the solution
  i <- i + step
end while
Return the largest L and the smallest U found so far.
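A schematic Python rendering of this loop (not the course's reference code); approx_mpe_u and approx_mpe_l are hypothetical stand-ins for the mini-bucket routines returning the upper bound and a lower bound with its assignment.

```python
import time

def anytime_mpe(problem, epsilon, approx_mpe_u, approx_mpe_l,
                i0=1, step=1, budget_sec=60):
    """Schematic anytime-mpe(epsilon): raise i until U/L <= 1+epsilon
    or time/space resources run out."""
    i = i0
    best_u, best_l, best_sol = float('inf'), 0.0, None
    deadline = time.monotonic() + budget_sec
    while time.monotonic() < deadline:
        u = approx_mpe_u(problem, i)        # upper bound from max-mini-buckets
        l, sol = approx_mpe_l(problem, i)   # lower bound = cost of a concrete assignment
        best_u = min(best_u, u)
        if l > best_l:                      # keep the best solution found so far
            best_l, best_sol = l, sol
        if best_l > 0 and best_u / best_l <= 1 + epsilon:
            return best_sol, best_l, best_u  # provably near-optimal: stop
        i += step
    return best_sol, best_l, best_u          # largest L, smallest U so far
```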
SP2 7
Bounded elimination for belief updating The mini-bucket idea is the same:
apply a sum in each mini-bucket, or better, a sum in one mini-bucket and max in the rest (or min, for a lower bound)
Approx-bel-max(i,m) generates upper and lower bounds on beliefs; it approximates elim-bel
Approx-map(i,m): max buckets are maximized, sum buckets are processed by sum-max; it approximates elim-map
Σ_X f(x)·g(x)  <=  (Σ_X f(x)) · (max_X g(x))   (upper bound)
Σ_X f(x)·g(x)  >=  (Σ_X f(x)) · (min_X g(x))   (lower bound)
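A tiny numpy check of these bounds on random tables, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.random(4)                 # f(x), x ranges over 4 values
g = rng.random(4)                 # g(x)

exact = (f * g).sum()             # what exact elimination would compute
upper = f.sum() * g.max()         # sum one mini-bucket, max the other
lower = f.sum() * g.min()         # ... or min, for a lower bound

assert lower <= exact <= upper
print(lower, exact, upper)
```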
SP2 8
Empirical Evaluation (Dechter and Rish, 1997; Rish thesis, 1999)
Benchmarks:
Randomly generated networks (uniform random probabilities, random noisy-OR)
CPCS networks
Probabilistic decoding
Comparing approx-mpe and anytime-mpe versus elim-mpe
SP2 9
Random networks Uniform random: 60 nodes, 90 edges (200 instances)
In 80% of cases, a 10-100 times speed-up while U/L < 2. Noisy-OR gives even better results:
exact elim-mpe was infeasible; approx-mpe took 0.1 to 80 sec.
P(x = 0 | y_1, ..., y_n) = prod_{i=1..n} q_i^{y_i},   q_i = random noise parameter
SP2 10
CPCS networks – medical diagnosis (noisy-OR model)
Test case: no evidence
[Figure: anytime-mpe(0.0001), U/L error vs. time and parameter i (i = 1,...,21) for cpcs360b and cpcs422b; time on a log scale from 1 to 1000 sec, U/L from 0.6 to 3.8.]
Algorithm            Time (sec), cpcs360   Time (sec), cpcs422
elim-mpe             115.8                 1697.6
anytime-mpe( )       70.3                  505.2
anytime-mpe( )       70.3                  110.5
SP2 11
The effect of evidence
More likely evidence => higher MPE => higher accuracy (why?)
Likely evidence versus random (unlikely) evidence
[Figure: two log(U/L) histograms for i=10, each over 1000 instances: one with random evidence, one with likely evidence; frequency from 0 to 1000, log(U/L) from 0 to 12.]
SP2 12
Probabilistic decoding: error-correcting linear block codes
State-of-the-art: approximate algorithm – iterative belief propagation (IBP) (Pearl’s poly-tree algorithm applied to loopy networks)
SP2 13
Iterative Belief Propagation
Belief propagation is exact for poly-trees
IBP - applying BP iteratively to cyclic networks
No guarantees for convergence Works well for many coding networks
[Figure: a poly-tree fragment with parents U1, U2, U3 and children X1, X2; one step updates BEL(U1) from the incoming π/λ messages.]
SP2 14
approx-mpe vs. IBP
approx-mpe is better on low-w* codes
IBP is better on randomly generated (high-w*) codes
Bit error rate (BER) as a function of noise (sigma):
SP2 15
Mini-buckets: summary Mini-buckets – a local-inference approximation
Idea: bound the size of recorded functions
Approx-mpe(i) - the mini-bucket algorithm for MPE
Better results for noisy-OR than for random problems
Accuracy increases with decreasing noise (in noisy-OR networks)
Accuracy increases for likely evidence
Sparser graphs -> higher accuracy
Coding networks: approx-mpe outperforms IBP on low induced-width codes
SP2 16
Heuristic search Mini-buckets record upper-bound heuristics. The evaluation function over a partial assignment x^p = (x_1, ..., x_p):
f(x^p) = g(x^p) · h(x^p), where
g(x^p) = prod_{i<=p} P(x_i | pa_i)
h(x^p) = prod_{h_j in bucket(p)} h_j(x^p)
Best-first: expand the node with the maximal evaluation function. Branch and Bound: prune a node whose evaluation function (an upper bound) is no better than the best solution found so far. Properties:
yields an exact algorithm Better heuristics lead to more pruning
SP2 17
Heuristic Function
Given a cost function P(a,b,c,d,e) = P(a) • P(b|a) • P(c|a) • P(e|b,c) • P(d|b,a)
Define an evaluation function over a partial assignment as the probability of its best extension
f*(a,e,d) = maxb,c P(a,b,c,d,e) = P(a) • maxb,c P(b|a) • P(c|a) • P(e|b,c) • P(d|a,b)
= g(a,e,d) • H*(a,e,d)
[Figure: a search tree over the ordering A, E, D, B with 0/1 branches.]
SP2 18
Heuristic Function
H*(a,e,d) = maxb,c P(b|a) • P(c|a) • P(e|b,c) • P(d|a,b)
= maxc P(c|a) • maxb P(e|b,c) • P(b|a) • P(d|a,b)
<= maxc P(c|a) • (maxb P(e|b,c)) • (maxb P(b|a) • P(d|a,b))
= H(a,e,d)
f(a,e,d) = g(a,e,d) • H(a,e,d) >= f*(a,e,d)
The heuristic function H is compiled during the preprocessing stage of the
Mini-Bucket algorithm.
SP2 19
bucket B: maxB: P(e|b,c) | P(d|a,b), P(b|a)
bucket C: maxC: P(c|a), hB(e,c)
bucket D: maxD: hB(d,a)
bucket E: maxE: hC(e,a)
bucket A: maxA: P(a), hE(a), hD(a)
Heuristic Function
The evaluation function f(xp) can be computed using functions recorded by the Mini-Bucket scheme, and can be used to estimate the probability of the best extension of a partial assignment xp = {x1, ..., xp}:
f(xp) = g(xp) • H(xp)
For example,
H(a,e,d) = hB(d,a) • hC(e,a)
g(a,e,d) = P(a)
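A minimal sketch, with hypothetical binary tables, of how f(a,e,d) would be assembled from the recorded functions:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical tables for the example above (all variables binary).
P_a    = rng.random(2)        # P(a)
h_B_da = rng.random((2, 2))   # hB(d,a), recorded by the mini-bucket pass
h_C_ea = rng.random((2, 2))   # hC(e,a)

def f(a, e, d):
    """f(a,e,d) = g(a,e,d) * H(a,e,d): exact cost of the instantiated
    prefix times the mini-bucket estimate of its best extension."""
    g = P_a[a]                          # g(a,e,d) = P(a) in this example
    H = h_B_da[d, a] * h_C_ea[e, a]     # H(a,e,d) = hB(d,a) * hC(e,a)
    return g * H

print(f(0, 1, 1))
```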
SP2 20
Properties The heuristic is monotone The heuristic is admissible The heuristic is computed in linear time IMPORTANT:
Mini-buckets generate heuristics of varying strength using the control parameter (bound) i
Higher bound -> more preprocessing -> stronger heuristics -> less search Allows a controlled trade-off between preprocessing and search (see the sketch below)
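For illustration, a generic depth-first Branch-and-Bound sketch that uses such a heuristic; g and h are assumed callables giving the exact probability of the instantiated prefix and the mini-bucket upper bound on its best extension. This is a sketch of the idea, not the course's reference implementation.

```python
def bbmb(variables, domain, g, h):
    """Schematic Branch-and-Bound for MPE (maximization) with a
    mini-bucket heuristic.  Prunes a node when f = g*h, an upper
    bound, cannot beat the best full assignment found so far."""
    best = [0.0, None]   # [best probability, best assignment]

    def dfs(assignment):
        if len(assignment) == len(variables):      # full assignment
            p = g(assignment)
            if p > best[0]:
                best[0], best[1] = p, dict(assignment)
            return
        var = variables[len(assignment)]
        for val in domain[var]:
            assignment[var] = val
            if g(assignment) * h(assignment) > best[0]:   # else prune
                dfs(assignment)
            del assignment[var]

    dfs({})
    return best[1], best[0]
```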
SP2 21
Empirical Evaluation of mini-bucket heuristics
[Figure: two panels of "% solved exactly" (0.0-1.0) vs. time (0-30 sec) for BBMB and BFMB with i = 2, 6, 10, 14; left panel: Random Coding, K=100, noise 0.32.]
SP2 22
Cluster Tree Elimination - properties Correctness and completeness: Algorithm CTE is correct, i.e., it computes the exact joint probability of a single variable and the evidence.
Time complexity: O(deg · (n+N) · d^(w*+1))
Space complexity: O(N · d^sep), where
deg = the maximum degree of a node
n = number of variables (= number of CPTs)
N = number of nodes in the tree decomposition
d = the maximum domain size of a variable
w* = the induced width
sep = the separator size
SP2 23
Mini-Clustering for belief updating Motivation:
Time and space complexity of Cluster Tree Elimination depend on the induced width w* of the problem
When the induced width w* is big, the CTE algorithm becomes infeasible
The basic idea: Try to reduce the size of the cluster (the exponent): partition each cluster into mini-clusters with fewer variables
Accuracy parameter i = maximum number of variables in a mini-cluster
The idea was explored for variable elimination (Mini-Buckets)
SP2 24
Idea of Mini-Clustering
Split a cluster into mini-clusters => bound complexity
cluster(u) = {h_1, ..., h_r, h_{r+1}, ..., h_n}
Exact: h = Σ_elim prod_{i=1..n} h_i
Split into {h_1, ..., h_r} and {h_{r+1}, ..., h_n}:
g = (Σ_elim prod_{i=1..r} h_i) · (max_elim prod_{i=r+1..n} h_i) >= h
Exponential complexity decrease: O(e^n) -> O(e^r) + O(e^(n-r))
SP2 25
Mini-Clustering - example
Cluster tree: 1 (ABC) — 2 (BCDF) — 3 (BEF) — 4 (EFG), with separators BC, BF, EF.
H(1,2) = { h1(1,2)(b,c) := Σ_a p(a) · p(b|a) · p(c|a,b) }
H(2,1) = { h1(2,1)(b) := Σ_{d,f} p(d|b) · h1(3,2)(b,f),  h2(2,1)(c) := Σ_{d,f} p(f|c,d) }
H(2,3) = { h1(2,3)(b) := Σ_{c,d} p(d|b) · h1(1,2)(b,c),  h2(2,3)(f) := Σ_{c,d} p(f|c,d) }
H(3,2) = { h1(3,2)(b,f) := Σ_e p(e|b,f) · h1(4,3)(e,f) }
H(3,4) = { h1(3,4)(e,f) := Σ_b p(e|b,f) · h1(2,3)(b) · h2(2,3)(f) }
H(4,3) = { h1(4,3)(e,f) := p(G=g_e|e,f) }
SP2 26
Cluster Tree Elimination vs. Mini-Clustering
Same cluster tree: 1 (ABC) — 2 (BCDF) — 3 (BEF) — 4 (EFG), separators BC, BF, EF.
CTE messages (exact):        MC messages (i-bounded):
h(1,2)(b,c)                  h1(1,2)(b,c)
h(2,1)(b,c)                  h1(2,1)(b), h2(2,1)(c)
h(2,3)(b,f)                  h1(2,3)(b), h2(2,3)(f)
h(3,2)(b,f)                  h1(3,2)(b,f)
h(3,4)(e,f)                  h1(3,4)(e,f)
h(4,3)(e,f)                  h1(4,3)(e,f)
SP2 27
Mini-Clustering
Correctness and completeness: Algorithm MC(i) computes a bound (or an approximation) on the joint probability P(Xi,e) of each variable and each of its values.
Time & space complexity: O(n · hw* · d^i)
where hw* = max_u |{ f | scope(f) ∩ χ(u) ≠ ∅ }|
SP2 28
Experimental results
Algorithms: Exact, IBP, Gibbs sampling (GS), MC with normalization (approximate)
Networks (all variables are binary): coding networks; CPCS 54, 360, 422; grid networks (MxM); random noisy-OR networks; random networks
Measures: Normalized Hamming Distance (NHD), BER (Bit Error Rate), absolute error, relative error, time
SP2 29
Random networks - Absolute error
[Figure: absolute error vs. i-bound (0-10) for MC, Gibbs sampling, and IBP on random networks, N=50, P=2, k=2, w*=10, 50 instances; left panel: evidence=0, right panel: evidence=10.]
SP2 30
Noisy-OR networks - Absolute error
[Figure: absolute error (log scale, 1e-5 to 1) vs. i-bound (0-16) for MC, IBP, and Gibbs sampling on noisy-OR networks, N=50, P=3, w*=16, 25 instances; left panel: evidence=10, right panel: evidence=20.]
SP2 31
Grid 15x15 - 10 evidence
[Figure: four panels for Grid 15x15, evid=10, w*=22, 10 instances, comparing MC and IBP as a function of i-bound (0-18): NHD, absolute error, relative error, and time (seconds).]
SP2 32
CPCS422 - Absolute error
[Figure: absolute error vs. i-bound (2-18) for MC and IBP on CPCS 422, w*=23, 1 instance; left panel: evidence=0, right panel: evidence=10.]
SP2 33
Coding networks - Bit Error Rate
[Figure: BER vs. i-bound (0-12) for MC and IBP on coding networks, N=100, P=4, w*=12, 50 instances; left panel: sigma=0.22, right panel: sigma=0.51.]
SP2 34
Mini-Clustering summary
MC extends the partition based approximation from mini-buckets to general tree decompositions for the problem of belief updating
Empirical evaluation demonstrates its effectiveness and superiority (for certain types of problems, with respect to the measures considered) relative to other existing algorithms
SP2 35
What is IJGP?
IJGP is an approximate algorithm for belief updating in Bayesian networks
IJGP is a version of join-tree clustering which is both anytime and iterative
IJGP applies message passing along a join-graph, rather than a join-tree
Empirical evaluation shows that IJGP is almost always superior to other approximate schemes (IBP, MC)
SP2 36
Iterative Belief Propagation - IBP
Belief propagation is exact for poly-trees IBP - applying BP iteratively to cyclic networks
No guarantees for convergence
Works well for many coding networks
[Figure: a poly-tree fragment with parents U1, U2, U3 and children X1, X2; one step: update BEL(U1) from the incoming π/λ messages.]
SP2 37
IJGP - Motivation IBP is applied to a loopy network iteratively:
not an anytime algorithm; when it converges, it converges very fast
MC applies bounded inference along a tree decomposition:
MC is an anytime algorithm controlled by the i-bound; MC converges in two passes up and down the tree
IJGP combines: the iterative feature of IBP and the anytime feature of MC
SP2 38
IJGP - The basic idea Apply Cluster Tree Elimination to any join-graph
We commit to graphs that are minimal I-maps
Avoid cycles as long as I-mapness is not violated
Result: use minimal arc-labeled join-graphs
SP2 39
IJGP - Example
[Figure: a) a belief network over variables A-J; b) the join-graph IBP works on, with clusters ABDE, FGI, ABC, BCE, GHIJ, CDEF, FGH and arc labels such as A, AB, BC, BE, C, CDE, CE, FG, GH, GI, FH, H.]
SP2 40
Arc-minimal join-graph
[Figure: the join-graph above before and after removing redundant arcs; the clusters ABDE, FGI, ABC, BCE, GHIJ, CDEF, FGH remain, with fewer labeled arcs.]
SP2 41
Minimal arc-labeled join-graph
[Figure: the arc-minimal join-graph before and after shrinking arc labels, e.g., the FG label is reduced to F.]
SP2 42
Join-graph decompositions
a) Minimal arc-labeled join-graph
b) Join-graph obtained by collapsing nodes of graph a)
c) Minimal arc-labeled join-graph
[Figure: three join-graphs; collapsing ABDE and ABC into ABCDE, with the CDE arc label shrinking to DE.]
SP2 43
Tree decomposition
a) Minimal arc-labeled join-graph
b) Tree decomposition
[Figure: the minimal arc-labeled join-graph next to a tree decomposition with clusters ABCDE, CDEF, FGHI, GHIJ and separators CDE, F, GHI.]
SP2 44
Join-graphs
[Figure: the spectrum of join-graphs, from the graph IBP works on, through the minimal arc-labeled and collapsed join-graphs, to a tree decomposition.]
less complexity <—  —> more accuracy
SP2 45
Message propagation
[Figure: cluster 1 (ABCDE) sends a message to cluster 2 (CDEF); cluster 3 sends h(3,1)(bc) to cluster 1.]
Cluster 1 = ABCDE contains p(a), p(c), p(b|a,c), p(d|a,b,e), p(e|b,c) and the incoming message h(3,1)(b,c).
Minimal arc-labeled: sep(1,2) = {D,E}, elim(1,2) = {A,B,C}:
h(1,2)(d,e) = Σ_{a,b,c} p(a) · p(c) · p(b|a,c) · p(d|a,b,e) · p(e|b,c) · h(3,1)(b,c)
Non-minimal arc-labeled: sep(1,2) = {C,D,E}, elim(1,2) = {A,B}:
h(1,2)(c,d,e) = Σ_{a,b} p(a) · p(c) · p(b|a,c) · p(d|a,b,e) · p(e|b,c) · h(3,1)(b,c)
SP2 46
Bounded decompositions We want arc-labeled decompositions such that:
the cluster size (internal width) is bounded by i (the accuracy parameter)
the width of the decomposition as a graph (external width) is as small as possible
Possible approaches to build decompositions:
partition-based algorithms, inspired by the mini-bucket decomposition
grouping-based algorithms
SP2 47
Partition-based algorithms
a) Schematic mini-bucket(i), i=3:
G: (GFE)
E: (EBF) (EF)
F: (FCD) (BF)
D: (DB) (CD)
C: (CAB) (CB)
B: (BA) (AB) (B)
A: (A)
b) Arc-labeled join-graph decomposition: clusters GFE [P(G|F,E)], EBF [P(E|B,F)], FCD [P(F|C,D)], CDB [P(D|B)], CAB [P(C|A,B)], BA [P(B|A)], A [P(A)], connected by arc labels such as EF, BF, CD, CB, BA, B, A.
SP2 48
IJGP properties IJGP(i) applies BP to a minimal arc-labeled join-graph whose cluster size is bounded by i
On join-trees, IJGP finds exact beliefs
IJGP is a Generalized Belief Propagation algorithm (Yedidia, Freeman, Weiss 2001)
Complexity of one iteration: time: O(deg·(n+N)·d^(i+1)) space: O(N·d)
SP2 49
Empirical evaluation Algorithms: Exact, IBP, MC, IJGP
Measures: absolute error, relative error, Kullback-Leibler (KL) distance, Bit Error Rate, time
Networks (all variables are binary): random networks, grid networks (MxM), CPCS 54, 360, 422, coding networks
SP2 50
Random networks - KL at convergence
[Figure: KL distance at convergence vs. i-bound (1-11) for IJGP, MC, and IBP on random networks, N=50, K=2, P=3, w*=16, 100 instances; left panel: evidence=0, right panel: evidence=5.]
SP2 51
Random networks - KL vs. iterations
[Figure: KL distance vs. number of iterations (0-35) for IJGP(2), IJGP(10), and IBP on random networks, N=50, K=2, P=3, w*=16, 100 instances; left panel: evidence=0, right panel: evidence=5.]
SP2 52
Random networks - Time
[Figure: time (seconds) vs. i-bound (1-11) for IJGP (20 iterations), MC, and IBP (10 iterations) on random networks, N=50, K=2, P=3, evid=5, w*=16, 100 instances.]
SP2 53
Coding networks - BER
[Figure: four panels of BER vs. i-bound (0-12) for IBP and IJGP (plus MC in the sigma=0.22 panel) on coding networks, N=400, w*=43, 30 iterations; sigma = 0.22 (1000 instances), 0.32, 0.51, 0.65 (500 instances each).]
SP2 54
Coding networks - Time
[Figure: time (seconds) vs. i-bound (0-12) for IJGP (30 iterations), MC, and IBP (30 iterations) on coding networks, N=400, 500 instances, w*=43.]
SP2 55
IJGP summary IJGP borrows the iterative feature from IBP and the
anytime virtues of bounded inference from MC
Empirical evaluation showed the potential of IJGP, which improves with iteration and most of the time with i-bound, and scales up to large networks
IJGP is almost always superior, often by a high margin, to IBP and MC
Based on all our experiments, we think that IJGP provides a practical breakthrough to the task of belief updating
SP2 56
Random networks
N=80, 100 instances, w*=15
SP2 57
Random networks
N=80, 100 instances, w*=15
SP2 58
CPCS 54, CPCS360
CPCS360: 5 instances, w*=20 CPCS54: 100 instances, w*=15
SP2 59
Graph coloring problems
[Figure: a graph-coloring problem as a belief network: variable nodes X1, ..., Xn with constraint nodes H1, ..., Hk, each Hk having two parents Xi, Xj and a CPT P(Hk | Xi, Xj).]
SP2 60
Graph coloring problems
SP2 61
Inference power of IBP - summary IBP's inference of zero beliefs converges in a finite number of iterations and is sound; the results extend to generalized belief propagation algorithms, in particular to IJGP.
We identified classes of networks for which IBP:
can infer zeros, and therefore is likely to be good;
cannot infer zeros, although there are many of them (graph coloring), and therefore is bad.
Based on the analysis, it is easy to synthesize belief networks that are hard for IBP.
The success of IBP on coding networks can be explained by: many extreme beliefs, and an easy-for-arc-consistency flat network.
SP2 62
“Road map” CSPs: complete algorithms CSPs: approximations Belief nets: complete algorithms Belief nets: approximations
Local inference: mini-buckets Stochastic simulations Variational techniques
MDPs
SP2 63
Stochastic Simulation Forward sampling (logic sampling) Likelihood weighting Markov Chain Monte Carlo (MCMC): Gibbs sampling
SP2 64
Approximation via Sampling
1. Generate samples s_1, ..., s_N from P(X), where s_i = (x_1^i, ..., x_n^i).
2. Estimate probabilities by frequencies:
P(Y = y) ≈ #{samples with Y = y} / N
3. How to handle evidence E?
- acceptance-rejection (e.g., forward sampling)
- "clamping" evidence nodes to their values:
* likelihood weighting
* Gibbs sampling (MCMC)
SP2 66
Forward sampling (example)
Network: X1 -> X2, X1 -> X3, {X2, X3} -> X4, with CPTs P(x1), P(x2|x1), P(x3|x1), P(x4|x3,x2).
Evidence: x3 = 0.
// generate sample k
1. Sample x1 from P(x1)
2. Sample x2 from P(x2|x1)
3. Sample x3 from P(x3|x1)
4. If x3 conflicts with the evidence (x3 ≠ 0), reject the sample and restart from step 1; otherwise
5. Sample x4 from P(x4|x2,x3)
Drawback: high rejection rate!
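A self-contained Python sketch of this procedure, with made-up CPT numbers for the four-node network above (each entry is the probability the variable equals 1 given its parents):

```python
import random

def sample_bernoulli(p1):
    """Return 1 with probability p1, else 0."""
    return 1 if random.random() < p1 else 0

# Hypothetical CPTs for X1 -> {X2, X3} -> X4.
P_x1 = 0.6
P_x2 = {0: 0.3, 1: 0.8}             # P(X2=1 | x1)
P_x3 = {0: 0.5, 1: 0.2}             # P(X3=1 | x1)
P_x4 = {(0, 0): 0.1, (0, 1): 0.7,   # P(X4=1 | x2, x3)
        (1, 0): 0.4, (1, 1): 0.9}

def forward_sample_with_rejection(evidence_x3=0):
    while True:                               # restart on rejection
        x1 = sample_bernoulli(P_x1)
        x2 = sample_bernoulli(P_x2[x1])
        x3 = sample_bernoulli(P_x3[x1])
        if x3 != evidence_x3:                 # conflicts with evidence: reject
            continue
        x4 = sample_bernoulli(P_x4[(x2, x3)])
        return x1, x2, x3, x4

# Estimate P(X4=1 | X3=0) by frequency over accepted samples.
N = 10_000
hits = sum(forward_sample_with_rejection()[3] for _ in range(N))
print(hits / N)
```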
SP2 68
Gibbs Sampling (Geman and Geman, 1984)
Markov Chain Monte Carlo (MCMC): create a Markov chain of samples
1. For each evidence variable X_i in E, set x_i = e_i.
2. For each non-evidence variable X_i, set x_i to a random value.
3. For sample # 1 to N:
4.   For each X_i not in E:
5.     Sample x_i from P(x_i | x \ {x_i})
Advantage: guaranteed to converge to P(X). Disadvantage: convergence may be slow.
SP2 69
Gibbs Sampling (cont'd) (Pearl, 1988)
Important: P(x_i | x \ {x_i}) is computed locally:
P(x_i | x \ {x_i}) ∝ P(x_i | pa_i) · prod_{X_j in ch_i} P(x_j | pa_j)
Markov blanket:
M_i = pa_i ∪ ch_i ∪ ( ∪_{X_j in ch_i} pa_j )
Given its Markov blanket (parents, children, and their parents), X_i is independent of all other nodes.
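A minimal Python sketch of one Gibbs sweep that uses exactly this locality; the cpts and children interfaces are assumptions made for the example, not a real library API.

```python
import random

def gibbs_weight(xi_val, state, var, cpts, children):
    """Unnormalized P(X_var = xi_val | all other variables): only the
    Markov-blanket terms matter.  cpts[v] is assumed to be a function
    (value, state) -> P(X_v = value | pa_v as read from state)."""
    state = dict(state, **{var: xi_val})
    w = cpts[var](xi_val, state)              # P(x_i | pa_i)
    for ch in children[var]:                  # times the children's CPTs
        w *= cpts[ch](state[ch], state)
    return w

def gibbs_step(state, free_vars, domains, cpts, children):
    """One Gibbs sweep: resample every non-evidence variable from its
    conditional given the current values of all other variables."""
    for var in free_vars:
        weights = [gibbs_weight(v, state, var, cpts, children)
                   for v in domains[var]]
        r = random.random() * sum(weights)
        for v, w in zip(domains[var], weights):
            r -= w
            if r <= 0:
                state[var] = v
                break
    return state
```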