Belief Propagation in a Continuous World
Andrew Frank 11/02/2009
Joint work with Alex Ihler and Padhraic Smyth
Graphical Models
• Nodes represent random variables.
• Edges represent dependencies.
[Figures: example graphical models over nodes A–E]
Markov Random Fields
[Figure: a Markov random field over nodes A–E]
B ⊥ E | C, D        A ⊥ C | B
Factoring Probability Distributions
Independence relations ↔ factorization
[Figure: an MRF over A, B, C, D with edges A–B, B–C, B–D]
p(A,B,C,D) = (1/Z) f(A) f(B) f(C) f(D) f(A,B) f(B,C) f(B,D)
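This factorization is easy to check numerically. A minimal sketch, with made-up potential values (the tables below are illustrative stand-ins, not values from the talk):

```python
import itertools

# Hypothetical potential tables for four binary variables A, B, C, D.
fA = {0: 1.0, 1: 2.0}
fB = {0: 1.0, 1: 0.5}
fC = {0: 2.0, 1: 1.0}
fD = {0: 1.0, 1: 1.0}
fAB = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
fBC = fAB
fBD = fAB

def unnormalized_p(a, b, c, d):
    return (fA[a] * fB[b] * fC[c] * fD[d]
            * fAB[(a, b)] * fBC[(b, c)] * fBD[(b, d)])

# The partition function Z sums the factor product over all joint states.
Z = sum(unnormalized_p(*x) for x in itertools.product([0, 1], repeat=4))

def p(a, b, c, d):
    return unnormalized_p(a, b, c, d) / Z
```

Dividing by Z makes the factor product a proper distribution, whatever the individual tables are.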
Toy Example: A Day in Court
[Figure: tree-structured graph over A, E, W, and V]
A, E, W ∈ {“Innocent”, “Guilty”};  V ∈ {“Not guilty verdict”, “Guilty verdict”}
[Tables: pairwise factor f(A, E) over {I, G} × {I, G} (values 2 on the diagonal, 1 off it) and unary factor f(A) over {I, G} (values 3 and 1)]
Inference
• Most probable explanation:  x* = argmax_x p(x)
• Marginalization:  p(A) = Σ_{B,C,D} p(A, B, C, D)
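On a small discrete model, both inference tasks can be done by brute-force enumeration. A sketch over an arbitrary toy table (the factor values are illustrative):

```python
import itertools

# A toy unnormalized distribution over three binary variables.
f = {x: 1.0 + x[0] + 2.0 * x[1] * x[2]
     for x in itertools.product([0, 1], repeat=3)}
Z = sum(f.values())

# Most probable explanation: the joint state maximizing p(x).
mpe = max(f, key=f.get)

# Marginalization: p(X0) sums out the remaining variables.
p_x0 = {v: sum(f[x] for x in f if x[0] == v) / Z for v in (0, 1)}
```

Enumeration costs grow exponentially in the number of variables, which is exactly why message-passing algorithms matter.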
Iterative Message Updates
m_ts(x_s) = Σ_{x_t} f_t(x_t) f(x_s, x_t) Π_{u ∈ Γ(t)\s} m_ut(x_t)
Belief Propagation
[Figure: messages on the court example’s tree: mAE(E), mWE(E), mEV(V)]
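On this tree the three messages determine the verdict marginal exactly. A sketch with assumed numeric potentials (the tables are hypothetical stand-ins for the slide’s factors; state 0 = innocent, 1 = guilty):

```python
# Illustrative binary potentials for the court example.
fA = [3.0, 1.0]                      # prior on the accomplice's testimony
fW = [1.0, 1.0]                      # uninformative witness prior
agree = [[2.0, 1.0], [1.0, 2.0]]     # agreement potential, reused per edge
fAE, fWE, fEV = agree, agree, agree

def message(unary, pairwise, incoming):
    # m(target) = sum_source unary(source) * incoming(source) * pairwise(source, target)
    return [sum(unary[s] * incoming[s] * pairwise[s][t] for s in (0, 1))
            for t in (0, 1)]

mAE = message(fA, fAE, [1.0, 1.0])                     # leaf A -> E
mWE = message(fW, fWE, [1.0, 1.0])                     # leaf W -> E
mEV = message([1.0, 1.0], fEV,
              [a * w for a, w in zip(mAE, mWE)])       # E -> V

pV = [m / sum(mEV) for m in mEV]     # normalized marginal over the verdict V
```

On a tree, one inward sweep of messages like this yields exact marginals.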
Loopy BP
[Figure: a graph with a loop over A, B, C, D; messages now circulate around the cycle]
Does this work? Does it make any sense?
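In practice we simply iterate the same update to a fixed point. A sketch of loopy BP on a 4-cycle, with synchronous, normalized updates (the potentials are invented for illustration: node 0 has a unary bias toward state 0, and every edge prefers agreement):

```python
pair = [[2.0, 1.0], [1.0, 2.0]]
unary = [[2.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
nbrs = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

def update(msgs):
    new = {}
    for s in nbrs:
        for t in nbrs[s]:
            m = []
            for xt in (0, 1):
                total = 0.0
                for xs in (0, 1):
                    incoming = 1.0
                    for u in nbrs[s]:
                        if u != t:          # exclude the recipient's own message
                            incoming *= msgs[(u, s)][xs]
                    total += unary[s][xs] * pair[xs][xt] * incoming
                m.append(total)
            z = m[0] + m[1]
            new[(s, t)] = [m[0] / z, m[1] / z]   # normalize for stability
    return new

msgs = {(s, t): [0.5, 0.5] for s in nbrs for t in nbrs[s]}
for _ in range(100):
    msgs = update(msgs)

def belief(s):
    b = [unary[s][x] for x in (0, 1)]
    for u in nbrs[s]:
        b = [b[x] * msgs[(u, s)][x] for x in (0, 1)]
    z = b[0] + b[1]
    return [b[0] / z, b[1] / z]
```

Here the iteration converges and the bias at node 0 propagates around the loop, but on loopy graphs neither convergence nor exactness is guaranteed in general.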
A Variational Perspective
• Reformulate the problem: within a family of “tractable” distributions, find the best tractable approximation Q to the true distribution P.
• Find Q to minimize the divergence.
• Desired traits:
– Simple enough to enable easy computation
– Complex enough to represent P
Choose an Approximating Family
e.g. Fully factored:  Q(X1, X2, …, Xn) = f(X1) f(X2) ⋯ f(Xn)
Structured:  Q factorizes over a tractable subgraph (e.g., a tree)
Choose a Divergence Measure
• Kullback–Leibler divergence:  KL(P ‖ Q) = Σ_x P(x) log [P(x) / Q(x)]
• Alpha divergence:  D_α(P ‖ Q) = (1 / (α(1−α))) (1 − Σ_x P(x)^α Q(x)^{1−α})
Common choices: α → 0 recovers KL(Q ‖ P); α → 1 recovers KL(P ‖ Q).
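The limiting behavior is easy to verify numerically. A minimal sketch for discrete distributions (the two example distributions are arbitrary):

```python
import math

def kl(p, q):
    # KL(P || Q) for normalized discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def alpha_div(p, q, a):
    # Alpha-divergence for normalized discrete distributions,
    # defined here for a not in {0, 1}.
    s = sum(pi ** a * qi ** (1 - a) for pi, qi in zip(p, q))
    return (1.0 - s) / (a * (1.0 - a))

p = [0.7, 0.2, 0.1]   # illustrative distributions
q = [0.4, 0.4, 0.2]
```

Evaluating `alpha_div` near α = 1 approaches `kl(p, q)`, and near α = 0 it approaches `kl(q, p)`, matching the common choices above.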
Behavior of α-Divergence
Source: T. Minka. Divergence measures and message passing. Technical Report MSR-TR-2005-173, Microsoft Research, 2005.
Resulting Algorithms
Assuming a fully-factored form of Q, we get…*
• Mean field, α = 0• Belief propagation, α = 1• Tree-reweighted BP, α ≥ 1
* By minimizing “local divergence” with a fully factored Q(X1, X2, …, Xn) = f(X1) f(X2) ⋯ f(Xn)
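The mean-field case (α = 0) can be sketched as coordinate ascent on a fully factored Q. The three-variable binary model below is invented for illustration; the resulting objective (the ELBO) always lower-bounds log Z:

```python
import itertools
import math

# Illustrative pairwise binary MRF: log f(x) = sum_i th[i] x_i + sum_ij J x_i x_j.
th = [0.5, -0.2, 0.3]
J = {(0, 1): 0.8, (1, 2): -0.5}

def logf(x):
    e = sum(th[i] * x[i] for i in range(3))
    e += sum(w * x[i] * x[j] for (i, j), w in J.items())
    return e

logZ = math.log(sum(math.exp(logf(x))
                    for x in itertools.product([0, 1], repeat=3)))

# Naive mean field: q[i] = Q(X_i = 1); coordinate ascent on the ELBO.
q = [0.5, 0.5, 0.5]
for _ in range(100):
    for i in range(3):
        field = th[i]
        for (a, b), w in J.items():
            if a == i:
                field += w * q[b]
            if b == i:
                field += w * q[a]
        q[i] = 1.0 / (1.0 + math.exp(-field))   # logistic update

def entropy(p):
    return -sum(v * math.log(v) + (1 - v) * math.log(1 - v) for v in p)

# ELBO = E_Q[log f] + H(Q) <= log Z for the fully factored Q.
e_logf = (sum(th[i] * q[i] for i in range(3))
          + sum(w * q[a] * q[b] for (a, b), w in J.items()))
elbo = e_logf + entropy(q)
```

The gap between `elbo` and `logZ` is exactly KL(Q ‖ P), which is what the α = 0 divergence minimizes.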
Local vs. Global Minimization
Source: T. Minka. Divergence measures and message passing. Technical Report MSR-TR-2005-173, Microsoft Research, 2005.
Applications
Sensor Localization
[Figure: sensor network with sensors A, B, C localizing one another]
Protein Side Chain Placement
[Figure: protein backbone (residues R, T, D, C, Y, G, N) with side chains to place]
Common trait: a continuous state space.
Easy Solution: Discretize!
10 × 10 bins → domain size d = 100
20 × 20 bins → domain size d = 400
Each message:  O(d²)
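The O(d²) cost comes from summing over all d source states for each of d target states. A minimal sketch of one discrete message update:

```python
def discrete_message(pairwise, incoming):
    # pairwise: d x d table f(s, t); incoming: length-d vector holding the
    # local factor at the source times the other incoming messages.
    d = len(incoming)
    # d target entries, each summing over d source states: O(d^2) work.
    return [sum(pairwise[s][t] * incoming[s] for s in range(d))
            for t in range(d)]
```

For a 2-D state discretized into 20 × 20 bins, d = 400 and each update touches 160,000 table entries; finer grids get expensive quickly.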
Particle BP
We’d like to pass “continuous messages”…
[Figure: chain over A, B, C, D with a continuous message mAB(B) from A to B]
Instead, pass discrete messages over sets of particles:
{b(i)} ~ W_B(B),  i = 1, …, N;  message values mAB({b(i)})
PBP: Computing the Messages
Re-write as an expectation:
m_ts(x_s) = ∫ f_t(x_t) f(x_s, x_t) Π_{u ∈ Γ(t)\s} m_ut(x_t) dx_t
          = E_{x_t ~ W_t} [ f_t(x_t) f(x_s, x_t) Π_{u ∈ Γ(t)\s} m_ut(x_t) / W_t(x_t) ]
Finite-sample approximation:
m_ts(x_s) ≈ (1/N) Σ_{i=1}^{N} f_t(x_t^(i)) f(x_s, x_t^(i)) Π_{u ∈ Γ(t)\s} m_ut(x_t^(i)) / W_t(x_t^(i)),   x_t^(i) ~ W_t
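For a leaf node t (no other incoming messages) with a Gaussian local factor, this importance-sampling estimate can be checked against the closed-form integral. All the densities below are assumptions made for the sketch, including the N(0, 1) proposal:

```python
import math
import random

random.seed(0)

def local(xt):
    # f_t: standard normal density (assumed).
    return math.exp(-xt * xt / 2) / math.sqrt(2 * math.pi)

def pairwise(xs, xt):
    # f(xs, xt): Gaussian-kernel pairwise factor (assumed).
    return math.exp(-(xs - xt) ** 2 / 2)

def proposal_density(xt):
    # W_t: standard normal proposal (assumed; here it matches f_t).
    return math.exp(-xt * xt / 2) / math.sqrt(2 * math.pi)

def pbp_message(xs, n=20000):
    # m_ts(xs) ~= (1/N) sum_i f_t(xt_i) f(xs, xt_i) / W_t(xt_i), xt_i ~ W_t.
    total = 0.0
    for _ in range(n):
        xt = random.gauss(0.0, 1.0)
        total += local(xt) * pairwise(xs, xt) / proposal_density(xt)
    return total / n

def exact_message(xs):
    # Closed form of the Gaussian-product integral for comparison.
    return math.exp(-xs * xs / 4) / math.sqrt(2)
```

With the proposal equal to the local factor the importance weights simplify, and the sample average converges to the exact continuous message.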
Choosing“Good” Proposals
[Figure: chain over A, B, C, D]
Proposal should “match” the integrand.
Sample from the belief:  W_t(x_t) ∝ f_t(x_t) Π_u m_ut(x_t)
Iteratively Refine Particle Sets
(1) Draw a set of particles, {xs(i)} ~ Ws(xs).
(2) Run discrete inference over the particle discretization.
(3) Adjust Ws(xs) and repeat.
[Figure: nodes Xs and Xt joined by f(xs, xt); steps (1) and (3) act on the particle sets]
Benefits of PBP
• No distributional assumptions.
• Easy accuracy/speed trade-off.
• Relies on an “embedded” discrete algorithm: belief propagation, mean field, tree-reweighted BP…
Exploring PBP: A Simple Example
Pairwise potentials depend on ||xs − xt||.
[Figure: approximate vs. exact continuous Ising model marginals for mean field PBP (α = 0), PBP (α = 1), and TRW PBP (α = 1.5)*]
* Run with 100 particles per node
A Localization Scenario
[Figures: exact marginal, PBP marginal, and tree-reweighted PBP marginal]
Estimating the Partition Function
• Mean field provides a lower bound.• Tree-reweighted BP provides an upper bound.
p(A,B,C,D) = (1/Z) f(A) f(B) f(C) f(D) f(A,B) f(B,C) f(B,D)
Z = Σ_{A,B,C,D} f(A) f(B) f(C) f(D) f(A,B) f(B,C) f(B,D)
Partition Function Bounds
Conclusions
• BP and related algorithms are useful!
• Particle BP lets you handle continuous RVs.
• Extensions to BP can work with PBP, too.
Thank You!