Belief Propagation in a Continuous World
Andrew Frank 11/02/2009
Joint work with Alex Ihler and Padhraic Smyth
Graphical Models
• Nodes represent random variables.
• Edges represent dependencies.
[Figures: example graphical models over nodes A–E]
Markov Random Fields
[Figure: a Markov random field over nodes A–E]
B ⊥ E | C, D        A ⊥ C | B
Factoring Probability Distributions
Independence relations ↔ factorization
[Figure: an MRF over A, B, C, D with edges A–B, B–C, B–D]
p(A,B,C,D) = (1/Z) f(A) f(B) f(C) f(D) f(A,B) f(B,C) f(B,D)
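This factorization is easy to check numerically. A minimal sketch, with made-up potential values (the tables below are illustrative stand-ins, not values from the talk):

```python
import itertools

# Hypothetical potential tables for four binary variables A, B, C, D.
fA = {0: 1.0, 1: 2.0}
fB = {0: 1.0, 1: 0.5}
fC = {0: 2.0, 1: 1.0}
fD = {0: 1.0, 1: 1.0}
fAB = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
fBC = fAB
fBD = fAB

def unnormalized_p(a, b, c, d):
    return (fA[a] * fB[b] * fC[c] * fD[d]
            * fAB[(a, b)] * fBC[(b, c)] * fBD[(b, d)])

# The partition function Z sums the factor product over all joint states.
Z = sum(unnormalized_p(*x) for x in itertools.product([0, 1], repeat=4))

def p(a, b, c, d):
    return unnormalized_p(a, b, c, d) / Z
```

Dividing by Z makes the factor product a proper distribution, whatever the individual tables are.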
Toy Example: A Day in Court
[Figure: tree-structured graph over A, E, W, and V]
A, E, W ∈ {“Innocent”, “Guilty”};  V ∈ {“Not guilty verdict”, “Guilty verdict”}
[Tables: pairwise factor f(A, E) over {I, G} × {I, G} (values 2 on the diagonal, 1 off it) and unary factor f(A) over {I, G} (values 3 and 1)]
Inference
• Most probable explanation:  x* = argmax_x p(x)
• Marginalization:  p(A) = Σ_{B,C,D} p(A, B, C, D)
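On a small discrete model, both inference tasks can be done by brute-force enumeration. A sketch over an arbitrary toy table (the factor values are illustrative):

```python
import itertools

# A toy unnormalized distribution over three binary variables.
f = {x: 1.0 + x[0] + 2.0 * x[1] * x[2]
     for x in itertools.product([0, 1], repeat=3)}
Z = sum(f.values())

# Most probable explanation: the joint state maximizing p(x).
mpe = max(f, key=f.get)

# Marginalization: p(X0) sums out the remaining variables.
p_x0 = {v: sum(f[x] for x in f if x[0] == v) / Z for v in (0, 1)}
```

Enumeration costs grow exponentially in the number of variables, which is exactly why message-passing algorithms matter.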
Iterative Message Updates
m_ts(x_s) = Σ_{x_t} f_t(x_t) f(x_s, x_t) Π_{u ∈ Γ(t)\s} m_ut(x_t)
Belief Propagation
[Figure: messages on the court example’s tree: mAE(E), mWE(E), mEV(V)]
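On this tree the three messages determine the verdict marginal exactly. A sketch with assumed numeric potentials (the tables are hypothetical stand-ins for the slide’s factors; state 0 = innocent, 1 = guilty):

```python
# Illustrative binary potentials for the court example.
fA = [3.0, 1.0]                      # prior on the accomplice's testimony
fW = [1.0, 1.0]                      # uninformative witness prior
agree = [[2.0, 1.0], [1.0, 2.0]]     # agreement potential, reused per edge
fAE, fWE, fEV = agree, agree, agree

def message(unary, pairwise, incoming):
    # m(target) = sum_source unary(source) * incoming(source) * pairwise(source, target)
    return [sum(unary[s] * incoming[s] * pairwise[s][t] for s in (0, 1))
            for t in (0, 1)]

mAE = message(fA, fAE, [1.0, 1.0])                     # leaf A -> E
mWE = message(fW, fWE, [1.0, 1.0])                     # leaf W -> E
mEV = message([1.0, 1.0], fEV,
              [a * w for a, w in zip(mAE, mWE)])       # E -> V

pV = [m / sum(mEV) for m in mEV]     # normalized marginal over the verdict V
```

On a tree, one inward sweep of messages like this yields exact marginals.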
Loopy BP
[Figure: a graph with a loop over A, B, C, D; messages now circulate around the cycle]
Does this work? Does it make any sense?
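In practice we simply iterate the same update to a fixed point. A sketch of loopy BP on a 4-cycle, with synchronous, normalized updates (the potentials are invented for illustration: node 0 has a unary bias toward state 0, and every edge prefers agreement):

```python
pair = [[2.0, 1.0], [1.0, 2.0]]
unary = [[2.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
nbrs = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

def update(msgs):
    new = {}
    for s in nbrs:
        for t in nbrs[s]:
            m = []
            for xt in (0, 1):
                total = 0.0
                for xs in (0, 1):
                    incoming = 1.0
                    for u in nbrs[s]:
                        if u != t:          # exclude the recipient's own message
                            incoming *= msgs[(u, s)][xs]
                    total += unary[s][xs] * pair[xs][xt] * incoming
                m.append(total)
            z = m[0] + m[1]
            new[(s, t)] = [m[0] / z, m[1] / z]   # normalize for stability
    return new

msgs = {(s, t): [0.5, 0.5] for s in nbrs for t in nbrs[s]}
for _ in range(100):
    msgs = update(msgs)

def belief(s):
    b = [unary[s][x] for x in (0, 1)]
    for u in nbrs[s]:
        b = [b[x] * msgs[(u, s)][x] for x in (0, 1)]
    z = b[0] + b[1]
    return [b[0] / z, b[1] / z]
```

Here the iteration converges and the bias at node 0 propagates around the loop, but on loopy graphs neither convergence nor exactness is guaranteed in general.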
A Variational Perspective
• Reformulate the problem: within a family of “tractable” distributions, find the best tractable approximation Q to the true distribution P.
• Find Q to minimize the divergence.
• Desired traits:
– Simple enough to enable easy computation
– Complex enough to represent P
Choose an Approximating Family
e.g. Fully factored:  Q(X1, X2, …, Xn) = f(X1) f(X2) ⋯ f(Xn)
Structured:  Q factorizes over a tractable subgraph (e.g., a tree)
Choose a Divergence Measure
• Kullback–Leibler divergence:  KL(P ‖ Q) = Σ_x P(x) log [P(x) / Q(x)]
• Alpha divergence:  D_α(P ‖ Q) = (1 / (α(1−α))) (1 − Σ_x P(x)^α Q(x)^{1−α})
Common choices: α → 0 recovers KL(Q ‖ P); α → 1 recovers KL(P ‖ Q).
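The limiting behavior is easy to verify numerically. A minimal sketch for discrete distributions (the two example distributions are arbitrary):

```python
import math

def kl(p, q):
    # KL(P || Q) for normalized discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def alpha_div(p, q, a):
    # Alpha-divergence for normalized discrete distributions,
    # defined here for a not in {0, 1}.
    s = sum(pi ** a * qi ** (1 - a) for pi, qi in zip(p, q))
    return (1.0 - s) / (a * (1.0 - a))

p = [0.7, 0.2, 0.1]   # illustrative distributions
q = [0.4, 0.4, 0.2]
```

Evaluating `alpha_div` near α = 1 approaches `kl(p, q)`, and near α = 0 it approaches `kl(q, p)`, matching the common choices above.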
Behavior of α-Divergence
Source: T. Minka. Divergence measures and message passing. Technical Report MSR-TR-2005-173, Microsoft Research, 2005.
Resulting Algorithms
Assuming a fully-factored form of Q, we get…*
• Mean field, α = 0• Belief propagation, α = 1• Tree-reweighted BP, α ≥ 1
* By minimizing “local divergence” with a fully factored Q(X1, X2, …, Xn) = f(X1) f(X2) ⋯ f(Xn)
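The mean-field case (α = 0) can be sketched as coordinate ascent on a fully factored Q. The three-variable binary model below is invented for illustration; the resulting objective (the ELBO) always lower-bounds log Z:

```python
import itertools
import math

# Illustrative pairwise binary MRF: log f(x) = sum_i th[i] x_i + sum_ij J x_i x_j.
th = [0.5, -0.2, 0.3]
J = {(0, 1): 0.8, (1, 2): -0.5}

def logf(x):
    e = sum(th[i] * x[i] for i in range(3))
    e += sum(w * x[i] * x[j] for (i, j), w in J.items())
    return e

logZ = math.log(sum(math.exp(logf(x))
                    for x in itertools.product([0, 1], repeat=3)))

# Naive mean field: q[i] = Q(X_i = 1); coordinate ascent on the ELBO.
q = [0.5, 0.5, 0.5]
for _ in range(100):
    for i in range(3):
        field = th[i]
        for (a, b), w in J.items():
            if a == i:
                field += w * q[b]
            if b == i:
                field += w * q[a]
        q[i] = 1.0 / (1.0 + math.exp(-field))   # logistic update

def entropy(p):
    return -sum(v * math.log(v) + (1 - v) * math.log(1 - v) for v in p)

# ELBO = E_Q[log f] + H(Q) <= log Z for the fully factored Q.
e_logf = (sum(th[i] * q[i] for i in range(3))
          + sum(w * q[a] * q[b] for (a, b), w in J.items()))
elbo = e_logf + entropy(q)
```

The gap between `elbo` and `logZ` is exactly KL(Q ‖ P), which is what the α = 0 divergence minimizes.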
Local vs. Global Minimization
Source: T. Minka. Divergence measures and message passing. Technical Report MSR-TR-2005-173, Microsoft Research, 2005.
Applications
Sensor Localization
[Figure: sensor network with sensors A, B, C localizing one another]
Protein Side Chain Placement
[Figure: protein backbone (residues R, T, D, C, Y, G, N) with side chains to place]
Common trait: a continuous state space.
Easy Solution: Discretize!
10 × 10 bins → domain size d = 100
20 × 20 bins → domain size d = 400
Each message:  O(d²)
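The O(d²) cost comes from summing over all d source states for each of d target states. A minimal sketch of one discrete message update:

```python
def discrete_message(pairwise, incoming):
    # pairwise: d x d table f(s, t); incoming: length-d vector holding the
    # local factor at the source times the other incoming messages.
    d = len(incoming)
    # d target entries, each summing over d source states: O(d^2) work.
    return [sum(pairwise[s][t] * incoming[s] for s in range(d))
            for t in range(d)]
```

For a 2-D state discretized into 20 × 20 bins, d = 400 and each update touches 160,000 table entries; finer grids get expensive quickly.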
Particle BP
We’d like to pass “continuous messages”…
[Figure: chain over A, B, C, D with a continuous message mAB(B) from A to B]
Instead, pass discrete messages over sets of particles:
{b(i)} ~ W_B(B),  i = 1, …, N;  message values mAB({b(i)})
PBP: Computing the Messages
Re-write as an expectation:
m_ts(x_s) = ∫ f_t(x_t) f(x_s, x_t) Π_{u ∈ Γ(t)\s} m_ut(x_t) dx_t
          = E_{x_t ~ W_t} [ f_t(x_t) f(x_s, x_t) Π_{u ∈ Γ(t)\s} m_ut(x_t) / W_t(x_t) ]
Finite-sample approximation:
m_ts(x_s) ≈ (1/N) Σ_{i=1}^{N} f_t(x_t^(i)) f(x_s, x_t^(i)) Π_{u ∈ Γ(t)\s} m_ut(x_t^(i)) / W_t(x_t^(i)),   x_t^(i) ~ W_t
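For a leaf node t (no other incoming messages) with a Gaussian local factor, this importance-sampling estimate can be checked against the closed-form integral. All the densities below are assumptions made for the sketch, including the N(0, 1) proposal:

```python
import math
import random

random.seed(0)

def local(xt):
    # f_t: standard normal density (assumed).
    return math.exp(-xt * xt / 2) / math.sqrt(2 * math.pi)

def pairwise(xs, xt):
    # f(xs, xt): Gaussian-kernel pairwise factor (assumed).
    return math.exp(-(xs - xt) ** 2 / 2)

def proposal_density(xt):
    # W_t: standard normal proposal (assumed; here it matches f_t).
    return math.exp(-xt * xt / 2) / math.sqrt(2 * math.pi)

def pbp_message(xs, n=20000):
    # m_ts(xs) ~= (1/N) sum_i f_t(xt_i) f(xs, xt_i) / W_t(xt_i), xt_i ~ W_t.
    total = 0.0
    for _ in range(n):
        xt = random.gauss(0.0, 1.0)
        total += local(xt) * pairwise(xs, xt) / proposal_density(xt)
    return total / n

def exact_message(xs):
    # Closed form of the Gaussian-product integral for comparison.
    return math.exp(-xs * xs / 4) / math.sqrt(2)
```

With the proposal equal to the local factor the importance weights simplify, and the sample average converges to the exact continuous message.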
Choosing“Good” Proposals
[Figure: chain over A, B, C, D]
Proposal should “match” the integrand.
Sample from the belief:  W_t(x_t) ∝ f_t(x_t) Π_u m_ut(x_t)
Iteratively Refine Particle Sets
(1) Draw a set of particles, {xs(i)} ~ Ws(xs).
(2) Run discrete inference over the particle discretization.
(3) Adjust Ws(xs) and repeat.
[Figure: nodes Xs and Xt joined by f(xs, xt); steps (1) and (3) act on the particle sets]
Benefits of PBP
• No distributional assumptions.
• Easy accuracy/speed trade-off.
• Relies on an “embedded” discrete algorithm: belief propagation, mean field, tree-reweighted BP…
Exploring PBP: A Simple Example
Pairwise potentials depend on ||xs − xt||.
[Figure: approximate vs. exact continuous Ising model marginals for mean field PBP (α = 0), PBP (α = 1), and TRW PBP (α = 1.5)*]
* Run with 100 particles per node
A Localization Scenario
[Figures: exact marginal, PBP marginal, and tree-reweighted PBP marginal]
Estimating the Partition Function
• Mean field provides a lower bound.• Tree-reweighted BP provides an upper bound.
p(A,B,C,D) = (1/Z) f(A) f(B) f(C) f(D) f(A,B) f(B,C) f(B,D)
Z = Σ_{A,B,C,D} f(A) f(B) f(C) f(D) f(A,B) f(B,C) f(B,D)
Partition Function Bounds
Conclusions
• BP and related algorithms are useful!
• Particle BP lets you handle continuous RVs.
• Extensions to BP can work with PBP, too.
Thank You!