
  • Probabilistic Graphical Models

    Lecture 2: Graphical Models. Belief Propagation

    Volkan Cevher, Matthias Seeger
    Ecole Polytechnique Fédérale de Lausanne

    30/9/2011

    (EPFL) Graphical Models 30/9/2011 1 / 30

  • Outline

    1 Graphical Models

    2 Belief Propagation

    (EPFL) Graphical Models 30/9/2011 2 / 30

  • Graphical Models

    Literature

    Excellent book about graphical models and belief propagation, written by one of the pioneers in these topics:

    Pearl, J. Probabilistic Reasoning in Intelligent Systems (1990)

    (EPFL) Graphical Models 30/9/2011 3 / 30

  • Graphical Models

    The Need to Factorize

    Variables x1, x2, . . . , xn

    P(x_1) = \sum_{x_2} \cdots \sum_{x_n} P(x_1, x_2, \ldots, x_n)

    Marginalization: Exponential time
    Storage: Exponential space ⇒ Need factorization

    Independence? But probabilistic modelling is about dependencies!
    Conditional independence: Dependencies may have simple structure

    (EPFL) Graphical Models 30/9/2011 4 / 30

  • Graphical Models

    Towards Bayesian Networks: Tracking a fly

    Path pretty random
    Positions not independent
    But conditionally independent (Markovian)

    Remember

    P(x_1, \ldots, x_n) = P(x_1) P(x_2 \mid x_1) \cdots P(x_n \mid x_{n-1}, \ldots, x_1) ?

    Here: P(x_n \mid x_{n-1}, \ldots, x_1) = P(x_n \mid x_{n-1}) ⇒ Linear storage
    Causal factorization ⇒ Bayesian networks

    (EPFL) Graphical Models 30/9/2011 5 / 30

  • Graphical Models

    Bayesian Networks (Directed Graphical Models)
    Causal factorization:

    P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_{\pi_i})

    Bayesian network (aka directed graphical model, aka causal network):

    Graphical representation of ancestry [DAG]
    P(x_i \mid x_{\pi_i}): Conditional probability table (CPT)

    Conditional Independence
    A ⊥ B | C ⇔ P(A, B | C) = P(A | C) P(B | C) ⇔ P(A | C, B) = P(A | C)

    (EPFL) Graphical Models 30/9/2011 6 / 30
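
    To make the causal factorization concrete, here is a minimal Python sketch (not from the slides): the toy chain x1 → x2 → x3, its CPT numbers, and the helper joint() are illustrative assumptions.

    from itertools import product

    # Hypothetical toy network over binary variables: x1 -> x2 -> x3.
    # Each CPT maps a tuple of parent values to a distribution over the child.
    cpt = {
        "x1": {(): {0: 0.7, 1: 0.3}},                            # P(x1)
        "x2": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}},  # P(x2 | x1)
        "x3": {(0,): {0: 0.8, 1: 0.2}, (1,): {0: 0.4, 1: 0.6}},  # P(x3 | x2)
    }
    parents = {"x1": (), "x2": ("x1",), "x3": ("x2",)}

    def joint(assignment):
        """Causal factorization: P(x) = prod_i P(x_i | x_parents(i))."""
        p = 1.0
        for var in ("x1", "x2", "x3"):
            pa = tuple(assignment[q] for q in parents[var])
            p *= cpt[var][pa][assignment[var]]
        return p

    # Brute-force marginal P(x1 = 1) by summing out x2, x3 (exponential in general).
    p_x1 = sum(joint({"x1": 1, "x2": a, "x3": b}) for a, b in product((0, 1), repeat=2))
    print(p_x1)  # 0.3: summing out the descendants recovers the prior on x1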

  • Graphical Models

    Did It Rain Tonight?

    (EPFL) Graphical Models 30/9/2011 7 / 30

  • Graphical Models

    Monty Hall Problem

    Let's make a deal!
    Door with car (hidden)
    First choice of yours (remains closed)
    Host opens door with goat, H ≠ F, D
    Do you switch?

    "Intuition": Fifty-fifty. F, H give no information. He would be stupid, wouldn't he?

    (EPFL) Graphical Models 30/9/2011 8 / 30

  • Graphical Models

    Winning with Bayes (I)

    [Figure: Bayesian network with nodes D (door with car), F (first choice), H (host opens), I (D = F?)]

    Intuition "H does not tell anything" correct in principle. But about what?
    Add latent I = I{D=F} = I{first choice correct}
    Gut feeling: F, H no information about I.
    "He will not tell me whether I am correct". P(I | F, H) = P(I).
    Will use Bayes to see that.

    OK, but P(Switch wins) = P(I = 0 | F, H) = P(I = 0) = 2/3!

    Bayes makes you switch and double your chance of winning!

    (EPFL) Graphical Models 30/9/2011 9 / 30

  • Graphical Models

    Winning with Bayes (II)

    [Figure: Bayesian network with nodes D (door with car), F (first choice), H (host opens), I (D = F?)]

    To show: P(I | H, F) = P(I).
    P(I | F) = P(I), because D, F independent.
    P(I | H, F) = P(I | F) ⇔ I ⊥ H | F ⇔ H ⊥ I | F ⇔ P(H | I, F) = P(H | F)
    (independence is symmetric)

    P(H | F, I = 1) = (1/2) I{H ≠ F}    If F = D, host picks random goat
    P(H | F, I = 0) = (1/2) I{H ≠ F}    D, F independent, and H ≠ D, F

    Working with Graphical Models
    Intermediate between lots of head scratching and doing all sums
    Powerful division of inference into manageable, local steps

    (EPFL) Graphical Models 30/9/2011 10 / 30
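
    The posterior argument above can be checked by brute-force enumeration. The sketch below is illustrative (not from the slides); door labels and the observed pair (F, H) are arbitrary assumptions.

    from itertools import product
    from collections import Counter

    # Enumerate door-with-car D, first choice F, and the host's door H,
    # weight each outcome by its probability, then condition on (F, H).
    doors = (1, 2, 3)
    weights = Counter()
    for D, F in product(doors, doors):            # D, F independent and uniform
        goats = [h for h in doors if h != F and h != D]
        for H in goats:                           # host opens a goat door != F
            weights[(D, F, H)] += (1 / 3) * (1 / 3) * (1 / len(goats))

    # Condition on one observation, e.g. F = 1, H = 3.
    obs = {k: w for k, w in weights.items() if k[1] == 1 and k[2] == 3}
    z = sum(obs.values())
    p_switch_wins = sum(w for (D, F, H), w in obs.items() if D != F) / z
    print(p_switch_wins)  # 0.666..., i.e. P(I = 0 | F, H) = 2/3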

  • Graphical Models

    Why Graphical Models?

    1 Easy way of communicating ideas about dependencies, models
    2 Precise semantics: Conditional independence constraints on distributions. Efficient algorithms for testing these
    3 Lead to large savings in computations (belief propagation)

    (EPFL) Graphical Models 30/9/2011 11 / 30

  • Graphical Models

    Graphical Models in Practice
    Dependency structures, and efficient ways to propagate information or constraints, are fundamental.

    Coding / Information Theory
    LDPC codes and BP decoding revolutionized this field (resurrection of Gallager codes)
    Used from deep space communication (Mars rovers) over satellite transmission to CD players / hard drives

    Courtesy MacKay: Information Theory . . . (2003)

    (EPFL) Graphical Models 30/9/2011 12 / 30

  • Graphical Models

    Graphical Models in Practice
    Dependency structures, and efficient ways to propagate information or constraints, are fundamental.

    Expert systems done right
    QMR-DT: Invert causal network for helping medical diagnoses
    Hugin: Advanced decision support (Lauritzen)

    http://www.hugin.com/

    Promedas: Medical diagnostic advisory system (SNN Nijmegen)

    http://www.promedas.nl/

    (EPFL) Graphical Models 30/9/2011 12 / 30

  • Graphical Models

    Graphical Models in Practice
    Dependency structures, and efficient ways to propagate information or constraints, are fundamental.

    Computer Vision: Markov Random Fields

    Denoising, super-resolution, restoration (early work by Besag)
    Depth / reconstruction from stereo, matching, correspondences
    Segmentation, matting, blending, stitching, inpainting, . . .

    Courtesy MSR

    (EPFL) Graphical Models 30/9/2011 12 / 30

  • Graphical Models

    Conditional Independence Semantics

    Graphical model formally equivalent to long (finite) list of conditional independence constraints:
    x_{A_1} ⊥ x_{B_1} | x_{C_1}, x_{A_2} ⊥ x_{B_2} | x_{C_2}, . . . Which do you prefer?
    Graphs not just simpler for us:
    Linear-time algorithm to test such constraints (Bayes ball)
    Distribution consistent with graph iff all CI constraints are met.
    P(x1) P(x2) . . . P(xn): Consistent with all graphs
    How do I see whether x_A ⊥ x_B | x_C from the graph?
    Graph separation: If paths A ↔ B blocked by C
    For Bayesian networks (directed graphical models): d-separation.
    ⇒ You'll find out in the exercises!

    (EPFL) Graphical Models 30/9/2011 13 / 30
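
    As a sanity check on the graph-separation idea, here is an illustrative Python snippet (not from the slides): for a randomly parameterized chain x1 → x2 → x3, node x2 blocks the only path, so x1 ⊥ x3 | x2 must hold numerically.

    import numpy as np

    rng = np.random.default_rng(0)
    p1 = rng.dirichlet(np.ones(2))                 # P(x1)
    p2_g1 = rng.dirichlet(np.ones(2), size=2)      # P(x2 | x1), one row per x1
    p3_g2 = rng.dirichlet(np.ones(2), size=2)      # P(x3 | x2), one row per x2

    joint = np.einsum("a,ab,bc->abc", p1, p2_g1, p3_g2)   # P(x1, x2, x3)

    # P(x1, x3 | x2) should factor as P(x1 | x2) P(x3 | x2) for every x2.
    p12 = joint.sum(axis=2)                        # P(x1, x2)
    p2 = p12.sum(axis=0)                           # P(x2)
    for b in range(2):
        cond = joint[:, b, :] / p2[b]              # P(x1, x3 | x2 = b)
        outer = np.outer(p12[:, b] / p2[b], joint[:, b, :].sum(axis=0) / p2[b])
        assert np.allclose(cond, outer)
    print("x1 ⊥ x3 | x2 holds for this chain")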

  • Graphical Models

    Undirected Graphical Models (Markov Random Fields)

    Bayesian Networks: Describe CIs with directed graphs (DAGs)
    Markov Random Fields: Describe CIs with undirected graphs
    CI semantics of undirected models: Really just graph separation

    (EPFL) Graphical Models 30/9/2011 14 / 30

  • Graphical Models

    Undirected Graphical Models (II)

    Why two frameworks?
    Each can capture setups the other cannot
    More important: In practice, some problems are much easier to parameterize (therefore: to learn) as MRFs, others much easier as Bayes nets

    What do distributions P for an MRF graph G look like?
    Hammersley / Clifford:
    Maximal cliques (completely connected parts) C_j of G
    P(x) consistent with MRF G ⇔

    P(x) = Z^{-1} \prod_j \phi_j(x_{C_j}), \quad Z := \sum_x \prod_j \phi_j(x_{C_j})

    with potentials \phi_j(x_{C_j}) ≥ 0. Z: Partition function.
    Potentials need not normalize to 1

    (EPFL) Graphical Models 30/9/2011 15 / 30
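
    A minimal numerical sketch of the Hammersley/Clifford form (illustrative; the pairwise chain, potential tables, and variable count are assumptions, not from the slides):

    import itertools
    import numpy as np

    # Tiny pairwise MRF on binary variables x1 - x2 - x3 with potentials on the
    # maximal cliques {x1, x2} and {x2, x3}, evaluated by brute-force enumeration.
    phi_12 = np.array([[2.0, 1.0],
                       [1.0, 2.0]])   # phi_1(x1, x2) >= 0, need not normalize
    phi_23 = np.array([[1.0, 3.0],
                       [3.0, 1.0]])   # phi_2(x2, x3) >= 0

    def unnormalized(x1, x2, x3):
        return phi_12[x1, x2] * phi_23[x2, x3]

    # Partition function Z = sum_x prod_j phi_j(x_Cj)
    Z = sum(unnormalized(*x) for x in itertools.product((0, 1), repeat=3))
    P = {x: unnormalized(*x) / Z for x in itertools.product((0, 1), repeat=3)}
    print(Z, sum(P.values()))  # probabilities sum to 1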

  • Graphical Models

    Undirected Graphical Models (III)

    (EPFL) Graphical Models 30/9/2011 16 / 30

  • Graphical Models

    Directed vs. Undirected

    Sampling x ∼ P(x):
    Always simple from Bayes net. Can be very hard for an MRF
    Implicit, symmetrical knowledge? Little idea about causal links (pixels of image, correspondences)? MRFs more useful then
    Bottom line: Usually, one or the other is much more suitable. Better know well about both!

    (EPFL) Graphical Models 30/9/2011 17 / 30

  • Belief Propagation

    Towards Efficient Marginalization

    With sufficient Markovian CI constraints (directed or undirected):

    P(x_1, \ldots, x_n) \propto \prod_j \phi_j(x_{N_j}), \quad |N_j| ≪ n

    Can store that. But what about computation?
    Short answer: It depends on global graph structure properties, beyond local factorization

    Storage: Linear in n
    Computation: Exponential in n^{1/2} [P ≠ NP]

    (EPFL) Graphical Models 30/9/2011 18 / 30

  • Belief Propagation

    Node Elimination

    Chain:

    P(x_1, \ldots, x_7) = \phi_1(x_1, x_2) \phi_2(x_2, x_3) \cdots \phi_6(x_6, x_7)

    \sum_{x_4} P(x_1, \ldots, x_4, \ldots, x_7)
      = \phi_1(x_1, x_2) \phi_2(x_2, x_3) \left( \sum_{x_4} \phi_3(x_3, x_4) \phi_4(x_4, x_5) \right) \phi_5(x_5, x_6) \phi_6(x_6, x_7)
      = \phi_1(x_1, x_2) \phi_2(x_2, x_3) M_{35}(x_3, x_5) \phi_5(x_5, x_6) \phi_6(x_6, x_7)

    (EPFL) Graphical Models 30/9/2011 19 / 30
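
    The elimination step above is just a small tensor contraction. Illustrative Python sketch (state count, random potentials, and the checked configuration are assumptions, not from the slides):

    import numpy as np

    # Eliminating node x4 from the chain phi_1(x1,x2) ... phi_6(x6,x7): the sum
    # over x4 touches only phi_3 and phi_4 and produces a new potential M_35(x3, x5).
    K = 3  # states per variable
    rng = np.random.default_rng(0)
    phi = [rng.uniform(0.1, 1.0, size=(K, K)) for _ in range(6)]  # phi[i] = phi_{i+1}

    # M_35(x3, x5) = sum_{x4} phi_3(x3, x4) phi_4(x4, x5): a matrix product.
    M_35 = phi[2] @ phi[3]

    # Sanity check against the explicit sum on one configuration (x3 = 1, x5 = 2).
    brute = sum(phi[2][1, x4] * phi[3][x4, 2] for x4 in range(K))
    print(np.isclose(M_35[1, 2], brute))  # True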

  • Belief Propagation

    Tree Graphs

    (EPFL) Graphical Models 30/9/2011 20 / 30

  • Belief Propagation

    Factor Graphs

    Factor graphs: Yet another type of graphical model

    Bipartite graph: variable / factor nodes
    No probability semantics
    Just for deriving Markovian propagation algorithms
    Factor graph = tree ⇒ Fast computation

    Undirected GM → Factor graph: Immediate
    Directed GM → Factor graph: Easy exercise

    (EPFL) Graphical Models 30/9/2011 21 / 30

  • Belief Propagation

    Towards Belief Propagation

    (EPFL) Graphical Models 30/9/2011 22 / 30

  • Belief Propagation

    What is a Message?

    (EPFL) Graphical Models 30/9/2011 23 / 30

  • Belief Propagation

    What is a Message?

    Formally: Directed potential over one variable
    Intuition: Message T_2 → a: What T_2 thinks x_a should be
    Naive "definition":
    Product: All of T_2, and edge → a
    Sum: All except x_a
    ⇒ Real definition recursive (G tree!)

    Subtle points:
    Messages: Not conditional / marginal distributions of P.
    Message µ_{T_2 → a}(x_a) has seen T_2 only
    Strictly speaking: Two types of messages: variable → factor, factor → variable
    ⇒ Understand the idea behind the formalities

    (EPFL) Graphical Models 30/9/2011 24 / 30

  • Belief Propagation

    Message Passing: The Recipe
    Message µ_{T → a}(x_a) to a

    1 Expand factor: (T_1, b_1), (T_2, b_2)
    2 Product: Gather potentials
      \phi_j(x_a, x_{b_1}, x_{b_2})
      All µ_{? → b_1}(x_{b_1}), except the one from \phi_j; µ_{? → b_2}(x_{b_2}) ditto
    3 Sum: Over x_{b_1}, x_{b_2}

    \mu_{T \to a}(x_a) \propto \sum_{x_{b_1}, x_{b_2}} \phi_j(x_a, x_{b_1}, x_{b_2}) \left( \prod_{\tilde{T}: T_1 \setminus b_1} \mu_{\tilde{T} \to b_1}(x_{b_1}) \right) \left( \prod_{\tilde{T}: T_2 \setminus b_2} \mu_{\tilde{T} \to b_2}(x_{b_2}) \right)

    (EPFL) Graphical Models 30/9/2011 25 / 30
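
    The product-then-sum recipe is one line of NumPy. Illustrative sketch (state counts, random tables, and variable names are assumptions, not from the slides):

    import numpy as np

    # A factor phi_j(x_a, x_b1, x_b2) sends a message to variable a, after
    # gathering the incoming variable-to-factor messages at b1 and b2.
    Ka, Kb1, Kb2 = 2, 3, 4
    rng = np.random.default_rng(1)
    phi_j = rng.uniform(size=(Ka, Kb1, Kb2))

    mu_b1 = rng.uniform(size=Kb1)   # product of all messages into b1 except from phi_j
    mu_b2 = rng.uniform(size=Kb2)   # ditto for b2

    # Product step (predict), then sum step (marginalize over x_b1, x_b2).
    msg_to_a = np.einsum("abc,b,c->a", phi_j, mu_b1, mu_b2)
    msg_to_a /= msg_to_a.sum()      # messages may be normalized at will
    print(msg_to_a)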

  • Belief Propagation

    Message Passing: The Idea

    I can never remember these message passing equations. What I remember:

    Messages are partial information, given part of graph
    Message passing is information propagation
    Product: Predict
    Sum: Marginalize (cover your tracks)
    Directed like filtering ⇒ We'll see cases where ∏ is not ∏, and ∑ is not ∑
    Marginal distributions (our goal!) are obtained by combining messages ↔ combining information from all parts
    MP works on trees, because information cannot go around in cycles

    (EPFL) Graphical Models 30/9/2011 26 / 30

  • Belief Propagation

    Belief Propagation: More than Node Elimination

    Marginalization by message passing:

    P(x_a) = \sum_{x \setminus x_a} P(x) \propto \phi_a(x_a) \prod_{j \in N_a} \mu_{T_j \to a}(x_a)

    N_a: Factor nodes neighbouring a ↔ factors \phi_j(x_a, . . . )
    \phi_a: Can be ≡ 1
    All marginals P(x1), P(x2), . . . ? Do this n times. Right?
    ⇒ NO! Do this twice only!
    ⇒ If you understand that, you've understood belief propagation

    Belief Propagation on Trees
    Message uniquely defined, independent of use, order of computation
    Message can be computed once all inputs received. Once computed, it does not change anymore
    Compute all messages (2 per edge) ⇒ All marginals, O(1) each

    (EPFL) Graphical Models 30/9/2011 27 / 30
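
    The "two passes give every marginal" point can be seen on a small chain. Illustrative sketch (chain length, state count, and random potentials are assumptions, not from the slides):

    import numpy as np
    from itertools import product

    # Chain x1 - x2 - x3 - x4 with pairwise potentials phi_i(x_i, x_{i+1}).
    K = 3
    rng = np.random.default_rng(3)
    phi = [rng.uniform(0.1, 1.0, size=(K, K)) for _ in range(3)]

    # Inward (left-to-right) and outward (right-to-left) messages: two passes.
    fwd = [np.ones(K)]                      # message arriving at x1 from the left
    for p in phi:
        fwd.append(fwd[-1] @ p)             # sum_{x_i} mu(x_i) phi(x_i, x_{i+1})
    bwd = [np.ones(K)]
    for p in reversed(phi):
        bwd.append(p @ bwd[-1])
    bwd = bwd[::-1]

    # Marginal at each node: product of its two incoming messages, renormalized.
    marginals = [f * b / np.sum(f * b) for f, b in zip(fwd, bwd)]

    # Brute-force check of P(x1).
    joint = np.zeros((K,) * 4)
    for x in product(range(K), repeat=4):
        joint[x] = phi[0][x[0], x[1]] * phi[1][x[1], x[2]] * phi[2][x[2], x[3]]
    joint /= joint.sum()
    print(np.allclose(marginals[0], joint.sum(axis=(1, 2, 3))))  # True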

  • Belief Propagation

    Implementation of Belief Propagation

    Belief Propagation (Sum-Product) on Trees
    1 Designate node (any will do!) as root
    2 Inward pass: Compute messages leaves → root
    3 Outward pass: Compute messages root → leaves

    Messages can be normalized at will:

    \mu_{T \to a}(x_a) = C \sum_{x_{C_j \setminus a}} \phi_j(x_{C_j}) \prod \mu_{\tilde{T} \to b_1}(x_{b_1}) \cdots

    Avoiding underflow / overflow (yes, it does matter):
    Renormalize each message to sum to 1
    Better: Work in log domain (log-messages, log-potentials):
    ∏ → +, ∑ → logsumexp [careful with zeros!]

    \mathrm{logsumexp}(v) := \log \sum_{i=1}^{k} e^{v_i} = M + \log \sum_{i=1}^{k} e^{v_i - M}, \quad M = \max_i v_i   (the shifted sum is numerically stable)

    (EPFL) Graphical Models 30/9/2011 28 / 30
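
    A direct Python transcription of the logsumexp trick above (sketch; the guard for all-zero inputs is an added assumption):

    import numpy as np

    def logsumexp(v):
        """log(sum(exp(v))) computed stably by shifting with M = max(v)."""
        v = np.asarray(v, dtype=float)
        M = np.max(v)
        if np.isneginf(M):                # all inputs are log(0): result is log(0)
            return -np.inf
        return M + np.log(np.sum(np.exp(v - M)))

    print(logsumexp([-1000.0, -1001.0]))              # about -999.69, no underflow
    print(np.log(np.exp(-1000.0) + np.exp(-1001.0)))  # -inf: the naive version underflows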

  • Belief Propagation

    Searching for the Mode: Max-Product

    Decoding:

    x^* \in \operatorname{argmax}_x P(x)

    max, ∏: Same decomposition as ∑, ∏.
    Better: max, ∑ in log domain

    Max-messages:

    \mu_{T \to a}(x_a) = \max_{x_{C_j \setminus a}} \left( \log \phi_j(x_{C_j}) + \sum_{b \in C_j \setminus a} \mu_{T_b \to b}(x_b) \right)

    Back-pointer tables:

    \delta_{T \to a}(x_a) \in \operatorname{argmax}_{x_{C_j \setminus a}} \left( \log \phi_j(x_{C_j}) + \sum_{b \in C_j \setminus a} \mu_{T_b \to b}(x_b) \right)

    (EPFL) Graphical Models 30/9/2011 29 / 30
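
    On a chain, max-sum with back-pointers is the familiar Viterbi recursion. Illustrative sketch (chain length, state count, and random log-potentials are assumptions, not from the slides):

    import numpy as np

    # Chain x1 - x2 - x3 with pairwise log-potentials.
    K = 3
    rng = np.random.default_rng(2)
    log_phi = [np.log(rng.uniform(0.1, 1.0, size=(K, K))) for _ in range(2)]

    # Forward max-messages and back-pointer tables along the chain.
    msg = np.zeros(K)                  # log-message into x1 (empty subtree)
    back = []
    for lp in log_phi:                 # eliminate x1, then x2
        scores = lp + msg[:, None]     # scores[prev, cur]
        back.append(np.argmax(scores, axis=0))
        msg = np.max(scores, axis=0)

    # Decode: best last state, then follow the back-pointers.
    x = [int(np.argmax(msg))]
    for bp in reversed(back):
        x.append(int(bp[x[-1]]))
    x.reverse()
    print(x, msg.max())                # argmax configuration and its log-score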

  • Belief Propagation

    Wrap-Up

    Belief propagation (sum-product) on trees:
    All marginals in linear time, by local information propagation
    Max-product, max-sum, logsumexp-sum, . . . : What matters is the graph!

    What about general graphs?
    Decomposable graphs. Treewidth of a graph
    Junction tree algorithm

    Interested?
    PMR Edinburgh slides: http://www.inf.ed.ac.uk/teaching/courses/pmr/slides/jta-2x2.pdf
    Lauritzen, S.; Spiegelhalter, D. Local Computations with Probabilities on Graphical Structures and their Application to Expert Systems. JRSS-B, 50: 157-224 (1988)

    Beware (not surprising): Inference on general graphs is NP hard. In general, approximations are a must

    (EPFL) Graphical Models 30/9/2011 30 / 30
