Algorithms for Reasoning with Probabilistic Graphical Models International Summer School on Deep Learning July 2017 Prof. Rina Dechter Prof. Alexander Ihler


Page 1:

Algorithms for Reasoning with Probabilistic Graphical Models
International Summer School on Deep Learning, July 2017
Prof. Rina Dechter and Prof. Alexander Ihler

Page 2:

Outline of Lectures
• Class 1: Introduction and Inference
• Class 2: Search
• Class 3: Variational Methods and Monte-Carlo Sampling

Dechter & Ihler, DeepLearn 2017

[Figure: a primal graph over variables A–M, its tree decomposition with clusters ABC, BDEF, DGF, EFH, FHK, HJ, KLM, and the context minimal AND/OR search graph for a small model over variables A–F.]

Page 3:

RoadMap: Introduction and Inference
• Basics of graphical models
  – Queries
  – Examples, applications, and tasks
  – Algorithms overview
• Inference algorithms, exact
  – Bucket elimination for trees
  – Bucket elimination
  – Jointree clustering
  – Elimination orders
• Approximate elimination
  – Decomposition bounds
  – Mini-bucket & weighted mini-bucket
  – Belief propagation
• Summary and Class 2


Page 4: RoadMap: Introduction and Inference

Page 5:

Graphical models
• Describe structure in large problems
  – Large, complex system
  – Made of “smaller”, “local” interactions
  – Complexity emerges through interdependence

Page 6:

Graphical models
• Examples & Tasks
  – Maximization (MAP): compute the most probable configuration [Yanover & Weiss 2002]

Page 7:

Graphical models
• Examples & Tasks
  – Summation & marginalization: given an observation y, compute the marginals p(xi | y) and the “partition function”. e.g., [Plath et al. 2009]

[Figure: image-labeling examples with region labels such as grass, plane, sky, cow.]

Page 8:

Graphical models
• Examples & Tasks
  – Mixed inference (marginal MAP, MEU, …)

Influence diagrams & optimal decision-making (the “oil wildcatter” problem): decision variables Test, Drill, Oil sale policy; chance variables Test result, Seismic structure, Oil underground, Oil produced, Market information; cost/utility nodes Test cost, Drill cost, Sales cost, Oil sales. e.g., [Raiffa 1968; Shachter 1986]

Page 9:

Graphical models
A graphical model consists of:
  – variables
  – domains (we’ll assume discrete)
  – functions or “factors”
and a combination operator.

The combination operator defines an overall function from the individual factors, e.g., “+”.

Notation:
  – Discrete values of Xi are called states.
  – A tuple or configuration is the set of states taken by a set of variables.
  – The scope of f is the set of variables that are arguments to the factor f; we often index factors by their scope, e.g., f(A,B) has scope {A, B}.

Page 10:

Graphical models
Example: combining two factors with “+”, f(A,B) + f(B,C) = f(A,B,C), so that e.g. f(0,0,0) = 6 + 6 = 12.

A B | f(A,B)      B C | f(B,C)
0 0 | 6           0 0 | 6
0 1 | 0           0 1 | 0
1 0 | 0           1 0 | 0
1 1 | 6           1 1 | 6

A B C | f(A,B,C)
0 0 0 | 12
0 0 1 | 6
0 1 0 | 0
0 1 1 | 6
1 0 0 | 6
1 0 1 | 0
1 1 0 | 6
1 1 1 | 12

For discrete variables, think of functions as “tables” (though we might represent them more efficiently).
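The table combination above can be sketched generically in Python. The `combine` helper and the dictionary encoding of tables are illustrative assumptions, not the authors' implementation:

```python
from itertools import product

def combine(f, g, scope_f, scope_g, op, domains):
    """Combine two factors under a combination operator (here '+')."""
    scope = scope_f + [v for v in scope_g if v not in scope_f]
    out = {}
    for tup in product(*(domains[v] for v in scope)):
        assign = dict(zip(scope, tup))
        out[tup] = op(f[tuple(assign[v] for v in scope_f)],
                      g[tuple(assign[v] for v in scope_g)])
    return scope, out

domains = {'A': [0, 1], 'B': [0, 1], 'C': [0, 1]}
fAB = {(0, 0): 6, (0, 1): 0, (1, 0): 0, (1, 1): 6}
fBC = {(0, 0): 6, (0, 1): 0, (1, 0): 0, (1, 1): 6}
scope, fABC = combine(fAB, fBC, ['A', 'B'], ['B', 'C'],
                      lambda x, y: x + y, domains)
print(fABC[(0, 0, 0)], fABC[(0, 0, 1)], fABC[(1, 1, 1)])  # 12 6 12
```

Passing `lambda x, y: x * y` as `op` gives product combination instead.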

Page 11:

Canonical forms
A graphical model consists of variables, domains, and functions or “factors”, together with a combination operator. The operator is typically either multiplication or summation, and the two forms are mostly equivalent via log / exp:
  – Product of nonnegative factors (probabilities, 0/1 constraints, etc.)
  – Sum of factors (costs, utilities, etc.)
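A quick numeric check of the log / exp correspondence; the factor values below are made up for illustration:

```python
import math

# Product form (nonnegative factors) vs. additive form (log-domain costs):
probs = [0.4, 0.2, 0.8]                 # e.g., probabilities
costs = [math.log(p) for p in probs]    # the same factors as additive terms
print(round(math.prod(probs), 3), round(math.exp(sum(costs)), 3))  # 0.064 0.064
```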

Page 12:

Graphical visualization
Primal graph: variables → nodes; factors → cliques.

[Figure: primal graph over variables A, B, C, D, F, G.]

Page 13:

Example: Map Coloring
The overall function is the “and” of the individual constraints: f(Xi, Xj) = 1 iff Xi ≠ Xj, for adjacent regions i, j.

“Tabular” form:
X0 X1 | f(X0,X1)
0 0 | 0
0 1 | 1
0 2 | 1
1 0 | 1
1 1 | 0
1 2 | 1
2 0 | 1
2 1 | 1
2 2 | 0

Tasks: “max”: is there a solution? “sum”: how many solutions?
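Both tasks can be checked by brute force on a small map. The region adjacency below is a hypothetical example, not from the slide:

```python
from itertools import product

# Hypothetical 4-region map: each edge is an adjacent pair that must differ.
regions = ['X0', 'X1', 'X2', 'X3']
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
colors = range(3)

count = 0
for assign in product(colors, repeat=len(regions)):
    # "and" of the individual not-equal constraints
    if all(assign[i] != assign[j] for i, j in edges):
        count += 1

print(count > 0)   # "max": is there a solution?
print(count)       # "sum": how many solutions?
```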

Page 14:

Example: Bayesian Networks
Random variables S, K, R, W. S has states {Fall, Winter, Spring, Summer}; K, R, W have states {True, False}.

The overall function is the product of the conditional probabilities: P(S) · P(K|S) · P(R|S) · P(W|K,R). (Season → Sprinkler K, Season → Rain R, and Sprinkler & Rain → Wet W.)

P(W|K,R):
K R | W=0  W=1
0 0 | 1.0  0.0
0 1 | 0.2  0.8
1 0 | 0.1  0.9
1 1 | 0.01 0.99

Typical tasks: observe some variables’ outcomes, then reason about the change in probability of the others (sometimes called a “belief”).
“max”: what’s the most probable (MAP) state? “sum”: what’s the probability it rained, given it’s wet out?
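The “sum” query can be answered by enumeration on a reduced model. The CPT P(W=1|K,R) is taken from the slide; the priors for K and R are assumed numbers for illustration (the slide does not give them), and S is ignored here:

```python
from itertools import product

# CPT from the slide: P(W=1 | K, R)
pW1 = {(0, 0): 0.0, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.99}
# Assumed (not from the slide) priors for sprinkler K and rain R:
pK1, pR1 = 0.3, 0.4

def joint(k, r, w):
    pk = pK1 if k else 1 - pK1
    pr = pR1 if r else 1 - pR1
    pw = pW1[(k, r)] if w else 1 - pW1[(k, r)]
    return pk * pr * pw

# "sum" query: P(R=1 | W=1), marginalizing out K
num = sum(joint(k, 1, 1) for k in (0, 1))
den = sum(joint(k, r, 1) for k, r in product((0, 1), repeat=2))
print(round(num / den, 3))  # 0.679
```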

Page 15:

Alarm network
• Bayes nets: compact representation of large joint distributions.

[Figure: the “alarm” medical-monitoring network over 37 variables such as PCWP, CO, HRBP, SAO2, VENTLUNG, INTUBATION, HYPOVOLEMIA, BP, ….]

The “alarm” network: 37 variables, 509 parameters (rather than 2^37 ≈ 10^11!). [Beinlich et al., 1989]

Page 16:

Example: Markov logic
“Smoking causes cancer. Friends have similar smoking habits.”
  ∀x  Smokes(x) ⇒ Cancer(x)                        (weight 1.5)
  ∀x,y  Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))     (weight 1.1)

Two constants: Anna (A) and Bob (B).

SA CA | f(SA,CA)
0 0 | exp(1.5)
0 1 | exp(1.5)
1 0 | 1.0
1 1 | exp(1.5)

FAB SA SB | f(·)
0 0 0 | exp(1.1)
0 0 1 | exp(1.1)
0 1 0 | exp(1.1)
0 1 1 | exp(1.1)
1 0 0 | exp(1.1)
1 0 1 | 1.0
1 1 0 | 1.0
1 1 1 | exp(1.1)

[Figure: ground network over Cancer(A), Smokes(A), Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(B), Cancer(B).]

[Richardson & Domingos 2005]

Page 17:

Example domains for graphical models
• Natural language processing
  – Information extraction, semantic parsing, translation, topic models, …
• Computer vision
  – Object recognition, scene analysis, segmentation, tracking, …
• Computational biology
  – Pedigree analysis, protein folding and binding, sequence matching, …
• Networks
  – Webpage link analysis, social networks, communications, citations, …
• Robotics
  – Planning & decision making

Page 18:

Graphical visualization
Primal graph: variables → nodes; factors → cliques.
Dual graph: factor scopes → nodes; edges → intersections (separators).

[Figure: primal graph over A, B, C, D, F, G and its dual graph with nodes ABD, BCF, AC, DFG and separator-labeled edges.]

Page 19:

Graphical visualization
“Factor” graph: explicitly indicates the scope of each factor.
  variables → circles; factors → squares

Useful for disambiguating the factorization: a single factor over A, B, C, D costs O(d^4) to represent, while a pairwise factorization costs only O(d^2) per factor, even though both have the same primal graph.

[Figure: a primal graph over A, B, C, D and two factor graphs with the same primal graph but different factorizations.]

Page 20:

Ex: Boltzmann machines
• Boltzmann machines: pairwise and unary factors over binary variables.
• Deep Boltzmann machines: a “visible” layer v and “hidden” layers h, h2.

Xi Xj | f(Xi,Xj)
0 0 | 1.0
0 1 | 1.0
1 0 | 1.0
1 1 | exp(wij)

Xi | f(Xi)
0 | 1.0
1 | exp(ai)

[Figure: MNIST samples generated by a deep Boltzmann machine.]

Page 21:

Graphical visualization
A graphical model consists of variables, domains, and functions or “factors”.

Operators:
  – combination operator (sum, product, join, …)
  – elimination operator (projection, sum, max, min, …)

Types of queries: Marginal, MPE / MAP, Marginal MAP.

• All these tasks are NP-hard
  – exploit problem structure
  – identify special cases
  – approximate

The same scope, e.g., {A, C, F}, can carry different kinds of factors:

Conditional Probability Table (CPT), P(F|A,C):
A C F | P(F|A,C)
0 0 0 | 0.14
0 0 1 | 0.96
0 1 0 | 0.40
0 1 1 | 0.60
1 0 0 | 0.35
1 0 1 | 0.65
1 1 0 | 0.72
1 1 1 | 0.68

Relation (allowed tuples over A, C, F):
red green blue
blue red red
blue blue green
green red blue

Clause: (A ∨ C ∨ F)

[Figure: primal graph (interaction graph) over A, B, C, D, E, F.]

Page 22: RoadMap: Introduction and Inference

Page 23:

Types of queries
Sum-Inference, Max-Inference, Mixed-Inference (in increasing order of hardness).
• NP-hard: exponentially many terms
• We will focus on approximation algorithms
  – Anytime: from very fast & very approximate to slower & more accurate

Page 24:

Tree-solving is easy
Belief updating (sum-prod), MPE (max-prod), CSP consistency (projection-join), and #CSP (sum-prod) are all solved by the same message-passing scheme on trees.

[Figure: a tree with root P(X), children P(Y|X) and P(Z|X), and leaves P(T|Y), P(R|Y), P(L|Z), P(M|Z); messages such as m_{Z→X}(X), m_{X→Z}(X), m_{Y→X}(X), m_{T→Y}(Y) flow along every edge in both directions.]

Trees are processed in linear time and memory.

Page 25:

Transforming into a Tree
• By Inference (thinking)
  – Transform into a single, equivalent tree of sub-problems
• By Conditioning (guessing)
  – Transform into many tree-like sub-problems

Page 26:

Inference and Treewidth

[Figure: primal graph over A–M and a tree decomposition with clusters ABC, BDEF, DGF, EFH, FHK, HJ, KLM.]

treewidth = (maximum cluster size) − 1; here treewidth = 4 − 1 = 3.

Inference algorithms: time exp(treewidth), space exp(treewidth).
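The treewidth of the slide's decomposition follows directly from the cluster sizes; a minimal check, using the cluster labels as strings:

```python
# Clusters of the slide's tree decomposition:
clusters = ["ABC", "BDEF", "DGF", "EFH", "FHK", "HJ", "KLM"]

# treewidth = (maximum cluster size) - 1; "BDEF" has 4 variables.
treewidth = max(len(c) for c in clusters) - 1
print(treewidth)  # 3
```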

Page 27:

Conditioning and Cycle cutset

[Figure: a graph over variables A–P; instantiating the cutset variables A, B, C one at a time removes all cycles, leaving a tree over the remaining variables.]

Cycle cutset = {A, B, C}

Page 28:

Search over the Cutset
• Inference may require too much memory
• Condition on some of the variables

[Figure: a graph-coloring problem over A–M; branching on A ∈ {yellow, green} and B ∈ {red, blue, green, yellow} yields a collection of tree-structured subproblems over the remaining variables C–M.]

Page 29:

Bird's-eye View of Exact Algorithms
• Inference: exp(w*) time and space.
• Search: exp(w*) time, O(w*) space.
• Search + inference: space exp(q), time exp(q + c(q)), where q is user-controlled.

[Figure: an OR search tree over variables A–F, the primal graph and tree decomposition over A–M, and cutset conditioning (A = yellow/green, B = blue/red/green) on a graph-coloring problem.]

Page 30:

Bird's-eye View of Exact Algorithms (continued)

[Figure: the same illustrations, plus the context minimal AND/OR search graph over A–F, which has 18 AND nodes.]

Page 31:

Bird's-eye View of Approximate Algorithms
• Inference
• Bounded inference
• Search
• Sampling
• Search + inference
• Sampling + bounded inference

[Figure: the same search-tree, tree-decomposition, cutset, and context minimal AND/OR search graph (18 AND nodes) illustrations.]

Page 32:

The Conditioning and Elimination Operators

Page 33:

Conditioning on Observations
• Observing a variable’s value
  – Reduces the scope of the factor

Page 34:

Conditioning on Observations

A B | f(A,B)
b b | 4
b g | 5
b r | 1
g b | 2
g g | 6
g r | 3
r b | 1
r g | 1
r r | 6

Assign A=b:
B | g(B)
b | 4
g | 5
r | 1

Assign B=r:  h∅ = 1 (a constant factor)
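The scope reduction under conditioning can be sketched directly on tables; the `condition` helper below is an illustrative encoding, not the authors' implementation:

```python
def condition(f, scope, var, val):
    """Restrict factor f to var=val, dropping var from the scope."""
    i = scope.index(var)
    new_scope = scope[:i] + scope[i + 1:]
    g = {k[:i] + k[i + 1:]: v for k, v in f.items() if k[i] == val}
    return new_scope, g

fAB = {('b', 'b'): 4, ('b', 'g'): 5, ('b', 'r'): 1,
       ('g', 'b'): 2, ('g', 'g'): 6, ('g', 'r'): 3,
       ('r', 'b'): 1, ('r', 'g'): 1, ('r', 'r'): 6}
scope, g = condition(fAB, ['A', 'B'], 'A', 'b')   # g(B): b->4, g->5, r->1
scope, h = condition(g, scope, 'B', 'r')          # empty-scope constant: 1
print(g[('b',)], g[('g',)], h[()])  # 4 5 1
```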

Page 35:

Conditional independence
• Undirected graphs have very simple conditional independence semantics:
  – Is X conditionally independent of Y given Z?
  – Check all paths from X to Y.
  – A path is “inactive” (blocked) if it passes through a variable node in Z.
  – If no active path remains from X to Y, they are conditionally independent.

• Examples: [Figure: two small graphs over A, B, C, D; in the first, A and D are independent given B; in the second, A and B are independent.]

Markov blanket of X: the set of variables directly connected to X.

Page 36:

Combination of Factors
Here the combination operator is the product: f(A,B) · f(B,C) = f(A,B,C).

A B | f(A,B)      B C | f(B,C)
b b | 0.4         b b | 0.2
b g | 0.1         b g | 0
g b | 0           g b | 0
g g | 0.5         g g | 0.8

A B C | f(A,B,C)
b b b | 0.08
b b g | 0
b g b | 0
b g g | 0.08   ( = 0.1 × 0.8 )
g b b | 0
g b g | 0
g g b | 0
g g g | 0.4
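The product table can be verified in a few lines; the dictionary encoding is an illustrative assumption:

```python
fAB = {('b', 'b'): 0.4, ('b', 'g'): 0.1, ('g', 'b'): 0.0, ('g', 'g'): 0.5}
fBC = {('b', 'b'): 0.2, ('b', 'g'): 0.0, ('g', 'b'): 0.0, ('g', 'g'): 0.8}

# Pointwise product over the union scope {A, B, C}:
fABC = {(a, b, c): fAB[(a, b)] * fBC[(b, c)]
        for a in 'bg' for b in 'bg' for c in 'bg'}
print(round(fABC[('b', 'g', 'g')], 2), round(fABC[('g', 'g', 'g')], 2))  # 0.08 0.4
```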

Page 37:

Elimination in a Factor (Elim = sum)

A B | f(A,B)
b b | 4
b g | 6
b r | 1
g b | 2
g g | 6
g r | 3
r b | 1
r g | 1
r r | 6

Elim(f,B):
A | g(A)
b | 11
g | 11
r | 8

Elim(g,A):  h∅ = 30
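Summing a variable out of a table can be sketched as follows; the `eliminate` helper is illustrative, not the authors' implementation:

```python
def eliminate(f, scope, var):
    """Sum a variable out of a factor (Elim = sum)."""
    i = scope.index(var)
    g = {}
    for k, v in f.items():
        kk = k[:i] + k[i + 1:]
        g[kk] = g.get(kk, 0) + v
    return scope[:i] + scope[i + 1:], g

fAB = {('b', 'b'): 4, ('b', 'g'): 6, ('b', 'r'): 1,
       ('g', 'b'): 2, ('g', 'g'): 6, ('g', 'r'): 3,
       ('r', 'b'): 1, ('r', 'g'): 1, ('r', 'r'): 6}
scope, g = eliminate(fAB, ['A', 'B'], 'B')   # g(A): b->11, g->11, r->8
scope, h = eliminate(g, scope, 'A')          # empty-scope constant: 30
print(g[('b',)], g[('r',)], h[()])  # 11 8 30
```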

Page 38:

Conditioning versus Elimination

[Figure: a model over A–G. Conditioning (search) on A = 1, …, A = k yields k subproblems with A removed; eliminating A (inference) yields 1 problem in which A's neighbors become connected.]

Conditioning (search): k “sparser” problems. Elimination (inference): 1 “denser” problem.

Page 39: RoadMap: Introduction and Inference

Page 40: RoadMap: Introduction and Inference

Page 41:

Variable Elimination in Trees
(Use the distributive rule to calculate efficiently.)


Page 45:

Variable Elimination in Trees
(Use the distributive rule to calculate efficiently.)

For trees there is an efficient elimination order (leaves to root); the computational complexity is the same as the model size.
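A minimal sketch of the distributive rule on a chain A–B–C (a tree): eliminating the leaf C first, then B, gives the same result as summing the full product, but never builds the joint table. The factor values are made up:

```python
from itertools import product

# Distributive rule: sum_{b,c} f1(a,b) f2(b,c) = sum_b f1(a,b) [sum_c f2(b,c)]
f1 = {(a, b): 1 + a + 2 * b for a in (0, 1) for b in (0, 1)}  # arbitrary tables
f2 = {(b, c): 2 + b * c for b in (0, 1) for c in (0, 1)}

# Leaves-to-root messages:
m_CB = {b: sum(f2[(b, c)] for c in (0, 1)) for b in (0, 1)}            # eliminate C
m_BA = {a: sum(f1[(a, b)] * m_CB[b] for b in (0, 1)) for a in (0, 1)}  # eliminate B

# Brute-force check over the full joint:
brute = {a: sum(f1[(a, b)] * f2[(b, c)]
                for b, c in product((0, 1), repeat=2)) for a in (0, 1)}
print(m_BA == brute)  # True
```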

Page 46: Types of queries: Sum-Inference, Max-Inference, Mixed-Inference (hardest); NP-hard.

Page 47: RoadMap: Introduction and Inference

Page 48:

Belief Updating by Variable Elimination
• p(X | Evidence) = ?

[Figure: “primal” graph over A, B, C, D, E.]

Page 49:

Bucket Elimination (Algorithm BE-bel [Dechter 1996])
Buckets B, C, D, E, A are processed in order, applying the elimination & combination operators in each bucket.

[Figure: the buckets along the ordering, with messages passed between them; primal graph over A, B, C, D, E.]

W* = 4: the “induced width” (max clique size of the induced graph).

Page 50:

Bucket Elimination (Algorithm BE-bel [Dechter 1996])
Time and space are exponential in the induced width / treewidth.
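The bucket-processing scheme can be sketched end-to-end for the partition function. This is an illustrative sum-product sketch with made-up factors, not the BE-bel implementation; each factor goes in the bucket of its highest-ordered variable, and buckets are processed from the last variable to the first:

```python
from itertools import product

def bucket_elimination(factors, order, domains):
    """Sketch of bucket elimination for the partition function (sum-product)."""
    pos = {v: i for i, v in enumerate(order)}
    buckets = {v: [] for v in order}
    for scope, table in factors:                    # bucket of highest variable
        buckets[max(scope, key=pos.get)].append((scope, table))
    z = 1.0
    for v in reversed(order):                       # process last-to-first
        fs = buckets[v]
        if not fs:                                  # unconstrained variable
            z *= len(domains[v])
            continue
        scope = sorted({u for s, _ in fs for u in s} - {v}, key=pos.get)
        msg = {}
        for tup in product(*(domains[u] for u in scope)):
            assign = dict(zip(scope, tup))
            total = 0.0
            for val in domains[v]:                  # sum v out of the product
                assign[v] = val
                p = 1.0
                for s, t in fs:
                    p *= t[tuple(assign[u] for u in s)]
                total += p
            msg[tup] = total
        if scope:                                   # message to an earlier bucket
            buckets[max(scope, key=pos.get)].append((scope, msg))
        else:
            z *= msg[()]                            # empty scope: a constant
    return z

domains = {'A': (0, 1), 'B': (0, 1), 'C': (0, 1)}
fs = [(('A', 'B'), {t: 1 + t[0] + t[1] for t in product((0, 1), repeat=2)}),
      (('B', 'C'), {t: 2 + t[0] * t[1] for t in product((0, 1), repeat=2)})]
print(bucket_elimination(fs, ['A', 'B', 'C'], domains))  # 37.0
```

Swapping the inner sum for a max would give the MPE (max-product) variant.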

Page 51:

Finding MPE/MAP
Algorithm BE-mpe [Dechter 1996; Bertelè & Brioschi 1977]: the same bucket scheme over B, C, D, E, A with max-product elimination; OPT is the MPE value.

[Figure: the buckets along the ordering; primal graph and tree decomposition over A, B, C, D, E.]

W* = 4: the “induced width” (max clique size).

Page 52:

Generating the Optimal Assignment
• Given the BE messages, select the optimal configuration bucket by bucket, in reverse order of elimination.

OPT = optimal value. Return the optimal configuration (a*, b*, c*, d*, e*).

Page 53:

Induced Width

[Figure: primal graph over A, B, C, D, E and two orderings with their induced graphs.]

• Width is the max number of parents (earlier neighbors) in the ordered graph.
• Induced width is the width of the induced ordered graph, obtained by recursively connecting each node's parents, going from the last node to the first.
• The induced width w*(d) is the max induced width over all nodes in ordering d.
• The induced width of a graph, w*, is the min of w*(d) over all orderings d.
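The induced-width procedure above can be sketched directly. The 5-variable graph below is a hypothetical example chosen to show that different orderings give different induced widths; it is not the slide's figure:

```python
def induced_width(edges, order):
    """Width of the induced graph along `order`: process nodes last-to-first,
    count each node's earlier neighbors ("parents"), then connect them."""
    adj = {v: set() for v in order}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    pos = {v: i for i, v in enumerate(order)}
    w = 0
    for v in reversed(order):
        parents = {u for u in adj[v] if pos[u] < pos[v]}
        w = max(w, len(parents))
        for a in parents:                 # connect parents pairwise
            for b in parents:
                if a != b:
                    adj[a].add(b)
    return w

# Hypothetical primal graph (not the slide's exact figure):
edges = [('A', 'B'), ('A', 'C'), ('A', 'D'),
         ('B', 'D'), ('B', 'E'), ('C', 'E')]
print(induced_width(edges, ['A', 'B', 'C', 'D', 'E']),
      induced_width(edges, ['E', 'D', 'C', 'B', 'A']))  # 2 3
```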

Page 54:

Complexity of Bucket Elimination
The effect of the ordering: different orderings of the same primal graph yield different induced widths.

[Figure: primal graph over A, B, C, D, E with two orderings and their induced graphs.]

Bucket Elimination is time and space exponential in w*(d), the induced width of the primal graph along ordering d; with r functions and domain size k, time is O(r · k^(w*(d)+1)).

Finding the smallest induced width is hard!

Page 55: Types of queries: Sum-Inference, Max-Inference, Mixed-Inference (hardest); NP-hard: exponentially many terms.

Page 56:

Marginal MAP is not easy on trees
• Pure MAP or summation tasks
  – Dynamic programming
  – E.g., efficient on trees
• Marginal MAP
  – Max and sum operations do not commute, so the elimination order is constrained:
  – the sum (over the summation variables) must be done first, before maximizing over the MAX variables!
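A toy numeric check that max and sum do not commute, which is why marginal MAP constrains the elimination order; the factor values are made up:

```python
f = {(0, 0): 1.0, (0, 1): 5.0, (1, 0): 4.0, (1, 1): 3.0}

# sum_y max_x f(x,y):  max over x for y=0 is 4, for y=1 is 5 -> 9
max_then_sum = sum(max(f[(x, y)] for x in (0, 1)) for y in (0, 1))
# max_x sum_y f(x,y):  sums are 6 (x=0) and 7 (x=1) -> 7
sum_then_max = max(sum(f[(x, y)] for y in (0, 1)) for x in (0, 1))
print(max_then_sum, sum_then_max)  # 9.0 7.0
```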

Page 57:

Bucket Elimination for MMAP
Buckets B, C, D, E, A are processed with a constrained elimination order: the SUM buckets (summation variables) are eliminated first, then the MAX buckets. MAP* is the marginal MAP value.

[Figure: buckets for B, C, D, E, A, each annotated MAX or SUM; primal graph over A, B, C, D, E.]

Page 58: Bucket Elimination for MMAP (continued)

Page 59: RoadMap: Introduction and Inference

Page 60:

From BE to Bucket-Tree Elimination (BTE)

First, observe that BE operates on a tree.
Second, what if we want the marginal on D?

Page 61:

BTE: Allows Messages Both Ways

Initial buckets + messages:

P(D) = Σ_{a,b} P(D | a,b) · π_{B→D}(a,b)

P(F) = Σ_{b,c} P(F | b,c) · π_{C→F}(b,c) · π_{G→F}(F)

Page 62:

From Buckets to Clusters
• Merge non-maximal buckets into maximal clusters.
• Connect the clusters into a tree: connect each cluster to one with which it shares a largest subset of variables.
• Separators are the variable intersections of adjacent clusters.

A super-bucket tree is an i-map of the Bayesian network.

[Figure: three cluster trees for the same network. (A) The original bucket tree; (B) after merging non-maximal buckets into clusters {A,B,C}, {D,B,A}, {F,B,C}, {G,F} (time exp(3), memory exp(2)); (C) merged further into {A,B,C,D,F} and {G,F} (time exp(5), memory exp(1)).]

Page 63:

The General Tree Decomposition

[Figure: the graph over A–M decomposed into clusters ABC, BDEF, DGF, EFH, FHK, HJ, KLM.]

treewidth = (maximum cluster size) − 1; here treewidth = 4 − 1 = 3

Inference algorithm: time exp(treewidth), space exp(treewidth).

We move from the bucket tree to a general cluster tree to trade memory for time.

Page 64:

Example of a Tree Decomposition

For the network over A–G with factors p(a), p(b|a), p(c|a,b), p(d|b), p(f|c,d), p(e|b,f), p(g|e,f), the clusters (with assigned functions) and separators are:

  {A,B,C}: p(a), p(b|a), p(c|a,b)
    (separator BC)
  {B,C,D,F}: p(d|b), p(f|c,d)
    (separator BF)
  {B,E,F}: p(e|b,f)
    (separator EF)
  {E,F,G}: p(g|e,f)

Page 65:

Tree Decompositions

A tree decomposition for a graphical model ⟨X, D, P⟩ is a triple ⟨T, χ, ψ⟩, where T = (V, E) is a tree and χ and ψ are labeling functions associating with each vertex v ∈ V two sets, χ(v) ⊆ X and ψ(v) ⊆ P, satisfying:
1. For each function p ∈ P there is exactly one vertex v such that p ∈ ψ(v) and scope(p) ⊆ χ(v).
2. For each variable X_i ∈ X, the set {v ∈ V | X_i ∈ χ(v)} forms a connected subtree of T (the running-intersection, or connectedness, property).

[Example: clusters {A,B,C}: p(a), p(b|a), p(c|a,b); {B,C,D,F}: p(d|b), p(f|c,d); {B,E,F}: p(e|b,f); {E,F,G}: p(g|e,f), with separators BC, BF, EF.]

Page 66:

Message Passing on a Tree Decomposition

cluster(u) = ψ(u) ∪ {h(x₁,u), h(x₂,u), ..., h(xₙ,u), h(v,u)}
elim(u,v) = cluster(u) − sep(u,v)

Compute the message from u to v:
h(u,v) = Σ_{elim(u,v)} Π_{f ∈ cluster(u) − {h(v,u)}} f

For max-product, just replace Σ with max.

Dechter & Ihler DeepLearn 2017 70
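The message computation above can be sketched in a few lines; the factor representation and the binary variables B, C, D are illustrative, not from the slides:

```python
import numpy as np
from itertools import product

# A factor is (variables, table), the table indexed by the variables in order.

def multiply(f1, f2):
    """Pointwise product of two factors over the union of their scopes."""
    vars1, t1 = f1
    vars2, t2 = f2
    scope = vars1 + [v for v in vars2 if v not in vars1]
    table = np.zeros((2,) * len(scope))
    for idx in product([0, 1], repeat=len(scope)):
        a = dict(zip(scope, idx))
        table[idx] = (t1[tuple(a[v] for v in vars1)] *
                      t2[tuple(a[v] for v in vars2)])
    return scope, table

def marginalize(f, elim):
    """Sum out the variables in `elim` (use max instead for max-product)."""
    scope, table = f
    for v in elim:
        ax = scope.index(v)
        table = table.sum(axis=ax)
        scope = [u for u in scope if u != v]
    return scope, table

# cluster(u) = psi(u) plus incoming messages except h(v,u);
# elim(u,v) = cluster variables minus the separator.
psi_u = (['B', 'C'], np.array([[0.9, 0.1], [0.4, 0.6]]))
h_in  = (['C', 'D'], np.array([[0.7, 0.3], [0.2, 0.8]]))
prod = multiply(psi_u, h_in)
h_uv = marginalize(prod, elim=['C'])   # message over sep(u,v) = {B, D}
print(h_uv[0])                         # ['B', 'D']
```

For max-product, the same code works with `table.max(axis=ax)` in place of the sum, exactly as the slide notes.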

Page 67:

Cluster-Tree Elimination (CTE), or Join-Tree Message Passing

Clusters: 1 = {A,B,C}, 2 = {B,C,D,F}, 3 = {B,E,F}, 4 = {E,F,G}; separators BC, BF, EF.

h(1,2)(b,c) = Σ_a p(a) · p(b|a) · p(c|a,b)
h(2,1)(b,c) = Σ_{d,f} p(d|b) · p(f|c,d) · h(3,2)(b,f)
h(2,3)(b,f) = Σ_{d,c} p(d|b) · p(f|c,d) · h(1,2)(b,c)
h(3,2)(b,f) = Σ_e p(e|b,f) · h(4,3)(e,f)
h(3,4)(e,f) = Σ_b p(e|b,f) · h(2,3)(b,f)
h(4,3)(e,f) = p(G = g | e,f), with G observed as evidence

Time: O(exp(w+1)); Space: O(exp(sep)).
For each cluster, P(X|e) is computed, and also P(e). CTE is exact.

Page 68:

Examples of (Join)-Tree Construction

[Figure: two join trees built for graphs over A–F, e.g. clusters ABCE, BCDE, DEF with separators BCE and DE, and clusters ABE, ABC, BCD, DEF with separators AB, BC, and D.]

Page 69:

RoadMap: Introduction and Inference
• Basics of graphical models
  – Queries
  – Examples, applications, and tasks
  – Algorithms overview
• Inference algorithms, exact
  – Bucket elimination for trees
  – Bucket elimination
  – Join-tree clustering
  – Elimination orders
• Approximate elimination
  – Decomposition bounds
  – Mini-bucket & weighted mini-bucket
  – Belief propagation
• Summary and Class 2

Page 70:

Finding a Small Induced Width
• NP-complete
• A tree has induced width of ?
• Greedy algorithms:
  – Min-width
  – Min induced-width
  – Max-cardinality and chordal graphs
  – Fill-in (thought to be the best)
• Anytime algorithms:
  – Search-based [Gogate & Dechter 2003]
  – Stochastic (CVO) [Kask, Gelfand & Dechter 2010]

Page 71:

Greedy Ordering Heuristics
• Min-induced-width
  – From last to first, pick a node with smallest width
• Min-fill
  – From last to first, pick a node with the smallest number of fill edges

Complexity? O(n³)

Page 72:

Min-Fill Heuristic
• Select the variable that creates the fewest "fill-in" edges

[Figure: a graph over A–F.]
Eliminate B next? Connect its neighbors; "fill-in" = 3: (A,D), (C,E), (D,E).
Eliminate E next? Its neighbors are already connected; "fill-in" = 0.
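The min-fill heuristic is short to implement; a sketch, run on an illustrative graph (not necessarily the slide's):

```python
from itertools import combinations

def min_fill_order(adj):
    """Greedy min-fill elimination order on an undirected graph.
    `adj` maps each node to a set of neighbors. Returns the order and
    the induced width observed along it."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    order, width = [], 0
    while adj:
        # Count fill edges each candidate would create among its neighbors.
        def fill(v):
            return sum(1 for a, b in combinations(adj[v], 2)
                       if b not in adj[a])
        v = min(adj, key=fill)
        width = max(width, len(adj[v]))           # width of v at elimination time
        for a, b in combinations(adj[v], 2):      # connect v's neighbors
            adj[a].add(b)
            adj[b].add(a)
        for n in adj[v]:
            adj[n].discard(v)
        del adj[v]
        order.append(v)
    return order, width

# Made-up 6-node graph for illustration.
g = {'A': {'B', 'C', 'E'}, 'B': {'A', 'C', 'D', 'E'}, 'C': {'A', 'B', 'D'},
     'D': {'B', 'C', 'F'}, 'E': {'A', 'B', 'F'}, 'F': {'D', 'E'}}
order, w = min_fill_order(g)
print(order, w)
```

Each step recomputes fill counts on the current (partially triangulated) graph, which is what makes the naive version cubic-time overall.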

Page 73:

Different Induced Graphs

[Figure: the induced graphs produced by a min-fill ordering and by a min-induced-width (Min-IW) ordering differ.]

Page 74:

Chordal Graphs   [Tarjan & Yannakakis 1980]
• A graph is chordal if every cycle of length at least 4 has a chord.
• Deciding chordality via a max-cardinality ordering: from 1 to n, always select next a node connected to a largest set of previously selected nodes.
• A graph has no fill-in edges along a max-cardinality order iff it is chordal.
• The maximal cliques of a chordal graph form a tree.
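A minimal sketch of max-cardinality search, on a small made-up chordal graph (a triangle plus a pendant vertex):

```python
def max_cardinality_order(adj):
    """Max-cardinality search: repeatedly pick the node with the most
    already-ordered neighbors. `adj` maps node -> set of neighbors."""
    order, chosen = [], set()
    while len(order) < len(adj):
        v = max((v for v in adj if v not in chosen),
                key=lambda v: len(adj[v] & chosen))
        order.append(v)
        chosen.add(v)
    return order

# Illustrative chordal graph: triangle A-B-C plus pendant vertex D on C.
g = {'A': {'B', 'C'}, 'B': {'A', 'C'}, 'C': {'A', 'B', 'D'}, 'D': {'C'}}
order = max_cardinality_order(g)
print(order)
```

With set-based bookkeeping this runs in O(n + m), matching the complexity quoted on the next slide; on a chordal graph the returned order is fill-in-free.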

Page 75:

Greedy Ordering Heuristics
• Min-induced-width
  – From last to first, pick a node with smallest width
• Min-fill
  – From last to first, pick a node with the smallest number of fill edges
  Complexity? O(n³)
• Max-cardinality search
  – From first to last, pick the node with the largest number of already-ordered neighbors
  Complexity? O(n + m)

Page 76:

Summary of Inference Schemes
• Bucket elimination is time and memory exponential in the induced width.
• Join-tree (junction-tree) clustering is time O(exp(w*)) and memory O(exp(sep)).
• Both solve all the queries exactly.
• Finding w* is hard, but greedy schemes approximate it quite well; the most popular is min-fill.
• w along an ordering d is the induced width; the best induced width over all orderings is the treewidth.

Page 77:

RoadMap: Introduction and Inference
• Basics of graphical models
  – Queries
  – Examples, applications, and tasks
  – Algorithms overview
• Inference algorithms, exact
  – Bucket elimination for trees
  – Bucket elimination
  – Join-tree clustering
  – Elimination orders
• Approximate elimination
  – Decomposition bounds
  – Mini-bucket & weighted mini-bucket
  – Belief propagation
• Summary and Class 2

Page 78:

Decomposition Bounds
• Upper & lower bounds via approximate problem decomposition
• Example: MAP inference
  – Relaxation: two "copies" of x, no longer required to be equal
  – Bound is tight (equality) if f1, f2 agree on the maximizing value of x

  X | f1(X)     X | f2(X)     X | F(X) = f1(X) + f2(X)
  0 |  1.0      0 |  1.0      0 |  2.0
  1 |  2.0      1 |  2.0      1 |  4.0
  2 |  3.0      2 |  2.0      2 |  5.0
  3 |  4.0      3 |  0.0      3 |  4.0

max_x F(x) = 5.0 ≤ max_x f1(x) + max_x f2(x) = 4.0 + 2.0 = 6.0

Dechter & Ihler
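The slide's tables can be checked directly; a minimal sketch:

```python
import numpy as np

# Tables from the slide: F = f1 + f2 over one variable X in {0,1,2,3}.
f1 = np.array([1.0, 2.0, 3.0, 4.0])
f2 = np.array([1.0, 2.0, 2.0, 0.0])
F = f1 + f2                  # [2.0, 4.0, 5.0, 4.0]

exact = F.max()              # 5.0, at x = 2
bound = f1.max() + f2.max()  # 4.0 + 2.0 = 6.0; maximizers disagree (x=3 vs x=1)
assert exact <= bound        # relaxing the "same x" constraint only raises the max
```

The gap (6.0 vs. 5.0) appears precisely because the two parts maximize at different values of x; when they agree, the bound is tight.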

Page 79:

Mini-Bucket Approximation

Split a bucket into mini-buckets → bound complexity.
Exponential complexity decrease: splitting bucket(X) = Q₁ ∪ Q₂ gives, e.g. for summation,

  Σ_X Π_{f ∈ bucket(X)} f  ≤  (Σ_X Π_{f ∈ Q₁} f) · (max_X Π_{f ∈ Q₂} f)

Page 80:

Mini-Bucket Elimination   [Dechter & Rish 2003]

[Figure: buckets E, C, D, B, A over the graph on A–E; bucket B is split into mini-buckets, and processing yields U = an upper bound.]

Page 81:

Mini-Bucket Elimination   [Dechter & Rish 2003]

[Figure: the same mini-bucket run, now viewed on a graph in which B is duplicated as B and B′.]

Can interpret the process as "duplicating" B [Kask et al. 2001, Geffner et al. 2007, Choi et al. 2007, Johnson et al. 2007].

Page 82:

Mini-Bucket Decoding
• Assign values in reverse order using the approximate messages

[Figure: the mini-bucket run again; U = upper bound.]
The greedy configuration gives a lower bound.

Page 83:

Properties of MBE(i)
• Complexity: O(r exp(i)) time and O(exp(i)) space
• Yields a lower bound and an upper bound
• Accuracy: determined by the upper/lower (U/L) bound gap
• Possible uses of mini-bucket approximations:
  – As anytime algorithms
  – As heuristics in search
• Other tasks (similar mini-bucket approximations):
  – Belief updating, marginal MAP, MEU, WCSP, Max-CSP
    [Dechter and Rish, 1997], [Liu and Ihler, 2011], [Liu and Ihler, 2013]
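A single-bucket illustration of the summation bound from the earlier slide; the factors are made up:

```python
import numpy as np

# Splitting bucket(X) = {f1, f2} into mini-buckets {f1} and {f2} and
# eliminating X separately (sum in one, max in the other) upper-bounds
# the exact sum over the product.
f1 = np.array([0.5, 1.5, 1.0])    # illustrative factors over X in {0,1,2}
f2 = np.array([2.0, 0.5, 1.0])

exact = np.sum(f1 * f2)           # sum_x f1(x) f2(x) = 1.0 + 0.75 + 1.0 = 2.75
bound = np.sum(f1) * np.max(f2)   # (sum_x f1) * (max_x f2) = 3.0 * 2.0 = 6.0
assert exact <= bound
```

Each mini-bucket eliminates X over a smaller product of functions, which is where the exponential complexity decrease comes from.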

Page 84:

Tightening the Bound
• Reparameterization (or "cost shifting")
  – Decrease the bound without changing the overall function

F(A,B,C) = f1(A,B) + f2(B,C):

  A B C | F        A B | f1  | λ(B)      B C | f2  | −λ(B)
  0 0 0 | 3.0      0 0 | 2.0 |  0        0 0 | 1.0 |  0
  0 0 1 | 2.0      1 0 | 3.5 |           0 1 | 0.0 |
  0 1 0 | 2.0      0 1 | 1.0 | +1        1 0 | 1.0 | −1
  0 1 1 | 4.0      1 1 | 3.0 |           1 1 | 3.0 |
  1 0 0 | 4.5
  1 0 1 | 3.5
  1 1 0 | 4.0
  1 1 1 | 6.0

The adjusting functions λ(B) and −λ(B) cancel each other, so F is unchanged; after this shift the decomposition bound is exact.
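The cost shift on this slide can be replayed numerically, with λ(B) = [0, +1] as in the tables:

```python
import numpy as np

# Tables from the slide: F(A,B,C) = f1(A,B) + f2(B,C).
f1 = np.array([[2.0, 1.0],    # f1[a, b]
               [3.5, 3.0]])
f2 = np.array([[1.0, 0.0],    # f2[b, c]
               [1.0, 3.0]])
F = f1[:, :, None] + f2[None, :, :]
exact = F.max()                                 # 6.0, at (a,b,c) = (1,1,1)

lam = np.array([0.0, 1.0])                      # cost shift lambda(B)
bound_before = f1.max() + f2.max()              # 3.5 + 3.0 = 6.5
bound_after = (f1 + lam).max() + (f2 - lam[:, None]).max()   # 4.0 + 2.0 = 6.0

# Shifting lambda(B) between the factors changes the bound, not F:
assert np.allclose(F, (f1 + lam)[:, :, None] + (f2 - lam[:, None])[None, :, :])
```

Here the single shift already closes the gap (6.5 → 6.0 = exact), which is what "the decomposition bound is exact" refers to.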

Page 85:

Decomposition for MAP
• Bound the solution using decomposed optimization
• Solve the parts independently: an optimistic bound
• Tighten the bound by reparameterization
  – Enforces the lost equality constraints using Lagrange multipliers

Reparameterization: add factors that "adjust" each local term but cancel out in total.

Page 86:

Decomposition for MAP
• Many names for the same class of bounds:
  – Dual decomposition [Komodakis et al. 2007]
  – TRW, MPLP [Wainwright et al. 2005; Globerson & Jaakkola 2007]
  – Soft arc consistency [Cooper & Schiex 2004]
  – Max-sum diffusion [Werner 2007]

Reparameterization: add factors that "adjust" each local term but cancel out in total.

Page 87:

Decomposition for MAP
• Many ways to optimize the bound:
  – Sub-gradient descent [Komodakis et al. 2007; Jojic et al. 2010]
  – Coordinate descent [Werner 2007; Globerson & Jaakkola 2007; Sontag 2009; Ihler et al. 2012]
  – Proximal optimization [Ravikumar et al. 2010]
  – ADMM [Meshi & Globerson 2011; Martins et al. 2011; Forouzan & Ihler 2013]

Reparameterization: add factors that "adjust" each local term but cancel out in total.

Page 88:

Decomposition for MAP
• Many ways to optimize the bound:
  – Sub-gradient descent [Komodakis et al. 2007; Jojic et al. 2010]
  – Coordinate descent [Werner 2007; Globerson & Jaakkola 2007; Sontag 2009; Ihler et al. 2012]
  – Proximal optimization [Ravikumar et al. 2010]
  – ADMM [Meshi & Globerson 2011; Martins et al. 2011; Forouzan & Ihler 2013]

[Figure: the relaxation upper bound and the values of decoded configurations bracket the MAP value as optimization proceeds.]

Reparameterization: add factors that "adjust" each local term but cancel out in total.

Page 89:

Optimizing the Bound
• Can optimize the bound in various ways:
  – (Sub-)gradient descent

  A B | f1(A,B) | λ(B)      B C | f2(B,C) | −λ(B)
  0 0 |  1.0   |  0         0 0 |  5.0   |  0
  1 0 |  0.0   |            0 1 |  2.0   |
  0 1 |  0.0   |  0         1 0 |  1.0   |  0
  1 1 |  2.5   |            1 1 |  1.5   |
  0 2 |  1.0   |  0         2 0 |  0.2   |  0
  1 2 |  3.0   |            2 1 |  0.0   |

Page 90:

Optimizing the Bound
• Can optimize the bound in various ways:
  – (Sub-)gradient descent

  A B | f1(A,B) | λ(B)      B C | f2(B,C) | −λ(B)
  0 0 |  1.0   | +1         0 0 |  5.0   | −1
  1 0 |  0.0   |            0 1 |  2.0   |
  0 1 |  0.0   |  0         1 0 |  1.0   |  0
  1 1 |  2.5   |            1 1 |  1.5   |
  0 2 |  1.0   | −1         2 0 |  0.2   | +1
  1 2 |  3.0   |            2 1 |  0.0   |



Page 93:

Optimizing the Bound
• Can optimize the bound in various ways:
  – (Sub-)gradient descent

  A B | f1(A,B) | λ(B)      B C | f2(B,C) | −λ(B)
  0 0 |  1.0   | +2         0 0 |  5.0   | −2
  1 0 |  0.0   |            0 1 |  2.0   |
  0 1 |  0.0   | −1         1 0 |  1.0   | +1
  1 1 |  2.5   |            1 1 |  1.5   |
  0 2 |  1.0   | −1         2 0 |  0.2   | +1
  1 2 |  3.0   |            2 1 |  0.0   |

Both parts agree on the optimal value(s): zero subgradient.

Page 94:

Optimizing the Bound
• Can optimize the bound in various ways:
  – (Sub-)gradient descent
  – Coordinate descent

  A B | f1(A,B) | λ(B)      B C | f2(B,C) | −λ(B)
  0 0 |  1.0   |            0 0 |  5.0   |
  1 0 |  0.0   |            0 1 |  2.0   |
  0 1 |  0.0   |            1 0 |  1.0   |
  1 1 |  2.5   |            1 1 |  1.5   |
  0 2 |  1.0   |            2 0 |  0.2   |
  1 2 |  3.0   |            2 1 |  0.0   |

Easy to minimize over a single variable, e.g. B: find the maxima for each value of B and match the values between the f's.

Page 95:

Optimizing the Bound
• Can optimize the bound in various ways:
  – (Sub-)gradient descent
  – Coordinate descent

  A B | f1(A,B) | λ(B)             B C | f2(B,C) | −λ(B)
  0 0 |  1.0   | −0.5 + 2.5        0 0 |  5.0   | +0.5 − 2.5
  1 0 |  0.0   |                   0 1 |  2.0   |
  0 1 |  0.0   | −1.25 + 0.75      1 0 |  1.0   | +1.25 − 0.75
  1 1 |  2.5   |                   1 1 |  1.5   |
  0 2 |  1.0   | −1.5 + 0.1        2 0 |  0.2   | +1.5 − 0.1
  1 2 |  3.0   |                   2 1 |  0.0   |

Easy to minimize over a single variable, e.g. B: find the maxima for each value of B and match the values between the f's (here λ(B) = ½(max_C f2(B,C) − max_A f1(A,B))).

Page 96:

Mini-Bucket as Decomposition   [Ihler et al. 2012]

[Figure: the mini-bucket run on buckets E, C, D, B, A, viewed as a decomposition; U = upper bound.]

Page 97:

Mini-Bucket as Decomposition   [Ihler et al. 2012]

[Figure: the corresponding join graph over clusters {A,B,C}, {B,D,E}, {A,C,E}, {A,D,E}, with separators {B}, {A,C}, {A,E}, {D,E}, {A}; U = upper bound.]

• The downward pass acts as cost shifting
• Can also do cost shifting within mini-buckets: "join graph" message passing
• "Moment-matching" version: one message exchange within each bucket, during the downward sweep
• The optimal bound is defined by the cliques ("regions") and the cost-shifting function scopes ("coordinates")

Page 98:

Anytime Approximation
• Can tighten the bound in various ways:
  – Cost shifting (improve consistency between cliques)
  – Increase the i-bound (higher-order consistency)
• A simple moment-matching step improves the bound significantly



Page 101:

Decomposition for Sum
• Generalize the technique to summation via Hölder's inequality
• Define the weighted (or powered) sum:  Σ_x^w f(x) = (Σ_x f(x)^(1/w))^w
  – The "temperature" w interpolates between sum (w = 1) and max (w → 0⁺)
  – Different weights do not commute

Dechter & Ihler DeepLearn 2017 105
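A sketch of the weighted (powered) sum and the Hölder-style bound; the factors f and g are made up for illustration:

```python
import numpy as np

def powered_sum(f, w):
    """Weighted (power) sum: (sum_x f(x)^(1/w))^w, for positive f and w > 0."""
    return float(np.sum(f ** (1.0 / w)) ** w)

f = np.array([1.0, 2.0, 3.0])
g = np.array([3.0, 1.0, 2.0])

# w = 1 recovers the ordinary sum; small w approaches the max.
print(powered_sum(f, 1.0))   # 6.0
print(powered_sum(f, 0.1))   # ~3.005, close to max(f) = 3.0

# Holder's inequality: with weights 0.5 + 0.5 = 1, the product of
# powered sums upper-bounds the sum of the product.
lhs = float(np.sum(f * g))                          # 11.0
rhs = powered_sum(f, 0.5) * powered_sum(g, 0.5)     # sqrt(14) * sqrt(14) = 14.0
assert lhs <= rhs
```

Splitting each variable's unit weight across the cliques that contain it is exactly the mechanism weighted mini-bucket uses to bound a summation instead of a maximization.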

Page 102:

Decomposition for Sum   [Peng, Liu, Ihler 2015]
• Fixed elimination order
• Assign a weight per clique & variable
• Again, tighten the bound by reparameterization
  – Can also optimize over the weights

Reparameterization: add factors that "adjust" each local term but cancel out in total.

Weights, e.g.:  w12 = [0.5, 0.3, –]   w13 = [0.5, –, 0.6]   w23 = [–, 0.7, 0.4]
(for each variable, the weights sum to 1 across the cliques that contain it)

Page 103:

Weighted Mini-Bucket   [Liu & Ihler 2011]

Compute the downward messages using the weighted sum.
Upper bound if all weights are positive (corresponding lower bound if only one weight is positive and the rest are negative).

[Figure: the mini-bucket run on buckets E, C, D, B, A; U = upper bound.]

Page 104:

RoadMap: Introduction and Inference
• Basics of graphical models
  – Queries
  – Examples, applications, and tasks
  – Algorithms overview
• Inference algorithms, exact
  – Bucket elimination for trees
  – Bucket elimination
  – Join-tree clustering
  – Elimination orders
• Approximate elimination
  – Decomposition bounds
  – Mini-bucket & weighted mini-bucket
  – Belief propagation
• Summary and Class 2

Page 105:

Variable Elimination in Trees
• Two-pass algorithm
  – Pass messages upward to a root
  – Pass messages back downward
  – Messages summarize marginalization of the sub-model rooted at that node
• Use the messages to compute all marginal probabilities
• Can also update all messages in parallel, until converged

Page 106:

Loopy Belief Propagation   [Pearl 1986]
• Apply the same local updates in an arbitrary structure
  – More precisely, often called the "sum-product" algorithm
• The resulting algorithm computes "beliefs" b
  – May not converge
  – Initialization & schedule can matter
  – Is approximate
  – But is often pretty good in practice (quality depends on how "tree-like" the model is)

DeepLearn 2017 Dechter & Ihler 112
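A minimal loopy sum-product sketch on a 3-cycle of binary variables; the pairwise factor, the flooding schedule, and the iteration count are illustrative:

```python
import numpy as np

# Same attractive pairwise factor psi(x_i, x_j) on every edge of a 3-cycle.
psi = np.array([[2.0, 1.0],
                [1.0, 2.0]])
edges = [(0, 1), (1, 2), (2, 0)]
edges += [(j, i) for i, j in edges]          # directed messages both ways
msg = {e: np.ones(2) for e in edges}

for _ in range(50):                          # iterate to (hopefully) convergence
    new = {}
    for (i, j) in edges:
        # Product of messages into i from neighbors other than j.
        incoming = np.ones(2)
        for (k, l) in edges:
            if l == i and k != j:
                incoming *= msg[(k, l)]
        m = psi @ incoming                   # sum over x_i (psi is symmetric here)
        new[(i, j)] = m / m.sum()            # normalize for numerical stability
    msg = new

# Belief at node 0: product of incoming messages, normalized.
b0 = np.ones(2)
for (k, l) in edges:
    if l == 0:
        b0 *= msg[(k, l)]
b0 /= b0.sum()
print(b0)        # approximate marginal; by symmetry close to [0.5, 0.5]
```

The update is identical to the tree case; only the graph has cycles, which is why the beliefs are approximations rather than exact marginals.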

Page 107:

Example: Ising Model
• Log-factors

[Figure: loopy BP beliefs over iterations, compared to the true probabilities.]

We'll see these ideas again in Class 3…

Page 108:

RoadMap: Introduction and Inference
• Basics of graphical models
  – Queries
  – Examples, applications, and tasks
  – Algorithms overview
• Inference algorithms, exact
  – Bucket elimination for trees
  – Bucket elimination
  – Join-tree clustering
  – Elimination orders
• Approximate elimination
  – Decomposition bounds
  – Mini-bucket & weighted mini-bucket
  – Belief propagation
• Summary and Class 2

Page 109:

Preview of Class 2
• Class 1: Introduction and Inference
• Class 2: Search
• Class 3: Variational Methods and Monte-Carlo Sampling

[Figure: a graphical model over A–M, its cluster tree (ABC, BDEF, DGF, EFH, FHK, HJ, KLM), and the corresponding context-minimal AND/OR search graph.]