stat 375: inference in graphical models lectures 1-2montanar/teaching/stat375/lectures/lect-1-2.pdfi...

78

Upload: others

Post on 14-Jul-2020

1 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Stat 375: Inference in Graphical ModelsLectures 1-2

Andrea Montanari

Stanford University

April 2, 2012

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 1 / 49

Page 2: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Logistics

Mon-Wed 9:30AM-10:45PM, Green Earth Sciences 131

Andrea Montanari (montanari)

http://www.stanford.edu/ montanar/TEACHING/Stat375/stat375.html

Homework due on Mon (starting 4/9/2012).Take-home �nal.

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 2 / 49

Page 3: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Topics

I Probability distributions that are `local' wrt a graph

I Random variables sit on vertices.

I Strongly dependent if they are nearby.

I Images, philogenies, error-correcting codes, Bayes networks. . .

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 3 / 49

Page 4: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Topics

Emphasis on computational and mathematical aspects.

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 4 / 49

Page 5: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Topics

I Equivalent graphical representations.

I Polynomial reductions between various probabilistic inferencetasks. Computational hardness.

I Models on trees. Belief propagation.

I Variational inference: naive mean �eld, Bethe free energy,generalized BP and convex relaxations.

I Gaussian graphical models.

I Learning graphical model form data.

I Correlation decay. The Markov Chain Monte Carlo method.

I Applications to clustering and classi�cation.

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 5 / 49

Page 6: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Why you should not take this class

I Emphasis on fundamental challenges.

I There is no textbook.

I We'll use the P-word.

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 6 / 49

Page 7: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Why you should not take this class

I Emphasis on fundamental challenges.

I There is no textbook.

I We'll use the P-word.

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 6 / 49

Page 8: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Why you should not take this class

I Emphasis on fundamental challenges.

I There is no textbook.

I We'll use the P-word.

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 6 / 49

Page 9: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Why you should not take this class

I Emphasis on fundamental challenges.

I There is no textbook.

I We'll use the P-word.

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 6 / 49

Page 10: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Let us start: Families of graphical models

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 7 / 49

Page 11: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

General theme

Probability distribution over x = (x1; x2; : : : ; xn)

�(x1; x2; : : : ; xn)

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 8 / 49

Page 12: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Family # 1: Undirected Pairwise Graphical Models

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 9 / 49

Page 13: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Family # 1: Undirected Pairwise Graphical Models

(aka Markov Random Fields)

x1

x2 x3 x4

x5x6

x7x8x9x10

x11x12

G = (V ;E), V = [n ], x = (x1; : : : ; xn), xi 2 X

�(x ) =1

Z

Y(ij )2E

ij (xi ; xj ) :

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 10 / 49

Page 14: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Undirected Pairwise Graphical Models

Speci�ed by

I Graph G = (V ;E).

I Alphabet X .

I Compatibility functions ij : X � X ! R+, (i ; j ) 2 E .

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 11 / 49

Page 15: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Alphabet

Typically jX j <1.

Occasionally X = R and

�(dx ) =1

Z

Y(ij )2E

ij (xi ; xj )dx

(all formulae interpreted as densities)

Key challenge: n � 1 (space dimension).

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 12 / 49

Page 16: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Alphabet

Typically jX j <1.

Occasionally X = R and

�(dx ) =1

Z

Y(ij )2E

ij (xi ; xj )dx

(all formulae interpreted as densities)

Key challenge: n � 1 (space dimension).

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 12 / 49

Page 17: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Alphabet

Typically jX j <1.

Occasionally X = R and

�(dx ) =1

Z

Y(ij )2E

ij (xi ; xj )dx

(all formulae interpreted as densities)

Key challenge: n � 1 (space dimension).

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 12 / 49

Page 18: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Partition function

Z �Xx2XV

Y(ij )2E

ij (xi ; xj )

[Plays a crucial role!]

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 13 / 49

Page 19: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Notations

1

2 3 4

56

789

10

1112

@i ��neighbors of vertex i

; deg(i) = j@i j ;

@9 =�10; 2; 11

; deg(9) = 3;

xU � (xi )i2U ;

xf1;7;10g = (x1; x7; x10) :

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 14 / 49

Page 20: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Notations

1

2 3 4

56

789

10

1112

@i ��neighbors of vertex i

; deg(i) = j@i j ;

@9 =�10; 2; 11

; deg(9) = 3;

xU � (xi )i2U ;

xf1;7;10g = (x1; x7; x10) :

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 14 / 49

Page 21: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Notations

1

2 3 4

56

789

10

1112

@i ��neighbors of vertex i

; deg(i) = j@i j ;

@9 =�10; 2; 11

; deg(9) = 3;

xU � (xi )i2U ;

xf1;7;10g = (x1; x7; x10) :

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 14 / 49

Page 22: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Example: Ising models

x1

x2 x3 x4

x5x6

x7x8x9x10

x11x12

G = (V ;E), V = [n ], x = (x1; : : : ; xn), xi 2 f+1;�1g

�(x ) =1

Z

Y(ij )2E

ij (xi ; xj ) :

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 15 / 49

Page 23: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Example: Ising models

x1

x2 x3 x4

x5x6

x7x8x9x10

x11x12

G = (V ;E), V = [n ], x = (x1; : : : ; xn), xi 2 f+1;�1g

�(x ) =1

Zexp

n X(i ;j )2E

�ij xixj +Xi2V

�ixio:

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 16 / 49

Page 24: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

A motivation: Boltzmann Machines

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 17 / 49

Page 25: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Can we train a computer to do handwriting?

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 18 / 49

Page 26: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Can we train a computer to do handwriting?

MNIST dataset: 60; 000 handwritted digits (28� 28 pixels)

Can we learn �(xI ), xI 2 f+1;�1gI , I = [28]� [28] that generates

samples as above?

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 19 / 49

Page 27: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Can we train a computer to do handwriting?

MNIST dataset: 60; 000 handwritted digits (28� 28 pixels)

Can we learn �(xI ), xI 2 f+1;�1gI , I = [28]� [28] that generates

samples as above?

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 19 / 49

Page 28: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

An attempt

(R. Salakhutdinov, G. Hinton, AISTATS 2009)

What's the magic?

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 20 / 49

Page 29: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

An attempt

(R. Salakhutdinov, G. Hinton, AISTATS 2009)

What's the magic?

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 20 / 49

Page 30: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

What's the magic?

Image

Hidden variables

�(xI ) =X

xH (1);xH (2);xH (3)

�G;�(xI ; xH (1); xH (2); xH (3))

�G;�( � ) Ising model on G = (V ;E)

V = (I ;H (1);H (2);H (3))

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 21 / 49

Page 31: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

What's the magic?

Image

Hidden variables

�(xI ) =X

xH (1);xH (2);xH (3)

�G;�(xI ; xH (1); xH (2); xH (3))

�G;�( � ) Ising model on G = (V ;E)

V = (I ;H (1);H (2);H (3))

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 21 / 49

Page 32: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Family # 2: Factor Graph Models

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 22 / 49

Page 33: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Family # 2: Factor Graph Models

x3

x1

x6

x4

x2

x5

x7

x

x

x

8

9

10

variable xi 2 X

factor a(x5; x7; x9; x10)

G = (V ;F ;E), V = [n ], x = (x1; : : : ; xn), xi 2 X

�(x ) =1

Z

Ya2F

a(x@a) :

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 23 / 49

Page 34: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Factor graph models

Terminology

I Variable nodes: i ; j ; k ; � � � 2 V .

I Function nodes: a ; b; c � � � 2 F .

Speci�ed by

I Factor graph G = (V ;F ;E).

I Alphabet X .

I Compatibility fuctions a : X @a ! R+, a 2 F .

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 24 / 49

Page 35: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Factor graph models

Terminology

I Variable nodes: i ; j ; k ; � � � 2 V .

I Function nodes: a ; b; c � � � 2 F .

Speci�ed by

I Factor graph G = (V ;F ;E).

I Alphabet X .

I Compatibility fuctions a : X @a ! R+, a 2 F .

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 24 / 49

Page 36: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Factor graphs are as powerful as pairwise models

A pairwise model on G = (V ;E) with alphabet X can be representedby a factor graph model on G 0 = (V 0;F 0;E 0) with V 0 = V , F 0 ' E ,jE 0j = 2jE j, X 0 = X .

I Put a factor node on each edge.

A factor model on G = (V ;F ;E) with alphabet X can be representedby a pairwise model on G 0 = (V 0;F 0) with V 0 = V [ F , E 0 = E ,X 0 = X�, � � maxa2F deg(a).

I Represent at a factor node the state of its neighbors,

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 25 / 49

Page 37: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Factor graphs are as powerful as pairwise models

A pairwise model on G = (V ;E) with alphabet X can be representedby a factor graph model on G 0 = (V 0;F 0;E 0) with V 0 = V , F 0 ' E ,jE 0j = 2jE j, X 0 = X .

I Put a factor node on each edge.

A factor model on G = (V ;F ;E) with alphabet X can be representedby a pairwise model on G 0 = (V 0;F 0) with V 0 = V [ F , E 0 = E ,X 0 = X�, � � maxa2F deg(a).

I Represent at a factor node the state of its neighbors,

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 25 / 49

Page 38: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Factor graphs are as powerful as pairwise models

A pairwise model on G = (V ;E) with alphabet X can be representedby a factor graph model on G 0 = (V 0;F 0;E 0) with V 0 = V , F 0 ' E ,jE 0j = 2jE j, X 0 = X .

I Put a factor node on each edge.

A factor model on G = (V ;F ;E) with alphabet X can be representedby a pairwise model on G 0 = (V 0;F 0) with V 0 = V [ F , E 0 = E ,X 0 = X�, � � maxa2F deg(a).

I Represent at a factor node the state of its neighbors,

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 25 / 49

Page 39: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Family # 3: Bayesian Networks

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 26 / 49

Page 40: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Family # 3: Bayesian Networks

x1 x2 x3

x4 x5

x6 x7 x8

G = (V ;D), directed V = [n ], x = (x1; : : : ; xn), xi 2 XD = f directed edges g

�(x ) =Yi2V

�i (xi jx�(i)) ; �(i) =�parents of i

:

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 27 / 49

Page 41: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Bayesian Networks

Speci�ed by

I Directed acyclic graph G = (V ;D), ((i ; j ) 6= (j ; i)).

I Alphabet X .

I Conditional probability tables �i ( � j � ) : X � X�(i) ! R+, i 2 F :

Xxi2X

�i (xi jx�(i)) = 1 for all x�(i) 2 X�(i) :

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 28 / 49

Page 42: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Bayes networks are as powerful as factor graphs

A Bayes network G = (V ;D) with alphabet X can be represented by afactor graph model on G 0 = (V 0;F 0;E 0) with V 0 = V , jF 0j = jV j,jE 0j = jD j+ jV j, X 0 = X .

I Represent by a factor node each CPT.

A factor model on G = (V ;F ;E) with alphabet X can be representedby a Bayes network G 0 = (V 0;D 0) with V 0 = V and X 0 = X .

I Choose a total ordering of V 0 = V and write � in terms ofconditional probabilities.

In general the resulting Bayes network is dense.

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 29 / 49

Page 43: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Bayes networks are as powerful as factor graphs

A Bayes network G = (V ;D) with alphabet X can be represented by afactor graph model on G 0 = (V 0;F 0;E 0) with V 0 = V , jF 0j = jV j,jE 0j = jD j+ jV j, X 0 = X .

I Represent by a factor node each CPT.

A factor model on G = (V ;F ;E) with alphabet X can be representedby a Bayes network G 0 = (V 0;D 0) with V 0 = V and X 0 = X .

I Choose a total ordering of V 0 = V and write � in terms ofconditional probabilities.

In general the resulting Bayes network is dense.

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 29 / 49

Page 44: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Bayes networks are as powerful as factor graphs

A Bayes network G = (V ;D) with alphabet X can be represented by afactor graph model on G 0 = (V 0;F 0;E 0) with V 0 = V , jF 0j = jV j,jE 0j = jD j+ jV j, X 0 = X .

I Represent by a factor node each CPT.

A factor model on G = (V ;F ;E) with alphabet X can be representedby a Bayes network G 0 = (V 0;D 0) with V 0 = V and X 0 = X .

I Choose a total ordering of V 0 = V and write � in terms ofconditional probabilities.

In general the resulting Bayes network is dense.

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 29 / 49

Page 45: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Bayes networks are as powerful as factor graphs

A Bayes network G = (V ;D) with alphabet X can be represented by afactor graph model on G 0 = (V 0;F 0;E 0) with V 0 = V , jF 0j = jV j,jE 0j = jD j+ jV j, X 0 = X .

I Represent by a factor node each CPT.

A factor model on G = (V ;F ;E) with alphabet X can be representedby a Bayes network G 0 = (V 0;D 0) with V 0 = V and X 0 = X .

I Choose a total ordering of V 0 = V and write � in terms ofconditional probabilities.

In general the resulting Bayes network is dense.

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 29 / 49

Page 46: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Bayes networks with observed variables

V = H [O ;

x = (xi )i2H = (x1; : : : ; xn) = ( Hidden Variables ) ;

y = (yi )i2O = (y1; : : : ; ym) = ( Observed Variables )

�(x ; y) =Yi2H

�(xi jx�(i)\H ; y�(i)\O)Yi2O

�(yi jx�(i)\H ; y�(i)\O)

Of interest �y(x ) = �(x jy)

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 30 / 49

Page 47: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Bayes networks with observed variables

V = H [O ;

x = (xi )i2H = (x1; : : : ; xn) = ( Hidden Variables ) ;

y = (yi )i2O = (y1; : : : ; ym) = ( Observed Variables )

�(x ; y) =Yi2H

�(xi jx�(i)\H ; y�(i)\O)Yi2O

�(yi jx�(i)\H ; y�(i)\O)

Of interest �y(x ) = �(x jy)

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 30 / 49

Page 48: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Bayes networks with observed variables

V = H [O ;

x = (xi )i2H = (x1; : : : ; xn) = ( Hidden Variables ) ;

y = (yi )i2O = (y1; : : : ; ym) = ( Observed Variables )

�(x ; y) =Yi2H

�(xi jx�(i)\H ; y�(i)\O)Yi2O

�(yi jx�(i)\H ; y�(i)\O)

Of interest �y(x ) = �(x jy)

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 30 / 49

Page 49: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Example 1: Forensic science

[Kadane, Shum, A probabilistic analysis of the Sacco and Vanzetti evidence, 1996]

[Taroni et al., Bayesian Networks and Probabilistic Inference in Forensic Science,

2006]

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 31 / 49

Page 50: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Example 2: Diagnostic network

diseases

`soft ORs'

symptoms

[M. Shwe, et al., Methods of Information in Medicine, 1991]Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 32 / 49

Page 51: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Example 2: Diagnostic network

diseases

symptoms

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 33 / 49

Page 52: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Bayes networks are as powerful as factor graphs

diseases

symptoms

With observed variables graph remains unchanged

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 34 / 49

Page 53: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Markov property

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 35 / 49

Page 54: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Global Markov Property

�(x ) = �(x1; x2; : : : ; xn)

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 36 / 49

Page 55: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Global Markov Property

A B C

De�nition

Let A[B [C be a partition of V . We say that B separates A from C

if any path starting in A and terminating in C has at least one node inB .

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 37 / 49

Page 56: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Global Markov Property

A B C

De�nition

The probability distribution � over XV satis�es the global Markovproperty on G if for any partition V = A [B [C such that Bseparates A from C ,

�(xA; xC jxB ) = �(xAjxB )�(xC jxB )

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 38 / 49

Page 57: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

The most general

Theorem (Hammersley, Cli�ord, 1971)

Let �( � ) be a probability distribution on XV , and G = (V ;E) be agraph such that

I � is Markov with respect to G.

I �(x ) > 0 for all x 2 XV .

I G does not contain triangles.

Then � is a pairwise graphical model on G.

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 39 / 49

Page 58: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Proof sketch (Grimmett, Bull. London Math. Soc. 1973)

For every S � V , de�ne

e S (xS ) � YU�S

�(xU ; 0V nU )(�1)jSnU j

Example: For S = fi ; j g (not necessarily edge) let�ij ;+( � ) � �( � ; 0V nfi ;jg)

e ij (xi ; xj ) = �ij ;+(xi ; xj )�ij ;+(xi ; 0)�1�ij ;+(0; xj )

�1 �ij ;+(0; 0) :

Exercise: If V = fi ; j g. . .

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 40 / 49

Page 59: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Proof sketch (Grimmett, Bull. London Math. Soc. 1973)

For every S � V , de�ne

e S (xS ) � YU�S

�(xU ; 0V nU )(�1)jSnU j

Example: For S = fi ; j g (not necessarily edge) let�ij ;+( � ) � �( � ; 0V nfi ;jg)

e ij (xi ; xj ) = �ij ;+(xi ; xj )�ij ;+(xi ; 0)�1�ij ;+(0; xj )

�1 �ij ;+(0; 0) :

Exercise: If V = fi ; j g. . .

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 40 / 49

Page 60: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Proof sketch (Grimmett, Bull. London Math. Soc. 1973)

For every S � V , de�ne

e S (xS ) � YU�S

�(xU ; 0V nU )(�1)jSnU j

Example: For S = fi ; j g (not necessarily edge) let�ij ;+( � ) � �( � ; 0V nfi ;jg)

e ij (xi ; xj ) = �ij ;+(xi ; xj )�ij ;+(xi ; 0)�1�ij ;+(0; xj )

�1 �ij ;+(0; 0) :

Exercise: If V = fi ; j g. . .

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 40 / 49

Page 61: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Proof sketch (Grimmett, 1973)

I Claim 1: �(x ) = �(0V )QS�V

e S (xS )

I Claim 2: For S 6= fi ; j g 2 E , fig, e S (xS ) =const.Given �, can construct G !

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 41 / 49

Page 62: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Proof sketch (Grimmett, 1973)

I Claim 1: �(x ) = �(0V )QS�V

e S (xS )

I Claim 2: For S 6= fi ; j g 2 E , fig, e S (xS ) =const.Given �, can construct G !

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 41 / 49

Page 63: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Useful facts

Fact

XK�W

(�1)jK j = I(W = ;) :

(Sum includes the empty set.)

Fact

If f is a set function, W 0 �W proper subset, then

XK�W

(�1)jK jf (K \W 0) = 0 :

(Sum includes the empty set.)

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 42 / 49

Page 64: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

If G contains triangles

�(x ) =1

Z

YC2cliques(G)

C(xC) :

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 43 / 49

Page 65: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Footnote #1

De�nition (Global Markov)

For any partition V = A [B [C such that B separates A from C ,

�(xA; xC jxB ) = �(xAjxB )�(xC jxB )

Local Markov: Required only for A = fig, B = @i , C = V n fig[ @i .

Pairwise Markov: Required only for A = fig, B = V n fi ; j g,C = fj g.

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 44 / 49

Page 66: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

They are in fact equivalent

Obviously

(G)) (L)) (P)

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 45 / 49

Page 67: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Less obviously: (P) ) (G)

By induction over s � jV nB j:

I s = 2: Pairwise property.

I Assume (G) for any jB j with jV nB j = s � 2 and prove it forjV nB j = s + 1.

WLOG take A [B [C = V and jAj � 2.

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 46 / 49

Page 68: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Less obviously: (P) ) (G)

By induction over s � jV nB j:

I s = 2: Pairwise property.

I Assume (G) for any jB j with jV nB j = s � 2 and prove it forjV nB j = s + 1.

WLOG take A [B [C = V and jAj � 2.

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 46 / 49

Page 69: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Induction step: The Intersection Lemma

We write A�B�C if xA is conditionally independent of xC given xB .

Lemma

If � is strictly positive and

A�(C [D)�B ; A�(B [D)�C ;

then

A�D�(B [C ) :

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 47 / 49

Page 70: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Induction step

i

eAA B C

A = eA [ figby induction assumption C�(B [ i)�eA

C�(B [ eA)�fig

By the intersection lemma

C�B�(eA [ i) = A

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 48 / 49

Page 71: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Induction step

i

eAA B C

A = eA [ figby induction assumption C�(B [ i)�eA

C�(B [ eA)�fig

By the intersection lemma

C�B�(eA [ i) = A

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 48 / 49

Page 72: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Induction step

i

eAA B C

A = eA [ figby induction assumption C�(B [ i)�eA

C�(B [ eA)�fig

By the intersection lemma

C�B�(eA [ i) = A

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 48 / 49

Page 73: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

An interesting mathematical phenomenon

This is not true if G is in�nite.

BTW: How do you de�ne a MRF on an in�nite graph?

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 49 / 49

Page 74: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

An interesting mathematical phenomenon

This is not true if G is in�nite.

BTW: How do you de�ne a MRF on an in�nite graph?

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 49 / 49

Page 75: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

An interesting mathematical phenomenon

This is not true if G is in�nite.

BTW: How do you de�ne a MRF on an in�nite graph?

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 49 / 49

Page 76: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Footnote #2

Where does the (�1)jSnU j come from?

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 50 / 49

Page 77: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Möbius inversion formula

2V =�subsets of V

;

Theorem

Let f ; g : 2V ! R. Then the following are equivalent

f (S) =XU�S

g(U ) ; for all S � V ;

g(S) =XU�S

(�1)jSnU jf (U ) ; for all S � V :

Proof: Exercise.

[see also G.C.Rota, Prob. Theor. Rel. Fields, 2 (1964) 340-368]

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 51 / 49

Page 78: Stat 375: Inference in Graphical Models Lectures 1-2montanar/TEACHING/Stat375/LECTURES/lect-1-2.pdfI We'll use the P-word. Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2,

Relation with what we did

f (S) = log�(xS ; 0V nS ) ;

g(S) =XU�S

(�1)jSnU j log�(xU ; 0V nU ) = ~psiS (x )

Claim 1 in the proof Möbius

Andrea Montanari (Stanford) Stat375: Lecture 1, 2 April 2, 2012 52 / 49