
Page 1: Probabilistic Inference Lecture 1

Probabilistic Inference
Lecture 1

M. Pawan Kumar ([email protected])

Slides available online http://cvc.centrale-ponts.fr/personnel/pawan/

Page 2: Probabilistic Inference Lecture 1

About the Course

• 7 lectures + 1 exam

• Probabilistic Models – 1 lecture

• Energy Minimization – 4 lectures

• Computing Marginals – 2 lectures

• Related Courses
  • Probabilistic Graphical Models (MVA)
  • Structured Prediction

Page 3: Probabilistic Inference Lecture 1

Instructor

• Assistant Professor (2012 – Present)

• Center for Visual Computing
  • 12 Full-time Faculty Members
  • 2 Associate Faculty Members

• Research Interests
  • Probabilistic Models
  • Machine Learning
  • Computer Vision
  • Medical Image Analysis

Page 4: Probabilistic Inference Lecture 1

Students

• Third year at ECP

• Specializing in Machine Learning and Vision

• Prerequisites
  • Probability Theory
  • Continuous Optimization
  • Discrete Optimization

Page 5: Probabilistic Inference Lecture 1

Outline

• Probabilistic Models

• Conversions

• Exponential Family

• Inference

Example (on board) !!

Page 6: Probabilistic Inference Lecture 1

Outline

• Probabilistic Models
  • Markov Random Fields (MRF)
  • Bayesian Networks
  • Factor Graphs

• Conversions

• Exponential Family

• Inference

Page 7: Probabilistic Inference Lecture 1

MRF

Unobserved Random Variables

Edges define a neighborhood over random variables

Neighbors

Page 8: Probabilistic Inference Lecture 1

MRF

V1 V2 V3

V4 V5 V6

V7 V8 V9

Variable Va takes a value, or label, va from a set L = {l1, l2, …, lh}

L is discrete and finite

V = v is called a labeling

Page 9: Probabilistic Inference Lecture 1

MRF

V1 V2 V3

V4 V5 V6

V7 V8 V9

MRF assumes the Markovian property for P(v)

Page 10: Probabilistic Inference Lecture 1

MRF

V1 V2 V3

V4 V5 V6

V7 V8 V9

Given its neighbors, Va is conditionally independent of every non-neighboring variable Vb

Hammersley-Clifford Theorem

Page 11: Probabilistic Inference Lecture 1

MRF

V1 V2 V3

V4 V5 V6

V7 V8 V9

Probability P(v) can be decomposed into clique potentials

Potential ψ12(v1,v2)

Potential ψ56(v5,v6)

Page 12: Probabilistic Inference Lecture 1

MRF

[Figure: 3×3 grid MRF with variables V1–V9, each connected to its observed data d1–d9]

Probability P(v) proportional to Π(a,b) ψab(va,vb)

Potential ψ1(v1,d1)

Probability P(d|v) proportional to Πa ψa(va,da)

Observed Data

Page 13: Probabilistic Inference Lecture 1

MRF

[Figure: 3×3 grid MRF with variables V1–V9, each connected to its observed data d1–d9]

Probability P(v,d) = (1/Z) Πa ψa(va,da) Π(a,b) ψab(va,vb)

Z is known as the partition function
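
Not from the slides: a minimal brute-force sketch of this factorization on a toy 2×2 grid with binary labels, with all potential values invented. It makes explicit that Z sums the unnormalized product over every possible labeling, which is why computing it naively costs h^n.

```python
import itertools

# A brute-force sketch (not from the slides) of
# P(v, d) = (1/Z) * prod_a psi_a(v_a, d_a) * prod_(a,b) psi_ab(v_a, v_b)
# on a toy 2x2 grid with binary labels. All potential values are invented.

labels = [0, 1]
nodes = [0, 1, 2, 3]                      # 2x2 grid, row-major order
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]  # 4-connected neighborhood
data = [0.2, 0.9, 0.1, 0.8]               # observed d_a, arbitrary values

def psi_unary(v, d):
    # prefer label 1 when d is large, label 0 when d is small
    return d if v == 1 else 1.0 - d

def psi_pair(v, w):
    # smoothness: neighboring variables prefer equal labels
    return 2.0 if v == w else 1.0

def unnormalized(v):
    p = 1.0
    for a in nodes:
        p *= psi_unary(v[a], data[a])
    for a, b in edges:
        p *= psi_pair(v[a], v[b])
    return p

# Z sums the unnormalized score over all h^n labelings
Z = sum(unnormalized(v) for v in itertools.product(labels, repeat=len(nodes)))

v = (0, 1, 0, 1)
print("P(v, d) =", unnormalized(v) / Z)
```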

Page 14: Probabilistic Inference Lecture 1

MRF

[Figure: 3×3 grid MRF with variables V1–V9, each connected to its observed data d1–d9]

High-order Potential ψ4578(v4,v5,v7,v8)

Page 15: Probabilistic Inference Lecture 1

Pairwise MRF

[Figure: 3×3 grid MRF with variables V1–V9, each connected to its observed data d1–d9]

Unary Potential ψ1(v1,d1)

Pairwise Potential ψ56(v5,v6)

Probability P(v,d) = (1/Z) Πa ψa(va,da) Π(a,b) ψab(va,vb)

Z is known as the partition function

Page 16: Probabilistic Inference Lecture 1

MRF

[Figure: 3×3 grid MRF with variables V1–V9, each connected to its observed data d1–d9]

A is conditionally independent of B given C if

there is no path from A to B when C is removed
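
A small sketch of the separation test just stated: remove C from the graph and check by breadth-first search whether B is still reachable from A. The 3×3 grid adjacency mirrors the slide's figure; the chosen sets A, B, C are illustrative.

```python
from collections import deque

# Sketch of the separation test above: A is conditionally independent of B
# given C if removing C leaves no path from A to B. Sets are illustrative.

def separated(adj, A, B, C):
    """BFS from A in the graph with C removed; True iff B is unreachable."""
    start = set(A) - set(C)
    seen, queue = set(start), deque(start)
    while queue:
        u = queue.popleft()
        if u in B:
            return False
        for w in adj[u]:
            if w not in C and w not in seen:
                seen.add(w)
                queue.append(w)
    return True

# 3x3 grid adjacency (nodes 1..9, row-major), as in the slide's figure
adj = {1: [2, 4], 2: [1, 3, 5], 3: [2, 6], 4: [1, 5, 7], 5: [2, 4, 6, 8],
       6: [3, 5, 9], 7: [4, 8], 8: [5, 7, 9], 9: [6, 8]}
print(separated(adj, {1}, {9}, {2, 4}))  # True: C cuts off the corner
print(separated(adj, {1}, {9}, {5}))     # False: paths around V5 remain
```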

Page 17: Probabilistic Inference Lecture 1

Conditional Random Fields (CRF)

[Figure: 3×3 grid of variables V1–V9, each connected to its observed data d1–d9]

CRF assumes the Markovian property for P(v|d)

Hammersley-Clifford Theorem

Page 18: Probabilistic Inference Lecture 1

CRF

[Figure: 3×3 grid of variables V1–V9, each connected to its observed data d1–d9]

Probability P(v|d) proportional to Πa ψa(va;d) Π(a,b) ψab(va,vb;d)

Clique potentials that depend on the data

Page 19: Probabilistic Inference Lecture 1

CRF

[Figure: 3×3 grid of variables V1–V9, each connected to its observed data d1–d9]

Probability P(v|d) = (1/Z(d)) Πa ψa(va;d) Π(a,b) ψab(va,vb;d)

Z(d) is known as the partition function; for a CRF it depends on the data d

Page 20: Probabilistic Inference Lecture 1

MRF and CRF

V1 V2 V3

V4 V5 V6

V7 V8 V9

Probability P(v) = (1/Z) Πa ψa(va) Π(a,b) ψab(va,vb)

Page 21: Probabilistic Inference Lecture 1

Outline

• Probabilistic Models
  • Markov Random Fields (MRF)
  • Bayesian Networks
  • Factor Graphs

• Conversions

• Exponential Family

• Inference

Page 22: Probabilistic Inference Lecture 1

Bayesian Networks

V1

V2 V3

V4 V5 V6

V7 V8

Directed Acyclic Graph (DAG) – no directed loops

Ignoring directionality of edges, a DAG can have loops

Page 23: Probabilistic Inference Lecture 1

Bayesian Networks

V1

V2 V3

V4 V5 V6

V7 V8

A Bayesian network concisely represents the probability P(v)

Page 24: Probabilistic Inference Lecture 1

Bayesian Networks

V1

V2 V3

V4 V5 V6

V7 V8

Probability P(v) = Πa P(va|Parents(va))

P(v1)P(v2|v1)P(v3|v1)P(v4|v2)P(v5|v2,v3)P(v6|v3)P(v7|v4,v5)P(v8|v5,v6)
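
A hedged sketch of this factorization in code, using the slide's 8-node DAG. The conditional probability tables are invented placeholders; only the parent structure comes from the slide. Note the contrast with the MRF: the product of conditionals is already normalized, so no partition function appears.

```python
import itertools

# Sketch of the factorization above for the slide's 8-node DAG. Only the
# parent structure comes from the slide; the conditional probabilities
# below are invented placeholders.

parents = {1: (), 2: (1,), 3: (1,), 4: (2,), 5: (2, 3),
           6: (3,), 7: (4, 5), 8: (5, 6)}

def cpt(va, pa):
    """Toy P(v_a = va | parents = pa) over binary labels; sums to 1 in va."""
    if not pa:
        p1 = 0.6                                   # root prior P(v_a = 1)
    else:
        p1 = 0.8 if sum(pa) >= len(pa) / 2 else 0.2
    return p1 if va == 1 else 1.0 - p1

def joint(v):  # v maps node -> binary value
    p = 1.0
    for a, pa in parents.items():
        p *= cpt(v[a], tuple(v[b] for b in pa))
    return p

# Unlike the MRF, the product of conditionals is already normalized:
total = sum(joint(dict(zip(parents, vals)))
            for vals in itertools.product([0, 1], repeat=8))
print(round(total, 10))  # 1.0 -- no partition function required
```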

Page 25: Probabilistic Inference Lecture 1

Bayesian Networks

[Figure courtesy of Kevin Murphy]

Page 26: Probabilistic Inference Lecture 1

Bayesian Networks

V1

V2 V3

V4 V5 V6

V7 V8

Va is conditionally independent of its ancestors given its parents

Page 27: Probabilistic Inference Lecture 1

Bayesian Networks

Conditional independence of A and B given C

[Figure courtesy of Kevin Murphy]

Page 28: Probabilistic Inference Lecture 1

Outline

• Probabilistic Models
  • Markov Random Fields (MRF)
  • Bayesian Networks
  • Factor Graphs

• Conversions

• Exponential Family

• Inference

Page 29: Probabilistic Inference Lecture 1

Factor Graphs

[Figure: factor graph with variable nodes V1–V6 and factor nodes a–g]

Two types of nodes: variable nodes and factor nodes

Bipartite graph between the two types of nodes

Page 30: Probabilistic Inference Lecture 1

Factor Graphs

[Figure: factor graph with variable nodes V1–V6 and factor nodes a–g]

A factor graph concisely represents the probability P(v)

ψa(v1,v2)

Page 31: Probabilistic Inference Lecture 1

Factor Graphs

[Figure: factor graph with variable nodes V1–V6 and factor nodes a–g]

A factor graph concisely represents the probability P(v)

ψa({v}a)

Page 32: Probabilistic Inference Lecture 1

Factor Graphs

[Figure: factor graph with variable nodes V1–V6 and factor nodes a–g]

A factor graph concisely represents the probability P(v)

ψb(v2,v3)

Page 33: Probabilistic Inference Lecture 1

Factor Graphs

[Figure: factor graph with variable nodes V1–V6 and factor nodes a–g]

A factor graph concisely represents the probability P(v)

ψb({v}b)

Page 34: Probabilistic Inference Lecture 1

Factor Graphs

[Figure: factor graph with variable nodes V1–V6 and factor nodes a–g]

ψb({v}b)

Probability P(v) = (1/Z) Πa ψa({v}a)

Z is known as the partition function
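
A minimal sketch of this factorization, with invented factors: each factor multiplies in a potential over an arbitrary subset of variables, which is how factor graphs express higher-order terms as naturally as pairwise ones.

```python
import itertools

# Sketch of P(v) = (1/Z) * prod_a psi_a({v}_a): each factor touches an
# arbitrary subset of variables. The specific factors are invented, not
# read off the slide's figure.

n_vars, labels = 3, [0, 1]

# each factor is (scope, psi) where psi maps the tuple of scoped values
factors = [
    ((0, 1), lambda t: 2.0 if t[0] == t[1] else 1.0),    # pairwise factor
    ((1, 2), lambda t: 2.0 if t[0] == t[1] else 1.0),
    ((0, 1, 2), lambda t: 3.0 if sum(t) == 0 else 1.0),  # higher-order factor
]

def score(v):
    s = 1.0
    for scope, psi in factors:
        s *= psi(tuple(v[i] for i in scope))
    return s

Z = sum(score(v) for v in itertools.product(labels, repeat=n_vars))
print("P(0,0,0) =", score((0, 0, 0)) / Z)
```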

Page 35: Probabilistic Inference Lecture 1

Outline

• Probabilistic Models

• Conversions

• Exponential Family

• Inference

Page 36: Probabilistic Inference Lecture 1

MRF to Factor Graphs

Page 37: Probabilistic Inference Lecture 1

Bayesian Networks to Factor Graphs

Page 38: Probabilistic Inference Lecture 1

Factor Graphs to MRF

Page 39: Probabilistic Inference Lecture 1

Outline

• Probabilistic Models

• Conversions

• Exponential Family

• Inference

Page 40: Probabilistic Inference Lecture 1

Motivation

Random Variable V, Label set L = {l1, l2, …, lh}

Samples V1, V2, …, Vm that are i.i.d.

Functions ϕα: L → Reals, where α indexes a set of functions

Empirical expectations: μα = (Σi ϕα(Vi))/m

Expectation w.r.t. distribution P: EP[ϕα(V)] = Σi ϕα(li)P(li)

Given the empirical expectations, find a compatible distribution

This problem is underdetermined

Page 41: Probabilistic Inference Lecture 1

Maximum Entropy Principle

max Entropy of the distribution

s.t. Distribution is compatible

Page 42: Probabilistic Inference Lecture 1

Maximum Entropy Principle

max -Σi P(li)log(P(li))

s.t. Distribution is compatible

Page 43: Probabilistic Inference Lecture 1

Maximum Entropy Principle

max -Σi P(li)log(P(li))

s.t. Σi ϕα(li)P(li) = μα for all α

Σi P(li) = 1

P(v) proportional to exp(-Σα θαϕα(v))
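
A hedged one-step sketch of where this exponential form comes from: introduce Lagrange multipliers θα for the moment constraints and λ for normalization, and set the derivative with respect to P(li) to zero.

```latex
% Lagrangian for the maximum-entropy problem above, with multipliers
% \theta_\alpha (moment matching) and \lambda (normalization):
\mathcal{L} = -\sum_i P(l_i)\log P(l_i)
  - \sum_\alpha \theta_\alpha \Bigl(\sum_i \phi_\alpha(l_i)\,P(l_i) - \mu_\alpha\Bigr)
  - \lambda \Bigl(\sum_i P(l_i) - 1\Bigr)

% Stationarity in each P(l_i):
\frac{\partial \mathcal{L}}{\partial P(l_i)}
  = -\log P(l_i) - 1 - \sum_\alpha \theta_\alpha \phi_\alpha(l_i) - \lambda = 0
\;\Longrightarrow\;
P(l_i) \propto \exp\Bigl(-\sum_\alpha \theta_\alpha \phi_\alpha(l_i)\Bigr)
```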

Page 44: Probabilistic Inference Lecture 1

Exponential Family

Random Variables V = {V1, V2, …, Vn}, Label set L = {l1, l2, …, lh}

Labeling V = v, with va ∈ L for all a ∈ {1, 2, …, n}

Functions ϕα: Ln → Reals, where α indexes a set of functions

P(v) = exp{-Σα θαΦα(v) - A(θ)}

Φα(v): Sufficient Statistics, θα: Parameters, A(θ): Normalization Constant

Page 45: Probabilistic Inference Lecture 1

Minimal Representation

P(v) = exp{-Σα θαΦα(v) - A(θ)}

Φα(v): Sufficient Statistics, θα: Parameters, A(θ): Normalization Constant

There is no non-zero c such that Σα cαΦα(v) = constant for all v

Page 46: Probabilistic Inference Lecture 1

Ising Model

P(v) = exp{-Σα θαΦα(v) - A(θ)}

Random Variable V = {V1, V2, …, Vn}, Label set L = {l1, l2}

Page 47: Probabilistic Inference Lecture 1

Ising Model

P(v) = exp{-Σα θαΦα(v) - A(θ)}

Random Variable V = {V1, V2, …, Vn}, Label set L = {-1, +1}

Neighborhood over variables specified by edges E

Sufficient statistic va, parameter θa, for all Va ∈ V
Sufficient statistic vavb, parameter θab, for all (Va,Vb) ∈ E

Page 48: Probabilistic Inference Lecture 1

Ising Model

P(v) = exp{-Σa θava - Σ(a,b) θabvavb - A(θ)}

Random Variable V = {V1, V2, …, Vn}, Label set L = {-1, +1}

Neighborhood over variables specified by edges E

Sufficient statistic va, parameter θa, for all Va ∈ V
Sufficient statistic vavb, parameter θab, for all (Va,Vb) ∈ E
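
A sketch of this model in code on a 3-node chain with invented parameter values; labels live in {-1, +1} and A(θ) is computed by exhaustive summation.

```python
import itertools, math

# Sketch of the Ising model above on a 3-node chain: labels in {-1, +1},
# P(v) = exp{-sum_a theta_a v_a - sum_(a,b) theta_ab v_a v_b - A(theta)}.
# The graph and parameter values are invented.

nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]
theta_a = {0: 0.5, 1: -0.2, 2: 0.1}
theta_ab = {(0, 1): -0.3, (1, 2): -0.3}  # negative => neighbors prefer to agree

def neg_energy(v):
    s = -sum(theta_a[a] * v[a] for a in nodes)
    s -= sum(t * v[a] * v[b] for (a, b), t in theta_ab.items())
    return s

# A(theta) normalizes: the log of the sum of exp(neg_energy) over labelings
A = math.log(sum(math.exp(neg_energy(v))
                 for v in itertools.product([-1, 1], repeat=len(nodes))))

v = (1, 1, -1)
print("P(v) =", math.exp(neg_energy(v) - A))
```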

Page 49: Probabilistic Inference Lecture 1

Interactive Binary Segmentation

Page 50: Probabilistic Inference Lecture 1

Interactive Binary Segmentation

Foreground histogram of RGB values FG

Background histogram of RGB values BG

‘+1’ indicates foreground and ‘-1’ indicates background

Page 51: Probabilistic Inference Lecture 1

Interactive Binary Segmentation

More likely to be foreground than background

Page 52: Probabilistic Inference Lecture 1

Interactive Binary Segmentation

More likely to be background than foreground

θa proportional to -log(FG(da)) + log(BG(da))

Page 53: Probabilistic Inference Lecture 1

Interactive Binary Segmentation

More likely to belong to same label

Page 54: Probabilistic Inference Lecture 1

Interactive Binary Segmentation

Less likely to belong to same label

θab proportional to -exp(-(da-db)^2)
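
A hedged sketch turning the two formulas above into code. The FG and BG histograms and the intensity row are invented; the point is the sign pattern: θa is negative where the foreground model fits better, and θab is most negative (strongest preference for equal labels) where neighboring intensities are similar.

```python
import math

# Sketch of the two formulas above. FG and BG are hypothetical normalized
# histograms over quantized intensities, and the pixel row d is invented.
# theta_a < 0 favors v_a = +1 (foreground) in the Ising parameterization.

FG = {0: 0.05, 1: 0.15, 2: 0.30, 3: 0.50}  # assumed foreground histogram
BG = {0: 0.50, 1: 0.30, 2: 0.15, 3: 0.05}  # assumed background histogram
d = [3, 2, 0, 1]                           # observed intensities, one per pixel
edges = [(0, 1), (1, 2), (2, 3)]           # chain neighborhood for a 1D "image"

# theta_a = -log FG(d_a) + log BG(d_a)
theta_a = [-math.log(FG[da]) + math.log(BG[da]) for da in d]

# theta_ab = -exp(-(d_a - d_b)^2): most negative for similar neighbors
theta_ab = {(a, b): -math.exp(-(d[a] - d[b]) ** 2) for a, b in edges}

print(theta_a)   # negative where the foreground model fits better
print(theta_ab)  # strongest agreement preference for similar intensities
```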

Page 55: Probabilistic Inference Lecture 1

Rest of lecture 1 ….

Page 56: Probabilistic Inference Lecture 1

Exponential Family

P(v) = exp{-Σα θαΦα(v) - A(θ)}

Φα(v): Sufficient Statistics, θα: Parameters, A(θ): Log-Partition Function

Random Variables V = {V1, V2, …, Vn}

Labeling V = v, va ∈ L = {l1, l2, …, lh}

Random Variable Va takes a value, or label, va

Page 57: Probabilistic Inference Lecture 1

Overcomplete Representation

P(v) = exp{-Σα θαΦα(v) - A(θ)}

Φα(v): Sufficient Statistics, θα: Parameters, A(θ): Log-Partition Function

There exists a non-zero c such that Σα cαΦα(v) = constant for all v

Page 58: Probabilistic Inference Lecture 1

Ising Model

P(v) = exp{-Σα θαΦα(v) - A(θ)}

Random Variable V = {V1, V2, …, Vn}, Label set L = {l1, l2}

Page 59: Probabilistic Inference Lecture 1

Ising Model

P(v) = exp{-Σα θαΦα(v) - A(θ)}

Random Variable V = {V1, V2, …, Vn}, Label set L = {0, 1}

Neighborhood over variables specified by edges E

Sufficient statistic Ia;i(va), parameter θa;i, for all Va ∈ V, li ∈ L
Sufficient statistic Iab;ik(va,vb), parameter θab;ik, for all (Va,Vb) ∈ E, li, lk ∈ L

Ia;i(va): indicator for va = li; Iab;ik(va,vb): indicator for va = li, vb = lk

Page 60: Probabilistic Inference Lecture 1

Ising Model

P(v) = exp{-Σa Σi θa;i Ia;i(va) - Σ(a,b) Σi,k θab;ik Iab;ik(va,vb) - A(θ)}

Random Variable V = {V1, V2, …, Vn}, Label set L = {0, 1}

Neighborhood over variables specified by edges E

Sufficient statistic Ia;i(va), parameter θa;i, for all Va ∈ V, li ∈ L
Sufficient statistic Iab;ik(va,vb), parameter θab;ik, for all (Va,Vb) ∈ E, li, lk ∈ L

Ia;i(va): indicator for va = li; Iab;ik(va,vb): indicator for va = li, vb = lk

Page 61: Probabilistic Inference Lecture 1

Interactive Binary Segmentation

Foreground histogram of RGB values FG

Background histogram of RGB values BG

‘1’ indicates foreground and ‘0’ indicates background

Page 62: Probabilistic Inference Lecture 1

Interactive Binary Segmentation

More likely to be foreground than background

Page 63: Probabilistic Inference Lecture 1

Interactive Binary Segmentation

More likely to be background than foreground

θa;0 proportional to -log(BG(da))

θa;1 proportional to -log(FG(da))

Page 64: Probabilistic Inference Lecture 1

Interactive Binary Segmentation

More likely to belong to same label

Page 65: Probabilistic Inference Lecture 1

Interactive Binary Segmentation

Less likely to belong to same label

θab;ik proportional to exp(-(da-db)^2) if i ≠ k

θab;ik = 0 if i = k
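
For comparison with the minimal representation, a short sketch of the same construction in the overcomplete parameterization, reusing the hypothetical histograms and intensities from the earlier sketch.

```python
import math

# The same energies in the overcomplete representation, reusing the
# hypothetical FG/BG histograms and intensities from the earlier sketch.

FG = {0: 0.05, 1: 0.15, 2: 0.30, 3: 0.50}
BG = {0: 0.50, 1: 0.30, 2: 0.15, 3: 0.05}
d = [3, 2, 0, 1]
edges = [(0, 1), (1, 2), (2, 3)]

# Unary: theta_a;0 = -log BG(d_a), theta_a;1 = -log FG(d_a)
theta = {(a, 0): -math.log(BG[d[a]]) for a in range(len(d))}
theta.update({(a, 1): -math.log(FG[d[a]]) for a in range(len(d))})

# Pairwise: disagreement (i != k) costs more where intensities are similar
theta_pair = {(a, b, i, k): (math.exp(-(d[a] - d[b]) ** 2) if i != k else 0.0)
              for a, b in edges for i in (0, 1) for k in (0, 1)}

print(theta[(0, 1)], theta_pair[(0, 1, 0, 1)])
```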

Page 66: Probabilistic Inference Lecture 1

Metric Labeling

P(v) = exp{-Σα θαΦα(v) - A(θ)}

Random Variable V = {V1, V2, …, Vn}, Label set L = {l1, l2, …, lh}

Page 67: Probabilistic Inference Lecture 1

Metric Labeling

P(v) = exp{-Σα θαΦα(v) - A(θ)}

Random Variable V = {V1, V2, …, Vn}, Label set L = {0, …, h-1}

Neighborhood over variables specified by edges E

Sufficient statistic Ia;i(va), parameter θa;i, for all Va ∈ V, li ∈ L
Sufficient statistic Iab;ik(va,vb), parameter θab;ik, for all (Va,Vb) ∈ E, li, lk ∈ L

θab;ik is a metric distance function over the labels

Page 68: Probabilistic Inference Lecture 1

Metric Labeling

P(v) = exp{-Σa Σi θa;i Ia;i(va) - Σ(a,b) Σi,k θab;ik Iab;ik(va,vb) - A(θ)}

Random Variable V = {V1, V2, …, Vn}, Label set L = {0, …, h-1}

Neighborhood over variables specified by edges E

Sufficient statistic Ia;i(va), parameter θa;i, for all Va ∈ V, li ∈ L
Sufficient statistic Iab;ik(va,vb), parameter θab;ik, for all (Va,Vb) ∈ E, li, lk ∈ L

θab;ik is a metric distance function over the labels

Page 69: Probabilistic Inference Lecture 1

Stereo Correspondence

Disparity Map

Page 70: Probabilistic Inference Lecture 1

Stereo Correspondence

L = {disparities}

Pixel (xa, ya) in the left image corresponds to pixel (xa+va, ya) in the right image

Page 71: Probabilistic Inference Lecture 1

Stereo Correspondence

L = {disparities}

θa;i is proportional to the difference in RGB values

Page 72: Probabilistic Inference Lecture 1

Stereo Correspondence

L = {disparities}

θab;ik = wab d(i,k)

wab proportional to exp(-(da-db)^2)
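
A small sketch of this pairwise term, assuming a truncated linear metric over disparities (the truncation constant and the intensities are assumptions, not from the slide).

```python
import math

# Sketch of theta_ab;ik = w_ab * d(i, k) for stereo, assuming a truncated
# linear metric over disparities; the truncation constant and intensities
# are assumptions, not from the slide.

def metric(i, k, trunc=3):
    return min(abs(i - k), trunc)     # truncated linear: a metric on labels

def w(da, db):
    return math.exp(-(da - db) ** 2)  # large for similar neighboring pixels

disparities = range(4)
da, db = 0.4, 0.5                     # invented intensities at pixels a and b
theta_ab = {(i, k): w(da, db) * metric(i, k)
            for i in disparities for k in disparities}
print(theta_ab[(0, 3)], theta_ab[(2, 2)])  # large disparity jump vs. none
```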

Page 73: Probabilistic Inference Lecture 1

Pairwise MRF

Random Variable V = {V1, V2, …,Vn}

Neighborhood over variables specified by edges E

Label set L = {l1, l2, …, lh}

P(v) = exp{-Σα θαΦα(v) - A(θ)}

Sufficient statistic Ia;i(va), parameter θa;i, for all Va ∈ V, li ∈ L
Sufficient statistic Iab;ik(va,vb), parameter θab;ik, for all (Va,Vb) ∈ E, li, lk ∈ L

Page 74: Probabilistic Inference Lecture 1

Pairwise MRF

Random Variable V = {V1, V2, …,Vn}

Neighborhood over variables specified by edges E

Label set L = {l1, l2, …, lh}

P(v) = exp{-Σa Σi θa;iIa;i(va) -Σa,b Σi,k θab;ikIab;ik(va,vb) - A(θ)}

A(θ) = log Z

Probability P(v) = (1/Z) Πa ψa(va) Π(a,b) ψab(va,vb)

ψa(li) = exp(-θa;i), ψab(li,lk) = exp(-θab;ik)

Parameters θ are sometimes also referred to as potentials
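
A sketch verifying this correspondence numerically on a toy two-variable model with arbitrary parameters: writing P(v) with potentials ψ and Z gives exactly the same numbers as writing it with parameters θ and A(θ) = log Z.

```python
import itertools, math

# Numerical check of the correspondence above on a toy model with two
# variables, two labels, and arbitrary parameters: the potentials-and-Z
# form of P equals the parameters-and-A(theta) form.

labels = [0, 1]
theta_a = {(0, 0): 0.4, (0, 1): 1.1, (1, 0): 0.7, (1, 1): 0.2}  # theta_a;i
theta_ab = {(i, k): (0.0 if i == k else 0.9) for i in labels for k in labels}

def energy(f):  # f = (f(0), f(1))
    return theta_a[(0, f[0])] + theta_a[(1, f[1])] + theta_ab[(f[0], f[1])]

Z = sum(math.exp(-energy(f)) for f in itertools.product(labels, repeat=2))
A = math.log(Z)

psi_a = {k: math.exp(-t) for k, t in theta_a.items()}    # psi_a(l_i)
psi_ab = {k: math.exp(-t) for k, t in theta_ab.items()}  # psi_ab(l_i, l_k)

for f in itertools.product(labels, repeat=2):
    via_psi = psi_a[(0, f[0])] * psi_a[(1, f[1])] * psi_ab[(f[0], f[1])] / Z
    via_theta = math.exp(-energy(f) - A)
    assert math.isclose(via_psi, via_theta)
print("both forms of P(v) agree")
```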

Page 75: Probabilistic Inference Lecture 1

Pairwise MRF

Random Variable V = {V1, V2, …,Vn}

Neighborhood over variables specified by edges E

Label set L = {l1, l2, …, lh}

P(v) = exp{-Σa Σi θa;iIa;i(va) -Σa,b Σi,k θab;ikIab;ik(va,vb) - A(θ)}

Labeling as a function f : {1, 2, …, n} → {1, 2, …, h}

Variable Va takes a label lf(a)

Page 76: Probabilistic Inference Lecture 1

Pairwise MRF

Random Variable V = {V1, V2, …,Vn}

Neighborhood over variables specified by edges E

Label set L = {l1, l2, …, lh}

P(f) = exp{-Σa θa;f(a) -Σa,b θab;f(a)f(b) - A(θ)}

Labeling as a function f : {1, 2, …, n} → {1, 2, …, h}

Variable Va takes a label lf(a)

Energy Q(f) = Σa θa;f(a) + Σa,b θab;f(a)f(b)
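
A minimal sketch of evaluating Q(f) for one labeling, with invented parameters on a 3-variable chain and h = 2 labels.

```python
# Sketch of evaluating Q(f) once, with invented parameters on a 3-variable
# chain and h = 2 labels; f is represented as a tuple of label indices.

theta_a = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]           # theta_a[a][i]
edges = [(0, 1), (1, 2)]
theta_ab = {e: [[0.0, 1.0], [1.0, 0.0]] for e in edges}  # Potts-like penalty

def Q(f):
    unary = sum(theta_a[a][f[a]] for a in range(len(theta_a)))
    pairwise = sum(theta_ab[(a, b)][f[a]][f[b]] for a, b in edges)
    return unary + pairwise

print(Q((1, 1, 0)))  # 1.0 + 0.5 + 1.0 (unary) + 0.0 + 1.0 (pairwise) = 3.5
```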

Page 77: Probabilistic Inference Lecture 1

Pairwise MRF

Random Variable V = {V1, V2, …,Vn}

Neighborhood over variables specified by edges E

Label set L = {l1, l2, …, lh}

P(f) = exp{-Q(f) - A(θ)}

Labeling as a function f : {1, 2, …, n} → {1, 2, …, h}

Variable Va takes a label lf(a)

Energy Q(f) = Σa θa;f(a) + Σa,b θab;f(a)f(b)

Page 78: Probabilistic Inference Lecture 1

Outline

• Probabilistic Models

• Conversions

• Exponential Family

• Inference

Page 79: Probabilistic Inference Lecture 1

Inference

maxv P(v), where P(v) = exp{-Σa Σi θa;i Ia;i(va) - Σ(a,b) Σi,k θab;ik Iab;ik(va,vb) - A(θ)}

Maximum a Posteriori (MAP) Estimation

minf Q(f), where Q(f) = Σa θa;f(a) + Σ(a,b) θab;f(a)f(b)

Energy Minimization

P(va = li) = Σv P(v)δ(va = li)

Computing Marginals

P(va = li, vb = lk) = Σv P(v)δ(va = li)δ(vb = lk)
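
A brute-force sketch of all three tasks on the toy chain model from the previous sketch. Enumeration is exponential in n; the efficient algorithms are the subject of the remaining lectures.

```python
import itertools, math

# Brute-force sketch of the three tasks above on the toy chain model from
# the previous sketch. Enumeration costs h^n; efficient alternatives are
# the subject of the coming lectures.

theta_a = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
edges = [(0, 1), (1, 2)]
theta_ab = {e: [[0.0, 1.0], [1.0, 0.0]] for e in edges}
n, h = 3, 2

def Q(f):
    return (sum(theta_a[a][f[a]] for a in range(n))
            + sum(theta_ab[(a, b)][f[a]][f[b]] for a, b in edges))

labelings = list(itertools.product(range(h), repeat=n))
Z = sum(math.exp(-Q(f)) for f in labelings)

# MAP estimation coincides with energy minimization
f_map = min(labelings, key=Q)
print("MAP labeling:", f_map)

# Node marginal P(v_a = l_i): sum P(v) over labelings consistent with it
def marginal(a, i):
    return sum(math.exp(-Q(f)) for f in labelings if f[a] == i) / Z

print("P(v_0 = l_0) =", marginal(0, 0))
```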

Page 80: Probabilistic Inference Lecture 1

Next Lecture …

Energy minimization for tree-structured pairwise MRF