STAT 598L: Probabilistic Graphical Models
Instructor: Sergey Kirshner
Markov Networks
Motivating Example
• Is there a Bayesian network that is a P-map for {(A ⊥ B │ C, D), (C ⊥ D │ A, B)}?
– No other independencies except for applications of symmetry, so the remaining pairs are dependent (in a P-map)
– Skeleton
– Adding directions
• Without loss of generality, A->C
• Cannot have B->C (A->C<-B)
• Cannot have D->B (C->B<-D)
• Cannot have A->D (A->D<-B)
[Figure: skeleton over A, B, C, D (the four-cycle A-C-B-D-A)]
No BN P-map!
Undirected Model
• Is there a different framework that can represent these dependencies?
– What if we had undirected separation instead of d-separation?
[Figure: undirected graph over A, B, C, D]
• Markov networks (Markov random fields, MRFs)
– Represent conditional independence relations with an undirected graph
– Encode functional dependence using potential functions or factors
Factors
{X1, X2, …, Xn} = set of variables
{Y1, Y2, …, Yk} ⊆ {X1, X2, …, Xn} = subset of variables
φ : Val(Y1) × Val(Y2) × … × Val(Yk) → R+ = factor, with Scope[φ] = {Y1, …, Yk}
Joint probability = product of factors
Factor = measure of relationship for a group of variables
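A minimal sketch (not from the slides) of a factor as a data structure: a table of nonnegative values indexed by assignments to the variables in its scope. The class and variable names here are illustrative.

```python
from itertools import product

class Factor:
    """A factor: a table mapping assignments of its scope to nonnegative reals."""
    def __init__(self, scope, values):
        # scope: tuple of variable names; values: dict assignment-tuple -> float
        self.scope = tuple(scope)
        self.values = dict(values)

    def __call__(self, assignment):
        # assignment: dict variable -> value; look up the entry for this factor's scope
        return self.values[tuple(assignment[v] for v in self.scope)]

# Example: a (made-up) factor over two binary variables
phi_AB = Factor(("A", "B"), {(a, b): 1.0 for a, b in product([0, 1], repeat=2)})
print(phi_AB({"A": 1, "B": 0, "C": 0}))  # extra variables in the assignment are ignored
```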
Example
• Gibbs distribution: P(X1, …, Xn) = (1/Z) ∏_i φ_i(D_i), where Z, the normalization constant (partition function), sums the product of the factors over all assignments
Example (continued)
• How many free parameters? 3 + 3 + 3 + 3 = 12 (four pairwise factors over binary variables, 3 free parameters each)
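To make the Gibbs distribution concrete, here is a brute-force sketch for the four-cycle example; the factor values are invented for illustration, and the enumeration over all 2^4 assignments is exactly what the partition function Z sums over.

```python
from itertools import product

# Four illustrative pairwise factors over binary A, C, B, D (values are made up)
phi = {
    ("A", "C"): {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0},
    ("C", "B"): {(0, 0): 1.0, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 1.0},
    ("B", "D"): {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0},
    ("D", "A"): {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 3.0, (1, 1): 1.0},
}

def unnormalized(x):
    """Product of all factors evaluated at assignment x = dict var -> {0, 1}."""
    p = 1.0
    for (u, v), table in phi.items():
        p *= table[(x[u], x[v])]
    return p

# Partition function: sum of the unnormalized measure over all 2^4 assignments
Z = sum(unnormalized(dict(zip("ACBD", vals))) for vals in product([0, 1], repeat=4))

def gibbs(x):
    return unnormalized(x) / Z

print(Z, gibbs({"A": 1, "C": 1, "B": 0, "D": 0}))
```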
Factors and Free Parameters
• For this analysis, stick to binary variables
• Each factor of k variables = 2^k - 1 free parameters
• Assume all factors are of the same size
– At most C(n, k) possible factors (O(n^k))
– Total of O(n^k 2^k) free parameters
– Compare to O(2^n) for a full table
• Conclusion: even using large factors reduces the number of free parameters
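Writing out the counting argument (binary variables, factors of size k), with a small numeric instance for concreteness:

\[
\underbrace{\binom{n}{k}}_{\text{possible factors}} \times \underbrace{(2^{k}-1)}_{\text{free parameters each}} = O\!\left(n^{k} 2^{k}\right) \quad \text{vs.} \quad 2^{n}-1 \text{ for a full joint table.}
\]
\[
\text{E.g., } n = 20,\; k = 2: \quad \binom{20}{2}\cdot 3 = 570 \;\;\text{vs.}\;\; 2^{20}-1 = 1{,}048{,}575.
\]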
BNs: Special Case
Factor Operations: Product
X=x Y=y φ1(x,y)
1 1 0.4
1 0 0.7
0 1 1
0 0 0.8
Y=y Z=z φ2(y,z)
1 1 0.3
1 0 0.9
0 1 0.5
0 0 1
X=x Y=y Z=z φ12(x,y,z)
1 1 1 0.12
1 1 0 0.36
1 0 1 0.35
1 0 0 0.7
0 1 1 0.3
0 1 0 0.9
0 0 1 0.4
0 0 0 0.8
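A sketch of the factor product illustrated by these tables: entries that agree on the shared variable Y are multiplied, so φ12(x, y, z) = φ1(x, y) · φ2(y, z). The numbers below are taken from the tables above, so the output should reproduce the φ12 column.

```python
from itertools import product

# phi1 over (X, Y) and phi2 over (Y, Z), values taken from the tables above
phi1 = {(1, 1): 0.4, (1, 0): 0.7, (0, 1): 1.0, (0, 0): 0.8}
phi2 = {(1, 1): 0.3, (1, 0): 0.9, (0, 1): 0.5, (0, 0): 1.0}

# Factor product: multiply entries that agree on the shared variable Y
phi12 = {(x, y, z): phi1[(x, y)] * phi2[(y, z)]
         for x, y, z in product([0, 1], repeat=3)}

for (x, y, z), v in sorted(phi12.items(), reverse=True):
    print(f"X={x} Y={y} Z={z}  phi12={v:.2f}")
```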
Conditional Independence?
• What about {A,C}, {A,D}, {B,C}, and {B,D}?
– They cannot be made independent!
– Edges connect variables in the same scope
– Resulting graph = Markov network
[Figure: Markov network over A, B, C, D (the four-cycle A-C-B-D-A)]
Factorization: Formal Definition
• Given: Gibbs distribution P with non-negative factors Φ = {φ1, …, φK}, and a Markov network H
• P factorizes over H: the scope of every factor corresponds to a complete subgraph of H
[Figure: two Markov networks over A, B, C, D]
Factorization
• Collection of factors is not unique
– Are the scopes {A,B}, {A,C}, and {B,C}, or is it just {A,B,C}?
– Networks can obscure scopes (structures) of the original factors
[Figure: complete graph (triangle) over A, B, C]
Graphical Model
Graphical Model = Graph + Parameters
Bayesian network = parents in chain decomposition + conditional probability distributions
Markov network = variables in factors + factors
Undirected vs Directed Model
• Bayesian networks:
– DAG => dimensionality reduction with chain rule for probability (simple justification)
– Possible causal dependence (interpretation of the edge directions)
– Parameters are interpretable
– Represented independencies depend on the order of variables (drawback)
• Undirected model:
– No ordering to consider! (Fewer objects, one less uncertainty to worry about)
– Intuition using exponential models (later in the course)
– Difficult to interpret (and to elicit) the parameters
Representational Power: BN vs MN
• Can Bayesian networks represent all independencies of a Markov network?
– No: {(A ⊥ B │ C, D), (C ⊥ D │ A, B)}
• Can Markov networks represent all independencies of a Bayesian network?
– No: A -> B <- C
• What is the overlap?
– Later
Graph Separation
• Need to establish conditional independence from undirected graph properties
• Active path = none of the intermediate variables are observed
• No active paths = separation
• Monotonic: adding observed variables can only reduce active paths
[Figure: undirected graph over A, B, C, D, E with a path blocked by an observed node]
Set of global independencies (global Markov property)
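A sketch of how the separation test just described could be checked in code (the graph encoding and function names are mine): search for a path from X to Y that avoids the observed set Z; if none exists, X and Y are separated given Z.

```python
from collections import deque

def separated(adj, X, Y, Z):
    """Return True if every path from X to Y in the undirected graph `adj`
    passes through an observed node in Z (i.e., X and Y are separated given Z)."""
    observed = set(Z)
    frontier = deque(x for x in X if x not in observed)
    reached = set(frontier)
    while frontier:
        node = frontier.popleft()
        for nbr in adj[node]:
            if nbr in observed or nbr in reached:
                continue          # observed nodes block the path
            if nbr in Y:
                return False      # found an active path into Y
            reached.add(nbr)
            frontier.append(nbr)
    return True

# The four-cycle from the earlier example: A-C, C-B, B-D, D-A
adj = {"A": {"C", "D"}, "B": {"C", "D"}, "C": {"A", "B"}, "D": {"A", "B"}}
print(separated(adj, {"A"}, {"B"}, {"C", "D"}))  # True: (A ⊥ B | C, D)
print(separated(adj, {"A"}, {"B"}, {"C"}))       # False: the path A-D-B is active
```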
Representation Theorem for BNs
P factorizes according to G <=> each variable is independent of its non-descendants given its parents (local Markov assumption)
(independencies <-> graph structure)
Representation Theorem for MNs
P factorizes according to H => global independencies set by scopes of factors (global Markov property)
(independencies <-> graph structure; converse?)
[Figure: undirected graph over A, B, C, D, E]
Representation Theorem for MNs
• Proof: need to show that factorization according to H implies the global (separation) independencies
– Case 1: Assume A ∪ B ∪ C contains all of the variables
• Partition the factor scopes Di so that either Di ⊆ A ∪ C or Di ⊆ B ∪ C
[Figure: variables partitioned into sets A, B, C]
Representation Theorem for MNs
• Proof (continued)
– Case 2: A ∪ B ∪ C does not contain all of the variables; split the remaining variables into two sets U1 and U2 and reduce to Case 1
[Figure: sets A, B, C with the remaining variables split into U1 and U2]
Converse?
• Think xor
Global Markov property => P factorizes according to H (scopes of factors)?
[Figure: undirected graph over A, C, B, D, E]
Hammersley-Clifford Theorem
If P is positive and satisfies the global independencies given by separation in H (the global Markov property), then P factorizes according to H.
[Figure: undirected graph over A, C, B, D, E]
Completeness of separation
Active trail between X and Y given Z => X and Y are dependent given Z in some P that factorizes according to H
• Interpreting the statement
• Sketch of proof (by construction):
– All factors not in the trail are uniform (remove nodes and edges not in the trail)
– Make the remaining factors almost deterministic
More General Result
Soundness
• Intuition: two binary variables X and Y; a 3-d space of possible factors with a 2-d manifold for independence
Completeness (almost)
[Figure: two-node graph X - Y]
Representation Theorem for BNs
P factorizes according to G <=> each variable is independent of its non-descendants given its parents (local Markov assumption)
(independencies <-> graph structure)
Other Ways to Encode Independence
• Local Markov independencies (Markov blanket)
• Pairwise Markov independencies
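The formulas themselves did not survive the transcript; the standard statements, which I assume are what the slide shows, are:

\[
\text{Local: } \bigl(X \;\perp\; \mathcal{X} - \{X\} - \mathrm{MB}_H(X) \;\mid\; \mathrm{MB}_H(X)\bigr) \text{ for every } X, \text{ where } \mathrm{MB}_H(X) \text{ is the set of neighbors of } X \text{ in } H;
\]
\[
\text{Pairwise: } \bigl(X \;\perp\; Y \;\mid\; \mathcal{X} - \{X, Y\}\bigr) \text{ for every pair } X, Y \text{ not connected by an edge in } H.
\]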
Relation Between Independencies
• Two separated nodes will also be separated by the neighbors of either node
• Variables corresponding to non-adjacent nodes are conditionally independent given the variables corresponding to their neighbors
– Conditionally independent also given the rest of the variables (monotonicity)
global => local => pairwise
Converse
• For all disjoint A, B, and C, show that the pairwise independencies imply the global ones
– Induction on the size of C
• |C| = n - 2:
• |C| = k - 1 < n - 2, case I:
Converse
• For all disjoint A, B, and C,
– Induction on the size of C
• |C| = k - 1 < n - 2, case II:
• Assume |A| = |B| = 1; otherwise approach as in case I
Equivalence
• Given P is positive, the following are equivalent:
– Global Markov property
– Local Markov property
– Pairwise Markov property
How To Recover MNs from Distribution
• If P is positive:
– Check whether A ⊥ B | X - A - B for each pair (if not, add the edge A-B), or
– Find the smallest C such that A ⊥ X - A - C | C; then C = MBP(A) (Markov blanket)
– In both cases, the resulting graph is a minimal I-map of P
– The graphs are the same: such an I-map is unique!
• If P is not positive:
– No guarantee that the resulting graph is an I-map of P
Finding P-maps
• If a P-map exists:
– Find a minimal I-map
– It is also a P-map!
• Does it always exist?
– Think v-structure
Alternative Parametrizations
• Structure of the Markov network may hide the scopes of the factors
– Think complete graph: is it one factor with all variables in the scope, or a product of factors with pairs of variables in the scope?
• May want to make the factorization more explicit in the structure
Factor Graphs
• Bipartite graph: variables vs factors
[Figure: Markov network over A, B, C, D and the corresponding factor graph (variable nodes A, C, B, D connected to factor nodes)]
Log-Linear Model
• Product into a sum
• Convert factors into a finer set of features
• Break down factors further (context)
• Different features may share the same scope
(energy functions; weights and features: see the sketch below)
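The slide's equations are missing from the transcript; the standard forms that the annotations (energy functions, weights, features) refer to, which I assume match the slides up to sign conventions, are:

\[
\phi_i(D_i) = \exp\bigl(-\epsilon_i(D_i)\bigr) \;\Longrightarrow\; P(X_1,\dots,X_n) = \frac{1}{Z}\exp\Bigl(-\sum_i \epsilon_i(D_i)\Bigr) \quad (\epsilon_i = \text{energy functions}),
\]
\[
P(X_1,\dots,X_n) = \frac{1}{Z}\exp\Bigl(\sum_j w_j\, f_j(D_j)\Bigr) \quad (w_j = \text{weights},\; f_j = \text{features}).
\]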
Ising Model
• Binary xi's
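The Ising model equation is also missing; a standard form (a sketch of what the slide presumably shows), with binary xi ∈ {-1, +1}, is:

\[
P(x_1,\dots,x_n) = \frac{1}{Z}\exp\Bigl(\sum_{(i,j)\in E} w_{ij}\, x_i x_j \;+\; \sum_i u_i\, x_i\Bigr), \qquad x_i \in \{-1,+1\},
\]

a pairwise log-linear model whose features are the products x_i x_j on the edges of the graph.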
http://www.cis.upenn.edu/~jshi/GraphTutorial/
Recap
• Parameterizations for Markov networks
– Features
– Overparameterizations
– How many parameters are free?
– Canonical parameterization
Plan
• Proof of Hammersley-Clifford theorem (if there is interest)
• Justification for Markov networks using Maximum Entropy principle (later)
• Relating Bayesian and Markov networks
– Proof of soundness theorem for Bayesian networks
– Determining which Markov networks are P-maps for which Bayesian networks
Information Theory
• P(X) encodes our uncertainty about X
– Some variables are more uncertain than others
– How can we quantify this intuition?
• Entropy: average number of bits required to encode X
• Entropy is maximized when X is uniform
[Figure: two distributions P(X) and P(Y) over variables X and Y]
H_P(X) = E_P[log(1/P(x))] = Σ_x P(x) log(1/P(x))
From Carlos Guestrin’s 10-708 Probabilistic Graphical Models Fall 2008 at CMU
Maximum Entropy Principle
• Given everything else the same, pick a distribution with the maximum entropy
– Closest to uniform
• Example: ¾ of kangaroos are left-handed and ¾ drink Foster's
– Want to reconstruct the full probability table knowing only p11 + p12 = 0.75 and p11 + p21 = 0.75
– Have 3 free parameters and only 2 constraints, leaving 1 free parameter
p11  p12
p21  p22
MaxEnt Principle Continued
• Since we are not given that left-handedness is correlated with Foster's drunkenness, we ideally do not want to introduce a correlation into the model
• Which objective function to maximize?
• Entropy is (the only) such function
– Want to maximize H_P(X) subject to the constraints p11 + p12 = 0.75 and p11 + p21 = 0.75
Gull S.F., Skilling J. (1984), “The Maximum Entropy Method,” in Indirect Imaging
Direct Solution
Left-handedness is independent of Foster's drunkenness!
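The worked solution is not in the transcript; filling in the standard calculation under the slide's two constraints (using a as the single free parameter):

\[
p_{11} = a, \qquad p_{12} = p_{21} = 0.75 - a, \qquad p_{22} = a - 0.5, \qquad 0.5 \le a \le 0.75,
\]
\[
\frac{dH}{da} = 2\ln(0.75 - a) - \ln a - \ln(a - 0.5) = 0 \;\Longrightarrow\; (0.75-a)^2 = a(a - 0.5) \;\Longrightarrow\; a = \tfrac{9}{16},
\]
\[
p_{11} = \tfrac{9}{16},\quad p_{12} = p_{21} = \tfrac{3}{16},\quad p_{22} = \tfrac{1}{16}: \text{ exactly the product of the marginals.}
\]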
Round-about Solution
• Constraints = Lagrange multipliers
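The Lagrangian itself is missing from the transcript; a sketch of the standard construction for this example (notation mine):

\[
F = H_P + \lambda_1\bigl(p_{11}+p_{12}-0.75\bigr) + \lambda_2\bigl(p_{11}+p_{21}-0.75\bigr) + \mu\Bigl(\textstyle\sum_{ij} p_{ij}-1\Bigr),
\]
\[
\frac{\partial F}{\partial p_{ij}} = 0 \;\Longrightarrow\; p_{ij} \;\propto\; \exp\bigl(\lambda_1 f_1(i,j) + \lambda_2 f_2(i,j)\bigr),
\]

where f_1 and f_2 are the indicator features of the two constraints: a log-linear model, as the next slide notes.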
Round-about Solution
• How to find the weights?
– Plug in the log-linear model for P(x) and maximize F(x)
– Or, satisfy the constraints
Log-linear model!
MaxEnt in a More General Setting
• Given a set of constraints
– General solution to the MaxEnt formulation (see the sketch below)
• Log-linear model is an approximation to a distribution that preserves some properties (constraints) while making the distribution as close to uniform as possible
– Duality between constraints and weights
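A sketch of the general solution referenced above, assuming the constraints are expectation constraints E_P[f_i(X)] = c_i (the standard setting):

\[
\max_P H_P(X) \;\text{ subject to }\; E_P[f_i(X)] = c_i,\; i = 1,\dots,m \quad\Longrightarrow\quad P(x) = \frac{1}{Z(\lambda)}\exp\Bigl(\sum_{i=1}^{m}\lambda_i f_i(x)\Bigr),
\]

with one weight λ_i per constraint, which is the duality between constraints and weights mentioned above.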
Soundness of d-separation
For all P that factorizes according to G (G is a BN structure for P):
d-separation in G => conditional independence in P, i.e., G is an I-map for P
(from a local graph property to a global separation property)
Proof Outline
• Given evidence, convert the Bayesian network into an equivalent Markov network
– Construct such a network
– Show that it is an equivalent Markov network
• Use the separation property of the Markov network to prove the theorem
Constructing MNs from BNs
[Figure: Bayesian network G over A, B, C, D, E and its moralized graph H]
moralized graph: an I-map (in fact a minimal I-map) of G
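A sketch of the moralization step this slide illustrates (the graph encoding, function name, and example network are mine, not from the slides): connect ("marry") the parents of each node, then drop the edge directions.

```python
from itertools import combinations

def moralize(parents):
    """Moralize a Bayesian network given as {node: set of parents}.
    Returns the undirected moral graph as {node: set of neighbors}."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    adj = {v: set() for v in nodes}
    for child, ps in parents.items():
        # keep every original edge, but undirected
        for p in ps:
            adj[p].add(child)
            adj[child].add(p)
        # marry the parents: connect every pair of parents of the same child
        for u, v in combinations(ps, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

# Hypothetical BN: C, D, and E each have two parents, which get married
bn = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"A", "C"}, "E": {"C", "B"}}
moral = moralize(bn)
print(sorted(moral["A"]))  # A is now also connected to C's other parent B
```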
Constructing MNs from BNs with Evidence
[Figure: Bayesian network G over A, B, C, D, E with evidence, and its moralized graph H]
P-map for Moral Graphs
[Figure: moral Bayesian network G over A, B, C, D, E and its moralized graph H (a minimal I-map)]
Proof: pick an active (minimal) trail in G. Show it is in H.
Two cases:
– Trail has no v-structures: no marked nodes, so the same trail is in H
– Trail has v-structures: the v-structure is covered, so the trail is not minimal (contradiction)
Soundness for d-separation
• What if the graph is not moral?
– What if immoralities did not matter?
– They do matter if the effect or one of its descendants is in the evidence
• Only consider the subgraphs for which immoralities have a descendant in the evidence
– Upward closure of evidence nodes
Upward Closure and Its MN
[Figure: Bayesian network G over A, B, C, D, E; its upward closure G' over A, B, C, D (E is a barren node); and the moralized graph H of G']
Exercise 3.8: BN(G') agrees with BN(G) over the nodes of G'
Soundness of d-separation
• Consider X and Y d-separated by Z
• Build an upward closure for X∪Y∪Z
• d-separation is equivalent to separation in H
• Separation in H implies conditional independence
For all P that factorizes according to G (G is a BN structure for P): d-separation in G => conditional independence in P
[Figure: Bayesian network G over A, B, C, D, E and Markov network H over A, B, C, D]
From Markov Networks to Bayesian Networks
• As seen before, Markov networks cannot represent immoralities
• Can show that if a Bayesian network G is a minimal I-map for some Markov network structure H, it contains no immoralities
• No immoralities = every v-structure (three nodes) is covered
• Undirected cycle of length > 3 => v-structure
– Must have a chord
• All BN I-maps of Markov networks are chordal
– No BN P-map exists for a non-chordal MN
Markov Networks: Summary
• Mass/density = normalized product of factors
• Represent conditional independence with independence graphs
– Conditional independence = separation in the graph
– Global separation = local separation (Markov blanket) = pairwise separation, all in positive distributions
• Interpretation: closest to uniform under constraints specified by features
– Scope of features determines the structure of the graph (representation theorem)
• Relationship between Markov and Bayesian networks
– MNs cannot represent v-structures of BNs
– BNs cannot represent chordless loops of MNs
– Chordal graphs can be represented (as P-maps) by both