
Algebraic Problems in Graphical Modeling

Mathias Drton

Department of Statistics, University of Chicago

Outline

1 What (roughly) are graphical models? A.k.a. Markov random fields, Bayesian networks, ...

2 Gaussian models on undirected graphs

Determining conditional independences

Maximum likelihood estimation

3 Gaussian models on directed graphs

Model (Markov) equivalence

Parameter identification

[Drton, Sturmfels and Sullivant: Lectures on Algebraic Statistics, Oberwolfach Seminars Series, Vol. 39, Birkhäuser, Basel, 2009]


What (roughly) are graphical models?

Data: realizations of random variables $X_1, \dots, X_p$.

Statistical model: family of candidates for the joint distribution of $(X_1, \dots, X_p)$.

Graphical model: statistical model associated with a graph that has $X_1, \dots, X_p$ as nodes.

Points of view:

(i) The density function factors over the graph:

$$f(x_1, x_2, x_3) = g(x_1, x_2)\, h(x_2, x_3)$$

(ii) Non-adjacent random variables are 'somehow' independent:

$X_1$ independent of $X_3$ given $X_2$


What are graphical models good for?

Very literal application: Inference of networks

(Sachs et al., 2005, Science)

Suitably sparse graphs yield scalable models (e.g., think of computing with 100 binary variables).

Graph helps structure computations.

Combinatorial answers to statistical questions.
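To make the scalability point concrete, here is a small sketch that counts free parameters for 100 binary variables. The path graph X1 - X2 - ... - X100 is an assumed example of a sparse graph and does not come from the slides.

```python
# Free parameters needed to specify a joint distribution of p binary variables.
p = 100

# Unrestricted model: one probability per outcome, minus one for normalization.
full_params = 2**p - 1

# Path graph X1 - X2 - ... - Xp (assumed example of a sparse graph):
# one parameter for P(X1 = 1), then two for each conditional P(X_{i+1} = 1 | X_i).
path_params = 1 + 2 * (p - 1)

print(full_params)
print(path_params)
```

The unrestricted count grows exponentially in p, while the path-graph count grows linearly.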

Gaussian/Multivariate normal distribution

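Let $\mu \in \mathbb{R}^p$ be any vector.

Let $\Sigma \in \mathbb{R}^{p \times p}$ be a positive definite matrix.

Definition

The distribution with probability density function

$$f(x) = \frac{1}{\sqrt{(2\pi)^p \det(\Sigma)}} \exp\left\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right\}, \quad x \in \mathbb{R}^p,$$

is called the Gaussian or multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$; in symbols $N_p(\mu, \Sigma)$.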
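As a sanity check on the definition, the following sketch evaluates the density with NumPy. The function name `gaussian_density` and the numerical values of the mean and covariance matrix are illustrative assumptions.

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Evaluate the N_p(mu, Sigma) density at the point x."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    p = mu.shape[0]
    diff = x - mu
    # Solve Sigma y = diff instead of forming the inverse explicitly.
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm_const = np.sqrt((2 * np.pi) ** p * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

# Hypothetical parameters for illustration.
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
value_at_mean = gaussian_density(mu, mu, Sigma)
```

At x = mu the exponent vanishes, so the value reduces to the normalizing constant 1 / sqrt((2π)^p det(Σ)).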

Undirected Gaussian models

Inverse covariance matrix

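$$\Sigma^{-1} = \begin{pmatrix} \sigma^{11} & \sigma^{12} & 0 & \sigma^{14} \\ \sigma^{12} & \sigma^{22} & \sigma^{23} & 0 \\ 0 & \sigma^{23} & \sigma^{33} & \sigma^{34} \\ \sigma^{14} & 0 & \sigma^{34} & \sigma^{44} \end{pmatrix}$$

Here $\sigma^{ij}$ denotes the $(i,j)$ entry of $\Sigma^{-1}$. Because the exponent of the Gaussian density is

$$z^T \Sigma^{-1} z = \sum_{i=1}^{4} \sum_{j=1}^{4} \sigma^{ij} z_i z_j,$$

we have the density factorization

$$f(x_1, \dots, x_4) = g_{12}(x_1, x_2)\, g_{23}(x_2, x_3)\, g_{34}(x_3, x_4)\, g_{14}(x_1, x_4).$$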
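The effect of the zero pattern can be checked numerically. The sketch below uses a hypothetical inverse covariance matrix `K` with zeros in positions (1,3) and (2,4) and computes the conditional covariance of (X1, X3) given (X2, X4) as a Schur complement. All numerical values are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical inverse covariance K = Sigma^{-1} with the zero pattern of the
# 4-cycle 1 - 2 - 3 - 4 - 1: entries (1,3) and (2,4) are zero.
K = np.array([
    [2.0, 0.5, 0.0, 0.4],
    [0.5, 2.0, 0.6, 0.0],
    [0.0, 0.6, 2.0, 0.3],
    [0.4, 0.0, 0.3, 2.0],
])
assert np.all(np.linalg.eigvalsh(K) > 0)  # K must be positive definite
Sigma = np.linalg.inv(K)

def conditional_cov(Sigma, a, c):
    """Covariance of X_a given X_c (Schur complement); a, c are 0-based index lists."""
    S_aa = Sigma[np.ix_(a, a)]
    S_ac = Sigma[np.ix_(a, c)]
    S_cc = Sigma[np.ix_(c, c)]
    return S_aa - S_ac @ np.linalg.solve(S_cc, S_ac.T)

# X1 and X3 given (X2, X4): the off-diagonal entry vanishes.
cc_13 = conditional_cov(Sigma, [0, 2], [1, 3])
# Marginally, X1 and X3 remain correlated.
marginal_13 = Sigma[0, 2]
```

The zero in the inverse covariance matrix shows up as a zero conditional covariance, not as a zero in Σ itself.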


Reading off conditional independences

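If $X_i$ and $X_j$ are non-adjacent in the graph, then

$$(\Sigma^{-1})_{ij} = 0 \iff X_i \perp\!\!\!\perp X_j \mid \{X_k : k \neq i, j\}.$$

Are there other conditional independences?

$$X_A \perp\!\!\!\perp X_B \mid X_C \iff \operatorname{rank}(\Sigma_{AC \times BC}) \le |C|,$$

for pairwise disjoint sets $A, B, C \subset \{1, \dots, p\}$.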
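The rank criterion translates directly into a numerical test. In this sketch the helper `ci_by_rank`, the tolerance, and the matrix `K` are illustrative assumptions; `K` has the 4-cycle zero pattern used for the undirected example.

```python
import numpy as np

def ci_by_rank(Sigma, A, B, C, tol=1e-9):
    """Test X_A independent of X_B given X_C via rank(Sigma_{AC x BC}) <= |C|.

    A, B, C are disjoint lists of 0-based indices.
    """
    rows = list(A) + list(C)
    cols = list(B) + list(C)
    sub = Sigma[np.ix_(rows, cols)]
    return np.linalg.matrix_rank(sub, tol=tol) <= len(C)

# Hypothetical inverse covariance with the 4-cycle zero pattern.
K = np.array([
    [2.0, 0.5, 0.0, 0.4],
    [0.5, 2.0, 0.6, 0.0],
    [0.0, 0.6, 2.0, 0.3],
    [0.4, 0.0, 0.3, 2.0],
])
Sigma = np.linalg.inv(K)

given_both = ci_by_rank(Sigma, [0], [2], [1, 3])  # X1, X3 given (X2, X4)
given_one = ci_by_rank(Sigma, [0], [2], [1])      # X1, X3 given X2 only
```

In floating point the rank decision depends on the tolerance, so this is a numerical check and not an exact algebraic one.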

Theorem (Global Markov property)

If $N_p(\mu, \Sigma)$ is in a graphical model, then

C separates A and B in the graph $\implies X_A \perp\!\!\!\perp X_B \mid X_C$,

and equivalence holds for generic distributions in the model.
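Separation is a purely combinatorial condition, so it can be checked by a graph search that is not allowed to pass through C. The sketch below uses breadth-first search; the function name `separates` and the adjacency-dictionary representation are assumptions made for illustration.

```python
from collections import deque

def separates(adj, A, B, C):
    """Return True if C separates A from B in the undirected graph adj.

    adj maps each node to the set of its neighbours; A, B, C are disjoint sets.
    """
    blocked = set(C)
    targets = set(B)
    seen = set(A)
    queue = deque(A)
    while queue:
        u = queue.popleft()
        if u in targets:
            return False  # found a path from A to B that avoids C
        for v in adj[u]:
            if v not in seen and v not in blocked:
                seen.add(v)
                queue.append(v)
    return True

# The 4-cycle 1 - 2 - 3 - 4 - 1.
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
sep_both = separates(cycle, {1}, {3}, {2, 4})
sep_one = separates(cycle, {1}, {3}, {2})
```

In the 4-cycle, {2, 4} separates 1 from 3, while {2} alone leaves the path 1 - 4 - 3 open.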


Maximum likelihood estimation

Optimize the log-likelihood function

$$\Sigma^{-1} \mapsto \log\det(\Sigma^{-1}) - \operatorname{trace}(\Sigma^{-1} S),$$

where S is a data-derived positive definite matrix.

How difficult? What is the algebraic degree of the 'likelihood equations'?

Theorem

The following two statements are equivalent:

(i) The ML estimator in the Gaussian graphical model associated with the graph G is a rational function of S.

(ii) The graph G is chordal.
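Condition (ii) can be tested combinatorially. The sketch below uses the standard characterization that a graph is chordal exactly when its vertices can be removed one at a time, each being simplicial (its neighbours form a clique) at the time of removal. The function name and graph encoding are illustrative assumptions.

```python
def is_chordal(adj):
    """Check chordality by repeatedly deleting a simplicial vertex.

    adj maps each node to the set of its neighbours.
    """
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    while g:
        simplicial = None
        for v, nbrs in g.items():
            # v is simplicial if every pair of its neighbours is adjacent.
            if all(b in g[a] for a in nbrs for b in nbrs if a != b):
                simplicial = v
                break
        if simplicial is None:
            return False  # no simplicial vertex left, so the graph is not chordal
        for u in g[simplicial]:
            g[u].discard(simplicial)
        del g[simplicial]
    return True

four_cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
with_chord = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
```

The 4-cycle is the smallest non-chordal graph; adding the chord 1 - 3 makes it chordal.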

More on ML estimation −→ Caroline Uhler


Directed graphs

Structural/regression equations:

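$$\begin{aligned} X_1 &= \varepsilon_1, \\ X_2 &= \lambda_{12} X_1 + \varepsilon_2, \\ X_3 &= \lambda_{13} X_1 + \lambda_{23} X_2 + \varepsilon_3, \\ X_4 &= \lambda_{34} X_3 + \varepsilon_4, \end{aligned}$$

with independent errors $\varepsilon_i \sim N(0, \omega_i)$.

In matrix form, $(I - \Lambda)^T X = \varepsilon$ with

$$I - \Lambda = \begin{pmatrix} 1 & -\lambda_{12} & -\lambda_{13} & 0 \\ 0 & 1 & -\lambda_{23} & 0 \\ 0 & 0 & 1 & -\lambda_{34} \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad \varepsilon = (\varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_4)^T.$$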
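The structural equations can be solved by substituting in order, or equivalently as the linear system (I - Λ)^T X = ε, where Λ holds the coefficient λ_ij in position (i, j). The sketch below compares both routes; the coefficient values and the error vector are hypothetical.

```python
import numpy as np

# Hypothetical edge coefficients; Lam[i-1, j-1] = lambda_ij for the edge i -> j.
Lam = np.zeros((4, 4))
Lam[0, 1] = 0.8   # lambda_12
Lam[0, 2] = -0.5  # lambda_13
Lam[1, 2] = 0.3   # lambda_23
Lam[2, 3] = 1.2   # lambda_34

eps = np.array([1.0, -2.0, 0.5, 3.0])  # one hypothetical realization of the errors

# Recursive substitution, following the equations in order.
x1 = eps[0]
x2 = Lam[0, 1] * x1 + eps[1]
x3 = Lam[0, 2] * x1 + Lam[1, 2] * x2 + eps[2]
x4 = Lam[2, 3] * x3 + eps[3]
x_recursive = np.array([x1, x2, x3, x4])

# Same solution from the linear system (I - Lam)^T x = eps.
x_solved = np.linalg.solve((np.eye(4) - Lam).T, eps)
```

Because the graph is acyclic, I - Λ is triangular with unit diagonal under a topological ordering, so the system always has a unique solution.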


Directed Gaussian models

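A Gaussian distribution is in the directed graphical model if

$$\Sigma = (I - \Lambda)^{-T}\, \Omega\, (I - \Lambda)^{-1}$$

for $\Lambda \in \mathbb{R}^E$ and $\Omega \succ 0$ diagonal.

Factorization:

$$f(x) = \prod_{i=1}^{p} f_i\left(x_i \mid x_{\mathrm{pa}(i)}\right) = f_1(x_1)\, f_2(x_2 \mid x_1)\, f_3(x_3 \mid x_1, x_2)\, f_4(x_4 \mid x_3)$$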
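The parametrization can be evaluated directly. This sketch builds Σ for the four-node example with edges 1 → 2, 1 → 3, 2 → 3, 3 → 4 and inspects Σ^{-1}; the function name `phi` and all numerical values are assumptions for illustration.

```python
import numpy as np

def phi(Lam, omega):
    """Covariance matrix (I - Lam)^{-T} diag(omega) (I - Lam)^{-1}."""
    p = Lam.shape[0]
    inv = np.linalg.inv(np.eye(p) - Lam)
    return inv.T @ np.diag(omega) @ inv

# Hypothetical coefficients for the edges 1->2, 1->3, 2->3, 3->4.
Lam = np.zeros((4, 4))
Lam[0, 1], Lam[0, 2], Lam[1, 2], Lam[2, 3] = 0.8, -0.5, 0.3, 1.2
omega = np.array([1.0, 0.5, 2.0, 1.5])  # hypothetical error variances

Sigma = phi(Lam, omega)
K = np.linalg.inv(Sigma)  # equals (I - Lam) diag(1/omega) (I - Lam)^T
```

Since X4 depends on the other variables only through X3, the entries (1,4) and (2,4) of Σ^{-1} are zero, which matches the last factor f4(x4 | x3) in the factorization.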

Are models associated with different graphs different?


Model equivalence

It is useful to obtain an implicit description of the image of

$$\varphi_G(\Lambda, \Omega) = (I - \Lambda)^{-T}\, \Omega\, (I - \Lambda)^{-1}.$$

Model cut out by conditional independences: d-separation

Theorem

The images of $\varphi_{G_1}$ and $\varphi_{G_2}$ for two acyclic digraphs $G_1 = (V, E_1)$ and $G_2 = (V, E_2)$ are the same if and only if $G_1$ and $G_2$ have

(i) the same skeleton, and

(ii) the same unshielded colliders (induced subgraphs $u \to v \leftarrow w$).
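The two conditions of the theorem are easy to check mechanically. In this sketch a digraph is encoded as a set of pairs (u, v) meaning u → v, and both graphs are assumed to share the same vertex set. The function names and the three-node examples are hypothetical.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a set of directed edges (u, v), read as u -> v."""
    return {frozenset(e) for e in edges}

def unshielded_colliders(edges):
    """Triples (u, v, w) with u -> v <- w where u and w are non-adjacent."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    result = set()
    for v, pa in parents.items():
        for u, w in combinations(sorted(pa), 2):
            if frozenset((u, w)) not in skel:
                result.add((u, v, w))
    return result

def markov_equivalent(edges1, edges2):
    """Skeleton / unshielded-collider criterion for two DAGs on one vertex set."""
    return (skeleton(edges1) == skeleton(edges2)
            and unshielded_colliders(edges1) == unshielded_colliders(edges2))

chain = {(1, 2), (2, 3)}           # 1 -> 2 -> 3
reversed_chain = {(3, 2), (2, 1)}  # 1 <- 2 <- 3
collider = {(1, 2), (3, 2)}        # 1 -> 2 <- 3
```

The two chains share a skeleton and have no unshielded colliders, so they give the same model. The collider graph has the same skeleton but a different collider set.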


[Figure: two digraphs on X1, X2, X3 with equivalent models (≡)]


[Figure: two digraphs on X1, X2, X3 with non-equivalent models (≢)]

Hidden/unobserved variables

Hidden variables: Project to principal submatrix of Σ

Example: ‘Verma graph’

[Figure: Verma graph; observed nodes X2, X3, X4, X5]

Are models still cut out by conditional independence?

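Verma graph: Relations in $\Sigma_{2345 \times 2345}$ generated by

$$\begin{aligned} &\sigma_{23}\sigma_{24}\sigma_{25}\sigma_{34} - \sigma_{22}\sigma_{25}\sigma_{34}^2 - \sigma_{23}\sigma_{24}^2\sigma_{35} + \sigma_{22}\sigma_{24}\sigma_{34}\sigma_{35} \\ &\quad - \sigma_{23}^2\sigma_{25}\sigma_{44} + \sigma_{22}\sigma_{25}\sigma_{33}\sigma_{44} + \sigma_{23}^2\sigma_{24}\sigma_{45} - \sigma_{22}\sigma_{24}\sigma_{33}\sigma_{45}. \end{aligned}$$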
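The relation can be checked numerically on a covariance matrix from the model. The edge set is not recoverable from the text here, so the sketch ASSUMES the Verma graph has edges 1 → 3, 1 → 5, 2 → 3, 2 → 4, 3 → 4, 4 → 5 with X1 hidden. The coefficient values are hypothetical as well.

```python
import numpy as np

# ASSUMED edge set of the Verma graph (node 1 hidden), hypothetical coefficients.
coeffs = {(1, 3): 0.7, (1, 5): -0.4, (2, 3): 0.9,
          (2, 4): 0.5, (3, 4): -1.1, (4, 5): 0.6}
Lam = np.zeros((5, 5))
for (i, j), value in coeffs.items():
    Lam[i - 1, j - 1] = value
omega = np.array([1.0, 2.0, 0.5, 1.5, 0.8])  # hypothetical error variances

inv = np.linalg.inv(np.eye(5) - Lam)
Sigma = inv.T @ np.diag(omega) @ inv

def s(i, j):
    """Entry sigma_ij of Sigma, using 1-based indices."""
    return Sigma[i - 1, j - 1]

# The degree-4 polynomial in the observed entries sigma_ij, 2 <= i <= j <= 5.
verma = (s(2, 3) * s(2, 4) * s(2, 5) * s(3, 4) - s(2, 2) * s(2, 5) * s(3, 4) ** 2
         - s(2, 3) * s(2, 4) ** 2 * s(3, 5) + s(2, 2) * s(2, 4) * s(3, 4) * s(3, 5)
         - s(2, 3) ** 2 * s(2, 5) * s(4, 4) + s(2, 2) * s(2, 5) * s(3, 3) * s(4, 4)
         + s(2, 3) ** 2 * s(2, 4) * s(4, 5) - s(2, 2) * s(2, 4) * s(3, 3) * s(4, 5))
```

Under the assumed edge set, `verma` is zero up to rounding error. It involves only the principal submatrix on indices 2 to 5, that is, only observed quantities.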

More on related topics −→ Jan Draisma, Kelli Talaska & Thomas Richardson


Identification

The parametrization map $\varphi_G$ for an acyclic digraph G is injective.

Statistical inference about all parameters is possible because we can recover (= identify) all $\lambda_{ij}$ and $\omega_i$ from the covariance matrix $\Sigma$.
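One way to see the recovery concretely: in an acyclic digraph without hidden variables, each error term is independent of the parents of its node, so the coefficients are the population regression coefficients of a node on its parents and ω_i is the residual variance. The sketch below uses this; the function names and numerical values are hypothetical.

```python
import numpy as np

def phi(Lam, omega):
    """Covariance matrix (I - Lam)^{-T} diag(omega) (I - Lam)^{-1}."""
    p = Lam.shape[0]
    inv = np.linalg.inv(np.eye(p) - Lam)
    return inv.T @ np.diag(omega) @ inv

def recover_parameters(Sigma, parents):
    """Recover (Lam, omega) from Sigma; parents maps node -> list of parents (0-based)."""
    p = Sigma.shape[0]
    Lam = np.zeros((p, p))
    omega = np.zeros(p)
    for i in range(p):
        pa = parents.get(i, [])
        if pa:
            # Population regression of X_i on its parents.
            beta = np.linalg.solve(Sigma[np.ix_(pa, pa)], Sigma[pa, i])
            Lam[pa, i] = beta
            omega[i] = Sigma[i, i] - Sigma[pa, i] @ beta  # residual variance
        else:
            omega[i] = Sigma[i, i]
    return Lam, omega

# Hypothetical parameters for the graph 1->2, 1->3, 2->3, 3->4 (0-based below).
Lam_true = np.zeros((4, 4))
Lam_true[0, 1], Lam_true[0, 2], Lam_true[1, 2], Lam_true[2, 3] = 0.8, -0.5, 0.3, 1.2
omega_true = np.array([1.0, 0.5, 2.0, 1.5])
parents = {1: [0], 2: [0, 1], 3: [2]}

Lam_hat, omega_hat = recover_parameters(phi(Lam_true, omega_true), parents)
```

Each recovered parameter is a rational function of the entries of Σ.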

What about hidden variables?

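Identifiability questions can be answered by studying the ideal

$$\left\langle \sigma_{ij} - \left[(I - \Lambda)^{-T}\, \Omega\, (I - \Lambda)^{-1}\right]_{ij},\ i \le j \text{ observed} \right\rangle \subset \mathbb{R}[\Sigma, \Lambda, \Omega].$$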


Identification

[Figure: Verma graph; observed nodes X2, X3, X4, X5]

Ideal contains in particular

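$$\lambda_{45}\sigma_{24} - \sigma_{25},$$

$$\lambda_{45}\left(\sigma_{22}\sigma_{33}\sigma_{44} - \sigma_{22}\sigma_{34}^2 - \sigma_{23}^2\sigma_{44}\right) + \sigma_{23}\sigma_{25}\sigma_{34} - \sigma_{23}\sigma_{24}\sigma_{35} + \sigma_{22}\sigma_{34}\sigma_{35} + \sigma_{23}^2\sigma_{45} - \sigma_{22}\sigma_{33}\sigma_{45}.$$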
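The first polynomial says that λ45 can be recovered as σ25 / σ24 whenever σ24 ≠ 0. The sketch below checks this and evaluates the second polynomial. It ASSUMES the Verma graph has edges 1 → 3, 1 → 5, 2 → 3, 2 → 4, 3 → 4, 4 → 5 with X1 hidden, since the figure is not recoverable from the text. All coefficient values are hypothetical.

```python
import numpy as np

# ASSUMED edge set of the Verma graph (node 1 hidden), hypothetical coefficients.
coeffs = {(1, 3): 0.7, (1, 5): -0.4, (2, 3): 0.9,
          (2, 4): 0.5, (3, 4): -1.1, (4, 5): 0.6}
Lam = np.zeros((5, 5))
for (i, j), value in coeffs.items():
    Lam[i - 1, j - 1] = value
omega = np.array([1.0, 2.0, 0.5, 1.5, 0.8])  # hypothetical error variances

inv = np.linalg.inv(np.eye(5) - Lam)
Sigma = inv.T @ np.diag(omega) @ inv

def s(i, j):
    """Entry sigma_ij of Sigma, using 1-based indices."""
    return Sigma[i - 1, j - 1]

# First polynomial: lambda_45 * sigma_24 - sigma_25 = 0.
lam45_recovered = s(2, 5) / s(2, 4)

# Second polynomial, evaluated at the recovered lambda_45.
second = (lam45_recovered * (s(2, 2) * s(3, 3) * s(4, 4)
                             - s(2, 2) * s(3, 4) ** 2
                             - s(2, 3) ** 2 * s(4, 4))
          + s(2, 3) * s(2, 5) * s(3, 4) - s(2, 3) * s(2, 4) * s(3, 5)
          + s(2, 2) * s(3, 4) * s(3, 5) + s(2, 3) ** 2 * s(4, 5)
          - s(2, 2) * s(3, 3) * s(4, 5))
```

Under the assumed edge set, `lam45_recovered` matches the coefficient 0.6 on the edge 4 → 5 and `second` is zero up to rounding, even though X1 is never observed.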

More on identification
−→ Luis Garcia & Rina Foygel (Gaussian models)
−→ Jason Morton & Marco Valtorta (Discrete models)
