Page 1: Department of Statistics University of Chicagogalton.uchicago.edu/~drton/SIAM_AG11/siam_ag11_session1_drton.pdfLuis Garcia & Rina Foygel(Gaussian models)! Jason Morton & Marco Valtorta(Discrete

Algebraic Problems in Graphical Modeling

Mathias Drton

Department of Statistics, University of Chicago


Outline

1 What (roughly) are graphical models? A.k.a. Markov random fields, Bayesian networks, ...

2 Gaussian models on undirected graphs

Determining conditional independences
Maximum likelihood estimation

3 Gaussian models on directed graphs

Model (Markov) equivalence
Parameter identification

[Drton, Sturmfels and Sullivant: Lectures on Algebraic Statistics, Oberwolfach Seminars Series, Vol. 39, Birkhäuser, Basel, 2009]

Mathias Drton 2 / 14


What (roughly) are graphical models?

Data: Realizations of random variables $X_1, \dots, X_p$

Statistical model: Family of candidates for the joint distribution of $(X_1, \dots, X_p)$

Graphical model: Statistical model associated with a graph that has $X_1, \dots, X_p$ as nodes

Points of view:

(i) Density function factors over the graph:

$$f(x_1, x_2, x_3) = g(x_1, x_2)\, h(x_2, x_3)$$

(ii) Non-adjacent random variables are 'somehow' independent:

$X_1$ independent of $X_3$ given $X_2$
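The link between the two points of view can be checked numerically on a small discrete example. The sketch below assumes binary variables and two arbitrary positive factor tables `g` and `h` (hypothetical values, not from the slides); it builds the joint table from the factorization and tests whether the conditional distribution of $(X_1, X_3)$ given $X_2$ is a product of its margins.

```python
import numpy as np

# Hypothetical positive factors on binary variables (values chosen arbitrarily).
g = np.array([[1.0, 2.0], [3.0, 0.5]])   # g[x1, x2]
h = np.array([[0.7, 1.5], [2.0, 1.0]])   # h[x2, x3]

# Joint table f[x1, x2, x3] proportional to g(x1, x2) * h(x2, x3).
f = np.einsum("ab,bc->abc", g, h)
f /= f.sum()

def conditionally_independent(table, tol=1e-12):
    """Check X1 independent of X3 given X2 for a joint table[x1, x2, x3]."""
    for x2 in range(table.shape[1]):
        cond = table[:, x2, :] / table[:, x2, :].sum()   # p(x1, x3 | x2)
        p1 = cond.sum(axis=1, keepdims=True)             # p(x1 | x2)
        p3 = cond.sum(axis=0, keepdims=True)             # p(x3 | x2)
        if not np.allclose(cond, p1 * p3, atol=tol):
            return False
    return True

ci_holds = conditionally_independent(f)
print(ci_holds)
```

Each slice of the table at a fixed value of $x_2$ is an outer product of a column of `g` and a row of `h`, which is why the check succeeds for any choice of positive factors.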


What are graphical models good for?

Very literal application: Inference of networks

(Sachs et al., 2005, Science)

Suitably sparse graphs yield scalable models (e.g., think of computing with 100 binary variables).
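To make the scale concrete, the following arithmetic sketch compares the number of free parameters of an unrestricted joint distribution of 100 binary variables with that of a sparse model. The choice of a directed chain $X_1 \to X_2 \to \dots \to X_{100}$ as the sparse graph is an assumption made here for illustration.

```python
p = 100  # number of binary variables

# Unrestricted joint distribution: one probability per cell, minus one
# because the probabilities sum to 1.
full_joint_params = 2**p - 1

# Assumed sparse model: a chain X1 -> X2 -> ... -> Xp.
# P(X1) needs 1 parameter; each P(X_i | X_{i-1}) needs 2 (one per parent value).
chain_params = 1 + 2 * (p - 1)

print(full_joint_params)
print(chain_params)
```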

Graph helps structure computations.

Combinatorial answers to statistical questions.


Gaussian/Multivariate normal distribution

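Let $\mu \in \mathbb{R}^p$ be any vector.

Let $\Sigma \in \mathbb{R}^{p \times p}$ be a positive definite matrix.

Definition

The distribution with probability density function

$$f(x) = \frac{1}{\sqrt{(2\pi)^p \det(\Sigma)}} \exp\left\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right\}, \quad x \in \mathbb{R}^p,$$

is called the Gaussian or multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$; in symbols $N_p(\mu, \Sigma)$.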
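As a minimal sketch, the density in the definition can be evaluated directly with NumPy. The function name `gaussian_density` is a label chosen here; the code solves a linear system instead of forming $\Sigma^{-1}$ explicitly, which is a numerical convenience and not part of the definition.

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Evaluate the N_p(mu, Sigma) density at the point x."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    p = mu.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu) via a linear solve.
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm_const = np.sqrt((2.0 * np.pi) ** p * np.linalg.det(Sigma))
    return float(np.exp(-0.5 * quad) / norm_const)

# For p = 1, mu = 0, Sigma = 1 this is the standard normal density at 0.
value_at_zero = gaussian_density([0.0], [0.0], [[1.0]])
print(value_at_zero)
```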


Undirected Gaussian models

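[Figure: undirected graph on X1, X2, X3, X4 forming a four-cycle with edges X1 - X2, X2 - X3, X3 - X4, X1 - X4]

Inverse covariance matrix

$$\Sigma^{-1} = \begin{pmatrix} \sigma_{11} & \sigma_{12} & 0 & \sigma_{14} \\ \sigma_{12} & \sigma_{22} & \sigma_{23} & 0 \\ 0 & \sigma_{23} & \sigma_{33} & \sigma_{34} \\ \sigma_{14} & 0 & \sigma_{34} & \sigma_{44} \end{pmatrix}$$

Because the exponent of the Gaussian density is

$$z^T \Sigma^{-1} z = \sum_{i=1}^{4} \sum_{j=1}^{4} \sigma_{ij} z_i z_j,$$

we have the density factorization

$$f(x_1, \dots, x_4) = g_{12}(x_1, x_2)\, g_{23}(x_2, x_3)\, g_{34}(x_3, x_4)\, g_{14}(x_1, x_4).$$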
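A small numerical sketch of the four-cycle example follows. The matrix `K` plays the role of $\Sigma^{-1}$; its nonzero values are hypothetical and chosen only so that `K` is positive definite with zeros in the two non-edge positions. The covariance matrix obtained by inversion has no zeros in general, so the graph is visible in $\Sigma^{-1}$ and not in $\Sigma$.

```python
import numpy as np

# Hypothetical inverse covariance matrix with the four-cycle zero pattern:
# zeros at (1,3) and (2,4), the two non-edges of the graph.
K = np.array([[3.0, 1.0, 0.0, 1.0],
              [1.0, 3.0, 1.0, 0.0],
              [0.0, 1.0, 3.0, 1.0],
              [1.0, 0.0, 1.0, 3.0]])

# Positive definiteness: all eigenvalues of the symmetric matrix are positive.
min_eig = np.linalg.eigvalsh(K).min()

Sigma = np.linalg.inv(K)

# The exponent z^T K z written as the double sum over i and j.
z = np.array([0.5, -1.0, 2.0, 0.25])
double_sum = sum(K[i, j] * z[i] * z[j] for i in range(4) for j in range(4))

print(min_eig)
print(Sigma[0, 2])       # covariance of X1 and X3 is not zero
print(double_sum, z @ K @ z)
```

Only the terms with $i = j$ or with $\{i, j\}$ an edge survive in the double sum, which is what allows the density to be split into one factor per edge.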


Reading off conditional independences

If $X_i$ and $X_j$ are non-adjacent in the graph, then

$$(\Sigma^{-1})_{ij} = 0 \iff X_i \perp\!\!\!\perp X_j \mid \{X_k : k \neq i, j\}$$

Are there other conditional independences?

Check

$$X_A \perp\!\!\!\perp X_B \mid X_C \iff \operatorname{rank}(\Sigma_{AC \times BC}) \le |C|,$$

for pairwise disjoint sets $A, B, C \subset \{1, \dots, p\}$.

Theorem (Global Markov property)

If $N_p(\mu, \Sigma)$ is in a graphical model, then

$$C \text{ separates } A \text{ and } B \text{ in the graph} \implies X_A \perp\!\!\!\perp X_B \mid X_C,$$

and equivalence holds for generic distributions in the model.
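The rank criterion can be tried out on the four-cycle. The sketch below reuses a hypothetical positive definite matrix `K` with the four-cycle zero pattern (values chosen for illustration) and a helper `ci_by_rank` named here for convenience; ranks are computed numerically with an explicit tolerance, so this is a floating-point check and not an exact algebraic one.

```python
import numpy as np

def ci_by_rank(Sigma, A, B, C, tol=1e-9):
    """Test X_A independent of X_B given X_C via rank(Sigma[AC, BC]) <= |C|.

    A, B, C are lists of 0-based indices, assumed pairwise disjoint.
    """
    rows = list(A) + list(C)
    cols = list(B) + list(C)
    sub = Sigma[np.ix_(rows, cols)]
    return bool(np.linalg.matrix_rank(sub, tol=tol) <= len(C))

# Hypothetical inverse covariance matrix for the four-cycle 1 - 2 - 3 - 4 - 1.
K = np.array([[3.0, 1.0, 0.0, 1.0],
              [1.0, 3.0, 1.0, 0.0],
              [0.0, 1.0, 3.0, 1.0],
              [1.0, 0.0, 1.0, 3.0]])
Sigma = np.linalg.inv(K)

# {X2, X4} separates X1 and X3 in the four-cycle.
separated = ci_by_rank(Sigma, A=[0], B=[2], C=[1, 3])
# {X2} alone does not separate X1 and X3 (the path through X4 stays open).
not_separated = ci_by_rank(Sigma, A=[0], B=[2], C=[1])

print(separated, not_separated)
```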


Maximum likelihood estimation

Optimize the log-likelihood function

$$\Sigma^{-1} \mapsto \log\det(\Sigma^{-1}) - \operatorname{trace}(\Sigma^{-1} \cdot S),$$

where S is a data-derived positive definite matrix.

How difficult? What is algebraic degree of ‘likelihood equations’?

Theorem

The following two statements are equivalent:

(i) The ML estimator in the Gaussian graphical model associated with the graph G is a rational function of S.

(ii) The graph G is chordal.
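For a chordal graph the rational form of the estimator can be written down explicitly. The sketch below uses the path $1 - 2 - 3$ (a chordal graph chosen here as an example, not taken from the slides) and the standard clique/separator formula for decomposable models: add the inverses of the clique blocks of $S$ and subtract the inverse of the separator block. The matrix `S` is a hypothetical positive definite input.

```python
import numpy as np

def loglik(K, S):
    """Log-likelihood as a function of K = Sigma^{-1}: log det(K) - trace(K S)."""
    _, logdet = np.linalg.slogdet(K)
    return logdet - np.trace(K @ S)

def mle_path3(S):
    """Closed-form ML estimate of Sigma^{-1} for the path 1 - 2 - 3.

    Cliques are {1, 2} and {2, 3}; the separator is {2}.
    """
    K = np.zeros((3, 3))
    c1, c2 = [0, 1], [1, 2]
    K[np.ix_(c1, c1)] += np.linalg.inv(S[np.ix_(c1, c1)])
    K[np.ix_(c2, c2)] += np.linalg.inv(S[np.ix_(c2, c2)])
    K[1, 1] -= 1.0 / S[1, 1]
    return K

# Hypothetical positive definite sample covariance matrix.
S = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.5]])

K_hat = mle_path3(S)
Sigma_hat = np.linalg.inv(K_hat)

print(K_hat[0, 2])        # zero for the non-edge 1 - 3
print(Sigma_hat)          # agrees with S on the diagonal and on the edges
```

Every entry of `K_hat` is a rational function of the entries of `S`, in line with statement (i). For a non-chordal graph such as the four-cycle, no formula of this kind is available and an iterative method would be needed.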

More on ML estimation → Caroline Uhler


Directed graphs

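[Figure: directed acyclic graph on X1, X2, X3, X4 with edges X1 → X2, X1 → X3, X2 → X3, X3 → X4]

Structural/regression equations:

$$\begin{aligned}
X_1 &= \varepsilon_1, \\
X_2 &= \lambda_{12} X_1 + \varepsilon_2, \\
X_3 &= \lambda_{13} X_1 + \lambda_{23} X_2 + \varepsilon_3, \\
X_4 &= \lambda_{34} X_3 + \varepsilon_4,
\end{aligned}$$

with independent errors $\varepsilon_i \sim N(0, \omega_i)$.

So,

$$\begin{pmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{pmatrix} = \begin{pmatrix} 1 & -\lambda_{12} & -\lambda_{13} & 0 \\ 0 & 1 & -\lambda_{23} & 0 \\ 0 & 0 & 1 & -\lambda_{34} \\ 0 & 0 & 0 & 1 \end{pmatrix}^{-T} \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \end{pmatrix}$$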
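The passage from the recursive equations to the matrix form can be verified numerically. In the sketch below the coefficient values are hypothetical, and the errors are drawn with unit variance for simplicity; the check only concerns the linear algebra, so the particular error variances do not matter.

```python
import numpy as np

# Hypothetical edge coefficients for the edges 1->2, 1->3, 2->3, 3->4.
lam12, lam13, lam23, lam34 = 0.8, -0.5, 1.2, 0.7

# Lambda[i, j] holds the coefficient of the edge i -> j (0-based indices).
Lambda = np.zeros((4, 4))
Lambda[0, 1] = lam12
Lambda[0, 2] = lam13
Lambda[1, 2] = lam23
Lambda[2, 3] = lam34
I = np.eye(4)

rng = np.random.default_rng(0)
eps = rng.normal(size=4)   # one draw of the error vector

# Solve the structural equations one at a time, in topological order.
x = np.zeros(4)
x[0] = eps[0]
x[1] = lam12 * x[0] + eps[1]
x[2] = lam13 * x[0] + lam23 * x[1] + eps[2]
x[3] = lam34 * x[2] + eps[3]

# Matrix form: X = (I - Lambda)^{-T} eps, i.e. solve (I - Lambda)^T X = eps.
x_matrix = np.linalg.solve((I - Lambda).T, eps)

print(np.allclose(x, x_matrix))
```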


Directed Gaussian models

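[Figure: directed acyclic graph on X1, X2, X3, X4 with edges X1 → X2, X1 → X3, X2 → X3, X3 → X4]

Gaussian distribution in directed graphical model if

$$\Sigma = (I - \Lambda)^{-T}\, \Omega\, (I - \Lambda)^{-1}$$

for $\Lambda \in \mathbb{R}^E$ and $\Omega \succ 0$ diagonal.

Factorization:

$$f(x) = \prod_{i=1}^{p} f_i\left(x_i \mid x_{\mathrm{pa}(i)}\right) = f_1(x_1)\, f_2(x_2 \mid x_1)\, f_3(x_3 \mid x_1, x_2)\, f_4(x_4 \mid x_3)$$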
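As a sketch of the parametrization, the covariance matrix for the four-node example can be computed from $\Lambda$ and $\Omega$ and compared with what the structural equations predict. The numerical values and the helper name `dag_covariance` are assumptions made for illustration.

```python
import numpy as np

def dag_covariance(Lambda, omega):
    """Sigma = (I - Lambda)^{-T} Omega (I - Lambda)^{-1} with Omega = diag(omega)."""
    n = Lambda.shape[0]
    B = np.linalg.inv(np.eye(n) - Lambda)
    return B.T @ np.diag(omega) @ B

# Hypothetical parameters for the edges 1->2, 1->3, 2->3, 3->4.
lam12, lam13, lam23, lam34 = 0.8, -0.5, 1.2, 0.7
omega = np.array([1.0, 2.0, 0.5, 1.5])

Lambda = np.zeros((4, 4))
Lambda[0, 1], Lambda[0, 2], Lambda[1, 2], Lambda[2, 3] = lam12, lam13, lam23, lam34

Sigma = dag_covariance(Lambda, omega)
K = np.linalg.inv(Sigma)

print(Sigma[0, 1])   # Cov(X1, X2) = lam12 * omega_1
print(K[0, 3])       # zero: X1 and X4 independent given X2, X3
print(K[1, 3])       # zero: X2 and X4 independent given X1, X3
```

The two zeros in the inverse reflect the last factor $f_4(x_4 \mid x_3)$: once $X_3$ is given, $X_4$ carries no further information about $X_1$ or $X_2$.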

Are models associated with different graphs different?


Model equivalence

Useful to obtain implicit description of the image of

$$\phi_G(\Lambda, \Omega) = (I - \Lambda)^{-T}\, \Omega\, (I - \Lambda)^{-1}.$$

Model cut out by conditional independences: d-separation

Theorem

The images of $\phi_{G_1}$ and $\phi_{G_2}$ for two acyclic digraphs $G_1 = (V, E_1)$ and $G_2 = (V, E_2)$ are the same if and only if $G_1$ and $G_2$ have

(i) same skeleton, and

(ii) same unshielded colliders (induced subgraphs u → v ← w).


[Figure: example of two digraphs on X1, X2, X3 whose models are equivalent (≡)]


[Figure: example of two digraphs on X1, X2, X3 whose models are not equivalent (≢)]
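The two conditions of the theorem are purely combinatorial and easy to check by machine. The sketch below represents a digraph as a list of `(tail, head)` pairs; the function names are chosen here, and the inputs are assumed to be acyclic and to share one vertex set (neither is verified by the code). The three-node graphs at the end are illustrative choices and need not be the ones drawn on the slide.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of the graph: a set of unordered vertex pairs."""
    return {frozenset(e) for e in edges}

def unshielded_colliders(edges):
    """All triples (u, v, w) with u -> v <- w and u, w non-adjacent."""
    skel = skeleton(edges)
    parents = {}
    for tail, head in edges:
        parents.setdefault(head, set()).add(tail)
    found = set()
    for v, pa in parents.items():
        for u, w in combinations(sorted(pa), 2):
            if frozenset((u, w)) not in skel:
                found.add((u, v, w))
    return found

def markov_equivalent(edges1, edges2):
    """Same skeleton and same unshielded colliders."""
    return (skeleton(edges1) == skeleton(edges2)
            and unshielded_colliders(edges1) == unshielded_colliders(edges2))

chain = [(1, 2), (2, 3)]        # 1 -> 2 -> 3
reversed_chain = [(3, 2), (2, 1)]  # 1 <- 2 <- 3
collider = [(1, 2), (3, 2)]     # 1 -> 2 <- 3

print(markov_equivalent(chain, reversed_chain))
print(markov_equivalent(chain, collider))
```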


Hidden/unobserved variables

Hidden variables: Project to principal submatrix of Σ

Example: ‘Verma graph’

[Figure: Verma graph on X1, ..., X5, with X1 unobserved]

Are models still cut out by conditional independence?

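Verma graph: Relations in $\Sigma_{2345 \times 2345}$ generated by

$$\begin{aligned}
&\sigma_{23}\sigma_{24}\sigma_{25}\sigma_{34} - \sigma_{22}\sigma_{25}\sigma_{34}^2 - \sigma_{23}\sigma_{24}^2\sigma_{35} + \sigma_{22}\sigma_{24}\sigma_{34}\sigma_{35} \\
&\quad - \sigma_{23}^2\sigma_{25}\sigma_{44} + \sigma_{22}\sigma_{25}\sigma_{33}\sigma_{44} + \sigma_{23}^2\sigma_{24}\sigma_{45} - \sigma_{22}\sigma_{24}\sigma_{33}\sigma_{45}.
\end{aligned}$$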
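The quartic relation can be checked numerically on one point of the model. The edges of the figure are not preserved in this transcript, so the sketch below assumes the edge set usually given for the Verma graph in this labeling: $1 \to 3$, $1 \to 5$, $2 \to 3$, $2 \to 4$, $3 \to 4$, $4 \to 5$, with $X_1$ hidden. Both this edge set and the parameter values are assumptions.

```python
import numpy as np

def verma_covariance(lam, omega):
    """Covariance of (X1, ..., X5); lam maps 1-based edges (i, j) to weights."""
    Lambda = np.zeros((5, 5))
    for (i, j), weight in lam.items():
        Lambda[i - 1, j - 1] = weight
    B = np.linalg.inv(np.eye(5) - Lambda)
    return B.T @ np.diag(omega) @ B

def verma_polynomial(Sigma):
    """The quartic in the entries of Sigma indexed by the observed nodes 2..5."""
    s = lambda i, j: Sigma[i - 1, j - 1]
    return (s(2, 3) * s(2, 4) * s(2, 5) * s(3, 4)
            - s(2, 2) * s(2, 5) * s(3, 4) ** 2
            - s(2, 3) * s(2, 4) ** 2 * s(3, 5)
            + s(2, 2) * s(2, 4) * s(3, 4) * s(3, 5)
            - s(2, 3) ** 2 * s(2, 5) * s(4, 4)
            + s(2, 2) * s(2, 5) * s(3, 3) * s(4, 4)
            + s(2, 3) ** 2 * s(2, 4) * s(4, 5)
            - s(2, 2) * s(2, 4) * s(3, 3) * s(4, 5))

# Assumed edge set and hypothetical parameter values.
lam = {(1, 3): 0.9, (1, 5): -0.6, (2, 3): 0.5,
       (2, 4): 0.8, (3, 4): -0.7, (4, 5): 1.1}
omega = np.array([1.0, 2.0, 0.5, 1.5, 0.8])

Sigma = verma_covariance(lam, omega)
residual = verma_polynomial(Sigma)
print(residual)   # zero up to floating-point error
```

Only entries with indices in $\{2, 3, 4, 5\}$ enter the polynomial, which corresponds to projecting onto the principal submatrix of the observed variables.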

More on related topics → Jan Draisma, Kelli Talaska & Thomas Richardson


Identification

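Parametrization map $\phi_G$ for an acyclic digraph $G$ is injective.

Statistical inference about all parameters possible because we can recover (= identify) all $\lambda_{ij}$ and $\omega_i$ from the covariance matrix $\Sigma$.

What about hidden variables?

Identifiability questions can be answered by studying the ideal

$$\left\langle \sigma_{ij} - \left[(I - \Lambda)^{-T}\, \Omega\, (I - \Lambda)^{-1}\right]_{ij},\ i \le j \text{ observed} \right\rangle \subset \mathbb{R}[\Sigma, \Lambda, \Omega].$$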
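When all variables are observed, the injectivity of $\phi_G$ can be made constructive: each node is regressed on its parents. The sketch below assumes this regression approach and a hypothetical helper `identify`; it rebuilds $\Lambda$ and the diagonal of $\Omega$ from $\Sigma$ for the four-node example graph with edges $1 \to 2$, $1 \to 3$, $2 \to 3$, $3 \to 4$ and hypothetical parameter values.

```python
import numpy as np

def identify(Sigma, parents):
    """Recover (Lambda, omega) from Sigma for a DAG without hidden variables.

    parents maps a 0-based node index to the list of its parent indices.
    """
    p = Sigma.shape[0]
    Lambda = np.zeros((p, p))
    omega = np.zeros(p)
    for i in range(p):
        pa = parents.get(i, [])
        if pa:
            # Regression coefficients of X_i on its parents.
            coef = np.linalg.solve(Sigma[np.ix_(pa, pa)], Sigma[pa, i])
            Lambda[pa, i] = coef
            # Residual variance of that regression.
            omega[i] = Sigma[i, i] - Sigma[i, pa] @ coef
        else:
            omega[i] = Sigma[i, i]
    return Lambda, omega

# Hypothetical true parameters.
Lambda_true = np.zeros((4, 4))
Lambda_true[0, 1], Lambda_true[0, 2] = 0.8, -0.5
Lambda_true[1, 2], Lambda_true[2, 3] = 1.2, 0.7
omega_true = np.array([1.0, 2.0, 0.5, 1.5])

B = np.linalg.inv(np.eye(4) - Lambda_true)
Sigma = B.T @ np.diag(omega_true) @ B

Lambda_rec, omega_rec = identify(Sigma, parents={1: [0], 2: [0, 1], 3: [2]})
print(np.allclose(Lambda_rec, Lambda_true), np.allclose(omega_rec, omega_true))
```

With hidden variables only a principal submatrix of $\Sigma$ is available, so regressions on hidden parents cannot be formed and the question becomes an algebraic one about the ideal above.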


Identification

Example: ‘Verma graph’

[Figure: Verma graph on X1, ..., X5, with X1 unobserved]

Ideal contains in particular

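$$\lambda_{45}\sigma_{24} - \sigma_{25},$$

$$\begin{aligned}
&\lambda_{45}\left(\sigma_{22}\sigma_{33}\sigma_{44} - \sigma_{22}\sigma_{34}^2 - \sigma_{23}^2\sigma_{44}\right) + \sigma_{23}\sigma_{25}\sigma_{34} \\
&\quad - \sigma_{23}\sigma_{24}\sigma_{35} + \sigma_{22}\sigma_{34}\sigma_{35} + \sigma_{23}^2\sigma_{45} - \sigma_{22}\sigma_{33}\sigma_{45}.
\end{aligned}$$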
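Both ideal elements are linear in $\lambda_{45}$, so each yields a formula for $\lambda_{45}$ in terms of observed covariances whenever its leading coefficient is nonzero. The sketch below tries both formulas on one point of the model. As the figure's edges are not preserved in this transcript, the edge set $1 \to 3$, $1 \to 5$, $2 \to 3$, $2 \to 4$, $3 \to 4$, $4 \to 5$ (with $X_1$ hidden) is an assumption, as are the parameter values.

```python
import numpy as np

# Assumed Verma graph edges (1-based) with hypothetical weights.
lam = {(1, 3): 0.9, (1, 5): -0.6, (2, 3): 0.5,
       (2, 4): 0.8, (3, 4): -0.7, (4, 5): 1.1}
omega = np.array([1.0, 2.0, 0.5, 1.5, 0.8])

Lambda = np.zeros((5, 5))
for (i, j), weight in lam.items():
    Lambda[i - 1, j - 1] = weight
B = np.linalg.inv(np.eye(5) - Lambda)
Sigma = B.T @ np.diag(omega) @ B

# Accessor for covariance entries with 1-based indices; only observed
# indices 2..5 are used below.
s = lambda i, j: Sigma[i - 1, j - 1]

# First ideal element: lambda45 * s24 - s25 = 0.
lam45_from_first = s(2, 5) / s(2, 4)

# Second ideal element: lambda45 * lead + rest = 0.
lead = (s(2, 2) * s(3, 3) * s(4, 4)
        - s(2, 2) * s(3, 4) ** 2
        - s(2, 3) ** 2 * s(4, 4))
rest = (s(2, 3) * s(2, 5) * s(3, 4)
        - s(2, 3) * s(2, 4) * s(3, 5)
        + s(2, 2) * s(3, 4) * s(3, 5)
        + s(2, 3) ** 2 * s(4, 5)
        - s(2, 2) * s(3, 3) * s(4, 5))
lam45_from_second = -rest / lead

print(lam45_from_first, lam45_from_second)
```

The second formula remains usable at points where $\sigma_{24} = 0$, where the first one breaks down.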

More on identification
→ Luis Garcia & Rina Foygel (Gaussian models)
→ Jason Morton & Marco Valtorta (Discrete models)
