Graphical Models: Learning
Pradeep Ravikumar
Carnegie Mellon University
Learning Graphical Models
• In many contexts, we do not have an a priori specified graphical model distribution
• All we have access to are i.i.d. samples drawn from an *unknown* graphical model distribution
• We would like to estimate the graphical model distribution from data:
• the graph
• the factor functions
• We focus on parametric graphical models, where the factor functions have a specific parametric form, so this entails estimating some parameters (rather than general functions)
Learning Undirected Graphical Models
We will focus on pairwise graphical models:

p(X; θ, G) = (1/Z(θ)) exp( Σ_{(s,t)∈E(G)} θ_st φ_st(X_s, X_t) )

φ_st(x_s, x_t): arbitrary potential functions
Ising: x_s x_t    Potts: I(x_s = x_t)    Indicator: I(x_s = j, x_t = k)
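The potentials above are easy to sketch numerically. A minimal NumPy sketch (function names are illustrative, not from the slides) evaluating the unnormalized pairwise density:

```python
import numpy as np

# Unnormalized pairwise MRF density: exp( sum over edges of theta_st * phi(x_s, x_t) ).
# Two of the potentials from the slide; all names here are illustrative.
def ising_potential(xs, xt):      # values in {-1, +1}
    return xs * xt

def potts_potential(xs, xt):      # indicator of agreement
    return float(xs == xt)

def unnormalized_prob(x, edges, theta, potential):
    """x: node values; edges: list of (s, t); theta: dict edge -> weight."""
    energy = sum(theta[(s, t)] * potential(x[s], x[t]) for (s, t) in edges)
    return np.exp(energy)

# Example: a 3-node chain 0 - 1 - 2 with Ising potentials.
edges = [(0, 1), (1, 2)]
theta = {(0, 1): 0.5, (1, 2): -0.5}
x = np.array([1, 1, -1])
p_tilde = unnormalized_prob(x, edges, theta, ising_potential)   # exp(0.5 + 0.5) = e
```

Computing the actual probability would additionally require the partition function Z(θ), which is discussed later.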
Graphical Model Selection

let G = (V, E) be an undirected graph on p = |V| vertices
pairwise Markov random field: family of prob. distributions

P(x_1, ..., x_p; θ) = (1/Z(θ)) exp( Σ_{(s,t)∈E} θ_st x_s x_t )

Problem of graph selection: given n independent and identically distributed (i.i.d.) samples of X = (X_1, ..., X_p), identify the underlying graph structure
Martin Wainwright (UC Berkeley), High-dimensional graph selection, September 2009, 7 / 36

Given: n samples of X = (X_1, ..., X_p) with distribution p(X; θ, G), where

p(X; θ) = exp{ Σ_{(s,t)∈E(G)} θ_st φ_st(x_s, x_t) − A(θ) }

Problem: Estimate the graph G given just the n samples.
Samples from binary-valued pairwise MRFs
Independence model θst = 0
Samples from binary-valued pairwise MRFs
Medium coupling θst ≈ 0.2
Samples from binary-valued pairwise MRFs
Strong coupling θst ≈ 0.8
Learning Graphical Models
• Two Step Procedures:
‣ 1. Model Selection; estimate graph structure
‣ 2. Parameter Inference given graph structure
• Score Based Approaches: search over space of graphs, with a score for graph based on parameter inference
• Constraint-based Approaches: estimate individual edges by hypothesis tests for conditional independences
• Caveats: (a) it is difficult to provide guarantees for these estimators; (b) the underlying estimation problems are NP-hard
Learning Graphical Models
• State-of-the-art methods are based on estimating neighborhoods
‣ Via high-dimensional statistical model estimation
‣ Via high-dimensional hypothesis tests
Ising Model Selection

Given: n i.i.d. samples of X = (X_1, ..., X_p) with distribution p(X; θ, G), where

p(X; θ) = exp{ Σ_{(s,t)∈E(G)} θ_st X_s X_t − A(θ) }

Problem: Estimate the graph G given just the n samples.
Ising Model Selection

Given n i.i.d. samples from the Ising model p(X; θ) = exp{ Σ_{(s,t)∈E(G)} θ_st X_s X_t − A(θ) }, estimate the graph G.
Applications: statistical physics, computer vision, social network analysis
[Figure: US Senate, 109th Congress (Banerjee et al., 2008)]
Ising Model Selection

• Just computing the likelihood of a known Ising model is NP-hard, since the normalization constant requires summing over exponentially many configurations:

Z(θ) = Σ_{x ∈ {−1,1}^p} exp( Σ_{s,t} θ_st x_s x_t )

• Estimating the unknown Ising model parameters as well as the graph structure might seem to be NP-hard as well
• On the other hand, it is tractable to estimate the node-wise conditional distributions, of one variable conditioned on the rest of the variables
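The exponential cost is easy to see in code: a brute-force evaluation of Z(θ) sums over all 2^p sign configurations, so it is feasible only for tiny p. A sketch with illustrative names:

```python
import itertools
import numpy as np

# Brute-force Ising partition function: sums over all 2^p configurations,
# which is exactly why likelihood computation is intractable for large p.
def partition_function(theta):
    """theta: symmetric (p, p) array of edge weights with zero diagonal."""
    p = theta.shape[0]
    Z = 0.0
    for x in itertools.product([-1, 1], repeat=p):
        x = np.array(x)
        Z += np.exp(0.5 * x @ theta @ x)   # 0.5: each edge appears twice in x' theta x
    return Z

# With theta = 0 every configuration has weight 1, so Z = 2^p.
Z = partition_function(np.zeros((4, 4)))   # -> 16.0
```

Doubling p squares the number of terms, so even p = 50 is already far out of reach for this direct sum.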
Neighborhood Estimation in Ising Models

For Ising models, the node-conditional distribution is just a logistic regression model:

p(X_r | X_{V∖r}; θ, G) = exp( 2 Σ_{t∈N(r)} θ_rt X_r X_t ) / ( exp( 2 Σ_{t∈N(r)} θ_rt X_r X_t ) + 1 )
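For X_r = +1 the formula above reduces to a logistic sigmoid in 2 Σ_t θ_rt X_t. A minimal sketch (function and argument names are illustrative):

```python
import numpy as np

# Node-conditional probability P(X_r = +1 | rest) for an Ising model:
# a logistic sigmoid of 2 * sum_t theta_rt * x_t.
def p_xr_given_rest(theta_r, x_rest):
    """theta_r: weights theta_rt to the other nodes; x_rest: their values in {-1, +1}."""
    a = 2.0 * float(np.dot(theta_r, x_rest))
    return np.exp(a) / (np.exp(a) + 1.0)    # sigmoid(a)

# Sanity check: with no coupling the conditional is uniform.
assert p_xr_given_rest(np.zeros(3), np.array([1, -1, 1])) == 0.5
```

Note that only the neighbors N(r) have nonzero θ_rt, which is what makes the support of this regression informative about the graph.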
• So instead of estimating the structure-constrained global Ising model, we could estimate structure-constrained local node-conditional distributions: logistic regression models
• But would node-conditional distributions uniquely specify a consistent joint, or even be consistent with any joint at all?
Conditional and Joint Distributions
• Would node-conditional distributions uniquely specify a consistent joint, or even be consistent with any joint at all?
• In general: no!
• But for the Ising model and node-wise logistic regression models: yes!
• Theorem (Besag 1974; Ravikumar, Wainwright & Lafferty 2010): An Ising model uniquely specifies, and is uniquely specified by, a set of node-wise logistic regression models.
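This consistency can be sanity-checked numerically on a tiny Ising model by enumerating the joint and comparing the exact conditional to the logistic form. The check below is not from the slides; names are illustrative:

```python
import itertools
import numpy as np

# Check: the exact node-conditional P(X_r = 1 | rest), computed from the joint
# by enumeration, matches the logistic form sigmoid(2 * sum_t theta_rt x_t).
rng = np.random.default_rng(0)
p = 4
theta = rng.normal(scale=0.5, size=(p, p))
theta = np.triu(theta, 1)
theta = theta + theta.T                      # symmetric, zero diagonal

configs = np.array(list(itertools.product([-1, 1], repeat=p)))
weights = np.exp(0.5 * np.einsum('ip,pq,iq->i', configs, theta, configs))
joint = weights / weights.sum()

def exact_conditional(r, x_rest):
    """P(X_r = 1 | X_{-r} = x_rest), by summing the enumerated joint."""
    others = [i for i in range(p) if i != r]
    mask = (configs[:, others] == x_rest).all(axis=1)
    return joint[mask & (configs[:, r] == 1)].sum() / joint[mask].sum()

r, x_rest = 0, np.array([1, -1, 1])
logistic = 1.0 / (1.0 + np.exp(-2.0 * theta[r, [1, 2, 3]] @ x_rest))
assert np.isclose(exact_conditional(r, x_rest), logistic)
```

Enumeration costs 2^p, so this only works for small p, but it illustrates why fitting the p local logistic regressions loses nothing relative to the joint.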
Neighborhood Estimation in Ising Models
• The global constraint of a sparse, bounded-degree graph is equivalent to the local constraint of bounded node degrees (numbers of neighbors)
• Estimate node neighborhoods via constrained logistic regression models, and stitch the node neighborhoods together to form the global graph
Graph selection via neighborhood regression
Observation: Recovering the graph G is equivalent to recovering the neighborhood set N(s) for all s ∈ V.

Method: Given n i.i.d. samples X^(1), ..., X^(n), perform logistic regression of each node X_s on X_∖s := {X_t, t ≠ s} to estimate the neighborhood structure N̂(s).

1. For each node s ∈ V, perform ℓ_1-regularized logistic regression of X_s on the remaining variables X_∖s:

θ̂[s] := arg min_{θ ∈ R^{p−1}} { (1/n) Σ_{i=1}^n f(θ; X^(i)_∖s) + ρ_n ‖θ‖_1 }

(logistic likelihood plus ℓ_1 regularization)

2. Estimate the local neighborhood N̂(s) as the support (non-zero entries) of the regression vector θ̂[s].

3. Combine the neighborhood estimates in a consistent manner (AND or OR rule).
Martin Wainwright (UC Berkeley) High-dimensional graph selection September 2009 21 / 36
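The three steps above can be sketched with scikit-learn's ℓ1-penalized logistic regression. The regularization strength C (inverse of ρ_n) and the synthetic chain data are illustrative choices, not the theoretically tuned values:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Neighborhood regression sketch: for each node s, fit an l1-penalized logistic
# regression of X_s on the remaining variables and take the support of the
# coefficients as the estimated neighborhood N-hat(s).
def estimate_graph(X, C=0.25, rule="AND"):
    """X: (n, p) array with entries in {-1, +1}. Returns a boolean adjacency matrix."""
    n, p = X.shape
    nbhd = np.zeros((p, p), dtype=bool)
    for s in range(p):
        rest = np.delete(np.arange(p), s)
        clf = LogisticRegression(penalty="l1", C=C, solver="liblinear")
        clf.fit(X[:, rest], X[:, s])
        nbhd[s, rest] = np.abs(clf.coef_.ravel()) > 1e-6
    # Combine the per-node estimates into a symmetric graph (AND or OR rule).
    return nbhd & nbhd.T if rule == "AND" else nbhd | nbhd.T

# Quick demo on a synthetic 3-chain X0 - X1 - X2 (noisy sign copies).
rng = np.random.default_rng(0)
n = 400
x0 = rng.choice([-1, 1], size=n)
x1 = x0 * rng.choice([1, -1], size=n, p=[0.9, 0.1])
x2 = x1 * rng.choice([1, -1], size=n, p=[0.9, 0.1])
X = np.column_stack([x0, x1, x2])
A = estimate_graph(X)
```

In practice ρ_n (i.e., C) would be chosen as in the theorem on the following slides, scaling with √(log p / n).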
Empirical behavior: Unrescaled plots

[Plot: probability of success versus raw sample size n (0 to 600), for a star graph with a linear fraction of neighbors, at p = 64, 100, 225.]

Martin Wainwright (UC Berkeley), High-dimensional graph selection, September 2009, 22 / 36
Sufficient conditions for consistent model selection

• graph sequences G_{p,d} = (V, E) with p vertices and maximum degree d
• edge weights |θ_st| ≥ θ_min for all (s, t) ∈ E
• draw n i.i.d. samples, and analyze the probability of success indexed by (n, p, d)

Theorem (Ravikumar, Wainwright & Lafferty, 2010)
Under incoherence conditions, for a rescaled sample size

θ_LR(n, p, d) := n / (d³ log p) > θ_crit

and regularization parameter ρ_n ≥ c₁ τ √(log p / n), then with probability greater than 1 − 2 exp(−c₂(τ − 2) log p) → 1:
(a) Uniqueness: for each node s ∈ V, the ℓ₁-regularized logistic convex program has a unique solution. (Non-trivial, since p ≫ n implies it is not strictly convex.)
(b) Correct exclusion: the estimated sign neighborhood N̂(s) correctly excludes all edges not in the true neighborhood.
(c) Correct inclusion: for θ_min ≥ c₃ τ √d ρ_n, the method selects the correct signed neighborhood.

Consequence: for θ_min = Ω(1/d), it suffices to have n = Ω(d³ log p).
Assumptions

Define the Fisher information matrix of the logistic regression: Q* := E_{θ*}[∇²f(θ*; X)].

A1. Dependency condition: bounded eigenspectra:
C_min ≤ λ_min(Q*_{SS}), λ_max(Q*_{SS}) ≤ C_max, and λ_max(E_{θ*}[X Xᵀ]) ≤ D_max.

A2. Incoherence: there exists a ν ∈ (0, 1] such that
|||Q*_{SᶜS} (Q*_{SS})⁻¹|||_{∞,∞} ≤ 1 − ν, where |||A|||_{∞,∞} := max_i Σ_j |A_ij|.

• bounds on eigenvalues are fairly standard
• incoherence condition: partly necessary (prevents degenerate models), partly an artifact of ℓ₁-regularization
• the incoherence condition is weaker than correlation decay

Martin Wainwright (UC Berkeley), High-dimensional graph selection, September 2009, 26 / 36
Other Undirected Graphical Models
• Similar estimators work for other undirected parametric graphical models as well
• Discrete/Categorical Graphical Models (Jalali, Ravikumar, Sanghavi, Ruan 2011)
• Gaussian Graphical Models (Ravikumar, Raskutti, Wainwright, Yu 2011)
• Exponential Family Graphical Models (Yang, Ravikumar, Allen, Liu 2015)
Example: Mixed Graphical Models

Experiments: cancer genomic and transcriptomic data. Combine 'Level III RNA-sequencing' data and 'Level II non-silent somatic mutation and Level III copy number variation' data for 697 breast cancer patients.

[Network figure: TPGM-Ising graphical model over genes. (Yellow) Gene expression via RNA-sequencing, count-valued. (Blue) Genomic mutation, binary mutation status. Well-known components: (DLK1, THSD4) and (TP53).]

(UT Austin), Mixed Graphical Models via Exponential Families, AISTATS 2014, 22 / 25

Poisson-Ising Models
Example: Poisson Graphical Models

• MicroRNA network learned from The Cancer Genome Atlas (TCGA) Breast Cancer Level II Data

Case study (biological results): SPGM miRNA network.

An important special case: the Poisson graphical model.

Joint distribution:

P(X) = exp{ Σ_s θ_s X_s + Σ_{(s,t)∈E} θ_st X_s X_t − Σ_s log(X_s!) − A(θ) }

Node-conditional distributions:

P(X_s | X_{V∖s}) ∝ exp{ ( θ_s + Σ_{t∈N(s)} θ_st X_t ) X_s − log(X_s!) }

The pairwise variant was discussed as the "Poisson auto-model" in Besag (1974).
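The node-conditional above is an ordinary Poisson with log-rate η_s = θ_s + Σ_{t∈N(s)} θ_st X_t. A small sketch (illustrative names) evaluating this pmf and verifying that it normalizes:

```python
import numpy as np
from math import lgamma, exp

# Node-conditional pmf of the Poisson graphical model:
# P(X_s = k | rest) = exp(eta * k - log(k!) - e^eta), eta = theta_s + sum_t theta_st * x_t.
# (Note: the *joint* is normalizable only for non-positive theta_st, per Besag 1974.)
def node_conditional_pmf(k, theta_s, theta_st, x_neighbors):
    eta = theta_s + float(np.dot(theta_st, x_neighbors))
    return exp(eta * k - lgamma(k + 1) - exp(eta))

# The pmf sums to 1: the Poisson log-normalizer is exactly e^eta.
total = sum(node_conditional_pmf(k, 0.2, np.array([-0.1, 0.05]), np.array([3, 1]))
            for k in range(200))
```

This is why neighborhood estimation carries over: each node-conditional is an ℓ1-regularized Poisson regression rather than a logistic regression.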
Learning Directed Graphical Models
Moralization

• The undirected graphical model corresponding to the moralized graph is the smallest undirected graphical model that includes the directed graphical model distribution
• Learning undirected graphical model structure from i.i.d. samples of a directed graphical model would therefore estimate the moralized graph

Recall: Moralization
[Figure: (a) a directed graph on X₁, ..., X₆; (b) its moralized undirected graph.]
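Moralization itself is simple to implement: marry all co-parents of each node, then drop edge directions. A minimal sketch, assuming an illustrative dict-of-parents representation of the DAG:

```python
from itertools import combinations

def moralize(parents):
    """parents: dict node -> list of parent nodes. Returns a set of undirected edges."""
    edges = set()
    for child, pa in parents.items():
        for p in pa:                          # keep each directed edge, undirected
            edges.add(frozenset((p, child)))
        for p, q in combinations(pa, 2):      # marry co-parents of the same child
            edges.add(frozenset((p, q)))
    return edges

# Classic v-structure 1 -> 3 <- 2: moralization adds the "marrying" edge 1 - 2.
moral = moralize({3: [1, 2]})
assert frozenset((1, 2)) in moral and len(moral) == 3
```

The added co-parent edges are exactly why the moralized graph can be strictly denser than the DAG's skeleton.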
Learning Directed Graphical Models
• Two Step Process:
• Learn the undirected “moralized” graph
• Orient the edges using conditional independence tests
• PC algorithm (Spirtes, Glymour & Scheines, 2000)
• Open problem: there are no very scalable algorithms for the second step (or for overall directed graphical model estimation)