Estimation and Inference for Differential Networks
Collaborators
Byol Kim (U Chicago)
Song Liu (U Bristol)
Motivation
Example: ADHD-200 brain imaging dataset (Biswal et al. 2010)
The ADHD-200 dataset is a collection of resting-state functional MRI of subjects with and without attention deficit hyperactivity disorder.
- subjects: 491 controls, 195 cases diagnosed with ADHD of various types
- for each subject there are between 76 and 276 R-fMRI scans
- focus on 264 voxels as the regions of interest (ROI), extracted by Power et al. (2011)

We are interested in understanding how the structure of the neural network varies with the age of subjects (Lu, Kolar, and Liu 2017).
Estimated Brain Networks
The differences between junior, median, and senior neural networks.
- The green edges only exist in $\hat{G}(11.75)$ and the red edges only exist in $\hat{G}(8)$.
- The green edges only exist in $\hat{G}(20)$ and the red edges only exist in $\hat{G}(11.75)$.

[Figure: coronal, sagittal, and transverse views. (a) $\hat{G}(8)$ vs. $\hat{G}(11.75)$; (b) $\hat{G}(11.75)$ vs. $\hat{G}(20)$.]
Limitations
While we can estimate the networks and their difference, it is hard to quantify uncertainty in the estimates and perform statistical inference.
The goal here is to develop methodology capable of doing so.
Probabilistic Graphical Models
- Graph $G = (V, E)$ with $p$ nodes
- Random vector $X = (X_1, \ldots, X_p)^\top$

Represents conditional independence relationships between nodes. Useful for exploring associations between measured variables.

$$(a, b) \notin E \iff X_a \perp X_b \mid X_{\setminus \{a, b\}}$$

Example: $X_1 \perp X_6 \mid X_2, \ldots, X_5$

[Figure: example graph on nodes 1–6.]
Structure Learning Problem

Given an i.i.d. sample $\mathcal{D}_n = \{x_i\}_{i=1}^n$ from a distribution $P \in \mathcal{P}$, learn the set of conditional independence relationships

$$\hat{G} = \hat{G}(\mathcal{D}_n)$$

[Figure: graph on nodes 1–6 with unknown edge set.]

Some literature: Yuan and Lin (2007), O. Banerjee, El Ghaoui, and d'Aspremont (2008), Friedman, Hastie, and Tibshirani (2008), T. T. Cai, Liu, and Luo (2011), Meinshausen and Bühlmann (2006), Ravikumar, Wainwright, and Lafferty (2010), Xue, Zou, and Cai (2012), Yang et al. (2012), Yang et al. (2015), Yang et al. (2013), Yang et al. (2014), . . .
Difference Estimation
Literature Review: Gaussian Graphical Models
Penalized estimation (B. Zhang and Wang 2012; Danaher, Wang, and Witten 2014):

$$(\hat{\Omega}_x, \hat{\Omega}_y) = \arg\max_{\Omega_x \succ 0,\, \Omega_y \succ 0} \sum_{c \in \{x, y\}} \left( \log|\Omega_c| - \operatorname{tr}(\hat{\Sigma}_c \Omega_c) - \lambda \|\Omega_c\|_1 \right) - \lambda_2 \|\Omega_x - \Omega_y\|_1$$

Direct difference estimation (S. D. Zhao, Cai, and Li 2014):

$$\hat{\Delta} = \arg\min_{\Delta} \|\Delta\|_1 \quad \text{subject to} \quad \big\| \hat{\Sigma}_x \Delta \hat{\Sigma}_y - (\hat{\Sigma}_y - \hat{\Sigma}_x) \big\|_\infty \le \lambda$$
Literature Review: Inference for Graphical Models
Inference is available for single-group testing:
- Ren et al. (2015), Lu, Kolar, and Liu (2017), J. Wang and Kolar (2016), Barber and Kolar (2015), Yu, Gupta, and Kolar (2016), Chen et al. (2015), Jankova and van de Geer (2015, 2016)

Xia, Cai, and Cai (2015) combines inference results for each precision matrix based on Ren et al. (2015). We will discuss this result more later.
Inference for Differential Markov Random Fields
The Setting
Let $\mathcal{F} = \{ f(\,\cdot\,; \theta) \}$ be a parametric family of probability distributions of the form

$$f(y; \theta) = Z(\theta)^{-1} \exp\Big( \sum_{1 \le i \le j \le m} \theta_{ij}\, \psi_{ij}(y_i, y_j) \Big) = Z(\theta)^{-1} \exp\big( \theta^\top \psi(y) \big)$$

Given samples
- $\{x_i\}_{i \in [n_x]}$ from $f(\,\cdot\,; \theta_x)$, and
- $\{y_i\}_{i \in [n_y]}$ from $f(\,\cdot\,; \theta_y)$,

our goal is to estimate the change

$$\delta_\theta = \theta_x - \theta_y$$
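As a concrete illustration of the setting, here is a minimal sketch of the sufficient-statistic map $\psi$ for the pairwise family above, assuming the common Ising-type choice $\psi_{ij}(y_i, y_j) = y_i y_j$ (the slides leave $\psi_{ij}$ generic, so this particular choice is ours):

```python
import numpy as np

def pairwise_suff_stat(y):
    # Sufficient statistics psi(y) for a pairwise model with the
    # Ising-type choice psi_ij(y_i, y_j) = y_i * y_j, 1 <= i <= j <= m.
    m = len(y)
    return np.array([y[i] * y[j] for i in range(m) for j in range(i, m)])

print(pairwise_suff_stat(np.array([1.0, -1.0, 1.0])))
# -> [ 1. -1.  1.  1. -1.  1.]
```

With this choice, $\theta$ collects the pairwise interaction strengths and $\delta_\theta$ is exactly the change in the interaction structure between the two groups.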
Density Ratio Approach
Kullback-Leibler importance estimation procedure (KLIEP) (Sugiyama et al. 2008):

$$r(y; \delta_\theta) = \frac{f_x(y)}{f_y(y)} = \left( \frac{Z(\theta_x)}{Z(\theta_y)} \right)^{-1} \exp\big( (\underbrace{\theta_x - \theta_y}_{\delta_\theta})^\top \psi(y) \big)$$

$$\hat{\delta}_\theta = \arg\min_{\delta_\theta} D_{\mathrm{KL}}\big( f_x \,\big\|\, r(\,\cdot\,; \delta_\theta) f_y \big) = \arg\min_{\delta_\theta} \; -\mathbb{E}_x\big[ \delta_\theta^\top \psi(x) \big] + \log \mathbb{E}_y\big[ \exp\big( \delta_\theta^\top \psi(y) \big) \big].$$
Sparse Direct Difference Estimation
Given the data, we replace the expectations with the appropriate sample averages to form the empirical KLIEP loss

$$\ell_{\mathrm{KLIEP}}(\delta_\theta; X, Y) = -\frac{1}{n_x} \sum_{i=1}^{n_x} \delta_\theta^\top \psi(x_i) + \log\Big[ \frac{1}{n_y} \sum_{j=1}^{n_y} \exp\big( \delta_\theta^\top \psi(y_j) \big) \Big]$$

Under a suitable set of assumptions, the penalized estimator

$$\hat{\delta}_\theta = \arg\min_{\delta_\theta} \ell_{\mathrm{KLIEP}}(\delta_\theta; X, Y) + \lambda \|\delta_\theta\|_1$$

can be shown to be consistent (S. Liu et al. 2017; Fazayeli and Banerjee 2016).
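The empirical loss above translates directly into numpy; a minimal sketch, where `psi_x` and `psi_y` are matrices of precomputed sufficient statistics (the function name and the log-sum-exp stabilization are ours):

```python
import numpy as np

def kliep_loss(delta, psi_x, psi_y):
    # Empirical KLIEP loss.  psi_x (n_x, d) and psi_y (n_y, d) hold the
    # sufficient statistics psi(x_i) and psi(y_j) as rows.
    linear_term = (psi_x @ delta).mean()
    a = psi_y @ delta
    amax = a.max()                          # log-sum-exp for stability
    log_norm = amax + np.log(np.mean(np.exp(a - amax)))
    return -linear_term + log_norm

rng = np.random.default_rng(0)
psi_x = rng.normal(size=(50, 3))
psi_y = rng.normal(size=(60, 3))
print(kliep_loss(np.zeros(3), psi_x, psi_y))  # 0.0 at delta = 0
```

Note that the loss depends on the two distributions only through $\psi$, so the normalizing constants $Z(\theta_x)$, $Z(\theta_y)$ never need to be computed.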
Inference For a Low Dimensional Component
Let $\delta_\theta = (\delta_{\theta,1}, \delta_{\theta,2})$. Our goal is the construction of an asymptotically normal estimator of $\delta_{\theta,1}$.

An oracle estimator

$$\check{\delta}_{\theta,1} = \arg\min_{\delta_{\theta,1}} \ell_{\mathrm{KLIEP}}(\delta_{\theta,1}, \delta^*_{\theta,2})$$

is asymptotically normal.

What about a feasible estimator

$$\hat{\delta}_{\theta,1} = \arg\min_{\delta_{\theta,1}} \ell_{\mathrm{KLIEP}}(\delta_{\theta,1}, \hat{\delta}_{\theta,2})?$$
Performance of a Naive Estimator
Feasible Plug-in Estimator
$$0 = \partial_1 \ell_{\mathrm{KLIEP}}(\hat{\delta}_{\theta,1}, \hat{\delta}_{\theta,2}) = \partial_1 \ell_{\mathrm{KLIEP}}(\delta^*_{\theta,1}, \delta^*_{\theta,2}) + H_{11}(\hat{\delta}_{\theta,1} - \delta^*_{\theta,1}) + H_{12}(\hat{\delta}_{\theta,2} - \delta^*_{\theta,2}) + o_P(1)$$

- $H = \nabla^2 \ell_{\mathrm{KLIEP}}(\delta^*_\theta; Y)$
- $\hat{\delta}_{\theta,2}$ is a consistent estimator of $\delta^*_{\theta,2}$ with $\|\hat{\delta}_{\theta,2} - \delta^*_{\theta,2}\|_2 \lesssim_P \sqrt{s \log(p)/n}$

The problem is that the limiting distribution depends on the model selection procedure used to estimate $\delta^*_{\theta,2}$.
ω-Corrected Score Function
Consider a linear combination of the components of the score vector

$$S_\omega(\delta_{\theta,1}, \delta_{\theta,2}) = \partial_1 \ell_{\mathrm{KLIEP}}(\delta_{\theta,1}, \delta_{\theta,2}) - \omega^\top \partial_2 \ell_{\mathrm{KLIEP}}(\delta_{\theta,1}, \delta_{\theta,2})$$

Our estimator $\tilde{\delta}_{\theta,1}$ is a solution to the following estimating equation

$$0 = S_{\hat{\omega}}(\tilde{\delta}_{\theta,1}, \hat{\delta}_{\theta,2})$$
ω-Corrected Score Function
$$0 = \sqrt{n}\, S_{\hat{\omega}}(\tilde{\delta}_{\theta,1}, \hat{\delta}_{\theta,2}) = \sqrt{n}\, S_\omega(\delta^*_{\theta,1}, \delta^*_{\theta,2}) + \sqrt{n}\, \big( H_{11} - \omega^\top H_{21} \big) \big( \tilde{\delta}_{\theta,1} - \delta^*_{\theta,1} \big) + \sqrt{n}\, \big( H_{12} - \omega^\top H_{22} \big) \big( \hat{\delta}_{\theta,2} - \delta^*_{\theta,2} \big) + o_P(1)$$

In order to have $\sqrt{n}\, \big( \tilde{\delta}_{\theta,1} - \delta^*_{\theta,1} \big) \to_D N(0, \sigma^2_{\delta_{\theta,1}})$ we need:
- $\mathbb{E}\big[ S_\omega(\delta^*_{\theta,1}, \delta^*_{\theta,2}) \big] = 0$
- $\sqrt{n}\, \big( H_{12} - \omega^\top H_{22} \big) \big( \hat{\delta}_{\theta,2} - \delta^*_{\theta,2} \big)$ vanishes sufficiently fast
ω-Corrected Score Function
Idea from Ning and Liu (2017): construct $\omega$ as a solution to

$$H_{12} = \omega^\top H_{22}$$

that is, $\omega = H_{22}^{-1} H_{21}$.

The sample estimate is obtained as

$$\hat{\omega} = \arg\min_\omega \frac{1}{2} \omega^\top H(\hat{\delta}_\theta; Y)\, \omega - \omega^\top e_1 + \lambda_2 \|\omega\|_1$$

- $\hat{\omega}$ needs to converge to $\omega$ sufficiently fast
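The $\ell_1$-penalized quadratic program for $\hat{\omega}$ has a closed-form coordinate update, so it can be solved by coordinate descent. A minimal numpy sketch (the function names and the choice of coordinate descent as solver are ours, not from the talk):

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def estimate_omega(H, lam, n_iter=200):
    # Coordinate descent for min_w 0.5*w'Hw - w'e1 + lam*||w||_1,
    # a sparse approximation to the first column of H^{-1}.
    p = H.shape[0]
    e1 = np.zeros(p)
    e1[0] = 1.0
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # gradient coordinate with the w_j term removed
            r = e1[j] - (H[j] @ w - H[j, j] * w[j])
            w[j] = soft_threshold(r, lam) / H[j, j]
    return w

print(estimate_omega(np.eye(3), lam=0.1))  # [0.9 0.  0. ]
```

With `lam = 0` and an invertible `H`, this recovers the first column of $H^{-1}$ exactly; the penalty trades that exactness for sparsity.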
Our Practical Procedure
Step 1. Compute an initial estimate $\hat{\delta}_\theta$:

$$\hat{\delta}_\theta \leftarrow \arg\min_{\delta_\theta} \ell_{\mathrm{KLIEP}}(\delta_\theta; X, Y) + \lambda_1 \|\delta_\theta\|_1$$

$$\tilde{\delta}_\theta \leftarrow \arg\min_{\delta_\theta} \big\{ \ell_{\mathrm{KLIEP}}(\delta_\theta; X, Y) : \operatorname{supp}(\delta_\theta) \subseteq \operatorname{supp}(\hat{\delta}_\theta) \big\}$$

Step 2. Compute a sparse approximation $\hat{\omega}$ to the corresponding row of the inverse of the Hessian:

$$\hat{\omega} \leftarrow \arg\min_\omega \frac{1}{2} \omega^\top H(\tilde{\delta}_\theta; Y)\, \omega - \omega^\top e_1 + \lambda_2 \|\omega\|_1$$

Step 3. Re-estimate on the union of the supports from Steps 1 and 2:

$$(\check{\delta}_{\theta,1}, \check{\delta}_{\theta,2}) \leftarrow \arg\min_{\delta_\theta} \big\{ \ell_{\mathrm{KLIEP}}(\delta_\theta; X, Y) : \operatorname{supp}(\delta_\theta) \subseteq \operatorname{supp}(\hat{\delta}_\theta) \cup \operatorname{supp}(\hat{\omega}) \big\}$$
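The initial $\ell_1$-penalized KLIEP fit can be computed with a standard proximal-gradient (ISTA) loop; a minimal sketch under assumptions of our own (fixed step size, precomputed sufficient-statistic matrices, and function names that are not from the talk):

```python
import numpy as np

def kliep_grad(delta, psi_x, psi_y):
    # Gradient of the empirical KLIEP loss.
    a = psi_y @ delta
    w = np.exp(a - a.max())
    w /= w.sum()                      # softmax weights over the y-sample
    return -psi_x.mean(axis=0) + w @ psi_y

def kliep_lasso(psi_x, psi_y, lam1, step=0.1, n_iter=500):
    # Step 1 via proximal gradient (ISTA): l1-penalized KLIEP.
    # Steps 2-3 (the omega fit and the refit restricted to the union of
    # supports) reuse the same loss with the support held fixed.
    delta = np.zeros(psi_x.shape[1])
    for _ in range(n_iter):
        z = delta - step * kliep_grad(delta, psi_x, psi_y)
        delta = np.sign(z) * np.maximum(np.abs(z) - step * lam1, 0.0)
    return delta

rng = np.random.default_rng(1)
psi_x = rng.uniform(-1, 1, size=(30, 4))
psi_y = rng.uniform(-1, 1, size=(30, 4))
print(kliep_lasso(psi_x, psi_y, lam1=5.0))  # heavy penalty: exactly zero
```

The soft-thresholding step is what produces the sparse support that Steps 2 and 3 then operate on.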
Technical Conditions
High-level technical conditions:
- consistency of the initial parameter estimate
- deviation bounds for the gradient and Hessian
- local smoothness of the loss
- conditions for the central limit theorem and estimation of the variance
Technical Conditions
- $\|\psi(y)\|_\infty \le M < \infty$ with probability 1
- $\lambda_{\min}(H) \ge \underline{\lambda} > 0$
- $\omega$ is approximately sparse
- for all $\Delta \in \mathcal{E} \cap B(0, \|\delta^*_\theta\|_1)$,

$$\frac{1}{n_y^2} \sum_{1 \le j < j' \le n_y} \big\langle \psi(y_j) - \psi(y_{j'}),\, \Delta \big\rangle^2 \ge C \Big( \kappa_1 \|\Delta\|_2^2 - \kappa_2 \frac{\log p}{n_y} \|\Delta\|_1^2 \Big)$$

with probability $1 - \delta_{n_y}$
Main Result
Suppose the regularity conditions hold. Let $n = n_x \wedge n_y$ be the smaller sample size and $\rho = \lim_{n \to \infty} n / (n_x \vee n_y)$.

With appropriately chosen regularization parameters $\lambda_1, \lambda_2$, our estimator $\check{\delta}_{\theta,1}$ satisfies

$$\sqrt{n}\, \sigma^{-1} \big( \check{\delta}_{\theta,1} - \delta^*_{\theta,1} \big) \to_D N(0, 1 + \rho)$$

where

$$\sigma^2 = \big( \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \big)^{-1}.$$

Furthermore, $\sigma^2$ can be consistently estimated.
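Reading the limit off directly, a confidence interval for one coordinate follows from the normal quantile; a minimal sketch, assuming consistent plug-in estimates of $\sigma$ and approximating $\rho$ by its finite-sample ratio (function name ours):

```python
from statistics import NormalDist

def ci_one_coord(delta_hat, sigma_hat, n_x, n_y, alpha=0.05):
    # (1 - alpha) confidence interval for one coordinate, read off the
    # limit sqrt(n) * (delta_hat - delta*) / sigma -> N(0, 1 + rho),
    # with n = min(n_x, n_y) and rho approximated by n / max(n_x, n_y).
    n = min(n_x, n_y)
    rho = n / max(n_x, n_y)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sigma_hat * (1.0 + rho) ** 0.5 / n ** 0.5
    return delta_hat - half, delta_hat + half

lo, hi = ci_one_coord(0.5, sigma_hat=1.0, n_x=400, n_y=400)
print(lo, hi)  # interval centered at 0.5, half-width ~0.139
```

The extra factor $\sqrt{1 + \rho}$ widens the interval to account for noise from both samples; with very unbalanced samples ($\rho \to 0$) it disappears.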
Simulation Plot
Power Plot
Simultaneous Inference
For a potentially large I, we would like to simultaneously test
$$H_{0k} : \delta^*_{\theta,k} = \delta_{\theta,0k}, \quad k \in I \subseteq [p]$$

Or construct simultaneous confidence intervals

$$P\Big( \delta^*_{\theta,k} \in \big( \check{\delta}_{\theta,k} - \hat{\sigma}_k t_\alpha(I)/\sqrt{n},\; \check{\delta}_{\theta,k} + \hat{\sigma}_k t_\alpha(I)/\sqrt{n} \big) \;\; \forall k \in I \Big) \ge 1 - \alpha$$
Simulation Assisted Simultaneous Inference
Inference is based on the following test statistic

$$\max_{k \in I} \sqrt{n}\, \big| \check{\delta}_{\theta,k} - \delta^*_{\theta,k} \big|.$$

Approximate the distribution of the test statistic with

$$W = \max_{k \in I} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Big\langle \hat{\omega}_k,\; -\psi(x_i) + \frac{1}{n_y} \sum_{j=1}^{n_y} \psi(y_j)\, r(y_j; \delta^*_\theta) \Big\rangle\, \xi_i,$$

where $\{\xi_i\}_{i=1}^n$ are i.i.d. standard normal random variables.

The bootstrap critical value is the empirical quantile

$$t_\alpha(I) = \inf\big\{ t \in \mathbb{R} : P(W \le t \mid X, Y) \ge 1 - \alpha \big\}.$$
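Given the per-sample influence terms, the multiplier bootstrap reduces to a few lines of numpy; a minimal sketch, assuming the influence terms are precomputed into a matrix and taking absolute values to match the two-sided statistic (function name and those choices are ours):

```python
import numpy as np

def bootstrap_critical_value(scores, alpha=0.05, n_boot=2000, seed=0):
    # Multiplier bootstrap for t_alpha(I).  Row i of `scores` holds the
    # estimated influence terms <omega_k, -psi(x_i) + mean_j psi(y_j)*r(y_j)>
    # for k in I.  Draw xi ~ N(0, 1), form W, and take the empirical
    # (1 - alpha) quantile of the bootstrap draws.
    rng = np.random.default_rng(seed)
    n = scores.shape[0]
    xi = rng.standard_normal((n_boot, n))         # multipliers xi_i
    W = np.abs(xi @ scores).max(axis=1) / np.sqrt(n)
    return np.quantile(W, 1 - alpha)

rng = np.random.default_rng(2)
scores = rng.normal(size=(100, 5))                # toy influence terms
t_alpha = bootstrap_critical_value(scores)
print(round(t_alpha, 3))
```

Because the multipliers are drawn conditionally on the data, the same routine yields critical values for any index set $I$ without re-fitting the model.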
Future Work
- Lower bounds
- Can we say something about uncertainty when confounders are present? The problem is how to account for the fact that we are estimating the effect of confounders.
References

Banerjee, O., L. El Ghaoui, and A. d'Aspremont. 2008. "Model Selection Through Sparse Maximum Likelihood Estimation." J. Mach. Learn. Res. 9 (3): 485–516.

Barber, Rina Foygel, and Mladen Kolar. 2015. "ROCKET: Robust Confidence Intervals via Kendall's Tau for Transelliptical Graphical Models." arXiv:1502.07641, February.

Biswal, Bharat B., Maarten Mennes, Xi-Nian Zuo, Suril Gohel, Clare Kelly, Steve M. Smith, Christian F. Beckmann, Jonathan S. Adelstein, Randy L. Buckner, and Stan Colcombe. 2010. "Toward Discovery Science of Human Brain Function." Proceedings of the National Academy of Sciences 107 (10): 4734–9.

Cai, T. Tony, W. Liu, and X. Luo. 2011. "A Constrained ℓ1 Minimization Approach to Sparse Precision Matrix Estimation." J. Am. Stat. Assoc. 106 (494): 594–607.

Chen, Mengjie, Zhao Ren, Hongyu Zhao, and Harrison H. Zhou. 2015. "Asymptotically Normal and Efficient Estimation of Covariate-Adjusted Gaussian Graphical Model." Journal of the American Statistical Association. doi:10.1080/01621459.2015.1010039.

Danaher, P., P. Wang, and Daniela M. Witten. 2014. "The Joint Graphical Lasso for Inverse Covariance Estimation Across Multiple Classes." J. R. Stat. Soc. B 76 (2): 373–97. doi:10.1111/rssb.12033.

Fazayeli, Farideh, and Arindam Banerjee. 2016. "Generalized Direct Change Estimation in Ising Model Structure." In Proceedings of the 33rd International Conference on Machine Learning, 48:2281–90. PMLR. http://proceedings.mlr.press/v48/fazayeli16.html.

Friedman, Jerome H., Trevor J. Hastie, and Robert J. Tibshirani. 2008. "Sparse Inverse Covariance Estimation with the Graphical Lasso." Biostatistics 9 (3): 432–41.

Liu, Song, Taiji Suzuki, Raissa Relator, Jun Sese, Masashi Sugiyama, and Kenji Fukumizu. 2017. "Support Consistency of Direct Sparse-Change Learning in Markov Networks." Ann. Statist. 45 (3): 959–90. https://doi.org/10.1214/16-AOS1470.

Lu, Junwei, Mladen Kolar, and Han Liu. 2017. "Post-Regularization Inference for Dynamic Nonparanormal Graphical Models." Journal of Machine Learning Research (to appear), December.

Meinshausen, Nicolai, and Peter Bühlmann. 2006. "High-Dimensional Graphs and Variable Selection with the Lasso." Ann. Stat. 34 (3): 1436–62.

Power, Jonathan D., Alexander L. Cohen, Steven M. Nelson, Gagan S. Wig, Kelly Anne Barnes, Jessica A. Church, Alecia C. Vogel, et al. 2011. "Functional Network Organization of the Human Brain." Neuron 72 (4): 665–78.

Ravikumar, Pradeep, Martin J. Wainwright, and J. D. Lafferty. 2010. "High-Dimensional Ising Model Selection Using ℓ1-Regularized Logistic Regression." Ann. Stat. 38 (3): 1287–1319. doi:10.1214/09-AOS691.

Ren, Zhao, Tingni Sun, Cun-Hui Zhang, and Harrison H. Zhou. 2015. "Asymptotic Normality and Optimalities in Estimation of Large Gaussian Graphical Models." Ann. Stat. 43 (3): 991–1026. doi:10.1214/14-AOS1286.

Sugiyama, Masashi, Shinichi Nakajima, Hisashi Kashima, Paul V. Buenau, and Motoaki Kawanabe. 2008. "Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation." In Advances in Neural Information Processing Systems 20, 1433–40. Curran Associates, Inc.

Wang, J., and Mladen Kolar. 2016. "Inference for High-Dimensional Exponential Family Graphical Models." In Proc. of AISTATS, 51:751–60.

Xia, Yin, Tianxi Cai, and T. Tony Cai. 2015. "Testing Differential Networks with Applications to the Detection of Gene-Gene Interactions." Biometrika 102 (2): 247–66. doi:10.1093/biomet/asu074.

Xue, Lingzhou, Hui Zou, and Tianxi Cai. 2012. "Nonconcave Penalized Composite Conditional Likelihood Estimation of Sparse Ising Models." Ann. Stat. 40 (3): 1403–29. http://projecteuclid.org/euclid.aos/1344610588.

Yang, Eunho, Genevera I. Allen, Zhandong Liu, and Pradeep Ravikumar. 2012. "Graphical Models via Generalized Linear Models." In Advances in Neural Information Processing Systems 25, 1358–66. Curran Associates, Inc.

Yang, Eunho, Yulia Baker, Pradeep Ravikumar, Genevera I. Allen, and Zhandong Liu. 2014. "Mixed Graphical Models via Exponential Families." In Proc. 17th Int. Conf. Artif. Intel. Stat., 1042–50.

Yang, Eunho, Pradeep Ravikumar, Genevera I. Allen, and Zhandong Liu. 2013. "On Poisson Graphical Models." In Advances in Neural Information Processing Systems 26, 1718–26. Curran Associates, Inc.

———. 2015. "On Graphical Models via Univariate Exponential Family Distributions." J. Mach. Learn. Res. 16: 3813–47. http://jmlr.org/papers/v16/yang15a.html.

Yu, Ming, Varun Gupta, and Mladen Kolar. 2016. "Statistical Inference for Pairwise Graphical Models Using Score Matching." In Advances in Neural Information Processing Systems 29. Curran Associates, Inc.

Yuan, M., and Y. Lin. 2007. "Model Selection and Estimation in the Gaussian Graphical Model." Biometrika 94 (1): 19–35.

Zhang, Bai, and Yue Wang. 2012. "Learning Structural Changes of Gaussian Graphical Models in Controlled Experiments." arXiv:1203.3532, March. http://arxiv.org/abs/1203.3532v1.

Zhao, Sihai Dave, T. Tony Cai, and Hongzhe Li. 2014. "Direct Estimation of Differential Networks." Biometrika 101 (2): 253–68. doi:10.1093/biomet/asu009.