[Figure panels a-d; columns: spleen, lung, peritoneal and microglia macrophages]
Supplementary Figure 1. Diversity of key genes among macrophages. The expression profiles of selected (a) chemokine receptors, (b) Toll-like receptors, (c) C-type lectin domain family members and (d) efferocytic receptors in red pulp, lung, brain (microglia) and peritoneal macrophages are displayed as normalized signal intensities.
Nature Immunology doi:10.1038/ni.2416
[Heat map; color scale: expression, -3 to 3]
Supplementary Figure 2. Expression profiles of uniquely downregulated genes in macrophages from different organs. The heat map and gene list show genes that are downregulated by at least 5-fold in only a single macrophage population.
[Figure: MR1 staining histograms (x axis: MR1) for macrophages (MΦ), CD8+/CD103+ DCs and CD11b+ DCs from spleen and lung]
Supplementary Figure 3. Evaluation of Mr1 staining in macrophages and DCs from the spleen and lung. Using a new mAb to Mr1 as described (Chua WJ, J. Immunol. 186, 4744-4750, 2011), we compared macrophages and conventional DCs in the spleen and lung for Mr1 expression. Expression was absent on DCs but present on macrophages, although, as expected, it was difficult to detect (Chua WJ, J. Immunol. 186, 4744-4750, 2011). Staining specificity was assessed by staining macrophages from Mr1-knockout mice with the anti-Mr1 mAb (blue profiles). Data are representative of two independent experiments.
Supplementary Figure 4. Intracellular FACS staining for CD68 in red pulp macrophages versus CD4+ and CD8+ spleen DCs. Splenocytes were permeabilized and stained with isotype control mAb (filled histograms) or anti-CD68 mAb (open histograms).
[Figure: resting peritoneum and day 5 after thioglycollate; F4/80 (x axis) versus CD115, CD36, CD11c and MHC-II]
Supplementary Figure 5. Phenotype of resting peritoneal cells and of cells elicited five days after intraperitoneal thioglycollate administration. Staining for F4/80, CD115, CD11c and MHC-II reveals that monocyte-derived macrophages infiltrating the peritoneal cavity in response to thioglycollate upregulate CD11c on their surface, with a substantial subpopulation that also expresses MHC-II. The box in the top panels shows the gate that was applied before the data were plotted in the panels below.
Supplementary Figure 6. Lung macrophages are not contaminated with eosinophils. Eosinophils and lung resident macrophages can be discriminated by their level of CD11c expression: a blue gate is shown around macrophages, and a red gate delineates eosinophils. Accordingly, when projected on a FACS plot of CD11b versus Siglec-F, eosinophils are CD11b+ Siglec-F+ (red population), whereas macrophages are CD11b-/lo Siglec-F+ (blue population). In addition, lung macrophages gated as MERTK+ CD64+ cells are devoid of contaminating eosinophils. Eosinophils are shown in red, and macrophages gated using the classical strategy shown above (CD11c+ Siglec-F+) are shown in blue.
Ingenuity canonical pathway | p-value | Molecules

Microglia
[...] | 0.0005 | SLC2A1, LDHB, MMP2, SLC2A5, EDN1, MMP14, VHL
Glioma Invasiveness Signaling | 0.0078 | PLAU, ITGB5, RHOH, MMP2
mTOR Signaling | 0.0138 | PPM1L, PRKAB1, RPTOR, RHOH, RPS6KA1, PRR5
Leukocyte Extravasation Signaling | 0.0145 | SIPA1, PLCG1, RHOH, MMP2, JAM3, MMP14, SELPLG
Communication between Innate and Adaptive Immune Cells | 0.0148 | CCL4, TNFRSF17, CCL3L1/CCL3L3, TLR9
Macropinocytosis Signaling | 0.0148 | PLCG1, ARF6, PDGFB, ITGB5
Crosstalk between Dendritic Cells and Natural Killer Cells | 0.0214 | PVRL2, FSCN1, TLR9, TREM2
Cysteine Metabolism | 0.0282 | LDHB, CHST11, CHST7
TREM1 Signaling | 0.0331 | CCL2, PLCG1, TLR9
Dendritic Cell Maturation | 0.0398 | HLA-DOB, FSCN1, HLA-DOA, TLR9, TREM2

Lung macrophages
Mitotic Roles of Polo-Like Kinase | 0.0065 | KIF23, CDC25B, KIF11, PPP2R1B, PRC1, CDC25A
Glycerolipid Metabolism | 0.0078 | LPIN1, LIPF, GLA, MGLL, GK, LPL, AKR1B1, DGAT2
Leukocyte Extravasation Signaling | 0.0079 | MMP19, ACTG1, CXCR4, SPN, MMP8, PRKCH, MMP12, BMX, CTNNB1, CLDN1, EZR, ITGAL
LPS/IL-1 Mediated Inhibition of RXR Function | 0.0105 | CPT1A, GSTM5, Gstm3, MGST3, ABCG1, IL1RL2, ALAS1, ACSL1, RARA, FABP1, HMGCS1, ACOX1
Fatty Acid Elongation in Mitochondria | 0.0135 | HSD17B4, EHHADH, Acaa1b
Cell Cycle Regulation by BTG Family Proteins | 0.0182 | PPP2R1B, CCNE1, CCNE2, CCRN4L
Cell Cycle: G1/S Checkpoint Regulation | 0.0209 | CDK6, CCNE1, CCNE2, CDC25A, RBL1
Sphingolipid Metabolism | 0.0224 | SGMS2, LPIN1, SPTLC2, GLA, NAAA, SULF2
Cyclins and Cell Cycle Regulation | 0.0251 | PPP2R1B, CDK6, CCNE1, CCNE2, CDC25A, CCNA2
Integrin Signaling | 0.0275 | ACTG1, NEDD9, ITGA5, ARHGAP26, CAPN2, ITGAX, GRB7, CAPN1, RHOF, BCAR3, ITGAL
p38 MAPK Signaling | 0.0324 | RPS6KA5, MAP3K5, IL1RL2, MAP4K1, MAPKAPK3, CREB5, IL1RN
Biosynthesis of Steroids | 0.0380 | FDFT1, SQLE, IDI1
Cell Cycle Control of Chromosomal Replication | 0.0417 | MCM4, CDK6, DBF4
FAK Signaling | 0.0447 | ACTG1, ITGA5, ARHGAP26, CAPN2, TNS1, CAPN1
Aryl Hydrocarbon Receptor Signaling | 0.0457 | GSTM5, Gstm3, MGST3, CDK6, CCNE1, CCNE2, RARA, CCNA2
cAMP-mediated signaling | 0.0468 | AKAP13, PTGER2, CXCR2, CAMK2G, CNR2, FPR2, CREB5, PRKAR2B, FPR1, P2RY14, AKAP5

Peritoneal macrophages (F4/80hi)
Eicosanoid Signaling | 0.0005 | PTGIR, DPEP2, PTGER4, ALOX15, PRDX6, PTGIS, PTGES
LXR/RXR Activation | 0.0011 | MSR1, APOE, ACACA, MMP9, APOC2, LBP, PLTP
IL-12 Signaling and Production in Macrophages | 0.0012 | TGFB2, IKBKE, CD40, ALOX15, AKT3, PRKD3, MST1R, STAT4
Acute Phase Response Signaling | 0.0030 | IKBKE, SAA1, FN1, CP, RRAS, AKT3, C4A/C4B, HP, LBP, CFB
N-Glycan Biosynthesis | 0.0062 | MAN1A1, FUT8, RPN2, ARSG, DAD1
TR/RXR Activation | 0.0105 | KLF9, ENO1, F10, ACACA, AKT3, HP
Virus Entry via Endocytic Pathways | 0.0117 | ITGA6, DNM1, RRAS, FLNB, PRKD3, ITGB7
Glycolysis/Gluconeogenesis | 0.0138 | ALDH2, ENO1, ALDH1A2, HK1, PFKL, PDHA1
Riboflavin Metabolism | 0.0191 | ENPP5, ACPP, RFK
Human Embryonic Stem Cell Pluripotency | 0.0200 | TGFB2, FZD1, WNT2, S1PR1, AKT3, S1PR5, FGFR1
[...] | 0.0214 | IKBKE, ITGA6, RRAS, AKT3, PRKD3
Aminosugars Metabolism | 0.0251 | PDE2A, CMAH, ALOX15, HK1, UAP1
PTEN Signaling | 0.0251 | TGFBR3, IKBKE, RRAS, CCND1, AKT3, FGFR1
N-Glycan Degradation | 0.0257 | MAN1A1, GLB1, ENGASE
RAR Activation | 0.0257 | TGFB2, CYP26A1, ALDH1A2, RARB, ZBTB16, AKT3, PRKD3, RARG
Aryl Hydrocarbon Receptor Signaling | 0.0269 | TGFB2, NQO2, NFIA, ALDH1A2, RARB, CCND1, RARG
[...] | 0.0275 | ALDH2, ALDH1A2, DPYSL3, HIBCH
[...] | 0.0282 | TGFB2, TGFBR3, FZD1, RARB, WNT2, CCND1, AKT3, RARG
Histidine Metabolism | 0.0295 | ALDH2, HAL, ALDH1A2, HDC
Glycosphingolipid Biosynthesis | 0.0302 | ST3GAL4, GLB1, ST3GAL5
Complement System | 0.0302 | CFH, C4A/C4B, CFB
Inhibition of Angiogenesis by TSP1 | 0.0355 | THBS1, MMP9, AKT3
P2Y Purigenic Receptor Signaling Pathway | 0.0380 | P2RY1, PLCB4, RRAS, AKT3, PRKD3, GNG12
Coagulation System | 0.0380 | F10, F5, F13A1
Butanoate Metabolism | 0.0407 | ALDH2, ALDH1A2, PRDX6, PDHA1
G-Protein Coupled Receptor Signaling | 0.0417 | PDE2A, RGS18, PTGER4, P2RY1, RRAS, S1PR1, AKT3, CMKLR1, PTGIR, IKBKE, FZD1, EDNRB, GPRC5B, CXCR7, PLCB4, HTR2A, S1PR5
Glycosaminoglycan Degradation | 0.0479 | HPSE, GLB1, ALOX15

Spleen red pulp macrophages
Interferon Signaling | 0.0004 | IRF1, STAT1, MX1, STAT2, IFIT3
Communication between Innate and Adaptive Immune Cells | 0.0006 | Tlr11, CD86, IL15, HLA-DRB1, TLR1, CD4, IL1B
Dendritic Cell Maturation | 0.0007 | HLA-DQB1, CD86, IL15, STAT1, HLA-DRB1, MAPK8, STAT2, HLA-DQA1, CD1D, IL1B
Primary Immunodeficiency Signaling | 0.0013 | CIITA, CD4, ICOS, DCLRE1C, ADA
Graft-versus-Host Disease Signaling | 0.0014 | HLA-DQB1, CD86, HLA-DRB1, HLA-DQA1, IL1B
IL-15 Production | 0.0020 | IRF1, IL15, STAT1, PTK2
LPS/IL-1 Mediated Inhibition of RXR Function | 0.0029 | ABCC3, GSTA4, HS3ST2, CHST15, IL1R1, MAPK8, UST, ACSL3, NDST1, NR1H3, IL1B
Chondroitin Sulfate Biosynthesis | 0.0030 | HS3ST2, CHST15, UST, NDST1, DSE
Antigen Presentation Pathway | 0.0032 | CIITA, HLA-DRB1, CD74, HLA-DQA1
Cysteine Metabolism | 0.0034 | GOT1, HS3ST2, CHST15, UST, NDST1
Keratan Sulfate Biosynthesis | 0.0037 | HS3ST2, CHST15, WDFY3, UST, NDST1
Activation of IRF by Cytosolic Pattern Recognition Receptors | 0.0048 | IFIH1, STAT1, MAPK8, STAT2, IRF7
Glycerolipid Metabolism | 0.0051 | AOAH, Akr1b7, DGKI, ADHFE1, MOGAT1, PPAP2B, PPAP2A
Role of JAK2 in Hormone-like Cytokine Signaling | 0.0058 | SOCS5, STAT1, EPOR, HLTF
Role of Pattern Recognition Receptors in Recognition of Bacteria and Viruses | 0.0063 | Tlr11, IFIH1, NOD1, TLR1, IRF7, IL1B
Phospholipid Degradation | 0.0107 | GDPD1, DGKI, PLCL1, HMOX1, PPAP2B, PPAP2A
Cdc42 Signaling | 0.0195 | HLA-DQB1, EXOC6, HLA-DRB1, MAPK8, VAV2, HLA-DQA1, H2-T24
Xenobiotic Metabolism Signaling | 0.0200 | ABCC3, GSTA4, HS3ST2, CHST15, MAPK8, UST, AHR, HMOX1, NDST1, MAF, IL1B
Glycerophospholipid Metabolism | 0.0209 | GDPD1, GOT1, DGKI, PLCL1, HMOX1, PPAP2B, PPAP2A
Sphingolipid Metabolism | 0.0229 | ARSI, UGCG, GALC, PPAP2B, PPAP2A
TREM1 Signaling | 0.0229 | Tlr11, CD86, TLR1, IL1B
Complement System | 0.0302 | CD55, C6, C2
FXR/RXR Activation | 0.0372 | CYP27A1, MAPK8, ABCB4, NR1H3, IL1B
[...] | 0.0457 | FYB, FYN, VAV2, HMOX1, PTEN
Leukocyte Extravasation Signaling | 0.0457 | DLC1, VCAM1, PECAM1, MAPK8, MMP13, VAV2, MMP27, PTK2

Supplementary Table 1. Pathway analysis of the gene expression profiles that distinguish the different macrophage populations.
Supplementary Table 4. Analysis of modules significantly enriched in macrophage-associated genes
Macrophage populations | Fine module | Hypergeometric p-value | Overlap size | Overlap genes
All 4 macrophage populations | 112 | 3.21E-05 | 4 | CD14; CTSL; SEPP1; TMEM195
All 4 macrophage populations | 125 | 1.80E-04 | 3 | COMT1; PLOD1; TCN2
All 4 macrophage populations | 130 | 1.12E-06 | 4 | TLR4; TMEM77; TOM1; TPP1
All 4 macrophage populations | 161 | 4.18E-12 | 6 | A930039A15RIK; CAMK1; GLUL; MYO7A; PLA2G15; PON3
All w/o Peritoneum | 165 | 4.26E-05 | 3 | GPR77; IL1A; TMEM86A
All w/o Lung | 168 | 1.26E-05 | 3 | C1QA; C1QB; C1QC
All w/o Microglia | 132 | 5.29E-11 | 7 | C130050O18RIK; FCGR4; HGF; PILRA; PILRB1; PILRB2; TLR8
All w/o Microglia | 165 | 1.98E-04 | 3 | LPL; MITF; SNX24
All w/o Spleen | 122 | 8.73E-08 | 4 | CEBPB; DHRS3; PLOD3; PROS1
[Peritoneum and Lung / Peritoneum and Spleen] | 165 | 1.02E-05 | 4 | 2810405K02RIK; GM4951; GM5970; IGF1
[Peritoneum and Lung / Peritoneum and Spleen] | 295 | 7.60E-05 | 3 | ASPA; CD5L; FCNA
[Peritoneum and Lung / Peritoneum and Spleen] | 122 | 3.59E-05 | 4 | CEBPB; DRAM1; DUSP3; FN1
[Peritoneum and Lung / Peritoneum and Spleen] | 164 | 4.63E-06 | 4 | CLEC4E; F10; GDA; PLCB1
[Peritoneum and Lung / Peritoneum and Spleen] | 166 | 6.05E-10 | 6 | ALOX5; ATG7; G6PDX; PGD; PRDX5; SEPX1
[Peritoneum and Lung / Peritoneum and Spleen] | 188 | 2.02E-04 | 3 | CAV1; FZD4; PDK4
Lung and Spleen | 133 | 3.37E-07 | 5 | CLEC4A3; EAR1; EAR10; GM5150; SIGLEC1
Peritoneum and Microglia | x | x | x | x
Lung and Microglia | 168 | 4.19E-07 | 4 | HPGDS; P2RY12; SLC40A1; SLC7A8
Microglia and Spleen | 128 | 5.14E-06 | 3 | ANG; SERPINE1; X99384
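The hypergeometric p-values in Supplementary Table 4 quantify how unlikely it is to observe at least the given overlap between a fine module and a macrophage gene signature by chance. The sketch below computes the standard upper-tail hypergeometric probability; the function name is hypothetical, and the gene-universe and signature sizes used for the table are not stated here, so the numbers in the usage line are placeholders only.

```python
from math import comb


def hypergeom_upper_tail(k: int, universe: int, module_size: int, signature_size: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(universe, module_size, signature_size).

    universe: number of genes considered (hypothetical input)
    module_size: number of genes in the fine module
    signature_size: number of genes in the macrophage signature
    k: observed overlap between module and signature
    """
    upper = min(module_size, signature_size)
    # Sum the probabilities of every overlap at least as large as the observed one.
    hits = sum(
        comb(module_size, i) * comb(universe - module_size, signature_size - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(universe, signature_size)


if __name__ == "__main__":
    # Placeholder sizes, for illustration only.
    print(hypergeom_upper_tail(4, 7965, 20, 300))
```

Python's exact integer arithmetic in `math.comb` avoids overflow for realistic gene counts; a statistics library would normally be used in practice.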
Gene, name | Function/other information
Akr1b10, aldo-keto reductase family 1, member B10 | Regulates the ubiquitin-dependent degradation of acetyl-CoA carboxylase α; key role in regulating phospholipid composition in cells, reactive oxygen species and cell survival.
Blvrb, biliverdin IX beta reductase | Converts biliverdin to bilirubin.
Camk1, calcium/calmodulin-dependent protein kinase I | Major signaling intermediate.
Glul, glutamate-ammonia ligase (glutamine synthetase) | Catalyzes the ATP-dependent synthesis of glutamine from glutamate and ammonia.
Myo7a, myosin VIIA | Intracellular trafficking of vesicles to the lysosome (mutations cause Usher syndrome).
Nln, neurolysin (also called oligopeptidase M) | Zinc metalloendopeptidase, structurally related to angiotensin-converting enzyme.
Pcyox1, prenylcysteine oxidase 1 | Catabolism of prenylcysteines.
Pla2g15, group XV phospholipase A2 | Lysosomal phospholipase A2; regulates phospholipid content/distribution.
Pon3, paraoxonase 3 | Paraoxonases have lactonase activity and serve as antioxidants. PON3 is mainly found on HDL.
Slc48a1, solute carrier family 48 (heme transporter), member 1 | Heme transporter that regulates intracellular heme availability/degradation through the endosomal or lysosomal compartment.
A930039A15Rik | Unknown.
Supplementary Table 5. Function and other information on the 11 genes that comprise module 161.
Supplementary Table 6. Fine modules and predicted regulators of specific macrophage populations in different organs.
Tissue macrophage population | Fine module # | Predicted regulators
Spleen red pulp | 330 | SpiC
Lung | 296 | PPARγ
Peritoneal | 295, 111, 112 | Gata6
Microglia | 194, 314 | MafB, ZFHX3, ZFP715, Bhlhe41
Supplementary Note 1 Ontogenet Algorithm Description
Dataset
Mouse gene expression was measured on Affymetrix Mouse Gene 1.0 ST arrays. Clustering was performed on the ImmGen release of September 2010, comprising 802 samples that represent 244 hematopoietic cell types (1-3 replicates per cell type). Ontogenet was applied to the ImmGen release of March 2011, which also includes 802 samples representing 244 hematopoietic cell types; however, Ontogenet was applied only to the data of the 676 samples (195 hematopoietic cell types) that were connected to the hematopoietic tree. Affymetrix annotation version 31 was used.
Data preprocessing
Expression data were normalized by RMA as part of the ImmGen pipeline and log2-transformed. For gene symbols with more than one probeset on the array, only the probeset with the highest mean expression was retained. Of these, only probesets with a standard deviation greater than 0.5 across the entire dataset were used for clustering, resulting in 7,965 unique differentially expressed genes.
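The two preprocessing filters above (one probeset per gene symbol, chosen by highest mean; then a standard-deviation cutoff of 0.5) can be sketched as follows. This is a minimal illustration, not the ImmGen pipeline: the input layout and function name are hypothetical, the values are assumed to be already RMA-normalized and log2-transformed, and the use of the sample (n - 1) standard deviation is an assumption because the note does not specify it.

```python
from statistics import mean, stdev


def preprocess(probesets, sd_cutoff=0.5):
    """Collapse probesets to one per gene symbol, then keep variable genes.

    probesets: iterable of (probeset_id, gene_symbol, values), where values are
    log2 expression levels across all samples (hypothetical input layout).
    Returns {gene_symbol: (probeset_id, values)} for the retained genes.
    """
    # For each symbol, keep the probeset with the highest mean expression.
    best = {}
    for probeset_id, symbol, values in probesets:
        m = mean(values)
        if symbol not in best or m > best[symbol][1]:
            best[symbol] = (probeset_id, m, values)

    # Keep genes whose standard deviation across the dataset exceeds the cutoff.
    # The sample (n - 1) standard deviation is an assumption.
    return {
        symbol: (probeset_id, values)
        for symbol, (probeset_id, _, values) in best.items()
        if stdev(values) > sd_cutoff
    }
```

For example, given two probesets for a hypothetical `geneA` (means 9.5 and 5.0) and one nearly constant probeset for `geneB`, only the higher-mean `geneA` probeset survives both filters.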
Definition of modules
Modules were defined by clustering. Clustering was performed by Super Paramagnetic Clustering [Blatt et al. 1996]¹ with default parameters, resulting in 80 stable clusters. The remaining unclustered genes were grouped into a separate cluster. Together, these clusters are referred to as coarse modules C1-C81.
Each coarse module was further partitioned into fine modules by hierarchical clustering, resulting in 334 fine modules, referred to in the text as F1-F334. On average, 3.9 fine modules were nested in a single coarse module. The smallest number of fine modules nested in a coarse module was 1 (23 coarse modules), and the largest was 11 (7 coarse modules).
Choice of candidate regulators
Candidate regulators were curated from the following sources: (1) the mouse orthologs of all genes that were used as candidate regulators in a previous study of human hematopoiesis [Novershtern et al. 2011]²; (2) genes annotated with the Gene Ontology term 'transcription factor activity' in mouse, human or rat; (3) genes with a known DNA-binding motif in the TRANSFAC matrix database³ (v8.3), JASPAR⁴ (version 2008) or experimentally determined PWMs⁵,⁶; and (4) genes with published ChIP-seq or ChIP-chip data (Supplementary Table 7). Regulators that were not measured on the array, or whose expression did not vary sufficiently to be included in the clustering (standard deviation < 0.5 across the entire dataset), were removed. The exception was regulators that did not meet the 0.5 variation cutoff but were highly correlated (>0.85) with another regulator that passed it; these were retained. This resulted in 578 candidate regulators (Supplementary Table 8).
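The regulator filter described above keeps regulators with a standard deviation of at least 0.5 and rescues low-variance regulators that correlate above 0.85 with a regulator that passed. A minimal sketch follows; the function names and input layout are hypothetical, regulators not measured on the array are assumed to be absent from the input, and "correlation" is assumed to mean positive Pearson correlation (the note does not say whether anticorrelation counts).

```python
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_regulators(profiles, sd_cutoff=0.5, corr_cutoff=0.85):
    """profiles: {regulator: log2 expression across samples} (measured regulators only)."""
    # Regulators that vary enough on their own.
    passing = {r for r, v in profiles.items() if stdev(v) >= sd_cutoff}
    # Low-variance regulators rescued by a strong correlation with a passing one.
    rescued = {
        r
        for r, v in profiles.items()
        if r not in passing
        and any(pearson(v, profiles[p]) > corr_cutoff for p in passing)
    }
    return passing | rescued
```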
Module regulatory program
Ontogenet takes as input (1) gene expression profiles across many different cell types; (2) a partitioning of the genes into modules (the coarse and fine clusters described above); (3) a predefined set of candidate regulators; and (4) an ontogeny tree relating the cell types. It constructs a regulatory program for each module, consisting of a combination of regulators and their 'regulatory weights' in each cell type. Each regulatory program aims to explain as much of the gene expression variance in the module as possible, while remaining simple and consistent across related cell types in the ontogeny.
More formally, we assume that the expression of a gene in a module can be modeled as a linear combination of the expression of the regulators. We denote the activity of regulator r in cell type t as $x_{r,t}$. We model the expression of gene g, a member of module m, in cell type t as $e_{g,t} = \sum_r w_{m,r,t}\, x_{r,t} + \epsilon_{m,t}$, where each $\epsilon_{m,t}$ is a Gaussian random variable with zero mean and variance $\sigma^2_{m,t}$ specific to the combination of module m and cell type t. Hence the regulatory program learned by Ontogenet is represented by weights $w_{m,r,t}$ specific to each combination of module, regulator and cell type. We note that, owing to parameter tying, the effective number of parameters is substantially smaller than the nominal size of the regulatory program representation, (# modules) × (# regulators) × (# cell types).
Module cell-type specific variance estimation
The module variance $\sigma^2_{m,t}$ in a given cell type t is estimated from the expression of the module members across all replicates of that cell type. We use an unbiased estimator, with special handling for modules with fewer than 10 members: for these modules, the variance is computed with a pooled variance estimator across modules with more than 10 members, still specific to the cell type. We note that the estimated variances of a fine module are typically smaller than those of its parent coarse module.
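A minimal sketch of the variance estimation for one cell type follows. It assumes, as one reading of the text, that each module's estimate is the unbiased sample variance of all member values across replicates, and that the pooled estimate is the standard degrees-of-freedom-weighted average $\sum_i (n_i - 1)s_i^2 / \sum_i (n_i - 1)$ over modules with more than 10 members. Modules with exactly 10 members keep their own estimate but are not pooled, following the wording literally; the function name and input layout are hypothetical.

```python
from statistics import variance


def module_variances(modules, min_members=10):
    """Estimate a per-module variance for ONE cell type.

    modules: {module_id: [[replicate values for gene 1], [gene 2], ...]}
    Modules with fewer than `min_members` genes receive a pooled estimate
    computed from modules with more than `min_members` genes.
    """
    estimates = {}
    pooled_ss, pooled_dof = 0.0, 0
    for module_id, genes in modules.items():
        values = [v for gene in genes for v in gene]
        if len(genes) >= min_members:
            # Unbiased (n - 1) sample variance of all member values.
            estimates[module_id] = variance(values)
        if len(genes) > min_members:
            dof = len(values) - 1
            pooled_ss += dof * estimates[module_id]
            pooled_dof += dof
    small = [m for m, genes in modules.items() if len(genes) < min_members]
    if small and pooled_dof == 0:
        raise ValueError("no module is large enough to pool from")
    for module_id in small:
        estimates[module_id] = pooled_ss / pooled_dof
    return estimates
```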
Regulatory program fitting as a penalized regression problem
Estimation of the weights $w_{m,r,t}$ takes the form of a regression problem, but because the problem is overparameterized we need to regularize it, giving rise to a penalized regression problem of the form

$$\min_{w}\ \sum_{g \in m}\sum_{t} \frac{1}{\sigma^2_{m,t}}\Big(e_{g,t} - \sum_r w_{m,r,t}\, x_{r,t}\Big)^2 + P(w),$$

where $P(w)$ is a chosen penalty. In our case, this penalty is composed of two parts: one promoting sparsity and the selection of correlated predictors, and another promoting consistency of regulatory programs between related cell types.
We assume that only a small number of regulators actively regulate any one module. A standard approach to promoting such sparsity in regression problems is to introduce an L1 penalty, the sum of absolute values $\lambda_1 \sum_m \sum_r \sum_t |w_{m,r,t}|$. However, this penalty tends to be overly aggressive in inducing sparsity: from a group of highly correlated predictors it typically retains only one, even though all of them may be biologically relevant owing to 'redundancy' in densely interconnected regulatory circuits. This behavior can be counteracted by adding squared terms, $\lambda_2 \sum_m \sum_r \sum_t (w_{m,r,t})^2$, yielding a composite penalty known as the elastic net [Zou and Hastie, 2005]⁷,

$$\lambda_1 \sum_m \sum_r \sum_t |w_{m,r,t}| + \lambda_2 \sum_m \sum_r \sum_t (w_{m,r,t})^2,$$

which we write compactly as $\lambda_1\|w\|_1 + \lambda_2\|w\|_2^2$.
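To illustrate why the squared term matters for correlated predictors, the sketch below minimizes $\|Aw - y\|_2^2 + \lambda_1\|w\|_1 + \lambda_2\|w\|_2^2$ for a toy problem. Note that this is not the note's solver (the note uses a primal-dual interior-point method); it swaps in cyclic coordinate descent with soft-thresholding, a common elastic-net algorithm, purely for illustration. The function names are hypothetical.

```python
def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z toward zero by gamma (the proximal map of gamma * |.|)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(A, y, lam1, lam2, n_iter=500):
    """Minimize ||A w - y||^2 + lam1 * ||w||_1 + lam2 * ||w||^2 by coordinate descent.

    A is a list of rows (n x p) and y a list of length n. Assumes A has no
    all-zero column when lam2 == 0 (otherwise the update divides by zero).
    """
    n, p = len(A), len(A[0])
    w = [0.0] * p
    col_sq = [sum(A[i][j] ** 2 for i in range(n)) for j in range(p)]
    resid = list(y)  # y - A w, with w = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that excludes coordinate j.
            rho = sum(A[i][j] * (resid[i] + A[i][j] * w[j]) for i in range(n))
            new = soft_threshold(rho, lam1 / 2) / (col_sq[j] + lam2)
            delta = new - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= A[i][j] * delta
                w[j] = new
    return w
```

With two identical predictor columns, e.g. `A = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]` and `y = [1.0, 2.0, 3.0]`, the pure L1 fit (`lam2=0.0`) puts all the weight on the first column, whereas adding `lam2=1.0` splits it equally between the two, which is the grouping behavior that motivates the elastic net.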
An important input to our regulatory program fitting procedure is the ontogeny (differentiation) tree. This tree is encoded as an edge list $\mathcal{E}$, and $(p, t) \in \mathcal{E}$ denotes that cell type p is the parent of cell type t. The similarity of the regulatory programs of a particular module m in two related cell types $(p, t) \in \mathcal{E}$ can be assessed as the sum of the absolute differences of the regulatory weights, $\sum_r |w_{m,r,t} - w_{m,r,p}|$. The key observation is that $|w_{m,r,t} - w_{m,r,p}|$ is 0 if the regulatory relationship between regulator r and module m is the same in cell type t and its parent p. More generally, the total difference of the regulatory programs can be written as $\sum_r \sum_{(p,t) \in \mathcal{E}} |w_{m,r,t} - w_{m,r,p}|$. We write this term compactly as $\|Dw\|_1$, where w is the vector of weights for all regulators across all cell types concatenated together and D is a matrix of size (RE) × (RT), where R is the number of regulators, T is the number of cell types and E is the number of edges in the tree. Multiplication by D computes the differences between the relevant entries of w.
The less the regulatory programs change throughout differentiation, the smaller the term $\|Dw\|_1$. Using this term as a penalty therefore promotes the preservation of a consistent regulatory program throughout differentiation.
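The difference matrix D can be built directly from the edge list: each (edge, regulator) pair contributes one row with +1 at the child's weight and -1 at the parent's weight, giving the (RE) × (RT) shape stated above. The sketch below assumes a regulator-major flattening of w (index r·T + t); that layout and the function names are hypothetical choices, not taken from the Ontogenet code.

```python
def difference_matrix(edges, cell_types, n_regulators):
    """Build D of size (R*E) x (R*T) so that (D w) lists w[r, child] - w[r, parent].

    w is assumed to be flattened regulator-major: index r * T + t.
    edges: list of (parent, child) cell-type names.
    """
    col = {t: i for i, t in enumerate(cell_types)}
    T = len(cell_types)
    D = []
    for parent, child in edges:
        for r in range(n_regulators):
            row = [0.0] * (n_regulators * T)
            row[r * T + col[child]] = 1.0
            row[r * T + col[parent]] = -1.0
            D.append(row)
    return D


def tree_penalty(D, w):
    """||D w||_1: total absolute change of regulatory weights along tree edges."""
    return sum(abs(sum(d * x for d, x in zip(row, w))) for row in D)
```

For a hypothetical three-node chain root → mid → leaf with two regulators and `w = [1, 1, 3, 0, 2, 2]`, only two weight changes occur (regulator 0 at mid → leaf, regulator 1 at root → mid), each of size 2, so the penalty is 4.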
Combining all the considerations above, the complete objective for fitting the regulatory program of a module is

$$\sum_{g \in m}\sum_{t} \frac{1}{\sigma^2_{m,t}}\Big(e_{g,t} - \sum_r w_{m,r,t}\, x_{r,t}\Big)^2 + \lambda_1\|w\|_1 + \lambda_2\|w\|_2^2 + \lambda_3\|Dw\|_1,$$

where $\sigma^2_{m,t}$ is the module- and cell-type-specific variance estimated above.
Optimization of this objective is complicated by the fact that the absolute value is a non-smooth function, so direct optimization by methods such as gradient descent is not feasible. Alternative methods, such as projected gradients, are possible, but their convergence is relatively slow; we therefore opted to use a primal-dual interior-point method [Boyd and Vandenberghe, 2004]⁸.
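To simplify the discussion of the optimization, we introduce a sparse predictor matrix A of size T × (RT), with entries $A_{t,(r,t')} = x_{r,t}/\sigma_{m,t}$ if $t' = t$ and 0 otherwise. Further, we note that the optimal w depends only on the mean of the module's genes, so we can introduce the variable $y_t = \frac{1}{\sigma_{m,t}} \cdot \frac{1}{|m|}\sum_{g \in m} e_{g,t}$. Hence, up to an additive constant and after dividing the objective by |m| (which rescales $\lambda_1$, $\lambda_2$ and $\lambda_3$), we can rewrite the objective as

$$\|Aw - y\|_2^2 + \lambda_1\|w\|_1 + \lambda_2\|w\|_2^2 + \lambda_3\|Dw\|_1.$$

Finally, we can absorb the term $\lambda_2\|w\|_2^2$ into the first term as follows:

$$\left\| \begin{bmatrix} A \\ \sqrt{\lambda_2}\, I \end{bmatrix} w - \begin{bmatrix} y \\ 0 \end{bmatrix} \right\|_2^2 + \lambda_1\|w\|_1 + \lambda_3\|Dw\|_1.$$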
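The absorption step rests on the identity $\|[A; \sqrt{\lambda_2} I]\,w - [y; 0]\|_2^2 = \|Aw - y\|_2^2 + \lambda_2\|w\|_2^2$, since the appended rows contribute $\lambda_2 w_i^2$ each. The sketch below builds a per-cell-type predictor matrix and checks the identity numerically. The 1/σ scaling of A, the regulator-major column layout (index r·T + t) and all function names are assumptions made for illustration.

```python
import math


def predictor_matrix(x, sigma):
    """A of size T x (R*T): row t holds x[r][t] / sigma[t] at column r*T + t.

    x[r][t]: activity of regulator r in cell type t; sigma[t]: module standard
    deviation in cell type t. Scaling and column layout are assumptions.
    """
    R, T = len(x), len(x[0])
    A = [[0.0] * (R * T) for _ in range(T)]
    for t in range(T):
        for r in range(R):
            A[t][r * T + t] = x[r][t] / sigma[t]
    return A


def augment_ridge(A, y, lam2):
    """Stack sqrt(lam2) * I under A and zeros under y."""
    n = len(A[0])
    s = math.sqrt(lam2)
    identity_rows = [[s if i == j else 0.0 for j in range(n)] for i in range(n)]
    return [row[:] for row in A] + identity_rows, list(y) + [0.0] * n


def sq_residual(A, w, y):
    """||A w - y||_2^2."""
    return sum((sum(a * b for a, b in zip(row, w)) - yi) ** 2 for row, yi in zip(A, y))
```

With the augmented system, any least-squares-plus-L1 solver can handle the ridge term without special treatment, which is the point of the rewriting.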
Regulatory program transfer between coarse and fine modules
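The fine modules are encouraged to have a regulatory program similar to that of the coarse module in which they are nested. This is accomplished by introducing an additional penalty term. We denote the already learned regulatory program of a coarse module as $w^c$ and the regulatory program of a fine module that we wish to learn as $w^f$. The coarse-to-fine version of our objective is then

$$\|Aw^f - y\|_2^2 + \lambda_1\|w^f\|_1 + \lambda_2\|w^f\|_2^2 + \lambda_3\|Dw^f\|_1 + \lambda_4\|w^f - w^c\|_2^2,$$

where the last term ties the programs of the coarse and fine modules. This objective can be transformed into

$$\left\| \begin{bmatrix} A \\ \sqrt{\lambda_2}\, I \\ \sqrt{\lambda_4}\, I \end{bmatrix} w^f - \begin{bmatrix} y \\ 0 \\ \sqrt{\lambda_4}\, w^c \end{bmatrix} \right\|_2^2 + \lambda_1\|w^f\|_1 + \lambda_3\|Dw^f\|_1.$$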
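The coarse-to-fine transformation works the same way as the ridge absorption: appending $\sqrt{\lambda_4}\, I$ to the design and $\sqrt{\lambda_4}\, w^c$ to the target adds exactly $\lambda_4\|w^f - w^c\|_2^2$ to the squared residual. A minimal numerical check follows, assuming a tying penalty of squared-L2 form on $w^f - w^c$; the function names and toy numbers are hypothetical.

```python
import math


def augment_coarse_to_fine(A, y, lam2, lam4, w_coarse):
    """Stack [A; sqrt(lam2) I; sqrt(lam4) I] and [y; 0; sqrt(lam4) * w_coarse]."""
    n = len(A[0])

    def scaled_identity(s):
        return [[s if i == j else 0.0 for j in range(n)] for i in range(n)]

    A_aug = (
        [row[:] for row in A]
        + scaled_identity(math.sqrt(lam2))
        + scaled_identity(math.sqrt(lam4))
    )
    y_aug = list(y) + [0.0] * n + [math.sqrt(lam4) * wc for wc in w_coarse]
    return A_aug, y_aug


def sq_residual(A, w, y):
    """||A w - y||_2^2."""
    return sum((sum(a * b for a, b in zip(row, w)) - yi) ** 2 for row, yi in zip(A, y))
```

A larger $\lambda_4$ pulls the fine-module program toward the coarse one, so fine modules depart from their parent's program only where the data justify it.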
Solving the prototypical optimization problem
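We note that both the coarse and the fine module regulatory program fitting problems have been expressed in the following general form, where $\tilde{A}$ and $\tilde{y}$ denote the stacked matrix and vector of either formulation:

$$\min_w\ \|\tilde{A}w - \tilde{y}\|_2^2 + \lambda_1\|w\|_1 + \lambda_3\|Dw\|_1.$$

We reformulate this optimization problem by adding variables that decouple the penalties:

$$\min_{w,u,v}\ \|\tilde{A}w - \tilde{y}\|_2^2 + \lambda_1 \mathbf{1}^\top u + \lambda_3 \mathbf{1}^\top v \quad \text{subject to} \quad -u \le w \le u,\ \ -v \le Dw \le v.$$

This reformulation enables straightforward derivation of a primal-dual interior-point method [Boyd and Vandenberghe, 2004]⁸.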
Model selection using Bayesian Information Criterion
The formulation of our optimization problem depends on a set of parameters ($\lambda_1$, $\lambda_2$, $\lambda_3$ and, for fine modules, $\lambda_4$). Different combinations of these parameters yield regulatory programs of different quality. One way to assess the quality of the fits is to use held-out data or cross-validation. However, searching for these parameters by cross-validation is prohibitively expensive, so we instead use the Bayesian Information Criterion (BIC) to assess the quality of the fitted regulatory program. BIC compares models, here encoded by regulatory programs, based on their tradeoff between data log likelihood and degrees of freedom. The log likelihood for our model is

$$\ell(w) = -\sum_{g \in m}\sum_t \left[\frac{1}{2}\log\big(2\pi\sigma^2_{m,t}\big) + \frac{1}{2\sigma^2_{m,t}}\Big(e_{g,t} - \sum_r w_{m,r,t}\, x_{r,t}\Big)^2\right].$$
The computation of the degrees of freedom is somewhat involved but intuitively simple: a regulatory weight that remains the same throughout a connected portion of the differentiation tree is counted as a single degree of freedom. To make this more formal, we consider the matrix A and construct its counterpart B. We use $a_{r,t}$ to denote the column of A corresponding to regulator r and cell type t. We now construct a graph whose nodes correspond to the columns of A. The graph has an edge between the nodes $a_{r,p}$ and $a_{r,t}$ if cell type p is the parent of cell type t and $w_{m,r,p} = w_{m,r,t}$. Each column of B is the sum of the columns of A that belong to one connected component of this graph. We eliminate all zero columns of B, and the final degrees of freedom are given by

$$\mathrm{df}(w) = \mathrm{tr}\Big(B\big(B^\top B + \lambda_2 \Lambda(B)\big)^{-1} B^\top\Big),$$

where $\Lambda(B)$ is a diagonal matrix whose entries are the number of columns of A in the connected component associated with each column of B.
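The intuitive counting rule above (weights that stay equal across parent-child edges are merged, and each merged group counts once) can be sketched with a union-find structure. This is a simplification, not the trace formula: it assumes that groups with zero weight are not counted, as is usual for elastic-net degrees of freedom, and it coincides with the trace formula only when $\lambda_2 = 0$ and the columns of B are linearly independent (the trace then equals the number of columns of B). The flattening (index r·T + t) and names are hypothetical.

```python
def count_degrees_of_freedom(w, edges, cell_types, n_regulators, tol=1e-9):
    """Count tied, non-zero weight groups along the ontogeny tree.

    w is flattened regulator-major (index r * T + t), a hypothetical layout.
    Weights of regulator r in a parent and a child cell type are tied when they
    are equal within `tol`; each tied group with a non-zero weight counts once.
    """
    T = len(cell_types)
    col = {t: i for i, t in enumerate(cell_types)}
    parent = list(range(n_regulators * T))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for p, c in edges:
        for r in range(n_regulators):
            a, b = r * T + col[p], r * T + col[c]
            if abs(w[a] - w[b]) <= tol:
                parent[find(a)] = find(b)

    return len({find(i) for i in range(len(w)) if abs(w[i]) > tol})
```

For a hypothetical chain root → mid → leaf with two regulators and `w = [1, 1, 3, 0, 2, 2]`, regulator 0 contributes two groups ({root, mid} and {leaf}) and regulator 1 one non-zero group ({mid, leaf}), giving 3.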
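Hence we can compute BIC(w) as

$$\mathrm{BIC}(w) = -2\,\ell(w) + \mathrm{df}(w)\log N = \sum_{g \in m}\sum_t \left[\log\big(2\pi\sigma^2_{m,t}\big) + \frac{1}{\sigma^2_{m,t}}\Big(e_{g,t} - \sum_r w_{m,r,t}\, x_{r,t}\Big)^2\right] + \mathrm{df}(w)\log N,$$

where N is the number of observations.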
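Given residuals of a fitted program, the per-cell-type module variances and a degrees-of-freedom value, BIC follows directly as $-2\ell(w) + \mathrm{df}(w)\log N$. The sketch below assumes that N counts the gene-by-cell-type observations of the module, which the note does not spell out; the function name and input layout are hypothetical.

```python
import math


def bic(residuals, sigma2, df):
    """BIC = -2 * log-likelihood + df * log(N) under the Gaussian module model.

    residuals: {cell_type: [e_{g,t} - predicted value, one entry per module gene g]}
    sigma2: {cell_type: module variance in that cell type}
    df: degrees of freedom of the fitted regulatory program
    N is taken to be the total number of residuals (an assumption).
    """
    neg2_loglik = 0.0
    n_obs = 0
    for t, res in residuals.items():
        for r in res:
            neg2_loglik += math.log(2 * math.pi * sigma2[t]) + r * r / sigma2[t]
            n_obs += 1
    return neg2_loglik + df * math.log(n_obs)
```

Among candidate parameter settings, the program with the lowest BIC is preferred; with identical residuals, a program with fewer degrees of freedom always scores lower.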
Postprocessing of regulatory programs
Once the regulatory program that is optimal with respect to BIC is obtained, we perform postprocessing to remove regulatory relationships involving underexpressed regulators. We set a low expression cutoff of 5.5 on the log2 scale: below this level, the correlation between the predictor and the target module may very well be due to noise, and the relationship could therefore be spurious.
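The postprocessing step amounts to zeroing regulatory weights whose regulator falls below the log2 cutoff of 5.5. A minimal sketch follows; it assumes the cutoff is applied per regulator and cell type (the note does not say whether it is applied per cell type or globally), and the dictionary layout and names are hypothetical.

```python
def prune_underexpressed(weights, expression, cutoff=5.5):
    """Zero out regulatory weights where the regulator is below the log2 cutoff.

    weights: {(regulator, cell_type): weight}
    expression: {(regulator, cell_type): log2 expression}
    Applying the cutoff per (regulator, cell type) pair is an assumption.
    """
    return {
        key: (0.0 if expression[key] < cutoff else w)
        for key, w in weights.items()
    }
```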
1. Blatt, M., Wiseman, S. & Domany, E. Superparamagnetic Clustering of Data. Physical Review Letters 76, 3251-3254 (1996).
2. Novershtern, N. et al. Densely Interconnected Transcriptional Circuits Control Cell States in Human Hematopoiesis. Cell 144, 296-309 (2011).
3. Matys, V. et al. TRANSFAC® and its module TRANSCompel®: transcriptional gene regulation in eukaryotes. Nucleic Acids Research 34, D108-D110 (2006).
4. Sandelin, A., Alkema, W., Engström, P., Wasserman, W.W. & Lenhard, B. JASPAR: an open‐access database for eukaryotic transcription factor binding profiles. Nucleic Acids Research 32, D91-D94 (2004).
5. Badis, G. et al. Diversity and Complexity in DNA Recognition by Transcription Factors. Science 324, 1720-1723 (2009).
6. Berger, M.F. et al. Variation in Homeodomain DNA Binding Revealed by High-Resolution Analysis of Sequence Preferences. Cell 133, 1266-1276 (2008).
7. Zou, H. & Hastie, T. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 67, 301-320 (2005).
8. Boyd, S.P. & Vandenberghe, L. Convex Optimization (Cambridge University Press, Cambridge, UK; New York, 2004).
Glossary of abbreviations within the ImmGen database relevant to this study
Terms: DC, dendritic cell; Kd, kidney; LC, Langerhans cell; LV, liver; Lu, lung; MLN, mesenteric lymph node; PC, peritoneal cavity; pDC, plasmacytoid dendritic cell; Ser, intestinal serosa; Sk, skin; Sp, spleen; SLN, skin-draining lymph node; Th, thymus
Markers: 4, CD4; 8, CD8; 11b, CD11b; 103, CD103; 480, F4/80; II, MHC II; lo, low expressing