a b c d - media.nature.com · a b c d spleen lung peritoneal microglia supplementary figure 1....

[Figure panels a-d; columns: Spleen, Lung, Peritoneal, Microglia]

Supplementary Figure 1. Diversity of key genes among macrophages. The gene expression profiles of several (a) chemokine receptors, (b) Toll-like receptors, (c) C-type lectin domain members, and (d) efferocytic receptors in red pulp, lung, brain and peritoneal macrophages are displayed as normalized signal intensity.

Nature Immunology doi:10.1038/ni.2416

[Heat map color scale: Expression, from -3 to 3]

Supplementary Figure 2. Expression profiles of uniquely downregulated genes in macrophages from different organs. The heat map and gene list reveal genes uniquely downregulated in single macrophage populations by ≥5-fold.


[Flow cytometry histograms of MR1 staining in spleen and lung; populations: Mφ, DC CD8+/CD103+, DC CD11b+]

Supplementary Figure 3. Evaluation of Mr1 staining in macrophages and DCs from the spleen and lung. Using a new mAb to Mr1 as described (Chua WJ, J Immunol 186, 4744-4750, 2011), we compared macrophages and conventional DCs in the spleen or lung for Mr1 expression. Expression was absent on DCs but present on macrophages, though difficult to detect, as expected (Chua WJ, J Immunol 186, 4744-4750, 2011). Specificity of staining on macrophages was assessed by staining with anti-Mr1 mAb in Mr1 knock-out mice (shown in blue profiles). Data are representative of two independent experiments.


Supplementary Figure 4. Intracellular FACS staining for CD68 in red pulp macrophages versus CD4+ and CD8+ spleen DCs. Splenocytes were permeabilized and stained with isotype control mAb (filled histograms) or anti-CD68 mAb (open histograms).


[Flow cytometry plots for resting peritoneum and day 5 after thioglycollate; axes: F4/80 versus CD115, CD36, CD11c and MHC-II]

Supplementary Figure 5. Phenotype of resting and thioglycollate-elicited cells five days after intraperitoneal thioglycollate administration. Staining for F4/80, CD115, CD11c and MHC-II reveals that monocyte-derived macrophages infiltrating the peritoneal cavity in response to thioglycollate upregulate CD11c on their surface, with a substantial subpopulation that also expresses MHC-II. The box in the top panels shows the gate that was applied before plotting the data in the panels below.


Supplementary Figure 6. Lung macrophages are not contaminated with eosinophils. Eosinophils and lung resident macrophages can be discriminated by their level of CD11c expression. A blue gate is shown around macrophages while a red gate delineates eosinophils. Accordingly, when projected on a FACS plot showing CD11b vs Siglec-F, eosinophils are CD11b+ Siglec-F+ (red population) while macrophages are CD11b-/lo Siglec-F+ (blue population). Lung macrophages gated as MERTK+ CD64+ cells are also devoid of eosinophil contaminants. Eosinophils are shown in red while macrophages, gated using the classical strategy shown above (CD11b+ Siglec-F+), are shown in blue.


Ingenuity Canonical Pathways | p-value | Molecules

Microglia
[pathway name missing] | 0.0005 | SLC2A1, LDHB, MMP2, SLC2A5, EDN1, MMP14, VHL
Glioma Invasiveness Signaling | 0.0078 | PLAU, ITGB5, RHOH, MMP2
mTOR Signaling | 0.0138 | PPM1L, PRKAB1, RPTOR, RHOH, RPS6KA1, PRR5
Leukocyte Extravasation Signaling | 0.0145 | SIPA1, PLCG1, RHOH, MMP2, JAM3, MMP14, SELPLG
Communication between Innate and Adaptive Immune Cells | 0.0148 | CCL4, TNFRSF17, CCL3L1/CCL3L3, TLR9
Macropinocytosis Signaling | 0.0148 | PLCG1, ARF6, PDGFB, ITGB5
Crosstalk between Dendritic Cells and Natural Killer Cells | 0.0214 | PVRL2, FSCN1, TLR9, TREM2
Cysteine Metabolism | 0.0282 | LDHB, CHST11, CHST7
TREM1 Signaling | 0.0331 | CCL2, PLCG1, TLR9
Dendritic Cell Maturation | 0.0398 | HLA-DOB, FSCN1, HLA-DOA, TLR9, TREM2

Lung Macrophages
Mitotic Roles of Polo-Like Kinase | 0.0065 | KIF23, CDC25B, KIF11, PPP2R1B, PRC1, CDC25A
Glycerolipid Metabolism | 0.0078 | LPIN1, LIPF, GLA, MGLL, GK, LPL, AKR1B1, DGAT2
Leukocyte Extravasation Signaling | 0.0079 | MMP19, ACTG1, CXCR4, SPN, MMP8, PRKCH, MMP12, BMX, CTNNB1, CLDN1, EZR, ITGAL
LPS/IL-1 Mediated Inhibition of RXR Function | 0.0105 | CPT1A, GSTM5, Gstm3, MGST3, ABCG1, IL1RL2, ALAS1, ACSL1, RARA, FABP1, HMGCS1, ACOX1
Fatty Acid Elongation in Mitochondria | 0.0135 | HSD17B4, EHHADH, Acaa1b
Cell Cycle Regulation by BTG Family Proteins | 0.0182 | PPP2R1B, CCNE1, CCNE2, CCRN4L
Cell Cycle: G1/S Checkpoint Regulation | 0.0209 | CDK6, CCNE1, CCNE2, CDC25A, RBL1
Sphingolipid Metabolism | 0.0224 | SGMS2, LPIN1, SPTLC2, GLA, NAAA, SULF2
Cyclins and Cell Cycle Regulation | 0.0251 | PPP2R1B, CDK6, CCNE1, CCNE2, CDC25A, CCNA2
Integrin Signaling | 0.0275 | ACTG1, NEDD9, ITGA5, ARHGAP26, CAPN2, ITGAX, GRB7, CAPN1, RHOF, BCAR3, ITGAL
p38 MAPK Signaling | 0.0324 | RPS6KA5, MAP3K5, IL1RL2, MAP4K1, MAPKAPK3, CREB5, IL1RN
Biosynthesis of Steroids | 0.0380 | FDFT1, SQLE, IDI1
Cell Cycle Control of Chromosomal Replication | 0.0417 | MCM4, CDK6, DBF4
FAK Signaling | 0.0447 | ACTG1, ITGA5, ARHGAP26, CAPN2, TNS1, CAPN1
Aryl Hydrocarbon Receptor Signaling | 0.0457 | GSTM5, Gstm3, MGST3, CDK6, CCNE1, CCNE2, RARA, CCNA2
cAMP-mediated signaling | 0.0468 | AKAP13, PTGER2, CXCR2, CAMK2G, CNR2, FPR2, CREB5, PRKAR2B, FPR1, P2RY14, AKAP5

Peritoneal Macrophages (F4/80hi)
Eicosanoid Signaling | 0.0005 | PTGIR, DPEP2, PTGER4, ALOX15, PRDX6, PTGIS, PTGES
LXR/RXR Activation | 0.0011 | MSR1, APOE, ACACA, MMP9, APOC2, LBP, PLTP
IL-12 Signaling and Production in Macrophages | 0.0012 | TGFB2, IKBKE, CD40, ALOX15, AKT3, PRKD3, MST1R, STAT4
Acute Phase Response Signaling | 0.0030 | IKBKE, SAA1, FN1, CP, RRAS, AKT3, C4A/C4B, HP, LBP, CFB
N-Glycan Biosynthesis | 0.0062 | MAN1A1, FUT8, RPN2, ARSG, DAD1
TR/RXR Activation | 0.0105 | KLF9, ENO1, F10, ACACA, AKT3, HP
Virus Entry via Endocytic Pathways | 0.0117 | ITGA6, DNM1, RRAS, FLNB, PRKD3, ITGB7
Glycolysis/Gluconeogenesis | 0.0138 | ALDH2, ENO1, ALDH1A2, HK1, PFKL, PDHA1
Riboflavin Metabolism | 0.0191 | ENPP5, ACPP, RFK
Human Embryonic Stem Cell Pluripotency | 0.0200 | TGFB2, FZD1, WNT2, S1PR1, AKT3, S1PR5, FGFR1
[pathway name missing] | 0.0214 | IKBKE, ITGA6, RRAS, AKT3, PRKD3
Aminosugars Metabolism | 0.0251 | PDE2A, CMAH, ALOX15, HK1, UAP1
PTEN Signaling | 0.0251 | TGFBR3, IKBKE, RRAS, CCND1, AKT3, FGFR1
N-Glycan Degradation | 0.0257 | MAN1A1, GLB1, ENGASE
RAR Activation | 0.0257 | TGFB2, CYP26A1, ALDH1A2, RARB, ZBTB16, AKT3, PRKD3, RARG
Aryl Hydrocarbon Receptor Signaling | 0.0269 | TGFB2, NQO2, NFIA, ALDH1A2, RARB, CCND1, RARG
[pathway name missing] | 0.0275 | ALDH2, ALDH1A2, DPYSL3, HIBCH
[pathway name missing] | 0.0282 | TGFB2, TGFBR3, FZD1, RARB, WNT2, CCND1, AKT3, RARG
Histidine Metabolism | 0.0295 | ALDH2, HAL, ALDH1A2, HDC
Glycosphingolipid Biosynthesis | 0.0302 | ST3GAL4, GLB1, ST3GAL5
Complement System | 0.0302 | CFH, C4A/C4B, CFB
Inhibition of Angiogenesis by TSP1 | 0.0355 | THBS1, MMP9, AKT3
P2Y Purigenic Receptor Signaling Pathway | 0.0380 | P2RY1, PLCB4, RRAS, AKT3, PRKD3, GNG12
Coagulation System | 0.0380 | F10, F5, F13A1
Butanoate Metabolism | 0.0407 | ALDH2, ALDH1A2, PRDX6, PDHA1
G-Protein Coupled Receptor Signaling | 0.0417 | PDE2A, RGS18, PTGER4, P2RY1, RRAS, S1PR1, AKT3, CMKLR1, PTGIR, IKBKE, FZD1, EDNRB, GPRC5B, CXCR7, PLCB4, HTR2A, S1PR5
Glycosaminoglycan Degradation | 0.0479 | HPSE, GLB1, ALOX15

Spleen Red Pulp Macrophages
Interferon Signaling | 0.0004 | IRF1, STAT1, MX1, STAT2, IFIT3
Communication between Innate and Adaptive Immune Cells | 0.0006 | Tlr11, CD86, IL15, HLA-DRB1, TLR1, CD4, IL1B
Dendritic Cell Maturation | 0.0007 | HLA-DQB1, CD86, IL15, STAT1, HLA-DRB1, MAPK8, STAT2, HLA-DQA1, CD1D, IL1B
Primary Immunodeficiency Signaling | 0.0013 | CIITA, CD4, ICOS, DCLRE1C, ADA
Graft-versus-Host Disease Signaling | 0.0014 | HLA-DQB1, CD86, HLA-DRB1, HLA-DQA1, IL1B
IL-15 Production | 0.0020 | IRF1, IL15, STAT1, PTK2
LPS/IL-1 Mediated Inhibition of RXR Function | 0.0029 | ABCC3, GSTA4, HS3ST2, CHST15, IL1R1, MAPK8, UST, ACSL3, NDST1, NR1H3, IL1B
Chondroitin Sulfate Biosynthesis | 0.0030 | HS3ST2, CHST15, UST, NDST1, DSE
Antigen Presentation Pathway | 0.0032 | CIITA, HLA-DRB1, CD74, HLA-DQA1
Cysteine Metabolism | 0.0034 | GOT1, HS3ST2, CHST15, UST, NDST1
Keratan Sulfate Biosynthesis | 0.0037 | HS3ST2, CHST15, WDFY3, UST, NDST1
Activation of IRF by Cytosolic Pattern Recognition Receptors | 0.0048 | IFIH1, STAT1, MAPK8, STAT2, IRF7
Glycerolipid Metabolism | 0.0051 | AOAH, Akr1b7, DGKI, ADHFE1, MOGAT1, PPAP2B, PPAP2A
Role of JAK2 in Hormone-like Cytokine Signaling | 0.0058 | SOCS5, STAT1, EPOR, HLTF
Role of Pattern Recognition Receptors in Recognition of Bacteria and Viruses | 0.0063 | Tlr11, IFIH1, NOD1, TLR1, IRF7, IL1B
Phospholipid Degradation | 0.0107 | GDPD1, DGKI, PLCL1, HMOX1, PPAP2B, PPAP2A
Cdc42 Signaling | 0.0195 | HLA-DQB1, EXOC6, HLA-DRB1, MAPK8, VAV2, HLA-DQA1, H2-T24
Xenobiotic Metabolism Signaling | 0.0200 | ABCC3, GSTA4, HS3ST2, CHST15, MAPK8, UST, AHR, HMOX1, NDST1, MAF, IL1B
Glycerophospholipid Metabolism | 0.0209 | GDPD1, GOT1, DGKI, PLCL1, HMOX1, PPAP2B, PPAP2A
Sphingolipid Metabolism | 0.0229 | ARSI, UGCG, GALC, PPAP2B, PPAP2A
TREM1 Signaling | 0.0229 | Tlr11, CD86, TLR1, IL1B
Complement System | 0.0302 | CD55, C6, C2
FXR/RXR Activation | 0.0372 | CYP27A1, MAPK8, ABCB4, NR1H3, IL1B
[pathway name missing] | 0.0457 | FYB, FYN, VAV2, HMOX1, PTEN
Leukocyte Extravasation Signaling | 0.0457 | DLC1, VCAM1, PECAM1, MAPK8, MMP13, VAV2, MMP27, PTK2

Supplementary Table 1. Pathway analysis of the specific gene expression profiles distinguishing different macrophages

Supplementary Table 4. Analysis of modules significantly enriched in macrophage-associated genes

Macrophage populations | Fine module | Hypergeometric p-value | Overlap size | Overlap genes
[label detached] | 112 | 3.21E-05 | 4 | CD14;CTSL;SEPP1;TMEM195
[label detached] | 125 | 1.80E-04 | 3 | COMT1;PLOD1;TCN2
[label detached] | 130 | 1.12E-06 | 4 | TLR4;TMEM77;TOM1;TPP1
[label detached] | 161 | 4.18E-12 | 6 | A930039A15RIK;CAMK1;GLUL;MYO7A;PLA2G15;PON3
w/o Peritoneum | 165 | 4.26E-05 | 3 | GPR77;IL1A;TMEM86A
w/o Lung | 168 | 1.26E-05 | 3 | C1QA;C1QB;C1QC
[label detached] | 132 | 5.29E-11 | 7 | C130050O18RIK;FCGR4;HGF;PILRA;PILRB1;PILRB2;TLR8
[label detached] | 165 | 1.98E-04 | 3 | LPL;MITF;SNX24
w/o Spleen | 122 | 8.73E-08 | 4 | CEBPB;DHRS3;PLOD3;PROS1
[label detached] | 165 | 1.02E-05 | 4 | 2810405K02RIK;GM4951;GM5970;IGF1
[label detached] | 295 | 7.60E-05 | 3 | ASPA;CD5L;FCNA
[label detached] | 122 | 3.59E-05 | 4 | CEBPB;DRAM1;DUSP3;FN1
[label detached] | 164 | 4.63E-06 | 4 | CLEC4E;F10;GDA;PLCB1
[label detached] | 166 | 6.05E-10 | 6 | ALOX5;ATG7;G6PDX;PGD;PRDX5;SEPX1
[label detached] | 188 | 2.02E-04 | 3 | CAV1;FZD4;PDK4
Lung and Spleen | 133 | 3.37E-07 | 5 | CLEC4A3;EAR1;EAR10;GM5150;SIGLEC1
Peritoneum and Microglia | x | x | x | x
Lung and Microglia | 168 | 4.19E-07 | 4 | HPGDS;P2RY12;SLC40A1;SLC7A8
Microglia and Spleen | 128 | 5.14E-06 | 3 | ANG;SERPINE1;X99384

Row and group labels that appear detached from their rows in the source: All 4 macrophage population; Two macrophage populations; all; w/o Microglia; Peritoneum and Lung; Peritoneum and Spleen
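The hypergeometric p-values reported in Supplementary Table 4 measure how surprising an overlap between a fine module and a macrophage-associated gene set is. The following is a minimal sketch of such a test, assuming an upper-tail probability; the function name, argument names and the numbers in the example are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_upper_tail(overlap, module_size, signature_size, universe_size):
    """P(X >= overlap) when `module_size` genes are drawn without replacement
    from `universe_size` genes, of which `signature_size` belong to the
    macrophage-associated signature (all names are illustrative)."""
    max_overlap = min(module_size, signature_size)
    total = comb(universe_size, module_size)
    tail = sum(
        comb(signature_size, k) * comb(universe_size - signature_size, module_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 3 of 10 module genes fall in a 20-gene signature,
# with 1000 genes in the universe.
p_value = hypergeom_upper_tail(overlap=3, module_size=10,
                               signature_size=20, universe_size=1000)
print(f"p = {p_value:.2e}")
```

Exact integer binomial coefficients are used here so that small tail probabilities do not suffer from rounding before the final division.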

Gene/Name | Function/Other Information
Akr1b10, aldo-keto reductase family 1 member B10 | Ubiquitin-dependent degradation of acetyl-CoA carboxylase α; key role in regulating phospholipid composition in cells, reactive oxygen species, and cell survival.
Blvrb, biliverdin IX beta reductase | Converts biliverdin to bilirubin.
Camk1, calcium/calmodulin-dependent kinase 1 | Major signaling intermediate.
Glul, glutamate-cysteine ligase (also called GCL) | Catalyzes the rate-limiting step in glutathione synthesis.
Myo7a, myosin VIIA | Intracellular trafficking of vesicles to the lysosome (mutations cause Usher’s syndrome).
Nln, neurolysin (also called oligopeptidase M) | Metallo carboxypeptidase in the same family as angiotensin-converting enzyme.
Pcyox1, prenyl cysteine oxidase 1 | Catabolism of prenylcysteines.
Pla2g15, group XV phospholipase A2 | Lysosomal phospholipase A2; regulates phospholipid content/distribution.
Pon3, paraoxonase 3 | Paraoxonases have lactonase activity and serve as anti-oxidants. PON3 is mainly found on HDL.
Slc48a1, solute carrier family 48 (heme transporter), member 1 | Heme transporter that regulates intracellular heme availability/degradation through the endosomal or lysosomal compartment.
A930039A15Rik | Unknown.

Supplementary Table 5. Function and other information on the 11 genes that comprise module 161.

Supplementary Table 6. Fine modules and predicted regulators of specific macrophage populations in different organs.

Tissue macrophage population | Spleen Red Pulp | Lung | Peritoneal | Microglia
Fine module # | 330 | 296 | 295, 111, 112 | 194, 314
Predicted regulators | SpiC | PPARγ | Gata6 | MafB, ZFHX3, ZFP715, Bhlhe41

Supplementary Note 1 Ontogenet Algorithm Description

Dataset

Mouse gene expression was measured on Affymetrix MoGene 1.0 ST arrays. Clustering was performed on the ImmGen release of September 2010, across 802 samples representing 244 hematopoietic cell types (1-3 replicates per cell type). Ontogenet was applied to the ImmGen release of March 2011. This release includes 802 samples representing 244 hematopoietic cell types. However, Ontogenet was applied only to the data of the 676 samples (195 hematopoietic cell types) that were connected to the hematopoietic tree. Affymetrix annotation version 31 was used.

Data preprocessing

Expression data were normalized by RMA as part of the ImmGen pipeline and log2 transformed. For gene symbols with more than one probeset on the array, only the probeset with the highest mean expression was retained. Of those, only probesets with a standard deviation higher than 0.5 across the entire dataset were used for the clustering, resulting in 7,965 unique differentially expressed genes.
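The two filtering steps described above can be sketched as follows. This is a minimal illustration and not the ImmGen pipeline: the function name, the dictionary-based data layout and the toy values are assumptions, and the sample standard deviation is used because the text does not say which estimator was applied.

```python
from statistics import mean, stdev


def select_probesets(expr, probe_to_gene, min_sd=0.5):
    """Keep one probeset per gene symbol (highest mean log2 expression),
    then keep only those whose standard deviation exceeds `min_sd`.

    expr: probeset id -> list of log2 expression values across samples
    probe_to_gene: probeset id -> gene symbol
    """
    best = {}
    for probe, values in expr.items():
        gene = probe_to_gene[probe]
        if gene not in best or mean(values) > mean(expr[best[gene]]):
            best[gene] = probe
    # Sample standard deviation; the source does not specify the estimator.
    return {gene: probe for gene, probe in best.items()
            if stdev(expr[probe]) > min_sd}


# Toy data (hypothetical probeset ids and values).
expr = {
    "p1": [5.0, 5.1, 5.2],   # GeneA, low mean
    "p2": [8.0, 9.5, 11.0],  # GeneA, high mean and variable
    "p3": [7.0, 7.1, 6.9],   # GeneB, nearly constant
}
probe_to_gene = {"p1": "GeneA", "p2": "GeneA", "p3": "GeneB"}
selected = select_probesets(expr, probe_to_gene)
print(selected)
```

In this toy input, GeneA is represented by its higher-expressed probeset and GeneB is dropped because it barely varies.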

Definition of modules

Modules were defined by clustering. Clustering was performed by Super Paramagnetic Clustering [Blatt et al. 1996, ref. 1] with default parameters, resulting in 80 stable clusters. The remaining unclustered genes were grouped into a separate cluster. These are named coarse modules C1-C81.

Each coarse module was further partitioned into fine modules by hierarchical clustering, resulting in 334 fine modules, referred to in the text as fine modules F1-F334. On average, 3.9 fine modules were nested in a single coarse module. The smallest number of fine modules nested in a coarse module was 1 (23 coarse modules), and the maximum was 11 (7 coarse modules).

Choice of candidate regulators

Candidate regulators were curated from the following sources: (1) the mouse orthologs of all the genes that were used as candidate regulators in a previous study of human hematopoiesis [Novershtern et al. 2011, ref. 2]; (2) genes annotated with the Gene Ontology term ‘transcription factor activity’ in mouse, human or rat; (3) genes for which there is a known DNA binding motif in the TRANSFAC matrix database (ref. 3) v8.3, JASPAR (ref. 4) version 2008, or experimentally determined PWMs (refs. 5-6); and (4) genes with published ChIP-seq or ChIP-chip data (Supplementary Table 7). Regulators that were not measured on the array, or whose expression did not change sufficiently to be included in the clustering (standard deviation < 0.5 across the entire dataset), were removed. The exception is that regulators that did not meet the 0.5 variation cutoff but were highly correlated (>0.85) with another regulator that passed the cutoff were included as well. This resulted in 578 candidate regulators (Supplementary Table 8).
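The variation cutoff and the correlation-based exception can be sketched as below. This is an illustrative reading of the rule, assuming Pearson correlation and the sample standard deviation (the text does not name either); the function names and toy regulators are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def filter_regulators(expr, min_sd=0.5, min_corr=0.85):
    """Split regulators into those passing the SD cutoff and those rescued
    because they correlate strongly with a passing regulator."""
    passing = {r for r, values in expr.items() if stdev(values) >= min_sd}
    rescued = {
        r for r, values in expr.items()
        if r not in passing
        and any(pearson(values, expr[p]) > min_corr for p in passing)
    }
    return passing, rescued


# Toy expression profiles (hypothetical regulators).
expr = {
    "RegA": [4.0, 6.0, 8.0, 10.0],  # variable: passes the cutoff
    "RegB": [5.0, 5.2, 5.4, 5.6],   # low variation but tracks RegA
    "RegC": [5.0, 5.3, 5.0, 5.3],   # low variation, weakly correlated
}
passing, rescued = filter_regulators(expr)
print(sorted(passing), sorted(rescued))
```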

Module regulatory program

Ontogenet takes as input (1) gene expression profiles across many different cell types; (2) a partitioning of the genes into modules (coarse and fine clusters, above); (3) a predefined set of candidate regulators; and (4) an ontogeny tree relating the cell types. It constructs a regulatory program for each module consisting of a combination of regulators and their ‘regulatory weights’ in each cell type. Each regulatory program aims to explain as much of the gene expression variance in the module as possible, while remaining simple and being consistent across related cell types in the ontogeny.

More formally, we assume that the expression of a gene in a module can be modeled as a linear combination of the expression of the regulators. We will denote the activity of a regulator $r$ in a cell type $t$ as $a_{r,t}$. We model the expression of a gene $i$, a member of module $m$, in cell type $t$ as $x_{i,t} = \sum_r w_{m,r,t}\, a_{r,t} + \epsilon_{i,t}$, where each $\epsilon_{i,t}$ is a Gaussian random variable with zero mean and variance $\sigma^2_{m,t}$ specific to the combination of a module $m$ and a cell type $t$. Hence the regulatory program learned by Ontogenet is represented in terms of weights $w_{m,r,t}$ specific to a module, regulator, and cell type combination. We note that, due to parameter tying, the effective number of parameters is significantly smaller than the nominal size of the regulatory program representation, (# modules) x (# regulators) x (# cell types).

Module cell-type specific variance estimation

The module variance $\sigma^2_{m,t}$ in a given cell type is estimated from the expression of module members across all replicates of the cell type. While we use an unbiased estimator, we make special considerations for modules with fewer than 10 members. For these modules the variance estimate $\sigma^2_{m,t}$ is computed by a pooled variance estimator across modules with more than 10 members, but remains specific to the cell type. We note that the estimated variances in a fine module are typically smaller than the variances in its parent coarse module.
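A minimal sketch of this estimation scheme for a single cell type is given below. It is an assumption-laden illustration, not the authors' code: the pooled estimator is taken to be the usual degrees-of-freedom-weighted average of per-module variances, modules with exactly 10 members are treated as large (the text leaves this case open), and all names and values are hypothetical.

```python
from statistics import variance


def module_variances(values_by_module, members_by_module, min_members=10):
    """Per-module variance for ONE cell type.

    values_by_module: module -> flat list of expression values
        (all module members, all replicates of the cell type)
    members_by_module: module -> number of genes in the module
    Small modules receive a pooled estimate from the large modules.
    """
    large = [m for m, n in members_by_module.items() if n >= min_members]
    direct = {m: variance(values_by_module[m]) for m in large}  # unbiased (n - 1)
    dof = {m: len(values_by_module[m]) - 1 for m in large}
    pooled = sum(dof[m] * direct[m] for m in large) / sum(dof.values())
    return {m: direct.get(m, pooled) for m in members_by_module}


# Toy input: two 10-gene modules and one 3-gene module, one replicate each.
values = {
    "big1": [0.0, 1.0] * 5,
    "big2": [0.0, 2.0] * 5,
    "small": [3.0, 3.5, 4.0],
}
members = {"big1": 10, "big2": 10, "small": 3}
estimates = module_variances(values, members)
print(estimates)
```

The small module inherits the pooled value rather than an estimate from its own three genes, which would be very noisy.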

Regulatory program fitting as a penalized regression problem

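Estimation of the weights $w_{m,r,t}$ takes the form of a regression problem, but because the problem is overparameterized we need to regularize it, giving rise to a penalized regression problem of the form

$$\min_{w}\ \sum_{t} \sum_{i \in m} \frac{1}{2\sigma^2_{m,t}} \Big( x_{i,t} - \sum_{r} w_{m,r,t}\, a_{r,t} \Big)^2 + P(w)$$

where $x_{i,t}$ is the expression of gene $i$ of module $m$ in cell type $t$, $a_{r,t}$ is the activity of regulator $r$ in cell type $t$, $\sigma^2_{m,t}$ is the module variance, and $P(w)$ is a chosen penalty. In our case this penalty is composed of two parts, one promoting sparsity and selection of correlated predictors and another promoting consistency of regulatory programs between related cell types.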

We assume that only a small number of regulators actively regulate any one module. A standard approach to promoting such sparsity in regression problems is to introduce an L1 penalty, the sum of absolute values $\sum_m \sum_r \sum_t |w_{m,r,t}|$. However, this penalty tends to be overly aggressive in inducing sparsity, and so fails to retain highly correlated predictors, which may all be biologically relevant due to ‘redundancy’ in densely interconnected regulatory circuits. Such behavior can be counteracted by the addition of squared terms $\sum_m \sum_r \sum_t w_{m,r,t}^2$, yielding a composite penalty known as the elastic net [Zou and Hastie 2005, ref. 7], which for a single module $m$ reads

$$\lambda \sum_r \sum_t |w_{m,r,t}| + \kappa \sum_r \sum_t w_{m,r,t}^2$$

and which we write compactly as $\lambda \|w\|_1 + \kappa \|w\|_2^2$.
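The effect of combining the two terms can be illustrated numerically. In the sketch below, `lam` and `kappa` are placeholder names for the weights of the absolute-value and squared terms, and the weight vectors are made up: one puts all weight on a single regulator, the other shares it between two correlated regulators.

```python
def elastic_net_penalty(w, lam, kappa):
    """lam * sum |w_j| + kappa * sum w_j^2 for a flat list of weights."""
    return lam * sum(abs(v) for v in w) + kappa * sum(v * v for v in w)


single = [1.0, 0.0]   # all weight on one regulator
shared = [0.5, 0.5]   # same total weight split across two regulators

# The L1 term alone cannot tell the two solutions apart.
l1_single = elastic_net_penalty(single, lam=1.0, kappa=0.0)
l1_shared = elastic_net_penalty(shared, lam=1.0, kappa=0.0)

# Adding the squared term makes the shared solution cheaper.
en_single = elastic_net_penalty(single, lam=1.0, kappa=1.0)
en_shared = elastic_net_penalty(shared, lam=1.0, kappa=1.0)

print(l1_single, l1_shared, en_single, en_shared)
```

With the squared term present, spreading weight across correlated regulators costs less than concentrating it, which is the behavior the composite penalty is meant to encourage.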

An important input to our regulatory program fitting procedure is the ontogeny (differentiation) tree. This tree is encoded as an edge list $\mathcal{E}$, and with $(s, t) \in \mathcal{E}$ we denote that cell type $s$ is a parent of cell type $t$. The similarity of the regulatory programs for a particular module $m$ in two related cell types $(s, t) \in \mathcal{E}$ can be assessed as the sum of the absolute values of the differences of regulatory weights, $\sum_r |w_{m,r,s} - w_{m,r,t}|$. The key observation is that $|w_{m,r,s} - w_{m,r,t}|$ is 0 if the regulatory relationship between regulator $r$ and module $m$ is the same in cell type $t$ and its parent type $s$. More generally, the total difference of the regulatory programs can be written as $\sum_r \sum_{(s,t) \in \mathcal{E}} |w_{m,r,s} - w_{m,r,t}|$. We will write this term in a compact form as $\|Dw\|_1$, where $w$ is a vector of weights for all regulators across all cell types concatenated together and $D$ is a matrix of size (RE) x (RT), where R is the number of regulators, T is the number of cell types and E is the number of edges in the tree. We note that multiplication by the matrix $D$ computes the differences between relevant entries of the vector $w$. The less the regulatory programs change throughout differentiation, the smaller the term $\|Dw\|_1$ will be. Thus using this term as a penalty promotes the preservation of a consistent regulatory program throughout differentiation.
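The difference matrix described above can be built explicitly for a small tree. The sketch below assumes a regulator-major ordering of the weight vector (all cell types for regulator 0, then regulator 1, and so on), which the text does not specify; the cell type names and weights are hypothetical.

```python
def difference_matrix(n_regulators, cell_types, edges):
    """Rows of D: one per (regulator, parent-child edge), with +1 at the
    parent's entry and -1 at the child's entry of the weight vector."""
    n_types = len(cell_types)
    position = {t: j for j, t in enumerate(cell_types)}
    rows = []
    for r in range(n_regulators):
        for parent, child in edges:
            row = [0.0] * (n_regulators * n_types)
            row[r * n_types + position[parent]] = 1.0
            row[r * n_types + position[child]] = -1.0
            rows.append(row)
    return rows


def l1_norm_of_product(matrix, vector):
    """Compute the L1 norm of (matrix @ vector) with plain lists."""
    return sum(abs(sum(a * b for a, b in zip(row, vector))) for row in matrix)


cell_types = ["stem", "mono", "mac"]            # hypothetical lineage
edges = [("stem", "mono"), ("mono", "mac")]     # (parent, child)
# Regulator 0 switches on once; regulator 1 switches off once.
w = [0.0, 1.0, 1.0,
     0.5, 0.5, 0.0]

D = difference_matrix(2, cell_types, edges)
penalty = l1_norm_of_product(D, w)
print(len(D), len(D[0]), penalty)
```

Here D has R x E = 4 rows and R x T = 6 columns, and only the two edges along which a weight changes contribute to the penalty.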

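Combining all the considerations above, the complete objective for fitting a regulatory program of a module is given by

$$\min_{w}\ \sum_{t} \sum_{i \in m} \frac{1}{2\sigma^2_{m,t}} \Big( x_{i,t} - \sum_{r} w_{m,r,t}\, a_{r,t} \Big)^2 + \lambda \|w\|_1 + \kappa \|w\|_2^2 + \gamma \|Dw\|_1$$

where $\lambda$ and $\kappa$ weight the two sparsity terms and $\gamma$ weights the tree-consistency term $\|Dw\|_1$.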

Optimization of this objective is somewhat complicated by the fact that the absolute value is a non-smooth function, and hence direct optimization by methods such as gradient descent is not feasible. Alternative methods, such as projected gradients, are possible but their convergence is relatively slow, so we opted to use a primal-dual interior point method [Boyd and Vandenberghe 2004, ref. 8].

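In order to simplify the discussion of the optimization we introduce a sparse predictor matrix $A$ of size (RT) x (T), where $A_{(r,t),t} = a_{r,t} / \sigma_{m,t}$ and 0 otherwise. Further we note that the optimal $w$ depends only on the mean of the module’s genes, and we can introduce the variable $c_t = \frac{1}{|m|\, \sigma_{m,t}} \sum_{i \in m} x_{i,t}$. Hence, up to an additive constant that does not depend on $w$ and a rescaling of the penalty parameters, we can rewrite the objective as

$$\min_{w}\ \|c - A^\top w\|_2^2 + \lambda \|w\|_1 + \kappa \|w\|_2^2 + \gamma \|Dw\|_1$$

Finally we can absorb the term $\kappa \|w\|_2^2$ into the first term as follows

$$\min_{w}\ \left\| \begin{bmatrix} c \\ 0 \end{bmatrix} - \begin{bmatrix} A^\top \\ \sqrt{\kappa}\, I \end{bmatrix} w \right\|_2^2 + \lambda \|w\|_1 + \gamma \|Dw\|_1$$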
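Absorbing a squared penalty into the least-squares term is the standard augmentation trick: stacking a scaled identity under the predictor matrix and zeros under the target leaves the objective value unchanged. The sketch below checks this numerically on made-up numbers; `X` stands in for the predictor matrix, `c` for the target vector and `kappa` for the weight of the squared term, all of which are placeholder names.

```python
from math import isclose, sqrt


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def squared_norm(vector):
    return sum(v * v for v in vector)


def ridge_objective(X, c, w, kappa):
    """||c - X w||^2 + kappa * ||w||^2."""
    residual = [ci - pi for ci, pi in zip(c, matvec(X, w))]
    return squared_norm(residual) + kappa * squared_norm(w)


def augmented_objective(X, c, w, kappa):
    """||[c; 0] - [X; sqrt(kappa) I] w||^2, which should equal ridge_objective."""
    n = len(w)
    identity_block = [[sqrt(kappa) if i == j else 0.0 for j in range(n)]
                      for i in range(n)]
    X_aug = [list(row) for row in X] + identity_block
    c_aug = list(c) + [0.0] * n
    residual = [ci - pi for ci, pi in zip(c_aug, matvec(X_aug, w))]
    return squared_norm(residual)


# Arbitrary toy numbers.
X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
c = [1.0, 0.0, 2.0]
w = [0.3, -0.7]
kappa = 0.4

plain = ridge_objective(X, c, w, kappa)
stacked = augmented_objective(X, c, w, kappa)
print(plain, stacked)
```

Because the two values agree for any weight vector, a solver that handles only a least-squares term plus L1 penalties can be reused without special treatment of the squared penalty.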

Regulatory program transfer between coarse and fine modules

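The fine modules are encouraged to have a regulatory program similar to that of the coarse module in which they are nested. This is accomplished by the introduction of an additional penalty term. We will denote the already learned regulatory program of a coarse module as $w^{C}$ and the regulatory program of a fine module that we wish to learn as $w^{F}$. The coarse-to-fine version of our objective is then

$$\min_{w^{F}}\ \|c - A^\top w^{F}\|_2^2 + \lambda \|w^{F}\|_1 + \kappa \|w^{F}\|_2^2 + \gamma \|Dw^{F}\|_1 + \eta \|w^{F} - w^{C}\|_2^2$$

where the last term ties the coarse and fine modules’ programs. This objective can be transformed into

$$\min_{w^{F}}\ \left\| \begin{bmatrix} c \\ 0 \\ \sqrt{\eta}\, w^{C} \end{bmatrix} - \begin{bmatrix} A^\top \\ \sqrt{\kappa}\, I \\ \sqrt{\eta}\, I \end{bmatrix} w^{F} \right\|_2^2 + \lambda \|w^{F}\|_1 + \gamma \|Dw^{F}\|_1$$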

Solving the prototypical optimization problem

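We note that both the coarse and fine module regulatory program fitting problems have been expressed in the following general form

$$\min_{w}\ \|b - Mw\|_2^2 + \lambda \|w\|_1 + \gamma \|Dw\|_1$$

where $b$ and $M$ denote the stacked target vector and the stacked predictor matrix. We reformulate this optimization problem by the addition of variables that decouple the penalties:

$$\min_{w,u,v}\ \|b - Mw\|_2^2 + \lambda \|u\|_1 + \gamma \|v\|_1 \quad \text{subject to } u = w,\ v = Dw$$

This reformulation enables straightforward derivation of a primal-dual interior point method [Boyd and Vandenberghe 2004, ref. 8].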

Model selection using Bayesian Information Criterion

The formulation of our optimization problem is dependent on a set of penalty parameters $(\lambda, \kappa, \gamma)$. Different combinations of these parameters will yield regulatory programs of different quality. One way to assess the quality of the fits is by using held-out data or through cross-validation. Searching for these parameters using cross-validation is prohibitively expensive, and instead we use the Bayesian Information Criterion (BIC) to assess the quality of the fitted regulatory program. The BIC compares models, here encoded by regulatory programs, based on their tradeoff between data log likelihood and degrees of freedom. The log likelihood for our model is, up to a constant that does not depend on $w$,

$$\ell(w) = -\sum_{t} \sum_{i \in m} \frac{1}{2\sigma^2_{m,t}} \Big( x_{i,t} - \sum_{r} w_{m,r,t}\, a_{r,t} \Big)^2$$
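A Gaussian log likelihood of this kind can be evaluated directly from the fitted weights. The sketch below is illustrative only: it includes the normalizing term of the Gaussian density (a constant with respect to the weights), and the data layout, names and numbers are hypothetical.

```python
from math import log, pi


def module_log_likelihood(expression, activity, weights, variance):
    """Gaussian log likelihood of one module's genes.

    expression: cell type -> list of expression values of the module's genes
    activity:   cell type -> list of regulator activities
    weights:    cell type -> list of regulatory weights (same regulator order)
    variance:   cell type -> module variance in that cell type
    """
    total = 0.0
    for cell_type, genes in expression.items():
        predicted = sum(w * a for w, a in zip(weights[cell_type], activity[cell_type]))
        var = variance[cell_type]
        for x in genes:
            total += -0.5 * log(2.0 * pi * var) - (x - predicted) ** 2 / (2.0 * var)
    return total


# Toy module with two genes in two hypothetical cell types.
expression = {"mac": [2.0, 2.2], "mono": [1.0, 0.8]}
activity = {"mac": [1.0, 2.0], "mono": [1.0, 0.0]}
variance = {"mac": 0.25, "mono": 0.25}

good_weights = {"mac": [0.1, 1.0], "mono": [0.9, 0.0]}  # predicts 2.1 and 0.9
null_weights = {"mac": [0.0, 0.0], "mono": [0.0, 0.0]}  # predicts 0 everywhere

ll_good = module_log_likelihood(expression, activity, good_weights, variance)
ll_null = module_log_likelihood(expression, activity, null_weights, variance)
print(ll_good, ll_null)
```

A program whose predictions track the module mean scores a higher log likelihood than the all-zero program, which is the quantity the BIC trades off against model complexity.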

The computation of the degrees of freedom is somewhat involved but intuitively simple: a regulatory weight that remains the same through a particular connected portion of the differentiation tree is counted as a single degree of freedom. To make this more formal we will consider the matrix $A$ and construct its counterpart $B$. We will use $A_{(r,t)}$ to denote the column of $A^\top$ (equivalently, the row of $A$) associated with regulator $r$ and cell type $t$. We now construct a graph whose nodes correspond to these columns. Given two nodes corresponding to $A_{(r,s)}$ and $A_{(r,t)}$, the graph will have an edge between these two nodes if cell type $s$ is a parent of cell type $t$ and $w_{m,r,s} = w_{m,r,t}$. The matrix $B$ will have columns that are sums of the columns corresponding to connected components in the graph. We eliminate all columns of $B$ that are zeros, and the final degrees of freedom are given by

$$\mathrm{df}(w) = \mathrm{tr}\Big( B \big( B^\top B + \kappa\, \mathrm{Diag}(n) \big)^{-1} B^\top \Big)$$

where $\mathrm{Diag}(n)$ is a diagonal matrix whose entries are the number of columns in the connected component associated with each column of $B$.

Hence we can compute the BIC(w) as

$$\mathrm{BIC}(w) = -2\,\ell(w) + \mathrm{df}(w) \log N = \sum_{t} \sum_{i \in m} \frac{1}{\sigma^2_{m,t}} \Big( x_{i,t} - \sum_{r} w_{m,r,t}\, a_{r,t} \Big)^2 + \mathrm{df}(w) \log N$$

where $N$ is the number of observations.
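The intuitive counting rule (one degree of freedom per connected stretch of the tree over which a weight stays constant) can be sketched as follows. This is the simple count, not the trace formula, and it assumes that zero weights contribute nothing; the tree, regulator names and the `bic` helper are hypothetical. Because the kept edges form a forest, the number of components equals nodes minus edges.

```python
from math import log


def count_degrees_of_freedom(weights, edges, tol=1e-12):
    """weights: regulator -> {cell type: weight}; edges: (parent, child) pairs.
    Counts connected groups of equal, non-zero weights along the tree."""
    dof = 0
    for per_cell in weights.values():
        nonzero = {t for t, w in per_cell.items() if abs(w) > tol}
        tied_edges = sum(
            1 for parent, child in edges
            if parent in nonzero and child in nonzero
            and abs(per_cell[parent] - per_cell[child]) <= tol
        )
        # In a forest, components = nodes - edges.
        dof += len(nonzero) - tied_edges
    return dof


def bic(log_likelihood, dof, n_observations):
    """Standard BIC: -2 * log likelihood + dof * log(number of observations)."""
    return -2.0 * log_likelihood + dof * log(n_observations)


edges = [("stem", "mono"), ("mono", "mac"), ("stem", "lymph")]
weights = {
    # Active only in the myeloid branch, same weight in both: 1 component.
    "RegA": {"stem": 0.0, "mono": 0.8, "mac": 0.8, "lymph": 0.0},
    # Constant except in macrophages: 2 components.
    "RegB": {"stem": 0.2, "mono": 0.2, "mac": 0.5, "lymph": 0.2},
}
dof = count_degrees_of_freedom(weights, edges)
print(dof, bic(log_likelihood=-10.0, dof=dof, n_observations=100))
```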

Postprocessing of regulatory programs

Once an optimal regulatory program with respect to BIC is obtained, we perform postprocessing to remove regulatory relationships for underexpressed regulators. We placed a low expression cutoff of 5.5 on the log2 scale. At this level of expression the correlation between the predictor and the target module may very well be due to noise, and hence the relationship could be spurious.
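The postprocessing step can be sketched as a simple filter. The text does not say whether the cutoff is applied per cell type or to a regulator's overall expression; the sketch below assumes a per-cell-type check, and the function name and toy values are hypothetical.

```python
def prune_underexpressed(weights, regulator_expression, cutoff=5.5):
    """Drop regulatory relationships whose regulator is expressed below
    `cutoff` (log2 scale) in the corresponding cell type.

    weights: (regulator, cell type) -> regulatory weight
    regulator_expression: (regulator, cell type) -> log2 expression
    """
    return {
        key: weight
        for key, weight in weights.items()
        if regulator_expression[key] >= cutoff
    }


weights = {
    ("RegA", "mac"): 0.8,
    ("RegA", "mono"): 0.4,
    ("RegB", "mac"): -0.3,
}
regulator_expression = {
    ("RegA", "mac"): 9.1,
    ("RegA", "mono"): 5.0,   # below the cutoff: relationship removed
    ("RegB", "mac"): 7.2,
}
pruned = prune_underexpressed(weights, regulator_expression)
print(pruned)
```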

1. Blatt, M., Wiseman, S. & Domany, E. Superparamagnetic Clustering of Data. Physical Review Letters 76, 3251-3254 (1996).

2. Novershtern, N. et al. Densely Interconnected Transcriptional Circuits Control Cell States in Human Hematopoiesis. Cell 144, 296-309 (2011).

3. Matys, V. et al. TRANSFAC® and its module TRANSCompel®: transcriptional gene regulation in eukaryotes. Nucleic Acids Research 34, D108-D110 (2006).

4. Sandelin, A., Alkema, W., Engström, P., Wasserman, W.W. & Lenhard, B. JASPAR: an open‐access database for eukaryotic transcription factor binding profiles. Nucleic Acids Research 32, D91-D94 (2004).

5. Badis, G. et al. Diversity and Complexity in DNA Recognition by Transcription Factors. Science 324, 1720-1723 (2009).

6. Berger, M.F. et al. Variation in Homeodomain DNA Binding Revealed by High-Resolution Analysis of Sequence Preferences. Cell 133, 1266-1276 (2008).

7. Zou, H. & Hastie, T. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 67, 301-320 (2005).

8. Boyd, S.P. & Vandenberghe, L. Convex Optimization. (Cambridge University Press, Cambridge, UK; New York; 2004).

Glossary of Abbreviations within the ImmGen Database Relevant to This Study

Terms: DC, dendritic cell; Kd, kidney; LC, Langerhans cell; LV, liver; Lu, lung; MLN, mesenteric lymph node; PC, peritoneal cavity; pDC, plasmacytoid dendritic cell; Ser, intestinal serosa; Sk, skin; Sp, spleen; SLN, skin-draining lymph node; Th, thymus

Markers: 4, CD4; 8, CD8; 11b, CD11b; 103, CD103; 480, F4/80; II, MHC II; lo, low expressing
