ku...immune cell as well as cell-specific susceptibility genes in t cells, b cells and monocytes....

13
university of copenhagen A systems biology approach uncovers cell-specific gene regulatory effects of genetic associations in multiple sclerosis International Multiple Sclerosis Genetics Consortium Published in: Nature Communications DOI: 10.1038/s41467-019-09773-y Publication date: 2019 Document version Publisher's PDF, also known as Version of record Document license: CC BY Citation for published version (APA): International Multiple Sclerosis Genetics Consortium (2019). A systems biology approach uncovers cell-specific gene regulatory effects of genetic associations in multiple sclerosis. Nature Communications, 10(1), [2236]. https://doi.org/10.1038/s41467-019-09773-y Download date: 19. Jul. 2021

Upload: others

Post on 24-Feb-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ku...immune cell as well as cell-specific susceptibility genes in T cells, B cells and monocytes. Finally, genotype-level data from 2,370 patients and 412 controls is used to compute

u n i ve r s i t y o f co pe n h ag e n

A systems biology approach uncovers cell-specific gene regulatory effects of geneticassociations in multiple sclerosis

International Multiple Sclerosis Genetics Consortium

Published in:Nature Communications

DOI:10.1038/s41467-019-09773-y

Publication date:2019

Document versionPublisher's PDF, also known as Version of record

Document license:CC BY

Citation for published version (APA):International Multiple Sclerosis Genetics Consortium (2019). A systems biology approach uncovers cell-specificgene regulatory effects of genetic associations in multiple sclerosis. Nature Communications, 10(1), [2236].https://doi.org/10.1038/s41467-019-09773-y

Download date: 19. Jul. 2021

Page 2: ku...immune cell as well as cell-specific susceptibility genes in T cells, B cells and monocytes. Finally, genotype-level data from 2,370 patients and 412 controls is used to compute

ARTICLE

A systems biology approach uncovers cell-specificgene regulatory effects of genetic associations inmultiple sclerosisInternational Multiple Sclerosis Genetics Consortium

Genome-wide association studies (GWAS) have identified more than 50,000 unique asso-

ciations with common human traits. While this represents a substantial step forward,

establishing the biology underlying these associations has proven extremely difficult. Even

determining which cell types and which particular gene(s) are relevant continues to be a

challenge. Here, we conduct a cell-specific pathway analysis of the latest GWAS in multiple

sclerosis (MS), which had analyzed a total of 47,351 cases and 68,284 healthy controls

and found more than 200 non-MHC genome-wide associations. Our analysis identifies pan

immune cell as well as cell-specific susceptibility genes in T cells, B cells and monocytes.

Finally, genotype-level data from 2,370 patients and 412 controls is used to compute intra-

individual and cell-specific susceptibility pathways that offer a biological interpretation of the

individual genetic risk to MS. This approach could be adopted in any other complex trait for

which genome-wide data is available.

A full list of authors and their affiliations appears at the end of the paper. Correspondence and requests for materials should be addressed to S.E.B.(email: [email protected])

Corrected: Author correction

https://doi.org/10.1038/s41467-019-09773-y OPEN

NATURE COMMUNICATIONS | (2019) 10:2236 | https://doi.org/10.1038/s41467-019-09773-y | www.nature.com/naturecommunications 1

1234

5678

90():,;

Page 3: ku...immune cell as well as cell-specific susceptibility genes in T cells, B cells and monocytes. Finally, genotype-level data from 2,370 patients and 412 controls is used to compute

Translating Genome-wide association studies (GWAS) dis-coveries into functionally relevant biology has proven to behighly challenging. The extensive linkage disequilibrium

(LD) which typically flanks common variants means that mostGWAS identified SNPs are likely to be tags for functionallyrelevant variation rather than exerting any meaningful effectsthemselves. Furthermore, since the vast majority of associationsidentified by GWAS map to non-coding regulatory regions it islikely that the underlying functionally relevant variants only exertpertinent effects on gene expression in particular tissues1–4.Fortunately, better powered studies, have increased the number ofassociations identified enabling biological meaning to be inves-tigated in aggregate (i.e. pathway analysis). In its simplest form,genes lying closest to the most strongly associated (lead) SNPidentified for each association can be grouped into pathways orspecific functional memberships via the use of pre-assembledcontrolled vocabularies (Gene ontology, KEGG, etc)5–8. Thisapproach can be enhanced by using protein interaction networksto more rigorously assess which of the candidate genes encodeproteins that physically interact in any particular pathway9. Usingthis refined approach, we and others have been able to show thatMS-associated genes are indeed more likely to interact in proteinspace10–12. Furthermore, this analysis can be extended to includeassociation signals below the genome-wide threshold of sig-nificance and thereby nominate new additional potentiallymeaningful associations11,13,14. However, the networks of genes/proteins identified by these approaches are not cell- or tissue-specific, thus limiting the usefulness and interpretability of thisinformation.

With the completion of efforts like the Encyclopedia of DNAelements (ENCODE) and the Roadmap Epigenomics Project(REP) a wealth of information on regulatory elements is nowavailable from hundreds of cell types and dozens of differenttissues15,16, raising the possibility of applying network-basedapproaches in a cell-specific manner. We reasoned this approachwould likely be highly informative in diseases like multiplesclerosis (MS) where substantial numbers of associated variantshave been identified. MS is an autoimmune disease of the centralnervous system (CNS) and leads to a neurodegenerative process.Our recent GWAS meta-analysis and follow-up study haverevealed a total of 233 genome-wide significant associations and afurther 416 variants potentially associated with MS17.

Here we develop a framework to interpret such associations inthe context of cell-specific protein networks to identify the mostlikely process(es) affected by the non-MHC associations as awhole. This approach involves (1) selecting independently asso-ciated signals in extended haplotypic blocks; (2) identifying thegenomic regulatory processes likely to be altered by the poly-morphisms in these blocks in a cell-specific manner; (3) com-puting a cell-specific gene score for genes in each associated locus;(4) building cell-specific gene/protein networks; (5) Interpretingthe biological processes most likely affected for each of the celltypes studied. We demonstrate this approach in the latest GWASmeta-analysis in MS involving a total of 47,351 cases and 68,284controls. Furthermore, we use genotype-level data from a subsetof 2370 cases and 412 controls to identify cell-specific intra-individual risk pathways. These individualized scores can be usedas a global risk measure in subsequent associations with moredetailed phenotypes.

ResultsPredicted regulatory effects (PRE) of MS-associated variants.We integrated genetic association signals from the latest geneticanalysis of MS17 with cell specific information on regulatoryelements available from the ENCODE and Epigenomics

Roadmap projects (REP) to identify cell-specific networks likelyaffected by the susceptibility variants (Fig. 1a, SupplementaryFig. 1). We included all genome-wide (GW) significant singlenucleotide polymorphisms (SNPs) together with their proxiesselected at differing LD thresholds (r2 > 0.5 was used for the mainanalysis).

All regulatory information retrieved for each MS-associatedregion was compiled for each cell type in a single mastertable (Supplementary Data 1). This catalogue contains all of theavailable regulatory features that potentially modulate theexpression of each of the genes mapping to each associatedregion in a cell-specific manner. We next used this information tobuild a cell-specific, genetic regulatory network that constitutedthe basis of a gene prioritization scheme within each associatedregion (Fig. 1b). Specifically, each gene within a given locusreceived a score (PRE) that does not depend solely on the closestSNP, but that is equal to the weighted sum of all regulatoryfeatures potentially affected by variation at nearby associatedSNPs (See Methods). Figure 1c shows a heat map representationof the PREs for all genes (n= 2,444) implicated by the GWsignificant MS associations for each of the main cell/tissue types(Individual PREs for each cell/tissue and GWAS statisticalconfidence are listed in Supplementary Data 2-28). Two GWsignificant regions (10 and 21) are shown in larger detail asrepresentative examples (Fig. 1c). Because it integrates actualregulatory information for each associated SNP and those in LD,this approach can prioritize the most likely genes affected by thesame association signals in each cell type analyzed (Supplemen-tary Data 29). In the example, the PREs at the top of Fig. 1chighlight the lead variant defining Region 10 (rs6670198, Chr 1)and those in LD which are likely to affect the expression ofFAM213B and TNFRSF14 in all immune cells (B, T andmonocytes -M-). These SNPs would have almost no effect at allin the CNS and lungs (L), as MS-associated variants are unlikelyto alter their expression in those tissues. For comparison, a simpleproximity approach would have just implicated the FAM213Bgene, which, while being of biological interest, may describe anincomplete scenario. Interestingly, TNFRSF14 (a member of theTNF receptor superfamily) encodes a protein involved in signaltransduction pathways that activate both inflammatory andinhibitory T-cell immune responses due to its particular abilityto interact with multiple ligands in distinct configurations18.

Another example is provided by region 21, defined by the leadSNP rs6032662 mapping to chromosome 20. rs6032662 mapsmidway between NCOA5, a tumor suppressor gene, and CD40,which encodes a well-known member of the TNF-receptorsuperfamily. This receptor is essential in mediating a broadvariety of immune and inflammatory responses including T cellactivation, T cell-dependent immunoglobulin class switching,memory B cell development, and germinal center formation19.While its expression is highest among antigen presenting cells(including those derived from dendritic cells and monocyte/macrophages) its PRE score is low in the M set (Fig. 1c),indicating that, of the cells studied, MS-associated variants onlyregulate its expression in B cells. Altogether, these results showthat this approach is a useful strategy to prioritize genes withinassociation regions.

Protein connectivity among products of MS-associated loci.Previous studies have shown that the proteins encoded by genesaffected by genetic association signals are more likely to interact,in part because they often participate in the same biologicalpathways6,20–22. Thus, we evaluated (for each cell/tissue) howmany of the gene products in MS-risk loci predicted to be posi-tively regulated (i.e. PRE > 25th percentile or PRE-25) also

ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-09773-y

2 NATURE COMMUNICATIONS | (2019) 10:2236 | https://doi.org/10.1038/s41467-019-09773-y | www.nature.com/naturecommunications

Page 4: ku...immune cell as well as cell-specific susceptibility genes in T cells, B cells and monocytes. Finally, genotype-level data from 2,370 patients and 412 controls is used to compute

interacted in a human protein network containing 15,783 pro-teins and 455,321 interactions (See methods) (Fig. 2 shows aschematic of this approach). In addition to the total number ofinteractions we also computed other relevant network metricssuch as the size of the largest connected component –LCC– andthe number of connections –edges- among nodes within the LCC.These metrics, if statistically significant, are often good indicatorsof true biological networks. We found that, for all immune celltypes analyzed, the above network metrics among GW associatedgene products always exceeded those from 10,000 randomlygenerated, size-matched networks. Specifically, network metricsamong the gene products of GW associated loci were statisticallysignificant for T cells, B cells and monocytes (Fig. 3). The numberof interactions among genes related to the CNS were not sig-nificantly different than expected. One factor likely affecting thisresult is that in contrast to immune cells (for which PREs werecomputed on a cell-specific manner) computations for CNS arethe result of 25 different cell types/anatomical regions. This couldpotentially smooth the overall estimate of PREs as the various celltypes within the CNS could be under different regulatory control.

The three significant networks (Fig. 3a–c) shared several of theirgenes, thus constituting a core module (Fig. 3d). The molecularfunctions of most of these genes belonged to the binding (36%)and catalytic activity (33%) categories. Other functions werereceptor (13%), signal transduction (6%), and structural molecule(10%). A PANTHER analysis revealed these genes belong to JAK/STAT, IFNgamma, interleukin, and integrin signaling pathways,among others. Altogether, our analyses suggest that susceptibilityto MS stems from a core of processes that can be active in any ofseveral immune-related cell types. These findings are in agree-ment with the “omnigenic” model of inheritance of complexdiseases, posing that gene regulatory networks are sufficientlyinterconnected such that all genes expressed in disease-relevantcells are liable to affect the functions of core disease-relatedgenes23.

As predicted by the omnigenic model, we noted that severalgenes were only present in some cell types but not others. Forexample, CD28 was only present in the T cell network, ELMO1 inB cells and MERTK in the monocyte/macrophage lineage. CD28is located on the surface of T cells and provides a required

1

1

2

4

2

0.5 0.5

0.5

PRE

PPI

SNP # 123Gene 1

Gene 2

RepressorSNP

Cell type Reg feature

InhibitorsActivators

3.5303

7314

p31.

31p

31.1

12

SKI MORN1

RER1

PEX10

PANK4

HES5

TNFRSF14

FAM213B

FAM213B

MMEL1

MMEL1

TTC34

ACTRT2

PRE

B T M C LB T M C L B T M C L

200 kb hg38

PLCH2

SLC12A5

CD40

UBE2D3TTC34TNFRSF14NFKB1MMEL1MELL1MANBALOC115110FAM213BACTRT2

Region 10 (rs6670198)

RP11-740P5.2RP13-436F16.1

RP3-395M20.7

RP3-395M20.3

RP4-740C4.7

RP3-395M20.8

NCOA5

hg3850 kb

MMP9

CD40

NCOA5

RPL13P2

SLC12A5

RP11-465L10.10

RP5-998H6.2

Region 10Region 21

1q12

q32.

11q

411q

43q4

433

chr1(p36.32)

HighLow

chr20(q13.12)

20q1

3.33

20q1

3.2

20q1

3.13

20q1

3.12

20q1

2

q11.

23

11.2

2q1

1.21

20p1

1.21

20p1

1.23

20p1

2.1

p12.

220

p12.

3

20p1

3

Cell-specific generegulatory info

GWAS(individual genotypes)

GWAS(summary-level data)

Intra-individualgenetic networks

Cell-specificpopulation-levelgenetic networks

Regulome DBTM

report for GWAS SNPs

Regulatory marks

Transcriptional activatorEnhancer

PRENet weight

Gene 2

Gene 1

Region 21 (rs6032662)

a

b

c

Target gene

B cell

B cell

B cell

T cell

Monocyte H3R2M

DHS

TFBS

H3K27Ac

H3K4Me3 Gene 1

Gene 2

Gene 2

Gene 1

Gene 3

Fig. 1 Overall strategy and computation of the predicted regulatory effect (PRE) in MS-associated loci. a GWAS signals were integrated with cell-specificregulatory information to compute PRE at both population and individual level. In a second stage, genes with high PRE at each of the cell types analyzedwere identified in a human protein interactome (PPI) and sub-networks of enriched genes (proteins) were extracted. b Each MS-associated SNP and thosein LD were used as query in RegulomeDB. For each SNP, the all regulatory features were annotated and classified according to type and cell of origin. Agraph connecting every queried SNP (crosses), the regulatory feature (diamonds), and the target gene (circles) was created and the number ofexperiments supporting a particular regulatory feature was used as weight (numbers next to SNP). Finally, a PRE score was computed for each gene bysumming up weights from all incoming regulatory signals for each of the cell types analyzed. c Heatmap represents the PRE of all genes under GW MS-associated loci for cells of interest. Rows represent genes, and columns denote cell types. Colors indicate positive (red), neutral (white) and negative (blue)PRE values. Two representative regions are highlighted. Region 10 (associated SNP: rs6670198, green box) highlights immune-specific (B, T, and M)regulation of FAM213B and TNFRSF14. In contrast, region 21 (associated SNP rs6032662, blue box), shows high PRE only for CD40 in B cells. C: CNS; L: lung;T: T cells; M: monocytes; B: B cells. This analysis represents all SNPs with an r2 > 0.5 of the main GW effect

NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-09773-y ARTICLE

NATURE COMMUNICATIONS | (2019) 10:2236 | https://doi.org/10.1038/s41467-019-09773-y | www.nature.com/naturecommunications 3

Page 5: ku...immune cell as well as cell-specific susceptibility genes in T cells, B cells and monocytes. Finally, genotype-level data from 2,370 patients and 412 controls is used to compute

co-stimulatory signal to trigger their activation after engagementof the MHC-antigen-T cell receptor trimolecular complex24.ELMO1 encodes a cytoplasmic adaptor protein that interacts withDOCK family guanine nucleotide exchange factors to promoteactivation of the small GTPase RAC, thus enabling lymphocytemigration25. The MERTK gene encodes a receptor tyrosine kinase

that transduces signals from the extracellular matrix into thecytoplasm regulating many physiological processes including cellsurvival, migration, differentiation, and phagocytosis of apoptoticcells (efferocytosis). Specifically, MERTK plays several importantroles in normal macrophage physiology, including regulation ofcytokine secretion and clearance of apoptotic cells26. These

# of edges

Fre

quen

cy

MYB

ZFP36L2

HHEX

L PP IQGAP1

TEC

DDX6

MYNN

FOXP1

NDFIP1

VMP1

NANS

PLEK GRAP2

ELMO1GRB2

PPM1F

CAMK2G

CXCR4

CLEC2D

IL7R

CD48

SLAMF1CSF2RB

IL2RA

IDE

CD5 8

STK38

STAT3

SMARCA4

CBLB

DAB2

ZMIZ1

PAPD7

SP140

RCOR1JAK1

KIF11

CD69

TAGAPRPS6KB1

CD86

S1PR1

ZFP36L1

PLEC

SART1

MAPK1GOLGB1

ACTN1

CCNE2

MAPK1GRB2

PLEC

PLEK

CAMK2G

TNFRSF1A

LPPSLAMF1

CD226

ACTN1

IL2RA

SLC6A16

TGFBR3

MADD

MPHOSPH9

GRAP2

IDE

IQGAP1

CD58

CTLA4

CD48

TXK

CD28TAGAP

S1PR1

CLEC2D

STAT3

RGS1

SLAMF7

ZFP36L1

IL7R

RPS6KB1

CD69

RHGAP31

SART1PPM1F

SP140

TCF7

DAB2MYNN

FOXP1

SMARCA4

MYBZFP36L2

PAPD7

RUNX3

ZMIZ1

CBLB

DDX6BATF

JAK1

STK38TRIM14

STAT4

# of edges

Fre

quen

cy

0 50 100 150

0.00209

PPM1F CAMK2GCCNE2SART1 MERTK

TNFRSF1ACSF2RB

MADD

IL7R

RGS1

IQGAP1

TRIM14

CBLBRUNX3

BATF

PAPD7

KIF11

DAB2

ZFP36L2

DDX6

TEC

SMARCA4

STAT3

STK38

JAK1

MYB

MYNN

LPP

TXK

ZMIZ1

FOXP1

S1PR1CD58

CLEC2D

CD48 TAGAP

CD69

RPS6KB1

ACTN1

PLEK

ELMO1

NCF4

ZFP36L1

IDEPLECGRB2

0.00109

# of edges

PAPD7

DDX6

IQGAP1

ZMIZ1

CBLB

DAB2SMARCA4

ZFP36L1

ZFP36L2

PLECGRB2

ACTN1

IDE

SART1

RPS6KB1PLEK

CAMK2G

MAPK1

PPM1F

FOXP1

JAK1

STK38

MYNN

MYB

LPP

STAT3

CD69

S1PR1TAGAP

IL7R

CD48

CLEC2D

CD58

a b

c d

B cells T cells

Monocytes Common module

15/50 (30%) B cell exclusive 12/54 (22%) T cell exclusiveMembrane

Cytosol

Nucleus

Membrane

Cytosol

Nucleus

Membrane

Cytosol

Nucleus

Membrane

Cytosol

Nucleus

3/48 (6%) Monocyte exclusive

**

**

**

*

*

*

*

*

*

**

*

**

* *

* *

* *

*

* *

**

*

Fre

quen

cy

Binding (GO:0005488)

Catalytic activity (GO:0003824)

Receptor activity (GO:0004872)

Signal transducer activity (GO:0004871)

Structural molecule activity (GO:0005198)

200

150

100

50

0

0.00884200

150

100

50

0

0 20 40 60 80 100 120

600

400

200

0

0 20 40 60 80

MPHOSPH9

MAPK1

Fig. 3 Cell-specific gene sub-networks of GW associated regions (r2 > 0.5). Graphs correspond to the largest connected component in each cell/tissuebucket. Nodes represent proteins and edges represent interactions. For each cell type the PRE is proportional to the color intensity (dark: high; light: low).Genes/proteins are organized according to their cellular distribution. The histogram next to each sub-network shows the distribution of the number ofedges of 10,000 randomly generated networks. The red arrows denote the number of edges observed in the corresponding sub-network and the p-value,the probability of observing a more extreme number of edges in a randomly generated network. a B cells; b T cells; c monocytes. An asterisk is placed nextto genes/proteins exclusively observed in that cell type. d shows an aggregate (common) module present in all three cell types. A pie chart describes theGO: molecular functions assigned to these genes and a table describes the nine PANTHER pathways that were significantly enriched

PPI15,783 nodes455,321 edges

M

T B

C

109 nodes71 edges # edges

71500

Genes with highregulatory potential

Extract cell-specificsub-network

p < 10–5

Fig. 2 Network connectivity analysis. The PRE of genes were loaded as attributes in a protein interactome. In the central panel, genes with a PRE above the95th percentile of their respective cell-specific distributions are visualized (M: monocyte, green; T: T cells, red; B: B cells, blue, C: CNS, yellow). For each celltype, the number of edges in the sub-network composed of interacting proteins with PRE above the threshold was analyzed. In this example, the CNS sub-network is composed of 109 nodes and 71 edges. Ten thousand random networks with the same number of nodes (i.e. 109) were generated and thedistribution of edges was plotted along with the number of edges of the relevant sub-network (i.e. 71). A p-value was computed to evaluate the probabilitythat this number of edges was seen by chance

ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-09773-y

4 NATURE COMMUNICATIONS | (2019) 10:2236 | https://doi.org/10.1038/s41467-019-09773-y | www.nature.com/naturecommunications

Page 6: ku...immune cell as well as cell-specific susceptibility genes in T cells, B cells and monocytes. Finally, genotype-level data from 2,370 patients and 412 controls is used to compute

observations suggest that, in addition to the core susceptibilitymodule, at least part of the risk is cell type-specific.

We next performed a sensitivity analysis by testing separatelyGW, statistically replicated (SR), and non-replicated (NR) effects(see Methods) SNP sets at three different LD cut-offs (r2 > 0.1, r2

> 0.5, and r2 > 0.8) and three PRE thresholds (PRE-10, PRE-25and PRE-50) (Supplementary Data 30). As anticipated, althoughthe most significant network metrics were seen in the analysisbased on GW signals (Supplementary Fig. 2, panel A), somesignificant network metrics were also seen in the analysis basedon the SR SNPs (Supplementary Fig. 2, panel B). This confirmsthat in well powered studies variants with evidence for associationjust short of GW significance may still represent real effects14. Incontrast, the connectivity of networks obtained with SNPs fromthe NR set was usually not significantly higher than that ofrandom networks (Supplementary Fig. 2, panel C). As expected,including the wider range of SNPs implicated by relaxing the LDthreshold down to r2 > 0.1 resulted in less significant networkmetrics, suggesting that including less robust proxies introducedmore noise than signal.

Individualized PRE correlate with gene expression. The detailedmapping of regulatory information for each SNP suggests thatif PRE are computed for a given cell type in a single individualbased on the carriage of relevant risk alleles, these valuesshould capture a non-negligible proportion of the variance ingene expression in that cell type. To test this hypothesis, weinterrogated the expression of the entire transcriptome of FACS-sorted CD4+ T cells, and CD14+ monocytes from 25 MS patientsby RNAseq and then assessed the correlation of their genotypedependent PRE and their actual gene expression in each celltype separately. Our results showed that the correlation observedwas in all cases significantly higher than what would be expectedby chance if these metrics were independent. Furthermore, thecomputed correlations were always higher for the matchingcell type (CD4/CD8 expression with T cells PRE and CD14expression with monocytes PRE) (Table 1). The average corre-lation between RNA expression and PRE within the same celltype was 0.331 (CD4 vs. T cells, p < 10−300, linear regression),0.324 (CD8 vs. T cells, p < 10−300, linear regression), and 0.246(CD14 vs. monocytes, p < 10−300, linear regression), representinga significantly higher than expected value for each cell type.Correlations between PRE and RNA expression of mismatchedcell types were significantly lower. These results suggest that thecomputation of PRE can be applied to single patients and indi-vidual scores can be generated for each of them.

We then used the genotype-level data from one of the GWASdatasets (UCSF, Supplementary Data 31), composed of 2370patients and 412 controls, to compute cell-specific risk scores foreach individual using the same pipeline used for the population-level data. Rather than considering all 200 associations, this

personalized approach takes into account the specific risk allelespresent in each individual and thus enables exploration of subjectheterogeneity in a biological context and in a cell specific manner.Hierarchical clustering of subjects and heatmap visualizations ofthe PRE of genes under regions 9, 49, and 53 (r2 > 0.5) for all celltypes in this subset of cases and controls are shown as an example(Fig. 4). The heterogeneity across individuals in the PRE of agiven gene can be readily seen for all cell types. As expected forcommon variants, cases and controls (denoted by red and greenhorizontal bars in the leftmost column) do not cluster separatelywithin any single association region. This analysis provides avisual representation of which genes are most likely affected bycommon variants associated with the disease in those individuals,in a cell-specific manner. The PRE for most genes in eachheatmap show ample variability across individuals, highlightinggenetic differences in their susceptibility to MS at this locus.These maps also reveal which genes are most likely to be affectedin each cell type by these common alleles. For example, whileassociated variants near the gene EOMES (region 9) potentiallymodulate its expression in T cells, this gene is less regulated in Bcells and monocytes and strongly silenced in CNS (Fig. 4a). Thisis consistent with its function as a critical transcription factor in Tcell differentiation27,28. Interestingly, higher PRE of these variantsare observed for Th2 cells than for any of the other subsetsanalyzed. Previous reports revealed that EOMES expression limitsFOXP3 induction, thus effectively reducing Treg populations29.Genomic variation at this locus resulting in dysregulated EOMESexpression in T cells and NK cells might be a critical mediator ofthe risk to MS30.

Another interesting example is the high PRE observed for thegene CD40 (region 21) preferentially in B cells for most subjects(Fig. 4b, pink boxes). This finding is consistent with the criticalrole of CD40 in B cell development and maturation, andindicates that MS risk affecting B cell biology is higher in somesubjects than others. Furthermore, individuals carrying the riskvariants within CD40 (rs4810485*T) have been reported toexpress lower levels of CD40 in the surface of their B cells andlower IL-10 levels31. This could carry therapeutic implicationsconsidering the prominent role of B cell depletion therapy inthis disease32,33. Another signal revealing B cell involvement inMS risk is the high PRE of all members of the Fc receptor-like(FCRL) gene family in region 53 (Fig. 4c, yellow boxes)preferentially in B cells for most subjects, consistent with thefunction of these gene family as regulators of proliferation in Bcells and phagocytosis34.

Intra-individual risk networks are more connected in MS.Finally, we integrated individual risk regulatory scores with aglobal protein interactome to compute intra individual, cell spe-cific risk networks. We hypothesize that these networks wouldprovide a step forward in the description of aggregate persona-lized risk scores by representing risk in a biologically relevantmanner35,36. Furthermore, building risk profiles with pathwayand cell specific information describes more accurately the biol-ogy potentially affected by risk variants inherited by a particularindividual. We also hypothesized that similarly to what weobserved for cases and controls at the population level, moreinteractions among proteins encoded by risk loci would beobserved for cases than for controls. This was indeed the case, aswe observed statistically significant differences between cases andcontrols for the three main cell types tested (the CNS was notsignificant as shown in Fig. 5a and Supplementary Fig. 3) (Sup-plementary Data 32). The largest number of intra-individualinteractions among gene products with high PRE was observed inmonocytes, followed by T cells, B cells and the CNS (Fig. 5a

Table 1 Correlation between regulatory potential and cellspecific global gene expression

PRE CD4 r (p-value) CD8 r (p-value) CD14 r (p-value)

Monocyte 0.225 (p < 10−300) 0.218 (p < 10−314) 0.246 (p < 10−300)T-cell 0.331 (p < 10−300) 0.324 (p < 10−300) 0.253 (p < 10−300)B-cell 0.204 (p < 10−248) 0.204 (p < 10−247) 0.219 (p < 10−287)CNS 0.108 (p < 10−75) 0.112 (p < 10−82) 0.120 (p < 10−93)Lung 0.117 (p < 10−86) 0.122 (p < 10−94) 0.133 (p < 10−112)

Bold values indicate the corresponding cell type for the RNAseq results (e.g. CD14 RNAseq datacorresponds to monocytes, and CD4 and CD8 corresponds to T cells)

NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-09773-y ARTICLE

NATURE COMMUNICATIONS | (2019) 10:2236 | https://doi.org/10.1038/s41467-019-09773-y | www.nature.com/naturecommunications 5

Page 7: ku...immune cell as well as cell-specific susceptibility genes in T cells, B cells and monocytes. Finally, genotype-level data from 2,370 patients and 412 controls is used to compute

shows results for PRE-25). This is consistent with the largersignificance of these risk networks observed at the populationlevel (Fig. 3). Interestingly, PRE values correlate with the globalpolygenic risk (Supplementary Fig. 4) but it uniquely enablesidentification of high-risk and low-risk individuals in a cell-specific manner. Figure 5b highlights four case:control pairs withdifferent risk profiles in each of the four main cell types/tissuesanalyzed. For example, subject 201100870 (case) is at the 99thpercentile of the distribution of network edges for monocytes(120 nodes and 271 edges). In contrast, subject 20020214 (con-trol) is at the 1st percentile (61 nodes and 98 edges).

Another interesting observation emerging from this analysis isthat subjects at the extremes of the distribution of intra-individualinteractions (a proxy for their overall risk) can be identified foreach cell type. For example, subject 201327986 has 110 nodes and190 edges in this subject’s B cell risk network (blue box),corresponding to the 99th percentile of all cases (Fig. 6). Incontrast, the corresponding percentile of the number of edges inhis T, M and C (red/green/yellow) networks is substantially lower(47th/49th/66th). In line with results observed at the populationlevel, individual CNS risk networks are consistently smaller andless connected than those from B cells, T cells and monocytes.Although CNS is still the least connected network in subject201101897, with 118 edges in its CNS risk network (yellow box),it ranks in the 99th percentile of all cases. In contrast the

percentile connectivity of T cell (51st), B cell (29th) and M (75th)risk networks for this subject rank noticeably lower.

While the number of interactions (edges) among proteinsencoded by genes in associated loci was variable across celltypes, on average, more interactions were observed for cases thanfor controls in all cell types studied. Supplementary Fig. 3 showsthe significance of testing different network parameters betweencases and controls across a wide range of conditions. Similar towhat we observed at the population level, significant effectswere also seen for loci with less than genome-wide evidence forassociation, suggesting that some of those variants consideredless strongly significant, also confer risk.

In summary, this analysis underscores the notion that total MSrisk is not only carried by accumulation of risk alleles, but also byhow the genes and proteins affected by those polymorphismsinteract within each cell type. We anticipate this model couldapply to other common diseases.

DiscussionIn this work, we provide evidence that integration of associatedvariants from GWAS with regulatory information and proteininteractions, provide plausible models of disease pathogenesis.This approach not only offers a data-driven solution to prioritizewhich genes within a locus are most likely affected by the risk

Region 9

B cells Monocytes

Region 21

B cells Monocytes B cells MonocytesRegion 53

CNS T cells CNS T cells CNS T cells

Th1 Th17

Th2 Treg Th2 Treg Th2 Treg

Eom

es

Eom

es

CD

40

CD

40

FC

RL4

FC

RL5

FC

RL1

FC

RL2

FC

RL3

FC

RL4

FC

RL5

FC

RL3

FC

RL2

FC

RL1

PRE

High

Low

Eom

es

Eom

es

CD

40

CD

40

FC

RL5

FC

RL3

FC

RL4

FC

RL2

FC

RL1

FC

RL4

FC

RL5

FC

RL3

FC

RL1

FC

RL2

Th1 Th17 Th1 Th17

FC

RL4

FC

RL3

FC

RL5

FC

RL4

FC

RL3

FC

RL5

FC

RL4

FC

RL2

FC

RL3

FC

RL5

FC

RL5

CD

40

CD

40

Eom

es

Eom

esE

omes

Eom

es

CD

40

CD

40

a b c

Fig. 4 Individualized PRE computations for three representative associated regions. Each row represents an individual (out of 2370 cases and 412 controls),and each column represents a gene within the associated region. Region 9 (a) contains the gene EOMES (green boxes), region 21 includes CD40 (pinkboxes) (b) and region 53 (c) the FC receptor-like cluster (yellow boxes). The leftmost column denotes subject status (red: cases; green: controls)

ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-09773-y

6 NATURE COMMUNICATIONS | (2019) 10:2236 | https://doi.org/10.1038/s41467-019-09773-y | www.nature.com/naturecommunications

Page 8: ku...immune cell as well as cell-specific susceptibility genes in T cells, B cells and monocytes. Finally, genotype-level data from 2,370 patients and 412 controls is used to compute

variant(s) but also provides an interpretable model of risk in acell/tissue specific manner. While the PRE scores computed hereare significantly correlated with actual gene expression from thecorresponding cells, the correlation is partial, thus underscoring apotential limitation in our approach, and of available data.However, the statistically significant results obtained for genesand pathways known to be involved in MS further validates thisapproach. Indeed, the models presented here are consistent withMS genetic risk being driven by the long-term alteration of cel-lular pathways primarily in monocytes, but also in the B and Tcell arms of the human adaptive immune response. The smallerbut not negligible contribution of CNS pathways to MS risk is inagreement with our previous analysis17, which identified themonocyte/macrophage/microglia axis as a key player in directingthe autoimmune process to the CNS. However, an importantcaveat, particularly affecting results for this compartment, mustbe taken into account: for this analysis, the CNS group wascomposed of a heterogeneous ensemble of purified primary cells,established cell lines and dissected specimens from specific

anatomical regions as available in the ENCODE and REP data-sets. Although all derived from CNS tissue, it is highly likely thatdifferent regulatory mechanisms are at play in each cell type (insome cases resulting in drastically different expression patterns),thus somehow confounding the overall CNS signature computedhere. Thus, the detected effect of MS-associations which map tothe CNS could represent the lower boundary of a more wide-spread phenomenon. A more detailed CNS-specific data setcontaining genome-wide regulatory element information mightbe needed to address this question in larger detail.

In the last few years, several post-GWAS pathway approachessuch as DEPICT37, FUMA38 and PASCAL39 have been proposedand utilized to interpret and integrate summary statistics into abiologically meaningful model. While sharing some of the basiccharacteristics of previous approaches, our method features a setof unique properties, most notably the introduction of data-driven regulatory effects of associated variants (and those in LD),the ability to create cell-specific networks, and the computing ofindividual disease burden maps (Supplementary Table 1). Given

E-LCC

C

Monocyte

1 200

CASEN : 120E: 271

P : 99

N : 61E: 98P : 1

Subject_id: 201100870

Subject_id: 20020214

CTRL

T cell

CASEN : 99E: 172

P : 94

N : 53E: 60P : 1

Subject_id: 201102205

Subject_id: 20020214

CTRL

B cell

CASEN : 110E: 190

P : 99

N : 29E: 45P : 1

Subject_id: 200723721

CTRL

CNS

CASEN : 59E: 91

P : 85

N : 9E: 8P : 1

Subject_id: 201100870

Subject_id: 201328146

CTRL

Subject_id: 201327986

B M T

19982731201102551200723719201410965201101309201411024201420499201100828201101538201100705201410793201102513199828002011006832013279242011008212014108282013279452002017020072368920141088119982798201420510201328017201410731201100738201102483201411084201410860201501058201420436201420495201101110201102468201410665201102160201101898201420538201410627200614858201102646201100830201410652201501082201100803201102431201102138201102066201410775201420424201410974201410951201410753200614243199944422011020892014108072011026332014108792011019182014109702014107592011027442011009512014204272013280252014110491999446520132800420110207120132809620110137720072373319994566201102197201328046201410829201102388201420533201102613201328130201410791201410651201410690201328063201411001201420546201101993201411153201410720201420509201410948201410774201100818201101063201102180201102631201410983201410756201411037201328117201101793201101902201102013200001642004056782011025912014110022013279842014107172014108802002021420110211320150106320110252220110098620132816820110195320141104420150108020061424720027304201100995201101873200614163201101101201101709200723721201410620201101944201102081201327999200614653201410696200308557201327996201102215201102368201101053201328018201327925201100735201102558201101984201410683201411043201410885201101818201101703201410834201328134201411096200614785201411069201100744200723743201328152201102594201327934201102300201100816201410980200614548201327974201501094200614294201102414201327915200201762014109792013281352014204892011009052011027152002062320110209320110239820110228720141096720110087620061466320061458420141072720110105820141090120110262220110164320110218820141113520061537120110261520110242720061464820132811920110193520110111620110084320110241720132814620110198920110096619982725201101013200614641200724285201327963201410637201102532201100772201101789201101998201100736201101939201100883201102193201410695201102629201328019201102176201410959201501067200117142011015212011008122014107162011026812014110592014110152011023542011025212014111032002016120110086320141086120132808420110109420132813720110076520142047520141091620061461420110125020020189201102405200723704201101364201327981201101439201100925201102719201100885199827322014106472000027220110098920142050220061460320132805920110072420141077220110074320110228920141062620132803220141100520142045320110245120142051520110220520110260520110097420072333120061428620141091020110196720110261820110241020141062820020265200614298201328039201102136201102485200723405199943582013279382011010662006140422011008192013281512014110982014110932011014832006141822006148182006145112014109692011021052011023752011018862013279942011018602011027082014110652011008242011020242011022502013280792011022662014108942011020732011021952011014742007237232011026512011009832013278902011018702011019382011010602007240272011020902011019232006147732011025722011021922011010982013278932014109152014107281996062620061422520132788520110209220132792720110197620142046320110241520132797520110111720110069820110102920061363820110207820110242520110261120110268020110194020132807320110069520150108620110110520132802420110178020141084620110125620141072320110223920110091920150107520110181320141065920110083820072367320110193020132814219994510200204352011010472011018832011017062011025962011024262011019742011007592011019282011023782011022692013280002011025002011020512013280532006141282011017272013280572006142562014109662014106552011010002011007792011018672011026522006142732015010692014110342013279002014108402013278992006141002007237402011007542006155702002002320110264720040094719994450201501068201100771201410878201410779201420526201100787201101084201100807201410722201102088201327939201101342200724011201102272201100958201410737201410930201410685200404951201420422201102282201411057201102039201101050201100945201420451201328104201410928201102127201410734201101333201410618201100868201102217201102716201100917201410923200400887200614179201100813201411173201410754201101947201102580201411066201411023201102634201102219200400722201410865201411151201102627201101601201328089201100948201328094201328050201410918201102303201101942201102003201328112201101092201102222201101091201101548201410971201102482201102599201102743201410650201410770201101065201420516199827922011022082013279462011006492011007892011011872006148512014109852014107022011019562011026652006155682013279222014109612006148612006141292011018972011010702014110912011010422011023192014204422011026572011020302014111502006140632014108522006140952013281582011009772015010662006145852011019142011008532011021252014110422006140882011024912011020042013281362011006692014107492013281622011023442014107112011026372011026892013281411998278920110260920110254920110190020141106320110102220132796620110063720110074220110086420061426520132794120011739200148252014110112011007002011026682013280222011026012011010552011018442011008522006141982011024402013281542011014762011027212014106742011007932014108632014205252011018412014108162011008882011021572011006902011010872013280092011026432011020822011020182011008332011020422011016512011009352014106172014108422014109202011006742011017022013279912015010572013280302011021182013279082011021832011020692002028220110090819982785200724010201101890201327933201101927201420491201100810201102602201101100201100873201327990201411100201102216201101707200205122011018752011012992011024892006155882006140842014111682011018652014111382014110192011014982014205312011007962013281102013280342011007562006148342011021702013279832011026202011025362011007342011023182014109262014107102007242152005190532011010792013279142006141312011018422011010722014204472001551020000085201102664201101530201100758201328122201101037201102690200211408199943272011010712013278712011025642007237252014111572011018762011006862011010492013280442011018882011008492011023712011010242011014162014108572011010022011020942011025432013278702015010592007233382011013962011009242011021072011007102011024442011020682011022962004055612011009522014111372014108352006142502011018642011008552006146112007237062011025112011007702011009602007233212011018612011024502011010452011016222014108582011027392015010832011009072003085392011018741999431920141072620110179620132790720132794220110243020110204520061416220110194820110251820141094420110223320141094620110255220061481620072335120132793120110223420110197220141062920061514120110224820141114920110235320141081020000089201101233201102654201101282201102559201101016201411030200614229201101480201101851201411086200723716199943802011008592014106222014107302002023020061427620110179220110225720132800120061462220132797220132787719890304201102063201411039201327847201101734201102705201100769201410670201102263201410934201102692201411079201102102200504487201410752201328131201327880201410851201102635201410888201102367200614124201102476201100799201411083201410762201410855200613608201410769201327928201411115201102497201101028201102110201101499201410744201410956201102475201100783201411064199945191999549919994439201328035201102508201410823201100918201420545200723739201102361201420535201327950200615145201411085201102288201102048201102674200614160201328125200614541201102111201100940200614274201102374201102547201410634200615569201101294201102292201410975201410646201410689201101056201410820201327831201102324201328108201411087200613722201410825201327918200723722201410832201420462201327962201410700200614804201410836201328155201327876201102630201102341201102624201102435199827582013278392014108152006141572006151442014106731999437220150107120020211200723496201100961201420520201101990201411041201411102201411117201101446201410718200148292006142842014205272014107862011019612014109822011024322014106622013279792013281572002020520132784120110104120141109920110083620132807520061430020141114520110203720150107020040511620110070420141112720110089220110226520110093720072401820110188120141090220141069320110242820141088420072371520132806220110253520141095520110231620110170420110237920061556120141064920110246120141105320110237720072370320132788220141063620110143120132787320110205020110253920110092620141115920110171920110218520141079220110217420110248720110239420110164220132799820110192920072336820132789720110240120142046920110222620110216420110066720110184920061557520061457420141099120110093219982750201102046201420417201410824201102033201411155201100921201328015201420528201410990200614542201410999201102060201101460201100946201102636201410895201102525201328037200400920201100979200614862201410736200723695201101994199944592011016702002025620141065820110229120110223020110225920110105920110227520110077820110103120110236020110196220141084120141079920110243720072373120110078620110105720110131820011781201101119199943182011026782014108172014111292006148642014107332014108622011009432003085492011010092014107972011021202011023642011010542014106631999416720110086920141095720014823201411060201411068201102274201102534201327987201102606201102005199945022011015502014108332011016252014106752011020702011025892014109632014109062011019871998275720132791020110099820110164520110223620072328220132790120110123220110210920110143220132802720020396201102655201101858201410653201328008200723745201327954201101924201328072201327943201410819201102438201101020201420512201100854201411007201328107199828052011023362014108772014107512011012712011015132014109422011023822013281602014111162011008612015010722011022842013279882014107612014111342006145592013281382006146672014107081999446220072329620141089620141091720141112020110103020110077320141089720132798520110095620110235820110270720141102620110259220110217920141065420141087420141116020110266020141105519961103199945422013281032011023122013281232011022702011024842011020652011023312014109972011011672014204982011024162011008462013281402014106612014108222011027002014107822011025382014110452004050382011023042014204962006146162011019062011023902014108382014111072011019662011026722000508120141085420141092120000518201102621200614297200614279201102238201101097201101992201102023201100949201410995201327898201102709201102391201327834200049032013279522006142662014109532011016202013279492011017122011007532011023962011012552011013082011022281999445620110105120020028201100794201102457201410868201101095201411035201420445201102471200723661201410643201420452201410757201410780201102207201410639200723712201420504199203992011012092014107152013279692014109412011021552014107192006140662011027042001567320110227820141085020141064420072406920040547120110253320110231320110199120061484720141111120110192120061461320110243420110252320061514020110242020110094720110150520132789120110147320110191620110189220141088720132791120110133120142044020061451420132799320110087420110162720141107320141102120141107520110202920110184020132793720110267020061522620061417420110101020061426120110258520110093420110098220141105120110201120110214720142042020142043320061428220110090020141093220110218720132799520110246320132807720142048820132804720132816420132793220110254120132804519982145201100858201101478201328143201410694201100643201410640200200882014106322011023302014111582013281142006146332011025452011019312013279292006155742011018952011021142007233502011024782011026912011023832006142692011011362014204942006140902011010172011010122011011042014109772006141662007237292007236622006155892011019972014111432014109932014204932011020982014111822011020092006155832011025372014107052011025872011026402011022322011016102014108912013281182014111092011019812014106912011026382011020862013280862013280882011027342014109452014108302015010852011008702011022181999470720110198820110076420141091119982713201102326201102140201410996201101138201101618201102357201328010201102171201328115200614089201410648201101401201102268201101121201100715201101479201102149201101767201100941201410732201410849201420461201100884201410831200614169201410803200154082006146122013280832011026122014107352013279672011011982013280922011020142011011272011026612011022642011016711998274520110103320132798020061474220141117720110250120110215820110208420141098820110099120061468020110076120040512220132804820110239920020109201420534201101887201101878201102017201102571201420521201410811201100913201100914201102074200614240200402542200204902013280902011013152002731720110197020110118120020242201420523201102737201102227201411082201101099201420460201100902201102597200614748201420485201411046198817552013279592014204832014111392011013892014110562014110782011025552014204802011014552013278832001171620110121420015405201327895201410789201102348200614221201501073201328133201101667201410668201410909201420529200614159201501093201102142201101472200615594200308544201420537201102717201420458201101784201328016201102199201101329201102542201327953201102241201101433200614579201100721201410859201101932200308553201102677201102143201100994200001222011021532011017872006142492006146042011011062013281132006142912014110142006142772011019802011022792011007462015010772011018392001174120072330220141100820141068020110195920061404920110221020110251720110208520141066720141084320110063620110201620110198220132801220132806620110244620027301201101790201328163201102519200154882006142522013280702011010802014110292006140602014204542014205412011007622011026562006142552014110222011024362011024862013280402011023732015010762014110322002112992002020020141079820110270620020219201102161201101771201101821201328068201328041201410765198900612000496420132803120110214820110267920110258820132789420132787220110092720110122320141087220110232820141114720110257020132809920110126220072374220110133220141066620110191019994486201102550201102320201100894201101077201100672201100635201327936200614168201411104200614057201101648201411122201102583201102315201101885201102064201101085200723250201328165200308548201100996201327951201102710201411074201101673201102168200723727201100938201102351201501087201100814201102576201410837201100777201102206201102403201101564201327965201101641201100964201100879201410900201411061201100675201328111201101855201101905201102465201102220201102697201411076201327977201411175201102106201411052201102442200405762200614629201102502201101068200723308201328005201100668201101447200202512011019712011023232011026932011024492011008652014107832011025161999468820141093120110074120061426320110197720110229820110073719994223201102386201101644200007502014108532002008720110151420061465520132809120142045020132810120141115420141076020110174220011751201501074201410962201102212201102581201327930201328014201101046201101523201100694200615563201410914200723732200200982011006462014108002014107142011020552014111192014205362014204312011025532014111122011012682011026172011018452007232652013279862014108182013281212011024732011013002011022762011026501999445320110251420142051120110257320132797020110078420110072720110189320141083920141114820110197520110117920110192520110067820110163920061421820132797120110247920061428020061402519960638201411097201102225201328106201100933199954682011015352013280212014111232011010142014110802014106812006141272011023812013281592011026962014106332011014002011023422011023112011021672014110122011020672013281292013280612011019792013280852011024132011007252011022422014110282011012452011022542013280712013281272011021592013280282011007682011009042014204382014110002011009992007233492011020562011010262014111422013280062014109222014106132011025292011024002011022832011008902011009652011017812014110812011021862013280652014111012015010792011022022006146692006148452013279402011007492011016192014109892014108122011019132011023292011012812014108671998280820110234020061428920110202120110236520110099720110082619994628200308519201102456200614215201327887201411070200614171200211302200615564201420497201411025200615541200723187201410984201102054200723345201410741201327989201410713201102423201410638201101027201327921201101074201101710201410814200614258201102703201101872199827752013281322007237442015010892011020582011026952011008442006138692011008572011025602003085551998274920110063420110092020141117020141080820110185320072353420110065720141067920110257720040640720110217320110233419994695200614085200614662200614863201101891201420446201102632200614591201102260201411136201327916201102441201102595201100726201327917201410954201411113201102203201101788201102184201410889201100891201410925201410919201410940200614865201410949201410664201101062201102472201102598201410781200211348201410903201102345201501084201101907200723728200201822011026942011007522011010751998215420110271120141108820000236201101960200308524201327973201410973200405759201102574201102141201328067201102131200402498201411152200155482011021162014108132014109332014111562011027142007238222011015572011015622013280032014108932013281162014108922011009122006146082011010902004052932014107482011012722011021782011022112014106922014110362013279822014108762014109722011021632013280582011026002011010782014107472011019502011017052011009672014108062014107422006142032011021031998278820141061920061508020141116120110226720141113020110192220132788420110220120110268620110185220141064220110251519890060201327960201101672201101290201102741201501060201102494201102252201100656201501088201102077200273682014109292011009442014204252011023502006146502006148372011009392011020592007233112014108212011016502014106122011009842011008772007237362014109382014204392011010322014106412011024292007237342011023622011009811998279320110179520132803620141070320110098720110265320141101620141067820110189620061420820110069620141061419982780201101061201100985201102528201100820201100980200723724201102189200211318201100955201410763201102146201410698201102255200308512201411013201101086201410845201100745201420500200614640201102455201100711201420443201102363201102237201328026201410746201102445201420508201100856201101786201102460201102317200723701201102676201101239201410630201411058201101904201101246201420530201102221201100718201100650201420505201102122201100988201100645201411179201410785201410875201101048201501064201420543201328161201102372201101185200615593201102285201410738201101652201410960201501081200614193201101894200614803201328100201411162201410773201102488201410899201328148201102682200117312013279232011024662014111412011021292014204592011007912011018622006148482011027382011019202014106162006148352011026422011025782011010252011010112014109392011024812014204822014110892014106602001364420110203120061550720061478820061425420110215020132788620030850120110078220141087320142048619994307201102607201102616201411092201101043201328109201101824201410701201100691201410987201411108201420432201102144201420476201102273201101783201410743200723219201100837201501061201102507201102166201102347201328097201410958201420466201100860201102467201327875201327912201101355201102333201101868201102406201411106200405816201100780201102041201100775201102421200614113201420501201101453201102262201102213201410725201102584201101778201101151201328056201410677201100689201102346201102732200200312014111262011013762011024072014106722011026632013280202013279762014106712011024022011023272011024982011008482011016652014109982014109042006142852011009732004025202011010822006155332014204282015010782011007392011008042014106352011019582011014711998272820141098620141116420110209520150109120132806420110203220110252420061414020110091020142044820110210820110107320110266720141063120110148120141112120110163320142053920141097820110232220061500420072370220141112420110096920110217520110254020110100620132804220141117220040571720050453120110188920141116720110106720110207920141080520141061520110229920061461820132783720141106720061421120110088720040555520142051720110133020142044920110199520110166920141088220040075220110089920110185020141113120132808720141068420020382201327874201102385201420437201420423201101912201102452200614178201101107201100751201102586201328126201420457201328076201410699200723710201410947201410611201420542201102411201102196200615163201328120200202682013279782013279092014107772011017852011013682011020082014204192011022472014109052011010072013281282011012002014111762003085072014204872011020402014205442011025122011020072011026042013281452011023922006148552014106242011027312014110102011021042011009542014111462006146652014204212014109642011009302011019462011024122011023352014111402011024702011020622006142442014109922014110272014107842014109082014107882014110482013279352011025102014106232011023952011022442011013662007232892007237262014107782011019862014109122011026102014204792011021902014109762011009752006155922011026852011011632014109242011020062011007122006140502013280742006145022011022712011022772011022042002005820132808120110113520132800720141106220110210119994653201101945200723445201100692201102258201327889201328098201411105201102666201420418201420465200266851999440920142046720110186620061452920110195220110230720110235220110236620110091520110064820110271220110199920132815620142047820141094320110199620110084720110092920110111520141068619891044199944712014108692011027352011021331997545120110232520132794420110075720132801120110206120110234920110228120110248020110095020110067720132792020141079520110186320132799720061483820061462420061458920072372020110246420141086420141077120110166620110229020142050320142047120142049220132804320110233819960018201410907201420455201102527201420540201102671201102356201100840201411072200117582014107642014108042011007602013279192006146072011008711999436520142053220142051920110188020020186201410745201102097201102389201100992201411125201420441201328069201102134201410740201410704201101791201411071201327902201501090201328150201410625201100641201100795198918552014106762011023142011025032004058612014107092007237132013280802011007332013281022014110942011008222014204722011006622011020102011019112011020532013279472014204352011025792006147122003085062011006442011007222011025052013280382011024332011024432011021392004056962011022932011026732006146102014110172011027302014204342006141892011019852013279132011007062011026082011025682014108262014106452013278791988159420132809519994491201411031201327968201411009201102736201102309201101901201102308201101827201101955200723656201101325201101597200266782013279062014107582014108902011027012007237082011014252007239782013279482007237092013280332011026392011021321999554220061422320141066920141079020061458120110143620132801320110260320110231020110187720110130320110097120110262820141073920141080920072339720141076620142050720061447720110256720132784020110204720110202820141103820061479920132816620110069920132790520110073020061457520132813920061473120141101820061483620110200120132816720061380020110256620110210020110257520132792620061557120110081120061457020061458620110268820110247720072334720072365320061462020141088620141105020110228620150109220110245320132786820141072920061413020141104720110108320061483320110256920110217720110254420110207620072373520141098120141080120141068820061556220132808220072332720110169920061429520110091120110102320132795520110196320141111420132786920141096820141095220110184720132788120110204320141087020141091320061418020110274620141068720141084720061407620110270220142046820132812420141079620141116620141092720110250620050455520110218120110223120110255420061385220132783220110120120110230120110095920110222320141084820110246220061478420110087520110269820110067120110230520110166420110190920061427220110222920110186920141109520132789620110192620141077620141082720110205720110225620132807820110064020110215620110068420110195720110120520110185720040574120110193720110191520110213519982776201410950200724012200614164201101410201411077201420426201102297201327904201410707201411144200723737201102582201411132201410768200614775201101859200614598201410935201101004201102124201101647201101978201101096201327903201102080201100872201410697201411020201102469200614153201420514201101502201100880200614609201102409199944452014111102014108712011010052014106822013281472011010182013279922014204442002112962014107502011026871989014720142045620110239320061428820110239720061472620141078720110120620110078520110253120141117420030850820110115220132805420141116920141112820142048420132802320110261920020272201420429201328144200615566201410856201102562201328029201100901201100923201410657201410937201100957201102165201102495201410844201102614201410621201101468201102644201410794201102496201102214201101081201100845201102035201102022201501056201420477201410936201102243201411054201102151201327958200614752201101089201102117201410866200614267201420518201102563200202432013281492014110032011021192014111182011011032011019652014111332011026752011024242011009422011006852011024182013279562011009632011010762002668320132809320141100420061464720110219120110208320061425920110249220110215420132787820141100620110194320141072420110264120141088320110213720110229520020094200724239201102490201327961201420513201327888201410712201410656201102249201101882201102419201410767201420473201102546201501055200614099201411033201102152200614683201100928

a b

Fig. 5 Select case-control intra-individual MS-risk networks. a Number of edges in the largest connected component (LCC) of the network generatedamong proteins (genes) with high PRE (>25th percentile) in 2370 patients and 412 healthy controls (GW_r2 > 0.5). Each row represents a subject, eachcolumn represents a cell type (B: B cell; T: T cell; M: monocyte; c: CNS). The leftmost column indicates subject status (red: cases; green: controls).b Representative sub-networks from subjects at the extremes of the distribution for E-LCC for each cell type. For each network, the number of nodes (N),edges (E), and percentile relative to all subjects (P) is indicated. The intensity of node color is proportional to the PRE of each gene in the correspondingcell type

NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-09773-y ARTICLE

NATURE COMMUNICATIONS | (2019) 10:2236 | https://doi.org/10.1038/s41467-019-09773-y | www.nature.com/naturecommunications 7

Page 9: ku...immune cell as well as cell-specific susceptibility genes in T cells, B cells and monocytes. Finally, genotype-level data from 2,370 patients and 412 controls is used to compute

that each method produces a different output, it is not possible todirectly compare these approaches. However, the first and argu-ably most important step in all these tools (including ours) is tocompute the SNP to gene values. A basic comparison of thesethree tools shows that the exact same genes were prioritized foralmost half of all non-HLA associated loci (97/200). A closer lookrevealed that our method was the only one that called at least onegene per associated locus, and produced gene prioritization setswith the least ambiguities (Supplementary Table 2 and Supple-mentary Fig. 5).

Recent evidence has emerged that polygenic risk scores forschizophrenia associate with therapeutic response to Lithium-based therapies40. Similar approaches are being tested for otherpsychiatric, oncological and cardiovascular diseases41–45. Theobservation that some risk variants only affect expression of agiven gene in one cell type but not in others, may at least in part,underlie the observed clinical heterogeneity in the MS population.Thus, when this approach is implemented at the individual level,specific risk profiles can be built for each subject with MS. We

speculate that in the near future this information could also beused as the basis to develop individualized risk scores, or to derivepersonalized approaches to therapy. For example, a subject withhigh B cell genetic risk may be a good candidate for B celldepletion therapies, while a subject with a high T cell risk maybenefit the most from immunomodulatory drugs that targetT cell function or migration into the CNS.

This cell-specific pathway approach can be extended to anyset of SNPs of interest in any condition at both population(summary) and individual (genotype) levels.

MethodsPredicted regulatory effects (PRE). Genome-wide regulatory elements fromENCODE and REP were collected from regulomeDB46 (which contains more than400 million genomic regulatory features collected from 400 cell and tissues; Sup-plementary Table 3) for all non-MHC independent effects (SNPs)17. Specifically,single nucleotide polymorphisms (SNP) corresponding to all non-MHC GW (n=200; Supplementary Data 33), SR (n= 416; Supplementary Data 34) and NR (n=3695; Supplementary Data 35) were extracted for analysis GW, SR, and NR asdefined previously17). The 200 GW effects were distributed in 156 unique regions

T cell B cell Monocyte CNS

Subject_id: 201327986

Subject_id: 201101471

Subject_id: 201102205

Subject_id: 201101897

N : 110E: 190P : 99

N : 104E: 164P : 47

N : 116E: 201P : 49

N : 57E: 88P : 66

N : 102E: 134P : 52

N : 107E: 197P : 65

N : 58E: 84P : 53

N : 122E: 219P : 98

N : 77E: 124P : 30

N : 46E: 69P : 28

N : 99E: 172P : 60

N : 114E: 249P : 94

N : 91E: 121P : 129

N : 122E: 189P : 51

N : 118E: 206P : 75

N : 78E: 118P : 99

Fig. 6 Heterogeneity in intraindividual MS-risk networks Intraindividual cell-specific networks of four representative MS subjects showing heterogeneity ofrisk across all cell types. a Cell specific risk networks for subject_id: 201327986. b Cell specific risk networks for subject_id: 201101471. c Cell specific risknetworks for subject_id: 201102205. d Cell specific risk networks for subject_id: 201101897. For each subject, the most connected risk network (numberof edges in the highest percentile across all subjects) is highlighted within a colored box. For each network, the number of nodes (N), edges (E), andpercentile relative to all subjects (P) is indicated. The intensity of node color is proportional to the PRE of each gene in the corresponding cell type.M: monocyte, green; T: T cells, red; B: B cells, blue, C: CNS, yellow

ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-09773-y

8 NATURE COMMUNICATIONS | (2019) 10:2236 | https://doi.org/10.1038/s41467-019-09773-y | www.nature.com/naturecommunications

Page 10: ku...immune cell as well as cell-specific susceptibility genes in T cells, B cells and monocytes. Finally, genotype-level data from 2,370 patients and 412 controls is used to compute

(44 regions contained multiple independent effects). Similarly, the 416 SR effectswere distributed in 354 unique regions (62 regions contained multiple independenteffects) and the 3695 NR effects were distributed among 1,883 unique regions(1812 regions contained multiple independent affects). Three sets of SNPs werecreated for each region according to their r2 with their corresponding main effect(r2 ≥ 0.8, r2 ≥ 0.5, and r2 ≥ 0.1).

A python tool was written to automatically fetch data from RegulomeDB forthese SNPs in all three lists (totaling 538,826 SNPs). Similarly, data for SNPs indifferent levels of LD (r2 > 0.8, r2 > 0.5, and r2 > 0.1) with each primary effect werealso retrieved using chromosomal positions. The main analysis was performedusing r2 ≥ 0.5 whereas the other sets were only used for the sensitivity analysis. Intotal, 538,826 SNPs were included in the analysis.

To investigate the effect of SNPs across different cell types and to assess whichgene has most potential of being regulated across various cell types, we grouped thecell types present in ENCODE (Supplementary Table 4) and REP (SupplementaryTable 5) into four major cell types (buckets). Specifically, these were B cells, T cells,CNS (central nervous system), and M (monocytes). T cell subsets (Th1, Th2, Th17,and Treg) were also analyzed as a separate group. We also built a dataset from lung(L, a cell type/tissue not considered to play a major role in MS susceptibility) as acontrol. Cancer cell lines were excluded for this analysis.

Regulatory elements were grouped into two major classes: PEX (promoter/enhancer/activator) and R (repressors/inactivators). Cell or tissue of origin wasrecorded for each regulatory feature, and cell-specific information was grouped intothree main cell types (B cells, T cells, monocytes) and one tissue (CNS) that are ofinterest in MS. In total, 25 brain regions were considered (15 from ENCODE and10 from REP). In addition, T cell subsets deemed relevant in the pathogenesis ofMS (Th1, Th2, Th17, and Treg) were also analyzed separately. Primary cells andcell lines from a tissue not known to be involved in MS (lung, L) were included ascontrol. In addition, eQTL data for T cells and monocytes from the IMMVARproject47 were integrated into the PRE computations. The HTML data from thescrapped output was parsed to populate data present in regulomeDB tables. Amaster table was then compiled with each field of regulomeDB data for all538,826 SNPs.

Multiple regulatory features were considered including protein binding,transcription factor binding sites (TFBS), promoters, enhancers, insulators, histonemodifications, and DNAse hypersensitive regions (DHS). We classified these intothree broad groups representing promoter/enhancer/transcription (PEX), inert/quiescent (ZQI), and repressor (R, Supplementary Table 6). We next computedweighted SNP-based scores based on the genotype and number of risk alleles toquantitate the regulatory influence of variation at each SNP. The weights werecounted as positive if there was evidence that the region promotes transcriptionand as negative if there was evidence of repression. The weights were thennormalized by the total number of experiments conducted for each respective celltype to remove bias against well-studied cell types. These weighted weights (WW)were summed up across SNPs resulting in a sum of weighted weights (SWW, orpredicted regulatory effect -PRE-) per gene per region in each cell type. The WWconcept derives from the fact that the sum of the effects of neighboring SNP to agiven gene is weighted twice. The first time we weight the number of experimentsreported in ENCODE or REP for a given SNP-gene pair (e.g. we assign more valueto a relationship that has been reported in 10 independent experiments, to anotherthat has been reported just once). The second time, we weight the evidencestemming from all SNPs nearby a gene (depending on the LD structure there couldbe ~100 SNP near a given gene). A gene with a positive score indicates there isevidence that the region containing the MS-associated SNP(s) is activelyinfluencing its transcription in that particular cell type and vice-versa.

All computations were performed in parallel using the 7400-core QB3 computercluster at UCSF.

Protein interaction network-based pathway analysis (PINBPA). An experi-mentally determined human protein interactome consisting of 15,783 nodes and455,321 edges was used for this part of the analysis21. We loaded the network intoCytoscape48 and created cell-specific sub-networks using gene expression valuesfrom elsewhere. Specifically, we filtered interactions realized only by gene productsexpressed in a given cell type, by using RNAseq expression profiles from Kitsaket al.49. Thus, for the T cell interactome, we only retrieved interactions betweenproteins known to be expressed by any T cell subset present in Kitsak et al. In thecase of CNS, while gene expression data is sufficiently granular (profiles for dif-ferent brain cell types and regions exist), epigenomic data for CNS cells/tissues inENCODE or REP is very sparse, thus we decided to merge all data into a singleCNS category.

Next, we loaded the gene-level PRE for each cell type as node attributes andconducted a topological analysis by selecting the subnetwork corresponding to thelargest connected component of nodes with positive PRE (those with negativescores are assumed not to be expressed, and thus not to be active players of theinteractome). To eliminate noise from very small or loosely unconnected networks,only those with more than 15 nodes were considered. Sensitivity analysis wasperformed by defining different thresholds on PRE values (10th, 25th, 50thpercentiles) and building networks with only proteins exceeding these thresholds(Supplementary Fig S1). Individual network analysis was performed consideringdiffering sets of the potentially associated SNPs identified in our recently completedmeta-analysis; those SNP that showed statistically significant evidence of

replication and reached genome-wide significant in the final combined analysis(GW), those that showed statistically significant evidence of replication but did notreach genomewide significance in the final combined analysis (SR) and thosefailing to show statistically significant evidence of replication (NR). For each celltype, the number of nodes and edges of each subnetwork and that of its largestconnected component were computed. The statistical significances were computedby comparison against a background distribution of 10,000 networks of equal sizesampled randomly from the same PPI.

Cell-specific transcriptomes. This work was approved by the Institutional ReviewBoard at the University of California San Francisco (IRB# 10-00104). PBMCs wereobtained from 25 individuals by Ficoll method using Vacutainer CPT tube (BDBiosciences). Subjects were consented according to institutional (UCSF) reviewboard (IRB) guidelines. Three different cell subsets (CD4+ and CD8+ T cells, CD14+ monocyte) were sorted into RLT buffer using a MoFlo Astrios cell sorter(Beckman Coulter). Helper T cells were defined as CD3+CD19−CD4+, cytotoxicT-cell were CD3+CD19−CD8+, and monocytes were sorted as CD14+ cells. TotalRNA was isolated from sorted cell subsets using RNeasy Mini kit (Qiagen) andassessed RNA quality using Agilent 2100 Bioanalyzer (Agilent Technologies). 3′mRNA-Seq libraries for all cell subsets were prepared from 100 ng total RNA usingQuantSeq kit (Lexogen) according to the manufacturer’s instructions andsequenced 50-bp single-end on the HiSeq 4000 (Illumina). Sequence reads weremapped to the human genome reference (GRCh38) with Gencode annotation (r26)using STAR aligner50. Reads were normalized by median of ratios using the DEseq2package51. The R function featureCounts was used to obtain gene-level readcounts52.

We selected overlapping genes between RNA-Seq gene counts and PRE scores,and Pearson’s correlation test was performed using the cor.test function in R. Thesignificance of the correlation was confirmed by permutation testing (n= 1000).

Data availabilityAll data generated or analysed during this study are included in this published article(and its supplementary information files). RNA samples used in this work have beenutilized in its entirety and thus are not available. Raw RNAseq (fastq) files used in thiswork have been deposited in the UCSF Data Share Server [https://doi.org/10.7272/Q6HQ3X3M].

Code availabilityComputer code used in this study is available upon request.

Received: 7 August 2018 Accepted: 26 March 2019

References1. Paul, D. S. et al. Maps of open chromatin guide the functional follow-up of

genome-wide association signals: application to hematological traits. PLoSGenet 7, e1002139 (2011).

2. Hormozdiari, F. et al. Colocalization of GWAS and eQTL signals detects targetgenes. Am. J. Hum. Genet 99, 1245–1260 (2016).

3. Maurano, M. T. et al. Systematic localization of common disease-associatedvariation in regulatory DNA. Science 337, 1190–1195 (2012).

4. Shooshtari, P., Huang, H. & Cotsapas, C. Integrative genetic and epigeneticanalysis uncovers regulatory mechanisms of autoimmune disease. Am. J. Hum.Genet 101, 75–86 (2017).

5. Cantor, R. M., Lange, K. & Sinsheimer, J. S. Prioritizing GWAS results: areview of statistical methods and recommendations for their application.Am. J. Hum. Genet 86, 6–22 (2010).

6. Wang, K., Li, M. & Hakonarson, H. Analysing biological pathways in genome-wide association studies. Nat. Rev. Genet 11, 843–854 (2010).

7. Huang, H., Chanda, P., Alonso, A., Bader, J. S. & Arking, D. E. Gene-basedtests of association. PLoS Genet 7, e1002177 (2011).

8. Carbonetto, P. & Stephens, M. Integrated enrichment analysis of variants andpathways in genome-wide association studies indicates central role for IL-2signaling genes in type 1 diabetes, and cytokine signaling genes in Crohn’sdisease. PLoS Genet 9, e1003770 (2013).

9. Li, M. X., Kwan, J. S. & Sham, P. C. HYST: a hybrid set-based test for genome-wide association studies, with application to protein-protein interaction-basedassociation analysis. Am. J. Hum. Genet 91, 478–488 (2012).

10. Chen, M., Cho, J. & Zhao, H. Incorporating biological pathways via a Markovrandom field model in genome-wide association studies. PLoS Genet 7,e1001353 (2011).

11. International Multiple Sclerosis Genetics, Consortim. Network-based multiplesclerosis pathway analysis with GWAS data from 15,000 cases and 30,000controls. Am. J. Hum. Genet 92, 854–865 (2013).

NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-09773-y ARTICLE

NATURE COMMUNICATIONS | (2019) 10:2236 | https://doi.org/10.1038/s41467-019-09773-y | www.nature.com/naturecommunications 9

Page 11: ku...immune cell as well as cell-specific susceptibility genes in T cells, B cells and monocytes. Finally, genotype-level data from 2,370 patients and 412 controls is used to compute

12. Wang, L., Matsushita, T., Madireddy, L., Mousavi, P. & Baranzini, S. E.PINBPA: cytoscape app for network analysis of GWAS data. Bioinformatics31, 262–264 (2015).

13. Jannot, A. S., Ehret, G. & Perneger, T. P < 5 × 10(-8) has emerged as astandard of statistical significance for genome-wide association studies. J. Clin.Epidemiol. 68, 460–465 (2015).

14. Panagiotou, O. A. & Ioannidis, J. P. Genome-Wide Significance Project Whatshould the genome-wide significance threshold be? Empirical replication ofborderline genetic associations. Int J. Epidemiol. 41, 273–286 (2012).

15. ENCODE Project Consortium The ENCODE (ENCyclopedia Of DNAElements) Project. Science 306, 636–640 (2004).

16. Roadmap Epigenomics, C. et al. Integrative analysis of 111 reference humanepigenomes. Nature 518, 317–330 (2015).

17. Patsopoulos, N. et al. The multiple sclerosis genomic map: role of peripheralimmune cells and resident microglia in susceptibility. Preprint at http://www.biorxiv.org/content/biorxiv/early/2017/07/13/143933.full.pdf (2017).

18. Steinberg, M. W., Cheung, T. C. & Ware, C. F. The signaling networks of theherpesvirus entry mediator (TNFRSF14) in immune regulation. Immunol.Rev. 244, 169–187 (2011).

19. Grewal, I. S. & Flavell, R. A. CD40 and CD154 in cell-mediated immunity.Annu Rev. Immunol. 16, 111–135 (1998).

20. Baranzini, S. E. et al. Pathway and network-based analysis of genome-wideassociation studies in multiple sclerosis. Hum. Mol. Genet 18, 2078–2090(2009).

21. Lee, I., Blom, U. M., Wang, P. I., Shim, J. E. & Marcotte, E. M. Prioritizingcandidate disease genes by network-based boosting of genome-wideassociation data. Genome Res. 21, 1109–1121 (2011).

22. Rossin, E. J. et al. Proteins encoded in genomic regions associated withimmune-mediated disease physically interact and suggest underlying biology.PLoS Genet. 7, e1001273 (2011).

23. Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits:from polygenic to omnigenic. Cell 169, 1177–1186 (2017).

24. Lenschow, D. J., Walunas, T. L. & Bluestone, J. A. CD28/B7 system of T cellcostimulation. Annu Rev. Immunol. 14, 233–258 (1996).

25. Stevenson, C. et al. Essential role of Elmo1 in Dock2-dependent lymphocytemigration. J. Immunol. 192, 6062–6070 (2014).

26. Anwar, A. et al. Mer tyrosine kinase (MerTK) promotes macrophage survivalfollowing exposure to oxidative stress. J. Leukoc. Biol. 86, 73–79 (2009).

27. Stienne, C. et al. Foxo3 transcription factor drives pathogenic T helper 1differentiation by inducing the expression of Eomes. Immunity 45, 774–787(2016).

28. Lupar, E. et al. Eomesodermin expression in CD4+ T cells restricts peripheralFoxp3 induction. J. Immunol. 195, 4742–4752 (2015).

29. Endo, Y. et al. Eomesodermin controls interleukin-5 production in memory Thelper 2 cells through inhibition of activity of the transcription factor GATA3.Immunity 35, 733–745 (2011).

30. Parnell, G. P. et al. The autoimmune disease-associated transcription factorsEOMES and TBX21 are dysregulated in multiple sclerosis and define amolecular subtype of disease. Clin. Immunol. 151, 16–24 (2014).

31. Smets, I. et al. Multiple sclerosis risk variants alter expression ofco-stimulatory genes in B cells. Brain 141, 786–796 (2018).

32. Webb, L. M., Foxwell, B. M. & Feldmann, M. Interleukin-7 activates humannaive CD4+ cells and primes for interleukin-4 production. Eur. J. Immunol.27, 633–640 (1997).

33. Guo, L. et al. Innate immunological function of TH2 cells in vivo. Nat.Immunol. 16, 1051–1059 (2015).

34. Davis, R. S. Fc receptor-like molecules. Annu Rev. Immunol. 25, 525–560(2007).

35. De Jager, P. L. et al. Integration of genetic risk factors into a clinical algorithmfor multiple sclerosis susceptibility: a weighted genetic risk score. LancetNeurol. 8, 1111–1119 (2009).

36. Gourraud, P. A. et al. Aggregation of multiple sclerosis genetic risk variants inmultiple and single case families. Ann. Neurol. 69, 65–74 (2011).

37. Pers, T. H. et al. Biological interpretation of genome-wide association studiesusing predicted gene functions. Nat. Commun. 6, 5890 (2015).

38. Watanabe, K., Taskesen, E., van Bochoven, A. & Posthuma, D. Functionalmapping and annotation of genetic associations with FUMA. Nat. Commun.8, 1826 (2017).

39. Lamparter, D., Marbach, D., Rueedi, R., Kutalik, Z. & Bergmann, S. Fast andrigorous computation of gene and pathway scores from SNP-based summarystatistics. PLoS Comput Biol. 12, e1004714 (2016).

40. Amare, A. T. International Consortium on Lithium Genetics (ConLi+Gen)Association of polygenic score for schizophrenia and hla antigen andinflammation genes with response to lithium in bipolar affective disorder: AGenome-Wide Association Study. JAMA Psychiatry 75, 65–74 (2017).

41. Agerbo, E. et al. Polygenic risk score, parental socioeconomic status, familyhistory of psychiatric disorders, and the risk for schizophrenia: a Danishpopulation-based study and meta-analysis. JAMA Psychiatry 72, 635–641(2015).

42. Radice, P., Pharoah, P. D. & Peterlongo, P. Personalized testing based onpolygenic risk score is promising for more efficient population-basedscreening programs for common oncological diseases. Ann. Oncol. 27,369–370 (2016).

43. Shieh, Y. et al. Breast cancer risk prediction using a clinical risk model andpolygenic risk score. Breast Cancer Res Treat. 159, 513–525 (2016).

44. Natarajan, P. et al. Polygenic risk score identifies subgroup with higher burdenof atherosclerosis and greater relative benefit from statin therapy in theprimary prevention setting. Circulation 135, 2091–2101 (2017).

45. Fernandez-Ruiz, I. Coronary artery disease: new polygenic risk score improvesprediction of CHD. Nat. Rev. Cardiol. 13, 697 (2016).

46. Boyle, A. P. et al. Annotation of functional variation in personal genomesusing RegulomeDB. Genome Res 22, 1790–1797 (2012).

47. De Jager, P. L. et al. ImmVar project: insights and design considerations forfuture studies of “healthy” immune variation. Semin Immunol. 27, 51–57(2015).

48. Shannon, P. et al. Cytoscape: a software environment for integrated modelsof biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).

49. Kitsak, M. et al. Tissue specificity of human disease module. Sci. Rep. 6, 35241(2016).

50. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29,15–21 (2013).

51. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change anddispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

52. Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purposeprogram for assigning sequence reads to genomic features. Bioinformatics 30,923–930 (2014).

AcknowledgementsThis investigation was supported in part by the following sources: NIH/NINDS awardsR01NS088155 and 1R01NS099240, the Valhalla Charitable Foundation, and the HeidrichFamily and Friends Foundation (Sergio E. Baranzini). US National Multiple SclerosisSociety (TA 3056-A-2), the Harvard NeuroDiscovery Center and an Intel ParallelComputing Center award (Nikolaos A. Patsopoulos). Swedish Medical Research Council;Swedish Research Council for Health, Working Life and Welfare, Knut and AliceWallenberg Foundation, AFA insurance, Swedish Brain Foundation, the Swedish Asso-ciation for Persons with Neurological Disabilities. Cambridge NIHR Biomedical ResearchCentre, UK Medical Research Council (G1100125) and the UK MS society (861/07).NIH/NINDS: R01 NS049477, NIH/NIAID: R01 AI059829, NIH/NIEHS: R01 ES0495103.Research Council of Norway grant 196776 and 240102. NINDS/NIH R01NS088155. OsloMS association. Research Council KU Leuven, Research Foundation Flanders. AFM,AFM-Généthon, CIC, ARSEP, ANR-10-INBS-01 and ANR-10-IAIHU-06. ResearchCouncil KU Leuven, Research Foundation Flanders. Inserm ATIP-Avenir Fellowshipand Connect-Talents Award. German Ministry for Education and Research, GermanCompetence Network MS (BMBF KKNMS). Oslo MS association, Research Council ofNorway grant 196776 and 240102. Dutch MS Research Foundation. TwinsUK is fundedby the Wellcome Trust, Medical Research Council, European Union, the NationalInstitute for Health Research (NIHR)-funded BioResource, Clinical Research Facility andBiomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust inpartnership with King’s College London. German Ministry for Education and Research,German Competence Network MS (BMBF KKNMS). Italian Foundation of MultipleSclerosis (FISM). NMSS (RG 4680A1/1). German Ministry for Education and Research,German Competence Network MS (BMBF KKNMS). Lundbeck Foundation and BenzonFoundation.

Author contributionL.F.B., L.B., F.M-B., D.B., M.C., G.C., B.A.C.C., S.D., K.D., P.D., D.E., F.E., B.F., A.G.,B.D., G.H., C.H., B.H., R.H., D.H., N.I., S.K., J.K., M.K., I.K., R.M., A.O., A.P., M.P-V.,J.S., T.O., B.V.T., G.S., H.F.H., A.C., S.L.H., D.A.H., F.Z., P.D., S.S., and J.R.O. directlycontributed to sample acquisition. S.B., J.M., K.K., S.J.C., and S.E.B. performed DNA/RNA processing. A.B., J.M., S.J.C., P.D., and S.E.B. performed genotyping and/orRNAseq. L.M., N.A.P., C.C., S.D.B., A.B., J.M., K.K., X.J., T.F.A., T.B., F.M-B., D.B., F.B.,E.G.C., B.A.C.C., C.G., A.G., P-A.G., J.H., N.I., C.M.L., M.L., F.L., R.H., J.S., H.F.H., S.L.H., D.A.H., F.Z., P.D., S.S., J.R.O., and S.E.B. performed analysis. L.M., N.A.P., C.C.,A.B., J.M., K.K., A.S., S.J.C., J.H., and S.E.B. we involved in data handling (clinical,demographic or genotypic). N.P., C.C., P-A.G., J.H., B.H., I.K., R.H., A.I., B.V.T., G.S.,H.F.H., A.C., S.L.H., D.A.H., F.Z., P.D., S.S., J.R.O., and S.E.B. designed the study. L.M.,C.C., S.D.B., J.M., K.K., S.L.H., S.S., J.R.O., and S.E.B. wrote the paper.

ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-09773-y

10 NATURE COMMUNICATIONS | (2019) 10:2236 | https://doi.org/10.1038/s41467-019-09773-y | www.nature.com/naturecommunications

Page 12: ku...immune cell as well as cell-specific susceptibility genes in T cells, B cells and monocytes. Finally, genotype-level data from 2,370 patients and 412 controls is used to compute

Additional informationSupplementary Information accompanies this paper at https://doi.org/10.1038/s41467-019-09773-y.

Competing interests: The authors declare no competing interests.

Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/

Journal peer review information: Nature Communications thanks the anonymousreviewers for their contribution to the peer review of this work. Peer reviewer reports areavailable.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims inpublished maps and institutional affiliations.

Open Access This article is licensed under a Creative CommonsAttribution 4.0 International License, which permits use, sharing,

adaptation, distribution and reproduction in any medium or format, as long as you giveappropriate credit to the original author(s) and the source, provide a link to the CreativeCommons license, and indicate if changes were made. The images or other third partymaterial in this article are included in the article’s Creative Commons license, unlessindicated otherwise in a credit line to the material. If material is not included in thearticle’s Creative Commons license and your intended use is not permitted by statutoryregulation or exceeds the permitted use, you will need to obtain permission directly fromthe copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2019

International Multiple Sclerosis Genetics Consortium

Lohith Madireddy1, Nikolaos A. Patsopoulos2,3, Chris Cotsapas4,5, Steffan D. Bos6,7, Ashley Beecham8,

Jacob McCauley8,9, Kicheol Kim1, Xiaoming Jia1, Adam Santaniello1, Stacy J. Caillier1, Till F.M. Andlauer10,11,

Lisa F. Barcellos12, Tone Berge7,13, Luisa Bernardinelli14, Filippo Martinelli-Boneschi15,16, David R. Booth17,

Farren Briggs18, Elisabeth G. Celius7,19, Manuel Comabella20, Giancarlo Comi21, Bruce A.C. Cree1,

Sandra D’Alfonso22, Katrina Dedham23, Pierre Duquette24, Efthimios Dardiotis25, Federica Esposito21,

Bertrand Fontaine26,27, Christiane Gasperi10, An Goris28, Bénédicte Dubois28, Pierre-Antoine Gourraud29,30,

Georgios Hadjigeorgiou31, Jonathan Haines18, Clive Hawkins32,37, Bernhard Hemmer10, Rogier Hintzen33,34,

Dana Horakova35, Noriko Isobe36, Seema Kalra32,37, Jun-ichi Kira36, Michael Khalil38, Ingrid Kockum39,

Christina M. Lill40,41, Matthew R. Lincoln4, Felix Luessi40, Roland Martin42, Annette Oturai43, Aarno Palotie44,45,

Margaret A. Pericak-Vance8,9, Roland Henry1, Janna Saarela44, Adrian Ivinson46, Tomas Olsson39,

Bruce V. Taylor47, Graeme J. Stewart17, Hanne F. Harbo6,7, Alastair Compston48, Stephen L. Hauser1,

David A. Hafler4, Frauke Zipp40, Philip De Jager3,49, Stephen Sawcer48, Jorge R. Oksenberg1 &

Sergio E. Baranzini 1,50

1Weill Institute for Neurosciences, Department of Neurology, University of California San Francisco, San Francisco, CA 94158, USA. 2SystemsBiology and Computer Science Program, Ann Romney Center for Neurological Diseases, Department of Neurology, and Division of Genetics,Department of Medicine, Brigham & Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA. 3Broad Institute of Harvard Universityand Massachusetts Institute of Technology, Cambridge, MA 02142, USA. 4Departments of Neurology, Yale School of Medicine, 300 George St,New Haven, CT 06511, USA. 5Department of Genetics, Yale School of Medicine, 300 George St, New Haven, CT 06511, USA. 6Institute of ClinicalMedicine, University of Oslo, Oslo 0318, Norway. 7Department of Neurology, Oslo University Hospital, Oslo 0424, Norway. 8John P. HussmanInstitute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL 33136, USA. 9Dr. John T. Macdonald FoundationDepartment of Human Genetics, University of Miami, Miller School of Medicine, Miami, FL 33136, USA. 10Department of Neurology, Klinikumrechts der Isar, School of Medicine, Technical University of Munich, 81675 Munich, Germany. 11Munich Cluster for Systems Neurology (SyNergy),81377 Munich, Germany. 12Division of Epidemiology, School of Public HealthUniversity of California, 324 Stanley Hall, MC#3220, Berkeley, CA94720, USA. 13Department of Mechanical, Electronics and Chemical Engineering, Oslo Metropolitan University, Oslo 0167, Norway. 14Section ofBiostatistics, Neurophyisiology and Psychiatry, Unit of medical and genomic statistics, Universita of Pavia, Pavia 27100, Italy. 15Department ofBiomedical Sciences for Health, University of Milan, Milan 20133, Italy. 16MS Research Unit and Department of Neurology, IRCCS Policlinico SanDonato, San Donato Milanese, Milan 20097, Italy. 17Faculty of Medicine, Westmead Clinical School, The Westmead Institute for Medical Research,Sydney, NSW 2145, Australia. 18Department of Quantitative and Population Health Sciences, School of Medicine, Case Western Reserve University,Cleveland, OH 44106, USA. 19Institute of Health and Society, University of Oslo, Oslo 0318, Norway. 20Servei de Neurologia-Neuroimmunologia.Centre d’Esclerosi Múltiple de Catalunya (Cemcat), Institut de Recerca Vall d’Hebron (VHIR), Hospital Universitari Vall d’Hebron, UniversitatAutònoma de Barcelona, Barcelona 08035, Spain. 21Department of Neurology, San Raffaele Scientific Institute, Milan 20132, Italy. 22Department ofHealth Sciences, UPO University, Novara 28100, Italy. 23Department of Clinical Neurosciences. Neurology Unit, University of Cambridge,Cambridge CB2 1QW, UK. 24Faculté de médecine, MS Clinic Centre Hospitalier de l’, Université de Montréal. Université de Montréal Montreal,Montreal QC H3A 1G1, Canada. 25Department of Neurology, Laboratory of Neurogenetics, University of Thessaly, University Hospital of Larissa,Larissa 41223, Greece. 26Department of Neurology, University Hopital Pitié-Salpêtrière, Paris 75013, France. 27UMR 1127, Sorbonne-Université,INSERM, University Hopital Pitié-Salpêtrière, Paris 75013, France. 28KU Leuven Department of Neurosciences, Laboratory for Neuroimmunology,Leuven 3000, Belgium. 29Université de Nantes, INSERM, Centre de Recherche en Transplantation et Immunologie, UMR 1064, ATIP-Avenir, Equipe 5,Nantes F-44093, France. 30CHU de Nantes, INSERM, CIC 1413, Pôle Hospitalo-Universitaire 11: Santé Publique, Clinique des données, NantesF-44093, France. 31Department of Neurology, Medical School, University of Cyprus, Nicosia 587G+X2, Cyprus. 32Institute for Science & Technology

NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-09773-y ARTICLE

NATURE COMMUNICATIONS | (2019) 10:2236 | https://doi.org/10.1038/s41467-019-09773-y | www.nature.com/naturecommunications 11

Page 13: ku...immune cell as well as cell-specific susceptibility genes in T cells, B cells and monocytes. Finally, genotype-level data from 2,370 patients and 412 controls is used to compute

in Medicine, Keele University, Keele ST5 5GB, UK. 33Department of Neurology, Erasmus MC Dr Molewaterplein 40, Rotterdam 3015 GD, TheNetherlands. 34Department of Immunology, Erasmus MC, Rotterdam 3015 GD, The Netherlands. 35First Faculty of Medicine, Department of Neurologyand Center of Clinical Neuroscience, Charles University and General University Hospital, Prague 3CFG+RJ, Czech Republic. 36Department ofNeurology, Kyushu University, Kyushu 812-0053, Japan. 37Royal Stoke MS Centre of Excellence, University Hospital North Midlands, Stoke-on-Trent,ST4 6QG, UK. 38Department of Neurology, Medical University of Graz, Graz A-8036, Austria. 39Department of Clinical Neuroscience and Centerfor Molecular Medicine, Karolinska Institutet, Stockholm 17176, Sweden. 40Department of Neurology, University Medical Center of the JohannesGutenberg University Mainz, Mainz 55131, Germany. 41Genetic and Molecular Epidemiology Group, Lübeck Interdisciplinary Platform for GenomeAnalytics, Institutes of Neurogenetics and Cardiogenetics, University of Lübeck, Lübeck 23562, Germany. 42Neuroimmunology and MS Research(nims), Department of Neurology, University Zurich, Zürich 8006, Switzerland. 43Department of Neurology, section 2082, Rigshospitalet, DanishMultiple Sclerosis Center, University of Copenhagen, Copenhagen 2100, Denmark. 44Institute for Molecular Medicine Finland (FIMM), University ofHelsinki, Helsinki FIN-00014, Finland. 45Harvard University, Center for Human Genetic Research, Boston, MA 02115, USA. 46UK Dementia ResearchInstitute at University College London, London WC1E6BT, UK. 47Menzies Institute for Medical Research, University of Tasmania, Hobart, TAS 7000,Australia. 48Department of Clinical Neurosciences, Cambridge Biomedical Campus, University of Cambridge, Cambridge CB2 0QQ, UK. 49Departmentof Neurology, Center for Translational and Computational Neuroimmunology and Multiple Sclerosis Center, Columbia University Medical Center,New York, NY 10032, USA. 50Bakar Institute for Computational Health Science. University of California San Francisco, San Francisco, CA 94158, USA

ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-09773-y

12 NATURE COMMUNICATIONS | (2019) 10:2236 | https://doi.org/10.1038/s41467-019-09773-y | www.nature.com/naturecommunications