Identifying Biologically Relevant Amino Acids in Immunogenetic Studies Richard M. Single Department of Mathematics and Statistics University of Vermont


Posted on 21-Jan-2016



Identifying Biologically Relevant Amino Acids in Immunogenetic Studies

Richard M. Single

Department of Mathematics and Statistics

University of Vermont


Outline

• HLA background and nomenclature
• Asymmetric Linkage Disequilibrium (ALD)
  – Motivation, Definition & Example
• Amino acid level analyses of HLA disease associations
  – SFVT Analysis & Pairwise allele level analyses
  – Conditional Haplotype analyses & ALD
• Identifying units of selection
  – ALD as a tool


[Figure: HLA class I and HLA class II molecules, each presenting a peptide fragment to a T-cell receptor (TCR); β2-m = β2-microglobulin]

HLA molecules are cell-surface proteins that present peptide fragments to T-cells.

• HLA molecules bind specific sets of peptides, determined by their structure
• Any given HLA allele encodes a molecule that presents a subset of the available peptides to T-cells


HLA Allele Nomenclature

HLA-A*24:02:01:02L

• Locus: HLA-A
• Field 1 ("2-digit"): serological level (where possible)
• Field 2 ("4-digit"): peptide level (amino acid differences)
• Field 3 ("6-digit"): nucleotide level [silent] (synonymous substitutions)
• Field 4 ("8-digit"): intron level (intron or 3′/5′ untranslated-region polymorphisms)
• Suffix: expression (N = null, L = low, S = soluble, …)

• For most analyses, we want to distinguish among unique peptide sequences, i.e., the two-field ("4-digit") level.

• This level of resolution treats alleles with the same peptide sequence for exons 2 & 3 (class I) or exon 2 (class II) as equivalent ["binning" alleles].
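A minimal Python sketch of this field structure, assuming modern colon-delimited names such as the example above. It splits a name into locus, fields, and expression suffix, and truncates it to the two-field ("4-digit") level. The function names are hypothetical, and the sketch does not handle older colon-less names (e.g., A*0201), ambiguity strings, or allele-group designations.

```python
import re

# Locus, '*', one to four colon-separated numeric fields, and an optional
# single-letter expression suffix (e.g. N = null, L = low, S = soluble).
HLA_NAME = re.compile(
    r"^(?P<locus>HLA-[A-Z0-9]+)\*(?P<fields>\d+(?::\d+){0,3})(?P<suffix>[A-Z]?)$"
)

def parse_hla_name(name):
    """Split a colon-delimited HLA allele name into (locus, fields, suffix)."""
    m = HLA_NAME.match(name.strip())
    if m is None:
        raise ValueError(f"unrecognized HLA allele name: {name!r}")
    return m["locus"], m["fields"].split(":"), m["suffix"] or None

def two_field(name):
    """Truncate to the two-field ("4-digit", peptide-level) resolution.

    The expression suffix is dropped here; how null or low-expression
    alleles are handled is a separate analysis decision.
    """
    locus, fields, _suffix = parse_hla_name(name)
    return f"{locus}*{':'.join(fields[:2])}"

print(parse_hla_name("HLA-A*24:02:01:02L"))  # ('HLA-A', ['24', '02', '01', '02'], 'L')
print(two_field("HLA-A*24:02:01:02L"))       # HLA-A*24:02
```

Truncating to two fields is what "binning" at the peptide level amounts to in practice: every allele sharing the first two fields collapses to one analysis category.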


HLA Nomenclature and why it matters

• Challenges for HLA data management and analysis:
  – The HLA genes are very polymorphic;
  – HLA nomenclature is complicated;
  – There are multiple ways to generate HLA data;
  – All common typing systems generate ambiguous data;
  – There are multiple ways to report alleles and ambiguities.

These issues make meta-analyses of HLA data from different sources very difficult.


Extending STREGA to Immunogenomic Studies

• The STrengthening the REporting of Genetic Association studies (STREGA) statement provides community-based data reporting and analysis standards for genomic disease association studies

• The IDAWG (immunogenomics.org) has proposed an extension of STREGA: STrengthening the REporting of Immunogenomic Studies (STREIS)


From STREGA to STREIS

Extensions to the STREGA guidelines for immunogenomic data include:

• Describing the system(s) used to store, manage, and validate genotype and allele data

• Documenting all methods applied to resolve ambiguity

• Defining any codes used to represent ambiguities
  - e.g., NMDP codes:
    A*0201/0209/0266 = A*02AJEY
    A*0201/0209/0266/0275/0289 = A*02BSFJ

• Describing any binning or combining of alleles into common categories
  - e.g., G-codes: A*0201/0209/0243N/0266/0275/0283N/0289 = "A020101g"

• Avoiding the use of subjective terms (e.g., "high-resolution typing") that may change over time
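The ambiguity and binning conventions above can be sketched in a few lines of Python. This is an illustration only, assuming the older colon-less, slash-delimited notation used in the examples; `expand_ambiguity` and `bin_typing` are hypothetical helpers, and the only bin membership used is the G-code example given above, not an official mapping file.

```python
# Hypothetical helpers for the older colon-less, slash-delimited notation.

def expand_ambiguity(typing):
    """'A*0201/0209/0266' -> ['A*0201', 'A*0209', 'A*0266']."""
    first, *rest = [part.strip() for part in typing.split("/")]
    locus = first.split("*")[0]          # e.g. 'A'
    return [first] + [f"{locus}*{r}" for r in rest]

# Bin membership taken only from the G-code example above (A020101g);
# a real analysis would load a complete, versioned mapping instead.
BINS = {allele: "A020101g"
        for allele in expand_ambiguity("A*0201/0209/0243N/0266/0275/0283N/0289")}

def bin_typing(typing, bins=BINS):
    """Return the one bin that covers every allele in an ambiguous typing,
    or None if the alleles fall into different or unmapped bins."""
    labels = {bins.get(allele) for allele in expand_ambiguity(typing)}
    if len(labels) == 1 and None not in labels:
        return labels.pop()
    return None

print(bin_typing("A*0201/0209/0266"))  # A020101g (the alleles behind NMDP code A*02AJEY)
print(bin_typing("A*0201/0301"))       # None: A*0301 is not in this bin
```

Because the result depends entirely on the mapping passed in as `bins`, the mapping itself (and its version) is part of what needs to be reported.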


Resources for HLA Data Validation & Analysis

• Immunology Database and Analysis Portal (www.ImmPort.org), developed under the Bioinformatics Integration Support Contract (BISC) for NIH, NIAID, & DAIT (Division of Allergy, Immunology, and Transplantation)
  – Data validation pipeline
  – Analysis tools
  – Standardized ambiguity reduction tools
  – Data from a large number of immunogenomic studies

• ImmunoGenomics Data Analysis Working Group (www.immunogenomics.org) (www.IgDAWG.org)
  An international collaborative group working to:
  – facilitate the sharing of immunogenomic data (HLA, KIR, etc.), and
  – foster consistent analysis and interpretation of immunogenomic data



Asymmetric Linkage Disequilibrium (ALD)

- Standard LD measures give an incomplete description of the correlation of genetic variation at two loci when there are different numbers of alleles at the loci.

- We developed a pair of conditional asymmetric LD (ALD) measures that more accurately capture this information.

- For disease association studies, ALD measures can help identify when stratification analyses can be applied to detect primary disease-predisposing genes.

- For evolutionary studies, the ALD can be informative for the study of forces such as selection acting on individual amino acids, or other loci in high LD.

- For SNP studies, ALD measures can be used for analyses of LD between haplotype blocks, for SNP–gene LD, and for haplotype block–gene LD.


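Linkage Disequilibrium (LD) Measures

The two most common measures of the strength of LD are:

(1) the normalized measure of the individual LD values, namely $D'_{ij} = D_{ij}/D_{max}$ (Lewontin 1964); and

(2) the correlation coefficient r for bi-allelic data, which is most often reported as $r^2 = D^2/(p_{A_1} p_{A_2} p_{B_1} p_{B_2})$.

r = 1 only when the allelic variation at the two loci shows 100% correlation.

Their multi-allelic extensions are:

$$D' = \sum_{i=1}^{I} \sum_{j=1}^{J} p_i\, q_j\, |D'_{ij}|$$

$$W_n = \left[ \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} D_{ij}^2 / (p_i q_j)}{\min(I-1,\, J-1)} \right]^{1/2} = \left[ \frac{X^2_{LD}}{2N \min(I-1,\, J-1)} \right]^{1/2}$$

where I and J are the numbers of alleles at the two loci, $p_i$ and $q_j$ are their allele frequencies, $D_{ij} = f_{ij} - p_i q_j$ for haplotype frequency $f_{ij}$, and $X^2_{LD}$ is the chi-square statistic for LD computed from the 2N haplotypes of N individuals.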
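Both multi-allelic measures can be computed from a table of estimated haplotype frequencies. The Python sketch below is a minimal illustration, assuming rows index alleles at the first locus, columns index alleles at the second, every allele has nonzero frequency, and $D_{max}$ is Lewontin's bound for each $D_{ij}$ ($\min(p_i(1-q_j), (1-p_i)q_j)$ when $D_{ij} > 0$, else $\min(p_i q_j, (1-p_i)(1-q_j))$). Function names are illustrative.

```python
import math

def allele_freqs(hap):
    """Marginal allele frequencies: p for locus A (rows), q for locus B (columns)."""
    p = [sum(row) for row in hap]
    q = [sum(col) for col in zip(*hap)]
    return p, q

def d_max(d, p_i, q_j):
    """Lewontin's (1964) bound on |D_ij|, chosen by the sign of D_ij."""
    if d > 0:
        return min(p_i * (1 - q_j), (1 - p_i) * q_j)
    return min(p_i * q_j, (1 - p_i) * (1 - q_j))

def overall_d_prime(hap):
    """D' = sum_i sum_j p_i q_j |D'_ij|."""
    p, q = allele_freqs(hap)
    total = 0.0
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            d = hap[i][j] - p_i * q_j          # D_ij
            bound = d_max(d, p_i, q_j)
            if bound > 0:
                total += p_i * q_j * abs(d) / bound
    return total

def wn(hap):
    """Wn = sqrt( sum_ij D_ij^2 / (p_i q_j) / min(I-1, J-1) )."""
    p, q = allele_freqs(hap)
    s = sum((hap[i][j] - p[i] * q[j]) ** 2 / (p[i] * q[j])
            for i in range(len(p)) for j in range(len(q)))
    return math.sqrt(s / min(len(p) - 1, len(q) - 1))

example = [[0.3, 0.0, 0.0],   # A1 with B1, B2, B3
           [0.0, 0.5, 0.2]]   # A2 with B1, B2, B3
print(round(overall_d_prime(example), 3), round(wn(example), 3))  # 1.0 1.0
```

The example table is the two-locus case used to motivate the asymmetric measures (f(A1B1) = 0.3, f(A2B2) = 0.5, f(A2B3) = 0.2): both symmetric summaries equal 1 even though the B locus still varies on A2 haplotypes.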


Asymmetric LD measures: WA/B and WB/A

• When there are different numbers of alleles at two loci, the direct correlation property of the r measure is not retained.

• The asymmetric LD (ALD) measures more accurately reflect covariation at two loci.

- WA/B and WB/A describe the variation observed at the first locus conditioned on the second.

• Example (two and three alleles at the A and B loci, respectively):

f(A1B1) = 0.3, f(A2B2) = 0.5, f(A2B3) = 0.2

Wn = 1, WA/B = 1, and WB/A = 0.73.

There is variation at the B locus on haplotypes carrying the A2 allele, so the correlation is not 100%, even though Wn = 1.

- ALD measures indicate that, with an appropriate sample size, stratification analyses could be carried out for some comparisons.

- Relying on Wn = 1 alone could result in passing over these data for conditional analyses.


Standard LD measures D’ and Wn

Standard LD measures (overall D’ and Wn) assume, or force, symmetry, even though with more than two alleles per locus the relationship between the loci is not symmetric.

Data source: ImmPort study SDY26, "Identifying polymorphisms associated with risk for the development of myopericarditis following smallpox vaccine"


Asymmetric Linkage Disequilibrium (ALD)

Interpretation:

ALD for HLA-DRB1 conditioning on HLA-DQA1: WDRB1/DQA1 = 0.58

ALD for HLA-DQA1 conditioning on HLA-DRB1: WDQA1/DRB1 = 0.95

• The overall variation for DRB1 is relatively high given specific DQA1 alleles.

• The overall variation for DQA1 is relatively low given specific DRB1 alleles.

ALD: row gene conditional on column gene


Asymmetric Linkage Disequilibrium (ALD)

Table 1. Linkage disequilibrium and genetic diversity measures (description: definition)

1. Single-locus homozygosity (F): $F_A = \sum_i p_{A_i}^2$

2. Haplotype-specific homozygosity (HSF): $F_{A/B_j} = \sum_i \left( f_{ij} / p_{B_j} \right)^2$

3. Overall weighted HSF values, $F_{A/B}$ (and $F_{B/A}$): $F_{A/B} = \sum_j F_{A/B_j}\, p_{B_j} = F_A + \sum_i \sum_j D_{ij}^2 / p_{B_j}$

4. Multi-allelic ALD squared, $W_{A/B}$ (and $W_{B/A}$): $W_{A/B}^2 = (F_{A/B} - F_A) / (1 - F_A)$

where $f_{ij}$ is the frequency of haplotype $A_iB_j$ and $D_{ij} = f_{ij} - p_{A_i} p_{B_j}$.

Thomson and Single (2014) Genetics
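The four definitions translate directly into code. A minimal Python sketch, assuming haplotype frequencies are already estimated and every allele has nonzero frequency; the function name `ald` is illustrative, not taken from any published package. It returns both conditional measures, so the asymmetry is visible at a glance.

```python
import math

def ald(hap):
    """Return (W_A/B, W_B/A) from an I x J haplotype frequency table.

    Rows index alleles A_i at locus A; columns index alleles B_j at locus B.
    """
    n_a, n_b = len(hap), len(hap[0])
    p_a = [sum(row) for row in hap]              # p_Ai
    p_b = [sum(col) for col in zip(*hap)]        # p_Bj
    f_a = sum(x * x for x in p_a)                # F_A
    f_b = sum(x * x for x in p_b)                # F_B
    # Overall weighted HSF: F_A/B = sum_j p_Bj * sum_i (f_ij / p_Bj)^2
    f_a_given_b = sum(p_b[j] * sum((hap[i][j] / p_b[j]) ** 2 for i in range(n_a))
                      for j in range(n_b))
    f_b_given_a = sum(p_a[i] * sum((hap[i][j] / p_a[i]) ** 2 for j in range(n_b))
                      for i in range(n_a))
    # W^2 = (F_A/B - F_A) / (1 - F_A); clamp tiny negative rounding error at 0
    w_a_b = math.sqrt(max(0.0, (f_a_given_b - f_a) / (1 - f_a)))
    w_b_a = math.sqrt(max(0.0, (f_b_given_a - f_b) / (1 - f_b)))
    return w_a_b, w_b_a

w_ab, w_ba = ald([[0.3, 0.0, 0.0],
                  [0.0, 0.5, 0.2]])
print(round(w_ab, 2), round(w_ba, 2))  # 1.0 0.73
```

With the two-locus example from the ALD motivation (f(A1B1) = 0.3, f(A2B2) = 0.5, f(A2B3) = 0.2) this gives WA/B = 1 and WB/A ≈ 0.73, matching the values quoted there; for two bi-allelic loci both values reduce to |r|.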


If both loci are bi-allelic:

$$W_{A/B}^2 = \frac{\sum_i \sum_j D_{ij}^2 / p_{B_j}}{1 - F_A} = \frac{D^2}{p_{A_1} p_{A_2} p_{B_1} p_{B_2}} = r^2,$$

since $D_{11} = -D_{12} = -D_{21} = D_{22} = D$.

Thomson and Single (2014) Genetics


Other Conditional Measures of LD

• Other conditional measures of LD have been proposed (Nei and Li, 1980; Chakravarti et al., 1984; Hudson, 1985; Kaplan and Weir, 1992; Guo, 1997).

- They measure association between alleles at a marker locus (locus B) and alleles at a disease locus (locus A).

- They were developed to account for study designs in which individuals are not randomly sampled from a single population, but where sampling intensity varies within disease categories.

- They are equivalent to Somers' D statistic defined on the contingency table relating two categorical variables.

• In contrast, our statistic is a population-based measure that does not depend on a specific patient sampling scheme.


ALD & tag-SNPs in the HLA region

• de Bakker et al. (2006) identified tag SNPs based on r² between SNPs and HLA alleles recoded as bi-allelic markers (presence/absence of each specific HLA allele)

de Bakker et al. (2006) Nature Genetics
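The presence/absence recoding can be sketched as follows: each HLA allele is collapsed to a 2 × 2 table (that allele vs. all others) and the usual bi-allelic r² is computed against the SNP. This is a minimal illustration assuming haplotype frequencies are already estimated; it is not de Bakker et al.'s pipeline, and the function name is hypothetical. The example reuses the two-locus table from the ALD discussion, treating the two-allele locus as the SNP.

```python
def r2_presence_absence(hap, target):
    """r^2 between a bi-allelic SNP and one HLA allele recoded as present/absent.

    hap: 2 x J haplotype frequencies; rows = SNP alleles, columns = HLA alleles.
    target: column index of the HLA allele (1 = that allele, 0 = any other).
    """
    p1 = sum(hap[0])                        # frequency of SNP allele 1
    q1 = hap[0][target] + hap[1][target]    # frequency of the target HLA allele
    d = hap[0][target] - p1 * q1            # D in the recoded 2 x 2 table
    return d * d / (p1 * (1 - p1) * q1 * (1 - q1))

# Locus A as the SNP, the three B alleles as HLA alleles
hap = [[0.3, 0.0, 0.0],
       [0.0, 0.5, 0.2]]
print([round(r2_presence_absence(hap, j), 2) for j in range(3)])  # [1.0, 0.43, 0.11]
```

The SNP tags B1 perfectly, but each recoded comparison sees only one allele at a time; the ALD pair summarizes the whole multi-allelic table in both directions.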


ALD & tag-SNPs in the HLA region

Thomson and Single (2014) Genetics



Juvenile Idiopathic Arthritis, oligoarticular persistent (JIA-OP): common HLA-DRB1 alleles

Risk category   DRB1     Patients   Controls   OR
I               *08:01   102        13         6.9
I               *11:04   57         11         4.3
II              *13:01   90         38         1.9
II              *11:01   60         36         1.3
II              *01:01   74         50         1.2
II              *03:01   89         61         1.1
II              *13:02   28         23         0.9
III             *04:04   7          16         0.3
III             *15:01   38         80         0.3
III             *07:01   30         65         0.3
III             *04:01   21         47         0.3
Sum                      596        440
Total                    708        546

Overall p-value < 2.6E-27

AA 86 was implicated via pairwise analyses within serogroups.
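The OR column is consistent with a standard allele-vs-all-others odds ratio computed against the totals (708 patients, 546 controls): the Python sketch below recomputes it and, rounded to one decimal, matches every row. This is a reconstruction of how the ORs appear to have been obtained, not a documented method, and it omits confidence intervals and p-values.

```python
# Counts from the JIA-OP table above: allele -> (patients, controls).
COUNTS = {
    "*08:01": (102, 13), "*11:04": (57, 11), "*13:01": (90, 38),
    "*11:01": (60, 36), "*01:01": (74, 50), "*03:01": (89, 61),
    "*13:02": (28, 23), "*04:04": (7, 16), "*15:01": (38, 80),
    "*07:01": (30, 65), "*04:01": (21, 47),
}
TOTAL_PATIENTS, TOTAL_CONTROLS = 708, 546

def allele_odds_ratio(cases, controls, n_cases=TOTAL_PATIENTS, n_controls=TOTAL_CONTROLS):
    """Odds ratio for one allele vs. all other alleles (2 x 2 table)."""
    return (cases / (n_cases - cases)) / (controls / (n_controls - controls))

for allele, (a, c) in COUNTS.items():
    print(allele, round(allele_odds_ratio(a, c), 1))
```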


Sequence Feature Variant Type (SFVT) Analysis - Overview

• An exploratory approach for genetic association studies that uses combinations of amino acid (AA) residues as the unit of analysis.

• Goal: to identify biologically relevant amino acid (AA) residues that account for the major disease risk attributable to HLA.

• Genes/proteins are sub-divided into biologically relevant units affecting gene expression and/or protein function (i.e., Sequence Features):
  – Polymorphic AAs (single AA sites)
  – Structural features (e.g., beta 1 domain, alpha-helix 2, …)
  – Functional features (e.g., peptide binding, T-cell interacting, …)
  – Combinational (e.g., alpha-helix 2 & peptide binding, …)


www.immport.org


Summary of SFVT Analysis

1. HLA typing (allele level)
2. Group HLA alleles based on structural/functional sequence motifs (Sequence Features)
3. Perform disease association tests based on sequence motifs (Sequence Feature level)
4. Choose the top Sequence Features associated with disease risk for further study
5. Identify individual AAs and combinations of AAs directly involved in disease risk

Tools used along the way: ORs & p-values; LD patterns; conditional/stratification analyses


Representative Sequence Features: HLA-DRB1

Table from Karp et al. (2010) Hum Mol Genet

Sequence Feature ID | Sequence Feature Name | Sequence Feature Type | Amino Acid Position(s) | # of Variant Types
HLA-DRB1_SF1 | allele | Standard Allele Designation | NA | 497
HLA-DRB1_SF4 | mature protein | Structural - Complete protein | 1..237 | 52
HLA-DRB1_SF5 | beta 1 domain | Structural - Domain | 1..95 | 69
HLA-DRB1_SF12 | loop between beta-strands 1 & 2 | Structural - Secondary structure motif | 19, 20, 21, 22 | 5
HLA-DRB1_SF13 | beta-strand 2 | Structural - Secondary structure motif | 23..32 | 28
HLA-DRB1_SF21 | alpha-helix 2 | Structural - Secondary structure motif | 65..72 | 29
HLA-DRB1_SF128 | T cell receptor binding | Functional | 60, 64, 65, 66, 67, 69, 70, 71, 73, 76, 77, 78, 80, 81, 82, 84, 85 | 81
HLA-DRB1_SF137 | peptide antigen binding pocket 7 | Functional | 28, 30, 47, 61, 67, 71 | 53
HLA-DRB1_SF163 | alpha-helix 2_peptide antigen binding | Structural_Functional Combination | 67, 70, 71 | 21
HLA-DRB1_SF164 | alpha-helix 2_T cell receptor binding | Structural_Functional Combination | 65, 66, 67, 69, 70, 71 | 24


Variant Types for HLA-DRB1_SF153 "beta-strand 2_peptide antigen binding"

… 5 of 11 Variant Types (VTs) for Sequence Feature 153 (SF153)

DRB1_SF153_VT1 (LEC): DRB1*0101, 0102, 0103, 0104, 0105, …
DRB1_SF153_VT2 (FEL): DRB1*0113, 0701, 0703, 0704, 0705, …
DRB1_SF153_VT3 (YDY): DRB1*0301, 0304, 0305, 0306, 0308, …

Karp et al. (2010) Hum Mol Genet
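Assigning alleles to variant types amounts to grouping alleles by their residues at a sequence feature's positions; the motif strings play the role of VT labels such as LEC or FEL. A minimal Python sketch, using a hypothetical two-position feature (DRB1 positions 13 and 67, the pair examined in the conditional haplotype analysis) rather than SF153, whose positions are not listed here. The residues are transcribed from the common-allele AA table later in this talk; the real SFVT tooling works from full sequence alignments.

```python
from collections import defaultdict

# Residues at DRB1 positions 13 and 67 for the common JIA-OP alleles
# (other positions omitted).
RESIDUES = {
    "0801": {13: "G", 67: "F"}, "1104": {13: "S", 67: "F"},
    "1301": {13: "S", 67: "I"}, "1101": {13: "S", 67: "F"},
    "0101": {13: "F", 67: "L"}, "0301": {13: "S", 67: "L"},
    "1302": {13: "S", 67: "I"}, "0404": {13: "H", 67: "L"},
    "1501": {13: "R", 67: "I"}, "0701": {13: "Y", 67: "I"},
    "0401": {13: "H", 67: "L"},
}

def variant_types(residues, positions):
    """Group alleles by their amino acid motif at the feature's positions."""
    groups = defaultdict(list)
    for allele, aa in residues.items():
        motif = "".join(aa[pos] for pos in positions)   # e.g. "SF"
        groups[motif].append(allele)
    return dict(groups)

vts = variant_types(RESIDUES, positions=(13, 67))
for motif, alleles in sorted(vts.items()):
    print(motif, alleles)
```

Eleven alleles collapse to eight variant types here; association tests are then run on VT counts instead of allele counts.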


SFVT analysis DRB1 summary for JIA-OP

DRB1: AAs 13, 67, 37, 57, 74, 86 in binding pockets 6, 4, 7, and 9

Sequence feature    DRB1 amino acids             p-value    ORmax  ORmin
AA position 13      13                           2.00E-28   4.9    0.33
Pocket 6            11, 13, 30                   4.00E-28   7.1    0.31
Pocket 4            13, 26, 28, 70, 71, 74, 78   6.00E-28   6.8    0.28
DRB1 allele         9 … 86                       1.00E-27   9.4    0.28
Pocket 7            28, 30, 47, 61, 67, 71       9.00E-27   9.4    0.28
AA positions X-LD   [11, 12, 10, 16]             9.00E-25   3.2    0.33
AA position 67      67                           3.00E-17   3.4    0.54
Pocket 9            9, 37, 57                    4.00E-16   3.9    0.33
AA position 74      74                           4.00E-16   6.8    0.33
AA position 37      37                           4.00E-13   1.8    0.34
AA position 57      57                           6.00E-13   3.9    0.44
…                   …                            …          …      …
AA position 86      86                           ns         1.1    0.9

AAs underlined have a potential effect on disease risk; the effect of those in italics may be explained by LD with AA 13. Note that AA 86 is not significant (ns) by SFVT analysis.


SFVT Analysis - Summary

• An exploratory approach for identifying biologically relevant AAs in HLA association studies

• Pros
  – Utilizes information about the inter-relationships among HLA alleles
  – Covers more extended protein regions than single amino acid-based analyses

• Cons
  – Care is needed to address complex patterns of LD among AAs and SFs in order to identify AAs directly involved in disease
  – Because of multiple comparisons with highly correlated SFs, appropriate p-value adjustments are necessary
  – The effects of some amino acids (or combinations) may be missed, so complementary analyses are useful


Conditional Haplotype Analysis of JIA-OP

DRB1 Amino Acids 13 and 67

13-67    Patients   Controls   OR
G-F      108        14         6.8
S-F      130        49         2.3
S-I      131        71         1.5
G-I      13         8          1.3
S-L      102        80         1.0
R-I      44         91         0.2
others   270        233

p < 8E-9: AA 13 involved, or an AA in LD with it

Overall p < 2E-28


p < 0.002: AA 67 involved, or an AA in LD with it

An extensive set of conditional haplotype (CH) analyses is required, as well as consideration of LD patterns.

p < 0.001: AA 67 involved, or an AA in LD with it
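A conditional haplotype comparison holds the residue at one site fixed and asks whether the residue at a second site still distinguishes patients from controls. As a minimal sketch, the Python below applies a Pearson chi-square test of homogeneity to the haplotypes with S at AA 13 from the table above, split by the residue at AA 67. The choice of test is an assumption: it is not necessarily the test behind the p-values on these slides and is not expected to reproduce them. The closed-form tail probability is valid only for an even number of degrees of freedom.

```python
import math

def chi2_homogeneity(cases, controls):
    """Pearson chi-square for a 2 x k table (cases vs. controls by haplotype)."""
    n_case, n_ctrl = sum(cases), sum(controls)
    n = n_case + n_ctrl
    stat = 0.0
    for a, b in zip(cases, controls):
        col = a + b
        for obs, row_total in ((a, n_case), (b, n_ctrl)):
            exp = row_total * col / n          # expected count under homogeneity
            stat += (obs - exp) ** 2 / exp
    return stat, len(cases) - 1

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Haplotypes with S at AA 13, split by the residue at AA 67 (F, I, L)
stat, df = chi2_homogeneity(cases=[130, 131, 102], controls=[49, 71, 80])
print(round(stat, 2), df, round(chi2_sf_even_df(stat, df), 4))  # 10.85 2 0.0044
```

Restricting to one AA 13 residue removes its effect, so any remaining case/control difference is attributed to AA 67 or to a site in LD with it.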



LD for DRB1 AAs

[Figure: LD among DRB1 AA positions in JIA controls, shown as symmetric Wn and as asymmetric LD (ALD; row gene conditional on column gene)]


Conditional Haplotype Analysis of JIA-OP

11_13    Cases   Controls   OR
S-G      121     22         4.89    p < 3.6E-06
S-S      363     200        1.81
D-F      9       6          1.15    ns
L-F      87      66         1.01
V-H      46      84         0.38
P-R      50      99         0.34
G-Y      30      65         0.33
Total    708     546

12_13    Cases   Controls   OR
T-G      121     22         4.91    p < 3.6E-06
T-S      363     200        1.82
K-F      98      76         0.994
K-H      46      84         0.382   p < 1.2E-05
K-R      50      99         0.343
K-Y      30      65         0.327
Total    708     546


Common DRB1 Alleles & AAs in JIA-OP

                   AA position
OR    Allele       13  67  74  86  37  57
6.9   DRB1*0801    G   F   L   G   Y   S
4.3   DRB1*1104    S   F   A   V   Y   D
1.9   DRB1*1301    S   I   A   V   N   D
1.3   DRB1*1101    S   F   A   G   Y   D
1.2   DRB1*0101    F   L   A   G   S   D
1.1   DRB1*0301    S   L   R   V   N   D
0.9   DRB1*1302    S   I   A   G   N   D
0.3   DRB1*0404    H   L   A   V   Y   D
0.3   DRB1*1501    R   I   A   V   S   D
0.3   DRB1*0701    Y   I   Q   G   F   V
0.3   DRB1*0401    H   L   A   G   Y   D

• These alleles show the strongest evidence for direct involvement in JIA-OP disease risk

• The 6 identified AA sites uniquely define each allele, preventing further stratification analyses



Balancing Selection Operates at Most HLA Loci

• Balancing selection can result from:
  - Overdominance/heterozygote advantage
  - Frequency-dependent selection
  - Selective regimes that change over time/space

• For HLA, the common factor in these models is rare allele advantage, which is consistent with a pathogen-directed frequency-dependent selection model.

• At the amino acid (AA) level we see:
  - High AA variability at antigen recognition sites (ARS)
  - Relatively even AA frequencies at ARS
  - Higher rates of non-synonymous vs. synonymous changes at ARS


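Homozygosity (F) and the Normalized Deviate (Fnd)

[Figure: allele frequency by allele for three example distributions, illustrating neutrality, directional selection, and balancing selection]

• Neutrality: FOBS ≈ FEQ, Fnd ≈ 0
• Directional Selection: FOBS > FEQ, Fnd > 0
• Balancing Selection: FOBS < FEQ, Fnd < 0

$$F = \sum_{i=1}^{k} p_i^2 \qquad F_{nd} = \frac{F_{OBS} - F_{EQ}}{SD(F_{EQ})}$$

where $p_i$ is the frequency of the i-th of k alleles, and $F_{EQ}$ is the homozygosity expected under neutrality (Ewens–Watterson) given the sample size and the number of alleles.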
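FEQ and SD(FEQ) come from the neutral distribution of homozygosity given the sample size n and the number of alleles k. The Python sketch below approximates them by simulation, assuming allele counts from a single population and at least two alleles. It swaps in a simple technique, rejection sampling from Hoppe's urn (keeping only samples with exactly k alleles), for the exact conditional Monte Carlo methods (such as Slatkin's) that dedicated tools use. Because the Ewens distribution conditioned on k does not depend on θ, θ is tuned only to keep the rejection rate low. It is slow for large samples and is not the pipeline behind the results shown here.

```python
import math
import random
import statistics

def homozygosity(counts):
    """F = sum_i p_i^2 from allele counts."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)

def theta_for_k(n, k):
    """theta with E[K] = sum_{i<n} theta / (theta + i) equal to k (log-scale bisection)."""
    lo, hi = 1e-6, 1e6
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if sum(mid / (mid + i) for i in range(n)) < k:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def ewens_sample(n, theta, rng):
    """Allele counts for n genes drawn from Hoppe's urn (Ewens sampling formula)."""
    labels, counts = [], []
    for i in range(n):
        if rng.random() < theta / (theta + i):   # new allele (always at i = 0)
            labels.append(len(counts))
            counts.append(1)
        else:                                    # copy a uniformly chosen earlier gene
            a = rng.choice(labels)
            labels.append(a)
            counts[a] += 1
    return counts

def fnd(counts, reps=1000, seed=1):
    """Normalized deviate (F_obs - mean F_eq) / SD(F_eq), conditioning on n and k."""
    n, k = sum(counts), len(counts)
    theta = theta_for_k(n, k)
    rng = random.Random(seed)
    sims = []
    while len(sims) < reps:
        sample = ewens_sample(n, theta, rng)
        if len(sample) == k:                     # keep only samples with exactly k alleles
            sims.append(homozygosity(sample))
    return (homozygosity(counts) - statistics.mean(sims)) / statistics.stdev(sims)

print(fnd([20, 20, 20, 20, 20]))  # negative: more even than expected under neutrality
print(fnd([96, 1, 1, 1, 1]))      # positive: one allele dominates
```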


Fnd for DRB1 AA sites in JIA Controls

• Fnd << 0 gives evidence of possible balancing selection.
• Fnd >> 0 gives evidence of possible directional selection.


Fnd for DRB1 AA sites (Meta-Analysis)

Fnd for all polymorphic sites in a meta-analysis of 57 populations

• Fnd << 0 gives evidence of possible balancing selection.
• Fnd >> 0 gives evidence of possible directional selection.


LD for DRB1 AAs

[Figure: LD among DRB1 AA positions, JIA – Controls, shown as symmetric Wn and as asymmetric LD (ALD; row gene conditional on column gene)]


Acknowledgements

University of Sao Paulo: Diogo Meyer
University of Graz: Wolfgang Helmberg
Cincinnati Children’s Hospital: Susan Thompson, David Glass
University of Texas: Nishanth Marthandan, Paula Guidry, David Karp, Richard Scheuermann
Children's Hospital Oakland Research Inst.: Steven J. Mack, Jill A. Hollenbach
Harvard Medical School: Alex Lancaster
UC Berkeley: Glenys Thomson
UC San Francisco: Owen Solberg
Roche Molecular Systems: Henry A. Erlich
Anthony Nolan Research Inst.: Steven G.E. Marsh, Matthew Waller
NCBI/NIH: Mike Feolo
NGIT: Jeff Wiser, Patrick Dunn, Tom Smith


Distributions of Fnd values

Results from a meta-analysis of 497 HLA population studies in ten geographic regions


Distributions of Fnd values

Solberg et al., 2008


Evidence of Balancing Selection at HLA-DPB1

• Cano & Fernandez-Vina (2009) described two sequence dimorphisms that define the primary immunodominant serological epitopes for HLA-DPB1.

• All DPB1 alleles can be divided into four serologic categories (DP1, DP2, DP3, and DP4):

                        AA position
Serological Category    56   85   86   87
DP1                     A    E    A    V
DP2                     E    G    P    M
DP3                     E    E    A    V
DP4                     A    G    P    M
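The four categories are the 2 × 2 combinations of the two dimorphisms (A/E at position 56; EAV/GPM at positions 85 to 87), so classifying an allele reduces to a lookup. A minimal Python sketch, assuming the residues have already been extracted from an aligned DPB1 sequence; the helper name is hypothetical. Allele frequencies can then be summed by category before computing F and Fnd at the serological level.

```python
# Serologic category keyed by (residue at AA 56, residues at AA 85-87),
# transcribed from the table above.
DP_CATEGORY = {
    ("A", "EAV"): "DP1",
    ("E", "GPM"): "DP2",
    ("E", "EAV"): "DP3",
    ("A", "GPM"): "DP4",
}

def dp_serologic_category(residues):
    """residues: mapping from DPB1 AA position to one-letter residue.
    Returns None for combinations outside the four categories."""
    key = (residues[56], residues[85] + residues[86] + residues[87])
    return DP_CATEGORY.get(key)

print(dp_serologic_category({56: "E", 85: "E", 86: "A", 87: "V"}))  # DP3
```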


Global Distribution of DP serological categories


Fnd for DPB1 Alleles & DP Serological Categories


Evidence of Balancing Selection at HLA-DPB1

• We constructed a randomization test (“random binning” to 4 categories) to ensure that the effect was not driven by differences in the observed number of variants at the allele-level vs. serotype-level.

• Randomization tests confirmed the results more consistently for European populations than for populations in other geographic regions

- A possible ascertainment bias? (many common alleles were first identified in European populations)

- Could natural selection favoring DPB1 diversity at the serologic level be greater in Europe?


Evidence of Balancing Selection at HLA-DPB1

Supplementary Figure S1. Mean Fnd values for trios of variant DPB1 exon 2 amino acid positions

[Figure: mean Fnd (y-axis) by amino acid position trio (x-axis); mean Fnd values in variable sets of 3 amino acid positions vs. 36/56/85 paired trios]
