Page 1: Genome-wide strategies for detecting multiple loci that

Genome-wide strategies for detecting multiple loci that influence complex diseases

Jonathan Marchini, Peter Donnelly, Lon R Cardon

Presented by Jeff Kilpatrick

Page 6: Genome-wide strategies for detecting multiple loci that

Introduction

• Genetic epidemiologists have unprecedented mountains of data (thanks, Human Genome Project!)

• Large collections of human data now available

• Massively parallel genotyping can produce data for over a million genetic markers per person -- fast

Page 12: Genome-wide strategies for detecting multiple loci that

Introduction

• Great! So here’s the plan:

1. Evaluate each marker for association with disease

2. Compile list of genes near significant markers

3. Publish in Nature

4. Grow fat and wealthy with a supermodel spouse

Page 16: Genome-wide strategies for detecting multiple loci that

Introduction

• Wake up! The reality of genotype-phenotype association Hell:

• Evidence suggests interactions contribute broadly to complex traits

• Frequency distribution of marker variants affects their statistical power

Page 17: Genome-wide strategies for detecting multiple loci that

Introduction

• This paper explores two questions

• Is there hope for consistently detecting such effects?

• How do we design and analyze genome-wide association studies?

Page 18: Genome-wide strategies for detecting multiple loci that

The Plan Today

• Interaction models

• Analysis strategies

• Power analysis

• Loose ends

Page 22: Genome-wide strategies for detecting multiple loci that

Interaction Models

• Model: a mathematical description of how genes confer risk

• Example: “exactly two disease variants from two susceptibility loci are required”

Page 23: Genome-wide strategies for detecting multiple loci that

Interaction Models

• The example (columns: genotype at locus A; rows: genotype at locus B):

          AA    Aa    aa
    BB
    Bb
    bb

Page 24: Genome-wide strategies for detecting multiple loci that

Interaction Models

• Adding a disease variant at either marker multiplicatively increases risk

• Loci do not interact

Genome-wide strategies for detecting multiple loci that influence complex diseases
Jonathan Marchini¹, Peter Donnelly¹ & Lon R Cardon²

After nearly 10 years of intense academic and commercial research effort, large genome-wide association studies for common complex diseases are now imminent. Although these conditions involve a complex relationship between genotype and phenotype, including interactions between unlinked loci¹, the prevailing strategies for analysis of such studies focus on the locus-by-locus paradigm. Here we consider analytical methods that explicitly look for statistical interactions between loci. We show first that they are computationally feasible, even for studies of hundreds of thousands of loci, and second that even with a conservative correction for multiple testing, they can be more powerful than traditional analyses under a range of models for interlocus interactions. We also show that plausible variations across populations in allele frequencies among interacting loci can markedly affect the power to detect their marginal effects, which may account in part for the well-known difficulties in replicating association results. These results suggest that searching for interactions among genetic loci can be fruitfully incorporated into analysis strategies for genome-wide association studies.

Since the completion of the human genome project, genome-wide association studies have been considered to hold promise for unraveling the genetic etiology of complex traits². It is now possible to assess this promise, as the emergence of large marker panels, large collections of well-phenotyped human samples and high-throughput genotyping

(a) Odds of disease by two-locus genotype (rows: locus 1; columns: locus 2)

Multiplicative within and between loci
          BB                   Bb                  bb
    AA    α(1+θ₁)²(1+θ₂)²      α(1+θ₁)²(1+θ₂)      α(1+θ₁)²
    Aa    α(1+θ₁)(1+θ₂)²       α(1+θ₁)(1+θ₂)       α(1+θ₁)
    aa    α(1+θ₂)²             α(1+θ₂)             α

Two-locus interaction multiplicative effects
          BB           Bb           bb
    AA    α(1+θ)⁴      α(1+θ)²      α
    Aa    α(1+θ)²      α(1+θ)       α
    aa    α            α            α

Two-locus interaction threshold effects
          BB          Bb          bb
    AA    α(1+θ)      α(1+θ)      α
    Aa    α(1+θ)      α(1+θ)      α
    aa    α           α           α

(b) Plots of the odds for each two-locus genotype under the three models (axes: Locus 1 with aa, Aa, AA; Locus 2 with bb, Bb, BB; vertical axis: Odds).

Figure 1 Multilocus models of disease. (a) The odds of disease for two loci under the epistatic scenarios considered. In model 1, the odds increase multiplicatively with genotype both within and between loci. In model 2, the odds have a baseline value (α) unless both loci have at least one disease-associated allele. After that, the odds increase multiplicatively within and between genotypes. Model 3 is similar to model 2 but specifies a threshold of disease effects rather than multiplicative gene action. Both loci have the same effect size. As models 2 and 3 include no explicit marginal effects, they are expected to be harder to detect without an interaction-based search strategy. (b) Examples of the genotypic risks under illustrative parameters. In these examples, pA = pB = 0.25 and λ = 0.20, which permits derivation of the genotypic effects, θ, as 0.20, 0.45 and 0.53 for the examples shown (left to right); α = 1.0 for illustration purposes.

Published online 27 March 2005; doi:10.1038/ng1537

¹Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK. ²Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK. Correspondence should be addressed to L.R.C. ([email protected]).

NATURE GENETICS VOLUME 37 | NUMBER 4 | APRIL 2005 413

LETTERS

©2005 Nature Publishing Group http://www.nature.com/naturegenetics

Model 1: multiplicative within and between loci

Page 25: Genome-wide strategies for detecting multiple loci that

Interaction Models

• Neither locus alone is sufficient

• Multiple risk alleles from different loci increase risk multiplicatively

Model 2: two-locus interaction multiplicative effects

Page 26: Genome-wide strategies for detecting multiple loci that

Interaction Models

• Neither locus alone is sufficient

• Presence of risk variants at both markers elevates risk to a constant level

Model 3: two-locus interaction threshold effects
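The three models described in the Figure 1 caption can be written as small functions of the risk-allele counts at each locus. This is a minimal sketch rather than the authors' code: the function names are made up for illustration, `a` and `b` are assumed to be the number of risk alleles (0, 1 or 2) at locus 1 and locus 2, and the default parameter values are the illustrative ones quoted in the caption (α = 1.0; θ = 0.20, 0.45 and 0.53).

```python
def odds_model1(a, b, alpha=1.0, theta1=0.20, theta2=0.20):
    """Model 1: odds multiply per risk allele, within and between loci."""
    return alpha * (1 + theta1) ** a * (1 + theta2) ** b


def odds_model2(a, b, alpha=1.0, theta=0.45):
    """Model 2: baseline odds unless both loci carry a risk allele,
    then multiplicative in the allele counts at both loci."""
    if a == 0 or b == 0:
        return alpha
    return alpha * (1 + theta) ** (a * b)


def odds_model3(a, b, alpha=1.0, theta=0.53):
    """Model 3: baseline odds unless both loci carry a risk allele,
    then a single elevated (threshold) level."""
    if a == 0 or b == 0:
        return alpha
    return alpha * (1 + theta)


if __name__ == "__main__":
    # Print a 3x3 odds table for each model (rows: locus 1, columns: locus 2).
    for name, model in [("Model 1", odds_model1),
                        ("Model 2", odds_model2),
                        ("Model 3", odds_model3)]:
        print(name)
        for a in (2, 1, 0):
            row = "  ".join(f"{model(a, b):5.2f}" for b in (2, 1, 0))
            print(f"  locus 1 alleles={a}: {row}")
```

Under models 2 and 3 every genotype with no risk allele at one of the loci sits at the baseline odds, which is why the caption expects them to be harder to find with a single-locus scan.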

Page 31: Genome-wide strategies for detecting multiple loci that

Analysis Strategies

• Outside our dream world, we have to be selective in the tests we conduct

• Tests cost time. Time is money.

• Tests cost significance: every extra test tightens the multiple-testing correction

Page 36: Genome-wide strategies for detecting multiple loci that

Analysis Strategies

• Strategy I -- “Dreamland”

• Perform locus-by-locus search

• For n markers, n tests are required

• Has a snowball’s chance to discover interactions

Page 42: Genome-wide strategies for detecting multiple loci that

Analysis Strategies

• Strategy II -- “Styx”

• Test all pairs of loci

• Requires on the order of n² tests (n(n−1)/2 distinct pairs)

• Will discover all pairwise interactions, assuming their effects survive correction for multiple tests

Page 46: Genome-wide strategies for detecting multiple loci that

Analysis Strategies

• Strategy III -- “The Compromise”

• Search for mildly associated loci

• All pairs of selected loci are tested

Page 51: Genome-wide strategies for detecting multiple loci that

Power Analysis

• Simulated genotypes generated at two loci under each model

• Calculations assume L = 300,000 markers, with two (unobserved) causative loci

• Bonferroni correction applied
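As a rough illustration of how harsh the Bonferroni correction becomes at this scale, the sketch below computes the per-test significance threshold for a single-locus scan and for an all-pairs scan of L = 300,000 markers. The family-wise level of 0.05 is an assumption chosen for illustration, and `bonferroni_threshold` is a hypothetical helper name.

```python
from math import comb


def bonferroni_threshold(family_alpha, n_tests):
    """Per-test significance level that keeps the family-wise error
    rate at or below family_alpha across n_tests tests."""
    return family_alpha / n_tests


L = 300_000          # number of markers, as on the slide
FAMILY_ALPHA = 0.05  # assumed family-wise error rate

single_locus = bonferroni_threshold(FAMILY_ALPHA, L)
all_pairs = bonferroni_threshold(FAMILY_ALPHA, comb(L, 2))

print(f"single-locus scan: p < {single_locus:.2e}")
print(f"all-pairs scan:    p < {all_pairs:.2e}")
```

The pairwise threshold is smaller than the single-locus one by a factor of roughly L/2, which is the price in significance that an exhaustive pairwise search has to pay.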

Page 52: Genome-wide strategies for detecting multiple loci that

Power Analysis

[Figure: axis "Distance to disease locus" (High, Medium, Low); panels: Dreamland (either locus), Dreamland (both loci), Styx (both loci), The Compromise (both loci)]

Page 56: Genome-wide strategies for detecting multiple loci that

Power Analysis

• Interaction-based searches perform well, in spite of harsh correction

• Except when recovering one marker under Model 1

• Power strongly correlated with minor allele frequency and linkage disequilibrium (LD)

Page 59: Genome-wide strategies for detecting multiple loci that

Power Analysis

• All three strategies are computationally feasible

• Styx approach took 33 hours on ten nodes with 300,000 markers and 2,000 subjects
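The reported run time implies a rough throughput figure. The back-of-the-envelope sketch below assumes the search covered every unordered pair of the 300,000 markers exactly once and that the work was spread evenly over the ten nodes. Both are simplifying assumptions.

```python
from math import comb

MARKERS = 300_000
HOURS = 33
NODES = 10

pair_tests = comb(MARKERS, 2)          # distinct unordered marker pairs
node_seconds = HOURS * 3600 * NODES    # total compute time across all nodes
tests_per_node_second = pair_tests / node_seconds

print(f"pairwise tests:        {pair_tests:,}")
print(f"node-seconds:          {node_seconds:,}")
print(f"tests per node-second: {tests_per_node_second:,.0f}")
```

Under these assumptions each node evaluates tens of thousands of two-locus tests per second, each over 2,000 subjects.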

Page 64: Genome-wide strategies for detecting multiple loci that

Loose Ends

• Power analysis suggests reasons for failure to replicate

• Presence of locus interaction

• Different allele frequencies between initial and follow-up cohorts

Page 68: Genome-wide strategies for detecting multiple loci that

Loose Ends

• This study understates usefulness of interaction searches

• Bonferroni is conservative

• Permutation testing would be more accurate
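One common way to replace Bonferroni with a permutation-based threshold is the max-statistic approach: shuffle the case/control labels, record the largest test statistic over all markers, and take a high quantile of those maxima as the significance cutoff. The sketch below is a toy illustration of that idea, not the procedure from the paper. The statistic (absolute difference in mean allele count between cases and controls), the data sizes, the seeds and the function names are all assumptions.

```python
import random


def max_stat(genotypes, labels):
    """Largest absolute case-control difference in mean allele count over markers.

    genotypes: one list per marker, holding an allele count (0/1/2) per subject.
    labels: 1 for case, 0 for control, one per subject.
    """
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    best = 0.0
    for marker in genotypes:
        mean_case = sum(marker[i] for i in cases) / len(cases)
        mean_ctrl = sum(marker[i] for i in controls) / len(controls)
        best = max(best, abs(mean_case - mean_ctrl))
    return best


def permutation_threshold(genotypes, labels, n_perm=200, family_alpha=0.05, seed=0):
    """Empirical (1 - family_alpha) quantile of the maximum statistic
    under random relabelling of cases and controls."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_max.append(max_stat(genotypes, shuffled))
    null_max.sort()
    index = min(n_perm - 1, int((1 - family_alpha) * n_perm))
    return null_max[index]


# Toy data: 50 markers, 100 subjects, no true association.
data_rng = random.Random(42)
genotypes = [[data_rng.choice([0, 1, 2]) for _ in range(100)] for _ in range(50)]
labels = [1] * 50 + [0] * 50

threshold = permutation_threshold(genotypes, labels)
observed = max_stat(genotypes, labels)
print(f"permutation threshold: {threshold:.3f}, observed max: {observed:.3f}")
```

Because the labels are permuted jointly across all markers, the null distribution reflects the correlation between tests, which is why this kind of threshold tends to be less conservative than Bonferroni when markers are in LD.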

Page 72: Genome-wide strategies for detecting multiple loci that

Conclusions

• All non-exhaustive interaction searches may miss some effects

• Complete enumeration is too expensive for higher order effects

• The Compromise provides the best of both worlds in most studies

Page 74: Genome-wide strategies for detecting multiple loci that

Questions