genome-wide strategies for detecting multiple loci that

Post on 04-Jun-2022

6 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Genome-wide strategies for detecting multiple loci that influence complex diseases

Jonathan Marchini, Peter Donnelly, Lon R Cardon

Presented by Jeff Kilpatrick

Introduction

Introduction

• Genetic epidemiologists have unprecedented mountains of data

Introduction

• Genetic epidemiologists have unprecedented mountains of data thanks, Human

Genome Project!

Introduction

• Genetic epidemiologists have unprecedented mountains of data

• Large collections of human data now available

thanks, Human

Genome Project!

Introduction

• Genetic epidemiologists have unprecedented mountains of data

• Large collections of human data now available

• Massively parallel genotyping can produce data for over a million genetic markers per person -- fast

thanks, Human

Genome Project!

Introduction

Introduction

• Great! So here’s the plan:

Introduction

• Great! So here’s the plan:

1. Evaluate each marker for association with disease

Introduction

• Great! So here’s the plan:

1. Evaluate each marker for association with disease

2. Compile list of genes near significant markers

Introduction

• Great! So here’s the plan:

1. Evaluate each marker for association with disease

2. Compile list of genes near significant markers

3. Publish in Nature

Introduction

• Great! So here’s the plan:

1. Evaluate each marker for association with disease

2. Compile list of genes near significant markers

3. Publish in Nature

4. Grow fat and wealthy with a supermodel spouse

Introduction

Introduction

• Wake up! The reality of genotype-phenotype association Hell:

Introduction

• Wake up! The reality of genotype-phenotype association Hell:

• Evidence suggests interactions contribute broadly to complex traits

Introduction

• Wake up! The reality of genotype-phenotype association Hell:

• Evidence suggests interactions contribute broadly to complex traits

• Frequency distribution of marker variants affects their statistical power

Introduction

• This paper explores two questions

• Is there hope for consistently detecting such effects?

• How do we design and analyze genome-wide association studies?

• Interaction models

• Analysis strategies

• Power analysis

• Loose ends

The Plan Today

• Interaction models

• Analysis strategies

• Power analysis

• Loose ends

Interaction Models

• Model: a mathematical description of how genes confer risk

Interaction Models

• Model: a mathematical description of how genes confer risk

• Example: “exactly two disease variants from two susceptibility loci are required”

Interaction Models

Interaction Models

• The example:

AA Aa aa

BB

Bb

bb

Interaction Models

• Adding a disease variant at either marker multiplicatively increases risk

• Loci do not interact

Genome-wide strategies for detecting multiple loci thatinfluence complex diseasesJonathan Marchini1, Peter Donnelly1 & Lon R Cardon2

After nearly 10 years of intense academic and commercialresearch effort, large genome-wide association studies forcommon complex diseases are now imminent. Although theseconditions involve a complex relationship between genotypeand phenotype, including interactions between unlinked loci1,the prevailing strategies for analysis of such studies focus onthe locus-by-locus paradigm. Here we consider analyticalmethods that explicitly look for statistical interactions betweenloci. We show first that they are computationally feasible, evenfor studies of hundreds of thousands of loci, and second thateven with a conservative correction for multiple testing, theycan be more powerful than traditional analyses under a rangeof models for interlocus interactions. We also show that

plausible variations across populations in allele frequenciesamong interacting loci can markedly affect the power to detecttheir marginal effects, which may account in part for the well-known difficulties in replicating association results. Theseresults suggest that searching for interactions among geneticloci can be fruitfully incorporated into analysis strategies forgenome-wide association studies.

Since the completion of the human genome project, genome-wideassociation studies have been considered to hold promise for unravel-ing the genetic etiology of complex traits2. It is now possible to assessthis promise, as the emergence of large marker panels, large collectionsof well-phenotyped human samples and high-throughput genotyping

Multiplicative withinand between loci

Two-locus interaction multiplicative effects

Two-locus interaction threshold effects

AA

Aa

!aa

BBBbbb

!AA

!Aa

!!!aa

BBBbbb

!AA

!Aa

!!!aa

BBBbbb

Odd

s

Locus 1

Locu

s 20.0

0.5

1.0

1.5

2.0

2.5

3.0

0.0

0.5

1.0

1.5

2.0

2.5

Locus 1

bbBb

BB

Locu

s 2

bbBb

BB

Locu

s 2

bbBb

BB

aaAa AALocus 1

aaAa AA

aaAa AA

4.5

0.0

0.5

1.0

1.5

2.0

2.5

3.0

!(1+"2)

!(1+"1) !(1+")

!(1+")2

!(1+")2

!(1+")4!(1+"1)2

!(1+"1)(1+"2) !(1+")

!(1+")

!(1+")

!(1+")

!(1+"1)(1+"2)2

!(1–"2)2

!(1+"1)2(1+"2) !(1+"1)2(1+"2)2

a

b

Figure 1 Multilocus models of disease. (a) The odds of disease for two loci under the epistatic scenarios considered. In model 1, the odds increasemultiplicatively with genotype both within and between loci. In model 2, the odds have a baseline value (a) unless both loci have at least one disease-associated allele. After that, the odds increase multiplicatively within and between genotypes. Model 3 is similar to model 2 but specifies a threshold ofdisease effects rather than multiplicative gene action. Both loci have the same effect size. As models 2 and 3 include no explicit marginal effects, they areexpected to be harder to detect without an interaction-based search strategy. (b) Examples of the genotypic risks under illustrative parameters. In theseexamples, pA ¼ pB ¼ 0.25 and l ¼ 0.20, which permits derivation of the genotypic effects, y, as 0.20, 0.45 and 0.53 for the examples shown (left toright); a ¼ 1.0 for illustration purposes.

Published online 27 March 2005; doi:10.1038/ng1537

1Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK. 2Wellcome Trust Centre for Human Genetics, University of Oxford,Oxford OX3 7BN, UK. Correspondence should be addressed to L.R.C. (lon.cardon@well.ox.ac.uk).

NATURE GENETICS VOLUME 37 [ NUMBER 4 [ APRIL 2005 413

LET TERS

©20

05 N

atur

e Pu

blis

hing

Gro

up h

ttp://

ww

w.n

atur

e.co

m/n

atur

egen

etic

s

Model 1: multiplicative withinand between loci

Interaction Models

• Neither locus alone is sufficient

• Multiple risk alleles from different loci increase risk linearly

Genome-wide strategies for detecting multiple loci thatinfluence complex diseasesJonathan Marchini1, Peter Donnelly1 & Lon R Cardon2

After nearly 10 years of intense academic and commercialresearch effort, large genome-wide association studies forcommon complex diseases are now imminent. Although theseconditions involve a complex relationship between genotypeand phenotype, including interactions between unlinked loci1,the prevailing strategies for analysis of such studies focus onthe locus-by-locus paradigm. Here we consider analyticalmethods that explicitly look for statistical interactions betweenloci. We show first that they are computationally feasible, evenfor studies of hundreds of thousands of loci, and second thateven with a conservative correction for multiple testing, theycan be more powerful than traditional analyses under a rangeof models for interlocus interactions. We also show that

plausible variations across populations in allele frequenciesamong interacting loci can markedly affect the power to detecttheir marginal effects, which may account in part for the well-known difficulties in replicating association results. Theseresults suggest that searching for interactions among geneticloci can be fruitfully incorporated into analysis strategies forgenome-wide association studies.

Since the completion of the human genome project, genome-wideassociation studies have been considered to hold promise for unravel-ing the genetic etiology of complex traits2. It is now possible to assessthis promise, as the emergence of large marker panels, large collectionsof well-phenotyped human samples and high-throughput genotyping

Multiplicative withinand between loci

Two-locus interaction multiplicative effects

Two-locus interaction threshold effects

AA

Aa

!aa

BBBbbb

!AA

!Aa

!!!aa

BBBbbb

!AA

!Aa

!!!aa

BBBbbb

Odd

s

Locus 1

Locu

s 20.0

0.5

1.0

1.5

2.0

2.5

3.0

0.0

0.5

1.0

1.5

2.0

2.5

Locus 1

bbBb

BB

Locu

s 2

bbBb

BB

Locu

s 2

bbBb

BB

aaAa AALocus 1

aaAa AA

aaAa AA

4.5

0.0

0.5

1.0

1.5

2.0

2.5

3.0

!(1+"2)

!(1+"1) !(1+")

!(1+")2

!(1+")2

!(1+")4!(1+"1)2

!(1+"1)(1+"2) !(1+")

!(1+")

!(1+")

!(1+")

!(1+"1)(1+"2)2

!(1–"2)2

!(1+"1)2(1+"2) !(1+"1)2(1+"2)2

a

b

Figure 1 Multilocus models of disease. (a) The odds of disease for two loci under the epistatic scenarios considered. In model 1, the odds increasemultiplicatively with genotype both within and between loci. In model 2, the odds have a baseline value (a) unless both loci have at least one disease-associated allele. After that, the odds increase multiplicatively within and between genotypes. Model 3 is similar to model 2 but specifies a threshold ofdisease effects rather than multiplicative gene action. Both loci have the same effect size. As models 2 and 3 include no explicit marginal effects, they areexpected to be harder to detect without an interaction-based search strategy. (b) Examples of the genotypic risks under illustrative parameters. In theseexamples, pA ¼ pB ¼ 0.25 and l ¼ 0.20, which permits derivation of the genotypic effects, y, as 0.20, 0.45 and 0.53 for the examples shown (left toright); a ¼ 1.0 for illustration purposes.

Published online 27 March 2005; doi:10.1038/ng1537

1Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK. 2Wellcome Trust Centre for Human Genetics, University of Oxford,Oxford OX3 7BN, UK. Correspondence should be addressed to L.R.C. (lon.cardon@well.ox.ac.uk).

NATURE GENETICS VOLUME 37 [ NUMBER 4 [ APRIL 2005 413

LET TERS

©20

05 N

atur

e Pu

blis

hing

Gro

up h

ttp://

ww

w.n

atur

e.co

m/n

atur

egen

etic

s

Model 2: two-locus interactionmultiplicative effects

Interaction Models

• Neither locus alone is sufficient

• Presence of risk variants from both markers increases elevates risk to constant level

Genome-wide strategies for detecting multiple loci thatinfluence complex diseasesJonathan Marchini1, Peter Donnelly1 & Lon R Cardon2

After nearly 10 years of intense academic and commercialresearch effort, large genome-wide association studies forcommon complex diseases are now imminent. Although theseconditions involve a complex relationship between genotypeand phenotype, including interactions between unlinked loci1,the prevailing strategies for analysis of such studies focus onthe locus-by-locus paradigm. Here we consider analyticalmethods that explicitly look for statistical interactions betweenloci. We show first that they are computationally feasible, evenfor studies of hundreds of thousands of loci, and second thateven with a conservative correction for multiple testing, theycan be more powerful than traditional analyses under a rangeof models for interlocus interactions. We also show that

plausible variations across populations in allele frequenciesamong interacting loci can markedly affect the power to detecttheir marginal effects, which may account in part for the well-known difficulties in replicating association results. Theseresults suggest that searching for interactions among geneticloci can be fruitfully incorporated into analysis strategies forgenome-wide association studies.

Since the completion of the human genome project, genome-wideassociation studies have been considered to hold promise for unravel-ing the genetic etiology of complex traits2. It is now possible to assessthis promise, as the emergence of large marker panels, large collectionsof well-phenotyped human samples and high-throughput genotyping

Multiplicative withinand between loci

Two-locus interaction multiplicative effects

Two-locus interaction threshold effects

AA

Aa

!aa

BBBbbb

!AA

!Aa

!!!aa

BBBbbb

!AA

!Aa

!!!aa

BBBbbb

Odd

s

Locus 1

Locu

s 20.0

0.5

1.0

1.5

2.0

2.5

3.0

0.0

0.5

1.0

1.5

2.0

2.5

Locus 1

bbBb

BB

Locu

s 2

bbBb

BB

Locu

s 2

bbBb

BB

aaAa AALocus 1

aaAa AA

aaAa AA

4.5

0.0

0.5

1.0

1.5

2.0

2.5

3.0

!(1+"2)

!(1+"1) !(1+")

!(1+")2

!(1+")2

!(1+")4!(1+"1)2

!(1+"1)(1+"2) !(1+")

!(1+")

!(1+")

!(1+")

!(1+"1)(1+"2)2

!(1–"2)2

!(1+"1)2(1+"2) !(1+"1)2(1+"2)2

a

b

Figure 1 Multilocus models of disease. (a) The odds of disease for two loci under the epistatic scenarios considered. In model 1, the odds increasemultiplicatively with genotype both within and between loci. In model 2, the odds have a baseline value (a) unless both loci have at least one disease-associated allele. After that, the odds increase multiplicatively within and between genotypes. Model 3 is similar to model 2 but specifies a threshold ofdisease effects rather than multiplicative gene action. Both loci have the same effect size. As models 2 and 3 include no explicit marginal effects, they areexpected to be harder to detect without an interaction-based search strategy. (b) Examples of the genotypic risks under illustrative parameters. In theseexamples, pA ¼ pB ¼ 0.25 and l ¼ 0.20, which permits derivation of the genotypic effects, y, as 0.20, 0.45 and 0.53 for the examples shown (left toright); a ¼ 1.0 for illustration purposes.

Published online 27 March 2005; doi:10.1038/ng1537

1Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK. 2Wellcome Trust Centre for Human Genetics, University of Oxford,Oxford OX3 7BN, UK. Correspondence should be addressed to L.R.C. (lon.cardon@well.ox.ac.uk).

NATURE GENETICS VOLUME 37 [ NUMBER 4 [ APRIL 2005 413

LET TERS

©20

05 N

atur

e Pu

blis

hing

Gro

up h

ttp://

ww

w.n

atur

e.co

m/n

atur

egen

etic

s

Model 3: two-locus interactionthreshold effects

• Interaction models

• Analysis strategies

• Power analysis

• Loose ends

Analysis Strategies

• Outside our dream world, we have to be selective in the tests we conduct

Analysis Strategies

• Outside our dream world, we have to be selective in the tests we conduct

• Tests cost time. Time is money.

Analysis Strategies

• Outside our dream world, we have to be selective in the tests we conduct

• Tests cost time. Time is money.

• Tests cost significance

Analysis Strategies

Analysis Strategies

Analysis Strategies

• Strategy I -- “Dreamland”

Analysis Strategies

• Strategy I -- “Dreamland”

• Perform locus-by-locus search

Analysis Strategies

• Strategy I -- “Dreamland”

• Perform locus-by-locus search

• For n markers, n tests are required

Analysis Strategies

• Strategy I -- “Dreamland”

• Perform locus-by-locus search

• For n markers, n tests are required

• Has a snowball’s chance to discover interactions

Analysis Strategies

• Strategy I -- “Dreamland”

• Perform locus-by-locus search

• For n markers, n tests are required

• Has a snowball’s chance to discover interactions

Analysis Strategies

Analysis Strategies

• Strategy II -- “Styx”

Analysis Strategies

• Strategy II -- “Styx”

• Test all pairs of loci

Analysis Strategies

• Strategy II -- “Styx”

• Test all pairs of loci

• Requires n2 tests

Analysis Strategies

• Strategy II -- “Styx”

• Test all pairs of loci

• Requires n2 tests

• Will discover all pairwise interactions, assuming their effects survive correction for multiple tests

Analysis Strategies

Analysis Strategies

• Strategy III -- “The Compromise”

Analysis Strategies

• Strategy III -- “The Compromise”

• Search for mildly associated loci

Analysis Strategies

• Strategy III -- “The Compromise”

• Search for mildly associated loci

• All pairs of selected loci are tested

• Interaction models

• Analysis strategies

• Power analysis

• Loose ends

Power Analysis

• Simulated genotypes generated at two loci under each model

Power Analysis

• Simulated genotypes generated at two loci under each model

• Calculations assume L = 300,000 markers, with two (unobserved) causative loci

Power Analysis

• Simulated genotypes generated at two loci under each model

• Calculations assume L = 300,000 markers, with two (unobserved) causative loci

• Bonferroni correction applied

Power Analysis

Power AnalysisD

ista

nce

to

dis

ease

locu

sH

igh

Med

ium

Low

Dreamland(either locus)

Dreamland(both loci)

Styx(both loci)

The Compromise(both loci)

Power Analysis

Power Analysis

• Interaction-based searches perform well, in spite of harsh correction

Power Analysis

• Interaction-based searches perform well, in spite of harsh correction

• Except when recovering one marker under Model 1

Power Analysis

• Interaction-based searches perform well, in spite of harsh correction

• Except when recovering one marker under Model 1

• Power strongly correlated with minor allele frequency and LD

Power Analysis

Power Analysis

• All three strategies are computationally feasible

Power Analysis

• All three strategies are computationally feasible

• Styx approach took 33 hours on ten nodes with 300,000 markers and 2,000 subjects

• Interaction models

• Analysis strategies

• Power analysis

• Loose ends

Loose Ends

• Power analysis suggests reasons for failure to replicate

Loose Ends

• Power analysis suggests reasons for failure to replicate

• Presence of locus interaction

Loose Ends

• Power analysis suggests reasons for failure to replicate

• Presence of locus interaction

• Different allele frequencies between initial and follow-up cohorts

Loose Ends

Loose Ends

Loose Ends

• This study understates usefulness of interaction searches

Loose Ends

• This study understates usefulness of interaction searches

• Bonferroni is conservative

Loose Ends

• This study understates usefulness of interaction searches

• Bonferroni is conservative

• Permutation testing would be more accurate

Conclusions

Conclusions

• All non-exhaustive interaction searches may miss some effects

Conclusions

• All non-exhaustive interaction searches may miss some effects

• Complete enumeration is too expensive for higher order effects

Conclusions

• All non-exhaustive interaction searches may miss some effects

• Complete enumeration is too expensive for higher order effects

• The Compromise provides the best of both worlds in most studies

Questions

top related