Genome-wide strategies for detecting multiple loci that influence complex diseases
Jonathan Marchini, Peter Donnelly, Lon R Cardon
Presented by Jeff Kilpatrick
Introduction
• Genetic epidemiologists have unprecedented mountains of data
• Large collections of human data now available
• Massively parallel genotyping can produce data for over a million genetic markers per person -- fast
(Thanks, Human Genome Project!)
Introduction
• Great! So here’s the plan:
1. Evaluate each marker for association with disease
2. Compile list of genes near significant markers
3. Publish in Nature
4. Grow fat and wealthy with a supermodel spouse
Introduction
• Wake up! The reality of genotype-phenotype association Hell:
• Evidence suggests interactions contribute broadly to complex traits
• Frequency distribution of marker variants affects their statistical power
Introduction
• This paper explores two questions
• Is there hope for consistently detecting such effects?
• How do we design and analyze genome-wide association studies?
The Plan Today
• Interaction models
• Analysis strategies
• Power analysis
• Loose ends
Interaction Models
• Model: a mathematical description of how genes confer risk
• Example: “exactly two disease variants from two susceptibility loci are required”
Interaction Models
• The example:
[Grid: columns AA, Aa, aa; rows BB, Bb, bb; three genotype combinations marked ✓]
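The positions of the three check marks did not survive extraction. As a minimal sketch, assuming the capital letters denote the disease variants and that "exactly two" means two in total across both loci (both are assumptions, and the function names are illustrative), the rule can be enumerated like this:

```python
from itertools import product

# Genotypes at each locus; capital letters are ASSUMED to be the disease variants.
LOCUS_A = ["AA", "Aa", "aa"]
LOCUS_B = ["BB", "Bb", "bb"]

def disease_variant_count(genotype: str) -> int:
    """Count capital (assumed disease) alleles in a two-letter genotype."""
    return sum(1 for allele in genotype if allele.isupper())

def at_risk(geno_a: str, geno_b: str) -> bool:
    """Example rule: exactly two disease variants in total across both loci."""
    return disease_variant_count(geno_a) + disease_variant_count(geno_b) == 2

# All two-locus genotypes that satisfy the rule.
risk_cells = [(a, b) for a, b in product(LOCUS_A, LOCUS_B) if at_risk(a, b)]

# Print the 3x3 grid in the slide's layout (columns: locus A, rows: locus B).
print("     " + "  ".join(f"{a:>3}" for a in LOCUS_A))
for b in LOCUS_B:
    marks = ["✓" if at_risk(a, b) else "." for a in LOCUS_A]
    print(f"{b:>3}  " + "  ".join(f"{m:>3}" for m in marks))
```

Under this reading exactly three cells qualify, which is consistent with the three check marks on the slide. The same three cells result if the lowercase letters are taken as the disease variants instead.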
Interaction Models
• Adding a disease variant at either marker multiplicatively increases risk
• Loci do not interact
Genome-wide strategies for detecting multiple loci that influence complex diseases
Jonathan Marchini¹, Peter Donnelly¹ & Lon R Cardon²
After nearly 10 years of intense academic and commercial research effort, large genome-wide association studies for common complex diseases are now imminent. Although these conditions involve a complex relationship between genotype and phenotype, including interactions between unlinked loci¹, the prevailing strategies for analysis of such studies focus on the locus-by-locus paradigm. Here we consider analytical methods that explicitly look for statistical interactions between loci. We show first that they are computationally feasible, even for studies of hundreds of thousands of loci, and second that even with a conservative correction for multiple testing, they can be more powerful than traditional analyses under a range of models for interlocus interactions. We also show that plausible variations across populations in allele frequencies among interacting loci can markedly affect the power to detect their marginal effects, which may account in part for the well-known difficulties in replicating association results. These results suggest that searching for interactions among genetic loci can be fruitfully incorporated into analysis strategies for genome-wide association studies.
Since the completion of the human genome project, genome-wide association studies have been considered to hold promise for unraveling the genetic etiology of complex traits². It is now possible to assess this promise, as the emergence of large marker panels, large collections of well-phenotyped human samples and high-throughput genotyping
[Figure 1a: odds of disease by two-locus genotype (locus 1: aa, Aa, AA; locus 2: bb, Bb, BB), with i and j the number of disease-associated alleles at locus 1 and locus 2. Multiplicative within and between loci: α(1+θ1)^i (1+θ2)^j. Two-locus interaction, multiplicative effects: α(1+θ)^(i·j). Two-locus interaction, threshold effects: α(1+θ) if i ≥ 1 and j ≥ 1, otherwise α.]
[Figure 1b: bar plots of odds against Locus 1 and Locus 2 genotype, one panel per model.]
Figure 1 Multilocus models of disease. (a) The odds of disease for two loci under the epistatic scenarios considered. In model 1, the odds increase multiplicatively with genotype both within and between loci. In model 2, the odds have a baseline value (α) unless both loci have at least one disease-associated allele. After that, the odds increase multiplicatively within and between genotypes. Model 3 is similar to model 2 but specifies a threshold of disease effects rather than multiplicative gene action. Both loci have the same effect size. As models 2 and 3 include no explicit marginal effects, they are expected to be harder to detect without an interaction-based search strategy. (b) Examples of the genotypic risks under illustrative parameters. In these examples, pA = pB = 0.25 and λ = 0.20, which permits derivation of the genotypic effects, θ, as 0.20, 0.45 and 0.53 for the examples shown (left to right); α = 1.0 for illustration purposes.
Published online 27 March 2005; doi:10.1038/ng1537
¹Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK. ²Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK. Correspondence should be addressed to L.R.C.
Nature Genetics, Volume 37, Number 4, April 2005, p. 413 (Letters)
© 2005 Nature Publishing Group http://www.nature.com/naturegenetics
Model 1: multiplicative within and between loci
Interaction Models
• Neither locus alone is sufficient
• Multiple risk alleles from different loci increase risk multiplicatively
Model 2: two-locus interaction, multiplicative effects
Interaction Models
• Neither locus alone is sufficient
• Presence of risk variants from both markers elevates risk to a constant level
Model 3: two-locus interaction, threshold effects
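The three models can be written down as small functions of the risk-allele counts at each locus. This is a sketch rather than the authors' code: the formulas follow the Figure 1 odds tables, the default parameters are the illustrative values from the figure caption, and the function names and the indices i and j are introduced here for illustration.

```python
# Disease odds for a two-locus genotype, where i and j are the numbers of
# disease-associated alleles (0, 1 or 2) at locus 1 and locus 2.
# Defaults follow the Figure 1 caption: alpha = 1.0 and theta = 0.20, 0.45, 0.53
# for models 1, 2 and 3. Function names are illustrative, not from the paper.

def odds_model1(i, j, alpha=1.0, theta1=0.20, theta2=0.20):
    """Model 1: multiplicative within and between loci."""
    return alpha * (1 + theta1) ** i * (1 + theta2) ** j

def odds_model2(i, j, alpha=1.0, theta=0.45):
    """Model 2: baseline odds unless both loci carry a risk allele, then multiplicative."""
    return alpha * (1 + theta) ** (i * j)

def odds_model3(i, j, alpha=1.0, theta=0.53):
    """Model 3: baseline odds unless both loci carry a risk allele, then one fixed step up."""
    return alpha * (1 + theta) if i >= 1 and j >= 1 else alpha

def odds_table(model):
    """3x3 table of odds: rows are j = 0..2 (locus 2), columns are i = 0..2 (locus 1)."""
    return [[round(model(i, j), 3) for i in range(3)] for j in range(3)]

for name, model in [("Model 1", odds_model1),
                    ("Model 2", odds_model2),
                    ("Model 3", odds_model3)]:
    print(name)
    for row in odds_table(model):
        print("  ", row)
```

In models 2 and 3 every genotype with no risk allele at one of the loci stays at the baseline odds, which is why neither locus shows an explicit marginal effect on its own.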
Analysis Strategies
• Outside our dream world, we have to be selective in the tests we conduct
• Tests cost time. Time is money.
• Tests cost significance
Analysis Strategies
• Strategy I -- “Dreamland”
• Perform locus-by-locus search
• For n markers, n tests are required
• Has a snowball’s chance to discover interactions
Analysis Strategies
• Strategy II -- “Styx”
• Test all pairs of loci
• Requires n(n-1)/2 tests, on the order of n²
• Will discover all pairwise interactions, assuming their effects survive correction for multiple tests
Analysis Strategies
• Strategy III -- “The Compromise”
• Search for mildly associated loci
• All pairs of selected loci are tested
Power Analysis
• Simulated genotypes generated at two loci under each model
• Calculations assume L = 300,000 markers, with two (unobserved) causative loci
• Bonferroni correction applied
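The Bonferroni correction divides the overall significance level by the number of tests, so the pairwise search has to clear a much stricter per-test threshold. A short sketch, assuming a family-wise level of 0.05 (this value is an assumption, not taken from the slides):

```python
from math import comb

FAMILY_WISE_ALPHA = 0.05   # ASSUMED overall false-positive rate
N_MARKERS = 300_000        # marker count quoted on the slide

def bonferroni_threshold(n_tests: int, alpha: float = FAMILY_WISE_ALPHA) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

single_locus = bonferroni_threshold(N_MARKERS)           # locus-by-locus search
all_pairs = bonferroni_threshold(comb(N_MARKERS, 2))     # every pair of loci

print(f"single-locus threshold: {single_locus:.2e}")
print(f"all-pairs threshold:    {all_pairs:.2e}")
```

The pairwise threshold is roughly five orders of magnitude smaller than the single-locus one, which is the "harsh correction" that the interaction searches have to survive.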
Power Analysis
[Figure: power plots. Axis label: distance to disease locus (high, medium, low). Panel labels: Dreamland (either locus), Dreamland (both loci), Styx (both loci), The Compromise (both loci).]
Power Analysis
• Interaction-based searches perform well, in spite of harsh correction
• Except when recovering one marker under Model 1
• Power strongly correlated with minor allele frequency and LD
Power Analysis
• All three strategies are computationally feasible
• Styx approach took 33 hours on ten nodes with 300,000 markers and 2,000 subjects
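The quoted runtime implies a per-node throughput that is easy to back out. This is back-of-the-envelope arithmetic only, assuming the work split evenly across the ten nodes and that every unordered pair of markers was tested exactly once:

```python
from math import comb

N_MARKERS = 300_000
HOURS = 33
NODES = 10

pair_tests = comb(N_MARKERS, 2)            # number of two-locus tests
node_seconds = HOURS * 3600 * NODES        # total compute time across all nodes

# Implied throughput under the even-split ASSUMPTION stated above.
tests_per_node_second = pair_tests / node_seconds

print(f"pairwise tests:            {pair_tests:,}")
print(f"tests per node per second: {tests_per_node_second:,.0f}")
```

Each node would need to get through tens of thousands of pairwise tests per second, which is plausible for a simple contingency-table statistic on 2,000 subjects.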
Loose Ends
• Power analysis suggests reasons for failure to replicate
• Presence of locus interaction
• Different allele frequencies between initial and follow-up cohorts
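Here is a small illustration of why allele frequencies matter for replication. Under the threshold model, the marginal effect seen at locus A depends on how common the risk allele at locus B is. The sketch assumes unlinked loci, Hardy-Weinberg proportions at locus B, and a hypothetical baseline odds of 0.05; θ = 0.53 is the illustrative value from the Figure 1 caption, and all function names are made up for this example.

```python
def hwe_genotype_freqs(p):
    """Hardy-Weinberg frequencies for 0, 1 and 2 copies of an allele with frequency p."""
    q = 1 - p
    return [q * q, 2 * p * q, p * p]

def threshold_odds(i, j, alpha, theta):
    """Threshold model: odds rise to alpha*(1+theta) only if both loci carry a risk allele."""
    return alpha * (1 + theta) if i >= 1 and j >= 1 else alpha

def marginal_risk(i, p_b, alpha, theta):
    """Disease probability for i risk alleles at locus A, averaged over locus B genotypes."""
    total = 0.0
    for j, freq in enumerate(hwe_genotype_freqs(p_b)):
        odds = threshold_odds(i, j, alpha, theta)
        total += freq * odds / (1 + odds)   # convert odds to probability
    return total

def marginal_relative_risk(p_b, alpha=0.05, theta=0.53):
    """Risk for carriers of one risk allele at locus A relative to non-carriers.

    alpha = 0.05 is a HYPOTHETICAL baseline odds chosen for illustration.
    """
    return marginal_risk(1, p_b, alpha, theta) / marginal_risk(0, p_b, alpha, theta)

for p_b in (0.05, 0.25, 0.50):
    rr = marginal_relative_risk(p_b)
    print(f"p_B = {p_b:.2f}: marginal relative risk at locus A = {rr:.3f}")
```

The same interaction produces a weaker marginal signal at locus A in a cohort where the locus B risk allele is rare, so a single-locus follow-up in such a cohort has less power to reproduce the original finding.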
Loose Ends
• This study understates usefulness of interaction searches
• Bonferroni is conservative
• Permutation testing would be more accurate
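For readers who have not seen one, here is what a permutation-based correction can look like. This is a generic max-statistic sketch on toy data, not the paper's procedure: the test statistic (difference in mean risk-allele count between cases and controls), the data, and all names are stand-ins chosen for illustration.

```python
import random

def allele_count_diff(genotypes, labels):
    """Absolute difference in mean risk-allele count between cases (1) and controls (0)."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def max_stat_permutation_pvalues(markers, labels, n_perm=200, seed=0):
    """Family-wise adjusted p-values from the null distribution of the maximum statistic."""
    rng = random.Random(seed)
    observed = [allele_count_diff(m, labels) for m in markers]
    shuffled = list(labels)
    max_null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)   # break any genotype-phenotype link
        max_null.append(max(allele_count_diff(m, shuffled) for m in markers))
    # Adjusted p-value: share of permutations whose maximum beats the observed value.
    return [(1 + sum(mx >= obs for mx in max_null)) / (n_perm + 1) for obs in observed]

# Toy data: 50 cases, 50 controls, 20 markers coded as 0/1/2 risk-allele counts.
rng = random.Random(1)
labels = [1] * 50 + [0] * 50
markers = [[rng.choice([0, 1, 2]) for _ in labels] for _ in range(20)]
# Marker 0 is made to track case status so that it stands out from the noise markers.
markers[0] = [min(2, y + rng.choice([0, 1])) for y in labels]

pvalues = max_stat_permutation_pvalues(markers, labels)
print("smallest adjusted p-value:", min(pvalues), "at marker", pvalues.index(min(pvalues)))
```

Because the null distribution is built from the data themselves, correlation between markers is accounted for automatically, which is why this kind of correction is typically less conservative than Bonferroni. The price is that the whole scan has to be repeated for every permutation.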
Conclusions
• All non-exhaustive interaction searches may miss some effects
• Complete enumeration is too expensive for higher order effects
• The Compromise provides the best of both worlds in most studies
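The cost of complete enumeration grows quickly with the order of the interaction. A quick count of the marker combinations, using the 300,000-marker panel from the power analysis (the loop and names are illustrative):

```python
from math import comb

N_MARKERS = 300_000

# Number of distinct marker combinations that an exhaustive search must test
# for interactions of order 2 (pairs), 3 (triples) and 4.
combinations_by_order = {k: comb(N_MARKERS, k) for k in (2, 3, 4)}

for order, count in combinations_by_order.items():
    print(f"order {order}: {count:.3e} combinations")
```

Moving from pairs to triples multiplies the number of tests by about 100,000, so a search that took hours for pairs would be out of reach for triples on the same hardware.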
Questions