association analysis shaun purcell boulder twin workshop 2004
Post on 20-Dec-2015
Association analysis
Shaun Purcell
Boulder Twin Workshop 2004
Overview
• Candidate gene association
• Haplotypes and linkage disequilibrium
• Linkage and association
• Family-based association
What is association?
• Categorical traits
– disease susceptibility genes
• Continuous traits
– quantitative trait loci, QTL
Disease traits
      Case  Control
AA    n1    n2
Aa    n3    n4
aa    n5    n6
Is there a difference in allele/genotype frequency between cases and controls?
Disease traits
      Case  Control  Frequency under HWE
AA    30    25       p²
Aa    50    50       2p(1-p)
aa    20    25       (1-p)²
Is there a difference in allele/genotype frequency between cases and controls?
χ² test for independence, p-value
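The test of independence on the 3 x 2 genotype table above can be sketched as follows. This is a minimal illustration in plain Python, not the software used in the workshop; the function name `chi_square_independence` is hypothetical. The p-value uses the closed form exp(-x/2), which holds only for 2 degrees of freedom.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if genotype and case/control status are independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Rows: AA, Aa, aa; columns: cases, controls (counts from the slide).
counts = [[30, 25], [50, 50], [20, 25]]
stat, df = chi_square_independence(counts)

# Closed-form upper tail of the chi-square distribution, valid for df = 2 only.
p_value = math.exp(-stat / 2)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

For these counts the statistic is small (about 1.0 on 2 df), so the genotype frequencies do not differ significantly between cases and controls.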
Disease traits
General model (2 df)
      Case     Control
AA    n1       n2
Aa    n3       n4
aa    n5       n6
Additive model (1 df)
      Case     Control
A     2n1+n3   2n2+n4
a     2n5+n3   2n6+n4
Dominant model for A (1 df), where A* = AA or Aa
      Case     Control
A*    n1+n3    n2+n4
aa    n5       n6
Effect sizes calculated as odds ratios
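As a rough illustration of the collapsed tables, the sketch below derives allele counts and dominant-model counts from genotype counts and computes the corresponding odds ratios. The counts are the 30/50/20 versus 25/50/25 example from the earlier slide; the helper names are hypothetical.

```python
def odds_ratio(case_exposed, control_exposed, case_unexposed, control_unexposed):
    """Odds ratio for a 2 x 2 table (exposed vs unexposed, cases vs controls)."""
    return (case_exposed * control_unexposed) / (case_unexposed * control_exposed)

def allele_counts(genotypes):
    """Additive model: count A and a alleles (each person carries two)."""
    return 2 * genotypes["AA"] + genotypes["Aa"], 2 * genotypes["aa"] + genotypes["Aa"]

def dominant_counts(genotypes):
    """Dominant model for A: carriers (AA or Aa) versus aa."""
    return genotypes["AA"] + genotypes["Aa"], genotypes["aa"]

cases = {"AA": 30, "Aa": 50, "aa": 20}
controls = {"AA": 25, "Aa": 50, "aa": 25}

case_A, case_a = allele_counts(cases)
ctrl_A, ctrl_a = allele_counts(controls)
allelic_or = odds_ratio(case_A, ctrl_A, case_a, ctrl_a)

case_carrier, case_aa = dominant_counts(cases)
ctrl_carrier, ctrl_aa = dominant_counts(controls)
dominant_or = odds_ratio(case_carrier, ctrl_carrier, case_aa, ctrl_aa)

print(f"allelic OR = {allelic_or:.3f}, dominant OR = {dominant_or:.3f}")
```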
Quantitative traits
[Figure: trait values (axis from -2 to 4) plotted by genotype: aa, Aa, AA]
ID    Y     G    A    D
001   0.34  aa   -1   0
002   1.23  Aa    0   1
003   1.66  Aa    0   1
004   2.74  AA    1   0
005   1.33  AA    1   0
…     …     …    …    …
Y = aA + dD + e
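A minimal sketch of fitting this model to the five rows shown. It assumes an intercept m is added (Y = m + aA + dD + e), which the slide does not show. With three genotype groups and three parameters the least-squares fit reproduces the genotype means, so the estimates can be read off directly without a matrix solver.

```python
from statistics import mean

# (ID, trait Y, genotype G) rows from the slide.
records = [
    ("001", 0.34, "aa"),
    ("002", 1.23, "Aa"),
    ("003", 1.66, "Aa"),
    ("004", 2.74, "AA"),
    ("005", 1.33, "AA"),
]

# Additive (A) and dominance (D) codes as in the table.
CODING = {"aa": (-1, 0), "Aa": (0, 1), "AA": (1, 0)}

def fit_additive_dominance(rows):
    """Return (m, a, d) for Y = m + a*A + d*D + e; the intercept m is an assumption."""
    by_genotype = {"aa": [], "Aa": [], "AA": []}
    for _id, y, g in rows:
        by_genotype[g].append(y)
    mean_aa, mean_Aa, mean_AA = (mean(by_genotype[g]) for g in ("aa", "Aa", "AA"))
    m = (mean_aa + mean_AA) / 2   # midpoint of the two homozygote means
    a = (mean_AA - mean_aa) / 2   # additive effect
    d = mean_Aa - m               # dominance deviation of the heterozygote
    return m, a, d

def predict(genotype, m, a, d):
    A, D = CODING[genotype]
    return m + a * A + d * D

m, a, d = fit_additive_dominance(records)
print(f"m = {m:.4f}, a = {a:.4f}, d = {d:.4f}")
```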
Some web resources
• BGIM: http://statgen.iop.kcl.ac.uk/bgim/
– Introductory tutorials on twin analysis, primer on maximum likelihood, Mx language.
• GxE moderator models: http://statgen.iop.kcl.ac.uk/gxe/
• Power calculation: http://statgen.iop.kcl.ac.uk/gpc/
• Case/control association tools: http://statgen.iop.kcl.ac.uk/gpc/model/
Relative risk
Genotype  P(D|G)   RR
AA        P(D|AA)  P(D|AA)/P(D|aa)
Aa        P(D|Aa)  P(D|Aa)/P(D|aa)
aa        P(D|aa)  1
P(D|AA) / P(D|aa) is labelled RR(AA)
P(D|Aa) / P(D|aa) is labelled RR(Aa)
Genetic models
Model           RR(Aa)  RR(AA)
General         x       y
Multiplicative  x       x²
Dominant        x       x
Recessive       1.000   x
No effect       1.000   1.000
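The constraints in the table above can be written out as a small lookup, sketched here with hypothetical function names. The `penetrances` helper additionally assumes Hardy-Weinberg genotype frequencies and a known disease prevalence; neither assumption is stated on the slide, and the numbers in the example are illustrative only.

```python
def relative_risks(model, x=1.0, y=1.0):
    """Return (RR(Aa), RR(AA)) relative to the aa baseline for a named model."""
    table = {
        "general": (x, y),
        "multiplicative": (x, x ** 2),
        "dominant": (x, x),
        "recessive": (1.0, x),
        "none": (1.0, 1.0),
    }
    return table[model]

def penetrances(p, prevalence, rr_het, rr_hom):
    """P(D|aa), P(D|Aa), P(D|AA), assuming HWE with frequency p for allele A."""
    q = 1.0 - p
    # Prevalence = sum over genotypes of frequency * penetrance; solve for the baseline.
    baseline = prevalence / (q * q + 2 * p * q * rr_het + p * p * rr_hom)
    return baseline, baseline * rr_het, baseline * rr_hom

# Illustrative values only: allele frequency 0.2, prevalence 5%, multiplicative x = 2.
rr_het, rr_hom = relative_risks("multiplicative", x=2.0)
pen_aa, pen_Aa, pen_AA = penetrances(0.2, 0.05, rr_het, rr_hom)
print(rr_het, rr_hom, round(pen_aa, 4), round(pen_Aa, 4), round(pen_AA, 4))
```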
Tests
Test                                         Alternate       Null
Any effect?                                  General         No effect
Any effect assuming a multiplicative gene?   Multiplicative  No effect
Any effect assuming a dominant gene?         Dominant        No effect
Any effect assuming a recessive gene?        Recessive       No effect
Can we assume a multiplicative effect?       General         Multiplicative
Can we assume a dominant effect?             General         Dominant
Can we assume a recessive effect?            General         Recessive
Multiple samples
• Constrain frequencies across samples
• Constrain effects across samples
– Can test genetic models with effects and/or frequencies constrained to be equal
– Can perform tests of homogeneity of effects and/or frequencies across samples
An example: 2 case/control samples
• Population frequency 5%
Sample 1
      Case  Control
AA    17    11
Aa    35    59
aa    24    40
Sample 2
      Case  Control
AA    37    10
Aa    67    43
aa    20    37
Homogeneous effects across samples, homogeneous allele frequencies across samples
Model  Sample  p      RR(Aa)  RR(AA)  -2LL
Gen    1       0.367  1.979   3.663   793.143
       2       0.367  1.979   3.663
Mult   1       0.367  1.911   3.651   793.199
       2       0.367  1.911   3.651
Dom    1       0.401  1.990   1.990   802.927
       2       0.401  1.990   1.990
Rec    1       0.405  1.000   1.921   805.064
       2       0.405  1.000   1.921
None   1       0.442  1.000   1.000   815.628
       2       0.442  1.000   1.000
Heterogeneous effects across samples, homogeneous allele frequencies across samples
Model  Sample  p      RR(Aa)  RR(AA)  -2LL
Gen    1       0.367  1.235   2.136   786.498
       2       0.367  2.890   5.547
Mult   1       0.367  1.440   2.073   788.262
       2       0.367  2.282   5.208
Dom    1       0.401  1.216   1.216   796.422
       2       0.401  2.936   2.936
Rec    1       0.405  1.000   1.519   803.849
       2       0.405  1.000   2.195
None   1       0.443  1.000   1.000   815.628
       2       0.443  1.000   1.000
TESTS OF GENETIC MODELS -- ASSUMING EQ EFFECTS & EQ FREQS
=========================================================
Gen vs None (2 df) : 22.485 p = 0.000
Mult vs None (1 df) : 22.429 p = 0.000
Dom vs None (1 df) : 12.701 p = 0.000
Rec vs None (1 df) : 10.564 p = 0.001
Gen vs Mult (1 df) : 0.056 p = 0.813
Gen vs Dom (1 df) : 9.784 p = 0.002
Gen vs Rec (1 df) : 11.921 p = 0.001
TESTS OF GENETIC MODELS -- ASSUMING UNEQ EFFECTS & EQ FREQS
===========================================================
Gen vs None (4 df) : 29.130 p = 0.000
Mult vs None (2 df) : 27.366 p = 0.000
Dom vs None (2 df) : 19.205 p = 0.000
Rec vs None (2 df) : 11.779 p = 0.003
Gen vs Mult (2 df) : 1.764 p = 0.414
Gen vs Dom (2 df) : 9.925 p = 0.007
Gen vs Rec (2 df) : 17.351 p = 0.000
TESTS OF EQUAL EFFECTS -- ASSUMING EQ FREQS
===========================================
w/ Gen model (2 df) : 6.645 p = 0.036
w/ Mult model (1 df) : 4.938 p = 0.026
w/ Dom model (1 df) : 6.505 p = 0.011
w/ Rec model (1 df) : 1.215 p = 0.270
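Each test statistic in this output is the difference in -2LL between two nested models, referred to a chi-square distribution with df equal to the difference in the number of free parameters. The sketch below reproduces the equal-effects block from the -2LL values of the homogeneous-effects fits. The helper names are hypothetical, and `chi2_sf` only handles df = 1 or even df, which is all these tables need.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability for df = 1 or even df (sketch only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df % 2 == 0:
        half = x / 2
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch handles only df = 1 or even df")

def lrt(minus2ll_null, minus2ll_alt, df):
    """Likelihood ratio test: statistic and p-value for nested models."""
    stat = minus2ll_null - minus2ll_alt
    return stat, chi2_sf(stat, df)

# (-2LL, number of free effect parameters) from the homogeneous-effects fits.
fits = {
    "Gen": (793.143, 2),
    "Mult": (793.199, 1),
    "Dom": (802.927, 1),
    "Rec": (805.064, 1),
    "None": (815.628, 0),
}

results = {}
for alt, null in [("Gen", "None"), ("Mult", "None"), ("Dom", "None"),
                  ("Rec", "None"), ("Gen", "Mult"), ("Gen", "Dom"), ("Gen", "Rec")]:
    df = fits[alt][1] - fits[null][1]
    results[(alt, null)] = lrt(fits[null][0], fits[alt][0], df)
    stat, p = results[(alt, null)]
    print(f"{alt} vs {null} ({df} df) : {stat:.3f} p = {p:.3f}")
```

The same function applied to the heterogeneous-effects fits gives the second and third blocks, with small differences in the last digit because the tabulated -2LL values are rounded.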
Indirect association
QTL
Genotyped markers
Ungenotyped markers
Recombination
Paternal chromosome
Maternal chromosome
Homologous chromosomes in one parent
Recombination event during meiosis
Recombinant gamete transmitted, harboring mutation
Recombination
Paternal chromosome
Maternal chromosome
Homologous chromosomes in one parent
No recombination event during meiosis
Nonrecombinant gamete transmitted, not harboring mutation
Linkage: affected sib pairs
Paternal chromosome
Maternal chromosome
First affected offspring, no recombination
Second affected offspring, recombinant gamete
IBD sharing from this one parent (0 or 1): regions labelled 1 and 0 in the figure
Association analysis
• Mutation occurs on a ‘red’ chromosome
• Association due to ‘linkage disequilibrium’
     A    a
M    AM   aM
m    Am   am
This individual has aa and Mm genotypes and am and aM haplotypes
Haplotypes
     A    a
M    AM   aM
m    Am   am
This individual has Aa and Mm genotypes and AM and am haplotypes
… but given only genotype data, consistent with Am/aM as well as AM/am
Haplotypes
     A    a
M    AM   aM
m    Am   am
This individual has AA and Mm genotypes and AM and Am haplotypes
Haplotypes
Equilibrium haplotype frequencies
     A       a
M    pr      ps      p
m    qr      qs      q
     r       s
Linkage disequilibrium
     A       a
M    pr + D  ps - D  p
m    qr - D  qs + D  q
     r       s
DMAX = Min(ps, qr) if D > 0; Min(pr, qs) if D < 0
D’ = D / DMAX
r² = D² / pqrs
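A minimal sketch of these quantities computed from four haplotype frequencies, using the slide's notation (p, q for M, m; r, s for A, a). It uses the standard definitions r² = D² / (pqrs) and a sign-dependent DMAX; the function name and the example frequencies are hypothetical.

```python
def ld_measures(f_MA, f_Ma, f_mA, f_ma):
    """Return (D, D', r^2) from the four haplotype frequencies (which sum to 1)."""
    p = f_MA + f_Ma   # frequency of marker allele M
    q = f_mA + f_ma   # frequency of marker allele m
    r = f_MA + f_mA   # frequency of allele A
    s = f_Ma + f_ma   # frequency of allele a
    D = f_MA - p * r  # deviation of the MA haplotype from its equilibrium value
    # Largest |D| that keeps every haplotype frequency non-negative.
    d_max = min(p * s, q * r) if D >= 0 else min(p * r, q * s)
    d_prime = D / d_max if d_max > 0 else 0.0
    r_squared = D ** 2 / (p * q * r * s)
    return D, d_prime, r_squared

# Illustrative haplotype frequencies: MA, Ma, mA, ma.
D, d_prime, r_squared = ld_measures(0.5, 0.1, 0.1, 0.3)
print(f"D = {D:.3f}, D' = {d_prime:.3f}, r2 = {r_squared:.3f}")
```

With this convention D' carries the sign of D; some software reports its absolute value instead.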
Haplotype analysis
1. Estimate haplotypes from genotypes
2. Associate haplotypes with trait
Haplotype  Freq.  Odds Ratio
AAGG       40%    1.00*
AAGT       30%    2.21
CGCG       25%    1.07
AGCT       5%     0.92
* baseline, fixed to 1.00
Linkage / Association
[Figure: paired panels. Linkage: sib correlation plotted against IBD at the QTL (0, 1, 2). Association: trait plotted against QTL genotype (aa, Aa, AA). A second row shows the same panels for a marker (IBD at the marker; marker genotype), connected to the QTL panels by labels RF and LD.]
Variance Components
• Means (ASSOCIATION)
M1    M2
• Variance-covariance matrix (LINKAGE)
V1    C21
C12   V2
Variance Components
• Means
M1 + bG1    M2 + bG2
• Variance-covariance matrix
V1               C21 + q(π − ½)
C12 + q(π − ½)   V2
LINKAGE: q = regression coef., π = IBD sharing (0, ½, 1)
ASSOCIATION: b = regression coef., G = individual’s genotype
Components of a Genetic Theory
• POPULATION MODEL
– Allele & genotype frequencies
– Demographics & population history
– Linkage disequilibrium, haplotype structure
• TRANSMISSION MODEL
– Mendelian segregation
– Identity by descent & genetic relatedness
• PHENOTYPE MODEL
– Biometrical model of quantitative traits
– Additive & dominance components
[Figure: nodes labelled G arranged along a Time axis, ending in nodes labelled P]
Linkage without association
[Figure: two pedigrees with marker genotypes 3/5, 2/6, 3/2, 5/2 and 3/5, 2/6, 3/6, 5/6]
Both families are ‘linked’ with the marker…
…but a different allele is involved.
Linkage and association
[Figure: pedigrees with marker genotypes 3/6, 2/4, 3/2, 6/2; 3/5, 2/6, 3/6, 5/6; 4/6, 2/6, 6/6, 6/6]
All families are ‘linked’ with the marker…
… and allele 6 is ‘associated’ with disease
Linkage is just association within families
Association without linkage
[Figure: marker genotypes of individuals, colored by population and grouped into Controls and Cases: 3/6, 2/4, 3/2, 6/2, 3/5, 2/5, 3/6, 5/6, 4/6, 2/6, 6/6, 2/2, 3/4, 5/2]
Allele 6 is more common in the GREEN population
The disease is more common in the GREEN population
… a ‘spurious association’
TDT
• Transmission disequilibrium test
– test for linkage and association
[Figure: parent-offspring trios with genotypes AA, Aa, aa]
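The slide does not give the test statistic. A common form of the TDT, sketched here as an assumption rather than as the version used in the workshop, counts transmissions from heterozygous parents only: with b transmissions of A and c transmissions of a, the statistic (b - c)² / (b + c) is referred to a chi-square with 1 df. The trio data and function names below are hypothetical.

```python
import math

def count_transmissions(trios):
    """Count A and a alleles transmitted by heterozygous (Aa) parents.

    Each trio is (father, mother, child) with genotypes 'AA', 'Aa' or 'aa'.
    """
    b = c = 0
    for father, mother, child in trios:
        parents = (father, mother)
        n_het = parents.count("Aa")
        n_hom_A = parents.count("AA")
        # A alleles in the child that did not come from an AA parent came from Aa parents.
        from_het_A = child.count("A") - n_hom_A
        if not 0 <= from_het_A <= n_het:
            raise ValueError(f"Mendelian inconsistency in trio {father} x {mother} -> {child}")
        b += from_het_A
        c += n_het - from_het_A
    return b, c

def tdt(b, c):
    """McNemar-type TDT statistic and 1-df chi-square p-value."""
    stat = (b - c) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))

trios = [
    ("AA", "Aa", "AA"),
    ("AA", "Aa", "AA"),
    ("aa", "Aa", "Aa"),
    ("aa", "Aa", "aa"),
    ("Aa", "Aa", "AA"),
]
b, c = count_transmissions(trios)
stat, p_value = tdt(b, c)
print(f"A transmitted: {b}, a transmitted: {c}, TDT = {stat:.3f}, p = {p_value:.3f}")
```

Because only heterozygous parents contribute, the comparison is made within families, which is what protects the test against population stratification.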
TDT: “A” disease allele
Mating      AA x Aa  AA x Aa  aa x Aa  aa x Aa
Offspring   AA       Aa       Aa       aa
Additive    +        -        +        -
Dominant    0.5      0.5      +        -
Recessive   +        -        0.5      0.5
Between and within components
Sib1
Sib2
Sib1 = B - W
Sib2 = B + W
Between and within components
• Fulker et al (1999)
S1 genotype  S2 genotype  S1 code  S2 code  B    W    S1   S2
AA           AA           1        1        1    0    B+W  B-W
AA           Aa           1        0        0.5  0.5  B+W  B-W
AA           aa           1        -1       0    1    B+W  B-W
Note: W = S1 – B
Parental genotypes
• Use parental genotypes to generate B
• Examples
– AA from AAxAA: W = 0
– Aa from AAxAa: W = -0.5
– Aa from AaxAa: W = 0
Pat  Mat  B
1    1    1
1    0    0.5
1    -1   0
0    1    0.5
0    0    0
0    -1   -0.5
-1   1    0
-1   0    -0.5
-1   -1   -1
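The between and within components can be computed in a few lines, sketched here with hypothetical function names. Genotypes are coded AA = 1, Aa = 0, aa = -1; B is the mean of the sib codes (or of the parental codes when parents are genotyped), and W is each individual's deviation from B.

```python
CODE = {"AA": 1, "Aa": 0, "aa": -1}

def between_within_from_sibs(g1, g2):
    """Return (B, W1, W2) for a sib pair, with B the mean of the two sib codes."""
    s1, s2 = CODE[g1], CODE[g2]
    b = (s1 + s2) / 2
    return b, s1 - b, s2 - b

def between_within_from_parents(child, pat, mat):
    """Return (B, W) for one child, with B the mean of the parental codes."""
    b = (CODE[pat] + CODE[mat]) / 2
    return b, CODE[child] - b

print(between_within_from_sibs("AA", "Aa"))           # sib-pair row AA, Aa
print(between_within_from_parents("Aa", "AA", "Aa"))  # Aa child from AA x Aa
```

The sib-based version matches the b, w1 and w2 columns of assoc.dat; for example, g1 = -1 and g2 = 0 give b = -0.5, w1 = -0.5 and w2 = 0.5.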
assoc.mx
• Sibling pair sample
• B and W components precalculated in input file
• Single SNP genotype
• Quantitative trait
assoc.dat
s1      s2      g1  g2  b     w1    w2
-0.007  -0.972  -1  0   -0.5  -0.5  0.5
-0.829  -0.196  1   1   1     0     0
0.369   0.645   1   1   1     0     0
0.318   1.55    0   1   0.5   -0.5  0.5
1.52    0.910   0   0   0     0     0
-0.948  -1.55   1   1   1     0     0
0.596   -0.394  1   0   0.5   0.5   -0.5
-1.91   -0.905  0   1   0.5   -0.5  0.5
0.499   0.940   1   0   0.5   0.5   -0.5
-1.17   -1.29   1   0   0.5   0.5   -0.5
-0.16   -1.81   1   1   1     0     0
! Mx script for QTL association: sib pairs, univariate

Group 1 : Calc NG=2

Begin Matrices;
! ** Parameters
 B Full 1 1 free   ! association : between component
 W Full 1 1 free   ! association : within component
 M Full 1 1 free   ! mean
 S Full 1 1 free   ! Shared residual variance
 N Full 1 1 free   ! Nonshared residual variance
! ** Definition variables **
 C Full 1 1        ! association : between
 X Full 1 1        ! association : within, sib 1
 Y Full 1 1        ! association : within, sib 2
End Matrices;

! ** Uncomment for B=W model
! Equate W 1 1 1 B 1 1 1

! Starting values
Matrix B 0
Matrix W 0
Matrix M 0
Matrix S 0.5
Matrix N 0.5

End

Group2 : Data Group
 Data NI=7 NO=0
 RE file=assoc.dat
 Labels Sib1 Sib2 g1 g2 b w1 w2
 Select Sib1 Sib2 b w1 w2 /
 Definition b w1 w2 /

 Matrices = Group 1

 Means M + B*C + W*X | M + B*C + W*Y /
 Covariance
  S + N | S _
  S | S + N /

 Specify C b /
 Specify X w1 /
 Specify Y w2 /

End
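To make the model in the script concrete, the sketch below evaluates the bivariate normal -2 log-likelihood for one sib pair under the same means and covariance structure: means M + B·b + W·w, variance S + N, covariance S. It is an illustration in Python, not a reimplementation of Mx; the function name is hypothetical, and the constant term may be handled differently from Mx's reported -2LL.

```python
import math

def minus2ll_pair(y1, y2, b, w1, w2, B, W, M, S, N):
    """-2 log-likelihood of one sib pair under the between/within association model."""
    # Expected trait values: grand mean plus between and within genotype effects.
    mu1 = M + B * b + W * w1
    mu2 = M + B * b + W * w2
    var = S + N   # trait variance: shared plus nonshared residual
    cov = S       # sib covariance: shared residual only
    det = var * var - cov * cov
    r1, r2 = y1 - mu1, y2 - mu2
    # Quadratic form r' Sigma^-1 r for the 2 x 2 covariance matrix.
    quad = (var * r1 * r1 - 2 * cov * r1 * r2 + var * r2 * r2) / det
    return 2 * math.log(2 * math.pi) + math.log(det) + quad

# First three rows of assoc.dat: s1, s2, g1, g2, b, w1, w2.
rows = [
    (-0.007, -0.972, -1, 0, -0.5, -0.5, 0.5),
    (-0.829, -0.196, 1, 1, 1, 0, 0),
    (0.369, 0.645, 1, 1, 1, 0, 0),
]

# Evaluate at the script's starting values: B = W = M = 0, S = N = 0.5.
start = dict(B=0.0, W=0.0, M=0.0, S=0.5, N=0.5)
total = sum(minus2ll_pair(s1, s2, b, w1, w2, **start)
            for s1, s2, _g1, _g2, b, w1, w2 in rows)
print(f"-2LL over {len(rows)} pairs at starting values: {total:.3f}")
```

Summing this quantity over all pairs and minimizing it over B, W, M, S and N corresponds to what the optimizer does; fixing W = B or setting parameters to zero gives the nested models that are compared afterwards.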
Models
B & W
 B Full 1 1 free
 W Full 1 1 free
 !Equate W 1 1 1 B 1 1 1
B = W
 B Full 1 1 free
 W Full 1 1 free
 Equate W 1 1 1 B 1 1 1
B
 B Full 1 1 free
 W Full 1 1
 !Equate W 1 1 1 B 1 1 1
B=W=0
 B Full 1 1
 W Full 1 1
 !Equate W 1 1 1 B 1 1 1
Tests
Test                        HA      H0
Standard association test   B = W   B=W=0
Test of stratification      B & W   B = W
Robust association test     B & W   B
assoc.mx
Model   B        W        -2LL     df
B & W   -0.478   -0.365   2103.96  795
B = W   -0.420   -0.420   2105.05  796
B       -0.4778           2127.01  796
B=W=0                     2163.34  797
Test of total association: HA B = W (2105.05), H0 B=W=0 (2163.34)
Δ-2LL = 58.29, df = 1, p < 1e-13
assoc.mx
Test of stratification: HA B & W (2103.96), H0 B = W (2105.05)
Δ-2LL = 1.09, df = 1, p = 0.29
assoc.mx
Test of within association: HA B & W (2103.96), H0 B (2127.01)
Δ-2LL = 23.06, df = 1, p < 1e-5
Implementation
• QTDT
– Abecasis et al (2001) AJHG
– extends between/within model to general pedigrees
– multiple alleles
– covariates
– combined test of linkage and association
– discrete as well as quantitative traits
Linkage
• families
• detectable over large distances >10 cM
• large effects OR >3, variance >10%
Association
• unrelateds or families
• detectable over small distances <1 cM
• small effects OR <2, variance <1%