GWAS for quantitative traits
Peter M. Visscher
Queensland Institute of Medical Research
Overview
• Darwin and Mendel
• Background: population genetics
• Background: quantitative genetics
• GWAS
– Examples
– Analysis
– Statistical power
[Galton, 1889]
Mendelian Genetics
Following a single (or several) genes that we can directly score
Phenotype highly informative as to genotype
Darwin & Mendel
• Darwin (1859), Origin of Species
– Instant classic, major immediate impact
– Problem: model of inheritance
  • Darwin assumed blending inheritance
  • Offspring = average of both parents
  • zo = (zm + zf)/2
• Fleeming Jenkin (1867) pointed out the problem
– Var(zo) = Var[(zm + zf)/2] = (1/2) Var(parents)
– Hence, under blending inheritance, half the variation is removed each generation and this must somehow be replenished by mutation.
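The halving follows if the two parents are independent (random mating) and the trait has the same variance in both sexes, since then Var[(zm + zf)/2] = (1/4)[Var(zm) + Var(zf)] = (1/2) Var(z). A minimal sketch that checks this on a small population; the trait values are hypothetical and chosen only for illustration:

```python
from itertools import product
from statistics import pvariance


def offspring_variance_under_blending(parent_values):
    """Variance of offspring when each offspring is the average of two parents.

    Assumes random mating: every ordered (mother, father) pair is equally likely.
    """
    offspring = [(zm + zf) / 2 for zm, zf in product(parent_values, repeat=2)]
    return pvariance(offspring)


# Hypothetical trait values for the parental generation
parents = [150.0, 160.0, 170.0, 180.0, 190.0]
v_parents = pvariance(parents)
v_offspring = offspring_variance_under_blending(parents)
print(v_parents, v_offspring)
```

Applying the function repeatedly to each new generation would halve the variance again each time, which is the problem Jenkin raised.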
Mendel
• Mendel (1865), Experiments in Plant Hybridization
• No impact, paper essentially ignored
– Ironically, Darwin had an apparently unread copy in his library
– Why ignored? Perhaps too mathematical for 19th century biologists
• Rediscovery in 1900 (by three independent groups)
• Mendel’s key idea: Genes are discrete particles passed on intact from parent to offspring
The height vs. pea debate
(early 1900s)
Do quantitative traits have the same hereditary and evolutionary properties as discrete characters?
Biometricians Mendelians
RA Fisher (1918). Transactions of the Royal Society of Edinburgh 52: 399-433.
[Figure, two panels: trait axis marked at the genotypic values m-a, m+d and m+a]
Population Genetics
• Allele and genotype frequencies
• Hardy-Weinberg Equilibrium
• Linkage (dis)equilibrium
Allele and Genotype Frequencies
Given genotype frequencies, we can always compute allele frequencies, e.g.,
$$p_i = \mathrm{freq}(A_iA_i) + \frac{1}{2}\sum_{j \neq i} \mathrm{freq}(A_iA_j)$$
The converse is not true: given allele frequencies we cannot uniquely determine the genotype frequencies
For n alleles, there are n(n+1)/2 genotypes
If we are willing to assume random mating, the genotype frequencies are in Hardy-Weinberg proportions:
$$\mathrm{freq}(A_iA_j) = \begin{cases} p_i^2 & \text{for } i = j \\ 2p_ip_j & \text{for } i \neq j \end{cases}$$
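A minimal sketch of the two directions described above: genotype frequencies to allele frequencies (always possible), and allele frequencies to genotype frequencies (only under the random-mating assumption). The function names and the example frequencies are illustrative assumptions, not part of the original slides:

```python
def allele_frequencies(genotype_freqs):
    """p_i = freq(A_iA_i) + 1/2 * sum over j != i of freq(A_iA_j).

    genotype_freqs maps an unordered allele pair, e.g. ("A", "a"), to its frequency.
    """
    p = {}
    for (a1, a2), f in genotype_freqs.items():
        if a1 == a2:
            p[a1] = p.get(a1, 0.0) + f
        else:
            p[a1] = p.get(a1, 0.0) + f / 2
            p[a2] = p.get(a2, 0.0) + f / 2
    return p


def hardy_weinberg_proportions(p):
    """Genotype frequencies under random mating: p_i^2 (i = j) or 2 p_i p_j (i != j)."""
    alleles = sorted(p)
    hw = {}
    for i, ai in enumerate(alleles):
        for aj in alleles[i:]:
            hw[(ai, aj)] = p[ai] ** 2 if ai == aj else 2 * p[ai] * p[aj]
    return hw


# Hypothetical genotype frequencies that are NOT in Hardy-Weinberg proportions
observed = {("A", "A"): 0.5, ("A", "a"): 0.2, ("a", "a"): 0.3}
p = allele_frequencies(observed)
hw = hardy_weinberg_proportions(p)
print(p)
print(hw)
```

Here `observed` and `hw` share the same allele frequencies but differ as genotype frequencies, which illustrates why the converse is not unique.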
Hardy-Weinberg
• Prediction of genotype frequencies from allele frequencies
• Allele frequencies remain unchanged over generations, provided:
• Infinite population size (no genetic drift)
• No mutation
• No selection
• No migration
• Under HW conditions, a single generation of random mating gives genotype frequencies in Hardy-Weinberg proportions, and they remain forever in these proportions
QC in GWAS studies
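The statement that a single generation of random mating gives Hardy-Weinberg proportions can be sketched for a diallelic locus as follows. The starting frequencies are hypothetical, and the function assumes the HW conditions listed above (no drift, mutation, selection or migration):

```python
def random_mating_offspring(f_AA, f_Aa, f_aa):
    """Offspring genotype frequencies after one generation of random mating."""
    parents = {"AA": f_AA, "Aa": f_Aa, "aa": f_aa}
    # Probability that a parent of each genotype transmits allele A
    transmit_A = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}
    offspring = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    for gm, fm in parents.items():
        for gf, ff in parents.items():
            pair = fm * ff  # frequency of this mating type under random mating
            tm, tf = transmit_A[gm], transmit_A[gf]
            offspring["AA"] += pair * tm * tf
            offspring["Aa"] += pair * (tm * (1 - tf) + (1 - tm) * tf)
            offspring["aa"] += pair * (1 - tm) * (1 - tf)
    return offspring


# Hypothetical starting population, not in HW proportions (p = 0.6)
gen1 = random_mating_offspring(0.5, 0.2, 0.3)
gen2 = random_mating_offspring(gen1["AA"], gen1["Aa"], gen1["aa"])
print(gen1)
print(gen2)
```

Both `gen1` and `gen2` equal p^2, 2pq, q^2 with p = 0.6, so the proportions are reached in one generation and then stay put.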
Linkage equilibrium
Random mating and recombination eventually change gamete frequencies so that they are in linkage equilibrium (LE).
Once in LE, gamete frequencies do not change (unless acted on by other forces)
At LE, alleles in gametes are independent of each other:
freq(AB) = freq(A) * freq(B)
freq(ABC) = freq(A) * freq(B) * freq(C)
Linkage disequilibrium
When linkage disequilibrium (LD) is present, alleles are no longer independent; knowing that one allele is in the gamete provides information on alleles at other loci:
freq(AB) ≠ freq(A) * freq(B)
The disequilibrium between alleles A and B is given by
DAB = freq(AB) – freq(A)*freq(B)
GWAS relies on LD between markers and causal variants
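A minimal sketch of the disequilibrium coefficient for two diallelic loci, computed from gamete (haplotype) frequencies. The frequencies below are hypothetical:

```python
def disequilibrium(freq_AB, freq_A, freq_B):
    """DAB = freq(AB) - freq(A) * freq(B); zero at linkage equilibrium."""
    return freq_AB - freq_A * freq_B


# Hypothetical gamete frequencies at loci A/a and B/b
gametes = {"AB": 0.4, "Ab": 0.1, "aB": 0.1, "ab": 0.4}
freq_A = gametes["AB"] + gametes["Ab"]
freq_B = gametes["AB"] + gametes["aB"]
D_AB = disequilibrium(gametes["AB"], freq_A, freq_B)

# Same allele frequencies, but gametes formed by independent combination (LE)
D_LE = disequilibrium(freq_A * freq_B, freq_A, freq_B)
print(D_AB, D_LE)
```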
[Figure, two panels of example gametes carrying QTL alleles Q1/Q2 and marker alleles M1/M2.
Linkage equilibrium: all four gametes Q1 M1, Q1 M2, Q2 M1 and Q2 M2 occur.
Linkage disequilibrium: only Q1 M1 and Q2 M2 occur.]
The Decay of Linkage Disequilibrium
The frequency of the AB gamete is given by
freq(AB) = freq(A)*freq(B) + DAB
If the recombination frequency between the A and B loci is c, the disequilibrium in generation t is
D(t) = D(0) (1 - c)^t
Note that D(t) -> 0 as t increases, although the approach can be slow when c is very small
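A minimal sketch of the decay D(t) = D(0)(1 - c)^t, using the three recombination frequencies plotted on the slide. The half-life helper is an added convenience, obtained by solving (1 - c)^t = 1/2 for t:

```python
import math


def ld_after(d0, c, t):
    """Disequilibrium after t generations of random mating."""
    return d0 * (1 - c) ** t


def generations_to_halve(c):
    """Number of generations for D to fall to half its starting value."""
    return math.log(0.5) / math.log(1 - c)


for c in (0.10, 0.01, 0.001):
    remaining = ld_after(1.0, c, 100)
    print(f"c = {c}: D(100)/D(0) = {remaining:.3f}, "
          f"half-life = {generations_to_halve(c):.1f} generations")
```

For tightly linked loci (small c) LD persists for hundreds of generations, which is what makes marker-based mapping possible.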
[Figure: LD (y-axis, 0.00 to 1.00) against generation (x-axis, 0 to 100) for c = 0.10, c = 0.01 and c = 0.001]
NB: Gene mapping & GWAS
Forces that Generate LD
• Drift (finite population size)
• Selection
• Migration (admixture)
• Mutation
• Population structure (stratification)
Effective population size determines the number of markers needed for GWAS
Quantitative Genetics
The analysis of traits whose variation is determined by both a number of genes and environmental factors
Phenotype is highly uninformative as to underlying genotype
[Figure, two panels: trait axis marked at the genotypic values m-a, m+d and m+a]
Complex (or Quantitative) trait
• No (apparent) simple Mendelian basis for variation in the trait
• May be a single gene strongly influenced by environmental factors
• May be the result of a number of genes of equal (or differing) effect
• Most likely, a combination of both multiple genes and environmental factors.
• Examples: blood pressure, cholesterol levels, IQ, height, etc.
Basic model of Quantitative Genetics
Basic model: P = G + E
G = average phenotypic value for that genotype if we are able to replicate it over the universe of environmental values, G = E[P]
G x E interaction: G values are different across environments. Basic model now becomes P = G + E + GE
Biometrical model for single diallelic Quantitative Trait Locus (QTL)
Contribution of the QTL to the Mean (X)
Genotypes           AA     Aa     aa
Frequencies, f(x)   p^2    2pq    q^2
Effect, x           a      d      -a
$$\mu = \sum_i x_i f(x_i) = a p^2 + d(2pq) - a q^2$$
Mean (X) = a(p-q) + 2pqd
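A minimal sketch that evaluates the mean both as the frequency-weighted sum over genotypes and with the closed form a(p-q) + 2pqd. The values of a, d and p are hypothetical; p is taken as the frequency of allele A and Hardy-Weinberg proportions are assumed:

```python
def qtl_mean(a, d, p):
    """Mean genotypic value: sum of effect times frequency over AA, Aa, aa."""
    q = 1 - p
    freqs = (p * p, 2 * p * q, q * q)   # AA, Aa, aa under HW proportions
    effects = (a, d, -a)
    return sum(f * x for f, x in zip(freqs, effects))


a, d, p = 1.0, 0.5, 0.3                 # hypothetical values
q = 1 - p
mean_from_sum = qtl_mean(a, d, p)
mean_closed_form = a * (p - q) + 2 * p * q * d
print(mean_from_sum, mean_closed_form)
```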
Example: Apolipoprotein E & Alzheimer’s
Genotype ee Ee EE
Average age of onset 68.4 75.5 84.3
2a = G(EE) - G(ee) = 84.3 - 68.4 = 15.9 --> a = 7.95
d = G(Ee) - [G(EE) + G(ee)]/2 = 75.5 - 76.35 = -0.85
d/a = -0.11: only a small amount of dominance
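The arithmetic of the example can be reproduced with a short sketch. The function name is an illustrative assumption; the three input values are the average ages of onset from the slide:

```python
def additive_and_dominance(g_low, g_het, g_high):
    """Return (a, d) from the three genotypic values.

    a is half the difference between the homozygotes;
    d is the heterozygote's deviation from the homozygote midpoint.
    """
    a = (g_high - g_low) / 2
    d = g_het - (g_high + g_low) / 2
    return a, d


a, d = additive_and_dominance(g_low=68.4, g_het=75.5, g_high=84.3)
print(f"a = {a:.2f}, d = {d:.2f}, d/a = {d / a:.2f}")
```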
Biometrical model for single diallelic QTL
Contribution of the QTL to the Variance (X)
Genotypes           AA     Aa     aa
Frequencies, f(x)   p^2    2pq    q^2    (HW proportions)
Effect, x           a      d      -a
$$\mathrm{Var}(X) = \sum_i (x_i - \mu)^2 f(x_i) = (a-\mu)^2 p^2 + (d-\mu)^2\, 2pq + (-a-\mu)^2 q^2 = V_{QTL}$$
Biometrical model for single diallelic QTL
$$\mathrm{Var}(X) = (a-\mu)^2 p^2 + (d-\mu)^2\, 2pq + (-a-\mu)^2 q^2 = 2pq[a+(q-p)d]^2 + (2pqd)^2 = V_{A_{QTL}} + V_{D_{QTL}}$$
Additive effects: the main effects of individual alleles
Dominance effects: represent the interaction between alleles
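A minimal numerical check that the genotype-level variance equals the sum of the additive and dominance components. The values of a, d and p are hypothetical, and Hardy-Weinberg proportions are assumed:

```python
def qtl_variance_components(a, d, p):
    """Return (total, additive, dominance) variance contributed by one QTL."""
    q = 1 - p
    mu = a * (p - q) + 2 * p * q * d
    total = ((a - mu) ** 2 * p * p
             + (d - mu) ** 2 * 2 * p * q
             + (-a - mu) ** 2 * q * q)
    v_additive = 2 * p * q * (a + (q - p) * d) ** 2
    v_dominance = (2 * p * q * d) ** 2
    return total, v_additive, v_dominance


total, v_a, v_d = qtl_variance_components(a=1.0, d=0.5, p=0.3)
print(total, v_a, v_d)
```

Note that the additive variance depends on d unless p = q, so dominance at the gene-action level still contributes to additive variance at the population level.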
Biometrical model for single biallelic QTL
[Figure: genotypic values -a, d and a, relative to the midpoint m, for genotypes aa, Aa and AA]
Var (X) = Regression Variance + Residual Variance
        = Additive Variance + Dominance Variance
Fisher 1918
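The link between regression and the additive/dominance split can be sketched by regressing genotypic value on the number of A alleles (0, 1, 2), weighting each genotype by its Hardy-Weinberg frequency. The parameter values are hypothetical and the function name is an assumption for illustration:

```python
def regress_value_on_allele_count(a, d, p):
    """Weighted regression of genotypic value on allele count.

    Returns (slope, regression variance, residual variance).
    """
    q = 1 - p
    x = (0, 1, 2)                      # copies of allele A in aa, Aa, AA
    g = (-a, d, a)                     # genotypic values relative to the midpoint
    w = (q * q, 2 * p * q, p * p)      # HW genotype frequencies
    mean_x = sum(wi * xi for wi, xi in zip(w, x))
    mean_g = sum(wi * gi for wi, gi in zip(w, g))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    cov_xg = sum(wi * (xi - mean_x) * (gi - mean_g) for wi, xi, gi in zip(w, x, g))
    total_var = sum(wi * (gi - mean_g) ** 2 for wi, gi in zip(w, g))
    slope = cov_xg / var_x
    regression_var = slope ** 2 * var_x
    return slope, regression_var, total_var - regression_var


slope, v_regression, v_residual = regress_value_on_allele_count(a=1.0, d=0.5, p=0.3)
print(slope, v_regression, v_residual)
```

With these inputs the slope equals a + (q-p)d, the regression variance equals 2pq[a+(q-p)d]^2 and the residual variance equals (2pqd)^2.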
Association (GWAS)
• State of play
• Model
• Analysis method
• Power of detection
• GWAS works
• Effect sizes are typically small
– Disease: OR ~1.1 to ~1.3
– Quantitative traits: % variance explained <<1%
Disease                             Number of loci   Percent of heritability measure explained   Heritability measure
Age-related macular degeneration    5                50%                                         Sibling recurrence risk
Crohn’s disease                     32               20%                                         Genetic risk (liability)
Systemic lupus erythematosus        6                15%                                         Sibling recurrence risk
Type 2 diabetes                     18               6%                                          Sibling recurrence risk
HDL cholesterol                     7                5.2%                                        Phenotypic variance
Height                              40               5%                                          Phenotypic variance
Early onset myocardial infarction   9                2.8%                                        Phenotypic variance
Fasting glucose                     4                1.5%                                        Phenotypic variance
Effect sizes QT (104 SNPs)
[Figure: histogram of % variance explained for quantitative traits (x-axis, 0.1 to 1.7) against frequency (y-axis, 0 to 35)]
Linear model for single SNP
• Allelic (additive model)
Y = µ + b*x + e
x = 0, 1, 2 for genotypes aa, Aa and AA
• Genotypic (additive + dominance model)
Y = µ + Gi + e
Gi = effect of the genotype group corresponding to genotypes aa, Aa and AA
Method
• Linear regression
• ANOVA
• (other: maximum likelihood, Bayesian)
Test statistic (allelic model)
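$$T = \hat{b} / \sigma(\hat{b}) \sim t_{N-2} \approx N(0,1)$$
$$T^2 = \hat{b}^2 / \mathrm{var}(\hat{b}) \sim F_{1,N-2} \approx \chi^2_1$$
$$\mathrm{var}(\hat{b}) = \frac{\sigma_e^2}{N\,\mathrm{var}(x)} = \frac{\sigma_e^2}{2Np(1-p)}$$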
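A minimal sketch of the allelic test by ordinary least squares, using only the standard library. The genotype and phenotype vectors are hypothetical toy data, and the p-value uses the large-sample chi-square (1 df) approximation rather than the exact t or F distribution:

```python
import math


def allelic_test(x, y):
    """Regress phenotype y on allele count x; return (b_hat, T, T^2, approx p-value)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)          # equals N * var(x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    mu_hat = mean_y - b_hat * mean_x
    rss = sum((yi - mu_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_e = rss / (n - 2)                           # residual variance
    var_b = sigma2_e / sxx                             # sigma_e^2 / (N var(x))
    t_stat = b_hat / math.sqrt(var_b)
    chi2 = t_stat ** 2
    # Upper tail of a chi-square with 1 df: P(chi2_1 > c) = erfc(sqrt(c / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return b_hat, t_stat, chi2, p_value


# Hypothetical toy data: allele counts and phenotypes for six individuals
genotypes = [0, 1, 2, 0, 1, 2]
phenotypes = [1.0, 2.1, 2.9, 0.9, 1.9, 3.2]
b_hat, t_stat, chi2, p_value = allelic_test(genotypes, phenotypes)
print(b_hat, t_stat, chi2, p_value)
```

With a sample this small the t distribution with N-2 df would be the appropriate reference; the normal and chi-square approximations are intended for GWAS-scale N.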
Statistical Power (additive model)
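Proportion of phenotypic variance explained by the QTL (here q^2 denotes this proportion, not the squared allele frequency):
$$q^2 = \frac{2p(1-p)[a + d(1-2p)]^2}{\sigma_p^2}$$
Non-centrality parameter of χ2 test:
$$\lambda = \frac{Nq^2}{1-q^2} \approx Nq^2$$
Required sample size given type-I (α) and type-II (β) error:
$$N = \frac{1-q^2}{q^2}\,(z_{1-\alpha/2} + z_{1-\beta})^2 \approx \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{q^2}$$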
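A minimal sketch of the power formulas above, using the standard library's normal quantile function. The function names and the example inputs (1% of variance explained, 80% power) are illustrative assumptions:

```python
from statistics import NormalDist


def variance_explained(a, d, p, sigma_p2):
    """q^2 = 2p(1-p)[a + d(1-2p)]^2 / sigma_p^2."""
    return 2 * p * (1 - p) * (a + d * (1 - 2 * p)) ** 2 / sigma_p2


def noncentrality(n, q2):
    """lambda = N q^2 / (1 - q^2)."""
    return n * q2 / (1 - q2)


def required_sample_size(q2, alpha, beta, approximate=False):
    """N needed to detect a QTL explaining q2 of the variance."""
    z = NormalDist().inv_cdf
    z_sum_sq = (z(1 - alpha / 2) + z(1 - beta)) ** 2
    n = z_sum_sq / q2
    return n if approximate else n * (1 - q2)


# Hypothetical scenario: 1% of variance explained, 80% power
n_nominal = required_sample_size(q2=0.01, alpha=0.05, beta=0.2)
n_gwas = required_sample_size(q2=0.01, alpha=5e-8, beta=0.2)
print(round(n_nominal), round(n_gwas))
```

The second call uses a genome-wide significance threshold of 5e-8, a commonly used convention that is not stated on the slide; it shows how strongly the multiple-testing burden drives the required N.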
LD again
r^2 = LD (squared) correlation between QTL and genotyped SNP
Proportion of variance explained at SNP = r^2 q^2
Required sample size for detection:
$$N \approx \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{r^2 q^2}$$
Genetic Power Calculator (Shaun Purcell)
http://pngu.mgh.harvard.edu/~purcell/gpc/
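A minimal sketch of how incomplete LD inflates the required sample size by a factor 1/r^2 relative to genotyping the causal variant itself. The inputs are hypothetical, and this is not the method used by the Genetic Power Calculator, only the approximation from the slide:

```python
from statistics import NormalDist


def sample_size_at_marker(q2, r2, alpha, beta):
    """Approximate N when the genotyped SNP tags the QTL with LD r^2."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(1 - beta)) ** 2 / (r2 * q2)


n_causal = sample_size_at_marker(q2=0.005, r2=1.0, alpha=5e-8, beta=0.2)
n_tagged = sample_size_at_marker(q2=0.005, r2=0.5, alpha=5e-8, beta=0.2)
print(round(n_causal), round(n_tagged), n_tagged / n_causal)
```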
Serum bilirubin: if all GWAS were so simple…
[Figure: 95% CI of phenotype (y-axis, -0.500 to 2.000) by genotype at RS2070959_A (x-axis: 2, 1, 0)]
38% of phenotypic variance explained
1984