Exploring complex diseases using genome-wide association: challenges and strategies
Li Jin, Ph.D.
Fudan University
CAS-MPG Partner Institute for Computational Biology
HGM2006, Helsinki
AGC (Ser)
GGC (Gly)
Positional Cloning
Linkage Disequilibrium
Linkage
Daly et al. Nature Genetics, 2001
Genome-wide Association Study
Candidate Gene/Region Association Study
Select tagSNPs
Genotyping tagSNPs
Association analysis
Challenges
• Adjustment for multiple testing and power
• Portability of tagging SNPs between populations
• Population stratification
• Mapping the mutation
• Exploring gene-gene interaction
Multiple Testing
• Large number of SNPs
– The number of tagging SNPs remains large ($10^6$)
• Multiple testing problem:
– Stringent p-value threshold ($10^{-6}$ to $10^{-7}$)
– Freimer and Sabatti (2004)
– Sample size and power
• Association:
– Linear transformation: T is invariant
– Nonlinear transformation
$T = (P^A - P)^T \Sigma^{-1} (P^A - P)$
$P_{gw} = 5 \times 10^{-7}$
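The order of magnitude of these thresholds follows from simple arithmetic. The sketch below assumes a Bonferroni-style correction and a family-wise error rate of 0.05; the slide names neither, so both are illustrative assumptions, and `bonferroni_threshold` is a hypothetical helper.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumed family-wise error rate of 0.05 and two illustrative numbers of tests.
for n_tests in (10**5, 10**6):
    threshold = bonferroni_threshold(0.05, n_tests)
    print(f"{n_tests:>9,d} tests -> per-test threshold {threshold:.1e}")
```

Under these assumptions, 0.05 spread over 10^5 tests gives 5 × 10^-7, the same order as the genome-wide level quoted on the slide.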
Motivation
$P^A - P$ versus $h(P^A) - h(P)$
Statistics based on $P^A - P$: low power
Statistics based on $h(P^A) - h(P)$: higher power?
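Nonlinear Transformations
Function: $f(x)$; Derivative: $f'(x)$
• Entropy: $-x \log x$; $-(1 + \log x)$
• Exponential: $e^{x}$; $e^{x}$
• Polynomial: $x^2 + x + 1$; $2x + 1$
• Sigmoid: $\frac{1}{1 + e^{-x}}$; $\frac{e^{-x}}{(1 + e^{-x})^2}$
• Gaussian: $e^{-(x - c)^2 / (2\sigma^2)}$; $-\frac{x - c}{\sigma^2} e^{-(x - c)^2 / (2\sigma^2)}$
• Reciprocal: $\frac{1}{x}$; $-\frac{1}{x^2}$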
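The function and derivative pairs above can be checked numerically and plugged into a test statistic. The sketch below is a minimal illustration for a single biallelic SNP. The delta-method form of `nonlinear_stat`, the Gaussian defaults `c` and `sigma`, and all function names are assumptions for illustration, not the statistic used in the talk.

```python
import math


def make_transforms(c=0.5, sigma=0.25):
    """Map each transformation name to (f, f'). c and sigma are hypothetical defaults."""
    def gauss(x):
        return math.exp(-(x - c) ** 2 / (2 * sigma ** 2))

    return {
        "entropy": (lambda x: -x * math.log(x), lambda x: -(1 + math.log(x))),
        "exponential": (math.exp, math.exp),
        "polynomial": (lambda x: x * x + x + 1, lambda x: 2 * x + 1),
        "sigmoid": (lambda x: 1 / (1 + math.exp(-x)),
                    lambda x: math.exp(-x) / (1 + math.exp(-x)) ** 2),
        "gaussian": (gauss, lambda x: -(x - c) / sigma ** 2 * gauss(x)),
        "reciprocal": (lambda x: 1 / x, lambda x: -1 / x ** 2),
    }


def nonlinear_stat(p_case, p_ctrl, n_case, n_ctrl, f, df):
    """Assumed delta-method statistic: squared difference of transformed allele
    frequencies over its approximate variance (2n chromosomes per group)."""
    var = (df(p_case) ** 2 * p_case * (1 - p_case) / (2 * n_case)
           + df(p_ctrl) ** 2 * p_ctrl * (1 - p_ctrl) / (2 * n_ctrl))
    return (f(p_case) - f(p_ctrl)) ** 2 / var


# Hypothetical allele frequencies: 0.4 in cases, 0.3 in controls, 100 individuals each.
for name, (f, df) in make_transforms().items():
    print(name, round(nonlinear_stat(0.4, 0.3, 100, 100, f, df), 3))
```

With the identity transformation (f(x) = x, f'(x) = 1) this reduces to the usual squared difference in allele frequencies divided by its variance.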
Power (Case-Control)
[Figure: expected noncentrality parameter (y-axis) versus allele frequency (x-axis) for the entropy, χ², exponential, quadratic, sigmoid, Gaussian, and reciprocal statistics]
Expected noncentrality parameters of the nonlinear test statistics
$N_A = N_G = 100$, $P_D = 0.5$
Association test of MMP-2 gene with esophageal carcinoma
Statistic      P value
Entropy        $3.2 \times 10^{-8}$
Exponential    $2.3 \times 10^{-7}$
Polynomial     $1.9 \times 10^{-7}$
Sigmoid        $2.0 \times 10^{-7}$
Reciprocal     $5.1 \times 10^{-6}$
χ²             $7.0 \times 10^{-6}$
Yu C, et al. Cancer Res 2004, 64: 7622-7628
Association Studies
Challenges
• Adjustment for multiple testing and power
• Portability of tagging SNPs between populations
• Population stratification
• Mapping the mutation
• Exploring gene-gene interaction
How are LD patterns compared between populations?
• Step 1: Infer haplotype blocks for each population
• Step 2: Compare the boundaries of LD blocks between populations
[Figure: LD blocks around a target SNP in Pop A and Pop B]
Factors Influencing Block Inferences
• Sample size
• Criterion and thresholds
• Genotyping error
• Gene flow
• Search algorithm
[Figure labels: Af, As, Eu; Daic (Thai)?]
Samples (population: sample size)
• Uighur: 45
• Han: 50
• Wa: 45
• Zhuang: 44
• Hmong: 46
• European: 40
• African American: 48
• Samoan: 50
SNP Selection and Genotyping
• Selected from dbSNP (build 117)
• Most of them are double-hits
• 26,112 SNPs on chromosome 21
• 1 SNP for every 1.3 kb (Golden Path b.34)
• Illumina BeadLab platform
• 17 oligonucleotide primer sets
• Three QA criteria
– Samples
– SNP: trios & duplicates
– SNP: Hardy-Weinberg Expectation
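The Hardy-Weinberg criterion can be illustrated with a goodness-of-fit test on genotype counts. The slide does not say which test or cutoff was used, so the Pearson chi-square form below and the function name `hwe_chi_square` are assumptions for illustration.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Pearson chi-square (1 degree of freedom) comparing observed genotype
    counts with Hardy-Weinberg expectations from the estimated allele frequency."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip cells with zero expectation (monomorphic SNP).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical genotype counts for one SNP in 50 individuals.
print(round(hwe_chi_square(12, 26, 12), 3))
```

A SNP would be flagged when the statistic exceeds the chosen critical value, for example 3.84 for a nominal 0.05 level with one degree of freedom.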
[Figure panels: Zhuang, Han, Hmong, Samoan, Uighur, Wa, European, African American]
Phylogeny of Human Populations
[Figure: population tree with branch lengths in genetic distance (FST), scale bar 0.01. Tips: Hmong (HMJ), Zhuang (CCY), Han (HAN), Wa (WBM), Uyghur (UIG), European (EUR), African (AA)]
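Measurement of LD Sharing
• SNPs present in both Pop A and Pop B
• SNPs with MAF ≥ 0.1 were included
• In LD if r² > c (c = 0.1, 0.5, 0.8)
[Figure: 200 kb window around a target SNP in Pop A and Pop B]
a = number of SNPs in LD with the target SNP in A
b = number of SNPs in LD with the target SNP in B
c = number of SNPs in LD with the target SNP in both A and B
SAB = c/a
SBA = c/b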
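The sharing measures SAB and SBA can be sketched for one target SNP as follows. The input layout (a dict from SNP id to phased 0/1 alleles), the function names, and the treatment of monomorphic SNPs are assumptions for illustration. The MAF filter and the 200 kb window are assumed to have been applied before the call, and the r² cutoff is named `threshold` here to avoid clashing with the count c.

```python
def r_squared(x, y):
    """r^2 between two biallelic SNPs given 0/1 alleles on the same haplotypes."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    if px in (0.0, 1.0) or py in (0.0, 1.0):
        return 0.0  # monomorphic SNP: treated as not in LD (assumption)
    pxy = sum(a * b for a, b in zip(x, y)) / n
    d = pxy - px * py
    return d * d / (px * (1 - px) * py * (1 - py))


def ld_sharing(pop_a, pop_b, target, threshold=0.5):
    """Return (a, b, c, S_AB, S_BA) for one target SNP.
    pop_a, pop_b: dict mapping SNP id -> list of 0/1 alleles."""
    others = [s for s in pop_a if s != target and s in pop_b]
    in_a = {s for s in others if r_squared(pop_a[target], pop_a[s]) > threshold}
    in_b = {s for s in others if r_squared(pop_b[target], pop_b[s]) > threshold}
    a, b, c = len(in_a), len(in_b), len(in_a & in_b)
    s_ab = c / a if a else float("nan")
    s_ba = c / b if b else float("nan")
    return a, b, c, s_ab, s_ba


# Hypothetical toy data: four haplotypes per population, target SNP "t".
pop_a = {"t": [0, 0, 1, 1], "s1": [0, 0, 1, 1], "s2": [0, 1, 0, 1], "s3": [1, 1, 0, 0]}
pop_b = {"t": [0, 0, 1, 1], "s1": [0, 0, 1, 1], "s2": [0, 0, 1, 1], "s3": [0, 1, 0, 1]}
print(ld_sharing(pop_a, pop_b, "t"))
```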
[Figure: LD sharing S (y-axis) versus FST (x-axis) for r² > 0.1 and r² > 0.5]
SAB ~ FST
FST increases with time after divergence (t)
In non-Africans
Correlation of LD between populations = corr(a, b), where a and b are the LD counts in Pop A and Pop B for the 200 kb window around each target SNP
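As a small illustration of corr(a, b), the sketch below computes a Pearson correlation across target SNPs. Whether the talk used Pearson or another correlation is not stated, so that choice, the helper name `pearson`, and the counts are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical LD counts around five target SNPs in Pop A and Pop B.
a_counts = [12, 7, 20, 3, 15]
b_counts = [10, 8, 18, 2, 11]
print(round(pearson(a_counts, b_counts), 3))
```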
Correlation of LD Between Populations and Genetic Distance (FST)
[Figure: correlation of LD between populations (y-axis) versus FST (x-axis)]
Portability of tagging SNPs (RAB)
RAB = (number of SNPs captured by tagSNPs) / (total number of SNPs)
[Figure: Pop A and Pop B]
Portability from A to B = RAB
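The ratio RAB can be sketched as follows, assuming that the tagSNPs were chosen in Pop A and that a SNP in Pop B counts as captured when its r² with at least one tagSNP exceeds a cutoff. The capture rule, the cutoff, the choice to count tagSNPs themselves as captured, and the function names are all assumptions for illustration.

```python
def r_squared(x, y):
    """r^2 between two biallelic SNPs given 0/1 alleles on the same haplotypes."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    if px in (0.0, 1.0) or py in (0.0, 1.0):
        return 0.0  # monomorphic SNP: treated as not in LD (assumption)
    pxy = sum(a * b for a, b in zip(x, y)) / n
    d = pxy - px * py
    return d * d / (px * (1 - px) * py * (1 - py))


def portability(pop_b, tag_snps, threshold=0.5):
    """Fraction of SNPs in pop_b captured by tag_snps (selected elsewhere, e.g. in Pop A).
    pop_b: dict mapping SNP id -> list of 0/1 alleles; every tagSNP must be typed in pop_b."""
    captured = 0
    for snp, alleles in pop_b.items():
        if snp in tag_snps or any(
            r_squared(alleles, pop_b[t]) > threshold for t in tag_snps
        ):
            captured += 1
    return captured / len(pop_b)


# Hypothetical toy data: four SNPs typed on four haplotypes in Pop B.
pop_b = {"s1": [0, 0, 1, 1], "s2": [0, 0, 1, 1], "s3": [0, 1, 0, 1], "s4": [0, 1, 1, 0]}
print(portability(pop_b, {"s1"}))
```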
[Figure: 1 − RAB (y-axis) versus FST (x-axis) for r² > 0.1 and r² > 0.5]
RAB ~ FST
• R can be estimated using FST
• FST can be estimated using a small number of SNPs
• Conclusion: R can be approximately estimated by typing a small number of SNPs
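To make the FST step concrete, the sketch below estimates FST for two populations from allele frequencies at a handful of SNPs. It uses a simple heterozygosity-based ratio of sums with equal population weights and no sample-size correction. The estimator actually used in the study is not stated on the slide, and the function name and frequencies are hypothetical.

```python
def fst_two_pops(freqs_a, freqs_b):
    """FST = sum(H_T - H_S) / sum(H_T) over SNPs, from per-SNP allele frequencies."""
    num = den = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2
        h_t = 2 * p_bar * (1 - p_bar)                       # total heterozygosity
        h_s = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2   # mean within-population
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else float("nan")


# Hypothetical allele frequencies at five SNPs in two populations.
freqs_a = [0.10, 0.35, 0.50, 0.72, 0.90]
freqs_b = [0.15, 0.30, 0.62, 0.70, 0.80]
print(round(fst_two_pops(freqs_a, freqs_b), 4))
```

An empirical fit between such FST values and observed portability across population pairs could then be used to predict R for a new population; no coefficients for that fit are given on the slides.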
[Diagram linking time after divergence (t), FST, and RAB]
Conclusions
• Substantial LD sharing between populations: ancestral LDs
• tagSNPs are generally portable between populations, at least within Asia
• Portability of tagSNPs from one population to another can be estimated empirically using a small set of SNPs
Challenges
• Adjustment for multiple testing and power
• Portability of tagging SNPs between populations
• Population stratification
• Mapping the mutation
• Exploring gene-gene interaction
Population Stratification
• 209 languages belonging to 6 linguistic families
• Consistent observation of south-north differentiation
• Affects the power of association studies; false positives
• Different loci show different levels of differentiation: is there an adequate adjustment?
Individual tree
Chromosome 21
20,288 SNPs
Cluster Decomposition of Chinese Populations
Y chromosomes: 143 populations
mtDNA: 91 populations
CODIS STRs: 79 populations
HLA-A: 107 populations
Geographic Genetic Clines Based on Principal Components
Distributions of mtDNA Haplogroups
Distributions of Y Haplogroups
All haplogroups
All haplogroups
Major haplogroups
Uyghurs
Population Stratification
• Different loci show different levels of differentiation
• Admixture does exist in at least some of the populations
• Adjustment for population stratification using average differentiation is not adequate
Challenges
• Adjustment for multiple testing and power
• Portability of tagging SNPs between populations
• Population stratification
• Mapping the mutation
• Exploring gene-gene interaction
Perfect Phylogeny Approach
• No recombination or recurrent mutation
• No loop in network
• Not necessarily continuous
• Objective: Group SNPs into PP sets
PP(A) PP(B) PP(C)
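One way to picture the grouping objective is the four-gamete test: two biallelic sites fit a tree without recombination or recurrent mutation only if fewer than four of the possible two-site haplotypes are observed. The sketch below groups sites greedily on that basis. The greedy rule and the function names are assumptions for illustration and are not the algorithm presented in the talk.

```python
def compatible(haplotypes, i, j):
    """Four-gamete test: sites i and j are compatible if fewer than 4 gametes occur."""
    return len({(h[i], h[j]) for h in haplotypes}) < 4


def group_into_pp_sets(haplotypes):
    """Greedy grouping: put each site into the first set whose members are all
    pairwise compatible with it. Sets need not be contiguous along the sequence."""
    n_sites = len(haplotypes[0])
    groups = []
    for site in range(n_sites):
        for group in groups:
            if all(compatible(haplotypes, site, other) for other in group):
                group.append(site)
                break
        else:
            groups.append([site])
    return groups


# Hypothetical phased haplotypes over three sites ('0' and '1' alleles).
haplotypes = ["000", "010", "101", "111"]
print(group_into_pp_sets(haplotypes))
```

For binary sites, pairwise compatibility of all sites in a set is enough for the set to admit a perfect phylogeny; the greedy pass does not guarantee the smallest number of sets.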
Inference of Phylogeny
[Figure: tree over haplotypes 1 to 5]
• Site 1: (1, 2, 3) (4, 5)
• Site 2: (2, 3) (1, 4, 5)
• Site 3: (1, 2, 3, 5) (4)
• Site 4: (2) (1, 3, 4, 5)
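The four bipartitions on this slide can be checked for mutual consistency: two splits of the same haplotype set fit on one tree when at least one of the four pairwise intersections of their sides is empty. The sketch assumes the splits pair with sites 1 to 4 in the order listed, which is an inference from the extracted text, and the function names are hypothetical.

```python
from itertools import combinations

# Splits of haplotypes 1-5, assumed to correspond to sites 1-4 in listed order.
sites = {
    "site 1": ({1, 2, 3}, {4, 5}),
    "site 2": ({2, 3}, {1, 4, 5}),
    "site 3": ({1, 2, 3, 5}, {4}),
    "site 4": ({2}, {1, 3, 4, 5}),
}


def splits_compatible(s, t):
    """Two splits are compatible if some side of s is disjoint from some side of t."""
    return any(not (x & y) for x in s for y in t)


def admits_perfect_phylogeny(splits):
    """All pairs of splits must be compatible for a single tree to display them."""
    return all(splits_compatible(s, t) for s, t in combinations(splits, 2))


print(admits_perfect_phylogeny(list(sites.values())))
```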
Comparison of Different Algorithms
Sample size   HaploTree              PHASE 2.0.2            PPH
              Accuracy   Run time    Accuracy   Run time    Accuracy   Run time
25            94.81%     0.36 s      94.55%     12.23 s     92.50%     0.14 s
50            97.44%     0.58 s      97.37%     14.37 s     96.48%     0.23 s
100           98.78%     0.82 s      98.74%     18.42 s     98.07%     0.62 s
Identification of Disease Mutation
• Each PP allows a stepwise search to localize the most likely branch (edge) carrying the mutation.
• The best PP can be determined based on the likelihood (with adjustment for degrees of freedom).
PP(A) PP(B) PP(C)
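The branch search can be pictured with a simplified stand-in: score every branch of a tree by a 2×2 case-control chi-square on the haplotypes below that branch, then keep the best-scoring branch. This replaces the likelihood criterion mentioned above with a plain chi-square and scores all branches at once rather than stepwise, purely for illustration. The data layout, names, and counts are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] without continuity correction."""
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / den if den else 0.0


def best_branch(branches, case_counts, control_counts):
    """branches: branch name -> set of haplotype ids below that branch.
    case_counts, control_counts: haplotype id -> number of chromosomes."""
    n_case = sum(case_counts.values())
    n_ctrl = sum(control_counts.values())
    scores = {}
    for name, haps in branches.items():
        case_in = sum(case_counts.get(h, 0) for h in haps)
        ctrl_in = sum(control_counts.get(h, 0) for h in haps)
        scores[name] = chi_square_2x2(
            case_in, n_case - case_in, ctrl_in, n_ctrl - ctrl_in
        )
    return max(scores, key=scores.get), scores


# Hypothetical branches and haplotype counts in cases and controls.
branches = {"e1": {4, 5}, "e2": {2, 3}, "e3": {4}, "e4": {2}}
case_counts = {1: 10, 2: 10, 3: 10, 4: 40, 5: 30}
control_counts = {1: 30, 2: 20, 3: 20, 4: 15, 5: 15}
best, scores = best_branch(branches, case_counts, control_counts)
print(best, round(scores[best], 2))
```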
Challenges
• Adjustment for multiple testing and power
• Portability of tagging SNPs between populations
• Population stratification
• Mapping the mutation
• Exploring gene-gene interaction
A Study of CAD
• Coronary Atherosclerosis in Chinese Populations
• 123 candidate genes belong to several pathways including antioxidant, inflammation, coagulation
• 1,518 tagSNPs typed
• 916 samples (492 cases and 424 controls)
[Figure: gene-gene interaction network. Genes shown: CD36, MMP8, PDGFC, DSCR1, ITGB1, ITGA2, PDGFB, SELL, CCR2, ITGA6, LAMA4, EDN1, SELE, TGFB3, VEGF, MSR1, NFKB1, MMP9, IL1B, ACE, PON2, PON3, PON1, GPX3, SOD2, TXN, HMOX1, GSR, GCLM, NOS3, GSS, NPR3. Legend: anti-oxidation pathway; inflammatory pathway; within-pathway interaction; between-pathway interaction]
Credits
• University of Texas – Houston
– Momiao Xiong, Jinying Zhao
• Chinese Human Genome Center at Shanghai
– Wei Huang, Haifeng Wang, Ying Wang, Zhu Chen, Guoping Zhao
• Fudan University & CAS-MPG Institute of Computational Biology
– Shuhua Xu, Fuzhong Xue, Yungang He, Yi Wang, Ming Lu, Ji Qian, Bo Wen, Hui Li, Wenqing Fu, Li Jin