TRANSCRIPT
Introduction: Human Population Genomics
How soon will we all be sequenced?
• Cost
• Killer apps
• Roadblocks?
[Figure labels: Time (2013? 2018?), Cost, Applications]
The Hominid Lineage
Human population migrations
• Out of Africa, Replacement
– Single mother of all humans (Eve) ~150,000 yr
– Single father of all humans (Adam) ~70,000 yr
– Humans out of Africa ~50,000 years ago replaced others (e.g., Neandertals)
• Multiregional Evolution
– Generally debunked, however,
– ~5% of human genome in Europeans, Asians is Neanderthal, Denisova
Coalescence
Y-chromosome coalescence
Why humans are so similar
A small population that interbred reduced the genetic variation
Out of Africa ~ 50,000 years ago
Out of Africa
Migration of Humans
http://info.med.yale.edu/genetics/kkidd/point.html
Some Key Definitions
Mary: AGCCCGTACG
John: AGCCCGTACG
Josh: AGCCCGTACG
Kate: AGCCCGTACG
Pete: AGCCCGTACG
Anne: AGCCCGTACG
Mimi: AGCCCGTACG
Mike: AGCCCTTACG
Olga: AGCCCTTACG
Tony: AGCCCTTACG
Alleles: G, T
Major Allele: G
Minor Allele: T
Genotypes: G/G G/G G/T G/G G/G G/G G/G T/T T/G T/G
Of the 20 alleles, 15 are G and 5 are T, so pG = .75 and pT = .25
Heterozygosity: Prob[2 alleles picked at random with replacement are different]
2*.75*.25 = .375
H = 4Nu/(1+4Nu)
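The heterozygosity arithmetic on this slide can be checked with a short sketch. It counts alleles in the ten genotypes listed and computes the probability that two alleles drawn with replacement differ. The function names are illustrative, and reading N as effective population size and u as the per-generation mutation rate in H = 4Nu/(1+4Nu) is the usual convention, assumed here rather than stated on the slide.

```python
from collections import Counter

# Genotypes as listed on the slide (ten diploid individuals).
GENOTYPES = ["G/G", "G/G", "G/T", "G/G", "G/G", "G/G", "G/G", "T/T", "T/G", "T/G"]


def allele_frequencies(genotypes):
    """Return {allele: frequency} over all alleles in the sample."""
    counts = Counter(allele for g in genotypes for allele in g.split("/"))
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def heterozygosity(freqs):
    """Probability that two alleles drawn with replacement are different."""
    return 1.0 - sum(p * p for p in freqs.values())


def neutral_heterozygosity(n_e, mu):
    """H = 4Nu / (1 + 4Nu); n_e and mu are assumed to be effective
    population size and per-generation mutation rate."""
    theta = 4.0 * n_e * mu
    return theta / (1.0 + theta)


freqs = allele_frequencies(GENOTYPES)
h = heterozygosity(freqs)
print(freqs, h)
```

For two alleles, 1 minus the sum of squared frequencies equals 2*pG*pT, which is the form used on the slide.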
Recombinations:
– At least 1/chromosome
– On average ~1/100 Mb
Linkage Disequilibrium: The degree of correlation between two SNP locations
[Figure labels: Mom, Dad]
Human Genome Variation
• SNP: TGCTGAGA vs. TGCCGAGA
• Novel Sequence: TGCTCGGAGA vs. TGC - - - GAGA
• Microdeletion: TGC - - AGA vs. TGCCGAGA
• Inversion
• Mobile Element or Pseudogene Insertion
• Translocation
• Tandem Duplication
• Transposition
• Large Deletion
• Novel Sequence at Breakpoint
The Fall in Heterozygosity
FST = (H – HPOP) / H
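As a minimal sketch, the FST ratio can be computed for a single biallelic SNP. The slide does not define its symbols, so this assumes the usual reading: H is the heterozygosity expected if all subpopulations were pooled, and HPOP is the average heterozygosity within subpopulations. Equal subpopulation sizes are also assumed, and the function names are illustrative.

```python
def biallelic_heterozygosity(p):
    """Expected heterozygosity 2p(1-p) for allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(subpop_freqs):
    """FST = (H - HPOP) / H for one biallelic SNP.

    subpop_freqs: allele frequency of the same allele in each
    subpopulation (equal subpopulation sizes assumed).
    """
    p_bar = sum(subpop_freqs) / len(subpop_freqs)
    h_total = biallelic_heterozygosity(p_bar)  # H: pooled population
    if h_total == 0.0:
        raise ValueError("SNP is not variable in the pooled population")
    # HPOP: mean within-subpopulation heterozygosity
    h_pop = sum(biallelic_heterozygosity(p) for p in subpop_freqs) / len(subpop_freqs)
    return (h_total - h_pop) / h_total


print(fst([0.5, 0.5]))  # identical subpopulations
print(fst([0.2, 0.8]))  # diverged subpopulations
print(fst([0.0, 1.0]))  # fixed for different alleles
```

Identical subpopulations give 0, and subpopulations fixed for different alleles give 1, the two ends of the range.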
The HapMap Project
Code  Population                                Samples
ASW   African ancestry in Southwest USA         90
CEU   Northern and Western Europeans (Utah)     180
CHB   Han Chinese in Beijing, China             90
CHD   Chinese in Metropolitan Denver            100
GIH   Gujarati Indians in Houston, Texas        100
JPT   Japanese in Tokyo, Japan                  91
LWK   Luhya in Webuye, Kenya                    100
MXL   Mexican ancestry in Los Angeles           90
MKK   Maasai in Kinyawa, Kenya                  180
TSI   Toscani in Italia                         100
YRI   Yoruba in Ibadan, Nigeria                 100
Genotyping: Probe a limited number (~1M) of known highly variable positions of the human genome
Linkage Disequilibrium & Haplotype Blocks
Linkage Disequilibrium (LD):
Minor alleles: A at the first SNP (frequency pA), G at the second SNP (frequency pG)
D = P(A and G) - pApG
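The definition of D can be sketched on phased two-SNP haplotypes. The haplotype list below is made-up toy data, not from the slides. The r^2 value returned alongside D is a commonly used normalization that the slide does not mention; it is included only for illustration.

```python
def linkage_disequilibrium(haplotypes, allele1="A", allele2="G"):
    """Return (D, r2) for two SNPs.

    haplotypes: two-character strings; character 0 is the allele at
    the first SNP, character 1 the allele at the second SNP.
    """
    n = len(haplotypes)
    p1 = sum(h[0] == allele1 for h in haplotypes) / n  # pA
    p2 = sum(h[1] == allele2 for h in haplotypes) / n  # pG
    p12 = sum(h == allele1 + allele2 for h in haplotypes) / n  # P(A and G)
    d = p12 - p1 * p2
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


# Toy data (hypothetical): A and G always travel together.
linked = ["AG"] * 3 + ["CT"] * 7
# Toy data (hypothetical): all four haplotypes equally common.
unlinked = ["AG", "AT", "CG", "CT"]

print(linkage_disequilibrium(linked))
print(linkage_disequilibrium(unlinked))
```

When the two minor alleles always occur on the same haplotype, D is positive and r^2 reaches 1. When the SNPs are independent, both are 0.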
Population Sequencing – 1000 Genomes Project
The 1000 Genomes Project Consortium et al. Nature 467, 1061-1073 (2010) doi:10.1038/nature09534
Association Studies
Control: A/G A/G G/G G/G A/G G/G G/G
Disease: A/A A/G A/A A/G A/G A/A A/A

Genotype  Control  Disease
AA        0        4
AG        3        3
GG        4        0
p-value
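The slide does not say which test produces the p-value. As one possible sketch, the code below tallies the genotype table from the control and disease lists, then applies a two-sided Fisher exact test to the 2x2 table of allele counts (A vs. G, control vs. disease). The choice of an allele-based Fisher test is an assumption made for illustration, and the function names are hypothetical.

```python
from collections import Counter
from math import comb

CONTROL = "A/G A/G G/G G/G A/G G/G G/G".split()
DISEASE = "A/A A/G A/A A/G A/G A/A A/A".split()


def genotype_counts(genotypes):
    """Count genotypes, treating A/G and G/A as the same genotype."""
    return Counter("".join(sorted(g.split("/"))) for g in genotypes)


def allele_count(genotypes, allele):
    """Number of copies of one allele across all individuals."""
    return sum(g.split("/").count(allele) for g in genotypes)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(x):
        # Unnormalized hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Sum all tables that are no more likely than the observed one.
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(row1 + row2, col1)


p_value = fisher_exact_two_sided(
    allele_count(CONTROL, "A"), allele_count(CONTROL, "G"),
    allele_count(DISEASE, "A"), allele_count(DISEASE, "G"),
)
print(genotype_counts(CONTROL), genotype_counts(DISEASE), p_value)
```

Integer weights are compared before normalizing, which avoids floating-point ties when deciding which tables count as at least as extreme.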
Wellcome Trust Case Control
Nature 447, 661-678 (7 June 2007); Nature 464, 713-720 (1 April 2010)
Many associations of small effect sizes (<1.5)
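Disease Clustering
Disease                        Genotyping
Multiple Sclerosis (MS)        Illumina chip, 15K non-synon SNPs
Ankylosing Spondylitis (AS)    Illumina chip, 15K non-synon SNPs
Autoimmune Thyroid (ATD)       Illumina chip, 15K non-synon SNPs
Breast Cancer (BC)             Illumina chip, 15K non-synon SNPs
Rheumatoid Arthritis (RA)      Affy 500K array
Bipolar Disorder (BD)          Affy 500K array
Crohn's Disease (CD)           Affy 500K array
Coronary Artery (CAD)          Affy 500K array
Hypertension (HT)              Affy 500K array
Type 1 Diabetes (T1D)          Affy 500K array
Type 2 Diabetes (T2D)          Affy 500K array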
Randomization to determine significance
Use results as a distance metric for clustering diseases
Compute disease-disease correlations
PLoS Genet 5(12): e1000792. doi:10.1371/journal.pgen.1000792. 2009.
Disease Clustering
• RA vs. ATD
• RA vs. MS
– No recorded co-occurrence of RA and MS
Genetic Variation Score (GVS)
SNP - Allele      Gene Symbol  RA (NARAC)  RA      AS      T1D     ATD     MS (IMSGC)  MS
rs11752919 - C    ZSCAN23      -3.48       -3.21   -9.39   1.10    0.70    3.25        2.99
rs3130981 - A     CDSN         -0.46       -1.00   -9.47   -4.94   0.33    10.00       13.41
rs151719 - G      HLA-DMB      -6.71       -4.77   -1.08   -13.63  0.34    8.58        17.76
rs10484565 - T    TAP2         25.52       8.37    1.34    15.74   -1.36   -0.56       -0.30
rs1264303 - G     VARS2        11.51       7.36    18.76   0.89    -1.76   -1.85       -1.75
rs1265048 - C     CDSN         6.59        2.97    50.13   6.34    -0.85   -2.39       -4.16
rs2071286 - A     NOTCH4       5.30        0.78    6.42    4.04    -0.03   -1.89       -2.45
rs2076530 - G     BTNL2        67.49       56.46   14.06   13.58   -6.41   -9.50       -18.52
rs757262 - T      TRIM40       14.58       9.11    6.27    1.56    -0.79   -2.05       -7.34
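The "Compute disease-disease correlations" step can be sketched on the nine SNPs in the table above. The code uses Pearson correlation between GVS columns and turns it into a distance as 1 minus the correlation. Both choices are assumptions for illustration, and the cited paper may use a different measure and many more SNPs.

```python
from math import sqrt

# GVS columns copied from the table above (nine SNPs only).
GVS = {
    "RA (NARAC)": [-3.48, -0.46, -6.71, 25.52, 11.51, 6.59, 5.30, 67.49, 14.58],
    "RA": [-3.21, -1.00, -4.77, 8.37, 7.36, 2.97, 0.78, 56.46, 9.11],
    "AS": [-9.39, -9.47, -1.08, 1.34, 18.76, 50.13, 6.42, 14.06, 6.27],
    "T1D": [1.10, -4.94, -13.63, 15.74, 0.89, 6.34, 4.04, 13.58, 1.56],
    "ATD": [0.70, 0.33, 0.34, -1.36, -1.76, -0.85, -0.03, -6.41, -0.79],
    "MS (IMSGC)": [3.25, 10.00, 8.58, -0.56, -1.85, -2.39, -1.89, -9.50, -2.05],
    "MS": [2.99, 13.41, 17.76, -0.30, -1.75, -4.16, -2.45, -18.52, -7.34],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def distance(disease_a, disease_b):
    """Correlation distance: 0 for identical profiles, 2 for opposite ones."""
    return 1.0 - pearson(GVS[disease_a], GVS[disease_b])


for a, b in [("RA (NARAC)", "RA"), ("RA (NARAC)", "MS (IMSGC)")]:
    print(a, "vs", b, round(pearson(GVS[a], GVS[b]), 3), round(distance(a, b), 3))
```

On these nine SNPs the two RA cohorts correlate strongly and positively, while RA and MS correlate negatively, in line with the "RA vs. MS" contrast on the slide.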
Heritability & Environment
Bienvenu OJ, Davydow DS, & Kendler KS (2011). Psychological Medicine, 41(1), 33-40. PMID:
Ancestry Inference
[Figure labels: ?, Danish, French, Spanish, Mexican]
Global Ancestry Inference
Nature. 2008 November 6; 456(7218): 98–101.
Ancestry Painting
[Figure labels: ?, Danish, French, Spanish, Mexican]
Ancestry Painting – Haplotype-based
HAPAA, HAPMIX
HAPAA: Genome Res. 2008. 18: 676-682
HAPMIX: PLoS Genet 5(6): e1000519, 2009
Fixation, Positive & Negative Selection
[Figure panels: Neutral Drift, Positive Selection, Negative Selection]
How can we detect negative selection?
How can we detect positive selection?
Conservation and Human SNPs
CNSs have fewer SNPs
SNPs have shifted allele frequency spectra
Neutral CNS
How can we detect positive selection?
Ka/Ks ratio: Ratio of nonsynonymous to synonymous substitutions
Very old, persistent, strong positive selection for a protein that keeps adapting
Examples: immune response, spermatogenesis
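A rough sketch of a Ka/Ks estimate for two aligned coding sequences follows. It uses a simplified Nei-Gojobori style count of synonymous and nonsynonymous sites with a Jukes-Cantor correction. The simplifications are assumptions made to keep the example short: codon pairs that differ at more than one position are skipped, and changes to stop codons are counted as nonsynonymous. The example sequences are made up.

```python
import math
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3)."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                sites += 1.0 / 3.0
    return sites


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple hits (p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks) for two aligned, gap-free coding sequences."""
    codons1 = [seq1[i:i + 3] for i in range(0, len(seq1), 3)]
    codons2 = [seq2[i:i + 3] for i in range(0, len(seq2), 3)]
    # Average the site counts of the two sequences.
    syn_sites = sum(synonymous_sites(c) for c in codons1 + codons2) / 2.0
    nonsyn_sites = 3.0 * len(codons1) - syn_sites
    syn_diffs = nonsyn_diffs = 0
    for c1, c2 in zip(codons1, codons2):
        if sum(a != b for a, b in zip(c1, c2)) != 1:
            continue  # identical codons, or multi-hit codons (skipped here)
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn_diffs += 1
        else:
            nonsyn_diffs += 1
    return jukes_cantor(nonsyn_diffs / nonsyn_sites), jukes_cantor(syn_diffs / syn_sites)


# Made-up example: one synonymous change (GCT>GCC), one nonsynonymous (GGT>GAT).
ka, ks = ka_ks("ATGGCTAAAGGT", "ATGGCCAAAGAT")
print(ka, ks, ka / ks)
```

A ratio well below 1 suggests purifying (negative) selection, a ratio near 1 suggests neutrality, and a ratio above 1 is the signature of positive selection described on the slide.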
How can we detect positive selection?
Long Haplotypes – iHS test
Less time:
• Fewer mutations
• Fewer recombinations