010101100010010100001010101010011011100110001100101000100101 Introduction: Human Population Genomics ACGTTTGACTGAGGAGTTTACGGGAGCAAAGCGGCGTCATTGCTATTCGTATCTGTTTAG

Upload: denis-glenn

Post on 04-Jan-2016


Page 2:

How soon will we all be sequenced?

• Cost
• Killer apps
• Roadblocks?

[Figure labels: Time (2013? 2018?), Cost, Applications]

Page 3:

The Hominid Lineage

Page 4:

Human population migrations

• Out of Africa, Replacement
  – Single mother of all humans (Eve) ~150,000 yr
  – Single father of all humans (Adam) ~70,000 yr
  – Humans out of Africa ~50,000 years ago replaced others (e.g., Neanderthals)
• Multiregional Evolution
  – Generally debunked; however,
  – ~5% of human genome in Europeans, Asians is Neanderthal, Denisova

Page 5:

Coalescence

Y-chromosome coalescence

Page 6:

Why humans are so similar

A small population that interbred reduced the genetic variation

Out of Africa ~ 50,000 years ago

Page 7:

Migration of Humans

Page 8:

Migration of Humans

http://info.med.yale.edu/genetics/kkidd/point.html

Page 9:

Migration of Humans

http://info.med.yale.edu/genetics/kkidd/point.html

Page 10:

Some Key Definitions

Mary: AGCCCGTACG
John: AGCCCGTACG
Josh: AGCCCGTACG
Kate: AGCCCGTACG
Pete: AGCCCGTACG
Anne: AGCCCGTACG
Mimi: AGCCCGTACG
Mike: AGCCCTTACG
Olga: AGCCCTTACG
Tony: AGCCCTTACG

Alleles: G, T
Major Allele: G
Minor Allele: T

Genotypes: G/G, G/G, G/T, G/G, G/G, G/G, G/G, T/T, T/G, T/G

Heterozygosity: Prob[2 alleles picked at random with replacement are different]
2*.75*.25 = .375
H = 4Nu/(1+4Nu)

Recombinations: At least 1/chromosome; on average ~1/100 Mb

Linkage Disequilibrium: The degree of correlation between two SNP locations

[Figure labels: Mom, Dad]
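The heterozygosity arithmetic on this slide can be checked with a short sketch. It pools both alleles of the ten genotypes listed on the slide, which gives the .75/.25 frequencies used in the 2*.75*.25 calculation. The function names are illustrative, and reading N as the effective population size and u as the per-generation mutation rate in H = 4Nu/(1+4Nu) is an assumption, since the slide does not define them.

```python
from collections import Counter

# Genotypes of the ten individuals, in the order they appear on the slide.
genotypes = ["G/G", "G/G", "G/T", "G/G", "G/G", "G/G", "G/G", "T/T", "T/G", "T/G"]

def allele_frequencies(genotypes):
    """Return {allele: frequency}, pooling both alleles of every individual."""
    counts = Counter(allele for g in genotypes for allele in g.split("/"))
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def heterozygosity(freqs):
    """Probability that two alleles drawn with replacement differ: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())

def equilibrium_heterozygosity(N, u):
    """H = 4Nu / (1 + 4Nu); N and u are interpreted as stated in the lead-in (assumption)."""
    theta = 4 * N * u
    return theta / (1 + theta)

freqs = allele_frequencies(genotypes)
H = heterozygosity(freqs)
print(freqs, H)
```

For two alleles, 1 - (p^2 + q^2) equals 2pq, so H here matches the slide's .375.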

Page 11:

Human Genome Variation

• SNP: TGCTGAGA vs. TGCCGAGA
• Novel Sequence: TGCTCGGAGA vs. TGC - - - GAGA
• Microdeletion: TGC - - AGA vs. TGCCGAGA
• Inversion
• Mobile Element or Pseudogene Insertion
• Translocation
• Tandem Duplication
• Transposition
• Large Deletion
• Novel Sequence at Breakpoint

Page 12:

The Fall in Heterozygosity

FST = (H - HPOP) / H
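A minimal sketch of the FST ratio follows. It assumes H is the heterozygosity of the pooled population and HPOP is the average heterozygosity within subpopulations, a common reading that the slide does not spell out. It also assumes a biallelic site and equally sized subpopulations. The allele frequencies 0.9 and 0.6 are hypothetical.

```python
def het(p):
    """Heterozygosity 2p(1-p) at a biallelic site with allele frequency p."""
    return 2 * p * (1 - p)

def fst(subpop_freqs):
    """FST = (H - HPOP) / H, assuming equally sized subpopulations."""
    p_total = sum(subpop_freqs) / len(subpop_freqs)
    h_total = het(p_total)                       # H: pooled population
    h_pop = sum(het(p) for p in subpop_freqs) / len(subpop_freqs)  # HPOP
    if h_total == 0:
        return 0.0                               # site is fixed everywhere
    return (h_total - h_pop) / h_total

# Hypothetical frequencies of one allele in two subpopulations.
example = fst([0.9, 0.6])
print(round(example, 3))
```

With these inputs the pooled frequency is 0.75, so H = .375, HPOP = (.18 + .48)/2 = .33, and FST is about 0.12. Identical subpopulations give FST = 0.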

Page 13:

The HapMap Project

ASW   African ancestry in Southwest USA        90
CEU   Northern and Western Europeans (Utah)   180
CHB   Han Chinese in Beijing, China            90
CHD   Chinese in Metropolitan Denver          100
GIH   Gujarati Indians in Houston, Texas      100
JPT   Japanese in Tokyo, Japan                 91
LWK   Luhya in Webuye, Kenya                  100
MXL   Mexican ancestry in Los Angeles          90
MKK   Maasai in Kinyawa, Kenya                180
TSI   Toscani in Italia                       100
YRI   Yoruba in Ibadan, Nigeria               100

Genotyping: Probe a limited number (~1M) of known highly variable positions of the human genome

Page 14:

Linkage Disequilibrium & Haplotype Blocks

Linkage Disequilibrium (LD):

D = P(A and G) - pA pG

Minor alleles: A (frequency pA), G (frequency pG)
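The definition of D can be sketched directly from phased haplotypes. The haplotype list below is hypothetical: each two-letter string holds the allele at the first SNP followed by the allele at the second. The r^2 value is a widely used normalization of D that does not appear on the slide and is included only for illustration.

```python
def linkage_disequilibrium(haplotypes, allele1="A", allele2="G"):
    """Return (D, r2) for two SNPs given phased two-letter haplotypes."""
    n = len(haplotypes)
    p_a = sum(h[0] == allele1 for h in haplotypes) / n
    p_g = sum(h[1] == allele2 for h in haplotypes) / n
    p_ag = sum(h[0] == allele1 and h[1] == allele2 for h in haplotypes) / n
    d = p_ag - p_a * p_g                      # D = P(A and G) - pA * pG
    denom = p_a * (1 - p_a) * p_g * (1 - p_g)
    r2 = d * d / denom if denom > 0 else 0.0  # not on the slide (assumption)
    return d, r2

# Hypothetical sample of ten phased haplotypes.
haplotypes = ["AG"] * 4 + ["CT"] * 4 + ["AT"] + ["CG"]
d, r2 = linkage_disequilibrium(haplotypes)
print(d, r2)
```

Here pA = pG = 0.5 and P(A and G) = 0.4, so D is about 0.15. D = 0 would mean the two SNPs are statistically independent in the sample.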

Page 15:

Population Sequencing – 1000 Genomes Project

The 1000 Genomes Project Consortium et al. Nature 467, 1061-1073 (2010) doi:10.1038/nature09534

Page 16:

Association Studies

Control: A/G, A/G, G/G, G/G, A/G, G/G, G/G

Disease: A/A, A/G, A/A, A/G, A/G, A/A, A/A

      Control   Disease
AA    0         4
AG    3         3
GG    4         0

p-value
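One way to turn the genotype table into a p-value is a chi-square test of independence, sketched below with only the standard library. The slide does not say which test it uses, so the choice of test is an assumption. With counts this small, an exact or permutation test would normally be preferred. The closed form exp(-x/2) is valid only for 2 degrees of freedom, which is what a 3x2 table gives.

```python
import math

# Genotype counts from the slide: rows AA, AG, GG; columns (control, disease).
table = {"AA": (0, 4), "AG": (3, 3), "GG": (4, 0)}

def chi_square_3x2(table):
    """Chi-square statistic and p-value for a 3x2 table (2 degrees of freedom)."""
    rows = list(table.values())
    col_totals = [sum(r[j] for r in rows) for j in range(2)]
    n = sum(col_totals)
    stat = 0.0
    for r in rows:
        row_total = sum(r)
        for j in range(2):
            expected = row_total * col_totals[j] / n
            stat += (r[j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 2 d.f. is exp(-x/2).
    return stat, math.exp(-stat / 2)

stat, p_value = chi_square_3x2(table)
print(stat, p_value)
```

For the slide's counts every expected cell is 2 or 3, the statistic is 8, and the p-value is exp(-4), roughly 0.018.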

Page 17:

Wellcome Trust Case Control

Nature 447, 661-678 (7 June 2007); Nature 464, 713-720 (1 April 2010)

Many associations of small effect sizes (<1.5)

Page 18:

Disease Clustering

Disease genotyping:
• Illumina chip, 15K non-synon SNPs
  – Multiple Sclerosis (MS)
  – Ankylosing Spondylitis (AS)
  – Autoimmune Thyroid (ATD)
  – Breast Cancer (BC)
• Affy 500K array
  – Rheumatoid Arthritis (RA)
  – Bipolar Disorder (BD)
  – Crohn's Disease (CD)
  – Coronary Artery (CAD)
  – Hypertension (HT)
  – Type 1 Diabetes (T1D)
  – Type 2 Diabetes (T2D)

Randomization to determine significance

Use results as a distance metric for clustering diseases

Compute disease-disease correlations

PLoS Genet 5(12): e1000792. doi:10.1371/journal.pgen.1000792. 2009.

Page 19:

Disease Clustering

• RA vs. ATD
• RA vs. MS
  – No recorded co-occurrence of RA and MS

Genetic Variation Score (GVS)

SNP - Allele      Gene Symbol   RA (NARAC)   RA       AS       T1D      ATD      MS (IMSGC)   MS
rs11752919 - C    ZSCAN23       -3.48        -3.21    -9.39    1.10     0.70     3.25         2.99
rs3130981 - A     CDSN          -0.46        -1.00    -9.47    -4.94    0.33     10.00        13.41
rs151719 - G      HLA-DMB       -6.71        -4.77    -1.08    -13.63   0.34     8.58         17.76
rs10484565 - T    TAP2          25.52        8.37     1.34     15.74    -1.36    -0.56        -0.30
rs1264303 - G     VARS2         11.51        7.36     18.76    0.89     -1.76    -1.85        -1.75
rs1265048 - C     CDSN          6.59         2.97     50.13    6.34     -0.85    -2.39        -4.16
rs2071286 - A     NOTCH4        5.30         0.78     6.42     4.04     -0.03    -1.89        -2.45
rs2076530 - G     BTNL2         67.49        56.46    14.06    13.58    -6.41    -9.50        -18.52
rs757262 - T      TRIM40        14.58        9.11     6.27     1.56     -0.79    -2.05        -7.34
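To illustrate how a disease-disease correlation could be read off this table, the sketch below computes a Pearson correlation between GVS columns over the nine SNPs shown. Pearson correlation is an assumption made for illustration; the cited paper's actual similarity measure and SNP set may differ. The values are copied from the RA, ATD and MS columns as reconstructed from the slide.

```python
import math

# GVS values copied from the slide's table, same SNP order top to bottom.
gvs = {
    "RA":  [-3.21, -1.00, -4.77, 8.37, 7.36, 2.97, 0.78, 56.46, 9.11],
    "ATD": [0.70, 0.33, 0.34, -1.36, -1.76, -0.85, -0.03, -6.41, -0.79],
    "MS":  [2.99, 13.41, 17.76, -0.30, -1.75, -4.16, -2.45, -18.52, -7.34],
}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r_ra_ms = pearson(gvs["RA"], gvs["MS"])
r_ra_atd = pearson(gvs["RA"], gvs["ATD"])
print(r_ra_ms, r_ra_atd)
```

Both coefficients come out negative for these nine SNPs, in line with the slide's contrast of RA against ATD and MS. Nine hand-picked SNPs are far too few to support any conclusion on their own.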

Page 20:

Heritability & Environment

Bienvenu OJ, Davydow DS, & Kendler KS (2011).  Psychological medicine, 41 (1), 33-40 PMID:

Page 21:

Ancestry Inference

?

Danish

French

Spanish

Mexican

Page 22:

Global Ancestry Inference

Nature. 2008 November 6; 456(7218): 98–101.

Page 23:

Ancestry Painting

?

Danish

French

Spanish

Mexican

Page 24:

Ancestry Painting – Haplotype-based

HAPAA, HAPMIX

HAPAA: Genome Res. 2008. 18: 676-682
HAPMIX: PLoS Genet 5(6): e1000519, 2009

Page 25:

Fixation, Positive & Negative Selection

[Figure panels: Neutral Drift, Positive Selection, Negative Selection]

How can we detect negative selection?

How can we detect positive selection?

Page 26:

Conservation and Human SNPs

CNSs (conserved non-coding sequences) have fewer SNPs

SNPs in CNSs have shifted allele frequency spectra

[Figure legend: Neutral, CNS]

Page 27:

How can we detect positive selection?

Ka/Ks ratio: Ratio of nonsynonymous to synonymous substitutions

Very old, persistent, strong positive selection for a protein that keeps adapting

Examples: immune response, spermatogenesis
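A rough sketch of the Ka/Ks ratio is given below. It treats Ka as nonsynonymous substitutions per nonsynonymous site and Ks as synonymous substitutions per synonymous site, and it omits the multiple-hit corrections that real estimators apply. The function name and the counts are hypothetical. A ratio above 1 is usually taken as a sign of positive selection, and a ratio below 1 as a sign of purifying (negative) selection.

```python
def ka_ks(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Uncorrected Ka/Ks from raw substitution and site counts (simplified)."""
    ka = nonsyn_subs / nonsyn_sites   # nonsynonymous substitutions per site
    ks = syn_subs / syn_sites         # synonymous substitutions per site
    if ks == 0:
        # The ratio is undefined without any synonymous change to compare against.
        return float("inf") if ka > 0 else float("nan")
    return ka / ks

# Hypothetical counts for one gene compared between two species.
ratio = ka_ks(nonsyn_subs=30, nonsyn_sites=600, syn_subs=10, syn_sites=300)
print(ratio)
```

With these made-up counts Ka = 0.05 and Ks is about 0.033, so the ratio is about 1.5, which would point toward positive selection under the usual reading.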

Page 29:

Long Haplotypes – iHS test

Less time:
• Fewer mutations
• Fewer recombinations