Population structure of UK Biobank and ancient Eurasians
reveals adaptation at genes influencing blood pressure.
Kevin Galinsky
Harvard T.H. Chan School of Public Health
American Society of Human Genetics
October 21, 2016
Outline
1. Introduction
2. Population Structure
3. Natural Selection
Recent work was able to study population structure in the UK
• People of the British Isles (PoBI) study contained 2,039 samples whose grandparents were born within 80 km of each other
• Clustered into 17 groups using fineSTRUCTURE
• Sample size precluded natural selection analysis
Leslie et al. 2015 Nature
UK Biobank provides large dataset to detect population structure and natural selection
• Study follows 500k volunteers in UK
• Phase 1 data release contains 150k samples
• ~120k have UK ancestry
Sarkar, Webster & Galacher 2014 Healthy Cities: Public Health through Urban Planning
Previous work used a PC-based approach to detect natural selection in 55k European Americans
Galinsky et al. 2016a AJHG
Figure panels: Population structure from FastPCA; Natural selection along PC1
2. Population Structure
UK Biobank population structure analysis
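Pipeline (SNPs × samples after each step):
• Original data: 847k SNPs × 153k samples
• QC* filtering: remove non-UK ancestry samples, SNPs with missing data, low-MAF SNPs, and samples with missing data; HWE filter → 511k SNPs × 119k samples
• LD pruning (LD < 0.2) and removal of related samples → 210k SNPs × 113k samples
• FastPCA (http://www.hsph.harvard.edu/alkes-price/software/) → PCs
• k-means clustering → clusters
• Project external reference populations onto PCs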
Galinsky et al. 2016b AJHG
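The QC and LD-pruning steps above can be sketched roughly as follows. The slides name the filters but not their cutoffs, so every threshold, the LD window, and the reading of "LD < 0.2" as pairwise r² are hypothetical choices for illustration; the ancestry and relatedness filters are omitted.

```python
import math
import numpy as np

# Hypothetical cutoffs: the slides list these filters but not their values.
MAX_SNP_MISSING = 0.02
MAX_SAMPLE_MISSING = 0.02
MIN_MAF = 0.01
MIN_HWE_P = 1e-6
LD_R2_MAX = 0.2   # "LD < 0.2", read here as pairwise r^2 (assumption)
LD_WINDOW = 50    # hypothetical window, in SNPs


def hwe_pvalue(g):
    """1-df chi-square Hardy-Weinberg test on one SNP's non-missing 0/1/2 calls."""
    n = len(g)
    obs = np.array([(g == 0).sum(), (g == 1).sum(), (g == 2).sum()], float)
    p = (obs[1] + 2 * obs[2]) / (2 * n)
    exp = n * np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])
    if np.any(exp == 0):
        return 1.0
    stat = float(((obs - exp) ** 2 / exp).sum())
    return math.erfc(math.sqrt(stat / 2))  # chi^2_1 survival function


def qc_filter(G):
    """G: samples x SNPs array of 0/1/2 with np.nan for missing calls.
    Returns boolean masks (keep_samples, keep_snps)."""
    keep_samples = np.isnan(G).mean(axis=1) <= MAX_SAMPLE_MISSING
    H = G[keep_samples]
    keep_snps = np.isnan(H).mean(axis=0) <= MAX_SNP_MISSING
    for j in np.flatnonzero(keep_snps):
        g = H[:, j][~np.isnan(H[:, j])]
        freq = g.mean() / 2
        # Drop low-MAF SNPs (this also removes monomorphic ones) and HWE failures
        if min(freq, 1 - freq) < MIN_MAF or hwe_pvalue(g) < MIN_HWE_P:
            keep_snps[j] = False
    return keep_samples, keep_snps


def ld_prune(G):
    """Greedy pruning of a complete (no missing) samples x SNPs matrix:
    keep a SNP only if r^2 < LD_R2_MAX with every kept SNP in the window."""
    Z = (G - G.mean(axis=0)) / G.std(axis=0)
    kept = []
    for j in range(G.shape[1]):
        recent = [k for k in kept if j - k <= LD_WINDOW]
        r2 = [(Z[:, j] @ Z[:, k] / len(Z)) ** 2 for k in recent]
        if all(r < LD_R2_MAX for r in r2):
            kept.append(j)
    return kept
```

Production pipelines typically run these filters in dedicated tools rather than in Python loops; the sketch only makes the order of operations concrete.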
First 5 PCs contained interesting population structure
Galinsky et al. 2016b AJHG
210k SNPs × 113k samples FastPCA – 2 hours run time, 6.5GB memory
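FastPCA computes only the top PCs, which is what keeps a 210k × 113k problem to hours and a few GB. The sketch below does not reproduce FastPCA's code; it is a generic randomized SVD with power iterations used as a stand-in for the same idea. The (g − 2p)/√(2p(1−p)) normalization is a common convention and an assumption here.

```python
import numpy as np


def normalize(G):
    """Center and scale each SNP (row) of a SNPs x samples 0/1/2 matrix.
    Assumes polymorphic SNPs (0 < p < 1), e.g. after QC."""
    p = G.mean(axis=1, keepdims=True) / 2
    return (G - 2 * p) / np.sqrt(2 * p * (1 - p))


def randomized_pca(X, k, oversample=10, n_iter=4, seed=0):
    """Approximate top-k PCs of a normalized SNPs x samples matrix X.

    Generic randomized SVD (stand-in for FastPCA's algorithm).
    Returns (U, s, V) with X ~= U @ diag(s) @ V.T restricted to k components.
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    # Sketch the column space of X with k + oversample random directions
    Y = X @ rng.standard_normal((N, k + oversample))
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(Y)
        # Power iteration Y <- X X^T Q sharpens the gap between top and bulk PCs
        Y = X @ (X.T @ Q)
    Q, _ = np.linalg.qr(Y)            # orthonormal basis for the sketched space
    B = Q.T @ X                       # small (k + oversample) x N matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k].T
```

The cost is dominated by a few passes of X times a thin matrix, instead of a full eigendecomposition of the N × N sample matrix.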
k-means clustering with 6 clusters produced stable clusters
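Clustering samples on their top PC scores with k = 6 could look like the sketch below. It is plain Lloyd's k-means; the k-means++ seeding, the number of restarts, and the iteration cap are assumptions, and rerunning from several seeds is one simple way to check that clusters are stable.

```python
import numpy as np


def _assign(P, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of P."""
    d = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def _kmeanspp_init(P, k, rng):
    """k-means++ seeding: each new center drawn with probability ~ squared distance."""
    centers = [P[rng.integers(len(P))]]
    for _ in range(1, k):
        d2 = ((P[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
        centers.append(P[rng.choice(len(P), p=d2 / d2.sum())])
    return np.array(centers)


def kmeans(P, k=6, n_iter=100, n_restarts=10, seed=0):
    """Lloyd's k-means on a samples x PCs score matrix P.
    Returns (labels, centers) from the restart with the lowest within-cluster SS."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        centers = _kmeanspp_init(P, k, rng)
        for _ in range(n_iter):
            labels = _assign(P, centers)
            # Move each center to its cluster mean (keep it if the cluster is empty)
            new = np.array([P[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
            converged = np.allclose(new, centers)
            centers = new
            if converged:
                break
        labels = _assign(P, centers)
        sse = float(((P - centers[labels]) ** 2).sum())
        if best is None or sse < best[0]:
            best = (sse, labels, centers)
    return best[1], best[2]
```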
Projecting PoBI samples identified the clusters
• Several north-south PCs; separate Welsh PCs
Leslie et al. 2015 Nature
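Projection places samples that were not in the PCA (PoBI here, ancient genomes later) onto the existing PC axes using the SNP loadings. The slides do not say how missing genotypes were handled; the least-squares fit on each sample's observed SNPs below is one common choice for sparse data and is an assumption, as is the normalization by the PCA cohort's allele frequencies.

```python
import numpy as np


def project_samples(G_new, p_ref, U, s):
    """Project new samples onto existing PCs.

    G_new: new_samples x SNPs genotypes (0/1/2, np.nan where missing)
    p_ref: allele frequencies of the PCA cohort, one per SNP
    U, s:  SNP loadings (SNPs x K) and singular values from X = U diag(s) V^T
    Returns new_samples x K scores on the same scale as the rows of V.
    """
    Z = (G_new - 2 * p_ref) / np.sqrt(2 * p_ref * (1 - p_ref))
    A = U * s                         # loadings scaled by their singular values
    scores = np.empty((G_new.shape[0], U.shape[1]))
    for i, z in enumerate(Z):
        obs = ~np.isnan(z)            # fit only on SNPs observed in this sample
        scores[i], *_ = np.linalg.lstsq(A[obs], z[obs], rcond=None)
    return scores
```

With complete data this reduces to V = XᵀUΣ⁻¹, so samples that were in the PCA land exactly on their original coordinates.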
Tracing UK ancestry to ancient populations
• 3 ancient populations thought to be the origin of modern Europeans:
  • Steppe
  • Early farmers
  • Hunter-gatherers
• Approach: project ancient populations onto UK Biobank PCs
Lazaridis et al. 2014 Nature Haak et al. 2015 Nature Mathieson et al. 2015 Nature
Figure: Steppe, Early Farmers, Hunter-gatherers
Ancient populations projected along north-south PCs; Steppe projected farther north
3. Natural Selection
Unusual population differentiation is a powerful way to detect recent selection
Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude (Yi et al. 2010 Science):
"One single-nucleotide polymorphism (SNP) at EPAS1 shows a 78% frequency difference between Tibetan and Han samples, representing the fastest allele frequency change observed at any human gene to date."
Unusual population differentiation is a powerful way to detect recent selection
The Impact of Natural Selection on an ABCC11 SNP Determining Earwax Type (Ohashi, Naka & Tsuchiya 2011 Mol Bio & Evol):
"In addition, we show that absolute latitude is significantly associated with the allele frequency of rs17822931-A in Asian, Native American, and European populations, implying that the selective advantage of rs17822931-A is related to an adaptation to a cold climate."
Unusual population differentiation is a powerful way to detect recent selection
Greenlandic Inuit show genetic signatures of diet and climate adaptation (Fumagalli et al. 2015 Science):
To detect signals of positive selection, we used the population branch statistic (PBS), which identifies alleles that have experienced strong changes in frequency in one population relative to two reference populations.
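The population branch statistic can be sketched as below, using PBS = (T_AB + T_AC − T_BC)/2 with T = −log(1 − F_ST) as defined by Yi et al. 2010. Choosing the Hudson F_ST estimator (in the form recommended by Bhatia et al. 2011) for each population pair is our assumption; the cited studies may have used a different estimator.

```python
import math


def hudson_fst(p1, p2, n1, n2):
    """Hudson F_ST estimator for one SNP, in the form given by Bhatia et al. 2011.

    p1, p2: sample allele frequencies; n1, n2: sampled chromosomes per population.
    Undefined (division by zero) for SNPs monomorphic in both samples.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num / den


def branch_length(fst):
    """Convert F_ST to a divergence-time-like quantity, T = -log(1 - F_ST)."""
    return -math.log(1 - fst)


def pbs(fst_ab, fst_ac, fst_bc):
    """Population branch statistic for population A relative to B and C:
    the length of A's branch in the three-population tree."""
    return (branch_length(fst_ab) + branch_length(fst_ac) - branch_length(fst_bc)) / 2
```

Here A is the focal population and B, C are the two references; a SNP with a large PBS has changed frequency mostly on A's branch.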
UK Biobank natural selection analysis
Pipeline:
• QC* data: 511k SNPs × 119k samples; QC SNPs in unrelated samples: 511k SNPs × 113k samples
• LD pruning → 210k SNPs × 113k samples → FastPCA (http://www.hsph.harvard.edu/alkes-price/software/) → PCs
• PCs + QC SNPs → PC-based natural selection statistic
• PC-based statistic + external selection statistic → combined selection statistic → selected SNPs
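Evaluating SNPs for unusual differentiation along principal components
• Previous work compared allele frequencies in discrete populations
• Statistic can be extended to SNP loadings (correlation of PC and genotypes):
$$M U_{ik}^2 = \frac{M (X_i V_k)^2}{\Sigma_k^2} = \frac{(X_i V_k)^2}{\Lambda_k} \sim \chi^2_1$$
Bhatia et al. 2011 AJHG
Figure: an ancestral population (allele frequency $p$) splits into Pop 1 ($p_1$) and Pop 2 ($p_2$); $F_{ST}$ is based on $p_1 - p_2$
Sample $X$: genotypes ($M$ SNPs × $N$ samples), with $X = U \Sigma V^T$ and $\frac{X^T X}{M} = V \Lambda V^T$, so $\Lambda_k = \Sigma_k^2 / M$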
Galinsky et al. 2016a AJHG
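The statistic above can be computed directly from the PCs. In this sketch V and Λ are assumed to come from the PCA on LD-pruned SNPs, and the statistic is then evaluated for every QC SNP, which is our reading of the pipeline. Because the columns of U have unit norm, M·U²ᵢₖ averages exactly 1 over the PCA SNPs, matching the mean of a χ²₁ variable.

```python
import math
import numpy as np


def pc_selection_stats(X_all, V, lam, k):
    """Per-SNP selection statistic along PC k: (X_i V_k)^2 / Lambda_k ~ chi^2_1.

    X_all: normalized SNPs x samples matrix (e.g. all QC'd SNPs)
    V:     samples x K matrix of PCs with unit-norm columns
    lam:   eigenvalues Lambda of X^T X / M for the SNPs used in the PCA
    """
    stat = (X_all @ V[:, k]) ** 2 / lam[k]
    # chi^2_1 survival function: P(chi^2_1 > x) = erfc(sqrt(x / 2))
    pval = np.array([math.erfc(math.sqrt(x / 2)) for x in stat])
    return stat, pval
```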
Detect selection at FUT2 within the UK
Figure: PC1 selection statistic for each SNP in the genome
FUT2 protects against norovirus
• Fucosyltransferase 2 is part of the Lewis blood group
• Top hit rs601338 is a coding variant; rs601338*A is the nonsecretor allele (more common in northern UK)
• Associations:
  • Protects against Norwalk norovirus (Thorven et al. 2005 J Virol)
  • Vitamin B12 level (Hazra et al. 2008 Nature)
  • Crohn’s disease (McGovern et al. 2010 Hum Mol Genet)
  • Celiac & IBD (Parmar et al. 2012 Tissue Antigens)
Combine our selection statistic with ancient Eurasian selection statistic for more power
• Compares four modern and three ancient populations
• Models each modern population as a mixture of ancient ones
• Detects allele frequency deviations from expected
• Produces a $\chi^2_4$ statistic; adding the PC statistic ($\chi^2_1$) produces a $\chi^2_5$ statistic
Mathieson et al. 2015 Nature
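The combination step is a sum of χ² statistics. Treating the sum as χ²₅ assumes the PC statistic and the ancient-population statistic are independent under the null. The sketch computes the χ² tail probability with the standard library only, using the recurrence for the regularized upper incomplete gamma function; the function names are illustrative.

```python
import math


def chi2_sf(x, df):
    """P(chi^2_df > x) for integer df >= 1, using
    Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1) with y = x / 2."""
    y = x / 2
    if df % 2:                        # odd df: start from chi^2_1
        q, s = math.erfc(math.sqrt(y)), 0.5
    else:                             # even df: start from chi^2_2
        q, s = math.exp(-y), 1.0
    while 2 * s < df:
        q += y ** s * math.exp(-y) / math.gamma(s + 1)
        s += 1
    return q


def combined_stat(pc_stat, ancient_stat):
    """Add a 1-df PC statistic to a 4-df ancient-population statistic and
    return (combined statistic, 5-df p-value), assuming independence."""
    total = pc_stat + ancient_stat
    return total, chi2_sf(total, 5)
```

The same pattern extends to any set of independent χ² statistics: degrees of freedom add along with the statistics.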
Find two new signals at F12 and CYP1A2/CSK
Figure: ancient selection scan combined with PC2 selection statistic for each SNP in the genome
Coagulation factor XII (F12) was detected under this combined scan
• rs2545801 suggestive in ancient population scan (p = 5.3 × 10⁻⁸)
• Associated with activated partial thromboplastin time (Tang et al. 2012 AJHG)
• Associations of second significant SNP rs2731672:
  • Expression of F12 in liver (Innocenti et al. 2011 PLoS Genet)
  • Plasma levels of factor XII (Guerrero et al. 2011 Haematologica)
Mathieson et al. 2015 Nature
CSK (rs1378942) is a known target of selection; the UK Biobank signal reflects more recent selection
Ding and Kullo 2011 BMC Medical Genetics
CYP1A2/CSK is associated with blood pressure
• CSK rs1378942 associated with:
  • Blood pressure (Newton-Cheh et al. 2009 Nat Genet)
  • Systemic sclerosis (Martin et al. 2012 Hum Mol Genet)
• CYP1A2 rs2472304 associated with:
  • Esophageal cancer (Xie et al. 2005 BMC Bioinformatics)
  • Caffeine consumption (Cornelis et al. 2011 PLoS Genet)
• UK Biobank associations:
  • Diastolic blood pressure (DBP) (p = 3.6 × 10⁻¹⁹)
  • Hypertension (p = 4.8 × 10⁻⁹)
ATXN2/SH2B3 associated with blood pressure
• Additional signal of selection found within ATXN2/SH2B3 LD block
• ATXN2 associated with blood pressure:
  • DBP: p = 8.00 × 10⁻³³
  • Hypertension: p = 1.30 × 10⁻⁹
Figure: SH2B3, ATXN2, PTPN11
Mathieson et al. 2015 Nature The International Consortium for Blood Pressure Genome-Wide Association Studies 2011 Nature
PC3 & PC4 are associated with diastolic blood pressure
Phenotype: association p-values across PC1 to PC4
• BMI: 2.36 × 10⁻²⁷, 2.59 × 10⁻¹¹
• Height: < 1 × 10⁻⁵⁰, < 1 × 10⁻⁵⁰
• DBP: 2.40 × 10⁻⁸, 6.16 × 10⁻¹³
• SBP: 6.15 × 10⁻⁷
• Asthma: 2.50 × 10⁻⁶, 2.89 × 10⁻⁵
Galinsky et al. 2016b AJHG
Conclusions
• UK Biobank dataset enabled study of subtle population structure
• PC-based selection statistic detected selection at FUT2
• Demonstrated a new approach combining multiple selection statistics that follow a $\chi^2$ distribution
Acknowledgements
This research has been conducted using the UK Biobank Resource
Alkes Price Po-Ru Loh
Swapan Mallick David Reich Iain Mathieson
Nick Patterson
For more details, see Galinsky et al. 2016b AJHG!