EPI 511, Advanced Population and Medical Genetics
![Page 1: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/1.jpg)
Alkes Price
Harvard School of Public Health
January 24 & January 26, 2017
EPI 511, Advanced Population and Medical Genetics
Week 1:
• Intro + HapMap / 1000 Genomes
• Linkage Disequilibrium
![Page 2: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/2.jpg)
EPI 511: Course structure
Week 1: HapMap, 1000G / Linkage disequilibrium
Week 2: Population structure and admixture
Week 3: Population stratification
Week 4: Fine-mapping / Natural selection
Week 5: Heritability / Genetic risk prediction
Week 6: Mixed models / Rare variant analysis
Week 7: Functional interpretation
![Page 3: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/3.jpg)
EPI 511: How to address the instructor
Alkes
Dr. Price
Professor Price
Honorable Professor Price
Honorable Distinguished Dr. Professor Price
![Page 4: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/4.jpg)
EPI 511: Office Hours
Instructor: Alkes
Office Hours: Thu 3:30-4:30pm, Building 2, Room 211
Email Address: [email protected]
(Please put EPI511 in the subject of your email)
Teaching Assistant: Armin
Office Hours: Fri + Mon 2-3pm, Building 2, Room 209
Email Address: [email protected]
![Page 9: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/9.jpg)
EPI 511: Course components
• Advance reading 1 required paper + 1 optional paper per course session
• Lecture + Discussion discussants: each student to sign up as discussant for 1 class
Video of each class will be posted on
the course www site <1hr after class.
![Page 14: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/14.jpg)
EPI 511: Course components
• Advance reading 1 required paper + 1 optional paper per course session
• Lecture + Discussion discussants: each student to sign up as discussant for 1 class
• Experiences 5 take-home projects due Tue Jan 31, …, Tue Feb 28
• short Research Paper due Fri Mar 10
• self-assessment Opportunity
20min exam (date will not be announced in advance)
![Page 16: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/16.jpg)
EPI 511: Outcome measures
• Advance reading (0% of course grade) 1 required paper + 1 optional paper per course session
• Lecture + Discussion (0% of course grade) discussants: each student to sign up as discussant for 1 class
• Experiences (60% of course grade) 5 take-home projects (data and programming intensive)
![Page 17: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/17.jpg)
Approaches to Scientific Understanding
Love is Understanding.
-- Madonna
Data is Understanding.
-- Alkes
![Page 19: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/19.jpg)
Approaches to Scientific Understanding
Understanding Data requires Fixing Bugs.
![Page 20: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/20.jpg)
Genetics + data + programming = bright future
Gewin 2007 Nature; Hayden 2012 Nature
![Page 22: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/22.jpg)
EPI 511: Outcome measures
• Advance reading (0% of course grade) 1 required paper + 1 optional paper per course session
• Lecture + Discussion (0% of course grade) discussants: each student to sign up as discussant for 1 class
• Experiences (60% of course grade) 5 take-home projects (data and programming intensive)
• short Research Paper (40% of course grade) 1,000-1,500 words (suggested topics provided on Feb 16)
• self-assessment Opportunity (0% of course grade)
20min exam (date will not be announced in advance)
![Page 23: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/23.jpg)
EPI 511: Policy on group work
Experiences (60% of course grade) 5 take-home projects (data and programming intensive)
• OK to discuss experiences with your colleagues
• Each piece of code you write should be your own
short Research Paper (40% of course grade) 1,000-1,500 words (suggested topics provided on Feb 16)
• OK to discuss the project with your colleagues
• Each piece of code you write should be your own
• Each piece of text you write should be your own
![Page 24: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/24.jpg)
EPI 511, Advanced Population and Medical Genetics
Week 1:
• Introduction + HapMap Project
• Linkage Disequilibrium
![Page 25: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/25.jpg)
Outline
1. Introduction to Population Genetics
2. HapMap and HapMap2 projects
3. FST
4. HapMap3 and 1000 Genomes projects
![Page 27: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/27.jpg)
What is Population Genetics?
Population genetics is the study of genetic variation
both within and between human populations.
![Page 29: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/29.jpg)
Are different human populations
actually genetically different?
Slightly.
5-7% of worldwide human genetic variation is due to
genetic differences between human populations.
The remaining 93-95% of human genetic variation is due to
genetic variation within human populations
(Rosenberg et al. 2002 Science).
![Page 31: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/31.jpg)
Why study differences between
human populations?
• Learn about human migration patterns and ancient history.
• Improve our power to identify and localize disease genes.
![Page 32: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/32.jpg)
Rosenberg et al. 2010 Nat Rev Genet
![Page 33: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/33.jpg)
Bustamante et al. 2011 Nature; also see Popejoy & Fullerton 2016 Nature
![Page 34: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/34.jpg)
Why study differences between
human populations?
• Learn about human migration patterns and ancient history.
• Improve our power to identify and localize disease genes.
Williams et al. 2014 Nature
![Page 35: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/35.jpg)
Why study differences between
human populations?
• Learn about human migration patterns and ancient history.
• Improve our power to identify and localize disease genes.
- Use differences in linkage disequilibrium for fine-mapping.
- Avoid false positives due to population stratification.
- Signals of natural selection at genes related to disease.
![Page 37: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/37.jpg)
Does “race” exist?
Worldwide patterns of human genetic variation are best
described using continuous clines instead of discrete clusters.
(Serre & Pääbo 2004 Genome Res)
Racial classifications are inadequate descriptors of the
distribution of human genetic variation.
(Tishkoff & Kidd 2004 Nat Genet)
For a fun time: go to a population genetics party and ask,
![Page 40: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/40.jpg)
Isn’t it politically incorrect to study
differences between human populations?
No. It is not politically incorrect.
“Studies of human population genetics have generated the
strongest proof that there is no scientific basis for racism.”
(Cavalli-Sforza 2005 Nat Rev Genet)
also see Cavalli-Sforza et al. 1994 The History and Geography of Human Genes
![Page 41: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/41.jpg)
Outline
1. Introduction to Population Genetics
2. HapMap and HapMap2 projects
3. FST
4. HapMap3 and 1000 Genomes projects
![Page 43: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/43.jpg)
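The International HapMap Project: 270 samples from 4 populations

| Label | Ancestry | Location | Samples |
|---|---|---|---|
| CEU | northern European | USA | 90 |
| CHB | Chinese | China | 45 |
| JPT | Japanese | Japan | 44 |
| YRI | Yoruba | Nigeria | 90 |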
![Page 44: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/44.jpg)
The International HapMap Project (International HapMap Consortium 2005 Nature)
CEU (European) CHB (Chinese)
JPT (Japanese) YRI (Nigerian)
Phase I HapMap:
>1,000,000 SNPs
![Page 45: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/45.jpg)
The International HapMap Project (International HapMap Consortium 2007 Nature)
CEU (European) CHB (Chinese)
JPT (Japanese) YRI (Nigerian)
Phase II HapMap:
>3,000,000 SNPs
![Page 46: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/46.jpg)
What is a SNP?
A Single Nucleotide Polymorphism (SNP) is a letter of the
genome that differs in different individuals (e.g. G/T).
![Page 47: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/47.jpg)
What is a SNP?
Rosenberg & Nordborg 2002 Nat Rev Genet
A Single Nucleotide Polymorphism (SNP) is a letter of the
genome that differs in different individuals (e.g. G/T).
Each SNP corresponds to one single mutation event in history,
e.g. G mutated to T in one single ancestor.
G = ancestral allele, T = derived allele.
Coalescent tree
![Page 48: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/48.jpg)
What is a SNP: physical position
Each SNP has a physical position on a chromosome.
| SNP | chrom. | physical position (bp) |
|---|---|---|
| rs10910034 | 1 | 2165898 |
| rs1713712 | 1 | 2166021 |
| … | … | … |
![Page 49: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/49.jpg)
What is a SNP: physical vs. genetic position
Each SNP has a physical and genetic position on a chromosome.
| SNP | chrom. | physical position (bp) | genetic position (Morgans) |
|---|---|---|---|
| rs10910034 | 1 | 2165898 | 0.01904785 |
| rs1713712 | 1 | 2166021 | 0.01904814 |
| … | … | … | … |

On average, 1 recombination event per Morgan per generation.
Genome-wide recombination rate is about 1 cM / Mb.
[cM = centiMorgan = 1/100 Morgan, Mb = Megabase = $10^6$ bp]
Thus, 1 Morgan is roughly 100 Mb = $10^8$ bp on average.
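As a small worked example of the physical vs. genetic positions above, the sketch below computes the local recombination rate between the two SNPs listed on the slide and can be compared with the genome-wide average of about 1 cM/Mb. The function name and the dictionary layout are illustrative choices, not part of the course material.

```python
# Positions copied from the slide: (physical position in bp, genetic position in Morgans).
snps = {
    "rs10910034": (2165898, 0.01904785),
    "rs1713712": (2166021, 0.01904814),
}

def local_recombination_rate(snp_a, snp_b):
    """Return the recombination rate between two SNPs in cM per Mb."""
    bp_a, morgans_a = snp_a
    bp_b, morgans_b = snp_b
    distance_mb = abs(bp_b - bp_a) / 1e6              # bp -> Mb
    distance_cm = abs(morgans_b - morgans_a) * 100.0  # Morgans -> cM
    return distance_cm / distance_mb

rate = local_recombination_rate(snps["rs10910034"], snps["rs1713712"])
print(f"{rate:.2f} cM/Mb")
```

For this pair the rate comes out at roughly 0.24 cM/Mb, below the genome-wide average, which is a reminder that the 1 cM/Mb figure is only an average and local rates vary.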
![Page 51: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/51.jpg)
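HapMap: 270 samples from 4 populations

| Label | Ancestry | Location | Samples |
|---|---|---|---|
| CEU | northern European | USA | 90 |
| CHB | Chinese | China | 45 |
| JPT | Japanese | Japan | 44 |
| YRI | Yoruba | Nigeria | 90 |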
Affymetrix and
Illumina chips
![Page 52: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/52.jpg)
HapMap project: Summary of main results
• 3.1 million SNPs successfully genotyped using Perlegen
genotyping technology (Hinds et al. 2005 Science).
• These 3.1 million SNPs: about 30% of all common SNPs
(defined as SNPs with minor allele frequency >5%).
“Properties of SNPs are influenced by discovery sampling …
HapMap relied on nearly any piece of information available.”
Clark et al. 2005 Genome Res; also see Keinan et al. 2007 Nat Genet
![Page 53: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/53.jpg)
Summary of main results, continued
• Understanding genetic differences between populations.
• Patterns of linkage disequilibrium both within and across
populations.
• Most common SNPs in the human genome are in strong
linkage disequilibrium with at least one HapMap SNP
[avg $r^2 \geq 0.90$ in 10 sequenced ENCODE regions].
![Page 54: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/54.jpg)
Genetic differences between HapMap populations (International HapMap Consortium 2005 and 2007 Nature)
77% frequency
68% frequency
50% frequency C allele of rs10910034
![Page 55: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/55.jpg)
Genetic differences between HapMap populations (International HapMap Consortium 2005 and 2007 Nature)
FST = 0.19
FST = 0.11
FST = 0.16
Note: FST accounts for
sampling error due to
finite sample size.
![Page 56: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/56.jpg)
Populations can be distinguished using
a large number of genetic markers
Principal Components Analysis
using 100 markers
![Page 57: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/57.jpg)
Populations can be distinguished using
a large number of genetic markers
using 3 million markers
Principal Components Analysis
![Page 58: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/58.jpg)
Outline
1. Introduction to Population Genetics
2. HapMap and HapMap2 projects
3. FST
4. HapMap3 and 1000 Genomes projects
![Page 59: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/59.jpg)
Genetic differences between HapMap populations (International HapMap Consortium 2005 and 2007 Nature)
FST = 0.19
FST = 0.11
FST = 0.16
![Page 60: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/60.jpg)
Defining vs. Estimating FST
• FST is an underlying parameter that depends on the two
populations, but does not depend on a particular finite sample.
• $\hat{F}_{ST}$ is an estimate of the underlying FST that depends on a
particular finite sample that is analyzed.
Weir & Hill 2002 Annu Rev Genet, Bhatia et al. 2013 Genome Res
![Page 62: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/62.jpg)
Defining FST
Definition:
• The FST between two populations is the value such that the
allele frequency difference between the two populations has
mean 0 and variance 2FSTp(1 – p), where p is the allele
frequency in the ancestral population.
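$p_1 \sim N(p,\ F_{ST}\,p(1 - p))$
[Figure: the ancestral population (allele frequency p) splits into populations 1 and 2 (allele frequencies p1 and p2); each branch adds variance FSTp(1 – p).]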
Weir & Hill 2002 Annu Rev Genet, Bhatia et al. 2013 Genome Res
![Page 63: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/63.jpg)
Defining FST
Definition:
• The FST between two populations is the value such that the
allele frequency difference between the two populations has
mean 0 and variance 2FSTp(1 – p), where p is the allele
frequency in the ancestral population.
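$p_1 \sim \mathrm{Beta}\big(p(1 - F_{ST})/F_{ST},\ (1 - p)(1 - F_{ST})/F_{ST}\big)$
[Figure: the ancestral population (allele frequency p) splits into populations 1 and 2 (allele frequencies p1 and p2); each branch adds variance FSTp(1 – p).]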
Weir & Hill 2002 Annu Rev Genet, Bhatia et al. 2013 Genome Res
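The Beta model for p1 can be checked numerically. The sketch below draws frequencies from the Beta distribution with the parameters given on the slide and compares the empirical mean and variance with p and FSTp(1 – p). The function names, the seed and the example values p = 0.3 and FST = 0.1 are illustrative assumptions, not taken from the lecture.

```python
import random

def simulate_drifted_frequencies(p, fst, n_draws, seed=0):
    """Draw p1 ~ Beta(p(1 - FST)/FST, (1 - p)(1 - FST)/FST)."""
    rng = random.Random(seed)
    a = p * (1.0 - fst) / fst
    b = (1.0 - p) * (1.0 - fst) / fst
    return [rng.betavariate(a, b) for _ in range(n_draws)]

def mean_and_variance(values):
    """Return the sample mean and the (population) variance of a list."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance

p, fst = 0.3, 0.1  # hypothetical ancestral frequency and FST
draws = simulate_drifted_frequencies(p, fst, n_draws=100_000)
mean, variance = mean_and_variance(draws)
print(f"mean = {mean:.3f} (expected {p})")
print(f"variance = {variance:.4f} (expected {fst * p * (1 - p):.4f})")
```

Unlike the Normal approximation, the Beta draws always stay between 0 and 1, which matters for allele frequencies near the boundaries.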
![Page 64: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/64.jpg)
Defining FST
Definition:
• The FST between two populations is the value such that the
allele frequency difference between the two populations has
mean 0 and variance 2FSTp(1 – p), where p is the allele
frequency in the ancestral population.
OR
• The FST between two populations is equal to the proportion
of genotypic variance in a set of N individuals from each
population that is attributable to population differences.
Weir & Hill 2002 Annu Rev Genet, Bhatia et al. 2013 Genome Res
![Page 65: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/65.jpg)
Defining FST
Theorem 1:
• The FST between two populations is the value such that the
allele frequency difference between the two populations has
mean 0 and variance 2FSTp(1 – p), where p is the allele
frequency in the ancestral population.
=>
• The FST between two populations is equal to the proportion
of genotypic variance in a set of N individuals from each
population that is attributable to population differences.
![Page 68: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/68.jpg)
Defining FST
Proof: Let $p_{avg} = (p_1 + p_2)/2$.
Total genotypic variance is $2p_{avg}(1 - p_{avg}) \approx 2p(1 - p)$
[Note that individuals are diploid: genotype = 0 or 1 or 2.
Binomial sampling with n = 2.]
Genotypic variance attributable to population differences:
Suppose we have N data points with value $2p_1$ and N with value $2p_2$.
After subtracting the average value $(p_1 + p_2)$, we have
N data points with value $(p_1 - p_2)$ and N with value $(p_2 - p_1)$,
so the variance of these data points is $(p_1 - p_2)^2$.
Since $p_1$ and $p_2$ each have variance $F_{ST}\,p(1 - p)$ and drift independently, it follows that
$(p_1 - p_2)$ has mean 0 and variance $2F_{ST}\,p(1 - p)$, so $E[(p_1 - p_2)^2] = 2F_{ST}\,p(1 - p)$.
$2F_{ST}\,p(1 - p) / [2p(1 - p)] = F_{ST}$. Q.E.D.
![Page 69: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/69.jpg)
Defining FST
Theorem 1′:
• The FST between two populations is the value such that the
allele frequency difference between the two populations has
mean 0 and variance 2FSTp(1 – p), where p is the allele
frequency in the ancestral population.
=>
• The proportion of genotypic variance in a set of
αN individuals from population 1 and (1 – α)N individuals
from population 2 that is attributable to population differences
is equal to 4α(1 – α) · FST.
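Theorem 1′ can be checked with a short simulation. The sketch below draws p1 and p2 from the Beta model for many SNPs, computes the weighted variance of the expected genotypes 2p1 and 2p2 under sample fractions α and 1 – α, and divides by the total genotypic variance 2p(1 – p). The function name, seed, and the values p = 0.5 and FST = 0.1 are assumptions made for illustration.

```python
import random

def between_population_fraction(p, fst, alpha, n_snps=50_000, seed=0):
    """Simulated share of genotypic variance due to population differences."""
    rng = random.Random(seed)
    a = p * (1.0 - fst) / fst
    b = (1.0 - p) * (1.0 - fst) / fst
    between = 0.0
    for _ in range(n_snps):
        p1 = rng.betavariate(a, b)
        p2 = rng.betavariate(a, b)
        # Expected genotypes are 2*p1 and 2*p2, weighted by the sample fractions.
        mean = alpha * 2 * p1 + (1 - alpha) * 2 * p2
        between += alpha * (2 * p1 - mean) ** 2 + (1 - alpha) * (2 * p2 - mean) ** 2
    between /= n_snps
    total = 2 * p * (1 - p)  # total genotypic variance, using ancestral p as in the proof
    return between / total

for alpha in (0.5, 0.25, 0.1):
    simulated = between_population_fraction(p=0.5, fst=0.1, alpha=alpha)
    predicted = 4 * alpha * (1 - alpha) * 0.1
    print(f"alpha = {alpha}: simulated {simulated:.4f}, predicted {predicted:.4f}")
```

With α = 0.5 the factor 4α(1 – α) equals 1 and the result reduces to Theorem 1; unbalanced samples attribute a smaller share of the variance to population differences.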
![Page 71: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/71.jpg)
Genetic differences between HapMap populations (International HapMap Consortium 2005 and 2007 Nature)
FST = 0.19: $[2F_{ST}\,p(1 - p)]^{1/2} = 0.31$ for $p = 0.5$
FST = 0.11: $[2F_{ST}\,p(1 - p)]^{1/2} = 0.23$ for $p = 0.5$
FST = 0.16: $[2F_{ST}\,p(1 - p)]^{1/2} = 0.28$ for $p = 0.5$
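The quantity [2FSTp(1 – p)]^(1/2) is the standard deviation of the allele frequency difference between two populations. A minimal sketch of the arithmetic follows; the function name is an illustrative choice.

```python
import math

def typical_frequency_difference(fst, p=0.5):
    """Standard deviation of (p1 - p2), i.e. sqrt(2 * FST * p * (1 - p))."""
    return math.sqrt(2.0 * fst * p * (1.0 - p))

for fst in (0.19, 0.11, 0.16):
    print(f"FST = {fst}: {typical_frequency_difference(fst):.2f}")
```

At p = 0.5 this reproduces the values 0.31, 0.23 and 0.28 shown for FST = 0.19, 0.11 and 0.16.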
![Page 73: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/73.jpg)
Genetic distances (FST) between
European American subpopulations
Ashkenazi vs. Northwest Eur.: FST = 0.009, $[2F_{ST}\,p(1 - p)]^{1/2} = 0.067$ for $p = 0.5$
Ashkenazi vs. Southeast Eur.: FST = 0.004, $[2F_{ST}\,p(1 - p)]^{1/2} = 0.045$ for $p = 0.5$
Northwest Eur. vs. Southeast Eur.: FST = 0.005, $[2F_{ST}\,p(1 - p)]^{1/2} = 0.050$ for $p = 0.5$
Price, Butler et al. 2008 PLoS Genet
![Page 74: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/74.jpg)
Genetic distances (FST) between
East Asian subpopulations
Chinese vs. Japanese: FST = 0.007, $[2F_{ST}\,p(1 - p)]^{1/2} = 0.059$ for $p = 0.5$
International HapMap Consortium 2007 Nature
![Page 75: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/75.jpg)
Genetic distances (FST) between
African subpopulations
Yoruba (Nigeria) vs. Luhya (Kenya): FST = 0.008, $[2F_{ST}\,p(1 - p)]^{1/2} = 0.063$ for $p = 0.5$
International HapMap3 Consortium 2010 Nature
![Page 77: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/77.jpg)
How do we estimate FST?
$p_1$ and $p_2$ are allele frequencies in 2 populations
$\mathrm{Var}(p_1 - p_2) = 2F_{ST}\,p(1 - p)$.
Thus, estimate $F_{ST} = \mathrm{Var}\big((p_1 - p_2) / [2p(1 - p)]^{1/2}\big) = E\big((p_1 - p_2)^2 / [2p(1 - p)]\big)$.
A PROBLEM: we don’t get to observe p (ancestral frequency)
SOLUTION: approximate $p \approx p_{avg} = (p_1 + p_2)/2$.
![Page 78: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/78.jpg)
How do we estimate FST?
$p_1$ and $p_2$ are allele frequencies in 2 populations
$\mathrm{Var}(p_1 - p_2) = 2F_{ST}\,p(1 - p)$.
Thus, estimate $F_{ST} = \mathrm{Var}\big((p_1 - p_2) / [2p(1 - p)]^{1/2}\big) = E\big((p_1 - p_2)^2 / [2p(1 - p)]\big)$.
A BIGGER PROBLEM: we don’t get to observe $p_1$ and $p_2$.
We only get to observe sample allele frequencies $\hat{p}_1$ and $\hat{p}_2$
in sample sizes $N_1$ (from pop. 1) and $N_2$ (from pop. 2).
![Page 79: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/79.jpg)
How do we estimate FST?
SOLUTION:
Since $\mathrm{Var}(\hat{p}_1 - \hat{p}_2) \approx [2F_{ST} + 1/(2N_1) + 1/(2N_2)]\,p(1-p)$, estimate
$F_{ST} = E\big([(\hat{p}_1 - \hat{p}_2)^2 - (1/(2N_1) + 1/(2N_2))\,p(1-p)] / [2p(1-p)]\big)$
(where we approximate $p \approx (\hat{p}_1 + \hat{p}_2)/2$)
some details omitted; see Bhatia et al. 2013 Genome Res
![Page 80: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/80.jpg)
How do we estimate FST?
SOLUTION:
Since $\mathrm{Var}(\hat{p}_1 - \hat{p}_2) \approx [2F_{ST} + 1/(2N_1) + 1/(2N_2)]\,p(1-p)$, estimate
$F_{ST} = E\big([(\hat{p}_1 - \hat{p}_2)^2 - (1/(2N_1) + 1/(2N_2))\,p(1-p)] / [2p(1-p)]\big)$.
OR $F_{ST} = \dfrac{\sum_i [(\hat{p}_{i1} - \hat{p}_{i2})^2 - (1/(2N_1) + 1/(2N_2))\,p_i(1-p_i)]}{\sum_i [2p_i(1-p_i)]}$ (where $i$ indexes SNPs)
some details omitted; see Bhatia et al. 2013 Genome Res
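The ratio-of-sums estimator above can be sketched in a few lines. This is a simplified illustration of the slide's formula, not the full estimator of Bhatia et al. 2013; the function name `fst_ratio_of_sums` and the choice to skip SNPs with $p_i(1-p_i) = 0$ are assumptions made here.

```python
import numpy as np

def fst_ratio_of_sums(p1_hat, p2_hat, n1, n2):
    """FST from sample allele frequencies at many SNPs (ratio of sums).

    p1_hat, p2_hat: per-SNP sample allele frequencies in populations 1 and 2.
    n1, n2: numbers of diploid individuals sampled from each population.
    """
    p1_hat = np.asarray(p1_hat, dtype=float)
    p2_hat = np.asarray(p2_hat, dtype=float)
    p = (p1_hat + p2_hat) / 2.0          # approximate ancestral frequency
    het = p * (1.0 - p)
    keep = het > 0                       # assumption: drop monomorphic SNPs
    sampling = (1.0 / (2 * n1) + 1.0 / (2 * n2)) * het  # sampling-noise correction
    numerator = (p1_hat - p2_hat) ** 2 - sampling
    denominator = 2.0 * het
    return numerator[keep].sum() / denominator[keep].sum()

# One SNP with sample frequencies 0.6 and 0.4, 50 individuals per population
print(round(fst_ratio_of_sums([0.6], [0.4], 50, 50), 3))  # 0.07
```

Without the sampling correction the same input would give 0.04 / 0.5 = 0.08, which shows how finite sample size inflates a naive estimate.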
![Page 81: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/81.jpg)
Drift vs. Divergence
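[Figure, two panels. Drift (FST): tree relating YRI, CHB and CEU, with branch values 0.02, 0.04, 0.07 and 0.10. Divergence (per 1000bp of DNA) for the pairs YRI/YRI, CEU/CEU, CHB/CHB: 0.84, 0.60, 0.57 (sample labels NA18488, NA06989, NA18597).]
Keinan et al. 2007 Nat Genet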
![Page 82: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/82.jpg)
Drift vs. Divergence
[Figure, two panels. Drift (FST): tree relating YRI, CHB and CEU, with branch values 0.02, 0.04, 0.05 and 0.10. Divergence (generations) for the pairs YRI/YRI, CEU/CEU, CHB/CHB: ~30K gen., ~20K gen., ~20K gen. (sample labels NA18488, NA06989, NA18597).]
Based on mut. rate 1.2–1.8 × 10^-8 (Kong et al. 2012 Nature, Sun et al. 2012 Nat Genet)
Keinan et al. 2007 Nat Genet
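The conversion from divergence per 1000bp (0.84, 0.60, 0.57) to generations can be sketched as follows. This assumes the standard relation d = 2μT (two lineages each accumulate mutations for T generations) and the midpoint mutation rate 1.5 × 10^-8 per bp per generation; both are assumptions made here for illustration, since the slide does not spell out the calculation.

```python
def generations_from_divergence(div_per_kb, mu_per_bp_per_gen):
    """Generations back to the common ancestor, assuming d = 2 * mu * T."""
    div_per_bp = div_per_kb / 1000.0
    return div_per_bp / (2.0 * mu_per_bp_per_gen)

MU = 1.5e-8  # assumed midpoint of the 1.2-1.8 x 10^-8 range quoted on the slide

for label, d in [("YRI", 0.84), ("CEU", 0.60), ("CHB", 0.57)]:
    print(label, round(generations_from_divergence(d, MU)))
```

Under these assumptions the three values come out near 28,000, 20,000 and 19,000 generations, in line with the rounded ~30K, ~20K and ~20K shown on the slide.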
![Page 83: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/83.jpg)
Outline
1. Introduction to Population Genetics
2. HapMap and HapMap2 projects
3. FST
4. HapMap3 and 1000 Genomes projects
![Page 84: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/84.jpg)
HapMap: 270 samples from 4 populations
Code  Population         Country  Samples
CEU   northern European  USA      90
CHB   Chinese            China    45
JPT   Japanese           Japan    44
YRI   Yoruba             Nigeria  90
Affymetrix and Illumina chips
Perkel 2008 Nat Methods
![Page 85: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/85.jpg)
The HapMap Project:
Work is done, relax on beach?
![Page 86: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/86.jpg)
Beyond HapMap: what the world still needs
• Larger sample sizes for analyses of linkage disequilibrium
• More complete representation of world population diversity
e.g. South Asian and Native American genetic variation
• Analyses of copy number variation (CNV)
• Low-frequency variants (minor allele frequency <5%)
![Page 87: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/87.jpg)
The International HapMap3 Project:
1,260 samples from 11 diverse populations
International HapMap3 Consortium 2010 Nature
![Page 88: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/88.jpg)
HapMap3: 1,260 samples from 11 populations
Code  Population         Country  Samples
CEU   northern European  USA      180
CHB   Chinese            China    90
JPT   Japanese           Japan    90
YRI   Yoruba             Nigeria  180
TSI   Tuscan             Italy    90
CHD   Chinese            USA      100
LWK   Luhya              Kenya    90
MKK   Maasai             Kenya    180
ASW   African-American   USA      90
MXL   Mexican-American   USA      90
GIH   Gujarati-American  USA      90
![Page 89: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/89.jpg)
The HapMap3 project
• Larger sample sizes for analyses of linkage disequilibrium
• More complete representation of world population diversity
e.g. South Asian and Native American genetic variation
• Analyses of copy number variation (CNV)
• Low-frequency variants (minor allele frequency <5%)
International HapMap3 Consortium 2010 Nature
![Page 90: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/90.jpg)
Data generation: SNPs and CNVs
Affymetrix 6.0 array
900K SNPs
940K copy-number probes
Illumina Infinium 1M array
1M SNPs, of which
80K targeted at CNV regions
1.5M SNPs passed QC in all populations
(99.3% concordance for 250K SNPs on both arrays)
Note: only 1.5M SNPs, versus 3.1 million SNPs in HapMap2
International HapMap3 Consortium 2010 Nature
![Page 91: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/91.jpg)
Not all HapMap3 populations are similar to a population from HapMap
HapMap3 population       Closest pop. from HapMap  FST
TSI (Tuscan)             CEU                       0.004
CHD (Chinese)            CHB                       0.001
LWK (Luhya)              YRI                       0.008
MKK (Maasai)             YRI                       0.03
ASW (African-American)   YRI                       0.01
MXL (Mexican-American)   CEU                       0.04
GIH (Gujarati-American)  CEU                       0.04
![Page 92: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/92.jpg)
Approaches to Scientific Understanding
Love is Understanding.
-- Madonna
Data is Understanding.
-- Alkes
![Page 93: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/93.jpg)
HapMap3 data: individual files
CEU.ind:
NA06989 F CEU
NA11891 M CEU
NA11843 M CEU
NA12341 F CEU
NA12739 M CEU
…
[sample ID] [sex] [popname]
![Page 94: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/94.jpg)
HapMap3 data: SNP files
CEU.snp:
rs10458597 1 0.0 554484 C T
rs2185539 1 0.0 556738 C T
rs11240767 1 0.0 718814 C T
rs12564807 1 0.0 724325 A G
rs3131972 1 0.0 742584 G A
…
[SNP ID] [chr] [0.0] [position] [ref] [var]
![Page 95: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/95.jpg)
HapMap3 data: genotype files
CEU.geno:
2222222222… [Each line is 1 SNP, each column is 1 indiv.]
2222222222…
2222222222…
2222222222…
1121212112…
…
[Number of copies of reference allele: 0 or 1 or 2.
9 denotes missing data.]
Note: the HapMap3 data files for this course are restricted to
~700K SNPs that are common (MAF>5%) in every population.
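As a rough illustration of the genotype file layout described above (one line per SNP, one character per individual, 0/1/2 copies of the reference allele, 9 for missing), the sketch below parses a few lines and computes reference allele frequencies. The helper names and the third sample line (which contains missing data) are made up for illustration; the first two lines are the truncated examples from the slide.

```python
import numpy as np

def parse_geno(lines):
    """Return a (SNPs x individuals) array of allele counts; 9 becomes NaN."""
    rows = [[int(ch) for ch in line.strip()] for line in lines if line.strip()]
    geno = np.array(rows, dtype=float)
    geno[geno == 9] = np.nan
    return geno

def ref_allele_freq(geno):
    """Per-SNP reference allele frequency, ignoring missing genotypes."""
    return np.nanmean(geno, axis=1) / 2.0

sample_lines = [
    "2222222222",  # from the slide (first 10 individuals only)
    "1121212112",  # from the slide (first 10 individuals only)
    "1091290122",  # hypothetical line with two missing genotypes
]
geno = parse_geno(sample_lines)
freqs = ref_allele_freq(geno)
print(freqs)
```

In practice the lines would be read from a file such as CEU.geno, with the matching CEU.snp and CEU.ind files supplying the row and column labels.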
![Page 96: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/96.jpg)
Beyond HapMap: what the world still needs
• Larger sample sizes for analyses of linkage disequilibrium
• More complete representation of world population diversity
e.g. South Asian and Native American genetic variation
• Analyses of copy number polymorphisms (CNV)
• Low-frequency variants (minor allele frequency <5%)
![Page 97: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/97.jpg)
Common Disease/Common Variant hypothesis
Lander 1996 Science; Reich & Lander 2001 Trends Genet
reviewed in Gibson 2012 Nat Rev Genet, Visscher et al. 2012 Am J Hum Genet
“For common diseases, there will be one or a few
predominating disease alleles with relatively high frequencies at
each of the major underlying disease loci”
![Page 98: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/98.jpg)
Are rare and low-frequency variants important?
Visscher et al. 2012 Am J Hum Genet
(to be continued, Thu of Week 6)
![Page 99: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/99.jpg)
Are rare and low-frequency variants important?
Gibson 2012 Nat Rev Genet
(to be continued, Thu of Week 6)
![Page 100: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/100.jpg)
Are rare and low-frequency variants important?
Kaiser 2012 Science (to be continued, Thu of Week 6)
![Page 101: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/101.jpg)
HapMap3 1Mb pilot sequencing study
and 1000 Genomes pilot projects
International HapMap3 Consortium 2010 Nature
1000 Genomes Project Consortium 2010 Nature
• HapMap3 pilot sequencing: 10 100kb regions spanning 1Mb (high coverage: Sanger sequencing)
692 individuals from 10 HapMap3 populations
• 1000 Genomes Trio pilot project: Genome-wide (high coverage: 42x)
6 individuals (one CEU trio and one YRI trio)
• 1000 Genomes Low-coverage pilot project: Genome-wide (low coverage: 2x-6x)
179 individuals from CEU, YRI, CHB, JPT populations
• 1000 Genomes Exon pilot project: 8,140 exons spanning 1.4Mb from 906 genes (high coverage: >50x)
697 individuals from 7 HapMap3 populations
![Page 102: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/102.jpg)
Sample size and SNP discovery (per Mb)
International HapMap3 Consortium 2010 Nature
![Page 103: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/103.jpg)
The 1000 Genomes (1000G) Project
Sequence the entire genomes of 1,092 individuals:
379 of European ancestry (Europe and USA)
286 of East Asian ancestry (Asia)
246 of African ancestry (Africa and USA)
181 of Latino ancestry (Latin America and USA)
Use next-generation sequencing technologies (~4x coverage):
e.g. Illumina, 454, SOLiD (read lengths 25-400bp)
(Metzker 2010 Nat Rev Genet, Davey et al. 2011 Nat Rev Genet,
also see Nielsen et al. 2011 Nat Rev Genet)
1000 Genomes Project Consortium 2012 Nature
![Page 104: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/104.jpg)
1000G project: Summary of main results
• 38 million SNPs discovered and successfully genotyped.
Most of these are rare and low-frequency variants.
• The 38 million SNPs include
99.7% of all SNPs with minor allele frequency 5%
98% of all SNPs with minor allele frequency 1% ***
50% of all SNPs with minor allele frequency 0.1%
based on an independent UK European sample.
***: stated goal to identify >95% of SNPs with frequency 1%
was successfully achieved.
1000 Genomes Project Consortium 2012 Nature
![Page 105: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/105.jpg)
Common variants are shared across populations,
but rare variants are often population-private
1000 Genomes Project Consortium 2012 Nature
![Page 106: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/106.jpg)
1000G project: the final phase
Sequence the entire genomes of 2,504 individuals:
503 of European ancestry (Europe and USA)
504 of East Asian ancestry (Asia)
661 of African ancestry (Africa and USA)
347 of Latino ancestry (Latin America and USA)
489 of South Asian ancestry (South Asia and USA)
Use next-generation sequencing technologies (~7x coverage):
Illumina only (read lengths 70-400bp only)
85 million SNPs, of which 64 million have MAF<0.5%
Related resource: UK10K project: 7x WGS of 3,781 UK samples
(UK10K Consortium 2015 Nature; also see Gudbjartsson et al. 2015 Nat Genet)
1000 Genomes Project Consortium 2015 Nature
![Page 107: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/107.jpg)
1000G project: the final phase
1000 Genomes Project Consortium 2015 Nature; also see UK10K Consortium
2015 Nature, Gudbjartsson et al. 2015 Nat Genet, McCarthy et al. 2016 Nat Genet
![Page 108: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/108.jpg)
What about rare variants?
• The 1000G project has identified most low-frequency variants
(minor allele frequency 1%-5%). These variants can be placed
on genotyping arrays or imputed (see Thu of Week 1)
![Page 109: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/109.jpg)
What about rare variants?
• Rare variants: most have not been identified by 1000 Genomes!
Must sequence disease samples directly.
Past focus has been mostly on exome sequencing, but
now shifting to whole-genome sequencing.
(to be continued, Thu of Week 6)
Kiezun et al. 2012 Nat Genet, Tennessen et al. 2012 Science, Pasaniuc et al. 2012 Nat Genet,
Purcell et al. 2014 Nature, Do et al. 2015 Nature, Cai et al. 2015 Nature. Reviewed in
Goldstein et al. 2013 Nat Rev Genet, Lee et al. 2014 Am J Hum Genet, Zuk et al. 2014 PNAS
![Page 110: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/110.jpg)
Conclusions
• Human populations are slightly genetically different.
These differences may be important for disease mapping.
(see Thu slides: Linkage Disequilibrium.)
• FST quantifies differences between human populations.
• HapMap, HapMap2, HapMap3 and 1000 Genomes projects
provide a valuable resource for common & low-frequency
variants (but most rare variants have not yet been identified).
![Page 111: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/111.jpg)
EPI 511, Advanced Population and Medical Genetics
Week 1:
• Intro + HapMap / 1000 Genomes
• Linkage Disequilibrium
![Page 112: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/112.jpg)
EPI 511: Course components
• Advance reading 1 required paper + 1 optional paper per course session
• Lecture + Discussion discussants: each student to sign up as discussant for 1 class
![Page 113: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/113.jpg)
Outline
1. Introduction to Linkage Disequilibrium
2. LD and Tag SNPs
3. LD and imputation
4. LD and fine-mapping
![Page 114: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/114.jpg)
![Page 115: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/115.jpg)
Definition: Linkage Disequilibrium (LD) refers to
correlations between genotypes of nearby markers.
Linkage Disequilibrium
![Page 116: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/116.jpg)
Definition: Linkage Disequilibrium (LD) refers to
correlations between genotypes of nearby markers.
Linkage Disequilibrium Association Studies
Linkage Disequilibrium Linkage Mapping
(reviewed in Ott et al. 2015 Nat Rev Genet)
Linkage Disequilibrium
![Page 117: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/117.jpg)
Linkage Disequilibrium: Example
[Figure: DNA letters (A/C/G/T) on both chromosomes of individuals 1-8 along a stretch of the 3 billion letters, with two variable positions marked SNP 1 and SNP 2. SNP 1 and SNP 2: YES, in LD.]
![Page 118: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/118.jpg)
Linkage Disequilibrium: Example
[Figure: DNA letters (A/C/G/T) on both chromosomes of individuals 1-8 along a stretch of the 3 billion letters, with three variable positions marked SNP 1, SNP 2 and SNP 3. SNP 1 and SNP 2: YES, in LD. SNP 3: NOT in LD.]
![Page 119: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/119.jpg)
Linkage Disequilibrium: Example
[Figure: the same data recoded as 0/1 alleles on both chromosomes of individuals 1-8 along a stretch of the 3 billion letters, with SNP 1, SNP 2 and SNP 3 marked. SNP 1 and SNP 2: r2=1, in LD. SNP 3: r2=0, NOT in LD.]
r2 is squared correlation
![Page 120: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/120.jpg)
Linkage Disequilibrium: Example
[Figure: 0/1 alleles on both chromosomes of individuals 1-8 along a stretch of the 3 billion letters, with SNP 1, SNP 2 and SNP 3 marked. SNP 1 and SNP 2: r2=1, in LD. SNP 3: r2=0.7, partial LD.]
r2 is squared correlation
![Page 121: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/121.jpg)
Linkage Disequilibrium: Example
Individuals
1 2 3 4 5 6 7 8
0 0 0 0 0 0 0 0
1 0 2 0 1 0 0 0   (SNP 1)
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
1 0 2 0 1 0 0 0   (SNP 2)
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 2 0 1 0 0 0   (SNP 3)
... … … … … … … …   (3 billion letters)
SNP 1 and SNP 2: r2=1, in LD
SNP 3: r2=0.7, partial LD
r2 is squared correlation
![Page 122: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/122.jpg)
Genotypes vs. Haplotypes: phasing
Individuals
1 2 3 4 5 6 7 8
0 0 0 0 0 0 0 0
1 0 2 0 1 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
1 0 2 0 1 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 2 0 1 0 0 0
Individuals
1 2 3 4 5 6 7 8
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0
PHASING
Genotypes Haplotypes
Stephens et al. 2001 Am J Hum Genet, Browning et al. 2011 Nat Rev Genet,
Williams et al. 2012 Am J Hum Genet, Delaneau et al. 2013 Nat Methods,
Loh et al. 2016a Nat Genet, Loh et al. 2016b Nat Genet
![Page 123: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/123.jpg)
Genotypes vs. Haplotypes: phasing
Individuals
1 2 3 4 5 6 7 8
0 0 0 0 0 0 0 0
1 0 2 0 1 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
1 0 2 0 1 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 2 0 1 0 0 0
Individuals
1 2 3 4 5 6 7 8
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 1 1 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0
PHASING
Genotypes Haplotypes
Stephens et al. 2001 Am J Hum Genet, Browning et al. 2011 Nat Rev Genet,
Williams et al. 2012 Am J Hum Genet, Delaneau et al. 2013 Nat Methods,
Loh et al. 2016a Nat Genet, Loh et al. 2016b Nat Genet
![Page 124: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/124.jpg)
Genotypes vs. Haplotypes: phasing
Individuals
1 2 3 4 5 6 7 8
0 0 0 0 0 0 0 0
1 0 2 0 1 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
1 0 2 0 1 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 2 0 1 0 0 0
Individuals
1 2 3 4 5 6 7 8
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0
PHASING
Genotypes Haplotypes
Fact: r2 between SNP1 and SNP2 (phased haplotype data) equals
r2 between SNP1 and SNP2 (unphased genotype data),
assuming Hardy-Weinberg equilibrium holds
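The r2 values quoted in these slides can be checked directly from the haplotype matrix (rows 2, 5 and 9, taken here to be SNP 1, SNP 2 and SNP 3). The sketch below computes r2 as a squared Pearson correlation, first on the 16 phased haplotypes and then on the 8 unphased genotypes; the helper names are illustrative assumptions.

```python
import numpy as np

# Haplotype rows copied from the slide: 16 haplotypes = 8 individuals x 2
snp1 = np.array([0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
snp2 = snp1.copy()  # row 5 of the matrix is identical to row 2
snp3 = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0])

def r2(x, y):
    """Squared Pearson correlation between two allele-count vectors."""
    r = np.corrcoef(np.asarray(x, dtype=float), np.asarray(y, dtype=float))[0, 1]
    return r * r

def to_genotypes(haps):
    """Sum each individual's two haplotypes to get 0/1/2 genotypes."""
    return haps.reshape(-1, 2).sum(axis=1)

print(r2(snp1, snp2))                               # phased, SNP 1 vs SNP 2
print(r2(snp1, snp3))                               # phased, SNP 1 vs SNP 3
print(r2(to_genotypes(snp1), to_genotypes(snp3)))   # unphased, SNP 1 vs SNP 3
```

On the phased data SNP 1 vs SNP 3 gives 9/13 ≈ 0.69, matching the r2=0.7 label. The unphased value is 49/62 ≈ 0.79: the equality of the two is a statement about a population in Hardy-Weinberg equilibrium, and it need not hold exactly in a toy sample of 8 individuals.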
![Page 125: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/125.jpg)
Linkage Disequilibrium: Haplotype Blocks
[Figure: 0/1 alleles on both chromosomes of individuals 1-8 along a stretch of the 3 billion letters, with SNP 1, SNP 2 and SNP 3 marked.]
These 3 SNPs form a “haplotype block” with two main haplotypes
![Page 126: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/126.jpg)
LD with phased haplotypes: r2 vs. D′
Slatkin 2008 Nat Rev Genet
Consider two SNPs with frequencies pA and pB of alleles A, B.
Let gA refer to # copies (0, 1) of allele A for the first SNP.
Let gB refer to # copies (0, 1) of allele B for the second SNP.
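$r^2 = \dfrac{[E(g_A g_B) - E(g_A)E(g_B)]^2}{\mathrm{Var}(g_A)\,\mathrm{Var}(g_B)} = \dfrac{(p_{AB} - p_A p_B)^2}{p_A(1-p_A)\,p_B(1-p_B)}$
(where $p_{AB}$ is the frequency of haplotypes carrying both allele A and allele B)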
![Page 127: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/127.jpg)
LD with phased haplotypes: r2 vs. D′
Slatkin 2008 Nat Rev Genet
Consider two SNPs with frequencies pA and pB of alleles A, B.
Suppose pA < pB < 0.5.
$r^2 = \dfrac{(p_{AB} - p_A p_B)^2}{p_A(1-p_A)\,p_B(1-p_B)}$
$D' = \dfrac{p_{AB} - p_A p_B}{p_A - p_A p_B}$
![Page 128: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/128.jpg)
LD with phased haplotypes: r2 vs. D′
Slatkin 2008 Nat Rev Genet
Consider two SNPs with frequencies pA and pB of alleles A, B.
Suppose pA < pB < 0.5. r2 and D′ are maximized when pAB = pA.
$D' = \dfrac{p_{AB} - p_A p_B}{p_A - p_A p_B} \le 1$
$r^2 = \dfrac{(p_{AB} - p_A p_B)^2}{p_A(1-p_A)\,p_B(1-p_B)} \le \dfrac{p_A - p_A p_B}{p_B - p_A p_B}$
![Page 129: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/129.jpg)
LD with phased haplotypes: r2 vs. D′
Slatkin 2008 Nat Rev Genet
Consider two SNPs with frequencies pA and pB of alleles A, B.
Suppose pA < pB < 0.5. r2 and D′ are maximized when pAB = pA.
e.g. pA = 0.25, pB = 0.4, pAB = 0.25 => r2 = 0.5, D′ = 1
$D' = \dfrac{p_{AB} - p_A p_B}{p_A - p_A p_B} \le 1$
$r^2 = \dfrac{(p_{AB} - p_A p_B)^2}{p_A(1-p_A)\,p_B(1-p_B)} \le \dfrac{p_A - p_A p_B}{p_B - p_A p_B}$
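The worked example (pA = 0.25, pB = 0.4, pAB = 0.25) can be reproduced with a short sketch. The function name `ld_stats` is an illustrative assumption, and the code only covers the case D ≥ 0 treated on these slides; a different normalization of D′ applies when D is negative.

```python
def ld_stats(p_a, p_b, p_ab):
    """Return (r2, D') from allele frequencies p_a, p_b and haplotype frequency p_ab.

    Assumes D = p_ab - p_a * p_b >= 0, as in the slides.
    """
    d = p_ab - p_a * p_b
    if d < 0:
        raise ValueError("this sketch only handles D >= 0")
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # Largest D possible given the allele frequencies; equals p_a - p_a * p_b when p_a < p_b
    d_max = min(p_a * (1 - p_b), p_b * (1 - p_a))
    return r2, d / d_max

r2_value, d_prime = ld_stats(0.25, 0.4, 0.25)
print(round(r2_value, 3), round(d_prime, 3))  # 0.5 1.0
```

The example shows why the two measures differ: D′ reaches 1 whenever one of the four possible haplotypes is absent, while r2 reaches 1 only if the two SNPs also have equal allele frequencies.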
![Page 130: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/130.jpg)
LD with unphased diploid genotypes
Slatkin 2008 Nat Rev Genet
Consider two SNPs with frequencies pA and pB of alleles A, B.
Let gA refer to # copies (0, 1, 2) of allele A for the first SNP.
Let gB refer to # copies (0, 1, 2) of allele B for the second SNP.
$r^2 = \dfrac{[E(g_A g_B) - E(g_A)E(g_B)]^2}{\mathrm{Var}(g_A)\,\mathrm{Var}(g_B)} = \ldots$
$D' = \dfrac{p_{AB} - p_A p_B}{p_A - p_A p_B} \le 1$
cannot be directly computed,
since $p_{AB}$ relies on phased data!
![Page 131: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/131.jpg)
Approaches to Scientific Understanding
Love is Understanding.
-- Madonna
Data is Understanding.
-- Alkes
![Page 132: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/132.jpg)
Linkage Disequilibrium: Haplotype Blocks
Slatkin 2008 Nat Rev Genet
Haplotype blocks in
216kb region (MHC, chr 6)
x-axis = y-axis =
SNP position in region
D′ and L are measures of LD
(related to r2)
Red indicates high LD
Black indicates low LD
Also see Haploview program, Barrett et al. 2005 Bioinformatics
[Figure axis ticks: 0 kb, 100 kb, 200 kb]
![Page 133: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/133.jpg)
Linkage Disequilibrium: Haplotype Blocks
Europeans
and Asians
Africans
Gabriel et al. 2002 Science
also see Reich 2001 Nature, Daly 2001 Nat Genet
![Page 134: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/134.jpg)
Linkage Disequilibrium: Haplotype Blocks
African chromosomes: 50% of the genome lies in
haplotype blocks >22kb.
Europeans and Asians: 50% of the genome lies in
haplotype blocks >44kb.
Longer haplotype blocks in Europeans/Asians due to
out-of-Africa population bottleneck: descended from
small number of ancestors who left Africa 60-40 kya.
Gabriel et al. 2002 Science
also see Reich 2001 Nature, Daly 2001 Nat Genet
![Page 135: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/135.jpg)
A brief history of modern humans
Cavalli-Sforza & Feldman 2003 Nat Genet; also see Ramachandran et al. 2005 PNAS,
Mellars 2006 Science, Armitage et al. 2011 Science, Henn et al. 2012 PNAS
![Page 136: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/136.jpg)
A brief history of modern humans, contradicted
Green et al. 2010 Science, Reich et al. 2010 Nature, Meyer et al. 2012 Science,
Sankararaman et al. 2014 Nature, Vernot & Akey 2014 Science
reviewed in Racimo et al. 2015 Nat Rev Genet
• All non-African populations have ~2% of their genomes
descended from Neanderthals.
• Melanesian populations have ~5% of their genomes
descended from Denisovans, a relative of Neanderthals.
![Page 137: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/137.jpg)
Population bottlenecks increase LD
population
bottleneck
population
bottleneck
Cavalli-Sforza & Feldman 2003 Nat Genet; also see Ramachandran et al. 2005 PNAS,
Mellars 2006 Science, Armitage et al. 2011 Science, Henn et al. 2012 PNAS
![Page 138: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/138.jpg)
Population bottlenecks increase LD
[Figure: 0/1 alleles on both chromosomes of individuals 1-8 along a stretch of the 3 billion letters, with SNP 2 and SNP 3 marked. SNP 2 and SNP 3: r2=0, NOT in LD.]
r2 is squared correlation
![Page 139: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/139.jpg)
Population bottlenecks increase LD
due to subsampling haplotypes (genetic drift)
[Figure: 0/1 alleles on both chromosomes of individuals 1-8 along a stretch of the 3 billion letters, with SNP 2 and SNP 3 marked. SNP 2 and SNP 3: r2=0, NOT in LD.]
r2 is squared correlation
![Page 140: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/140.jpg)
Population bottlenecks increase LD
due to subsampling haplotypes (genetic drift)
[Figure: only three of the eight individuals' haplotype pairs remain after the bottleneck, with SNP 2 and SNP 3 marked. SNP 2 and SNP 3: r2=0.5, partial LD.]
![Page 141: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/141.jpg)
Population bottlenecks increase LD
due to subsampling haplotypes (genetic drift)
[Figure: 0/1 alleles on both chromosomes of individuals 1-8 in the population descended from the bottleneck survivors, with SNP 2 and SNP 3 marked. SNP 2 and SNP 3: r2=0.5, partial LD.]
r2 is squared correlation
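The effect of subsampling can be illustrated with a small simulation: draw two SNPs independently (so r2 = 0 in the source population), keep only a handful of founder haplotypes, and measure r2 among the survivors. This is a toy sketch, not a population-genetic model; the function name, the allele frequencies 0.25 and 0.5, the replicate count and the seed are all arbitrary choices made here.

```python
import numpy as np

def mean_r2_after_bottleneck(n_founder_haps, p_a=0.25, p_b=0.5, n_reps=2000, seed=0):
    """Average r2 between two independent SNPs among n_founder_haps sampled haplotypes."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_reps):
        a = rng.random(n_founder_haps) < p_a  # alleles at the first SNP
        b = rng.random(n_founder_haps) < p_b  # alleles at the second SNP, drawn independently
        # r2 is undefined if either SNP is monomorphic in the subsample
        if a.all() or not a.any() or b.all() or not b.any():
            continue
        r = np.corrcoef(a.astype(float), b.astype(float))[0, 1]
        values.append(r * r)
    return float(np.mean(values))

small = mean_r2_after_bottleneck(8)    # severe bottleneck
large = mean_r2_after_bottleneck(200)  # mild subsampling
print(small, large)
```

The severe bottleneck produces a noticeably larger average r2 than the mild one, even though the two SNPs were uncorrelated before subsampling.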
![Page 142: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/142.jpg)
Population bottlenecks increase LD
Conrad et al. 2006 Nat Genet
Average number of haplotypes per genomic region
![Page 143: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/143.jpg)
Outline
1. Introduction to Linkage Disequilibrium
2. LD and Tag SNPs
3. LD and imputation
4. LD and fine-mapping
![Page 144: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/144.jpg)
Linkage Disequilibrium and tag SNPs
[Figure: genotype matrix of individuals, split into Cases and Controls, across 3 billion letters. SNP 1: causal SNP]
Direct association: genotype SNP1 in Cases and Controls.
![Page 145: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/145.jpg)
Linkage Disequilibrium and tag SNPs
[Figure: genotype matrix of individuals, split into Cases and Controls, across 3 billion letters, with SNP 1 and SNP 2 marked. SNP 2: r2 = 1, in LD]
Indirect association: genotype SNP2 in Cases and Controls.
If SNP1 affects disease risk, then SNP2 will also be associated!
![Page 146: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/146.jpg)
Linkage Disequilibrium and tag SNPs
[Figure: genotype matrix of individuals, split into Cases and Controls, across 3 billion letters, with SNP 1, SNP 2 and SNP 3 marked. SNP 3: r2 = 0.7, partial LD]
Indirect association: genotype SNP3 in Cases and Controls.
If SNP1 affects disease risk, then SNP3 will also be associated!
![Page 147: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/147.jpg)
Linkage Disequilibrium and tag SNPs
Theorem 2 (Pritchard and Przeworski 2001 Am J Hum Genet):
If SNP1 is causal and LD(SNP1,SNP2) = r2, then
Power of an association study of SNP1 with N samples =
Power of an association study of SNP2 with N/r2 samples.
![Page 149: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/149.jpg)
Linkage Disequilibrium and tag SNPs
Theorem 2 (Pritchard and Przeworski 2001 Am J Hum Genet):
If SNP1 is causal and LD(SNP1,SNP2) = r2, then
Power of an association study of SNP1 with N samples =
Power of an association study of SNP2 with N/r2 samples.
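Proof:
Let $g_1$ and $g_2$ be the genotypes of SNP1 and SNP2 respectively,
and $\pi$ the phenotype, all normalized to mean 0 and variance 1.
Armitage Trend Test ($\chi^2 = N\rho(g,\pi)^2$; Armitage 1955 Biometrics):
SNP1 with $N$ samples: $N\rho(g_1,\pi)^2 = N\,E(g_1\cdot\pi)^2$
SNP2 with $N/r^2$ samples: $(N/r^2)\rho(g_2,\pi)^2 = (N/r^2)\,E(g_2\cdot\pi)^2$
$= (N/r^2)\,E([r g_1 + (g_2 - r g_1)]\cdot\pi)^2$
$= (N/r^2)\,E(r g_1\cdot\pi)^2 = N\,E(g_1\cdot\pi)^2$. Q.E.D.
(The residual $g_2 - r g_1$ is uncorrelated with $g_1$; assuming SNP2 affects the phenotype only through its LD with the causal SNP1, it is also uncorrelated with $\pi$, so its term drops out.)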
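The theorem can be checked numerically with a small sketch. It assumes, as in the proof, that the expected trend-test statistic is N times the squared genotype-phenotype correlation, and that the tag SNP's correlation with the phenotype is r times that of the causal SNP. The function names and the values of `rho_causal`, `r2` and `n_causal` are hypothetical.

```python
import math


def expected_chi2(n_samples, rho):
    """Expected Armitage trend statistic, N * rho^2, as on the slide."""
    return n_samples * rho ** 2


def tag_sample_size(n_causal, r2):
    """Samples needed at the tag SNP to match N samples at the causal SNP."""
    return n_causal / r2


rho_causal = 0.05   # hypothetical correlation between causal SNP and phenotype
r2 = 0.7            # LD between causal SNP and tag SNP
n_causal = 10_000   # hypothetical sample size when typing the causal SNP

rho_tag = math.sqrt(r2) * rho_causal      # E(g2 * pi) = r * E(g1 * pi)
n_tag = tag_sample_size(n_causal, r2)     # about 14,286 samples

chi2_causal = expected_chi2(n_causal, rho_causal)
chi2_tag = expected_chi2(n_tag, rho_tag)
print(round(n_tag), chi2_causal, chi2_tag)
```

At r2 = 0.7 the tag SNP needs roughly 43% more samples to reach the same expected statistic.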
![Page 150: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/150.jpg)
Linkage Disequilibrium: Haplotype Blocks
[Figure: haplotypes of Cases and Controls, with the risk haplotype marked]
Question: Which SNP to genotype?
Answer: Choose 1 SNP per haplotype block,
and take advantage of indirect association!
![Page 151: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/151.jpg)
Linkage Disequilibrium: Haplotype Blocks
[Figure: haplotypes of Cases and Controls, with the risk haplotype marked]
Needed: a resource describing the haplotypes
at each location in the genome.
![Page 152: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/152.jpg)
The International HapMap Project: 270 samples from 4 populations
Code  Population  Location  Samples  Structure
CEU   European    USA       90       30 trios
YRI   Yoruba      Nigeria   90       30 trios
CHB   Chinese     China     45       unrelated
JPT   Japanese    Japan     45       unrelated
![Page 153: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/153.jpg)
Genetic differences between populations are small
C allele of rs10910034: 68% frequency vs. 50% frequency
A allele of rs260509: 52% frequency vs. 51% frequency
(frequencies in the two populations shown; the SNPs are 11kb apart on chr 1)
![Page 154: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/154.jpg)
LD differences between populations are large!
C allele of rs10910034: 68% frequency vs. 50% frequency
A allele of rs260509: 52% frequency vs. 51% frequency
LD between the two SNPs: r2 = 0.97 vs. r2 = 0.34
(values in the two populations shown; the SNPs are 11kb apart on chr 1)
![Page 155: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/155.jpg)
HapMap project: a resource for “SNP tagging”
[Figure: haplotype matrix for individuals 1-8 across 3 billion letters, with SNP 1, SNP 2 and SNP 3 marked]
SNP1 “tags” this entire haplotype block at an r2 of 0.7
![Page 156: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/156.jpg)
HapMap project: a resource for “SNP tagging”
How to select SNPs to genotype in an association study:
• Choose genomic region(s) of interest.
• Look up HapMap SNPs in the genomic region(s).
• Choose a subset of HapMap SNPs which “tag” haplotype
blocks in the genomic region(s).
(e.g. Tagger algorithm, de Bakker et al. 2005 Nat Genet)
Note: because LD patterns vary by population, it is
important to choose tag SNPs using a HapMap population
similar to the population in the association study.
![Page 157: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/157.jpg)
HapMap project: a resource for “SNP tagging”
International HapMap Consortium 2007 Nature; also see Barrett et al. 2006 Nat Genet,
Smith et al. 2006 Genomics, International HapMap Consortium 2005 Nature
How many “tag SNPs” are required?
For the entire genome, the answer is:
Thus, to choose tag SNPs at an r2 of 0.8, we need roughly
1 SNP per 3kb in YRI, or 1 SNP per 5kb in CEU or CHB+JPT
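As a rough back-of-the-envelope sketch, the spacings above translate into genome-wide tag SNP counts if the genome is taken to be 3 billion letters long, the figure used in the earlier haplotype slides. The constant and function names are illustrative.

```python
GENOME_LENGTH_BP = 3_000_000_000  # "3 billion letters"


def tag_snps_needed(spacing_bp):
    """Approximate number of tag SNPs at one SNP per `spacing_bp` base pairs."""
    return GENOME_LENGTH_BP // spacing_bp


yri_tags = tag_snps_needed(3_000)   # 1 SNP per 3kb in YRI
ceu_tags = tag_snps_needed(5_000)   # 1 SNP per 5kb in CEU or CHB+JPT
print(f"YRI: {yri_tags:,} tag SNPs; CEU or CHB+JPT: {ceu_tags:,} tag SNPs")
```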
![Page 158: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/158.jpg)
Things aren’t always what they seem
![Page 159: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/159.jpg)
Things aren’t always what they seem
• Estimating LD using a small number of HapMap samples
may lead to overfitting.
• HapMap SNPs are not a random subset of SNPs.
Bhangale et al. 2008 Nat Genet
![Page 161: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/161.jpg)
Things aren’t always what they seem
According to International HapMap Consortium 2007 Nature:
82% of common SNPs are tagged at r2 ≥ 0.8 by Affymetrix 6.0
According to Bhangale et al. 2008 Nat Genet:
66% of common SNPs are tagged at r2 ≥ 0.8 by Affymetrix 6.0
Bhangale et al. 2008 Nat Genet
![Page 162: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/162.jpg)
Multi-SNP tagging
Haplotype       1  2  3  4   [freq. 25% for each haplotype]
SNP1 (causal)   A  A  C  C
SNP2            A  C  C  A
SNP3            A  C  A  C
r2 = 0 between SNP1 and each of SNP2 and SNP3: NOT in LD
![Page 163: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/163.jpg)
Multi-SNP tagging
Haplotype       1    2    3    4    [freq. 25% for each haplotype]
SNP1 (causal)   A    A    C    C
SNP2+3          A+A  C+C  C+A  A+C
r2 = 1 between SNP1 and the SNP2+3 combination: YES, in LD
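The four-haplotype example above can be verified with a short sketch. The alleles are copied from the slide; coding C as 1 and A as 0, and combining SNP2 and SNP3 by testing whether their alleles differ, are choices made here for illustration.

```python
def r_squared(x, y):
    """Squared correlation between two 0/1 vectors (equal haplotype weights)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    return cov ** 2 / (var_x * var_y)


def encode(alleles):
    """Code allele C as 1 and allele A as 0."""
    return [1 if a == "C" else 0 for a in alleles]


# Haplotypes 1-4, each at 25% frequency (from the slide).
snp1 = encode("AACC")
snp2 = encode("ACCA")
snp3 = encode("ACAC")

# Two-SNP predictor: 1 when the SNP2 and SNP3 alleles differ, else 0.
snp2_plus_3 = [a ^ b for a, b in zip(snp2, snp3)]

single_r2 = (r_squared(snp1, snp2), r_squared(snp1, snp3))
combined_r2 = r_squared(snp1, snp2_plus_3)
print(single_r2, combined_r2)
```

Neither SNP2 nor SNP3 alone carries any information about SNP1, but the pair identifies the haplotype and therefore the SNP1 allele.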
![Page 164: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/164.jpg)
Multi-SNP tagging
Pe’er et al. 2006 Nat Genet
also see Zaitlen et al. 2007 Am J Hum Genet
![Page 165: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/165.jpg)
Outline
1. Introduction to Linkage Disequilibrium
2. LD and Tag SNPs
3. LD and imputation
4. LD and fine-mapping
![Page 166: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/166.jpg)
What is imputation?
Marchini et al. 2007 Nat Genet, Howie et al. 2009 PLoS Genet, Li et al. 2010
Genet Epidemiol, Howie et al. 2012 Nat Genet, Fuchsberger et al. 2015 Bioinformatics
![Page 168: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/168.jpg)
Imputation: Why try?
• Increase power to detect disease association at untyped causal SNP
(imputed causal SNP may have stronger association than tag SNP)
![Page 169: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/169.jpg)
Imputation: Why try?
r2 = 0.8
Causal SNP
Marchini et al. 2007 Nat Genet, Howie et al. 2009 PLoS Genet, Li et al. 2010
Genet Epidemiol, Howie et al. 2012 Nat Genet, Fuchsberger et al. 2015 Bioinformatics
![Page 170: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/170.jpg)
Imputation: Why try?
Causal SNP
Marchini et al. 2007 Nat Genet, Howie et al. 2009 PLoS Genet, Li et al. 2010
Genet Epidemiol, Howie et al. 2012 Nat Genet, Fuchsberger et al. 2015 Bioinformatics
![Page 172: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/172.jpg)
Imputation: Why try?
• Increase power to detect disease association at untyped causal SNP
(imputed causal SNP may have stronger association than tag SNP)
• Enable meta-analysis of studies on Affymetrix + Illumina chips
![Page 173: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/173.jpg)
Imputation: Why try?
• Increase power to detect disease association at untyped causal SNP
(imputed causal SNP may have stronger association than tag SNP)
• Enable meta-analysis of studies on Affymetrix + Illumina chips
• Improve genotype data quality
![Page 174: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/174.jpg)
Imputation: Algorithms
Hidden Markov Model (HMM) based approaches:
• IMPUTE (Marchini et al. 2007 Nat Genet, Howie et al. 2009 PLoS Genet,
Howie et al. 2012 Nat Genet)
• MACH (Li et al. 2010 Genet Epidemiol)
• fastPHASE/BIMBAM (Scheet/Stephens 2006 AJHG, Servin/Stephens 2007
PLoS Genet, Guan/Stephens 2008 PLoS Genet)
• GEDI (Kennedy et al. 2008 ISBRA)
Localized Haplotype Clustering:
• BEAGLE (Browning/Browning 2007 AJHG, Browning/Browning 2009 AJHG)
Likelihood-based approaches:
• UNPHASED (Dudbridge 2008 Hum Hered)
• SNPMStat (Lin et al. 2008 AJHG)
reviewed in Marchini et al. 2010 Nat Rev Genet; also see Li et al. 2009 ARGHG
![Page 175: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/175.jpg)
Imputation: What do the algorithms output?
Integer-valued genotypes at untyped SNPs
e.g. genotype = 2
OR
Continuous genotype dosages at untyped SNPs
e.g. genotype dosage = 1.79
OR
Continuous genotype probabilities at untyped SNPs
e.g. genotype probabilities P(0) = 0.01, P(1) = 0.19, P(2) = 0.80
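The three output types are related: a short sketch, using the genotype probabilities from the example above, shows how a dosage and a best-guess integer genotype can be derived from them. The function names are illustrative and do not correspond to any particular imputation program.

```python
def dosage(probs):
    """Expected allele count: 0*P(0) + 1*P(1) + 2*P(2)."""
    return sum(genotype * p for genotype, p in enumerate(probs))


def best_guess(probs):
    """Integer genotype with the highest probability."""
    return max(range(len(probs)), key=lambda genotype: probs[genotype])


probs = (0.01, 0.19, 0.80)   # P(0), P(1), P(2) from the slide
d = dosage(probs)
g = best_guess(probs)
print(f"dosage = {d:.2f}, best-guess genotype = {g}")
```

The probabilities carry the most information. The dosage collapses them to one number, and the integer genotype also discards the uncertainty.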
![Page 176: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/176.jpg)
Imputation: People do it.
reviewed in Marchini et al. 2010 Nat Rev Genet; also see Li et al. 2009 ARGHG
![Page 177: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/177.jpg)
HMM-based imputation approaches
[Figure: reference haplotypes hap1-hap5 and the haplotype to be imputed (Imp.), with untyped positions shown as "?"]
reviewed in Marchini et al. 2010 Nat Rev Genet; also see Li et al. 2009 ARGHG
Note: current paradigm is to first phase the data, then run imputation on
phased data (Howie et al. 2012 Nat Genet, Fuchsberger et al. 2015 Bioinformatics)
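To convey the intuition only, here is a deliberately simplified sketch that copies untyped alleles from the single best-matching reference haplotype. This is not an HMM and not the algorithm of any program listed earlier: HMM-based methods instead model the target as a mosaic of reference haplotypes, allowing switches between them and mismatches. The function name and all data are hypothetical.

```python
def impute_by_best_match(reference, target):
    """Fill untyped positions (None) from the closest reference haplotype."""
    typed = [i for i, allele in enumerate(target) if allele is not None]

    def mismatches(hap):
        # Count disagreements with the target at typed positions only.
        return sum(hap[i] != target[i] for i in typed)

    best = min(reference, key=mismatches)
    return [best[i] if allele is None else allele
            for i, allele in enumerate(target)]


# Hypothetical phased reference panel (rows = haplotypes, columns = SNPs).
reference = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1],
]
# Target haplotype typed at SNPs 1, 3 and 5; SNPs 2 and 4 are untyped.
target = [1, None, 0, None, 0]

imputed = impute_by_best_match(reference, target)
print(imputed)
```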
![Page 178: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/178.jpg)
HMM-based imputation approaches
[Figure: reference haplotypes hap1-hap5 and the imputed haplotype (Imp.)]
reviewed in Marchini et al. 2010 Nat Rev Genet; also see Li et al. 2009 ARGHG
Note: current paradigm is to first phase the data, then run imputation on
phased data (Howie et al. 2012 Nat Genet, Fuchsberger et al. 2015 Bioinformatics)
![Page 179: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/179.jpg)
Measuring imputation accuracy
Concordance rate: % of genotypes (or alleles) imputed correctly
• Natural analogue of genotyping error rate in QC analyses
• Concordance rate is often in the range of 95-99%.
Squared correlation (r2) between true and imputed genotype
• Natural analogue of r2 between causal SNP and tag SNP
• r2 << concordance rate, particularly for rare SNPs.
![Page 182: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/182.jpg)
Measuring imputation accuracy
Concordance rate: % of genotypes (or alleles) imputed correctly
• Natural analogue of genotyping error rate in QC analyses
• Concordance rate is often in the range of 95-99%.
Squared correlation (r2) between true and imputed genotype
• Natural analogue of r2 between causal SNP and tag SNP
• r2 << concordance rate, particularly for rare SNPs.
Normalized difference between true and imputed allele frequency
• Measures whether imputation is biased towards ref or var allele
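A small made-up example illustrates why r2 can be far below the concordance rate for a rare SNP. The genotype vectors are hypothetical: 100 individuals, 2 true carriers, of whom the imputation finds one and adds one false carrier.

```python
def concordance(true, imputed):
    """Fraction of genotypes imputed exactly right."""
    return sum(t == i for t, i in zip(true, imputed)) / len(true)


def r_squared(x, y):
    """Squared correlation between true and imputed genotypes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    return cov ** 2 / (var_x * var_y)


# Hypothetical rare SNP: carriers have genotype 1, everyone else 0.
true_genotypes = [1, 1, 0] + [0] * 97
imputed_genotypes = [1, 0, 1] + [0] * 97   # one carrier missed, one false carrier

conc = concordance(true_genotypes, imputed_genotypes)
r2 = r_squared(true_genotypes, imputed_genotypes)
print(f"concordance = {conc:.2f}, r2 = {r2:.2f}")
```

Most genotypes at a rare SNP are trivially 0, so concordance stays high even when half of the carriers are called wrongly. r2 is dominated by the carriers and drops sharply.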
![Page 183: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/183.jpg)
Imputation using HapMap data
International HapMap3 Consortium 2010 Nature
common SNPs imputed using HapMap2 CEU (N=120): r2 = 0.95
(European-ancestry WTCCC samples, Affymetrix & Illumina chips)
![Page 184: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/184.jpg)
Imputation using HapMap data
International HapMap3 Consortium 2010 Nature
common SNPs imputed using HapMap2 CEU (N=120): r2 = 0.95
common SNPs imputed using HapMap3 CEU+TSI (N=410): r2 = 0.96
(European-ancestry WTCCC samples, Affymetrix & Illumina chips)
![Page 185: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/185.jpg)
Imputation using HapMap data
International HapMap3 Consortium 2010 Nature
x-axis: MAF<5% SNPs, imputed using HapMap2 CEU (N=120)
y-axis: MAF<5% SNPs, imputed using HapMap3 CEU+TSI (N=410)
![Page 188: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/188.jpg)
![Page 189: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/189.jpg)
Low-coverage sequencing + imputation increases power vs. genotyping arrays
Effective sample size of a GWAS with a $300,000 budget:
                     Cost per   Actual     Average         Effective
                     sample     #samples   imputation r2   #samples
Illumina 1M array    $400       750        1.00            750
0.4x sequencing      $83*       3,600      0.81**          2,900
0.1x sequencing      $43*       7,000      0.64**          4,500
*Based on sample preparation cost of $30/sample, which is conservatively
double the $15/sample reported by Rohland & Reich 2012 Genome Res,
and on $133 per 1x sequencing (Illumina Network cost).
**Imputation r2 attained at Illumina 1M SNPs by downsampling reads from
real off-target exome sequencing data. Relative performance of
low-coverage sequencing will be even higher at non-Illumina 1M SNPs.
Pasaniuc et al. 2012 Nat Genet; also see Cai et al. 2015 Nature, Davies et al. 2016 Nat Genet
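The table's arithmetic can be reproduced with a short sketch. It takes the effective sample size to be the actual sample size times the average imputation r2, and uses the costs from the footnotes. The slide rounds intermediate values, so the numbers below agree with the table only approximately. Variable and function names are illustrative.

```python
BUDGET = 300_000       # total GWAS budget in dollars
PREP_COST = 30.0       # sample preparation cost per sample
COST_PER_1X = 133.0    # sequencing cost per 1x coverage


def sequencing_cost(coverage):
    """Per-sample cost of low-coverage sequencing at the given coverage."""
    return PREP_COST + coverage * COST_PER_1X


def sample_sizes(cost_per_sample, imputation_r2):
    """Return (actual, effective) sample sizes affordable within the budget."""
    actual = BUDGET / cost_per_sample
    return actual, actual * imputation_r2


designs = {
    "Illumina 1M array": (400.0, 1.00),
    "0.4x sequencing": (sequencing_cost(0.4), 0.81),
    "0.1x sequencing": (sequencing_cost(0.1), 0.64),
}

results = {name: sample_sizes(cost, r2) for name, (cost, r2) in designs.items()}
for name, (actual, effective) in results.items():
    print(f"{name:18s} actual ~ {actual:7.0f}   effective ~ {effective:7.0f}")
```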
![Page 190: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/190.jpg)
Outline
1. Introduction to Linkage Disequilibrium
2. LD and Tag SNPs
3. LD and imputation
4. LD and fine-mapping (to be continued, Tue of Week 4)
![Page 191: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/191.jpg)
Definition of fine-mapping
Manhattan plot from Ikram et al. 2010 PLoS Genet
Which of these SNPs on chr 6 is the biologically causal SNP?
(Ditto for chr 5, 8, 12, 19)
![Page 192: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/192.jpg)
WTCCC fine-mapping study
Maller et al. 2012 Nat Genet
![Page 193: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/193.jpg)
LD and fine-mapping in Europeans
GWAS in Europeans
SNP1: P-value = 10^-8
![Page 194: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/194.jpg)
TCF7L2 locus in T2D: 1 top signal
Maller et al. 2012 Nat Genet
![Page 195: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/195.jpg)
LD and fine-mapping in Europeans
Fine-mapping in Europeans
SNP1: P-value = 10^-8 CAUSAL??
SNP2: P-value = 10^-8 CAUSAL??
![Page 196: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/196.jpg)
FTO locus in T2D: many top signals
Maller et al. 2012 Nat Genet
![Page 197: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/197.jpg)
LD and cross-population fine-mapping
Fine-mapping in Europeans
SNP1: P-value = 10^-8
SNP2: P-value = 10^-8
SNP3: P-value = 0.41
LD in Europeans
r2    SNP1  SNP2  SNP3
SNP1  1.00  0.99  0.08
SNP2  0.99  1.00  0.07
SNP3  0.08  0.07  1.00
Fine-mapping in Africans
SNP1: P-value = 10^-5
SNP2: P-value = 0.62
SNP3: P-value = 10^-5
LD in Africans
r2    SNP1  SNP2  SNP3
SNP1  1.00  0.12  0.98
SNP2  0.12  1.00  0.14
SNP3  0.98  0.14  1.00
![Page 198: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/198.jpg)
LD and cross-population fine-mapping
Fine-mapping in Europeans
SNP1: P-value = 10^-8 CAUSAL
SNP2: P-value = 10^-8
SNP3: P-value = 0.41
LD in Europeans
r2    SNP1  SNP2  SNP3
SNP1  1.00  0.99  0.08
SNP2  0.99  1.00  0.07
SNP3  0.08  0.07  1.00
Fine-mapping in Africans
SNP1: P-value = 10^-5 CAUSAL
SNP2: P-value = 0.62
SNP3: P-value = 10^-5
LD in Africans
r2    SNP1  SNP2  SNP3
SNP1  1.00  0.12  0.98
SNP2  0.12  1.00  0.14
SNP3  0.98  0.14  1.00
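The logic of the example above can be sketched as a set intersection: a SNP stays a candidate only if it is associated in both populations. The P-values are those on the slide; the significance threshold and the names are hypothetical, and published multi-ethnic fine-mapping methods are statistical models, not a simple intersection like this.

```python
# P-values from the slide, per population.
p_values = {
    "Europeans": {"SNP1": 1e-8, "SNP2": 1e-8, "SNP3": 0.41},
    "Africans": {"SNP1": 1e-5, "SNP2": 0.62, "SNP3": 1e-5},
}

THRESHOLD = 1e-4  # hypothetical cutoff, chosen only for this illustration


def candidates(snp_p_values):
    """SNPs whose association P-value passes the cutoff in one population."""
    return {snp for snp, p in snp_p_values.items() if p <= THRESHOLD}


per_population = {pop: candidates(p) for pop, p in p_values.items()}
shared = set.intersection(*per_population.values())

print(per_population)
print("candidate causal SNP(s):", sorted(shared))
```

In Europeans SNP2 rides along with SNP1 (r2 = 0.99), and in Africans SNP3 does (r2 = 0.98). Only SNP1 is associated in both, which is why it is the one marked CAUSAL.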
![Page 199: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/199.jpg)
LD and multi-ethnic fine-mapping
Zaitlen*, Pasaniuc* et al. 2010 Am J Hum Genet
also see Morris 2011 Genet Epidemiol, Udler et al. 2009 Hum Mol Genet,
Wu et al. 2013 PLoS Genet, Peters et al. 2013 PLoS Genet, Liu et al. 2016 Am J Hum Genet
![Page 200: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/200.jpg)
Conclusions
• Linkage Disequilibrium is good, because we can tag most
common SNPs using chips with 1,000,000 SNPs or fewer.
• Linkage Disequilibrium is good, because we can infer
imputed genotypes at most common HapMap SNPs.
• Linkage Disequilibrium is bad, because it leads to
ambiguity as to the causal SNP when doing fine-mapping.
• Studying multiple populations, especially Africans (low LD),
can improve our ability to localize causal variants.
![Page 201: EPI 511, Advanced Population and Medical Genetics](https://reader030.vdocuments.us/reader030/viewer/2022012708/61a8923e4f7b9e4887520287/html5/thumbnails/201.jpg)
EPI 511: Office Hours
Instructor: Alkes
Office Hours: Thu 3:30-4:30pm, Building 2, Room 211
Email Address: [email protected]
(Please put EPI511 in the subject of your email)
Teaching Assistant: Armin
Office Hours: Fri + Mon 2-3pm, Building 2, Room 209
Email Address: [email protected]