Download - Practical With Merlin
![Page 1: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/1.jpg)
Practical With Merlin
Gonçalo Abecasis
![Page 2: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/2.jpg)
MERLIN Websitewww.sph.umich.edu/csg/abecasis/Merlin
• Reference
• FAQ
• Source
• Binaries
• Tutorial– Linkage– Haplotyping– Simulation– Error detection– IBD calculation– Association Analysis
![Page 3: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/3.jpg)
QTL Regression Analysis
• Go to Merlin website– Click on tutorial (left menu)– Click on regression analysis (left menu)
• What we’ll do:– Analyze a single trait– Evaluate family informativeness
![Page 4: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/4.jpg)
Rest of the Afternoon
• Other things you can do with Merlin …
– Checking for errors in your data
– Dealing with markers that aren’t independent
– Affected sibling pair analysis
![Page 5: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/5.jpg)
Affected Sibling Pair Analysis
![Page 6: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/6.jpg)
Quantitative Trait Analysis
• Individuals who share particular regions IBD are more similar than those that don’t …
• … but most linkage studies rely on affected sibling pairs, where all individuals have the same phenotype!
Linkage No Linkage
![Page 7: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/7.jpg)
Allele Sharing Analysis
• Traditional analysis method for discrete traits
• Looks for regions where siblings are more similar than expected by chance
• No specific disease model assumed
![Page 8: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/8.jpg)
Historical References
• Penrose (1953) suggested comparing IBD distributions for affected siblings.– Possible for highly informative markers (eg. HLA)
• Risch (1990) described effective methods for evaluating the evidence for linkage in affected sibling pair data.
• Soon after, large-scale microsatellite genotyping became possible and geneticists attempted to tackle more complex diseases…
![Page 9: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/9.jpg)
Simple Case
• If IBD could be observed
• Each pair of individuals scored as • IBD=0
• IBD=1
• IBD=2
• Test whether sharing distribution is compatible with 1:2:1 proportions of sharing IBD 0, 1 and 2.
![Page 10: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/10.jpg)
Sib Pair Likelihood (Fully Informative Data)
210
210
210
41
21
41
hypothesis ealternativ Under the
)()()(
:hypothesis null Under the
IBDIBDIBD
IBDIBDIBD
nnn
nnn
zzzL
L
),,(
)ˆ,ˆ,ˆ(log
41
221
141
0
21010
zzzL
zzzLLOD
![Page 11: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/11.jpg)
The MLS Method
• Introduced by Risch (1990, 1992)– Am J Hum Genet 46:242-253
• Uses IBD estimates from partially informative data– Uses partially informative data efficiently
• The MLS method is still one of the best methods for analysis pair data
• I will skip details here …
![Page 12: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/12.jpg)
Non-parametric Analysis for Arbitrary Pedigrees
• Must rank general IBD configurations which include sets of more than 2 affected individuals– Low ranks correspond to no linkage– High ranks correspond to linkage
• Multiple possible orderings are possible– Especially for large pedigrees
• In interesting regions, IBD configurations with higher rank are more common
![Page 13: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/13.jpg)
Non-Parametric Linkage Scores
• Introduced by Whittemore and Halpern (1994)
• The two most commonly used ones are:– Pairs statistic
• Total number of alleles shared IBD between pairs of affected individuals in a pedigree
– All statistic• Favors sharing of a single allele by a large number of
affected individuals.
![Page 14: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/14.jpg)
Kong and Cox Method
• A probability distribution for IBD states– Under the null and alternative
• Null– All IBD states are equally likely
• Alternative– Increase (or decrease) in probability of each state is
modeled as a function of sharing scores
• "Generalization" of the MLS method
![Page 15: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/15.jpg)
Parametric Linkage Analysis
• Alternative to non-parametric methods– Usually ideal for Mendelian disorders
• Requires a model for the disease– Frequency of disease allele(s)– Penetrance for each genotype
• Typically employed for single gene disorders and Mendelian forms of complex disorders
![Page 16: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/16.jpg)
Typical Interesting Pedigree
![Page 17: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/17.jpg)
Checking for Genotyping Error
![Page 18: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/18.jpg)
Genotyping Error
• Genotyping errors can dramatically reduce power for linkage analysis (Douglas et al, 2000; Abecasis et al, 2001)
• Explicit modeling of genotyping errors in linkage and other pedigree analyses is computationally expensive (Sobel et al, 2002)
![Page 19: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/19.jpg)
Intuition: Why errors mater …
• Consider ASP sample, marker with n alleles
• Pick one allele at random to change– If it is shared (about 50% chance)
• Sharing will likely be reduced
– If it is not shared (about 50% chance)• Sharing will increase with probability about 1 / n
• Errors propagate along chromosome
t
![Page 20: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/20.jpg)
Effect on Error in ASP Sample
-4
-3
-2
-1
0
1
2
3
4
0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85
Ave
rag
e L
OD
![Page 21: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/21.jpg)
Error Detection
• Genotype errors can change inferences about gene flow– May introduce additional
recombinants
• Likelihood sensitivity analysis– How much impact does
each genotype have on likelihood of overall data
2 2 2 22 1 2 12 2 2 22 1 2 11 2 1 22 2 2 21 1 2 22 1 2 11 1 1 11 2 1 22 1 2 11 2 1 21 1 1 1
![Page 22: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/22.jpg)
Sensitivity Analysis
• First, calculate two likelihoods:– L(G|), using actual recombination fractions– L(G| = ½), assuming markers are unlinked
• Then, remove each genotype and:– L(G \ g|)– L(G \ g| = ½)
• Examine the ratio rlinked/runlinked
– rlinked = L(G \ g|) / L(G|)
– runlinked = L(G \ g| = ½) / L(G| = ½)
![Page 23: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/23.jpg)
Mendelian Errors Detected (SNP)
34.6 36.2
55.437.2 53.528.9 42.956.3
39.5 39.3 38.7 37.0 36.4 37.3 37.5 38.7 37.4
% of Errors Detected in 1000 Simulations
![Page 24: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/24.jpg)
Overall Errors Detected (SNP)
80.2 78.4
99.277.5 99.359.4 90.8100.0
95.6 95.8 96.3 96.0 96.6 96.6 97.4 97.6 98.0
![Page 25: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/25.jpg)
Error Detection
Mendelian
Errors Unlikely
Genotypes Overall
Detection Rate
No Genotyped Parents 2 siblings 0.00 0.16 0.16 3 siblings .00 .38 0.38 4 siblings .00 .61 0.61 5 siblings .00 .77 0.77 One Genotyped Parent 2 siblings 0.13 0.34 0.47 3 siblings .13 .58 0.71 4 siblings .12 .72 0.84 5 siblings .12 .78 0.91 Two Genotyped Parents 2 siblings 0.37 0.56 0.93 3 siblings .37 .56 0.93 4 siblings .38 .59 0.97 5 siblings .37 .60 0.97
Simulation: 21 SNP markers, spaced 1 cM
![Page 26: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/26.jpg)
Markers That Are not Independent
![Page 27: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/27.jpg)
SNPs
• Abundant diallelic genetic markers
• Amenable to automated genotyping– Fast, cheap genotyping with low error rates
• Rapidly replacing microsatellites in many linkage studies
![Page 28: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/28.jpg)
The Problem
• Linkage analysis methods assume that markers are in linkage equilibrium– Violation of this assumption can produce large
biases
• This assumption affects ...– Parametric and nonparametric linkage– Variance components analysis– Haplotype estimation
![Page 29: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/29.jpg)
Standard Hidden Markov Model
1G 2G 3G MG
2I 3I MI1I
)|( 12 IIP )|( 23 IIP (...)P
)|( 11 IGP )|( 22 IGP )|( 33 IGP )|( MM IGP
Observed Genotypes Are Connected Only Through IBD States …
![Page 30: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/30.jpg)
Our Approach
• Cluster groups of SNPs in LD – Assume no recombination within clusters– Estimate haplotype frequencies– Sum over possible haplotypes for each founder
• Two pass computation …– Group inheritance vectors that produce
identical sets of founder haplotypes – Calculate probability of each distinct set
![Page 31: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/31.jpg)
1clusterI
Hidden Markov Model
1G 2G 3G MG
)|( 12 clustercluster IIP (...)P
)|,( 121 clusterIGGP )|,( 243 clusterIGGP )|( ,1 clusterNMM IGGP
Example With Clusters of Two Markers …
4G 1MG
2clusterI clusterNI
![Page 32: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/32.jpg)
Practically …
h
H
h
H
f
ihifC
h
H
h
HhffChC
f
f
ffHvHHGG
ffHHvHHGGvffGGP
1 1
2
11211
1 112121111
1 2
1 2
)...|Pr(),...|...Pr(...
)...|...Pr(),...|...Pr(...),...|...(
• Probability of observed genotypes G1…GC
– Conditional on haplotype frequencies f1 .. fh
– Conditional on a specific inheritance vector v
• Calculated by iterating over founder haplotypes
![Page 33: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/33.jpg)
Computationally …
• Avoid iteration over h2f founder haplotypes– List possible haplotype sets for each cluster– List is product of allele graphs for each marker
• Group inheritance vectors with identical lists– First, generate lists for each vector– Second, find equivalence groups– Finally, evaluate nested sum once per group
![Page 34: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/34.jpg)
Example of What Could Happen…
![Page 35: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/35.jpg)
Simulations …
• 2000 genotyped individuals per dataset– 0, 1, 2 genotyped parents per sibship– 2, 3, 4 genotyped affected siblings
• Clusters of 3 markers, centered 3 cM apart– Used Hapmap to generate haplotype frequencies
• Clusters of 3 SNPs in 100kb windows• Windows are 3 Mb apart along chromosome 13• All SNPs had minor allele frequency > 5%
– Simulations assumed 1 cM / Mb
![Page 36: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/36.jpg)
Average LOD Scores(Null Hypothesis)
Analysis Ignore Model IndependentStrategy LD LD SNPs
No parents genotyped… 2 sibs per family 2.111 -0.016 -0.015… 3 sibs per family 3.202 -0.010 -0.013… 4 sibs per family 2.442 -0.022 -0.015
One parent genotyped… 2 sibs per family 0.603 -0.004 -0.003… 3 sibs per family 0.703 -0.002 -0.004… 4 sibs per family 0.471 -0.012 -0.010
Two parents genotyped… 2 sibs per family -0.006 -0.006 -0.006… 3 sibs per family 0.008 0.008 0.005… 4 sibs per family -0.014 -0.014 -0.012
Average LOD
![Page 37: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/37.jpg)
5% Significance Thresholds(based on peak LODs under null)
Analysis Ignore Model IndependentStrategy LD LD SNPs
No parents genotyped… 2 sibs per family 11.37 1.33 1.26… 3 sibs per family 15.80 1.34 1.28… 4 sibs per family 13.46 1.27 1.17
One parent genotyped… 2 sibs per family 4.97 1.43 1.35… 3 sibs per family 5.48 1.38 1.27… 4 sibs per family 4.32 1.42 1.35
Two parents genotyped… 2 sibs per family 1.58 1.58 1.40… 3 sibs per family 1.55 1.54 1.43… 4 sibs per family 1.44 1.44 1.30
Significance Threshold
![Page 38: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/38.jpg)
Empirical Power
Analysis Ignore Model IndependentStrategy LD LD SNPs
No parents genotyped… 2 sibs per family 0.188 0.289 0.276… 3 sibs per family 0.336 0.617 0.530… 4 sibs per family 0.538 0.920 0.871
One parent genotyped… 2 sibs per family 0.163 0.207 0.184… 3 sibs per family 0.384 0.535 0.493… 4 sibs per family 0.697 0.852 0.811
Two parents genotyped… 2 sibs per family 0.153 0.155 0.171… 3 sibs per family 0.424 0.428 0.438… 4 sibs per family 0.800 0.800 0.794
Power (Model 2)
Disease Model, p = 0.10, f11 = 0.01, f12 = 0.02, f22 = 0.04
![Page 39: Practical With Merlin](https://reader035.vdocuments.us/reader035/viewer/2022081519/56813b6d550346895da474cb/html5/thumbnails/39.jpg)
Conclusions from Simulations
• Modeling linkage disequilibrium crucial – Especially when parental genotypes missing
• Ignoring linkage disequilibrium– Inflates LOD scores– Both small and large sibships are affected– Loses ability to discriminate true linkage