theory of ibd sharing in the wright-fisher model
Post on 22-Feb-2016
28 Views
Preview:
DESCRIPTION
TRANSCRIPT
Theory of IBD sharing in the Wright-Fisher modelShai Carmi, Pier Francesco Palamara, Vladimir Vacic, and Itsik Pe’er
Department of Computer Science, Columbia University, New York, NY
1. A. Gusev et al., Whole population, genome-wide mapping of hidden relatedness, Genome Res. 19, 318 (2009).2. B. L. Browning and S. R. Browning, A fast, powerful method for detecting identity-by-descent, AJHG 88, 173 (2011).3. S. R. Browning and B. L. Browning, Identity by Descent Between Distant Relatives: Detection and Applications, Annu. Rev. Genet. 46, 615 (2012).4. P. F. Palamara et al. Length Distributions of Identity by Descent Reveal Fine-Scale Demographic History, AJHG (2012).5. H. Li and R. Durbin, Inference of human population history from individual whole-genome sequences, Nature 449, 851 (2011).6. A. Gusev et al. Low-Pass Genome-Wide Sequencing and Variant Inference Using Identity-by-Descent in an Isolated Human Population, Genetics
190, 679 (2012).7. Y. Shen et al., Coverage tradeoffs and power estimation in the design of whole-genome sequencing experiments for detecting association,
Bioinformatics 27, 1995 (2011).
References
Distribution of the total sharing: renewal theory
Background: Identity-by-descent
Direct calculation of the variance
The cohort-averaged sharing and imputation by IBD
More applications and conclusions
A B
AB
A shared segment
• In populations that have recently underwent strong genetic drift, most individuals share a very recent common ancestor.
• Long haplotypes are frequently shared identical-by-descent (IBD).• Algorithms can detect IBD shared segments between all pairs in
large cohorts based on either segment length or frequency [1,2].• Applications:
o Demographic inferenceo Imputationo Phasingo Association of rare variants/haplotypeso Pedigree reconstructiono Detection of positive selection.o See review [3]
How does the amount of sharing depend on the demographic history of the population?
The Wright-Fisher model:• Non-overlapping, discrete generations.• Constant size of N haploid individuals,
or,changing size
• Ignore recent mutations.• Recombination is a Poisson process.• Each pair of individuals (linages) has
probability 1/N to coalesce in the previous generation.
• For continuous-time and large population size, approximated by the coalescent.
• (Scaled) Time to most recent common ancestor: ( for constant size).
• Assume a segment can be detected only if it is longer than m (Morgans).
• Denote the fraction of the chromosome shared between two random individuals as the total sharing fT.
• Palamara et al. [4]:.
• For constant size,
• Used to infer population histories.
Questions:• Distribution? Higher moments?• Differences between individuals?• Applications [imputation by IBD, sharing between siblings]
ℓ1
0 Lcoordinate
ℓ2ℓ3 ℓ4
ℓ5 ℓ6ℓ7 ℓ8
ℓ9 ℓ10ℓ11
m ℓT=ℓ1+ℓ5+ℓ9
A
B
• In each block, the two chromosomes maintain the same ancestor. • Blocks (segments) end at recombination events.• Define ℓT as the total length of segments having length ≥m.• In the Sequentially Markov Coalescent, fT=ℓT/L.
• Li and Durbin [5] showed that at segment ends:
• For a given t, the probability of no recombination at distance ℓ is . Therefore (see also [4]),
• For constant size, , .• Find the distribution of ℓT using renewal theory. Map:
o Coordinate on chromosome → time (t) o Shared segments → waiting times between eventso L → T, ℓT → tS
o Segment length PDF P(ℓ) → waiting time PDF ψ(τ).• Laplace transform the PDF PT(tS) → Ps(u).
The PDF of the number of shared segments (Laplace transformed T → s)
• A general equation for the variance of the total sharing fT:
• M: number of markers; sum is over all markers• I(s): indicator of a site to lie on a shared segment; with probability π.• π2(s1,s2): probability of both sites s1 and s2 to lie on shared segments.• A simple approximation:
• For a constant size population:
• A full solution of the variance: (only key equations shown)• .• pnr: probability of no recombination between the two sites in the
history of the two chromosomes.• πnr: the probability of the two sites to lie on shared segments, given
that there was no recombination (similarly for πr).• For a discrete ancestral process and distance d between the sites:
• When there was recombination, calculation of πr is complicated by the fact that the segments are bounded on one end.
• Solve by explicit calculations on the coalescent with recombination.
• Define the cohort-averaged sharing:
• For each individual: the average sharing to the rest of the cohort.
• Approximate the variance:• .• For small n, • For large n, , independent of n.• Distribution is approximately normal.
Imputation by IBD:
• Assume a cohort of size n is genotyped and IBD sharing is detected between all pairs.
• A fraction ns/n of the individuals is selected for sequencing.
• Non-sequenced individuals are imputed using the sequenced individuals along segments of IBD sharing.
• What is the expected imputation success rate when individuals are randomly selected?
• What is the success rate when individuals are selected according to their cohort-averaged sharing [6]?
• Define pc as the fraction of the genome covered by IBD segments shared with the sequenced individuals.
Downstream effect on power of association:
• The effective number of sequenced individuals increases with imputation success rate.
• Power to detect variant of frequency β appearing in cases only [7]:
See our paper:
S. Carmi, P. F. Palamara, V. Vacic, T. Lencz, A. Darvasi, and I. Pe’er, The variance of identity-by-descent sharing in the Wright-Fisher model, Submitted (2012). arXiv:1206.4745.
Sharing between siblings:
• The variance in sharing between (same parent) chromosomes of siblings is known.
• What happens when siblings come from an inbred population and thus share also due to remote ancestry?
• The mean sharing is • When calculating variance, decompose sharing into
either same-grandparent or remote.
A simple estimator of the population size:
• Use , isolate N and simplify: , where is the total sharing averaged over all pairs.
• Can be seen that • The variance of the estimator:
.
Summary and discussion:
• We obtained analytical results for properties of IBD sharing in the Wright-Fisher model.
• Calculated the distribution using renewal theory and the variance using two methods.
• Treat genotyping/phasing errors by increasing the length cutoff m. If segments are missed with probability ε, can show that both mean and variance are scaled by (1- ε).
• Other analytical approaches and applications to demographic inferences in [4] and talk here.
• The sharing per individual (averaged over cohort) exhibits a surprisingly wide distribution even for large cohorts.
• Can be taken advantage of in imputation by IBD.
,
top related