bayesian estimators of time to most recent common ancestry

Bayesian Estimators of Time to Most Recent Common Ancestry Ecology and Evolutionary Biology Adjunct Appointments Molecular and Cellular Biology Plant Sciences Epidemiology & Biostatistics Animal Sciences Bruce Walsh [email protected]

Post on 14-Feb-2016


Page 1: Bayesian Estimators of Time to Most Recent Common Ancestry

Bayesian Estimators of Time to Most Recent Common Ancestry

Bruce Walsh
[email protected]

Ecology and Evolutionary Biology

Adjunct Appointments: Molecular and Cellular Biology, Plant Sciences, Epidemiology & Biostatistics, Animal Sciences

Page 2: Bayesian Estimators of Time to Most Recent Common Ancestry

Definitions

MRCA - Most Recent Common Ancestor

TMRCA - Time to Most Recent Common Ancestor

Question: Given molecular marker information from a pair of individuals, what is the estimated time back to their most recent common ancestor?

With even a small number of highly polymorphic autosomal markers, it is trivial to assess a TMRCA of zero generations (subject/biological sample) or one generation (parent-offspring).

Page 3: Bayesian Estimators of Time to Most Recent Common Ancestry

Problems with Autosomal Markers

Often we are very interested in MRCAs that are modest (5-10 generations) or large (100s to 10,000s of generations).

Unlinked autosomal markers simply don't work over these time scales.

Reason: IBD probabilities for individuals sharing a MRCA 5 or more generations ago are extremely small and hence very hard to estimate (need a VERY large number of markers).

Page 4: Bayesian Estimators of Time to Most Recent Common Ancestry

MRCA-I vs. MRCA-G

We need to distinguish between the MRCA for a pair of individuals (MRCA-I) and the MRCA for a particular genetic marker G (MRCA-G).

MRCA-G varies between any two individuals over recombination units.

For example, we could easily have for a pair of relatives
• MRCA (mtDNA) = 180 generations
• MRCA (Y) = 350 generations
• MRCA (one -globulin allele) = 90 generations
• MRCA (other -globulin allele) = 400 generations

Page 5: Bayesian Estimators of Time to Most Recent Common Ancestry

MRCA-G > MRCA-I

[Figure: genealogy of a pair of individuals marking MRCA-I, a lost lineage, and the MRCA-G of two different markers.]

Page 6: Bayesian Estimators of Time to Most Recent Common Ancestry

mtDNA and Y Chromosomes

So how can we accurately estimate TMRCA for a modest to large number of generations?

Answer: Use a set of completely linked markers.

With autosomes, unlinked markers assort each generation, leaving only a small amount of IBD information on each marker, which we must then multiply together. IBD information decays on the order of 1/2 each generation.

With completely linked marker loci, information on IBD does not assort away via recombination. IBD information decay is on the order of the mutation rate.

Page 7: Bayesian Estimators of Time to Most Recent Common Ancestry

Y chromosome microsatellite mutation rates - I

Estimate of u      Source                   Reference
0.0028             Y chromosome             Kayser et al. 2000
0.0021             Y chromosome             Heyer et al. 1997
0.001 - 0.0021     Autosomal chromosomes    Wong & Weber 1993; Brinkmann 1998

Estimates of the human mutation rate in microsatellites are fairly consistent over both the Y and the autosomes.

Page 8: Bayesian Estimators of Time to Most Recent Common Ancestry

Basic Structure of the Problem

What is the probability that the two marker alleles at a haploid locus from two related individuals agree, given that their MRCA was t generations ago?

Phrased another way, what is their probability of identity by state (IBS), given that they are identical by descent (IBD) with a TMRCA of t generations?

Page 9: Bayesian Estimators of Time to Most Recent Common Ancestry

Infinite Alleles Model

The first step in answering this question is to assume a particular mutational model. Our (initial) assumption will be the infinite alleles model (IAM).

The key assumption of this model (originally due to Kimura and Crow, 1964) is that each new mutation gives rise to a new allele.

The IAM was the first population-genetics model to attempt to formally incorporate the structure of DNA into a model.

Page 10: Bayesian Estimators of Time to Most Recent Common Ancestry

Key: Under the infinite alleles model, two alleles that are identical in state and are also IBD have not experienced any mutations since their MRCA.

Let q(t) = probability that two alleles with a MRCA t generations ago are identical in state.

If u = per-generation mutation rate, then

$$q(t) = (1-u)^{2t}$$

[Figure: two lineages, each t generations long, descending from the MRCA to alleles A and B; each lineage is labeled with probability $(1-u)^t$.]

Pr(No mutation from MRCA to A) = $(1-u)^t$

Pr(No mutation from MRCA to B) = $(1-u)^t$
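The match probability above is easy to tabulate. The following is a minimal sketch, not code from the slides: the function names `q_iam` and `q_iam_approx` are made up for illustration, and u = 0.002 is an assumed value in the range of the microsatellite estimates quoted earlier. It compares the exact form (1-u)^(2t) with the exponential approximation exp(-2ut).

```python
import math


def q_iam(t, u):
    """Exact IAM match probability: no mutation on either lineage of length t."""
    return (1.0 - u) ** (2 * t)


def q_iam_approx(t, u):
    """Exponential approximation exp(-2ut), close to the exact form when u is small."""
    return math.exp(-2.0 * u * t)


if __name__ == "__main__":
    u = 0.002  # assumed illustrative mutation rate
    for t in (1, 10, 100, 500):
        print(t, q_iam(t, u), q_iam_approx(t, u))
```

For small u the two columns agree closely, which is why the exponential form can stand in for the exact one in the likelihood.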

Page 11: Bayesian Estimators of Time to Most Recent Common Ancestry

Building the Likelihood Function for n Loci

For any single marker locus, the probability of IBS given a TMRCA of t generations is

$$q(t) = (1-u)^{2t} \approx e^{-2ut} = e^{-\tau}, \qquad \tau = 2ut$$

The probability that k of n marker loci are IBS is just

$$\Pr(k) = \frac{n!}{(n-k)!\,k!}\, q(t)^k \left[1 - q(t)\right]^{n-k}$$

a Binomial distribution with success parameter q(t).

Likelihood function for t given k of n matches:

$$L(t \mid n, k) = \frac{n!}{(n-k)!\,k!}\, e^{-k\tau}\left(1 - e^{-\tau}\right)^{n-k}$$
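As a concrete illustration of the binomial likelihood for k of n matches, here is a small sketch. The function name `likelihood` and the numeric values are assumptions for illustration; the formula is the IAM likelihood with q(t) = exp(-2ut).

```python
import math


def likelihood(t, n, k, u):
    """Binomial likelihood of k matches out of n loci at TMRCA t (IAM, q = exp(-2ut))."""
    q = math.exp(-2.0 * u * t)
    # Python evaluates 0.0 ** 0 as 1.0, so the k == n, t == 0 case returns 1.
    return math.comb(n, k) * q ** k * (1.0 - q) ** (n - k)


if __name__ == "__main__":
    # Assumed example: 18 of 20 markers match, u = 0.002.
    for t in (5, 25, 50, 100):
        print(t, likelihood(t, 20, 18, 0.002))
```

For fixed t the values sum to one over k = 0, ..., n, as they must for a binomial distribution.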

Page 12: Bayesian Estimators of Time to Most Recent Common Ancestry

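ML Analysis of TMRCA

It would seem that we now have all the pieces in hand for a likelihood analysis of TMRCA given the marker data (k of n matches).

Likelihood function ($\tau = 2ut$):

$$L(t \mid n, k) = \frac{n!}{(n-k)!\,k!}\, e^{-k\tau}\left(1 - e^{-\tau}\right)^{n-k}$$

The MLE for t is the solution of ∂L/∂t = 0:

$$\hat{\tau} = 2u\hat{t} = \ln\left(\frac{n}{k}\right) = \ln\left(\frac{1}{\hat{p}}\right) = -\ln(\hat{p})$$

where $\hat{p} = k/n$ is the fraction of matches.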

Page 13: Bayesian Estimators of Time to Most Recent Common Ancestry

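In particular, the MLE for t becomes

$$\hat{t} = \frac{1}{2u}\ln\left(\frac{n}{k}\right) = -\frac{1}{2u}\ln(\hat{p})$$

Likewise, the precision of this estimator follows from the (negative) inverse of the second derivative of the log-likelihood function evaluated at the MLE:

$$\mathrm{Var}(\hat{t}) = -\left(\left.\frac{\partial^2 \ln L(t \mid n, k)}{\partial t^2}\right|_{t=\hat{t}}\right)^{-1} = \frac{1}{4u^2}\,\frac{1}{n}\left(\frac{1-\hat{p}}{\hat{p}}\right)$$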
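The MLE and its large-sample variance can be computed directly from the fraction of matches. This is a sketch under the IAM likelihood; the function name `tmrca_mle` and the example counts are assumptions for illustration.

```python
import math


def tmrca_mle(n, k, u):
    """Return (t_hat, var_t_hat) for k matches out of n markers under the IAM."""
    p_hat = k / n                                   # fraction of matching markers
    t_hat = -math.log(p_hat) / (2.0 * u)            # t_hat = -(1/2u) ln(p_hat)
    var = (1.0 - p_hat) / (4.0 * u ** 2 * n * p_hat)  # (1/4u^2)(1/n)(1-p)/p
    return t_hat, var


if __name__ == "__main__":
    # Assumed example: 18 of 20 markers match, u = 0.002.
    t_hat, var = tmrca_mle(20, 18, 0.002)
    print(t_hat, math.sqrt(var))
```

Note that a complete match (k = n) returns an estimate of zero with zero variance for any n, which is the degenerate case the later slides examine.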

Page 14: Bayesian Estimators of Time to Most Recent Common Ancestry

Likewise, we can easily (numerically) find one-LOD support intervals for t and hence construct approximate 95% confidence intervals for TMRCA.

Finally, hypothesis testing, say H0: TMRCA = t0, is easily accomplished by taking -2 times the natural log of the ratio of the likelihood at t = t0 to the likelihood at the MLE $t = \hat{t}$.

The resulting log-likelihood ratio LR is (asymptotically) distributed as a chi-square with one degree of freedom.

Page 15: Bayesian Estimators of Time to Most Recent Common Ancestry

Trouble in Paradise

The ML machinery seems to have done its job, giving us an estimate, its approximate sampling error, approximate confidence intervals, and a scheme for hypothesis testing.

Hence, all seems well.

Problem: Look at k = n (a complete match at all markers).
• MLE (TMRCA) = 0 (independent of n)
• Var(MLE) = 0 (ouch!)

Page 16: Bayesian Estimators of Time to Most Recent Common Ancestry

With n = k, the value of the likelihood function is

$$L(t) = (1-u)^{2tn} \approx e^{-2tun}$$

What about one-LOD support intervals (approximate 95% CI)?

L has a maximum value of one at the MLE (t = 0).

Hence, any value of t that gives a likelihood value of 0.1 or larger is in the one-LOD support interval.

Solving, the one-LOD support interval runs from t = 0 to

$$t = \frac{1}{2n}\left[\frac{-\ln(10)}{\ln(1-u)}\right] \approx \frac{1}{n}\left[\frac{\ln(10)}{2u}\right]$$

For u = 0.002, the CI is (0, 575/n).
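The upper end of the one-LOD interval for a complete match can be checked numerically. This sketch evaluates both the exact expression and its small-u approximation; the function name `one_lod_upper` is hypothetical.

```python
import math


def one_lod_upper(n, u, exact=True):
    """Largest t with L(t) >= 0.1 when all n markers match (k = n)."""
    if exact:
        # Solve (1 - u)^(2 t n) = 0.1 for t.
        return -math.log(10.0) / (2.0 * n * math.log(1.0 - u))
    # Small-u approximation: exp(-2 t u n) = 0.1.
    return math.log(10.0) / (2.0 * n * u)


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, one_lod_upper(n, 0.002), one_lod_upper(n, 0.002, exact=False))
```

With u = 0.002 both forms give roughly 575/n, so the interval shrinks in proportion to the number of markers scored.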

Page 17: Bayesian Estimators of Time to Most Recent Common Ancestry

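With n = k, the likelihood function reduces to

$$L(t) = (1-u)^{2tn} \approx e^{-2tun}$$

[Figure: L(t) versus t for n = 5, 10, and 20 (plots for u = 0.002), with a horizontal line at 0.1 of the maximum value (1) of the likelihood function. MLE(t) = 0 for all values of n. One-LOD limits: t ≈ 29 for n = 20, t ≈ 58 for n = 10, t ≈ 115 for n = 5.]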

Page 18: Bayesian Estimators of Time to Most Recent Common Ancestry

What about hypothesis testing?

Again recall that for k = n the likelihood at t = t0 is $L(t_0) \approx \exp(-2t_0un)$.

Hence, the LR test statistic for H0: t = t0 is just

$$LR = -2\ln\left[\frac{L(t_0)}{L(0)}\right] = -2\ln\left[\frac{\exp(-2t_0un)}{1}\right] = 4t_0un$$

Thus the probability for the test that TMRCA = t0 is just $\Pr(\chi^2_1 > 4t_0un)$.
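The tail probability for a one-degree-of-freedom chi-square has a closed form in terms of the complementary error function, Pr(chi-square_1 > x) = erfc(sqrt(x/2)), so the test above needs only the standard library. The following is a sketch; the function name `lr_pvalue_full_match` is an assumption, and the calculation inherits the large-sample caveats raised on the next slide.

```python
import math


def lr_pvalue_full_match(t0, n, u):
    """Asymptotic p-value for H0: TMRCA = t0 when all n markers match (k = n)."""
    lr = 4.0 * t0 * u * n                    # LR statistic, 4 * t0 * u * n
    return math.erfc(math.sqrt(lr / 2.0))    # Pr(chi-square with 1 df > lr)


if __name__ == "__main__":
    # Assumed example: n = 10 matching markers, u = 0.002.
    for t0 in (10, 25, 48, 100):
        print(t0, lr_pvalue_full_match(t0, 10, 0.002))
```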

Page 19: Bayesian Estimators of Time to Most Recent Common Ancestry

The problem(s) with ML

The expressions developed for the sampling variance, approximate confidence intervals, and hypothesis testing are all large-sample approximations.

Problem 1: Here our sample size is the number of markers scored in the two individuals. It is not likely to be large.

Problem 2: These expressions are obtained by taking appropriate limits of the likelihood function. If the MLE is exactly at the boundary of the admissible space on the likelihood surface, this limit may not formally exist, and hence the above approximations are incorrect.

Page 20: Bayesian Estimators of Time to Most Recent Common Ancestry

The Solution? Go Bayesian

An extension of likelihood is Bayesian statistics.

Instead of simply obtaining a point estimate (e.g., the MLE), the goal is to estimate the entire distribution for the unknown parameter θ given the data x:

$$p(\theta \mid x) = C \cdot l(x \mid \theta)\, p(\theta)$$

• p(θ | x): posterior distribution of θ given x
• l(x | θ): likelihood function for θ given the data x
• p(θ): prior distribution for θ
• C: the appropriate constant so that the posterior integrates to one

Why Bayesian?
• Exact for any sample size
• Marginal posteriors
• Efficient use of any prior information
• MCMC (such as Gibbs sampling) methods

Page 21: Bayesian Estimators of Time to Most Recent Common Ancestry

The Prior on TMRCA

The first step in any Bayesian analysis is the choice of an appropriate prior distribution p(t): our thoughts on the distribution of TMRCA in the absence of any of the marker data.

Standard approach: Use a flat or uninformative prior, with p(t) = a constant over the admissible range of the parameter. This can cause problems if the likelihood function is unbounded (integrates to infinity).

In our case, population-genetic theory provides the prior: under very general settings, the time to MRCA for a pair of individuals follows a geometric distribution.

Page 22: Bayesian Estimators of Time to Most Recent Common Ancestry

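In particular, for a haploid gene, TMRCA follows a geometric distribution with success parameter 1/Ne (and hence mean Ne).

Hence, our prior is just

$$p(t) = \lambda(1-\lambda)^t \approx \lambda e^{-\lambda t}, \qquad \lambda = 1/N_e$$

Hence, we can use an exponential prior with hyperparameter (the parameter fully characterizing the distribution) λ = 1/Ne.

The posterior thus becomes

$$p(t \mid k) \propto L(t \mid n, k)\, p(t) \propto \exp[-(2uk + \lambda)t]\,\left(1 - \exp[-2ut]\right)^{n-k}$$

• The factors $e^{-2ukt}(1 - e^{-2ut})^{n-k}$ are the previous likelihood function (ignoring constants that cancel when we compute the normalizing factor C).
• The factor $e^{-\lambda t}$ is the prior, with prior hyperparameter λ = 1/Ne.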

Page 23: Bayesian Estimators of Time to Most Recent Common Ancestry

The Normalizing Constant

$$p(t \mid k) = \frac{\exp[-(2uk + \lambda)t]\,\left(1 - \exp[-2ut]\right)^{n-k}}{I(u, k, n, \lambda)}$$

where

$$I(u, k, n, \lambda) = \int_0^\infty \exp[-(2uk + \lambda)t]\,\left(1 - \exp[-2ut]\right)^{n-k}\, dt$$

I ensures that the posterior distribution integrates to one, and hence is formally a probability distribution.

Page 24: Bayesian Estimators of Time to Most Recent Common Ancestry

What is the effect of the hyperparameter?

$$p(t \mid k) = \frac{\exp[-(2uk + \lambda)t]\,\left(1 - \exp[-2ut]\right)^{n-k}}{I(u, k, n, \lambda)}$$

If 2uk >> λ, then there is essentially no dependence on the actual value of λ chosen.

Hence, if 2 Ne u k >> 1, there is essentially no dependence on the (hyperparameter) assumptions of the prior.

For a typical microsatellite rate of u = 0.002, this is just Ne k >> 250, which is a very weak assumption. For example, with k = 10 matches, we need Ne >> 25. Even with only 1 match (k = 1), we just require Ne >> 250.

Page 25: Bayesian Estimators of Time to Most Recent Common Ancestry

Closed-form Solutions for the Posterior Distribution

Complete analytic solutions for the posterior can be obtained by using a series (binomial) expansion of the $(1 - e^{-2ut})^{n-k}$ term to give

$$\exp[-(2uk + \lambda)t]\,\left(1 - \exp[-2ut]\right)^{n-k} = \exp[-(2uk + \lambda)t]\left(\sum_{i=0}^{n-k} (-1)^i \frac{(n-k)!}{i!\,(n-k-i)!}\exp[-2uti]\right)$$

$$= \sum_{i=0}^{n-k} (-1)^i \frac{(n-k)!}{i!\,(n-k-i)!}\exp[-(2u(k+i) + \lambda)t]$$

Each term is just of the form $a\,e^{bt}$, which is easily integrated.

Page 26: Bayesian Estimators of Time to Most Recent Common Ancestry

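$$I(u, k, n, \lambda) = \sum_{i=0}^{n-k} (-1)^i \frac{(n-k)!}{i!\,(n-k-i)!} \int_0^\infty \exp[-(2u(k+i) + \lambda)t]\, dt$$

$$= \sum_{i=0}^{n-k} (-1)^i \frac{(n-k)!}{i!\,(n-k-i)!}\, \frac{1}{2u(k+i) + \lambda} = \frac{2^{n-k}\,(n-k)!\,u^{n-k}}{\prod_{i=0}^{n-k}\left[\lambda + 2u(n-i)\right]}$$

With the assumption of a flat prior, λ = 0, this reduces to

$$I(u, k, n, 0) = \frac{(n-k)!\,(k-1)!}{(2u)\,n!}$$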
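The agreement between the alternating sum, the product form, and the flat-prior special case can be checked numerically. This is a sketch with hypothetical function names (`I_sum`, `I_product`, `I_flat`); the values n = 10, k = 8, u = 0.002 and λ = 1/5000 are assumed purely for the check. `math.prod` requires Python 3.8 or later.

```python
import math


def I_sum(u, k, n, lam):
    """Normalizing constant as the term-by-term integrated alternating sum."""
    m = n - k
    return sum((-1) ** i * math.comb(m, i) / (2.0 * u * (k + i) + lam)
               for i in range(m + 1))


def I_product(u, k, n, lam):
    """Same constant in product form: (2u)^(n-k) (n-k)! / prod[lam + 2u(n-i)]."""
    m = n - k
    denom = math.prod(lam + 2.0 * u * (n - i) for i in range(m + 1))
    return (2.0 * u) ** m * math.factorial(m) / denom


def I_flat(u, k, n):
    """Flat-prior (lam = 0) special case: (n-k)! (k-1)! / (2u n!)."""
    return (math.factorial(n - k) * math.factorial(k - 1)
            / (2.0 * u * math.factorial(n)))


if __name__ == "__main__":
    print(I_sum(0.002, 8, 10, 1 / 5000), I_product(0.002, 8, 10, 1 / 5000))
    print(I_product(0.002, 8, 10, 0.0), I_flat(0.002, 8, 10))
```

The product form is the safer one to use in practice, since the alternating sum loses precision to cancellation as n - k grows.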

Page 27: Bayesian Estimators of Time to Most Recent Common Ancestry

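Hence, the complete analytic solution of the posterior is

$$p(t \mid k, \lambda) = \left(\frac{\prod_{i=0}^{n-k}\left[\lambda + 2u(n-i)\right]}{2^{n-k}\,(n-k)!\,u^{n-k}}\right) \frac{\left(1 - \exp[-2ut]\right)^{n-k}}{\exp[t(2uk + \lambda)]}$$

Suppose k = n (no mismatches). In this case, the posterior is simply an exponential distribution with rate 2un + λ (mean 1/(2un + λ)):

$$p(t \mid k = n) = (\lambda + 2nu)\exp[-(2un + \lambda)t]$$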
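A short sketch of the closed-form posterior density follows. The function name `posterior` and the example values are assumptions for illustration; the normalizing constant is the product form of I from the preceding slide.

```python
import math


def posterior(t, n, k, u, lam=0.0):
    """Closed-form posterior density p(t | k) under the IAM with an exponential prior."""
    m = n - k
    # 1 / I(u, k, n, lam), using the product form of the normalizing constant.
    norm = (math.prod(lam + 2.0 * u * (n - i) for i in range(m + 1))
            / ((2.0 * u) ** m * math.factorial(m)))
    return norm * (1.0 - math.exp(-2.0 * u * t)) ** m * math.exp(-(2.0 * u * k + lam) * t)


if __name__ == "__main__":
    # Assumed example: n = 10 markers, u = 0.002, Ne = 5000.
    for k in (10, 9, 8):
        print(k, [posterior(t, 10, k, 0.002, 1 / 5000) for t in (0, 25, 50, 100)])
```

With k = n the density at t = 0 equals the rate λ + 2nu, matching the exponential special case; with any mismatch the density is zero at t = 0.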

Page 28: Bayesian Estimators of Time to Most Recent Common Ancestry

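Analysis of n = k case

Mean TMRCA and its standard deviation (equal for an exponential distribution):

$$\mu_t = \sigma_t = \frac{1}{\lambda + 2nu} \simeq \frac{1}{2nu}$$

Cumulative probability:

$$\Pr(t \le T) = \int_0^T p(t \mid k = n)\, dt = 1 - \exp(-(2un + \lambda)T)$$

In particular, the time $T_\alpha$ satisfying $\Pr(t < T_\alpha) = \alpha$ is

$$T_\alpha = \frac{-\ln(1 - \alpha)}{2un + \lambda}$$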
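The quantile formula for the complete-match case is a one-liner. This sketch uses a hypothetical function name `tmrca_quantile`; u = 0.002 and a flat prior (λ = 0) are assumed for the example.

```python
import math


def tmrca_quantile(alpha, n, u, lam=0.0):
    """Time T with Pr(t < T) = alpha when all n markers match (exponential posterior)."""
    return -math.log(1.0 - alpha) / (2.0 * u * n + lam)


if __name__ == "__main__":
    for n in (10, 20, 50):
        print(n, tmrca_quantile(0.95, n, 0.002))
```

Setting n = 1 with a flat prior gives the constant of about 749 that scales as 1/n for the 95% upper limit.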

Page 29: Bayesian Estimators of Time to Most Recent Common Ancestry

For a flat prior (λ = 0), the 95% (one-sided) confidence interval is thus given by -ln(0.05)/(2nu) ≈ 1.50/(nu).

Hence, under a Bayesian analysis for u = 0.002, the 95% upper confidence limit is ≈ 749/n.

Recall that the one-LOD support interval (approximate 95% CI) under an ML analysis is ≈ 575/n.

The ML solution's asymptotic approximation significantly underestimates the true interval relative to the exact analysis under a full Bayesian model.

Page 30: Bayesian Estimators of Time to Most Recent Common Ancestry

Why the difference?

Under ML, we plot the likelihood function and look for the 0.1 value.

Under a Bayesian analysis, we look at the posterior probability distribution (likelihood adjusted to integrate to one) and find the values that give an area of 0.95.

[Figure: Pr(TMRCA < t) versus t. For n = 20, the area to the left of t = 38 is 0.95 (t0.95 = 38); for n = 10, the area to the left of t = 75 is 0.95 (t0.95 = 75).]

Page 31: Bayesian Estimators of Time to Most Recent Common Ancestry

Sample Posteriors for u = 0.002

[Figures: posterior density p(t | k) versus time t to MRCA, one curve per value of k. Panels: n = 10 (k = 5 to 10); n = 20 (k = 15 to 20); n = 50 (k = 45 to 50); n = 100 (k = 95 to 100); n = 100 (k = 89 to 94).]

Page 32: Bayesian Estimators of Time to Most Recent Common Ancestry

Key points

• By using the appropriate number of markers we can get accurate estimates of TMRCA for even just a few generations. 20-50 markers will do.

• By using markers on a non-recombining chromosomal section, we can estimate TMRCA over much, much longer time scales than with unlinked autosomal markers.

• Hence, we have a fairly large window of resolution for TMRCA when using a modest number of completely linked markers.

Page 33: Bayesian Estimators of Time to Most Recent Common Ancestry

Extensions I: Different Mutation Rates

Let marker locus k have mutation rate $u_k$.

Code the observations as $x_k = 1$ if a match, otherwise code $x_k = 0$.

The posterior becomes:

$$p(t \mid x) \propto \exp\left[-t\left(\lambda + 2\sum_{k=1}^{n} u_k x_k\right)\right] \prod_{k=1}^{n}\left[1 - e^{-2tu_k}\right]^{1 - x_k}$$
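The per-locus version of the posterior can be sketched in a few lines. The function name `posterior_unnormalized` and the example rates are assumptions for illustration; the result is only proportional to the posterior and would still need to be normalized (numerically, for instance).

```python
import math


def posterior_unnormalized(t, rates, matches, lam=0.0):
    """Unnormalized posterior at time t with locus-specific mutation rates.

    rates[k] is u_k; matches[k] is x_k (1 for a match, 0 for a mismatch).
    """
    expo = lam
    mismatch_term = 1.0
    for u_k, x_k in zip(rates, matches):
        if x_k == 1:
            expo += 2.0 * u_k                                # matches add to the decay rate
        else:
            mismatch_term *= 1.0 - math.exp(-2.0 * u_k * t)  # mismatches need a mutation
    return math.exp(-expo * t) * mismatch_term


if __name__ == "__main__":
    rates = [0.001, 0.002, 0.002, 0.003, 0.004]   # assumed per-locus rates
    matches = [1, 1, 1, 0, 1]
    print([posterior_unnormalized(t, rates, matches, 1 / 5000) for t in (10, 50, 100)])
```

When every locus has the same rate u, this reduces to the earlier single-rate form with k equal to the number of matches.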

Page 34: Bayesian Estimators of Time to Most Recent Common Ancestry

Stepwise Mutation Model (SMM)

The infinite alleles model (IAM) is not especially realistic with microsatellite data, unless the fraction of matches is very high.

Microsatellite allelic variants are scored by their number of repeat units. Hence, two "matching" alleles can actually hide multiple mutations (and hence more time to the MRCA).

[Figure: two lineages ending in the same repeat number after Mutation 1 and Mutation 2. Under the IAM, this is scored as a match, and hence no mutations. In reality, there are two mutations.]

Page 35: Bayesian Estimators of Time to Most Recent Common Ancestry

Y chromosome microsatellite mutation rates - II

The SMM model is an attempt to correct for multiple hits by fully accounting for the mutational structure.

Good fit to array sizes in natural populations when assuming the symmetric single-step model:
• Equal probability of a one-step move up or down

In direct studies of (Y chromosome) microsatellites, 35 of 37 detected mutations in pedigrees were single-step; the other 2 were two-step.

Page 36: Bayesian Estimators of Time to Most Recent Common Ancestry

SMM0 model -- match/no match under SMM

The simplest implementation of the SMM model is to replace the match probabilities q(t) under the IAM model with those for the SMM model.

This simply codes the marker loci as match / no match. We refer to this as the SMM0 model.

Page 37: Bayesian Estimators of Time to Most Recent Common Ancestry

Formally, the SMM model assumes the following transition probabilities:

$$\Pr(X(t+1) = i-1 \mid X(t) = i) = \Pr(X(t+1) = i+1 \mid X(t) = i) = \frac{u}{2}$$

$$\Pr(X(t+1) = i \mid X(t) = i) = 1 - u$$

$$\Pr(|X(t+1) - X(t)| \ge 2 \mid X(t) = i) = 0$$

Note that two alleles can match only if they have experienced an even number of mutations in total between them. In such cases, the match probabilities become

$$\Pr(\text{match} \mid 2M \text{ moves}) = \frac{1}{2^{2M}}\binom{2M}{M} = \frac{1}{2^{2M}}\,\frac{(2M)!}{(M!)^2}$$

Page 38: Bayesian Estimators of Time to Most Recent Common Ancestry

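$$\Pr(\text{match} \mid 2M \text{ moves}) = \frac{1}{2^{2M}}\binom{2M}{M} = \frac{1}{2^{2M}}\,\frac{(2M)!}{(M!)^2}$$

Number of mutations    Prob(Match)
2                      0.500
4                      0.375
6                      0.313
8                      0.273
10                     0.246

$$q(t) = \sum_{M=0}^{\infty} \Pr(\text{match} \mid 2M \text{ moves})\,\Pr(2M \text{ moves} \mid t) = \sum_{M=0}^{\infty}\left(\frac{1}{2^{2M}}\,\frac{(2M)!}{(M!)^2}\right)\left(\frac{(2ut)^{2M}}{(2M)!}\right)\exp(-2ut)$$

$$= \exp(-2ut)\left(\sum_{M=0}^{\infty}\frac{(ut)^{2M}}{(M!)^2}\right)$$

Using the identity

$$\sum_{k=0}^{\infty}\frac{x^{2k}}{(k!)^2} = I_0(2x)$$

where $I_0$ is the zero-order modified Type I Bessel function, we have (with $\tau = 2ut$)

$$q(\tau) = \exp(-\tau)\, I_0(\tau)$$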
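Both the conditional match probabilities in the table and the Bessel-function form of q can be reproduced with the standard library. This is a sketch, not the author's code: `bessel_i`, `match_given_moves` and `q_smm` are hypothetical names, and the Bessel function is evaluated by its power series rather than a library routine (a numerical library such as SciPy could be swapped in).

```python
import math


def bessel_i(order, x, terms=200):
    """Modified Bessel function of the first kind, I_order(x), by its power series."""
    half = x / 2.0
    term = half ** order / math.factorial(order)   # m = 0 term of the series
    total = 0.0
    for m in range(terms):
        total += term
        term *= half * half / ((m + 1) * (m + 1 + order))  # ratio of successive terms
    return total


def match_given_moves(M):
    """Pr(match | 2M moves) = C(2M, M) / 2^(2M) under symmetric single steps."""
    return math.comb(2 * M, M) / 4 ** M


def q_smm(tau):
    """SMM match probability q(tau) = exp(-tau) * I_0(tau), with tau = 2ut."""
    return math.exp(-tau) * bessel_i(0, tau)


if __name__ == "__main__":
    print([match_given_moves(M) for M in range(1, 6)])
    for tau in (0.1, 0.5, 1.0, 5.0):
        print(tau, q_smm(tau), math.exp(-tau))  # stepwise versus infinite-alleles
```

For every tau > 0 the stepwise value exceeds the infinite-alleles value exp(-tau), reflecting matches that conceal compensating mutations.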

Page 39: Bayesian Estimators of Time to Most Recent Common Ancestry

[Figure: match probability q(t) versus τ = 2μt under the infinite-alleles and stepwise models, in two panels: τ from 0 to 1 and τ from 0 to 20.]

Page 40: Bayesian Estimators of Time to Most Recent Common Ancestry

Under the SMM model, the prior hyperparameter can now become important.

This is the case when the number n of markers is small and/or k/n is not very close to one.

Why? Under the prior, TMRCA is constrained by a geometric distribution with parameter 1/Ne. Under the IAM model, for most values this still allows much more time than the likelihood function predicts from the marker data.

Under the SMM model, the likelihood alone predicts a much longer time, so that the constraining effect of the initial geometric can come into play.

Page 41: Bayesian Estimators of Time to Most Recent Common Ancestry

[Figure: Pr(TMRCA < t) versus time t for n = 5, k = 3, u = 0.02. Curves: IAM (both flat prior and Ne = 5000); SMM0, Ne = 5000; SMM0, flat prior; prior with Ne = 5000.]

Page 42: Bayesian Estimators of Time to Most Recent Common Ancestry

An Exact Treatment: SMME

With a little work we can show that the probability that two sites differ by j steps is just

$$q^{(j)}(\tau) = 2\exp(-\tau)\, I_j(\tau) \quad \text{for } j \ge 1$$

where $I_j$ is the jth-order modified Type I Bessel function.

The resulting likelihood thus becomes

$$L(t \mid n_0, \ldots, n_k) = \frac{n!}{n_0!\, n_1! \cdots n_k!} \prod_{j=0}^{k}\left[q^{(j)}(2ut)\right]^{n_j}$$

where $n_j$ is the number of sites that differ by j (observed) steps.

Page 43: Bayesian Estimators of Time to Most Recent Common Ancestry

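With this likelihood, the resulting posterior becomes

$$p(t \mid n_0, \ldots, n_k) \propto \prod_{j=0}^{k}\left[q^{(j)}(2ut)\right]^{n_j} e^{-\lambda t}$$

This rearranges to give the general posterior under the Exact SMM model (SMME) as

$$p(t \mid n_0, \ldots, n_k) = \frac{e^{-(\lambda + 2un)t}\prod_{j=0}^{k}\left[I_j(2ut)\right]^{n_j}}{\int_0^\infty e^{-(\lambda + 2un)t}\prod_{j=0}^{k}\left[I_j(2ut)\right]^{n_j}\, dt}$$

Here $n_0$ is the number of exact matches and $n_k$ is the number of k-step differences.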
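Because the SMME normalizing integral has no simple closed form here, the posterior can be handled numerically on a grid. The following is a rough sketch under stated assumptions: the function names are hypothetical, the Bessel functions are evaluated by power series, the integral is truncated at an arbitrary `t_max` and approximated by the trapezoid rule, and u = 0.002 is an assumed mutation rate that the slides do not tie to any particular data set.

```python
import math


def bessel_i(order, x, terms=200):
    """Modified Bessel function of the first kind, I_order(x), by its power series."""
    half = x / 2.0
    term = half ** order / math.factorial(order)
    total = 0.0
    for m in range(terms):
        total += term
        term *= half * half / ((m + 1) * (m + 1 + order))
    return total


def smme_unnormalized(t, counts, u, lam):
    """exp(-(lam + 2un)t) * prod_j I_j(2ut)^n_j, with counts = {j: n_j}."""
    n = sum(counts.values())
    value = math.exp(-(lam + 2.0 * u * n) * t)
    for j, n_j in counts.items():
        value *= bessel_i(j, 2.0 * u * t) ** n_j
    return value


def smme_posterior_mean(counts, u, lam, t_max=10000.0, step=2.0):
    """Posterior mean of t from a trapezoid-rule grid on [0, t_max] (truncated)."""
    steps = int(t_max / step)
    ts = [i * step for i in range(steps + 1)]
    dens = [smme_unnormalized(t, counts, u, lam) for t in ts]
    weights = [0.5 if i in (0, steps) else 1.0 for i in range(steps + 1)]
    norm = sum(w * d for w, d in zip(weights, dens))
    return sum(w * d * t for w, d, t in zip(weights, dens, ts)) / norm


if __name__ == "__main__":
    # Assumed inputs: four exact matches, one 1-step and one 2-step difference.
    counts = {0: 4, 1: 1, 2: 1}
    print(smme_posterior_mean(counts, u=0.002, lam=1 / 5000))
```

The result depends on the assumed u, the truncation point, and the grid spacing, so it should be read as an illustration of the calculation rather than a reproduction of any reported value.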

Page 44: Bayesian Estimators of Time to Most Recent Common Ancestry

Example

Consider comparing haplotypes 1 and 3 from Thomas et al.'s (2000) study on the Lemba and Cohen Y chromosome modal haplotypes. Here six markers were used: four exactly match, one differs by one repeat, and the other by two repeats.

Hence, n = 6, k = 4 for the IAM and SMM0 models.

n0 = 4, n1 = 1, n2 = 1, n = 6 under the SMME model.

Assume Hammer's value of Ne = 5000 for the prior.

Page 45: Bayesian Estimators of Time to Most Recent Common Ancestry

TMRCA for Lemba and Cohen Y

[Figure: posterior density P(t | markers) versus time to MRCA, t, for the IAM, SMM0, and SMME models.]

Model used    Mean     Median    2.5%    97.5%
IAM           152.3    135.4     31.1    371
SMM0          454.7    233.7     40.4    2389
SMME          422.3    286.2     65.1    1631

Page 46: Bayesian Estimators of Time to Most Recent Common Ancestry

[Figure: cumulative posterior Pr(TMRCA < t) versus time to MRCA, t, for the IAM, SMM0, and SMME models.]

Page 47: Bayesian Estimators of Time to Most Recent Common Ancestry
Page 48: Bayesian Estimators of Time to Most Recent Common Ancestry

Technology Transfer

Family Tree DNA (ftDNA) provides Y chromosome marker kits for genealogical studies.

So far, ftDNA has processed over 10,000 such kits.

This amounts to gross revenue of roughly 3 million dollars.

The expressions developed above have direct commercial applications.

Page 49: Bayesian Estimators of Time to Most Recent Common Ancestry

Forensic applications of the Y

• A not uncommon situation is that the only DNA is from fingernail scrapings.

• The result is a mixture wherein the victim's DNA often overwhelms the DNA of the perpetrator.

• Result: only a modest match probability, as many autosomal markers cannot be detected.

• One solution: use Y chromosome markers. These are easily amplified over the (female) background.

Page 50: Bayesian Estimators of Time to Most Recent Common Ancestry

Problem: How do we combine a Y match with an autosomal match?

NRC 1996 recommendations (autosomal loci):
• Product rule across markers
• Population substructure correction within markers

Prob(Y match) * Prob(autosomal match)

Problem: Y markers may provide information about population substructure membership. For example, a particular haplotype may be restricted to a certain subpopulation, e.g., Native Americans.

Page 51: Bayesian Estimators of Time to Most Recent Common Ancestry

Correcting for Y substructure

Let y denote the observed Y haplotype and A the multilocus autosomal marker genotype.

P(y,A) = P(A | y)*P(y)

Simple approach: knowledge of y indicates membership in a particular subpopulation, with P(A | y) computed using the allele frequencies for that subpopulation.

Suggestion: Multiply freq(y) * max freq(A) over subgroups.

Page 52: Bayesian Estimators of Time to Most Recent Common Ancestry

A more precise accounting

Suppose two individuals share the same y haplotype. What is their average coancestry, θ?

Balding and Nichols give expressions for autosomal single-locus genotype frequencies given that the population shows structure with coancestry θ.

Second approach: Compute θ from haplotype matching, then use this value in the Balding-Nichols expressions to compute (single-locus) autosomal frequencies.

Page 53: Bayesian Estimators of Time to Most Recent Common Ancestry

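Posterior distribution for a match at all n markers with a prior of λ = 1/Ne:

$$P(t \mid t \ge k) = \frac{(1-u)^{2nt}\,(1-\lambda)^{t-1}}{\sum_{\tau=k}^{\infty}(1-u)^{2n\tau}\,(1-\lambda)^{\tau-1}}$$

With a MRCA of t generations, $\theta = (1/2)^{2t+1}$, so

$$E[\theta \mid t \ge k] = \sum_{t=k}^{\infty}\left(\frac{1}{2^{2t+1}}\right) P(t \mid t \ge k)$$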

Page 54: Bayesian Estimators of Time to Most Recent Common Ancestry

The typical situation is one where we can exclude father-son and paternal half-sib relationships (k = 2):

$$E[\theta \mid t \ge 2] \simeq \frac{\lambda + 2nu}{24 - 8(\lambda + 2nu)}$$

Typical values: n = 11, u = 1/500

• For λ = 1/5000, E[θ] = 0.00186
• For λ = 1/500, E[θ] = 0.00194
• For λ = 1/50, E[θ] = 0.00272

For these values, unless p_i < 0.01, the Balding-Nichols expressions are essentially Hardy-Weinberg (HW).

Page 55: Bayesian Estimators of Time to Most Recent Common Ancestry

Formal procedure

Estimate P(y) from a database (counting methods, Bayesian estimators).

Compute multilocus autosomal frequencies for each major ethnic group using the product of the single-locus genotype frequencies, computed using group-specific allele frequencies and a θ = 0.002 correction.

Conservative P(y,A) = P(y)*max P(A)

Page 56: Bayesian Estimators of Time to Most Recent Common Ancestry

Thanks to assistance from

Jay Taylor
Mike Hammer
Matt Kaplan