Ancestry of Alleles and Extinction of Genes


  • 7/30/2019 Ancestry of Alleles and Extinction of Genes


    Zoo Biology 5:161-170 (1986)

    Ancestry of Alleles and Extinction of Genes in Populations With Defined Pedigrees

    E.A. Thompson

    Statistical Laboratory, University of Cambridge

    The paper presents, without mathematical formalities, two computational approaches to the analysis of pedigrees. One provides a method for inferring the ancestry of rare alleles observed in a population. The other provides the probabilities that specified combinations of founder genes survive to the current population. The emphasis in both approaches is upon the importance of joint probabilities. If some founder genes survive, others have smaller probability of doing so also. If an allele descends from a given founder to a current individual, it is likely to appear also in individuals sharing the same lines of descent.

    Key words: complex pedigrees, rare alleles, gene extinction, joint probabilities

    INTRODUCTION

    Where the whole of a population or even of a species descends from few original founders via complex paths of descent, several questions arise. Often there are rare alleles segregating in the pedigree; it is of interest to infer the original contributor of such alleles. This is particularly the case for recessive alleles, where different inferences as to the origin provide different assessments of the current distribution of unobservable carriers. A second set of questions concerns the extinction of genes. At a given autosomal locus, how many distinct founder genes survive? How does survival of certain founder genes affect that of others? Within a defined pedigree, homologous genes are competing for survival, in the sense that only one can be passed on at any given segregation. On the other hand, some genes must survive in an extant population. The importance of joint assessments is summarised by the following apt quotation (concerning a crossword competition) from a national daily paper:

    How many of this year's 3600 entrants can reasonably expect to be amongst the first three? Probably about ten.

    Received September 17, 1985; accepted October 22, 1985.

    Address reprint requests to Dr. E.A. Thompson, Department of Statistics, GN22, University of Washington, Seattle, Washington 98195

    © 1986 Alan R. Liss, Inc.


    Questions of the above kind have been studied in the context of genetically isolated human populations. On Tristan da Cunha, the whole of the population of 268 individuals descends mainly from eleven early founders [Roberts, 1971]. Inferences about the alleles contributed by some of these are made by Thompson [1978]. For a Mennonite-Amish genealogy, the question was of the ancestral origin of a recessive allele [Kidd et al., 1980]. A method facilitating joint computations was given by Thompson [1983a]. In a Newfoundland genealogy [Buehler et al., 1975], the questions were of validity of a single-locus genetic model for an apparently recessive trait, and of estimates of the individuals carrying or having carried the recessive allele [Thompson, 1981]. The aim here is to present the general methodology to those interested in the pedigrees of animal populations.

    Animal pedigrees differ in detailed structure from human genealogies. The former often show some individuals with very large numbers of mates and close inbreeding. However, neither of these factors complicates the computational procedures. A more serious problem is multiple loose inbreeding resulting in lengthy and entangled loops in the genealogy, but this occurs also in human genealogies. Another feature of animal pedigrees is extensive cross-generation mating, which in human genealogies is more limited in depth. This feature can cause added computational complexity. The subject of drawing genealogies is not at issue here, but it will be convenient to introduce the marriage node graph representation. An example is given in Figure 1.

    The lines represent individuals: an individual connects his parents' marriage to his own. Where an individual is involved in matings with different individuals, a special symbol () is used at the connecting point of the relevant arcs. For large and complex pedigrees, this representation greatly facilitates the tracing of ancestry of individuals, and the graph-definition of the structure has been used in programs for computations on such pedigrees [Thompson, 1980].

    ANCESTRY OF ALLELES

    The classical coefficient of kinship between two individuals is well known to both animal and human geneticists. The simplest overall summary of a relationship, it is defined as the probability that two randomly chosen homologous genes, one from each individual, are identical by descent; that is, are copies of a single gene in some common ancestor, received by each via repeated segregations. Wright [1922] first proposed a method for computing kinship coefficients, and there have been many approaches over the years. However, the advent of recursive block-structure programming languages has resulted in the computational feasibility of a very simple recursive algorithm.

    Provided A is not B, nor an ancestor of B, the kinship coefficient $\psi(A,B)$ between A and B satisfies the following well-known equation:

    $$\psi(A,B) = \tfrac{1}{2}\left[\psi(M_A,B) + \psi(F_A,B)\right], \qquad (1)$$

    where $M_A$ and $F_A$ are the parents of A. If A is an ancestor of B, the symmetry of kinship coefficients allows the equation to be instead applied to B;

    $$\psi(A,B) = \psi(B,A). \qquad (2)$$


    Fig. 1. Example of a small pedigree (a), with its marriage node graph representation (b). Both panels carry the label 'Current' Individuals.


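    Further, if A is B, we have the equation

    $$\psi(A,A) = \tfrac{1}{2}\left[1 + \psi(M_A,F_A)\right], \qquad (3)$$

    while if A is a founder ($M_A = F_A = 0$), not an ancestor of B,

    $$\psi(A,B) = 0 \quad \text{and} \quad \psi(A,A) = \tfrac{1}{2}. \qquad (4)$$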
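    The recursion for the kinship coefficient can be sketched in a few lines of Python. This is an illustrative sketch only, not the program referred to in the paper: the pedigree `PARENTS` is hypothetical, unknown parents are coded as `None` rather than 0, and the helper `depth` (generations below the founders) is an assumption introduced here to decide which argument cannot be an ancestor of the other.

```python
from functools import lru_cache

# Hypothetical toy pedigree: individual -> (mother, father).
# Founders have (None, None); the paper codes this as M_A = F_A = 0.
PARENTS = {
    "A": (None, None),
    "B": (None, None),
    "C": ("A", "B"),
    "D": ("A", "B"),
    "E": ("C", "D"),  # offspring of a full-sib mating
}


@lru_cache(maxsize=None)
def depth(x):
    """Generations below the founders; an ancestor always has smaller depth."""
    mother, father = PARENTS[x]
    if mother is None:
        return 0
    return 1 + max(depth(mother), depth(father))


@lru_cache(maxsize=None)
def kinship(a, b):
    """Kinship coefficient between individuals a and b."""
    if a == b:
        mother, father = PARENTS[a]
        if mother is None:
            return 0.5                           # founder with itself, eq. (4)
        return 0.5 * (1.0 + kinship(mother, father))  # eq. (3)
    # Symmetry, eq. (2): expand the deeper individual, which cannot be
    # an ancestor of the other one.
    if depth(a) < depth(b):
        a, b = b, a
    mother, father = PARENTS[a]
    if mother is None:
        return 0.0                               # two distinct founders, eq. (4)
    return 0.5 * (kinship(mother, b) + kinship(father, b))  # eq. (1)


print(kinship("C", "D"))  # full sibs
print(kinship("E", "E"))  # 0.5 * (1 + inbreeding coefficient of E)
```

    Memoising with `lru_cache` is a convenience of this sketch; it avoids recomputing the same pair many times in pedigrees with many loops.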

    The four equations (1) to (4) are sufficient to define the kinship coefficient between any two individuals, and can be implemented in a recursive language precisely as they stand. In fact, together with arrays or pointers providing the father and mother of each individual, they essentially form a program.

    Karigl [1981] generalised kinship coefficients to larger numbers of individuals. That precise generalisation is not relevant here, but leads to the following consideration. Let $g_r(B_1, \ldots, B_r : A)$ denote the probability that a set of r homologous genes, one chosen from each of the individuals $B_1, \ldots, B_r$, all descend from a specified set of founder genes, denoted A. (Note that the genes need not be identical by descent: they may descend from different genes within A.) The index r is not an integral part of the function, but is convenient for clarity and facilitates programming. Now these descent functions satisfy equations precisely analogous to equations (1) to (4). Thompson [1983a] explains the derivation, while Thompson [1983b] gives the general formula. If $B_1$ neither is, nor is an ancestor of, any of $B_2, \ldots, B_r$,

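    $$g_r(B_1, B_2, \ldots, B_r : A) = \tfrac{1}{2}\left[g_r(M_1, B_2, \ldots, B_r : A) + g_r(F_1, B_2, \ldots, B_r : A)\right], \qquad (1')$$

    where $M_1$ and $F_1$ are the parents of $B_1$. The function $g_r$ is symmetric in all its arguments, (2')

    and if the individual whose parents are next to be considered is repeated k times (k > 1),

    $$g_r(\underbrace{B_1, \ldots, B_1}_{k}, B_{k+1}, \ldots, B_r : A) = \left(\tfrac{1}{2}\right)^{k}\left[g_{r-k+1}(M_1, B_{k+1}, \ldots, B_r : A) + g_{r-k+1}(F_1, B_{k+1}, \ldots, B_r : A)\right] + \left[1 - \left(\tfrac{1}{2}\right)^{k-1}\right] g_{r-k+2}(M_1, F_1, B_{k+1}, \ldots, B_r : A). \qquad (3')$$

    If $B_1$ is a founder, with 0, 1, or 2 genes in A, and $k \geq 1$,

    $$g_r(\underbrace{B_1, \ldots, B_1}_{k}, B_{k+1}, \ldots, B_r : A) = 0, \quad \left(\tfrac{1}{2}\right)^{k} g_{r-k}(B_{k+1}, \ldots, B_r : A), \quad \text{or} \quad g_{r-k}(B_{k+1}, \ldots, B_r : A), \text{ respectively.} \qquad (4')$$

    Equations (1') to (4') can be as easily implemented as equations (1) to (4), although the greater number of arguments increases computation time for any particular instance.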
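    The descent function can be sketched recursively in the same way. The following Python is a minimal illustration under stated assumptions, not the author's program: the pedigree `PARENTS` is hypothetical, the set A is passed as a mapping from each founder to the number (1 or 2) of its genes in A, and the individual expanded at each step is simply the one of greatest `depth`, which cannot be an ancestor of any other argument. The case of an individual repeated k times assumes the k genes are drawn independently, each being the maternal or paternal gene with probability 1/2.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical toy pedigree: individual -> (mother, father); founders (None, None).
PARENTS = {
    "A": (None, None),
    "B": (None, None),
    "C": ("A", "B"),
    "D": ("A", "B"),
    "E": ("C", "D"),
}


@lru_cache(maxsize=None)
def depth(x):
    """Generations below the founders; an ancestor always has smaller depth."""
    mother, father = PARENTS[x]
    return 0 if mother is None else 1 + max(depth(mother), depth(father))


def descent_probability(individuals, founder_genes):
    """Probability that one random gene from each listed individual
    descends from the founder genes in A.

    founder_genes maps a founder to the number (1 or 2) of its genes in A;
    founders not listed have no genes in A.
    """
    in_a = dict(founder_genes)

    @lru_cache(maxsize=None)
    def g(indivs):
        if not indivs:
            return Fraction(1)
        b = max(indivs, key=depth)       # deepest: not an ancestor of the others
        k = indivs.count(b)              # b may be repeated k times
        rest = tuple(x for x in indivs if x != b)
        mother, father = PARENTS[b]
        if mother is None:               # founder with 0, 1 or 2 genes in A
            n = in_a.get(b, 0)
            return Fraction(n, 2) ** k * g(rest)
        half_k = Fraction(1, 2) ** k
        # All k draws are the maternal gene, or all are the paternal gene.
        one_parent = (g(tuple(sorted(rest + (mother,))))
                      + g(tuple(sorted(rest + (father,)))))
        # Otherwise both the maternal and the paternal gene are among the draws.
        both = g(tuple(sorted(rest + (mother, father)))) if k > 1 else Fraction(0)
        return half_k * one_parent + (1 - 2 * half_k) * both

    return g(tuple(sorted(individuals)))


# Contribution of founder A to E, and joint descent to E's parents C and D.
print(descent_probability(("E",), {"A": 2}))          # both genes of A
print(descent_probability(("C", "D"), {"A": 2}))      # both genes of A
print(2 * descent_probability(("C", "D"), {"A": 1}))  # a single gene of A, doubled
```

    Exact fractions are used so that results can be compared directly with tabulated values multiplied by a common factor.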


    TABLE 1. Probabilities $(\gamma, \beta, \alpha)$ Defined in the Text, for the Individuals and Founders Shown. For Ease of Presentation, All Probabilities in a Row are Multiplied by the Given Factor

    Individual   Factor   Founder B    Founder C    Founder F     Inbreeding coefficient
    U            16       6,2,1        6,2,1        4,0,0         1/8
    V            64       20,6,3       20,6,3       8,0,0         3/32
    Q            32       0,0,0        8,0,0        [illegible]   1/16
    R            32       8,2,1        4,0,0        16,8,4        5/32
    W            256      72,20,11     56,10,6      80,16,8       25/256

    There are some special cases of the probabilities $g_r$ which can illuminate the structure of ancestral contributions. The probability $\gamma = g_1(B : A_2)$, where $A_2$ is the pair of genes in a single founder A, is simply the ancestral contribution of A to B. If $M_B$ and $F_B$ are the parents of B, $\beta = g_2(M_B, F_B : A_2)$ is the bilineal contribution: the probability that both genes of B derive from A. If $A_1$ is a single gene of A, then $\alpha = 2g_2(M_B, F_B : A_1)$ is the inbreeding contribution: the probability that the two genes of B are copies of some single gene of A. The recursive program has been extended so that these $(\gamma, \beta, \alpha)$-type probabilities are computed for any current individual, simultaneously for each of an array of founder individuals [Gilpin and Thompson, in preparation]. The genealogy of Figure 1 has six founders (A, B, C, L, F, K), and the six-dimensional $(\gamma, \beta, \alpha)$ vectors have been computed for all individuals. For a small genealogy such as this, computation time is only a few seconds. Table 1 shows the results for the three major founders B, C and F, and current individuals U, V, Q, R and W.

    The first component of each triple, the ancestral contribution, is straightforward. The final contribution, $\alpha$, summed over founders, provides the inbreeding coefficient. (Note that for individual Q, the founder A also contributes to inbreeding.) More important, for each current-founder pair of individuals, $\beta - \alpha$ is the probability that both of the two different genes of the founder descend to the current descendant: the individual is a copy of the founder. Where lines of descent to the mother and the father of the individual are not intertwined, $\beta = 2\alpha$, and an individual who receives two genes from the founder receives two copies of the same one with probability $\alpha/\beta = 1/2$. In general, $\alpha \geq \beta/2$: given that two genes are received from the founder, the


    chances are greater than even that they are two copies of only one gene. In an extended genealogy, with limited paths of descent, such probabilities may be very high, sometimes even one, indicating that the individual can never carry two different genes of the founder.

    The more general probabilities $g_r$, for larger sets of founder genes and for larger sets of current individuals, can also be of practical importance. Suppose a certain few individuals are known to carry a recessive allele. For example, they may be the parents of individuals affected with a recessive trait. Suppose we compute the probability $g_r(B_1, \ldots, B_r : A)$ for this set of individuals, and for a variety of sets A. The set A may be considered hypothesised original copies of the allele. If the descent probability $g_r$ is much higher for a set $A_1$ than for a set $A_2$ of equal size, this implies that $A_1$ is a more likely set of original allelic copies than is $A_2$, since genes known to be of the required allelic type descend jointly from $A_1$ with higher probability than from $A_2$. This interpretation must be treated with some caution. Larger sets of founder genes will clearly give larger probabilities. So also will more recent sets, provided the probabilities are nonzero. There is a bias in the approach in that only descent of the affected allele is considered, and not the descent of the normal allele to the many unaffected individuals of the population.

    Against this, the method has the advantage of computational speed. Many alternative sets A can be considered rapidly, to obtain at least a qualitative inference as to likely origins. Moreover, the set A need not consist only of founder genes. Provided it does not involve individuals who are ancestors of each other, A can be a set of hypothesised ancestral copies at any level in the pedigree. Thus the likely paths of descent from founders to the current individuals can be traced. Thirdly, and perhaps more important, it is joint descent to several current genes that is considered. Where currently affected individuals are interrelated, it is the paths by which the allele could descend jointly to all their carrier parents that are relevant.

    GENE EXTINCTION

    A full analysis of gene ancestry would require that the ancestry of all alleles be jointly considered. The relevant probability would be:

    P(all genotypes for this locus are as observed, given a specified combination of founder genotypes). (5)

    Computed for each possible founder combination, these probabilities would provide a comparison between alternative ancestral hypotheses. The problem, of course, is in computing (5). The method of Cannings et al. (1978) may be employed on relatively small yet complex pedigrees, such as that of the Tristan da Cunha population. The method involves working back from the current population to the ancestors, accumulating information, which is summarised as a set of probabilities of the form (5) for the joint genotype combinations of ancestral individuals across the pedigree.

    Now suppose we wish to determine the probability of extinction of a certain set of homologous founder genes. Let us label these genes as a certain allelic type (E, for extinct, say) and all other founder genes as type S, for (possibly) surviving genes (see Figure 2). Then the E genes (and perhaps also some S genes) are extinct if and


    Fig. 2. Ancestral likelihoods as gene extinction probabilities. The E alleles amongst founders (and perhaps also some of the S alleles) are extinct in the current population, which is homozygous SS. The pedigree is that of Figure 1, with the three individuals Q, R, and W assumed to constitute the current population. The ancestral set whose probability of extinction is considered here consists of four genes: both genes of founder A and one gene of each of B and K.

    only if the current population consists entirely of genotypes SS. Conversely, starting from the observation of a current population of SS individuals (homozygous for non-extinct genes) we can work backwards through the genealogy to compute the expression (5) for all possible founder E/S combinations. That is, we have precisely the joint extinction probabilities for all combinations of founder genes. Hence we can compute the probability distribution for the number of distinct founder genes.

    These computations have been performed for the example genealogy of Figure 1, and some of the results are summarized in Table 2. The values shown are the overall probabilities that 0, 1 and 2 genes of each of the major founders B, C and F survive in the three descendant individuals Q, R and W. Note that it is impossible for all six of these founder genes to be extinct in the three current individuals; R must in fact carry one gene from N and thence from B, C, or F. It is also impossible for all the six genes to survive, or indeed for both those of C and F to survive. All other combinations have a strictly positive probability, although some are very small. The event of maximum probability (0.191) is that the six genes of Q, R and W contain only two genes from B, C and F, one each from B and from F. The event of one gene from each of the three founders has only slightly smaller probability (0.182). From the totals we see that each of B and F has a greater than 50% probability of exactly one gene surviving and a substantial probability that both do so, but C has a greater than 50% probability that both are lost. Note also the interactions: when both


    TABLE 2. Probabilities of Extinction (× 1000) for Each Combination of 0, 1 or 2 Extinct Genes, for the Founders B, C and F

    No. of extinct genes        No. of extinct genes for B
    of C        of F            0       1       2
    0           0               0       0       0
    0           1               1       11      12
    0           2               1/2     4       3
    1           0               3       32      35
    1           1               41      182     94
    1           2               15      41      10
    2           0               23      93      44
    2           1               86      191     44
    2           2               17      18      0

    Totals (0, 1, 2 extinct genes)
    Totals for B                186     572     242
    Totals for C                31      453     516
    Totals for F                230     662     108
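    The totals and the ratios quoted in the text can be recomputed from the joint entries of Table 2. The following Python sketch is offered only as a check on the arithmetic; the dictionary `TABLE2` and the function `marginal` are names introduced here, and the entries are typed in by hand from the table (rounded to units of 0.001, so totals agree with the printed ones only to within rounding).

```python
# Joint probabilities (x1000) from Table 2.
# Key: (extinct genes of C, extinct genes of F).
# Value: entries for 0, 1 and 2 extinct genes of B.
TABLE2 = {
    (0, 0): (0, 0, 0),
    (0, 1): (1, 11, 12),
    (0, 2): (0.5, 4, 3),
    (1, 0): (3, 32, 35),
    (1, 1): (41, 182, 94),
    (1, 2): (15, 41, 10),
    (2, 0): (23, 93, 44),
    (2, 1): (86, 191, 44),
    (2, 2): (17, 18, 0),
}


def marginal(founder):
    """Totals (x1000) for 0, 1, 2 extinct genes of founder 'B', 'C' or 'F'."""
    totals = [0.0, 0.0, 0.0]
    for (c, f), row in TABLE2.items():
        for b, p in enumerate(row):
            index = {"B": b, "C": c, "F": f}[founder]
            totals[index] += p
    return totals


for name in "BCF":
    print(name, marginal(name))

# Both genes of C and one gene of B extinct:
# some gene of F survives (0 or 1 extinct) versus both genes of F extinct.
f_survives = TABLE2[(2, 0)][1] + TABLE2[(2, 1)][1]
f_extinct = TABLE2[(2, 2)][1]
print(f_survives / f_extinct)

# Both genes of C survive: some gene of F survives versus both extinct.
f_survives_c0 = sum(TABLE2[(0, 0)]) + sum(TABLE2[(0, 1)])
f_extinct_c0 = sum(TABLE2[(0, 2)])
print(f_survives_c0 / f_extinct_c0)
```

    The first ratio exceeds 15 and the second is close to 3, which is the contrast the text draws.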

    genes of C and one of B are extinct it is more than 15 times as probable that one or two genes of F survive (0.191 + 0.093) than that both are extinct (0.018), but when both genes of C survive, the ratio of survival to extinction of genes of F is only 3:1.

    An illuminating example, both of dependence in gene extinction, and of the relevance of numbers of distinct surviving genes, is provided by the Tristan da Cunha population. If we simply calculate founder contributions, as is currently done in analyses of populations of zoo animals [Ballou and Seidensticker, 1983], we find that eleven early founders contribute 84.5% of current genes, and genealogically they fall clearly into two groups of five, together with an eleventh individual whose genes have high probability of survival. The contributions of the two groups are almost equal, being 36% and 33%, with the additional individual providing just over 15% [Thomas and Thompson, 1984].

    On the other hand, in terms of numbers of distinct genes, one group contributes substantially more than the other. The probabilities of the numbers of surviving genes are shown in Figure 3. Each group has at most nine genes surviving, since in each there is one of the five founders who has only one offspring. However, one group must have at least four surviving genes, while the other may have only two. The first is expected to have seven distinct founder genes surviving, but the second only five. Thus the former group shows much greater variety in its contribution to the current population, while the latter (with almost equal total contribution) is dominated by multiple copies of two or three genes from one founder couple.

    In human populations, where founders are often couples, sometimes with few offspring contributing to the current population, the main dependence in gene survival is between the genes of founder couples. Any grandchild of such a couple receives (at a given locus) a gene from one member but not the other. Thence, more generally, there will be a negative dependence between the genes of female founders and genes of male founders. This dependence was examined by Thomas and Thompson [1984] for the case of the Tristan da Cunha population. Dependence was found to be slight in this particular case, probably because there were only two founder couples, the other founders (six males and one female) marrying descendants of these. In animal
