
Lecture 5: Major Genes, Polygenes, and QTLs

Major genes --- genes that have a significant effect on the phenotype

Polygenes --- a general term for genes of small effect that influence a trait

QTL, quantitative trait locus --- a particular gene underlying the trait. The term is usually used when a gene underlying a trait is mapped to a particular chromosomal region

Candidate gene --- a particular known gene that is of interest as being a potential candidate for contributing to the variation in a trait

Mendelizing allele. The allele has a sufficiently large effect that its impact is obvious when looking at phenotype

Major Genes

• Major morphological mutations of classical genetics that arose by spontaneous or induced mutation

• Genes of large effect have been found in selected lines

– pygmy, obese, dwarf and hg alleles in mice

– Booroola F in sheep

– halothane sensitivity in pigs

• Major genes tend to be deleterious and are at very low frequencies in unselected populations, and contribute little to Var(A)

Genes for Genetic modification of muscling

“Natural” mutations in the myostatin gene in cattle

“Natural” mutation in the callipyge gene in sheep

“Booroola” gene in sheep increasing ovulation rate

Merino Sheep

Major genes for mouse body size

The ob mutation causes a deficiency in leptin production; the db mutation causes a leptin receptor deficiency

Major Genes and Isoalleles

What is the genetic basis for quantitative variation?

Honest answer --- don’t know.

One hypothesis: isoalleles. A locus that has an allele of major effect may also have alleles of much smaller effect (isoalleles) that influence the trait of interest.

Structural vs. regulatory changes

Structural: change in an amino acid sequence

Regulatory: change affecting gene regulation

General assumption: regulatory changes are likely more important

Cis vs. trans effects

Cis effect --- a regulatory change that affects only a gene (tightly) linked on the same chromosome

Trans effect --- a diffusible factor that can influence regulation of unlinked genes

Cis-acting locus. The allele influences the regulation of a gene on the same DNA molecule

Trans-acting locus. This locus influences genes on other chromosomes and non-adjacent sites on the same chromosome

Genomic location of mRNA level modifiers

[Figure. Axis label: Genomic location of genes on array. Labels: CIS-modifiers, TRANS-modifiers, MASTER modifiers]

Polygenic Mutation

For “normal” genes (i.e., those with large effects), simply giving a mutation rate is sufficient (e.g., the rate at which a dwarfing allele appears)

For alleles contributing to quantitative variation, we must account for both the rate at which mutants appear and the phenotypic effect of each

Mutational variance, $V_m$ or $\sigma_m^2$ --- the amount of new additive genetic variance introduced by mutation each generation

Typically $V_m$ is on the order of $10^{-3} V_E$

Simple Tests for the Presence of Major Genes

Simple visual tests:

• Phenotypes fall into discrete classes

• Multimodality --- distribution has several modes (peaks)

Simple statistical tests:

• Fit to a mixture model (LR test)

$p(z) = \Pr(QQ)\, p(z \mid QQ) + \Pr(Qq)\, p(z \mid Qq) + \Pr(qq)\, p(z \mid qq)$

• Heterogeneity of within-family variances

• Select and backcross

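Mixture Models

$p(z) = \sum_{i=1}^{n} \Pr(i)\, p_i(z)$

The distribution of trait value z is the weighted sum of n underlying distributions

$\Pr(i)$ --- the probability that a random individual is from class i

$p_i(z)$ --- the distribution of phenotypes conditional on the individual belonging to class i

The component distributions are typically assumed normal

$p(z) = \sum_{i=1}^{n} \Pr(i)\, \varphi(z; \mu_i, \sigma_i^2)$

$\varphi(z; \mu_i, \sigma_i^2) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left[ -\frac{(z - \mu_i)^2}{2\sigma_i^2} \right]$

where $\varphi(z; \mu_i, \sigma_i^2)$ is the normal density with mean $\mu_i$ and variance $\sigma_i^2$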

3n-1 parameters: n-1 mixture proportions, n means, n variances

Typically assume common variances -> 2n-1 parameters

In quantitative genetics, the underlying classes are typically different genotypes (e.g. QQ vs. Qq), although we could also model different environments in the same fashion

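Likelihood function for an individual under a mixture model

$\ell(z_j) = \Pr(QQ)\, p_{QQ}(z_j) + \Pr(Qq)\, p_{Qq}(z_j) + \Pr(qq)\, p_{qq}(z_j)$

$\ell(z_j) = \Pr(QQ)\, \varphi(z_j; \mu_{QQ}, \sigma^2) + \Pr(Qq)\, \varphi(z_j; \mu_{Qq}, \sigma^2) + \Pr(qq)\, \varphi(z_j; \mu_{qq}, \sigma^2)$

Mixture proportions follow from Hardy-Weinberg, e.g. $\Pr(QQ) = p_Q^2$

Likelihood function for a random sample of m individuals

$\ell(\mathbf{z}) = \ell(z_1, z_2, \cdots, z_m) = \prod_{j=1}^{m} \ell(z_j)$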
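The individual and sample likelihoods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a common variance, Hardy-Weinberg mixing proportions, and hypothetical genotype means, allele frequency, and data. The function names are not from the lecture.

```python
import math

def normal_pdf(z, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-((z - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def hardy_weinberg(p_q):
    """Genotype frequencies when allele Q has frequency p_q."""
    return {"QQ": p_q ** 2, "Qq": 2.0 * p_q * (1.0 - p_q), "qq": (1.0 - p_q) ** 2}

def individual_lik(z_j, p_q, means, var):
    """Mixture likelihood of one phenotype: sum over genotypes of Pr(g) * phi(z_j; mu_g, var)."""
    freqs = hardy_weinberg(p_q)
    return sum(freqs[g] * normal_pdf(z_j, means[g], var) for g in freqs)

def sample_loglik(z, p_q, means, var):
    """Log of the product of individual likelihoods over a random sample."""
    return sum(math.log(individual_lik(z_j, p_q, means, var)) for z_j in z)

# Hypothetical values, chosen only for illustration
means = {"QQ": 12.0, "Qq": 11.0, "qq": 10.0}
z = [9.7, 10.4, 11.2, 12.1, 10.1]
loglik = sample_loglik(z, p_q=0.3, means=means, var=1.0)
```

Working on the log scale avoids numerical underflow when the product runs over many individuals.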

Likelihood Ratio test for Mixtures

Null hypothesis: A single normal distribution is adequate to fit the data. The maximum of the likelihood function under the null hypothesis is

$\max \ell_0(z_1, z_2, \cdots, z_m) = (2\pi S^2)^{-m/2} \exp\left( -\frac{1}{2S^2} \sum_{j=1}^{m} (z_j - \bar{z})^2 \right)$

where $S^2 = \frac{1}{m} \sum_{j=1}^{m} (z_j - \bar{z})^2$

The LR test for a significantly better fit under a mixture is given by $2 \ln\left( \max\{\text{likelihood under mixture}\} / \max \ell_0 \right)$

The LR follows a chi-square distribution with n-2 df, where n-1 = number of fitted parameters for the mixture
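As a rough sketch of the LR statistic, the code below computes the maximized log-likelihood of a single normal in closed form and compares it with a mixture log-likelihood. The data are made up, and the mixture parameters are hypothetical values standing in for maximum-likelihood estimates, which in practice would come from an iterative fit such as EM.

```python
import math

def null_max_loglik(z):
    """Maximized log-likelihood of a single normal (mean = z-bar, variance = S^2 with divisor m)."""
    m = len(z)
    zbar = sum(z) / m
    s2 = sum((x - zbar) ** 2 for x in z) / m
    # At the maximum the exponent sum((z - zbar)^2) / (2 S^2) equals m / 2
    return -0.5 * m * math.log(2.0 * math.pi * s2) - 0.5 * m

def mixture_loglik(z, probs, means, var):
    """Log-likelihood of a common-variance normal mixture at the given parameters."""
    total = 0.0
    for x in z:
        dens = sum(
            p * math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
            for p, mu in zip(probs, means)
        )
        total += math.log(dens)
    return total

def lr_statistic(loglik_mixture, loglik_null):
    """2 ln(max mixture likelihood / max null likelihood)."""
    return 2.0 * (loglik_mixture - loglik_null)

# Made-up bimodal data and hypothetical mixture parameters (not fitted here)
z = [9.1, 9.8, 10.2, 10.4, 13.7, 14.1, 14.3, 9.6, 10.0, 14.0]
lr = lr_statistic(mixture_loglik(z, [0.6, 0.4], [9.85, 14.025], 0.15), null_max_loglik(z))
```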

Complex Segregation Analysis

A significant fit to a mixture only suggests the possibility of a major gene.

A much more formal demonstration of a major gene is given by the likelihood-based method of Complex Segregation Analysis (CSA)

Testing the fit of a mixture model requires a sample of random individuals from the population.

CSA requires a pedigree of individuals. CSA uses likelihood to formally test for the transmission of a major gene in the pedigree

Building the likelihood for CSA

Start with a mixture model

Difference is that the mixing proportions are not the same for each individual, but rather are a function of its parental (presumed) genotypes

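$\ell(z_{ij} \mid g_f, g_m) = \sum_{g_o=1}^{3} \Pr(g_o \mid g_f, g_m)\, \varphi(z_{ij}; \mu_{g_o}, \sigma^2)$

$z_{ij}$ --- phenotypic value of individual j in family i

$g_f, g_m$ --- major-locus genotypes of the parents

$\Pr(g_o \mid g_f, g_m)$ --- transmission probability of an offspring having genotype $g_o$ given the parental genotypes are $g_f$, $g_m$

$\mu_{g_o}$ --- mean of genotype $g_o$

$\sigma^2$ --- phenotypic variance conditioned on major-locus genotype

Sum is over all possible genotypes, indexed by $g_o$ = 1, 2, 3

Example: code qq = 3, Qq = 2, QQ = 1

$\Pr(g_o = 3 \mid g_f = 1, g_m = 2) = \Pr(qq \mid g_f = QQ, g_m = Qq) = 0$

$\Pr(g_o = 2 \mid g_f = 1, g_m = 2) = \Pr(Qq \mid g_f = QQ, g_m = Qq) = 1/2$

$\Pr(g_o = 1 \mid g_f = 1, g_m = 2) = \Pr(QQ \mid g_f = QQ, g_m = Qq) = 1/2$

Conditional family likelihood

$\ell(z_{i\cdot} \mid g_f, g_m) = \prod_{j=1}^{n_i} \ell(z_{ij} \mid g_f, g_m)$

Likelihood for family i (sum over all possible parental genotypes)

$\ell(z_{i\cdot}) = \sum_{g_f=1}^{3} \sum_{g_m=1}^{3} \ell(z_{i\cdot} \mid g_f, g_m)\, \Pr(g_f, g_m)$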
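The family likelihood can be sketched as follows. The assumptions are mine rather than the lecture's: Mendelian segregation, random mating with Hardy-Weinberg parental genotype frequencies so that Pr(gf, gm) = Pr(gf) Pr(gm), and hypothetical genotype means and phenotypes. The genotype coding QQ = 1, Qq = 2, qq = 3 follows the example.

```python
import math

# Probability that a parent of each genotype transmits Q (Mendelian segregation assumed)
P_TRANSMIT_Q = {1: 1.0, 2: 0.5, 3: 0.0}

def normal_pdf(z, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-((z - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def offspring_probs(gf, gm):
    """Pr(go | gf, gm) for go = 1 (QQ), 2 (Qq), 3 (qq)."""
    a, b = P_TRANSMIT_Q[gf], P_TRANSMIT_Q[gm]
    return {1: a * b, 2: a * (1.0 - b) + b * (1.0 - a), 3: (1.0 - a) * (1.0 - b)}

def conditional_family_lik(z_family, gf, gm, means, var):
    """Product over sibs of sum_go Pr(go | gf, gm) * phi(z_ij; mu_go, var)."""
    trans = offspring_probs(gf, gm)
    lik = 1.0
    for z in z_family:
        lik *= sum(trans[go] * normal_pdf(z, means[go], var) for go in (1, 2, 3))
    return lik

def family_lik(z_family, p_q, means, var):
    """Sum of the conditional likelihood over all nine parental genotype pairs."""
    freq = {1: p_q ** 2, 2: 2.0 * p_q * (1.0 - p_q), 3: (1.0 - p_q) ** 2}
    return sum(
        conditional_family_lik(z_family, gf, gm, means, var) * freq[gf] * freq[gm]
        for gf in (1, 2, 3)
        for gm in (1, 2, 3)
    )

means = {1: 12.0, 2: 11.0, 3: 10.0}  # hypothetical genotype means
lik = family_lik([11.8, 10.9, 12.3], p_q=0.3, means=means, var=1.0)
```

For a family with a single offspring this reduces to the population mixture density, which gives a quick sanity check on the transmission table.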

Transmission Probabilities

Explicitly model the transmission probabilities

$\Pr(qq \mid g_f, g_m) = (1 - \tau_{g_f})(1 - \tau_{g_m})$

$\Pr(Qq \mid g_f, g_m) = \tau_{g_f}(1 - \tau_{g_m}) + \tau_{g_m}(1 - \tau_{g_f})$

$\Pr(QQ \mid g_f, g_m) = \tau_{g_f}\, \tau_{g_m}$

$\tau_{g_f}$ --- probability that the father transmits Q

$\tau_{g_m}$ --- probability that the mother transmits Q

Formal CSA test of a major gene (three steps):

• Rejection of the hypothesis of equal transmission for all genotypes ($\tau_{QQ} = \tau_{Qq} = \tau_{qq}$)

• Failure to reject the hypothesis of Mendelian segregation: $\tau_{QQ} = 1$, $\tau_{Qq} = 1/2$, $\tau_{qq} = 0$

• Significantly better overall fit of a mixture model compared with a single normal
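The two transmission hypotheses compared in the first two steps can be written out as below. This is a minimal sketch: the function name and the value 0.3 used for the equal-transmission case are hypothetical, and in a real analysis the transmission parameters would be estimated from pedigree data.

```python
def transmission_probs(tau_f, tau_m):
    """Offspring genotype probabilities given each parent's probability of transmitting Q."""
    return {
        "QQ": tau_f * tau_m,
        "Qq": tau_f * (1.0 - tau_m) + tau_m * (1.0 - tau_f),
        "qq": (1.0 - tau_f) * (1.0 - tau_m),
    }

# Mendelian segregation: transmission depends on the parent's genotype
MENDELIAN_TAU = {"QQ": 1.0, "Qq": 0.5, "qq": 0.0}

# Equal transmission: the same probability whatever the parent's genotype
# (0.3 is an arbitrary illustrative value)
EQUAL_TAU = {"QQ": 0.3, "Qq": 0.3, "qq": 0.3}

mendelian = transmission_probs(MENDELIAN_TAU["QQ"], MENDELIAN_TAU["Qq"])
equal = transmission_probs(EQUAL_TAU["QQ"], EQUAL_TAU["Qq"])
```

Under equal transmission the offspring distribution is the same for every pair of parents, so the parental genotypes carry no information, which is why rejecting it is the first step.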

CSA Modification: Common Family Effects

Families can share a common environmental effect

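Expected value for genotype $g_o$ in family i is $\mu_{g_o} + c_i$

Likelihood conditioned on common family effect $c_i$

$\ell(z_i \mid g_f, g_m, c_i) = \prod_{j=1}^{n_i} \left[ \sum_{g_{o_j}=1}^{3} \Pr(g_{o_j} \mid g_f, g_m)\, \varphi(z_{ij}; \mu_{g_{o_j}} + c_i, \sigma^2) \right]$

Unconditional likelihood (average over all c --- assumed normal with mean zero and variance $\sigma_c^2$)

$\ell(z_i \mid g_f, g_m) = \int_{-\infty}^{\infty} \ell(z_i \mid g_f, g_m, c)\, \varphi(c; 0, \sigma_c^2)\, dc$

Likelihood function with no major gene, but family effects

$\ell(z_i) = \int_{-\infty}^{\infty} \ell(z_i \mid c)\, \varphi(c; 0, \sigma_c^2)\, dc = \int_{-\infty}^{\infty} \left[ \prod_{j=1}^{n_i} \varphi(z_{ij}; \mu + c, \sigma^2) \right] \varphi(c; 0, \sigma_c^2)\, dc$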

Maps and Mapping Functions

The unit of genetic distance between two markers is the recombination frequency, c

If the phase of a parent is AB/ab, then 1-c is the frequency of “parental” gametes (e.g., AB and ab), while c is the frequency of “nonparental” gametes (e.g., Ab and aB).

A parental gamete results from an EVEN number of crossovers, e.g., 0, 2, 4, etc.

For a nonparental (also called a recombinant) gamete, we need an ODD number of crossovers between A & B, e.g., 1, 3, 5, etc.

Hence, simply using the frequency of “recombinant” (i.e. nonparental) gametes UNDERESTIMATES the number m of crossovers, with E[m] > c

Mapping functions attempt to estimate the expected number of crossovers m from observed recombination frequencies c

When considering two linked loci, the phenomenon of interference must be taken into account

The presence of a crossover in one interval typically decreases the likelihood of a nearby crossover

In particular, c = Prob(odd number of crossovers)

Suppose the order of the genes is A-B-C.

If there is no interference (i.e., crossovers occur independently of each other) then

$c_{AC} = c_{AB}(1 - c_{BC}) + (1 - c_{AB})\, c_{BC} = c_{AB} + c_{BC} - 2 c_{AB} c_{BC}$

Here $c_{AC}$ is the probability of an odd number of crossovers between A and C. The first term is the probability of an odd number of crossovers between A & B and an even number between B & C; the second term is the probability of an even number in A-B and an odd number in B-C. We need to assume independence of crossovers in order to multiply these two probabilities.

When interference is present, we can write this as

$c_{AC} = c_{AB} + c_{BC} - 2(1 - \delta)\, c_{AB} c_{BC}$

Interference parameter $\delta$:

$\delta = 1$ --> complete interference: the presence of a crossover eliminates nearby crossovers

$\delta = 0$ --> no interference: crossovers occur independently of each other
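A short numerical sketch of the three-locus relationship, for gene order A-B-C, with an interference parameter between 0 (no interference) and 1 (complete interference). The function name and the input values are hypothetical.

```python
def recomb_ac(c_ab, c_bc, delta=0.0):
    """c_AC = c_AB + c_BC - 2 (1 - delta) c_AB c_BC for gene order A-B-C."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie between 0 and 1")
    return c_ab + c_bc - 2.0 * (1.0 - delta) * c_ab * c_bc

no_interference = recomb_ac(0.1, 0.2)             # crossovers independent
complete_interference = recomb_ac(0.1, 0.2, 1.0)  # recombination fractions simply add
```

With no interference the result matches the odd/even argument, c_AB (1 - c_BC) + (1 - c_AB) c_BC; with complete interference double crossovers never occur and the recombination fractions are additive.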

Mapping functions. Moving from c to m

Haldane’s mapping function (gives Haldane map distances)

Assume the number k of crossovers in a region follows a Poisson distribution with parameter m. This makes the assumption of NO INTERFERENCE

$\Pr(\text{Poisson} = k) = \lambda^k e^{-\lambda} / k!$, where $\lambda$ = expected number of successes

Prob(odd number of crossovers):

$c = \sum_{k=0}^{\infty} p(m, 2k+1) = e^{-m} \sum_{k=0}^{\infty} \frac{m^{2k+1}}{(2k+1)!} = \frac{1 - e^{-2m}}{2}$

This gives the estimated Haldane distance as

$m = -\frac{\ln(1 - 2c)}{2}$

Usually reported in units of Morgans or centimorgans (cM). One Morgan --> m = 1.0. One cM --> m = 0.01
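Haldane's mapping function and its inverse are easy to check numerically. The sketch below, with hypothetical function names, also sums the Poisson probabilities of odd crossover counts directly to confirm the closed form c = (1 - e^(-2m)) / 2.

```python
import math

def haldane_c_from_m(m):
    """Recombination frequency for an expected number m of crossovers (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))

def haldane_m_from_c(c):
    """Haldane map distance in Morgans: m = -ln(1 - 2c) / 2, defined for 0 <= c < 1/2."""
    if not 0.0 <= c < 0.5:
        raise ValueError("c must satisfy 0 <= c < 0.5")
    return -0.5 * math.log(1.0 - 2.0 * c)

def prob_odd_crossovers(m, terms=50):
    """Truncated sum of Poisson(m) probabilities over odd counts 1, 3, 5, ..."""
    return sum(
        math.exp(-m) * m ** (2 * k + 1) / math.factorial(2 * k + 1)
        for k in range(terms)
    )

m = haldane_m_from_c(0.1)  # about 0.1116 Morgans, i.e. roughly 11.2 cM
```

The map distance exceeds the recombination frequency because double crossovers go undetected.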

Linkage disequilibrium mapping

The idea is to use a random sample of individuals from the population rather than a large pedigree.

Ironically, in the right settings this approach has more power for fine mapping than pedigree analysis.

Why?

The key is the expected number of recombinants. In a pedigree, Prob(no recombinants) in n individuals is $(1-c)^n$

LD mapping uses the historical recombinants in a sample. Prob(no recomb) = $(1-c)^{2t}$, where t = time back to most recent common ancestor

Expected number of recombinants in a sample of n sibs is cn

Expected number of recombinants in a sample of n random individuals with a time t back to the MRCA (most recent common ancestor) is 2cnt

Hence, if t is large, there are many more expected recombinants in a random sample and hence more power for very fine mapping (i.e. c < 0.01)

Because there are so many expected recombinants, this only works when c is very small
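To make the comparison concrete, the sketch below evaluates the two expected counts, cn for sibs and 2cnt for random individuals. The function names and the numbers plugged in are hypothetical.

```python
def expected_recombinants_sibs(c, n):
    """Expected recombinants among n sibs: c * n."""
    return c * n

def expected_recombinants_random(c, n, t):
    """Expected historical recombinants among n random individuals, t generations to the MRCA: 2 * c * n * t."""
    return 2.0 * c * n * t

# Hypothetical fine-mapping setting: c = 0.001, 100 individuals, 100 generations to the MRCA
in_sibs = expected_recombinants_sibs(0.001, 100)
in_random_sample = expected_recombinants_random(0.001, 100, 100)
```

With these inputs the sib sample is expected to contain about 0.1 recombinants, while the random sample is expected to contain about 20, a 2t-fold increase.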

Fine-mapping genes

Suppose an allele causing a large effect on the trait arose as a single mutation in a closed population

New mutation arises on red chromosome

Initially, the new mutation is largely associated with the red haplotype

Hence, markers that define the red haplotype are likely to be associated (i.e. in LD) with the mutant allele

This linkage disequilibrium decays slowly with time if c is small

Let $\pi$ = Prob(mutation associated with original haplotype)

$\pi = (1-c)^t$

Thus if we can estimate $\pi$ and t, we can solve for c,

$c = 1 - \pi^{1/t}$

Diastrophic dysplasia (DTD) association with CSF1R marker locus alleles

Allele   Normal        DTD-bearing
1-1      4 (3.3%)      144 (94.7%)
1-2      28 (22.7%)    1 (0.7%)
2-1      7 (5.7%)      0 (0%)
2-2      84 (68.3%)    7 (4.6%)

Most frequent allele type varies between normal and DTD-bearing haplotypes

Hence, allele 1-1 appears to be on the original haplotype in which the DTD mutation arose --> $\pi$ = 0.947

100 generations to MRCA used for Finnish population

$c = 1 - \pi^{1/t} = 1 - (0.947)^{1/100}$

Gives c = 0.00051 between marker and DTD. Best estimate from pedigrees is c = 0.012 (1.2 cM)
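The calculation can be reproduced with a short sketch, assuming the decay model in which the probability that the mutation is still on its original haplotype after t generations is (1 - c)^t. The function and variable names are hypothetical; the counts are the 144 of 152 DTD-bearing haplotypes carrying allele 1-1.

```python
def recomb_from_haplotype_decay(pi_hat, t):
    """Solve pi = (1 - c)^t for c, given an estimate pi_hat and t generations."""
    if not 0.0 < pi_hat <= 1.0 or t <= 0:
        raise ValueError("need 0 < pi_hat <= 1 and t > 0")
    return 1.0 - pi_hat ** (1.0 / t)

pi_hat = 144 / 152  # fraction of DTD-bearing haplotypes with marker allele 1-1
c_hat = recomb_from_haplotype_decay(pi_hat, 100)
```

With these inputs the formula returns a value of roughly 5 × 10^-4, the same order as the estimate quoted above and far smaller than the pedigree-based value.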

Candidate Loci and the TDT

Often try to map genes by using case/control contrasts, also called association mapping.

The frequencies of marker alleles are measured in both a

• case sample --- showing the trait (or extreme values)

• control sample --- not showing the trait

The idea is that if the marker is in tight linkage, we might expect LD between it and the particular DNA site causing the trait variation.

Problem with case-control approach: population stratification can give false positives.

When the population being sampled actually consists of several distinct subpopulations that we have lumped together, marker alleles may provide information as to which group an individual belongs. If there are other risk factors in a group, this can create a false association between marker and trait.

Example. The Gm marker was thought (for biological reasons) to be an excellent candidate gene for diabetes in the high-risk population of Pima Indians in the American Southwest. Initially a very strong association was observed:

Gm+       Total    % with diabetes
Present   293      8%
Absent    4,627    29%

Problem: freq(Gm+) in Caucasians (a population at lower risk of diabetes) is 67%, while Gm+ is rare in full-blooded Pima.

The association was re-examined in a population of Pima that were 7/8th (or more) full heritage:

Gm+       Total    % with diabetes
Present   17       59%
Absent    1,764    60%

Transmission-disequilibrium test (TDT)

The TDT accounts for population structure. It requires sets of relatives and compares the number of times a marker allele is transmitted (T) versus not-transmitted (NT) from a marker heterozygote parent to affected offspring.

Under the hypothesis of no linkage, these values should be equal, resulting in a chi-square test for lack of fit:

$\chi^2_{td} = \frac{(T - NT)^2}{T + NT}$

Scan for type I diabetes in humans. Marker locus D2S152

Allele   T     NT    $\chi^2$   p
228      81    45    10.29      0.001
230      59    73    1.48       0.223
240      36    24    2.40       0.121

For allele 228: $\chi^2 = \frac{(81 - 45)^2}{81 + 45} = 10.29$
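The TDT statistic is simple enough to compute directly. The sketch below, with a hypothetical function name, evaluates it for the three D2S152 alleles and obtains the p-value from the one-degree-of-freedom chi-square tail, which can be written with the complementary error function from the standard library.

```python
import math

def tdt(t, nt):
    """Return (chi-square, p-value) for T transmitted and NT not-transmitted alleles."""
    chi2 = (t - nt) ** 2 / (t + nt)
    # For 1 df, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Transmission counts (T, NT) for alleles at marker locus D2S152
counts = {"228": (81, 45), "230": (59, 73), "240": (36, 24)}
results = {allele: tdt(t, nt) for allele, (t, nt) in counts.items()}
```

Only heterozygous parents are informative, since a homozygous parent transmits the same marker allele regardless of linkage.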
