Lecture 5: Major Genes, Polygenes,
and QTLs
Major genes --- genes that have a significant effect on the phenotype
Polygenes --- a general term for genes of small effect that influence a trait
QTL, quantitative trait locus --- a particular gene underlying the trait.
Usually used when a gene underlying a trait is mapped to a particular chromosomal region
Candidate gene --- a particular known gene that is of interest as being a potential candidate for contributing to the variation in a trait
Mendelizing allele. The allele has a sufficiently large effect that its impact is obvious when looking at phenotype
Major Genes
• Major morphological mutations of classical genetics that arose by spontaneous or induced mutation
• Genes of large effect have been found in selected lines
– pygmy, obese, dwarf and hg alleles in mice
– booroola F in sheep
– halothane sensitivity in pigs
• Major genes tend to be deleterious and are at very low frequencies in unselected populations, and contribute little to Var(A)
Genes for Genetic modification of muscling
“Natural” mutations in the myostatin gene in cattle
“Natural” mutation in the callipyge gene in sheep
“Booroola” gene in sheep increasing ovulation rate
Merino Sheep
Major genes for mouse body size
The mutations ob and db cause, respectively, a deficiency in leptin production and a deficiency in the leptin receptor
Major Genes and Isoalleles
What is the genetic basis for quantitative variation?
Honest answer --- don’t know.
One hypothesis: isoalleles. A locus that has an allele of major effect may also have alleles of much smaller effect (isoalleles) that influence the trait of interest.
Structural vs. regulatory changes
Structural: change in an amino acid sequence
Regulatory: change affecting gene regulation
General assumption: regulatory changes are likely more important
Cis vs. trans effects
Cis effect --- regulatory change only affects genes (tightly) linked on the same chromosome
Trans effect --- a diffusible factor that can influence regulation of unlinked genes
Cis-acting locus. The allele influences the regulation of a gene on the same DNA molecule
Trans-acting locus. This locus influences genes on other chromosomes and non-adjacent sites on the same chromosome
[Figure: Genomic location of mRNA level modifiers, plotted against genomic location of genes on array. Labeled classes: CIS-modifiers, TRANS-modifiers, MASTER modifiers]
Polygenic Mutation
For “normal” genes (i.e., those with large effects) simply giving a mutation rate is sufficient (e.g. the rate at which a dwarfing allele appears)
For alleles contributing to quantitative variation, we must account for both the rate at which mutants appear as well as the phenotypic effect of each
Mutational variance, $V_m$ or $\sigma^2_m$ --- the amount of new additive genetic variance introduced by mutation each generation
Typically $V_m$ is on the order of $10^{-3} V_E$
Simple Tests for the Presence of Major Genes
Simple visual tests:
• Phenotypes fall into discrete classes
• Multimodality --- distribution has several modes (peaks)
Simple statistical tests:
• Fit to a mixture model (LR test): $p(z) = \Pr(QQ)\,p(z \mid QQ) + \Pr(Qq)\,p(z \mid Qq) + \Pr(qq)\,p(z \mid qq)$
• Heterogeneity of within-family variances
Select and backcross
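Mixture Models
The distribution of trait value z is the weighted sum of n underlying distributions:
$$p(z) = \sum_{i=1}^{n} \Pr(i)\, p_i(z)$$
Here $\Pr(i)$ is the probability that a random individual is from class i, and $p_i(z)$ is the distribution of phenotypes conditional on the individual belonging to class i.
The component distributions are typically assumed normal:
$$p(z) = \sum_{i=1}^{n} \Pr(i)\, \varphi(z; \mu_i, \sigma_i^2), \qquad \varphi(z; \mu_i, \sigma_i^2) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left[-\frac{(z-\mu_i)^2}{2\sigma_i^2}\right]$$
where $\varphi(z; \mu, \sigma^2)$ is a normal density with mean $\mu$ and variance $\sigma^2$.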
3n-1 parameters: n-1 mixture proportions, n means, n variances
Typically assume common variances -> 2n parameters (n-1 mixture proportions, n means, 1 variance)
In quantitative genetics, the underlying classes are typically different genotypes (e.g. QQ vs. Qq) although we could also model different environments in the same fashion
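Likelihood function for an individual under a mixture model:
$$\ell(z_j) = \Pr(QQ)\,p_{QQ}(z_j) + \Pr(Qq)\,p_{Qq}(z_j) + \Pr(qq)\,p_{qq}(z_j)$$
$$\phantom{\ell(z_j)} = \Pr(QQ)\,\varphi(z_j; \mu_{QQ}, \sigma^2) + \Pr(Qq)\,\varphi(z_j; \mu_{Qq}, \sigma^2) + \Pr(qq)\,\varphi(z_j; \mu_{qq}, \sigma^2)$$
Mixture proportions follow from Hardy-Weinberg, e.g. $\Pr(QQ) = p_Q \cdot p_Q$
Likelihood function for a random sample of m individuals:
$$\ell(\mathbf{z}) = \ell(z_1, z_2, \cdots, z_m) = \prod_{j=1}^{m} \ell(z_j)$$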
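The individual and sample likelihoods above can be sketched in a few lines of Python. This is a minimal illustration, assuming a common variance and Hardy-Weinberg mixing proportions; the function names (`normal_pdf`, `hw_proportions`, `individual_likelihood`, `sample_log_likelihood`) are hypothetical and chosen only for this example.

```python
import math

def normal_pdf(z, mean, var):
    """Normal density phi(z; mean, var)."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def hw_proportions(p_Q):
    """Hardy-Weinberg genotype frequencies for allele frequency p_Q (assumption)."""
    q = 1.0 - p_Q
    return {"QQ": p_Q * p_Q, "Qq": 2.0 * p_Q * q, "qq": q * q}

def individual_likelihood(z, p_Q, means, var):
    """l(z_j): genotype-weighted sum of normal densities with a common variance."""
    props = hw_proportions(p_Q)
    return sum(props[g] * normal_pdf(z, means[g], var) for g in props)

def sample_log_likelihood(zs, p_Q, means, var):
    """log of the product over individuals, i.e. the sum of log l(z_j)."""
    return sum(math.log(individual_likelihood(z, p_Q, means, var)) for z in zs)
```

Working on the log scale avoids numerical underflow when the product runs over many individuals. In practice the parameters would be found by maximizing `sample_log_likelihood`, which is not shown here.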
Likelihood Ratio test for Mixtures
Null hypothesis: A single normal distribution is adequate to fit the data. The maximum of the likelihood function under the null hypothesis is
$$\max \ell_0(z_1, z_2, \cdots, z_m) = (2\pi S^2)^{-m/2} \exp\left(-\frac{1}{2S^2} \sum_{j=1}^{m} (z_j - \bar{z})^2\right)$$
The LR test for a significantly better fit under a mixture is given by $2 \ln\left(\max\{\text{likelihood under mixture}\} / \max \ell_0\right)$
The LR follows a chi-square distribution with n-2 df, where n = number of fitted parameters for the mixture (the single normal has two fitted parameters, its mean and variance)
In the null likelihood, $S^2 = \frac{1}{m} \sum_{j=1}^{m} (z_j - \bar{z})^2$, with $\bar{z}$ the sample mean.
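As a rough sketch of the test statistic: with the sample mean and $S^2$ plugged in, the exponent of the null likelihood collapses to $-m/2$, so the maximized null log-likelihood has a closed form. The function names below are hypothetical, and the maximized mixture log-likelihood is assumed to come from a separate fitting step that is not shown.

```python
import math

def null_max_log_likelihood(zs):
    """Maximized log-likelihood under a single normal distribution.

    With S^2 = (1/m) * sum (z - zbar)^2, the exponent
    -(1/(2 S^2)) * sum (z - zbar)^2 equals -m/2.
    """
    m = len(zs)
    zbar = sum(zs) / m
    s2 = sum((z - zbar) ** 2 for z in zs) / m
    return -0.5 * m * (math.log(2.0 * math.pi * s2) + 1.0)

def lr_statistic(mixture_max_loglik, null_max_loglik):
    """2 ln(max mixture likelihood / max null likelihood), on the log scale."""
    return 2.0 * (mixture_max_loglik - null_max_loglik)
```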
Complex Segregation Analysis
A significant fit to a mixture only suggests the possibility of a major gene.
A much more formal demonstration of a major gene is given by the likelihood-based method of Complex Segregation Analysis (CSA)
Testing the fit of a mixture model requires a sample of random individuals from the population.
CSA requires a pedigree of individuals. CSA uses likelihood to formally test for the transmission of a major gene in the pedigree
Building the likelihood for CSA
Start with a mixture model
Difference is that the mixing proportions are not the same for each individual, but rather are a function of its parental (presumed) genotypes
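Likelihood for individual j in family i, conditional on the parental genotypes:
$$\ell(z_{ij} \mid g_f, g_m) = \sum_{g_o=1}^{3} \Pr(g_o \mid g_f, g_m)\, \varphi(z_{ij}; \mu_{g_o}, \sigma^2)$$
• $z_{ij}$: phenotypic value of individual j in family i
• $g_f, g_m$: major-locus genotypes of the parents
• $\Pr(g_o \mid g_f, g_m)$: transmission probability of an offspring having genotype $g_o$ given the parental genotypes are $g_f$, $g_m$
• $\mu_{g_o}$: mean of genotype $g_o$
• $\sigma^2$: phenotypic variance conditioned on major-locus genotype
Sum is over all possible offspring genotypes, indexed by $g_o$ = 1, 2, 3
Example: code qq=3, Qq=2, QQ=1. Then
$\Pr(g_o=3 \mid g_f=1, g_m=2) = \Pr(qq \mid g_f=QQ, g_m=Qq) = 0$
$\Pr(g_o=2 \mid g_f=1, g_m=2) = \Pr(Qq \mid g_f=QQ, g_m=Qq) = 1/2$
$\Pr(g_o=1 \mid g_f=1, g_m=2) = \Pr(QQ \mid g_f=QQ, g_m=Qq) = 1/2$
Likelihood for family i
Conditional family likelihood:
$$\ell(z_{i\cdot} \mid g_f, g_m) = \prod_{j=1}^{n_i} \ell(z_{ij} \mid g_f, g_m)$$
Sum over all possible parental genotypes:
$$\ell(z_{i\cdot}) = \sum_{g_f=1}^{3} \sum_{g_m=1}^{3} \ell(z_{i\cdot} \mid g_f, g_m)\, \Pr(g_f, g_m)$$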
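The family likelihood can be sketched as follows. This is an illustration under stated assumptions, not a full CSA implementation: transmission is taken to be Mendelian (obtained by enumerating parental gametes), and the joint parental genotype probability Pr(gf, gm) is assumed to be the product of Hardy-Weinberg frequencies (random mating). All function names are hypothetical.

```python
import math
from itertools import product

GENOTYPES = ("QQ", "Qq", "qq")

def normal_pdf(z, mean, var):
    """Normal density phi(z; mean, var)."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mendelian_transmission(go, gf, gm):
    """Pr(go | gf, gm): enumerate the four equally likely gamete pairs."""
    hits = 0
    for a in gf:
        for b in gm:
            # sorted() puts 'Q' before 'q', matching the labels in GENOTYPES
            if "".join(sorted(a + b)) == go:
                hits += 1
    return hits / 4.0

def conditional_family_likelihood(zs, gf, gm, means, var):
    """Product over sibs of the parent-conditioned mixture likelihood."""
    lik = 1.0
    for z in zs:
        lik *= sum(mendelian_transmission(go, gf, gm) * normal_pdf(z, means[go], var)
                   for go in GENOTYPES)
    return lik

def family_likelihood(zs, p_Q, means, var):
    """Sum the conditional likelihood over all nine parental genotype pairs."""
    q = 1.0 - p_Q
    freq = {"QQ": p_Q * p_Q, "Qq": 2.0 * p_Q * q, "qq": q * q}  # assumption: HW
    return sum(conditional_family_likelihood(zs, gf, gm, means, var) * freq[gf] * freq[gm]
               for gf, gm in product(GENOTYPES, repeat=2))
```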
Transmission Probabilities
Explicitly model the transmission probabilities:
$$\Pr(qq \mid g_f, g_m) = (1 - \tau_{g_f})(1 - \tau_{g_m})$$
$$\Pr(Qq \mid g_f, g_m) = \tau_{g_f}(1 - \tau_{g_m}) + \tau_{g_m}(1 - \tau_{g_f})$$
$$\Pr(QQ \mid g_f, g_m) = \tau_{g_f}\,\tau_{g_m}$$
where $\tau_{g_f}$ is the probability that the father transmits Q and $\tau_{g_m}$ is the probability that the mother transmits Q.
Formal CSA test of a major gene (three steps):
• Rejection of the hypothesis of equal transmission for all genotypes ($\tau_{QQ} = \tau_{Qq} = \tau_{qq}$)
• Failure to reject the hypothesis of Mendelian segregation: $\tau_{QQ} = 1$, $\tau_{Qq} = 1/2$, $\tau_{qq} = 0$
• Significantly better overall fit of a mixture model compared with a single normal
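A small sketch of the transmission-probability model, with hypothetical names. Under Mendelian segregation the transmission parameters are 1, 1/2 and 0 for QQ, Qq and qq parents, and a QQ x Qq mating then gives offspring probabilities of 1/2, 1/2 and 0 for QQ, Qq and qq.

```python
def offspring_genotype_probs(tau_f, tau_m):
    """Offspring genotype probabilities given each parent's Pr(transmit Q)."""
    return {
        "QQ": tau_f * tau_m,
        "Qq": tau_f * (1.0 - tau_m) + tau_m * (1.0 - tau_f),
        "qq": (1.0 - tau_f) * (1.0 - tau_m),
    }

# Transmission parameters under Mendelian segregation
MENDELIAN_TAU = {"QQ": 1.0, "Qq": 0.5, "qq": 0.0}

# Father QQ, mother Qq
probs = offspring_genotype_probs(MENDELIAN_TAU["QQ"], MENDELIAN_TAU["Qq"])
```

In a CSA fit the three transmission parameters would be treated as free and estimated from the pedigree, then compared against the equal-transmission and Mendelian hypotheses.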
CSA Modification: Common Family Effects
Families can share a common environmental effect
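Expected value for genotype $g_o$ in family i is $\mu_{g_o} + c_i$
Likelihood conditioned on common family effect $c_i$:
$$\ell(\mathbf{z}_i \mid g_f, g_m, c_i) = \prod_{j=1}^{n_i} \left[ \sum_{g_{o_j}=1}^{3} \Pr(g_{o_j} \mid g_f, g_m)\, \varphi(z_{ij}; \mu_{g_{o_j}} + c_i, \sigma^2) \right]$$
Unconditional likelihood (average over all c --- assumed Normal with mean zero and variance $\sigma_c^2$):
$$\ell(\mathbf{z}_i \mid g_f, g_m) = \int_{-\infty}^{\infty} \ell(\mathbf{z}_i \mid g_f, g_m, c)\, \varphi(c; 0, \sigma_c^2)\, dc$$
Likelihood function with no major gene, but family effects:
$$\ell(\mathbf{z}_i) = \int_{-\infty}^{\infty} \ell(\mathbf{z}_i \mid c)\, \varphi(c; 0, \sigma_c^2)\, dc = \int_{-\infty}^{\infty} \left[ \prod_{j=1}^{n_i} \varphi(z_{ij}; \mu + c, \sigma^2) \right] \varphi(c; 0, \sigma_c^2)\, dc$$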
Maps and Mapping Functions
The unit of genetic distance between two markers is the recombination frequency, c
If the phase of a parent is AB/ab, then 1-c is the frequency of “parental” gametes (e.g., AB and ab), while c is the frequency of “nonparental” gametes (e.g., Ab and aB).
A parental gamete results from an EVEN number of crossovers, e.g., 0, 2, 4, etc.
For a nonparental (also called a recombinant) gamete, need an ODD number of crossovers between A & B, e.g., 1, 3, 5, etc.
Hence, simply using the frequency of “recombinant” (i.e. nonparental) gametes UNDERESTIMATES the number m of crossovers, with E[m] > c
Mapping functions attempt to estimate the expected number of crossovers m from observed recombination frequencies c
When considering two linked loci, the phenomenon of interference must be taken into account
The presence of a crossover in one interval typically decreases the likelihood of a nearby crossover
In particular, c = Prob(odd number of crossovers)
Suppose the order of the genes is A-B-C.
If there is no interference (i.e., crossovers occur independently of each other) then
$$c_{AC} = c_{AB}(1 - c_{BC}) + (1 - c_{AB})\,c_{BC} = c_{AB} + c_{BC} - 2\,c_{AB}\,c_{BC}$$
• $c_{AC}$: Probability(odd number of crossovers btw A and C)
• $c_{AB}(1 - c_{BC})$: odd number of crossovers btw A & B and even number between B & C
• $(1 - c_{AB})\,c_{BC}$: even number in A-B, odd number in B-C
We need to assume independence of crossovers in order to multiply these two probabilities
When interference is present, we can write this as
$$c_{AC} = c_{AB} + c_{BC} - 2(1 - \delta)\,c_{AB}\,c_{BC}$$
Interference parameter $\delta$:
• $\delta = 1$ --> complete interference: the presence of a crossover eliminates nearby crossovers
• $\delta = 0$ --> no interference: crossovers occur independently of each other
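A one-function sketch of the three-locus relation, with the interference parameter as an optional argument. The function name and the example recombination frequencies are illustrative only.

```python
def recomb_AC(c_AB, c_BC, delta=0.0):
    """Recombination frequency between A and C for gene order A-B-C.

    delta = 0: no interference (crossovers independent).
    delta = 1: complete interference (recombination frequencies add).
    """
    return c_AB + c_BC - 2.0 * (1.0 - delta) * c_AB * c_BC

no_interference = recomb_AC(0.1, 0.2)              # 0.1 + 0.2 - 2(0.1)(0.2)
complete_interference = recomb_AC(0.1, 0.2, 1.0)   # 0.1 + 0.2
```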
Mapping functions. Moving from c to m
Haldane’s mapping function (gives Haldane map distances)
Assume the number k of crossovers in a region follows a Poisson distribution with parameter m. This makes the assumption of NO INTERFERENCE
Pr(Poisson = k) = $\lambda^k e^{-\lambda} / k!$, where $\lambda$ = expected number of successes
Prob(odd number of crossovers):
$$c = \sum_{k=0}^{\infty} p(m, 2k+1) = e^{-m} \sum_{k=0}^{\infty} \frac{m^{2k+1}}{(2k+1)!} = \frac{1 - e^{-2m}}{2}$$
This gives the estimated Haldane distance as
$$m = -\frac{\ln(1 - 2c)}{2}$$
Usually reported in units of Morgans or centimorgans (cM). One Morgan --> m = 1.0. One cM --> m = 0.01
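Haldane's mapping function and its inverse can be sketched as below, together with a numerical check that summing the Poisson probabilities of odd crossover counts reproduces the closed form. Function names and the truncation length `terms` are choices made for this illustration.

```python
import math

def haldane_c_from_m(m):
    """Recombination frequency c for Haldane map distance m (in Morgans)."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))

def haldane_m_from_c(c):
    """Haldane map distance m (in Morgans) for recombination frequency c < 1/2."""
    return -0.5 * math.log(1.0 - 2.0 * c)

def prob_odd_crossovers(m, terms=40):
    """Truncated sum of Poisson(m) probabilities over odd crossover counts."""
    return sum(math.exp(-m) * m ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))

m_example = haldane_m_from_c(0.10)   # slightly larger than c, since E[m] > c
```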
Linkage disequilibrium mapping
Idea is to use a random sample of individuals from the population rather than a large pedigree.
Ironically, in the right settings this approach has more power for fine mapping than pedigree analysis.
Why?
Key is the expected number of recombinants. In a pedigree, Prob(no recombinants) in n individuals is $(1-c)^n$
LD mapping uses the historical recombinants in a sample. Prob(no recomb) = $(1-c)^{2t}$, where t = time back to most recent common ancestor
Expected number of recombinants in a sample of n sibs is cn
Expected number of recombinants in a sample of n random individuals with a time t back to the MRCA (most recent common ancestor) is 2cnt
Hence, if t is large, many more expected recombinants in random sample and hence more power for very fine mapping (i.e. c < 0.01)
Because there are so many expected recombinants, this only works with c very small
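The comparison between pedigree and LD designs comes down to simple arithmetic, sketched here with hypothetical function names and illustrative values (c = 0.001, n = 100, t = 100 are not from the lecture).

```python
def prob_no_recomb_pedigree(c, n):
    """Probability of no recombinants among n individuals in a pedigree."""
    return (1.0 - c) ** n

def prob_no_recomb_ld(c, t):
    """Probability of no historical recombination, t generations to the MRCA."""
    return (1.0 - c) ** (2 * t)

def expected_recombinants_sibs(c, n):
    """Expected number of recombinants in a sample of n sibs."""
    return c * n

def expected_recombinants_ld(c, n, t):
    """Expected number of recombinants in n random individuals, MRCA t generations back."""
    return 2.0 * c * n * t

# Illustrative values only
c, n, t = 0.001, 100, 100
ratio = expected_recombinants_ld(c, n, t) / expected_recombinants_sibs(c, n)
```

The ratio of expected recombinants is 2t, which is why a deep genealogy gives more resolution at very small c.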
Fine-mapping genes
Suppose an allele causing a large effect on the trait arose as a single mutation in a closed population
New mutation arises on red chromosome
Initially, the new mutation is largely associated with the red haplotype
Hence, markers that define the red haplotype are likely to be associated (i.e. in LD) with the mutant allele
This linkage disequilibrium decays slowly with time if c is small
Let $\pi$ = Prob(mutation associated with original haplotype)
$\pi = (1-c)^t$
Thus if we can estimate $\pi$ and t, we can solve for c,
$c = 1 - \pi^{1/t}$
Diastrophic dysplasia (DTD) association with CSF1R marker locus alleles

Allele   Normal        DTD-bearing
1-1      4 (3.3%)      144 (94.7%)
1-2      28 (22.7%)    1 (0.7%)
2-1      7 (5.7%)      0 (0%)
2-2      84 (68.3%)    7 (4.6%)
Most frequent allele type varies between normal and DTD-bearing haplotypes
Hence, allele 1-1 appears to be on the original haplotype in which the DTD mutation arose --> Prob(mutation associated with original haplotype) = 0.947
100 generations to MRCA used for Finnish population
$c = 1 - (0.947)^{1/t} = 1 - (0.947)^{1/100}$
Gives c = 0.00051 between marker and DTD. Best estimate from pedigrees is c = 0.012 (1.2 cM)
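The calculation can be sketched directly from the haplotype counts. The function name is hypothetical; the inputs are the DTD-bearing counts for allele 1-1 (144 of 152) and the 100 generations assumed for the Finnish population. Evaluating the formula with these rounded inputs gives a value of roughly 5 x 10^-4, the same order as the figure quoted above.

```python
def recomb_from_haplotype_sharing(prob_original, t):
    """Solve prob_original = (1 - c)^t for the recombination frequency c."""
    return 1.0 - prob_original ** (1.0 / t)

# Fraction of DTD-bearing chromosomes carrying marker allele 1-1
prob_original = 144 / (144 + 1 + 0 + 7)

# t = 100 generations back to the MRCA
c_hat = recomb_from_haplotype_sharing(prob_original, 100)
```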
Candidate Loci and the TDT
Often try to map genes by using case/control contrasts, also called association mapping.
The frequencies of marker alleles are measured in both a
• case sample -- showing the trait (or extreme values)
• control sample -- not showing the trait
The idea is that if the marker is in tight linkage, we might expect LD between it and the particular DNA site causing the trait variation.
Problem with case-control approach: Population stratification can give false positives.
When the population being sampled actually consists of several distinct subpopulations we have lumped together, marker alleles may provide information as to which group an individual belongs. If there are other risk factors in a group, this can create a false association btw marker and trait
Example. The Gm marker was thought (for biological reasons) to be an excellent candidate gene for diabetes in the high-risk population of Pima Indians in the American Southwest. Initially a very strong association was observed:

Gm+       Total    % with diabetes
Present   293      8%
Absent    4,627    29%

Problem: freq(Gm+) in Caucasians (lower-risk diabetes population) is 67%, Gm+ rare in full-heritage Pima
The association was re-examined in a population of Pima that were 7/8th (or more) full heritage:

Gm+       Total    % with diabetes
Present   17       59%
Absent    1,764    60%
Transmission-disequilibrium test (TDT)
The TDT accounts for population structure. It requires sets of relatives and compares the number of times a marker allele is transmitted (T) versus not-transmitted (NT) from a marker heterozygote parent to affected offspring.
Under the hypothesis of no linkage, these values should be equal, resulting in a chi-square test for lack of fit:
$$\chi^2_{td} = \frac{(T - NT)^2}{(T + NT)}$$
Scan for type I diabetes in humans. Marker locus D2S152

Allele   T     NT    χ²      p
228      81    45    10.29   0.001
230      59    73    1.48    0.223
240      36    24    2.40    0.121

For allele 228: $\chi^2 = \frac{(81 - 45)^2}{(81 + 45)} = 10.29$
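The TDT statistic is easy to compute from the transmitted and not-transmitted counts. In this sketch the function names are hypothetical, and the p-value uses the identity that for a chi-square variable with 1 df, Pr(X > x) = erfc(sqrt(x/2)); treating the statistic as 1-df chi-square is an assumption of the example.

```python
import math

def tdt_chi_square(T, NT):
    """TDT statistic (T - NT)^2 / (T + NT)."""
    return (T - NT) ** 2 / (T + NT)

def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

# (allele, T, NT) counts for marker locus D2S152
counts = [(228, 81, 45), (230, 59, 73), (240, 36, 24)]

results = {allele: (tdt_chi_square(T, NT), chi2_1df_pvalue(tdt_chi_square(T, NT)))
           for allele, T, NT in counts}
```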