The rise (and fall) of QTL mapping: The fusion of quantitative & molecular genetics

Bruce Walsh (jbwalsh@u.arizona.edu)
Depts of Ecology & Evolutionary Biology, Plant Sciences, Animal Sciences, Molecular & Cellular Biology, Epidemiology & Biostatistics
University of Arizona

Rough outline

• Classical Quantitative Genetics
• The Golden Age: The search for QTLs
  – History and review of methods
• History revised: how successful has the search for QTLs been?
• The next wave:
  – eQTLs
  – Association mapping
  – Molecular signatures of selection
  – Are these improvements?
• Summary: Where is quantitative genetics today?

Quantitative Genetics

Quantitative Genetics is the analysis of traits whose variation is influenced by both genetic and environmental factors.

The assumption is that the genotype of an individual cannot be easily predicted from its phenotype.

Indeed, the genotypes (and hence loci) contributing to trait variation have historically been assumed to be unknown and largely unknowable.

“Classical” Quantitative Genetics works with genetic variance components, which are often easy to estimate.

Genetic variance components

Fisher (1918) reconciled quantitative traits with Mendelian genetics, building on statistical machinery developed by the biometricians. The term variance was first introduced in Fisher’s paper (as well as ANOVA).

Z = G + E

(phenotypic value = genotypic value + environmental value)

Fisher’s key insight was that, in sexual species, parents do not pass along their genotypic value G to their offspring, but rather only pass along part of it, the breeding value A:

G = A + D + I

Fisher also noted that the variance of A can be estimated from phenotypic covariances among relatives.

Variance components and Selection Response

Cov(Parent, offspring) = Var(A)/2

Cov(half sibs) = Var(A)/4

Cov(full sibs) = Var(A)/2 + Var(D)/4 + Var(Ec)

Thus, without any genetic information, we can still estimate important genetic features associated with the trait variation in a particular population.

Key use: the Breeders’ Equation for selection response, R = h^2 S, with the heritability h^2 = Var(A)/Var(P).
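The chain from a covariance among relatives to a predicted selection response can be sketched in a few lines. This is only an illustration: the function names are made up here, and the covariance, phenotypic variance, and selection differential are hypothetical numbers, not data from the talk.

```python
def additive_variance_from_half_sibs(cov_half_sibs):
    """Var(A) implied by Cov(half sibs) = Var(A)/4."""
    return 4.0 * cov_half_sibs


def heritability(var_a, var_p):
    """Narrow-sense heritability h^2 = Var(A)/Var(P)."""
    return var_a / var_p


def selection_response(h2, selection_differential):
    """Breeders' equation R = h^2 * S."""
    return h2 * selection_differential


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
var_a = additive_variance_from_half_sibs(cov_half_sibs=2.5)
h2 = heritability(var_a, var_p=40.0)
response = selection_response(h2, selection_differential=8.0)
print(var_a, h2, response)
```

No marker or genotype information enters anywhere, which is the point of the slide.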

Quantitative Genetics: The infinitesimal model

At the heart of much of classical quantitative genetics is the infinitesimal model -- the genetic variation in a trait is due to a large number of loci each of small effect.

Classical quantitative genetics represents the fusion of Mendelian and population genetics, under the umbrella of classical statistical methods.

What about a fusion of quantitative genetics with molecular biology and genomics?

Statistics and Molecular biology

The success of “classical” quantitative genetics (variance components and related statistical measures) has been spectacular, esp. in plant and animal breeding.

However, the solely statistical nature of this approach has been unsettling to some, and the demise of the field was predicted once we had a better molecular handle on trait variation.

Thus, starting with the ability to score a vast number of molecular markers, the fusion of molecular biology and quantitative genetics seemed a possibility.

Quantitative Trait Loci, QTLs

The first “harvest” from the ability to score a modest number of molecular markers was the ability to search for Quantitative Trait Loci (QTLs), loci showing allelic variation that influences trait variation (mid 1980s).

Conceptually, nothing new, as this is just linkage analysis.

Consider the gametes from an AB/ab parent, where A & B are linked loci. We observe an excess of AB and ab gametes, and a deficiency of Ab and aB gametes.

Suppose B influences a trait, making it larger. Offspring getting the A allele from this parent disproportionately get the B allele as well, and hence have larger trait values.

Early localization of factors influencing quantitative traits was done by Payne (1918), Sax (1923), and Thoday (1960s).

Sax (1923) crossed two inbred bean lines differing in seed pigment and weight, with the pigmented parents having heavier seeds than the nonpigmented parents.

These crosses demonstrated that seed pigment is determined by a single locus with two alleles, P and p.

Among F2 segregants from this cross, PP and Pp seeds were 4.3 +/- 0.8 and 1.9 +/- 0.6 centigrams heavier than pp seeds.

Hence, the P allele is linked to a factor (or factors) that act in an additive fashion on seed weight.

Markers and more markers

While the basic outline of QTL mapping has been known for over 70 years, the lack of sufficient genetic markers prevented its widespread use until the mid 1980s.

The early studies (in maize) used 50-80 markers, mostly allozymes, that were very loosely linked (marker spacing much greater than 20 cM).

With the advent of DNA markers (esp. STRs = microsatellites), the number and density of markers have grown, resulting in a parallel development of more statistically sophisticated approaches to mapping that use this additional information.

The statistical machinery for QTL mapping

Single-marker linear model approaches

Interval mapping: pairs of markers, move to maximum likelihood (ML) approaches

Composite interval mapping: analysis of a marker interval, flanked by adjacent markers. ML-based

Shrinkage and Bayesian approaches for detecting epistasis

From line-cross analysis to the analysis of outbred populations: mixed models

Conditional Probabilities of QTL Genotypes

The basic building block for all QTL methods is Pr(Qk | Mj), the probability of QTL genotype Qk given that the marker genotype is Mj:

$\Pr(Q_k \mid M_j) = \dfrac{\Pr(Q_k M_j)}{\Pr(M_j)}$

Consider a QTL linked to a marker (recombination fraction = c). Cross MMQQ x mmqq. The parental gametes are MQ and mq, so every F1 is MQ/mq.

Among the F1 gametes that form the F2, freq(MQ) = freq(mq) = (1-c)/2 and freq(mQ) = freq(Mq) = c/2.

Hence,

Pr(MMQQ) = Pr(MQ)Pr(MQ) = (1-c)^2/4

Pr(MMQq) = 2Pr(MQ)Pr(Mq) = 2c(1-c)/4

Pr(MMqq) = Pr(Mq)Pr(Mq) = c^2/4

Since Pr(MM) = 1/4, the conditional probabilities become

Pr(QQ | MM) = Pr(MMQQ)/Pr(MM) = (1-c)^2

Pr(Qq | MM) = Pr(MMQq)/Pr(MM) = 2c(1-c)

Pr(qq | MM) = Pr(MMqq)/Pr(MM) = c^2
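As a quick check on these expressions, the three conditional probabilities can be computed for any recombination fraction and confirmed to sum to one. The function name is illustrative only.

```python
def qtl_given_marker_mm(c):
    """Pr(QTL genotype | marker genotype MM) in an F2, for recombination fraction c."""
    return {
        "QQ": (1 - c) ** 2,
        "Qq": 2 * c * (1 - c),
        "qq": c ** 2,
    }


probs = qtl_given_marker_mm(0.1)
print(probs, sum(probs.values()))
```

With complete linkage (c = 0) an MM individual is certainly QQ; as c approaches 1/2 the marker carries no information and the probabilities approach the 1/4, 1/2, 1/4 F2 frequencies.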

Expected Marker Means

The expected trait mean for marker genotype Mj is just

$\mu_{M_j} = \sum_{k=1}^{N} \mu_{Q_k} \Pr(Q_k \mid M_j)$

For example, if the genotypic values are QQ = 2a, Qq = a(1+k), qq = 0, then in the F2 of an MMQQ x mmqq cross,

$(\mu_{MM} - \mu_{mm})/2 = a(1 - 2c)$

• If the trait mean is significantly different for the genotypes at a marker locus, the marker is linked to a QTL.

• A small MM-mm difference could reflect (i) a tightly linked QTL of small effect or (ii) loose linkage to a QTL of large effect.
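The confounding of effect size and map distance in the second bullet can be seen numerically. The sketch below computes the expected marker means from the conditional probabilities; the probabilities for marker genotype mm are obtained by symmetry (QQ and qq swap roles), and the function names and parameter values are illustrative assumptions.

```python
def f2_marker_means(a, k, c):
    """Expected trait means for marker genotypes MM and mm in an F2.

    QTL genotypic values: QQ = 2a, Qq = a(1+k), qq = 0.
    """
    values = {"QQ": 2 * a, "Qq": a * (1 + k), "qq": 0.0}
    pr_given_MM = {"QQ": (1 - c) ** 2, "Qq": 2 * c * (1 - c), "qq": c ** 2}
    # By symmetry, for marker genotype mm the roles of QQ and qq are swapped.
    pr_given_mm = {"QQ": c ** 2, "Qq": 2 * c * (1 - c), "qq": (1 - c) ** 2}
    mu_MM = sum(values[g] * pr_given_MM[g] for g in values)
    mu_mm = sum(values[g] * pr_given_mm[g] for g in values)
    return mu_MM, mu_mm


def marker_contrast(a, k, c):
    """(mu_MM - mu_mm)/2, which should equal a(1 - 2c)."""
    mu_MM, mu_mm = f2_marker_means(a, k, c)
    return (mu_MM - mu_mm) / 2


# A large QTL at c = 0.25 and a QTL of half the size at c = 0 give the same contrast.
loose_large = marker_contrast(a=1.0, k=0.3, c=0.25)
tight_small = marker_contrast(a=0.5, k=0.3, c=0.0)
print(loose_large, tight_small)
```

The dominance parameter k cancels out of the contrast, which is why a single marker cannot separate a from c.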

Hence, the use of single markers provides for detection of a QTL. However, single-marker means do not allow separate estimation of a and c.

Now consider using interval mapping (flanking markers):

$\dfrac{\mu_{M_1M_1M_2M_2} - \mu_{m_1m_1m_2m_2}}{2} = a\left(\dfrac{1 - c_1 - c_2}{1 - c_1 - c_2 + 2c_1c_2}\right) \simeq a(1 - 2c_1c_2)$

This is essentially a for even modest linkage.

$c_1 = \dfrac{1}{2}\left(1 - \dfrac{\mu_{M_1M_1} - \mu_{m_1m_1}}{2a}\right) \simeq \dfrac{1}{2}\left(1 - \dfrac{\mu_{M_1M_1} - \mu_{m_1m_1}}{\mu_{M_1M_1M_2M_2} - \mu_{m_1m_1m_2m_2}}\right)$

Hence, a and c can be estimated from the mean values of flanking marker genotypes.
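A small numerical sketch of these two approximations, assuming c1 and c2 are the recombination fractions between the QTL and flanking markers M1 and M2. The expected mean differences are generated from chosen true values and then fed back through the estimators; all names and values are illustrative.

```python
def expected_differences(a, c1, c2):
    """Expected mean differences implied by the expressions above."""
    diff_flanking = 2 * a * (1 - c1 - c2) / (1 - c1 - c2 + 2 * c1 * c2)
    diff_single = 2 * a * (1 - 2 * c1)  # mu_M1M1 - mu_m1m1 for marker 1 alone
    return diff_flanking, diff_single


def estimate_a_and_c1(diff_flanking, diff_single):
    """Approximate estimators: a ~ diff_flanking/2, c1 ~ (1 - diff_single/diff_flanking)/2."""
    a_hat = diff_flanking / 2
    c1_hat = 0.5 * (1 - diff_single / diff_flanking)
    return a_hat, c1_hat


diff_flanking, diff_single = expected_differences(a=1.0, c1=0.05, c2=0.10)
a_hat, c1_hat = estimate_a_and_c1(diff_flanking, diff_single)
print(a_hat, c1_hat)
```

With these inputs the recovered values are close to, but slightly below, the true a = 1 and c1 = 0.05, reflecting the approximation step.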

Linear Models for QTL Detection

The use of differences in the mean trait value for different marker genotypes to detect a QTL and estimate its effects is a use of linear models.

One-way ANOVA:

$z_{ik} = \mu + b_i + e_{ik}$

• z_ik: value of the trait in the kth individual of marker genotype i
• b_i: effect of marker genotype i on the trait value

Detection: a QTL is linked to the marker if at least one of the b_i is significantly different from zero.

Estimation (QTL effect and position): this requires relating the b_i to the QTL effects and map position.
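The detection step can be sketched as a one-way ANOVA F statistic across marker genotype classes. The trait values below are hypothetical, and the helper is a plain-Python illustration rather than any particular package's routine.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA; groups maps marker genotype -> trait values."""
    all_values = [z for values in groups.values() for z in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((z - group_mean) ** 2 for z in values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical trait values for the three F2 marker genotype classes.
trait_by_marker = {
    "MM": [12.0, 13.0, 11.0, 14.0],
    "Mm": [10.0, 11.0, 9.0, 10.0],
    "mm": [8.0, 9.0, 7.0, 8.0],
}
f_stat = one_way_anova_f(trait_by_marker)
print(f_stat)
```

A large F relative to the F distribution with (2, 9) degrees of freedom would indicate that at least one b_i differs from zero, i.e. a linked QTL.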

Maximum Likelihood Methods

ML methods use the entire distribution of the data, not just the marker genotype means.

More powerful than linear models, but not as flexible in extending solutions (a new analysis is required for each model).

Basic likelihood function:

$\ell(z \mid M_j) = \sum_{k=1}^{N} \varphi(z, \mu_{Q_k}, \sigma^2) \Pr(Q_k \mid M_j)$

• ℓ(z | Mj): likelihood of the trait value z given that the marker genotype is type j
• The sum is over the N possible linked QTL genotypes
• φ(z, μ_Qk, σ²): the distribution of the trait value given QTL genotype k is normal with mean μ_Qk and variance σ² (QTL effects enter here)
• Pr(Qk | Mj): probability of QTL genotype k given marker genotype j (the genetic map and linkage phase enter here)

This is a mixture model.
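For a single F2 individual with marker genotype MM, the mixture likelihood can be written out directly. This is a minimal sketch assuming the F2 genotypic values QQ = 2a, Qq = a(1+k), qq = 0 used earlier; function names and numeric inputs are illustrative.

```python
import math


def normal_pdf(z, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-((z - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def likelihood_given_mm(z, a, k, var, c):
    """Mixture likelihood of trait value z for an F2 individual with marker genotype MM."""
    qtl_means = {"QQ": 2 * a, "Qq": a * (1 + k), "qq": 0.0}
    qtl_probs = {"QQ": (1 - c) ** 2, "Qq": 2 * c * (1 - c), "qq": c ** 2}
    return sum(normal_pdf(z, qtl_means[g], var) * qtl_probs[g] for g in qtl_means)


lik_linked = likelihood_given_mm(z=1.5, a=1.0, k=0.0, var=1.0, c=0.1)
lik_complete = likelihood_given_mm(z=1.5, a=1.0, k=0.0, var=1.0, c=0.0)
print(lik_linked, lik_complete)
```

The likelihood for a whole sample would be the product of such terms over individuals, maximized over a, k, the variance, and c.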

ML methods combine both detection and estimation of QTL effects/position.

The test for a linked QTL is given by the likelihood-ratio (LR) test:

$LR = -2 \ln \dfrac{\max \ell_r(z)}{\max \ell(z)}$

• max ℓ_r(z): maximum of the likelihood under a no-linked-QTL model
• max ℓ(z): maximum of the full likelihood

The LR score is often plotted by trying different locations for the QTL (i.e., values of c) and computing a LOD score for each:

$LOD(c) = -\log_{10}\left[\dfrac{\max \ell_r(z)}{\max \ell(z; c)}\right] = \dfrac{LR(c)}{2 \ln 10} \simeq \dfrac{LR(c)}{4.61}$
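The relation between the LR statistic and the LOD score, and the idea of a LOD profile, can be sketched as follows. The maximized log-likelihoods and map positions are hypothetical placeholders; in practice they would come from fitting the mixture model at each position.

```python
import math


def likelihood_ratio(loglik_reduced, loglik_full):
    """LR = -2 ln(max l_r / max l), written in terms of maximized log-likelihoods."""
    return -2.0 * (loglik_reduced - loglik_full)


def lod_score(loglik_reduced, loglik_full):
    """LOD = LR / (2 ln 10), roughly LR / 4.61."""
    return likelihood_ratio(loglik_reduced, loglik_full) / (2.0 * math.log(10))


# Hypothetical maximized log-likelihoods: no-QTL model, and full model at each map position (cM).
loglik_no_qtl = -520.0
loglik_by_position = {0: -519.0, 10: -515.2, 20: -513.1, 30: -516.4}

lod_profile = {pos: lod_score(loglik_no_qtl, ll) for pos, ll in loglik_by_position.items()}
best_position = max(lod_profile, key=lod_profile.get)
print(best_position, lod_profile[best_position])
```

The position with the highest LOD is the estimated QTL location; a support interval is usually read off the same profile.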

A typical QTL map from a likelihood analysis

[Figure: likelihood profile along a chromosome, marking the estimated QTL location, its support interval, and the significance threshold.]

Interval Mapping with Marker Cofactors

[Figure: four adjacent markers in map order, i-1, i, i+1, i+2; the interval being mapped lies between i and i+1.]

Consider interval mapping using the markers i and i+1. QTLs linked to these markers, but outside this interval, can contribute (falsely) to estimation of QTL position and effect.

Now suppose we also add the two markers flanking the interval (i-1 and i+2).

Inclusion of markers i-1 and i+2 fully accounts for any linked QTLs to the left of i-1 and to the right of i+2.

However, this still does not account for QTLs in the blue areas of the figure (between i-1 and i, and between i+1 and i+2).

Interval mapping + marker cofactors is called Composite Interval Mapping (CIM).

CIM also (potentially) includes unlinked markers to account for QTLs on other chromosomes.

CIM works by adding an additional term to the linear model,

$\sum_{k \neq i,\, i+1} b_k x_{kj}$

From Line Crosses to Outbred Populations

Much of the above discussion was for the analysis of line-cross data.

In such cases, all of the F1 offspring have the same genotype, namely MQ/mq, being heterozygous at all loci that show fixed differences between the lines being crossed. We can thus lump all offspring together.

In contrast, with outbred populations, each individual has a unique genotype, and hence each parent must be examined separately.

For example, if a father is M1/M2, we contrast phenotypic values in offspring getting M1 vs. M2 from this parent.

The reason is that (say) a father could be M1Q/M2q, while his mate might be M1q/M2Q.

Likewise, many individuals carry no linkage information, e.g., M1Q/M2Q or M1/M1.

General Pedigree Methods

Random effects (hence, variance component) method for detecting QTLs in general pedigrees:

$z_i = \mu + A_i + A'_i + e_i$

• z_i: trait value for individual i
• A_i: genetic effect of the chromosomal region of interest
• A'_i: genetic value of other (background) QTLs

The covariance between individuals i and j is thus

$\sigma(z_i, z_j) = R_{ij}\,\sigma_A^2 + 2\Theta_{ij}\,\sigma_{A'}^2$

• R_ij: fraction of the chromosomal region shared IBD between individuals i and j
• 2Θ_ij: resemblance-between-relatives correction

Mixed-model approaches are used, with variances estimated for each chromosomal region.

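Assume z is MVN, giving the covariance matrix as

$V = R\,\sigma_A^2 + A\,\sigma_{A'}^2 + I\,\sigma_e^2$

Here

$R_{ij} = \begin{cases} 1 & \text{for } i = j \\ \widehat{R}_{ij} & \text{for } i \neq j \end{cases}, \qquad A_{ij} = \begin{cases} 1 & \text{for } i = j \\ 2\Theta_{ij} & \text{for } i \neq j \end{cases}$

• R is estimated from marker data
• A is estimated from the pedigree

The resulting likelihood function is

$\ell(z \mid \mu, \sigma_A^2, \sigma_{A'}^2, \sigma_e^2) = \dfrac{1}{\sqrt{(2\pi)^n |V|}} \exp\left[-\dfrac{1}{2}(z - \mu)^T V^{-1} (z - \mu)\right]$

A significant $\sigma_A^2$ indicates a linked QTL.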
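For the smallest possible pedigree, a single pair of relatives (n = 2), the likelihood can be written in closed form without matrix libraries. This is a sketch under that restriction only; the argument names (r_ibd for R_ij, two_theta for 2Θ_ij, var_bg for the background variance) and the numbers are assumptions made for illustration.

```python
import math


def pair_loglik(z1, z2, mu, r_ibd, two_theta, var_a, var_bg, var_e):
    """MVN log-likelihood for one pair of relatives under V = R*var_a + A*var_bg + I*var_e."""
    v = var_a + var_bg + var_e               # diagonal element of V
    cov = r_ibd * var_a + two_theta * var_bg  # off-diagonal element of V
    det = v * v - cov * cov                   # |V| for a 2 x 2 matrix
    d1, d2 = z1 - mu, z2 - mu
    # Quadratic form (z - mu)^T V^{-1} (z - mu), using the 2 x 2 inverse.
    quad = (v * (d1 * d1 + d2 * d2) - 2.0 * cov * d1 * d2) / det
    return -0.5 * (2.0 * math.log(2.0 * math.pi) + math.log(det) + quad)


# A phenotypically concordant sib pair (two_theta = 0.5), with and without IBD sharing at the region.
ll_sharing = pair_loglik(2.0, 2.0, 0.0, r_ibd=1.0, two_theta=0.5, var_a=1.0, var_bg=1.0, var_e=1.0)
ll_no_sharing = pair_loglik(2.0, 2.0, 0.0, r_ibd=0.0, two_theta=0.5, var_a=1.0, var_bg=1.0, var_e=1.0)
print(ll_sharing, ll_no_sharing)
```

When the region carries a QTL, concordant pairs are better explained by high IBD sharing, which is the signal the variance-component test picks up across many pairs.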

What are some of the take-home messages from QTL mapping studies?

• Most traits show several (4-30) QTLs that are localized to modest-sized chromosomal segments.

• Detected QTLs typically account for between 5 and 50% of the observed phenotypic variation (in the F2).

• Transgressive segregation is often observed, with high-trait alleles being found in low-trait-value lines, and vice versa (hidden variation for selection).

• Epistasis appears to be lacking in many studies, but seems to be fairly common in eQTLs.

What are some concerns from QTL mapping studies?

• Replication of results is often poor.

• It is common for a “single” QTL region to show multiple QTLs upon more careful fine analysis, often with effects in opposite directions.

• QTL mapping does not get at the underlying genes; it only isolates chromosomal regions of interest, usually with rather poor resolution (20 cM = 20 megabases = 200 - 2000 genes).

• When isolated in inbred lines, QTLs often show strong interaction effects (G x G, G x E) that are not apparent in a normal analysis. Hence, they are likely very context-specific.

Genotype x environment interaction

Additive and dominance effects of QTL are often environment-specific.

[Figure: QTL for Drosophila longevity at different larval rearing densities. Three panels (50D, 68B, 76B) plot lifespan (days, 40-60) against density (low, high) for genotypes OO, OB, and BB.]

Slide courtesy of Trudy Mackay

More complicated effects

Epistatic effects can be sex- and environment-specific.

[Figure: QTL for Drosophila longevity. High-density and low-density panels show lifespan (days, 35-65) for combinations of BB and OB genotypes at 76B and 50D.]

Slide courtesy of Trudy Mackay

Cracks in the façade?

QTL mapping appears to dispute the infinitesimal model, suggesting a few discrete loci account for much of the variation.

Problem 1: Upon closer analysis, many of these high-value regions themselves decompose into several QTLs, not just one. How far such a decomposition can be continued before no more QTLs appear is unresolved.

Problem 2: From a molecular-biology standpoint, QTLs have not really led us significantly closer to the underlying genes, and hence to the molecular mechanisms for quantitative trait variation.

Power for detection

How many individuals must be scored in an F2 design in a line cross (a high-power setting)?

For α = 0.01, the sample size required for 90% power of detection (F2 design) is roughly 22/δ^2, where δ = a/σ is the allele effect in units of SD.

Thus, the sample sizes for δ = 0.5, 0.2, 0.1, and 0.05 are 88, 550, 2200, and 8800.

Effect of linkage: for c = 0.05, 0.1, 0.2, the increase in sample size (over c = 0) is a factor of 1.2, 1.6, and 2.8.

A typical QTL study is in the range of n = 350, giving δ = 0.25.

Most QTL studies are vastly underpowered.
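The numbers on this slide can be reproduced with a few lines. The 22/δ² rule is taken from the slide; the linkage inflation factor 1/(1-2c)² is an assumption made here, chosen because it follows from the a(1-2c) marker contrast and matches the quoted factors of 1.2, 1.6, and 2.8.

```python
import math


def f2_sample_size(delta, c=0.0):
    """Approximate F2 sample size for 90% power at alpha = 0.01.

    delta is the allele effect in SD units; c is the marker-QTL recombination fraction.
    The 1/(1-2c)^2 inflation for c > 0 is an assumed form, not stated in the source.
    """
    return 22.0 / delta ** 2 / (1 - 2 * c) ** 2


def detectable_effect(n):
    """Effect size (in SD units) detectable with n individuals when c = 0."""
    return math.sqrt(22.0 / n)


sizes = [round(f2_sample_size(d)) for d in (0.5, 0.2, 0.1, 0.05)]
inflation = [round(f2_sample_size(0.2, c) / f2_sample_size(0.2), 1) for c in (0.05, 0.1, 0.2)]
typical = round(detectable_effect(350), 2)
print(sizes, inflation, typical)
```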

Power and Repeatability: The Beavis Effect

QTLs with low power of detection tend to have their effects overestimated, often very dramatically.

As power of detection increases, the overestimation of detected QTLs becomes far less serious.

This is often called the Beavis Effect, after Bill Beavis, who first noticed it in simulation studies.

The Beavis effect raises the real concern that many QTLs of apparent large effect may be artifacts. Under an infinitesimal model this is especially a concern.

For example, a QTL accounting for 0.75% of the total F2 variation has only a 3% chance of being detected with 100 F2 progeny (markers spaced at 20 cM). For cases in which such a QTL is detected, the average estimated total variance it accounts for is 15%!
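The mechanism behind the Beavis effect can be illustrated with a deliberately simplified simulation: a contrast between two marker classes rather than a full F2 interval-mapping analysis. The estimated effect is drawn from its sampling distribution, and only replicates passing a significance threshold are kept. All parameter values are hypothetical and do not reproduce the 0.75% / 15% example above.

```python
import math
import random


def beavis_simulation(true_effect=0.2, n_per_class=50, n_reps=2000, z_crit=2.576, seed=1):
    """Return (power, mean estimated effect among detected replicates).

    Assumes unit residual variance, so the difference between two class means
    has standard error sqrt(2 / n_per_class).
    """
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n_per_class)
    threshold = z_crit * se  # two-sided test at roughly alpha = 0.01
    detected = []
    for _ in range(n_reps):
        estimate = rng.gauss(true_effect, se)
        if abs(estimate) > threshold:
            detected.append(abs(estimate))
    power = len(detected) / n_reps
    mean_detected = sum(detected) / len(detected) if detected else float("nan")
    return power, mean_detected


power, mean_detected = beavis_simulation()
print(power, mean_detected)
```

Because only estimates above the threshold are retained, and the threshold here exceeds the true effect, every detected replicate overestimates the effect. Raising n_per_class increases power and shrinks the inflation.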

Detection vs. localization

Darvasi & Soller (1997) give an appropriate expression for the sample size required for a 95% confidence interval in position, CI = 1500/(n d^2).

For a QTL with d = 0.25, 0.1, and 0.05, the sample sizes needed for a 1 cM CI are 1500, 3800, and 7600.

Fine mapping (localizing to under 1 cM) requires the generation of special lines, such as advanced intercross lines (AIC) or recombinant inbred lines (RILs). In flies, a series of overlapping deficiency strains can be used.

Tradeoffs in sample designs

Most QTL mapping studies are highly underpowered. While QTLs of modest effect can be detected with sample sizes of 500 or less, an order of magnitude more is needed for high-resolution mapping.

Adding more markers does not really improve power or resolution very much. Increasing the number of individuals does.

Ironically, we are now at the stage where it is far easier to score markers than to score phenotypes. This limits the sample sizes that can be used.

Mapping eQTLs

A currently very fashionable trend is the mapping of expression QTLs (eQTLs), locations that influence the amount of protein or RNA made by a particular gene.

A common design is to use RILs and examine a number of microarrays across a modest set of lines (10-100).

Some improvement in power (over an F2 design) occurs because of being able to replicate within each RIL and because of the expanded map distances (4 fold) found in RILs vs. F2.

Still, such designs are underpowered, making localization (cis vs. trans) difficult and inflating the contribution from detected eQTLs through the Beavis effect.

How can we improve the ability to detect QTLs?

Two complementary approaches, which require very dense marker maps, have been suggested.

• Association mapping -- much finer resolution with a smaller sample size, using historical recombinants

• Methods for detecting genes under (or very recently under) selection.

Association mapping

Basic idea is very straightforward: if there exists very tight linkage between a marker and a QTL, with marker and QTL alleles in linkage disequilibrium, then a random collection of individuals shows a marker-trait association.

Since the region of LD is expected to be very small, this method potentially allows for fine mapping using not a collection of relatives (hard to get), but rather a random (and hence likely much larger) collection of individuals from a population.

Linkage disequilibrium mapping

Idea is to use a random sample of individuals from the population rather than a large pedigree.

Ironically, in the right settings this approach has more power for fine mapping than pedigree analysis.

Why?

• Key is the expected number of recombinants. In a pedigree, Prob(no recombinants) in n individuals is (1-c)^n.

• LD mapping uses the historical recombinants in a sample. Prob(no recomb) = (1-c)^(2t), where t = time back to the most recent common ancestor.

Expected number of recombinants in a sample of n sibs is cn.

Expected number of recombinants in a sample of n random individuals with a time t back to the MRCA (most recent common ancestor) is 2cnt.

Hence, if t is large, there are many more expected recombinants in a random sample, and hence more power for very fine mapping (i.e., c < 0.01).

Because there are so many expected recombinants, this only works when c is very small.
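The contrast between the two designs is easy to tabulate. The sketch below simply evaluates the expressions cn and 2cnt for illustrative values of c, n, and t; the function names and inputs are assumptions.

```python
def expected_recombinants_sibs(c, n):
    """Expected recombinants between marker and QTL in n sibs: c * n."""
    return c * n


def expected_recombinants_random(c, n, t):
    """Expected historical recombinants in n random individuals, t generations to the MRCA: 2 * c * n * t."""
    return 2 * c * n * t


# Illustrative values: tight linkage (c = 0.01), 500 individuals, MRCA 100 generations back.
in_pedigree = expected_recombinants_sibs(c=0.01, n=500)
in_population = expected_recombinants_random(c=0.01, n=500, t=100)
print(in_pedigree, in_population)
```

With these inputs the random sample is expected to contain 200 times as many informative recombination events as the sibship (the factor 2t), which is where the extra resolution comes from.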

Dense SNP Association Mapping

Mapping genes using known sets of relatives can be problematic because of the cost and difficulty in obtaining enough relatives to have sufficient power.

By contrast, it is straightforward to gather large sets of unrelated individuals, for example a large number of cases (individuals with a particular trait/disease) and controls (those without it).

With a very dense set of SNP markers (dense = very tightly linked), it is possible to scan for markers in LD with QTLs in a random-mating population, simply because c is so small that LD has not yet decayed.

These ideas lead to consideration of a strategy of dense SNP association mapping.

For example, using 30,000 equally spaced SNPs in the 3000 cM human genome places any QTL within 0.05 cM of a SNP. Hence, for an association created t generations ago (for example, by a new mutant allele appearing at that QTL), the fraction of original LD still present is at least (1-0.0005)^t ≈ exp(-0.0005 t). Thus for mutations 100, 500, and 1000 generations old (2.5K, 12.5K, and 25K years for humans), this fraction is 95.1%, 77.8%, and 60.6%.

We thus have large samples and high disequilibrium, the recipe needed to detect linked QTLs of small effect.
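The decay calculation can be checked directly. The sketch evaluates both the exact expression (1-c)^t and the exponential approximation for c = 0.0005 (0.05 cM); function names are illustrative.

```python
import math


def ld_fraction_remaining(c, t):
    """Fraction of the original LD remaining after t generations: (1 - c)^t."""
    return (1 - c) ** t


def ld_fraction_approx(c, t):
    """Exponential approximation exp(-c t), accurate for small c."""
    return math.exp(-c * t)


c = 0.0005  # 0.05 cM expressed as a recombination fraction
fractions = {t: ld_fraction_remaining(c, t) for t in (100, 500, 1000)}
for t, frac in fractions.items():
    print(t, round(frac, 4), round(ld_fraction_approx(c, t), 4))
```

Even after 1000 generations, well over half of the original disequilibrium is expected to remain at this marker spacing.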

Problems with association mapping

Good news: Do not need a set of relatives. Hence, it is easier to gather a large sample.

Bad news: One can have marker-trait associations in the absence of linkage. For example, if a marker predicts group membership, and being in that group gives you a different trait value, then a marker-trait covariance will occur.

This is the problem of population stratification.

When the population being sampled actually consists of several distinct subpopulations that we have lumped together, marker alleles may provide information as to which group an individual belongs. If there are other risk factors in a group, this can create a false association between marker and trait.

Example. The Gm marker was thought (for biological reasons) to be an excellent candidate gene for diabetes in the high-risk population of Pima Indians in the American Southwest. Initially a very strong association was observed:

Gm+        Total    % with diabetes
Present      293     8%
Absent     4,627    29%

Problem: freq(Gm+) in Caucasians (a lower-risk diabetes population) is 67%, while Gm+ is rare in full-blooded Pima.

The association was re-examined in a population of Pima that were 7/8th (or more) full heritage:

Gm+        Total    % with diabetes
Present       17    59%
Absent     1,764    60%
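How pooling creates such an association can be sketched with two subpopulations in which the marker has no effect on risk at all. Only the 67% marker frequency comes from the example; the mixing proportions, the other marker frequency, and both risks are hypothetical values chosen for illustration.

```python
def pooled_risk_by_marker(groups):
    """Disease risk among marker carriers and non-carriers in a pooled sample.

    Each group has a mixing weight, a marker frequency, and a disease risk that
    is the same for carriers and non-carriers within the group.
    """
    totals = {True: [0.0, 0.0], False: [0.0, 0.0]}  # [Pr(marker state), Pr(state and affected)]
    for g in groups:
        for has_marker, freq in ((True, g["marker_freq"]), (False, 1 - g["marker_freq"])):
            totals[has_marker][0] += g["weight"] * freq
            totals[has_marker][1] += g["weight"] * freq * g["risk"]
    return {state: affected / total for state, (total, affected) in totals.items()}


# Hypothetical mixture: a low-risk group where the marker is common and a
# high-risk group where it is rare.
groups = [
    {"weight": 0.3, "marker_freq": 0.67, "risk": 0.10},
    {"weight": 0.7, "marker_freq": 0.01, "risk": 0.60},
]
risk = pooled_risk_by_marker(groups)
print(risk[True], risk[False])
```

In the pooled sample carriers appear strongly protected, even though within each group the marker is unrelated to disease.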

Adjusting for population stratification

• Use molecular markers to classify individuals into groups, then do association mapping within each group (structured association mapping). This approach typically uses the program STRUCTURE.

• Use a simple regression approach, adding additional markers as cofactors for group membership, removing their effect:

$y = \mu + \sum_{k=1}^{n} \beta_k M_k + \sum_{j=1}^{m} \gamma_j b_j + e$

Scans for genes under selection

• Formal tests based on molecular variation (Tajima’s D, MK, etc.), either as a test for candidate genes or for scanning the genome for regions showing strong signals

• Reduction in levels of polymorphism around a selected site (selective sweep), or increase in the levels of polymorphism around a locus under balancing selection.

• Dense SNP approaches based on linkage disequilibrium and age of allele.

A scan of levels of polymorphism can thus suggest sites under selection

[Figure: two plots of variation against map location. A local dip in variation can reflect directional selection (a selective sweep) or a local region with reduced mutation rate; a local peak can reflect balancing selection or a local region with elevated mutation rate.]

Example: maize domestication gene tb1

Major changes in plant architecture in transition from teosinte to maize

The Doebley lab identified a gene, teosinte branched 1 (tb1), involved in many of these architectural changes.

Wang et al. (1999) observed a significant decrease in genetic variation in the 5’ NTR region of tb1, suggesting a selective sweep influenced this region. The sweep did not influence the coding region.

Wang et al (1999) Nature 398: 236.

Clark et al. (2004) examined the 5’ tb1 region in more detail, finding evidence for a sweep influencing a region of 60 - 90 kb.

Clark et al (2004) PNAS 101: 700.

Formal tests

Strict neutral theory: a single parameter describes (i) heterozygosity, (ii) the average number of differences between alleles, and (iii) the number of singletons (alleles present once in the sample).

A number of tests comparing these various measures of within-population variation have been proposed: Tajima’s D, HKA, Fu and Li’s D* and F*, Fu’s W and Fs, Fay and Wu’s H, etc.

One could either test a candidate gene or do a genomic scan using dense markers to test a sliding window along a chromosome.
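As one concrete instance of such a comparison, Tajima's D contrasts the average number of pairwise differences with the number of segregating sites scaled by a harmonic-number term; under strict neutrality both estimate the same parameter. The sketch below uses the standard constants from Tajima (1989), which are not given in the talk, and the input numbers are hypothetical.

```python
import math


def tajimas_d(n, num_segregating_sites, mean_pairwise_diff):
    """Tajima's D for a sample of n sequences (standard Tajima 1989 constants)."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = num_segregating_sites
    # Difference between the two estimators, divided by its estimated standard deviation.
    return (mean_pairwise_diff - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical sample: 10 sequences, 16 segregating sites, 3.89 pairwise differences on average.
d_stat = tajimas_d(n=10, num_segregating_sites=16, mean_pairwise_diff=3.89)
print(d_stat)
```

A negative D indicates an excess of rare variants relative to neutral expectation. In a genomic scan the same function would be applied to each sliding window.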

Rejection of neutrality ≠ locus under selection!

A central problem with all of these frequency spectrum tests is that a rejection of the strict neutral model can be caused by changes in population size in addition to a locus under selection.

Such demographic signals would be present at all loci, so one approach is to use such signals over all loci to correct the test at any particular locus.

Another approach is to use marker information to estimate the demographic parameters and then again use these to generate an appropriate null (neutral) model.

LD tests based on dense markers

A newer class of tests, not influenced by demographic factors, is based on the length of linkage disequilibrium around a target site.

Under drift, alleles at moderate to high frequencies are old, and hence have smaller tracks of disequilibrium, because recombination has had time to break down longer tracks.

LD-based tests of selection look for long tracks of disequilibrium around an allele at high frequency. This requires dense SNP markers.

Summary

The jury is still out on whether current QTL studies show that the infinitesimal model (lots of loci each of small average effects) is incorrect.

In its classic form, QTL mapping has not successfully yielded a number of actual genes contributing to small amounts of variation. Hence, it has not helped us to fuse molecular biology and quantitative genetics.

The problem with QTL mapping is not marker density (i.e., number of markers scored), but rather poor power from too few individuals being scored.

Summary (cont)

QTL mapping in microarrays (eQTLs) faces many of these same lack-of-power issues, and results should be interpreted with some care in the absence of replication.

Association mapping, requiring very dense SNP markers, offers the potential for (i) using a much larger sample (as unrelated individuals can be used) and (ii) fine mapping. However, correction for population stratification remains a concern.

LD-based tests for selection signatures seem to be a promising approach, but also require dense SNP mapping. While not a method to directly get at QTLs for a trait of interest, they do suggest loci under recent selection, which may eventually point to ecologically interesting traits.

U of A Campus

Farewell from the “desert”

Detecting epistasis

One major advantage of linear models is their flexibility. To test for epistasis between two QTLs, use an ANOVA with an interaction term:

$z = \mu + a_i + b_k + d_{ik} + e$

• a_i: effect from the marker genotype at the first marker set (can be > 1 locus)
• b_k: effect from the marker genotype at the second marker set
• d_ik: interaction between marker genotype i in the 1st marker set and k in the 2nd marker set

• At least one of the a_i significantly different from 0 ---- QTL linked to the first marker set

• At least one of the b_k significantly different from 0 ---- QTL linked to the second marker set

• At least one of the d_ik significantly different from 0 ---- interactions between QTL in sets 1 and 2
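The interaction terms in this model can be illustrated from a table of cell means, assuming a balanced design (equal numbers in every marker-genotype combination). The cell means below are hypothetical; a significance test would additionally need the within-cell variation.

```python
def interaction_terms(cell_means):
    """d_ik = cell mean - row mean - column mean + grand mean, for a balanced two-way layout."""
    n_rows, n_cols = len(cell_means), len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in cell_means]
    col_means = [sum(cell_means[i][k] for i in range(n_rows)) / n_rows for k in range(n_cols)]
    return [
        [cell_means[i][k] - row_means[i] - col_means[k] + grand for k in range(n_cols)]
        for i in range(n_rows)
    ]


# Rows: genotypes at the first marker set; columns: genotypes at the second marker set.
additive_means = [[10.0, 12.0], [13.0, 15.0]]   # effects simply add: no epistasis
epistatic_means = [[10.0, 12.0], [12.0, 20.0]]  # joint effect exceeds the sum
d_additive = interaction_terms(additive_means)
d_epistatic = interaction_terms(epistatic_means)
print(d_additive, d_epistatic)
```

For the additive table every d_ik is zero, while the second table gives interaction terms of plus or minus 1.5.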