human genetics: the hidden text of genome-wide associations

4

Click here to load reader

Upload: greg-gibson

Post on 05-Sep-2016

217 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Human Genetics: The Hidden Text of Genome-wide Associations

upwards of 80% of the messengerRNA for a particular gene and stillget a ‘normal’ plant, becausemRNA levels are not alwaysproportional to the amount ofprotein actually produced (orrequired). The possibility exists,therefore, that the alterations ingene expression observed inpolyploids are compensated forat the translational level. Logic,however, suggests that this isprobably not the case, because ofthe phenotypic success of mostpolyploids compared to theirprogenitors, and recent evidencefrom proteomic studies supportsthis assumption. In a study ofsynthetic Brassica allopolyploids,Albertin et al. [20] discovereda large number of proteinsdisplaying non-additive changes toexpression when compared to theparental taxa (305 in stem, 200 inroot). They then undertook anin silico gene expression analysisto determine whether theseproteins were restricted to aparticular grouping, such ascellular function and/orlocalisation, and found that, as withgenes identified in transcriptionalexpression studies, this was notthe case. Similarly, Albertin et al.[20] found that different proteinswithin the same complex couldhave their expression altered inopposing directions — again, asobserved in transcriptional studies.

Recent studies thereforesuggest that the process ofpolyploidisation, whetherincorporating a hybridisationevent or not, has a large-scaleimpact on gene expression inthe new individual, therebyproviding raw material uponwhich selection can act. Smallwonder, then, that polyploidspecies are so numerous — inevolutionary terms, it would seemthat two genomes are indeedbetter than one.

References1. Grant, V. (1981). Plant speciation, 2nd

Edition (New York: Columbia UniversityPress).

2. Liu, Z., and Adams, K.L. (2007).Expression partitioning between genesduplicated by polyploidy under abioticstress and during organ development.Curr. Biol. 17, 1669–1674.

3. Adams, K.L., Cronn, R., Percifield, R., andWendel, J.F. (2003). Genes duplicated bypolyploidy show unequal contributions tothe transcriptome and organ-specific

reciprocal silencing. Proc. Natl. Acad. Sci.USA 100, 4649–4654.

4. Adams, K.L., Percifield, R., andWendel, J.F. (2004). Organ-specificsilencing of duplicated genes in a newlysynthesized cotton allotetraploid.Genetics 168, 2217–2226.

5. Rieseberg, L.H., Raymond, O.,Rosenthal, D.M., Lai, Z., Livingstone, K.,Nakazato, T., Durphy, J.L.,Schwarzbach, A.E., Donovan, L.A., andLexer, C. (2003). Major ecologicaltransitions in wild sunflowers facilitated byhybridization. Science 301,1211–1216.

6. Hegarty, M.J., Jones, J.M., Wilson, I.D.,Barker, G.L., Coghill, J.A., Sanchez-Baracaldo, P., Liu, G., Buggs, R.J.A.,Abbott, R.J., Edwards, K.J., et al. (2005).Development of anonymous cDNAmicroarrays to study changes to theSenecio floral transcriptome during hybridspeciation. Mol. Ecol. 14, 2493–2510.

7. Hegarty, M.J., Barker, G.L., Wilson, I.D.,Abbott, R.J., Edwards, K.J., andHiscock, S.J. (2006). Transcriptome shockafter interspecific hybridization in Seneciois ameliorated by genome duplication.Curr. Biol. 16, 1652–1659.

8. Wang, J., Tian, L., Lee, H.-S., Wei, N.E.,Jiang, H., Watson, B., Madlung, A.,Osborn, T.C., Doerge, R.W., Comai, L.,et al. (2006). Genomewide nonadditivegene regulation in Arabidopsisallotetraploids. Genetics 172, 507–517.

9. Riddle, N.C., and Birchler, J.A. (2003).Effects of reunited diverged regulatoryhierarchies in allopolyploids and specieshybrids. Trends Genet. 19, 597–600.

10. Hammerle, B., and Ferrus, A. (2003).Expression of enhancers is altered inDrosophila melanogaster hybrids. Ecol.Dev. 5, 221–230.

11. Rieseberg, L.H., Sinervo, B., Linder, C.R.,Ungerer, M.C., and Arias, D.M. (1996).Role of gene interactions in hybridspeciation: evidence from ancient andexperimental hybrids. Science 272,741–745.

12. Wang, J., Tian, L., Lee, H.S., andChen, Z.J. (2006). Nonadditiveregulation of FRI and FLC loci mediatesflowering-time variation in Arabidopsisallopolyploids. Genetics 173, 965–974.

13. Madlung, A., Masuelli, R.W., Watson, B.,Reynolds, S.H., Davison, J., and Comai, L.(2002). Remodeling of DNA methylationand phenotypic and transcriptionalchanges in synthetic Arabidopsisallotetraploids. Plant Physiol. 129,733–746.

14. Salmon, A., Ainouche, M.L., andWendel, J.F. (2005). Genetic andepigenetic consequences of recenthybridization and polyploidy in Spartina(Poaceae). Mol. Ecol. 14, 1163–1175.

15. Lukens, L.N., Pires, J.C., Leon, E.,Vogelzang, R., Oslach, L., and Osborn, T.(2006). Patterns of sequence loss andcytosine methylation within a populationon newly resynthesized Brassica napusallopolyploids. Plant Physiol. 140,336–348.

16. Chen, Z.J., and Tian, L. (2007). Roles ofdynamic and reversible histoneacetylation in plant development andpolyploidy. Biochem. Biophys. Acta1769, 295–307.

17. Matzke, M., Kanno, T., Huettel, B.,Daxinger, L., and Matzke, A.J.M. (2006).RNA-directed DNA methylation and PolIVb in Arabidopsis. Cold Spring Harb.Symp. Quant. Biol. 71, 449–459.

18. Pikaard, C.S. (2000). The epigenetics ofnucleolar dominance. Trends Genet. 16,495–500.

19. Liu, B., Brubaker, C.L., Cronn, R.C., andWendel, J.F. (2001). Polyploid formationin cotton is not accompanied byrapid genomic changes. Genome 44,321–330.

20. Albertin, W., Alix, K., Balliau, T.,Brabant, P., Davanture, M., Malosse, C.,Valot, B., and Theillement, H. (2007).Differential regulation of gene products innewly synthesized Brassicaallotetraploids is not related to proteinfunction nor subcellular localization.BMC Genomics 8, 56–70.

School of Biological Sciences, Universityof Bristol, Woodland Road, Bristol BS81UG, UK.E-mail: [email protected]

DOI: 10.1016/j.cub.2007.08.060

DispatchR929

Human Genetics: The Hidden Textof Genome-wide Associations

Genome-wide association studies are finally leading geneticists straightto the genetic susceptibility factors for complex diseases. Severalchallenges lie ahead, including translation of the findings into practicalpublic health outcomes, and integrating genetic analysis with broaderbiological understanding.

Greg Gibson1

and David B. Goldstein2

Human genetics is in the midst of arevolution. Testing for associationbetween hundreds of thousandsof polymorphisms in severalthousand unrelated cases andcontrols allows the genome to be

scanned in an unbiased mannerfor the major susceptibility variantsfor complex diseases. Up untileighteen months ago only a handfulof gene variants had been securelyassociated with any commondiseases and the majority ofpublished claims of associationwere at best unsubstantiated, but

Page 2: Human Genetics: The Hidden Text of Genome-wide Associations

Current Biology Vol 17 No 21R930

more often simply false-positivediscoveries. Now that has allchanged. Real advances arebeing made daily and the leadingjournals weekly report thediscovery of multiple clearlyvalidated risk factors.

One recent example of this newreality is the recent paper from theWellcome Trust Case ControlConsortium [1]. The Consortiumsought to scan the majority of thehuman genome for somecontribution to seven commondiseases: bipolar disorder,hypertension, coronary arterydisease, types 1 and 2 diabetes,Crohn’s disease, and rheumatoidarthritis. For each disease, halfa million single nucleotidepolymorphisms (SNPs) weregenotyped in 2,000 patients, andthe genotype frequencies werecompared with a common set of3,000 controls, namely ‘healthy’British subjects. The core resultwas the identification of 24 highlysignificant independentassociation signals. Though notreaching true experiment-widethresholds in all cases, these sitesnevertheless stick out like sorethumbs against a mass of lessertest statistics. Approximately halfof the associations have beennoted before, providing strongvalidation for the approach. Theother half are novel, pinpointingloci of interest for follow-up, andeight of these have already beenindependently replicated in newcohorts.

For Crohn’s disease, type 2diabetes, and breast cancer, wenow have multiple, multiplyvalidated SNPs that contribute atleast several percent each topopulation attributable risk — let’scall them parSNPs. If these variantswere not present in the humangenome, or if their effects could betherapeutically ameliorated, therewould be a measurable reductionin disease incidence. Once tensof thousands of cases have beensampled in a meta-analysis ofseveral studies, parSNPs thatprovide a relative risk in thevicinity of 1.2 — a twenty percentincrease in risk of contractingthe disease relative toindividuals with the protectivegenotype — significance canelevate to in excess of 10210,

well beyond any reasonableexperiment-wise threshold. If theydon’t survive such analysis, eitherthey are false positives, perhapsbeneficiaries of the ‘winner’s curse’of sampling bias that can inflateestimates of risk, or they arepopulation-specific.

Genome-wide associationresults are being widely heralded[2] as a confirmation that many ofthe precepts (and promises) of ‘bigscience’ projects like the humangenome and HapMap projects arebeing realized. After so many yearsof frustration and disagreement inthe field, the sudden appearanceof both real discoveries and realstandards is most welcome.Henceforth, claims that variationin a particular gene contributes todisease susceptibility mustdemonstrate significance takinginto account all the hypothesesthat have been evaluated, and mustbe replicated using the same SNPand the same phenotype [3].Understandably, these standardsmay percolate with differentspeeds through the differentresearch communities that facespecific practical challenges.Naysayers may challenge theefficacy and necessity of theHapMap project as a means to thisend [4], but few practitioners willdoubt its value and utility.Nevertheless, without wishing todiminish from the scale of theseachievements or the transitionrepresented by genome-wideassociation, it is worth returningto what the real motivation is forgenetic studies, and judgingprogress by that metric.

There are three fundamentalreasons to want to map genevariants for common disease.These are: first, to predict risk andtherefore allow tailoring of lifestylechoices to risk; second, to improveunderstanding of diseasepathophysiology in a manner thatsuggests new directions fortherapy; and third, to identifysubclasses of clinically similardiseases that nevertheless havedifferent genetic etiologies andhence should respond to differenttherapies.

By the standards of thesefundamental goals whole genomestudies of common diseases haveyet to prove themselves. Quite

aside from practical challengesin relation to regulatory oversightand implementation of robustgenetic counseling protocols,genome-wide association data arecurrently explanatory but are notyet usefully predictive. Even in themost successful case, Crohn’sDisease, with nine validatedassociations, most of the geneticrisk in the population remains to beexplained, and the proportionalcontribution of parSNPs to familyclustering seems to be less thana few percent [1,5]. If there areother factors out there, they areunlikely to individually account formore than one or two percent ofsusceptibility, and hence somewould argue that they will bepractically and clinically irrelevant.By contrast, it has been suggestedthat pharmacogenetics is likely toprovide more immediate clinicalreturns than disease genetics,a good example of which is therecent demonstration that HLA-Bgenotype predicts hypersensitiveresponse to the anti HIV drugabacavir [6].

The term parSNP serves asa reminder that the purpose ofgenome-wide association is not somuch to find individual risk alleles aspopulation attributable risk factors.Like the concept of heritability,which applies to populations notindividuals, parSNPs are identifiedby their influence averaged acrossindividuals within a population. Weare not aware of any claims thatcommon disease variants wouldtypically be prognostic, andsuspect that geneticists much morecommonly imagine that there arelikely dozens if not hundreds ofvariants that can contribute to anygiven disease. Individuals whohappen to inherit an excess ofthese, and experience damningenvironmental circumstances, aremore susceptible than others. It isthe constellation of factors that ispotentially predictive, not singlevariants.

There are some tantalizingsuggestions from multivariatemodeling that this may be the case.As a class, SNPs in genes involvedin steroid hormonal regulation andcell cycle control are morecommonly associated with breastcancer than if you query similarlysized sets of SNPs in randomly

Page 3: Human Genetics: The Hidden Text of Genome-wide Associations

DispatchR931

chosen loci [7]. Similarly, a recentreanalysis of a low resolutiongenome-wide association forParkinson’s disease, which initiallycame up with a single strongcandidate parSNP, implicatesgenes involved in axonal guidanceas a class in the etiology of thisneurodegenerative disease [8].The joint probability of associationof a signature involving 117 geneswith Parkinson’s disease is in therange of 10240.

Andre Rzhetsky and colleagues[9] arrived at a similar conclusionwithout even looking at genotypes.They scoured the medical recordsof 1.5 million patients of theColumbia University MedicalSystem and considered the overlapin disease diagnoses for 124relatively common diseases. Afteradjusting for correlations due toage and sex, they find strikingtendencies for certain diseasesto co-occur in individuals (forexample, autism, bipolardepression and schizophrenia).They even estimate that some ofthese diseases are likely to shareas many as 20% or more of theirsusceptibility alleles.

We are led to conclude thatgenome-wide association is morelikely to have an impact withrespect to the second and thirdobjectives presented above.For example, a genome-wideassociation for asthmasupports increased interestin airway remodelingalongside inflammatoryhyper-responsiveness [10];autophagy is implicated in theetiology of inflammatory boweldisease [5]; and the bipolar disorderdata [1] place the copious literaturepertaining to neurotransmittersand synaptic transmission asmediators of depression in agenome-wide perspective. Theimpact of genetics on diseasesub-classification is yet to be felt,but clearly there is now goodreason to expect advances.

A remarkable but yet-to-beremarked upon aspect of therecent deluge of genome-wideassociation data is the apparentdiversity in the yield of associationsfor different diseases. Why, forexample, is it that hypertension andbipolar disorder are devoid of theclass of highly significant hits that

characterize the immune-relateddiseases Crohn’s disease and type1 diabetes? A common view is thatthis as essentially the luck of thedraw — some traits happen to havepolymorphisms influencing themthat are both common andimportant enough to be detectablein manageable sample sizes, andother traits do not. A moreintriguing interpretation is that thedata are telling us that some traitsare largely resistant to the effectsof common genetic variationwhereas others are much moresusceptible.

While it is perhaps too early todeclare a clear pattern, we suspectthere may be evolutionaryexplanations for the differencesamongst traits. Contrast, forexample, two quantitativetraits: blood pressure and HIVviral load following infection. Whilehypertension produced no cleardiscoveries in 2000 cases [1],a study of just 500 subjectscharacterized for HIV-1 viral loadidentified three variants explaining14% of the total variation [11]. Wedoubt these differences are theluck of the evolutionary draw andsuspect they may result froma kind of evolutionary canalization.A key difference between HIV-1set point and blood pressure isthat blood pressure is likelyto have been close to along-term optimum, whereasthis is not the case for the HIV-1setpoint.

Canalization refers to theobservation that genetic systemshave often evolved robustness togenetic and environmentalperturbation [12]. Expression ofphenotypic variation is suppressedby the network of epistaticinteractions among alleles withinthe normal environmental contextthat an organism finds itself in.Take the organism outside of thisbuffering zone, for example, bydramatically changing diet andimmune exposure, and crypticvariation can be exposed [13].Variants that under ‘normal’circumstances would notcontribute appreciably todisease do so in the modernworld because they pushindividuals closer to the thresholdupon which the new environmentacts (Figure 1A).

The possibility that(de)canalization helps to explainthe heterogeneity of genome-wideassociation results to date shouldbe considered. A tentativeindication that genetic bufferingmay suppress the effects ofmutations that would otherwiselead to disease is seen in the shapeof the Q-Q plots from the WellcomeTrust Case Control Consortiumstudy [1] (Figure 1B,C). These plotsshow the relationship betweenobserved and expected teststatistics, and for five of thediseases, the observed curvesdeviate sharply to higher thanexpected test statistic values atthe upper extreme, due to theparSNPs. For bipolar disorder andhypertension, by contrast, thesecurves actually plateau off aswould be observed if the effectsof candidate parSNPs werebuffered.

Intriguingly, there is alsoa difference in the polarity ofassociations for different classesof disease. Most of the parSNPsare more likely tagging SNPsthan the causal sites, so it is notclear what their relationships tothe ancestral status of the truedisease-promoting variants are.Nevertheless, including moderateevidence for association, almosttwo thirds of all risk-associatedSNPs in the Wellcome Trust CaseControl Consortium study [1] areancestral, implying that the derivedallele in thehuman lineage isofferingprotection from disease. Morestrikingly, for hypertension, all of the(moderate) risk alleles are ancestral,and for rheumatoid arthritis 10 of 12are. The clear exception is Crohn’sdisease, for which 10 of 16 of therisk-associated parSNPs that wewere able to polarizeunambiguously are derived.

A prediction of thedecanalization hypothesis is thatas the prevalence of a diseaseincreases in modern society, thelikelihood of genome-wideassociation uncoveringsusceptibility alleles alsoincreases. Inflammatory diseaseshave increased in prevalencerecently, arguing that deleteriousinteractions between the modernenvironment and derived allelescorrode the evolved buffering,which in turn may expose ancestral

Page 4: Human Genetics: The Hidden Text of Genome-wide Associations

Current Biology Vol 17 No 21R932

Disease

Freq

(su

scep

t.)A

B C

Genotype

30

20

10

0

30

20

10

0

Obs

erve

d

Obs

erve

d

0 10 20 30

Expected

0 10 20 30

ExpectedCurrent Biology

Figure 1. Canalization and genome-wide association.

(A) Model of effect of canalization on disease. Each genotype in a population is asso-ciated with a level of disease susceptibility that will typically be normally distributedbut with a liability threshold indicated by the vertical broken line. Under normal envi-ronmental or genetic circumstances (green), very few individuals are above the thresh-old, but in an altered environment or non-equilibrium genetic circumstances (red), thevariance of the susceptibility increases, and more genotypes give rise to individualsabove the threshold, exposing hidden variation. (B) For coronary artery disease,Crohn’s disease, types 1 and 2 diabetes, and rheumatoid arthritis, the observed asso-ciation test statistics (negative log p-values) exceed the expected range of valuesshown in light green. (C) For bipolar disorder and hypertension, by contrast, high as-sociation scores are not seen and the Q-Q plot curve actually flattens off to a plateau,possibly suggesting buffering.

alleles to disease promotion. Ourrecent demonstration of highlysignificant associations of derivedalleles with HIV setpoint [11] maybe another instance of thisphenomenon. Not having beenexposed to the virus until a fewgenerations ago, there has beenno time for the genome to evolvecanalization, and additive variationfor viral titers segregates in humanpopulations.

This hypothesis puts us slightlyat odds with the recommendationsof Merikangas and Risch [14] inrelation to public health prioritiesfor investment in genomicresearch. They argued that thereis little point in expenditure ongenome-wide association fordiseases where the geneticcontribution pales in comparison toenvironmental factors, since publichealth campaigns to modifybehavior or otherwise address theenvironmental risk will be far moreeffective than treatments that

target genetic variation. However,strong environmentalperturbations are precisely theconditions under which hiddengenetic variants may be revealedand hence under whichgenome-wide association may bemost likely to succeed. Theimmediate benefit of parSNPs isnot as biomarkers for personalizedmedicine, but rather as entrypoints into the hidden genetic textof complex disease. A policyconsequence is that geneticadvances may, counterintuitively,be most obvious where humans areexposed to novel environmentalexposures, including nutrition andchildhood infection. Findings in atrisk populations are likely to carryover into canalized ones, and thekey is to seek understanding of allfacets of risk.

References1. The Wellcome Trust Case Control

Consortium (2007). Genome-wide

association study of 14,000 casesof seven common diseases and3,000 shared controls. Nature 447,661–678.

2. Altshuler, D., and Daly, M. (2007). Guiltbeyond a reasonable doubt. Nat. Genet.39, 813–815.

3. NCI-NHGRI Working Group onReplication in Association Studies (2007).Replicating genotype–phenotypeassociations. Nature 447, 655–660.

4. Terwilliger, J.D., and Hiekkalinna, T.(2006). An utter refutation of the‘‘Fundamental Theorem of theHapMap’’. Eur. J. Hum. Genet. 14,426–437.

5. Rioux, J.D., Xavier, R.J., Taylor, K.D.,Silverberg, M.S., Goyette, P., Huett, A.,Green, T., Kuballa, P., Barmada, M.M.,Datta, L.W., et al. (2007). Genome-wideassociation study identifies newsusceptibility loci for Crohn disease andimplicates autophagy in diseasepathogenesis. Nat. Genet. 39,596–604.

6. Phillips, E., and Mallal, S. (2007). Drughypersensitivity in HIV. Curr. Opin. AllergyClin. Immunol. 7, 324–330.

7. Pharoah, P.D.P., Tyrer, J., Dunning, A.M.,Easton, D.F., and Ponder, B.A.J., and theSEARCH Investigators (2007). Associationbetween common variation in 120candidate genes and breast cancer risk.PLoS Genet. 3, e42.

8. Lesnick, T.G., Papapetropoulos, S.,Mash, D.C., Ffrench-Mullen, J.,Shehadeh, L., de Andrade, M., Henley, J.,Rocca, W., Ahlskog, J.E., andMaraganore, D.M. (2007). A genomicpathway approach to a complex disease:axon guidance and Parkinson Disease.PLoS Genet. 3, e98.

9. Rzhetsky, A., Wajngurt, D., Park, N., andZheng, T. (2007). Probing genetic overlapamong complex human phenotypes.Proc. Natl. Acad. Sci. USA 104,11694–11699.

10. Moffatt, M.F., Kabesch, M., Liang, L.,Dixon, A.L., Strachan, D., Heath, S.,Depner, M., von Berg, A., Bufe, A.,Rietschel, E., et al. (2007). Geneticvariants regulating ORMDL3expression contribute to the risk ofchildhood asthma. Nature 448,470–473.

11. Fellay, J., Shianna, K.V., Ge, D.,Colombo, S., Ledergerber, B., Weale, M.,Zhang, K., Gumbs, C., Castagna, A.,Cossarizza, A., et al. (2007). Awhole-genome association study of majordeterminants for host control of HIV-1.Science 317, 944–947.

12. Gibson, G., and Wagner, G. (2000).Canalization in evolutionary genetics:a stabilizing theory? Bioessays 22,372–380.

13. Gibson, G., and Dworkin, I. (2004).Uncovering cryptic genetic variation.Nat. Rev. Genet. 5, 681–690.

14. Merikangas, K.R., and Risch, N. (2003).Genomic priorities and public health.Science 302, 599–601.

1Department of Genetics, North CarolinaState University, Gardner Hall, Raleigh,North Carolina 27695-7614, USA.2Center for Population Genomics andPharmacogenetics, Institute for GenomeScience and Policy, Duke UniversityMedical Center, Durham, North Carolina27708, USA.E-mail: [email protected],[email protected]

DOI: 10.1016/j.cub.2007.08.044