What has variation data taught us about the biology of recombination?

Rory Bowden, Afidalina Tumian, Ronald Bontrop, Colin Freeman, Tammie MacFie, Gil McVean, Peter Donnelly, Simon Myers


Posted on 13-Jan-2016



TRANSCRIPT

Page 1: What has variation data taught us about the biology of recombination?

What has variation data taught us about the biology of recombination?

Rory Bowden, Afidalina Tumian, Ronald Bontrop, Colin Freeman, Tammie MacFie, Gil McVean, Peter Donnelly

Simon Myers

Page 2

Recap: Composite likelihood results

• Statistical algorithms to estimate historical recombination rates and identify hotspots
  – Applied genome-wide
  – Kilobase-scale resolution

• Model-based inference from linkage disequilibrium (LD) data
  – Coalescent model

Figure: example data matrix of 0/1 alleles (? marks missing data); axes: individuals, loci

(Myers et al. 2005)
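The LD-based methods recapped on this slide start from a matrix of alleles like the one pictured. As a minimal sketch, assuming haplotypes are coded as strings of '0', '1' and '?' (missing), the code below computes the standard pairwise LD statistic r² between loci. It is only a descriptive summary of LD, not the coalescent-based composite-likelihood estimator itself, and the function names are our own.

```python
from itertools import combinations
from typing import Optional


def r_squared(hap_a: list[str], hap_b: list[str]) -> Optional[float]:
    """r^2 between two biallelic loci; haplotypes with '?' at either locus are skipped.

    hap_a[i] and hap_b[i] are the alleles ('0', '1' or '?') on haplotype i.
    Returns None if either locus is monomorphic among the usable haplotypes.
    """
    pairs = [(a, b) for a, b in zip(hap_a, hap_b) if a != "?" and b != "?"]
    n = len(pairs)
    if n == 0:
        return None
    p_a = sum(a == "1" for a, _ in pairs) / n
    p_b = sum(b == "1" for _, b in pairs) / n
    p_ab = sum(a == "1" and b == "1" for a, b in pairs) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def pairwise_r2(haplotypes: list[str]) -> dict[tuple[int, int], float]:
    """r^2 for every pair of loci; each string is one haplotype (a row of the matrix)."""
    columns = list(zip(*haplotypes))  # one tuple of alleles per locus
    result = {}
    for i, j in combinations(range(len(columns)), 2):
        r2 = r_squared(list(columns[i]), list(columns[j]))
        if r2 is not None:
            result[(i, j)] = r2
    return result
```

Pairs of loci with high r² have rarely been separated by historical recombination; rate-estimation methods exploit the full pattern rather than this single statistic.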

Page 3

Recombination questions

• Human recombination is poorly understood despite intense work

• Recombination clusters into 1-2kb “hotspots” – why?

• Why are hotspots where they are in the genome?
  – Primary DNA sequence?
  – Epigenetics?

• What biological machinery produces hotspots?

• How are hotspots evolving?

Page 4

32,996 “HapMap” hotspots

• These hotspots account for 50–70% of all human recombination

• Why are they where they are?

• We can look at the GC content of a region, i.e. the fraction of its sequence that is “G” or “C”

We also see weak correlations with other features, e.g. the positions of genes

Are there any stronger predictive features?
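GC content, mentioned above, is simple to compute. The sketch below assumes a plain DNA string and ignores non-ACGT characters such as N; the window and step sizes are illustrative parameters, not values used in the analysis.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of called bases (A/C/G/T) that are G or C; other characters are ignored."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return float("nan")  # no called bases in this stretch
    return (seq.count("G") + seq.count("C")) / called


def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """(start, GC fraction) for windows of length `window` starting every `step` bases."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = max(len(seq) - window, 0)
    return [(start, gc_fraction(seq[start:start + window]))
            for start in range(0, last_start + 1, step)]
```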

Page 5

Broad-scale sequence features and recombination

THE1B (LTR of a retrotransposon)

THE1B: found in 1196 hotspots versus 606 coldspots (p ≪ 10⁻²⁰)

AluY: found in 3635 hotspots versus 3262 coldspots (p = 7×10⁻⁵)

• Use >20,000 hotspots localized to within 5 kb

• For each, create a matched “coldspot”

• Compare sequence features
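One simple way to ask whether a feature such as THE1B is enriched in hotspots relative to matched coldspots is a binomial test: if hotspots and coldspots are equally numerous and the feature is unrelated to recombination, each region carrying it is equally likely to be a hotspot or a coldspot. This is a sketch under that simplifying assumption; the slide does not state which test produced its p-values, so numbers from this code need not match them.

```python
from math import comb


def binomial_two_sided(x: int, n: int) -> float:
    """Exact two-sided binomial test of x successes in n trials against p = 1/2.

    The null distribution is symmetric, so the p-value is twice the more extreme tail.
    """
    if not 0 <= x <= n:
        raise ValueError("need 0 <= x <= n")
    extreme = max(x, n - x)
    tail = sum(comb(n, k) for k in range(extreme, n + 1))  # P(X >= extreme) * 2**n
    return min(1.0, 2 * tail / 2**n)


def hotspot_enrichment(in_hot: int, in_cold: int) -> float:
    """p-value for a feature seen in `in_hot` hotspots and `in_cold` matched coldspots.

    Simplifying assumption: equal numbers of hotspots and coldspots, so under the
    null each region carrying the feature is a hotspot with probability 1/2.
    """
    return binomial_two_sided(in_hot, in_hot + in_cold)
```

Applied to the THE1B counts (1196 versus 606), this returns a p-value well below 10⁻²⁰, agreeing in direction with the slide.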

Page 6

• Compared primary DNA sequence at 30,000 human hotspots and matched coldspots

• Looked at all “words” of length 5–9 (e.g. 131,072 possible 9-mers, counting each word together with its reverse complement) and refined the results

• Identified a 13-bp motif, CCNCCNTNNCCNC (Myers et al. 2008)
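The 131,072 figure is the number of distinct 9-mers once each word is merged with its reverse complement: there are 4⁹ = 262,144 raw 9-mers, and for odd k no word equals its own reverse complement, so merging exactly halves the count. The sketch below counts such canonical words in a set of sequences; comparing counts between hotspot and coldspot sequences is left to the caller, and all names are illustrative.

```python
from collections import Counter
from collections.abc import Iterator
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    return word.translate(COMPLEMENT)[::-1]


def canonical(word: str) -> str:
    """Represent a word and its reverse complement by whichever sorts first."""
    return min(word, reverse_complement(word))


def all_words(k: int) -> Iterator[str]:
    """Yield all 4**k DNA words of length k."""
    return ("".join(p) for p in product("ACGT", repeat=k))


def n_canonical_words(k: int) -> int:
    """Distinct k-mers after merging each word with its reverse complement."""
    # Only even-length words can equal their own reverse complement (palindromes).
    palindromes = 4 ** (k // 2) if k % 2 == 0 else 0
    return (4 ** k + palindromes) // 2


def count_words(seqs: list[str], k: int) -> Counter:
    """Count canonical k-mers in sequences, skipping words with non-ACGT characters."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= set("ACGT"):
                counts[canonical(word)] += 1
    return counts
```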

A motif for human hotspots

THE1 repeats in hotspots

...CTTCCGCTATGATTGTGAGGCCTCCCTAGCCATGTGGAACTGTGAGCCCATT...

...CTTCCGCCATGATTGTGAGGCCTCCCTAGCCATGTGGAACTGTGAGTCCATT...

...CTTCCGCCATGATTGTGAGGCCTCCCTAGCCACGTGGAAC-GTGAGTCCATT...

...CATCCGCCATGATTGTGAGGCCTCCCTAGCCACGTGGAACTGAGAGTCCATT...

...CTTCCGCCATGATTGTGAGGCCTCCCCAGCCATGGGGAACTGTGAGTCCATT...

THE1 repeats in coldspots

...CTTCCGCCATGATTGTGAGGCCTCCCCAGCCATGTGGAACTGTGAGTCCATT...

...CTTCCGTTATGATTGTGAGGCCTCCCCAGCCATGTGGAACTGTGAATCCATT...

...CTTCCGCCATGATTGTGAGGCCTCCCCAGCCATGTGGGACTGTGAGTCCATT...

...CTTCCGCC-TGATTCTGAGGCCTCCCCAGCCATGTGGAACTGTGAGTCCATT...

...CTTTCGCCATGATTGTGAGGCCTCCCCAGCCATGTGGAACTGC-TGTCCATT...
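A degenerate motif such as CCNCCNTNNCCNC can be scanned for with a regular expression, treating N as any base and checking both strands. For example, the third aligned THE1 sequence above contains CCTCCCTAGCCAC, which fits the motif. This is a minimal sketch assuming only A/C/G/T/N symbols in the motif; it makes no attempt to score partial matches.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def motif_regex(motif: str) -> re.Pattern:
    """Compile a degenerate motif into a regex; the lookahead also finds overlapping hits."""
    return re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")


def find_motif(seq: str, motif: str = "CCNCCNTNNCCNC") -> list[tuple[int, str]]:
    """(start, strand) for every match on either strand, in forward-strand coordinates."""
    seq = seq.upper()
    rc_motif = motif.translate(COMPLEMENT)[::-1]  # CCNCCNTNNCCNC -> GNGGNNANGGNGG
    hits = [(m.start(), "+") for m in motif_regex(motif).finditer(seq)]
    if rc_motif != motif:
        hits += [(m.start(), "-") for m in motif_regex(rc_motif).finditer(seq)]
    return sorted(hits)
```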

Page 7

Average rates around the motif

• Confirmed via sperm studies, which show that disrupting the first 7 bp of the motif impairs hotspot activity (Neumann and Jeffreys 2002)

• Active on multiple backgrounds (e.g. THE1, L2 and Alu repeats, and unique DNA…)

• Plays a role at c. 43% of hotspots identified through LD or directly through sperm typing

Penetrance >60%: 3–5% of hotspots

Penetrance 7.5%: 5% of hotspots
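The slide does not define penetrance; one plausible reading, assumed here, is the fraction of motif copies that fall inside hotspots, while the share of hotspots containing at least one copy corresponds loosely to the "c. 43% of hotspots" figure. The sketch below computes both for one chromosome from motif positions and hotspot intervals. Simple containment is a cruder criterion than the motif "playing a role" at a hotspot, and the function name is hypothetical.

```python
from bisect import bisect_right


def motif_hotspot_overlap(motif_positions: list[int],
                          hotspots: list[tuple[int, int]]) -> tuple[float, float]:
    """Return (penetrance, coverage) for motif sites and hotspots on one chromosome.

    penetrance: fraction of motif sites inside some hotspot (assumed definition)
    coverage:   fraction of hotspots containing at least one motif site
    Hotspots are half-open [start, end) intervals, assumed non-overlapping.
    """
    hotspots = sorted(hotspots)
    starts = [start for start, _ in hotspots]
    hit_hotspots = set()
    sites_in_hotspots = 0
    for pos in motif_positions:
        i = bisect_right(starts, pos) - 1  # last hotspot starting at or before pos
        if i >= 0 and pos < hotspots[i][1]:
            sites_in_hotspots += 1
            hit_hotspots.add(i)
    penetrance = sites_in_hotspots / len(motif_positions) if motif_positions else float("nan")
    coverage = len(hit_hotspots) / len(hotspots) if hotspots else float("nan")
    return penetrance, coverage
```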

Page 8

The motif is actually longer

• Based on examining only non-repeat DNA in hotspots

• Independent of the results on the previous slide

• The region that matters spans >30 bp

Page 9

Recombination and human disease: X-linked ichthyosis

The breakpoint hotspot contains the greatest concentration of copies of the 13-bp motif within a segmental duplication anywhere in the genome

Deletion breakpoint hotspot (Van Esch et al. 2005)

(Myers et al. 2008)

Page 10

The motif is associated with NAHR syndromes

Multiple genomic disorders are caused by the same phenomenon: “non-allelic homologous recombination” (NAHR). Rearrangement endpoints are consistently clustered into narrow hotspots:

• X-linked ichthyosis

• Charcot-Marie-Tooth disease (CMT1A)

• NF1

• Sotos syndrome

• Smith-Magenis syndrome

• Williams-Beuren syndrome

The motif is present close to the breakpoint hotspots in every case (p = 0.00055)

Page 11

A ‘common deletion’ in mitochondria occurs at the motif

Myers et al. (2008)

Page 12

What binds the motif?

• The motif's 3-bp periodicity suggests binding by a “zinc finger” (ZF) protein with at least 12 zinc fingers (Myers et al. 2008)

• For genes coding for ZF proteins, we can predict their binding target bioinformatically (Persikov et al. 2009)

• Searched systematically
  – Zinc finger protein database of 691 C2H2 ZF proteins
  – Performed in silico binding predictions

• Looked for matches to the 13-bp motif, allowing for degeneracy (Myers et al. 2009)
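The reasoning from 3-bp periodicity to a zinc finger protein can be made concrete: each C2H2 finger typically contacts about 3 bp, and the array binds antiparallel, so the N-terminal finger contacts the 3′ end of the target. The sketch below assembles a predicted target from per-finger triplets and checks it against the 13-bp motif. The triplets in the usage example are invented for illustration; real per-finger predictions would come from a tool such as that of Persikov et al. (2009), whose interface is not reproduced here.

```python
def assemble_target(finger_triplets: list[str]) -> str:
    """Join per-finger 3-bp predictions into a 5'->3' target sequence.

    `finger_triplets` is ordered from the N-terminal to the C-terminal finger.
    C2H2 arrays bind antiparallel (N-terminal finger at the 3' end), so reverse.
    """
    return "".join(reversed(finger_triplets))


def matches_degenerate(site: str, motif: str) -> bool:
    """True if `site` fits `motif`, where N in the motif matches any base."""
    return len(site) == len(motif) and all(m == "N" or m == s for s, m in zip(site, motif))


def motif_offsets(target: str, motif: str = "CCNCCNTNNCCNC") -> list[int]:
    """Offsets at which the motif fits the target exactly (no mismatches allowed)."""
    k = len(motif)
    return [i for i in range(len(target) - k + 1)
            if matches_degenerate(target[i:i + k], motif)]


# Hypothetical per-finger triplets, N- to C-terminal, invented for illustration.
example_fingers = ["TGA", "ACT", "GCC", "CTA", "TCC", "ACC"]
target = assemble_target(example_fingers)  # "ACCTCCCTAGCCACTTGA"
print(motif_offsets(target))  # [1]
```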

Page 13

PRDM9 is the unique candidate for the motif-binding protein

Page 14

PRDM9 binding of the motif

Motif identified by hotspot-coldspot comparison

Bioinformatic prediction of PRDM9 binding “target”

Zinc finger part of PRDM9: 13 zinc fingers, one separated from the rest (showing the four codons in each zinc finger that determine the binding target)

(Myers et al. 2009)

Page 15

Details of PRDM9

• Independent work by two additional groups confirms that PRDM9 directly determines hotspot locations in both humans and mice
  – A gene responsible for different inbred mouse strains having different hotspot positions was mapped to PRDM9
  – Baudat et al. (2009), Parvanov et al. (2009), Myers et al. (2009)
  – Gel shift assays demonstrate that PRDM9 really does bind the predicted motif: Baudat et al. (2009)

• PRDM9 places an epigenetic mark on histones, the proteins that package DNA
  – H3K4 trimethylation
  – The identical mark is used in yeast to mark hotspots (Borde et al. 2009)
  – Conserved over >1 billion years of evolution

• In mice
  – Different PRDM9 alleles give different hotspot positions (Buard et al. 2009; Baudat et al. 2009)
  – Prdm9 is expressed only in meiotic prophase (Hayashi et al. 2005)
  – Prdm9-/- mutants are infertile and fail to repair DSBs (Hayashi et al. 2005)

Page 16

Baudat et al. (2009)

Percent usage of LD hotspots

• Considerable variation in PRDM9 among humans influences the usage of hotspots defined from LD data

• Different humans have different hotspots

Page 17

How are hotspots evolving?

Hotspots are radically different between humans and chimps

Figure panels: human versus chimp, showing LDhat rate estimates and LDhot hotspots

Winckler et al. (2005)

Page 18

PRDM9 is radically different in chimpanzees

• Zinc fingers shared between humans and chimps: 1 of 13

• The least shared of all 544 orthologous ZF protein pairs with at least two distinct zinc fingers in each species

• Patterns in multiple species indicate positive selection (Oliver et al. 2009)

• One of the fastest-evolving genes in the human genome

Page 19

Crossover activity at motif is human-specific

Figure: crossover activity by position relative to the motif, at human motif sites and at chimp motif sites, in THE1 repeats and L2 repeats (p = 0.0007)

Chimp data: 694 SNPs, 36 western chimpanzees; 16 THE1 regions, 6 L2 regions

Human data: HapMap, 210 humans
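A profile like the one plotted, crossover activity by position relative to the motif, can be sketched by averaging estimated recombination rates in distance bins around each motif site. This assumes rates are supplied at positions (for example, midpoints of SNP intervals from a rate-estimation program such as LDhat), ignores motif orientation, and lets a rate point contribute to every site within the flank; it is not the authors' pipeline, and the function name is hypothetical.

```python
from collections import defaultdict


def rate_profile(motif_sites: list[int],
                 rate_positions: list[int],
                 rates: list[float],
                 flank: int,
                 bin_size: int) -> dict[int, float]:
    """Mean rate in bins of signed distance (pos - site) within [-flank, flank).

    Keys are the left edges of the bins. Motif orientation is ignored, and a
    rate point near several motif sites is counted once per site.
    """
    if len(rate_positions) != len(rates):
        raise ValueError("rate_positions and rates must have equal length")
    sums: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for site in motif_sites:
        for pos, rate in zip(rate_positions, rates):
            offset = pos - site
            if -flank <= offset < flank:
                left = (offset // bin_size) * bin_size  # left edge of this offset's bin
                sums[left] += rate
                counts[left] += 1
    return {left: sums[left] / counts[left] for left in sorted(sums)}
```

Computing one profile from human motif sites and one from chimp motif sites gives the two curves to compare; the slide's p-value comes from a test not described here.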

Page 20

Conclusions and current directions

• Why are hotspots where they are in the genome?
  – PRDM9 has sequence-specific binding
  – Specifies narrow hotspot sites
  – Targets primary DNA sequence but makes an epigenetic “mark”
  – Only 40% of hotspots?
  – Looking at PRDM9 binding in vivo using ChIP-seq

• PRDM9 is evolving like crazy!
  – Between species
  – Within humans
  – Within mice, chimps, …
  – Resequencing data for 10 chimpanzees to define their hotspots

• PRDM9 is the only mapped speciation gene in any mammal
  – Hybrid sterility in mice (Mihola et al. 2009)
  – What is the link between recombination and speciation?
  – Does PRDM9 evolution, in general, lead to breeding barriers between species?

• Recombination and the motif are implicated in multiple diseases
  – PRDM9 variation suggests that different people are susceptible to different genomic disorders