rSNP's: Evolution Caught in the Act


Upload: mjtokelly

Post on 27-Jun-2015


TRANSCRIPT

Page 1: rSNP's: Evolution Caught in the Act

rSNP's: Evolution Caught in the Act

Michael J.T. O'Kelly
6.085 Final Project Presentation

In this talk:
●First, some background science: yeast ribosomal DNA and repeat Single Nucleotide Polymorphism model
●How to get rSNP data from a shotgun DNA database
●Improving the data with quality scores
●Inferring recombination dynamics from rSNP statistics

Page 2

Background: Ribosomal DNA
●Yeast rDNA consists of ~150 identical* copies of a 9.1 kbp sequence encoding several ribosomal RNAs.
●Mutation strikes only one repeat at a time. Recombination either duplicates or eliminates neutral mutations, homogenizing the rDNA.
●Repeats are gained or lost about every 30 generations, through several recombinatory mechanisms (illustrated at left).
●Mutation in the rDNA array occurs about every 1,000 generations.
●A repeat Single Nucleotide Polymorphism is a mutation shared by only a fraction of the rDNA repeats in a particular yeast strain.

*as far as anyone knows or cares, so far
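As a rough illustration of how recombination can duplicate or eliminate a neutral mutation, the sketch below follows one new mutation in an array of 150 repeats until it is lost or shared by every repeat. It assumes a deliberately simple rule that is not taken from the talk: each recombination event overwrites one randomly chosen repeat with a copy of another, and the array size stays fixed. The function name and its parameters are hypothetical.

```python
import random

def simulate_mutation_fate(n_repeats=150, seed=0, max_events=1_000_000):
    """Follow one new neutral mutation until it is lost or fixed.

    Assumption (not from the talk): each recombination event overwrites one
    randomly chosen repeat with a copy of another randomly chosen repeat,
    and the array size stays fixed.
    Returns (final_mutant_count, events_elapsed).
    """
    rng = random.Random(seed)
    mutants = 1  # mutation strikes only one repeat at a time
    events = 0
    while 0 < mutants < n_repeats and events < max_events:
        donor_is_mutant = rng.random() < mutants / n_repeats
        target_is_mutant = rng.random() < mutants / n_repeats
        if donor_is_mutant and not target_is_mutant:
            mutants += 1  # mutation duplicated
        elif target_is_mutant and not donor_is_mutant:
            mutants -= 1  # mutation eliminated
        events += 1
    return mutants, events

if __name__ == "__main__":
    outcomes = [simulate_mutation_fate(seed=s)[0] for s in range(200)]
    lost = sum(1 for m in outcomes if m == 0)
    print(f"lost in {lost} of {len(outcomes)} runs")
```

Under this toy rule most new mutations disappear quickly, and the rare survivors pass through intermediate frequencies on the way to fixation, which is the window in which they would show up as rSNPs.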

Page 3

Finding rSNP's
Example: GATACATGTCTTGATAATGT

Let's use BLAST to align shotgun fragments, with a sliding window along the entire consensus rDNA sequence.

●The shotgun DNA library's coverage is ~170x for rDNA.
●Align all shotgun sequences that agree (mostly) with the target.
●Basepairs that deviate entirely are conventional Single Nucleotide Polymorphisms.
●Basepairs that deviate sometimes are probably repeat Single Nucleotide Polymorphisms.

ttttctggctcattgatagattgttGATACATTTCTTGATAATGTtgcatatcagtaacg
                    ttgttGATACATTTCTTGATAATGTtgcatatcagtaacgtaacc
                        tGATACATTTCTTGATAATGTtgcatatcagtaa
 tttctggctcattgatagattgttGATACATTTCTTGATCATGT
                       ttGATACATTTCTTGATAATGTtgcatatcagtaac
                 agattgttGATACATTTCTTGATAATGTtgcatatcagt
        ctcattgatagattgttGATACATTTCTTGATAATGTtgcatatcagtaac
               atagattgttGATACATTTCTTGATAATGTtgcatatcagtaacgtaaccctt
  ttctggctcattgatagattgttGATACATTTCTTGATAATGTtgcatatcagtaac
        ctcattgatagattgttGATACATTTCTTGATAATGTtgcata
 tttctggctcattgatagattgttGATACATTTCTTGATAATGTtgcatatcagtaacgtaac
           attgatagattgttGATACATTTCTTGATAATGTtgcatatcagtaacgtaaccc
                  gattgttGATACATTTCTTGATCATGTtgcatatcagtaacgtaaccc
                       ttGATACATTTCTTGATAATGTtgcatatcagtaacgt
           attgatagattgttGATACATTTCTTGATAATGTt
       gctcattgatagattgttGATACATTTCTTGATAATGTtgcatat
                       ttGATACATTTCTTGATAATGTtgcatatcagtaacgtaaccctt
         tcattgatagattgttGATACATTTCTTGATAATGTtgcatatcagtaacgtaaccctt
   tctggctcattgatagattgttGATACATTTCTTGATAATGTtgcatatcag
                  gattgttGATACATTTCTTGATAATGTtgcatatcagtaacgtaacccttg
                       ttGATACATTTCTTGATAATGTtgcatatcag
              gatagattgttGATACATTTCTTGATCATGTtgcat
     tggctcattgatagattgttGATACATTTCTTGATAATGTtgcatatcagt
        ctcattgatagattgttGATACATTTCTTGATCATGTtgcatatcagtaa
                    ttgttGATACATTTCTTGATAATGTtgcatatcagt
     tggctcattgatagattgttGATACATTTCTTGATAATGTtgcatatcagt
ttttctggctcattgatagattgttGATACATTTCTTGATAATGTtgcat
              gatagattgttGATACATTTCTTGATAATGTtgcatatcagtaacgtaaccctt
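The disagreement rule in the bullets can be sketched in a few lines. This is a minimal illustration, assuming the reads have already been aligned to the target and trimmed to the same 20-base window; a real pipeline would take the alignments from BLAST output instead. The function names are hypothetical, and the four reads are a small sample of the capitalized windows in the alignment.

```python
from collections import Counter

def disagreement_profile(target, reads):
    """Per-position fraction of aligned reads that disagree with the target.

    `reads` are assumed to be already aligned to `target` and trimmed to the
    same window.
    """
    profile = []
    for i, ref_base in enumerate(target):
        column = Counter(read[i] for read in reads)
        mismatches = sum(n for base, n in column.items() if base != ref_base)
        profile.append(mismatches / len(reads))
    return profile

def classify(fraction):
    """Label a position using the slide's rule of thumb."""
    if fraction == 0:
        return "match"
    if fraction >= 1.0:
        return "SNP"    # every read deviates from the target
    return "rSNP?"      # only some reads deviate

target = "GATACATGTCTTGATAATGT"  # example window from the slide
reads = [                         # a few of the aligned windows
    "GATACATTTCTTGATAATGT",
    "GATACATTTCTTGATAATGT",
    "GATACATTTCTTGATCATGT",
    "GATACATTTCTTGATAATGT",
]
profile = disagreement_profile(target, reads)
calls = {i + 1: classify(f) for i, f in enumerate(profile) if f > 0}
print(calls)
```

On this toy input, position 8 deviates in every read and position 16 in one read out of four, so the first is flagged as a conventional SNP and the second as a candidate rSNP. With so few reads the second call is only suggestive.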

Page 4

SNP & rSNP map for one yeast strain

The disagreement ratio shows some basepairs with 100% disagreement, some with moderate disagreement, and many points of low disagreement that are probably spurious.

Total coverage varies from 25x to 150x.

Page 5

Using Quality Scores to evaluate correctness of disagreement

Quality score: n=0-60 represents reliability of nucleotide determination.

Let's reject all base calls with scores below 30.

Then C is accepted as a probable rSNP, but G is rejected.

G  A  T  A  C  A  T  T  T  C  T  T  G  A  T  A  G  T  G  T
56 58 52 56 31 60 56 60 33 31 42 58 55 35 32 57 26 38 60 60

G  A  T  A  C  A  T  T  T  C  T  T  G  A  T  A  A  T  G  T
44 51 50 58 28 60 40 50 46 42 50 60 53 41 55 36 52 30 60 60

G  A  T  A  C  A  T  T  T  C  T  T  G  A  T  A  A  T  G  T
47 33 59 47 58 39 60 60 58 38 51 58 43 51 42 33 53 58 51 59

G  A  T  A  C  A  T  T  T  C  T  T  G  A  T  C  A  T  G  T
57 48 48 27 36 30 59 52 58 60 56 58 59 34 40 60 37 36 60 59

G  A  T  A  C  A  T  T  T  C  T  T  G  A  T  A  A  T  G  T
59 55 56 60 45 43 55 45 31 59 59 60 44 60 60 44 59 59 52 60

$P_{\text{error}} = 10^{-n/10}$
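A short sketch of the filter, using the error-probability formula and the threshold of 30 from this slide. The two candidate calls are the mismatching G and C from the reads, with their scores read off the stacked digits as two-digit numbers (26 and 60). The function names are hypothetical, and treating a score of exactly 30 as acceptable is an assumption.

```python
def phred_to_error_probability(n):
    """Error probability for quality score n: P_error = 10 ** (-n / 10)."""
    return 10 ** (-n / 10)

def passes_filter(score, threshold=30):
    """Keep a base call only if its quality is at least `threshold`."""
    return score >= threshold

# Two mismatching base calls from the slide, as (base, quality score).
candidates = [("G", 26), ("C", 60)]
for base, score in candidates:
    verdict = "accepted" if passes_filter(score) else "rejected"
    print(base, score, f"{phred_to_error_probability(score):.1e}", verdict)
```

A threshold of 30 corresponds to an error probability of one in a thousand per base call, so a mismatch that survives the filter is unlikely to be a sequencing error.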

Page 6

SNP & rSNP map after Quality Score filter

Quality-filtered coverage is nearly as high as total coverage. Most basepairs that disagreed in only one alignment had low quality scores.

Page 7

rSNP fingerprints of all yeast strains

(Peak heights exaggerated for visibility.) Partial and full peaks line up for many strains.

Page 8

Aggregate rSNP distribution observed in shotgun database

●For all rSNPs, we estimate the fraction of repeats containing each variant letter.

●rSNPs tend to be observed in a small minority of repeats, rather than in 50/50 ratios.

●The number of base-pairs having rSNPs, and the fraction of repeats containing each rSNP, are influenced by the underlying dynamics of recombination.
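One simple way to turn read counts into the per-rSNP fraction mentioned in the first bullet is a binomial estimate. The sketch below assumes each quality-filtered read samples a repeat independently and uniformly, which the talk does not state. The function name is hypothetical and the counts are invented purely for illustration.

```python
import math

def estimate_variant_fraction(variant_reads, total_reads, n_repeats=150):
    """Estimate the share of rDNA repeats carrying a variant.

    Assumes each quality-filtered read samples a repeat independently and
    uniformly, so the read fraction estimates the repeat fraction.
    Returns (fraction, standard_error, implied_repeat_count).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    fraction = variant_reads / total_reads
    std_err = math.sqrt(fraction * (1 - fraction) / total_reads)
    return fraction, std_err, round(fraction * n_repeats)

# Hypothetical counts, chosen only for illustration.
fraction, std_err, repeats = estimate_variant_fraction(12, 120)
print(f"{fraction:.2f} +/- {std_err:.3f}, about {repeats} of 150 repeats")
```

The standard error shrinks only with the square root of the coverage, so low-frequency variants need deep coverage before their fractions can be pinned down.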

What models best predict the observed distribution?

Page 9

Uniform vs. Non-Uniform Recombination Models

●Non-uniform recombination results in a more peaked rSNP distribution than the uniform model.
●The posterior probability of the non-uniform model is higher by far.
●Shotgun analysis and labwork agree: recombination in rDNA is non-uniform!

Figure panels: observed rSNP distribution; expected distributions under the models.
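For reference, the posterior comparison in the second bullet can be read through Bayes' rule. The notation here is introduced only for illustration: $D$ is the observed rSNP distribution, and $M_U$ and $M_N$ are the uniform and non-uniform recombination models. The talk does not state which priors or likelihoods were used.

$$\frac{P(M_N \mid D)}{P(M_U \mid D)} = \frac{P(D \mid M_N)}{P(D \mid M_U)} \cdot \frac{P(M_N)}{P(M_U)}$$

With equal priors on the two models, the posterior odds reduce to the likelihood ratio, so "higher by far" would mean the observed distribution is much more probable under the non-uniform model.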