
Improved hit criteria for DNA local alignment

JOBIM 2004 Montréal - June 28th

Laurent Noé, Gregory Kucherov

LORIA, Nancy, France

Plan

Introduction
– Local alignment
– Heuristic methods

Hit criteria
– Seed models and proposed extension
– Single/multiple hit strategies and proposed extension

Experiments

Conclusion
– Extensions

Local alignment methods

Why be interested in local alignment methods?
– Improvement needed: #sequences, #users, (budget)

Dynamic programming (Smith-Waterman)
– Gives an exact solution
– Quadratic cost (best optimization in [Crochemore et al 02])

Heuristic algorithms (used in practice)
– Fasta, Blast, PatternHunter, Blastz, Yass, …
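To make the quadratic cost of the exact method concrete, here is a minimal sketch of the Smith-Waterman score computation. The scoring values (match = 1, mismatch = -1, linear gap = -2) are illustrative assumptions, not parameters taken from the talk.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty).

    The scoring values are illustrative assumptions. The two nested loops
    visit every cell of the len(a) x len(b) table, hence the quadratic cost.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                               # start a new local alignment
                previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous[j] + gap,               # gap in b
                current[j - 1] + gap,            # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman_score("GGATCAGTCC", "TTATCAGTAA"))
```

Only two rows of the table are kept, so memory is linear, but the running time remains proportional to the product of the two sequence lengths.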

Seed filtering

Start with small, conserved and easily detected fragments (seeds).

Then extend the seeds to build possible alignments.

[Figure: dot plot showing the detected seeds and the detected alignment]
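The seed detection step can be sketched with a simple k-mer index. This is a minimal illustration for exact contiguous seeds only. The function name `find_seeds` and the choice of a dictionary index are assumptions made for this example, not the data structure of any particular tool.

```python
from collections import defaultdict


def find_seeds(a, b, k):
    """Return all pairs (i, j) such that a[i:i+k] == b[j:j+k].

    Hypothetical helper: indexes every k-mer of `a`, then looks up
    every k-mer of `b`. Each pair is one point of the dot plot, and
    j - i is the diagonal it lies on.
    """
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    seeds = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            seeds.append((i, j))
    return sorted(seeds)


print(find_seeds("ATCAGTGC", "GGATCAGT", 6))
```

The extension step would then start from each reported pair and grow the alignment in both directions.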

Two questions usually asked

1. Seed model: what can serve as a seed?

2. Hit criterion: what is the criterion that witnesses a potential alignment?

[Figure: dot plot; the detected seeds relate to question 1 (seed model), the detected alignment to question 2 (hit criterion)]

1. What can serve as a seed

Contiguous seed

Exact similarity:

ATCAGT
||||||
ATCAGT

Seed pattern: ######

Example: the seed ###### is slid along the alignment

ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA

Spaced seed model [Ma et al. 02: PATTERNHUNTER]

Seed pattern: ###--#-##

‘#’ : obligatory match position
‘-’ : joker position (“don’t care” position)

Weight: 6 [number of #]
Span: 9 [number of all symbols]

Example: the seed ###--#-## is slid along the alignment

ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA
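Sliding a seed along an alignment can be sketched in a few lines. The helper name `seed_hits` and the representation of the alignment as its match line (‘|’ for a match, any other character for a substitution) are assumptions made for this illustration.

```python
def seed_hits(seed, columns):
    """Start positions at which `seed` hits the alignment.

    `seed` uses '#' for obligatory match positions and '-' for jokers.
    `columns` is the match line of an ungapped alignment: '|' marks a
    match, any other character a substitution.
    """
    required = [k for k, symbol in enumerate(seed) if symbol == "#"]
    last_start = len(columns) - len(seed)
    return [
        start
        for start in range(last_start + 1)
        if all(columns[start + k] == "|" for k in required)
    ]


alignment = "|||||.||.||||:|||||"  # match line of the example alignment
print(seed_hits("######", alignment))     # contiguous seed, weight 6
print(seed_hits("###--#-##", alignment))  # spaced seed, weight 6
```

On this alignment the contiguous seed finds no window of six consecutive matches, while the spaced seed of the same weight hits because the substitutions can fall on its joker positions.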

Spaced seeds

Some probabilistic observations:

For spaced seeds, hits at subsequent positions are more independent events.

For contiguous vs spaced seeds of the same weight, the expected number of hits is (basically) the same, but the probabilities of having at least one hit are very different.

[Figure: two occurrences of ###### and two occurrences of ###--#-## at neighbouring positions under a run of matches]

Some probabilistic observations:

The seeds ###--#-## and ###### are slid along four versions of the same alignment, with 0, 1, 2 and 3 substitutions:

ATCAGTGCAATGCTCAAGA
|||||||||||||||||||
ATCAGTGCAATGCTCAAGA

ATCAGTGCAATGCTCAAGA
|||||.|||||||||||||
ATCAGCGCAATGCTCAAGA

ATCAGTGCAATGCTCAAGA
|||||.|||||||:|||||
ATCAGCGCAATGCGCAAGA

ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA

For contiguous vs spaced seeds of the same weight, the expected number of hits is (basically) the same, but the probabilities of having at least one hit are very different.
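The two quantities in this observation can be computed exactly for a simple model. The sketch below assumes a Bernoulli alignment in which every column is a match independently with probability `p_match`. The function names and the dynamic programming over the last span - 1 columns are choices made for this illustration, not the method used by the authors.

```python
def expected_hits(seed, length, p_match):
    """Expected number of seed hits in a random alignment of `length` columns."""
    positions = max(0, length - len(seed) + 1)
    return positions * p_match ** seed.count("#")


def hit_probability(seed, length, p_match):
    """Probability that `seed` hits a random alignment at least once.

    Dynamic programming over the last span-1 columns (1 = match, 0 = mismatch),
    keeping only the probability mass of prefixes that contain no hit yet.
    """
    span = len(seed)
    # Bit span-1-k of a window corresponds to seed offset k (oldest column first).
    need = sum(1 << (span - 1 - k) for k, s in enumerate(seed) if s == "#")
    window_mask = (1 << span) - 1
    state_mask = window_mask >> 1
    no_hit = {0: 1.0}
    for column in range(1, length + 1):
        updated = {}
        for state, probability in no_hit.items():
            for bit, p in ((1, p_match), (0, 1.0 - p_match)):
                window = ((state << 1) | bit) & window_mask
                if column >= span and (window & need) == need:
                    continue  # a hit occurred: this mass leaves the no-hit set
                key = window & state_mask
                updated[key] = updated.get(key, 0.0) + probability * p
        no_hit = updated
    return 1.0 - sum(no_hit.values())


for seed in ("######", "###--#-##"):
    print(seed, expected_hits(seed, 64, 0.7), hit_probability(seed, 64, 0.7))
```

The expected number of hits depends only on the weight and the number of positions, whereas the probability of at least one hit also depends on how strongly neighbouring hits overlap.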

Spaced seeds

The spaced seed model is generally more sensitive than the contiguous seed model.

Extend the spaced seed model by taking into account the specificity of DNA substitutions.

Mutational events

Biological properties
– Transitions are usually over-represented.
– Regularity phenomenon in coding sequences.

Use these properties to extend the spaced seed model.

[Figure: substitution diagram over A, T, G, C distinguishing transitions (A↔G, C↔T) from transversions]

In the alignments, ‘.’ marks a transition and ‘:’ marks a transversion:

ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA
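The match line used throughout the slides can be derived from the two sequences. This is a minimal sketch for ungapped alignments over A, C, G, T. The function names are assumptions, and any pair involving another character is simply reported as a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def column_symbol(x, y):
    """'|' for a match, '.' for a transition, ':' for a transversion."""
    if x == y:
        return "|"
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "."  # purine <-> purine or pyrimidine <-> pyrimidine
    return ":"


def match_line(a, b):
    """Match line of an ungapped alignment of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return "".join(column_symbol(x, y) for x, y in zip(a.upper(), b.upper()))


print(match_line("ATCAGTGCAATGCTCAAGA", "ATCAGCGCGATGCGCAAGA"))
```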

BLASTZ model [Schwartz et al. 03]

A spaced seed that allows one possible transition substitution over its ‘#’ positions.

Problem: running time. A seed of large weight is needed to obtain reasonable speed.

Example: the seed ###-#--##--#-#--#--## is slid along the alignment

ATCAGGCATGCTAAGATCGGATCCTCAATGGCTCA
|||.|||:|||.|||||.||:||||||:||.||||
ATCGGGCTTGCCAAGATTGGTTCCTCATTGCCTCA
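The relaxed BLASTZ-style condition can be sketched as follows. The function name and the `max_transitions` parameter are assumptions for illustration, and the input is the match line exactly as printed on the slide (‘|’ match, ‘.’ transition, ‘:’ transversion), not BLASTZ's actual implementation.

```python
def blastz_like_hits(seed, columns, max_transitions=1):
    """Start positions where every '#' position is a match, except that up to
    `max_transitions` of them may be transitions ('.'). Transversions (':')
    are never allowed on '#' positions.
    """
    required = [k for k, symbol in enumerate(seed) if symbol == "#"]
    hits = []
    for start in range(len(columns) - len(seed) + 1):
        symbols = [columns[start + k] for k in required]
        matches = symbols.count("|")
        transitions = symbols.count(".")
        if matches + transitions == len(symbols) and transitions <= max_transitions:
            hits.append(start)
    return hits


seed = "###-#--##--#-#--#--##"
columns = "|||.|||:|||.|||||.||:||||||:||.||||"
print(blastz_like_hits(seed, columns))                     # one transition allowed
print(blastz_like_hits(seed, columns, max_transitions=0))  # plain spaced seed
```

With the match line of the slide, the seed only hits once a transition is tolerated, which illustrates the gain in sensitivity.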

YASS model: Transition Constrained Seeds

Seed pattern: ##@#-#@-###

‘#’ : obligatory match position
‘-’ : joker position (“don’t care” position)
‘@’ : transition-constrained position

Transition-constrained position: a position that corresponds to either a match or a transition.

Example: the seed ##@#-#@-### is slid along the alignment

ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA

Transition Constrained Seeds

Seed pattern: ##@#-#@-###

‘#’ : obligatory match position
‘-’ : joker position (“don’t care” position)
‘@’ : transition-constrained position

Weight: 8 [number of # + half the number of @]

@ carries 1 bit of information whereas # carries 2 bits.

@ adapted to GC-rich/poor genomes
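The weight rule and the hit test for transition-constrained seeds can be sketched as follows. The function names are assumptions for this illustration, and the alignment is again given as its match line (‘|’ match, ‘.’ transition, ‘:’ transversion).

```python
def seed_weight(seed):
    """Weight of a seed: each '#' counts 1, each '@' counts one half."""
    return seed.count("#") + seed.count("@") / 2


def transition_constrained_hits(seed, columns):
    """Start positions where the seed hits the alignment.

    '#' requires a match, '@' accepts a match or a transition,
    '-' accepts anything.
    """
    allowed = {"#": "|", "@": "|."}
    return [
        start
        for start in range(len(columns) - len(seed) + 1)
        if all(
            symbol == "-" or columns[start + k] in allowed[symbol]
            for k, symbol in enumerate(seed)
        )
    ]


print(seed_weight("##@#-#@-###"))
print(transition_constrained_hits("##@#-#@-###", "|||||.||.||||:|||||"))
```

The half weight of ‘@’ reflects the information count on the slide: given one nucleotide, a ‘#’ position fixes the other among four possibilities (2 bits), whereas an ‘@’ position only restricts it to two of the four (1 bit).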

Spaced seeds and Transition-Constrained Seeds

Seed pattern (why ##@#-#@-### and not #@-#-#-#@# ?)
– Not chosen randomly → need to:
  • define an alignment model.
  • search for the best (or at least a good) seed pattern according to this model. (Sensitivity: probability to detect any alignment given by the model.)
– The chosen model can drastically change the seed shape…

Example
Bernoulli model: ##@-#@#--#-#-###
Markov model: ##@##-##@##

Probabilistic Alignment Models

Alignments are encoded column by column (2 = match, 1 = transition, 0 = transversion):

ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA
2222212212222022222

– Bernoulli [Keich et al 02]
  P(’2’) = 0.7, P(’1’) = 0.15, P(’0’) = 0.15

– Markov [Buhler et al 03]
  222221221222 X

– Automata (M3/M8) and HMMs [Brejova et al 03]
  Each transition has an emission probability for each symbol.
  Ex: P(’2’) = 0.8, P(’1’) = 0.10, P(’0’) = 0.10

– Homogeneous alignments [Kucherov et al 04]
  “HSP” alignments found by heuristic algorithms
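As a small illustration of the simplest of these models, the sketch below encodes a match line into the 2/1/0 alphabet and evaluates its probability under a Bernoulli model, where columns are independent. The function names are assumptions, and the probabilities are the ones quoted on the slide.

```python
import math

# Match-line symbol -> model symbol (2 = match, 1 = transition, 0 = transversion).
ENCODING = {"|": "2", ".": "1", ":": "0"}


def encode(columns):
    """Translate a match line such as '||.|:' into the 2/1/0 alphabet."""
    return "".join(ENCODING[c] for c in columns)


def bernoulli_log_probability(encoded, probabilities):
    """Log-probability of an encoded alignment when columns are independent."""
    return sum(math.log(probabilities[symbol]) for symbol in encoded)


model = {"2": 0.7, "1": 0.15, "0": 0.15}
encoded = encode("|||||.||.||||:|||||")
print(encoded)
print(math.exp(bernoulli_log_probability(encoded, model)))
```

A Markov model would replace the fixed per-symbol probabilities by probabilities conditioned on the preceding symbols.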

Seed Design

Alignment model: Bernoulli
– P(match) = 0.7, P(transition) = 0.15, P(transversion) = 0.15
– alignment length = 64

Seed Design

Alignment model: Markov
– 5th order, obtained on N. meningitidis, S. cerevisiae, Drosophila, and Human sequences.

Experiments

S. cerevisiae / Neisseria sequences

To summarize ...

We have presented several seed models (contiguous, “classic” spaced seeds, BLASTZ).

We introduced transition-constrained seeds and showed how they improve the sensitivity.

From detected seeds to detected alignments

2. Hit criterion

What is the criterion that witnesses a potential alignment?

Restriction: only the information about seeds is available.

[Figure: dot plot showing the detected seeds and the detected alignment]

Several methods have been proposed

FASTA:
– Several small seeds on proximal diagonals

BLAST (single hit):
– One “large” seed

Gapped-BLAST (double hit):
– Two seeds on the same diagonal

To define a good criterion, we first have to define the class of similarities we want to detect: a mutation model.

[Figure: dot plots illustrating the criteria]
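The double-hit idea (two seeds on the same diagonal) can be sketched as follows. Seeds are given as pairs (i, j) of start positions in the two sequences. The function name and the `max_distance` window are assumptions for this illustration, not values from Gapped-BLAST.

```python
from collections import defaultdict


def double_hits(seeds, k, max_distance):
    """Pairs of non-overlapping seeds of length k lying on the same diagonal
    and starting at most `max_distance` positions apart.
    """
    by_diagonal = defaultdict(list)
    for i, j in seeds:
        by_diagonal[j - i].append(i)
    pairs = []
    for diagonal, starts in by_diagonal.items():
        starts.sort()
        for index, first in enumerate(starts):
            for second in starts[index + 1:]:
                # k <= distance rules out overlapping seeds.
                if k <= second - first <= max_distance:
                    pairs.append(
                        ((first, first + diagonal), (second, second + diagonal))
                    )
    return sorted(pairs)


print(double_hits([(0, 2), (10, 12), (3, 20)], k=6, max_distance=40))
```

A single-hit criterion would instead accept every seed on its own, which is why it needs a larger seed to stay selective.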

Mutation effect on Seeds

Mutation effect
– Substitutions: “suppressing seeds”
– Indels: “diagonal shifts”

Remaining seeds
– Estimation of inter-seed distances
  • via a Waiting Time distribution
– Estimation of diagonal shifts
  • via a Random Walk model
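The random-walk idea for diagonal shifts can be illustrated with a deliberately simple model: each alignment column is an indel with probability `p_indel`, and an indel moves the diagonal by +1 or -1 with equal probability. This model, the function names and the parameters are assumptions for this sketch and are not claimed to be the exact formulas used by YASS.

```python
def diagonal_shift_distribution(steps, p_indel):
    """Distribution of the diagonal shift after `steps` alignment columns."""
    distribution = {0: 1.0}
    moves = ((0, 1.0 - p_indel), (1, p_indel / 2), (-1, p_indel / 2))
    for _ in range(steps):
        updated = {}
        for shift, probability in distribution.items():
            for delta, p in moves:
                key = shift + delta
                updated[key] = updated.get(key, 0.0) + probability * p
        distribution = updated
    return distribution


def shift_bound(steps, p_indel, confidence):
    """Smallest d such that P(|shift| <= d) >= confidence."""
    distribution = diagonal_shift_distribution(steps, p_indel)
    covered = 0.0
    for bound in range(steps + 1):
        covered += distribution.get(bound, 0.0)
        if bound:
            covered += distribution.get(-bound, 0.0)
        if covered >= confidence:
            return bound
    return steps


print(shift_bound(100, 0.08, 0.95))
```

Such a bound indicates how far apart, in diagonals, two seeds may lie while still plausibly belonging to the same alignment. The inter-seed distance would be bounded in a similar way from a waiting-time distribution, which is not covered here.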

YASS hit criterion

According to these parameters, YASS proposes:
– An intermediate criterion between the BLAST single-hit and the Gapped-BLAST double-hit criteria.
– Overlap-controlled multi-hits

Example: two overlapping seed occurrences and the number of match positions they cover

|:|||||||:|||:|||
  ######
   ######
7

|:||||:|||||:|.|.
  ###--#-##
   ###--#-##
9
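The numbers 7 and 9 on this slide can be reproduced by counting the distinct alignment positions covered by the ‘#’ symbols of two overlapping seed occurrences. The helper names and the `min_size` threshold below are assumptions made for this sketch of an overlap-controlled criterion.

```python
def covered_positions(seed, starts):
    """Number of distinct alignment positions covered by '#' symbols
    when the seed occurs at each position in `starts`.
    """
    covered = set()
    for start in starts:
        covered.update(start + k for k, s in enumerate(seed) if s == "#")
    return len(covered)


def group_is_hit(seed, starts, min_size):
    """Hypothetical criterion: accept a group of seeds once it covers
    at least `min_size` positions.
    """
    return covered_positions(seed, starts) >= min_size


print(covered_positions("######", [2, 3]))
print(covered_positions("###--#-##", [2, 3]))
```

Counting covered positions rather than seeds means that two heavily overlapping hits contribute little more than a single hit, while two overlapping spaced seeds contribute more evidence.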

Sensitivity

Comparison of BLASTn/Gapped-BLAST/YASS hit criteria

[Figure: sensitivity comparison, score 25]

Sensitivity (cont)

Comparison of BLASTn/Gapped-BLAST/YASS hit criteria

[Figure: sensitivity comparison, score 35]

YASS criterion mixed with spaced seeds

Experiments

Local alignment sensitivity
– YASS software / BLASTn (2.2.6 package)

M.t : M. tuberculosis CDC1551
S.s : Synechocystis sp. PCC 6803
V.p : Vibrio p. RIMD 2210633 I
Y.p : Yersinia pestis KIM

Ads

YASS web page
http://www.loria.fr/projects/YASS

YASS can be queried online
http://yass.loria.fr

YASS is Open Source

Conclusions

Two improvements:
– Transition-constrained spaced seeds
– Hit criterion combining statistical models and the advantages of single/multi-hit strategies

A tool that implements both of them

Extensions

To be done
– Multi-seed approach [Li03, Buhler04, Noe04]
– Seed design on the fly (not necessarily static seeds)
– and others …

Questions

Example YASS output (nucleotide-level alignment omitted):

*(96264-94728)(582917-584471) Ev: 0 s: 1537/1555 r
* S.cerevisiae.V (reverse complementary strand) / gi|12057208|(forward strand)
* score = 1073 : bitscore = 491.92
* mutations per triplet 347, 108, 152 (1.79e-36) | ts : 272 tv : 335
