improved hit criteria for dna local alignment jobim 2004 montréal - june 28th laurent noé, gregory...
Post on 31-Mar-2015
Improved hit criteria for DNA local alignment
JOBIM 2004 Montréal - June 28th
Laurent Noé, Gregory Kucherov
LORIA, Nancy, France
2
Plan
Introduction
– Local alignment
– Heuristic methods
Hit criteria
– Seed models and proposed extension
– Single/multiple hit strategies and proposed extension
Experiments
Conclusion
– Extensions
3
Local alignment methods
Why be interested in local alignment methods?
– Improvement needed: #sequences, #users (budget)
Dynamic programming (Smith-Waterman)
– Gives an exact solution
– Quadratic cost (best optimization in [Crochemore et al 02])
In practice: heuristic algorithms
– FASTA, BLAST, PatternHunter, BLASTZ, YASS, …
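The quadratic cost of the exact method comes from filling a table with one cell per pair of positions. Below is a minimal Python sketch of Smith-Waterman local alignment scoring with a linear gap penalty. The scores (match +1, mismatch -1, gap -2) and the function name are illustrative assumptions and do not come from the talk. The two nested loops over the sequence lengths are what make the method quadratic.

```python
def smith_waterman_score(a: str, b: str, match: int = 1,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gaps).

    Fills a (len(a)+1) x (len(b)+1) table row by row, hence O(len(a)*len(b))
    time; only the previous row is kept, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)  # row i-1 of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # the 0 lets a local alignment start afresh at cell (i, j)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # 4 (the shared ACGT core)
```

Heuristic tools avoid filling this whole table. They only examine regions around seeds, which is the subject of the following slides.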
4
Seed filtering
Start with small, conserved and easily detected fragments (seeds).
Then extend the seeds to build possible alignments.
[Figure: dot plot showing the detected seeds and the detected alignment]
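Seed filtering can be sketched in two steps. First, index the k-mers of one sequence in a hash table and look up those of the other sequence to find contiguous seeds. Second, extend each seed along its diagonal. The extension here is a BLAST-style ungapped X-drop rule. The function names, the scores and the `xdrop` value are hypothetical, and this is not necessarily the extension procedure used by YASS.

```python
from collections import defaultdict


def find_seeds(s: str, t: str, k: int) -> list[tuple[int, int]]:
    """All (i, j) with s[i:i+k] == t[j:j+k]: contiguous seeds of length k."""
    index = defaultdict(list)  # k-mer -> start positions in t
    for j in range(len(t) - k + 1):
        index[t[j:j + k]].append(j)
    return [(i, j)
            for i in range(len(s) - k + 1)
            for j in index.get(s[i:i + k], [])]


def extend_ungapped(s: str, t: str, i: int, j: int, k: int,
                    match: int = 1, mismatch: int = -1,
                    xdrop: int = 3) -> tuple[int, int, int]:
    """Extend the exact seed s[i:i+k] / t[j:j+k] along its diagonal.

    Returns (start, end, score); s[start:end] is the extended region.
    """
    def score(x: int, y: int) -> int:
        return match if s[x] == t[y] else mismatch

    # Rightward: stop once the running score falls more than xdrop
    # below the best score seen so far.
    best_r, end, cur = 0, i + k, 0
    x, y = i + k, j + k
    while x < len(s) and y < len(t):
        cur += score(x, y)
        x, y = x + 1, y + 1
        if cur > best_r:
            best_r, end = cur, x
        elif best_r - cur > xdrop:
            break

    # Leftward: same rule.
    best_l, start, cur = 0, i, 0
    x, y = i - 1, j - 1
    while x >= 0 and y >= 0:
        cur += score(x, y)
        if cur > best_l:
            best_l, start = cur, x
        elif best_l - cur > xdrop:
            break
        x, y = x - 1, y - 1

    # The seed itself is an exact match worth k * match.
    return start, end, k * match + best_r + best_l


s, t = "ATCAGTGCAATGCTCAAGA", "ATCAGCGCGATGCGCAAGA"
print(find_seeds(s, t, 4))
print(extend_ungapped(s, t, 9, 9, 4))  # (0, 19, 13)
```

Take the example alignment used on the next slides. With k = 4, this finds seeds such as (0, 0) and (9, 9). Extending either seed recovers the whole 19-column alignment with score 13 (16 matches, 3 mismatches).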
5
Two questions usually asked
1. Seed model: what can serve as a seed?
2. Hit criterion: what is the criterion that witnesses a potential alignment?
[Figure: dot plot; detected seeds → 1. seed model; detected alignment → 2. hit criterion]
6
1. What can serve as a seed?
Contiguous seed
Exact similarity:
ATCAGT
||||||
ATCAGT
Seed pattern: ######
Example: the pattern ###### slid along the alignment
ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA
7
Spaced seed model [Ma et al. 02: PATTERNHUNTER]
Seed pattern: ###--#-##
‘#’: obligatory match position
‘-’: joker position (“don’t care” position)
Weight: 6 [number of #]
Span: 9 [number of all symbols]
Example: the pattern ###--#-## slid along the alignment
ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA
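The following minimal sketch shows how a spaced seed is matched against an ungapped alignment. It assumes the alignment is given as its match line, where '|' is a match and any other character is a mismatch. `seed_hits` and `weight` are hypothetical helpers. A contiguous seed is simply a pattern without jokers, so the same function covers both models.

```python
def seed_hits(seed: str, alignment: str) -> list[int]:
    """Start columns where `seed` hits an ungapped alignment.

    seed:      '#' = obligatory match, '-' = joker ("don't care").
    alignment: match line, '|' = match, any other character = mismatch.
    """
    care = [k for k, c in enumerate(seed) if c == "#"]
    return [i for i in range(len(alignment) - len(seed) + 1)
            if all(alignment[i + k] == "|" for k in care)]


def weight(seed: str) -> int:
    """Number of obligatory match positions."""
    return seed.count("#")


aln = "|||||.||.||||:|||||"  # ATCAGTGCAATGCTCAAGA vs ATCAGCGCGATGCGCAAGA
print(seed_hits("###--#-##", aln))  # [2, 9, 10]
print(seed_hits("######", aln))     # []
```

On this example, the spaced seed ###--#-## hits at columns 2, 9 and 10. The contiguous seed ###### has the same weight but does not hit at all, because the alignment has no run of six consecutive matches.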
8
Spaced seeds
Some probabilistic observations:
For spaced seeds, hits at subsequent positions are more independent events.
For contiguous and spaced seeds of the same weight, the expected number of hits is (basically) the same, but the probabilities of having at least one hit are very different.
[Figure: two hits of ###### and two hits of ###--#-## at consecutive positions of a fully conserved 17-column region]
9
[Animation: ###--#-## and ###### slid along ATCAGTGCAATGCTCAAGA aligned with ATCAGTGCAATGCTCAAGA, ATCAGCGCAATGCTCAAGA, ATCAGCGCAATGCGCAAGA and ATCAGCGCGATGCGCAAGA (0, 1, 2 and 3 mismatches)]
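This observation can be checked numerically. The sketch below assumes a Bernoulli alignment model, in which each column is independently a match with probability p. It computes the probability of at least one hit by dynamic programming. The DP state is the set of seed occurrences that are still possible, stored as a bit mask of live offsets. The sketch also computes the expected number of hits, (L - span + 1)·p^weight. The function names are illustrative. This is one standard way to do the computation and not necessarily the algorithm of [Keich et al 02].

```python
def hit_probability(seed: str, length: int, p: float) -> float:
    """P(seed hits at least once) in a Bernoulli alignment of `length` columns.

    Each column is a match with probability p, independently. The DP state is
    a bit mask of live offsets: bit d set means a seed occurrence that started
    d columns ago is still compatible with every column read so far.
    """
    span = len(seed)
    kill = sum(1 << d for d, c in enumerate(seed) if c == "#")  # offsets a mismatch invalidates
    done = 1 << (span - 1)  # an occurrence reaching this offset is a full hit
    dist = {0: 1.0}  # live mask -> probability of that mask with no hit so far
    for _ in range(length):
        new: dict[int, float] = {}
        for live, prob in dist.items():
            shifted = (live << 1) | 1  # advance every occurrence, open a new one
            # match column (prob p) keeps all; mismatch (1 - p) kills '#' offsets
            for nxt, pc in ((shifted, p), (shifted & ~kill, 1.0 - p)):
                if nxt & done:
                    continue  # this history contains a hit: drop it
                new[nxt] = new.get(nxt, 0.0) + prob * pc
        dist = new
    return 1.0 - sum(dist.values())


def expected_hits(seed: str, length: int, p: float) -> float:
    """Expected number of hits: (length - span + 1) * p**weight."""
    return max(0, length - len(seed) + 1) * p ** seed.count("#")


contiguous = "#" * 11
patternhunter = "###-#--#-#--##-###"  # 111010010100110111 in 0/1 notation
for s in (contiguous, patternhunter):
    print(s, round(expected_hits(s, 64, 0.7), 2),
          round(hit_probability(s, 64, 0.7), 3))
```

With p = 0.7 and L = 64, the contiguous weight-11 seed hits with probability of about 0.30. The PatternHunter seed of the same weight hits noticeably more often, even though its expected number of hits is slightly lower (47 possible positions instead of 54).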
10
Spaced seeds
The spaced seed model is generally more sensitive than the contiguous seed model.
Extend the spaced seed model by taking into account the specificity of DNA substitutions.
11
Biological properties
Transitions are usually over-represented.
Regularity phenomenon in coding sequences.
Use these properties to extend the spaced seed model.
Mutational events
[Figure: transitions are A↔G and C↔T; all other substitutions are transversions]
In the alignments, ‘.’ marks a transition and ‘:’ a transversion:
ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA
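Here is a small sketch of how the match line of an alignment can be built. It follows the convention of these slides: '|' for a match, '.' for a transition and ':' for a transversion. Gap columns are shown as a blank, as in the YASS output shown later. The function names are hypothetical.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def substitution_type(x: str, y: str) -> str:
    """'|' for a match, '.' for a transition, ':' for any other mismatch."""
    if x == y:
        return "|"
    pair = {x, y}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "."  # A<->G or C<->T
    return ":"  # transversion (this sketch also puts unknown bases such as N here)


def match_line(s: str, t: str) -> str:
    """Match line of two aligned rows of equal length; '-' marks a gap."""
    if len(s) != len(t):
        raise ValueError("aligned rows must have the same length")
    return "".join(" " if "-" in (x, y) else substitution_type(x, y)
                   for x, y in zip(s.upper(), t.upper()))


print(match_line("ATCAGTGCAATGCTCAAGA", "ATCAGCGCGATGCGCAAGA"))
```

On the example above, this prints `|||||.||.||||:|||||`, the line shown on the slide.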
12
BLASTZ model [Schwartz et al. 03]
A spaced seed that allows one transition substitution at any of its ‘#’ positions.
Problem: running time; a seed of large weight is needed to obtain a reasonable speed.
Example: the pattern ###-#--##--#-#--#--## slid along the alignment
ATCAGGCATGCTAAGATCGGATCCTCAATGGCTCA
|||.|||:|||.|||||.||:||||||:||.||||
ATCGGGCTTGCCAAGATTGGTTCCTCATTGCCTCA
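The BLASTZ criterion can be sketched on a match line. Every '#' position must be a match or a transition, and at most one of them may be a transition. This is a simplified reading of the slide, not BLASTZ's implementation. `blastz_hits` and its `max_transitions` parameter are hypothetical.

```python
def blastz_hits(seed: str, alignment: str, max_transitions: int = 1) -> list[int]:
    """Start columns where a BLASTZ-style spaced seed hits a match line.

    Every '#' column must be a match ('|') or a transition ('.'), and at most
    `max_transitions` of them may be transitions; '-' columns are ignored.
    """
    care = [k for k, c in enumerate(seed) if c == "#"]
    hits = []
    for i in range(len(alignment) - len(seed) + 1):
        cols = [alignment[i + k] for k in care]
        if all(c in "|." for c in cols) and cols.count(".") <= max_transitions:
            hits.append(i)
    return hits


seed = "###-#--##--#-#--#--##"
aln = "|||.|||:|||.|||||.||:||||||:||.||||"
print(blastz_hits(seed, aln))                     # [2]
print(blastz_hits(seed, aln, max_transitions=0))  # []
```

On the example alignment, the only hit starts at column 2, where the seed covers exactly one transition. With `max_transitions=0` the seed behaves as a plain spaced seed, and there is no hit.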
13
YASS model: transition-constrained seeds
Seed pattern: ##@#-#@-###
‘#’: obligatory match position
‘-’: joker position (“don’t care” position)
‘@’: transition-constrained position
Transition-constrained position: a position that corresponds to either a match or a transition.
Example: the pattern ##@#-#@-### slid along the alignment
ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA
14
Transition-constrained seeds
Seed pattern: ##@#-#@-###
‘#’: obligatory match position; ‘-’: joker position (“don’t care” position); ‘@’: transition-constrained position
Weight: 8 [number of # + half the number of @]
An @ carries 1 bit of information, whereas a # carries 2 bits.
@ is adapted to GC-rich/GC-poor genomes.
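The sketch below covers transition-constrained seeds on a match line, together with the weight formula and the information carried by each position. The bit values assume independent, uniformly distributed nucleotides. A '#' accepts 4 of the 16 ordered base pairs, giving log2(16/4) = 2 bits. An '@' accepts 8 pairs (4 matches plus 4 transitions), giving log2(16/8) = 1 bit. The weight is therefore half the total number of bits. The helper names are hypothetical.

```python
import math

# Match-line characters accepted by each constrained seed symbol
# ('|' = match, '.' = transition); '-' accepts anything.
ACCEPTED = {"#": "|", "@": "|."}


def tc_weight(seed: str) -> float:
    """Weight = number of '#' + half the number of '@'."""
    return seed.count("#") + seed.count("@") / 2


def info_bits(symbol: str) -> float:
    """Bits of information of one seed position, assuming uniform i.i.d. bases.

    Out of the 16 ordered base pairs, '#' accepts the 4 matches, '@' accepts
    the 4 matches plus the 4 transitions, and '-' accepts all 16.
    """
    accepted_pairs = {"#": 4, "@": 8, "-": 16}[symbol]
    return math.log2(16 / accepted_pairs)


def tc_hits(seed: str, alignment: str) -> list[int]:
    """Start columns where a transition-constrained seed hits a match line."""
    return [i for i in range(len(alignment) - len(seed) + 1)
            if all(c == "-" or alignment[i + k] in ACCEPTED[c]
                   for k, c in enumerate(seed))]


print(tc_weight("##@#-#@-###"))                       # 8.0
print(tc_hits("##@#-#@-###", "|||||.||.||||:|||||"))  # [1, 6]
```

On the example alignment, ##@#-#@-### hits at columns 1 and 6. The hit at column 6 depends on an '@' falling on the transition at column 8. Replacing both '@' with '#' leaves only the hit at column 1.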
15
Spaced seeds and transition-constrained seeds
Seed pattern (why ##@#-#@-### and not #@-#-#-#@#?)
– Not chosen at random. We need to:
• define an alignment model;
• search for the best (or at least a good) seed pattern according to this model (sensitivity: the probability of detecting an alignment generated by the model).
– The chosen model can drastically change the seed shape. Example:
Bernoulli model: ##@-#@#--#-#-###
Markov model: ##@##-##@##
16
Probabilistic alignment models:
– Bernoulli [Keich et al 02]
– Markov [Buhler et al 03]
– Automata (M3/M8) and HMMs [Brejova et al 03]
– Homogeneous alignments [Kucherov et al 04]
“HSP”: alignments found by heuristic algorithms
Encoding of an alignment (2 = match, 1 = transition, 0 = transversion):
ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA
2222212212222022222
Bernoulli example: P(’2’) = 0.7, P(’1’) = 0.15, P(’0’) = 0.15
Automata/HMMs: each transition has an emission probability for each symbol, e.g. P(’2’) = 0.8, P(’1’) = 0.10, P(’0’) = 0.10
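The three-letter encoding and the Bernoulli model fit in a few lines. The match line is mapped to 2/1/0. Under the Bernoulli model, the probability of an encoded alignment is the product of the per-column probabilities. The names used here are illustrative.

```python
import math

CODE = {"|": "2", ".": "1", ":": "0"}  # match, transition, transversion


def encode(match_line: str) -> str:
    """Map an ungapped match line to the 2/1/0 alphabet."""
    return "".join(CODE[c] for c in match_line)


def bernoulli_probability(encoded: str, probs: dict[str, float]) -> float:
    """Probability of an encoded alignment when columns are i.i.d."""
    return math.prod(probs[c] for c in encoded)


enc = encode("|||||.||.||||:|||||")
print(enc)  # 2222212212222022222
print(bernoulli_probability(enc, {"2": 0.7, "1": 0.15, "0": 0.15}))
```

The example contains 16 matches, 2 transitions and 1 transversion. Its probability is therefore 0.7^16 · 0.15^3 under the Bernoulli parameters above.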
17
Seed design
Alignment model: Bernoulli
– P(match) = 0.7, P(transition) = 0.15, P(transversion) = 0.15
– alignment length = 64
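Sensitivity under this model is the probability that a seed hits a random alignment of length 64. It can be computed with a live-offset dynamic programming, as for plain spaced seeds, but with three column types. A transition invalidates '#' positions, and a transversion invalidates both '#' and '@' positions. This sketch rests on assumptions: the function name is hypothetical, and the transversion probability is taken as 1 - P(match) - P(transition). It is not the seed-design procedure used for YASS. A seed designer would evaluate such a function over many candidate patterns and keep the best.

```python
def tc_sensitivity(seed: str, length: int, p_match: float,
                   p_transition: float) -> float:
    """P(a transition-constrained seed hits) in an i.i.d. 3-letter alignment.

    Each column is a match, a transition or a transversion, independently;
    the transversion probability is 1 - p_match - p_transition. The DP state
    is the bit mask of still-possible seed occurrences (bit d = started d
    columns ago).
    """
    p_transversion = 1.0 - p_match - p_transition
    kill_ts = sum(1 << d for d, c in enumerate(seed) if c == "#")   # broken by a transition
    kill_tv = sum(1 << d for d, c in enumerate(seed) if c in "#@")  # broken by a transversion
    done = 1 << (len(seed) - 1)
    dist = {0: 1.0}  # live mask -> probability, restricted to "no hit yet"
    for _ in range(length):
        new: dict[int, float] = {}
        for live, prob in dist.items():
            shifted = (live << 1) | 1  # advance occurrences, open a new one
            for nxt, pc in ((shifted, p_match),
                            (shifted & ~kill_ts, p_transition),
                            (shifted & ~kill_tv, p_transversion)):
                if nxt & done:
                    continue  # a complete occurrence: this history has a hit
                new[nxt] = new.get(nxt, 0.0) + prob * pc
        dist = new
    return 1.0 - sum(dist.values())


for s in ("####-##-###", "##@#-#@-###", "##-#-#--###"):
    print(s, round(tc_sensitivity(s, 64, 0.7, 0.15), 3))
```

The comparison shows the trade-off. ##@#-#@-### is more sensitive than the same pattern with each '@' replaced by '#', and less sensitive than the version with jokers. Its weight (8) lies between theirs (9 and 7).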
18
Seed design
Alignment model: Markov
– 5th order, obtained on N. meningitidis, S. cerevisiae, Drosophila and human sequences.
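Estimating a k-th order Markov model of alignments (k = 5 on this slide) amounts to counting how often each next symbol (2, 1 or 0) follows every context of k previous columns. Below is a minimal sketch using raw frequencies with no smoothing. The training data in the example is the single alignment from the earlier slides, not the genomic alignments mentioned above, and `train_markov` is a hypothetical name.

```python
from collections import Counter, defaultdict


def train_markov(sequences: list[str], order: int = 5) -> dict[str, dict[str, float]]:
    """Estimate P(next symbol | previous `order` symbols) by raw frequencies.

    Contexts never seen in the training data are simply absent (no smoothing).
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1
    model = {}
    for context, nexts in counts.items():
        total = sum(nexts.values())
        model[context] = {sym: n / total for sym, n in nexts.items()}
    return model


model = train_markov(["2222212212222022222"], order=1)
print(model["2"])
```

With `order=1` on this example, P(’2’ | ’2’) = 12/15 = 0.8. In this example, every transition (1) and transversion (0) is followed by a match (2).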
19
Experiments
S. cerevisiae / Neisseria sequences
20
To summarize...
We have presented several seed models (contiguous, “classic” spaced seeds, BLASTZ)
We introduced transition-constrained seeds and showed how they improve the sensitivity
From detected seeds to detected alignments
21
2. Hit criterion
What is the criterion that witnesses a potential alignment?
Restriction: only the information about seeds is available.
[Figure: dot plot going from detected seeds (→ 2. hit criterion) to the detected alignment]
22
Several methods have been proposed
FASTA:
– Several small seeds on proximal diagonals
BLAST (single hit):
– One “large” seed
Gapped-BLAST (double hit):
– Two seeds on the same diagonal
To define a good criterion, we first have to define the class of similarities we want to detect: the mutation model.
[Figure: dot plots illustrating each criterion]
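Here is a sketch of the Gapped-BLAST two-hit idea applied to a list of seed hits (i, j). Hits are grouped by diagonal i - j. A pair of consecutive, non-overlapping hits at most `window` positions apart triggers an extension. The `window` value and the function name are assumptions made for illustration. The sketch compares only consecutive hits on each diagonal.

```python
from collections import defaultdict


def two_hit(hits: list[tuple[int, int]], span: int, window: int):
    """Pairs of seed hits that satisfy a Gapped-BLAST-style two-hit criterion.

    hits: (i, j) start positions of seed hits in the two sequences.
    A pair triggers when both hits lie on the same diagonal i - j, do not
    overlap (distance >= span) and are at most `window` apart.
    """
    by_diagonal: dict[int, list[int]] = defaultdict(list)
    for i, j in hits:
        by_diagonal[i - j].append(i)
    triggers = []
    for d, starts in by_diagonal.items():
        starts.sort()
        for a, b in zip(starts, starts[1:]):  # consecutive hits only
            if span <= b - a <= window:
                triggers.append(((a, a - d), (b, b - d)))
    return sorted(triggers)


print(two_hit([(0, 0), (10, 10), (50, 50), (5, 20)], span=6, window=30))
# [((0, 0), (10, 10))]
```

In this example, the hit at (50, 50) is too far from its neighbour to trigger, and (5, 20) is alone on its diagonal. A single-hit criterion (BLAST) would instead trigger on every hit.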
23
Mutation effect on seeds
Mutation effect:
– Substitutions: “suppressing seeds”
– Indels: “diagonal shifts”
Remaining seeds:
– Estimation of inter-seed distances, via a waiting-time distribution
– Estimation of diagonal shifts, via a random-walk model
[Figure: dot plot]
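To illustrate the random-walk view of indels, here is a sketch under a deliberately simple, hypothetical model. At each column, an insertion shifts the diagonal by +1 with probability q/2, a deletion shifts it by -1 with probability q/2, and otherwise the diagonal stays the same. The distribution of the shift after n columns indicates how many diagonals apart two seeds separated by n columns are likely to be. YASS's actual model and parameters are not given on this slide.

```python
def shift_distribution(n: int, q: float) -> dict[int, float]:
    """Distribution of the diagonal shift after n columns.

    Hypothetical model: at each column, independently, an insertion (+1) or a
    deletion (-1) occurs with probability q/2 each; otherwise no shift.
    """
    dist = {0: 1.0}
    for _ in range(n):
        new: dict[int, float] = {}
        for shift, p in dist.items():
            for step, ps in ((0, 1.0 - q), (1, q / 2), (-1, q / 2)):
                new[shift + step] = new.get(shift + step, 0.0) + p * ps
        dist = new
    return dist


def prob_within(n: int, q: float, r: int) -> float:
    """P(|shift| <= r) after n columns."""
    return sum(p for s, p in shift_distribution(n, q).items() if abs(s) <= r)


print(prob_within(2, 0.1, 0))  # about 0.815
```

For example, with q = 0.1, two seeds two columns apart stay on the same diagonal with probability 0.9² + 2·0.05² = 0.815. A hit criterion can use such a distribution to decide how many neighbouring diagonals to accept.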
24
YASS hit criterion
Based on these parameters, YASS proposes:
– An intermediate criterion between the BLAST single-hit and the Gapped-BLAST double-hit criteria.
– Overlap-controlled multiple hits
|:|||||||:|||:|||   ###### ######   → two overlapping hits cover 7 positions
|:||||:|||||:|.|.   ###--#-## ###--#-##   → two overlapping hits cover 9 positions
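The 7 and 9 above can be reproduced by counting the alignment columns covered by the '#' positions of a group of overlapping hits, with shared columns counted once. The sketch below uses that count as a trigger: a group fires when it covers at least `threshold` columns. A threshold equal to the seed weight gives a single-hit criterion, and a threshold of twice the weight gives a double-hit-like one. This is a hypothetical reading of "overlap-controlled multi-hits", and the threshold value is an assumption.

```python
def hits(seed: str, alignment: str) -> list[int]:
    """Start columns where a '#'/'-' seed hits a match line ('|' = match)."""
    care = [k for k, c in enumerate(seed) if c == "#"]
    return [i for i in range(len(alignment) - len(seed) + 1)
            if all(alignment[i + k] == "|" for k in care)]


def covered_positions(seed: str, starts: list[int]) -> int:
    """Alignment columns covered by the '#' positions of a group of hits.

    A column shared by overlapping hits is counted once (overlap control).
    """
    care = [k for k, c in enumerate(seed) if c == "#"]
    return len({s + k for s in starts for k in care})


def triggers(seed: str, starts: list[int], threshold: int) -> bool:
    """Hypothetical group criterion: fire when enough columns are covered."""
    return covered_positions(seed, starts) >= threshold


for seed, aln in (("######", "|:|||||||:|||:|||"),
                  ("###--#-##", "|:||||:|||||:|.|.")):
    group = hits(seed, aln)
    print(seed, group, covered_positions(seed, group), triggers(seed, group, 8))
# ###### [2, 3] 7 False
# ###--#-## [2, 3] 9 True
```

With a hypothetical threshold of 8, the two overlapping ###### hits (7 columns) do not trigger, but the two overlapping ###--#-## hits (9 columns) do.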
25
Sensitivity: comparison of BLASTn/Gapped-BLAST/YASS hit criteria
[Figure: score 25]
26
Sensitivity (cont.): comparison of BLASTn/Gapped-BLAST/YASS hit criteria
[Figure: score 35]
27
YASS criterion mixed with spaced seeds
28
Experiments
Local alignment sensitivity
– YASS software / BLASTn (2.2.6 package)
M.t: M. tuberculosis CDC1551
S.s: Synechocystis sp. PCC 6803
V.p: Vibrio p. RIMD 2210633 I
Y.p: Yersinia pestis KIM
29
Ads
30
Ads
YASS web page
http://www.loria.fr/projects/YASS
YASS can be queried online
http://yass.loria.fr
YASS is Open Source
31
Conclusions
Two improvements:
– Transition-constrained spaced seeds
– A hit criterion combining statistical models with the advantages of single-/multi-hit strategies
A tool that implements both of them
32
Extensions
To be done
– Multi-seed approach [Li03, Buhler04, Noe04]
– Seed design on the fly (seeds not necessarily static)
– and others…
33
Questions
34
35
*(96264-94728)(582917-584471) Ev: 0 s: 1537/1555 r
* S.cerevisiae.V (reverse complementary strand) / gi|12057208|(forward strand)
* score = 1073 : bitscore = 491.92
* mutations per triplet 347, 108, 152 (1.79e-36) | ts : 272 tv : 335

|96260 |96250 |96240 |96230 |96220 |96210 |96200 |96190
TTCCGCTTCATTAACCATTCGATCAATCTCCGTATCAGATAGCCCAGACGCTCCGGCAACAGTGATGGAAGAGTCTTTG
|||:||:||:||:|||||:||:||.||.||.:.:||.::.|.:||:||:|::||:::.|..||||||::.|:::||||.
TTCGGCATCTTTCACCATGCGTTCGATTTCTTCTTCGCTCAAACCTGAAGAACCTTGGATGGTGATGTTGGCTGCTTTA
|582920 |582930 |582940 |582950 |582960 |582970 |582980

|96180 |96170 |96160 |96150 |96140 |96130 |96120
TGGCTGGCGAGATCTTTTGCTGAAACGTTGATGATGCCGTTCGCATCGATATCAAAAGTGACTTCAATTTGTGGGGTAC
.:|:||:|:::.|||||:||:|||||||::|:|||||||||:||.|||||.||.||.||:|||||.|||||.||:.|||
CCGGTGCCTTTGTCTTTGGCGGAAACGTGCAGGATGCCGTTGGCGTCGATGTCGAAGGTTACTTCGATTTGCGGCATAC
|583000 |583010 |583020 |583030 |583040 |583050 |583060

|96100 |96090 |96080 |96070 |96060 |96050 |96040
CTTTTGGAGCTGGAGGAATGCCCGCAAGAGTAAAATTACCTATTAATTTGTTATCCTTGACTAACTCCCTCTCACCTTG
|:.:.||:||:||:|:.|||.|::|:|..:|.||:|:|||.|::.|||||||.:|:::..|::..||:|:.||.|||||
CGCGCGGTGCAGGTGCGATGTCGCCCAAGTTGAACTGACCCAAAGATTTGTTGGCAGAAGCGCGTTCGCGTTCGCCTTG
|583080 |583090 |583100 |583110 |583120 |583130 |583140

|96020 |96010 |96000 |95990 |95980 |95970 |95960
GAAAACTTTAACTTCCACCGATGTTTGACCTGATGCCGCAGTTGAAAAAATTTGAGATTTCTTATTGGGAATTGTAGAA
:|.:||:|:.|.::..||.|:::||||...:::|:|:||.||:||.||:|.|||:||.:..||.:|:||.||:||.|:.
CAGTACGTGGATGGTTACTGCGCTTTGGTTGTCTTCGGCGGTAGAGAACACTTGCGACGCTTTGGTCGGGATGGTGGTG
|583160 |583170 |583180 |583190 |583200 |583210 |583220

|95940 |95930 |95920 |95910 |95900 |95890 |95880
TTTCTTGGGATTAATTTTGTAAAAACTCCTCCTAAAGTTTCAATACCCAATGATAGGGGAGTGACATCTAGCAACAAAA
||..|.:|.||.|.|||:||:|::||:||:||.|:.|||||.||||||||:||.||.|||||:||.||.||.|.|||:|
TTCTTCTGAATCAGTTTGGTCATCACGCCGCCCATGGTTTCGATACCCAAAGACAGAGGAGTTACGTCCAGTAGCAATA
|583240 |583250 |583260 |583270 |583280 |583290 |583300

|95860 |95850 |95840 |95830 |95820 |95810 |95800
CATCGGTAACTTCACCAGACAAGACCGCAGCCTGTATAGCGGCCCCTAAAGCGACTGCTTCATCAGGGTTAACAGCTTT
|.|||:|.:::.|.||.::|||:||.:|.:|.||:||:||:||:||||:.||.||:|||||.||||||||:||.:||||
CGTCGCTGCGGCCGCCGCTCAATACTTCGCCTTGGATCGCTGCGCCTACGGCAACGGCTTCGTCAGGGTTCACGTCTTT
|583320 |583330 |583340 |583350 |583360 |583370 |583380

|95780 |95770 |95760 |95750 |95740 |95730 |95720
TGATGCATCCTTACCGAATAATTTCTTTACAGTATCTGCAACCTTGGGCATCCTTGACATACCACCAACTAATAAAACA
::..|::||.||.|||||:||::..||:||.|.:|||:::||.||:|||||:|::|||:::||.||.||.||:|::||.
GCGCGGTTCTTTGCCGAAGAAGGCTTTAACGGCTTCTTGTACTTTCGGCATACGGGACTGCCCGCCGACCAAGATTACG
|583400 |583410 |583420 |583430 |583440 |583450 |583460

|95710 |95700 |95690 |95680 |95670 |95660 |95650 |95640
TCCGATATATCTGAGGCGGTAATTCTTGCGTCTTTCAGTGCTTTTTTGACAGGATCAACCGTTCTATCAATCAATGGGG
||::::||.||:::||.|:|:|::|.:||.|||||||.|||::|||||::|||:||.|.:|::|:.:.|||||.:::::
TCGTCGATGTCGCCGGTGCTCAAGCCGGCATCTTTCAATGCAATTTTGCAAGGTTCGATAGAGCGGGTAATCAGGTCTT
|583470 |583480 |583490 |583500 |583510 |583520 |583530 |583540

|95630 |95620 |95610 |95600 |95590 |95580 |95570 |95560
CGGTTATATTCTCAAGCTGAACCCTAGAAAAGGGCATACGAATATGCTTTGGGCCTGCAGCATCAGCAGTTATGAAAGG
|....|:..|.||.|..|:..|:|:.|:||::::|||::::|:.||.||.|||||:|.:||.||:...||:|||:|:||
CAACCAGGCTTTCGAATTTGGCGCGGGTAATTTTCATCGCCAAGTGTTTCGGGCCGGTTGCGTCCATGGTGATGTACGG
|583550 |583560 |583570 |583580 |583590 |583600 |583610 |583620

|95550 |95540 |95530 |95520 |95510 |95500 |95490 |95480
CAAGTTTATTTCTGTAGAGAGTGTAGAAGACAGTTCGATTTTAGCCTTTTCAGCGGCTTCTCTTATTCTTTGGACAGCC
||.|||:|||||:||::.::|::..::.||||.|||||||||.||.|||||.||.||||||.|.|::|:|||:|:||||
CAGGTTAATTTCGGTTTGCTGGCCGCTGGACAATTCGATTTTGGCTTTTTCGGCAGCTTCTTTCAGGCGTTGTAGAGCC
|583630 |583640 |583650 |583660 |583670 |583680 |583690 |583700

|95470 |95460 |95450 |95440 |95430 |95420 |95410 |95400
ATACGGTCATTACTCAAATCGATACCGGTTTCTTTCTTGAAATGAGAAATAATTTCTTGCAACAAATAAATGTCAAAAT
||:::|||:|::.|||||||.||.||:::||||||.|||||:|:.|:.||.||:|::|::|::|....:::|||.||.|
ATCACGTCTTGTTTCAAATCAATGCCTTGTTCTTTTTTGAACTCGGCGATGATGTGGTCGATGAGGCGTTGGTCGAAGT
|583710 |583720 |583730 |583740 |583750 |583760 |583770

|95390 |95380 |95370 |95360 |95350 |95340
CTTCGCCACCCAAATGGGTGTCACCATTGGTAGATTTAACCTCAAA------GATACC------GTTATCGATGTCCAG
||||.||.|||||.:.|||.||.||.|||||:|:.:::||.||.||      |:..||      |||.:||||:||:|:
CTTCACCGCCCAAGAAGGTATCGCCGTTGGTTGCCAATACTTCGAATTGTTTGTCGCCGTCGAGGTTGGCGATTTCGAT
|583790 |583800 |583810 |583820 |583830 |583840 |583850

|95320 |95310 |95300 |95290 |95280 |95270 |95260
GATTGAAATATCGAAAGTACCACCGCCCAAGTCGAAAACAGCAAT------GACTTTTGGCTCTGATTTATCTAGACCG
|||:|||||||||||||||||.|||||||||||.:|:||.||:|.      |:||||::::||:::|||.||.|:||||
GATGGAAATATCGAAAGTACCGCCGCCCAAGTCATATACGGCTACTTTGCGGTCTTTGTTGTCGCCTTTGTCCATACCG
|583870 |583880 |583890 |583900 |583910 |583920 |583930

|95250 |95240 |95230 |95220 |95210 |95200 |95190
TAAGCTAGGGCAGCAGCTGTTGGTTCGTTGACAACACGTAATACATTAAGCCCAATAATTTGTCCTGCGTCTTTAGTAG
:|:||.|..||.||:||:||.||.|||||||..|..|||::.||.|.:|.:||....||:.|:|||.|||||||.||.|
AATGCCAAAGCGGCTGCGGTCGGCTCGTTGATGATGCGTTTCACGTCCAAACCGGCGATACGGCCTACGTCTTTGGTGG
|583950 |583960 |583970 |583980 |583990 |584000 |584010

|95170 |95160 |95150 |95140 |95130 |95120 |95110
CTTGTCTTTGGGCATCATTGAAGTAAGCAGGAACGGTGACAACAGCATTTTTGACGCTCTTCGCTAAGTAAGCCTCCGC
||||:|:||||:..||.||||||||.|||||.|||||.|.:||.||:|.::|:||:.|.|.::|.||||||||.||:||
CTTGACGTTGGCTGTCGTTGAAGTAGGCAGGGACGGTAATCACGGCTTCGGTTACTTTTTCGCCCAAGTAAGCTTCGGC
|584030 |584040 |584050 |584060 |584070 |584080 |584090

|95090 |95080 |95070 |95060 |95050 |95040 |95030
TGTTTCCTTCATTTTATTTAAGATAAAACCTCCTATTTGGGCGGGGGAGTACGTTCTGTTTCTAGCCTCTACCCAGGCA
:|.|||.|||||||||.:.|.||.:::::|::::|||||.|:.||.||::.|:.|.||..|.::||.|.||||||:||.
GGCTTCTTTCATTTTACGCAGGACTTCTGCGGAAATTTGAGGAGGAGACAGCTCTTTGCCTTGTGCTTTTACCCATGCG
|584110 |584120 |584130 |584140 |584150 |584160 |584170

|95010 |95000 |94990 |94980 |94970 |94960 |94950
TCTCCATTAGAATGCTTGACGATTTTGAAAGGAACCTGATTAATATCTCTTTGGACTTCAGCGTCCTCGAAACGGCGGC
||:||.||.::.::.||||.|||||.||||||:|.::.:|..||.||:|:|||||||||::.|||.||.||:.:|.|||
TCGCCGTTGTTGGCTTTGATGATTTCGAAAGGCATAGATTCGATGTCGCGTTGGACTTCTTTGTCTTCAAATTTGTGGC
|584190 |584200 |584210 |584220 |584230 |584240 |584250

|94930 |94920 |94910 |94900 |94890 |94880 |94870
CGATTAAACGCTTAGTAGCAAACAAAGTGTTTTCTGAGTTTATGACGGATTGTCGTTTGGCTGGCTCACCAACTAAACG
||||.|||||.||.|..||.:|:|:||||||||.:|:|||:.|:||:|:|||:||||||||:|||:||||.||:|..::
CGATCAAACGTTTGGCGGCGTAAATAGTGTTTTTGGCGTTGGTTACCGCTTGGCGTTTGGCAGGCGCACCGACGAGGAT
|584260 |584270 |584280 |584290 |584300 |584310 |584320 |584330

|94850 |94840 |94830 |94820 |94810 |94800 |94790
TTCTCCGTCTTTAGTGAAAGCCACTACAGACGGAGTAGTTCTTGAGCCTTCTGCATTTTCGATAATTCTCGGAACTTTA
|||:|||.|:|.:.:.:||||:|.:||.|||||:||.||:|:||:|||||||||.||||||||:|.|.|:|:::::...
TTCGCCGCCGTCCAAATAAGCGATAACGGACGGCGTGGTGCGTGCGCCTTCTGCGTTTTCGATCACTTTGGTTTGACCG
|584340 |584350 |584360 |584370 |584380 |584390 |584400 |584410

|94780 |94770 |94760 |94750 |94740
CCTTCCATAATAGCTACCGCAGAATTGGTAGTACCTAAATCAATACCGATAAC
..|||:.:|||.||.|::::|||.|||||:||||||||.||.||||||||:||
TTTTCGGAAATGGCCAAACAAGAGTTGGTTGTACCTAAGTCGATACCGATTAC
|584420 |584430 |584440 |584450 |584460