Biological definitions for related sequences
![Page 1: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/1.jpg)
Biological definitions for related sequences
• Homologues are similar sequences in two different organisms that have been derived from a common ancestor sequence. Homologues can be described as either orthologues or paralogues.
• Orthologues are similar sequences in two different organisms that have arisen due to a speciation event. Orthologues typically retain identical or similar functionality throughout evolution.
• Paralogues are similar sequences within a single organism that have arisen due to a gene duplication event.
• Xenologues are similar sequences that do not share the same evolutionary origin, but rather have arisen out of horizontal transfer events through symbiosis, viruses, etc.
![Page 2: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/2.jpg)
So this means …
Source: http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html
![Page 3: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/3.jpg)
Multiple sequence alignment
Sequences can be conserved across species and perform similar or identical functions.
• Alignments hold information about which regions have high mutation rates over evolutionary time and which are evolutionarily conserved.
• This allows identification of regions or domains that are critical to functionality.
Sequences can be mutated or rearranged to perform an altered function.
• Alignments help show which changes in the sequence have caused a change in the functionality.
Multiple sequence alignment: the idea is to take three or more sequences and align them so that the greatest number of similar characters are aligned in the same column of the alignment.
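To make the idea of conserved versus variable columns concrete, here is a minimal Python sketch. The toy alignment and the function name `column_conservation` are illustrative assumptions, not part of the slides; the measure used (fraction of rows sharing the most common residue) is only one of many possible conservation scores.

```python
from collections import Counter

def column_conservation(alignment):
    """Per column: fraction of rows that carry the most common residue.

    Gaps ('-') never count as the most common residue, but they still
    count in the denominator, so gappy columns score low.
    """
    length = len(alignment[0])
    assert all(len(row) == length for row in alignment), "rows must have equal length"
    scores = []
    for col in zip(*alignment):
        residues = [c for c in col if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(col))
    return scores

# Hypothetical toy alignment of three sequences (not from the slides).
toy_alignment = [
    "ACDW-Y",
    "ACEWHY",
    "ACDWKY",
]
conservation = column_conservation(toy_alignment)
for position, score in enumerate(conservation, start=1):
    print(position, round(score, 2))
```

Columns that score close to 1 are candidates for functionally critical positions; low-scoring columns mark regions that tolerate change.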
![Page 4: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/4.jpg)
What to ask yourself
How do we get a multiple alignment? (three or more sequences)
What is our main aim?
– Do we go for max accuracy, least computational time or the best compromise?
What do we want to achieve each time?
![Page 5: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/5.jpg)
Sequence-sequence alignment
[Figure: alignment matrix; axes: sequence, sequence]
![Page 6: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/6.jpg)
Multiple alignment methods
• Multi-dimensional dynamic programming
• Progressive alignment
• Iterative alignment
![Page 7: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/7.jpg)
Simultaneous multiple alignment
Multi-dimensional dynamic programming
The combinatorial explosion:
• 2 sequences of length n
– n^2 comparisons
• Comparison number increases exponentially
– i.e. n^N, where n is the length of the sequences and N is the number of sequences
• Quickly becomes impractical, even for a small number of short sequences
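A few lines of Python show how fast n^N grows. The function name `dp_cells` and the chosen length of 250 residues are illustrative assumptions; the count follows the slide's n^N approximation rather than the exact (n+1)^N cells of a full dynamic programming lattice.

```python
def dp_cells(n, num_seqs):
    """Approximate number of comparisons for num_seqs sequences of length n."""
    return n ** num_seqs

for num_seqs in (2, 3, 5, 8):
    print(f"N={num_seqs}: {dp_cells(250, num_seqs):.3e} comparisons")
```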
![Page 8: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/8.jpg)
Multi-dimensional dynamic programming (Murata et al, 1985)
[Figure: three-dimensional alignment matrix; axes: Sequence 1, Sequence 2, Sequence 3]
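The three-sequence case can be sketched directly as a dynamic programme over a three-dimensional lattice. The following is a minimal illustration, not the method of Murata et al.: it assumes a sum-of-pairs column score with toy match/mismatch/gap values and a linear gap penalty, and it returns only the optimal score (no traceback). All names are hypothetical.

```python
from itertools import product

def sp_column(col, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of one alignment column of three symbols."""
    score = 0
    for x in range(3):
        for y in range(x + 1, 3):
            a, b = col[x], col[y]
            if a == "-" and b == "-":
                continue              # gap against gap costs nothing
            if a == "-" or b == "-":
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score

def align3_score(s1, s2, s3):
    """Optimal sum-of-pairs score for three sequences via 3-D dynamic programming."""
    seqs = (s1, s2, s3)
    lengths = tuple(len(s) for s in seqs)
    # The 7 possible moves: each sequence either consumes a residue (1) or gets a gap (0).
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    best = {(0, 0, 0): 0}
    for i in range(lengths[0] + 1):
        for j in range(lengths[1] + 1):
            for k in range(lengths[2] + 1):
                if (i, j, k) == (0, 0, 0):
                    continue
                here = (i, j, k)
                candidates = []
                for m in moves:
                    prev = (i - m[0], j - m[1], k - m[2])
                    if min(prev) < 0:
                        continue
                    col = tuple(seqs[x][here[x] - 1] if m[x] else "-" for x in range(3))
                    candidates.append(best[prev] + sp_column(col))
                best[here] = max(candidates)
    return best[lengths]

print(align3_score("ACDW", "ACW", "ADW"))
```

Every cell looks at 7 predecessors, and the number of cells is the product of the sequence lengths, which is why this approach stops scaling after a handful of sequences.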
![Page 9: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/9.jpg)
The MSA approach
MSA (Lipman et al., 1989, PNAS 86, 4412)
• Calculate all pair-wise alignment scores.
• Use the scores to predict a tree.
• Calculate pair weights based on the tree.
• Produce a heuristic alignment based on the tree.
• Calculate the maximum weight for each sequence pair.
• Determine the spatial positions that must be calculated to obtain the optimal alignment.
• Perform the optimal alignment.
• Report the weight found compared to the maximum weight previously found.
• Extremely slow and memory intensive
• Max 8-9 sequences of ~250 residues
![Page 10: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/10.jpg)
The DCA approach
DCA (Stoye et al 1997)
• Iteratively split at optimal cut points
• Use MSA
• Concatenate
![Page 11: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/11.jpg)
So in effect …
[Figure: three-dimensional alignment matrix; axes: Sequence 1, Sequence 2, Sequence 3]
![Page 12: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/12.jpg)
Multiple alignment methods
• Multi-dimensional dynamic programming
• Progressive alignment
• Iterative alignment
![Page 13: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/13.jpg)
Multiple alignment profiles
Gribskov et al. 1987
[Figure: profile with one column per alignment position i; rows A, C, D, …, W, Y and gap (-); each column holds the frequencies fA, fC, fD, …, fW, fY plus position-dependent gap penalties (Gapo, gapx); two core regions flank a gapped region]
![Page 14: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/14.jpg)
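Profile building
Example: each amino acid is represented as a frequency, penalties as weights.
[Figure: profile with rows A, C, D, …, W, Y and one column per alignment position i; three example columns shown]
• Column 1: A 0.3, C 0.1, D 0, W 0.3, Y 0.3
• Column 2: A 0.5, C 0, D 0, W 0, Y 0.5
• Column 3: A 0, C 0.5, D 0.2, W 0.1, Y 0.2
• Position-dependent gap penalties: 0.5, 1.0, 1.0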
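Building the frequency part of such a profile from an existing alignment takes only a few lines. This is a minimal sketch under stated assumptions: the five-letter alphabet mirrors the rows shown on the slide, the toy alignment and the name `build_profile` are hypothetical, and the position-dependent gap penalties are left out because the slides do not say how they are derived.

```python
from collections import Counter

def build_profile(alignment, alphabet="ACDWY"):
    """Return one dict per alignment column mapping residue (and '-') to its frequency."""
    n_rows = len(alignment)
    profile = []
    for col in zip(*alignment):
        counts = Counter(col)
        column = {aa: counts.get(aa, 0) / n_rows for aa in alphabet}
        column["-"] = counts.get("-", 0) / n_rows   # gap row of the profile
        profile.append(column)
    return profile

# Hypothetical toy alignment over the reduced alphabet A, C, D, W, Y.
toy = ["ACW", "AYW", "A-W", "ADY"]
profile = build_profile(toy)
for position, column in enumerate(profile, start=1):
    print(position, column)
```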
![Page 15: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/15.jpg)
Profile-sequence alignment
[Figure: alignment matrix; axes: sequence, profile (rows A, C, D, …, V, W, Y)]
![Page 16: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/16.jpg)
Profile-profile alignment
[Figure: alignment matrix; axes: profile (rows A, C, D, …, Y), profile (rows A, C, D, …, V, W, Y)]
![Page 17: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/17.jpg)
Scoring profiles
• Think of sequence-sequence alignment
• Same principles but more information for each position
Reminder:
• The sequence pair alignment score S comes from the sum of the positional scores M(aa_i, aa_j) (i.e. the substitution matrix values at each alignment position minus penalties if applicable)
• Profile alignment scores are exactly the same, but the positional scores are more complex
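As a reminder of the sequence-pair case, the score of a given pairwise alignment can be computed column by column. The sketch below assumes a toy substitution matrix (2 for identity, -1 otherwise) and a flat per-column gap penalty; real matrices such as PAM or BLOSUM and affine gap penalties are not modelled, and the names are hypothetical.

```python
def alignment_score(row_a, row_b, matrix, gap_penalty=2):
    """Sum substitution values over aligned columns, subtracting a penalty per gap column."""
    assert len(row_a) == len(row_b), "aligned rows must have equal length"
    score = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            score -= gap_penalty
        else:
            score += matrix[(a, b)]
    return score

# Toy substitution matrix over a reduced alphabet (an assumption for illustration).
toy_matrix = {(a, b): (2 if a == b else -1) for a in "ACDWY" for b in "ACDWY"}
s = alignment_score("ACD-Y", "ACWWY", toy_matrix)
print(s)
```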
![Page 18: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/18.jpg)
Scoring a profile position
At each position (column) we have different residue frequencies for each amino acid (rows).
[Figure: one column of Profile 1 and one column of Profile 2, each with rows A, C, D, …, Y]
So instead of saying S = M(aa_1, aa_2) (one residue pair), for every frequency f > 0 (the amino acid is actually there) we take:

$$S = \sum_{i}^{20} \sum_{j}^{20} f_{aa_i}\, f_{aa_j}\, M(aa_i, aa_j)$$
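The frequency-weighted double sum over residue pairs can be sketched as follows. The toy substitution matrix and the two example columns are assumptions made for illustration; columns are plain dicts mapping a residue to its frequency, and residues with frequency 0 are skipped, as the slide suggests.

```python
def profile_column_score(col1, col2, matrix):
    """Score two profile columns: sum over all residue pairs of f_i * f_j * M(aa_i, aa_j)."""
    score = 0.0
    for aa_i, f_i in col1.items():
        if f_i == 0:
            continue                  # residue not present in column 1
        for aa_j, f_j in col2.items():
            if f_j == 0:
                continue              # residue not present in column 2
            score += f_i * f_j * matrix[(aa_i, aa_j)]
    return score

# Toy substitution matrix over a reduced alphabet (an assumption for illustration).
toy_matrix = {(a, b): (2 if a == b else -1) for a in "ACDWY" for b in "ACDWY"}
col1 = {"A": 0.5, "C": 0.5}
col2 = {"A": 1.0}
score = profile_column_score(col1, col2, toy_matrix)
print(score)
```

When both columns contain a single residue with frequency 1, the sum collapses to the plain substitution matrix value, so sequence-sequence scoring is a special case.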
![Page 19: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/19.jpg)
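Log-average scoring
Remember the substitution matrix formula?

$$M(aa_i, aa_j) = \log \frac{p_{aa_i aa_j}}{p_{aa_i}\, p_{aa_j}}$$

Here p_{aa_i aa_j} is the probability of seeing the two residues aligned and p_{aa_i}, p_{aa_j} are their background frequencies.
In log-average scoring (von Ohsen et al, 2003):

$$S = \log \sum_{i}^{20} \sum_{j}^{20} f_{aa_i}\, f_{aa_j}\, \frac{p_{aa_i aa_j}}{p_{aa_i}\, p_{aa_j}}$$

Why is this so important? Think about it…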
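A minimal Python sketch of log-average scoring, assuming the usual form S = log Σ_i Σ_j f_i f_j p_ij / (p_i p_j). The two-letter alphabet and the probability values are invented for illustration; only the structure of the calculation is the point.

```python
import math

def log_average_score(col1, col2, joint, background):
    """Log of the frequency-weighted average odds ratio p_ij / (p_i * p_j)."""
    total = 0.0
    for aa_i, f_i in col1.items():
        for aa_j, f_j in col2.items():
            odds = joint[(aa_i, aa_j)] / (background[aa_i] * background[aa_j])
            total += f_i * f_j * odds
    return math.log(total)

# Hypothetical two-letter alphabet; the joint probabilities sum to 1 and
# their marginals match the background frequencies.
background = {"A": 0.5, "C": 0.5}
joint = {("A", "A"): 0.4, ("A", "C"): 0.1, ("C", "A"): 0.1, ("C", "C"): 0.4}

single = log_average_score({"A": 1.0}, {"A": 1.0}, joint, background)
mixed = log_average_score({"A": 0.5, "C": 0.5}, {"A": 1.0}, joint, background)
print(single, mixed)
```

With one residue per column the result equals the log-odds substitution score for that pair. With mixed columns the averaging happens on the odds ratios, before the logarithm is taken, rather than on the log-odds values themselves.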
![Page 20: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/20.jpg)
Progressive alignment
1) Perform pair-wise alignments of all of the sequences
2) Use the alignment scores to produce a dendrogram using neighbour-joining methods
3) Align the sequences sequentially, guided by the relationships indicated by the tree

• Biopat (first method ever)
• MULTAL (Taylor 1987)
• DIALIGN (1&2, Morgenstern 1996)
• PRRP (Gotoh 1996)
• ClustalW (Thompson et al 1994)
• Praline (Heringa 1999)
• T Coffee (Notredame 2000)
• POA (Lee 2002)
• MUSCLE (Edgar 2004)
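Step 1 of the progressive scheme, scoring every pair of sequences, might look like the sketch below. It assumes a simple Needleman-Wunsch global alignment with toy match/mismatch values and a linear gap penalty; none of the programs listed above is claimed to use exactly these settings, and the function names are hypothetical.

```python
from itertools import combinations

def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score with a linear gap penalty, keeping only two rows in memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def all_pairwise_scores(seqs):
    """Score every unordered pair; keys are index pairs (i, j) with i < j."""
    return {
        (i, j): global_alignment_score(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }

seqs = ["ACDWY", "ACDWY", "ACWY"]   # toy sequences, not from the slides
scores = all_pairwise_scores(seqs)
print(scores)
```

For N sequences this produces N(N-1)/2 scores, which fill the similarity matrix used to build the tree in step 2.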
![Page 21: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/21.jpg)
Progressive multiple alignment
[Figure: pairwise alignments of sequences 1-5 (Score 1-2, Score 1-3, …, Score 4-5) → scores → 5×5 similarity matrix → scores to distances → guide tree → multiple alignment, with iteration possibilities]
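The path from similarity matrix to guide tree can be sketched as below. Two assumptions are made loudly: distances are taken as (highest similarity minus similarity), and the tree is built with simple average-linkage clustering (UPGMA-style) as a stand-in, not the neighbour-joining method named earlier in the slides. The 5×5 matrix is invented for illustration.

```python
def scores_to_distances(sim):
    """Turn a symmetric similarity matrix into distances: d = max similarity - similarity."""
    top = max(max(row) for row in sim)
    n = len(sim)
    return [[0.0 if i == j else float(top - sim[i][j]) for j in range(n)] for i in range(n)]

def guide_tree(dist, labels):
    """Average-linkage clustering; returns the tree as nested tuples of labels."""
    clusters = {i: (labels[i], [i]) for i in range(len(labels))}  # id -> (subtree, members)
    next_id = len(labels)

    def average_distance(a, b):
        members_a, members_b = clusters[a][1], clusters[b][1]
        total = sum(dist[x][y] for x in members_a for y in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        ids = sorted(clusters)
        pairs = [(x, y) for i, x in enumerate(ids) for y in ids[i + 1:]]
        a, b = min(pairs, key=lambda p: average_distance(*p))
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        del clusters[a], clusters[b]
        clusters[next_id] = merged
        next_id += 1
    return next(iter(clusters.values()))[0]

# Hypothetical 5x5 similarity matrix for sequences 1-5.
sim = [
    [10, 5, 9, 1, 5],
    [5, 10, 5, 1, 8],
    [9, 5, 10, 1, 5],
    [1, 1, 1, 10, 1],
    [5, 8, 5, 1, 10],
]
tree = guide_tree(scores_to_distances(sim), labels=[1, 2, 3, 4, 5])
print(tree)
```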
![Page 22: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/22.jpg)
General progressive multiple alignment technique (follow generated tree)
[Figure: guide tree (root, distance d) over sequences 1-5; blocks 1+3 and 2+5 are aligned, the two blocks are merged, and sequence 4 is added]
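Following the tree means aligning the children of each internal node before the node itself. A small Python sketch of that traversal is given below; the nested-tuple tree is an assumption read off the sequence labels in the figure, and `merge_order` is a hypothetical helper that lists the alignment steps without performing any alignment.

```python
def merge_order(tree):
    """List the (left block, right block) merges in the order a progressive aligner would do them."""
    steps = []

    def visit(node):
        if not isinstance(node, tuple):
            return [node]             # a leaf is a single sequence
        left = visit(node[0])
        right = visit(node[1])
        steps.append((left, right))   # children first, then the node itself
        return left + right

    visit(tree)
    return steps

tree = (((1, 3), (2, 5)), 4)          # assumed guide tree over sequences 1-5
steps = merge_order(tree)
for left, right in steps:
    print("align", left, "with", right)
```

The first merges are sequence-sequence alignments; later ones are profile-profile or profile-sequence alignments of the blocks built so far.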
![Page 23: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/23.jpg)
PRALINE progressive strategy
[Figure: PRALINE progressive alignment of sequences 1-5; distance d]
![Page 24: Biological definitions for r elated sequences](https://reader035.vdocuments.us/reader035/viewer/2022062519/56814d75550346895dbad21a/html5/thumbnails/24.jpg)
There are problems:
• Accuracy is very important!
• Errors are propagated into the progressive steps
• "Once a gap, always a gap" (Feng & Doolittle, 1987)