Eidhammer et al. Protein Bioinformatics Chapter 4
Multiple Global Sequence Alignment and
Phylogenetic trees
Inge Jonassen
and
Ingvar Eidhammer
Definition
• A global alignment of a set of sequences is obtained by
  – inserting gap characters ‘-’ into each sequence
• so that
  – the resulting sequences all have the same length
• and so that
  – no “column” consists only of gap characters
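The three conditions in the definition can be checked mechanically. The following is a minimal sketch, assuming the gap character is `-` and that an alignment is given as a list of gapped strings; the function name `is_global_alignment` is hypothetical.

```python
GAP = "-"  # assumed gap character


def is_global_alignment(rows, sequences):
    """Return True if `rows` is a valid global alignment of `sequences`."""
    if not rows or len(rows) != len(sequences):
        return False
    # Removing the gaps from each row must give back the original sequence.
    if any(row.replace(GAP, "") != seq for row, seq in zip(rows, sequences)):
        return False
    # All gapped rows must have the same length.
    if len({len(row) for row in rows}) != 1:
        return False
    # No column may consist of gap characters only.
    return all(any(ch != GAP for ch in column) for column in zip(*rows))


print(is_global_alignment(["AR-L", "ARTL"], ["ARL", "ARTL"]))
```

The all-gap column test is what rules out padding every row with the same extra gap.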
Example: Chromo domains aligned
Use of alignments
• High sequence similarity usually implies significant structural and/or functional similarity. The reverse need not be true.
• Homologous proteins (proteins with a common ancestor) can vary significantly over large parts of their sequences and still retain common 2D patterns, 3D patterns, or a common active site or binding site.
• Comparing several sequences in a family can reveal what is common to the family (from Lesk: two homologous sequences whisper ... a full multiple alignment shouts out loud). A feature shared by several sequences can be significant when all of the sequences are considered, but need not be when only two are considered.
• Multiple alignments can be used to derive evolutionary history.
Use of alignments
• Predict features of aligned objects
  – conserved positions
    • structurally/functionally important
Conserved positions
Use of alignments
• Predict features of aligned objects
  – conserved positions
    • structurally/functionally important
  – patterns of hydrophobicity/hydrophilicity
    • secondary structure elements
Helix pattern
Use of alignments
• Predict features of aligned objects
  – conserved positions
    • structurally/functionally important
  – patterns of hydrophobicity/hydrophilicity
    • secondary structure elements
  – “gappy” regions
    • loops/variable regions
Loop? Loop? Loop?
Use of alignments
• Predict features of aligned objects
  – conserved positions
    • structurally/functionally important
  – patterns of hydrophobicity/hydrophilicity
    • secondary structure elements
  – “gappy” regions
    • loops/variable regions
  – covariation
    • structural proximity
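Two of the features listed above, conserved positions and “gappy” regions, can be read off an alignment column by column. The sketch below is only an illustration: the function name `column_summary` and the definition of conservation (share of all rows carrying the most common residue) are assumptions, and the six short rows are a toy alignment.

```python
from collections import Counter


def column_summary(rows, gap="-"):
    """For each column: (most common residue, conservation, gap fraction)."""
    summary = []
    for column in zip(*rows):
        residues = [ch for ch in column if ch != gap]
        gap_fraction = 1 - len(residues) / len(column)
        if residues:
            residue, count = Counter(residues).most_common(1)[0]
            conservation = count / len(column)  # share of all rows, gaps included
        else:
            residue, conservation = gap, 0.0
        summary.append((residue, conservation, gap_fraction))
    return summary


toy_alignment = ["AR-L", "ARTL", "ARSI", "ARSL", "AWTL", "AWT-"]
for position, entry in enumerate(column_summary(toy_alignment), start=1):
    print(position, entry)
```

A fully conserved column (the leading A here) gets conservation 1.0, while columns with a high gap fraction are candidates for loops or variable regions.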
Use of alignments: make patterns/profiles
• Can make a profile or a pattern that can be used to match against a sequence database and identify new family members
• Profiles/patterns can be used to predict family membership of new sequences
• Databases of profiles/patterns
  – PROSITE
  – PFAM
  – PRINTS
  – ...
Prosite: Motifs for classification
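[Figure: a protein sequence is matched against Prosite pattern 1, Prosite pattern 2, ..., Prosite pattern n, which assign it to Family 1, Family 2, ..., Family n. A Prosite entry is either a pattern (regular expression) or a profile.]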
Pattern from alignment
[FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-[ST]-W-[ES]-[PSTDN]-x(3)-[LIVMC]
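A pattern written in this notation maps directly onto a regular expression. The sketch below handles only the subset of the PROSITE syntax used in this pattern plus `{...}` exclusions; the function name `prosite_to_regex` and the test sequence are made up for illustration.

```python
import re

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":                      # x = any residue
            core = "."
        elif core.startswith("{"):           # {ABC} = any residue except A, B, C
            core = "[^" + core[1:-1] + "]"
        if low is not None:                  # (n) or (n,m) = repetition
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


PATTERN = ("[FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-"
           "[ST]-W-[ES]-[PSTDN]-x(3)-[LIVMC]")
regex = prosite_to_regex(PATTERN)
hypothetical_sequence = "MKFALKWAGFAAAAASWEPAAALQ"  # invented, not a real protein
print(regex)
print(bool(re.search(regex, hypothetical_sequence)))
```

Scanning a sequence database then amounts to running `re.search` with the translated expression over every entry.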
Alignment problem
Given a set of sequences, produce a multiple alignment which corresponds as
well as possible to the biological relationships between the corresponding
bio-molecules
For homologous proteins
• Two residues should be aligned (on top of each other)
  – if they are homologous (evolved from the same residue in a common ancestor protein)
  – if they are structurally equivalent
Automatic approach
• Need a way of scoring alignments
  – a fitness function that quantifies the “goodness” of an alignment
• Need an algorithm for finding alignments with good scores
• Not all methods provide a scoring function for the final alignment!
Analysis of fitness function
• One can test whether the alignments that are optimal under a given fitness function correspond well to the biological relationships between the sequences
• For example, when the structures of (some of) the proteins are known.
Align by use of dynamic programming
• Dynamic programming finds the best alignment of k sequences under a given scoring scheme
• For two sequences there are three different column types
• For three sequences there are seven different column types (x means an amino acid, - a blank):
    Sequence1  x - x x - - x
    Sequence2  x x - x - x -
    Sequence3  x x x - x - x
• Time complexity of O(n^k) for k sequences of length n
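The column counts follow a simple rule: each of the k rows holds either a residue or a blank, and the all-blank column is excluded, which leaves 2^k − 1 column types. A small sketch (the function names are hypothetical) enumerates them and also shows how fast the dynamic programming table grows:

```python
from itertools import product


def column_types(k):
    """All residue/blank patterns of one column for k sequences ('x' or '-')."""
    return [p for p in product("x-", repeat=k) if "x" in p]


def table_cells(n, k):
    """Number of cells in the dynamic programming table for k sequences of length n."""
    return (n + 1) ** k


for k in (2, 3, 4):
    print(k, len(column_types(k)), table_cells(100, k))
```

Each table cell is computed from up to 2^k − 1 neighbouring cells, one per column type, so exact alignment is practical only for a handful of sequences.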
Use of dynamic programming
• Dynamic programming finds best alignment of k sequences given scoring scheme
Algorithm for dynamic programming
Connection between alignment and evolutionary tree
Consider a set of sequences ARL, ARTL, ARSI, ARSL, AWTL, AWT
Alignment
AR-L
ARTL
ARSI
ARSL
AWTL
AWT-
Possible tree
Use the tree to calculate alignment
[Figure: the alignment built up along the tree. First the pairs AWTL/AWT-, ARTL/AR-L and ARSL/ARSI; then the group ARTL/AR-L/ARSL/ARSI; finally all six sequences.]
Phylogenetic studies
The purposes of phylogenetic studies of related objects are
• to reconstruct the correct genealogical ties between them (the topology); and
• to estimate the time of divergence between them since they last shared a common ancestor (the lengths of the edges in the tree).
In phylogenetic studies, the objects are often referred to as operational taxonomic units (OTUs). In our case the objects are protein or nucleic acid sequences. We will refer to the set of sequences we start with as the original sequences.
Phylogenetic studies
Example
Number of different tree topologies
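The content of this slide was not recovered. As a hedged illustration, the standard counts for fully resolved (binary) trees with n labelled leaves are (2n − 5)!! unrooted and (2n − 3)!! rooted topologies; the function names below are hypothetical.

```python
def odd_double_factorial(m):
    """Product 1 * 3 * 5 * ... * m for odd m (1 when m <= 1)."""
    result = 1
    for factor in range(3, m + 1, 2):
        result *= factor
    return result


def unrooted_topologies(n):
    """Number of unrooted binary tree topologies on n >= 3 labelled leaves."""
    return odd_double_factorial(2 * n - 5)


def rooted_topologies(n):
    """Number of rooted binary tree topologies on n >= 2 labelled leaves."""
    return odd_double_factorial(2 * n - 3)


for n in (3, 4, 5, 10):
    print(n, unrooted_topologies(n), rooted_topologies(n))
```

The rapid growth is the reason exhaustive search over topologies is feasible only for small sets of sequences.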
Additive tree
Additive and ultrametric
Lemma 1. It is possible to construct an additive tree from the distances between the sequences (metric space) if and only if any four of them can be labelled i, j, k, l such that
$D_{i,j} + D_{k,l} = D_{i,k} + D_{j,l} \geq D_{i,l} + D_{j,k}$
Lemma 2. It is possible to construct an ultrametric tree from the distances between the sequences (metric space) if and only if for every i, j, k
$D_{i,j} \leq \max(D_{i,k}, D_{k,j})$
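Both lemmas give conditions that can be tested directly on a distance matrix. The sketch below assumes a symmetric matrix given as a list of lists; the function names and the tolerance `tol` are illustrative choices. For Lemma 1 it uses the equivalent formulation that, of the three pairwise sums for any four sequences, the two largest must be equal.

```python
from itertools import combinations, permutations


def is_additive(D, tol=1e-9):
    """Four-point condition (Lemma 1): the two largest of the three sums are equal."""
    for i, j, k, l in combinations(range(len(D)), 4):
        sums = sorted([D[i][j] + D[k][l], D[i][k] + D[j][l], D[i][l] + D[j][k]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


def is_ultrametric(D, tol=1e-9):
    """Three-point condition (Lemma 2): D[i][j] <= max(D[i][k], D[k][j])."""
    for i, j, k in permutations(range(len(D)), 3):
        if D[i][j] > max(D[i][k], D[k][j]) + tol:
            return False
    return True


# Distances read off a small tree with unequal branch lengths (made-up numbers).
tree_like = [[0, 3, 8, 9],
             [3, 0, 9, 10],
             [8, 9, 0, 9],
             [9, 10, 9, 0]]
print(is_additive(tree_like), is_ultrametric(tree_like))
```

In this example the matrix is additive but not ultrametric, which is the situation when evolutionary rates differ between branches.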
Maximum parsimony
Pairwise grouping
An example
Neighbour joining
Bootstrapping
General progressive alignment
Algorithm 4.3. General progressive alignment.
Progressive alignment of the sequences {s1, s2, ..., sm}
var
  C   current set of alignments
begin
  C := ∅
  for i := 1 to m do C := C union {{si}} end      one alignment of each sequence
  for i := 1 to m − 1 do
    choose two alignments Ap, Aq from C
    C := C − {Ap, Aq}
    Ar := align(Ap, Aq)
    C := C union {Ar}
  end
  C now contains the (single) final alignment
end
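A runnable sketch of Algorithm 4.3 follows. Several parts are assumptions made only to obtain working code: `align` is a plain dynamic programming alignment of two alignments with a sum-of-pairs column score, the match/mismatch/gap values are arbitrary, and "choose two alignments" is simplified to "take the first two" rather than following a guide tree.

```python
def column_score(col_a, col_b, match=1, mismatch=-1, gap=-1):
    """Sum-of-pairs score between two alignment columns (tuples of characters)."""
    total = 0
    for a in col_a:
        for b in col_b:
            if a == "-" and b == "-":
                continue                      # blank against blank is free here
            if a == "-" or b == "-":
                total += gap
            else:
                total += match if a == b else mismatch
    return total


def align(A, B):
    """Align two alignments (lists of equal-length rows), keeping their columns intact."""
    cols_a, cols_b = list(zip(*A)), list(zip(*B))
    gap_a, gap_b = ("-",) * len(A), ("-",) * len(B)
    n, m = len(cols_a), len(cols_b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + column_score(cols_a[i - 1], gap_b)
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + column_score(gap_a, cols_b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]),
                S[i - 1][j] + column_score(cols_a[i - 1], gap_b),
                S[i][j - 1] + column_score(gap_a, cols_b[j - 1]),
            )
    # Trace back from the bottom-right cell to recover the merged columns.
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]):
            merged.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + column_score(cols_a[i - 1], gap_b):
            merged.append(cols_a[i - 1] + gap_b)
            i -= 1
        else:
            merged.append(gap_a + cols_b[j - 1])
            j -= 1
    merged.reverse()
    return ["".join(row) for row in zip(*merged)]


def progressive_alignment(sequences):
    """Algorithm 4.3 with the simplest possible choice of Ap and Aq."""
    C = [[s] for s in sequences]              # one alignment of each sequence
    while len(C) > 1:
        C = [align(C[0], C[1])] + C[2:]       # replace Ap, Aq by Ar
    return C[0]


for row in progressive_alignment(["ARL", "ARTL", "ARSI", "ARSL", "AWTL", "AWT"]):
    print(row)
```

Because earlier alignments are never reopened, the order in which Ap and Aq are chosen matters; this is where a guide tree comes in.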
Clustering philosophy
Join the two groups with highest pairwise score.
1. Average scoring method: find the average score over all pairs in the two groups
2. Maximum scoring method: find the maximum score over all pairs in the two groups (needs only one high-scoring pair)
3. Minimum (complete) scoring method: find the minimum score over all pairs (all pairs are taken into account)
4. Special scoring method
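The first three scoring methods differ only in how the pairwise scores between two groups are combined. A minimal sketch, assuming a symmetric matrix `score` of pairwise similarity scores indexed by sequence number (the function name and the numbers are made up):

```python
def group_score(group_a, group_b, score, method="average"):
    """Score between two groups of sequences from their pairwise scores."""
    pair_scores = [score[a][b] for a in group_a for b in group_b]
    if method == "average":
        return sum(pair_scores) / len(pair_scores)
    if method == "maximum":
        return max(pair_scores)          # one high-scoring pair is enough
    if method == "minimum":
        return min(pair_scores)          # every pair must score well
    raise ValueError(f"unknown method: {method!r}")


# Hypothetical pairwise similarity scores for four sequences.
score = [[0, 9, 2, 4],
         [9, 0, 6, 8],
         [2, 6, 0, 7],
         [4, 8, 7, 0]]
for method in ("average", "maximum", "minimum"):
    print(method, group_score([0, 1], [2, 3], score, method))
```

Clustering then repeatedly joins the two groups for which this score is highest.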
The Clustal Algorithm
• Three steps:
  1 Compare all pairs of sequences to obtain a similarity matrix
  2 Based on the similarity matrix, make a guide tree relating all the sequences
  3 Perform progressive alignment where the order of the alignments is determined by the guide tree
(A) 1 pairwise comparison, 2 clustering/making tree
(B) 3 align according to tree
ClustalW - Score of aligning two alignment columns
• sum the score matrix entry for all pairs of residues
• weight each pair by the sequences’ weights
1: peeksavtal
2: geekaavlal
3: egewglvlhv
4: aaektkirsa
Score: M(t,v)+M(t,i)+M(l,v)+M(l,i)
ClustalW - Weighting sequences
• each sequence is given a weight
• groups of related sequences receive lower weight
Weighted score: w1*w3*M(t,v)+w1*w4*M(t,i)+w2*w3*M(l,v)+w2*w4*M(l,i)
1: peeksavtal
2: geekaavlal
3: egewglvlhv
4: aaektkirsa
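The weighted score of aligning a column from one alignment against a column from another can be written as a double sum over the residues in the two columns. In the sketch below the function names, the toy scoring function `toy_M` (standing in for a real substitution matrix) and the weights are all assumptions; gap characters in the columns are not handled.

```python
def weighted_column_score(col_a, col_b, weights_a, weights_b, M):
    """Sum of w_a * w_b * M(a, b) over all residue pairs across the two columns."""
    return sum(
        wa * wb * M(a, b)
        for a, wa in zip(col_a, weights_a)
        for b, wb in zip(col_b, weights_b)
    )


def toy_M(a, b):
    """Hypothetical scoring function: 2 for identical residues, -1 otherwise."""
    return 2 if a == b else -1


# Column (t, l) from sequences 1 and 2 against column (v, i) from sequences 3 and 4.
unweighted = weighted_column_score("tl", "vi", [1, 1], [1, 1], toy_M)
weighted = weighted_column_score("tl", "vi", [0.5, 0.5], [1.0, 1.0], toy_M)
print(unweighted, weighted)
```

With all weights equal to 1 this reduces to the unweighted sum M(t,v)+M(t,i)+M(l,v)+M(l,i).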
ClustalW - Similarity matrix
• Distance between sequences (measured from the guide tree) determines which matrix to use
  – 80-100% seq-id -> use Blosum80
  – 60-80% seq-id -> Blosum60
  – 30-60% seq-id -> Blosum45
  – 0-30% seq-id -> Blosum30
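The selection rule above is a simple threshold lookup. In this sketch the matrix names are copied from the slide; the function name is hypothetical, and assigning a boundary value such as exactly 80% to the higher range is an assumption, since the ranges on the slide overlap at their end points.

```python
def choose_matrix(percent_identity):
    """Pick a substitution matrix name from the percent sequence identity."""
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity >= 80:
        return "Blosum80"
    if percent_identity >= 60:
        return "Blosum60"
    if percent_identity >= 30:
        return "Blosum45"
    return "Blosum30"


for identity in (95, 70, 45, 12):
    print(identity, choose_matrix(identity))
```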
ClustalW - Gap penalties
• Initial gap penalty
  – GOP
• Gap extension penalty
  – GEP
GTEAKLIVLMANE
GA---------KL
Penalty: GOP+8*GEP
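In the example the gap is nine positions long and costs GOP+8*GEP, so a gap of length L costs GOP + (L − 1)*GEP under this convention. A small sketch, with hypothetical function names and made-up values for GOP and GEP:

```python
import re


def gap_penalty(length, gop, gep):
    """Penalty for one gap: GOP for opening, GEP for each further position."""
    if length <= 0:
        return 0.0
    return gop + (length - 1) * gep


def total_gap_penalty(row, gop, gep, gap="-"):
    """Sum the penalties of all gap runs in one aligned row."""
    runs = re.findall(re.escape(gap) + "+", row)
    return sum(gap_penalty(len(run), gop, gep) for run in runs)


print(total_gap_penalty("GA---------KL", gop=10.0, gep=0.5))
```

Because extending a gap is cheaper than opening a new one, one long gap is preferred over several short ones.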
ClustalW - Modifications of gap penalty
• Position specific penalty
  – gap at position?
    • yes -> lower GOP
    • no, but gap within 8 residues -> increase GOP
  – hydrophilic residues
    • lower GOP
Globin alignment
Default gap penalty: GEP=0.05
Globin alignment - with insert
Default gap penalty: GEP=0.05
Globin alignment - with insert
Lowered gap penalty: GEP=0.01
ClustalW - summary
• Does not use a score for the final alignment
• Each pairwise alignment is done using dynamic programming
• Heuristics (e.g., gap-penalty modifications) are used - tailored to globular proteins
• Graphical version: ClustalX
SAGA: Sequence Alignment by Genetic Algorithm
• An “objective function” is used to score the alignments
• An alignment is represented as a bit string
• A population of alignments is “evolved”
• Alignments can be combined (cross-over)
• Alignments can be mutated
• Alignments with higher scores are more likely to be chosen for mating/survival
Evaluation of Alignment Methods
• Align set of protein sequences where the structures are known (at least for some proteins)
• Align the protein structures
• Identify “motifs” from the structure alignment
• Check if sequence alignment has correctly aligned motifs
• McClure et al, 1994
• Thompson et al, 1999
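One simple way to quantify how well a sequence alignment reproduces a reference (for example, structure-based) alignment is the fraction of reference residue pairs that are also aligned in the test alignment. This is a simplified stand-in for the motif check described above, not the procedure of the cited studies; the function names are hypothetical.

```python
def aligned_pairs(rows, gap="-"):
    """Set of residue pairs placed in the same column, as ((seq, pos), (seq, pos))."""
    pairs = set()
    next_pos = [0] * len(rows)               # next residue index in each sequence
    for column in zip(*rows):
        present = []
        for seq_index, ch in enumerate(column):
            if ch != gap:
                present.append((seq_index, next_pos[seq_index]))
                next_pos[seq_index] += 1
        for x in range(len(present)):
            for y in range(x + 1, len(present)):
                pairs.add((present[x], present[y]))
    return pairs


def pair_agreement(test_rows, reference_rows):
    """Fraction of residue pairs in the reference that the test alignment reproduces."""
    reference = aligned_pairs(reference_rows)
    if not reference:
        return 1.0
    return len(reference & aligned_pairs(test_rows)) / len(reference)


print(pair_agreement(["ARL-", "ARTL"], ["AR-L", "ARTL"]))
```

Restricting the reference pairs to the columns of a structural motif would give a score closer to the motif-based evaluation.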
Alignments are important
• Basis for other analyses
  – structure prediction
  – phylogeny
  – experiments
    • PCR primer identification
    • site directed mutagenesis
    • ...
  – identification of motifs
Open Problems - space for improvements!
• Good scoring function for alignments
  – identify well aligned regions
• Efficient algorithms
• Resolving repeat structure, domain movements etc.
• Incorporating external information
Future development
• More sequences
  – More families, but not so many
  – More densely populated families
  – “Easier” alignment problem
  – Identify more ancient relationships (superfamilies)
• More structures
  – more sequences can be “threaded”
  – alignments help