04/21/23 Burkhard Morgenstern, Tunis 2007

Multiple Alignment and Motif Searching

Burkhard Morgenstern

Universität Göttingen

Institute of Microbiology and Genetics

Department of Bioinformatics

Tunis, March 2007

Multiple Alignment and Motif Searching

http://www.gobics.de/burkhard/teaching/tunis_07.php

Information flow in the cell

Idea:

Sequence -> Structure -> Function

Gap between sequence and structure/function data

Lots of data available at the sequence level

Fewer data at the structure and function level

Exponential growth of data bases

Major goal of bioinformatics: close the gap between sequence information and structure/function information

Most important tool for sequence analysis: sequence comparison

Simple approach: dot plot, more advanced approach: sequence alignment

The dot plot

Gibbs and McIntyre (1970)

Two sequences to be compared:

Y Q E W T Y I V A R E A Q Y E
C I V M R E Q Y

Comparison matrix

[Figure: comparison matrix with the first sequence along the top and the second down the left side]

Search pairs of identical residues

Dot plot: dot (X) for all pairs of identical residues
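The construction described above can be sketched in a few lines of Python. This is only an illustration: the function names `dot_plot` and `render` are made up for this example, and the split of the slide's letters into a top and a side sequence is an assumption read off the flattened figure.

```python
def dot_plot(top: str, side: str) -> list[list[bool]]:
    """One row per residue of `side`, one column per residue of `top`.

    An entry is True where the two residues are identical."""
    return [[a == b for a in top] for b in side]


def render(top: str, side: str) -> str:
    """Draw the dot plot as plain text, using X for a dot as on the slides."""
    matrix = dot_plot(top, side)
    lines = ["  " + " ".join(top)]
    for residue, row in zip(side, matrix):
        cells = " ".join("X" if dot else "." for dot in row)
        lines.append(residue + " " + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    # Sequences as read from the slide (the split is an assumption).
    print(render("YQEWTYIVAREAQYE", "CIVMREQY"))
```

Runs of X along a diagonal in the printed grid correspond to the diagonal lines discussed on the following slides.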

[Figure: dot plot of the two sequences with diagonals highlighted]

Homologies as diagonal lines from top-left to bottom-right corner

Inversions as diagonals from bottom left to top right

Y Q E W T Y Q E V R E Y Q E I
C I V M R Y Q E

[Figure: dot plot of the two sequences with parallel diagonals]

Repeats as parallel diagonals

The dot plot

Advantages:

1. Various types of similarity detectable (repeats, inversions)

2. Useful for large-scale analysis

Use filtering for long sequences: dots represent matching segments instead of matching single residues
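One simple way to realise such a filter is to place a dot only where two segments of a fixed length are identical. The sketch below assumes exact matching and a segment length `w`; both are illustrative choices, not taken from the slides, and the function name is hypothetical.

```python
def segment_dot_plot(top: str, side: str, w: int = 3) -> list[tuple[int, int]]:
    """Return (row, column) start positions where the length-w segments
    side[row:row + w] and top[column:column + w] are identical."""
    if w < 1:
        raise ValueError("segment length w must be at least 1")
    # Index every length-w segment of `top` by its start positions.
    positions: dict[str, list[int]] = {}
    for col in range(len(top) - w + 1):
        positions.setdefault(top[col:col + w], []).append(col)
    # Look up every length-w segment of `side` in that index.
    dots = []
    for row in range(len(side) - w + 1):
        for col in positions.get(side[row:row + w], []):
            dots.append((row, col))
    return dots


if __name__ == "__main__":
    # Repeat example; the split into two sequences is an assumption.
    print(segment_dot_plot("YQEWTYQEVREYQEI", "CIVMRYQE", w=3))
```

With `w=1` the function reduces to the plain dot plot; larger values of `w` suppress isolated chance matches and keep only the dots that belong to longer diagonals.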

Pair-wise sequence alignment

Evolutionarily or structurally related sequences: alignment possible

Sequence homologies represented by inserting gaps

Two input sequences:

T Y I V A R E Q Y E
C I V M R E A Q Y

[Figure: comparison matrix and dot plot for the two sequences, with the alignment path drawn through the matrix]

Similarities in same relative order over entire sequences

Global alignment of sequences possible

Alignment corresponds to path through comparison matrix

Matches (red), mismatches (green), gaps (blue)

(Global) alignment: write sequences on top of each other, gaps represented by dash symbols

Input sequences:

T Y I V A R E Q Y E
C I V M R E A Q Y

Alignment of input sequences:

T Y I V A R E - Q Y E
C - I V M R E A Q Y -

Alignment consists of matches (red), mismatches (green) and gaps (blue)
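Since the colours of the slide are lost in this transcript, the three kinds of columns can be recovered with a short Python sketch. The function name and the representation of an alignment as two gapped strings of equal length are assumptions made for this example.

```python
from collections import Counter


def classify_columns(row1: str, row2: str, gap: str = "-") -> Counter:
    """Count matches, mismatches and gap columns of a pairwise alignment
    given as two gapped rows of equal length."""
    if len(row1) != len(row2):
        raise ValueError("both rows of an alignment must have the same length")
    counts = Counter(match=0, mismatch=0, gap=0)
    for a, b in zip(row1, row2):
        if a == gap and b == gap:
            raise ValueError("a column must not consist of two gaps")
        if a == gap or b == gap:
            counts["gap"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


if __name__ == "__main__":
    # The alignment shown on the slide.
    print(classify_columns("TYIVARE-QYE", "C-IVMREAQY-"))
```

For the alignment above this gives 6 matches, 2 mismatches and 3 gap columns.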

Basic task:

Find ‘best’ alignment of two sequences

= alignment that reflects structural and evolutionary relations

Questions:

1. What is a good alignment?

2. How to find the best alignment?

Idea: consider alignment as hypothesis about evolution of sequences.

Gaps correspond to insertions/deletions

Mismatches correspond to substitutions

T Y I V A R E - Q Y E
C - I V M R E A - Q Y

T Y I V A R E Q Y E
C I - V M R E A Q Y

T Y I V A R E - Q Y E
- C I V M R E A Q Y -

Problem: astronomical number of possible alignments

Stupid computer has to find out: which alignment is best?
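How quickly the number of alignments grows can be checked with a small counting sketch. It assumes that every alignment column is either a residue pair, a residue over a gap, or a gap over a residue, and that two alignments differing only in the order of adjacent gaps count as different; under a different convention the numbers change, but they stay huge. The function name is hypothetical.

```python
def count_alignments(n: int, m: int) -> int:
    """Number of ways to align sequences of lengths n and m when every
    column is a residue pair, a residue over a gap, or a gap over a residue."""
    # counts[i][j] = number of alignments of prefixes of lengths i and j;
    # a prefix can be aligned with an empty prefix in exactly one way.
    counts = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            counts[i][j] = (counts[i - 1][j - 1]   # last column: residue pair
                            + counts[i - 1][j]     # last column: residue over gap
                            + counts[i][j - 1])    # last column: gap over residue
    return counts[n][m]


if __name__ == "__main__":
    for length in (1, 2, 3, 10, 50):
        print(length, count_alignments(length, length))
```

Even for the ten- and nine-residue example sequences the count already runs into the millions, so trying every alignment in turn is not an option for realistic sequence lengths.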

T Y I V A R E - Q Y E
- C I V M R E A Q Y -

First (simplified) rules:

1. minimize number of mismatches

2. maximize number of matches

General assumption: sequences not too distantly related.

In this case: mismatches (substitutions) and gaps (insertions/deletions) unlikely

Consequence: good alignment should reduce gaps and mismatches

T Y I V A R E - Q Y E
C I - V M R E A Q Y -

Second (simplified) rule:

minimize number of gaps

T Y I V - A R E - Q Y E
C I - V M - R E A Q Y -

Parsimony principle: minimize number of evolutionary events

For protein sequences: different degrees of similarity among amino acids.

Counting matches/mismatches is oversimplistic.

Protein sequences to be aligned:

T Y I V
T L V

Possible alignment:

T Y I V
T L - V

Alternative alignment:

T Y I V
T - L V

Some amino acid residues are more similar to each other than others

Therefore: similarity among amino acid residues has to be taken into account.

To assess quality of protein alignments:

use similarity scores for amino acids

s(a,b): similarity score for amino acids a and b

Similarity measured by substitution matrices based on substitution probabilities

Important substitution matrices:

PAM (M. Dayhoff)

BLOSUM (S. Henikoff / J. Henikoff)

The PAM matrix:

Consider probability p_{a,b} of substitution a → b (or b → a) for amino acids a and b

Define for amino acids a and b similarity score s(a,b) based on probability p_{a,b}

First task: find out p_{a,b} for every pair of amino acids a, b

The PAM matrix:

Use closely related protein families: no alignment problem, no double substitutions

Construct phylogenetic tree with parsimony method

Count substitution frequencies/probabilities

Normalize substitution probabilities

Extrapolate probabilities for larger evolutionary distances
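The slide does not say how the extrapolation is done. In Dayhoff's PAM model it is usually described as repeated matrix multiplication: if a matrix holds the substitution probabilities for one unit of evolutionary distance, its n-th power is used for n units. The sketch below illustrates this with a made-up two-letter alphabet; the numbers and function names are hypothetical and are not taken from any published matrix.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def extrapolate(step_matrix, n_steps):
    """Return step_matrix raised to the power n_steps (n_steps >= 1)."""
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    result = step_matrix
    for _ in range(n_steps - 1):
        result = mat_mul(result, step_matrix)
    return result


if __name__ == "__main__":
    # Hypothetical one-step substitution probabilities; each row sums to 1
    # (row = original residue, column = residue after one step).
    toy_step = [[0.99, 0.01],
                [0.02, 0.98]]
    for row in extrapolate(toy_step, 250):
        print([round(p, 3) for p in row])
```

After many steps the rows become nearly identical, which reflects that distantly related sequences carry little information about the original residue.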

Finally: define similarity score

s(a,b) = log ( p_{a,b} / (q_a q_b) )

q_a = (relative) frequency of amino acid a
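A quick numerical check shows how this score behaves. The sketch below uses made-up probabilities and a base-2 logarithm; the slide does not state a base, so that choice is an assumption, as is the function name.

```python
import math


def log_odds(p_ab: float, q_a: float, q_b: float, base: float = 2.0) -> float:
    """Similarity score log(p_ab / (q_a * q_b)).

    The logarithm base is an assumption; the slide does not specify one."""
    return math.log(p_ab / (q_a * q_b), base)


if __name__ == "__main__":
    # Hypothetical numbers, not taken from any published matrix.
    print(log_odds(0.02, 0.1, 0.1))    # pair twice as frequent as expected by chance
    print(log_odds(0.005, 0.1, 0.1))   # pair half as frequent as expected by chance
```

Pairs that occur more often than expected from the residue frequencies alone get a positive score, pairs that occur less often get a negative one, and a pair that occurs exactly as often as expected by chance scores zero.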

Given a similarity score s(a,b) for pairs of amino acids, define quality score of alignment as:

sum of similarity values s(a,b) of aligned residues

minus gap penalty g for each residue aligned with a gap

Example:

T Y I V
T - L V

Score = s(T,T) + s(I,L) + s(V,V) - g
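The score definition can be tried out on the two candidate alignments of T Y I V and T L V. The sketch below uses a toy similarity function with made-up values (2 for identical residues, 1 for pairs among I, L, V, otherwise -1) and a gap penalty g = 2; these are illustrative assumptions, not PAM or BLOSUM values.

```python
# Hypothetical groups of "similar" residues, for illustration only.
SIMILAR = {frozenset("IL"), frozenset("IV"), frozenset("LV")}


def s(a: str, b: str) -> int:
    """Toy similarity score, NOT a PAM or BLOSUM value."""
    if a == b:
        return 2
    if frozenset((a, b)) in SIMILAR:
        return 1
    return -1


def alignment_score(row1: str, row2: str, g: int = 2, gap: str = "-") -> int:
    """Sum of s(a, b) over aligned residue pairs minus g for every
    residue that is aligned with a gap."""
    if len(row1) != len(row2):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    for a, b in zip(row1, row2):
        if a == gap or b == gap:
            total -= g
        else:
            total += s(a, b)
    return total


if __name__ == "__main__":
    print(alignment_score("TYIV", "T-LV"))   # s(T,T) + s(I,L) + s(V,V) - g
    print(alignment_score("TYIV", "TL-V"))   # s(T,T) + s(Y,L) + s(V,V) - g
```

With these toy values the alignment that pairs I with L scores higher than the one that pairs Y with L, which is the behaviour a similarity score is meant to produce.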

Next question: find alignment with best score

Dynamic-programming algorithm finds alignment with best score.

(Needleman and Wunsch, 1970)

T Y I V A R E A Q Y E
- C I V M R E - Q Y -

[Figure: comparison matrix for the two sequences with the path of this alignment marked]

Alignment corresponds to path through comparison matrix

T W L V - R E A Q I
- C I V M R E - H Y

Score of alignment: sum of similarity values of aligned residues minus gap penalty

Example: S = - g + s(W,C) + s(L,I) + s(V,V) - g + s(R,R) …

[Figure: comparison matrix for the two sequences with the path of this alignment marked]

Alignment corresponds to path through comparison matrix

Dynamic programming: calculate scores S(i,j) of optimal alignment of prefixes up to positions i and j.

[Figure: comparison matrix with position i marked in the sequence along the top and position j in the sequence down the left side]

S(i,j) can be calculated from possible predecessors S(i-1,j-1), S(i,j-1), S(i-1,j).

Score of optimal path that comes from top left = S(i-1,j-1) + s(R,R)

T W L V - R
- C I V M R

Score of optimal path that comes from above = S(i,j-1) - g

T W L V R -
- C I V M R

Score of optimal path that comes from left = S(i-1,j) - g

T W L - - V R
- C I V M R -

Score of optimal path = maximum of these three values

Recursion formula for global alignment:

For sequences x and y

$$
S(i,j) = \max \begin{cases}
S(i-1,j-1) + s(x_i, y_j) \\
S(i-1,j) - g \\
S(i,j-1) - g
\end{cases}
$$
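The recursion translates almost line by line into code. The sketch below fills the whole matrix; it assumes border values S(i,0) = -i*g and S(0,j) = -j*g (prefix aligned with an empty prefix) and uses a deliberately simple similarity function, +1 for identical residues and -1 otherwise, in place of a real substitution matrix. All names are illustrative.

```python
def identity_score(a: str, b: str) -> int:
    """Toy similarity: +1 for identical residues, -1 otherwise."""
    return 1 if a == b else -1


def fill_matrix(x: str, y: str, s, g: int) -> list[list[int]]:
    """Return the matrix S with S[i][j] = score of the best alignment of
    the first i residues of x with the first j residues of y."""
    n, m = len(x), len(y)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    # Borders: a prefix aligned with an empty prefix costs one gap per residue.
    for i in range(1, n + 1):
        S[i][0] = -i * g
    for j in range(1, m + 1):
        S[0][j] = -j * g
    # Inner cells: maximum over the three possible predecessors.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # x_i with y_j
                          S[i - 1][j] - g,                          # x_i with gap
                          S[i][j - 1] - g)                          # y_j with gap
    return S


if __name__ == "__main__":
    matrix = fill_matrix("TYIV", "TLV", identity_score, g=1)
    print(matrix[-1][-1])   # score of the best global alignment
```

The work is proportional to the product of the two sequence lengths, in contrast to the astronomical number of alignments that would have to be scored one by one.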

T W L V R
C I V M R E H Y

[Figure: comparison matrix for the two sequences, filled in cell by cell over successive slides]

Fill matrix from top left to bottom right.

Page 94: 6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple Alignment and Motif Searching Burkhard Morgenstern Universität Göttingen Institute of Microbiology and

04/21/23 Burkhard Morgenstern, Tunis 2007

Pair-wise sequence alignment

[Figure: completely filled matrix for TWLVR and CIVMREHY]

Find optimal alignment by trace-back procedure


Pair-wise sequence alignment

[Figure: matrix for TWLVR and CIVMREHY with only the first row and first column marked x]

Initial matrix entries?


Pair-wise sequence alignment

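[Figure: matrix with TWLVR (index i) along the top and CIVMREHY (index j) down the side; cells on the alignment path are marked X]

Entries S(i,j): scores of the optimal alignment of the prefixes up to positions i and j.

Example: alignment of the prefixes TWLV and CIV:

T W L V

- C I V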


Pair-wise sequence alignment

[Figure: matrix with TWLVR (index i) along the top and CIVMREHY (index j) down the side; the first row is marked X]

Entries S(i,0): scores of the optimal alignment of the prefix up to position i and the empty prefix.

Score = - i * g

T W L V

- - - -


Pair-wise sequence alignment

Initial matrix entries: Example, g = 2

          T    W    L    V    R
     0   -2   -4   -6   -8  -10
C   -2
I   -4
V   -6
M   -8
R  -10
E  -12
H  -14
Y  -16
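The initial entries for g = 2 can be reproduced with a few lines of Python. This is only a sketch: the function name `init_global_matrix` and the row/column layout (top sequence along the columns, side sequence along the rows) are assumptions made for illustration, not part of the original slides.

```python
def init_global_matrix(seq_top, seq_side, g):
    """Create the alignment matrix and fill only its first row and column.

    Aligning a prefix of length k with the empty prefix needs k gaps,
    so the border entry is -k * g (linear gap penalty g).
    """
    rows, cols = len(seq_side) + 1, len(seq_top) + 1
    S = [[None] * cols for _ in range(rows)]
    for i in range(cols):          # first row: prefixes of the top sequence
        S[0][i] = -i * g
    for j in range(rows):          # first column: prefixes of the side sequence
        S[j][0] = -j * g
    return S


S = init_global_matrix("TWLVR", "CIVMREHY", g=2)
print(S[0])                      # first row
print([row[0] for row in S])     # first column
```

The inner cells stay empty (`None`) here; they are filled later by the recursion.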


Pair-wise global alignment

[Figure: dynamic-programming matrix with the path of the global alignment marked X]

T W L V - R E A Q I

- C I V M R E - F Y


Pair-wise global alignment

Computational complexity: how does program run time and memory depend on size of input data?

With l1 and l2 the lengths of the sequences, computing time and memory are proportional to

l1 * l2

Time and memory complexity = O(l1 * l2)


Pair-wise sequence alignment

More realistic gap penalty: affine-linear instead of linear

Penalty for gap of length l:

c0 + (l-1)* c1

c0 = ‘gap-opening penalty’

c1 = ‘gap-extension penalty’
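As a small illustration of the difference between the two gap models, the penalties can be written as two functions. The function names and the example values (gap opening 10, gap extension 1) are hypothetical and chosen only for this sketch.

```python
def linear_gap_penalty(length, g):
    """Linear model: every gap position costs the same amount g."""
    return length * g


def affine_gap_penalty(length, c0, c1):
    """Affine-linear model: c0 for opening the gap, c1 for each further position."""
    if length <= 0:
        return 0
    return c0 + (length - 1) * c1


# One long gap is cheaper than several short ones under the affine model.
print(affine_gap_penalty(5, c0=10, c1=1))      # one gap of length 5
print(5 * affine_gap_penalty(1, c0=10, c1=1))  # five separate gaps of length 1
```

With these example values a single gap of length 5 costs 14, while five isolated gaps cost 50, which reflects the idea that one insertion or deletion event can affect several residues at once.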


Pair-wise local alignment

So far: global alignment considered: sequences aligned over their entire length.

But: sequences often share only local sequence similarity (conserved genes or domains)

Most important application: database searching


Pair-wise local alignment

[Figure: dynamic-programming matrix with an alignment path marked X]

T W L V - R E A Q I

- C I V M R E - F Y


Pair-wise local alignment

Problem:

Find pair of segments with maximal alignment score (not necessarily part of optimal global alignment!)

Equivalent: find path starting and ending anywhere in the matrix.


Pair-wise local alignment

[Figure: dynamic-programming matrix with an alignment path marked X]

T W L V - R E A Q I

- C I V M R E - F Y


Pair-wise local alignment

Recursion formula for global alignment:

S(i,j) = max { S(i-1,j-1) + s(ai,bj) , S(i-1,j) - g , S(i,j-1) - g }
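The global recursion, in which each entry is the maximum over the diagonal, upper and left neighbours, can be sketched in Python as follows. The similarity function `simple_score` (+1 for a match, -1 for a mismatch) is a stand-in chosen for illustration; real applications would use a substitution matrix.

```python
def simple_score(x, y):
    """Toy similarity function: +1 for identical residues, -1 otherwise."""
    return 1 if x == y else -1


def needleman_wunsch_score(a, b, s, g):
    """Score of the optimal global alignment of a and b (linear gap penalty g)."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):      # prefix of a against the empty prefix
        S[i][0] = -i * g
    for j in range(1, m + 1):      # prefix of b against the empty prefix
        S[0][j] = -j * g
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # align a_i with b_j
                S[i - 1][j] - g,                          # a_i against a gap
                S[i][j - 1] - g,                          # b_j against a gap
            )
    return S[n][m]


print(needleman_wunsch_score("TWLVR", "CIVMREHY", simple_score, g=2))
```

The two nested loops visit every cell once, which is where the O(l1 * l2) time and memory requirement comes from.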


Pair-wise local alignment

Recursion formula for local alignment:

S(i,j) = max { 0 , S(i-1,j-1) + s(ai,bj) , S(i-1,j) - g , S(i,j-1) - g }


Pair-wise local alignment

Initial matrix entries = 0

          T    W    L    V    R
     0    0    0    0    0    0
C    0
I    0
V    0
M    0
R    0
E    0
H    0
Y    0


Pair-wise local alignment

          T    W    L    V    R
     0    0    0    0    0    0
C    0    0
I    0
V    0
M    0
R    0
E    0
H    0
Y    0

s(C,T) = -2, so the first entry is max { 0 , 0 - 2 , 0 - g , 0 - g } = 0


Pair-wise sequence alignment

Recursion formula for global alignment:

$$
S(i,j) = \max
\begin{cases}
S(i-1,j-1) + s(x_i,y_j) \\
S(i-1,j) - g \\
S(i,j-1) - g
\end{cases}
$$


Pair-wise sequence alignment

Recursion formula for local alignment:

$$
S(i,j) = \max
\begin{cases}
0 \\
S(i-1,j-1) + s(x_i,y_j) \\
S(i-1,j) - g \\
S(i,j-1) - g
\end{cases}
$$


Pair-wise sequence alignment

For trace-back:

Store positions imax and jmax with

S(imax ,jmax) maximal
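A minimal Python sketch of local alignment with trace-back is given below. It keeps track of the position (imax, jmax) of the maximal entry while filling the matrix and then walks back until an entry of 0 is reached. The function name, the tie-breaking order in the trace-back and the toy scoring function are assumptions made for this illustration.

```python
def smith_waterman(a, b, s, g):
    """Return (score, aligned_a, aligned_b) of one optimal local alignment."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]   # first row and column stay 0
    best, imax, jmax = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                0,                                        # start a new local alignment
                S[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                S[i - 1][j] - g,
                S[i][j - 1] - g,
            )
            if S[i][j] > best:                            # remember the maximal entry
                best, imax, jmax = S[i][j], i, j

    # Trace back from (imax, jmax) until an entry equal to 0 is reached.
    top, bottom = [], []
    i, j = imax, jmax
    while i > 0 and j > 0 and S[i][j] > 0:
        if S[i][j] == S[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif S[i][j] == S[i - 1][j] - g:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


def match_mismatch(x, y):
    """Toy similarity function: +2 for a match, -1 for a mismatch."""
    return 2 if x == y else -1


print(smith_waterman("XXACGTXX", "YYACGTYY", match_mismatch, g=2))
```

In this toy example only the shared segment ACGT is reported; the unrelated flanking residues are ignored, which is the point of local alignment.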


Pair-wise local alignment

[Figure: dynamic-programming matrix with an alignment path marked X]

T W L V - R E A Q I

- C I V M R E - F Y


Pair-wise local alignment

Algorithm by Smith and Waterman (1981)

Implementation: e.g. BestFit in GCG package


Pair-wise local alignment

Complexity: with l1 and l2 the lengths of the sequences, computing time and memory are proportional to l1 * l2

Time and space complexity = O(l1 * l2)

Too slow for database searching! Therefore tools like BLAST are necessary for database searching


The Basic Local Alignment Search Tool (BLAST)

New BLAST version (1997)

Two-hit strategy

Gapped BLAST

Position-Specific Iterative BLAST (PSI BLAST)


The Basic Local Alignment Search Tool (BLAST)

PSI BLAST:

1. search database with standard BLAST

2. take best hits and create multiple alignment

3. calculate profile from multiple alignment

4. search database again with profile as query


The Basic Local Alignment Search Tool (BLAST)

profile for sequence family or motif:

table of amino acid/nucleotide frequencies at any position in alignment.


The Basic Local Alignment Search Tool (BLAST)

Profile: frequencies of nucleotides at every position.

seq1 A T T G - A T

seq2 C T T G T A G

seq3 A - - G T A T

seq4 A T G G T G T

seq5 A C T G T A C

A 80 0 0 0 0 80 0

T 0 75 75 0 100 0 60

C 20 25 0 0 0 0 20

G 0 0 25 100 0 20 20
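The profile table can be recomputed from the five aligned sequences with a short Python sketch. It assumes, as the table itself suggests (75 % and 25 % in the second column), that gap characters are ignored when the percentages are calculated; the function name `column_profile` is made up for this example.

```python
from collections import Counter


def column_profile(alignment, alphabet="ATCG", gap="-"):
    """Percentage of each nucleotide per alignment column, ignoring gaps."""
    profile = []
    for column in zip(*alignment):                 # iterate over alignment columns
        counts = Counter(c for c in column if c != gap)
        total = sum(counts.values())
        profile.append(
            {x: (100 * counts[x] / total if total else 0.0) for x in alphabet}
        )
    return profile


alignment = [
    "ATTG-AT",   # seq1
    "CTTGTAG",   # seq2
    "A--GTAT",   # seq3
    "ATGGTGT",   # seq4
    "ACTGTAC",   # seq5
]
profile = column_profile(alignment)
for x in "ATCG":
    print(x, [round(col[x]) for col in profile])
```

Each printed row corresponds to one row of the table above.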


Tools for multiple sequence alignment

s1 T Y I M R E A Q Y E S A Q

s2 T C I V M R E A Y E

s3 Y I M Q E V Q Q E R

s4 W R Y I A M R E Q Y E


Tools for multiple sequence alignment

s1 - T Y I - M R E A Q Y E S A Q

s2 - T C I V M R E A - Y E - - -

s3 - - Y I - M Q E V Q Q E R - -

s4 W R Y I A M R E - Q Y E - - -


Tools for multiple sequence alignment

s1 - T Y I - M R E A Q Y E S A Q

s2 - T C I V M R E A - Y E - - -

s3 - - Y I - M Q E V Q Q E R - -

s4 W R Y I A M R E - Q Y E - - -

General information in multiple alignment:

Functionally important regions are more conserved than non-functional regions

Local sequence conservation indicates functionality!


Tools for multiple sequence alignment

s1 - T Y I - M R E A Q Y E S A Q

s2 - T C I V M R E A - Y E - - -

s3 - - Y I - M Q E V Q Q E R - -

s4 W R Y I A M R E - Q Y E - - -

For phylogeny reconstruction:

Estimate pairwise distances between sequences (distance-based methods for tree reconstruction)

Estimate evolutionary events (parsimony and maximum likelihood methods)


Tools for multiple sequence alignment

s1 - T Y I - M R E A Q Y E S A Q

s2 - T C I V M R E A - Y E - - -

s3 - - Y I - M Q E V Q Q E R - -

s4 W R Y I A M R E - Q Y E - - -

Astronomical number of possible alignments!


Tools for multiple sequence alignment

s1 - T Y I - M R E A Q Y E S A Q

s2 - T C I V M R E A - - - Y E -

s3 Y I - - - M Q E V Q Q E R - -

s4 W R Y I A M R E - Q Y E - - -

Astronomical number of possible alignments!


Tools for multiple sequence alignment

s1 - T Y I - M R E A Q Y E S A Q

s2 - T C I V M R E A - - - Y E -

s3 Y I - - - M Q E V Q Q E R - -

s4 W R Y I A M R E - Q Y E - - -

Computer has to decide: which one is best?


Tools for multiple sequence alignment

Questions in development of multiple-alignment programs (as in pairwise alignment):

(1) What is a good alignment? → objective function ('score')

(2) How to find a good alignment? → optimization algorithm

First question far more important !


Tools for multiple sequence alignment

Traditional Objective functions:

Define Score of alignments as

Sum of individual similarity scores s(a,b)

Gap penalties

Needleman-Wunsch scoring system (1970)


Tools for multiple sequence alignment

Traditional Objective functions

Can be generalized to multiple alignment

(e.g. sum-of-pair score, tree alignment)

Needleman-Wunsch algorithm can also be generalized to multiple alignment, but:

Very time and memory consuming!

-> Heuristics needed


Multiple sequence alignment

First question: how to score multiple alignments?

Possible scoring scheme:

Sum-of-pairs score


Multiple sequence alignment

Multiple alignment implies pairwise alignments:

1aboA 36 WCEAQt..kngqGWVPSNYITPVN......

1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......

1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp

1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....

1vie 28 YAVESeahpgsvQIYPVAALERIN......


Multiple sequence alignment

Multiple alignment implies pairwise alignments:

Use the sum of the scores of these pairwise alignments

1aboA 36 WCEAQt..kngqGWVPSNYITPVN......

1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......

1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp

1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....

1vie 28 YAVESeahpgsvQIYPVAALERIN......
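A sum-of-pairs score can be sketched in Python as follows: every pair of rows of the multiple alignment is scored as a pairwise alignment, and the pairwise scores are added up. The linear gap penalty, the rule that columns with a gap in both rows are skipped, and the toy similarity function are assumptions of this sketch rather than details taken from the slides.

```python
from itertools import combinations


def simple_score(x, y):
    """Toy similarity function: +1 for identical residues, -1 otherwise."""
    return 1 if x == y else -1


def pairwise_score(row_a, row_b, s, g, gap="-"):
    """Score of the pairwise alignment implied by two rows of a multiple alignment."""
    score = 0
    for x, y in zip(row_a, row_b):
        if x == gap and y == gap:
            continue                 # column is empty in this pairwise projection
        if x == gap or y == gap:
            score -= g               # residue aligned with a gap
        else:
            score += s(x, y)         # residue aligned with a residue
    return score


def sum_of_pairs(alignment, s, g):
    """Sum of the scores of all pairwise alignments implied by the alignment."""
    return sum(pairwise_score(a, b, s, g) for a, b in combinations(alignment, 2))


print(sum_of_pairs(["AC-", "A-G", "ACG"], simple_score, g=2))
```

For n sequences there are n(n-1)/2 such pairs, so evaluating the score of a given alignment is cheap; the expensive part is finding the alignment that maximizes it.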


Multiple sequence alignment

Needleman-Wunsch scoring scheme can be generalized from pair-wise to multiple alignment


Multiple sequence alignment

Complexity:

For 3 sequences of lengths l1, l2, l3:

O( l1 * l2 * l3 )

For n sequences ( average length l ):

O( l^n )

Exponential complexity!


Multiple sequence alignment

Needleman-Wunsch scoring scheme can be generalized from pair-wise to multiple alignment

Optimal solution not feasible:

-> Heuristics necessary


`Progressive´ Alignment

WCEAQTKNGQGWVPSNYITPVN

WWRLNDKEGYVPRNLLGLYP

AVVIQDNSDIKVVPKAKIIRD

YAVESEAHPGSFQPVAALERIN

WLNYNETTGERGDFPGTYVEYIGRKKISP


`Progressive´ Alignment

WCEAQTKNGQGWVPSNYITPVN

WWRLNDKEGYVPRNLLGLYP

AVVIQDNSDIKVVPKAKIIRD

YAVESEAHPGSFQPVAALERIN

WLNYNETTGERGDFPGTYVEYIGRKKISP

Guide tree


`Progressive´ Alignment

WCEAQTKNGQGWVPSNYITPVN

WWRLNDKEGYVPRNLLGLYP

AVVIQDNSDIKVVPKAKIIRD

YAVESEAHPGSFQPVAALERIN

WLNYNETTGERGDFPGTYVEYIGRKKISP

Idea: align closely related sequences first!


`Progressive´ Alignment

WCEAQTKNGQGWVPSNYITPVN

WW--RLNDKEGYVPRNLLGLYP-

AVVIQDNSDIKVVP--KAKIIRD

YAVESEASFQPVAALERIN

WLNYNEERGDFPGTYVEYIGRKKISP

Profile alignment, “once a gap - always a gap”


`Progressive´ Alignment

WCEAQTKNGQGWVPSNYITPVN

WW--RLNDKEGYVPRNLLGLYP-

AVVIQDNSDIKVVP--KAKIIRD

YAVESEASVQ--PVAALERIN------

WLN-YNEERGDFPGTYVEYIGRKKISP

Profile alignment, “once a gap - always a gap”


`Progressive´ Alignment

WCEAQTKNGQGWVPSNYITPVN-

WW--RLNDKEGYVPRNLLGLYP-

AVVIQDNSDIKVVP--KAKIIRD

YAVESEASVQ--PVAALERIN------

WLN-YNEERGDFPGTYVEYIGRKKISP

Profile alignment, “once a gap - always a gap”


`Progressive´ Alignment

WCEAQTKNGQGWVPSNYITPVN--------

WW--RLNDKEGYVPRNLLGLYP--------

AVVIQDNSDIKVVP--KAKIIRD-------

YAVESEA---SVQ--PVAALERIN------

WLN-YNE---ERGDFPGTYVEYIGRKKISP

Profile alignment, “once a gap - always a gap”


`Progressive´ Alignment

“Greedy” algorithm:

Consider partial solutions of a bigger problem

Search best partial solution, fix solution

Search second-best partial solution that is consistent with first solution, fix solution

Search third-best partial solution … etc.

E.g.: knapsack problem
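The greedy principle can be illustrated with the knapsack (rucksack) problem mentioned above. The sketch below ranks items by value per unit weight and fixes each choice as soon as it fits; the item list and capacity are made-up example data.

```python
def greedy_knapsack(items, capacity):
    """Greedy heuristic for the knapsack problem.

    items: list of (value, weight) pairs with positive weights.
    Returns (total_value, chosen_items).
    """
    chosen, total_value, remaining = [], 0, capacity
    # Best partial solution first: highest value per unit of weight.
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if weight <= remaining:          # consistent with what is already fixed
            chosen.append((value, weight))
            total_value += value
            remaining -= weight          # the choice is never revised
    return total_value, chosen


items = [(60, 10), (100, 20), (120, 30)]
print(greedy_knapsack(items, capacity=50))
```

For this example the greedy choice reaches a total value of 160, although packing the second and third item would give 220. Greedy procedures are fast but give no guarantee of optimality, which also holds for progressive alignment.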


`Progressive´ Alignment

Most important software program:

CLUSTAL W: J. Thompson, T. Gibson, D. Higgins (1994), CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment … Nucleic Acids Res. 22, 4673-4680

(~ 18,000 citations in the literature)


Tools for multiple sequence alignment

Problems with traditional approach:

Results depend on gap penalty

Heuristic guide tree determines alignment; alignment used for phylogeny reconstruction

Algorithm produces global alignments.


Tools for multiple sequence alignment

Problems with traditional approach:

But:

Many sequence families share only local similarity

E.g. sequences share one conserved motif


Local sequence alignment

Find common motif in sequences; ignore the rest

EYENS

ERYENS

ERYAS


Local sequence alignment

Find common motif in sequences; ignore the rest

E-YENS

ERYENS

ERYA-S


Local sequence alignment

Find common motif in sequences; ignore the rest: local alignment

E-YENS

ERYENS

ERYA-S


Local sequence alignment

Important methods for local multiple alignment:

• PIMA

• MEME/MAST

Idea: expectation maximization.


Local sequence alignment

Traditional alignment approaches:

Either global or local methods!


New question: sequence families with multiple local similarities

Neither local nor global methods applicable


New question: sequence families with multiple local similarities

Alignment possible if order conserved


The DIALIGN approach

Morgenstern, Dress, Werner (1996), PNAS 93, 12098-12103

Combination of global and local methods

Assemble multiple alignment from gap-free local pair-wise alignments ("fragments")


The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa


The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa


The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa


The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa


The DIALIGN approach

atc------taatagttaaactcccccgtgc-ttag

cagtgcgtgtattactaac----------gg-ttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa


The DIALIGN approach

atc------taatagttaaactcccccgtgc-ttag

cagtgcgtgtattactaac----------gg-ttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Consistency!


The DIALIGN approach

atc------TAATAGTTAaactccccCGTGC-TTag

cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg

caaa--GAGTATCAcc----------CCTGaaTTGAATaa


The DIALIGN approach

Score of an alignment:

Define score of fragment f:

l(f) = length of f

s(f) = sum of matches (similarity values)

P(f) = probability to find a fragment with length l(f) and at least s(f) matches in random sequences that have the same length as the input sequences.

Score w(f) = -ln P(f)
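
As a rough illustration of the fragment score, the sketch below computes w(f) = -ln P(f) for nucleotide fragments. It assumes a match probability of 0.25 per position, a binomial tail for "at least s(f) matches", and a simple correction over all len1 x len2 possible start positions. These modelling choices and the function names are assumptions made for illustration; they are not necessarily the exact formula used inside DIALIGN.

```python
import math


def binomial_tail(length, matches, p_match=0.25):
    """Probability of at least `matches` matches among `length` independent positions."""
    return sum(
        math.comb(length, k) * p_match**k * (1 - p_match) ** (length - k)
        for k in range(matches, length + 1)
    )


def fragment_weight(length, matches, len1, len2, p_match=0.25):
    """Weight w(f) = -ln P(f) under the simplified random model described above.

    len1 and len2 are the lengths of the two input sequences (assumed model:
    a fragment may start at any of the len1 * len2 position pairs).
    """
    p_single = binomial_tail(length, matches, p_match)
    if p_single >= 1.0:
        return 0.0  # such a fragment is certain to occur, so it carries no weight
    n_starts = len1 * len2
    # P(at least one such fragment) = 1 - (1 - p_single) ** n_starts, computed stably
    p_any = -math.expm1(n_starts * math.log1p(-p_single))
    if p_any <= 0.0:
        return math.inf  # probability underflowed to zero
    return max(0.0, -math.log(p_any))


weight_perfect = fragment_weight(10, 10, 30, 30)  # 10 matches out of 10
weight_weak = fragment_weight(10, 5, 30, 30)      # 5 matches out of 10
```

Under this model a fragment with many matches is unlikely to occur by chance and receives a high weight, while a weak fragment in the same sequences receives a weight close to zero.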


The DIALIGN approach

Score of an alignment:

Define score of fragment f:

Define score of alignment as

sum of scores of involved fragments

No gap penalty!


The DIALIGN approach

Score of an alignment:

Goal in fragment-based alignment approach: find

Consistent collection of fragments with maximum sum of weight scores


The DIALIGN approach

atctaatagttaaaccccctcgtgcttag

agatccaaaccagtgcgtgtattactaacggttcaatcgcgcacatccgc

Pair-wise alignment:

recursive algorithm finds optimal chain of fragments.


The DIALIGN approach

------atctaatagttaaaccccctcgtgcttag-------

agatccaaaccagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc--

Pair-wise alignment:

recursive algorithm finds optimal chain of fragments.


The DIALIGN approach

------atctaatagttaaaccccctcgtgcttag-------

agatccaaaccagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc--

Optimal pairwise alignment: chain of fragments with maximum sum of weights found by dynamic programming:

Standard fragment-chaining algorithm

Space-efficient algorithm
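
The chaining step can be sketched with a simple quadratic dynamic program. The fragment representation `(start1, start2, length, weight)` and the function name are assumptions for illustration; this is a plain textbook chaining routine, not the space-efficient algorithm mentioned on the slide.

```python
def best_chain(fragments):
    """Return (score, chain) for the highest-scoring chain of gap-free fragments.

    Each fragment is a tuple (start1, start2, length, weight) with length >= 1.
    Fragment a may precede fragment b only if a ends before b starts in BOTH sequences.
    """
    if not fragments:
        return 0.0, []
    # Sorting by end position in sequence 1 guarantees predecessors come first.
    frags = sorted(fragments, key=lambda f: (f[0] + f[2], f[1] + f[2]))
    best = [0.0] * len(frags)   # best[j]: best chain score ending with fragment j
    prev = [-1] * len(frags)    # back-pointers for reconstructing the chain
    for j, (s1, s2, _length, weight) in enumerate(frags):
        best[j] = weight
        for i in range(j):
            p1, p2, p_len, _ = frags[i]
            if p1 + p_len <= s1 and p2 + p_len <= s2 and best[i] + weight > best[j]:
                best[j] = best[i] + weight
                prev[j] = i
    end = max(range(len(frags)), key=best.__getitem__)
    score = best[end]
    chain = []
    while end != -1:
        chain.append(frags[end])
        end = prev[end]
    chain.reverse()
    return score, chain


example = [(0, 0, 5, 3.0), (3, 3, 5, 4.0), (6, 6, 4, 2.5)]
score, chain = best_chain(example)
```

In this toy input the middle fragment overlaps both others, so the best chain combines the first and last fragments for a total weight of 5.5. No gap penalty enters the score.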


The DIALIGN approach

Multiple alignment:

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

(1) Calculate all optimal pair-wise alignments


The DIALIGN approach

Fragments from optimal pair-wise alignments might be inconsistent


The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa


The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa


The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa


The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa


The DIALIGN approach

Fragments from optimal pair-wise alignments might be inconsistent

1. Sort fragments according to scores

2. Include them one-by-one into growing multiple alignment – as long as they are consistent

(greedy algorithm, comparable to knapsack problem)
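
The two greedy steps above can be sketched as follows. The fragment tuple layout, the function names, and the brute-force consistency test (union-find over aligned sites plus a cycle check) are assumptions chosen for clarity; they recompute everything for each candidate and are far less efficient than the bookkeeping a real implementation would use.

```python
from collections import defaultdict


def is_consistent(pairs, seq_lengths):
    """Check whether aligned site pairs ((seq, pos), (seq, pos)) fit into one alignment."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:  # merge aligned sites into alignment columns
        parent[find(a)] = find(b)

    sites = [(i, p) for i, n in enumerate(seq_lengths) for p in range(n)]
    seen = set()
    for site in sites:  # a column may hold at most one site per sequence
        key = (find(site), site[0])
        if key in seen:
            return False
        seen.add(key)

    # Sequence order induces edges between columns; a cycle means inconsistency.
    successors, indegree = defaultdict(set), defaultdict(int)
    for i, n in enumerate(seq_lengths):
        for p in range(n - 1):
            u, v = find((i, p)), find((i, p + 1))
            if v not in successors[u]:
                successors[u].add(v)
                indegree[v] += 1
    nodes = {find(site) for site in sites}
    queue = [u for u in nodes if indegree[u] == 0]
    visited = 0
    while queue:
        u = queue.pop()
        visited += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return visited == len(nodes)


def greedy_alignment(fragments, seq_lengths):
    """fragments: (weight, seq_a, start_a, seq_b, start_b, length); highest weight first."""
    accepted, pairs = [], []
    for frag in sorted(fragments, key=lambda f: f[0], reverse=True):
        _weight, a, start_a, b, start_b, length = frag
        new_pairs = [((a, start_a + k), (b, start_b + k)) for k in range(length)]
        if is_consistent(pairs + new_pairs, seq_lengths):
            accepted.append(frag)
            pairs += new_pairs
    return accepted


fragments = [(5.0, 0, 0, 1, 2, 2), (4.0, 1, 0, 2, 2, 2), (3.0, 2, 0, 0, 2, 2)]
accepted = greedy_alignment(fragments, [6, 6, 6])
```

In this toy input each pair of fragments is compatible on its own, but the three together would force a cyclic ordering of columns, so the lowest-scoring fragment is rejected.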


The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa


The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Consistency problem


The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions


The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions


The DIALIGN approach

atc------taatagt taaactcccccgtgcttag

cagtgcgtgtattact aacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions


The DIALIGN approach

atc------taata-----gttaaactcccccgtgcttag

cagtgcgtgtatta-----ctaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions


The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions

site x = [i,p] (sequence i, position p)


The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions

Calculate lower bound bl(x,i) and upper bound bu(x,i) for each site x and sequence i


The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Upper and lower bounds for alignable positions

bl(x,i) and bu(x,i) updated for each new fragment in alignment
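
To make the idea of bounds concrete, here is a minimal sketch restricted to two sequences. The inclusive-bounds convention, the 0-based positions, and the function names are assumptions for illustration; the general multi-sequence case also needs transitive updates, which this sketch does not attempt.

```python
def pairwise_bounds(aligned, len1, len2):
    """Inclusive (lower, upper) bounds in sequence 2 for every position of sequence 1.

    `aligned` maps already aligned positions of sequence 1 to positions of
    sequence 2 (0-based) and is assumed to be order-preserving.
    If lower > upper for a position, nothing can be aligned with it any more.
    """
    lower = [0] * len1
    upper = [len2 - 1] * len1
    last = -1  # partner of the nearest aligned position to the left
    for p in range(len1):
        if p in aligned:
            last = aligned[p]
            lower[p] = last
        else:
            lower[p] = last + 1
    nxt = len2  # partner of the nearest aligned position to the right
    for p in reversed(range(len1)):
        if p in aligned:
            nxt = aligned[p]
            upper[p] = nxt
        else:
            upper[p] = nxt - 1
    return list(zip(lower, upper))


def fragment_fits(bounds, start1, start2, length):
    """True if every position pair of the fragment respects the current bounds."""
    return all(
        bounds[start1 + k][0] <= start2 + k <= bounds[start1 + k][1]
        for k in range(length)
    )


bounds = pairwise_bounds({2: 4, 3: 5}, len1=6, len2=8)
```

After a fragment is accepted, its position pairs are added to `aligned` and the bounds are recomputed; this recomputation is the step that an efficient implementation performs incrementally.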


The DIALIGN approach

Consistency bounds are to be updated for each new fragment that is included in the growing alignment

Efficient algorithm

(Abdeddaim and Morgenstern, 2002)


The DIALIGN approach

Advantages of segment-based approach:

Program can produce global and local alignments!

Can align sequence families that cannot be aligned with standard methods


The DIALIGN approach

DIALIGN is available

Online at BiBiServ (Bielefeld Bioinformatics Server)

Downloadable UNIX/LINUX executables at BiBiServ

Source code (email to BM)


Program input

Program usage:

> dialign2-2 [options] <input_file>

<input_file> = multi-sequence file in FASTA-format
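
A multi-sequence FASTA file for this call can be prepared with a few lines of Python. The sequence names (`seq1` to `seq3`), the file name, and the helper function are hypothetical; the three sequences are the toy examples from the earlier slides. The command list only mirrors the usage line above and passes no options.

```python
import tempfile
from pathlib import Path


def write_fasta(records, path):
    """Write (name, sequence) pairs as a multi-sequence FASTA file."""
    lines = []
    for name, sequence in records:
        lines.append(f">{name}")   # header line starts with '>'
        lines.append(sequence)     # sequence on the following line
    Path(path).write_text("\n".join(lines) + "\n")


records = [
    ("seq1", "atctaatagttaaactcccccgtgcttag"),
    ("seq2", "cagtgcgtgtattactaacggttcaatcgcg"),
    ("seq3", "caaagagtatcacccctgaattgaataa"),
]

with tempfile.TemporaryDirectory() as tmp:
    fasta_path = Path(tmp) / "example.fa"
    write_fasta(records, fasta_path)
    written = fasta_path.read_text()
    # Argument list matching "dialign2-2 [options] <input_file>" without options;
    # it could be passed to subprocess.run if the program is installed.
    command = ["dialign2-2", str(fasta_path)]
```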


Program output

DIALIGN 2.2.1
*************

Program code written by Burkhard Morgenstern and Said Abdeddaim
e-mail contact: [email protected]

Published research assisted by DIALIGN 2 should cite:

Burkhard Morgenstern (1999). DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15, 211 - 218.

For more information, please visit the DIALIGN home page at

http://bibiserv.techfak.uni-bielefeld.de/dialign/

program call: ./dialign2-2 -nt -anc s

Aligned sequences:          length:
==================          =======

1) dog_il4                  300
2) bla                      200
3) blu                      200

Average seq. length: 233.3

Please note that only upper-case letters are considered to be aligned.
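
The upper-case convention is easy to exploit when post-processing the output. The helper names below are hypothetical, and the sketch only looks at a single sequence row that has already been extracted from the alignment block.

```python
def aligned_mask(row):
    """Mark aligned (upper-case) residues with '*' and everything else with '.'."""
    residues = row.replace(" ", "")  # drop the spaces between 10-column blocks
    return "".join("*" if char.isupper() else "." for char in residues)


def count_aligned(row):
    """Number of residues in a row that DIALIGN reports as aligned."""
    return sum(char.isupper() for char in row)


row = "cagg------ ----GTTTGA"   # first two blocks of the dog_il4 row
mask = aligned_mask(row)
n_aligned = count_aligned(row)
```

Lower-case residues and gap characters are both treated as unaligned here, which matches the note in the program output.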


Program output

Alignment (DIALIGN format):
===========================

dog_il4        1   cagg------ ----GTTTGA atctgataca ttgc------ ----------
bla            1   ctga------ ---------- ---------- --------GC CAAGTGGGAA
blu            1   ttttgatatg agaaGTGTGA aacaagctat cctatattGC TAAGTGGCAG

                   0000000000 0000000000 0000000000 0000000011 1111111111

dog_il4       25   ---------- --ATGGCACT GGGGTGAATG AGGCAGGCAG CAGAATGATC
bla           17   ggtgtgaata catgggtttc cagtaccttc tgaggtccag agtacc----
blu           51   ccctggcttt ctATGTGCAC AGAATGGGAG GAAAGTGCCT GCTAGTGAGC

                   0000000000 0000000000 0000000000 0000000000 0000000000

dog_il4       63   GTACTGCAGC CCTGAGCTTC CACTGGCCCA TGTTGGTATC CTTGTATTTT
bla           63   ---------- ---------- ---TTTCCCA TGTGCTCCAT GGTGGAATGG
blu          101   CAGGGACTCA GAGAGAATGG AGTATAGGGG TCAGGGCat- ----------

                   0000000000 0000000000 0009999999 9999999888 8888888888

dog_il4      113   TCCGCCCCTT CCCAGCACca gcattatcct ---GGGATTG GAGAAGGGGG
bla           90   ACCACTCCTT CTCAGCACaa caaagcccaa gaaGGTGTTG CGTTCTAGAC
blu          140   ---------- ---------- ---------- ---GGGGTGG CCTTAGGCTC

                   8888888888 8888888800 0000000000 0007777777 7777777777


The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa


The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa


The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa


The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa


The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaac----------ggttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa


The DIALIGN approach

atc------taatagttaaactcccccgtgc-ttag

cagtgcgtgtattactaac----------gg-ttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa


The DIALIGN approach

atc------TAATAGTTAaactccccCGTGC-TTag------

cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg

caaa--GAGTATCAcc----------CCTGaaTTGAATaa--


The DIALIGN approach

atc------taatagttaaactcccccgtgc-ttag

cagtgcgtgtattactaac----------gg-ttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa


T-COFFEE

C. Notredame, D. Higgins, J. Heringa (2000), T-Coffee: A novel method for fast and accurate multiple sequence alignment, J. Mol. Biol. 302, 205-217.


T-COFFEE

Problem with “progressive” approaches:

Strictly global alignments

Use only pair-wise comparison


T-COFFEE

Idea: Start with local and global pair-wise alignments (“primary library” of alignments)

Construct “secondary library” of residues that are indirectly aligned by primary library.

Re-score residue pairs

Construct final alignment with “progressive” method
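
The re-scoring step can be illustrated with a small sketch of library extension: the weight of a residue pair is increased by the support it receives through every third residue, taking the smaller of the two connecting weights. The data layout (residues as `(sequence, position)` tuples, one dictionary entry per pair) and the function name are assumptions; the weights in the example are toy values, not output from a real T-COFFEE run.

```python
from collections import defaultdict
from itertools import combinations


def extend_library(primary):
    """Re-score residue pairs using indirect support through third residues.

    primary: dict mapping (residue_a, residue_b) -> weight, where each residue is a
    (sequence_index, position) tuple and each pair appears once (assumed layout).
    Returns a dict mapping frozenset({residue_a, residue_b}) -> extended weight.
    """
    neighbours = defaultdict(dict)
    for (a, b), weight in primary.items():
        neighbours[a][b] = weight
        neighbours[b][a] = weight

    extended = defaultdict(float)
    for (a, b), weight in primary.items():
        extended[frozenset((a, b))] += weight  # direct support
    for _z, linked in neighbours.items():
        # x and y are both aligned with the intermediate residue: indirect support
        for x, y in combinations(sorted(linked), 2):
            if x[0] != y[0]:  # only pair residues from different sequences
                extended[frozenset((x, y))] += min(linked[x], linked[y])
    return dict(extended)


A, B, C = (0, 0), (1, 0), (2, 0)   # one residue from each of three sequences
primary = {(A, B): 88, (A, C): 77, (C, B): 100}
extended = extend_library(primary)
```

Here the pair A-B has direct weight 88 and gains min(77, 100) = 77 through C, giving 165. Pairs that agree with many other pairwise alignments therefore score higher in the final progressive alignment.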


T-COFFEE

Advantage:

Combination of local and global approaches

Less sensitive to mis-alignments in the progressive procedure


T-COFFEE

T-COFFEE and DIALIGN:

Less sensitive to spurious pairwise similarities

Can handle local homologies better than CLUSTAL


Most multi-alignment approaches automated, i.e. based on algorithmic rules. Two components:

Objective function: assess alignment quality

Optimization algorithm: find optimal or near-optimal alignment


Fully automated alignment programs necessary

if no expert knowledge available

if large amounts of data to be analyzed

But:

Often no biologically reasonable results

Often additional information about homologies etc. available


Idea for improved alignment

Use expert knowledge to influence alignment procedure

DIALIGN with user-defined anchor points


Alignment of large genomic sequences

Alignment of large genomic sequences to identify functional elements (phylogenetic footprinting)

Göttgens et al., 2000, 2001, 2002, … Pollard et al., 2004

DIALIGN, MGA, PipMaker, LAGAN, AVID, Mummer, WABA, …


Alignment of large genomic sequences

Gene-regulatory sites identified by multiple sequence alignment (phylogenetic footprinting)


DIALIGN alignment of human and murine genomic sequences


DIALIGN alignment of tomato and Arabidopsis thaliana genomic sequences


Alignment of large genomic sequences

DIALIGN used by tracker for phylogenetic footprinting (Prohaska et al., 2004)

Alignment of Hox gene cluster:

DIALIGN able to identify small regulatory elements, but

Entire genes totally mis-aligned

Reason for mis-alignment: duplications!


Alignment of large genomic sequences

The Hox gene cluster:

4 Hox gene clusters in pufferfish. 14 genes, different genes in different clusters!


Alignment of large genomic sequences

The Hox gene cluster:

Complete mis-alignment of entire genes!


Alignment of sequence duplications

S1

S2

Conserved motifs; no similarity outside motifs


Alignment of sequence duplications

S1

S2

Duplication in two sequences


Alignment of sequence duplications

S1

S2

Mis-alignment would have lower score!


Alignment of sequence duplications

S1

S2

Duplication in one sequence

Possible mis-alignment


Alignment of sequence duplications

S1

S2

Duplication in one sequence

S3


Alignment of sequence duplications

S1

S2

Consistency problem

S3


Alignment of sequence duplications

S1

S2

More plausible alignment – and higher score:

S3


Alignment of sequence duplications

S1

S2

Consistency problem

S3


Alignment of sequence duplications

S1

S2

Alternative alignment; probably biologically wrong; lower numerical score!

S3


Anchored sequence alignment

A biologically meaningful alignment is not always possible with fully automated approaches.

Idea: use expert knowledge to guide alignment procedure

User defines a set of anchor points that are to be "respected" by the alignment procedure


Anchored sequence alignment

NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN

IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT


Anchored sequence alignment

NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN

IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT

Use known homology as anchor point


Anchored sequence alignment

NLFV ALYDFVASGDNTLSITKGEKLRVLGYNHN

IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT

Use known homology as anchor point

Anchor point = anchored fragment (gap-free pair of segments)

Remainder of sequences aligned automatically


Anchored sequence alignment

NLFV ALYDFVASGDNTLSITKGEKLRVLGYNHN

IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT

Alignment of anchored positions a and b is not enforced (a and b may remain un-aligned), but:

a is the only residue that can be aligned to b

Residues left of a can only be aligned with residues left of b (and likewise for residues to the right)
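The anchoring rule can be read as a simple test on a candidate residue pair. The following is a minimal sketch, assuming integer positions and a single anchor that ties position a in the first sequence to position b in the second; the function name is made up for illustration, and the symmetric condition for residues to the right of the anchor is assumed.

```python
def compatible_with_anchor(x: int, y: int, a: int, b: int) -> bool:
    """Return True if aligning position x (sequence 1) with position y
    (sequence 2) respects the anchor that ties a (sequence 1) to b (sequence 2)."""
    if x == a or y == b:
        # a is the only residue that may be aligned to b, and vice versa
        return x == a and y == b
    # otherwise x and y must lie on the same side of the anchor
    return (x < a) == (y < b)


# Hypothetical anchor between position 5 of S1 and position 7 of S2
left_ok = compatible_with_anchor(2, 3, 5, 7)    # both left of the anchor
crossing = compatible_with_anchor(2, 9, 5, 7)   # crosses the anchor
```

Note that the anchor pair itself passes the test but is never forced: the rule only restricts which other pairs remain possible.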


Anchored sequence alignment

-------NLF VALYDFVASG DNTLSITKGE klrvlgynhn

iihredkGVI YALWDYEPQN DDELPMKEGD cmt-------

Anchored alignment


Anchored sequence alignment

NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN

IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT

GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS

Anchor points in multiple alignment


Anchored sequence alignment

NLFV ALYDFVASGDNTLSITKGEKLRVLGYNHN

IIHREDKGVIYALWDYEPQND DELPMKEGDCMT

GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS

Anchor points in multiple alignment


Anchored sequence alignment

-------NLF V-ALYDFVAS GD-------- NTLSITKGEk lrvLGYNhn

iihredkGVI Y-ALWDYEPQ ND-------- DELPMKEGDC MT-------

-------GYQ YrALYDYKKE REedidlhlg DILTVNKGSL VA-LGFS--

Anchored multiple alignment


Algorithmic questions

Goal:

Find an optimal alignment (= consistent set of fragments) under the constraints given by user-specified anchor points!


Additional input file with anchor points:

1 3 215 231 5 4.5

2 3 34 78 23 1.23

1 4 317 402 8 8.5

Algorithmic questions


Algorithmic questions

NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN

IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT

GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS


Algorithmic questions

Additional input file with anchor points:

Sequences    Start positions    Length    Score
1    3       215    231         5         4.5
2    3       34     78          23        1.23
1    4       317    402         8         8.5
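To make the column layout concrete, here is a minimal sketch of reading such an anchor file. It assumes whitespace-separated fields in the order shown on the slide (two sequence numbers, two start positions, length, score); the `Anchor` class and `parse_anchors` function are hypothetical names, not part of any existing program.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Anchor:
    seq1: int      # number of the first sequence
    seq2: int      # number of the second sequence
    start1: int    # start position in the first sequence
    start2: int    # start position in the second sequence
    length: int    # length of the gap-free anchored fragment
    score: float   # user-defined priority of the anchor


def parse_anchors(text: str) -> List[Anchor]:
    """Parse one anchor per non-empty line; raise on malformed lines."""
    anchors = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 6:
            raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
        s1, s2, b1, b2, n = map(int, fields[:5])
        anchors.append(Anchor(s1, s2, b1, b2, n, float(fields[5])))
    return anchors


example = """1 3 215 231 5 4.5
2 3 34 78 23 1.23
1 4 317 402 8 8.5"""
anchors = parse_anchors(example)
```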


Algorithmic questions

Requirements:

Anchor points need to be consistent! If necessary, select a consistent subset of the user-specified anchor points


Algorithmic questions

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa


Algorithmic questions

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Inconsistent anchor points!


Algorithmic questions

atctaat---agttaaactcccccgtgcttag

cagtgcgtgtattac-taacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Inconsistent anchor points!


Algorithmic questions

Requirements:

Anchor points need to be consistent! If necessary, select a consistent subset of the user-specified anchor points

Find an alignment under the constraints given by the anchor points!
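One generic way to test whether a set of anchors is consistent, meaning that some alignment can contain all of them at once, is sketched below. This is an illustrative graph-based check, not necessarily how any particular alignment program does it: anchored residues are merged into alignment columns with a union-find structure, and the columns must then admit a left-to-right order in every sequence (no cycle). The tuple layout `(seq1, seq2, start1, start2, length)` and the function name are assumptions.

```python
from collections import defaultdict


def is_consistent(anchors):
    """anchors: iterable of (seq1, seq2, start1, start2, length)."""
    parent = {}

    def find(site):
        parent.setdefault(site, site)
        while parent[site] != site:
            parent[site] = parent[parent[site]]  # path halving
            site = parent[site]
        return site

    # 1. Residues tied together by an anchor share one alignment column.
    for s1, s2, b1, b2, n in anchors:
        for k in range(n):
            ra, rb = find((s1, b1 + k)), find((s2, b2 + k))
            if ra != rb:
                parent[ra] = rb

    # 2. Within each sequence, columns must follow the residue order.
    by_seq = defaultdict(list)
    for seq, pos in parent:
        by_seq[seq].append(pos)
    succ = defaultdict(set)
    for seq, positions in by_seq.items():
        positions.sort()
        for p, q in zip(positions, positions[1:]):
            u, v = find((seq, p)), find((seq, q))
            if u == v:
                return False  # two residues of one sequence in one column
            succ[u].add(v)

    # 3. The column order must be acyclic (Kahn's topological sort).
    nodes = {find(site) for site in list(parent)}
    indegree = {u: 0 for u in nodes}
    for targets in list(succ.values()):
        for v in targets:
            indegree[v] += 1
    ready = [u for u in nodes if indegree[u] == 0]
    visited = 0
    while ready:
        u = ready.pop()
        visited += 1
        for v in succ.get(u, ()):
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return visited == len(nodes)


# Two crossing anchors between sequences 1 and 2 cannot both be respected.
crossing = [(1, 2, 0, 5, 1), (1, 2, 3, 2, 1)]
crossing_ok = is_consistent(crossing)
```

Because the check works on columns rather than on pairs of sequences, it also catches inconsistencies that only arise through a third sequence.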


Algorithmic questions

Use data structures from multiple alignment


Algorithmic questions

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa


Algorithmic questions

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Greedy procedure for multiple alignment


Algorithmic questions

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Question: which positions are still alignable?


Algorithmic questions

atctaatagttaaactcccccgtgcttag Si

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa x

For each position x and each sequence Si, there exist an upper bound ub(x,i) and a lower bound lb(x,i) on the residues y in Si that are alignable with x
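The bounds can be illustrated for the simplest case of just two sequences. The sketch below is a deliberate simplification: it derives lb and ub for one position x directly from a list of already accepted residue pairs and ignores the transitive constraints that arise through further sequences in a multiple alignment. Function and parameter names are hypothetical, and positions are 0-based.

```python
def alignable_bounds(x, len2, accepted):
    """Return (lb, ub): the range of positions y in sequence 2 that may still
    be aligned with position x of sequence 1, given accepted pairs (a, b).
    If lb > ub, no position of sequence 2 is alignable with x."""
    lb, ub = 0, len2 - 1
    for a, b in accepted:
        if a == x:
            return b, b            # x is already tied to b
        if a < x:
            lb = max(lb, b + 1)    # y must lie to the right of b
        else:
            ub = min(ub, b - 1)    # y must lie to the left of b
    return lb, ub


accepted_pairs = [(3, 4), (10, 12)]
bounds = alignable_bounds(5, 20, accepted_pairs)
```

With these two accepted pairs, position 5 of the first sequence can only be aligned with positions 5 to 11 of the second.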


Algorithmic questions

atctaatagttaaactcccccgtgcttag Si

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa x

ub(x,i) and lb(x,i) updated during greedy procedure


Algorithmic questions

atctaatagttaaactcccccgtgcttag Si

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa x

Initial values of lb(x,i), ub(x,i)


Algorithmic questions

atctaatagttaaactcccccgtgcttag Si

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa x

ub(x,i) and lb(x,i) updated during greedy procedure


Algorithmic questions

Anchor points treated like fragments in greedy algorithm:

Sorted according to user-defined scores

Accepted if consistent with previously accepted anchors

ub(x,i) and lb(x,i) updated during greedy procedure

Resulting values of ub(x,i) and lb(x,i) used as initial values for alignment procedure
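The greedy treatment of anchor points can be sketched for two sequences as follows. This is a simplified illustration under stated assumptions: anchors are tuples `(start1, start2, length, score)`, two anchors count as consistent only if one lies entirely before the other in both sequences, and the consistency test is done by direct comparison rather than by maintaining ub(x,i) and lb(x,i). The function name is hypothetical.

```python
def greedy_select(anchors):
    """Return a consistent subset of pairwise anchors, chosen greedily by score.
    anchors: list of (start1, start2, length, score)."""

    def before(f, g):
        # f ends before g starts, in both sequences
        return f[0] + f[2] <= g[0] and f[1] + f[2] <= g[1]

    accepted = []
    # highest user-defined score first
    for cand in sorted(anchors, key=lambda f: f[3], reverse=True):
        if all(before(cand, f) or before(f, cand) for f in accepted):
            accepted.append(cand)
    return sorted(accepted)


candidates = [(10, 12, 5, 4.5), (30, 8, 4, 1.2), (20, 25, 3, 8.5)]
selected = greedy_select(candidates)
```

In this made-up example the lowest-scoring anchor crosses the other two and is therefore rejected.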


Algorithmic questions

atctaatagttaaactcccccgtgcttag Si

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa x

Initial values of lb(x,i), ub(x,i) calculated using anchor points


Algorithmic questions

Anchor points can be ranked to prioritize them, e.g.

anchor points from verified homologies: higher priority

automatically created anchor points (using CHAOS, BLAST, …): lower priority


Application: Hox gene cluster

Use gene boundaries as anchor points

+ CHAOS / BLAST hits


Application: Hox gene cluster

                 no anchoring    anchoring
Ali. columns
  2 seq          2958            3674
  3 seq          668             1091
  4 seq          244             195
Score            1166            1007
CPU time         4:22            0:19


Application: Hox gene cluster

Example:

Teleost Hox gene cluster:

Score of anchored alignment 15% higher than score of non-anchored alignment!

Conclusion: Greedy optimization algorithm does a bad job!


Application: Improvement of Alignment programs

Two possible reasons for mis-alignments:

Wrong objective function: biologically correct alignment gets a bad numerical score

Bad optimization algorithm: biologically correct alignment gets the best numerical score, but the algorithm fails to find this alignment


Application: Improvement of Alignment programs

Two possible reasons for mis-alignments:

Anchored alignments can help to decide which of the two reasons applies
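The decision can be written down as a small comparison. The sketch assumes that the program's own output is known to be mis-aligned and that a biologically trusted alignment (for example one obtained by anchoring) has been scored with the same objective function; the function name and return strings are made up for illustration.

```python
def diagnose(score_program_output: float, score_trusted_alignment: float) -> str:
    """Classify the likely reason for a mis-alignment by comparing scores
    computed with the same objective function."""
    if score_trusted_alignment > score_program_output:
        # a better-scoring alignment exists, but the heuristic did not find it
        return "optimization algorithm"
    # the objective function itself prefers the wrong alignment
    return "objective function"


verdict = diagnose(score_program_output=100.0, score_trusted_alignment=115.0)
```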


Application: RNA alignment

aa----CCCC AGC---GUAa gucgcuaucc a

cacucuCCCA AGC---GGAG Aac------- -

ccg----CCA AaagauGGCG Acuuga---- -

non-anchored alignment


Application: RNA alignment

aa----CCCC AGC---GUAa gucgcuaucc a

cacucuCCCA AGC---GGAG Aac------- -

ccg----CCA AaagauGGCG Acuuga---- -

structural motif mis-aligned


Application: RNA alignment

aaCCCCAGCG UAAGUCGCUA UCca--

--CACUCUCC CAAGCGGAGA AC----

----CCGCCA AAAGAUGGCG ACuuga

3 conserved nucleotides as anchor points


WWW interface at GOBICS (Göttingen Bioinformatics Compute Server)


Gene predictions for eukaryotes

Goal: find location and structure of protein-coding genes in eukaryotic genome sequences.


Gene predictions for eukaryotes

attgccagtacgtagctagctacacgtatgctattacggatctgtagcttagcgtatctgtatgctgttagctgtacgtacgtatttttctagagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttctagagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagcttagtcgtgtagtcttgatctacgtacgtatttttctagagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttctagagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgtacgtacgtatttttctagagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttctagagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatggtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggctag


Gene predictions for eukaryotes

Three different approaches to computational gene-finding:

Intrinsic: use statistical information about known genes (Hidden Markov Models)

Extrinsic: compare genomic sequence with known proteins / genes

Cross-species sequence comparison: search for similarities among genomes


Hidden-Markov-Models (HMM) for gene prediction

Generative probabilistic model for a sequence of observations ("symbols").

Finite set of states

States can emit symbols

Transitions between states possible

Sequence generated by path between states


Hidden-Markov-Models (HMM) for gene prediction

Example: The occasionally dishonest casino.

3 5 6 6 6 4 6 5 1 6 5 1 2

F F U U U U U F F F F F F

Possible states:

fair (F); unfair (U); begin (B); end (E)


Hidden-Markov-Models (HMM) for gene prediction

Assumptions:

Emission probabilities known; depend only on current state.

Transition probabilities known, depend only on current state
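To illustrate the generative view, the following sketch samples a sequence of rolls from the casino model with states B, F, U and E. All numerical probabilities are assumptions chosen for illustration, since the slides do not give any; only the structure (emissions and transitions depend on the current state alone) follows the text.

```python
import random

# Hypothetical parameters; the slides do not give numerical values.
TRANSITIONS = {
    "B": {"F": 0.5, "U": 0.5},
    "F": {"F": 0.90, "U": 0.05, "E": 0.05},
    "U": {"F": 0.10, "U": 0.85, "E": 0.05},
}
EMISSIONS = {
    "F": {face: 1 / 6 for face in range(1, 7)},           # fair die
    "U": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},  # loaded die
}


def sample(rng):
    """Walk from B until E is reached; return (rolls, states)."""
    rolls, states = [], []
    state = "B"
    while True:
        nxt = TRANSITIONS[state]
        # the next state depends only on the current state
        state = rng.choices(list(nxt), weights=list(nxt.values()))[0]
        if state == "E":
            return rolls, states
        emit = EMISSIONS[state]
        states.append(state)
        # the emitted symbol depends only on the current state
        rolls.append(rng.choices(list(emit), weights=list(emit.values()))[0])


rolls, states = sample(random.Random(0))
```

An observer sees only `rolls`; the state path `states` is the hidden part of the model.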


Hidden-Markov-Models (HMM) for gene prediction

[Figure: state diagram with the states B (begin), F (fair), U (unfair) and E (end)]


Hidden-Markov-Models (HMM) for gene prediction

3 5 6 6 6 4 6 5 1 6 5 1 2 s

B F F U U U U U F F F F F F E φ

For sequence s and parse φ:

P(φ): probability of φ

P(φ,s): joint probability of φ and s, P(φ,s) = P(φ) * P(s|φ)

P(φ|s): a-posteriori probability of φ
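The factorisation P(φ,s) = P(φ) * P(s|φ) can be checked numerically for the sequence and parse shown on the slide. The sketch works in log space to avoid underflow; the transition and emission probabilities are assumed values, not taken from the slides, and the function name is hypothetical.

```python
import math

# Hypothetical parameters; the slides do not give numerical values.
TRANSITIONS = {
    "B": {"F": 0.5, "U": 0.5},
    "F": {"F": 0.90, "U": 0.05, "E": 0.05},
    "U": {"F": 0.10, "U": 0.85, "E": 0.05},
}
EMISSIONS = {
    "F": {face: 1 / 6 for face in range(1, 7)},
    "U": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}


def log_probabilities(rolls, parse):
    """Return (log P(phi), log P(s|phi), log P(phi, s)) for a parse that
    starts with B, has one state per roll, and ends with E."""
    if parse[0] != "B" or parse[-1] != "E" or len(parse) != len(rolls) + 2:
        raise ValueError("parse must be B, one state per roll, E")
    # P(phi): product of transition probabilities along the path
    log_p_parse = sum(math.log(TRANSITIONS[a][b]) for a, b in zip(parse, parse[1:]))
    # P(s|phi): product of emission probabilities of the visited states
    log_p_emit = sum(math.log(EMISSIONS[k][r]) for k, r in zip(parse[1:-1], rolls))
    return log_p_parse, log_p_emit, log_p_parse + log_p_emit


rolls = [3, 5, 6, 6, 6, 4, 6, 5, 1, 6, 5, 1, 2]
parse = list("BFFUUUUUFFFFFFE")
log_p_parse, log_p_emit, log_joint = log_probabilities(rolls, parse)
```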


Hidden-Markov-Models (HMM) for gene prediction

3 5 6 6 6 4 6 5 1 6 5 1 2

B F F U U U U U F F F F F F E

Goal: find path φ with maximum a-posteriori probability P(φ|s)

Idea: since P(φ|s) = P(φ,s) / P(s) and P(s) does not depend on φ, it suffices to find the path that maximizes the joint probability P(φ,s); this is done by dynamic programming (Viterbi algorithm)
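A compact sketch of the Viterbi recursion for the casino model follows. It computes, for every position and state, the best log joint probability of any path ending in that state, and then traces the best path back. The probabilities are assumed values for illustration and the function signature is hypothetical.

```python
import math

# Hypothetical parameters; the slides do not give numerical values.
TRANSITIONS = {
    "B": {"F": 0.5, "U": 0.5},
    "F": {"F": 0.90, "U": 0.05, "E": 0.05},
    "U": {"F": 0.10, "U": 0.85, "E": 0.05},
}
EMISSIONS = {
    "F": {face: 1 / 6 for face in range(1, 7)},
    "U": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}


def viterbi(rolls, states=("F", "U")):
    """Return (best state path, log joint probability) for a non-empty
    sequence of rolls."""
    # v[k]: best log probability of a path that ends in state k
    v = {k: math.log(TRANSITIONS["B"][k]) + math.log(EMISSIONS[k][rolls[0]])
         for k in states}
    backpointers = []
    for roll in rolls[1:]:
        ptr, new_v = {}, {}
        for k in states:
            best = max(states, key=lambda j: v[j] + math.log(TRANSITIONS[j][k]))
            ptr[k] = best
            new_v[k] = (v[best] + math.log(TRANSITIONS[best][k])
                        + math.log(EMISSIONS[k][roll]))
        backpointers.append(ptr)
        v = new_v
    # close the path with the transition into the end state
    last = max(states, key=lambda k: v[k] + math.log(TRANSITIONS[k]["E"]))
    best_log = v[last] + math.log(TRANSITIONS[last]["E"])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best_log


best_path, best_log_joint = viterbi([3, 5, 6, 6, 6, 4, 6, 5, 1, 6, 5, 1, 2])
```

The running time grows linearly with the sequence length and quadratically with the number of states, in contrast to enumerating all possible paths.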


Hidden-Markov-Models (HMM) for gene prediction

Application to gene prediction:

A T A A T G C C T A G T C    s (DNA)
Z Z Z E E E E E E I I I I    φ (parse)

Introns, exons etc. modeled as states in a GHMM ("generalized HMM")

Given sequence s, find parse that maximizes P(φ|s)

(S. Karlin and C. Burge, 1997)


AUGUSTUS

Basic model for GHMM-based intrinsic gene finding comparable to GenScan (M. Stanke)


AUGUSTUS

Features of AUGUSTUS:

Intron length model

Initial pattern for exons

Similarity-based weighting for splice sites

Interpolated HMM

Internal 3’ content model


Hidden-Markov-Models (HMM) for gene prediction

A T A A T G C C T A G T C    s (DNA)
Z Z Z E E E E I I I I    φ (parse)

Explicit intron length model computationally expensive.


AUGUSTUS

Intron length model:

• Explicit length distribution for short introns
• Geometric tail for long introns

[Figure: state diagram with exon states and the intron states Intron (fixed), Intron (expl.) and Intron (geo.)]
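A length distribution of this mixed kind can be sketched as follows. A geometric tail is what a single state with a self-transition produces, which is cheap to compute, while explicit probabilities are kept only for short lengths. The parameterisation below (a list of explicit probabilities plus a continuation probability q) is an assumption made for illustration and is not taken from AUGUSTUS.

```python
def intron_length_prob(length, explicit, q):
    """P(intron length == length).

    explicit[l - 1] is the probability of length l for l = 1..d, where
    d = len(explicit). The remaining mass 1 - sum(explicit) is spread
    geometrically over lengths greater than d, with continuation
    probability q for each additional base."""
    d = len(explicit)
    if length < 1:
        return 0.0
    if length <= d:
        return explicit[length - 1]
    tail_mass = 1.0 - sum(explicit)
    return tail_mass * (1.0 - q) * q ** (length - d - 1)


# Hypothetical toy values: three explicit lengths, then a geometric tail
explicit_probs = [0.1, 0.2, 0.3]
p4 = intron_length_prob(4, explicit_probs, q=0.5)
total = sum(intron_length_prob(l, explicit_probs, q=0.5) for l in range(1, 200))
```

The probabilities over all lengths sum to one, because the geometric series over the tail adds up exactly to the mass left over by the explicit part.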


AUGUSTUS

Extension of AUGUSTUS to include extrinsic information:

Protein sequences

EST sequences

Syntenic genomic sequences

User-defined constraints


Gene prediction by phylogenetic footprinting

Comparison of genomic sequences (human and mouse)


catcatatcttatcttacgttaactcccccgt

cagtgcgtgatagcccatatccgg

Gene prediction by phylogenetic footprinting


catcatatcttatcttacgttaactcccccgt

cagtgcgtgatagcccatatccgg

Standard score: consider the fragment length and the number of matches, then compute the probability of random occurrence.
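The standard score can be illustrated with a simple binomial model. This is a sketch under stated assumptions, not the DIALIGN implementation: positions are treated as independent, each position matches by chance with probability `p_match = 0.25` (uniform base composition), and both function names are hypothetical.

```python
from math import comb


def count_matches(seg1, seg2):
    """Number of identical positions in two equal-length segments."""
    if len(seg1) != len(seg2):
        raise ValueError("fragment segments must have equal length")
    return sum(a == b for a, b in zip(seg1, seg2))


def random_occurrence_probability(length, matches, p_match=0.25):
    """P(at least `matches` matches among `length` independent positions)."""
    return sum(
        comb(length, k) * p_match**k * (1.0 - p_match) ** (length - k)
        for k in range(matches, length + 1)
    )


# Two length-12 segments taken from the example sequences
seg1, seg2 = "ttatcttacgtt", "atagcccatatc"
matches = count_matches(seg1, seg2)
p_random = random_occurrence_probability(len(seg1), matches)
```

A small probability means the similarity is unlikely to be a chance event, so the fragment deserves a high score.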


Translation option:

           L   S   Y   V
catcatatc tta tct tac gtt aactcccccgt
cagtgcgtg ata gcc cat atc cgg
           I   A   H   I

DNA segments translated to peptide segments; fragment score based on peptide similarity:

Calculate probability of finding a fragment of the same length with (at least) the same sum of BLOSUM values
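The translation step and the sum of BLOSUM values for the example above can be reproduced as follows. This is a sketch: the codon table and the substitution scores contain only the entries needed for this one example (the scores are the BLOSUM62 values for these pairs; the slides do not say which BLOSUM matrix is used), and the function names are hypothetical.

```python
# Only the codons that occur in the example fragment
CODON_TABLE = {
    "tta": "L", "tct": "S", "tac": "Y", "gtt": "V",
    "ata": "I", "gcc": "A", "cat": "H", "atc": "I",
}

# BLOSUM62 entries for the four aligned residue pairs of the example
BLOSUM62_SUBSET = {("L", "I"): 2, ("S", "A"): 1, ("Y", "H"): 2, ("V", "I"): 3}


def translate(dna):
    """Translate a DNA segment codon by codon (length must be a multiple of 3)."""
    if len(dna) % 3 != 0:
        raise ValueError("segment length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def peptide_similarity(pep1, pep2):
    """Sum of substitution scores over the aligned residue pairs."""
    total = 0
    for a, b in zip(pep1, pep2):
        pair = (a, b) if (a, b) in BLOSUM62_SUBSET else (b, a)
        total += BLOSUM62_SUBSET[pair]  # KeyError for pairs outside the subset
    return total


pep1 = translate("ttatcttacgtt")
pep2 = translate("atagcccatatc")
score_sum = peptide_similarity(pep1, pep2)
```

The fragment score then depends on how likely it is that a random fragment of the same length reaches at least `score_sum`; estimating that probability is not shown here.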


P-fragment (in both orientations)

           L   S   Y   V
catcatatc tta tct tac gtt aactcccccgt
cagtgcgtg ata gcc cat atc cgg
           I   A   H   I

N-fragment

catcatatc ttatcttacgtt aactcccccgtgct
           || |  |  |
cagtgcgtg atagcccatatc cg

For each fragment f, three probability values are calculated; the score of f is based on the smallest P value.
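Choosing the smallest of the three P values can be sketched as below. Two things are assumptions here and not stated on the slide: that the three values are those of the N-fragment and of the P-fragment in each orientation, and that the score is the negative logarithm of the smallest value. The function name is hypothetical.

```python
from math import log


def fragment_score(p_values):
    """Score of a fragment from its three probability values.

    The smallest P value (the least likely chance event) decides the score;
    taking -log of it is an assumption made for this sketch.
    """
    if len(p_values) != 3:
        raise ValueError("expected exactly three probability values")
    return -log(min(p_values))


# Hypothetical P values: N-fragment, P-fragment forward, P-fragment reverse
score = fragment_score([0.5, 0.01, 0.2])
```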


P-fragments associated with strand and reading frame!


AGenDA: Alignment-based Gene Detection Algorithm


Fragments in DIALIGN alignment


Build cluster of fragments


Identify conserved splice sites


• Candidate exons bounded by conserved splice sites
• Find optimal chain of candidate exons
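Finding an optimal chain of candidate exons can be sketched as a dynamic program over non-overlapping intervals. This is an illustration, not the AGenDA implementation: the scores are made up, and reading-frame and splice-site compatibility between consecutive exons are ignored.

```python
from bisect import bisect_right


def best_exon_chain(candidates):
    """Return (total score, chain) for the best set of non-overlapping exons.

    candidates: list of (start, end, score) with inclusive coordinates.
    """
    exons = sorted(candidates, key=lambda e: e[1])  # sort by end position
    ends = [e[1] for e in exons]
    best = [(0.0, [])]  # best[k]: optimum using only the first k exons
    for k, (start, end, score) in enumerate(exons):
        # j = number of earlier exons that end before this one starts
        j = bisect_right(ends, start - 1, 0, k)
        take = best[j][0] + score
        if take > best[k][0]:
            best.append((take, best[j][1] + [(start, end, score)]))
        else:
            best.append(best[k])
    return best[-1]


candidates = [(10, 50, 3.0), (40, 90, 5.0), (100, 150, 1.0), (60, 120, 4.0)]
total, chain = best_exon_chain(candidates)
```

In this toy example the single best exon (score 5.0) is not part of the optimal chain, because two compatible exons together score higher.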


[Figure: bar chart of sensitivity and specificity (0% to 100%) for AGenDA and GenScan]


[Figure: comparison of AGenDA and GenScan with the values 64 %, 12 % and 17 %]


AUGUSTUS+

Extended GHMM using extrinsic information

Additional input data: a collection h of "hints" about the possible gene structure φ for sequence s

Consider s, φ and h as the result of a random process and define the probability P(s,h,φ)

Find the parse φ that maximizes P(φ|s,h) for given s and h.


Hints created using

• Alignments to EST sequences
• Alignments to protein sequences
• Combined EST and protein alignment (EST alignments supported by protein alignments)
• Alignments of genomic sequences
• User-defined hints


Alignment to EST: hint to (partial) exon

[Figure: EST aligned to sequence G1]


EST alignment supported by protein: hint to exon (part), start codon

[Figure: EST and protein aligned to sequence G1]


Alignment to ESTs, Proteins: hints to introns, exons

[Figure: ESTs and protein aligned to sequence G1]


Alignment of genomic sequences: hint to (partial) exon

[Figure: sequence G2 aligned to sequence G1]


Consider different types of hints:

• Types of hints: start, stop, dss (donor splice site), ass (acceptor splice site), exonpart, exon, intron
• Each hint is associated with a position i in s (exons etc. are associated with their right end position)
• At most one hint of each type is allowed per position in s
• Each hint is associated with a grade g that indicates its source or reliability


h_{i,t} = information about the hint of type t at position i

h_{i,t} = $ if no hint of type t is available at i

h_{i,t} = [grade, strand, (length, reading frame)] if a hint is available

(hints created by protein alignments or DIALIGN contain information about the reading frame)
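One possible in-memory representation of such a hint collection is sketched below. The class names, field names and grade labels are hypothetical and do not reflect the AUGUSTUS source code; `None` plays the role of the symbol $ (no hint).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

HINT_TYPES = ("start", "stop", "dss", "ass", "exonpart", "exon", "intron")


@dataclass(frozen=True)
class Hint:
    grade: str                           # source or reliability of the hint
    strand: str                          # "+" or "-"
    length: Optional[int] = None         # only for hints that span a region
    reading_frame: Optional[int] = None  # only if the source provides it


class HintCollection:
    """Maps (position i, type t) to at most one hint."""

    def __init__(self):
        self._hints: Dict[Tuple[int, str], Hint] = {}

    def add(self, position, hint_type, hint):
        if hint_type not in HINT_TYPES:
            raise ValueError(f"unknown hint type: {hint_type}")
        key = (position, hint_type)
        if key in self._hints:
            raise ValueError("at most one hint of each type per position")
        self._hints[key] = hint

    def get(self, position, hint_type):
        """Return the hint h_{i,t}, or None if no hint is available."""
        return self._hints.get((position, hint_type))


hints = HintCollection()
hints.add(8, "exonpart", Hint(grade="EST", strand="+"))
```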


Standard program version, without hints

A T A A T G C C T A G T C   s (sequence)
Z Z Z E E E E E E I I I I   φ (parse)

Find parse that maximizes P(φ|s)
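Finding the most probable parse can be illustrated with the Viterbi algorithm on a plain HMM with one state per base. This is a simplified sketch: AUGUSTUS uses a generalized HMM, and the three states (Z, E, I, as in the parse shown above) and all probabilities below are toy assumptions. Because P(s) is fixed, maximizing P(φ|s) is the same as maximizing the joint probability P(φ,s), which is what the code does.

```python
from math import log

STATES = ("Z", "E", "I")

# Toy parameters, chosen only for illustration
START = {"Z": 0.8, "E": 0.1, "I": 0.1}
TRANS = {
    "Z": {"Z": 0.90, "E": 0.09, "I": 0.01},
    "E": {"Z": 0.05, "E": 0.85, "I": 0.10},
    "I": {"Z": 0.01, "E": 0.09, "I": 0.90},
}
EMIT = {
    "Z": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "E": {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
    "I": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}


def joint_log_prob(parse, seq):
    """log P(parse, seq) for a parse with one state per base."""
    lp = log(START[parse[0]]) + log(EMIT[parse[0]][seq[0]])
    for prev, cur, base in zip(parse, parse[1:], seq[1:]):
        lp += log(TRANS[prev][cur]) + log(EMIT[cur][base])
    return lp


def viterbi(seq):
    """Return (best parse, log P(parse, seq)) by dynamic programming."""
    score = {q: log(START[q]) + log(EMIT[q][seq[0]]) for q in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointer = {}, {}
        for q in STATES:
            best_prev = max(STATES, key=lambda r: score[r] + log(TRANS[r][q]))
            new_score[q] = (score[best_prev] + log(TRANS[best_prev][q])
                            + log(EMIT[q][base]))
            pointer[q] = best_prev
        score = new_score
        backpointers.append(pointer)
    state = max(STATES, key=lambda q: score[q])
    best_score = score[state]
    parse = [state]
    for pointer in reversed(backpointers):  # trace back from the last base
        state = pointer[state]
        parse.append(state)
    return "".join(reversed(parse)), best_score


parse, log_p = viterbi("ATAATGCCTAGTC")
```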


AUGUSTUS+ using hints

A T A A T G C C T A G T C   s (sequence)
$ $ $ $ $ $ $ X $ $ $ $ $   h (type 1)
$ $ $ $ $ $ $ $ $ $ $ $ $   h (type 2)
$ $ $ $ X $ $ $ $ $ $ $ $   h (type 3)
. . . .
Z Z Z E E E E E E I I I I   φ (parse)

Find parse that maximizes P(φ|s,h)


As in standard HMM theory: maximize the joint probability P(φ,s,h). Since P(s,h) does not depend on φ, this is equivalent to maximizing P(φ|s,h).

How to define P(φ,s,h)?


General assumption: hints of different types t and at different positions i are independent of each other (for redundant hints, ignore the "weaker" types).


\[
P(\varphi, s, h) = P(\varphi, s)\, P(h \mid \varphi, s) = P(\varphi, s) \prod_{i,t} P(h_{i,t} \mid \varphi, s)
\]


Assumption: P(h_{i,t} | φ,s) depends on the type t, the grade g, and whether h_{i,t} is compatible with φ or s.

Example: h_{i,t} is a hint to an exon E

h_{i,t} is compatible with the parse φ if E is part of φ.

h_{i,t} is compatible with the sequence s if start and stop codons exist according to E and if E contains no internal stop codon.


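For given g and t, there are 3 possible values for P(h_{i,t} | φ,s):

\[
P(h_{i,t} \mid \varphi, s) =
\begin{cases}
q^{+}(t,g) & \text{if } h_{i,t} \text{ is compatible with } \varphi \\
q^{-}(t,g) & \text{if } h_{i,t} \text{ is compatible with } s \text{ but not with } \varphi \\
0 & \text{if } h_{i,t} \text{ is not compatible with } s
\end{cases}
\]

Values learned from training data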
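The three-valued hint probability, and the way it enters the joint probability under the independence assumption, can be sketched as follows. The function names and all numbers are hypothetical; in practice q+ and q- would be looked up per type t and grade g from trained values.

```python
from math import inf, log


def hint_probability(compatible_with_parse, compatible_with_sequence,
                     q_plus, q_minus):
    """P(h_{i,t} | parse, s) for one hint with given q+(t,g) and q-(t,g)."""
    if not compatible_with_sequence:
        return 0.0
    return q_plus if compatible_with_parse else q_minus


def log_joint(log_p_parse_seq, hint_probabilities):
    """log P(parse, s, h) = log P(parse, s) + sum of log P(h_{i,t} | parse, s).

    hint_probabilities must contain one value per (position, type) pair,
    including the pairs where no hint is available.
    """
    total = log_p_parse_seq
    for p in hint_probabilities:
        if p == 0.0:
            return -inf  # the parse is impossible given this hint
        total += log(p)
    return total


supported = hint_probability(True, True, q_plus=0.3, q_minus=0.01)
unsupported = hint_probability(False, True, q_plus=0.3, q_minus=0.01)
```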


Results:

Gene (sub-)structures supported by hints receive a bonus compared to non-supported structures

Gene (sub-)structures not supported by hints receive a penalty (malus)

(M. Stanke et al. 2006, BMC Bioinformatics)


h, h' collections of hints;

h'_{i,t} = h_{i,t} for (i,t) ≠ (I,T)

h'_{I,T} ≠ h_{I,T} = $; g grade of h'_{I,T}

φ+, φ- gene structures on s

h'_{I,T} compatible with φ+, but not with φ-


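\[
\begin{aligned}
\frac{P(\varphi^{+} \mid s, h')}{P(\varphi^{-} \mid s, h')}
&= \frac{P(\varphi^{+}, s, h')}{P(\varphi^{-}, s, h')}
 = \frac{P(\varphi^{+}, s)\, P(h' \mid \varphi^{+}, s)}{P(\varphi^{-}, s)\, P(h' \mid \varphi^{-}, s)} \\
&= \frac{P(\varphi^{+}, s) \prod_{i,t} P(h'_{i,t} \mid \varphi^{+}, s)}{P(\varphi^{-}, s) \prod_{i,t} P(h'_{i,t} \mid \varphi^{-}, s)} \\
&= \frac{P(\varphi^{+}, s) \prod_{i,t} P(h_{i,t} \mid \varphi^{+}, s) \cdot \dfrac{P(h'_{I,T} \mid \varphi^{+}, s)}{P(h_{I,T} \mid \varphi^{+}, s)}}{P(\varphi^{-}, s) \prod_{i,t} P(h_{i,t} \mid \varphi^{-}, s) \cdot \dfrac{P(h'_{I,T} \mid \varphi^{-}, s)}{P(h_{I,T} \mid \varphi^{-}, s)}}
\end{aligned}
\]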


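\[
\begin{aligned}
&\frac{P(\varphi^{+}, s) \prod_{i,t} P(h_{i,t} \mid \varphi^{+}, s) \cdot \dfrac{P(h'_{I,T} \mid \varphi^{+}, s)}{P(h_{I,T} \mid \varphi^{+}, s)}}{P(\varphi^{-}, s) \prod_{i,t} P(h_{i,t} \mid \varphi^{-}, s) \cdot \dfrac{P(h'_{I,T} \mid \varphi^{-}, s)}{P(h_{I,T} \mid \varphi^{-}, s)}} \\
&\quad = \frac{P(\varphi^{+}, s)\, P(h \mid \varphi^{+}, s) \cdot \dfrac{q^{+}(T,g)}{P(h_{I,T} = \$ \mid \varphi^{+}, s)}}{P(\varphi^{-}, s)\, P(h \mid \varphi^{-}, s) \cdot \dfrac{q^{-}(T,g)}{P(h_{I,T} = \$ \mid \varphi^{-}, s)}}
\end{aligned}
\]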


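\[
\frac{P(\varphi^{+}, s)\, P(h \mid \varphi^{+}, s) \cdot \dfrac{q^{+}(T,g)}{P(h_{I,T} = \$ \mid \varphi^{+}, s)}}{P(\varphi^{-}, s)\, P(h \mid \varphi^{-}, s) \cdot \dfrac{q^{-}(T,g)}{P(h_{I,T} = \$ \mid \varphi^{-}, s)}}
= \frac{P(\varphi^{+} \mid s, h)}{P(\varphi^{-} \mid s, h)} \cdot \frac{q^{+}(T,g)\, P(h_{I,T} = \$ \mid \varphi^{-}, s)}{q^{-}(T,g)\, P(h_{I,T} = \$ \mid \varphi^{+}, s)}
\]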


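Result:

\[
\frac{P(\varphi^{+} \mid s, h')}{P(\varphi^{-} \mid s, h')}
= \frac{P(\varphi^{+} \mid s, h)}{P(\varphi^{-} \mid s, h)} \cdot \frac{q^{+}(T,g)\, P(h_{I,T} = \$ \mid \varphi^{-}, s)}{q^{-}(T,g)\, P(h_{I,T} = \$ \mid \varphi^{+}, s)}
\]

i.e. structure φ+, which is compatible with the additional hint h'_{I,T}, receives the relative bonus

\[
\frac{q^{+}(T,g)\, P(h_{I,T} = \$ \mid \varphi^{-}, s)}{q^{-}(T,g)\, P(h_{I,T} = \$ \mid \varphi^{+}, s)}
\]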
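A small numeric illustration of the relative bonus follows. All four input values are invented for the example (they are not trained AUGUSTUS parameters), and the function names are hypothetical.

```python
def relative_bonus(q_plus, q_minus, p_no_hint_plus, p_no_hint_minus):
    """Factor by which one additional hint changes the odds of phi+ over phi-.

    q_plus, q_minus  : q+(T,g) and q-(T,g) for the additional hint
    p_no_hint_plus   : P(h_{I,T} = $ | phi+, s)
    p_no_hint_minus  : P(h_{I,T} = $ | phi-, s)
    """
    return (q_plus * p_no_hint_minus) / (q_minus * p_no_hint_plus)


def updated_odds(odds_without_hint, bonus):
    """Odds P(phi+ | s, h') / P(phi- | s, h') after adding the hint."""
    return odds_without_hint * bonus


bonus = relative_bonus(q_plus=0.3, q_minus=0.01,
                       p_no_hint_plus=0.6, p_no_hint_minus=0.98)
odds = updated_odds(0.5, bonus)
```

With these toy values the structure supported by the hint becomes the more probable one, although it was the less probable one without the hint.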


Results (gene level) on data set sag178

Method          % SN   % SP
Augustus          42     38
GenScan           18     14
GeneID            17     17
HMMGene           20      7
Aug. + EST        49     46
Aug. + prot       71     68
Aug. combined     68     65
Aug. all          82     79
GenomeScan        37     38
TwinScan          20     25


Using hints from DIALIGN alignments:

1. Obtain large human/mouse sequence pairs (up to 50 kb) from UCSC
2. Run CHAOS to find anchor points
3. Run DIALIGN using CHAOS anchor points
4. Create hints h from DIALIGN fragments
5. Run AUGUSTUS with hints


Hints from DIALIGN fragments:

The segment covered by a peptide fragment, minus 33 bp at both ends, defines an exonpart hint on all 6 reading frames.
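The trimming rule can be written as a small helper. This is a sketch with a hypothetical function name; coordinates are assumed to be inclusive, and the handling of reading frames is left out.

```python
def exonpart_hint_interval(fragment_start, fragment_end, trim=33):
    """Interval of the exonpart hint derived from a peptide fragment.

    Removes `trim` bp at both ends of the segment covered by the fragment.
    Returns None if nothing is left after trimming.
    """
    hint_start = fragment_start + trim
    hint_end = fragment_end - trim
    if hint_start > hint_end:
        return None
    return hint_start, hint_end


interval = exonpart_hint_interval(100, 300)
```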


Hints from DIALIGN fragments:

Consider fragments with score ≥ 20

• Distinguish high scores (≥ 45) from low scores
• Consider reading frame given by DIALIGN
• Consider strand given by DIALIGN

=> 2*2*2 = 8 grades


AUGUSTUS was the best ab initio method at EGASP (ENCODE Genome Annotation Assessment Project)


EGASP test results

[Figure: bar chart, nucleotide level: sensitivity and specificity (scale 0 to 100) for AUGUSTUS, GENSCAN, geneid, GeneMark.hmm and Genezilla]


[Figure: bar chart, exon level: sensitivity and specificity (scale 0 to 100) for AUGUSTUS, GENSCAN, geneid, GeneMark.hmm and Genezilla]


[Figure: bar chart, transcript level: sensitivity and specificity (scale 0 to 30) for AUGUSTUS, GENSCAN, geneid, GeneMark.hmm and Genezilla]


[Figure: bar chart, gene level: sensitivity and specificity (scale 0 to 30) for AUGUSTUS, GENSCAN, geneid, GeneMark.hmm and Genezilla]


EGASP test results

[Figure: accuracy (0% to 100%), Sn and Sp at base, exon, transcript and gene level, for AUGUSTUS, AUGUSTUS+DIALIGN, DOGFISH-C, SGP2, TWINSCAN, TWINSCAN-MARS and N-SCAN]


Ongoing projects

Brugia malayi (TIGR)

Aedes aegypti (TIGR)

Schistosoma mansoni (TIGR)

Tetrahymena thermophila (TIGR)

Galdieria sulphuraria (Michigan State Univ.)

Coprinus cinereus (Univ. Göttingen)

Tribolium castaneum (Univ. Göttingen)