class 5: multiple sequence alignment


Upload: ciaran-william

Post on 30-Dec-2015


TRANSCRIPT

Page 1: Class 5: Multiple Sequence Alignment

Class 5: Multiple Sequence Alignment

Page 2: Class 5: Multiple Sequence Alignment

Multiple sequence alignment

VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWESNG--

Homologous residues are aligned together in columns
Homologous: in both the structural and the evolutionary sense

Ideally, the residues in an aligned column occupy similar 3D structural positions

Page 3: Class 5: Multiple Sequence Alignment

Multiple alignment – why?

Identify sequences that belong to a family
Family: a collection of homologous sequences with similar sequence, 3D structure, function, or evolutionary history

Find features that are conserved across the whole family
Highly conserved regions, core structural elements

Page 4: Class 5: Multiple Sequence Alignment

The relation between the divergence of sequence and structure

[Durbin p. 137, redrawn from data in Chothia and Lesk (1986)]

Page 5: Class 5: Multiple Sequence Alignment

Scoring a multiple alignment (1)

Important features of multiple alignment:
Some positions are more conserved than others, which calls for position-specific scoring
Sequences are not independent (they are related by a phylogenetic tree)

Ideally, specify a complete model of molecular sequence evolution

Page 6: Class 5: Multiple Sequence Alignment

Scoring a multiple alignment (2)

Unfortunately, not enough data …

Assumption (1): Columns of the alignment are statistically independent.

$$S(m) = G + \sum_i S(m_i)$$

$m_i = (m_i^1, m_i^2, \ldots, m_i^N)$: column $i$ of alignment $m$
$S(m_i)$: score for column $i$
$G$: gap scoring function

Page 7: Class 5: Multiple Sequence Alignment

Minimum entropy

Assumption (2): Symbols within columns are independent.

$c_{ia} = \sum_j \delta(m_i^j = a)$: observed count of symbol $a$ in column $i$
$p_{ia}$: the probability of symbol $a$ in column $i$

$$P(m_i) = \prod_a p_{ia}^{c_{ia}}$$

$$S(m_i) = -\sum_a c_{ia} \log p_{ia}$$

$S(m_i)$ is an entropy measure of column $i$
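A minimal Python sketch of the minimum-entropy column score, S(m_i) = -sum_a c_ia log p_ia. Several choices here are assumptions rather than part of the slides: the probabilities p_ia are estimated as the observed frequencies c_ia / total, gap characters are ignored, and logarithms are base 2. The function names are hypothetical.

```python
import math
from collections import Counter


def min_entropy_column_score(column):
    """Return -sum_a c_a * log2(p_a) for one alignment column.

    Assumptions (not from the slides): p_a is the observed frequency
    c_a / total, and gap characters '-' are skipped.
    """
    counts = Counter(ch for ch in column if ch != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c * math.log2(c / total) for c in counts.values())


def min_entropy_alignment_score(rows):
    """Sum the column scores, relying on the column-independence assumption."""
    return sum(min_entropy_column_score(col) for col in zip(*rows))
```

A fully conserved column such as "CCCC" scores 0, and more variable columns score higher, so under this convention a good alignment is one that minimises the total.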

Page 8: Class 5: Multiple Sequence Alignment

Sum of pairs (SP)

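Columns are scored by a “sum of pairs” function, using a substitution scoring matrix $s(a, b)$:

$$S(m_i) = \sum_{k<l} s(m_i^k, m_i^l)$$

Note: the sum of the pairwise log-odds scores is not, in general, the three-way log-odds score:

$$\log\frac{p_{abc}}{q_a q_b q_c} \neq \log\frac{p_{ab}}{q_a q_b} + \log\frac{p_{ac}}{q_a q_c} + \log\frac{p_{bc}}{q_b q_c}$$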
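The sum-of-pairs column score can be sketched in a few lines of Python. The scoring function `toy_score` (match +1, mismatch -1, residue against gap -2, gap against gap 0) is a hypothetical stand-in for a real substitution matrix such as BLOSUM or PAM, and the function names are assumptions.

```python
from itertools import combinations


def toy_score(a, b):
    """Hypothetical pair score standing in for a real substitution matrix."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1


def sp_column_score(column, score=toy_score):
    """Sum the pair score over all unordered pairs k < l in one column."""
    return sum(score(a, b) for a, b in combinations(column, 2))


def sp_alignment_score(rows, score=toy_score):
    """Sum the SP column scores over all columns of the alignment."""
    return sum(sp_column_score(col, score) for col in zip(*rows))
```

For the column "AAC" the three pairs contribute +1, -1 and -1, giving an SP score of -1.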

Page 9: Class 5: Multiple Sequence Alignment

Multidimensional DP

$$S(m) = \sum_i S(m_i)$$

Page 10: Class 5: Multiple Sequence Alignment

Multidimensional DP

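$\alpha_{i_1, i_2, \ldots, i_N}$: maximum score of an alignment of the prefixes ending at $x^1_{i_1}, x^2_{i_2}, \ldots, x^N_{i_N}$

$$\alpha_{i_1, i_2, \ldots, i_N} = \max \begin{cases}
\alpha_{i_1-1,\, i_2-1,\, \ldots,\, i_N-1} + S(x^1_{i_1}, x^2_{i_2}, \ldots, x^N_{i_N}) \\
\alpha_{i_1,\, i_2-1,\, \ldots,\, i_N-1} + S(-, x^2_{i_2}, \ldots, x^N_{i_N}) \\
\alpha_{i_1-1,\, i_2,\, \ldots,\, i_N-1} + S(x^1_{i_1}, -, \ldots, x^N_{i_N}) \\
\quad\vdots \\
\alpha_{i_1-1,\, i_2-1,\, \ldots,\, i_N} + S(x^1_{i_1}, x^2_{i_2}, \ldots, -) \\
\alpha_{i_1,\, i_2,\, i_3-1,\, \ldots,\, i_N-1} + S(-, -, x^3_{i_3}, \ldots, x^N_{i_N}) \\
\quad\vdots
\end{cases}$$

All $2^N - 1$ combinations of gaps appear, except the one in which every residue is replaced by a gap.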

Page 11: Class 5: Multiple Sequence Alignment

Multidimensional DP

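$\Delta_i \cdot x = x$ if $\Delta_i = 1$, and $\Delta_i \cdot x = -$ (a gap) if $\Delta_i = 0$

$$\alpha_{i_1, i_2, \ldots, i_N} = \max_{\Delta_1 + \cdots + \Delta_N > 0} \left\{ \alpha_{i_1-\Delta_1,\, i_2-\Delta_2,\, \ldots,\, i_N-\Delta_N} + S(\Delta_1 \cdot x^1_{i_1}, \Delta_2 \cdot x^2_{i_2}, \ldots, \Delta_N \cdot x^N_{i_N}) \right\}$$

Complexity

Space: $O\left(\prod_{i=1}^{N} L_i\right)$
Time: $O\left(2^N \prod_{i=1}^{N} L_i\right)$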
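The multidimensional recurrence can be sketched directly in Python for small inputs. This is an illustration under stated assumptions, not an efficient implementation: the column score is a sum of pairs over a hypothetical toy scoring function (match +1, mismatch -1, residue against gap -2, gap against gap 0), the table is a dictionary keyed by index tuples, and each 0/1 vector in `moves` says which sequences contribute a residue (1) or a gap (0) to the new column.

```python
from itertools import combinations, product


def toy_score(a, b):
    """Hypothetical pair score; a real substitution matrix would go here."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1


def sp_column(col):
    """Sum-of-pairs score of one column."""
    return sum(toy_score(a, b) for a, b in combinations(col, 2))


def msa_dp(seqs):
    """Exact N-dimensional DP; returns (best score, aligned rows)."""
    n = len(seqs)
    lengths = [len(s) for s in seqs]
    # All 2^N - 1 non-zero 0/1 vectors: 1 = take a residue, 0 = insert a gap.
    moves = [d for d in product((0, 1), repeat=n) if any(d)]
    origin = (0,) * n
    alpha = {origin: 0}
    back = {}
    # Lexicographic order guarantees every predecessor is filled in first.
    for cell in product(*(range(length + 1) for length in lengths)):
        if cell == origin:
            continue
        best = None
        for d in moves:
            prev = tuple(i - di for i, di in zip(cell, d))
            if min(prev) < 0:
                continue
            col = tuple(seqs[k][cell[k] - 1] if d[k] else "-" for k in range(n))
            cand = alpha[prev] + sp_column(col)
            if best is None or cand > best:
                best = cand
                back[cell] = (prev, col)
        alpha[cell] = best
    # Trace back from the corner cell to recover the columns.
    cols = []
    cell = tuple(lengths)
    while cell != origin:
        cell, col = back[cell]
        cols.append(col)
    cols.reverse()
    rows = ["".join(c[k] for c in cols) for k in range(n)]
    return alpha[tuple(lengths)], rows
```

The nested loops visit every cell of the N-dimensional table and try every non-zero move at each one, which is where the product of sequence lengths and the factor 2^N in the complexity come from.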

Page 12: Class 5: Multiple Sequence Alignment

Pairwise projections of MA
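As a small illustration of a pairwise projection, the sketch below extracts the rows for two sequences from a multiple alignment and drops the columns where both rows have a gap. The function name `project_pair` and the use of '-' as the gap character are assumptions.

```python
def project_pair(rows, k, l):
    """Project a multiple alignment onto sequences k and l.

    Columns in which both selected rows hold a gap carry no
    information about the pair and are removed.
    """
    kept = [(a, b) for a, b in zip(rows[k], rows[l])
            if not (a == "-" and b == "-")]
    row_k = "".join(a for a, _ in kept)
    row_l = "".join(b for _, b in kept)
    return row_k, row_l
```

For the alignment rows "A-C", "A-G", "AT-", projecting onto the first two rows gives "AC" and "AG", while projecting onto the first and third keeps all three columns.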

Page 13: Class 5: Multiple Sequence Alignment

MSA (i)

[Carrillo and Lipman, 1988]

$a^{kl}$: pairwise alignment between sequences $k$, $l$ (the projection of the multiple alignment $a$)
$\hat{a}^{kl}$: optimal pairwise alignment of $k$, $l$

$$S(a^{kl}) \le S(\hat{a}^{kl})$$

$\ell(a)$: lower bound on the optimal multiple alignment score

$$\ell(a) \le S(a)$$

Page 14: Class 5: Multiple Sequence Alignment

MSA (ii)

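$$\ell(a) \le S(a^{kl}) - S(\hat{a}^{kl}) + \sum_{k'<l'} S(\hat{a}^{k'l'})$$

$$S(a^{kl}) \ge \beta^{kl} = \ell(a) + S(\hat{a}^{kl}) - \sum_{k'<l'} S(\hat{a}^{k'l'})$$

$B^{kl}$: the set of pairs $(i_k, i_l)$ such that the best alignment of $x^k$, $x^l$ through $(i_k, i_l)$ scores at least $\beta^{kl}$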

Page 15: Class 5: Multiple Sequence Alignment

MSA (iii)

Algorithm sketch

1. Calculate $\ell(a)$ and $\hat{a}^{kl}$ for all pairs $k$, $l$
2. Find $B^{kl}$
3. Use multidimensional DP to evaluate only cells $(i_1, i_2, \ldots, i_N)$ for which $(i_k, i_l) \in B^{kl}$ for all $k$, $l$
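The pairwise part of this sketch (finding, for one pair of sequences, the cells whose best alignment score clears a threshold) might look as follows in Python. The assumptions are mine: a toy match/mismatch score with a linear gap penalty, cells interpreted as nodes (i, j) of the pairwise DP lattice, and the threshold taken as the lower bound plus the optimal score of this pair minus the sum of all optimal pairwise scores. All function names are hypothetical.

```python
def match_score(a, b):
    """Hypothetical residue score; a real substitution matrix would go here."""
    return 1 if a == b else -1


def forward_table(x, y, gap=-2):
    """F[i][j] = best global alignment score of x[:i] against y[:j]."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                candidates.append(F[i - 1][j - 1] + match_score(x[i - 1], y[j - 1]))
            if i > 0:
                candidates.append(F[i - 1][j] + gap)
            if j > 0:
                candidates.append(F[i][j - 1] + gap)
            F[i][j] = max(candidates)
    return F


def best_through(x, y, gap=-2):
    """T[i][j] = best score of any global alignment path through node (i, j)."""
    n, m = len(x), len(y)
    F = forward_table(x, y, gap)
    # Suffix scores come from running the same DP on the reversed sequences.
    R = forward_table(x[::-1], y[::-1], gap)
    return [[F[i][j] + R[n - i][m - j] for j in range(m + 1)]
            for i in range(n + 1)]


def beta_kl(lower_bound, optimal_pair_scores, k, l):
    """Threshold for pair (k, l); optimal_pair_scores maps (k, l) -> score."""
    return (lower_bound + optimal_pair_scores[(k, l)]
            - sum(optimal_pair_scores.values()))


def allowed_cells(x, y, beta, gap=-2):
    """Set of nodes (i, j) whose best path scores at least beta."""
    T = best_through(x, y, gap)
    return {(i, j) for i, row in enumerate(T)
            for j, t in enumerate(row) if t >= beta}
```

The multidimensional DP would then skip any cell whose projection onto some pair (k, l) falls outside that pair's allowed set; the tighter the lower bound, the fewer cells remain.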

Page 16: Class 5: Multiple Sequence Alignment

Progressive alignment methods (i)

Basic idea: construct a succession of pairwise (PW) alignments

Variations:
Order of the pairwise alignments
One growing alignment or subfamilies
Alignment and scoring procedure

Page 17: Class 5: Multiple Sequence Alignment

Progressive alignment methods (ii)

Most important heuristic: align the most similar pairs first.

Many algorithms build a “guide tree”:
Leaves: sequences
Interior nodes: alignments
Root: complete multiple alignment
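One simple way to obtain a guide tree from pairwise distances is agglomerative clustering. The sketch below uses average linkage as an illustrative choice; it is not claimed to be the clustering used by any particular published program. The tree is returned as nested tuples, and `dist` is assumed to be a dictionary keyed by `frozenset` pairs of sequence names.

```python
from itertools import combinations


def guide_tree(names, dist):
    """Build a guide tree by repeatedly merging the two closest clusters.

    names: list of sequence names (the leaves)
    dist:  dict mapping frozenset({a, b}) -> distance between a and b
    Returns a nested tuple; merge order follows increasing distance, so
    the most similar sequences end up deepest in the tree.
    """
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in names]

    def average_distance(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]
```

A progressive aligner would then walk this tree from the leaves upward, aligning the two children of each interior node.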

Page 18: Class 5: Multiple Sequence Alignment

Feng-Doolittle (1987)

Calculate all pairwise distances using alignment scores:

$$D = -\log S_{\text{eff}} = -\log \frac{S_{\text{obs}} - S_{\text{rand}}}{S_{\text{max}} - S_{\text{rand}}}$$

Construct a guide tree using hierarchical clustering

The highest-scoring pairwise alignment determines the sequence-to-group alignment
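The distance D = -log S_eff, with S_eff = (S_obs - S_rand) / (S_max - S_rand), is easy to compute once the three scores are available. In this sketch the scores are plain inputs; how S_max and S_rand are obtained is outside its scope, and returning infinity for a non-positive S_eff is my own convention.

```python
import math


def feng_doolittle_distance(s_obs, s_max, s_rand):
    """Convert an alignment score into a distance D = -log(S_eff).

    s_obs:  observed pairwise alignment score
    s_max:  score expected for identical sequences
    s_rand: score expected for unrelated (e.g. shuffled) sequences
    """
    if s_max == s_rand:
        raise ValueError("s_max and s_rand must differ")
    s_eff = (s_obs - s_rand) / (s_max - s_rand)
    if s_eff <= 0:
        # Assumed convention: no detectable similarity maps to infinite distance.
        return math.inf
    return -math.log(s_eff)
```

An observed score equal to `s_max` gives distance 0, and the distance grows as the observed score approaches the random baseline.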

Page 19: Class 5: Multiple Sequence Alignment

Profile alignment

Use profiles for group-to-sequence and group-to-group alignments

CLUSTALW (Thompson et al., 1994):
Similar to Feng-Doolittle, but uses profile alignment methods
Numerous heuristics
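To make the idea of a profile concrete, the sketch below turns an existing alignment into per-column symbol frequencies and scores a single residue against one profile column as the frequency-weighted average of pair scores. This is a toy illustration only: the scoring function is hypothetical, gaps are counted as ordinary symbols, and none of CLUSTALW's heuristics (such as sequence weighting or position-specific gap penalties) are modelled.

```python
from collections import Counter


def toy_score(a, b):
    """Hypothetical pair score standing in for a substitution matrix."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1


def build_profile(rows):
    """Return one {symbol: frequency} dict per alignment column."""
    n = len(rows)
    return [{sym: count / n for sym, count in Counter(col).items()}
            for col in zip(*rows)]


def profile_column_score(freqs, residue, score=toy_score):
    """Average score of aligning `residue` against a profile column."""
    return sum(f * score(sym, residue) for sym, f in freqs.items())
```

A sequence-to-group aligner could use `profile_column_score` in place of the residue-to-residue score inside an ordinary pairwise DP.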

Page 20: Class 5: Multiple Sequence Alignment

Iterative Refinement

Addresses “frozen” sub-alignment problem

Iteratively realign sequences or groups to a profile of the rest

Barton and Sternberg (1987):
Align the two most similar sequences
Align the current profile to the most similar remaining sequence
Remove each sequence in turn and realign it to the profile of the others