Page 1: Intro to Alignment Algorithms: Global and Local (cs.brown.edu/courses/csci1810/lectures/slides/alignment_intro-2020.…)

Intro to Alignment Algorithms:

Global and Local

Algorithmic Functions of Computational Biology

Professor Istrail

Page 2

Sequence Comparison

Biomolecular sequences
□ DNA sequences (strings over the 4-letter alphabet {A, C, G, T})
□ RNA sequences (strings over the 4-letter alphabet {A, C, G, U})
□ Protein sequences (strings over the 20-letter alphabet of amino acids)

Sequence similarity helps in the discovery of genes, and the prediction of structure and function of proteins.

Page 3

The Basic Similarity Analysis Algorithm

Global Similarity

• Scoring Schemes

• Edit Graphs

• Alignment = Path in the Edit Graph

• The Principle of Optimality

• The Dynamic Programming Algorithm

• The Traceback

Page 4

The Sequence Alignment Problem

Input: two sequences over the same alphabet and a scoring scheme
Output: an alignment of the two sequences of maximum score

Example:
□ GCGCATTTGAGCGA
□ TGCGTTAGGGTGACCA

A possible alignment:
-GCGCATTTGAGCGA--
TGCG--TTAGGGTGACC

Each column of an alignment is a match, a mismatch, or an indel.

Page 5

CSCI2820 - Class 4

Mismatch, Deletion, Insertion

[Figure: three sequences with annotated positions labeled "mismatch", "deletion (in template)", and "insertion (in template)"]

TCAGGGGGCTATTAGTCCTCCGATAA
TCAGGGGGCTATTAGTCC-CCGATAA
TCAGGGGG-CTATTAGTCCCCCCGATAA

Page 6

Consider two sequences

$X = x_1 x_2 \ldots x_m$
$Y = y_1 y_2 \ldots y_n$

over the alphabet $\Sigma = \{A, C, G, T\}$, where every $x_i$ and $y_j$ belongs to $\Sigma$.

Page 7

Scoring Schemes

Unit-score δ

      A   C   G   T   -
  A   1   0   0   0   0
  C   0   1   0   0   0
  G   0   0   1   0   0
  T   0   0   0   1   0
  -   0   0   0   0   0

Page 8

Alignment

ACG
|||
AGG

A is aligned with A
C is aligned with G
G is aligned with G

Score (unit-score) = δ(A,A) + δ(C,G) + δ(G,G)
                   = 1 + 0 + 1 = 2

Page 9

Gaps

Scores below use the unit-score scheme.

Without gaps (score 7):
ACATGGAAT
ACAGGAAAT

Optimal alignment (score 8):
ACATGG-AAT
ACA-GGAAAT

Without gaps (score 0):
AAAGGG
GGGAAA

Optimal alignment (score 3):
---AAAGGG
GGGAAA---

“-” is the gap symbol
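The four scores on this slide can be checked with a short script. This is a minimal sketch, assuming the unit-score scheme (1 for identical letters, 0 for mismatches and for any column containing a gap); the function names `unit_score` and `alignment_score` are illustrative and not from the lecture.

```python
def unit_score(a: str, b: str) -> int:
    """Unit-score scheme: 1 for two identical letters, 0 otherwise (gaps score 0)."""
    return 1 if a == b and a != "-" else 0


def alignment_score(row_x: str, row_y: str, delta=unit_score) -> int:
    """Sum delta over the columns of an alignment given as two equal-length rows."""
    if len(row_x) != len(row_y):
        raise ValueError("alignment rows must have the same length")
    return sum(delta(a, b) for a, b in zip(row_x, row_y))


print(alignment_score("ACATGGAAT", "ACAGGAAAT"))    # 7 (no gaps)
print(alignment_score("ACATGG-AAT", "ACA-GGAAAT"))  # 8 (with gaps)
print(alignment_score("AAAGGG", "GGGAAA"))          # 0 (no gaps)
print(alignment_score("---AAAGGG", "GGGAAA---"))    # 3 (with gaps)
```

Allowing gaps lets matching regions slide into register, which is why both gapped alignments score higher than their ungapped counterparts.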

Page 10

δ(x,y) = the score for aligning x with y

δ(-,y) = the score for aligning - with y

δ(x,-) = the score for aligning x with -

Page 11

Alignment

A-CG-G
ATCGTG

Score = δ(A,A) + δ(-,T) + δ(C,C) + δ(G,G) + δ(-,T) + δ(G,G)

THE SUM OF THE SCORES OF THE PAIRWISE ALIGNED SYMBOLS

Page 12

Margaret Dayhoff & PAM Similarity Matrices

ARTEMIS Summer 2008

Professor Istrail

Page 13

Dr. Margaret Oakley Dayhoff The Mother & Father of Bioinformatics

Page 14

Scoring Scheme

Dayhoff PAM scoring matrices

Partial alignment for Monkey and Trout somatotropin proteins:

PTIPLSRLFDNAMLRAHRLHQ
SAIENQRLFNIAVSRVQHLHL

Excerpt of the scoring matrix δ (the rows for R, N, D, ... are cut off in the transcript):

      -   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
  -  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8
  A  -8   3  -3   0   0  -3  -1   0   1  -3  -1  -3  -2  -2  -4   1   1   1  -7  -4   0

Page 15

Scoring Functions

Scoring function = a sum of terms, one for each pair of aligned residues and one for each gap

The meaning = log of the relative likelihood that the sequences are related, compared to being unrelated

Identities and conservative substitutions are Positive terms

Non-conservative substitutions are Negative terms

Mutations = Substitutions, Insertions, Deletions
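The "log of the relative likelihood" reading can be written out for the substitution terms. The notation here is an assumption and does not appear on the slide: $p_a$ is the background frequency of residue $a$, and $q_{ab}$ is the probability of seeing $a$ aligned with $b$ in related sequences. Gap terms are left out of this sketch.

```latex
s(a,b) = \log \frac{q_{ab}}{p_a \, p_b},
\qquad
\sum_{k} s(a_k, b_k) = \log \frac{\prod_k q_{a_k b_k}}{\prod_k p_{a_k} \, p_{b_k}}
```

Under this reading, a pair that is more common among related sequences than chance would predict has $q_{ab} > p_a p_b$ and a positive score, which matches the positive terms for identities and conservative substitutions.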

Page 16

Global alignment problem

• Input: Sequences X and Y of length m and n respectively and a similarity matrix

• Output: An optimal global alignment of X and Y
  – Global alignments require that all bases in both sequences be aligned

Page 17

Local alignment problem

• Input: Sequences X and Y of length m and n respectively and a similarity matrix

• Output: An optimal local alignment of X and Y
  – Local alignments do not require using all bases of either sequence in the alignment

• Applicable when looking for similar subsequences

Page 18

The Edit Graph

Suppose that we want to align AGT with AT

We are going to construct a graph where alignments between the two sequences correspond to paths between the Begin and End nodes of the graph.

This is the Edit Graph

Page 19

[Figure: grid of nodes with columns 0 to 3 (the sequence AGT) and rows 0 to 2 (the sequence AT)]

AGT has length 3; AT has length 2

The Edit Graph has (3+1)*(2+1) = 12 nodes

Page 20

[Figure: the grid with columns 0 to 3 labeled A, G, T and rows 0 to 2 labeled A, T; the Begin and End nodes are marked]

AGT indexes the columns, and AT indexes the rows of this “table”

Page 21

[Figure: the edit graph grid for AGT (columns) and AT (rows), with the Begin and End nodes marked]

The graph is directed. The nodes (i,j) will hold values.

Page 22

[Figure: the directed edit graph for AGT (columns 0 to 3) and AT (rows 0 to 2), from Begin to End]

Page 23

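[Figure: the edit graph for AGT (columns) and AT (rows) with every edge labeled. Horizontal edges carry (A,-), (G,-), (T,-); vertical edges carry (-,A), (-,T); diagonal edges carry (A,A), (G,A), (T,A), (A,T), (G,T), (T,T).]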

Directed edges get as labels pairs of aligned letters.
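The construction can be sketched in a few lines. This assumes, as in the slides, that the first sequence indexes the columns and the second indexes the rows; the function name `edit_graph` and the tuple representation of nodes and edges are illustrative choices, not part of the lecture.

```python
def edit_graph(x: str, y: str):
    """Build the edit graph of x (columns) and y (rows).

    A node is a pair (i, j) with 0 <= i <= len(x) and 0 <= j <= len(y).
    An edge is a triple (source, target, label); the label is the pair of
    aligned symbols that the edge contributes to an alignment.
    """
    m, n = len(x), len(y)
    nodes = [(i, j) for j in range(n + 1) for i in range(m + 1)]
    edges = []
    for (i, j) in nodes:
        if i < m:             # horizontal edge: x[i] aligned with a gap
            edges.append(((i, j), (i + 1, j), (x[i], "-")))
        if j < n:             # vertical edge: a gap aligned with y[j]
            edges.append(((i, j), (i, j + 1), ("-", y[j])))
        if i < m and j < n:   # diagonal edge: x[i] aligned with y[j]
            edges.append(((i, j), (i + 1, j + 1), (x[i], y[j])))
    return nodes, edges


nodes, edges = edit_graph("AGT", "AT")
print(len(nodes), len(edges))  # 12 23
```

For AGT and AT this gives the 12 nodes counted earlier and 23 labeled edges: 9 horizontal, 8 vertical, and 6 diagonal.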

Page 24

Alignment = Path in the Edit Graph

[Figure: the labeled edit graph for AGT (columns) and AT (rows) with one path from Begin to End highlighted; the path spells the alignment below]

AGT
A-T

Every path from Begin to End corresponds to an alignment

Every alignment corresponds to a path between Begin and End
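The correspondence can be made concrete by walking every Begin-to-End path and reading the alignment off its edge labels. This is a brute-force sketch for illustration only (the function name `all_alignments` is hypothetical); the number of paths grows exponentially with the sequence lengths, which is what motivates dynamic programming.

```python
def all_alignments(x: str, y: str):
    """Yield every alignment of x and y as a pair of gapped rows.

    Each alignment is one Begin-to-End path in the edit graph: a horizontal
    step emits (x[i], '-'), a vertical step emits ('-', y[j]), and a
    diagonal step emits (x[i], y[j]).
    """
    def walk(i, j, row_x, row_y):
        if i == len(x) and j == len(y):   # reached the End node
            yield row_x, row_y
            return
        if i < len(x):
            yield from walk(i + 1, j, row_x + x[i], row_y + "-")
        if j < len(y):
            yield from walk(i, j + 1, row_x + "-", row_y + y[j])
        if i < len(x) and j < len(y):
            yield from walk(i + 1, j + 1, row_x + x[i], row_y + y[j])

    yield from walk(0, 0, "", "")


alignments = list(all_alignments("AGT", "AT"))
print(len(alignments))                # 25
print(("AGT", "A-T") in alignments)   # True
```

Even for sequences of length 3 and 2 there are already 25 distinct paths, one per alignment.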

Page 25

The Principle of Optimality

The optimal answer to a problem is expressed in terms of optimal answers to its sub-problems

Page 26

Dynamic Programming

Part 1: Compute first the optimal alignment score

Part 2: Construct optimal alignment

We are looking for the optimal alignment = maximal score path in the Edit Graph from the Begin vertex to the End vertex

Given: Two sequences X and Y
Find: An optimal alignment of X with Y

Page 27

The DP Matrix S(i,j)

[Figure: the edit graph grid for AGT (columns 0 to 3) and AT (rows 0 to 2), with the entries S(1,0) and S(2,1) marked as examples]

Page 28

The DP Matrix

Matrix S = [S(i,j)]

S(i,j) = the score of the maximum-score path from the Begin vertex to the vertex (i,j)

The optimal path to (i,j) must pass through one of the vertices

(i-1,j-1)

(i-1,j)

(i,j-1)

[Figure: the vertex (i,j) with incoming edges from (i-1,j-1), (i-1,j), and (i,j-1), captioned "Optimal Path to (i,j)"]

Page 29

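[Figure: the vertex (i,j) and its predecessors (i-1,j-1), (i-1,j), and (i,j-1); the optimal path arrives through (i-1,j)]

Optimal path to (i-1,j), followed by the edge that aligns $x_i$ with a gap:

$S(i-1,j) + \delta(x_i, -)$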

Page 30

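[Figure: the vertex (i,j) and its predecessors (i-1,j-1), (i-1,j), and (i,j-1); the optimal path arrives through (i-1,j-1)]

Optimal path to (i-1,j-1), followed by the edge that aligns $x_i$ with $y_j$:

$S(i-1,j-1) + \delta(x_i, y_j)$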

Page 31

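[Figure: the vertex (i,j) and its predecessors (i-1,j-1), (i-1,j), and (i,j-1); the optimal path arrives through (i,j-1)]

Optimal path to (i,j-1), followed by the edge that aligns a gap with $y_j$:

$S(i,j-1) + \delta(-, y_j)$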

Page 32

The Basic ALGORITHM

$$
S(i,j) = \max
\begin{cases}
S(i-1,\,j-1) + \delta(x_i, y_j) \\
S(i-1,\,j) + \delta(x_i, -) \\
S(i,\,j-1) + \delta(-, y_j)
\end{cases}
$$
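A minimal sketch of how the recurrence fills the matrix follows. Two things are assumptions rather than content of the slide: the boundary values (the first row and column accumulate gap scores, the usual convention for global alignment) and the default unit-score δ. The names `unit_score` and `global_score_matrix` are illustrative.

```python
def unit_score(a: str, b: str) -> int:
    """Unit-score scheme: 1 for two identical letters, 0 otherwise (gaps score 0)."""
    return 1 if a == b and a != "-" else 0


def global_score_matrix(x: str, y: str, delta=unit_score):
    """Return S where S[i][j] is the best score of aligning x[:i] with y[:j]."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    # Assumed boundary: a prefix can only be aligned against gaps.
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + delta(x[i - 1], "-")
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + delta("-", y[j - 1])
    # The three-way maximum from the recurrence (x_i is x[i - 1] in 0-based Python).
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + delta(x[i - 1], y[j - 1]),
                S[i - 1][j] + delta(x[i - 1], "-"),
                S[i][j - 1] + delta("-", y[j - 1]),
            )
    return S


S = global_score_matrix("AGT", "AT")
print(S[3][2])  # 2
```

The table has (m+1)(n+1) entries and each takes constant time, so the score is found in O(mn) time instead of by enumerating every path.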

Page 33

Optimal Alignment and Traceback

[Figure: the labeled edit graph for AGT (columns) and AT (rows) with the value S(i,j) written at each node and the traceback path from End to Begin highlighted]

Node values under the unit-score scheme:

         A   G   T
     0   0   0   0
 A   0   1   1   1
 T   0   1   1   2

Optimal alignment:
AGT
A-T
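The traceback can be sketched as follows: fill the matrix, then walk back from End to Begin, at each node choosing a predecessor whose value plus the edge score reproduces the current value. The boundary values, the unit-score δ, the tie-breaking order (diagonal first), and the function names are all assumptions made for this illustration.

```python
def unit_score(a: str, b: str) -> int:
    """Unit-score scheme: 1 for two identical letters, 0 otherwise (gaps score 0)."""
    return 1 if a == b and a != "-" else 0


def global_align(x: str, y: str, delta=unit_score):
    """Return (score, gapped x, gapped y) for one optimal global alignment."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + delta(x[i - 1], "-")
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + delta("-", y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + delta(x[i - 1], y[j - 1]),
                S[i - 1][j] + delta(x[i - 1], "-"),
                S[i][j - 1] + delta("-", y[j - 1]),
            )
    # Traceback from End (m, n) to Begin (0, 0); columns are collected in reverse.
    row_x, row_y = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + delta(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1]); row_y.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + delta(x[i - 1], "-"):
            row_x.append(x[i - 1]); row_y.append("-"); i -= 1
        else:
            row_x.append("-"); row_y.append(y[j - 1]); j -= 1
    return S[m][n], "".join(reversed(row_x)), "".join(reversed(row_y))


print(global_align("AGT", "AT"))  # (2, 'AGT', 'A-T')
```

When several predecessors tie, each choice leads to a different optimal alignment of the same score; this sketch simply returns the first one it finds.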

Page 34

The Basic ALGORITHM: Local Similarity

$$
S(i,j) = \max
\begin{cases}
0 & \text{(we add this)} \\
S(i-1,\,j-1) + \delta(x_i, y_j) \\
S(i-1,\,j) + \delta(x_i, -) \\
S(i,\,j-1) + \delta(-, y_j)
\end{cases}
$$
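A sketch of the local variant follows. The extra 0 lets an alignment restart at any cell, and the answer is the largest entry anywhere in the matrix rather than the corner. Local alignment only makes sense when mismatches and gaps are penalized, so this example assumes a hypothetical scheme (match +1, mismatch -1, gap -1) that is not taken from the slides; the function names are likewise illustrative.

```python
def simple_score(a: str, b: str) -> int:
    """Hypothetical scheme for illustration: match +1, mismatch -1, gap -1."""
    if a == "-" or b == "-":
        return -1
    return 1 if a == b else -1


def local_score(x: str, y: str, delta=simple_score) -> int:
    """Return the best local alignment score of x and y."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay at 0: a local alignment may start anywhere.
    S = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                0,  # the added term: drop everything before this cell
                S[i - 1][j - 1] + delta(x[i - 1], y[j - 1]),
                S[i - 1][j] + delta(x[i - 1], "-"),
                S[i][j - 1] + delta("-", y[j - 1]),
            )
            best = max(best, S[i][j])
    return best


print(local_score("TTACGTAA", "GGACGTCC"))  # 4, from the shared ACGT
print(local_score("AAAGGG", "GGGAAA"))      # 3
```

Recovering the aligned substrings themselves would use the same traceback idea as the global case, starting from the cell holding the maximum and stopping at the first 0.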

Page 35

Protein global alignment

X = hlsek
Y = nlsak

• X and Y represent a protein subsequence from the BRCA2 (early onset) protein in human and chimpanzee

• Global alignments are used when the two sequences being compared represent a similar biological sequence

Page 36

Margaret Dayhoff’s PAM 100 similarity matrix (partial)

      A   N   E   H   L   K   S   *
 A    4  -1   0  -3  -3  -3   1  -9
 N   -1   5   1   2  -4   1   1  -9
 E    0   1   5  -1  -5  -1  -1  -9
 H   -3   2  -1   7  -3  -2  -2  -9
 L   -3  -4  -5  -3   6  -4  -4  -9
 K   -3   1  -1  -2  -4   5  -1  -9
 S    1   1  -1  -2  -4  -1   4  -9
 *   -9  -9  -9  -9  -9  -9  -9   1

Page 37

DP matrix for X = hlsek (columns) and Y = nlsak (rows):

           h    l    s    e    k
      0   -9  -18  -27  -36  -45
 n   -9    2   -7  -16  -25  -34
 l  -18   -7    8   -1  -10  -19
 s  -27  -16   -1   12    3   -6
 a  -36  -25  -10    3   12    3
 k  -45  -34  -19   -6    3   17
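The matrix above can be reproduced from the partial PAM 100 table. This sketch assumes that the `*` row and column give the gap score of -9, a reading that matches the first row and column of the matrix but is not stated on the slides. The names `delta` and `global_score_matrix` are illustrative.

```python
ALPHABET = "ANEHLKS"
ROWS = [  # partial PAM 100 matrix, rows and columns in ALPHABET order
    [4, -1, 0, -3, -3, -3, 1],
    [-1, 5, 1, 2, -4, 1, 1],
    [0, 1, 5, -1, -5, -1, -1],
    [-3, 2, -1, 7, -3, -2, -2],
    [-3, -4, -5, -3, 6, -4, -4],
    [-3, 1, -1, -2, -4, 5, -1],
    [1, 1, -1, -2, -4, -1, 4],
]
PAM100 = {(a, b): ROWS[r][c]
          for r, a in enumerate(ALPHABET) for c, b in enumerate(ALPHABET)}
GAP = -9  # assumption: the '*' entries of the table are read as the gap score


def delta(a: str, b: str) -> int:
    """Score a column: GAP if either symbol is '-', else the PAM 100 entry."""
    if a == "-" or b == "-":
        return GAP
    return PAM100[(a.upper(), b.upper())]


def global_score_matrix(x: str, y: str):
    """S[i][j] = best global score of x[:i] against y[:j] (x indexes columns above)."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + delta(x[i - 1], "-")
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + delta("-", y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + delta(x[i - 1], y[j - 1]),
                S[i - 1][j] + delta(x[i - 1], "-"),
                S[i][j - 1] + delta("-", y[j - 1]),
            )
    return S


X, Y = "hlsek", "nlsak"
S = global_score_matrix(X, Y)
for j in range(len(Y) + 1):  # print one row per prefix of Y, as in the table
    print([S[i][j] for i in range(len(X) + 1)])
print(S[len(X)][len(Y)])  # 17
```

The optimal global score is the bottom-right entry, 17, reached along the main diagonal with no gaps.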