Regular Expression Constrained Sequence Alignment

Abdullah N. Arslan, Assistant Professor

Computer Science Department

Posted on 25-Feb-2016

Outline

• Sequence alignment
  • Common framework
  • DP solution
  • Why constrained?

• RE constrained sequence alignment
  • Algorithm

• Concluding Remarks

Alignment Matrix

Edit Graph

Dynamic Programming Solution

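$H_{i,j}$: maximum score of an alignment of the first $i$ characters of one sequence with the first $j$ characters of the other

with $H_{i,j} = 0$ whenever $i = 0$ or $j = 0$

$H_{n,m}$ in $O(nm)$ time, $O(m)$ space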
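The recurrence on this slide was an image and did not survive extraction. Assuming the standard three-way form $H_{i,j} = \max\{H_{i-1,j-1} + \sigma(a_i, b_j),\ H_{i-1,j} + \sigma(a_i, -),\ H_{i,j-1} + \sigma(-, b_j)\}$ for sequences $a_1 \dots a_n$ and $b_1 \dots b_m$ with a scoring function $\sigma$ (our assumption), here is a minimal Python sketch. It keeps only two rows, which matches the stated $O(nm)$ time and $O(m)$ space. The names `global_score` and `lcs_sigma` are hypothetical.

```python
from typing import Callable

GAP = "-"


def global_score(s1: str, s2: str, sigma: Callable[[str, str], int]) -> int:
    """Return H[n][m] using the slide's zero boundary and only two rows."""
    m = len(s2)
    prev = [0] * (m + 1)              # row i-1; row 0 is all zeros
    for i in range(1, len(s1) + 1):
        cur = [0] * (m + 1)           # cur[0] = 0 (boundary j = 0)
        a = s1[i - 1]
        for j in range(1, m + 1):
            b = s2[j - 1]
            cur[j] = max(prev[j - 1] + sigma(a, b),   # a aligned with b
                         prev[j] + sigma(a, GAP),     # a against a gap
                         cur[j - 1] + sigma(GAP, b))  # b against a gap
        prev = cur
    return prev[m]


def lcs_sigma(a: str, b: str) -> int:
    """Example scoring: +1 for equal residues, 0 otherwise (yields LCS length)."""
    return 1 if a == b and a != GAP else 0


print(global_score("ABCBDAB", "BDCABA", lcs_sigma))  # prints 4
```

With $\sigma$ equal to 1 for identical characters and 0 otherwise, the zero boundary is exactly right and $H_{n,m}$ is the LCS length. This is what ties the alignment framework to the constrained LCS problem.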

DP Solution: Local Alignment

$H_{i,j}$: maximum similarity score of an alignment ending at $(i, j)$

with $H_{i,j} = 0$ whenever $i = 0$ or $j = 0$

$\max_{i,j} H_{i,j}$ in $O(nm)$ time, $O(m)$ space
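For local alignment, the usual Smith-Waterman form adds 0 as a fourth option in the maximum, and the answer is the largest table entry rather than $H_{n,m}$. The sketch below assumes that form and uses hypothetical linear scores (match +2, mismatch -1, gap -1). `local_score` is a hypothetical name.

```python
def local_score(s1: str, s2: str, match: int = 2, mismatch: int = -1,
                gap: int = -1) -> int:
    """max over (i, j) of H[i][j], with a 0 floor and two rows of storage."""
    m = len(s2)
    prev = [0] * (m + 1)
    best = 0
    for i in range(1, len(s1) + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if s1[i - 1] == s2[j - 1] else mismatch)
            # the 0 option lets a local alignment start fresh at (i, j)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(local_score("XXACGTYY", "ZZACGTWW"))  # prints 8 (ACGT aligned to ACGT)
```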

Dynamic Programming Formulation

Affine gap penalties: the penalty for a gap of length $k$ is the gap-opening penalty plus $(k-1)$ times the gap-extension penalty

where $S_{i,j} = F_{i,j} = E_{i,j} = 0$ when $i = 0$ or $j = 0$

$\max_{i,j} H_{i,j}$ in $O(nm)$ time, $O(m)$ space
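The sketch below is a two-row Python version of the affine case. It assumes the usual Gotoh-style roles for the matrices: $E_{i,j}$ covers alignments ending with a character of the second sequence against a gap, $F_{i,j}$ covers a character of the first sequence against a gap, and $H_{i,j}$ is the best local score, with the diagonal case folded into it. The parameter names `gap_open` and `gap_ext` and the default scores are hypothetical. Zero boundaries follow the slide; in the local setting these boundary values never beat the 0 floor.

```python
def local_affine_score(s1: str, s2: str, match: int = 2, mismatch: int = -1,
                       gap_open: int = 3, gap_ext: int = 1) -> int:
    """Best local score when a gap of length k costs gap_open + (k-1)*gap_ext."""
    m = len(s2)
    H_prev = [0] * (m + 1)
    F_prev = [0] * (m + 1)      # F: s1[i-1] aligned against a gap (vertical)
    best = 0
    for i in range(1, len(s1) + 1):
        H_cur = [0] * (m + 1)
        F_cur = [0] * (m + 1)
        E = 0                   # E: s2[j-1] aligned against a gap (horizontal)
        for j in range(1, m + 1):
            E = max(E - gap_ext, H_cur[j - 1] - gap_open)          # extend or open
            F_cur[j] = max(F_prev[j] - gap_ext, H_prev[j] - gap_open)
            diag = H_prev[j - 1] + (match if s1[i - 1] == s2[j - 1] else mismatch)
            H_cur[j] = max(0, diag, E, F_cur[j])
            best = max(best, H_cur[j])
        H_prev, F_prev = H_cur, F_cur
    return best


print(local_affine_score("AAAATTTT", "AAAAGGTTTT"))  # prints 12
```

Here the best alignment opens a single gap of length 2, costing 3 + 1 = 4, between two runs of four matches. The score is 8 - 4 + 8 = 12. Two separate length-1 gaps would cost 6, so they are never preferred.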

The Definition of the Constrained LCS Problem

• The constrained LCS (CLCS) problem: given strings $S_1$, $S_2$, and $P$, find a longest common subsequence of $S_1$ and $S_2$ that contains $P$ as a subsequence

• Motivation: computing the homology of two biological sequences that have a specific part in common
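The $O(nmr)$ bound cited for Chin et al. (2004) and Arslan and Egecioglu (2004) comes from a three-dimensional table indexed by prefixes of $S_1$, $S_2$, and $P$, with $r = |P|$. The Python sketch below shows one such recurrence. It is our illustration of the idea, not code from either paper, and `clcs_length` is a hypothetical name.

```python
from typing import Optional

NEG = float("-inf")


def clcs_length(s1: str, s2: str, p: str) -> Optional[int]:
    """Length of a longest common subsequence of s1 and s2 that contains p
    as a subsequence, or None if no such common subsequence exists."""
    n, m, r = len(s1), len(s2), len(p)
    # L[i][j][k]: best length for prefixes s1[:i], s2[:j] containing p[:k]
    L = [[[NEG] * (r + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(r + 1):
                if i == 0 or j == 0:
                    # empty prefix: only the empty constraint can be met
                    L[i][j][k] = 0 if k == 0 else NEG
                    continue
                best = max(L[i - 1][j][k], L[i][j - 1][k])
                if s1[i - 1] == s2[j - 1]:
                    # use the common character without advancing in p ...
                    best = max(best, L[i - 1][j - 1][k] + 1)
                    # ... or let it serve as the next needed character p[k-1]
                    if k > 0 and s1[i - 1] == p[k - 1]:
                        best = max(best, L[i - 1][j - 1][k - 1] + 1)
                L[i][j][k] = best
    result = L[n][m][r]
    return None if result == NEG else int(result)
```

For example, `clcs_length("ZAA", "AAZ", "")` is 2 (the plain LCS "AA"). Requiring $P$ = "Z" drops the answer to 1, because no common subsequence of length 2 contains Z.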

Constrained Sequence Alignment Problems

• Constrained LCS
  • Tsai 2003: $O(n^2 m^2 r)$ time
  • Chin et al. 2004; Arslan and Egecioglu 2004: $O(nmr)$ time

• Edit-distance constrained sequence alignment: Arslan and Egecioglu 2004, $O(dnmr)$ time

• Regular-expression constrained sequence alignment: this paper
  • Motivation: Comet and Henry 2002; PROSITE patterns

PROSITE patterns as constraints

• PROSITE patterns are regular expressions with no Kleene closure
  • From the PROSITE database, e.g. [GA]-X(4)-G-K-[ST]: ATP/GTP-binding site motif A (P-loop) (PS00017)

• Comet and Henry reward alignments

• Regular expression constrained sequence alignment: find a maximum-score alignment that includes a segment matching a given RE

Example: [GA]-X(4)-G-K-[ST]
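PROSITE patterns have no Kleene closure, so they translate directly into ordinary regular expressions. The sketch below covers only a subset of PROSITE syntax (our simplification): `[..]` for allowed residues, `{..}` for forbidden residues, `x` or `X` for any residue, `(n)` or `(n,m)` for repetition, and `<`/`>` for anchors. `prosite_to_regex` is a hypothetical helper name.

```python
import re

# One PROSITE element: optional '<', a residue / [set] / {excluded set} / x,
# an optional (n) or (n,m) repeat count, and an optional '>'.
ELEMENT = re.compile(r"(<?)(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Translate a (subset) PROSITE pattern into Python regex syntax."""
    out = []
    for elem in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(elem)
        if m is None:
            raise ValueError(f"unsupported element: {elem!r}")
        start, core, lo, hi, end = m.groups()
        if core in ("x", "X"):
            atom = "."                          # any residue
        elif core.startswith("{"):
            atom = "[^" + core[1:-1] + "]"      # any residue except these
        else:
            atom = core                         # residue or [set]: same in regex
        if lo is not None:
            atom += "{" + lo + ("," + hi if hi else "") + "}"
        out.append(("^" if start else "") + atom + ("$" if end else ""))
    return "".join(out)


regex = prosite_to_regex("[GA]-X(4)-G-K-[ST]")
print(regex)                                       # prints [GA].{4}GK[ST]
print(re.search(regex, "MAAAAAGKSL").group())      # prints AAAAAGKS
```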

Using the Edit Graph: e.g., A(C+G)*(S+T)

Automata for A(C+G)*(S+T)
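The slides use + for union, so A(C+G)*(S+T) corresponds to the Python regex `A[CG]*[ST]`. Below is a hand-built three-state NFA for it, with a subset simulation. The state numbering and the name `accepts` are our own.

```python
from typing import Dict, Set, Tuple

# Hand-built NFA for A(C+G)*(S+T):
#   0 --A--> 1,  1 --C,G--> 1,  1 --S,T--> 2,  final state 2
DELTA: Dict[Tuple[int, str], Set[int]] = {
    (0, "A"): {1},
    (1, "C"): {1}, (1, "G"): {1},
    (1, "S"): {2}, (1, "T"): {2},
}
START, FINALS = 0, {2}


def accepts(word: str) -> bool:
    """Track the set of NFA states reachable after each character."""
    states = {START}
    for ch in word:
        states = {q for p in states for q in DELTA.get((p, ch), set())}
        if not states:
            return False                # no run survives: reject early
    return bool(states & FINALS)


print([w for w in ["AS", "ACGGT", "AC", "CAT"] if accepts(w)])  # prints ['AS', 'ACGGT']
```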

Some Details of Automata Construction

• Build an NFA $N$ equivalent to the given RE $R$

• Construct from $N$ a new $N \times N$ automaton
  • Moves on edit operations (or, equivalently, on alignment columns)
  • States have weights
  • We are interested in the weights of the final states after the alignment is complete
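The states of the $N \times N$ automaton are pairs $(p, q)$: $p$ follows $N$ on the characters of the first sequence, and $q$ follows $N$ on the second. The sketch below shows one move on an alignment column. It assumes that a gap leaves the corresponding component unchanged, which is our reading of "moves on alignment columns". The NFA is the hand-built one for A(C+G)*(S+T), and `step` and `product_step` are hypothetical names.

```python
from typing import Dict, Set, Tuple

GAP = "-"
# Hand-built NFA for A(C+G)*(S+T); state numbering is our own
DELTA: Dict[Tuple[int, str], Set[int]] = {
    (0, "A"): {1},
    (1, "C"): {1}, (1, "G"): {1},
    (1, "S"): {2}, (1, "T"): {2},
}


def step(p: int, ch: str) -> Set[int]:
    """Move N on one character; a gap consumes nothing, so p is kept."""
    return {p} if ch == GAP else DELTA.get((p, ch), set())


def product_step(state: Tuple[int, int],
                 column: Tuple[str, str]) -> Set[Tuple[int, int]]:
    """Successors of (p, q) in N x N on the alignment column (a, b)."""
    p, q = state
    a, b = column
    return {(p2, q2) for p2 in step(p, a) for q2 in step(q, b)}


# Column ("C", "-"): the first run reads C, the second stays where it is
print(product_step((1, 1), ("C", GAP)))  # prints {(1, 1)}
```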

Weighted Automaton

• Initial weights are $-\infty$

• Weight of $(q_0, q_0)$ is initially 0

• Update the maximum scores at reachable states

• Weights become $-\infty$ in unreachable states

• What are the maximum weights at the final states?
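With weights, each column becomes a max-plus update. A successor's weight is the best predecessor weight plus the column's score. Any state missing from the dictionary is treated as having weight $-\infty$, i.e. unreachable. The sketch below follows one fixed alignment column by column, using the hand-built NFA for A(C+G)*(S+T) and hypothetical scores (match +2, mismatch -1, gap -2). `sigma` and `advance` are hypothetical names.

```python
from math import inf
from typing import Dict, Set, Tuple

GAP = "-"
DELTA: Dict[Tuple[int, str], Set[int]] = {
    (0, "A"): {1},
    (1, "C"): {1}, (1, "G"): {1},
    (1, "S"): {2}, (1, "T"): {2},
}
State = Tuple[int, int]


def step(p: int, ch: str) -> Set[int]:
    return {p} if ch == GAP else DELTA.get((p, ch), set())


def sigma(a: str, b: str) -> int:
    """Hypothetical column score: match +2, mismatch -1, gap -2."""
    if a == GAP or b == GAP:
        return -2
    return 2 if a == b else -1


def advance(weights: Dict[State, float],
            column: Tuple[str, str]) -> Dict[State, float]:
    """One column of the weighted N x N automaton (max-plus update)."""
    a, b = column
    new: Dict[State, float] = {}
    for (p, q), w in weights.items():
        cand = w + sigma(a, b)
        for p2 in step(p, a):
            for q2 in step(q, b):
                if cand > new.get((p2, q2), -inf):   # keep the maximum weight
                    new[(p2, q2)] = cand
    return new


weights = {(0, 0): 0}   # (q0, q0) starts at 0; all other states are -inf
for column in [("A", "A"), ("C", GAP), ("T", "T")]:
    weights = advance(weights, column)
print(weights)  # prints {(2, 2): 2}
```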

Computations on Automata

Complexity

• Simulate the automata based on the DP solution

Each step requires examining the transition functions

Maintain a list of active (reachable) states

Update state weights as alignments are formed

Automaton $M_{i,j}$ has the optimum weights
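The slides describe simulating the weighted $N \times N$ automaton cell by cell over the DP grid, with $M_{i,j}$ holding the optimum weights. The Python sketch below rests on assumptions of our own:

- a hand-built NFA for A(C+G)*(S+T);
- hypothetical scores (match +2, mismatch -1, gap -2);
- the slides' zero boundary;
- our reading of the constraint as "some run of consecutive alignment columns spells a word of $L(R)$ in each sequence".

The extra `"pre"` and `"post"` states are our device for placing that run. The paper's exact construction may differ. For clarity the sketch stores every $M_{i,j}$, so it does not achieve the two-row space bound.

```python
from typing import Dict, Hashable, Optional, Set, Tuple

NEG = float("-inf")
GAP = "-"
# Hypothetical NFA N for A(C+G)*(S+T)
DELTA: Dict[Tuple[int, str], Set[int]] = {
    (0, "A"): {1},
    (1, "C"): {1}, (1, "G"): {1},
    (1, "S"): {2}, (1, "T"): {2},
}
START, FINALS = 0, {2}


def sigma(a: str, b: str) -> int:
    """Hypothetical scores: match +2, mismatch -1, any gap -2."""
    if a == GAP or b == GAP:
        return -2
    return 2 if a == b else -1


def step(p: int, ch: str) -> Set[int]:
    """NFA move; a gap consumes nothing, so the state is unchanged."""
    return {p} if ch == GAP else DELTA.get((p, ch), set())


def relax(cell: Dict[Hashable, float], key: Hashable, w: float) -> None:
    """Keep the maximum weight seen for a state (absent means -inf)."""
    if w > cell.get(key, NEG):
        cell[key] = w


def close(cell: Dict[Hashable, float]) -> None:
    """Moves that consume no column: start the RE segment at this cell, or
    end it once both runs of N are in final states."""
    if "pre" in cell:
        relax(cell, (START, START), cell["pre"])
    for key, w in list(cell.items()):
        if isinstance(key, tuple) and key[0] in FINALS and key[1] in FINALS:
            relax(cell, "post", w)


def re_constrained_score(s1: str, s2: str) -> Optional[float]:
    """Best score of an alignment of s1 and s2 that contains an RE segment,
    or None if no alignment satisfies the constraint."""
    n, m = len(s1), len(s2)
    # M[i][j]: automaton state -> best weight at DP cell (i, j)
    M = [[{} for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            cell = M[i][j]
            if i == 0 or j == 0:
                relax(cell, "pre", 0)          # zero boundary, as on the DP slide
            incoming = []
            if i > 0 and j > 0:
                incoming.append((M[i - 1][j - 1], s1[i - 1], s2[j - 1]))
            if i > 0:
                incoming.append((M[i - 1][j], s1[i - 1], GAP))
            if j > 0:
                incoming.append((M[i][j - 1], GAP, s2[j - 1]))
            for prev, a, b in incoming:
                col = sigma(a, b)
                for key, w in prev.items():    # only active (reachable) states
                    if key in ("pre", "post"):
                        relax(cell, key, w + col)   # outside the RE segment
                    else:
                        p, q = key
                        for p2 in step(p, a):
                            for q2 in step(q, b):
                                relax(cell, (p2, q2), w + col)
            close(cell)
    return M[n][m].get("post")
```

For instance, `re_constrained_score("ACT", "AT")` places the whole alignment A/A, C/-, T/T inside the RE segment (ACT and AT both match) and returns 2. `re_constrained_score("AXT", "AT")` returns None, because no substring of AXT matches A(C+G)*(S+T).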

Generalizations: Local Alignment & Affine gaps

CONCLUSION

• Introduced the regular expression constrained sequence alignment problem

• Presented an algorithm for the problem

• Future work: generalization of the problem to
  • Multiple sequence alignment
  • Multiple regular expressions as constraints

Thank You