Fixed parameter algorithms for protein similarity search under mRNA structure constraints

A joint work by: G. Blin, G. Fertin, D. Hermelin, and S. Vialette.


Post on 19-Jan-2016



Outline

Biological motivation.
  mRNA molecules.
  The mRNA to protein process.
  Selenocysteine insertion.

The MRSO problem.
  Implied structure graph.
  Known results.

Two natural parameters.
  The parameters.
  Nice edge bipartition.
  A general algorithm for both parameters.

The cutwidth parameter.
  An efficient algorithm for small cutwidth.
  Implications of this algorithm.

Binary similarity functions.

Closing remarks.

Biological Motivation

mRNA molecules:
  Can be considered as strings over {A,C,G,U}.
  Complementary bases (A-U, G-C) may pair to form a folding structure (secondary structure) of the mRNA.
  Encode genetic information that is later translated into proteins.
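The string view of mRNA and the pairing rule can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names and the representation of a structure as a list of 0-based position pairs are assumptions made here, not something taken from the talk.

```python
# Watson-Crick pairs only, as listed on the slide (A-U, G-C).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def is_mrna(seq):
    """True if seq is a string over the alphabet {A, C, G, U}."""
    return all(base in COMPLEMENT for base in seq)


def can_pair(x, y):
    """True if nucleotides x and y are complementary."""
    return COMPLEMENT.get(x) == y


def respects_structure(seq, pairs):
    """pairs: list of 0-based positions (i, j) that must hold complementary bases."""
    return all(can_pair(seq[i], seq[j]) for i, j in pairs)
```

For instance, `respects_structure("GAAC", [(0, 3)])` holds because G pairs with C, while the same structure on `"GAAA"` fails.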

Biological Motivation

The mRNA to protein process:
  Standard assumption: each codon encodes a single amino acid.
  Recently, biologists found that this is not necessarily true: depending on the folding structure of the mRNA, a single codon might encode different amino acids.
  Example application: Selenocysteine insertion.

Biological Motivation

Selenocysteine insertion:
  Selenocysteine is a rare amino acid that was only recently discovered.
  It is generated by the UGA codon, which usually encodes a stop signal.
  The presence of the SECIS element forces the generation of Selenocysteine rather than stopping the encoding.
  Modifying existing proteins by inserting the SECIS element results, in certain cases, in enhanced proteins.
  Is this application only the tip of the iceberg?

The MRSO problem

The MRSO problem: Given a specified secondary structure S and an mRNA sequence R, construct an mRNA sequence R’ with complementary nucleotides according to S which is as similar as possible to R.

(Figure: the sequence R = CGG CGA CUA AAU combined with a secondary structure S yields a modified sequence R’.)

The MRSO problem

The score of a solution is given by n similarity functions f1,…,fn:

R’ = CGU CGA CUA GCG

s(R’) = f1(CGU) + f2(CGA) + f3(CUA) + f4(GCG)

Given f1,…,fn, one needs no additional information on the source mRNA sequence R.
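The score is just a sum of one function value per codon, which a short Python sketch can spell out. The similarity functions used here (1 for an exact match with a target codon, 0 otherwise) are hypothetical stand-ins chosen for illustration; the talk leaves f1,…,fn arbitrary.

```python
def split_codons(seq):
    """Split an mRNA string whose length is a multiple of 3 into codons."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def score(codons, sims):
    """s(R') = f1(c1) + ... + fn(cn); sims[i] plays the role of f_(i+1)."""
    if len(codons) != len(sims):
        raise ValueError("need exactly one similarity function per codon")
    return sum(f(c) for f, c in zip(sims, codons))


# Hypothetical similarity functions: exact match against the codons of R.
targets = ["CGG", "CGA", "CUA", "AAU"]
sims = [lambda c, t=t: 1 if c == t else 0 for t in targets]

r_prime = split_codons("CGUCGACUAGCG")
total = score(r_prime, sims)
```

With these stand-in functions only the second and third codons of R’ match their targets, so `total` is 2. Note that `score` never looks at R itself, only at the functions.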

The MRSO problem

The implied structure graph G:
  A linear graph with maximum degree 3.
  Complementarity constraints between nucleotides are labeled on the edges of G.

(Figure: a secondary structure S and its implied structure graph G on vertices 1 2 3 4.)
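One way to picture the implied structure graph is to build it from a list of paired nucleotide positions. The sketch below assumes 0-based positions, codon i covering positions 3i to 3i+2, and no pair inside a single codon; the function name and the edge-label encoding are choices made here for illustration only.

```python
from collections import defaultdict


def implied_structure_graph(n_codons, base_pairs):
    """Map each codon pair (u, v), u < v, to its list of labels (offset_u, offset_v).

    base_pairs: (i, j) nucleotide positions that must hold complementary bases.
    """
    edges = defaultdict(list)
    used = set()
    for i, j in base_pairs:
        if i > j:
            i, j = j, i
        if i < 0 or j >= 3 * n_codons:
            raise ValueError("position out of range")
        if i == j or i in used or j in used:
            raise ValueError("each nucleotide may take part in at most one pair")
        used.update((i, j))
        u, v = i // 3, j // 3
        if u == v:
            raise ValueError("pairs inside one codon are not modelled in this sketch")
        edges[(u, v)].append((i % 3, j % 3))
    return dict(edges)


def degrees(n_codons, edges):
    """Number of distinct neighbours of every codon."""
    deg = [0] * n_codons
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg
```

Because a codon has three nucleotides and each nucleotide pairs at most once, no vertex can have more than three neighbours, which is where the maximum degree 3 comes from.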

The MRSO problem

A more formal definition [Backofen et al.’02]:

Given an implied structure graph G with n vertices and similarity functions f1,…,fn, find an assignment of codons c1,…,cn to the vertices of G that:

1. Maximizes Σi fi(ci).

2. Is compatible with respect to G.

This definition allows adapting to different applications. It also allows a certain degree of combinatorial leverage, as we shall soon see…
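To make the formal definition tangible, here is a brute-force reference solver in Python. It is not one of the algorithms from the talk: it simply tries all 64^n codon assignments, so it is only usable on tiny instances. The edge encoding (u, v, a, b), meaning nucleotide a of codon u must pair with nucleotide b of codon v, is an assumption of this sketch.

```python
from itertools import product

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
CODONS = ["".join(t) for t in product("ACGU", repeat=3)]


def compatible(codons, edges):
    """Check every labeled edge (u, v, a, b) of the implied structure graph."""
    return all(COMPLEMENT[codons[u][a]] == codons[v][b] for u, v, a, b in edges)


def brute_force_mrso(sims, edges):
    """Exhaustive search over all 64^n assignments; returns (score, codons)."""
    best_score, best = float("-inf"), None
    for assignment in product(CODONS, repeat=len(sims)):
        if not compatible(assignment, edges):
            continue
        s = sum(f(c) for f, c in zip(sims, assignment))
        if s > best_score:
            best_score, best = s, assignment
    return best_score, best
```

As a usage example, take two codons with hypothetical similarity functions that count matching positions against CGG and AAU, and one edge forcing the first nucleotide of codon 0 to pair with the last nucleotide of codon 1. C and U are not complementary, so one position has to give way and the best score is 5 rather than 6.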

The MRSO problem: known results [Backofen et al.’02 and Bongartz’04]

NP-complete (APX-hard) for general implied structure graphs.
  Constant factor approximation algorithms exist.
  They cannot handle -∞ well.

In P when the implied structure graph G is outer-planar, in other words, if one can permute the nodes of G such that all of the edges of G are non-crossing.
  [Backofen et al.’02] give an O(n) algorithm for outer-planar implied structure graphs.
  We call this algorithm Aop in this talk.

Two natural parameters

Let δ = # degree 3 vertices in G.
Let χ = # edge crossings in G.

(Figure: an implied structure graph on vertices 1 2 3 4 5 6 7 8.)
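Both parameters are easy to compute for a given vertex order. The Python sketch below assumes vertices numbered 1..n and edges given as pairs; two edges are counted as crossing when exactly one endpoint of one lies strictly between the endpoints of the other. The function names and the example graph are made up for illustration.

```python
from itertools import combinations


def count_degree3(n, edges):
    """Number of vertices of degree exactly 3 (vertices are 1..n)."""
    deg = [0] * (n + 1)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sum(1 for d in deg if d == 3)


def count_crossings(edges):
    """Edges (a, b) and (c, d) with a < b, c < d cross when a < c < b < d."""
    norm = [tuple(sorted(e)) for e in edges]
    return sum(
        1
        for (a, b), (c, d) in combinations(norm, 2)
        if a < c < b < d or c < a < d < b
    )
```

On the hypothetical edge list `[(1, 3), (2, 4), (1, 5), (1, 2)]`, vertex 1 is the only degree 3 vertex and only the edges (1, 3) and (2, 4) cross, so both counts are 1. Nested edges and edges sharing an endpoint are not crossings.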

Two natural parameters

Modifying the similarity functions: We can modify the similarity functions so that some vertices are assigned specific codons in any feasible solution.

For example, ensuring the first vertex is assigned AAA:

f*1(AAA) = f1(AAA).

f*1(C) = -∞, for all C ≠ AAA.
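The pinning trick translates directly into a small wrapper around a similarity function. In this Python sketch the name `pin_codon` and the example function f1 are hypothetical; only the rule itself (keep the value on the chosen codon, return -∞ everywhere else) comes from the slide.

```python
import math


def pin_codon(f, codon):
    """Return f* with f*(codon) = f(codon) and f*(c) = -inf for every other c."""
    def f_star(c):
        return f(c) if c == codon else -math.inf
    return f_star


# Hypothetical f1: counts the A's in a codon.
f1 = lambda c: c.count("A")
f1_star = pin_codon(f1, "AAA")
```

Since any sum containing -∞ is -∞, a solution that puts anything other than AAA on the first vertex can never be optimal as long as some feasible solution with a finite score exists.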

Two natural parameters

Nice edge bipartition of G:
  Upper part induces an outer-planar graph.

(Figure: the graph on vertices 1 2 3 4 5 6 7 8 with its edges split into an upper part and a bottom part.)

Two natural parameters

A general algorithm:
  Enumerate all assignments which are compatible with respect to the bottom part.
  Invoke Aop with each such assignment.
  Time complexity = O(2^{O(b)} n), where b = # bottom edges.
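The enumeration step can be sketched in Python as follows. This is one possible reading of the outline, not the authors' implementation: it fixes only the paired nucleotides of each bottom edge (4 complementary choices per edge, so 4^b = 2^{2b} candidates) and enforces them through -∞ similarity values. The outer-planar solver Aop is not reproduced here; `a_op` is a caller-supplied function with an assumed signature.

```python
import math
from itertools import product

PAIRS = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")]


def restrict(f, fixed):
    """fixed: {offset: base}. Score -inf on codons that disagree with fixed."""
    def g(codon):
        ok = all(codon[o] == b for o, b in fixed.items())
        return f(codon) if ok else -math.inf
    return g


def general_algorithm(sims, upper_edges, bottom_edges, a_op):
    """Try every complementary choice for the bottom edges, then call a_op.

    Edges are (u, v, a, b): nucleotide a of codon u pairs with nucleotide b
    of codon v. a_op(sims, upper_edges) -> (score, codons) is assumed to
    solve instances whose structure graph is outer-planar.
    """
    best = (-math.inf, None)
    for choice in product(PAIRS, repeat=len(bottom_edges)):
        fixed = [dict() for _ in sims]
        for (u, v, a, b), (x, y) in zip(bottom_edges, choice):
            fixed[u][a] = x
            fixed[v][b] = y
        pinned = [restrict(f, fx) for f, fx in zip(sims, fixed)]
        result = a_op(pinned, upper_edges)
        if result[0] > best[0]:
            best = result
    return best
```

The loop runs 2^{2b} times and each iteration costs one call to the linear-time solver, which matches the O(2^{O(b)} n) shape of the bound.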

Two natural parameters

The general algorithm can be applied for our two natural parameters:

Parameter χ = # edge crossings in G.
  Time = O(2^{O(χ)} n), hence polynomial for χ = O(lg n).

Two natural parameters

The general algorithm can be applied for our two natural parameters:

Parameter δ = # degree 3 vertices in G.
  Every graph with maximum degree 2 is outer-planar.
  Time = O(2^{O(δ)} n), hence polynomial for δ = O(lg n).

The cutwidth parameter

The cutwidth of G:
  For p ∈ {1,…,n-1}, let Ep denote the edges connecting vertices from {1,…,p} to {p+1,…,n}, and let Vp denote the vertices of G which are incident to Ep.
  Let κ denote the cutwidth of G. Then κ = maxp |Ep|.

(Figure: a graph on vertices 1 2 3 4 5 6 7 8 with the cut at p = 2, highlighting Ep and Vp.)
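The definitions of Ep, Vp and the cutwidth for the given vertex order can be written down directly. The Python sketch below assumes vertices numbered 1..n and edges given as pairs; the example edge list is hypothetical and not the graph from the figure.

```python
def cut_edges(edges, p):
    """E_p: edges with one endpoint in {1..p} and the other in {p+1..n}."""
    return [(u, v) for u, v in edges if min(u, v) <= p < max(u, v)]


def cut_vertices(edges, p):
    """V_p: vertices incident to an edge of E_p."""
    return sorted({x for e in cut_edges(edges, p) for x in e})


def cutwidth(n, edges):
    """Maximum of |E_p| over p in 1..n-1, for the vertex order 1..n."""
    return max((len(cut_edges(edges, p)) for p in range(1, n)), default=0)


# Hypothetical example on 8 vertices.
example_edges = [(1, 4), (2, 3), (2, 6), (5, 8)]
```

For `example_edges` the widest cut is at p = 2, where the edges (1, 4), (2, 3) and (2, 6) are cut, so the cutwidth of this ordering is 3.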

The cutwidth parameter

Algorithm outline:
  Pick any p ∈ {1,…,n-1}.
  For each assignment for Vp that is compatible with Ep:
    Recursively find the optimal solution for the subgraphs of G induced by {1,…,p} and {p+1,…,n} under this assignment.
  Return the highest scoring solution found in the previous step.

(Figure: vertices 1 2 3 4 5 6 7 8 with example codons CGA UAA CGG AUA GUU CGC.)
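The outline can be turned into a small recursive Python sketch. It is a variation rather than a transcription: instead of assigning whole codons to Vp it fixes only the paired nucleotides of the cut edges, and it always splits in the middle. It assumes 0-based codon indices, edges (u, v, a, b) with u < v, each nucleotide in at most one pair, and at least one feasible solution with a finite score. No claim is made that this sketch attains the running time stated in the talk.

```python
import math
from itertools import product

PAIRS = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")]
CODONS = ["".join(t) for t in product("ACGU", repeat=3)]


def solve(sims, edges, lo=0, hi=None, fixed=None):
    """Best (score, codons) for codons lo..hi-1 given already fixed nucleotides.

    fixed: {(codon, offset): base}, chosen by the enclosing calls.
    """
    if hi is None:
        hi = len(sims)
    fixed = fixed or {}
    if hi - lo == 1:
        # Base case: best single codon agreeing with the fixed nucleotides.
        ok = [c for c in CODONS
              if all(c[o] == b for (v, o), b in fixed.items() if v == lo)]
        best = max(ok, key=sims[lo])
        return sims[lo](best), [best]
    p = (lo + hi) // 2  # any split point is valid; the middle keeps recursion shallow
    crossing = [e for e in edges if lo <= e[0] < p <= e[1] < hi]
    best = (-math.inf, None)
    for choice in product(PAIRS, repeat=len(crossing)):
        new_fixed = dict(fixed)
        for (u, v, a, b), (x, y) in zip(crossing, choice):
            new_fixed[(u, a)] = x
            new_fixed[(v, b)] = y
        left_score, left = solve(sims, edges, lo, p, new_fixed)
        right_score, right = solve(sims, edges, p, hi, new_fixed)
        if left_score + right_score > best[0]:
            best = (left_score + right_score, left + right)
    return best
```

Every edge is fixed exactly once, at the first split that separates its endpoints, so by the time the recursion reaches a single codon all of its constraints are known and the two halves can be solved independently.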

The cutwidth parameter

Time = O(2^{O(κ)} n), where κ is the cutwidth of G, hence polynomial for κ = O(lg n).

Theorem [Korach&Solel’93 via Chung&Seymour’89]: Any graph G with n vertices and constant treewidth has a vertex ordering such that G under this ordering has cutwidth O(lg n).

Theorem [Bodlaender’95]: If G is either a chordal graph or a circular-arc graph with constant maximum clique size then G has constant treewidth. If G is k-outerplanar for any constant k then G has constant treewidth.

Combining all the above we get: MRSO is polynomial time solvable if G is either a chordal graph, a circular-arc graph, or k-outerplanar.

Binary similarity functions

Suppose we are only interested in the number of “correct” codons in a solution.

In this case we can restrict ourselves to binary similarity functions. That is, for all i: fi : Σ^3 → {0,1}, where Σ = {A,C,G,U}.

Unfortunately, MRSO is NP-hard even when restricted only to instances with binary similarity functions.

(Figure: a Source sequence and a Target sequence.)
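A binary similarity function is simply an indicator of a set of accepted codons, and the score then counts the "correct" codons. In the Python sketch below the accepted sets are hypothetical and the helper names are chosen here for illustration.

```python
def binary_similarity(accepted):
    """f(c) = 1 if codon c is in the accepted set, else 0."""
    accepted = frozenset(accepted)
    return lambda codon: 1 if codon in accepted else 0


def count_correct(codons, sims):
    """Score under binary similarity functions: the number of correct codons."""
    return sum(f(c) for f, c in zip(sims, codons))


# Hypothetical accepted sets for a two-codon instance.
sims = [binary_similarity({"CGG", "CGA"}), binary_similarity({"CUA"})]
```

Here `count_correct(["CGA", "CUG"], sims)` is 1: the first codon is accepted, the second is not.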

Binary similarity functions

MRSO with binary similarity functions is in FPT for parameter k = score of the optimal solution. More precisely, it is solvable in O(2^{9.25k} n) time.

Proof sketch:
  We can assume w.l.o.g. that for all i there exists a C such that fi(C) = 1.
  Any maximal independent set in G is of size at least n/4, since G is at most cubic.
  We prove for k ≤ n/4 and k > n/4 separately.
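The n/4 bound comes from a greedy argument: every chosen vertex rules out itself and at most three neighbours. A minimal Python sketch, assuming the graph is given as a dictionary from each vertex to its set of neighbours (a representation chosen here, not taken from the talk):

```python
def greedy_independent_set(adj, k=None):
    """Pick vertices greedily, blocking each pick and its neighbours.

    adj: {vertex: set of neighbours}. Stops early once k vertices are chosen.
    """
    blocked, chosen = set(), []
    for v in adj:
        if k is not None and len(chosen) == k:
            break
        if v not in blocked:
            chosen.append(v)
            blocked.add(v)
            blocked.update(adj[v])
    return chosen


# Example: the 3-dimensional cube graph, which is cubic (vertices 0..7,
# two vertices adjacent when their binary labels differ in one bit).
cube = {v: {v ^ 1, v ^ 2, v ^ 4} for v in range(8)}
```

With maximum degree 3, each pick blocks at most four vertices, so a full run returns at least n/4 vertices; on the cube it returns four pairwise non-adjacent vertices.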

Binary similarity functions

Suppose k ≤ n/4 (k = score of the optimal solution):
  Find an independent set of size k in O(k) time.
  Since for all i there exists a C such that fi(C) = 1, there exists an assignment to this independent set which guarantees a score of at least k.
  Since fi ≥ 0 for all i, this assignment can be extended to all vertices of G to obtain an assignment with score at least k.

Suppose k > n/4:
  Try all k-subsets of the vertices of G. There are at most (4k choose k) ≤ 2^{3.25k} such subsets.
  Enumerating all possible codon assignments for each subset requires O(2^{6k}) time.
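The constants in the second case can be checked with the standard entropy bound on binomial coefficients. The derivation below is a worked sketch, writing k for the score parameter and H for the binary entropy function (notation introduced here); it uses n < 4k and the fact that there are 4^3 = 64 codons.

```latex
\[
\binom{n}{k} \le \binom{4k}{k} \le 2^{4k\,H(1/4)} \approx 2^{3.245k} \le 2^{3.25k},
\qquad
H(x) = -x\log_2 x - (1-x)\log_2(1-x),
\]
\[
\underbrace{2^{3.25k}}_{\text{subsets}} \cdot \underbrace{64^{k}}_{\text{codon assignments}}
= 2^{3.25k} \cdot 2^{6k} = 2^{9.25k}.
\]
```

Here H(1/4) = 0.5 + 0.75 log2(4/3) ≈ 0.811, which gives the exponent 3.245k and explains the constant 9.25 in the overall bound.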

Closing remarks

Extending our results:
  Finding a practical algorithm for the cutwidth problem restricted to cubic graphs with fixed cutwidth.
  More interesting parameters? Hardness results?
  Applying our techniques to a similar variation of the problem which has been studied in the literature [Backofen’04].

Thank You!