
Post on 01-Apr-2015


A Simpler 1.5-Approximation Algorithm for Sorting by Transpositions

Tzvika Hartman

Weizmann Institute

Genome Rearrangements

During evolution, genomes undergo large-scale mutations which change gene order (reversals, transpositions, translocations).

Given 2 genomes, GR algs infer the most economical sequence of rearrangement events which transform one genome into the other.

Genome Rearrangements Model

Chromosomes are viewed as ordered lists of genes.

Unichromosomal genome, every gene appears once.

Genomes are represented by unsigned permutations of genes.

Circular genomes (e.g., bacteria & mitochondria) are represented by circular perms.

Sorting by Transpositions

A transposition exchanges 2 consecutive (adjacent) segments of a perm.

Example : 1 2 3 4 5 6 7 8 9 → 1 2 6 7 3 4 5 8 9 (segments 3 4 5 and 6 7 are exchanged)

Sorting by transpositions : finding a shortest sequence of transpositions which sorts the perm.
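The definition above can be sketched in a few lines of Python. This is only an illustration, not the algorithm of the talk: `transpose` and `transposition_distance` are hypothetical helper names, cut points are 0-based, and the distance is found by exhaustive breadth-first search, which is feasible only for very small perms.

```python
from collections import deque
from itertools import combinations

def transpose(perm, i, j, k):
    """Exchange the adjacent segments perm[i:j] and perm[j:k] (0 <= i < j < k <= n)."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]

def transposition_distance(perm):
    """Exact distance of a linear perm by breadth-first search (exponential time)."""
    start = tuple(perm)
    target = tuple(sorted(perm))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        # Every choice of 3 cut points defines one transposition.
        for i, j, k in combinations(range(len(cur) + 1), 3):
            nxt = transpose(cur, i, j, k)
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None

print(transpose([1, 2, 3, 4, 5, 6, 7, 8, 9], 2, 5, 7))  # [1, 2, 6, 7, 3, 4, 5, 8, 9]
print(transposition_distance([3, 2, 1]))                 # 2
```

The first call reproduces the example on this slide; the second shows that even a perm of size 3 can need 2 transpositions.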

Previous work

1.5-approximation algs for sorting by transpositions [BafnaPevzner98, Christie99].

An alg that sorts every perm of size n in at most 2n/3 transpositions [Eriksson et al 01].

Complexity of the problem is still open.

Main Results

1. The problem of sorting circular permutations by transpositions is equivalent to sorting linear perms by transpositions.

2. A new and simple 1.5-approximation alg for sorting by transpositions, which runs in quadratic time.

Linear & Circular Perms

[Figure: a linear transposition t and a circular transposition t, each drawn on a perm cut into segments labelled A, B, C (and D in the linear case).]

• Circular transpositions can be represented by exchanging any 2 of the 3 segments.

A transposition “cuts” the perm at 3 points.

Linear & Circular Equivalence

Thm : Sorting linear perms by transpositions is computationally equivalent to sorting circular perms.

Pf sketch: Circularize the linear perm by adding a new element n+1 and closing the circle.

[Figure: the linear perm π1 . . . πn closed into a circle, with πn+1 placed between πn and π1.]

• Every linear transposition is equivalent to a circular transposition that exchanges the 2 segments that do not include n+1.
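A small sketch of the correspondence, under the assumption that a circular perm is stored as a Python list read cyclically. The names `circularize` and `linearize` are illustrative, not from the talk.

```python
def transpose(perm, i, j, k):
    """Exchange the adjacent segments perm[i:j] and perm[j:k]."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]

def circularize(linear):
    """Append the new element n+1; the result is read as a circle."""
    return list(linear) + [len(linear) + 1]

def linearize(circular):
    """Rotate the circle so that its largest element is last, then drop it."""
    p = circular.index(len(circular))
    rotated = circular[p + 1:] + circular[:p + 1]
    return rotated[:-1]

linear = [3, 1, 2]
circ = circularize(linear)            # [3, 1, 2, 4]

# A circular transposition that moves the segment containing n+1 = 4 ...
moved = transpose(circ, 1, 2, 4)      # [3, 2, 4, 1]

# ... gives the same perm as the linear transposition that exchanges
# the 2 segments not containing n+1 (here the segments 3 and 1).
print(linearize(moved))               # [1, 3, 2]
print(transpose(linear, 0, 1, 2))     # [1, 3, 2]
```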

Breakpoint Graph [BafnaPevzner98]

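Perm : π = ( 1 6 5 4 7 3 2 )

Replace each element j by 2j-1, 2j:

Extended perm = (1 2 11 12 9 10 7 8 13 14 5 6 3 4)

Circular breakpoint graph G(π):

[Figure: the 14 vertices placed on a circle in the order of the extended perm, joined by black and grey edges.]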

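Vertex for every element.

Black edges (π2i, π2i+1): the elements in positions 2i and 2i+1 of the extended perm, read cyclically.

Grey edges (2i, 2i+1), with 2n joined to 1.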
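The construction can be sketched as follows. The code assumes the usual circular breakpoint graph convention: black edges join the elements in positions 2i and 2i+1 of the extended perm (cyclically), grey edges join the values 2i and 2i+1 (with 2n joined to 1). The function names are illustrative, and the length of a cycle is taken to be its number of black edges.

```python
def extend(perm):
    """Replace each element j by the pair 2j-1, 2j."""
    return [x for j in perm for x in (2 * j - 1, 2 * j)]

def cycle_lengths(perm):
    """Number of black edges in each cycle of the circular breakpoint graph."""
    ext = extend(perm)
    m = len(ext)                               # m = 2n vertices
    pos = {v: p for p, v in enumerate(ext)}    # 0-based position of each vertex
    seen, lengths = set(), []
    for start in ext:
        if start in seen:
            continue
        v, k = start, 0
        while v not in seen:
            p = pos[v]
            # Black edge: 1-based positions 2i and 2i+1, i.e. 0-based 2i-1 and 2i.
            w = ext[(p + 1) % m] if p % 2 == 1 else ext[(p - 1) % m]
            seen.update((v, w))
            # Grey edge: value 2i is joined to 2i+1, and 2n to 1.
            v = w % m + 1 if w % 2 == 0 else (w - 2) % m + 1
            k += 1
        lengths.append(k)
    return lengths

print(extend([1, 6, 5, 4, 7, 3, 2]))
# [1, 2, 11, 12, 9, 10, 7, 8, 13, 14, 5, 6, 3, 4]
print(cycle_lengths([1, 6, 5, 4, 7, 3, 2]))    # [2, 3, 2]
print(cycle_lengths([1, 2, 3]))                # [1, 1, 1]
```

Under these assumptions the example perm decomposes into two 2-cycles and one 3-cycle, and the id perm into n cycles of length 1.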

Breakpoint Graph (Cont.)

Unique decomposition into cycles.

c_odd(π) : # of odd cycles in G(π).

Define Δc_odd(π,t) = c_odd(t · π) – c_odd(π)

Lemma [BP98]: ∀ t and π, Δc_odd(π,t) ∈ {0, 2, -2}.


Effect on Graph : Example

Perm: (1 3 2). After extension: (1 2 5 6 3 4). Breakpoint graph:

[Figure: two drawings of the breakpoint graph on vertices 1 to 6, illustrating the effect of a transposition.]

• # of cycles increased by 2

Effect on Graph : Example

Perm : (6 5 4 3 2 1). After extension : (11 12 9 10 7 8 5 6 3 4 1 2). Breakpoint graph :

[Figure: two drawings of the breakpoint graph on vertices 1 to 12, illustrating the effect of a transposition.]

• # of cycles remains 2

Breakpoint Graph (Cont.)

Max # of odd cycles, n, is in the id perm, thus:

Lower bound [BP98]: For all π, d(π) ≥ [n - c_odd(π)]/2.

Goal : increase # of odd cycles in G.

t is a k-transposition if Δc_odd(π,t) = k.

A cycle that admits a 2-transposition is oriented.
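A hedged sketch of the lower bound and of the k-transposition classification, for circular perms stored as lists. It reuses the graph convention assumed for the breakpoint graph (black edges between positions 2i and 2i+1 of the extended perm, grey edges between values 2i and 2i+1); `c_odd`, `lower_bound` and `classify` are illustrative names, and `classify` simply tries every choice of 3 cut points.

```python
from itertools import combinations

def cycle_lengths(perm):
    """Number of black edges in each cycle of the circular breakpoint graph."""
    ext = [x for j in perm for x in (2 * j - 1, 2 * j)]
    m = len(ext)
    pos = {v: p for p, v in enumerate(ext)}
    seen, lengths = set(), []
    for start in ext:
        if start in seen:
            continue
        v, k = start, 0
        while v not in seen:
            p = pos[v]
            w = ext[(p + 1) % m] if p % 2 == 1 else ext[(p - 1) % m]  # black edge
            seen.update((v, w))
            v = w % m + 1 if w % 2 == 0 else (w - 2) % m + 1          # grey edge
            k += 1
        lengths.append(k)
    return lengths

def c_odd(perm):
    """Number of cycles with an odd number of black edges."""
    return sum(k % 2 for k in cycle_lengths(perm))

def lower_bound(perm):
    """[n - c_odd] / 2; n and c_odd always have the same parity."""
    return (len(perm) - c_odd(perm)) // 2

def transpose(perm, i, j, k):
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]

def classify(perm):
    """Map each cut triple (i, j, k) to its change in c_odd."""
    base = c_odd(perm)
    return {(i, j, k): c_odd(transpose(perm, i, j, k)) - base
            for i, j, k in combinations(range(len(perm) + 1), 3)}

print(lower_bound([1, 6, 5, 4, 7, 3, 2]))        # 3
print(classify([1, 3, 2])[(1, 2, 3)])            # 2, a 2-transposition
print(max(classify([6, 5, 4, 3, 2, 1]).values()))  # 0, no 2-transposition exists
```

For a linear perm, the bound applies after circularization, i.e. after appending the element n+1.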

Simple Permutations

A perm is simple if its breakpoint graph contains only short (≤ 3) cycles.

The theory is much simpler for simple perms.

Thm : Every perm can be transformed into a simple one, while maintaining the lower bound. Moreover, the sorting sequence can be mimicked.

Corr : We can focus only on simple perms.

3 - Cycles

2 possible configurations of 3-cycles:

[Figure: a non-oriented 3-cycle and an oriented 3-cycle.]

(0,2,2)-Sequence of Transpositions

A (0,2,2)-sequence is a sequence of 3 transpositions: the 1st is a 0-transposition and the next two are 2-transpositions.

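A series of (0,2,2)-sequences preserves a 1.5 approximation ratio: each sequence uses 3 transpositions to gain 4 odd cycles, while by the lower bound any sorting sequence needs at least 2 transpositions for the same gain (3/2 = 1.5).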

Throughout the alg, we show that there is always a 2-transposition or a (0,2,2)-sequence.

Interleaving Cycles

2 cycles interleave if their black edges appear alternately along the circle.

Lemma : If G contains 2 interleaving 3-cycles, then ∃ a (0,2,2)-sequence.
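The interleaving test can be sketched as below. The sketch assumes the circular breakpoint graph convention used for the lower bound (black edges between positions 2i and 2i+1 of the extended perm, grey edges between values 2i and 2i+1) and numbers the black edges 1..n along the circle; `black_edge_cycles` and `interleave` are illustrative names. It only detects interleaving cycles and does not construct the (0,2,2)-sequence.

```python
def black_edge_cycles(perm):
    """For each cycle, the sorted indices (1..n) of its black edges."""
    ext = [x for j in perm for x in (2 * j - 1, 2 * j)]
    m = len(ext)
    pos = {v: p for p, v in enumerate(ext)}
    seen, cycles = set(), []
    for start in ext:
        if start in seen:
            continue
        v, edges = start, []
        while v not in seen:
            p = pos[v]
            q = (p + 1) % m if p % 2 == 1 else (p - 1) % m   # black partner
            odd = p if p % 2 == 1 else q                      # odd end of the edge
            edges.append((odd + 1) // 2)                      # black edge index
            w = ext[q]
            seen.update((v, w))
            v = w % m + 1 if w % 2 == 0 else (w - 2) % m + 1  # grey partner
        cycles.append(sorted(edges))
    return cycles

def interleave(a, b):
    """True if the black edges of a and b alternate along the circle."""
    labels = [lab for _, lab in
              sorted([(e, "a") for e in a] + [(e, "b") for e in b])]
    # Index t - 1 wraps around for t = 0, so the check is cyclic.
    return len(a) == len(b) and all(
        labels[t] != labels[t - 1] for t in range(len(labels)))

cycles = black_edge_cycles([6, 5, 4, 3, 2, 1])
print(cycles)                 # [[2, 4, 6], [1, 3, 5]]
print(interleave(*cycles))    # True
```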

Shattered Cycles

2 pairs of black edges intersect if they appear alternately along the circle.

Cycle A is shattered by cycles B and C if every pair of black edges in A intersects with a pair in B or with a pair in C.

Lemma : If G contains a shattered cycle, then ∃ a (0,2,2)-sequence.
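The two definitions translate directly into a check on black-edge positions. In this sketch each cycle is given as a collection of distinct positions of its black edges along the circle; the position sets in the example are hypothetical and are not derived from an actual perm, and the function names are illustrative.

```python
from itertools import combinations

def pairs_intersect(p, q):
    """True if the pairs p and q alternate along the circle."""
    lo, hi = sorted(p)
    # Alternation means exactly one element of q lies strictly between lo and hi.
    inside = [lo < x < hi for x in q]
    return inside[0] != inside[1]

def is_shattered(a, b, c):
    """True if every pair of black edges in a intersects a pair in b or in c."""
    others = list(combinations(b, 2)) + list(combinations(c, 2))
    return all(any(pairs_intersect(p, q) for q in others)
               for p in combinations(a, 2))

A, B, C = [1, 4, 7], [2, 3, 5], [6, 8, 9]   # hypothetical edge positions
print(is_shattered(A, B, C))   # True: B covers 2 pairs of A, C covers the third
print(is_shattered(A, B, B))   # False: the pair (1, 7) meets no pair of B
```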

Shattered Cycles (Cont.)

Lemma : If G contains no 2-cycles, no oriented cycles and no interleaving cycles, then ∃ a shattered cycle.

The Algorithm

While G contains a 2-cycle, apply a 2-transposition [Christie99].

If G contains an oriented 3-cycle, apply a 2-transposition on it.

If G contains a pair of interleaving 3-cycles, apply a (0,2,2)-sequence.

If G contains a shattered unoriented 3-cycle, apply a (0,2,2)-sequence.

Repeat until perm is sorted.

Conclusions

We introduced 2 new ideas which simplify the theory and the alg:

1. Working with circular perms simplifies the case analysis.

2. Simple perms avoid the complication of dealing with long cycles (similarly to the HP theory for sorting by reversals).

Open Problems

Complexity of sorting by transpositions.

Models which allow several rearrangement operations, such as trans-reversals, reversals and translocations (both signed & unsigned).

Acknowledgements

Ron Shamir.

Thank you!
