
Algorithms for Parameterized Graph Problems

with Applications to Biological Network Queries

Meirav Zehavi

Technion - Computer Science Department - Ph.D. Thesis PHD-2015-10 - 2015

Algorithms for Parameterized Graph Problems

with Applications to Biological Network Queries

Research Thesis

Submitted in partial fulfillment of the requirements

for the degree of Doctor of Philosophy

Meirav Zehavi

Submitted to the Senate of

the Technion — Israel Institute of Technology

Tammuz 5775 Haifa June 2015


This research thesis was done under the supervision of Prof. Ron Y. Pinter and

Prof. Hadas Shachnai in the Department of Computer Science.

Acknowledgements

I would like to thank my advisors, Ron Y. Pinter and Hadas Shachnai, for their

invaluable guidance throughout the years of my work. I would also like to thank faculty

members and students from the Technion and other institutions that supported and

expressed their interest in my research.

I am grateful to my brother, Shaked, my parents, Ada and Avi, and my grandparents,

Reuven and Batya, for their continuous and unconditional love, encouragement,

inspiration, support, concern, good advice, patience and understanding. Words cannot

express my gratitude. I dedicate this work to my mother, Ada. I also thank my little

dog, Odi.

The generous financial support of the Technion and the Zeff family is gratefully acknowl-

edged.


List of Publications

Papers included in this dissertation:

1. Ron Y. Pinter and Meirav Zehavi. Algorithms for Topology-Free and Alignment

Network Queries. Journal of Discrete Algorithms (JDA), 27:29–53, 2014. [PZ14]

2. Ron Y. Pinter, Hadas Shachnai and Meirav Zehavi. Partial Information Net-

work Queries. Journal of Discrete Algorithms (JDA), 35 (invited IWOCA 2013

issue):129–145, 2015. [PSZ15]

• Preliminary version: Ron Y. Pinter and Meirav Zehavi. Partial Informa-

tion Network Queries. In the proc. of the 24th International Workshop on

Combinatorial Algorithms (IWOCA), pages 362–375, 2013. [PZ13]

3. Meirav Zehavi. Parameterized Algorithms for Module Motif. In the proc. of the

38th International Symposium on Mathematical Foundations of Computer Science

(MFCS), pages 825–836, 2013. [Zeh13c]

4. Meirav Zehavi. Algorithms for k-Internal Out-Branching. In the proc. of the

8th International Symposium on Parameterized and Exact Computation (IPEC),

pages 361–373, 2013. [Zeh13a]

5. Hadas Shachnai and Meirav Zehavi. Representative Families: A Unified Tradeoff-

Based Approach. In the proc. of the 22nd European Symposium on Algorithms

(ESA), pages 786–797, 2014. [SZ14b]

6. Hadas Shachnai and Meirav Zehavi. Parameterized Algorithms for Graph Partition-

ing Problems. In the proc. of the 40th International Workshop on Graph-Theoretic

Concepts in Computer Science (WG), pages 384–395, 2014. [SZ14a]

7. Ron Y. Pinter, Hadas Shachnai and Meirav Zehavi. Deterministic Parameterized

Algorithms for the Graph Motif Problem. In the proc. of the 39th International

Symposium on Mathematical Foundations of Computer Science (MFCS), pages

589–600, 2014. [PSZ14a]

8. Ron Y. Pinter, Hadas Shachnai and Meirav Zehavi. Improved Parameterized

Algorithms for Network Query Problems. In the proc. of the 9th International

Symposium on Parameterized and Exact Computation (IPEC), pages 85–96, 2014.

[PSZ14b]

9. Meirav Zehavi. Mixing Color Coding-Related Techniques. To appear in the

proc. of the 23rd European Symposium on Algorithms (ESA), 2015.


Contents

List of Figures
Abstract
Abbreviations and Notation
1 Introduction
  1.1 Color Coding-Related Techniques
    1.1.1 Color Coding
    1.1.2 Divide-and-Color
    1.1.3 Multilinear Detection and Narrow Sieves
    1.1.4 Representative Sets
  1.2 Parameterized Graph Problems
    1.2.1 Network Query Problems
    1.2.2 Subcases of Subgraph Isomorphism
    1.2.3 Spanning Tree-Related Problems
    1.2.4 Graph Partitioning Problems
    1.2.5 Matching and Packing Problems
2 Methods
  2.1 Representative Sets: New Computations
    2.1.1 A Unified Tradeoff-Based Approach
    2.1.2 Disjointness Conditions
  2.2 Multilinear Detection with Proper Colorings
  2.3 Variants of Randomized Separation
  2.4 Guiding Trees
  2.5 Mixtures of Color Coding-Related Techniques
3 Results
  3.1 Algorithms for Topology-Free and Alignment Network Queries
  3.2 Partial Information Network Queries
  3.3 Parameterized Algorithms for Module Motif
  3.4 Algorithms for k-Internal Out-Branching
  3.5 Representative Families: A Unified Tradeoff-Based Approach
  3.6 Parameterized Algorithms for Graph Partitioning Problems
  3.7 Deterministic Parameterized Algorithms for the Graph Motif Problem
  3.8 Improved Parameterized Algorithms for Network Query Problems
  3.9 Solving Parameterized Problems by Mixing Color Coding-Related Techniques
4 Conclusion
Hebrew Abstract


List of Figures

1.1 A subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ representing $\mathcal{S}$ in a matroid $U_{5,4}$, with p = 2.
1.2 An example of a homeomorphism h from G to G′.
1.3 (A) An input for PINQI, where k = 8. (B) A solution for the input.
2.1 A family $\mathcal{F} \subseteq 2^E$. Assume that n = 5, k = 4, and p = 2. An arrow from a set $S \in \mathcal{S}$ to a set $F \in \mathcal{F}$ indicates that $S \subseteq F$.
2.2 A (v, u)-tree T, and a (v, u)-guide R, where d = 3, k = 12, and T listens to R.


Abstract

There is a growing, vital need for fast algorithms for problems that are unlikely to admit

efficient solutions, based on classical computational complexity theory. Parameterized

Complexity is an exciting paradigm for coping with computationally hard problems,

which is amazingly doable mathematically on a routine basis. In a nutshell, this paradigm

aims to reduce the running times of algorithms for NP-hard problems, by confining the

combinatorial explosion to a parameter k. In the past decade, three techniques, known

as “color coding-related techniques”, led to the design of breakthrough parameterized

algorithms. These techniques are non-standard in the extent to which they connect such

seemingly disparate branches of Computer Science and Mathematics as matroid theory,

linear algebra, graph theory and combinatorial optimization.

The original color coding method was proposed by Alon et al. in 1994. Roughly

speaking, it consists of several iterations where one randomly colors elements that

may be part of a solution; with high probability, we encounter an iteration where the

elements of a solution have distinct colors, in which case they can be discovered via

dynamic programming. Color coding can be derandomized and is suitable for developing

algorithms for weighted problems. Divide-and-color, introduced by Chen et al. in 2006,

improved upon color coding. It is a recursive technique, where each stage can be viewed

as an application of color coding with only two colors. Multilinear detection and narrow

sieves are tightly linked algebraic techniques, introduced in 2008 and 2010 by Koutis

et al. and Björklund et al., respectively. These techniques are suitable for developing

extremely fast randomized algorithms for unweighted problems. Recently, Fomin et

al. obtained a powerful computation of representative sets, which enables development

of deterministic algorithms for weighted problems that are faster than those based on

divide-and-color, though not as fast as those based on narrow sieves.

This thesis presents general schemes for mixing and improving upon color coding-

related techniques, namely, divide-and-color, multilinear detection/narrow sieves and

representative sets. On a high level, to create wholes greater than the sums of their

parts, we propose strategies whose components combine different color coding-related

techniques. For example, we introduce a strategy consisting of two narrow sieves-

based procedures and a divide-and-color step. On a lower level, we examine improved

methods for applying each color coding-related technique individually. This includes a

unified tradeoff-based approach for computing representative sets, an application of the

multilinear detection technique that integrates proper colorings of graphs, and a tool

called guiding trees utilized for finding trees of unknown topologies.

We develop specific algorithms for classical problems such as k-Path, k-Internal

Out-Branching, 3-Set k-Packing and Max (k, n − k)-Cut, as well as problems

motivated by real-world applications to network queries, which are of major importance

to systems biology. In particular, we initiate the study of the Partial Information

Network Query problem. Roughly speaking, in a network query we seek a given

pattern in a node-labeled graph. In network biology, such queries can be used to study

the function, preservation and evolution of certain biological entities. Generalizing

previous variants of network query problems, the Partial Information Network

Query problem fits the scenario where only partial information is available on the

topology of the pattern.



Abbreviations and Notation

FPT : fixed-parameter tractable

$O^*$ : hides factors polynomial in the input size

ANQ : alignment network query

GM : graph motif

TFNQ : topology-free network query

PINQ : partial information network query

indels : insertions and deletions

IOB : internal out-branching

IST : internal spanning tree

FGPP : fixed-cardinality graph partitioning problem

SP : set packing

WSP : weighted set packing

PC : partial cover

DS : dominating set



Chapter 1

Introduction

Design and analysis of efficient algorithms is at the heart of computer science. Indeed,

algorithms are essential to the way computers process data. However, when considering

the class of NP-hard problems, such algorithms seem to be beyond our reach. Yet, large-

scale computer applications often need to rely on solutions to these problems. Therefore,

over the years, multiple paradigms for coping with computationally difficult problems

have been suggested. Parameterized Complexity (also known as Multivariate Complexity),

introduced in the 1990s by Downey and Fellows, is an exciting such paradigm, which is

amazingly doable mathematically on a routine basis [FG06, Nie06, DF13].

In a nutshell, parameterized complexity aims to reduce the running times of algo-

rithms for NP-hard problems, by confining the combinatorial explosion to a parameter

k. More precisely, a problem is fixed-parameter tractable (FPT) with respect to a

parameter k if it can be solved by a parameterized algorithm, whose running time

can be bounded by a polynomial in the input size n, multiplied by a function f that

depends only on k (e.g., $2^k \cdot n$, $k^{\log k} \cdot n^3$ and $2^{2^k} \cdot \log n$). This bound is denoted by

$O^*(f(k))$, where $O^*$ is the standard notation for hiding factors polynomial in the input

size. For some problems, it is possible to develop not only parameterized algorithms, but

also kernelization algorithms. Roughly speaking, a kernelization algorithm translates

an instance of a given problem to a smaller instance of the same problem, whose size

depends only on k, in polynomial time. As noted by Marx [Mar12], “the existence

of polynomial kernels is a mathematically deep and very fruitful research direction.”

More details on kernelization can be found, e.g., in the excellent surveys by Bodlaender

[Bod14] and Lokshtanov et al. [LMS12].

By a careful examination of real datasets, one can choose a parameter k that will

often be significantly smaller than the input size. This allows developing practical

algorithms for non-trivial classes of instances of NP-hard problems. Niedermeier [Nie10]

observes that “problem parameterization is a pervasive and ubiquitous tool in attacking

intractable problems”, and “multivariate algorithmics helps to significantly increase the

impact of theoretical computer science on practical computing”. Indeed, parameterized

problems arise in a variety of areas, including Bioinformatics [HNW08], Social Choice



and Voting [BBC+12], Artificial Intelligence [GS08], Psychology and Cognitive Science

[vRT08], Geometry [GKR09] and Database Theory [Gro02].

To demonstrate the practical relevance of parameterized algorithms, NP-hard prob-

lems arising from molecular biology are natural candidates. These problems are notori-

ously known for requiring researchers to frequently explore and unravel great masses

of datasets. However, structural properties of common inputs for these problems can

be exploited by parameterized algorithms to sidestep an impractical dependency of

running time on the size of the dataset. In particular, when considering a problem

primarily motivated by bioinformatics applications, the size of its solution, which is

the most well-studied parameter in Parameterized Complexity, is often significantly

smaller than the input size. Hüffner et al. [HNW08] demonstrate the usefulness of

parameterized algorithms in bioinformatics research, stating that they “firmly believe

that there will continue to be a strong interaction between parameterized complexity

analysis and algorithmic bioinformatics.” The explosion of publicly available genomic

datasets resulting from the Human Genome Project fueled a computational revolution

in biology: many bioinformatics techniques were developed to sequence and annotate

genomes and their observed mutations. However, these techniques cannot be used to

analyze the increasing amount of data available on biological networks. To unravel this

new mass of information, it is essential to design novel graph algorithms, in general,

and parameterized graph algorithms, in particular.

During the last decade, numerous techniques were incorporated in the design of

parameterized algorithms (see, e.g., the wide variety of research results published in the

proceedings of the International Symposium on Parameterized and Exact Computation).

This myriad of techniques reflects the rapid growth of the area of parameterized

complexity. In particular, three techniques, known as “color coding-related techniques”,

led to the design of breakthrough parameterized algorithms. These techniques are

non-standard in the extent to which they connect such seemingly disparate branches of

Computer Science and Mathematics as matroid theory, linear algebra, graph theory and

combinatorial optimization. They improve upon the seminal work on the color coding

technique by Alon et al. [AYZ95]; this work resulted in the first algorithms that run in

time $O^*(c^k)$, where c > 1 is a constant, for several subcases of Subgraph Isomorphism

such as the k-Path problem.

In this thesis, we present general schemes for mixing and improving upon color

coding-related techniques. On a high level, to create wholes greater than the sums of

their parts, we propose strategies whose components combine different color coding-

related techniques. On a lower level, we examine improved methods for applying

each color coding-related technique individually. We demonstrate the power and wide

applicability of color coding-related techniques, including our general schemes, by

developing fast parameterized algorithms for a variety of graph problems, including

subcases of subgraph isomorphism, a spanning tree-related problem, graph partitioning

problems, and matching and packing problems. Special emphasis is placed on network



query problems, which are of major importance to systems biology. In particular, we

define a new problem, the partial information network query problem, which forms a

bridge between two types of well-known network queries, namely alignment-based and

topology-free network queries.

1.1 Color Coding-Related Techniques

In this section, we give an overview of color coding, and the three color coding-related

techniques developed in the last decade: divide-and-color, multilinear detection/narrow

sieves, and representative sets.

1.1.1 Color Coding

The color coding method was proposed in 1994 by Alon et al. [AYZ94, AYZ95]. It

was originally used to solve subcases of the Subgraph Isomorphism problem. More

generally, it is suitable for developing algorithms for many problems, including weighted

problems, where one seeks solutions that contain exactly k distinct elements, and which

become “easier” if the entire universe of elements contains only k elements.

This method is usually applied as follows. Using a palette of k colors, one performs

several iterations; each iteration randomly colors every element that may be part of a

solution. If enough iterations are performed, with high probability, there is an iteration

where the k elements of a solution have distinct colors. Then, a solution can be found

via dynamic programming. While previously, for each partial solution, one had to store

the elements associated with it, now one has to store the colors associated with it. Since

there are only k colors, there are only $2^k$ options for color sets associated with partial

solutions. Thus, the problem becomes easier. For example, given a colored graph, a

colorful simple path on k vertices can be found in time $O^*(2^k)$ via standard dynamic

programming.
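The procedure described above can be sketched in Python for k-Path. This is a minimal illustration, not the implementation of [AYZ95]: the adjacency-dictionary input format, the function names and the repetition constant `repetitions_factor` are assumptions made for the example. The dynamic program stores, for every end vertex, only the sets of colors used so far (as bitmasks), which is what bounds the table size by $2^k$ entries per vertex.

```python
import math
import random

def colorful_path_exists(adj, coloring, k):
    """Dynamic programming over (end vertex, set of used colors).

    Returns True iff the colored graph has a path on k vertices whose
    colors are pairwise distinct (such a path is necessarily simple).
    """
    # reachable[v] holds bitmasks of color sets of colorful paths ending at v.
    reachable = {v: {1 << coloring[v]} for v in adj}
    for _ in range(k - 1):
        extended = {v: set() for v in adj}
        for u in adj:
            for mask in reachable[u]:
                for v in adj[u]:
                    bit = 1 << coloring[v]
                    if not mask & bit:          # color of v not used yet
                        extended[v].add(mask | bit)
        reachable = extended
    return any(reachable[v] for v in adj)

def has_k_path(adj, k, seed=0, repetitions_factor=3):
    """Randomized color coding: about e^k random colorings with k colors.

    One-sided error: True is always correct; False may be wrong with
    small probability. `repetitions_factor` is an illustrative constant.
    """
    rng = random.Random(seed)
    trials = math.ceil(repetitions_factor * math.e ** k)
    for _ in range(trials):
        coloring = {v: rng.randrange(k) for v in adj}
        if colorful_path_exists(adj, coloring, k):
            return True
    return False
```

For instance, on a star with three leaves, `has_k_path(star, 4)` always returns `False`, since no coloring can make a non-existent path colorful.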

In a given iteration, the probability that the elements of a solution have distinct

colors is $k!/k^k > e^{-k}$. Therefore, to ensure that one encounters an iteration where a

solution is colorful, it is sufficient to perform $O^*(e^k)$ iterations. For example, in the case

of k-Path (where one seeks a simple path on k vertices in a graph), this implies that the

overall running time of the algorithm is bounded by $O^*((2e)^k)$. Hüffner et al. [HWZ08]

showed that by using more than k colors, certain color coding-based algorithms can be

sped up. For example, the bound on the running time of the algorithm for k-Path is

improved from $O^*((2e)^k)$ to $O^*(4.314^k)$.
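The success probability used above can be verified in one line. The k solution elements can be colored in $k^k$ ways, of which $k!$ assign pairwise distinct colors, and the lower bound follows from the Taylor series of the exponential function:

```latex
\Pr[\text{solution is colorful}] \;=\; \frac{k!}{k^{k}},
\qquad
e^{k} \;=\; \sum_{i=0}^{\infty} \frac{k^{i}}{i!} \;>\; \frac{k^{k}}{k!}
\;\;\Longrightarrow\;\;
\frac{k!}{k^{k}} \;>\; e^{-k}.
```

Multiplying the $O^*(e^k)$ iterations by the $O^*(2^k)$ cost of the dynamic program in each iteration gives the stated $O^*((2e)^k)$ bound.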

The color coding method can be derandomized by using a construction of a k-perfect

hash family. This is a family F of functions from {1, 2, . . . , n} to {1, 2, . . . , k}, where n

is the size of the universe, such that for every subset S ⊆ {1, 2, . . . , n} of size k, there is

a function in F that is bijective when restricted to S. Having such a family F , one does

not need to color elements randomly, but can color them according to each function in F .



The best explicit construction of a k-perfect hash family, given by Naor et al. [NSS95],

results in a family of size $O^*(e^{k+o(k)})$, which can be computed in time $O^*(e^{k+o(k)})$.
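The defining property of a k-perfect hash family can be checked by brute force on small universes. The sketch below is only a checker under assumed conventions (elements and colors are 0-based, and a function is stored as a tuple of colors); it does not implement the construction of [NSS95].

```python
from itertools import combinations

def is_k_perfect(family, n, k):
    """Check that every k-subset of {0, ..., n-1} is colored injectively
    by at least one function in `family`.

    Each function f is a tuple of length n with f[i] in {0, ..., k-1}.
    """
    for subset in combinations(range(n), k):
        if not any(len({f[i] for i in subset}) == k for f in family):
            return False
    return True

# Two functions suffice for n = 4, k = 2: every pair of elements differs
# in at least one of the two "binary digits" below.
example_family = [(0, 0, 1, 1), (0, 1, 0, 1)]
```

Replacing the random colorings of color coding by a loop over such a family removes the randomness, at the cost of one dynamic-programming run per function.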

1.1.2 Divide-and-Color

The divide-and-color method was proposed in 2006 by Chen et al. [CLS+07, KMR+06,

CKL+09] to solve the k-Path problem and matching and packing problems. Divide-

and-color is suitable for solving weighted problems, and results in algorithms that have

polynomial space complexities, and which are faster than those based on color coding.

For example, using divide-and-color, k-Path can be solved in (deterministic) time

$O^*(4^{k+o(k)})$.

The technique is based on recursion; each step of the recursion can be viewed as an

application of color coding, where one has a palette of only two colors. More precisely,

at each step, we have a set A of n elements, and we seek a certain subset A∗ of k

elements in A. We partition A into two (disjoint) sets, B and C, by randomly coloring

its elements. Thus, we get the problem of finding a subset B∗ ⊆ A∗ in B, and another

problem of finding the subset C∗ = A∗ \ B∗ in C. The partition should be done in a

manner that is both efficient and results in an easier problem, which does not necessarily

mean that we get two independent problems (of finding B∗ in B and C∗ in C).

Usually, one is interested in finding a partition such that |B∗| = |C∗| = k/2. For

example, in the case of k-Path, instead of directly finding a simple path on k vertices

from A, one is interested in finding a simple path on k/2 vertices from B, along with

such a path whose vertices belong to C, which can be combined to one simple path

on k vertices from A. Since the probability that the elements in A are colored in an

appropriate manner (i.e., B∗ ⊆ B and C∗ ⊆ C) is $2^{-k}$, the recursive step should consist

of $O^*(2^k)$ coloring iterations. Overall, this implies that one obtains an algorithm that

runs in time $O^*(2^{k/2^0} \cdot 2^{k/2^1} \cdot 2^{k/2^2} \cdots 1) = O^*(4^k)$.
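The recursion can be sketched in Python for k-Path as follows. This is a simplified randomized illustration, not the algorithm of Chen et al.: the function names, the adjacency-dictionary format, the repetition constant `c` and the choice to split k into ⌈k/2⌉ and ⌊k/2⌋ are assumptions. Each recursive call returns the end-point pairs of paths found in the vertex set it was given, and a left path is combined with a right path whenever an edge joins them.

```python
import random

def find_paths(adj, vertices, k, rng, c=10):
    """Return pairs (u, v) such that the subgraph induced on `vertices`
    contains a simple path on exactly k vertices from u to v.

    One-sided error: every returned pair is witnessed by a real path,
    but an existing pair may be missed with small probability.
    """
    if k == 1:
        return {(v, v) for v in vertices}
    k_left, k_right = (k + 1) // 2, k // 2
    found = set()
    for _ in range(c * 2 ** k):                 # O*(2^k) two-color iterations
        left = {v for v in vertices if rng.random() < 0.5}
        right = vertices - left
        if len(left) < k_left or len(right) < k_right:
            continue
        left_paths = find_paths(adj, left, k_left, rng, c)
        right_paths = find_paths(adj, right, k_right, rng, c)
        starts_in_right = {}
        for y, v in right_paths:
            starts_in_right.setdefault(y, set()).add(v)
        for u, x in left_paths:
            for y in adj[x]:                    # edge joining the two halves
                for v in starts_in_right.get(y, ()):
                    found.add((u, v))
    return found

def has_k_path_divide_and_color(adj, k, seed=0, c=10):
    """Hypothetical wrapper: True iff some path on k vertices was found."""
    return bool(find_paths(adj, set(adj), k, random.Random(seed), c))
```

Because the two halves live in disjoint vertex sets, the concatenated walk is automatically a simple path, so no set of visited vertices has to be stored.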

Deterministic applications of divide-and-color use a tool called an (n, k)-universal

set. This is a set F of functions from {1, 2, . . . , n} to {0, 1} such that for every pair (I, g)

of a subset I ⊆ {1, 2 . . . , n} of size k and a function g : I → {0, 1}, there is a function

f ∈ F which satisfies f(i) = g(i) for all i ∈ I. As in the case of color coding, having

such a family F , one does not need to color elements randomly, but can color them

according to each function in F . The best explicit construction of an (n, k)-universal

set, given by Naor et al. [NSS95], results in a family of size $O^*(2^{k+o(k)})$, which can be

computed in time $O^*(2^{k+o(k)})$.

1.1.3 Multilinear Detection and Narrow Sieves

The multilinear detection method was proposed in 2008 by Koutis and Williams [Kou08,

Wil09, KW09]. This technique evolved to the more powerful narrow sieves method,

introduced in 2010 by Björklund et al. [Bj10, BHK+10a]. Both multilinear detection

and narrow sieves are algebraic techniques, where a problem is translated to a new one



that involves polynomials by associating monomials with potential solutions. These

techniques enable development of extremely fast parameterized algorithms that have

polynomial space complexities; however, they are only known to be relevant to the

design of randomized algorithms for unweighted problems.¹ In particular, using narrow

sieves, Björklund [Bj10] discovered an $O^*(1.66^n)$-time algorithm for Hamiltonicity,

which is the first algorithm for this problem that breaks the $O^*(2^n)$-time barrier.

In applying the multilinear detection method, for some parameter t, one associates

a monomial of degree at most t with each potential solution, such that a monomial is

multilinear iff it is associated with a correct potential solution. Then, there is a solution

to the original problem iff the polynomial that is the sum of the monomials associated

with potential solutions contains a multilinear monomial. The problem of deciding

whether the resulting polynomial indeed has a multilinear monomial can be solved using

a known algorithm in time $O^*(2^t)$ and polynomial space [Kou08, Wil09]. For example,

when using multilinear detection to solve k-Path, one can assign an indeterminate

to each vertex; while traversing the graph, a polynomial containing information on

the vertices already visited is multiplied by the indeterminate assigned to the vertex

currently visited. Thus, if in a certain walk a vertex is visited more than once, the

resulting monomial is not multilinear.
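The encoding of k-Path as a polynomial can be made concrete with a small Python sketch. It expands the walk polynomial explicitly, which takes exponential time and space, so it only illustrates the correspondence between simple paths and multilinear monomials; it is not the $O^*(2^t)$-time detection algorithm of [Kou08, Wil09]. The function names and the representation of a monomial as a sorted tuple of vertices are assumptions made for the example.

```python
from collections import Counter

def walk_polynomial(adj, k):
    """Sum, over all walks on k vertices, of the product of the
    indeterminates x_v of the visited vertices.

    A monomial is a sorted tuple of vertices (with repetitions);
    the Counter maps each monomial to its integer coefficient.
    """
    ending_at = {v: Counter({(v,): 1}) for v in adj}
    for _ in range(k - 1):
        extended = {v: Counter() for v in adj}
        for u in adj:
            for monomial, coefficient in ending_at[u].items():
                for v in adj[u]:
                    # Multiply by the indeterminate of the vertex now visited.
                    extended[v][tuple(sorted(monomial + (v,)))] += coefficient
        ending_at = extended
    total = Counter()
    for v in adj:
        total.update(ending_at[v])
    return total

def has_multilinear_monomial(polynomial):
    """A monomial is multilinear iff no vertex appears twice in it."""
    return any(len(set(m)) == len(m) for m in polynomial)
```

On the path 0-1-2 with k = 3, the walk 0-1-0 contributes the non-multilinear monomial $x_0^2 x_1$, while the two traversals of the simple path contribute $x_0 x_1 x_2$ with coefficient 2.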

The narrow sieves method also expresses a parameterized problem by associating

monomials with potential solutions. However, now each monomial should either represent

a unique correct solution, or an even number of incorrect solutions. Roughly speaking,

this is achieved by using a set L of t labels, such that some indeterminates are defined

by labeled elements: one carefully chooses t elements, and replaces each of them by t

labeled elements. When using an element that was replaced, a polynomial containing

information on elements already used is multiplied by the sum of the t indeterminates

of the labeled elements replacing the element currently used. Proofs that the monomials

are associated according to the narrow sieves method often show that given a monomial

associated with a correct potential solution, the solution can be uniquely constructed,

while the set of incorrect potential solutions can be partitioned into pairs, where each

pair contains incorrect potential solutions associated with the same monomial. The

latter argument is usually shown by examining an incorrect potential solution A, and

executing a procedure that modifies it to obtain a different incorrect potential solution

B; then, it is proved that if one performs the same procedure when starting from B, the

result is A. If A is incorrect since it contains at least two labeled elements, p and q, that

correspond to the same original element, the procedure obtains B by swapping labels

between such elements; it can be assumed that p and q have different labels (as explained

below), and thus A ≠ B. Otherwise, a problem-specific procedure is necessary; for

example, in the case of k-Path, some incorrect potential solutions, which are walks that

are not simple paths, are paired via a procedure that changes the direction in which

parts of them are traversed.

¹ More precisely, they can be used to solve weighted problems, but the time complexities of the resulting algorithms have exponential dependencies on the length of the input weights.

Next, having associated monomials with potential solutions as required, one examines

the polynomial that is the sum of the monomials. This polynomial is evaluated

over a field of characteristic two. Clearly, it is extremely inefficient to explicitly

compute the polynomial, and therefore one only computes evaluations of it via dynamic

programming; each evaluation corresponds to a randomly chosen assignment of elements

from the field. Now, relying on the Schwartz-Zippel lemma [DL78, Sch80, Zip79], and

due to the special manner of associating monomials with potential solutions, with a

non-negligible probability, the evaluation returns zero iff the problem does not have a

solution. Thus, by performing a polynomial number of evaluations, one can solve the

problem. However, up to this point, we did not consider the ingredient that consumes

most of the computation time: one needs to ensure that if a potential solution is

incorrect since it contains at least two labeled elements p and q as described above,

those elements have different labels. To this end, one actually considers a polynomial

that is the sum of monomials that are not only associated with potential solutions,

but also with “illegal solutions” that do not satisfy the above condition. By using

the inclusion-exclusion principle, and since the polynomial is evaluated over a field of

characteristic two, it is possible to cancel the monomials associated with the illegal

solutions. This involves an iteration over every subset of the set of labels L, which

overall results in algorithms whose running times are bounded by $O^*(2^t)$. To truly

exploit the power of the narrow sieves method, it is desired to use as few labels as

possible, attempting to devise problem-specific procedures that capture useful features

of the studied problem in order to pair incorrect potential solutions.
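Two ingredients of this argument, cancellation of paired monomials in characteristic two and detection of a surviving monomial by random evaluation, can be illustrated with a toy Python sketch. It is not the narrow sieves dynamic program: the polynomial is given explicitly as a list of monomials (one per potential solution), and the field GF(2^8) with reduction polynomial 0x11B is an assumption chosen for the example.

```python
import random

def gf256_mul(a, b):
    """Multiply two elements of GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def evaluate(monomials, point):
    """Evaluate the sum of the given monomials at `point`.

    Each monomial is a tuple of variable indices; addition in a field of
    characteristic two is XOR, so equal monomials cancel in pairs.
    """
    total = 0
    for monomial in monomials:
        term = 1
        for variable in monomial:
            term = gf256_mul(term, point[variable])
        total ^= term
    return total

def probably_nonzero(monomials, num_variables, trials=20, seed=0):
    """Random evaluations in the spirit of the Schwartz-Zippel lemma:
    a nonzero result proves the polynomial is not identically zero."""
    rng = random.Random(seed)
    return any(
        evaluate(monomials, [rng.randrange(256) for _ in range(num_variables)])
        for _ in range(trials)
    )
```

If two incorrect potential solutions share the monomial `(0, 1)`, the list `[(0, 1), (0, 1)]` evaluates to zero at every point, whereas adding a single unpaired monomial `(1, 2)` leaves a polynomial that random evaluation detects with high probability.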

1.1.4 Representative Sets

The representative sets method was introduced in 2013 by Fomin et al. [FLS13, FLS14].

This is a combinatorial technique, useful for developing deterministic algorithms for

weighted problems that are significantly faster than those relying on divide-and-color;

yet, for unweighted problems, the resulting algorithms are not as fast as their randomized

counterparts that rely on narrow sieves.

The definition of representative sets involves matroids. Since in deriving the results

in this thesis, we only consider two types of matroids, we next give their specific definitions,

and refer the reader to [Oxl06] for a broader overview of matroids. Given a constant k,

the first is defined by a pair $M = (E, \mathcal{I})$, where E is an n-element set, and $\mathcal{I} = \{S \subseteq E : |S| \le k\}$. Such a pair is called a uniform matroid, and is denoted by $U_{n,k}$. Given

some constants $\ell$ and $k_1, k_2, \ldots, k_\ell$, the second is defined by a pair $M = (E, \mathcal{I})$, where

E is an n-element set partitioned into disjoint sets $E_1, E_2, \ldots, E_\ell$, and $\mathcal{I} = \{S \subseteq E : |S \cap E_1| \le k_1, |S \cap E_2| \le k_2, \ldots, |S \cap E_\ell| \le k_\ell\}$. Such a pair is called a partition matroid.

Note that, when $\ell = 1$, the definitions for the two types of matroids coincide.
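The two independence tests are simple enough to state directly in code. The following Python sketch uses assumed names and represents sets as Python sets; it only mirrors the two definitions above.

```python
def independent_in_uniform(subset, k):
    """Independence in a uniform matroid: at most k elements in total."""
    return len(subset) <= k

def independent_in_partition(subset, parts, capacities):
    """Independence in a partition matroid.

    `parts` is a list of disjoint sets covering the ground set and
    `capacities[i]` bounds how many elements may be taken from parts[i].
    """
    return all(len(subset & part) <= capacity
               for part, capacity in zip(parts, capacities))
```

With a single part equal to the whole ground set and capacity k, the second test reduces to the first, which is the coincidence noted in the text.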

Now, we give the formal definition of a representative family, where a p-set is a set

of exactly p elements. The definition is followed by an explanation of its relevance to

the design of parameterized algorithms.

Figure 1.1: A subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ representing $\mathcal{S}$ in a matroid $U_{5,4}$, with p = 2.

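Definition 1.1.1. Given a matroid $M = (E, \mathcal{I})$, a family $\mathcal{S}$ of p-subsets of E, and a function $w : \mathcal{S} \to \mathbb{R}$, we say that a subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ max (min) represents $\mathcal{S}$ if for every pair of sets $X \in \mathcal{S}$ and $Y \subseteq E \setminus X$ such that $X \cup Y \in \mathcal{I}$, there is a set $\widehat{X} \in \widehat{\mathcal{S}}$ disjoint from Y such that $w(\widehat{X}) \ge w(X)$ ($w(\widehat{X}) \le w(X)$).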

The special case where $w(S) = 0$, for all $S \in \mathcal{S}$, is the unweighted version of

Definition 1.1.1. Observe that in the case of uniform matroids, Definition 1.1.1 simplifies

to the following definition.

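Definition 1.1.2. Given a universe E, a family $\mathcal{S}$ of p-subsets of E, a function $w : \mathcal{S} \to \mathbb{R}$ and a parameter k, we say that a subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ max (min) represents $\mathcal{S}$ if for every pair of sets $X \in \mathcal{S}$ and $Y \subseteq E \setminus X$ such that $|Y| \le k - p$, there is a set $\widehat{X} \in \widehat{\mathcal{S}}$ disjoint from Y such that $w(\widehat{X}) \ge w(X)$ ($w(\widehat{X}) \le w(X)$).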

An illustrated example of an unweighted representative family is given in Fig. 1.1, corresponding to the matroid $U_{5,4} = (E, \mathcal{I})$, where $\mathcal{I} = \{X \subseteq E : |X| \le 4\}$. The subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ represents $\mathcal{S}$, since for every pair of sets $X \in \mathcal{S}$ and $Y \subseteq E \setminus X$ such that $|Y| \le k - p = 2$, there is a set $\widehat{X} \in \widehat{\mathcal{S}}$ disjoint from Y. For example, the set $X = \{a, b\} \in \mathcal{S}$ is disjoint from {c, d}, {c, e} and {d, e}. Indeed, the subfamily $\widehat{\mathcal{S}}$ contains the set {b, e} that is disjoint from {c, d}, the set {b, d} that is disjoint from {c, e}, and the set {a, c} that is disjoint from {d, e}.

In other words, the definition of a representative family $\widehat{\mathcal{S}}$ guarantees that if a set Y can be extended to a solution, which is a set of size at most k, by adding a subset from $\mathcal{S}$, then it can also be extended to a solution, which is a set of the same size, by adding a subset from $\widehat{\mathcal{S}}$. A fast computation of representative families plays a central role in obtaining better running times for parameterized algorithms. Many parameterized algorithms are based on dynamic programming, where after each stage, the algorithm computes a family $\mathcal{S}$ of sets that are partial solutions. At that point we can compute a subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ that represents $\mathcal{S}$. Then, each reference to $\mathcal{S}$ can be replaced by a reference to $\widehat{\mathcal{S}}$. The representative family $\widehat{\mathcal{S}}$ contains “enough” sets from $\mathcal{S}$; therefore, such replacement preserves the correctness of the algorithm. Thus, if we can efficiently compute representative families that are small enough, we can substantially improve the running time of the algorithm.
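For small instances, the unweighted version of Definition 1.1.2 can be checked by exhaustive search. The Python sketch below is a brute-force illustration with assumed names (`represents`, `shrink`); it runs in exponential time and is unrelated to the efficient computations of [FLS14]. The helper `shrink` greedily discards sets as long as the remaining subfamily still represents the original family, so its output is inclusion-minimal but not necessarily of minimum size.

```python
from itertools import combinations

def represents(sub_family, family, universe, k, p):
    """Unweighted Definition 1.1.2 for a uniform matroid: for every X in
    `family` and every Y inside universe - X with |Y| at most k - p,
    some set of `sub_family` must be disjoint from Y."""
    for X in family:
        rest = universe - X
        for size in range(k - p + 1):
            for Y in combinations(rest, size):
                if not any(candidate.isdisjoint(Y) for candidate in sub_family):
                    return False
    return True

def shrink(family, universe, k, p):
    """Greedily drop sets while the remainder still represents `family`."""
    kept = list(family)
    for X in list(family):
        trial = [Z for Z in kept if Z != X]
        if represents(trial, family, universe, k, p):
            kept = trial
    return kept
```

As an example, let E = {a, b, c, d, e}, k = 4, p = 2, and let the family consist of all ten 2-subsets of E. The four sets {a, b}, {a, c}, {b, c}, {d, e} already represent it, because no set Y of at most two elements intersects all four of them.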

The Two Families Theorem of Bollobás [Bol65] implies that for any uniform matroid U_{n,k} = (E, I) and a family S of p-subsets of E, for some 1 ≤ p ≤ k, there is a subfamily Ŝ ⊆ S of size \binom{k}{p} that represents S. For more general matroids (i.e., linear matroids), Lovász's generalization of this theorem, given in [Lov77], implies a similar result, and algorithms based on this generalization are given in [Mar09] and [FLS14]. In particular, relying on a computation from [FLS14], Lokshtanov et al. [LMP+14] proved the following result for partition matroids.

Theorem 1.1 ([FLS14, LMP+14]). Given constants ℓ, k_1, k_2, . . . , k_ℓ and k ≤ Σ_{i=1}^{ℓ} k_i, a corresponding partition matroid M = (E, I), and a family S of p-subsets of E, a family Ŝ ⊆ S of size at most \binom{k}{p} n^{O(1)} that (k − p)-represents S can be found in time O(|S| \binom{k}{p}^{ω−1} n^{O(1)}), where ω < 2.3727 is the matrix multiplication exponent [Wil12].

For uniform matroids, Monien [Mon85] computed representative families of size Σ_{i=0}^{k−p} p^i in time O(|S| p (k − p) Σ_{i=0}^{k−p} p^i), and Marx [Mar06] computed representative families of size \binom{k}{p} in time O(|S|^2 p^{k−p}). Recently, Fomin et al. [FLS14] introduced a powerful technique, extending the splitters technique of Naor et al. [NSS95],² which substantially improved upon the previous results.

Theorem 1.2 ([FLS14]). Given a uniform matroid U_{n,k} = (E, I), and a family S of p-subsets of E, a family Ŝ ⊆ S of size at most (k^k / (p^p (k − p)^{k−p})) 2^{o(k)} log n that (k − p)-represents S can be found in time O(|S| (k/(k − p))^{k−p} 2^{o(k)} log n).
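To get a feel for the two size bounds, the sketch below evaluates the binomial bound implied by the Two Families Theorem and the leading factor k^k / (p^p (k − p)^{k−p}) of Theorem 1.2, ignoring the 2^{o(k)} log n factor. The function names are illustrative and not taken from the cited works.

```python
from math import comb


def bollobas_bound(k, p):
    """Size bound binom(k, p) implied by the Two Families Theorem."""
    return comb(k, p)


def fls_leading_factor(k, p):
    """Leading factor k^k / (p^p (k-p)^(k-p)) of Theorem 1.2, for 0 < p < k."""
    return k ** k / (p ** p * (k - p) ** (k - p))


for k, p in [(4, 2), (10, 5), (20, 5)]:
    print(k, p, bollobas_bound(k, p), round(fls_leading_factor(k, p), 1))
```

For k = 4 and p = 2 the two values are 6 and 16. In general the leading factor is at least the binomial coefficient and exceeds it by at most a factor of k + 1.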

1.2 Parameterized Graph Problems

In this section, we define the problems studied in this thesis. We also discuss known

results related to them, as well as our contributions. A discussion on our technical

contributions is deferred to the following chapter.

1.2.1 Network Query Problems

With the increasing amount of data on biological networks available, the discovery

of conserved patterns or signaling pathways has become of major importance. Such

intrinsic structural properties can be identified through the extensive use of network

queries, which compare the graph modeling the network with a given pattern. Indeed,

the Alignment Network Query (ANQ) problem, which is a variant of the classic

Subgraph Isomorphism problem, and the well-studied Graph Motif (GM) problem

² Indeed, towards computing a representative family, Fomin et al. [FLS14] obtain a construction more general than an (n, k)-universal set.

Figure 1.2: An example of a homeomorphism h from G to G′. Here U = {v3, v4, v6, v7, v10}, U′ = {u2, u5, u6, u9, u10}, and h maps the nodes v1, v2, v5, v8, v9 of G \ U to the nodes u1, u3, u4, u8, u7 of G′ \ U′, respectively.

(also known as the Topology-Free Network Query (TFNQ) problem)³ play a pivotal role in the analysis of biological networks [FP11, Sik12]. Recently, a different approach for analyzing biological networks via network queries, resulting in the Module Motif problem, has been suggested by Rizzi and Sikora [RS12]. Due to their general nature, ANQ, GM and Module Motif can also be used in analyzing other types of networks, such as social and technical networks [FFH+11].

Given a pattern P and an undirected graph H, GM and ANQ seek a subgraph of H

that resembles P . GM requires only the connectivity of the solution, while ANQ requires

resemblance between the topology of P and the solution. The Partial Information

Network Query (PINQ) problem, which we introduced in [PZ13, PSZ15], is a

generalization of both GM and ANQ that fits the common scenario where we have

only partial information on the topology of P. The Module Motif problem does

not refer to the topology of the solution, but concerns the neighbors of the nodes in a

solution. Next, the subscript I (e.g., in GMI) refers to more general network queries,

which allow insertions and deletions (indels) of nodes.

Partial Information Network Queries

Given a host graph H and a set of graphs P, in PINQI we seek a disjoint collection of

subgraphs of H, each resembling a different graph in P, whose union is a connected

graph (see below). Each of these subgraphs is mapped to the graph it resembles in P , by

using a variant of isomorphism allowing to delete degree-2 nodes, called homeomorphism.

For biological motivation, see, e.g., [PRY+05].

Homeomorphism: Given a graph G = (V, E) and a set U of degree-2 nodes in V, generate the multigraph G \ U as follows (see Fig. 1.2). Delete from G the nodes in U and their incident edges. For every pair v, u ∈ V \ U and a simple path connecting them, in which all other nodes belong to U, add an edge {v, u}. For every v ∈ V \ U and a simple cycle in G consisting only of v and nodes in U, add a self-loop to v.
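The construction of the multigraph G \ U can be sketched as follows. The function name, the adjacency-dictionary representation and the toy graph are assumptions made for illustration; the sketch assumes a simple undirected input graph with comparable node labels, and it ignores cycles that lie entirely inside U.

```python
from collections import Counter


def contract_degree2(adj, removed):
    """Return the multigraph obtained by removing U, as a Counter of edges.

    adj: dict mapping each node of a simple undirected graph to its neighbor set.
    removed: the set U; every node in it must have degree exactly 2.
    """
    assert all(len(adj[u]) == 2 for u in removed)
    found = Counter()
    for v in adj:
        if v in removed:
            continue
        for first in adj[v]:
            prev, cur = v, first
            while cur in removed:            # walk through nodes of U
                (nxt,) = adj[cur] - {prev}
                prev, cur = cur, nxt
            found[tuple(sorted((v, cur)))] += 1
    # every edge, path and cycle was discovered once from each side
    return Counter({e: c // 2 for e, c in found.items()})


adj = {
    "v1": {"x", "v2", "a", "b"},
    "v2": {"x", "v1"},
    "x": {"v1", "v2"},
    "a": {"v1", "b"},
    "b": {"a", "v1"},
}
result = contract_degree2(adj, {"x", "a", "b"})
print(result)
```

In this toy graph the path v1, x, v2 adds a second edge between v1 and v2 (next to the original one), and the cycle v1, a, b becomes a self-loop at v1.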

³ In algorithm theory, TFNQ is better known as GM. However, in systems biology, the term TFNQ is more widely used, since network motifs have a different meaning (recurrent and statistically significant subgraphs).

Figure 1.3: (A) An input for PINQI, where k = 8. (B) A solution for the input.

(A) P = {P_1, P_2, P_3, P_4}, with pattern nodes p1, . . . , p8; H = G (see Fig. 1.2), with w(e) = 1 for all e ∈ E; I_F = 2, I_A = 1, D = 3. The similarity score table ∆:

      v1   v2   v3   v4   v5   v6   v7   v8   v9   v10
p1     2   −∞   −∞    9    2    1    3    2    1    1
p2     1   −∞   −∞   −∞   −∞    1    2   −∞   −∞   −∞
p3     2   −∞   −∞    3    7    1    2    2    1    3
p4     1   −∞   −∞    2    2    3    1    2    8    3
p5     5    9    2    2   −∞   −∞    3   −∞   −∞   −∞
p6     3    2   −∞    2   −∞   −∞    1   −∞   −∞   −∞
p7     3    1    9    2   −∞   −∞    5   −∞   −∞   −∞
p8     2   −∞   −∞   −∞   −∞   −∞    1   −∞   −∞   −∞

(B) The subgraph S has the nodes v1, v2, v3, v4, v5, v6, v7, v9, partitioned into V_S^1 = {v4}, V_S^2 = {v5, v9}, V_S^3 = {v1, v6, v7}, V_S^4 = {} and V_S^5 = {v2, v3}. The mappings are h_1(p1) = v4; h_2(p3) = v5, h_2(p4) = v9; h_3(p5) = v1, h_3(p7) = v7; and h_4 has an empty domain. Free insertions = {v2, v3}, alignment insertions = {v6}, deletions = {p2, p6, p8}, score = 42.

A homeomorphism from G = (V,E) to G′ = (V ′, E′) is defined as an isomorphism

from G \ U to G′ \ U ′, where U and U ′ are subsets of degree-2 nodes in V and V ′,

respectively. To simplify the presentation, we use the term homeomorphism also when

referring to a function whose domain is empty.

Definition of PINQI: The input for PINQI consists of a set of graphs P = {P_1, . . . , P_t}, where P_i = (V_i, E_i), and a host graph H = (V, E) having real numbers as edge-weights, along with a similarity score table ∆. The table ∆ contains an entry ∆(p, h) ∈ R ∪ {−∞} for any pair of nodes p, h, where p ∈ V_i, 1 ≤ i ≤ t and h ∈ V (an entry ∆(p, h) = −∞ indicates that p and h cannot be matched). The input also contains the nonnegative integers I_F, I_A and D. Let k = Σ_{i=1}^{t} |V_i| denote the total number of nodes in P (see Fig. 1.3(A)).

We now give the definition of a solution for PINQI (see Fig. 1.3(B)). Let 𝒮 = (S, V_S^1, . . . , V_S^{t+1}, h_1, . . . , h_t), where S = (V_S, E_S) is a connected subgraph of H, {V_S^1, . . . , V_S^{t+1}} is a partition of V_S, and h_i is a homeomorphism from P_i to the subgraph of S induced by V_S^i, for all 1 ≤ i ≤ t. Let domain(f) and image(f) denote the domain and image of a function f, respectively; denote by w(e) the weight of an edge e. The number of indels and score of 𝒮 are defined as follows.

• The number of free insertions is |V_S^{t+1}|. Informally, this is the number of nodes connecting the subgraphs of S that are mapped to graphs in P.

• The number of alignment insertions is the number of unmapped nodes in ⋃_{i=1}^{t} V_S^i, i.e., Σ_{i=1}^{t} |V_S^i \ image(h_i)|. Informally, this is the number of nodes that are not mapped to nodes of graphs in P, and yet belong to the subgraphs of S that are mapped to P.

• The number of deletions is the number of unmapped nodes in ⋃_{i=1}^{t} V_i, i.e., Σ_{i=1}^{t} |V_i \ domain(h_i)|.

Reference     Weights  Indels  Det/Rand  The Topology of P_i  Time Complexity
[PZ13]∗       R        No      rand      Tree                 O∗(6.75^{k+O(log^2 k)} 3^t)
[PSZ14b]∗     Z        Yes     rand      Bounded treewidth    O∗(3.7^{k−D+I_A} W)
[PSZ14b]∗,$   N_0      Yes     rand      Bounded treewidth    O∗(3.7^{k−D+I_A} ⌊1/ε⌋)

Table 1.1: Parameterized algorithms for PINQI.

• The score is the sum of the similarity scores between matched nodes, and the weights of the edges in E_S, i.e., Σ_{i=1}^{t} Σ_{p ∈ domain(h_i)} ∆(p, h_i(p)) + Σ_{e ∈ E_S} w(e).
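The four quantities above can be computed directly from a candidate solution. The sketch below does so for the solution of Fig. 1.3(B). The function name, the grouping of p1, . . . , p8 into the four pattern graphs, and the count of 8 unit-weight edges in S are assumptions made for illustration, since the drawing itself is not reproduced in the text.

```python
NEG_INF = float("-inf")


def indels_and_score(pattern_nodes, parts, maps, delta, edge_weights):
    """Return (free insertions, alignment insertions, deletions, score).

    pattern_nodes: list of the node sets V_1..V_t of the pattern graphs.
    parts: list of t + 1 node sets partitioning the nodes of S.
    maps: list of t dicts, maps[i][p] = node of S matched to pattern node p.
    delta: dict (p, h) -> similarity score; edge_weights: weights of the edges of S.
    """
    t = len(pattern_nodes)
    free = len(parts[t])
    alignment = sum(len(parts[i] - set(maps[i].values())) for i in range(t))
    deletions = sum(len(pattern_nodes[i] - set(maps[i])) for i in range(t))
    score = sum(delta.get((p, h), NEG_INF) for m in maps for p, h in m.items())
    return free, alignment, deletions, score + sum(edge_weights)


# Assumed grouping of the pattern nodes into P_1..P_4.
pattern_nodes = [{"p1", "p2"}, {"p3", "p4"}, {"p5", "p6", "p7"}, {"p8"}]
parts = [{"v4"}, {"v5", "v9"}, {"v1", "v6", "v7"}, set(), {"v2", "v3"}]
maps = [{"p1": "v4"}, {"p3": "v5", "p4": "v9"}, {"p5": "v1", "p7": "v7"}, {}]
# Entries read off the similarity score table of Fig. 1.3(A).
delta = {("p1", "v4"): 9, ("p3", "v5"): 7, ("p4", "v9"): 8,
         ("p5", "v1"): 5, ("p7", "v7"): 5}
print(indels_and_score(pattern_nodes, parts, maps, delta, [1] * 8))  # (2, 1, 3, 42)
```

Under these assumptions the result agrees with the values stated in the figure: 2 free insertions, 1 alignment insertion, 3 deletions and score 42 (34 from matched nodes plus 8 from edges).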

We say that 𝒮 is a solution if it includes exactly I_F free insertions, I_A alignment insertions and D deletions, and any cycle in S is completely contained in the subgraph induced by V_S^i, for some 1 ≤ i ≤ t. The cycle requirement allows us to avoid generalizing the Clique problem, which is W[1]-hard [DF95] (see below). Assuming that there is no solution having fewer than I_F free insertions (if such a solution exists, we can simply reject the input), the objective of PINQI is to find the maximum score OPT of a solution.

Relation of PINQI to Known Network Queries: Clearly, PINQ is the special case where I_F = I_A = D = 0. Also, ANQ with Indels (ANQI) [DSG+08] is the special case where t = 1. Finally, GM with Indels (GMI) [BHK+10b] is the special case where t = k, and ∆(p, h) ∈ {−∞, 0} for any p ∈ V_i, 1 ≤ i ≤ t and h ∈ V.

Relation of PINQI to the Clique Problem: We note that, without the above cycle requirement, the Clique problem is the special case where t = k, I_F = I_A = D = 0, ∆(p, h) = 0 for all p ∈ V_i, 1 ≤ i ≤ t and h ∈ V, and w(e) = 1 for all e ∈ E. However, with the cycle requirement, each clique on at least three nodes contained in a solution must be entirely contained in a graph in P. Thus, for certain families of graphs (e.g., trees), the corresponding special case of PINQ does not generalize Clique.

Previous Work: ANQ is NP-hard even if the single graph in P is a path, since this

case generalizes the Hamiltonian Path problem [GJ79]. Pinter et al. [PRT+08] show

that a special case of ANQ, in which D = 0 and H and the single graph in P are trees,

can be solved in time O(k|V |(k+log |V |)). Moreover, Pinter et al. [PRY+05] modify the

algorithm in [PRT+08] to handle directed trees, and employ it to perform inter-species

and intra-species alignments of metabolic pathways. The algorithm of [PRT+08] was

also employed in a pathway evolution study [MTB+10].

The GM problem is NP-hard even when H is a tree [LFS06]. Fellows et al. [FFH+11]

and Ambalath et al. [ABC+10] show that GM is also NP-hard in other restricted settings

(e.g., if H is a graph of diameter two [ABC+10]). Furthermore, Björklund et al. [BKK13] provide evidence that GM is unlikely to be solvable in time O∗((2 − ε)^k) for any constant ε > 0.

Tables 1.1, 1.2 and 1.3 present known parameterized algorithms for PINQI, ANQI

and GMI, where tw is the maximum treewidth [Bod96] of a graph in P. The Weights

Reference     Weights  Indels  Det/Rand  The Topology of P_1           Time Complexity
[BSV10]       R        Yes     rand      Bounded feedback vertex set   O∗(8.2^{k+I_A})
[DSG+08]      R        Yes     rand      Bounded treewidth             O∗(8.2^{k+I_A})
[SSR+06]      R        Yes     rand      Simple Path                   O∗(5.44^{k+I_A})
[HWZ08]       R        Yes     rand      Simple Path                   O∗(4.32^{k+I_A})
[PZ13]∗       R        No      rand      Tree                          O∗(6.75^k)
[PZ13]∗       R        No      rand      Simple Path                   O∗(4^k)
[PSZ14b]∗     Z        Yes     rand      Bounded treewidth             O∗(2^{k−D+I_A} W)
[PSZ14b]∗,$   N_0      Yes     rand      Bounded treewidth             O∗(2^{k−D+I_A} ⌊1/ε⌋)

Table 1.2: Parameterized algorithms for ANQI.

Reference          Weights  Indels  Det/Rand  Time Complexity
[BHK+10b]          R        Yes     rand      O∗(k! 3^k)
[DFV11]            {0}      Yes     rand      O∗(2^{O(k−D)})
[FFH+11]           {0}      No      det       O∗(87^k)
[PZ13]∗            R        No      rand      O∗(20.25^{k+O(log^2 k)})
[BBF+11]           {0}      Yes     rand      O∗(29.6^{k−D})
[BBF+11]           {0}      No      rand      O∗(10.88^k)
[PSZ14a, Zeh14]∗   {0}      Yes     det       O∗(6.74^{k−D})
[PSZ14a]∗          {0}      No      det       O∗(5.22^k)
[PSZ14a]∗          {0}      Yes     det       O∗(5.18^{k−D}) in a restricted setting
[BFK+08]           {0}      No      rand      O∗(4.32^k)
[GS13]             N_0      Yes     rand      O∗(4^{k−D} W^2)
[Kou12]            {0}      Yes     rand      O∗(2.54^{k−D}) in a restricted setting
[PZ14]∗            N_0      Yes     rand      O∗(2^k W)
[BKK13]            {0}      Yes     rand      O∗(2^{k−D})
[PSZ14b]∗          Z        Yes     rand      O∗(2^{k−D} W)
[PSZ14b]∗,$        N_0      Yes     rand      O∗(2^{k−D} ⌊1/ε⌋)

Table 1.3: Parameterized algorithms for GMI.

columns refer to the possible values for edge-weights and scores in ∆, excluding −∞, and W denotes the maximum absolute value of any weight. Entries marked by ’∗’ indicate results given in this thesis. Moreover, the mark ’$’ indicates instances solved via an FPT-approximation scheme (FPT-AS), which returns a value in [(1 − ε)OPT, OPT], for any fixed ε > 0. The algorithms in Table 1.2, excluding our contributions, are based on the color coding method. The algorithms in Table 1.3, excluding our contributions and the algorithms in [GS13, Kou12, BKK13], are also based on this method. Guillemot and Sikora [GS13] and Koutis [Kou12] rely on the multilinear detection method. The algorithm of Björklund et al. [BKK13] is based on an involved application of the narrow sieves method; it was implemented in [BKL15], where Björklund et al. tested its practical relevance to the analysis of large networks.

Our Contribution: First, in the paper [PZ14], we extended the algorithm for ANQ

in [PRY+05] to handle topologies more general than directed trees, as well as a more

refined scoring scheme for handling indels. Moreover, we developed an O∗(2^k W)-time algorithm for GM, which improved upon the previous best algorithm of [Kou12]. This algorithm is based on the narrow sieves method; interestingly, it does not introduce a new set of labels relevant to the swapping procedure of narrow sieves (which would have resulted in a slower running time, bounded by O∗(4^k W)), but uses a set of labels that is part of the problem. While our paper [PZ14] was under review, an algorithm running in time O∗(2^{k−D}) was given in [BKK13]. Björklund et al. [BKK13], relying on the narrow sieves method, also avoid “double-labelling elements”; however, they achieve this in a more sophisticated manner, which allows them to obtain a parameterized algorithm with respect to k − D rather than k.

In the paper [PSZ14a], we presented an O∗(6.855^{k−D})-time deterministic algorithm for GMI,⁴ as well as improved algorithms for two special cases of GMI (see Table 1.3). These algorithms make use of representative sets, tailored to each variant at hand, with respect to both uniform and partition matroids; in particular, they rely on our tradeoff-based approach for computing representative sets (see Section 2.1.1). These algorithms are also based on a novel tool that we call guiding trees, on which we elaborate in Section 2.4.

Moreover, in the paper [PZ13] (whose full version is given in [PSZ15]), we introduced

the PINQ problem, forming a bridge between ANQ and GM. In this paper, we

developed a parameterized algorithm for a special case of PINQ that is based on

the divide-and-color method (see Table 1.1). More recently, in the paper [PSZ14b],

we obtained another algorithm for a special case of PINQ, which is better than the

one in [PZ13] when the weights associated with the instances of PINQ are small. To

develop this algorithm, we proposed an interesting mixture of two narrow sieves-based

procedures and a divide-and-color step (see Section 2.5).

Module Motif

The Module Motif problem, introduced by Rizzi and Sikora [RS12], can be viewed as

a variant of GM where the connectivity constraint is replaced by modularity. A module

M of a graph H = (V, E) is a subset of V such that (∀u, v ∈ M, ∀x ∉ M : {v, x} ∈ E iff {u, x} ∈ E) [CHM81]; in other words, a module is a set of nodes that have the same neighborhood outside the set.
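The module condition is easy to test for a given node set. The following is a minimal sketch, assuming the graph is given as a dictionary from nodes to neighbor sets; the function name and the toy path are illustrative.

```python
def is_module(adj, nodes):
    """Check that all nodes in `nodes` have the same neighbors outside `nodes`."""
    nodes = set(nodes)
    outside = [adj[v] - nodes for v in nodes]
    return all(o == outside[0] for o in outside)


# Path a - b - c: the two endpoints share their only outside neighbor b.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(is_module(path, {"a", "c"}))  # True
print(is_module(path, {"a", "b"}))  # False: only b is adjacent to c
```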

Formally, given a set of colors C, a multiset P of colors from C, a graph H = (V, E) and Col : V → 2^C, the Module Motif problem asks whether there are a module M of H and a function m : M → C such that

1. ∀v ∈ M : m(v) ∈ Col(v).

2. ∀c ∈ C : c occurs in P exactly |{v ∈ M : m(v) = c}| times.

⁴ Relying on the modified technique for applying the representative sets method, which we give in [Zeh14], the running time of this algorithm is improved to O∗(6.74^{k−D}).

In the limited case of Module Motif, |Col(v)| = 1 for all v ∈ V.

Rizzi and Sikora [RS12] proved that the limited case of Module Motif is NP-hard even if P is a set and H is a collection of paths of size 3. Denoting by c the number of different colors in P, they give an O∗((k(2c)k)k+1ck)-time algorithm for Module Motif, which is impractical, as well as an O∗(2^k)-time and space algorithm for the limited case. Rizzi and Sikora [RS12] developed their algorithms by using modular tree decompositions [TCH+08] and dynamic programming, and left the handling of indels as an open problem. Roughly speaking, a modular tree decomposition is a rooted tree, whose vertices are labelled by sets of nodes of a given graph G, and which captures, via unions of certain sets associated with its vertices, all the modules of G.

In the paper [Zeh13c], we generalized Module Motif to handle indels, as well as

weights, by defining the following variant.

General Module Motif

• Input: A set of colors C, a multiset P of colors from C, a graph H = (V, E), col : V → C, Col : V → 2^C, ∆ : C × C → R, D, I ∈ N_0 and S ∈ R.

• Decide if there are a module M of H, a subset U ⊆ M and a function m : U → C such that

1. ∀v ∈ U : m(v) ∈ Col(v).

2. ∀c ∈ C : c occurs in P at least |{v ∈ U : m(v) = c}| times.

3. |U| = k − D = |M| − I.

4. Σ_{v ∈ U} ∆(col(v), m(v)) ≥ S.

Conditions 1 and 2 are similar to those of Module Motif. Condition 3 states

that we delete D occurrences of colors from P , and add I occurrences of colors to

P . Finally, condition 4 states that the score of the solution is at least S. Observe

that Module Motif is the special case of General Module Motif in which

(∀c, c′ ∈ C : ∆(c, c′) = 0) and I = D = S = 0.
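The four conditions can be checked mechanically for a candidate pair (M, m). The sketch below is only a verifier, not one of the algorithms of [Zeh13c]; the function name and the toy instance are hypothetical, and it assumes that k denotes the number of colors in the multiset P.

```python
from collections import Counter


def is_gmm_solution(adj, motif, col, Col, delta, D, I, S, M, m):
    """Check conditions 1-4 for a candidate module M and a mapping m : U -> C.

    adj: node -> neighbor set; motif: list of colors (the multiset P);
    col: node -> color; Col: node -> set of allowed colors;
    delta: (color, color) -> score. Assumes k = |P|.
    """
    M, U, k = set(M), set(m), len(motif)
    outside = [adj[v] - M for v in M]
    if any(o != outside[0] for o in outside) or not U <= M:
        return False                                   # M is not a module, or U is not inside M
    if any(m[v] not in Col[v] for v in U):
        return False                                   # condition 1
    used, available = Counter(m.values()), Counter(motif)
    if any(used[c] > available[c] for c in used):
        return False                                   # condition 2
    if not (len(U) == k - D == len(M) - I):
        return False                                   # condition 3
    return sum(delta[col[v], m[v]] for v in U) >= S    # condition 4


# Hypothetical toy instance on the path a - b - c.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
motif = ["red", "blue", "blue"]
col = {"a": "red", "b": "red", "c": "green"}
Col = {"a": {"red"}, "b": {"red"}, "c": {"blue", "red"}}
delta = {("red", "red"): 1, ("green", "blue"): 0}
m = {"a": "red", "c": "blue"}
print(is_gmm_solution(adj, motif, col, Col, delta, 1, 0, 1, {"a", "c"}, m))  # True
```

With D = 1 one occurrence of blue is deleted from P, so the two nodes of the module {a, c} account for the remaining k − D = 2 colors.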

We developed (in [Zeh13c]) parameterized algorithms for General Module Motif

that use modular tree decompositions. First, we presented an O∗(2^k)-time General Module Motif algorithm that uses dynamic programming. Consequently, we obtained an O∗(2^k)-time Module Motif algorithm, which improved the O∗((k(2c)k)k+1ck)-time algorithm in [RS12]. Then, we used this algorithm, along with the color coding method, to design a randomized O∗(4.314^{k−D})-time General Module Motif algorithm. We

also gave a randomized O∗(2k−DS+)-time algorithm for the special case of General

Module Motif where S and the values in image(∆) are nonnegative integers. This

algorithm relies on the narrow sieves method. Furthermore, we observed that some of

our results might be essentially tight, and proved that Module Motif is unlikely to

admit a polynomial kernel.

Reference             Weighted?  Directed?  Det/Rand  Time Complexity
[Mon85]               Yes        Yes        det       O∗(k!)
[Bod93]               Yes        Yes        det       O∗(k! 2^k)
[AYZ95]               Yes        Yes        rand      O∗((2e)^k)
[AYZ95]               Yes        Yes        det       O∗(c^k)
[DBK07]               Yes        Yes        rand      O∗(4.5^k)
[HWZ08]               Yes        Yes        rand      O∗(4.32^k)
[KMR+06], [CLS+07]    Yes        Yes        rand      O∗(4^k)
[KMR+06]              Yes        Yes        det       O∗(16^k)
[CLS+07]              Yes        Yes        det       O∗(12^k)
[CKL+09]              Yes        Yes        det       O∗(4^k)
[Kou08]               No         Yes        rand      O∗(2.83^k)
[Wil09]               No         Yes        rand      O∗(2^k)
[BHK+10a, Bj10]       No         No         rand      O∗(1.66^k)
[AB14], [FLS14]       Yes        Yes        det       O∗(2.851^k)
[FLP+14], [SZ14b]∗    Yes        Yes        det       O∗(2.619^k)
[Zeh14]∗              Yes        Yes        det       O∗(2.597^k)

Table 1.4: Parameterized algorithms for k-Path.

1.2.2 Subcases of Subgraph Isomorphism

The Subgraph Isomorphism problem is clearly a special case of ANQ. Since Subgraph

Isomorphism generalizes Clique, which is W[1]-hard, it is unlikely that there exists a

parameterized algorithm that solves it. However, there are many interesting subcases

of Subgraph Isomorphism for which there exist fast parameterized algorithms. In

this thesis, we are interested in the subcases of k-Path and k-Tree. The k-Path

problem asks if a given (possibly directed) graph G contains a path of length at least k,

while the more general k-Tree problem asks if G contains a copy of a certain tree. In

the weighted versions of these problems, the edges of G have weights, and we need to

minimize the sum of the weights of the edges in a solution (if one exists).

The k-Path problem is one of the most well-studied problems in the area of Parameterized Complexity. Table 1.4, adapted from [BB15], summarizes known parameterized algorithms for this problem; the third column refers to the possibility that G is a directed graph, and entries marked by ’∗’ indicate results given in this thesis. Prior to our work, the best deterministic algorithm for k-Path was given by Fomin et al. [FLS14]. Relying on a tradeoff-based approach for computing representative sets (see Section 2.1.1), we reduced (in [SZ14b]) the running time of their algorithm from O∗(2.851^k) to O∗(2.619^k). At the same venue, a similar result was obtained by Fomin et al. [FLP+14]. Recently, by employing a mixing strategy (see Sections 2.1.2 and 2.5), we further reduced (in [Zeh14]) the running time of the algorithm to O∗(2.597^k). We note that obtaining an

O∗(2^k)-time deterministic algorithm for the weighted version of k-Path is a major open problem in the area of Parameterized Complexity. For the unweighted version, Björklund et al. [BHK+10a, Bj10] obtained an O∗(1.66^k)-time randomized algorithm, which relies on the narrow sieves method. Recently, in the paper [BKK+15] (which is not included in this thesis), we obtained a randomized algorithm that is faster than the one in [BHK+10a] when k-Path is restricted to bounded degree graphs.

For the weighted version of k-Tree, we obtained a similar improvement (in the same papers). That is, we improved the running time of a deterministic algorithm by Fomin et al. [FLS14] from O∗(2.851^k) to O∗(2.597^k). For the unweighted version, Koutis and Williams [KW09] obtained an O∗(2^k)-time randomized algorithm, which relies on the multilinear detection method. An earlier algorithm for k-Tree relies on a derandomization of the color coding method [AYZ95, NSS95], where an improved algorithm by Cohen et al. [CFG+10], based on the divide-and-color method, solves k-Tree in deterministic time O∗(6.14^k) and in randomized time O∗(5.704^k).

1.2.3 Spanning Tree-Related Problems

The input for the k-Internal Out-Branching (k-IOB) problem consists of an

n-vertex directed graph G = (V,E), and a parameter k ∈ N . The objective is to decide

if G has an out-branching (i.e., a spanning tree with exactly one node of in-degree 0,

that we call the root) with at least k internal nodes (i.e., nodes of out-degree ≥ 1). Let

∆ denote the maximum degree of a node in the (simple) underlying undirected graph of

a graph G. The k-IOB problem is of interest in database systems [DD13].

A special case of k-IOB, called k-Internal Spanning Tree (k-IST), asks if a

given undirected graph G = (V,E) has a spanning tree with at least k internal nodes.

Observe that an algorithm for k-IOB can be used to solve k-IST in the same time

complexity (replace each undirected edge {v, u} in G by two directed edges, (v, u) and

(u, v)). A possible application of k-IST, for connecting cities with water pipes, is given

in [RFG+13]. The k-IST problem is NP-hard even in graphs of bounded degree 3, since

it generalizes the Hamiltonian Path problem in such graphs; thus, k-IOB is also

NP-hard in such graphs.

Nederlof [Ned09] gave an O∗(2^n)-time and polynomial space algorithm for k-IST. For k-IST in graphs of bounded degree, Raible et al. [RFG+13] gave an O∗(((2^{∆+1} − 1)^{1/(∆+1)})^{|V|})-time and exponential space algorithm. Recently, in the paper [BKK+15] (which is not included in this thesis), we gave an improved algorithm for k-IST, which runs in time O∗(1.946^n), and is faster in graphs of bounded degree.

Table 1.5 presents a summary of known parameterized algorithms for k-IOB and

k-IST; entries marked by ’∗’ indicate results given in this thesis. By integrating proper

colorings of graphs in the multilinear detection method (see Section 2.2), we obtained

(in [Zeh13a]) the fastest algorithm for k-IOB in bounded degree graphs. Moreover, in

the paper [Zeh14], relying on a result we showed in [SZ14b], we developed the fastest

Reference            Directed?       Det/Rand  Time Complexity
[PS05]               No              det       O∗(2^{O(k log k)})
[GRK09]              Yes             det       O∗(2^{O(k log k)})
[CFG+10]             Yes             det       O∗(55.8^k)
                     Yes             rand      O∗(49.4^k)
[FGL+12]             Yes             det       O∗(16^{k+o(k)})
                     Yes             rand      O∗(16^{k+o(k)})
[FGS+13]             No              det       O∗(8^k)
[SZ14b]              Yes             det       O∗(6.855^k)
[Dal11], [Zeh13a]∗   Yes             rand      O∗(4^k)
[LWC+14]             No              det       O∗(4^k)
[Zeh13a]∗            Yes, ∆ = O(1)   rand      O∗(4^{(1 − (∆+1)/(2∆(∆−1)))k})
[Zeh14]∗             Yes             det       O∗(5.139^k)
                     Yes             rand      O∗(3.617^k)
[RFG+13]             No, ∆ = 3       det       O∗(2.137^k)
[BKK+15]             No              rand      O∗(3.455^k)
                     No, ∆ = O(1)    rand      faster than [Zeh13a]

Table 1.5: Known parameterized algorithms for k-IOB and k-IST.

randomized and deterministic algorithms for k-IOB in general graphs. To this end,

we used a novel mixing strategy (see Section 2.5) along with guiding trees and our

tradeoff-based approach for computing representative sets (see Sections 2.1.1 and 2.4), as

well as an interesting reduction from a problem that requires finding a tree to a problem

that requires finding a smaller tree and a disjoint matching. Further information on

k-IOB, k-IST and variants of these problems is given in the surveys [OY11, Sal10].

1.2.4 Graph Partitioning Problems

Graph partitioning problems arise in many areas including VLSI design, data mining, parallel computing, and sparse matrix factorization (see, e.g., [Ber06, KJM+11, DRL+10]).

We study the broad class of fixed-cardinality graph partitioning problems (FGPPs),

where each problem is specified by a graph G = (V,E), and parameters k and p. We

seek a subset U ⊆ V of size k, such that α1m1 + α2m2 is at most (or at least) p, where

α1, α2 ∈ R are constants defining the problem, and m1,m2 are the cardinalities of the

edge sets having both endpoints, and exactly one endpoint, in U , respectively. This

class encompasses such fundamental problems as Max and Min (k, n− k)-Cut, Max

and Min k-Vertex Cover, k-Densest Subgraph, and k-Sparsest Subgraph. For

example, Max (k, n− k)-Cut is a max-FGPP (i.e., maximization FGPP) satisfying

α1 = 0 and α2 = 1, Min k-Vertex Cover is a min-FGPP (i.e., minimization FGPP)

satisfying α1 = α2 = 1, k-Densest Subgraph is a max-FGPP satisfying α1 = 1 and

α2 = 0, and k-Sparsest Subgraph is a min-FGPP satisfying α1 = 1 and α2 = 0.
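The objective α1m1 + α2m2 of an FGPP is straightforward to evaluate for a given subset U. The following sketch illustrates the definition on a small graph; the function name and the toy graph are illustrative assumptions.

```python
def fgpp_value(edges, subset, alpha1, alpha2):
    """Return alpha1 * m1 + alpha2 * m2 for the node subset U = `subset`.

    m1 counts edges with both endpoints in U, m2 edges with exactly one.
    """
    subset = set(subset)
    m1 = sum(1 for u, v in edges if u in subset and v in subset)
    m2 = sum(1 for u, v in edges if (u in subset) != (v in subset))
    return alpha1 * m1 + alpha2 * m2


# A 4-cycle a-b-c-d with the chord a-c (hypothetical toy graph).
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
U = {"a", "c"}
print(fgpp_value(edges, U, 0, 1))  # 4: objective of Max/Min (k, n-k)-Cut
print(fgpp_value(edges, U, 1, 1))  # 5: objective of Max/Min k-Vertex Cover
print(fgpp_value(edges, U, 1, 0))  # 1: objective of k-Densest/k-Sparsest Subgraph
```

Here m1 = 1 (the chord) and m2 = 4 (the cycle edges), so the same subset yields three different objective values depending on the constants that define the problem.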

Parameterized by k, Max and Min (k, n− k)-Cut, and Max and Min k-Vertex

Cover are W[1]-hard [DEF+03, Cai08, GNW07]. Moreover, Clique and Independent Set, two well-known W[1]-hard problems [DF95], are special cases of k-Densest Subgraph (where p = k(k − 1)/2), and k-Sparsest Subgraph (where p = 0), respectively. Therefore, parameterized by (k + p), k-Densest Subgraph and k-Sparsest Subgraph are W[1]-hard. Cai et al. [CCC06] and Bonnet et al. [BEP+13] studied the parameterized complexity of FGPPs with respect to (k + ∆). The paper [CCC06] gives O∗(2^{(k+1)∆}) time algorithms for k-Densest Subgraph and k-Sparsest Subgraph. This result was recently improved in [BEP+13] to O∗(∆^k) for degrading FGPPs. This subclass contains max-FGPPs in which α1/2 ≤ α2, and min-FGPPs in which α1/2 ≥ α2.⁵ The authors of [BEP+13] also proposed an O∗(k^{2k} ∆^{2k}) time algorithm for all FGPPs, and posed as an open question the existence of constants a and b such that any FGPP can be solved in time O∗(a^k ∆^{bk}).

Parameterized by p, Max k-Vertex Cover can be solved in time O∗(1.396^p), and in randomized time O∗(1.2993^p) [KLR08]. Kneis et al. [KLR08] also show (implicitly) that Min k-Vertex Cover can be solved in time O∗(4^p), and in randomized time O∗(3^p). Moreover, by solving any degrading FGPP in time O∗(∆^k), Bonnet et al. [BEP+13] prove that Max (k, n − k)-Cut can be solved in time O∗(p^p). Recently, Cygan et al. [CLP+14] showed that Min (k, n − k)-Cut is also fixed-parameter tractable with respect to p. Parameterized by (k + p), Min (k, n − k)-Cut can be solved in time O∗(k^{2k} (k + p)^{2k}) [BEP+13].

In the paper [SZ14a], we presented an O∗(4^{k+o(k)} ∆^k)-time algorithm for the class of all FGPPs, answering affirmatively the question posed by Bonnet et al. [BEP+13]. To this end, we establish an interesting reduction from non-degrading FGPPs to the Weighted k-Exact Cover problem, which is combined with a fast construction of a representative family. Relying on variants of the randomized separation method (see Section 2.3), we obtained fast algorithms for specific FGPPs and subclasses of FGPPs. First, we developed an O∗(4^{p+o(p)})-time algorithm for Max (k, n − k)-Cut. Then, we obtained an O∗(2^{k + p/α2 + o(k+p)})-time algorithm for the subclass of positive min-FGPPs, in which α1 ≥ 0 and α2 > 0. Moreover, we developed a faster algorithm for non-degrading positive min-FGPPs (i.e., min-FGPPs satisfying α2 ≥ α1/2 > 0), which yields an O∗(2^{p+o(p)})-time algorithm for Min k-Vertex Cover.

1.2.5 Matching and Packing Problems

Matching and packing problems form an important class of NP-hard problems. The

question of finding the largest 3-dimensional matching is a classic optimization problem,

and the decision version is listed as one of the six fundamental NP-complete problems in

Garey and Johnson [GJ79]. This problem can be viewed as an immediate generalization

of the matching problem in bipartite graphs to three-partite, three-uniform hypergraphs.

In this thesis, we study a well-known variant of 3-Dimensional Matching, called

⁵ A max-FGPP (min-FGPP) is non-degrading if α1/2 ≥ α2 (α1/2 ≤ α2).

Reference    Weighted?  Det/Rand  Time Complexity
[CFJ+04]     No         det       O∗(k^{O(k)})
[DF99]       Yes        det       O∗(k^{O(k)})
[FKN+08]     Yes        det       O∗(2^{O(k)})
[LCW07]      Yes        det       O∗(20097.152^k)
[Kou05]      No         det       O∗(2^{O(k)})
             No         rand      O∗(1285.475^k)
[WF08a]      Yes        det       O∗(432.082^k)
[LLC+06]     No         det       O∗(97.973^k)
[CKL+09]     Yes        det       O∗(64^{k+o(k)})
             Yes        rand      O∗(16^{k+o(k)})
[WF08b]      No         det       O∗(43.615^k)
[CFL+11]     Yes        det       O∗(32^{k+o(k)})
[GMP+14]     Yes        det       O∗(12.155^k)
[Kou08]      No         rand      O∗(8^k)
[BHK+10a]    No         rand      O∗(3.344^k)
[Zeh14]∗     Yes        det       O∗(8.097^k)

Table 1.6: Known parameterized algorithms for (3, k)-WSP.

3-Set k-Packing ((3, k)-SP), as well as the related P2-Packing and Partial Cover

problems.

The Weighted (3, k)-SP ((3, k)-WSP) problem is defined as follows. Given a universe U, a family S of 3-subsets of U, a weight function w : S → R, a weight W ∈ R and a parameter k ∈ N, (3, k)-WSP asks if there exists a subfamily S′ ⊆ S of k disjoint sets whose total weight is at least W. The special case of P2-Packing asks if a given undirected graph G = (V, E) contains k (node-)disjoint simple paths, each on 3 nodes.

In the past decade, (3, k)-WSP enjoyed a race towards obtaining the fastest parameterized algorithm that solves it. Table 1.6 presents a summary of known parameterized algorithms for (3, k)-WSP. We solved (3, k)-WSP in deterministic time O∗(8.097^k), improving upon the previous best deterministic time O∗(12.155^k) [GMP+14]. Our result builds upon the algorithm in [GMP+14], integrating a technique that we call unbalanced cutting of the universe (see Section 2.5). Specialized parameterized algorithms for P2-Packing were given in [FWC14, FWL+13, FR09, PS06]. Feng et al. [FWL+13] gave a randomized algorithm for P2-Packing that runs in time O∗(6.75^k), for which Feng et al. [FWC14] gave a deterministic version that runs in time O∗(8^{k+o(k)}). We observed that the algorithms of [FWC14, FWL+13] can be modified to solve P2-Packing in deterministic time O∗(6.75^{k+o(k)}). We then gave an alternative algorithm that solves P2-Packing in deterministic time O∗(6.777^k), which relies on a mixture of representative sets and unbalanced cutting of the universe.

Given a universe U, a family S of subsets of U and a parameter k ∈ N, the k-Partial Cover (k-PC) problem seeks the smallest number of sets in S whose union

Reference          Det/Rand  Variant  Time Complexity
[BPS13]            det       k-PC     O∗(4^k k^{2k})
[Bl03]             rand      k-PC     O∗(5.437^k)
[KMR07]            det       k-DS     O∗((16 + ε)^k)
                   rand      k-DS     O∗((4 + ε)^k)
[CC13]             det       k-DS     O∗(5.437^k)
[Kne09]            det       k-DS     O∗((4 + ε)^k)
[KW09]             rand      k-DS     O∗(2^k)
[SZ14b, Zeh14]∗    det       k-PC     O∗(2.597^k)

Table 1.7: Known parameterized algorithms for k-PC and k-DS.

contains at least k elements. This problem generalizes the well-known k-Dominating
Set (k-DS) problem, defined as follows. Given a graph G = (V, E) and a parameter
k ∈ N, k-DS seeks the smallest size of a set U ⊆ V such that the number of nodes in
the closed neighborhood of U is at least k. If k-PC can be solved in time t(|U|, |S|, k),
then k-DS can be solved in time t(|V|, |V|, k) (see, e.g., [BPS13]). Note that the special
cases of k-PC and k-DS in which k = n are the classical NP-hard Set Cover and
Dominating Set problems [GJ79], respectively. Table 1.7 presents a summary of
known parameterized algorithms for k-PC and k-DS. Relying on our tradeoff-based
approach for computing representative sets (see Section 2.1.1), we obtained (in [SZ14b])
the fastest algorithm for k-PC, which runs in time O∗(2.619^k), and can be further
sped up to run in time O∗(2.597^k) by using a mixing strategy (see Section 2.5).
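The reduction from k-DS to k-PC mentioned above can be illustrated by a minimal sketch: each node contributes one set, namely its closed neighborhood. The function names and the brute-force solver below are illustrative assumptions only; they stand in for any k-PC algorithm and are not the parameterized algorithms of Table 1.7.

```python
from itertools import combinations

def partial_cover_bruteforce(sets, k):
    """Smallest number of sets whose union has at least k elements (None if impossible).

    Exponential-time placeholder for an arbitrary k-PC solver.
    """
    if k <= 0:
        return 0
    for r in range(1, len(sets) + 1):
        for choice in combinations(sets, r):
            if len(set().union(*choice)) >= k:
                return r
    return None

def partial_dominating_set(adj, k):
    """Solve k-DS through k-PC: the family holds one closed neighborhood per node."""
    closed_neighborhoods = [frozenset({v} | set(adj[v])) for v in adj]
    return partial_cover_bruteforce(closed_neighborhoods, k)
```

The family has exactly |V| sets over a universe of |V| elements, which is why a running time of t(|U|, |S|, k) for k-PC becomes t(|V|, |V|, k) for k-DS.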



Chapter 2

Methods

In this chapter, we discuss our technical contributions.

2.1 Representative Sets: New Computations

We first present two computations of representative sets with respect to uniform matroids,

which we developed by extending the technique of Fomin et al. [FLS14].

2.1.1 A Unified Tradeoff-Based Approach

Our unified approach exploits an interesting tradeoff between running time and the

size of the representative families. This tradeoff is made precise by using, along with

the scheme of [FLS14], a parameter c ≥ 1, which enables a more careful selection of

elements to the sets.

Indeed, towards computing a representative family Ŝ, we seek a family F ⊆ 2^E that
satisfies the following condition. For every pair of sets X ∈ S, and Y ⊆ E \ X such
that |Y| ≤ k − p, there is a set F ∈ F such that X ⊆ F, and Y ∩ F = ∅ (see Fig. 2.1).
Then, we compute Ŝ by iterating over all S ∈ S and F ∈ F such that S ⊆ F. The time

complexity of this iterative process is the dominant factor in the overall running time.

Thus, we seek a small family F , such that for any S ∈ S, the expected number of sets

in F containing S is small. In constructing each set F ∈ F , we insert each element

Figure 2.1: A family F ⊆ 2^E. Assume that n = 5, k = 4, and p = 2. An arrow from a set S ∈ S to a set F ∈ F indicates that S ⊆ F.



e ∈ E to F with probability p/(ck). For c = 1, this is the approach proposed in [FLS14].

When we take a larger value for c, we need to construct a larger family F . Yet, since

elements in E are inserted to sets in F with a smaller probability, we get that for any

S ∈ S, the expected number of sets in F containing S is smaller.
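The tradeoff can be made concrete with a back-of-the-envelope calculation (a heuristic sketch, not the proof from [SZ14b]). If each element enters F independently with probability q = p/(ck), a single random F contains a fixed p-set X and avoids a fixed (k − p)-set Y with probability q^p (1 − q)^(k−p). The reciprocal of this probability suggests how many sets F should hold, and multiplying it by q^p estimates how many of them contain a fixed S. The function names are assumptions, and lower-order factors are ignored.

```python
def family_size_factor(p, k, c=1.0):
    """Reciprocal of the probability that one random F separates X from Y.

    Algebraically equal to (ck)^k / (p^p (ck - p)^(k - p)).
    """
    q = p / (c * k)
    return 1.0 / (q ** p * (1.0 - q) ** (k - p))

def sets_containing_factor(p, k, c=1.0):
    """Estimated number of sets in F that contain a fixed p-set S.

    Algebraically equal to (ck / (ck - p))^(k - p).
    """
    q = p / (c * k)
    return family_size_factor(p, k, c) * q ** p
```

Raising c increases the first quantity and decreases the second, which is the tradeoff described above.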

This leads to the following result, which we proved in the paper [SZ14b]:1

Theorem 2.1. Given a parameter c ≥ 1, a uniform matroid U_{n,k} = (E, I), a family
S of p-subsets of E, and w : S → R, a family Ŝ ⊆ S of size
$\frac{(ck)^k}{p^p (ck-p)^{k-p}} \, 2^{o(k)} \log n$
that max (min) represents S can be found in time
$O\left(|S| \left(\frac{ck}{ck-p}\right)^{k-p} 2^{o(k)} \log n + |S| \log |S|\right)$.

2.1.2 Disjointness Conditions

In the paper [Zeh14], we proved that under certain conditions, which we call “disjointness
conditions”, the computation in Theorem 2.1 can be further sped up. To this end, we
first generalized the definition of a representative family (Definition 1.1.2):

Definition 2.1.1. Let E_1, E_2, . . . , E_t be disjoint universes of elements, p_1, p_2, . . . , p_t ∈ N,
and S be a family of subsets of ⋃_{i=1}^t E_i such that [∀S ∈ S, i ∈ {1, 2, . . . , t}:
|S ∩ E_i| = p_i]. Given a function w : S → R and parameters k_1, k_2, . . . , k_t ∈ N, we say
that a subfamily Ŝ ⊆ S max (min) (k_1−p_1, k_2−p_2, . . . , k_t−p_t)-represents S if for every
pair of sets X ∈ S and Y ⊆ (⋃_{i=1}^t E_i) \ X such that [∀i ∈ {1, 2, . . . , t} : |Y ∩ E_i| ≤ k_i−p_i],
there is a set X̂ ∈ Ŝ disjoint from Y such that w(X̂) ≥ w(X) (w(X̂) ≤ w(X)).
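To make the quantifiers of this definition concrete, the following is a brute-force checker, meant only for tiny instances. It enumerates every admissible Y and looks for a witness in the candidate subfamily. The function name, the argument layout and the dictionary-based weight function are assumptions made for illustration.

```python
from itertools import combinations, product

def represents(S, S_hat, universes, ks, ps, w, maximize=True):
    """Check whether S_hat max (min) (k_1-p_1, ..., k_t-p_t)-represents S.

    S, S_hat: iterables of sets; universes: the disjoint E_i;
    w: dict mapping frozenset -> weight. Exponential time.
    """
    S_hat = [frozenset(x) for x in S_hat]
    for X in map(frozenset, S):
        # For each universe E_i, list all choices of at most k_i - p_i elements outside X.
        per_universe = []
        for E, k, p in zip(universes, ks, ps):
            free = sorted(set(E) - X)
            per_universe.append(
                [part for r in range(k - p + 1) for part in combinations(free, r)]
            )
        for parts in product(*per_universe):
            Y = frozenset().union(*parts)
            found = any(
                not (Xh & Y) and (w[Xh] >= w[X] if maximize else w[Xh] <= w[X])
                for Xh in S_hat
            )
            if not found:
                return False
    return True
```

With a single universe (t = 1) this checks ordinary (k − p)-representation.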

Observe that when t = 1, the above definition and Definition 1.1.2 coincide. Now,

for the generalization, we obtained a corresponding computation:

Theorem 2.2. Given parameters c_1, c_2, . . . , c_t ≥ 1, along with E_1, E_2, . . . , E_t, p_1, p_2, . . . ,
p_t, S, w and k_1, k_2, . . . , k_t as described in Definition 2.1.1, a subfamily Ŝ ⊆ S of size
$\prod_{i=1}^{t} \left( \frac{(c_i k_i)^{k_i}}{p_i^{p_i} (c_i k_i - p_i)^{k_i - p_i}} \, 2^{o(k_i)} \log |E_i| \right)$
that max (min) (k_1−p_1, k_2−p_2, . . . , k_t−p_t)-represents S can be found in time
$O\left( |S| \prod_{i=1}^{t} \left( \left( \frac{c_i k_i}{c_i k_i - p_i} \right)^{k_i - p_i} 2^{o(k_i)} \log |E_i| \right) + |S| \log |S| \right)$.

The crux of the proof is in avoiding a direct construction of the family F , which is

required to compute a representative family (see Section 2.1.1); the necessary information

that such a family provides is gathered by constructing a set of smaller families, each

corresponding to a different universe Ei. The computation of Theorem 2.2 is useful,

since for certain parameters p_1, p_2, . . . , p_t and k_1, k_2, . . . , k_t, it is faster to obtain a
(k_1 − p_1, k_2 − p_2, . . . , k_t − p_t)-representative family of a certain size, rather than a
(∑_{i=1}^t k_i − ∑_{i=1}^t p_i)-representative family of the same size. Given a problem that is

not inherently related to disjointness conditions, we employ a mixture of the above

computation with other techniques (see Section 2.5), which transforms the problem to a

form that contains disjointness conditions.

1 At the same venue, a similar tradeoff was independently obtained by Fomin et al. [FLP+14].



2.2 Multilinear Detection with Proper Colorings

In applying the multilinear detection method to solve parameterized graph problems,

one often associates an indeterminate with each vertex. In the paper [Zeh13a], we

observed that in the case of graphs of bounded degree ∆, one can avoid associating

indeterminates with certain vertices (see below). If there are enough such vertices, the

maximum degree t of the monomials can be significantly smaller, which overall results

in faster algorithms; recall, from Section 1.1.3, that the time complexity of a multilinear
detection-based procedure is O∗(2^t). Indeed, by considering the case of k-IOB, we
demonstrated (in the paper [Zeh13a]) the potential that lies in our approach, obtaining
an algorithm that runs in time $O^*\left(4^{\left(1 - \frac{\Delta+1}{2\Delta(\Delta-1)}\right)k}\right)$.
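To get a feeling for this bound, one can compute the base b for which it equals b^k, reading the running time as 4 raised to the power (1 − (∆+1)/(2∆(∆−1)))·k. The helper below is an illustrative sketch; its name and the restriction to ∆ ≥ 2 (so that the denominator is nonzero) are assumptions.

```python
def iob_base(delta):
    """Base b such that 4^((1 - (delta+1)/(2*delta*(delta-1))) * k) == b^k."""
    if delta < 2:
        raise ValueError("the expression needs a maximum degree of at least 2")
    exponent = 1 - (delta + 1) / (2 * delta * (delta - 1))
    return 4 ** exponent
```

For ∆ = 3 the exponent is 1 − 4/12 = 2/3, so the base is 4^(2/3), roughly 2.52; as ∆ grows the base approaches 4.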

To apply our technique, we assume that the given problem is restricted to graphs

that are neither complete nor odd cycles, since otherwise it is solvable in polynomial

time (as in the case of k-IOB). Then, it is possible to compute a proper ∆-coloring

in linear time (e.g., by Lovász’s proof of Brooks’ theorem [Lov75]). Next, we consider

each color in a separate iteration. Let red be the color that is currently examined.

Now, during the execution of a multilinear detection-based procedure, when we visit a

red vertex (from some vertex v), and from this red vertex, continue to visit all of its

neighbors (excluding perhaps v), we do not use the indeterminate that is associated

with the red vertex. To prove the correctness of the omission, it should be shown that

we do not turn a monomial that is not multilinear into a multilinear monomial. This

relies on the observation that if we omitted an indeterminate of a vertex that occurs

more than once in a solution, this vertex has a neighbor that also occurs more than

once; since the color of this neighbor is not red (because we use a proper coloring), its

indeterminate has degree at least two in the monomial associated with the constructed

potential solution.

2.3 Variants of Randomized Separation

In designing algorithms for FGPPs (see Section 1.2.4), parameterized by p or (k+p), we

use as a key tool randomized separation [CCC06], which is a specific method of applying

one step of divide-and-color. Roughly speaking, randomized separation finds a ‘good’

partition of the nodes in the input graph G via randomized coloring of the nodes in

red or blue. If a solution exists, then, with some positive probability, there is a red

colored node-set X that is a solution, such that all of the neighbors of nodes in X that

are outside X are colored blue. Finding such a solution can be easier, since it may be

performed by using dynamic programming, where one seeks a correct combination of

maximal red subgraphs.
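A minimal sketch of one coloring trial follows, assuming each node is colored red or blue independently and uniformly (the text above only requires some positive success probability, so the fair coin is an assumption). All names are illustrative. The sketch reflects the key property: when X is red and its outside neighbors are blue, X is a union of maximal connected red subgraphs.

```python
import random

def random_coloring(adj, rng):
    """Color every node red or blue independently with probability 1/2."""
    return {v: rng.choice(("red", "blue")) for v in adj}

def red_components(adj, color):
    """Node sets of the maximal connected red subgraphs."""
    seen, comps = set(), []
    for s in adj:
        if color[s] != "red" or s in seen:
            continue
        comp, stack = set(), [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.add(v)
            for u in adj[v]:
                if color[u] == "red" and u not in seen:
                    seen.add(u)
                    stack.append(u)
        comps.append(comp)
    return comps

def is_separated(adj, color, X):
    """True if all of X is red and every neighbor of X outside X is blue."""
    X = set(X)
    return all(color[v] == "red" for v in X) and all(
        color[u] == "blue" for v in X for u in adj[v] if u not in X
    )

def success_probability(adj, X):
    """Probability that a uniform coloring separates X: 2^-(|X| + |N(X) \\ X|)."""
    X = set(X)
    boundary = {u for v in X for u in adj[v]} - X
    return 0.5 ** (len(X) + len(boundary))
```

Repeating the trial about 1/success_probability times (times a logarithmic factor) makes a good coloring likely; a dynamic program over the red components would then search for the solution.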

By a standard application of this technique, we obtained an algorithm for positive

min-FGPPs. Then, we considered two interesting variants of randomized separation.

First, our algorithm for Max (k, n− k)-Cut makes non-standard use of this technique,



in requiring that only a sufficient number of the neighbors outside X of nodes in X

are blue. This yields an improvement in the running time of the algorithm, since fewer
coloring iterations are necessary to ensure that some (rather than all) of the neighbors

outside X are blue.

Our algorithm for non-degrading positive FGPPs is based on a somewhat different

application of randomized separation, in which we randomly color edges rather than

nodes. If a solution exists, then with some positive probability, there is a node-set X

that is a solution, such that some edges between nodes in X are red, and all of the edges

connecting nodes in X and nodes outside X are blue. In particular, we require that the

subgraph induced by X, and the subgraph induced by X from which we delete all blue

edges, contain the same connected components. Again, requiring fewer elements
to have a particular color (in this case, the edges between nodes in X) results in an

improved running time.

2.4 Guiding Trees

Building upon a result given by Fomin et al. [FLS14] (to solve the k-Tree problem),

guiding trees is a tool that we introduced in the paper [PSZ14a] (to solve variants of

the GM problem), and also used in the paper [SZ14b] (in the context of the k-IOB

problem), to handle problems in which the solution is a tree whose exact topology is

unknown. By using this tool, one does not need to examine every possible topology

that is a tree on k vertices, which is extremely inefficient, as there are O∗(2^k) such options

to consider; instead, it is only necessary to “follow the instructions” of a polynomial

number of structures that we call guiding trees.

Roughly speaking, a guiding tree is a constant-size rooted tree which provides some

structural information about the solution tree; more precisely, the vertices of the guiding

tree are part of the solution tree, while the edges represent small unknown topologies

that connect these vertices. In this sense, the guiding tree instructs how the solution

tree can be partitioned into a set of small trees; more generally, this instructs how one

can partition a given problem into a small number of small problems. In the context

of the representative sets method, such instructions can be used to compute smaller

families, which result from combinations of partial solutions, for which we need to

compute representative families.

Formally, let G = (V, E) be the input graph, and let 2 ≤ d ≤ k/2 be a constant.
Given nodes v, u ∈ V, we say that a tree T rooted at v is a (v, u)-tree if u ∈ V_T.
Furthermore, a (v, u)-tree R is a (v, u)-guide if 3 ≤ |V_R| ≤ 2d and V_R ⊆ V (E_R may not
be contained in E). Let G_{v,u} be the set of (v, u)-guides. Finally, let T_{v,u,ℓ} be the set of
(v, u)-trees on ℓ nodes that, when unrooted, are subtrees of G.

We now define which subtrees of G listen to the instructions of a given guide (see

Fig. 2.2).



Figure 2.2: A (v, u)-tree T, and a (v, u)-guide R, where d = 3, k = 12, and T listens to R. (Panels: G, T, R.)

Definition 2.4.1. Given v, u ∈ V and ℓ ≤ k, we say that T ∈ T_{v,u,ℓ} listens to R ∈ G_{v,u} if the following two conditions are satisfied.

1. ∀v′, u′ ∈ V_R : v′ is an ancestor of u′ in R iff v′ is an ancestor of u′ in T.

2. For each tree X in the forest obtained by removing V_R from T, let N_X = {v′ ∈ V_R : {v′, u′} ∈ E_T for some u′ ∈ V_X}. Then, |N_X| ≤ 2, and [N_X ≠ {v} → (|V_X ∪ N_X| ≤ k/d)].

The next lemma, which asserts that none of the subtrees of G is completely
undisciplined, is implicit in [FLS13].

Lemma 2.4.2. For any rooted tree T ∈ T_{v,u,ℓ}, where v, u ∈ V and 3 ≤ ℓ ≤ k, there
exists R ∈ G_{v,u} to whom T listens.

Informally, given a tree T, the proof of Lemma 5.7 in [FLS13] shows how to find a
certain set of O(1) vertices in V_T that, when removed, partitions T into a forest of small

trees (only). Considering this proof along with our definitions, it is straightforward to

see that Lemma 2.4.2 holds. Thus, to use the tool of guiding trees when information on

the topology of a solution tree is necessary, one can iterate over every possible guiding

tree (there is a polynomial number of guides), and try to follow the instructions of each

of them.

2.5 Mixtures of Color Coding-Related Techniques

In the papers [PSZ14b] and [Zeh14], we proposed mixtures of color coding-related

methods.

Two Narrow Sieves-Based Procedures: First, in the paper [PSZ14b], we examined

a scheme that consists of two narrow sieves-based procedures, combined with a divide-

and-color step. In solving PINQI, we observed that if the nodes in V (the node-set of

H) can be mapped only to nodes of single-node graphs in P, PINQI can be efficiently

solved by a narrow sieves-based procedure, PA, that is a straightforward extension of

the algorithm for GM given in [BKK13]. On the other hand, if |P| = 1, PINQI can be



efficiently solved by a different procedure, PB, using a standard application of narrow

sieves.

Now, suppose that we have a partition of V into a set A of nodes that can be mapped

only to nodes of single-node graphs in P, and a set B of nodes that can be mapped

only to nodes of the other graphs in P. For such a scenario, we developed a narrow

sieves-based procedure, ManySingles, handling nodes in A in an efficient manner similar

to PA, and nodes in B in a manner similar to PB. To handle only such scenarios, before

each call to ManySingles, we used divide-and-color to partition V into the sets A and

B. The correctness of ManySingles crucially relies on this selection step. The combined

application of divide-and-color and ManySingles is efficient only for solutions containing

many single-node graphs from P .2 However, solutions containing few single-node graphs

from P cannot contain too many graphs from P. For such solutions, we developed a

procedure, FewSingles, handling all the nodes in V in a manner similar to PB.

Thus, our algorithm proceeds in the following main steps:

1. Examine all choices for the number s of single-node graphs from P in the solution.

2. If s is “large”:

(a) Apply divide-and-color to partition V as described above.

(b) Call ManySingles.

3. Else: Call FewSingles.
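The control flow of these steps can be sketched as follows. Everything here is a hypothetical skeleton: `many_singles`, `few_singles` and `partitions` are placeholder callables standing in for ManySingles, FewSingles and the divide-and-color step, and `threshold` stands in for the unspecified meaning of “large”.

```python
def solve(instance, max_singles, threshold, many_singles, few_singles, partitions):
    """Try every number s of single-node graphs and dispatch on its size.

    many_singles(instance, s, A, B) and few_singles(instance, s) return True
    when they find a solution; partitions(instance, s) yields candidate (A, B)
    splits of V (e.g., from random colorings). All three are assumed callables.
    """
    for s in range(max_singles + 1):
        if s >= threshold:                      # s is "large"
            for A, B in partitions(instance, s):
                if many_singles(instance, s, A, B):
                    return True
        elif few_singles(instance, s):          # s is "small"
            return True
    return False
```

The threshold would be tuned so that the two branches have matching worst-case running times.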

Narrow Sieves and Representative Sets: In the paper [Zeh14], we examined a

mixture of a representative sets-based procedure and a narrow sieves-based procedure,

which we used to solve the k-IOB problem. This mixture is similar to the previous one

in the sense that we identify two extreme cases that can be efficiently solved, iterate

over a parameter that forms a bridge between them, and define a threshold such that if

the parameter is smaller than this threshold, we execute one procedure, and otherwise,

we execute another. Again, one of the procedures relies on the narrow sieves method.

However, the second procedure relies on the representative sets method, combined with

a polynomial time computation. More precisely, we run a representative sets-based

procedure up to a certain point, where it obtains a family of partial solutions of some

desired form and size; then, we try to complete each partial solution by performing

a polynomial time computation. Thus, we replace the divide-and-color step of the

previous mixture by a more complicated partial execution of a representative sets-based

procedure, while the following narrow sieves-based procedure is replaced by a polynomial

time computation.

Specifically, to develop an algorithm for k-IOB, we employed this mixture as follows.

First, we showed that instead of directly solving k-IOB, one can also decide to focus

2 For such solutions, the time gained by handling A in a manner similar to PA outweighs the time required for the preceding selection step.



on finding a certain small tree, along with a set of disjoint paths on two nodes. Then,

we examined several possible values for the size of a (different) tree that we seek

towards constructing a solution. If the current value is small, we called a narrow

sieves-based procedure; otherwise, we opted for the option of finding a small tree, using

a representative sets-based procedure, which we attempted to complete to a solution by

finding a maximum matching in the underlying undirected graph of the input graph

(from which the tree is removed).

Balanced Cutting of the Universe: In the paper [Zeh14], we obtained a strategy
by which the running times of certain parameterized algorithms that rely on the
representative sets method can be improved from O∗(2.619^k) to O∗(2.597^k). Recall that
in Section 2.1.2, we presented a computation of representative sets that is fast under
certain disjointness conditions. To handle problems that are not inherently related to
disjointness conditions, we perform several preprocessing iterations, each of which involves a
step of divide-and-color, combined with a technique that we call balanced cutting of the
universe.

On a high level, balanced cutting starts by examining a polynomial number

of options that cut a part of the universe, such that at least one of these options

corresponds to a part that contains (exactly) a specific portion of the set of elements

that define a solution (if one exists). For example, this can be performed by choosing

some order between the elements of the universe, and for each element, defining a

part that contains all the elements that are larger than it. Then, for each part P , one

performs a step of divide-and-color, attempting to divide the part P into two sets of

elements, L and R; the set L should contain all the elements in P that are relevant

to the first half of a following representative sets-based computation, while the set R

should contain all the elements in P that are relevant to its second half.
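The example of cutting by an order can be sketched in a few lines: fix an order on the universe and, for each element, take the part consisting of all larger elements. This yields only |U| candidate parts. The function name and the use of Python's default ordering are assumptions for illustration.

```python
def suffix_parts(universe, key=None):
    """For each element e (in sorted order), the part of all elements larger than e."""
    order = sorted(universe, key=key)
    return [frozenset(order[i + 1:]) for i in range(len(order))]
```

For any solution set and any target count j smaller than the solution's size, some part contains exactly j solution elements (the part defined by the (j+1)-th largest solution element), which is the property the first phase relies on.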

Next, the second phase of balanced cutting is performed.3 In this phase, one implicitly

partitions the entire universe into a constant-size array of smaller universes: the first
small universe should contain all the elements relevant to the “beginning” of a following
representative sets-based computation, the second small universe should contain all
the elements relevant exactly afterwards, and so on. It is important to ensure that
the small universes further satisfy the following condition: the elements in L roughly

belong only to universes that appear in the first half of the array, and among them,

the elements in L that are contained in some solution are either “spread in a balanced

manner” or “tend to be congested” in the universes that have smaller indices in the

array; a symmetrical condition that refers to R should also be satisfied. Then, in the

paper [Zeh14], we showed that this process distorts the entire universe in a manner

that allows one to define a special subcase of the original problem, which contains

explicit disjointness conditions; thus, the problem is translated into an easier one that

can be efficiently solved using a straightforward, albeit technical, application of the

3 The order between the phases and the divide-and-color step is not important; here, for the sake of clarity of explanation, we refer to an order.



computation mentioned in Section 2.1.2.

Unbalanced Cutting of the Universe: In the paper [Zeh14], we also presented a

technique called unbalanced cutting of the universe; we combined this technique with an

algorithm for (3, k)-WSP that we gave in the paper [GMP+14] (which is not included

in this thesis), in order to improve its running time. Interestingly, the algorithm given

in [GMP+14] does not only remove sets from families of partial solutions (by computing

representative families), but also removes elements from the partial solutions themselves.

By using unbalanced cutting of the universe, one attempts to order the universe in a

special manner that allows one to remove more elements from each partial solution, ensuring
that the elements that are removed cannot be encountered in the rest of the computation.
Having smaller partial solutions speeds up computations of representative sets, which

can overall result in a significantly improved running time.

The technique is similar to balanced cutting of the universe in the sense that again,

we partition the universe into an array of smaller universes. However, now the partition

is explicit, and its aim is to create a congestion of certain elements (those that
should be removed) as early as possible. The representative sets-based

computation is partitioned into a constant number of phases, where at each phase, one

is given an element e such that all the elements that are smaller than e can be removed.

A lower bound for the number of elements removed at each phase is obtained using a

recursive formula.



Chapter 3

Results

This chapter contains the papers published as part of this thesis, in chronological order.

3.1 Algorithms for Topology-Free and Alignment Network Queries

Ron Y. Pinter and Meirav Zehavi. Algorithms for Topology-Free and Alignment Network

Queries. Journal of Discrete Algorithms (JDA), 27:29–53, 2014.



Journal of Discrete Algorithms 27 (2014) 29–53


Algorithms for topology-free and alignment network queries

Ron Y. Pinter, Meirav Zehavi ∗

Department of Computer Science, Technion, Haifa 32000, Israel

Article info

Article history: Received 12 November 2012. Received in revised form 29 November 2013. Accepted 10 March 2014. Available online 17 March 2014.

Keywords: Topology-free network query; Alignment network query; Computational biology; Parameterized algorithm; Subgraph homeomorphism

Abstract

In this article we address two pattern matching problems which have important applications to bioinformatics. First we address the topology-free network query problem: Given a set of labels L, a multiset P of labels from L, a graph H = (V_H, E_H) and a function Label_H : V_H → 2^L, we need to find a subtree S of H which is an occurrence of P. We provide a parameterized algorithm with parameter k = |P| that runs in time O∗(2^k) and whose space complexity is polynomial. We also consider three variants of this problem. Then we address the alignment network query problem: Given two labeled graphs P and H, we need to find a subgraph S of H whose alignment with P is the best among all such subgraphs. We present two algorithms for cases in which P and H belong to certain families of DAGs. Their running times are polynomial and they are less restrictive than algorithms that are available today for alignment network queries. Topology-free and alignment network queries provide means to study the function and evolution of biological networks, and today, with the increasing amount of knowledge regarding biological networks, they are extremely relevant.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Performing topology-free and alignment network queries is an important problem in the analysis of biological networks. Such queries provide means to study their function and evolution. In comparison, similar queries for sequences have been studied and used extensively for the past 30 years. Today, with the increasing amount of knowledge we have regarding biological networks, they are extremely relevant for them as well.

In both problems we are given a pattern P and a host graph H, and we need to find a subgraph S of H which resembles P. Furthermore, in both problems P and H have labels which we consider when measuring the similarity between P and the subgraphs of H.

However, there is one significant difference between them: A topology-free network query does not assume we know the topology of P and therefore requires only the connectivity of the solution, while an alignment network query assumes such knowledge and therefore requires resemblance between the topology of P and the solution. Another difference is that in the optimization version of the topology-free network query problem we consider the sum of the weights of the edges of the solution, while in the alignment network query problem we are interested in maximizing the similarity between the nodes of P and the solution. This is mostly a result of the uncertainty concerning the existence of many edges in biological networks that are more suitable for topology-free network queries.

* Corresponding author. E-mail addresses: [email protected] (R.Y. Pinter), [email protected] (M. Zehavi).

http://dx.doi.org/10.1016/j.jda.2014.03.002
1570-8667/© 2014 Elsevier B.V. All rights reserved.



Table 1
Known parameterized algorithms with parameter k = |P| for the topology-free network query problem (including its limited case and weighted variant).

Reference               General\limited   Time complexity   Polynomial space?   Considers weights?
Bruckner et al. [9]     General           O∗(k! 3^k)        No                  Yes
Fellows et al. [17]     Limited           O∗(64^k)          No                  No
Betzler et al. [5]      General           O∗(10.88^k)       No                  No
                        Limited           O∗(4.32^k)        No                  No
Guillemot et al. [20]   General           O∗(4^k)           Yes                 No
                        General           O∗(4^k W^2)       No                  Yes
Koutis [24]             Limited           O∗(2.54^k)        Yes                 No
This paper              General           O∗(2^k)           Yes                 No
                        General           O∗(2^k W)         No                  Yes

Throughout this paper we use O∗ to hide factors polynomial in the input size and Õ to hide factors polylogarithmic in the input size. Furthermore, given a graph G, we use V_G and E_G to denote its node set and edge set, respectively.

1.1. Topology-free network queries

In the general case of the topology-free network query problem we are given:

1. L – A set of labels.
2. P – A multiset of labels from L.
3. H – An undirected graph.
4. Label_H : V_H → 2^L.

Our goal is to find a pair (S, label), where S is a subtree of H and label is a function label : V_S → L, such that the following requirements hold.

1. ∀v ∈ V_S : label(v) ∈ Label_H(v).
2. ∀l ∈ L: The number of occurrences of l in P is |{v ∈ V_S : label(v) = l}|.
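The two requirements, together with the demand that S be a subtree of H, can be checked directly. The verifier below is a minimal sketch for validating a candidate pair (S, label); it is not the paper's search algorithm, and the function name and argument layout are assumptions.

```python
from collections import Counter

def is_occurrence(P, label_H, edges_H, S_nodes, S_edges, label):
    """Check that (S, label) is an occurrence of the multiset P in H.

    P: iterable of labels (a multiset); label_H: dict node -> set of labels;
    edges_H, S_edges: iterables of node pairs; label: dict node -> label.
    """
    S_nodes = set(S_nodes)
    S_edges = {frozenset(e) for e in S_edges}
    host_edges = {frozenset(e) for e in edges_H}
    if not S_nodes or any(len(e) != 2 for e in S_edges):
        return False
    # S must use only edges of H, between its own nodes.
    if not S_edges <= host_edges or any(not e <= S_nodes for e in S_edges):
        return False
    # A connected graph with |V| - 1 edges is a tree.
    if len(S_edges) != len(S_nodes) - 1:
        return False
    adj = {v: set() for v in S_nodes}
    for e in S_edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    start = next(iter(S_nodes))
    seen, stack = {start}, [start]
    while stack:
        for u in adj[stack.pop()]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    if seen != S_nodes:
        return False
    # Requirement 1: every chosen label is allowed for its node.
    if any(label[v] not in label_H[v] for v in S_nodes):
        return False
    # Requirement 2: the chosen labels form exactly the multiset P.
    return Counter(label[v] for v in S_nodes) == Counter(P)
```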

In the limited case of this problem we add the following restriction on the input: ∀v ∈ V_H : |Label_H(v)| = 1.

The topology-free network query problem was introduced by Lacroix et al. [25]. It is motivated mostly by the fact that there are many cases when querying biological networks in which we do not know the topology of P and thus require only the connectivity of the subgraph S. For example, this occurs often when querying protein–protein interaction networks [9].

We consider the general case of the problem. We allow matching a node from H with a label from P that is similar (rather than identical) to its label in order to allow more solutions. Thus each node of H is given a set of labels by Label_H (i.e., its label and the labels similar to it). For example, when querying protein–protein interaction networks, a node representing a certain protein can be matched with any label representing a protein homologous to it.

The weighted variant of this problem was introduced by Bruckner et al. [9]. In this variant the edges of H have weights and we are also given a weight W. The weights can represent the confidence we have in the existence of the interactions their edges represent. As in [20], we assume that the weights are positive integers. Our goal is to find a pair (S, label) which fulfills the requirements of the unweighted variant and such that the sum of the weights of the edges of S is at most W. Clearly, we can use an algorithm for this variant to solve its corresponding optimization problem (i.e., the problem in which we need to minimize the sum of the weights of the edges of S).

We note that when we refer to the problem without specifying its variant, we consider the unweighted general case.

Lacroix et al. [25] give an exponential time algorithm for the problem which is implemented by a tool called MOTUS. Fellows et al. [17] prove that the limited case of the problem is NP-hard when P is a set and H is a tree of maximum degree three, and when P consists of only two different labels and H is bipartite with maximum degree four. Moreover, the limited case of the problem is NP-hard when H is a tree of maximum degree four in which each label appears at most twice [15], and when P is a set and H has diameter two [2].

The limited case of the problem is polynomial-time solvable when H has maximum degree two, when P consists of a bounded number of different labels and H has a bounded treewidth, and when P is a set and H is a tree in which each label is shared by at most two nodes [17]. It is also polynomial-time solvable when P is a set and H is a caterpillar [2].

The problem is W[1]-hard when parameterized by the number of different labels in P [17]. Betzler et al. [4] prove that the problem is W[1]-hard when parameterized by |V_H| − |P| even if P consists of only two different labels or it is a set. They also prove that if we search for an l-connected or l-edge connected subgraph of H instead of a subtree of H, the problem is W[1]-hard when parameterized by |P|.

On the positive side, the problem is fixed-parameter tractable with parameter k = |P|. Table 1 presents a summary of known parameterized algorithms for the problem. The algorithms presented in [9,17,5] are based on color coding [1], and



the algorithms presented in [20,24] are based on reductions to the multilinear detection problem [23]. Our algorithms are based on the algebraic framework of Björklund et al. [6].

We also consider a variant of the problem where we can delete and insert labels to P, and another variant where the solution may not be connected (known as the min connected components problem). These variants were introduced by Bruckner et al. [9] and Dondi et al. [14], respectively.

Remark. During the time in which this paper was under review, Björklund et al. [7] presented algorithms similar to our algorithms for the topology-free network query problem and its variant that allows deleting and inserting labels.

1.2. Alignment network queries

When we refer to alignment network queries, we consider node labeled graphs. We use l_v to denote the label of a node v, Δ is a label-to-label similarity score, and δ is a function specifying for each label l the non-positive cost of deleting a node whose label is l. Given Δ, δ, a pattern graph P and a host graph H s.t. |P| ≤ |H|, the objective of an alignment network query is to find a subgraph S of H whose alignment with P is the best among all such subgraphs, considering the topologies and the similarity (rather than identity) between the node labels of the graphs.

Subgraph isomorphism is the basis for matching topologies by alignment network queries. It is NP-hard even if P is a simple path and H is a general graph, or P is a forest and H is a tree [19]. Given a family of DAGs (i.e., Directed Acyclic Graphs) F, denote by F_und the family of graphs consisting of the underlying undirected graphs of the graphs in F. Using edge subdivision, it is easy to prove that if subgraph isomorphism is NP-hard for F_und, it is also NP-hard for F. On the positive side, it is solvable in time O(|V_P|^1.5 |V_H| / log |V_P|) for trees [34].

In biological networks, certain nodes have been deleted or inserted during evolution, a fact that must be considered by an alignment network query. Deletions in P can be interpreted as insertions in H and vice versa. Thus considering deletions in both graphs is equal to considering deletions and insertions in both graphs.

Homeomorphism is a variant of isomorphism that allows deleting nodes whose degrees are two. After a node is deleted, its neighbors are connected. In directed graphs each deleted node must have one ingoing neighbor and one outgoing neighbor, and the inserted edge is directed from the ingoing neighbor to the outgoing neighbor. The subgraph homeomorphism problem is NP-hard [26]. We note that the choice of such deletions has a biological reason (see Section 3.6).

Pinter et al. [30] limit P and H to trees, allow deletions only from H, assume the same deletion cost c ∈ R_0^− for all nodes, and define the following score.

Definition 1. Let c ∈ R_0^−. Also let P and S_H be two trees s.t. there exists a homeomorphism preserving mapping M from P to S_H which matches all the nodes of P. The ALSH (i.e., Approximate Labeled Subtree Homeomorphism) score of such M is

$\mathrm{ALSH}(M) = c \cdot (|V_{S_H}| - |V_P|) + \sum_{v \in V_P} \Delta[l_{M(v)}, l_v]$

Definition 2. Given �, c ∈ R−0 , and two trees P and H , the ALSH problem is to find a homeomorphism preserving mapping

M from P to a subgraph S H of H which matches all the nodes of P and maximizes the ALSH score.
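As a small illustration of Definition 1, the following sketch computes the ALSH score of a given mapping. The function name `alsh_score` and the dictionary-based encoding of labels, mapping and similarity table are assumptions made for this example only; they do not come from [30].

```python
def alsh_score(mapping, label_p, label_s, c, similarity):
    """ALSH(M) = c * (|V_SH| - |V_P|) + sum over v in V_P of Delta[l_M(v), l_v].

    mapping:    pattern node -> node of S_H (must cover every pattern node)
    label_p:    pattern node -> label
    label_s:    node of S_H  -> label (all nodes of S_H, matched or deleted)
    c:          non-positive cost of one deletion from S_H
    similarity: (label of M(v), label of v) -> score, i.e. the table Delta
    """
    if c > 0:
        raise ValueError("the deletion cost c must be non-positive")
    if set(mapping) != set(label_p):
        raise ValueError("M must match all the nodes of P")
    deletions = len(label_s) - len(label_p)      # |V_SH| - |V_P|
    matched = sum(similarity[(label_s[mapping[v]], label_p[v])] for v in label_p)
    return c * deletions + matched
```

For instance, matching a two-node pattern into a three-node subtree with one deleted node, $c = -1$ and similarity 10 for each matched pair gives $-1 + 20 = 19$.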

Pinter et al. [30] give an $O(|V_P||V_H|(|V_P| + \log |V_H|))$ time algorithm for the ALSH problem, which performs in time $O\left(|V_P||V_H|\left(\frac{|V_P|}{\log |V_P|} + \log |V_H|\right)\right)$ when the number of labels is bounded. We refer to this algorithm as the ALSH algorithm. Given pairs of nodes which must be matched in the alignment (each pair contains a node from $P$ and a node from $H$), Lozano et al. [27] show how to achieve better performance for a family of algorithms which includes the ALSH algorithm and the algorithm we present in Section 3.2.

A multisource tree is a directed graph whose underlying undirected graph is a tree. The adaptation of the ALSH problem and algorithm to multisource trees, and results of using the algorithm to perform inter-species and intra-species alignments of metabolic pathways, appear in [31]. This algorithm was also used in a pathway evolution study [28].

NetGrep [3] allows $P$ and $H$ to have general topologies. It has an exponential running time and is efficient only for very small queries. Cheng et al. [12] allow $H$ to have a general topology, and use homomorphism instead of homeomorphism to provide a simple polynomial time algorithm. However, allowing different nodes in $P$ to be mapped to the same node in $H$ is undesirable.

Another approach, based on color coding [1], enables $H$ to have a general topology. The sum of $|V_P|$ and the number of deletions from $H$ is bounded by $k$ to provide parameterized algorithms with parameter $k$. This approach is used in QPath [36], which performs linear path queries in time $O^*((2e)^k)$. QNet [35] improves QPath, allowing $P$ to be a bounded treewidth graph. It runs in time $O^*((3e)^k |V_H|^{t+1})$, where $t$ is the treewidth of $P$. PADA1 [8] is an alternative to QNet which bounds the size of the feedback vertex set of $P$ instead of its treewidth. It runs in time $O^*((3e)^k |V_H|^{|F|})$, where $|F|$ is the size of the feedback vertex set of $P$. The space complexities of these algorithms have an exponential dependency on $k$.



Fig. 1. An illustration used in explaining the definitions given in Sections 2.1 and 2.2. Assume that the labels and neighbors of each node are ordered according to the order between their indexes (e.g., $N_1(v_5) = v_3$ and $N_2(v_5) = v_6$).

Hüffner et al. [22] show how to reduce the running time of QPath to $O^*(4.32^k)$, and give several practical improvements to color coding. Scott et al. [33] improve the running time of QPath when each node of $P$ has an additional label which restricts the set of nodes of $H$ with which it can be matched. They also extend QPath to two-terminal series-parallel graphs.

We consider topologies more general than multisource trees, deletions from both $P$ and $H$, and a more flexible scoring scheme for deletions. We use homeomorphism and do not bound $P$ or the number of deletions. Our algorithms have polynomial running times and are based on dynamic programming, finding disjoint paths in graphs and maximum weight matching computations.

2. Algorithms for topology-free network queries

Our algorithms are based on the algebraic framework of Björklund et al. [6].

We express our parameterized problem by associating monomials with potential solutions. Each feasible solution is associated with a unique monomial, and each monomial which is not associated with a feasible solution is associated with an even number of different potential solutions. Having a polynomial which is the sum of the monomials associated with potential solutions, we need to determine whether it has a monomial whose coefficient is odd.

First we consider the unweighted general case of the topology-free network query problem. In Sections 2.1 and 2.2 we define the potential solutions. Then, in Sections 2.3 and 2.4, we associate them with monomials and consider the polynomial which is the sum of these monomials. Section 2.5 presents the algorithm, and the last three sections concern three variants of the problem.

Throughout this section we denote |P | by k.

2.1. Potential topologies

Given a rooted tree $S$, we denote its root by $root(S)$.

We intend to traverse $H$ in order to find a solution by using the following approach: If the current node in which we are located is $v$, we try to extend some of our partial solutions by using its neighbors. Thus we may visit some nodes more than once. We use a rooted tree to represent our traversal, where its root corresponds to the node from which we have started it. The two following definitions concern this rooted tree.

Definition 3. Given a rooted tree $S$, a function $\mathit{match} : V_S \to V_H$, $s \in V_H$ and an integer $t$, we say that $(S, \mathit{match})$ is an $(s,t)$-potential topology if

1. $\mathit{match}(root(S)) = s$.
2. $|V_S| = t$.
3. If $\{v,u\} \in E_S$, then $\{\mathit{match}(v), \mathit{match}(u)\} \in E_H$.
4. If $v$ and $u$ are different nodes in $S$ which have the same father, then $\mathit{match}(v) \ne \mathit{match}(u)$.

For example, in Fig. 1, $(S, \mathit{match}_1)$ and $(S, \mathit{match}_2)$ are $(v_3, 5)$-potential topologies.



Definition 4. Given an $(s,t)$-potential topology $(S, \mathit{match})$, its embedding, denoted by $\mathrm{EMB}(S, \mathit{match})$, is a subgraph $G$ of $H$ which is defined as follows:

• $V_G = \{v \in V_H : \exists u \in V_S \text{ s.t. } \mathit{match}(u) = v\}$.
• $E_G = \{\{u,v\} \in E_H : \exists \{w,p\} \in E_S \text{ s.t. } \{\mathit{match}(w), \mathit{match}(p)\} = \{u,v\}\}$.

Examples for this definition are given in Fig. 1, illustrating $\mathrm{EMB}(S, \mathit{match}_1)$ and $\mathrm{EMB}(S, \mathit{match}_2)$.

The following observation follows immediately from Definitions 3 and 4.

Observation 1. Let $(S, \mathit{match})$ be an $(s,t)$-potential topology. There are no two nodes in $S$ which $\mathit{match}$ maps to the same node iff $|V_{\mathrm{EMB}(S,\mathit{match})}| = t$. Moreover, if $|V_{\mathrm{EMB}(S,\mathit{match})}| = t$, then $\mathrm{EMB}(S, \mathit{match})$ is a tree.
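Definitions 3 and 4 can be checked mechanically. The sketch below is only an illustration: it assumes the rooted tree $S$ is given as a child-to-father dictionary (the root maps to `None`) and that $E_H$ is a set of two-element `frozenset`s. The names `is_potential_topology` and `embedding` are hypothetical.

```python
def is_potential_topology(parent, match, host_edges, s, t):
    """Check conditions 1-4 of Definition 3.

    parent is assumed to describe a valid rooted tree (this is not verified).
    """
    roots = [v for v, p in parent.items() if p is None]
    if len(roots) != 1 or match[roots[0]] != s:        # condition 1
        return False
    if len(parent) != t:                               # condition 2
        return False
    used = set()                                       # (father, image of son)
    for v, p in parent.items():
        if p is None:
            continue
        if frozenset((match[v], match[p])) not in host_edges:   # condition 3
            return False
        if (p, match[v]) in used:                      # condition 4
            return False
        used.add((p, match[v]))
    return True


def embedding(parent, match):
    """Return (V_G, E_G) of EMB(S, match) as in Definition 4."""
    nodes = set(match[v] for v in parent)
    edges = {frozenset((match[v], match[p]))
             for v, p in parent.items() if p is not None}
    return nodes, edges
```

By Observation 1, comparing `len(embedding(parent, match)[0])` with $t$ tells whether the traversal revisited a node of $H$.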

2.2. Potential solutions

Roughly speaking, a potential solution describes a traversal of $H$ in which we label each traversed node $v \in V_H$ by using a label from $\mathrm{Label}_H(v)$. More precisely, it consists of a potential topology $(S, \mathit{match})$ (describing the traversal of $H$) and a function that assigns a label to each node $v \in V_S$ from $\mathrm{Label}_H(\mathit{match}(v))$ (indicating the labels of the traversed nodes).

Formally, our potential solutions are defined as follows.

Definition 5. Given an $(s,t)$-potential topology $(S, \mathit{match})$ and a function $\mathit{label} : V_S \to L$, we say that $(S, \mathit{match}, \mathit{label})$ is an $(s,t)$-potential solution if $\forall v \in V_S : \mathit{label}(v) \in \mathrm{Label}_H(\mathit{match}(v))$.

For example, in Fig. 1, $(S, \mathit{match}_i, \mathit{label}_j)$ is a $(v_3, 5)$-potential solution for all $i, j \in \{1,2\}$.

Definition 6. We say that an $(s,t)$-potential solution $(S, \mathit{match}, \mathit{label})$ is bijectively labeled if $\mathit{label}$ is bijective.

For example, in Fig. 1, $(S, \mathit{match}_1, \mathit{label}_2)$ and $(S, \mathit{match}_2, \mathit{label}_2)$ are bijectively labeled.

We next determine which potential solutions are feasible solutions. First we show that we can assume WLOG that $P = L$.

Suppose that $L$ contains labels which do not appear in $P$. We can delete them from $L$ and from the sets assigned by $\mathrm{Label}_H$, and get that a subtree of $H$ is a solution to the original instance iff it is a solution to the new one.

Now suppose $l$ is a label which appears $n_l > 1$ times in $P$. We can replace $l$ with $n_l$ new labels $l_1, l_2, \ldots, l_{n_l}$ as follows.

1. Remove $l$ from $L$ and insert each of the new labels.
2. Remove the occurrences of $l$ from $P$ and insert each of the new labels once.
3. Replace $\mathrm{Label}_H$ with the following function $\mathrm{Label}'_H : V_H \to 2^L$.

$$\mathrm{Label}'_H(v) = \begin{cases} (\mathrm{Label}_H(v) \setminus \{l\}) \cup \{l_1, l_2, \ldots, l_{n_l}\} & \text{if } l \in \mathrm{Label}_H(v) \\ \mathrm{Label}_H(v) & \text{otherwise} \end{cases}$$

Again we get that a subtree of $H$ is a solution to the original instance iff it is a solution to the new one. We can repeat this process for each label which appears more than once in $P$, and thus we make the following assumption.

Assumption 1. We assume WLOG that P = L.
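The preprocessing behind Assumption 1 might be sketched as follows. The function name `make_labels_distinct` is hypothetical, and for uniformity this sketch renames every label of $P$ to a pair `(label, copy index)`, including labels that occur only once, which is a harmless deviation from the three steps described above.

```python
from collections import Counter


def make_labels_distinct(pattern, label_h):
    """Return (L', Label'_H) in which the pattern is a set of distinct labels.

    pattern: list of labels (the multiset P)
    label_h: host node -> set of labels allowed for that node
    """
    counts = Counter(pattern)
    # Each label l occurring n_l times is replaced by n_l copies (l, 0..n_l-1).
    copies = {l: [(l, i) for i in range(n)] for l, n in counts.items()}
    new_labels = [c for l in counts for c in copies[l]]
    # Labels that do not appear in P are dropped from every Label_H(v).
    new_label_h = {
        v: {c for l in allowed if l in copies for c in copies[l]}
        for v, allowed in label_h.items()
    }
    return new_labels, new_label_h
```

After this transformation the number of labels equals $k = |P|$, which is what the inclusion–exclusion step of the algorithm iterates over.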

Given an instance $(L, H, \mathrm{Label}_H)$ of the problem (note that $P = L$) and a node $s \in V_H$, denote by $\mathrm{Bij}_{L,H,\mathrm{Label}_H,s}$ the set of all bijectively labeled $(s,k)$-potential solutions, and by $\mathrm{Bij}^{\mathrm{Tree}}_{L,H,\mathrm{Label}_H,s}$ the set of all bijectively labeled $(s,k)$-potential solutions whose embeddings have $k$ nodes. When $L$, $H$, $\mathrm{Label}_H$ and $s$ are clear from context, we write only $\mathrm{Bij}$ and $\mathrm{Bij}^{\mathrm{Tree}}$. For example, in Fig. 1, $(S, \mathit{match}_1, \mathit{label}_2) \in \mathrm{Bij}_{P,H,\mathrm{Label}_H,v_3} \setminus \mathrm{Bij}^{\mathrm{Tree}}$ and $(S, \mathit{match}_2, \mathit{label}_2) \in \mathrm{Bij}^{\mathrm{Tree}}_{P,H,\mathrm{Label}_H,v_3}$.

Suppose that $(S,m,l) \in \mathrm{Bij}^{\mathrm{Tree}}_{L,H,\mathrm{Label}_H,s}$. By Observation 1, $\mathrm{EMB}(S,m)$ is a tree on $k$ nodes and $m^{-1}$ is well defined. We get that $l \circ m^{-1} : V_{\mathrm{EMB}(S,m)} \to L$ fulfills the requirement $[\forall v \in V_{\mathrm{EMB}(S,m)} : l \circ m^{-1}(v) \in \mathrm{Label}_H(v)]$ and uses each label of $L$ once. Thus we get the following observation.

Observation 2. Given $(S,m,l) \in \mathrm{Bij}^{\mathrm{Tree}}_{L,H,\mathrm{Label}_H,s}$, $(\mathrm{EMB}(S,m), l \circ m^{-1})$ is a solution to the instance $(L, H, \mathrm{Label}_H)$ of the problem.

Furthermore, we have the following simple observation.

Observation 3. Let $(S^*, \mathit{label}^*)$ be a solution to an instance $(L, H, \mathrm{Label}_H)$ of the problem that contains a node $s$. Then there is a tuple $(S, \mathit{match}, \mathit{label}) \in \mathrm{Bij}^{\mathrm{Tree}}_{L,H,\mathrm{Label}_H,s}$ s.t. $\mathrm{EMB}(S, \mathit{match}) = S^*$ and $\mathit{label}^* \equiv \mathit{label} \circ \mathit{match}^{-1}$.



2.3. The monomials associated with potential solutions

We introduce one indeterminate $x_e$ for each edge $e \in E_H$, and one indeterminate $y_{v,l}$ for each pair of node $v \in V_H$ and label $l \in L$. We rename them arbitrarily as $z_1, z_2, \ldots, z_{|E_H|+k|V_H|}$. For an indeterminate of the type $x_e$ (resp. $y_{v,l}$), $index(e)$ (resp. $index(v,l)$) denotes the index it received in this ordering.

First we associate a monomial with each potential solution.

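Definition 7. The monomial of an $(s,t)$-potential solution $(S, \mathit{match}, \mathit{label})$ is

$$m(S, \mathit{match}, \mathit{label}) = \prod_{\{v,u\} \in E_S} x_{\{\mathit{match}(v), \mathit{match}(u)\}} \prod_{v \in V_S} y_{\mathit{match}(v), \mathit{label}(v)}$$

For example, in Fig. 1, the monomials associated with $(S, \mathit{match}_i, \mathit{label}_j)$ for $i, j \in \{1,2\}$ are:

• $i = 1, j = 1$: $x_{\{v_3,v_4\}} x^2_{\{v_3,v_6\}} x_{\{v_4,v_6\}} y^2_{v_3,l_4} y^2_{v_4,l_2} y_{v_6,l_5}$.
• $i = 1, j = 2$: $x_{\{v_3,v_4\}} x^2_{\{v_3,v_6\}} x_{\{v_4,v_6\}} y_{v_3,l_1} y_{v_3,l_4} y_{v_4,l_2} y_{v_4,l_3} y_{v_6,l_5}$.
• $i = 2, j = 1$: $x_{\{v_2,v_3\}} x_{\{v_3,v_6\}} x_{\{v_4,v_6\}} x_{\{v_5,v_6\}} y_{v_2,l_2} y_{v_3,l_4} y_{v_4,l_2} y_{v_5,l_4} y_{v_6,l_5}$.
• $i = 2, j = 2$: $x_{\{v_2,v_3\}} x_{\{v_3,v_6\}} x_{\{v_4,v_6\}} x_{\{v_5,v_6\}} y_{v_2,l_2} y_{v_3,l_1} y_{v_4,l_3} y_{v_5,l_4} y_{v_6,l_5}$.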

On the one hand, we have the following observation.

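Observation 4. Let $e$ be an element in $\mathrm{Bij}^{\mathrm{Tree}}_{L,H,\mathrm{Label}_H,s}$. Then, there is no element $e' \ne e$ in $\mathrm{Bij}_{L,H,\mathrm{Label}_H,s}$ which has the same monomial as $e$.

On the other hand, for $\mathrm{Bij} \setminus \mathrm{Bij}^{\mathrm{Tree}}$ we have the following lemma.

Lemma 1. There is a fixed-point-free involution (i.e., a permutation that is its own inverse) $f : \mathrm{Bij}_{L,H,\mathrm{Label}_H,s} \setminus \mathrm{Bij}^{\mathrm{Tree}} \to \mathrm{Bij} \setminus \mathrm{Bij}^{\mathrm{Tree}}$ s.t. $m(S,m,l) = m(f(S,m,l))$ for all $(S,m,l) \in \mathrm{Bij} \setminus \mathrm{Bij}^{\mathrm{Tree}}$.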

Proof. Select arbitrarily an order $<$ on the labels in $L$. Given $a,b,c,d \in L$, $(a,b) < (c,d)$ iff $a < c$ or $(a = c \wedge b < d)$.

Let $e = (S,m,l)$ be an element of $\mathrm{Bij} \setminus \mathrm{Bij}^{\mathrm{Tree}}$ and denote:

1. $M_{S,m,l} = \{\{v,u\} : v,u \in V_S, v \ne u, m(v) = m(u)\}$.
2. $\mathrm{Swap}_{S,m,l} = \arg\min_{\{v,u\} \in M_{S,m,l}} \{(\min\{l(v),l(u)\}, \max\{l(v),l(u)\})\}$.

For example, for $(S, \mathit{match}_1, \mathit{label}_2)$ as in Fig. 1, we get that $M_{S,\mathit{match}_1,\mathit{label}_2} = \{\{u_1,u_4\}, \{u_3,u_5\}\}$ and $\mathrm{Swap}_{S,\mathit{match}_1,\mathit{label}_2} = \{u_1,u_4\}$.

By Observation 1, $M_{S,m,l} \ne \emptyset$. Since for each $\{v,u\} \in M_{S,m,l}$ we have a different $(\min\{l(v),l(u)\}, \max\{l(v),l(u)\})$, $\mathrm{Swap}_{S,m,l}$ is well defined.

Denote $\{v^*,u^*\} = \mathrm{Swap}_{S,m,l}$. We define $l_{S,m,l}$ as $l$ except that $l_{S,m,l}(v^*) = l(u^*)$ and $l_{S,m,l}(u^*) = l(v^*)$. Since $m(v^*) = m(u^*)$ and $(S,m,l) \in \mathrm{Bij} \setminus \mathrm{Bij}^{\mathrm{Tree}}$, we have that $\forall v \in V_S : l_{S,m,l}(v) \in \mathrm{Label}_H(m(v))$. Therefore $(S,m,l_{S,m,l}) \in \mathrm{Bij} \setminus \mathrm{Bij}^{\mathrm{Tree}}$.

We define $f(S,m,l) = (S,m,l_{S,m,l})$. Clearly $m(S,m,l) = m(f(S,m,l))$. Similarly we define $f$ for each element in $\mathrm{Bij} \setminus \mathrm{Bij}^{\mathrm{Tree}}$ to get a function $f : \mathrm{Bij} \setminus \mathrm{Bij}^{\mathrm{Tree}} \to \mathrm{Bij} \setminus \mathrm{Bij}^{\mathrm{Tree}}$.

Denote $l' = l_{S,m,l}$, $\{v^{**},u^{**}\} = \mathrm{Swap}_{S,m,l'}$ and $l'' = l_{S,m,l'}$. Therefore $e' = (S,m,l') = f(e)$ and $e'' = (S,m,l'') = f(f(e))$. We need to prove that $e \ne e'$ and $e = e''$.

We prove that $e \ne e'$ by showing that there is no function $g : V_S \to V_S$ s.t.

1. $\forall v \in V_S : m(v) = m(g(v)) \wedge l(v) = l'(g(v))$.
2. $\forall v,u \in V_S$ s.t. $v$ is an ancestor of $u$ in $S$: $g(v)$ is an ancestor of $g(u)$ in $S$.

By our definition of $l'$ and since $l$ is bijective, the only $g$ which fulfills the first requirement is defined as follows:

$$g(v) = \begin{cases} u^* & \text{if } v = v^* \\ v^* & \text{if } v = u^* \\ v & \text{otherwise} \end{cases}$$

If $v^*$ and $u^*$ do not have the same father, $g$ does not fulfill the second requirement. However, since $m(v^*) = m(u^*)$ and $(S,m)$ is an $(s,k)$-potential topology, we get that $v^*$ and $u^*$ cannot have the same father and thus $e \ne e'$.

In order to prove that $e = e''$, it is enough to show that $l \equiv l''$. This follows by observing that $\{v^*,u^*\} = \{v^{**},u^{**}\}$. □

Now we sum the monomials that are associated with bijectively labeled $(s,k)$-potential solutions:



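Definition 8. Given an instance $(L, H, \mathrm{Label}_H)$ of the problem and a node $s$ of $H$, their polynomial is

$$\mathrm{POL}_{L,H,\mathrm{Label}_H,s}(z_1, z_2, \ldots, z_{|E_H|+k|V_H|}) = \sum_{(S,\mathit{match},\mathit{label}) \in \mathrm{Bij}_{L,H,\mathrm{Label}_H,s}} m(S, \mathit{match}, \mathit{label}).$$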

Observations 2, 3 and 4 and Lemma 1 imply the following lemma.

Lemma 2. Let $s \in V_H$. An instance $(L, H, \mathrm{Label}_H)$ of the problem has a solution which contains $s$ iff $\mathrm{POL}_{L,H,\mathrm{Label}_H,s}$ has a monomial with an odd coefficient.

Proof. First suppose that $(L, H, \mathrm{Label}_H)$ has a solution which contains $s$. By Observation 3, there is $e \in \mathrm{Bij}^{\mathrm{Tree}}_{L,H,\mathrm{Label}_H,s}$. By Observation 4, there is no element $e' \ne e$ in $\mathrm{Bij}_{L,H,\mathrm{Label}_H,s}$ which has the same monomial as $e$. We get that $\mathrm{POL}_{L,H,\mathrm{Label}_H,s}$ has a monomial with coefficient 1.

Now suppose that $\mathrm{POL}_{L,H,\mathrm{Label}_H,s}$ has a monomial with an odd coefficient. By Lemma 1, there is $e \in \mathrm{Bij}^{\mathrm{Tree}}_{L,H,\mathrm{Label}_H,s}$. Then, by Observation 2, we get that there is a solution to $(L, H, \mathrm{Label}_H)$ which contains $s$. □

2.4. Evaluating $\mathrm{POL}_{L,H,\mathrm{Label}_H,s}$

We evaluate the polynomial over the finite field $\mathbb{F}_q$ (i.e., the finite field of order $q$), where $q = 2^{\lceil \log(3(2k-1)|V_H|) \rceil}$. Since this field has characteristic 2, we get the following corollary to Lemma 2.

Corollary 1. Let $s \in V_H$. An instance $(L, H, \mathrm{Label}_H)$ of the problem has a solution which contains $s$ iff $\mathrm{POL}_{L,H,\mathrm{Label}_H,s}$ is a nonzero polynomial.

The following lemma is proved in [13] and [32].

Lemma 3. Let $p(x_1, x_2, \ldots, x_n)$ be a nonzero polynomial of total degree at most $d$ over the finite field $F$. Then, for $a_1, a_2, \ldots, a_n \in F$ selected independently and uniformly at random: $\Pr(p(a_1, a_2, \ldots, a_n) \ne 0) \ge 1 - d/|F|$.

Note that the degree of $\mathrm{POL}_{L,H,\mathrm{Label}_H,s}$ is at most $2k - 1$. Using Corollary 1 and Lemma 3, we get the following lemma.

Lemma 4. Suppose $s \in V_H$, and $a_1, a_2, \ldots, a_{|E_H|+k|V_H|} \in \mathbb{F}_q$ are selected independently and uniformly at random. If the instance $(L, H, \mathrm{Label}_H)$ has a solution which contains $s$, then $\Pr(\mathrm{POL}_{L,H,\mathrm{Label}_H,s}(a_1, a_2, \ldots, a_{|E_H|+k|V_H|}) \ne 0) \ge 1 - \frac{1}{3|V_H|}$. Otherwise $\mathrm{POL}_{L,H,\mathrm{Label}_H,s}(a_1, a_2, \ldots, a_{|E_H|+k|V_H|}) = 0$.
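The choice of $q$ can be checked numerically. The sketch below computes the smallest power of two that is at least $3(2k-1)|V_H|$ and the resulting bound $d/q$ of Lemma 3 with $d = 2k - 1$; the helper names `field_order` and `failure_bound` are assumptions of this illustration.

```python
from fractions import Fraction


def field_order(k, n_host):
    """Smallest power of two q with q >= 3 * (2k - 1) * |V_H|."""
    bound = 3 * (2 * k - 1) * n_host
    return 1 << (bound - 1).bit_length()   # exact integer ceiling of log2


def failure_bound(k, n_host):
    """Upper bound d / q on the probability that a nonzero POL evaluates to 0."""
    return Fraction(2 * k - 1, field_order(k, n_host))
```

For $k = 3$ and $|V_H| = 4$ this gives $q = 64$ and a failure bound of $5/64$, which is indeed at most $\frac{1}{3|V_H|} = \frac{1}{12}$.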

Given $A \subseteq L$, let $L[A,s]$ denote the set of $(s,k)$-potential solutions that do not use labels from $A$. The principle of inclusion–exclusion implies the following observation, and the fact that the polynomial is evaluated over a field of characteristic 2 implies the corollary that follows it.

Observation 5.

$$\mathrm{POL}_{L,H,\mathrm{Label}_H,s}(z_1, z_2, \ldots, z_{|E_H|+k|V_H|}) = \sum_{A \subseteq L} (-1)^{|A|} \sum_{(S,\mathit{match},\mathit{label}) \in L[A,s]} m(S, \mathit{match}, \mathit{label}).$$

Corollary 2.

$$\mathrm{POL}_{L,H,\mathrm{Label}_H,s}(z_1, z_2, \ldots, z_{|E_H|+k|V_H|}) = \sum_{A \subseteq L} \sum_{(S,\mathit{match},\mathit{label}) \in L[A,s]} m(S, \mathit{match}, \mathit{label}).$$

Select $A \subseteq L$ and $a_1, a_2, \ldots, a_{|E_H|+k|V_H|} \in \mathbb{F}_q$ arbitrarily. Next we present a procedure EVAL$(L, H, \mathrm{Label}_H, A, a_1, a_2, \ldots, a_{|E_H|+k|V_H|})$ for evaluating the sum $\sum_{(S,\mathit{match},\mathit{label}) \in L[A,v]} m(S, \mathit{match}, \mathit{label})$ for all $v \in V_H$ in polynomial time, where for all $1 \le i \le |E_H| + k|V_H|$, $a_i$ is assigned to $z_i$.

For each $v \in V_H$, we order its neighbors in $H$ arbitrarily and denote them accordingly as $N_1(v), N_2(v), \ldots, N_{d(v)}(v)$, where $d(v)$ denotes their number.

The next definition will help us enforce condition 4 of Definition 3.

Definition 9. Given $v \in V_H$ and integers $t$ and $i$ s.t. $1 \le t$ and $0 \le i \le d(v)$, we say that a $(v,t)$-potential solution $(S, \mathit{match}, \mathit{label})$ is a $(v,t,i)$-potential solution if $root(S)$ does not have a son $u$ s.t. $\mathit{match}(u) = N_p(v)$ for $p \le i$.



Fig. 2. The matrix used in EVAL$(L, H, \mathrm{Label}_H, \{l_3, l_5\}, a_1, a_2, \ldots, a_{|E_H|+k|V_H|})$, where $L$, $H$ and $\mathrm{Label}_H$ are as in Fig. 1. The cells hold the results of the specified calculations.

For example, in Fig. 1, $(S, \mathit{match}_1, \mathit{label}_1)$ and $(S, \mathit{match}_1, \mathit{label}_2)$ are $(v_3, 5, 1)$-potential solutions. Note that a $(v,t)$-potential solution is a $(v,t,0)$-potential solution.

We denote by $L[A,v,t,i]$ the set of $(v,t,i)$-potential solutions that do not use labels from $A$. For example, among $\{(S, \mathit{match}_i, \mathit{label}_j) : i,j \in \{1,2\}\}$ in Fig. 1, only $(S, \mathit{match}_1, \mathit{label}_1)$ belongs to $L[\{l_1\}, v_3, 5, 1]$.

We use dynamic programming to compute the required sum. We use a matrix $M$ which has a cell $[v,t,i]$ for all $v \in V_H$, $1 \le t \le k$ and $0 \le i \le d(v)$. The cell $M[v,t,i]$ holds the sum $\sum_{(S,\mathit{match},\mathit{label}) \in L[A,v,t,i]} m(S, \mathit{match}, \mathit{label})$. We are interested in cells of the type $M[v,k,0]$. See Fig. 2 for an illustrative example.

Base cases:

• $\forall v \in V_H$, $0 \le i \le d(v)$:
  – If $\mathrm{Label}_H(v) \setminus A \ne \emptyset$: $M[v,1,i] = \sum_{l \in \mathrm{Label}_H(v) \setminus A} a_{index(v,l)}$
  – Otherwise: $M[v,1,i] = 0$
• $\forall v \in V_H$, $1 < t \le k$: $M[v,t,d(v)] = 0$

Step:

• $\forall v \in V_H$, $1 < t \le k$, $0 \le i < d(v)$:

$$M[v,t,i] = M[v,t,i+1] + a_{index(\{v,N_{i+1}(v)\})} \sum_{1 \le t' \le t-1} M[N_{i+1}(v), t', 0] \cdot M[v, t-t', i+1]$$

Order of computation:

• For $t = 1, 2, \ldots, k$:
  – For each $v \in V_H$:
    ∗ For $i = d(v), d(v)-1, \ldots, 0$: Compute $M[v,t,i]$
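The recurrence above can be sketched in a few lines of code. To keep the example self-contained, the arithmetic is fixed to $\mathbb{F}_{2^8}$ with the reduction polynomial $x^8 + x^4 + x^3 + x + 1$, which is an assumption of this illustration rather than the field size prescribed in this section, and the random values are passed in as dictionaries. The names `gf_mul` and `eval_dp` are hypothetical.

```python
def gf_mul(a, b, poly=0x11B, m=8):
    """Multiply a and b in GF(2^m); the default is GF(2^8) (an assumed field)."""
    r = 0
    while b:
        if b & 1:
            r ^= a                      # addition in characteristic 2 is XOR
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly                   # reduce modulo the field polynomial
    return r


def eval_dp(nbrs, label_h, excluded, a_edge, a_label, k):
    """Fill M[v][t][i] for 1 <= t <= k and 0 <= i <= d(v).

    nbrs:     node -> ordered list of neighbours, so N_{i+1}(v) = nbrs[v][i]
    label_h:  node -> set of allowed labels
    excluded: the label set A
    a_edge:   frozenset({u, v}) -> field element assigned to x_{u,v}
    a_label:  (v, l) -> field element assigned to y_{v,l}
    """
    M = {v: [[0] * (len(nbrs[v]) + 1) for _ in range(k + 1)] for v in nbrs}
    for v in nbrs:                                   # base cases with t = 1
        base = 0
        for l in label_h[v] - excluded:
            base ^= a_label[(v, l)]
        M[v][1] = [base] * (len(nbrs[v]) + 1)
    for t in range(2, k + 1):                        # M[v][t][d(v)] stays 0
        for v in nbrs:
            for i in range(len(nbrs[v]) - 1, -1, -1):
                u = nbrs[v][i]
                acc = 0
                for t1 in range(1, t):
                    acc ^= gf_mul(M[u][t1][0], M[v][t - t1][i + 1])
                M[v][t][i] = M[v][t][i + 1] ^ gf_mul(a_edge[frozenset((v, u))], acc)
    return M
```

On a host graph consisting of a single edge $\{a,b\}$ with $k = 2$, the cell `M["a"][2][0]` is the product of the three assigned values, as the recurrence predicts.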

Correctness:

Note that each cell depends only on cells that are calculated before it. We prove that $M[v,t,i] = \sum_{(S,m,l) \in L[A,v,t,i]} m(S,m,l)$ by using induction on the computation.

The base cases are clearly correct.

Given $v \in V_H$, $1 < t \le k$ and $0 \le i < d(v)$, we assume that $M[v,t,i+1]$, $M[N_{i+1}(v),t',0]$ and $M[v,t-t',i+1]$ for all $1 \le t' \le t-1$ are correct, and prove the correctness of $M[v,t,i]$.

$$\begin{aligned}
M[v,t,i] &= M[v,t,i+1] + a_{index(\{v,N_{i+1}(v)\})} \sum_{1 \le t' \le t-1} M[N_{i+1}(v),t',0] \cdot M[v,t-t',i+1] \\
&\overset{*1}{=} \sum_{(S,m,l) \in L[A,v,t,i+1]} m(S,m,l) + a_{index(\{v,N_{i+1}(v)\})} \cdot \sum_{1 \le t' \le t-1} \Bigg( \sum_{(S,m,l) \in L[A,N_{i+1}(v),t',0]} m(S,m,l) \Bigg) \cdot \Bigg( \sum_{(S,m,l) \in L[A,v,t-t',i+1]} m(S,m,l) \Bigg) \\
&\overset{*2}{=} \sum_{(S,m,l) \in L[A,v,t,i+1]} m(S,m,l) + \sum_{\substack{(S,m,l) \in L[A,v,t,i] \text{ s.t. } root(S) \text{ has} \\ \text{a son } u \text{ for which } m(u) = N_{i+1}(v)}} m(S,m,l) \\
&= \sum_{(S,m,l) \in L[A,v,t,i]} m(S,m,l)
\end{aligned}$$

*1 – by the induction hypothesis.
*2 – by the definitions of $L[A,v,t,i]$, $(v,t,i)$-potential solutions and their monomials.

Time and space complexities:

The procedure runs in time $O\left(\sum_{v \in V_H} \sum_{1 \le t \le k} \sum_{0 \le i \le d(v)} t\right) = O(|E_H| k^2)$. We have $O(|E_H| k)$ cells and each requires $O(1)$ space. Therefore the space complexity is $O(|E_H| k)$.

2.5. The algorithm

First we present an algorithm for the decision version of the problem. We use the matrix $M$ of the procedure EVAL (see Section 2.4), and an array SUM which has a cell for every node of $H$.

DECIDE$(L, H, \mathrm{Label}_H)$:

1. Select $a_1, a_2, \ldots, a_{|E_H|+k|V_H|} \in \mathbb{F}_q$ independently and uniformly at random.
2. Initialize each of the cells of SUM to 0.
3. For each $A \subseteq L$:
   (a) Run the procedure EVAL$(L, H, \mathrm{Label}_H, A, a_1, a_2, \ldots, a_{|E_H|+k|V_H|})$.
   (b) For each $s \in V_H$: SUM$[s]$ ⇐ SUM$[s]$ + $M[s,k,0]$.
4. Accept iff at least one of the cells of SUM does not hold 0.
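Steps 2 to 4 of DECIDE amount to a sum over all $2^k$ label subsets in characteristic 2. The sketch below isolates that loop and assumes a callable `eval_fn` (hypothetical) that, for a fixed random assignment chosen by the caller in Step 1, returns the value $M[s,k,0]$ for every node $s$ given the excluded set $A$. Field elements are assumed to be encoded as integers whose addition is XOR.

```python
from itertools import chain, combinations


def all_subsets(labels):
    """Yield every subset A of the label set as a tuple."""
    labels = list(labels)
    return chain.from_iterable(
        combinations(labels, r) for r in range(len(labels) + 1))


def decide(labels, nodes, eval_fn):
    """Accept iff some SUM[s] is nonzero after summing over all A."""
    total = {s: 0 for s in nodes}                 # step 2
    for subset in all_subsets(labels):            # step 3
        cell = eval_fn(frozenset(subset))         # step 3(a): s -> M[s, k, 0]
        for s in nodes:
            total[s] ^= cell[s]                   # step 3(b), addition is XOR
    return any(total.values())                    # step 4
```

Only `total` is kept between iterations, which is why the space requirement stays polynomial while the running time carries the $2^k$ factor.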

Correctness:

Lemma 4, Corollary 2 and the correctness of EVAL imply the following lemma.

Lemma 5. Let $(L, H, \mathrm{Label}_H)$ be an instance of the problem. If it has a solution, then the algorithm accepts with probability at least $1 - \frac{1}{3|V_H|}$. Otherwise the algorithm rejects.

Time and space complexities:

By the pseudocode of the algorithm and the time and space complexities of EVAL, the running time of the algorithm is $O(2^k |E_H| k^2)$ and its space complexity is $O(|E_H| k)$.

Using the decision algorithm, we now solve the search version of the problem.

SEARCH$(L, H, \mathrm{Label}_H)$:

1. Initialize $H'$ as $H$.
2. For each node $v \in V_H$:
   (a) Initialize $H''$ as $H'$ without $v$ and the edges that are incident to it.
   (b) If DECIDE$(L, H'', \mathrm{Label}_H)$ accepts, assign $H'$ ⇐ $H''$.
3. If $|V_{H'}| \ne k$, return NIL.
4. Calculate a spanning tree $S$ of $H'$. If such $S$ does not exist (i.e., $H'$ is not connected), return NIL.
5. Construct a bipartite graph $G = (V_S, L, E)$ where $E = \{\{v,l\} : l \in \mathrm{Label}_H(v)\}$. Calculate a maximum matching $M$ in $G$. If its size is not $k$, return NIL. Else set $\mathit{label} : V_S \to L$ according to $M$ (i.e., $\mathit{label}(v)$ is the label $l$ with which $v$ is matched by $M$).
6. Return $(S, \mathit{label})$.
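Steps 1, 2 and 5 of SEARCH can be illustrated as follows. The decision procedure is passed in as a callable `decide_fn` (hypothetical) that receives the adjacency lists of the remaining graph, and Step 5 uses a simple augmenting-path matching instead of the faster matching algorithm of [29]; both choices are assumptions of this sketch.

```python
def prune(nodes, nbrs, decide_fn):
    """Steps 1-2: drop each node whose removal keeps the instance feasible."""
    kept = set(nodes)
    for v in nodes:
        trial = kept - {v}
        sub = {u: [w for w in nbrs[u] if w in trial] for u in trial}
        if decide_fn(sub):
            kept = trial
    return kept


def assign_labels(nodes, labels, label_h):
    """Step 5: match every label to a distinct node, or return None (NIL)."""
    owner = {}                                    # label -> node holding it

    def augment(v, seen):
        for l in label_h[v]:
            if l in labels and l not in seen:
                seen.add(l)
                if l not in owner or augment(owner[l], seen):
                    owner[l] = v
                    return True
        return False

    matched = sum(augment(v, set()) for v in nodes)
    if matched != len(labels):
        return None
    return {v: l for l, v in owner.items()}
```

With a randomized `decide_fn`, each of the $|V_H|$ calls may err, which is what the probability analysis that follows accounts for.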

Correctness:

Let $(L, H, \mathrm{Label}_H)$ be an instance of the problem. By the pseudocode, if the instance does not have a feasible solution then the algorithm returns NIL, and if the algorithm returns a solution then it is a feasible solution.

Now suppose that the instance has a feasible solution. By the pseudocode and Lemma 5, the probability that the algorithm returns a solution is at least the probability that the decision algorithm answers correctly $|V_H|$ times, which is at least $\left(1 - \frac{1}{3|V_H|}\right)^{|V_H|} \ge 2/3$.
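The last inequality can be justified by Bernoulli's inequality, $(1 + x)^n \ge 1 + nx$ for every real $x \ge -1$ and integer $n \ge 1$, applied with $x = -\frac{1}{3|V_H|}$ and $n = |V_H|$:

```latex
\[
\left(1 - \frac{1}{3|V_H|}\right)^{|V_H|}
\;\ge\; 1 - |V_H| \cdot \frac{1}{3|V_H|}
\;=\; \frac{2}{3}.
\]
```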

Thus we have the following lemma.



Lemma 6. Let $(L, H, \mathrm{Label}_H)$ be an instance of the problem. If it has a solution, then the algorithm returns a feasible solution with probability at least 2/3. Otherwise the algorithm returns NIL.

Time and space complexities:

We find a spanning tree in Step 4 in time $O(k^2)$ by using depth-first search and a maximum matching in Step 5 in time $O(k^{2.376})$ [29]. Thus, by the pseudocode and the time and space complexities of the decision algorithm, we have that the running time of the algorithm is $O(2^k |E_H||V_H| k^2)$ and its space complexity is $O(|E_H| k)$.

We have proved the following theorem.

Theorem 1. The general case of the topology-free network query problem can be solved in time $O^*(2^k)$ and polynomial space complexity by a randomized algorithm with a constant, one-sided error.

2.6. The weighted variant

Guillemot et al. [20] show how to modify their algorithm for the unweighted case to the weighted case while increasing its running time by $O^*(W^2)$ and its space complexity by $O^*(W)$. Their idea can also be used for our algorithm. We show a different idea which increases the running time of our algorithm by $O^*(W)$ and its space complexity by $O^*(W)$.

We assume WLOG that the weight of each edge in $H$ is at most $W$. We add an indeterminate $w$ to which we do not assign values, and perform only one change in the decision algorithm for the unweighted case, which is in the step of the procedure EVAL. Each cell $[v,t,i]$ of $M$ holds a polynomial in $w$ with coefficients from $\mathbb{F}_q$ whose degree is at most $W$, and we use the degrees of $w$ to track the weights of its corresponding $(v,t,i)$-potential solutions.

Step:

• $\forall v \in V_H$, $1 < t \le k$, $0 \le i < d(v)$:

$$M[v,t,i] = M[v,t,i+1] + w^{weight(\{v,N_{i+1}(v)\})} a_{index(\{v,N_{i+1}(v)\})} \sum_{1 \le t' \le t-1} M[N_{i+1}(v),t',0] \cdot M[v,t-t',i+1]$$

where we omit monomials of the form $c \cdot w^d$ where $c \in \mathbb{F}_q$ and $d > W$.
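The truncated arithmetic on polynomials in $w$ might look as follows. A polynomial is stored as a list of coefficients indexed by degree, the coefficient multiplication is passed in as `mul`, and coefficient addition is XOR (characteristic 2). This sketch uses schoolbook multiplication in $O(W^2)$ time rather than the $O(W \log W)$ method referred to in this section, and the function names are hypothetical.

```python
def poly_mul_trunc(p, q, W, mul):
    """Multiply two polynomials in w and drop every term of degree above W."""
    out = [0] * (W + 1)
    for i, a in enumerate(p):
        if a == 0:
            continue
        for j, b in enumerate(q):
            if i + j > W:
                break                      # higher-degree monomials are omitted
            out[i + j] ^= mul(a, b)
    return out


def times_w_power(p, weight, W):
    """Multiply p by w**weight, again dropping terms of degree above W."""
    return ([0] * weight + list(p))[: W + 1]
```

With coefficients in $\mathbb{F}_2$ (where `mul` is bitwise AND), $(1 + w)^2 = 1 + w^2$, so truncating at $W = 1$ leaves just the constant 1.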

In the search algorithm we change Step 4 as follows.

4. Calculate a minimum-weight spanning tree $S$ of $H'$. If such $S$ does not exist or the sum of the weights of its edges is more than $W$, return NIL.

The proofs of correctness of the decision and search algorithms are similar to those of the original algorithms. Furthermore, their space complexity is clearly $O(W |E_H| k)$.

We compute each minimum-weight spanning tree in time $O(k^2)$ [11]. Each cell in $M$ requires $O(k)$ multiplications of polynomials with coefficients from $\mathbb{F}_q$ whose degrees are at most $W$ (then we omit the unnecessary monomials in $O(W)$ time). Each such multiplication can be done in time $O(W \log W)$ [21]. Thus the running time of the decision algorithm is $O(2^k |E_H| k^2 W \log W)$ and that of the search algorithm is $O(2^k |E_H||V_H| k^2 W \log W)$.

We have proved the following lemma.

Lemma 7. The weighted general case of the topology-free network query problem can be solved in time $O^*(2^k W)$ and space complexity $O^*(W)$ by a randomized algorithm with a constant, one-sided error.

2.7. Allowing deletions and insertions

As we have noted in Section 1.2, in biological networks certain nodes have been deleted or inserted during evolution. Therefore we consider the variant of the problem in which we also have:

• $D$ – A nonnegative integer ($D$ stands for deletions).
• $I$ – A nonnegative integer ($I$ stands for insertions).

Our goal is to find a triple $(S, U, \mathit{label})$ where $S$ is a subtree of $H$ s.t. $|V_S| \le |U| + I$, $U \subseteq V_S$ s.t. $|U| \ge k - D$, $\mathit{label} : U \to L$ and the following requirements hold.

1. $\forall v \in U : \mathit{label}(v) \in \mathrm{Label}_H(v)$.
2. $\forall l \in L$: The number of occurrences of $l$ in $P$ is at least $|\{v \in U : \mathit{label}(v) = l\}|$.

We assume WLOG that $D \le k$ and $I \le |V_H|$. We briefly explain how to modify the algorithm presented in Section 2.5 to work for this variant.

We use the following definition for potential solutions.

Definition 10. Given a $(v,t)$-potential topology $(S,m)$, $U \subseteq V_S$ and $l : U \to L$, $(S,m,U,l)$ is a $(v,t,i,j)$-potential solution if

1. $\forall u \in U : l(u) \in \mathrm{Label}_H(m(u))$.
2. $root(S)$ does not have a son $u$ s.t. $m(u) = N_p(v)$ for $p \le i$.
3. $|V_S| = t + j$ and $|U| = t$.

$L[A,v,t,i,j]$ denotes the set of $(v,t,i,j)$-potential solutions that do not use labels from $A$.

We modify the dynamic programming calculation to track the number of insertions. We use a matrix $M$ which has a cell $[v,t,i,j]$ for each $v \in V_H$, $0 \le t \le k - D$, $0 \le i \le d(v)$ and $0 \le j \le I$. The cell $M[v,t,i,j]$ holds the sum $\sum_{(S,\mathit{match},\mathit{label}) \in L[A,v,t,i,j]} m(S, \mathit{match}, \mathit{label})$.

Base cases:

• $\forall v \in V_H$, $0 \le i \le d(v)$:
  – If $\mathrm{Label}_H(v) \setminus A \ne \emptyset$: $M[v,1,i,0] = \sum_{l \in \mathrm{Label}_H(v) \setminus A} a_{index(v,l)}$
  – Otherwise: $M[v,1,i,0] = 0$
• $\forall v \in V_H$, $0 \le i \le d(v)$: $M[v,0,i,1] = 1$
• $\forall v \in V_H$, $0 \le t \le k - D$, $0 \le j \le I$ s.t. $1 < t + j$: $M[v,t,d(v),j] = 0$

Step:

• $\forall v \in V_H$, $0 \le t \le k - D$, $0 \le i < d(v)$, $0 \le j \le I$ s.t. $1 < t + j$:

$$M[v,t,i,j] = M[v,t,i+1,j] + a_{index(\{v,N_{i+1}(v)\})} \cdot \sum_{\substack{0 \le t' \le t,\; 0 \le j' \le j \\ \text{s.t. } 1 \le t'+j' \le t+j-1}} M[N_{i+1}(v),t',0,j'] \cdot M[v,t-t',i+1,j-j']$$

In the decision algorithm we change Step 3(b) as follows.

3. (b) For each $s \in V_H$: SUM$[s]$ ⇐ SUM$[s]$ + $\sum_{0 \le j \le I} M[s, k-D, 0, j]$.

In the search algorithm we change Steps 3, 5 and 6 as follows.

3. If $|V_{H'}| > k - D + I$, return NIL.

5. Construct a bipartite graph $G = (V_S, L, E)$ where $E = \{\{v,l\} : l \in \mathrm{Label}_H(v)\}$. Calculate a maximum matching $M$ in $G$. If $|M| < k - D$, return NIL. Else set $U$ as the set of matched nodes, and set $\mathit{label} : U \to L$ according to $M$.

6. Return $(S, U, \mathit{label})$.

The proofs of correctness of the decision and search algorithms are similar to those of the original algorithms. The running time is increased by $O(I^2)$ and the space complexity by $O(I)$. We have the following lemma.

Lemma 8. The general case of the topology-free network query problem with deletions and insertions can be solved in time $O^*(2^k)$ and polynomial space complexity by a randomized algorithm with a constant, one-sided error.

2.8. Reduction to the min connected components problem

In the general case of the min connected components problem we have:

1. $L$ – A set of labels.
2. $P$ – A multiset of labels from $L$.
3. $H$ – An undirected graph.
4. $\mathrm{Label}_H : V_H \to 2^L$.
5. $c$ – A positive integer.

Our goal is to find a pair $(S, \mathit{label})$ where $S$ is a subforest of $H$ of at most $c$ trees, $\mathit{label} : V_S \to L$ and the following requirements hold.

1. $\forall v \in V_S : \mathit{label}(v) \in \mathrm{Label}_H(v)$.
2. $\forall l \in L$: The number of occurrences of $l$ in $P$ is $|\{v \in V_S : \mathit{label}(v) = l\}|$.

In the limited case of this problem we add the following restriction on the input: $\forall v \in V_H : |\mathrm{Label}_H(v)| = 1$.

The min connected components problem was introduced by Dondi et al. [14]. There is a simple reduction from the min connected components problem to the weighted topology-free network query problem with 0/1 weights by Guillemot et al. [20]: Given $L$, $P$, $H$, $\mathrm{Label}_H$ and $c$, we construct a complete graph $H'$ with the same nodes. We assign a weight 0 to the edges of $H'$ which appear also in $H$ and 1 to the other edges. We set $W = c - 1$. The input for the weighted topology-free network query problem is $L$, $P$, $H'$, $\mathrm{Label}_H$ and $W$.
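The construction of $H'$ and $W$ in this reduction is short enough to write down directly. The function name `reduce_min_cc` and the edge encoding as `frozenset` pairs are assumptions of this sketch; the labels and the pattern are passed through unchanged and are therefore omitted.

```python
from itertools import combinations


def reduce_min_cc(nodes, edges, c):
    """Build the 0/1 edge weights of the complete graph H' and the bound W.

    nodes: the nodes of H
    edges: iterable of pairs (u, v) that are edges of H
    c:     the allowed number of trees in the subforest
    """
    present = {frozenset(e) for e in edges}
    weights = {
        frozenset(pair): 0 if frozenset(pair) in present else 1
        for pair in combinations(nodes, 2)
    }
    return weights, c - 1
```

A spanning tree of weight at most $W = c - 1$ in $H'$ uses at most $c - 1$ edges that are missing from $H$, so removing them leaves a subforest of $H$ with at most $c$ trees.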

The algorithms with the previous best running times for the min connected components problem use this reduction and achieve time $O^*(4^k)$ for the general case [20] and time $O^*(2.54^k)$ for the limited case [24]. Using the same reduction, we have the following lemma.

Lemma 9. The general case of the min connected components problem can be solved in time $O^*(2^k)$ and polynomial space complexity by a randomized algorithm with a constant, one-sided error.

3. Algorithms for alignment network queries

First we present our definitions and the intuition behind them. In particular, we present the SubDAG alignment problem, which is our definition for the alignment network query problem. We then present an algorithm for multisource trees. Afterwards we modify it to work for a certain family of DAGs. In the following two sections we provide the proofs of correctness of the algorithms and analyze their running times. Our algorithms are based on dynamic programming, finding disjoint paths in graphs and maximum weight matching computations. Finally, we briefly discuss an application of our algorithms to the alignment of metabolic pathways.

3.1. Definitions

We start by giving a definition regarding disjoint paths in DAGs, which is followed by an explanation of the intuition behind its necessity. We note that dsr stands for distribution, and frb stands for forbidden.

Definition 11. Given a DAG $G$ and a node $v \in V_G$, define:

1. $dsr(v) = \{c \in V_G : \text{there are two internally node-disjoint paths from } v \text{ to } c\}$.
2. $frb(v) = \{c \in V_G : \exists u \in V_G \text{ s.t. there are two internally node-disjoint paths from } u \text{ to } c \text{ and } v \text{ is on exactly one of them}\}$.
3. $dsr(G) = \max_{v \in V_G} |dsr(v)|$, $frb(G) = \max_{v \in V_G} |frb(v)|$.
4. $G$ is a treelike DAG if $dsr(G)$ and $frb(G)$ are bounded.
5. $v$ is a crossroad if $\exists u \in V_G$ s.t. $v \in dsr(u)$.

For example, in Fig. 3, the nonempty dsr and frb sets are $frb(p_2) = dsr(p_3) = \{p_1\}$, $frb(h_2) = dsr(h_3) = \{h_1\}$, $frb(p_6) = frb(p_7) = dsr(p_8) = \{p_4, p_5\}$ and $frb(h_6) = frb(h_7) = dsr(h_8) = \{h_4, h_5\}$. We have that $frb(P) = dsr(P) = frb(H) = dsr(H) = 2$, and the crossroads are $p_1, p_4, p_5, h_1, h_4$ and $h_5$.
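For small DAGs, the sets of Definition 11 can be computed by brute force, which may help when checking examples by hand. The sketch below enumerates all paths between two nodes and looks for a pair with disjoint interiors; it takes exponential time in general and is only an illustration, not the disjoint-path machinery used by the algorithms of this section. The DAG is assumed to be given as a successor dictionary, and all function names are hypothetical.

```python
from itertools import combinations


def all_paths(succ, src, dst, path=None):
    """Yield every directed path from src to dst as a list of nodes."""
    path = [src] if path is None else path
    if src == dst:
        yield path
        return
    for nxt in succ.get(src, ()):
        yield from all_paths(succ, nxt, dst, path + [nxt])


def disjoint_pairs(succ, u, c):
    """Yield pairs of distinct u-to-c paths that share no internal node."""
    for p, q in combinations(list(all_paths(succ, u, c)), 2):
        if not set(p[1:-1]) & set(q[1:-1]):
            yield p, q


def dsr(succ, nodes, v):
    return {c for c in nodes
            if c != v and any(True for _ in disjoint_pairs(succ, v, c))}


def frb(succ, nodes, v):
    out = set()
    for u in nodes:
        for c in nodes:
            if c == u:
                continue
            for p, q in disjoint_pairs(succ, u, c):
                if (v in p) != (v in q):      # v lies on exactly one path
                    out.add(c)
    return out
```

On a diamond-shaped DAG with edges $a \to b$, $a \to c$, $b \to d$ and $c \to d$, this gives $dsr(a) = \{d\}$ and $frb(b) = frb(c) = \{d\}$, so $d$ is the only crossroad.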

When matching two nodes, we decide how to map the subgraphs of their outgoing neighbors, where a subgraph of a node is the nodes that are reachable from it and the edges between them. Consider the graphs given in Fig. 3. When matching $p_8$ with $h_8$, we map the subgraphs of $p_6$ and $p_7$ to the subgraphs of $h_6$ and $h_7$. We cannot match a node $p^*$ that belongs to both the subgraphs of $p_6$ and $p_7$ with one node when mapping the subgraph of $p_6$ and with another when mapping the subgraph of $p_7$, and then construct one mapping from these two mappings (since we need the mapping to match each node in the subgraph of $p_8$ with one node of $H$, and here we match $p^*$ with two different nodes of $H$). The subgraphs of $p_6$ and $p_7$ must agree on how the crossroads in $dsr(p_8)$ are matched with the crossroads in $dsr(h_8)$ (e.g., $p_4 \in dsr(p_8)$ cannot be matched with $h_4 \in dsr(h_8)$ in the subgraph of $p_6$ and with $h_5 \in dsr(h_8)$ in the subgraph of $p_7$). Thus, when matching $p_8$ with $h_8$, we compute how to map the subgraphs of $p_6$ and $p_7$ with the subgraphs of $h_6$ and $h_7$ subject to a distribution of $dsr(p_8)$ to $dsr(h_8)$ (i.e., a function from $dsr(p_8)$ to $dsr(h_8)$, indicating how to match the nodes in $dsr(p_8)$). Moreover, when matching $p_6$ we are forbidden from deciding how to match the crossroads in $frb(p_6)$ (we save the scores of all such options). Similarly, when matching $p_7$, $h_6$ and $h_7$, we are forbidden from deciding how to match the crossroads in their frb sets. Only afterwards, when iterating over each distribution of $dsr(p_8)$ to $dsr(h_8)$, we make this decision.

We use the next definition to limit the subgraphs which are potential solutions (e.g., if a node $u$ is in a potential solution, we assume that each node $v$ that is supposed to distribute it (i.e., $u \in dsr(v)$) and each of the nodes on the paths from $v$ to $u$ (i.e., nodes that have $u$ in their frb sets) are also in the potential solution). This allows us to design an efficient algorithm for treelike DAGs. We can still delete nodes from such a subgraph, but only if they have one ingoing neighbor and one outgoing neighbor in the subgraph. Consider Fig. 3 as an example. If $p_1$ is in a potential solution then so are $p_2$ and $p_3$, and $p_2$ can be deleted.



Fig. 3. An illustration of parts of the matrices that are used in Algorithm 2. We consider the following parameters: (1) the labels of p2 and h2 equal l′ and the labels of all the other nodes equal l, (2) Δ[l, l] = 10, and Δ[l, l′] = Δ[l′, l′] = −10, (3) δ(l) = δ(l′) = dH = dP = −1.

Definition 12. Given a DAG G, a subgraph S_G of G is limited if it has one source and {v ∈ V_G : ∃u ∈ V_{S_G} s.t. u ∈ frb(v) ∪ dsr(v) or v ∈ frb(u)} ⊆ V_{S_G}.

Given a DAG G and a set D ⊆ V_G, n_{G,D} denotes the number of connected components of the underlying undirected graph of the subgraph of G induced by D, and dG denotes the non-positive extra cost we pay for each such component deletion.

Now we can define our SDA (i.e., SubDAG Alignment) score and problem similarly to the ALSH score and problem (see Definitions 1 and 2).

Definition 13. Let P and H be DAGs, and S_P and S_H be subgraphs of P and H s.t. there is a homeomorphism preserving mapping M from S_P to S_H. Denote D_P = {v ∈ V_P : v ∉ V_{S_P} or M deletes v}, and D_H = {v ∈ V_{S_H} : M deletes v}. The SDA score of such M is

$$\mathrm{SDA}(M) = d_P\, n_{P,D_P} + d_H\, n_{S_H,D_H} + \sum_{v \in D_P \cup D_H} \delta(l_v) + \sum_{v \in V_P \setminus D_P} \Delta[l_{M(v)}, l_v]$$

Definition 14. Given Δ, δ, dP, dH, and two multisource trees (resp. treelike DAGs) P and H, the SDA problem is to find a homeomorphism preserving mapping M from a subgraph (resp. a limited and induced subgraph) S_P of P whose underlying undirected graph is connected to a subgraph S_H of H which maximizes the SDA score.

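Consider Fig. 3 as an example. We get the best SDA score by using a mapping M that matches pi with hi for all 4 ≤ i ≤ 8. In this case D_P = {p1, p2, p3}, D_H = ∅, n_{P,D_P} = 1, n_{H,D_H} = 0 and SDA(M) = 46 (the five matched pairs contribute 5 · 10 = 50, the three deleted nodes contribute 3 · (−1) = −3, and the single deleted component contributes dP = −1).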
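The arithmetic of this example can be checked with a small sketch of the SDA score of Definition 13. The function names and data layout below are illustrative assumptions, not part of the paper. The three edges among p1, p2 and p3 are inferred from the surrounding discussion of Fig. 3 and are likewise an assumption; only edges among deleted nodes affect the component count.

```python
def count_components(nodes, edges):
    """Count connected components of the undirected graph induced by `nodes`."""
    nodes = set(nodes)
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        if u in nodes and v in nodes:  # keep only edges inside the induced subgraph
            parent[find(u)] = find(v)
    return len({find(v) for v in nodes})


def sda_score(matching, deleted_p, deleted_h, edges_p, edges_sh,
              label, Delta, delta, d_p, d_h):
    """SDA score of Definition 13; `matching` maps pattern nodes to host nodes."""
    score = d_p * count_components(deleted_p, edges_p)
    score += d_h * count_components(deleted_h, edges_sh)
    score += sum(delta[label[v]] for v in set(deleted_p) | set(deleted_h))
    score += sum(Delta[(label[h], label[p])] for p, h in matching.items())
    return score


# Parameters as stated in the caption of Fig. 3.
label = {f"{g}{i}": "l" for g in "ph" for i in range(1, 9)}
label["p2"] = label["h2"] = "l'"
Delta = {("l", "l"): 10, ("l", "l'"): -10, ("l'", "l'"): -10}
delta = {"l": -1, "l'": -1}

# ASSUMPTION: edges among the deleted pattern nodes, inferred from the text.
edges_p = [("p3", "p1"), ("p3", "p2"), ("p2", "p1")]
matching = {f"p{i}": f"h{i}" for i in range(4, 9)}

score = sda_score(matching, {"p1", "p2", "p3"}, set(), edges_p, [],
                  label, Delta, delta, d_p=-1, d_h=-1)
print(score)  # 46
```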

Note that this problem can be seen as a variant of both the subgraph isomorphism problem and the subgraph homeomorphism problem.

Definition 13 allows deletions from both P and H and considers a scoring scheme in which we pay an extra cost for each component deletion (in case we do not want to pay extra costs, we can set dP = dH = 0). This encourages alignments to have few large deleted components rather than many small ones. Our motivation for considering this scoring scheme for networks is that it is frequently used when aligning sequences [40].

We note that considering only treelike DAGs still allows us to query many biological networks. For example, most metabolic pathways are treelike DAGs (see Section 3.6).

Assume G is a DAG, v ∈ V_G, and U ⊆ V_G. The following notations are used for the sake of clarity of presentation.

1. G_und denotes the underlying undirected graph of G.
2. N(v), Ni(v) and No(v) denote the neighbors, ingoing neighbors and outgoing neighbors of v, respectively.
3. Given U ⊆ V_G, U^{+v} denotes U ∪ {v} if v is a crossroad, and U otherwise.
4. max{} = −∞.

Our algorithms are based on dynamic programming in which at each stage we match a subgraph of P with a subgraph of H. The next two definitions are of these subgraphs and the mappings we use to match between them. Part 1 of Definition 15 and parts 1 and 2 of Definition 16 concern our first algorithm, and part 2 of Definition 15 and part 3 of Definition 16 concern our second algorithm.

Fig. 4. The matrices of Algorithm 1. Each of the cells of Mdm and Mdd stores −∞. We consider the following parameters: (1) l_{rP} = l_{rH} = l_{p1} = l_{h1} = l, and l_{p2} = l_{h2} = l′, (2) Δ[l, l] = 10, and Δ[l, l′] = Δ[l′, l′] = −10, (3) δ(l) = δ(l′) = dH = dP = −1.

Definition 15. Let G be a DAG, v ∈ V_G, n ∈ N(v) and S_G be a subgraph of G.

1. sub(v, n) is the subgraph induced by the nodes in G_und \ {n} which are reachable from v.
2. sub_{S_G}(v) is the subgraph induced by the nodes in V_{S_G} which are reachable from v.

Definition 16. Let p, p′ ∈ V_P, h, h′ ∈ V_H ∪ {nil} and M be a mapping from a subgraph S_P of P to a subgraph S_H of H.

1. M is (h, p)-rooted if H and P are multisource trees, h ∈ V_{S_H}, p ∈ V_{S_P} and M is homeomorphism preserving between (S_P)_und rooted at p and (S_H)_und rooted at h (i.e., M preserves the ancestor relations of the nodes).

2. M is (h, h′, p, p′)-relaxed if h ∈ V_{S_H}, p ∈ V_{S_P}, and M is homeomorphism preserving with the exception of h, which can be deleted if N(h) ∩ V_{S_H} = {v} and {(h′, h), (h, v)} ⊆ E_H ∨ {(v, h), (h, h′)} ⊆ E_H, and of p, which can be deleted if N(p) ∩ V_{S_P} = {u} and {(p′, p), (p, u)} ⊆ E_P ∨ {(u, p), (p, p′)} ⊆ E_P.

3. M is (h, p)-relaxed if h ∈ V_{S_H}, p ∈ V_{S_P}, and M is homeomorphism preserving with the exception of h, which can be deleted if it has exactly one neighbor in S_H, and of p, which can be deleted if |No(p) ∩ V_{S_P}| = 1.

Again consider Fig. 3 as an example. sub_P(p2) is the subgraph induced by {p1, p2} and sub(h2, h3) is the subgraph induced by {h1, h2}. Suppose M is a mapping between these subgraphs which matches p1 with h1 and deletes p2 and h2. Then M is (h2, p2)-relaxed and (h2, h3, p2, p3)-relaxed.

3.2. Algorithm 1 for multisource trees

Algorithm 1 is suitable for multisource trees. It is based on dynamic programming and uses 5 matrices: M, Mmm, Mmd, Mdm and Mdd. M has a cell [h, p] ∀h ∈ V_H, p ∈ V_P, and each of the other matrices has a cell [h, p, n] ∀h ∈ V_H, p ∈ V_P, and n ∈ N(p). See Fig. 4 for an illustrative example.

Given a function f : V_H → V_H, we consider the following requirements.

Requirement 1. ∀h ∈ V_H, p ∈ V_P, n ∈ N(p), Mα[h, p, n] holds the best SDA score for the host sub(h, f(h)) and pattern sub(p, n) when we consider mappings that are (h, p)-rooted, (h, f(h), p, n)-relaxed, and

1. If α = mm, h and p are matched.
2. If α = md, h is matched and p is deleted.
3. If α = dm, h is deleted and p is matched.
4. If α = dd, h and p are deleted and there is a pair of matched nodes.

Requirement 2. ∀h ∈ V_H, max_{p ∈ V_P} M[h, p] is the best SDA score for the host sub(h, f(h)) and pattern P s.t. h is in the matched subgraph of the host and there is a pair of matched nodes.

Next we describe the basic idea of the algorithm. Then we give its pseudocode, where each part is followed by explanations.

We traverse the multisource trees using a postorder and a preorder on their underlying undirected trees, which we root arbitrarily. Given h ∈ V_H and p ∈ V_P, descendants of h need not be matched with descendants of p, where descendants are defined according to the rooted trees which we have chosen. Therefore we need the third argument n in the cells of Mα for α ∈ {mm, md, dm, dd}. When calculating such a cell, n is considered to be the father of p (rather than the father of p in the rooted tree which we have chosen for P).

Given h, p and n, we calculate the best score of deleting h and continuing the match with one of its descendants (where a descendant is defined according to the rooted tree which we have chosen for H), the best score of deleting p and continuing the match with one of its descendants (where a descendant is defined as each of its neighbors excluding n), and the best score of matching h with p and continuing the match with their descendants. When h and p are matched, we use a maximum weight matching calculation in order to find the best option to match between their descendants.

Algorithm 1.

1. ∀p ∈ V_P, n ∈ N(p), calculate δ(p, n) = Σ_{p′ ∈ V_{sub(p,n)}} δ(lp′).
2. ∀G ∈ {H, P}, choose rG ∈ V_G. Denote by G_und^{rG} the tree G_und rooted at rG, and denote by f(v) the father of v in G_und^{rG} (where f(rG) = nil).
3. ∀h ∈ V_H in postorder on H_und^{rH}:
   (a) ∀p ∈ V_P in postorder on P_und^{rP}, PostTraverseCalc(h, p).
   (b) ∀p ∈ V_P in preorder on P_und^{rP}, PreTraverseCalc(h, p).
4. Return max{Σ_{p ∈ V_P} δ(lp) + dP, max_{h ∈ V_H, p ∈ V_P} M[h, p]}.

In Step 1 we calculate the cost of deleting sub(p, n) for each p ∈ V_P and n ∈ N(p). In Step 3 we traverse H and P in an order that guarantees that in a calculation of a cell, the cells on which it depends have already been calculated. In Step 4 we return the maximum between the score of deleting all of P and the best SDA score for H and P s.t. there is a pair of matched nodes.
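Step 1 can be sketched as a memoized recursion over the underlying undirected tree, using the identity δ(p, n) = δ(l_p) + Σ δ(c, p) over the neighbors c of p other than n. This is a minimal sketch under the assumption that the pattern is given as an undirected adjacency map and that per-node deletion costs are already looked up from the labels; the function name and the toy tree are hypothetical.

```python
from functools import lru_cache


def deletion_costs(adj, node_cost):
    """Return {(p, n): cost of deleting sub(p, n)} for every adjacent pair.

    adj       -- undirected adjacency lists of a tree (ASSUMED input format)
    node_cost -- node_cost[p] plays the role of delta(l_p)
    """
    @lru_cache(maxsize=None)
    def cost(p, n):
        # sub(p, n) is p together with sub(c, p) for each neighbor c != n.
        return node_cost[p] + sum(cost(c, p) for c in adj[p] if c != n)

    return {(p, n): cost(p, n) for p in adj for n in adj[p]}


# Hypothetical star-shaped tree with center b; every deletion costs -1.
adj = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b"]}
costs = deletion_costs(adj, {v: -1 for v in adj})
print(costs[("b", "a")])  # -3: deleting b, c and d
```

Each pair (p, n) is evaluated once, so the total work is linear in the sum of squared degrees in this naive form; the sketch favors brevity over the tightest bound.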

PostTraverseCalc(h, p):

1. Denote X = Ni(p), X′ = {x′ : x ∈ X} and Y = Ni(h) \ {f(h)}.
2. Construct a bipartite graph G_i with bipartition (X, Y ∪ X′). ∀x ∈ X, connect x and x′ with an edge whose weight is δ(x, p) + dP, and ∀x ∈ X, y ∈ Y, connect x and y with an edge whose weight is max_{α ∈ {mm,md,dm,dd}} Mα[y, x, p].
3. Calculate W_i, the maximum weight of a matching in G_i.
4. Similarly construct G_o and calculate W_o.
5. M[h, p] ⇐ Δ[lh, lp] + W_i + W_o.

In Steps 2–4 we construct two bipartite graphs whose maximum weight matchings correspond to the best matchings between the ingoing and outgoing neighbors of h and p. In Step 5 we store the sum of these weights and the score of matching h with p in M[h, p].
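To make the role of the dummy nodes x′ concrete, the following brute-force sketch enumerates every way of assigning each x ∈ X either to a distinct y ∈ Y or to its own dummy node x′ (i.e., deleting its subgraph). It assumes, in line with the flow formulation used later in the proof of Lemma 14, that every x must be saturated. It is exponential and meant only as an illustration; it is not the matching algorithm the paper uses, and all names and weights are hypothetical.

```python
from itertools import permutations

NEG_INF = float("-inf")


def best_assignment(X, Y, delete_weight, match_weight):
    """Best total weight when each x takes a distinct y or its dummy x'.

    delete_weight[x]     -- weight of the edge {x, x'}
    match_weight[(x, y)] -- weight of the edge {x, y}
    """
    # None stands for "x is matched with its own dummy node x'".
    slots = list(Y) + [None] * len(X)
    best = NEG_INF
    for choice in set(permutations(slots, len(X))):
        total = sum(delete_weight[x] if y is None else match_weight[(x, y)]
                    for x, y in zip(X, choice))
        best = max(best, total)
    return best


# Hypothetical weights: two pattern neighbors compete for one host neighbor.
w = best_assignment(
    ["x1", "x2"], ["y1"],
    delete_weight={"x1": -2, "x2": -3},
    match_weight={("x1", "y1"): 10, ("x2", "y1"): 8},
)
print(w)  # 7: match x1 with y1 and delete x2
```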

6. If f(h) = nil, define R_h = ∅. Else if (f(h), h) ∈ E_H, define R_h = No(h). Else define R_h = Ni(h). Similarly define R_p.
7. ∀n ∈ Ni(p):
   (a) Calculate W_i^n, the maximum weight of a matching in G_i \ {n, n′}.
   (b) Mmm[h, p, n] ⇐ Δ[lh, lp] + W_i^n + W_o.
   (c) Mdm[h, p, n] ⇐ δ(lh) + max_{h′ ∈ R_h} max{Mdm[h′, p, n], Mmm[h′, p, n] + dH}.
   (d) Mdd[h, p, n] ⇐ δ(lh) + max_{h′ ∈ R_h} max{Mdd[h′, p, n], Mmd[h′, p, n] + dH}.
8. Similarly calculate Mmm, Mdm and Mdd for h, p and each n ∈ No(p).

In Step 7(a) we update G_i so that its matching will not include n. In Step 7(b) we store the updated sum in Mmm[h, p, n]. In Step 7(c) we calculate the sum of the score of deleting h and matching sub(h′, h) with sub(p, n) for the best choice of h′ from R_h. We consider only the matrices Mdm and Mmm since we need to match p, and we consider only neighbors of h in R_h since we are restricted to (h, f(h), p, n)-relaxed matchings. If h′ is matched, we pay dH for the deletion of h (we start a new component deletion). Step 7(d) is similar.
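The recurrence of Step 7(c) might be sketched as follows. The dictionaries standing in for Mdm and Mmm, the treatment of missing cells as −∞, and the numbers in the example are all assumptions made for illustration; the convention max{} = −∞ is the one stated in Section 3.1.

```python
NEG_INF = float("-inf")


def delete_host_score(p, n, R_h, M_dm, M_mm, delta_h, d_H):
    """Sketch of Step 7(c): delete h and continue with some h' in R_h.

    M_dm, M_mm -- dicts keyed by (h', p, n); missing cells count as -inf
    delta_h    -- delta(l_h), the cost of deleting h itself
    d_H        -- extra cost paid when h starts a new deleted component
    """
    best = max(
        (max(M_dm.get((c, p, n), NEG_INF),
             M_mm.get((c, p, n), NEG_INF) + d_H)
         for c in R_h),
        default=NEG_INF,  # max{} = -inf when R_h is empty
    )
    return delta_h + best


# Hypothetical cell values.
M_dm = {("h1", "p", "n"): 5}
M_mm = {("h1", "p", "n"): 7, ("h2", "p", "n"): 4}
value = delete_host_score("p", "n", ["h1", "h2"], M_dm, M_mm,
                          delta_h=-1, d_H=-1)
print(value)  # 5: via h1, max(5, 7 - 1) = 6, plus delta(l_h) = -1
```

The component cost d_H is charged only on the branch where h′ is matched, which mirrors the way affine gap penalties are opened in sequence alignment.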

9. If f(p) ≠ nil: Mmd[h, p, f(p)] ⇐ δ(p, f(p)) + max_{p′ ∈ R_p} (max{Mmd[h, p′, p], Mmm[h, p′, p] + dP} − δ(p′, p)).

Step 9 is similar to Step 7(c) with the exception that we pay δ(p, f(p)) − δ(p′, p) since we delete all the nodes which are in sub(p, n) and not in sub(p′, p) (the nodes in P must be either matched or deleted).

PreTraverseCalc(h, p):

1. md_o = max_{p′ ∈ No(p)} (max{Mmd[h, p′, p], Mmm[h, p′, p] + dP} − δ(p′, p)). ∀n ∈ Ni(p) \ {f(p)}: Mmd[h, p, n] ⇐ δ(p, n) + md_o.
2. Similarly calculate Mmd for h, p and each n ∈ No(p) \ {f(p)}.
3. ∀p′ ∈ Ni(p):
   i_{p′,d} = max_{hi ∈ Ni(h) \ {f(h)}} max{Mmd[hi, p′, p], Mdd[hi, p′, p] − dH},
   i_{p′,m} = max_{hi ∈ Ni(h) \ {f(h)}} max{Mmm[hi, p′, p], Mdm[hi, p′, p] − dH},
   o_{p′,d} = max_{ho ∈ No(h) \ {f(h)}} max{Mmd[ho, p, p′], Mdd[ho, p, p′] − dH},
   o_{p′,m} = max_{ho ∈ No(h) \ {f(h)}} max{Mmm[ho, p, p′], Mdm[ho, p, p′] − dH}.
4. t = max_{p′ ∈ Ni(p)} max{i_{p′,d} + o_{p′,d} − dP, i_{p′,d} + o_{p′,m}, i_{p′,m} + o_{p′,d}, i_{p′,m} + o_{p′,m}}.
5. M[h, p] ⇐ max{M[h, p], dH + δ(lh) + t}.

Steps 1 and 2 are similar to Step 9 of PostTraverseCalc. In Steps 3–5 we update M[h, p] if we can get a better score by deleting h. The deletion of h requires the matching of sub(hi, h) with sub(p′, p) and of sub(ho, h) with sub(p, p′) for some p′ ∈ Ni(p), hi ∈ Ni(h) \ {f(h)} and ho ∈ No(h) \ {f(h)}. t stores the score of the best choice of such p′, hi and ho.

The following theorem states the correctness and running time of Algorithm 1. We prove it in Sections 3.4 and 3.5. Note that the algorithm returns the score of a solution. However, we can track its calculations in order to obtain the solution itself without increasing its time and space complexities.

Theorem 2. Algorithm 1 solves the SDA problem for multisource trees, and it runs in time O(|V_P||V_H|(|V_P| + log |V_H|)). Moreover, it runs in time O(|V_P||V_H|(|V_P| / log |V_P| + log |V_H|)) when the number of labels is bounded.

3.3. Algorithm 2 for treelike DAGs

Algorithm 2 is suitable for treelike DAGs s.t. P has one source. The requirement that P has one source is made for the sake of clarity of presentation. In case P has s > 1 sources v1, v2, . . . , vs, we run a slightly modified version of this algorithm s times, where in the ith time we use P without nodes that are not reachable from vi. Note that since P has one source, we get that ∀p ∈ V_P, p′ ∈ No(p): frb(p′) ⊆ frb(p) ∪ dsr(p).

The algorithm is based on dynamic programming and uses 4 matrices: Mmm, Mmd, Mdm and Mdd. Each matrix has a cell [h, p, S, f, g] ∀h ∈ V_H, p ∈ V_P, S ⊆ frb(h), 1–1 f : frb(p)^{+p} → S^{+h}, and g : frb(p)^{+p} → 2^{V_H} s.t. [∀p′ ∈ frb(p)^{+p}, g(p′) ⊆ frb(f(p′)) \ (frb(h) \ S)]. See Fig. 3 for an illustrative example.

We consider the following requirement.

Requirement 3. For each cell [h, p, S, f, g], Mα[h, p, S, f, g] holds the best SDA score for the host sub_{H\(frb(h)\S)}(h) and pattern sub_P(p) when we consider (h, p)-relaxed mappings s.t. [∀p′ ∈ frb(p)^{+p}, p′ is matched with f(p′) ∧ sub_P(p′) is matched with sub_{H\(frb(f(p′))\g(p′))}(f(p′))], and

1. If α = mm, h and p are matched.
2. If α = md, h is matched and p is deleted.
3. If α = dm, h is deleted and p is matched.
4. If α = dd, h and p are deleted and there is a pair of matched nodes.

Next we describe the basic idea of the algorithm. Then we give its pseudocode, where each part is followed by explanations.

We traverse the treelike DAGs in a reverse topological order. Given h ∈ V_H and p ∈ V_P, we need to know the set S of nodes from frb(h) which we are allowed to use for matching nodes that can be reached from p, and thus we iterate over each such set. Moreover, ∀p′ ∈ frb(p)^{+p} we need to know the node h′ ∈ frb(h)^{+h} with which it was previously matched, and which nodes from frb(h′) were allowed to be used in order to match nodes which can be reached from p′ (see the explanation of Definition 11 in Section 3.1 for intuition why we need this information). Thus we need f and g in the matrix cells.

Given h, p and the information mentioned above, we calculate the best score of deleting h and continuing the match with one of its outgoing neighbors, the best score of deleting p and continuing the match with one of its outgoing neighbors, and the best score of matching h with p and continuing the match with their outgoing neighbors. When h and p are matched, we use a maximum weight matching calculation in order to find the best option to match between their outgoing neighbors.

Algorithm 2.

1. ∀p ∈ V_P, calculate δ(p) = Σ_{p′ ∈ V_{sub_P(p)}} δ(lp′).
2. ∀v ∈ V_H ∪ V_P, calculate frb(v) and dsr(v).
3. Initialize all the cells of Mmm, Mmd, Mdm and Mdd to −∞.
4. ∀h ∈ V_H in reverse topological order:
   ∀p ∈ V_P in reverse topological order:
      For k = 0 . . . |frb(h)|, iterate over each S ⊆ frb(h) of size k:
         For each cell [h, p, S, f, g]: Calc(h, p, S, f, g).
5. Return Σ_{p ∈ V_P} δ(lp) + dP + max{0, max_{h ∈ V_H, p ∈ V_P s.t. frb(p)^{+p} = ∅} (Mmm[h, p, frb(h), ∅, ∅] − δ(p) − d(p))}, where d(p) = dP if p is the source of P and 0 otherwise.

In Step 1 we calculate the cost of deleting sub_P(p) for each p ∈ V_P. In Step 4 we iterate over the cells in an order which guarantees that in a calculation of a cell, the cells on which it depends have already been calculated. In Step 5 we return the maximum between the score of deleting all of P and the best SDA score for H and P s.t. there is a pair of matched nodes. We consider only nodes in P whose frb sets are empty and which are not crossroads since Definition 14 restricts us to limited subgraphs.
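The iteration order of Step 4 can be sketched with the Python standard library (graphlib requires Python 3.9 or later). The helper names and the three-node DAG are illustrative assumptions; the edges p3 → p1, p3 → p2 and p2 → p1 are inferred from the discussion of Fig. 3 rather than read off the figure.

```python
from graphlib import TopologicalSorter
from itertools import combinations


def reverse_topological_order(out_neighbors):
    """List the nodes so that each node appears after all its out-neighbors."""
    # TopologicalSorter expects node -> predecessors. Passing the outgoing
    # neighbors in that role reverses the order, which is what Step 4 needs.
    return list(TopologicalSorter(out_neighbors).static_order())


def subsets_by_size(items):
    """Yield every subset of `items`, smallest subsets first."""
    items = sorted(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


# ASSUMED fragment of the pattern in Fig. 3.
out_neighbors = {"p3": {"p1", "p2"}, "p2": {"p1"}, "p1": set()}
order = reverse_topological_order(out_neighbors)
print(order)  # ['p1', 'p2', 'p3']
print([sorted(s) for s in subsets_by_size({"a", "b"})])
# [[], ['a'], ['b'], ['a', 'b']]
```

Enumerating S by increasing size matches the loop over k in Step 4, so that a cell for a smaller set is available before any cell for a larger set is computed.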

Calc(h, p, S, f , g):

1. Denote I = frb(h) \ S.
2. If p is not a crossroad:
   (a) R_p = {p′ ∈ No(p) : ∄v ∈ dsr(p) which is reachable from p′ and ∄v ∈ frb(p) which is not reachable from p′}.
   (b) Mmd[h, p, S, f, g] ⇐ δ(p) + max_{p′ ∈ R_p} (max{Mmd[h, p′, S, f, g], Mmm[h, p′, S, f, g] + dP} − δ(p′)).
   (c) Mdd[h, p, S, f, g] ⇐ δ(p) + max_{p′ ∈ R_p} (max{Mdd[h, p′, S, f, g], Mdm[h, p′, S, f, g] + dP} − δ(p′)).

If p is not a crossroad, we delete it (we cannot delete a crossroad since Definition 14 restricts us to limited subgraphs). In Step 2(b) we calculate the sum of the score of deleting the nodes in sub_P(p) which are not in sub_P(p′) and matching sub_{H\I}(h) with sub_P(p′) for the best option to choose p′ ∈ R_p. Since P has one source, the calculation is legal. We consider only the matrices Mmm and Mmd since we need to match h, and we consider only neighbors of p in R_p since Definition 14 restricts us to limited subgraphs. If p′ is matched, we pay dP for the deletion of p (we start a new component deletion). Step 2(c) is similar.

3. R_h = {h′ ∈ No(h) \ I : ∀p′ ∈ frb(p)^{+p} there is a path from h′ to f(p′)}.
4. Mdm[h, p, S, f, g] ⇐ δ(lh) + max_{h′ ∈ R_h} max{Mdm[h′, p, frb(h′) \ I, f, g], Mmm[h′, p, frb(h′) \ I, f, g] + dH}.

Step 4 is similar to Step 2(b), with the exception that we pay only for the deletion of h (the nodes in H can be neither matched nor deleted). Moreover, the definition of R_h is not symmetric to R_p. Here we consider this definition since we need to choose an outgoing neighbor of h which we are allowed to use (i.e., it is not in I) and from which the nodes that we must match in order to be consistent with f are reachable.

The rest of the procedure concerns the calculation of Mmm[h, p, S, f , g].

5. If p is a crossroad:
   (a) If f(p) ≠ h, return.
   (b) Else if g(p) ≠ S, Mmm[h, p, S, f, g] ⇐ Mmm[h, p, g(p), f, g] and return.

If p is a crossroad and f(p) ≠ h, we cannot match h with p and thus we return. If p is a crossroad, f(p) = h and g(p) ≠ S, we have already performed the required calculation for Mmm[h, p, g(p), f, g], and thus we just copy it.

6. ∀v ∈ dsr(p) initialize rep(v) = |{p′ ∈ No(p) : v is reachable from p′}|.
7. ∀v ∈ dsr(p) in topological order:
   (a) fix(v) = −(rep(v) − 1).
   (b) ∀v′ ∈ dsr(p) which is reachable from v: add fix(v) to rep(v′).

In Steps 6 and 7 we calculate fix(v) for each v ∈ dsr(p). These calculations are used in the last part of the algorithm to correct the problem that when we match the outgoing neighbors of p, we may consider the scores of matching or deleting nodes in dsr(p) more than once. For example, consider the calculation of Mmm[h3, p3, ∅, ∅, ∅] as in Fig. 3. When matching No(p3) with No(h3), we consider the matching of p1 once for each node in No(p3) from which it is reachable (i.e., once for p1 and once for p2).
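Steps 6 and 7 can be sketched as below. The adjacency map and the choice dsr(p3) = {p1} are assumptions inferred from the example in the text, not taken from the figure itself, and the function names are hypothetical. A node counts as reachable from itself, as in the example (p1 is counted once for p1 and once for p2).

```python
def reachable_from(out_neighbors, start):
    """Set of nodes reachable from `start`, including `start` itself."""
    seen, stack = {start}, [start]
    while stack:
        for w in out_neighbors[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def compute_fix(out_neighbors, p, dsr_p_topological):
    """Steps 6 and 7 of Calc: dsr(p) must be supplied in topological order."""
    reach = {v: reachable_from(out_neighbors, v) for v in out_neighbors}
    # Step 6: how many outgoing neighbors of p reach each distributed node.
    rep = {v: sum(1 for q in out_neighbors[p] if v in reach[q])
           for v in dsr_p_topological}
    fix = {}
    for v in dsr_p_topological:
        fix[v] = -(rep[v] - 1)          # Step 7(a)
        for w in dsr_p_topological:     # Step 7(b)
            if w != v and w in reach[v]:
                rep[w] += fix[v]
    return fix


# ASSUMED fragment of the pattern in Fig. 3, with dsr(p3) = {p1}.
out_neighbors = {"p3": ["p1", "p2"], "p2": ["p1"], "p1": []}
fix = compute_fix(out_neighbors, "p3", ["p1"])
print(fix)  # {'p1': -1}
```

Here rep(p1) = 2, so fix(p1) = −1 removes the one surplus copy of the score associated with p1. Skipping w = v in Step 7(b) is harmless, since rep(v) is not read again after fix(v) has been set.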

8. ∀Del ⊆ dsr(p) \ frb(p) for which {v ∈ (frb(p) ∪ (dsr(p) \ Del)) : ∃p′ ∈ No(p), u ∈ Del s.t. v and u are reachable from p′} = ∅:
   (a) Denote X = {n ∈ No(p) : ∄v ∈ Del which is reachable from n}, X̄ = No(p) \ X, X′ = {x′ : x ∈ X} and Y = No(h) \ I.
   (b) Define C = (X̄ ∪ Del, {{v, u} : u ∈ Del ∩ frb(v)}). Calculate the number nC of connected components in C and D = nC · dP + Σ_{p′ ∈ X̄} δ(p′).

We iterate over all the legal options to delete nodes from dsr(p) \ frb(p). Since Definition 12 restricts us to limited subgraphs, the deletion of a node v ∈ dsr(p) \ frb(p) determines that all the internal nodes on the paths from p to v are deleted. Moreover, if a node v is not deleted, we can neither delete nodes from its frb set nor delete nodes of degree greater than 2 that have v in their frb or dsr sets. Thus a legal set Del of deleted nodes satisfies {v ∈ (frb(p) ∪ (dsr(p) \ Del)) : ∃p′ ∈ No(p), u ∈ Del s.t. v and u are reachable from p′} = ∅.

In Step 8(b) we calculate D , which is the score of deleting the nodes that were determined as deleted because of Del.

8. (c) ∀ 1–1 f′ : frb(p)^{+p} ∪ (dsr(p) \ Del) → S^{+h} ∪ (dsr(h) \ I) s.t. [f′|_{frb(p)^{+p}} ≡ f], and ∀g′ : frb(p)^{+p} ∪ (dsr(p) \ Del) → 2^{V_H} s.t. [g′|_{frb(p)^{+p}} ≡ g ∧ ∀p′, g(p′) ⊆ frb(f′(p′)) \ I]:
      i. If ∃p1, p2 s.t. p1 ≠ p2, there is a path from p1 to p2 and [f′(p2) ∉ g′(p1) ∨ g′(p2) ∩ (frb(f′(p1)) \ g′(p1)) ≠ ∅], continue.
      ii. If ∃p1, p2 s.t. there is no path from p1 to p2 or from p2 to p1 and ∃v ∈ (g′(p1) ∪ {f′(p1)}) ∩ (g′(p2) ∪ {f′(p2)}) for which [∄u ∈ frb(p1) ∩ frb(p2) s.t. v ∈ g′(u) ∪ {f′(u)}], continue.

We iterate over all the legal options to choose f′ and g′, which are extensions of f and g respectively. f′ determines the mapping of the crossroads in dsr(p) ∪ frb(p) which were not determined as deleted, and g′ determines the mapping of their corresponding subgraphs (i.e., for p′ ∈ dsr(p) ∪ frb(p): if p′ ∈ Del then sub_P(p′) is deleted, and otherwise [p′ is matched with f′(p′) ∧ sub_P(p′) is matched with sub_{H\(frb(f′(p′))\g′(p′))}(f′(p′))]). In particular, Step 8(c)ii is correct since P has one source.

8. (c) iii. Denote Dsr = {h′ ∈ dsr(h) \ I : ∄p′ s.t. h′ ∈ g′(p′) ∪ {f′(p′)}}.
      iv. ∀d : X → 2^{Dsr} s.t. [∪_{p′} d(p′) = Dsr ∧ ∀p1, p2 ∈ X d(p1) ∩ d(p2) = ∅]:

We iterate over all the options to choose which ‘unused’ nodes in dsr(h) \ I (i.e., nodes in Dsr) can be mapped when matching sub_P(p′) for each p′ ∈ No(p) that was not determined as deleted.
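Since the sets d(p′) must be pairwise disjoint and cover Dsr, choosing d amounts to assigning each node of Dsr to exactly one owner in X. A minimal sketch of this enumeration, with hypothetical node names:

```python
from itertools import product


def distributions(X, Dsr):
    """Yield every d: X -> 2^Dsr whose images are disjoint and cover Dsr."""
    X, Dsr = list(X), sorted(Dsr)
    if not X:
        # With no owners, only the empty Dsr can be covered.
        if not Dsr:
            yield {}
        return
    for owners in product(X, repeat=len(Dsr)):
        d = {x: set() for x in X}
        for node, owner in zip(Dsr, owners):
            d[owner].add(node)
        yield d


# Hypothetical instance: two unused host crossroads, two pattern neighbors.
all_d = list(distributions(["x1", "x2"], {"h4", "h5"}))
print(len(all_d))  # 4, i.e. |X| ** |Dsr|
```

The count |X|^{|Dsr|} makes explicit why the number of crossroads that a node distributes appears in the exponent of the running time.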

8. (c) iv. A. Construct a bipartite graph G with bipartition (X, Y ∪ X′). ∀x ∈ X, connect x and x′ with an edge whose weight is [δ(x) + dP if frb(x)^{+x} = ∅, and −∞ otherwise]. ∀x ∈ X, y ∈ Y:
            • Denote Sxy = {f′(v) : v ∈ frb(x)^{+x}} ∪ {v ∈ frb(y)^{+y} : [v ∈ d(x) ∪ S \ dsr(h)] ∨ [∃u ∈ frb(x)^{+x} s.t. v ∈ g′(u)] ∨ [v ∉ frb(h) ∪ dsr(h)]}.
            • If ∅^{+y} ⊆ Sxy ⊆ frb(y)^{+y}, connect x and y with an edge whose weight is max_{α ∈ {mm,md,dm,dd}} Mα[y, x, Sxy \ {y}, f′|_{frb(x)^{+x}}, g′|_{frb(x)^{+x}}].
            • Else connect them with an edge whose weight is −∞.
         B. Calculate W, the maximum weight of a matching in G.

We calculate the best score of matching the outgoing neighbors of p with the outgoing neighbors of h, which is denoted by W. It is easy to verify that the calculation is correct since Definition 12 restricts us to limited subgraphs and P has one source. For example, x can be deleted iff it is not a crossroad and its frb set is empty. Therefore we connect x and x′ with an edge whose weight is δ(x) + dP (i.e., the cost of deleting sub_P(x)) if frb(x)^{+x} = ∅, and −∞ otherwise.

8. (c) iv. C. ∀v ∈ dsr(p): val(v) is δ(v) if v ∈ Del, and Mmm[f′(v), v, g′(v), f′|_{frb(v)^{+v}}, g′|_{frb(v)^{+v}}] otherwise.
         D. Denote FIX = Σ_{v ∈ dsr(p)} fix(v) · val(v).
         E. If Mmm[h, p, S, f, g] < Δ[lh, lp] + D + W + FIX: Mmm[h, p, S, f, g] ⇐ Δ[lh, lp] + D + W + FIX.

We sum the score of matching p with h with the scores D and W. By adding FIX to Δ[lh, lp] + D + W, the score of each matched pair or deleted node is considered exactly once.

The following theorem states the correctness and running time of Algorithm 2. We prove it in Sections 3.4 and 3.5. Note that the algorithm returns the score of a solution. However, we can track its calculations in order to obtain the solution itself without increasing its time and space complexities.

Theorem 3. Algorithm 2 solves the SDA problem for treelike DAGs s.t. P has one source, and it runs in time O(|E_P| |V_P|^{max{dsr(H)−1,0}} × |E_H| (|V_P| + log |V_H|) + |V_H||E_H|). Moreover, it runs in time O(|V_P| (|V_P| / log |V_P|)^{max{dsr(H)−1,0}} |E_H| · (|V_P| / log |V_P| + log |V_H|) + |V_H||E_H|) when P is a directed tree and the number of labels is bounded.

3.4. Correctness

The correctness of Algorithm 1 is based on the following Lemmas 10 and 11.

Lemma 10. Algorithm 1 satisfies Requirement 1.

Proof. First of all, note that each cell depends only on previously computed cells. Thus we can assume that cells which do not depend on other cells are calculated first, and the computation order of all the other cells is as in the algorithm. We prove the lemma by using induction on this order.

The induction basis includes the following calculations ∀h ∈ V_H, p ∈ V_P:

• If N(p) = {n}, then Mmm[h, p, n] = Δ[lh, lp], which is correct since we only need to match h with p.
• If R_h = ∅, then ∀n ∈ N(p), α ∈ {dm, dd}, Mα[h, p, n] = −∞, which is correct since in this case h cannot be deleted.
• If No(p) = ∅, then ∀n ∈ Ni(p) Mmd[h, p, n] = −∞, which is correct since in this case p cannot be deleted.
• If Ni(p) = ∅, then ∀n ∈ No(p) Mmd[h, p, n] = −∞, which is correct since in this case p cannot be deleted.

Now we prove that each calculation which does not belong to the basis of the induction is correct assuming the cells on which it depends are correct:

1. For n ∈ Ni(p), Mmm[h, p, n] = Δ[lh, lp] + W_i^n + W_o.
   We consider mappings which are (h, p)-rooted, (h, f(h), p, n)-relaxed and match h with p. Thus ∀p′ ∈ Ni(p) \ {n}, sub(p′, p) is either deleted or matched with a different subgraph sub(h′, h) where h′ ∈ Ni(h) \ {f(h)} using a matching which is (h′, p′)-rooted and (h′, h, p′, p)-relaxed. Moreover, ∀p′ ∈ No(p) we have the symmetric claim. Thus the correctness of the calculation follows from the induction hypothesis and our maximum weight matching calculation. For n ∈ No(p), the proof of the calculation of Mmm[h, p, n] is symmetric.

2. Mdm[h, p, n] = δ(lh) + max_{h′ ∈ R_h} max{Mdm[h′, p, n], Mmm[h′, p, n] + dH}.
   We pay δ(lh) for the deletion of h. We consider mappings which are (h, p)-rooted, (h, f(h), p, n)-relaxed, delete h and match p, and thus sub(p, n) must be matched with sub(h′, h) for some h′ ∈ R_h using a mapping which is (h′, p)-rooted, (h′, h, p, n)-relaxed and matches p. Moreover, if h′ is matched, we pay the additional cost dH for the deletion of h. The correctness of the calculation follows from these arguments and the correctness of the induction hypothesis.

3. The proof of the calculation of Mdd[h, p, n] is similar to 2.

4. For n ∈ Ni(p), Mmd[h, p, n] = δ(p, n) + max_{p′ ∈ No(p)} (max{Mmd[h, p′, p], Mmm[h, p′, p] + dP} − δ(p′, p)).
   Since we consider mappings which are (h, p)-rooted, (h, f(h), p, n)-relaxed, delete p and match h, sub(h, f(h)) must be matched with sub(p′, p) for some p′ ∈ No(p) using a mapping which is (h, p′)-rooted, (h, f(h), p′, p)-relaxed and matches h. We pay δ(p, n) − δ(p′, p) for the deletion of the nodes in sub(p, n) which are not in sub(p′, p). Moreover, if p′ is matched, we pay the additional cost dP for the deletion of p. The correctness of the calculation follows from these arguments and the correctness of the induction hypothesis. For n ∈ No(p), the proof of the calculation of Mmd[h, p, n] is symmetric. □

Lemma 11. Algorithm 1 satisfies Requirement 2.


Proof. Let h be a node in H and p be a node in P. When calculating M[h, p], the cells of the other matrices on which it depends were already calculated. Thus according to Lemma 10, Step 5 of PostTraverseCalc(h, p) and Steps 3–5 of PreTraverseCalc(h, p), M[h, p] holds the maximum of the following scores:

• The best SDA score for the host sub(h, f(h)) and pattern P when we consider homeomorphism preserving (h, p)-rooted mappings which match h with p.

• The best SDA score for the host sub(h, f(h)) and pattern P when we consider homeomorphism preserving mappings which consist of
  – A mapping which only deletes h.
  – A (hi, p′)-rooted and (hi, h, p′, p)-relaxed mapping between sub(hi, h) and sub(p′, p).
  – A (ho, p)-rooted and (ho, h, p, p′)-relaxed mapping between sub(ho, h) and sub(p, p′).
  for some p′ ∈ Ni(p), hi ∈ Ni(h) \ {f(h)} and ho ∈ No(h) \ {f(h)}. We define this score as −∞ in case no such mapping exists.

Note that the best SDA score for the host sub(h, f(h)) and pattern P is equal to the maximum of these scores for some p ∈ V_P and is greater than or equal to these scores for all the other nodes of P. Thus we derive the lemma. □

The correctness of Algorithm 2 is based on the following Lemma 12.

Lemma 12. Algorithm 2 satisfies Requirement 3.


Proof. First of all, note that each cell depends only on previously computed cells. Thus we can assume that cells which do not depend on other cells are calculated first, and the computation order of all the other cells is as in the algorithm. We prove the lemma by using induction on this order.

The induction basis includes the following calculations for each [h, p, S, f, g]:

• If p is a crossroad or R_p = ∅, Mmd[h, p, S, f, g] = Mdd[h, p, S, f, g] = −∞, which is correct since in this case there are no matchings as the lemma requires which delete p (we consider only limited subgraphs of P).

• If R_h = ∅, Mdm[h, p, S, f, g] = −∞, which is correct since in this case there are no matchings as the lemma requires which delete h.

• If p is a crossroad and f(p) ≠ h, or we reach Step 8 of Calc(h, p, S, f, g) though no bipartite graph is constructed, then Mmm[h, p, S, f, g] remains −∞ as it is initialized, since in this case there are no matchings as the lemma requires which match p with h. Else if a bipartite graph is constructed in Step 8 of Calc(h, p, S, f, g), and No(p) = ∅ or No(h) \ I = ∅, Mmm[h, p, S, f, g] = Δ[lh, lp] + D + FIX. This is the score of matching p with h and deleting all the other nodes in sub_P(p), which is the only legal option in this case.

Now we prove that each calculation which does not belong to the basis of the induction is correct assuming the cells on which it depends are correct:

1. Mmd[h, p, S, f, g] ⇐ δ(p) + max_{p′ ∈ R_p} (max{Mmd[h, p′, S, f, g], Mmm[h, p′, S, f, g] + dP} − δ(p′)).
   In order to fulfill the requirements of the lemma, sub_{H\I}(h) must be matched with sub_P(p′) for some p′ ∈ R_p using a mapping which is (h, p′)-relaxed, matches h, and [∀v ∈ frb(p′)^{+p′}, v is matched with f(v) ∧ sub_P(v) is matched with sub_{H\(frb(f(v))\g(v))}(f(v))] (note that ∀p′ ∈ R_p, frb(p′)^{+p′} = frb(p)^{+p}). We pay δ(p) − δ(p′) for the deletion of the nodes in sub_P(p) which are not in sub_P(p′). Moreover, if p′ is matched, we pay the additional cost dP for the deletion of p. The correctness of the calculation follows from these arguments and the correctness of the induction hypothesis.

2. The proof of the calculation of Mdd[h, p, S, f, g] is similar to 1.

3. Mdm[h, p, S, f, g] ⇐ δ(lh) + max_{h′ ∈ R_h} max{Mdm[h′, p, frb(h′) \ I, f, g], Mmm[h′, p, frb(h′) \ I, f, g] + dH}.
   We pay δ(lh) for the deletion of h. In order to fulfill the requirements of the lemma, sub_P(p) must be matched with sub_{H\I}(h′) for some h′ ∈ R_h using a mapping which is (h′, p)-relaxed, matches p, and [∀v ∈ frb(p)^{+p}, v is matched with f(v) ∧ sub_P(v) is matched with sub_{H\(frb(f(v))\g(v))}(f(v))]. The correctness of the calculation follows from these arguments and the correctness of the induction hypothesis.

4. If p is a crossroad, f(p) = h and g(p) ≠ S, the correctness of the calculation Mmm[h, p, S, f, g] ⇐ Mmm[h, p, g(p), f, g] follows immediately from the requirements of the lemma.
   Otherwise when calculating Mmm[h, p, S, f, g] we perform an exhaustive search for the best of all the legal options to choose Del, f′, g′ and d, which determine the deletions or matchings of all the nodes in dsr(p) ∪ frb(p) and their corresponding subgraphs (i.e., for p′ ∈ dsr(p) ∪ frb(p): if p′ ∈ Del then sub_P(p′) is deleted, and otherwise [p′ is matched with f′(p′) ∧ sub_P(p′) is matched with sub_{H\(frb(f′(p′))\g′(p′))}(f′(p′))]) and which nodes from dsr(h) can be used when matching sub_P(p′) for each p′ ∈ No(p). For each such option we calculate the best matching of the subgraphs corresponding to nodes in No(p) which may not be deleted with deletions or with the subgraphs corresponding to nodes in No(h) \ I, whose correctness follows from the maximum weight matching calculation the algorithm performs and the induction hypothesis. We add this result to the score of matching p with h and deleting the subgraphs corresponding to nodes in No(p) which were determined as deleted by Del. By adding FIX to Δ[lh, lp] + D + W, the score of each matched pair or deleted node is considered exactly once, and thus the calculation is correct. □

3.5. Running time analysis

Given p ∈ V_P and p′ ∈ N(p), denote:

1. iso(p′, p) is {n ∈ N(p) : there is an isomorphism between sub(n, p) and sub(p′, p), which is also an isomorphism between their underlying undirected trees rooted at n and p′, respectively} if P is a multisource tree, and {p′} otherwise.
2. As(p) = {iso(n, p) : n ∈ N(p)}, and A(p) = |As(p)|.

Lemma 13. For a multisource tree P whose number of labels is bounded, we have that ∀p ∈ V_P : A(p) = O(|V_P| / log |V_P|).

Proof. Denote by f(n) the number of distinct labeled multisource trees in a forest of n nodes which have a unique node. Shamir et al. [34] prove that the number of distinct labeled rooted trees in a forest of n nodes is O(n / log n). In their proof, they use the rooted tree topology only when claiming that the number of distinct unlabeled rooted trees on i nodes is at most c^i for some constant c. We have 2^{i−1} options to direct the edges of each of them (some of them may result in the same tree). We treat their roots as their unique nodes. Thus the number of unlabeled multisource trees on i nodes which have a unique node is at most 2^{i−1} c^i ≤ (2c)^i, i.e., at most c^i for some (larger) constant c, and we can use the same proof to show that f(n) = O(n / log n). The lemma follows by observing that ∀p ∈ V_P : A(p) ≤ f(|V_P|). □

Given a bipartite graph G = (X, Y ∪ X′, E) that is constructed in Algorithm 1 or 2 according to the nodes h ∈ V_H, p ∈ V_P and the weight function w : E → R, denote by X∗ the partition of X into sets s.t. x1, x2 ∈ X are in the same set iff w(x1, x1′) = w(x2, x2′) ∧ ∀y ∈ Y: w(x1, y) = w(x2, y). Note that |X| ≤ |N(p)|, |Y| ≤ |N(h)| and A(p) ≤ |N(p)|. Moreover, |X∗| = O(A(p)) since if x1 and x2 belong to the same set in A_s(p) and if d(x1) = d(x2) (in case G is constructed in Algorithm 2), then x1 and x2 belong to the same set in X∗.
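The partition X∗ defined above can be sketched in a few lines of Python. This is only an illustration under assumed data structures: the function name `partition_x`, the dictionary `w` keyed by node pairs, and the `partner` map from x to x′ are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def partition_x(X, Y, w, partner):
    """Group the nodes of X that have identical weight profiles.

    Two nodes x1, x2 land in the same set iff w(x1, x1') = w(x2, x2')
    and w(x1, y) = w(x2, y) for every y in Y.
    """
    groups = defaultdict(list)
    for x in X:
        # The signature is the weight to the partner node x' followed by
        # the weights to all nodes of Y, taken in a fixed order.
        signature = (w[(x, partner[x])], tuple(w[(x, y)] for y in Y))
        groups[signature].append(x)
    return list(groups.values())


# Hypothetical toy instance: x1 and x2 share all weights, x3 differs.
X = ["x1", "x2", "x3"]
Y = ["y1", "y2"]
partner = {x: x + "'" for x in X}
w = {
    ("x1", "x1'"): -8, ("x1", "y1"): 5, ("x1", "y2"): 1,
    ("x2", "x2'"): -8, ("x2", "y1"): 5, ("x2", "y2"): 1,
    ("x3", "x3'"): -8, ("x3", "y1"): 2, ("x3", "y2"): 4,
}
print(partition_x(X, Y, w, partner))
```

Grouping by a hashable signature takes O(|X||Y|) time, which is dominated by the matching computation that follows.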

Lemma 14. A maximum weight matching in G can be computed in time O (|X |[|X∗||Y | + (|X∗| + |Y |) log(|X∗| + |Y |)]).



Fig. 5. An example of the graph G′ which is constructed in the proof of Lemma 14. The costs of the edges appear in the figure. For i ∈ {1, 2}, the capacity of {s, x_i∗} is |x_i∗|, and the capacity of {y, t} is 1. The capacity of each of the other edges is ∞.

Proof. We use the Edmonds–Karp algorithm [16] and a Fibonacci heap [18] to compute a maximum weight matching in G. The problem is reduced to finding a min-cost max-flow in the graph G∗ whose nodes are V_G ∪ {s, t}, source s, sink t and the following edges: an edge (s, x) ∀x ∈ X of cost 0 and capacity 1, an edge (x, z) ∀x ∈ X, z ∈ Y ∪ {x′} of cost −w(x, z) and capacity ∞, and an edge (z, t) ∀z ∈ Y ∪ X′ of cost 0 and capacity 1. Denote by f∗ the max flow in G∗. The running time of the computation is O(f∗(|E_G∗| + |V_G∗| log |V_G∗|)) = O(|X|(|X||Y| + (|X| + |Y|) log(|X| + |Y|))).

However, we can use the sets in X∗ as nodes instead of the nodes in X. We define G′ using X∗ ∪ Y ∪ X∗′, source s, sink t and the following edges (see Fig. 5 for an example): an edge (s, x∗) ∀x∗ ∈ X∗ of cost 0 and capacity |x∗|, an edge (x∗, z) ∀x∗ ∈ X∗, z ∈ Y ∪ {x∗′} of cost −w(x∗, z) and capacity ∞, an edge (y, t) ∀y ∈ Y of cost 0 and capacity 1, and an edge (x∗′, t) ∀x∗′ ∈ X∗′ of cost 0 and capacity ∞. Thus the running time of the computation is O(f∗(|E_G′| + |V_G′| log |V_G′|)) = O(|X|(|X∗||Y| + (|X∗| + |Y|) log(|X∗| + |Y|))). □

Lemma 15. Given a computation of a maximum weight matching in G, a maximum weight matching in G (1) without a node x_d ∈ X and its corresponding node x′_d ∈ X′, or (2) with additional nodes x_a and x′_a in X and X′ respectively and the weighted edges (x_a, z) ∀z ∈ Y ∪ {x′_a}, can be computed in time O(|X∗||Y| + (|X∗| + |Y|) log(|X∗| + |Y|)).

Proof. In each stage of the Edmonds–Karp algorithm (see the proof of Lemma 14) we have a potential function p for the nodes and a corresponding cost function c′ for the edges (∀e = (u, v) of the residual graph, c′(e) = ±c(e) + p(u) − p(v) ≥ 0, where c(e) is the original cost of e and the sign is + if e ∈ E_G′ and − otherwise), and we run Dijkstra's algorithm in time O(|X∗||Y| + (|X∗| + |Y|) log(|X∗| + |Y|)) on the residual graph to find an augmentation path for the existing flow and update p accordingly (Dijkstra's algorithm calculates a distance for each node, which we add to its potential).

Denote by f the given min-cost max-flow in G′ (the flow network of G), by G_f the resulting residual graph, by p the potential function, by c′ its corresponding cost function, and by G_n the new graph (after updating G).

We show that in both cases we need at most one run of Dijkstra's algorithm to find a min-cost max-flow in the flow network of G_n:

1. Denote by x∗_d the node in G′ that represents the set which includes x_d, and by M_d the set of nodes which are matched to x∗_d. If x∗′_d ∈ M_d denote m_d = x∗′_d, and otherwise choose some node from M_d and denote it by m_d. Push back one unit of flow on t − m_d − x∗_d − s, and denote the resulting flow by f_n. If |x∗_d| = 1, remove x∗_d and x∗′_d from the graph. Otherwise decrease the capacity of (s, x∗_d) by one. Denote the resulting graph by G′_n. G′_n is the flow network of G_n, and f_n is a maximum flow in G′_n. Denote by R the resulting residual graph. If m_d = x∗′_d, f_n is min-cost since then E_R ⊆ E_{G_f}. Thus next we assume m_d ≠ x∗′_d.

Denote e∗ = (m_d, t). The costs c′ of all the edges in E_R \ {e∗} are nonnegative since E_R \ {e∗} ⊆ E_{G_f}. Thus if c′(e∗) ≥ 0, f_n is min-cost and we are done. Else we use Dijkstra's algorithm to calculate a min-cost path q from t to m_d in R \ e∗ (according to c′). Denote by c′(q) the sum of the costs of the edges of q. We update the potential function according to the Dijkstra computation. Denote by p_n the new potential function, and by c_n its corresponding cost function. It is easy to verify that ∀e ∈ E_R \ {e∗}, c_n(e) ≥ 0 and in particular ∀e ∈ q, c_n(e) = 0. Denote by d(v) the distance to v as was computed in the Dijkstra computation. Note that c_n(e∗) = c(e∗) + p_n(m_d) − p_n(t) = c(e∗) + p(m_d) − p(t) + d(m_d) − d(t) = c′(e∗) + c′(q).

We consider two cases:
(a) c′(q) + c′(e∗) ≥ 0. Thus c_n(e∗) ≥ 0, and we get that f_n is min-cost.
(b) c′(q) + c′(e∗) < 0. We push one unit of flow on the cycle which consists of q and e∗. The costs c_n of all the edges in the resulting residual graph are nonnegative (in particular c_n(e∗) < 0 and therefore c_n(←e∗) > 0, where ←e∗ is the reverse of e∗). Thus the resulting max-flow is min-cost.

2. Denote by G′_n the flow network of G_n, and by R the residual graph of G′_n with f (note that f is a legal flow in G′_n that has to be increased by one in order to be a max-flow).

We consider two cases:
(a) x_a belongs to a set x∗_a which is already a node in G′. Then G′_n is G′ where the capacity of (s, x∗_a) is increased by one. Since E_R \ E_{G_f} = {(s, x∗_a)}, the costs c′ of all the other edges in R are nonnegative. We use Dijkstra's algorithm to compute a min-cost path q from x∗_a to t in R without (s, x∗_a), push one unit of flow on the path which consists of (s, x∗_a) and q, and update the potential function accordingly. If the cost corresponding to the new potential of (x∗_a, s) is negative, we add it to the potential of s. It is easy to verify that the costs corresponding to the new potential of all the edges in the resulting residual graph are nonnegative, and thus we have a min-cost max-flow.
(b) x_a does not belong to a set x∗_a which is already a node in G′. We define p(x∗_a) = p(s), p(x∗′_a) = p(t) and extend c′ accordingly. The only edges in R which may have negative costs c′ are (x∗_a, z) for z ∈ Y ∪ {x∗′_a}, since c′(s, x∗_a) = c′(x∗′_a, t) = c′(t, x∗′_a) = 0 and all the other edges exist in G_f. We subtract m = min_{z ∈ Y ∪ {x∗′_a}} c′(x∗_a, z) from the original cost c of each edge (x∗_a, z) s.t. z ∈ Y ∪ {x∗′_a}. Since we subtract m from the costs of all the max flows, a max-flow is min-cost when using the new costs iff it is min-cost when using the original costs. The costs c′ of all the edges in R are now nonnegative. We use one run of Dijkstra's algorithm to find an augmentation path for the existing flow f and update p to p_n accordingly. Then the costs of all the edges in the resulting residual graph (according to the cost function corresponding to p_n) are nonnegative, and we have a min-cost max-flow. □
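The reduction used in the proof of Lemma 14 can be illustrated by the following minimal Python sketch. It is not the implementation analyzed above: for brevity it finds shortest paths with Bellman–Ford instead of Dijkstra's algorithm with potentials and a Fibonacci heap, so it does not achieve the stated running time. The names `MinCostFlow` and `best_score`, and the encoding of each set of X∗ as a triple (size, weights to Y, weight of the edge to its primed node), are assumptions made for the example.

```python
from collections import defaultdict

INF = float("inf")


class MinCostFlow:
    """Successive shortest paths; Bellman-Ford copes with negative costs."""

    def __init__(self):
        self.adj = defaultdict(list)  # node -> indices into self.edges
        self.edges = []               # each edge: [head, residual capacity, cost]

    def add_edge(self, u, v, cap, cost):
        # Edge i and its reverse edge i ^ 1 are stored next to each other.
        self.adj[u].append(len(self.edges))
        self.edges.append([v, cap, cost])
        self.adj[v].append(len(self.edges))
        self.edges.append([u, 0, -cost])

    def run(self, s, t):
        flow, cost = 0, 0
        nodes = list(self.adj)
        while True:
            dist = defaultdict(lambda: INF)
            dist[s] = 0
            parent = {}
            for _ in range(len(nodes)):
                changed = False
                for u in nodes:
                    if dist[u] == INF:
                        continue
                    for i in self.adj[u]:
                        v, cap, c = self.edges[i]
                        if cap > 0 and dist[u] + c < dist[v]:
                            dist[v] = dist[u] + c
                            parent[v] = i
                            changed = True
                if not changed:
                    break
            if dist[t] == INF:
                return flow, cost
            # Find the bottleneck along the shortest path, then augment.
            push, v = INF, t
            while v != s:
                i = parent[v]
                push = min(push, self.edges[i][1])
                v = self.edges[i ^ 1][0]
            v = t
            while v != s:
                i = parent[v]
                self.edges[i][1] -= push
                self.edges[i ^ 1][1] += push
                v = self.edges[i ^ 1][0]
            flow += push
            cost += push * dist[t]


def best_score(groups, num_y):
    """groups: list of (size, weights to y_0..y_{num_y-1}, weight to x')."""
    net = MinCostFlow()
    for g, (size, to_y, to_primed) in enumerate(groups):
        net.add_edge("s", ("x", g), size, 0)
        for j in range(num_y):
            net.add_edge(("x", g), ("y", j), INF, -to_y[j])
        net.add_edge(("x", g), ("x'", g), INF, -to_primed)
        net.add_edge(("x'", g), "t", INF, 0)
    for j in range(num_y):
        net.add_edge(("y", j), "t", 1, 0)
    flow, cost = net.run("s", "t")
    return flow, -cost  # costs are negated weights


# Hypothetical instance: two identical nodes of X and one different node.
ungrouped = [(1, [5, 1], -8), (1, [5, 1], -8), (1, [2, 4], -8)]
grouped = [(2, [5, 1], -8), (1, [2, 4], -8)]
print(best_score(ungrouped, 2), best_score(grouped, 2))
```

Passing every node of X as a set of size 1 corresponds to the network G∗, while merging identical nodes corresponds to G′; both calls should report the same flow value and score, with the second network having fewer nodes and edges.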

Now we can prove the running time of Algorithm 1.

Lemma 16. The running time of Algorithm 1 is O(|V_P||V_H|(|V_P| + log |V_H|)), and O(|V_P||V_H|(|V_P| / log |V_P| + log |V_H|)) when the number of labels is bounded.

Proof. The running time of the initialization and all the PostTraverseCalc and PreTraverseCalc calls excluding the maximum weight matching computations is bounded by O(|V_P||V_H|).

In PostTraverseCalc, ∀h ∈ V_H, p ∈ V_P we construct G_i and G_o and calculate W_i and W_o in time O(|N(p)|[A(p)|N(h)| + (A(p) + |N(h)|) log(A(p) + |N(h)|)]) according to Lemma 14. Then, according to Lemma 15, ∀n ∈ N(p) we calculate W_i^n using G_i or W_o^n using G_o in time O(A(p)|N(h)| + (A(p) + |N(h)|) log(A(p) + |N(h)|)).

The total running time is

\[
O\Big(\sum_{h \in V_H,\, p \in V_P} \big(|N(p)|\,[A(p)|N(h)| + (A(p) + |N(h)|)\log(A(p) + |N(h)|)]\big) + |V_P||V_H|\Big)
= O\Big(|V_H| \sum_{p \in V_P} |N(p)|A(p) + |V_H||V_P| \log |V_H|\Big)
\]

We get the required running time by using A(p) ≤ |N(p)| and Lemma 13. □

We prove two more lemmas and then the running time of Algorithm 2.

Lemma 17. Given a treelike DAG G = (V, E), the time required for computing the dsr and frb sets of all its nodes is bounded by O(|V||E|).

Proof. Given v, u ∈ V, Q(v, u) denotes a query which returns true if u ∈ dsr(v) and false otherwise. We use the data structure presented in [38] for directed graphs, which allows us to perform a query Q(v, u) in time O(1) after the insertion of V and E. For general directed graphs, the insertion of E is done in time O(|V|^2 |E|). We will show that for DAGs it can be done in time O(|V||E|).

The structure is initialized by inserting V in time O(|V|^2). ∀v ∈ V the structure holds a tree T_v. When inserting an edge e, it updates the tree T_v of each v s.t. ∃u ∈ V which is reachable from v, but was not reachable from v before the insertion of e. Updating a tree is done in time O(|V|). We compute a reverse topological order v_1, ..., v_|V| of V. For i = 1 ... |V|, we insert all the outgoing edges of v_i. Assume we are inserting the edge (v_i, x), and consider a node v_j. If j < i, T_{v_j} does not need an update since there is no path from v_j to v_i because of the reverse topological order. If i < j, T_{v_j} does not need an update since there is no path from v_j to v_i because we have not yet inserted any edge outgoing from v_j. If j = i, T_{v_j} might need an update. Thus we insert E in time O(|V||E|).

We compute dsr(v) ∀v ∈ V by iterating over each u ∈ V and performing the query Q(v, u). Afterwards we iterate over every three different nodes v, u ∈ V and d ∈ dsr(v), and insert d to frb(u) iff d is reachable from u and u is reachable from v. Since ∀v ∈ V dsr(v) is bounded, we compute the dsr and frb values of all the nodes in time O(|V|^2). Therefore the total time is bounded by O(|V||E|). □

Lemma 18. In Calc(h, p, S, f, g) Step 8(c)iv, it is sufficient to iterate over k = O(A(p)^|dsr(h)|) functions.



Proof. If P is not a directed tree, A^X_s(p) denotes {{x}: x ∈ X}. Else it denotes the partition of X into sets s.t. x1, x2 ∈ X are in the same set iff δ(x1) = δ(x2) and (∀y ∈ Y, F ⊆ frb(y): max_{α∈{mm,md,dm,dd}} M_α(y, x1, F, ∅, ∅) = max_{α∈{mm,md,dm,dd}} M_α(y, x2, F, ∅, ∅)). Denote A^X(p) = |A^X_s(p)|. Note that A^X(p) ≤ A(p).

We distribute the nodes in Dsr to the sets of A^X_s(p). The number of such options is O(A^X(p)^|Dsr|). Consider a set Z ∈ A^X_s(p). We can treat its nodes as identical cells in which we need to place at most |Dsr| balls representing the nodes Z received (e.g., given two nodes p1, p2 ∈ Z and a distribution d s.t. d(p1) = S1 and d(p2) = S2, it is not necessary to check a distribution d′ which is identical to d with the exception of d′(p1) = S2 and d′(p2) = S1). The number of such options is bounded, and since the number of sets in A^X_s(p) which received at least one node is bounded, the total number of functions on which it is sufficient to iterate is O(A^X(p)^|Dsr|), which is O(A(p)^|dsr(h)|). □

Lemma 19. The running time of Algorithm 2 is O(|E_P| |V_P|^max{dsr(H)−1,0} |E_H| (|V_P| + log |V_H|) + |V_H||E_H|). It runs in time O(|V_P| (|V_P| / log |V_P|)^max{dsr(H)−1,0} |E_H| (|V_P| / log |V_P| + log |V_H|) + |V_H| · |E_H|) when P is a directed tree and the number of labels is bounded.

Proof. By Lemma 17, the time required for the initialization of the algorithm is O(|V_H||E_H|). ∀h ∈ V_H, p ∈ V_P we call Calc a bounded number of times.

Consider a call Calc(h, p, S, f, g). The time of Steps 1–7 is O(|N(p)| + |N(h)|). The number of iterations of Step 8 is bounded, the time of Steps 8(a) and 8(b) is O(|N(p)| + |N(h)|), and the number of iterations of Step 8(c) is also bounded.

By Lemma 18 and its proof, we denote by d_1, ..., d_k the k = O(A(p)^|dsr(h)|) iterated functions in some iteration of Step 8(c)iv ordered s.t. ∀2 ≤ i ≤ k, |{x: d_i(x) ≠ d_i′(x)}| = 2 for some 1 ≤ i′ < i. Denote by G_1, ..., G_k their corresponding bipartite graphs. By Lemma 14, we can compute a maximum weight matching for G_1 in time O(|N(p)|[A(p)|N(h)| + (A(p) + |N(h)|) log(A(p) + |N(h)|)]). For 2 ≤ i ≤ k, there are two nodes in the set X of G_i which are endpoints of edges whose weights may differ from those in G_i′. By Lemma 15, we can compute a maximum weight matching for G_i by removing these two nodes from G_i′ and inserting them with their new weighted edges in time O(A(p)|N(h)| + (A(p) + |N(h)|) log(A(p) + |N(h)|)). Thus the time of the iteration is O(|N(p)| A(p)^max{|dsr(h)|−1,0} [A(p)|N(h)| + (A(p) + |N(h)|) log(A(p) + |N(h)|)]).

We get that the running time of the algorithm is O(|V_H||E_H| + |E_H| ∑_{p∈V_P} |N(p)| A(p)^max{|dsr(h)|−1,0} (A(p) + log |V_H|)).

We get the required running time by using A(p) ≤ |N(p)| and Lemma 13. □

3.6. Application to metabolic pathways

We model metabolic networks by enzyme graphs: Each node represents an enzyme and is labeled by its enzyme class (EC) number, and a directed edge from enzyme e1 to enzyme e2 exists iff a product of a reaction activated by e1 is a substrate for a reaction activated by e2.

As noted in [31], the choice of homeomorphism deletions has a biological reason in addition to the fact that it allows us to design efficient algorithms: A single enzyme in one pathway may replace a few consecutively acting enzymes in another pathway. The replacement can take place if the replacing enzyme is multifunctional and can thus catalyze several consecutive reactions, or if the enzyme uses an alternative catalysis that leads directly from the initial substrate to the final product.

We compare two labels using the following definition from [39]:

Definition 17. For an enzyme class h, C(h) denotes the number of enzymes whose classes are included under h. The information content of h is defined as I(h) = −log2 C(h). For two enzymes e_i and e_j whose lowest common upper class is h_ij, we consider I(h_ij) to express the similarity between e_i and e_j.
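As a rough illustration of Definition 17, the following Python sketch computes I(h_ij) for enzymes given by dotted EC numbers. It assumes that the class hierarchy is exactly the prefix structure of the EC numbers and that the empty prefix plays the role of the root class; the function names and the sample enzyme list are hypothetical and are not taken from the plugin.

```python
import math
from collections import Counter


def class_counts(ec_numbers):
    """C(h) for every class h, where h is a dotted prefix of an EC number."""
    counts = Counter()
    for ec in ec_numbers:
        parts = ec.split(".")
        # depth 0 is the root class (empty prefix) that contains every enzyme
        for depth in range(len(parts) + 1):
            counts[".".join(parts[:depth])] += 1
    return counts


def lowest_common_class(ec_a, ec_b):
    """Longest common dotted prefix of two EC numbers."""
    common = []
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        if a != b:
            break
        common.append(a)
    return ".".join(common)


def similarity(ec_a, ec_b, counts):
    """I(h_ij) = -log2 C(h_ij) for the lowest common upper class h_ij."""
    return -math.log2(counts[lowest_common_class(ec_a, ec_b)])


# Hypothetical enzyme collection used only for this example.
enzymes = ["1.1.1.1", "1.1.1.2", "1.2.3.4", "2.7.1.1"]
counts = class_counts(enzymes)
print(similarity("1.1.1.1", "1.1.1.2", counts))
print(similarity("1.1.1.1", "2.7.1.1", counts))
```

Under this definition the score is 0 for two enzymes of a class that contains a single enzyme and becomes more negative as the lowest common class grows, so more specific shared classes yield higher similarity.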

In order to test our algorithms on real data, we wrote a plugin for Cytoscape [37], which is an open source bioinformatics software platform for visualizing and analyzing biological networks. The plugin chooses between Algorithm 1 and Algorithm 2 according to the input.

We extracted metabolic pathways from Biocyc [10]. Pathways which are not multisource trees have only few cycles in their undirected underlying graphs, and these cycles are rarely directed cycles [31]. We noticed that the dsr and frb values of the DAGs are mostly 0 or 1. Therefore, our plugin is applicable to many metabolic pathways.

In Fig. 6 we present alignments of the phenylalanine, tyrosine and tryptophan pathway as examples for our plugin outputs. We used the parameter −8 for each node deletion (regardless of its label), and set d_H = d_P = −2. In all parts, the left pathway is the host and it belongs to Escherichia coli. In part A the right pathway belongs to Saccharomyces cerevisiae, in part B it belongs to Caulobacter crescentus, and in part C it belongs to Buchnera aphidicola. Note that some of the pathways are not multisource trees. However, their dsr and frb values are 0 or 1. Moreover, each of the alignments deletes nodes from its pattern.

4. Conclusions

In this article we have presented several algorithms for pattern matching problems which have applications to bioinformatics.

Fig. 6. Alignments of the phenylalanine, tyrosine and tryptophan pathways as output by our plugin. The cost of each node deletion is −8, and d_H = d_P = −2. In all parts, the left pathway is the host and it belongs to Escherichia coli. In part A the right pathway belongs to Saccharomyces cerevisiae, in part B it belongs to Caulobacter crescentus, and in part C it belongs to Buchnera aphidicola. All pathways were extracted from Biocyc [10].

First we have presented an algorithm for the general case of the topology-free network query problem that runs in time O∗(2^k) and whose space complexity is polynomial. Then we extended this algorithm to three variants of the problem. In particular, we presented an algorithm for the weighted variant of this problem that runs in time O∗(2^k W). The algorithm of Bruckner et al. [9] runs in time O∗(k! 3^k). Thus, it is interesting to design algorithms with improved running times whose dependencies on W are polylogarithmic.

We have also presented two algorithms for the alignment network query problem. We used homeomorphism, did not bound P or the number of deletions and considered a scoring scheme in which we can pay an extra cost for each deleted component in the solution. Our first algorithm is suitable for multisource trees and the second for a certain family of DAGs. We have noted that this family is suitable for querying many metabolic pathways. However, it is interesting to extend the second algorithm to handle cycles so we can query efficiently other biological networks as well.

References

[1] N. Alon, R. Yuster, U. Zwick, Color coding, J. Assoc. Comput. Mach. 42 (1995) 844–856.
[2] A.M. Ambalath, R. Balasundaram, R.H. Chintan, K. Venkata, M. Neeldhara, P. Geevarghese, M.S. Ramanujan, On the kernelization complexity of colorful motifs, in: Proc. 5th International Symp. on Parameterized and Exact Computation, 2010, pp. 14–25.
[3] E. Banks, E. Nabieva, R. Peterson, M. Singh, Netgrep: fast network schema searches in interactomes, Genome Biol. 9 (2008) R138.
[4] N. Betzler, R. Bevern, M.R. Fellows, C. Komusiewicz, R. Niedermeier, Parameterized algorithmics for finding connected motifs in biological networks, IEEE/ACM Trans. Comput. Biol. Bioinform. 8 (2011) 1296–1308.
[5] N. Betzler, M.R. Fellows, C. Komusiewicz, R. Niedermeier, Parameterized algorithms and hardness results for some graph motif problems, in: Proc. 19th Annual Symp. on Combinatorial Pattern Matching, 2008, pp. 31–43.
[6] A. Björklund, T. Husfeldt, P. Kaski, M. Koivisto, Narrow sieves for parameterized paths and packings, CoRR, arXiv:abs/1007.1161, 2010.
[7] A. Björklund, P. Kaski, L. Kowalik, Probably optimal graph motifs, in: Proc. 36th International Symp. on Theoretical Aspects of Computer Science, 2013, pp. 20–31.
[8] G. Blin, F. Sikora, S. Vialette, Querying graphs in protein–protein interactions networks using feedback vertex set, IEEE/ACM Trans. Comput. Biol. Bioinform. 7 (2010) 628–635.
[9] S. Bruckner, F. Hüffner, R.M. Karp, R. Shamir, R. Sharan, Topology-free querying of protein interaction networks, J. Comput. Biol. 17 (2010) 237–252.
[10] R. Caspi, T. Altman, J.M. Dale, K. Dreher, C.A. Fulcher, F. Gilham, P. Kaipa, A.S. Karthikeyan, A. Kothari, M. Krummenacker, M. Latendresse, L.A. Mueller, S. Paley, L. Popescu, A. Pujar, A.G. Shearer, P. Zhang, P.D. Karp, The metacyc database of metabolic pathways and enzymes and the biocyc collection of pathway/genome databases, Nucleic Acids Res. 38 (2010) 473–479.
[11] B. Chazelle, A minimum spanning tree algorithm with inverse-ackermann type complexity, J. Assoc. Comput. Mach. 47 (2000) 1028–1047.
[12] Q. Cheng, R.W. Harrison, A. Zelikovsky, Homomorphisms of multisource trees into networks with applications to metabolic pathways, in: Proc. 7th IEEE International Conf. on Bioinform. and Bioengineering, 2007, pp. 350–357.
[13] R.A. DeMillo, R.J. Lipton, A probabilistic remark on algebraic program testing, Inf. Process. Lett. 7 (1978) 193–195.
[14] R. Dondi, G. Fertin, S. Vialette, Complexity issues in vertex-colored graph pattern matching, J. Discrete Algorithms 9 (2011) 82–99.
[15] R. Dondi, G. Fertin, S. Vialette, Finding approximate and constrained motifs in graphs, Theor. Comput. Sci. 438 (2013) 10–21.
[16] J. Edmonds, R.M. Karp, Theoretical improvements in algorithmic efficiency for network flow problems, J. Assoc. Comput. Mach. 19 (1972) 248–264.
[17] M.R. Fellows, G. Fertin, D. Hermelin, S. Vialette, Upper and lower bounds for finding connected motifs in vertex-colored graphs, J. Comput. Syst. Sci. 77 (2011) 799–811.
[18] M.L. Fredman, R.E. Tarjan, Fibonacci heaps and their uses in improved network optimization algorithms, J. Assoc. Comput. Mach. 34 (1987) 596–615.
[19] M.R. Garey, D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman, New York, 1979.
[20] S. Guillemot, F. Sikora, Finding and counting vertex-colored subtrees, Algorithmica 65 (2013) 828–844.
[21] D. Harvey, D.S. Roche, An in-place truncated Fourier transform and applications to polynomial multiplication, in: Proc. 35th International Symp. on Symbolic and Algebraic Comput., 2010, pp. 325–329.
[22] F. Hüffner, S. Wernicke, T. Zichner, Algorithm engineering for color-coding with applications to signaling pathway detection, Algorithmica 52 (2008) 114–132.
[23] I. Koutis, Faster algebraic algorithms for path and packing problems, in: Proc. 35th International Colloquium on Automata, Languages and Programming, 2008, pp. 575–586.
[24] I. Koutis, Constrained multilinear detection for faster functional motif discovery, CoRR, arXiv:abs/1206.3483, 2012.
[25] V. Lacroix, C.G. Fernandes, M.F. Sagot, Motif search in graphs: application to metabolic networks, IEEE/ACM Trans. Comput. Biol. Bioinform. 3 (2006) 360–368.
[26] A.S. LaPaugh, R.L. Rivest, The subgraph homeomorphism problem, J. Comput. Syst. Sci. 20 (1980) 133–149.
[27] A. Lozano, R.Y. Pinter, O. Rokhlenko, G. Valiente, M. Ziv-Ukelson, Seeded tree alignment, IEEE/ACM Trans. Comput. Biol. Bioinform. 5 (2008) 503–513.
[28] A. Mano, T. Tuller, O. Béjà, R.Y. Pinter, Comparative classification of species and the study of pathway evolution based on the alignment of metabolic pathways, BMC Bioinform. 11 (2010) S38.
[29] M. Mucha, P. Sankowski, Maximum matchings via gaussian elimination, in: Proc. 45th IEEE Symp. on Foundations of Computer Science, 2004, pp. 248–255.
[30] R.Y. Pinter, O. Rokhlenko, D. Tsur, M. Ziv-Ukelson, Approximate labelled subtree homeomorphism, J. Discrete Algorithms 6 (2008) 480–496.
[31] R.Y. Pinter, O. Rokhlenko, E. Yeger-Lotem, M. Ziv-Ukelson, Alignment of metabolic pathways, Bioinformatics 21 (2005) 3401–3408.
[32] J.T. Schwartz, Fast probabilistic algorithms for verification of polynomial identities, J. Assoc. Comput. Mach. 27 (1980) 701–717.
[33] J. Scott, T. Ideker, R.M. Karp, R. Sharan, Efficient algorithms for detecting signaling pathways in protein interaction networks, J. Comput. Biol. 13 (2006) 133–144.
[34] R. Shamir, D. Tsur, Faster subtree isomorphism, J. Algorithms 33 (1999) 267–280.
[35] R. Sharan, B. Dost, T. Shlomi, N. Gupta, E. Ruppin, V. Bafna, QNet: a tool for querying protein interaction networks, J. Comput. Biol. 15 (2008) 913–925.
[36] T. Shlomi, D. Segal, E. Ruppin, R. Sharan, QPath: a method for querying pathways in a protein–protein interaction networks, BMC Bioinform. 7 (2006) 199.
[37] M. Smoot, K. Ono, J. Ruscheinski, P.L. Wang, T. Ideker, Cytoscape 2.8: new features for data integration and network visualization, Bioinformatics 27 (2011) 431–432.
[38] T. Tholey, A dynamic data structure for maintaining disjoint paths information in digraphs, in: Proc. 14th International Symp. on Algorithms and Comput., 2003, pp. 565–574.
[39] Y. Tohsato, H. Matsuda, A. Hashimoto, A multiple alignment algorithm for metabolic pathway analysis using enzyme hierarchy, in: Proc. 8th International Conf. on Intelligent Syst. for Molecular Biol., 2000, pp. 376–383.
[40] M. Vingron, M.S. Waterman, Sequence alignment and penalty choice, review of concepts, case studies and implications, J. Mol. Biol. 235 (1994) 1–12.


3.2 Partial Information Network Queries

Ron Y. Pinter, Hadas Shachnai and Meirav Zehavi. Partial Information Network Queries.

Journal of Discrete Algorithms (JDA), 35 (invited IWOCA 2013 issue):129–145, 2015.

• Preliminary version: Ron Y. Pinter and Meirav Zehavi. Partial Information

Network Queries. In the proc. of the 24th International Workshop on Combinatorial

Algorithms (IWOCA), pages 362–375, 2013.



Journal of Discrete Algorithms 31 (2015) 129–145


Partial Information Network Queries ✩

Ron Y. Pinter, Hadas Shachnai, Meirav Zehavi ∗

Department of Computer Science, Technion – Israel Institute of Technology, Haifa 32000, Israel

Article info

Article history: Available online 3 December 2014

Keywords: Parameterized algorithm; Pattern matching; Partial information network query; Alignment network query; Topology-free network query

Abstract

We study the Partial Information Network Query (PINQ) problem, which generalizes two problems that often arise in bioinformatics: the Alignment Network Query (ANQ) problem and the Topology-Free Network Query (TFNQ) problem. In both ANQ and TFNQ we have a pattern P and a graph H , and we seek a subgraph of H that resembles P . ANQ requires knowing the topology of P , while TFNQ ignores it. PINQ fits the scenario where partial information is available on the topology of P . Our main result is a parameterized algorithm that handles inputs for PINQ in which P is a set of trees. This algorithm significantly improves the best known O ∗ running time in solving TFNQ. We also improve the best known O ∗ running times in solving two special cases of ANQ.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Algorithms for the Alignment Network Query (ANQ) and Topology-Free Network Query (TFNQ) problems provide means to study the function and evolution of biological networks. Given a pattern P and a host graph H , these queries seek a subgraph of H that resembles P . With the increasing amount of information we have on biological networks, ANQ and TFNQ are becoming widely spread (see, e.g., [14] and [30]). We note that similar queries for sequences have been studied and used extensively in the past four decades [22].

TFNQ, also known as Graph Motif, requires only the connectivity of the solution, while ANQ requires resemblance between the topology of P and the solution. A user having partial information on the topology of P can either run an alignment network query for each possible topology for P, given this partial information, or run a topology-free network query. The first method is inefficient, while the second may output undesirable results that contradict the partial information on P. We present a generalization of ANQ and TFNQ, which we call the Partial Information Network Query (PINQ) problem, that fits the scenario where only partial information is available on P.

Parameterized algorithms are an approach to solve NP-hard problems by confining the combinatorial explosion to a parameter k. More precisely, a problem is fixed-parameter tractable (FPT) with respect to a parameter k if an instance of size n can be solved in time O∗(f(k)) for some function f [23].¹

In this paper we present parameterized algorithms for NP-hard special cases of PINQ. In particular, we significantly improve the best known O ∗ running time in solving TFNQ.

Abbreviations: Partial Information Network Query (PINQ); Alignment Network Query (ANQ); Topology-Free Network Query (TFNQ).

✩ A preliminary version of this paper appeared in the proceedings of the 24th International Workshop on Combinatorial Algorithms (IWOCA'13) [24].

* Corresponding author.
E-mail addresses: [email protected] (R.Y. Pinter), [email protected] (H. Shachnai), [email protected] (M. Zehavi).

¹ O∗ hides factors polynomial in the input size.

http://dx.doi.org/10.1016/j.jda.2014.11.007
1570-8667/© 2014 Elsevier B.V. All rights reserved.



Notation: Given a graph G, let V(G) and E(G) denote its node set and edge set, respectively. Given U ⊆ V(G), let G[U] denote the subgraph of G induced by U. We denote by l(v) and N(v) the label and neighbor set of a node v, respectively. Given a set of tuples A, i ∈ N and an element e, let A[(i, e)] be the set of tuples in A in which e appears in the ith position.

1.1. Problem statement

Roughly speaking, given a graph H and a set of graphs P , PINQ asks if H has a connected subset of vertices that can be partitioned, such that each of the induced subgraphs resembles a different graph in P .

Formally, the input for PINQ consists of

• L – A set of labels.
• Δ : L × L → {−∞} ∪ R – A label-to-label similarity score table.
• P – A set of labeled graphs P_1, P_2, ..., P_t.
• H, w : E(H) → R – An edge-weighted labeled graph.
• W ∈ R – A minimum score (for a solution).

A solution consists of a connected subgraph S of H, a partition of V(S) into the subsets {V_S^1, ..., V_S^t}, and an isomorphism m_i between S[V_S^i] and P_i, for all 1 ≤ i ≤ t, such that

• ∑_{1≤i≤t} ∑_{v∈V_S^i} Δ(l(v), l(m_i(v))) + ∑_{e∈E(S)} w(e) ≥ W.
• Any cycle in S is completely contained in S[V_S^i], for some 1 ≤ i ≤ t.
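The score condition in the first requirement can be evaluated as in the following Python sketch. It is only meant to make the formula concrete: the function name `solution_score` and the dictionary-based encoding of labels, mappings and weights are assumptions made for the example, and the sketch does not check connectivity, the isomorphisms, or the cycle requirement.

```python
def solution_score(mappings, host_label, pattern_labels, delta, solution_edges, w):
    """Label similarity of all mapped nodes plus the weight of all edges of S.

    mappings[i] maps each node v of the i-th part of S to the node m_i(v) of P_i,
    and pattern_labels[i] gives the labels of the nodes of P_i.
    """
    label_score = 0
    for m_i, labels_i in zip(mappings, pattern_labels):
        for v, u in m_i.items():
            label_score += delta[(host_label[v], labels_i[u])]
    return label_score + sum(w[e] for e in solution_edges)


# Hypothetical instance with t = 2 pattern graphs.
host_label = {"a": "A", "b": "B", "c": "A"}
pattern_labels = [{0: "A", 1: "B"}, {0: "A"}]
mappings = [{"a": 0, "b": 1}, {"c": 0}]
delta = {("A", "A"): 2, ("B", "B"): 3, ("A", "B"): -1, ("B", "A"): -1}
solution_edges = [("a", "b"), ("b", "c")]
w = {("a", "b"): 1.5, ("b", "c"): 0.5}

score = solution_score(mappings, host_label, pattern_labels, delta, solution_edges, w)
print(score, score >= 8)  # compare against an assumed minimum score W = 8
```

Entries of the score table equal to −∞ can be represented by `float("-inf")`, in which case any solution using such a pair of labels fails the threshold for every finite W.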

Let V(P) = ⋃_{1≤i≤t} V(P_i), and k = |V(P)|. We note that the cycle requirement allows us to avoid solving a generalization of the Clique problem, which is W[1]-hard [12]. Indeed, without the cycle requirement, Clique is the special case where t = k, Δ(l, l′) = 0 for all l, l′ ∈ L, w(e) = 1 for all e ∈ E, and W = k(k − 1)/2. However, with the cycle requirement, each clique on at least 3 nodes contained in a solution must be entirely contained in a graph in P. Thus, for certain families of graphs (e.g., trees), the corresponding special case of PINQ does not generalize Clique.

Special cases of PINQ: We note that ANQ is the special case where t = 1, and all the edge weights are 0. Moreover, TFNQ is the special case where t = k, and Δ(l, l′) ∈ {−∞, 0} for all l, l′ ∈ L.

1.2. Prior work

The alignment network query problem: ANQ is NP-hard even if the single graph in P is a path, since this case generalizes the Hamiltonian path problem [15].

Pinter et al. [27] gave a polynomial time algorithm that handles inputs for ANQ in which both P1 and H are trees. This algorithm was used to perform inter-species and intra-species alignments of metabolic pathways [26], and a pathway evolution study [21]. Recently, it was extended to handle a certain family of DAGs [25].

Another approach, based on color coding [1], enables H to be a general graph, and provides parameterized algorithms with parameter k. This approach is used by QPath [29] to perform simple path queries in time O∗(5.437^k). QNet [11] extends QPath by allowing P_1 to be a graph whose treewidth tw is bounded. Its running time is O∗(8.155^k |V(H)|^(tw+1)). PADA1 [6] is an alternative to QNet that bounds the size fvs of the feedback vertex set of P_1 instead of its treewidth. Its running time is O∗(8.155^k |V(H)|^fvs). Hüffner et al. [17] reduced the running time of QPath to O∗(4.314^k). All of these algorithms are randomized.

We note that there are several problems related to ANQ that have applications in bioinformatics, and refer the reader to the surveys [14,31] for the precise details.

The topology-free network query problem: Unweighted TFNQ (i.e., TFNQ restricted to inputs in which all the edge weights are 0) was introduced by Lacroix et al. [20], and TFNQ was first studied by Bruckner et al. [7]. Lacroix et al. [20] proved that unweighted TFNQ is NP-hard even if H is a tree. On the positive side, TFNQ, when parameterized by k, is in FPT [7].

In recent years, many papers investigated the parameterized complexity of unweighted TFNQ and other closely related problems (see [2,3,5,28,32,7,9,10,13,16,19,25]). Yet, prior to this paper, the algorithm of best O∗ running time for TFNQ, which does not depend on the numeric value of W, was the one of Bruckner et al. [7], running in time O∗(k! 3^k). Table 1 presents known parameterized algorithms for (weighted) TFNQ. All of the algorithms are randomized.

1.3. Our contribution

For the sake of clarity, we start by presenting an algorithm called ANQ-Alg, which handles inputs for PINQ where H is a general graph, and P is a set consisting of a single tree. ANQ-Alg runs in time O∗(6.75^k), which improves the O∗ running times of QNet and PADA1 for inputs where P_1 is a tree. For the special case in which P_1 is a path, ANQ-Alg runs in time O∗(4^k), which further improves the O∗ running time of QPath. ANQ-Alg is based on the randomized divide-and-conquer method [8].



Table 1. Parameterized algorithms for (weighted) TFNQ. The first two algorithms require W and the edge weights to be nonnegative integers, and their running times depend on the numeric value of W.

Reference               Running time                    Method
Guillemot et al. [16]   $O^*(4^k W^2)$                  multilinear detection [18]
Pinter et al. [25]      $O^*(2^k W)$                    narrow sieves [4]
Bruckner et al. [7]     $O^*(k!3^k)$                    color coding [1]
This paper              $O^*(20.25^{k+O(\log^2 k)})$    randomized divide-and-conquer [8]

Our main result is the second algorithm, PINQ-Alg, which builds on ANQ-Alg. PINQ-Alg handles inputs for PINQ in which H is a general graph and P is a set of trees. It runs in time $O^*(6.75^{k+O(\log^2 k)}3^t)$. In particular, it solves TFNQ in time $O^*(20.25^{k+O(\log^2 k)})$, i.e., for t = k, which significantly improves the previous best $O^*(k!3^k)$ running time of Bruckner et al. [7].²
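The step from the PINQ bound to the TFNQ bound is a one-line substitution of t = k; it is spelled out here only as a reading aid, using nothing beyond the two running times stated above:

```latex
\[
  6.75^{\,k+O(\log^2 k)} \cdot 3^{t}\Big|_{t=k}
  \;=\; (6.75 \cdot 3)^{k} \cdot 6.75^{\,O(\log^2 k)}
  \;=\; 20.25^{\,k} \cdot 6.75^{\,O(\log^2 k)}
  \;\le\; 20.25^{\,k+O(\log^2 k)} .
\]
```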

2. An algorithm for ANQ

We first present an algorithm, called ANQ-Alg, for the special case of PINQ where P is a set consisting of a single tree (i.e., t = 1 and P1 is a tree). We use the randomized divide-and-conquer method of [8]: Our algorithm ANQ-Alg randomly divides the problem into two smaller subproblems that it recursively solves, and then combines the answers.

Roughly speaking, the main idea of ANQ-Alg is to divide the problem of finding one tree T that is a solution³ into two subproblems of finding two “almost disjoint” subtrees of T that are small, yet allow constructing T. At first, we root $P_1$ and partition it into two subtrees, $P_1^L$ and $P_1^R$, containing (together) all of its nodes, and having exactly one common node, $m_1$, which is the root of $P_1^L$. It suffices to consider only subtrees that contain (each) at most $\frac{2}{3}|V(P_1)| + 1$ nodes. To ensure that there is no overlap between the subtrees of H to which we (isomorphically) map $P_1^L$ and $P_1^R$, excluding the node to which we map $m_1$, we try several random partitions of the node-set of H into two subsets, $U_L$ and $U_R$. We cannot try only one partition, since we do not know in advance which nodes in H are relevant to mapping $P_1^L$, and which to mapping $P_1^R$.

Consider some examined partition $(U_L, U_R)$. We solve the subproblem that concerns $P_1^L$ and $U_L$, obtaining several mappings of $P_1^L$ into subtrees of H, using only nodes from $U_L$, except for $m_1$, which is mapped to a node in $U_R$. Each mapping is represented by a node in $U_R$, to which $m_1$ is mapped, and its score. To ensure that we do not obtain too many mappings, for each node h in $U_R$, we keep at most one score: the best score that can be obtained by mapping $P_1^L$ to a subtree of H, using only h and nodes from $U_L$, where $m_1$ is mapped to h. Now, we need to map $P_1^R$ to a subtree of H in a manner that corresponds to a mapping of the entire tree $P_1$. Observe that this can be achieved by mapping $P_1^R$ in a manner that maps $m_1$ to a node h that appears, with some score s, as a possible answer to the first subproblem, where we assume that the score of matching $m_1$ with h is s. Next, assume that we are trying to map $P_1^R$ in this manner.

To solve the subproblem of mapping $P_1^R$, we again try to partition it into two subtrees, $(P_1^R)^R$ and $(P_1^R)^L$, having a common node $m_2$. Now, however, we need to consider the answers of the subproblem related to $P_1^L$ when mapping the subtree among $(P_1^R)^R$ and $(P_1^R)^L$ that contains $m_1$. Suppose, for instance, that this subtree is $(P_1^R)^R$. Then, when mapping $(P_1^R)^R$, we have to consider both answers related to $P_1^L$ and answers related to $(P_1^R)^L$. Therefore, a problem considered at some recursive stage corresponds to finding a mapping of a subtree T of $P_1$, considering answers (pairs of a node in H and a score) related to already computed mappings of several subtrees of $P_1$ whose roots belong to T. When dividing such a problem into two subproblems, each answer that is a part of the original problem, corresponding to some tree T′, becomes part of the subproblem whose corresponding subtree contains the root of T′.

In Section 2.1, we give a more formal overview of ANQ-Alg. Section 2.2 contains some definitions required for the pseudocode of ANQ-Alg (given in Section 2.3). Finally, in Section 2.4, we prove the correctness and analyze the running time of ANQ-Alg.

2.1. Overview

Recall that, following the randomized divide-and-conquer method, ANQ-Alg is a recursive algorithm. Next, we use Fig. 1 to describe and illustrate a recursive stage in further detail.

Each recursive stage refers to a rooted subtree R of $P_1$, a set $U \subseteq V(H)$ and a set Solved of rooted subtrees of R, such that the subgraph R′ of R induced by the nodes in R that do not belong to any tree in Solved and the roots of the trees in Solved is a tree. For example, part A of Fig. 1 illustrates an input for algorithm ANQ-Alg and a recursive stage, where Solved contains the squares, triangles and hexagons trees, and R′ is the subtree of R induced by the bold nodes. Each tree in Solved has several pairs, where each such pair consists of a node $h \in V(H)$ and a score s, and it refers to an isomorphism of score s between the tree (to which the pair belongs) and a subtree of H that maps the root of the tree to h. For example, part A of Fig. 1 illustrates the pairs (c, 2) and (e, 2) that belong to the squares tree.

2 The algorithms of [16,25] are faster than the algorithm of [7] and our algorithm only if W and the edge weights are nonnegative polynomially bounded integers.

3 Recall that, informally, a subtree T of H is a solution if there is an isomorphism between T and $P_1$ such that the sum of the scores of the labels it matches, to which we add the sum of the edge weights of T, is at least W (see Section 1.1).

Fig. 1. An illustration of a randomized divide-and-conquer step in ANQ-Alg.

Using such pairs, algorithm ANQ-Alg computes scores of isomorphisms between R and subtrees of H by computing scores of isomorphisms between the smaller tree R′ and subtrees of H which map the nodes in R′ (excluding its root) to nodes in U. In the base cases of the recursion, R′ contains at most two nodes, and thus it can be easily mapped. In the step of the recursion, algorithm ANQ-Alg divides R′ into two subtrees that have a common node, and randomly divides U into two subsets. Algorithm ANQ-Alg uses the first subset to map the first subtree; then it uses the corresponding results and the second subset to map the second subtree. For example, in order to solve the problem illustrated in part A of Fig. 1, algorithm ANQ-Alg divides it into the subproblems illustrated in parts B and C. Algorithm ANQ-Alg solves the subproblem illustrated in part B and uses its answer (i.e., the isomorphism of the hexagons tree in part C) to solve the subproblem illustrated in part C.

2.2. Some definitions

Each of the definitions given below is illustrated in Fig. 1.

First, choose some node $p_3 \in V(P_1)$, and add $p_1$, $p_2$, $\{p_1, p_2\}$ and $\{p_2, p_3\}$ (these are new nodes and edges) to $P_1$. Also, add a new node $h^*$ to H and connect it to all the nodes in V(H) by edges of weight 0. We thus avoid a special treatment of the first call to the recursive procedure ANQ-Rec, which is the main part of our algorithm ANQ-Alg.



Root $P_1$ at $p_1$, and use a preorder to denote its nodes by $p_1, p_2, \ldots, p_{|V(P_1)|}$. Given nodes p and $n_p \in N(p)$, let $T(p, n_p)$ denote the subtree induced by p, its children whose indices are greater than that of $n_p$, and the descendants of these children. For example, in part A of Fig. 1, we have that $T(p_3, p_2) = R$.
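The definition of $T(p, n_p)$ can be illustrated with a minimal Python sketch. It assumes nodes are identified by their preorder indices and that the rooted tree is given as a child-list dictionary; the function name `subtree_nodes` and the 6-node example tree are hypothetical and are not the tree of Fig. 1.

```python
from typing import Dict, List, Set


def subtree_nodes(children: Dict[int, List[int]], p: int, n_p: int) -> Set[int]:
    """Return the node set of T(p, n_p).

    Nodes are named by their preorder index, so the parent of p has a
    smaller index than p and every child of p has a larger one.
    T(p, n_p) consists of p, the children of p whose index exceeds n_p,
    and all descendants of those children.
    """
    nodes = {p}
    stack = [c for c in children.get(p, []) if c > n_p]
    while stack:
        v = stack.pop()
        nodes.add(v)
        stack.extend(children.get(v, []))
    return nodes


# Hypothetical tree in preorder: 1 -> 2, 2 -> {3, 5}, 3 -> 4, 5 -> 6.
children = {1: [2], 2: [3, 5], 3: [4], 5: [6]}

print(subtree_nodes(children, 2, 1))  # n_p is the parent: the whole subtree of node 2
print(subtree_nodes(children, 2, 3))  # only the children of node 2 that come after node 3
```

When $n_p$ is the parent of p, every child qualifies and the result is the full subtree rooted at p; when $n_p$ is the last child of p, the result is the single node p.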

Each stage of ANQ-Rec involves a subtree of P1 of the form T (r, nr), for nodes r and nr ∈ N(r), a set U ⊆ V (H), and a set Solved of disjoint subtrees of T (r, nr). The trees in Solved are of the form T (p, np), and can thus be represented by pairs (p, np). Definition 1 refers to this set of trees.

Definition 1. Given Solved ⊆ $\{(p, n_p) : p \in V(P_1), n_p \in N(p)\}$, $r \in V(P_1)$ and $n_r \in N(r)$, we say that Solved is an $(r, n_r)$-subtree set if its trees are disjoint subtrees of $T(r, n_r)$ and one of them is rooted at r (i.e., Solved[(1, r)] ≠ ∅).

For example, in part A of Fig. 1, we have that Solved = $\{(p_3, p_4), (p_6, p_5), (p_5, p_8)\}$ is a $(p_3, p_2)$-subtree set.

Each tree $T(p, n_p)$ in Solved has several scores. Each score corresponds to its mapping to a subtree T of H. We only know the root of T, and it belongs to U iff p ≠ r. Moreover, no tree has different scores for isomorphisms that map its root to the same node in V(H). We use a tuple $(p, n_p, h, s)$ to represent an isomorphism of score s between $T(p, n_p)$ and a subtree of H that maps p to h. Definition 2 refers to these tuples. We note that PS stands for Partial Solutions.

Definition 2. Let PS ⊆ $\{(p, n_p, h, s) : p \in V(P_1), n_p \in N(p), h \in V(H), s \in \mathbb{R}\}$, $r \in V(P_1)$, $n_r \in N(r)$ and $U \subseteq V(H)$. We say that PS is an $(r, n_r, U)$-set if

1. $\{(p, n_p) : \mathrm{PS}[(1, p), (2, n_p)] \neq \emptyset\}$ is an $(r, n_r)$-subtree set.
2. $\forall (p, n_p, h, s) \in \mathrm{PS}$: $\forall s'[(p, n_p, h, s') \in \mathrm{PS} \rightarrow s = s']$ and $(p \neq r \leftrightarrow h \in U)$.

For example, in part A of Fig. 1, we have that PS = $\{(p_3, p_4, c, 2), (p_3, p_4, e, 2), (p_6, p_5, i, 2), (p_5, p_8, i, 3), (p_5, p_8, h, 3)\}$ is a $(p_3, p_2, U)$-set.

Suppose we have an (r, nr, U )-set PS. We find the best options (corresponding to different mappings of r) to map the roots of the subtrees of T (r, nr) in PS and the nodes in T (r, nr) which do not belong to these subtrees to subtrees whose nodes (excluding the mappings of r) are in U . Thus we map all T (r, nr) and use only nodes in U and nodes that we have already used for computing PS. Definition 3 concerns this set of nodes in T (r, nr) that we want to map.

Definition 3. Given an $(r, n_r, U)$-set PS, T(PS) is the subtree of $P_1$ induced by $\{v \in V(T(r, n_r)) : \nexists (p, n_p, h, s) \in \mathrm{PS}$ s.t. $v \in V(T(p, n_p)) \setminus \{p\}\}$.

For example, in part A of Fig. 1, we have that T(PS) is the subtree of R induced by its bold nodes.

We divide our problem into two smaller subproblems. We achieve this by finding a node $m \in V(T(\mathrm{PS}))$ and a neighbor $n_m \in N(m)$ that divide T(PS) into two smaller subtrees: $P_1[V(T(\mathrm{PS})) \cap V(T(m, n_m))]$ and $P_1[(V(T(\mathrm{PS})) \setminus V(T(m, n_m))) \cup \{m\}]$. Definition 4 refers to our division options.

Definition 4. Given an $(r, n_r, U)$-set PS, we define: mid(PS) = $\{(m, n_m, \mathit{size}_L, \mathit{size}_R) : m \in V(T(\mathrm{PS})), n_m \in N(m), \mathit{size}_L = |V(T(\mathrm{PS})) \cap V(T(m, n_m))| - 1, \mathit{size}_R = |V(T(\mathrm{PS}))| - \mathit{size}_L - 1\}$.

For example, in part A of Fig. 1, we have that mid(PS) = $\{(p_3, p_2, 4, 0), (p_3, p_4, 0, 4), (p_3, p_{11}, 0, 4), (p_4, p_3, 3, 1), (p_4, p_5, 0, 4), (p_5, p_4, 2, 2), (p_5, p_6, 1, 3), (p_5, p_8, 0, 4), (p_5, p_9, 0, 4), (p_6, p_5, 0, 4), (p_6, p_7, 0, 4), (p_8, p_5, 0, 4)\}$.

We seek a tuple $(m, n_m, \mathit{size}_L, \mathit{size}_R) \in$ mid(PS) that minimizes $\max\{\mathit{size}_L, \mathit{size}_R\}$. Then, as the following lemma implies, our new subproblems are small.

Lemma 1. Given a rooted tree T such that $v_1, v_2, \ldots, v_n$ is a preorder of V(T) and n ≥ 3, there are $v_i \in V(T)$ and $v_j \in N(v_i)$ such that $\max\{2, \lceil n/3 \rceil\} \le |V(T(v_i, v_j))| \le \lfloor 2n/3 \rfloor$. If T is a path, then there are $v_i \in V(T)$ and $v_j \in N(v_i)$ such that $|V(T(v_i, v_j))| = \lceil n/2 \rceil$.

Proof. Given $v_x, v_y \in V(T)$, denote $V_{x,y} = V(T(v_x, v_y))$. Also denote $U_1 = \{(v_x, v_y) : v_x \in V(T), v_y \in N(v_x), |V_{x,y}| \le \lfloor 2n/3 \rfloor\}$, and $U_2 = \{(v_x, v_y) : v_x \in V(T), v_y \in N(v_x), \max\{2, \lceil n/3 \rceil\} \le |V_{x,y}|\}$.

Since n ≥ 3, we have that if $|V_{1,2}| < \max\{2, \lceil n/3 \rceil\}$, then $|V_{2,1}| = n - |V_{1,2}| \ge \max\{2, \lceil n/3 \rceil\}$; therefore $U_2 \neq \emptyset$. Let $(v_i, v_j)$ be a pair in $U_2$ that minimizes $|V_{i,j}|$.

Suppose, by way of contradiction, that $(v_i, v_j) \notin U_1$. Since $|V_{i,j}| \ge 2$, we can denote by $v_l$ the child of $v_i$ with the smallest index that is greater than j. Since $|V_{l,i}| < |V_{i,j}|$, our choice of $(v_i, v_j)$ implies that $(v_l, v_i) \notin U_2$. Therefore
$$|V_{i,l}| = |V_{i,j}| - |V_{l,i}| \ge \left(\left\lfloor \tfrac{2n}{3} \right\rfloor + 1\right) - \left(\max\left\{2, \left\lceil \tfrac{n}{3} \right\rceil\right\} - 1\right) = \left\lfloor \tfrac{2n}{3} \right\rfloor - \max\left\{2, \left\lceil \tfrac{n}{3} \right\rceil\right\} + 2 \ge \max\left\{2, \left\lceil \tfrac{n}{3} \right\rceil\right\}.$$
We get that $(v_i, v_l) \in U_2$, but since $|V_{i,l}| < |V_{i,j}|$, this is a contradiction. Thus $(v_i, v_j) \in U_1$, which proves the first part of the lemma.

Now suppose that T is a path. Denote by l the index of the leaf in T that is not $v_n$. If $l < \lfloor n/2 \rfloor + 1$, then $i = \lfloor n/2 \rfloor + 1$ and j denotes the index of the father of $v_i$ (which exists since i > 1), and thus $|V_{i,j}| = |\{v_i, v_{i+1}, \ldots, v_n\}| = n - (i - 1) = \lceil n/2 \rceil$. Else if $l = \lfloor n/2 \rfloor + 1$, then i = 1 and j = 2, and thus $|V_{i,j}| = |\{v_1\} \cup \{v_{l+1}, v_{l+2}, \ldots, v_n\}| = 1 + n - (\lfloor n/2 \rfloor + 1) = \lceil n/2 \rceil$. Otherwise $i = l - \lceil n/2 \rceil + 1$ and j = i − 1 (note that j ≥ 1), and thus $|V_{i,j}| = |\{v_i, v_{i+1}, \ldots, v_l\}| = l - (i - 1) = \lceil n/2 \rceil$. □



2.3. The algorithm

We are now ready to present our algorithm ANQ-Alg, whose main component is a recursive procedure called ANQ-Rec. We first note that an input for ANQ-Rec is of the form $(r, n_r, U, \mathrm{PS})$, where $r \in V(P_1)$, $n_r \in N(r)$, $U \subseteq V(H)$, and PS is an empty set or an $(r, n_r, U)$-set. The output SOL of ANQ-Rec is an empty set or an $(r, n_r, U)$-set such that $\mathrm{SOL}[(1, r), (2, n_r)] = \mathrm{SOL}$ (i.e., the tuples in SOL represent mappings of $T(r, n_r)$).

ANQ-Alg(P, H, Δ, W):

1. Add elements to the input as described in Section 2.2.
2. SOL ⇐ ANQ-Rec($p_2$, $p_1$, $V(H) \setminus \{h^*\}$, $\{(p_2, p_3, h^*, 0)\}$).
3. Accept iff (SOL ≠ ∅ ∧ $\max_{(p, n_p, h, s) \in \mathrm{SOL}}\{s\} \ge W$).

The following is the pseudocode of ANQ-Rec.

ANQ-Rec(r,nr, U ,PS):

1. If PS = ∅ ∨ |V (T (PS))| = 1: Return PS.

We handle two base cases. PS = ∅ implies that we could not map some subtree of T (r, nr) in previous computations, and thus we return ∅.

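2. If |V(T(PS))| = 2:
(a) Denote by v the node in V(T(PS)) which is not r.
(b) If PS[(1, v)] = ∅: Return
$$\bigcup_{\substack{h \in V(H) \text{ s.t. } U \cap N(h) \neq \emptyset, \\ s \in \mathbb{R} \text{ s.t. } (r, v, h, s) \in \mathrm{PS}}} \Big\{\Big(r, n_r, h, \max_{h' \in U \cap N(h)} \big\{s + w(\{h, h'\}) + \Delta(l(v), l(h'))\big\}\Big)\Big\}.$$
(c) Return
$$\bigcup_{\substack{h \in V(H) \text{ s.t. } \bigcup_{h' \in N(h)}\{\mathrm{PS}[(1, v), (3, h')]\} \neq \emptyset, \\ s \in \mathbb{R} \text{ s.t. } (r, v, h, s) \in \mathrm{PS}}} \Big\{\Big(r, n_r, h, \max_{\substack{h' \in N(h),\, s' \in \mathbb{R} \text{ s.t.} \\ (v, r, h', s') \in \mathrm{PS}}} \big\{s + s' + w(\{h, h'\})\big\}\Big)\Big\}.$$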

We handle the two remaining base cases. They correspond to whether or not v is a root of a tree in PS. In both, for each mapping of r, we find the best legal mapping of v to a node h′ in U .
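As an illustration of the base case of Step 2(b), in which v is not the root of a tree in PS, the following Python sketch computes, for every node h to which r is already mapped, the best extension by a neighbor h′ ∈ U. The data layout is an assumption made for the sketch: the tuples (r, v, h, s) of PS are passed as a dictionary from h to s, edge weights are keyed by `frozenset` pairs, and the label-similarity function written Δ above is passed as the callable `delta`. All names are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Set

Node = Hashable


def base_case_unsolved_leaf(
    ps_r: Dict[Node, float],              # h -> s for the tuples (r, v, h, s) in PS
    U: Set[Node],                         # nodes of H that v may be mapped to
    adj: Dict[Node, List[Node]],          # adjacency lists of H
    w: Dict[FrozenSet[Node], float],      # edge weights of H
    delta: Callable[[str, str], float],   # label-similarity score
    label_v: str,                         # label of the pattern node v
    label_h: Dict[Node, str],             # labels of the nodes of H
) -> Dict[Node, float]:
    """Return h -> best score of mapping {r, v} with r mapped to h and v mapped into U."""
    out: Dict[Node, float] = {}
    for h, s in ps_r.items():
        candidates = [hp for hp in adj.get(h, []) if hp in U]
        if candidates:  # h is dropped when it has no neighbor in U
            out[h] = max(
                s + w[frozenset((h, hp))] + delta(label_v, label_h[hp])
                for hp in candidates
            )
    return out


# Hypothetical toy instance.
adj = {"x": ["a", "b"], "a": ["x"], "b": ["x"]}
w = {frozenset(("x", "a")): 1.0, frozenset(("x", "b")): 0.5}
labels = {"x": "L0", "a": "L1", "b": "L2"}


def delta(lp: str, lh: str) -> float:
    return 2.0 if lp == lh else 0.0


print(base_case_unsolved_leaf({"x": 3.0}, {"a", "b"}, adj, w, delta, "L2", labels))
```

Step 2(c) differs only in that the contribution of v comes from a stored score s′ of a tuple (v, r, h′, s′) instead of from a fresh label comparison.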

3. SOL ⇐ ∅.

The set SOL will hold tuples that represent the best mappings we find for T (r, nr).

4. Choose (m, nm, sizeL, sizeR) ∈ mid(PS) that minimizes max{sizeL, sizeR}.

We find the best nodes m and nm to divide our problem of mapping T (PS) into the two smaller subproblems of mapping V (T (PS)) ∩ V (T (m, nm)) and (V (T (PS)) \ V (T (m, nm))) ∪ {m}.

5. $\mathit{prob}_L \Leftarrow \dfrac{\mathit{size}_L}{|V(T(\mathrm{PS}))| - 1}$ and $\mathit{prob}_R \Leftarrow 1 - \mathit{prob}_L$.

6. Repeat $\dfrac{1}{(1 - 1/e)^2\, \mathit{prob}_L^{\mathit{size}_L}\, \mathit{prob}_R^{\mathit{size}_R}}$ times:

(a) $U_L \Leftarrow \emptyset$ and $U_R \Leftarrow U$.
(b) For each $h \in U$: With probability $\mathit{prob}_L$, move h from $U_R$ to $U_L$.

We randomly partition U into two sets, $U_L$ and $U_R$, that we use in the first and second subproblems, respectively, as follows. The nodes in $(V(T(\mathrm{PS})) \cap V(T(m, n_m))) \setminus \{m\}$ are mapped to nodes in $U_L$, and then the other nodes in $V(T(\mathrm{PS})) \setminus \{r\}$ are mapped to nodes in $U_R$. The probability $\mathit{prob}_L$ of a node to be in $U_L$ and the number of executions of Step 6 guarantee that with good probability the solutions to our subproblems allow solving our problem of mapping T(PS).
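Steps 5, 6(a) and 6(b) can be sketched in a few lines of Python. This is a minimal illustration under two assumptions: the repetition count is rounded up to an integer, and the random source is Python's `random.Random`. The function names are hypothetical.

```python
import math
import random
from typing import Hashable, Set, Tuple


def repetitions(size_l: int, size_r: int) -> int:
    """Number of iterations of Step 6, rounded up (the rounding is an assumption)."""
    total = size_l + size_r                  # equals |V(T(PS))| - 1
    prob_l = size_l / total
    prob_r = 1.0 - prob_l
    success = (1 - 1 / math.e) ** 2 * prob_l ** size_l * prob_r ** size_r
    return math.ceil(1 / success)


def random_partition(
    U: Set[Hashable], prob_l: float, rng: random.Random
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Steps 6(a)-(b): each node of U moves to U_L independently with probability prob_l."""
    U_L = {h for h in U if rng.random() < prob_l}
    return U_L, set(U) - U_L


rng = random.Random(0)
U = set(range(20))
U_L, U_R = random_partition(U, 0.5, rng)
print(repetitions(2, 2), len(U_L), len(U_R))
```

For a balanced split with sizeL = sizeR = 2, a fixed pair of target sets lands on the correct sides with probability 0.5⁴ = 1/16, so the loop runs about 16/(1 − 1/e)² ≈ 40 times.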



6. (c) $\mathrm{SOL}_L \Leftarrow \emptyset$ and $\mathrm{SOL}_R \Leftarrow \emptyset$.

The sets $\mathrm{SOL}_L$ and $\mathrm{SOL}_R$ will hold the solutions we find to our subproblems.

6. (d) $\mathrm{PS}_L \Leftarrow \{(p, n_p, h, s) \in \mathrm{PS} : p \in V(T(m, n_m)),\ p \neq m \leftrightarrow h \in U_L\}$.
(e) If PS[(1, m)] = ∅:
i. If $U_R = \emptyset$: Go to the next iteration.
ii. Add $\bigcup_{n_p \in N(m) \text{ s.t. } V(T(m, n_p)) = \{m\},\ h \in U_R} \{(m, n_p, h, \Delta(l(m), l(h)))\}$ to $\mathrm{PS}_L$.
(f) If $\exists p \in V(T(m, n_m))[\mathrm{PS}[(1, p)] \neq \emptyset \wedge \mathrm{PS}_L[(1, p)] = \emptyset]$: Go to the next iteration.

The set $\mathrm{PS}_L$ is initialized to hold the tuples in PS that are relevant to mapping $T(m, n_m)$. If it does not contain a tree rooted at m, then we add all the options of mapping the tree that contains only m to a node in $U_R$ (if $U_R = \emptyset$, then we skip the rest of the iteration). If we lost the tuples representing all the mappings in PS of a tree that is relevant to $\mathrm{PS}_L$, then we skip the rest of the iteration.

6. (g) $\mathrm{SOL}_L \Leftarrow$ ANQ-Rec($m, n_m, U_L, \mathrm{PS}_L$).

We solve our first subproblem.

6. (h) $\mathrm{PS}_R \Leftarrow \mathrm{SOL}_L \cup \{(p, n_p, h, s) \in \mathrm{PS} : p \notin V(T(m, n_m)),\ h \notin U_L\}$.
(i) If $\mathrm{SOL}_L = \emptyset \vee \exists p \notin V(T(m, n_m))[\mathrm{PS}[(1, p)] \neq \emptyset \wedge \mathrm{PS}_R[(1, p)] = \emptyset]$: Go to the next iteration.

The set $\mathrm{PS}_R$ is initialized to hold $\mathrm{SOL}_L$ and the tuples in PS that are relevant to our second subproblem. If $\mathrm{SOL}_L = \emptyset$ or we lost the tuples representing all the mappings in PS of a tree that is relevant to our second subproblem, then we skip the rest of the iteration.

6. (j) $\mathrm{SOL}_R \Leftarrow$ ANQ-Rec($r, n_r, U_R, \mathrm{PS}_R$).
(k) For each h, s s.t. $(r, n_r, h, s) \in \mathrm{SOL}_R \wedge \nexists s'[(r, n_r, h, s') \in \mathrm{SOL} \wedge s \le s']$: $\mathrm{SOL} \Leftarrow (\mathrm{SOL} \cup \{(r, n_r, h, s)\}) \setminus \mathrm{SOL}[(1, r), (2, n_r), (3, h)]$.
7. Return SOL.

We solve our second subproblem. Then we update SOL to hold the tuples representing the best mappings of $T(r, n_r)$ we have found so far.
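The update of Step 6(k) keeps, for every node h, only the best score seen so far. Since all tuples in SOL share the same first two components, a minimal sketch can store SOL as a dictionary from h to its score; this representation and the name `merge_best` are assumptions of the sketch.

```python
from typing import Dict, Hashable


def merge_best(sol: Dict[Hashable, float], sol_r: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Step 6(k): a pair (h, s) replaces the stored entry for h only if no stored score is >= s."""
    for h, s in sol_r.items():
        if h not in sol or sol[h] < s:
            sol[h] = s
    return sol


sol = {"c": 2.0}
merge_best(sol, {"c": 3.0, "e": 1.0})   # improves c, adds e
merge_best(sol, {"c": 2.5})             # ignored: 2.5 is not better than 3.0
print(sol)
```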

2.4. Correctness and running time

In this section we prove the following theorem.

Theorem 1. Algorithm ANQ-Alg solves inputs for PINQ in which P is a set of one tree in time $O(6.75^{k+O(\log k)}|E(H)|)$ and space $O(|V(H)| \log^2 k)$. If $P_1$ is a path, then it runs in time $O(4^{k+O(\log k)}|E(H)|)$.

First we prove the correctness of algorithm ANQ-Alg.

Let $r \in V(P_1)$, $n_r \in N(r)$, $U \subseteq V(H)$ and PS be an $(r, n_r, U)$-set. Given $h \in V(H)$, let $\mathrm{ISO}(r, n_r, U, \mathrm{PS})_h$ denote the set of every isomorphism M between T(PS) and a subtree of H, such that M(r) = h and for each $p \in V(T(\mathrm{PS}))$ the following condition holds.

1. If PS[(1, p)] = ∅: $M(p) \in U$. Denote $s(M, p) = \Delta(l(p), l(M(p)))$.
2. Else: PS[(1, p), (3, M(p))] ≠ ∅. Denote by s(M, p) the score s for which PS[(1, p), (3, M(p)), (4, s)] ≠ ∅.

Given an isomorphism $M \in \mathrm{ISO}(r, n_r, U, \mathrm{PS})_h$, let $s(\mathrm{ISO}(r, n_r, U, \mathrm{PS})_h, M)$ denote the score
$$\sum_{p \in V(T(\mathrm{PS}))} s(M, p) + \sum_{\{p, p'\} \in E(T(\mathrm{PS}))} w(\{M(p), M(p')\}).$$
Furthermore, if $\mathrm{ISO}(r, n_r, U, \mathrm{PS})_h \neq \emptyset$, let $s(\mathrm{ISO}(r, n_r, U, \mathrm{PS})_h)$ denote the score $\max_{M \in \mathrm{ISO}(r, n_r, U, \mathrm{PS})_h}\{s(\mathrm{ISO}(r, n_r, U, \mathrm{PS})_h, M)\}$.

Given an input I, let (P, H, Δ, W) denote the instance that we get after adding $p_1$, $p_2$, $\{p_1, p_2\}$, $\{p_2, p_3\}$, $h^*$ and its edges to I (in Step 1 of ANQ-Alg). Note that I has a solution iff there is an isomorphism M between $T(p_2, p_1)$ and a subtree S of H s.t. $M(p_2) = h^*$ and
$$\sum_{p \in V(T(p_2, p_1)) \setminus \{p_2\}} \Delta(l(p), l(M(p))) + \sum_{e \in E(S)} w(e) \ge W.$$
Thus, the following lemma implies the correctness of ANQ-Alg.

Lemma 2. Given $r \in V(P_1)$, $n_r \in N(r)$, $U \subseteq V(H)$ and ∅ or an $(r, n_r, U)$-set PS, the ∅ or $(r, n_r, U)$-set SOL returned by ANQ-Rec($r, n_r, U, \mathrm{PS}$) satisfies

1. If PS = ∅, then SOL = ∅.
2. Else $\forall h^* \in V(H)$:
(a) If $\mathrm{ISO}(r, n_r, U, \mathrm{PS})_{h^*} = \emptyset$, then $\mathrm{SOL}[(3, h^*)] = \emptyset$.
(b) Else:
i. For each s s.t. $(r, n_r, h^*, s) \in \mathrm{SOL}$, $s \le s(\mathrm{ISO}(r, n_r, U, \mathrm{PS})_{h^*})$.
ii. With probability at least 1 − 1/e, $(r, n_r, h^*, s(\mathrm{ISO}(r, n_r, U, \mathrm{PS})_{h^*})) \in \mathrm{SOL}$.

Proof. We prove the lemma by using induction on l = |V (T (PS))| (where |V (T (∅))| = 0). If 0 ≤ l < 3, then by Steps 1 and 2 of the pseudocode, the lemma holds.

Now suppose that l ≥ 3, and assume that the lemma holds $\forall r' \in V(P_1)$, $n'_r \in N(r')$, $U' \subseteq V(H)$ and ∅ or an $(r', n'_r, U')$-set PS′ s.t. |V(T(PS′))| < l. Let $h^*$ be a node in V(H).

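Let $(U_L, U_R)$ be a partition chosen in an iteration of Step 6. If we do not skip the rest of the iteration in Steps 6(e)i or 6(f), then by the induction hypothesis, the set $\mathrm{SOL}_L$ computed in Step 6(g) satisfies $[\forall h, s$ s.t. $(m, n_m, h, s) \in \mathrm{SOL}_L : (\mathrm{PS}[(1, m)] = \emptyset \rightarrow h \in U_R) \wedge (\mathrm{PS}[(1, m)] \neq \emptyset \rightarrow \mathrm{PS}[(1, m), (3, h)] \neq \emptyset) \wedge \mathrm{ISO}(m, n_m, U_L, \mathrm{PS}_L)_h \neq \emptyset \wedge s \le s(\mathrm{ISO}(m, n_m, U_L, \mathrm{PS}_L)_h)]$. If we do not skip the rest of the iteration in Step 6(i), then $\mathrm{PS}[(1, p), (2, n_p)] \neq \emptyset \rightarrow (\mathrm{PS}_L \cup \mathrm{PS}_R)[(1, p), (2, n_p)] \neq \emptyset$, and $\mathrm{PS}_L \cup (\mathrm{PS}_R \setminus \mathrm{SOL}_L) \subseteq \mathrm{PS}$ to which we add $\big(\bigcup_{n_p \in N(m) \text{ s.t. } V(T(m, n_p)) = \{m\},\ h \in U_R} \{(m, n_p, h, \Delta(l(m), l(h)))\}\big)$ iff PS[(1, m)] = ∅. By the induction hypothesis, the set $\mathrm{SOL}_R$ computed in Step 6(j) satisfies $[\forall h, s$ s.t. $(r, n_r, h, s) \in \mathrm{SOL}_R : \mathrm{PS}[(1, r), (3, h)] \neq \emptyset \wedge \mathrm{ISO}(r, n_r, U_R, \mathrm{PS}_R)_h \neq \emptyset \wedge s \le s(\mathrm{ISO}(r, n_r, U_R, \mathrm{PS}_R)_h)]$. Thus, by the pseudocode and the definitions of ISO and s(ISO(. . .)), we get that if $\mathrm{ISO}(r, n_r, U, \mathrm{PS})_{h^*} = \emptyset$ and we do not skip the rest of the iteration before Step 6(j), then $\mathrm{SOL}_R[(3, h^*)] = \emptyset$, and otherwise $[\forall s$ s.t. $(r, n_r, h^*, s) \in \mathrm{SOL}_R : s \le s(\mathrm{ISO}(r, n_r, U, \mathrm{PS})_{h^*})]$.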

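Now suppose that $\mathrm{ISO}(r, n_r, U, \mathrm{PS})_{h^*} \neq \emptyset$. Then, there is a mapping $M \in \mathrm{ISO}(r, n_r, U, \mathrm{PS})_{h^*}$ s.t. $s(\mathrm{ISO}(r, n_r, U, \mathrm{PS})_{h^*}, M) = s(\mathrm{ISO}(r, n_r, U, \mathrm{PS})_{h^*})$.

Denote $S_L = \{h \in U : \exists p \in V(T(\mathrm{PS})) \cap V(T(m, n_m))$ s.t. $M(p) = h\} \setminus \{M(m)\}$, and $S_R = \{h \in U : \exists p \in V(T(\mathrm{PS})) \setminus (V(T(m, n_m)) \setminus \{m\})$ s.t. $M(p) = h\} \setminus \{M(r)\}$.

The probability that a partition $(U_L, U_R)$ s.t. $S_L \subseteq U_L$ and $S_R \subseteq U_R$ is chosen in a given iteration of Step 6 is $\mathit{prob}_L^{\mathit{size}_L}\, \mathit{prob}_R^{\mathit{size}_R}$. Now consider an iteration in which such a partition is chosen. We do not skip before Step 6(g). By the induction hypothesis, with probability at least 1 − 1/e, the set $\mathrm{SOL}_L$ computed in Step 6(g) includes $(m, n_m, M(m), s(\mathrm{ISO}(m, n_m, U_L, \mathrm{PS}_L)_{M(m)}))$. Then, we do not skip the rest of the iteration in Step 6(i); and by the induction hypothesis, with probability at least 1 − 1/e, the set $\mathrm{SOL}_R$ computed in Step 6(j) includes $(r, n_r, h^*, s(\mathrm{ISO}(r, n_r, U_R, \mathrm{PS}_R)_{h^*}))$. Note that $\mathrm{PS}[(1, p), (2, n_p)] \neq \emptyset \rightarrow (\mathrm{PS}_L \cup \mathrm{PS}_R)[(1, p), (2, n_p)] \neq \emptyset$, and $\mathrm{PS}_L \cup (\mathrm{PS}_R \setminus \mathrm{SOL}_L) \subseteq \mathrm{PS}$ to which we add $\big(\bigcup_{n_p \in N(m) \text{ s.t. } V(T(m, n_p)) = \{m\},\ h \in U_R} \{(m, n_p, h, \Delta(l(m), l(h)))\}\big)$ iff PS[(1, m)] = ∅. We get that $s(\mathrm{ISO}(r, n_r, U_R, \mathrm{PS}_R)_{h^*}) = s(\mathrm{ISO}(r, n_r, U, \mathrm{PS})_{h^*})$.

Thus, the probability that there is an iteration in which we reach Step 6(j) and the computed set $\mathrm{SOL}_R$ includes $(r, n_r, h^*, s(\mathrm{ISO}(r, n_r, U, \mathrm{PS})_{h^*}))$ is at least
$$1 - \Big(1 - (1 - 1/e)^2\, \mathit{prob}_L^{\mathit{size}_L}\, \mathit{prob}_R^{\mathit{size}_R}\Big)^{1 / \left((1 - 1/e)^2\, \mathit{prob}_L^{\mathit{size}_L}\, \mathit{prob}_R^{\mathit{size}_R}\right)} \ge 1 - \frac{1}{e}. \qquad \square$$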
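The final inequality of the proof rests on the standard bound $1 - x \le e^{-x}$. The following short derivation spells this out; the abbreviation q is introduced here only for readability and is not notation of the paper.

```latex
\[
  q \;=\; (1 - 1/e)^2 \,\mathit{prob}_L^{\mathit{size}_L}\, \mathit{prob}_R^{\mathit{size}_R},
  \qquad 0 < q \le 1 .
\]
% Each of the 1/q iterations succeeds independently with probability at least q
% (a good partition, then success of both recursive calls), so
\[
  \Pr[\text{every iteration fails}]
  \;\le\; (1 - q)^{1/q}
  \;\le\; \bigl(e^{-q}\bigr)^{1/q}
  \;=\; \frac{1}{e},
  \qquad\text{hence}\qquad
  \Pr[\text{some iteration succeeds}] \;\ge\; 1 - \frac{1}{e}.
\]
```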

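Now we analyze the running time of algorithm ANQ-Alg. Assume w.l.o.g. that |V(H)| ≤ |E(H)|. Given an input $(r, n_r, U, \mathrm{PS})$ to ANQ-Rec such that l = |V(T(PS))|, let T(l) be the running time of ANQ-Rec($r, n_r, U, \mathrm{PS}$). The pseudocode and Lemma 1 imply the following recurrence relation for some constants a and b (note that if l ≥ 4, then $2 \le \lceil l/3 \rceil$):

• If 0 ≤ l < 4: $T(l) \le b|E(H)|$.
• Else, if $P_1$ is a path:
$$T(l) \le a \cdot \left(\frac{l-1}{\lceil l/2 \rceil - 1}\right)^{\lceil l/2 \rceil - 1} \left(\frac{l-1}{\lfloor l/2 \rfloor}\right)^{\lfloor l/2 \rfloor} \left[\, l|V(H)| + T\!\left(\left\lceil \tfrac{l}{2} \right\rceil\right) + T\!\left(\left\lfloor \tfrac{l}{2} \right\rfloor + 1\right) \right] \le b \cdot 2^l \left[\, l|E(H)| + T\!\left(\left\lfloor \tfrac{l}{2} \right\rfloor + 1\right) \right].$$
• Else,
$$T(l) \le a \cdot \max_{\lceil l/3 \rceil \le l' \le \lfloor 2l/3 \rfloor} \left\{ \left(\frac{l-1}{l'-1}\right)^{l'-1} \left(\frac{l-1}{l-l'}\right)^{l-l'} \Big[\, l|V(H)| + T(l') + T(l - l' + 1) \Big] \right\} \le b \cdot \left(\frac{3}{4^{1/3}}\right)^{l} \left[\, l|E(H)| + T\!\left(\left\lfloor \tfrac{2l}{3} \right\rfloor + 1\right) \right].$$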
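The base $3/4^{1/3}$ in the last bound equals $6.75^{1/3}$, which is why it pairs with a recursive call on about 2l/3 nodes to give $6.75^{l}$ overall. The Python sketch below is a numerical sanity check only, not a substitute for the proof of Lemma 4: for every split size l′ in the assumed range $\lceil l/3 \rceil \le l' \le \lfloor 2l/3 \rfloor$, it verifies that the number-of-iterations factor times $6.75$ raised to the size of the larger subproblem stays below $6.75^{l+1}$. The function names and the tested range of l are choices made for the sketch.

```python
def split_factor(l: int, l_prime: int) -> float:
    """1 / (prob_L^size_L * prob_R^size_R) with size_L = l' - 1 and size_R = l - l'."""
    size_l, size_r = l_prime - 1, l - l_prime
    total = l - 1
    return (total / size_l) ** size_l * (total / size_r) ** size_r


BASE = 3 / 4 ** (1 / 3)  # numerically equal to 6.75 ** (1/3)


def check(l_max: int) -> bool:
    """Check factor(l, l') * 6.75^(larger subproblem) <= 6.75^(l+1) for 4 <= l <= l_max."""
    for l in range(4, l_max + 1):
        lo, hi = -(-l // 3), (2 * l) // 3      # ceil(l/3) and floor(2l/3)
        for l_prime in range(lo, hi + 1):
            larger = max(l_prime, l - l_prime + 1)
            if split_factor(l, l_prime) * 6.75 ** larger > 6.75 ** (l + 1):
                return False
    return True


print(BASE, 6.75 ** (1 / 3))
print(check(150))
```

A balanced split has the largest factor (close to $2^{l-1}$) but the smallest subproblems, while the most unbalanced admissible split has a factor of about $6.75^{l/3}$ and a subproblem of about 2l/3 nodes; the check confirms that the product is dominated by the unbalanced case.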

Lemma 3. If $P_1$ is a path, then ANQ-Alg runs in time $O(4^k k^{O(1)}|E(H)|)$.



Proof. Algorithm ANQ-Alg runs in time O(T(k + 1)). We prove that for all 0 ≤ l, $T(l) \le c\,4^l(\max\{l, 1\})^c|E(H)|$, where c ≥ b is a constant s.t. $8c(\frac{3}{4})^c \le 1$. We use induction on l. If 0 ≤ l < 4, then the claim clearly holds.

Now suppose that l ≥ 4, and assume that the claim holds for all l′ < l. Using the induction hypothesis, we get that
$$\begin{aligned}
T(l) &\le b \cdot 2^l \left[\, l|E(H)| + c \cdot 4^{\lfloor l/2 \rfloor + 1} \left(\left\lfloor \tfrac{l}{2} \right\rfloor + 1\right)^{c} |E(H)| \right] \\
&\le c \cdot 2^l\, l\,|E(H)| + 4c^2 \cdot 4^l \left(\tfrac{l}{2} + 1\right)^{c} |E(H)| \\
&\le 8c^2 \cdot 4^l \left(\tfrac{3l}{4}\right)^{c} |E(H)| \le c\,4^l\, l^c\, |E(H)|. \qquad \square
\end{aligned}$$

Lemma 4. Algorithm ANQ-Alg runs in time $O(6.75^k k^{O(1)}|E(H)|)$.

Proof. Algorithm ANQ-Alg runs in time O(T(k + 1)). We prove that for all 0 ≤ l, $T(l) \le c\,6.75^l(\max\{l, 1\})^c|E(H)|$, where c ≥ b is a constant s.t. $13.5c(\frac{11}{12})^c \le 1$. We use induction on l. If 0 ≤ l < 4, then the claim clearly holds.

Now suppose that l ≥ 4, and assume that the claim holds for all l′ < l. Using the induction hypothesis, we get that
$$\begin{aligned}
T(l) &\le b \cdot \left(\frac{3}{4^{1/3}}\right)^{l} \left[\, l|E(H)| + c \cdot 6.75^{\lfloor 2l/3 \rfloor + 1} \left(\left\lfloor \tfrac{2l}{3} \right\rfloor + 1\right)^{c} |E(H)| \right] \\
&\le c \cdot \left(\frac{3}{4^{1/3}}\right)^{l} l\,|E(H)| + 6.75c^2 \cdot 6.75^l \left(\tfrac{2l}{3} + 1\right)^{c} |E(H)| \\
&\le 13.5c^2 \cdot 6.75^l \left(\tfrac{11l}{12}\right)^{c} |E(H)| \\
&\le c\,6.75^l\, l^c\, |E(H)|. \qquad \square
\end{aligned}$$

In order to analyze the space complexity of algorithm ANQ-Alg, we first prove the following lemma.

Lemma 5. For any call ANQ-Rec($r, n_r, U, \mathrm{PS}$) performed during the execution of ANQ-Alg, we have that $|\{p \in V(P_1) : \mathrm{PS}[(1, p)] \neq \emptyset\}| = O(\log k)$.

Proof. Algorithm ANQ-Alg calls ANQ-Rec with a set PS s.t. $|\{p \in V(P_1) : \mathrm{PS}[(1, p)] \neq \emptyset\}| = 1$. Lemma 1 and the pseudocode imply that the recursive depth of ANQ-Rec is bounded by O(log k). Moreover, by the pseudocode, each call ANQ-Rec($r', n'_r, U', \mathrm{PS}'$) executed by a call ANQ-Rec($r, n_r, U, \mathrm{PS}$) satisfies $|\{p \in V(P_1) : \mathrm{PS}'[(1, p)] \neq \emptyset\}| \le |\{p \in V(P_1) : \mathrm{PS}[(1, p)] \neq \emptyset\}| + 1$, and we thus conclude the lemma. □

By the pseudocode and Lemma 5, we get that each recursive call to ANQ-Rec uses O(|V(H)| log k) space, and the recursive depth of ANQ-Rec is bounded by O(log k). Thus, we conclude that the space complexity of ANQ-Alg is bounded by $O(|V(H)| \log^2 k)$.

3. An algorithm for PINQ

We are now ready to present our main result, an algorithm which handles inputs for PINQ in which P is a set of trees (i.e., Pi is a tree for all 1 ≤ i ≤ t). This algorithm, called PINQ-Alg, builds upon ANQ-Alg. We next give an informal explanation of the main idea of this result.

Applying the randomized divide-and-conquer method, our goal is to divide the problem of finding a combination T of t trees that is a solution⁴ into two subproblems of finding two “almost disjoint” combinations of trees that are small, yet allow constructing T. At first, we need to map size = k nodes. We partition $P_i$, trying all choices of an index $i \in \{1, 2, \ldots, t\}$ and a node $m_1 \in V(P_i)$, into two subtrees, $P_i^L$ and $P_i^R$, containing (together) all of its nodes, and having exactly one common node, $m_1$, which will next be considered as the root of $P_i^L$.⁵ Since we do not know in advance how the trees in a solution are connected, we try several choices of the number of nodes, size′, that should be mapped when solving the first subproblem (then the number of nodes that should be mapped in the second subproblem is size − size′ + 1). Since we try all choices of an index $i \in \{1, 2, \ldots, t\}$ and a node $m_1 \in V(P_i)$, it will be enough to consider options where both size′ and size − size′ + 1 are at most (2/3)size + 1. We also consider random partitions of the form $(U_L, U_R)$ (see Section 2).

4 Recall that, informally, a combination of t trees is a solution if it forms a subtree of H, and each of its trees is isomorphically mapped to a different tree in P, such that the total score of the mapping is at least W.

5 Consider some node in $P_1$ as the root of a tree C that is a combination of the trees in P which can be mapped to a solution. Then, the root of $P_i$ should be chosen to be the node in $P_i$ whose father in C belongs to a different tree in P. Since we do not know in advance how the trees are connected in C, we fix in advance only the root of $P_1$.

Solving the first subproblem, we obtain several mappings, using only nodes in $U_L$ (excluding the node with which $m_1$ is matched), where each maps a tree on size′ nodes that is a combination of trees consisting of $P_i^L$ and trees in $P \setminus \{P_i\}$. As in ANQ-Alg, a mapping is represented by a node h in $U_R$ (with which $m_1$ is matched) and a score, but also by the set of mapped trees (excluding $P_i^L$). Note that $P_i$ is the only tree that is partially mapped; thus, it is possible to represent each mapping by such a compact triple. To ensure that we do not obtain too many mappings, we will have at most one score that corresponds to the same node h in $U_R$ and set of mapped trees.

To solve the second subproblem, we need to map a tree on size − size′ + 1 nodes that is a combination of $P_i^R$ and trees in $P \setminus \{P_i\}$ to a subtree of H in a manner that will correspond to a mapping of a tree on size nodes, which, in particular, maps $P_i^L$. Observe that this can be achieved by mapping $P_i^R$, along with some subset of trees $P' \subseteq P$, in the following manner. We map $m_1$ to a node h that appears, with the subset of trees $P \setminus (P' \cup \{P_i\})$ and a score s, as a possible answer to the first subproblem, where we assume that the score of matching $m_1$ with h is s. Next, assume that we are trying to map $P_i^R$ in this manner.

We partition the problem of mapping $P_i^R$ in the above manner into two subproblems. This, in particular, involves partitioning $P_i^R$ or a tree $P_j \in P \setminus \{P_i\}$ into two subtrees that have a common node $m_2$. Now, however, we need to consider the answers of the subproblem related to $P_i^L$ when considering the subproblem (among the two new subproblems) in which we should map $m_1$. As we continue partitioning subproblems into smaller subproblems, we might have more answers, some related to subtrees of different trees in P, that we will have to consider. In this context, there are two issues we would like to note (precise details are given in the following sections). First, when partitioning a problem, in which we should consider a certain set of answers, into two new subproblems, we will need to try several options of partitioning the set of answers between the new subproblems. This is a result of the fact that we may not know in advance (unlike in the case where |P| = 1) when each answer should be considered (as we do not know in advance how the trees in a solution are connected). Second, each returned answer should correspond to a triple of the above mentioned compact form. This implies that when solving a subproblem, we should partially map only one tree in P, and all of the other trees that are partially mapped (in answers given to us as part of the subproblem) should now be completely mapped.

In Section 3.1, we give a more formal overview of PINQ-Alg. Section 3.2 includes some definitions required for the pseudocode of PINQ-Alg (given in Section 3.3). Finally, in Section 3.4, we prove the correctness and analyze the running time of PINQ-Alg.

3.1. Overview

Recall that, following the randomized divide-and-conquer method, PINQ-Alg is a recursive algorithm. Next, by using Fig. 2, we describe and illustrate a recursive stage in more detail.

Each recursive stage involves a rooted subtree R of a tree in P, $U \subseteq V(H)$, a set Solved of rooted trees and a positive integer size. For example, part A of Fig. 2 illustrates an input for algorithm PINQ-Alg and a recursive stage, where Solved contains the squares, triangles and hexagons trees. Each tree in Solved has several triples (as opposed to the pairs used by ANQ-Alg), where each such triple consists of a set of trees $P' \subseteq P$, a node $h \in V(H)$ and a score s. Such a triple $(P', h, s)$ concerns a subtree S of H, a partition of V(S) into the subsets $\{V_S^1, \ldots, V_S^{|P'|+1}\}$, an isomorphism $m_1$ between the tree in Solved (to which the triple belongs) and $S[V_S^1]$ that maps the root of the tree to h, and an isomorphism $m_i$ between $S[V_S^i]$ and a different tree in $P'$, for all $2 \le i \le |P'| + 1$, such that
$$\sum_{1 \le i \le |P'|+1}\ \sum_{v \in V_S^i} \Delta(l(v), l(m_i(v))) + \sum_{e \in E(S)} w(e) = s.$$
Note that the triple $(P', h, s)$ can be considered as a “partial solution”. For example, part A of Fig. 2 illustrates the triples $(\{P_4\}, e, 3)$, $(\{P_5, P_6\}, e, 3)$ and $(\{P_5, P_6\}, i, 3)$ that belong to the triangles tree.

Using such triples, algorithm PINQ-Alg maps sets of size nodes to subtrees of H , such that each node (excluding the root of R) is mapped to a node in U and neighbors are mapped to neighbors. Each such set of size nodes contains the nodes of the subtree R ′ of R induced by the nodes in R that do not belong to any tree in Solved and the roots of the subtrees of R in Solved (see Fig. 2 for an example of such a tree R ′). Moreover, each such set of size nodes must help us “complete” mapping the trees in P that have subtrees in Solved, excluding the tree containing R; thus it also contains their nodes, excluding those that belong to trees in Solved and are not their roots. The number of nodes we have just mentioned to be contained in such set of size nodes may be less than size; therefore algorithm PINQ-Alg examines several choices of adding nodes of trees in P that do not have subtrees in Solved and thus getting sets of size nodes (to be mapped to subtrees of H).

In the base cases of the recursion, size ≤ 2, and then the problem can be easily solved. In the step of the recursion, algorithm PINQ-Alg divides the problem into two subproblems as follows. Any set of size nodes that PINQ-Alg attempts to map may contain nodes of different trees in P, and it does not know in advance how to “connect” them.⁶ Thus PINQ-Alg examines several choices of dividing the set of nodes it must map (i.e., the nodes that must be contained in any set of size nodes that it attempts to map) into two sets to be separately mapped in the first and second subproblems. Such a division may not imply the number of nodes to be mapped in each subproblem (since the number of nodes that PINQ-Alg must map may be less than size), and thus PINQ-Alg also examines several choices of these numbers. Algorithm PINQ-Alg randomly divides U into two subsets. It uses the first subset to solve the first subproblem; then it uses the corresponding results and the second subset to solve the second subproblem. For example, in order to solve the problem illustrated in part A of Fig. 2, algorithm PINQ-Alg divides it into the subproblems illustrated in parts B and C. Algorithm PINQ-Alg solves the subproblem illustrated in part B and uses its answer (i.e., the isomorphism of the hexagons tree in part C) to solve the subproblem illustrated in part C.

6 Algorithm PINQ-Alg needs to “connect” these trees to get one tree to be mapped to a subtree of H.

Fig. 2. An illustration of a randomized divide-and-conquer step in PINQ-Alg.

3.2. Some definitions

Each definition given below is preceded by an explanation of its relevance and is illustrated in Fig. 2.

First choose some node p2 ∈ V(P1), and add a new node p1 and an edge {p1, p2} of weight 0 to P1. Define ∀l ∈ L : Δ(l(p1), l) = −∞. Also add a new node h∗ to H and connect it to all the nodes in V(H) by edges of weight 0.
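The preprocessing step above can be sketched as follows. This is only an illustration under assumed data structures that do not appear in the paper: each graph is a dictionary mapping a node to a dictionary of neighbors and edge weights, and the names `augment`, `p1` and `h_star` are hypothetical.

```python
def augment(pattern, host, p2, p1="p1", h_star="h*"):
    """Return copies of the pattern tree and the host graph with the
    auxiliary nodes added; the inputs are left untouched.

    pattern, host: dict node -> dict neighbor -> edge weight (undirected).
    p2: an existing node of the pattern tree that p1 is attached to.
    """
    pattern = {u: dict(nbrs) for u, nbrs in pattern.items()}
    host = {u: dict(nbrs) for u, nbrs in host.items()}

    # New pattern node p1, joined to p2 by an edge of weight 0.
    pattern[p1] = {p2: 0}
    pattern[p2][p1] = 0

    # New host node h*, joined to every original host node by weight-0 edges.
    host[h_star] = {}
    for h in list(host):
        if h != h_star:
            host[h_star][h] = 0
            host[h][h_star] = 0
    return pattern, host
```

The score convention Δ(l(p1), l) = −∞ is not modeled here; a caller would have to add it to whatever similarity function it uses.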



Since we may not know a topology for a solution (we only know that it is a tree), we cannot define a preorder on its nodes and use the form $T(p, n_p)$, which is used by ANQ-Alg, for the trees considered at each recursive stage. Thus, we define a different form as follows. We order the neighbors of each node $p \in V(\mathcal{P})$ (arbitrarily), and denote them by $nei_1(p), \ldots, nei_{|N(p)|}(p)$. Denote N(p) = N(p) ∪ {nil}. Also denote $N(p, nei_i(p), nei_j(p)) = \{nei_l(p) \in N(p) : i \le l < j \,\lor\, j < i \le l \,\lor\, l < j < i\}$, $N(p, nei_i(p), \mathrm{nil}) = \{nei_l(p) \in N(p) : i \le l\}$ and N(p, nil, nil) = {}. Let P(p) denote the tree in $\mathcal{P}$ that contains p, and let $T(p, nei_i(p), nei_j(p))$ denote the subtree induced by the nodes reachable from p in $P(p)[V(P(p)) \setminus (N(p) \setminus N(p, nei_i(p), nei_j(p)))]$. Root $T(p, nei_i(p), nei_j(p))$ at p. For example, in part A of Fig. 2, we have that T(p2, p3, p1) = R. Algorithm PINQ-Alg uses the form $T(p, nei_i(p), nei_j(p))$ for the trees it considers at each recursive stage. Definition 5 refers to these trees.
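The cyclic neighbor range N(p, nei_i(p), nei_j(p)) is easy to misread, so here is a small sketch of it. The function name `neighbor_range` and the use of `None` for nil are assumptions made for this illustration; the index condition is copied from the definition above (0-based indices give the same result, since only the relative order matters).

```python
def neighbor_range(neighbors, first, last):
    """Return N(p, first, last) for the ordered neighbor list of a node p.

    neighbors: the ordered list nei_1(p), ..., nei_|N(p)|(p).
    first, last: elements of `neighbors`, or None standing in for nil.
    """
    if first is None:
        if last is not None:
            raise ValueError("first = nil requires last = nil")
        return []                      # N(p, nil, nil) = {}
    i = neighbors.index(first)
    if last is None:
        return neighbors[i:]           # all neighbors from index i onwards
    j = neighbors.index(last)
    # i <= l < j   or   j < i <= l   or   l < j < i  (wraps around cyclically)
    return [n for l, n in enumerate(neighbors)
            if i <= l < j or j < i <= l or l < j < i]
```

The range starts at `first` (inclusive), runs cyclically through the ordering, and stops just before `last`; when `first` equals `last` it is empty.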

Definition 5. Given Solved ⊆ $\{(p, n_p^1, n_p^2) : p \in V(\mathcal{P}),\ n_p^1 \in N(p),\ n_p^2 \in N(p) \text{ s.t. } n_p^1 = \mathrm{nil} \to n_p^2 = \mathrm{nil}\}$, $r \in V(\mathcal{P})$, $n_r^1 \in N(r)$ and $n_r^2 \in N(r)$ s.t. $n_r^1 = \mathrm{nil} \to n_r^2 = \mathrm{nil}$, we say that Solved is an $(r, n_r^1, n_r^2)$-subtree set if its trees are disjoint, those that are subtrees of P(r) are also subtrees of $T(r, n_r^1, n_r^2)$, and Solved[(1, r), (3, $n_r^2$)] ≠ ∅.

For example, in part A of Fig. 2, we have that Solved = {(p2, p6, p1), (p4, p3, p3), (p14, p15, p13)} is a (p2, p3, p1)-subtree set.

We now define the information of each tree in Solved similarly to Definition 2; though now each tree in Solved has additional information on a set P of trees in $\mathcal{P}$ that are connected to it, and each of its scores corresponds to a mapping of it and the trees in P.

Definition 6. Let PS ⊆ $\{(p, n_p^1, n_p^2, P, h, s) : p \in V(\mathcal{P}),\ n_p^1 \in N(p),\ n_p^2 \in N(p) \text{ s.t. } n_p^1 = \mathrm{nil} \to n_p^2 = \mathrm{nil},\ h \in V(H),\ P \subseteq \mathcal{P},\ s \in \mathbb{R}\}$, $r \in V(\mathcal{P})$, $n_r^1 \in N(r)$, $n_r^2 \in N(r)$ s.t. $n_r^1 = \mathrm{nil} \to n_r^2 = \mathrm{nil}$, and U ⊆ V(H). We say that PS is an $(r, n_r^1, n_r^2, U)$-set if

1. $\{(p, n_p^1, n_p^2) : PS[(1, p), (2, n_p^1), (3, n_p^2)] \neq \emptyset\}$ is an $(r, n_r^1, n_r^2)$-subtree set.
2. $\forall (p, n_p^1, n_p^2, P, h, s) \in PS$: $\forall s'[(p, n_p^1, n_p^2, P, h, s') \in PS \to s = s']$, $(p \neq r \leftrightarrow h \in U)$ and $\forall P'[PS[(4, P')] \neq \emptyset \to P(p) \notin P']$.

For example, in part A of Fig. 2, we have that PS = {(p2, p6, p1, {}, a, 4), (p4, p3, p3, {P4}, e, 3), (p4, p3, p3, {P5, P6}, e, 3), (p4, p3, p3, {P5, P6}, i, 3), (p14, p15, p13, {}, o, 2)} is a (p2, p3, p1, U)-set.

Next we define the set of nodes that a given $(r, n_r^1, n_r^2, U)$-set PS implies we must map. This is a modification of Definition 3.

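Definition 7. Given an $(r, n_r^1, n_r^2, U)$-set PS, V(PS) is the set of nodes

$$\Big[ V(T(r, n_r^1, n_r^2)) \;\cup \bigcup_{p \text{ s.t. } PS[(1,p)] \neq \emptyset \,\wedge\, P(p) \neq P(r)} V(P(p)) \Big] \setminus \Big( \bigcup_{(p, n_p^1, n_p^2, P, h, s) \in PS} V(T(p, n_p^1, n_p^2)) \setminus \{p\} \Big).$$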

For example, in part A of Fig. 2, we have that V(PS) = {p2, p3, p4, p5, p13, p14}.

We do not use a definition similar to Definition 4 since we may not know a topology for V(PS) in a solution, and thus we cannot determine how each node divides it. We consider every node in V(PS) as a possible divisor of our problem and examine several options for the sizes of the resulting smaller subproblems.

The next definition is used for the sake of clarity of the pseudocode. Given $(r, n_r^1, n_r^2, U)$-sets PS and PS′, we define a calculation that uses PS′ to update PS to hold the information of both sets that corresponds to the best scores.

Definition 8. Given $(r, n_r^1, n_r^2, U)$-sets PS and PS′, $PS \overset{+}{\Leftarrow} PS'$ is defined as $\{(p, n_p^1, n_p^2, P, h, s) \in PS \cup PS' : \forall s'[(p, n_p^1, n_p^2, P, h, s') \in PS \cup PS' \to s' \le s]\}$.
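As a concrete reading of Definition 8, the update keeps, for every key (p, n¹ₚ, n²ₚ, P, h), only the tuple with the largest score found in either set. A minimal sketch, assuming tuples are stored as Python 6-tuples whose fourth component is a `frozenset` of tree names (a representation chosen here for illustration, not taken from the paper):

```python
def merge_best(ps, ps_prime):
    """Return the result of updating ps with ps_prime as in Definition 8.

    Each element is a tuple (p, n1, n2, trees, h, s); `trees` must be
    hashable (e.g. a frozenset). For every key (p, n1, n2, trees, h) only
    the maximum score s over both sets is kept.
    """
    best = {}
    for (p, n1, n2, trees, h, s) in set(ps) | set(ps_prime):
        key = (p, n1, n2, trees, h)
        if key not in best or s > best[key]:
            best[key] = s
    return {key + (s,) for key, s in best.items()}
```

Storing the scores in a dictionary keyed by the first five components also makes it explicit that, after the update, each key carries a single score.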

3.3. The algorithm

We are now ready to present our algorithm PINQ-Alg, whose main component is a recursive procedure called PINQ-Rec. We first note that an input for PINQ-Rec is of the form $(r, n_r^1, n_r^2, U, size, PS)$, where $r \in V(\mathcal{P})$, $n_r^1 \in N(r)$, $n_r^2 \in N(r)$ s.t. $n_r^1 = \mathrm{nil} \to n_r^2 = \mathrm{nil}$, U ⊆ V(H), size ∈ ℕ and PS is an empty set or an $(r, n_r^1, n_r^2, U)$-set such that |V(PS)| ≤ size. The output SOL of PINQ-Rec is an empty set or an $(r, n_r^1, n_r^2, U, size)$-set such that SOL[(1, r), (2, $n_r^1$), (3, $n_r^2$)] = SOL.

PINQ-Alg($\mathcal{P}$, H, Δ, W):

1. Add elements to the input as described in Section 3.2.
2. SOL ⇐ PINQ-Rec(p1, p2, nil, V(H) \ {h∗}, k + 1, {(p1, nil, nil, h∗, 0)}).
3. Accept iff (SOL ≠ ∅ ∧ $\max_{(p, n_p^1, n_p^2, P, h, s) \in SOL}\{s\} \ge W$).



PINQ-Rec($r, n_r^1, n_r^2, U, size, PS$):

1. If PS = ∅ ∨ size = 1: Return PS.
2. SOL ⇐ ∅.

Step 1 handles two base cases. If PS = ∅, we could not map some subtree (in previous computations) that is relevant to the current recursive stage, and thus we return ∅. Else if size = 1, we have that |V(PS)| = 1; then, PS is an $(r, n_r^1, n_r^2, U)$-set such that PS[(1, r), (2, $n_r^1$), (3, $n_r^2$)] = PS, and thus we return PS. The set SOL will hold tuples that represent the best mappings we find.

3. If size = 2:
   (a) If |V(PS)| = 2: Denote by v the node in V(PS) which is not r.
   (b) If |V(PS)| = 1, then for each $(p, n_p^1, n_p^2, P, h, s) \in PS$, $v \in V(\mathcal{P}) \setminus \{r\}$ s.t. $[P(v) \notin P \wedge N(v) = \emptyset]$, $h' \in U \cap N(h)$:
       • $SOL \overset{+}{\Leftarrow} \{(r, n_r^1, n_r^2, P \cup \{P(v)\}, h, s + w(\{h, h'\}) + \Delta(l(v), l(h')))\}$.
   (c) Else if PS[(1, v)] = ∅, then for each $(p, n_p^1, n_p^2, P, h, s) \in PS$, $h' \in U \cap N(h)$:
       • $SOL \overset{+}{\Leftarrow} \{(r, n_r^1, n_r^2, P, h, s + w(\{h, h'\}) + \Delta(l(v), l(h')))\}$.
   (d) Else if v ∈ V(P(r)), then for each $(p, n_p^1, n_p^2, P, h, s) \in PS[(1, r)]$, $P' \subseteq \mathcal{P} \setminus P$, $h' \in N(h)$, $s'$ s.t. $\exists n_v[(v, n_v, r, P', h', s') \in PS]$:
       • $SOL \overset{+}{\Leftarrow} \{(r, n_r^1, n_r^2, P \cup P', h, s + w(\{h, h'\}) + s')\}$.
   (e) Else for each $(p, n_p^1, n_p^2, P, h, s) \in PS[(1, r)]$, $P' \subseteq \mathcal{P} \setminus P$, $h' \in N(h)$, $s'$ s.t. $\exists n_v[(v, n_v, \mathrm{nil}, P', h', s') \in PS]$:
       • $SOL \overset{+}{\Leftarrow} \{(r, n_r^1, n_r^2, P \cup P' \cup \{P(v)\}, h, s + w(\{h, h'\}) + s')\}$.
   (f) Return SOL.

If size = 2, then we have four base cases. For each of the two base cases corresponding to whether or not PS[(1, v)] = ∅, we have two base cases that correspond to whether or not v and r belong to the same tree in $\mathcal{P}$. Note that, if r and v belong to the same tree in $\mathcal{P}$, then r, v and the nodes on the simple path between them belong to V(PS); thus, since |V(PS)| ≤ size = 2, r and v are neighbors.

In more detail, Step 3b solves the case where PS[(1, v)] = ∅ and v ∉ V(P(r)), Step 3c solves the case where PS[(1, v)] = ∅ and v ∈ V(P(r)), Step 3d solves the case where PS[(1, v)] ≠ ∅ and v ∈ V(P(r)), and Step 3e solves the case where PS[(1, v)] ≠ ∅ and v ∉ V(P(r)). In Step 3b, we try to extend each partial solution in PS that maps r to some node h, by mapping a new graph in $\mathcal{P}$ that contains exactly one node to a neighbor of h. Steps 3c–3e correspond to cases for which v is determined in Step 3a. In these steps, we try to extend each partial solution ps in PS that maps r to some node h, by mapping v to a neighbor of h, where in Steps 3d and 3e we also ensure that v is mapped to this neighbor according to a partial solution in PS (while not mapping any tree in $\mathcal{P}$ twice).

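4. For each $m \in V(\mathcal{P})$, $n_m^1 \in N(m)$, $n_m^2 \in N(m)$ s.t. $(n_m^1 = \mathrm{nil} \to n_m^2 = \mathrm{nil})$, $size_L \in \{\max\{1, \lceil size/3 \rceil - 1\}, \max\{2, \lceil size/3 \rceil\}, \ldots, \lfloor 2size/3 \rfloor - 1\}$, partition $(P_L, P_R)$ of $\{p : PS[(1, p)] \neq \emptyset\}$ s.t. $m \notin P_R$: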

We examine choices for m, $n_m^1$ and $n_m^2$ that may divide our problem of mapping sets of size nodes (which are supersets of V(PS)) into smaller subproblems.

Since we do not know all the nodes we need to map in each of the resulting subproblems, we examine several choices for the sizes of the subproblems. We only examine choices in which both sizes are at most $\lfloor 2size/3 \rfloor + 1$ (if size = 3, then at most 2) to get the desired running time. Indeed, note that both $size_L$ and $size_R$ (defined below), which correspond to the number of nodes to be mapped in the first and second subproblems, excluding the common node m, are at most $\lfloor 2size/3 \rfloor$. This still allows us to find a solution (see Lemma 1 for intuition). For the same reason, we examine all the partitions of the set of roots of trees in PS into two sets, $P_L$ and $P_R$, to be used in the first and second subproblems, respectively. We require that $m \notin P_R$ since in our second subproblem we will only be interested in mappings of m that were found in solutions to our first subproblem.
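To make the size bound concrete, the following sketch enumerates the candidate pairs (size_L, size_R) for a given size. It assumes that the brackets lost in extraction are a ceiling around size/3 and a floor around 2size/3, and that the listed values form a contiguous integer range; the function name `split_sizes` is hypothetical.

```python
def split_sizes(size):
    """Enumerate (size_l, size_r) with size_r = size - size_l - 1, where
    size_l ranges over max{1, ceil(size/3) - 1}, ..., floor(2*size/3) - 1.
    Intended for size >= 3, the only case in which Step 4 is reached.
    """
    lo = max(1, -(-size // 3) - 1)   # -(-x // 3) is an integer ceiling of x / 3
    hi = (2 * size) // 3 - 1         # integer floor of 2 * size / 3, minus 1
    return [(size_l, size - size_l - 1) for size_l in range(lo, hi + 1)]
```

Under these assumptions size_R = size − size_L − 1 ≤ size − ⌈size/3⌉ = ⌊2size/3⌋, which matches the bound stated in the text.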

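4. (a) $size_R \Leftarrow size - size_L - 1$, $prob_L \Leftarrow \frac{size_L}{size - 1}$ and $prob_R \Leftarrow 1 - prob_L$.
   (b) Repeat $\frac{1}{(1 - 1/e)^2 \, prob_L^{size_L} \, prob_R^{size_R}}$ times:
       i. $U_L \Leftarrow \emptyset$ and $U_R \Leftarrow U$, $SOL_L \Leftarrow \emptyset$ and $SOL_R \Leftarrow \emptyset$.
       ii. For each h ∈ U: With probability $prob_L$ move h from $U_R$ to $U_L$.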

We randomly partition U into two sets, $U_L$ and $U_R$, containing the only nodes in H to which we can map new nodes (i.e., nodes that extend already computed partial solutions) in the first and second subproblems, respectively. Recall that, informally, the goal of using these partitions is to allow us, when we next combine solutions for the first and second subproblems, to avoid overlaps (i.e., to avoid returning a solution where different nodes in $V(\mathcal{P})$ are mapped to the same node in V(H)). The sets $SOL_L$ and $SOL_R$ will hold the solutions we find to our first and second subproblems.
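The random partition and the repetition count of Step 4(b) can be sketched as below. The function names, the explicit `rng` argument and the rounding up of the repetition count to an integer are assumptions of this illustration; the probabilities follow Step 4(a).

```python
import math
import random


def repetitions(size_l, size_r):
    """Number of rounds in Step 4(b), rounded up to an integer (the rounding
    is an assumption; the text only gives the real-valued expression)."""
    size = size_l + size_r + 1
    prob_l = size_l / (size - 1)
    prob_r = 1.0 - prob_l
    success = (1 - 1 / math.e) ** 2 * prob_l ** size_l * prob_r ** size_r
    return math.ceil(1 / success)


def random_partition(u, prob_l, rng):
    """Move each node of u into the left part independently with
    probability prob_l; the remaining nodes form the right part."""
    u = set(u)
    u_l = {h for h in u if rng.random() < prob_l}
    return u_l, u - u_l
```

For example, `random_partition(nodes, size_l / (size - 1), random.Random())` would give one candidate pair of host-node sets for the two subproblems.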

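4. (b) iii. $PS_L \Leftarrow \{(p, n_p^1, n_p^2, P, h, s) \in PS : p \in P_L,\ p \neq m \leftrightarrow h \in U_L\}$.
       iv. If $m \notin P_L$: $PS_L \overset{+}{\Leftarrow} \bigcup_{h \in U_R} \{(m, n_m^2, n_m^2, \emptyset, h, \Delta(l(m), l(h)))\}$.
       v. For each $(p, n_p^1, n_p^2, P, h, s) \in PS_L$ s.t. $P(m) \in P$: $PS_L \Leftarrow PS_L \setminus \{(p, n_p^1, n_p^2, P, h, s)\}$.
       vi. If $\exists p \in P_L$ s.t. $PS_L[(1, p)] = \emptyset$ ∨ $PS_L$ is not an $(m, n_m^1, n_m^2, U_L)$-set ∨ $|V(PS_L)| > size_L + 1$: Go to the next iteration.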

The set $PS_L$ is initialized to hold all the tuples in PS that are suitable for our first subproblem (Step 4(b)iii). If it does not include a tree rooted at m, then we add all the options of mapping the tree that contains only m to a node in $U_R$ (Step 4(b)iv). We remove tuples that do not correspond to m and yet map its entire tree from $PS_L$ (Step 4(b)v). If $P_L$ contains a node that is not a root of a tree in $PS_L$, or $PS_L$ is not an $(m, n_m^1, n_m^2, U)$-set, or $PS_L$ requires mapping too many nodes, then we skip the rest of the iteration (Step 4(b)vi).

4. (b) vii. $SOL_L \Leftarrow$ PINQ-Rec($m, n_m^1, n_m^2, U_L, size_L + 1, PS_L$).

We solve our first subproblem.

4. (b) viii. $PS_R \Leftarrow SOL_L \cup \{(p, n_p^1, n_p^2, P, h, s) \in PS : p \in P_R,\ h \notin U_L\}$.
       ix. For each $(p, n_p^1, n_p^2, P, h, s) \in PS_R$ s.t. $\exists p' \in P_R \cup \{m\}$ for which $P(p') \in P$: $PS_R \Leftarrow PS_R \setminus \{(p, n_p^1, n_p^2, P, h, s)\}$.
       x. If $\exists p \in P_R \cup \{m\}$ s.t. $PS_R[(1, p)] = \emptyset$ ∨ $PS_R$ is not an $(r, n_r^1, n_r^2, U_R)$-set ∨ $|V(PS_R)| > size_R + 1$: Go to the next iteration.

The set $PS_R$ is initialized to hold the solutions to the first subproblem and the tuples in PS that are suitable for our second subproblem (Step 4(b)viii). We then remove tuples that do not correspond to a node $p' \in P_R \cup \{m\}$ and yet map its entire tree from $PS_R$ (Step 4(b)ix). If the resulting $PS_R$ is illegal (the check is similar to the one performed in Step 4(b)vi), then we skip the rest of the iteration (Step 4(b)x).

4. (b) xi. $SOL \overset{+}{\Leftarrow}$ PINQ-Rec($r, n_r^1, n_r^2, U_R, size_R + 1, PS_R$).

5. Return SOL.

We solve our second subproblem and update SOL.

3.4. Correctness and running time

In this section we prove the following theorem.

Theorem 2. Algorithm PINQ-Alg solves inputs for PINQ in which $\mathcal{P}$ is a set of trees in time $O(6.75^{k + O(\log^2 k)} \, 3^t \, |E(H)|)$ and space $O(2^t \, |V(H)| \log^2 k)$.

First we prove the correctness of algorithm PINQ-Alg.

Let $r \in V(\mathcal{P})$, $n_r^1 \in N(r)$, $n_r^2 \in N(r)$ s.t. $n_r^1 = \mathrm{nil} \to n_r^2 = \mathrm{nil}$, U ⊆ V(H), size ∈ ℕ, PS be an $(r, n_r^1, n_r^2, U)$-set s.t. |V(PS)| ≤ size, h ∈ V(H) and $P \subseteq \mathcal{P}$. Given $A \subseteq \mathcal{P}$, denote $V(PS, A) = V(PS) \cup \big(\bigcup_{P_i \in A} V(P_i)\big)$.

We next define $ISO(r, n_r^1, n_r^2, U, size, PS)_{h,P}$ to be the set of solutions relevant to r, $n_r^1$, $n_r^2$, U, size, PS, h and P. Informally, each such solution corresponds to an isomorphism which maps a tree on size nodes that is rooted at r and contains $T(r, n_r^1, n_r^2)$ as a subtree, completing the mapping of all the trees in PS (that are not P(r)) and using only nodes in U to map new nodes (i.e., nodes that are not mapped according to a partial solution in PS).

If P(r) ∈ P or [∃p ∈ V(PS) s.t. P(p) ∉ P ∪ {P(r)}], then denote $ISO(r, n_r^1, n_r^2, U, size, PS)_{h,P} = \emptyset$.

Else let $ISO(r, n_r^1, n_r^2, U, size, PS)_{h,P}$ denote the set of every tuple $M = (A, M_V, M_P)$,⁷ where $A \subseteq \mathcal{P} \setminus \bigcup_{p \in V(PS)} P(p)$ s.t. |V(PS, A)| = size, $M_V$ is a mapping between a tree on V(PS, A) and a subtree $S_M$ of H s.t. $M_V(r) = h$, $M_P : V(PS, A) \to 2^{\mathcal{P}}$ s.t. $[(\forall p, p' \in V(PS) : M_P(p) \cap M_P(p') = \emptyset) \wedge \bigcup_{p \in V(PS)} M_P(p) = P \setminus \bigcup_{p \in V(PS, A)} P(p)]$, and for each p ∈ V(PS, A) the following conditions hold:

1. For all $n_p \in N(p) \cap V(PS, A)$: $\{M_V(p), M_V(n_p)\} \in E(S_M)$.
2. If PS[(1, p)] = ∅: $M_V(p) \in U$ and $M_P(p) = \emptyset$. Denote $s(M, p) = \Delta(l(p), l(M_V(p)))$.
3. Else: There is an element $n_p^2$ s.t. $PS[(1, p), (3, n_p^2), (4, M_P(p)), (5, M_V(p))] \neq \emptyset$ and [if p ≠ r, then $(f \notin V(P(p)) \wedge n_p^2 = \mathrm{nil}) \vee f = n_p^2$, where f is the node for which $M_V(f)$ is the father of $M_V(p)$ in $S_M$ when rooted at h]. Let s(M, p) denote the score s for which $PS[(1, p), (4, M_P(p)), (5, M_V(p)), (6, s)] \neq \emptyset$.

⁷ Informally, such a tuple corresponds to a solution as follows: A is the set of trees (from $\mathcal{P}$) connected by an edge to an unmapped node of a tree partially mapped in PS, $M_V$ is the mapping of the nodes in V(PS) and of the trees in A, and $M_P$ describes to which trees in $\mathcal{P}$ nodes already mapped in partial solutions in PS are connected (in particular, it ensures that these trees do not belong to A, so we will not map the same tree twice).

We now define the score of a specific solution in $ISO(r, n_r^1, n_r^2, U, size, PS)_{h,P}$, followed by the definition of the best score of a solution in this set. Given $M \in ISO(r, n_r^1, n_r^2, U, size, PS)_{h,P}$, let $s(ISO(r, n_r^1, n_r^2, U, size, PS)_{h,P}, M)$ denote the score $\sum_{p \in V(PS, A)} s(M, p) + \sum_{e \in E(S_M)} w(e)$.

If $ISO(r, n_r^1, n_r^2, U, size, PS)_{h,P} \neq \emptyset$, then denote $s(ISO(r, n_r^1, n_r^2, U, size, PS)_{h,P}) = \max_{M \in ISO(r, n_r^1, n_r^2, U, size, PS)_{h,P}} \{s(ISO(r, n_r^1, n_r^2, U, size, PS)_{h,P}, M)\}$.

Given an input I, let ($\mathcal{P}$, H, Δ, W) be the instance we get after adding p1, {p1, p2}, h∗ and its edges to I (in Step 1 of PINQ-Alg). Note that I has a solution iff there is a subtree S of H whose nodes can be partitioned into the subsets $\{V_1^S, \ldots, V_t^S\}$, such that there is an isomorphism $M_1$ between T(p1, p2, nil) and $S[V_1^S]$ for which $M_1(p_1) = h^*$, there is an isomorphism $M_i$ between $P_i$ and $S[V_i^S]$, for all 2 ≤ i ≤ t, and

$$\sum_{1 \le i \le t} \; \sum_{v \in V(P_i) \setminus \{p_1\}} \Delta(l(v), l(M_i(v))) + \sum_{e \in E(S)} w(e) \ge W.$$

Thus, the following lemma, which we prove in Appendix A, implies the correctness of PINQ-Alg.

Lemma 6. Let $r \in V(\mathcal{P})$, $n_r^1 \in N(r)$, $n_r^2 \in N(r)$ s.t. $n_r^1 = \mathrm{nil} \to n_r^2 = \mathrm{nil}$, U ⊆ V(H), size ∈ ℕ and ∅ or an $(r, n_r^1, n_r^2, U)$-set PS s.t. |V(PS)| ≤ size. The ∅ or $(r, n_r^1, n_r^2, U)$-set SOL returned by PINQ-Rec($r, n_r^1, n_r^2, U, size, PS$) satisfies

1. If PS = ∅, then SOL = ∅.
2. Else ∀h∗ ∈ V(H), $P^* \subseteq \mathcal{P}$:
   (a) If $ISO(r, n_r^1, n_r^2, U, size, PS)_{h^*,P^*} = \emptyset$, then SOL[(4, P∗), (5, h∗)] = ∅.
   (b) Else:
       i. For each score s s.t. $(r, n_r^1, n_r^2, P^*, h^*, s) \in SOL$: $s \le s(ISO(r, n_r^1, n_r^2, U, size, PS)_{h^*,P^*})$.
       ii. With probability at least 1 − 1/e: $(r, n_r^1, n_r^2, P^*, h^*, s(ISO(r, n_r^1, n_r^2, U, size, PS)_{h^*,P^*})) \in SOL$.

Now we analyze the running time of algorithm PINQ-Alg. Assume w.l.o.g that |V (H)| ≤ |E(H)|. We start by proving the following lemma.

Lemma 7. For any call PINQ-Rec($r, n_r^1, n_r^2, U, size, PS$) performed during the execution of PINQ-Alg, we have that $|\{p \in V(\mathcal{P}) : PS[(1, p)] \neq \emptyset\}| = O(\log k)$.

Proof. Algorithm PINQ-Alg calls PINQ-Rec with a set PS s.t. $|\{p \in V(\mathcal{P}) : PS[(1, p)] \neq \emptyset\}| = 1$. Lemma 1 and the pseudocode imply that the recursive depth of PINQ-Rec is bounded by O(log k). Moreover, by the pseudocode, each call PINQ-Rec($r', n_{r'}^1, n_{r'}^2, U', size', PS'$) executed by a call PINQ-Rec($r, n_r^1, n_r^2, U, size, PS$) satisfies $|\{p \in V(\mathcal{P}) : PS'[(1, p)] \neq \emptyset\}| \le |\{p \in V(\mathcal{P}) : PS[(1, p)] \neq \emptyset\}| + 1$, and we thus conclude the lemma. □

Given an input $(r, n_r^1, n_r^2, U, size, PS)$ to PINQ-Rec such that if PS = ∅ then l = 0 and else l = size, let T(l) be the running time of PINQ-Rec($r, n_r^1, n_r^2, U, size, PS$). The pseudocode and Lemma 7 imply the following recurrence relation for some constants a and b (note that if l ≥ 4, then $2 \le \lceil l/3 \rceil$):

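• If 0 ≤ l < 4: $T(l) \le b \cdot 3^t |E(H)|$.
• Else,

$$T(l) \le a \sum_{m \in V(\mathcal{P})} \sum_{n_m^1 \in N(m)} \sum_{n_m^2 \in N(m)} \sum_{\lceil l/3 \rceil \le l' \le \lfloor 2l/3 \rfloor} \left[ \left(\frac{l-1}{l'-1}\right)^{l'-1} \left(\frac{l-1}{l-l'}\right)^{l-l'} 2^{a \log k} \Big( l \, 2^t |V(H)| + T(l') + T(l - l' + 1) \Big) \right]$$

$$\le b \cdot k^b \left(\frac{3}{4^{1/3}}\right)^{l} \left[ l \, 3^t |E(H)| + T\left(\left\lfloor \frac{2l}{3} \right\rfloor + 1\right) \right].$$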

Lemma 8. Algorithm PINQ-Alg runs in time $O(6.75^k \, k^{O(\log k)} \, 3^t \, |E(H)|)$.

Proof. Algorithm PINQ-Alg runs in time O(T(k + 1)). We prove that for all 0 ≤ l, $T(l) \le c \, 6.75^l \, k^{c \log l} \, 3^t |E(H)|$, where c is a constant s.t. $1 + b \le c \log(\frac{23}{22}) \;\wedge\; 13.5 \, c \, k^{c \log(\frac{23l}{24})} \le k^{c \log l}$. We use induction on l. If 0 ≤ l < 4, then the claim clearly holds.

Now suppose that 4 ≤ l ≤ k + 1, and assume that the claim holds for all l′ < l. Using the induction hypothesis, we get that

$$T(l) \le b \cdot k^b \left(\frac{3}{4^{1/3}}\right)^{l} \left[ l \, 3^t |E(H)| + c \cdot 6.75^{\lfloor 2l/3 \rfloor + 1} \, k^{c \log(\lfloor 2l/3 \rfloor + 1)} \, 3^t |E(H)| \right]$$

$$\le c \cdot k^{b+1} \left(\frac{3}{4^{1/3}}\right)^{l} 3^t |E(H)| + 6.75 c^2 \cdot 6.75^l \, k^{b + c \log(\frac{2l}{3} + 1)} \, 3^t |E(H)|$$

$$\le c \cdot k^{c} \left(\frac{3}{4^{1/3}}\right)^{l} 3^t |E(H)| + 6.75 c^2 \cdot 6.75^l \, k^{c \log(\frac{23}{22}) + c \log(\frac{11l}{12})} \, 3^t |E(H)|$$

$$\le 13.5 c^2 \cdot 6.75^l \, k^{c \log(\frac{23l}{24})} \, 3^t |E(H)| \le c \, 6.75^l \, k^{c \log l} \, 3^t |E(H)|. \qquad \Box$$

In terms of space complexity, by the pseudocode and Lemma 7, we get that each recursive call to PINQ-Rec uses $O(2^t |V(H)| \log k)$ space, and the recursive depth of PINQ-Rec is bounded by O(log k). Thus, we conclude that the space complexity of PINQ-Alg is bounded by $O(2^t |V(H)| \log^2 k)$.
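To see informally where the base 6.75 in Lemma 8 comes from, one can unroll the recurrence while ignoring the polynomial factors and the additive "+1" in the argument of T. The following is only a heuristic sketch of that calculation, not a substitute for the induction in the proof: each level multiplies by $(3/4^{1/3})$ raised to the current size, and the sizes shrink geometrically by a factor of 2/3.

```latex
\[
\Big(\frac{3}{4^{1/3}}\Big)^{l}\cdot
\Big(\frac{3}{4^{1/3}}\Big)^{\frac{2}{3}l}\cdot
\Big(\frac{3}{4^{1/3}}\Big)^{\frac{4}{9}l}\cdots
\;=\; \Big(\frac{3}{4^{1/3}}\Big)^{l\sum_{i\ge 0}(2/3)^{i}}
\;=\; \Big(\frac{3}{4^{1/3}}\Big)^{3l}
\;=\; \Big(\frac{27}{4}\Big)^{l}
\;=\; 6.75^{\,l}.
\]
```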

Appendix A. Proof of Lemma 6

By Step 1 of the pseudocode, the lemma holds for PS = ∅. We next assume that PS ≠ ∅, and prove the lemma by using induction on size. If 1 ≤ size < 3, then by Steps 1 and 3 of the pseudocode, the lemma holds.

Now suppose that size ≥ 3, and assume that the lemma holds $\forall r' \in V(\mathcal{P})$, $n_{r'}^1 \in N(r')$, $n_{r'}^2 \in N(r')$ s.t. $n_{r'}^1 = \mathrm{nil} \to n_{r'}^2 = \mathrm{nil}$, U′ ⊆ V(H), size′ < size and ∅ or an $(r', n_{r'}^1, n_{r'}^2, U')$-set PS′ s.t. |V(PS′)| ≤ size′. Let h∗ be a node in V(H) and P∗ be a subset of $\mathcal{P}$.

Consider an iteration of Step 4, and denote the elements to which it corresponds by m, $n_m^1$, $n_m^2$, $size_L$ and $(P_L, P_R)$.

Let $(U_L, U_R)$ be a partition chosen next in an iteration of Step 4(b). If we do not skip the rest of the iteration in Step 4(b)vi, then by the induction hypothesis, the next computed set $SOL_L$ satisfies [∀P, h, s s.t. $(m, n_m^1, n_m^2, P, h, s) \in SOL_L$ : $(PS[(1, m)] = \emptyset \to h \in U_R) \wedge (PS[(1, m)] \neq \emptyset \to PS[(1, m), (5, h)] \neq \emptyset) \wedge ISO(m, n_m^1, n_m^2, U_L, size_L + 1, PS_L)_{h,P} \neq \emptyset \wedge s \le s(ISO(m, n_m^1, n_m^2, U_L, size_L + 1, PS_L)_{h,P})$]. If we do not skip the rest of the iteration in Step 4(b)x, then $PS[(1, p), (2, n_p^1), (3, n_p^2)] \neq \emptyset \to (PS_L \cup PS_R)[(1, p), (2, n_p^1), (3, n_p^2)] \neq \emptyset$, and $PS_L \cup (PS_R \setminus SOL_L) \subseteq PS$ to which we add $\big(\bigcup_{h \in U_R} \{(m, n_m^2, n_m^2, \emptyset, h, \Delta(l(m), l(h)))\}\big)$ iff PS[(1, m)] = ∅. By the induction hypothesis, the set returned by PINQ-Rec in Step 4(b)xi, which we next denote by $SOL_R$, satisfies [∀P, h, s s.t. $(r, n_r^1, n_r^2, P, h, s) \in SOL_R$ : $PS[(1, r), (5, h)] \neq \emptyset \wedge ISO(r, n_r^1, n_r^2, U_R, size_R + 1, PS_R)_{h,P} \neq \emptyset \wedge s \le s(ISO(r, n_r^1, n_r^2, U_R, size_R + 1, PS_R)_{h,P})$]. Thus, by the pseudocode and the definitions of ISO and s(ISO(. . .)), we get that if $ISO(r, n_r^1, n_r^2, U, size, PS)_{h^*,P^*} = \emptyset$ and we do not skip the rest of the iteration before Step 4(b)xi, then $SOL_R[(4, P^*), (5, h^*)] = \emptyset$, and otherwise [∀s s.t. $(r, n_r^1, n_r^2, P^*, h^*, s) \in SOL_R$ : $s \le s(ISO(r, n_r^1, n_r^2, U, size, PS)_{h^*,P^*})$].

Now suppose that $ISO(r, n_r^1, n_r^2, U, size, PS)_{h^*,P^*} \neq \emptyset$. Then, there is a tuple $M = (A, M_V, M_P) \in ISO(r, n_r^1, n_r^2, U, size, PS)_{h^*,P^*}$ s.t. $s(ISO(r, n_r^1, n_r^2, U, size, PS)_{h^*,P^*}, M) = s(ISO(r, n_r^1, n_r^2, U, size, PS)_{h^*,P^*})$.

Let T∗ denote the tree on V(PS, A) rooted at r whose edge set is $\{\{p, n_p\} : p \in V(PS, A),\ n_p \in V(PS, A),\ \{M_V(p), M_V(n_p)\} \in E(S_M)\}$. Order the nodes of T∗ by using a preorder as $v_1, v_2, \ldots, v_{size}$, where for each p ∈ V(T∗), we first visit its children which do not belong to P(p) (in an arbitrary order), and then the children which belong to P(p) in the following order:

1. Let $nei_i(p)$ and $nei_j(p)$ be neighbors of p (in P(p)) that are its children in T∗, such that i < j (if p has fewer than two such children, then there is no order that we need to define).
2. If $p = r \wedge n_r^2 \neq \mathrm{nil}$, then let l be the index for which $nei_l(r) = n_r^2$ and visit $nei_i(p)$ before $nei_j(p)$ iff l < i ∨ j < l.
3. Else if $(p = r \wedge n_r^2 = \mathrm{nil})$ or the father of p in T∗ is not in P(p), then visit $nei_i(p)$ before $nei_j(p)$.
4. Else let l be the index for which $nei_l(p)$ is the father of p. Visit $nei_i(p)$ before $nei_j(p)$ iff l < i ∨ j < l.

By Lemma 1, there are neighbors $v_i$ and $v_j$ in T∗ such that $\max\{2, \lceil n/3 \rceil\} \le |V(T(v_i, v_j))| \le \lfloor 2n/3 \rfloor$. Denote $\mathcal{P}_L = \{P' \in P^* : \exists v \in V(T(v_i, v_j)) \text{ s.t. } P' \in M_P(v) \cup \{P(v)\}\} \setminus \{P(v_i)\}$, $m = v_i$ and

1. If m = r, then denote $n_m^2 = n_r^2$.
2. Else if the father of m in T∗ belongs to P(m), then denote it by $n_m^2$.
3. Else denote $n_m^2 = \mathrm{nil}$.
4. Let $n_m^1$ denote the node in $(V(T(v_i, v_j)) \cap N(m)) \cup \{n_m^2\}$ s.t. $N(m, n_m^1, n_m^2) = N(m) \cap V(T(v_i, v_j))$ (the preorder we have used implies that it exists).

Denote $size_L = |V(T(v_i, v_j))| - 1$, $P_L = \{p \in V(T(v_i, v_j)) : PS[(1, p)] \neq \emptyset\}$ and $P_R = \{p \notin V(T(v_i, v_j)) : PS[(1, p)] \neq \emptyset\}$. Note that there is an iteration of Step 4 in which we iterate over these elements. Next consider this iteration.



Denote $S_L = \{h \in U : \exists p \in V(T(v_i, v_j)) \text{ s.t. } M_V(p) = h\} \setminus \{M_V(m)\}$ and $S_R = \{h \in U : \exists p \in V(PS) \setminus (V(T(v_i, v_j)) \setminus \{m\}) \text{ s.t. } M_V(p) = h\} \setminus \{M_V(r)\}$.

The probability of choosing a partition $(U_L, U_R)$ s.t. $S_L \subseteq U_L$ and $S_R \subseteq U_R$ in a given iteration of Step 4(b) is $prob_L^{size_L} \, prob_R^{size_R}$. Consider an iteration in which such a partition is chosen. We do not skip the rest of the iteration in Step 4(b)vi. By the induction hypothesis, with probability at least 1 − 1/e, the set $SOL_L$ computed in Step 4(b)vii includes $(m, n_m^1, n_m^2, \mathcal{P}_L, M_V(m), s(ISO(m, n_m^1, n_m^2, U_L, size_L + 1, PS_L)_{M_V(m),\mathcal{P}_L}))$. Then we do not skip the rest of the iteration in Step 4(b)x, and by the induction hypothesis, with probability at least 1 − 1/e, the set returned by PINQ-Rec in Step 4(b)xi includes $(r, n_r^1, n_r^2, P^*, h^*, s(ISO(r, n_r^1, n_r^2, U_R, size_R + 1, PS_R)_{h^*,P^*}))$. We have that $PS[(1, p), (2, n_p^1), (3, n_p^2)] \neq \emptyset \to (PS_L \cup PS_R)[(1, p), (2, n_p^1), (3, n_p^2)] \neq \emptyset$, and $PS_L \cup (PS_R \setminus SOL_L) \subseteq PS$ to which we add the set $\big(\bigcup_{h \in U_R} \{(m, n_m^2, n_m^2, \emptyset, h, \Delta(l(m), l(h)))\}\big)$ iff PS[(1, m)] = ∅. These arguments imply that $s(ISO(r, n_r^1, n_r^2, U_R, size_R + 1, PS_R)_{h^*,P^*}) = s(ISO(r, n_r^1, n_r^2, U, size, PS)_{h^*,P^*})$.

We get that the probability that there is an iteration in which we reach Step 4(b)xi and then the set returned by PINQ-Rec includes $(r, n_r^1, n_r^2, P^*, h^*, s(ISO(r, n_r^1, n_r^2, U, size, PS)_{h^*,P^*}))$ is at least

$$1 - \Big(1 - (1 - 1/e)^2 \, prob_L^{size_L} \, prob_R^{size_R}\Big)^{1 / \left((1 - 1/e)^2 \, prob_L^{size_L} \, prob_R^{size_R}\right)} \ge 1 - \frac{1}{e}. \qquad \Box$$
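The final inequality is an instance of the standard bound $1 - x \le e^{-x}$. Spelled out, with q used here only as shorthand (it is not a symbol of the paper) for the success probability of a single iteration:

```latex
\[
q = (1-1/e)^{2}\, prob_L^{\,size_L}\, prob_R^{\,size_R},
\qquad
(1-q)^{1/q} \;\le\; \big(e^{-q}\big)^{1/q} \;=\; e^{-1},
\]
\[
\text{hence}\qquad 1-(1-q)^{1/q} \;\ge\; 1-\frac{1}{e}.
\]
```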

References

[1] N. Alon, R. Yuster, U. Zwick, Color coding, J. Assoc. Comput. Mach. 42 (1995) 844–856.
[2] A.M. Ambalath, R. Balasundaram, R.H. Chintan, K. Venkata, M. Neeldhara, P. Geevarghese, M.S. Ramanujan, On the kernelization complexity of colorful motifs, in: Proc. IPEC, 2010, pp. 14–25.
[3] N. Betzler, R. Bevern, M.R. Fellows, C. Komusiewicz, R. Niedermeier, Parameterized algorithmics for finding connected motifs in biological networks, IEEE/ACM Trans. Comput. Biol. Bioinform. 8 (2011) 1296–1308.
[4] A. Björklund, T. Husfeldt, P. Kaski, M. Koivisto, Narrow sieves for parameterized paths and packings, CoRR, arXiv:1007.1161, 2010.
[5] A. Björklund, P. Kaski, L. Kowalik, Probably optimal graph motifs, in: Proc. STACS, 2013, pp. 20–31.
[6] G. Blin, F. Sikora, S. Vialette, Querying graphs in protein–protein interactions networks using feedback vertex set, IEEE/ACM Trans. Comput. Biol. Bioinform. 7 (2010) 628–635.
[7] S. Bruckner, F. Hüffner, R.M. Karp, R. Shamir, R. Sharan, Topology-free querying of protein interaction networks, in: Proc. RECOMB, 2009, pp. 74–89.
[8] J. Chen, J. Kneis, S. Lu, D. Molle, S. Richter, P. Rossmanith, S. Sze, F. Zhang, Randomized divide-and-conquer: improved path, matching, and packing algorithms, SIAM J. Comput. 38 (2009) 2526–2547.
[9] R. Dondi, G. Fertin, S. Vialette, Weak pattern matching in colored graphs: minimizing the number of connected components, in: Proc. ICTCS, 2007, pp. 27–38.
[10] R. Dondi, G. Fertin, S. Vialette, Finding approximate and constrained motifs in graphs, in: Proc. CPM, 2009, pp. 221–235.
[11] B. Dost, T. Shlomi, N. Gupta, E. Ruppin, V. Bafna, R. Sharan, QNet: a tool for querying protein interaction networks, J. Comput. Biol. 15 (2008) 913–925.
[12] R.G. Downey, M.R. Fellows, Fixed-parameter tractability and completeness II: on completeness for W[1], Theor. Comput. Sci. 141 (1995) 109–131.
[13] M.R. Fellows, G. Fertin, D. Hermelin, S. Vialette, Upper and lower bounds for finding connected motifs in vertex-colored graphs, J. Comput. Syst. Sci. 77 (2011) 799–811.
[14] V. Fionda, L. Palopoli, Biological network querying techniques: analysis and comparison, J. Comput. Biol. 18 (2011) 595–625.
[15] M.R. Garey, D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman, New York, 1979.
[16] S. Guillemot, F. Sikora, Finding and counting vertex-colored subtrees, Algorithmica 65 (2013) 828–844.
[17] F. Hüffner, S. Wernicke, T. Zichner, Algorithm engineering for color-coding with applications to signaling pathway detection, Algorithmica 52 (2008) 114–132.
[18] I. Koutis, Faster algebraic algorithms for path and packing problems, in: Proc. ICALP, 2008, pp. 575–586.
[19] I. Koutis, Constrained multilinear detection for faster functional motif discovery, Inf. Process. Lett. 112 (2012) 889–892.
[20] V. Lacroix, C.G. Fernandes, M.F. Sagot, Motif search in graphs: application to metabolic networks, IEEE/ACM Trans. Comput. Biol. Bioinform. 3 (2006) 360–368.
[21] A. Mano, T. Tuller, O. Béjà, R.Y. Pinter, Comparative classification of species and the study of pathway evolution based on the alignment of metabolic pathways, BMC Bioinform. 11 (2010) S38.
[22] D.W. Mount, Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 2004.
[23] R. Niedermeier, Invitation to Fixed-Parameter Algorithms, Oxford University Press, 2006.
[24] R.Y. Pinter, M. Zehavi, Partial information network queries, in: Proc. IWOCA, 2013, pp. 362–375.
[25] R.Y. Pinter, M. Zehavi, Algorithms for topology-free and alignment network queries, J. Discrete Algorithms 27 (2014) 29–53.
[26] R.Y. Pinter, O. Rokhlenko, E. Yeger-Lotem, M. Ziv-Ukelson, Alignment of metabolic pathways, Bioinformatics 21 (2005) 3401–3408.
[27] R.Y. Pinter, O. Rokhlenko, D. Tsur, M. Ziv-Ukelson, Approximate labelled subtree homeomorphism, J. Discrete Algorithms 6 (2008) 480–496.
[28] R. Rizzi, F. Sikora, Some results on more flexible versions of graph motif, in: Proc. CSR, 2012, pp. 278–289.
[29] T. Shlomi, D. Segal, E. Ruppin, R. Sharan, QPath: a method for querying pathways in a protein-protein interaction networks, BMC Bioinform. 7 (2006) 199.
[30] F. Sikora, An (almost complete) state of the art around the graph motif problem, Université Paris-Est Technical reports, 2012.
[31] H. Wang, T. Xiang, X. Hu, Research on pattern matching with wildcards and length constraints: methods and completeness, www.intechopen.com/books/bioinformatics, 2012.
[32] M. Zehavi, Parameterized algorithms for module motif, in: Proc. MFCS, 2013, pp. 825–836.


3.3 Parameterized Algorithms for Module Motif

Meirav Zehavi. Parameterized Algorithms for Module Motif. In the proc. of the 38th

International Symposium on Mathematical Foundations of Computer Science (MFCS),

pages 825–836, 2013.



Parameterized Algorithms for Module Motif

Meirav Zehavi

Department of Computer Science, Technion - Israel Institute of Technology, Haifa 32000, Israel


Abstract. Module Motif is a pattern matching problem that was introduced in the context of biological networks. Informally, given a multiset of colors P and a graph H whose nodes have sets of colors, it asks if P occurs in a module of H (i.e. in a set of nodes that have the same neighborhood outside the set). We present three parameterized algorithms for this problem that measure similarity between matched colors and handle deletions and insertions of colors to P. We observe that the running time of two of them might be essentially tight and prove that the problem is unlikely to admit a polynomial kernel.

Keywords: parameterized algorithm, module motif, pattern matching.

1 Introduction

Graph Motif is an important problem in the analysis of biological networks that has received considerable attention since its introduction by Lacroix et al. [16] (see [1, 2, 4, 5, 7, 10–13, 15, 19–21]). Informally, given a multiset of colors P and a graph H whose nodes have sets of colors, it asks if P occurs in a subtree of H.

A module M of a graph H = (V, E) is a subset of V s.t. ∀u, v ∈ M, ∀x ∉ M : {v, x} ∈ E iff {u, x} ∈ E [8] (i.e. it is a set of nodes that have the same neighborhood outside the set). Rizzi et al. [21] replace the connectivity constraint of Graph Motif with modularity and thus define Module Motif. They present biological justifications for considering this replacement.
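The definition of a module translates directly into a check that every outside node is adjacent either to all nodes of M or to none of them. A minimal sketch, assuming an undirected graph stored as a dictionary from each node to the set of its neighbors (the representation and the name `is_module` are choices made for this example):

```python
def is_module(adj, module):
    """Return True iff `module` is a module of the undirected graph `adj`.

    adj: dict mapping every node to the set of its neighbors (symmetric).
    module: iterable of nodes of the graph.
    """
    module = set(module)
    for x in set(adj) - module:
        inside_neighbors = adj[x] & module
        # x must see either no node of the module or all of them.
        if inside_neighbors and inside_neighbors != module:
            return False
    return True
```

On the path a - b - c, for instance, {a, c} is a module (b is adjacent to both), while {a, b} is not (c is adjacent to b only).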

Module Motif

– Input: A set of colors C, a multiset P of colors from C, a graph H = (V, E) and Col : V → $2^C$.
– Decide if there is a module M of H and m : M → C s.t.
  1. ∀v ∈ M : m(v) ∈ Col(v).
  2. ∀c ∈ C : c occurs in P exactly |{v ∈ M : m(v) = c}| times.

In the limited case of the problem we have that |Col(v)| = 1 for all v ∈ V. We denote MaxCol = $\max_{v \in V}\{|Col(v)|\}$.

We use the common O∗ and Õ notation to hide factors polynomial and polylogarithmic in the input size, respectively. As in [21], we denote k = |P|.

K. Chatterjee and J. Sgall (Eds.): MFCS 2013, LNCS 8087, pp. 825–836, 2013. © Springer-Verlag Berlin Heidelberg 2013



Rizzi et al. [21] prove that the limited case of the problem is NP-complete even if P is a set and H is a collection of paths of size 3. They denote by c the number of different colors in P and give an O∗((k(2c)k)k+1ck) time algorithm for the problem, which is not satisfactory for practical purposes. They also give an $O^*(2^k)$ time and space algorithm for the limited case. They use a modular tree decomposition [24] and dynamic programming.

Rizzi et al. [21] leave the handling of deletions and insertions of colors to P as an open problem. Several Graph Motif algorithms handle deletions and insertions (see e.g. [19]), and we shall handle them in a similar manner. In biological networks such as protein-protein interaction networks and metabolic networks, we can measure the similarity between the elements that the colors represent (see e.g. [18, 23]) and thus assess the relevance of a solution to Module Motif. This approach is used in the alignment query problem [19], which is another pattern matching problem that was introduced in the context of biological networks. Thus we define the following generalization of Module Motif:

General Module Motif

– Input: A set of colors C, a multiset P of colors from C, a graph H = (V, E), col : V → C, Col : V → $2^C$, Δ : C × C → $\mathbb{R}$, D ∈ $\mathbb{N}_0$, I ∈ $\mathbb{N}_0$ and S ∈ $\mathbb{R}$.

– Decide if there is a module M of H, U ⊆ M and m : U → C s.t.
1. ∀v ∈ U : m(v) ∈ Col(v).
2. ∀c ∈ C : c occurs in P at least |{v ∈ U : m(v) = c}| times.
3. |U| = k − D = |M| − I.
4. $\sum_{v \in U} \Delta(\mathrm{col}(v), m(v)) \ge S$.

The function col assigns to each node a color that represents the element that the node represents, and Col assigns to each node a set of colors that represent elements that are similar to the element that the node represents. For example, in protein-protein interaction networks col(v) is the color that represents the protein that v represents, and Col(v) is the set of colors that represent proteins that are homologous to the protein that v represents. Δ is symmetric, and D, I and S stand for Deletions, Insertions and Score, respectively.

Conditions 1 and 2 are similar to those of Module Motif. Condition 3 states that we delete D occurrences from P and add I occurrences to P. Condition 4 states that the score of the solution is at least S. We denote the occurrences of colors in P by $p_1, p_2, \ldots, p_k$, the color of an occurrence $p_i$ by $\mathrm{col}(p_i)$ and $S^+ = \max\{S, 1\}$. We assume WLOG that D ≤ k, I ≤ |V| and MaxCol ≤ min{|C|, k}.
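To make the four conditions concrete, here is a hedged sketch of a verifier for a candidate solution (M, U, m) of General Module Motif. The function name, the dictionary-based encoding of col and Col, and the toy instance are assumptions made for illustration; the paper only defines the decision problem.

```python
from collections import Counter


def is_gmm_solution(adj, M, U, m, P, col, Col, delta, D, I, S):
    """Check that (M, U, m) satisfies modularity and Conditions 1-4."""
    M, U, k = set(M), set(U), len(P)
    if not U <= M or set(m) != U:
        return False
    # M must be a module: outside nodes see all of M or none of it.
    for x in set(adj) - M:
        hits = sum(1 for v in M if x in adj[v])
        if 0 < hits < len(M):
            return False
    # Condition 1: every matched color is allowed for its node.
    if any(m[v] not in Col[v] for v in U):
        return False
    # Condition 2: no color is used more often than it occurs in P.
    used, available = Counter(m.values()), Counter(P)
    if any(used[c] > available[c] for c in used):
        return False
    # Condition 3: |U| = k - D = |M| - I.
    if not (len(U) == k - D == len(M) - I):
        return False
    # Condition 4: the total similarity score is at least S.
    return sum(delta(col[v], m[v]) for v in U) >= S


# Toy instance (hypothetical): path a - b - c, one deletion and one insertion.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
col = {"a": "red", "b": "green", "c": "blue"}
Col = {"a": {"red"}, "b": {"green"}, "c": {"blue"}}
delta = lambda c1, c2: 1 if c1 == c2 else 0
ok = is_gmm_solution(adj, {"a", "c"}, {"a"}, {"a": "red"},
                     ["red", "yellow"], col, Col, delta, D=1, I=1, S=1)
```

Here the occurrence of "yellow" is deleted from P (D = 1) and node c is an inserted node of the module that is left unmatched (I = 1).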

Module Motif is the special case of this problem in which (∀c, c′ ∈ C : Δ(c, c′) = 0) and I = D = S = 0.

Fixed-parameter algorithms [17] are an approach to solve NP-hard problems by confining the combinatorial explosion to a parameter t. More precisely, a problem is fixed-parameter tractable with respect to a parameter t if an instance of size n can be solved in $O^*(f(t))$ time for some function f. We shall consider the parameters k and k − D.

Our algorithms use modular tree decompositions (see Section 2). Section 3 presents an $O^*(2^k)$ time General Module Motif algorithm that uses dynamic programming. In Section 4 we use it and improved color coding [14] to design an $O^*(4.314^{k-D})$ time General Module Motif algorithm. Section 5 presents an $O^*(2^{k-D} S^+)$ time algorithm for General Module Motif where S and Δ(c, c′) are nonnegative integers ∀c, c′ ∈ C. It uses the algebraic framework of Björklund et al. [3]. We get an $O^*(2^k)$ time and $O^*(1)$ space Module Motif algorithm, which improves the O∗((k(2c)k)k+1ck) time algorithm of Rizzi et al. [21]. In Section 6 we observe that some of our results might be essentially tight. Finally, in Section 7, we prove that Module Motif is unlikely to admit a polynomial kernel.

Due to space constraints, some of the proofs are omitted.

2 Sets of Disjoint Sets Instead of Modules

In this section we show that we can focus our attention on finding a certain subset of a set of disjoint sets instead of a certain module of a graph.

A modular tree decomposition of a graph H = (V, E) is a linear-sized representation of all its modules. It includes a rooted tree $T = (V_T, E_T)$, a function $f : V_T \to 2^V$ and a function $g : V_T \to \{0, 1\}$. A formal definition appears in [24]. In this paper we are only interested in its following properties (which are also used in [21]):

1. M is a module of H iff there is a node $v \in V_T$ s.t. M = f(v) or (g(v) = 1 and there is a subset U of the set of children of v s.t. $M = \bigcup_{u \in U} f(u)$).
2. Every $v, u \in V_T$ that have the same father satisfy f(v) ∩ f(u) = ∅.
3. $|V_T| \le 2|V| - 1$.

Now we define a problem whose algorithms (which we design in Sections 3 and 5), together with modular tree decompositions, will help us solve General Module Motif.

Set Motif

– Input: A set of colors C, a multiset P of colors from C, a set $\mathcal{A}$ of disjoint sets, col : $\bigcup\mathcal{A}$ → C, Col : $\bigcup\mathcal{A}$ → $2^C$, Δ : C × C → $\mathbb{R}$, D ∈ $\mathbb{N}_0$, I ∈ $\mathbb{N}_0$ and S ∈ $\mathbb{R}$.

– Decide if there is $A \subseteq \mathcal{A}$, $U \subseteq \bigcup A$ and m : U → C s.t.
1. ∀a ∈ U : m(a) ∈ Col(a).
2. ∀c ∈ C : c occurs in P at least |{a ∈ U : m(a) = c}| times.
3. $|U| = k - D = |\bigcup A| - I$.
4. $\sum_{a \in U} \Delta(\mathrm{col}(a), m(a)) \ge S$.

We denote the sets of $\mathcal{A}$ by $A_1, A_2, \ldots, A_{|\mathcal{A}|}$. For each $A_i \in \mathcal{A}$, we denote its elements by $a^i_1, a^i_2, \ldots, a^i_{|A_i|}$.

Let SetALG(C, P, $\mathcal{A}$, col, Col, Δ, D, I, S) be a Set Motif algorithm that uses $t(\mathrm{MaxCol}, k, |\bigcup\mathcal{A}|, D, I, S)$ time and $s(\mathrm{MaxCol}, k, |\bigcup\mathcal{A}|, D, I, S)$ space. Next we present a procedure that solves General Module Motif by using a modular tree decomposition and SetALG.



ModuleALG(C, P, H = (V, E), col, Col, Δ, D, I, S)

1. Compute a modular tree decomposition $(T = (V_T, E_T), f, g)$ of H in $O(|V|^2)$ time and O(|V|) space (e.g. use the algorithm of Tedder et al. [24]).

2. ∀v ∈ $V_T$: If SetALG(C, P, {f(v)}, col, Col, Δ, D, I, S) accepts or (g(v) = 1 and SetALG(C, P, {f(u) : u is a child of v}, col, Col, Δ, D, I, S) accepts): Accept.

3. Reject.

Property 2 of a modular tree decomposition implies that ∀v ∈ $V_T$, {f(u) : u is a child of v} is a set of disjoint subsets of V. Thus each call of SetALG is legal and uses O(t(MaxCol, k, |V|, D, I, S)) time and O(s(MaxCol, k, |V|, D, I, S)) space. By Property 3 of a modular tree decomposition, we get that Step 2 uses O(|V| · t(MaxCol, k, |V|, D, I, S)) time and O(s(MaxCol, k, |V|, D, I, S)) space.

Property 1 of a modular tree decomposition implies that if (M, U, m) is a solution, then there is v ∈ $V_T$ s.t. ({f(v)}, U, m) is a solution to Set Motif whose input is (C, P, {f(v)}, col, Col, Δ, D, I, S) or (g(v) = 1 and ({f(u) : u ∈ M}, U, m) is a solution to Set Motif whose input is (C, P, {f(u) : u is a child of v}, col, Col, Δ, D, I, S)). Conversely, if there is v ∈ $V_T$ s.t. (A, U, m) is a solution to Set Motif whose input is (C, P, {f(v)}, col, Col, Δ, D, I, S) or (g(v) = 1 and it is a solution to Set Motif whose input is (C, P, {f(u) : u is a child of v}, col, Col, Δ, D, I, S)), then ($\bigcup A$, U, m) is a solution.

We get the following lemma:

Lemma 1. ModuleALG solves General Module Motif in $O(|V| \cdot t(\mathrm{MaxCol}, k, |V|, D, I, S) + |V|^2)$ time and $O(s(\mathrm{MaxCol}, k, |V|, D, I, S) + |V|)$ space.
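The control flow of ModuleALG can be sketched as follows, assuming the modular tree decomposition has already been computed and is handed over as plain dictionaries. The names `module_alg`, `children` and `toy_set_alg` are hypothetical; in particular `toy_set_alg` is only a stand-in for a real Set Motif algorithm, used here to exercise the loop.

```python
from itertools import combinations


def module_alg(tree_nodes, children, f, g, set_alg):
    """Step 2 of ModuleALG: try {f(v)} and, if g(v) = 1, the family of children sets."""
    for v in tree_nodes:
        if set_alg([f[v]]):
            return True
        if g[v] == 1 and set_alg([f[u] for u in children[v]]):
            return True
    return False


def toy_set_alg(family, target=2):
    """Stand-in for SetALG: accept iff some sub-family has a union of size `target`."""
    for r in range(1, len(family) + 1):
        for sub in combinations(family, r):
            if len(set().union(*sub)) == target:
                return True
    return False


# Edgeless graph on {a, b, c}: the root has three leaf children and g(root) = 1,
# so every union of children is a module (Property 1).
tree_nodes = ["root", "la", "lb", "lc"]
children = {"root": ["la", "lb", "lc"], "la": [], "lb": [], "lc": []}
f = {"root": {"a", "b", "c"}, "la": {"a"}, "lb": {"b"}, "lc": {"c"}}
g = {"root": 1, "la": 0, "lb": 0, "lc": 0}
found = module_alg(tree_nodes, children, f, g, toy_set_alg)
```

Because the tree has at most 2|V| − 1 nodes (Property 3), the loop makes O(|V|) calls to the Set Motif routine, which is where the |V| · t(...) term of Lemma 1 comes from.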

3 An $O^*(2^k)$-Time Algorithm

We present an $O^*(2^k)$ time and space Set Motif algorithm that uses dynamic programming. By Lemma 1, we thus get an $O^*(2^k)$ time and space General Module Motif algorithm, which we denote by ALG3.

First we define the partial solutions that we consider in our computation.

Definition 1. Given a multiset $P' \subseteq P$ s.t. $|P'| \le k - D$, $1 \le i \le |\mathcal{A}|$, $1 \le j \le |A_i|$ and 0 ≤ ins ≤ I, Sol(P′, i, j, ins) denotes the set of tuples (A, U, m) s.t. $A \subseteq \{A_1, A_2, \ldots, A_{i-1}, \{a^i_1, a^i_2, \ldots, a^i_j\}\}$, $U \subseteq \bigcup A$, m : U → C and

1. ∀a ∈ U : m(a) ∈ Col(a).
2. ∀c ∈ C : c occurs in P′ exactly |{a ∈ U : m(a) = c}| times.
3. $|U| = |P'| = |\bigcup A| - \mathit{ins}$.

We use two matrices:

1. M has a cell [P′, i, j, ins] for every multiset $P' \subseteq P$ s.t. $|P'| \le k - D$, $1 \le i \le |\mathcal{A}|$, $1 \le j \le |A_i|$ and 0 ≤ ins ≤ I.
2. N has a cell [P′, i, ins] for every multiset $P' \subseteq P$ s.t. $|P'| \le k - D$, $1 \le i \le |\mathcal{A}|$ and 0 ≤ ins ≤ I.

The cells of the matrices hold the following scores:

1. $M[P', i, j, \mathit{ins}] = \max \{ \sum_{a \in U} \Delta(\mathrm{col}(a), m(a)) : (A, U, m) \in \mathrm{Sol}(P', i, j, \mathit{ins}),\ \{a^i_1, a^i_2, \ldots, a^i_j\} \in A \}$.
2. $N[P', i, \mathit{ins}] = \max \{ \sum_{a \in U} \Delta(\mathrm{col}(a), m(a)) : (A, U, m) \in \mathrm{Sol}(P', i, |A_i|, \mathit{ins}) \}$.

Lemma 2. The cells of M and N can be computed in $O(2^k \cdot \mathrm{MaxCol} \cdot |\bigcup\mathcal{A}| \cdot I)$ time and $O(2^k I)$ space by using dynamic programming.

After we compute the cells, we accept iff $\max \{ N[P', |\mathcal{A}|, I] : P' \subseteq P,\ |P'| = k - D \} \ge S$, and the correctness immediately follows from the definition of Set Motif. We get the following theorem:

Theorem 1. ALG3 solves General Module Motif in $O(2^k \cdot \mathrm{MaxCol} \cdot |V|^2 \cdot I)$ time and $O(2^k I + |V|)$ space.

4 An $O^*(4.314^{k-D})$-Time Algorithm

We use improved color coding [14] and ALG3 (see Section 3) to design an $O^*(4.314^{k-D})$ time and $O^*(2.463^{k-D})$ space General Module Motif algorithm.

The idea of the algorithm is to introduce a new multiset of colors that represents P and whose size is a function of k − D. We reduce the size from k to a function of k − D by allowing each occurrence in the new multiset to represent several occurrences in P. Then we call ALG3 with the new multiset and get a parameterized algorithm whose parameter is k − D.

Let $P^* = \{p^*_1, p^*_2, \ldots, p^*_{1.3(k-D)}\}$ be a set of 1.3(k − D) new colors. For each v ∈ V, define $\mathrm{col}^*(v) = c^*_v$, where $c^*_v$ is a new color. Define $C^* = P^* \cup \{c^*_v : v \in V\}$.

Given $f : P \to P^*$, we define $\mathrm{Col}_f : V \to 2^{C^*}$ as follows ∀v ∈ V:

– $\mathrm{Col}_f(v) = \{f(p_i) : p_i \in P,\ \mathrm{col}(p_i) \in \mathrm{Col}(v)\}$.

We also define a symmetric $\Delta_f : C^* \times C^* \to \mathbb{R}$ as follows ∀c, d ∈ $C^*$:

1. If $c = c^*_v$ and $d \in \mathrm{Col}_f(v)$ for a node v ∈ V: $\Delta_f(c, d) = \Delta_f(d, c) = \max \{ \Delta(\mathrm{col}(v), \mathrm{col}(p_i)) : p_i \in P,\ f(p_i) = d,\ \mathrm{col}(p_i) \in \mathrm{Col}(v) \}$.
2. Else: $\Delta_f(c, d) = \Delta_f(d, c) = 0$.

Now we present the algorithm:

ALG4(C, P, H = (V, E), col, Col, Δ, D, I, S):

1. Repeat $1.752^{k-D}$ times:
(a) ∀$p_i$ ∈ P, independently and uniformly at random assign a color from $P^*$. Denote the resulting function by $f : P \to P^*$.
(b) If ALG3($C^*$, $P^*$, H, $\mathrm{col}^*$, $\mathrm{Col}_f$, $\Delta_f$, 0.3(k − D), I, S) accepts: Accept.

2. Reject.

Next we prove the correctness of ALG4.



Observation 1. If there is $f : P \to P^*$ s.t. $(M, U, m^*)$ is a solution to ($C^*$, $P^*$, H, $\mathrm{col}^*$, $\mathrm{Col}_f$, $\Delta_f$, 0.3(k − D), I, S), then there is a solution to the input instance.

Proof. ∀v ∈ U, we define $m(v) = \mathrm{col}(p_i)$ for some $p_i \in P$ s.t. $f(p_i) = m^*(v)$, $\mathrm{col}(p_i) \in \mathrm{Col}(v)$ and $\Delta_f(\mathrm{col}^*(v), f(p_i)) = \Delta(\mathrm{col}(v), \mathrm{col}(p_i))$ (our definitions of $\mathrm{Col}_f$ and $\Delta_f$ imply that such a $p_i$ exists). Clearly M is a module of H, U ⊆ M and m : U → C. Moreover, they fulfill the following conditions:

1. Let v ∈ U. We have that m(v) ∈ Col(v).
2. Let c ∈ C. If c does not occur in P, then |{v ∈ U : m(v) = c}| = 0. If there are two different v, u ∈ U s.t. we defined both m(v) and m(u) as c because of the same occurrence $p_i$ of c in P, then $|\{v \in U : m^*(v) = f(p_i)\}| \ge 2$, which is a contradiction (since $f(p_i)$ occurs once in $P^*$). Thus the number of occurrences of c in P is at least |{v ∈ U : m(v) = c}|.
3. $|U| = |P^*| - 0.3(k - D) = k - D$.
4. |U| = |M| − I.
5. $\sum_{v \in U} \Delta(\mathrm{col}(v), m(v)) = \sum_{v \in U} \Delta_f(\mathrm{col}^*(v), m^*(v)) \ge S$.

Thus (M, U, m) is a solution to the input instance. □

Observation 2. If there is a solution (M, U, m) to the input instance, then with probability at least 1 − 1/e, there is an iteration where we choose $f : P \to P^*$ s.t. there is a solution to ($C^*$, $P^*$, H, $\mathrm{col}^*$, $\mathrm{Col}_f$, $\Delta_f$, 0.3(k − D), I, S).

Proof. Each c ∈ C occurs in P at least |{v ∈ U : m(v) = c}| times. Thus ∀c ∈ C we can denote some set of |{v ∈ U : m(v) = c}| occurrences of c in P by Occ(c).

We denote by F the set of functions $f : P \to P^*$ that satisfy $f(p_i) \ne f(p_j)$ for all $p_i, p_j \in \bigcup_{c \in C} \mathrm{Occ}(c)$ s.t. $i \ne j$. Note that $|\bigcup_{c \in C} \mathrm{Occ}(c)| = k - D$.

Given sets A, B ⊆ A and C s.t. 1.3|B| = |C|, Hüffner et al. [14] prove that if we repeat $1.752^{|B|}$ times the step

– ∀a ∈ A, independently and uniformly at random assign an element from C.

then with probability at least 1 − 1/e, there is a step where we assign to each element in B a different element from C.

We get that with probability at least 1 − 1/e, there is an iteration where we choose f ∈ F. Next consider an iteration that corresponds to such an f.

For each v ∈ U, choose a different occurrence $p_i$ of m(v) in Occ(m(v)) and denote $m^*(v) = f(p_i)$. Clearly M is a module of H, U ⊆ M and $m^* : U \to C^*$. Moreover, they fulfill the following conditions:

1. Let v ∈ U. m(v) ∈ Col(v), and thus $m^*(v) \in \{f(p_i) : p_i \in P,\ \mathrm{col}(p_i) \in \mathrm{Col}(v)\}$. Therefore $m^*(v) \in \mathrm{Col}_f(v)$.
2. Let $c \in C^*$. If $c \notin P^*$, then $|\{v \in U : m^*(v) = c\}| = 0$, and if $c \in P^*$, then by our choice of f, $|\{v \in U : m^*(v) = c\}| \le 1$.
3. $|U| = k - D = |P^*| - 0.3(k - D)$.
4. |U| = |M| − I.
5. $\sum_{v \in U} \Delta_f(\mathrm{col}^*(v), m^*(v)) \ge \sum_{v \in U} \Delta(\mathrm{col}(v), m(v)) \ge S$.

Thus $(M, U, m^*)$ is a solution to ($C^*$, $P^*$, H, $\mathrm{col}^*$, $\mathrm{Col}_f$, $\Delta_f$, 0.3(k − D), I, S). □

The time and space complexities of Step 1b are $O(2^{1.3(k-D)} (k - D) |V|^2 I)$ and $O(2^{1.3(k-D)} I + |V|)$ respectively. Since we repeat it $1.752^{k-D}$ times, we get that ALG4 uses $O(4.314^{k-D} k |V|^2 I)$ time and $O(2.463^{k-D} I + |V|)$ space.

We get the following theorem:

Theorem 2. ALG4 solves General Module Motif. It has one-sided error and uses $O(4.314^{k-D} k |V|^2 I)$ time and $O(2.463^{k-D} I + |V|)$ space.
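The constants 4.314 and 2.463 follow from the number of repetitions and the size of $P^*$. The short sketch below recomputes them and shows one random recoloring step; rounding 1.3(k − D) up to an integer and the helper name `random_recoloring` are assumptions made here for illustration, since the text does not discuss integrality.

```python
import math
import random

# One call of ALG3 on a pattern of size 1.3(k - D) costs 2^{1.3(k - D)} up to
# polynomial factors, and the call is repeated 1.752^{k - D} times.
space_base = 2 ** 1.3            # base of the space bound, about 2.4623
time_base = 1.752 * 2 ** 1.3     # base of the time bound, about 4.3139


def random_recoloring(k, k_minus_d, rng):
    """Step 1a of ALG4: map each occurrence p_1..p_k to a uniformly random new color."""
    palette = math.ceil(1.3 * k_minus_d)  # |P*|, rounded up (assumption)
    return [rng.randrange(palette) for _ in range(k)]


f = random_recoloring(k=6, k_minus_d=4, rng=random.Random(0))
print(round(space_base, 4), round(time_base, 4), f)
```

Both values are slightly below the figures quoted in the theorem, which are consistent with rounding the bases up to three decimals.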

5 An $O^*(2^{k-D} S^+)$-Time Algorithm

We present an $O^*(2^{k-D} S^+)$ time and $O^*(S^+)$ space algorithm for Set Motif where S and Δ(c, c′) are nonnegative integers ∀c, c′ ∈ C. By Lemma 1, we thus get an $O^*(2^{k-D} S^+)$ time and $O^*(S^+)$ space algorithm for General Module Motif where S and Δ(c, c′) are nonnegative integers ∀c, c′ ∈ C, which we denote by ALG5. Since in Module Motif, (∀c, c′ ∈ C : Δ(c, c′) = 0) and D = I = S = 0, we get an $O^*(2^k)$ time and $O^*(1)$ space Module Motif algorithm.

We use the algebraic framework of Björklund et al. [3]. We express our parameterized problem by associating monomials with potential solutions. A correct solution is associated with a unique monomial, and a monomial which is not associated with a correct solution is associated with an even number of potential solutions. Having a polynomial which is the sum of such monomials, we need to determine whether it has a monomial whose coefficient is odd.

Given i ∈ N, we denote [i] = {1, 2, . . . , i}.

5.1 Potential Solutions

First we define the potential solutions (PS stands for Potential Solutions).

Definition 2. Given 0 ≤ size ≤ k − D, $1 \le i \le |\mathcal{A}|$, $1 \le j \le |A_i|$, 0 ≤ ins ≤ I and sco ∈ $\mathbb{N}_0$, PS(size, i, j, ins, sco) is the set of tuples (A, U, m, l) s.t. $A \subseteq \{A_1, A_2, \ldots, A_{i-1}, \{a^i_1, a^i_2, \ldots, a^i_j\}\}$, $U \subseteq \bigcup A$, m : U → P, l : U → [k − D] and

1. ∀a ∈ U : col(m(a)) ∈ Col(a).
2. $|U| = \mathit{size} = |\bigcup A| - \mathit{ins}$.
3. $\sum_{a \in U} \Delta(\mathrm{col}(a), \mathrm{col}(m(a))) = \mathit{sco}$.

We denote:

1. $\mathrm{Bij} = \bigcup_{s \in \mathbb{N}_0,\ s \ge S} \{(A, U, m, l) \in \mathrm{PS}(k - D, |\mathcal{A}|, |A_{|\mathcal{A}|}|, I, s) : l \text{ is bijective}\}$.
2. $\mathrm{BijM} = \{(A, U, m, l) \in \mathrm{Bij} : m \text{ is injective}\}$.

Observation 3. The input instance has a solution iff BijM ≠ ∅.



5.2 Associating Monomials with Potential Solutions

We introduce an indeterminate $x(A_i)$ for all $A_i \in \mathcal{A}$, an indeterminate y(a, p) for all $a \in \bigcup\mathcal{A}$ and p ∈ P, and an indeterminate z(p, i) for all p ∈ P and i ∈ [k − D]. We order them arbitrarily as $q_1, q_2, \ldots, q_r$ where $r = |\mathcal{A}| + k(|\bigcup\mathcal{A}| + k - D)$.

Definition 3. Given (A, U, m, l) ∈ PS(size, i, j, ins, sco), its monomial is

$\mathrm{mon}(A, U, m, l) = \prod_{A_i \in A} x(A_i) \prod_{a \in U} y(a, m(a))\, z(m(a), l(a))$.

Observation 4. If (A, U, m, l) ∈ BijM and (A′, U′, m′, l′) ∈ Bij \ {(A, U, m, l)}, then mon(A, U, m, l) ≠ mon(A′, U′, m′, l′).

Observation 5. There is a fixed-point-free involution (i.e. a permutation that is its own inverse) f : Bij \ BijM → Bij \ BijM s.t. mon(A, U, m, l) = mon(f(A, U, m, l)) for all (A, U, m, l) ∈ Bij \ BijM.

5.3 Evaluating the Sum of the Monomials

Denote $\mathrm{POL}(q_1, q_2, \ldots, q_r) = \sum_{(A, U, m, l) \in \mathrm{Bij}} \mathrm{mon}(A, U, m, l)$. Observations 3, 4 and 5 imply that the input instance has a solution iff POL has a monomial with an odd coefficient. We evaluate the polynomial over the finite field $F_q$ (i.e. the finite field of order q), where $q = 2^{\lceil \log_2(e(3k + I)) \rceil}$. Since this field has characteristic 2, we get the following observation:

Observation 6. The input instance has a solution iff POL ≢ 0.

Denote the image of a function l : U → [k − D] by l(U). Given L ⊆ [k − D], denote:

1. $\mathrm{PS}_L = \{(A, U, m, l) \in \bigcup_{s \in \mathbb{N}_0,\ s \ge S} \mathrm{PS}(k - D, |\mathcal{A}|, |A_{|\mathcal{A}|}|, I, s) : l(U) \subseteq L\}$.
2. $\mathrm{POL}_L(q_1, q_2, \ldots, q_r) = \sum_{(A, U, m, l) \in \mathrm{PS}_L} \mathrm{mon}(A, U, m, l)$.

By the inclusion-exclusion principle and since $F_q$ has characteristic 2, we get the following observation:

Observation 7. $\mathrm{POL}(q_1, q_2, \ldots, q_r) = \sum_{L \subseteq [k-D]} \mathrm{POL}_L(q_1, q_2, \ldots, q_r)$.
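One way to see why the sum over all L ⊆ [k − D] keeps exactly the bijective labelings is a parity count: a labeling l with image l(U) is counted once for every L ⊇ l(U), i.e. $2^{(k-D) - |l(U)|}$ times, and this number is odd only when l is onto. The sketch below checks this by brute force for small sizes; it is an illustration of the counting argument under that reading, not the paper's proof, and the function name is hypothetical.

```python
from itertools import combinations, product


def parity_sieve_holds(n):
    """For |U| = n and labels [n], check that a labeling l is counted an odd
    number of times over all L containing l(U) iff l is a bijection."""
    labels = list(range(1, n + 1))
    subsets = [set(L) for r in range(n + 1) for L in combinations(labels, r)]
    for l in product(labels, repeat=n):       # l[a] is the label of the a-th element of U
        image = set(l)
        count = sum(1 for L in subsets if image <= L)
        if (count % 2 == 1) != (len(image) == n):
            return False
    return True


print(all(parity_sieve_holds(n) for n in range(1, 5)))
```

In characteristic 2 every term that is counted an even number of times cancels, so no inclusion-exclusion signs are needed.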

Lemma 3. Given L ⊆ [k − D] and $b_1, b_2, \ldots, b_r \in F_q$, $\mathrm{POL}_L(b_1, b_2, \ldots, b_r)$ can be evaluated in $O(S^+ \log S^+ \cdot k |\bigcup\mathcal{A}| I)$ time and $O(S^+ k I)$ space by using dynamic programming.

5.4 The Algorithm

SetALG5(C, P, $\mathcal{A}$, col, Col, Δ, D, I, S)

1. Select $b_1, b_2, \ldots, b_r \in F_q$ independently and uniformly at random.
2. SUM ⇐ 0.
3. For all L ⊆ [k − D]: SUM ⇐ SUM + $\mathrm{POL}_L(b_1, b_2, \ldots, b_r)$.
4. Accept iff SUM ≠ 0.

The proof of the following lemma appears in [22]:

Lemma 4. Let $p(x_1, x_2, \ldots, x_n)$ be a nonzero polynomial of total degree at most d over the finite field F. Then, for $b_1, b_2, \ldots, b_n \in F$ selected independently and uniformly at random: $\Pr(p(b_1, b_2, \ldots, b_n) \ne 0) \ge 1 - d/|F|$.

By Observations 6 and 7 and Lemmas 3 and 4 (note that the degree of POL is at most 3k + I), we have that:

Lemma 5. SetALG5 solves Set Motif where S ∈ $\mathbb{N}_0$ and Δ(c, c′) ∈ $\mathbb{N}_0$ for all c, c′ ∈ C. It has one-sided error and uses $O(2^{k-D} S^+ \log S^+ \cdot k |\bigcup\mathcal{A}| I)$ time and $O(k(S^+ I + |\bigcup\mathcal{A}|))$ space.

We get the following theorem:

Theorem 3. ALG5 solves General Module Motif where S ∈ $\mathbb{N}_0$ and Δ(c, c′) ∈ $\mathbb{N}_0$ for all c, c′ ∈ C. It has one-sided error and uses $O(2^{k-D} S^+ \log S^+ \cdot k |V|^2 I)$ time and $O(k(S^+ I + |V|))$ space.

6 The Tightness of the Results

In this section we observe that further improvement on the running time of the algorithms we have presented in Sections 3 and 5 is substantially harder.

Set Cover

– Input: t ∈ $\mathbb{N}_0$, a set of sets $\mathcal{S} = \{S_1, S_2, \ldots, S_m\}$ and $A = \bigcup\mathcal{S}$ where |A| = n.

– Decide if there is $\mathcal{S}' \subseteq \mathcal{S}$ s.t. $|\mathcal{S}'| = t$ and $A = \bigcup\mathcal{S}'$.

We assume WLOG that 2 ≤ t ≤ m.

We prove that for any ε > 0, there is ε′ > 0 s.t. the existence of an $O^*((2 - \varepsilon)^k)$ time algorithm for Module Motif even if P is a set and H is a collection of paths implies an $O^*((2 - \varepsilon')^n)$ time Set Cover algorithm. Thus the existence of an $O^*((2 - \varepsilon)^k)$ General Module Motif algorithm or an $O^*((2 - \varepsilon)^{k-D} S^+)$ algorithm for General Module Motif where S ∈ $\mathbb{N}_0$ and Δ(c, c′) ∈ $\mathbb{N}_0$ for all c, c′ ∈ C implies an $O^*((2 - \varepsilon')^n)$ time Set Cover algorithm.

Björklund et al. [4] use this approach to claim that further improvement on the running time of their Graph Motif algorithm is substantially harder. Set Cover is a well-known problem that has been researched for decades, which suggests that an $O^*((2 - \varepsilon)^n)$ time algorithm for it, if possible at all, would be a major breakthrough in the field. The nonexistence of such an algorithm has already been used as an assumption for proving hardness results [9].

Theorem 4. Let ALG6 be an $O^*((2 - \varepsilon)^k)$ time algorithm for Module Motif where P is a set and H is a collection of paths. Then there is ε′ > 0 s.t. there is an $O^*((2 - \varepsilon')^n)$ time Set Cover algorithm.



Proof. Consider the following algorithm:

SetCoverALG(t, $\mathcal{S} = \{S_1, S_2, \ldots, S_m\}$, A)

1. Construct an instance of Module Motif where P is a set and H is a collection of paths as follows:
(a) $C = P = A \cup \{c_1, c_2, \ldots, c_t\}$. Note that k = n + t.
(b) $V = \{v_{i,r,j} : 1 \le i \le m,\ 1 \le r \le |S_i|,\ 1 \le j \le r\} \cup \{u_{i,r} : 1 \le i \le m,\ 0 \le r \le |S_i|\}$.
(c) $E = \{\{v_{i,r,j}, v_{i,r,j+1}\} : 1 \le i \le m,\ 1 \le r \le |S_i|,\ 1 \le j \le r - 1\} \cup \{\{v_{i,r,r}, u_{i,r}\} : 1 \le i \le m,\ 1 \le r \le |S_i|\}$.
(d) $\forall v_{i,r,j} \in V : \mathrm{Col}(v_{i,r,j}) = S_i$.
(e) $\forall u_{i,r} \in V : \mathrm{Col}(u_{i,r}) = \{c_1, c_2, \ldots, c_t\}$.

2. Accept iff ALG6(C, P, H = (V, E), Col) accepts.

Lemma 6. SetCoverALG solves Set Cover.

Since ALG6 runs in $O^*((2 - \varepsilon)^k)$ time, we get that SetCoverALG runs in $O^*((2 - \varepsilon)^{n+t})$ time. Cygan et al. [9] prove that the existence of an $O^*((2 - \varepsilon)^{n+t})$ time Set Cover algorithm implies that there is ε′ > 0 s.t. there is an $O^*((2 - \varepsilon')^n)$ time Set Cover algorithm, and thus we get the theorem. □
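Step 1 of SetCoverALG is a purely mechanical construction, sketched below. The tuple encoding of nodes (`("v", i, r, j)`, `("u", i, r)`) and of the new colors (`("c", q)`) is an assumption made for this illustration, as is the requirement that these tuples do not clash with elements of the universe.

```python
def set_cover_to_module_motif(t, sets):
    """Build (C, P, H = (V, E), Col) from a Set Cover instance (t, sets)."""
    universe = set().union(*sets)                    # A, with |A| = n
    extra = [("c", q) for q in range(1, t + 1)]      # new colors c_1..c_t
    P = list(universe) + extra                       # P is a set of k = n + t colors
    C = set(P)
    Col, E = {}, set()
    for i, S_i in enumerate(sets, start=1):
        for r in range(0, len(S_i) + 1):
            u = ("u", i, r)
            Col[u] = set(extra)                      # step (e)
            for j in range(1, r + 1):
                Col[("v", i, r, j)] = set(S_i)       # step (d)
                if j < r:                            # path edges of step (c)
                    E.add(frozenset({("v", i, r, j), ("v", i, r, j + 1)}))
            if r >= 1:                               # attach u_{i,r} to the end of the path
                E.add(frozenset({("v", i, r, r), u}))
    V = set(Col)
    return C, P, (V, E), Col


C, P, (V, E), Col = set_cover_to_module_motif(2, [{1, 2}, {2, 3}])
print(len(P), len(V), len(E))
```

For each set $S_i$ and each r the construction yields one path with r + 1 nodes, so H is a collection of paths, as the reduction requires.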

7 No Polynomial Kernel

We denote by LMMS the Limited case of Module Motif where P is a Set and the parameter is k. We prove that even LMMS is unlikely to admit a polynomial kernel (i.e. there is no $O^*(1)$ time algorithm that for every input of this problem, its output is an input of this problem that has a solution iff the original input has a solution and whose size is polynomial in k).

Given an input IN of a parameterized problem, we denote by p(IN) the parameter of IN (e.g. the parameter of the input (C, P, H, col, Col, D, I, S) of General Module Motif that we have considered in Section 3 is k = |P|). The unparameterized version of a parameterized problem L is $\tilde{L}$ = {x#p(x) : x is an input of L that has a solution} where # is a symbol that does not appear in L [6]. We also need the following definition of Bodlaender et al. [6]:

Definition 4. A parameterized problem L is compositional if it has a compositional algorithm, which is an algorithm whose input is a tuple $(x_1, x_2, \ldots, x_t)$ of inputs of L s.t. $p(x_1) = p(x_2) = \ldots = p(x_t)$, runs in time polynomial in $\sum_{1 \le i \le t} |x_i| + p(x_1)$, and outputs an input y of L s.t. (y has a solution iff ∃1 ≤ i ≤ t s.t. $x_i$ has a solution) and p(y) is polynomial in $p(x_1)$.

The proof of the following theorem appears in [6]:

Theorem 5. If a compositional parameterized problem whose unparameterized version is NP-complete has a polynomial kernel, then NP ⊆ coNP/Poly.



Rizzi et al. [21] prove that LMMS is NP-complete. Assume that there is an $O^*(1)$ time algorithm ALG7 for the unparameterized version of LMMS. Given an input x of LMMS, we can call ALG7 on x#p(x) and answer the same. Since |x#p(x)| = O(|x|), we thus get an $O^*(1)$ time LMMS algorithm and have a contradiction. We get the following observation:

Observation 8. The unparameterized version of LMMS is NP-complete.

Consider the following algorithm:

CompALG($(C_1, P_1, H_1 = (V_1, E_1), \mathrm{Col}_1), \ldots, (C_t, P_t, H_t = (V_t, E_t), \mathrm{Col}_t)$):

1. $C \Leftarrow P_1 \cup \{c^*\}$ where $c^*$ is a new color.
2. $V \Leftarrow V_1 \cup V_2 \cup \ldots \cup V_t \cup \{v^*_1, v^*_2, \ldots, v^*_t\}$ where each $v^*_i$ is a new node.
3. $E \Leftarrow E_1 \cup E_2 \cup \ldots \cup E_t \cup \{\{v^*_i, v\} : 1 \le i \le t,\ v \in V_i\}$.
4. For i = 1, 2, . . . , t, let $f_i$ be some bijective function $f_i : P_i \to P_1$.
5. Define Col : V → $2^C$ as follows for i = 1, 2, . . . , t:
(a) ∀v ∈ $V_i$ s.t. $\mathrm{Col}_i(v) = \{c\}$ for $c \in P_i$: $\mathrm{Col}(v) = \{f_i(c)\}$.
(b) ∀v ∈ $V_i$ s.t. $\mathrm{Col}_i(v) = \{c\}$ for $c \notin P_i$: $\mathrm{Col}(v) = \{c^*\}$.
(c) $\mathrm{Col}(v^*_i) = \{c^*\}$.
6. Return $(C, P_1, H = (V, E), \mathrm{Col})$.

Lemma 7. CompALG is a compositional algorithm for LMMS.
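The construction of CompALG can be sketched in a few lines. In this sketch nodes of the i-th instance are renamed to pairs `(i, v)` so that the node sets are disjoint, the new color is the constant `NEW_COLOR`, and the bijections $f_i$ are obtained by pairing sorted color lists; all three choices are assumptions of the illustration, not details given in the text.

```python
NEW_COLOR = ("new-color",)   # plays the role of c*


def comp_alg(instances):
    """Compose LMMS instances (C_i, P_i, (V_i, E_i), Col_i) with |P_i| = |P_1|."""
    P1 = list(instances[0][1])
    C = set(P1) | {NEW_COLOR}
    V, E, Col = set(), set(), {}
    for i, (_, P_i, (V_i, E_i), Col_i) in enumerate(instances, start=1):
        f_i = dict(zip(sorted(P_i, key=repr), P1))   # some bijection P_i -> P_1
        hub = ("v*", i)                              # new node adjacent to all of V_i
        V.add(hub)
        Col[hub] = {NEW_COLOR}
        for v in V_i:
            (c,) = Col_i[v]                          # limited case: exactly one color
            Col[(i, v)] = {f_i[c]} if c in f_i else {NEW_COLOR}
            V.add((i, v))
            E.add(frozenset({hub, (i, v)}))
        for a, b in (tuple(e) for e in E_i):
            E.add(frozenset({(i, a), (i, b)}))
    return C, P1, (V, E), Col


inst1 = ({"r", "g"}, ["r", "g"], ({1, 2}, {(1, 2)}), {1: {"r"}, 2: {"g"}})
inst2 = ({"x", "y", "z"}, ["x", "y"], ({1, 2, 3}, {(1, 2)}),
         {1: {"x"}, 2: {"y"}, 3: {"z"}})
C, P, (V, E), Col = comp_alg([inst1, inst2])
```

The returned pattern is $P_1$ itself, so the parameter of the composed instance equals the common parameter of the inputs, as Definition 4 requires.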

Theorem 5, Observation 8 and Lemma 7 imply the following theorem, which states that LMMS is unlikely to admit a polynomial kernel:

Theorem 6. If LMMS admits a polynomial kernel, then NP ⊆ coNP/Poly.

References

1. Ambalath, A.M., Balasundaram, R., Rao H., C., Koppula, V., Misra, N., Philip, G., Ramanujan, M.S.: On the kernelization complexity of colorful motifs. In: Raman, V., Saurabh, S. (eds.) IPEC 2010. LNCS, vol. 6478, pp. 14–25. Springer, Heidelberg (2010)

2. Betzler, N., Bevern, R., Fellows, M.R., Komusiewicz, C., Niedermeier, R.: Parameterized algorithmics for finding connected motifs in biological networks. IEEE/ACM Trans. Comput. Biol. Bioinform. 8(5), 1296–1308 (2011)

3. Björklund, A., Husfeldt, T., Kaski, P., Koivisto, M.: Narrow sieves for parameterized paths and packings. CoRR abs/1007.1161 (2010)

4. Björklund, A., Kaski, P., Kowalik, L.: Probably optimal graph motifs. In: Proc. STACS, pp. 20–31 (2013)

5. Blin, G., Sikora, F., Vialette, S.: Gramofone: a cytoscape plugin for querying motifs without topology in protein-protein interactions networks. In: Proc. BICoB, pp. 38–43 (2010)

6. Bodlaender, H.L., Downey, R.G., Fellows, M.R., Hermelin, D.: On problems without polynomial kernels. J. Comput. Syst. Sci. 75(8), 423–434 (2009)

7. Bruckner, S., Hüffner, F., Karp, R.M., Shamir, R., Sharan, R.: Topology-free querying of protein interaction networks. In: Batzoglou, S. (ed.) RECOMB 2009. LNCS, vol. 5541, pp. 74–89. Springer, Heidelberg (2009)

8. Chein, M., Habib, M., Maurer, M.C.: Partitive hypergraphs. Discrete Mathematics 37(1), 35–50 (1981)

9. Cygan, M., Dell, H., Lokshtanov, D., Marx, D., Nederlof, J., Okamoto, Y., Paturi, R., Saurabh, S., Wahlström, M.: On problems as hard as CNF-SAT. In: Proc. CCC, pp. 74–84 (2012)

10. Dondi, R., Fertin, G., Vialette, S.: Weak pattern matching in colored graphs: minimizing the number of connected components. In: Proc. ICTCS, pp. 27–38 (2007)

11. Dondi, R., Fertin, G., Vialette, S.: Finding approximate and constrained motifs in graphs. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 388–401. Springer, Heidelberg (2011)

12. Fellows, M.R., Fertin, G., Hermelin, D., Vialette, S.: Upper and lower bounds for finding connected motifs in vertex-colored graphs. J. Comput. Syst. Sci. 77(4), 799–811 (2011)

13. Guillemot, S., Sikora, F.: Finding and counting vertex-colored subtrees. In: Hliněný, P., Kučera, A. (eds.) MFCS 2010. LNCS, vol. 6281, pp. 405–416. Springer, Heidelberg (2010)

14. Hüffner, F., Wernicke, S., Zichner, T.: Algorithm engineering for color-coding with applications to signaling pathway detection. Algorithmica 52(2), 114–132 (2008)

15. Koutis, I.: Constrained multilinear detection for faster functional motif discovery. Inf. Process. Lett. 112(22), 889–892 (2012)

16. Lacroix, V., Fernandes, C.G., Sagot, M.F.: Motif search in graphs: Application to metabolic networks. IEEE/ACM Trans. Comput. Biol. Bioinform. 3(4), 360–368 (2006)

17. Niedermeier, R.: Invitation to fixed-parameter algorithms. Oxford University Press (2006)

18. Pinter, R.Y., Rokhlenko, O., Yeger-Lotem, E., Ziv-Ukelson, M.: Alignment of metabolic pathways. Bioinformatics 21(16), 3401–3408 (2005)

19. Pinter, R.Y., Zehavi, M.: Algorithms for topology-free and alignment queries. Technion Technical Reports CS-2012-12 (2012)

20. Pinter, R.Y., Zehavi, M.: Partial information network queries. In: Proc. IWOCA (to appear, 2013)

21. Rizzi, R., Sikora, F.: Some results on more flexible versions of graph motif. In: Hirsch, E.A., Karhumäki, J., Lepistö, A., Prilutskii, M. (eds.) CSR 2012. LNCS, vol. 7353, pp. 278–289. Springer, Heidelberg (2012)

22. Schwartz, J.T.: Fast probabilistic algorithms for verification of polynomial identities. J. Assoc. Comput. Mach. 27(4), 701–717 (1980)

23. Shlomi, T., Segal, D., Ruppin, E., Sharan, R.: Qpath: a method for querying pathways in a protein-protein interaction networks. BMC Bioinform. 7, 199 (2006)

24. Tedder, M., Corneil, D., Habib, M., Paul, C.: Simpler linear-time modular decomposition via recursive factorizing permutations. In: Aceto, L., Damgård, I., Goldberg, L.A., Halldórsson, M.M., Ingólfsdóttir, A., Walukiewicz, I. (eds.) ICALP 2008, Part I. LNCS, vol. 5125, pp. 634–645. Springer, Heidelberg (2008)


3.4 Algorithms for k-Internal Out-Branching

Meirav Zehavi. Algorithms for k-Internal Out-Branching. In the proc. of the 8th

International Symposium on Parameterized and Exact Computation (IPEC), pages

361–373, 2013.



Algorithms for k-Internal Out-Branching

Meirav Zehavi

Department of Computer Science, Technion - Israel Institute of Technology,Haifa 32000, Israel

[email protected]

Abstract. The k-Internal Out-Branching (k-IOB) problem asks if a given directed graph has an out-branching (i.e., a spanning tree with exactly one node of in-degree 0) with at least k internal nodes. The k-Internal Spanning Tree (k-IST) problem is a special case of k-IOB, which asks if a given undirected graph has a spanning tree with at least k internal nodes. We present an $O^*(4^k)$ time randomized algorithm for k-IOB, which improves the $O^*$ running times of the best known algorithms for both k-IOB and k-IST. Moreover, for graphs of bounded degree Δ, we present an $O^*(2^{(2 - \frac{\Delta+1}{\Delta(\Delta-1)})k})$ time randomized algorithm for k-IOB. Both our algorithms use polynomial space.

1 Introduction

In this paper we study the k-Internal Out-Branching (k-IOB) problem. The input for k-IOB consists of a directed graph G = (V, E) and a parameter k ∈ $\mathbb{N}$, and the objective is to decide if G has an out-branching (i.e., a spanning tree with exactly one node of in-degree 0, that we call the root) with at least k internal nodes (i.e., nodes of out-degree ≥ 1). The k-IOB problem is of interest in database systems [2].

A special case of k-IOB, called k-Internal Spanning Tree (k-IST), asks if a given undirected graph G = (V, E) has a spanning tree with at least k internal nodes. A possible application of k-IST, for connecting cities with water pipes, is given in [14].

The k-IST problem is NP-hard even for graphs of bounded degree 3, since it generalizes the Hamiltonian path problem for such graphs [5]; thus k-IOB is also NP-hard for such graphs. In this paper we present parameterized algorithms for k-IOB. Such algorithms are an approach to solve NP-hard problems by confining the combinatorial explosion to a parameter k. More precisely, a problem is fixed-parameter tractable (FPT) with respect to a parameter k if an instance of size n can be solved in $O^*(f(k))$ time for some function f [10].¹

Related Work: Nederlof [9] gave an $O^*(2^{|V|})$ time and polynomial space algorithm for k-IST. For graphs of bounded degree Δ, Raible et al. [14] gave an $O^*(((2^{\Delta+1} - 1)^{\frac{1}{\Delta+1}})^{|V|})$ time and exponential space algorithm for k-IST.

1 O∗ hides factors polynomial in the input size.

G. Gutin and S. Szeider (Eds.): IPEC 2013, LNCS 8246, pp. 361–373, 2013. © Springer International Publishing Switzerland 2013



Table 1. Known parameterized algorithms for k-IOB and k-IST

Reference            Variation   Time Complexity                                             The Topology of G
Prieto et al. [12]   k-IST       $O^*(2^{O(k \log k)})$                                      General
Gutin et al. [6]     k-IOB       $O^*(2^{O(k \log k)})$                                      General
Cohen et al. [1]     k-IOB       $O^*(49.4^k)$                                               General
Fomin et al. [4]     k-IOB       $O^*(16^{k+o(k)})$                                          General
Fomin et al. [3]     k-IST       $O^*(8^k)$                                                  General
Raible et al. [14]   k-IST       $O^*(2.1364^k)$                                             Δ = 3
This paper           k-IOB       $O^*(4^k)$                                                  General
This paper           k-IOB       $O^*(2^{(2 - \frac{\Delta+1}{\Delta(\Delta-1)})k})$         Δ = O(1)

Table 2. Some concrete figures for the running time of the algorithm Δ-IOB-Alg

Δ                 3                  4                  5                  6
Time complexity   $O^*(2.51985^k)$   $O^*(2.99662^k)$   $O^*(3.24901^k)$   $O^*(3.40267^k)$

Table 1 presents a summary of known parameterized algorithms for k-IOB and k-IST. In particular, the algorithms having the best known $O^*$ running times for k-IOB and k-IST are due to [4], [3] and [14]. Fomin et al. [4] gave an $O^*(16^{k+o(k)})$ time and polynomial space randomized algorithm for k-IOB, and an $O^*(16^{k+o(k)})$ time and $O^*(4^{k+o(k)})$ space deterministic algorithm for k-IOB. Fomin et al. [3] gave an $O^*(8^k)$ time and polynomial space deterministic algorithm for k-IST. For graphs of bounded degree 3, Raible et al. [14] gave an $O^*(2.1364^k)$ time and polynomial space deterministic algorithm for k-IST.

Further information on k-IOB, k-IST and variants of these problems is given in surveys [11,15].

Our Contribution: We present an $O^*(4^k)$ time and polynomial space randomized algorithm for k-IOB, that we call IOB-Alg. Our algorithm IOB-Alg improves the $O^*$ running times of the best known algorithms for both k-IOB and k-IST.

For graphs of bounded degree Δ, we present an $O^*(2^{(2 - \frac{\Delta+1}{\Delta(\Delta-1)})k})$ time and polynomial space randomized algorithm for k-IOB, that we call Δ-IOB-Alg. Some concrete figures for the running time of Δ-IOB-Alg are given in Table 2.
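The figures in Table 2 can be recomputed from the exponent $2 - \frac{\Delta+1}{\Delta(\Delta-1)}$. The following sketch does so; the function name `delta_iob_base` is illustrative, and the comparison tolerance reflects the assumption that the table rounds the bases up to five decimals.

```python
def delta_iob_base(delta):
    """Base b such that the Delta-IOB-Alg bound reads O*(b^k)."""
    return 2 ** (2 - (delta + 1) / (delta * (delta - 1)))


table2 = {3: 2.51985, 4: 2.99662, 5: 3.24901, 6: 3.40267}
for d, reported in table2.items():
    print(d, round(delta_iob_base(d), 6), reported)
```

Since the subtracted fraction tends to 0 as Δ grows, the base approaches 4 from below, which is consistent with the $O^*(4^k)$ bound for general graphs.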

Techniques: Our algorithm IOB-Alg is based on two reductions as follows. We first reduce k-IOB to a new problem, that we call (k, l)-Tree, by using an observation from [1]. This reduction allows us to focus our attention on finding a tree whose size depends on k, rather than a spanning tree whose size depends on |V|. We then reduce (k, l)-Tree to the t-Multilinear Detection (t-MLD) problem, which concerns multivariate polynomials and has an $O^*(2^t)$ time randomized algorithm [7,17]. We note that reductions to t-MLD have been used to solve several problems quickly (see, e.g., [8]). IOB-Alg is another proof of the applicability of this new tool.

Our algorithm Δ-IOB-Alg, though based on the same technique as IOB-Alg, requires additional new non-trivial ideas and is more technical. In particular, we now use a proper coloring of the graph G when reducing (k, l)-Tree to t-MLD. This idea might be useful in solving other problems.

Organization: Section 2 presents our algorithm IOB-Alg. Specifically, Section 2.1 defines (k, l)-Tree, and presents an algorithm that solves k-IOB by using an algorithm for (k, l)-Tree. Section 2.2 defines t-MLD, and reduces (k, l)-Tree to t-MLD. Then, Section 2.3 presents our algorithm for (k, l)-Tree, and thus concludes IOB-Alg. Section 3 presents our algorithm Δ-IOB-Alg. Specifically, Section 3.1 modifies the algorithm presented in Section 2.1, Section 3.2 modifies the reduction presented in Section 2.2, and Section 3.3 modifies the algorithms presented in Section 2.3. Finally, Section 4 presents a few open questions.

2 An $O^*(4^k)$-time k-IOB Algorithm

2.1 The (k, l)-Tree Problem

We first define a new problem, that we call (k, l)-Tree.

(k, l)-Tree

– Input: A directed graph G = (V, E), a node r ∈ V, and parameters k, l ∈ N.

– Goal: Decide if G has an out-tree (i.e., a tree with exactly one node of in-degree 0) rooted at r with exactly k internal nodes and l leaves.

We now show that we can focus our attention on solving (k, l)-Tree. Let A(G, r, k, l) be a t(G, r, k, l) time and s(G, r, k, l) space algorithm for (k, l)-Tree.

Algorithm 1. IOB-Alg[A](G, k)

1: for all r ∈ V do
2:   if G has no out-branching T rooted at r then Go to the next iteration. end if
3:   for l = 1, 2, ..., k do
4:     if A(G, r, k, l) accepts then Accept. end if
5:   end for
6: end for
7: Reject.

The following observation immediately implies the correctness of IOB-Alg[A] (see Algorithm 1).

Observation 1 ([1]). Let G = (V, E) be a directed graph, and r ∈ V such that G has an out-branching rooted at r.

– If G has an out-branching rooted at r with at least k internal nodes, then G has an out-tree rooted at r with exactly k internal nodes and at most k leaves.

– If G has an out-tree rooted at r with exactly k internal nodes, then G has an out-branching with at least k internal nodes.

By Observation 1, and since Step 2 can be performed in O(|E|) time and O(|V|) space (e.g., by using DFS), we have the following result.
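The control flow of Algorithm 1 can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the adjacency-list representation, the function names, and the `tree_oracle` callback (standing in for the algorithm A) are assumptions made for the example. Step 2 is realized by an iterative DFS, since G has an out-branching rooted at r iff every node is reachable from r.

```python
from typing import Callable, Dict, List

Graph = Dict[str, List[str]]  # hypothetical representation: node -> out-neighbors


def has_out_branching(graph: Graph, root: str) -> bool:
    """Step 2: an out-branching rooted at `root` exists iff all nodes are reachable from it."""
    seen = {root}
    stack = [root]
    while stack:
        v = stack.pop()
        for u in graph[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == len(graph)


def iob_alg(graph: Graph, k: int,
            tree_oracle: Callable[[Graph, str, int, int], bool]) -> bool:
    """Algorithm 1 with `tree_oracle` playing the role of A for (k, l)-Tree."""
    for r in graph:                      # Step 1
        if not has_out_branching(graph, r):
            continue                     # Step 2
        for l in range(1, k + 1):        # Step 3
            if tree_oracle(graph, r, k, l):
                return True              # Step 4: accept
    return False                         # Step 7: reject
```

Any decision procedure for (k, l)-Tree can be plugged in as `tree_oracle`; the driver itself adds only the O(|E|) reachability test per root.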



Lemma 1. IOB-Alg[A] is an $O(\sum_{r \in V}(|E| + \sum_{1 \le l \le k} t(G, r, k, l)))$ time and $O(|V| + \max_{r \in V, 1 \le l \le k} s(G, r, k, l))$ space algorithm for k-IOB.

2.2 A Reduction from (k, l)-Tree to t-MLD

We first give the definition of t-MLD [7].

t-MLD

– Input: A polynomial P represented by an arithmetic circuit C over a set of variables X, and a parameter t ∈ N.

– Goal: Decide if P has a multilinear monomial of degree at most t.

Let (G, r, k, l) be an input for (k, l)-Tree. We now construct an input $f(G, r, k, l) = (C_{r,k,l}, X, t)$ for t-MLD. We introduce an indeterminate $x_v$ for each v ∈ V, and define $X = \{x_v : v \in V\}$ and t = k + l.

The idea behind the construction is to let each monomial represent a pair of an out-tree $T = (V_T, E_T)$ and a function $h : V_T \to V$, such that if $(v, u) \in E_T$, then $(h(v), h(u)) \in E$ (i.e., h is a homomorphism). The monomial is $\prod_{v \in V_T} x_{h(v)}$. We get that the monomial is multilinear iff h is injective, i.e., the values h(v) for $v \in V_T$ are pairwise distinct (then $h(T) = (\{h(v) : v \in V_T\}, \{(h(v), h(u)) : (v, u) \in E_T\})$ is an out-tree).

Towards presenting $C_{r,k,l}$, we inductively define an arithmetic circuit $C_{v,k',l'}$ over X, for all v ∈ V, k′ ∈ {0, ..., k} and l′ ∈ {1, ..., l}. Informally, the multilinear monomials of the polynomial represented by $C_{v,k',l'}$ represent out-trees of G rooted at v that have exactly k′ internal nodes and l′ leaves.

Base Cases:

1. If k′ = 0 and l′ = 1: $C_{v,k',l'} = x_v$.

2. If k′ = 0 and l′ > 1: $C_{v,k',l'} = 0$.

Steps:

1. If k′ > 0 and l′ = 1: $C_{v,k',l'} = \sum_{u \text{ s.t. } (v,u) \in E} x_v C_{u,k'-1,l'}$.

2. If k′ > 0 and l′ > 1: $C_{v,k',l'} = \sum_{u \text{ s.t. } (v,u) \in E} \bigl(x_v C_{u,k'-1,l'} + \sum_{1 \le k^* \le k'} \sum_{1 \le l^* \le l'-1} C_{v,k^*,l^*} \cdot C_{u,k'-k^*,l'-l^*}\bigr)$.

The following order shows that when computing an arithmetic circuit $C_{v,k',l'}$, we only use arithmetic circuits that have already been computed.

Order:

1. For k′ = 0, 1, ..., k:

(a) For l′ = 1, 2, ..., l:

i. ∀v ∈ V : Compute Cv,k′,l′ .

Denote the polynomial that $C_{v,k',l'}$ represents by $P_{v,k',l'}$.

Lemma 2. (G, r, k, l) has a solution iff $(C_{r,k,l}, X, t)$ has a solution.
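To make the recursion concrete, the following sketch expands the polynomials $P_{v,k',l'}$ explicitly for a toy graph, following the base cases, steps and order given above. It is an illustration only: the explicit expansion takes exponential space and is not how t-MLD algorithms work (they evaluate the circuit instead). The representation of a polynomial as a set of monomials, with coefficients dropped because all of them are positive, and all function names are assumptions of the example.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]     # node -> out-neighbors
Monomial = Tuple[str, ...]       # sorted tuple of variables, repetitions kept
Poly = Set[Monomial]             # coefficients dropped (all are positive)


def mul(p: Poly, q: Poly) -> Poly:
    """Product of two polynomials given as sets of monomials."""
    return {tuple(sorted(a + b)) for a in p for b in q}


def build_polys(graph: Graph, k: int, l: int) -> Dict[Tuple[str, int, int], Poly]:
    P: Dict[Tuple[str, int, int], Poly] = {}
    for kk in range(k + 1):              # Order: k' = 0, 1, ..., k
        for ll in range(1, l + 1):       #        l' = 1, 2, ..., l
            for v in graph:
                if kk == 0:              # base cases
                    P[v, kk, ll] = {(v,)} if ll == 1 else set()
                    continue
                poly: Poly = set()
                for u in graph[v]:
                    # x_v * C[u, k'-1, l']
                    poly |= mul({(v,)}, P[u, kk - 1, ll])
                    # double sum; empty when l' = 1, which gives Step 1
                    for ks in range(1, kk + 1):
                        for ls in range(1, ll):
                            poly |= mul(P[v, ks, ls], P[u, kk - ks, ll - ls])
                P[v, kk, ll] = poly
    return P


def has_multilinear(poly: Poly, t: int) -> bool:
    """True iff some monomial has no repeated variable and degree at most t."""
    return any(len(m) <= t and len(set(m)) == len(m) for m in poly)
```

For the graph with edges (r, a), (r, b), (a, c), the polynomial $P_{r,2,2}$ contains the multilinear monomial $x_a x_b x_c x_r$, which corresponds to the out-tree rooted at r with internal nodes r, a and leaves b, c.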



Proof. By using induction, we first prove that if G has an out-tree $T = (V_T, E_T)$ rooted at v with exactly k′ internal nodes and l′ leaves, then $P_{v,k',l'}$ has the (multilinear) monomial $\prod_{w \in V_T} x_w$.

The claim is clearly true for the base cases, and thus we next assume that k′ > 0, and that the claim is true for all v ∈ V, k* ∈ {0, ..., k′} and l* ∈ {1, ..., l′} such that (k* < k′ or l* < l′).

Let $T = (V_T, E_T)$ be an out-tree of G that is rooted at v and has exactly k′ internal nodes and l′ leaves. Also, let u be a neighbor of v in T. Denote by $T_v = (V_v, E_v)$ and $T_u = (V_u, E_u)$ the two out-trees of G in the forest $F = (V_T, E_T \setminus \{(v, u)\})$, such that $v \in V_v$. We have the following cases.

1. If $|V_v| = 1$: $T_u$ has k′ − 1 internal nodes and l′ leaves. By the induction hypothesis, $P_{u,k'-1,l'}$ has the monomial $\prod_{w \in V_u} x_w$. Thus, by the definition of $C_{v,k',l'}$, $P_{v,k',l'}$ has the monomial $x_v \prod_{w \in V_u} x_w = \prod_{w \in V_T} x_w$.

2. Else: Denote the number of internal nodes and leaves in $T_v$ by $k_v$ and $l_v$, respectively. By the induction hypothesis, $P_{v,k_v,l_v}$ has the monomial $\prod_{w \in V_v} x_w$, and $P_{u,k'-k_v,l'-l_v}$ has the monomial $\prod_{w \in V_u} x_w$. By the definition of $C_{v,k',l'}$, $P_{v,k',l'}$ has the monomial $\prod_{w \in V_v} x_w \prod_{w \in V_u} x_w = \prod_{w \in V_T} x_w$.

Now, by using induction, we prove that if $P_{v,k',l'}$ has the (multilinear) monomial $\prod_{w \in U} x_w$, for some U ⊆ V, then G has an out-tree $T = (V_T, E_T)$ rooted at v with exactly k′ internal nodes and l′ leaves, such that $V_T = U$. This claim implies that any multilinear monomial of $P_{v,k',l'}$ is of degree exactly k′ + l′.

The claim is clearly true for the base cases, and thus we next assume that k′ > 0, and that the claim is true for all v ∈ V, k* ∈ {0, ..., k′} and l* ∈ {1, ..., l′} such that (k* < k′ or l* < l′).

Let $\prod_{w \in U} x_w$, for some U ⊆ V, be a monomial of $P_{v,k',l'}$. By the definition of $C_{v,k',l'}$, there is u such that (v, u) ∈ E, for which we have the following cases.

1. If $P_{u,k'-1,l'}$ has a monomial $\prod_{w \in U \setminus \{v\}} x_w$: By the induction hypothesis, G has an out-tree $T_u = (V_u, E_u)$ rooted at u with exactly k′ − 1 internal nodes and l′ leaves, such that $V_u = U \setminus \{v\}$. By adding v and (v, u) to $T_u$, we get an out-tree $T = (V_T, E_T)$ of G that is rooted at v, has exactly k′ internal nodes and l′ leaves, and such that $V_T = U$.

2. Else: There are k* ∈ {1, ..., k′}, l* ∈ {1, ..., l′ − 1} and U* ⊆ U, such that $P_{v,k^*,l^*}$ has the monomial $\prod_{w \in U^*} x_w$, and $P_{u,k'-k^*,l'-l^*}$ has the monomial $\prod_{w \in U \setminus U^*} x_w$. By the induction hypothesis, G has an out-tree $T_v = (V_v, E_v)$ rooted at v with exactly k* internal nodes and l* leaves, such that $V_v = U^*$. Moreover, G has an out-tree $T_u = (V_u, E_u)$ rooted at u with exactly k′ − k* internal nodes and l′ − l* leaves, such that $V_u = U \setminus U^*$. Thus, we get that the out-tree $T = (U, E_v \cup E_u \cup \{(v, u)\})$ of G is rooted at v, and has exactly k′ internal nodes and l′ leaves.

We get that G has an out-tree rooted at r with exactly k internal nodes and l leaves iff $P_{r,k,l}$ has a multilinear monomial of degree at most t. □

The definition of $(C_{r,k,l}, X, t)$ immediately implies the following observation.

Observation 2. We can compute $(C_{r,k,l}, X, t)$ in polynomial time and space.



2.3 The Algorithm IOB-Alg[Tree-Alg]

Koutis et al. [7,17] gave an $O^*(2^t)$ time and polynomial space randomized algorithm for t-MLD. We denote this algorithm by MLD-Alg, and use it to get an algorithm for (k, l)-Tree (see Algorithm 2).

Algorithm 2. Tree-Alg(G, r, k, l)

1: Compute $f(G, r, k, l) = (C_{r,k,l}, X, t)$.
2: Accept iff MLD-Alg($C_{r,k,l}$, X, t) accepts.

By Lemmas 1 and 2, and Observation 2, we have the following theorem.

Theorem 1. IOB-Alg[Tree-Alg] is an $O^*(4^k)$ time and polynomial space randomized algorithm for k-IOB.
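The running time in Theorem 1 can be unpacked as follows. This is a short worked derivation under the assumption that Tree-Alg is dominated by the call to MLD-Alg with t = k + l, so that $t(G, r, k, l) = O^*(2^{k+l})$; it substitutes this bound into Lemma 1.

```latex
\begin{align*}
\sum_{r \in V} \Bigl( |E| + \sum_{l=1}^{k} O^*\bigl(2^{k+l}\bigr) \Bigr)
  &= O^*\Bigl( 2^{k} \sum_{l=1}^{k} 2^{l} \Bigr) \\
  &= O^*\bigl( 2^{k} \, (2^{k+1} - 2) \bigr) \\
  &= O^*\bigl( 4^{k} \bigr).
\end{align*}
```

The factor |V| from the outer sum and the O(|E|) reachability test are polynomial in the input size, and are therefore hidden by the $O^*$ notation.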

3 A k-IOB Algorithm for Graphs of Bounded Degree Δ

3.1 A Modification of the Algorithm IOB-Alg[A]

We first prove that in Step 3 of IOB-Alg[A] (see Section 2.1), we can iterate over fewer than k values for l.

Given an out-tree $T = (V_T, E_T)$ and i ∈ N, denote the number of degree-i nodes in T by $n_i^T$.

Observation 3 ([14]). If $|V_T| \ge 2$, then $2 + \sum_{3 \le i} (i-2) n_i^T = n_1^T$.
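The identity in Observation 3 is easy to check on small examples. The sketch below is only an illustration; it assumes that "degree" refers to the degree in the underlying undirected tree, and that the tree is given as a list of edges. The function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def degree_counts(edges: List[Edge]) -> Counter:
    """Map each degree i to n_i, the number of nodes of degree i."""
    deg: Counter = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return Counter(deg.values())


def leaf_identity_holds(edges: List[Edge]) -> bool:
    """Check 2 + sum_{i >= 3} (i - 2) * n_i == n_1 for the given edge list."""
    n = degree_counts(edges)
    return 2 + sum((i - 2) * c for i, c in n.items() if i >= 3) == n[1]
```

For a star with four leaves, $n_4 = 1$ and $n_1 = 4$, so both sides equal 4. For a triangle, which is not a tree, the left side is 2 while $n_1 = 0$, so the check fails as expected.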

Observation 4. An out-tree T of G with exactly k internal nodes contains an out-tree with exactly k internal nodes and at most $k - \frac{k-2}{\Delta-1}$ leaves.

Proof. As long as T has an internal node v with at least two out-neighbors that are leaves, delete one of these leaves and its adjacent edge from T. Denote the resulting out-tree by T′, and denote the tree that we get after deleting all the leaves in T′ by T″. Note that T′ has exactly k internal nodes, and that T′ and T″ have the same number of leaves. Since T″ has k nodes and bounded degree Δ, Observation 3 implies that if $n_1^{T''} + n_\Delta^{T''} = k$, then $n_1^{T''} = k - \frac{k-2}{\Delta-1}$, and if $n_1^{T''} + n_\Delta^{T''} < k$, then $n_1^{T''} < k - \frac{k-2}{\Delta-1}$. We have that $n_1^{T''} \le k - \frac{k-2}{\Delta-1}$, and thus we conclude that T′ has exactly k internal nodes and at most $k - \frac{k-2}{\Delta-1}$ leaves. □

Thus, in Step 3 of IOB-Alg[A], we can iterate only over $l = 1, 2, ..., k - \lceil \frac{k-2}{\Delta-1} \rceil$.
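As a small illustration of how much the range of l shrinks, the helper below computes the largest value of l that needs to be tried. It is a sketch under the assumption that the bound of Observation 4 is rounded to an integer by taking the ceiling of (k − 2)/(Δ − 1), which is valid because l is an integer; the function name is hypothetical.

```python
def max_leaves(k: int, delta: int) -> int:
    """Largest l to try: k - ceil((k - 2) / (delta - 1)), using integer arithmetic."""
    if delta < 2:
        raise ValueError("delta must be at least 2")
    ceil_quotient = -((-(k - 2)) // (delta - 1))  # ceiling division without floats
    return k - ceil_quotient
```

For example, with k = 10 the loop runs over l = 1, ..., 6 when Δ = 3 and over l = 1, ..., 7 when Δ = 4, instead of l = 1, ..., 10.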

We add some preprocessing steps to IOB-Alg[A], and thus get Δ-IOB-Alg[A] (see Algorithm 3). These preprocessing steps will allow us to assume, when presenting algorithm A, that the underlying undirected graph of G is a connected graph that is neither a cycle nor a clique. This assumption will allow us to compute a proper Δ-coloring of the underlying undirected graph of G (see Section 3.3), which we will use in the following Section 3.2.



Algorithm 3. Δ-IOB-Alg[A](G, k)

1: if k ≥ |V| or the underlying undirected graph of G is not connected then
2:   Reject.
3: else if the underlying undirected graph of G is a cycle then
4:   if k = |V| − 1 then Accept iff G has a hamiltonian path. else Accept iff there is at most one node of out-degree 2 in G. end if
5: else if the underlying undirected graph of G is a clique then
6:   Accept.
7: end if
8: for all r ∈ V do
9:   if G has no out-branching T rooted at r then Go to the next iteration. end if
10:  for $l = 1, 2, ..., k - \lceil \frac{k-2}{\Delta-1} \rceil$ do
11:    if A(G, r, k, l) accepts then Accept. end if
12:  end for
13: end for
14: Reject.

We can clearly perform the new preprocessing steps in O(|E|) time and O(|V|) space. Steps 2 and 4 are clearly correct. Since a tournament (i.e., a directed graph obtained by assigning a direction to each edge of an undirected complete graph) has a hamiltonian path [13], we have that Step 6 is also correct.
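The fact that every tournament has a hamiltonian path has a short constructive proof by insertion, which the following sketch illustrates. It is not part of Δ-IOB-Alg (Step 6 only needs existence); the `beats` predicate and the function name are assumptions of the example.

```python
from typing import Callable, Iterable, List, TypeVar

N = TypeVar("N")


def hamiltonian_path(nodes: Iterable[N], beats: Callable[[N, N], bool]) -> List[N]:
    """Build a hamiltonian path of a tournament.

    `beats(a, b)` must be True iff the tournament contains the edge (a, b);
    exactly one of beats(a, b), beats(b, a) is assumed to hold for a != b.
    """
    path: List[N] = []
    for v in nodes:
        # Insert v right before the first node that v beats. Every node before
        # that position is not beaten by v, so in particular the predecessor of
        # v beats v; if v beats no node, it is appended after the last node.
        pos = next((i for i, w in enumerate(path) if beats(v, w)), len(path))
        path.insert(pos, v)
    return path
```

A hamiltonian path is an out-branching with |V| − 1 internal nodes, which is at least k whenever k < |V|, so accepting in Step 6 is safe.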

We have the following lemma.

Lemma 3. Δ-IOB-Alg[A] is an $O(\sum_{r \in V}(|E| + \sum_{1 \le l \le k - \lceil \frac{k-2}{\Delta-1} \rceil} t(G, r, k, l)))$ time and $O(|V| + \max_{r \in V, 1 \le l \le k - \lceil \frac{k-2}{\Delta-1} \rceil} s(G, r, k, l))$ space algorithm for k-IOB.

3.2 A Modification of the Reduction f

In this section, assume that we have a proper Δ-coloring $col : V \to \{c_1, ..., c_\Delta\}$ of the underlying undirected graph of G. Having such a coloring col, we modify the reduction f (see Section 2.2) to construct a "better" input for t-MLD (i.e., an input in which t < k + l).

The Idea Behind the Modification: Recall that in the previous construction, we let each monomial represent a certain pair of an out-tree $T = (V_T, E_T)$ and a function $h : V_T \to V$. The monomial included indeterminates representing all the nodes to which the nodes in $V_T$ are mapped. We can now select some color $c \in \{c_1, ..., c_\Delta\}$, and ignore some occurrences of indeterminates that represent nodes whose color is c and whose degree in h(T) is Δ. We thus construct monomials with smaller degrees, and have an input for t-MLD in which t < k + l.

More precisely, the monomial representing T and h is $\prod_{v \in U} x_{h(v)}$, where U is $V_T$, excluding nodes mapped to nodes whose color is c and whose degree in T is Δ (except the root). We add constraints on T and h to guarantee that nodes in $V_T$ that are mapped to the same node do not have common neighbors in T.

The correctness is based on the following observation. Suppose that there is an indeterminate $x_v$ that occurs more than once in the original monomial representing T and h, but not in the new monomial representing them. Thus the color of v is c. Moreover, there are different nodes $u, w \in V_T$ such that h(u) = h(w) = v, and the degree of u in T is Δ. We get that u has a neighbor u′ in T and w has a different neighbor w′ in T, such that h(u′) = h(w′) and the color of h(u′) is not c. Thus $x_{h(u')}$ occurs more than once in the new monomial representing T and h. This implies that monomials that are not multilinear in the original construction do not become multilinear in the new construction.

The Construction: Let (G, r, k, l) be an input for (k, l)-Tree. We now construct an input f(G, r, k, l, col) = (C, X, t) for t-MLD.

We add a node r′ to V and the edge (r′, r) to E. We color r′ with some $c \in \{c_1, ..., c_\Delta\} \setminus \{col(r)\}$. In the following, let < be some order on V ∪ {nil}, such that nil is the smallest element. Define $X = \{x_v : v \in V\}$, and $t = (2 - \frac{\Delta+1}{\Delta(\Delta-1)})k + 8$.

Denote N(v, i, o) = {u ∈ V \ {i} : (v, u) ∈ E, u > o}.

We inductively define an arithmetic circuit $C^{c,i,o,b}_{v,k',l'}$ over X, for all v ∈ V, k′ ∈ {0, ..., k}, l′ ∈ {1, ..., l}, $c \in \{c_1, ..., c_\Delta\}$, i such that (i, v) ∈ E, o such that (v, o) ∈ E or o = nil, and b ∈ {F, T}. Informally, v, k′ and l′ play the same role as in the original construction; c indicates that only indeterminates representing nodes colored by c can be ignored; i and o are used for constraining the pairs of trees and functions represented by monomials as noted in "The Idea Behind the Modification"; and b indicates whether the indeterminate of v is ignored.

Base Cases:

1. If k′ = 0, l′ = 1 and b = F: $C^{c,i,o,b}_{v,k',l'} = x_v$.

2. Else if [k′ = 0] or [N(v, i, o) = ∅] or [(|N(v, i, o)| > l′ or col(v) ≠ c or v = r or |N(v, i, nil)| < Δ − 1) and b = T]: $C^{c,i,o,b}_{v,k',l'} = 0$.

Steps: (assume that none of the base cases applies)

1. If l′ = 1 and b = F: $C^{c,i,o,b}_{v,k',l'} = x_v \sum_{u \in N(v,i,o)} \bigl(C^{c,v,nil,F}_{u,k'-1,l'} + C^{c,v,nil,T}_{u,k'-1,l'}\bigr)$.

2. Else if b = F: $C^{c,i,o,b}_{v,k',l'} = \sum_{u \in N(v,i,o)} \bigl[x_v C^{c,v,nil,F}_{u,k'-1,l'} + x_v C^{c,v,nil,T}_{u,k'-1,l'} + \sum_{1 \le k^* \le k'} \sum_{1 \le l^* \le l'-1} C^{c,i,u,b}_{v,k^*,l^*} \bigl(C^{c,v,nil,F}_{u,k'-k^*,l'-l^*} + C^{c,v,nil,T}_{u,k'-k^*,l'-l^*}\bigr)\bigr]$.

3. If b = T and there is exactly one node u in N(v, i, o): $C^{c,i,o,b}_{v,k',l'} = C^{c,v,nil,F}_{u,k'-1,l'}$.

4. Else if b = T:

(a) Denote u = min(N(v, i, o)).

(b) $C^{c,i,o,b}_{v,k',l'} = \sum_{1 \le k^* \le k'} \sum_{1 \le l^* \le l'-1} C^{c,i,u,b}_{v,k^*,l^*} \, C^{c,v,nil,F}_{u,k'-k^*,l'-l^*}$.

The following order shows that when computing an arithmetic circuit $C^{c,i,o,b}_{v,k',l'}$, we only use arithmetic circuits that have already been computed.

Order:

1. For k′ = 0, 1, ..., k:

(a) For l′ = 1, 2, ..., l:

i. ∀v ∈ V, $c \in \{c_1, ..., c_\Delta\}$, i s.t. (i, v) ∈ E, o s.t. (v, o) ∈ E or o = nil, b ∈ {F, T}: Compute $C^{c,i,o,b}_{v,k',l'}$.

Define $C = \sum_{c \in \{c_1, ..., c_\Delta\}} C^{c,r',nil,F}_{r,k,l}$.

Denote the polynomial that $C^{c,i,o,b}_{v,k',l'}$ (resp. C) represents by $P^{c,i,o,b}_{v,k',l'}$ (resp. P).

Correctness: We need the next two definitions, which we illustrate in Fig. 1.

Definition 1. Let v ∈ V, k′ ∈ {0, ..., k}, l′ ∈ {1, ..., l}, $c \in \{c_1, ..., c_\Delta\}$, i such that (i, v) ∈ E, o such that (v, o) ∈ E or o = nil. Given a subgraph $T = (V_T, E_T)$ of G, we say that

1. T is a (v, k′, l′, c, i, o, F)-tree if
(a) T is an out-tree rooted at v with exactly k′ internal nodes and l′ leaves.
(b) Every out-neighbor of v in T belongs to N(v, i, o).

2. T is a (v, k′, l′, c, i, o, T)-tree if
(a) col(v) = c, v ≠ r, and |N(v, i, nil)| = Δ − 1.
(b) Every node in N(v, i, o) is an out-neighbor of v in T, and N(v, i, o) ≠ ∅.
(c) There is at most one node $i' \in V_T$ such that $(i', v) \in E_T$.
i. If such an i′ exists: $(v, i') \notin E_T$, and $T' = (V_T, E_T \setminus \{(i', v)\})$ is an out-tree rooted at v.
ii. Else: T is a (v, k′, l′, c, i, o, F)-tree.

Definition 2. Given a (v, k′, l′, c, i, o, b)-tree $T = (V_T, E_T)$, define

$I(T) = \{u \in V_T : [u \ne v \wedge (col(u) \ne c \vee u \text{ has less than } (\Delta - 1) \text{ out-neighbors in } T)] \vee [u = v \wedge (b = F \vee v \text{ has an in-neighbor in } T)]\}$.

[Figure 1: drawings of the graph G and of the subgraphs T1 and T2 on the nodes v1, ..., v5; node shapes represent colors.]

Fig. 1. Assume that r = v1 < v2 < v3 < v4 < v5, and that shapes represent colors. We have that T1 is a (v2, k′, l′, O, v1, nil, T)-tree for any k′ and l′, and I(T1) = {v1, v2, v3, v4, v5}. Moreover, T2 is a (v2, 3, 2, O, v1, v3, T)-tree, and I(T2) = {v1, v3, v4}.

Observation 5. Let $T = (V_T, E_T)$ be a (v, k′, l′, c, i, o, b)-tree of G, such that there is no $i' \in V_T$ for which $(i', v) \in E_T$. Then, $P^{c,i,o,b}_{v,k',l'}$ has the (multilinear) monomial $\prod_{w \in I(T)} x_w$.

Proof. We prove the claim by using induction on the construction. The claim is clearly true for the base cases. Next consider a (v, k′, l′, c, i, o, b)-tree $T = (V_T, E_T)$ of G, such that $C^{c,i,o,b}_{v,k',l'}$ is not constructed in the base cases. Assume that the claim is true for all $(\tilde{v}, \tilde{k}, \tilde{l}, \tilde{c}, \tilde{i}, \tilde{o}, \tilde{b})$ such that $C^{\tilde{c},\tilde{i},\tilde{o},\tilde{b}}_{\tilde{v},\tilde{k},\tilde{l}}$ is constructed before $C^{c,i,o,b}_{v,k',l'}$. Denote by u the smallest out-neighbor of v in T.



Denote by $T_v = (V_v, E_v)$ and $T_u = (V_u, E_u)$ the two out-trees of G in the forest $F = (V_T, E_T \setminus \{(v, u)\})$, such that $v \in V_v$. If u ∉ I(T) (this is not the case if b = T, since then col(u) ≠ c), then denote b′ = T, and note that the set of out-neighbors of u in $T_u$ contains all of the neighbors of u in G, excluding v; else denote b′ = F. We have the following cases.

1. If $|V_v| = 1$: $T_u$ is a (u, k′ − 1, l′, c, v, nil, b′)-tree of G. If b = F, then $I(T_u) = I(T) \setminus \{v\}$; else $I(T_u) = I(T)$. By the induction hypothesis, $P^{c,v,nil,b'}_{u,k'-1,l'}$ has the monomial $\prod_{w \in I(T_u)} x_w$. Thus, by the definition of $C^{c,i,o,b}_{v,k',l'}$, $P^{c,i,o,b}_{v,k',l'}$ has the required monomial.

2. Else: Denote the number of internal nodes and leaves in $T_v$ by $k_v$ and $l_v$, respectively. Note that $1 \le k_v \le k'$, $1 \le l_v < l'$, $T_v$ is a $(v, k_v, l_v, c, i, u, b)$-tree of G, and $T_u$ is a $(u, k' - k_v, l' - l_v, c, v, nil, b')$-tree of G. Moreover, $I(T_v)$ and $I(T_u)$ are disjoint sets whose union is I(T). By the induction hypothesis, $P^{c,i,u,b}_{v,k_v,l_v}$ has the monomial $\prod_{w \in I(T_v)} x_w$, and $P^{c,v,nil,b'}_{u,k'-k_v,l'-l_v}$ has the monomial $\prod_{w \in I(T_u)} x_w$. By the definition of $C^{c,i,o,b}_{v,k',l'}$, $P^{c,i,o,b}_{v,k',l'}$ has the monomial $\prod_{w \in I(T_v)} x_w \prod_{w \in I(T_u)} x_w = \prod_{w \in I(T)} x_w$. □

Observation 6. If $P^{c,i,o,b}_{v,k',l'}$ has a (multilinear) monomial $\prod_{w \in U} x_w$, for some U ⊆ V, then G has a (v, k′, l′, c, i, o, b)-tree T such that I(T) = U.

Proof. We prove the claim by using induction on the construction. The claim is clearly true for the base cases. Let $\prod_{w \in U} x_w$, for some U ⊆ V, be a monomial of $P^{c,i,o,b}_{v,k',l'}$, such that $C^{c,i,o,b}_{v,k',l'}$ is not constructed in the base cases. Assume that the claim is true for every $C^{\tilde{c},\tilde{i},\tilde{o},\tilde{b}}_{\tilde{v},\tilde{k},\tilde{l}}$ that is constructed before $C^{c,i,o,b}_{v,k',l'}$.

First suppose that b = F. By the definition of $C^{c,i,o,b}_{v,k',l'}$, there are u ∈ N(v, i, o) and b′ ∈ {F, T} such that one of the next conditions is fulfilled.

1. $P^{c,v,nil,b'}_{u,k'-1,l'}$ has the monomial $\prod_{w \in U \setminus \{v\}} x_w$. By the induction hypothesis, G has a (u, k′ − 1, l′, c, v, nil, b′)-tree $T_u = (V_u, E_u)$, such that $I(T_u) = U \setminus \{v\}$. Suppose that there is $i' \in V_u$ such that $(i', u) \in E_u$. In this case b′ = T; thus $v \notin V_u$ and the set of out-neighbors of u in $T_u$ contains all the neighbors of u in G, excluding v. We get that i′ is an out-neighbor of u in $T_u$, which is a contradiction. Thus, by adding v and (v, u) to $T_u$, we get a (v, k′, l′, c, i, o, b)-tree T such that I(T) = U (since $I(T) = I(T_u) \cup \{v\}$).

2. There are k* ∈ {1, ..., k′}, l* ∈ {1, ..., l′ − 1} and U* ⊆ U, such that $P^{c,i,u,b}_{v,k^*,l^*}$ has the monomial $\prod_{w \in U^*} x_w$, and $P^{c,v,nil,b'}_{u,k'-k^*,l'-l^*}$ has the monomial $\prod_{w \in U \setminus U^*} x_w$. By the induction hypothesis, G has a (v, k*, l*, c, i, u, b)-tree $T_v = (V_v, E_v)$ such that $I(T_v) = U^*$, and a (u, k′ − k*, l′ − l*, c, v, nil, b′)-tree $T_u = (V_u, E_u)$ such that $I(T_u) = U \setminus U^*$. Consider the following cases.

(a) If $v \in V_u$: $v \notin I(T_u)$ (since $v \in I(T_v)$). Thus col(v) = c and v has Δ − 1 out-neighbors in $T_u$. Note that v is not an out-neighbor of u in $T_u$, and thus u is an out-neighbor of v in $T_u$. Therefore b′ = T, and thus col(u) = c, which is a contradiction (since col is a proper coloring).

(b) If there is $w \in (V_v \cap V_u) \setminus \{v, u\}$: Since $I(T_v) \cap I(T_u) = \emptyset$, we get that col(w) = c and (w has Δ neighbors in $T_v$ or $T_u$). Thus there is w′ that is a neighbor of w in both $T_v$ and $T_u$, such that col(w′) ≠ c. We get that $w' \in I(T_v) \cap I(T_u) = \emptyset$, which is a contradiction.

(c) If $u \in V_v$: u is not an out-neighbor of v in $T_v$. Therefore u has less than Δ − 1 out-neighbors in $T_v$, and thus $u \in I(T_v)$. We get that $u \notin I(T_u)$, which implies that the set of out-neighbors of u in $T_u$ contains all the neighbors of u in G, excluding v. Thus u has a neighbor, which is not v, in both $T_v$ and $T_u$, and we have a contradiction according to Case 2b.

We get that $V_v \cap V_u = \emptyset$. If there is $i' \in V_u$ such that $(i', u) \in E_u$, then we get a contradiction in the same manner as in Case 1. We get that $T = (V_v \cup V_u, E_v \cup E_u \cup \{(v, u)\})$ is an out-tree of G. It is straightforward to verify that T is a (v, k′, l′, c, i, o, b)-tree of G such that $I(T) = I(T_v) \cup I(T_u)$ (and thus I(T) = U).

Now suppose that b = T. Denote by u the smallest node in N(v, i, o). By the definition of $C^{c,i,o,b}_{v,k',l'}$, one of the next conditions is fulfilled.

1. If N(v, i, o) = {u}: $P^{c,v,nil,F}_{u,k'-1,l'}$ has the monomial $\prod_{w \in U} x_w$. By the induction hypothesis, G has a (u, k′ − 1, l′, c, v, nil, F)-tree $T_u$ such that $I(T_u) = U$. Since v is not an out-neighbor of u in $T_u$, by adding v and (v, u) to $T_u$, we get a (v, k′, l′, c, i, o, b)-tree T of G (which may not be an out-tree), such that $I(T) = I(T_u) = U$.

2. Else: There are k* ∈ {1, ..., k′}, l* ∈ {1, ..., l′ − 1} and U* ⊆ U, such that $P^{c,i,u,b}_{v,k^*,l^*}$ has the monomial $\prod_{w \in U^*} x_w$, and $P^{c,v,nil,F}_{u,k'-k^*,l'-l^*}$ has the monomial $\prod_{w \in U \setminus U^*} x_w$. By the induction hypothesis, G has a (v, k*, l*, c, i, u, b)-tree $T_v = (V_v, E_v)$ such that $I(T_v) = U^*$, and a (u, k′ − k*, l′ − l*, c, v, nil, F)-tree $T_u = (V_u, E_u)$ such that $I(T_u) = U \setminus U^*$. Consider the following cases.

(a) If there is $w \in (V_v \cap V_u) \setminus \{v, u\}$: We get a contradiction in the same manner as in the previous Case 2b.

(b) If $u \in V_v$: Since col(u) ≠ c, we get that $u \in I(T_v) \cap I(T_u) = \emptyset$, which is a contradiction.

We get that $(V_v \cap V_u) \setminus \{v\} = \emptyset$. Denote $T = (V_T = (V_v \cup V_u), E_T = (E_v \cup E_u \cup \{(v, u)\}))$. Suppose, by way of contradiction, that there are two nodes $i_1, i_2 \in V_T$ such that $(i_1, v), (i_2, v) \in E_T$. Since $T_v$ is a (v, k*, l*, c, i, u, b)-tree and $T_u$ is an out-tree, we can assume WLOG that $i_1 \in V_v$ and $i_2 \in V_u$. We get that $v \in I(T_v)$, and thus $v \notin I(T_u)$. Therefore v has Δ − 1 out-neighbors in $T_u$; but since $T_u$ is an out-tree rooted at u, and v is not an out-neighbor of u in $T_u$, we have a contradiction. Thus we get that T is a (v, k′, l′, c, i, o, b)-tree of G such that $I(T) = I(T_v) \cup I(T_u)$ (and thus I(T) = U). □

Observation 7. If (G, r, k, l) has a solution, then P has a multilinear monomial of degree at most t.

Proof. Let $T = (V_T, E_T)$ be a solution. Denote $n(T, c) = \{v \in V_T : col(v) = c, v \text{ has } \Delta \text{ neighbors in } T\}$, and $c^* = \mathrm{argmax}_{c \in \{c_1, ..., c_\Delta\}} |n(T, c)|$. By Observation 4 and the pseudocode of Δ-IOB-Alg[A] (see Section 3.1), we get that

1. $2 + \sum_{3 \le i \le \Delta} (i - 2) n_i^T = n_1^T$.

2. $\sum_{1 \le i \le \Delta} n_i^T = k + l$.

3. $n_1^T - 1 \le l \le k - \frac{k-2}{\Delta-1}$.

4. $|n(T, c^*)| \ge n_\Delta^T / \Delta$.

These conditions imply that $k + l - |n(T, c^*)| \le (2 - \frac{\Delta+1}{\Delta(\Delta-1)})k + 7$. Since T is an (r, k, l, c*, r′, nil, F)-tree, the definition of C and Observation 5 imply that P has the (multilinear) monomial $\prod_{w \in I(T)} x_w$. Note that $|I(T)| \le k + l - |n(T, c^*)| + 1$, and thus we get the observation. □

Since Observation 6 implies that if P has a multilinear monomial, then (G, r, k, l) has a solution, and by Observation 7, we get the following lemma.

Lemma 4. (G, r, k, l) has a solution iff (C, X, t) has a solution.

The definition of (C, X, t) immediately implies the following observation.

Observation 8. We can compute (C, X, t) in polynomial time and space.

3.3 The Algorithm Δ-IOB-Alg[Δ-Tree-Alg]

Skulrattanakulchai [16] gave a linear-time algorithm that computes a proper Δ-coloring of an undirected connected graph of bounded degree Δ, which is not an odd cycle or a clique. In Δ-Tree-Alg (see Algorithm 4), we assume that the underlying undirected graph of G is connected, and that it is not a cycle or a clique, since these cases are handled in the preprocessing steps of Δ-IOB-Alg[A].

Algorithm 4. Δ-Tree-Alg(G, r, k, l)

1: Use the algorithm in [16] to get a proper Δ-coloring col of the underlying undirected graph of G.
2: Compute f(G, r, k, l, col) = (C, X, t).
3: Accept iff MLD-Alg(C, X, t) accepts.

By Lemmas 3 and 4, and Observation 8, we have the following theorem.

Theorem 2. Δ-IOB-Alg[Δ-Tree-Alg] is an $O^*(2^{(2-\frac{\Delta+1}{\Delta(\Delta-1)})k})$ time and polynomial space randomized algorithm for k-IOB.
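To get a feeling for the bound in Theorem 2, one can evaluate the base of the exponential, $2^{2 - \frac{\Delta+1}{\Delta(\Delta-1)}}$, for small values of Δ. The helper below is a plain numerical illustration of the formula; its name is hypothetical.

```python
def running_time_base(delta: int) -> float:
    """Base b such that the bound of Theorem 2 reads O*(b^k), for degree bound delta >= 3."""
    if delta < 3:
        raise ValueError("the formula is used here for delta >= 3")
    return 2 ** (2 - (delta + 1) / (delta * (delta - 1)))


if __name__ == "__main__":
    for delta in range(3, 8):
        print(delta, round(running_time_base(delta), 4))
```

For Δ = 3 the exponent is 2 − 4/6 = 4/3, so the base is $2^{4/3} \approx 2.52$. The base increases with Δ and approaches 4, the base of IOB-Alg, as Δ grows.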

4 Open Questions

In this paper we have presented an $O^*(4^k)$ time algorithm for k-IOB, which improves the previous best known $O^*$ running time for k-IOB. However, our algorithm is randomized, while the algorithm that has the previous best known $O^*$ running time is deterministic. Can we obtain an $O^*(4^k)$ time deterministic algorithm for k-IOB? Moreover, can we further reduce the $O^*(4^k)$ and $O^*(2^{(2-\frac{\Delta+1}{\Delta(\Delta-1)})k})$ running times for k-IOB presented in this paper?



References

1. Cohen, N., Fomin, F.V., Gutin, G., Kim, E.J., Saurabh, S., Yeo, A.: Algorithm for finding k-vertex out-trees and its application to k-internal out-branching problem. J. Comput. Syst. Sci. 76(7), 650–662 (2010)

2. Demers, A., Downing, A.: Minimum leaf spanning tree. US Patent no. 6,105,018 (August 2013)

3. Fomin, F.V., Gaspers, S., Saurabh, S., Thomasse, S.: A linear vertex kernel for maximum internal spanning tree. J. Comput. Syst. Sci. 79(1), 1–6 (2013)

4. Fomin, F.V., Grandoni, F., Lokshtanov, D., Saurabh, S.: Sharp separation and applications to exact and parameterized algorithms. Algorithmica 63(3), 692–706 (2012)

5. Garey, M.R., Johnson, D.S., Stockmeyer, L.: Some simplified NP-complete problems. In: Proc. STOC, pp. 47–63 (1974)

6. Gutin, G., Razgon, I., Kim, E.J.: Minimum leaf out-branching and related problems. Theor. Comput. Sci. 410(45), 4571–4579 (2009)

7. Koutis, I.: Faster algebraic algorithms for path and packing problems. In: Aceto, L., Damgard, I., Goldberg, L.A., Halldorsson, M.M., Ingolfsdottir, A., Walukiewicz, I. (eds.) ICALP 2008, Part I. LNCS, vol. 5125, pp. 575–586. Springer, Heidelberg (2008)

8. Koutis, I., Williams, R.: Limits and applications of group algebras for parameterized problems. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds.) ICALP 2009, Part I. LNCS, vol. 5555, pp. 653–664. Springer, Heidelberg (2009)

9. Nederlof, J.: Fast polynomial-space algorithms using mobius inversion: improving on steiner tree and related problems. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds.) ICALP 2009, Part I. LNCS, vol. 5555, pp. 713–725. Springer, Heidelberg (2009)

10. Niedermeier, R.: Invitation to fixed-parameter algorithms. Oxford University Press (2006)

11. Ozeki, K., Yamashita, T.: Spanning trees: A survey. Graphs and Combinatorics 27(1), 1–26 (2011)

12. Prieto, E., Sloper, C.: Reducing to independent set structure – the case of k-internal spanning tree. Nord. J. Comput. 12(3), 308–318 (2005)

13. Redei, L.: Ein kombinatorischer satz. Acta Litteraria Szeged 7, 39–43 (1934)

14. Raible, D., Fernau, H., Gaspers, D., Liedloff, M.: Exact and parameterized algorithms for max internal spanning tree. Algorithmica 65(1), 95–128 (2013)

15. Salamon, G.: A survey on algorithms for the maximum internal spanning tree and related problems. Electronic Notes in Discrete Mathematics 36, 1209–1216 (2010)

16. Skulrattanakulchai, S.: Delta-list vertex coloring in linear time. Inf. Process. Lett. 98(3), 101–106 (2006)

17. Williams, R.: Finding paths of length k in $O^*(2^k)$ time. Inf. Process. Lett. 109(6), 315–318 (2009)


3.5 Representative Families: A Unified Tradeoff-Based

Approach

Hadas Shachnai and Meirav Zehavi. Representative Families: A Unified Tradeoff-Based

Approach. In the proc. of the 22nd European Symposium on Algorithms (ESA), pages

786–797, 2014.



Representative Families:

A Unified Tradeoff-Based Approach

Hadas Shachnai and Meirav Zehavi

Department of Computer Science, Technion, Haifa 32000, Israel
{hadas,meizeh}@cs.technion.ac.il

Abstract. Let M = (E, I) be a matroid, and let S be a family of subsets of size p of E. A subfamily $\widehat{S} \subseteq S$ represents S if for every pair of sets X ∈ S and Y ⊆ E \ X such that X ∪ Y ∈ I, there is a set $\widehat{X} \in \widehat{S}$ disjoint from Y such that $\widehat{X} \cup Y \in I$. Fomin et al. (Proc. ACM-SIAM Symposium on Discrete Algorithms, 2014) introduced a powerful technique for fast computation of representative families for uniform matroids. In this paper, we show that this technique leads to a unified approach for substantially improving the running times of parameterized algorithms for some classic problems. This includes, among others, k-Partial Cover, k-Internal Out-Branching, and Long Directed Cycle. Our approach exploits an interesting tradeoff between running time and the size of the representative families.

1 Introduction

Matroid theory connects such disparate branches of combinatorial theory and algebra as graph theory, combinatorial optimization, linear algebra, and algorithm theory. Marx [20] was the first to apply matroids to design fixed-parameter tractable algorithms, using the notion of representative families as a main tool. Representative families for set systems were introduced by Monien [21].

Let E be a universe of n elements, and I the family of all subsets of size at most k of E, for some k ∈ N, i.e., I = {S ⊆ E : |S| ≤ k}. Then, $U_{n,k} = (E, I)$ is called a uniform matroid. Consider such a matroid and a family S of p-subsets of E, i.e., sets of size p. A subfamily $\widehat{S} \subseteq S$ represents S if for every pair of sets X ∈ S and Y ⊆ E \ X such that X ∪ Y ∈ I (i.e., |Y| ≤ k − p), there is a set $\widehat{X} \in \widehat{S}$ disjoint from Y. In other words, if a set Y can be extended to a set of size at most k by adding a subset from S, then it can also be extended to a set of the same size by adding a subset from $\widehat{S}$.
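The definition for uniform matroids can be checked directly by brute force on tiny instances. The sketch below does exactly that; it enumerates all sets Y with |Y| ≤ k − p, so it is exponential and is meant only to illustrate the definition, not the fast computations discussed in this paper. The function name and the use of `frozenset` are assumptions of the example.

```python
from itertools import combinations
from typing import FrozenSet, Hashable, Iterable, List


def represents(sub: List[FrozenSet[Hashable]],
               fam: List[FrozenSet[Hashable]],
               universe: Iterable[Hashable],
               k: int) -> bool:
    """Brute-force test of whether `sub` represents the non-empty family `fam`
    of p-subsets in the uniform matroid U_{n,k} over `universe`."""
    if not set(sub) <= set(fam):
        return False                      # a representative family is a subfamily
    p = len(fam[0])
    elements = list(universe)
    for size in range(k - p + 1):         # all Y with |Y| <= k - p
        for y in map(frozenset, combinations(elements, size)):
            some_x_fits = any(x.isdisjoint(y) for x in fam)
            some_rep_fits = any(x.isdisjoint(y) for x in sub)
            if some_x_fits and not some_rep_fits:
                return False
    return True
```

For instance, over the universe {1, 2, 3, 4} with p = 1 and k = 2, the family {{1}, {2}, {3}} is represented by {{1}, {2}}, but not by {{1}} alone, since Y = {1} can be extended by {2} but by no member of the smaller subfamily.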

The Two Families Theorem of Bollobas [2] implies that for any uniform matroid $U_{n,k} = (E, I)$ and a family S of p-subsets of E, for some 1 ≤ p ≤ k, there is a subfamily $\widehat{S} \subseteq S$ of size $\binom{k}{p}$ that represents S. For more general matroids, the generalization of Lovasz for this theorem, given in [18], implies a similar result, and algorithms based on this generalization are given in [20] and [11].

A parameterized algorithm with parameter k has running time $O^*(f(k))$ for some function f, where $O^*$ hides factors polynomial in the input size. A fast computation of representative families for uniform matroids plays a central role in obtaining better running times for such algorithms. Many parameterized algorithms are based on dynamic programming, where after each stage, the algorithm computes a family S of sets that are partial solutions. At that point we can compute a subfamily $\widehat{S} \subseteq S$ that represents S. Then, each reference to S can be replaced by a reference to $\widehat{S}$. The representative family $\widehat{S}$ contains "enough" sets from S; therefore, such a replacement preserves the correctness of the algorithm. Thus, if we can efficiently compute representative families that are small enough, we can substantially improve the running time of the algorithm.

A. Schulz and D. Wagner (Eds.): ESA 2014, LNCS 8737, pp. 786–797, 2014. © Springer-Verlag Berlin Heidelberg 2014

For uniform matroids, Monien [21] computed representative families of size $\sum_{i=0}^{k-p} p^i$ in time $O(|S| p (k-p) \sum_{i=0}^{k-p} p^i)$, and Marx [19] computed representative families of size $\binom{k}{p}$ in time $O(|S|^2 p^{k-p})$. Recently, Fomin et al. [11] introduced a powerful technique that enables computing representative families of size $\binom{k}{p} 2^{o(k)} \log n$ in time $O(|S| (k/(k-p))^{k-p} 2^{o(k)} \log n)$, thus significantly improving the previous results.

In this paper, we show that the technique of [11] leads to a unified tradeoff-based approach for substantially improving the running time of parameterized algorithms for some classic problems. In particular, we demonstrate the applicability of our approach for the following problems (among others).

k-Partial Cover (k-PC): Given a universe U, a family S of subsets of U and a parameter k ∈ N, find the smallest number of sets in S whose union contains at least k elements.

k-Internal Out-Branching (k-IOB): Given a directed graph G = (V, E) and a parameter k ∈ N, decide if G has an out-branching (i.e., a spanning tree having exactly one node of in-degree 0) with at least k nodes of out-degree ≥ 1.

1.1 Prior Work

The k-PC problem generalizes the well-known k-Dominating Set (k-DS) prob-lem, defined as follows. Given a graph G = (V, E) and a parameter k ∈ N, findthe smallest size of a set U ⊆ V such that the number of nodes in the closedneighborhood of U is at least k. If k-PC can be solved in time t(|U |, |S|, k), thenk-DS can be solved in time t(|V |, |V |, k) (see, e.g., [3]). Note that the specialcases of k-PC and k-DS in which k = n, are the classical NP-complete SetCover and Dominating Set problems [12], respectively. Table 1 presents asummary of known parameterized algorithms for k-PC and k-DS. We note thatthe parameterized complexity of k-PC and k-DS has been studied also withrespect to other parameters and for more restricted inputs (see, e.g., [3,10,28]).

The k-IOB problem is of interest in database systems [6]. A special case of k-IOB, called k-Internal Spanning Tree (k-IST), asks if a given undirected graph G = (V, E) has a spanning tree with at least k internal nodes. An interesting application of k-IST, for connecting cities with water pipes, is given in [25]. The k-IST problem is NP-complete, since it generalizes the Hamiltonian Path problem [13]; thus, k-IOB is also NP-complete. Table 2 presents a summary of known parameterized algorithms for k-IOB and k-IST. More details on k-IOB, k-IST and variants of these problems can be found in the excellent surveys of [22,26].



Table 1. Known parameterized algorithms for k-PC and k-DS

Reference            Deterministic\Randomized   Variant   Running Time
Bonnet et al. [3]    det                        k-PC      $O^*(4^k k^{2k})$
Blaser [1]           rand                       k-PC      $O^*(5.437^k)$
Kneis et al. [16]    det                        k-DS      $O^*((16+\varepsilon)^k)$
                     rand                       k-DS      $O^*((4+\varepsilon)^k)$
Chen et al. [4]      det                        k-DS      $O^*(5.437^k)$
Kneis [15]           det                        k-DS      $O^*((4+\varepsilon)^k)$
Koutis et al. [17]   rand                       k-DS      $O^*(2^k)$
This paper           det                        k-PC      $O^*(2.619^k)$

Table 2. Known parameterized algorithms for k-IOB and k-IST

Reference            Deterministic\Randomized   Variant   Running Time
Prieto et al. [24]   det                        k-IST     $O^*(2^{O(k \log k)})$
Gutin et al. [14]    det                        k-IOB     $O^*(2^{O(k \log k)})$
Cohen et al. [5]     det                        k-IOB     $O^*(55.8^k)$
                     rand                       k-IOB     $O^*(49.4^k)$
Fomin et al. [8]     det                        k-IOB     $O^*(16^{k+o(k)})$
Fomin et al. [7]     det                        k-IST     $O^*(8^k)$
Zehavi [29]          rand                       k-IOB     $O^*(4^k)$
This paper           det                        k-IOB     $O^*(6.855^k)$

1.2 Our Results

Given a uniform matroid $U_{n,k} = (E, \mathcal{I})$ and a family $\mathcal{S}$ of p-subsets of E, we compute a subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ of size $\frac{(ck)^k}{p^p (ck-p)^{k-p}} 2^{o(k)} \log n$ which represents $\mathcal{S}$, in time $O(|\mathcal{S}| (ck/(ck-p))^{k-p} 2^{o(k)} \log n)$, for any fixed $c \ge 1$. For c = 1, we have the result of Fomin et al. [11]. As c grows larger, the size of $\widehat{\mathcal{S}}$ increases, with a corresponding decrease in computation time. This enables us to obtain better running times for the algorithms for Long Directed Cycle, Weighted k-Path and Weighted k-Tree, as given in [11].

In particular, we use this approach to develop deterministic algorithms solving k-PC and k-IOB in times $O^*(2.619^k)$ and $O^*(6.855^k)$, respectively. We thus significantly improve the algorithm with the best known $O^*(5.437^k)$ running time for k-PC [1], and the deterministic algorithm with the best known $O^*(16^{k+o(k)})$ running time for k-IOB [8]. This also improves the running times of the best known deterministic algorithms for k-DS and k-IST (see Tables 1 and 2).

Independently of our work, Fomin et al. [9] have recently obtained a tradeoff similar to the one we show in Section 3.

Technical Contribution: Our unified approach exploits an interesting tradeoff between running time and the size of the representative families. This tradeoff is made precise by using, along with the scheme of [11], a parameter $c \ge 1$, which enables a more careful selection of elements to the sets.



Indeed, towards computing a representative family $\widehat{\mathcal{S}}$, we seek a family $\mathcal{F} \subseteq 2^E$ that satisfies the following condition. For every pair of sets $X \in \mathcal{S}$ and $Y \subseteq E \setminus X$ such that $X \cup Y \in \mathcal{I}$, there is a set $F \in \mathcal{F}$ such that $X \subseteq F$ and $Y \cap F = \emptyset$. Then, we compute $\widehat{\mathcal{S}}$ by iterating over all $S \in \mathcal{S}$ and $F \in \mathcal{F}$ such that $S \subseteq F$. The time complexity of this iterative process is the dominant factor in the overall running time. Thus, we seek a small family $\mathcal{F}$, such that for any $S \in \mathcal{S}$, the expected number of sets in $\mathcal{F}$ containing $S$ is small. In constructing each set $F \in \mathcal{F}$, we insert each element $e \in E$ to $F$ with probability $p/(ck)$. For c = 1, this is the approach proposed in [11]. When we take a larger value for c, we need to construct a larger family $\mathcal{F}$. Yet, since elements in E are inserted to sets in $\mathcal{F}$ with a smaller probability, we get that for any $S \in \mathcal{S}$, the expected number of sets in $\mathcal{F}$ containing $S$ is smaller.

Organization: Section 2 gives some definitions and notation. Section 3 presents a tradeoff between running time and the size of the representative families. Using this computation, we derive in Sections 4 and 5 our main results, which are fast parameterized algorithms for k-PC and k-IOB. Finally, Section 6 shows the improvements in running times resulting from our tradeoff-based approach for three previous applications of representative families of [11].

Due to space constraints, some of the results are omitted. We give the full details in [27].

2 Preliminaries

We now define the weighted version of representative families.

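Definition 1. Given a matroid $U_{n,k} = (E, \mathcal{I})$, a family $\mathcal{S}$ of p-subsets of E, and a function $w : \mathcal{S} \to \mathbb{R}$, we say that a subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ max (min) represents $\mathcal{S}$ if for every pair of sets $X \in \mathcal{S}$ and $Y \subseteq E \setminus X$ such that $X \cup Y \in \mathcal{I}$, there is a set $\widehat{X} \in \widehat{\mathcal{S}}$ disjoint from $Y$ such that $w(\widehat{X}) \ge w(X)$ ($w(\widehat{X}) \le w(X)$).

The special case where $w(S) = 0$, for all $S \in \mathcal{S}$, is the unweighted version of Definition 1.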

Notation: Given a set $U$ and a nonnegative integer $t$, let $\binom{U}{t} = \{U' \subseteq U : |U'| = t\}$. Also, recall that an out-tree T is a directed tree having exactly one node of in-degree 0, called the root. We denote by $V_T$, $E_T$, $i(T)$ and $\ell(T)$ the node set, edge set, number of internal nodes (i.e., nodes of out-degree ≥ 1) and number of leaves (i.e., nodes of out-degree 0), respectively.
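As a small illustration of this notation, the following sketch computes $i(T)$ and $\ell(T)$ for an out-tree given as a list of directed (parent, child) edges. The function name and the input encoding are assumptions made for this example only.

```python
def internal_and_leaf_counts(nodes, edges):
    """Return (root, i(T), l(T)) for an out-tree; raise ValueError otherwise."""
    nodes = list(nodes)
    children = {v: [] for v in nodes}
    in_deg = {v: 0 for v in nodes}
    for parent, child in edges:
        children[parent].append(child)
        in_deg[child] += 1
    roots = [v for v in nodes if in_deg[v] == 0]
    # An out-tree has exactly one node of in-degree 0 and no node of in-degree > 1.
    if len(roots) != 1 or any(d > 1 for d in in_deg.values()):
        raise ValueError("not an out-tree")
    # Every node must be reachable from the root.
    seen, stack = {roots[0]}, [roots[0]]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    if len(seen) != len(in_deg):
        raise ValueError("not an out-tree")
    internal = sum(1 for v in in_deg if children[v])  # out-degree >= 1
    return roots[0], internal, len(in_deg) - internal  # leaves have out-degree 0

root, i_T, l_T = internal_and_leaf_counts(
    ["r", "a", "b", "c"], [("r", "a"), ("r", "b"), ("a", "c")]
)
```

Here the root r and node a are internal, while b and c are leaves.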

3 A Tradeoff-Based Approach

In this section we sketch the beginning of the proof of the following theorem, which contains our main contribution. We note that, to fully prove this theorem, we essentially follow and redo the proof of Theorem 6 in [11], taking into account our tradeoff-related parameter c.



Theorem 1. Given a parameter $c \ge 1$, a uniform matroid $U_{n,k} = (E, \mathcal{I})$, a family $\mathcal{S}$ of p-subsets of E, and a function $w : \mathcal{S} \to \mathbb{R}$, a family $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ of size $\frac{(ck)^k}{p^p (ck-p)^{k-p}} 2^{o(k)} \log n$ that max (min) represents $\mathcal{S}$ can be found in time $O(|\mathcal{S}| (ck/(ck-p))^{k-p} 2^{o(k)} \log n + |\mathcal{S}| \log |\mathcal{S}|)$.

Roughly speaking, the proof of Theorem 1 is structured as follows. We first argue that we can focus on finding a certain data structure to compute representative families. Then, we construct such a data structure that is not as efficient as required (first randomly, and then deterministically). Finally, we show how to improve the “efficiency” of this data structure (this is made precise below).

Proof. Clearly, we may assume that $|\mathcal{S}| \ge \frac{(ck)^k}{p^p (ck-p)^{k-p}} 2^{o(k)} \log n$. Recall that our computation of representative families requires finding initially a family $\mathcal{F} \subseteq 2^E$ that satisfies the following condition. For every pair of sets $X \in \mathcal{S}$ and $Y \subseteq E \setminus X$ such that $X \cup Y \in \mathcal{I}$, there is a set $F \in \mathcal{F}$ such that $X \subseteq F$ and $Y \cap F = \emptyset$. An $(n, k, p)$-separator is a data structure containing such a family $\mathcal{F}$, which, given a set $S \in \binom{E}{p}$, outputs the subfamily of sets in $\mathcal{F}$ that contain $S$, i.e., $\chi(S) = \{F \in \mathcal{F} : S \subseteq F\}$.

To derive a fast computation, we need an efficient $(n, k, p)$-separator, where efficiency is measured by the following parameters: $\zeta = \zeta(n, k, p)$, the number of sets in the family $\mathcal{F}$; $\tau_I = \tau_I(n, k, p)$, the time required to compute the family $\mathcal{F}$; $\Delta = \Delta(n, k, p)$, the maximum size of $\chi(S)$, for any $S \in \binom{E}{p}$; and $\tau_Q = \tau_Q(n, k, p)$, an upper bound for the time required to output $\chi(S)$, for any $S \in \binom{E}{p}$.

Given such a separator, a subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ of size at most $\zeta$ that max (min) represents $\mathcal{S}$ can be constructed in time $O(\tau_I + |\mathcal{S}| \tau_Q + |\mathcal{S}| \log |\mathcal{S}|)$ as follows. First, compute $\mathcal{F}$, and $\chi(S)$ for all $S \in \mathcal{S}$. Then, order $\mathcal{S} = \{S_1, \ldots, S_{|\mathcal{S}|}\}$ such that $w(S_{i-1}) \ge w(S_i)$ ($w(S_{i-1}) \le w(S_i)$), for all $2 \le i \le |\mathcal{S}|$. Finally, return all $S_i \in \mathcal{S}$ for which there is a set $F \in \mathcal{F}$ containing $S_i$ but no $S_j$, for $1 \le j < i$. Formally, return $\widehat{\mathcal{S}} = \{S_i \in \mathcal{S} : \chi(S_i) \setminus (\bigcup_{1 \le j < i} \chi(S_j)) \ne \emptyset\}$. The correctness of this construction is proved in [11]. Thus, to prove the theorem it suffices to find an $(n, k, p)$-separator with parameters:

– $\zeta^* \le \frac{(ck)^k}{p^p (ck-p)^{k-p}} 2^{o(k)} \log n$. [Separator size]

– $\tau_I^* \le \frac{(ck)^k}{p^p (ck-p)^{k-p}} 2^{o(k)} n \log n$. [Initialization time]

– $\tau_Q^* \le (ck/(ck-p))^{k-p} 2^{o(k)} \log n$. [Query time]
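The selection step described before the parameter list (order the sets by weight, then keep $S_i$ only if some $F$ contains $S_i$ but no earlier $S_j$) can be sketched as follows for the max-representation case. This is an illustrative sketch only: the function names are hypothetical, $\chi$ is computed by a plain scan rather than by an efficient separator, and the brute-force checker simply restates Definition 1 for a uniform matroid.

```python
from itertools import combinations

def chi(S, F):
    """Indices of the sets in F that contain S."""
    return {idx for idx, big in enumerate(F) if S <= big}

def select_representatives(family, weight, F):
    """Scan sets by non-increasing weight; keep a set if it hits a new member of F."""
    seen, chosen = set(), []
    for S in sorted(family, key=weight, reverse=True):
        hits = chi(S, F)
        if hits - seen:          # some F contains S but no earlier set
            chosen.append(S)
        seen |= hits
    return chosen

def max_represents(sub, family, weight, E, k):
    """Brute-force check of Definition 1 for the uniform matroid U_{n,k}."""
    for X in family:
        for size in range(k - len(X) + 1):
            for Y in combinations(sorted(E - X), size):
                Y = frozenset(Y)
                if not any(Z.isdisjoint(Y) and weight(Z) >= weight(X) for Z in sub):
                    return False
    return True

# Toy instance: n = 6, k = 3, p = 2. The family F of all sets E \ Y' with
# |Y'| = k - p is a (deliberately naive) family with the separating property.
E, k, p = frozenset(range(6)), 3, 2
family = [frozenset(c) for c in combinations(sorted(E), p)]
F = [frozenset(c) for c in combinations(sorted(E), len(E) - (k - p))]
weight = sum
rep = select_representatives(family, weight, F)
```

Each kept set claims at least one previously unclaimed member of $\mathcal{F}$, which is why the output never has more sets than $\mathcal{F}$ does.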

We start by giving an $(n, k, p)$-separator, that we call Separator 1, with the following parameters, which are worse than required:

– $\zeta^1 = O(\frac{(ck)^k}{p^p (ck-p)^{k-p}} k^{O(1)} \log n)$. [Separator size]

– $\tau_I^1 = O(\binom{2^n}{\zeta^1} n^{O(k)})$. [Initialization time]

– $\Delta^1 = O((ck/(ck-p))^{k-p} k^{O(1)} \log n)$. [Query size]

– $\tau_Q^1 = O(\frac{(ck)^k}{p^p (ck-p)^{k-p}} n^{O(1)})$. [Query time]

First, we give a randomized algorithm which constructs, with positive probability, an $(n, k, p)$-separator having the desired $\zeta^1$ and $\Delta^1$ parameters. We then show how to deterministically construct an $(n, k, p)$-separator having all the desired parameter values. Let $t = \frac{(ck)^k}{p^p (ck-p)^{k-p}} (k+1) \ln n$, and construct the family $\mathcal{F} = \{F_1, \ldots, F_t\}$ as follows. For each $i \in \{1, \ldots, t\}$ and element $e \in E$, insert $e$ to $F_i$ with probability $p/(ck)$. The construction of different sets in $\mathcal{F}$, as well as the insertion of different elements into each set in $\mathcal{F}$, are independent. Clearly, $\zeta^1 = t$ is within the required bound.
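The random construction above can be tried out on small instances with the following sketch. It is not the deterministic construction of the proof: it rounds $t$ up to an integer, draws the family with Python's pseudo-random generator, and simply redraws until a brute-force check confirms the separating property. All function names are hypothetical.

```python
import math
import random
from itertools import combinations

def num_sets(k, p, c, n):
    """t = (ck)^k / (p^p (ck-p)^(k-p)) * (k+1) ln n, rounded up."""
    return math.ceil((c * k) ** k / (p ** p * (c * k - p) ** (k - p))
                     * (k + 1) * math.log(n))

def random_family(E, k, p, c, rng):
    """Draw t sets; each element joins each set independently w.p. p/(ck)."""
    prob = p / (c * k)
    return [frozenset(e for e in E if rng.random() < prob)
            for _ in range(num_sets(k, p, c, len(E)))]

def is_separating(F, E, k, p):
    """Check: for all disjoint X (|X|=p), Y (|Y|=k-p) some F has X <= F, Y & F empty."""
    for X in combinations(sorted(E), p):
        X = frozenset(X)
        for Y in combinations(sorted(E - X), k - p):
            Y = frozenset(Y)
            if not any(X <= big and big.isdisjoint(Y) for big in F):
                return False
    return True

def build_separating_family(E, k, p, c=1.0, seed=0):
    """Redraw until the family separates (each draw fails w.p. at most 1/n)."""
    rng = random.Random(seed)
    while True:
        F = random_family(E, k, p, c, rng)
        if is_separating(F, E, k, p):
            return F

E = frozenset(range(8))
F = build_separating_family(E, k=4, p=2, c=1.5, seed=1)
```

Raising `c` makes `num_sets` larger but lowers the inclusion probability, which is the tradeoff between separator size and query size discussed in this section.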

For fixed sets $X \in \binom{E}{p}$, $Y \in \binom{E \setminus X}{k-p}$ and $F \in \mathcal{F}$, the probability that $X \subseteq F$ and $Y \cap F = \emptyset$ is $(\frac{p}{ck})^p (1 - \frac{p}{ck})^{k-p} = \frac{p^p (ck-p)^{k-p}}{(ck)^k} = (k+1) \ln n / t$. Thus, the probability that no set $F \in \mathcal{F}$ satisfies $X \subseteq F$ and $Y \cap F = \emptyset$ is $(1 - (k+1) \ln n / t)^t \le e^{-(k+1) \ln n} = n^{-k-1}$. There are at most $n^k$ choices for $X \in \binom{E}{p}$ and $Y \in \binom{E \setminus X}{k-p}$; thus, applying the union bound, the probability that there exist $X \in \binom{E}{p}$ and $Y \in \binom{E \setminus X}{k-p}$, such that no set $F \in \mathcal{F}$ satisfies $X \subseteq F$ and $Y \cap F = \emptyset$, is at most $n^{-k-1} \cdot n^k = 1/n$.

For any sets $S \in \binom{E}{p}$ and $F \in \mathcal{F}$, the probability that $S \subseteq F$ is $(p/(ck))^p$. Therefore, $|\chi(S)|$, the number of sets in $\mathcal{F}$ containing $S$, is a sum of $t$ i.i.d. Bernoulli random variables with parameter $(p/(ck))^p$. Then, the expected value of $|\chi(S)|$ is $E[|\chi(S)|] = t (\frac{p}{ck})^p = (\frac{ck}{ck-p})^{k-p} (k+1) \ln n$. Applying standard Chernoff bounds, we have that the probability that $|\chi(S)| \ge 6 E[|\chi(S)|]$ is upper bounded by $2^{-6 E[|\chi(S)|]} \le n^{-k-1}$. There are $\binom{n}{p}$ choices for $S \in \binom{E}{p}$. Thus, by the union bound, the probability that $\Delta^1 > 6 \cdot [(ck/(ck-p))^{k-p} (k+1) \ln n]$ is upper bounded by $1/n$.

So far, we have given a randomized algorithm that constructs an $(n, k, p)$-separator having the desired $\zeta^1$ and $\Delta^1$ parameters with probability at least $1 - 2/n > 0$. To deterministically construct $\mathcal{F}$ in time bounded by $\tau_I^1$, we iterate over all families of $t$ subsets of $E$ (there are $\binom{2^n}{\zeta^1}$ such families), where for each family $\mathcal{F}$, we test in time $n^{O(k)}$ whether $\Delta^1$ is within the required bound, and whether for any pair of sets $X \in \binom{E}{p}$ and $Y \in \binom{E \setminus X}{k-p}$, there is a set $F \in \mathcal{F}$ such that $X \subseteq F$ and $Y \cap F = \emptyset$. Then, given a set $S \in \binom{E}{p}$, we can deterministically compute $\chi(S)$ within the stated bound for $\tau_Q^1$, by iterating over $\mathcal{F}$ and inserting each set that contains $S$.



We next repeatedly apply Lemmas 4.4 and 4.5 of [11] to Separator 1, constructing intermediate separators different from those in [11] (as we start with a different Separator 1). This process eventually leads to an $(n, k, p)$-separator having the desired parameters $\zeta^*$, $\tau_I^*$ and $\tau_Q^*$. □

Given a parameter c and assuming that w is irrelevant, let RepAlg$(E, k, \mathcal{S})$ be the algorithm developed in Theorem 1.

4 An Algorithm for k-Partial Cover

We now apply our scheme, RepAlg, to obtain a faster parameterized algorithm for k-PC. Let $m = |\mathcal{S}|$ be the number of sets in $\mathcal{S}$. The main idea of the algorithm is to iterate over the sets in $\mathcal{S}$ in some arbitrary order $S_1, S_2, \ldots, S_m$, such that when we reach a set $S_i$, we have already computed representative families for families of “partial solutions” that include only elements from the sets $S_1, \ldots, S_{i-1}$. Then, we try to extend the partial solutions by adding uncovered elements from $S_i$. The key observation, which leads to our improved running time, is that we do not add “many” elements from $S_i$ at once, but rather add these elements one-by-one; thus, we can compute new representative families after adding each element, which are then used when adding the next element.

The Algorithm: We now describe PCAlg, our algorithm for k-PC (see the pseudocode below). The first step solves the simple case where k elements can be covered with one set. Then, algorithm PCAlg generates a matrix M, where each entry $M[i, j, \ell]$ holds a family that represents $\mathrm{Sol}_{i,j,\ell}$. The sets in $\mathrm{Sol}_{i,j,\ell}$ are those of exactly $j$ elements, which can be covered by $\ell$ sets among $\{S_1, \ldots, S_i\}$, i.e., $\mathrm{Sol}_{i,j,\ell} = \{S \subseteq (\bigcup \mathcal{S}') : \mathcal{S}' \subseteq \{S_1, \ldots, S_i\}, |S| = j, |\mathcal{S}'| = \ell\}$.

PCAlg iterates over all triples $(i, j, \ell)$, where $i \in \{1, \ldots, m\}$, $j \in \{1, \ldots, k\}$ and $\ell \in \{1, \ldots, \min\{i, k\}\}$. In each iteration, corresponding to a triple $(i, j, \ell)$, PCAlg computes $M[i, j, \ell]$ by using $M[i-1, j', \ell-1]$, for all $1 \le j' \le j$, and $M[i-1, j, \ell]$. In other words, PCAlg computes a family that represents $\mathrm{Sol}_{i,j,\ell}$ by using families that represent $\mathrm{Sol}_{i-1,j',\ell-1}$, for all $1 \le j' \le j$, and $\mathrm{Sol}_{i-1,j,\ell}$. In particular, algorithm PCAlg adds elements in $S_i$ one-by-one to sets in $M[i-1, j', \ell-1]$, for all $1 \le j' \le j$. After adding an element, PCAlg computes (in Step 7) new representative families, to be used when adding the next element. Let $S_i = \{s_1, \ldots, s_r\}$. Then, PCAlg computes a family $A_{r',j'}$, for all $1 \le r' \le r$ and $0 \le j' \le j$, that represents the family of sets of exactly $j'$ elements that can be covered by $\{s_1, \ldots, s_{r'}\}$ and $\ell - 1$ sets among $\{S_1, \ldots, S_{i-1}\}$. The family $A_{r',j'}$ is computed by calling RepAlg with the family parameter containing the union of $A_{r'-1,j'}$ and the family of sets obtained by adding $s_{r'}$ to sets in $A_{r'-1,j'-1}$.

Suppose the solution is $\ell^*$. Then, using representative families guarantees that each entry $M[i, j, \ell]$ holds “enough” sets from $\mathrm{Sol}_{i,j,\ell}$, such that when the algorithm terminates, $M[m, k, \ell^*] \ne \emptyset$. Moreover, using representative families guarantees that each entry $M[i, j, \ell]$ does not hold “too many” sets from $\mathrm{Sol}_{i,j,\ell}$, thereby yielding an improved running time.
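The dynamic program described above can be sketched in Python as follows. This is only an illustration of the table structure: the call to RepAlg is replaced by a placeholder `rep_alg` that, by default, returns its input unchanged, so the families are not pruned and the sketch takes exponential time. The function and parameter names are assumptions of this example, not part of the paper.

```python
def pc_alg(sets, k, rep_alg=lambda family: family):
    """Mirror of the PCAlg tables; rep_alg is an identity stand-in for RepAlg."""
    m = len(sets)
    if any(len(S) >= k for S in sets):
        return 1
    # M[i][j][l]: sets of exactly j elements coverable by l sets among S_1..S_i.
    M = [[[set() for _ in range(k + 1)] for _ in range(k + 1)]
         for _ in range(m + 1)]
    for i in range(1, m + 1):
        elems = sorted(sets[i - 1])
        r = len(elems)
        for j in range(1, k + 1):
            for l in range(1, min(i, k) + 1):
                # A[r'][j']: partial solutions after offering s_1..s_{r'} of S_i.
                A = [[set() for _ in range(j + 1)] for _ in range(r + 1)]
                A[0][0] = {frozenset()}
                for jp in range(1, j + 1):
                    A[0][jp] = set(M[i - 1][jp][l - 1])
                for rp in range(1, r + 1):
                    s = elems[rp - 1]
                    for jp in range(j + 1):
                        fam = set(A[rp - 1][jp])
                        if jp >= 1:  # add s to sets that do not contain it yet
                            fam |= {S | {s} for S in A[rp - 1][jp - 1] if s not in S}
                        A[rp][jp] = set(rep_alg(fam))
                M[i][j][l] = set(rep_alg(M[i - 1][j][l] | A[r][j]))
    for l in range(1, k + 1):
        if M[m][k][l]:
            return l
    return None  # fewer than k elements can be covered

answer = pc_alg([{1, 2}, {2, 3}, {4}], 3)
```

In this toy instance no single set covers three elements, while the first two sets together cover {1, 2, 3}.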

Algorithm 1. PCAlg$(U, k, \mathcal{S} = \{S_1, \ldots, S_m\})$

1: if there is $S \in \mathcal{S}$ s.t. $|S| \ge k$ then return 1. end if
2: let M be a matrix that has an entry $[i, j, \ell]$ for all $0 \le i \le m$, $1 \le j \le k$ and $0 \le \ell \le k$, initialized to $\emptyset$.
3: for $i = 1, \ldots, m$, $j = 1, \ldots, k$, $\ell = 1, \ldots, \min\{i, k\}$ do
4:   let $S_i = \{s_1, \ldots, s_r\}$.
5:   $A_{0,0} \Leftarrow \{\emptyset\}$, and for $j' = 1, \ldots, j$ do $A_{0,j'} \Leftarrow M[i-1, j', \ell-1]$. end for
6:   for $r' = 1, \ldots, r$, $j' = 0, \ldots, j$ do
7:     $A_{r',j'} \Leftarrow$ RepAlg$(U, k, [A_{r'-1,j'} \cup \{S \cup \{s_{r'}\} : j' \ge 1, S \in A_{r'-1,j'-1}, s_{r'} \notin S\}])$.
8:   end for
9:   $M[i, j, \ell] \Leftarrow$ RepAlg$(U, k, M[i-1, j, \ell] \cup A_{r,j})$.
10: end for
11: return the smallest $\ell$ such that $M[m, k, \ell] \ne \emptyset$.

Correctness and Running Time: We first state a lemma referring to Steps 5–8 in PCAlg. In this lemma, we use the following notation. For all $0 \le i \le m$, $1 \le j \le k$ and $0 \le \ell \le k$, let $A^*_{i,j,\ell}$ denote the family of sets containing $j$ elements, constructed by adding elements from $S_i$ to sets in $(\bigcup_{1 \le j' \le j} M[i-1, j', \ell-1]) \cup \{\emptyset\}$, i.e., $A^*_{i,j,\ell} = \{S \cup S'_i : S \in (\bigcup_{1 \le j' \le j} M[i-1, j', \ell-1]) \cup \{\emptyset\}, S'_i \subseteq S_i, |S \cup S'_i| = j\}$.

Lemma 2. Consider an iteration of Step 3 in PCAlg, corresponding to some values $i$, $j$ and $\ell$. Then, the family $A_{r,j}$ represents the family $A^*_{i,j,\ell}$.

We use Lemma 2 in proving the next lemma, showing the correctness of PCAlg.

Lemma 3. For all $0 \le i \le m$, $1 \le j \le k$ and $0 \le \ell \le k$, $M[i, j, \ell]$ represents $\mathrm{Sol}_{i,j,\ell}$.

We summarize in the next result.

Theorem 4. PCAlg solves k-PC in time $O(2.619^k |\mathcal{S}| \log^2 |U|)$.

Proof. Lemma 3 and Step 11 imply that PCAlg solves k-PC. Also, Lemmas 2 and 3, and the way RepAlg proceeds, imply that PCAlg runs in time

$$O\Bigl(2^{o(k)} |\mathcal{S}| \log^2 |U| \cdot \max_{0 \le t \le k} \Bigl\{ \frac{(ck)^k}{t^t (ck-t)^{k-t}} \Bigl(\frac{ck}{ck-t}\Bigr)^{k-t} \Bigr\}\Bigr).$$

Choosing c = 1.447, the maximum is obtained at $t = \alpha k$, where $\alpha \cong 0.55277$. Thus, PCAlg runs in time $O(2.61804^k |\mathcal{S}| \log^2 |U|)$.¹ □
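The constants in this proof can be checked numerically. Substituting $t = \alpha k$ into the maximized expression gives the $k$-th power of $c^{2-\alpha} / (\alpha^{\alpha} (c-\alpha)^{2(1-\alpha)})$; the sketch below evaluates this base on a grid of values of $\alpha$. The grid search and the function names are assumptions of this illustration, not part of the paper's analysis.

```python
import math

def log_base(c, alpha):
    """Natural log of the per-k base of (family size) * (query time) at t = alpha*k."""
    log_size = (math.log(c) - alpha * math.log(alpha)
                - (1 - alpha) * math.log(c - alpha))
    log_query = (1 - alpha) * (math.log(c) - math.log(c - alpha))
    return log_size + log_query

def worst_case_base(c, steps=2000):
    """Maximize over alpha in (0, 1) on a uniform grid; return (base, alpha)."""
    best = max(range(1, steps), key=lambda i: log_base(c, i / steps))
    alpha = best / steps
    return math.exp(log_base(c, alpha)), alpha

base_tradeoff, alpha_tradeoff = worst_case_base(1.447)
base_plain, alpha_plain = worst_case_base(1.0)
```

With c = 1.447 the base comes out close to 2.618 with the maximum near $\alpha = 0.553$, and with c = 1 it comes out close to 2.85, in line with the bounds stated here.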

5 An Algorithm for k-Internal Out-Branching

We show below how to use our scheme, RepAlg, to obtain a faster parameterized algorithm for k-IOB. We first define an auxiliary problem called (k, t)-Tree, which requires finding a tree on a “small” number of nodes, rather than a spanning tree. Given a directed graph G = (V, E), a node $r \in V$, and nonnegative integers $k$ and $t$, the (k, t)-Tree problem asks if G contains an out-tree T rooted at $r$, such that $i(T) = k$ and $\ell(T) = t$. The following lemma implies that we can focus on solving (k, t)-Tree.

¹ Choosing c = 1, PCAlg runs in time $O^*(2.851^k)$.



Lemma 5 ([29]). If (k, t)-Tree can be solved in time $\tau(G, k, t)$, then k-IOB can be solved in time $O(|V| (|E| + \sum_{1 \le t \le k} \tau(G, k, t)))$.

We next solve (k, t)-Tree. Our solution technique is based on iterating over all pairs of nodes $v, u \in V$, and all values $0 \le i \le k-1$ and $0 \le \ell \le t$. When we reach such $v$, $u$, $i$ and $\ell$, we have already computed, for all $v', u' \in V$, $0 \le i' \le i$, and $0 \le \ell' \le \ell$ satisfying $i' + \ell' < i + \ell$, representative families of “partial solutions”. Such a partial solution is a set of nodes of an out-tree of G that is rooted at $v'$, includes $u'$ as a leaf (unless $v' = u'$) and consists of $i'$ internal nodes (excluding $v'$) and $\ell'$ leaves (excluding $u'$). We then try to “connect” out-trees represented by partial solutions in a manner that results in a legal out-tree, i.e., an out-tree of G that is rooted at $v$, includes $u$ as a leaf (unless $v = u$) and consists of $i$ internal nodes (excluding $v$) and $\ell$ leaves (excluding $u$). In constructing a set of such legal out-trees, we add families of “small” partial solutions one-by-one, so we can compute new representative families after adding each family, and then use them when adding the next one; this is a crucial point in obtaining our improved running time. The construction itself is quite technical. On a high level, it consists of iterating over some trees that indicate which families of partial solutions should be currently used, and in which order they should be added. We briefly note that this iterative process is based on a tool called guiding trees, introduced in [23] (based on [11]).

Some Definitions: Let $d \ge 2$ be a constant. Given nodes $v, u \in V$, $0 \le i \le k-1$ and $0 \le \ell \le t$, let $\mathcal{T}_{v,u,i,\ell}$ be the set of out-trees of G rooted at $v$, having exactly $i$ internal nodes and $\ell$ leaves, excluding $v$ and $u$, where $v = u$ or $u$ is a leaf. Also, let $\mathrm{Sol}_{v,u,i,\ell} = \{V_T \setminus \{v, u\} : T \in \mathcal{T}_{v,u,i,\ell}\}$. Given nodes $v, u \in V$, let $\mathcal{C}_{v,u}$ be the set of trees $C$ rooted at $v$, where $v = u$ or $u$ is a leaf, $V_C \subseteq V$, and $3 \le |V_C| \le 4d$. Given a node $v$ of a rooted tree $T$, let $f_T(v)$ be the father of $v$ in $T$.

The Algorithm: We now describe TreeAlg, our algorithm for (k, t)-Tree (see the pseudocode below). TreeAlg first generates a matrix M, where each entry $M[v, u, i, \ell]$ holds a family that represents $\mathrm{Sol}_{v,u,i,\ell}$. TreeAlg iterates over all $i \in \{0, \ldots, k-1\}$, $\ell \in \{0, \ldots, t\}$ such that $1 \le i + \ell$, and $v, u \in V$. Next, consider some iteration, corresponding to such $i$, $\ell$, $v$ and $u$.

The goal in each iteration is to compute $M[v, u, i, \ell]$, by using entries that are already computed. TreeAlg generates a matrix N, where each entry $N[C]$ holds a family that represents the subfamily of $\mathrm{Sol}_{v,u,i,\ell}$ including the node set (excluding $v$ and $u$) of each out-tree $T \in \mathcal{T}_{v,u,i,\ell}$ complying with the rooted tree $C$ as follows (see Fig. 1): (1) for any two nodes $v', u' \in V_C$, $v'$ is an ancestor of $u'$ in $C$ iff $v'$ is an ancestor of $u'$ in $T$, (2) the leaves in $C$ are leaves in $T$, and (3) in the forest obtained by removing $V_C$ from $T$, each tree has at most two neighbors in $T$ from $V_C$ and, unless this neighborhood includes only $v$, the tree contains at most $(k+t)/d$ nodes. Roughly speaking, each entry $N[C]$ is easier to compute than the entry $M[v, u, i, \ell]$, since $C$ “guides” us through the computation as follows. The rooted tree $C$ implies which entries in M are relevant to $N[C]$, in which order they should be used, and, in particular, it ensures that these are only entries of the form $M[v', u', i', \ell']$, where $i' + \ell' \le (k+t)/d$. This bound on $i' + \ell'$ ensures that the families for which we compute representative families are “small”, thereby reducing the running time of calls to RepAlg. Next, consider an iteration corresponding to some $C \in \mathcal{C}_{v,u}$.

The current goal is to compute $N[C]$, using the guidance of $C$. To this end, TreeAlg generates a matrix L, where each entry $L[j, i', \ell']$ holds a family that represents the family of node sets, excluding nodes in $V_C$, of trees in $\mathcal{P}_{v,u,C,j,i',\ell'}$, which is defined as follows. The set $\mathcal{P}_{v,u,C,j,i',\ell'}$ includes each subtree $P'$ of G complying with the subtree $P$ of $C$ induced by $\{w_1, \ldots, w_j\}$, demanding only from leaves in $P$ that are leaves in $C$ to be leaves in $P'$, such that: (1) $V_{P'} \cap (V_C \setminus V_P) = \emptyset$, and (2) the number of internal nodes (leaves) in $P'$, excluding those in $V_P$, is $i'$ ($\ell'$). Informally, we consider such a subtree $P'$ as a stage towards computing an out-tree $T \in \mathcal{T}_{v,u,i,\ell}$ that complies with $C$. Indeed, $\mathcal{P}_{v,u,C,|V_C|,i^*,\ell^*}$ is the set of out-trees in $\mathcal{T}_{v,u,i,\ell}$ that comply with $C$.² Roughly speaking, the matrix L is computed by using dynamic programming and RepAlg (in Steps 8–12) as follows. Each entry in L is computed by adding node sets of certain “small” trees to node sets of trees computed at a previous stage, and then calling RepAlg to compute a representative family for the result.

[Figure 1: panels labeled G, T, C and T \ V_C, drawn for k = 8, t = 5 and d = 2.]

Fig. 1. An out-tree $T \in \mathcal{T}_{v,u,4,4}$, complying with the rooted tree $C \in \mathcal{C}_{v,u}$

Correctness and Running Time: The following lemma implies the correctness of TreeAlg.

Lemma 6. For all $v, u \in V$, $0 \le i < k$ and $0 \le \ell \le t$, $M[v, u, i, \ell]$ represents $\mathrm{Sol}_{v,u,i,\ell}$.

For c = 1.447 and a large enough constant d, we obtain the following result.

Lemma 7. TreeAlg solves (k, t)-Tree in time $O(2.61804^{k+t} |V|^{O(1)})$.

Finally, Lemmas 5 and 7 imply the following theorem.³

Theorem 8. k-IOB can be solved in time $O(6.85414^k |V|^{O(1)})$.
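The constant in Theorem 8 can be traced with the following arithmetic sketch. It assumes, as in Lemma 5, that only values $t \le k$ are needed, and that the factor $k$ coming from the sum over $t$ and the remaining polynomial factors are absorbed into $|V|^{O(1)}$.

```latex
\[
\sum_{1 \le t \le k} 2.61804^{\,k+t}\,|V|^{O(1)}
\;\le\; k \cdot 2.61804^{\,2k}\,|V|^{O(1)}
\;\le\; \bigl(2.61804^{2}\bigr)^{k}\,|V|^{O(1)}
\;\le\; 6.85414^{k}\,|V|^{O(1)},
\qquad\text{since } 2.61804^{2} \approx 6.85413 .
\]
```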

² Note that $i^*$ ($\ell^*$), defined in Step 6, is the number of internal nodes (leaves) in an out-tree $T \in \mathcal{T}_{v,u,i,\ell}$, excluding those in $V_C$.

³ Choosing c = 1, we solve k-IOB in time $O^*(8.125^k)$.

Algorithm 2. TreeAlg$(G = (V, E), r, k, t)$

1: let M be a matrix that has an entry $[v, u, i, \ell]$ for all $v, u \in V$, $0 \le i \le k-1$ and $0 \le \ell \le t$, which is initialized to $\emptyset$.
2: $M[v, u, 0, 0] \Leftarrow \{\emptyset\}$ for all $v, u \in V$ s.t. $(v, u) \in E$ or $v = u$.
3: for $i = 0, \ldots, k-1$, $\ell = 0, \ldots, t$ s.t. $1 \le i + \ell$, all $v, u \in V$ do
4:   let N be a matrix that has an entry $[C]$ for all $C \in \mathcal{C}_{v,u}$.
5:   for all $C \in \mathcal{C}_{v,u}$ do
6:     let $w_1, \ldots, w_{|V_C|}$ be a preorder on $V_C$, where $w_1 = v$, and let $i^* = i + 1 - i(C)$ and $\ell^* = \ell + |\{u\} \setminus \{v\}| - \ell(C)$.
7:     let L be a matrix that has an entry $[j, i', \ell']$ for all $1 \le j \le |V_C|$, $0 \le i' \le i^*$ and $0 \le \ell' \le \ell^*$, which is initialized to $\emptyset$.
8:     $L[1, i', \ell'] \Leftarrow \{U \in M[v, v, i', \ell'] : U \cap V_C = \emptyset\}$ for all $0 \le i' \le i^*$ and $0 \le \ell' \le \ell^*$.
9:     for $j = 2, \ldots, |V_C|$, $i' = 0, \ldots, i^*$, $\ell' = 0, \ldots, \ell^*$ do
10:      let A be the family of all sets $U \cup W$ such that $U \cap (W \cup V_C) = \emptyset$, and there are $0 \le i'' \le i'$ and $0 \le \ell'' \le \ell'$ satisfying $i'' + \ell'' \le \frac{k+t}{d}$ for which
         (1) $U \in M[f_C(w_j), w_j, i'', \ell'']$ and $W \in L[j-1, i'-i'', \ell'-\ell'']$; or
         (2) $w_j$ is not a leaf in $C$, $\ell'' \ge 1$, $U \in M[w_j, w_j, i'', \ell'']$ and $W \in L[j, i'-i'', \ell'-\ell'']$.
11:      $L[j, i', \ell'] \Leftarrow$ RepAlg$(V, k+t, A)$.
12:    end for
13:    $N[C] \Leftarrow \{U \cup (V_C \setminus \{v, u\}) : U \in L[|V_C|, i^*, \ell^*]\}$.
14:  end for
15:  $M[v, u, i, \ell] \Leftarrow$ RepAlg$(V, k+t, \bigcup_{C \in \mathcal{C}_{v,u}} N[C])$.
16: end for
17: accept iff $M[r, r, k-1, t] \ne \emptyset$.

6 Improving Known Applications

Fomin et al. [11] proved that Long Directed Cycle, Weighted k-Path and Weighted k-Tree can be solved in times $O(8^k |E| \log^2 |V|)$, $O(2.851^k |V| \log^2 |V|)$ and $O(2.851^k |V|^{O(1)})$, respectively. By replacing their computation of representative families with our scheme, RepAlg, we solve these problems in times $O(6.75^k |E| \log^2 |V|)$, $O(2.619^k |V| \log^2 |V|)$ and $O(2.619^k |V|^{O(1)})$, respectively.

Acknowledgment. We thank the anonymous referees for valuable comments.

References

1. Blaser, M.: Computing small partial coverings. Inf. Proc. Let. 85(6), 327–331 (2003)
2. Bollobas, B.: On generalized graphs. Acta Math. Aca. Sci. Hun. 16, 447–452 (1965)
3. Bonnet, E., Paschos, V.T., Sikora, F.: Multiparameterizations for max k-set cover and related satisfiability problems. CoRR abs/1309.4718 (2013)
4. Chen, S., Chen, Z.: Faster deterministic algorithms for packing, matching and t-dominating set problems. CoRR abs/1306.3602 (2013)
5. Cohen, N., Fomin, F.V., Gutin, G., Kim, E.J., Saurabh, S., Yeo, A.: Algorithm for finding k-vertex out-trees and its application to k-internal out-branching problem. J. Comput. Syst. Sci. 76(7), 650–662 (2010)
6. Demers, A., Downing, A.: Minimum leaf spanning tree. US Patent no. 6,105,018 (August 2013)
7. Fomin, F.V., Gaspers, S., Saurabh, S., Thomasse, S.: A linear vertex kernel for maximum internal spanning tree. J. Comput. Syst. Sci. 79(1), 1–6 (2013)
8. Fomin, F.V., Grandoni, F., Lokshtanov, D., Saurabh, S.: Sharp separation and applications to exact and parameterized algorithms. Algorithmica 63(3), 692–706 (2012)
9. Fomin, F.V., Lokshtanov, D., Panolan, F., Saurabh, S.: Representative sets of product families. CoRR abs/1402.3909 (2014)
10. Fomin, F.V., Lokshtanov, D., Raman, V., Saurabh, S.: Subexponential algorithms for partial cover problems. Inf. Proc. Let. 111(16), 814–818 (2011)
11. Fomin, F.V., Lokshtanov, D., Saurabh, S.: Efficient computation of representative sets with applications in parameterized and exact algorithms. In: SODA (see also: CoRR abs/1304.4626), pp. 142–151 (2014)
12. Garey, M.R., Johnson, D.S.: Computers and intractability: a guide to the theory of NP-completeness. W.H. Freeman, New York (1979)
13. Garey, M.R., Johnson, D.S., Stockmeyer, L.: Some simplified NP-complete problems. In: STOC, pp. 47–63 (1974)
14. Gutin, G., Razgon, I., Kim, E.J.: Minimum leaf out-branching and related problems. Theor. Comput. Sci. 410(45), 4571–4579 (2009)
15. Kneis, J.: Intuitive algorithms. RWTH Aachen University, pp. 1–167 (2009)
16. Kneis, J., Molle, D., Rossmanith, P.: Partial vs. complete domination: t-dominating set. In: SOFSEM, pp. 367–376 (2007)
17. Koutis, I., Williams, R.: Limits and applications of group algebras for parameterized problems. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds.) ICALP 2009, Part I. LNCS, vol. 5555, pp. 653–664. Springer, Heidelberg (2009)
18. Lovasz, L.: Flats in matroids and geometric graphs. In: BCC, pp. 45–86 (1977)
19. Marx, D.: Parameterized coloring problems on chordal graphs. Theor. Comput. Sci. 351, 407–424 (2006)
20. Marx, D.: A parameterized view on matroid optimization problems. Theor. Comput. Sci. 410, 4471–4479 (2009)
21. Monien, B.: How to find long paths efficiently. Annals Disc. Math. 25, 239–254 (1985)
22. Ozeki, K., Yamashita, T.: Spanning trees: A survey. Graphs and Combinatorics 27(1), 1–26 (2011)
23. Pinter, R.Y., Shachnai, H., Zehavi, M.: Deterministic parameterized algorithms for the graph motif problem. In: MFCS (to appear, 2014)
24. Prieto, E., Sloper, C.: Reducing to independent set structure – the case of k-internal spanning tree. Nord. J. Comput. 12(3), 308–318 (2005)
25. Raible, D., Fernau, H., Gaspers, D., Liedloff, M.: Exact and parameterized algorithms for max internal spanning tree. Algorithmica 65(1), 95–128 (2013)
26. Salamon, G.: A survey on algorithms for the maximum internal spanning tree and related problems. Electronic Notes in Disc. Math. 36, 1209–1216 (2010)
27. Shachnai, H., Zehavi, M.: Representative families: a unified tradeoff-based approach. CoRR abs/1402.3547 (2014)
28. Skowron, P., Faliszewski, P.: Approximating the MaxCover problem with bounded frequencies in FPT time. CoRR abs/1309.4405 (2013)
29. Zehavi, M.: Algorithms for k-internal out-branching. In: Gutin, G., Szeider, S. (eds.) IPEC 2013. LNCS, vol. 8246, pp. 361–373. Springer, Heidelberg (2013)


3.6 Parameterized Algorithms for Graph Partitioning Problems

Hadas Shachnai and Meirav Zehavi. Parameterized Algorithms for Graph Partitioning

Problems. In the proc. of the 40th International Workshop on Graph-Theoretic Concepts

in Computer Science (WG), pages 384–395, 2014.



Parameterized Algorithms for Graph Partitioning Problems

Hadas Shachnai and Meirav Zehavi

Department of Computer Science, Technion, 32000 Haifa, Israel
{hadas,meizeh}@cs.technion.ac.il

Abstract. We study a broad class of graph partitioning problems, where each problem is specified by a graph G = (V, E), and parameters $k$ and $p$. We seek a subset $U \subseteq V$ of size $k$, such that $\alpha_1 m_1 + \alpha_2 m_2$ is at most (or at least) $p$, where $\alpha_1, \alpha_2 \in \mathbb{R}$ are constants defining the problem, and $m_1$, $m_2$ are the cardinalities of the edge sets having both endpoints, and exactly one endpoint, in $U$, respectively. This class of fixed-cardinality graph partitioning problems (FGPPs) encompasses Max (k, n − k)-Cut, Min k-Vertex Cover, k-Densest Subgraph, and k-Sparsest Subgraph.

Our main result is an $O^*(4^{k+o(k)} \Delta^k)$ algorithm for any problem in this class, where $\Delta \ge 1$ is the maximum degree in the input graph. This resolves an open question posed by Bonnet et al. [IPEC 2013]. We obtain faster algorithms for certain subclasses of FGPPs, parameterized by $p$, or by $(k + p)$. In particular, we give an $O^*(4^{p+o(p)})$ time algorithm for Max (k, n − k)-Cut, thus improving significantly the best known $O^*(p^p)$ time algorithm.

1 Introduction

Graph partitioning problems arise in many areas including VLSI design, data mining, parallel computing, and sparse matrix factorization (see, e.g., [1,7,12]). We study the broad class of fixed-cardinality graph partitioning problems (FGPPs), where each problem is specified by a graph G = (V, E), and parameters $k$ and $p$. We seek a subset $U \subseteq V$ of size $k$, such that $\alpha_1 m_1 + \alpha_2 m_2$ is at most (or at least) $p$, where $\alpha_1, \alpha_2 \in \mathbb{R}$ are constants defining the problem, and $m_1$, $m_2$ are the cardinalities of the edge sets having both endpoints, and exactly one endpoint, in $U$, respectively. This class encompasses such fundamental problems as Max and Min (k, n − k)-Cut, Max and Min k-Vertex Cover, k-Densest Subgraph, and k-Sparsest Subgraph. For example, Max (k, n − k)-Cut is a max-FGPP (i.e., maximization FGPP) satisfying $\alpha_1 = 0$ and $\alpha_2 = 1$, Min k-Vertex Cover is a min-FGPP (i.e., minimization FGPP) satisfying $\alpha_1 = \alpha_2 = 1$, k-Densest Subgraph is a max-FGPP satisfying $\alpha_1 = 1$ and $\alpha_2 = 0$, and k-Sparsest Subgraph is a min-FGPP satisfying $\alpha_1 = 1$ and $\alpha_2 = 0$.
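To make the objective $\alpha_1 m_1 + \alpha_2 m_2$ concrete, the following sketch evaluates it for a given subset and finds the optimum by exhaustive search over all subsets of size $k$. It is a brute-force illustration on a toy graph, with hypothetical function names, and is unrelated to the parameterized algorithms developed in this paper.

```python
from itertools import combinations

def edge_counts(edges, U):
    """m1: edges with both endpoints in U; m2: edges with exactly one endpoint in U."""
    m1 = sum(1 for a, b in edges if a in U and b in U)
    m2 = sum(1 for a, b in edges if (a in U) != (b in U))
    return m1, m2

def fgpp_value(edges, U, alpha1, alpha2):
    m1, m2 = edge_counts(edges, U)
    return alpha1 * m1 + alpha2 * m2

def brute_force_fgpp(vertices, edges, k, alpha1, alpha2, maximize):
    """Optimal objective value over all k-subsets (exponential time)."""
    pick = max if maximize else min
    return pick(fgpp_value(edges, set(U), alpha1, alpha2)
                for U in combinations(vertices, k))

# Path 0 - 1 - 2 - 3 with k = 2.
V, E_path = [0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)]
max_cut = brute_force_fgpp(V, E_path, 2, 0, 1, maximize=True)        # Max (k, n-k)-Cut
min_vc = brute_force_fgpp(V, E_path, 2, 1, 1, maximize=False)        # Min k-Vertex Cover
densest = brute_force_fgpp(V, E_path, 2, 1, 0, maximize=True)        # k-Densest Subgraph
sparsest = brute_force_fgpp(V, E_path, 2, 1, 0, maximize=False)      # k-Sparsest Subgraph
```

The four calls differ only in the pair $(\alpha_1, \alpha_2)$ and in the optimization direction, which is exactly how the problems in this class are distinguished.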

A parameterized algorithm with parameter $k$ has running time $O^*(f(k))$ for some function $f$, where $O^*$ hides factors polynomial in the input size. In this paper, we develop a parameterized algorithm with parameter $(k + \Delta)$ for the class of all FGPPs, where $\Delta \ge 1$ is the maximum degree in the graph G. For certain subclasses of FGPPs, we develop algorithms parameterized by $p$, or by $(k + p)$.

© Springer International Publishing Switzerland 2014. D. Kratsch and I. Todinca (Eds.): WG 2014, LNCS 8747, pp. 384–395, 2014. DOI: 10.1007/978-3-319-12340-0_32

Related Work: Parameterized by $k$, Max and Min (k, n − k)-Cut, and Max and Min k-Vertex Cover are W[1]-hard [4,8,11]. Moreover, k-Clique and k-Independent Set, two well-known W[1]-hard problems [9], are special cases of k-Densest Subgraph (where $p = k(k-1)/2$), and k-Sparsest Subgraph (where $p = 0$), respectively. Therefore, parameterized by $(k + p)$, k-Densest Subgraph and k-Sparsest Subgraph are W[1]-hard. Cai et al. [5] and Bonnet et al. [2] studied the parameterized complexity of FGPPs with respect to $(k + \Delta)$. The paper [5] gives $O^*(2^{(k+1)\Delta})$ time algorithms for k-Densest Subgraph and k-Sparsest Subgraph. This result was recently improved in [2] to $O^*(\Delta^k)$ for degrading FGPPs. This subclass contains max-FGPPs in which $\alpha_1/2 \le \alpha_2$, and min-FGPPs in which $\alpha_1/2 \ge \alpha_2$.¹ The authors of [2] also proposed an $O^*(k^{2k} \Delta^{2k})$ time algorithm for all FGPPs, and posed as an open question the existence of constants $a$ and $b$ such that any FGPP can be solved in time $O^*(a^k \Delta^{bk})$. In this paper we answer this question affirmatively, by developing an $O^*(4^{k+o(k)} \Delta^k)$ time algorithm for any FGPP.

Parameterized by $p$, Max k-Vertex Cover can be solved in time $O^*(1.396^p)$, and in randomized time $O^*(1.2993^p)$ [14]. Kneis et al. [14] also show (implicitly) that Min k-Vertex Cover can be solved in time $O^*(4^p)$, and in randomized time $O^*(3^p)$. Moreover, by solving any degrading FGPP in time $O^*(\Delta^k)$, Bonnet et al. [2] prove that Max (k, n − k)-Cut can be solved in time $O^*(p^p)$. Recently, Cygan et al. [6] showed that Min (k, n − k)-Cut is also fixed-parameter tractable with respect to $p$. Parameterized by $(k + p)$, Min (k, n − k)-Cut can be solved in time $O^*(k^{2k} (k+p)^{2k})$ [2].

We note that the parameterized complexity of FGPPs has also been studied with respect to other parameters, such as the treewidth and the vertex cover number of G (see, e.g., [2,3,13]).

Contribution: Our main result is an $O^*(4^{k+o(k)}\Delta^k)$ time algorithm for the class of all FGPPs, answering affirmatively the question posed by Bonnet et al. [2] (see Sect. 2). In Sect. 3, we develop an $O^*(4^{p+o(p)})$ time algorithm for Max (k, n − k)-Cut, which significantly improves the $O^*(p^p)$ running time obtained in [2]. We also present (in Sect. 4) an $O^*(2^{k+\frac{p}{\alpha_2}+o(k+p)})$ time algorithm for the subclass of positive min-FGPPs, in which $\alpha_1 \ge 0$ and $\alpha_2 > 0$. Finally, we develop (in Sect. 4) a faster algorithm for non-degrading positive min-FGPPs (i.e., min-FGPPs satisfying $\alpha_2 \ge \alpha_1/2 > 0$). This yields an $O^*(2^{p+o(p)})$ time algorithm for Min k-Vertex Cover, improving the previous randomized $O^*(3^p)$ time algorithm. Note that all of our algorithms are deterministic.

Due to space constraints, proofs of the results given in Sect. 4 are omitted. We give the full details in [17].

1 A max-FGPP (min-FGPP) is non-degrading if α1/2 ≥ α2 (α1/2 ≤ α2).



Techniques: We obtain our main result by establishing an interesting reduction from non-degrading FGPPs to the Weighted k-Exact Cover (k-WEC) problem (see Sect. 2). Building on this reduction, combined with an algorithm for degrading FGPPs of [2], and an algorithm given in [19] for k-WEC, we develop an algorithm for any FGPP. To improve the running time of our algorithm, we use a fast construction of representative families [10,18].

In designing algorithms for FGPPs, parameterized by p or (k + p), we use as a key tool randomized separation [5]. Roughly speaking, randomized separation finds a 'good' partition of the nodes in the input graph G via randomized coloring of the nodes in red or blue. If a solution exists, then, with some positive probability, there is a red colored node-set X that is a solution, such that all of the neighbors of nodes in X that are outside X are colored blue. Our algorithm for Max (k, n − k)-Cut makes non-standard use of randomized separation, in requiring that only some of the neighbors outside X of nodes in X are blue. This yields the desired improvement in the running time of the algorithm.

Our algorithm for non-degrading positive FGPPs is based on a somewhat different application of randomized separation, in which we randomly color edges rather than nodes. If a solution exists, then with some positive probability, there is a node-set X that is a solution, such that some edges between nodes in X are red, and all of the edges connecting nodes in X and nodes outside X are blue. In particular, we require that the subgraph induced by X, and the subgraph induced by X from which we delete all blue edges, contain the same connected components. We derandomize our algorithms using universal sets [16].

Notation: Given a graph G = (V,E) and a subset X ⊆ V, we denote by E(X) the set of edges in E having both endpoints in X, and by E(X, V \ X) the set of edges having exactly one endpoint in X. Also, let $\mathrm{val}(X) = \alpha_1|E(X)| + \alpha_2|E(X, V \setminus X)|$.
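As a concrete reading of this notation, the following minimal Python sketch evaluates val(X) on an edge list. The function name `val`, the edge-list representation and the example path are illustrative assumptions, not part of the paper. The three calls use the standard coefficient choices for the cut objective (α1 = 0, α2 = 1), the induced-edge objective (α1 = 1, α2 = 0) and the covered-edge objective (α1 = α2 = 1).

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def val(edges: Iterable[Edge], X: Set[Hashable], alpha1: float, alpha2: float) -> float:
    """Return alpha1*|E(X)| + alpha2*|E(X, V \\ X)| for an undirected simple graph."""
    inside = 0  # edges with both endpoints in X
    cut = 0     # edges with exactly one endpoint in X
    for u, v in edges:
        in_u, in_v = u in X, v in X
        if in_u and in_v:
            inside += 1
        elif in_u or in_v:
            cut += 1
    return alpha1 * inside + alpha2 * cut


# Hypothetical example: the path a-b-c-d with X = {b, c}.
path = [("a", "b"), ("b", "c"), ("c", "d")]
print(val(path, {"b", "c"}, alpha1=0, alpha2=1))  # edges leaving X
print(val(path, {"b", "c"}, alpha1=1, alpha2=0))  # edges inside X
print(val(path, {"b", "c"}, alpha1=1, alpha2=1))  # edges touching X
```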

2 Solving FGPPs in Time $O^*(4^{k+o(k)}\Delta^k)$

In this section we develop an $O^*(4^{k+o(k)}\Delta^k)$ time algorithm for the class of all FGPPs. We proceed in the following steps. In Sect. 2.1 we show that any non-degrading FGPP can be reduced to the Weighted k-Exact Cover (k-WEC) problem. Applying this reduction, we then show (in Sect. 2.2) how to decrease the size of instances of k-WEC, by using representative families. Finally, we show (in Sect. 2.3) how to solve any FGPP by using the results in Sects. 2.1 and 2.2, an algorithm given in [19] for k-WEC, and an algorithm of [2] for degrading FGPPs.

2.1 From Non-degrading FGPPs to k-WEC

We show below that any non-degrading max-FGPP can be reduced to the maximization version of k-WEC. Given a universe U, a family $\mathcal{S}$ of nonempty subsets of U, a function $w : \mathcal{S} \to \mathbb{R}$, and parameters $k \in \mathbb{N}$ and $p \in \mathbb{R}$, we seek a subfamily $\mathcal{S}'$ of disjoint sets from $\mathcal{S}$ satisfying $|\bigcup \mathcal{S}'| = k$ whose value, given by $\sum_{S \in \mathcal{S}'} w(S)$, is at least p. Any non-degrading min-FGPP can be similarly reduced to the minimization version of k-WEC.

Fig. 1. An illustration of the reduction f, given in Sect. 2.1.

Let Π be a max-FGPP satisfying $\alpha_1/2 \ge \alpha_2$. Given an instance I = (G = (V,E), k, p) of Π, we define an instance $f(I) = (U, \mathcal{S}, w, k, p)$ of the maximization version of k-WEC as follows.

– U = V
– $\mathcal{S} = \bigcup_{i=1}^{k} \mathcal{S}_i$, where $\mathcal{S}_i$ contains the node-set of any connected subgraph of G on exactly i nodes
– $\forall S \in \mathcal{S} : w(S) = \mathrm{val}(S)$

Note that k and p have the same values in both instances. We illustrate the reduction f in Fig. 1. First, we prove that our reduction is valid.
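The reduction f can be sketched in Python as follows. This is only an illustration under stated assumptions: the graph is a dictionary mapping each node to the set of its neighbors, the helper names (`connected_subsets`, `val`, `reduce_to_wec`) are hypothetical, and the connected node-sets are enumerated by naive growth rather than by the enumeration procedure of [15] that the paper relies on.

```python
def connected_subsets(adj, k):
    """Node sets of size 1..k that induce a connected subgraph (naive growth)."""
    level = {frozenset([v]) for v in adj}
    found = set(level)
    for _ in range(k - 1):
        nxt = set()
        for S in level:
            for v in S:
                for u in adj[v]:
                    if u not in S:
                        # adding a neighbor keeps the set connected
                        nxt.add(S | {u})
        found |= nxt
        level = nxt
    return found


def val(adj, X, alpha1, alpha2):
    """alpha1*|E(X)| + alpha2*|E(X, V \\ X)| on an adjacency-set graph."""
    inside = sum(1 for v in X for u in adj[v] if u in X) // 2  # each inner edge seen twice
    cut = sum(1 for v in X for u in adj[v] if u not in X)
    return alpha1 * inside + alpha2 * cut


def reduce_to_wec(adj, k, p, alpha1, alpha2):
    """Build f(I) = (U, S, w, k, p) from the instance I = (G, k, p)."""
    U = set(adj)
    S = connected_subsets(adj, k)
    w = {X: val(adj, X, alpha1, alpha2) for X in S}
    return U, S, w, k, p


# Hypothetical example: the path a-b-c-d with k = 2.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
U, S, w, k, p = reduce_to_wec(adj, 2, 3, alpha1=1, alpha2=1)
print(len(S), w[frozenset({"b", "c"})])
```

The growth step is sufficient because every connected node-set on i + 1 nodes contains a node whose removal leaves a connected set on i nodes.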

Lemma 1. I is a yes-instance iff f(I) is a yes-instance.

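Proof. Assume first that there is a subset X ⊆ V of size k satisfying val(X) ≥ p. Let $G_1 = (V_1, E_1), \ldots, G_t = (V_t, E_t)$, for some 1 ≤ t ≤ k, be the maximal connected components in the subgraph of G induced by X. Then, for all $1 \le \ell \le t$, $V_\ell \in \mathcal{S}$. Moreover,

\[ \sum_{\ell=1}^{t} |V_\ell| = |X| = k, \quad \text{and} \quad \sum_{\ell=1}^{t} w(V_\ell) = \mathrm{val}(X) \ge p. \]

Now, assume there is a subfamily of disjoint sets $\{S_1, \ldots, S_t\} \subseteq \mathcal{S}$, for some 1 ≤ t ≤ k, such that $\sum_{\ell=1}^{t} |S_\ell| = k$ and $\sum_{\ell=1}^{t} w(S_\ell) \ge p$. Thus, there are connected subgraphs $G_1 = (V_1, E_1), \ldots, G_t = (V_t, E_t)$ of G, such that $V_\ell = S_\ell$, for all $1 \le \ell \le t$. Let $X_\ell = \bigcup_{j=\ell}^{t} V_j$, for all $1 \le \ell \le t$. Clearly, $|X_1| = k$. Since $\alpha_1/2 \ge \alpha_2$, we get that

\begin{align*}
\mathrm{val}(X_1) &= \mathrm{val}(V_1) + \mathrm{val}(X_2) + \alpha_1|E(V_1, X_2)| - 2\alpha_2|E(V_1, X_2)| \\
&\ge \mathrm{val}(V_1) + \mathrm{val}(X_2) \\
&= \mathrm{val}(V_1) + \mathrm{val}(V_2) + \mathrm{val}(X_3) + \alpha_1|E(V_2, X_3)| - 2\alpha_2|E(V_2, X_3)| \\
&\ge \mathrm{val}(V_1) + \mathrm{val}(V_2) + \mathrm{val}(X_3) \\
&\;\;\vdots \\
&\ge \sum_{\ell=1}^{t} \mathrm{val}(V_\ell).
\end{align*}

Thus, $\mathrm{val}(X_1) \ge \sum_{\ell=1}^{t} w(V_\ell) \ge p$. □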

We now bound the number of connected subgraphs in G.

Lemma 2 ([15]). There are at most $4^i(\Delta-1)^i|V|$ connected subgraphs of G on at most i nodes, which can be enumerated in time $O(4^i(\Delta-1)^i(|V|+|E|)|V|)$.

Hence, we have the next result.

Lemma 3. The instance f(I) can be constructed in time $O(4^k(\Delta-1)^k(|V|+|E|)|V|)$. Moreover, for any 1 ≤ i ≤ k, $|\mathcal{S}_i| \le 4^i(\Delta-1)^i|V|$.

2.2 Decreasing the Size of Inputs for k-WEC

In this section we develop a procedure, called Decrease, which compacts an instance $(U, \mathcal{S}, w, k, p)$ of k-WEC. Note that we do not need this procedure to resolve the question posed by Bonnet et al. [2]. Rather, we use it to improve the running time of our algorithm, from $O^*(11.404^k\Delta^k)$ to the desired $O^*(4^{k+o(k)}\Delta^k)$. To this end, we find a subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ that contains "enough" sets from $\mathcal{S}$, and thus allows us to replace $\mathcal{S}$ by $\widehat{\mathcal{S}}$ without turning a yes-instance into a no-instance. The following definition captures such a subfamily $\widehat{\mathcal{S}}$.

Definition 1. Given a universe E, nonnegative integers k and r, a family $\mathcal{S}$ of subsets of size r of E, and a function $w : \mathcal{S} \to \mathbb{R}$, we say that a subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ max (min) represents $\mathcal{S}$ if for any pair of sets $X \in \mathcal{S}$, and $Y \subseteq E \setminus X$ such that $|Y| \le k - r$, there is a set $\widehat{X} \in \widehat{\mathcal{S}}$ disjoint from Y such that $w(\widehat{X}) \ge w(X)$ ($w(\widehat{X}) \le w(X)$).
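To make the definition tangible, the following brute-force Python check tests whether a candidate subfamily max represents a family. It is a sketch for tiny inputs only (it enumerates every admissible Y), and the function name `max_represents` and the toy weights are assumptions introduced for illustration.

```python
from itertools import combinations


def max_represents(E, k, r, family, sub, w):
    """Brute-force test of the max version of the definition (exponential time)."""
    sub = list(sub)
    for X in family:
        rest = list(E - X)
        for size in range(min(k - r, len(rest)) + 1):
            for Y in map(set, combinations(rest, size)):
                # some kept set must avoid Y and weigh at least as much as X
                if not any(Xh.isdisjoint(Y) and w[Xh] >= w[X] for Xh in sub):
                    return False
    return True


# Hypothetical example: 2-subsets of {1, 2, 3, 4} with k = 3 (so |Y| <= 1).
E = {1, 2, 3, 4}
s12, s13, s34, s24 = (frozenset(s) for s in ({1, 2}, {1, 3}, {3, 4}, {2, 4}))
w = {s12: 5, s13: 4, s34: 3, s24: 1}
family = [s12, s13, s34, s24]
print(max_represents(E, 3, 2, family, [s12, s13, s34], w))
print(max_represents(E, 3, 2, family, [s12, s34], w))
```

In this toy family, dropping {2, 4} is safe because a heavier kept set avoids every single-element Y, whereas dropping {1, 3} is not: for Y = {2} the only kept set avoiding Y is {3, 4}, which is lighter.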

The next result implies that small representative families can be computed efficiently.²

Theorem 1 ([18]). Given a constant c ≥ 1, a universe E, nonnegative integers k and r, a family $\mathcal{S}$ of subsets of size r of E, and a function $w : \mathcal{S} \to \mathbb{R}$, a subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ of size at most $\frac{(ck)^k}{r^r(ck-r)^{k-r}} 2^{o(k)} \log|E|$ that max (min) represents $\mathcal{S}$ can be computed in time $O(|\mathcal{S}|(ck/(ck-r))^{k-r} 2^{o(k)} \log|E| + |\mathcal{S}| \log|\mathcal{S}|)$.

2 This result builds on a powerful construction technique for representative families presented in [10].



Now, consider the maximization version of k-WEC and max representative families. (The minimization version of k-WEC can be similarly handled by using min representative families.) Let $\mathrm{RepAlg}(E, k, r, \mathcal{S}, w)$ denote the algorithm in Theorem 1 with c = 2, and let $\mathcal{S}_i = \{S \in \mathcal{S} : |S| = i\}$, for all 1 ≤ i ≤ k.

We present below procedure Decrease, which replaces each family $\mathcal{S}_i$ by a family $\widehat{\mathcal{S}}_i \subseteq \mathcal{S}_i$ that represents $\mathcal{S}_i$.

Procedure Decrease$(U, \mathcal{S}, w, k, p)$

1: for i = 1, 2, . . . , k do $\widehat{\mathcal{S}}_i \Leftarrow \mathrm{RepAlg}(U, k, i, \mathcal{S}_i, w)$. end for
2: $\widehat{\mathcal{S}} \Leftarrow \bigcup_{i=1}^{k} \widehat{\mathcal{S}}_i$.
3: return $(U, \widehat{\mathcal{S}}, w, k, p)$.
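The structure of Decrease can be mirrored in a few lines of Python. The sketch below is not the algorithm of Theorem 1: `naive_rep` is a hypothetical, exponential-time greedy stand-in for RepAlg that only illustrates the contract (it returns some max representative subfamily, with no size or running-time guarantee), and all names are assumptions.

```python
from itertools import combinations


def naive_rep(U, k, r, family, w):
    """Greedy stand-in for RepAlg: scan sets by decreasing weight and keep X only
    if some maximal Y avoiding X intersects every set kept so far."""
    kept = []
    for X in sorted(family, key=lambda S: -w[S]):
        rest = list(U - X)
        size = min(k - r, len(rest))
        needed = any(
            all(not K.isdisjoint(Y) for K in kept)
            for Y in map(set, combinations(rest, size))
        )
        if needed:
            kept.append(X)
    return set(kept)


def decrease(U, S, w, k, p):
    """Replace each size class S_i by a (max) representative subfamily."""
    reduced = set()
    for i in range(1, k + 1):
        S_i = {X for X in S if len(X) == i}
        reduced |= naive_rep(U, k, i, S_i, w)
    return U, reduced, w, k, p


# Hypothetical example: four 2-subsets of {1, 2, 3, 4}, k = 3.
U = {1, 2, 3, 4}
s12, s13, s34, s24 = (frozenset(s) for s in ({1, 2}, {1, 3}, {3, 4}, {2, 4}))
w = {s12: 5, s13: 4, s34: 3, s24: 1}
_, reduced, _, _, _ = decrease(U, {s12, s13, s34, s24}, w, 3, 0)
print(sorted(sorted(X) for X in reduced))
```

Checking only maximal sets Y suffices in this stand-in, since a kept set that is disjoint from Y is also disjoint from every subset of Y.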

In the following, we prove that procedure Decrease is correct.

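Lemma 4. $(U, \mathcal{S}, w, k, p)$ is a yes-instance iff $(U, \widehat{\mathcal{S}}, w, k, p)$ is a yes-instance.

Proof. First, assume that $(U, \mathcal{S}, w, k, p)$ is a yes-instance. Let $\mathcal{S}'$ be a subfamily of disjoint sets from $\mathcal{S}$, such that $|\bigcup \mathcal{S}'| = k$, $\sum_{S \in \mathcal{S}'} w(S) \ge p$, and there is no subfamily $\mathcal{S}''$ satisfying these conditions, and $|\mathcal{S}' \cap \widehat{\mathcal{S}}| < |\mathcal{S}'' \cap \widehat{\mathcal{S}}|$. Suppose, by way of contradiction, that there is a set $S \in (\mathcal{S}_i \cap \mathcal{S}') \setminus \widehat{\mathcal{S}}$, for some 1 ≤ i ≤ k. By Theorem 1, there is a set $\widehat{S} \in \widehat{\mathcal{S}}_i$ such that $w(\widehat{S}) \ge w(S)$, and $\widehat{S} \cap S' = \emptyset$, for all $S' \in \mathcal{S}' \setminus \{S\}$. Thus, $\mathcal{S}'' = (\mathcal{S}' \setminus \{S\}) \cup \{\widehat{S}\}$ is a solution to $(U, \mathcal{S}, w, k, p)$. Since $|\mathcal{S}' \cap \widehat{\mathcal{S}}| < |\mathcal{S}'' \cap \widehat{\mathcal{S}}|$, this is a contradiction.

Now, assume that $(U, \widehat{\mathcal{S}}, w, k, p)$ is a yes-instance. Since $\widehat{\mathcal{S}} \subseteq \mathcal{S}$, we immediately get that $(U, \mathcal{S}, w, k, p)$ is also a yes-instance. □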

Next, we show that Theorem 1 implies the following.

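Lemma 5. Procedure Decrease runs in time $O\big(\sum_{i=1}^{k} \big(|\mathcal{S}_i| (\frac{2k}{2k-i})^{k-i} 2^{o(k)} \log|U| + |\mathcal{S}_i| \log|\mathcal{S}_i|\big)\big)$. Moreover, $|\widehat{\mathcal{S}}| \le \sum_{i=1}^{k} \frac{(2k)^k}{i^i(2k-i)^{k-i}} 2^{o(k)} \log|U| \le 2.4^{k+o(k)} \log|U|$.

Proof. For any 1 ≤ i ≤ k, Theorem 1 implies that Step 1 of procedure Decrease can be executed in time $O\big(|\mathcal{S}_i| (\frac{2k}{2k-i})^{k-i} 2^{o(k)} \log|U| + |\mathcal{S}_i| \log|\mathcal{S}_i|\big)$.

Moreover, by Theorem 1, $|\widehat{\mathcal{S}}_i| \le \frac{(2k)^k}{i^i(2k-i)^{k-i}} 2^{o(k)} \log|U|$. Denoting i = αk, we have that $|\widehat{\mathcal{S}}_i| \le \big(\frac{2}{\alpha^\alpha(2-\alpha)^{1-\alpha}}\big)^k 2^{o(k)} \log|U|$. The maximum is obtained at α ≈ 0.6465, and therefore $|\widehat{\mathcal{S}}_i| \le 2.4^{k+o(k)} \log|U|$.

Thus, we get the desired upper bounds for $|\widehat{\mathcal{S}}|$ and the running time of procedure Decrease. □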
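The constant in this bound can be sanity-checked numerically. The short Python sketch below evaluates the base 2/(α^α (2−α)^{1−α}) on a grid over (0, 1]; the grid resolution and the names `base`, `best_alpha` and `best_value` are arbitrary illustrative choices.

```python
def base(alpha: float) -> float:
    """Base of the exponential bound on the size of the i-th representative
    family, where i = alpha * k."""
    return 2.0 / (alpha ** alpha * (2.0 - alpha) ** (1.0 - alpha))


# Grid search over alpha in (0, 1] with step 1e-4.
grid = [i / 10000 for i in range(1, 10001)]
best_alpha = max(grid, key=base)
best_value = base(best_alpha)
print(best_alpha, best_value)
```

The maximizer should land close to the value α ≈ 0.6465 stated in the proof, with a maximum slightly below the rounded constant 2.4.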



2.3 An Algorithm for any FGPP

We now present FGPPAlg, an algorithm that solves any FGPP in $O^*(4^{k+o(k)}\Delta^k)$ steps. Let DegAlg(G, k, p) denote the algorithm that solves any degrading FGPP in time $O((\Delta+1)^{k+1}|V|)$, given in [2]. Assuming that all the sets in $\mathcal{S}$ have the same size r, the algorithm in Sect. 5 of [19] solves k-WEC in time $O(2.851^{k(r-1)/r} \cdot |\mathcal{S}| \cdot |U| \log^2|U|)$. This algorithm can be easily modified to solve k-WEC in time $O(2.851^k \cdot |\mathcal{S}| \cdot |U| \log^2|U|)$, which is good enough for our purpose. We denote the modified algorithm by WECAlg.

Let Π be an FGPP with parameters $\alpha_1$ and $\alpha_2$. Assume w.l.o.g. that Δ ≥ 2, otherwise Π is clearly solvable in polynomial time, using a simple dynamic programming-based procedure. We now describe algorithm FGPPAlg (see the pseudocode below). First, if Π is a degrading FGPP, then FGPPAlg solves Π by calling DegAlg. Otherwise, by using the reduction f, FGPPAlg transforms the input into an instance of k-WEC. Then, FGPPAlg compacts the resulting instance by calling the procedure Decrease. Finally, FGPPAlg solves Π by calling WECAlg.

Algorithm 1. FGPPAlg(G = (V,E), k, p)

1: if (Π is a max-FGPP and $\alpha_1/2 \le \alpha_2$) or (Π is a min-FGPP and $\alpha_1/2 \ge \alpha_2$) then
2:   accept iff DegAlg(G, k, p) accepts.
3: end if
4: $(U, \mathcal{S}, w, k, p) \Leftarrow f(G, k, p)$.
5: $(U, \widehat{\mathcal{S}}, w, k, p) \Leftarrow \mathrm{Decrease}(U, \mathcal{S}, w, k, p)$.
6: accept iff $\mathrm{WECAlg}(U, \widehat{\mathcal{S}}, w, k, p)$ accepts.

Theorem 2. Algorithm FGPPAlg solves Π in time $O(4^{k+o(k)}\Delta^k(|V|+|E|)|V|)$.

Proof. The correctness of the algorithm follows immediately from Lemmas 1 and 4, and the correctness of DegAlg and WECAlg.

Note that $2.851^k \cdot 2.4^{k+o(k)} = 6.8424^{k+o(k)} \le 4^{k+o(k)}\Delta^k$. Thus, by Lemmas 3 and 5, and the running times of DegAlg and WECAlg, algorithm FGPPAlg runs in time

\begin{align*}
& O\Big(4^k(\Delta-1)^k(|V|+|E|)|V| + \sum_{i=1}^{k} \Big(4^i(\Delta-1)^i|V| \big(\tfrac{2k}{2k-i}\big)^{k-i} 2^{o(k)} \log|V|\Big) + 2.851^k \cdot 2.4^{k+o(k)}|V| \log^3|V|\Big) \\
&= O\Big(4^{k+o(k)}\Delta^k(|V|+|E|)|V| + 2^{o(k)}|V| \log|V| \Big[\max_{0 \le \alpha \le 1}\Big\{4^\alpha\Delta^\alpha \big(\tfrac{2}{2-\alpha}\big)^{1-\alpha}\Big\}\Big]^k\Big) \\
&= O\big(4^{k+o(k)}\Delta^k(|V|+|E|)|V| + 4^{k+o(k)}\Delta^k|V| \log|V|\big) \\
&= O\big(4^{k+o(k)}\Delta^k(|V|+|E|)|V|\big).
\end{align*}

□



3 Solving Max (k, n − k)-Cut in Time $O^*(4^{p+o(p)})$

We give below an $O^*(4^{p+o(p)})$ time algorithm for Max (k, n − k)-Cut. In Sect. 3.1 we show that it suffices to consider an easier variant of Max (k, n − k)-Cut, that we call NC-Max (k, n − k)-Cut. We solve this variant in Sect. 3.2. Finally, our algorithm for Max (k, n − k)-Cut is given in Sect. 3.3.

3.1 Simplifying Max (k, n − k)-Cut

We first define an easier variant of Max (k, n − k)-Cut. Given a graph G = (V,E), where each node is either red or blue, and positive integers k and p, NC-Max (k, n − k)-Cut asks if there is a subset X ⊆ V of exactly k red nodes and no blue nodes, such that at least p edges in E(X, V \ X) have a blue endpoint.

Given an instance (G, k, p) of Max (k, n − k)-Cut, we perform several iterations of coloring the nodes in G; thus, if (G, k, p) is a yes-instance, we generate at least one yes-instance of NC-Max (k, n − k)-Cut. To determine how to color the nodes in G, we need the following definition of universal sets.

Definition 2. Let F be a set of functions f : {1, 2, . . . , n} → {0, 1}. We say that F is an (n, t)-universal set if, for every subset I ⊆ {1, 2, . . . , n} of size t and a function f′ : I → {0, 1}, there is a function f ∈ F such that, for all i ∈ I, f(i) = f′(i).
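The definition can be checked by brute force on small examples. In the Python sketch below, each function f is stored as a 0/1 tuple whose entry at index i − 1 is f(i); the name `is_universal` and the example families are illustrative assumptions and have nothing to do with the efficient construction of [16].

```python
from itertools import combinations, product


def is_universal(F, n, t):
    """Check whether F (length-n 0/1 tuples) is an (n, t)-universal set."""
    F = list(F)
    for I in combinations(range(n), t):
        # all 2^t patterns must appear when F is restricted to the positions in I
        patterns = {tuple(f[i] for i in I) for f in F}
        if len(patterns) < 2 ** t:
            return False
    return True


# The family of all 2^n functions is trivially (n, t)-universal for every t <= n.
everything = list(product((0, 1), repeat=3))
# The four even-weight vectors of length 3 already form a (3, 2)-universal set.
even_weight = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(is_universal(everything, 3, 3))
print(is_universal(even_weight, 3, 2))
print(is_universal(even_weight, 3, 3))
```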

The following result asserts that small universal sets can be computed efficiently.

Lemma 6 ([16]). There is an algorithm, UniSetAlg, that given a pair of integers (n, t), computes an (n, t)-universal set F of size $2^{t+o(t)} \log n$ in time $O(2^{t+o(t)} n \log n)$.

We now present ColorNodes (see the pseudocode below), a procedure that given an input (G, k, p, q), where (G, k, p) is an instance of Max (k, n − k)-Cut and q = k + p, returns a set of instances of NC-Max (k, n − k)-Cut. Procedure ColorNodes first constructs a (|V|, k + p)-universal set F. For each f ∈ F, ColorNodes generates a colored copy $V^f$ of V. Then, ColorNodes returns a set I, including the resulting instances of NC-Max (k, n − k)-Cut.

The next lemma implies the correctness of procedure ColorNodes.

Lemma 7. An instance (G, k, p) of Max (k, n − k)-Cut is a yes-instance iff ColorNodes(G, k, p, k + p) returns a set I containing at least one yes-instance of NC-Max (k, n − k)-Cut.

Proof. If (G, k, p) is a no-instance of Max (k, n − k)-Cut, then clearly, for any coloring of the nodes in V, we get a no-instance of NC-Max (k, n − k)-Cut.

Next suppose that (G, k, p) is a yes-instance, and let X be a set of k nodes in V such that |E(X, V \ X)| ≥ p. Note that there is a set Y of at most p nodes in V \ X such that |E(X, Y)| ≥ p. Let X′ and Y′ denote the indices of the nodes in X and Y, respectively. Since F is a (|V|, k + p)-universal set, there is a function f ∈ F such that: (1) for all i ∈ X′, f(i) = 0, and (2) for all i ∈ Y′, f(i) = 1. Thus, in $G^f$, the copies of the nodes in X are red, and the copies of the nodes in Y are blue. We get that $(G^f, k, p)$ is a yes-instance of NC-Max (k, n − k)-Cut. □

Procedure ColorNodes(G = (V,E), k, p, q)

1: let $V = \{v_1, v_2, \ldots, v_{|V|}\}$.
2: F ⇐ UniSetAlg(|V|, q).
3: for all f ∈ F do
4:   let $V^f = \{v^f_1, v^f_2, \ldots, v^f_{|V|}\}$, where $v^f_i$ is a copy of $v_i$.
5:   for i = 1, 2, . . . , |V| do
6:     if f(i) = 0 then color $v^f_i$ red. else color $v^f_i$ blue. end if
7:   end for
8: end for
9: return $I = \{(G^f = (V^f, E), k, p) : f \in F\}$.

Furthermore, Lemma 6 immediately implies the following result.

Lemma 8. Procedure ColorNodes runs in time $O(2^{q+o(q)}|V| \log|V|)$, and returns a set I of size $O(2^{q+o(q)} \log|V|)$.

3.2 A Procedure for NC-Max (k, n − k)-Cut

We now present SolveNCMaxCut, a procedure for solving NC-Max (k, n − k)-Cut (see below). Procedure SolveNCMaxCut orders the red nodes in V by the number of their blue neighbors in a non-increasing manner. If there are at least k red nodes, and the number of edges between the first k red nodes and blue nodes is at least p, procedure SolveNCMaxCut accepts, and otherwise rejects.

Procedure SolveNCMaxCut(G = (V,E), k, p)

1: for all red v ∈ V do
2:   compute the number nb(v) of blue neighbors of v in G.
3: end for
4: let $v_1, v_2, \ldots, v_r$, for some 0 ≤ r ≤ |V|, denote the red nodes in V, such that $nb(v_i) \ge nb(v_{i+1})$ for all 1 ≤ i ≤ r − 1.
5: accept iff (r ≥ k and $\sum_{i=1}^{k} nb(v_i) \ge p$).

Clearly, the following result holds.

Lemma 9. Procedure SolveNCMaxCut solves NC-Max (k, n − k)-Cut in time O(|V| log |V| + |E|).
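The greedy procedure translates almost line by line into Python. The sketch below assumes an adjacency-set dictionary and a `color` dictionary with the values "red" and "blue"; these representations and the name `solve_nc_max_cut` are illustrative choices rather than part of the paper.

```python
def solve_nc_max_cut(adj, color, k, p):
    """Accept iff some k red nodes have at least p edges to blue nodes in total."""
    # nb(v) for every red node v, sorted in non-increasing order (Steps 1-4)
    nb = sorted(
        (sum(1 for u in adj[v] if color[u] == "blue")
         for v in adj if color[v] == "red"),
        reverse=True,
    )
    # Step 5: enough red nodes, and the k best ones reach the target p
    return len(nb) >= k and sum(nb[:k]) >= p


# Hypothetical example: a star with a red center c and blue leaves x, y, z.
adj = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
color = {"c": "red", "x": "blue", "y": "blue", "z": "blue"}
print(solve_nc_max_cut(adj, color, 1, 3))
print(solve_nc_max_cut(adj, color, 1, 4))
print(solve_nc_max_cut(adj, color, 2, 1))
```

Taking the k red nodes with the most blue neighbors is optimal here because each red node contributes its blue-neighbor count independently of which other red nodes are chosen.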



3.3 An Algorithm for Max (k, n − k)-Cut

Assume w.l.o.g. that G has no isolated nodes. Our algorithm, MaxCutAlg, for Max (k, n − k)-Cut, proceeds as follows (see below). First, if p < min{k, |V| − k}, then MaxCutAlg accepts, and if |V| − k < k, then MaxCutAlg performs a recursive call with k replaced by |V| − k. Then, MaxCutAlg calls ColorNodes to compute a set of instances of NC-Max (k, n − k)-Cut, and accepts iff SolveNCMaxCut accepts at least one of them.

Algorithm 2. MaxCutAlg(G = (V,E), k, p)

1: if p < min{k, |V| − k} then accept. end if
2: if |V| − k < k then accept iff MaxCutAlg(G, |V| − k, p) accepts. end if
3: I ⇐ ColorNodes(G, k, p, k + p).
4: for all (G′, k′, p′) ∈ I do
5:   if SolveNCMaxCut(G′, k′, p′) accepts then accept. end if
6: end for
7: reject.
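The control flow of MaxCutAlg can be sketched in Python as follows. This is a toy version under a loudly stated assumption: instead of UniSetAlg, it iterates over all 2^|V| colorings (the trivial universal set), so it is exponential in |V| and only suitable for checking small examples. The graph is an adjacency-set dictionary without isolated nodes, and the name `max_cut_alg` is hypothetical.

```python
from itertools import product


def max_cut_alg(adj, k, p):
    """Decide whether some k-node set X has at least p edges leaving it."""
    n = len(adj)
    if p < min(k, n - k):
        return True                        # Step 1 (relies on Lemma 10)
    if n - k < k:
        return max_cut_alg(adj, n - k, p)  # Step 2: switch to the smaller side
    nodes = list(adj)
    # Steps 3-6, with every 0/1 coloring standing in for the universal set
    for f in product((0, 1), repeat=n):
        red = [v for v, bit in zip(nodes, f) if bit == 0]
        blue = {v for v, bit in zip(nodes, f) if bit == 1}
        nb = sorted((len(adj[v] & blue) for v in red), reverse=True)
        if len(red) >= k and sum(nb[:k]) >= p:
            return True
    return False                           # Step 7


# Hypothetical example: the 4-cycle a-b-c-d-a.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(max_cut_alg(cycle, 2, 4))
print(max_cut_alg(cycle, 2, 5))
print(max_cut_alg(cycle, 3, 2))
```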

The next lemma implies the correctness of Step 1 in MaxCutAlg.

Lemma 10 ([2]). In a graph G = (V,E) having no isolated nodes, there is a subset X ⊆ V of size k such that |E(X, V \ X)| ≥ min{k, |V| − k}.

Our main result is the following.

Theorem 3. MaxCutAlg solves Max (k, n − k)-Cut in time $O(4^{p+o(p)}(|V|+|E|) \log^2|V|)$.

Proof. Clearly, (G, k, p) is a yes-instance iff (G, |V| − k, p) is a yes-instance. Thus, Lemmas 7, 9 and 10 immediately imply the correctness of MaxCutAlg.

Denote m = min{k, |V| − k}. If p < m, then MaxCutAlg runs in time O(1). Next suppose that p ≥ m. Then, by Lemmas 8 and 9, MaxCutAlg runs in time $O(2^{m+p+o(m+p)}(|V|+|E|) \log^2|V|) = O(4^{p+o(p)}(|V|+|E|) \log^2|V|)$. □

4 Algorithms for Positive Min-FGPPs

In this section we summarize our results for positive min-FGPPs.

First, we use a standard application of randomized separation to prove the following.

Theorem 4. Any positive min-FGPP can be solved in time $O(2^{k+\frac{p}{\alpha_2}+o(k+p)} \cdot (|V|+|E|) \log|V|)$.



Now, let Π be a non-degrading positive min-FGPP. To solve Π, we use a somewhat different application of randomized separation, in which we randomly color edges rather than nodes. To this end, we define an easier variant of the problem Π, called EC-Π.

In EC-Π, we are given a graph G = (V,E) where each edge is either red or blue, and parameters $k \in \mathbb{N}$ and $p \in \mathbb{R}$. For any subset X ⊆ V, let $\mathcal{C}(X)$ denote the family containing the node-sets of the maximal connected components in the graph $G_r = (X, E_r)$, where $E_r$ is the set of red edges in E having both endpoints in X. Also, let $\mathrm{val}^*(X) = \sum_{C \in \mathcal{C}(X)} \mathrm{val}(C)$. The problem EC-Π asks if there is a subset X ⊆ V of exactly k nodes, such that all of the edges in E(X, V \ X) are blue, and $\mathrm{val}^*(X) \le p$.

To solve Π, we first construct a set I of instances of EC-Π. Then, using a dynamic programming-based procedure, we solve each of the instances in I. We accept iff at least one of the instances in I is a yes-instance.

This approach leads to the following result, where $x = \max\{\frac{p}{\alpha_2}, \min\{\frac{p}{\alpha_1}, \frac{p}{\alpha_2} + (1 - \frac{\alpha_1}{\alpha_2})k\}\}$.

Theorem 5. Any non-degrading positive min-FGPP can be solved in time $O(2^{x+o(x)}(|V|k+|E|) \log|E|)$.

In case $\alpha_1 = \alpha_2 = 1$, we have that x = p. Thus, since Min k-Vertex Cover is a non-degrading positive min-FGPP which satisfies $\alpha_1 = \alpha_2 = 1$, we have the following.

Corollary 1. Min k-Vertex Cover can be solved in time $O(2^{p+o(p)}(|V|k+|E|) \log|E|)$.

Acknowledgment. We thank the anonymous referees for valuable comments and suggestions.

References

1. Berkhin, P.: A survey of clustering data mining techniques. In: Kogan, J., Nicholas, C., Teboulle, M. (eds.) Grouping Multidimensional Data: Recent Advances in Clustering, pp. 25–71. Springer, Heidelberg (2006)

2. Bonnet, E., Escoffier, B., Paschos, V.T., Tourniaire, E.: Multi-parameter complexity analysis for constrained size graph problems: using greediness for parameterization. In: Gutin, G., Szeider, S. (eds.) IPEC 2013. LNCS, vol. 8246, pp. 66–77. Springer, Heidelberg (2013)

3. Bourgeois, N., Giannakos, A., Lucarelli, G., Milis, I., Paschos, V.T.: Exact and approximation algorithms for densest k-subgraph. In: Ghosh, S.K., Tokuyama, T. (eds.) WALCOM 2013. LNCS, vol. 7748, pp. 114–125. Springer, Heidelberg (2013)

4. Cai, L.: Parameterized complexity of cardinality constrained optimization problems. Comput. J. 51(1), 102–121 (2008)

5. Cai, L., Chan, S.M., Chan, S.O.: Random separation: a new method for solving fixed-cardinality optimization problems. In: Bodlaender, H.L., Langston, M.A. (eds.) IWPEC 2006. LNCS, vol. 4169, pp. 239–250. Springer, Heidelberg (2006)

6. Cygan, M., Lokshtanov, D., Pilipczuk, M., Pilipczuk, M., Saurabh, S.: Minimum bisection is fixed parameter tractable. In: STOC, pp. 323–332 (2014)

7. Donavalli, A., Rege, M., Liu, X., Jafari-Khouzani, K.: Low-rank matrix factorization and co-clustering algorithms for analyzing large data sets. In: Kannan, R., Andres, F. (eds.) ICDEM 2010. LNCS, vol. 6411, pp. 272–279. Springer, Heidelberg (2012)

8. Downey, R.G., Estivill-Castro, V., Fellows, M.R., Prieto, E., Rosamond, F.A.: Cutting up is hard to do: the parameterized complexity of k-cut and related problems. Electron. Notes Theor. Comput. Sci. 78, 209–222 (2003)

9. Downey, R.G., Fellows, M.R.: Fixed-parameter tractability and completeness II: on completeness for W[1]. Theor. Comput. Sci. 141(1&2), 109–131 (1995)

10. Fomin, F.V., Lokshtanov, D., Saurabh, S.: Efficient computation of representative sets with applications in parameterized and exact algorithms. In: SODA, pp. 142–151 (2014)

11. Guo, J., Niedermeier, R., Wernicke, S.: Parameterized complexity of vertex cover variants. Theory Comput. Syst. 41(3), 501–520 (2007)

12. Kahng, A.B., Lienig, J., Markov, I.L., Hu, J.: VLSI Physical Design - From Graph Partitioning to Timing Closure. Springer, Netherlands (2011)

13. Kloks, T. (ed.): Treewidth, Computations and Approximations. LNCS, vol. 842. Springer, Heidelberg (1994)

14. Kneis, J., Langer, A., Rossmanith, P.: Improved upper bounds for partial vertex cover. In: Broersma, H., Erlebach, T., Friedetzky, T., Paulusma, D. (eds.) WG 2008. LNCS, vol. 5344, pp. 240–251. Springer, Heidelberg (2008)

15. Komusiewicz, C., Sorge, M.: Finding dense subgraphs of sparse graphs. In: Thilikos, D.M., Woeginger, G.J. (eds.) IPEC 2012. LNCS, vol. 7535, pp. 242–251. Springer, Heidelberg (2012)

16. Naor, M., Schulman, L.J., Srinivasan, A.: Splitters and near-optimal derandomization. In: FOCS, pp. 182–191 (1995)

17. Shachnai, H., Zehavi, M.: Parameterized algorithms for graph partitioning problems. CoRR abs/1403.0099 (2014)

18. Shachnai, H., Zehavi, M.: Representative families: a unified tradeoff-based approach. In: Schulz, A.S., Wagner, D. (eds.) ESA 2014. LNCS, vol. 8737, pp. 786–797. Springer, Heidelberg (2014)

19. Zehavi, M.: Deterministic parameterized algorithms for matching and packing problems. CoRR abs/1311.0484 (2013)


3.7 Deterministic Parameterized Algorithms for the Graph Motif Problem

Ron Y. Pinter, Hadas Shachnai and Meirav Zehavi. Deterministic Parameterized Algorithms for the Graph Motif Problem. In the proc. of the 39th International Symposium on Mathematical Foundations of Computer Science (MFCS), pages 589–600, 2014.



Deterministic Parameterized Algorithms

for the Graph Motif Problem

Ron Y. Pinter, Hadas Shachnai, and Meirav Zehavi

Department of Computer Science, Technion, Haifa 32000, Israel
{pinter,hadas,meizeh}@cs.technion.ac.il

Abstract. We study the classic Graph Motif problem: given a graph G = (V, E) with a set of colors for each node, and a multiset M of colors, we seek a subtree T ⊆ G, and a coloring of the nodes in T, such that T carries exactly (also with respect to multiplicity) the colors in M. Graph Motif plays a central role in the study of pattern matching problems, primarily motivated from the analysis of complex biological networks.

Previous algorithms for Graph Motif and its variants either rely on techniques for developing randomized algorithms that, if derandomized, render them inefficient, or the algebraic narrow sieves technique for which there is no known derandomization. In this paper, we present fast deterministic parameterized algorithms for Graph Motif and its variants. Specifically, we give such an algorithm for the more general Graph Motif with Deletions problem, followed by faster algorithms for Graph Motif and other well-studied special cases. Our algorithms make non-trivial use of representative families, and a novel tool that we call guiding trees, together enabling the efficient construction of the output tree.

1 Introduction

With the advent of network biology and complex network analysis in general, the study of pattern matching problems in graphs has become of major importance [12,16]. Indeed, the term "graph motif" plays a central role in this context, with different node colors used to model different functionalities of the network (see, e.g., [17,7]). Due to the generic nature of the Graph Motif (GM) problem (also known as the Topology-Free Network Query problem), the so-called motif analysis approach has become useful also in the study of social networks (see, e.g., [23] and the references therein).

The GM problem is a natural variant of classic pattern matching problems, where the topology of the pattern M is unknown or of lesser importance. Given a graph G = (V, E) with a set of colors for each node, and a multiset M of colors, we seek a subtree T ⊆ G, and a coloring of the nodes in T, such that T carries exactly (also with respect to multiplicity) the colors in M. We call T an occurrence of M in G. To allow more flexibility in the definition of an occurrence, and since biological network data often contains noise, a generalized version of GM allows deleting colors from M.

Parameterized algorithms solve NP-hard problems by confining the combinatorial explosion to a parameter k. More precisely, a problem is fixed-parameter tractable (FPT) with respect to a parameter k if it can be solved in time $O^*(f(k))$ for some function f, where $O^*$ hides factors polynomial in the input size. Since GM is NP-complete [17], there is a growing body of literature studying its parameterized complexity (see the excellent survey in [26]). In this paper, we present fast deterministic parameterized algorithms for GM and its variants.

Fig. 1. An input for GMD (A), and two possible solutions (B)

E. Csuhaj-Varju et al. (Eds.): MFCS 2014, Part II, LNCS 8635, pp. 589–600, 2014. © Springer-Verlag Berlin Heidelberg 2014

1.1 Problem Statement

The most general variant considered in this paper is Graph Motif with Deletions (GMD): the input is a set of colors C, a multiset M of colors from C, and an undirected graph G = (V, E). The nodes in V are associated with colors via a (set-)coloring $Col : V \to 2^C$. We are also given a parameter k ≤ |M|.

We need to decide if there exists a subtree $T = (V_T, E_T)$ of G on k nodes, and a coloring $col : V_T \to C$ that assigns a color from Col(v) to each node $v \in V_T$, such that

$\forall c \in C : |\{v \in V_T : col(v) = c\}| \le occ(c)$, (1)

where occ(c) is the number of occurrences of a color c in M (see Fig. 1).¹
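The conditions of the problem statement can be verified mechanically for a given candidate. The Python sketch below checks that a proposed pair (T, col) is a valid GMD solution; it only verifies candidates and does not search for them. The data representations (adjacency-set dictionary, a dictionary `Col` of allowed color sets, a list `M`) and the name `is_gmd_solution` are illustrative assumptions.

```python
from collections import Counter


def is_gmd_solution(adj, Col, M, k, tree_edges, col):
    """Check that (tree_edges, col) describes a subtree of G on k nodes whose
    coloring respects Col and uses each color at most occ(c) times."""
    nodes = set(col)
    if len(nodes) != k or len(tree_edges) != k - 1:
        return False
    # every tree edge must be an edge of G between chosen nodes
    if any(u not in nodes or v not in nodes or v not in adj[u] for u, v in tree_edges):
        return False
    # connectivity; together with exactly k - 1 edges this makes T a tree
    t_adj = {v: set() for v in nodes}
    for u, v in tree_edges:
        t_adj[u].add(v)
        t_adj[v].add(u)
    stack, seen = [next(iter(nodes))], set()
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(t_adj[v] - seen)
    if seen != nodes:
        return False
    # each node takes one of its allowed colors
    if any(col[v] not in Col[v] for v in nodes):
        return False
    # inequality (1): no color is used more often than it occurs in M
    occ, used = Counter(M), Counter(col.values())
    return all(used[c] <= occ[c] for c in used)


# Hypothetical example on the path a-b-c-d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
Col = {"a": {"red"}, "b": {"red", "blue"}, "c": {"blue"}, "d": {"red"}}
edges = [("a", "b"), ("b", "c")]
print(is_gmd_solution(adj, Col, ["red", "red", "blue", "blue"], 3, edges,
                      {"a": "red", "b": "red", "c": "blue"}))
```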

Special Cases: Restricted GMD (RGMD) is the special case of GMD where for any node v ∈ V, |Col(v)| = 1. Also, GM and RGM are the special cases of GMD and RGMD, respectively, where deletions are not allowed (i.e., the inequality in (1) is replaced by equality, and k = |M|).

1.2 Known Results and Our Contribution

GMD has received considerable attention since it was introduced by Lacroix et al. [17]. The paper [17] also shows that RGM is NP-hard when M is a set and G is a tree. Even seemingly simpler cases of RGM are known to be NP-hard (see [11,2,8]). Moreover, a natural optimization version of RGMD, minimizing the number of deletions from M, is hard to approximate within factor $|V|^{\frac{1}{3}-\varepsilon}$ [24].

1 Some papers define GMD as a problem where one seeks a connected subgraph S of G, which is equivalent to our definition (simply consider some spanning tree T of S).



On the positive side, using techniques for developing randomized parameterized algorithms, many such algorithms have been obtained for GMD and its variants [3,4,6,7,9,10,14,15,21,22]. Some of these algorithms can be derandomized, resulting, however, in inefficient algorithms. In particular, Fellows et al. [10] gave a deterministic algorithm for RGM that runs in time $O^*(87^k)$, based on a derandomization of the color coding technique [1]. Currently, the best randomized algorithm for GMD runs in time $O^*(2^k)$, due to Bjorklund et al. [6]. This algorithm is based on the narrow sieves technique [5], for which there is no known derandomization. Thus, previous studies left open the existence of a fast deterministic parameterized algorithm for GMD.

In this paper, we present fast deterministic parameterized algorithms for GMD and its variants. In particular, we develop an $O^*(6.86^k)$ time algorithm for GMD, an $O^*(5.22^k)$ time algorithm for GM, and an $O^*(5.18^k)$ time algorithm for RGMD.

Due to space constraints, some of the proofs are omitted. The detailed results appear in [20].

1.3 Techniques

Our algorithms make non-trivial use of representative families, and a novel tool that we call guiding trees, together enabling the efficient construction of the output tree. Informally, a guiding tree is a constant-size rooted tree which provides some structural information about the solution tree. To efficiently compute a family $\mathcal{S}$ of partial solutions, we first construct a polynomial number of suitable guiding trees. We then use these trees to generate $\mathcal{S}$, by combining previously computed families of partial solutions. Thus, we avoid iterating over all $O^*(2^k)$ possible topologies for the solution tree.

The efficiency of our algorithms is further improved via replacement of each family of partial solutions, $\mathcal{S}$, by a subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$, which represents $\mathcal{S}$. Each representative family $\widehat{\mathcal{S}}$ contains enough sets from $\mathcal{S}$; thus, we preserve the correctness of the algorithm while improving its running time.

Building on the powerful technique of Fomin et al. [13], for efficient construction of representative families, we tailor the definitions of these sets to the problem at hand. This also leads to replacing the uniform matroid (often used for fast computation of representative families) by a partition matroid, which captures more closely the restricted variants of GM.

2 Preliminaries

Given a graph H, let $V_H$ and $E_H$ denote its node-set and edge-set, respectively.

Matroids: In deriving our results, we use two types of matroids.² Given a constant k, the first is defined by a pair $M = (E, \mathcal{I})$, where E is an n-element set, and $\mathcal{I} = \{S \subseteq E : |S| \le k\}$. Such a pair is called a uniform matroid, denoted by $U_{n,k}$.

2 For a broader overview of matroids, see, e.g., [19].



Given some constants $\ell$ and $k_1, k_2, \ldots, k_\ell$, the second is defined by a pair $(E, \mathcal{I})$, where E is an n-element set partitioned into disjoint sets $E_1, E_2, \ldots, E_\ell$, and $\mathcal{I} = \{S \subseteq E : |S \cap E_1| \le k_1, |S \cap E_2| \le k_2, \ldots, |S \cap E_\ell| \le k_\ell\}$. Such a pair is called a partition matroid. Note that, when $\ell = 1$, the definitions for the two types of matroids coincide.
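Both matroid types reduce to simple independence tests. The following Python sketch spells them out; the function names and the example partition are assumptions made for illustration.

```python
def uniform_independent(S, k):
    """S is independent in the uniform matroid U_{n,k} iff |S| <= k."""
    return len(S) <= k


def partition_independent(S, parts, caps):
    """S is independent in the partition matroid iff |S ∩ E_i| <= k_i for all i."""
    return all(len(S & E_i) <= k_i for E_i, k_i in zip(parts, caps))


# Hypothetical example: E = {1, ..., 5} split into E_1 = {1, 2, 3}, E_2 = {4, 5}.
parts, caps = [{1, 2, 3}, {4, 5}], [1, 2]
print(partition_independent({1, 4, 5}, parts, caps))  # one from E_1, two from E_2
print(partition_independent({1, 2}, parts, caps))     # two from E_1 exceeds k_1 = 1
# With a single part the two definitions coincide.
print(partition_independent({1, 2}, [{1, 2, 3, 4, 5}], [2]) == uniform_independent({1, 2}, 2))
```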

Representative Families: Given a family $\mathcal{S}$ of sets that are partial solutions, we would like to replace $\mathcal{S}$ by a smaller subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$. If there is a partial solution in $\mathcal{S}$ that can be extended to a solution, it is clearly necessary that there would also be a partial solution in $\widehat{\mathcal{S}}$ that can be extended to a solution. The following definition captures such a family $\widehat{\mathcal{S}}$.

Definition 1. Given a matroid $M = (E, \mathcal{I})$, and a family $\mathcal{S}$ of subsets of size p of E, we say that a subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ q-represents $\mathcal{S}$ if for every pair of sets $X \in \mathcal{S}$, and $Y \subseteq E \setminus X$ such that $|Y| \le q$ and $X \cup Y \in \mathcal{I}$, there is a set $\widehat{X} \in \widehat{\mathcal{S}}$ disjoint from Y such that $\widehat{X} \cup Y \in \mathcal{I}$.

The next two results enable the efficient construction of small representative families.

Theorem 1 ([13,25]). Given a parameter c ≥ 1, a uniform matroid Un,k =

(E, I), and a family S of subsets of size p of E, a family S ⊆ S of size at

most(ck)k

pp(ck − p)k−p2o(k) log n that (k − p)-represents S can be found in time

O(|S|(ck/(ck − p))k−p2o(k) log n).

Theorem 2 ([13,18]). Given constants ℓ, k_1, k_2, . . . , k_ℓ and k ≤ Σ_{i=1}^{ℓ} k_i, a corresponding partition matroid M = (E, I), and a family S of subsets of size p of E, a family Ŝ ⊆ S of size at most

\[ \binom{k}{p} n^{O(1)} \]

that (k − p)-represents S can be found in time

\[ O\!\left( |S| \binom{k}{p}^{w-1} n^{O(1)} \right), \]

where w < 2.3727 is the matrix multiplication exponent [27].
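The role of the parameter c in Theorem 1 can be illustrated numerically. The sketch below evaluates only the dominant terms of the size and time bounds, ignoring the 2^{o(k)} log n factors; the function names are hypothetical.

```python
from math import comb

def uniform_size_term(c, k, p):
    """Dominant term of the size bound in Theorem 1: (ck)^k / (p^p (ck-p)^(k-p))."""
    return (c * k) ** k / (p ** p * (c * k - p) ** (k - p))

def uniform_time_term(c, k, p):
    """Dominant per-set term of the time bound in Theorem 1: (ck/(ck-p))^(k-p)."""
    return (c * k / (c * k - p)) ** (k - p)

# Compare c = 1 with the value c = 1.447 used later by GMD-Alg.
for c in (1.0, 1.447):
    print(c, uniform_size_term(c, 10, 4), uniform_time_term(c, 10, 4), comb(10, 4))
```

Under these simplifications, raising c enlarges the representative family but lowers the per-set cost of computing it, which is the trade-off that the choice c = 1.447 balances in the running-time analysis.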

Let UniRep(c, U_{n,k}, S) and ParRep(k, M, S) be the algorithms implied by Theorems 1 and 2, respectively.

Guiding Trees: Recall that G = (V, E) is the input graph, and let 2 ≤ d ≤ k/2 be a constant (to be determined).³ Given a rooted tree T and a node v ∈ V_T that is not the root of T, let f_T(v) be the father of v in T. Given nodes v, u ∈ V, we say that a tree T rooted at v is a (v, u)-tree if u ∈ V_T. Furthermore, a (v, u)-tree R is a (v, u)-guide if 3 ≤ |V_R| ≤ 2d and V_R ⊆ V (E_R may not be contained in E). Let G_{v,u} be the set of (v, u)-guides. Finally, let T_{v,u,ℓ} be the set of (v, u)-trees on ℓ nodes that, when unrooted, are subtrees of G.

We now define which subtrees of G listen to the instructions of a given guide (see Fig. 2).

3 The choice of d concerns the analysis of the running times of our algorithms.



[Figure 2: three panels showing the graph G, the tree T, and the guide R.]

Fig. 2. A (v, u)-tree T , and a (v, u)-guide R, where d = 3, k = 12, and T listens to R

Definition 2. Given v, u ∈ V and ℓ ≤ k, we say that T ∈ T_{v,u,ℓ} listens to R ∈ G_{v,u} if the following two conditions are fulfilled.

1. ∀v′, u′ ∈ V_R : v′ is an ancestor of u′ in R iff v′ is an ancestor of u′ in T.
2. For each tree X in the forest obtained by removing V_R from T, let N_X = {v′ ∈ V_R : {v′, u′} ∈ E_T for some u′ ∈ V_X}. Then, |N_X| ≤ 2, and [N_X = {v} → (|V_X ∪ N_X| ≤ k/d)].

The next lemma, which asserts that none of the subtrees of G relevant to solving GMD is completely undisciplined, is implicit in [13].

Lemma 3. For any rooted tree T ∈ T_{v,u,ℓ}, where v, u ∈ V and 3 ≤ ℓ ≤ k, there exists R ∈ G_{v,u} to whom T listens.

Feasible Colorings: Given U ⊆ V, we say that a coloring col : U → C is feasible if [∀v ∈ U : col(v) ∈ Col(v)] and [∀c ∈ C : |{v ∈ U : col(v) = c}| ≤ occ(c)]. Denote by ima(col) the image of col.
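The two conditions of a feasible coloring translate directly into a short check. The sketch below assumes a dictionary encoding of col, Col and occ; the function name `is_feasible` is hypothetical.

```python
from collections import Counter

def is_feasible(col, Col, occ):
    """Check the two feasibility conditions for a coloring.

    col: dict node -> chosen color (its keys play the role of U).
    Col: dict node -> set of colors allowed for that node.
    occ: dict color -> maximum number of times the color may be used.
    """
    # Every node must receive one of its allowed colors.
    if any(col[v] not in Col[v] for v in col):
        return False
    # No color may be used more often than occ permits.
    used = Counter(col.values())
    return all(used[c] <= occ.get(c, 0) for c in used)
```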

3 An Algorithm for GMD

In this section we solve GMD in time O∗(6.86^k). Since in GMD each node is assigned a set of colors whose size can be greater than 1, we may assume w.l.o.g. that M is a set equal to C (a formal proof is given, e.g., in [22]).

The main idea of the algorithm is to iterate over all pairs of nodes v, u ∈ V, and all values 1 ≤ ℓ ≤ k. When we reach such v, u and ℓ, we have already computed, for all v′, u′ ∈ V and 1 ≤ ℓ′ < ℓ, representative families for families of corresponding “partial solutions”. Each such partial solution is a union of a set A containing exactly ℓ′ nodes, and a set B containing exactly ℓ′ colors. The sets A and B correspond to a pair of a rooted tree T ∈ T_{v′,u′,ℓ′} satisfying A = V_T, and a feasible coloring col : A → B.

To compute a family of partial solutions corresponding to v, u and ℓ, we iterate over all (v, u)-guides in G_{v,u}. We follow the instructions of the current guide R by using another, internal dynamic programming-based computation. At each stage of this computation, we have a family of partial solutions listening to a certain subtree of R. We unite these partial solutions with other small partial solutions, according to the instructions of R, thus efficiently constructing a family of partial solutions listening to a greater subtree of R. For this family, we compute a smaller representative family, so that the following stage can be executed efficiently. After iterating over all relevant guides, we find a family representing the union of the families returned by the internal dynamic programming-based computations. This family includes enough, but not too many, partial solutions corresponding to v, u and ℓ, which ensures the correctness of the algorithm.

3.1 The Algorithm

We now describe GMD-Alg, our algorithm for GMD (see the pseudocode below). GMD-Alg first generates a matrix M, where each entry [v, u, c_v, c_u, ℓ] holds a family that represents Sol_{v,u,c_v,c_u,ℓ}, the family of every set (X ∪ Y) satisfying |X| = |Y| = ℓ, for which there exist T ∈ T_{v,u,ℓ} such that X = V_T, and a feasible col : X → Y satisfying col(v) = c_v and col(u) = c_u.

Algorithm 1. GMD-Alg(C, G = (V, E), Col, k)

 1. let M be a matrix that has an entry [v, u, c_v, c_u, ℓ] for all v, u ∈ V, c_v ∈ Col(v), c_u ∈ Col(u), and 1 ≤ ℓ ≤ k, initialized to ∅.
 2. M[v, v, c, c, 1] ⇐ {{v, c}} for all v ∈ V and c ∈ Col(v).
 3. M[v, v, c, c, 2] ⇐ {{v, u, c, c′} : {v, u} ∈ E, c′ ∈ Col(u) \ {c}} for all v ∈ V and c ∈ Col(v).
 4. M[v, u, c, c′, 2] ⇐ {{v, u, c, c′}} for all {v, u} ∈ E, c ∈ Col(v) and c′ ∈ Col(u) \ {c}.
 5. for all v, u ∈ V, c_v ∈ Col(v), c_u ∈ Col(u), and ℓ = 3, . . . , k do
 6.     let N be a matrix that has an entry [R, col_R] for all R ∈ G_{v,u}, and feasible col_R : V_R → C satisfying col_R(v) = c_v and col_R(u) = c_u, initialized to ∅.
 7.     for all [R, col_R] ∈ N do
 8.         let w_1, . . . , w_{|V_R|} be a preorder on V_R, where w_1 = v.
 9.         let L be a matrix that has an entry [i, ℓ′] for all 1 ≤ i ≤ |V_R| and 1 ≤ ℓ′ ≤ ℓ, initialized to ∅.
10.         L[1, ℓ′] ⇐ M[v, v, c_v, c_v, ℓ′] for all 1 ≤ ℓ′ < ℓ.
11.         for i = 2, . . . , |V_R|, and ℓ′ = 2, . . . , ℓ do
12.             let 𝒜 include all sets (U ∪ W) for which there exists 2 ≤ ℓ′′ ≤ min{ℓ′, ℓ − 1, k/d} satisfying (1) or (2):
                (1) U ∩ W = {f_R(w_i), col_R(f_R(w_i))}, U ∈ M[f_R(w_i), w_i, col_R(f_R(w_i)), col_R(w_i), ℓ′′] and W ∈ L[i − 1, ℓ′ − ℓ′′ + 1].
                (2) U ∩ W = {w_i, col_R(w_i)}, U ∈ M[w_i, w_i, col_R(w_i), col_R(w_i), ℓ′′] and W ∈ L[i, ℓ′ − ℓ′′ + 1].
13.             L[i, ℓ′] ⇐ UniRep(1.447, U_{|V|+|C|,2k}, 𝒜).
14.         end for
15.         N[R, col_R] ⇐ L[|V_R|, ℓ].
16.     end for
17.     M[v, u, c_v, c_u, ℓ] ⇐ UniRep(1.447, U_{|V|+|C|,2k}, ⋃_{[R,col_R]∈N} N[R, col_R]).
18. end for
19. accept iff (⋃_{v∈V, c_v∈Col(v)} M[v, v, c_v, c_v, k]) ≠ ∅.



Then, in Steps 2–4, GMD-Alg computes all “basic” entries of M, i.e., entries of the form [v, u, c_v, c_u, ℓ], where ℓ ≤ 2. Next, in Step 5, GMD-Alg iterates over all values v, u, c_v, c_u and ℓ that define an entry of M that is not basic, in an order that guarantees that when we reach an entry [$] of M, we have already computed entries of M that are relevant to [$]. Now, consider a specific iteration of Step 5, and note that the goal of this iteration is to compute M[v, u, c_v, c_u, ℓ].

GMD-Alg, in Step 6, generates a matrix N. Each entry [R, col_R] holds a family that represents a subfamily of Sol_{v,u,c_v,c_u,ℓ}. A set (X ∪ Y) ∈ Sol_{v,u,c_v,c_u,ℓ} belongs to this subfamily if its corresponding (v, u)-tree T ∈ T_{v,u,ℓ} and feasible coloring col also satisfy the requirements that T listens to R, and col colors the nodes in V_R exactly as col_R colors them. Now, consider a specific iteration of Step 7, and note that the goal of this iteration is to compute N[R, col_R]. To this end, GMD-Alg executes an internal dynamic programming-based computation, which takes place in Steps 9–14.

First, in Step 9, GMD-Alg generates a matrix L. Almost every entry [i, ℓ′] holds a family that represents Sol_{i,ℓ′},⁴ the family including every set (X ∪ Y) satisfying |X| = |Y| = ℓ′, for which there exist a (v, w_i)-tree T ∈ T_{v,w_i,ℓ′} and a feasible coloring col : X → Y, satisfying the following conditions. The subtree T listens to the subtree of R induced by {w_1, . . . , w_i}, X = V_T, and col colors the nodes in {w_1, . . . , w_i} exactly as col_R colors them. Note that the subgraph of R induced by {w_1, . . . , w_i} is a tree because of the preorder defined in Step 8. Then, in Step 10, GMD-Alg computes all “basic” entries of L, i.e., entries of the form [1, ℓ′]. Next, in Step 11, GMD-Alg iterates over all values i and ℓ′ that define an entry of L that is not basic, in an order that guarantees that when we reach an entry [$] of L, we have already computed entries of L that are relevant to [$]. Now, consider a specific iteration of Step 11, and note that the goal of this iteration is to compute L[i, ℓ′].

GMD-Alg, in Step 12, computes a family 𝒜 that represents Sol_{i,ℓ′}. The computation involves uniting sets U, found in previous stages of the external dynamic programming-based computation (i.e., U belongs to an entry of M), with sets W, found in previous stages of the internal dynamic programming-based computation (i.e., W belongs to an entry of L). It is easy to verify that the restrictions posed on the choices of U and W guarantee that their union indeed belongs to Sol_{i,ℓ′}, noting the following observations. The restriction ℓ′′ ≤ k/d concerns Condition 2 in Definition 2, whose relevance follows from the requirement of existence of a (v, w_i)-tree T as defined above. The first line in each of the options (1) and (2) ensures that we do not use any node or color more than once. The other line of option (1) ensures that U ∈ Sol_{f_R(w_i),w_i,col_R(f_R(w_i)),col_R(w_i),ℓ′′} and W ∈ Sol_{i−1,ℓ′−ℓ′′+1}, and the other line of option (2) ensures that U ∈ Sol_{w_i,w_i,col_R(w_i),col_R(w_i),ℓ′′} and W ∈ Sol_{i,ℓ′−ℓ′′+1}.
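The uniting operation of Step 12 can be sketched as a filtered pairwise union. The helper below is an illustration only: the name `join` and the representation of partial solutions as frozensets mixing node and color identifiers are assumptions of this example.

```python
def join(family_u, family_w, shared):
    """Unite every U in family_u with every W in family_w whose
    intersection is exactly `shared` (a node together with its color).

    Requiring U ∩ W == shared guarantees that no other node or color
    is used twice in the united partial solution.
    """
    shared = frozenset(shared)
    return {u | w for u in family_u for w in family_w if u & w == shared}
```

In Step 12, `family_u` would be an entry of M, `family_w` an entry of L, and `shared` either {f_R(w_i), col_R(f_R(w_i))} or {w_i, col_R(w_i)}, depending on the option taken.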

After computing 𝒜, GMD-Alg computes L[i, ℓ′] (in Step 13) by finding a smaller family that represents 𝒜. Upon completing the computation of L, since V_R = {w_1, . . . , w_{|V_R|}}, GMD-Alg can compute N[R, col_R] (in Step 15) by a simple assignment. Then, the union of the families stored in N is a family that represents

4 More precisely, here we refer to all entries [i, ℓ′] such that (ℓ′ = ℓ → i = |V_R|).



Sol_{v,u,c_v,c_u,ℓ}, a claim supported by Lemma 3. Therefore, in Step 17, GMD-Alg can compute M[v, u, c_v, c_u, ℓ] by simply finding a family that represents this union.

Finally, GMD-Alg accepts iff ⋃_{v∈V, c_v∈Col(v)} M[v, v, c_v, c_v, k] ≠ ∅. Indeed, note that the input is a yes-instance iff ⋃_{v∈V, c_v∈Col(v)} Sol_{v,v,c_v,c_v,k} ≠ ∅.

3.2 Correctness

Recall that Sol_{v,u,c_v,c_u,ℓ} is the family of every set (X ∪ Y) satisfying |X| = |Y| = ℓ, for which there exist T ∈ T_{v,u,ℓ} such that X = V_T, and a feasible col : X → Y satisfying col(v) = c_v and col(u) = c_u.

The correctness of the algorithm follows directly from the next lemma.

Lemma 4. Every entry M[v, u, c_v, c_u, ℓ] (2k − 2ℓ)-represents Sol_{v,u,c_v,c_u,ℓ}.

Proof (Lemma 4). By Steps 1–4, the lemma holds for any entry [v, u, c_v, c_u, ℓ] in M such that ℓ ≤ 2. Now, consider some v, u ∈ V, c_v ∈ Col(v), c_u ∈ Col(u) and 3 ≤ ℓ ≤ k, and assume that the lemma holds for all v′, u′ ∈ V, c′_v ∈ Col(v′), c′_u ∈ Col(u′) and 1 ≤ ℓ′ < ℓ.

For an entry N[R, col_R], let Sol(R, col_R)_{v,u,c_v,c_u,ℓ} include every set (X ∪ Y) ∈ Sol_{v,u,c_v,c_u,ℓ} whose corresponding (v, u)-tree T ∈ T_{v,u,ℓ} and feasible coloring col also satisfy the requirements that T listens to R, and col colors the nodes in V_R exactly as col_R colors them.

Towards proving the main inductive claim, we need the following claim.

Claim 1. Every entry N[R, col_R] (2k − 2ℓ)-represents Sol(R, col_R)_{v,u,c_v,c_u,ℓ}.

We first show that Claim 1 implies the correctness of the main inductive claim. Since representation is a transitive relation, it is enough to prove that the family ℬ = ⋃_{[R,col_R]∈N} N[R, col_R] (2k − 2ℓ)-represents Sol_{v,u,c_v,c_u,ℓ}. By Claim 1, ℬ ⊆ ⋃_{[R,col_R]∈N} Sol(R, col_R)_{v,u,c_v,c_u,ℓ} ⊆ Sol_{v,u,c_v,c_u,ℓ}.

Consider some sets A ∈ Sol_{v,u,c_v,c_u,ℓ}, and B ⊆ (V ∪ C) \ A such that |B| ≤ 2k − 2ℓ. Since A ∈ Sol_{v,u,c_v,c_u,ℓ}, we have that A is of the form (X_A ∪ Y_A), where |X_A| = |Y_A| = ℓ, for which there exist T ∈ T_{v,u,ℓ} such that X_A = V_T, and a feasible col : X_A → Y_A satisfying col(v) = c_v and col(u) = c_u. By Lemma 3, there exists R ∈ G_{v,u} such that T listens to R. Let col_R be defined as col when restricted to the domain V_R. We get that A ∈ Sol(R, col_R)_{v,u,c_v,c_u,ℓ}. By Claim 1, there is Â ∈ N[R, col_R] ⊆ ℬ such that Â ∩ B = ∅. Thus, ℬ (2k − 2ℓ)-represents Sol_{v,u,c_v,c_u,ℓ}. □

We now turn to prove Claim 1.

Proof (Claim 1). Consider an iteration of Step 7, corresponding to an entry N[R, col_R]. For an entry L[i, ℓ′], let R(i) be the subtree of R induced by {w_1, . . . , w_i}. Moreover, let Sol_{i,ℓ′} be the family including every set (X ∪ Y) satisfying |X| = |Y| = ℓ′, for which there exist a (v, w_i)-tree T ∈ T_{v,w_i,ℓ′} and a feasible coloring col : X → Y, satisfying the following conditions. The subtree T listens to R(i), X = V_T, and col colors the nodes in {w_1, . . . , w_i} exactly as col_R colors them.

Towards proving Claim 1, we need the following claim.



Claim 2. Every entry L[i, ℓ′], where (ℓ′ = ℓ → i = |V_R|), (2k − 2ℓ′)-represents Sol_{i,ℓ′}.

Since N[R, col_R] = L[|V_R|, ℓ] and Sol(R, col_R)_{v,u,c_v,c_u,ℓ} = Sol_{|V_R|,ℓ}, Claim 2 implies the correctness of Claim 1. □

Finally, we turn to prove Claim 2, concluding the correctness of the algorithm.

Proof (Claim 2). By Steps 9 and 10, and the induction hypothesis concerning the matrix M, the claim holds for (i = 1 and all 1 ≤ ℓ′ < ℓ) and (all 1 ≤ i ≤ |V_R| and ℓ′ = 1). Now, consider some 2 ≤ i ≤ |V_R| and 2 ≤ ℓ′ ≤ ℓ, and assume that the claim holds for all 1 ≤ i′ ≤ i and 1 ≤ ℓ′′ < ℓ′. Since representation is a transitive relation, it is enough to prove that the family 𝒜 computed in Step 12 (2k − 2ℓ′)-represents Sol_{i,ℓ′}.

By definition, a set A belongs to Sol_{i,ℓ′} iff there are sets U and W whose union is A, for which there exists 2 ≤ ℓ′′ ≤ min{ℓ′, ℓ − 1, k/d} satisfying (1) or (2):

1. U ∩ W = {f_R(w_i), col_R(f_R(w_i))}, U ∈ Sol_{f_R(w_i),w_i,col_R(f_R(w_i)),col_R(w_i),ℓ′′} and W ∈ Sol_{i−1,ℓ′−ℓ′′+1}.

2. U ∩ W = {w_i, col_R(w_i)}, U ∈ Sol_{w_i,w_i,col_R(w_i),col_R(w_i),ℓ′′} and W ∈ Sol_{i,ℓ′−ℓ′′+1}.

Thus, by Step 12 and the inductive hypotheses for the matrices M and L, the family 𝒜 computed in Step 12 satisfies 𝒜 ⊆ Sol_{i,ℓ′}. Now, consider some A ∈ Sol_{i,ℓ′}, and B ⊆ (V ∪ C) \ A such that |B| ≤ 2k − 2ℓ′. Since A ∈ Sol_{i,ℓ′}, there are U, W, and ℓ′′ as mentioned above.

First, suppose that U, W, and ℓ′′ correspond to the first option. Note that |(W \ {f_R(w_i), col_R(f_R(w_i))}) ∪ B| = |W| − 2 + |B| ≤ 2(ℓ′ − ℓ′′ + 1) − 2 + (2k − 2ℓ′) = 2k − 2ℓ′′. Therefore, by the inductive hypothesis concerning M, there is a set Û ∈ M[f_R(w_i), w_i, col_R(f_R(w_i)), col_R(w_i), ℓ′′] such that Û ∩ ((W \ {f_R(w_i), col_R(f_R(w_i))}) ∪ B) = ∅. Moreover, |(Û \ {f_R(w_i), col_R(f_R(w_i))}) ∪ B| = |Û| − 2 + |B| ≤ (2ℓ′′) − 2 + (2k − 2ℓ′) = 2k − 2(ℓ′ − ℓ′′ + 1). Therefore, by the inductive hypothesis concerning L, there is a set Ŵ ∈ L[i − 1, ℓ′ − ℓ′′ + 1] such that Ŵ ∩ ((Û \ {f_R(w_i), col_R(f_R(w_i))}) ∪ B) = ∅.

Now, suppose that U, W, and ℓ′′ correspond to the second option. Note that |(W \ {w_i, col_R(w_i)}) ∪ B| = |W| − 2 + |B| ≤ 2(ℓ′ − ℓ′′ + 1) − 2 + (2k − 2ℓ′) = 2k − 2ℓ′′. Therefore, by the inductive hypothesis concerning M, there is a set Û ∈ M[w_i, w_i, col_R(w_i), col_R(w_i), ℓ′′] such that Û ∩ ((W \ {w_i, col_R(w_i)}) ∪ B) = ∅. Moreover, |(Û \ {w_i, col_R(w_i)}) ∪ B| = |Û| − 2 + |B| ≤ (2ℓ′′) − 2 + (2k − 2ℓ′) = 2k − 2(ℓ′ − ℓ′′ + 1). Therefore, by the inductive hypothesis concerning L, there is a set Ŵ ∈ L[i, ℓ′ − ℓ′′ + 1] such that Ŵ ∩ ((Û \ {w_i, col_R(w_i)}) ∪ B) = ∅.

In both cases, the set Û ∪ Ŵ is added to the family in Step 12 and is disjoint from B, which proves the claim. □

3.3 Running Time

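Let 0 < ε < 1 be some constant, c = 1.447, and q = 2k. Choose a constant d ≥ 2 satisfying, for any integer n,

\[ \binom{cn}{n/d} = O(2^{\varepsilon n}) \quad \text{and} \quad 1/d \le \varepsilon. \]

For any 0 ≤ r∗ ≤ q and call UniRep(c, U_{|V|+|C|,q}, S) executed by GMD-Alg, where S is a family of subsets of size r∗ of V ∪ C, there exists 0 ≤ r′ ≤ min{r∗, q/d} such that

\[ |S| \le 2^{o(q)} |V|^{O(d)} \left( \frac{(cq)^q}{(r^*-r')^{r^*-r'} \, (cq-(r^*-r'))^{q-(r^*-r')}} \right) \left( \frac{(cq)^q}{r'^{\,r'} (cq-r')^{q-r'}} \right). \]

We get that GMD-Alg runs in time

\[
\begin{aligned}
& O\!\left( 2^{o(q)} |V|^{O(d)} \max_{r=0}^{q} \; \max_{r'=0}^{\min\{q-r,\,q/d\}} \left\{ \left( \frac{(cq)^q}{r^r (cq-r)^{q-r}} \right) \left( \frac{(cq)^q}{r'^{\,r'} (cq-r')^{q-r'}} \right) \left( \frac{cq}{cq-(r+r')} \right)^{q-(r+r')} \right\} \right) \\
={}& O\!\left( 2^{o(q)} |V|^{O(1)} \max_{r=0}^{q} \; \max_{r'=0}^{\min\{q-r,\,q/d\}} \left\{ \left( \frac{(cq)^q}{r^r (cq-r)^{q-r}} \right) \binom{cq}{r'} \left( \frac{cq}{cq-(r+q/d)} \right)^{q-r} \right\} \right) \\
={}& O\!\left( 2^{o(q)} |V|^{O(1)} \max_{r=0}^{q} \left\{ \left( \frac{(cq)^q}{r^r (cq-r)^{q-r}} \right) \binom{cq}{q/d} \left( \frac{cq}{cq-r-(1/d)q} \right)^{q-r} \right\} \right) \\
={}& O\!\left( 2^{\varepsilon q+o(q)} |V|^{O(1)} \max_{r=0}^{q} \left\{ \left( \frac{(cq)^q}{r^r (cq-r)^{q-r}} \right) \left( \frac{cq}{cq-r-\varepsilon q} \right)^{q-r} \right\} \right).
\end{aligned}
\]

By choosing a small enough ε > 0, the maximum is obtained at r = αq, where α ≅ 0.55277. Thus, GMD-Alg runs in time O(6.85414^k |V|^{O(1)}).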
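The final maximization can be checked numerically. The sketch below takes the last expression of the derivation, substitutes r = αq, lets ε tend to 0, and searches a grid over α; the function name `log_rate` and the grid resolution are choices made for this illustration, and the 2^{o(q)} and polynomial factors are ignored.

```python
import math

def log_rate(alpha, c=1.447, eps=0.0):
    """Natural log of the q-th root of the bound at r = alpha * q."""
    # ((cq)^q / (r^r (cq-r)^(q-r)))^(1/q); the factors of q cancel.
    size_term = math.log(c) - alpha * math.log(alpha) - (1 - alpha) * math.log(c - alpha)
    # ((cq / (cq - r - eps*q))^(q-r))^(1/q)
    time_term = (1 - alpha) * (math.log(c) - math.log(c - alpha - eps))
    return size_term + time_term

alphas = [i / 10000 for i in range(1, 10000)]
best_alpha = max(alphas, key=log_rate)
# q = 2k, so the base of the exponent in k is the square of the per-q rate.
base_per_k = math.exp(2 * log_rate(best_alpha))
print(best_alpha, base_per_k)
```

Under these assumptions the grid search lands close to α ≈ 0.5528 and a base close to 6.854, in line with the values stated above.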

4 An Algorithm for GM

In this section we develop algorithm GM-Alg, proving the following result.

Theorem 5. GM-Alg solves GM in time O∗(5.21914^k).

Algorithm GM-Alg computes families of “partial solutions” that contain only nodes, and handles colors by adding a parameter to the matrices holding these families. More precisely, given a pair of nodes v, u ∈ V, and a subset of colors D ⊆ C, we compute families of partial solutions of the following form. A partial solution is a subset U ⊆ V of |D| nodes, for which there exist a (v, u)-tree T ∈ T_{v,u,|D|} satisfying U = V_T, and a feasible coloring col : U → D. Having a family of such partial solutions, we compute a family that represents it, calling algorithm UniRep. Such computations of representative families are embedded in a dynamic programming-based framework, whose progress is governed by guiding trees. Note that, since we iterate over every subset D ⊆ C, the running time of GM-Alg crucially relies on the fact that deletions are not allowed in GM.

5 An Algorithm for RGMD

In this section we develop algorithm RGMD-Alg, proving the following result.

Theorem 6. RGMD-Alg solves RGMD in time O∗(5.1791^k).

To efficiently compute representative families, we define a partition matroid P = P(C, M, G, Col) = (E, I) as follows. Denote C = {c_1, . . . , c_{|C|}}. Now, let E = V be partitioned into sets E_1, . . . , E_{|C|}, where E_i = {v ∈ V : c_i ∈ Col(v)}, for all 1 ≤ i ≤ |C|. The sets E_1, . . . , E_{|C|} are disjoint because |Col(v)| = 1, for all v ∈ V. Now, let k_i = occ(c_i) for all 1 ≤ i ≤ |C| (recall that occ(c) is the number of occurrences of a color c in M). Accordingly, define I = I(C, M, G, Col) = {S ⊆ E : |S ∩ E_1| ≤ k_1, . . . , |S ∩ E_{|C|}| ≤ k_{|C|}}.

Intuitively, this definition ensures that U ∈ I iff U can be colored without using any color “too many” times, i.e., there exists a feasible coloring col : U → C.
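The construction of P can be sketched as follows, assuming that each node carries exactly one color (|Col(v)| = 1), so that Col is encoded as a plain node-to-color dictionary. The names `build_partition` and `in_I` are hypothetical.

```python
def build_partition(color_of):
    """Group the nodes into the blocks E_i, one block per color c_i."""
    blocks = {}
    for v, c in color_of.items():
        blocks.setdefault(c, set()).add(v)
    return blocks

def in_I(U, blocks, occ):
    """U is independent iff every block E_i contributes at most occ(c_i) nodes."""
    U = set(U)
    return all(len(U & E_i) <= occ.get(c, 0) for c, E_i in blocks.items())
```

Because the coloring of every node is forced in this setting, `in_I` returning True is the same as the existence of a feasible coloring of U.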

Algorithm RGMD-Alg computes families of “partial solutions” that contain only nodes, and handles colors by computing representative families with respect to the partition matroid P. More precisely, when we now consider a pair of nodes v, u ∈ V, and a value 1 ≤ ℓ ≤ k, we compute families of partial solutions of the following form. A partial solution is a set of nodes U ∈ I, for which there exists a (v, u)-tree T ∈ T_{v,u,ℓ} satisfying U = V_T. Having a family of such partial solutions, we compute a family that represents it with respect to the matroid P, calling algorithm ParRep. Such computations of representative families are embedded in a dynamic programming-based framework, whose progress is governed by guiding trees.

References

1. Alon, N., Yuster, R., Zwick, U.: Color coding. J. Assoc. Comput. Mach. 42(4), 844–856 (1995)

2. Ambalath, A.M., Balasundaram, R., Rao H., C., Koppula, V., Misra, N., Philip, G., Ramanujan, M.S.: On the kernelization complexity of colorful motifs. In: Raman, V., Saurabh, S. (eds.) IPEC 2010. LNCS, vol. 6478, pp. 14–25. Springer, Heidelberg (2010)

3. Betzler, N., Bevern, R., Fellows, M.R., Komusiewicz, C., Niedermeier, R.: Parameterized algorithmics for finding connected motifs in biological networks. IEEE/ACM Trans. Comput. Biol. Bioinf. 8(5), 1296–1308 (2011)

4. Betzler, N., Fellows, M.R., Komusiewicz, C., Niedermeier, R.: Parameterized algorithms and hardness results for some graph motif problems. In: Ferragina, P., Landau, G.M. (eds.) CPM 2008. LNCS, vol. 5029, pp. 31–43. Springer, Heidelberg (2008)

5. Björklund, A., Husfeldt, T., Kaski, P., Koivisto, M.: Narrow sieves for parameterized paths and packings. CoRR abs/1007.1161 (2010)

6. Björklund, A., Kaski, P., Kowalik, L.: Probably optimal graph motifs. In: STACS, pp. 20–31 (2013)

7. Bruckner, S., Hüffner, F., Karp, R.M., Shamir, R., Sharan, R.: Topology-free querying of protein interaction networks. J. Comput. Biol. 17(3), 237–252 (2010)

8. Dondi, R., Fertin, G., Vialette, S.: Finding approximate and constrained motifs in graphs. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 388–401. Springer, Heidelberg (2011)

9. Dondi, R., Fertin, G., Vialette, S.: Maximum motif problem in vertex-colored graphs. In: Kucherov, G., Ukkonen, E. (eds.) CPM 2009. LNCS, vol. 5577, pp. 221–235. Springer, Heidelberg (2009)

10. Fellows, M.R., Fertin, G., Hermelin, D., Vialette, S.: Sharp tractability borderlines for finding connected motifs in vertex-colored graphs. In: Arge, L., Cachin, C., Jurdziński, T., Tarlecki, A. (eds.) ICALP 2007. LNCS, vol. 4596, pp. 340–351. Springer, Heidelberg (2007)

11. Fellows, M.R., Fertin, G., Hermelin, D., Vialette, S.: Upper and lower bounds for finding connected motifs in vertex-colored graphs. J. Comput. Syst. Sci. 77(4), 799–811 (2011)

12. Fionda, V., Palopoli, L.: Biological network querying techniques: Analysis and comparison. J. Comput. Biol. 18(4), 595–625 (2011)

13. Fomin, F.V., Lokshtanov, D., Saurabh, S.: Efficient computation of representative sets with applications in parameterized and exact algorithms. In: SODA (see also: CoRR abs/1304.4626), pp. 142–151 (2014)

14. Guillemot, S., Sikora, F.: Finding and counting vertex-colored subtrees. In: Hliněný, P., Kučera, A. (eds.) MFCS 2010. LNCS, vol. 6281, pp. 405–416. Springer, Heidelberg (2010)

16. Koyutürk, M.: Algorithmic and analytical methods in network biology. Wiley Interdiscip. Rev. Syst. Biol. Med. 2(3), 277–292 (2010)

17. Lacroix, V., Fernandes, C.G., Sagot, M.F.: Motif search in graphs: Application to metabolic networks. IEEE/ACM Trans. Comput. Biol. Bioinf. 3(4), 360–368 (2006)

18. Lokshtanov, D., Misra, P., Panolan, F., Saurabh, S.: Deterministic truncation of linear matroids. CoRR abs/1404.4506 (2014)

19. Oxley, J.G.: Matroid theory. Oxford University Press (2006)

20. Pinter, R.Y., Shachnai, H., Zehavi, M.: Deterministic parameterized algorithms for the graph motif problem, http://www.cs.technion.ac.il/~hadas/PUB/Graph_Motif_full.pdf

21. Pinter, R.Y., Zehavi, M.: Partial information network queries. In: Lecroq, T., Mouchard, L. (eds.) IWOCA 2013. LNCS, vol. 8288, pp. 362–375. Springer, Heidelberg (2013)

22. Pinter, R.Y., Zehavi, M.: Algorithms for topology-free and alignment network queries. J. Discrete Algorithms (to appear, 2014)

23. Pinter-Wollman, N., Hobson, E.A., Smith, J.E., Edelman, A.J., Shizuka, D., de Silva, S., Waters, J.S., Prager, S.D., Sasaki, T., Wittemyer, G., Fewell, J., McDonald, D.B.: The dynamics of animal social networks: analytical, conceptual, and theoretical advances. Behavioral Ecology 25(2), 242–255 (2014)

24. Rizzi, R., Sikora, F.: Some results on more flexible versions of graph motif. In: Hirsch, E.A., Karhumäki, J., Lepistö, A., Prilutskii, M. (eds.) CSR 2012. LNCS, vol. 7353, pp. 278–289. Springer, Heidelberg (2012)

25. Shachnai, H., Zehavi, M.: Faster computation of representative families for uniform matroids with applications. CoRR abs/1402.3547 (2014)

26. Sikora, F.: An (almost complete) state of the art around the graph motif problem. Université Paris-Est Technical reports (2012)

27. Williams, V.V.: Multiplying matrices faster than Coppersmith-Winograd. In: STOC, pp. 887–898 (2012)


3.8 Improved Parameterized Algorithms for Network Query Problems

Ron Y. Pinter, Hadas Shachnai and Meirav Zehavi. Improved Parameterized Algorithms

for Network Query Problems. In the proc. of the 9th International Symposium on

Parameterized and Exact Computation (IPEC), pages 85–96, 2014.



Improved Parameterized Algorithms for Network Query Problems

Ron Y. Pinter, Hadas Shachnai, and Meirav Zehavi

Department of Computer Science, Technion, 32000 Haifa, Israel
{pinter,hadas,meizeh}@cs.technion.ac.il

Abstract. In the Partial Information Network Query (PINQ) problem, we are given a host graph H, and a pattern P whose topology is partially known. We seek a subgraph of H that resembles P. PINQ is a generalization of Subgraph Isomorphism, where the topology of P is known, and Graph Motif, where the topology of P is unknown. This generalization has important applications to bioinformatics, since it addresses the major challenge of analyzing biological networks in the absence of certain topological data. In this paper, we use a non-standard part-algebraic/part-combinatorial hybridization strategy to develop an exact parameterized algorithm as well as an FPT-approximation scheme for PINQ, allowing near resemblance between H and P. We thus unify and significantly improve previous results related to network queries.

1 Introduction

With the increasing amount of data on biological networks available, the discovery of conserved patterns has become of major importance. Such patterns can be identified through the use of network queries, which compare the graph modeling the network with a given pattern. Indeed, the Alignment Network Query (ANQ) and Graph Motif (GM) problems play a pivotal role in the analysis of biological networks [16,31]. Due to their general nature, they can also be used in analyzing other types of networks, such as social and technical networks [15].

Given a pattern P and a graph H, GM and ANQ seek a subgraph of H that resembles P. GM requires only the connectivity of the solution, while ANQ, a variant of the classic Subgraph Isomorphism (SI) problem, requires resemblance between the topology of P and the solution. The Partial Information Network Query (PINQ) problem, introduced in [27], fits the common scenario where we have only partial information on the topology of P.

Since network query problems are often NP-hard, there is a growing body of literature studying their parameterized complexity. A problem is fixed-parameter tractable (FPT) with respect to a parameter k if it can be solved in time O∗(f(k)) for some function f.¹ In this paper, we introduce a non-standard part-algebraic/part-combinatorial hybridization strategy, which we use to develop exact as well as approximation FPT algorithms for a variant of PINQ that allows insertions and deletions (indels) of nodes.

1 The notation O∗ hides factors polynomial in the input size.

© Springer International Publishing Switzerland 2014. M. Cygan and P. Heggernes (Eds.): IPEC 2014, LNCS 8894, pp. 294–306, 2014. DOI: 10.1007/978-3-319-13524-3_25

[Figure 1: the graphs G and G′ with U = {v3, v4, v6, v7, v10} and U′ = {u2, u5, u6, u9, u10}, the multigraphs G \ U and G′ \ U′, and the mapping h(v1) = u1, h(v2) = u3, h(v5) = u4, h(v8) = u8, h(v9) = u7.]

Fig. 1. An example of a homeomorphism h from G to G′.

1.1 Problem Statement

Given a graph H and a set of graphs P, in PINQI we seek a disjoint collection of subgraphs of H, each resembling a different graph in P, whose union is a connected graph. Each of these subgraphs is mapped to the graph it resembles in P, by using a variant of isomorphism that allows the deletion of degree-2 nodes, called homeomorphism (defined below). For biological motivation, see, e.g., [26].

Homeomorphism: Given a graph G = (V, E) and a set U of degree-2 nodes in V, generate the multigraph G \ U as follows (see Fig. 1). Delete from G the nodes in U and their incident edges. For every pair v, u ∈ V \ U and simple path connecting them, in which all other nodes belong to U, add an edge {v, u}. For every v ∈ V \ U and simple cycle in G consisting only of v and nodes in U, add a self-loop to v.

A homeomorphism from G = (V, E) to G′ = (V′, E′) is defined as an isomorphism from G \ U to G′ \ U′, where U and U′ are subsets of degree-2 nodes in V and V′, respectively. To simplify the presentation, we use the term homeomorphism also when referring to a function whose domain is empty.
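The construction of the multigraph G \ U can be sketched as follows. This is an illustrative implementation under stated assumptions: G is a simple graph given as an adjacency dictionary of sets, node labels are mutually comparable (so that edges can be stored as sorted pairs), and the function name `suppress_degree2` is hypothetical.

```python
from collections import Counter

def suppress_degree2(adj, U):
    """Return the edges of G \\ U as a dict {(v, w): multiplicity}, with v <= w.

    A pair (v, v) denotes a self-loop.  Cycles made only of nodes in U vanish.
    """
    U = set(U)
    assert all(len(adj[x]) == 2 for x in U), "U must contain degree-2 nodes only"
    seen = Counter()
    for v in adj:
        if v in U:
            continue
        for first in adj[v]:
            # Walk through the chain of U-nodes starting at `first`
            # until a kept node is reached (immediately, for a direct edge).
            prev, cur = v, first
            while cur in U:
                (nxt,) = adj[cur] - {prev}
                prev, cur = cur, nxt
            seen[tuple(sorted((v, cur)))] += 1
    # Every edge or chain is discovered once from each of its two ends.
    return {e: m // 2 for e, m in seen.items()}
```

Two graphs are then homeomorphic in the sense above when, for suitable choices of U and U′, the resulting multigraphs are isomorphic; the isomorphism test itself is not part of this sketch.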

Definition of PINQI: The input for PINQI consists of a set of graphs P = {P_1, ..., P_t}, where P_i = (V_i, E_i), and a graph H = (V, E) having real numbers as edge-weights, along with a similarity score table Δ. The table Δ contains an entry Δ(p, h) ∈ R ∪ {−∞} for any pair of nodes p, h, where p ∈ V_i, 1 ≤ i ≤ t and h ∈ V (an entry Δ(p, h) = −∞ indicates that p and h cannot be matched). The input contains also the nonnegative integers I_F, I_A and D. Let k = Σ_{i=1}^{t} |V_i| denote the total number of nodes in P (see Fig. 2(A)).

We now give the definition of a solution for PINQI (see Fig. 2(B)). Let 𝒮 = (S, V_S^1, ..., V_S^{t+1}, h_1, ..., h_t), where S = (V_S, E_S) is a connected subgraph of H, {V_S^1, ..., V_S^{t+1}} is a partition of V_S, and h_i is a homeomorphism from P_i to the subgraph of S induced by V_S^i, for all 1 ≤ i ≤ t. Let dom(f) and ima(f) denote the domain and image of a function f, respectively; denote by w(e) the weight of an edge e. The number of indels and score of 𝒮 are defined as follows.



Fig. 2. (A) An input for PINQI, where k = 8. (B) A solution for the input.

– The number of free insertions is |V_S^{t+1}|. Informally, this is the number of nodes connecting the subgraphs of S that are mapped to graphs in P.

– The number of alignment insertions is the number of unmapped nodes in ⋃_{i=1}^{t} V_S^i, i.e., Σ_{i=1}^{t} |V_S^i \ ima(h_i)|. Informally, this is the number of nodes that are not mapped to nodes of graphs in P, and yet belong to the subgraphs of S that are mapped to P.

– The number of deletions is the number of unmapped nodes in P, i.e., Σ_{i=1}^{t} |V_i \ dom(h_i)|.

– The score is the sum of the similarity scores between the matched nodes, and the weights of the edges in E_S, i.e., Σ_{i=1}^{t} Σ_{p∈dom(h_i)} Δ(p, h_i(p)) + Σ_{e∈E_S} w(e).
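The four quantities above can be computed directly from a candidate solution. The sketch below assumes a simple encoding (lists of sets for the node partitions, dictionaries for the maps h_i, the table Δ and the edge weights) and the hypothetical name `indels_and_score`; it does not verify that each h_i is a homeomorphism or that S is connected.

```python
def indels_and_score(parts, maps, pattern_nodes, delta, edge_weights):
    """Return (free insertions, alignment insertions, deletions, score).

    parts:         list [V_S^1, ..., V_S^{t+1}] of sets of host nodes.
    maps:          list [h_1, ..., h_t] of dicts, pattern node -> host node.
    pattern_nodes: list [V_1, ..., V_t] of sets of pattern nodes.
    delta:         dict (p, h) -> similarity score.
    edge_weights:  dict edge of E_S -> weight w(e).
    """
    t = len(maps)
    free = len(parts[t])
    alignment = sum(len(parts[i] - set(maps[i].values())) for i in range(t))
    deletions = sum(len(pattern_nodes[i] - set(maps[i])) for i in range(t))
    score = sum(delta[(p, h)] for m in maps for p, h in m.items())
    score += sum(edge_weights.values())
    return free, alignment, deletions, score
```

A candidate is then a solution only if the first three values equal I_F, I_A and D, respectively, and the cycle requirement stated next holds.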

We say that 𝒮 is a solution if it includes exactly I_F free insertions, I_A alignment insertions and D deletions, and any cycle in S is completely contained in the subgraph induced by V_S^i, for some 1 ≤ i ≤ t. The cycle requirement allows us to avoid generalizing the Clique problem, which is W[1]-hard [14].² Assuming that there is no solution having fewer than I_F free insertions,³ the objective of PINQI is to find the maximum score OPT of a solution.

Relation of PINQI to Known Network Queries: Clearly, PINQ is the special case where I_F = I_A = D = 0. Also, ANQ with Indels (ANQI) [13] is the special case where t = 1. Finally, GM with Indels (GMI) [10] is the special case where t = k, and Δ(p, h) ∈ {−∞, 0} for any p ∈ V_i, 1 ≤ i ≤ t and h ∈ V.

1.2 Related Work and Our Contribution

ANQ is NP-hard even if the single graph in P is a path, since this case generalizes the Hamiltonian path problem [18]. GM is NP-hard even if H is a tree [24].

2 Indeed, without the cycle requirement, Clique is the special case where t = k, I_F = I_A = D = 0, Δ(p, h) = 0 for all p ∈ V_i, 1 ≤ i ≤ t and h ∈ V, and w(e) = 1 for all e ∈ E.

3 If such a solution exists, we can simply reject the input.



Table 1. Parameterized algorithms for PINQI.

Reference | Weights | Indels | The topology of each P_i | Time complexity
Pinter et al. [27] | R | No | Tree | O∗(6.75^{k+O(log^2 k)} 3^t)
This paper | Z | Yes | Bounded treewidth | O∗(3.7^{k−D+I_A} W)
This paper$ | N_0 | Yes | Bounded treewidth | O∗(3.7^{k−D+I_A} ⌈1/ε⌉)

Table 2. Parameterized algorithms for ANQI.

Reference | Weights | Indels | The topology of P_1 | Time complexity
Blin et al. [7] | R | Yes | Bounded feedback vertex set | O∗(8.2^{k+I_A})
Dost et al. [13] | R | Yes | Bounded treewidth | O∗(8.2^{k+I_A})
Shlomi et al. [30] | R | Yes | Simple path | O∗(5.44^{k+I_A})
Hüffner et al. [20] | R | Yes | Simple path | O∗(4.32^{k+I_A})
Pinter et al. [27] | R | No | Tree | O∗(6.75^k)
Pinter et al. [27] | R | No | Simple path | O∗(4^k)
This paper | Z | Yes | Bounded treewidth | O∗(2^{k−D+I_A} W)
This paper$ | N_0 | Yes | Bounded treewidth | O∗(2^{k−D+I_A} ⌈1/ε⌉)

Tables 1, 2 and 3 present known FPT algorithms for PINQI, ANQI and GMI, where tw is the maximum treewidth [8] of a graph in P. The Weights columns refer to the possible values for edge-weights and scores in Δ, excluding −∞, and W denotes the maximum absolute value of any weight. Typically, in our applications W is polynomial in the input size [24]. Entries marked by ’$’ indicate instances for which we present an FPT-approximation scheme (FPT-AS), that returns a value in [(1 − ε)OPT, OPT], for any fixed ε > 0.

Our main result is Exact, a randomized O∗(3.7^{k−D+I_A} W) time exact algorithm for PINQI, which handles a wide class of inputs (see Theorem 1). We then complement Exact by developing an FPT-AS for PINQI (see Theorem 2).

Algorithm Exact improves and unifies the previous results as follows.

– We extend the PINQ algorithm presented in [27], by considering indels and bounded treewidth graphs (see Table 1). Note that a graph with a bounded feedback vertex set has a bounded treewidth [9]. Thus, our results hold also for graphs with bounded feedback vertex sets.

– For inputs with polynomially bounded integral weights, we significantly improve the O∗ running times of the best known algorithms for PINQ (due to [27]) and ANQI (due to [13] and [20]). For example, using the real data presented in [24], the weights in the table Δ can take integral values in {−∞, 0, . . . , 4}. Applying the best known algorithm (of [13]) for ANQI, where P_1 has a bounded treewidth, we get a running time of O∗(8.2^{k+I_A}), whereas Exact solves ANQI on such inputs in time O∗(2^{k−D+I_A}). We note that both algorithms have the same dependency on the treewidth of P_1.



Table 3. Parameterized algorithms for GMI.

Reference               Weights   Indels   Time complexity
Bruckner et al. [10]    ℝ         Yes      $O^*(k!\,3^{k})$
Dondi et al. [12]       {0}       Yes      $O^*(2^{O(k-D)})$
Fellows et al. [15]     {0}       No       $O^*(87^{k})$
Pinter et al. [27]      ℝ         No       $O^*(20.25^{k+O(\log^2 k)})$
Betzler et al. [2]      {0}       Yes      $O^*(29.6^{k-D})$
Betzler et al. [2]      {0}       No       $O^*(10.88^{k})$
Betzler et al. [3]      {0}       No       $O^*(4.32^{k})$
Guillemot et al. [19]   ℕ₀        Yes      $O^*(4^{k-D} W^2)$
Koutis [22]             {0}       Yes      $O^*(2.54^{k-D})$
Pinter et al. [28]      ℕ₀        Yes      $O^*(2^{k} W)$
Bjorklund et al. [6]    {0}       Yes      $O^*(2^{k-D})$
This paper              ℤ         Yes      $O^*(2^{k-D} W)$
This paper$             ℕ₀        Yes      $O^*(2^{k-D} \lceil 1/\varepsilon \rceil)$

– We extend the algorithm for GMI presented in [6] to handle integral weights.

– Exact has the same $O^*$ running time as the best known FPT algorithms for SI, in which the subgraph is a tree [23], or has a bounded treewidth [17]. The same holds for Group Steiner Tree [25], and Min Connected Components [28]. Indeed, all of these problems are special cases of PINQI.

Due to lack of space, some proofs are omitted. The detailed results will be given in the full version of this paper.

Notation: Let $V(P) = \bigcup_{i=1}^{t} V_i$ and $E(P) = \bigcup_{i=1}^{t} E_i$ be the sets of nodes and edges in P, respectively. Also let $P^*$ be the set of single-node graphs in P, $V(P^*) = \bigcup_{P_i \in P^*} V_i$ and $k^* = |P^*|$. Let $V(G)$ and $E(G)$ be the node-set and edge-set of a graph G, respectively.

2 Main Technique

In developing algorithm Exact, we combine the algebraic narrow sieves technique [5] (see also [21]) with the divide-and-color technique [11], which are often used as two separate tools in solving parameterized problems. Our approach contains a novel application of narrow sieves that consists of two monomial-associating procedures, rather than one such procedure, as detailed below. It may be useful in obtaining fast FPT algorithms for other problems that include as special cases “color coding-related” problems (indeed, PINQI encompasses GM and SI). We note that the sophisticated algebraic algorithm for the Hamiltonicity problem by Bjorklund [4] (see also [5]) also applies (as preprocessing) a combinatorial partitioning phase.



In the narrow sieves technique, we express a parameterized problem by associating monomials with potential solutions. Each monomial either represents a unique correct solution, or an even number of incorrect solutions. Having a polynomial that is the sum of such monomials, we need to determine whether it has a monomial whose coefficient is odd. On the other hand, divide-and-color is a combinatorial technique where, in each step, we have a set $S_n$ of n elements, and we seek a certain subset $S_k$ of k elements in $S_n$. We randomly partition $S_n$ into two sets: $S_n^1$ and $S_n^2$. Thus, we get the problem of finding a subset $S \subseteq S_k$ in $S_n^1$, and another problem of finding the subset $S_k \setminus S$ in $S_n^2$.

In solving PINQI, we first observe that if the nodes in V can be mapped only to nodes in $V(P^*)$, PINQI can be efficiently solved by a narrow sieves procedure, $P_A$, that is a straightforward extension of the algorithm for GM given in [6]. On the other hand, if |P| = 1, PINQI can be efficiently solved by a different procedure, $P_B$, using a standard application of narrow sieves.

Now, suppose that we have a partition of V into a set A of nodes that can be mapped only to nodes in $V(P^*)$, and a set B of nodes that can be mapped only to nodes in $V(P) \setminus V(P^*)$. For such a scenario, we develop a non-trivial narrow sieves procedure, ManySingles, handling nodes in A in an efficient manner similar to $P_A$, and nodes in B in a manner similar to $P_B$. To handle only such scenarios, before each call to ManySingles, we use divide-and-color to partition V into the sets A and B.⁴ Indeed, since we build on the results of [6], the correctness of ManySingles crucially relies on the fact that A does not contain nodes that can be mapped to nodes in $V(P) \setminus V(P^*)$.
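The random partition of V into A and B can be illustrated with a small sketch. It is not the paper's procedure: the function names and the toy sizes are assumptions made for illustration only. Each node joins A independently with probability p; a fixed target split with a nodes meant for A and b nodes meant for B is respected with probability $p^a (1-p)^b$, which is largest at $p = a/(a+b)$, so roughly $1/(p^a (1-p)^b)$ independent repetitions are expected before a compatible partition appears.

```python
import random
from fractions import Fraction

def random_partition(nodes, p, rng):
    """Split nodes into (A, B): each node joins A independently with probability p."""
    A = {h for h in nodes if rng.random() < float(p)}
    return A, set(nodes) - A

def success_probability(a, b, p):
    """Exact probability that a fixed nodes land in A and b fixed nodes land in B."""
    p = Fraction(p)
    return p ** a * (1 - p) ** b

def is_compatible(A, B, want_in_A, want_in_B):
    """True if the partition respects a (hypothetical) target split."""
    return want_in_A <= A and want_in_B <= B

# Toy instance (hypothetical sizes): 3 nodes should go to A and 5 to B.
a, b = 3, 5
best = success_probability(a, b, Fraction(a, a + b))
# No probability on this grid does better than p = a / (a + b).
assert all(success_probability(a, b, Fraction(i, 20)) <= best for i in range(21))

rng = random.Random(0)
A, B = random_partition(range(20), Fraction(a, a + b), rng)
print(len(A) + len(B))  # every node ends up on exactly one side
```

Exact fractions are used for the probability so that the comparison between candidate values of p is not affected by floating-point rounding.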

The combined application of divide-and-color and ManySingles is efficient only for solutions containing many graphs from $P^*$.⁵ However, solutions containing few graphs from $P^*$ cannot contain too many graphs from P (since each solution contains exactly k nodes from V(P)). For such solutions, we develop a procedure, FewSingles, handling all the nodes in V in a manner similar to $P_B$.

Thus, our algorithm proceeds in the following main steps:

1. Examine all choices for the number $n_{P^*}$ of graphs from $P^*$ in the solution.
2. If $n_{P^*}$ is “large”:
   (a) Apply divide-and-color to partition V as described above.
   (b) Call ManySingles.
3. Else: Call FewSingles.

3 The Procedures FewSingles and ManySingles

Assume that P is a set of bounded treewidth graphs, and the weights are nonnegative integers (recall that the weights are the possible values for edge-weights and scores in Δ, excluding −∞). Algorithm Exact (see Sect. 4) only needs the procedures to be correct under these assumptions. For the sake of clarity, we first present a simple version of FewSingles that cannot handle indels.

⁴ This can also be viewed as applying the color coding technique [1] using only two colors.

⁵ For such solutions, the time gained by handling A in a manner similar to $P_A$ outweighs the time required for the preceding selection step.



3.1 SimpleFewSingles: A Narrow Sieves Procedure

Assuming that $I_F = I_A = D = 0$, we present a narrow sieves procedure that efficiently finds solutions containing few graphs in $P^*$. We first define the structure of a potential solution. We then describe the potential solutions, and associate them with monomials. We show how to evaluate some sums of such monomials, and finally, we present the procedure, which heavily relies on such evaluations.

3.1.1 The Structure of a Potential Solution

Recall that any cycle in a solution S is contained in a subgraph induced by $V_S^i$, for some 1 ≤ i ≤ t. Thus, by contracting each of the subgraphs into a single node, and choosing a node as a root, any solution for PINQI can be represented by a rooted tree. We study the mappings of such trees (into graphs in P) below.

A quad $(T, f_{gra}, f_{nod}, f_{con})$ refers to a rooted tree $T = (V_T, E_T)$ on t nodes, $f_{gra} : V_T \to P$, $f_{nod} : X \to V$ and $f_{con} : X \to 2^X$, where $X = \{(v, p) : v \in V_T,\ p \in V(f_{gra}(v))\}$. Informally, such a quad describes a structure for a solution as follows. T and $f_{gra}$ specify which graphs to choose from P and how to connect them; $f_{nod}$ indicates how to map the nodes of graphs chosen from P to nodes in V; and $f_{con}$ refines our information about how the chosen graphs are connected. Next, we define the quads corresponding to structures of potential solutions for PINQI.

Definition 1. Given r ∈ V, we say that a quad $(T, f_{gra}, f_{nod}, f_{con})$ is r-good if:

1. $|\{(v, p) \in dom(f_{nod}) : f_{nod}(v, p) = r\}| = |\{p \in V(f_{gra}(root(T))) : f_{nod}(root(T), p) = r\}| = 1$.
2. $\forall v \in V_T,\ \{p, p'\} \in E(f_{gra}(v)) : \{f_{nod}(v, p), f_{nod}(v, p')\} \in E$.
3. $\forall (v, p) \in dom(f_{nod}) : \Delta(p, f_{nod}(v, p)) \neq -\infty$.
4. $\forall (v, p) \in dom(f_{con}),\ (u, p') \in f_{con}(v, p)$:
   (a) v is the father of u in T, and $\{f_{nod}(v, p), f_{nod}(u, p')\} \in E$.
   (b) $\forall (u', p'') \in f_{con}(v, p) \setminus \{(u, p')\} : f_{nod}(u, p') \neq f_{nod}(u', p'')$.
5. $\forall u \in V_T \setminus \{root(T)\} : |\{(v, p, p') : (u, p') \in f_{con}(v, p)\}| = 1$.

Condition 1 states that we map only one node in V(P) to r, and this node belongs to the graph mapped to the root of T. Condition 2 requires that the mapping of the graphs in P to subgraphs of H is correct (i.e., we map edges of graphs in P to edges in E). By Condition 3, we do not match a node in V(P) with a node in V that cannot be matched according to Δ. Condition 4a states that $f_{con}$ does not contradict the information provided by T on the edges connecting the graphs in P. More precisely, a node v being a father of a node u in T implies that $f_{gra}(v)$ and $f_{gra}(u)$ are connected by an edge. Only then may $f_{con}$ provide information on the connecting edge, where $(u, p') \in f_{con}(v, p)$, for some $p \in V(f_{gra}(v))$ and $p' \in V(f_{gra}(u))$, implies that p and p′ are connected by an edge (which, by this condition, is mapped to an edge in E). Condition 4b avoids some quads in which several nodes in V(P) are mapped to the same node in V. Finally, Condition 5 states that for each pair of a node u and its father v in T, $f_{con}$ provides information on exactly one pair (p, p′), for some $p \in V(f_{gra}(v))$ and $p' \in V(f_{gra}(u))$, indicating that p and p′ are connected by an edge.

We now define the score of an r-good quad by the mapping of the edges in E(P), the pairs of matched nodes, and the edges connecting the graphs in P.

Definition 2. The score of an r-good quad $(T, f_{gra}, f_{nod}, f_{con})$ is

$$\sum_{v \in V_T,\ \{p,p'\} \in E(f_{gra}(v))} w(\{f_{nod}(v, p), f_{nod}(v, p')\}) \;+ \sum_{(v,p) \in dom(f_{nod})} \Big[ \Delta(p, f_{nod}(v, p)) + \sum_{(u,p') \in f_{con}(v,p)} w(\{f_{nod}(v, p), f_{nod}(u, p')\}) \Big].$$

3.1.2 Potential Solutions

Let L = {1, ..., k + t} be the set of indices used in labeling r-good quads (recall that t = |P| and $k = \sum_{i=1}^{t} |V_i|$), defining potential solutions of the same score as follows.

Definition 3. Given an r-good quad $(T, f_{gra}, f_{nod}, f_{con})$ and $\ell : V_T \cup dom(f_{nod}) \to L$ satisfying $|dom(\ell)| = k + t$, we say that $(T, f_{gra}, f_{nod}, f_{con}, \ell)$ is an r-solution.

We now define two sets: Sol(r, s) contains all r-solutions $(T, f_{gra}, f_{nod}, f_{con}, \ell)$ of score s where $\ell$ is bijective; and $Cor(r, s) = \{(T, f_{gra}, f_{nod}, f_{con}, \ell) \in Sol(r, s) : f_{gra}$ and $f_{nod}$ are injective$\}$. The next lemma implies that each set Cor(r, s) includes enough r-solutions from Sol(r, s), and all these r-solutions are correct.

Lemma 1. The input has a solution of score s iff $\bigcup_{r \in V} Cor(r, s) \neq \emptyset$.

Note that $(T, f_{gra}, f_{nod}, f_{con}, \ell), (T', f'_{gra}, f'_{nod}, f'_{con}, \ell') \in Sol(r, s)$ are equal iff there is an isomorphism iso between the rooted trees T and T′, such that

1. $\forall v \in V_T : f_{gra}(v) = f'_{gra}(iso(v))$, and $\ell(v) = \ell'(iso(v))$.

2. $\forall (v, p) \in dom(f_{nod}) : f_{nod}(v, p) = f'_{nod}(iso(v), p)$, $\ell(v, p) = \ell'(iso(v), p)$, and $[\forall (u, p') : (u, p') \in f_{con}(v, p)$ iff $(iso(u), p') \in f'_{con}(iso(v), p)]$.

3.1.3 Associating Monomials with Potential Solutions

Recall that, in the narrow sieves technique, a parameterized problem is solved via associating monomials with potential solutions. Towards defining these monomials, we introduce the variables x, $y_{e,h}$ for all e ∈ V(P) ∪ V and h ∈ V, and $z_{e,l}$ for all e ∈ P ∪ V and l ∈ L. Let ind denote the number of these variables, i.e., ind = 1 + (k + |V|)|V| + (t + |V|)|L|.

We next define the monomials associated with potential solutions. In defining a monomial for an r-solution sol ∈ Sol(r, s), we store information about sol that allows reconstructing sol iff it is a correct solution (i.e., sol ∈ Cor(r, s)).



Definition 4.

$$m(T, f_{gra}, f_{nod}, f_{con}, \ell) = x^{s} \prod_{v \in V_T} z_{f_{gra}(v),\,\ell(v)} \prod_{(v,p) \in dom(f_{nod})} \Big[ y_{p,\,f_{nod}(v,p)}\; z_{f_{nod}(v,p),\,\ell(v,p)} \prod_{(u,p') \in f_{con}(v,p)} y_{f_{nod}(v,p),\,f_{nod}(u,p')} \Big].$$

Given an r-solution, x tracks its score (as in [4]); $\prod_{v \in V_T} z_{f_{gra}(v),\,\ell(v)}$ marks which graphs to choose from P and how to label them; $\prod_{(v,p) \in dom(f_{nod})} y_{p,\,f_{nod}(v,p)}\, z_{f_{nod}(v,p),\,\ell(v,p)}$ specifies how to map nodes in V(P) to nodes in V and how to label nodes in V; and $\prod_{(v,p) \in dom(f_{nod}),\,(u,p') \in f_{con}(v,p)} y_{f_{nod}(v,p),\,f_{nod}(u,p')}$ notes how to connect the graphs chosen from P.

We now claim that correct solutions are associated with unique monomials, and a monomial of an incorrect solution represents an even number of solutions.

Lemma 2. Pairs {sol, sol′} of different solutions in Cor(r, s) satisfy m(sol) ≠ m(sol′), while Sol(r, s) \ Cor(r, s) can be partitioned into pairs {sol, sol′} s.t. m(sol) = m(sol′).

3.1.4 Evaluating the Sum of the Monomials

For each r ∈ V, let $P(r) = \sum_{s \in \{0, \ldots, (|V|+|E|)W\},\ sol \in Sol(r,s)} m(sol)$. We will evaluate these polynomials over $\mathbb{F}_q$, the finite field of order q, where $q = 2^{\lceil \log_2(10(2(k+t)+t)) \rceil}$.

By Lemmas 1 and 2, the input has a solution of score s iff there exists a node r ∈ V such that P(r) has a monomial with an odd coefficient in which the degree of x is s. Since $\mathbb{F}_q$ has characteristic 2, we have the following result.

Lemma 3. The input has a solution of score s iff there is a node r ∈ V such that P(r) has a monomial in which the degree of x is s.

Given A ⊆ L, let $P_A(r) = \sum_{sol \text{ is an } r\text{-solution in which } ima(\ell) \subseteq A} m(sol)$. Using inclusion-exclusion, and since $\mathbb{F}_q$ has characteristic 2, we have that $P(r) = \sum_{A \subseteq L} P_A(r)$. Thus, we can evaluate P(r) by using the following lemma.

Lemma 4. Let A ⊆ L and $a_1, \ldots, a_{ind-1} \in \mathbb{F}_q$. For all r ∈ V, we can evaluate $P_A(r)(x, a_1, \ldots, a_{ind-1})$ (assign values to all variables except x) in time $O(W \log W\, |V|^{tw+O(1)} k^{O(1)})$ and space $O(W |V|^{tw+O(1)} k^{O(1)})$ (using dynamic programming).
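The identity $P(r) = \sum_{A \subseteq L} P_A(r)$ can be checked by brute force on a toy instance. The sketch below is an illustration under stated assumptions, not the dynamic program of Lemma 4: a "solution" is reduced to its labeling alone, the function `weight` is an arbitrary stand-in for m(sol), and addition in characteristic 2 is modeled by XOR on integers. A labeling with image I is counted once for every A with I ⊆ A ⊆ L, that is $2^{|L|-|I|}$ times, which is even unless I = L; hence only the bijective labelings survive.

```python
from itertools import combinations, product

def xor_sum(values):
    """Addition in characteristic 2, modeled here by XOR on integers."""
    acc = 0
    for v in values:
        acc ^= v
    return acc

def weight(labeling):
    """Arbitrary deterministic stand-in for the monomial m(sol) (an assumption)."""
    w = 0
    for pos, lab in enumerate(labeling):
        w = (w * 31 + (pos + 1) * (lab + 7)) % 256
    return w

def sum_over_bijections(n):
    """Left-hand side: sum over labelings of n items that use all n labels."""
    return xor_sum(weight(l) for l in product(range(n), repeat=n)
                   if len(set(l)) == n)

def sum_over_subsets(n):
    """Right-hand side: for every A, sum over labelings whose image lies in A."""
    total = 0
    for size in range(n + 1):
        for A in combinations(range(n), size):
            total ^= xor_sum(weight(l) for l in product(A, repeat=n))
    return total

for n in range(1, 5):
    assert sum_over_bijections(n) == sum_over_subsets(n)
```

The right-hand side never tests bijectivity, which is the point of the sieve: each inner sum ranges over unrestricted labelings into A, the kind of quantity a dynamic program can handle.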

3.1.5 Concluding Procedure SimpleFewSingles

SimpleFewSingles first chooses values from $\mathbb{F}_q$ (see below), to be assigned to all the variables, excluding x, of polynomials of the form $P_A(r)$. It evaluates these polynomials, and thus evaluates polynomials of the form P(r). Finally, it determines the maximum score s of a solution by verifying that at least one evaluation of a polynomial P(r) resulted in a polynomial (whose only variable is x) of degree s. Lemma 4 implies the time and space complexities of SimpleFewSingles, while correctness follows from Lemma 3 and the Schwartz–Zippel lemma [29,32].

Procedure. SimpleFewSingles(P, H, Δ)

1: forall r ∈ V do Sum[r] ⇐ 0. end for
2: select $a_1, \ldots, a_{ind-1} \in \mathbb{F}_q$ independently and uniformly at random.
3: forall A ⊆ L, r ∈ V do Sum[r] ⇐ Sum[r] + $P_A(r)(x, a_1, \ldots, a_{ind-1})$. end for
4: return the maximum value s such that (there exists r ∈ V for which Sum[r] is a nonzero polynomial of degree s), where if no such s exists, reject.

Lemma 5. If there is a solution, then SimpleFewSingles returns OPT with probability ≥ 9/10, and does not return a higher score otherwise; else, it rejects. It uses $O(2^{k+t} W \log W\, |V|^{tw+O(1)} k^{O(1)})$ time and $O(W |V|^{tw+O(1)} k^{O(1)})$ space.
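The role of random evaluation over a field of characteristic 2 can be sketched as follows. This is an illustration under assumptions, not the procedure itself: the field is fixed to GF(2^8) with the reduction polynomial x^8 + x^4 + x^3 + x + 1 (chosen only because it is a well-known irreducible polynomial), and the two toy polynomials are hypothetical. The helper `field_order` computes q as defined in the text; that q is at least 10(2(k+t)+t) is consistent with the 9/10 success probability of Lemma 5 if the relevant degree is at most 2(k+t)+t, which is an assumed reading.

```python
import random

def field_order(k, t):
    """q = 2^ceil(log2(10(2(k+t)+t))), the field order defined in the text."""
    n = 10 * (2 * (k + t) + t)
    return 1 << (n - 1).bit_length()

def gf256_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (an assumed choice)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def cancelling(y1, y2):
    """(y1 + y2)^2 + y1^2 + y2^2: the cross terms pair up, so this is identically 0."""
    s = y1 ^ y2
    return gf256_mul(s, s) ^ gf256_mul(y1, y1) ^ gf256_mul(y2, y2)

def surviving(y1, y2):
    """y1*y2 + y1: a nonzero polynomial of degree 2."""
    return gf256_mul(y1, y2) ^ y1

def looks_nonzero(poly, trials, rng):
    """One-sided test: True only if some random evaluation is nonzero."""
    return any(poly(rng.randrange(256), rng.randrange(256)) for _ in range(trials))

rng = random.Random(2015)
print(field_order(5, 2))                    # toy parameters k = 5, t = 2 give 256
print(looks_nonzero(cancelling, 50, rng))   # False: paired monomials always cancel
print(looks_nonzero(surviving, 50, rng))    # True unless every draw hits a root
```

By the Schwartz–Zippel lemma, a nonzero polynomial of total degree d vanishes on at most a d/q fraction of points, so a single random evaluation errs with probability at most d/q, and the error is one-sided.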

3.2 Procedures FewSingles and ManySingles

FewSingles extends SimpleFewSingles to handle indels. The input is of the form $(n_E, n_P, P, H, \Delta, I_F, I_A, D)$, where $n_E$ and $n_P$ indicate that we seek solutions of exactly $n_E$ edges from E, such that $n_P$ graphs in P are not entirely deleted.

Lemma 6. If there is a solution s.t. $|E_S| = n_E$, where $\{V_S^1, \ldots, V_S^t\}$ includes exactly $n_P$ nonempty sets, then FewSingles returns the maximum score of such a solution with probability ≥ 9/10, and not a higher score otherwise; else, it rejects. It uses $O(2^{k-D+I_A+n_P} W \log W\, |V|^{tw+O(1)} k^{O(1)})$ time and $O(W |V|^{tw+O(1)} k^{O(1)})$ space.

ManySingles, extending [6], efficiently finds solutions of many graphs from $P^*$. Its input is of the form $(n_E, n_{P^*}, n_P, P, H, \Delta, I_F, I_A, D)$, where $n_{P^*}$ indicates that we seek solutions of exactly $n_{P^*}$ graphs from $P^*$. ManySingles assumes that there is a set U ⊆ V satisfying [∀h ∈ U : if $p \in V(P) \setminus V(P^*)$ then Δ(p, h) = −∞] and [∀h ∈ V \ U : if $p \in V(P^*)$ then Δ(p, h) = −∞].

Lemma 7. If there is a solution without alignment insertions from U, satisfying $|E_S| = n_E$, in which $\{V_S^1, \ldots, V_S^t\}$ includes exactly $n_P$ nonempty sets and $n_{P^*}$ one-node sets, then ManySingles returns the maximum score of such a solution with probability ≥ 9/10, and not a higher score otherwise; else, it rejects. It uses $O(2^{k-D+I_A+n_P-n_{P^*}} W \log W\, |V|^{tw+O(1)} k^{O(1)})$ time and $O(W |V|^{tw+O(1)} k^{O(1)})$ space.

4 An Exact Algorithm

We now describe our main algorithm (see below). Exact first manipulates the weights to be nonnegative (Step 1). The variable s, initialized to −∞, holds the highest score found so far, corresponding to the original weights. Exact iterates over all choices for $n_E$, $n_{P^*}$ and $n_P$, specifying the number of edges, graphs from $P^*$ and graphs from P, respectively, in the currently searched solution (Step 2).

For each choice, Exact uses a calculation which determines whether $n_{P^*}$ is “small” or “large” (Step 3), indicating whether it is now preferable (in terms of running time) to call FewSingles or ManySingles. If $n_{P^*}$ is “small”, Exact calls FewSingles to compute the maximum score of a solution complying with $n_E$, $n_{P^*}$ and $n_P$ (Step 4). In this step, the term $v(n_E + k - D)$ is used to correctly compare the score returned by FewSingles with s, since only s corresponds to the original weights. Now, suppose that $n_{P^*}$ is “large”. Before calling ManySingles (Step 11), Exact uses divide-and-color (Steps 6–10) to examine several choices of nodes in V for mapping graphs in $P^*$, and those used for mapping graphs in $P \setminus P^*$. In particular, the number of iterations of Step 6 ensures that, with high probability, Exact examines such a choice that complies with a solution of maximum score. Finally, Exact returns the score s, unless no solution was found, in which case it rejects (Step 15).

Algorithm 2. Exact(P, H, Δ, $I_F$, $I_A$, D)

1: subtract v = min(weights) from every weight, and initialize s ⇐ −∞.
2: for $n_E = 0, \ldots, |E|$, $n_{P^*} = \max\{0, k^* - D\}, \ldots, \min\{k^*, k - D\}$, $n_P = n_{P^*}, \ldots, \min\{t,\ n_{P^*} + (k - D + I_A - n_{P^*})/2\}$ do
3:   if $2^{n_{P^*}} \le \dfrac{(k-D+I_A)^{k-D+I_A}}{n_{P^*}^{\,n_{P^*}}\,(k-D+I_A-n_{P^*})^{k-D+I_A-n_{P^*}}}$ then
4:     if FewSingles($n_E$, $n_P$, P, H, Δ, $I_F$, $I_A$, D) returns $s' > s - v(n_E + k - D)$ then $s ⇐ s' + v(n_E + k - D)$. end if
5:   else
6:     for $10 \cdot \dfrac{(k-D+I_A)^{k-D+I_A}}{n_{P^*}^{\,n_{P^*}}\,(k-D+I_A-n_{P^*})^{k-D+I_A-n_{P^*}}}$ times do
7:       initialize U ⇐ ∅ and λ ⇐ Δ.
8:       forall h ∈ V, with probability $\dfrac{n_{P^*}}{k-D+I_A}$ do add h to U. end for
9:       forall $p \in V(P) \setminus V(P^*)$, h ∈ U do λ(p, h) ⇐ −∞. end for
10:      forall $p \in V(P^*)$, h ∈ V \ U do λ(p, h) ⇐ −∞. end for
11:      if ManySingles($n_E$, $n_{P^*}$, $n_P$, P, H, λ, $I_F$, $I_A$, D) returns $s' > s - v(n_E + k - D)$ then $s ⇐ s' + v(n_E + k - D)$. end if
12:    end for
13:  end if
14: end for
15: if s ≠ −∞ then return s. else reject. end if
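The test in Step 3 and the iteration count in Step 6 can be computed exactly, as in the following sketch. The function names are hypothetical, $0^0$ is read as 1, and rounding the iteration count up to an integer is an assumption (the pseudocode does not say how a fractional count is handled).

```python
import math
from fractions import Fraction

def partition_bound(n, a):
    """n^n / (a^a * (n-a)^(n-a)) as an exact fraction, reading 0^0 as 1."""
    if not 0 <= a <= n:
        raise ValueError("expected 0 <= a <= n")
    return Fraction(n ** n, a ** a * (n - a) ** (n - a))

def prefers_few_singles(k, D, I_A, n_pstar):
    """Step 3: True when n_pstar is 'small', i.e. FewSingles should be called."""
    return 2 ** n_pstar <= partition_bound(k - D + I_A, n_pstar)

def partition_iterations(k, D, I_A, n_pstar):
    """Step 6: number of divide-and-color rounds (rounded up, by assumption)."""
    return math.ceil(10 * partition_bound(k - D + I_A, n_pstar))

def membership_probability(k, D, I_A, n_pstar):
    """Step 8: probability with which each node h is added to U."""
    return Fraction(n_pstar, k - D + I_A)

# Toy parameters (hypothetical): k = 10, D = 0, I_A = 0.
for n_pstar in range(11):
    branch = "FewSingles" if prefers_few_singles(10, 0, 0, n_pstar) else "ManySingles"
    print(n_pstar, branch)
```

The bound equals $1/(p^{a}(1-p)^{n-a})$ for $p = a/n$ with $n = k - D + I_A$ and $a = n_{P^*}$, the reciprocal of the probability that one random choice of U is compatible with a fixed solution, which is why it appears both as the threshold in Step 3 and, scaled by 10, as the repetition count in Step 6.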

Theorem 1. Exact solves PINQI in $O(3.698^{k-D+I_A} W \log W\, |V|^{tw+O(1)} k^{O(1)})$ time and $O(W |V|^{tw+O(1)} k^{O(1)})$ space, handling instances with integer weights, where P is a set of bounded treewidth graphs. Its running time for ANQI is $O^*(2^{k+I_A-D} W)$, and for GMI, $O^*(2^{k-D} W)$.



Using scaling and rounding, we obtain the next result.

Theorem 2. There is an FPT-AS for PINQI, handling instances with nonnegative integer weights, where P is a set of bounded treewidth graphs. It uses $O(3.698^{k-D+I_A} \lceil 1/\varepsilon \rceil \log(\lceil 1/\varepsilon \rceil)\, |V|^{tw+O(1)} k^{O(1)})$ time and $O(\lceil 1/\varepsilon \rceil\, |V|^{tw+O(1)} k^{O(1)})$ space. Its running time for ANQI is $O^*(2^{k+I_A-D} W)$, and for GMI, $O^*(2^{k-D} W)$.

References

1. Alon, N., Yuster, R., Zwick, U.: Color coding. J. Assoc. Comput. Mach. 42(4), 844–856 (1995)

2. Betzler, N., Bevern, R., Fellows, M.R., Komusiewicz, C., Niedermeier, R.: Parameterized algorithmics for finding connected motifs in biological networks. IEEE/ACM Trans. Comput. Biol. Bioinf. 8(5), 1296–1308 (2011)

3. Betzler, N., Fellows, M.R., Komusiewicz, C., Niedermeier, R.: Parameterized algorithms and hardness results for some graph motif problems. In: Ferragina, P., Landau, G.M. (eds.) CPM 2008. LNCS, vol. 5029, pp. 31–43. Springer, Heidelberg (2008)

4. Bjorklund, A.: Determinant sums for undirected hamiltonicity. In: FOCS, pp. 173–182 (2010)

5. Bjorklund, A., Husfeldt, T., Kaski, P., Koivisto, M.: Narrow sieves for parameterized paths and packings. CoRR (2010). arxiv:1007.1161

6. Bjorklund, A., Kaski, P., Kowalik, L.: Probably optimal graph motifs. In: STACS, pp. 20–31 (2013)

7. Blin, G., Sikora, F., Vialette, S.: Querying graphs in protein-protein interactions networks using feedback vertex set. IEEE/ACM Trans. Comput. Biol. Bioinf. 7(4), 628–635 (2010)

8. Bodlaender, H.L.: A linear-time algorithm for finding tree-decompositions of small treewidth. SIAM J. Comput. 25(6), 1305–1317 (1996)

9. Bodlaender, H.L., Koster, A.M.C.A.: Combinatorial optimization on graphs of bounded treewidth. Comput. J. 51(3), 255–269 (2008)

10. Bruckner, S., Huffner, F., Karp, R.M., Shamir, R., Sharan, R.: Topology-free querying of protein interaction networks. J. Comput. Biol. 17(3), 237–252 (2010)

11. Chen, J., Kneis, J., Lu, S., Molle, D., Richter, S., Rossmanith, P., Sze, S., Zhang, F.: Randomized divide-and-conquer: Improved path, matching, and packing algorithms. SIAM J. Comput. 38(6), 2526–2547 (2009)

12. Dondi, R., Fertin, G., Vialette, S.: Maximum motif problem in vertex-colored graphs. In: CPM, pp. 388–401 (2011)

13. Dost, B., Shlomi, T., Gupta, N., Ruppin, E., Bafna, V., Sharan, R.: Qnet: a tool for querying protein interaction networks. J. Comput. Biol. 15(7), 913–925 (2008)

14. Downey, R.G., Fellows, M.R.: Fixed-parameter tractability and completeness II: on completeness for W[1]. Theor. Comput. Sci. 141(1–2), 109–131 (1995)

15. Fellows, M.R., Fertin, G., Hermelin, D., Vialette, S.: Upper and lower bounds for finding connected motifs in vertex-colored graphs. J. Com. Sys. Sci. 77(4), 799–811 (2011)

16. Fionda, V., Palopoli, L.: Biological network querying techniques: Analysis and comparison. J. Comput. Biol. 18(4), 595–625 (2011)

17. Fomin, F.V., Lokshtanov, D., Raman, V., Saurabh, S., Rao, B.V.R.: Faster algorithms for finding and counting subgraphs. J. Com. Sys. Sci. 78(3), 698–706 (2012)

18. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, New York (1979)

19. Guillemot, S., Sikora, F.: Finding and counting vertex-colored subtrees. Algorithmica 65(4), 828–844 (2013)

20. Huffner, F., Wernicke, S., Zichner, T.: Algorithm engineering for color-coding with applications to signaling pathway detection. Algorithmica 52(2), 114–132 (2008)

21. Koutis, I.: Faster algebraic algorithms for path and packing problems. In: Aceto, L., Damgard, I., Goldberg, L.A., Halldorsson, M.M., Ingolfsdottir, A., Walukiewicz, I. (eds.) ICALP 2008, Part I. LNCS, vol. 5125, pp. 575–586. Springer, Heidelberg (2008)

23. Koutis, I., Williams, R.: Limits and applications of group algebras for parameterized problems. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds.) ICALP 2009, Part I. LNCS, vol. 5555, pp. 653–664. Springer, Heidelberg (2009)

24. Lacroix, V., Fernandes, C.G., Sagot, M.F.: Motif search in graphs: Application to metabolic networks. IEEE/ACM Trans. Comput. Biol. Bioinf. 3(4), 360–368 (2006)

25. Misra, N., Philip, G., Raman, V., Saurabh, S., Sikdar, S.: FPT algorithms for connected feedback vertex set. J. Comb. Optim. 24(2), 131–146 (2012)

26. Pinter, R.Y., Rokhlenko, O., Yeger-Lotem, E., Ziv-Ukelson, M.: Alignment of metabolic pathways. Bioinformatics 21(16), 3401–3408 (2005)

27. Pinter, R.Y., Zehavi, M.: Partial information network queries. In: Lecroq, T., Mouchard, L. (eds.) IWOCA 2013. LNCS, vol. 8288, pp. 362–375. Springer, Heidelberg (2013)

28. Pinter, R.Y., Zehavi, M.: Algorithms for topology-free and alignment network queries. J. Discrete Algorithms 27, 29–53 (2014)

29. Schwartz, J.T.: Fast probabilistic algorithms for verification of polynomial identities. J. Assoc. Comput. Mach. 27(4), 701–717 (1980)

30. Shlomi, T., Segal, D., Ruppin, E., Sharan, R.: Qpath: a method for querying pathways in a protein-protein interaction networks. BMC Bioinform. 7, 199 (2006)

31. Sikora, F.: An (almost complete) state of the art around the graph motif problem. Universite Paris-Est Technical reports (2012)

32. Zippel, R.: Probabilistic algorithms for sparse polynomials. In: Ng, K.W. (ed.) EUROSAM 1979. LNCS, vol. 72, pp. 216–226. Springer, Heidelberg (1979)


3.9 Solving Parameterized Problems by Mixing Color Coding-Related Techniques

Meirav Zehavi. Solving Parameterized Problems by Mixing Color Coding-Related Techniques. arxiv.org/abs/1410.5062, 2014.



Solving Parameterized Problems by Mixing Color Coding-Related Techniques

Meirav Zehavi

Department of Computer Science, Technion - Israel Institute of Technology, Haifa 32000

Abstract. In the past two decades, several breakthrough techniques, known as “color coding-related techniques”, have led to the design of extremely fast parameterized algorithms. In this paper, we introduce a family of strategies, that we call “mixing strategies”, for applying these techniques, developing even faster, closer to optimal, parameterized algorithms. Our strategies combine the following novel ideas.

• Mixing narrow sieves and representative sets, two independent color coding-related techniques.
• For certain “disjointness conditions”, improving the best known computation of representative sets.
• Mixing divide-and-color-based preprocessing with the computation mentioned in the previous item, speeding-up standard representative sets-based algorithms.
• Cutting the universe into small pieces in two special manners, one used in the mix mentioned in the previous item, and the other mixed with a non-standard representative sets-based algorithm to improve its running time.

Note that the first item implies that representative sets are relevant to the design of fast randomized parameterized algorithms, and not only deterministic ones. We demonstrate the usefulness of our strategies by obtaining the following results. We first solve the well-studied k-Internal Out-Branching problem in deterministic time $O^*(5.139^k)$ and randomized time $O^*(3.617^k)$, improving upon the previous best deterministic time $O^*(6.855^k)$ and randomized time $O^*(4^k)$. To this end, we establish a relation between “problematic” out-trees and maximum matching computations in graphs. We then present a unified approach to improve the $O^*$ running times of the previous best deterministic algorithms for the classic k-Path, k-Tree, r-Dimensional k-Matching and Graph Motif problems, including their weighted versions, from $O^*(2.619^k)$, $O^*(2.619^k)$, $O^*(2.619^{(r-1)k})$ and $O^*(2.619^{2k})$ to $O^*(2.597^k)$, $O^*(2.597^k)$, $O^*(2.597^{(r-1)k})$ and $O^*(2.597^{2k})$, respectively. Finally, we solve the Weighted 3-Set k-Packing problem in deterministic time $O^*(8.097^k)$, significantly improving upon the previous best $O^*(12.155^k)$ deterministic time.

arXiv:1410.5062v2 [cs.DS] 23 Nov 2014


1 Introduction

Parameterized algorithms solve NP-hard problems by confining the combinatorial explosion to a parameter k. More precisely, a problem is fixed-parameter tractable (FPT) with respect to a parameter k if it can be solved in time $O^*(f(k))$ for some function f, where $O^*$ hides factors polynomial in the input size.

The classic color coding technique, introduced by Alon et al. [1], marked the beginning of a revolution in the design of efficient FPT algorithms, enabling the discovery of the first single exponential time FPT algorithms for many subcases of the Subgraph Isomorphism problem. In the past two decades, three breakthrough techniques improved upon color coding, leading to the development of extremely fast FPT algorithms for many fundamental problems. This includes the combinatorial divide-and-color technique [7], the algebraic multilinear detection technique [28,29,46] which later evolved into the algebraic narrow sieves technique [2,3], and the combinatorial representative sets technique [18].

Divide-and-color was the first technique to allow developing FPT algorithms for weighted problems, both randomized and deterministic, that are faster than those relying on color coding. Representative sets improved upon this technique, making it possible to develop deterministic FPT algorithms for weighted problems that are faster than the randomized ones based on divide-and-color. The fastest FPT algorithms, however, are developed using narrow sieves. Unfortunately, narrow sieves is only known to be relevant to the design of randomized algorithms for unweighted problems.¹

In this paper, we present non-standard strategies for applying these techniques, particularly the representative sets technique, developing improved, closer to optimal, FPT algorithms. Our strategies combine the following novel ideas (see Section 2).

• Mixing narrow sieves and representative sets, previously considered to be two independent color coding-related techniques.
• Under certain “disjointness conditions”, speeding-up the best known computation of representative sets.
• Mixing a preprocessing stage based on divide-and-color with the computation mentioned in the previous item, speeding-up standard representative sets-based algorithms.
• Cutting the universe into small pieces in two special manners, one used in the mix mentioned in the previous item, and the other mixed with a non-standard representative sets-based algorithm to improve its running time by decreasing the size of the partial solutions it computes.

The first item indicates that the representative sets technique is relevant to the design of fast randomized FPT algorithms, and not only deterministic ones. Using our strategies, we develop algorithms for k-Internal Out-Branching (k-IOB), k-Path and related classic problems,² Weighted 3-Set k-Packing ((3, k)-WSP) and P2-Packing, defined as follows.

k-IOB: Given a directed graph G = (V, E) and a parameter k ∈ ℕ, decide if G has an out-branching (i.e., a spanning tree with exactly one node of in-degree 0) with at least k internal nodes (i.e., nodes of out-degree at least 1).

Weighted k-Path: Given a directed graph G = (V, E), a weight function w : E → ℝ, a weight W ∈ ℝ and a parameter k ∈ ℕ, decide if G has a simple directed path on exactly k nodes and of weight at most W.

(3, k)-WSP: Given a universe U, a family S of subsets of size 3 of U, a weight function w : S → ℝ, a weight W ∈ ℝ and a parameter k ∈ ℕ, decide if there is a subfamily S′ ⊆ S of k disjoint sets whose total weight is at least W.

P2-Packing: Given an undirected graph G = (V, E) and a parameter k ∈ ℕ, decide if G has k (node-)disjoint simple paths, each on 3 nodes.
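To make the k-IOB definition concrete, the following sketch checks whether a given arc set certifies a yes-instance. It only verifies a candidate solution and is unrelated to the algorithms developed in the paper; the function names and the toy digraph are assumptions made for illustration.

```python
from collections import defaultdict

def is_out_branching(nodes, arcs, tree_arcs):
    """Check that tree_arcs (a subset of arcs) forms a spanning out-tree."""
    nodes, tree_arcs = set(nodes), set(tree_arcs)
    if not tree_arcs <= set(arcs):
        return False
    indeg = {v: 0 for v in nodes}
    children = defaultdict(list)
    for u, v in tree_arcs:
        if u not in nodes or v not in nodes:
            return False
        indeg[v] += 1
        children[u].append(v)
    roots = [v for v in nodes if indeg[v] == 0]
    # Exactly one root, and every other node has exactly one incoming tree arc.
    if len(roots) != 1 or any(d > 1 for d in indeg.values()):
        return False
    # Every node must be reachable from the root along tree arcs.
    seen, stack = {roots[0]}, [roots[0]]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen == nodes

def internal_nodes(tree_arcs):
    """Nodes of out-degree at least 1 in the tree."""
    return {u for u, _ in tree_arcs}

def certifies_k_iob(nodes, arcs, tree_arcs, k):
    """True if tree_arcs is an out-branching with at least k internal nodes."""
    return is_out_branching(nodes, arcs, tree_arcs) and len(internal_nodes(tree_arcs)) >= k

# Toy digraph (hypothetical).
nodes = [1, 2, 3, 4]
arcs = [(1, 2), (2, 3), (1, 4), (3, 1)]
tree = [(1, 2), (2, 3), (1, 4)]
print(certifies_k_iob(nodes, arcs, tree, 2))  # internal nodes are 1 and 2
```

Verification runs in time linear in the size of the graph; the difficulty of k-IOB lies entirely in finding such a tree.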

¹ More precisely, narrow sieves can be used to solve weighted problems, but the time complexities of the resulting algorithms have exponential dependencies on the length of the input weights.

² Namely k-Path, k-Tree, r-Dimensional k-Matching ((r, k)-DM) and Graph Motif with Deletions (GMD), including their weighted versions.

Reference              Deterministic/Randomized   Variant   Running Time
Prieto et al. [38]     det                        k-IST     $O^*(2^{O(k \log k)})$
Gutin et al. [26]      det                        k-IOB     $O^*(2^{O(k \log k)})$
Cohen et al. [9]       det                        k-IOB     $O^*(55.8^{k})$
Fomin et al. [20]      det                        k-IOB     $O^*(16^{k+o(k)})$
Fomin et al. [19]      det                        k-IST     $O^*(8^{k})$
Shachnai et al. [43]   det                        k-IOB     $O^*(6.855^{k})$
Zehavi [47]            rand                       k-IOB     $O^*(4^{k})$
This paper             det                        k-IOB     $O^*(5.139^{k})$

Table 1. Known FPT algorithms for k-IOB and k-IST.

The k-IOB problem is of interest in database systems [10]. A special case of k-IOB, called k-Internal Spanning Tree (k-IST), asks if a given undirected graph G = (V, E) has a spanning tree with at least k internal nodes. An interesting application of k-IST, for connecting cities with water pipes, is given in [40]. The k-IST problem is NP-hard since it generalizes the Hamiltonian Path problem [23]; thus, k-IOB is also NP-hard. Table 1 presents a summary of known FPT algorithms for k-IOB and k-IST, including our contribution. More details on these problems can be found in the surveys [36,42]. We solve k-IOB in deterministic time $O^*(5.139^k)$ and randomized time $O^*(3.617^k)$, improving upon the previous best deterministic time $O^*(6.855^k)$ [43] and randomized time $O^*(4^k)$ [47]. To this end, we establish a relation between out-trees (i.e., directed trees with exactly one node of in-degree 0, called the root) that have many leaves and paths on 2 nodes. This relation shows how certain partial solutions to k-IOB can be completed in polynomial time by using a computation of a maximum matching in the underlying undirected graph.

The k-Path, k-Tree, (r, k)-DM and GMD problems are well-studied problems, not only in the fieldof parameterized complexity. The (3, k)-DM problem, for example, is listed as one of the six fundamentalNP-complete problems in Garey and Johnson [22]. We present a unified approach to speed-up the O∗ runningtimes of standard representative sets-based (deterministic) algorithms. In particular, this approach can beused to modify the previous best deterministic algorithms for k-Path, k-Tree, (r, k)-DM and GMD, includ-ing their weighted variants, which run in times O∗(2.619k) [21,43], O∗(2.619k) [21,43], O∗(2.619(r−1)k) [25]and O∗(2.6192k) [37], to run in deterministic times O∗(2.597k), O∗(2.597k), O∗(2.597(r−1)k) and O∗(2.5972k),respectively. We choose to demonstrate our approach using the Weighted k-Path problem, since obtainingan O∗(2k) time deterministic algorithm for this problem is considered to be a major open problem in thefield of parameterized complexity (here we give another step in this direction).

The (3, k)-WSP problem is another well-known problem considered in this paper. In the past decade, itenjoyed a race towards obtaining the fastest FPT algorithm that solves it. Table 2 presents a summary ofknown FPT algorithms for (3, k)-WSP, including our contribution. We solve (3, k)-WSP in deterministictime O∗(8.097k), significantly improving upon the previous best deterministic time O∗(12.155k) [25]. TheP2-Packing problem is a special case of (3, k)-WSP, for which specialized FPT algorithms are given in[15,16,17,39]. Feng et al. [16] give a randomized algorithm for P2-Packing, including its weighted version,that runs in time O∗(6.75k), and Feng et al. [15] give a corresponding deterministic version that runs intime O∗(8k+o(k)). We first observe that the algorithms of [15,16] can be modified to solve P2-Packingin deterministic time O∗(6.75k+o(k)). We then give an alternative algorithm that solves P2-Packing indeterministic time O∗(6.777k).

2 Color Coding-Related Techniques

In this paper, we use a known algorithm (of [47]) that is based on narrow sieves as a black box, and thus we do not describe this technique. We proceed by giving a brief description of the divide-and-color technique, followed by a more detailed description of the representative sets technique. We then give a high-level overview of our strategies for mixing and applying the narrow sieves, divide-and-color and representative sets techniques when solving the problems mentioned in the introduction, including references to the relevant sections.

Reference              Rand./Det.  Variant      Running Time
Chen et al. [6]        det         (3, p)-SP    $O^*(k^{O(k)})$
Downey et al. [11]     det         (3, p)-WSP   $O^*(k^{O(k)})$
Fellows et al. [14]    det         (3, p)-WSP   $O^*(2^{O(k)})$
Liu et al. [30]        det         (3, p)-WSP   $O^*(2{,}097.152^k)$
Koutis [27]            det         (3, p)-SP    $O^*(2^{O(k)})$
Koutis [27]            rand        (3, p)-SP    $O^*(1{,}285.475^k)$
Wang et al. [44]       det         (3, p)-WSP   $O^*(432.082^k)$
Liu et al. [31]        det         (3, p)-SP    $O^*(97.973^k)$
Chen et al. [7]        det         (3, p)-WSP   $O^*(64^{k+o(k)})$
Chen et al. [7]        rand        (3, p)-WSP   $O^*(16^{k+o(k)})$
Wang et al. [45]       det         (3, p)-SP    $O^*(43.615^k)$
Chen et al. [5]        det         (3, p)-WSP   $O^*(32^{k+o(k)})$
Goyal et al. [25]      det         (3, p)-WSP   $O^*(12.155^k)$
Koutis [28]            rand        (3, p)-SP    $O^*(8^k)$
Björklund et al. [3]   rand        (3, p)-SP    $O^*(3.344^k)$
This paper             det         (3, p)-WSP   $O^*(8.097^k)$

Table 2. Known FPT algorithms for (3, k)-WSP and its unweighted variant (3, k)-SP.

Divide-and-Color: Divide-and-color is based on recursion, where at each step we color elements, randomly or deterministically. In our strategies, we are interested in applying only one step of this technique, which can be viewed as applying the color coding technique with only two colors. In such a step, we have a set $A$ of $n$ elements, and we seek a certain subset $A^*$ of $k$ elements in $A$. We partition $A$ into two (disjoint) sets, $B$ and $C$, by coloring its elements. Thus, we get the problem of finding a subset $B^* \subseteq A^*$ in $B$, and another problem of finding the subset $C^* = A^* \setminus B^*$ in $C$. The partition should be done in a manner that is both efficient and results in an easier problem, which does not necessarily mean that we get two independent problems (of finding $B^*$ in $B$ and $C^*$ in $C$). Deterministic applications of divide-and-color often use a tool called an (n, k)-universal set [35]. Here, however, we need the following generalization of this tool:

Definition 1. Let $\mathcal{F}$ be a set of functions $f : \{1, 2, \ldots, n\} \to \{0, 1\}$. We say that $\mathcal{F}$ is an (n, k, p)-universal set if for every subset $I \subseteq \{1, 2, \ldots, n\}$ of size $k$ and every function $f' : I \to \{0, 1\}$ that assigns '1' to exactly $p$ indices, there is a function $f \in \mathcal{F}$ such that $f(i) = f'(i)$ for all $i \in I$.
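To make Definition 1 concrete, the following is a minimal brute-force sketch that checks whether a given family is (n, k, p)-universal. It is only an illustration for tiny parameters, not the construction behind Theorem 1; the function name `is_universal` and the encoding of each function as a tuple of bits are assumptions made here.

```python
from itertools import combinations, product


def is_universal(family, n, k, p):
    """Brute-force check of Definition 1 (exponential; tiny n only).

    Each function f: {1..n} -> {0,1} is encoded (an assumption of this
    sketch) as a tuple of n bits, where f[i] is the value at index i+1.
    """
    for I in combinations(range(n), k):
        # Every 0/1 pattern on I with exactly p ones must be realized.
        for ones in combinations(I, p):
            ones = set(ones)
            target = tuple(1 if i in ones else 0 for i in I)
            if not any(tuple(f[i] for i in I) == target for f in family):
                return False
    return True


# The family of all 2^n functions is trivially universal.
all_functions = list(product((0, 1), repeat=4))
unit_vectors = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
```

For example, `is_universal(unit_vectors, 3, 3, 1)` holds because every pattern on three indices with a single '1' is one of the three unit vectors, while a family containing only the all-zero function fails the check.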

The next result (of [18]) asserts that small universal sets can be computed efficiently.

Theorem 1. There is an algorithm that, given integers $n$, $k$ and $p$, computes an (n, k, p)-universal set $\mathcal{F}$ of size $O^*\big(\binom{k}{p} 2^{o(k)}\big)$ in deterministic time $O^*\big(\binom{k}{p} 2^{o(k)}\big)$.

Representative Sets: We first give the formal definition of a representative family, and then discuss its relevance to the design of FPT algorithms. We are only concerned with this definition with respect to uniform matroids, in which case we avoid using the term matroid. Information on the more general definition can be found in, e.g., [18,33].

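Definition 2. Given a universe $E$, a family $\mathcal{S}$ of subsets of size $p$ of $E$, a function $w : \mathcal{S} \to \mathbb{R}$ and a parameter $k \in \mathbb{N}$, we say that a subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ max (min) $(k-p)$-represents $\mathcal{S}$ if for every pair of sets $X \in \mathcal{S}$ and $Y \subseteq E \setminus X$ such that $|Y| \le k - p$, there is a set $\widehat{X} \in \widehat{\mathcal{S}}$ disjoint from $Y$ such that $w(\widehat{X}) \ge w(X)$ ($w(\widehat{X}) \le w(X)$).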

Roughly speaking, Definition 2 implies that if a set $Y$ can be extended to a set of size at most $k$ by adding a set $X$ from $\mathcal{S}$, then it can also be extended to a set of the same size by adding a set $\widehat{X}$ from $\widehat{\mathcal{S}}$ that is at least as good as $X$. The special case where $w(S) = 0$, for all $S \in \mathcal{S}$, is the unweighted version of the definition.
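The condition in Definition 2 can be tested directly on small instances. The sketch below is a brute-force checker for the max version, written only to illustrate the definition; the name `max_represents` and the use of frozensets are assumptions of this sketch, and its running time is exponential.

```python
from itertools import combinations


def max_represents(sub, fam, universe, k, p, w):
    """Brute-force check of Definition 2 (max version), tiny inputs only.

    `fam` and `sub` are collections of frozensets of size p, `w` maps a
    set to its weight, and `sub` is assumed to be a subfamily of `fam`.
    """
    for X in fam:
        rest = [e for e in universe if e not in X]
        for size in range(k - p + 1):
            for Y in combinations(rest, size):
                Y = set(Y)
                # Some set in `sub` must avoid Y and weigh at least w(X).
                if not any(not (Xh & Y) and w(Xh) >= w(X) for Xh in sub):
                    return False
    return True


universe = {1, 2, 3, 4}
singletons = [frozenset({e}) for e in sorted(universe)]
unweighted = lambda s: 0
```

With $p = 1$ and $k = 2$, the two sets `singletons[:2]` already 1-represent all four singletons (any single forbidden element leaves one of them available), whereas a single set does not.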

Many FPT algorithms are based on dynamic programming, where after each stage, the algorithm computes a family $\mathcal{S}$ of sets that are partial solutions. At that point we can compute a subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ that represents $\mathcal{S}$. Then, each reference to $\mathcal{S}$ can be replaced by a reference to $\widehat{\mathcal{S}}$. The representative family $\widehat{\mathcal{S}}$ contains "enough" sets from $\mathcal{S}$; therefore, such a replacement preserves the correctness of the algorithm. Thus, if we can efficiently compute representative families that are small enough, we can substantially improve the running time of the algorithm.

The Two Families Theorem of Bollobás [4] implies that for any universe $E$, family $\mathcal{S}$ of subsets of size $p$ of $E$ and parameter $k$ ($\ge p$), there is a subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ of size $\binom{k}{p}$ that $(k-p)$-represents $\mathcal{S}$. Monien [34] computed representative families of size $\sum_{i=0}^{k-p} p^i$ in time $O\big(|\mathcal{S}|\, p (k-p) \sum_{i=0}^{k-p} p^i\big)$, and Marx [32] computed representative families of size $\binom{k}{p}$ in time $O(|\mathcal{S}|^2 p^{k-p})$. Recently, Fomin et al. [18] introduced a powerful technique that enables computing representative families of size $\binom{k}{p} 2^{o(k)} \log |E|$ in time $O\big(|\mathcal{S}| (k/(k-p))^{k-p} 2^{o(k)} \log |E|\big)$. Here we need the following tradeoff-based generalization of their computation, given in [43] (see also [21]):

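Theorem 2. Given a parameter $c \ge 1$, a universe $E$, a family $\mathcal{S}$ of subsets of size $p$ of $E$, a function $w : \mathcal{S} \to \mathbb{R}$ and a parameter $k \in \mathbb{N}$, a subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ of size $\frac{(ck)^k}{p^p (ck-p)^{k-p}}\, 2^{o(k)} \log |E|$ that max (min) $(k-p)$-represents $\mathcal{S}$ can be found in time $O\big(|\mathcal{S}| (ck/(ck-p))^{k-p}\, 2^{o(k)} \log |E| + |\mathcal{S}| \log |\mathcal{S}|\big)$.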
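The role of the parameter $c$ in Theorem 2 is a tradeoff: a larger $c$ yields a larger representative family but a smaller time factor per set. The following sketch evaluates the two dominant factors in $\log_2$ scale, ignoring the $2^{o(k)} \log |E|$ terms; the function names are chosen here for illustration and it assumes $0 < p < ck$.

```python
import math


def log2_size_factor(c, k, p):
    """log2 of (ck)^k / (p^p (ck-p)^(k-p)), the size factor in Theorem 2."""
    return (k * math.log2(c * k)
            - p * math.log2(p)
            - (k - p) * math.log2(c * k - p))


def log2_time_factor(c, k, p):
    """log2 of (ck/(ck-p))^(k-p), the per-set time factor in Theorem 2."""
    return (k - p) * (math.log2(c * k) - math.log2(c * k - p))


# Example with k = 20 and p = 10: moving from c = 1 to c = 2 increases
# the size factor and decreases the time factor.
for c in (1.0, 1.5, 2.0):
    print(c, log2_size_factor(c, 20, 10), log2_time_factor(c, 20, 10))
```

For $c = 1$, $k = 20$ and $p = 10$ the size factor is exactly $2^{20}$ and the time factor is $2^{10}$, which matches the bounds of [18] quoted above up to lower-order terms.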

2.1 Mixing Strategies

Our first algorithm, a deterministic FPT algorithm for k-IOB, follows the strategy illustrated in Figure 1(I) (see Appendix A). The first reduction (of [9]) allows us to focus on finding a small out-tree rather than an out-branching, while the second reduction allows us to focus on finding an even smaller out-tree, but we need to find this out-tree along with a set of paths on 2 nodes (see Section 3.1). We then use a representative sets-based procedure in a non-standard manner, which does not directly solve the problem, but returns a family of partial solutions. We can then try to extend each partial solution to a solution in polynomial time (see Section 3.2). This is somewhat similar to an execution of a bounded search tree-based procedure (see [12]), where the leaves of the search tree may contain partial solutions that we try to extend to solutions in polynomial time.

Our second algorithm, which is a randomized FPT algorithm for k-IOB, builds upon our first algorithm and follows the strategy illustrated in Figure 1(II). In particular, this strategy shows the usefulness of mixing narrow sieves and representative sets (see Section 3.3), which were previously considered to be two independent tools for developing FPT algorithms. This strategy indicates that the representative sets technique is relevant to the design of fast randomized FPT algorithms, even for unweighted problems.

Our third algorithm, which is a deterministic FPT algorithm for Weighted k-Path, follows the strategy illustrated in Figure 1(III). This strategy can be used to obtain faster algorithms for other problems solved using a standard application of representative sets.³ Our improvement relies on the following generalization of Definition 2 and Theorem 2:

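Definition 3. Let $E_1, E_2, \ldots, E_t$ be disjoint universes of elements, $p_1, p_2, \ldots, p_t \in \mathbb{N}$, and $\mathcal{S}$ be a family of subsets of $\bigcup_{i=1}^{t} E_i$ such that $|S \cap E_i| = p_i$ for all $S \in \mathcal{S}$ and $i \in \{1, 2, \ldots, t\}$. Given a function $w : \mathcal{S} \to \mathbb{R}$ and parameters $k_1, k_2, \ldots, k_t \in \mathbb{N}$, we say that a subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ max (min) $(k_1 - p_1, k_2 - p_2, \ldots, k_t - p_t)$-represents $\mathcal{S}$ if for every pair of sets $X \in \mathcal{S}$ and $Y \subseteq \big(\bigcup_{i=1}^{t} E_i\big) \setminus X$ such that $|Y \cap E_i| \le k_i - p_i$ for all $i \in \{1, 2, \ldots, t\}$, there is a set $\widehat{X} \in \widehat{\mathcal{S}}$ disjoint from $Y$ such that $w(\widehat{X}) \ge w(X)$ ($w(\widehat{X}) \le w(X)$).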

Theorem 3. Given parameters $c_1, c_2, \ldots, c_t \ge 1$, and $E_1, E_2, \ldots, E_t$, $p_1, p_2, \ldots, p_t$, $\mathcal{S}$, $w$ and $k_1, k_2, \ldots, k_t$ as in Definition 3, a subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ of size

$$\prod_{i=1}^{t} \left( \frac{(c_i k_i)^{k_i}}{p_i^{p_i} (c_i k_i - p_i)^{k_i - p_i}}\, 2^{o(k_i)} \log |E_i| \right)$$

that max (min) $(k_1 - p_1, k_2 - p_2, \ldots, k_t - p_t)$-represents $\mathcal{S}$ can be found in time bounded by

$$O\left( |\mathcal{S}| \prod_{i=1}^{t} \left( \left( \frac{c_i k_i}{c_i k_i - p_i} \right)^{k_i - p_i} 2^{o(k_i)} \log |E_i| \right) + |\mathcal{S}| \log |\mathcal{S}| \right).$$

³ The term standard refers to the k-Path, k-Tree, (r, k)-DM and GMD algorithms of [21,25,37,43], which use representative sets in a similar manner: (1) elements are never deleted from partial solutions (this is not the case in our algorithm and the previous algorithm of [25] for (3, k)-WSP), and (2) the solution is one of the computed representative sets (this is not the case in the algorithm for Long Directed Cycle of [18], and our algorithms for k-IOB and P2-Packing).

We prove this theorem in Section 4.1, and use it to solve a subcase of Weighted k-Path in Section 4.2. We translate Weighted k-Path to this subcase via divide-and-color preprocessing, combined with a technique that we call balanced cutting of the universe (see Section 4.3).

Our fourth and fifth algorithms, which are deterministic FPT algorithms for (3, k)-WSP and P2-Packing, follow the strategies illustrated in Figures 1(IV) and 1(V), respectively. Here we also cut the universe into small parts (see Appendices H and I), though in a different manner, which allows us to delete more elements from partial solutions than [25]. We call this technique unbalanced cutting of the universe. Our algorithm for P2-Packing is also based on the iterative compression technique [41], relying on a result of [17]. Roughly speaking, applying iterative compression means that one solves a variant of the problem where we are given a (partial) solution of size $t - 1$, and need to find a solution of size $t$. Then, the general problem can be solved by running the specialized algorithm $k$ times, iteratively increasing the value of $t$ from 1 to $k$. We note that iterative compression has already been combined with representative sets in [25] (to solve (3, k)-DM).
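The outer loop of iterative compression described above can be sketched generically. The driver below takes a hypothetical problem-specific routine `grow` (a name introduced here) that turns a solution of size $t-1$ into one of size $t$. As a toy stand-in for the specialized algorithm, the example uses undirected matching rather than P2-Packing: by Berge's augmenting path argument, if a matching of size $t$ exists, then one exists that covers all nodes of any given matching of size $t-1$, so the brute-force step may restrict its search accordingly.

```python
from itertools import combinations


def iterative_compression(k, grow):
    """Grow a solution of size t-1 into one of size t, for t = 1..k.

    `grow(solution, t)` is a hypothetical problem-specific routine that
    returns a solution of size t, or None if none exists. Returning None
    early assumes that a size-k solution implies a size-t one for t <= k.
    """
    solution = []
    for t in range(1, k + 1):
        solution = grow(solution, t)
        if solution is None:
            return None
    return solution


def make_matching_grower(edges):
    """Toy `grow` for matchings (brute force, tiny graphs only)."""
    def grow(solution, t):
        covered = {v for e in solution for v in e}
        for cand in combinations(edges, t):
            nodes = [v for e in cand for v in e]
            # Keep only matchings that cover the previous solution's nodes.
            if len(nodes) == len(set(nodes)) and covered <= set(nodes):
                return list(cand)
        return None
    return grow


path_edges = [(2, 3), (1, 2), (3, 4)]
print(iterative_compression(2, make_matching_grower(path_edges)))
```

The algorithms discussed in this paper exploit the given solution of size $t-1$ in a far more refined way; the sketch only shows the control flow of the technique.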

3 Solving the k-IOB Problem

In this section we develop FPT algorithms for k-IOB, following the first two strategies in Section 2.1.

3.1 From Out-Branchings to Small Out-Trees and Paths on Two Nodes

We first define the $(k_a, k_b, \ell_a, \ell_b)$-Internal Out-Tree ($(k_a, k_b, \ell_a, \ell_b)$-IOT) problem, which requires finding an out-tree rather than an out-branching. Given a directed graph $G = (V, E)$, a node $r \in V$ that is the root of an out-branching of $G$, and parameters $k_a \le k_b$ and $\ell_a \le \ell_b$, this problem asks if $G$ has an out-tree $T$ rooted at $r$ that contains exactly $x$ internal nodes and $y$ leaves, for some $k_a \le x \le k_b$ and $\ell_a \le y \le \ell_b$. A discussion given in [9] directly implies the correctness of the following observations (see also [47]):

Observation 1. If $(k, k, 1, k)$-IOT can be solved in (deterministic) time $\tau(G, k)$, then k-IOB can be solved in (deterministic) time $O(|V|(|E| + \tau(G, k)))$.

Observation 2. An input $(G, r, k, k, 1, k)$ is a yes-instance of $(k, k, 1, k)$-IOT iff $(G, r, k, |V|, 1, |V|)$ is a yes-instance of $(k, |V|, 1, |V|)$-IOT.

We next show that we can focus on finding a small out-tree along with a set of paths on 2 nodes. To this end, we define a new problem, called $(k, \ell, q)$-Tree&Paths ($(k, \ell, q)$-T&P). Given a directed graph $G = (V, E)$, a node $r \in V$ that is the root of an out-branching of $G$, and parameters $k$, $\ell \le k$ and $q \ge \max\{0, 2\ell - k\}$, this problem asks if $G$ has an out-tree $T$ rooted at $r$ that contains exactly $(k - q)$ internal nodes and $(\ell - q)$ leaves, along with $q$ (node-)disjoint paths, each on two nodes, that do not contain any node from $T$.

Using Observations 1 and 2, we prove the following lemma in Appendix B:

Lemma 1. If $(k, k, \ell, \ell)$-IOT can be solved in (randomized/deterministic) time $\alpha(G, k, \ell)$ and $(k, \ell, q)$-T&P can be solved in (rand./det.) time $\beta(G, k, \ell, q)$, then k-IOB can be solved in (rand./det.) time

$$O\Big( |V| \Big( |E| + \sum_{\ell=1}^{k} \min\Big\{ \alpha(G, k, \ell),\ \sum_{q=\max\{0,\, 2\ell-k\}} \beta(G, k-q, \ell-q, q) \Big\} \Big) \Big).$$

To obtain our deterministic algorithm for k-IOB, we only need the following corollary:

Corollary 1. If $(k, \ell, q)$-T&P can be solved in (rand./det.) time $\beta(G, k, \ell, q)$, then k-IOB can be solved in (rand./det.) time

$$O\Big( |V| \Big( |E| + \sum_{\ell=1}^{k}\ \sum_{q=\max\{0,\, 2\ell-k\}} \beta(G, k-q, \ell-q, q) \Big) \Big).$$



3.2 A Deterministic Algorithm for k-IOB

Let $(G, r, k, \ell, q)$ be an instance of $(k, \ell, q)$-T&P. Moreover, let $\mathcal{T}_{x,y}$ denote the family that includes every set of nodes that is the node-set of an out-tree in $G$ that is rooted at $r$ and contains exactly $x$ internal nodes and $y$ leaves. The paper [43] gives a (deterministic) representative sets-based procedure, TreeAlg1$(G, r, k, \ell, q, c)$, which satisfies the following:

Lemma 2. For any fixed $c \ge 1$ and $\varepsilon > 0$, TreeAlg1 computes a family that $z$-represents $\mathcal{T}_{x,y}$ in time

$$O^*\left( \sum_{i=0}^{x+y} \frac{(c(x+y+z))^{2x+2y+2z-i}}{i^i\, (c(x+y+z) - i)^{2x+2y+2z-2i}}\ 2^{\varepsilon(x+y+z)} \right).$$

Denoting $x = (k - q)$, $y = (\ell - q)$, $z = 2q$ and $\mathcal{T} = \mathcal{T}_{x,y}$, we have the following corollary:

Corollary 2. For any fixed $c \ge 1$ and $\varepsilon > 0$, TreeAlg1 computes a family that $(2q)$-represents $\mathcal{T}$ in time

$$O^*\left( \sum_{i=0}^{k+\ell-2q} \frac{(c(k+\ell))^{2k+2\ell-i}}{i^i\, (c(k+\ell) - i)^{2k+2\ell-2i}}\ 2^{\varepsilon(k+\ell)} \right).$$

Consider the following deterministic algorithm T&PAlg for $(k, \ell, q)$-T&P, where $\varepsilon = \frac{1}{10^{10}}$.

Algorithm 1 T&PAlg($G = (V, E)$, $r$, $k$, $\ell$, $q$, $c$)

1: $\widehat{\mathcal{T}}$ ⇐ TreeAlg1($G, r, k, \ell, q, c$).
2: for all $X \in \widehat{\mathcal{T}}$ do
3:   compute (in polynomial time) a maximum matching $M$ in the underlying undirected graph of $G$ from which we remove the nodes in $X$ and the adjacent edges (see [13]).
4:   if $|M| \ge q$ then accept. end if
5: end for
6: reject.
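The loop of T&PAlg can be sketched as follows, under two loudly stated assumptions: the family returned by TreeAlg1 is taken as a given input (TreeAlg1 itself is not implemented here), and the polynomial-time maximum matching algorithm of [13] is swapped for a simple exhaustive branching that is only suitable for tiny graphs. The names `max_matching_size` and `tp_alg` are introduced for this illustration.

```python
def max_matching_size(edges):
    """Size of a maximum matching by exhaustive branching (tiny graphs).

    Stand-in for the polynomial-time algorithm of [13].
    """
    if not edges:
        return 0
    (u, v), rest = edges[0], edges[1:]
    # Either skip edge (u, v), or take it and drop edges touching u or v.
    skip = max_matching_size(rest)
    take = 1 + max_matching_size(
        [e for e in rest if u not in e and v not in e])
    return max(skip, take)


def tp_alg(arcs, family, q):
    """Accept iff removing some X in `family` leaves a matching of size q.

    `arcs` are directed pairs (u, v); `family` plays the role of the
    output of TreeAlg1 (node-sets of candidate out-trees), assumed given.
    """
    for X in family:
        # Underlying undirected graph of G without the nodes of X.
        remaining = {frozenset(a) for a in arcs
                     if a[0] != a[1] and a[0] not in X and a[1] not in X}
        if max_matching_size([tuple(e) for e in remaining]) >= q:
            return True
    return False


arcs = [("r", "a"), ("a", "b"), ("c", "d"), ("d", "c"), ("e", "f")]
print(tp_alg(arcs, [{"r", "a", "b"}], 2))
```

In the example, removing the tree nodes {r, a, b} leaves the undirected edges {c, d} and {e, f}, which form two disjoint paths on two nodes, so the instance with $q = 2$ is accepted.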

Lemma 3. T&PAlg solves $(k, \ell, q)$-T&P in time

$$O^*\left( \sum_{i=0}^{k+\ell-2q} \frac{(c(k+\ell))^{2k+2\ell-i}}{i^i\, (c(k+\ell) - i)^{2k+2\ell-2i}}\ 2^{\frac{k+\ell}{10^{10}}} \right).$$

Proof. By the pseudocode and Corollary 2, T&PAlg runs in the desired time. Moreover, if it accepts, the input is clearly a yes-instance. Now, suppose that the input is a yes-instance, and let $T$ and $P_1, P_2, \ldots, P_q$ be a solution for it. By Corollary 2, $\widehat{\mathcal{T}}$ $(2q)$-represents $\mathcal{T}$. Therefore, there exists $X \in \widehat{\mathcal{T}}$ that does not contain any node that belongs to $P_1, P_2, \ldots, P_q$. Thus, the underlying undirected graph of $G$ from which we remove the nodes in $X$ and the adjacent edges contains a maximum matching of size at least $q$. By the pseudocode, this implies that T&PAlg accepts. □

Upper bounds for

$$O^*\left( \sum_{\ell=1}^{k}\ \sum_{q=\max\{0,\, 2\ell-k\}}\ \sum_{i=0}^{k+\ell-2q} \frac{(c(k+\ell))^{2k+2\ell-i}}{i^i\, (c(k+\ell) - i)^{2k+2\ell-2i}}\ 2^{\frac{k+\ell}{10^{10}}} \right),$$

considering different parameters $c \ge 1$, are given in Appendix C. In particular, by choosing $c = 1.497$, we get the bound $O^*(5.139^k)$. Thus, by Corollary 1, we have the following theorem:

Theorem 4. k-IOB can be solved in deterministic time $O^*(5.139^k)$.

3.3 A Randomized Algorithm for k-IOB

The paper [47] gives a (randomized) narrow sieves-based procedure, TreeAlg2, which satisfies the following:

Lemma 4. TreeAlg2 solves $(k, k, \ell, \ell)$-IOT in time $O^*(2^{k+\ell})$.



Note that $O^*\big( \sum_{\ell=1}^{0.8545k} 2^{k+\ell} \big) = O^*(3.617^k)$. Furthermore, we show (in Appendix C) that

$$O^*\left( \sum_{\ell=0.8545k}^{k}\ \sum_{q=\max\{0,\, 2\ell-k\}}\ \sum_{i=0}^{k+\ell-2q} \frac{(1.765(k+\ell))^{2k+2\ell-i}}{i^i\, (1.765(k+\ell) - i)^{2k+2\ell-2i}}\ 2^{\frac{k+\ell}{10^{10}}} \right) = O^*(3.617^k).$$

Therefore, by Lemmas 1, 3 (setting $c = 1.765$) and 4, we have the following theorem:

Theorem 5. k-IOB can be solved in randomized time $O^*(3.617^k)$.
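The first half of the bound behind Theorem 5 is plain arithmetic: the geometric sum $\sum_{\ell=1}^{0.8545k} 2^{k+\ell}$ is below $2 \cdot 2^{(1+0.8545)k}$, and $2^{1.8545} < 3.617$. The short check below illustrates this numerically for one value of $k$; the helper name `narrow_sieves_sum` is introduced here and the rounding of $0.8545k$ down to an integer is an assumption of the sketch.

```python
import math

SPLIT = 0.8545               # threshold on l/k used in the text
BASE = 2 ** (1 + SPLIT)      # 2^(k+l) at l = SPLIT*k equals BASE^k


def narrow_sieves_sum(k, split=SPLIT):
    """Exact value of sum_{l=1}^{floor(split*k)} 2^(k+l)."""
    top = math.floor(split * k)
    return sum(2 ** (k + l) for l in range(1, top + 1))


# The sum is dominated by its last term, so it stays below 2 * 3.617^k.
k = 200
print(BASE, narrow_sieves_sum(k) < 2 * 3.617 ** k)
```

The second half, covering $\ell \ge 0.8545k$ with the representative sets-based bound of Lemma 3, is the part that requires the analysis of Appendix C and is not reproduced here.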

4 Solving Weighted k-Path and Related Problems

In this section we present a technique for speeding up standard representative sets-based algorithms, following the third strategy in Section 2.1. This technique is demonstrated using the Weighted k-Path problem.

4.1 Computing Generalized Representative Sets

We now prove the correctness of Theorem 3 (see Section 2.1), generalizing the computation of representative sets that is given in [43], which heavily relies on [18].

Proof. We start by giving the definition of a data structure necessary to our computation of representative families. Let $E'$ be a universe of $n'$ elements, and suppose that $k', p' \in \mathbb{N}$ are parameters such that $p' \le k'$. Rephrasing Definition 1, we say that a family $\mathcal{F} \subseteq 2^{E'}$ is $(E', k', p')$-good if it satisfies the following condition: For every pair of sets $X \subseteq E'$ of size $p'$ and $Y \subseteq E' \setminus X$ of size at most $k' - p'$, there is a set $F \in \mathcal{F}$ such that $X \subseteq F$ and $Y \cap F = \emptyset$. An $(E', k', p')$-separator is a data structure containing such a family $\mathcal{F}$, which, given a set $S \subseteq E'$ of size $p'$, outputs the subfamily of sets in $\mathcal{F}$ that contain $S$, i.e., $\chi(S) = \{F \in \mathcal{F} : S \subseteq F\}$. The efficiency of such a data structure is measured by the following parameters: $\zeta = \zeta(E', k', p')$, the number of sets in the family $\mathcal{F}$; $\tau_I = \tau_I(E', k', p')$, the time required to compute the family $\mathcal{F}$; and $\tau_Q = \tau_Q(E', k', p')$, an upper bound for the time required to output $\chi(X)$, for any $X \subseteq E'$ of size $p'$.

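For any fixed $c' \ge 1$, the paper [43] shows how to construct $D^{c'}_{(E',k',p')}$, an $(E', k', p')$-separator for which

$$\zeta \le \frac{(c'k')^{k'}}{p'^{p'} (c'k' - p')^{k'-p'}}\, 2^{o(k')} \log n', \qquad \tau_I \le \frac{(c'k')^{k'}}{p'^{p'} (c'k' - p')^{k'-p'}}\, 2^{o(k')}\, n' \log n' \qquad \text{and} \qquad \tau_Q \le \left( \frac{c'k'}{c'k' - p'} \right)^{k'-p'} 2^{o(k')} \log n'.$$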

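Algorithm 2 GenRepAlg($c_1, c_2, \ldots, c_t$, $E_1, E_2, \ldots, E_t$, $p_1, p_2, \ldots, p_t$, $\mathcal{S}$, $w$, $k_1, k_2, \ldots, k_t$)

1: for $i = 1, 2, \ldots, t$ do
2:   construct $D^{c_i}_{(E_i,k_i,p_i)}$, and let $\mathcal{F}_i$ be the family it stores.
3:   for all $S \in \mathcal{S}$ do use $D^{c_i}_{(E_i,k_i,p_i)}$ to compute $\chi_i(S) = \{F_i \in \mathcal{F}_i : S \cap E_i \subseteq F_i\}$. end for
4: end for
5: compute $\mathcal{F} = \{F_1 \cup F_2 \cup \ldots \cup F_t : F_1 \in \mathcal{F}_1, F_2 \in \mathcal{F}_2, \ldots, F_t \in \mathcal{F}_t\}$.
6: for all $S \in \mathcal{S}$ do compute $\chi(S) = \{F \in \mathcal{F} : (\forall i \in \{1, 2, \ldots, t\} : F \cap E_i \in \chi_i(S))\}$. end for
7: order $\mathcal{S} = \{S_1, \ldots, S_{|\mathcal{S}|}\}$ such that $w(S_{i-1}) \ge w(S_i)$ ($w(S_{i-1}) \le w(S_i)$), for all $2 \le i \le |\mathcal{S}|$.
8: initialize $\widehat{\mathcal{S}}$ ⇐ $\emptyset$.
9: for all $F \in \mathcal{F}$ do let $z_F \in \{0, 1\}$ be an indicator variable for using $F$, initialized to 0. end for
10: for $i = 1, 2, \ldots, |\mathcal{S}|$ do
11:   compute $\chi_0(S_i) = \{F \in \chi(S_i) : z_F = 0\}$.
12:   if $\chi_0(S_i) \ne \emptyset$ then
13:     add $S_i$ to $\widehat{\mathcal{S}}$.
14:     for all $F \in \chi_0(S_i)$ do assign $z_F = 1$. end for
15:   end if
16: end for
17: return $\widehat{\mathcal{S}}$.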

Denote $E = \bigcup_{i=1}^{t} E_i$, $k = \sum_{i=1}^{t} k_i$ and $p = \sum_{i=1}^{t} p_i$. We can clearly assume that $|\mathcal{S}|$ is larger than the size of the desired representative family (else we can simply return $\mathcal{S}$). The pseudocode of our (deterministic) algorithm, GenRepAlg, for computing representative sets is given above. The crux of its approach, which allows it, under certain "disjointness conditions" (i.e., for certain parameters $p_1, p_2, \ldots, p_t$ and $k_1, k_2, \ldots, k_t$), to compute representative sets faster than [43], is the following: It avoids a direct construction of an $(E, k, p)$-separator, but obtains, more efficiently, all the necessary information provided by such a separator by using smaller separators of the form $D^{c_i}_{(E_i,k_i,p_i)}$. To this end, it first constructs the smaller separators (Step 2). It uses them to compute a family $\mathcal{F}$ (Step 5), and a corresponding function $\chi$ (Steps 3 and 6). Note that $\mathcal{F}$ might not be $(E, k, p)$-good, but it will be sufficient for our purpose. Then, GenRepAlg orders the sets in $\mathcal{S}$ according to their weights (Step 7). Finally, it returns all $S_i \in \mathcal{S}$ for which there is a set $F \in \mathcal{F}$ containing $S_i$ but no $S_j$, for $1 \le j < i$ (Steps 8–17).
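The selection phase of GenRepAlg (Steps 7–17) is a simple greedy pass once the function $\chi$ is available. The sketch below assumes that $\chi$ has already been computed by Steps 1–6 and is passed in as a callable; the separators themselves are not implemented, and the names `greedy_representatives`, `weight` and `chi` are chosen here for illustration.

```python
def greedy_representatives(sets, weight, chi, maximize=True):
    """Steps 7-17 of GenRepAlg, given chi(S) for every S.

    `sets` is a list of sets, `weight` maps a set to a number, and `chi`
    maps a set S to the collection of hashable "witness" sets F that
    contain S (assumed precomputed in Steps 1-6).
    """
    ordered = sorted(sets, key=weight, reverse=maximize)    # Step 7
    chosen = []                                             # Step 8
    used = set()                                            # Step 9: z_F = 1 iff F in used
    for S in ordered:                                       # Step 10
        fresh = [F for F in chi(S) if F not in used]        # Step 11
        if fresh:                                           # Step 12
            chosen.append(S)                                # Step 13
            used.update(fresh)                              # Step 14
    return chosen                                           # Step 17


# Toy data: A and B share their only witness F1, while C has witness F2.
A, B, C = frozenset({1}), frozenset({2}), frozenset({3})
witnesses = {A: ["F1"], B: ["F1"], C: ["F2"]}
weights = {A: 3, B: 2, C: 1}
print(greedy_representatives([A, B, C], weights.get, witnesses.get))
```

In the max version of the toy example, A is kept, B is discarded because its only witness was already used by the heavier set A, and C is kept.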

The proof that GenRepAlg indeed returns a subfamily $\widehat{\mathcal{S}} \subseteq \mathcal{S}$ of the desired size that max (min) $(k_1 - p_1, k_2 - p_2, \ldots, k_t - p_t)$-represents $\mathcal{S}$ in the desired time is given in Appendix D. □

4.2 Solving a Subcase of Weighted k-Path

We next consider Cut Weighted k-Path (k-CWP), the subcase of Weighted k-Path that we solve using a generalized representative sets-based procedure. Since the definition of k-CWP is slightly technical and we solve it in the appendix, we start by explaining the main idea. Roughly speaking, we define this problem to be Weighted k-Path where we are given two disjoint sets of nodes, $L$ and $R$, and functions that cut the solution, a "light" path on $k$ nodes (if one exists), in a manner that will allow us to consider $L$ only in the first half (plus a chosen small fraction) of the execution of the following procedure that solves it, and $R$ only in the second half (minus the fraction) of its execution. It is also important that these functions "spread" the nodes of $L$ (resp. $R$) in a certain manner, which is at worst approximately balanced, among the paths considered in the first (resp. second) half of the execution. Approximate balance actually distorts the balance in the computed partial solutions: partial solutions computed in the first (resp. second) half of the execution will contain a much smaller (resp. larger) fraction of the nodes from $V \setminus (L \cup R)$ belonging to the solution than of the nodes from $L$ (resp. $R$) belonging to the solution, especially when we are close to the middle of the execution. This distortion allows us to benefit (in terms of running time) from the fact that we compute generalized representative sets.

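Fix $0 < \varepsilon, \delta, \gamma < 0.1$, whose values are determined later, such that $\frac{1}{\varepsilon}, m, \tilde{m} \in \mathbb{N}$, where $m = \frac{1}{2}\big(\frac{1}{\varepsilon} - 1\big)$ and $\tilde{m} = \delta\big(\frac{1}{\varepsilon} - 1\big)$. Also denote $\tilde{k} = k - 1$. Formally, the input for k-CWP consists of a directed graph $G = (V, E)$, a weight function $w : E \to \mathbb{R}$, a weight $W \in \mathbb{R}$ and a parameter $k \in \mathbb{N}$, along with two disjoint subsets $L, R \subseteq V$, four injective functions $\ell_1, \ell_2 : \{1, 2, \ldots, m + \tilde{m}\} \to V \setminus (L \cup R)$ and $r_1, r_2 : \{1, 2, \ldots, m - \tilde{m}\} \to V \setminus (L \cup R)$, $v_\ell \in \mathrm{image}(\ell_2) \cup \mathrm{image}(r_2)$ and $v_r \in \mathrm{image}(\ell_1) \cup \mathrm{image}(r_1)$. The functions should satisfy the following "function conditions":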

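1. $(\mathrm{image}(\ell_1) \cap \mathrm{image}(r_1)) = (\mathrm{image}(\ell_2) \cap \mathrm{image}(r_2)) = \emptyset$.
2. $|(\mathrm{image}(\ell_1) \cup \mathrm{image}(r_1)) \setminus (\mathrm{image}(\ell_2) \cup \mathrm{image}(r_2))| = |(\mathrm{image}(\ell_2) \cup \mathrm{image}(r_2)) \setminus (\mathrm{image}(\ell_1) \cup \mathrm{image}(r_1))| = 2$.
3. $v_\ell \in (\mathrm{image}(\ell_2) \cup \mathrm{image}(r_2)) \setminus (\mathrm{image}(\ell_1) \cup \mathrm{image}(r_1))$, and $v_r \in (\mathrm{image}(\ell_1) \cup \mathrm{image}(r_1)) \setminus (\mathrm{image}(\ell_2) \cup \mathrm{image}(r_2))$.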

We need to decide if $G$ has $\frac{1}{\varepsilon}$ simple internally node-disjoint directed paths, $P_1, P_2, \ldots, P_{1/\varepsilon}$, whose total weight is at most $W$, and whose number of nodes from $L$ and $R$ is almost the same, namely $\lfloor(\frac{1}{2} + \delta)\gamma k\rfloor$ from $L$ and $\lfloor(\frac{1}{2} - \delta)\gamma k\rfloor$ from $R$,⁴ such that the following "solution conditions" are satisfied:

1. For all $i \in \{1, 2, \ldots, m + \tilde{m}\}$: The first and last nodes of $P_i$ are $\ell_1(i)$ and $\ell_2(i)$, respectively, and it contains exactly $\lfloor \varepsilon k \rfloor - 1$ internal nodes, where none of them belongs to $R$ or the images of the input functions. Moreover, the total number of nodes from $L$ that are contained in the paths $P_1, \ldots, P_i$ is at least $\frac{i}{m + \tilde{m}} \big( \lfloor(\frac{1}{2} + \delta)\gamma k\rfloor - (k - 2m\lfloor \varepsilon k \rfloor - 2) \big)$.
2. The first and last nodes of $P_{m+\tilde{m}+1}$ are $v_\ell$ and $v_r$, respectively, and it contains exactly $k - 2m\lfloor \varepsilon k \rfloor - 2$ internal nodes, where none of them belongs to the images of the input functions.
3. For all $i \in \{1, 2, \ldots, m - \tilde{m}\}$: The first and last nodes of $P_{m+\tilde{m}+1+i}$ are $r_1(i)$ and $r_2(i)$, respectively, and it contains exactly $\lfloor \varepsilon k \rfloor - 1$ internal nodes, where none of them belongs to $L$ or the images of the input functions. Moreover, the total number of nodes from $R$ that are contained in the paths $P_{m+\tilde{m}+1}, \ldots, P_i$ is at most $\frac{i}{m - \tilde{m}} \big( \lfloor(\frac{1}{2} - \delta)\gamma k\rfloor - (k - 2m\lfloor \varepsilon k \rfloor - 2) \big) + (k - 2m\lfloor \varepsilon k \rfloor - 2)$.

⁴ For intuition why we take more nodes from $L$ than from $R$, note that in [43] the worst running time of the algorithm for Weighted k-Path is obtained slightly after it passes half of its computation (i.e., after computing partial solutions that are paths on $\lceil \frac{k}{2} \rceil$ nodes).

Following the above explanation, Appendix E shows that k-CWP can indeed be efficiently solved using generalized representative sets, giving a procedure that proves the following lemma:

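Lemma 5. For any fixed $c_1, c_2, c_\ell, c_r \ge 1$ and $0 < \lambda < 1$ such that $c_1 \ge c_2$,⁵ we can select a small enough $\varepsilon = \varepsilon(\lambda, \delta, \gamma, c_1, c_2)$ (its choice depends only on $\lambda$, $\delta$, $\gamma$, $c_1$ and $c_2$) such that k-CWP can be solved in deterministic time $O^*(X \cdot 2^{\lambda k})$, where $X$ is bounded by the maximum among the following two expressions:

1. $$\max_{0 \le i \le (\frac{1}{2}+\delta)\gamma k}\ \max_{0 \le j \le \frac{i}{\gamma k}(k - \gamma k)} \left( \frac{(c_\ell(\frac{1}{2}+\delta)\gamma k)^{2(\frac{1}{2}+\delta)\gamma k - i}}{i^i\, (c_\ell(\frac{1}{2}+\delta)\gamma k - i)^{2(\frac{1}{2}+\delta)\gamma k - 2i}} \cdot \frac{(c_1(k - \gamma k))^{2(k-\gamma k) - j}}{j^j\, (c_1(k - \gamma k) - j)^{2(k-\gamma k) - 2j}} \right).$$

2. $$\max_{0 \le i \le (\frac{1}{2}-\delta)\gamma k}\ \max_{\frac{i + (\frac{1}{2}+\delta)\gamma k}{\gamma k}(k - \gamma k) \le j \le k - \gamma k} \left( \frac{(c_r(\frac{1}{2}-\delta)\gamma k)^{2(\frac{1}{2}-\delta)\gamma k - i}}{i^i\, (c_r(\frac{1}{2}-\delta)\gamma k - i)^{2(\frac{1}{2}-\delta)\gamma k - 2i}} \cdot \frac{(c_2(k - \gamma k))^{2(k-\gamma k) - j}}{j^j\, (c_2(k - \gamma k) - j)^{2(k-\gamma k) - 2j}} \right).$$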

4.3 A Deterministic Algorithm for Weighted k-Path

Fix $0 < \delta, \gamma < 0.1$ and $c_1, c_2, c_\ell, c_r \ge 1$ such that $c_1 \ge c_2$, and let $\varepsilon = \varepsilon(\frac{1}{10^{11}}, \delta, \gamma, c_1, c_2)$. We now present the pseudocode of PathAlg, a deterministic algorithm for Weighted k-Path, followed by explanations.

Algorithm 3 PathAlg($G = (V, E)$, $w$, $W$, $k$)

1: let $v_1, v_2, \ldots, v_n$ be an arbitrary ordering of the nodes in $V$.
2: compute an $(n, \lfloor(\frac{1}{2}+\delta)\gamma k\rfloor + \lfloor(\frac{1}{2}-\delta)\gamma k\rfloor, \lfloor(\frac{1}{2}-\delta)\gamma k\rfloor)$-universal set $\mathcal{F}$ by using the algorithm in Theorem 1.
3: for all $f \in \mathcal{F}$ do
4:   for all $U \subseteq V$ such that $|U| \le \frac{2}{\varepsilon}$ do
5:     for $i = 1, 2, \ldots, n$ do
6:       let $L = \{v_j \in V \setminus U : j \ge i, f(v_j) = 0\}$ and $R = \{v_j \in V \setminus U : j \ge i, f(v_j) = 1\}$.
7:       for all $\ell_1, \ell_2 : \{1, 2, \ldots, m + \tilde{m}\} \to U$, $r_1, r_2 : \{1, 2, \ldots, m - \tilde{m}\} \to U$, $v_\ell \in \mathrm{image}(\ell_2) \cup \mathrm{image}(r_2)$ and $v_r \in \mathrm{image}(\ell_1) \cup \mathrm{image}(r_1)$ do
8:         if $(G, w, W, k, L, R, \ell_1, \ell_2, r_1, r_2, v_\ell, v_r)$ is an input for k-CWP then
9:           if the procedure of Lemma 5 accepts $(G, w, W, k, L, R, \ell_1, \ell_2, r_1, r_2, v_\ell, v_r)$ then accept. end if
10:         end if
11:       end for
12:     end for
13:   end for
14: end for
15: reject.
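Step 6 of PathAlg is a single divide-and-color step: one function from the universal set splits the nodes that are still under consideration into $L$ and $R$. The sketch below illustrates only this step; it interprets $f(v_j)$ as the value of $f$ at position $j$ of the ordering, which is an assumption of this sketch, and the name `split_by_coloring` is introduced here.

```python
def split_by_coloring(order, f, U, i):
    """Step 6 of PathAlg: build L and R from one coloring f.

    `order` is the list v_1..v_n, `f` maps a 1-based position j to 0 or 1
    (assumed to come from the universal set of Step 2), `U` is the set of
    cut nodes, and `i` is the 1-based position of the first node kept.
    """
    L, R = set(), set()
    for j in range(i, len(order) + 1):
        v = order[j - 1]
        if v in U:
            continue                      # nodes of U belong to neither side
        (L if f(j) == 0 else R).add(v)
    return L, R


# Toy run: positions 2..5 are considered, 'c' is a cut node.
order = ["a", "b", "c", "d", "e"]
print(split_by_coloring(order, lambda j: j % 2, {"c"}, 2))
```

Here the nodes at even positions ('b' and 'd') are colored 0 and land in $L$, the node 'e' at position 5 is colored 1 and lands in $R$, and 'a' is skipped because it is ordered before position $i = 2$.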

Algorithm PathAlg first orders the nodes in $V$. Then, it performs (in Steps 4 and 5) an exhaustive search, iterating over every "small" set $U \subseteq V$ and node $v_i \in V$ (Step 5), to capture exactly $(\lfloor(\frac{1}{2}+\delta)\gamma k\rfloor + \lfloor(\frac{1}{2}-\delta)\gamma k\rfloor)$ of the nodes on a path that is a solution (if one exists) in the set $\{v_i, v_{i+1}, \ldots, v_n\} \setminus U$. It can then (in Step 6) obtain the sets $L$ and $R$ by using a standard divide-and-color step. Afterwards it further cuts the universe (which is $V$) by exhaustively iterating over every option to choose four functions $\ell_1, \ell_2, r_1, r_2$ that are legal according to Step 8, performing the phase called "balanced cutting of the universe" in Strategy III (see Section 2.1). At this point, it is important to observe that PathAlg does not explicitly cut the universe, obtaining several disjoint universes, which would be extremely inefficient. In Steps 4 and 7 it only considers every option to choose which nodes should cut the universe in a balanced manner (those are the nodes in the images of $\ell_1$, $\ell_2$, $r_1$ and $r_2$), which is all the information necessary as input for k-CWP. Note that there are $|V|^{O(\frac{1}{\varepsilon})} = O^*(1)$ such options. Overall, PathAlg uses divide-and-color and balanced cutting of the universe to obtain a set of inputs for k-CWP, and accepts (in Step 9) iff at least one of them is a yes-instance. We next consider the correctness and running time of PathAlg.

⁵ We note that obtaining the claim in this lemma when $c_1 < c_2$ is actually easier, but will not be relevant for the rest of this paper.

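Lemma 6. PathAlg solves Weighted k-Path in time $O^*\Big( X \cdot \binom{\gamma k}{(\frac{1}{2}-\delta)\gamma k}\, 2^{\frac{1}{10^{10}} k} \Big)$, where $X$ is bounded by the maximum among the following two expressions:

1. $$\max_{0 \le i \le (\frac{1}{2}+\delta)\gamma k}\ \max_{0 \le j \le \frac{i}{\gamma k}(k - \gamma k)} \left( \frac{(c_\ell(\frac{1}{2}+\delta)\gamma k)^{2(\frac{1}{2}+\delta)\gamma k - i}}{i^i\, (c_\ell(\frac{1}{2}+\delta)\gamma k - i)^{2(\frac{1}{2}+\delta)\gamma k - 2i}} \cdot \frac{(c_1(k - \gamma k))^{2(k-\gamma k) - j}}{j^j\, (c_1(k - \gamma k) - j)^{2(k-\gamma k) - 2j}} \right).$$

2. $$\max_{0 \le i \le (\frac{1}{2}-\delta)\gamma k}\ \max_{\frac{i + (\frac{1}{2}+\delta)\gamma k}{\gamma k}(k - \gamma k) \le j \le k - \gamma k} \left( \frac{(c_r(\frac{1}{2}-\delta)\gamma k)^{2(\frac{1}{2}-\delta)\gamma k - i}}{i^i\, (c_r(\frac{1}{2}-\delta)\gamma k - i)^{2(\frac{1}{2}-\delta)\gamma k - 2i}} \cdot \frac{(c_2(k - \gamma k))^{2(k-\gamma k) - j}}{j^j\, (c_2(k - \gamma k) - j)^{2(k-\gamma k) - 2j}} \right).$$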

Proof. The running time of the algorithm follows immediately from the pseudocode, Theorem 1 and Lemma 5, and since $2^{\frac{1}{10^{11}} k + o(k)} = O\big(2^{\frac{1}{10^{10}} k}\big)$.

For the easier direction, suppose that PathAlg accepts, and let $L$, $R$, $\ell_1$, $\ell_2$, $r_1$ and $r_2$ be the parameters corresponding to the iteration in which it accepts. We get that $(G, w, W, k, L, R, \ell_1, \ell_2, r_1, r_2)$ is a yes-instance of k-CWP, and can thus denote by $P_1, P_2, \ldots, P_{1/\varepsilon}$ the paths that form a solution to this instance. Since "function conditions" 1–3 are satisfied by the injective functions $\ell_1$, $\ell_2$, $r_1$ and $r_2$, and since the internally node-disjoint simple directed paths $P_1, P_2, \ldots, P_{1/\varepsilon}$ satisfy "solution conditions" 1–3, we can reorder $P_1, P_2, \ldots, P_{1/\varepsilon}$ as $P'_1, P'_2, \ldots, P'_{1/\varepsilon}$ such that the last node in $P'_{i-1}$ is the first node in $P'_i$, for all $2 \le i \le \frac{1}{\varepsilon}$. We thus construct one directed simple path on $k$ nodes (by "solution conditions" 1–3, this is the total number of distinct nodes in $P_1, P_2, \ldots, P_{1/\varepsilon}$). Its weight is at most $W$, as it contains the same edges as $P_1, P_2, \ldots, P_{1/\varepsilon}$, whose total weight is at most $W$.

Recall that $m = \frac{1}{2}\big(\frac{1}{\varepsilon} - 1\big)$, $\tilde{m} = \delta\big(\frac{1}{\varepsilon} - 1\big)$ and $\tilde{k} = k - 1$. Now, suppose that the input is a yes-instance of Weighted k-Path. Thus, we can denote by $P$ a path that is a solution to this instance. Let $U$ be the set of the $[(j-1)\lfloor \varepsilon k \rfloor + 1]$st, $k$th and $[(j-1)\lfloor \varepsilon k \rfloor + k - 2m\lfloor \varepsilon k \rfloor]$st nodes on $P$, for all $j \in \{1, 2, \ldots, \frac{1}{\varepsilon}\}$. Let $v_i$ denote a node in $V$ such that $P$ contains exactly $(\lfloor(\frac{1}{2}+\delta)\gamma k\rfloor + \lfloor(\frac{1}{2}-\delta)\gamma k\rfloor)$ nodes from $V_i^*$, which denotes the set of nodes in $V$ that are not ordered before $v_i$ and do not belong to $U$. Let $\mathcal{P}_{m+\tilde{m}+1}$ denote the set of all subpaths of $P$ that start at its $[1 + (j-1)\lfloor \varepsilon k \rfloor]$st node, for any $j \in \{1, 2, \ldots, \frac{1}{\varepsilon}\}$, and contain exactly $k - 2m\lfloor \varepsilon k \rfloor$ nodes. Let $P_{m+\tilde{m}+1}$ be a subpath in $\mathcal{P}_{m+\tilde{m}+1}$ that contains the maximum number of nodes from $V_i^*$ among the subpaths in $\mathcal{P}_{m+\tilde{m}+1}$. Denote the node-set of $P_{m+\tilde{m}+1}$ by $V_{m+\tilde{m}+1}$. Let $v_\ell$ and $v_r$ denote the first and last nodes of $P_{m+\tilde{m}+1}$, respectively. Remove the internal nodes of $P_{m+\tilde{m}+1}$ from $P$. Let $P'$ and $P''$ be the two resulting subpaths, and denote their number of nodes by $k'$ and $k''$, respectively. If $k' > 1$, let $P'_j$, for all $j \in \{1, 2, \ldots, \frac{k'-1}{\lfloor \varepsilon k \rfloor}\}$, denote the subpath of $P'$ on $\lfloor \varepsilon k \rfloor + 1$ nodes that starts at the $[(j-1)\lfloor \varepsilon k \rfloor + 1]$st node of $P'$. Moreover, if $k'' > 1$, let $P''_j$, for all $j \in \{1, 2, \ldots, \frac{k''-1}{\lfloor \varepsilon k \rfloor}\}$, denote the subpath of $P''$ on $\lfloor \varepsilon k \rfloor + 1$ nodes that starts at the $[(j-1)\lfloor \varepsilon k \rfloor + 1]$st node of $P''$. Now, denote the paths of the forms $P'_j$ and $P''_j$ arbitrarily as $P_1, P_2, \ldots, P_{2m}$. Moreover, denote their node-sets as $V_1, V_2, \ldots, V_{2m}$, respectively. By our choice of $U$, the first and last nodes of each of these paths belong to $U$.

Next, in Appendix F, we use the paths $P_1, P_2, \ldots, P_{2m}$ to define paths $P_1, P_2, \ldots, P_{2m+1}$, functions $\ell_1$, $\ell_2$, $r_1$ and $r_2$ and sets $L$ and $R$ such that $(G, w, W, k, L, R, \ell_1, \ell_2, r_1, r_2, v_\ell, v_r)$ is a yes-instance of k-CWP, which is examined in an execution of Step 9 by PathAlg. We note that the fact that we use an $(n, \lfloor(\frac{1}{2}+\delta)\gamma k\rfloor + \lfloor(\frac{1}{2}-\delta)\gamma k\rfloor, \lfloor(\frac{1}{2}-\delta)\gamma k\rfloor)$-universal set $\mathcal{F}$ is taken into consideration when we define the sets $L$ and $R$. □

Upper bounds for the running time in Lemma 6, considering different parameters $\gamma$, $c$, $c_\ell$ and $c_r$, are given in Appendix G. In particular, by choosing $\delta = 0.046$, $\gamma = 0.084$, $c_1 = 1.504$, $c_2 = 1.398$, $c_\ell = 1.092$ and $c_r = 1.876$, we get the bound $O^*(2.59606^k)$. Thus, we have the following theorem:

Theorem 6. Weighted k-Path can be solved in deterministic time $O^*(2.59606^k)$.



References

1. Alon, N., Yuster, R., Zwick, U.: Color coding. J. ACM 42(4), 844–856 (1995)
2. Bjorklund, A.: Determinant sums for undirected hamiltonicity. In: FOCS. pp. 173–182 (2010)
3. Bjorklund, A., Husfeldt, T., Kaski, P., Koivisto, M.: Narrow sieves for parameterized paths and packings. CoRR abs/1007.1161 (2010)
4. Bollobas, B.: On generalized graphs. Acta Math. Aca. Sci. Hun. 16, 447–452 (1965)
5. Chen, J., Feng, Q., Liu, Y., Lu, S., Wang, J.: Improved deterministic algorithms for weighted matching and packing problems. Theor. Comput. Sci. 412(23), 2503–2512 (2011)
6. Chen, J., Friesen, D., Jia, W., Kanj, I.: Using nondeterminism to design efficient deterministic algorithms. Algorithmica 40(2), 83–97 (2004)
7. Chen, J., Kneis, J., Lu, S., Molle, D., Richter, S., Rossmanith, P., Sze, S.H., Zhang, F.: Randomized divide-and-conquer: Improved path, matching, and packing algorithms. SIAM J. on Computing 38(6), 2526–2547 (2009)
8. Chen, J., Lu, S., Sze, S.H., Zhang, F.: Improved algorithms for path, matching, and packing problems. In: SODA. pp. 298–307 (2007)
9. Cohen, N., Fomin, F.V., Gutin, G., Kim, E.J., Saurabh, S., Yeo, A.: Algorithm for finding k-vertex out-trees and its application to k-internal out-branching problem. J. Comput. Syst. Sci. 76(7), 650–662 (2010)
10. Demers, A., Downing, A.: Minimum leaf spanning tree. US Patent no. 6,105,018 (August 2013)
11. Downey, R., Fellows, M.: Parameterized Complexity. Springer, New York (1999)
12. Downey, R., Fellows, M.: Fundamentals of Parameterized Complexity. Springer (2013)
13. Edmonds, J.: Paths, trees, and flowers. Canad. J. Math. 17, 449–467 (1972)
14. Fellows, M., Knauer, C., Nishimura, N., Ragde, P., Rosamond, F., Stege, U., Thilikos, D., Whitesides, S.: Faster fixed-parameter tractable algorithms for matching and packing problems. Algorithmica 52(2), 167–176 (2008)
15. Feng, Q., Wang, J., Chen, J.: Matching and weighted p2-packing: algorithms and kernels. Theor. Comput. Sci. 522, 85–94 (2014)
16. Feng, Q., Wang, J., Li, S., Chen, J.: Randomized parameterized algorithms for p2-packing and co-path packing problems. J. Comb. Optim. (2013)
17. Fernau, H., Raible, D.: A parameterized perspective on packing paths of length two. J. Comb. Optim. 18(4), 319–341 (2009)
18. Fomin, F., Lokshtanov, D., Saurabh, S.: Efficient computation of representative sets with applications in parameterized and exact algorithms. In: SODA. pp. 142–151 (2014)
19. Fomin, F.V., Gaspers, S., Saurabh, S., Thomasse, S.: A linear vertex kernel for maximum internal spanning tree. J. Comput. Syst. Sci. 79(1), 1–6 (2013)
20. Fomin, F.V., Grandoni, F., Lokshtanov, D., Saurabh, S.: Sharp separation and applications to exact and parameterized algorithms. Algorithmica 63(3), 692–706 (2012)
21. Fomin, F.V., Lokshtanov, D., Panolan, F., Saurabh, S.: Representative sets of product families. In: ESA. pp. 443–454 (2014)
22. Garey, M.R., Johnson, D.S.: Computers and intractability: a guide to the theory of NP-completeness. W.H. Freeman, New York (1979)
23. Garey, M.R., Johnson, D.S., Stockmeyer, L.: Some simplified NP-complete problems. In: STOC. pp. 47–63 (1974)
24. Goyal, P., Misra, N., Panolan, F.: Faster deterministic algorithms for r-dimensional matching using representative sets. In: FSTTCS. pp. 237–248 (2013)
25. Goyal, P., Misra, N., Panolan, F., Zehavi, M.: Faster deterministic algorithms for matching and packing problems. Submitted, results reported in [24,48] (2014)
26. Gutin, G., Razgon, I., Kim, E.J.: Minimum leaf out-branching and related problems. Theor. Comput. Sci. 410(45), 4571–4579 (2009)
27. Koutis, I.: A faster parameterized algorithm for set packing. Inf. Process. Lett. 94(1), 7–9 (2005)
28. Koutis, I.: Faster algebraic algorithms for path and packing problems. In: ICALP. pp. 575–586 (2008)
29. Koutis, I., Williams, R.: Limits and applications of group algebras for parameterized problems. In: ICALP. pp. 653–664 (2009)
30. Liu, Y., Chen, J., Wang, J.: On efficient FPT algorithms for weighted matching and packing problems. In: TAMC. pp. 575–586 (2007)
31. Liu, Y., Lu, S., Chen, J., Sze, S.H.: Greedy localization and color-coding: improved matching and packing algorithms. In: IWPEC. pp. 84–95 (2006)
32. Marx, D.: Parameterized coloring problems on chordal graphs. Theor. Comput. Sci. 351, 407–424 (2006)
33. Marx, D.: A parameterized view on matroid optimization problems. Theor. Comput. Sci. 410, 4471–4479 (2009)
34. Monien, B.: How to find long paths efficiently. Annals Disc. Math. 25, 239–254 (1985)
35. Naor, M., Schulman, J.L., Srinivasan, A.: Splitters and near-optimal derandomization. In: FOCS. pp. 182–191 (1995)



36. Ozeki, K., Yamashita, T.: Spanning trees: A survey. Graphs and Combinatorics 27(1), 1–26 (2011)
37. Pinter, R.Y., Shachnai, H., Zehavi, M.: Deterministic parameterized algorithms for the graph motif problem. In: MFCS (2014)
38. Prieto, E., Sloper, C.: Reducing to independent set structure – the case of k-internal spanning tree. Nord. J. Comput. 12(3), 308–318 (2005)
39. Prieto, E., Sloper, C.: Looking at the stars. Theor. Comput. Sci. 351, 437–445 (2006)
40. Raible, D., Fernau, H., Gaspers, D., Liedloff, M.: Exact and parameterized algorithms for max internal spanning tree. Algorithmica 65(1), 95–128 (2013)
41. Reed, B.A., Smith, K., Vetta, A.: Finding odd cycle transversals. Oper. Res. Lett. 32(4), 299–301 (2004)
42. Salamon, G.: A survey on algorithms for the maximum internal spanning tree and related problems. Electronic Notes in Discrete Mathematics 36, 1209–1216 (2010)
43. Shachnai, H., Zehavi, M.: Representative families: A unified tradeoff-based approach. In: ESA. pp. 786–797 (2014)
44. Wang, J., Feng, Q.: Improved parameterized algorithms for weighted 3-set packing. In: COCOON. pp. 130–139 (2008)
45. Wang, J., Feng, Q.: An $O^*(3.523^k)$ parameterized algorithm for 3-set packing. In: TAMC. pp. 82–93 (2008)
46. Williams, R.: Finding paths of length k in $O^*(2^k)$ time. Inf. Process. Lett. 109(6), 315–318 (2009)
47. Zehavi, M.: Algorithms for k-internal out-branching. In: IPEC. pp. 361–373 (2013)
48. Zehavi, M.: Deterministic parameterized algorithms for matching and packing problems. CoRR abs/1311.0484 (2013)



A Figure 1

[Figure 1: flowchart with five panels, labeled I–V. Box labels: "Known reduction: Find an out-branching → Find an out-tree"; "Reduction: Find an out-tree → Find a smaller out-tree along with a set of paths on 2 nodes"; "The tree contains many leaves?" (Yes / No); "Known narrow sieves-based procedure: Return a solution (if one exists)"; "Known representative sets-based procedure: Return a family of partial solutions"; "Try to complete each partial solution using a maximum matching computation"; "Divide-and-color step"; "Balanced cutting of the universe"; "Generalized representative sets-based procedure"; "Unbalanced cutting of the universe"; "Representative sets-based procedure: Return a family of partial solutions"; "Iterative compression"; "Dynamic programming" (k times).]

Fig. 1. Strategies for mixing color coding-related techniques, described in Section 2.1. We use these strategies to develop a deterministic algorithm for k-IOB (I), a randomized algorithm for k-IOB (II), deterministic algorithms for k-Path, k-Tree, (r, k)-DM and GMD, including their weighted versions (III), a deterministic algorithm for (3, k)-WSP (IV), and a deterministic algorithm for P2-Packing (V).



B Proof of Lemma 1

Let IOT1Alg and T&PAlg be algorithms that solve $(k, k, \ell, \ell)$-IOT in (rand./det.) time $\alpha(G, k, \ell)$ and $(k, \ell, q)$-T&P in (rand./det.) time $\beta(G, k, \ell, q)$, respectively. Then, we solve $(k, k, 1, k)$-IOT by using the following algorithm IOT2Alg:

Algorithm 4 IOT2Alg(G = (V, E), r, k, k, 1, k)

1: for $\ell = 1, 2, \ldots, k$ do
2:   if $\alpha(G, k, \ell) \le \sum_{q=\max\{0, 2\ell-k\}}^{\ell} \beta(G, k, \ell, q)$ then
3:     if IOT1Alg$(G, r, k, k, \ell, \ell)$ accepts then accept. end if
4:   else
5:     for $q = \max\{0, 2\ell-k\}, \max\{0, 2\ell-k\}+1, \ldots, \ell$ do
6:       if T&PAlg$(G, r, k, \ell, q)$ accepts then accept. end if
7:     end for
8:   end if
9: end for
10: reject.
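The control flow of Algorithm 4 can be sketched in Python as follows. This is only an illustrative sketch: the two solvers and the two running-time estimates are passed in as hypothetical callables (`iot1_alg`, `tp_alg`, `alpha`, `beta`), with the graph G and the root r assumed to be captured inside them; none of these names come from the paper.

```python
from typing import Callable


def iot2_alg(
    k: int,
    alpha: Callable[[int], float],        # hypothetical: estimate of alpha(G, k, l)
    beta: Callable[[int, int], float],    # hypothetical: estimate of beta(G, k, l, q)
    iot1_alg: Callable[[int], bool],      # hypothetical: IOT1Alg(G, r, k, k, l, l)
    tp_alg: Callable[[int, int], bool],   # hypothetical: T&PAlg(G, r, k, l, q)
) -> bool:
    """For each number of leaves l, run whichever of the two solvers is estimated cheaper."""
    for ell in range(1, k + 1):
        q_values = range(max(0, 2 * ell - k), ell + 1)
        if alpha(ell) <= sum(beta(ell, q) for q in q_values):
            # Step 3: a single call to the out-tree solver.
            if iot1_alg(ell):
                return True
        else:
            # Steps 5-7: try every admissible number q of paths on 2 nodes.
            for q in q_values:
                if tp_alg(ell, q):
                    return True
    return False  # Step 10: reject
```

The comparison in Step 2 only selects the cheaper branch; correctness does not depend on which branch is taken, which is what the proof below establishes.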

IOT2Alg clearly runs in time

$$O\Big(\sum_{\ell=1}^{k} \min\Big\{\alpha(G, k, \ell),\ \sum_{q=\max\{0, 2\ell-k\}}^{\ell} \beta(G, k-q, \ell-q, q)\Big\}\Big).$$

Thus, according to Observation 1, it is enough to prove its correctness.

First, suppose that $(G = (V, E), r, k, k, 1, k)$ is a yes-instance of $(k, k, 1, k)$-IOT. Let $T = (V_T, E_T)$ be an out-tree with a minimal number of leaves among those that are solutions for this instance, and let $\ell$ denote the number of leaves in $T$. Note that $T$ is a solution for the instance $(G, r, k, k, \ell, \ell)$ of $(k, k, \ell, \ell)$-IOT. Therefore, if $\alpha(G, k, \ell) \le \sum_{q=\max\{0, 2\ell-k\}}^{\ell} \beta(G, k-q, \ell-q, q)$, IOT2Alg accepts at Step 3. Next suppose that this condition is not fulfilled. Now, as long as there is an internal node $v$ in $T$ that has at least two children, and at least one of them, $u$, has exactly one child and this child is a leaf, remove the edge $(v, u)$ from $T$. Denote the resulting out-tree by $T'$ and the resulting paths on 2 nodes by $P_1, P_2, \ldots, P_q$. Clearly, $q \le \ell$. By our construction of $T'$, it is an out-tree rooted at $r$ that contains exactly $(k - q)$ internal nodes and $(\ell - q)$ leaves. Furthermore, our choice of $T$ implies that both the father and grandfather of any leaf in $T'$ do not have more than one child. Therefore, the number of leaves in $T'$ is at most half its number of internal nodes. We get that $(\ell - q) \le (k - q)/2$, and thus $2\ell - k \le q$. Together with $q \le \ell$, this means that $q$ is one of the values examined in Step 5. Therefore, IOT2Alg accepts at Step 6.

Now, suppose that IOT2Alg accepts. If it accepts in Step 3, then this is clearly correct. Thus, we next suppose that it accepts in Step 6, and denote by $\ell$ and $q$ the corresponding parameters. Let $T'$ and $P_1, P_2, \ldots, P_q$ be the disjoint out-tree and paths on 2 nodes, respectively, that form a solution for the instance $(G, r, k, \ell, q)$ of $(k, \ell, q)$-T&P. Since $T'$ is rooted at $r$ and there is an out-branching of $G$ that is rooted at $r$, we have that there is an out-branching of $G$ rooted at $r$ that extends $T'$. Thus, we can denote by $\mathcal{T}$ the set of out-branchings of $G$ rooted at $r$ that extend $T'$ and have a maximum number of internal nodes, and let $T$ be an out-branching in $\mathcal{T}$ that, among the out-branchings in $\mathcal{T}$, contains a maximum number of paths from $\{P_1, P_2, \ldots, P_q\}$. Suppose, by way of contradiction, that $T$ contains fewer than $k$ internal nodes. Since $T'$ contains $(k - q)$ internal nodes, there is a path $(v \to u) \in \{P_1, P_2, \ldots, P_q\}$ such that both $v$ and $u$ are leaves in $T$. By removing the edge incident to $u$ from $T$, and then inserting the edge $(v, u)$ to the result, we obtain an out-branching that has at least as many internal nodes as $T$, and more paths from $\{P_1, P_2, \ldots, P_q\}$ than $T$, which contradicts our choice of $T$. Therefore, $T$ is a solution to the instance $(G, r, k, |V|, 1, |V|)$ of $(k, |V|, 1, |V|)$-IOT. By Observation 2, we conclude that $(G, r, k, k, 1, k)$ is a yes-instance of $(k, k, 1, k)$-IOT. □



C The Running Time of T&PAlg

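First, note that

$$(*) = O^*\Big(\sum_{\ell=\ell^*}^{k}\ \sum_{q=\max\{0, 2\ell-k\}}^{\ell}\ \sum_{i=0}^{k+\ell-2q} \frac{(c(k+\ell))^{2k+2\ell-i}}{i^i\,(c(k+\ell)-i)^{2k+2\ell-2i}}\ 2^{\frac{k+\ell}{10^{10}}}\Big)$$

is bounded by

$$O^*\Big(4^{\frac{k}{10^{10}}} \sum_{\ell=\ell^*}^{k}\ \sum_{i=0}^{\min\{k+\ell,\ 3(k-\ell)\}} \frac{(c(k+\ell))^{2k+2\ell-i}}{i^i\,(c(k+\ell)-i)^{2k+2\ell-2i}}\Big),$$

which we can further bound by

$$O^*\Big(4^{\frac{k}{10^{10}}} \max_{\frac{\ell^*}{k} \le \beta \le 1}\ \max_{0 \le i \le \min\{(1+\beta)k,\ 3(1-\beta)k\}} \frac{(c(1+\beta)k)^{2(1+\beta)k-i}}{i^i\,(c(1+\beta)k-i)^{2(1+\beta)k-2i}}\Big).$$

Let $\alpha_c$ be the value $\alpha$ that maximizes the expression $\max_{0 \le \alpha \le 1} \big\{ \frac{c^{2-\alpha}}{\alpha^{\alpha}(c-\alpha)^{2-2\alpha}} \big\}$. If $\alpha_c(1+\beta)k \le 3(1-\beta)k$, then the maximum of

$$\max_{0 \le i \le \min\{(1+\beta)k,\ 3(1-\beta)k\}} \frac{(c(1+\beta)k)^{2(1+\beta)k-i}}{i^i\,(c(1+\beta)k-i)^{2(1+\beta)k-2i}}$$

is obtained at $i = \alpha_c(1+\beta)k$, and else it is obtained at $i = 3(1-\beta)k$. Thus, we can further bound (*) by $O^*$ of $4^{\frac{k}{10^{10}}}$ times the following expression:

$$\Big(\max\Big\{ \max_{\frac{\ell^*}{k} \le \beta \le \frac{3-\alpha_c}{3+\alpha_c}} \Big(\frac{c^{2-\alpha_c}}{\alpha_c^{\alpha_c}(c-\alpha_c)^{2-2\alpha_c}}\Big)^{1+\beta},\ \max_{\max\{\frac{\ell^*}{k},\ \frac{3-\alpha_c}{3+\alpha_c}\} < \beta \le 1} \frac{(c(1+\beta))^{5\beta-1}}{(3(1-\beta))^{3(1-\beta)}\,(c(1+\beta)-3(1-\beta))^{4(2\beta-1)}} \Big\}\Big)^{k}.$$

The first part of the expression is relevant only if $\frac{\ell^*}{k} \le \frac{3-\alpha_c}{3+\alpha_c}$, else we regard it as 0. Let $\beta_c$ be the value $\beta$ that maximizes the expression

$$\max_{\frac{3-\alpha_c}{3+\alpha_c} < \beta \le 1} \frac{(c(1+\beta))^{5\beta-1}}{(3(1-\beta))^{3(1-\beta)}\,(c(1+\beta)-3(1-\beta))^{4(2\beta-1)}}.$$

Overall, we get that (*) is bounded by $O^*$ of $4^{\frac{k}{10^{10}}}$ times the following:

• If $\beta_c \le \frac{\ell^*}{k}$: $\Big(\frac{(c(1+\frac{\ell^*}{k}))^{5\frac{\ell^*}{k}-1}}{(3(1-\frac{\ell^*}{k}))^{3(1-\frac{\ell^*}{k})}\,(c(1+\frac{\ell^*}{k})-3(1-\frac{\ell^*}{k}))^{4(2\frac{\ell^*}{k}-1)}}\Big)^{k}$.

• Else if $\frac{3-\alpha_c}{3+\alpha_c} \le \frac{\ell^*}{k} \le \beta_c$: $\Big(\frac{(c(1+\beta_c))^{5\beta_c-1}}{(3(1-\beta_c))^{3(1-\beta_c)}\,(c(1+\beta_c)-3(1-\beta_c))^{4(2\beta_c-1)}}\Big)^{k}$.

• Else: $\Big(\max\Big\{\Big(\frac{c^{2-\alpha_c}}{\alpha_c^{\alpha_c}(c-\alpha_c)^{2-2\alpha_c}}\Big)^{\frac{6}{3+\alpha_c}},\ \frac{(c(1+\beta_c))^{5\beta_c-1}}{(3(1-\beta_c))^{3(1-\beta_c)}\,(c(1+\beta_c)-3(1-\beta_c))^{4(2\beta_c-1)}}\Big\}\Big)^{k}$.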

Table 3, given below, presents approximate values of $\alpha_c$ and $\beta_c$, corresponding to different choices of c. Then, Table 4 presents bounds for (*), corresponding to different choices of c, where $\ell^* = 1/k$. In particular, by choosing c = 1.497, we get the bound $O^*(5.139^k)$ (used in Section 3.2). Next, Table 5 presents other bounds for (*), corresponding to different choices of c and γ, where $\ell^* = \gamma k$. In particular, by choosing c = 1.765 and γ = 0.8545, we get the bound $O^*(3.617^k)$ (used in Section 3.3).
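The value $\alpha_c$ can be approximated numerically. The sketch below is not the procedure used to produce the tables (which is not described here); it is a minimal illustration that maximizes the logarithm of $c^{2-\alpha}/(\alpha^{\alpha}(c-\alpha)^{2-2\alpha})$ by ternary search, assuming the function is unimodal on (0, 1) for c ≥ 1. The function names are hypothetical.

```python
import math


def log_g(c: float, a: float) -> float:
    """Natural log of c^(2-a) / (a^a * (c-a)^(2-2a)), for 0 < a < min(1, c)."""
    return (2 - a) * math.log(c) - a * math.log(a) - (2 - 2 * a) * math.log(c - a)


def alpha_c(c: float, iters: int = 200) -> float:
    """Ternary search for the maximizer of log_g(c, .) on (0, 1); assumes unimodality."""
    lo, hi = 1e-12, 1 - 1e-12
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_g(c, m1) < log_g(c, m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


def first_part_bound(c: float) -> float:
    """The first expression of the 'Else' case: g(alpha_c) raised to 6 / (3 + alpha_c)."""
    a = alpha_c(c)
    return math.exp(log_g(c, a) * 6 / (3 + a))
```

For c = 1, this search lands near the value of $\alpha_c$ listed in the first row of Table 3, and `first_part_bound(1.0)` stays below the corresponding entry 5.873 (the factor $4^{1/10^{10}}$ is omitted in the sketch since it is numerically negligible).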

D Proof of Theorem 3 (Cont.)

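First, note that the size of $\mathcal{F}$ is exactly $\prod_{i=1}^{t} |\mathcal{F}_i|$, which, by the properties of the separators $D^{c_i}_{(E_i,k_i,p_i)}$, is bounded by

$$\prod_{i=1}^{t} \Big(\frac{(c_i k_i)^{k_i}}{p_i^{p_i}(c_i k_i - p_i)^{k_i-p_i}}\ 2^{o(k_i)} \log|E_i|\Big).$$

We insert a set to $\widehat{\mathcal{S}}$ only if there exists an indicator of the form $z_F$ that is turned off, and afterwards, at least one such indicator is turned on (permanently). Therefore, the returned family $\widehat{\mathcal{S}}$ is of the desired size.

By the properties of the separators $D^{c_i}_{(E_i,k_i,p_i)}$, the time complexity of Step 1 is bounded by

$$O\Big(\sum_{i=1}^{t} \Big(\frac{(c_i k_i)^{k_i}}{p_i^{p_i}(c_i k_i - p_i)^{k_i-p_i}}\ 2^{o(k_i)} |E_i| \log|E_i| + |\mathcal{S}| \Big(\frac{c_i k_i}{c_i k_i - p_i}\Big)^{k_i-p_i} 2^{o(k_i)} \log|E_i|\Big)\Big),$$

and those of Steps 5, 6 and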
Column (I) of Table 3 is $\big(\frac{c^{2-\alpha_c}}{\alpha_c^{\alpha_c}(c-\alpha_c)^{2-2\alpha_c}}\big)^{\frac{6}{3+\alpha_c}} \cdot 4^{\frac{1}{10^{10}}}$, and column (II) is $\frac{(c(1+\beta_c))^{5\beta_c-1} \cdot 4^{\frac{1}{10^{10}}}}{(3(1-\beta_c))^{3(1-\beta_c)}\,(c(1+\beta_c)-3(1-\beta_c))^{4(2\beta_c-1)}}$.

c        α_c        (I)         (3−α_c)/(3+α_c)    β_c        (II)
1        0.55013    ≤ 5.873     0.69008            0.71350    ≤ 5.9441
1.4      0.54908    ≤ 5.094     0.69058            0.71582    ≤ 5.1552
1.45     0.55302    ≤ 5.080     0.68870            0.71441    ≤ 5.1424
1.495    0.55692    ≤ 5.075     0.68685            0.71299    ≤ 5.13864
1.496    0.55701    ≤ 5.075     0.68681            0.71296    ≤ 5.13864
1.497    0.55710    ≤ 5.075     0.68677            0.71293    ≤ 5.13863
1.498    0.55719    ≤ 5.075     0.68672            0.71289    ≤ 5.13863
1.499    0.55729    ≤ 5.075     0.68669            0.71286    ≤ 5.13864
1.5      0.55737    ≤ 5.075     0.68664            0.71283    ≤ 5.13865

Table 3. Approximate values of $\alpha_c$ and $\beta_c$, corresponding to different choices of c.

c        1         1.4       1.45      1.495      1.496      1.497      1.498      1.499      1.5
Bound    5.9441    5.1552    5.1424    5.13864    5.13864    5.13863    5.13863    5.13864    5.13865

Table 4. Upper bounds for (*), corresponding to different choices of c, where $\ell^* = 1/k$. An entry that stores a constant a corresponds to the bound $O^*(a^k)$.

γ \ c     1.763          1.764          1.765          1.766
0.8544    3.617665566    3.617665007    3.617665035    3.617665648
0.8545    3.615894763    3.615894103    3.615894029    3.615894539

Table 5. Upper bounds for (*), corresponding to different choices of c and $\ell^*$, where $\ell^* = \gamma k$. An entry that stores a constant a corresponds to the bound $O^*(a^k)$.

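7 are bounded by $O(|\mathcal{F}|)$, $O\big(|\mathcal{S}| \prod_{i=1}^{t} \big(\frac{c_i k_i}{c_i k_i - p_i}\big)^{k_i-p_i} 2^{o(k_i)} \log|E_i|\big)$ and $O(|\mathcal{S}| \log|\mathcal{S}|)$, respectively. Moreover, the time complexity of the computation in Steps 8–17 is bounded by $O\big(|\mathcal{S}| \prod_{i=1}^{t} \big(\frac{c_i k_i}{c_i k_i - p_i}\big)^{k_i-p_i} 2^{o(k_i)} \log|E_i|\big)$.

We thus conclude that GenRepAlg runs in the desired time.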

It remains to show that $\widehat{\mathcal{S}}$ max (min) $(k_1-p_1, k_2-p_2, \ldots, k_t-p_t)$-represents $\mathcal{S}$. Consider any sets $X \in \mathcal{S}$ and $Y \subseteq E \setminus X$ such that $[\forall i \in \{1, 2, \ldots, t\} : |Y \cap E_i| \le k_i - p_i]$. We need to prove that there is a set $\widehat{X} \in \widehat{\mathcal{S}}$ disjoint from $Y$ such that $w(\widehat{X}) \ge w(X)$ ($w(\widehat{X}) \le w(X)$). If $X \in \widehat{\mathcal{S}}$, then $X$ is the desired set, and thus we next assume that this is not the case. By the properties of the separators $D^{c_i}_{(E_i,k_i,p_i)}$, for all $i \in \{1, 2, \ldots, t\}$, there is a set $F_i \in \mathcal{F}_i$ such that $X \cap E_i \subseteq F_i$ and $Y \cap F_i = (Y \cap E_i) \cap F_i = \emptyset$. Therefore, by our definition of $\mathcal{F}$ and $\chi(X)$, there is a set $F \in \mathcal{F}$ such that $X \cap E \subseteq F$ and $Y \cap F = \emptyset$. By the pseudocode, when we reach $X$ we do not insert it to $\widehat{\mathcal{S}}$ since we have already inserted at least one other set $X'$ that is ordered before $X$ and satisfies $F \in \chi(X')$. This set $X'$ is the desired set $\widehat{X}$. □

E Proof of Lemma 5

In this section we prove the following lemma, which implies the correctness of Lemma 5:

Lemma 7. Denote $x = \lfloor(\frac{1}{2}+\delta)\gamma k\rfloor + \lfloor(\frac{1}{2}-\delta)\gamma k\rfloor$. Then, for any fixed $c_1, c_2, c_\ell, c_r \ge 1$ such that $c_1 \ge c_2$, k-CWP can be solved in deterministic time $O^*(X \cdot 2^{o(k)})$, where X is bounded by the maximum among the following four expressions:⁶

6 We give each expression a short name that roughly describes its relevance.



1. “First half (+δ) of the procedure”:

m+mmaxi=1

b( 12+δ)γkcmax

j= i−1m+m

(b( 12+δ)γkc−(k−2mbεkc−2))

i(bεkc−1)−jmax

s=1+(i−1)(bεkc−1)−j

((c`b( 1

2 + δ)γkc)2b( 12+δ)γkc−j

jj(c`b( 12 + δ)γkc − j)2b( 1

2+δ)γkc−2j· (c1(k − 1

ε − x))2(k−1ε−x)−s

ss(c1(k − 1ε − x)− s)2(k− 1

ε−x)−2s

).

2. “Transition (phase 1) from L to R”:

b( 12+δ)γkcmax

i=b( 12+δ)γkc−(k−2mbεkc−2)

k−2mbεkc−2maxj=0

(k− 1ε bεkc)+(m+m+1)(bεkc−1)−b( 1

2+δ)γkc−jmax

s=1+(m+m)(bεkc−1)−i−j(

(c`b( 12 +δ)γkc)2b( 1

2+δ)γkc−i

ii(c`b( 12 +δ)γkc−i)2b( 1

2+δ)γkc−2i· (crb( 1

2−δ)γkc)2b(12−δ)γkc−j

jj(crb( 12−δ)γkc−j)2b(

12−δ)γkc−2j

· (c1(k− 1ε−x))2(k−

1ε−x)−s

ss(c1(k− 1ε−x)−s)2(k− 1

ε−x)−2s

).

3. “Transition (phase 2) from c1 to c2”:7

maxc∈{c1−ε,c1−2ε,...,c2}

k−2mbεkcmaxj=0

maxs=1+(k− 1

ε bεkc)+(m+m+1)(bεkc−1)−b( 12+δ)γkc−j

((crb( 1

2 − δ)γkc)2b(12−δ)γkc−j

jj(crb( 12 − δ)γkc − j)2b(

12−δ)γkc−2j

· ((c+ ε)(k − 1ε − x))k−

1ε−x

ss((c+ ε)(k − 1ε − x)− s)k− 1

ε−x−s· (c(k − 1

ε − x))k−1ε−x−s

(c(k − 1ε − x)− s)k− 1

ε−x−s

).

4. “Second half (−δ) of the procedure”:

m−mmaxi=1

im−m (b( 1

2−δ)γkc−(k−2mbεkc−2))+(k−2mbεkc−2)maxj=0

(k− 1ε bεkc)+(m+m+i+1)(bεkc−1)−b( 1

2+δ)γkc−jmax

s=1+(k− 1ε bεkc)+(m+m+i)(bεkc)−b( 1

2+δ)γkc−j(

(crb( 12 − δ)γkc)2b(

12−δ)γkc−j

jj(crb( 12 − δ)γkc − j)2b(

12−δ)γkc−2j

· (c2(k − 1ε − x))2(k−

1ε−x)−s

ss(c2(k − 1ε − x)− s)2(k− 1

ε−x)−2s

).

Proof. Let IMG denote the union of the images of $\ell_1$, $\ell_2$, $r_1$ and $r_2$. When we next refer to (generalized) representative families (see Definition 3), suppose that $E_1 = L$, $E_2 = R$, $E_3 = V \setminus (L \cup R \cup IMG)$, $k_1 = \lfloor(\frac{1}{2}+\delta)\gamma k\rfloor$, $k_2 = \lfloor(\frac{1}{2}-\delta)\gamma k\rfloor$ and $k_3 = k - \frac{1}{\varepsilon} - x$.

We now present a standard dynamic programming-based procedure to prove the lemma, in which we embed representative sets computations (after each computation of a family of partial solutions, we compute a family that represents it). To this end, we use the following three matrices:

1. M has an entry [i, j, s, v] for all i ∈ {1, . . . ,m+m}, j ∈ { i−1m+m (b( 1

2+δ)γkc−(k−2mbεkc−2)), . . . , b( 12+δ)γkc},

s ∈ {1+(i−1)(bεkc−1)−j, . . . , i(bεkc−1)−j}, and v ∈ V \(R∪IMG) such that [(j+s = (i−1)(bεkc−1)+1)→({`1(i), v} ∈ E)] and [(j+s = i(bεkc−1))→ ({v, `2(i)} ∈ E)].

2. N has an entry [i, j, s, v] for all i ∈ {b( 12+δ)γkc−(k−2mbεkc−2), . . . , b( 1

2+δ)γkc}, j ∈ {0, . . . , k−2mbεkc−2},s ∈ {1+(m+m)(bεkc−1)−i−j, . . . , (k−1

ε bεkc)+(m+m+1)(bεkc−1)−b( 12+δ)γkc−j}, and v ∈ V \IMG such

that [(i+j+s = (m+m)(bεkc−1)+1)→ ({v`, v} ∈ E)] and [(i+j+s = (m+m+1)(bεkc−1))→ ({v, vr} ∈ E)].

3. K has an entry [i, j, s, v] for all i ∈ {1, . . . ,m−m}, j ∈ {0, . . . , im−m (b( 1

2−δ)γkc−(k−2mbεkc−2))+(k−2mbεkc−2)}, s ∈ {1+(k−1

ε bεkc)+(m+m+i)(bεkc−1)−b( 12+δ)γkc−j, . . . , (k−1

ε bεkc)+(m+m+i+1)(bεkc−1)−b( 12+δ)γkc−j},

and v ∈ V \ (L ∪ IMG) such that [(b( 12 +δ)γkc+j+s = (i−1)(bεkc−1)+1) → ({r1(i), v} ∈ E)] and

[(b( 12 +δ)γkc+j+s = i(bεkc−1))→ ({v, r2(i)} ∈ E)].

7 Assume WLOG that $\frac{c_1-c_2}{\varepsilon} \in \mathbb{N}$.



We next assume that a reference to an undefined entry returns ∅. The other entries will store the following families of partial solutions, where we assume that we track the weights of the sets of paths corresponding to the partial solutions:⁸

1. M[i, j, s, v]: A family that min $(k_1 - j, k_2, k_3 - s)$-represents the family that contains any union of node-sets of i simple internally node-disjoint directed paths $P_1, P_2, \ldots, P_i$, excluding the nodes in IMG, which satisfy the following conditions:
   • $P_1, P_2, \ldots, P_{i-1}$ satisfy “function condition 1”.
   • The first and last nodes of $P_i$ are $\ell_1(i)$ and v, respectively, and it does not contain internal nodes from $(R \cup IMG)$. Moreover, the total number of nodes from L and $V \setminus (L \cup IMG)$ that are contained in the paths $P_1, \ldots, P_i$ are exactly j and s, respectively.

2. N[i, j, s, v]: A family that min $(k_1 - i, k_2 - j, k_3 - s)$-represents the family that contains any union of node-sets of i simple internally node-disjoint directed paths $P_1, P_2, \ldots, P_i$, excluding the nodes in IMG, which satisfy the following conditions:
   • $P_1, P_2, \ldots, P_{i-1}$ satisfy “function condition 1”.
   • The first and last nodes of $P_i$ are $v_\ell$ and v, respectively, and it does not contain internal nodes from IMG. Moreover, the total number of nodes from L, R and $V \setminus (L \cup IMG)$ that are contained in the paths $P_1, \ldots, P_i$ are exactly i, j and s, respectively.

3. K[i, j, s, v]: A family that min $(0, k_2 - j, k_3 - s)$-represents the family that contains any union of node-sets of i simple internally node-disjoint directed paths $P_1, P_2, \ldots, P_i$, excluding the nodes in IMG, which satisfy the following conditions:
   • $P_1, P_2, \ldots, P_{i-1}$ satisfy “function conditions 1–3”.
   • The first and last nodes of $P_i$ are $r_1(i)$ and v, respectively, and it does not contain internal nodes from $(L \cup IMG)$. Moreover, the total number of nodes from R and $V \setminus (L \cup R \cup IMG)$ that are contained in the paths $P_1, \ldots, P_i$ are exactly j and s, respectively.

The entries are computed in the following order:

1. For i = 1, . . . ,m+m:• For j = i−1

m+m (b( 12 +δ)γkc−(k−2mbεkc−2)), . . . , b( 1

2 +δ)γkc:− For s = 1+(i−1)(bεkc−1)−j, . . . , i(bεkc−1)−j:∗ Compute all entries of the form M[i, j, s, v].

2. For i = b( 12 +δ)γkc−(k−2mbεkc−2), . . . , b( 1

2 +δ)γkc:• For j = 0, . . . , k−2mbεkc−2:

− For s = 1+(m+m)(bεkc−1)−i−j, . . . , (k− 1ε bεkc)+(m+m+1)(bεkc−1)−b( 1

2 +δ)γkc−j:∗ Compute all entries of the form K[i, j, s, v].

3. For i = 1, . . . ,m−m:• For j = 0, . . . , i

m−m (b( 12−δ)γkc−(k−2mbεkc−2))+(k−2mbεkc−2):

− For s = 1+(k−1ε bεkc)+(m+m+i)(bεkc−1)−b( 1

2 + δ)γkc−j, . . . , (k−1ε bεkc)+(m+m+i+1)(bεkc−

1)−b( 12 + δ)γkc−j:

∗ Compute all entries of the form N[i, j, s, v].

We now give the recursive formulas using which the entries are computed.9

The matrix M:

1. If j + s = 1:
   (a) If j = 1 and v ∈ L: M[i, j, s, v] = {{v}}.
   (b) Else if s = 1 and v ∉ L: M[i, j, s, v] = {{v}}.
2. Else:
   (a) If v ∈ L: $M[i, j, s, v] = \{\{v\} \cup A : A \in \bigcup_{u \in V} M[i-1, j-1, s, u]\} \cup \{\{v\} \cup A : A \in \bigcup_{(u,v) \in E} M[i, j-1, s, u]\}$.
   (b) Else: $M[i, j, s, v] = \{\{v\} \cup A : A \in \bigcup_{u \in V} M[i-1, j, s-1, u]\} \cup \{\{v\} \cup A : A \in \bigcup_{(u,v) \in E} M[i, j, s-1, u]\}$.

• After 2: Replace the result by a family that min $(k_1 - j, k_2, k_3 - s)$-represents it, computed using $c_\ell$, 1 and $c_1$, corresponding to $E_1$, $E_2$ and $E_3$, respectively.

8 When computing an entry and obtaining the same partial solution from several different sets of paths, we store the minimal weight.

9 Entries whose computation is not specified hold ∅.

The matrix N:

1. If i+ j + s = 1 + (m+ m)(bεkc − 1):

(a) If j = 0 and v ∈ L: N[i, j, s, v] = {{v} ∪A : A ∈⋃

u∈VM[(m+ m), j−1, s, u]}.

(b) Else if j = 1 and v ∈ R: N[i, j, s, v] = {{v} ∪A : A ∈⋃

u∈VM[(m+ m), j, s, u]}.

(c) Else if j = 0 and v ∈ V \ (L ∪R ∪ IMG): N[i, j, s, v] = {{v} ∪A : A ∈⋃

u∈VM[(m+ m), j, s−1, u]}.

2. Else:(a) If v ∈ L: N[i, j, s, v] = {{v} ∪A : A ∈

⋃

(u,v)∈VN[i−1, j, s, u]}.

(b) Else if v ∈ R: N[i, j, s, v] = {{v} ∪A : A ∈⋃

(u,v)∈VN[i, j−1, s, u]}.

(c) Else: N[i, j, s, v] = {{v} ∪A : A ∈⋃

(u,v)∈VN[i, j, s−1, u]}.

• After 1 and 2: Replace the result by a family that min (k1 − i, k2 − j, k3 − s)-represents it, computedusing c`, cr and c1, corresponding to E1, E2 and E3, respectively.

The matrix K:

1. If j + s = 1 + (k − 1ε bεkc) + (m+ m+ 1)(bεkc − 1)− (b( 1

2 + δ)γkc:(a) If v ∈ R: K[i, j, s, v] = {{v} ∪A : A ∈

⋃

u∈VN[(b(1

2+δ)γkc, j−1, s, u]}.

(b) Else: K[i, j, s, v] = {{v} ∪A : A ∈⋃

u∈VN[(b(1

2+δ)γkc, j, s−1, u]}.

2. Else:(a) If v ∈ R: K[i, j, s, v] = {{v} ∪A : A ∈

⋃

u∈VK[i−1, j−1, s, u]} ∪ {{v} ∪A : A ∈

⋃

(u,v)∈EK[i, j−1, s, u]}.

(b) Else: K[i, j, s, v] = {{v} ∪A : A ∈⋃

u∈VK[i−1, j, s−1, u]} ∪ {{v} ∪A : A ∈

⋃

(u,v)∈EK[i, j, s−1, u]}.

• After 1: Perform the following computation.1. Initialize A to be the result.2. For c = c1 − ε, c1 − 2ε, . . . , c2:− Replace A by a family that (k1, k2 − j, k3 − s)-represents it, computed using 1, cr and c, corre-

sponding to E1, E2 and E3, respectively.3. Replace the original result by A.

• After 2: Replace the result by a family that min (k1, k2 − j, k3 − s)-represents it, computed using 1, crand c2, corresponding to E1, E2 and E3, respectively.

Finally, we return yes iff at least one entry of the form $K[\frac{1}{\varepsilon}, k_2, k_3, v]$ contains a set of weight at most $(W - w((v, r_2(\frac{1}{\varepsilon}))))$.

By Theorem 3, up to a factor of 2o(k), the running time required to compute M is bounded by O∗

of the first expression in the lemma, the running time required to compute N is bounded by O∗ of thesecond expression, the running time required to compute the entries of K in which i = 1 and j + s =1 + (k − 1

ε bεkc) + (m+ m+ 1)(bεkc − 1)− (b( 12 + δ)γkc is bounded by O∗ of the third expression, and the

running time required to compute the other entries of K is bounded by O∗ of the fourth expression. ut



F Proof of Lemma 6 (Cont.)

Let PRT be the set of all partitions of {P1, P2, . . . , P2m} into two sets, PL of size (m+m) and PR of size (m−

m). By our definition of V ∗i ,

2m∑

j=1

|Vj ∩ V ∗i |+ |Vm+m+1 ∩ V ∗i | = (b(1

2+ δ)γkc+ b(1

2− δ)γkc). Furthermore, by

our choice of Pm+m+1, we have that |Vj ∩ V ∗i | ≤ |Vm+m+1 ∩ V ∗i |, for all j ∈ {1, 2, . . . , 2m}. Therefore, there

is a partition (PL,PR) in PRT such that∑

Pj∈PL

|Vj ∩ V ∗i | ≤ b(1

2+ δ)γkc and

∑

Pj∈PR

|Vj ∩ V ∗i | ≤ b(1

2− δ)γkc.

Next consider such a partition.Denote by P1, P2, . . . , Pm+m an order of the paths in PL such that |Vj−1 ∩ V ∗i | ≥ |Vj ∩ V ∗i |, for all

j ∈ {2, 3, . . . ,m + m}, where Vs is the node-set of Ps, for all s ∈ {1, 2, . . . ,m + m}. Furthermore, denoteby Pm+m+2, Pm+m+3, . . . , P2m+1 an order of the paths in PR such that |Vj−1 ∩ V ∗i | ≤ |Vj ∩ V ∗i |, for all j ∈{m+m+3,m+m+4, . . . , 2m+1}, where Vs is the node-set of Ps, for all s ∈ {m+m+2,m+m+3, . . . , 2m+1}.We define `1, `2 : {1, 2, . . . ,m + m} → U by letting `1(j) and `2(j) be the first and last nodes of Pj ,respectively, for all j ∈ {1, 2, . . . ,m+ m}. Similarly, we define r1, r2 : {1, 2, . . . ,m− m} → U by letting `1(j)and `2(j) be the first and last nodes of Pm+m+1+j , respectively, for all j ∈ {1, 2, . . . ,m− m}.

By our choice of (PL,PR), we can select a set of nodes Lm+m+1 from Vm+m+1 ∩ V ∗i such that for the

sets Rm+m+1 = (Vm+m+1 ∩ V ∗i ) \ Lm+m+1, L∗ = Lm+m+1 ∪ (⋃

j∈{1,2,...,m+m}(Vj ∩ V ∗i )) and R∗ = Rm+m+1∪

(⋃

j∈{m+m+2,m+m+3,...,2m+1}(Vj ∩ V ∗i )), we get that |L∗| = b( 1

2 + δ)γkc and |R∗| = b( 12 − δ)γkc. Let F be

the (n, b( 12 + δ)γkc+ b( 1

2 − δ)γkc, b( 12 − δ)γkc)-universal set computed by using the algorithm in Theorem

1. By its definition, there exists f ∈ F such that L∗ ⊆ L = {vj ∈ V \ U : j ≥ i, f(vj) = 0} andR∗ ⊆ R = {vj ∈ V \ U : j ≥ i, f(vj) = 1}.

Note that there exists an execution of Step 9 where PathAlg calls the procedure of Lemma 5 with the input(G,w,W, k, L,R, `1, `2, r1, r2, v

`, vr). Since this is a yes-instance of k-CWP (the paths P1, P2, . . . , P2m+1 forma solution for this instance), we get that PathAlg accepts. ut

G The Running Time of PathAlg

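Consider the expression X in the running time given in Lemma 6. First, note that it is bounded by the maximum of the following two expressions:

1. $\max_{0 \le \alpha \le 1}\ \max_{0 \le j \le (\frac{1}{2}+\delta)\alpha(k-\gamma k)} \Big[\Big(\frac{c_\ell^{2-\alpha}}{\alpha^{\alpha}(c_\ell-\alpha)^{2-2\alpha}}\Big)^{(\frac{1}{2}+\delta)\gamma k} \cdot \frac{(c_1(k-\gamma k))^{2(k-\gamma k)-j}}{j^j\,(c_1(k-\gamma k)-j)^{2(k-\gamma k)-2j}}\Big]$.

2. $\max_{0 \le \alpha \le 1}\ \max_{(\frac{1}{2}+\delta+\alpha(\frac{1}{2}-\delta))(k-\gamma k) \le j \le k-\gamma k} \Big[\Big(\frac{c_r^{2-\alpha}}{\alpha^{\alpha}(c_r-\alpha)^{2-2\alpha}}\Big)^{(\frac{1}{2}-\delta)\gamma k} \cdot \frac{(c_2(k-\gamma k))^{2(k-\gamma k)-j}}{j^j\,(c_2(k-\gamma k)-j)^{2(k-\gamma k)-2j}}\Big]$.

We can further bound this expression by $Y^k$, where Y is bounded by the maximum of the following two expressions:

1. $\max_{0 \le \alpha \le 1}\ \max_{0 \le \beta \le \alpha(\frac{1}{2}+\delta)} \Big(\frac{c_\ell^{2-\alpha}}{\alpha^{\alpha}(c_\ell-\alpha)^{2-2\alpha}}\Big)^{(\frac{1}{2}+\delta)\gamma} \cdot \Big(\frac{c_1^{2-\beta}}{\beta^{\beta}(c_1-\beta)^{2-2\beta}}\Big)^{(1-\gamma)}$.

2. $\max_{0 \le \alpha \le 1}\ \max_{(\frac{1}{2}+\delta+\alpha(\frac{1}{2}-\delta)) \le \beta \le 1} \Big(\frac{c_r^{2-\alpha}}{\alpha^{\alpha}(c_r-\alpha)^{2-2\alpha}}\Big)^{(\frac{1}{2}-\delta)\gamma} \cdot \Big(\frac{c_2^{2-\beta}}{\beta^{\beta}(c_2-\beta)^{2-2\beta}}\Big)^{(1-\gamma)}$.

Let $\alpha^{\ell}_c$ be the α that maximizes $\frac{c_\ell^{2-\alpha}}{\alpha^{\alpha}(c_\ell-\alpha)^{2-2\alpha}}$, $\alpha^{r}_c$ be the α that maximizes $\frac{c_r^{2-\alpha}}{\alpha^{\alpha}(c_r-\alpha)^{2-2\alpha}}$, $\beta^{1}_c$ be the β that maximizes $\frac{c_1^{2-\beta}}{\beta^{\beta}(c_1-\beta)^{2-2\beta}}$, and $\beta^{2}_c$ be the β that maximizes $\frac{c_2^{2-\beta}}{\beta^{\beta}(c_2-\beta)^{2-2\beta}}$. For any choice of values for $c_1$, $c_2$, $c_\ell$ and $c_r$ that we try below, we have that $(\frac{1}{2}+\delta) < \beta^{1}_c$ and $\beta^{2}_c < (\frac{1}{2}+\delta+\alpha^{r}_c(\frac{1}{2}-\delta))$. Denote $\alpha' = \max\{0, \frac{\beta^{2}_c - 1/2 - \delta}{1/2 - \delta}\}$. Therefore, we can further bound Y by the maximum of the following two expressions:

1. $\max_{\alpha^{\ell}_c \le \alpha \le 1} \Big(\frac{c_\ell^{2-\alpha}}{\alpha^{\alpha}(c_\ell-\alpha)^{2-2\alpha}}\Big)^{(\frac{1}{2}+\delta)\gamma} \cdot \Big(\frac{c_1^{2-\alpha(\frac{1}{2}+\delta)}}{(\alpha(\frac{1}{2}+\delta))^{\alpha(\frac{1}{2}+\delta)}\,(c_1-\alpha(\frac{1}{2}+\delta))^{2-2\alpha(\frac{1}{2}+\delta)}}\Big)^{(1-\gamma)}$.

2. $\max_{\alpha' \le \alpha \le \alpha^{r}_c} \Big(\frac{c_r^{2-\alpha}}{\alpha^{\alpha}(c_r-\alpha)^{2-2\alpha}}\Big)^{(\frac{1}{2}-\delta)\gamma} \cdot \Big(\frac{c_2^{2-(\frac{1}{2}+\delta+\alpha(\frac{1}{2}-\delta))}}{(\frac{1}{2}+\delta+\alpha(\frac{1}{2}-\delta))^{(\frac{1}{2}+\delta+\alpha(\frac{1}{2}-\delta))}\,(c_2-(\frac{1}{2}+\delta+\alpha(\frac{1}{2}-\delta)))^{2-2(\frac{1}{2}+\delta+\alpha(\frac{1}{2}-\delta))}}\Big)^{(1-\gamma)}$.

Denote the bounds in the first and second items by $Y_1$ and $Y_2$, respectively. Let $Z_1 = Y_1^k \cdot \binom{\gamma k}{(\frac{1}{2}-\delta)\gamma k}\, 2^{\frac{1}{10^{10}}k}$, $Z_2 = Y_2^k \cdot \binom{\gamma k}{(\frac{1}{2}-\delta)\gamma k}\, 2^{\frac{1}{10^{10}}k}$ and $Z = \max\{Z_1, Z_2\}$. Note that the running time of PathAlg is bounded by $O^*(Z)$.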

Table 6, given below, presents bounds for Z, corresponding to different choices of δ, γ, $c_1$, $c_2$, $c_\ell$ and $c_r$. In particular, by choosing δ = 0.046, γ = 0.084, $c_1$ = 1.504, $c_2$ = 1.398, $c_\ell$ = 1.092 and $c_r$ = 1.876, we get the bound 2.59606. In this case the maximum of $Z_1$ is obtained at α ≅ 0.908105, where it is almost equal to 2.59606, and the maximum of $Z_2$ is obtained at α ≅ 0.123734, where it is also almost equal to 2.59606.
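The base of the bound $Z_1$ can be checked numerically. The following sketch evaluates the per-k base of $Z_1$ for a given α, under two stated assumptions that are not spelled out in the text: the binomial coefficient is replaced by its standard entropy estimate $\binom{n}{pn} \le e^{n H(p)}$, and the factor $2^{k/10^{10}}$ is dropped as numerically negligible. The function names are hypothetical.

```python
import math


def log_g(c: float, a: float) -> float:
    """Natural log of c^(2-a) / (a^a * (c-a)^(2-2a)), for 0 < a < min(1, c)."""
    return (2 - a) * math.log(c) - a * math.log(a) - (2 - 2 * a) * math.log(c - a)


def entropy(p: float) -> float:
    """Binary entropy in nats; binom(n, p*n) is at most exp(n * entropy(p))."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


def z1_base(delta: float, gamma: float, c1: float, c_l: float, a: float) -> float:
    """Per-k base of Z_1 at a given alpha (entropy estimate for the binomial)."""
    half_plus = 0.5 + delta
    log_y1 = half_plus * gamma * log_g(c_l, a) + (1 - gamma) * log_g(c1, a * half_plus)
    log_binom = gamma * entropy(0.5 - delta)
    return math.exp(log_y1 + log_binom)


def z1_grid_max(delta: float, gamma: float, c1: float, c_l: float, steps: int = 1000) -> float:
    """Coarse grid maximum of z1_base over alpha in (0, 1)."""
    return max(z1_base(delta, gamma, c1, c_l, i / steps) for i in range(1, steps))
```

With δ = 0.046, γ = 0.084, $c_1$ = 1.504 and $c_\ell$ = 1.092, evaluating `z1_base` at α = 0.908105 gives a value close to the 2.59606 quoted above.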

(δ, γ, c1, c2, cℓ, cr)                          Z            Z1           Z2
(0.046, 0.084, 1.504, 1.398, 1.092, 1.876)     2.5960542    2.5960542    2.5960425
(0.045, 0.084, 1.504, 1.398, 1.092, 1.876)     2.5965734    2.5953152    2.5965734
(0.047, 0.084, 1.504, 1.398, 1.092, 1.876)     2.5967889    2.5967889    2.5955049
(0.046, 0.083, 1.504, 1.398, 1.092, 1.876)     2.5960903    2.5960421    2.5960903
(0.046, 0.085, 1.504, 1.398, 1.092, 1.876)     2.5960711    2.5960711    2.5959989
(0.046, 0.084, 1.503, 1.398, 1.092, 1.876)     2.5960547    2.5960547    2.5960425
(0.046, 0.084, 1.505, 1.398, 1.092, 1.876)     2.5960545    2.5960545    2.5960425
(0.046, 0.084, 1.504, 1.397, 1.092, 1.876)     2.5960542    2.5960542    2.5960430
(0.046, 0.084, 1.504, 1.399, 1.092, 1.876)     2.5960542    2.5960542    2.5960434
(0.046, 0.084, 1.504, 1.398, 1.091, 1.876)     2.5960544    2.5960544    2.5960425
(0.046, 0.084, 1.504, 1.398, 1.093, 1.876)     2.5960545    2.5960545    2.5960425
(0.046, 0.084, 1.504, 1.398, 1.092, 1.875)     2.5960542    2.5960542    2.5960425
(0.046, 0.084, 1.504, 1.398, 1.092, 1.877)     2.5960542    2.5960542    2.5960425

Table 6. Upper bounds for Z, Z1 and Z2, corresponding to different choices of δ, γ, c1, c2, cℓ and cr.

H Solving the (3, k)-WSP Problem

In this section we develop a deterministic FPT algorithm for (3, k)-WSP, following the fourth strategy in Section 2.1. We first solve a subcase of (3, k)-WSP, and then show how to translate (3, k)-WSP to this subcase by using unbalanced cutting of the universe.

H.1 Solving a Subcase of (3, k)-WSP

We first give a brief overview of the algorithm for (3, k)-WSP of [25] since our algorithm builds upon it, and it will allow us to explain the intuition behind the definition of the subcase solved in this section. To this end, we need the following observation of [8], assuming an arbitrary order on the elements in U:

Observation 3. Let $\mathcal{S}' \subseteq \mathcal{S}$, and denote $S_{min} = \{u : \exists S \in \mathcal{S}'$ in which u is the smallest element$\}$. Then, any $S \in \mathcal{S}$ whose smallest element is larger than $\max(S_{min})$ does not contain any element from $S_{min}$.
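Observation 3 can be illustrated on a small instance. The sketch below is only a sanity check with hypothetical helper names; it represents sets as Python sets of integers ordered in the usual way.

```python
from typing import Iterable, Set


def smallest_elements(subfamily: Iterable[Set[int]]) -> Set[int]:
    """S_min: the elements that are the smallest element of some set in the subfamily."""
    return {min(s) for s in subfamily}


def observation_holds(family: Iterable[Set[int]], subfamily: Iterable[Set[int]]) -> bool:
    """Check that every set whose smallest element exceeds max(S_min) avoids S_min."""
    s_min = smallest_elements(subfamily)
    threshold = max(s_min)
    return all(s_min.isdisjoint(s) for s in family if min(s) > threshold)
```

The check succeeds for any input, since every element of a set is at least its smallest element, which in turn exceeds every element of $S_{min}$.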



The algorithm of [25] iterates over U in an ascending order, such that when it reaches an element u ∈ U, it has already computed representative families of families of partial solutions (each partial solution corresponds to a family of disjoint sets from $\mathcal{S}$) that include only sets from $\mathcal{S}$ whose smallest elements are smaller than u. Then, it tries to extend the partial solutions by adding sets whose smallest element is u, followed by computations of representative families to reduce the size of the resulting families. By Observation 3, the elements in U that are the smallest elements of sets in the partial solutions do not appear in any set whose smallest element is at least u. This allows the algorithm to delete the smallest elements of sets after adding them to partial solutions, which results in faster computations of representative families that overall improves the running time of the algorithm.

We now consider Cut (3, k)-WSP ((3, k)-CWSP), the subcase of (3, k)-WSP that we solve using a representative sets-based procedure. Informally, in defining this subcase, the main idea is to introduce a function that cuts a solution in a manner that allows us, while executing a procedure that builds upon the algorithm of [25], to delete more elements than just those that are the smallest in the inserted sets. More precisely, in the algorithm, we will have a fixed, though large, number of locations where we are “given” an element that indicates that from now on, we can delete all elements that are smaller than it.

Fix 0 < ε < 0.1, whose value is determined later, such that $\frac{1}{\varepsilon} \in \mathbb{N}$. Formally, the input for (3, k)-CWSP consists of an ordered universe $U = \{u_1, u_2, \ldots, u_n\}$, where $u_{i-1} < u_i$ for all $i \in \{2, 3, \ldots, n\}$. We are also given a family $\mathcal{S}$ of subsets of size 3 of U, a weight function $w : \mathcal{S} \to \mathbb{R}$, a weight $W \in \mathbb{R}$ and a parameter $k \in \mathbb{N}$, along with a non-decreasing function $f : \{1, 2, \ldots, \frac{1}{\varepsilon}\} \to U$.

We need to decide if there is an ordered subfamily $\mathcal{S}' \subseteq \mathcal{S}$ of k disjoint sets, denoted accordingly as $\mathcal{S}' = \{S_1, S_2, \ldots, S_k\}$, whose total weight is at least W, such that the following “solution conditions” are satisfied:

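1. $\forall i \in \{2, 3, \ldots, k\}$: $\min(S_{i-1}) < \min(S_i)$ (i.e., the smallest element in $S_{i-1}$ is smaller than the smallest one in $S_i$).

2. $\forall i \in \{1, 2, \ldots, 1/\varepsilon\}$: There are at least $R(i)$ elements in $\bigcup_{j=1}^{i\lfloor\varepsilon k\rfloor} (S_j \setminus \{\min(S_j)\})$ that are smaller than or equal to $f(i)$, where $R$ is defined according to the following recursion.

• $R(0) = R(1) = 0$.

• For $j = 2, \ldots, 1/\varepsilon$: $R(j) = R(j-1) + \left\lceil \dfrac{2(j-1)\lfloor\varepsilon k\rfloor - R(j-1)}{\lceil 3(k - (j-1)\lfloor\varepsilon k\rfloor)/\lfloor\varepsilon k\rfloor \rceil} \right\rceil$.

3. $\forall i \in \{1, 2, \ldots, 1/\varepsilon\}$: All the elements in $\bigcup_{j=1+i\lfloor\varepsilon k\rfloor}^{k} S_j$ are larger than $f(i)$.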
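The recursion for $R$ in condition 2 can be evaluated directly with integer arithmetic. The following is a minimal sketch, not part of the original procedure; the function names `ceil_div` and `r_thresholds` and the choice to pass $1/\varepsilon$ as the integer `inv_eps` are illustrative assumptions.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for integers with b > 0."""
    return -(-a // b)


def r_thresholds(k: int, inv_eps: int) -> list[int]:
    """Return [R(0), R(1), ..., R(1/eps)] for the (3, k)-CWSP recursion.

    inv_eps stands for 1/eps (assumed to be an integer), so that
    floor(eps * k) equals k // inv_eps.
    """
    block = k // inv_eps  # floor(eps * k)
    if block == 0:
        raise ValueError("k is too small: floor(eps * k) must be positive")
    r = [0, 0]  # R(0) = R(1) = 0
    for j in range(2, inv_eps + 1):
        # ceil(3 * (k - (j - 1) * floor(eps k)) / floor(eps k))
        denominator = ceil_div(3 * (k - (j - 1) * block), block)
        numerator = 2 * (j - 1) * block - r[j - 1]
        r.append(r[j - 1] + ceil_div(numerator, denominator))
    return r
```

For example, `r_thresholds(200, 20)` corresponds to $k = 200$ and $\varepsilon = 0.05$, so that $\lfloor\varepsilon k\rfloor = 10$.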

We next show that (3, k)-CWSP can indeed be efficiently solved using representative sets, proving the following lemma:

Lemma 8. For any fixed $c \geq 1$ and $0 < \varepsilon < 1$, (3, k)-CWSP can be solved in deterministic time

$$Y = O^*\Big(2^{o(k)} \cdot \max_{i=1}^{1/\varepsilon}\ \max_{j=1+(i-1)\lfloor\varepsilon k\rfloor}^{i\lfloor\varepsilon k\rfloor} X\Big),$$

where, for $i' \in \{1, 2\}$ such that $[i' = 2 \leftrightarrow j = 1 + (i-1)\lfloor\varepsilon k\rfloor > 1]$, $X$ is equal to

$$\frac{\big(c(3k-(j-1)-R(i-i'))\big)^{3k-(j-1)-R(i-i')}}{\big(2(j-1)-R(i-i')\big)^{2(j-1)-R(i-i')}\,\big(c(3k-(j-1)-R(i-i'))-(2(j-1)-R(i-i'))\big)^{(3k-(j-1)-R(i-i'))-(2(j-1)-R(i-i'))}} \cdot \left(\frac{c(3k-j-R(i-1))}{c(3k-j-R(i-1))-(2j-R(i-1))}\right)^{(3k-j-R(i-1))-(2j-R(i-1))}.$$

Proof. When we next refer to representative families, suppose that $E = U$. We now present a standard dynamic programming-based procedure to prove the lemma, in which we embed representative sets computations. To this end, we use a matrix M that has an entry $[i, j, s_1, s_2, \ldots, s_{1/\varepsilon}, m]$ for all $i \in \{1, 2, \ldots, 1/\varepsilon + 1\}$, $j \in \{1 + (i-1)\lfloor\varepsilon k\rfloor, \ldots, J\}$ where $[(i \leq 1/\varepsilon \rightarrow J = i\lfloor\varepsilon k\rfloor) \wedge (i = 1/\varepsilon + 1 \rightarrow J = (i-1)\lfloor\varepsilon k\rfloor + (k \bmod \lfloor\varepsilon k\rfloor))]$, $[\forall \ell \in \{1, \ldots, 1/\varepsilon\} : s_\ell \in \{0, 1, \ldots, 2j\}]$ such that $[\forall \ell \in \{1, \ldots, i-1\} : s_\ell \geq R(\ell)]$, and $m \in \{j, \ldots, n\}$ such that $(i > 1 \rightarrow u_m > f(i-1))$.



We next assume that a reference to an undefined entry returns ∅. The other entries will store the following families of partial solutions, where we assume that we track the weights of the partial solutions (see footnote 10):

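• $M[i, j, s_1, \ldots, s_{1/\varepsilon}, m]$: A family that max $3(k-j)$-represents the family $\mathcal{F}$ defined as follows. Let $\widetilde{\mathcal{F}}$ be the family of all sequences of ordered disjoint sets $S_1, S_2, \ldots, S_j$ such that:

1. $\forall \ell \in \{2, 3, \ldots, j\}$: $\min(S_{\ell-1}) < \min(S_\ell)$.

2. $\min(S_j) = u_m$.

3. $\forall \ell \in \{1, 2, \ldots, i-1\}$: There are at least $R(\ell)$ elements in $\bigcup_{p=1}^{\ell\lfloor\varepsilon k\rfloor} (S_p \setminus \{\min(S_p)\})$ that are at most $f(\ell)$.

4. $\forall \ell \in \{1, 2, \ldots, 1/\varepsilon\}$: There are exactly $s_\ell$ elements in $\bigcup_{p=1}^{j} (S_p \setminus \{\min(S_p)\})$ that are at most $f(\ell)$.

5. $\forall \ell \in \{1, 2, \ldots, i-1\}$: All the elements in $\bigcup_{p=1+\ell\lfloor\varepsilon k\rfloor}^{j} S_p$ are larger than $f(\ell)$.

Suppose that $f(0)$ returns a value smaller than $u_1$. Then, for each such sequence $S_1, S_2, \ldots, S_j$ in $\widetilde{\mathcal{F}}$, $\mathcal{F}$ contains the set $\Big(\bigcup_{\ell=1}^{j} (S_\ell \setminus \{\min(S_\ell)\})\Big) \setminus \{u_\ell \in U : u_\ell \leq f(i-1)\}$.

The entries are computed in the following order:

• For $i = 1, \ldots, 1/\varepsilon + 1$:
  − For $j = 1 + (i-1)\lfloor\varepsilon k\rfloor, \ldots, J$:
    ∗ Compute all entries of the form $M[i, j, s_1, \ldots, s_{1/\varepsilon}, m]$.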

We now give the recursive formulas by which the entries are computed.

1. If $j = 1$: $M[1, 1, s_1, \ldots, s_{1/\varepsilon}, m] = \{S \in \mathcal{S} : \min(S) = u_m, [\forall \ell \in \{1, \ldots, 1/\varepsilon\} : |\{u_p \in S \setminus \{u_m\} : u_p \leq f(\ell)\}| = s_\ell]\}$.

2. Else: $M[i, j, s_1, \ldots, s_{1/\varepsilon}, m] = \Big\{S \cup A : S \cap A = \emptyset, S \in \mathcal{S}, \min(S) = u_m, A \in \bigcup_{m'=1}^{m-1} \bigcup_{i'=0}^{1} M[i-i', j-1, s'_1, \ldots, s'_{1/\varepsilon}, m']\Big\}$, where $[\forall \ell \in \{1, \ldots, 1/\varepsilon\} : s'_\ell = s_\ell - |\{u_p \in S \setminus \{u_m\} : u_p \leq f(\ell)\}|]$.

• After 2:

1. If $i > 1$: Remove from each set in $M[i, j, s_1, \ldots, s_{1/\varepsilon}, m]$ the elements that are at most $f(i-1)$.

2. Replace the result by a family that max $3(k-j)$-represents it, where $s_0 = 0$.

Finally, we return yes iff at least one entry of the form $M[i, k, s_1, \ldots, s_{1/\varepsilon}, m]$ contains a solution of weight at least $W$. Note that Theorem 2 ensures that the procedure can be performed in the desired time. □

We proceed to analyze the bound, denoted $Y$, for the running time in Lemma 8. First, define $T$ according to the following recursion:

• $T(0) = T(1) = 0$.

• For $j = 2, \ldots, 1/\varepsilon$: $T(j) = T(j-1) + \varepsilon\left(\dfrac{2(j-1)\varepsilon - T(j-1)}{3(1 - (j-1)\varepsilon)}\right)$.

Thus, we get that $Y = O^*\big(2^{o(k)} \cdot \max_{i=1}^{1/\varepsilon} X'\big)$, where $X'$ is equal to

$$\max_{(i-1)\varepsilon k \leq j \leq i\varepsilon k} \left(\frac{\big(c(3k - j - T(i-1)k)\big)^{2(3k-j-T(i-1)k)-(2j-T(i-1)k)}}{\big(2j - T(i-1)k\big)^{2j-T(i-1)k}\,\big(c(3k - j - T(i-1)k) - (2j - T(i-1)k)\big)^{2(3k-j-T(i-1)k)-2(2j-T(i-1)k)}}\right).$$

10 When computing an entry and obtaining the same partial solution from several different families of sets, we store the maximal weight.



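Now, we can further bound $X'$ by the following expression:

$$\max_{(i-1) \leq \alpha \leq i} \left(\frac{\big(c(3 - \alpha\varepsilon - T(i-1))\big)^{2(3-\alpha\varepsilon-T(i-1))-(2\alpha\varepsilon-T(i-1))}}{\big(2\alpha\varepsilon - T(i-1)\big)^{2\alpha\varepsilon-T(i-1)}\,\big(c(3 - \alpha\varepsilon - T(i-1)) - (2\alpha\varepsilon - T(i-1))\big)^{2(3-\alpha\varepsilon-T(i-1))-2(2\alpha\varepsilon-T(i-1))}}\right)^{k}.$$

Therefore $Y$ is bounded by:

$$O^*\left(2^{o(k)} \cdot \max_{i=1}^{1/\varepsilon}\ \max_{(i-1) \leq \alpha \leq i} \left(\frac{\big(c(3 - \alpha\varepsilon - T(i-1))\big)^{6-4\alpha\varepsilon-T(i-1)}}{\big(2\alpha\varepsilon - T(i-1)\big)^{2\alpha\varepsilon-T(i-1)}\,\big(c(3 - \alpha\varepsilon - T(i-1)) - 2\alpha\varepsilon + T(i-1)\big)^{6-6\alpha\varepsilon}}\right)^{k}\right).$$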

To allow us to compute a bound efficiently, we choose $\varepsilon = 10^{-5}$. Choosing a smaller $\varepsilon$ results in a better bound, but the improvement is negligible. Table 7, given below, presents bounds for $Y$, corresponding to different choices of $c$. In particular, by choosing $c = 1.591$, we get the bound $O^*(8.097^k)$.

c        Y                   i        T(i − 1)
1.59     O∗(8.096400^k)      54511    0.1476545
1.591    O∗(8.096396^k)      54515    0.1476821
1.592    O∗(8.096397^k)      54518    0.1477028

Table 7. Upper bounds for Y, corresponding to different choices of c, where ε = 10^{−5}. The second and third entries specify approximate values for i and T(i − 1) that correspond to the maximum.
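The entries of Table 7 can be approximated numerically from the recursion for $T$ and the final bound on $Y$. The sketch below is an illustration under stated assumptions rather than the computation used by the author: the function name `wsp_bound_base` is hypothetical, the $2^{o(k)}$ factor is ignored, and the inner maximum over $\alpha \in [i-1, i]$ is approximated by evaluating only the two endpoints.

```python
import math


def wsp_bound_base(c: float = 1.591, inv_eps: int = 100_000):
    """Approximate the base b of the bound O*(b^k) on Y.

    inv_eps stands for 1/eps. Returns (b, i, T(i - 1)) at the maximum found.
    """

    def log_base(x: float, t: float) -> float:
        # Logarithm of the expression raised to the power k,
        # with x = alpha * eps and t = T(i - 1).
        a = 3.0 - x - t
        b = 2.0 * x - t
        d = c * a - b
        value = (6.0 - 4.0 * x - t) * math.log(c * a)
        if b > 0.0:  # convention 0^0 = 1
            value -= b * math.log(b)
        if d > 0.0:
            value -= (6.0 - 6.0 * x) * math.log(d)
        return value

    eps = 1.0 / inv_eps
    best = (-math.inf, 0, 0.0)
    t = 0.0  # holds T(i - 1) at the start of iteration i
    for i in range(1, inv_eps + 1):
        for alpha in (i - 1, i):  # endpoints of the range of alpha
            candidate = log_base(alpha / inv_eps, t)
            if candidate > best[0]:
                best = (candidate, i, t)
        # T(i) = T(i-1) + eps * (2(i-1)eps - T(i-1)) / (3(1 - (i-1)eps))
        x_prev = (i - 1) / inv_eps
        t += eps * (2.0 * x_prev - t) / (3.0 * (1.0 - x_prev))
    return math.exp(best[0]), best[1], best[2]
```

Calling `wsp_bound_base(1.591)` should give a base close to the value 8.0964 reported in the table, with the maximum attained near $i\varepsilon \approx 0.545$.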

We thus obtain the following corollary:

Corollary 3. For $c = 1.591$ and $\varepsilon = 10^{-5}$, (3, k)-CWSP can be solved in deterministic time $O^*(8.097^k)$.

H.2 A Deterministic Algorithm for (3, k)-WSP

Fix $c = 1.591$ and $\varepsilon = 10^{-5}$. We now present the pseudocode of WSPAlg, a deterministic algorithm for (3, k)-WSP, followed by informal explanations.

Algorithm 5 WSPAlg($U, \mathcal{S}, w, W, k$)

 1: order the universe $U$ arbitrarily as $U = \{u_1, u_2, \ldots, u_n\}$.
 2: for all distinct $\ell_1, \ell_2, \ldots, \ell_{1/\varepsilon}, r_1, r_2, \ldots, r_{1/\varepsilon} \in U$ s.t. $\ell_i < r_i$ for all $i \in \{1, 2, \ldots, 1/\varepsilon\}$ do
 3:   for $i = 1, 2, \ldots, 1/\varepsilon$ do
 4:     define $U^i = \{\ell_i, \ell_i + 1, \ldots, r_i\} \setminus \bigcup_{j=1}^{i-1} U^j$.
 5:   end for
 6:   $U^{1/\varepsilon+1} = U \setminus \bigcup_{j=1}^{1/\varepsilon} U^j$.
 7:   let $U' = \{u'_1, u'_2, \ldots, u'_n\}$ be an ordered copy of $U$ such that the elements in $U^1$ appear first, then those in $U^2$, and so on, where the order between the elements in each $U^i$ is preserved according to their order in $U$.
 8:   define $f : \{1, 2, \ldots, 1/\varepsilon\} \to U'$ such that $f(i)$ is the largest element in $U'$ that belongs to $U^i$.
 9:   if the procedure of Corollary 3 accepts $(U', \mathcal{S}, w, W, k, f)$ then accept. end if
10: end for
11: reject.

Algorithm WSPAlg first orders the elements in $U$. Then, it exhaustively examines every option to explicitly cut the universe into $1/\varepsilon$ parts, such that a part consists of elements that are consecutively ordered in $U$ when considering only those left after removing the elements that belong to parts we have already defined (see Steps 2–6). Note that there are $O(|U|^{O(1/\varepsilon)}) = O^*(1)$ such options. The order between the parts is defined according to the order in which they were defined, which, in turn, defines a reordering of the universe $U$ (in Step 7). The function $f$ is also defined according to the order between the parts (see Step 8), assigning, in ascending order, the last element of each part. Overall, WSPAlg uses unbalanced cutting of the universe to obtain a set of inputs for (3, k)-CWSP (see footnote 11), and accepts (in Step 9) iff at least one of them is a yes-instance.
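For a single choice of the elements $\ell_i$ and $r_i$, Steps 3–8 of the pseudocode can be sketched as follows. This is only an illustration: elements of $U$ are represented by their 1-based positions in the order chosen in Step 1, the function name `cut_universe` is hypothetical, and an empty part is given `None` as its value of $f$, a case the text does not discuss.

```python
def cut_universe(n: int, cuts: list[tuple[int, int]]):
    """Build the parts U^1, ..., U^{1/eps}, the reordered universe U', and f.

    n    -- size of the universe, whose elements are the positions 1..n
    cuts -- the pairs (l_i, r_i) with l_i < r_i, one pair per part
    """
    used: set[int] = set()
    parts: list[list[int]] = []
    for left, right in cuts:
        # U^i = {l_i, ..., r_i} minus all previously defined parts (Step 4)
        part = [u for u in range(left, right + 1) if u not in used]
        used.update(part)
        parts.append(part)
    # U^{1/eps + 1}: everything not covered by the parts (Step 6)
    rest = [u for u in range(1, n + 1) if u not in used]
    # U': parts in the order they were defined, internal order preserved (Step 7)
    reordered = [u for part in parts for u in part] + rest
    # f(i): the last element of U^i in the new order (Step 8)
    f = [part[-1] if part else None for part in parts]
    return parts, reordered, f
```

For instance, with `n = 10` and `cuts = [(3, 5), (2, 7)]`, the second part is $\{2, 6, 7\}$ because the elements 3, 4 and 5 already belong to the first part.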

We next consider the correctness and running time of WSPAlg.

Theorem 7. WSPAlg solves (3, k)-WSP in time $O^*(8.097^k)$.

Proof. The running time of the algorithm follows immediately from the pseudocode and Corollary 3.

For the easier direction, note that for any ordering of the universe $U$ as $U'$, and for any function $f : \{1, 2, \ldots, 1/\varepsilon\} \to U'$, a solution to the instance $(U', \mathcal{S}, w, W, k, f)$ of (3, k)-CWSP is also a solution to the instance $(U, \mathcal{S}, w, W, k)$ of (3, k)-WSP. Therefore, if the algorithm accepts, the input is a yes-instance of (3, k)-WSP.

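Now, suppose that the input is a yes-instance of (3, k)-WSP, and let $\mathcal{S}'$ be a corresponding solution. Let $U = \{u_1, u_2, \ldots, u_n\}$ be the order chosen by WSPAlg in Step 1. We define $U^1, U^2, \ldots, U^{1/\varepsilon}$, $f$ and an order $S_1, S_2, \ldots, S_k$ of the sets in $\mathcal{S}'$ as follows.

1. Initialize:

(a) Let $S_1, S_2, \ldots, S_{\lfloor\varepsilon k\rfloor}$ be the sets in $\mathcal{S}'$ satisfying $[\min(S_1) < \min(S_2) < \ldots < \min(S_{\lfloor\varepsilon k\rfloor})]$ and $[\min(S_{\lfloor\varepsilon k\rfloor}) < \min(S)]$ for all $S \in \mathcal{S}' \setminus \{S_1, S_2, \ldots, S_{\lfloor\varepsilon k\rfloor}\}$.

(b) Let $U^1 = \{u_1, u_2, \ldots, \min(S_{\lfloor\varepsilon k\rfloor})\}$.

(c) Let $f(1) = \min(S_{\lfloor\varepsilon k\rfloor})$. Note that all the elements in $(\bigcup \mathcal{S}') \setminus (\bigcup_{j=1}^{\lfloor\varepsilon k\rfloor} S_j)$ are larger than $f(1)$.

(d) Let $U^1_{\mathrm{ord}} = \{u'_1, u'_2, \ldots, u'_n\}$ be an ordered copy of $U$ such that the elements in $U^1$ appear first, where the internal order between the elements in $U^1$, as well as the internal order between the elements in $U \setminus U^1$, are preserved according to their order in $U$.

(e) Let $P_1 = \emptyset$.

2. For $i = 2, 3, \ldots, 1/\varepsilon$:

(a) Let $A' = \bigcup_{j=1}^{(i-1)\lfloor\varepsilon k\rfloor} S_j$, and $A'_{\min} = \bigcup_{j=1}^{(i-1)\lfloor\varepsilon k\rfloor} \{\min(S_j)\}$, where min is computed according to the ordered universe $U^{i-1}_{\mathrm{ord}}$.

(b) Let $B' = (\bigcup \mathcal{S}') \setminus (\bigcup_{j=1}^{(i-1)\lfloor\varepsilon k\rfloor} S_j)$.

(c) Let $B_1, B_2, \ldots, B_x$, where $x = \lceil |B'| / \lfloor\varepsilon k\rfloor \rceil$, be a partition of $U^{i-1}_{\mathrm{ord}} \setminus (\bigcup_{j=1}^{i-1} U^j)$ (into $x$ disjoint universes) such that each $B_i$ is a set of elements that are consecutively ordered in $U^{i-1}_{\mathrm{ord}}$ and contains exactly $\lfloor\varepsilon k\rfloor$ elements from $B'$, except $B_x$, if $(|B'| \bmod \lfloor\varepsilon k\rfloor) \neq 0$, which contains exactly $(|B'| \bmod \lfloor\varepsilon k\rfloor)$ elements from $B'$.

(d) Let $U^i$ be a universe that contains the maximum number of elements from $A' \setminus (A'_{\min} \cup P_{i-1})$ among the universes $B_1, B_2, \ldots, B_x$. If $U^i$ contains elements from fewer than $\lfloor\varepsilon k\rfloor$ sets in $\mathcal{S}' \setminus \{S_1, S_2, \ldots, S_{(i-1)\lfloor\varepsilon k\rfloor}\}$, add elements from $U^{i-1}_{\mathrm{ord}} \setminus (\bigcup_{j=1}^{i-1} U^j)$ to $U^i$ such that its elements remain consecutively ordered in $U^{i-1}_{\mathrm{ord}}$ and it contains elements from exactly $\lfloor\varepsilon k\rfloor$ sets in $\mathcal{S}' \setminus \{S_1, \ldots, S_{(i-1)\lfloor\varepsilon k\rfloor}\}$.

(e) Let $f(i)$ be the largest element in $U^i$.

(f) Let $U^i_{\mathrm{ord}} = \{u'_1, u'_2, \ldots, u'_n\}$ be an ordered copy of $U$ such that the elements in $\bigcup_{j=1}^{i-1} U^j$ appear first, followed by those in $U^i$, where the internal orders between the elements in $\bigcup_{j=1}^{i-1} U^j$, $U^i$ and $U \setminus (\bigcup_{j=1}^{i} U^j)$ are preserved according to their order in $U^{i-1}_{\mathrm{ord}}$.

(g) Let $S_{(i-1)\lfloor\varepsilon k\rfloor+1}, S_{(i-1)\lfloor\varepsilon k\rfloor+2}, \ldots, S_{i\lfloor\varepsilon k\rfloor}$ be the sets in $\mathcal{S}' \setminus \{S_1, S_2, \ldots, S_{(i-1)\lfloor\varepsilon k\rfloor}\}$ that contain elements from $U^i$, such that $\min(S_{j-1}) < \min(S_j)$ (according to $U^i_{\mathrm{ord}}$).

(h) Note that all the elements in $(\bigcup \mathcal{S}') \setminus (\bigcup_{j=1}^{i\lfloor\varepsilon k\rfloor} S_j)$ are larger than $f(i)$ (according to $U^i_{\mathrm{ord}}$).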

11 We chose the term “unbalanced cutting of the universe” to imply that when applying this method, we next delete elements from partial solutions in a manner that distorts the balance between the number of small and large elements that they contain.



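(i) Let $P_i$ be the set of elements in $\bigcup_{j=1}^{i\lfloor\varepsilon k\rfloor} (S_j \setminus \{\min(S_j)\})$ that are smaller than or equal to $f(i)$ (according to $U^i_{\mathrm{ord}}$). We have that

$$|P_i| \geq |P_{i-1}| + \left\lceil \frac{|A' \setminus (A'_{\min} \cup P_{i-1})|}{x} \right\rceil \geq R(i-1) + \left\lceil \frac{2(i-1)\lfloor\varepsilon k\rfloor - R(i-1)}{\lceil |B'| / \lfloor\varepsilon k\rfloor \rceil} \right\rceil = R(i-1) + \left\lceil \frac{2(i-1)\lfloor\varepsilon k\rfloor - R(i-1)}{\lceil 3(k - (i-1)\lfloor\varepsilon k\rfloor) / \lfloor\varepsilon k\rfloor \rceil} \right\rceil.$$

3. Let $U' = U^{1/\varepsilon}_{\mathrm{ord}}$.

4. Let $S_{1+\frac{1}{\varepsilon}\lfloor\varepsilon k\rfloor}, \ldots, S_k$ be the sets in $\mathcal{S}' \setminus \{S_1, \ldots, S_{\frac{1}{\varepsilon}\lfloor\varepsilon k\rfloor}\}$, such that $\min(S_{j-1}) < \min(S_j)$, where min and < are computed according to the ordered universe $U'$.

We have thus defined an instance $(U', \mathcal{S}, w, W, k, f)$ of (3, k)-CWSP that is examined by WSPAlg in Step 9. Since this is a yes-instance (by the above arguments, $S_1, S_2, \ldots, S_k$ is a solution to this instance), the algorithm accepts. □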

I Solving the P2-Packing Problem

We say that a set of $t$ (node-)disjoint simple paths, each on 3 nodes, is a $t$-packing. We first note that the papers [15,16] develop an algorithm that, given a bipartite graph $H = (A, B, E)$, decides in polynomial time if $H$ contains a $k$-packing such that the end-nodes of its paths belong to $B$. Then, given a graph $G = (V, E)$, they solve P2-Packing by examining $8^{k+o(k)}$ options to deterministically partition $V$ into the node-sets $A$ and $B$, and accepting iff at least one of the resulting bipartite graphs $H = (A, B, \{\{a, b\} \in E : a \in A, b \in B\})$ has a $k$-packing such that the end-nodes of its paths belong to $B$. In the randomized version, they examine $6.75^k$ such options. The goal is to examine an option that captures the end-nodes of a $k$-packing in $G$ (if one exists) in $B$, and its other nodes in $A$. We observe that this can be done by using a $(|V|, 3k, k)$-universal set (see Definition 1). By Theorem 1, this results in an $O^*(6.75^{k+o(k)})$-time deterministic algorithm for P2-Packing.

In the rest of this section we develop an alternative deterministic FPT algorithm for P2-Packing, demonstrating the fifth strategy in Section 2.1. We will focus on the following variant of P2-Packing:

Iterative Compression P2-Packing (P2-ICP): Given an undirected graph $G = (V, E)$ and a parameter $k \in \mathbb{N}$, along with a $(k-1)$-packing in $G$, denoted $S_1 = (V_1, E_1), S_2 = (V_2, E_2), \ldots, S_{k-1} = (V_{k-1}, E_{k-1})$, decide if $G$ has a $k$-packing.

Clearly, if we can solve this variant in time $T$, we can solve P2-Packing in time $O^*(T)$ by using the algorithm for P2-ICP $k$ times, starting with an empty solution, and then, for $i = 1, 2, \ldots, k$, constructing a solution that contains exactly $i$ paths. It will be useful to focus on P2-ICP, since we can thus use the following result of [17] (see footnote 12):

Theorem 8. Let $G$ be a graph that has a $k$-packing. For any $(k-1)$-packing in $G$, denoted $S_1 = (V_1, E_1), S_2 = (V_2, E_2), \ldots, S_{k-1} = (V_{k-1}, E_{k-1})$, $G$ has a $k$-packing that contains at least $\lceil 2.5(k-1) \rceil$ nodes from $\bigcup_{i=1}^{k-1} V_i$.
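The reduction from P2-Packing to P2-ICP described before Theorem 8 is a simple loop that grows the packing one path at a time. The sketch below illustrates only this outer loop. The function `brute_force_icp` is a hypothetical exhaustive stand-in for a P2-ICP solver (it ignores the structure of the given packing and is exponential in general); it is not the algorithm developed in this section. Graphs are assumed to be given as dictionaries mapping each node to the set of its neighbors.

```python
from itertools import combinations


def three_node_paths(adj):
    """All simple paths on 3 nodes, as triples (a, b, c) with middle node b."""
    paths = set()
    for middle, neighbors in adj.items():
        for a, c in combinations(sorted(neighbors), 2):
            paths.add((a, middle, c))
    return sorted(paths)


def brute_force_icp(adj, packing):
    """Hypothetical stand-in: return a packing with one more path, or None."""
    target = len(packing) + 1
    paths = three_node_paths(adj)

    def extend(start, chosen, used):
        if len(chosen) == target:
            return list(chosen)
        for index in range(start, len(paths)):
            path = paths[index]
            if used.isdisjoint(path):
                found = extend(index + 1, chosen + [path], used | set(path))
                if found is not None:
                    return found
        return None

    return extend(0, [], set())


def solve_p2_packing(adj, k, icp=brute_force_icp):
    """Call the P2-ICP solver k times, starting from the empty packing."""
    packing = []
    for _ in range(k):
        packing = icp(adj, packing)
        if packing is None:
            return None  # no packing of the next size exists
    return packing
```

On the path graph with nodes 0 to 5, this loop finds a 2-packing and reports that no 3-packing exists.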

Assume WLOG that $k$ is odd. We next denote $X = \bigcup_{i=1}^{k-1} V_i$ and $Y = V \setminus X$. Moreover, given $p \in \{3, 4, \ldots, 3k - 2.5(k-1)\}$ and $q \in \{\lceil p/3 \rceil, \lceil p/3 \rceil + 1, \ldots, p\}$, let $\mathrm{Sol}(p, q)$ be the set of $k$-packings that contain exactly $p$ nodes from $Y$ and exactly $q$ paths such that each of them includes at least one node from $Y$. By Theorem 8, an instance of P2-ICP is a yes-instance iff $\bigcup_{p=3}^{3k-2.5(k-1)} \bigcup_{q=\lceil p/3 \rceil}^{p} \mathrm{Sol}(p, q) \neq \emptyset$.

In the following two sections, we develop two deterministic procedures, ICPPro1 and ICPPro2, for which we prove the following results:

Lemma 9. Given $p \in \{3, 4, \ldots, 3k - 2.5(k-1)\}$, $q \in \{\lceil p/3 \rceil, \lceil p/3 \rceil + 1, \ldots, p\}$ and an instance of P2-ICP, ICPPro1 returns a family $\mathcal{F}$ of subsets of size $3q - p$ of $X$ such that $[\mathrm{Sol}(p, q) \neq \emptyset]$ iff [there exist a set $F \in \mathcal{F}$ and a $(k-q)$-packing in $G$ whose nodes belong to $X \setminus F$]. ICPPro1 runs in time $O^*(6.75^{k+o(k)})$.

12 We note that our approach does not lead to an algorithm for Weighted P2-Packing since Theorem 8 does not concern weighted graphs.



Lemma 10. Given $p \in \{3, 4, \ldots, 3k - 2.5(k-1)\}$, $q \in \{\lceil p/3 \rceil, \lceil p/3 \rceil + 1, \ldots, p\}$ and an instance of P2-ICP along with a family $\mathcal{F}$ of subsets of size $3q - p$ of $X$, ICPPro2 decides if there exist a set $F \in \mathcal{F}$ and a $(k-q)$-packing in $G$ whose nodes belong to $X \setminus F$. ICPPro2 runs in time $O^*(6.777^k)$.

Having such procedures, we can solve P2-ICP by trying, for each possible pair $(p, q)$ such that $p \in \{3, 4, \ldots, 3k - 2.5(k-1)\}$ and $q \in \{\lceil p/3 \rceil, \lceil p/3 \rceil + 1, \ldots, p\}$, to call procedure ICPPro1, which returns a family $\mathcal{F}$, and then accept if ICPPro2 accepts when called with $\mathcal{F}$. If no execution of ICPPro2 accepts, we reject. We thus obtain the following result:

Theorem 9. P2-ICP can be solved in deterministic time $O^*(6.777^k)$.

Corollary 4. P2-Packing can be solved in deterministic time $O^*(6.777^k)$.
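The way the two procedures are combined to solve P2-ICP can be sketched as a loop over all pairs $(p, q)$. In the sketch below, `icppro1` and `icppro2` are hypothetical callables standing in for ICPPro1 and ICPPro2 (their internals are developed in the following sections), and the instance itself is assumed to be captured inside them. The upper limit of $p$ uses the identity $3k - 2.5(k-1) = (k+5)/2$, which is an integer because $k$ is assumed to be odd.

```python
def icp_parameter_pairs(k: int) -> list[tuple[int, int]]:
    """All pairs (p, q) with 3 <= p <= 3k - 2.5(k - 1) and ceil(p/3) <= q <= p."""
    if k % 2 == 0:
        raise ValueError("k is assumed to be odd")
    p_max = (k + 5) // 2  # equals 3k - 2.5(k - 1) for odd k
    return [
        (p, q)
        for p in range(3, p_max + 1)
        for q in range(-(-p // 3), p + 1)  # -(-p // 3) is ceil(p / 3)
    ]


def solve_icp(k: int, icppro1, icppro2) -> bool:
    """Accept iff some pair (p, q) leads ICPPro2 to accept.

    icppro1(p, q) is assumed to return the family F of Lemma 9, and
    icppro2(p, q, family) is assumed to decide the condition of Lemma 10.
    """
    for p, q in icp_parameter_pairs(k):
        family = icppro1(p, q)
        if icppro2(p, q, family):
            return True
    return False
```

Since there are only $O(k^2)$ pairs, the loop adds a polynomial factor that is hidden by the $O^*$ notation.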

I.1 The Procedure ICPPro1: Proof of Lemma 9

When we next refer to representative families, suppose that they are computed with respect to $Y$. Moreover, assume an arbitrary order $Y = \{y_1, y_2, \ldots, y_n\}$. We present a dynamic programming-based procedure to prove the lemma, in which we embed representative sets computations. To this end, we use a matrix M that has an entry $[p', q', m, X']$ for all $p' \in \{1, 2, \ldots, p\}$, $q' \in \{\lceil p'/3 \rceil, \lceil p'/3 \rceil + 1, \ldots, \min\{p', q\}\}$, $m \in \{1, 2, \ldots, n\}$ and $X' \subseteq X$ of size $3q' - p'$.

We next assume that a reference to an undefined entry returns ∅. The other entries will store the following families of partial solutions:

• $M[p', q', m, X']$: A family that $(p - p')$-represents the family $\mathcal{F}$ defined as follows. Let $\widetilde{\mathcal{F}}$ be the family of every $q'$-packing whose paths contain exactly $p'$ nodes from $Y$ and whose set of other nodes (i.e., nodes in $X$) is exactly $X'$, such that each of its paths contains a node from $Y$ that is smaller than $y_m$, except one path whose smallest node from $Y$ is $y_m$. For each such packing in $\widetilde{\mathcal{F}}$, $\mathcal{F}$ contains a set that includes the nodes of its paths that belong to $Y$, excluding the smallest node from $Y$ in the node-set of each of its paths.

The entries are computed in the following order:

• For $p' = 1, 2, \ldots, p$:
  − For $q' = \lceil p'/3 \rceil, \lceil p'/3 \rceil + 1, \ldots, \min\{p', q\}$:
    ∗ Compute all entries of the form $M[p', q', m, X']$.

We now give the recursive formulas using which the entries are computed.

1. If $q' = 1$: $M[p', q', m, X'] = \{Y' \subseteq Y : \min(Y') = y_m$, there exists a 1-packing in $G$ such that the node-set of its path is $X' \cup Y'\}$.

2. Else: $M[p', q', m, X'] = \{Y' \cup Y'' : Y' \cap Y'' = \emptyset, \min(Y') = y_m$, there exists $\widetilde{X} \subseteq X'$ for which [there exists a 1-packing in $G$ such that the node-set of its path is $\widetilde{X} \cup Y'$] and $[Y'' \in \bigcup_{m'=1}^{m-1} M[p' - |Y'|, q' - 1, m', X' \setminus \widetilde{X}]]\}$.

• After 2: Replace the result by a family that max $(p - p')$-represents it.

Finally, we return $\mathcal{F} = \{X' \subseteq X : \bigcup_{m=1}^{n} M[p, q, m, X'] \neq \emptyset\}$. Note that the size of each set in the family stored in an entry $M[p', q', m, X']$ is $(p' - q')$. Furthermore, note that, as required, $[\mathrm{Sol}(p, q) \neq \emptyset]$ iff [there exist a set $F \in \mathcal{F}$ and a $(k-q)$-packing in $G$ whose nodes belong to $X \setminus F$].

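By Theorem 2, the running time is bounded by:

$$O^*\left(2^{o(p)} \cdot \max_{p'=1}^{p}\ \max_{q'=\lceil p'/3 \rceil}^{\min\{p', q\}} \binom{3(k-1)}{3q' - p'} \cdot \frac{\big(c(p - q')\big)^{2(p-q')-(p'-q')}}{(p' - q')^{p'-q'}\,\big(c(p - q') - (p' - q')\big)^{2(p-q')-2(p'-q')}}\right).$$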

We can bound the above expression by:



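$$O^*\left(2^{o(k)} \cdot \max_{p'=1}^{k/2}\ \max_{\frac{p'}{3} \leq q' \leq p'} \frac{(3k)^{3k}}{(3q' - p')^{3q'-p'}\,(3k - 3q' + p')^{3k-3q'+p'}} \cdot \frac{\big(c(\tfrac{k}{2} - q')\big)^{k-p'-q'}}{(p' - q')^{p'-q'}\,\big(c(\tfrac{k}{2} - q') - (p' - q')\big)^{k-2p'}}\right).$$

Now, we bound this expression by:

$$O^*\left(2^{o(k)} \cdot \max_{0 \leq \alpha \leq 1}\ \max_{\frac{\alpha k}{6} \leq q' \leq \frac{\alpha k}{2}} \frac{(3k)^{3k}}{(3q' - \tfrac{\alpha k}{2})^{3q'-\frac{\alpha k}{2}}\,(3k - 3q' + \tfrac{\alpha k}{2})^{3k-3q'+\frac{\alpha k}{2}}} \cdot \frac{\big(c(\tfrac{k}{2} - q')\big)^{k-\frac{\alpha k}{2}-q'}}{(\tfrac{\alpha k}{2} - q')^{\frac{\alpha k}{2}-q'}\,\big(c(\tfrac{k}{2} - q') - (\tfrac{\alpha k}{2} - q')\big)^{k-\alpha k}}\right).$$

This expression is further bounded by:

$$O^*\left(2^{o(k)} \cdot \max_{0 \leq \alpha \leq 1}\ \max_{\frac{1}{3} \leq \beta \leq 1} \left(\frac{3^3}{(\tfrac{3\beta\alpha}{2} - \tfrac{\alpha}{2})^{\frac{3\beta\alpha}{2}-\frac{\alpha}{2}}\,(3 - \tfrac{3\beta\alpha}{2} + \tfrac{\alpha}{2})^{3-\frac{3\beta\alpha}{2}+\frac{\alpha}{2}}} \cdot \frac{\big(c(\tfrac{1}{2} - \tfrac{\beta\alpha}{2})\big)^{1-\frac{\alpha}{2}-\frac{\beta\alpha}{2}}}{(\tfrac{\alpha}{2} - \tfrac{\beta\alpha}{2})^{\frac{\alpha}{2}-\frac{\beta\alpha}{2}}\,\big(c(\tfrac{1}{2} - \tfrac{\beta\alpha}{2}) - (\tfrac{\alpha}{2} - \tfrac{\beta\alpha}{2})\big)^{1-\alpha}}\right)^{k}\right),$$

which is bounded by:

$$O^*\left(2^{o(k)} \cdot \max_{0 \leq \alpha \leq 1}\ \max_{\frac{1}{3} \leq \beta \leq 1} \left(\frac{6^6}{(3\beta\alpha - \alpha)^{3\beta\alpha-\alpha}\,(6 - 3\beta\alpha + \alpha)^{6-3\beta\alpha+\alpha}} \cdot \frac{\big(c(1 - \beta\alpha)\big)^{2-\alpha-\beta\alpha}}{(\alpha - \beta\alpha)^{\alpha-\beta\alpha}\,\big(c(1 - \beta\alpha) - (\alpha - \beta\alpha)\big)^{2-2\alpha}}\right)^{\frac{k}{2}}\right).$$

For $c = 1$, the maximum is obtained when $\alpha = \beta = 1$, which results in the bound $O^*(6.75^{k+o(k)})$. □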
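The value 6.75 can be checked numerically by evaluating the base of the last expression (the quantity raised to the power $k/2$, followed by a square root). The sketch below is an illustration only: the names `log_pow` and `icppro1_base` are hypothetical, the convention $0^0 = 1$ is assumed, and the grid search merely samples the domain rather than proving where the maximum lies.

```python
import math


def log_pow(base: float, exponent: float) -> float:
    """log(base ** exponent), using the convention 0^0 = 1."""
    if exponent <= 0.0:
        return 0.0
    return exponent * math.log(base) if base > 0.0 else -math.inf


def icppro1_base(alpha: float, beta: float, c: float = 1.0) -> float:
    """Base b such that the last bound reads O*(2^{o(k)} * b^k)."""
    ab = alpha * beta
    x = 3.0 * ab - alpha      # 3*beta*alpha - alpha
    a = c * (1.0 - ab)        # c(1 - beta*alpha)
    b = alpha - ab            # alpha - beta*alpha
    log_value = (
        6.0 * math.log(6.0)
        - log_pow(x, x)
        - log_pow(6.0 - x, 6.0 - x)
        + log_pow(a, 2.0 - alpha - ab)
        - log_pow(b, b)
        - log_pow(a - b, 2.0 - 2.0 * alpha)
    )
    return math.exp(0.5 * log_value)  # the expression is raised to k/2


def grid_maximum(n: int = 100, c: float = 1.0) -> float:
    """Largest sampled base over 0 <= alpha <= 1 and 1/3 <= beta <= 1."""
    return max(
        icppro1_base(i / n, (n + 2 * j) / (3 * n), c)
        for i in range(n + 1)
        for j in range(n + 1)
    )
```

At $\alpha = \beta = 1$ and $c = 1$ the expression reduces to $\sqrt{6^6 / (2^2 \cdot 4^4)} = 6.75$, which `icppro1_base(1.0, 1.0)` reproduces up to floating-point error.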

I.2 The Procedure ICPPro2: Proof of Lemma 10

First note that the nodes in $Y$ are not relevant to this section: we can focus on the subgraph of $G$ induced by $X$. Furthermore, we need only know the set of 1-packings in this subgraph, which can be given as a family of sets on 3 nodes. Thus, to prove the lemma, we will solve the following problem in time $O^*(6.777^k)$:

P2-Pro2: Given a universe $U$ of size $3k$, a family $\mathcal{S}$ of subsets of size 3 of $U$, $p \in \{1, 2, \ldots, \lfloor k/2 \rfloor\}$, $q \in \{\lceil p/3 \rceil, \lceil p/3 \rceil + 1, \ldots, p\}$ and a family $\mathcal{F}$ of subsets of size $3q - p$ of $U$, decide if there exist a set $F \in \mathcal{F}$ and a subfamily $\mathcal{S}' \subseteq \mathcal{S}$ of $(k - q)$ disjoint sets that do not contain any element from $F$.

We next solve a subcase of P2-Pro2, and then show how to translate P2-Pro2 to this subcase by using unbalanced cutting of the universe. To this end, we follow the ideas introduced in Appendix H, where the main differences result from the fact that now, since the universe is of the small size $3k$, we do not need to perform any computation of a representative family.

I.2.1 Solving a Subcase of P2-Pro2

In this section we solve the subcase Cut P2-Pro2 (P2-CPro2) of P2-Pro2, defined as follows. Fix $0 < \varepsilon < 0.1$, whose value is determined later, such that $1/\varepsilon \in \mathbb{N}$. The input for P2-CPro2 consists of an ordered universe $U = \{u_1, u_2, \ldots, u_{3k}\}$, where $u_{i-1} < u_i$ for all $i \in \{2, 3, \ldots, 3k\}$. We are also given a family $\mathcal{S}$ of subsets of size 3 of $U$, $p \in \{1, 2, \ldots, \lfloor k/2 \rfloor\}$, $q \in \{\lceil p/3 \rceil, \lceil p/3 \rceil + 1, \ldots, p\}$ and a family $\mathcal{F}$ of subsets of size $3q - p$ of $U$, along with a non-decreasing function $f : \{1, 2, \ldots, 1/\varepsilon\} \to U$.

We need to decide if there is a set $F \in \mathcal{F}$ and an ordered subfamily $\mathcal{S}' \subseteq \mathcal{S}$ of $(k - q)$ disjoint sets, denoted accordingly as $\mathcal{S}' = \{S_1, S_2, \ldots, S_{k-q}\}$, such that $F \cap (\bigcup \mathcal{S}') = \emptyset$ and the following “solution conditions” are satisfied:

1. $\forall i \in \{2, 3, \ldots, k - q\}$: $\min(S_{i-1}) < \min(S_i)$.

2. $\forall i \in \{1, 2, \ldots, 1/\varepsilon\}$: There are at least $R(i)$ elements in $F \cup \Big(\bigcup_{j=1}^{i\lfloor\varepsilon(k-q)\rfloor} (S_j \setminus \{\min(S_j)\})\Big)$ that are smaller than or equal to $f(i)$, where $R$ is defined according to the following recursion.

• $R(0) = 0$.

• For $j = 1, \ldots, 1/\varepsilon$: $R(j) = R(j-1) + \left\lceil \dfrac{(3q - p) + 2(j-1)\lfloor\varepsilon(k-q)\rfloor - R(j-1)}{\lceil 3((k-q) - (j-1)\lfloor\varepsilon(k-q)\rfloor)/\lfloor\varepsilon(k-q)\rfloor \rceil} \right\rceil$.

3. $\forall i \in \{1, 2, \ldots, 1/\varepsilon\}$: All the elements in $\bigcup_{j=1+i\lfloor\varepsilon(k-q)\rfloor}^{k-q} S_j$ are larger than $f(i)$.

We next show that P2-CPro2 can be efficiently solved using dynamic programming, proving the following lemma:

Lemma 11. For any fixed $0 < \varepsilon < 1$, P2-CPro2 can be solved in deterministic time

$$Y = O^*\left(\max_{i=1}^{1/\varepsilon}\ \max_{j=1+(i-1)\lfloor\varepsilon(k-q)\rfloor}^{i\lfloor\varepsilon(k-q)\rfloor} \binom{3k - j - R(i-1)}{(3q - p) + 2j - R(i-1)}\right).$$

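Proof. We now present a dynamic programming-based procedure to prove the lemma. To this end, we use the following two matrices:

1. M that has an entry $[i, j, s_1, s_2, \ldots, s_{1/\varepsilon}, m, U']$ for all $i \in \{1, 2, \ldots, 1/\varepsilon + 1\}$, $j \in \{1 + (i-1)\lfloor\varepsilon(k-q)\rfloor, \ldots, J\}$ where $[(i \leq 1/\varepsilon \rightarrow J = i\lfloor\varepsilon(k-q)\rfloor) \wedge (i = 1/\varepsilon + 1 \rightarrow J = (i-1)\lfloor\varepsilon(k-q)\rfloor + ((k-q) \bmod \lfloor\varepsilon(k-q)\rfloor))]$, $[\forall \ell \in \{1, \ldots, 1/\varepsilon\} : s_\ell \in \{0, 1, \ldots, (3q-p) + 2j\}]$ such that $[\forall \ell \in \{1, \ldots, i-1\} : s_\ell \geq R(\ell)]$, $m \in \{j, \ldots, 3k\}$ such that $(i > 1 \rightarrow u_m > f(i-1))$, and $U' \subseteq \{u_\ell \in U : \ell > j + s_{i-1}\}$ of size $(3q-p) + 2j - s_{i-1}$, where $s_0 = 0$.

2. N that has an entry $[i, s_1, s_2, \ldots, s_{1/\varepsilon}, m, U']$ for all $i \in \{2, 3, \ldots, 1/\varepsilon\}$, $[\forall \ell \in \{1, \ldots, 1/\varepsilon\} : s_\ell \in \{0, 1, \ldots, (3q-p) + 2(i\lfloor\varepsilon(k-q)\rfloor)\}]$ such that $[\forall \ell \in \{1, \ldots, i-1\} : s_\ell \geq R(\ell)]$, $m \in \{i\lfloor\varepsilon(k-q)\rfloor, \ldots, 3k\}$ such that $(i > 1 \rightarrow u_m > f(i-1))$, and $U' \subseteq \{u_\ell \in U : \ell > (i\lfloor\varepsilon(k-q)\rfloor) + s_{i-1}\}$ of size $(3q-p) + 2j - s_{i-1}$.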

We next assume that a reference to an undefined entry returns FALSE. The other entries will store the following boolean values:

• The entry $M[i, j, s_1, \ldots, s_{1/\varepsilon}, m, U']$ holds TRUE iff there is a set $F \in \mathcal{F}$ and an ordered subfamily $\mathcal{S}' \subseteq \mathcal{S}$ of $j$ disjoint sets, denoted accordingly as $\mathcal{S}' = \{S_1, S_2, \ldots, S_j\}$, such that $F \cap (\bigcup \mathcal{S}') = \emptyset$ and the following “solution conditions” are satisfied:

1. $\forall \ell \in \{2, \ldots, j\}$: $\min(S_{\ell-1}) < \min(S_\ell)$.

2. $\min(S_j) = u_m$.

3. $\forall \ell \in \{1, \ldots, i-1\}$: There are at least $R(\ell)$ elements in $F \cup \Big(\bigcup_{p=1}^{\ell\lfloor\varepsilon(k-q)\rfloor} (S_p \setminus \{\min(S_p)\})\Big)$ that are at most $f(\ell)$.

4. $\forall \ell \in \{1, \ldots, 1/\varepsilon\}$: There are exactly $s_\ell$ elements in $F \cup \Big(\bigcup_{p=1}^{j} (S_p \setminus \{\min(S_p)\})\Big)$ that are at most $f(\ell)$.

5. Suppose that $f(0)$ is a value smaller than $u_1$. Then, $U'$ is the set of elements in $F \cup \Big(\bigcup_{p=1}^{j} (S_p \setminus \{\min(S_p)\})\Big)$ that are larger than $f(i-1)$.

6. $\forall \ell \in \{1, \ldots, i-1\}$: All the elements in $\bigcup_{p=1+\ell\lfloor\varepsilon(k-q)\rfloor}^{j} S_p$ are larger than $f(\ell)$.

• The entry $N[i, s_1, \ldots, s_{1/\varepsilon}, m, U']$ holds TRUE iff at least one entry $M[i, i\lfloor\varepsilon(k-q)\rfloor, s_1, \ldots, s_{1/\varepsilon}, m, U'']$, where $U' = U'' \setminus \{u \in U : f(i-1) < u \leq f(i)\}$, holds TRUE.

Initialize the entries of N to FALSE. Then, the entries are computed in the following order:

1. For $i = 1, \ldots, 1/\varepsilon + 1$:

(a) For $j = 1 + (i-1)\lfloor\varepsilon(k-q)\rfloor, \ldots, J$:

i. Compute all entries of the form $M[i, j, s_1, \ldots, s_{1/\varepsilon}, m, U']$.

(b) If $i \leq 1/\varepsilon$:

i. For each entry of the form $M[i, i\lfloor\varepsilon(k-q)\rfloor, s_1, \ldots, s_{1/\varepsilon}, m, U']$ that holds TRUE:

A. Assign TRUE to $N[i, s_1, \ldots, s_{1/\varepsilon}, m, U' \setminus \{u \in U : f(i-1) < u \leq f(i)\}]$.

We now give the recursive formulas using which the entries of M are computed.



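1. If $j = 1$: $M[1, 1, s_1, \ldots, s_{1/\varepsilon}, m, U']$ holds TRUE iff there exist disjoint $F \in \mathcal{F}$ and $S \in \mathcal{S}$ such that $\min(S) = u_m$, $[\forall \ell \in \{1, \ldots, 1/\varepsilon\} : |\{u_p \in F \cup S \setminus \{u_m\} : u_p \leq f(\ell)\}| = s_\ell]$, and $U' = F \cup (S \setminus \{u_m\})$.

2. Else if $j = 1 + (i-1)\lfloor\varepsilon(k-q)\rfloor$: $M[i, j, s_1, \ldots, s_{1/\varepsilon}, m, U']$ holds TRUE iff there exist disjoint sets $S$ and $A$ such that $S \in \mathcal{S}$, $\min(S) = u_m$, $\mathrm{TRUE} \in \bigcup_{m'=1}^{m-1} \{N[i-1, s'_1, \ldots, s'_{1/\varepsilon}, m', A]\}$, where $[\forall \ell \in \{1, \ldots, 1/\varepsilon\} : s'_\ell = s_\ell - |\{u_p \in S \setminus \{u_m\} : u_p \leq f(\ell)\}|]$, and $U' = A \cup (S \setminus \{u_m\})$.

3. Else: $M[i, j, s_1, \ldots, s_{1/\varepsilon}, m, U']$ holds TRUE iff there exist disjoint sets $S$ and $A$ such that $S \in \mathcal{S}$, $\min(S) = u_m$, $\mathrm{TRUE} \in \bigcup_{m'=1}^{m-1} \{M[i, j-1, s'_1, \ldots, s'_{1/\varepsilon}, m', A]\}$, where $[\forall \ell \in \{1, \ldots, 1/\varepsilon\} : s'_\ell = s_\ell - |\{u_p \in S \setminus \{u_m\} : u_p \leq f(\ell)\}|]$, and $U' = A \cup (S \setminus \{u_m\})$.

Finally, we return yes iff at least one entry of the form $M[i, (k-q), s_1, \ldots, s_{1/\varepsilon}, m, U']$ holds TRUE.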

The matrices M and N contain

$$O^*\left(\max_{i=1}^{1/\varepsilon}\ \max_{j=1+(i-1)\lfloor\varepsilon(k-q)\rfloor}^{i\lfloor\varepsilon(k-q)\rfloor} \binom{3k - j - R(i-1)}{(3q - p) + 2j - R(i-1)}\right)$$

entries, where each entry of M can be computed in time $O^*(1)$, and the total time for computing all of the entries of N is bounded by $O^*$ of the number of entries of M. Thus, the entire procedure can be performed in the desired time. □

We proceed to analyze the bound, denoted $Y$, for the running time in Lemma 11. First, define $T$ according to the following recursion:

• $T(0) = 0$.

• For $j = 1, \ldots, 1/\varepsilon$: $T(j) = T(j-1) + \varepsilon\left(\dfrac{\frac{3q-p}{k-q} + 2(j-1)\varepsilon - T(j-1)}{3(1 - (j-1)\varepsilon)}\right)$.

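Thus, we get that:

$$Y = O^*\left(\max_{i=1}^{1/\varepsilon}\ \max_{(i-1)\varepsilon(k-q) \leq j \leq i\varepsilon(k-q)} \binom{3k - j - T(i-1)(k-q)}{(3q - p) + 2j - T(i-1)(k-q)}\right).$$

The worst case is obtained for instances where $q$ is maximal, i.e., when $p = q = \lfloor k/2 \rfloor$. Therefore, the running time is bounded by:

$$\begin{aligned}
& O^*\left(\max_{i=1}^{1/\varepsilon}\ \max_{(i-1)\varepsilon\frac{k}{2} \leq j \leq i\varepsilon\frac{k}{2}} \binom{3k - j - T(i-1)\frac{k}{2}}{k + 2j - T(i-1)\frac{k}{2}}\right) \\
={} & O^*\left(\max_{i=1}^{1/\varepsilon}\ \max_{(i-1) \leq \alpha \leq i} \binom{3k - \alpha\varepsilon\frac{k}{2} - T(i-1)\frac{k}{2}}{k + \alpha\varepsilon k - T(i-1)\frac{k}{2}}\right) \\
={} & O^*\left(\max_{i=1}^{1/\varepsilon}\ \max_{(i-1) \leq \alpha \leq i} \frac{\big(3k - \alpha\varepsilon\frac{k}{2} - T(i-1)\frac{k}{2}\big)^{3k-\alpha\varepsilon\frac{k}{2}-T(i-1)\frac{k}{2}}}{\big(k + \alpha\varepsilon k - T(i-1)\frac{k}{2}\big)^{k+\alpha\varepsilon k-T(i-1)\frac{k}{2}}\,\big(2k - 3\alpha\varepsilon\frac{k}{2}\big)^{2k-3\alpha\varepsilon\frac{k}{2}}}\right) \\
={} & O^*\left(\max_{i=1}^{1/\varepsilon}\ \max_{(i-1) \leq \alpha \leq i} \left(\frac{\big(3 - \frac{\alpha\varepsilon}{2} - \frac{T(i-1)}{2}\big)^{3-\frac{\alpha\varepsilon}{2}-\frac{T(i-1)}{2}}}{\big(1 + \alpha\varepsilon - \frac{T(i-1)}{2}\big)^{1+\alpha\varepsilon-\frac{T(i-1)}{2}}\,\big(2 - \frac{3\alpha\varepsilon}{2}\big)^{2-\frac{3\alpha\varepsilon}{2}}}\right)^{k}\right) \\
={} & O^*\left(\max_{i=1}^{1/\varepsilon}\ \max_{(i-1) \leq \alpha \leq i} \left(\frac{\big(6 - \alpha\varepsilon - T(i-1)\big)^{6-\alpha\varepsilon-T(i-1)}}{\big(2 + 2\alpha\varepsilon - T(i-1)\big)^{2+2\alpha\varepsilon-T(i-1)}\,(4 - 3\alpha\varepsilon)^{4-3\alpha\varepsilon}}\right)^{\frac{k}{2}}\right).
\end{aligned}$$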



Denote the expression above by $Z$. To allow us to compute a bound efficiently, we choose $\varepsilon = 10^{-5}$. Choosing a smaller $\varepsilon$ results in a better bound, but the improvement is negligible. We thus get that the maximum is obtained at $i = \alpha = 6377$, where $T(i-1) \cong 0.04485$, which results in the bound $O^*(6.77682^k)$. We have proved the following corollary:

Corollary 5. For $\varepsilon = 10^{-5}$, P2-CPro2 can be solved in deterministic time $O^*(6.777^k)$.
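The constant in Corollary 5 can be approximated numerically from the recursion for $T$ in the worst case $p = q = \lfloor k/2 \rfloor$, where $(3q-p)/(k-q) = 2$, together with the last expression in the chain above. The following is a sketch under stated assumptions, not the author's computation: the function name `p2_bound_base` is hypothetical, and the maximum over $\alpha \in [i-1, i]$ is approximated by the two endpoints.

```python
import math


def p2_bound_base(inv_eps: int = 100_000):
    """Approximate the base b of the bound O*(b^k) for P2-CPro2.

    inv_eps stands for 1/eps. Returns (b, i, T(i - 1)) at the maximum found.
    """

    def log_base(x: float, t: float) -> float:
        # Half of the logarithm of the expression raised to the power k/2,
        # with x = alpha * eps and t = T(i - 1).
        a = 6.0 - x - t
        b = 2.0 + 2.0 * x - t
        d = 4.0 - 3.0 * x
        return 0.5 * (a * math.log(a) - b * math.log(b) - d * math.log(d))

    eps = 1.0 / inv_eps
    best = (-math.inf, 0, 0.0)
    t = 0.0  # holds T(i - 1) at the start of iteration i
    for i in range(1, inv_eps + 1):
        for alpha in (i - 1, i):  # endpoints of the range of alpha
            candidate = log_base(alpha / inv_eps, t)
            if candidate > best[0]:
                best = (candidate, i, t)
        # T(i) = T(i-1) + eps * (2 + 2(i-1)eps - T(i-1)) / (3(1 - (i-1)eps))
        x_prev = (i - 1) / inv_eps
        t += eps * (2.0 + 2.0 * x_prev - t) / (3.0 * (1.0 - x_prev))
    return math.exp(best[0]), best[1], best[2]
```

With the default `inv_eps = 100_000` (that is, $\varepsilon = 10^{-5}$), the result should be close to the reported base 6.77682, attained near $i\varepsilon \approx 0.064$.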

I.2.2 A Deterministic Procedure for P2-Pro2

Fix $\varepsilon = 10^{-5}$. The pseudocode of Procedure2, a deterministic procedure that solves P2-Pro2, is given below. It uses unbalanced cutting of the universe in the same manner as WSPAlg (see Appendix H.2), obtaining a set of inputs for P2-CPro2 and accepting iff at least one of them is a yes-instance.

Algorithm 6 Procedure2($U, k, \mathcal{S}, p, q, \mathcal{F}$)

 1: order the universe $U$ arbitrarily as $U = \{u_1, u_2, \ldots, u_{3k}\}$.
 2: for all distinct $\ell_1, \ell_2, \ldots, \ell_{1/\varepsilon}, r_1, r_2, \ldots, r_{1/\varepsilon} \in U$ s.t. $\ell_i < r_i$ for all $i \in \{1, 2, \ldots, 1/\varepsilon\}$ do
 3:   for $i = 1, 2, \ldots, 1/\varepsilon$ do
 4:     define $U^i = \{\ell_i, \ell_i + 1, \ldots, r_i\} \setminus \bigcup_{j=1}^{i-1} U^j$.
 5:   end for
 6:   $U^{1/\varepsilon+1} = U \setminus \bigcup_{j=1}^{1/\varepsilon} U^j$.
 7:   let $U' = \{u'_1, u'_2, \ldots, u'_{3k}\}$ be an ordered copy of $U$ such that the elements in $U^1$ appear first, then those in $U^2$, and so on, where the order between the elements in each $U^i$ is preserved according to their order in $U$.
 8:   define $f : \{1, 2, \ldots, 1/\varepsilon\} \to U'$ such that $f(i)$ is the largest element in $U'$ that belongs to $U^i$.
 9:   if the procedure of Corollary 5 accepts $(U', k, \mathcal{S}, p, q, \mathcal{F}, f)$ then accept. end if
10: end for
11: reject.

We next consider the correctness and running time of Procedure2.

Lemma 12. Procedure2 solves P2-Pro2 in time $O^*(6.777^k)$.

Proof. The running time of the algorithm follows immediately from the pseudocode and Corollary 5.

For the easier direction, note that for any ordering of the universe $U$ as $U'$, and for any function $f : \{1, 2, \ldots, 1/\varepsilon\} \to U'$, a solution to the instance $(U', k, \mathcal{S}, p, q, \mathcal{F}, f)$ of P2-CPro2 is also a solution to the instance $(U, k, \mathcal{S}, p, q, \mathcal{F})$ of P2-Pro2. Therefore, if the algorithm accepts, the input is a yes-instance of P2-Pro2.

Now, suppose that the input is a yes-instance of P2-Pro2, and let $F \in \mathcal{F}$ and $\mathcal{S}' \subseteq \mathcal{S}$ be a corresponding solution. Let $U = \{u_1, u_2, \ldots, u_{3k}\}$ be the order chosen by Procedure2 in Step 1. We define $U^1, U^2, \ldots, U^{1/\varepsilon}$, $f$ and an order $S_1, S_2, \ldots, S_{k-q}$ of the sets in $\mathcal{S}'$ as follows.

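1. Initialize:

(a) Let $U^0_{\mathrm{ord}} = \{u'_1, u'_2, \ldots, u'_{3k}\}$ be an ordered copy of $U$.

(b) Let $P_0 = \emptyset$.

2. For $i = 1, 2, \ldots, 1/\varepsilon$:

(a) Let $A' = F \cup (\bigcup_{j=1}^{(i-1)\lfloor\varepsilon(k-q)\rfloor} S_j)$, and $A'_{\min} = \bigcup_{j=1}^{(i-1)\lfloor\varepsilon(k-q)\rfloor} \{\min(S_j)\}$, where min is computed according to the ordered universe $U^{i-1}_{\mathrm{ord}}$.

(b) Let $B' = (\bigcup \mathcal{S}') \setminus (\bigcup_{j=1}^{(i-1)\lfloor\varepsilon(k-q)\rfloor} S_j)$.

(c) Let $B_1, B_2, \ldots, B_x$, where $x = \lceil |B'| / \lfloor\varepsilon(k-q)\rfloor \rceil$, be a partition of $U^{i-1}_{\mathrm{ord}} \setminus (\bigcup_{j=1}^{i-1} U^j)$ (into $x$ disjoint universes) such that each $B_i$ is a set of elements that are consecutively ordered in $U^{i-1}_{\mathrm{ord}}$ and contains exactly $\lfloor\varepsilon(k-q)\rfloor$ elements from $B'$, except $B_x$, if $(|B'| \bmod \lfloor\varepsilon(k-q)\rfloor) \neq 0$, which contains exactly $(|B'| \bmod \lfloor\varepsilon(k-q)\rfloor)$ elements from $B'$.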



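(d) Let $U^i$ be a universe that contains the maximum number of elements from $A' \setminus (A'_{\min} \cup P_{i-1})$ among the universes $B_1, B_2, \ldots, B_x$. If $U^i$ contains elements from fewer than $\lfloor\varepsilon(k-q)\rfloor$ sets in $\mathcal{S}' \setminus \{S_1, S_2, \ldots, S_{(i-1)\lfloor\varepsilon(k-q)\rfloor}\}$, add elements from $U^{i-1}_{\mathrm{ord}} \setminus (\bigcup_{j=1}^{i-1} U^j)$ to $U^i$ such that its elements remain consecutively ordered in $U^{i-1}_{\mathrm{ord}}$ and it contains elements from exactly $\lfloor\varepsilon(k-q)\rfloor$ sets in $\mathcal{S}' \setminus \{S_1, \ldots, S_{(i-1)\lfloor\varepsilon(k-q)\rfloor}\}$.

(e) Let $f(i)$ be the largest element in $U^i$.

(f) Let $U^i_{\mathrm{ord}} = \{u'_1, u'_2, \ldots, u'_{3k}\}$ be an ordered copy of $U$ such that the elements in $\bigcup_{j=1}^{i-1} U^j$ appear first, followed by those in $U^i$, where the internal orders between the elements in $\bigcup_{j=1}^{i-1} U^j$, $U^i$ and $U \setminus (\bigcup_{j=1}^{i} U^j)$ are preserved according to their order in $U^{i-1}_{\mathrm{ord}}$.

(g) Let $S_{(i-1)\lfloor\varepsilon(k-q)\rfloor+1}, S_{(i-1)\lfloor\varepsilon(k-q)\rfloor+2}, \ldots, S_{i\lfloor\varepsilon(k-q)\rfloor}$ be the sets in $\mathcal{S}' \setminus \{S_1, S_2, \ldots, S_{(i-1)\lfloor\varepsilon(k-q)\rfloor}\}$ that contain elements from $U^i$, such that $\min(S_{j-1}) < \min(S_j)$ (according to $U^i_{\mathrm{ord}}$).

(h) Note that all the elements in $(\bigcup \mathcal{S}') \setminus (\bigcup_{j=1}^{i\lfloor\varepsilon(k-q)\rfloor} S_j)$ are larger than $f(i)$ (according to $U^i_{\mathrm{ord}}$).

(i) Let $P_i$ be the set of elements in $\bigcup_{j=1}^{i\lfloor\varepsilon(k-q)\rfloor} (S_j \setminus \{\min(S_j)\})$ that are smaller than or equal to $f(i)$ (according to $U^i_{\mathrm{ord}}$). Then, we have that

$$|P_i| \geq |P_{i-1}| + \left\lceil \frac{|A' \setminus (A'_{\min} \cup P_{i-1})|}{x} \right\rceil \geq R(i-1) + \left\lceil \frac{(3q-p) + 2(i-1)\lfloor\varepsilon(k-q)\rfloor - R(i-1)}{\lceil |B'| / \lfloor\varepsilon(k-q)\rfloor \rceil} \right\rceil = R(i-1) + \left\lceil \frac{(3q-p) + 2(i-1)\lfloor\varepsilon(k-q)\rfloor - R(i-1)}{\lceil 3((k-q) - (i-1)\lfloor\varepsilon(k-q)\rfloor) / \lfloor\varepsilon(k-q)\rfloor \rceil} \right\rceil.$$

3. Let $U' = U^{1/\varepsilon}_{\mathrm{ord}}$.

4. Let $S_{1+\frac{1}{\varepsilon}\lfloor\varepsilon(k-q)\rfloor}, \ldots, S_{k-q}$ be the sets in $\mathcal{S}' \setminus \{S_1, \ldots, S_{\frac{1}{\varepsilon}\lfloor\varepsilon(k-q)\rfloor}\}$, such that $\min(S_{j-1}) < \min(S_j)$, where min and < are computed according to the ordered universe $U'$.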

We have thus defined an instance $(U', k, \mathcal{S}, p, q, \mathcal{F}, f)$ of P2-CPro2 that is examined by Procedure2 in Step 9. Since this is a yes-instance (by the above arguments, $S_1, S_2, \ldots, S_{k-q}$ is a solution to this instance), the procedure accepts. □



Chapter 4

Conclusion

In this thesis, we developed new schemes for applying color coding-related methods. We

demonstrated their power by obtaining improved parameterized algorithms for a variety

of graph problems, giving special emphasis to network query problems. In this section,

we address each paper included in this thesis, summarizing our main contributions and

suggesting directions for future research.

Algorithms for Topology-Free and Alignment Network Queries

The first result in this thesis is a randomized parameterized algorithm for GMI, based

on the narrow sieves method, that runs in time $O^*(2^k)$. We have thus improved upon

previous randomized algorithms for GMI (see Section 1.2.1). Moreover, we gave a

polynomial time algorithm for ANQI that handles a family of acyclic directed graphs,

as well as a certain scoring scheme, that are more general than those considered in

previous studies [PRT+08, PRY+05].

The parameterized time complexity of GMI is now understood in the sense that

there is an O∗(2k−D)-time randomized algorithm that solves it, while it seems unlikely

that there exists an $O^*((2 - \varepsilon)^k)$-time such algorithm [BKK13]. Furthermore, recent

studies affirmed the practical relevance of this algorithm [BKL15], and considered other

parameterizations of the problem [BS15]. As a possible direction for future research, we

suggest integrating the algorithm of Björklund et al. [BKK13] into schemes related to the

analysis of actual biological networks, such as the scheme combining the algorithm for

ANQI, given in [PRY+05], to perform a pathway evolution study [MTB+10]. For the

ANQI problem, our work leaves several interesting directions for further research−in

particular, the development of parameterized algorithms, rather than polynomial time

algorithms, to handle wider classes of directed acyclic graphs, and the assessment of

practical use of these algorithms.



Partial Information Network Queries

To bridge the knowledge gap between the GM and ANQ problems, i.e., addressing

scenarios where partial information is available on the topology of the pattern, we

introduced PINQ. For a subcase of this problem, we developed a parameterized algorithm

that relies on the divide-and-color method. One possible direction for further research

is to improve the running time of our PINQ algorithm, while still efficiently handling

instances where weights can take arbitrary values (i.e., not necessarily polynomial in

the input size). While this goal can be achieved by using the representative sets method

(generalizing the ideas we used in developing our deterministic algorithms for GMI), it

is interesting to examine whether a faster algorithm can be obtained by employing, for

example, a mixing strategy that involves representative sets.

Parameterized Algorithms for Module Motif

We extended the definition of the Module Motif problem, introduced by Rizzi and

Sikora [RS12], to handle indels and weighted instances. For this generalization, we

developed three parameterized algorithms, observed that the running times of two

of them might be essentially tight, and proved that it is unlikely that there exists a

polynomial kernel.

The authors of [RS12] provide several biological arguments that support the study

of the Module Motif problem. For example, they argue that the nodes in a module

may represent biological entities that have a common biological function, since it is known that genes with similar neighborhoods have a better chance of being related to the same biological processes. Considering various biological networks, such as genetic regulatory networks and protein-protein interaction networks, it would be interesting to compare the results output by algorithms for Module Motif with those output by algorithms for GM. Moreover, it is of interest to investigate network queries that balance between connectivity and modularity constraints; in the case of GM and ANQ, we defined such a balance by studying PINQ.
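To make the modularity constraint concrete: a set of nodes is treated as a module when all of its nodes have exactly the same neighbors outside the set. The following is a minimal sketch of that check only, assuming an undirected graph stored as an adjacency dictionary; the function name `is_module` and the toy graph are hypothetical and not taken from the thesis, and the sketch ignores colors, indels and weights.

```python
def is_module(adj, nodes):
    """Return True if every node in `nodes` has the same neighbors outside `nodes`.

    `adj` maps each node to the set of its neighbors (undirected graph).
    This is an illustrative check, not an algorithm for Module Motif.
    """
    nodes = set(nodes)
    # Neighborhood of each member, restricted to nodes outside the candidate set.
    outside = [frozenset(adj[v]) - nodes for v in nodes]
    return all(o == outside[0] for o in outside)


# Hypothetical toy graph: a and b share the outside neighbors x and y.
adj = {
    "a": {"b", "x", "y"},
    "b": {"a", "x", "y"},
    "x": {"a", "b"},
    "y": {"a", "b", "z"},
    "z": {"y"},
}
```

Here `is_module(adj, {"a", "b"})` holds, while `{"x", "y"}` is not a module because only y is adjacent to z.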

Finally, as mentioned in [RS12], an intriguing open question is whether Module

Motif, parameterized by the number of different colors in M , is W[1]-hard. Clearly,

considering other parameterizations may also lead to the discovery of significant results.

Algorithms for k-Internal Out-Branching

We presented an approach that integrates proper colorings of graphs in a narrow sieves-

based algorithm. When the degree of the input graph is bounded, this approach attempts

to improve the running time of the algorithm by generating monomials of a smaller

maximal degree. We demonstrated the usefulness of this approach by developing an algorithm for k-IOB whose running time is $O^*(4^{(1-\frac{\Delta+1}{2\Delta(\Delta-1)})k})$. Recently, we integrated (in [BKK+15]) proper colorings, along with two other types of colorings, in the narrow sieves method, to obtain improved algorithms for the special case of k-IST.
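To see how the degree bound affects this running time, one can evaluate the base of the exponential for small values of the maximum degree. The sketch below assumes the bound reads $O^*(4^{(1-\frac{\Delta+1}{2\Delta(\Delta-1)})k})$, with $\Delta$ the maximum degree of the input graph; the helper name `kiob_base` is hypothetical.

```python
def kiob_base(max_degree):
    """Return b such that 4^((1 - (D+1)/(2D(D-1))) * k) equals b^k, for D = max_degree."""
    d = max_degree
    if d < 2:
        raise ValueError("the expression is only defined for maximum degree >= 2")
    exponent = 1 - (d + 1) / (2 * d * (d - 1))
    return 4 ** exponent


if __name__ == "__main__":
    for d in (3, 4, 5, 10):
        print(d, round(kiob_base(d), 4))
```

For $\Delta = 3$ the exponent is $1 - 4/12 = 2/3$, so the base is $4^{2/3} \approx 2.52$; as $\Delta$ grows, the base approaches 4 from below.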



It is interesting to investigate the usefulness of various colorings of graphs in the

context of other problems and color coding-related methods. Moreover, it may be

possible to develop improved algorithms for k-IOB in bounded degree graphs, by

incorporating ideas that are not based on proper colorings, such as those integrated in

bounded search trees-related techniques [DF13].

Representative Families: A Unified Tradeoff-Based Approach

We presented a unified approach that enables speeding up previous algorithms based on the representative sets method (with respect to uniform matroids); this includes, for example, improving the best known algorithms for k-Path, k-Tree and Long Directed Cycle, whose running times are $O^*(2.851^k)$, $O^*(2.851^k)$ and $O^*(8^k)$, to run in times $O^*(2.619^k)$, $O^*(2.619^k)$ and $O^*(6.75^k)$, respectively. Our approach introduces

a parameter c, which controls the tradeoff between running time and the size of

the representative families. By applying our approach, we also developed improved

parameterized algorithms for the PC and k-IOB problems, where the algorithm for

k-IOB also relies on our guiding trees tool.
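For readers who want the notion of a representative family (with respect to a uniform matroid) in concrete form, here is a brute-force sketch. It assumes the standard definition: a subfamily q-represents a family of sets if, for every set Y of at most q elements that is disjoint from some member of the family, the subfamily also contains a member disjoint from Y. The function names are hypothetical, the enumeration is exponential, and this is not the tradeoff-based computation described above.

```python
from itertools import combinations


def represents(sub, family, universe, q):
    """Check by exhaustive enumeration that `sub` q-represents `family`."""
    for size in range(q + 1):
        for y in combinations(sorted(universe), size):
            y = set(y)
            family_ok = any(x.isdisjoint(y) for x in family)
            sub_ok = any(x.isdisjoint(y) for x in sub)
            if family_ok and not sub_ok:
                return False
    return True


def greedy_representative(family, universe, q):
    """Build some q-representative subfamily greedily (not necessarily smallest)."""
    sub = []
    for size in range(q + 1):
        for y in combinations(sorted(universe), size):
            y = set(y)
            if any(x.isdisjoint(y) for x in sub):
                continue  # y is already handled by a chosen set
            witness = next((x for x in family if x.isdisjoint(y)), None)
            if witness is not None:
                sub.append(witness)
    return sub


# Toy instance: all 2-element subsets of {1, ..., 5}, with q = 1.
universe = set(range(1, 6))
family = [frozenset(c) for c in combinations(range(1, 6), 2)]
sub = greedy_representative(family, universe, 1)
```

On this toy instance the greedy pass keeps 3 of the 10 sets, and `represents(sub, family, universe, 1)` confirms that they suffice.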

It is of great interest to obtain further improvements in the computation of represen-

tative sets (with respect to uniform matroids, or more generally, with respect to linear

matroids). Indeed, such improvements can be used to speed up, in a unified manner, parameterized algorithms for many fundamental problems. In a later paper [Zeh14], we showed that the upper bound on the running times of some algorithms relying on representative sets can be further improved from $O^*(2.619^k)$ to $O^*(2.597^k)$. However, there is still a large gap between the running times of representative sets-based algorithms and narrow sieves-based algorithms. For example, the best known deterministic algorithm for k-Path, relying on representative sets, runs in time $O^*(2.597^k)$ [Zeh14]. This algorithm can efficiently handle weighted instances, as well as directed graphs. However, the best known randomized algorithm for k-Path, based on multilinear detection, runs in time $O^*(2^k)$ [Wil09]. This algorithm cannot efficiently handle weighted instances, but is suitable for directed graphs. If one is interested only in undirected graphs, by using narrow sieves, k-Path can be solved in randomized time $O^*(1.66^k)$ [BHK+10a]. We note that Williams [Wil09] conjectured that k-Path in weighted graphs can be solved in time $O^*(2^k)$, and both Koutis and Williams [Wil09, Kou08] conjectured that k-Path can be solved in deterministic time $O^*(2^k)$.
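As a point of reference for the k-Path discussion, the sketch below implements the classic color coding idea of Alon, Yuster and Zwick [AYZ95]: color the vertices at random with k colors and search, by dynamic programming over color sets, for a path whose vertices all have distinct colors. It is not any of the faster algorithms compared above, and the function names, the number of trials and the seed are illustrative assumptions.

```python
import random


def colorful_path_exists(adj, coloring, k):
    """Return True if some path on k vertices uses k distinct colors.

    reach[v] holds the color sets of colorful paths that end at v.
    """
    reach = {v: {frozenset([coloring[v]])} for v in adj}
    for _ in range(k - 1):
        nxt = {v: set() for v in adj}
        for u in adj:
            for colors in reach[u]:
                for v in adj[u]:
                    # Distinct colors imply distinct vertices, so this extends a path.
                    if coloring[v] not in colors:
                        nxt[v].add(colors | {coloring[v]})
        reach = nxt
    return any(reach[v] for v in adj)


def has_k_path(adj, k, trials=200, seed=0):
    """One-sided Monte Carlo test: True is always correct, False may be a miss."""
    rng = random.Random(seed)
    for _ in range(trials):
        coloring = {v: rng.randrange(k) for v in adj}
        if colorful_path_exists(adj, coloring, k):
            return True
    return False


# Toy graphs: a path on 4 vertices and a star with 3 leaves.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

A fixed path on k vertices becomes colorful with probability $k!/k^k$ per trial, which is why the number of trials has to grow exponentially in k for a constant success probability.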

Parameterized Algorithms for Graph Partitioning Problems

We studied the broad class of FGPPs, where one seeks a set of vertices of size exactly

k, satisfying a certain condition referring to the edges between vertices in the set,

and the edges connecting vertices in the set and vertices outside the set. By using a

reduction to the Weighted k-Exact Cover problem, combined with a computation

of representative sets, we developed an $O^*(4^{k+o(k)}\Delta^k)$-time algorithm for the class of FGPPs. We have thus answered affirmatively an open question posed by Bonnet et al. [BEP+13]. Moreover, we obtained parameterized algorithms for specific FGPPs and subclasses of FGPPs, including an $O^*(4^{p+o(p)})$-time algorithm for Max (k, n−k)-Cut, an $O^*(2^{k+\frac{p}{\alpha_2}+o(k+p)})$-time algorithm for the subclass of positive min-FGPPs, and faster algorithms for the subclass of non-degrading positive min-FGPPs (which results in an $O^*(2^{p+o(p)})$-time algorithm for Min k-Vertex Cover). These algorithms are based on variants of the randomized separation method.
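The kind of objective an FGPP optimizes can be illustrated with a brute-force sketch. It assumes the common formulation in which a vertex set X of size exactly k is scored as α1·m(X) + α2·m(X, V∖X), where m(X) counts edges inside X and m(X, V∖X) counts edges leaving X; this formula, the function names and the toy graph are assumptions for illustration. The sketch enumerates all sets of size k and has nothing in common with the parameterized algorithms above beyond the objective.

```python
from itertools import combinations


def fgpp_value(edges, subset, alpha1, alpha2):
    """Score a vertex set: alpha1 * (edges inside) + alpha2 * (edges crossing)."""
    inside = sum(1 for u, v in edges if u in subset and v in subset)
    crossing = sum(1 for u, v in edges if (u in subset) != (v in subset))
    return alpha1 * inside + alpha2 * crossing


def best_fixed_size_set(vertices, edges, k, alpha1, alpha2, maximize=True):
    """Exhaustively pick a best set of exactly k vertices (exponential time)."""
    pick = max if maximize else min
    return pick(
        (frozenset(c) for c in combinations(sorted(vertices), k)),
        key=lambda s: fgpp_value(edges, s, alpha1, alpha2),
    )


# Toy instance: the path 0 - 1 - 2 - 3 with k = 2.
vertices = {0, 1, 2, 3}
edges = [(0, 1), (1, 2), (2, 3)]
cut = best_fixed_size_set(vertices, edges, 2, 0, 1, maximize=True)     # Max (k, n-k)-Cut style
cover = best_fixed_size_set(vertices, edges, 2, 1, 1, maximize=False)  # Min k-Vertex Cover style
```

With α1 = 0 and α2 = 1 the score is the size of the cut, and with α1 = α2 = 1 it is the number of edges covered by the set, which matches the two named problems in the paragraph above under the stated assumption.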

One obvious direction for future research is to improve the running times of our

algorithms. Furthermore, it may be interesting to explore the power of other variants of randomized separation, for example, by considering colorings of both vertices and edges, or by combining randomized separation with other techniques.

It is also possible to define and study classes of problems that are related to or extend

the class of FGPPs. For example, one can define a class where not only the edges that

touch the vertices in a potential solution set affect its score, but also other edges, that

touch the neighbors of the vertices in the set. More generally, one can define a scoring

scheme that takes into consideration the distance between an edge and the vertices

in the set. Another possibility to extend the class of FGPPs is to add connectivity

or modularity constraints relating to the vertices in the potential solution set and/or

to their neighbors outside the set. Finally, it would be interesting to study classes

of problems relating to FGPPs in the context of directed graphs, colored graphs and

hypergraphs.

Deterministic Parameterized Algorithms for the Graph Motif Problem

Previous studies of GMI left open the existence of a fast deterministic algorithm for

this problem. We developed an $O^*(6.855^{k-D})$-time deterministic algorithm for GMI, whose running time can be improved to $O^*(6.74^{k-D})$ (see [Zeh14]), as well as two faster

deterministic algorithms for special cases of GMI. These algorithms rely on computations

of representative sets (with respect to both uniform and partition matroids), combined

with a tool that we called guiding trees.

However, there still remains a very wide gap between the time complexity of our

deterministic algorithms for GMI and subcases of this problem, and the best known

randomized algorithm for GMI, which runs in time $O^*(2^{k-D})$ [BKK13]. To improve the running time of one of our algorithms, it is possible to consider speeding up the

best known computation of representative sets with respect to partition matroids. It

may be possible to obtain better results for GMI also by computing representative sets

with respect to other matroids.

It would be interesting to extend the tool of guiding trees to be useful when the

solution has a more general (not necessarily tree-like) unknown topology. Alternatively,

one can also define guiding trees-related structures for specific classes of graphs in which

we seek a solution of unknown topology. It is also interesting to consider cases where



only certain parts of the topology of a solution are unknown.

Improved Parameterized Algorithms for Network Query Problems

For PINQ instances where the graphs in the pattern are trees, and the weights are

polynomially bounded by the input size, we presented an algorithm that is faster than

the one given in [PZ13]. This algorithm also unifies and extends other results related

to network queries. An interesting technical contribution in the paper is the mixing

strategy that underlies the algorithm: we introduced a general scheme that combines

two narrow sieves-based procedures and a step of divide-and-color.

Recently, in the paper [BKK+15] (not included in this thesis), we also used a similar

strategy (to develop an algorithm for k-IST). We believe that this strategy can be

useful in obtaining fast algorithms for other problems, where one can identify extreme

cases that can be solved efficiently via different narrow sieves-based approaches. In

particular, it would be interesting to find problems for which it is helpful to combine

more than two narrow sieves-based procedures.

Solving Parameterized Problems by Mixing Color Coding-Related Techniques

Apart from the strategy presented in [PSZ14b], we investigated other mixtures of

color coding-related methods. This includes a scheme that combines a narrow sieves-

based procedure with a representative sets-based procedure, and a computation of

representative sets, specialized to efficiently handle certain disjointness conditions, which

is preceded by a preprocessing stage; this preprocessing stage contains an integration

of a step of divide-and-color in a technique that we call balanced cutting of the universe.

We have also presented a technique called unbalanced cutting of the universe, which

we combined with a representative sets-based algorithm to improve its running time

(by removing additional elements from the partial solutions that it computes). We

demonstrated the usefulness of our mixtures by examining the k-IOB, k-Path, (3, k)-

WSP and P2-Packing problems.

We believe it is possible to develop algorithms for other problems that rely on such

mixtures. Furthermore, it seems interesting to study other mixtures, in particular, those that are not based only on color coding-related methods, but combine them with other techniques. One can also examine conditions other than the disjointness conditions defined in the paper, under which computations of representative sets can be sped up.

Such conditions can be satisfied for the problem at hand, or it may be possible to

efficiently transform this problem into one that satisfies the conditions.



Bibliography

[AB14] H Abasi and N Bshouty. A simple algorithm for hamiltonicity. CoRR

abs/1404.2827, 2014.

[ABC+10] A M Ambalath, R Balasundaram, R H Chintan, K Venkata, M Neeld-

hara, P Geevarghese, and M S Ramanujan. On the kernelization

complexity of colorful motifs. In IPEC, pages 14–25, 2010.

[AYZ94] N Alon, R Yuster, and U Zwick. Color-coding: a new method for

finding simple paths, cycles and other small subgraphs within large

graphs. In STOC, pages 326–335, 1994.

[AYZ95] N Alon, R Yuster, and U Zwick. Color coding. J. ACM, 42(4):844–856,

1995.

[BB15] R Ben-Basat. Parameterized automata constructions and their appli-

cations. M.Sc. Thesis, Technion, MSC-2015-03, 2015.

[BBC+12] N Betzler, R Bredereck, J Chen, and R Niedermeier. Studies in computational aspects of voting – a parameterized complexity perspective. The Multivariate Algorithmic Revolution and Beyond, pages 318–363, 2012.

[BBF+11] N Betzler, R Bevern, M R Fellows, C Komusiewicz, and R Niedermeier.

Parameterized algorithmics for finding connected motifs in biological

networks. IEEE/ACM Trans. Comput. Biol. Bioinf., 8(5):1296–1308,

2011.

[BEP+13] E Bonnet, B Escoffier, V T Paschos, and E Tourniaire. Multi-

parameter complexity analysis for constrained size graph problems:

using greediness for parameterization. In IPEC, pages 66–77, 2013.

[Ber06] P Berkhin. A survey of clustering data mining techniques. Grouping

Multidimensional Data, pages 25–71, 2006.

[BFK+08] N Betzler, M R Fellows, C Komusiewicz, and R Niedermeier. Param-

eterized algorithms and hardness results for graph motif problems.

In CPM, pages 31–43, 2008.



[BHK+10a] A Bjorklund, T Husfeldt, P Kaski, and M Koivisto. Narrow sieves

for parameterized paths and packings. CoRR abs/1007.1161, 2010.

[BHK+10b] S Bruckner, F Huffner, R M Karp, R Shamir, and R Sharan. Topology-

free querying of protein interaction networks. J. Comput. Biol.,

17(3):237–252, 2010.

[Bj10] A Bjorklund. Determinant sums for undirected hamiltonicity. In

FOCS, pages 173–182, 2010.

[BKK13] A Bjorklund, P Kaski, and L Kowalik. Probably optimal graph motifs.

In STACS, pages 20–31, 2013.

[BKK+15] A Bjorklund, V Kamat, L Kowalik, and M Zehavi. Spotting trees

with few leaves. In ICALP, 2015.

[BKL15] A Bjorklund, P Kaski, and J Lauri. Engineering motif search for

large graphs. In ALENEX, pages 104–118, 2015.

[Bl03] M Blaser. Computing small partial coverings. Inf. Process. Lett.,

85(6):327–331, 2003.

[Bod93] H L Bodlaender. On linear time minor tests with depth-first search.

J. Algorithms, 14(1):1–23, 1993.

[Bod96] H L Bodlaender. A linear-time algorithm for finding tree-decompositions of small treewidth. SIAM J. Comput., 25(6):1305–1317, 1996.

[Bod14] H L Bodlaender. Lower bounds for kernelization. In IPEC, pages

1–14, 2014.

[Bol65] B Bollobas. On generalized graphs. Acta Math. Aca. Sci. Hun.,

16:447–452, 1965.

[BPS13] E Bonnet, V T Paschos, and F Sikora. Multiparameterizations for max

k-set cover and related satisfiability problems. CoRR abs/1309.4718,

2013.

[BS15] E Bonnet and F Sikora. The parameterized complexity of graph motif

relatively to the structure of the input graph. CoRR abs/1503.05110,

2015.

[BSV10] G Blin, F Sikora, and S Vialette. Querying graphs in protein-protein

interactions networks using feedback vertex set. IEEE/ACM Trans.

Comput. Biol. Bioinf., 7(4):628–635, 2010.



[Cai08] L Cai. Parameterized complexity of cardinality constrained optimiza-

tion problems. Comput. J., 51(1):102–121, 2008.

[CC13] S Chen and Z Chen. Faster deterministic algorithms for packing,

matching and t-dominating set problems. CoRR abs/1306.3602, 2013.

[CCC06] L Cai, S M Chan, and S O Chan. Random separation: A new method

for solving fixed-cardinality optimization problems. In IPEC, pages

239–250, 2006.

[CFG+10] N Cohen, F V Fomin, G Gutin, E J Kim, S Saurabh, and A Yeo.

Algorithm for finding k-vertex out-trees and its application to k-

internal out-branching problem. J. Comput. Syst. Sci., 76(7):650–662,

2010.

[CFJ+04] J Chen, D Friesen, W Jia, and I Kanj. Using nondeterminism to

design efficient deterministic algorithms. Algorithmica, 40(2):83–97,

2004.

[CFL+11] J Chen, Q Feng, Y Liu, S Lu, and J Wang. Improved deterministic

algorithms for weighted matching and packing problems. Theor.

Comput. Sci., 412(23):2503–2512, 2011.

[CHM81] M Chein, M Habib, and M C Maurer. Partitive hypergraphs. Discrete

Mathematics, 37(1):35–50, 1981.

[CKL+09] J Chen, J Kneis, S Lu, D Molle, S Richter, P Rossmanith, S H Sze, and

F Zhang. Randomized divide-and-conquer: Improved path, matching,

and packing algorithms. SIAM J. on Computing, 38(6):2526–2547,

2009.

[CLP+14] M Cygan, D Lokshtanov, M Pilipczuk, M Pilipczuk, and S Saurabh.

Minimum bisection is fixed parameter tractable. In STOC, pages

323–332, 2014.

[CLS+07] J Chen, S Lu, S H Sze, and F Zhang. Improved algorithms for path,

matching, and packing problems. In SODA, pages 298–307, 2007.

[Dal11] J Daligault. Combinatorial techniques for parameterized algorithms

and kernels, with applications to multicut. PhD Thesis, Université Montpellier II, Montpellier, Hérault, France, 2011.

[DBK07] P Deshpande, R Barzilay, and D R Karger. Randomized decoding

for selection-and-ordering problems. In HLT-NAACL, pages 444–451,

2007.



[DD13] A Demers and A Downing. Minimum leaf spanning tree. US Patent

no. 6,105,018, August 2013.

[DEF+03] R G Downey, V Estivill-Castro, M R Fellows, E Prieto, and F A

Rosamond. Cutting up is hard to do: the parameterized complexity

of k-cut and related problems. Electron. Notes Theor. Comput. Sci.,

78:209–222, 2003.

[DF95] R G Downey and M R Fellows. Fixed-parameter tractability and

completeness II: on completeness for W[1]. Theor. Comput. Sci.,

141(1-2):109–131, 1995.

[DF99] R Downey and M Fellows. Parameterized Complexity. Springer, New

York, 1999.

[DF13] R Downey and M Fellows. Fundamentals of Parameterized Complexity.

Springer, 2013.

[DFV11] R Dondi, G Fertin, and S Vialette. Maximum motif problem in

vertex-colored graphs. In CPM, pages 388–401, 2011.

[DL78] R A DeMillo and R J Lipton. A probabilistic remark on algebraic

program testing. Inform. Process Lett., 7(4):193–195, 1978.

[DRL+10] A Donavalli, M Rege, X Liu, and K Jafari-Khouzani. Low-rank

matrix factorization and co-clustering algorithms for analyzing large

data sets. In ICDEM, pages 272–279, 2010.

[DSG+08] B Dost, T Shlomi, N Gupta, E Ruppin, V Bafna, and R Sharan.

Qnet: a tool for querying protein interaction networks. J. Comput.

Biol., 15(7):913–925, 2008.

[FFH+11] M R Fellows, G Fertin, D Hermelin, and S Vialette. Upper and

lower bounds for finding connected motifs in vertex-colored graphs.

J. Comput. Syst. Sci., 77(4):799–811, 2011.

[FG06] J Flum and M Grohe. Parameterized Complexity Theory. Springer,

2006.

[FGL+12] F V Fomin, F Grandoni, D Lokshtanov, and S Saurabh. Sharp

separation and applications to exact and parameterized algorithms.

Algorithmica, 63(3):692–706, 2012.

[FGS+13] F V Fomin, S Gaspers, S Saurabh, and S Thomasse. A linear vertex

kernel for maximum internal spanning tree. J. Comput. Syst. Sci.,

79(1):1–6, 2013.



[FKN+08] M R Fellows, C Knauer, N Nishimura, P Ragde, F Rosamond, U Stege,

D Thilikos, and S Whitesides. Faster fixed-parameter tractable algo-

rithms for matching and packing problems. Algorithmica, 52(2):167–

176, 2008.

[FLP+14] F V Fomin, D Lokshtanov, F Panolan, and S Saurabh. Representative

sets of product families. In ESA, pages 443–454, 2014.

[FLS13] F V Fomin, D Lokshtanov, and S Saurabh. Efficient computation

of representative sets with applications in parameterized and exact

algorithms. CoRR abs/1304.4626, 2013.

[FLS14] F V Fomin, D Lokshtanov, and S Saurabh. Efficient computation

of representative sets with applications in parameterized and exact

algorithms. In SODA, pages 142–151, 2014.

[FP11] V Fionda and L Palopoli. Biological network querying techniques:

Analysis and comparison. J. Comput. Biol., 18(4):595–625, 2011.

[FR09] H Fernau and D Raible. A parameterized perspective on packing

paths of length two. J. Comb. Optim., 18(4):319–341, 2009.

[FWC14] Q Feng, J Wang, and J Chen. Matching and weighted p2-packing:

algorithms and kernels. Theor. Comput. Sci., 522:85–94, 2014.

[FWL+13] Q Feng, J Wang, S Li, and J Chen. Randomized parameterized

algorithms for p2-packing and co-path packing problems. J. Comb.

Optim., 2013.

[GJ79] M R Garey and D S Johnson. Computers and intractability: a guide

to the theory of NP-completeness. W.H. Freeman, New York, 1979.

[GKR09] P Giannopoulos, C Knauer, and G Rote. The parameterized complex-

ity of some geometric problems in unbounded dimension. In IPEC,

pages 198–209, 2009.

[GMP13] P Goyal, N Misra, and F Panolan. Faster deterministic algorithms

for r-dimensional matching using representative sets. In FSTTCS,

pages 237–348, 2013.

[GMP+14] P Goyal, N Misra, F Panolan, and M Zehavi. Deterministic algorithms

for matching and packing problems based on representative sets. A

merge of [GMP13] and [Zeh13b], 2014.

[GNW07] J Guo, R Niedermeier, and S Wernicke. Parameterized complexity of

vertex cover variants. Theory Comput. Syst., 41(3):501–520, 2007.



[GRK09] G Gutin, I Razgon, and E J Kim. Minimum leaf out-branching and

related problems. Theor. Comput. Sci., 410(45):4571–4579, 2009.

[Gro02] M Grohe. Parameterized complexity for the database theorist. SIG-

MOD Record, 31(4):86–96, 2002.

[GS08] G Gottlob and S Szeider. Fixed-parameter algorithms for artificial

intelligence, constraint satisfaction, and database problems. The

Computer Journal, 51(3):303–325, 2008.

[GS13] S Guillemot and F Sikora. Finding and counting vertex-colored

subtrees. Algorithmica, 65(4):828–844, 2013.

[HNW08] F Huffner, R Niedermeier, and S Wernicke. Developing fixed-

parameter algorithms to solve combinatorially explosive biological

problems. Methods in Molecular Biology, 453:395–421, 2008.

[HWZ08] F Huffner, S Wernicke, and T Zichner. Algorithm engineering for color-

coding with applications to signaling pathway detection. Algorithmica,

52(2):114–132, 2008.

[KJM+11] A B Kahng, J Lienig, I L Markov, and J Hu. VLSI Physical Design -

From Graph Partitioning to Timing Closure. Springer, 2011.

[KLR08] J Kneis, A Langer, and P Rossmanith. Improved upper bounds for

partial vertex cover. In WG, pages 240–251, 2008.

[KMR+06] J Kneis, D Molle, S Richter, and P Rossmanith. Divide-and-color. In

WG, pages 58–67, 2006.

[KMR07] J Kneis, D Molle, and P Rossmanith. Partial vs. complete domination:

t-dominating set. In SOFSEM, pages 367–376, 2007.

[Kne09] J Kneis. Intuitive algorithms. PhD Thesis, RWTH Aachen University,

2009.

[Kou05] I Koutis. A faster parameterized algorithm for set packing. Inf.

Process. Lett., 94(1):7–9, 2005.

[Kou08] I Koutis. Faster algebraic algorithms for path and packing problems.

In ICALP, pages 575–586, 2008.

[Kou12] I Koutis. Constrained multilinear detection for faster functional motif

discovery. Inf. Process. Lett., 112(22):889–892, 2012.

[KW09] I Koutis and R Williams. Limits and applications of group algebras

for parameterized problems. In ICALP, pages 653–664, 2009.



[LCW07] Y Liu, J Chen, and J Wang. On efficient FPT algorithms for weighted

matching and packing problems. In TAMC, pages 575–586, 2007.

[LFS06] V Lacroix, C G Fernandes, and M F Sagot. Motif search in graphs:

Application to metabolic networks. IEEE/ACM Trans. Comput. Biol.

Bioinf., 3(4):360–368, 2006.

[LLC+06] Y Liu, S Lu, J Chen, and S H Sze. Greedy localization and color-

coding: improved matching and packing algorithms. In IWPEC,

pages 84–95, 2006.

[LMP+14] D Lokshtanov, P Misra, F Panolan, and S Saurabh. Deterministic

truncation of linear matroids. CoRR abs/1404.4506, 2014.

[LMS12] D Lokshtanov, N Misra, and S Saurabh. Kernelization – preprocessing with a guarantee. The Multivariate Algorithmic Revolution and Beyond, pages 129–161, 2012.

[Lov75] L Lovasz. Three short proofs in graph theory. J. Combin. Theory

Ser., 19:269–271, 1975.

[Lov77] L Lovasz. Flats in matroids and geometric graphs. In BCC, pages

45–86, 1977.

[LWC+14] W Li, J Wang, J Chen, and Y Cao. A 2k-vertex kernel for maximum

internal spanning tree. CoRR abs/1412.8296, 2014.

[Mar06] D Marx. Parameterized coloring problems on chordal graphs. Theor.

Comput. Sci., 351:407–424, 2006.

[Mar09] D Marx. A parameterized view on matroid optimization problems.

Theor. Comput. Sci., 410:4471–4479, 2009.

[Mar12] D Marx. What’s next? future directions in parameterized complexity.

The Multivariate Algorithmic Revolution and Beyond, pages 469–496,

2012.

[Mon85] B Monien. How to find long paths efficiently. Annals Disc. Math.,

25:239–254, 1985.

[MTB+10] A Mano, T Tuller, O Beja, and R Y Pinter. Comparative classification

of species and the study of pathway evolution based on the alignment

of metabolic pathways. BMC Bioinform., 11(S-1):S38, 2010.

[Ned09] J Nederlof. Fast polynomial-space algorithms using mobius inversion:

improving on steiner tree and related problems. In ICALP, pages

713–725, 2009.



[Nie06] R Niedermeier. Invitation to fixed-parameter algorithms. Oxford

University Press, 2006.

[Nie10] R Niedermeier. Reflections on multivariate algorithmics and problem

parameterization. In STACS, pages 17–32, 2010.

[NSS95] M Naor, J L Schulman, and A Srinivasan. Splitters and near-optimal

derandomization. In FOCS, pages 182–191, 1995.

[Oxl06] J G Oxley. Matroid theory. Oxford University Press, 2006.

[OY11] K Ozeki and T Yamashita. Spanning trees: a survey. Graphs and

Combinatorics, 27(1):1–26, 2011.

[PRT+08] R Y Pinter, O Rokhlenko, D Tsur, and M Ziv-Ukelson. Approximate

labelled subtree homeomorphism. J. Discrete Algorithms, 6(3):480–

496, 2008.

[PRY+05] R Y Pinter, O Rokhlenko, E Yeger-Lotem, and M Ziv-Ukelson.

Alignment of metabolic pathways. Bioinformatics, 21(16):3401–3408,

2005.

[PS05] E Prieto and C Sloper. Reducing to independent set structure – the

case of k-internal spanning tree. Nord. J. Comput., 12(3):308–318,

2005.

[PS06] E Prieto and C Sloper. Looking at the stars. Theor. Comput. Sci.,

351:437–445, 2006.

[PSZ14a] R Y Pinter, H Shachnai, and M Zehavi. Deterministic parameterized

algorithms for the graph motif problem. In MFCS, pages 589–600,

2014.

[PSZ14b] R Y Pinter, H Shachnai, and M Zehavi. Improved parameterized

algorithms for network query problems. In IPEC, pages 294–306,

2014.

[PSZ15] R Y Pinter, H Shachnai, and M Zehavi. Partial information network

queries. J. Discrete Algorithms, 35:129–145, 2015.

[PZ13] R Y Pinter and M Zehavi. Partial information network queries. In

IWOCA, pages 362–275, 2013.

[PZ14] R Y Pinter and M Zehavi. Algorithms for topology-free and alignment

network queries. J. Discrete Algorithms, 27:29–53, 2014.



[RFG+13] D Raible, H Fernau, D Gaspers, and M Liedloff. Exact and param-

eterized algorithms for max internal spanning tree. Algorithmica,

65(1):95–128, 2013.

[RS12] R Rizzi and F Sikora. Some results on more flexible versions of graph

motif. In CSR, pages 278–289, 2012.

[Sal10] G Salamon. A survey on algorithms for the maximum internal

spanning tree and related problems. Electronic Notes in Discrete

Mathematics, 36:1209–1216, 2010.

[Sch80] J T Schwartz. Fast probabilistic algorithms for verification of polyno-

mial identities. J. ACM, 27(4):701–717, 1980.

[Sik12] F Sikora. An (almost complete) state of the art around the graph

motif problem. Universite Paris-Est Technical reports, 2012.

[SSR+06] T Shlomi, D Segal, E Ruppin, and R Sharan. Qpath: a method for

querying pathways in a protein-protein interaction networks. BMC

Bioinform., 7:199, 2006.

[SZ14a] H Shachnai and M Zehavi. Parameterized algorithms for graph

partitioning problems. In WG, pages 384–395, 2014.

[SZ14b] H Shachnai and M Zehavi. Representative families: A unified tradeoff-

based approach. In ESA, pages 786–797, 2014.

[TCH+08] M Tedder, D Corneil, M Habib, and C Paul. Simpler linear-time

modular decomposition via recursive factorizing permutations. In

ICALP, pages 634–645, 2008.

[vRT08] I van Rooij and W Todd. Parameterized complexity in cognitive mod-

eling: foundations, applications, and opportunities. The Computer

Journal, 51(3):385–404, 2008.

[WF08a] J Wang and Q Feng. Improved parameterized algorithms for weighted

3-set packing. In COCOON, pages 130–139, 2008.

[WF08b] J Wang and Q Feng. An $O^*(3.52^{3k})$ parameterized algorithm for 3-set

packing. In TAMC, pages 82–93, 2008.

[Wil09] R Williams. Finding paths of length k in $O^*(2^k)$ time. Inf. Process.

Lett., 109(6):315–318, 2009.

[Wil12] V V Williams. Multiplying matrices faster than Coppersmith-

Winograd. In STOC, pages 887–898, 2012.



[Zeh13a] M Zehavi. Algorithms for k-internal out-branching. In IPEC, pages

361–373, 2013.

[Zeh13b] M Zehavi. Deterministic parameterized algorithms for matching and

packing problems. CoRR abs/1311.0484, 2013.

[Zeh13c] M Zehavi. Parameterized algorithms for module motif. In MFCS,

pages 825–836, 2013.

[Zeh14] M Zehavi. Solving parameterized problems by mixing color coding-

related techniques. CoRR abs/1410.5062, 2014.

[Zip79] R Zippel. Probabilistic algorithms for sparse polynomials. In EU-

ROSAM, pages 216–226, 1979.



להתאמות ציונים ונתונים תוויות, בעלי הצמתים בהן האיזומורפי, תת־הגרף בעיית של בגרסאות מדובר

,(topology−free queries) טופולוגיה תלויות לא בשאילתות זאת, לעומת האלה. התוויות בין שונות

תוויות). של (multiset) רב־קבוצה הינה P התבנית (כלומר התבנית של הטופולוגיה על מידע כל אין

מוכלת צמתיו תוויות את הכוללת שרב־הקבוצה תת־עץ Hב־ למצוא יש אלו בבעיות פורמלי, לא באופן

.P ב־

,(partial information queries) חלקי מידע שאילתות בהגדרת היא העבודה של נוספת תרומה

התבנית אלה, בשאילתות כלומר התבנית. של הטופולוגיה על חלקי מידע יש בהם לקלטים המתאימות

טופולוגיה, תלויית שאילתה נקבל בלבד, אחד גרף יש P ב־ כאשר קשירים. גרפים של אוסף הינה P

לא באופן טופולוגיה. תלויית לא שאילתה נקבל בודדים, צמתים שהם גרפים רק יש P ב־ וכאשר

כאשר גרפים, למספר לחלק ניתן אותו קשיר תת־גרף Hב־ למצוא יש חלקי מידע בשאילתות פורמלי,

אלגוריתמים פיתחנו אלה, שאילתות של הבסיסי הסוג מלבד .P ב־ אחר לגרף "דומה" מהם אחד כל

ניתן בהם וריאנטים כגון ביולוגיות, דווקא לאו רשתות, לחקר הרלוונטיות יותר כלליות לגרסאות גם

התבנית בהן שאילתות אלה מודולים: שאילתות חקרנו כן, על יתר לפתרון. צמתים להשמיט/להוסיף

במציאת אינה החשיבות טופולוגיה, תלויות לא לשאילתות בניגוד כעת, אך תוויות, של אוסף היא P

(כנדרש מסויימות תוויות בעלי Hב־ צמתים קבוצת למצוא יש מודולים בשאילתות קשיר. תת־גרף

לקבוצה. מחוץ שכנים אותם בדיוק יש בקבוצה צומת שלכל כך התבנית), לפי

iii


על המבוססים מאלה יותר מהירים אלגוריתמים לפתח ניתן אלה טכניקות באמצעות פוטנציאלי.

,2013 בשנת לאחרונה, ממושקלות. לא לבעיות רק ויעילים רנדומיים הם אך וצבע, הפרד שיטת

.(representative sets) מייצגות קבוצות של במיוחד יעיל חישוב נוספים ומחברים Fomin הציגו

דטרמיניסטיים אלגוריתמים לקבל ניתן דינאמי, תכנות עם מייצגות קבוצות של חישובים שילוב על־ידי

וצבע. הפרד שיטת על המבוססים מאלה יותר מהירים שהינם ממשוקלות לבעיות

A central contribution of this work is the development of general schemes for combining and speeding up color coding-related methods (namely, divide-and-color, multilinear detection and narrow sieves, and representative sets). Moreover, the work presents algorithms that use color coding-related methods, both for a variety of classic graph problems and for modern problems that are of great importance in systems biology. First, to obtain wholes that are greater than the sum of their parts, we developed strategies whose components combine different color coding-related methods. The first of these strategies combines two procedures that use the narrow sieves method with a step of the divide-and-color method. The second strategy combines a procedure based on representative sets with a procedure that uses the narrow sieves method. In addition, we developed a strategy that combines a step of the divide-and-color method with a new technique for computing representative sets, suited to inputs in which the elements of the problem are partitioned into several disjoint sets.

Second, we developed new techniques for applying individual color coding-related methods. Besides the above technique for computing representative sets, we also studied a tradeoff-based approach to computing representative sets. The proposed approach relies on a parameter: as its value increases, we obtain more representative sets, but they can be computed more efficiently. This approach makes it possible to speed up almost every algorithm that is based on previous computations of representative sets with respect to uniform matroids. In addition, we presented a way of applying the multilinear detection method that relies on graph coloring. Using it speeds up multilinear detection-based algorithms for graph problems in which the input graph has a small maximum degree. For graph partitioning problems, we studied non-standard ways of applying the random separation method, a technique based on a single step of the divide-and-color method. Moreover, we presented a tool called "guiding trees", which is useful for finding subtrees (in a given graph) whose exact topology is unknown. A guiding tree is a small tree that provides local information about the topology of the tree. Using this information, essential computations in the application of the representative sets method can be performed far more efficiently.

The results in this work demonstrate the power of color coding-related methods by using them to develop algorithms for a wide collection of graph problems. The classic problems addressed in this work include special cases of the Subgraph Isomorphism problem (such as the k-Path problem), graph partitioning problems (such as the Max (k, n−k)-Cut problem), matching and packing problems (such as the 3-Set k-Packing problem), and problems related to finding special spanning trees (such as spanning trees with at least k internal nodes). Moreover, special emphasis is placed on query problems that originate in bioinformatics applications for the study of biological networks.

In biological query problems, we are given a labeled graph H and a pattern P, and must decide whether H contains a connected subgraph that "resembles" P. Solutions to these problems can help in understanding the function of P and in deciphering evolutionary relationships between different species. In topology-based queries, the topology of the pattern P is fully known (that is, the pattern P is a labeled connected graph); in fact,



Abstract

As technology advances, there is a growing need for fast algorithms, often for problems known to be NP-hard. In particular, studies in bioinformatics frequently rely on enormous amounts of data from many sources; it is therefore essential that the tools used to carry them out be especially efficient. Parameterized complexity, whose first systematic study was conducted in the 1990s by Downey and Fellows, is a paradigm for coping with computationally hard problems that, remarkably and surprisingly, lends itself to routine mathematical application. In a nutshell, the goal of this paradigm is to reduce the running time of algorithms for NP-hard problems by confining the exponential part of the time complexity to a certain parameter, k, while keeping the running time efficient with respect to the input size. Formally, a parameterized algorithm runs in time bounded by a polynomial in the input size multiplied by some function f that depends only on the parameter k; this bound is denoted by O*(f(k)). In practice, for many problems in computer science, and in bioinformatics in particular, one can indeed isolate parameters (such as the solution size) that are significantly smaller than a standard input size.
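The O* notation can be written out as follows. The symbols n (input size) and T (running time) are introduced here only for illustration and do not appear in the text above.

```latex
% n = input size (assumed symbol), k = parameter, f = some function of k alone
\[
  T(n,k) \;\le\; f(k)\cdot n^{O(1)}
  \qquad\text{is abbreviated as}\qquad
  T(n,k) = O^{*}\bigl(f(k)\bigr).
\]
% Example: a running time of 2^{k} n^{3} is written O^{*}(2^{k}).
```

The notation hides the polynomial factor, so comparing two parameterized algorithms in this notation amounts to comparing their functions f.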

The color coding method, now a widely used technique for developing parameterized algorithms, was proposed by Alon et al. in 1994. The technique led to algorithms for certain inputs of the Subgraph Isomorphism problem whose running time is bounded by O*(c^k), where k is the solution size and c is some constant greater than 1. Briefly, applying color coding involves a number of iterations, in each of which every element that may be part of a solution is colored at random; with high probability, some iteration is examined in which the elements of a solution have pairwise distinct colors, and they can then be discovered using dynamic programming. Color coding can be derandomized, and it is also suitable for solving weighted problems.
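The iteration scheme described above can be sketched for the k-Path problem as follows. This is a minimal randomized Python sketch, not code from the thesis: the function names, the adjacency-dictionary input, the convention that a k-path has k vertices, and the default of about 3e^k trials are assumptions chosen for illustration. Derandomization and weights are not shown.

```python
import random
from math import ceil, exp

def colorful_path_exists(adj, colors, k):
    """Dynamic programming: is there a path on k vertices with pairwise distinct colors?"""
    # reach[v] holds the color sets (as bitmasks) of colorful paths ending at v.
    reach = {v: {1 << colors[v]} for v in adj}
    for _ in range(k - 1):
        nxt = {v: set() for v in adj}
        for u in adj:
            for mask in reach[u]:
                for v in adj[u]:
                    bit = 1 << colors[v]
                    if not mask & bit:          # color of v not used yet
                        nxt[v].add(mask | bit)
        reach = nxt
    return any(reach[v] for v in adj)

def has_k_path(adj, k, trials=None, seed=0):
    """One-sided error: True is always correct; False may be wrong with small probability."""
    rng = random.Random(seed)
    if trials is None:
        # A fixed k-path becomes colorful with probability k!/k^k >= e^(-k).
        trials = ceil(3 * exp(k))
    for _ in range(trials):
        colors = {v: rng.randrange(k) for v in adj}  # random coloring with k colors
        if colorful_path_exists(adj, colors, k):
            return True
    return False

# Hypothetical toy inputs: a path on four vertices and a star with three leaves.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(has_k_path(path4, 4), has_k_path(star, 4))
```

Distinct colors force distinct vertices, so the dynamic program only needs to remember which colors were used rather than which vertices, which is what brings the table size down to roughly 2^k entries per vertex.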

Over the past decade, three methods, known as "color coding-related methods", were introduced. They led to breakthrough algorithms, more efficient than those that use color coding, for central problems in computer science such as the k-Path problem. These methods are non-standard in the extent to which they link different branches of computer science and mathematics, such as matroid theory, linear algebra, graph theory and combinatorial optimization. The first method, divide-and-color, was introduced in 2006 by Chen et al. It is a recursive method in which each stage is a kind of application of color coding that uses only two colors. The multilinear detection method is an algebraic technique that was developed in 2008 by Koutis and Williams and significantly improved by Björklund et al., yielding the narrow sieves method in 2010. The multilinear detection and narrow sieves methods are based on translating a given problem in some domain (such as graph theory) into a polynomial, while associating a single monomial (a product of variables) with each potential solution.



This research was carried out under the supervision of Prof. Ron Y. Pinter and Prof. Hadas Shachnai in the Department of Computer Science.

Acknowledgements

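I would like to thank my advisors, Ron Y. Pinter and Hadas Shachnai, for their devoted supervision and the guidance they gave me throughout my studies. I would also like to thank faculty members and students from the Technion and other institutions who supported and expressed interest in my work.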

In addition, I would like to thank my brother, Shaked, my parents, Ada and Avi, and my grandparents, Reuven and Batya, for the love, encouragement, inspiration, support, concern, good advice, patience and understanding they have given me at all times. Words cannot express my gratitude. I dedicate this work to my mother, Ada. I also thank my sweet dog, Odi.

I thank the Technion and the Zeff family for their generous financial support of my studies.


Algorithms for Parameterized Graph Problems

with Applications to Biological Network Queries

Research Thesis

Submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Meirav Zehavi

Submitted to the Senate of the Technion – Israel Institute of Technology

Tammuz 5775 Haifa June 2015


Algorithms for Parameterized Graph Problems

with Applications to Biological Network Queries

Meirav Zehavi

