

Computing Weighted Solutions in ASP: Representation-Based Method vs. Search-Based Method

Duygu Cakmak · Esra Erdem · Halit Erdogan


Abstract For some problems with too many solutions, one way to obtain the more desirable ones is to assign each solution a weight that characterizes its importance quantitatively, and then compute the solutions whose weights are over (resp. below) a given threshold. For instance, consider the problem of reconstructing phylogenies to be able to analyze evolutionary relationships between species. If each phylogeny is assigned a weight that characterizes the expected groupings with respect to some archeological evidence, then finding a phylogeny whose weight is over some threshold might be more desirable. This paper studies problems related to computing weighted solutions in the context of Answer Set Programming (ASP), and in particular investigates two sorts of methods for computing weighted solutions: one method modifies the ASP representation of the problem to compute weighted solutions using an existing ASP solver, and the other modifies the search algorithm of the answer set solver to compute weighted solutions incrementally. We have applied these methods to two real-world problems: reconstructing weighted phylogenies for Indo-European languages and for Quercus species. For experiments, we have used the answer set solver CLASP with the representation-based method; and we have modified the search algorithm of CLASP (and called it CLASP-W) for the search-based method. In the representation-based method, a given weight function is represented in ASP; in the search-based method, however, it is implemented in C++. We have defined two novel weight functions for phylogenies that can incorporate domain-specific information about simple (resp. hierarchical) groupings; the weight function that incorporates information about hierarchical groupings cannot be represented in ASP because it uses mathematical functions that are not supported by the ASP solvers. To compare the representation-based method with the search-based method, we have experimented with Indo-European languages with the weight measure that can be represented in ASP. We have observed that the search-based method outperforms the representation-based method in terms of computational efficiency (both time and space). To show the applicability and modularity of the search-based method (and thus CLASP-W), we have experimented with Quercus species considering the other weight function. We have observed that, for weight functions that cannot be represented in ASP, the search-based method provides a tool for computing weighted solutions in ASP; and the search algorithm of CLASP-W does not need any problem/function-specific modifications.

Faculty of Engineering and Natural Sciences, Sabancı University, Istanbul, Turkey
E-mail: {duygucakmak,esraerdem,halit}@sabanciuniv.edu


As for effectiveness, with both approaches, plausible phylogenies among many can be found without computing all phylogenies and then requiring historical linguists to go over all possible phylogenies manually; in that sense, our methods contribute to phylogenetics studies as well.

Keywords weighted solutions, answer set programming, phylogenies

1 Introduction

Answer Set Programming (ASP) [17] is a declarative programming paradigm oriented towards solving combinatorial search problems. In ASP, a combinatorial search problem is represented as an ASP program whose models (called “answer sets”) correspond to the solutions. The answer sets for a given ASP program can be computed by special systems called answer set solvers, such as CLASP [15]. Due to its expressive formalism that allows, e.g., negation, aggregates, and recursive definitions, and due to the continuous improvements in the efficiency of its solvers, ASP has been used to solve such problems in a wide range of knowledge-intensive applications from different fields, such as product configuration [27], planning [16], multi-agent planning [29], phylogeny reconstruction [4], developing a decision support system for a space shuttle [21], and systems biology [30].

Generally such combinatorial search problems have too many solutions. Moreover, due to auxiliary predicates defined in order to solve these problems, the correspondence between the answer sets and the solutions may not be one-to-one; there may be many answer sets that denote the same solution. For such problems, one way to obtain more desirable solutions is to assign weights to solutions, and then compute the distinct solutions whose weights are over (resp. below) a threshold. For instance, in planning, if each action of a robot is assigned a weight proportional to the length of the trajectory traveled and/or the distance of the trajectory from obstacles, then finding a plan of lower weight might be more desirable. In phylogenetics, consider the problem of reconstructing phylogenies (i.e., computing leaf-labeled trees, called phylogenies, to model the evolutionary history of a set of species). If each phylogeny is assigned a weight that characterizes the expected groupings with respect to some archeological evidence, then finding a phylogeny whose weight is over some threshold might be more desirable. In puzzle generation, we can define the weight of a puzzle instance by means of some difficulty measure, and then generate difficult puzzles whose weights are over a given value.

Motivated by such problems that play an important role in real-world applications, we have studied problems related to computing weighted solutions in ASP, investigated computational methods to solve them using the state-of-the-art answer set solvers, compared these methods experimentally, and showed their usefulness and effectiveness in phylogeny reconstruction applied to two real datasets, Indo-European languages [23] and Quercus species (oak trees) [1].

The computational methods we have investigated for computing weighted solutions are of two sorts:

Representation-based methods The idea is to modify the ASP representation of the problem by explicitly defining the weight measure and adding weight constraints, and to compute weighted solutions using an existing ASP solver.

Search-based methods These methods do not modify the ASP representation of the problem, but define the weight function externally (e.g., as a C++ program) and modify the search algorithm of the answer set solver to compute weighted solutions incrementally in the spirit of branch-and-bound.

We have applied these methods to two real-world problems: reconstructing weighted phylogenies for Indo-European languages and for Quercus species. For experiments, we have used the answer set solver CLASP with the representation-based method; and we have modified the search algorithm of CLASP for the search-based method. The modified version of CLASP is called CLASP-W. In the representation-based method, a given weight function is represented in ASP; in the search-based method, however, it is implemented in C++ in a separate file. We have defined two novel weight functions for phylogenies that can incorporate domain-specific information about simple (resp. hierarchical) groupings; the weight function that incorporates information about hierarchical groupings cannot be represented in ASP because it uses mathematical functions that are not supported by the ASP solvers. To compare the representation-based method with the search-based method, we have experimented with Indo-European languages with the weight measure that can be represented in ASP. We have observed that the search-based method outperforms the representation-based method in terms of computational efficiency (both time and space). To show the applicability and modularity of the search-based method (and thus CLASP-W), we have experimented with Quercus species considering the other weight function, which cannot be represented in ASP. As for effectiveness, with both approaches, plausible phylogenies among many can be found without computing all phylogenies.

These experimental studies illustrate the significance of our contributions both from the point of view of ASP and from the point of view of phylogenetics:

– There is no answer set solver that can compute weighted solutions incrementally using a branch-and-bound method, where the weight function is defined externally in C++. In that sense, CLASP-W provides a tool for ASP to compute weighted solutions, in particular, when the weight function cannot be represented in ASP. Note that since the weight function is defined in a separate file, CLASP-W does not require any problem/function-specific modifications.

– Reconstructing phylogenies for a given set of taxonomic units (e.g., species, languages) is important for research in various fields, such as historical linguistics, zoology, anthropology, and archeology. For example, a phylogeny of languages may help scientists to better understand human migrations [31]. For a given set of taxonomic units, some existing phylogenetic systems, like that of [4], generate too many phylogenies that explain the evolutionary relationships between the given taxonomic units. In such cases, to pick the most plausible ones, experts need to go over each one of these phylogenies manually and in detail. There is no phylogenetic system that can help experts to order phylogenies with respect to a plausibility measure that also includes some domain-specific information. In that sense, our methods contribute to phylogenetics studies as well.

The rest of the paper is organized as follows. Section 2 precisely describes the decision/optimization problems related to computing weighted solutions in ASP. Section 3 describes the representation-based method and the search-based method; in particular, it explains in detail how the search algorithm of CLASP is modified to turn CLASP into CLASP-W. Section 4 defines the weighted phylogeny reconstruction problem and analyzes its computational complexity. Sections 5 and 6 describe the application of the representation-based method and the search-based method to infer phylogenies for Indo-European languages and Quercus species, respectively. These sections also introduce the two novel weight functions that take into account the compatibility criterion for reconstructing phylogenies as well as some domain-specific information provided by the experts. The results of our experiments are then summarized in Section 7. Section 8 discusses related work; in particular, based on the methods of [11] for computing similar (diverse) solutions, it proposes an extension of the applicability of our methods to compute similar/diverse weighted solutions. Section 9 concludes.

This paper extends [7,8] substantially by a detailed discussion on the representation-based method and the search-based method for computing weighted solutions. It includes related theorems and their proofs. It extends the discussion on related work, in conjunction with alternative representation-based methods and similar computational problems (e.g., computing similar/diverse solutions). It also extends the discussion on experimental results, by considering different weight measures on a different dataset (Quercus species).

2 Weighted Solutions

We are interested in the following sorts of computational problems for computing weighted solutions:

AT LEAST (resp. AT MOST) w-WEIGHTED SOLUTION: Given an ASP program P that formulates a computational problem P, a weight measure ω that maps a solution for P to a nonnegative integer, and a nonnegative integer w, decide whether a solution S exists for P such that ω(S) ≥ w (resp. ω(S) ≤ w).

For instance, suppose that P describes the phylogeny reconstruction problem for Indo-European languages, and that ω describes the total weight of the characters compatible with the to-be-constructed phylogeny and takes into account some domain-specific information. Then finding phylogenies whose weights are at least 45 is an instance of the problem above.
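As a minimal illustration of the decision problem (a sketch, not the authors' implementation), the Python fragment below decides both variants by brute force over an explicitly listed set of solutions. The function names and the toy weights are assumptions made for this example; in practice the solutions are answer sets, and the point of the methods in Section 3 is to avoid enumerating them.

```python
from typing import Callable, Hashable, Iterable


def has_at_least(solutions: Iterable[Hashable],
                 omega: Callable[[Hashable], int], w: int) -> bool:
    """AT LEAST w-WEIGHTED SOLUTION: is there a solution S with omega(S) >= w?"""
    return any(omega(s) >= w for s in solutions)


def has_at_most(solutions: Iterable[Hashable],
                omega: Callable[[Hashable], int], w: int) -> bool:
    """AT MOST w-WEIGHTED SOLUTION: is there a solution S with omega(S) <= w?"""
    return any(omega(s) <= w for s in solutions)


# Hypothetical solutions and weights, chosen only for illustration.
TOY_WEIGHTS = {"S1": 30, "S2": 45, "S3": 65}

print(has_at_least(TOY_WEIGHTS, TOY_WEIGHTS.get, 45))
print(has_at_most(TOY_WEIGHTS, TOY_WEIGHTS.get, 29))
```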

Suppose that the ASP program P is a propositional normal logic program and deciding ω(S) ≥ w (resp. ω(S) ≤ w) for a given w is in NP. Then, we can conclude that:

Proposition 1 AT LEAST (resp. AT MOST) w-WEIGHTED SOLUTION is NP-complete.

3 Computing Weighted Solutions

We have studied two sorts of methods, representation-based and search-based, for computing at least/most w-weighted solutions of a given problem P, given an ASP program P, a weight measure ω that maps a solution to a nonnegative integer, and a nonnegative integer w.

3.1 Representation-Based Methods

The idea behind the representation-based methods (shown in Figure 1) is

– to modify the ASP representation P of the problem by adding a definition of the weight measure ω as an ASP program W and a representation of the weight constraints as an ASP program C, and

– to find at least/most w-weighted solutions by computing answer sets for the ASP program P ∪ W ∪ C using an existing ASP solver like CLASP.

In some problems (e.g., finding a shorter plan), we do not have to define the weight of a solution explicitly; we can simply use aggregates (e.g., sum, count, times) as part of weight constraints in the sense of [26,28,14]. However, many real-world applications (like planning or phylogeny reconstruction mentioned in the introduction) may need sophisticated weight measures due to the nature of the domain-specific information, and thus require an explicit ASP definition of the weight of a solution. With this motivation, our representation-based methods focus on combinatorial search problems where weight measures need to be defined explicitly.

[Figure 1: the threshold w and the programs Solve.lp (P; computes a solution to the problem), Weight.lp (W; computes the weight of a solution), and Constraint.lp (C; eliminates the solutions whose weight is above/below w) are given to CLASP, which outputs w-weighted solutions.]

Fig. 1 Representation-based method: computing at most/least w-weighted solutions.

3.2 Search-Based Methods

Search-based methods (as illustrated in Figure 2) do not modify the ASP representation of the problem, nor do they define the weight measure as an ASP program. The idea is

– to define the weight function externally as a C++ program, and

– to modify the search algorithm of an existing answer set solver like CLASP to compute solutions whose weights are over (resp. below) a given threshold.

There is no answer set solver that can compute weighted solutions in such a way. In our studies, we have considered the ASP solver CLASP, and modified its search algorithm in such a way that it computes weighted solutions incrementally in the spirit of a branch-and-bound algorithm. Before describing these modifications, let us first describe CLASP’s algorithm.

CLASP does a conflict-driven DPLL-like [9] branch and bound search to find an answer set for the program: at each level, it does propagation followed by backtracking or selection of new literals according to the current conflicts. A rough working principle of CLASP is shown in Algorithm 1. As can be seen, CLASP goes through three main steps to find an answer set. In the PROPAGATION step, it decides the literals that have to be included in the answer set due to the current assignment and conflicts. In the RESOLVE-CONFLICT step, it tries to resolve the conflicts encountered in the previous step: if there is a conflict, then CLASP learns it and backtracks to an appropriate level. If there are no conflicts and the currently selected literals do not represent an answer set, then, in the SELECT step, CLASP selects a new literal (based on some heuristics) to continue the search.

We have modified CLASP’s algorithm (Algorithm 1) to compute at least (resp. at most) w-weighted solutions, as shown in Algorithm 2; the parts in red denote these modifications.


[Figure 2: the threshold w, the program Solve.lp (P; computes a solution to the problem), and Weight.cpp (computes an upper bound for the weight of any completion of a partial solution) are given to the modified solver (labeled CLASP-NK in the figure), which outputs w-weighted solutions.]

Fig. 2 Search-based method: computing at most/least w-weighted solutions.

Algorithm 1 CLASP
Input: An ASP program Π
Output: An answer set A for Π
A ← ∅ // current assignment of literals
∇ ← ∅ // set of conflicts
while No Answer Set Found do
    // propagate according to the current assignment and conflicts; update the current assignment
    PROPAGATION(Π, A, ∇)
    if There is a conflict in the current assignment then
        RESOLVE-CONFLICT(Π, A, ∇) // learn and update the conflict set and do backtracking
    else
        if Current assignment does not yield an answer set then
            SELECT(Π, A, ∇) // select a literal to continue search
        else
            return A
        end if
    end if
end while

The new algorithm is called CLASP-W. Note that compared to CLASP, CLASP-W has an additional procedure called WEIGHT-ANALYZE; therefore, in order to use CLASP-W, the WEIGHT-ANALYZE function needs to be implemented according to the weight measure of the specific domain.

The WEIGHT-ANALYZE function is called at each step of the search; therefore, it should be able to identify the partial solution characterized by the currently selected literals, and estimate the weight of a complete solution constructed on the partial solution. Since a partial solution may extend to many complete solutions, the WEIGHT-ANALYZE function computes an upper bound (resp. a lower bound) for the weight of a solution that extends the current partial solution. Computing an exact upper bound (resp. a lower bound) might be hard and inefficient; therefore, one may be interested in implementing a heuristic function that approximates the upper bound (resp. lower bound) for the weight of a solution. To guarantee completeness, the heuristic function shall be admissible. In other words, the upper bound (resp. lower bound) computed by the heuristic function shall be greater (resp. less) than or equal to the exact upper bound (resp. lower bound). If this is not the case, then we have a risk of missing a solution.


Algorithm 2 CLASP-W
Input: An ASP program Π and a nonnegative integer w
Output: An answer set for Π that describes an at least (resp. at most) w-weighted solution
A ← ∅ // current assignment of literals
∇ ← ∅ // set of conflicts
while A does not represent an answer set do
    // propagate according to the current assignment and conflicts; update the current assignment
    PROPAGATION(Π, A, ∇)
    // compute an upper (resp. lower) bound for the weight of a solution that contains A
    weight ← WEIGHT-ANALYZE(A)
    // if the upper bound weight is less than the desired weight value w
    // then no need to continue search to find an at least w-weighted solution
    if There is a conflict in propagation OR weight < w then
        RESOLVE-CONFLICT(Π, A, ∇) // learn and update the conflict set and do backtracking
    end if
    if Current assignment does not yield an answer set then
        SELECT(Π, A, ∇) // select a literal to continue search
    else
        return A
    end if
end while
return false

Once the WEIGHT-ANALYZE function is defined to estimate the weight of a solution, we can check whether the estimated weight is less (resp. greater) than the given weight threshold w. If the upper bound (resp. the lower bound) computed by the heuristic function is already less (resp. greater) than the given weight threshold w, then no solution characterized by an extension of the current assignment of literals has a weight of at least (resp. at most) w. Therefore, the current assignment of literals can be treated as a conflict. CLASP learns this particular assignment of literals that leads to a conflict and never selects this set of literals in the further stages of the search. In this way, it eliminates redundancy in the search.
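The pruning idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not CLASP-W itself: plain depth-first search stands in for CLASP's conflict-driven search (nothing is learned), the decision variables, the `CLASHES` constraint, and the weights in `PHI` are hypothetical, and the bound is assumed to equal the exact weight on total assignments.

```python
from typing import Callable, Dict, List, Optional

Assignment = Dict[str, bool]


def search(variables: List[str],
           consistent: Callable[[Assignment], bool],
           weight_analyze: Callable[[Assignment], int],
           w: int,
           assignment: Optional[Assignment] = None) -> Optional[Assignment]:
    """Return a total assignment of weight >= w, or None if there is none."""
    assignment = {} if assignment is None else assignment
    # A violated constraint and a hopeless upper bound are handled alike:
    # both count as a conflict, so the whole subtree is abandoned.
    if not consistent(assignment) or weight_analyze(assignment) < w:
        return None
    if len(assignment) == len(variables):
        return dict(assignment)
    var = variables[len(assignment)]
    for value in (True, False):
        assignment[var] = value
        found = search(variables, consistent, weight_analyze, w, assignment)
        if found is not None:
            return found
        del assignment[var]
    return None


# Hypothetical toy instance: True means "character is compatible".
PHI = {"c1": 5, "c2": 4, "c3": 3}
CLASHES = [("c1", "c2")]  # assumed: c1 and c2 cannot both be compatible


def consistent(a: Assignment) -> bool:
    return not any(a.get(x) and a.get(y) for x, y in CLASHES)


def upper_bound(a: Assignment) -> int:
    # Total weight minus the weights already lost to incompatible characters.
    return sum(PHI.values()) - sum(PHI[v] for v, ok in a.items() if not ok)


print(search(list(PHI), consistent, upper_bound, 8))
print(search(list(PHI), consistent, upper_bound, 9))
```

With threshold 9 the branch that marks c1 as incompatible is cut immediately, because its bound is 7; this is the saving the bound is meant to provide.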

4 Weighted Phylogeny Reconstruction Problem

The evolutionary relations between species (or “taxonomic units”) based on their shared traits can be modeled as a phylogeny: a tree whose leaves represent the species, internal vertices represent their ancestors, and edges in between represent the relationships between them. The problem of phylogeny reconstruction asks for “plausible” phylogenies for a given set of taxonomic units. The plausibility of phylogenies can be evaluated with respect to domain-specific information (e.g., biological evidence, archeological evidence, historical linguistics) provided by the experts.

There have been various studies to compute plausible phylogenies (cf. [4]). We have considered a character-based cladistics approach with respect to the compatibility criterion, as in [4]. This approach describes each taxonomic unit with a set of “characters”, i.e., traits that every taxonomic unit can instantiate in a variety of ways. The taxonomic units that instantiate a character in the same way are assigned the same “state” of that character. Once the state of every character for every taxonomic unit is described, phylogenies can be inferred using the “compatibility” criterion: the goal is to find a phylogeny with a small number of incompatible characters. The problem of reconstructing a phylogeny with at most c incompatible characters (let us call this problem c-CP) is NP-hard [10].


While reconstructing phylogenies, some characters may give more information than the others. For instance, to model the evolutionary history of a family of languages, morphological/phonological characters are more informative than lexical characters. In order to emphasize the role of such characters in reconstructing a phylogeny, we can define the concept of a weighted phylogeny.

Before defining weighted phylogenies, let us define a phylogeny. A phylogeny for a set of taxonomic units is a finite rooted binary tree 〈V, E〉 along with two finite sets I and S and a function f from L × I to S, where L is the set of leaves of the tree. The set L represents the given taxonomic units, whereas the set V describes their ancestral units and the set E describes the genetic relationships between them. The elements of I represent, intuitively, qualitative characters, and elements of S are possible states of these characters. The function f “labels” every leaf v by mapping every index i to the state f(v, i) of the corresponding character in that taxonomic unit. A character i ∈ I is said to be compatible with a phylogeny (V, E, L, I, S, f) if there exists a function g : V × {i} → S such that

– For every leaf v of the phylogeny, g(v, i) = f(v, i).

– For every s ∈ S, if the set V_is = {x ∈ V : g(x, i) = s} is nonempty, then the digraph 〈V, E〉 has a subgraph with the set V_is of vertices that is a rooted tree.
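The definition can be checked mechanically on small trees. The sketch below is a brute-force illustration, not the method of [4]: it tries every labeling g of the internal vertices and tests the second condition using the fact that, in a rooted tree, a nonempty vertex set induces a rooted tree exactly when one and only one of its members has its parent outside the set. The tree `PARENT`, the leaf labelings, and all function names are assumptions made for this example.

```python
from itertools import product
from typing import Dict, Hashable, Optional, Sequence, Set

Parent = Dict[str, Optional[str]]  # vertex -> parent, root -> None


def induces_rooted_tree(parent: Parent, vertices: Set[str]) -> bool:
    """True iff exactly one member of `vertices` has its parent outside it."""
    return sum(1 for v in vertices if parent[v] not in vertices) == 1


def is_compatible(parent: Parent, leaf_state: Dict[str, Hashable],
                  states: Sequence[Hashable]) -> bool:
    """Search for a labeling g that agrees with f on the leaves (leaf_state)."""
    internal = sorted(set(parent) - set(leaf_state))
    for choice in product(states, repeat=len(internal)):
        g = dict(leaf_state)
        g.update(zip(internal, choice))
        groups = [{v for v, st in g.items() if st == s} for s in states]
        if all(induces_rooted_tree(parent, grp) for grp in groups if grp):
            return True
    return False


# Hypothetical binary tree: root r, internal vertices a and b, four leaves.
PARENT: Parent = {"r": None, "a": "r", "b": "r",
                  "l1": "a", "l2": "a", "l3": "b", "l4": "b"}

print(is_compatible(PARENT, {"l1": 0, "l2": 0, "l3": 1, "l4": 1}, [0, 1]))
print(is_compatible(PARENT, {"l1": 0, "l2": 1, "l3": 0, "l4": 1}, [0, 1]))
```

The first character groups the leaves along the tree and is compatible; the second splits each pair of sibling leaves and is not. The search is exponential in the number of internal vertices, so it is only meant for toy inputs.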

A weighted phylogeny is a phylogeny (V, E, L, I, S, f) along with a weight function ω that maps every character i ∈ I to a nonnegative integer. The weight of a phylogeny (V, E, L, I, S, f) can be defined in various ways with respect to ω; for instance, we can define the weight of a phylogeny (weight(V, E, L, I, S, f)) as the sum of the weights of all characters that are compatible with that phylogeny.

We are interested in computing weighted phylogenies for a given set of taxonomic units, and thus the following decision problem:

AT LEAST w-WEIGHTED c-COMPATIBLE PHYLOGENIES (wc-WCP): Given three sets L, I, S, a function f from L × I to S, a function ω from I to nonnegative integers, and two nonnegative integers w and c, decide the existence of a phylogeny (V, E, L, I, S, f) with at most c incompatible characters and whose weight is at least w.

Suppose that deciding weight(V, E, L, I, S, f) ≥ w is in NP. Then, we can conclude that:

Proposition 2 wc-WCP is NP-complete.

In addition to solving an optimization problem to find phylogenies with the minimum number of incompatible characters and maximum weight, based on our experts’ feedback, we are interested in solving a decision problem to find phylogenies with a small number of incompatible characters and a large weight. The reason is that some computed phylogenies that are “almost optimal” can be identified as plausible by the experts even though they do not have the minimum number of incompatible characters or the maximum weight. For example, in [5], for Indo-European languages, some phylogenies computed with 17 or 18 incompatible characters are identified as plausible, even though the minimum number of incompatible characters for the dataset is 16. In addition, in [7], the phylogenies with weights over 45 are identified as plausible, even though the maximum weight of the computed phylogenies is 65. Based on this motivation, we can decide on the thresholds as follows: first we compute a phylogeny with the minimum number of incompatible characters (since our approach to phylogenetics is based on the compatibility criterion) by increasing the value of c one by one starting from 0 until a phylogeny is computed. Since there may be plausible phylogenies “close” to the optimal one in terms of the number of incompatible characters, we also compute phylogenies with a “small” number of incompatible characters by further increasing the value of c by some units depending on the number of phylogenies computed so far. Similarly, we can decide on the value of the threshold w for weight.
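The threshold selection for c can be read as a simple loop around a solver call. The Python sketch below is only an illustration: `has_phylogeny` is a hypothetical oracle standing in for one solver run with bound c, and the stand-in used here becomes feasible at c = 16, echoing the minimum reported above for the Indo-European dataset.

```python
from typing import Callable, List, Optional


def smallest_feasible_c(has_phylogeny: Callable[[int], bool],
                        c_max: int) -> Optional[int]:
    """Increase c from 0 until some phylogeny exists; None if none up to c_max."""
    for c in range(c_max + 1):
        if has_phylogeny(c):
            return c
    return None


def candidate_thresholds(c_min: int, slack: int) -> List[int]:
    """Values of c that are 'close' to the optimum; slack is chosen by the user."""
    return list(range(c_min, c_min + slack + 1))


# Hypothetical oracle: a real one would invoke the answer set solver.
def oracle(c: int) -> bool:
    return c >= 16


c_min = smallest_feasible_c(oracle, 30)
print(c_min, candidate_thresholds(c_min, 2))
```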

5 Computing Weighted Phylogenies for Indo-European Languages

We can compute weighted phylogenies for the family of Indo-European languages described in [4], using the representation-based method or the search-based method described in Section 3. Before we show the applicability of these methods, let us define the weight Φ of a character for languages, and the weight of a phylogeny (V, E, L, I, S, f) for Indo-European languages.

5.1 Weight of a Phylogeny for Indo-European

As discussed in [4], out of all given characters and character states, only informative characters¹ and their essential states² play a role in reconstructing phylogenies. Therefore, we have considered informative characters and their essential states only, while reconstructing phylogenies for Indo-European languages. We have observed that some informative characters have few essential states whereas some have many; the ones with many essential states give more information as to how the languages are related to each other. Based on this domain-specific information, we have defined the weight Φ of an informative character i ∈ I with respect to a phylogeny (V, E, L, I, S, f) as the number of languages that are mapped to an essential state for that character:

\[
\Phi_{L,I,S,f}(i) = |\{\, l \in L : f(l, i) = s,\ i \text{ is informative},\ s \text{ is essential} \,\}| \tag{1}
\]

In the following, we drop the subscript from Φ_{L,I,S,f} when L, I, S, f are clear from the context.

We have then defined the weight of a phylogeny (V, E, L, I, S, f) for Indo-European languages simply as the sum of the weights of informative characters that are compatible with that phylogeny:

\[
weight(V, E, L, I, S, f) = \sum_{i \in I,\ i \text{ is compatible}} \Phi(i) \tag{2}
\]
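To make (1) and (2) concrete, here is a small Python sketch on a made-up character matrix. It follows the paper's definitions of essential state (shared by at least two leaves) and informative character (at least two essential states); returning 0 for a non-informative character is an assumption of this sketch, since Φ is only defined for informative characters, and all data and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Set

LeafState = Dict[str, Hashable]  # leaf -> state, for one character


def essential_states(leaf_state: LeafState) -> Set[Hashable]:
    """States taken by at least two different leaves."""
    counts = Counter(leaf_state.values())
    return {s for s, n in counts.items() if n >= 2}


def is_informative(leaf_state: LeafState) -> bool:
    return len(essential_states(leaf_state)) >= 2


def phi(leaf_state: LeafState) -> int:
    """Equation (1): number of leaves mapped to an essential state."""
    if not is_informative(leaf_state):
        return 0  # assumption: non-informative characters contribute nothing
    essential = essential_states(leaf_state)
    return sum(1 for s in leaf_state.values() if s in essential)


def phylogeny_weight(characters: Dict[str, LeafState],
                     compatible: Iterable[str]) -> int:
    """Equation (2): sum of phi over the compatible characters."""
    return sum(phi(characters[i]) for i in compatible)


# Hypothetical data: three characters over five languages A..E.
CHARACTERS = {
    "c1": {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3},
    "c2": {"A": 1, "B": 1, "C": 2, "D": 3, "E": 4},
    "c3": {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2},
}

print({name: phi(col) for name, col in CHARACTERS.items()})
print(phylogeny_weight(CHARACTERS, ["c1", "c3"]))
```

Which characters are compatible depends on the phylogeny; here the list passed to `phylogeny_weight` is simply given.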

To get more plausible phylogenies, we have also incorporated further domain-specific information. Historical linguist Don Ringe told us that it is least likely that Greco-Armenian (GA) languages are siblings with Balto-Slavic (BS) languages. Also unlikely, though less so than the grouping of GA with BS, is the grouping of GA with Germanic (GE) languages. If the to-be-reconstructed phylogenies have such odd groupings of languages, we deduct some amount from the total weight of the phylogeny, making sure that the weight of a phylogeny does not become negative.

¹ A character is informative if it has at least 2 essential states.

² For a phylogeny (V, E, L, I, S, f), a state s ∈ S is essential with respect to a character j ∈ I if there exist two different leaves l1 and l2 in L such that f(l1, j) = f(l2, j) = s.


5.2 The Representation-Based Method

The representation-based method (Section 3, Figure 1) suggests describing the weight of a phylogeny as an ASP program W, describing the weight constraints as an ASP program C, and then computing weighted phylogenies by computing answer sets for the program P ∪ W ∪ C, where P is a program that describes phylogeny reconstruction. We have described ASP programs in the language of LPARSE [25]. We have taken P as the phylogeny program of [4].

We have described the weight of a phylogeny (as defined in the previous section) as an ASP program W in four parts, as follows. Suppose that PW and W denote phylogeny weights, IC denotes an informative character, CW denotes a character weight, and C denotes a character. First we have described the weight CW of an informative character IC as in (1):

weightOfChar(IC,CW) :-
    CW{leaf(V):f(V,IC,S):essential_state(IC,S)}CW.

In the second part, we have defined the sum of the weights of characters compatible with the to-be-reconstructed phylogeny, as in (2):

totalWeightOfChars(PW) :- addCharWeights(PW,C), maxChar(C).

addCharWeights(PW,1) :- compatible(1), weightOfChar(1,PW).
addCharWeights(0,1) :- not compatible(1).

addCharWeights(PW+CW,C) :- compatible(C),
    weightOfChar(C,CW), addCharWeights(PW,C-1).

addCharWeights(PW,C) :- not compatible(C),
    addCharWeights(PW,C-1).

Observe in the third and the fifth rules that the incompatible characters do not contribute to the weight of a phylogeny.

In the third part, we have taken into account the domain-specific information discussed in the previous section. For instance, it is least likely that Greco-Armenian (GA) languages are siblings with Balto-Slavic (BS) languages. Therefore, we deduct some amount, say 30, from the weight of a phylogeny if it groups GA and BS as siblings. For Indo-European, historical linguist Don Ringe identified 4 odd groupings. We have defined the sum W of deductions due to all these odd groupings as follows:

totalDeductions(W) :- addDeductions(W,4).
addDeductions(W,1) :- oddGroup(1,W).
addDeductions(W+PW,G+1) :- oddGroup(G+1,PW), addDeductions(W,G).

Finally, in the fourth part, we have defined the weight of a phylogeny by subtracting the total deductions of odd groupings from the total weights of characters:

weightOfPhylogeny(PW-W) :- totalWeightOfChars(PW), totalDeductions(W).

We have described the weight constraints, to ensure that the weight of the phylogeny is at least w, as another ASP program C as follows:

:- weightOfPhylogeny(W), W<w.
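Read procedurally, the last three parts of W together with C amount to the arithmetic below. This Python sketch only mirrors the rules for illustration; it is not how an answer set solver evaluates them. All numbers are hypothetical, and it assumes that oddGroup(G,D) supplies a deduction D for each of the 4 groupings, with D = 0 when grouping G does not occur in the phylogeny.

```python
from typing import Dict, Set


def total_weight_of_chars(char_weight: Dict[int, int],
                          compatible: Set[int]) -> int:
    """Mirror of addCharWeights: fold over characters 1..maxChar,
    adding CW for compatible characters and nothing for the others."""
    pw = 0
    for c in sorted(char_weight):
        if c in compatible:
            pw += char_weight[c]
    return pw


def total_deductions(odd_group_deduction: Dict[int, int]) -> int:
    """Mirror of addDeductions: sum the deduction of every odd grouping."""
    return sum(odd_group_deduction.values())


def weight_of_phylogeny(pw: int, deductions: int) -> int:
    """Mirror of weightOfPhylogeny(PW-W)."""
    return pw - deductions


def satisfies_constraint(weight: int, w: int) -> bool:
    """Mirror of ':- weightOfPhylogeny(W), W<w.': reject weights below w."""
    return not weight < w


# Hypothetical values, for illustration only.
CHAR_WEIGHT = {1: 20, 2: 15, 3: 25}
COMPATIBLE = {1, 3}
ODD_GROUP_DEDUCTION = {1: 0, 2: 0, 3: 10, 4: 0}

pw = total_weight_of_chars(CHAR_WEIGHT, COMPATIBLE)
weight = weight_of_phylogeny(pw, total_deductions(ODD_GROUP_DEDUCTION))
print(pw, weight, satisfies_constraint(weight, 30))
```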


Since LPARSE and CLASP support weight constraints [19], note that we can also model the weight measure using weight constraints instead of recursive definitions. There are two summations in the weight definition: totalDeductions and totalWeightOfChars. Total deductions can be defined using weight constraints as follows:

totalDeductions(DW) :-
    DW[oddGroup(G,D): group(G): oddGroupWeight(D)=D]DW,
    phylogenyWeight(DW).

In addition, total weight of characters can be defined as follows:

totalWeightOfCharacters(PW) :-
    PW[weightOfCompatibleCharacter(K,CW):
       informative_character(K): charWeight(CW)=CW]PW.
weightOfCompatibleCharacter(IC,CW) :-
    weightOfCharacter(IC,CW), compatible(IC).

We have also considered this representation of the weight function (with aggregates) in our experiments.

5.3 The Search-Based Method

The search-based method (Section 3, Figure 2) suggests defining the weight measure in C++ (in a separate file) and using it with CLASP-W (the modified version of CLASP that computes weighted solutions).

As discussed in Section 3, we need to define and implement a heuristic function to estimate the upper bound for the weight of a phylogeny from a partial phylogeny. To guarantee completeness, the heuristic function should be admissible. Since the weight of a phylogeny is the sum of the weights of its compatible characters, every character already known to be incompatible lowers the weight that can still be reached. We can therefore define the heuristic function as follows with respect to the set I of informative characters and the partially constructed answer set A:

\[
UB_\Phi(A, I) = \sum_{i \in I} \Phi(i) \;-\; \sum_{i \in I,\ incompatible(i) \in A} \Phi(i)
\]

Proposition 3 UB_Φ is admissible.
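One way to see why the bound is admissible is the short derivation below. It is a sketch under assumptions that are not spelled out in this form in the text: A′ is any answer set extending the partial assignment A, every informative character without incompatible(i) in A′ counts as compatible, Φ(i) ≥ 0, and D(A′) ≥ 0 is a symbol introduced here for the total deduction due to odd groupings.

```latex
% Sketch only: A' \supseteq A is an answer set; D(A') \ge 0 is a symbol
% introduced here for the total deduction due to odd groupings.
\begin{align*}
weight(A')
  &= \sum_{i \in I} \Phi(i)
     \;-\; \sum_{i \in I,\ incompatible(i) \in A'} \Phi(i)
     \;-\; D(A') \\
  &\le \sum_{i \in I} \Phi(i)
     \;-\; \sum_{i \in I,\ incompatible(i) \in A} \Phi(i)
   \;=\; UB_\Phi(A, I).
\end{align*}
% The inequality holds because A \subseteq A' makes the second sum over A'
% at least as large as the one over A, and because D(A') \ge 0.
```

If the weight is additionally kept from going below 0, the inequality still holds, since UB_Φ(A, I) is itself a sum of nonnegative terms.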

We have implemented this heuristic function in C++, and used it with CLASP-W in our experiments. We did not have to modify the search algorithm of CLASP-W to be able to use this weight function in the computations.

6 Computing Weighted Phylogenies for Quercus

We present another example to show the inapplicability/applicability of our methods, with a different weight function and for a different dataset (i.e., Quercus species).

The domain-specific information given about groupings of Quercus species is hierarchical: both the expected groupings of species and further groupings of these subgroupings are specified. To take into account such domain-specific information about hierarchical groupings, we have defined a novel general weight function that can also be used for reconstructing weighted phylogenies for the genus Quercus. However, this weight function involves mathematical operations which cannot be expressed in the input language of CLASP; for that reason, the representation-based method cannot be applied to compute weighted phylogenies for Quercus using this weight function. Fortunately, we can apply the search-based method.

Before we show the applicability of the search based method, let us define the weight ϕof a vertex, and the weight of a phylogeny (V,E, L, I, S, f) for genus Quercus.

6.1 Weight of a Phylogeny for Quercus

Specific to the Quercus dataset, biologist Yasin Bakis, who also gathered the data described in [1], provided some domain-specific information about the expected groupings of the species and expected classes of these groupings. According to this information, a phylogeny is preferable if these species are closer to each other in the tree. Therefore, we have defined a weight measure to reflect such domain-specific information as follows.

The weight of a phylogeny (V, E, L, I, S, f) is the sum of the weights of all vertices except its root r:

\[ weight(V, E, L, I, S, f) = \sum_{v \in V \setminus \{r\}} \varphi(v) \tag{3} \]

where the weight ϕ(v) of a vertex v is defined as follows:

1. We label the leaves with their class information.
2. We propagate the labels of the leaves up to the root and we label each internal vertex with the labels of its children.
3. We assign a weight to each vertex by comparing its labels with those of its sibling. To be able to compare the labeling of the vertices, we define the contribution ς(c, v) of a vertex v with respect to a label c as follows. Let sibling(v) denote the sibling of the vertex v, and let label(v) denote the labels of the vertex v.

\[ \varsigma(c, v) = \begin{cases} 0 & \text{if } c \notin label(sibling(v)) \text{ or } |label(v)| = \text{the total number of classes}, \\ \dfrac{1}{|label(v)|} & \text{otherwise} \end{cases} \tag{4} \]

The weight ϕ(v) of a vertex v is then the minimum of the following two values: the maximum value maxContr(v) of the contributions ς(c, v) over its labels c, and the maximum value maxContr(sibling(v)) of the contributions ς(c′, sibling(v)) over its sibling's labels c′:

\[ \varphi(v) = \min(maxContr(v), maxContr(sibling(v))). \tag{5} \]

In the rest of the paper, for each function h defined above, we denote by h_Z the function h defined for a partial phylogeny Z.
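The three steps above can be sketched in Python as follows. This is an illustration only, under stated assumptions: the phylogeny is a rooted binary tree encoded as nested pairs, a leaf is a string naming its class, and the function names are hypothetical (they are not taken from the authors' C++ code).

```python
from fractions import Fraction


def labels(tree):
    """Steps 1-2: a leaf carries its class; an internal vertex carries
    the union of the labels of its two children."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return labels(left) | labels(right)


def max_contr(v, sibling, n_classes):
    """Maximum contribution of v over its labels, following Equation (4)."""
    lv, ls = labels(v), labels(sibling)
    if len(lv) == n_classes:
        return Fraction(0)
    return max(Fraction(1, len(lv)) if c in ls else Fraction(0) for c in lv)


def phi(v, sibling, n_classes):
    """Weight of vertex v, following Equation (5)."""
    return min(max_contr(v, sibling, n_classes),
               max_contr(sibling, v, n_classes))


def weight(tree, n_classes):
    """Sum of the weights of all vertices except the root, as in Equation (3)."""
    if isinstance(tree, str):
        return Fraction(0)
    left, right = tree
    return (phi(left, right, n_classes) + phi(right, left, n_classes)
            + weight(left, n_classes) + weight(right, n_classes))
```

With three classes A, B and C, the tree `(("A", "A"), ("B", "C"))` has weight 2: the two sibling leaves of class A contribute 1 each, and every other vertex contributes 0 because it shares no label with its sibling.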

6.2 The Search-Based Method

Similar to Section 5.3, a heuristic function is needed to estimate the upper bound for the weight of a phylogeny from a partial phylogeny. To guarantee completeness, admissibility of the heuristic function is required. Since the weight of a phylogeny is defined over the weights of vertices, we can define the heuristic function as follows with respect to the set V of vertices and the partially constructed answer set A:

\[ UB_\varphi(A, V) = \sum_{v \in V} \varphi'_A(v), \tag{6} \]

where ϕ′_A(v) is defined as follows:

\[ \varphi'_A(v) = \begin{cases} 1 & \text{if } label_A(v) = \emptyset, \text{ or } v \text{ does not have a sibling in } A, \text{ or } label_A(sibling_A(v)) = \emptyset, \\ minC_A(v) & \text{otherwise,} \end{cases} \tag{7} \]

and minC_A(v) is defined as follows:

\[ minC_A(v) = \min(maxContr_A(v), maxContr_A(sibling_A(v))). \tag{8} \]

Proposition 4 UBϕ is admissible.

We have implemented this heuristic function in C++ in a separate file, and used it with CLASP-W in our experiments. As in our experiments with Indo-European languages, we did not have to modify the search algorithm of CLASP-W to be able to use this weight function in the computations.
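A rough Python sketch of this bound is given below. It rests on simplifying assumptions that are not in the paper: a partial phylogeny is modelled as a binary tree of nested pairs in which a not-yet-determined subtree is `None` (so its label set is empty), the root is left out of the sum, and all function names are hypothetical.

```python
from fractions import Fraction


def labels(tree):
    """Label set of a vertex; an undetermined subtree (None) has no labels."""
    if tree is None:
        return frozenset()
    if isinstance(tree, str):
        return frozenset([tree])
    return labels(tree[0]) | labels(tree[1])


def max_contr(v, sibling, n_classes):
    """Maximum contribution of v over its (nonempty) label set."""
    lv, ls = labels(v), labels(sibling)
    if len(lv) == n_classes:
        return Fraction(0)
    return max(Fraction(1, len(lv)) if c in ls else Fraction(0) for c in lv)


def phi_prime(v, sibling, n_classes):
    """Optimistic vertex weight: 1 while v or its sibling is still unlabelled,
    otherwise the minimum of the two maximum contributions."""
    if not labels(v) or not labels(sibling):
        return Fraction(1)
    return min(max_contr(v, sibling, n_classes),
               max_contr(sibling, v, n_classes))


def upper_bound(tree, n_classes):
    """Sum of the optimistic weights over all non-root vertices."""
    if tree is None or isinstance(tree, str):
        return Fraction(0)
    left, right = tree
    return (phi_prime(left, right, n_classes) + phi_prime(right, left, n_classes)
            + upper_bound(left, n_classes) + upper_bound(right, n_classes))
```

For the partial tree `(("A", None), ("B", "C"))` with three classes, the sketch yields 2. Completing the open leaf with A gives a tree of weight 2 and completing it with B gives weight 1, so the bound is not exceeded in this small case; on a tree without `None` the function returns the weight itself.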

7 Experimental Results

We applied the computational methods described above (i.e., the representation-based methods and the search-based method) to weighted phylogeny reconstruction for Indo-European languages and for Quercus species. In our experiments, we considered the ASP encoding of phylogeny reconstruction given in [4].

All experimental results reported below are for a workstation with a 3.00GHz Intel Xeon 5160 Dual-Core Processor and 4GB RAM, running Red Hat Enterprise Linux (Version 4.3); in these experiments LPARSE v.1.0.17 is used as the grounder and CLASP v.1.3.1 is used as the answer set solver.

7.1 Experiments with Indo-European Languages

We used the dataset assembled by Don Ringe and Ann Taylor [23]. As in [4], to compute weighted phylogenies, we considered the language groups Balto-Slavic (BS), Italo-Celtic (IC), Greco-Armenian (GA), Anatolian (AN), Tocharian (TO), Indo-Iranian (IIR), Germanic (GE), and the language Albanian (AL). While computing phylogenies, we also took into account some domain-specific information about these languages.

Due to the nature of the domain-specific information about simple groupings, we considered the weight measure defined in Section 5.2. Recall that this weight function can be defined in ASP in two ways: iteratively using recursive rules, and using aggregates. Therefore, in our experiments we considered both representation-based methods and compared them with the search-based method.

According to the results of [4], the phylogenies identified as plausible have 16, 17 or 18 incompatible characters. Among those plausible phylogenies, the one with the lowest weight has a weight of 45. Therefore, in our experiments we considered c = 16, 17, 18 and w = 45.


Comparison of the methods Let us first consider computing an at least w-weighted phylogeny with at most c incompatible characters. For each problem and for each method, Table 1 presents the computation time, the size of the ground program, and the maximum size of the memory used in computation. For instance, let us consider computing a phylogeny with at most 17 incompatible characters, and whose weight is at least 45. With the representation-based method with recursive definitions, CLASP took 15.32 CPU seconds to compute such a phylogeny; the ground program had 79229 atoms and 1585419 rules; the computation of the phylogeny consumed 369 MB of memory. With the other representation-based method (with aggregates), CLASP took 2.08 CPU seconds to compute such a phylogeny; the ground program had 7982 atoms and 98245 rules; the computation of the phylogeny consumed 22 MB of memory. On the other hand, with the search-based method, CLASP-W took 1.30 CPU seconds to compute such a phylogeny; the ground program had 3744 atoms and 55219 rules; the computation of the phylogeny consumed 22 MB of memory.

Observe in Table 1 that, in terms of both computation time and the memory used, the search-based method performs much better than the representation-based method with recursive definitions, and slightly better than the representation-based method with aggregates. The representation of the weight measure plays an important role in the computational efficiency of these methods in terms of time and memory. The representation-based method with recursive definitions results in larger programs and higher memory consumption, whereas the representation-based method with aggregates has a more compact encoding of the weight measure and thus results in smaller programs and lower memory consumption. On the other hand, the search-based method deals with the time-consuming computation of weights of phylogenies not at the ASP representation level but at the search level: it does not require an ASP representation of the weight function but requires a modification of the solver to guide the search to obtain a plausible phylogeny.

Now let us examine the results of our experiments (Table 2) for computing all of the phylogenies with weight at least w and with at most c incompatible characters. The results are similar to those of the previous batch of experiments for computing a weighted phylogeny: the search-based method and the representation-based method with aggregates outperform the representation-based method with recursive definitions. For instance, let us consider the problem of computing all phylogenies with at most 17 incompatible characters, and whose weight is at least 45. With our representation-based method with recursive definitions, CLASP took 126 CPU seconds to compute all 8 phylogenies. With the representation-based method with aggregates, CLASP took 20 CPU seconds, whereas, with the search-based method, CLASP-W took 17 CPU seconds to compute them.

Effectiveness of the methods In [4], after computing all 45 phylogenies, the authors examine them manually and identify 34 of them as plausible (1 with 16 incompatible characters, 7 with 17 incompatible characters, 6 with 18 incompatible characters, 3 with 19 incompatible characters, 17 with 20 incompatible characters). With the weight measure defined in Section 5, and the representation/search-based methods described above for computing weighted phylogenies, we could automatically compute all plausible phylogenies with at most 18 incompatible characters.

Table 1 The representation-based methods vs. the search-based method: computing an at least w-weighted phylogeny with at most c incompatible characters.

c    w    method                                          time (CPU sec.s)   # of atoms   # of rules   memory (MB)
16   45   representation-based (recursive definitions)    15.52              79229        1585419      369
16   45   representation-based (aggregates)               2.08               7982         98245        22
16   45   search-based                                    1.34               3744         55219        22
17   45   representation-based (recursive definitions)    15.32              79229        1585419      369
17   45   representation-based (aggregates)               2.08               7982         98245        22
17   45   search-based                                    1.30               3744         55219        22
18   45   representation-based (recursive definitions)    15.47              79229        …            369
18   45   representation-based (aggregates)               …                  …            …            …
18   45   search-based                                    …                  …            55219        …

Table 2 The representation-based methods vs. the search-based method: computing all at least w-weighted phylogenies with at most c incompatible characters.

c    w    # of phylogenies   method                                          time (CPU sec.s)   memory (MB)
16   45   1                  representation-based (recursive definitions)    14.49              369
16   45   1                  representation-based (aggregates)               5.46               38
16   45   1                  search-based                                    3.67               22
17   45   8                  representation-based (recursive definitions)    126.06             369
17   45   8                  representation-based (aggregates)               20.41              38
17   45   8                  search-based                                    17                 22
18   45   14                 representation-based (recursive definitions)    210.32             369
18   45   14                 representation-based (aggregates)               33.93              38
18   45   14                 search-based                                    22.35              22

7.2 Experiments with Quercus species

We used the dataset assembled by Yasin Bakis [1] to compute weighted phylogenies for Quercus species. The dataset consists of 47 Quercus populations from different parts of Turkey. In addition, we considered some domain-specific information about the expected groupings of these populations, and further classification of these groupings. In earlier experiments with Quercus, we had found out that the minimum number of characters incompatible with a phylogeny is 9. Since it is useful in phylogenetics to compute solutions "close" to the optimal one, we also computed phylogenies with at most 9, 10, 11, or 12 incompatible characters. Considering the weights of the plausible phylogenies, in our experiments we set c = 9, 10, 11, 12 and w = 11 as thresholds.

Due to the nature of the domain-specific information about hierarchical groupings, we considered the weight measure defined in Section 6.1. Recall that this weight function cannot be defined in ASP due to mathematical functions not supported by the ASP solvers. Therefore, we could not apply the representation-based methods for reconstructing weighted phylogenies for Quercus. Even so, our experiments with Quercus species are interesting to illustrate the usefulness of the search-based method. From the point of view of ASP, it provides a tool for computing weighted solutions; neither the ASP formulation of the problem nor the search algorithm of CLASP-W needs problem/function-specific modifications. From the point of view of phylogenetics, it provides a tool for computing more plausible phylogenies and thus analyzing phylogenies.

8 Related Work

Computing weighted solutions can be considered as a method to compute preferred solutions. In that sense, our studies are related to computing preferred answer sets, which has been studied extensively. For instance, one way to describe preferences is to prioritize the rules in the program [12,24]. With this approach, the rules of the program are ranked according to some preference relation; the answer sets which are described by higher-ranked rules are selected as the preferred answer sets. Similar to prioritized programs, [2] and [3] propose several methods for describing preferences of answer sets via preferences over literals. The idea is to have a ranking over the literals in the head of the rules; the answer sets which consist of higher-ranked literals are presented as more preferred ones. Another approach to represent preferences is to assign weights to weak constraints [6] or literals [20]. The idea is to assign weights to rules or literals; then the answer sets whose total weight is above some threshold are considered to be more preferred. In [22], the authors take a similar approach by assigning weights to rules; the idea here is to compute the answer sets that satisfy a subset of the rules whose weight is maximum. Although these techniques might be effective in describing some simple weight measures, in general they are not suitable for describing more complex weight measures (e.g., weight of a phylogeny) that require explicit definitions.

Another related line of work is the computation of similar/diverse solutions [11] with respect to a given distance function. In [11], the authors propose defining a distance function to measure the similarity/diversity of solutions, and present a method for integrating this distance measure in CLASP to compute n k-similar (resp. diverse) solutions (i.e., n solutions whose distance is less (resp. greater) than or equal to k). They call this modified version of CLASP CLASP-NK. Note that AT LEAST (resp. AT MOST) w-WEIGHTED SOLUTION can be solved using CLASP-NK of [11] by defining the distance function in terms of the given weight function ω. However, in general, we cannot compute n k-similar (resp. diverse) at least (resp. most) w-weighted solutions in this way. To be able to solve all four variations of n k-SIMILAR (resp. k-DIVERSE) AT LEAST (resp. AT MOST) w-WEIGHTED SOLUTIONS using CLASP-NK, we need to define four different distance functions (rather than a generic distance function) in terms of the given distance function ∆ and the given weight function ω. On the other hand, we can extend the applicability of our methods and the methods of [11], by considering more general problems as follows:

n k-SIMILAR (resp. k-DIVERSE) AT LEAST (resp. AT MOST) w-WEIGHTED SOLUTIONS: Given an ASP program 𝒫 that formulates a computational problem P, a weight measure ω that maps a solution for P to a nonnegative integer, a distance measure ∆ that maps a set of solutions to a nonnegative integer, and nonnegative integers w and k, decide whether a set S of n solutions for P exists such that ∆(S) ≤ k (resp. ∆(S) ≥ k) and for each s ∈ S, ω(s) ≥ w (resp. ω(s) ≤ w).
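To make the decision problem concrete, here is a brute-force Python sketch of the "k-similar, at least w-weighted" variant. It assumes that all solutions are already available as a list, which is exactly what a solver-based method avoids; the function name and the callables `omega` and `delta` are placeholders for the weight and distance measures.

```python
from itertools import combinations


def find_similar_weighted(solutions, omega, delta, n, k, w):
    """Return n solutions s with omega(s) >= w whose distance delta is <= k,
    or None if no such set exists. Exhaustive, for illustration only."""
    heavy = [s for s in solutions if omega(s) >= w]
    for subset in combinations(heavy, n):
        if delta(subset) <= k:
            return subset
    return None
```

As a toy instance, take the solutions 1, 5, 6, 9 with `omega` the identity and `delta` the difference between the largest and smallest element. For n = 2, k = 1 and w = 5 the sketch returns (5, 6); for k = 0 it returns None.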


Algorithm 3 CLASP-NKW
Input: An ASP program Π and nonnegative integers w, n and k
Output: A set of n k-similar at least w-weighted solutions
A ← ∅ // current assignment of literals
∇ ← ∅ // set of conflicts
X ← ∅ // previously computed answer sets
while |X| < n do
    PROPAGATION(Π, A, ∇)
    weight ← WEIGHT-ANALYZE(A) // Related to CLASP-W
    distance ← DISTANCE-ANALYZE(A, X) // Related to CLASP-NK
    if (There is a conflict in propagation) OR (weight < w) OR (distance > k) then
        RESOLVE-CONFLICT(Π, A, ∇)
    end if
    if Current assignment does not yield an answer set then
        SELECT(Π, A, ∇)
    else
        X ← X ∪ {A}
    end if
end while
return X

In fact, we have introduced a search-based method for computing n k-similar (resp. k-diverse) at least (resp. at most) w-weighted solutions in ASP, and implemented this method by modifying the search algorithm of CLASP-W as in Algorithm 3; the parts in blue denote these modifications. The modified version is called CLASP-NKW. We applied the search-based method for computing similar/diverse weighted solutions to reconstructing phylogenies for Indo-European languages using CLASP-NKW. In this batch of experiments, we considered the nodal distance (as in [11]) to measure the similarity/diversity of a set of phylogenies. For instance, we computed 2 12-similar and at least 45-weighted phylogenies using CLASP-NKW; the computation took 2.3 CPU seconds. Likewise, we computed 2 24-diverse and at least 45-weighted phylogenies using CLASP-NKW; the computation took 2.7 CPU seconds. The results of our experiments agree with the groupings of [4]: the similar phylogenies computed by CLASP-NKW are in the same group; the diverse phylogenies are in different groups.
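The idea of rejecting a partial assignment as soon as its weight bound or its distance violates a threshold can be sketched on a toy problem in Python. Everything here is an assumption made for illustration and has no counterpart in CLASP-NKW: "solutions" are bit-vectors, the weight is the sum of the (nonnegative) weights of the bits set to 1, and the distance of a set is its maximum pairwise Hamming distance. Solutions are collected greedily, so the sketch may miss a valid set that a complete procedure would find.

```python
def search_nkw(weights, n, k, w):
    """Collect up to n bit-vectors of weight >= w that are pairwise
    within Hamming distance k, pruning partial assignments early."""
    m = len(weights)
    found = []

    def upper_bound(partial):
        # Optimistic weight: every unassigned bit is assumed to be 1.
        assigned = sum(wt for wt, bit in zip(weights, partial) if bit)
        return assigned + sum(weights[len(partial):])

    def distance_lower_bound(partial):
        # Hamming distance on the assigned prefix can only grow later.
        return max((sum(a != b for a, b in zip(partial, s)) for s in found),
                   default=0)

    def extend(partial):
        if len(found) == n:
            return
        if upper_bound(partial) < w or distance_lower_bound(partial) > k:
            return  # treated like a conflict: backtrack
        if len(partial) == m:
            found.append(tuple(partial))
            return
        for bit in (1, 0):
            extend(partial + [bit])

    extend([])
    return found
```

With weights [3, 2, 1], n = 2, k = 1 and w = 5, the sketch first finds (1, 1, 1) and then (1, 1, 0); with w = 7 the root is pruned immediately and the result is empty.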

9 Conclusion

We have studied problems related to computing weighted solutions in ASP, investigated computational methods to solve them using the state-of-the-art answer set solvers, compared these methods experimentally, and showed their usefulness and effectiveness in phylogeny reconstruction applied to two real datasets, Indo-European languages and Quercus species.

In particular, we have investigated two sorts of methods for computing weighted solutions: one method suggests modifying the ASP representation of the problem to compute weighted solutions using an existing ASP solver and the other suggests modifying the search algorithm of the answer set solver to compute weighted solutions incrementally. In our experiments, we have used the answer set solver CLASP with the representation-based method; and we have modified the search algorithm of CLASP and called it CLASP-W for the search-based method. In the representation-based method, a given weight function is represented in ASP; in the search-based method, however, it is implemented in C++. We have defined two novel weight functions for phylogenies that can incorporate domain-specific information about simple (resp. hierarchical) groupings; the weight function that incorporates information about hierarchical groupings cannot be represented in ASP due to some mathematical functions not supported by the ASP solvers.

In our experiments, we have observed the following:

– In terms of computational efficiency (time and space), the search-based method outperforms the representation-based method (Section 7). This is mainly due to two reasons: in the search-based method, the weight function is not explicitly defined in ASP, and thus program size is smaller; the search algorithm of the ASP solver is modified to prevent redundant search, and thus the search space is smaller as well.
– The representation-based methods may be preferable if the weight function can be defined concisely in ASP. On the other hand, some weight measures, which involve mathematical functions (e.g., division, logarithm) over real numbers, cannot be represented in ASP; in such cases the search-based method provides a useful tool for computing weighted solutions in ASP, as seen in Section 6.1.
– The performance of a representation-based method can be improved by utilizing the constructs supported by the particular answer set solver. For instance, if the answer set solver supports aggregates then it may be more advantageous to use aggregates instead of a recursive definition while defining the weight function, as shown in Section 5.2.
– For search-based methods, one needs to construct a heuristic function to estimate the weight of a complete solution from a partial solution; to guarantee completeness, the heuristic function should be admissible. For some cases, for instance, when weighted solutions are computed quite often as part of a real-world application (e.g., consider computation of weighted phylogenies integrated as part of a phylogenetics system like PHYLO-ASP [13]), it may be worth constructing an (admissible) heuristic function.
– In the search-based method, the weight function is defined in C++ in a separate file. This allows us to use CLASP-W easily with different weight measures, without modifying its search algorithm.

Based on the promising experimental results in phylogenetics, we have extended our search-based method to be able to compare weighted solutions; for that we have modified CLASP-W in the spirit of [11]. The modified version of CLASP-W is called CLASP-NKW. In this way, given a weight measure ω for a phylogeny and a distance measure ∆ for a set of phylogenies (both as C++ programs, in separate files), we can compute similar/diverse weighted phylogenies.

Our methods for computing (similar/diverse) weighted solutions, and the solvers CLASP-W and CLASP-NKW, provide useful tools for analyzing/comparing solutions both in ASP and in phylogenetics. In ASP, there are many appealing applications. For example, one can use CLASP-W to compute more preferred products in a product configuration problem [27]. In planning [16], one can be interested in assigning weights to actions. Then, it might be desirable to compute plans whose total action weight is less than some threshold. We can use CLASP-W for such an application by defining an appropriate weight measure and an (admissible) heuristic function for the weight measure. As regards phylogenetics, there is no software system that has utilities for analyzing/comparing phylogenies; in that sense, our search-based methods (including the weight/distance measures) allow experts to automatically analyze phylogenies.

Acknowledgments

This work has been supported by TUBITAK Grant 107E229.


References

1. Y. Bakıs. Morphometric Analysis of Oak (Quercus L.) Acorns in Turkey. PhD thesis, Abant Izzet Baysal University, 2005.
2. G. Brewka, I. Niemela, and M. Truszczynski. Answer set optimization. In Proc. of IJCAI, pages 867–872, 2003.
3. Gerhard Brewka. Logic programming with ordered disjunction. In Proc. of AAAI, pages 100–105, 2002.
4. D. R. Brooks, E. Erdem, S. T. Erdogan, J. W. Minett, and D. Ringe. Inferring phylogenetic trees using answer set programming. Journal of Automated Reasoning, 39(4):471–511, 2007.
5. D. R. Brooks, E. Erdem, J. W. Minett, and D. Ringe. Character-based cladistics and answer set programming. In Proc. of PADL, pages 37–51, 2005.
6. F. Buccafurri, N. Leone, and P. Rullo. Strong and weak constraints in disjunctive datalog. In Proc. of LPNMR, pages 2–17, 1997.
7. D. Cakmak, E. Erdem, and H. Erdogan. Computing weighted solutions in answer set programming. In Proc. of LPNMR, pages 416–422, 2009.
8. D. Cakmak, H. Erdogan, and E. Erdem. Computing weighted solutions in ASP: Representation-based method vs. search-based method. In Proc. of RCRA Workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion, 2010.
9. M. Davis, G. Logemann, and D. Loveland. A machine program for theorem-proving. Communications of the ACM, 5:394–397, 1962.
10. W. H. E. Day and D. Sankoff. Computational complexity of inferring phylogenies by compatibility. Systematic Zoology, 35(2):224–229, 1986.
11. T. Eiter, E. Erdem, H. Erdogan, and M. Fink. Finding similar or diverse solutions in answer set programming. In Proc. of ICLP, 2009.
12. T. Eiter, W. Faber, N. Leone, and G. Pfeifer. Computing preferred and weakly preferred answer sets by meta interpretation in answer set programming. In Proc. of ASP Workshop, 2001.
13. Esra Erdem. Phylo-asp: Phylogenetic systematics with answer set programming. In Proc. of LPNMR, pages 567–572, 2009.
14. W. Faber, G., N. Leone, T. Dell'Armi, and G. Ielpa. Design and implementation of aggregate functions in the dlv system. Theory and Practice of Logic Programming, 8(5-6):545–580, 2008.
15. M. Gebser, B. Kaufmann, A. Neumann, and T. Schaub. Conflict-driven answer set solving. In Proc. of IJCAI, pages 386–392, 2007.
16. V. Lifschitz. Action languages, answer sets and planning. In The Logic Programming Paradigm: a 25-Year Perspective, pages 357–373. Springer Verlag, 1999.
17. V. Lifschitz. What is answer set programming? In Proc. of AAAI, 2008.
18. Luay Nakhleh. Phylogenetic networks. PhD thesis, 2004.
19. Ilkka Niemela and Patrik Simons. Extending the smodels system with cardinality and weight constraints. pages 491–521, 2001.
20. D. V. Nieuwenborgh, S. Heymans, and D. Vermeir. Weighted answer sets and applications in intelligence analysis. In Proc. of LPAR, pages 169–183, 2004.
21. M. Nogueira, M. Balduccini, M. Gelfond, R. Watson, and M. Barry. An a-prolog decision support system for the space shuttle. In Proc. of PADL, pages 169–183, 2001.
22. E. Oikarinen and M. Jarvisalo. Max-ASP: Maximum satisfiability of answer set programs. In Proc. of LPNMR, pages 236–249, 2009.
23. D. Ringe, T. Warnow, and A. Taylor. Indo-European and computational cladistics. Transactions of the Philological Society, 100(1):59–129, 2002.
24. C. Sakama and K. Inoue. Prioritized logic programming and its application to commonsense reasoning. Artificial Intelligence, 123(1-2):185–222, 2000.
25. P. Simons, I. Niemela, and T. Soininen. Extending and implementing the stable model semantics. Artificial Intelligence, 138:181–234, 2002.
26. P. Simons and T. Soininen. Stable model semantics of weight constraint rules. In Proc. of LPNMR, pages 317–331, 1999.
27. Timo Soininen and Ilkka Niemela. Developing a declarative rule language for applications in product configuration. In Proc. of PADL, pages 305–319, 1998.
28. T. C. Son and E. Pontelli. A constructive semantic characterization of aggregates in answer set programming. Theory and Practice of Logic Programming, 7(3):355–375, 2007.
29. T. C. Son, E. Pontelli, and C. Sakama. Logic programming for multiagent planning with negotiation. In Proc. of ICLP, pages 99–114, 2009.
30. Nam Tran and Chitta Baral. Reasoning about triggered actions in ansprolog and its application to molecular interactions in cells. In Proc. of KR, pages 554–564, 2004.
31. J. P. White and J. F. O'Connell. A Prehistory of Australia, New Guinea, and Sahul. Academic, 1982.

Appendix

A Proof of Proposition 1

Proposition 1 AT LEAST (resp. AT MOST) w-WEIGHTED SOLUTION is NP-complete.

Proof Membership follows from the fact that we can guess a candidate solution S together with a witness for ω(S) ≥ w (resp. ω(S) ≤ w), and check in polynomial time whether ω(S) ≥ w (resp. ω(S) ≤ w). For hardness, we can reduce answer set existence for normal propositional programs, which is an NP-complete problem, to this problem: take ω(S) = 1 for every S and w = 0.

B Proof of Proposition 2

Proposition 2 wc-WCP is NP-complete.

Proof Membership follows from the fact that we can guess a candidate tree (V, E) and check in polynomial time whether weight(V, E, L, I, S, f) ≥ w and whether the phylogeny has at most c incompatible characters [18, Theorem 17]. For hardness, we can reduce c-CP to wc-WCP by taking ω(i) = 1 for every i ∈ I.

C Proof of Proposition 3

Proposition 3 UBΦ is admissible.

Proof We want to show that UBΦ(A, I) never underestimates the weight function weight(P), for every complete phylogeny P that extends A.

Take any complete phylogeny P that extends A. Let $IC_P$ be the characters that are incompatible with P, and let $IC_A$ be the characters that are incompatible with A.

The weight of P is:

\[ weight(P) = \sum_{i \in I,\ i \text{ is compatible}} \Phi(i) = \sum_{i \in I} \Phi(i) - \sum_{i \in I,\ i \in IC_P} \Phi(i). \]

On the other hand, since P is a completion of A, $IC_A \subseteq IC_P$, and thus

\[ \sum_{i \in I,\ i \in IC_A} \Phi(i) \leq \sum_{i \in I,\ i \in IC_P} \Phi(i). \]

Then

\[ \sum_{i \in I} \Phi(i) - \sum_{i \in I,\ i \in IC_P} \Phi(i) \leq \sum_{i \in I} \Phi(i) - \sum_{i \in I,\ i \in IC_A} \Phi(i). \]

Therefore, weight(P) ≤ UBΦ(A, I).

D Proof of Proposition 4

Proposition 4 UBϕ is admissible.

We need the following definitions, notation, and lemmas to prove Proposition 4.

Let P be a phylogeny (V, E, L, I, S, f). We say that a phylogeny $X = (V_X, E_X, L_X, I_X, S_X, f_X)$ is contained in P (denoted $X \subseteq P$) if $V_X \subseteq V$, $E_X \subseteq E$, $L_X \subseteq L$, $I_X \subseteq I$, $S_X \subseteq S$, and $f|_{L_X} = f_X$.

Let $X = (V_X, E_X, L_X, I_X, S_X, f_X)$ and $Y = (V_Y, E_Y, L_Y, I_Y, S_Y, f_Y)$ be two partial phylogenies contained in P. We say that X is label-contained in Y (denoted $X \subseteq_l Y$) if

– $X \subseteq Y$,
– for every $v \in V$, $label_X(v) \subseteq label_Y(v)$.


In the following lemmas, let $P_1$ (with vertices $V_{P_1}$) and $P_2$ (with vertices $V_{P_2}$) be two partial phylogenies of P such that $P_1 \subseteq_l P_2$ and $|E_{P_2} \setminus E_{P_1}| \leq 1$.

Lemma 1 For every vertex $v \in V$, if $label_{P_1}(v) = \emptyset$ or v is not in $V_{P_1}$, then $\varphi'_{P_1}(v) \geq \varphi'_{P_2}(v)$.

Lemma 2 For every vertex $v \in V$, if v does not have a sibling in $P_1$ or $label_{P_1}(sibling_{P_1}(v)) = \emptyset$, then $\varphi'_{P_1}(v) \geq \varphi'_{P_2}(v)$.

Lemma 3 For every partial phylogeny $P_1$ of P and for every vertex $v \in V$, if

(i) $label_{P_1}(v) \neq \emptyset$,
(ii) $label_{P_1}(sibling_{P_1}(v)) \neq \emptyset$,
(iii) $label_{P_1}(v) \cap label_{P_1}(sibling_{P_1}(v)) = \emptyset$,

then $\varphi'_{P_1}(v) = 0$.

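Lemma 4 For every vertex $v \in V$, if

(i) $label_{P_1}(v) \neq \emptyset$,
(ii) $label_{P_1}(sibling_{P_1}(v)) \neq \emptyset$,
(iii) $label_{P_1}(v) \cap label_{P_1}(sibling_{P_1}(v)) \neq \emptyset$,

then $\varphi'_{P_1}(v) \geq \varphi'_{P_2}(v)$.

Lemma 5 For every vertex $v \in V$, if

(i) $label_{P_1}(v) \neq \emptyset$,
(ii) $label_{P_1}(sibling_{P_1}(v)) \neq \emptyset$,
(iii) $label_{P_2}(v) \cap label_{P_2}(sibling_{P_2}(v)) = \emptyset$,

then $\varphi'_{P_1}(v) = \varphi'_{P_2}(v) = 0$.

Lemma 6 For every vertex $v \in V$, if

(i) $label_{P_1}(v) \neq \emptyset$,
(ii) $label_{P_1}(sibling_{P_1}(v)) \neq \emptyset$,
(iii) $label_{P_1}(v) \cap label_{P_1}(sibling_{P_1}(v)) = \emptyset$,
(iv) $label_{P_2}(v) \cap label_{P_2}(sibling_{P_2}(v)) \neq \emptyset$,
(v) $E_{P_2} = E_{P_1}$,

then there exists a label $Z \in label_{P_2}(sibling_{P_2}(v))$ such that

(a) $Z \in label_{P_2}(v) \cap label_{P_2}(sibling_{P_2}(v))$,
(b) $Z \notin label_{P_1}(v)$,
(c) $Z \in label_{P_1}(sibling_{P_1}(v))$,
(d) for some leaf child $v_c$ of v, $Z \in label_{P_2}(v_c)$.

Lemma 7 For every vertex $v \in V$, if

(i) $label_{P_1}(v) \neq \emptyset$,
(ii) $label_{P_1}(sibling_{P_1}(v)) \neq \emptyset$,
(iii) $label_{P_1}(v) \cap label_{P_1}(sibling_{P_1}(v)) = \emptyset$,
(iv) $label_{P_2}(v) \cap label_{P_2}(sibling_{P_2}(v)) \neq \emptyset$,
(v) $E_{P_2} \neq E_{P_1}$,

then

(a) there exists an edge $(v, v_c)$ in $E_{P_2}$ but not in $E_{P_1}$, and
(b) there exists a label $Z \in label_{P_2}(sibling_{P_2}(v))$ such that
    (b1) $Z \in label_{P_2}(v) \cap label_{P_2}(sibling_{P_2}(v))$,
    (b2) $Z \in label_{P_1}(sibling_{P_1}(v))$,
    (b3) $Z \notin label_{P_1}(v)$,
    (b4) for some child $v_c$ of v, $Z \in label_{P_2}(v_c)$.

Lemma 8 For every vertex $v \in V$, if

(i) $label_{P_1}(v) \neq \emptyset$,
(ii) $label_{P_1}(sibling_{P_1}(v)) \neq \emptyset$,
(iii) $label_{P_1}(v) \cap label_{P_1}(sibling_{P_1}(v)) = \emptyset$,
(iv) $label_{P_2}(v) \cap label_{P_2}(sibling_{P_2}(v)) \neq \emptyset$,

then the following hold:

(a) $\varphi'_{P_2}(v) \geq \varphi'_{P_1}(v)$,
(b) there exists a child $v_c$ of v such that $\varphi'_{P_2}(v_c) \leq \varphi'_{P_1}(v_c)$,
(c) $(\varphi'_{P_1}(v_c) - \varphi'_{P_2}(v_c)) - (\varphi'_{P_2}(v) - \varphi'_{P_1}(v)) \geq 0$.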

Lemma 9 UBϕ(P2, V ) ≤ UBϕ(P1, V ).

Lemma 10 weight(P ) = UBϕ(P, V ).

Proof (Proposition 4) We want to show that, for any partial phylogeny $P_0$ of P, $UB_\varphi(P_0, V) \geq weight(P)$.

For any partial phylogeny $P_0$, there is a sequence $P_0, P_1, ..., P_k$ of partial phylogenies of P such that $P_0 \subseteq_l P_1 \subseteq_l ... \subseteq_l P_k = P$. Due to Lemma 9, for every i ($0 \leq i < k$), $UB_\varphi(P_i, V) \geq UB_\varphi(P_{i+1}, V)$. Then $UB_\varphi(P_0, V) \geq UB_\varphi(P, V)$ as well. By Lemma 10, $UB_\varphi(P_0, V) \geq weight(P)$. Therefore, $UB_\varphi$ is admissible.

Proof (Lemma 1) Take any vertex $v \in V$. Assume that $label_{P_1}(v) = \emptyset$ or v is not in $V_{P_1}$. Then by the definition of $\varphi'_{P_1}$, $\varphi'_{P_1}(v) = 1$. Since 1 is the maximum value of $\varphi'_W$ for every partial phylogeny W, $\varphi'_{P_1}(v) \geq \varphi'_{P_2}(v)$.

Proof (Lemma 2) Take any vertex $v \in V$. Assume that v does not have a sibling in $P_1$ or $label_{P_1}(sibling_{P_1}(v)) = \emptyset$. Then by the definition of $\varphi'_{P_1}$, $\varphi'_{P_1}(v) = 1$. Since 1 is the maximum value of $\varphi'_W$ for any W, $\varphi'_{P_1}(v) \geq \varphi'_{P_2}(v)$.

Proof (Lemma 3) Take any $v \in V$. Assume (i)–(iii). Due to (i) and (ii),

\[ \varphi'_{P_1}(v) = minC_{P_1}(v) = \min(maxContr_{P_1}(v), maxContr_{P_1}(sibling_{P_1}(v))). \]

Due to (iii), since v and $sibling_{P_1}(v)$ do not share a label in $P_1$,

\[ \forall l \in label_{P_1}(v),\ \varsigma_{P_1}(l, v) = 0 \]

and

\[ \forall l \in label_{P_1}(sibling_{P_1}(v)),\ \varsigma_{P_1}(l, sibling_{P_1}(v)) = 0. \]

That is, $maxContr_{P_1}(v) = 0$ and $maxContr_{P_1}(sibling_{P_1}(v)) = 0$.

Therefore, $\varphi'_{P_1}(v) = 0$.

Proof (Lemma 4) Take any $v \in V$. Assume that (i), (ii) and (iii) hold for v. Observe that the maximum value that $|label_{P_1}(v)|$ can take is the total number of classes, due to the definition of labels via propagation. Consider two cases:

Case 1: $|label_{P_1}(v)|$ = the total number of classes. Due to (i) and (ii) and the propagation of labels described in the definition of label,

\[ \varphi'_{P_2}(v) = minC_{P_2}(v) = \min(maxContr_{P_2}(v), maxContr_{P_2}(sibling_{P_2}(v))). \]

Due to the propagation of labels described in the definition of label, $|label_{P_2}(v)|$ = the total number of classes. Then, due to the definition of $\varsigma_{P_2}$, $\varphi'_{P_2}(v) = maxContr_{P_2}(v) = 0$. Since 0 is the minimum value of $\varphi'_W$ for any partial phylogeny W, $\varphi'_{P_1}(v) \geq \varphi'_{P_2}(v)$.

Case 2: $|label_{P_1}(v)|$ < the total number of classes. Due to (i) and (ii),

\[ \varphi'_{P_1}(v) = minC_{P_1}(v) = \min(maxContr_{P_1}(v), maxContr_{P_1}(sibling_{P_1}(v))) = \min\left(\frac{1}{|label_{P_1}(v)|}, \frac{1}{|label_{P_1}(sibling_{P_1}(v))|}\right). \]

Due to (i) and (ii) and the propagation of labels described in the definition of label,

\[ \varphi'_{P_2}(v) = minC_{P_2}(v) = \min(maxContr_{P_2}(v), maxContr_{P_2}(sibling_{P_2}(v))). \]

Due to $P_1 \subseteq_l P_2$, since $|label_{P_1}(v)| \leq |label_{P_2}(v)|$ and $|label_{P_1}(sibling_{P_1}(v))| \leq |label_{P_2}(sibling_{P_2}(v))|$, $\varphi'_{P_1}(v) \geq \varphi'_{P_2}(v)$.

Fig. 3 Case 1 of the proof of Lemma 8.

Proof (Lemma 5) Take any $v \in V$. Assume that (i), (ii), (iii) hold.

Since $label_{P_2}(v) \cap label_{P_2}(sibling_{P_2}(v)) = \emptyset$ and $P_1 \subseteq_l P_2$,

\[ label_{P_1}(v) \cap label_{P_1}(sibling_{P_1}(v)) = \emptyset. \]

Then, due to (i), (ii), and by Lemma 3, $\varsigma_{P_1}(l, v) = 0$.

Since $P_1 \subseteq_l P_2$, for every $v \in V$, $label_{P_1}(v) \subseteq label_{P_2}(v)$. Then, due to (i) and (ii), $label_{P_2}(v) \neq \emptyset$ and $label_{P_2}(sibling_{P_2}(v)) \neq \emptyset$. Then, due to (i) and by Lemma 3, $\varsigma_{P_2}(l, v) = 0$.

Therefore, $\varphi'_{P_1}(v) = 0$ and $\varphi'_{P_2}(v) = 0$.

Proof (Lemma 6) Take any $v \in V$. Assume that (i), (ii), (iii), (iv) and (v) hold for $v$. Due to (iv), (a) holds. Due to (iii) and $P_1 \subseteq_l P_2$, (b) holds. Due to $P_1 \subseteq_l P_2$, (c) holds. Due to (iv) and the propagation of labels described in the definition of label, (d) holds.

Proof (Lemma 7) Take any $v \in V$. Assume that (i), (ii), (iii), (iv) and (v) hold for $v$. Due to (v), (a) holds. Due to (iv), (b1) holds. Due to $P_1 \subseteq_l P_2$, (b2) holds. Due to $P_1 \subseteq_l P_2$, (iii) and (iv), (b3) holds. Due to (iv) and the propagation of labels described in the definition of label, (b4) holds.

Proof (Lemma 8) Take any $v \in V$. Assume that (i), (ii), (iii) and (iv) hold for $v$.

(a) $\varphi'_{P_2}(v) \geq \varphi'_{P_1}(v)$.
Due to Lemma 3, $\varphi'_{P_1}(v) = 0$. Since 0 is the minimum value of $\varphi'_Z$ for every partial phylogeny $Z$, $\varphi'_{P_2}(v) \geq \varphi'_{P_1}(v)$.

(b) There exists a child $v_c$ of $v$ such that $\varphi'_{P_2}(v_c) \leq \varphi'_{P_1}(v_c)$.
Consider two cases:

Case 1: [$E_{P_2} = E_{P_1}$] (Figure 3) Due to Lemma 6, there exists a label $Z$ with $Z \notin \mathit{label}_{P_1}(v)$ and $Z \in \mathit{label}_{P_2}(v)$, and there is a leaf-child $v_c$ of $v$ such that $Z \in \mathit{label}(v_c)$, due to the propagation of labels described in the definition of label. Since $Z \notin \mathit{label}_{P_1}(v)$, there is no child $v'_c$ of $v$ such that $Z \in \mathit{label}_{P_1}(v'_c)$; therefore, $Z \notin \mathit{label}_{P_1}(v_c)$. Since $v_c$ is a leaf, $\mathit{label}_{P_1}(v_c) = \emptyset$. Then by Lemma 1, $\varphi'_{P_1}(v_c) = 1$. Since 1 is the maximum value of $\varphi'_W$ for every partial phylogeny $W$, $\varphi'_{P_2}(v_c) \leq \varphi'_{P_1}(v_c)$.

Case 2: [$E_{P_2} \neq E_{P_1}$] (Figure 4) Due to Lemma 7, since edge $(v, v_c) \notin E_{P_1}$, $v_c \notin V_{P_1}$. Then by Lemma 1, $\varphi'_{P_1}(v_c) = 1$. Since 1 is the maximum value of $\varphi'_W$ for every partial phylogeny $W$, $\varphi'_{P_2}(v_c) \leq \varphi'_{P_1}(v_c)$.

Fig. 4 Case 2 of the proof of Lemma 8.

(c) $(\varphi'_{P_1}(v_c) - \varphi'_{P_2}(v_c)) - (\varphi'_{P_2}(v) - \varphi'_{P_1}(v)) \geq 0$.
Consider two cases:

Case 1: [$E_{P_2} = E_{P_1}$] (Figure 3) Since (i), (ii) and (iii) hold, by Lemma 3, $\varphi'_{P_1}(v) = 0$.

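Let us consider the case when $\varphi'_{P_2}(v) - \varphi'_{P_1}(v)$ is maximum. Since (i) and (ii) hold, $\mathit{label}_{P_2}(v) \neq \emptyset$, $\mathit{label}_{P_2}(\mathit{sibling}_{P_2}(v)) \neq \emptyset$ and

\[ \varphi'_{P_2}(v) = \mathit{minC}_{P_2}(v) = \min(\mathit{maxContr}_{P_2}(v), \mathit{maxContr}_{P_2}(\mathit{sibling}_{P_2}(v))). \]

Since (i), (iii) and (iv) hold, we know that one of $v$ or $\mathit{sibling}_{P_2}(v)$ has at least 2 labels in $P_2$ and the other one has at least 1 label in $P_1$. (Note that $Z \in \mathit{label}_{P_2}(v) \cap \mathit{label}_{P_2}(\mathit{sibling}_{P_2}(v))$.) Since by Lemma 6, $Z \notin \mathit{label}_{P_1}(v)$ and $Z \in \mathit{label}_{P_2}(v_c)$, $Z$ is also in $\mathit{label}_{P_2}(v)$; then we know that $v$ has at least 2 labels and $\mathit{sibling}_{P_2}(v)$ has at least 1 label in $P_2$. Therefore,

\[ \varphi'_{P_2}(v) = \min\left(\frac{1}{|\mathit{label}_{P_2}(v)|}, \frac{1}{|\mathit{label}_{P_2}(\mathit{sibling}_{P_2}(v))|}\right) = \min\left(\frac{1}{2}, 1\right) = \frac{1}{2}. \]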

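(Observe that if the number of labels of $v$ or $\mathit{sibling}(v)$ is greater than 2, $\varphi'_{P_2}(v)$ is smaller.) Since the maximum value of $\varphi'_{P_2}(v)$ is $\frac{1}{2}$ and the value of $\varphi'_{P_1}(v)$ is 0, $\varphi'_{P_2}(v) - \varphi'_{P_1}(v) \leq \frac{1}{2}$.

Let us consider the case when $\varphi'_{P_1}(v_c) - \varphi'_{P_2}(v_c)$ is minimum. Since $Z \in \mathit{label}_{P_2}(v)$, there should be a leaf $l \in V_{P_2}$ such that $Z \in \mathit{label}(l)$. Let $v_c = l$. Since $Z \notin \mathit{label}_{P_1}(v)$, there is no child $v_d$ of $v$ such that $Z \in \mathit{label}_{P_1}(v_d)$; therefore, $Z \notin \mathit{label}_{P_1}(v_c)$. Since $v_c$ is a leaf, either $v_c \notin V$ or $\mathit{label}_{P_1}(v_c) = \emptyset$. Then by Lemma 1, $\varphi'_{P_1}(v_c) = 1$. Since $Z \notin \mathit{label}_{P_1}(v)$, there exists $C \in \mathit{label}_{P_1}(v)$ with $C \in \mathit{label}_{P_1}(\mathit{sibling}(v_c))$ (because $C$ should be propagated from the child $\mathit{sibling}(v_c)$ to $v$). Since $Z \neq C$, and due to condition (iii), by Lemma 3, $\varphi'_{P_2}(v_c) = 0$. Since the value of $\varphi'_{P_1}(v_c)$ is 1 and the value of $\varphi'_{P_2}(v_c)$ is 0, $\varphi'_{P_1}(v_c) - \varphi'_{P_2}(v_c) = 1$.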

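Since $\varphi'_{P_2}(v) - \varphi'_{P_1}(v) \leq \frac{1}{2}$ and $\varphi'_{P_1}(v_c) - \varphi'_{P_2}(v_c) = 1$, $(\varphi'_{P_1}(v_c) - \varphi'_{P_2}(v_c)) - (\varphi'_{P_2}(v) - \varphi'_{P_1}(v)) > 0$.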

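Case 2: [$E_{P_2} \neq E_{P_1}$] (Figure 4) Since (i), (ii) and (iii) hold, by Lemma 3, $\varphi'_{P_1}(v) = 0$.

Let us consider the case when $\varphi'_{P_2}(v) - \varphi'_{P_1}(v)$ is maximum. Due to (i), (ii), and $P_1 \subseteq_l P_2$, $\mathit{label}_{P_2}(v) \neq \emptyset$, $\mathit{label}_{P_2}(\mathit{sibling}_{P_2}(v)) \neq \emptyset$ and

\[ \varphi'_{P_2}(v) = \mathit{minC}_{P_2}(v) = \min(\mathit{maxContr}_{P_2}(v), \mathit{maxContr}_{P_2}(\mathit{sibling}_{P_2}(v))). \]

Due to (i) and $P_1 \subseteq_l P_2$, $v$ has at least one label $A$ in $P_2$. Due to Lemma 7, $v$ has another label $Z$ in $P_2$. Due to Lemma 7 and $P_1 \subseteq_l P_2$, $Z$ is also a label of $\mathit{sibling}_{P_2}(v)$. We know that $v$ has at least 2 labels and $\mathit{sibling}_{P_2}(v)$ has at least 1 label in $P_2$. Therefore,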

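\[ \varphi'_{P_2}(v) = \min\left(\frac{1}{|\mathit{label}_{P_2}(v)|}, \frac{1}{|\mathit{label}_{P_2}(\mathit{sibling}_{P_2}(v))|}\right) = \min\left(\frac{1}{2}, 1\right) = \frac{1}{2}. \]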

Since the maximum value of $\varphi'_{P_2}(v)$ is $\frac{1}{2}$ and the value of $\varphi'_{P_1}(v)$ is 0, $\varphi'_{P_2}(v) - \varphi'_{P_1}(v) \leq \frac{1}{2}$.

Let us now consider the case when $\varphi'_{P_1}(v_c) - \varphi'_{P_2}(v_c)$ is minimum. Due to Lemma 7, since edge $(v, v_c) \notin E_{P_1}$, $v_c \notin V_{P_1}$. Then by Lemma 1, $\varphi'_{P_1}(v_c) = 1$. Due to (i), $v$ has at least one label $A$ in $P_1$. Due to Lemma 7, there exists a label $Z$ in $\mathit{label}_{P_2}(v)$ that is not in $\mathit{label}_{P_1}(v)$; thus $Z \neq A$. Due to Lemma 7, $v_c \notin V_{P_1}$. On the other hand, due to the definition of label, there exists a child $v_s$ of $v$ in $P_1$ such that $A \in \mathit{label}_{P_1}(v_s)$. Since $(v, v_c) \in P_2$, $v_s$ is the sibling of $v_c$ in $P_2$. So, $\mathit{sibling}_{P_1}(v_c)$ and $\mathit{sibling}_{P_2}(v_c)$ have at least 1 label, which is $A$. So far we know that $\mathit{label}_{P_2}(v_c)$ has at least one label $Z$, and $\mathit{label}_{P_2}(v_c)$ has at least one label $A$. The function $\varphi'_{P_2}$ gets the maximum value for $v_c$, for instance, under the following condition: $\mathit{label}_{P_2}(\mathit{sibling}_{P_2}(v_c)) = \{A\}$ and $\mathit{label}_{P_2}(v_c) = \{Z, A\}$. If $v_c$ and $\mathit{sibling}_{P_2}(v_c)$ have more than 2 labels, due to the definition of $\varsigma$, $\varphi'_{P_2}$ decreases. Then

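\[ \varphi'_{P_2}(v_c) = \mathit{minC}_{P_2}(v_c) = \min(\mathit{maxContr}_{P_2}(v_c), \mathit{maxContr}_{P_2}(\mathit{sibling}_{P_2}(v_c))) = \min\left(\frac{1}{|\mathit{label}_{P_2}(v_c)|}, \frac{1}{|\mathit{label}_{P_2}(\mathit{sibling}_{P_2}(v_c))|}\right) = \min\left(\frac{1}{2}, 1\right) = \frac{1}{2}. \]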

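Since the minimum value of $\varphi'_{P_1}(v_c)$ is 1 and the maximum value of $\varphi'_{P_2}(v_c)$ is $\frac{1}{2}$, $\varphi'_{P_1}(v_c) - \varphi'_{P_2}(v_c) \geq \frac{1}{2}$. Since $\varphi'_{P_2}(v) - \varphi'_{P_1}(v) \leq \frac{1}{2}$ and $\varphi'_{P_1}(v_c) - \varphi'_{P_2}(v_c) \geq \frac{1}{2}$, we can conclude that

\[ (\varphi'_{P_1}(v_c) - \varphi'_{P_2}(v_c)) - (\varphi'_{P_2}(v) - \varphi'_{P_1}(v)) \geq 0. \]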
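As a quick check of the arithmetic in part (c), the extremal values derived in the two cases can be substituted directly into the claimed inequality. The display below only restates the bounds obtained in the proof (the difference at $v_c$ is exactly 1 in Case 1 and at least 1/2 in Case 2, while the difference at $v$ is at most 1/2 in both); it introduces no new quantities.

```latex
\begin{align*}
\text{Case 1:}\quad
  & \bigl(\varphi'_{P_1}(v_c) - \varphi'_{P_2}(v_c)\bigr)
    - \bigl(\varphi'_{P_2}(v) - \varphi'_{P_1}(v)\bigr)
    \;\geq\; 1 - \tfrac{1}{2} \;=\; \tfrac{1}{2} \;>\; 0, \\
\text{Case 2:}\quad
  & \bigl(\varphi'_{P_1}(v_c) - \varphi'_{P_2}(v_c)\bigr)
    - \bigl(\varphi'_{P_2}(v) - \varphi'_{P_1}(v)\bigr)
    \;\geq\; \tfrac{1}{2} - \tfrac{1}{2} \;=\; 0.
\end{align*}
```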

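Proof (Lemma 9) Consider two cases:

Case 1: Assume that one of the following holds:
(i) $\mathit{label}_{P_1}(v) = \emptyset$,
(ii) $\mathit{label}_{P_1}(\mathit{sibling}_{P_1}(v)) = \emptyset$,
(iii) $\mathit{label}_{P_1}(v) \cap \mathit{label}_{P_1}(\mathit{sibling}_{P_1}(v)) \neq \emptyset$,
(iv) $\mathit{label}_{P_2}(v) \cap \mathit{label}_{P_2}(\mathit{sibling}_{P_2}(v)) = \emptyset$.
If (i), by Lemma 1, $\varphi'_{P_1}(v) \geq \varphi'_{P_2}(v)$. By definition of $\mathit{UB}_\varphi$, $\mathit{UB}_\varphi(P_2, V) \leq \mathit{UB}_\varphi(P_1, V)$.
If (ii), by Lemma 2, $\varphi'_{P_1}(v) \geq \varphi'_{P_2}(v)$. By definition of $\mathit{UB}_\varphi$, $\mathit{UB}_\varphi(P_2, V) \leq \mathit{UB}_\varphi(P_1, V)$.
If neither (i) nor (ii) holds, and (iii) holds, by Lemma 4, $\varphi'_{P_1}(v) \geq \varphi'_{P_2}(v)$. By definition of $\mathit{UB}_\varphi$, $\mathit{UB}_\varphi(P_2, V) \leq \mathit{UB}_\varphi(P_1, V)$.
If neither (i) nor (ii) holds, and (iv) holds, by Lemma 5, $\varphi'_{P_1}(v) \geq \varphi'_{P_2}(v)$. By definition of $\mathit{UB}_\varphi$, $\mathit{UB}_\varphi(P_2, V) \leq \mathit{UB}_\varphi(P_1, V)$.

Case 2: Assume that all of the following hold:
(i) $\mathit{label}_{P_1}(v) \neq \emptyset$,
(ii) $\mathit{label}_{P_1}(\mathit{sibling}_{P_1}(v)) \neq \emptyset$,
(iii) $\mathit{label}_{P_1}(v) \cap \mathit{label}_{P_1}(\mathit{sibling}_{P_1}(v)) = \emptyset$,
(iv) $\mathit{label}_{P_2}(v) \cap \mathit{label}_{P_2}(\mathit{sibling}_{P_2}(v)) \neq \emptyset$.
Although $\varphi'_{P_2}(v) \geq \varphi'_{P_1}(v)$ by Lemma 8(a), there is a child $v_c$ of $v$ such that $\varphi'_{P_2}(v_c) \leq \varphi'_{P_1}(v_c)$ by Lemma 8(b). Moreover, the difference $\varphi'_{P_1}(v_c) - \varphi'_{P_2}(v_c)$ is at least the difference $\varphi'_{P_2}(v) - \varphi'_{P_1}(v)$ by Lemma 8(c). Therefore, $\sum_{v \in V} \varphi'_{P_2}(v) \leq \sum_{v \in V} \varphi'_{P_1}(v)$; and $\mathit{UB}_\varphi(P_2, V) \leq \mathit{UB}_\varphi(P_1, V)$.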
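The last step of Case 2 can be made explicit by rearranging Lemma 8(c). The display below is a sketch of that rearrangement, assuming (as the proof appears to do implicitly) that each vertex $v$ satisfying the Case 2 conditions is paired with its own child $v_c$ in the sum: any gain in $\varphi'$ at $v$ when moving from $P_1$ to $P_2$ is offset by an equal or larger loss at $v_c$.

```latex
\begin{align*}
\varphi'_{P_1}(v_c) - \varphi'_{P_2}(v_c) \;\geq\; \varphi'_{P_2}(v) - \varphi'_{P_1}(v)
\quad\Longleftrightarrow\quad
\varphi'_{P_2}(v) + \varphi'_{P_2}(v_c) \;\leq\; \varphi'_{P_1}(v) + \varphi'_{P_1}(v_c).
\end{align*}
```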

Proof (Lemma 10) Since $P$ is a complete phylogeny, all of its vertices and the labels of each vertex are complete. Then, for every $v \in V$, by the definition of $\varphi'$, $\varphi'_P(v) = \mathit{minC}_P(v) = \varphi_P(v)$. Therefore, by the definitions of $\mathit{weight}$ and $\mathit{UB}_\varphi$, $\mathit{weight}(P) = \mathit{UB}_\varphi(P, V)$.
