
Separating regular languages by piecewise testable and unambiguous languages*

Thomas Place, Lorijn van Rooijen and Marc Zeitoun

LaBRI, Univ. Bordeaux & CNRS UMR 5800, 351 cours de la Libération, 33405 Talence Cedex

Abstract. Separation is a classical problem asking whether, given two sets belonging to some class, it is possible to separate them by a set from another class. We discuss the separation problem for regular languages. We give a Ptime algorithm to check whether two given regular languages are separable by a piecewise testable language, that is, whether a BΣ1(<) sentence can witness that the languages are disjoint. The proof refines an algebraic argument from Almeida and the third author. When separation is possible, we also express a separator by saturating one of the original languages by a suitable congruence. Following the same line, we show that one can as well decide whether two regular languages can be separated by an unambiguous language, albeit with a higher complexity.

1 Introduction

Separation is a classical notion in mathematics and computer science. In general, one says that two structures L1, L2 from a class C are separable by a structure L if L1 ⊆ L and L2 ∩ L = ∅. In this case, L is called a separator. In separation problems, the separator L is required to belong to a given class Sep. The problem asks whether two disjoint elements L1, L2 of C can always be separated by an element of the class Sep. In the case that disjoint elements of C cannot always be separated by an element of Sep, several natural questions arise:

(1) given elements L1, L2 in C, can we decide whether a separator exists in Sep?
(2) if so, what is the complexity of this decision problem?
(3) can we, in addition, compute a separator, and what is the complexity?

In this context, it is known, for example, that separation of two context-free languages by a regular one is undecidable [9].

Separating regular languages. This paper looks at separation problems for the class C of regular languages, and for classes Sep closed under complement. Under this last condition, a separation algorithm for Sep entails an algorithm for deciding membership in Sep, i.e., membership reduces to separability. Indeed, membership in Sep can be checked by testing whether the input language is Sep-separable from its complement.

* Supported by the Agence Nationale de la Recherche ANR 2010 BLAN 0202 01 FREC.


Conversely, while finding a decidable characterization for Sep already requires a deep understanding of the subclass, the search for separation algorithms is intrinsically more difficult. Indeed, powerful tools are available to decide membership in Sep: one normally makes use of a recognizing device of the input language, viz. its syntactic monoid. A famous result along these lines is Schützenberger’s Theorem [14], which states that a language is definable in first-order logic if and only if its syntactic monoid is aperiodic, a property one can easily decide.

Now for a separation algorithm, the question is whether the input languages are sufficiently different, from the point of view of the subclass Sep, to allow this to be witnessed by an element of Sep. Note that we cannot use standard methods on the recognizing devices, as was the case for the membership problem. We now have to decide whether there exists a recognition device of the given type that separates the input: we do not have it in hand, nor its syntactic monoid. An even harder question then is to actually construct the so-called separator in Sep.

Contributions. In this paper, we study this problem for two subclasses of the regular languages: piecewise testable languages and unambiguous languages.

Piecewise testable languages are languages that can be described by the presence or absence of scattered subwords up to a certain size within the words. Equivalently, these are the languages definable using BΣ1(<) formulas, i.e., first-order logic formulas that are boolean combinations of Σ1(<) formulas. A Σ1(<) formula is a first-order formula with a quantifier prefix ∃∗, followed by a quantifier-free formula. A well-known result about piecewise testable languages is Simon’s Theorem [16], which states that a regular language is piecewise testable if and only if its syntactic monoid is J-trivial. This property yields a decision procedure to check whether a language is piecewise testable, refined by Stern into a Ptime algorithm [18], whose complexity has been improved by Trahtman [21].

The second class that we consider is the class of unambiguous languages, i.e., languages defined by unambiguous products. This class has been given many equivalent characterizations [19]. For example, these are the FO2(<)-definable languages, that is, languages that can be defined in first-order logic using only two variables. Equivalently, this is the class ∆2(<) of languages that are definable by a first-order formula with a quantifier prefix ∃∗∀∗ and simultaneously by a first-order formula with a quantifier prefix ∀∗∃∗. Note that, consequently, all piecewise testable languages are FO2(<)-definable. It has been shown in [10] for ∆2(<), and in [20] for FO2(<), that these are exactly the languages whose syntactic monoid belongs to the decidable class DA.

There is a common difficulty in the separation problems for these two classes. A priori, it is not known up to which level one should proceed in refining the candidate separators to be able to answer the question of separability. For piecewise testable languages, this refinement basically means increasing the size of the considered subwords. For unambiguous languages, it means increasing the size of the unambiguous products. For both of these classes, we are able to compute, from the two input languages, a number that suffices for this purpose. This entails decidability of the separability problem for both classes.


In both cases, we obtain a better complexity bound to answer the decision problem starting from NFAs: we show that two languages are not separable if and only if the corresponding automata contain certain forbidden patterns of the same type. We prove that for piecewise testable languages this property can be decided in polynomial time wrt. the size of the automata and of the alphabet. For unambiguous languages this can be done in exponential space.

Related work. The classes of piecewise testable and unambiguous languages are varieties of regular languages. For such varieties, there is a generic connection found by Almeida [1] between profinite semigroup theory and the separation problem: Almeida has shown that two regular languages over A are separable by a language of a variety A∗V if and only if the topological closures of these two languages inside a profinite semigroup, depending only on A∗V, do not intersect. Note that this theory does not give any information about how to actually construct the separator, in case two languages are separable. To turn Almeida’s result into an algorithm deciding separability, we should compute representations of these topological closures, and test for emptiness of intersections of such closures.

So far, these problems have no generic answer and have been studied in an algebraic context for a small number of specific varieties. Deciding whether the closures of two regular languages intersect is equivalent to computing the so-called 2-pointlike sets of a finite semigroup wrt. the considered variety, see [1]. This question has been answered positively for the varieties of finite group languages [4,12], piecewise testable languages [3,2], star-free languages [8,7], and a few other varieties, but it was left open for unambiguous languages.

A general issue is that the topological closures may not be describable by a finite device. However, for piecewise testable languages, the approach of [3] builds an automaton over an extended alphabet, of exponential size wrt. the original alphabet, recognizing the closure of a regular language. The algorithm is polynomial wrt. the size of the original automaton (the construction was presented for deterministic automata but also works for nondeterministic ones). These automata admit the usual construction for intersection and can be checked for emptiness in Nlogspace. This yields an algorithm which, from two NFAs, decides separability by a piecewise testable language in time polynomial in the number of states of the NFAs, and exponential in the size of the original alphabet.

Our proof for separability by piecewise testable languages follows the same pattern as the method described above. A significant improvement is that we show that non-separability is witnessed by paths of the same shape in both automata, which yields an algorithm with better complexity: it runs in polynomial time in both the size of the automata and the size of the alphabet. Also, we do not make use of the theory of profinite semigroups: we work only with elementary concepts. We have described this algorithm in [13]. Furthermore, we show how to compute from the input languages an index that suffices to separate them. We use the same technique for unambiguous languages. Recently, Czerwinski et al. [6] also provided a Ptime algorithm for deciding separability by piecewise testable languages, but do not provide the computation of such an index.

Due to space constraints, some proofs only appear in the full version of this paper.


2 Preliminaries

We fix a finite alphabet A = {a1, . . . , am}. We denote by A∗ the free monoid over A. The empty word is denoted by ε. For a word u ∈ A∗, the smallest B ⊆ A such that u ∈ B∗ is called the alphabet of u and is denoted by alph(u).

Separability. Given languages L, L1, L2, we say that L separates L1 from L2 if

L1 ⊆ L and L2 ∩ L = ∅.

Given a class Sep of languages, we say that the pair (L1, L2) is Sep-separable if some language L ∈ Sep separates L1 from L2. Since all classes we consider are closed under complement, (L1, L2) is Sep-separable if and only if (L2, L1) is, in which case we simply say that L1 and L2 are Sep-separable.

We are interested in two classes Sep of separators: the class of piecewise testable languages, and the class of unambiguous languages.

Piecewise Testable Languages. We say that a word u is a piece of v, if

u = b1 · · · bk, where b1, . . . , bk ∈ A, and v ∈ A∗b1A∗ · · ·A∗bkA∗.

For instance, ab is a piece of bbaccba. The size of a piece is its number of letters. A language L ⊆ A∗ is piecewise testable if there exists κ ∈ N such that membership of w in L only depends on the pieces of size up to κ occurring in w. We write w ∼κ w′ when w and w′ have the same pieces of size up to κ. Clearly, ∼κ is a congruence of finite index. Therefore, a language is piecewise testable if and only if it is a union of ∼κ-classes for some κ ∈ N. In this case, the language is said to be of index κ. It is easy to see that a language is piecewise testable if and only if it is a finite boolean combination of languages of the form A∗b1A∗ · · · A∗bkA∗.

Piecewise testable languages are languages definable by BΣ1(<) formulas, that is, boolean combinations of first-order formulas of the form:

$\exists x_1 \cdots \exists x_n\; \varphi(x_1, \ldots, x_n),$

where ϕ is quantifier-free. For instance, A∗b1A∗ · · · A∗bkA∗ is defined by the formula $\exists x_1 \cdots \exists x_k \big[\bigwedge_{i<k} (x_i < x_{i+1}) \wedge \bigwedge_{i \leq k} b_i(x_i)\big]$, where the first-order variables x1, . . . , xk are interpreted as positions, and where b(x) is the predicate testing that position x carries letter b.

We denote by PT[κ] the class of all piecewise testable languages of index κ or less, and by $\mathrm{PT} = \bigcup_\kappa \mathrm{PT}[\kappa]$ the class of all piecewise testable languages. Given L ⊆ A∗ and κ ∈ N, the smallest PT[κ]-language containing L is

$[L]_{\sim_\kappa} = \{ w \in A^* \mid \exists u \in L \text{ and } u \sim_\kappa w \}.$

In general however, there is no smallest PT-language containing a given language.
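The relation ∼κ and the saturation [L]∼κ can be explored on small examples. The following Python sketch is only an illustration, not taken from the paper: the function names are hypothetical, the enumeration of pieces takes time exponential in κ, and L is assumed to be given as a finite list of sample words rather than by an automaton.

```python
def is_piece(u, v):
    """u = b1...bk is a piece of v iff v is in A*b1A*...A*bkA* (greedy subsequence test)."""
    it = iter(v)
    return all(letter in it for letter in u)  # `in` consumes the iterator up to the match


def pieces(w, k):
    """All pieces of w of size at most k, the empty word included."""
    found = {""}
    for letter in w:
        # Extend every piece found so far by the current letter, respecting the size bound.
        found |= {u + letter for u in found if len(u) < k}
    return found


def sim(w1, w2, k):
    """w1 ~k w2: both words have the same pieces of size up to k."""
    return pieces(w1, k) == pieces(w2, k)


def in_saturation(w, sample, k):
    """Membership of w in [L]~k, assuming L is the finite list `sample`."""
    return any(sim(u, w, k) for u in sample)


print(is_piece("ab", "bbaccba"))               # True
print(sim("ab", "ba", 1), sim("ab", "ba", 2))  # True False
```

The example mirrors the text: ab is a piece of bbaccba, and the words ab and ba are ∼1-equivalent but are distinguished by pieces of size 2.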

Unambiguous Languages. A product $L = B_0^* a_1 B_1^* \cdots B_{k-1}^* a_k B_k^*$ is called unambiguous if every word of L admits exactly one factorization witnessing its membership in L. The number k is called the size of the product. An unambiguous language is a finite disjoint union of unambiguous products. Observe that unambiguous languages are connected to piecewise testable languages. Indeed, it was proved in [15] that the class of unambiguous languages is closed under boolean operations. Moreover, languages of the form A∗b1A∗ · · · A∗bkA∗ are unambiguous, witnessed by the product $(A \setminus \{b_1\})^* b_1 (A \setminus \{b_2\})^* b_2 \cdots (A \setminus \{b_k\})^* b_k A^*$. Therefore, piecewise testable languages form a subclass of the unambiguous languages.
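To make the notion of an unambiguous product concrete, the Python sketch below (an illustration with hypothetical names, not part of the paper) counts by dynamic programming the factorizations of a given word w witnessing membership in B0∗a1B1∗ · · · akBk∗. A product is unambiguous exactly when this count is at most 1 for every word; the sketch can only check individual words, not the whole product.

```python
def count_factorizations(w, letters, blocks):
    """Count factorizations w = x0 a1 x1 ... ak xk with xi in blocks[i]* and ai = letters[i-1]."""
    k, n = len(letters), len(w)
    assert len(blocks) == k + 1, "a product of size k has k + 1 blocks"

    def in_star(u, B):
        return all(c in B for c in u)

    # f[i][j] = number of ways to factor the prefix w[:j] as B0* a1 B1* ... ai Bi*.
    f = [[0] * (n + 1) for _ in range(k + 1)]
    for j in range(n + 1):
        f[0][j] = 1 if in_star(w[:j], blocks[0]) else 0
    for i in range(1, k + 1):
        for j in range(n + 1):
            # Position m carries the marker letter a_i; w[m+1:j] must lie in B_i*.
            f[i][j] = sum(
                f[i - 1][m]
                for m in range(j)
                if w[m] == letters[i - 1] and in_star(w[m + 1 : j], blocks[i])
            )
    return f[k][n]


A = {"a", "b"}
# A*aA*bA* written as the product (A\{a})* a (A\{b})* b A*: "abab" has one factorization.
print(count_factorizations("abab", "ab", [{"b"}, {"a"}, A]))  # 1
# The product A* a A* is ambiguous: "aa" has two factorizations.
print(count_factorizations("aa", "a", [A, A]))  # 2
```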

Many equivalent characterizations for unambiguous languages have been found [19]. From a logical point of view, unambiguous languages are exactly the languages definable by an FO2(<) formula [20]. Here, FO2(<) denotes the two-variable restriction of first-order logic. Another logical characterization, which further illustrates the link with piecewise testable languages (i.e., BΣ1(<)-definable languages), is ∆2(<). A Σ2(<) formula is a first-order formula of the form:

$\exists x_1 \cdots \exists x_n\, \forall y_1 \cdots \forall y_m\; \varphi(x_1, \ldots, x_n, y_1, \ldots, y_m),$

where ϕ is quantifier-free. A language is ∆2(<)-definable if it can be defined both by a Σ2(<) formula and by the negation of a Σ2(<) formula. It has been proven in [10] that a language is unambiguous if and only if it is ∆2(<)-definable.

For two words w, w′, we write w ≅κ w′ if w and w′ belong to the same unambiguous products of size κ or less. We denote by UL[κ] the class of all languages that are unions of ≅κ-classes, and we let $\mathrm{UL} = \bigcup_\kappa \mathrm{UL}[\kappa]$. Since unambiguous languages are closed under boolean operations, UL is the class of all unambiguous languages. Given L ⊆ A∗ and κ ∈ N, the smallest UL[κ]-language containing L is

$[L]_{\cong_\kappa} = \{ w \in A^* \mid \exists u \in L \text{ and } u \cong_\kappa w \}.$

Again, in general there is no smallest UL-language containing a given language.

Automata. A nondeterministic finite automaton (NFA) over A is denoted by a tuple A = (Q, A, I, F, δ), where Q is the set of states, I ⊆ Q the set of initial states, F ⊆ Q the set of final states and δ ⊆ Q × A × Q the transition relation. The size of an automaton is its number of states plus its number of transitions. We denote by L(A) the language of words accepted by A. Given a word u ∈ A∗, a subset B of A and two states p, q of A, we denote

− by $p \xrightarrow{u} q$ a path from state p to state q labeled u,
− by $p \xrightarrow{\subseteq B} q$ a path from p to q of which all transitions are labeled over B,
− by $p \xrightarrow{=B} q$ a path from p to q of which all transitions are labeled over B, with the additional demand that every letter of B occurs at least once along it.

Given a state p, we denote by scc(p, A) the strongly connected component of p in A (that is, the set of states that are reachable from p and from which p can be reached), and by alph scc(p, A) the set of labels of all transitions occurring in this strongly connected component. Finally, we define the restriction of A to a subalphabet B ⊆ A by $\mathcal{A}{\restriction}_B \stackrel{\text{def}}{=} (Q, B, I, F, \delta \cap (Q \times B \times Q))$.
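The notions scc(p, A), alph scc(p, A) and the restriction to a subalphabet can be sketched in Python as follows. This is an illustrative encoding of our own (hypothetical names): a transition relation is a set of triples (p, a, q), and the component of a single state p is obtained by intersecting forward and backward reachability, which is simpler than, though not as efficient as, running Tarjan’s algorithm on the whole automaton.

```python
from collections import defaultdict


def reach(delta, start, forward=True):
    """States reachable from start (forward=True) or from which start is reachable (forward=False)."""
    succ = defaultdict(set)
    for p, _, q in delta:
        if forward:
            succ[p].add(q)
        else:
            succ[q].add(p)
    seen, stack = {start}, [start]
    while stack:
        for q in succ[stack.pop()]:
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def scc(delta, p):
    """scc(p, A): states reachable from p and from which p can be reached."""
    return reach(delta, p) & reach(delta, p, forward=False)


def alph_scc(delta, p):
    """Labels of the transitions whose source and target both lie in scc(p, A)."""
    comp = scc(delta, p)
    return {a for (s, a, t) in delta if s in comp and t in comp}


def restrict(delta, B):
    """Transition relation of the restriction of A to the subalphabet B."""
    return {(s, a, t) for (s, a, t) in delta if a in B}


delta = {(0, "a", 1), (1, "b", 0), (1, "c", 2), (2, "a", 2)}
print(sorted(scc(delta, 0)), sorted(alph_scc(delta, 0)))  # [0, 1] ['a', 'b']
print(alph_scc(restrict(delta, {"a", "c"}), 0))           # set()
```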

3 Separation by piecewise testable languages

Since PT[κ] ⊂ PT, PT[κ]-separability implies PT-separability. Furthermore, for a fixed κ, it is obviously decidable whether two languages L1 and L2 are PT[κ]-separable: there is a finite number of PT[κ] languages over A, and for each of them, one can test whether it separates L1 and L2. The difficulty for deciding whether L1 and L2 are PT-separable is to effectively compute a witness κ = κ(L1, L2), i.e., such that L1 and L2 are PT-separable if and only if they are PT[κ]-separable. Actually, we show that PT-separability is decidable by different arguments:

(1.a) We give a necessary and sufficient condition on NFAs recognizing L1 and L2, in terms of forbidden patterns, to test whether L1 and L2 are PT-separable.
(1.b) We give a polynomial time algorithm to check this condition.
(2) We compute κ ∈ N from L1, L2 such that PT-separability and PT[κ]-separability are equivalent for L1 and L2. Hence, if the Ptime algorithm answers that L1 and L2 are PT-separable, then [L1]∼κ is a valid PT-separator.

Let us first introduce some terminology to explain the necessary and sufficient condition on NFAs. Let A be an NFA over A. For u0, . . . , up ∈ A∗ and nonempty subalphabets B1, . . . , Bp ⊆ A, let u = (u0, . . . , up) and B = (B1, . . . , Bp). A (u, B)-path in A is a successful path (leading from an initial state to a final state of A) of the form shown in Fig. 1.

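$\xrightarrow{u_0} \cdot \xrightarrow{\subseteq B_1} q_1 \xrightarrow{=B_1} q_1 \xrightarrow{\subseteq B_1} \cdot \xrightarrow{u_1} \cdots \xrightarrow{u_{p-1}} \cdot \xrightarrow{\subseteq B_p} q_p \xrightarrow{=B_p} q_p \xrightarrow{\subseteq B_p} \cdot \xrightarrow{u_p}$

Fig. 1. A (u, B)-path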

Recall that edges denote sequences of transitions (see the paragraph Automata in Section 2). Therefore, if A has a (u, B)-path, then L(A) contains a language of the form $u_0 (x_1 y_1^* z_1) u_1 \cdots u_{p-1} (x_p y_p^* z_p) u_p$, where $\mathrm{alph}(x_i) \cup \mathrm{alph}(z_i) \subseteq \mathrm{alph}(y_i) = B_i$.

Given NFAs A1 and A2, a pair (u, B) is a witness of non PT-separability for (A1, A2) if there is a (u, B)-path in both A1 and A2. For instance, in Fig. 2, u = (ε, c, ε) and B = ({a, b}, {a}) define such a witness of non PT-separability.

[Figure: NFAs A1 and A2]

Fig. 2. A witness of non PT-separability for (A1, A2): u = (ε, c, ε), B = ({a, b}, {a})

We are now ready to state our main result regarding PT-separability.

Theorem 1. Let A1 and A2 be two NFAs over A. Let L1 = L(A1) and L2 = L(A2). Let k1, k2 be the numbers of states of A1 and A2, respectively. Define p = max(k1, k2) + 1 and $\kappa = p|A| \cdot 2^{2^{|A|}|A|(p|A|+1)}$. Then the following conditions are equivalent:


(1) L1 and L2 are PT-separable.
(2) L1 and L2 are PT[κ]-separable.
(3) The language [L1]∼κ separates L1 from L2.
(4) There is no witness of non PT-separability in (A1, A2).
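To give a feeling for the size of the index in Theorem 1, the small Python helper below (a hypothetical name, not from the paper) simply evaluates the bound, reading it as κ = p|A| · 2^(2^|A| · |A| · (p|A| + 1)) with p = max(k1, k2) + 1; the inner power of two is the quantity N_A used for Lemma 5.

```python
def pt_bound(k1, k2, alphabet_size):
    """kappa of Theorem 1 for NFAs with k1 and k2 states over an alphabet of the given size."""
    p = max(k1, k2) + 1
    n_a = 2 ** (2 ** alphabet_size * alphabet_size * (p * alphabet_size + 1))  # N_A of Lemma 5
    return p * alphabet_size * n_a


print(pt_bound(1, 1, 1))  # 128
print(pt_bound(2, 1, 1))  # 768
```

Even for a one-letter alphabet and one-state automata the bound is already 128, and it grows doubly exponentially in |A|, so the brute-force test suggested by Condition (2) is far from practical; this is what makes the polynomial-time test of Condition (4) worthwhile.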

Condition (2) yields an algorithm to test PT-separability of regular languages. Indeed, one can effectively compute all piecewise testable languages of index κ (of which there are finitely many), and for each of them, one can test whether it separates L1 and L2. Before proving Theorem 1, we show that Condition (4) can be tested in polynomial time (and hence, PT-separability is Ptime decidable).

Proposition 2. Given two NFAs A1 and A2, one can determine whether there exists a witness of non PT-separability in (A1, A2) in polynomial time wrt. the sizes of A1 and A2, and the size of the alphabet.

Proof. Let us first show that the following problem is in Ptime: given states p1, q1, r1 of A1 and p2, q2, r2 of A2, determine whether there exist a nonempty B ⊆ A and paths $p_i \xrightarrow{\subseteq B} q_i \xrightarrow{=B} q_i \xrightarrow{\subseteq B} r_i$ in Ai for both i = 1, 2.

To do so, we compute a decreasing sequence (Ci)i of alphabets over-approximating the greatest alphabet B that can be chosen for labeling the loops around q1 and q2. Note that if there exists such an alphabet B, it should be contained in

$C_1 \stackrel{\text{def}}{=} \mathrm{alph\,scc}(q_1, \mathcal{A}_1) \cap \mathrm{alph\,scc}(q_2, \mathcal{A}_2).$

Using Tarjan’s algorithm to compute strongly connected components in linear time, one can compute C1 in linear time as well. Then, we restrict the automata to alphabet C1, and we repeat the process to obtain the sequence (Ci)i:

$C_{i+1} \stackrel{\text{def}}{=} \mathrm{alph\,scc}(q_1, \mathcal{A}_1{\restriction}_{C_i}) \cap \mathrm{alph\,scc}(q_2, \mathcal{A}_2{\restriction}_{C_i}).$

After a finite number n of iterations, we obtain Cn = Cn+1. Note that n ≤ |alph(A1) ∩ alph(A2)| ≤ |A|. If Cn = ∅, then there exists no nonempty B for which there is an (=B)-loop around both q1 and q2. If Cn ≠ ∅, then it is the maximal nonempty alphabet B such that there are (=B)-loops around q1 in A1 and q2 in A2. It then remains to determine whether, for B = Cn, there exist paths $p_1 \xrightarrow{\subseteq B} q_1 \xrightarrow{\subseteq B} r_1$ and $p_2 \xrightarrow{\subseteq B} q_2 \xrightarrow{\subseteq B} r_2$, which can be performed in linear time. To sum up, since the number n of iterations until Cn = Cn+1 is bounded by |A|, and since each computation is linear wrt. the size of A1 and A2, one can decide in Ptime with respect to both |A| and these sizes whether such a pair of paths occurs.

Now we build from A1 and A2 two new automata Â1 and Â2 as follows. The procedure first initializes Âi as a copy of Ai. Denote by Qi the state set of Ai. For each 4-tuple τ = (p1, r1, p2, r2) ∈ Q1² × Q2² such that there exist B ≠ ∅, two states q1 ∈ Q1, q2 ∈ Q2 and paths $p_i \xrightarrow{\subseteq B} q_i \xrightarrow{=B} q_i \xrightarrow{\subseteq B} r_i$ in Ai both for i = 1 and i = 2, we add in both Â1 and Â2 a new letter aτ to the alphabet, and “summary” transitions $p_1 \xrightarrow{a_\tau} r_1$ and $p_2 \xrightarrow{a_\tau} r_2$. Since there is a polynomial number of tuples (p1, q1, r1, p2, q2, r2), the above shows that computing these new transitions can be performed in Ptime. So, computing Â1 and Â2 can be done in Ptime.

By construction, there exists some pair (u, B) such that A1 and A2 both have a (u, B)-path if and only if L(Â1) ∩ L(Â2) ≠ ∅. Since both Â1 and Â2 can be built in Ptime, this can be decided in polynomial time as well. □

The following is an immediate consequence of Theorem 1 and Proposition 2.

Corollary 3. Given two NFAs, one can determine in polynomial time, with respect to the number of states and the size of the alphabet, whether the languages recognized by these NFAs are PT-separable. □

In the rest of the section, we sketch the proof of Theorem 1. The implications (3) ⇐⇒ (2) =⇒ (1) are obvious. To show (1) =⇒ (2), we introduce some terminology. Let us fix an arbitrary order a1 < · · · < am on A.

(p, B)-patterns. Let B = {b1, . . . , br} ⊆ A with b1 < · · · < br, and let p ∈ N. We say that a word w ∈ A∗ is a (p, B)-pattern if $w \in (B^* b_1 B^* \cdots B^* b_r B^*)^p$. The number p is called the power of w. For example, set B = {a, b, c} with a < b < c. The word bbaababccacbabaca is a (2, B)-pattern but not a (3, B)-pattern.
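Since B∗B∗ = B∗, one can check that for p ≥ 1 the language (B∗b1B∗ · · · B∗brB∗)^p is exactly the set of words over B having (b1 · · · br)^p as a piece, while for p = 0 it contains only the empty word. The Python sketch below (hypothetical name) uses this observation and a greedy subsequence test; it assumes, for illustration only, that letters are single characters ordered alphabetically.

```python
def is_pattern(w, B, p):
    """Is w a (p, B)-pattern? The order b1 < ... < br is assumed to be alphabetical."""
    if p == 0:
        return w == ""  # (...)^0 contains only the empty word
    if any(c not in B for c in w):
        return False  # a (p, B)-pattern must be a word over B
    target = "".join(sorted(B)) * p  # the piece (b1 ... br)^p
    it = iter(w)
    return all(c in it for c in target)


B = {"a", "b", "c"}
print(is_pattern("bbaababccacbabaca", B, 2))  # True
print(is_pattern("bbaababccacbabaca", B, 3))  # False
```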

ℓ-templates. An ℓ-template is a sequence T = t1, . . . , tℓ of length ℓ such that every ti is either a letter or a nonempty subset of the alphabet A. The main idea behind ℓ-templates is that they yield decompositions of words that can be detected using pieces and provide a suitable decomposition for pumping. Unfortunately, not all ℓ-templates are actually detectable. Because of this, we restrict ourselves to a special case of ℓ-templates. An ℓ-template is said to be unambiguous if all pairs ti, ti+1 are either two letters, two incomparable sets, or a set and a letter that is not included in the set. For example, T = a, {b, c}, d, {a} is unambiguous, while T′ = b, {b, c}, d, {a} and T′′ = a, {b, c}, {c}, {a} are not.
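The unambiguity condition on ℓ-templates is purely local and easy to check mechanically. In the Python sketch below (hypothetical names, our own encoding), a template is a list whose entries are either single-character strings (letters) or frozensets (subalphabets); the examples are T, T′ and T′′ from the text.

```python
def is_unambiguous(template):
    """Check the condition on every consecutive pair t_i, t_{i+1} of the template."""
    for s, t in zip(template, template[1:]):
        s_set, t_set = isinstance(s, frozenset), isinstance(t, frozenset)
        if not s_set and not t_set:
            continue  # two letters: always allowed
        if s_set and t_set:
            if s <= t or t <= s:
                return False  # two comparable sets
        else:
            letter, block = (t, s) if s_set else (s, t)
            if letter in block:
                return False  # a letter included in the neighbouring set
    return True


T = ["a", frozenset("bc"), "d", frozenset("a")]
T1 = ["b", frozenset("bc"), "d", frozenset("a")]
T2 = ["a", frozenset("bc"), frozenset("c"), frozenset("a")]
print(is_unambiguous(T), is_unambiguous(T1), is_unambiguous(T2))  # True False False
```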

p-implementations. A word w ∈ A∗ is a p-implementation of an ℓ-template T = t1, . . . , tℓ if w = w1 · · · wℓ and, for all i, either ti = wi ∈ A, or ti = B ⊆ A, wi ∈ B∗ and wi is a (p, B)-pattern. For example, abccbbcbdaaaa = a.(bccbbcb).d.(aaaa) is a 2-implementation of the 4-template T = a, {b, c}, d, {a}, since bccbbcb is a (2, {b, c})-pattern and aaaa is a (2, {a})-pattern.
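Whether a word is a p-implementation of a given template can be decided by a simple dynamic program over the positions of the word, trying every possible factor for each set entry of the template. The Python sketch below is an illustration under our own encoding (letters as single-character strings, sets as frozensets, alphabetical order on letters); the function names are hypothetical. The pattern test uses the fact that, for p ≥ 1, a (p, B)-pattern is a word over B having (b1 · · · br)^p as a piece.

```python
def is_pattern(w, B, p):
    """(p, B)-pattern test: w is over B and has (b1 ... br)^p as a piece (alphabetical order)."""
    if p == 0:
        return w == ""
    if any(c not in B for c in w):
        return False
    it = iter(w)
    return all(c in it for c in "".join(sorted(B)) * p)


def is_implementation(w, template, p):
    """Is w a p-implementation of template (letters are str, sets are frozenset)?"""
    n = len(w)
    ok = [j == 0 for j in range(n + 1)]  # ok[j]: w[:j] implements the template prefix read so far
    for t in template:
        nxt = [False] * (n + 1)
        for j in range(n + 1):
            if not ok[j]:
                continue
            if isinstance(t, frozenset):
                for m in range(j, n + 1):
                    if is_pattern(w[j:m], t, p):
                        nxt[m] = True
            elif j < n and w[j] == t:
                nxt[j + 1] = True
        ok = nxt
    return ok[n]


T = ["a", frozenset("bc"), "d", frozenset("a")]
print(is_implementation("abccbbcbdaaaa", T, 2))  # True
print(is_implementation("abccbbcbdaaaa", T, 3))  # False
```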

We now prove (1) =⇒ (2) by contraposition: we show that if w1 ∈ L1, w2 ∈ L2 are such that w1 ∼κ w2, then for any h, one can build v1 ∈ L1 and v2 ∈ L2 such that v1 ∼h v2. Therefore, non-PT[κ]-separability entails non-PT-separability.

Lemma 4. From regular languages L1, L2, we can compute p ∈ N such that whenever L1 and L2 both contain p-implementations of the same ℓ-template T, then L1 and L2 are not PT-separable.

Proof. Let p be greater than the number of states of NFAs recognizing L1, L2. Let w1, w2 be p-implementations of an ℓ-template T = t1, . . . , tℓ. Fix h ∈ N. Whenever ti is a set B, the corresponding factors in w1, w2 are (p, B)-patterns. By choice of p, these factors can be pumped into (h, B)-patterns in v1 ∈ L1 and v2 ∈ L2, respectively. It is then easy to check that v1 ∼h v2. Hence, L1 and L2 are not PT[h]-separable. Since h is arbitrary, L1, L2 are not PT-separable. □

It remains to prove that if two words have the same pieces of size up to a large enough κ, they are both p-implementations of a common unambiguous ℓ-template, where p is the number introduced in Lemma 4. We split the proof in two parts. We begin by proving that it is enough to look for ℓ-templates for a bounded ℓ.

Lemma 5. Let p ∈ N. Every word is the p-implementation of some unambiguous $N_A$-template, for $N_A = 2^{2^{|A|}|A|(p|A|+1)}$.

Proof. We first get rid of the unambiguity condition. Any ambiguous ℓ-template T can be reduced to an unambiguous ℓ′-template T′ with ℓ′ < ℓ by merging the ambiguities. It is then straightforward to reduce any p-implementation of T into a p-implementation of T′. Therefore, it suffices to prove that every word is the p-implementation of some (possibly ambiguous) $N_A$-template.

The choice of $N_A$ comes from Erdős–Szekeres’ upper bound on Ramsey numbers. Indeed, in a complete graph with edges labeled over $c = 2^{|A|}$ colors, there exists a complete monochromatic subgraph of size m = p|A| + 1 provided the graph has at least $c^{mc}$ vertices (see [5] for a short proof that this bound suffices); note that $c^{mc} = 2^{2^{|A|}|A|(p|A|+1)} = N_A$.

Observe that a word is always the p-implementation of the ℓ-template which is just the sequence of its letters. Therefore, in order to complete our proof, it suffices to prove that if a word is the p-implementation of some ℓ-template T with ℓ > $N_A$, then it is also the p-implementation of an ℓ′-template with ℓ′ < ℓ.

Fix a word w, and assume that w is the p-implementation of some ℓ-template T = t1, . . . , tℓ with ℓ > $N_A$. By definition, we get a decomposition w = w1 · · · wℓ. We construct a complete graph Γ with vertices {0, . . . , ℓ} and edges labeled by subsets of A. For all i < j, we set alph(w_{i+1} · · · w_j) as the label of the edge (i, j). Since Γ has ℓ + 1 > $N_A$ vertices, by definition of $N_A$ there exists a complete monochromatic subgraph with p|A| + 1 vertices i1 < · · · < i_{p|A|+1}. Let B be the color of the edges of this monochromatic subgraph. Let $w' = w_{i_1+1} \cdots w_{i_{p|A|+1}}$. By construction, w′ is the concatenation of p|A| ≥ p words with alphabet exactly B. Since |B| ≤ |A|, each group of |A| consecutive such words contains b1 · · · br as a piece, hence w′ is a (p, B)-pattern. It follows that w is a p-implementation of the ℓ′-template $t_1, \ldots, t_{i_1}, B, t_{i_{p|A|+1}+1}, \ldots, t_\ell$ with ℓ′ ≤ ℓ − p|A| + 1. Hence ℓ′ < ℓ (except for the trivial case p = |A| = 1). □

The next lemma proves that once ℓ and p are fixed, given w it is possible to describe by pieces the ℓ-templates that w p-implements, as long as they are unambiguous.

Lemma 6. Let ℓ, p ∈ N. From p and ℓ, we can compute κ such that for every pair of words w ∼κ w′ and every unambiguous ℓ-template T, w′ is a p-implementation of T whenever w is a (p + 1)-implementation of T. □

We finish the proof of the implication (1) =⇒ (2) by assembling the results. Let p be greater than the number of states of NFAs recognizing L1 and L2, as introduced in the proof of Lemma 4. Let $N_A$ be as introduced in Lemma 5 for p + 1, and let κ = |A|(p + 1)$N_A$ be as introduced in Lemma 6. Fix h > κ and assume that we have w1 ∈ L1 and w2 ∈ L2 such that w1 ∼κ w2. By Lemma 5, w1 is the (p + 1)-implementation of some unambiguous $N_A$-template T. Moreover, it follows from Lemma 6 that w2 is a p-implementation of T. By Lemma 4, we finally obtain that L1 and L2 are not PT-separable.

The implication (1) =⇒ (4) of Theorem 1 is easy to show by contraposition, see [13, Lemma 2]. The remaining implication (4) =⇒ (1) can be shown using Lemma 6. For a direct proof, see [13, Lemma 3], where the key for getting a forbidden pattern out of two non-separable languages is to extract a suitable p-implementation using Simon’s Factorization Forest Theorem [17].

4 Separation by unambiguous languages

This section is devoted to proving that UL-separability is a decidable property. Again, the result is twofold. Using an argument that is analogous to property (2) of Theorem 1 in Section 3, we prove that, given L1, L2, it is possible to compute a number κ such that L1, L2 are UL-separable if and only if they are UL[κ]-separable. It is then possible to test separability by using a brute-force approach that tests all languages in UL[κ].

The second part of our theorem is an algorithm providing only a ‘yes/no’ answer, but running in exponential space. This algorithm is more complicated than the one of Section 3. In this case, we cannot search for a witness of non-separability directly on the NFAs of the languages. A precomputation is needed. We present the algorithm before stating our main theorem.

UL-intersection. Let A1 = (Q1, A, I1, F1, δ1), A2 = (Q2, A, I2, F2, δ2) be NFAs. The purpose of our precomputation is to associate to all 4-tuples (q1, r1, q2, r2) ∈ Q1² × Q2² a set α(q1, r1, q2, r2) of subalphabets. Intuitively, B ∈ α(q1, r1, q2, r2) if, for all κ ∈ N, there are two words w1, w2 such that

(1) B = alph(w1) = alph(w2),
(2) $q_1 \xrightarrow{w_1} r_1$ and $q_2 \xrightarrow{w_2} r_2$,
(3) w1 ≅κ w2.

The precomputation of $\alpha : Q_1^2 \times Q_2^2 \to 2^{2^A}$ is performed via a fixpoint algorithm.

For all (q1, r1, q2, r2) ∈ Q1² × Q2², we initially set $\alpha(q_1, r_1, q_2, r_2) = \{\{a\} \mid q_1 \xrightarrow{a} r_1 \text{ and } q_2 \xrightarrow{a} r_2\}$. The sets are then saturated with the following two operations:

(1) When B ∈ α(p1, q1, p2, q2) and C ∈ α(q1, r1, q2, r2), add B ∪ C to α(p1, r1, p2, r2).
(2) When B ∈ α(q1, q1, q2, q2) ∩ α(r1, r1, r2, r2) and there exist words w1, w2 ∈ B∗ such that $q_1 \xrightarrow{w_1} r_1$ and $q_2 \xrightarrow{w_2} r_2$, add B to α(q1, r1, q2, r2).
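The saturation of α can be prototyped directly. The Python sketch below is a naive fixpoint loop under our own encoding assumptions (an NFA is a tuple (Q, I, F, delta), delta a set of triples, subalphabets as frozensets; all names hypothetical). It makes no attempt at space efficiency and is only meant to illustrate the initialisation, the two saturation operations, and the final emptiness test on initial and final states.

```python
from itertools import product


def reach_within(delta, start, B):
    """States reachable from start by a (possibly empty) word over the subalphabet B."""
    seen, stack = {start}, [start]
    while stack:
        p = stack.pop()
        for s, a, t in delta:
            if s == p and a in B and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def ul_alpha(Q1, d1, Q2, d2):
    """Naive saturation of alpha(q1, r1, q2, r2) by operations (1) and (2)."""
    quads = list(product(Q1, Q1, Q2, Q2))
    alpha = {t: set() for t in quads}
    # Initialisation: {a} whenever q1 -a-> r1 in A1 and q2 -a-> r2 in A2.
    for (q1, a, r1), (q2, b, r2) in product(d1, d2):
        if a == b:
            alpha[(q1, r1, q2, r2)].add(frozenset({a}))
    changed = True
    while changed:
        changed = False
        # Operation (1): B in alpha(p1,q1,p2,q2), C in alpha(q1,r1,q2,r2) => B | C in alpha(p1,r1,p2,r2).
        for (p1, q1, p2, q2), r1, r2 in product(quads, Q1, Q2):
            target = alpha[(p1, r1, p2, r2)]
            for B in list(alpha[(p1, q1, p2, q2)]):
                for C in list(alpha[(q1, r1, q2, r2)]):
                    if B | C not in target:
                        target.add(B | C)
                        changed = True
        # Operation (2): B at both loops and q_i reaches r_i by a word over B.
        for q1, r1, q2, r2 in quads:
            for B in alpha[(q1, q1, q2, q2)] & alpha[(r1, r1, r2, r2)]:
                if (B not in alpha[(q1, r1, q2, r2)]
                        and r1 in reach_within(d1, q1, B)
                        and r2 in reach_within(d2, q2, B)):
                    alpha[(q1, r1, q2, r2)].add(B)
                    changed = True
    return alpha


def empty_ul_intersection(nfa1, nfa2):
    """alpha is empty on every (initial, final, initial, final) 4-tuple."""
    (Q1, I1, F1, d1), (Q2, I2, F2, d2) = nfa1, nfa2
    alpha = ul_alpha(Q1, d1, Q2, d2)
    return all(not alpha[(i1, f1, i2, f2)] for i1, f1, i2, f2 in product(I1, F1, I2, F2))


A1 = ({0}, {0}, {0}, {(0, "a", 0), (0, "b", 0)})     # {a, b}*
A2 = ({0, 1}, {0}, {1}, {(0, "a", 1), (1, "b", 1)})  # a b*
print(sorted(map(sorted, ul_alpha(A1[0], A1[3], A2[0], A2[3])[(0, 0, 0, 1)])))  # [['a'], ['a', 'b']]
print(empty_ul_intersection(A1, A2))  # False
```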

Since every set α(q1, r1, q2, r2) only grows with respect to inclusion, and is bounded from above by 2^A, the computation terminates. It is straightforward to see that α can be computed in ExpSpace using a fixpoint algorithm. Finally, we say that L1, L2 have empty UL-intersection if α(q1, r1, q2, r2) = ∅ for all q1 ∈ I1, q2 ∈ I2 and r1 ∈ F1, r2 ∈ F2. We now state the main theorem of this section.

Theorem 7. Let A1 and A2 be two NFAs over alphabet A. Let L1 = L(A1) and L2 = L(A2). Let k1, k2 be the numbers of states of A1 and A2, respectively. Define κ = (2k1k2 + 1)(|A| + 1)². Then the following conditions are equivalent:


(1) L1 and L2 are UL-separable.
(2) L1 and L2 are UL[κ]-separable.
(3) The language [L1]≅κ separates L1 from L2.
(4) L1, L2 have empty UL-intersection.
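The index of Theorem 7 is polynomial in the numbers of states and in the size of the alphabet, in contrast with the bound of Theorem 1. The one-line Python helper below (hypothetical name) simply evaluates κ = (2k1k2 + 1)(|A| + 1)²; Proposition 9 uses the same expression with |B| in place of |A|.

```python
def ul_bound(k1, k2, alphabet_size):
    """kappa of Theorem 7: (2 k1 k2 + 1) (|A| + 1)^2."""
    return (2 * k1 * k2 + 1) * (alphabet_size + 1) ** 2


print(ul_bound(2, 2, 2))  # 81
```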

As in the previous section, Conditions (2) and (4) yield algorithms for testing whether two languages are separable. Moreover, it can be shown that empty UL-intersection can be tested in Pspace from α. Therefore, we get the following corollary.

Corollary 8. It is decidable whether two regular languages can be separated by an unambiguous language. Moreover, this can be done in ExpSpace in the size of the NFAs recognizing the languages.

Observe that by definition of UL[κ], the bound κ is defined in terms of unambiguous products. A rephrasing of the theorem would be: there exists a separator iff there exists one defined by a boolean combination of unambiguous products of size at most κ. It turns out that the same κ also works for FO2(<), i.e., there exists a separator iff there exists one defined by an FO2(<)-formula of quantifier rank κ. This can be proved by minor adjustments to the proof of Theorem 7.

The proof of Theorem 7 is inspired by techniques used in [11] and relies heavily on the notion of (p, B)-patterns. It works by induction on the size of the alphabet. There are two non-trivial implications: (1) =⇒ (4) and (4) =⇒ (3). We now provide an insight into the most difficult one, i.e., (4) =⇒ (3). The following proposition is used to prove this.

Proposition 9. Let B ⊆ A and κ = (2k1k2 + 1)(|B| + 1)². For all pairs of words w1 ≅κ w2 such that B = alph(w1) = alph(w2) and all pairs of states (q1, r1) ∈ Q1² and (q2, r2) ∈ Q2² such that $q_1 \xrightarrow{w_1} r_1$ and $q_2 \xrightarrow{w_2} r_2$, we have B ∈ α(q1, r1, q2, r2).

Observe that a consequence of Proposition 9 is that as soon as there exist w1 ∈ L1, w2 ∈ L2 such that w1 ≅κ w2 (i.e., [L1]≅κ is not a separator), there exists a witness of nonempty UL-intersection. This is the contrapositive of (4) =⇒ (3).

5 Conclusion

We proved separation results for both piecewise testable and unambiguous languages. Both results provide a means to decide separability. In the PT case, we even prove that this can be done in Ptime. Moreover, in both cases we give an insight into the actual separator by providing a bound on its size, should it exist.

There remain several interesting questions in this field. First, one could consider other subclasses of regular languages, the most interesting one being full first-order logic. Separability by first-order logic has already been proven to be decidable using semigroup theory [7]. However, this approach is difficult to understand, and it yields a costly algorithm that only provides a yes/no answer, without insight about a possible separator. Another question is to get tight complexity bounds. For unambiguous languages, for instance, it is likely that our ExpSpace upper bound can be improved, and even for piecewise testable languages, we do not know any tight bounds.

A final observation is that, right now, we have no general approach and are bound to use ad-hoc techniques for each subclass. An interesting direction would be to invent a general framework that is suitable for this problem, in the same way that monoids are a suitable framework for decidable characterizations.

References

1. J. Almeida. Some algorithmic problems for pseudovarieties. Publ. Math. Debrecen, 54(suppl.):531–552, 1999. Automata and formal languages, VIII (Salgótarján, 1996).

2. J. Almeida, J. C. Costa, and M. Zeitoun. Pointlike sets with respect to R and J. J. Pure Appl. Algebra, 212(3):486–499, 2008.

3. J. Almeida and M. Zeitoun. The pseudovariety J is hyperdecidable. RAIRO Inform. Théor. Appl., 31(5):457–482, 1997.

4. C. J. Ash. Inevitable graphs: a proof of the type II conjecture and some related decision procedures. Internat. J. Algebra Comput., 1:127–146, 1991.

5. R. Bacher. An easy upper bound for Ramsey numbers. HAL, 00763927.

6. W. Czerwiński, W. Martens, and T. Masopust. Efficient separability of regular languages by subsequences and suffixes. In Proc. of ICALP'13, 2013.

7. K. Henckell. Pointlike sets: the finest aperiodic cover of a finite semigroup. J. Pure Appl. Algebra, 55(1-2):85–126, 1988.

8. K. Henckell, J. Rhodes, and B. Steinberg. Aperiodic pointlikes and beyond. IJAC, 20(2):287–305, 2010.

9. H. B. Hunt, III. Decidability of grammar problems. J. ACM, 29(2):429–447, 1982.

10. J.-E. Pin and P. Weil. Polynomial closure and unambiguous product. Theory of Computing Systems, 30(4):383–422, 1997.

11. T. Place and L. Segoufin. Deciding definability in FO²(<h, <v) on trees. Journal version, to appear, 2013.

12. L. Ribes and P. A. Zalesskiĭ. On the profinite topology on a free group. Bull. London Math. Soc., 25:37–43, 1993.

13. L. van Rooijen and M. Zeitoun. The separation problem for regular languages by piecewise testable languages. http://arxiv.org/abs/1303.2143, 2013.

14. M. Schützenberger. On finite monoids having only trivial subgroups. Information and Control, 8(2):190–194, 1965.

15. M. Schützenberger. Sur le produit de concaténation non ambigu. Semigroup Forum, 13:47–75, 1976.

16. I. Simon. Piecewise testable events. In Proc. of the 2nd GI Conf. on Automata Theory and Formal Languages, pages 214–222. Springer, 1975.

17. I. Simon. Factorization forests of finite height. Theor. Comp. Sci., 72(1):65–94, 1990.

18. J. Stern. Complexity of some problems from the theory of automata. Information and Control, 66(3):163–176, 1985.

19. P. Tesson and D. Thérien. Diamonds are forever: The variety DA. In Semigroups, Algorithms, Automata and Languages, pages 475–500. World Scientific, 2002.

20. D. Thérien and T. Wilke. Over words, two variables are as powerful as one quantifier alternation. In Proc. of STOC'98, pages 234–240. ACM, 1998.

21. A. N. Trahtman. Piecewise and local threshold testability of DFA. In Proc. FCT'01, pages 347–358. Springer, 2001.


Appendix

A Proofs of Section 3

A.1 Proof of Condition (4) of Theorem 1

To establish Condition (4), we prove (4) =⇒ (1) and (3) =⇒ (4).

(4) =⇒ (1). We proceed by contraposition. Assume that L1, L2 are not PT-separable. Recall that A1 and A2 are NFAs for L1, L2 and that k1, k2 are their sizes. Set p = max(k1, k2) + 1 and ℓ = 22|A||A|(p|A|+1). We prove that L1, L2 both contain p-implementations of some unambiguous ℓ-template T, and use T to construct a witness of non-PT-separability in (A1, A2).

Let κ be as defined in Lemma 6 from ℓ and p. Since L1, L2 are not PT-separable, there exist w1 ∈ L1 and w2 ∈ L2 such that w1 ∼κ w2. By choice of ℓ and p, and by Lemma 5, w1 must be a p-implementation of some unambiguous ℓ-template T. Applying Lemma 6, we obtain that w1, w2 are both (max(k1, k2))-implementations of T.

Let B = (B1, ..., Bn) be the subsequence of elements of T that are sets. Let u = (u0, ..., un), where ui is the word obtained by concatenating the letters that are between Bi and Bi+1 in T (u0 consists of the letters before B1, and un of those after Bn). By definition, (u,B) is a factorization pair.
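The passage from T to (u,B) is purely syntactic. A minimal Python sketch, assuming a template is represented as a list whose entries are either single letters (str) or sets of letters (frozenset); the function name `factorization_pair` and the sample template are ours, not the paper's:

```python
def factorization_pair(template):
    """Split a template into (u, B) with u = (u_0, ..., u_n), B = (B_1, ..., B_n).

    Letters are str entries and sets are frozenset entries (an assumed
    encoding). u_i concatenates the letters between the i-th and (i+1)-th
    set entries; u_0 is before B_1 and u_n is after B_n.
    """
    u, sets, current = [], [], []
    for entry in template:
        if isinstance(entry, str):
            current.append(entry)       # a letter of T
        else:
            u.append("".join(current))  # close u_i before the set B_{i+1}
            sets.append(entry)
            current = []
    u.append("".join(current))          # last word u_n
    return u, sets


T = ["a", frozenset("bc"), "a", "d", frozenset("ab")]
u, B = factorization_pair(T)
print(u)       # ['a', 'ad', '']
print(len(B))  # 2
```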

Because w1 is a (max(k1, k2))-implementation of T, the path used to read w1 in A1 must traverse loops labeled by each of the Bi, and clearly this is a (u,B)-path. Using the same argument, we get that the path of w2 in A2 is also a (u,B)-path. Therefore, (u,B) is a witness of non-PT-separability.

(3) =⇒ (4). Again, we proceed by contraposition. Set κ as in Theorem 1 and assume that there exists a factorization pair (u,B) that is a witness of non-PT-separability. We prove that [L1]∼κ is not a separator.

Set B = (B1, ..., Bn) and u = (u0, ..., un). By definition, this means that there exist w1 ∈ L1 and w2 ∈ L2, both of the form

$$u_0 v_1 u_1 v_2 \cdots v_n u_n,$$

where the words vi (which may differ between w1 and w2) are such that alph(vi) = Bi and vi contains a (κ,Bi)-pattern. It is straightforward to see that w1 ∼κ w2. Therefore, w2 ∈ L2 ∩ [L1]∼κ, and [L1]∼κ is not a separator.

A.2 Proof of Lemma 6

Lemma 6. Let ℓ, p ∈ N. From p and ℓ, we can compute κ such that for every pair of words w ∼κ w′ and every unambiguous ℓ-template T, w′ is a p-implementation of T whenever w is a (p + 1)-implementation of T.

Proof. We prove that the lemma holds for κ = |A|pℓ. Let w ∼κ w′ and let T = {t1, t2, ..., tℓ} be an unambiguous ℓ-template such that w is a (p + 1)-implementation of T. We begin by giving a decomposition of w′ and then prove that it indeed witnesses the fact that w′ is a p-implementation of T. We define $w' = w'_1 \cdots w'_\ell$ inductively as follows: assume that the factors are defined up to $w'_i$ and let u be such that $w' = w'_1 \cdots w'_i \cdot u$. If $t_{i+1}$ is a letter, then $w'_{i+1}$ is just the first letter of u; otherwise $t_{i+1} = B \subseteq A$, and in that case $w'_{i+1}$ is the largest prefix of u which contains only letters of B. We will prove that $w'_1 \cdots w'_\ell$ witnesses that w′ is a p-implementation of T. The proof relies on Claim 1 stated below.

To every unambiguous ℓ-template T = {t1, t2, ..., tℓ} and p ∈ N, we associate a piece $v_{T,p} = v_1 \cdots v_\ell$ such that for all i:

$$v_i = \begin{cases} a & \text{if } t_i = a \in A,\\ (b_1 \cdots b_n)^p & \text{if } t_i = \{b_1, \ldots, b_n\} \subseteq A. \end{cases}$$

By definition, if w is a p-implementation of T, then $v_{T,p}$ is a piece of w. Consider $v_{T,1} = v_1 \cdots v_\ell$. A piece v is incompatible with T when v is of the form $v = v_1 \cdots v_i \cdot u \cdot v_{i+1} \cdots v_\ell$ where, if $t_i$ (resp. $t_{i+1}$) is a set, the first letter (resp. last letter) of u is not in $t_i$ (resp. $t_{i+1}$).
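To make the piece $v_{T,p}$ concrete, here is a small Python sketch of our own. It assumes templates are lists of letters (str) and sets (frozenset), and it orders each set alphabetically, which is an arbitrary choice for b1, ..., bn. It builds $v_{T,p}$ and tests whether it is a piece (a scattered subword) of a given word:

```python
def template_piece(template, p: int) -> str:
    """Build v_{T,p}: a letter a stays a; a set {b1..bn} becomes (b1...bn)^p."""
    parts = []
    for entry in template:
        if isinstance(entry, str):
            parts.append(entry)
        else:
            parts.append("".join(sorted(entry)) * p)  # alphabetical b1 < ... < bn
    return "".join(parts)


def is_piece(v: str, w: str) -> bool:
    """True iff v is a scattered subword of w (greedy left-to-right matching)."""
    it = iter(w)
    return all(letter in it for letter in v)  # 'in' consumes the iterator


T = [frozenset("ab"), "c"]
v = template_piece(T, 2)
print(v)                       # ababc
print(is_piece(v, "baababc"))  # True: "baabab" contains (ab)^2, then c
print(is_piece(v, "bbaac"))    # False
```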

Claim 1. If w is a 1-implementation of some unambiguous ℓ-template T, then there is no piece of w that is incompatible with T.

Proof. This is a consequence of the fact that T is unambiguous. □

We now prove that for all i, $w'_i = a$ if $t_i$ is the letter a, and $w'_i$ is a word over B containing a (p,B)-pattern if $t_i = B$. We proceed by induction: assume that this is true up to $w'_i$ and consider $w'_{i+1}$. Set $v_{T,1} = v_1 \cdots v_\ell$ and $v_{T,p+1} = r_1 \cdots r_\ell$. By induction hypothesis and by choice of κ, we know that $v_{i+1} \cdots v_\ell$ and $r_{i+1} \cdots r_\ell$ are pieces of $w'_{i+1} \cdots w'_\ell$. We distinguish two cases depending on the nature of $t_{i+1}$.

Case 1: $t_{i+1}$ is some letter a. We have to prove that $w'_{i+1} = a$. Assume that $w'_{i+1} = b \neq a$. Then $v_1 \cdots v_i \cdot b \cdot v_{i+1} \cdots v_\ell$ must be a piece of w′ and therefore a piece of w (since w ∼κ w′). We prove that this piece is incompatible with T, which contradicts Claim 1. Observe that, by definition of $w'_i$, if $t_i$ is a set then $b \notin t_i$ (otherwise b would have been included in $w'_i$). Therefore $v_1 \cdots v_i \cdot b \cdot v_{i+1} \cdots v_\ell$ is incompatible with T.

Case 2: $t_{i+1}$ is a set $B = \{b_1, \ldots, b_n\}$. By construction, $w'_{i+1}$ contains only letters in B. Therefore, we have to prove that it contains a (p,B)-pattern. Assume that it does not. By construction, the first letter of $w'_{i+2}$ is some letter $c \notin B$. If $w'_{i+1} = \varepsilon$, then $v_1 \cdots v_i \cdot c \cdot v_{i+1} \cdots v_\ell$ must be a piece of w′. Using an argument similar to the previous case, we can prove that this piece is incompatible with T, which contradicts Claim 1.

Otherwise, let b be the first letter of $w'_{i+1}$. Recall that, by definition, $r_{i+1}$ contains a (p+1,B)-pattern and that $r_{i+1} \cdots r_\ell$ is a piece of $w'_{i+1} \cdots w'_\ell$. Therefore, since $w'_{i+1}$ does not contain a (p,B)-pattern, the last suffix of $r_{i+1}$ containing a (1,B)-pattern must fall in $w'_{i+2}$ and, consequently, $v_{i+1} \cdots v_\ell$ must be a piece of $w'_{i+2} \cdots w'_\ell$. It follows that $v_1 \cdots v_i \cdot b \cdot c \cdot v_{i+1} \cdots v_\ell$ is a piece of w′ and of w (since w ∼κ w′). By definition, $c \notin t_{i+1}$. Moreover, by construction, if $t_i$ is a set, then $b \notin t_i$. Therefore, $v_1 \cdots v_i \cdot b \cdot c \cdot v_{i+1} \cdots v_\ell$ is incompatible with T, which contradicts Claim 1 since it is also a piece of w. □


B Proofs of Section 4

In this appendix, we prove Theorem 7. There are two non-trivial directions: (1) =⇒ (4) and (4) =⇒ (3). As we explained, our proof relies heavily on the notion of (p,B)-patterns. The appendix is divided into two parts, each devoted to one direction. For the rest of this appendix, A1, A2 are automata recognizing L1, L2, and k1, k2 are their numbers of states.

B.1 (4) =⇒ (3)

As we explained in the main paper, this is a consequence of Proposition 9.

Proposition 9. Let B ⊆ A and κ = (2|Q1||Q2| + 1)(|B| + 1)². For all pairs of words w1 ≅κ w2 such that B = alph(w1) = alph(w2) and all pairs of states (q1, r1) ∈ Q1² and (q2, r2) ∈ Q2² such that q1 −w1→ r1 and q2 −w2→ r2, we have B ∈ α(q1, r1, q2, r2).

We begin by giving a brief outline. We first fix a large enough p. Intuitively, p needs to be large enough so that a run on a word of length p contains a repeated state. Then, we show that κ is large enough to ensure that either w1, w2 are both (2p,B)-patterns or neither of them is. In both cases, we are then able to decompose w1, w2 into sequences of factors that are pairwise equivalent and use a smaller alphabet. We then apply induction on these factors and derive the result by reconstructing w1, w2 using the two operations in the construction of α. We first define our notion of decomposition. For the remainder of this section, we assume that B = {b1, ..., bn} is fixed.

Decompositions for (B, p)-patterns. Consider w ∈ B*. Recall that, by definition, w is a (B, p)-pattern iff w ∈ (B*b1 ⋯ B*bnB*)^p. This means that w can be decomposed into a sequence of factors witnessing its membership in (B*b1 ⋯ B*bnB*)^p (here and below, the index i mod n of $b_{i \bmod n}$ is taken in {1, ..., n}):

$$w = \prod_{i=1}^{pn} (w_i \cdot b_{i \bmod n}) \cdot w_{pn+1}.$$

To each word w, we associate a unique such decomposition for the largest integer p such that w is a (B, p)-pattern. We say that w ∈ B* admits a (B, p)-decomposition iff w is a (B, p)-pattern but not a (B, p + 1)-pattern. It is straightforward to see that if w admits a (B, p)-decomposition, then there exists an integer l such that pn ≤ l < (p + 1)n and

$$w = \prod_{i=1}^{l} (w_i \cdot b_{i \bmod n}) \cdot w_{l+1}, \tag{1}$$

such that, for all i, $b_{i \bmod n}$ does not occur in $w_i$. The integer l is called the length of the decomposition. We give an example of a word that admits a (B, 1)-decomposition (of length 5) in Figure 3.


Fig. 3. w = bcacbbcccaccbaa over B = {a, b, c} admits a (B, 1)-decomposition: with b1 = a, b2 = b, b3 = c, the factors are w1 = bc, w2 = c, w3 = b, w4 = cc, w5 = cc and w6 = aa, i.e., w = w1·a·w2·b·w3·c·w4·a·w5·b·w6.
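The decomposition (1) can be computed in a single left-to-right scan. The following Python sketch is our own illustration (the names `greedy_decomposition` and `pattern_degree`, and the encoding of B as an ordered string b1 ⋯ bn, are assumptions): it matches each awaited letter $b_{i \bmod n}$ at its first occurrence, which yields the factors w1, ..., w_{l+1} and the length l; the largest p such that w is a (B, p)-pattern is then l // n.

```python
def greedy_decomposition(w: str, order: str) -> tuple[list[str], int]:
    """Decompose w as in equation (1) for the ordered alphabet B = order.

    The letter b_{i mod n} awaited after i - 1 matches is order[(i - 1) % n].
    Returns the factors [w_1, ..., w_{l+1}] and the length l.
    """
    if not set(w) <= set(order):
        raise ValueError("w must be a word over B")
    n = len(order)
    factors: list[str] = []
    current: list[str] = []
    l = 0  # number of letters b_{i mod n} matched so far
    for letter in w:
        if letter == order[l % n]:
            # the awaited letter closes the current factor w_{l+1}
            factors.append("".join(current))
            current = []
            l += 1
        else:
            current.append(letter)
    factors.append("".join(current))  # last factor w_{l+1}
    return factors, l


def pattern_degree(w: str, order: str) -> int:
    """Largest p such that w is a (B, p)-pattern."""
    _, l = greedy_decomposition(w, order)
    return l // len(order)


factors, l = greedy_decomposition("bcacbbcccaccbaa", "abc")
print(factors, l)                                # ['bc', 'c', 'b', 'cc', 'cc', 'aa'] 5
print(pattern_degree("bcacbbcccaccbaa", "abc"))  # 1
```

Matching greedily is what makes the decomposition unique: each factor $w_i$ cannot contain the awaited letter, exactly as required in (1).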

A useful result about (B, p)-decompositions is that they can be detected using an unambiguous product of large enough size. Moreover, using products of the same size, one can also describe the products that are satisfied by the factors in the (B, p)-decomposition.

Lemma 10. Let p, κ, κ′ ∈ N such that κ ≥ κ′ + p(|B| + 1). Then for every pair of words u ≅κ v and all h ≤ p, u admits a (B, h)-decomposition iff v admits a (B, h)-decomposition. Moreover, the associated decompositions, as described in (1), are of the same length l:

$$u = \prod_{i=1}^{l} (u_i \cdot b_{i \bmod n}) \cdot u_{l+1} \qquad v = \prod_{i=1}^{l} (v_i \cdot b_{i \bmod n}) \cdot v_{l+1}$$

and for all i, ui ≅κ′ vi.

Proof. For the sake of readability, for all b ∈ B we write $B_b$ for the alphabet B \ {b}, and for every number i we write $c_i$ for $b_{i \bmod n}$. Assume that u admits a (B, h)-decomposition and consider the associated decomposition:

$$u = \prod_{i=1}^{l} (u_i \cdot c_i) \cdot u_{l+1}.$$

Consider the following unambiguous product:

$$P = \prod_{i=1}^{l} (B_{c_i}^* \cdot c_i) \cdot B_{c_{l+1}}^*.$$

By construction, P is of size l ≤ p(|B| + 1) ≤ κ and u ∈ P. Therefore, v ∈ P, and v admits a (B, h)-decomposition together with the associated decomposition:

$$v = \prod_{i=1}^{l} (v_i \cdot c_i) \cdot v_{l+1}.$$

It remains to prove that for all i, ui ≅κ′ vi. Assume that ui belongs to some unambiguous product P′ of size κ′. This means that u is in the unambiguous product:

$$P'' = \prod_{j=1}^{i-1} (B_{c_j}^* \cdot c_j) \cdot (P' \cdot c_i) \cdot \prod_{j=i+1}^{l} (B_{c_j}^* \cdot c_j) \cdot B_{c_{l+1}}^*.$$

Because P″ is of size at most κ′ + p(|B| + 1) ≤ κ, we have v ∈ P″. By assumption on the decomposition of v, this means that vi ∈ P′, and we are done. □


We now move to the proof of Proposition 9. Set w1, w2, q1, q2, r1 and r2 as in the statement, and recall that B = {b1, ..., bn} is of size n. We begin by fixing the size of the patterns we look for in w1, w2. Set p = k1k2 (recall that k1 = |Q1| and k2 = |Q2|). A pigeonhole argument proves that for all q0, ..., qp ∈ Q1 and r0, ..., rp ∈ Q2, there exist i < j such that both qi = qj and ri = rj. We will look for (B, p)-patterns.
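The pigeonhole step can be made concrete. In the sketch below (our own illustration; `find_repeated_pair` is a hypothetical helper), the input lists the pairs (qi, ri) for i = 0, ..., p; with p = k1k2 there are more pairs than elements of Q1 × Q2, so a repetition must exist.

```python
def find_repeated_pair(pairs: list[tuple[int, int]]) -> tuple[int, int]:
    """Return indices i < j with pairs[i] == pairs[j].

    With p = k1 * k2 and p + 1 pairs drawn from Q1 x Q2, such indices
    always exist by the pigeonhole principle.
    """
    first_seen: dict[tuple[int, int], int] = {}
    for j, pair in enumerate(pairs):
        if pair in first_seen:
            return first_seen[pair], j
        first_seen[pair] = j
    raise ValueError("no repeated pair")


# k1 = k2 = 2, so p = 4 and five pairs force a repetition.
print(find_repeated_pair([(0, 0), (0, 1), (1, 0), (1, 1), (0, 1)]))  # (1, 4)
```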

Recall that κ = (2p + 1)(n + 1)² (n = |B|). We prove Proposition 9 by induction on n. By Lemma 10, either both w1, w2 admit (B, h)-decompositions for some h < 2p, or neither of them admits a (B, h)-decomposition for any h < 2p. We treat these two cases separately.
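For the reader's convenience, here is the arithmetic behind the two applications of Lemma 10 in the cases below. Writing κ′ = (2p + 1)n² for the bound that the induction hypothesis requires for any strict subalphabet C ⊊ B (since |C| + 1 ≤ n), a direct computation gives:

```latex
\begin{align*}
\kappa - \kappa' &= (2p+1)\bigl((n+1)^2 - n^2\bigr) = (2p+1)(2n+1) \\
                 &= 4pn + 2p + 2n + 1 \;\ge\; 2pn + 2p = 2p(n+1).
\end{align*}
Hence $\kappa \ge \kappa' + 2p(n+1)$ (Lemma 10 with $2p$ in place of $p$),
or equivalently $\kappa - p(n+1) \ge \kappa' + p(n+1)$.
```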

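Case 1: both w1 and w2 admit (B, h)-decompositions for some h < 2p. Let κ′ = (2p + 1)|B|². Observe that κ = (2p + 1)(|B| + 1)² implies that κ ≥ κ′ + 2p(|B| + 1). Therefore, we can apply the second part of Lemma 10 (with 2p in place of p) to w1 and w2. This yields the following decompositions:

$$w_1 = \prod_{i=1}^{l} (u_i \cdot b_{i \bmod n}) \cdot u_{l+1} \qquad w_2 = \prod_{i=1}^{l} (v_i \cdot b_{i \bmod n}) \cdot v_{l+1}$$

such that for all i, ui ≅κ′ vi. By hypothesis, there exist runs q1 −w1→ r1 and q2 −w2→ r2. These runs can be decomposed as sequences of subruns for each factor in the decomposition:

$$q_1 = q_1^1 \xrightarrow{u_1} r_1^1 \xrightarrow{b_1} q_1^2 \xrightarrow{u_2} r_1^2 \xrightarrow{b_2} \cdots q_1^{l+1} \xrightarrow{u_{l+1}} r_1^{l+1} = r_1$$

$$q_2 = q_2^1 \xrightarrow{v_1} r_2^1 \xrightarrow{b_1} q_2^2 \xrightarrow{v_2} r_2^2 \xrightarrow{b_2} \cdots q_2^{l+1} \xrightarrow{v_{l+1}} r_2^{l+1} = r_2$$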

Observe that, by definition of the decompositions, the words ui, vi use a strict subalphabet of B. Therefore, by choice of κ, the induction hypothesis can be used, and for all i, $\mathrm{alph}(u_i) = \mathrm{alph}(v_i) \in \alpha(q_1^i, r_1^i, q_2^i, r_2^i)$. Moreover, by definition of α, for all i, $\{b_{i \bmod n}\} \in \alpha(r_1^i, q_1^{i+1}, r_2^i, q_2^{i+1})$. Therefore, by applying Operation (1) in the construction of α, we get that

$$B = \mathrm{alph}(w_1) = \bigcup_{i \le l+1} \mathrm{alph}(u_i) \cup \bigcup_{i \le l} \{b_{i \bmod n}\} \in \alpha(q_1, r_1, q_2, r_2),$$

which finishes this case.

Case 2: neither w1 nor w2 admits a (B, h)-decomposition for h < 2p. This means that w1, w2 are both (B, 2p)-patterns. A simple argument as in the proof of Lemma 10 shows that w1, w2 can be decomposed into three factors:

$$w_1 = u_l \cdot u_c \cdot u_r \qquad w_2 = v_l \cdot v_c \cdot v_r$$

such that

$$u_l \cong_{\kappa - p(n+1)} v_l, \qquad u_c \cong_{\kappa - p(n+1)} v_c, \qquad u_r \cong_{\kappa - p(n+1)} v_r,$$

and ul, vl, ur, vr admit (B, p)-decompositions. By hypothesis, there exist runs q1 −w1→ r1 and q2 −w2→ r2. These runs can then be decomposed as follows:

$$q_1 \xrightarrow{u_l} p_1 \xrightarrow{u_c} s_1 \xrightarrow{u_r} r_1 \qquad q_2 \xrightarrow{v_l} p_2 \xrightarrow{v_c} s_2 \xrightarrow{v_r} r_2$$

We prove that there are intermediary states p′1, p′2, s′1 and s′2 occurring in the runs on ul, vl, ur and vr such that:

(1) there exist Cl, Cr ⊆ B such that Cl ∈ α(q1, p′1, q2, p′2) and Cr ∈ α(s′1, r1, s′2, r2);

(2) B ∈ α(p′1, p′1, p′2, p′2) and B ∈ α(s′1, s′1, s′2, s′2);

(3) there exist x1, x2 ∈ B* such that p′1 −x1→ s′1 and p′2 −x2→ s′2.

Using Items 2 and 3, we can apply Operation (2) in the construction of α and obtain that B ∈ α(p′1, s′1, p′2, s′2). Then, by Operation (1) and Item 1, we conclude that B ∈ α(q1, r1, q2, r2), which ends the proof. Observe that the third point is immediate as soon as p′1, p′2, s′1 and s′2 have been defined as occurring in the runs on ul, vl, ur and vr, since w1, w2 are words over B. Therefore, we only need to prove that there exist states satisfying the first two points. We only do the proof for p′1, p′2 occurring in the runs on ul, vl; the proof is symmetric for s′1, s′2.

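Let κ′ = (2p + 1)n². Observe that κ − p(n + 1) ≥ κ′ + p(n + 1). Therefore, we can apply Lemma 10 (with κ − p(n + 1) in place of κ) to ul and vl. This yields the following decompositions:

$$u_l = \prod_{i=1}^{l} (u_i \cdot b_{i \bmod n}) \cdot u_{l+1} \qquad v_l = \prod_{i=1}^{l} (v_i \cdot b_{i \bmod n}) \cdot v_{l+1}$$

such that for all i, ui ≅κ′ vi. Moreover, the length l of the decompositions satisfies l ≥ pn. Recall that there are runs q1 −ul→ p1 and q2 −vl→ p2. These runs can be decomposed as sequences of subruns for each factor in the decomposition:

$$q_1 = q_1^1 \xrightarrow{u_1} p_1^1 \xrightarrow{b_1} q_1^2 \xrightarrow{u_2} p_1^2 \xrightarrow{b_2} \cdots q_1^{l+1} \xrightarrow{u_{l+1}} p_1^{l+1} = p_1$$

$$q_2 = q_2^1 \xrightarrow{v_1} p_2^1 \xrightarrow{b_1} q_2^2 \xrightarrow{v_2} p_2^2 \xrightarrow{b_2} \cdots q_2^{l+1} \xrightarrow{v_{l+1}} p_2^{l+1} = p_2$$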

Observe that, by definition of the decompositions, the words ui, vi use a strict subalphabet of B. Therefore, by choice of κ, the induction hypothesis can be used, and for all i, $\mathrm{alph}(u_i) = \mathrm{alph}(v_i) \in \alpha(q_1^i, p_1^i, q_2^i, p_2^i)$. Moreover, by definition of α, for all i, $\{b_{i \bmod n}\} \in \alpha(p_1^i, q_1^{i+1}, p_2^i, q_2^{i+1})$. Recall that l ≥ pn; therefore, by choice of p, there exist numbers 0 ≤ j < j′ ≤ p such that $q_1^{1+nj} = q_1^{1+nj'}$ and $q_2^{1+nj} = q_2^{1+nj'}$.

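Set $p'_1 = q_1^{1+nj}$ and $p'_2 = q_2^{1+nj}$. By applying Operation (1) in the definition of α, we get

$$\alpha(p'_1, p'_1, p'_2, p'_2) \ni \bigcup_{nj < i \le nj'} \mathrm{alph}(u_i) \cup \bigcup_{nj < i \le nj'} \{b_{i \bmod n}\} \subseteq B$$

$$\alpha(q_1, p'_1, q_2, p'_2) \ni \bigcup_{1 \le i \le nj} \mathrm{alph}(u_i) \cup \bigcup_{1 \le i \le nj} \{b_{i \bmod n}\} \subseteq B$$

Moreover, observe that since j < j′,

$$\bigcup_{nj < i \le nj'} \{b_{i \bmod n}\} = B.$$

We conclude that B ∈ α(p′1, p′1, p′2, p′2), which finishes the proof.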


B.2 (1) =⇒ (4)

This direction in Theorem 7 is a consequence of the following proposition.

Proposition 11. Let (q1, r1) ∈ Q1² and (q2, r2) ∈ Q2², and assume that B ∈ α(q1, r1, q2, r2). Then for all κ ∈ N, there exist w1 ≅κ w2 such that B = alph(w1) = alph(w2), q1 −w1→ r1 and q2 −w2→ r2.

Before proving Proposition 11, we briefly explain how it can be used to prove (1) =⇒ (4). It is simple to derive from Proposition 11 that if there exists a witness of nonempty UL-intersection, then for all κ there can be no UL[κ]-separator. This is exactly the contrapositive of (1) =⇒ (4). We finish with the proof of Proposition 11.

The proof goes by induction on the number of operations required to generate B ∈ α(q1, r1, q2, r2). If no operation was used, then B = {a} is a singleton, q1 −a→ r1 and q2 −a→ r2. Moreover, for all κ ∈ N, a ≅κ a. Otherwise, we distinguish two cases depending on the last operation that was used in order to generate B ∈ α(q1, r1, q2, r2).

First Case: Operation (1). There exist states p1, p2 and sets C, D with C ∪ D = B such that C ∈ α(q1, p1, q2, p2) and D ∈ α(p1, r1, p2, r2). Let κ ∈ N. By induction hypothesis, there exist u1, u2, v1, v2 such that u1 ≅κ u2, v1 ≅κ v2, alph(u1) = alph(u2) = C, alph(v1) = alph(v2) = D, q1 −u1→ p1, p1 −v1→ r1, q2 −u2→ p2 and p2 −v2→ r2. Set w1 = u1v1 and w2 = u2v2. It is straightforward to see that w1 ≅κ w2, q1 −w1→ r1, q2 −w2→ r2 and alph(w1) = alph(w2) = B.

Second Case: Operation (2). We have B ∈ α(q1, q1, q2, q2) and B ∈ α(r1, r1, r2, r2), and there exist words u1, u2 ∈ B* such that q1 −u1→ r1 and q2 −u2→ r2. Let κ ∈ N. By induction hypothesis, there exist words v1, v2, x1, x2 such that v1 ≅κ v2, x1 ≅κ x2, alph(v1) = alph(v2) = alph(x1) = alph(x2) = B, q1 −v1→ q1, r1 −x1→ r1, q2 −v2→ q2 and r2 −x2→ r2. Set

$$w_1 = (v_1)^{|B|\kappa}\, u_1\, (x_1)^{|B|\kappa} \qquad\text{and}\qquad w_2 = (v_2)^{|B|\kappa}\, u_2\, (x_2)^{|B|\kappa}.$$

By definition, it is clear that alph(w1) = alph(w2) = B, q1 −w1→ r1 and q2 −w2→ r2. Moreover, since alph(v1) = alph(x1) = B, the words $(v_1)^{|B|\kappa}$ and $(x_1)^{|B|\kappa}$ are (B, κ)-patterns. Observe that because v1 ≅κ v2 and x1 ≅κ x2, we have

$$w_2 = (v_2)^{|B|\kappa} u_2 (x_2)^{|B|\kappa} \cong_\kappa (v_1)^{|B|\kappa} u_2 (x_1)^{|B|\kappa}.$$

Therefore, it suffices to prove that

$$w_1 = (v_1)^{|B|\kappa} u_1 (x_1)^{|B|\kappa} \cong_\kappa (v_1)^{|B|\kappa} u_2 (x_1)^{|B|\kappa}$$

in order to conclude that w1 ≅κ w2 and end the proof. This last equivalence is a consequence of the following lemma.

Lemma 12. Let κ′ ∈ N, and let u, v ∈ B* be two words that are both (B, κ′)-patterns. Then for all words w1, w2 ∈ B*, the following equivalence holds:

$$u \cdot w_1 \cdot v \cong_{\kappa'} u \cdot w_2 \cdot v.$$


Proof. We begin by observing a simple property of unambiguous products.

Remark 1. Let $B_0^* c_1 B_1^* \cdots B_{\kappa'-1}^* c_{\kappa'} B_{\kappa'}^* \subseteq B^*$ be an unambiguous product. There can be at most one set $B_i$ such that $B_i = B$.

Let $P = B_0^* c_1 B_1^* \cdots B_{\kappa'-1}^* c_{\kappa'} B_{\kappa'}^*$ be an unambiguous product of size κ′, and assume that $u w_1 v \in P$. We prove that $u w_2 v \in P$. Since $u w_1 v \in P$, there exists some decomposition

$$u w_1 v = x_0 c_1 x_1 \cdots x_{\kappa'-1} c_{\kappa'} x_{\kappa'}$$

that is a witness. If no set $B_i$ is the whole alphabet B, then $u w_1 v$ is at most a κ′-pattern, which is impossible since it is by definition a 2κ′-pattern. Therefore, by Remark 1, there is exactly one set $B_i$ such that $B_i = B$. It follows that the words $x_0 c_1 x_1 \cdots x_{i-1} c_i$ and $c_{i+1} x_{i+1} \cdots x_{\kappa'-1} c_{\kappa'} x_{\kappa'}$ are at most (κ′ − 1)-patterns. Therefore, they are respectively a prefix of u and a suffix of v (which are κ′-patterns). It follows that there exists some word $y_i \in B^*$ such that

$$u w_2 v = x_0 c_1 x_1 \cdots x_{i-1} c_i \, y_i \, c_{i+1} \cdots x_{\kappa'-1} c_{\kappa'} x_{\kappa'}.$$

Such a decomposition of $u w_2 v$ is a witness for membership in P. We conclude that $u w_2 v \in P$. □
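The exponent |B|κ in the Second Case of the proof of Proposition 11 can be checked on a small example. Since alph(v1) = B, every block of |B| consecutive copies of v1 contains b1 ⋯ bn as a piece, so $(v_1)^{|B|\kappa}$ is a (B, κ)-pattern, whereas $(v_1)^{\kappa}$ alone need not be. The sketch below is our own (B is encoded as the ordered string `order`, an assumption); it computes the largest p such that a word is a (B, p)-pattern by greedily matching b1 ⋯ bn cyclically:

```python
def pattern_degree(w: str, order: str) -> int:
    """Largest p with w in (B* b1 B* ... B* bn B*)^p, for B = order (w over B)."""
    n, matched = len(order), 0
    for letter in w:
        if letter == order[matched % n]:  # greedy cyclic matching of b1 ... bn
            matched += 1
    return matched // n


kappa, v = 3, "cba"  # alph(v) = B = {a, b, c}, ordered as "abc", so |B| = 3
print(pattern_degree(v * kappa, "abc"))        # 1: v^kappa is not a (B, 3)-pattern
print(pattern_degree(v * (3 * kappa), "abc"))  # 4, which is at least kappa
```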
