t f l t c - university of manitoba

39
T F L T C A S Turku Centre for Computer Science University of Turku Lemminkäisenkatu 14, 20520 Turku, Finland [email protected] M W T Michael Domaratzki Jodrey School of Computer Science, Acadia University, Wolfville, NS B4P 2R6 Canada [email protected] Abstract We survey recent results on the use of trajectories as a tool for mod- elling language operations and other, related objects. Many applications of the concept of trajectories have been developed since their introduction by Mateescu, Rozenberg and Salomaa in 1996. Areas which have seen activity include the theory of codes, language equations, modelling noisy channels, grammar models and DNA code-word design. We survey each of these ar- eas. 1 Introduction Trajectories, introduced by Mateescu, Rozenberg and Salomaa [79], are a manner in which a language operation is defined by a fixed language, used as a parameter. 1

Upload: others

Post on 16-Oct-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

T F L T C

A S

Turku Centre for Computer ScienceUniversity of Turku

Lemminkaumlisenkatu 14 20520 Turku Finlandasalomaaitutufi

MW T

Michael DomaratzkiJodrey School of Computer Science Acadia University

Wolfville NS B4P 2R6 Canadamikedomaratzkiacadiauca

Abstract

We survey recent results on the use of trajectories as a tool for mod-elling language operations and other related objects Many applications ofthe concept of trajectories have been developed since their introduction byMateescu Rozenberg and Salomaa in 1996 Areas which have seen activityinclude the theory of codes language equations modelling noisy channelsgrammar models and DNA code-word design We survey each of these ar-eas

1 Introduction

Trajectories introduced by Mateescu Rozenberg and Salomaa [79] are a mannerin which a language operation is defined by a fixed language used as a parameter

1

In this way we define infinitely many language operations as each language overthe trajectory alphabet defines a binary language operation

The simplicity and elegance of the concept of trajectories has lead to appli-cations and generalizations in a number of research areas including languageequations theory of codes DNA code-word design modelling noisy channelsand other applied areas of formal language theory

In this survey we review several recent research areas related to the use oftrajectories For a survey of early results on trajectories we refer the reader to theFormal Language Theory column ldquoWords on Trajectoriesrdquo by Mateescu [78]

2 Shuffle on Trajectories

We begin with the fundamental definition for the investigation of trajectories thatof shuffle on trajectories The shuffle on trajectories operation is a method forspecifying the ways in which two input words may be merged while preservingthe order of symbols in each word to form a result Each trajectoryt isin 01lowast

with |t|0 = n and|t|1 = mspecifies the manner in which we can form the shuffle ontrajectories of two words of lengthn (as the left input word) andm (as the rightinput word) The word resulting from the shuffle alongt will have a letter fromthe left input word in positioni if the i-th symbol oft is 0 and a letter from theright input word in positioni if the i-th symbol oft is 1

We now give the formal definition of shuffle on trajectories originally due toMateescuet al [79] Shuffle on trajectories is defined by first defining the shuffleof two wordsx andy over an alphabetΣ on a trajectoryt a word over01 Wedenote the shuffle of x andy on trajectoryt by x t y

If x = axprime y = byprime (with ab isin Σ) andt = etprime (with e isin 01) then

x etprime y =

a(xprime tprime byprime) if e= 0b(axprime tprime yprime) if e= 1

If x = axprime (a isin Σ) y = ε andt = etprime (e isin 01) then

x etprime ε =

a(xprime tprime ε) if e= 0empty otherwise

If x = ε y = byprime (b isin Σ) andt = etprime (e isin 01) then

ε etprime y =

b(ε tprime yprime) if e= 1empty otherwise

We let x ε y = empty if x y ε Finally if x = y = ε thenε t ε = ε if t = ε andempty otherwise

2

It is not difficult to see that ift =prodn

i=1 0j i 1ki for somen ge 0 and j i ki ge 0 forall 1 le i le n then we have that

x t y =nprod

i=1

xiyi x =nprod

i=1

xi y =nprod

i=1

yi

with |xi | = j i |yi | = ki for all 1 le i le n

if |x| = |t|0 and|y| = |t|1 andx t y = empty if |x| |t|0 or |y| |t|1We extend shuffle on trajectories tosets Tsube 01lowast of trajectoriesas follows

x T y =⋃tisinT

x t y

Further forL1 L2 sube Σlowast we define

L1 T L2 =⋃xisinL1yisinL2

x T y

Consider the following examples We can see that ifT = 0lowast1lowast we have thatL1 T L2 = L1L2 ieT = 0lowast1lowast gives the concatenation operation IfT = (0+1)lowastthen L1 T L2 = L1 L2 ie T = 01lowast gives the shuffle operation This isthe least restrictive set of trajectories IfT = 0lowast1lowast0lowast then T is the insertionoperationlarr (see eg Kari [50]) which is defined byxlarr y = x1yx2 x1 x2 isin

Σlowast x1x2 = x for all x y isin Σlowast See Mateescuet al [79] for the fundamental studyof shuffle on trajectories

3 Deletion along Trajectories

We now consider deletion on trajectories an important addition to the study oftrajectories and related areas The concept of deletion along trajectories was in-dependently introduced by the author [17 22] and Kari and Sosiacutek [55 56] Theprimary motivation for the introduction of deletion on trajectories is to definean ldquoinverserdquo to shuffle on trajectories in a sense we will see below Intuitivelydeletion on trajectories uses the trajectories to model language operations whichdelete an occurrence of the right argument from the left argument in a controlledscattered way

Let x y isin Σlowast be words withx = axprime y = byprime (ab isin Σ) Let t be a word overid such thatt = etprime with e isin id Then we definex t y the deletion ofyfrom x along trajectoryt as follows

xt y =

a(xprime tprime byprime) if e= ixprime tprime yprime if e= d anda = bempty otherwise

3

Also if x = axprime (a isin Σ) andt = etprime (e isin id) then

xt ε =

a(xprime tprime ε) if e= iempty otherwise

If x ε then x ε y = empty Furtherε t y = ε if t = y = ε Otherwiseε t y = empty

Let T sube idlowast ThenxT y =

⋃tisinT

xt y

We extend this to languages as expected LetL1 L2 sube Σlowast andT sube idlowast Then

L1T L2 =⋃xisinL1yisinL2

xT y

We consider the following examples of deletion along trajectories

(a) if T = ilowastdlowast thenT= the right-quotient operation(b) if T = dlowastilowast thenT= the left-quotient operation(c) if T = ilowastdlowastilowast thenT=rarr the deletion operation (see eg Kari[49 50])(d) if T = (i + d)lowast thenT= the scattered deletion operation (seeeg Itoet al [41])(e) if T = dlowastilowastdlowast thenT= the bi-polar deletion operation (seeeg Kari [50])(f) let k ge 0 andTk = ilowastdlowastilek ThenTk=rarr

k thek-deletion operation(see eg Kari and Thierrin [57])

We now recall some of the closure properties of deletion along trajectories

Theorem 31 Let Σ be an alphabet There exist weak codingsρ1 ρ2 τ ϕ and aregular language R such that for all L1 L2 sube Σ

lowast and all T sube idlowast

L1T L2 = ϕ(ρminus1

1 (L1) cap ρminus12 (L2) cap τ

minus1(T) cap R)

Corollary 32 LetL be a cone Then for all L1 L2T such that two are regularlanguages and the third is fromL L1T L2 isin L

31 Non-regular Trajectories Preserving Regularity

Consider the following result of Mateescuet al [79 Thm 51] if L1 T L2 isregular for all regular languagesL1 L2 thenT is regular This result is clear uponnoting that for allT 0lowast T 1lowast = T

4

However we note that the same result does not hold if we replace ldquoshuffleon trajectoriesrdquo by ldquodeletion along trajectoriesrdquo As motivation we begin with abasic example LetΣ be an alphabet andH = indn n ge 0 Note that

R1H R2 = x isin Σlowast existy isin R2 such thatxy isin R1 and|x| = |y|

We can establish directly (by constructing an NFA) that for all regular languagesR1R2 sube Σ

lowast the languageR1H R2 is regular HoweverH itself is not regularWe remark thatR1 H R2 is similar to proportional removals studied by

Stearns and Hartmanis [90] Amar and Putzolu [1 2] Seiferas and McNaughton[86] Kosaraju [60 61 62] Kozen [63] Zhang [96] the author [16] Bersteletal [4] and others In particular we note the case of1

2(L) given by

12

(L) = x isin Σlowast existy isin Σlowast such thatxy isin L and|x| = |y|

The operation12(L) is one of a class of operations which preserve regularity

Seiferas and McNaughton completely characterize those binary relationsr sube N2

such that the operation

P(L r) = x isin Σlowast existy isin Σlowast such thatxy isin L andr(|x| |y|)

preserves regularityRecall that a setA is ultimately periodic (up) if there existn0 p isin N p gt 0

such that for allx ge n0 x isin I lArrrArr x + p isin I Call a binary relationr sube N2

up-preservingif A up impliesrminus1(A) = i exist j isin A such thatr(i j) is alsoup Then the binary relationsr such thatP(middot r) preserves regularity are preciselythe up-preserving relations [86] This was extended to deletion on trajectories[17 22]

Theorem 33 Let r sube N2 be a binary relation and Hr = indm r(nm) TheoperationHr is regularity-preserving if and only if r is up-preserving

Theorem 33 has been extended by the author [17 22] to cover other boundedsets of trajectories

32 Algebraic Properties

Kari and Sosiacutek [56] show the following results concerning algebraic properties ofdeletion along trajectories

Theorem 34Let T sube idlowast The following three conditions are equivalent

5

(a) the operationT is commutative(b) L1T L2 sube ε for all languages L1 L2(c) T sube dlowast

Corollary 35 Given a context-free set of trajectories Tsube idlowast it is decidablewhetherT is a commutative operation

Theorem 36 For a set of trajectories Tsube idlowast the following two conditionsare equivalent

(i) For all t1 isin 1m 0n t2 isin 1n 0i and t3 isin 1j 0m+n i jmn isin N

(a) mgt 0 and t1 isin T implies t2 lt T(b) mgt 0 and t1 isin T implies t3 lt T(c) t1 isin T and0m isin T implies0n isin T(d) t1 isin T and0n isin T implies0m isin T

(ii) T is an associative operation

However the associated decidability problem is apparently open

Open Problem 37Given a regular set of trajectories Tsube idlowast is it decidablewhetherT is an associative operation What if T is context-free

33 Commutative Closure

Recall that the commutative closure of a wordw isin Σlowast is the setcom(w) = x foralla isin Σ |w|a = |x|a The commutative closure of a languageL is the union ofthe commutative closure of each word inL Note that the operationcomdoes notpreserve regularitycom((ab)lowast) = x isin ablowast |x|a = |x|b

The author Mateescu K Salomaa and Yu [23] have shown that ifT is com-plete and associative then for all languagesL we can define an a congruencerelationsimLT Then the factor monoid ofLT = (P(Σlowast) T ε) with respect tosimLT

is finite if and only ifcom(L) is a regular language [23]

4 Language Equations

The immediate consequence of the introduction of deletion on trajectories is itsapplication to language equations A language equation is an equality involvingconstant languages language operations and unknowns We refer the reader toLeiss [67] for an introduction to the theory of language equations

6

There are several facets to research on language equations For instance givena language equation or a system of language equations we can seek to character-ize the solutions to the equation or simply decide if a solution exists In theresearch on language equations involving shuffle on trajectories the focus of re-search has been the latter task

41 Zero Variable Equations

The most simple language equations are those without any variables at all Itfollows from the closure properties of T andT that it is decidable if eitherR1 T R2 = R3 or R1 T R2 = R3 holds if all of R1R2 andR3 are regular lan-guages andT is regular Below we review decidability results when the languagesand set of trajectories are not all regular

Let Ψa(T) = |t|a t isin T wherea is a letter and|t|a denotes the number ofzeroes int Kari and Sosiacutek [56] establish the following elegant result

Theorem 41 Let T be an arbitrary but fixed regular set of trajectories Theproblem ldquoIs L1 T L2 = Rrdquo is decidable for a CFL L1 and regular languagesL2R if and only ifΨ0(T) is finite

The analogous result for deletion on trajectories (ieΨi(T) is finite) also holds[56] Further ifL1 is taken to be a regular language andL2 is taken to be a CFLthen the necessary and sufficient conditions are thatΨ1(T) is finite [56]

We can contrast the above results with the following surprising result whichwas established by the author and K Salomaa [26]

Theorem 42 Let T = 0i1r0j1s0k r s r s are odd i j k ge 0 Then fora given alphabetΣ and given regular languages SR1R2 sube Σ it is undecidablewhether or not S= R1 T R2

The result is interesting since it is an undecidability result about regular lan-guages given three regular languages it is undecidable whether the given equalityholds the context-free set of trajectoriesT if fixed (and is not part of the input)

42 One Variable Equations

One variable equations involving shuffle and deletion on trajectories have beenstudied by the author [17 22] Kari and Sosiacutek [55 56] and the author and K Sa-lomaa [25] We summarize these results below

By general results of Kari on the inverse of language operations [50] the fol-lowing result on one variable language equations involving shuffle and deletionalong trajectories is immediate

7

Theorem 43Let LR be regular languages Then it is decidable whether any ofthe following equations have a solution X and if so the maximal solution (underinclusion) is an effectively constructible regular language

(a) X T L = R where Tsube 01lowast is regular(b) L T X = R where Tsube 01lowast is regular(c) XT L = R where Tsube idlowast is regular(d) LT X = R where Tsube idlowast is regular

We can also examine the question of if it is decidable whether a set of trajec-tories can be found to satisfy a fixed language equation

Theorem 44 Let L1 L2R sube Σlowast be regular languages Then it is decidablewhether

(a) there exists a set Tsube 01lowast of trajectories with L1 T L2 = R(b) there exists a set Tsube idlowast of trajectories with L1T L2 = R

We now turn to undecidability LetΠ0Π1 01lowast rarr 01lowast be the projectionsgiven byΠ0(0) = 0Π0(1) = ε andΠ1(1) = 1Π1(0) = ε We say thatT sube 01lowast

is left-enabling(respright-enabling) if Π0(T) = 0lowast (respΠ1(T) = 1lowast) It isundecidable whether the corresponding equation has a solution

Theorem 45 Fix T sube 01lowast to be a regular set of left-enabling (resp right-enabling) trajectories For a given LCFL L and regular language R it is undecid-able whether or not L T X = R (resp X T L = R) has a solution X

Say that a set of trajectories is left-preserving (resp right-preserving) ifT supe0lowast (respT supe 1lowast) We can give an incomparable result which removes the condi-tion thatT must be regular but must strengthen the conditions on words inT

Theorem 46 Fix a left-preserving (resp right-preserving) set of trajectoriesT sube 01lowast Given an LCFL L and a regular language R it is undecidable whetherthere exists a language X such that LT X = R (resp X T L = R)

By an extension of Theorem 42 we can also show that there exists a fixedcontext-free set of trajectoriesT such that on inputR1R2 (regular languages)deciding whether or not there exists a languageX such thatR1 = X T R2 isundecidable [26] We also have the following undecidability results concerningthe existence of sets of trajectories [25 22]

Theorem 47 Given an LCFL L and regular languages R1R2 it is undecidablewhether there exists Tsube 01lowast such that (a) R1 T R2 = L (b) R1 T L = R2 or(c) L T R1 = R2

Given an LCFL L and regular languages R1R2 it is undecidable whetherthere exists Tsube idlowast such that (a) R1 T R2 = L (b) L T R1 = R2 or (c)R1T L = R2

8

43 Two Variable Equations

We can also consider language equations with two variables As an exampleconsider the well-studied language equation

L = X1X2 (1)

whereL is a fixed language andX1X2 are unknown Study of this equation hasbeen undertaken by Conway [10] Kari and Thierrin [59] Salomaa and Yu [85]and Choffrut and Karhumaumlki [9] In particular it is known that ifL is a regularlanguage we can determine if a solution to (1) exists Furthermore it is knownthat if a solutionX1X2 exists there exists a maximal regular solution ie thereexistsR1R2 such thatXi sube Ri for i = 12 andL = R1R2

For language equations of the formL = X1 T X2 unlike the case of one-variable equations we do not have a general result which applies to all regularsets of trajectoriesT However a large class of trajectories are covered by theauthor and K Salomaa [25 22]

Recall that a languageL sube Σlowast is boundedif there existw1w2 wn isin Σlowast

such thatL sube wlowast1wlowast2 middot middot middotw

lowastn We say thatL is letter-boundedif wi isin Σ for all

1 le i le n The following result is due to the author and K Salomaa [25 22]

Theorem 48 Let T sube 01lowast be a letter-bounded regular set of trajectoriesThen given a regular language R it is decidable whether there exist X1X2 suchthat X1 T X2 = R

As we have mentioned this result was known for catenationT = 0lowast1lowast How-ever it also holds for eg the following operations insertion (0lowast1lowast0lowast) k-insertion(0lowast1lowast0lek for fixedk ge 0) and bi-catenation (1lowast0lowast + 0lowast1lowast)

We also note that if the equationX1 T X2 = R has a solution whereR isa regular language andT is a letter-bounded regular set of trajectories then theequation also has solutionY1 T Y2 = R whereY1Y2 are regular languages Thisresult is well-known forT = 0lowast1lowast (see eg Choffrut and Karhumaumlki [9])

For T = (0+ 1)lowast the two-variable decomposition problem is open [7]

Open Problem 49Given a regular language R is it decidable whether thereexist X1X2 (with X1X2 ε) such that R= X1 X2

Recall that a languageL is k-thin if |L cap Σn| le k for all n ge 0 The followingproblem is also open

Open Problem 410Given a k-thin set of trajectories Tsube 01lowast is it decidablegiven a regular language R whether R has a shuffle decomposition with respectto T

9

The author and K Salomaa [26] have shown that ifT sube 01lowast is a fixed1-thin set of trajectories given a regular languageR it is decidable whetherRhas a shuffle decomposition with respect toT However even for 2-thin sets oftrajectories the problem remains open [26]

We can also note that if the left and right operand are restricted to be the samethe resulting language equation problem remains decidable if the set of trajectoriesis regular and letter-bounded [25 22]

Theorem 411Fix a letter-bounded regular set of trajectories Tsube 01lowast Thenit is decidable whether there exists a solution X to the equation XT X = R for agiven regular language R

We now turn to undecidability It has been shown [7] that it is undecidablewhether a context-free language has a nontrivial shuffle decomposition with re-spect to the set of trajectories01lowast This result can be extended for arbitrarycomplete regular sets trajectories [25] (Note that ifT is a complete set of trajec-tories then any languageL has decompositionsL Tε andε TL Below weexclude these trivial decompositions all other decompositions ofL are said to benontrivial)

Theorem 412Let T be any fixed complete regular set of trajectories For a givencontext-free language L it is undecidable whether or not there exist languagesX1X2 ε such that L= X1 TX2

We also note the following open problem [26]

Open Problem 413Is it possible to construct a fixed context-free set of trajecto-ries T such that it is undecidable whether there exist languages X1X2 ε suchthat L= X1 TX2

Also open are other forms of language equation We mention only two here

Open Problem 414Find necessary and sufficient conditions on sets of trajecto-ries T1T2 so that given a regular language R it is decidable whether there existnontrivial languages X1X2X3 satisfying(X1 T1 X2) T2 X3 = R

Open Problem 415Given regular languages R1R2 is it decidable whetherthere exists nontrivial languages X1T (with T sube 01lowast) such that X1 T R1 = R2

or R1 T X1 = R2

5 Splicing on Routes

The notion of shuffle on trajectories was extended by Mateescu [77] to encompasscertain splicing operations This extension is calledsplicing on routes Splicing

10

on routes is a proper extension of shuffle on trajectories and also encompassesseveral unary operations Bel-Enguixet al also use the concept of splicing onroutes to model dialog in natural language [3]

Splicing on routes was introduced by Mateescu [77] to model generaliza-tions of the crossover splicing operation (see Mateescu [77] for a definition ofthe crossover splicing operation) Splicing on routes generalizes the crossoversplicing operation by specifying a setT of routes which restricts the way in whichsplicing can occur The result is that specific sets of routes can simulate not onlythe crossover operation but also such operations on DNA such as thesimple splic-ing and theequal-length crossoveroperations (see Mateescu for details and defi-nitions of these operations [77])

We now define the concept ofsplicing on routes and note the difference be-tween deletion along trajectories from splicing on routes which allows discardingletters from either input word In particular aroute is a wordt specified overthe alphabet0011 where informally 01 means insert the letter from theappropriate word and01 means discard that letter and continue

Formally letx y isin Σlowast andt isin 0011lowast We define the splicing ofx andydenotedx t y recursively as follows ifx = axprime y = byprime (ab isin Σ) andt = ctprime

(c isin 0011) then

x ctprime y =

a(xprime tprime y) if c = 0(xprime tprime y) if c = 0b(x tprime yprime) if c = 1(x tprime yprime) if c = 1

If x = axprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

ε ctprime y =

b(ε tprime yprime) if c = 1(ε tprime yprime) if c = 1empty otherwise

11

We havex ε y = empty if x y ε Finally we setε t ε = ε if t = ε andemptyotherwise We extendt to sets of routes and languages as expected

x T y =⋃tisinT

x t y forallT sube 0011lowast x y isin Σlowast

L1 T L2 =⋃

xisinL1yisinL2

x T y

For example ifx = abc y = cbcandT = 010011010011 thenx T y =acbcbcabbc

It turns out that splicing on routes can be simulated by a combination of shuffleon trajectories and deletion along trajectories [22]

Theorem 51 There exist weak codingsπ1 π2 0101lowast rarr idlowast and a weakcodingπ3 0101lowast rarr 01lowast such that for all t isin 0011lowast and for allx y isin Σlowast we have

x t y = (xπ1(t) Σlowast) π3(t) (yπ2(t) Σ

lowast)

Corollary 52 There exist weak codingsπ1 π2 0101lowast rarr idlowast and π3 0101lowast rarr 01lowast such that for all Tsube 0011lowast and L1 L2 sube Σ

lowast

L1 T L2 =⋃tisinT

(L1π1(t) Σlowast) π3(t) (L2π2(t) Σ

lowast)

Unfortunately the identity

L1 T L2 = (L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast)

does not hold in general even ifL1 L2 are singletons and|T | = 2 For example ifL1 = ab L2 = cd andT = 00110011 then

L1 T L2 = bcad

(L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast) = acadbcbd

However ifT is aunaryset of routes by which we mean thatT sube 00lowast1lowast

then we have the following result [22]

Corollary 53 Let T sube 00lowast1lowast Then for all Lsube Σlowast

L T Σlowast = Lπ1(T) Σ

lowast

12

We refer the reader to Mateescu [77] for a discussion of unary operationsdefined by splicing on routes As an example consider that withT = 0n0

n n ge

01lowast L T Σ

lowast = 12(L) where1

2(L) was given in Section 31Bel-Enguixet al [3] have used splicing on routes to model dialogue in natural

language Adialogueis the result of two or more actors who alternately com-municate The alternating nature of trajectory-based operations makes it naturalto apply them to modeling dialogue Routes and in particular the ability to notinsert portions of certain actorrsquos input is important for selecting different piecesof information for different dialogues ldquoDepending on the conversational goal andon the behaviour of the agents different acts will be selected rdquo [3] Bel-Enguixetal [3] note three important classes of routes used in dialogue

(a) Imbrication in a dialogue is the result of applying a route from the set ofroutes defined by the regular expression ((01)+ (01))+ Imbrication repre-sents the ldquoperfect conversationrdquo where actors strictly alternate

(b) Concatenationin a dialogue is the result of applying a route from the setof route defined by the regular expression (0+ 0)+(1 + 1)+ According toBel-Enguix concatenation has applications to more formal dialogues suchas debates and round tables [3]

(c) Merging is the result of applying a route which does not fall into the cat-egories of imbrication or concatenation Merging is more common thanimbrication and concatenation in informal dialogues

Bel-Enguixet al [3] mention several open areas of research relating to theuse of splicing on routes for modelling dialogue In particular they ask aboutestablishing a mechanism in order to select routes depending on context Thiswould establish a more semantic operation in the terminology of Section 6 below

6 Semantic Shuffle on Trajectories

In the paper which introduced shuffle on trajectories Mateescuet almake a dis-tinction betweensyntacticandsemanticoperations on words

[Shuffle on trajectories is] based on syntactic constraints on the shuffleoperations The constraints are referred to as syntactic constraintssince they do not concern properties of the words that are shuffled orproperties of the letters that occur in these words

Instead the constraints involve the general strategy to switch fromone word to another word Once such a strategy is defined the struc-ture of the words that are shuffled does not play any role

13

However constraints that take into consideration the inner structureof the words that are shuffled together are referred to as semantic con-straints [79 p 2]

The author [19] has introduced a semantic variant of shuffle on trajectoriesnaturally calledsemantic shuffle on trajectories(SST) The corresponding no-tion for deletion on trajectories is calledsemantic deletion on trajectories(SDT)The advantages of SST and SDT are that they preserves many of the desirableproperties of the usual syntactic shuffle on trajectories while being capable ofsimulating more operations of interest

The two semantic constructs we introduce aresynchronizationand contentrestriction Synchronization allows for only one letter to be output for two cor-responding identical symbols in the input words Content restriction allows atrajectory to specify that a particular letter must appear at a specific point This isinspired by bio-informatical operations where operations occur only in the con-text of certain subsequences of the DNA strand

Before we define SST we define the trajectory alphabet LetΓ = 01 σ Forany alphabetΣ letΓΣ = Γ cup (Γ times Σ) For ease of readability we denote [ca] by

ac

for all a isin Σ andc isin ΓWe can now define the SST operation LetΣ be an alphabett isin Γlowast

Σand

x y isin Σlowast Then the SST ofx andy alongt denotedx t y is defined as followsIf x = axprime y = byprime (whereab isin Σ xprime yprime isin Σlowast) andt = ctprime wherec isin ΓΣ andtprime isin Γlowast

Σ then

x t y =

a(xprime tprime y) if c isin 0

a0

b(x tprime yprime) if c isin 1b1

a(xprime tprime yprime) if a = b andc isin σaσ

empty otherwise

If x = axprime y = ε andt = ctprime then

x t ε =

a(xprime tprime ε) if c isin 0a0

empty otherwise

If x = ε y = byprime andt = ctprime then

ε t y =

b(ε tprime yprime) if c isin 1b1

empty otherwise

If x = y = ε thenx t y = ε if t = ε andempty otherwise Finally ifx y ε thenx ε y = empty If x y isin Σlowast andT sube Γlowast

Σ thenx T y = cuptisinT x T y If L1 L2 sube Σ

lowast and

14

T sube ΓlowastΣ thenL1 T L2 = cupxisinL1yisinL2 x T y The associated deletion operation over

the alphabet∆Σ = ∆ cup (∆ times Σ) (where∆ = id σ) is defined in the natural way[19]

We now consider givenΣ and two sets of trajectoriesT1T2 sube ΓlowastΣ whether

the operations T1 T2 coincide that is whetherL1 T1 L2 = L1 T2 L2 for alllanguagesL1 L2 sube Σ

lowast If T1 T2 represent in this sense the same operation wesay thatT1T2 areequivalentsets of trajectories

We note that it is possible for two distinct sets of trajectoriesT1T2 sube ΓlowastΣ

to be

equivalent As a simple example considerT1 = a0

a1 andT2 =

a1

a0 Note that for

i = 12

L1 Ti L2 =

aa if L1 cap L2 supe aempty otherwise

ThusT1T2 are equivalent but not equalBy using a special case of partial commutation and trace languages (see Diek-

ert and Meacutetivier [15]) we can show that two sets of trajectoriesT1T2 sube ΓlowastΣ

areequivalent if and only if their corresponding trace languages (under the naturalmorphism) are equivalent This implies the following important decidability re-sult

Theorem 61 Let Σ be an alphabet and T1T2 sube ΓlowastΣ If T1T2 are regular it is

decidable whether T1 and T2 are equivalent

We can now consider some examples of the power of SST and SDT We firstnote that SST consists of a valid extension of the shuffle on trajectories ifT sube01lowast thenL1 T L2 = L1 T L2 the syntactic shuffle on trajectories operation[79] We also note that ifT = σlowast then T = cap

Given the very natural set of trajectoriesT = (0 + 1 + σ)lowast T denotes theinfiltration product uarr see eg Pin and Sakarovitch [81] The infiltration productis defined as follows ifx = x1 xn is a word of lengthn andI = (i1 i2 ir) isa subsequence of (12 n) let xI = xi1 xi2 middot middot middot xir Then givenx y isin Σlowast

x uarr y = z isin Σlowast existI J sube [|z|] such thatI cup J = [|z|] zI = x andzJ = y

For exampleabuarr ba= abababbaabbabaabbaababWe now show how to use SST and SDT to simulate ciliate bio-operations

which have been the subject of recent research in the literature A model of cil-iate bio-operations without circular variants were introduced by Daleyet al [11]to mimic the manner in which DNA is unscrambled in the DNA of certain uni-cellular ciliates in the process of asexual reproduction Ciliate bio-operations arealso investigated by Ehrenfeuchtet al [28] Prescottet al [82] and Daley andMcQuillan [12] using various approaches

15

Daley et al [11] define several language operations which simulate ciliatebio-operations including synchronized insertion deletion and bi-polar deletionSynchronized insertion can be given as follows

α oplus β = uavaw a isin Σ α = uaw β = va

The operation is extended to languages as usual LetT =⋃

aisinΣ 0lowasta0 1lowast

a1 0lowast Then

for all L1 L2 sube Σlowast L1 oplus L2 = L1 T L2 The operations of synchronized deletion

and synchronized bi-polar deletion (also defined by Daleyet al [11]) can also besimulated by SDT

Contextual insertion and deletion were introduced by Kari and Thierrin as asimple set of operations which are capable for modelling DNA computing [58]Let Σ be an alphabet and [x y] isin (Σlowast)2 We call [x y] a context Then givenvu isin Σlowast the [x y]-contextual insertionof v into u is given byu larrminus[xy] v = u1xvyu2 u = u1xyu2u1u2 isin Σ

lowast Let C sube (Σlowast)2 Then

u larrminusC v =⋃

[xy]isinC

u larrminus[xy] v

The operationlarrminusC is extended to languages monotonically as expected Letx =x1 middot middot middot xn andy = y1 middot middot middot ym be arbitrary words overΣ Then define

T[xy] = 0lowastnprod

i=1

xi

0 1lowastmprod

i=1

yi

0 0lowast

We naturally extend this toTC = cup[xy]isinCT[xy] for all C sube (Σlowast)2 Under this defini-tion it is clear that TC =

larrminusC for all C sube (Σlowast)2

Kari and Thierrin note that ifC sube (Σlowast)2 is finite then the regular and context-free languages are closed underlarrminus

C As TC is regular for all finiteC we notethat the closure of the regular languages underlarrminus

C is a consequence of the closureproperties of SST It is known that the CFLs are closed underlarrminus

C [58] Howeverin general the CFLs are not closed underT This leads to the following openproblem

Open Problem 62Find necessary and sufficient (language-theoretic) conditionson a set of trajectories Tsube Γlowast

Σsuch that the CFLs overΣ are closed under T

We note that [x y]-contextual deletion and [x y]-contextual bi-polar deletion[58] can be simulated by SDT [19]

Recall thatsynchronized shuffle (see eg Latteux and Roos [66]) is defined asfollows LetΣ1Σ2 be alphabets not necessarily disjoint Letρi (Σ1 cup Σ2) rarr Σi

be the projection ontoΣi given byρi(a) = a for all a isin Σi andρi(a) = ε for alla isin (Σ1 cup Σ2) minus Σi

16

Let Li sube Σi for i = 12 Then the synchronized shuffle of L1 andL2 denotedL1 L2 is given by

L1 L2 = ρminus11 (L1) cap ρ

minus12 (L2)

Lemma 63 LetΣ1Σ2 be alphabets Let Tsube ΓlowastΣ1cupΣ2

be given by

T = ((⋃

aisinΣ1capΣ2

aσ) + (

⋃aisinΣ1minusΣ2

a0) + (

⋃aisinΣ2minusΣ1

a1))lowast

Then for all L1 L2 such that Li sube Σi for i = 12 L1 L2 = L1 T L2

Further examples of operations simulated by SST are described by the author[19]

We note that many of the results on language equations hold for SST In par-ticular the following are new results on bio-operations which have not been notedbefore [19]

Corollary 64 Let R be a regular language Then it is decidable whether thereexist languages X1X2 such that R= X1 oplus X2

Corollary 65 Let C sube (Σlowast)2 be a finite set of contexts Let R be a regularlanguage Then it is decidable whether there exist languages X1X2 such thatR= X1

larrC X2

7 Descriptional Complexity

Descriptional complexity of formal languages deals with the problems of concisedescriptions of languages in terms of generative or accepting devices For in-stance the(deterministic) state complexityof a regular languageL is the minimalnumber of states in any deterministic finite automaton acceptingL [94] Nonde-terministic state complexity of a regular language is similarly defined (see egHolzer and Kutrib [36])

For shuffle on trajectories Mateescuet al [79] and Harjuet al [34] both giveproofs that given a regular set of trajectoriesT and regular languagesL1 L2 theoperationL1 T L2 always yields a regular language Thus it is reasonable toconsider the state complexity of shuffle on trajectories The author and K Salo-maa [24] have obtained results in this area We state an upper bound in terms ofnondeterministic state complexity

Lemma 71 Let L1 L2 be regular languages overΣlowast and Tsube 01lowast be a regularset of trajectories Then

sc(L1 T L2) le 2nsc(L1)nsc(L2)nsc(T)

17

The following open problem appears to be very challenging

Open Problem 72 For what regular sets of trajectories Tsube 01lowast does theconstruction given by Lemma 71 give a construction which is best possible

Consider unrestricted shuffle given by the set of trajectoriesT = (0 + 1)lowastThe bound of Lemma 71 in this case is 2nsc(L1)nsc(L2) Cacircmpeanuet al [8] haveshown that there exist languagesL1 andL2 accepted by incomplete DFAs havingrespectivelyn andm states such that any incomplete DFA acceptingL1 L2 hasat least 2nm minus 1 states This bound is optimal for incomplete DFAs howeverfor complete DFAs it gives only the lower bound 2(sc(L1)minus1)(sc(L2)minus1) However weregard this as near enough to our goal of Lemma 71 for our purposes ie weregardT = (0 + 1)lowast as an example of a set of trajectoriesT satisfying OpenProblem 72

The density functionof a languageL sube Σlowast is defined bypL N rarr N aspL(n) = |L cap Σn| for all n ge 0 That is pL(n) gives the number of words oflengthn in L By the density of a languageL we informally mean the asymptoticbehaviour ofpL The following important result of Szilardet al [91 Thm 3]characterizes the density of regular languages

Theorem 73 A regular language R overΣ satisfies pR(n) isin O(nk) k ge 0 ifand only if R can be represented as a finite union of regular expressions of thefollowing form xylowast1z1 middot middot middot ylowastt zt where x y1 z1 middot middot middot yt zt isin Σ

lowast and0 le t le k+ 1

Call a languageL slenderif pL(n) isin O(1) [83] Let R be a regular languagewhich has polynomial densityO(nk) and lett be the smallest integer such thatR = cupt

i=1xiylowasti1zi1 middot middot middot ylowastikiziki 0 le ki le k + 1 i = 1 t Then callt the UkL-

indexof L If k = 0 we callt theUSL-indexof L (languages with USL indextare calledt-thin by Paun and Salomaa [83] slender regular languages were alsocharacterized independently by Shallit [87 Lemma 3 p 336])

Lemma 74 Let T = uvlowast where u v isin 01lowast Let Li be regular languages overΣwith sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le |uv|n1n2 (2)

We now give a bound for sets of trajectoriesT = uvlowastw with w ε

Lemma 75 Let T = uvlowastw where u vw isin 01lowast and w ε Let Li be regularlanguages overΣ with sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le n1n2

|u| + 1+ |v|(n1n2)

lceil|w||v|

rceil+1minus n1n2

n1n2 minus 1

(3)

18

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

In this way we define infinitely many language operations as each language overthe trajectory alphabet defines a binary language operation

The simplicity and elegance of the concept of trajectories has lead to appli-cations and generalizations in a number of research areas including languageequations theory of codes DNA code-word design modelling noisy channelsand other applied areas of formal language theory

In this survey we review several recent research areas related to the use oftrajectories For a survey of early results on trajectories we refer the reader to theFormal Language Theory column ldquoWords on Trajectoriesrdquo by Mateescu [78]

2 Shuffle on Trajectories

We begin with the fundamental definition for the investigation of trajectories thatof shuffle on trajectories The shuffle on trajectories operation is a method forspecifying the ways in which two input words may be merged while preservingthe order of symbols in each word to form a result Each trajectoryt isin 01lowast

with |t|0 = n and|t|1 = mspecifies the manner in which we can form the shuffle ontrajectories of two words of lengthn (as the left input word) andm (as the rightinput word) The word resulting from the shuffle alongt will have a letter fromthe left input word in positioni if the i-th symbol oft is 0 and a letter from theright input word in positioni if the i-th symbol oft is 1

We now give the formal definition of shuffle on trajectories originally due toMateescuet al [79] Shuffle on trajectories is defined by first defining the shuffleof two wordsx andy over an alphabetΣ on a trajectoryt a word over01 Wedenote the shuffle of x andy on trajectoryt by x t y

If x = axprime y = byprime (with ab isin Σ) andt = etprime (with e isin 01) then

x etprime y =

a(xprime tprime byprime) if e= 0b(axprime tprime yprime) if e= 1

If x = axprime (a isin Σ) y = ε andt = etprime (e isin 01) then

x etprime ε =

a(xprime tprime ε) if e= 0empty otherwise

If x = ε y = byprime (b isin Σ) andt = etprime (e isin 01) then

ε etprime y =

b(ε tprime yprime) if e= 1empty otherwise

We let x ε y = empty if x y ε Finally if x = y = ε thenε t ε = ε if t = ε andempty otherwise

2

It is not difficult to see that ift =prodn

i=1 0j i 1ki for somen ge 0 and j i ki ge 0 forall 1 le i le n then we have that

x t y =nprod

i=1

xiyi x =nprod

i=1

xi y =nprod

i=1

yi

with |xi | = j i |yi | = ki for all 1 le i le n

if |x| = |t|0 and|y| = |t|1 andx t y = empty if |x| |t|0 or |y| |t|1We extend shuffle on trajectories tosets Tsube 01lowast of trajectoriesas follows

x T y =⋃tisinT

x t y

Further forL1 L2 sube Σlowast we define

L1 T L2 =⋃xisinL1yisinL2

x T y

Consider the following examples We can see that ifT = 0lowast1lowast we have thatL1 T L2 = L1L2 ieT = 0lowast1lowast gives the concatenation operation IfT = (0+1)lowastthen L1 T L2 = L1 L2 ie T = 01lowast gives the shuffle operation This isthe least restrictive set of trajectories IfT = 0lowast1lowast0lowast then T is the insertionoperationlarr (see eg Kari [50]) which is defined byxlarr y = x1yx2 x1 x2 isin

Σlowast x1x2 = x for all x y isin Σlowast See Mateescuet al [79] for the fundamental studyof shuffle on trajectories

3 Deletion along Trajectories

We now consider deletion on trajectories an important addition to the study oftrajectories and related areas The concept of deletion along trajectories was in-dependently introduced by the author [17 22] and Kari and Sosiacutek [55 56] Theprimary motivation for the introduction of deletion on trajectories is to definean ldquoinverserdquo to shuffle on trajectories in a sense we will see below Intuitivelydeletion on trajectories uses the trajectories to model language operations whichdelete an occurrence of the right argument from the left argument in a controlledscattered way

Let x y isin Σlowast be words withx = axprime y = byprime (ab isin Σ) Let t be a word overid such thatt = etprime with e isin id Then we definex t y the deletion ofyfrom x along trajectoryt as follows

xt y =

a(xprime tprime byprime) if e= ixprime tprime yprime if e= d anda = bempty otherwise

3

Also if x = axprime (a isin Σ) andt = etprime (e isin id) then

xt ε =

a(xprime tprime ε) if e= iempty otherwise

If x ε then x ε y = empty Furtherε t y = ε if t = y = ε Otherwiseε t y = empty

Let T sube idlowast ThenxT y =

⋃tisinT

xt y

We extend this to languages as expected LetL1 L2 sube Σlowast andT sube idlowast Then

L1T L2 =⋃xisinL1yisinL2

xT y

We consider the following examples of deletion along trajectories

(a) if T = ilowastdlowast thenT= the right-quotient operation(b) if T = dlowastilowast thenT= the left-quotient operation(c) if T = ilowastdlowastilowast thenT=rarr the deletion operation (see eg Kari[49 50])(d) if T = (i + d)lowast thenT= the scattered deletion operation (seeeg Itoet al [41])(e) if T = dlowastilowastdlowast thenT= the bi-polar deletion operation (seeeg Kari [50])(f) let k ge 0 andTk = ilowastdlowastilek ThenTk=rarr

k thek-deletion operation(see eg Kari and Thierrin [57])

We now recall some of the closure properties of deletion along trajectories

Theorem 31 Let Σ be an alphabet There exist weak codingsρ1 ρ2 τ ϕ and aregular language R such that for all L1 L2 sube Σ

lowast and all T sube idlowast

L1T L2 = ϕ(ρminus1

1 (L1) cap ρminus12 (L2) cap τ

minus1(T) cap R)

Corollary 32 LetL be a cone Then for all L1 L2T such that two are regularlanguages and the third is fromL L1T L2 isin L

31 Non-regular Trajectories Preserving Regularity

Consider the following result of Mateescuet al [79 Thm 51] if L1 T L2 isregular for all regular languagesL1 L2 thenT is regular This result is clear uponnoting that for allT 0lowast T 1lowast = T

4

However we note that the same result does not hold if we replace ldquoshuffleon trajectoriesrdquo by ldquodeletion along trajectoriesrdquo As motivation we begin with abasic example LetΣ be an alphabet andH = indn n ge 0 Note that

R1H R2 = x isin Σlowast existy isin R2 such thatxy isin R1 and|x| = |y|

We can establish directly (by constructing an NFA) that for all regular languagesR1R2 sube Σ

lowast the languageR1H R2 is regular HoweverH itself is not regularWe remark thatR1 H R2 is similar to proportional removals studied by

Stearns and Hartmanis [90] Amar and Putzolu [1 2] Seiferas and McNaughton[86] Kosaraju [60 61 62] Kozen [63] Zhang [96] the author [16] Bersteletal [4] and others In particular we note the case of1

2(L) given by

12

(L) = x isin Σlowast existy isin Σlowast such thatxy isin L and|x| = |y|

The operation12(L) is one of a class of operations which preserve regularity

Seiferas and McNaughton completely characterize those binary relationsr sube N2

such that the operation

P(L r) = x isin Σlowast existy isin Σlowast such thatxy isin L andr(|x| |y|)

preserves regularityRecall that a setA is ultimately periodic (up) if there existn0 p isin N p gt 0

such that for allx ge n0 x isin I lArrrArr x + p isin I Call a binary relationr sube N2

up-preservingif A up impliesrminus1(A) = i exist j isin A such thatr(i j) is alsoup Then the binary relationsr such thatP(middot r) preserves regularity are preciselythe up-preserving relations [86] This was extended to deletion on trajectories[17 22]

Theorem 33 Let r sube N2 be a binary relation and Hr = indm r(nm) TheoperationHr is regularity-preserving if and only if r is up-preserving

Theorem 33 has been extended by the author [17 22] to cover other boundedsets of trajectories

32 Algebraic Properties

Kari and Sosiacutek [56] show the following results concerning algebraic properties ofdeletion along trajectories

Theorem 34Let T sube idlowast The following three conditions are equivalent

5

(a) the operationT is commutative(b) L1T L2 sube ε for all languages L1 L2(c) T sube dlowast

Corollary 35 Given a context-free set of trajectories Tsube idlowast it is decidablewhetherT is a commutative operation

Theorem 36 For a set of trajectories Tsube idlowast the following two conditionsare equivalent

(i) For all t1 isin 1m 0n t2 isin 1n 0i and t3 isin 1j 0m+n i jmn isin N

(a) mgt 0 and t1 isin T implies t2 lt T(b) mgt 0 and t1 isin T implies t3 lt T(c) t1 isin T and0m isin T implies0n isin T(d) t1 isin T and0n isin T implies0m isin T

(ii) T is an associative operation

However the associated decidability problem is apparently open

Open Problem 37Given a regular set of trajectories Tsube idlowast is it decidablewhetherT is an associative operation What if T is context-free

33 Commutative Closure

Recall that the commutative closure of a wordw isin Σlowast is the setcom(w) = x foralla isin Σ |w|a = |x|a The commutative closure of a languageL is the union ofthe commutative closure of each word inL Note that the operationcomdoes notpreserve regularitycom((ab)lowast) = x isin ablowast |x|a = |x|b

The author Mateescu K Salomaa and Yu [23] have shown that ifT is com-plete and associative then for all languagesL we can define an a congruencerelationsimLT Then the factor monoid ofLT = (P(Σlowast) T ε) with respect tosimLT

is finite if and only ifcom(L) is a regular language [23]

4 Language Equations

The immediate consequence of the introduction of deletion on trajectories is itsapplication to language equations A language equation is an equality involvingconstant languages language operations and unknowns We refer the reader toLeiss [67] for an introduction to the theory of language equations

6

There are several facets to research on language equations For instance givena language equation or a system of language equations we can seek to character-ize the solutions to the equation or simply decide if a solution exists In theresearch on language equations involving shuffle on trajectories the focus of re-search has been the latter task

41 Zero Variable Equations

The most simple language equations are those without any variables at all Itfollows from the closure properties of T andT that it is decidable if eitherR1 T R2 = R3 or R1 T R2 = R3 holds if all of R1R2 andR3 are regular lan-guages andT is regular Below we review decidability results when the languagesand set of trajectories are not all regular

Let Ψa(T) = |t|a t isin T wherea is a letter and|t|a denotes the number ofzeroes int Kari and Sosiacutek [56] establish the following elegant result

Theorem 41 Let T be an arbitrary but fixed regular set of trajectories Theproblem ldquoIs L1 T L2 = Rrdquo is decidable for a CFL L1 and regular languagesL2R if and only ifΨ0(T) is finite

The analogous result for deletion on trajectories (ieΨi(T) is finite) also holds[56] Further ifL1 is taken to be a regular language andL2 is taken to be a CFLthen the necessary and sufficient conditions are thatΨ1(T) is finite [56]

We can contrast the above results with the following surprising result whichwas established by the author and K Salomaa [26]

Theorem 42 Let T = 0i1r0j1s0k r s r s are odd i j k ge 0 Then fora given alphabetΣ and given regular languages SR1R2 sube Σ it is undecidablewhether or not S= R1 T R2

The result is interesting since it is an undecidability result about regular lan-guages given three regular languages it is undecidable whether the given equalityholds the context-free set of trajectoriesT if fixed (and is not part of the input)

42 One Variable Equations

One variable equations involving shuffle and deletion on trajectories have beenstudied by the author [17 22] Kari and Sosiacutek [55 56] and the author and K Sa-lomaa [25] We summarize these results below

By general results of Kari on the inverse of language operations [50] the fol-lowing result on one variable language equations involving shuffle and deletionalong trajectories is immediate

7

Theorem 43Let LR be regular languages Then it is decidable whether any ofthe following equations have a solution X and if so the maximal solution (underinclusion) is an effectively constructible regular language

(a) X T L = R where Tsube 01lowast is regular(b) L T X = R where Tsube 01lowast is regular(c) XT L = R where Tsube idlowast is regular(d) LT X = R where Tsube idlowast is regular

We can also examine the question of if it is decidable whether a set of trajec-tories can be found to satisfy a fixed language equation

Theorem 44 Let L1 L2R sube Σlowast be regular languages Then it is decidablewhether

(a) there exists a set Tsube 01lowast of trajectories with L1 T L2 = R(b) there exists a set Tsube idlowast of trajectories with L1T L2 = R

We now turn to undecidability LetΠ0Π1 01lowast rarr 01lowast be the projectionsgiven byΠ0(0) = 0Π0(1) = ε andΠ1(1) = 1Π1(0) = ε We say thatT sube 01lowast

is left-enabling(respright-enabling) if Π0(T) = 0lowast (respΠ1(T) = 1lowast) It isundecidable whether the corresponding equation has a solution

Theorem 45 Fix T sube 01lowast to be a regular set of left-enabling (resp right-enabling) trajectories For a given LCFL L and regular language R it is undecid-able whether or not L T X = R (resp X T L = R) has a solution X

Say that a set of trajectories is left-preserving (resp right-preserving) ifT supe0lowast (respT supe 1lowast) We can give an incomparable result which removes the condi-tion thatT must be regular but must strengthen the conditions on words inT

Theorem 46 Fix a left-preserving (resp right-preserving) set of trajectoriesT sube 01lowast Given an LCFL L and a regular language R it is undecidable whetherthere exists a language X such that LT X = R (resp X T L = R)

By an extension of Theorem 42 we can also show that there exists a fixedcontext-free set of trajectoriesT such that on inputR1R2 (regular languages)deciding whether or not there exists a languageX such thatR1 = X T R2 isundecidable [26] We also have the following undecidability results concerningthe existence of sets of trajectories [25 22]

Theorem 47 Given an LCFL L and regular languages R1R2 it is undecidablewhether there exists Tsube 01lowast such that (a) R1 T R2 = L (b) R1 T L = R2 or(c) L T R1 = R2

Given an LCFL L and regular languages R1R2 it is undecidable whetherthere exists Tsube idlowast such that (a) R1 T R2 = L (b) L T R1 = R2 or (c)R1T L = R2

8

43 Two Variable Equations

We can also consider language equations with two variables As an exampleconsider the well-studied language equation

L = X1X2 (1)

whereL is a fixed language andX1X2 are unknown Study of this equation hasbeen undertaken by Conway [10] Kari and Thierrin [59] Salomaa and Yu [85]and Choffrut and Karhumaumlki [9] In particular it is known that ifL is a regularlanguage we can determine if a solution to (1) exists Furthermore it is knownthat if a solutionX1X2 exists there exists a maximal regular solution ie thereexistsR1R2 such thatXi sube Ri for i = 12 andL = R1R2

For language equations of the formL = X1 T X2 unlike the case of one-variable equations we do not have a general result which applies to all regularsets of trajectoriesT However a large class of trajectories are covered by theauthor and K Salomaa [25 22]

Recall that a languageL sube Σlowast is boundedif there existw1w2 wn isin Σlowast

such thatL sube wlowast1wlowast2 middot middot middotw

lowastn We say thatL is letter-boundedif wi isin Σ for all

1 le i le n The following result is due to the author and K Salomaa [25 22]

Theorem 48 Let T sube 01lowast be a letter-bounded regular set of trajectoriesThen given a regular language R it is decidable whether there exist X1X2 suchthat X1 T X2 = R

As we have mentioned this result was known for catenationT = 0lowast1lowast How-ever it also holds for eg the following operations insertion (0lowast1lowast0lowast) k-insertion(0lowast1lowast0lek for fixedk ge 0) and bi-catenation (1lowast0lowast + 0lowast1lowast)

We also note that if the equationX1 T X2 = R has a solution whereR isa regular language andT is a letter-bounded regular set of trajectories then theequation also has solutionY1 T Y2 = R whereY1Y2 are regular languages Thisresult is well-known forT = 0lowast1lowast (see eg Choffrut and Karhumaumlki [9])

For T = (0+ 1)lowast the two-variable decomposition problem is open [7]

Open Problem 49Given a regular language R is it decidable whether thereexist X1X2 (with X1X2 ε) such that R= X1 X2

Recall that a languageL is k-thin if |L cap Σn| le k for all n ge 0 The followingproblem is also open

Open Problem 410Given a k-thin set of trajectories Tsube 01lowast is it decidablegiven a regular language R whether R has a shuffle decomposition with respectto T

9

The author and K Salomaa [26] have shown that ifT sube 01lowast is a fixed1-thin set of trajectories given a regular languageR it is decidable whetherRhas a shuffle decomposition with respect toT However even for 2-thin sets oftrajectories the problem remains open [26]

We can also note that if the left and right operand are restricted to be the samethe resulting language equation problem remains decidable if the set of trajectoriesis regular and letter-bounded [25 22]

Theorem 411Fix a letter-bounded regular set of trajectories Tsube 01lowast Thenit is decidable whether there exists a solution X to the equation XT X = R for agiven regular language R

We now turn to undecidability It has been shown [7] that it is undecidablewhether a context-free language has a nontrivial shuffle decomposition with re-spect to the set of trajectories01lowast This result can be extended for arbitrarycomplete regular sets trajectories [25] (Note that ifT is a complete set of trajec-tories then any languageL has decompositionsL Tε andε TL Below weexclude these trivial decompositions all other decompositions ofL are said to benontrivial)

Theorem 412Let T be any fixed complete regular set of trajectories For a givencontext-free language L it is undecidable whether or not there exist languagesX1X2 ε such that L= X1 TX2

We also note the following open problem [26]

Open Problem 413Is it possible to construct a fixed context-free set of trajecto-ries T such that it is undecidable whether there exist languages X1X2 ε suchthat L= X1 TX2

Also open are other forms of language equation We mention only two here

Open Problem 414Find necessary and sufficient conditions on sets of trajecto-ries T1T2 so that given a regular language R it is decidable whether there existnontrivial languages X1X2X3 satisfying(X1 T1 X2) T2 X3 = R

Open Problem 415Given regular languages R1R2 is it decidable whetherthere exists nontrivial languages X1T (with T sube 01lowast) such that X1 T R1 = R2

or R1 T X1 = R2

5 Splicing on Routes

The notion of shuffle on trajectories was extended by Mateescu [77] to encompasscertain splicing operations This extension is calledsplicing on routes Splicing

10

on routes is a proper extension of shuffle on trajectories and also encompassesseveral unary operations Bel-Enguixet al also use the concept of splicing onroutes to model dialog in natural language [3]

Splicing on routes was introduced by Mateescu [77] to model generaliza-tions of the crossover splicing operation (see Mateescu [77] for a definition ofthe crossover splicing operation) Splicing on routes generalizes the crossoversplicing operation by specifying a setT of routes which restricts the way in whichsplicing can occur The result is that specific sets of routes can simulate not onlythe crossover operation but also such operations on DNA such as thesimple splic-ing and theequal-length crossoveroperations (see Mateescu for details and defi-nitions of these operations [77])

We now define the concept ofsplicing on routes and note the difference be-tween deletion along trajectories from splicing on routes which allows discardingletters from either input word In particular aroute is a wordt specified overthe alphabet0011 where informally 01 means insert the letter from theappropriate word and01 means discard that letter and continue

Formally letx y isin Σlowast andt isin 0011lowast We define the splicing ofx andydenotedx t y recursively as follows ifx = axprime y = byprime (ab isin Σ) andt = ctprime

(c isin 0011) then

x ctprime y =

a(xprime tprime y) if c = 0(xprime tprime y) if c = 0b(x tprime yprime) if c = 1(x tprime yprime) if c = 1

If x = axprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

ε ctprime y =

b(ε tprime yprime) if c = 1(ε tprime yprime) if c = 1empty otherwise

11

We havex ε y = empty if x y ε Finally we setε t ε = ε if t = ε andemptyotherwise We extendt to sets of routes and languages as expected

x T y =⋃tisinT

x t y forallT sube 0011lowast x y isin Σlowast

L1 T L2 =⋃

xisinL1yisinL2

x T y

For example ifx = abc y = cbcandT = 010011010011 thenx T y =acbcbcabbc

It turns out that splicing on routes can be simulated by a combination of shuffleon trajectories and deletion along trajectories [22]

Theorem 51 There exist weak codingsπ1 π2 0101lowast rarr idlowast and a weakcodingπ3 0101lowast rarr 01lowast such that for all t isin 0011lowast and for allx y isin Σlowast we have

x t y = (xπ1(t) Σlowast) π3(t) (yπ2(t) Σ

lowast)

Corollary 52 There exist weak codingsπ1 π2 0101lowast rarr idlowast and π3 0101lowast rarr 01lowast such that for all Tsube 0011lowast and L1 L2 sube Σ

lowast

L1 T L2 =⋃tisinT

(L1π1(t) Σlowast) π3(t) (L2π2(t) Σ

lowast)

Unfortunately the identity

L1 T L2 = (L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast)

does not hold in general even ifL1 L2 are singletons and|T | = 2 For example ifL1 = ab L2 = cd andT = 00110011 then

L1 T L2 = bcad

(L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast) = acadbcbd

However ifT is aunaryset of routes by which we mean thatT sube 00lowast1lowast

then we have the following result [22]

Corollary 53 Let T sube 00lowast1lowast Then for all Lsube Σlowast

L T Σlowast = Lπ1(T) Σ

lowast

12

We refer the reader to Mateescu [77] for a discussion of unary operationsdefined by splicing on routes As an example consider that withT = 0n0

n n ge

01lowast L T Σ

lowast = 12(L) where1

2(L) was given in Section 31Bel-Enguixet al [3] have used splicing on routes to model dialogue in natural

language Adialogueis the result of two or more actors who alternately com-municate The alternating nature of trajectory-based operations makes it naturalto apply them to modeling dialogue Routes and in particular the ability to notinsert portions of certain actorrsquos input is important for selecting different piecesof information for different dialogues ldquoDepending on the conversational goal andon the behaviour of the agents different acts will be selected rdquo [3] Bel-Enguixetal [3] note three important classes of routes used in dialogue

(a) Imbrication in a dialogue is the result of applying a route from the set ofroutes defined by the regular expression ((01)+ (01))+ Imbrication repre-sents the ldquoperfect conversationrdquo where actors strictly alternate

(b) Concatenationin a dialogue is the result of applying a route from the setof route defined by the regular expression (0+ 0)+(1 + 1)+ According toBel-Enguix concatenation has applications to more formal dialogues suchas debates and round tables [3]

(c) Merging is the result of applying a route which does not fall into the cat-egories of imbrication or concatenation Merging is more common thanimbrication and concatenation in informal dialogues

Bel-Enguixet al [3] mention several open areas of research relating to theuse of splicing on routes for modelling dialogue In particular they ask aboutestablishing a mechanism in order to select routes depending on context Thiswould establish a more semantic operation in the terminology of Section 6 below

6 Semantic Shuffle on Trajectories

In the paper which introduced shuffle on trajectories Mateescuet almake a dis-tinction betweensyntacticandsemanticoperations on words

[Shuffle on trajectories is] based on syntactic constraints on the shuffleoperations The constraints are referred to as syntactic constraintssince they do not concern properties of the words that are shuffled orproperties of the letters that occur in these words

Instead the constraints involve the general strategy to switch fromone word to another word Once such a strategy is defined the struc-ture of the words that are shuffled does not play any role

13

However constraints that take into consideration the inner structureof the words that are shuffled together are referred to as semantic con-straints [79 p 2]

The author [19] has introduced a semantic variant of shuffle on trajectoriesnaturally calledsemantic shuffle on trajectories(SST) The corresponding no-tion for deletion on trajectories is calledsemantic deletion on trajectories(SDT)The advantages of SST and SDT are that they preserves many of the desirableproperties of the usual syntactic shuffle on trajectories while being capable ofsimulating more operations of interest

The two semantic constructs we introduce aresynchronizationand contentrestriction Synchronization allows for only one letter to be output for two cor-responding identical symbols in the input words Content restriction allows atrajectory to specify that a particular letter must appear at a specific point This isinspired by bio-informatical operations where operations occur only in the con-text of certain subsequences of the DNA strand

Before we define SST we define the trajectory alphabet LetΓ = 01 σ Forany alphabetΣ letΓΣ = Γ cup (Γ times Σ) For ease of readability we denote [ca] by

ac

for all a isin Σ andc isin ΓWe can now define the SST operation LetΣ be an alphabett isin Γlowast

Σand

x y isin Σlowast Then the SST ofx andy alongt denotedx t y is defined as followsIf x = axprime y = byprime (whereab isin Σ xprime yprime isin Σlowast) andt = ctprime wherec isin ΓΣ andtprime isin Γlowast

Σ then

x t y =

a(xprime tprime y) if c isin 0

a0

b(x tprime yprime) if c isin 1b1

a(xprime tprime yprime) if a = b andc isin σaσ

empty otherwise

If x = axprime y = ε andt = ctprime then

x t ε =

a(xprime tprime ε) if c isin 0a0

empty otherwise

If x = ε y = byprime andt = ctprime then

ε t y =

b(ε tprime yprime) if c isin 1b1

empty otherwise

If x = y = ε thenx t y = ε if t = ε andempty otherwise Finally ifx y ε thenx ε y = empty If x y isin Σlowast andT sube Γlowast

Σ thenx T y = cuptisinT x T y If L1 L2 sube Σ

lowast and

14

T sube ΓlowastΣ thenL1 T L2 = cupxisinL1yisinL2 x T y The associated deletion operation over

the alphabet∆Σ = ∆ cup (∆ times Σ) (where∆ = id σ) is defined in the natural way[19]

We now consider givenΣ and two sets of trajectoriesT1T2 sube ΓlowastΣ whether

the operations T1 T2 coincide that is whetherL1 T1 L2 = L1 T2 L2 for alllanguagesL1 L2 sube Σ

lowast If T1 T2 represent in this sense the same operation wesay thatT1T2 areequivalentsets of trajectories

We note that it is possible for two distinct sets of trajectoriesT1T2 sube ΓlowastΣ

to be

equivalent As a simple example considerT1 = a0

a1 andT2 =

a1

a0 Note that for

i = 12

L1 Ti L2 =

aa if L1 cap L2 supe aempty otherwise

ThusT1T2 are equivalent but not equalBy using a special case of partial commutation and trace languages (see Diek-

ert and Meacutetivier [15]) we can show that two sets of trajectoriesT1T2 sube ΓlowastΣ

areequivalent if and only if their corresponding trace languages (under the naturalmorphism) are equivalent This implies the following important decidability re-sult

Theorem 61 Let Σ be an alphabet and T1T2 sube ΓlowastΣ If T1T2 are regular it is

decidable whether T1 and T2 are equivalent

We can now consider some examples of the power of SST and SDT We firstnote that SST consists of a valid extension of the shuffle on trajectories ifT sube01lowast thenL1 T L2 = L1 T L2 the syntactic shuffle on trajectories operation[79] We also note that ifT = σlowast then T = cap

Given the very natural set of trajectoriesT = (0 + 1 + σ)lowast T denotes theinfiltration product uarr see eg Pin and Sakarovitch [81] The infiltration productis defined as follows ifx = x1 xn is a word of lengthn andI = (i1 i2 ir) isa subsequence of (12 n) let xI = xi1 xi2 middot middot middot xir Then givenx y isin Σlowast

x uarr y = z isin Σlowast existI J sube [|z|] such thatI cup J = [|z|] zI = x andzJ = y

For exampleabuarr ba= abababbaabbabaabbaababWe now show how to use SST and SDT to simulate ciliate bio-operations

which have been the subject of recent research in the literature A model of cil-iate bio-operations without circular variants were introduced by Daleyet al [11]to mimic the manner in which DNA is unscrambled in the DNA of certain uni-cellular ciliates in the process of asexual reproduction Ciliate bio-operations arealso investigated by Ehrenfeuchtet al [28] Prescottet al [82] and Daley andMcQuillan [12] using various approaches

15

Daley et al [11] define several language operations which simulate ciliatebio-operations including synchronized insertion deletion and bi-polar deletionSynchronized insertion can be given as follows

α oplus β = uavaw a isin Σ α = uaw β = va

The operation is extended to languages as usual LetT =⋃

aisinΣ 0lowasta0 1lowast

a1 0lowast Then

for all L1 L2 sube Σlowast L1 oplus L2 = L1 T L2 The operations of synchronized deletion

and synchronized bi-polar deletion (also defined by Daleyet al [11]) can also besimulated by SDT

Contextual insertion and deletion were introduced by Kari and Thierrin as asimple set of operations which are capable for modelling DNA computing [58]Let Σ be an alphabet and [x y] isin (Σlowast)2 We call [x y] a context Then givenvu isin Σlowast the [x y]-contextual insertionof v into u is given byu larrminus[xy] v = u1xvyu2 u = u1xyu2u1u2 isin Σ

lowast Let C sube (Σlowast)2 Then

u larrminusC v =⋃

[xy]isinC

u larrminus[xy] v

The operationlarrminusC is extended to languages monotonically as expected Letx =x1 middot middot middot xn andy = y1 middot middot middot ym be arbitrary words overΣ Then define

T[xy] = 0lowastnprod

i=1

xi

0 1lowastmprod

i=1

yi

0 0lowast

We naturally extend this toTC = cup[xy]isinCT[xy] for all C sube (Σlowast)2 Under this defini-tion it is clear that TC =

larrminusC for all C sube (Σlowast)2

Kari and Thierrin note that ifC sube (Σlowast)2 is finite then the regular and context-free languages are closed underlarrminus

C As TC is regular for all finiteC we notethat the closure of the regular languages underlarrminus

C is a consequence of the closureproperties of SST It is known that the CFLs are closed underlarrminus

C [58] Howeverin general the CFLs are not closed underT This leads to the following openproblem

Open Problem 62Find necessary and sufficient (language-theoretic) conditionson a set of trajectories Tsube Γlowast

Σsuch that the CFLs overΣ are closed under T

We note that [x y]-contextual deletion and [x y]-contextual bi-polar deletion[58] can be simulated by SDT [19]

Recall thatsynchronized shuffle (see eg Latteux and Roos [66]) is defined asfollows LetΣ1Σ2 be alphabets not necessarily disjoint Letρi (Σ1 cup Σ2) rarr Σi

be the projection ontoΣi given byρi(a) = a for all a isin Σi andρi(a) = ε for alla isin (Σ1 cup Σ2) minus Σi

16

Let Li sube Σi for i = 12 Then the synchronized shuffle of L1 andL2 denotedL1 L2 is given by

L1 L2 = ρminus11 (L1) cap ρ

minus12 (L2)

Lemma 63 LetΣ1Σ2 be alphabets Let Tsube ΓlowastΣ1cupΣ2

be given by

T = ((⋃

aisinΣ1capΣ2

aσ) + (

⋃aisinΣ1minusΣ2

a0) + (

⋃aisinΣ2minusΣ1

a1))lowast

Then for all L1 L2 such that Li sube Σi for i = 12 L1 L2 = L1 T L2

Further examples of operations simulated by SST are described by the author[19]

We note that many of the results on language equations hold for SST In par-ticular the following are new results on bio-operations which have not been notedbefore [19]

Corollary 64 Let R be a regular language Then it is decidable whether thereexist languages X1X2 such that R= X1 oplus X2

Corollary 65 Let C sube (Σlowast)2 be a finite set of contexts Let R be a regularlanguage Then it is decidable whether there exist languages X1X2 such thatR= X1

larrC X2

7 Descriptional Complexity

Descriptional complexity of formal languages deals with the problems of concisedescriptions of languages in terms of generative or accepting devices For in-stance the(deterministic) state complexityof a regular languageL is the minimalnumber of states in any deterministic finite automaton acceptingL [94] Nonde-terministic state complexity of a regular language is similarly defined (see egHolzer and Kutrib [36])

For shuffle on trajectories Mateescuet al [79] and Harjuet al [34] both giveproofs that given a regular set of trajectoriesT and regular languagesL1 L2 theoperationL1 T L2 always yields a regular language Thus it is reasonable toconsider the state complexity of shuffle on trajectories The author and K Salo-maa [24] have obtained results in this area We state an upper bound in terms ofnondeterministic state complexity

Lemma 71 Let L1 L2 be regular languages overΣlowast and Tsube 01lowast be a regularset of trajectories Then

sc(L1 T L2) le 2nsc(L1)nsc(L2)nsc(T)

17

The following open problem appears to be very challenging

Open Problem 72 For what regular sets of trajectories Tsube 01lowast does theconstruction given by Lemma 71 give a construction which is best possible

Consider unrestricted shuffle given by the set of trajectoriesT = (0 + 1)lowastThe bound of Lemma 71 in this case is 2nsc(L1)nsc(L2) Cacircmpeanuet al [8] haveshown that there exist languagesL1 andL2 accepted by incomplete DFAs havingrespectivelyn andm states such that any incomplete DFA acceptingL1 L2 hasat least 2nm minus 1 states This bound is optimal for incomplete DFAs howeverfor complete DFAs it gives only the lower bound 2(sc(L1)minus1)(sc(L2)minus1) However weregard this as near enough to our goal of Lemma 71 for our purposes ie weregardT = (0 + 1)lowast as an example of a set of trajectoriesT satisfying OpenProblem 72

The density functionof a languageL sube Σlowast is defined bypL N rarr N aspL(n) = |L cap Σn| for all n ge 0 That is pL(n) gives the number of words oflengthn in L By the density of a languageL we informally mean the asymptoticbehaviour ofpL The following important result of Szilardet al [91 Thm 3]characterizes the density of regular languages

Theorem 73 A regular language R overΣ satisfies pR(n) isin O(nk) k ge 0 ifand only if R can be represented as a finite union of regular expressions of thefollowing form xylowast1z1 middot middot middot ylowastt zt where x y1 z1 middot middot middot yt zt isin Σ

lowast and0 le t le k+ 1

Call a languageL slenderif pL(n) isin O(1) [83] Let R be a regular languagewhich has polynomial densityO(nk) and lett be the smallest integer such thatR = cupt

i=1xiylowasti1zi1 middot middot middot ylowastikiziki 0 le ki le k + 1 i = 1 t Then callt the UkL-

indexof L If k = 0 we callt theUSL-indexof L (languages with USL indextare calledt-thin by Paun and Salomaa [83] slender regular languages were alsocharacterized independently by Shallit [87 Lemma 3 p 336])

Lemma 74 Let T = uvlowast where u v isin 01lowast Let Li be regular languages overΣwith sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le |uv|n1n2 (2)

We now give a bound for sets of trajectoriesT = uvlowastw with w ε

Lemma 75 Let T = uvlowastw where u vw isin 01lowast and w ε Let Li be regularlanguages overΣ with sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le n1n2

|u| + 1+ |v|(n1n2)

lceil|w||v|

rceil+1minus n1n2

n1n2 minus 1

(3)

18

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

It is not difficult to see that ift =prodn

i=1 0j i 1ki for somen ge 0 and j i ki ge 0 forall 1 le i le n then we have that

x t y =nprod

i=1

xiyi x =nprod

i=1

xi y =nprod

i=1

yi

with |xi | = j i |yi | = ki for all 1 le i le n

if |x| = |t|0 and|y| = |t|1 andx t y = empty if |x| |t|0 or |y| |t|1We extend shuffle on trajectories tosets Tsube 01lowast of trajectoriesas follows

x T y =⋃tisinT

x t y

Further forL1 L2 sube Σlowast we define

L1 T L2 =⋃xisinL1yisinL2

x T y

Consider the following examples We can see that ifT = 0lowast1lowast we have thatL1 T L2 = L1L2 ieT = 0lowast1lowast gives the concatenation operation IfT = (0+1)lowastthen L1 T L2 = L1 L2 ie T = 01lowast gives the shuffle operation This isthe least restrictive set of trajectories IfT = 0lowast1lowast0lowast then T is the insertionoperationlarr (see eg Kari [50]) which is defined byxlarr y = x1yx2 x1 x2 isin

Σlowast x1x2 = x for all x y isin Σlowast See Mateescuet al [79] for the fundamental studyof shuffle on trajectories

3 Deletion along Trajectories

We now consider deletion on trajectories an important addition to the study oftrajectories and related areas The concept of deletion along trajectories was in-dependently introduced by the author [17 22] and Kari and Sosiacutek [55 56] Theprimary motivation for the introduction of deletion on trajectories is to definean ldquoinverserdquo to shuffle on trajectories in a sense we will see below Intuitivelydeletion on trajectories uses the trajectories to model language operations whichdelete an occurrence of the right argument from the left argument in a controlledscattered way

Let x y isin Σlowast be words withx = axprime y = byprime (ab isin Σ) Let t be a word overid such thatt = etprime with e isin id Then we definex t y the deletion ofyfrom x along trajectoryt as follows

xt y =

a(xprime tprime byprime) if e= ixprime tprime yprime if e= d anda = bempty otherwise

3

Also if x = axprime (a isin Σ) andt = etprime (e isin id) then

xt ε =

a(xprime tprime ε) if e= iempty otherwise

If x ε then x ε y = empty Furtherε t y = ε if t = y = ε Otherwiseε t y = empty

Let T sube idlowast ThenxT y =

⋃tisinT

xt y

We extend this to languages as expected LetL1 L2 sube Σlowast andT sube idlowast Then

L1T L2 =⋃xisinL1yisinL2

xT y

We consider the following examples of deletion along trajectories

(a) if T = ilowastdlowast thenT= the right-quotient operation(b) if T = dlowastilowast thenT= the left-quotient operation(c) if T = ilowastdlowastilowast thenT=rarr the deletion operation (see eg Kari[49 50])(d) if T = (i + d)lowast thenT= the scattered deletion operation (seeeg Itoet al [41])(e) if T = dlowastilowastdlowast thenT= the bi-polar deletion operation (seeeg Kari [50])(f) let k ge 0 andTk = ilowastdlowastilek ThenTk=rarr

k thek-deletion operation(see eg Kari and Thierrin [57])

We now recall some of the closure properties of deletion along trajectories

Theorem 31 Let Σ be an alphabet There exist weak codingsρ1 ρ2 τ ϕ and aregular language R such that for all L1 L2 sube Σ

lowast and all T sube idlowast

L1T L2 = ϕ(ρminus1

1 (L1) cap ρminus12 (L2) cap τ

minus1(T) cap R)

Corollary 32 LetL be a cone Then for all L1 L2T such that two are regularlanguages and the third is fromL L1T L2 isin L

31 Non-regular Trajectories Preserving Regularity

Consider the following result of Mateescuet al [79 Thm 51] if L1 T L2 isregular for all regular languagesL1 L2 thenT is regular This result is clear uponnoting that for allT 0lowast T 1lowast = T

4

However we note that the same result does not hold if we replace ldquoshuffleon trajectoriesrdquo by ldquodeletion along trajectoriesrdquo As motivation we begin with abasic example LetΣ be an alphabet andH = indn n ge 0 Note that

R1H R2 = x isin Σlowast existy isin R2 such thatxy isin R1 and|x| = |y|

We can establish directly (by constructing an NFA) that for all regular languagesR1R2 sube Σ

lowast the languageR1H R2 is regular HoweverH itself is not regularWe remark thatR1 H R2 is similar to proportional removals studied by

Stearns and Hartmanis [90] Amar and Putzolu [1 2] Seiferas and McNaughton[86] Kosaraju [60 61 62] Kozen [63] Zhang [96] the author [16] Bersteletal [4] and others In particular we note the case of1

2(L) given by

12

(L) = x isin Σlowast existy isin Σlowast such thatxy isin L and|x| = |y|

The operation12(L) is one of a class of operations which preserve regularity

Seiferas and McNaughton completely characterize those binary relationsr sube N2

such that the operation

P(L r) = x isin Σlowast existy isin Σlowast such thatxy isin L andr(|x| |y|)

preserves regularityRecall that a setA is ultimately periodic (up) if there existn0 p isin N p gt 0

such that for allx ge n0 x isin I lArrrArr x + p isin I Call a binary relationr sube N2

up-preservingif A up impliesrminus1(A) = i exist j isin A such thatr(i j) is alsoup Then the binary relationsr such thatP(middot r) preserves regularity are preciselythe up-preserving relations [86] This was extended to deletion on trajectories[17 22]

Theorem 33 Let r sube N2 be a binary relation and Hr = indm r(nm) TheoperationHr is regularity-preserving if and only if r is up-preserving

Theorem 33 has been extended by the author [17 22] to cover other boundedsets of trajectories

32 Algebraic Properties

Kari and Sosiacutek [56] show the following results concerning algebraic properties ofdeletion along trajectories

Theorem 34Let T sube idlowast The following three conditions are equivalent

5

(a) the operationT is commutative(b) L1T L2 sube ε for all languages L1 L2(c) T sube dlowast

Corollary 35 Given a context-free set of trajectories Tsube idlowast it is decidablewhetherT is a commutative operation

Theorem 36 For a set of trajectories Tsube idlowast the following two conditionsare equivalent

(i) For all t1 isin 1m 0n t2 isin 1n 0i and t3 isin 1j 0m+n i jmn isin N

(a) mgt 0 and t1 isin T implies t2 lt T(b) mgt 0 and t1 isin T implies t3 lt T(c) t1 isin T and0m isin T implies0n isin T(d) t1 isin T and0n isin T implies0m isin T

(ii) T is an associative operation

However the associated decidability problem is apparently open

Open Problem 37Given a regular set of trajectories Tsube idlowast is it decidablewhetherT is an associative operation What if T is context-free

33 Commutative Closure

Recall that the commutative closure of a wordw isin Σlowast is the setcom(w) = x foralla isin Σ |w|a = |x|a The commutative closure of a languageL is the union ofthe commutative closure of each word inL Note that the operationcomdoes notpreserve regularitycom((ab)lowast) = x isin ablowast |x|a = |x|b

The author Mateescu K Salomaa and Yu [23] have shown that ifT is com-plete and associative then for all languagesL we can define an a congruencerelationsimLT Then the factor monoid ofLT = (P(Σlowast) T ε) with respect tosimLT

is finite if and only ifcom(L) is a regular language [23]

4 Language Equations

The immediate consequence of the introduction of deletion on trajectories is itsapplication to language equations A language equation is an equality involvingconstant languages language operations and unknowns We refer the reader toLeiss [67] for an introduction to the theory of language equations

6

There are several facets to research on language equations For instance givena language equation or a system of language equations we can seek to character-ize the solutions to the equation or simply decide if a solution exists In theresearch on language equations involving shuffle on trajectories the focus of re-search has been the latter task

41 Zero Variable Equations

The most simple language equations are those without any variables at all Itfollows from the closure properties of T andT that it is decidable if eitherR1 T R2 = R3 or R1 T R2 = R3 holds if all of R1R2 andR3 are regular lan-guages andT is regular Below we review decidability results when the languagesand set of trajectories are not all regular

Let Ψa(T) = |t|a t isin T wherea is a letter and|t|a denotes the number ofzeroes int Kari and Sosiacutek [56] establish the following elegant result

Theorem 41 Let T be an arbitrary but fixed regular set of trajectories Theproblem ldquoIs L1 T L2 = Rrdquo is decidable for a CFL L1 and regular languagesL2R if and only ifΨ0(T) is finite

The analogous result for deletion on trajectories (ieΨi(T) is finite) also holds[56] Further ifL1 is taken to be a regular language andL2 is taken to be a CFLthen the necessary and sufficient conditions are thatΨ1(T) is finite [56]

We can contrast the above results with the following surprising result whichwas established by the author and K Salomaa [26]

Theorem 42 Let T = 0i1r0j1s0k r s r s are odd i j k ge 0 Then fora given alphabetΣ and given regular languages SR1R2 sube Σ it is undecidablewhether or not S= R1 T R2

The result is interesting since it is an undecidability result about regular lan-guages given three regular languages it is undecidable whether the given equalityholds the context-free set of trajectoriesT if fixed (and is not part of the input)

42 One Variable Equations

One variable equations involving shuffle and deletion on trajectories have beenstudied by the author [17 22] Kari and Sosiacutek [55 56] and the author and K Sa-lomaa [25] We summarize these results below

By general results of Kari on the inverse of language operations [50] the fol-lowing result on one variable language equations involving shuffle and deletionalong trajectories is immediate

7

Theorem 43Let LR be regular languages Then it is decidable whether any ofthe following equations have a solution X and if so the maximal solution (underinclusion) is an effectively constructible regular language

(a) X T L = R where Tsube 01lowast is regular(b) L T X = R where Tsube 01lowast is regular(c) XT L = R where Tsube idlowast is regular(d) LT X = R where Tsube idlowast is regular

We can also examine the question of if it is decidable whether a set of trajec-tories can be found to satisfy a fixed language equation

Theorem 44 Let L1 L2R sube Σlowast be regular languages Then it is decidablewhether

(a) there exists a set Tsube 01lowast of trajectories with L1 T L2 = R(b) there exists a set Tsube idlowast of trajectories with L1T L2 = R

We now turn to undecidability LetΠ0Π1 01lowast rarr 01lowast be the projectionsgiven byΠ0(0) = 0Π0(1) = ε andΠ1(1) = 1Π1(0) = ε We say thatT sube 01lowast

is left-enabling(respright-enabling) if Π0(T) = 0lowast (respΠ1(T) = 1lowast) It isundecidable whether the corresponding equation has a solution

Theorem 45 Fix T sube 01lowast to be a regular set of left-enabling (resp right-enabling) trajectories For a given LCFL L and regular language R it is undecid-able whether or not L T X = R (resp X T L = R) has a solution X

Say that a set of trajectories is left-preserving (resp right-preserving) ifT supe0lowast (respT supe 1lowast) We can give an incomparable result which removes the condi-tion thatT must be regular but must strengthen the conditions on words inT

Theorem 46 Fix a left-preserving (resp right-preserving) set of trajectoriesT sube 01lowast Given an LCFL L and a regular language R it is undecidable whetherthere exists a language X such that LT X = R (resp X T L = R)

By an extension of Theorem 42 we can also show that there exists a fixedcontext-free set of trajectoriesT such that on inputR1R2 (regular languages)deciding whether or not there exists a languageX such thatR1 = X T R2 isundecidable [26] We also have the following undecidability results concerningthe existence of sets of trajectories [25 22]

Theorem 47 Given an LCFL L and regular languages R1R2 it is undecidablewhether there exists Tsube 01lowast such that (a) R1 T R2 = L (b) R1 T L = R2 or(c) L T R1 = R2

Given an LCFL L and regular languages R1R2 it is undecidable whetherthere exists Tsube idlowast such that (a) R1 T R2 = L (b) L T R1 = R2 or (c)R1T L = R2

8

43 Two Variable Equations

We can also consider language equations with two variables As an exampleconsider the well-studied language equation

L = X1X2 (1)

whereL is a fixed language andX1X2 are unknown Study of this equation hasbeen undertaken by Conway [10] Kari and Thierrin [59] Salomaa and Yu [85]and Choffrut and Karhumaumlki [9] In particular it is known that ifL is a regularlanguage we can determine if a solution to (1) exists Furthermore it is knownthat if a solutionX1X2 exists there exists a maximal regular solution ie thereexistsR1R2 such thatXi sube Ri for i = 12 andL = R1R2

For language equations of the formL = X1 T X2 unlike the case of one-variable equations we do not have a general result which applies to all regularsets of trajectoriesT However a large class of trajectories are covered by theauthor and K Salomaa [25 22]

Recall that a languageL sube Σlowast is boundedif there existw1w2 wn isin Σlowast

such thatL sube wlowast1wlowast2 middot middot middotw

lowastn We say thatL is letter-boundedif wi isin Σ for all

1 le i le n The following result is due to the author and K Salomaa [25 22]

Theorem 48 Let T sube 01lowast be a letter-bounded regular set of trajectoriesThen given a regular language R it is decidable whether there exist X1X2 suchthat X1 T X2 = R

As we have mentioned this result was known for catenationT = 0lowast1lowast How-ever it also holds for eg the following operations insertion (0lowast1lowast0lowast) k-insertion(0lowast1lowast0lek for fixedk ge 0) and bi-catenation (1lowast0lowast + 0lowast1lowast)

We also note that if the equationX1 T X2 = R has a solution whereR isa regular language andT is a letter-bounded regular set of trajectories then theequation also has solutionY1 T Y2 = R whereY1Y2 are regular languages Thisresult is well-known forT = 0lowast1lowast (see eg Choffrut and Karhumaumlki [9])

For T = (0+ 1)lowast the two-variable decomposition problem is open [7]

Open Problem 49Given a regular language R is it decidable whether thereexist X1X2 (with X1X2 ε) such that R= X1 X2

Recall that a languageL is k-thin if |L cap Σn| le k for all n ge 0 The followingproblem is also open

Open Problem 410Given a k-thin set of trajectories Tsube 01lowast is it decidablegiven a regular language R whether R has a shuffle decomposition with respectto T

9

The author and K Salomaa [26] have shown that ifT sube 01lowast is a fixed1-thin set of trajectories given a regular languageR it is decidable whetherRhas a shuffle decomposition with respect toT However even for 2-thin sets oftrajectories the problem remains open [26]

We can also note that if the left and right operand are restricted to be the samethe resulting language equation problem remains decidable if the set of trajectoriesis regular and letter-bounded [25 22]

Theorem 411Fix a letter-bounded regular set of trajectories Tsube 01lowast Thenit is decidable whether there exists a solution X to the equation XT X = R for agiven regular language R

We now turn to undecidability It has been shown [7] that it is undecidablewhether a context-free language has a nontrivial shuffle decomposition with re-spect to the set of trajectories01lowast This result can be extended for arbitrarycomplete regular sets trajectories [25] (Note that ifT is a complete set of trajec-tories then any languageL has decompositionsL Tε andε TL Below weexclude these trivial decompositions all other decompositions ofL are said to benontrivial)

Theorem 412Let T be any fixed complete regular set of trajectories For a givencontext-free language L it is undecidable whether or not there exist languagesX1X2 ε such that L= X1 TX2

We also note the following open problem [26]

Open Problem 413Is it possible to construct a fixed context-free set of trajecto-ries T such that it is undecidable whether there exist languages X1X2 ε suchthat L= X1 TX2

Also open are other forms of language equation We mention only two here

Open Problem 414Find necessary and sufficient conditions on sets of trajecto-ries T1T2 so that given a regular language R it is decidable whether there existnontrivial languages X1X2X3 satisfying(X1 T1 X2) T2 X3 = R

Open Problem 415Given regular languages R1R2 is it decidable whetherthere exists nontrivial languages X1T (with T sube 01lowast) such that X1 T R1 = R2

or R1 T X1 = R2

5 Splicing on Routes

The notion of shuffle on trajectories was extended by Mateescu [77] to encompasscertain splicing operations This extension is calledsplicing on routes Splicing

10

on routes is a proper extension of shuffle on trajectories and also encompassesseveral unary operations Bel-Enguixet al also use the concept of splicing onroutes to model dialog in natural language [3]

Splicing on routes was introduced by Mateescu [77] to model generaliza-tions of the crossover splicing operation (see Mateescu [77] for a definition ofthe crossover splicing operation) Splicing on routes generalizes the crossoversplicing operation by specifying a setT of routes which restricts the way in whichsplicing can occur The result is that specific sets of routes can simulate not onlythe crossover operation but also such operations on DNA such as thesimple splic-ing and theequal-length crossoveroperations (see Mateescu for details and defi-nitions of these operations [77])

We now define the concept ofsplicing on routes and note the difference be-tween deletion along trajectories from splicing on routes which allows discardingletters from either input word In particular aroute is a wordt specified overthe alphabet0011 where informally 01 means insert the letter from theappropriate word and01 means discard that letter and continue

Formally letx y isin Σlowast andt isin 0011lowast We define the splicing ofx andydenotedx t y recursively as follows ifx = axprime y = byprime (ab isin Σ) andt = ctprime

(c isin 0011) then

x ctprime y =

a(xprime tprime y) if c = 0(xprime tprime y) if c = 0b(x tprime yprime) if c = 1(x tprime yprime) if c = 1

If x = axprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

ε ctprime y =

b(ε tprime yprime) if c = 1(ε tprime yprime) if c = 1empty otherwise

11

We havex ε y = empty if x y ε Finally we setε t ε = ε if t = ε andemptyotherwise We extendt to sets of routes and languages as expected

x T y =⋃tisinT

x t y forallT sube 0011lowast x y isin Σlowast

L1 T L2 =⋃

xisinL1yisinL2

x T y

For example ifx = abc y = cbcandT = 010011010011 thenx T y =acbcbcabbc

It turns out that splicing on routes can be simulated by a combination of shuffleon trajectories and deletion along trajectories [22]

Theorem 51 There exist weak codingsπ1 π2 0101lowast rarr idlowast and a weakcodingπ3 0101lowast rarr 01lowast such that for all t isin 0011lowast and for allx y isin Σlowast we have

x t y = (xπ1(t) Σlowast) π3(t) (yπ2(t) Σ

lowast)

Corollary 52 There exist weak codingsπ1 π2 0101lowast rarr idlowast and π3 0101lowast rarr 01lowast such that for all Tsube 0011lowast and L1 L2 sube Σ

lowast

L1 T L2 =⋃tisinT

(L1π1(t) Σlowast) π3(t) (L2π2(t) Σ

lowast)

Unfortunately the identity

L1 T L2 = (L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast)

does not hold in general even ifL1 L2 are singletons and|T | = 2 For example ifL1 = ab L2 = cd andT = 00110011 then

L1 T L2 = bcad

(L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast) = acadbcbd

However ifT is aunaryset of routes by which we mean thatT sube 00lowast1lowast

then we have the following result [22]

Corollary 53 Let T sube 00lowast1lowast Then for all Lsube Σlowast

L T Σlowast = Lπ1(T) Σ

lowast

12

We refer the reader to Mateescu [77] for a discussion of unary operationsdefined by splicing on routes As an example consider that withT = 0n0

n n ge

01lowast L T Σ

lowast = 12(L) where1

2(L) was given in Section 31Bel-Enguixet al [3] have used splicing on routes to model dialogue in natural

language Adialogueis the result of two or more actors who alternately com-municate The alternating nature of trajectory-based operations makes it naturalto apply them to modeling dialogue Routes and in particular the ability to notinsert portions of certain actorrsquos input is important for selecting different piecesof information for different dialogues ldquoDepending on the conversational goal andon the behaviour of the agents different acts will be selected rdquo [3] Bel-Enguixetal [3] note three important classes of routes used in dialogue

(a) Imbrication in a dialogue is the result of applying a route from the set ofroutes defined by the regular expression ((01)+ (01))+ Imbrication repre-sents the ldquoperfect conversationrdquo where actors strictly alternate

(b) Concatenationin a dialogue is the result of applying a route from the setof route defined by the regular expression (0+ 0)+(1 + 1)+ According toBel-Enguix concatenation has applications to more formal dialogues suchas debates and round tables [3]

(c) Merging is the result of applying a route which does not fall into the cat-egories of imbrication or concatenation Merging is more common thanimbrication and concatenation in informal dialogues

Bel-Enguixet al [3] mention several open areas of research relating to theuse of splicing on routes for modelling dialogue In particular they ask aboutestablishing a mechanism in order to select routes depending on context Thiswould establish a more semantic operation in the terminology of Section 6 below

6 Semantic Shuffle on Trajectories

In the paper which introduced shuffle on trajectories Mateescuet almake a dis-tinction betweensyntacticandsemanticoperations on words

[Shuffle on trajectories is] based on syntactic constraints on the shuffleoperations The constraints are referred to as syntactic constraintssince they do not concern properties of the words that are shuffled orproperties of the letters that occur in these words

Instead the constraints involve the general strategy to switch fromone word to another word Once such a strategy is defined the struc-ture of the words that are shuffled does not play any role

13

However constraints that take into consideration the inner structureof the words that are shuffled together are referred to as semantic con-straints [79 p 2]

The author [19] has introduced a semantic variant of shuffle on trajectoriesnaturally calledsemantic shuffle on trajectories(SST) The corresponding no-tion for deletion on trajectories is calledsemantic deletion on trajectories(SDT)The advantages of SST and SDT are that they preserves many of the desirableproperties of the usual syntactic shuffle on trajectories while being capable ofsimulating more operations of interest

The two semantic constructs we introduce aresynchronizationand contentrestriction Synchronization allows for only one letter to be output for two cor-responding identical symbols in the input words Content restriction allows atrajectory to specify that a particular letter must appear at a specific point This isinspired by bio-informatical operations where operations occur only in the con-text of certain subsequences of the DNA strand

Before we define SST we define the trajectory alphabet LetΓ = 01 σ Forany alphabetΣ letΓΣ = Γ cup (Γ times Σ) For ease of readability we denote [ca] by

ac

for all a isin Σ andc isin ΓWe can now define the SST operation LetΣ be an alphabett isin Γlowast

Σand

x y isin Σlowast Then the SST ofx andy alongt denotedx t y is defined as followsIf x = axprime y = byprime (whereab isin Σ xprime yprime isin Σlowast) andt = ctprime wherec isin ΓΣ andtprime isin Γlowast

Σ then

x t y =

a(xprime tprime y) if c isin 0

a0

b(x tprime yprime) if c isin 1b1

a(xprime tprime yprime) if a = b andc isin σaσ

empty otherwise

If x = axprime y = ε andt = ctprime then

x t ε =

a(xprime tprime ε) if c isin 0a0

empty otherwise

If x = ε y = byprime andt = ctprime then

ε t y =

b(ε tprime yprime) if c isin 1b1

empty otherwise

If x = y = ε thenx t y = ε if t = ε andempty otherwise Finally ifx y ε thenx ε y = empty If x y isin Σlowast andT sube Γlowast

Σ thenx T y = cuptisinT x T y If L1 L2 sube Σ

lowast and

14

T sube ΓlowastΣ thenL1 T L2 = cupxisinL1yisinL2 x T y The associated deletion operation over

the alphabet∆Σ = ∆ cup (∆ times Σ) (where∆ = id σ) is defined in the natural way[19]

We now consider givenΣ and two sets of trajectoriesT1T2 sube ΓlowastΣ whether

the operations T1 T2 coincide that is whetherL1 T1 L2 = L1 T2 L2 for alllanguagesL1 L2 sube Σ

lowast If T1 T2 represent in this sense the same operation wesay thatT1T2 areequivalentsets of trajectories

We note that it is possible for two distinct sets of trajectoriesT1T2 sube ΓlowastΣ

to be

equivalent As a simple example considerT1 = a0

a1 andT2 =

a1

a0 Note that for

i = 12

L1 Ti L2 =

aa if L1 cap L2 supe aempty otherwise

ThusT1T2 are equivalent but not equalBy using a special case of partial commutation and trace languages (see Diek-

ert and Meacutetivier [15]) we can show that two sets of trajectoriesT1T2 sube ΓlowastΣ

areequivalent if and only if their corresponding trace languages (under the naturalmorphism) are equivalent This implies the following important decidability re-sult

Theorem 61 Let Σ be an alphabet and T1T2 sube ΓlowastΣ If T1T2 are regular it is

decidable whether T1 and T2 are equivalent

We can now consider some examples of the power of SST and SDT We firstnote that SST consists of a valid extension of the shuffle on trajectories ifT sube01lowast thenL1 T L2 = L1 T L2 the syntactic shuffle on trajectories operation[79] We also note that ifT = σlowast then T = cap

Given the very natural set of trajectoriesT = (0 + 1 + σ)lowast T denotes theinfiltration product uarr see eg Pin and Sakarovitch [81] The infiltration productis defined as follows ifx = x1 xn is a word of lengthn andI = (i1 i2 ir) isa subsequence of (12 n) let xI = xi1 xi2 middot middot middot xir Then givenx y isin Σlowast

x uarr y = z isin Σlowast existI J sube [|z|] such thatI cup J = [|z|] zI = x andzJ = y

For exampleabuarr ba= abababbaabbabaabbaababWe now show how to use SST and SDT to simulate ciliate bio-operations

which have been the subject of recent research in the literature A model of cil-iate bio-operations without circular variants were introduced by Daleyet al [11]to mimic the manner in which DNA is unscrambled in the DNA of certain uni-cellular ciliates in the process of asexual reproduction Ciliate bio-operations arealso investigated by Ehrenfeuchtet al [28] Prescottet al [82] and Daley andMcQuillan [12] using various approaches

15

Daley et al [11] define several language operations which simulate ciliatebio-operations including synchronized insertion deletion and bi-polar deletionSynchronized insertion can be given as follows

α oplus β = uavaw a isin Σ α = uaw β = va

The operation is extended to languages as usual LetT =⋃

aisinΣ 0lowasta0 1lowast

a1 0lowast Then

for all L1 L2 sube Σlowast L1 oplus L2 = L1 T L2 The operations of synchronized deletion

and synchronized bi-polar deletion (also defined by Daleyet al [11]) can also besimulated by SDT

Contextual insertion and deletion were introduced by Kari and Thierrin as asimple set of operations which are capable for modelling DNA computing [58]Let Σ be an alphabet and [x y] isin (Σlowast)2 We call [x y] a context Then givenvu isin Σlowast the [x y]-contextual insertionof v into u is given byu larrminus[xy] v = u1xvyu2 u = u1xyu2u1u2 isin Σ

lowast Let C sube (Σlowast)2 Then

u larrminusC v =⋃

[xy]isinC

u larrminus[xy] v

The operationlarrminusC is extended to languages monotonically as expected Letx =x1 middot middot middot xn andy = y1 middot middot middot ym be arbitrary words overΣ Then define

T[xy] = 0lowastnprod

i=1

xi

0 1lowastmprod

i=1

yi

0 0lowast

We naturally extend this toTC = cup[xy]isinCT[xy] for all C sube (Σlowast)2 Under this defini-tion it is clear that TC =

larrminusC for all C sube (Σlowast)2

Kari and Thierrin note that ifC sube (Σlowast)2 is finite then the regular and context-free languages are closed underlarrminus

C As TC is regular for all finiteC we notethat the closure of the regular languages underlarrminus

C is a consequence of the closureproperties of SST It is known that the CFLs are closed underlarrminus

C [58] Howeverin general the CFLs are not closed underT This leads to the following openproblem

Open Problem 62Find necessary and sufficient (language-theoretic) conditionson a set of trajectories Tsube Γlowast

Σsuch that the CFLs overΣ are closed under T

We note that [x y]-contextual deletion and [x y]-contextual bi-polar deletion[58] can be simulated by SDT [19]

Recall thatsynchronized shuffle (see eg Latteux and Roos [66]) is defined asfollows LetΣ1Σ2 be alphabets not necessarily disjoint Letρi (Σ1 cup Σ2) rarr Σi

be the projection ontoΣi given byρi(a) = a for all a isin Σi andρi(a) = ε for alla isin (Σ1 cup Σ2) minus Σi

16

Let Li sube Σi for i = 12 Then the synchronized shuffle of L1 andL2 denotedL1 L2 is given by

L1 L2 = ρminus11 (L1) cap ρ

minus12 (L2)

Lemma 63 LetΣ1Σ2 be alphabets Let Tsube ΓlowastΣ1cupΣ2

be given by

T = ((⋃

aisinΣ1capΣ2

aσ) + (

⋃aisinΣ1minusΣ2

a0) + (

⋃aisinΣ2minusΣ1

a1))lowast

Then for all L1 L2 such that Li sube Σi for i = 12 L1 L2 = L1 T L2

Further examples of operations simulated by SST are described by the author[19]

We note that many of the results on language equations hold for SST In par-ticular the following are new results on bio-operations which have not been notedbefore [19]

Corollary 64 Let R be a regular language Then it is decidable whether thereexist languages X1X2 such that R= X1 oplus X2

Corollary 65 Let C sube (Σlowast)2 be a finite set of contexts Let R be a regularlanguage Then it is decidable whether there exist languages X1X2 such thatR= X1

larrC X2

7 Descriptional Complexity

Descriptional complexity of formal languages deals with the problems of concisedescriptions of languages in terms of generative or accepting devices For in-stance the(deterministic) state complexityof a regular languageL is the minimalnumber of states in any deterministic finite automaton acceptingL [94] Nonde-terministic state complexity of a regular language is similarly defined (see egHolzer and Kutrib [36])

For shuffle on trajectories Mateescuet al [79] and Harjuet al [34] both giveproofs that given a regular set of trajectoriesT and regular languagesL1 L2 theoperationL1 T L2 always yields a regular language Thus it is reasonable toconsider the state complexity of shuffle on trajectories The author and K Salo-maa [24] have obtained results in this area We state an upper bound in terms ofnondeterministic state complexity

Lemma 71 Let L1 L2 be regular languages overΣlowast and Tsube 01lowast be a regularset of trajectories Then

sc(L1 T L2) le 2nsc(L1)nsc(L2)nsc(T)

17

The following open problem appears to be very challenging

Open Problem 72 For what regular sets of trajectories Tsube 01lowast does theconstruction given by Lemma 71 give a construction which is best possible

Consider unrestricted shuffle given by the set of trajectoriesT = (0 + 1)lowastThe bound of Lemma 71 in this case is 2nsc(L1)nsc(L2) Cacircmpeanuet al [8] haveshown that there exist languagesL1 andL2 accepted by incomplete DFAs havingrespectivelyn andm states such that any incomplete DFA acceptingL1 L2 hasat least 2nm minus 1 states This bound is optimal for incomplete DFAs howeverfor complete DFAs it gives only the lower bound 2(sc(L1)minus1)(sc(L2)minus1) However weregard this as near enough to our goal of Lemma 71 for our purposes ie weregardT = (0 + 1)lowast as an example of a set of trajectoriesT satisfying OpenProblem 72

The density functionof a languageL sube Σlowast is defined bypL N rarr N aspL(n) = |L cap Σn| for all n ge 0 That is pL(n) gives the number of words oflengthn in L By the density of a languageL we informally mean the asymptoticbehaviour ofpL The following important result of Szilardet al [91 Thm 3]characterizes the density of regular languages

Theorem 73 A regular language R overΣ satisfies pR(n) isin O(nk) k ge 0 ifand only if R can be represented as a finite union of regular expressions of thefollowing form xylowast1z1 middot middot middot ylowastt zt where x y1 z1 middot middot middot yt zt isin Σ

lowast and0 le t le k+ 1

Call a languageL slenderif pL(n) isin O(1) [83] Let R be a regular languagewhich has polynomial densityO(nk) and lett be the smallest integer such thatR = cupt

i=1xiylowasti1zi1 middot middot middot ylowastikiziki 0 le ki le k + 1 i = 1 t Then callt the UkL-

indexof L If k = 0 we callt theUSL-indexof L (languages with USL indextare calledt-thin by Paun and Salomaa [83] slender regular languages were alsocharacterized independently by Shallit [87 Lemma 3 p 336])

Lemma 74 Let T = uvlowast where u v isin 01lowast Let Li be regular languages overΣwith sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le |uv|n1n2 (2)

We now give a bound for sets of trajectoriesT = uvlowastw with w ε

Lemma 75 Let T = uvlowastw where u vw isin 01lowast and w ε Let Li be regularlanguages overΣ with sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le n1n2

|u| + 1+ |v|(n1n2)

lceil|w||v|

rceil+1minus n1n2

n1n2 minus 1

(3)

18

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

Also if x = axprime (a isin Σ) andt = etprime (e isin id) then

xt ε =

a(xprime tprime ε) if e= iempty otherwise

If x ε then x ε y = empty Furtherε t y = ε if t = y = ε Otherwiseε t y = empty

Let T sube idlowast ThenxT y =

⋃tisinT

xt y

We extend this to languages as expected LetL1 L2 sube Σlowast andT sube idlowast Then

L1T L2 =⋃xisinL1yisinL2

xT y

We consider the following examples of deletion along trajectories

(a) if T = ilowastdlowast thenT= the right-quotient operation(b) if T = dlowastilowast thenT= the left-quotient operation(c) if T = ilowastdlowastilowast thenT=rarr the deletion operation (see eg Kari[49 50])(d) if T = (i + d)lowast thenT= the scattered deletion operation (seeeg Itoet al [41])(e) if T = dlowastilowastdlowast thenT= the bi-polar deletion operation (seeeg Kari [50])(f) let k ge 0 andTk = ilowastdlowastilek ThenTk=rarr

k thek-deletion operation(see eg Kari and Thierrin [57])

We now recall some of the closure properties of deletion along trajectories

Theorem 31 Let Σ be an alphabet There exist weak codingsρ1 ρ2 τ ϕ and aregular language R such that for all L1 L2 sube Σ

lowast and all T sube idlowast

L1T L2 = ϕ(ρminus1

1 (L1) cap ρminus12 (L2) cap τ

minus1(T) cap R)

Corollary 32 LetL be a cone Then for all L1 L2T such that two are regularlanguages and the third is fromL L1T L2 isin L

31 Non-regular Trajectories Preserving Regularity

Consider the following result of Mateescuet al [79 Thm 51] if L1 T L2 isregular for all regular languagesL1 L2 thenT is regular This result is clear uponnoting that for allT 0lowast T 1lowast = T

4

However we note that the same result does not hold if we replace ldquoshuffleon trajectoriesrdquo by ldquodeletion along trajectoriesrdquo As motivation we begin with abasic example LetΣ be an alphabet andH = indn n ge 0 Note that

R1H R2 = x isin Σlowast existy isin R2 such thatxy isin R1 and|x| = |y|

We can establish directly (by constructing an NFA) that for all regular languagesR1R2 sube Σ

lowast the languageR1H R2 is regular HoweverH itself is not regularWe remark thatR1 H R2 is similar to proportional removals studied by

Stearns and Hartmanis [90] Amar and Putzolu [1 2] Seiferas and McNaughton[86] Kosaraju [60 61 62] Kozen [63] Zhang [96] the author [16] Bersteletal [4] and others In particular we note the case of1

2(L) given by

12

(L) = x isin Σlowast existy isin Σlowast such thatxy isin L and|x| = |y|

The operation12(L) is one of a class of operations which preserve regularity

Seiferas and McNaughton completely characterize those binary relationsr sube N2

such that the operation

P(L r) = x isin Σlowast existy isin Σlowast such thatxy isin L andr(|x| |y|)

preserves regularityRecall that a setA is ultimately periodic (up) if there existn0 p isin N p gt 0

such that for allx ge n0 x isin I lArrrArr x + p isin I Call a binary relationr sube N2

up-preservingif A up impliesrminus1(A) = i exist j isin A such thatr(i j) is alsoup Then the binary relationsr such thatP(middot r) preserves regularity are preciselythe up-preserving relations [86] This was extended to deletion on trajectories[17 22]

Theorem 33 Let r sube N2 be a binary relation and Hr = indm r(nm) TheoperationHr is regularity-preserving if and only if r is up-preserving

Theorem 33 has been extended by the author [17 22] to cover other boundedsets of trajectories

32 Algebraic Properties

Kari and Sosiacutek [56] show the following results concerning algebraic properties ofdeletion along trajectories

Theorem 34Let T sube idlowast The following three conditions are equivalent

5

(a) the operationT is commutative(b) L1T L2 sube ε for all languages L1 L2(c) T sube dlowast

Corollary 35 Given a context-free set of trajectories Tsube idlowast it is decidablewhetherT is a commutative operation

Theorem 36 For a set of trajectories Tsube idlowast the following two conditionsare equivalent

(i) For all t1 isin 1m 0n t2 isin 1n 0i and t3 isin 1j 0m+n i jmn isin N

(a) mgt 0 and t1 isin T implies t2 lt T(b) mgt 0 and t1 isin T implies t3 lt T(c) t1 isin T and0m isin T implies0n isin T(d) t1 isin T and0n isin T implies0m isin T

(ii) T is an associative operation

However the associated decidability problem is apparently open

Open Problem 37Given a regular set of trajectories Tsube idlowast is it decidablewhetherT is an associative operation What if T is context-free

33 Commutative Closure

Recall that the commutative closure of a wordw isin Σlowast is the setcom(w) = x foralla isin Σ |w|a = |x|a The commutative closure of a languageL is the union ofthe commutative closure of each word inL Note that the operationcomdoes notpreserve regularitycom((ab)lowast) = x isin ablowast |x|a = |x|b

The author Mateescu K Salomaa and Yu [23] have shown that ifT is com-plete and associative then for all languagesL we can define an a congruencerelationsimLT Then the factor monoid ofLT = (P(Σlowast) T ε) with respect tosimLT

is finite if and only ifcom(L) is a regular language [23]

4 Language Equations

The immediate consequence of the introduction of deletion on trajectories is itsapplication to language equations A language equation is an equality involvingconstant languages language operations and unknowns We refer the reader toLeiss [67] for an introduction to the theory of language equations

6

There are several facets to research on language equations For instance givena language equation or a system of language equations we can seek to character-ize the solutions to the equation or simply decide if a solution exists In theresearch on language equations involving shuffle on trajectories the focus of re-search has been the latter task

41 Zero Variable Equations

The most simple language equations are those without any variables at all Itfollows from the closure properties of T andT that it is decidable if eitherR1 T R2 = R3 or R1 T R2 = R3 holds if all of R1R2 andR3 are regular lan-guages andT is regular Below we review decidability results when the languagesand set of trajectories are not all regular

Let Ψa(T) = |t|a t isin T wherea is a letter and|t|a denotes the number ofzeroes int Kari and Sosiacutek [56] establish the following elegant result

Theorem 41 Let T be an arbitrary but fixed regular set of trajectories Theproblem ldquoIs L1 T L2 = Rrdquo is decidable for a CFL L1 and regular languagesL2R if and only ifΨ0(T) is finite

The analogous result for deletion on trajectories (ieΨi(T) is finite) also holds[56] Further ifL1 is taken to be a regular language andL2 is taken to be a CFLthen the necessary and sufficient conditions are thatΨ1(T) is finite [56]

We can contrast the above results with the following surprising result whichwas established by the author and K Salomaa [26]

Theorem 42 Let T = 0i1r0j1s0k r s r s are odd i j k ge 0 Then fora given alphabetΣ and given regular languages SR1R2 sube Σ it is undecidablewhether or not S= R1 T R2

The result is interesting since it is an undecidability result about regular lan-guages given three regular languages it is undecidable whether the given equalityholds the context-free set of trajectoriesT if fixed (and is not part of the input)

42 One Variable Equations

One variable equations involving shuffle and deletion on trajectories have beenstudied by the author [17 22] Kari and Sosiacutek [55 56] and the author and K Sa-lomaa [25] We summarize these results below

By general results of Kari on the inverse of language operations [50] the fol-lowing result on one variable language equations involving shuffle and deletionalong trajectories is immediate

7

Theorem 43Let LR be regular languages Then it is decidable whether any ofthe following equations have a solution X and if so the maximal solution (underinclusion) is an effectively constructible regular language

(a) X T L = R where Tsube 01lowast is regular(b) L T X = R where Tsube 01lowast is regular(c) XT L = R where Tsube idlowast is regular(d) LT X = R where Tsube idlowast is regular

We can also examine the question of if it is decidable whether a set of trajec-tories can be found to satisfy a fixed language equation

Theorem 44 Let L1 L2R sube Σlowast be regular languages Then it is decidablewhether

(a) there exists a set Tsube 01lowast of trajectories with L1 T L2 = R(b) there exists a set Tsube idlowast of trajectories with L1T L2 = R

We now turn to undecidability LetΠ0Π1 01lowast rarr 01lowast be the projectionsgiven byΠ0(0) = 0Π0(1) = ε andΠ1(1) = 1Π1(0) = ε We say thatT sube 01lowast

is left-enabling(respright-enabling) if Π0(T) = 0lowast (respΠ1(T) = 1lowast) It isundecidable whether the corresponding equation has a solution

Theorem 45 Fix T sube 01lowast to be a regular set of left-enabling (resp right-enabling) trajectories For a given LCFL L and regular language R it is undecid-able whether or not L T X = R (resp X T L = R) has a solution X

Say that a set of trajectories is left-preserving (resp right-preserving) ifT supe0lowast (respT supe 1lowast) We can give an incomparable result which removes the condi-tion thatT must be regular but must strengthen the conditions on words inT

Theorem 46 Fix a left-preserving (resp right-preserving) set of trajectoriesT sube 01lowast Given an LCFL L and a regular language R it is undecidable whetherthere exists a language X such that LT X = R (resp X T L = R)

By an extension of Theorem 42 we can also show that there exists a fixedcontext-free set of trajectoriesT such that on inputR1R2 (regular languages)deciding whether or not there exists a languageX such thatR1 = X T R2 isundecidable [26] We also have the following undecidability results concerningthe existence of sets of trajectories [25 22]

Theorem 47 Given an LCFL L and regular languages R1R2 it is undecidablewhether there exists Tsube 01lowast such that (a) R1 T R2 = L (b) R1 T L = R2 or(c) L T R1 = R2

Given an LCFL L and regular languages R1R2 it is undecidable whetherthere exists Tsube idlowast such that (a) R1 T R2 = L (b) L T R1 = R2 or (c)R1T L = R2

8

43 Two Variable Equations

We can also consider language equations with two variables As an exampleconsider the well-studied language equation

L = X1X2 (1)

whereL is a fixed language andX1X2 are unknown Study of this equation hasbeen undertaken by Conway [10] Kari and Thierrin [59] Salomaa and Yu [85]and Choffrut and Karhumaumlki [9] In particular it is known that ifL is a regularlanguage we can determine if a solution to (1) exists Furthermore it is knownthat if a solutionX1X2 exists there exists a maximal regular solution ie thereexistsR1R2 such thatXi sube Ri for i = 12 andL = R1R2

For language equations of the formL = X1 T X2 unlike the case of one-variable equations we do not have a general result which applies to all regularsets of trajectoriesT However a large class of trajectories are covered by theauthor and K Salomaa [25 22]

Recall that a languageL sube Σlowast is boundedif there existw1w2 wn isin Σlowast

such thatL sube wlowast1wlowast2 middot middot middotw

lowastn We say thatL is letter-boundedif wi isin Σ for all

1 le i le n The following result is due to the author and K Salomaa [25 22]

Theorem 48 Let T sube 01lowast be a letter-bounded regular set of trajectoriesThen given a regular language R it is decidable whether there exist X1X2 suchthat X1 T X2 = R

As we have mentioned this result was known for catenationT = 0lowast1lowast How-ever it also holds for eg the following operations insertion (0lowast1lowast0lowast) k-insertion(0lowast1lowast0lek for fixedk ge 0) and bi-catenation (1lowast0lowast + 0lowast1lowast)

We also note that if the equationX1 T X2 = R has a solution whereR isa regular language andT is a letter-bounded regular set of trajectories then theequation also has solutionY1 T Y2 = R whereY1Y2 are regular languages Thisresult is well-known forT = 0lowast1lowast (see eg Choffrut and Karhumaumlki [9])

For T = (0+ 1)lowast the two-variable decomposition problem is open [7]

Open Problem 49Given a regular language R is it decidable whether thereexist X1X2 (with X1X2 ε) such that R= X1 X2

Recall that a languageL is k-thin if |L cap Σn| le k for all n ge 0 The followingproblem is also open

Open Problem 410Given a k-thin set of trajectories Tsube 01lowast is it decidablegiven a regular language R whether R has a shuffle decomposition with respectto T

9

The author and K Salomaa [26] have shown that ifT sube 01lowast is a fixed1-thin set of trajectories given a regular languageR it is decidable whetherRhas a shuffle decomposition with respect toT However even for 2-thin sets oftrajectories the problem remains open [26]

We can also note that if the left and right operand are restricted to be the samethe resulting language equation problem remains decidable if the set of trajectoriesis regular and letter-bounded [25 22]

Theorem 411Fix a letter-bounded regular set of trajectories Tsube 01lowast Thenit is decidable whether there exists a solution X to the equation XT X = R for agiven regular language R

We now turn to undecidability It has been shown [7] that it is undecidablewhether a context-free language has a nontrivial shuffle decomposition with re-spect to the set of trajectories01lowast This result can be extended for arbitrarycomplete regular sets trajectories [25] (Note that ifT is a complete set of trajec-tories then any languageL has decompositionsL Tε andε TL Below weexclude these trivial decompositions all other decompositions ofL are said to benontrivial)

Theorem 412Let T be any fixed complete regular set of trajectories For a givencontext-free language L it is undecidable whether or not there exist languagesX1X2 ε such that L= X1 TX2

We also note the following open problem [26]

Open Problem 413Is it possible to construct a fixed context-free set of trajecto-ries T such that it is undecidable whether there exist languages X1X2 ε suchthat L= X1 TX2

Also open are other forms of language equation We mention only two here

Open Problem 414Find necessary and sufficient conditions on sets of trajecto-ries T1T2 so that given a regular language R it is decidable whether there existnontrivial languages X1X2X3 satisfying(X1 T1 X2) T2 X3 = R

Open Problem 415Given regular languages R1R2 is it decidable whetherthere exists nontrivial languages X1T (with T sube 01lowast) such that X1 T R1 = R2

or R1 T X1 = R2

5 Splicing on Routes

The notion of shuffle on trajectories was extended by Mateescu [77] to encompasscertain splicing operations This extension is calledsplicing on routes Splicing

10

on routes is a proper extension of shuffle on trajectories and also encompassesseveral unary operations Bel-Enguixet al also use the concept of splicing onroutes to model dialog in natural language [3]

Splicing on routes was introduced by Mateescu [77] to model generaliza-tions of the crossover splicing operation (see Mateescu [77] for a definition ofthe crossover splicing operation) Splicing on routes generalizes the crossoversplicing operation by specifying a setT of routes which restricts the way in whichsplicing can occur The result is that specific sets of routes can simulate not onlythe crossover operation but also such operations on DNA such as thesimple splic-ing and theequal-length crossoveroperations (see Mateescu for details and defi-nitions of these operations [77])

We now define the concept ofsplicing on routes and note the difference be-tween deletion along trajectories from splicing on routes which allows discardingletters from either input word In particular aroute is a wordt specified overthe alphabet0011 where informally 01 means insert the letter from theappropriate word and01 means discard that letter and continue

Formally letx y isin Σlowast andt isin 0011lowast We define the splicing ofx andydenotedx t y recursively as follows ifx = axprime y = byprime (ab isin Σ) andt = ctprime

(c isin 0011) then

x ctprime y =

a(xprime tprime y) if c = 0(xprime tprime y) if c = 0b(x tprime yprime) if c = 1(x tprime yprime) if c = 1

If x = axprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

ε ctprime y =

b(ε tprime yprime) if c = 1(ε tprime yprime) if c = 1empty otherwise

11

We havex ε y = empty if x y ε Finally we setε t ε = ε if t = ε andemptyotherwise We extendt to sets of routes and languages as expected

x T y =⋃tisinT

x t y forallT sube 0011lowast x y isin Σlowast

L1 T L2 =⋃

xisinL1yisinL2

x T y

For example ifx = abc y = cbcandT = 010011010011 thenx T y =acbcbcabbc

It turns out that splicing on routes can be simulated by a combination of shuffleon trajectories and deletion along trajectories [22]

Theorem 51 There exist weak codingsπ1 π2 0101lowast rarr idlowast and a weakcodingπ3 0101lowast rarr 01lowast such that for all t isin 0011lowast and for allx y isin Σlowast we have

x t y = (xπ1(t) Σlowast) π3(t) (yπ2(t) Σ

lowast)

Corollary 52 There exist weak codingsπ1 π2 0101lowast rarr idlowast and π3 0101lowast rarr 01lowast such that for all Tsube 0011lowast and L1 L2 sube Σ

lowast

L1 T L2 =⋃tisinT

(L1π1(t) Σlowast) π3(t) (L2π2(t) Σ

lowast)

Unfortunately the identity

L1 T L2 = (L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast)

does not hold in general even ifL1 L2 are singletons and|T | = 2 For example ifL1 = ab L2 = cd andT = 00110011 then

L1 T L2 = bcad

(L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast) = acadbcbd

However ifT is aunaryset of routes by which we mean thatT sube 00lowast1lowast

then we have the following result [22]

Corollary 53 Let T sube 00lowast1lowast Then for all Lsube Σlowast

L T Σlowast = Lπ1(T) Σ

lowast

12

We refer the reader to Mateescu [77] for a discussion of unary operationsdefined by splicing on routes As an example consider that withT = 0n0

n n ge

01lowast L T Σ

lowast = 12(L) where1

2(L) was given in Section 31Bel-Enguixet al [3] have used splicing on routes to model dialogue in natural

language Adialogueis the result of two or more actors who alternately com-municate The alternating nature of trajectory-based operations makes it naturalto apply them to modeling dialogue Routes and in particular the ability to notinsert portions of certain actorrsquos input is important for selecting different piecesof information for different dialogues ldquoDepending on the conversational goal andon the behaviour of the agents different acts will be selected rdquo [3] Bel-Enguixetal [3] note three important classes of routes used in dialogue

(a) Imbrication in a dialogue is the result of applying a route from the set ofroutes defined by the regular expression ((01)+ (01))+ Imbrication repre-sents the ldquoperfect conversationrdquo where actors strictly alternate

(b) Concatenationin a dialogue is the result of applying a route from the setof route defined by the regular expression (0+ 0)+(1 + 1)+ According toBel-Enguix concatenation has applications to more formal dialogues suchas debates and round tables [3]

(c) Merging is the result of applying a route which does not fall into the cat-egories of imbrication or concatenation Merging is more common thanimbrication and concatenation in informal dialogues

Bel-Enguixet al [3] mention several open areas of research relating to theuse of splicing on routes for modelling dialogue In particular they ask aboutestablishing a mechanism in order to select routes depending on context Thiswould establish a more semantic operation in the terminology of Section 6 below

6 Semantic Shuffle on Trajectories

In the paper which introduced shuffle on trajectories Mateescuet almake a dis-tinction betweensyntacticandsemanticoperations on words

[Shuffle on trajectories is] based on syntactic constraints on the shuffleoperations The constraints are referred to as syntactic constraintssince they do not concern properties of the words that are shuffled orproperties of the letters that occur in these words

Instead the constraints involve the general strategy to switch fromone word to another word Once such a strategy is defined the struc-ture of the words that are shuffled does not play any role

13

However constraints that take into consideration the inner structureof the words that are shuffled together are referred to as semantic con-straints [79 p 2]

The author [19] has introduced a semantic variant of shuffle on trajectoriesnaturally calledsemantic shuffle on trajectories(SST) The corresponding no-tion for deletion on trajectories is calledsemantic deletion on trajectories(SDT)The advantages of SST and SDT are that they preserves many of the desirableproperties of the usual syntactic shuffle on trajectories while being capable ofsimulating more operations of interest

The two semantic constructs we introduce aresynchronizationand contentrestriction Synchronization allows for only one letter to be output for two cor-responding identical symbols in the input words Content restriction allows atrajectory to specify that a particular letter must appear at a specific point This isinspired by bio-informatical operations where operations occur only in the con-text of certain subsequences of the DNA strand

Before we define SST we define the trajectory alphabet LetΓ = 01 σ Forany alphabetΣ letΓΣ = Γ cup (Γ times Σ) For ease of readability we denote [ca] by

ac

for all a isin Σ andc isin ΓWe can now define the SST operation LetΣ be an alphabett isin Γlowast

Σand

x y isin Σlowast Then the SST ofx andy alongt denotedx t y is defined as followsIf x = axprime y = byprime (whereab isin Σ xprime yprime isin Σlowast) andt = ctprime wherec isin ΓΣ andtprime isin Γlowast

Σ then

x t y =

a(xprime tprime y) if c isin 0

a0

b(x tprime yprime) if c isin 1b1

a(xprime tprime yprime) if a = b andc isin σaσ

empty otherwise

If x = axprime y = ε andt = ctprime then

x t ε =

a(xprime tprime ε) if c isin 0a0

empty otherwise

If x = ε y = byprime andt = ctprime then

ε t y =

b(ε tprime yprime) if c isin 1b1

empty otherwise

If x = y = ε thenx t y = ε if t = ε andempty otherwise Finally ifx y ε thenx ε y = empty If x y isin Σlowast andT sube Γlowast

Σ thenx T y = cuptisinT x T y If L1 L2 sube Σ

lowast and

14

T sube ΓlowastΣ thenL1 T L2 = cupxisinL1yisinL2 x T y The associated deletion operation over

the alphabet∆Σ = ∆ cup (∆ times Σ) (where∆ = id σ) is defined in the natural way[19]

We now consider givenΣ and two sets of trajectoriesT1T2 sube ΓlowastΣ whether

the operations T1 T2 coincide that is whetherL1 T1 L2 = L1 T2 L2 for alllanguagesL1 L2 sube Σ

lowast If T1 T2 represent in this sense the same operation wesay thatT1T2 areequivalentsets of trajectories

We note that it is possible for two distinct sets of trajectoriesT1T2 sube ΓlowastΣ

to be

equivalent As a simple example considerT1 = a0

a1 andT2 =

a1

a0 Note that for

i = 12

L1 Ti L2 =

aa if L1 cap L2 supe aempty otherwise

ThusT1T2 are equivalent but not equalBy using a special case of partial commutation and trace languages (see Diek-

ert and Meacutetivier [15]) we can show that two sets of trajectoriesT1T2 sube ΓlowastΣ

areequivalent if and only if their corresponding trace languages (under the naturalmorphism) are equivalent This implies the following important decidability re-sult

Theorem 61 Let Σ be an alphabet and T1T2 sube ΓlowastΣ If T1T2 are regular it is

decidable whether T1 and T2 are equivalent

We can now consider some examples of the power of SST and SDT We firstnote that SST consists of a valid extension of the shuffle on trajectories ifT sube01lowast thenL1 T L2 = L1 T L2 the syntactic shuffle on trajectories operation[79] We also note that ifT = σlowast then T = cap

Given the very natural set of trajectoriesT = (0 + 1 + σ)lowast T denotes theinfiltration product uarr see eg Pin and Sakarovitch [81] The infiltration productis defined as follows ifx = x1 xn is a word of lengthn andI = (i1 i2 ir) isa subsequence of (12 n) let xI = xi1 xi2 middot middot middot xir Then givenx y isin Σlowast

x uarr y = z isin Σlowast existI J sube [|z|] such thatI cup J = [|z|] zI = x andzJ = y

For exampleabuarr ba= abababbaabbabaabbaababWe now show how to use SST and SDT to simulate ciliate bio-operations

which have been the subject of recent research in the literature A model of cil-iate bio-operations without circular variants were introduced by Daleyet al [11]to mimic the manner in which DNA is unscrambled in the DNA of certain uni-cellular ciliates in the process of asexual reproduction Ciliate bio-operations arealso investigated by Ehrenfeuchtet al [28] Prescottet al [82] and Daley andMcQuillan [12] using various approaches

15

Daley et al [11] define several language operations which simulate ciliatebio-operations including synchronized insertion deletion and bi-polar deletionSynchronized insertion can be given as follows

α oplus β = uavaw a isin Σ α = uaw β = va

The operation is extended to languages as usual LetT =⋃

aisinΣ 0lowasta0 1lowast

a1 0lowast Then

for all L1 L2 sube Σlowast L1 oplus L2 = L1 T L2 The operations of synchronized deletion

and synchronized bi-polar deletion (also defined by Daleyet al [11]) can also besimulated by SDT

Contextual insertion and deletion were introduced by Kari and Thierrin as asimple set of operations which are capable for modelling DNA computing [58]Let Σ be an alphabet and [x y] isin (Σlowast)2 We call [x y] a context Then givenvu isin Σlowast the [x y]-contextual insertionof v into u is given byu larrminus[xy] v = u1xvyu2 u = u1xyu2u1u2 isin Σ

lowast Let C sube (Σlowast)2 Then

u larrminusC v =⋃

[xy]isinC

u larrminus[xy] v

The operationlarrminusC is extended to languages monotonically as expected Letx =x1 middot middot middot xn andy = y1 middot middot middot ym be arbitrary words overΣ Then define

T[xy] = 0lowastnprod

i=1

xi

0 1lowastmprod

i=1

yi

0 0lowast

We naturally extend this toTC = cup[xy]isinCT[xy] for all C sube (Σlowast)2 Under this defini-tion it is clear that TC =

larrminusC for all C sube (Σlowast)2

Kari and Thierrin note that ifC sube (Σlowast)2 is finite then the regular and context-free languages are closed underlarrminus

C As TC is regular for all finiteC we notethat the closure of the regular languages underlarrminus

C is a consequence of the closureproperties of SST It is known that the CFLs are closed underlarrminus

C [58] Howeverin general the CFLs are not closed underT This leads to the following openproblem

Open Problem 62Find necessary and sufficient (language-theoretic) conditionson a set of trajectories Tsube Γlowast

Σsuch that the CFLs overΣ are closed under T

We note that [x y]-contextual deletion and [x y]-contextual bi-polar deletion[58] can be simulated by SDT [19]

Recall thatsynchronized shuffle (see eg Latteux and Roos [66]) is defined asfollows LetΣ1Σ2 be alphabets not necessarily disjoint Letρi (Σ1 cup Σ2) rarr Σi

be the projection ontoΣi given byρi(a) = a for all a isin Σi andρi(a) = ε for alla isin (Σ1 cup Σ2) minus Σi

16

Let Li sube Σi for i = 12 Then the synchronized shuffle of L1 andL2 denotedL1 L2 is given by

L1 L2 = ρminus11 (L1) cap ρ

minus12 (L2)

Lemma 63 LetΣ1Σ2 be alphabets Let Tsube ΓlowastΣ1cupΣ2

be given by

T = ((⋃

aisinΣ1capΣ2

aσ) + (

⋃aisinΣ1minusΣ2

a0) + (

⋃aisinΣ2minusΣ1

a1))lowast

Then for all L1 L2 such that Li sube Σi for i = 12 L1 L2 = L1 T L2

Further examples of operations simulated by SST are described by the author[19]

We note that many of the results on language equations hold for SST In par-ticular the following are new results on bio-operations which have not been notedbefore [19]

Corollary 64 Let R be a regular language Then it is decidable whether thereexist languages X1X2 such that R= X1 oplus X2

Corollary 65 Let C sube (Σlowast)2 be a finite set of contexts Let R be a regularlanguage Then it is decidable whether there exist languages X1X2 such thatR= X1

larrC X2

7 Descriptional Complexity

Descriptional complexity of formal languages deals with the problems of concisedescriptions of languages in terms of generative or accepting devices For in-stance the(deterministic) state complexityof a regular languageL is the minimalnumber of states in any deterministic finite automaton acceptingL [94] Nonde-terministic state complexity of a regular language is similarly defined (see egHolzer and Kutrib [36])

For shuffle on trajectories Mateescuet al [79] and Harjuet al [34] both giveproofs that given a regular set of trajectoriesT and regular languagesL1 L2 theoperationL1 T L2 always yields a regular language Thus it is reasonable toconsider the state complexity of shuffle on trajectories The author and K Salo-maa [24] have obtained results in this area We state an upper bound in terms ofnondeterministic state complexity

Lemma 71 Let L1 L2 be regular languages overΣlowast and Tsube 01lowast be a regularset of trajectories Then

sc(L1 T L2) le 2nsc(L1)nsc(L2)nsc(T)

17

The following open problem appears to be very challenging

Open Problem 72 For what regular sets of trajectories Tsube 01lowast does theconstruction given by Lemma 71 give a construction which is best possible

Consider unrestricted shuffle given by the set of trajectoriesT = (0 + 1)lowastThe bound of Lemma 71 in this case is 2nsc(L1)nsc(L2) Cacircmpeanuet al [8] haveshown that there exist languagesL1 andL2 accepted by incomplete DFAs havingrespectivelyn andm states such that any incomplete DFA acceptingL1 L2 hasat least 2nm minus 1 states This bound is optimal for incomplete DFAs howeverfor complete DFAs it gives only the lower bound 2(sc(L1)minus1)(sc(L2)minus1) However weregard this as near enough to our goal of Lemma 71 for our purposes ie weregardT = (0 + 1)lowast as an example of a set of trajectoriesT satisfying OpenProblem 72

The density functionof a languageL sube Σlowast is defined bypL N rarr N aspL(n) = |L cap Σn| for all n ge 0 That is pL(n) gives the number of words oflengthn in L By the density of a languageL we informally mean the asymptoticbehaviour ofpL The following important result of Szilardet al [91 Thm 3]characterizes the density of regular languages

Theorem 73 A regular language R overΣ satisfies pR(n) isin O(nk) k ge 0 ifand only if R can be represented as a finite union of regular expressions of thefollowing form xylowast1z1 middot middot middot ylowastt zt where x y1 z1 middot middot middot yt zt isin Σ

lowast and0 le t le k+ 1

Call a languageL slenderif pL(n) isin O(1) [83] Let R be a regular languagewhich has polynomial densityO(nk) and lett be the smallest integer such thatR = cupt

i=1xiylowasti1zi1 middot middot middot ylowastikiziki 0 le ki le k + 1 i = 1 t Then callt the UkL-

indexof L If k = 0 we callt theUSL-indexof L (languages with USL indextare calledt-thin by Paun and Salomaa [83] slender regular languages were alsocharacterized independently by Shallit [87 Lemma 3 p 336])

Lemma 74 Let T = uvlowast where u v isin 01lowast Let Li be regular languages overΣwith sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le |uv|n1n2 (2)

We now give a bound for sets of trajectoriesT = uvlowastw with w ε

Lemma 75 Let T = uvlowastw where u vw isin 01lowast and w ε Let Li be regularlanguages overΣ with sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le n1n2

|u| + 1+ |v|(n1n2)

lceil|w||v|

rceil+1minus n1n2

n1n2 minus 1

(3)

18

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

However we note that the same result does not hold if we replace ldquoshuffleon trajectoriesrdquo by ldquodeletion along trajectoriesrdquo As motivation we begin with abasic example LetΣ be an alphabet andH = indn n ge 0 Note that

R1H R2 = x isin Σlowast existy isin R2 such thatxy isin R1 and|x| = |y|

We can establish directly (by constructing an NFA) that for all regular languagesR1R2 sube Σ

lowast the languageR1H R2 is regular HoweverH itself is not regularWe remark thatR1 H R2 is similar to proportional removals studied by

Stearns and Hartmanis [90] Amar and Putzolu [1 2] Seiferas and McNaughton[86] Kosaraju [60 61 62] Kozen [63] Zhang [96] the author [16] Bersteletal [4] and others In particular we note the case of1

2(L) given by

12

(L) = x isin Σlowast existy isin Σlowast such thatxy isin L and|x| = |y|

The operation12(L) is one of a class of operations which preserve regularity

Seiferas and McNaughton completely characterize those binary relationsr sube N2

such that the operation

P(L r) = x isin Σlowast existy isin Σlowast such thatxy isin L andr(|x| |y|)

preserves regularityRecall that a setA is ultimately periodic (up) if there existn0 p isin N p gt 0

such that for allx ge n0 x isin I lArrrArr x + p isin I Call a binary relationr sube N2

up-preservingif A up impliesrminus1(A) = i exist j isin A such thatr(i j) is alsoup Then the binary relationsr such thatP(middot r) preserves regularity are preciselythe up-preserving relations [86] This was extended to deletion on trajectories[17 22]

Theorem 33 Let r sube N2 be a binary relation and Hr = indm r(nm) TheoperationHr is regularity-preserving if and only if r is up-preserving

Theorem 33 has been extended by the author [17 22] to cover other boundedsets of trajectories

32 Algebraic Properties

Kari and Sosiacutek [56] show the following results concerning algebraic properties ofdeletion along trajectories

Theorem 34Let T sube idlowast The following three conditions are equivalent

5

(a) the operationT is commutative(b) L1T L2 sube ε for all languages L1 L2(c) T sube dlowast

Corollary 35 Given a context-free set of trajectories Tsube idlowast it is decidablewhetherT is a commutative operation

Theorem 36 For a set of trajectories Tsube idlowast the following two conditionsare equivalent

(i) For all t1 isin 1m 0n t2 isin 1n 0i and t3 isin 1j 0m+n i jmn isin N

(a) mgt 0 and t1 isin T implies t2 lt T(b) mgt 0 and t1 isin T implies t3 lt T(c) t1 isin T and0m isin T implies0n isin T(d) t1 isin T and0n isin T implies0m isin T

(ii) T is an associative operation

However the associated decidability problem is apparently open

Open Problem 37Given a regular set of trajectories Tsube idlowast is it decidablewhetherT is an associative operation What if T is context-free

33 Commutative Closure

Recall that the commutative closure of a wordw isin Σlowast is the setcom(w) = x foralla isin Σ |w|a = |x|a The commutative closure of a languageL is the union ofthe commutative closure of each word inL Note that the operationcomdoes notpreserve regularitycom((ab)lowast) = x isin ablowast |x|a = |x|b

The author Mateescu K Salomaa and Yu [23] have shown that ifT is com-plete and associative then for all languagesL we can define an a congruencerelationsimLT Then the factor monoid ofLT = (P(Σlowast) T ε) with respect tosimLT

is finite if and only ifcom(L) is a regular language [23]

4 Language Equations

The immediate consequence of the introduction of deletion on trajectories is itsapplication to language equations A language equation is an equality involvingconstant languages language operations and unknowns We refer the reader toLeiss [67] for an introduction to the theory of language equations

6

There are several facets to research on language equations For instance givena language equation or a system of language equations we can seek to character-ize the solutions to the equation or simply decide if a solution exists In theresearch on language equations involving shuffle on trajectories the focus of re-search has been the latter task

41 Zero Variable Equations

The most simple language equations are those without any variables at all Itfollows from the closure properties of T andT that it is decidable if eitherR1 T R2 = R3 or R1 T R2 = R3 holds if all of R1R2 andR3 are regular lan-guages andT is regular Below we review decidability results when the languagesand set of trajectories are not all regular

Let Ψa(T) = |t|a t isin T wherea is a letter and|t|a denotes the number ofzeroes int Kari and Sosiacutek [56] establish the following elegant result

Theorem 41 Let T be an arbitrary but fixed regular set of trajectories Theproblem ldquoIs L1 T L2 = Rrdquo is decidable for a CFL L1 and regular languagesL2R if and only ifΨ0(T) is finite

The analogous result for deletion on trajectories (ieΨi(T) is finite) also holds[56] Further ifL1 is taken to be a regular language andL2 is taken to be a CFLthen the necessary and sufficient conditions are thatΨ1(T) is finite [56]

We can contrast the above results with the following surprising result whichwas established by the author and K Salomaa [26]

Theorem 42 Let T = 0i1r0j1s0k r s r s are odd i j k ge 0 Then fora given alphabetΣ and given regular languages SR1R2 sube Σ it is undecidablewhether or not S= R1 T R2

The result is interesting since it is an undecidability result about regular lan-guages given three regular languages it is undecidable whether the given equalityholds the context-free set of trajectoriesT if fixed (and is not part of the input)

42 One Variable Equations

One variable equations involving shuffle and deletion on trajectories have beenstudied by the author [17 22] Kari and Sosiacutek [55 56] and the author and K Sa-lomaa [25] We summarize these results below

By general results of Kari on the inverse of language operations [50] the fol-lowing result on one variable language equations involving shuffle and deletionalong trajectories is immediate

7

Theorem 43Let LR be regular languages Then it is decidable whether any ofthe following equations have a solution X and if so the maximal solution (underinclusion) is an effectively constructible regular language

(a) X T L = R where Tsube 01lowast is regular(b) L T X = R where Tsube 01lowast is regular(c) XT L = R where Tsube idlowast is regular(d) LT X = R where Tsube idlowast is regular

We can also examine the question of if it is decidable whether a set of trajec-tories can be found to satisfy a fixed language equation

Theorem 44 Let L1 L2R sube Σlowast be regular languages Then it is decidablewhether

(a) there exists a set Tsube 01lowast of trajectories with L1 T L2 = R(b) there exists a set Tsube idlowast of trajectories with L1T L2 = R

We now turn to undecidability LetΠ0Π1 01lowast rarr 01lowast be the projectionsgiven byΠ0(0) = 0Π0(1) = ε andΠ1(1) = 1Π1(0) = ε We say thatT sube 01lowast

is left-enabling(respright-enabling) if Π0(T) = 0lowast (respΠ1(T) = 1lowast) It isundecidable whether the corresponding equation has a solution

Theorem 45 Fix T sube 01lowast to be a regular set of left-enabling (resp right-enabling) trajectories For a given LCFL L and regular language R it is undecid-able whether or not L T X = R (resp X T L = R) has a solution X

Say that a set of trajectories is left-preserving (resp right-preserving) ifT supe0lowast (respT supe 1lowast) We can give an incomparable result which removes the condi-tion thatT must be regular but must strengthen the conditions on words inT

Theorem 46 Fix a left-preserving (resp right-preserving) set of trajectoriesT sube 01lowast Given an LCFL L and a regular language R it is undecidable whetherthere exists a language X such that LT X = R (resp X T L = R)

By an extension of Theorem 42 we can also show that there exists a fixedcontext-free set of trajectoriesT such that on inputR1R2 (regular languages)deciding whether or not there exists a languageX such thatR1 = X T R2 isundecidable [26] We also have the following undecidability results concerningthe existence of sets of trajectories [25 22]

Theorem 47 Given an LCFL L and regular languages R1R2 it is undecidablewhether there exists Tsube 01lowast such that (a) R1 T R2 = L (b) R1 T L = R2 or(c) L T R1 = R2

Given an LCFL L and regular languages R1R2 it is undecidable whetherthere exists Tsube idlowast such that (a) R1 T R2 = L (b) L T R1 = R2 or (c)R1T L = R2

8

43 Two Variable Equations

We can also consider language equations with two variables As an exampleconsider the well-studied language equation

L = X1X2 (1)

whereL is a fixed language andX1X2 are unknown Study of this equation hasbeen undertaken by Conway [10] Kari and Thierrin [59] Salomaa and Yu [85]and Choffrut and Karhumaumlki [9] In particular it is known that ifL is a regularlanguage we can determine if a solution to (1) exists Furthermore it is knownthat if a solutionX1X2 exists there exists a maximal regular solution ie thereexistsR1R2 such thatXi sube Ri for i = 12 andL = R1R2

For language equations of the formL = X1 T X2 unlike the case of one-variable equations we do not have a general result which applies to all regularsets of trajectoriesT However a large class of trajectories are covered by theauthor and K Salomaa [25 22]

Recall that a languageL sube Σlowast is boundedif there existw1w2 wn isin Σlowast

such thatL sube wlowast1wlowast2 middot middot middotw

lowastn We say thatL is letter-boundedif wi isin Σ for all

1 le i le n The following result is due to the author and K Salomaa [25 22]

Theorem 48 Let T sube 01lowast be a letter-bounded regular set of trajectoriesThen given a regular language R it is decidable whether there exist X1X2 suchthat X1 T X2 = R

As we have mentioned this result was known for catenationT = 0lowast1lowast How-ever it also holds for eg the following operations insertion (0lowast1lowast0lowast) k-insertion(0lowast1lowast0lek for fixedk ge 0) and bi-catenation (1lowast0lowast + 0lowast1lowast)

We also note that if the equationX1 T X2 = R has a solution whereR isa regular language andT is a letter-bounded regular set of trajectories then theequation also has solutionY1 T Y2 = R whereY1Y2 are regular languages Thisresult is well-known forT = 0lowast1lowast (see eg Choffrut and Karhumaumlki [9])

For T = (0+ 1)lowast the two-variable decomposition problem is open [7]

Open Problem 49Given a regular language R is it decidable whether thereexist X1X2 (with X1X2 ε) such that R= X1 X2

Recall that a languageL is k-thin if |L cap Σn| le k for all n ge 0 The followingproblem is also open

Open Problem 410Given a k-thin set of trajectories Tsube 01lowast is it decidablegiven a regular language R whether R has a shuffle decomposition with respectto T

9

The author and K Salomaa [26] have shown that ifT sube 01lowast is a fixed1-thin set of trajectories given a regular languageR it is decidable whetherRhas a shuffle decomposition with respect toT However even for 2-thin sets oftrajectories the problem remains open [26]

We can also note that if the left and right operand are restricted to be the samethe resulting language equation problem remains decidable if the set of trajectoriesis regular and letter-bounded [25 22]

Theorem 411Fix a letter-bounded regular set of trajectories Tsube 01lowast Thenit is decidable whether there exists a solution X to the equation XT X = R for agiven regular language R

We now turn to undecidability It has been shown [7] that it is undecidablewhether a context-free language has a nontrivial shuffle decomposition with re-spect to the set of trajectories01lowast This result can be extended for arbitrarycomplete regular sets trajectories [25] (Note that ifT is a complete set of trajec-tories then any languageL has decompositionsL Tε andε TL Below weexclude these trivial decompositions all other decompositions ofL are said to benontrivial)

Theorem 412Let T be any fixed complete regular set of trajectories For a givencontext-free language L it is undecidable whether or not there exist languagesX1X2 ε such that L= X1 TX2

We also note the following open problem [26]

Open Problem 413Is it possible to construct a fixed context-free set of trajecto-ries T such that it is undecidable whether there exist languages X1X2 ε suchthat L= X1 TX2

Also open are other forms of language equation We mention only two here

Open Problem 414Find necessary and sufficient conditions on sets of trajecto-ries T1T2 so that given a regular language R it is decidable whether there existnontrivial languages X1X2X3 satisfying(X1 T1 X2) T2 X3 = R

Open Problem 415Given regular languages R1R2 is it decidable whetherthere exists nontrivial languages X1T (with T sube 01lowast) such that X1 T R1 = R2

or R1 T X1 = R2

5 Splicing on Routes

The notion of shuffle on trajectories was extended by Mateescu [77] to encompasscertain splicing operations This extension is calledsplicing on routes Splicing

10

on routes is a proper extension of shuffle on trajectories and also encompassesseveral unary operations Bel-Enguixet al also use the concept of splicing onroutes to model dialog in natural language [3]

Splicing on routes was introduced by Mateescu [77] to model generaliza-tions of the crossover splicing operation (see Mateescu [77] for a definition ofthe crossover splicing operation) Splicing on routes generalizes the crossoversplicing operation by specifying a setT of routes which restricts the way in whichsplicing can occur The result is that specific sets of routes can simulate not onlythe crossover operation but also such operations on DNA such as thesimple splic-ing and theequal-length crossoveroperations (see Mateescu for details and defi-nitions of these operations [77])

We now define the concept ofsplicing on routes and note the difference be-tween deletion along trajectories from splicing on routes which allows discardingletters from either input word In particular aroute is a wordt specified overthe alphabet0011 where informally 01 means insert the letter from theappropriate word and01 means discard that letter and continue

Formally letx y isin Σlowast andt isin 0011lowast We define the splicing ofx andydenotedx t y recursively as follows ifx = axprime y = byprime (ab isin Σ) andt = ctprime

(c isin 0011) then

x ctprime y =

a(xprime tprime y) if c = 0(xprime tprime y) if c = 0b(x tprime yprime) if c = 1(x tprime yprime) if c = 1

If x = axprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

ε ctprime y =

b(ε tprime yprime) if c = 1(ε tprime yprime) if c = 1empty otherwise

11

We havex ε y = empty if x y ε Finally we setε t ε = ε if t = ε andemptyotherwise We extendt to sets of routes and languages as expected

x T y =⋃tisinT

x t y forallT sube 0011lowast x y isin Σlowast

L1 T L2 =⋃

xisinL1yisinL2

x T y

For example ifx = abc y = cbcandT = 010011010011 thenx T y =acbcbcabbc

It turns out that splicing on routes can be simulated by a combination of shuffleon trajectories and deletion along trajectories [22]

Theorem 51 There exist weak codingsπ1 π2 0101lowast rarr idlowast and a weakcodingπ3 0101lowast rarr 01lowast such that for all t isin 0011lowast and for allx y isin Σlowast we have

x t y = (xπ1(t) Σlowast) π3(t) (yπ2(t) Σ

lowast)

Corollary 52 There exist weak codingsπ1 π2 0101lowast rarr idlowast and π3 0101lowast rarr 01lowast such that for all Tsube 0011lowast and L1 L2 sube Σ

lowast

L1 T L2 =⋃tisinT

(L1π1(t) Σlowast) π3(t) (L2π2(t) Σ

lowast)

Unfortunately the identity

L1 T L2 = (L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast)

does not hold in general even ifL1 L2 are singletons and|T | = 2 For example ifL1 = ab L2 = cd andT = 00110011 then

L1 T L2 = bcad

(L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast) = acadbcbd

However ifT is aunaryset of routes by which we mean thatT sube 00lowast1lowast

then we have the following result [22]

Corollary 53 Let T sube 00lowast1lowast Then for all Lsube Σlowast

L T Σlowast = Lπ1(T) Σ

lowast

12

We refer the reader to Mateescu [77] for a discussion of unary operationsdefined by splicing on routes As an example consider that withT = 0n0

n n ge

01lowast L T Σ

lowast = 12(L) where1

2(L) was given in Section 31Bel-Enguixet al [3] have used splicing on routes to model dialogue in natural

language Adialogueis the result of two or more actors who alternately com-municate The alternating nature of trajectory-based operations makes it naturalto apply them to modeling dialogue Routes and in particular the ability to notinsert portions of certain actorrsquos input is important for selecting different piecesof information for different dialogues ldquoDepending on the conversational goal andon the behaviour of the agents different acts will be selected rdquo [3] Bel-Enguixetal [3] note three important classes of routes used in dialogue

(a) Imbrication in a dialogue is the result of applying a route from the set ofroutes defined by the regular expression ((01)+ (01))+ Imbrication repre-sents the ldquoperfect conversationrdquo where actors strictly alternate

(b) Concatenationin a dialogue is the result of applying a route from the setof route defined by the regular expression (0+ 0)+(1 + 1)+ According toBel-Enguix concatenation has applications to more formal dialogues suchas debates and round tables [3]

(c) Merging is the result of applying a route which does not fall into the cat-egories of imbrication or concatenation Merging is more common thanimbrication and concatenation in informal dialogues

Bel-Enguixet al [3] mention several open areas of research relating to theuse of splicing on routes for modelling dialogue In particular they ask aboutestablishing a mechanism in order to select routes depending on context Thiswould establish a more semantic operation in the terminology of Section 6 below

6 Semantic Shuffle on Trajectories

In the paper which introduced shuffle on trajectories Mateescuet almake a dis-tinction betweensyntacticandsemanticoperations on words

[Shuffle on trajectories is] based on syntactic constraints on the shuffleoperations The constraints are referred to as syntactic constraintssince they do not concern properties of the words that are shuffled orproperties of the letters that occur in these words

Instead the constraints involve the general strategy to switch fromone word to another word Once such a strategy is defined the struc-ture of the words that are shuffled does not play any role

13

However constraints that take into consideration the inner structureof the words that are shuffled together are referred to as semantic con-straints [79 p 2]

The author [19] has introduced a semantic variant of shuffle on trajectoriesnaturally calledsemantic shuffle on trajectories(SST) The corresponding no-tion for deletion on trajectories is calledsemantic deletion on trajectories(SDT)The advantages of SST and SDT are that they preserves many of the desirableproperties of the usual syntactic shuffle on trajectories while being capable ofsimulating more operations of interest

The two semantic constructs we introduce aresynchronizationand contentrestriction Synchronization allows for only one letter to be output for two cor-responding identical symbols in the input words Content restriction allows atrajectory to specify that a particular letter must appear at a specific point This isinspired by bio-informatical operations where operations occur only in the con-text of certain subsequences of the DNA strand

Before we define SST we define the trajectory alphabet LetΓ = 01 σ Forany alphabetΣ letΓΣ = Γ cup (Γ times Σ) For ease of readability we denote [ca] by

ac

for all a isin Σ andc isin ΓWe can now define the SST operation LetΣ be an alphabett isin Γlowast

Σand

x y isin Σlowast Then the SST ofx andy alongt denotedx t y is defined as followsIf x = axprime y = byprime (whereab isin Σ xprime yprime isin Σlowast) andt = ctprime wherec isin ΓΣ andtprime isin Γlowast

Σ then

x t y =

a(xprime tprime y) if c isin 0

a0

b(x tprime yprime) if c isin 1b1

a(xprime tprime yprime) if a = b andc isin σaσ

empty otherwise

If x = axprime y = ε andt = ctprime then

x t ε =

a(xprime tprime ε) if c isin 0a0

empty otherwise

If x = ε y = byprime andt = ctprime then

ε t y =

b(ε tprime yprime) if c isin 1b1

empty otherwise

If x = y = ε thenx t y = ε if t = ε andempty otherwise Finally ifx y ε thenx ε y = empty If x y isin Σlowast andT sube Γlowast

Σ thenx T y = cuptisinT x T y If L1 L2 sube Σ

lowast and

14

T sube ΓlowastΣ thenL1 T L2 = cupxisinL1yisinL2 x T y The associated deletion operation over

the alphabet∆Σ = ∆ cup (∆ times Σ) (where∆ = id σ) is defined in the natural way[19]

We now consider givenΣ and two sets of trajectoriesT1T2 sube ΓlowastΣ whether

the operations T1 T2 coincide that is whetherL1 T1 L2 = L1 T2 L2 for alllanguagesL1 L2 sube Σ

lowast If T1 T2 represent in this sense the same operation wesay thatT1T2 areequivalentsets of trajectories

We note that it is possible for two distinct sets of trajectoriesT1T2 sube ΓlowastΣ

to be

equivalent As a simple example considerT1 = a0

a1 andT2 =

a1

a0 Note that for

i = 12

L1 Ti L2 =

aa if L1 cap L2 supe aempty otherwise

ThusT1T2 are equivalent but not equalBy using a special case of partial commutation and trace languages (see Diek-

ert and Meacutetivier [15]) we can show that two sets of trajectoriesT1T2 sube ΓlowastΣ

areequivalent if and only if their corresponding trace languages (under the naturalmorphism) are equivalent This implies the following important decidability re-sult

Theorem 61 Let Σ be an alphabet and T1T2 sube ΓlowastΣ If T1T2 are regular it is

decidable whether T1 and T2 are equivalent

We can now consider some examples of the power of SST and SDT We firstnote that SST consists of a valid extension of the shuffle on trajectories ifT sube01lowast thenL1 T L2 = L1 T L2 the syntactic shuffle on trajectories operation[79] We also note that ifT = σlowast then T = cap

Given the very natural set of trajectoriesT = (0 + 1 + σ)lowast T denotes theinfiltration product uarr see eg Pin and Sakarovitch [81] The infiltration productis defined as follows ifx = x1 xn is a word of lengthn andI = (i1 i2 ir) isa subsequence of (12 n) let xI = xi1 xi2 middot middot middot xir Then givenx y isin Σlowast

x uarr y = z isin Σlowast existI J sube [|z|] such thatI cup J = [|z|] zI = x andzJ = y

For exampleabuarr ba= abababbaabbabaabbaababWe now show how to use SST and SDT to simulate ciliate bio-operations

which have been the subject of recent research in the literature A model of cil-iate bio-operations without circular variants were introduced by Daleyet al [11]to mimic the manner in which DNA is unscrambled in the DNA of certain uni-cellular ciliates in the process of asexual reproduction Ciliate bio-operations arealso investigated by Ehrenfeuchtet al [28] Prescottet al [82] and Daley andMcQuillan [12] using various approaches

15

Daley et al [11] define several language operations which simulate ciliatebio-operations including synchronized insertion deletion and bi-polar deletionSynchronized insertion can be given as follows

α oplus β = uavaw a isin Σ α = uaw β = va

The operation is extended to languages as usual LetT =⋃

aisinΣ 0lowasta0 1lowast

a1 0lowast Then

for all L1 L2 sube Σlowast L1 oplus L2 = L1 T L2 The operations of synchronized deletion

and synchronized bi-polar deletion (also defined by Daleyet al [11]) can also besimulated by SDT

Contextual insertion and deletion were introduced by Kari and Thierrin as asimple set of operations which are capable for modelling DNA computing [58]Let Σ be an alphabet and [x y] isin (Σlowast)2 We call [x y] a context Then givenvu isin Σlowast the [x y]-contextual insertionof v into u is given byu larrminus[xy] v = u1xvyu2 u = u1xyu2u1u2 isin Σ

lowast Let C sube (Σlowast)2 Then

u larrminusC v =⋃

[xy]isinC

u larrminus[xy] v

The operationlarrminusC is extended to languages monotonically as expected Letx =x1 middot middot middot xn andy = y1 middot middot middot ym be arbitrary words overΣ Then define

T[xy] = 0lowastnprod

i=1

xi

0 1lowastmprod

i=1

yi

0 0lowast

We naturally extend this toTC = cup[xy]isinCT[xy] for all C sube (Σlowast)2 Under this defini-tion it is clear that TC =

larrminusC for all C sube (Σlowast)2

Kari and Thierrin note that ifC sube (Σlowast)2 is finite then the regular and context-free languages are closed underlarrminus

C As TC is regular for all finiteC we notethat the closure of the regular languages underlarrminus

C is a consequence of the closureproperties of SST It is known that the CFLs are closed underlarrminus

C [58] Howeverin general the CFLs are not closed underT This leads to the following openproblem

Open Problem 62Find necessary and sufficient (language-theoretic) conditionson a set of trajectories Tsube Γlowast

Σsuch that the CFLs overΣ are closed under T

We note that [x y]-contextual deletion and [x y]-contextual bi-polar deletion[58] can be simulated by SDT [19]

Recall thatsynchronized shuffle (see eg Latteux and Roos [66]) is defined asfollows LetΣ1Σ2 be alphabets not necessarily disjoint Letρi (Σ1 cup Σ2) rarr Σi

be the projection ontoΣi given byρi(a) = a for all a isin Σi andρi(a) = ε for alla isin (Σ1 cup Σ2) minus Σi

16

Let Li sube Σi for i = 12 Then the synchronized shuffle of L1 andL2 denotedL1 L2 is given by

L1 L2 = ρminus11 (L1) cap ρ

minus12 (L2)

Lemma 63 LetΣ1Σ2 be alphabets Let Tsube ΓlowastΣ1cupΣ2

be given by

T = ((⋃

aisinΣ1capΣ2

aσ) + (

⋃aisinΣ1minusΣ2

a0) + (

⋃aisinΣ2minusΣ1

a1))lowast

Then for all L1 L2 such that Li sube Σi for i = 12 L1 L2 = L1 T L2

Further examples of operations simulated by SST are described by the author[19]

We note that many of the results on language equations hold for SST In par-ticular the following are new results on bio-operations which have not been notedbefore [19]

Corollary 64 Let R be a regular language Then it is decidable whether thereexist languages X1X2 such that R= X1 oplus X2

Corollary 65 Let C sube (Σlowast)2 be a finite set of contexts Let R be a regularlanguage Then it is decidable whether there exist languages X1X2 such thatR= X1

larrC X2

7 Descriptional Complexity

Descriptional complexity of formal languages deals with the problems of concisedescriptions of languages in terms of generative or accepting devices For in-stance the(deterministic) state complexityof a regular languageL is the minimalnumber of states in any deterministic finite automaton acceptingL [94] Nonde-terministic state complexity of a regular language is similarly defined (see egHolzer and Kutrib [36])

For shuffle on trajectories Mateescuet al [79] and Harjuet al [34] both giveproofs that given a regular set of trajectoriesT and regular languagesL1 L2 theoperationL1 T L2 always yields a regular language Thus it is reasonable toconsider the state complexity of shuffle on trajectories The author and K Salo-maa [24] have obtained results in this area We state an upper bound in terms ofnondeterministic state complexity

Lemma 71 Let L1 L2 be regular languages overΣlowast and Tsube 01lowast be a regularset of trajectories Then

sc(L1 T L2) le 2nsc(L1)nsc(L2)nsc(T)

17

The following open problem appears to be very challenging

Open Problem 72 For what regular sets of trajectories Tsube 01lowast does theconstruction given by Lemma 71 give a construction which is best possible

Consider unrestricted shuffle given by the set of trajectoriesT = (0 + 1)lowastThe bound of Lemma 71 in this case is 2nsc(L1)nsc(L2) Cacircmpeanuet al [8] haveshown that there exist languagesL1 andL2 accepted by incomplete DFAs havingrespectivelyn andm states such that any incomplete DFA acceptingL1 L2 hasat least 2nm minus 1 states This bound is optimal for incomplete DFAs howeverfor complete DFAs it gives only the lower bound 2(sc(L1)minus1)(sc(L2)minus1) However weregard this as near enough to our goal of Lemma 71 for our purposes ie weregardT = (0 + 1)lowast as an example of a set of trajectoriesT satisfying OpenProblem 72

The density functionof a languageL sube Σlowast is defined bypL N rarr N aspL(n) = |L cap Σn| for all n ge 0 That is pL(n) gives the number of words oflengthn in L By the density of a languageL we informally mean the asymptoticbehaviour ofpL The following important result of Szilardet al [91 Thm 3]characterizes the density of regular languages

Theorem 73 A regular language R overΣ satisfies pR(n) isin O(nk) k ge 0 ifand only if R can be represented as a finite union of regular expressions of thefollowing form xylowast1z1 middot middot middot ylowastt zt where x y1 z1 middot middot middot yt zt isin Σ

lowast and0 le t le k+ 1

Call a languageL slenderif pL(n) isin O(1) [83] Let R be a regular languagewhich has polynomial densityO(nk) and lett be the smallest integer such thatR = cupt

i=1xiylowasti1zi1 middot middot middot ylowastikiziki 0 le ki le k + 1 i = 1 t Then callt the UkL-

indexof L If k = 0 we callt theUSL-indexof L (languages with USL indextare calledt-thin by Paun and Salomaa [83] slender regular languages were alsocharacterized independently by Shallit [87 Lemma 3 p 336])

Lemma 74 Let T = uvlowast where u v isin 01lowast Let Li be regular languages overΣwith sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le |uv|n1n2 (2)

We now give a bound for sets of trajectoriesT = uvlowastw with w ε

Lemma 75 Let T = uvlowastw where u vw isin 01lowast and w ε Let Li be regularlanguages overΣ with sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le n1n2

|u| + 1+ |v|(n1n2)

lceil|w||v|

rceil+1minus n1n2

n1n2 minus 1

(3)

18

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

(a) the operationT is commutative(b) L1T L2 sube ε for all languages L1 L2(c) T sube dlowast

Corollary 35 Given a context-free set of trajectories Tsube idlowast it is decidablewhetherT is a commutative operation

Theorem 36 For a set of trajectories Tsube idlowast the following two conditionsare equivalent

(i) For all t1 isin 1m 0n t2 isin 1n 0i and t3 isin 1j 0m+n i jmn isin N

(a) mgt 0 and t1 isin T implies t2 lt T(b) mgt 0 and t1 isin T implies t3 lt T(c) t1 isin T and0m isin T implies0n isin T(d) t1 isin T and0n isin T implies0m isin T

(ii) T is an associative operation

However the associated decidability problem is apparently open

Open Problem 37Given a regular set of trajectories Tsube idlowast is it decidablewhetherT is an associative operation What if T is context-free

33 Commutative Closure

Recall that the commutative closure of a wordw isin Σlowast is the setcom(w) = x foralla isin Σ |w|a = |x|a The commutative closure of a languageL is the union ofthe commutative closure of each word inL Note that the operationcomdoes notpreserve regularitycom((ab)lowast) = x isin ablowast |x|a = |x|b

The author Mateescu K Salomaa and Yu [23] have shown that ifT is com-plete and associative then for all languagesL we can define an a congruencerelationsimLT Then the factor monoid ofLT = (P(Σlowast) T ε) with respect tosimLT

is finite if and only ifcom(L) is a regular language [23]

4 Language Equations

The immediate consequence of the introduction of deletion on trajectories is itsapplication to language equations A language equation is an equality involvingconstant languages language operations and unknowns We refer the reader toLeiss [67] for an introduction to the theory of language equations

6

There are several facets to research on language equations For instance givena language equation or a system of language equations we can seek to character-ize the solutions to the equation or simply decide if a solution exists In theresearch on language equations involving shuffle on trajectories the focus of re-search has been the latter task

41 Zero Variable Equations

The most simple language equations are those without any variables at all Itfollows from the closure properties of T andT that it is decidable if eitherR1 T R2 = R3 or R1 T R2 = R3 holds if all of R1R2 andR3 are regular lan-guages andT is regular Below we review decidability results when the languagesand set of trajectories are not all regular

Let Ψa(T) = |t|a t isin T wherea is a letter and|t|a denotes the number ofzeroes int Kari and Sosiacutek [56] establish the following elegant result

Theorem 41 Let T be an arbitrary but fixed regular set of trajectories Theproblem ldquoIs L1 T L2 = Rrdquo is decidable for a CFL L1 and regular languagesL2R if and only ifΨ0(T) is finite

The analogous result for deletion on trajectories (ieΨi(T) is finite) also holds[56] Further ifL1 is taken to be a regular language andL2 is taken to be a CFLthen the necessary and sufficient conditions are thatΨ1(T) is finite [56]

We can contrast the above results with the following surprising result whichwas established by the author and K Salomaa [26]

Theorem 42 Let T = 0i1r0j1s0k r s r s are odd i j k ge 0 Then fora given alphabetΣ and given regular languages SR1R2 sube Σ it is undecidablewhether or not S= R1 T R2

The result is interesting since it is an undecidability result about regular lan-guages given three regular languages it is undecidable whether the given equalityholds the context-free set of trajectoriesT if fixed (and is not part of the input)

42 One Variable Equations

One variable equations involving shuffle and deletion on trajectories have beenstudied by the author [17 22] Kari and Sosiacutek [55 56] and the author and K Sa-lomaa [25] We summarize these results below

By general results of Kari on the inverse of language operations [50] the fol-lowing result on one variable language equations involving shuffle and deletionalong trajectories is immediate

7

Theorem 43Let LR be regular languages Then it is decidable whether any ofthe following equations have a solution X and if so the maximal solution (underinclusion) is an effectively constructible regular language

(a) X T L = R where Tsube 01lowast is regular(b) L T X = R where Tsube 01lowast is regular(c) XT L = R where Tsube idlowast is regular(d) LT X = R where Tsube idlowast is regular

We can also examine the question of if it is decidable whether a set of trajec-tories can be found to satisfy a fixed language equation

Theorem 44 Let L1 L2R sube Σlowast be regular languages Then it is decidablewhether

(a) there exists a set Tsube 01lowast of trajectories with L1 T L2 = R(b) there exists a set Tsube idlowast of trajectories with L1T L2 = R

We now turn to undecidability LetΠ0Π1 01lowast rarr 01lowast be the projectionsgiven byΠ0(0) = 0Π0(1) = ε andΠ1(1) = 1Π1(0) = ε We say thatT sube 01lowast

is left-enabling(respright-enabling) if Π0(T) = 0lowast (respΠ1(T) = 1lowast) It isundecidable whether the corresponding equation has a solution

Theorem 45 Fix T sube 01lowast to be a regular set of left-enabling (resp right-enabling) trajectories For a given LCFL L and regular language R it is undecid-able whether or not L T X = R (resp X T L = R) has a solution X

Say that a set of trajectories is left-preserving (resp right-preserving) ifT supe0lowast (respT supe 1lowast) We can give an incomparable result which removes the condi-tion thatT must be regular but must strengthen the conditions on words inT

Theorem 46 Fix a left-preserving (resp right-preserving) set of trajectoriesT sube 01lowast Given an LCFL L and a regular language R it is undecidable whetherthere exists a language X such that LT X = R (resp X T L = R)

By an extension of Theorem 42 we can also show that there exists a fixedcontext-free set of trajectoriesT such that on inputR1R2 (regular languages)deciding whether or not there exists a languageX such thatR1 = X T R2 isundecidable [26] We also have the following undecidability results concerningthe existence of sets of trajectories [25 22]

Theorem 47 Given an LCFL L and regular languages R1R2 it is undecidablewhether there exists Tsube 01lowast such that (a) R1 T R2 = L (b) R1 T L = R2 or(c) L T R1 = R2

Given an LCFL L and regular languages R1R2 it is undecidable whetherthere exists Tsube idlowast such that (a) R1 T R2 = L (b) L T R1 = R2 or (c)R1T L = R2

8

43 Two Variable Equations

We can also consider language equations with two variables As an exampleconsider the well-studied language equation

L = X1X2 (1)

whereL is a fixed language andX1X2 are unknown Study of this equation hasbeen undertaken by Conway [10] Kari and Thierrin [59] Salomaa and Yu [85]and Choffrut and Karhumaumlki [9] In particular it is known that ifL is a regularlanguage we can determine if a solution to (1) exists Furthermore it is knownthat if a solutionX1X2 exists there exists a maximal regular solution ie thereexistsR1R2 such thatXi sube Ri for i = 12 andL = R1R2

For language equations of the formL = X1 T X2 unlike the case of one-variable equations we do not have a general result which applies to all regularsets of trajectoriesT However a large class of trajectories are covered by theauthor and K Salomaa [25 22]

Recall that a languageL sube Σlowast is boundedif there existw1w2 wn isin Σlowast

such thatL sube wlowast1wlowast2 middot middot middotw

lowastn We say thatL is letter-boundedif wi isin Σ for all

1 le i le n The following result is due to the author and K Salomaa [25 22]

Theorem 48 Let T sube 01lowast be a letter-bounded regular set of trajectoriesThen given a regular language R it is decidable whether there exist X1X2 suchthat X1 T X2 = R

As we have mentioned this result was known for catenationT = 0lowast1lowast How-ever it also holds for eg the following operations insertion (0lowast1lowast0lowast) k-insertion(0lowast1lowast0lek for fixedk ge 0) and bi-catenation (1lowast0lowast + 0lowast1lowast)

We also note that if the equationX1 T X2 = R has a solution whereR isa regular language andT is a letter-bounded regular set of trajectories then theequation also has solutionY1 T Y2 = R whereY1Y2 are regular languages Thisresult is well-known forT = 0lowast1lowast (see eg Choffrut and Karhumaumlki [9])

For T = (0+ 1)lowast the two-variable decomposition problem is open [7]

Open Problem 49Given a regular language R is it decidable whether thereexist X1X2 (with X1X2 ε) such that R= X1 X2

Recall that a languageL is k-thin if |L cap Σn| le k for all n ge 0 The followingproblem is also open

Open Problem 410Given a k-thin set of trajectories Tsube 01lowast is it decidablegiven a regular language R whether R has a shuffle decomposition with respectto T

9

The author and K Salomaa [26] have shown that ifT sube 01lowast is a fixed1-thin set of trajectories given a regular languageR it is decidable whetherRhas a shuffle decomposition with respect toT However even for 2-thin sets oftrajectories the problem remains open [26]

We can also note that if the left and right operand are restricted to be the samethe resulting language equation problem remains decidable if the set of trajectoriesis regular and letter-bounded [25 22]

Theorem 411Fix a letter-bounded regular set of trajectories Tsube 01lowast Thenit is decidable whether there exists a solution X to the equation XT X = R for agiven regular language R

We now turn to undecidability It has been shown [7] that it is undecidablewhether a context-free language has a nontrivial shuffle decomposition with re-spect to the set of trajectories01lowast This result can be extended for arbitrarycomplete regular sets trajectories [25] (Note that ifT is a complete set of trajec-tories then any languageL has decompositionsL Tε andε TL Below weexclude these trivial decompositions all other decompositions ofL are said to benontrivial)

Theorem 412Let T be any fixed complete regular set of trajectories For a givencontext-free language L it is undecidable whether or not there exist languagesX1X2 ε such that L= X1 TX2

We also note the following open problem [26]

Open Problem 413Is it possible to construct a fixed context-free set of trajecto-ries T such that it is undecidable whether there exist languages X1X2 ε suchthat L= X1 TX2

Also open are other forms of language equation We mention only two here

Open Problem 414Find necessary and sufficient conditions on sets of trajecto-ries T1T2 so that given a regular language R it is decidable whether there existnontrivial languages X1X2X3 satisfying(X1 T1 X2) T2 X3 = R

Open Problem 415Given regular languages R1R2 is it decidable whetherthere exists nontrivial languages X1T (with T sube 01lowast) such that X1 T R1 = R2

or R1 T X1 = R2

5 Splicing on Routes

The notion of shuffle on trajectories was extended by Mateescu [77] to encompasscertain splicing operations This extension is calledsplicing on routes Splicing

10

on routes is a proper extension of shuffle on trajectories and also encompassesseveral unary operations Bel-Enguixet al also use the concept of splicing onroutes to model dialog in natural language [3]

Splicing on routes was introduced by Mateescu [77] to model generaliza-tions of the crossover splicing operation (see Mateescu [77] for a definition ofthe crossover splicing operation) Splicing on routes generalizes the crossoversplicing operation by specifying a setT of routes which restricts the way in whichsplicing can occur The result is that specific sets of routes can simulate not onlythe crossover operation but also such operations on DNA such as thesimple splic-ing and theequal-length crossoveroperations (see Mateescu for details and defi-nitions of these operations [77])

We now define the concept ofsplicing on routes and note the difference be-tween deletion along trajectories from splicing on routes which allows discardingletters from either input word In particular aroute is a wordt specified overthe alphabet0011 where informally 01 means insert the letter from theappropriate word and01 means discard that letter and continue

Formally letx y isin Σlowast andt isin 0011lowast We define the splicing ofx andydenotedx t y recursively as follows ifx = axprime y = byprime (ab isin Σ) andt = ctprime

(c isin 0011) then

x ctprime y =

a(xprime tprime y) if c = 0(xprime tprime y) if c = 0b(x tprime yprime) if c = 1(x tprime yprime) if c = 1

If x = axprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

ε ctprime y =

b(ε tprime yprime) if c = 1(ε tprime yprime) if c = 1empty otherwise

11

We havex ε y = empty if x y ε Finally we setε t ε = ε if t = ε andemptyotherwise We extendt to sets of routes and languages as expected

x T y =⋃tisinT

x t y forallT sube 0011lowast x y isin Σlowast

L1 T L2 =⋃

xisinL1yisinL2

x T y

For example ifx = abc y = cbcandT = 010011010011 thenx T y =acbcbcabbc

It turns out that splicing on routes can be simulated by a combination of shuffleon trajectories and deletion along trajectories [22]

Theorem 51 There exist weak codingsπ1 π2 0101lowast rarr idlowast and a weakcodingπ3 0101lowast rarr 01lowast such that for all t isin 0011lowast and for allx y isin Σlowast we have

x t y = (xπ1(t) Σlowast) π3(t) (yπ2(t) Σ

lowast)

Corollary 52 There exist weak codingsπ1 π2 0101lowast rarr idlowast and π3 0101lowast rarr 01lowast such that for all Tsube 0011lowast and L1 L2 sube Σ

lowast

L1 T L2 =⋃tisinT

(L1π1(t) Σlowast) π3(t) (L2π2(t) Σ

lowast)

Unfortunately the identity

L1 T L2 = (L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast)

does not hold in general even ifL1 L2 are singletons and|T | = 2 For example ifL1 = ab L2 = cd andT = 00110011 then

L1 T L2 = bcad

(L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast) = acadbcbd

However ifT is aunaryset of routes by which we mean thatT sube 00lowast1lowast

then we have the following result [22]

Corollary 53 Let T sube 00lowast1lowast Then for all Lsube Σlowast

L T Σlowast = Lπ1(T) Σ

lowast

12

We refer the reader to Mateescu [77] for a discussion of unary operationsdefined by splicing on routes As an example consider that withT = 0n0

n n ge

01lowast L T Σ

lowast = 12(L) where1

2(L) was given in Section 31Bel-Enguixet al [3] have used splicing on routes to model dialogue in natural

language Adialogueis the result of two or more actors who alternately com-municate The alternating nature of trajectory-based operations makes it naturalto apply them to modeling dialogue Routes and in particular the ability to notinsert portions of certain actorrsquos input is important for selecting different piecesof information for different dialogues ldquoDepending on the conversational goal andon the behaviour of the agents different acts will be selected rdquo [3] Bel-Enguixetal [3] note three important classes of routes used in dialogue

(a) Imbrication in a dialogue is the result of applying a route from the set ofroutes defined by the regular expression ((01)+ (01))+ Imbrication repre-sents the ldquoperfect conversationrdquo where actors strictly alternate

(b) Concatenationin a dialogue is the result of applying a route from the setof route defined by the regular expression (0+ 0)+(1 + 1)+ According toBel-Enguix concatenation has applications to more formal dialogues suchas debates and round tables [3]

(c) Merging is the result of applying a route which does not fall into the cat-egories of imbrication or concatenation Merging is more common thanimbrication and concatenation in informal dialogues

Bel-Enguixet al [3] mention several open areas of research relating to theuse of splicing on routes for modelling dialogue In particular they ask aboutestablishing a mechanism in order to select routes depending on context Thiswould establish a more semantic operation in the terminology of Section 6 below

6 Semantic Shuffle on Trajectories

In the paper which introduced shuffle on trajectories Mateescuet almake a dis-tinction betweensyntacticandsemanticoperations on words

[Shuffle on trajectories is] based on syntactic constraints on the shuffleoperations The constraints are referred to as syntactic constraintssince they do not concern properties of the words that are shuffled orproperties of the letters that occur in these words

Instead the constraints involve the general strategy to switch fromone word to another word Once such a strategy is defined the struc-ture of the words that are shuffled does not play any role

13

However constraints that take into consideration the inner structureof the words that are shuffled together are referred to as semantic con-straints [79 p 2]

The author [19] has introduced a semantic variant of shuffle on trajectoriesnaturally calledsemantic shuffle on trajectories(SST) The corresponding no-tion for deletion on trajectories is calledsemantic deletion on trajectories(SDT)The advantages of SST and SDT are that they preserves many of the desirableproperties of the usual syntactic shuffle on trajectories while being capable ofsimulating more operations of interest

The two semantic constructs we introduce aresynchronizationand contentrestriction Synchronization allows for only one letter to be output for two cor-responding identical symbols in the input words Content restriction allows atrajectory to specify that a particular letter must appear at a specific point This isinspired by bio-informatical operations where operations occur only in the con-text of certain subsequences of the DNA strand

Before we define SST we define the trajectory alphabet LetΓ = 01 σ Forany alphabetΣ letΓΣ = Γ cup (Γ times Σ) For ease of readability we denote [ca] by

ac

for all a isin Σ andc isin ΓWe can now define the SST operation LetΣ be an alphabett isin Γlowast

Σand

x y isin Σlowast Then the SST ofx andy alongt denotedx t y is defined as followsIf x = axprime y = byprime (whereab isin Σ xprime yprime isin Σlowast) andt = ctprime wherec isin ΓΣ andtprime isin Γlowast

Σ then

x t y =

a(xprime tprime y) if c isin 0

a0

b(x tprime yprime) if c isin 1b1

a(xprime tprime yprime) if a = b andc isin σaσ

empty otherwise

If x = axprime y = ε andt = ctprime then

x t ε =

a(xprime tprime ε) if c isin 0a0

empty otherwise

If x = ε y = byprime andt = ctprime then

ε t y =

b(ε tprime yprime) if c isin 1b1

empty otherwise

If x = y = ε thenx t y = ε if t = ε andempty otherwise Finally ifx y ε thenx ε y = empty If x y isin Σlowast andT sube Γlowast

Σ thenx T y = cuptisinT x T y If L1 L2 sube Σ

lowast and

14

T sube ΓlowastΣ thenL1 T L2 = cupxisinL1yisinL2 x T y The associated deletion operation over

the alphabet∆Σ = ∆ cup (∆ times Σ) (where∆ = id σ) is defined in the natural way[19]

We now consider givenΣ and two sets of trajectoriesT1T2 sube ΓlowastΣ whether

the operations T1 T2 coincide that is whetherL1 T1 L2 = L1 T2 L2 for alllanguagesL1 L2 sube Σ

lowast If T1 T2 represent in this sense the same operation wesay thatT1T2 areequivalentsets of trajectories

We note that it is possible for two distinct sets of trajectoriesT1T2 sube ΓlowastΣ

to be

equivalent As a simple example considerT1 = a0

a1 andT2 =

a1

a0 Note that for

i = 12

L1 Ti L2 =

aa if L1 cap L2 supe aempty otherwise

ThusT1T2 are equivalent but not equalBy using a special case of partial commutation and trace languages (see Diek-

ert and Meacutetivier [15]) we can show that two sets of trajectoriesT1T2 sube ΓlowastΣ

areequivalent if and only if their corresponding trace languages (under the naturalmorphism) are equivalent This implies the following important decidability re-sult

Theorem 61 Let Σ be an alphabet and T1T2 sube ΓlowastΣ If T1T2 are regular it is

decidable whether T1 and T2 are equivalent

We can now consider some examples of the power of SST and SDT We firstnote that SST consists of a valid extension of the shuffle on trajectories ifT sube01lowast thenL1 T L2 = L1 T L2 the syntactic shuffle on trajectories operation[79] We also note that ifT = σlowast then T = cap

Given the very natural set of trajectoriesT = (0 + 1 + σ)lowast T denotes theinfiltration product uarr see eg Pin and Sakarovitch [81] The infiltration productis defined as follows ifx = x1 xn is a word of lengthn andI = (i1 i2 ir) isa subsequence of (12 n) let xI = xi1 xi2 middot middot middot xir Then givenx y isin Σlowast

x uarr y = z isin Σlowast existI J sube [|z|] such thatI cup J = [|z|] zI = x andzJ = y

For exampleabuarr ba= abababbaabbabaabbaababWe now show how to use SST and SDT to simulate ciliate bio-operations

which have been the subject of recent research in the literature A model of cil-iate bio-operations without circular variants were introduced by Daleyet al [11]to mimic the manner in which DNA is unscrambled in the DNA of certain uni-cellular ciliates in the process of asexual reproduction Ciliate bio-operations arealso investigated by Ehrenfeuchtet al [28] Prescottet al [82] and Daley andMcQuillan [12] using various approaches

15

Daley et al [11] define several language operations which simulate ciliatebio-operations including synchronized insertion deletion and bi-polar deletionSynchronized insertion can be given as follows

α oplus β = uavaw a isin Σ α = uaw β = va

The operation is extended to languages as usual LetT =⋃

aisinΣ 0lowasta0 1lowast

a1 0lowast Then

for all L1 L2 sube Σlowast L1 oplus L2 = L1 T L2 The operations of synchronized deletion

and synchronized bi-polar deletion (also defined by Daleyet al [11]) can also besimulated by SDT

Contextual insertion and deletion were introduced by Kari and Thierrin as asimple set of operations which are capable for modelling DNA computing [58]Let Σ be an alphabet and [x y] isin (Σlowast)2 We call [x y] a context Then givenvu isin Σlowast the [x y]-contextual insertionof v into u is given byu larrminus[xy] v = u1xvyu2 u = u1xyu2u1u2 isin Σ

lowast Let C sube (Σlowast)2 Then

u larrminusC v =⋃

[xy]isinC

u larrminus[xy] v

The operationlarrminusC is extended to languages monotonically as expected Letx =x1 middot middot middot xn andy = y1 middot middot middot ym be arbitrary words overΣ Then define

T[xy] = 0lowastnprod

i=1

xi

0 1lowastmprod

i=1

yi

0 0lowast

We naturally extend this toTC = cup[xy]isinCT[xy] for all C sube (Σlowast)2 Under this defini-tion it is clear that TC =

larrminusC for all C sube (Σlowast)2

Kari and Thierrin note that ifC sube (Σlowast)2 is finite then the regular and context-free languages are closed underlarrminus

C As TC is regular for all finiteC we notethat the closure of the regular languages underlarrminus

C is a consequence of the closureproperties of SST It is known that the CFLs are closed underlarrminus

C [58] Howeverin general the CFLs are not closed underT This leads to the following openproblem

Open Problem 62Find necessary and sufficient (language-theoretic) conditionson a set of trajectories Tsube Γlowast

Σsuch that the CFLs overΣ are closed under T

We note that [x y]-contextual deletion and [x y]-contextual bi-polar deletion[58] can be simulated by SDT [19]

Recall thatsynchronized shuffle (see eg Latteux and Roos [66]) is defined asfollows LetΣ1Σ2 be alphabets not necessarily disjoint Letρi (Σ1 cup Σ2) rarr Σi

be the projection ontoΣi given byρi(a) = a for all a isin Σi andρi(a) = ε for alla isin (Σ1 cup Σ2) minus Σi

16

Let Li sube Σi for i = 12 Then the synchronized shuffle of L1 andL2 denotedL1 L2 is given by

L1 L2 = ρminus11 (L1) cap ρ

minus12 (L2)

Lemma 63 LetΣ1Σ2 be alphabets Let Tsube ΓlowastΣ1cupΣ2

be given by

T = ((⋃

aisinΣ1capΣ2

aσ) + (

⋃aisinΣ1minusΣ2

a0) + (

⋃aisinΣ2minusΣ1

a1))lowast

Then for all L1 L2 such that Li sube Σi for i = 12 L1 L2 = L1 T L2

Further examples of operations simulated by SST are described by the author[19]

We note that many of the results on language equations hold for SST In par-ticular the following are new results on bio-operations which have not been notedbefore [19]

Corollary 64 Let R be a regular language Then it is decidable whether thereexist languages X1X2 such that R= X1 oplus X2

Corollary 65 Let C sube (Σlowast)2 be a finite set of contexts Let R be a regularlanguage Then it is decidable whether there exist languages X1X2 such thatR= X1

larrC X2

7 Descriptional Complexity

Descriptional complexity of formal languages deals with the problems of concisedescriptions of languages in terms of generative or accepting devices For in-stance the(deterministic) state complexityof a regular languageL is the minimalnumber of states in any deterministic finite automaton acceptingL [94] Nonde-terministic state complexity of a regular language is similarly defined (see egHolzer and Kutrib [36])

For shuffle on trajectories Mateescuet al [79] and Harjuet al [34] both giveproofs that given a regular set of trajectoriesT and regular languagesL1 L2 theoperationL1 T L2 always yields a regular language Thus it is reasonable toconsider the state complexity of shuffle on trajectories The author and K Salo-maa [24] have obtained results in this area We state an upper bound in terms ofnondeterministic state complexity

Lemma 71 Let L1 L2 be regular languages overΣlowast and Tsube 01lowast be a regularset of trajectories Then

sc(L1 T L2) le 2nsc(L1)nsc(L2)nsc(T)

17

The following open problem appears to be very challenging

Open Problem 72 For what regular sets of trajectories Tsube 01lowast does theconstruction given by Lemma 71 give a construction which is best possible

Consider unrestricted shuffle given by the set of trajectoriesT = (0 + 1)lowastThe bound of Lemma 71 in this case is 2nsc(L1)nsc(L2) Cacircmpeanuet al [8] haveshown that there exist languagesL1 andL2 accepted by incomplete DFAs havingrespectivelyn andm states such that any incomplete DFA acceptingL1 L2 hasat least 2nm minus 1 states This bound is optimal for incomplete DFAs howeverfor complete DFAs it gives only the lower bound 2(sc(L1)minus1)(sc(L2)minus1) However weregard this as near enough to our goal of Lemma 71 for our purposes ie weregardT = (0 + 1)lowast as an example of a set of trajectoriesT satisfying OpenProblem 72

The density functionof a languageL sube Σlowast is defined bypL N rarr N aspL(n) = |L cap Σn| for all n ge 0 That is pL(n) gives the number of words oflengthn in L By the density of a languageL we informally mean the asymptoticbehaviour ofpL The following important result of Szilardet al [91 Thm 3]characterizes the density of regular languages

Theorem 73 A regular language R overΣ satisfies pR(n) isin O(nk) k ge 0 ifand only if R can be represented as a finite union of regular expressions of thefollowing form xylowast1z1 middot middot middot ylowastt zt where x y1 z1 middot middot middot yt zt isin Σ

lowast and0 le t le k+ 1

Call a languageL slenderif pL(n) isin O(1) [83] Let R be a regular languagewhich has polynomial densityO(nk) and lett be the smallest integer such thatR = cupt

i=1xiylowasti1zi1 middot middot middot ylowastikiziki 0 le ki le k + 1 i = 1 t Then callt the UkL-

indexof L If k = 0 we callt theUSL-indexof L (languages with USL indextare calledt-thin by Paun and Salomaa [83] slender regular languages were alsocharacterized independently by Shallit [87 Lemma 3 p 336])

Lemma 74 Let T = uvlowast where u v isin 01lowast Let Li be regular languages overΣwith sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le |uv|n1n2 (2)

We now give a bound for sets of trajectoriesT = uvlowastw with w ε

Lemma 75 Let T = uvlowastw where u vw isin 01lowast and w ε Let Li be regularlanguages overΣ with sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le n1n2

|u| + 1+ |v|(n1n2)

lceil|w||v|

rceil+1minus n1n2

n1n2 minus 1

(3)

18

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

There are several facets to research on language equations For instance givena language equation or a system of language equations we can seek to character-ize the solutions to the equation or simply decide if a solution exists In theresearch on language equations involving shuffle on trajectories the focus of re-search has been the latter task

41 Zero Variable Equations

The most simple language equations are those without any variables at all Itfollows from the closure properties of T andT that it is decidable if eitherR1 T R2 = R3 or R1 T R2 = R3 holds if all of R1R2 andR3 are regular lan-guages andT is regular Below we review decidability results when the languagesand set of trajectories are not all regular

Let Ψa(T) = |t|a t isin T wherea is a letter and|t|a denotes the number ofzeroes int Kari and Sosiacutek [56] establish the following elegant result

Theorem 41 Let T be an arbitrary but fixed regular set of trajectories Theproblem ldquoIs L1 T L2 = Rrdquo is decidable for a CFL L1 and regular languagesL2R if and only ifΨ0(T) is finite

The analogous result for deletion on trajectories (ieΨi(T) is finite) also holds[56] Further ifL1 is taken to be a regular language andL2 is taken to be a CFLthen the necessary and sufficient conditions are thatΨ1(T) is finite [56]

We can contrast the above results with the following surprising result whichwas established by the author and K Salomaa [26]

Theorem 42 Let T = 0i1r0j1s0k r s r s are odd i j k ge 0 Then fora given alphabetΣ and given regular languages SR1R2 sube Σ it is undecidablewhether or not S= R1 T R2

The result is interesting since it is an undecidability result about regular lan-guages given three regular languages it is undecidable whether the given equalityholds the context-free set of trajectoriesT if fixed (and is not part of the input)

42 One Variable Equations

One variable equations involving shuffle and deletion on trajectories have beenstudied by the author [17 22] Kari and Sosiacutek [55 56] and the author and K Sa-lomaa [25] We summarize these results below

By general results of Kari on the inverse of language operations [50] the fol-lowing result on one variable language equations involving shuffle and deletionalong trajectories is immediate

7

Theorem 43Let LR be regular languages Then it is decidable whether any ofthe following equations have a solution X and if so the maximal solution (underinclusion) is an effectively constructible regular language

(a) X T L = R where Tsube 01lowast is regular(b) L T X = R where Tsube 01lowast is regular(c) XT L = R where Tsube idlowast is regular(d) LT X = R where Tsube idlowast is regular

We can also examine the question of if it is decidable whether a set of trajec-tories can be found to satisfy a fixed language equation

Theorem 44 Let L1 L2R sube Σlowast be regular languages Then it is decidablewhether

(a) there exists a set Tsube 01lowast of trajectories with L1 T L2 = R(b) there exists a set Tsube idlowast of trajectories with L1T L2 = R

We now turn to undecidability LetΠ0Π1 01lowast rarr 01lowast be the projectionsgiven byΠ0(0) = 0Π0(1) = ε andΠ1(1) = 1Π1(0) = ε We say thatT sube 01lowast

is left-enabling(respright-enabling) if Π0(T) = 0lowast (respΠ1(T) = 1lowast) It isundecidable whether the corresponding equation has a solution

Theorem 45 Fix T sube 01lowast to be a regular set of left-enabling (resp right-enabling) trajectories For a given LCFL L and regular language R it is undecid-able whether or not L T X = R (resp X T L = R) has a solution X

Say that a set of trajectories is left-preserving (resp right-preserving) ifT supe0lowast (respT supe 1lowast) We can give an incomparable result which removes the condi-tion thatT must be regular but must strengthen the conditions on words inT

Theorem 46 Fix a left-preserving (resp right-preserving) set of trajectoriesT sube 01lowast Given an LCFL L and a regular language R it is undecidable whetherthere exists a language X such that LT X = R (resp X T L = R)

By an extension of Theorem 42 we can also show that there exists a fixedcontext-free set of trajectoriesT such that on inputR1R2 (regular languages)deciding whether or not there exists a languageX such thatR1 = X T R2 isundecidable [26] We also have the following undecidability results concerningthe existence of sets of trajectories [25 22]

Theorem 47 Given an LCFL L and regular languages R1R2 it is undecidablewhether there exists Tsube 01lowast such that (a) R1 T R2 = L (b) R1 T L = R2 or(c) L T R1 = R2

Given an LCFL L and regular languages R1R2 it is undecidable whetherthere exists Tsube idlowast such that (a) R1 T R2 = L (b) L T R1 = R2 or (c)R1T L = R2

8

43 Two Variable Equations

We can also consider language equations with two variables As an exampleconsider the well-studied language equation

L = X1X2 (1)

whereL is a fixed language andX1X2 are unknown Study of this equation hasbeen undertaken by Conway [10] Kari and Thierrin [59] Salomaa and Yu [85]and Choffrut and Karhumaumlki [9] In particular it is known that ifL is a regularlanguage we can determine if a solution to (1) exists Furthermore it is knownthat if a solutionX1X2 exists there exists a maximal regular solution ie thereexistsR1R2 such thatXi sube Ri for i = 12 andL = R1R2

For language equations of the formL = X1 T X2 unlike the case of one-variable equations we do not have a general result which applies to all regularsets of trajectoriesT However a large class of trajectories are covered by theauthor and K Salomaa [25 22]

Recall that a languageL sube Σlowast is boundedif there existw1w2 wn isin Σlowast

such thatL sube wlowast1wlowast2 middot middot middotw

lowastn We say thatL is letter-boundedif wi isin Σ for all

1 le i le n The following result is due to the author and K Salomaa [25 22]

Theorem 48 Let T sube 01lowast be a letter-bounded regular set of trajectoriesThen given a regular language R it is decidable whether there exist X1X2 suchthat X1 T X2 = R

As we have mentioned this result was known for catenationT = 0lowast1lowast How-ever it also holds for eg the following operations insertion (0lowast1lowast0lowast) k-insertion(0lowast1lowast0lek for fixedk ge 0) and bi-catenation (1lowast0lowast + 0lowast1lowast)

We also note that if the equationX1 T X2 = R has a solution whereR isa regular language andT is a letter-bounded regular set of trajectories then theequation also has solutionY1 T Y2 = R whereY1Y2 are regular languages Thisresult is well-known forT = 0lowast1lowast (see eg Choffrut and Karhumaumlki [9])

For T = (0+ 1)lowast the two-variable decomposition problem is open [7]

Open Problem 49Given a regular language R is it decidable whether thereexist X1X2 (with X1X2 ε) such that R= X1 X2

Recall that a languageL is k-thin if |L cap Σn| le k for all n ge 0 The followingproblem is also open

Open Problem 410Given a k-thin set of trajectories Tsube 01lowast is it decidablegiven a regular language R whether R has a shuffle decomposition with respectto T

9

The author and K Salomaa [26] have shown that ifT sube 01lowast is a fixed1-thin set of trajectories given a regular languageR it is decidable whetherRhas a shuffle decomposition with respect toT However even for 2-thin sets oftrajectories the problem remains open [26]

We can also note that if the left and right operand are restricted to be the samethe resulting language equation problem remains decidable if the set of trajectoriesis regular and letter-bounded [25 22]

Theorem 411Fix a letter-bounded regular set of trajectories Tsube 01lowast Thenit is decidable whether there exists a solution X to the equation XT X = R for agiven regular language R

We now turn to undecidability It has been shown [7] that it is undecidablewhether a context-free language has a nontrivial shuffle decomposition with re-spect to the set of trajectories01lowast This result can be extended for arbitrarycomplete regular sets trajectories [25] (Note that ifT is a complete set of trajec-tories then any languageL has decompositionsL Tε andε TL Below weexclude these trivial decompositions all other decompositions ofL are said to benontrivial)

Theorem 412Let T be any fixed complete regular set of trajectories For a givencontext-free language L it is undecidable whether or not there exist languagesX1X2 ε such that L= X1 TX2

We also note the following open problem [26]

Open Problem 413Is it possible to construct a fixed context-free set of trajecto-ries T such that it is undecidable whether there exist languages X1X2 ε suchthat L= X1 TX2

Also open are other forms of language equation We mention only two here

Open Problem 414Find necessary and sufficient conditions on sets of trajecto-ries T1T2 so that given a regular language R it is decidable whether there existnontrivial languages X1X2X3 satisfying(X1 T1 X2) T2 X3 = R

Open Problem 415Given regular languages R1R2 is it decidable whetherthere exists nontrivial languages X1T (with T sube 01lowast) such that X1 T R1 = R2

or R1 T X1 = R2

5 Splicing on Routes

The notion of shuffle on trajectories was extended by Mateescu [77] to encompasscertain splicing operations This extension is calledsplicing on routes Splicing

10

on routes is a proper extension of shuffle on trajectories and also encompassesseveral unary operations Bel-Enguixet al also use the concept of splicing onroutes to model dialog in natural language [3]

Splicing on routes was introduced by Mateescu [77] to model generaliza-tions of the crossover splicing operation (see Mateescu [77] for a definition ofthe crossover splicing operation) Splicing on routes generalizes the crossoversplicing operation by specifying a setT of routes which restricts the way in whichsplicing can occur The result is that specific sets of routes can simulate not onlythe crossover operation but also such operations on DNA such as thesimple splic-ing and theequal-length crossoveroperations (see Mateescu for details and defi-nitions of these operations [77])

We now define the concept ofsplicing on routes and note the difference be-tween deletion along trajectories from splicing on routes which allows discardingletters from either input word In particular aroute is a wordt specified overthe alphabet0011 where informally 01 means insert the letter from theappropriate word and01 means discard that letter and continue

Formally letx y isin Σlowast andt isin 0011lowast We define the splicing ofx andydenotedx t y recursively as follows ifx = axprime y = byprime (ab isin Σ) andt = ctprime

(c isin 0011) then

x ctprime y =

a(xprime tprime y) if c = 0(xprime tprime y) if c = 0b(x tprime yprime) if c = 1(x tprime yprime) if c = 1

If x = axprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

ε ctprime y =

b(ε tprime yprime) if c = 1(ε tprime yprime) if c = 1empty otherwise

11

We havex ε y = empty if x y ε Finally we setε t ε = ε if t = ε andemptyotherwise We extendt to sets of routes and languages as expected

x T y =⋃tisinT

x t y forallT sube 0011lowast x y isin Σlowast

L1 T L2 =⋃

xisinL1yisinL2

x T y

For example ifx = abc y = cbcandT = 010011010011 thenx T y =acbcbcabbc

It turns out that splicing on routes can be simulated by a combination of shuffleon trajectories and deletion along trajectories [22]

Theorem 51 There exist weak codingsπ1 π2 0101lowast rarr idlowast and a weakcodingπ3 0101lowast rarr 01lowast such that for all t isin 0011lowast and for allx y isin Σlowast we have

x t y = (xπ1(t) Σlowast) π3(t) (yπ2(t) Σ

lowast)

Corollary 52 There exist weak codingsπ1 π2 0101lowast rarr idlowast and π3 0101lowast rarr 01lowast such that for all Tsube 0011lowast and L1 L2 sube Σ

lowast

L1 T L2 =⋃tisinT

(L1π1(t) Σlowast) π3(t) (L2π2(t) Σ

lowast)

Unfortunately the identity

L1 T L2 = (L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast)

does not hold in general even ifL1 L2 are singletons and|T | = 2 For example ifL1 = ab L2 = cd andT = 00110011 then

L1 T L2 = bcad

(L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast) = acadbcbd

However ifT is aunaryset of routes by which we mean thatT sube 00lowast1lowast

then we have the following result [22]

Corollary 53 Let T sube 00lowast1lowast Then for all Lsube Σlowast

L T Σlowast = Lπ1(T) Σ

lowast

12

We refer the reader to Mateescu [77] for a discussion of unary operationsdefined by splicing on routes As an example consider that withT = 0n0

n n ge

01lowast L T Σ

lowast = 12(L) where1

2(L) was given in Section 31Bel-Enguixet al [3] have used splicing on routes to model dialogue in natural

language Adialogueis the result of two or more actors who alternately com-municate The alternating nature of trajectory-based operations makes it naturalto apply them to modeling dialogue Routes and in particular the ability to notinsert portions of certain actorrsquos input is important for selecting different piecesof information for different dialogues ldquoDepending on the conversational goal andon the behaviour of the agents different acts will be selected rdquo [3] Bel-Enguixetal [3] note three important classes of routes used in dialogue

(a) Imbrication in a dialogue is the result of applying a route from the set ofroutes defined by the regular expression ((01)+ (01))+ Imbrication repre-sents the ldquoperfect conversationrdquo where actors strictly alternate

(b) Concatenationin a dialogue is the result of applying a route from the setof route defined by the regular expression (0+ 0)+(1 + 1)+ According toBel-Enguix concatenation has applications to more formal dialogues suchas debates and round tables [3]

(c) Merging is the result of applying a route which does not fall into the cat-egories of imbrication or concatenation Merging is more common thanimbrication and concatenation in informal dialogues

Bel-Enguixet al [3] mention several open areas of research relating to theuse of splicing on routes for modelling dialogue In particular they ask aboutestablishing a mechanism in order to select routes depending on context Thiswould establish a more semantic operation in the terminology of Section 6 below

6 Semantic Shuffle on Trajectories

In the paper which introduced shuffle on trajectories Mateescuet almake a dis-tinction betweensyntacticandsemanticoperations on words

[Shuffle on trajectories is] based on syntactic constraints on the shuffleoperations The constraints are referred to as syntactic constraintssince they do not concern properties of the words that are shuffled orproperties of the letters that occur in these words

Instead the constraints involve the general strategy to switch fromone word to another word Once such a strategy is defined the struc-ture of the words that are shuffled does not play any role

13

However constraints that take into consideration the inner structureof the words that are shuffled together are referred to as semantic con-straints [79 p 2]

The author [19] has introduced a semantic variant of shuffle on trajectoriesnaturally calledsemantic shuffle on trajectories(SST) The corresponding no-tion for deletion on trajectories is calledsemantic deletion on trajectories(SDT)The advantages of SST and SDT are that they preserves many of the desirableproperties of the usual syntactic shuffle on trajectories while being capable ofsimulating more operations of interest

The two semantic constructs we introduce aresynchronizationand contentrestriction Synchronization allows for only one letter to be output for two cor-responding identical symbols in the input words Content restriction allows atrajectory to specify that a particular letter must appear at a specific point This isinspired by bio-informatical operations where operations occur only in the con-text of certain subsequences of the DNA strand

Before we define SST we define the trajectory alphabet LetΓ = 01 σ Forany alphabetΣ letΓΣ = Γ cup (Γ times Σ) For ease of readability we denote [ca] by

ac

for all a isin Σ andc isin ΓWe can now define the SST operation LetΣ be an alphabett isin Γlowast

Σand

x y isin Σlowast Then the SST ofx andy alongt denotedx t y is defined as followsIf x = axprime y = byprime (whereab isin Σ xprime yprime isin Σlowast) andt = ctprime wherec isin ΓΣ andtprime isin Γlowast

Σ then

x t y =

a(xprime tprime y) if c isin 0

a0

b(x tprime yprime) if c isin 1b1

a(xprime tprime yprime) if a = b andc isin σaσ

empty otherwise

If x = axprime y = ε andt = ctprime then

x t ε =

a(xprime tprime ε) if c isin 0a0

empty otherwise

If x = ε y = byprime andt = ctprime then

ε t y =

b(ε tprime yprime) if c isin 1b1

empty otherwise

If x = y = ε thenx t y = ε if t = ε andempty otherwise Finally ifx y ε thenx ε y = empty If x y isin Σlowast andT sube Γlowast

Σ thenx T y = cuptisinT x T y If L1 L2 sube Σ

lowast and

14

T sube ΓlowastΣ thenL1 T L2 = cupxisinL1yisinL2 x T y The associated deletion operation over

the alphabet∆Σ = ∆ cup (∆ times Σ) (where∆ = id σ) is defined in the natural way[19]

We now consider givenΣ and two sets of trajectoriesT1T2 sube ΓlowastΣ whether

the operations T1 T2 coincide that is whetherL1 T1 L2 = L1 T2 L2 for alllanguagesL1 L2 sube Σ

lowast If T1 T2 represent in this sense the same operation wesay thatT1T2 areequivalentsets of trajectories

We note that it is possible for two distinct sets of trajectoriesT1T2 sube ΓlowastΣ

to be

equivalent As a simple example considerT1 = a0

a1 andT2 =

a1

a0 Note that for

i = 12

L1 Ti L2 =

aa if L1 cap L2 supe aempty otherwise

ThusT1T2 are equivalent but not equalBy using a special case of partial commutation and trace languages (see Diek-

ert and Meacutetivier [15]) we can show that two sets of trajectoriesT1T2 sube ΓlowastΣ

areequivalent if and only if their corresponding trace languages (under the naturalmorphism) are equivalent This implies the following important decidability re-sult

Theorem 61 Let Σ be an alphabet and T1T2 sube ΓlowastΣ If T1T2 are regular it is

decidable whether T1 and T2 are equivalent

We can now consider some examples of the power of SST and SDT We firstnote that SST consists of a valid extension of the shuffle on trajectories ifT sube01lowast thenL1 T L2 = L1 T L2 the syntactic shuffle on trajectories operation[79] We also note that ifT = σlowast then T = cap

Given the very natural set of trajectoriesT = (0 + 1 + σ)lowast T denotes theinfiltration product uarr see eg Pin and Sakarovitch [81] The infiltration productis defined as follows ifx = x1 xn is a word of lengthn andI = (i1 i2 ir) isa subsequence of (12 n) let xI = xi1 xi2 middot middot middot xir Then givenx y isin Σlowast

x uarr y = z isin Σlowast existI J sube [|z|] such thatI cup J = [|z|] zI = x andzJ = y

For exampleabuarr ba= abababbaabbabaabbaababWe now show how to use SST and SDT to simulate ciliate bio-operations

which have been the subject of recent research in the literature A model of cil-iate bio-operations without circular variants were introduced by Daleyet al [11]to mimic the manner in which DNA is unscrambled in the DNA of certain uni-cellular ciliates in the process of asexual reproduction Ciliate bio-operations arealso investigated by Ehrenfeuchtet al [28] Prescottet al [82] and Daley andMcQuillan [12] using various approaches

15

Daley et al [11] define several language operations which simulate ciliatebio-operations including synchronized insertion deletion and bi-polar deletionSynchronized insertion can be given as follows

α oplus β = uavaw a isin Σ α = uaw β = va

The operation is extended to languages as usual LetT =⋃

aisinΣ 0lowasta0 1lowast

a1 0lowast Then

for all L1 L2 sube Σlowast L1 oplus L2 = L1 T L2 The operations of synchronized deletion

and synchronized bi-polar deletion (also defined by Daleyet al [11]) can also besimulated by SDT

Contextual insertion and deletion were introduced by Kari and Thierrin as asimple set of operations which are capable for modelling DNA computing [58]Let Σ be an alphabet and [x y] isin (Σlowast)2 We call [x y] a context Then givenvu isin Σlowast the [x y]-contextual insertionof v into u is given byu larrminus[xy] v = u1xvyu2 u = u1xyu2u1u2 isin Σ

lowast Let C sube (Σlowast)2 Then

u larrminusC v =⋃

[xy]isinC

u larrminus[xy] v

The operationlarrminusC is extended to languages monotonically as expected Letx =x1 middot middot middot xn andy = y1 middot middot middot ym be arbitrary words overΣ Then define

T[xy] = 0lowastnprod

i=1

xi

0 1lowastmprod

i=1

yi

0 0lowast

We naturally extend this toTC = cup[xy]isinCT[xy] for all C sube (Σlowast)2 Under this defini-tion it is clear that TC =

larrminusC for all C sube (Σlowast)2

Kari and Thierrin note that ifC sube (Σlowast)2 is finite then the regular and context-free languages are closed underlarrminus

C As TC is regular for all finiteC we notethat the closure of the regular languages underlarrminus

C is a consequence of the closureproperties of SST It is known that the CFLs are closed underlarrminus

C [58] Howeverin general the CFLs are not closed underT This leads to the following openproblem

Open Problem 62Find necessary and sufficient (language-theoretic) conditionson a set of trajectories Tsube Γlowast

Σsuch that the CFLs overΣ are closed under T

We note that [x y]-contextual deletion and [x y]-contextual bi-polar deletion[58] can be simulated by SDT [19]

Recall thatsynchronized shuffle (see eg Latteux and Roos [66]) is defined asfollows LetΣ1Σ2 be alphabets not necessarily disjoint Letρi (Σ1 cup Σ2) rarr Σi

be the projection ontoΣi given byρi(a) = a for all a isin Σi andρi(a) = ε for alla isin (Σ1 cup Σ2) minus Σi

16

Let Li sube Σi for i = 12 Then the synchronized shuffle of L1 andL2 denotedL1 L2 is given by

L1 L2 = ρminus11 (L1) cap ρ

minus12 (L2)

Lemma 63 LetΣ1Σ2 be alphabets Let Tsube ΓlowastΣ1cupΣ2

be given by

T = ((⋃

aisinΣ1capΣ2

aσ) + (

⋃aisinΣ1minusΣ2

a0) + (

⋃aisinΣ2minusΣ1

a1))lowast

Then for all L1 L2 such that Li sube Σi for i = 12 L1 L2 = L1 T L2

Further examples of operations simulated by SST are described by the author[19]

We note that many of the results on language equations hold for SST In par-ticular the following are new results on bio-operations which have not been notedbefore [19]

Corollary 64 Let R be a regular language Then it is decidable whether thereexist languages X1X2 such that R= X1 oplus X2

Corollary 65 Let C sube (Σlowast)2 be a finite set of contexts Let R be a regularlanguage Then it is decidable whether there exist languages X1X2 such thatR= X1

larrC X2

7 Descriptional Complexity

Descriptional complexity of formal languages deals with the problems of concisedescriptions of languages in terms of generative or accepting devices For in-stance the(deterministic) state complexityof a regular languageL is the minimalnumber of states in any deterministic finite automaton acceptingL [94] Nonde-terministic state complexity of a regular language is similarly defined (see egHolzer and Kutrib [36])

For shuffle on trajectories Mateescuet al [79] and Harjuet al [34] both giveproofs that given a regular set of trajectoriesT and regular languagesL1 L2 theoperationL1 T L2 always yields a regular language Thus it is reasonable toconsider the state complexity of shuffle on trajectories The author and K Salo-maa [24] have obtained results in this area We state an upper bound in terms ofnondeterministic state complexity

Lemma 71 Let L1 L2 be regular languages overΣlowast and Tsube 01lowast be a regularset of trajectories Then

sc(L1 T L2) le 2nsc(L1)nsc(L2)nsc(T)

17

The following open problem appears to be very challenging

Open Problem 72 For what regular sets of trajectories Tsube 01lowast does theconstruction given by Lemma 71 give a construction which is best possible

Consider unrestricted shuffle given by the set of trajectoriesT = (0 + 1)lowastThe bound of Lemma 71 in this case is 2nsc(L1)nsc(L2) Cacircmpeanuet al [8] haveshown that there exist languagesL1 andL2 accepted by incomplete DFAs havingrespectivelyn andm states such that any incomplete DFA acceptingL1 L2 hasat least 2nm minus 1 states This bound is optimal for incomplete DFAs howeverfor complete DFAs it gives only the lower bound 2(sc(L1)minus1)(sc(L2)minus1) However weregard this as near enough to our goal of Lemma 71 for our purposes ie weregardT = (0 + 1)lowast as an example of a set of trajectoriesT satisfying OpenProblem 72

The density functionof a languageL sube Σlowast is defined bypL N rarr N aspL(n) = |L cap Σn| for all n ge 0 That is pL(n) gives the number of words oflengthn in L By the density of a languageL we informally mean the asymptoticbehaviour ofpL The following important result of Szilardet al [91 Thm 3]characterizes the density of regular languages

Theorem 73 A regular language R overΣ satisfies pR(n) isin O(nk) k ge 0 ifand only if R can be represented as a finite union of regular expressions of thefollowing form xylowast1z1 middot middot middot ylowastt zt where x y1 z1 middot middot middot yt zt isin Σ

lowast and0 le t le k+ 1

Call a languageL slenderif pL(n) isin O(1) [83] Let R be a regular languagewhich has polynomial densityO(nk) and lett be the smallest integer such thatR = cupt

i=1xiylowasti1zi1 middot middot middot ylowastikiziki 0 le ki le k + 1 i = 1 t Then callt the UkL-

indexof L If k = 0 we callt theUSL-indexof L (languages with USL indextare calledt-thin by Paun and Salomaa [83] slender regular languages were alsocharacterized independently by Shallit [87 Lemma 3 p 336])

Lemma 74 Let T = uvlowast where u v isin 01lowast Let Li be regular languages overΣwith sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le |uv|n1n2 (2)

We now give a bound for sets of trajectoriesT = uvlowastw with w ε

Lemma 75 Let T = uvlowastw where u vw isin 01lowast and w ε Let Li be regularlanguages overΣ with sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le n1n2

|u| + 1+ |v|(n1n2)

lceil|w||v|

rceil+1minus n1n2

n1n2 minus 1

(3)

18

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

Theorem 43Let LR be regular languages Then it is decidable whether any ofthe following equations have a solution X and if so the maximal solution (underinclusion) is an effectively constructible regular language

(a) X T L = R where Tsube 01lowast is regular(b) L T X = R where Tsube 01lowast is regular(c) XT L = R where Tsube idlowast is regular(d) LT X = R where Tsube idlowast is regular

We can also examine the question of if it is decidable whether a set of trajec-tories can be found to satisfy a fixed language equation

Theorem 44 Let L1 L2R sube Σlowast be regular languages Then it is decidablewhether

(a) there exists a set Tsube 01lowast of trajectories with L1 T L2 = R(b) there exists a set Tsube idlowast of trajectories with L1T L2 = R

We now turn to undecidability LetΠ0Π1 01lowast rarr 01lowast be the projectionsgiven byΠ0(0) = 0Π0(1) = ε andΠ1(1) = 1Π1(0) = ε We say thatT sube 01lowast

is left-enabling(respright-enabling) if Π0(T) = 0lowast (respΠ1(T) = 1lowast) It isundecidable whether the corresponding equation has a solution

Theorem 45 Fix T sube 01lowast to be a regular set of left-enabling (resp right-enabling) trajectories For a given LCFL L and regular language R it is undecid-able whether or not L T X = R (resp X T L = R) has a solution X

Say that a set of trajectories is left-preserving (resp right-preserving) ifT supe0lowast (respT supe 1lowast) We can give an incomparable result which removes the condi-tion thatT must be regular but must strengthen the conditions on words inT

Theorem 46 Fix a left-preserving (resp right-preserving) set of trajectoriesT sube 01lowast Given an LCFL L and a regular language R it is undecidable whetherthere exists a language X such that LT X = R (resp X T L = R)

By an extension of Theorem 42 we can also show that there exists a fixedcontext-free set of trajectoriesT such that on inputR1R2 (regular languages)deciding whether or not there exists a languageX such thatR1 = X T R2 isundecidable [26] We also have the following undecidability results concerningthe existence of sets of trajectories [25 22]

Theorem 47 Given an LCFL L and regular languages R1R2 it is undecidablewhether there exists Tsube 01lowast such that (a) R1 T R2 = L (b) R1 T L = R2 or(c) L T R1 = R2

Given an LCFL L and regular languages R1R2 it is undecidable whetherthere exists Tsube idlowast such that (a) R1 T R2 = L (b) L T R1 = R2 or (c)R1T L = R2

8

43 Two Variable Equations

We can also consider language equations with two variables As an exampleconsider the well-studied language equation

L = X1X2 (1)

whereL is a fixed language andX1X2 are unknown Study of this equation hasbeen undertaken by Conway [10] Kari and Thierrin [59] Salomaa and Yu [85]and Choffrut and Karhumaumlki [9] In particular it is known that ifL is a regularlanguage we can determine if a solution to (1) exists Furthermore it is knownthat if a solutionX1X2 exists there exists a maximal regular solution ie thereexistsR1R2 such thatXi sube Ri for i = 12 andL = R1R2

For language equations of the formL = X1 T X2 unlike the case of one-variable equations we do not have a general result which applies to all regularsets of trajectoriesT However a large class of trajectories are covered by theauthor and K Salomaa [25 22]

Recall that a languageL sube Σlowast is boundedif there existw1w2 wn isin Σlowast

such thatL sube wlowast1wlowast2 middot middot middotw

lowastn We say thatL is letter-boundedif wi isin Σ for all

1 le i le n The following result is due to the author and K Salomaa [25 22]

Theorem 48 Let T sube 01lowast be a letter-bounded regular set of trajectoriesThen given a regular language R it is decidable whether there exist X1X2 suchthat X1 T X2 = R

As we have mentioned this result was known for catenationT = 0lowast1lowast How-ever it also holds for eg the following operations insertion (0lowast1lowast0lowast) k-insertion(0lowast1lowast0lek for fixedk ge 0) and bi-catenation (1lowast0lowast + 0lowast1lowast)

We also note that if the equationX1 T X2 = R has a solution whereR isa regular language andT is a letter-bounded regular set of trajectories then theequation also has solutionY1 T Y2 = R whereY1Y2 are regular languages Thisresult is well-known forT = 0lowast1lowast (see eg Choffrut and Karhumaumlki [9])

For T = (0+ 1)lowast the two-variable decomposition problem is open [7]

Open Problem 49Given a regular language R is it decidable whether thereexist X1X2 (with X1X2 ε) such that R= X1 X2

Recall that a languageL is k-thin if |L cap Σn| le k for all n ge 0 The followingproblem is also open

Open Problem 410Given a k-thin set of trajectories Tsube 01lowast is it decidablegiven a regular language R whether R has a shuffle decomposition with respectto T

9

The author and K Salomaa [26] have shown that ifT sube 01lowast is a fixed1-thin set of trajectories given a regular languageR it is decidable whetherRhas a shuffle decomposition with respect toT However even for 2-thin sets oftrajectories the problem remains open [26]

We can also note that if the left and right operand are restricted to be the samethe resulting language equation problem remains decidable if the set of trajectoriesis regular and letter-bounded [25 22]

Theorem 411Fix a letter-bounded regular set of trajectories Tsube 01lowast Thenit is decidable whether there exists a solution X to the equation XT X = R for agiven regular language R

We now turn to undecidability It has been shown [7] that it is undecidablewhether a context-free language has a nontrivial shuffle decomposition with re-spect to the set of trajectories01lowast This result can be extended for arbitrarycomplete regular sets trajectories [25] (Note that ifT is a complete set of trajec-tories then any languageL has decompositionsL Tε andε TL Below weexclude these trivial decompositions all other decompositions ofL are said to benontrivial)

Theorem 412Let T be any fixed complete regular set of trajectories For a givencontext-free language L it is undecidable whether or not there exist languagesX1X2 ε such that L= X1 TX2

We also note the following open problem [26]

Open Problem 413Is it possible to construct a fixed context-free set of trajecto-ries T such that it is undecidable whether there exist languages X1X2 ε suchthat L= X1 TX2

Also open are other forms of language equation We mention only two here

Open Problem 414Find necessary and sufficient conditions on sets of trajecto-ries T1T2 so that given a regular language R it is decidable whether there existnontrivial languages X1X2X3 satisfying(X1 T1 X2) T2 X3 = R

Open Problem 415Given regular languages R1R2 is it decidable whetherthere exists nontrivial languages X1T (with T sube 01lowast) such that X1 T R1 = R2

or R1 T X1 = R2

5 Splicing on Routes

The notion of shuffle on trajectories was extended by Mateescu [77] to encompasscertain splicing operations This extension is calledsplicing on routes Splicing

10

on routes is a proper extension of shuffle on trajectories and also encompassesseveral unary operations Bel-Enguixet al also use the concept of splicing onroutes to model dialog in natural language [3]

Splicing on routes was introduced by Mateescu [77] to model generaliza-tions of the crossover splicing operation (see Mateescu [77] for a definition ofthe crossover splicing operation) Splicing on routes generalizes the crossoversplicing operation by specifying a setT of routes which restricts the way in whichsplicing can occur The result is that specific sets of routes can simulate not onlythe crossover operation but also such operations on DNA such as thesimple splic-ing and theequal-length crossoveroperations (see Mateescu for details and defi-nitions of these operations [77])

We now define the concept ofsplicing on routes and note the difference be-tween deletion along trajectories from splicing on routes which allows discardingletters from either input word In particular aroute is a wordt specified overthe alphabet0011 where informally 01 means insert the letter from theappropriate word and01 means discard that letter and continue

Formally letx y isin Σlowast andt isin 0011lowast We define the splicing ofx andydenotedx t y recursively as follows ifx = axprime y = byprime (ab isin Σ) andt = ctprime

(c isin 0011) then

x ctprime y =

a(xprime tprime y) if c = 0(xprime tprime y) if c = 0b(x tprime yprime) if c = 1(x tprime yprime) if c = 1

If x = axprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

ε ctprime y =

b(ε tprime yprime) if c = 1(ε tprime yprime) if c = 1empty otherwise

11

We havex ε y = empty if x y ε Finally we setε t ε = ε if t = ε andemptyotherwise We extendt to sets of routes and languages as expected

x T y =⋃tisinT

x t y forallT sube 0011lowast x y isin Σlowast

L1 T L2 =⋃

xisinL1yisinL2

x T y

For example ifx = abc y = cbcandT = 010011010011 thenx T y =acbcbcabbc

It turns out that splicing on routes can be simulated by a combination of shuffleon trajectories and deletion along trajectories [22]

Theorem 51 There exist weak codingsπ1 π2 0101lowast rarr idlowast and a weakcodingπ3 0101lowast rarr 01lowast such that for all t isin 0011lowast and for allx y isin Σlowast we have

x t y = (xπ1(t) Σlowast) π3(t) (yπ2(t) Σ

lowast)

Corollary 52 There exist weak codingsπ1 π2 0101lowast rarr idlowast and π3 0101lowast rarr 01lowast such that for all Tsube 0011lowast and L1 L2 sube Σ

lowast

L1 T L2 =⋃tisinT

(L1π1(t) Σlowast) π3(t) (L2π2(t) Σ

lowast)

Unfortunately the identity

L1 T L2 = (L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast)

does not hold in general even ifL1 L2 are singletons and|T | = 2 For example ifL1 = ab L2 = cd andT = 00110011 then

L1 T L2 = bcad

(L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast) = acadbcbd

However ifT is aunaryset of routes by which we mean thatT sube 00lowast1lowast

then we have the following result [22]

Corollary 53 Let T sube 00lowast1lowast Then for all Lsube Σlowast

L T Σlowast = Lπ1(T) Σ

lowast

12

We refer the reader to Mateescu [77] for a discussion of unary operationsdefined by splicing on routes As an example consider that withT = 0n0

n n ge

01lowast L T Σ

lowast = 12(L) where1

2(L) was given in Section 31Bel-Enguixet al [3] have used splicing on routes to model dialogue in natural

language Adialogueis the result of two or more actors who alternately com-municate The alternating nature of trajectory-based operations makes it naturalto apply them to modeling dialogue Routes and in particular the ability to notinsert portions of certain actorrsquos input is important for selecting different piecesof information for different dialogues ldquoDepending on the conversational goal andon the behaviour of the agents different acts will be selected rdquo [3] Bel-Enguixetal [3] note three important classes of routes used in dialogue

(a) Imbrication in a dialogue is the result of applying a route from the set ofroutes defined by the regular expression ((01)+ (01))+ Imbrication repre-sents the ldquoperfect conversationrdquo where actors strictly alternate

(b) Concatenationin a dialogue is the result of applying a route from the setof route defined by the regular expression (0+ 0)+(1 + 1)+ According toBel-Enguix concatenation has applications to more formal dialogues suchas debates and round tables [3]

(c) Merging is the result of applying a route which does not fall into the cat-egories of imbrication or concatenation Merging is more common thanimbrication and concatenation in informal dialogues

Bel-Enguixet al [3] mention several open areas of research relating to theuse of splicing on routes for modelling dialogue In particular they ask aboutestablishing a mechanism in order to select routes depending on context Thiswould establish a more semantic operation in the terminology of Section 6 below

6 Semantic Shuffle on Trajectories

In the paper which introduced shuffle on trajectories Mateescuet almake a dis-tinction betweensyntacticandsemanticoperations on words

[Shuffle on trajectories is] based on syntactic constraints on the shuffleoperations The constraints are referred to as syntactic constraintssince they do not concern properties of the words that are shuffled orproperties of the letters that occur in these words

Instead the constraints involve the general strategy to switch fromone word to another word Once such a strategy is defined the struc-ture of the words that are shuffled does not play any role

13

However constraints that take into consideration the inner structureof the words that are shuffled together are referred to as semantic con-straints [79 p 2]

The author [19] has introduced a semantic variant of shuffle on trajectoriesnaturally calledsemantic shuffle on trajectories(SST) The corresponding no-tion for deletion on trajectories is calledsemantic deletion on trajectories(SDT)The advantages of SST and SDT are that they preserves many of the desirableproperties of the usual syntactic shuffle on trajectories while being capable ofsimulating more operations of interest

The two semantic constructs we introduce aresynchronizationand contentrestriction Synchronization allows for only one letter to be output for two cor-responding identical symbols in the input words Content restriction allows atrajectory to specify that a particular letter must appear at a specific point This isinspired by bio-informatical operations where operations occur only in the con-text of certain subsequences of the DNA strand

Before we define SST we define the trajectory alphabet LetΓ = 01 σ Forany alphabetΣ letΓΣ = Γ cup (Γ times Σ) For ease of readability we denote [ca] by

ac

for all a isin Σ andc isin ΓWe can now define the SST operation LetΣ be an alphabett isin Γlowast

Σand

x y isin Σlowast Then the SST ofx andy alongt denotedx t y is defined as followsIf x = axprime y = byprime (whereab isin Σ xprime yprime isin Σlowast) andt = ctprime wherec isin ΓΣ andtprime isin Γlowast

Σ then

x t y =

a(xprime tprime y) if c isin 0

a0

b(x tprime yprime) if c isin 1b1

a(xprime tprime yprime) if a = b andc isin σaσ

empty otherwise

If x = axprime y = ε andt = ctprime then

x t ε =

a(xprime tprime ε) if c isin 0a0

empty otherwise

If x = ε y = byprime andt = ctprime then

ε t y =

b(ε tprime yprime) if c isin 1b1

empty otherwise

If x = y = ε thenx t y = ε if t = ε andempty otherwise Finally ifx y ε thenx ε y = empty If x y isin Σlowast andT sube Γlowast

Σ thenx T y = cuptisinT x T y If L1 L2 sube Σ

lowast and

14

T sube ΓlowastΣ thenL1 T L2 = cupxisinL1yisinL2 x T y The associated deletion operation over

the alphabet∆Σ = ∆ cup (∆ times Σ) (where∆ = id σ) is defined in the natural way[19]

We now consider givenΣ and two sets of trajectoriesT1T2 sube ΓlowastΣ whether

the operations T1 T2 coincide that is whetherL1 T1 L2 = L1 T2 L2 for alllanguagesL1 L2 sube Σ

lowast If T1 T2 represent in this sense the same operation wesay thatT1T2 areequivalentsets of trajectories

We note that it is possible for two distinct sets of trajectoriesT1T2 sube ΓlowastΣ

to be

equivalent As a simple example considerT1 = a0

a1 andT2 =

a1

a0 Note that for

i = 12

L1 Ti L2 =

aa if L1 cap L2 supe aempty otherwise

ThusT1T2 are equivalent but not equalBy using a special case of partial commutation and trace languages (see Diek-

ert and Meacutetivier [15]) we can show that two sets of trajectoriesT1T2 sube ΓlowastΣ

areequivalent if and only if their corresponding trace languages (under the naturalmorphism) are equivalent This implies the following important decidability re-sult

Theorem 61 Let Σ be an alphabet and T1T2 sube ΓlowastΣ If T1T2 are regular it is

decidable whether T1 and T2 are equivalent

We can now consider some examples of the power of SST and SDT We firstnote that SST consists of a valid extension of the shuffle on trajectories ifT sube01lowast thenL1 T L2 = L1 T L2 the syntactic shuffle on trajectories operation[79] We also note that ifT = σlowast then T = cap

Given the very natural set of trajectoriesT = (0 + 1 + σ)lowast T denotes theinfiltration product uarr see eg Pin and Sakarovitch [81] The infiltration productis defined as follows ifx = x1 xn is a word of lengthn andI = (i1 i2 ir) isa subsequence of (12 n) let xI = xi1 xi2 middot middot middot xir Then givenx y isin Σlowast

x uarr y = z isin Σlowast existI J sube [|z|] such thatI cup J = [|z|] zI = x andzJ = y

For exampleabuarr ba= abababbaabbabaabbaababWe now show how to use SST and SDT to simulate ciliate bio-operations

which have been the subject of recent research in the literature A model of cil-iate bio-operations without circular variants were introduced by Daleyet al [11]to mimic the manner in which DNA is unscrambled in the DNA of certain uni-cellular ciliates in the process of asexual reproduction Ciliate bio-operations arealso investigated by Ehrenfeuchtet al [28] Prescottet al [82] and Daley andMcQuillan [12] using various approaches

15

Daley et al [11] define several language operations which simulate ciliatebio-operations including synchronized insertion deletion and bi-polar deletionSynchronized insertion can be given as follows

α oplus β = uavaw a isin Σ α = uaw β = va

The operation is extended to languages as usual LetT =⋃

aisinΣ 0lowasta0 1lowast

a1 0lowast Then

for all L1 L2 sube Σlowast L1 oplus L2 = L1 T L2 The operations of synchronized deletion

and synchronized bi-polar deletion (also defined by Daleyet al [11]) can also besimulated by SDT

Contextual insertion and deletion were introduced by Kari and Thierrin as asimple set of operations which are capable for modelling DNA computing [58]Let Σ be an alphabet and [x y] isin (Σlowast)2 We call [x y] a context Then givenvu isin Σlowast the [x y]-contextual insertionof v into u is given byu larrminus[xy] v = u1xvyu2 u = u1xyu2u1u2 isin Σ

lowast Let C sube (Σlowast)2 Then

u larrminusC v =⋃

[xy]isinC

u larrminus[xy] v

The operationlarrminusC is extended to languages monotonically as expected Letx =x1 middot middot middot xn andy = y1 middot middot middot ym be arbitrary words overΣ Then define

T[xy] = 0lowastnprod

i=1

xi

0 1lowastmprod

i=1

yi

0 0lowast

We naturally extend this toTC = cup[xy]isinCT[xy] for all C sube (Σlowast)2 Under this defini-tion it is clear that TC =

larrminusC for all C sube (Σlowast)2

Kari and Thierrin note that ifC sube (Σlowast)2 is finite then the regular and context-free languages are closed underlarrminus

C As TC is regular for all finiteC we notethat the closure of the regular languages underlarrminus

C is a consequence of the closureproperties of SST It is known that the CFLs are closed underlarrminus

C [58] Howeverin general the CFLs are not closed underT This leads to the following openproblem

Open Problem 62Find necessary and sufficient (language-theoretic) conditionson a set of trajectories Tsube Γlowast

Σsuch that the CFLs overΣ are closed under T

We note that [x y]-contextual deletion and [x y]-contextual bi-polar deletion[58] can be simulated by SDT [19]

Recall thatsynchronized shuffle (see eg Latteux and Roos [66]) is defined asfollows LetΣ1Σ2 be alphabets not necessarily disjoint Letρi (Σ1 cup Σ2) rarr Σi

be the projection ontoΣi given byρi(a) = a for all a isin Σi andρi(a) = ε for alla isin (Σ1 cup Σ2) minus Σi

16

Let Li sube Σi for i = 12 Then the synchronized shuffle of L1 andL2 denotedL1 L2 is given by

L1 L2 = ρminus11 (L1) cap ρ

minus12 (L2)

Lemma 63 LetΣ1Σ2 be alphabets Let Tsube ΓlowastΣ1cupΣ2

be given by

T = ((⋃

aisinΣ1capΣ2

aσ) + (

⋃aisinΣ1minusΣ2

a0) + (

⋃aisinΣ2minusΣ1

a1))lowast

Then for all L1 L2 such that Li sube Σi for i = 12 L1 L2 = L1 T L2

Further examples of operations simulated by SST are described by the author[19]

We note that many of the results on language equations hold for SST In par-ticular the following are new results on bio-operations which have not been notedbefore [19]

Corollary 64 Let R be a regular language Then it is decidable whether thereexist languages X1X2 such that R= X1 oplus X2

Corollary 65 Let C sube (Σlowast)2 be a finite set of contexts Let R be a regularlanguage Then it is decidable whether there exist languages X1X2 such thatR= X1

larrC X2

7 Descriptional Complexity

Descriptional complexity of formal languages deals with the problems of concisedescriptions of languages in terms of generative or accepting devices For in-stance the(deterministic) state complexityof a regular languageL is the minimalnumber of states in any deterministic finite automaton acceptingL [94] Nonde-terministic state complexity of a regular language is similarly defined (see egHolzer and Kutrib [36])

For shuffle on trajectories Mateescuet al [79] and Harjuet al [34] both giveproofs that given a regular set of trajectoriesT and regular languagesL1 L2 theoperationL1 T L2 always yields a regular language Thus it is reasonable toconsider the state complexity of shuffle on trajectories The author and K Salo-maa [24] have obtained results in this area We state an upper bound in terms ofnondeterministic state complexity

Lemma 71 Let L1 L2 be regular languages overΣlowast and Tsube 01lowast be a regularset of trajectories Then

sc(L1 T L2) le 2nsc(L1)nsc(L2)nsc(T)

17

The following open problem appears to be very challenging

Open Problem 72 For what regular sets of trajectories Tsube 01lowast does theconstruction given by Lemma 71 give a construction which is best possible

Consider unrestricted shuffle given by the set of trajectoriesT = (0 + 1)lowastThe bound of Lemma 71 in this case is 2nsc(L1)nsc(L2) Cacircmpeanuet al [8] haveshown that there exist languagesL1 andL2 accepted by incomplete DFAs havingrespectivelyn andm states such that any incomplete DFA acceptingL1 L2 hasat least 2nm minus 1 states This bound is optimal for incomplete DFAs howeverfor complete DFAs it gives only the lower bound 2(sc(L1)minus1)(sc(L2)minus1) However weregard this as near enough to our goal of Lemma 71 for our purposes ie weregardT = (0 + 1)lowast as an example of a set of trajectoriesT satisfying OpenProblem 72

The density functionof a languageL sube Σlowast is defined bypL N rarr N aspL(n) = |L cap Σn| for all n ge 0 That is pL(n) gives the number of words oflengthn in L By the density of a languageL we informally mean the asymptoticbehaviour ofpL The following important result of Szilardet al [91 Thm 3]characterizes the density of regular languages

Theorem 73 A regular language R overΣ satisfies pR(n) isin O(nk) k ge 0 ifand only if R can be represented as a finite union of regular expressions of thefollowing form xylowast1z1 middot middot middot ylowastt zt where x y1 z1 middot middot middot yt zt isin Σ

lowast and0 le t le k+ 1

Call a languageL slenderif pL(n) isin O(1) [83] Let R be a regular languagewhich has polynomial densityO(nk) and lett be the smallest integer such thatR = cupt

i=1xiylowasti1zi1 middot middot middot ylowastikiziki 0 le ki le k + 1 i = 1 t Then callt the UkL-

indexof L If k = 0 we callt theUSL-indexof L (languages with USL indextare calledt-thin by Paun and Salomaa [83] slender regular languages were alsocharacterized independently by Shallit [87 Lemma 3 p 336])

Lemma 74 Let T = uvlowast where u v isin 01lowast Let Li be regular languages overΣwith sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le |uv|n1n2 (2)

We now give a bound for sets of trajectoriesT = uvlowastw with w ε

Lemma 75 Let T = uvlowastw where u vw isin 01lowast and w ε Let Li be regularlanguages overΣ with sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le n1n2

|u| + 1+ |v|(n1n2)

lceil|w||v|

rceil+1minus n1n2

n1n2 minus 1

(3)

18

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

43 Two Variable Equations

We can also consider language equations with two variables As an exampleconsider the well-studied language equation

L = X1X2 (1)

whereL is a fixed language andX1X2 are unknown Study of this equation hasbeen undertaken by Conway [10] Kari and Thierrin [59] Salomaa and Yu [85]and Choffrut and Karhumaumlki [9] In particular it is known that ifL is a regularlanguage we can determine if a solution to (1) exists Furthermore it is knownthat if a solutionX1X2 exists there exists a maximal regular solution ie thereexistsR1R2 such thatXi sube Ri for i = 12 andL = R1R2

For language equations of the formL = X1 T X2 unlike the case of one-variable equations we do not have a general result which applies to all regularsets of trajectoriesT However a large class of trajectories are covered by theauthor and K Salomaa [25 22]

Recall that a languageL sube Σlowast is boundedif there existw1w2 wn isin Σlowast

such thatL sube wlowast1wlowast2 middot middot middotw

lowastn We say thatL is letter-boundedif wi isin Σ for all

1 le i le n The following result is due to the author and K Salomaa [25 22]

Theorem 48 Let T sube 01lowast be a letter-bounded regular set of trajectoriesThen given a regular language R it is decidable whether there exist X1X2 suchthat X1 T X2 = R

As we have mentioned this result was known for catenationT = 0lowast1lowast How-ever it also holds for eg the following operations insertion (0lowast1lowast0lowast) k-insertion(0lowast1lowast0lek for fixedk ge 0) and bi-catenation (1lowast0lowast + 0lowast1lowast)

We also note that if the equationX1 T X2 = R has a solution whereR isa regular language andT is a letter-bounded regular set of trajectories then theequation also has solutionY1 T Y2 = R whereY1Y2 are regular languages Thisresult is well-known forT = 0lowast1lowast (see eg Choffrut and Karhumaumlki [9])

For T = (0+ 1)lowast the two-variable decomposition problem is open [7]

Open Problem 49Given a regular language R is it decidable whether thereexist X1X2 (with X1X2 ε) such that R= X1 X2

Recall that a languageL is k-thin if |L cap Σn| le k for all n ge 0 The followingproblem is also open

Open Problem 410Given a k-thin set of trajectories Tsube 01lowast is it decidablegiven a regular language R whether R has a shuffle decomposition with respectto T

9

The author and K Salomaa [26] have shown that ifT sube 01lowast is a fixed1-thin set of trajectories given a regular languageR it is decidable whetherRhas a shuffle decomposition with respect toT However even for 2-thin sets oftrajectories the problem remains open [26]

We can also note that if the left and right operand are restricted to be the samethe resulting language equation problem remains decidable if the set of trajectoriesis regular and letter-bounded [25 22]

Theorem 411Fix a letter-bounded regular set of trajectories Tsube 01lowast Thenit is decidable whether there exists a solution X to the equation XT X = R for agiven regular language R

We now turn to undecidability It has been shown [7] that it is undecidablewhether a context-free language has a nontrivial shuffle decomposition with re-spect to the set of trajectories01lowast This result can be extended for arbitrarycomplete regular sets trajectories [25] (Note that ifT is a complete set of trajec-tories then any languageL has decompositionsL Tε andε TL Below weexclude these trivial decompositions all other decompositions ofL are said to benontrivial)

Theorem 412Let T be any fixed complete regular set of trajectories For a givencontext-free language L it is undecidable whether or not there exist languagesX1X2 ε such that L= X1 TX2

We also note the following open problem [26]

Open Problem 413Is it possible to construct a fixed context-free set of trajecto-ries T such that it is undecidable whether there exist languages X1X2 ε suchthat L= X1 TX2

Also open are other forms of language equation We mention only two here

Open Problem 414Find necessary and sufficient conditions on sets of trajecto-ries T1T2 so that given a regular language R it is decidable whether there existnontrivial languages X1X2X3 satisfying(X1 T1 X2) T2 X3 = R

Open Problem 415Given regular languages R1R2 is it decidable whetherthere exists nontrivial languages X1T (with T sube 01lowast) such that X1 T R1 = R2

or R1 T X1 = R2

5 Splicing on Routes

The notion of shuffle on trajectories was extended by Mateescu [77] to encompasscertain splicing operations This extension is calledsplicing on routes Splicing

10

on routes is a proper extension of shuffle on trajectories and also encompassesseveral unary operations Bel-Enguixet al also use the concept of splicing onroutes to model dialog in natural language [3]

Splicing on routes was introduced by Mateescu [77] to model generaliza-tions of the crossover splicing operation (see Mateescu [77] for a definition ofthe crossover splicing operation) Splicing on routes generalizes the crossoversplicing operation by specifying a setT of routes which restricts the way in whichsplicing can occur The result is that specific sets of routes can simulate not onlythe crossover operation but also such operations on DNA such as thesimple splic-ing and theequal-length crossoveroperations (see Mateescu for details and defi-nitions of these operations [77])

We now define the concept ofsplicing on routes and note the difference be-tween deletion along trajectories from splicing on routes which allows discardingletters from either input word In particular aroute is a wordt specified overthe alphabet0011 where informally 01 means insert the letter from theappropriate word and01 means discard that letter and continue

Formally letx y isin Σlowast andt isin 0011lowast We define the splicing ofx andydenotedx t y recursively as follows ifx = axprime y = byprime (ab isin Σ) andt = ctprime

(c isin 0011) then

x ctprime y =

a(xprime tprime y) if c = 0(xprime tprime y) if c = 0b(x tprime yprime) if c = 1(x tprime yprime) if c = 1

If x = axprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

ε ctprime y =

b(ε tprime yprime) if c = 1(ε tprime yprime) if c = 1empty otherwise

11

We havex ε y = empty if x y ε Finally we setε t ε = ε if t = ε andemptyotherwise We extendt to sets of routes and languages as expected

x T y =⋃tisinT

x t y forallT sube 0011lowast x y isin Σlowast

L1 T L2 =⋃

xisinL1yisinL2

x T y

For example ifx = abc y = cbcandT = 010011010011 thenx T y =acbcbcabbc

It turns out that splicing on routes can be simulated by a combination of shuffleon trajectories and deletion along trajectories [22]

Theorem 51 There exist weak codingsπ1 π2 0101lowast rarr idlowast and a weakcodingπ3 0101lowast rarr 01lowast such that for all t isin 0011lowast and for allx y isin Σlowast we have

x t y = (xπ1(t) Σlowast) π3(t) (yπ2(t) Σ

lowast)

Corollary 52 There exist weak codingsπ1 π2 0101lowast rarr idlowast and π3 0101lowast rarr 01lowast such that for all Tsube 0011lowast and L1 L2 sube Σ

lowast

L1 T L2 =⋃tisinT

(L1π1(t) Σlowast) π3(t) (L2π2(t) Σ

lowast)

Unfortunately the identity

L1 T L2 = (L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast)

does not hold in general even ifL1 L2 are singletons and|T | = 2 For example ifL1 = ab L2 = cd andT = 00110011 then

L1 T L2 = bcad

(L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast) = acadbcbd

However ifT is aunaryset of routes by which we mean thatT sube 00lowast1lowast

then we have the following result [22]

Corollary 53 Let T sube 00lowast1lowast Then for all Lsube Σlowast

L T Σlowast = Lπ1(T) Σ

lowast

12

We refer the reader to Mateescu [77] for a discussion of unary operationsdefined by splicing on routes As an example consider that withT = 0n0

n n ge

01lowast L T Σ

lowast = 12(L) where1

2(L) was given in Section 31Bel-Enguixet al [3] have used splicing on routes to model dialogue in natural

language Adialogueis the result of two or more actors who alternately com-municate The alternating nature of trajectory-based operations makes it naturalto apply them to modeling dialogue Routes and in particular the ability to notinsert portions of certain actorrsquos input is important for selecting different piecesof information for different dialogues ldquoDepending on the conversational goal andon the behaviour of the agents different acts will be selected rdquo [3] Bel-Enguixetal [3] note three important classes of routes used in dialogue

(a) Imbrication in a dialogue is the result of applying a route from the set ofroutes defined by the regular expression ((01)+ (01))+ Imbrication repre-sents the ldquoperfect conversationrdquo where actors strictly alternate

(b) Concatenationin a dialogue is the result of applying a route from the setof route defined by the regular expression (0+ 0)+(1 + 1)+ According toBel-Enguix concatenation has applications to more formal dialogues suchas debates and round tables [3]

(c) Merging is the result of applying a route which does not fall into the cat-egories of imbrication or concatenation Merging is more common thanimbrication and concatenation in informal dialogues

Bel-Enguixet al [3] mention several open areas of research relating to theuse of splicing on routes for modelling dialogue In particular they ask aboutestablishing a mechanism in order to select routes depending on context Thiswould establish a more semantic operation in the terminology of Section 6 below

6 Semantic Shuffle on Trajectories

In the paper which introduced shuffle on trajectories Mateescuet almake a dis-tinction betweensyntacticandsemanticoperations on words

[Shuffle on trajectories is] based on syntactic constraints on the shuffleoperations The constraints are referred to as syntactic constraintssince they do not concern properties of the words that are shuffled orproperties of the letters that occur in these words

Instead the constraints involve the general strategy to switch fromone word to another word Once such a strategy is defined the struc-ture of the words that are shuffled does not play any role

13

However constraints that take into consideration the inner structureof the words that are shuffled together are referred to as semantic con-straints [79 p 2]

The author [19] has introduced a semantic variant of shuffle on trajectoriesnaturally calledsemantic shuffle on trajectories(SST) The corresponding no-tion for deletion on trajectories is calledsemantic deletion on trajectories(SDT)The advantages of SST and SDT are that they preserves many of the desirableproperties of the usual syntactic shuffle on trajectories while being capable ofsimulating more operations of interest

The two semantic constructs we introduce aresynchronizationand contentrestriction Synchronization allows for only one letter to be output for two cor-responding identical symbols in the input words Content restriction allows atrajectory to specify that a particular letter must appear at a specific point This isinspired by bio-informatical operations where operations occur only in the con-text of certain subsequences of the DNA strand

Before we define SST we define the trajectory alphabet LetΓ = 01 σ Forany alphabetΣ letΓΣ = Γ cup (Γ times Σ) For ease of readability we denote [ca] by

ac

for all a isin Σ andc isin ΓWe can now define the SST operation LetΣ be an alphabett isin Γlowast

Σand

x y isin Σlowast Then the SST ofx andy alongt denotedx t y is defined as followsIf x = axprime y = byprime (whereab isin Σ xprime yprime isin Σlowast) andt = ctprime wherec isin ΓΣ andtprime isin Γlowast

Σ then

x t y =

a(xprime tprime y) if c isin 0

a0

b(x tprime yprime) if c isin 1b1

a(xprime tprime yprime) if a = b andc isin σaσ

empty otherwise

If x = axprime y = ε andt = ctprime then

x t ε =

a(xprime tprime ε) if c isin 0a0

empty otherwise

If x = ε y = byprime andt = ctprime then

ε t y =

b(ε tprime yprime) if c isin 1b1

empty otherwise

If x = y = ε thenx t y = ε if t = ε andempty otherwise Finally ifx y ε thenx ε y = empty If x y isin Σlowast andT sube Γlowast

Σ thenx T y = cuptisinT x T y If L1 L2 sube Σ

lowast and

14

T sube ΓlowastΣ thenL1 T L2 = cupxisinL1yisinL2 x T y The associated deletion operation over

the alphabet∆Σ = ∆ cup (∆ times Σ) (where∆ = id σ) is defined in the natural way[19]

We now consider givenΣ and two sets of trajectoriesT1T2 sube ΓlowastΣ whether

the operations T1 T2 coincide that is whetherL1 T1 L2 = L1 T2 L2 for alllanguagesL1 L2 sube Σ

lowast If T1 T2 represent in this sense the same operation wesay thatT1T2 areequivalentsets of trajectories

We note that it is possible for two distinct sets of trajectoriesT1T2 sube ΓlowastΣ

to be

equivalent As a simple example considerT1 = a0

a1 andT2 =

a1

a0 Note that for

i = 12

L1 Ti L2 =

aa if L1 cap L2 supe aempty otherwise

ThusT1T2 are equivalent but not equalBy using a special case of partial commutation and trace languages (see Diek-

ert and Meacutetivier [15]) we can show that two sets of trajectoriesT1T2 sube ΓlowastΣ

areequivalent if and only if their corresponding trace languages (under the naturalmorphism) are equivalent This implies the following important decidability re-sult

Theorem 61 Let Σ be an alphabet and T1T2 sube ΓlowastΣ If T1T2 are regular it is

decidable whether T1 and T2 are equivalent

We can now consider some examples of the power of SST and SDT We firstnote that SST consists of a valid extension of the shuffle on trajectories ifT sube01lowast thenL1 T L2 = L1 T L2 the syntactic shuffle on trajectories operation[79] We also note that ifT = σlowast then T = cap

Given the very natural set of trajectoriesT = (0 + 1 + σ)lowast T denotes theinfiltration product uarr see eg Pin and Sakarovitch [81] The infiltration productis defined as follows ifx = x1 xn is a word of lengthn andI = (i1 i2 ir) isa subsequence of (12 n) let xI = xi1 xi2 middot middot middot xir Then givenx y isin Σlowast

x uarr y = z isin Σlowast existI J sube [|z|] such thatI cup J = [|z|] zI = x andzJ = y

For exampleabuarr ba= abababbaabbabaabbaababWe now show how to use SST and SDT to simulate ciliate bio-operations

which have been the subject of recent research in the literature A model of cil-iate bio-operations without circular variants were introduced by Daleyet al [11]to mimic the manner in which DNA is unscrambled in the DNA of certain uni-cellular ciliates in the process of asexual reproduction Ciliate bio-operations arealso investigated by Ehrenfeuchtet al [28] Prescottet al [82] and Daley andMcQuillan [12] using various approaches

15

Daley et al [11] define several language operations which simulate ciliatebio-operations including synchronized insertion deletion and bi-polar deletionSynchronized insertion can be given as follows

α oplus β = uavaw a isin Σ α = uaw β = va

The operation is extended to languages as usual LetT =⋃

aisinΣ 0lowasta0 1lowast

a1 0lowast Then

for all L1 L2 sube Σlowast L1 oplus L2 = L1 T L2 The operations of synchronized deletion

and synchronized bi-polar deletion (also defined by Daleyet al [11]) can also besimulated by SDT

Contextual insertion and deletion were introduced by Kari and Thierrin as asimple set of operations which are capable for modelling DNA computing [58]Let Σ be an alphabet and [x y] isin (Σlowast)2 We call [x y] a context Then givenvu isin Σlowast the [x y]-contextual insertionof v into u is given byu larrminus[xy] v = u1xvyu2 u = u1xyu2u1u2 isin Σ

lowast Let C sube (Σlowast)2 Then

u larrminusC v =⋃

[xy]isinC

u larrminus[xy] v

The operationlarrminusC is extended to languages monotonically as expected Letx =x1 middot middot middot xn andy = y1 middot middot middot ym be arbitrary words overΣ Then define

T[xy] = 0lowastnprod

i=1

xi

0 1lowastmprod

i=1

yi

0 0lowast

We naturally extend this toTC = cup[xy]isinCT[xy] for all C sube (Σlowast)2 Under this defini-tion it is clear that TC =

larrminusC for all C sube (Σlowast)2

Kari and Thierrin note that ifC sube (Σlowast)2 is finite then the regular and context-free languages are closed underlarrminus

C As TC is regular for all finiteC we notethat the closure of the regular languages underlarrminus

C is a consequence of the closureproperties of SST It is known that the CFLs are closed underlarrminus

C [58] Howeverin general the CFLs are not closed underT This leads to the following openproblem

Open Problem 62Find necessary and sufficient (language-theoretic) conditionson a set of trajectories Tsube Γlowast

Σsuch that the CFLs overΣ are closed under T

We note that [x y]-contextual deletion and [x y]-contextual bi-polar deletion[58] can be simulated by SDT [19]

Recall thatsynchronized shuffle (see eg Latteux and Roos [66]) is defined asfollows LetΣ1Σ2 be alphabets not necessarily disjoint Letρi (Σ1 cup Σ2) rarr Σi

be the projection ontoΣi given byρi(a) = a for all a isin Σi andρi(a) = ε for alla isin (Σ1 cup Σ2) minus Σi

16

Let Li sube Σi for i = 12 Then the synchronized shuffle of L1 andL2 denotedL1 L2 is given by

L1 L2 = ρminus11 (L1) cap ρ

minus12 (L2)

Lemma 63 LetΣ1Σ2 be alphabets Let Tsube ΓlowastΣ1cupΣ2

be given by

T = ((⋃

aisinΣ1capΣ2

aσ) + (

⋃aisinΣ1minusΣ2

a0) + (

⋃aisinΣ2minusΣ1

a1))lowast

Then for all L1 L2 such that Li sube Σi for i = 12 L1 L2 = L1 T L2

Further examples of operations simulated by SST are described by the author[19]

We note that many of the results on language equations hold for SST In par-ticular the following are new results on bio-operations which have not been notedbefore [19]

Corollary 64 Let R be a regular language Then it is decidable whether thereexist languages X1X2 such that R= X1 oplus X2

Corollary 65 Let C sube (Σlowast)2 be a finite set of contexts Let R be a regularlanguage Then it is decidable whether there exist languages X1X2 such thatR= X1

larrC X2

7 Descriptional Complexity

Descriptional complexity of formal languages deals with the problems of concisedescriptions of languages in terms of generative or accepting devices For in-stance the(deterministic) state complexityof a regular languageL is the minimalnumber of states in any deterministic finite automaton acceptingL [94] Nonde-terministic state complexity of a regular language is similarly defined (see egHolzer and Kutrib [36])

For shuffle on trajectories Mateescuet al [79] and Harjuet al [34] both giveproofs that given a regular set of trajectoriesT and regular languagesL1 L2 theoperationL1 T L2 always yields a regular language Thus it is reasonable toconsider the state complexity of shuffle on trajectories The author and K Salo-maa [24] have obtained results in this area We state an upper bound in terms ofnondeterministic state complexity

Lemma 71 Let L1 L2 be regular languages overΣlowast and Tsube 01lowast be a regularset of trajectories Then

sc(L1 T L2) le 2nsc(L1)nsc(L2)nsc(T)

17

The following open problem appears to be very challenging

Open Problem 72 For what regular sets of trajectories Tsube 01lowast does theconstruction given by Lemma 71 give a construction which is best possible

Consider unrestricted shuffle given by the set of trajectoriesT = (0 + 1)lowastThe bound of Lemma 71 in this case is 2nsc(L1)nsc(L2) Cacircmpeanuet al [8] haveshown that there exist languagesL1 andL2 accepted by incomplete DFAs havingrespectivelyn andm states such that any incomplete DFA acceptingL1 L2 hasat least 2nm minus 1 states This bound is optimal for incomplete DFAs howeverfor complete DFAs it gives only the lower bound 2(sc(L1)minus1)(sc(L2)minus1) However weregard this as near enough to our goal of Lemma 71 for our purposes ie weregardT = (0 + 1)lowast as an example of a set of trajectoriesT satisfying OpenProblem 72

The density functionof a languageL sube Σlowast is defined bypL N rarr N aspL(n) = |L cap Σn| for all n ge 0 That is pL(n) gives the number of words oflengthn in L By the density of a languageL we informally mean the asymptoticbehaviour ofpL The following important result of Szilardet al [91 Thm 3]characterizes the density of regular languages

Theorem 73 A regular language R overΣ satisfies pR(n) isin O(nk) k ge 0 ifand only if R can be represented as a finite union of regular expressions of thefollowing form xylowast1z1 middot middot middot ylowastt zt where x y1 z1 middot middot middot yt zt isin Σ

lowast and0 le t le k+ 1

Call a languageL slenderif pL(n) isin O(1) [83] Let R be a regular languagewhich has polynomial densityO(nk) and lett be the smallest integer such thatR = cupt

i=1xiylowasti1zi1 middot middot middot ylowastikiziki 0 le ki le k + 1 i = 1 t Then callt the UkL-

indexof L If k = 0 we callt theUSL-indexof L (languages with USL indextare calledt-thin by Paun and Salomaa [83] slender regular languages were alsocharacterized independently by Shallit [87 Lemma 3 p 336])

Lemma 74 Let T = uvlowast where u v isin 01lowast Let Li be regular languages overΣwith sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le |uv|n1n2 (2)

We now give a bound for sets of trajectoriesT = uvlowastw with w ε

Lemma 75 Let T = uvlowastw where u vw isin 01lowast and w ε Let Li be regularlanguages overΣ with sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le n1n2

|u| + 1+ |v|(n1n2)

lceil|w||v|

rceil+1minus n1n2

n1n2 minus 1

(3)

18

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

The author and K Salomaa [26] have shown that ifT sube 01lowast is a fixed1-thin set of trajectories given a regular languageR it is decidable whetherRhas a shuffle decomposition with respect toT However even for 2-thin sets oftrajectories the problem remains open [26]

We can also note that if the left and right operand are restricted to be the samethe resulting language equation problem remains decidable if the set of trajectoriesis regular and letter-bounded [25 22]

Theorem 411Fix a letter-bounded regular set of trajectories Tsube 01lowast Thenit is decidable whether there exists a solution X to the equation XT X = R for agiven regular language R

We now turn to undecidability It has been shown [7] that it is undecidablewhether a context-free language has a nontrivial shuffle decomposition with re-spect to the set of trajectories01lowast This result can be extended for arbitrarycomplete regular sets trajectories [25] (Note that ifT is a complete set of trajec-tories then any languageL has decompositionsL Tε andε TL Below weexclude these trivial decompositions all other decompositions ofL are said to benontrivial)

Theorem 412Let T be any fixed complete regular set of trajectories For a givencontext-free language L it is undecidable whether or not there exist languagesX1X2 ε such that L= X1 TX2

We also note the following open problem [26]

Open Problem 413Is it possible to construct a fixed context-free set of trajecto-ries T such that it is undecidable whether there exist languages X1X2 ε suchthat L= X1 TX2

Also open are other forms of language equation We mention only two here

Open Problem 414Find necessary and sufficient conditions on sets of trajecto-ries T1T2 so that given a regular language R it is decidable whether there existnontrivial languages X1X2X3 satisfying(X1 T1 X2) T2 X3 = R

Open Problem 415Given regular languages R1R2 is it decidable whetherthere exists nontrivial languages X1T (with T sube 01lowast) such that X1 T R1 = R2

or R1 T X1 = R2

5 Splicing on Routes

The notion of shuffle on trajectories was extended by Mateescu [77] to encompasscertain splicing operations This extension is calledsplicing on routes Splicing

10

on routes is a proper extension of shuffle on trajectories and also encompassesseveral unary operations Bel-Enguixet al also use the concept of splicing onroutes to model dialog in natural language [3]

Splicing on routes was introduced by Mateescu [77] to model generaliza-tions of the crossover splicing operation (see Mateescu [77] for a definition ofthe crossover splicing operation) Splicing on routes generalizes the crossoversplicing operation by specifying a setT of routes which restricts the way in whichsplicing can occur The result is that specific sets of routes can simulate not onlythe crossover operation but also such operations on DNA such as thesimple splic-ing and theequal-length crossoveroperations (see Mateescu for details and defi-nitions of these operations [77])

We now define the concept ofsplicing on routes and note the difference be-tween deletion along trajectories from splicing on routes which allows discardingletters from either input word In particular aroute is a wordt specified overthe alphabet0011 where informally 01 means insert the letter from theappropriate word and01 means discard that letter and continue

Formally letx y isin Σlowast andt isin 0011lowast We define the splicing ofx andydenotedx t y recursively as follows ifx = axprime y = byprime (ab isin Σ) andt = ctprime

(c isin 0011) then

x ctprime y =

a(xprime tprime y) if c = 0(xprime tprime y) if c = 0b(x tprime yprime) if c = 1(x tprime yprime) if c = 1

If x = axprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

ε ctprime y =

b(ε tprime yprime) if c = 1(ε tprime yprime) if c = 1empty otherwise

11

We havex ε y = empty if x y ε Finally we setε t ε = ε if t = ε andemptyotherwise We extendt to sets of routes and languages as expected

x T y =⋃tisinT

x t y forallT sube 0011lowast x y isin Σlowast

L1 T L2 =⋃

xisinL1yisinL2

x T y

For example ifx = abc y = cbcandT = 010011010011 thenx T y =acbcbcabbc

It turns out that splicing on routes can be simulated by a combination of shuffleon trajectories and deletion along trajectories [22]

Theorem 51 There exist weak codingsπ1 π2 0101lowast rarr idlowast and a weakcodingπ3 0101lowast rarr 01lowast such that for all t isin 0011lowast and for allx y isin Σlowast we have

x t y = (xπ1(t) Σlowast) π3(t) (yπ2(t) Σ

lowast)

Corollary 52 There exist weak codingsπ1 π2 0101lowast rarr idlowast and π3 0101lowast rarr 01lowast such that for all Tsube 0011lowast and L1 L2 sube Σ

lowast

L1 T L2 =⋃tisinT

(L1π1(t) Σlowast) π3(t) (L2π2(t) Σ

lowast)

Unfortunately the identity

L1 T L2 = (L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast)

does not hold in general even ifL1 L2 are singletons and|T | = 2 For example ifL1 = ab L2 = cd andT = 00110011 then

L1 T L2 = bcad

(L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast) = acadbcbd

However ifT is aunaryset of routes by which we mean thatT sube 00lowast1lowast

then we have the following result [22]

Corollary 53 Let T sube 00lowast1lowast Then for all Lsube Σlowast

L T Σlowast = Lπ1(T) Σ

lowast

12

We refer the reader to Mateescu [77] for a discussion of unary operationsdefined by splicing on routes As an example consider that withT = 0n0

n n ge

01lowast L T Σ

lowast = 12(L) where1

2(L) was given in Section 31Bel-Enguixet al [3] have used splicing on routes to model dialogue in natural

language Adialogueis the result of two or more actors who alternately com-municate The alternating nature of trajectory-based operations makes it naturalto apply them to modeling dialogue Routes and in particular the ability to notinsert portions of certain actorrsquos input is important for selecting different piecesof information for different dialogues ldquoDepending on the conversational goal andon the behaviour of the agents different acts will be selected rdquo [3] Bel-Enguixetal [3] note three important classes of routes used in dialogue

(a) Imbrication in a dialogue is the result of applying a route from the set ofroutes defined by the regular expression ((01)+ (01))+ Imbrication repre-sents the ldquoperfect conversationrdquo where actors strictly alternate

(b) Concatenationin a dialogue is the result of applying a route from the setof route defined by the regular expression (0+ 0)+(1 + 1)+ According toBel-Enguix concatenation has applications to more formal dialogues suchas debates and round tables [3]

(c) Merging is the result of applying a route which does not fall into the cat-egories of imbrication or concatenation Merging is more common thanimbrication and concatenation in informal dialogues

Bel-Enguixet al [3] mention several open areas of research relating to theuse of splicing on routes for modelling dialogue In particular they ask aboutestablishing a mechanism in order to select routes depending on context Thiswould establish a more semantic operation in the terminology of Section 6 below

6 Semantic Shuffle on Trajectories

In the paper which introduced shuffle on trajectories Mateescuet almake a dis-tinction betweensyntacticandsemanticoperations on words

[Shuffle on trajectories is] based on syntactic constraints on the shuffleoperations The constraints are referred to as syntactic constraintssince they do not concern properties of the words that are shuffled orproperties of the letters that occur in these words

Instead the constraints involve the general strategy to switch fromone word to another word Once such a strategy is defined the struc-ture of the words that are shuffled does not play any role

13

However constraints that take into consideration the inner structureof the words that are shuffled together are referred to as semantic con-straints [79 p 2]

The author [19] has introduced a semantic variant of shuffle on trajectoriesnaturally calledsemantic shuffle on trajectories(SST) The corresponding no-tion for deletion on trajectories is calledsemantic deletion on trajectories(SDT)The advantages of SST and SDT are that they preserves many of the desirableproperties of the usual syntactic shuffle on trajectories while being capable ofsimulating more operations of interest

The two semantic constructs we introduce aresynchronizationand contentrestriction Synchronization allows for only one letter to be output for two cor-responding identical symbols in the input words Content restriction allows atrajectory to specify that a particular letter must appear at a specific point This isinspired by bio-informatical operations where operations occur only in the con-text of certain subsequences of the DNA strand

Before we define SST we define the trajectory alphabet LetΓ = 01 σ Forany alphabetΣ letΓΣ = Γ cup (Γ times Σ) For ease of readability we denote [ca] by

ac

for all a isin Σ andc isin ΓWe can now define the SST operation LetΣ be an alphabett isin Γlowast

Σand

x y isin Σlowast Then the SST ofx andy alongt denotedx t y is defined as followsIf x = axprime y = byprime (whereab isin Σ xprime yprime isin Σlowast) andt = ctprime wherec isin ΓΣ andtprime isin Γlowast

Σ then

x t y =

a(xprime tprime y) if c isin 0

a0

b(x tprime yprime) if c isin 1b1

a(xprime tprime yprime) if a = b andc isin σaσ

empty otherwise

If x = axprime y = ε andt = ctprime then

x t ε =

a(xprime tprime ε) if c isin 0a0

empty otherwise

If x = ε y = byprime andt = ctprime then

ε t y =

b(ε tprime yprime) if c isin 1b1

empty otherwise

If x = y = ε thenx t y = ε if t = ε andempty otherwise Finally ifx y ε thenx ε y = empty If x y isin Σlowast andT sube Γlowast

Σ thenx T y = cuptisinT x T y If L1 L2 sube Σ

lowast and

14

T sube ΓlowastΣ thenL1 T L2 = cupxisinL1yisinL2 x T y The associated deletion operation over

the alphabet∆Σ = ∆ cup (∆ times Σ) (where∆ = id σ) is defined in the natural way[19]

We now consider givenΣ and two sets of trajectoriesT1T2 sube ΓlowastΣ whether

the operations T1 T2 coincide that is whetherL1 T1 L2 = L1 T2 L2 for alllanguagesL1 L2 sube Σ

lowast If T1 T2 represent in this sense the same operation wesay thatT1T2 areequivalentsets of trajectories

We note that it is possible for two distinct sets of trajectoriesT1T2 sube ΓlowastΣ

to be

equivalent As a simple example considerT1 = a0

a1 andT2 =

a1

a0 Note that for

i = 12

L1 Ti L2 =

aa if L1 cap L2 supe aempty otherwise

ThusT1T2 are equivalent but not equalBy using a special case of partial commutation and trace languages (see Diek-

ert and Meacutetivier [15]) we can show that two sets of trajectoriesT1T2 sube ΓlowastΣ

areequivalent if and only if their corresponding trace languages (under the naturalmorphism) are equivalent This implies the following important decidability re-sult

Theorem 61 Let Σ be an alphabet and T1T2 sube ΓlowastΣ If T1T2 are regular it is

decidable whether T1 and T2 are equivalent

We can now consider some examples of the power of SST and SDT We firstnote that SST consists of a valid extension of the shuffle on trajectories ifT sube01lowast thenL1 T L2 = L1 T L2 the syntactic shuffle on trajectories operation[79] We also note that ifT = σlowast then T = cap

Given the very natural set of trajectoriesT = (0 + 1 + σ)lowast T denotes theinfiltration product uarr see eg Pin and Sakarovitch [81] The infiltration productis defined as follows ifx = x1 xn is a word of lengthn andI = (i1 i2 ir) isa subsequence of (12 n) let xI = xi1 xi2 middot middot middot xir Then givenx y isin Σlowast

x uarr y = z isin Σlowast existI J sube [|z|] such thatI cup J = [|z|] zI = x andzJ = y

For exampleabuarr ba= abababbaabbabaabbaababWe now show how to use SST and SDT to simulate ciliate bio-operations

which have been the subject of recent research in the literature A model of cil-iate bio-operations without circular variants were introduced by Daleyet al [11]to mimic the manner in which DNA is unscrambled in the DNA of certain uni-cellular ciliates in the process of asexual reproduction Ciliate bio-operations arealso investigated by Ehrenfeuchtet al [28] Prescottet al [82] and Daley andMcQuillan [12] using various approaches

15

Daley et al [11] define several language operations which simulate ciliatebio-operations including synchronized insertion deletion and bi-polar deletionSynchronized insertion can be given as follows

α oplus β = uavaw a isin Σ α = uaw β = va

The operation is extended to languages as usual LetT =⋃

aisinΣ 0lowasta0 1lowast

a1 0lowast Then

for all L1 L2 sube Σlowast L1 oplus L2 = L1 T L2 The operations of synchronized deletion

and synchronized bi-polar deletion (also defined by Daleyet al [11]) can also besimulated by SDT

Contextual insertion and deletion were introduced by Kari and Thierrin as asimple set of operations which are capable for modelling DNA computing [58]Let Σ be an alphabet and [x y] isin (Σlowast)2 We call [x y] a context Then givenvu isin Σlowast the [x y]-contextual insertionof v into u is given byu larrminus[xy] v = u1xvyu2 u = u1xyu2u1u2 isin Σ

lowast Let C sube (Σlowast)2 Then

u larrminusC v =⋃

[xy]isinC

u larrminus[xy] v

The operationlarrminusC is extended to languages monotonically as expected Letx =x1 middot middot middot xn andy = y1 middot middot middot ym be arbitrary words overΣ Then define

T[xy] = 0lowastnprod

i=1

xi

0 1lowastmprod

i=1

yi

0 0lowast

We naturally extend this toTC = cup[xy]isinCT[xy] for all C sube (Σlowast)2 Under this defini-tion it is clear that TC =

larrminusC for all C sube (Σlowast)2

Kari and Thierrin note that ifC sube (Σlowast)2 is finite then the regular and context-free languages are closed underlarrminus

C As TC is regular for all finiteC we notethat the closure of the regular languages underlarrminus

C is a consequence of the closureproperties of SST It is known that the CFLs are closed underlarrminus

C [58] Howeverin general the CFLs are not closed underT This leads to the following openproblem

Open Problem 62Find necessary and sufficient (language-theoretic) conditionson a set of trajectories Tsube Γlowast

Σsuch that the CFLs overΣ are closed under T

We note that [x y]-contextual deletion and [x y]-contextual bi-polar deletion[58] can be simulated by SDT [19]

Recall thatsynchronized shuffle (see eg Latteux and Roos [66]) is defined asfollows LetΣ1Σ2 be alphabets not necessarily disjoint Letρi (Σ1 cup Σ2) rarr Σi

be the projection ontoΣi given byρi(a) = a for all a isin Σi andρi(a) = ε for alla isin (Σ1 cup Σ2) minus Σi

16

Let Li sube Σi for i = 12 Then the synchronized shuffle of L1 andL2 denotedL1 L2 is given by

L1 L2 = ρminus11 (L1) cap ρ

minus12 (L2)

Lemma 63 LetΣ1Σ2 be alphabets Let Tsube ΓlowastΣ1cupΣ2

be given by

T = ((⋃

aisinΣ1capΣ2

aσ) + (

⋃aisinΣ1minusΣ2

a0) + (

⋃aisinΣ2minusΣ1

a1))lowast

Then for all L1 L2 such that Li sube Σi for i = 12 L1 L2 = L1 T L2

Further examples of operations simulated by SST are described by the author[19]

We note that many of the results on language equations hold for SST In par-ticular the following are new results on bio-operations which have not been notedbefore [19]

Corollary 64 Let R be a regular language Then it is decidable whether thereexist languages X1X2 such that R= X1 oplus X2

Corollary 65 Let C sube (Σlowast)2 be a finite set of contexts Let R be a regularlanguage Then it is decidable whether there exist languages X1X2 such thatR= X1

larrC X2

7 Descriptional Complexity

Descriptional complexity of formal languages deals with the problems of concisedescriptions of languages in terms of generative or accepting devices For in-stance the(deterministic) state complexityof a regular languageL is the minimalnumber of states in any deterministic finite automaton acceptingL [94] Nonde-terministic state complexity of a regular language is similarly defined (see egHolzer and Kutrib [36])

For shuffle on trajectories Mateescuet al [79] and Harjuet al [34] both giveproofs that given a regular set of trajectoriesT and regular languagesL1 L2 theoperationL1 T L2 always yields a regular language Thus it is reasonable toconsider the state complexity of shuffle on trajectories The author and K Salo-maa [24] have obtained results in this area We state an upper bound in terms ofnondeterministic state complexity

Lemma 71 Let L1 L2 be regular languages overΣlowast and Tsube 01lowast be a regularset of trajectories Then

sc(L1 T L2) le 2nsc(L1)nsc(L2)nsc(T)

17

The following open problem appears to be very challenging

Open Problem 72 For what regular sets of trajectories Tsube 01lowast does theconstruction given by Lemma 71 give a construction which is best possible

Consider unrestricted shuffle given by the set of trajectoriesT = (0 + 1)lowastThe bound of Lemma 71 in this case is 2nsc(L1)nsc(L2) Cacircmpeanuet al [8] haveshown that there exist languagesL1 andL2 accepted by incomplete DFAs havingrespectivelyn andm states such that any incomplete DFA acceptingL1 L2 hasat least 2nm minus 1 states This bound is optimal for incomplete DFAs howeverfor complete DFAs it gives only the lower bound 2(sc(L1)minus1)(sc(L2)minus1) However weregard this as near enough to our goal of Lemma 71 for our purposes ie weregardT = (0 + 1)lowast as an example of a set of trajectoriesT satisfying OpenProblem 72

The density functionof a languageL sube Σlowast is defined bypL N rarr N aspL(n) = |L cap Σn| for all n ge 0 That is pL(n) gives the number of words oflengthn in L By the density of a languageL we informally mean the asymptoticbehaviour ofpL The following important result of Szilardet al [91 Thm 3]characterizes the density of regular languages

Theorem 73 A regular language R overΣ satisfies pR(n) isin O(nk) k ge 0 ifand only if R can be represented as a finite union of regular expressions of thefollowing form xylowast1z1 middot middot middot ylowastt zt where x y1 z1 middot middot middot yt zt isin Σ

lowast and0 le t le k+ 1

Call a languageL slenderif pL(n) isin O(1) [83] Let R be a regular languagewhich has polynomial densityO(nk) and lett be the smallest integer such thatR = cupt

i=1xiylowasti1zi1 middot middot middot ylowastikiziki 0 le ki le k + 1 i = 1 t Then callt the UkL-

indexof L If k = 0 we callt theUSL-indexof L (languages with USL indextare calledt-thin by Paun and Salomaa [83] slender regular languages were alsocharacterized independently by Shallit [87 Lemma 3 p 336])

Lemma 74 Let T = uvlowast where u v isin 01lowast Let Li be regular languages overΣwith sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le |uv|n1n2 (2)

We now give a bound for sets of trajectoriesT = uvlowastw with w ε

Lemma 75 Let T = uvlowastw where u vw isin 01lowast and w ε Let Li be regularlanguages overΣ with sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le n1n2

|u| + 1+ |v|(n1n2)

lceil|w||v|

rceil+1minus n1n2

n1n2 minus 1

(3)

18

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

on routes is a proper extension of shuffle on trajectories and also encompassesseveral unary operations Bel-Enguixet al also use the concept of splicing onroutes to model dialog in natural language [3]

Splicing on routes was introduced by Mateescu [77] to model generaliza-tions of the crossover splicing operation (see Mateescu [77] for a definition ofthe crossover splicing operation) Splicing on routes generalizes the crossoversplicing operation by specifying a setT of routes which restricts the way in whichsplicing can occur The result is that specific sets of routes can simulate not onlythe crossover operation but also such operations on DNA such as thesimple splic-ing and theequal-length crossoveroperations (see Mateescu for details and defi-nitions of these operations [77])

We now define the concept ofsplicing on routes and note the difference be-tween deletion along trajectories from splicing on routes which allows discardingletters from either input word In particular aroute is a wordt specified overthe alphabet0011 where informally 01 means insert the letter from theappropriate word and01 means discard that letter and continue

Formally letx y isin Σlowast andt isin 0011lowast We define the splicing ofx andydenotedx t y recursively as follows ifx = axprime y = byprime (ab isin Σ) andt = ctprime

(c isin 0011) then

x ctprime y =

a(xprime tprime y) if c = 0(xprime tprime y) if c = 0b(x tprime yprime) if c = 1(x tprime yprime) if c = 1

If x = axprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

x ctprime ε =

a(xprime tprime ε) if c = 0(xprime tprime ε) if c = 0empty otherwise

If y = byprime andt = ctprime wherea isin Σ andc isin 0011 then

ε ctprime y =

b(ε tprime yprime) if c = 1(ε tprime yprime) if c = 1empty otherwise

11

We havex ε y = empty if x y ε Finally we setε t ε = ε if t = ε andemptyotherwise We extendt to sets of routes and languages as expected

x T y =⋃tisinT

x t y forallT sube 0011lowast x y isin Σlowast

L1 T L2 =⋃

xisinL1yisinL2

x T y

For example ifx = abc y = cbcandT = 010011010011 thenx T y =acbcbcabbc

It turns out that splicing on routes can be simulated by a combination of shuffleon trajectories and deletion along trajectories [22]

Theorem 51 There exist weak codingsπ1 π2 0101lowast rarr idlowast and a weakcodingπ3 0101lowast rarr 01lowast such that for all t isin 0011lowast and for allx y isin Σlowast we have

x t y = (xπ1(t) Σlowast) π3(t) (yπ2(t) Σ

lowast)

Corollary 52 There exist weak codingsπ1 π2 0101lowast rarr idlowast and π3 0101lowast rarr 01lowast such that for all Tsube 0011lowast and L1 L2 sube Σ

lowast

L1 T L2 =⋃tisinT

(L1π1(t) Σlowast) π3(t) (L2π2(t) Σ

lowast)

Unfortunately the identity

L1 T L2 = (L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast)

does not hold in general even ifL1 L2 are singletons and|T | = 2 For example ifL1 = ab L2 = cd andT = 00110011 then

L1 T L2 = bcad

(L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast) = acadbcbd

However ifT is aunaryset of routes by which we mean thatT sube 00lowast1lowast

then we have the following result [22]

Corollary 53 Let T sube 00lowast1lowast Then for all Lsube Σlowast

L T Σlowast = Lπ1(T) Σ

lowast

12

We refer the reader to Mateescu [77] for a discussion of unary operationsdefined by splicing on routes As an example consider that withT = 0n0

n n ge

01lowast L T Σ

lowast = 12(L) where1

2(L) was given in Section 31Bel-Enguixet al [3] have used splicing on routes to model dialogue in natural

language Adialogueis the result of two or more actors who alternately com-municate The alternating nature of trajectory-based operations makes it naturalto apply them to modeling dialogue Routes and in particular the ability to notinsert portions of certain actorrsquos input is important for selecting different piecesof information for different dialogues ldquoDepending on the conversational goal andon the behaviour of the agents different acts will be selected rdquo [3] Bel-Enguixetal [3] note three important classes of routes used in dialogue

(a) Imbrication in a dialogue is the result of applying a route from the set ofroutes defined by the regular expression ((01)+ (01))+ Imbrication repre-sents the ldquoperfect conversationrdquo where actors strictly alternate

(b) Concatenationin a dialogue is the result of applying a route from the setof route defined by the regular expression (0+ 0)+(1 + 1)+ According toBel-Enguix concatenation has applications to more formal dialogues suchas debates and round tables [3]

(c) Merging is the result of applying a route which does not fall into the cat-egories of imbrication or concatenation Merging is more common thanimbrication and concatenation in informal dialogues

Bel-Enguixet al [3] mention several open areas of research relating to theuse of splicing on routes for modelling dialogue In particular they ask aboutestablishing a mechanism in order to select routes depending on context Thiswould establish a more semantic operation in the terminology of Section 6 below

6 Semantic Shuffle on Trajectories

In the paper which introduced shuffle on trajectories Mateescuet almake a dis-tinction betweensyntacticandsemanticoperations on words

[Shuffle on trajectories is] based on syntactic constraints on the shuffleoperations The constraints are referred to as syntactic constraintssince they do not concern properties of the words that are shuffled orproperties of the letters that occur in these words

Instead the constraints involve the general strategy to switch fromone word to another word Once such a strategy is defined the struc-ture of the words that are shuffled does not play any role

13

However constraints that take into consideration the inner structureof the words that are shuffled together are referred to as semantic con-straints [79 p 2]

The author [19] has introduced a semantic variant of shuffle on trajectoriesnaturally calledsemantic shuffle on trajectories(SST) The corresponding no-tion for deletion on trajectories is calledsemantic deletion on trajectories(SDT)The advantages of SST and SDT are that they preserves many of the desirableproperties of the usual syntactic shuffle on trajectories while being capable ofsimulating more operations of interest

The two semantic constructs we introduce aresynchronizationand contentrestriction Synchronization allows for only one letter to be output for two cor-responding identical symbols in the input words Content restriction allows atrajectory to specify that a particular letter must appear at a specific point This isinspired by bio-informatical operations where operations occur only in the con-text of certain subsequences of the DNA strand

Before we define SST we define the trajectory alphabet LetΓ = 01 σ Forany alphabetΣ letΓΣ = Γ cup (Γ times Σ) For ease of readability we denote [ca] by

ac

for all a isin Σ andc isin ΓWe can now define the SST operation LetΣ be an alphabett isin Γlowast

Σand

x y isin Σlowast Then the SST ofx andy alongt denotedx t y is defined as followsIf x = axprime y = byprime (whereab isin Σ xprime yprime isin Σlowast) andt = ctprime wherec isin ΓΣ andtprime isin Γlowast

Σ then

x t y =

a(xprime tprime y) if c isin 0

a0

b(x tprime yprime) if c isin 1b1

a(xprime tprime yprime) if a = b andc isin σaσ

empty otherwise

If x = axprime y = ε andt = ctprime then

x t ε =

a(xprime tprime ε) if c isin 0a0

empty otherwise

If x = ε y = byprime andt = ctprime then

ε t y =

b(ε tprime yprime) if c isin 1b1

empty otherwise

If x = y = ε thenx t y = ε if t = ε andempty otherwise Finally ifx y ε thenx ε y = empty If x y isin Σlowast andT sube Γlowast

Σ thenx T y = cuptisinT x T y If L1 L2 sube Σ

lowast and

14

T sube ΓlowastΣ thenL1 T L2 = cupxisinL1yisinL2 x T y The associated deletion operation over

the alphabet∆Σ = ∆ cup (∆ times Σ) (where∆ = id σ) is defined in the natural way[19]

We now consider givenΣ and two sets of trajectoriesT1T2 sube ΓlowastΣ whether

the operations T1 T2 coincide that is whetherL1 T1 L2 = L1 T2 L2 for alllanguagesL1 L2 sube Σ

lowast If T1 T2 represent in this sense the same operation wesay thatT1T2 areequivalentsets of trajectories

We note that it is possible for two distinct sets of trajectoriesT1T2 sube ΓlowastΣ

to be

equivalent As a simple example considerT1 = a0

a1 andT2 =

a1

a0 Note that for

i = 12

L1 Ti L2 =

aa if L1 cap L2 supe aempty otherwise

ThusT1T2 are equivalent but not equalBy using a special case of partial commutation and trace languages (see Diek-

ert and Meacutetivier [15]) we can show that two sets of trajectoriesT1T2 sube ΓlowastΣ

areequivalent if and only if their corresponding trace languages (under the naturalmorphism) are equivalent This implies the following important decidability re-sult

Theorem 61 Let Σ be an alphabet and T1T2 sube ΓlowastΣ If T1T2 are regular it is

decidable whether T1 and T2 are equivalent

We can now consider some examples of the power of SST and SDT We firstnote that SST consists of a valid extension of the shuffle on trajectories ifT sube01lowast thenL1 T L2 = L1 T L2 the syntactic shuffle on trajectories operation[79] We also note that ifT = σlowast then T = cap

Given the very natural set of trajectoriesT = (0 + 1 + σ)lowast T denotes theinfiltration product uarr see eg Pin and Sakarovitch [81] The infiltration productis defined as follows ifx = x1 xn is a word of lengthn andI = (i1 i2 ir) isa subsequence of (12 n) let xI = xi1 xi2 middot middot middot xir Then givenx y isin Σlowast

x uarr y = z isin Σlowast existI J sube [|z|] such thatI cup J = [|z|] zI = x andzJ = y

For exampleabuarr ba= abababbaabbabaabbaababWe now show how to use SST and SDT to simulate ciliate bio-operations

which have been the subject of recent research in the literature A model of cil-iate bio-operations without circular variants were introduced by Daleyet al [11]to mimic the manner in which DNA is unscrambled in the DNA of certain uni-cellular ciliates in the process of asexual reproduction Ciliate bio-operations arealso investigated by Ehrenfeuchtet al [28] Prescottet al [82] and Daley andMcQuillan [12] using various approaches

15

Daley et al [11] define several language operations which simulate ciliatebio-operations including synchronized insertion deletion and bi-polar deletionSynchronized insertion can be given as follows

α oplus β = uavaw a isin Σ α = uaw β = va

The operation is extended to languages as usual LetT =⋃

aisinΣ 0lowasta0 1lowast

a1 0lowast Then

for all L1 L2 sube Σlowast L1 oplus L2 = L1 T L2 The operations of synchronized deletion

and synchronized bi-polar deletion (also defined by Daleyet al [11]) can also besimulated by SDT

Contextual insertion and deletion were introduced by Kari and Thierrin as asimple set of operations which are capable for modelling DNA computing [58]Let Σ be an alphabet and [x y] isin (Σlowast)2 We call [x y] a context Then givenvu isin Σlowast the [x y]-contextual insertionof v into u is given byu larrminus[xy] v = u1xvyu2 u = u1xyu2u1u2 isin Σ

lowast Let C sube (Σlowast)2 Then

u larrminusC v =⋃

[xy]isinC

u larrminus[xy] v

The operationlarrminusC is extended to languages monotonically as expected Letx =x1 middot middot middot xn andy = y1 middot middot middot ym be arbitrary words overΣ Then define

T[xy] = 0lowastnprod

i=1

xi

0 1lowastmprod

i=1

yi

0 0lowast

We naturally extend this toTC = cup[xy]isinCT[xy] for all C sube (Σlowast)2 Under this defini-tion it is clear that TC =

larrminusC for all C sube (Σlowast)2

Kari and Thierrin note that ifC sube (Σlowast)2 is finite then the regular and context-free languages are closed underlarrminus

C As TC is regular for all finiteC we notethat the closure of the regular languages underlarrminus

C is a consequence of the closureproperties of SST It is known that the CFLs are closed underlarrminus

C [58] Howeverin general the CFLs are not closed underT This leads to the following openproblem

Open Problem 62Find necessary and sufficient (language-theoretic) conditionson a set of trajectories Tsube Γlowast

Σsuch that the CFLs overΣ are closed under T

We note that [x y]-contextual deletion and [x y]-contextual bi-polar deletion[58] can be simulated by SDT [19]

Recall thatsynchronized shuffle (see eg Latteux and Roos [66]) is defined asfollows LetΣ1Σ2 be alphabets not necessarily disjoint Letρi (Σ1 cup Σ2) rarr Σi

be the projection ontoΣi given byρi(a) = a for all a isin Σi andρi(a) = ε for alla isin (Σ1 cup Σ2) minus Σi

16

Let Li sube Σi for i = 12 Then the synchronized shuffle of L1 andL2 denotedL1 L2 is given by

L1 L2 = ρminus11 (L1) cap ρ

minus12 (L2)

Lemma 63 LetΣ1Σ2 be alphabets Let Tsube ΓlowastΣ1cupΣ2

be given by

T = ((⋃

aisinΣ1capΣ2

aσ) + (

⋃aisinΣ1minusΣ2

a0) + (

⋃aisinΣ2minusΣ1

a1))lowast

Then for all L1 L2 such that Li sube Σi for i = 12 L1 L2 = L1 T L2

Further examples of operations simulated by SST are described by the author[19]

We note that many of the results on language equations hold for SST In par-ticular the following are new results on bio-operations which have not been notedbefore [19]

Corollary 64 Let R be a regular language Then it is decidable whether thereexist languages X1X2 such that R= X1 oplus X2

Corollary 65 Let C sube (Σlowast)2 be a finite set of contexts Let R be a regularlanguage Then it is decidable whether there exist languages X1X2 such thatR= X1

larrC X2

7 Descriptional Complexity

Descriptional complexity of formal languages deals with the problems of concisedescriptions of languages in terms of generative or accepting devices For in-stance the(deterministic) state complexityof a regular languageL is the minimalnumber of states in any deterministic finite automaton acceptingL [94] Nonde-terministic state complexity of a regular language is similarly defined (see egHolzer and Kutrib [36])

For shuffle on trajectories Mateescuet al [79] and Harjuet al [34] both giveproofs that given a regular set of trajectoriesT and regular languagesL1 L2 theoperationL1 T L2 always yields a regular language Thus it is reasonable toconsider the state complexity of shuffle on trajectories The author and K Salo-maa [24] have obtained results in this area We state an upper bound in terms ofnondeterministic state complexity

Lemma 71 Let L1 L2 be regular languages overΣlowast and Tsube 01lowast be a regularset of trajectories Then

sc(L1 T L2) le 2nsc(L1)nsc(L2)nsc(T)

17

The following open problem appears to be very challenging

Open Problem 72 For what regular sets of trajectories Tsube 01lowast does theconstruction given by Lemma 71 give a construction which is best possible

Consider unrestricted shuffle given by the set of trajectoriesT = (0 + 1)lowastThe bound of Lemma 71 in this case is 2nsc(L1)nsc(L2) Cacircmpeanuet al [8] haveshown that there exist languagesL1 andL2 accepted by incomplete DFAs havingrespectivelyn andm states such that any incomplete DFA acceptingL1 L2 hasat least 2nm minus 1 states This bound is optimal for incomplete DFAs howeverfor complete DFAs it gives only the lower bound 2(sc(L1)minus1)(sc(L2)minus1) However weregard this as near enough to our goal of Lemma 71 for our purposes ie weregardT = (0 + 1)lowast as an example of a set of trajectoriesT satisfying OpenProblem 72

The density functionof a languageL sube Σlowast is defined bypL N rarr N aspL(n) = |L cap Σn| for all n ge 0 That is pL(n) gives the number of words oflengthn in L By the density of a languageL we informally mean the asymptoticbehaviour ofpL The following important result of Szilardet al [91 Thm 3]characterizes the density of regular languages

Theorem 73 A regular language R overΣ satisfies pR(n) isin O(nk) k ge 0 ifand only if R can be represented as a finite union of regular expressions of thefollowing form xylowast1z1 middot middot middot ylowastt zt where x y1 z1 middot middot middot yt zt isin Σ

lowast and0 le t le k+ 1

Call a languageL slenderif pL(n) isin O(1) [83] Let R be a regular languagewhich has polynomial densityO(nk) and lett be the smallest integer such thatR = cupt

i=1xiylowasti1zi1 middot middot middot ylowastikiziki 0 le ki le k + 1 i = 1 t Then callt the UkL-

indexof L If k = 0 we callt theUSL-indexof L (languages with USL indextare calledt-thin by Paun and Salomaa [83] slender regular languages were alsocharacterized independently by Shallit [87 Lemma 3 p 336])

Lemma 74 Let T = uvlowast where u v isin 01lowast Let Li be regular languages overΣwith sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le |uv|n1n2 (2)

We now give a bound for sets of trajectoriesT = uvlowastw with w ε

Lemma 75 Let T = uvlowastw where u vw isin 01lowast and w ε Let Li be regularlanguages overΣ with sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le n1n2

|u| + 1+ |v|(n1n2)

lceil|w||v|

rceil+1minus n1n2

n1n2 minus 1

(3)

18

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

We havex ε y = empty if x y ε Finally we setε t ε = ε if t = ε andemptyotherwise We extendt to sets of routes and languages as expected

x T y =⋃tisinT

x t y forallT sube 0011lowast x y isin Σlowast

L1 T L2 =⋃

xisinL1yisinL2

x T y

For example ifx = abc y = cbcandT = 010011010011 thenx T y =acbcbcabbc

It turns out that splicing on routes can be simulated by a combination of shuffleon trajectories and deletion along trajectories [22]

Theorem 51 There exist weak codingsπ1 π2 0101lowast rarr idlowast and a weakcodingπ3 0101lowast rarr 01lowast such that for all t isin 0011lowast and for allx y isin Σlowast we have

x t y = (xπ1(t) Σlowast) π3(t) (yπ2(t) Σ

lowast)

Corollary 52 There exist weak codingsπ1 π2 0101lowast rarr idlowast and π3 0101lowast rarr 01lowast such that for all Tsube 0011lowast and L1 L2 sube Σ

lowast

L1 T L2 =⋃tisinT

(L1π1(t) Σlowast) π3(t) (L2π2(t) Σ

lowast)

Unfortunately the identity

L1 T L2 = (L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast)

does not hold in general even ifL1 L2 are singletons and|T | = 2 For example ifL1 = ab L2 = cd andT = 00110011 then

L1 T L2 = bcad

(L1π1(T) Σlowast) π3(T) (L2π2(T) Σ

lowast) = acadbcbd

However ifT is aunaryset of routes by which we mean thatT sube 00lowast1lowast

then we have the following result [22]

Corollary 53 Let T sube 00lowast1lowast Then for all Lsube Σlowast

L T Σlowast = Lπ1(T) Σ

lowast

12

We refer the reader to Mateescu [77] for a discussion of unary operationsdefined by splicing on routes As an example consider that withT = 0n0

n n ge

01lowast L T Σ

lowast = 12(L) where1

2(L) was given in Section 31Bel-Enguixet al [3] have used splicing on routes to model dialogue in natural

language Adialogueis the result of two or more actors who alternately com-municate The alternating nature of trajectory-based operations makes it naturalto apply them to modeling dialogue Routes and in particular the ability to notinsert portions of certain actorrsquos input is important for selecting different piecesof information for different dialogues ldquoDepending on the conversational goal andon the behaviour of the agents different acts will be selected rdquo [3] Bel-Enguixetal [3] note three important classes of routes used in dialogue

(a) Imbrication in a dialogue is the result of applying a route from the set ofroutes defined by the regular expression ((01)+ (01))+ Imbrication repre-sents the ldquoperfect conversationrdquo where actors strictly alternate

(b) Concatenationin a dialogue is the result of applying a route from the setof route defined by the regular expression (0+ 0)+(1 + 1)+ According toBel-Enguix concatenation has applications to more formal dialogues suchas debates and round tables [3]

(c) Merging is the result of applying a route which does not fall into the cat-egories of imbrication or concatenation Merging is more common thanimbrication and concatenation in informal dialogues

Bel-Enguixet al [3] mention several open areas of research relating to theuse of splicing on routes for modelling dialogue In particular they ask aboutestablishing a mechanism in order to select routes depending on context Thiswould establish a more semantic operation in the terminology of Section 6 below

6 Semantic Shuffle on Trajectories

In the paper which introduced shuffle on trajectories Mateescuet almake a dis-tinction betweensyntacticandsemanticoperations on words

[Shuffle on trajectories is] based on syntactic constraints on the shuffleoperations The constraints are referred to as syntactic constraintssince they do not concern properties of the words that are shuffled orproperties of the letters that occur in these words

Instead the constraints involve the general strategy to switch fromone word to another word Once such a strategy is defined the struc-ture of the words that are shuffled does not play any role

13

However constraints that take into consideration the inner structureof the words that are shuffled together are referred to as semantic con-straints [79 p 2]

The author [19] has introduced a semantic variant of shuffle on trajectoriesnaturally calledsemantic shuffle on trajectories(SST) The corresponding no-tion for deletion on trajectories is calledsemantic deletion on trajectories(SDT)The advantages of SST and SDT are that they preserves many of the desirableproperties of the usual syntactic shuffle on trajectories while being capable ofsimulating more operations of interest

The two semantic constructs we introduce aresynchronizationand contentrestriction Synchronization allows for only one letter to be output for two cor-responding identical symbols in the input words Content restriction allows atrajectory to specify that a particular letter must appear at a specific point This isinspired by bio-informatical operations where operations occur only in the con-text of certain subsequences of the DNA strand

Before we define SST we define the trajectory alphabet LetΓ = 01 σ Forany alphabetΣ letΓΣ = Γ cup (Γ times Σ) For ease of readability we denote [ca] by

ac

for all a isin Σ andc isin ΓWe can now define the SST operation LetΣ be an alphabett isin Γlowast

Σand

x y isin Σlowast Then the SST ofx andy alongt denotedx t y is defined as followsIf x = axprime y = byprime (whereab isin Σ xprime yprime isin Σlowast) andt = ctprime wherec isin ΓΣ andtprime isin Γlowast

Σ then

x t y =

a(xprime tprime y) if c isin 0

a0

b(x tprime yprime) if c isin 1b1

a(xprime tprime yprime) if a = b andc isin σaσ

empty otherwise

If x = axprime y = ε andt = ctprime then

x t ε =

a(xprime tprime ε) if c isin 0a0

empty otherwise

If x = ε y = byprime andt = ctprime then

ε t y =

b(ε tprime yprime) if c isin 1b1

empty otherwise

If x = y = ε thenx t y = ε if t = ε andempty otherwise Finally ifx y ε thenx ε y = empty If x y isin Σlowast andT sube Γlowast

Σ thenx T y = cuptisinT x T y If L1 L2 sube Σ

lowast and

14

T sube ΓlowastΣ thenL1 T L2 = cupxisinL1yisinL2 x T y The associated deletion operation over

the alphabet∆Σ = ∆ cup (∆ times Σ) (where∆ = id σ) is defined in the natural way[19]

We now consider givenΣ and two sets of trajectoriesT1T2 sube ΓlowastΣ whether

the operations T1 T2 coincide that is whetherL1 T1 L2 = L1 T2 L2 for alllanguagesL1 L2 sube Σ

lowast If T1 T2 represent in this sense the same operation wesay thatT1T2 areequivalentsets of trajectories

We note that it is possible for two distinct sets of trajectoriesT1T2 sube ΓlowastΣ

to be

equivalent As a simple example considerT1 = a0

a1 andT2 =

a1

a0 Note that for

i = 12

L1 Ti L2 =

aa if L1 cap L2 supe aempty otherwise

ThusT1T2 are equivalent but not equalBy using a special case of partial commutation and trace languages (see Diek-

ert and Meacutetivier [15]) we can show that two sets of trajectoriesT1T2 sube ΓlowastΣ

areequivalent if and only if their corresponding trace languages (under the naturalmorphism) are equivalent This implies the following important decidability re-sult

Theorem 61 Let Σ be an alphabet and T1T2 sube ΓlowastΣ If T1T2 are regular it is

decidable whether T1 and T2 are equivalent

We can now consider some examples of the power of SST and SDT We firstnote that SST consists of a valid extension of the shuffle on trajectories ifT sube01lowast thenL1 T L2 = L1 T L2 the syntactic shuffle on trajectories operation[79] We also note that ifT = σlowast then T = cap

Given the very natural set of trajectoriesT = (0 + 1 + σ)lowast T denotes theinfiltration product uarr see eg Pin and Sakarovitch [81] The infiltration productis defined as follows ifx = x1 xn is a word of lengthn andI = (i1 i2 ir) isa subsequence of (12 n) let xI = xi1 xi2 middot middot middot xir Then givenx y isin Σlowast

x uarr y = z isin Σlowast existI J sube [|z|] such thatI cup J = [|z|] zI = x andzJ = y

For exampleabuarr ba= abababbaabbabaabbaababWe now show how to use SST and SDT to simulate ciliate bio-operations

which have been the subject of recent research in the literature A model of cil-iate bio-operations without circular variants were introduced by Daleyet al [11]to mimic the manner in which DNA is unscrambled in the DNA of certain uni-cellular ciliates in the process of asexual reproduction Ciliate bio-operations arealso investigated by Ehrenfeuchtet al [28] Prescottet al [82] and Daley andMcQuillan [12] using various approaches

15

Daley et al [11] define several language operations which simulate ciliatebio-operations including synchronized insertion deletion and bi-polar deletionSynchronized insertion can be given as follows

α oplus β = uavaw a isin Σ α = uaw β = va

The operation is extended to languages as usual LetT =⋃

aisinΣ 0lowasta0 1lowast

a1 0lowast Then

for all L1 L2 sube Σlowast L1 oplus L2 = L1 T L2 The operations of synchronized deletion

and synchronized bi-polar deletion (also defined by Daleyet al [11]) can also besimulated by SDT

Contextual insertion and deletion were introduced by Kari and Thierrin as asimple set of operations which are capable for modelling DNA computing [58]Let Σ be an alphabet and [x y] isin (Σlowast)2 We call [x y] a context Then givenvu isin Σlowast the [x y]-contextual insertionof v into u is given byu larrminus[xy] v = u1xvyu2 u = u1xyu2u1u2 isin Σ

lowast Let C sube (Σlowast)2 Then

u larrminusC v =⋃

[xy]isinC

u larrminus[xy] v

The operationlarrminusC is extended to languages monotonically as expected Letx =x1 middot middot middot xn andy = y1 middot middot middot ym be arbitrary words overΣ Then define

T[xy] = 0lowastnprod

i=1

xi

0 1lowastmprod

i=1

yi

0 0lowast

We naturally extend this toTC = cup[xy]isinCT[xy] for all C sube (Σlowast)2 Under this defini-tion it is clear that TC =

larrminusC for all C sube (Σlowast)2

Kari and Thierrin note that ifC sube (Σlowast)2 is finite then the regular and context-free languages are closed underlarrminus

C As TC is regular for all finiteC we notethat the closure of the regular languages underlarrminus

C is a consequence of the closureproperties of SST It is known that the CFLs are closed underlarrminus

C [58] Howeverin general the CFLs are not closed underT This leads to the following openproblem

Open Problem 62Find necessary and sufficient (language-theoretic) conditionson a set of trajectories Tsube Γlowast

Σsuch that the CFLs overΣ are closed under T

We note that [x y]-contextual deletion and [x y]-contextual bi-polar deletion[58] can be simulated by SDT [19]

Recall thatsynchronized shuffle (see eg Latteux and Roos [66]) is defined asfollows LetΣ1Σ2 be alphabets not necessarily disjoint Letρi (Σ1 cup Σ2) rarr Σi

be the projection ontoΣi given byρi(a) = a for all a isin Σi andρi(a) = ε for alla isin (Σ1 cup Σ2) minus Σi

16

Let Li sube Σi for i = 12 Then the synchronized shuffle of L1 andL2 denotedL1 L2 is given by

L1 L2 = ρminus11 (L1) cap ρ

minus12 (L2)

Lemma 63 LetΣ1Σ2 be alphabets Let Tsube ΓlowastΣ1cupΣ2

be given by

T = ((⋃

aisinΣ1capΣ2

aσ) + (

⋃aisinΣ1minusΣ2

a0) + (

⋃aisinΣ2minusΣ1

a1))lowast

Then for all L1 L2 such that Li sube Σi for i = 12 L1 L2 = L1 T L2

Further examples of operations simulated by SST are described by the author[19]

We note that many of the results on language equations hold for SST In par-ticular the following are new results on bio-operations which have not been notedbefore [19]

Corollary 64 Let R be a regular language Then it is decidable whether thereexist languages X1X2 such that R= X1 oplus X2

Corollary 65 Let C sube (Σlowast)2 be a finite set of contexts Let R be a regularlanguage Then it is decidable whether there exist languages X1X2 such thatR= X1

larrC X2

7 Descriptional Complexity

Descriptional complexity of formal languages deals with the problems of concisedescriptions of languages in terms of generative or accepting devices For in-stance the(deterministic) state complexityof a regular languageL is the minimalnumber of states in any deterministic finite automaton acceptingL [94] Nonde-terministic state complexity of a regular language is similarly defined (see egHolzer and Kutrib [36])

For shuffle on trajectories Mateescuet al [79] and Harjuet al [34] both giveproofs that given a regular set of trajectoriesT and regular languagesL1 L2 theoperationL1 T L2 always yields a regular language Thus it is reasonable toconsider the state complexity of shuffle on trajectories The author and K Salo-maa [24] have obtained results in this area We state an upper bound in terms ofnondeterministic state complexity

Lemma 71 Let L1 L2 be regular languages overΣlowast and Tsube 01lowast be a regularset of trajectories Then

sc(L1 T L2) le 2nsc(L1)nsc(L2)nsc(T)

17

The following open problem appears to be very challenging

Open Problem 72 For what regular sets of trajectories Tsube 01lowast does theconstruction given by Lemma 71 give a construction which is best possible

Consider unrestricted shuffle given by the set of trajectoriesT = (0 + 1)lowastThe bound of Lemma 71 in this case is 2nsc(L1)nsc(L2) Cacircmpeanuet al [8] haveshown that there exist languagesL1 andL2 accepted by incomplete DFAs havingrespectivelyn andm states such that any incomplete DFA acceptingL1 L2 hasat least 2nm minus 1 states This bound is optimal for incomplete DFAs howeverfor complete DFAs it gives only the lower bound 2(sc(L1)minus1)(sc(L2)minus1) However weregard this as near enough to our goal of Lemma 71 for our purposes ie weregardT = (0 + 1)lowast as an example of a set of trajectoriesT satisfying OpenProblem 72

The density functionof a languageL sube Σlowast is defined bypL N rarr N aspL(n) = |L cap Σn| for all n ge 0 That is pL(n) gives the number of words oflengthn in L By the density of a languageL we informally mean the asymptoticbehaviour ofpL The following important result of Szilardet al [91 Thm 3]characterizes the density of regular languages

Theorem 73 A regular language R overΣ satisfies pR(n) isin O(nk) k ge 0 ifand only if R can be represented as a finite union of regular expressions of thefollowing form xylowast1z1 middot middot middot ylowastt zt where x y1 z1 middot middot middot yt zt isin Σ

lowast and0 le t le k+ 1

Call a languageL slenderif pL(n) isin O(1) [83] Let R be a regular languagewhich has polynomial densityO(nk) and lett be the smallest integer such thatR = cupt

i=1xiylowasti1zi1 middot middot middot ylowastikiziki 0 le ki le k + 1 i = 1 t Then callt the UkL-

indexof L If k = 0 we callt theUSL-indexof L (languages with USL indextare calledt-thin by Paun and Salomaa [83] slender regular languages were alsocharacterized independently by Shallit [87 Lemma 3 p 336])

Lemma 74 Let T = uvlowast where u v isin 01lowast Let Li be regular languages overΣwith sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le |uv|n1n2 (2)

We now give a bound for sets of trajectoriesT = uvlowastw with w ε

Lemma 75 Let T = uvlowastw where u vw isin 01lowast and w ε Let Li be regularlanguages overΣ with sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le n1n2

|u| + 1+ |v|(n1n2)

lceil|w||v|

rceil+1minus n1n2

n1n2 minus 1

(3)

18

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

We refer the reader to Mateescu [77] for a discussion of unary operationsdefined by splicing on routes As an example consider that withT = 0n0

n n ge

01lowast L T Σ

lowast = 12(L) where1

2(L) was given in Section 31Bel-Enguixet al [3] have used splicing on routes to model dialogue in natural

language Adialogueis the result of two or more actors who alternately com-municate The alternating nature of trajectory-based operations makes it naturalto apply them to modeling dialogue Routes and in particular the ability to notinsert portions of certain actorrsquos input is important for selecting different piecesof information for different dialogues ldquoDepending on the conversational goal andon the behaviour of the agents different acts will be selected rdquo [3] Bel-Enguixetal [3] note three important classes of routes used in dialogue

(a) Imbrication in a dialogue is the result of applying a route from the set ofroutes defined by the regular expression ((01)+ (01))+ Imbrication repre-sents the ldquoperfect conversationrdquo where actors strictly alternate

(b) Concatenationin a dialogue is the result of applying a route from the setof route defined by the regular expression (0+ 0)+(1 + 1)+ According toBel-Enguix concatenation has applications to more formal dialogues suchas debates and round tables [3]

(c) Merging is the result of applying a route which does not fall into the cat-egories of imbrication or concatenation Merging is more common thanimbrication and concatenation in informal dialogues

Bel-Enguixet al [3] mention several open areas of research relating to theuse of splicing on routes for modelling dialogue In particular they ask aboutestablishing a mechanism in order to select routes depending on context Thiswould establish a more semantic operation in the terminology of Section 6 below

6 Semantic Shuffle on Trajectories

In the paper which introduced shuffle on trajectories Mateescuet almake a dis-tinction betweensyntacticandsemanticoperations on words

[Shuffle on trajectories is] based on syntactic constraints on the shuffleoperations The constraints are referred to as syntactic constraintssince they do not concern properties of the words that are shuffled orproperties of the letters that occur in these words

Instead the constraints involve the general strategy to switch fromone word to another word Once such a strategy is defined the struc-ture of the words that are shuffled does not play any role

13

However constraints that take into consideration the inner structureof the words that are shuffled together are referred to as semantic con-straints [79 p 2]

The author [19] has introduced a semantic variant of shuffle on trajectoriesnaturally calledsemantic shuffle on trajectories(SST) The corresponding no-tion for deletion on trajectories is calledsemantic deletion on trajectories(SDT)The advantages of SST and SDT are that they preserves many of the desirableproperties of the usual syntactic shuffle on trajectories while being capable ofsimulating more operations of interest

The two semantic constructs we introduce aresynchronizationand contentrestriction Synchronization allows for only one letter to be output for two cor-responding identical symbols in the input words Content restriction allows atrajectory to specify that a particular letter must appear at a specific point This isinspired by bio-informatical operations where operations occur only in the con-text of certain subsequences of the DNA strand

Before we define SST we define the trajectory alphabet LetΓ = 01 σ Forany alphabetΣ letΓΣ = Γ cup (Γ times Σ) For ease of readability we denote [ca] by

ac

for all a isin Σ andc isin ΓWe can now define the SST operation LetΣ be an alphabett isin Γlowast

Σand

x y isin Σlowast Then the SST ofx andy alongt denotedx t y is defined as followsIf x = axprime y = byprime (whereab isin Σ xprime yprime isin Σlowast) andt = ctprime wherec isin ΓΣ andtprime isin Γlowast

Σ then

x t y =

a(xprime tprime y) if c isin 0

a0

b(x tprime yprime) if c isin 1b1

a(xprime tprime yprime) if a = b andc isin σaσ

empty otherwise

If x = axprime y = ε andt = ctprime then

x t ε =

a(xprime tprime ε) if c isin 0a0

empty otherwise

If x = ε y = byprime andt = ctprime then

ε t y =

b(ε tprime yprime) if c isin 1b1

empty otherwise

If x = y = ε thenx t y = ε if t = ε andempty otherwise Finally ifx y ε thenx ε y = empty If x y isin Σlowast andT sube Γlowast

Σ thenx T y = cuptisinT x T y If L1 L2 sube Σ

lowast and

14

T sube ΓlowastΣ thenL1 T L2 = cupxisinL1yisinL2 x T y The associated deletion operation over

the alphabet∆Σ = ∆ cup (∆ times Σ) (where∆ = id σ) is defined in the natural way[19]

We now consider givenΣ and two sets of trajectoriesT1T2 sube ΓlowastΣ whether

the operations T1 T2 coincide that is whetherL1 T1 L2 = L1 T2 L2 for alllanguagesL1 L2 sube Σ

lowast If T1 T2 represent in this sense the same operation wesay thatT1T2 areequivalentsets of trajectories

We note that it is possible for two distinct sets of trajectoriesT1T2 sube ΓlowastΣ

to be

equivalent As a simple example considerT1 = a0

a1 andT2 =

a1

a0 Note that for

i = 12

L1 Ti L2 =

aa if L1 cap L2 supe aempty otherwise

ThusT1T2 are equivalent but not equalBy using a special case of partial commutation and trace languages (see Diek-

ert and Meacutetivier [15]) we can show that two sets of trajectoriesT1T2 sube ΓlowastΣ

areequivalent if and only if their corresponding trace languages (under the naturalmorphism) are equivalent This implies the following important decidability re-sult

Theorem 61 Let Σ be an alphabet and T1T2 sube ΓlowastΣ If T1T2 are regular it is

decidable whether T1 and T2 are equivalent

We can now consider some examples of the power of SST and SDT We firstnote that SST consists of a valid extension of the shuffle on trajectories ifT sube01lowast thenL1 T L2 = L1 T L2 the syntactic shuffle on trajectories operation[79] We also note that ifT = σlowast then T = cap

Given the very natural set of trajectoriesT = (0 + 1 + σ)lowast T denotes theinfiltration product uarr see eg Pin and Sakarovitch [81] The infiltration productis defined as follows ifx = x1 xn is a word of lengthn andI = (i1 i2 ir) isa subsequence of (12 n) let xI = xi1 xi2 middot middot middot xir Then givenx y isin Σlowast

x uarr y = z isin Σlowast existI J sube [|z|] such thatI cup J = [|z|] zI = x andzJ = y

For exampleabuarr ba= abababbaabbabaabbaababWe now show how to use SST and SDT to simulate ciliate bio-operations

which have been the subject of recent research in the literature A model of cil-iate bio-operations without circular variants were introduced by Daleyet al [11]to mimic the manner in which DNA is unscrambled in the DNA of certain uni-cellular ciliates in the process of asexual reproduction Ciliate bio-operations arealso investigated by Ehrenfeuchtet al [28] Prescottet al [82] and Daley andMcQuillan [12] using various approaches

15

Daley et al [11] define several language operations which simulate ciliatebio-operations including synchronized insertion deletion and bi-polar deletionSynchronized insertion can be given as follows

α oplus β = uavaw a isin Σ α = uaw β = va

The operation is extended to languages as usual LetT =⋃

aisinΣ 0lowasta0 1lowast

a1 0lowast Then

for all L1 L2 sube Σlowast L1 oplus L2 = L1 T L2 The operations of synchronized deletion

and synchronized bi-polar deletion (also defined by Daleyet al [11]) can also besimulated by SDT

Contextual insertion and deletion were introduced by Kari and Thierrin as asimple set of operations which are capable for modelling DNA computing [58]Let Σ be an alphabet and [x y] isin (Σlowast)2 We call [x y] a context Then givenvu isin Σlowast the [x y]-contextual insertionof v into u is given byu larrminus[xy] v = u1xvyu2 u = u1xyu2u1u2 isin Σ

lowast Let C sube (Σlowast)2 Then

u larrminusC v =⋃

[xy]isinC

u larrminus[xy] v

The operationlarrminusC is extended to languages monotonically as expected Letx =x1 middot middot middot xn andy = y1 middot middot middot ym be arbitrary words overΣ Then define

T[xy] = 0lowastnprod

i=1

xi

0 1lowastmprod

i=1

yi

0 0lowast

We naturally extend this toTC = cup[xy]isinCT[xy] for all C sube (Σlowast)2 Under this defini-tion it is clear that TC =

larrminusC for all C sube (Σlowast)2

Kari and Thierrin note that ifC sube (Σlowast)2 is finite then the regular and context-free languages are closed underlarrminus

C As TC is regular for all finiteC we notethat the closure of the regular languages underlarrminus

C is a consequence of the closureproperties of SST It is known that the CFLs are closed underlarrminus

C [58] Howeverin general the CFLs are not closed underT This leads to the following openproblem

Open Problem 62Find necessary and sufficient (language-theoretic) conditionson a set of trajectories Tsube Γlowast

Σsuch that the CFLs overΣ are closed under T

We note that [x y]-contextual deletion and [x y]-contextual bi-polar deletion[58] can be simulated by SDT [19]

Recall thatsynchronized shuffle (see eg Latteux and Roos [66]) is defined asfollows LetΣ1Σ2 be alphabets not necessarily disjoint Letρi (Σ1 cup Σ2) rarr Σi

be the projection ontoΣi given byρi(a) = a for all a isin Σi andρi(a) = ε for alla isin (Σ1 cup Σ2) minus Σi

16

Let Li sube Σi for i = 12 Then the synchronized shuffle of L1 andL2 denotedL1 L2 is given by

L1 L2 = ρminus11 (L1) cap ρ

minus12 (L2)

Lemma 63 LetΣ1Σ2 be alphabets Let Tsube ΓlowastΣ1cupΣ2

be given by

T = ((⋃

aisinΣ1capΣ2

aσ) + (

⋃aisinΣ1minusΣ2

a0) + (

⋃aisinΣ2minusΣ1

a1))lowast

Then for all L1 L2 such that Li sube Σi for i = 12 L1 L2 = L1 T L2

Further examples of operations simulated by SST are described by the author[19]

We note that many of the results on language equations hold for SST In par-ticular the following are new results on bio-operations which have not been notedbefore [19]

Corollary 64 Let R be a regular language Then it is decidable whether thereexist languages X1X2 such that R= X1 oplus X2

Corollary 65 Let C sube (Σlowast)2 be a finite set of contexts Let R be a regularlanguage Then it is decidable whether there exist languages X1X2 such thatR= X1

larrC X2

7 Descriptional Complexity

Descriptional complexity of formal languages deals with the problems of concisedescriptions of languages in terms of generative or accepting devices For in-stance the(deterministic) state complexityof a regular languageL is the minimalnumber of states in any deterministic finite automaton acceptingL [94] Nonde-terministic state complexity of a regular language is similarly defined (see egHolzer and Kutrib [36])

For shuffle on trajectories Mateescuet al [79] and Harjuet al [34] both giveproofs that given a regular set of trajectoriesT and regular languagesL1 L2 theoperationL1 T L2 always yields a regular language Thus it is reasonable toconsider the state complexity of shuffle on trajectories The author and K Salo-maa [24] have obtained results in this area We state an upper bound in terms ofnondeterministic state complexity

Lemma 71 Let L1 L2 be regular languages overΣlowast and Tsube 01lowast be a regularset of trajectories Then

sc(L1 T L2) le 2nsc(L1)nsc(L2)nsc(T)

17

The following open problem appears to be very challenging

Open Problem 72 For what regular sets of trajectories Tsube 01lowast does theconstruction given by Lemma 71 give a construction which is best possible

Consider unrestricted shuffle given by the set of trajectoriesT = (0 + 1)lowastThe bound of Lemma 71 in this case is 2nsc(L1)nsc(L2) Cacircmpeanuet al [8] haveshown that there exist languagesL1 andL2 accepted by incomplete DFAs havingrespectivelyn andm states such that any incomplete DFA acceptingL1 L2 hasat least 2nm minus 1 states This bound is optimal for incomplete DFAs howeverfor complete DFAs it gives only the lower bound 2(sc(L1)minus1)(sc(L2)minus1) However weregard this as near enough to our goal of Lemma 71 for our purposes ie weregardT = (0 + 1)lowast as an example of a set of trajectoriesT satisfying OpenProblem 72

The density functionof a languageL sube Σlowast is defined bypL N rarr N aspL(n) = |L cap Σn| for all n ge 0 That is pL(n) gives the number of words oflengthn in L By the density of a languageL we informally mean the asymptoticbehaviour ofpL The following important result of Szilardet al [91 Thm 3]characterizes the density of regular languages

Theorem 73 A regular language R overΣ satisfies pR(n) isin O(nk) k ge 0 ifand only if R can be represented as a finite union of regular expressions of thefollowing form xylowast1z1 middot middot middot ylowastt zt where x y1 z1 middot middot middot yt zt isin Σ

lowast and0 le t le k+ 1

Call a languageL slenderif pL(n) isin O(1) [83] Let R be a regular languagewhich has polynomial densityO(nk) and lett be the smallest integer such thatR = cupt

i=1xiylowasti1zi1 middot middot middot ylowastikiziki 0 le ki le k + 1 i = 1 t Then callt the UkL-

indexof L If k = 0 we callt theUSL-indexof L (languages with USL indextare calledt-thin by Paun and Salomaa [83] slender regular languages were alsocharacterized independently by Shallit [87 Lemma 3 p 336])

Lemma 74 Let T = uvlowast where u v isin 01lowast Let Li be regular languages overΣwith sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le |uv|n1n2 (2)

We now give a bound for sets of trajectoriesT = uvlowastw with w ε

Lemma 75 Let T = uvlowastw where u vw isin 01lowast and w ε Let Li be regularlanguages overΣ with sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le n1n2

|u| + 1+ |v|(n1n2)

lceil|w||v|

rceil+1minus n1n2

n1n2 minus 1

(3)

18

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

However constraints that take into consideration the inner structureof the words that are shuffled together are referred to as semantic con-straints [79 p 2]

The author [19] has introduced a semantic variant of shuffle on trajectoriesnaturally calledsemantic shuffle on trajectories(SST) The corresponding no-tion for deletion on trajectories is calledsemantic deletion on trajectories(SDT)The advantages of SST and SDT are that they preserves many of the desirableproperties of the usual syntactic shuffle on trajectories while being capable ofsimulating more operations of interest

The two semantic constructs we introduce aresynchronizationand contentrestriction Synchronization allows for only one letter to be output for two cor-responding identical symbols in the input words Content restriction allows atrajectory to specify that a particular letter must appear at a specific point This isinspired by bio-informatical operations where operations occur only in the con-text of certain subsequences of the DNA strand

Before we define SST we define the trajectory alphabet LetΓ = 01 σ Forany alphabetΣ letΓΣ = Γ cup (Γ times Σ) For ease of readability we denote [ca] by

ac

for all a isin Σ andc isin ΓWe can now define the SST operation LetΣ be an alphabett isin Γlowast

Σand

x y isin Σlowast Then the SST ofx andy alongt denotedx t y is defined as followsIf x = axprime y = byprime (whereab isin Σ xprime yprime isin Σlowast) andt = ctprime wherec isin ΓΣ andtprime isin Γlowast

Σ then

x t y =

a(xprime tprime y) if c isin 0

a0

b(x tprime yprime) if c isin 1b1

a(xprime tprime yprime) if a = b andc isin σaσ

empty otherwise

If x = axprime y = ε andt = ctprime then

x t ε =

a(xprime tprime ε) if c isin 0a0

empty otherwise

If x = ε y = byprime andt = ctprime then

ε t y =

b(ε tprime yprime) if c isin 1b1

empty otherwise

If x = y = ε thenx t y = ε if t = ε andempty otherwise Finally ifx y ε thenx ε y = empty If x y isin Σlowast andT sube Γlowast

Σ thenx T y = cuptisinT x T y If L1 L2 sube Σ

lowast and

14

T sube ΓlowastΣ thenL1 T L2 = cupxisinL1yisinL2 x T y The associated deletion operation over

the alphabet∆Σ = ∆ cup (∆ times Σ) (where∆ = id σ) is defined in the natural way[19]

We now consider givenΣ and two sets of trajectoriesT1T2 sube ΓlowastΣ whether

the operations T1 T2 coincide that is whetherL1 T1 L2 = L1 T2 L2 for alllanguagesL1 L2 sube Σ

lowast If T1 T2 represent in this sense the same operation wesay thatT1T2 areequivalentsets of trajectories

We note that it is possible for two distinct sets of trajectoriesT1T2 sube ΓlowastΣ

to be

equivalent As a simple example considerT1 = a0

a1 andT2 =

a1

a0 Note that for

i = 12

L1 Ti L2 =

aa if L1 cap L2 supe aempty otherwise

ThusT1T2 are equivalent but not equalBy using a special case of partial commutation and trace languages (see Diek-

ert and Meacutetivier [15]) we can show that two sets of trajectoriesT1T2 sube ΓlowastΣ

areequivalent if and only if their corresponding trace languages (under the naturalmorphism) are equivalent This implies the following important decidability re-sult

Theorem 61 Let Σ be an alphabet and T1T2 sube ΓlowastΣ If T1T2 are regular it is

decidable whether T1 and T2 are equivalent

We can now consider some examples of the power of SST and SDT We firstnote that SST consists of a valid extension of the shuffle on trajectories ifT sube01lowast thenL1 T L2 = L1 T L2 the syntactic shuffle on trajectories operation[79] We also note that ifT = σlowast then T = cap

Given the very natural set of trajectoriesT = (0 + 1 + σ)lowast T denotes theinfiltration product uarr see eg Pin and Sakarovitch [81] The infiltration productis defined as follows ifx = x1 xn is a word of lengthn andI = (i1 i2 ir) isa subsequence of (12 n) let xI = xi1 xi2 middot middot middot xir Then givenx y isin Σlowast

x uarr y = z isin Σlowast existI J sube [|z|] such thatI cup J = [|z|] zI = x andzJ = y

For exampleabuarr ba= abababbaabbabaabbaababWe now show how to use SST and SDT to simulate ciliate bio-operations

which have been the subject of recent research in the literature A model of cil-iate bio-operations without circular variants were introduced by Daleyet al [11]to mimic the manner in which DNA is unscrambled in the DNA of certain uni-cellular ciliates in the process of asexual reproduction Ciliate bio-operations arealso investigated by Ehrenfeuchtet al [28] Prescottet al [82] and Daley andMcQuillan [12] using various approaches

15

Daley et al [11] define several language operations which simulate ciliatebio-operations including synchronized insertion deletion and bi-polar deletionSynchronized insertion can be given as follows

α oplus β = uavaw a isin Σ α = uaw β = va

The operation is extended to languages as usual LetT =⋃

aisinΣ 0lowasta0 1lowast

a1 0lowast Then

for all L1 L2 sube Σlowast L1 oplus L2 = L1 T L2 The operations of synchronized deletion

and synchronized bi-polar deletion (also defined by Daleyet al [11]) can also besimulated by SDT

Contextual insertion and deletion were introduced by Kari and Thierrin as asimple set of operations which are capable for modelling DNA computing [58]Let Σ be an alphabet and [x y] isin (Σlowast)2 We call [x y] a context Then givenvu isin Σlowast the [x y]-contextual insertionof v into u is given byu larrminus[xy] v = u1xvyu2 u = u1xyu2u1u2 isin Σ

lowast Let C sube (Σlowast)2 Then

u larrminusC v =⋃

[xy]isinC

u larrminus[xy] v

The operationlarrminusC is extended to languages monotonically as expected Letx =x1 middot middot middot xn andy = y1 middot middot middot ym be arbitrary words overΣ Then define

T[xy] = 0lowastnprod

i=1

xi

0 1lowastmprod

i=1

yi

0 0lowast

We naturally extend this toTC = cup[xy]isinCT[xy] for all C sube (Σlowast)2 Under this defini-tion it is clear that TC =

larrminusC for all C sube (Σlowast)2

Kari and Thierrin note that ifC sube (Σlowast)2 is finite then the regular and context-free languages are closed underlarrminus

C As TC is regular for all finiteC we notethat the closure of the regular languages underlarrminus

C is a consequence of the closureproperties of SST It is known that the CFLs are closed underlarrminus

C [58] Howeverin general the CFLs are not closed underT This leads to the following openproblem

Open Problem 62Find necessary and sufficient (language-theoretic) conditionson a set of trajectories Tsube Γlowast

Σsuch that the CFLs overΣ are closed under T

We note that [x y]-contextual deletion and [x y]-contextual bi-polar deletion[58] can be simulated by SDT [19]

Recall thatsynchronized shuffle (see eg Latteux and Roos [66]) is defined asfollows LetΣ1Σ2 be alphabets not necessarily disjoint Letρi (Σ1 cup Σ2) rarr Σi

be the projection ontoΣi given byρi(a) = a for all a isin Σi andρi(a) = ε for alla isin (Σ1 cup Σ2) minus Σi

16

Let Li sube Σi for i = 12 Then the synchronized shuffle of L1 andL2 denotedL1 L2 is given by

L1 L2 = ρminus11 (L1) cap ρ

minus12 (L2)

Lemma 63 LetΣ1Σ2 be alphabets Let Tsube ΓlowastΣ1cupΣ2

be given by

T = ((⋃

aisinΣ1capΣ2

aσ) + (

⋃aisinΣ1minusΣ2

a0) + (

⋃aisinΣ2minusΣ1

a1))lowast

Then for all L1 L2 such that Li sube Σi for i = 12 L1 L2 = L1 T L2

Further examples of operations simulated by SST are described by the author[19]

We note that many of the results on language equations hold for SST In par-ticular the following are new results on bio-operations which have not been notedbefore [19]

Corollary 64 Let R be a regular language Then it is decidable whether thereexist languages X1X2 such that R= X1 oplus X2

Corollary 65 Let C sube (Σlowast)2 be a finite set of contexts Let R be a regularlanguage Then it is decidable whether there exist languages X1X2 such thatR= X1

larrC X2

7 Descriptional Complexity

Descriptional complexity of formal languages deals with the problems of concisedescriptions of languages in terms of generative or accepting devices For in-stance the(deterministic) state complexityof a regular languageL is the minimalnumber of states in any deterministic finite automaton acceptingL [94] Nonde-terministic state complexity of a regular language is similarly defined (see egHolzer and Kutrib [36])

For shuffle on trajectories Mateescuet al [79] and Harjuet al [34] both giveproofs that given a regular set of trajectoriesT and regular languagesL1 L2 theoperationL1 T L2 always yields a regular language Thus it is reasonable toconsider the state complexity of shuffle on trajectories The author and K Salo-maa [24] have obtained results in this area We state an upper bound in terms ofnondeterministic state complexity

Lemma 71 Let L1 L2 be regular languages overΣlowast and Tsube 01lowast be a regularset of trajectories Then

sc(L1 T L2) le 2nsc(L1)nsc(L2)nsc(T)

17

The following open problem appears to be very challenging

Open Problem 72 For what regular sets of trajectories Tsube 01lowast does theconstruction given by Lemma 71 give a construction which is best possible

Consider unrestricted shuffle given by the set of trajectoriesT = (0 + 1)lowastThe bound of Lemma 71 in this case is 2nsc(L1)nsc(L2) Cacircmpeanuet al [8] haveshown that there exist languagesL1 andL2 accepted by incomplete DFAs havingrespectivelyn andm states such that any incomplete DFA acceptingL1 L2 hasat least 2nm minus 1 states This bound is optimal for incomplete DFAs howeverfor complete DFAs it gives only the lower bound 2(sc(L1)minus1)(sc(L2)minus1) However weregard this as near enough to our goal of Lemma 71 for our purposes ie weregardT = (0 + 1)lowast as an example of a set of trajectoriesT satisfying OpenProblem 72

The density functionof a languageL sube Σlowast is defined bypL N rarr N aspL(n) = |L cap Σn| for all n ge 0 That is pL(n) gives the number of words oflengthn in L By the density of a languageL we informally mean the asymptoticbehaviour ofpL The following important result of Szilardet al [91 Thm 3]characterizes the density of regular languages

Theorem 73 A regular language R overΣ satisfies pR(n) isin O(nk) k ge 0 ifand only if R can be represented as a finite union of regular expressions of thefollowing form xylowast1z1 middot middot middot ylowastt zt where x y1 z1 middot middot middot yt zt isin Σ

lowast and0 le t le k+ 1

Call a languageL slenderif pL(n) isin O(1) [83] Let R be a regular languagewhich has polynomial densityO(nk) and lett be the smallest integer such thatR = cupt

i=1xiylowasti1zi1 middot middot middot ylowastikiziki 0 le ki le k + 1 i = 1 t Then callt the UkL-

indexof L If k = 0 we callt theUSL-indexof L (languages with USL indextare calledt-thin by Paun and Salomaa [83] slender regular languages were alsocharacterized independently by Shallit [87 Lemma 3 p 336])

Lemma 74 Let T = uvlowast where u v isin 01lowast Let Li be regular languages overΣwith sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le |uv|n1n2 (2)

We now give a bound for sets of trajectoriesT = uvlowastw with w ε

Lemma 75 Let T = uvlowastw where u vw isin 01lowast and w ε Let Li be regularlanguages overΣ with sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le n1n2

|u| + 1+ |v|(n1n2)

lceil|w||v|

rceil+1minus n1n2

n1n2 minus 1

(3)

18

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

T sube ΓlowastΣ thenL1 T L2 = cupxisinL1yisinL2 x T y The associated deletion operation over

the alphabet∆Σ = ∆ cup (∆ times Σ) (where∆ = id σ) is defined in the natural way[19]

We now consider givenΣ and two sets of trajectoriesT1T2 sube ΓlowastΣ whether

the operations T1 T2 coincide that is whetherL1 T1 L2 = L1 T2 L2 for alllanguagesL1 L2 sube Σ

lowast If T1 T2 represent in this sense the same operation wesay thatT1T2 areequivalentsets of trajectories

We note that it is possible for two distinct sets of trajectoriesT1T2 sube ΓlowastΣ

to be

equivalent As a simple example considerT1 = a0

a1 andT2 =

a1

a0 Note that for

i = 12

L1 Ti L2 =

aa if L1 cap L2 supe aempty otherwise

ThusT1T2 are equivalent but not equalBy using a special case of partial commutation and trace languages (see Diek-

ert and Meacutetivier [15]) we can show that two sets of trajectoriesT1T2 sube ΓlowastΣ

areequivalent if and only if their corresponding trace languages (under the naturalmorphism) are equivalent This implies the following important decidability re-sult

Theorem 61 Let Σ be an alphabet and T1T2 sube ΓlowastΣ If T1T2 are regular it is

decidable whether T1 and T2 are equivalent

We can now consider some examples of the power of SST and SDT We firstnote that SST consists of a valid extension of the shuffle on trajectories ifT sube01lowast thenL1 T L2 = L1 T L2 the syntactic shuffle on trajectories operation[79] We also note that ifT = σlowast then T = cap

Given the very natural set of trajectoriesT = (0 + 1 + σ)lowast T denotes theinfiltration product uarr see eg Pin and Sakarovitch [81] The infiltration productis defined as follows ifx = x1 xn is a word of lengthn andI = (i1 i2 ir) isa subsequence of (12 n) let xI = xi1 xi2 middot middot middot xir Then givenx y isin Σlowast

x uarr y = z isin Σlowast existI J sube [|z|] such thatI cup J = [|z|] zI = x andzJ = y

For exampleabuarr ba= abababbaabbabaabbaababWe now show how to use SST and SDT to simulate ciliate bio-operations

which have been the subject of recent research in the literature A model of cil-iate bio-operations without circular variants were introduced by Daleyet al [11]to mimic the manner in which DNA is unscrambled in the DNA of certain uni-cellular ciliates in the process of asexual reproduction Ciliate bio-operations arealso investigated by Ehrenfeuchtet al [28] Prescottet al [82] and Daley andMcQuillan [12] using various approaches

15

Daley et al [11] define several language operations which simulate ciliatebio-operations including synchronized insertion deletion and bi-polar deletionSynchronized insertion can be given as follows

α oplus β = uavaw a isin Σ α = uaw β = va

The operation is extended to languages as usual LetT =⋃

aisinΣ 0lowasta0 1lowast

a1 0lowast Then

for all L1 L2 sube Σlowast L1 oplus L2 = L1 T L2 The operations of synchronized deletion

and synchronized bi-polar deletion (also defined by Daleyet al [11]) can also besimulated by SDT

Contextual insertion and deletion were introduced by Kari and Thierrin as asimple set of operations which are capable for modelling DNA computing [58]Let Σ be an alphabet and [x y] isin (Σlowast)2 We call [x y] a context Then givenvu isin Σlowast the [x y]-contextual insertionof v into u is given byu larrminus[xy] v = u1xvyu2 u = u1xyu2u1u2 isin Σ

lowast Let C sube (Σlowast)2 Then

u larrminusC v =⋃

[xy]isinC

u larrminus[xy] v

The operationlarrminusC is extended to languages monotonically as expected Letx =x1 middot middot middot xn andy = y1 middot middot middot ym be arbitrary words overΣ Then define

T[xy] = 0lowastnprod

i=1

xi

0 1lowastmprod

i=1

yi

0 0lowast

We naturally extend this toTC = cup[xy]isinCT[xy] for all C sube (Σlowast)2 Under this defini-tion it is clear that TC =

larrminusC for all C sube (Σlowast)2

Kari and Thierrin note that ifC sube (Σlowast)2 is finite then the regular and context-free languages are closed underlarrminus

C As TC is regular for all finiteC we notethat the closure of the regular languages underlarrminus

C is a consequence of the closureproperties of SST It is known that the CFLs are closed underlarrminus

C [58] Howeverin general the CFLs are not closed underT This leads to the following openproblem

Open Problem 62Find necessary and sufficient (language-theoretic) conditionson a set of trajectories Tsube Γlowast

Σsuch that the CFLs overΣ are closed under T

We note that [x y]-contextual deletion and [x y]-contextual bi-polar deletion[58] can be simulated by SDT [19]

Recall thatsynchronized shuffle (see eg Latteux and Roos [66]) is defined asfollows LetΣ1Σ2 be alphabets not necessarily disjoint Letρi (Σ1 cup Σ2) rarr Σi

be the projection ontoΣi given byρi(a) = a for all a isin Σi andρi(a) = ε for alla isin (Σ1 cup Σ2) minus Σi

16

Let Li sube Σi for i = 12 Then the synchronized shuffle of L1 andL2 denotedL1 L2 is given by

L1 L2 = ρminus11 (L1) cap ρ

minus12 (L2)

Lemma 63 LetΣ1Σ2 be alphabets Let Tsube ΓlowastΣ1cupΣ2

be given by

T = ((⋃

aisinΣ1capΣ2

aσ) + (

⋃aisinΣ1minusΣ2

a0) + (

⋃aisinΣ2minusΣ1

a1))lowast

Then for all L1 L2 such that Li sube Σi for i = 12 L1 L2 = L1 T L2

Further examples of operations simulated by SST are described by the author[19]

We note that many of the results on language equations hold for SST In par-ticular the following are new results on bio-operations which have not been notedbefore [19]

Corollary 64 Let R be a regular language Then it is decidable whether thereexist languages X1X2 such that R= X1 oplus X2

Corollary 65 Let C sube (Σlowast)2 be a finite set of contexts Let R be a regularlanguage Then it is decidable whether there exist languages X1X2 such thatR= X1

larrC X2

7 Descriptional Complexity

Descriptional complexity of formal languages deals with the problems of concisedescriptions of languages in terms of generative or accepting devices For in-stance the(deterministic) state complexityof a regular languageL is the minimalnumber of states in any deterministic finite automaton acceptingL [94] Nonde-terministic state complexity of a regular language is similarly defined (see egHolzer and Kutrib [36])

For shuffle on trajectories Mateescuet al [79] and Harjuet al [34] both giveproofs that given a regular set of trajectoriesT and regular languagesL1 L2 theoperationL1 T L2 always yields a regular language Thus it is reasonable toconsider the state complexity of shuffle on trajectories The author and K Salo-maa [24] have obtained results in this area We state an upper bound in terms ofnondeterministic state complexity

Lemma 71 Let L1 L2 be regular languages overΣlowast and Tsube 01lowast be a regularset of trajectories Then

sc(L1 T L2) le 2nsc(L1)nsc(L2)nsc(T)

17

The following open problem appears to be very challenging

Open Problem 72 For what regular sets of trajectories Tsube 01lowast does theconstruction given by Lemma 71 give a construction which is best possible

Consider unrestricted shuffle given by the set of trajectoriesT = (0 + 1)lowastThe bound of Lemma 71 in this case is 2nsc(L1)nsc(L2) Cacircmpeanuet al [8] haveshown that there exist languagesL1 andL2 accepted by incomplete DFAs havingrespectivelyn andm states such that any incomplete DFA acceptingL1 L2 hasat least 2nm minus 1 states This bound is optimal for incomplete DFAs howeverfor complete DFAs it gives only the lower bound 2(sc(L1)minus1)(sc(L2)minus1) However weregard this as near enough to our goal of Lemma 71 for our purposes ie weregardT = (0 + 1)lowast as an example of a set of trajectoriesT satisfying OpenProblem 72

The density functionof a languageL sube Σlowast is defined bypL N rarr N aspL(n) = |L cap Σn| for all n ge 0 That is pL(n) gives the number of words oflengthn in L By the density of a languageL we informally mean the asymptoticbehaviour ofpL The following important result of Szilardet al [91 Thm 3]characterizes the density of regular languages

Theorem 73 A regular language R overΣ satisfies pR(n) isin O(nk) k ge 0 ifand only if R can be represented as a finite union of regular expressions of thefollowing form xylowast1z1 middot middot middot ylowastt zt where x y1 z1 middot middot middot yt zt isin Σ

lowast and0 le t le k+ 1

Call a languageL slenderif pL(n) isin O(1) [83] Let R be a regular languagewhich has polynomial densityO(nk) and lett be the smallest integer such thatR = cupt

i=1xiylowasti1zi1 middot middot middot ylowastikiziki 0 le ki le k + 1 i = 1 t Then callt the UkL-

indexof L If k = 0 we callt theUSL-indexof L (languages with USL indextare calledt-thin by Paun and Salomaa [83] slender regular languages were alsocharacterized independently by Shallit [87 Lemma 3 p 336])

Lemma 74 Let T = uvlowast where u v isin 01lowast Let Li be regular languages overΣwith sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le |uv|n1n2 (2)

We now give a bound for sets of trajectoriesT = uvlowastw with w ε

Lemma 75 Let T = uvlowastw where u vw isin 01lowast and w ε Let Li be regularlanguages overΣ with sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le n1n2

|u| + 1+ |v|(n1n2)

lceil|w||v|

rceil+1minus n1n2

n1n2 minus 1

(3)

18

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

Daley et al [11] define several language operations which simulate ciliatebio-operations including synchronized insertion deletion and bi-polar deletionSynchronized insertion can be given as follows

α oplus β = uavaw a isin Σ α = uaw β = va

The operation is extended to languages as usual LetT =⋃

aisinΣ 0lowasta0 1lowast

a1 0lowast Then

for all L1 L2 sube Σlowast L1 oplus L2 = L1 T L2 The operations of synchronized deletion

and synchronized bi-polar deletion (also defined by Daleyet al [11]) can also besimulated by SDT

Contextual insertion and deletion were introduced by Kari and Thierrin as asimple set of operations which are capable for modelling DNA computing [58]Let Σ be an alphabet and [x y] isin (Σlowast)2 We call [x y] a context Then givenvu isin Σlowast the [x y]-contextual insertionof v into u is given byu larrminus[xy] v = u1xvyu2 u = u1xyu2u1u2 isin Σ

lowast Let C sube (Σlowast)2 Then

u larrminusC v =⋃

[xy]isinC

u larrminus[xy] v

The operationlarrminusC is extended to languages monotonically as expected Letx =x1 middot middot middot xn andy = y1 middot middot middot ym be arbitrary words overΣ Then define

T[xy] = 0lowastnprod

i=1

xi

0 1lowastmprod

i=1

yi

0 0lowast

We naturally extend this toTC = cup[xy]isinCT[xy] for all C sube (Σlowast)2 Under this defini-tion it is clear that TC =

larrminusC for all C sube (Σlowast)2

Kari and Thierrin note that ifC sube (Σlowast)2 is finite then the regular and context-free languages are closed underlarrminus

C As TC is regular for all finiteC we notethat the closure of the regular languages underlarrminus

C is a consequence of the closureproperties of SST It is known that the CFLs are closed underlarrminus

C [58] Howeverin general the CFLs are not closed underT This leads to the following openproblem

Open Problem 62Find necessary and sufficient (language-theoretic) conditionson a set of trajectories Tsube Γlowast

Σsuch that the CFLs overΣ are closed under T

We note that [x y]-contextual deletion and [x y]-contextual bi-polar deletion[58] can be simulated by SDT [19]

Recall thatsynchronized shuffle (see eg Latteux and Roos [66]) is defined asfollows LetΣ1Σ2 be alphabets not necessarily disjoint Letρi (Σ1 cup Σ2) rarr Σi

be the projection ontoΣi given byρi(a) = a for all a isin Σi andρi(a) = ε for alla isin (Σ1 cup Σ2) minus Σi

16

Let Li sube Σi for i = 12 Then the synchronized shuffle of L1 andL2 denotedL1 L2 is given by

L1 L2 = ρminus11 (L1) cap ρ

minus12 (L2)

Lemma 63 LetΣ1Σ2 be alphabets Let Tsube ΓlowastΣ1cupΣ2

be given by

T = ((⋃

aisinΣ1capΣ2

aσ) + (

⋃aisinΣ1minusΣ2

a0) + (

⋃aisinΣ2minusΣ1

a1))lowast

Then for all L1 L2 such that Li sube Σi for i = 12 L1 L2 = L1 T L2

Further examples of operations simulated by SST are described by the author[19]

We note that many of the results on language equations hold for SST In par-ticular the following are new results on bio-operations which have not been notedbefore [19]

Corollary 64 Let R be a regular language Then it is decidable whether thereexist languages X1X2 such that R= X1 oplus X2

Corollary 65 Let C sube (Σlowast)2 be a finite set of contexts Let R be a regularlanguage Then it is decidable whether there exist languages X1X2 such thatR= X1

larrC X2

7 Descriptional Complexity

Descriptional complexity of formal languages deals with the problems of concisedescriptions of languages in terms of generative or accepting devices For in-stance the(deterministic) state complexityof a regular languageL is the minimalnumber of states in any deterministic finite automaton acceptingL [94] Nonde-terministic state complexity of a regular language is similarly defined (see egHolzer and Kutrib [36])

For shuffle on trajectories Mateescuet al [79] and Harjuet al [34] both giveproofs that given a regular set of trajectoriesT and regular languagesL1 L2 theoperationL1 T L2 always yields a regular language Thus it is reasonable toconsider the state complexity of shuffle on trajectories The author and K Salo-maa [24] have obtained results in this area We state an upper bound in terms ofnondeterministic state complexity

Lemma 71 Let L1 L2 be regular languages overΣlowast and Tsube 01lowast be a regularset of trajectories Then

sc(L1 T L2) le 2nsc(L1)nsc(L2)nsc(T)

17

The following open problem appears to be very challenging

Open Problem 72 For what regular sets of trajectories Tsube 01lowast does theconstruction given by Lemma 71 give a construction which is best possible

Consider unrestricted shuffle given by the set of trajectoriesT = (0 + 1)lowastThe bound of Lemma 71 in this case is 2nsc(L1)nsc(L2) Cacircmpeanuet al [8] haveshown that there exist languagesL1 andL2 accepted by incomplete DFAs havingrespectivelyn andm states such that any incomplete DFA acceptingL1 L2 hasat least 2nm minus 1 states This bound is optimal for incomplete DFAs howeverfor complete DFAs it gives only the lower bound 2(sc(L1)minus1)(sc(L2)minus1) However weregard this as near enough to our goal of Lemma 71 for our purposes ie weregardT = (0 + 1)lowast as an example of a set of trajectoriesT satisfying OpenProblem 72

The density functionof a languageL sube Σlowast is defined bypL N rarr N aspL(n) = |L cap Σn| for all n ge 0 That is pL(n) gives the number of words oflengthn in L By the density of a languageL we informally mean the asymptoticbehaviour ofpL The following important result of Szilardet al [91 Thm 3]characterizes the density of regular languages

Theorem 73 A regular language R overΣ satisfies pR(n) isin O(nk) k ge 0 ifand only if R can be represented as a finite union of regular expressions of thefollowing form xylowast1z1 middot middot middot ylowastt zt where x y1 z1 middot middot middot yt zt isin Σ

lowast and0 le t le k+ 1

Call a languageL slenderif pL(n) isin O(1) [83] Let R be a regular languagewhich has polynomial densityO(nk) and lett be the smallest integer such thatR = cupt

i=1xiylowasti1zi1 middot middot middot ylowastikiziki 0 le ki le k + 1 i = 1 t Then callt the UkL-

indexof L If k = 0 we callt theUSL-indexof L (languages with USL indextare calledt-thin by Paun and Salomaa [83] slender regular languages were alsocharacterized independently by Shallit [87 Lemma 3 p 336])

Lemma 74 Let T = uvlowast where u v isin 01lowast Let Li be regular languages overΣwith sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le |uv|n1n2 (2)

We now give a bound for sets of trajectoriesT = uvlowastw with w ε

Lemma 75 Let T = uvlowastw where u vw isin 01lowast and w ε Let Li be regularlanguages overΣ with sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le n1n2

|u| + 1+ |v|(n1n2)

lceil|w||v|

rceil+1minus n1n2

n1n2 minus 1

(3)

18

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

Let Li sube Σi for i = 12 Then the synchronized shuffle of L1 andL2 denotedL1 L2 is given by

L1 L2 = ρminus11 (L1) cap ρ

minus12 (L2)

Lemma 63 LetΣ1Σ2 be alphabets Let Tsube ΓlowastΣ1cupΣ2

be given by

T = ((⋃

aisinΣ1capΣ2

aσ) + (

⋃aisinΣ1minusΣ2

a0) + (

⋃aisinΣ2minusΣ1

a1))lowast

Then for all L1 L2 such that Li sube Σi for i = 12 L1 L2 = L1 T L2

Further examples of operations simulated by SST are described by the author[19]

We note that many of the results on language equations hold for SST In par-ticular the following are new results on bio-operations which have not been notedbefore [19]

Corollary 64 Let R be a regular language Then it is decidable whether thereexist languages X1X2 such that R= X1 oplus X2

Corollary 65 Let C sube (Σlowast)2 be a finite set of contexts Let R be a regularlanguage Then it is decidable whether there exist languages X1X2 such thatR= X1

larrC X2

7 Descriptional Complexity

Descriptional complexity of formal languages deals with the problems of concisedescriptions of languages in terms of generative or accepting devices For in-stance the(deterministic) state complexityof a regular languageL is the minimalnumber of states in any deterministic finite automaton acceptingL [94] Nonde-terministic state complexity of a regular language is similarly defined (see egHolzer and Kutrib [36])

For shuffle on trajectories Mateescuet al [79] and Harjuet al [34] both giveproofs that given a regular set of trajectoriesT and regular languagesL1 L2 theoperationL1 T L2 always yields a regular language Thus it is reasonable toconsider the state complexity of shuffle on trajectories The author and K Salo-maa [24] have obtained results in this area We state an upper bound in terms ofnondeterministic state complexity

Lemma 71 Let L1 L2 be regular languages overΣlowast and Tsube 01lowast be a regularset of trajectories Then

sc(L1 T L2) le 2nsc(L1)nsc(L2)nsc(T)

17

The following open problem appears to be very challenging

Open Problem 72 For what regular sets of trajectories Tsube 01lowast does theconstruction given by Lemma 71 give a construction which is best possible

Consider unrestricted shuffle given by the set of trajectoriesT = (0 + 1)lowastThe bound of Lemma 71 in this case is 2nsc(L1)nsc(L2) Cacircmpeanuet al [8] haveshown that there exist languagesL1 andL2 accepted by incomplete DFAs havingrespectivelyn andm states such that any incomplete DFA acceptingL1 L2 hasat least 2nm minus 1 states This bound is optimal for incomplete DFAs howeverfor complete DFAs it gives only the lower bound 2(sc(L1)minus1)(sc(L2)minus1) However weregard this as near enough to our goal of Lemma 71 for our purposes ie weregardT = (0 + 1)lowast as an example of a set of trajectoriesT satisfying OpenProblem 72

The density functionof a languageL sube Σlowast is defined bypL N rarr N aspL(n) = |L cap Σn| for all n ge 0 That is pL(n) gives the number of words oflengthn in L By the density of a languageL we informally mean the asymptoticbehaviour ofpL The following important result of Szilardet al [91 Thm 3]characterizes the density of regular languages

Theorem 73 A regular language R overΣ satisfies pR(n) isin O(nk) k ge 0 ifand only if R can be represented as a finite union of regular expressions of thefollowing form xylowast1z1 middot middot middot ylowastt zt where x y1 z1 middot middot middot yt zt isin Σ

lowast and0 le t le k+ 1

Call a languageL slenderif pL(n) isin O(1) [83] Let R be a regular languagewhich has polynomial densityO(nk) and lett be the smallest integer such thatR = cupt

i=1xiylowasti1zi1 middot middot middot ylowastikiziki 0 le ki le k + 1 i = 1 t Then callt the UkL-

indexof L If k = 0 we callt theUSL-indexof L (languages with USL indextare calledt-thin by Paun and Salomaa [83] slender regular languages were alsocharacterized independently by Shallit [87 Lemma 3 p 336])

Lemma 74 Let T = uvlowast where u v isin 01lowast Let Li be regular languages overΣwith sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le |uv|n1n2 (2)

We now give a bound for sets of trajectoriesT = uvlowastw with w ε

Lemma 75 Let T = uvlowastw where u vw isin 01lowast and w ε Let Li be regularlanguages overΣ with sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le n1n2

|u| + 1+ |v|(n1n2)

lceil|w||v|

rceil+1minus n1n2

n1n2 minus 1

(3)

18

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

The following open problem appears to be very challenging

Open Problem 72 For what regular sets of trajectories Tsube 01lowast does theconstruction given by Lemma 71 give a construction which is best possible

Consider unrestricted shuffle given by the set of trajectoriesT = (0 + 1)lowastThe bound of Lemma 71 in this case is 2nsc(L1)nsc(L2) Cacircmpeanuet al [8] haveshown that there exist languagesL1 andL2 accepted by incomplete DFAs havingrespectivelyn andm states such that any incomplete DFA acceptingL1 L2 hasat least 2nm minus 1 states This bound is optimal for incomplete DFAs howeverfor complete DFAs it gives only the lower bound 2(sc(L1)minus1)(sc(L2)minus1) However weregard this as near enough to our goal of Lemma 71 for our purposes ie weregardT = (0 + 1)lowast as an example of a set of trajectoriesT satisfying OpenProblem 72

The density functionof a languageL sube Σlowast is defined bypL N rarr N aspL(n) = |L cap Σn| for all n ge 0 That is pL(n) gives the number of words oflengthn in L By the density of a languageL we informally mean the asymptoticbehaviour ofpL The following important result of Szilardet al [91 Thm 3]characterizes the density of regular languages

Theorem 73 A regular language R overΣ satisfies pR(n) isin O(nk) k ge 0 ifand only if R can be represented as a finite union of regular expressions of thefollowing form xylowast1z1 middot middot middot ylowastt zt where x y1 z1 middot middot middot yt zt isin Σ

lowast and0 le t le k+ 1

Call a languageL slenderif pL(n) isin O(1) [83] Let R be a regular languagewhich has polynomial densityO(nk) and lett be the smallest integer such thatR = cupt

i=1xiylowasti1zi1 middot middot middot ylowastikiziki 0 le ki le k + 1 i = 1 t Then callt the UkL-

indexof L If k = 0 we callt theUSL-indexof L (languages with USL indextare calledt-thin by Paun and Salomaa [83] slender regular languages were alsocharacterized independently by Shallit [87 Lemma 3 p 336])

Lemma 74 Let T = uvlowast where u v isin 01lowast Let Li be regular languages overΣwith sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le |uv|n1n2 (2)

We now give a bound for sets of trajectoriesT = uvlowastw with w ε

Lemma 75 Let T = uvlowastw where u vw isin 01lowast and w ε Let Li be regularlanguages overΣ with sc(Li) = ni i = 12 Let L= L1 T L2 Then

sc(L) le n1n2

|u| + 1+ |v|(n1n2)

lceil|w||v|

rceil+1minus n1n2

n1n2 minus 1

(3)

18

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

Our aim is to obtain a lower bound for the shuffle operation on trajectorieswith USL index 1 It seems likely that the bound (3) cannot be reached for anyfixed set of trajectories (and for all values ofsc(Li)i = 12) In particular if|w| isfixed andsc(Li) can grow arbitrarily then it seems impossible that the

lceil|w||v|

rceilparallel

computations on the suffix wcould simultaneously reach all combinations of statesof the DFAs forL1 andL2 Note that if the computation ofM contains parallelbranches that simulate the computations ofMi (1 le i le 2) in statesPi sube Qi thenall the states ofPi need to be reachable from a single state ofMi with inputs oflength at most|w|

For the above reason we consider a lower bound for sets of trajectoriesuvlowastwwhere the length ofv and ofw can depend on the sizes of the minimal DFAs forthe component languagesL1 andL2 Furthermore to simplify the notations belowwe give lower bound results for sets of trajectories of the formvlowastw ie u = εIt would be straightforward to modify the construction for prefixesu of arbitrarylength to include the additive termn1n2 middot (|u| + 1) from (3)

Lemma 76 Let Σ = ab c For any n1n2 isin N there exist regular languagesLi sube Σ

lowast with sc(Li) = ni i = 12 and a set of trajectories T= vlowastw wherevw isin 01lowast such that

sc(L1 TL2) ge (n1n2)d|w||v| e+1

The ratio|w||v| above can be chosen to be arbitrarily large

By extending Lemma 76 slightly we obtain the following result

Theorem 77The upper bound (3) is asymptotically optimal if sc(T) (that is|v|)can be arbitrarily large compared to sc(Li) i = 12

We conclude with some open problems Recall that the example of arbitraryshuffle shown by Cacircmpeanuet al to have state complexity no better than ourconstruction in Lemma 71 uses the set of trajectoriesT = (0+ 1)lowast of density 2nWe also note that by Szilardet al [91] the density of a regular language overΣis eitherO(p(n)) wherep is a polynomial orΩ(|Σ|n)

Thus we may conjecture that a set of trajectoriesT yields an operation whichis in the worst case no better than Lemma 71 if and only ifpT(n) isin Ω(2n) ieThas exponential density

Our constructions in Lemma 76 use three-letter alphabets Can these con-structions be improved to two-letter alphabets The problem of restricting thealphabet size to be as small as possible is often challenging For example in thecase of concatenation the state complexity problem was solved for a three-letteralphabet by Yuet al [94] but the case of a two-letter alphabet was open untilrecently [44 43]

19

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

8 Substitution on Trajectories

We now recall the definition of substitution on trajectories originally given byKari et al [53 52] A trajectoryt is a word over01 Given a trajectoryt andu v isin Σlowast the substitution ofv into u (or the substitution inu of v) is given by

u t v = (nprod

i=1

uivi)un+1 n ge 0u = (nprod

i=1

uiai)un+1 v =nprod

i=1

vi

t =nprod

i=1

0ai 1)0an+1ai vi isin Σforalli1 le i le kui isin Σlowastforalli1 le i le k+ 1

j i = |ui |forall1 le i le k

Note that if|u| |t| or |v| |t|1 thenu t v = emptyWe extend this to sets of trajectoriesT sube 01lowast as expectedu T v =

cuptisinTu T v Further if L1 L2 are languages thenL1 T L2 = cupxisinL1yisinL2

u T v

We note that the notationT was also used by Mateescu [77] for the splicing onroutes This concept is unrelated to substitution on trajectories except in that theyboth are based on the concept of trajectories

We consider some examples(i) If T = 01lowast the resulting operationT is known as the substi-tution operation In this operation substitutions are permitted in anypossible position(ii) If Tk = 0lowast(10lowast)k0lowast the languagex T Σ

k contains all possiblewords obtained by substituting exactlyk symbols intox

One motivation for substitution on trajectories is the close associations betweeninsertion deletion and substitution in models of channels with can generate errorswhile transmitting data Indeed so-called SID channels (for substitution insertionand deletion) are a strong model of transmission media where these three typesof errors may occur Thus it is natural to consider a substitution-based operationusing trajectories

Formally a channelγ is a binary relation on words which defines a set ofpossible outputs from a channel given an input word ifu γ y theny is a possibleoutput of the channel on inputu A languageL is error-detecting for a channelγif for all u v isin L cup ε u γ v impliesu = v For a given set of trajectoriesT thechannel defined byT is given by the relation(u v) v isin u T Σ

lowast This is avery natural definition it is completely analogous to the definition of the binaryrelation defined by shuffle on trajectories defined independently by the author anddescribed in Section 91 below)

Kari et al [53 52] show that given a language and a substitution channel(defined by a set of trajectories) it is decidable in polynomial time whether the

20

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

language is error-detecting for a channel Kariet al [53 52] have also investi-gated language equations involving substitution on trajectories In particular forone-variable equations of the formX T R1 = R2 or R1 T X = R2 it is decid-able whether there exists a solution ifR1R2 andT are all regular The problemof examining two-variable equations for substitution on trajectories however isapparently open

Open Problem 81Find necessary and sufficient conditions on T such that givena regular language R it is decidable whether there exist X1X2 such that R=X1 T X2

9 Theory of Codes

Prefix codes are fundamental objects in formal language theory and are likelyone of the most well-studied classes of languages which are not defined by theirrelation to a generating or accepting device A languageL is a prefix code if noword x isin L is a prefix of another wordy in L The use of prefix is intricatelylinked to concatenationmdashwhich is itself a particular case of an operation definedby shuffle on trajectories This is reflected by the following well-known identifyfor prefix codesL is a prefix code if and only ifLcap LΣ+ = empty (We refer the readerto Berstel and Perrin [5] Juumlrgensen and Konstantinidis [46] or Shyr [88] for anintroduction to the theory of codes and prefix codes)

The question now arises is concatenation the only shuffle-on-trajectories op-eration for which the associated ldquoprefix-likerdquo property defines a class of languagesrelated to the theory of codes It turns out that the answer is no several well-studied language classes related to the theory of codes are a particular case of theconcept ofT-codes which we review now The results in this section are due tothe author [20 21 22]

Let L sube Σ+ be a language Then for anyT sube 01lowast we say thatL is aT-codeif L is non-empty and (L T Σ

+) cap L = empty If Σ is an alphabet andT sube 01lowast letPT(Σ) denote the set of allT-codes overΣ

There has been much research into the idea ofT-codes for particularT sube01lowast including

(a) prefix suffix and biprefix (or bifix) codes corresponding toT =0lowast1lowast T = 1lowast0lowast andT = 0lowast1lowast + 1lowast0lowast respectively(b) outfix and infix codes corresponding toT = 0lowast1lowast0lowast and T =1lowast0lowast1lowast respectively(c) shuffle-codes corresponding to bounded sets of trajectories suchasT = (0lowast1lowast)n for fixed n ge 1 (prefix codes of indexn) T = (1lowast0lowast)n

for fixedn ge 1 (suffix codes of indexn) T = 1lowast(0lowast1lowast)n for fixedn ge 1

21

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

(infix codes of indexn) orT = (0lowast1lowast)n0lowast for fixedn ge 1 (outfix codesof indexn)(d) hypercodes corresponding toT = (0+ 1)lowast(e) k-codes corresponding toT = 0lowast1lowast0lek (see Kari and Thierrin[57]) for fixedk ge 0 and(f) for arbitraryk ge 1 codes defined by the sets of trajectoriesPPk =

0lowast + (0lowast1lowast)kminus10lowast1+ PSk = 0lowast + 1+0lowast(1lowast0lowast)kminus1 PIk = 0lowast + (1lowast0lowast)k1+S Ik = 0lowast+1+(0lowast1lowast)k PBk = PPkcupPSk andBIk = PIkcupS Ik see Long[69] or Ito et al [40] for PI1S I1

For a list of references related to (a)ndash(d) see Juumlrgensen and Konstantinidis [46pp 549ndash553]

91 The Binary Relation Defined by Shuffle on Trajectories

We can also defineT-codes by appealing to a definition based on binary relationsIn particular forT sube 01lowast defineωT as follows for allx y isin Σlowast

xωT y lArrrArr y isin x T Σlowast

It is clear thatL sube Σ+ is aT-code if and only ifL is an anti-chain underωT (iex y isin L andxωT y impliesx = y)

We note that the relation analogous toωT for infinite words andω-trajectorieswas defined by Kadrieet al [48] In what follows we will refer toT having apropertyP if and only if ωT has propertyP Recall that a binary relationρ onΣlowast

is said to bepositiveif ε ρ x for all x isin Σlowast

Lemma 91 Let T sube 01lowast

(a) The relationωT is anti-symmetric(b) T is reflexive if and only if0lowast sube T(c) T is positive if and only if1lowast sube T

Let ρ be a binary relation onΣlowast Then we say thatρ is left-compatible(respright-compatible) if for all u vw isin Σlowast uρv implies thatwuρwv (respuwρvw)If ρ is both left- and right-compatible we say it iscompatible

Lemma 92 Let T sube 01lowast Then T is right-compatible (resp left-compatiblecompatible) if and only if T0lowast sube T (resp0lowastT sube T 0lowastT0lowast sube T)

We now consider conditions onT which will ensure thatωT is a transitiverelation Transitivity is often but not always a property of the binary relationsdefining the classic code classes For instance both bi-prefix and outfix codes aredefined by binary relations which are not transitive and hence not a partial order

22

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

First we define three morphisms we will need LetD = x y z andϕ σ ψ Dlowast rarr 01lowast be the morphisms given by

ϕ(x) = 0 σ(x) = 0 ψ(x) = 0ϕ(y) = 0 σ(y) = 1 ψ(y) = 1ϕ(z) = 1 σ(z) = ε ψ(z) = 1

Note that these morphisms are similar to the substitutions defined by Mateescuetal [79] whose purpose is to give necessary and sufficient conditions on a setT oftrajectories defining an associative operation

Theorem 93 Let T sube 01lowast Then T is transitive if and only ifψ(ϕminus1(T) capσminus1(T)) sube T

As an alternate formulation for Theorem 93 we note that for allT sube 01lowastT is transitive if and only ifT T 1lowast sube T

Corollary 94 Given a regular set Tsube 01lowast of trajectories it is decidablewhether T is transitive

For undecidability we naturally find that deciding whether a context-free setof trajectories is transitive is undecidable

Theorem 95Given a CF set Tsube 01lowast of trajectories it is undecidable whetherT is transitive

It is easy to see that ifTiiisinI is a family of transitive sets of trajectories thenthe setcapiisinITi is also transitive Thus we can define the transitive closure of a setT of trajectories as follows for allT sube 01lowast let tr(T) = Tprime sube 01lowast T subeTprimeTprime transitive Note thattr(T) empty as01lowast isin tr(T) for all T sube 01lowast DefineT as

T =⋂

Tprimeisintr(T)

Tprime (4)

Then note thatT is transitive and is the smallest transitive set of trajectories con-tainingT The operationmiddot 201

lowast

rarr 201lowast

is indeed a closure operator (much likethe closure operators on sets of trajectories constructed by Mateescuet al [79]for eg associativity and commutativity) in the algebraic sense sinceT sube T andmiddot preserves inclusion and is idempotent

Consider the operatorΩT 201lowast

rarr 201lowast

given by

ΩT(Tprime) = T cup Tprime cup ψ(σminus1(Tprime) cap ϕminus1(Tprime))

It is not difficult to see that givenT we can findT by iteratively applyingΩT toT and in factT =

⋃ige0Ω

iT(T) This observation allows us to constructT and for

instance gives us the following result (a similar result forω-trajectories is givenby Kadrieet al [48])

23

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

Lemma 96 There exists a regular set of trajectories Tsube 01lowast such thatT isnot a CFL

In particular considerT = (01)lowast corresponding to perfect or balanced literalshuffle Then we note thatT cap 01lowast = 012nminus1 n ge 1

Open Problem 97Given Tisin (or T isin ) is it decidable whetherT isin

We now examine the relationship betweenT-codes andT-codes for arbitraryT sube 01lowast We call a languageL sube Σlowast T-convexif for all y isin Σlowast andx z isin LxωT y andyωT z impliesy isin L

We now characterize when a language isT-convex using shuffle and deletionalong trajectories Define the morphismτ 01lowast rarr idlowast by τ(0) = i andτ(1) = d This morphism defines the relationship between shuffle and deletionalong trajectories

Lemma 98 Let T sube 01lowast Then Lsube Σlowast is T-convex if and only if(L T Σlowast) cap

(Lτ(T) Σlowast) sube L

We now turn to decidability

Corollary 99 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage L it is decidable whether L is T-convex

These results lead to the following general relationship betweenT-codes andT-codes

Theorem 910LetΣ be an alphabet and Tsube 01lowast For all languages Lsube Σ+the following two conditions are equivalent

(i) L is a T -code(ii) L is a T -convex T-code

Theorem 910 was known for the caseO = 0lowast1lowast0lowast which corresponds to outfixcodes see eg Shyr and Thierrin [89 Prop 2] In this caseO = H = (0+ 1)lowastwhich corresponds to hypercodes Theorem 910 was known to Guoet al [31Prop 2] in a slightly weaker form forB = 0lowast1lowast+1lowast0lowast In this caseB = I = 1lowast0lowast1lowastand the convexity is with respect to the factor (or subword) ordering See alsoLong [70 Sect 5] for the case of shuffle codes

We conclude this section with some research directions A binary relationρon Σlowast is said to beleft-cancellative(respright-cancellative) if uv ρ ux impliesvρ x (respvuρ xu impliesvρ x) for all u v x isin Σlowast The relationρ is cancellativeif it is both left- and right-cancellative

24

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

Open Problem 911What are necessary and sufficient language-theoretic con-ditions on a set of trajectories T so that T is left-cancellative right-cancellativeor cancellative

Another research direction would be to consider the class of codes definednot by shuffle on trajectories but by splicing on routes We note that additionalinteresting binary relations from the literature can be modeled using splicing onroutes For instance we leave it to the reader to verify that ifT = 0lowast + (01)lowast1+

then the associated binary relation (defined in the same way as for shuffle ontrajectories) is the length ordering given by

x le y lArrrArr (|x| lt |y|) or x = y

92 Maximal T-codes

Let T sube 01lowast We say thatL isin PT(Σ) is amaximal T-code if for allLprime isin PT(Σ)L sube Lprime impliesL = Lprime Denote the set of all maximalT-codes over an alphabetΣbyMT(Σ) Note that the alphabetΣ is crucial in the definition of maximality ByZornrsquos Lemma we can easily establish that everyL isin PT(Σ) is contained in someelement ofMT(Σ) The proof is a specific instance of a result from dependencytheory [46]

Unlike showing that everyT-code can be embedded in a maximalT-codeto our knowledge dependency theory has not addressed the problem of decid-ing whether a language is a maximal code under some dependence system Weaddress this problem forT-codes now We first require the following technicallemma which is interesting in its own right (specific cases were known for egprefix codes [5 Prop 31 Thm 33] hypercodes [89 Cor to Prop 11] as well asbiprefix and outfix codes [68 Lemmas 33 and 35]) Letτ 01lowast rarr idlowast beagain given byτ(0) = i andτ(1) = d

Lemma 912 Let T sube 01lowast Let Σ be an alphabet For all Lisin PT(Σ) L isinMT(Σ) if and only if

L cup (L T Σ+) cup (Lτ(T) Σ

+) = Σ+ (5)

Corollary 913 Let T sube 01lowast be a regular set of trajectories Given a regularlanguage Lsube Σ+ it is decidable whether Lisin MT(Σ)

Similar results were also obtained by Kariet al [51 Sect 5]

25

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

93 Embedding and Finiteness

Given a class of codesC and a languageL isin C of given complexity there hasbeen much research into whether or notL can beembedded in(or completed to) amaximal elementLprime isin C of the same complexity ie a maximal codeLprime isin C withL sube Lprime Finite and regular languages in these classes of codes are of particularinterest For instance we note that every regular code can be completed to amaximal regular code while the same is not true for finite codes or finite biprefixcodes

We now show an interesting result on embeddingT-codes in maximalT-codeswhile preserving complexity Our construction is a generalization of a result dueto Lam [65] In particular we define two transformations on languages LetT bea set of trajectories andL sube Σ+ be a language Then defineUT(L)VT(L) sube Σ+ as

UT(L) = Σ+ minus (L T Σ+ cup Lτ(T) Σ

+)

VT(L) = UT(L) minus (UT(L) T Σ+)

Recall thatτ 01lowast rarr idlowast is given byτ(0) = i andτ(1) = d

Theorem 914Let T sube 01lowast be transitive LetΣ be an alphabet Then for allL isin PT(Σ) the language VT(L) contains L and VT(L) isin MT(Σ)

We note one consequence of Theorem 914

Corollary 915 Let T sube 01lowast be transitive and regular Then every regular(resp recursive) T-code is contained in a maximal regular (resp recursive) T-code

Corollary 915 was given forT = 1lowast0lowast1lowast and regularT-codes by Lam [65Prop 32] Further research into the case whenT is not transitive is necessary (forexample the proofs of Zhang and Shen [97] and Bruyegravere and Perrin [6] on embed-ding regular biprefix codes are much more involved than the above constructionand do not seem to be easily generalized)

Open Problem 916Characterize those Tsube 01lowast for which every regular(resp finite recursive) T-code can be embedded in a maximal regular (respfinite recursive) T-code

We can extend our embedding results to finite languages with one additionalconstraint onT namely completeness

Corollary 917 Let T sube 01lowast be transitive and complete LetΣ be an alphabetThen for all finite Fisin PT(Σ) there exists a finite language Fprime isin MT(Σ) such thatF sube Fprime Further if T is effectively regular and F is effectively given we caneffectively construct Fprime

26

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

In practice the condition thatT be complete is not very restrictive since nat-ural operations seem to typically be defined by a complete set of trajectories

Also of interest are thoseT sube 01lowast such that allT-codes are finite It isa well-known result that all hypercodes (T = 01lowast) are finite which can beconcluded from a result due to Higman [35] We define the classFH as

FH = T isin 01lowast PT(Σ) sube

Also studied by the author are the classes of trajectories such that everyregular(or context-free)T-code is finite [20 22]

The classFH is related to a large amount of research in the literature IfT is apartial order andT isin FH thenT is awell partial order We defineF(po)

H to be theset of allT which are well partial orders Without trying to be exhaustive we notethe work of Jullien [45] Haines [32] van Leeuwen [93] Ehrenfeuchtet al [29]Ilie [37 38] Ilie and Salomaa [42] and Harju and Ilie [33] on well partial ordersrelating to words We also refer the reader to the survey of results presented by deLuca and Varricchio [14 Sect 5]

We now consider the question of the existence of arbitrary infinite languagesin a class ofT-codes We first show that ifT is bounded then there is an infiniteT-code

Theorem 918Let T sube 01lowast be a bounded set of trajectories Then for allΣwith |Σ| gt 1 PT(Σ) contains an infinite language ie Tlt FH

Further there exist uncountably many unbounded trajectoriesT such thatPT(Σ) contains infinitendasheven infiniteregularndashlanguages Infinitely many of theseare unbounded regular sets of trajectories

Theorem 919Let T sube 01lowast be a set of trajectories such that there exists nge 0such that Tsube 0len1(0+ 1)lowast Then for allΣ with |Σ| gt 1 PT(Σ) contains an infiniteregular language

We now turn to defining setsT of trajectories such that allT-codes are finiteThe following proof is generalized from the caseH = (0 + 1)lowast found in egLothaire [74] or Conway [10 pp 63ndash64]

Lemma 920 Let nm ge 1 be such that m| n Let Tnm = (0n + 1m)lowast0lenminus1 ThenTnm isin FH

As another class of examples Ehrenfeuchtet al[29 p 317] note that1n0lowast isinFH for all n ge 1 Ilie [38 Sect 77] also gives a class of partial orders which wemay phrase in terms of sets of trajectories In particular define the set of functions

G = g Nrarr N g(0) = 0 and 1le g(n) le n for all n ge 1

27

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

Then for allg isin G we define

Tg = 1lowast

mprodk=1

(0ik1lowast) ik ge 0 forall1 le k le m m= g(msum

k=1

ik)

We denote theupper limit of a sequencesnnge1 by limnrarrinfinsn We have thefollowing result [38 Thm 778]

Theorem 921Let gisin G Then Tg isin FH lArrrArr limnrarrinfinn

g(n) lt infin

However a complete characterization is still open

Open Problem 922Give necessary and sufficient language-theoretic conditionson a set of trajectories T so that Tisin FH

We now turn to the complexity ofT-convex languages

Theorem 923Let C be a cone Let Tisin F(po)H be an element ofC Then every

T-convex language is an element ofC and co-C

Corollary 924 Let T isin (resp) be such that Tisin F(po)H If L is a T-convex

language then Lisin (resp)

Corollary 924 was known for the case ofH = (0 + 1)lowast and L isin seeThierrin [92 Cor to Prop 3]

94 Codes defined by Multiple Sets of Trajectories

When studyingT-codes we note that ifT1T2 sube 01lowast are sets of trajectoriesthere is not necessarily a set of trajectoriesT such thatωT = ωT1 cap ωT2 ie suchthatxωT y lArrrArr (xωT1 y)and (xωT2 y) For instance forP = 0lowast1lowast andS = 1lowast0lowast therelationωP cap ωS is given byled wherex led y if and only if there existu v isin Σlowast

such thaty = xu= vx This relation cannot be represented by a set of trajectoriesFor a discussion ofled see Shyr [88 Ch 8]

In fact there exist many natural classes of languages studied in connection tothe theory of codes which are notT-codes Classes and their associated binaryrelations studied by Day and Shyr [13] Fanet al [30] Ito et al [39] Long [71]Long et al [73 72] Shyr [88] Yu [95] and the author [18] are instead definedby a binary relation dependent on multiple sets of trajectories The author andK Salomaa [27] have studied the properties of classes of languages defined bymultiple sets of trajectories

28

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

Let T sube 201lowast

We call such a set of sets of trajectoriesT a hyperset oftrajectories such a hyperset of trajectories is always assumed to be a finite set ofsets of trajectories DefineωT as

xωT y lArrrArrandTisinT

xωT y

That isxωT y if and only if xωT y for all T isin TThe classP(and)

T (Σ) is defined as follows for all non-empty languagesL sube Σ+L isin P(and)

T (Σ) if and only if L is an anti-chain underωT That is for allx y isin L ifxωT y thenx = y

The definition ofP(and)T (Σ) is motivated by the interest in the classP(and)

Tps(Σ) for

Tps = 0lowast1lowast1lowast0lowast Note thatxωTps y ie x led y implies thatx is both a prefixand a suffix of y We refer the reader to Juumlrgensen and Konstantinidis [46 pp550ndash551] for references and a discussion ofP(and)

Tps(Σ)

We also define a second class of languages indexed by an integerm which isconsidered in conjunction withP(and)

T (Σ) for particularT For allmge 0 letP(m)T (Σ)

be defined as follows for all non-empty languagesL sube Σ+ L isin P(m)T (Σ) if and

only if for all Lprime sube L with |Lprime| le m Lprime isin cupTisinTPT(Σ)The following hypersets of trajectories have been studied in connection with

the associated classP(m)T (Σ)

(i) Tps = 0lowast1lowast0lowast1lowast The classP(m)Tps

(Σ) is known as the class ofm-prefix-suffixcodes(or m-ps-codes) See Itoet al [39] for details

(ii) T io = 0lowast1lowast0lowast1lowast0lowast1lowast [73 18] The classP(m)T io

(Σ) is known as the class ofm-infix-outfix codes

(iii) Tkminusio = (1lowast0lowast)k1lowast (0lowast1lowast)k0lowast andTkminusps = (0lowast1lowast)k (1lowast0lowast)k for k ge 1 TheclassP(m)

Tkminusio(Σ) (respP(m)

Tkminusps(Σ)) is known as the class ofm-k-infix-outfix

codes(respm-k-prefix-suffix codes) For results on these classes see Longet al [72 Sect 4] or Long [71 Sect 23])

The following lemma [27] states thatP(and)T (Σ) andP(2)

T (Σ) always coincide

Lemma 925LetT sube 201lowast

be a hyperset of trajectories Then

P(and)T (Σ) = P(2)

T (Σ)

Lemma 925 was previously observed for eg the caseTps = 0lowast1lowast1lowast0lowast seeIto et al[39] The following equations detail the hierarchies induced by varyingmin P(m)

T (Σ) and their collapse These equations which hold for allT sube 201lowast

can

29

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

be proven using dependency theory [46] (in the following|T| is the cardinality ofT as a subset of 201

lowast

)

P(n)T (Σ) supe P

(n+1)T (Σ) foralln ge 0 (6)

P(2|T|)T (Σ) = P

(2|T|+i)T (Σ) =

⋃TisinT

PT(Σ) foralli ge 0 (7)

See eg Itoet al [39 Cor 32] for (7) in the particular case ofTps = 0lowast1lowast1lowast0lowastThe positive decidability of membership inP(and)

Tps(Σ) for regular languages (see

Ito et al[39] or Juumlrgensenet al[47]) relies intrinsically on the nature of the mem-bers ofTps The corresponding positive decidability problem forT io also relies onthe nature of the sets of trajectories involved [18] Kariet al [51 Thm 47] haveresolved the decidability of a somewhat similar decision problem for two sets oftrajectories in their framework ofbond-free property(see Section 10) Howevertheir approach is not applicable to our formalism We recall a particular case oftheir result translated into our framework

Theorem 926Let T = T1T2 be a hyperset of trajectories where Ti isin fori = 12 Given a regular language Rsube Σlowast we can determine whether there existw1w2 isin R and wisin Σ+ such that wi w and wi ωTi w for i = 12

However the following surprising undecidability result holds forP(and)T (Σ) [27]

Theorem 927GivenT = T1T2 where Ti isin for i = 12 and a regularlanguage R it is undecidable whether or not Risin P(and)

T (Σ)

The following undecidability result also holds [27]

Theorem 928There exists a fixed hyperset of trajectoriesT = T1T2 whereTi isin for i = 12 such that the following problem is undecidable ldquoGivenL isin is L isin P(and)

T (Σ)rdquo

However the decidability of membership inP(and)T (Σ) for a fixed T remains

open

Open Problem 929For which hypersets of trajectoriesT sube 201lowast

is the follow-ing problem decidable ldquoGiven Lisin is L isin P(and)

T (Σ)rdquo

It is conceivable that the question stated in Open Problem 929 could be de-cidable for all hypersetsT = T1 Tn whereTi isin for 1 le i le n inparticular if the alphabetΣ is fixed If this is the case by Theorem 927 givenTthe corresponding algorithm cannot be found effectively

The author and K Salomaa have also studied the equivalence problem forhypersets of trajectories In particular givenT1T2 sube 201

lowast

we say thatT1 andT2

areand-equivalent with respect toΣ if P(and)T1

(Σ) = P(and)T2

(Σ) We simply say thatT1T2

areand-equivalent if they areand-equivalent with respect to every finite alphabetΣWe use the notationT1 equivand T2 to indicate thatT1T2 areand-equivalent

30

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

Open Problem 930GivenT1T2 each consisting of regular sets of trajectoriescan we determine whetherT1 andT2 areand-equivalent

We can restate Open Problem 930 as follows

Open Problem 931For given regular languages T1 middot middot middot Tk sube 01lowast and U sube01lowast is it decidable whether or not there exist an alphabetΣ and x y isin Σ+ suchthat for all 1 le i le k yisin x Ti Σ

+ but ylt x U Σ+

This problem seems very challenging Similarly we say thatT1T2 arem-equivalent with respect toΣ if P(m)

T1(Σ) = P(m)

T2(Σ) Again we say thatT1T2

arem-equivalent if they arem-equivalent with respect to every finite alphabetΣWe use the notationT1 equivm T2 to indicate thatT1T2 are m-equivalent OpenProblems 930 and 931 also remain open whereand-equivalence is replaced bym-equivalence

10 DNA Code-word Design

The design of DNA code-words is a crucial step in employing DNA for computingpurposes DNA code-words are strands of DNA which allow only desired bondingto occur careful consideration must be taken when designing code-words Karietal [54] have used trajectories to investigate DNA bonding and code-word design

An involution θ Σ rarr Σ is any function such thatθ2 is equal to the identitymapping Any involution can be extended to a morphism via the ruleθ(xy) =θ(x)θ(y) for all x y isin Σlowast or an antimorphism via the ruleθ(xy) = θ(y)θ(x) for allx y isin Σlowast

Kari et al [54] define the concept of thebond-free propertiesof languages Inparticular the bond-free property with respect to the sets of trajectoriesTloTup isgiven by the following condition (for some involutionθ)

forallw isin Σ+ x y isin Σlowast(w Tlo xcap L emptyw Tup ycap θ(L) empty)rArr xy= ε

Several previously studied DNA coding-inspired conditions are particular casesof the bond-free properties for particular pairs of sets of trajectories This againdemonstrates the power of trajectoriesndashby approaching a problem from a uniformtrajectory-based viewpoint many particular cases can be unified and studied sys-tematically The following decidability result shows that the bond-free property isdecidable if all languages involved are regular [54]

Theorem 101Let TloTup be regular sets of trajectories Given a regular lan-guage L it is decidable in quadratic time whether L satisfies the bond-free prop-erty with respect to TloTup

31

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

11 Contextual Grammars

Contextual grammar with contexts shuffled along trajectories orCST grammarsare an interesting application of trajectories to the study of the generative capacityof grammar forms CST grammars were introduced by Martin-Videet al[75] Werecall the definition here A CST grammar is a 4-tupleG = (Σ BCT ) whereΣis an alphabetBC sube Σlowast are finite languages andT = (Tc)cisinC is a finite familyof sets of trajectories indexed by wordsc from C The languageB is called thebase ofG andC is called the contexts ofG Derivations inG are given byxrArrG yif and only if there existsc isin C such thaty isin x Tc c The reflexive transitiveclosure ofrArrG is denoted byrArrlowastG The language ofG is the set of all words whichare derivable from a word in the base ofG

L(G) = w isin Σlowast existx isin B such thatxrArrlowastG w

Recently Okhotin and K Salomaa [80] have investigated uniform CST gram-mars A CST grammarG = (Σ BCT ) is said to beuniform if Tc = Tcprime for allc cprime isin C In this case we denote the uniform CST byG = (Σ BCT) for someT sube Σlowast Okhotin and K Salomaa demonstrate several results relating to uniformCST grammars

Theorem 111 There exists a language Lsube Σlowast where |Σ| ge 2 that cannot begenerated by any uniform CST grammar

Theorem 111 does not depend in any way on the complexity of the set oftrajectories the languageL cannot be generated by a uniform CST grammarGregardless of the complexity ofT However Okhotin and K Salomaa show thatthis same languageL can be generated by a non-uniform CSTG = (Σ BCT )where eachTc isin T is a context-sensitive set of trajectories

Theorem 112 Non-uniform CST grammars with context-sensitive sets of tra-jectories are strictly more powerful than uniform CST grammars with context-sensitive sets of trajectories

The following questions remain open [80]

Open Problem 113Are non-uniform CST grammars with context-free (respregular) sets of trajectories more powerful than uniform CST grammars withcontext-free (resp regular) sets of trajectories

We also note that Mateescu has also extended the notion of co-operating dis-tributed grammars (CD grammars) to encompass the notion of trajectories [76]A CD grammar on trajectoryT is a six-tupleΓ = (VΣSP0P1T) whereV is

32

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

a finite set of non-terminalsΣ is a finite alphabetS isin V is a distinguished startstateP0P1 sube V times (V cup Σ)lowast are two finite sets of productions andT sube 01lowast isthe set of trajectories

LetrArri denote the relation defined by the CFGΓi = (VΣSPi) for i = 01Then a wordw isin Σlowast is generated byΓ if there existt isin T of lengthn andαi isin

(V cup Σ)lowast for 1 le i le n such that ift = t1t2 middot middot middot tn with ti isin 01 then for all1 le i le n minus 1 αi rArrti αi+1 with S = α1 andw = αn The usual notion of a CDgrammar corresponds toT = 0lowast1lowast The notion of CD grammars on trajectoriesis also generalized to grammars withn sets of productionsP0P1 Pnminus1 and aset of trajectoriesT sube 0 nminus 1lowast

12 Conclusion

The notion of trajectories can seem deceptively simple languages are used toparameterize language operations This provides a basis for uniform results onlanguage operations However in the ten years since their introduction trajecto-ries have seen much use beyond modelling language operations we have surveyedthese areas The use of trajectories in many varied areas is a testament to the el-egance of the concept We are confident that interest in trajectories will remainhigh as researchers continue to find new applications for the concept

Acknowledgments

I am grateful to Kai Salomaa for reading a version of this survey

References

[1] A V P G On a family of linear grammarsInf and Cont 7(1964)283ndash291

[2] A V P G Generalizations of regular eventsInf and Cont 8(1965) 56ndash63

[3] B-E G M-V C M A Dialog and splicing on routesRomanian Journal of Information Science and Technology 6 1-2 (2003) 45ndash59

[4] B J B L C O P B P J-E Operations pre-serving recognizable languages InFundamentals of Computation Theory (FCT2003)(2003) A Lingas and B Nilsson Eds vol 2751 ofLecture Notes in Com-puter Science Springer pp 343ndash354

33

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

[5] B J P D Theory of Codes Available athttpwww-igmuniv-mlvfr7EberstelLivreCodesCodeshtml 1996

[6] B V P D Maximal bifix codesTheor Comp Sci 218(1999)107ndash121

[7] C C S K V S Shuffle decompositions of regularlanguagesInt J Found Comp Sci 13 6 (2002) 799ndash816

[8] C C S K Y S Tight lower bound for the state complexityof shuffle of regular languagesJ Automata Languages and Combinatorics 7 3(2002) 303ndash310

[9] C C K J On Fatou properties of rational languagesIn Where Mathematics Computer Science Linguistics and Biology Meet(2000)C Martin-Vide and V Mitrana Eds Kluwer pp 227ndash235

[10] C J Regular Algebra and Finite Machines Chapman and Hall 1971

[11] D M I O K L Closure properties and decision questions ofsome language classes under ciliate bio-operationsTheor Comp Sci 306 1 (2003)19ndash38

[12] D M MQ I Template-guided DNA recombinationTheor CompSci 330(2005) 237ndash250

[13] D P-H S H Languages defined by some partial ordersSoochow JMath 9(1983) 53ndash62

[14] L A V S Regularity and finiteness conditions pp 747ndash810In [84] volume I

[15] D V M Y Partial commutation and traces pp 457ndash534 In [84]volume III

[16] D M State complexity of proportional removalsJ Automata Languagesand Combinatorics 7 4 (2002) 455ndash468

[17] D M Deletion along trajectoriesTheor Comp Sci 320 2ndash3 (2004)293ndash313

[18] D M On the decidability of 2-infix-outfix codes Tech Rep 2004-479School of Computing Queenrsquos University 2004

[19] D M Semantic shuffle on and deletion along trajectories InDevel-opments in Language Theory(2004) C Calude E Calude and M Dineen Edsvol 3340 ofLNCS Springer pp 163ndash174

[20] D M Trajectory-based codesActa Inf 40 6ndash7 (2004) 491ndash527

[21] D M Trajectory-based embedding relationsFund Inf 59 4 (2004)349ndash363

[22] D M Trajectory-Based Operations PhD thesis Queenrsquos University2004

34

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

[23] D M M A S K Y S Deletion along trajec-tories and commutative closure InProceedings of WORDSrsquo03 4th InternationalConference on Combinatorics on Words(2003) T Harju and J Karhumaumlki Edspp 309ndash319

[24] D M S K State complexity of shuffle on trajectories InDescriptional Complexity of Formal Systems (DCFS)(2002) J Dassow M Hoe-berechts H Juumlrgensen and D Wotschke Eds pp 95ndash109 To appear J AutomataLanguages and Combinatorics

[25] D M S K Decidability of trajectory-based equations InMathematical Foundations of Computer Science 2004(2004) J Fiala V Koubekand J Kratochvil Eds vol 3153 ofLNCS Springer pp 723ndash734 To appear TheorComp Sci

[26] D M S K Restricted sets of trajectories and decidabilityof shuffle decompositions InDescriptional Complexity of Formal Systems (DCFS2004)(2004) L Ilie and D Wotschke Eds pp 37ndash51 To appear Int J FoundComp Sci

[27] D M S K Codes defined by multiple sets of trajectories ToappearAFL 2005(2005)

[28] E A H T P I P DR GComputationin Living Cells Gene assembly in ciliates Springer 2004

[29] E A H D R G On regularity of context-freelanguagesTheor Comp Sci 23(1983) 311ndash332

[30] F C-M S H Y S Sd-Words andd-languagesActa Inf 35(1998)709ndash727

[31] G Y S H T G E-Convex infix codesOrder 3(1986) 55ndash59

[32] H L On free monoids partially ordered by embeddingJ Comb Th 6(1969)94ndash98

[33] H T I L On quasi orders of words and the confluence propertyTheorComp Sci 200(1998) 205ndash224

[34] H T M A S A Shuffle on trajectories The Schuumltzen-berger product and related operations InMathematical Foundations of ComputerScience 1998(1998) L Brim J Gruska and J Zlatuska Eds no 1450 in LectureNotes in Computer Science Springer pp 503ndash511

[35] H G Ordering by divisibility in abstract algebrasProc London Math Soc2 3 (1952) 326ndash336

[36] H M K M State complexity of basic operations on nondetermin-istic finite automata InImplementation and Application of Automata 7th Inter-national Conference CIAA 2002(2003) J-M Champarnaud and D Maurel Edsvol 2608 ofLecture Notes in Computer Science Springer pp 148ndash157

35

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

[37] I L Remarks on well quasi orders of words InProceedings of the 3rd DLT(1997) S Bozapalidis Ed pp 399ndash411

[38] I L Decision Problems on Orders of Words PhD thesis University of Turku1998

[39] I M J H S H T G n-prefix-suffix languagesIntlJ Comp Math 30(1989) 37ndash56

[40] I M J H S H T G Outfix and infix codes and relatedclasses of languagesJ Comp Sys Sci 43(1991) 484ndash508

[41] I M K L T G Shuffle and scattered deletion closure of lan-guagesTheor Comp Sci 245(2000) 115ndash133

[42] I M S P Remark on deletions scattered deletions and related operationson languages InSemigroups and Applications(1998) J Howie and N RuskucEds World Scientific pp 97ndash105

[43] J J J G S A State complexity of concatenation andcomplementation of regular languages InImplementation and Application of Au-tomata(2004) M Domaratzki A Okhotin K Salomaa and S Yu Eds vol 3317of LNCS Springer pp 178ndash189

[44] J G State complexity of some operations on regular languages InDe-scriptional Complexity of Formal Systems Fifth International Workshop(2003)E Csuhaj-Varjuacute C Kintala D Wotschke and G Vaszil Eds pp 114ndash125

[45] J P Sur un theacuteoregraveme drsquoextension dans la theacuteorie des motsCR Acad Sc Paris(Seacuterie A) 266(1968) 851ndash854

[46] J H K S Codes pp 511ndash600 In [84] volume I

[47] J H S K Y S Transducers and the decidability of inde-pendence in free monoidsTheor Comp Sci 134(1994) 107ndash117

[48] K A D V T D S K Algebraic properties of theshuffle overω-trajectoriesInf Proc Letters 80 3 (2001) 139ndash144

[49] K L Generalized derivativesFund Inf 18(1993) 27ndash39

[50] K L On language equations with invertible operationsTheor Comp Sci 132(1994) 129ndash150

[51] K L K S S P On properties of bond-free DNA lan-guages Tech Rep 609 Computer Science Department University of Western On-tario 2003

[52] K L K S S P Substitutions on trajectories InThe-ory is Forever(2004) J Karhumaumlki H Maurer G Paun and G Rozenberg Edsvol 3113 ofLNCS Springer pp 145ndash158

[53] K L K S S P Substitutions trajectories and noisychannels InImplementation and Application of Automata(2004) M DomaratzkiA Okhotin K Salomaa and S Yu Eds vol 3317 ofLNCS Springer pp 202ndash212

36

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

[54] K L K S S P On properties of bond-free DNA lan-guagesTheor Comp Sci 334(2005) 131ndash159

[55] K L S P Language deletions on trajectories Tech Rep 606 ComputerScience Department University of Western Ontario 2003

[56] K L S P Aspects of shuffle and deletion on trajectoriesTheor CompSci 332 1ndash3 (2005) 47ndash61

[57] K L T G k-catenation and applicationsk-prefix codesJ Inf OptSci 16 2 (1995) 263ndash276

[58] K L T G Contextual insertionsdeletions and computabilityInfand Comp 131(1996) 47ndash61

[59] K L T G Maximal and minimal solutions to language equationsJ Comp Sys Sci 53(1996) 487ndash496

[60] K S Correction to ldquoRegularity Preserving FunctionsrdquoACM SIGACT News6 3 (1974) 22

[61] K S Regularity preserving functionsACM SIGACT News 6 2 (1974)16ndash17

[62] K S Context-free preserving functionsMath Sys Theor 9 3 (1975)

[63] K D On regularity-preserving functions Tech Rep TR95-1559 Departmentof Computer Science Cornell University 1995

[64] K M Simple language equationsBull Eur Assoc Theor Comp Sci 85(2005)81ndash102

[65] L N Finite maximal infix codesSemigroup Forum 61(2000) 346ndash356

[66] L M R Y Synchronized shuffle and regular languages InJewelsare Forever Contributions on Theoretical Computer Science in Honor of Arto Salo-maa(1999) J Karhumaumlki H Maurer G Paun and G Rozenberg Eds Springerpp 35ndash44

[67] L E Language Equations Monographs in Computer Science Springer 1999

[68] L D On nilpotency of the syntactic monoid of a language InWords Lan-guages and Combinatorics II(1992) M Ito and H Juumlrgensen Eds World Scien-tific pp 279ndash293

[69] L D On two infinite hierarchies of prefix codes InProceedings of the Confer-ence on Ordered Structures and Algebra of Computer Languages(1993) K Shumand P Yuen Eds World Scientific pp 81ndash90

[70] L D k-bifix codesRivista di Matematica Pura ed Applicata 15(1994) 33ndash55

[71] L D Study of Coding Theory and its Application to Cryptography PhD thesisCity University of Hong Kong 2002

[72] L D J W M J Z D k-p-infix codes and semaphore codesDiscAppl Math 109(2001) 237ndash252

37

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

[73] L D M J Z D Structure of 3-infix-outfix maximal codesTheorComp Sci 188(1997) 231ndash240

[74] L M Combinatorics on Words Addison-Wesley 1983

[75] M-V C M A R G S A Contexts on tra-jectoriesIntl J Comp Math 73 1 (1999) 15ndash36

[76] M A CD grammar systems and trajectoriesActa Cyb 13 2 (1997) 141ndash157

[77] M A Splicing on routes a framework of DNA computation InUnconven-tional Models of Computation(1998) C Calude J Casti and M Dinneen EdsSpringer pp 273ndash285

[78] M A Words on trajectoriesBull Eur Assoc Theor Comp Sci 65(1998)118ndash135

[79] M A R G S A Shuffle on trajectories SyntacticconstraintsTheor Comp Sci 197(1998) 1ndash56

[80] O A S K Contextual grammars with uniform sets of trajecto-ries Fund Inf 64(2005) 341ndash351

[81] P J-E S J Some operations and transductions that preserve ra-tionality In 6th GI Conference(1983) A Cremers and H-P Kriegel Eds vol 145of Lecture Notes in Computer Science Springer pp 277ndash288

[82] P D E A R G Template-guided recombinationfor ies elimination and unscrambling of genes in stichotrichous ciliatesJ TheoretBiol 222(2003) 323ndash330

[83] P GS A Thin and slender languagesDisc Appl Math 61(1995)257ndash270

[84] R G S A Eds Handbook of Formal Languages Springer1997

[85] S A Y S On the decomposition of finite languages InDevelopmentsin Language Theory(1999) G Rozenberg and W Thomas Eds pp 22ndash31

[86] S J MN R Regularity-preserving relationsTheor CompSci 2(1976) 147ndash154

[87] S J Numeration systems linear recurrences and regular setsInf and Comp113 2 (1994) 331ndash347

[88] S H Free Monoids and Languages Hon Min Book Company Taichung Tai-wan 2001

[89] S H T G HypercodesInf and Cont 24 1 (1974) 45ndash54

[90] S R H J Regularity preserving modifications of regular ex-pressionsInf and Cont 6 1 (1963) 55ndash69

38

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion

[91] S A Y S Z K S J Characterizing regular languageswith polynomial densities InMathematical Foundations of Computer Science 1992(1992) I Havel and V Koubek Eds vol 629 ofLecture Notes in Computer Sci-ence Springer pp 494ndash503

[92] T G Convex languages InAutomata Languages and Programming Col-loquium Paris France(1972) M Nivat Ed pp 481ndash492

[93] L J Effective constructions in well-partially ordered free monoidsDiscMath 21(1978) 237ndash252

[94] Y S Z Q S K The state complexities of some basic opera-tions on regular languagesTheor Comp Sci 125(1994) 315ndash328

[95] Y S Sd-Minimal languagesDisc Appl Math 89(1998) 243ndash262

[96] Z G-Q Automata boolean matrices and ultimate periodicityInf and Comp152(1999) 138ndash154

[97] Z L S Z Completion of recognizable bifix codesTheor Comp Sci145(1995) 345ndash355

39

  • Introduction
  • Shuffle on Trajectories
  • Deletion along Trajectories
    • Non-regular Trajectories Preserving Regularity
    • Algebraic Properties
    • Commutative Closure
      • Language Equations
        • Zero Variable Equations
        • One Variable Equations
        • Two Variable Equations
          • Splicing on Routes
          • Semantic Shuffle on Trajectories
          • Descriptional Complexity
          • Substitution on Trajectories
          • Theory of Codes
            • The Binary Relation Defined by Shuffle on Trajectories
            • Maximal T-codes
            • Embedding and Finiteness
            • Codes defined by Multiple Sets of Trajectories
              • DNA Code-word Design
              • Contextual Grammars
              • Conclusion