formalisation of parenthesis-free languages



Zeitschr. f. math. Logik und Grundlagen d. Math. Bd. 13, 177-186 (1967)

FORMALISATION OF PARENTHESIS-FREE LANGUAGES

by ANDRZEJ J. BLIKLE in Warsaw

Introduction

In this paper we shall treat several parenthesis-free languages, the concept of which was introduced by Z. PAWLAK [4]. More precisely, our intention is to describe a generative grammar (in the sense of CHOMSKY) for these languages and then to use these languages to describe processes (see BLIKLE [1]).

We shall also show that every arithmetical formula can be written in each of the languages.

1. Syntax

Let the following alphabet $A_G$ be given:

$$*,\ \Sigma,\ \Gamma,\ \delta,\ \Theta,\ \gamma,\ A, B, C, \ldots,\ a, b, c, \ldots$$

with the supplementary assumption that the Latin small letters and capitals may have arbitrary indices.

We now consider a generative grammar $G$ given by the rules:1)

1. $\Gamma \to \Sigma\Gamma$
2. $\Gamma \to \delta\Gamma\Gamma$
3. $\Gamma \to \Theta\gamma\gamma$
4. $\Sigma \to \Theta{*}\gamma,\ \Theta\gamma{*}$
5. $\delta \to \Theta{*}{*}$
6. $\Theta \to A, B, C, \ldots$ (possibly with indices)
7. $\gamma \to a, b, c, \ldots$ (possibly with indices)

where $\Gamma$ is the initial symbol.

This is a simple phrase-structure grammar (see Y. BAR-HILLEL [2]). Let $L_G$ denote the language generated by $G$, and let $L_{GT}$ denote its terminal language.

The problem arises of deciding whether a given word written in the alphabet $A_G$ is a member of $L_G$ or not. Let us consider all words in $A_G$ of the form:

$$\Omega_1 \Omega_2 \ldots \Omega_n,$$

where every $\Omega_i$ is of one of the following seven types:

$$\Gamma,\ \Sigma,\ \delta,\ \Lambda\alpha\beta,\ \Lambda{*}\beta,\ \Lambda\alpha{*},\ \Lambda{*}{*},$$

and where:

(i) $\Lambda$ is a variable for the symbols $\Theta, A, B, \ldots$ (with indices),
(ii) $\alpha$ and $\beta$ are variables for the symbols $\gamma, a, b, \ldots$ (with indices).

1) A statement of the form $X \to Y$ means: replace $X$ by $Y$.


Now let $\lambda$ be a real-valued function defined on all words of the above type as follows:

1. $\lambda(\Gamma) = \lambda(\Lambda\alpha\beta) = 1$
2. $\lambda(\Sigma) = \lambda(\Lambda{*}\beta) = \lambda(\Lambda\alpha{*}) = 0$
3. $\lambda(\delta) = \lambda(\Lambda{*}{*}) = -1$
4. $\lambda(\Omega_1 \ldots \Omega_n) = \lambda(\Omega_1) + \ldots + \lambda(\Omega_n)$.

Definition: A word $\Omega_1 \ldots \Omega_n$ is said to be well-formed, or is said to be a formula, if:

1. $\lambda(\Omega_1 \ldots \Omega_n) = 1$
2. $(\forall\, 1 \le i \le n)\ [\lambda(\Omega_i \Omega_{i+1} \ldots \Omega_n) > 0]$.

Now let $L_\lambda$ denote the set of all words well-formed in this sense and let $F_i$ denote $\Omega_i \Omega_{i+1} \ldots \Omega_n$ where $F$ denotes $\Omega_1 \ldots \Omega_n$.
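The definition of a well-formed word can be checked mechanically. The following is a minimal sketch for words over the three reduced symbols only; the ASCII letters `G`, `S`, `D` (standing for the initial symbol, the one-argument symbol and the two-argument symbol) and the names `lam` and `is_well_formed` are assumptions of this illustration and do not come from the paper.

```python
from itertools import accumulate

# Weights assigned by the function on words: +1, 0 and -1.
# "G", "S", "D" are ASCII stand-ins chosen for this sketch.
WEIGHT = {"G": 1, "S": 0, "D": -1}

def lam(word):
    """Sum of the weights of all symbols of `word` (condition 4)."""
    return sum(WEIGHT[s] for s in word)

def is_well_formed(word):
    """Check both conditions of the definition of a formula."""
    if not word or lam(word) != 1:
        return False
    # Running sums over the reversed word are exactly the suffix sums.
    suffix_sums = accumulate(WEIGHT[s] for s in reversed(word))
    return all(total > 0 for total in suffix_sums)
```

For instance, `is_well_formed("DSGG")` holds, while `"GGD"` has total weight 1 but fails the second condition because its last suffix has weight -1.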

Lemma 1. If a word in the alphabet $\{\Gamma, \delta, \Sigma\}$ is well-formed, then every other word obtained from it by generative rules 4., 5., 6., and 7. is also well-formed and conversely: to every formula there corresponds a formula in $\{\Gamma, \delta, \Sigma\}$ from which it may be obtained.

The proof is obvious.

Theorem 1. $L_G = L_\lambda$, i.e. the grammar $G$ generates only well-formed words and conversely every well-formed word can be generated by $G$.

Proof. Let us prove first that $L_G \subset L_\lambda$. By means of lemma 1 we may consider a reduced grammar:

1. $\Gamma \to \Sigma\Gamma$
2. $\Gamma \to \delta\Gamma\Gamma$.

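The one-letter word $\Gamma$ is the shortest word generated by $G$ and is of course well-formed.

Let $F = \Omega_1 \ldots \Omega_n \in L_G$ be well-formed and let $\Omega_i = \Gamma$. Using rule 1 we obtain a new word $F' = \Omega_1 \ldots \Omega_{i-1}\,\Sigma\,\Gamma\,\Omega_{i+1} \ldots \Omega_n$, but

$$\lambda(F') = \lambda(\Omega_1 \ldots \Omega_{i-1}) + \lambda(\Sigma) + \lambda(\Gamma) + \lambda(\Omega_{i+1} \ldots \Omega_n) = \lambda(\Omega_1 \ldots \Omega_{i-1}) + \lambda(\Omega_i) + \lambda(\Omega_{i+1} \ldots \Omega_n) = \lambda(F) = 1$$

and, as is easy to see, $\lambda(F'_j) = \lambda(F_j) > 0$ for every $1 \le j \le i$, while $F'_j = F_{j-1}$ for $i < j \le n + 1$; thus $F'$ is also well-formed.

An analogous proof may be given for rule 2. Therefore $L_G \subset L_\lambda$. We shall now prove the converse inclusion: $L_\lambda \subset L_G$.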

The shortest word in $L_\lambda$ is $\Gamma$ and it is of course a member of $L_G$. Suppose now that for every $k \le n$ every well-formed word $\Omega_1 \ldots \Omega_k$ can be generated by $G$, and let us consider a well-formed word $F$: $\Omega_1 \ldots \Omega_{n+1}$.


By the definitions of a formula and of the function $\lambda$ the last symbol of this word must be $\Gamma$. Thus let $\Omega_i$ be the first symbol from the right side which is not a $\Gamma$. Therefore $\Omega_i$ is either $\Sigma$ or $\delta$. In the first case, $\Omega_{i+1}$ is a $\Gamma$ and $i + 1 \le n + 1$; in the second case $\Omega_{i+1}$ and $\Omega_{i+2}$ are symbols $\Gamma$ and $i + 2 \le n + 1$.

Suppose that $\Omega_i$ is $\delta$. Thus $F$ is of the form:

$$\Omega_1 \ldots \Omega_{i-1}\,\delta\,\Gamma\,\Gamma\,\Omega_{i+3} \ldots \Omega_{n+1}.$$

Let us consider now the word

$$F^0:\ \Omega_1 \ldots \Omega_{i-1}\,\Gamma\,\Omega_{i+3} \ldots \Omega_{n+1}.$$

It is easy to see that if $F$ is a formula then $F^0$ is also a formula. From the inductive assumption $F^0$ may thus be generated by $G$, but $F$ may be generated from $F^0$. Hence $F \in L_G$. The proof is analogous if $\Omega_i$ is $\Sigma$. Q. e. d.

By this theorem we shall use the notation: $L = L_G = L_\lambda$ and $L_{GT} = L_T$.
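Theorem 1 can be confirmed by brute force for short words over the reduced alphabet. The sketch below enumerates every word derivable from the initial symbol by the two reduced rules and compares the result with the set of words satisfying the definition of a formula. The ASCII symbols `G`, `S`, `D` and the function names are assumptions of this illustration, and a finite check of this kind is of course no substitute for the proof.

```python
from itertools import accumulate, product

WEIGHT = {"G": 1, "S": 0, "D": -1}  # ASCII stand-ins for the reduced symbols

def is_well_formed(word):
    """Total weight 1 and every non-empty suffix of positive weight."""
    if not word or sum(WEIGHT[s] for s in word) != 1:
        return False
    return all(t > 0 for t in accumulate(WEIGHT[s] for s in reversed(word)))

def generated(max_len):
    """Words derivable from "G" by G -> SG and G -> DGG, up to max_len symbols.

    Both rules lengthen the word, so longer intermediate words can be dropped.
    """
    seen = {"G"}
    frontier = ["G"]
    while frontier:
        word = frontier.pop()
        for i, symbol in enumerate(word):
            if symbol != "G":
                continue
            for rhs in ("SG", "DGG"):
                new = word[:i] + rhs + word[i + 1:]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    frontier.append(new)
    return seen

def well_formed(max_len):
    """All well-formed words of at most max_len symbols, found by exhaustion."""
    return {"".join(w)
            for n in range(1, max_len + 1)
            for w in product("GSD", repeat=n)
            if is_well_formed(w)}
```

Comparing `generated(7)` with `well_formed(7)` exercises both inclusions of the theorem on all words of at most seven symbols.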

2. Interpretation

As mentioned in the introduction to this paper, the language introduced in section 1 will be used to describe processes and, more precisely, adequate processes.

The notion of adequate process was formulated by the author in [1] and may be briefly characterized as an algebra $\mathfrak{A} = (A, R_3)$ where $A$ is a set called the universe of the process or the set of elements of the process, and $R_3$ is a three-argument relation1) defined in $A$ and satisfying certain conditions. These conditions restrict the relation $R_3$ in such a way that if $A$ is interpreted as the set of nodes of a graph and every triple $(a_0, a_1, a_2)$ satisfying $R_3$, i.e. such that $R_3(a_0, a_1, a_2)$, is interpreted as a subgraph of the form

[...]

corresponds to the set of all free nodes of a tree, and

[...]

corresponds to the root of the tree. Thus every process may be considered in some way as isomorphic with some tree.

1) The case when a process is described by a two-argument relation will not be discussed in this paper, but it may easily be reduced to the one considered.


As was shown by Z. PAWLAK [4], every tree can be described in our language in two different ways depending upon the order of nodes considered ($\bar\omega$ or $\tilde\omega$). Hence for processes, two different algorithms correlating a process to a formula can be expected and we shall discuss them successively.

Thus let $\mathfrak{A} = (A, R_3)$ be a given process and let $R_3$ be represented as a sum of certain subrelations1):

$$R_3 = \bigcup_{i=1}^{n} R^i.$$

Algorithm for order $\bar\omega$:

We introduce a coding function $\bar\omega$ which will be a one-to-one function defined on the set $A \cup \{\Lambda\}$, whose values are symbols and strings of symbols. The algorithm which will be given can be considered as an inductive definition of $\bar\omega$; it is, however, written in CHOMSKY's notation which is more convenient for our purpose:

$$\bar\omega(\Lambda) \to \bar\omega(a)$$

$$\bar\omega(a_i) \to \begin{cases}
R^i * a_{j_2}\ \bar\omega(a_{j_1}) & \text{if } R^i(a_i, a_{j_1}, a_{j_2});\ a_{j_1} \notin \Pi(R_3);\ a_{j_2} \in \Pi(R_3) \\
R^i a_{j_1} *\ \bar\omega(a_{j_2}) & \text{if } R^i(a_i, a_{j_1}, a_{j_2});\ a_{j_1} \in \Pi(R_3);\ a_{j_2} \notin \Pi(R_3) \\
R^i * *\ \bar\omega(a_{j_2})\ \bar\omega(a_{j_1}) & \text{if } R^i(a_i, a_{j_1}, a_{j_2});\ a_{j_1} \notin \Pi(R_3);\ a_{j_2} \notin \Pi(R_3) \\
R^i a_{j_1} a_{j_2} & \text{if } R^i(a_i, a_{j_1}, a_{j_2});\ a_{j_1} \in \Pi(R_3);\ a_{j_2} \in \Pi(R_3).
\end{cases}$$

1) In the case of arithmetical processes, every arithmetical operation can be represented by a relation $R^i$; for example $R^1(x, y, z)$ represents $x = y + z$, $R^2(x, y, z)$ represents $x = yz$, etc.
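The coding in the first order can be sketched as a recursive procedure on a tree. The sketch assumes a representation that is not in the paper: an inner node is a tuple `(op, first, second)`, a given argument (free node) is a plain string, and a subformula is a three-element tuple. It also assumes, as the discussion of the type $R^i * *$ in section 3 indicates, that the code of the second argument is written directly after the subformula and the code of the first argument after that. The name `encode_depth_first` is hypothetical.

```python
def encode_depth_first(node):
    """Return the list of subformulas coding the tree `node` in the first order.

    node = (op, first, second); an argument is either a string (a given
    element) or another tuple of the same shape (a computed element).
    """
    op, first, second = node
    first_given = isinstance(first, str)
    second_given = isinstance(second, str)
    # A computed argument is replaced by the symbol "*".
    out = [(op,
            first if first_given else "*",
            second if second_given else "*")]
    # Assumption of this sketch: the code of the second argument comes first.
    if not second_given:
        out += encode_depth_first(second)
    if not first_given:
        out += encode_depth_first(first)
    return out
```

For the tree of $(a + b) \cdot (c - d)$, written here with `"."` for multiplication, the call `encode_depth_first((".", ("+", "a", "b"), ("-", "c", "d")))` yields the subformulas `. * *`, `- c d`, `+ a b` in that order.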


Let us now consider the order $\tilde\omega$. As in the former case we introduce a coding function $\tilde\omega$; the algorithm is described by the following rules2) (with the supplementary condition that every performance of a substitution in a formula must be done at the first admissible place from the left):

$$\tilde\omega(\Lambda) \to \tilde\omega(a)$$

$$\tilde\omega(a_i)\ \Omega_1 \ldots \Omega_n\ \Lambda \to \begin{cases}
R^i * a_{j_2}\ \Omega_1 \ldots \Omega_n\ \tilde\omega(a_{j_1}) & \text{if } R^i(a_i, a_{j_1}, a_{j_2});\ a_{j_1} \notin \Pi(R_3);\ a_{j_2} \in \Pi(R_3) \\
R^i a_{j_1} *\ \Omega_1 \ldots \Omega_n\ \tilde\omega(a_{j_2}) & \text{if } R^i(a_i, a_{j_1}, a_{j_2});\ a_{j_1} \in \Pi(R_3);\ a_{j_2} \notin \Pi(R_3) \\
R^i * *\ \Omega_1 \ldots \Omega_n\ \tilde\omega(a_{j_1})\ \tilde\omega(a_{j_2}) & \text{if } R^i(a_i, a_{j_1}, a_{j_2});\ a_{j_1} \notin \Pi(R_3);\ a_{j_2} \notin \Pi(R_3) \\
R^i a_{j_1} a_{j_2}\ \Omega_1 \ldots \Omega_n & \text{if } R^i(a_i, a_{j_1}, a_{j_2});\ a_{j_1} \in \Pi(R_3);\ a_{j_2} \in \Pi(R_3).
\end{cases}$$

1) $\Lambda$ denotes an empty symbol.
2) This is not even a grammar in the sense of CHOMSKY, because $n$ (the length of $\Omega_1 \ldots \Omega_n$) is not bounded and thus rules 2 and 3 are in fact schemas of an infinite set of rules.
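Read operationally, the rules for the second order move the codes of computed arguments to the end of the string and always expand the leftmost pending code, which amounts to a level-by-level traversal with a queue. The following sketch illustrates that reading under assumptions of its own: trees are nested tuples `(op, first, second)` with strings for given arguments, and the name `encode_breadth_first` is hypothetical.

```python
from collections import deque

def encode_breadth_first(root):
    """Return the list of subformulas coding `root` in the second order."""
    out = []
    pending = deque([root])          # codes still to be expanded, left to right
    while pending:
        op, first, second = pending.popleft()   # first admissible place from the left
        first_given = isinstance(first, str)
        second_given = isinstance(second, str)
        out.append((op,
                    first if first_given else "*",
                    second if second_given else "*"))
        # Codes of computed arguments are appended at the right end.
        if not first_given:
            pending.append(first)
        if not second_given:
            pending.append(second)
    return out
```

Because pending codes are appended at the right end, the subformulas belonging to one level of the tree always form a block of successive subformulas in the result.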


[...] second substitution. We obtain a new formula $F'$: $\Omega_1 \ldots \Omega_{i-1}\,\Sigma\,\Omega_{i+1} \ldots \Omega_n\,\Gamma$. It is easy to verify (using the function $\lambda$) that if $F \in L$ then $F' \in L$ also. An analogous argument holds of course for the third substitution. Therefore $L_{\tilde\omega} \subset L_T$.

We have shown that the formulas describing adequate processes in both orders are formulas from $L_T$. As we shall see in the next paragraph, the process described by such a formula can be reconstructed only if the order of coding is known. It will be shown that $L_T = L_{\tilde\omega} = L_{\bar\omega}$; thus it is impossible to decide for any formula describing a process in which order it was written. This information must be known to understand the formula properly.

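To prove the above equivalence we shall prove by induction the inclusion $L_T \subset L_{\tilde\omega}$:

The shortest formula from $L_T$ is $\Gamma$ and it can be generated in $L_{\tilde\omega}$.1) Let $F$: $\Omega_1 \ldots \Omega_n \in L_T$; thus by the definition of a formula there exists an $s < n$ such that

$$\Omega_s,\ \Omega_{s+1},\ \ldots,\ \Omega_n$$

are symbols $\Gamma$. Suppose that $\Omega_{s-1}$ is not a $\Gamma$ and is, for example, a $\delta$. Then we have

$$F:\ \Omega_1 \ldots \Omega_{s-2}\,\delta\,\Gamma \ldots \Gamma.$$

Consider

$$F^0:\ \Omega_1 \ldots \Omega_{s-2}\,\Gamma\,\Gamma \ldots \Gamma,$$

which arises from $F$ by replacing $\delta$ with $\Gamma$ and omitting the last two symbols $\Gamma$.

Obviously $F^0 \in L_T$ and $F$ is generable from $F^0$. Hence if $F^0 \in L_{\tilde\omega}$ then $F \in L_{\tilde\omega}$. When $\Omega_{s-1}$ is a $\Sigma$ the argument is analogous. Thus we have proved that $L_{\tilde\omega} = L_T$.

Therefore the equality $L_{\bar\omega} = L_{\tilde\omega}$ follows and thus we have the following theorem:

Theorem 2. To every formula $F$ from $L_T$ there corresponds an adequate process $\mathfrak{A}$ (a tree) such that $F$ describes $\mathfrak{A}$ in order $\bar\omega$ and there also corresponds a second adequate process $\mathfrak{B}$ (a different tree) such that $F$ describes $\mathfrak{B}$ in order $\tilde\omega$. $\mathfrak{A}$ and $\mathfrak{B}$ are in general different.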

3. Computation

Let $\Omega_1 \ldots \Omega_n$ be a formula from $L_{\bar\omega}$ or $L_{\tilde\omega}$, where every $\Omega_i$ is one of the following four types of words:

$$R^i a_j a_k,\quad R^i * a_k,\quad R^i a_j *,\quad R^i * *$$

called subformulas. Every formula is thus a string of subformulas.

In each subformula the symbol $R^i$ represents a binary operation and the symbols $a_j$, $a_k$, and $*$ represent arguments of the operation denoted by $R^i$.

1) We assume here, as above, that we put $\Gamma$ in the place of [...] in $G_{\tilde\omega}$ and we take into account only the first two rules of $G_{\tilde\omega}$.


The symbol $a_j$ means that the corresponding argument is given, the symbol $*$ that it is a result represented by some other subformula.

Now the problem arises: how to find this subformula for a given symbol $*$. This problem will have of course two solutions corresponding to the two orders $\bar\omega$ and $\tilde\omega$.

Let us first consider the order $\bar\omega$ with the corresponding algorithm. It follows that if some symbol $*$ stands in a subformula of the type $R^i * a_k$ or $R^i a_j *$, then the corresponding subformula in the formula is the first subformula on the right side. If we have however a subformula of the type $R^i * *$, then the situation is more complicated:

Let us consider a statement

$$\Omega_1 \ldots \Omega_s\ R^i * *\ \bar\omega(a_{j_2})\ \bar\omega(a_{j_1})$$

where $R^i(a_i, a_{j_1}, a_{j_2})$ holds for some $a_i$.1) It is now easy to see that to the first $*$ there corresponds the first subformula of the statement generated from $\bar\omega(a_{j_1})$ and to the second $*$ corresponds the first subformula of the statement generated from $\bar\omega(a_{j_2})$.

This last mentioned subformula is simply the first subformula to the right side of $R^i * *$; however, to reach the first subformula generated from $\bar\omega(a_{j_1})$ we must jump over all subformulas $\Omega_{s+2} \ldots \Omega_{s+k}$ generated from $\bar\omega(a_{j_2})$.

In order to distinguish these latter subformulas in the formula, we can profit from the obvious fact that all symbols $*$ appearing in $\Omega_{s+2} \ldots \Omega_{s+k}$ have their corresponding subformulas among $\Omega_{s+2} \ldots \Omega_{s+k}$. Thus we have the following algorithm:

The formula under consideration is scanned from right to left and all symbols $*$ are successively considered. Each such symbol then corresponds to the first subformula to the right which has not yet been correlated to some other symbol $*$. To indicate this correspondence inside the formula, we shall use arrows in the following way:

$$\Omega_1 \ldots \Omega_s\ R^i * *\ \bar\omega(a_{j_2})\ \bar\omega(a_{j_1})$$

with one arrow leading from the second $*$ to $\bar\omega(a_{j_2})$ and another from the first $*$ to $\bar\omega(a_{j_1})$.
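The right-to-left scan can be sketched with a stack of subformulas that have not yet been correlated. The representation of a formula as a list of three-element tuples `(op, x, y)` with the string `"*"` marking a computed argument, and the name `correlate_depth_first`, are assumptions of this illustration.

```python
def correlate_depth_first(formula):
    """Correlate every symbol "*" with the index of its subformula.

    The result maps (index of subformula, position of the "*") to the
    index of the correlated subformula; positions are 1 and 2.
    """
    links = {}
    uncorrelated = []   # indices of free subformulas to the right, nearest on top
    for i in range(len(formula) - 1, -1, -1):    # scan from right to left
        for pos in (2, 1):                       # symbols of one subformula, right to left
            if formula[i][pos] == "*":
                # First subformula to the right not yet correlated.
                links[(i, pos)] = uncorrelated.pop()
        uncorrelated.append(i)
    return links
```

For the formula `. * *`, `- c d`, `+ a b` the second `*` is correlated with `- c d` and the first `*` with `+ a b`, so the formula describes $(a + b) \cdot (c - d)$. On a string that is not well-formed, `pop` raises an `IndexError`.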

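Now let us consider the order $\tilde\omega$ and the corresponding algorithm. It is easy to see that, using the above notation, we have

$$R^i * a_{j_2}\ \Omega_1 \ldots \Omega_n\ \tilde\omega(a_{j_1})$$

with an arrow leading from the symbol $*$ to $\tilde\omega(a_{j_1})$.

1) I.e. where some $a_i$ is the result of the operation $R^i$ performed on $a_{j_1}$ and $a_{j_2}$.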


Taking into account the supplementary condition which has been assumed with this algorithm we claim as follows:

To determine the correspondence between symbols $*$ and subformulas of a formula written in $L_{\tilde\omega}$, we scan the formula from left to right considering successive symbols $*$. To every symbol $*$ we correlate now the first subformula to the right which has not yet been correlated to some other symbol.
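The left-to-right scan needs only a counter, because the correlated subformulas always form an initial block behind the root. The sketch below assumes a well-formed formula given as a list of three-element tuples `(op, x, y)` with `"*"` for a computed argument; the name `correlate_breadth_first` is hypothetical.

```python
def correlate_breadth_first(formula):
    """Correlate every symbol "*" with the index of its subformula.

    The formula is assumed to be well-formed and written in the second
    order; the result maps (index of subformula, position) to an index.
    """
    links = {}
    next_free = 1        # subformula 0 is the root and is never correlated
    for i, sub in enumerate(formula):        # scan from left to right
        for pos in (1, 2):
            if sub[pos] == "*":
                # First subformula to the right not yet correlated.
                links[(i, pos)] = next_free
                next_free += 1
    return links
```

Applied to the same string of subformulas as the right-to-left scan, this procedure generally produces a different correspondence, which is why the order of coding must be known in advance.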

Examples.

Let us consider the process of computing some arithmetic formula, for example:

To this formula there corresponds the tree:

The parenthesis-free formulas describing this tree are the following:

where the arrows indicate the correspondence between symbols $*$ and three-character subformulas.

Theorem 3. Every arithmetic formula, i.e. every formula which is an arbitrary composition of operations of addition, subtraction, multiplication and division1), can be written in $L_{\bar\omega}$ or in $L_{\tilde\omega}$.

The proof is rather obvious. Namely it is sufficient to note that every such formula can be expressed by means of a tree and more precisely by an adequate process. Thus it may be written in parenthesis-free notation.

1) The kind of operation is of course unimportant. An analogous theorem could be formulated for an arbitrary set of two-argument operations.
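To make the computation of an arithmetic formula concrete, the following sketch evaluates a formula written in the second order (scanned from left to right). Everything specific here is an assumption of the illustration: the list-of-tuples representation, the operation symbols `+`, `-`, `.` and `:`, the dictionary of given values, and the name `evaluate_breadth_first`.

```python
import operator

# Operation symbols chosen for this sketch; "." is multiplication, ":" division.
OPS = {"+": operator.add, "-": operator.sub,
       ".": operator.mul, ":": operator.truediv}

def evaluate_breadth_first(formula, values):
    """Evaluate a well-formed formula written in the second order."""
    # Step 1: correlate each "*" with the first free subformula to the right.
    child = {}
    next_free = 1
    for i, sub in enumerate(formula):
        for pos in (1, 2):
            if sub[pos] == "*":
                child[(i, pos)] = next_free
                next_free += 1
    # Step 2: compute from right to left, so that the result of every
    # correlated subformula is known before it is needed.
    results = [None] * len(formula)
    for i in range(len(formula) - 1, -1, -1):
        op, *args = formula[i]
        operands = [results[child[(i, pos)]] if arg == "*" else values[arg]
                    for pos, arg in enumerate(args, start=1)]
        results[i] = OPS[op](*operands)
    return results[0]
```

With the formula `+ * *`, `. a b`, `- * e`, `+ c d` and the values a = 2, b = 3, c = 4, d = 5, e = 1, the procedure computes $(2 \cdot 3) + ((4 + 5) - 1) = 14$.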


4. Composition of processes and corresponding composition of formulas in $L_T$

Let the two processes $\mathfrak{A} = (A, R_3)$ and $\mathfrak{B} = (B, Q_3)$ be given. Suppose $A \cap B = \{a\}$, $a \in \Pi(R_3)$ and that $a$ is the root of $\mathfrak{B}$. We now consider the process $\mathfrak{C} = (C, S_3)$ where:

$$C = A \cup B,\quad S_3|_A = R_3,\quad S_3|_B = Q_3.$$

This process will be called a composition of $\mathfrak{A}$ and $\mathfrak{B}$ in point $a$, in symbols:

$$(\mathfrak{A} \circ \mathfrak{B})(a).$$

Now let $\mathfrak{A}$ be described by the formula $F_{\mathfrak{A}}$: $\Omega_1 \ldots \Omega_n$, and $\mathfrak{B}$ by the formula $F_{\mathfrak{B}}$: $\Phi_1 \ldots \Phi_m$.1) The problem arises of finding $F_{\mathfrak{C}}$ for given $F_{\mathfrak{A}}$ and $F_{\mathfrak{B}}$.

Suppose $F_{\mathfrak{A}}$ and $F_{\mathfrak{B}}$ are written in $L_{\bar\omega}$, and $F_{\mathfrak{A}}$: $\Omega_1 \ldots \Omega_n$ where the symbol $a$ appears in some $\Omega_j$; $1 \le j \le n$.

We consider three cases:

Case 1. $\Omega_j$: $R^i \alpha a$ where $\alpha$ is the symbol $*$ or some $a_k$. Thus generating $F_{\mathfrak{C}}$, starting from

$$F_{\mathfrak{A}}:\ \Omega_1 \ldots \Omega_{j-1}\ R^i \alpha a\ \Omega_{j+1} \ldots \Omega_n$$

we obtain

$$\Omega_1 \ldots \Omega_{j-1}\ R^i \alpha *\ \bar\omega(a)\ \Omega_{j+1} \ldots \Omega_n$$

and therefore

$$F_{\mathfrak{C}}:\ \Omega_1 \ldots \Omega_{j-1}\ R^i \alpha *\ \Phi_1 \ldots \Phi_m\ \Omega_{j+1} \ldots \Omega_n.$$

Case 2. $\Omega_j$: $R^i a a_k$; then we have

$$\Omega_1 \ldots \Omega_{j-1}\ R^i * a_k\ \bar\omega(a)\ \Omega_{j+1} \ldots \Omega_n$$

and hence

$$F_{\mathfrak{C}}:\ \Omega_1 \ldots \Omega_{j-1}\ R^i * a_k\ \Phi_1 \ldots \Phi_m\ \Omega_{j+1} \ldots \Omega_n.$$

Case 3. $\Omega_j$: $R^i a *$. In place of

$$\Omega_1 \ldots \Omega_{j-1}\ R^i a *\ \bar\omega(a_k)\ \Omega_{j+s} \ldots \Omega_n$$

we obtain

$$\Omega_1 \ldots \Omega_{j-1}\ R^i * *\ \bar\omega(a_k)\ \bar\omega(a)\ \Omega_{j+s} \ldots \Omega_n$$

and therefore

$$F_{\mathfrak{C}}:\ \Omega_1 \ldots \Omega_{j-1}\ R^i * *\ \Omega_{j+1} \ldots \Omega_{j+s-1}\ \Phi_1 \ldots \Phi_m\ \Omega_{j+s} \ldots \Omega_n.$$

To distinguish $\Omega_{j+1} \ldots \Omega_{j+s-1}$ in the formula $F_{\mathfrak{A}}$, we may use the following obvious property:

$s - 1$ is the least $k$ for which $\lambda(\Omega_{j+1} \ldots \Omega_{j+k}) = 1$. Thus we have an algorithm for superposition of formulas written in $\bar\omega$.

If $F_{\mathfrak{A}}$ and $F_{\mathfrak{B}}$ are written in $\tilde\omega$ then the algorithm is much more complicated. It is of no particular interest to describe this algorithm in detail, but it may be of interest to give the principal concept:

First we consider the trees corresponding to $\mathfrak{A}$ and $\mathfrak{B}$ and we divide $F_{\mathfrak{A}}$ and $F_{\mathfrak{B}}$ into those parts corresponding to the successive levels of the trees.2)

1) Both formulas are of course written in the same order.
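The three cases for formulas written in the first order can be sketched as follows. The representation of a formula as a list of three-element tuples `(op, x, y)` with `"*"` for a computed argument, the assumption that the element `a` occurs exactly once as an argument, and the names `weight` and `compose_depth_first` are all assumptions of this illustration.

```python
def weight(sub):
    """Value of the function on one subformula: 1, 0 or -1 by number of "*"."""
    return 1 - sum(1 for arg in sub[1:] if arg == "*")

def compose_depth_first(f_a, f_b, a):
    """Insert the formula f_b into f_a at the given element `a`.

    Both formulas are lists of subformulas written in the first order.
    """
    for j, (op, first, second) in enumerate(f_a):
        if second == a:
            # Case 1: the code of a is written directly after the subformula.
            return f_a[:j] + [(op, first, "*")] + f_b + f_a[j + 1:]
        if first == a and second != "*":
            # Case 2: the second argument is given, so nothing is skipped.
            return f_a[:j] + [(op, "*", second)] + f_b + f_a[j + 1:]
        if first == a:
            # Case 3: skip the code of the second argument, i.e. the least
            # number k of following subformulas whose weights sum to 1.
            total, k = 0, 0
            while total != 1:
                k += 1
                total += weight(f_a[j + k])
            return (f_a[:j] + [(op, "*", "*")] + f_a[j + 1:j + k + 1]
                    + f_b + f_a[j + k + 1:])
    raise ValueError("element not found in the formula")
```

In case 3 the loop implements the stated property directly: it stops at the least `k` for which the weights of the `k` following subformulas sum to 1.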

5. Final remarks

All the above considerations concern one particular parenthesis-free language. However, other classes of parenthesis-free languages are known, for example the second language given by Z. PAWLAK [4] or the ŁUKASIEWICZ parenthesis-free notation. The question arises whether these languages may also be discussed in the manner presented in this paper.

The answer is affirmative and we note that analogous grammars can be introduced and analogous theorems can be formulated.

Another problem connected with parenthesis-free languages is how to describe non-binary trees, i.e. trees with more than two branches at a point of branching, where the number of branches may differ at every point. A complete discussion of this problem is not of particular interest for this paper. We note, however, that all three mentioned languages can be extended to the case of non-binary trees but that only the first language is interesting from the practical point of view. For the other two languages the rules of computation are considerably more complicated.

References

[1] A. J. BLIKLE, On the notion of process. This Zeitschr. 11 (1965), 257-271.
[2] Y. BAR-HILLEL, M. PERLES, E. SHAMIR, On formal properties of simple phrase structure grammars. Applied Logic Branch, The Hebrew University of Jerusalem, Technical Report No. 4, July 1960.
[3] J. ŁUKASIEWICZ, Elementy logiki matematycznej. Warsaw 1929.
[4] Z. PAWLAK, New Class of Mathematical Languages and Organisation of Adressless Computers. Lecture held on the Colloquium “The foundations of mathematics, mathematical machines and their applications”, Tihany (Hungary) 11-15 September 1962. To be published.

2) Every formula written in $L_{\tilde\omega}$ may be divided into such parts, where each such part consists of a bloc of successive symbols (this is not in general true for a formula written in $\bar\omega$). For example, dividing the formula on page 184 written in $L_{\tilde\omega}$, we obtain:

| + * * | + * e + * * | · * * · k l + * * | + a b : c d − f j · g h |

A simple algorithm can be given for such a partition (the acquaintance of the tree is not required).

We shall say that the part of the formula which corresponds to the k-th level of the corresponding tree is the k-th level of the formula.

Now, in the construction of $F_{\mathfrak{C}}$, if $a$ appears in the k-th level of $F_{\mathfrak{A}}$ then the subformulas of the i-th level of $F_{\mathfrak{B}}$ will be placed into the (k + i)-th level of $F_{\mathfrak{C}}$. The levels of $F_{\mathfrak{B}}$ will not occur in $F_{\mathfrak{C}}$ in bloc form.

(Eingegangen am 6. Juli 1964)