SCIENTIFIC REPORT NO. 1
PROBLEM-SOLVING PROCEDURES FOR EFFICIENT SYNTACTIC ANALYSIS

by

Dr. Saul Amarel

Prepared for the Air Force Office of Scientific Research of the Office of Aerospace Research under Contract No. AF 49(638)-1184.

RCA Laboratories
Princeton, New Jersey 08540
Qualified users may request copies of this report from DDC.
PREFACE
This paper was presented at the ACM 20th National Conference, held in
Cleveland, Ohio, on August 24-26, 1965. It was not included, however, in
the Proceedings of the Conference (partly because of its length). Plans to
write a modified version of this paper, to include some new results, have
delayed its publication. However, since this paper includes concepts from
problem-solving research that are applicable to the clarification of certain
questions of current interest in computer linguistics, we feel that it would
be appropriate not to delay its publication any longer, and to issue it at
present as a scientific report.
Saul Amarel
Princeton, N. J.
May 1968
ABSTRACT
The main purpose of this report is to present a logical framework in which the syntactic analysis problem can be represented. This framework originates from previous work on problem-solving procedures for theorem proving. Procedures for syntactic analysis are represented as reduction procedures, where a problem undergoes a sequence of nested transformations that result in a set of simpler subordinate problems whose solution implies the solution of the original problem. Our representation of the syntactic analysis problem provides a unifying basis for expressing a variety of syntactic analysis procedures, both existing ones as well as new, proposed procedures. Such a common basis contributes to a better understanding and systematization of the programming of syntax-directed compilers and of other translators whose source language is a context-free fragment of natural language, e.g., some "question-answering" systems. A useful concept of computational effort is defined, and it is used as a guide for the formulation of new efficient procedures. Heuristic procedures for syntactic analysis are suggested. Some features of these procedures are relevant to the design of advanced syntax-directed translators.
TABLE OF CONTENTS

I. INTRODUCTION

II. LINGUISTIC BACKGROUND AND FORMULATION OF THE SYNTACTIC ANALYSIS PROBLEM
    A. The CF-Grammar System
    B. The Syntactic Analysis Problem

III. FORMULATION OF THE SYNTACTIC ANALYSIS PROBLEM IN SYSTEMS OF NATURAL INFERENCE
    A. Theorem-Proving Formulation
    B. The Class of Natural Inference Systems N_σ(G)
       The Natural Inference System N_t(G)
       The Natural Inference System N_ℓ(G)
       The Natural Inference System N_r(G)
       Logical Consistency Theorem

IV. CONVERSION FROM PROOF TREES TO DESCRIPTION TREES

V. NATURAL DECISION SYSTEMS FOR SYNTACTIC ANALYSIS
    A Class of Decision Procedures for N_σ*(G) (for all σ)
    Completeness Theorem

VI. HEURISTIC PROCEDURES OF REDUCTION TYPE FOR SYNTACTIC ANALYSIS
    A. States, Moves, and Search Trees in Reduction Procedures for Syntactic Analysis
    B. Computational Effort
    C. Approaches to Move Selection
    D. Approaches to Attention Control

REFERENCES
ILLUSTRATIONS

Figure

1. Graph representation of a labelled n-ary replacement rule R_i : A → φ^(1) φ^(2) ... φ^(n), where φ^(1), φ^(2), ..., φ^(n) ∈ V
2. The graph, Γ(G_1'), of the grammar G_1
3. A P-derivation of a terminal string and its corresponding P-marker
4. Schematic representation of the alternative approaches to the construction of a proof of P ⇒ x in the systems N_σ(G)
5. Graphic interpretation of the situation considered in the specification of a rule of inference I_t,i of N_t(G), which corresponds to a "top" application of a rule of replacement R_i of G
6. The proof in tree form of P ⇒ abcd in the system N_t(G_1)
7. Graphic interpretation of the situation considered in the specification of a rule of inference I_ℓ,i of N_ℓ(G), which corresponds to a "left" application of a rule of replacement R_i of G
8. The proof in tree form of P ⇒ abcd in the system N_ℓ(G_1)
9. Graphic interpretation of the situation considered in the specification of a rule of inference I_r,i of N_r(G), which corresponds to a "right" application of a rule of replacement R_i of G
10. The proof in tree form of P ⇒ abcd in the system N_r(G_1)
11. One of the tree form proofs of P ⇒ abcd in the mixed system N_(ℓ,r)(G_1)
12. Inference mapping tree
13. Representation of the application of a replacement move, followed by two bisection moves, in a problem-solving tree
14. A compound replacement move (a maneuver) from the "top" and "left"
I. INTRODUCTION
The problem of efficient syntactic analysis in a context-free (CF) language is of considerable practical importance for the design of syntax-directed compilers and of computer processors whose source language is a CF fragment of natural language, e.g., "question-answering" systems. The problem is also of theoretical significance for the study of grammars and for the exploration of perceptual models of language.
Our main purpose in this paper is to develop a broad logical framework
in which the syntactic analysis problem can be represented, in a way that will
permit us to consider a large class of syntactic analysis procedures, among
them in particular, procedures that perform efficient syntactic analysis. The
framework to be presented originates from attempts to systematize certain
essential elements in heuristic problem-solving procedures. The problem of
syntactic analysis is a theorem-proving problem. We will demonstrate that it
can be effectively solved by procedures of the reduction type. In such
procedures a problem undergoes a sequence of nested transformations that result
in a set of "simpler" subordinate problems whose solution implies the solution
of the original problem. We will also show that heuristic problem-solving procedures of the reduction type are excellent candidates for the solution of the problem of efficient syntactic analysis. While most existing syntactic analysis procedures carry out their task in a rigid way, our proposed heuristic procedures exhibit considerable flexibility of approach (because of their more global view of the problem) and they can attain greater computational efficiency (in the sense of avoiding needless search) by selectively responding to special properties of the string at hand.

The point of view that we are proposing for the syntactic analysis problem provides a unifying basis for expressing a variety of syntactic analysis procedures, both existing ones as well as new, proposed procedures. We will show that it is possible to define a useful concept of computational effort within our general framework, and will use this concept directly as a guide for the formulation of efficient procedures; this same concept can also be used as a general basis for comparing procedures that are expressed within our framework.

A serious obstacle to the application of ideas that come from artificial intelligence research to problems of practical interest is the difficulty of representing the problem in an "appropriate" form, i.e., in a form that makes it
easy to transfer concepts and methods that were developed for a prototype problem to the problem on hand. Because of the importance of the question of transforming the problem representation from its original form to an "appropriate" form, we are giving it major emphasis in this paper.

We formulate in Section II the syntactic analysis problem in its conventional linguistic form, where the concept of a grammar G as a combinatorial system of concatenation is central, and where a language L(G) is defined as a set of strings generated by the grammar. We then show in Section III how the problem can be regarded as a theorem-proving problem in a logic, and we formulate a set of natural inference systems, N_σ(G), in which our problem can be represented. We then prove that the systems N_σ(G) are consistent with the linguistic system G, i.e., if a solution to our problem exists in N_σ(G), then it also exists in G. Our move from G to the systems N_σ(G) was suggested by previous work with heuristic theorem-proving procedures for the propositional calculus, where we have found that a formulation of the problem in a system of natural inference (a system of subordinate proofs) was especially fruitful.
A proof in a system N_σ(G) has the form of a tree and it corresponds to a structural description of an input string in the language L(G). Because of the flexibility of proof construction afforded by the natural inference systems, a proof may be obtained in a variety of tree forms. In Section IV we discuss the correspondence between a tree proof in any system N_σ(G) and a structural description of a string in the CF language; this clarifies the question of structural consistency of the natural inference systems with respect to the grammar.

The completeness of the systems N_σ(G) with respect to G [i.e., if the problem has a solution in G it is also solvable in any of the systems N_σ(G)] is proved in Section V by embedding the natural inference systems in a set of systems N_σ*(G) of broader scope. The latter systems are natural extensions of the systems N_σ(G), and they are obtained by strengthening the inferential mechanisms of N_σ(G) so that both proofs and refutations can be obtained in them in a uniform way. We call the stronger systems natural decision systems. The formulation of a general schema for decision procedures in the natural decision systems (given in Section V) provides us directly with a large class of reduction procedures for the solution of our syntactic analysis problem. In these procedures a search tree is grown as the computation proceeds, and the growth stops when either a proof or a refutation is obtained. In Section VI
we introduce a measure of computational effort which is related to the size of
the maximal search tree which is grown by a procedure. We then develop the
essential features of heuristic reduction procedures for syntactic analysis.
Our approach is guided by the goal of minimizing the expected computational
effort that a procedure is to expend in the course of attempting to construct
a structural description for a string in a CF language. The ability to choose
a different method of attack in response to the properties of the specific
problem on hand, the possibility of considering a restricted set of relevant
moves, the formulation of compound moves (or maneuvers), and the "on-line"
direction of the thread of computation to those subordinate problems that
promise a minimal expected expenditure of estimated computational effort are
the essential features of these heuristic procedures. The estimation of
expected effort along alternative lines of solution is an extremely useful
concept for the organization of intelligent search in heuristic problem-solving procedures. In our present case, we find that this concept can be
applied to great advantage since, as we will show in Section VI, it is possible
to formulate a reasonably good estimate of expected computational effort on
the basis of string length.
II. LINGUISTIC BACKGROUND AND FORMULATION OF THE SYNTACTIC ANALYSIS PROBLEM
A. The CF-Grammar System

The formal linguistic definition of a context-free (CF) language L(G) is commonly given in terms of its CF-grammar,

    G = < V, V_T, p, P >,   (2.1)

which is regarded as a combinatorial system of concatenation (see Chomsky).

V in (2.1) is a finite set of elements, called the vocabulary of G. The concatenation of a finite number (possibly 0) of elements in V forms a string in V*; in this paper we denote strings in V by φ, χ, ψ (possibly subscripted).* A string is represented as a juxtaposition of the symbols that denote its successive elements. If n is the number of elements in a string φ, then l(φ) = n; l(φ) is called the length of φ. We shall use the following notational convention for naming component elements of strings: If φ is a non-empty string, then φ^(1) denotes the leftmost symbol in φ, φ^(2) the next to the leftmost symbol, etc.; furthermore, φ^(1̄) denotes the rightmost symbol in φ, φ^(2̄) the one preceding the rightmost symbol, etc. If l(φ) = n, then we have

    φ = φ^(1) φ^(2) ... φ^(n-1) φ^(n) = φ^(n̄) φ^((n-1)̄) ... φ^(2̄) φ^(1̄).   (2.2)

The empty string, of length 0, is denoted by Λ. The concatenation of a pair of strings φ and χ is denoted by φχ.

V_T in (2.1) is a set which is properly included in V and is called the terminal vocabulary; its elements are called terminal elements. The concatenation of a finite number (possibly 0) of terminal elements forms a string in V_T*; we denote strings in V_T* by x, y, z (possibly subscripted). The sentences of the language L(G) form a subset of the set of all strings in V_T*.

The complement of the set V_T with respect to V is denoted by V_N and is called the nonterminal vocabulary; its elements are called nonterminal elements. We have then V = V_T ∪ V_N, and V_T ∩ V_N = ∅. The nonterminal elements are used to represent syntactic types in the grammar.

p in (2.1) is a finite set of replacement rules that are given in the form

    A → φ   (2.3)

where A ∈ V_N and φ is a non-empty string in V. The arrow, →, stands for a non-reflexive and asymmetric dyadic relation whose interpretation is "can be replaced by". As an example, the replacement rule A → ABb can be read as "A can be replaced by the string ABb". A replacement rule A → φ is called n-ary if l(φ) = n (in the previous example, we have a ternary rule); in most CF grammars that have been proposed for fragments of natural languages, as well as for programming languages, n is 1, 2, or 3.

* For brevity, we will use "φ is a string in V" or "φ is in V" for "φ is a string whose elements are taken exclusively from the set V"; similar comments hold for "x is a string in V_T" or "x is in V_T".
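The system (2.1)-(2.3) translates directly into a small data structure. The following sketch (in present-day notation, with names of our own choosing, not the report's) records a grammar as the quadruple < V, V_T, p, P > and checks the conditions just stated:

```python
# A CF-grammar G = <V, V_T, p, P> as in (2.1); p holds replacement rules
# A -> phi of (2.3), each stored as a pair (A, phi) with phi a tuple of symbols.
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    terminals: frozenset     # V_T
    nonterminals: frozenset  # V_N = V - V_T
    rules: tuple             # p: pairs (lhs, rhs)
    start: str               # the designated element P

    def check(self):
        # V_T and V_N are disjoint (V_T intersect V_N is empty)
        assert not (self.terminals & self.nonterminals)
        vocabulary = self.terminals | self.nonterminals  # V
        for lhs, rhs in self.rules:
            assert lhs in self.nonterminals  # A is in V_N
            assert len(rhs) >= 1             # phi is non-empty: l(phi) >= 1
            assert all(sym in vocabulary for sym in rhs)
        return True

# a hypothetical two-rule grammar, just to exercise the checks
example = Grammar(terminals=frozenset({"a", "b"}),
                  nonterminals=frozenset({"S"}),
                  rules=(("S", ("a", "S", "b")), ("S", ("a", "b"))),
                  start="S")
```

Here the rule ("S", ("a", "S", "b")) is ternary in the report's sense, since its right side has length 3.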
A string φ is derivable in G from a string ψ if and only if there exists a sequence of strings D(ψ,φ|G),

    D(ψ,φ|G) = [φ_1, φ_2, ..., φ_n],  (n ≥ 1),   (2.4)

such that

(i) φ_1 = ψ, φ_n = φ, and

(ii) for any two consecutive strings φ_i, φ_(i+1) (i < n for n > 1), there are strings χ_1, χ_2 (not necessarily distinct, possibly empty) and a replacement rule A → ω, such that φ_i = χ_1 A χ_2 and φ_(i+1) = χ_1 ω χ_2.

We call the sequence D(ψ,φ|G) a ψ-derivation of φ in G, and we denote the relation "φ is derivable in G from ψ" by ψ ⇒ φ. Clearly, the relation ⇒ is reflexive and transitive. If ψ ⇒ x holds, and if x is a string in the terminal vocabulary V_T, then x is called a terminal string of ψ in G.

For any nonterminal element X, let S_X(G) denote the set of all terminal strings of X in G, i.e.,

    S_X(G) = {x | x is in V_T, and X ⇒ x holds}.   (2.5)

We assume that G is such that S_X(G) is not empty for any nonterminal X in V_N. This is clearly a necessary condition for the inclusion of a nonterminal element in a grammar (of any practical interest).

We now introduce the notion of a support, s(α), of an element α ∈ V, which we find useful in the formulation of syntactic analysis procedures:

    s(α) = 1, if α ∈ V_T;
    s(α) = min over x ∈ S_α(G) of l(x), if α ∈ V_N.   (2.6)

Thus, the support of a nonterminal element X is the minimal length of a terminal string which is derivable from X in G; clearly, according to our previous assumption on nonterminals, we have

    s(X) > 0, for all X ∈ V_N.   (2.7)

The notion of support extends to strings in a natural way. Thus, if φ is a string in V, and l(φ) = n, then

    s(φ) = Σ_{i=1}^{n} s(φ^(i)).   (2.8)
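The supports of (2.6)-(2.8) can be computed without enumerating any terminal strings, by relaxing over the rules until a fixpoint is reached; a minimal sketch (the function name and representation are ours):

```python
def supports(rules, terminals):
    """s(alpha) of (2.6): 1 for a terminal; for a nonterminal, the minimal
    length of a terminal string derivable from it. `rules` is an iterable
    of (lhs, rhs) pairs, rhs a sequence of symbols."""
    INF = float("inf")
    s = {t: 1 for t in terminals}
    for lhs, _ in rules:
        s.setdefault(lhs, INF)
    changed = True
    while changed:          # relax until no support can be lowered further
        changed = False
        for lhs, rhs in rules:
            candidate = sum(s[sym] for sym in rhs)  # support of a string, (2.8)
            if candidate < s[lhs]:
                s[lhs] = candidate
                changed = True
    return s

# hypothetical illustration: with rules A -> a, B -> bc, P -> AB
# the minimal derivable lengths are s(A) = 1, s(B) = 2, s(P) = 3
demo = supports([("A", "a"), ("B", "bc"), ("P", "AB")], set("abc"))
```

The assumption stated above, that S_X(G) is non-empty for every nonterminal X, is exactly what guarantees that every support settles at a finite value.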
P in (2.1) is a designated element in V_N which has the following special linguistic significance: The CF-language L(G) generated by the grammar G is the set of all terminal strings of P in G. Thus,

    L(G) = {x | x is in V_T, and P ⇒ x holds}.   (2.9)

For each string, x, in L(G) there exists a phrase-marker (P-marker) of x in G, which is a structural description of x in G relative to P. The P-marker of x is based on the P-derivation, D(P,x|G), of x. A structural description of x in a grammar G is a detailed representation of the structure of x in terms of the replacement rules that determine the successive steps of its derivation within that grammar.

Before proceeding further with our definition of the notions of P-marker and structural description, let us introduce a grammar, G',

    G' = < V, V_T, V_R, p', P >   (2.10)

which is equivalent to G (in terms of generative capacity), but differs from it in the form of representing the replacement rules. p' in G' is a set of labeled replacement rules which is related to p in G as follows: For each rule A → φ in p there corresponds a labeled rule in p' which has the form

    R_i : A → φ,  (1 ≤ i ≤ m),   (2.11)

where m is the number of rules in p, and R_i is the label of the i'th rule (it names that rule). Thus, a reference to a rule R_i of p' is intended to designate the replacement rule A → φ. The finite set of rule labels R_1, ..., R_m is called the rule labels vocabulary, and it is denoted by V_R in G'.

By using the notational convention given in (2.2), we can write a labeled replacement rule in the form

    R_i : A → φ^(1) φ^(2) ... φ^(n),   (2.12)

where φ^(1), ..., φ^(n) ∈ V, and n = l(φ). In the interest of notational convenience, we also use the following labeling scheme for parts of a replacement rule:

R_i^(0) names the left side of the replacement rule R_i (in (2.12), R_i^(0) = A);

R_i^(j), 1 ≤ j ≤ n, names the appropriate component of the right side string in the replacement rule R_i (in (2.12), R_i^(1) = φ^(1), etc.).
Each labeled replacement rule can be represented by a special directed graph where the order of the string components φ^(1), φ^(2), etc. is explicitly indicated by numbering the graph branches in an appropriate way; such a graph is shown in Figure 1. We can extend the graph representation used for individual rules to obtain a combined overall description of the entire grammar G'; we denote this representation by Γ(G'), and we call it the graph of G'. The graph Γ(G') is constructed as follows:

(i) Each node of the graph corresponds to an element from the vocabularies V_T, V_N, V_R. The nodes are labeled appropriately and they are classified (in the obvious way) as terminal, nonterminal, and rule nodes.

(ii) From each nonterminal node (corresponding to a nonterminal element, say X) there emanate j_X branches (j_X ≥ 1) into the nodes corresponding to rules of which the nonterminal X is a left side. These branches are numbered 1, 2, ..., j_X; the numbering is arbitrary; however, some specific methods of numbering may be better suited than others for specific computer realizations of procedures that use the notion of Γ(G').

(iii) From each rule node (corresponding to a replacement rule, say R_i) there emanate n branches into the terminal and nonterminal nodes that correspond to the components φ^(1), φ^(2), ..., φ^(n) of the right-side string in R_i. These branches are numbered 1, 2, ..., n, so that a branch entering a node φ^(k) is assigned the number k. For convenience, we also mark the n'th branch with 1̄.

To illustrate the notion of the graph of a grammar, let us consider, as an example, the following simple CF-grammar, G_1':*

* This illustrative example was used by Griffiths and Petrick in [2].
    V_N = {P, A, B},
    V_T = {a, b, c, d},

    p' :  R_1 : P → AB
          R_2 : A → ABb
          R_3 : A → a          (2.13)
          R_4 : B → Bd
          R_5 : B → bc

The graph Γ(G_1') is shown in Figure 2.
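The construction (i)-(iii) of Γ(G_1') amounts to two adjacency maps, one from nonterminal nodes to rule nodes and one from rule nodes to the ordered components of their right sides. A sketch follows (our own encoding; we read the partly garbled rule R_4 as B → Bd, the reading consistent with the derivation of abcd given later in this section):

```python
# The rules p' of (2.13), keyed by their labels in V_R.
RULES = {
    "R1": ("P", ("A", "B")),
    "R2": ("A", ("A", "B", "b")),
    "R3": ("A", ("a",)),
    "R4": ("B", ("B", "d")),
    "R5": ("B", ("b", "c")),
}

def grammar_graph(rules):
    """Gamma(G'): each nonterminal node carries numbered branches into the
    rule nodes it heads (step (ii)); each rule node carries numbered branches
    into the ordered right-side components (step (iii))."""
    nt_branches = {}
    for label, (lhs, _) in rules.items():
        nt_branches.setdefault(lhs, []).append(label)
    rule_branches = {label: list(rhs) for label, (_, rhs) in rules.items()}
    return nt_branches, rule_branches

nt_branches, rule_branches = grammar_graph(RULES)
```

The list positions play the role of the branch numbers: nt_branches["A"] lists, in order, the rules of which A is a left side, and rule_branches["R1"] lists the right-side components of R_1 in their horizontal order.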
A derivation in G', denoted D(ψ,φ|G'), is a stronger version of the notion of derivation, D(ψ,φ|G), used in G (see 2.4). In addition to the strings that enter as steps of a derivation in G, we include in D(ψ,φ|G') a specific record of the rule applications that are associated with each transition between consecutive steps.

The notion of rule application is extremely important for problem-solving procedures in general and it has great significance in our present context (as will become apparent in the subsequent discussion). Its nature is that of a process which effects a specific transition between a pair of strings. The rule application process consists of two subprocesses in sequence. The first is a selection process which decides what part of what relevant rule is to be identified with what part of the input string (we call the latter part the application site); the second is an execution process which carries out the replacement of elements prescribed by the chosen rule at the chosen application site, and it produces the output string.

Consider, as an example, an input string φ which has the element X in one or more of its sites, say φ = χ_1 φ^(u) χ_2 φ^(v) χ_3 φ^(w) χ_4, where φ^(u) = φ^(v) = φ^(w) = X, and χ_1, χ_2, χ_3, χ_4 are strings in V (possibly empty). Consider next the set of all rules of replacement that have X at their left side, i.e., {R | R^(0) = X}. Let us assume that a specific rule application process takes place, where

(i) both a rule R_j ∈ {R | R^(0) = X} (suppose that R_j designates the rule X → ω) and an application site in φ (say φ^(v)) are selected, and

(ii) the element φ^(v) is replaced by ω and a new string, ψ = χ_1 φ^(u) χ_2 ω χ_3 φ^(w) χ_4, is generated. We represent the record of this rule application process by the following sequence:

    [φ, (R_j^(0), v), ψ].   (2.14)

The parenthesis (R_j^(0), v) indicates that the left side element of the rule R_j, i.e., R_j^(0), is applied to the v'th site of φ, i.e., φ^(v).

Fig. 1. Graph representation of a labelled n-ary replacement rule R_i : A → φ^(1) ... φ^(n), where φ^(1), φ^(2), ..., φ^(n) ∈ V.

Fig. 2. The graph, Γ(G_1'), of the grammar G_1.

We can now define a ψ-derivation of φ in G', i.e., D(ψ,φ|G'), as a sequence which has the following form:

    D(ψ,φ|G') = [φ_1, (R_{j_1}^(0), q_1), φ_2, (R_{j_2}^(0), q_2), ..., (R_{j_{n-1}}^(0), q_{n-1}), φ_n],  (n ≥ 1),   (2.15)

where

(i) φ_1 = ψ, φ_n = φ, and

(ii) a subsequence [φ_k, (R_{j_k}^(0), q_k), φ_{k+1}], for 1 ≤ k < n and n > 1, stands for the record of a rule application, where (R_{j_k}^(0), q_k) indicates that the left side element of a rule R_{j_k} (where R_{j_k} ∈ {R | R^(0) = φ_k^(q_k)}) is applied to the q_k'th site of φ_k (we have 1 ≤ q_k ≤ l(φ_k)).

Consider now, as an example, a specific P-derivation of the string x = abcd in the grammar G_1' (given in 2.13):

    D(P, abcd|G_1') = [P, (R_1^(0), 1), AB, (R_3^(0), 1), aB, (R_4^(0), 2), aBd, (R_5^(0), 2), abcd].   (2.16)

This derivation is shown in graphical form in Figure 3(a). In this figure, we are using the graph representation of replacement rules that we have introduced previously, and a representation of strings in the form of sequences of nodes of appropriate types; we are also using special branches (denoted by a double line) to indicate the "carrying over" of the string elements that are not affected by a rule application from one step of the derivation to the next. Let us now apply the following elementary (condensation and simplification) transformation (which we call α):

α: (i) condense the "carrying over" branches, (ii) substitute a rule label for a rule application parenthesis, and (iii) eliminate (for convenience) the numbering of branches, but maintain the horizontal ordering of the branches that descend from each rule node.
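The record notation (2.14)-(2.16) is mechanical enough to replay directly. The sketch below (names are ours; single-character symbols are assumed, as in G_1', and the garbled rule R_4 is read as B → Bd) executes each recorded rule application and reproduces (2.16):

```python
RULES = {"R1": ("P", "AB"), "R2": ("A", "ABb"), "R3": ("A", "a"),
         "R4": ("B", "Bd"), "R5": ("B", "bc")}  # p' of (2.13)

def apply_rule(phi, label, site):
    """One rule application (2.14): the selection step checks that the rule's
    left side matches the site'th symbol of phi (sites are 1-indexed, as in
    the report); the execution step performs the replacement."""
    lhs, rhs = RULES[label]
    assert phi[site - 1] == lhs
    return phi[:site - 1] + rhs + phi[site:]

def replay(psi, record):
    """Replay a psi-derivation given as (rule label, site) pairs, as in (2.15)."""
    phi = psi
    for label, site in record:
        phi = apply_rule(phi, label, site)
    return phi

x = replay("P", [("R1", 1), ("R3", 1), ("R4", 2), ("R5", 2)])  # as in (2.16)
```

Replaying the record of (2.16) passes through the intermediate strings AB, aB, aBd and terminates at abcd.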
Fig. 3. A P-derivation of a terminal string and its corresponding P-marker.
We obtain the tree graph shown in Figure 3(b). This tree graph is a P-marker of x = abcd in G_1'. The transformation α, which was just outlined, produces, in general, the P-marker of a string x in G_1', i.e., the tree, from its corresponding derivation D(P,x|G').

The P-marker of a terminal string provides the essential information contained in a specific derivation of that string. In particular, it shows the structure of replacement rules that transforms the designated element P into the terminal string. For each nonterminal element X that assumes the role of a rule application site during a derivation, the P-marker provides a trace of rule applications down to the part of the terminal string which is derived from X. The P-marker does not conserve information about the specific step of the derivation at which a specific rule has been applied; it just shows that the rule was applied at some step.

P-markers are trees that are rooted at the nonterminal node P and that have as terminal nodes (in the appropriate horizontal order) the terminal elements that compose the string whose structure they display.

The notion of a P-marker is a special case of the more general notion of a structural description. A structural description of the string φ in V relative to the nonterminal element X is a tree which can be obtained from a derivation D(X,φ|G') via the transformation α which we have discussed previously. The tree is rooted at X, its terminals are the (horizontally ordered) components of the string φ, and its structure displays the manner in which a specific set of replacement rules in G' is combined to effect a transition from X to φ.

If a string φ in V is derivable in G' from X ∈ V_N and if furthermore there exists a single structural description of φ relative to X, then the string is called syntactically unambiguous relative to X; if there is more than one structural description, then the string is called syntactically ambiguous relative to X. This notion of syntactic ambiguity is carried over to sets of strings as follows: Given a set {φ | X ⇒ φ holds}, the set is syntactically unambiguous relative to X ∈ V_N if there is no string in the set which is ambiguous relative to X; otherwise the set is syntactically ambiguous. These notions are carried over in the obvious way to P-markers and CF-languages L(G).
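The transformation α can likewise be sketched as a small program: applying each recorded rule at a leaf of a growing tree, and keeping the rule label at that node, yields the structural description directly (our own rendering, again using the G_1' rules with R_4 read as B → Bd):

```python
RULES = {"R1": ("P", "AB"), "R2": ("A", "ABb"), "R3": ("A", "a"),
         "R4": ("B", "Bd"), "R5": ("B", "bc")}  # p' of (2.13)

def structural_description(root_symbol, record):
    """Build the tree obtained by alpha from a derivation record of
    (rule label, site) pairs; return (tree, terminal string at the frontier).
    Each node keeps its symbol, its rule label (if any), and its children."""
    root = {"sym": root_symbol, "rule": None, "kids": []}
    frontier = [root]                      # the current string, as leaves
    for label, site in record:
        lhs, rhs = RULES[label]
        node = frontier[site - 1]
        assert node["sym"] == lhs          # the application site
        node["rule"] = label               # alpha keeps the rule label
        node["kids"] = [{"sym": s, "rule": None, "kids": []} for s in rhs]
        frontier[site - 1:site] = node["kids"]  # preserve horizontal order
    return root, "".join(leaf["sym"] for leaf in frontier)

tree, x = structural_description("P", [("R1", 1), ("R3", 1), ("R4", 2), ("R5", 2)])
```

Note that, exactly as stated above, the resulting tree records which rule was applied at which node but not at which step of the derivation it was applied.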
Since it is a desirable property of programming languages that they be syntactically unambiguous (and therefore it can be assumed that their designers make efforts to satisfy this desideratum), it can be assumed (for practical purposes, since the problem of syntactic ambiguity for CF-languages is undecidable in general) that every valid string in such a language has a single P-marker, and furthermore that all the substrings of valid strings have unique structural descriptions relative to the nonterminals from which they are derivable. This assumed property of programming languages has significant implications on the types of procedures that can be proposed for their syntactic analysis (as we will see later in our procedures where effort allocation decisions are made).
The notion of the graph of a grammar, Γ(G'), and the set of structural descriptions in that grammar are strongly related. For all nonterminals X ∈ V_N, and for all the strings φ in V that are derivable from X in G', the set of structural descriptions of φ in G' relative to X can be effectively generated from Γ(G') via a generation procedure of the type outlined in the next paragraph.

The generation procedure based on Γ(G') consists of systematically tracing and recording all the distinct tree paths of the graph Γ(G') that are rooted at X; each such tree path corresponds to a structural description relative to X. A tree path starts at X; it follows a single directed branch out of each nonterminal node which has been reached by the path; it follows all the branches that leave a rule node which has been reached by the path into the adjacent (terminal and nonterminal) nodes; and it stops only at terminal or nonterminal nodes. Several approaches are possible for organizing the systematic generation of the set of structural descriptions, and also for selectively generating structural descriptions that have certain properties, as well as for stopping the generation process under given conditions.
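A depth-bounded version of this generation procedure is easy to state. The sketch below (ours, with R_4 read as B → Bd) traces the tree paths of Γ(G_1') rooted at a given node and collects the terminal strings they span, with a stopping condition of the kind just mentioned:

```python
from itertools import product

RULES = {"R1": ("P", "AB"), "R2": ("A", "ABb"), "R3": ("A", "a"),
         "R4": ("B", "Bd"), "R5": ("B", "bc")}  # p' of (2.13)
TERMINALS = set("abcd")

def terminal_strings(sym, depth):
    """Terminal strings spanned by tree paths rooted at sym that nest at most
    `depth` rule applications along any path; without such a stop condition
    the set is infinite whenever the graph contains loops (recursive
    elements)."""
    if sym in TERMINALS:
        return {sym}
    if depth == 0:
        return set()
    out = set()
    for lhs, rhs in RULES.values():
        if lhs != sym:
            continue  # a path follows one branch out of each nonterminal node
        # ...but all branches out of a rule node, in horizontal order
        component_sets = [terminal_strings(s, depth - 1) for s in rhs]
        out |= {"".join(parts) for parts in product(*component_sets)}
    return out
```

Starting the process at the designated node P enumerates members of L(G_1') up to the chosen bound.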
A generation process of special interest is that which starts at the designated node P of Γ(G') and produces all the structural description trees of terminal strings, i.e., the set of P-markers of the language L(G'). In most of the interesting cases, where the membership of a language L(G') is infinite, the set of all P-markers is infinite. This set is generated by a Γ(G') which contains loops. All the nonterminal nodes that are parts of loops are associated with elements that are called recursive. Three types of recursive elements, which have been found significant in various linguistic studies, will be discussed later in connection with the formulation of our syntactic analysis procedures. They are:

(i) Left-recursive elements. These elements occur in loops of Γ(G') where, for each rule node on the loop, the loop branch leaving the node is marked with a 1. Left-recursive elements occur iteratively in the leftmost chain of a P-marker tree.

(ii) Right-recursive elements. They occur in loops of Γ(G') where, for each rule node on the loop, the loop branch leaving the node is marked with a 1̄. These elements may occur iteratively in the rightmost chain of P-marker trees.

(iii) Self-embedding elements. They occur in loops of Γ(G') where the loop branches leaving the rule nodes on the loop are not all marked consistently with either 1 or 1̄. Self-embedding elements may occur iteratively in a tree chain of the P-marker which is neither the leftmost nor the rightmost.

In the graph Γ(G_1') of our example (see Figure 2), it can be seen that the nonterminals A and B are left-recursive.
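Left- and right-recursive elements can be read off the rules directly: a nonterminal is left-recursive (right-recursive) exactly when it lies on a cycle of the relation that steps from a rule's left side to the first (last) symbol of its right side. A sketch follows (ours; it covers cases (i) and (ii) but not the self-embedding case (iii), and reads R_4 as B → Bd):

```python
RULES = [("P", "AB"), ("A", "ABb"), ("A", "a"), ("B", "Bd"), ("B", "bc")]
NONTERMINALS = {"P", "A", "B"}

def recursive_elements(rules, position):
    """Nonterminals on a cycle of 'lhs -> rhs[position]'; position 0 gives
    the left-recursive elements (loop branches marked 1), position -1 the
    right-recursive ones (loop branches marked 1-bar)."""
    step = {}
    for lhs, rhs in rules:
        if rhs[position] in NONTERMINALS:
            step.setdefault(lhs, set()).add(rhs[position])
    reach = {x: set(ys) for x, ys in step.items()}  # transitive closure
    changed = True
    while changed:
        changed = False
        for x in reach:
            extra = set().union(*(reach.get(y, set()) for y in reach[x]))
            if not extra <= reach[x]:
                reach[x] |= extra
                changed = True
    return {x for x in reach if x in reach[x]}
```

For the rules of (2.13) this reports A and B as left-recursive (via A → ABb and B → Bd) and no element as right-recursive, in agreement with the remark about Figure 2.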
B. The Syntactic Analysis Problem

Our general objective is to find efficient solution procedures for the following problem, which is commonly called the language recognition problem:

(π_0): Given an input string x in V_T, (i) determine whether x is well formed in the language L(G) (i.e., whether x is a member of L(G)), and (ii) if x is well formed, find a P-marker of x in G'.

Since the answer to the first part of π_0 is reducible, in most nontrivial cases, to that of attempting the construction of a P-marker for x, the central problem in language recognition is that of P-marker construction. This problem is also becoming of increasing practical significance: P-marker (parse) generation of an input string constitutes the first stage of processing in syntax-directed compilers for programming languages and in certain "question-answering" information retrieval systems that respond to restricted context-free fragments of English. In systems of this type, the parse generated by the syntactic analysis stage is the input to translation and interpretation rules that assemble the appropriate computations in response to the input string.
Since it can be assumed that the source of the input string x (namely the programmer or computer user) is restricted (by intention) to generation of well-formed strings, the reason for the possible occurrence of ill-formed input strings is the presence of communication error (remember that the programmer or computer user takes part in the communication process also). While it is sufficient, in many cases, to know simply that an error has occurred, it is more desirable, in general, to obtain error identification information from an unsuccessful attempt to construct a P-marker, and better still, to automatically use this information for error correction. Therefore, even in the case of ill-formed input strings it appears desirable to make serious attempts towards the construction of a P-marker, with the view to using information derived from these attempts for error control.
The statement of the problem π_0 is especially appropriate for language theorists whose main objective is to propose and subsequently test the validity of generative grammars that are intended to characterize given languages; since our main objective is to propose efficient procedures that "understand" (in our context, this has the limited meaning of "that respond appropriately to the form of") language strings that are generated by a given grammar (via a generation process which has certain error properties), it is appropriate for us to reformulate the statement of π_0 as follows:
(π_1): Given an input string x in V_T*, (i) construct a P-marker of x in G', and (ii) if no P-marker exists, provide information (for purposes of error control) about the unsuccessful attempt to construct a P-marker.
In this paper we are mainly concerned with the first part of the problem π_1. A satisfactory solution to the second part requires the formulation of a specific rationale for error control, which is outside our present
scope. However, our approach to the construction of P-markers makes it
possible to record considerable information about the construction process
itself — information which appears relevant to error identification and
correction.
The problem as stated in π_1 requires the construction of a single
P-marker of x, if it exists. The construction of a single P-marker doesn't raise any question if the language is syntactically unambiguous. As we
pointed out previously, procedure-oriented programming languages can be
assumed to be unambiguous (at least it can be assumed that the part of the
language which is considered as proper input to a syntax-directed analyzer
will be free of known ambiguities); a similar assumption can be made about
other computer source languages that are designed to approximate small fragments of natural language and that are intended for use in "question-answering" systems, where the system is assumed to selectively respond to syntactic forms of input strings.*
Even in the case of syntactically ambiguous languages, we believe
that the introduction of explicit, non-syntactic, rules of preference for
ordering the generation of alternative structural descriptions (so that a
single "preferred" structural description is produced) will not necessarily
be detrimental to the machine "understanding" of an input string; on the contrary, it may provide an independent handle for dealing with the difficult
problems posed by ambiguities. In some of our proposed procedures (to be
discussed in Section VI), the selection of a "preferred" P-marker is based on
the principle of minimizing expected computational effort, a purely pragmatic
and non-syntactic notion. Clearly, this notion is sensitive to the specific
formulation of the syntactic analysis problem, to the solution approach, and
to our definition of "computational effort"; we will return again to this
point later.We can how formulate our objective in more specific terms. We wish to
obtain syntactic analysis procedures
mapping 6;that efficiently compute the following
if P =» x holds (2.17)if P =» x does not hold
where indicates the "preferred" P-marker of x, and E(P,x|Gj) denotes
an error description (this can be a simple indication of failure or a more
elaborate message). Our emphasis is on the concept of efficient computation
of 6; roughly, we expect that such a computation should require a minimal
* As it can be seen from a recent survey [ 5 ]by
Simmons,
not all approachesto such systems attempt to avoid syntactic ambiguities of source language.
for all x inVT , 6(x)= j p^lG ')( E (P,x|G')
17
expected expenditure of computational resources (or computational effort). To make the notion of efficiency precise (and meaningful) we need to introduce a definition of computational effort which both satisfies our intuitive requirements and is applicable in a uniform way over alternative syntactic analysis procedures. We need, therefore, as a prerequisite, a broad framework for formulating alternative procedures in a uniform way. Furthermore, it is desirable that this framework be closer to real computer programs rather than to abstract machines. After all, the requirement for efficiency comes from a desire to attain solutions of "real life" problems faster and more economically, and it is clear that results regarding relative efficiency of procedures that are formulated in a framework closer to real computations will be more useful (as guides to actual selection of computational programs) than results obtained in the world of Turing machines. Since several procedures for syntactic analysis have already been formulated in the past (both for programming and natural languages), it is desirable that these procedures be interpretable in our framework; this way, their relationships can be better understood and they can be compared with the new procedures that we will formulate.
Given a definition of computational effort and a framework for formu-lating procedures, we are still faced with the problem of actually creatingclasses of new promising procedures, comparing them with existing procedures,and choosing among them those that are optimal with respect to computational
effort (or at least of ordering procedures by degree of optimality). This isan extremely difficult ordering and optimization problem which, at present,
can be approached only empirically (i.e., by computer experimentation) in most
nontrivial cases; in such cases we are not in a position to demonstrate con-clusively that a candidate procedure with alleged optimality properties isindeed optimal. Procedures of this type are heuristic procedures. A heuristicprocedure has a status similar to that of a theory in an empirical science;it is the best procedure that we know how to devise given existing ideas and
experience - however, it is always possible that its validity (in our case,its optimality) may be refuted at the next computer run. Heuristic procedures
[6] are a central subject of study in artificial intelligence. An important class of heuristic procedures is based on an overall scheme of flexible and selective search for solution which proceeds by successive reductions of the initial problem into subsidiary problems. We call such
procedures (heuristic) problem-solving procedures of the reduction type; they apply to a large variety of problem situations (well-defined problems of the theorem-proving type), provided that the problem is cast in the appropriate form. The "appropriate representation of the problem" is one of the main criteria for the choice of framework in which our procedures have to be formulated.

The central importance of building an appropriate framework for
problem and procedure representation, as well as for formulation of efficiency
measures should be evident by now. In the next section we shall introduce
such a general logical framework for our problem. We shall subsequently
formulate within this framework classes of problem solving procedures of the
reduction type for syntactic analysis. These procedures will incorporate in
their design features that reflect our intention to minimize (our notion of)
expected computational effort.
III. FORMULATION OF THE SYNTACTIC ANALYSIS PROBLEM IN SYSTEMS OF NATURAL INFERENCE

A. Theorem-Proving Formulation
It takes only a slight reformulation of our syntactic analysis problem to recognize that it is a theorem-proving problem. We note that the CF-grammar G is a combinatorial system (a restricted semi-Thue system; see Davis [9]) which has V as its alphabet, P as its single axiom, ρ as the basis for its productions, and strings in V* as its words. Under this interpretation, for any string (word) φ in V*, a derivation sequence, D(P,φ|G), is a proof of φ in G, and φ is a theorem of G. A derivation D(P,φ|G') can be regarded as a justified proof of φ, where, in addition to the words that constitute steps of the proof, justification for each step is provided in the form of applications of rules in ρ that form valid productions. Furthermore, a structural description can be regarded as a structure of justifications that "holds together" the proof D(P,φ|G'). The strings of the language L(G) form that subset of the theorems of G whose component elements are taken exclusively from the sub-alphabet V_T. If x is in L(G), then x is a theorem of G, and a P-marker of x corresponds to the structure of justifications in a proof of x in G. Thus, the problem of constructing a P-marker of x in G' corresponds to the problem of constructing a justified proof of x in G and then extracting from it the underlying structure of justifications.

If we adopt Davis' broad notion of logic [9], which includes in a natural way the notion of a combinatorial system such as G, then we can also regard our problem as proof construction in a logic which has a single axiom P and whose rules of inference are the productions of G.
In recent years there has been considerable work on the mechanization of proof construction in various systems of symbolic logic; both the propositional calculus and the predicate calculus have received attention by several investigators. The propositional calculus has already provided a fruitful proving ground for the development of concepts and methods in an important class of heuristic problem-solving procedures [7,8]. In other work with heuristic procedures for proof construction in the propositional calculus we have found that a formulation of the proof problem in a system of natural inference has considerable advantage over formulations

* The natural inference approach, or the method of "subordinate proofs", was developed in the early 1930's by Gentzen [13] and Jaskowski [14]; it was used more recently by Fitch [15] and Nidditch [16]; and it was first used by Wang [11] for obtaining proofs by computer.
within conventional axiomatic systems. The natural inference approach permits
us to cast the proof construction problem in a broad framework of machine
problem solving, which is appropriate for the formulation of procedures of the
reduction type. In these procedures, the steps towards solution appear to
have striking similarity to the natural steps of reasoning observed in human
problem solving. Furthermore, the organization of the search for solution
required by such procedures appears to be well suited for computer implementation.
Since the syntactic analysis problem is essentially the problem of
constructing a proof in a given logic, we will attempt a natural inference
approach to the problem,* with the expectation that it will provide us the
desired unifying conceptual framework for syntactic analysis procedures.
We shall discuss next a class of natural inference systems for our
proof construction problem; these systems will provide the basis for a class
of augmented systems, to be discussed subsequently, in which our problem
solving procedures can be naturally formulated.
B . The Class of Natural Inference Systems
We shall associate with a CF-grammar G a class of seven natural inference systems of logic, which we denote by N_σ(G), where σ ∈ {t, ℓ, r, (t,ℓ), (t,r), (ℓ,r), (t,ℓ,r)}. We intend to designate by σ the "strategic approach" to proof construction associated with each of the systems; σ = t designates an approach from the "Top", σ = ℓ an approach from the "Left", σ = r an approach from the "Right", and the other values designate combinations of these approaches (we will shortly give an interpretation of these terms).
The well-formed formulas (w.f.f.s) of N_σ(G), which we denote by S, S_i (i = 0, 1, 2, ...), are expressions of the form

    α ⇒ θ,    (3.1)

where α ∈ V, θ is a string in V*, and the intended interpretation of the double arrow, ⇒, in (3.1) is as in the linguistic system (see Section II.A); namely, it denotes the reflexive and transitive relation "θ is derivable in G from α". The systems N_σ(G) have a single axiom in common; they differ, however, in their respective sets of rules of inference; they also have in common a tree form of proof, which is especially well suited for the uniform formulation
* We have recently found that Lambek [17] has proposed a formulation which has some conceptual similarity to ours; it is mainly geared, however, to problems of grammar and dictionary construction.
of various proof construction procedures. We have chosen these systems so that for every w.f.f. P ⇒ x which is valid in any N_σ(G), x is well formed in the CF-language L(G), and moreover a tree proof of P ⇒ x in any system is in a simple one-one correspondence with a P-marker of x in G. In Figure 4 we give a schematic representation of the proof construction situation that we face in N_σ(G). We find it suggestive to represent a w.f.f., say P ⇒ x, as a triangular figure with P at its apex (we call P the "Top" element) and the string x at its base (we call x the "base string", x^(1) the "Left" element, x^(ℓ) the "Right" element, and we represent the string as a jagged line). We regard such a triangle as an outline of a required configuration of replacement rules (each rule having the tree form shown in Figure 1) which is to bridge the space between the "Top" element and the "base string"; an appropriate configuration of this type is, in effect, a structural description tree. The meaning of the different strategic approaches that characterize the different systems N_σ(G) can be easily interpreted now in the light of the triangular description: an approach of a certain type indicates the corner (or combination of corners) of the triangle from which the attempts to construct the bridging configuration are made. While all the systems N_σ(G) are logically equivalent (they all have the same set of theorems), they differ in their approach to proof construction; this property is of no serious consequence for logic per se, but it is of primary significance for us, since it provides us with a rich variety of alternatives, on the basis of which efficient proof construction procedures can be formulated.

We shall now consider in detail the systems of natural inference N_σ(G).
Axiom Ω:* For all strings ψ, θ in V*, the w.f.f. ψ ⇒ θ is valid in all the systems N_σ(G) if ψ = θ.
The Natural Inference System N_t(G).

Rules of Inference {I}_t: The set {I}_t contains m rules, where each rule I_{t,i} ∈ {I}_t corresponds to a rule of replacement R_i: A_i → φ_i^(1) φ_i^(2) ... φ_i^(n_i), (1 ≤ i ≤ m), of ρ' in G' in the following way:
* This is an axiom schema, and so are the "axioms" of Section V. Since no confusion is likely to arise in our present context, we are referring to them as axioms.
Lemma 3.1. If a w.f.f. ψ ⇒ θ is valid in N_σ(G) by Ω, then θ is ψ-derivable in G.

Proof: By reflexivity of the relation ⇒.
[Figure 4 shows a triangle with P (the "Top" element) at the apex and the base string x along the base, with x^(1) the "Left" element and x^(ℓ) the "Right" element; arrows mark the "Top", "Left", and "Right" approaches.]

Fig. 4. Schematic representation of the alternative approaches to the construction of a proof of P ⇒ x in the systems N_σ(G).
I_{t,i}:* For any nonterminal X such that X = A_i, and for any (non-empty) string ω in V*, if there exist non-empty strings χ_1, χ_2, ..., χ_{n_i} in V* such that χ_1 χ_2 ... χ_{n_i} = ω, and all the w.f.f.s φ_i^(1) ⇒ χ_1, φ_i^(2) ⇒ χ_2, ..., φ_i^(n_i) ⇒ χ_{n_i} are valid in N_t(G), then X ⇒ ω is also valid in N_t(G).

Lemma 3.2(t). If a w.f.f. X ⇒ ω is valid in N_t(G) by a rule of inference I_{t,i} ∈ {I}_t, then ω is X-derivable in G.

Proof: Since X = A_i, then (by R_i) X can be replaced by the string φ_i = φ_i^(1) ... φ_i^(n_i); therefore, X ⇒ φ_i holds in G (φ_i is X-derivable in G). Now, if a φ_i-derivation of ω exists in G, an X-derivation of ω also exists in G (by the transitivity of the relation ⇒). If it is possible to partition the string ω in n_i adjacent substrings χ_1, ..., χ_{n_i} such that χ_1 is φ_i^(1)-derivable in G, ..., χ_{n_i} is φ_i^(n_i)-derivable in G, then it is clear that ω is φ_i-derivable in G (by definition of the notion of derivation in G). Hence, a rule of inference I_{t,i} is consistent in G, and the Lemma is proved. Note that by introducing the partition idea, we break down the initial question of derivability into n_i subsidiary questions of derivability, each of which is identical in form with the first, and moreover can be resolved independently of the others (thanks to the context-free nature of G).

In Figure 5 we give a graphic interpretation (in terms of the triangular descriptions introduced in Figure 4) of the situation which underlies the specification of I_{t,i}. The rule of inference considers a part of a (hypothetical) configuration that can possibly establish the required bridge over the "problematic" triangular space. Specifically, I_{t,i} considers the application of a replacement rule R_i, where the element R_i^(0) is "connected" to the application site X; we call this a "Top" application of R_i and denote it by R_i^t (the more comprehensive notation, (R_i^t, 1), introduced in (2.10), is not needed here since we have an unambiguous "Top" application site). The "Top" application of R_i fills part of the triangular space between X and ω, and it results in n_i new "Top" elements (the elements of φ_i) dangling atop the problematic space. Given a partition of the base string in n_i substrings χ_1, ..., χ_{n_i}, each of the dangling "Top" elements couples in an orderly way with the substring "below it", and n_i new problem-triangles are created. The rule I_{t,i} asserts that a structure bridging X and ω exists if there exist bridging structures for the n_i new triangles.
* A graphic interpretation of these rules of inference will be given shortly(see Figure 5), and a proof based on them is shown in Figure 6.
[Figure 5 shows a triangle with "Top" element X (where X = A_i) at the apex and the string ω at the base; a "Top" application of the replacement rule R_i (denoted R_i^t) fills the apex. The dashed lines outline problem-triangles that need to be filled by appropriate configurations of rule applications.]

Fig. 5. Graphic interpretation of the situation considered in the specification of a rule of inference I_{t,i} of N_t(G), which corresponds to a "Top" application of a rule of replacement R_i of G.
Logically, the "Top" application of R_i acts as a keystone holding together, and closing, an argument; from the point of view of proof construction, the rule application acts as a bridgehead from which further lines of construction are initiated.

The idea of partition, that enters in the formulation of {I}_t, is central in all our natural inference systems and in all the problem-solving procedures that are based on them. It enables us to reduce our problem into parts and also to benefit by our global view of the situation, so that necessary constraints that appear at subproblem boundaries can be used for a priori elimination of irrelevant lines of construction.
A partition of a string ω in n parts is an n-tuple [l_1, l_2, ..., l_n], which we denote by p_n(ω); l_i, 1 ≤ i ≤ n, stands for the length of the i'th substring in the partition, and in general,

    Σ_{1≤i≤n} l_i = ℓ(ω).    (3.2)

In the system N_t(G), the substring lengths satisfy the condition,

    l_i > 0 for 1 ≤ i ≤ n.    (3.3)
It is convenient for our purposes to introduce a tree representation of the situation which is relevant to the specification of a rule of inference I_{t,i}. Such a tree, which we call an inference tree, has the following form:

    [inference tree (3.4): the consequent w.f.f. node (X ⇒ ω) at the top, an inference node labeled by the pair (R_i^t, p_{n_i}(ω)), and the antecedent w.f.f. nodes (φ_i^(1) ⇒ χ_1), ..., (φ_i^(n_i) ⇒ χ_{n_i}) at the bottom]    (3.4)

where the substrings χ_1, χ_2, ..., χ_{n_i} are the n_i parts resulting from the partition p_{n_i}(ω) under consideration, and χ_1 χ_2 ... χ_{n_i} = ω. This labelled tree is made of,

(i) w.f.f. nodes, graphically shown as parentheses that enclose the labeling w.f.f.'s,
(ii) an inference node, denoted by a dot in the graph, and labeled by the pair of conditions under which the inference takes place, namely the specific application of a replacement rule, R_i^t, and the specific segmentation p_{n_i}(ω),
(iii) directed branches denoting the direction of the logical argument.

It should be noted that the convention shown in (3.4) on the specific ordering of the bottom w.f.f. nodes is significant for some of the tree manipulation processes that we will discuss later.
The assertion that a w.f.f. is true by the axiom Ω (we call such a w.f.f. a conclusive w.f.f.) can be represented in tree form as follows:

    [axiom link (3.5): a single directed branch from an axiom node Ω to the conclusive w.f.f. node]    (3.5)

This degenerate tree (it is a single branch, or a link) is called an axiom link; it has a single w.f.f. node (the conclusive w.f.f. node), one axiom node (denoted in the graph by Ω), and a directed branch showing the direction of the argument.

A proof in tree form in the system N_t(G) is a labeled tree which is made of inference trees and terminates exclusively with axiom links to Ω. More specifically, the proof tree has a w.f.f. node as its root; this node is the consequent node of an inference tree, the antecedent nodes of that inference tree are themselves either consequent nodes of other inference trees or conclusive nodes (linked to Ω), and so on, until all the tree terminals are Ω's.
A w.f.f. S is a theorem of N_t(G) if it labels the root node of a tree proof in N_t(G). We denote the tree proof of S in N_t(G) by D(S|N_t(G)).

As an example, we show in Figure 6 the proof in tree form of the w.f.f. P ⇒ abcd in the system N_t(G_1), where G_1 is the grammar of our illustrative example (given in (2.12)). This proof gives the solution in N_t(G_1) of the syntactic analysis problem for x = abcd which was previously shown, in its conventional linguistic form, in Figure 3. [The general question of correspondence between a proof in N_t(G) and a solution in the linguistic system will be discussed shortly.]

To "read" the proof D(P ⇒ abcd|N_t(G_1)) in Figure 6, we have to follow the information associated with the tree nodes in the direction of the arrows (from the terminals to the root). Each conclusive w.f.f. is valid in N_t(G_1) by the axiom Ω (thus, a ⇒ a, b ⇒ b, c ⇒ c and d ⇒ d are valid); each w.f.f. which has only valid w.f.f.'s as antecedents in a given inference tree
Fig. 6. The proof in tree form of P ⇒ abcd in the system N_t(G_1).

[Figure legend, the rules of the illustrative grammar G_1: R_1: P → AB; R_2: A → ABb; R_3: A → a; R_4: B → Bd; R_5: B → bc; the root w.f.f. is (P ⇒ abcd).]
(these are the bottom nodes of the inference tree) is also valid (thus, first A ⇒ a and B ⇒ bc, then B ⇒ bcd, and finally the candidate theorem P ⇒ abcd are valid in N_t(G_1)).

It should be noted that all the subtrees of a proof tree in the system N_t(G) are themselves proof trees in N_t(G), and therefore their root w.f.f.'s are theorems; thus, all the intermediate w.f.f.'s in a proof tree are theorems in N_t(G).

It is most likely that if we are faced with the problem of constructing a tree proof in N_t(G), rather than with the task of "reading" it (ascertaining whether it is valid), the direction of our attention would run against the arrows of the proof tree; we would start with the candidate theorem P ⇒ x; we would attempt to apply in reverse an inference tree where (P ⇒ x) would be the top node and one or more (new) w.f.f. nodes would be the bottom nodes; we would next focus attention on the w.f.f.'s in the bottom nodes; and we would treat them recursively in the same manner as the candidate theorem, until we would reach terminal w.f.f.'s that are directly recognizable as valid by the axiom of the system. Our general approach to automatic proof construction is to develop problem-solving procedures that proceed intelligently in the construction of a proof, in accordance with the "backward reasoning" scheme that we have just outlined. We shall discuss such procedures later, after completing the presentation of the logical framework which provides a basis for their formulation and study.
The Natural Inference System N_ℓ(G).

Rules of Inference {I}_ℓ: The set {I}_ℓ contains m rules, where each rule I_{ℓ,i} ∈ {I}_ℓ corresponds to a rule of replacement R_i: A_i → φ_i^(1) φ_i^(2) ... φ_i^(n_i), (1 ≤ i ≤ m), of ρ' in G' in the following way:

I_{ℓ,i}:* For any nonterminal X, and for any (non-empty) string ω = ω^(1) ω^(2) ... ω^(ℓ) such that ω^(1) = φ_i^(1), if there exist strings χ_1, χ_2, ..., χ_{n_i} in V* (where χ_1 may be empty but the remaining strings are non-empty) such that χ_2 χ_3 ... χ_{n_i} χ_1 = ω̄^(1) (where ω̄^(1) denotes the complement of ω^(1) relative to ω, i.e., ω̄^(1) = ω^(2) ω^(3) ... ω^(ℓ)), and all the w.f.f.'s X ⇒ A_i χ_1, φ_i^(2) ⇒ χ_2, ..., φ_i^(n_i) ⇒ χ_{n_i} are valid in N_ℓ(G), then X ⇒ ω is also valid in N_ℓ(G).

Lemma 3.2(ℓ). If a w.f.f. X ⇒ ω is valid in N_ℓ(G) by a rule of inference I_{ℓ,i} ∈ {I}_ℓ, then ω is X-derivable in G.

* A graphic interpretation of these rules of inference will be given shortly (see Figure 7), and a proof based on them is shown in Figure 8.
[Figure 7 shows a triangle with "Top" element X at the apex and the string ω at the base; a "Left" application of R_i (denoted R_i^ℓ) is attached at the "Left" element ω^(1). The dashed lines outline problem-triangles that need to be filled by appropriate configurations of rule applications.]

Fig. 7. Graphic interpretation of the situation considered in the specification of a rule of inference I_{ℓ,i} of N_ℓ(G), which corresponds to a "Left" application of a rule of replacement R_i of G.
Proof: Suppose that, for the given R_i, (i) it is possible to partition the substring ω̄^(1) in n_i adjacent substrings χ_2, ..., χ_{n_i}, χ_1 such that χ_2 is φ_i^(2)-derivable in G, ..., χ_{n_i} is φ_i^(n_i)-derivable in G, and (ii) A_i χ_1 is X-derivable in G. Since R_i is such that φ_i^(1) = ω^(1), then the string ω^(1) χ_2 χ_3 ... χ_{n_i} is A_i-derivable in G (by suppositions (i) and properties of ⇒). Therefore (by supposition (ii), and properties of ⇒), the string ω^(1) χ_2 χ_3 ... χ_{n_i} χ_1 is X-derivable in G, which (see definitions of string parts) proves the Lemma.
In Figure 7 we give a graphic interpretation of the situation which underlies the specification of a rule I_{ℓ,i}. In the present case, the rule of inference considers an application of a replacement rule R_i, where the element R_i^(1) is "connected" to the application site ω^(1), i.e., the "Left" element of ω; we call this a "Left" application of R_i, and we denote it by R_i^ℓ. For a given partition of the string ω̄^(1) in n_i substrings, n_i new problem-triangles are created, as shown in the figure. The rule I_{ℓ,i} asserts that a structure bridging X and ω exists if there exist bridging structures for the n_i new triangles.
In the system N_ℓ(G), a partition n-tuple [l_1, ..., l_n], which we denote by p_n(ω̄^(1)), specifies the lengths of a set of n substrings χ_1, ..., χ_n of ω̄^(1); the substring lengths satisfy the conditions,

    l_1 ≥ 0, l_i > 0 for 1 < i ≤ n, and Σ_{1≤i≤n} l_i = ℓ(ω) − 1.    (3.6)
As in the case of N_t(G), we introduce in N_ℓ(G) the notion of an inference tree, which represents the situation in which a rule of inference I_{ℓ,i} is specified. Such a tree has the following form:

    [inference tree (3.7): the consequent w.f.f. node (X ⇒ ω) at the top, an inference node labeled by the pair (R_i^ℓ, p_{n_i}(ω̄^(1))), and the antecedent w.f.f. nodes (φ_i^(2) ⇒ χ_2), ..., (φ_i^(n_i) ⇒ χ_{n_i}), (X ⇒ A_i χ_1) at the bottom]    (3.7)

where the substrings χ_1, χ_2, ..., χ_{n_i} are the n_i parts that result from the partition p_{n_i}(ω̄^(1)) under consideration, and χ_2 χ_3 ... χ_{n_i} χ_1 = ω̄^(1). The convention shown in (3.7) on the specific ordering of the bottom w.f.f.'s will be used subsequently in our tree manipulation procedures.
The notion of a tree proof is the same in N_ℓ(G) as in N_t(G); the inference trees that enter as building blocks in a tree proof in N_ℓ(G) are of the type shown in (3.7). A w.f.f. S is a theorem of N_ℓ(G) if it labels the root node of a tree proof in N_ℓ(G); we denote such a proof by D(S|N_ℓ(G)).
It is of interest to note some special cases of partition that occur in N_ℓ(G) proofs, and that appear in the proof of Figure 8. (i) If a unary replacement rule R_i is the basis for an inference tree whose root node is (X ⇒ ω), then the partition is degenerate; since n_i = 1, the partition results in a single substring χ_1 with l_1 = ℓ(ω) − 1 (in Figure 8 this is the case with the rule application R_3^ℓ on (P ⇒ abcd)). (ii) If the substring χ_1 is empty, then the segmentation has the form [0, l_2, ..., l_n] (since l_1 = 0); this can occur if, for a rule of replacement R_i: A_i → φ_i, which is considered for a w.f.f. X ⇒ ω, we have both X = A_i and ω̄^(1) derivable in G from the remainder of φ_i (in the proof of Figure 8 this is the case with the rule applications R_1^ℓ on (P ⇒ Abcd) and R_4^ℓ on (B ⇒ Bd)).
The Natural Inference System N_r(G).

Rules of Inference {I}_r: The set {I}_r contains m rules, where each rule I_{r,i} ∈ {I}_r corresponds to a rule of replacement R_i: A_i → φ_i^(1) ... φ_i^(n_i−1) φ_i^(n_i), (1 ≤ i ≤ m), of ρ' in G' in the following way:

I_{r,i}:* For any nonterminal X, and for any (non-empty) string ω = ω^(1) ω^(2) ... ω^(ℓ) such that ω^(ℓ) = φ_i^(n_i), if there exist strings χ_1, χ_2, ..., χ_{n_i} in V* (where χ_1 may be empty but the remaining n_i−1 strings may not) such that χ_1 χ_{n_i} χ_{n_i−1} ... χ_2 = ω̄^(ℓ) (where ω̄^(ℓ) denotes the complement of ω^(ℓ) relative to ω, i.e., ω̄^(ℓ) = ω^(1) ω^(2) ... ω^(ℓ−1)), and all the w.f.f.'s X ⇒ χ_1 A_i, φ_i^(n_i−1) ⇒ χ_2, ..., φ_i^(1) ⇒ χ_{n_i} are valid in N_r(G), then X ⇒ ω is also valid in N_r(G).
Lemma 3.2(r). If a w.f.f. X ⇒ ω is valid in N_r(G) by a rule of inference I_{r,i} ∈ {I}_r, then ω is X-derivable in G.

Proof: Similar to that of Lemma 3.2(ℓ) (with appropriate adjustment for the change from "Left" to "Right").
In Figure 9 we show the situation which underlies the specification of a rule I_{r,i}. Here, I_{r,i} considers an application of the replacement rule
* A graphic interpretation of these rules of inference will be given shortly(see Figure 9), and a proof based on them is shown in Figure 10.
Fig. 8. The proof in tree form of P ⇒ abcd in the system N_ℓ(G_1).
[Figure 9 shows a triangle with "Top" element X at the apex and the string ω at the base; a "Right" application of R_i (denoted R_i^r) is attached at the "Right" element ω^(ℓ) of ω, where ω^(ℓ) = φ_i^(n_i). The dashed lines outline problem-triangles that need to be filled by appropriate configurations of rule applications.]

Fig. 9. Graphic interpretation of the situation considered in the specification of a rule of inference I_{r,i} of N_r(G), which corresponds to a "Right" application of a rule of replacement R_i of G.
R_i, where the element R_i^(n_i) is "connected" to the application site ω^(ℓ), i.e., the "Right" element of ω; we call this a "Right" application of R_i, and we denote it by R_i^r. For a given partition of the string ω̄^(ℓ) in n_i substrings, n_i new problem-triangles are created; according to I_{r,i}, a structure bridging X and ω exists if there exist bridging structures for the n_i new triangles.

In the system N_r(G), a partition n-tuple [l_1, ..., l_n], which we denote by p_n(ω̄^(ℓ)), specifies the lengths of a set of n substrings χ_1, ..., χ_n of ω̄^(ℓ); these lengths satisfy the conditions,

    l_1 ≥ 0, l_i > 0 for 1 < i ≤ n, and Σ_{1≤i≤n} l_i = ℓ(ω) − 1.    (3.8)
An inference tree in N_r(G) has the following form:

    [inference tree (3.9): the consequent w.f.f. node (X ⇒ ω) at the top, an inference node labeled by the pair (R_i^r, p_{n_i}(ω̄^(ℓ))), and the antecedent w.f.f. nodes (X ⇒ χ_1 A_i), (φ_i^(n_i−1) ⇒ χ_2), ..., (φ_i^(1) ⇒ χ_{n_i}) at the bottom]    (3.9)

where the substrings χ_1, ..., χ_{n_i} are the n_i parts that result from the partition p_{n_i}(ω̄^(ℓ)) under consideration, and χ_1 χ_{n_i} χ_{n_i−1} ... χ_2 = ω̄^(ℓ). Again, the convention shown here on the specific ordering of the bottom w.f.f.'s will be observed in the subsequent tree representations.

The definitions of tree proof and theoremhood that were made for the systems N_t(G) and N_ℓ(G) carry over to N_r(G) in the obvious way; of course, the tree proof of a w.f.f. S in N_r(G), which we denote by D(S|N_r(G)), is constructed with inference trees of the type shown in (3.9). As an example, we show in Figure 10 the tree proof in N_r(G_1) of our illustrative problem P ⇒ abcd (tree proofs in N_t(G_1) and N_ℓ(G_1) are shown in Figures 6 and 8, respectively). [It is of interest to note in the proof of Figure 10 examples of special cases of partition that occur in N_r(G); i.e., unary replacement rule cases, and l_1 = 0 cases, as well as combinations of these cases, e.g. the rule application R_3^r on (A ⇒ a).]
Fig. 10. The proof in tree form of P ⇒ abcd in the system N_r(G_1).
The Mixed Natural Inference Systems N_{σmixed}(G), where σmixed ∈ {(t,ℓ), (t,r), (ℓ,r), (t,ℓ,r)}.

Rules of Inference {I}_{σmixed}:

    {I}_{(t,ℓ)} = {I}_t ∪ {I}_ℓ, ..., {I}_{(t,ℓ,r)} = {I}_t ∪ {I}_ℓ ∪ {I}_r.

Thus, the rules of inference of a mixed system, which offers a combination of strategic approaches to the proof problem, are the union of the rules of inference of the "pure" systems whose approaches may be used in the mixed system.

A mixed system offers the convenience of constructing proofs where at each step of construction one can decide what form of inference to use ("Top", "Left", or "Right" inference, according to the choices available in the system), i.e., at what site of a candidate w.f.f. to attempt a rule application in order to advance the argument. The mixed system extends the problem-solver's flexibility of approach, a fact of great potential significance for the efficiency of the proof construction procedure.

Lemma 3.2(σmixed). If a w.f.f. X ⇒ ω is valid in N_{σmixed}(G) by a rule of inference I_{σmixed,i} ∈ {I}_{σmixed}, then ω is X-derivable in G.
Proof: By Lemmas 3.2(t), 3.2(ℓ), 3.2(r).
A proof in tree form in a mixed system of natural inference N_{σmixed}(G) is constructed of inference trees that may be taken from any of its component "pure" systems; we denote the tree proof of S in N_{σmixed}(G) by D(S|N_{σmixed}(G)). As an example, we show in Figure 11 one of the possible tree proofs of our problem P ⇒ abcd in N_{(t,ℓ,r)}(G_1) (the proofs in N_t(G_1), N_ℓ(G_1) and N_r(G_1) are shown in Figures 6, 8, and 10, respectively).
Logical Consistency Theorem. If a w.f.f. P ⇒ x is a theorem in any of the systems of natural inference N_σ(G), then x is well-formed in the CF-language L(G).

Proof: The string x is well-formed in L(G) if it is P-derivable in G. By Lemma 3.1 and the various versions of Lemma 3.2, we know that if a w.f.f. X ⇒ ω is valid in any of the systems N_σ(G) by the axiom Ω or by any of the
rules of inference in N_σ(G), then ω is X-derivable in G. Because of this, and since P ⇒ x is, by hypothesis, a theorem of N_σ(G) (i.e., it is at the root node of a proof tree in N_σ(G)), it follows (by the definition of a proof tree) that x is P-derivable in G. This proves the theorem.
Fig. 11. One of the tree form proofs of P ⇒ abcd in the mixed system N_(t,ℓ,r)(G_1). [Rules of G_1: R_1: P → AB; R_2: A → aBb; R_3: A → a; R_4: B → Bd; R_5: B → bc.]
IV. CONVERSION FROM PROOF TREES TO DESCRIPTION TREES
The substance of this section will be presented as a separate technical paper. The main result shows how to construct procedures for converting tree proofs in the seven systems N_σ(G) into P-markers of G. This is stated as:

Structural Consistency Theorem. If a w.f.f. P ⇒ x is a theorem in any of the systems of natural inference N_σ(G), then a P-marker tree of x in G is in one-one correspondence with the tree proof of P ⇒ x. Moreover, there exist effective procedures for obtaining the P-marker from the tree proof.
V. NATURAL DECISION SYSTEMS FOR SYNTACTIC ANALYSIS
We shall now extend our natural inference systems N_σ(G) into systems 𝒩_σ(G) that have richer inferential mechanisms and that are especially well suited for the formulation of natural decision procedures; we call the systems 𝒩_σ(G) natural decision systems.
We associate with each w.f.f. S in 𝒩(G) a value, v_k(S), (k = 0, 1, 2, ...), whose intended meaning is close to that of "truth value" in logic; more specifically, we intend it to reflect the state of knowledge of a problem solver at a given time, k, about the validity (theoremhood) of S in a given decision system. We find that this notion of value is extremely useful for the formulation of decision procedures and problem-solving procedures in general, and we believe that it will prove considerably fruitful for the theoretical study of the dynamics of knowledge and uncertainty in traces (or trajectories) of such procedures when they are observed as physical events. We assume that v_k(S) takes values from the set {1, u, 0}; we intend v_k(S) = 1 to mean that the problem solver knows at time k that S is valid in the given system of decision 𝒩_σ(G), i.e., S is provable in the system; v_k(S) = 0 is intended to mean that the problem solver knows that S is not valid in 𝒩_σ(G), i.e., S is not provable in 𝒩_σ(G); if the value of S is u, then we mean that the problem solver is uncertain about the validity of S in 𝒩_σ(G). The value "u" should be interpreted numerically as a number between 0 and 1. Since we are not considering in this paper notions such as "forgetting" or "error in problem solving", we will agree that if a w.f.f. has assumed at any time k a value 1 or 0, then its value will remain stable for all times after k. In other words, 1 and 0 are stable values. On the other hand, "u" is an unstable value which may change with time; moreover (as we shall see shortly), it is the intended function of a problem-solving procedure to attempt to "stabilize" the "u" value of a w.f.f. under consideration, by changing it to one of the two stable values. It may be desirable in some problem-solving situations to consider more than three values for v_k(S); however, we find this set sufficient for our present purposes.
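The value bookkeeping just described can be put in present-day code. The following minimal sketch is an illustration only; the numeric encoding 1, 0.5, 0 for the values 1, u, 0 and the function names are assumptions of the sketch, not notation from the report.

```python
# Three-valued "state of knowledge" values, encoded numerically so that
# u lies between the two stable values: 1 = valid, 0 = not valid, u = 0.5.
VALID, UNCERTAIN, INVALID = 1.0, 0.5, 0.0

def update_value(old, new):
    """Update the value of a w.f.f., enforcing the stability convention:
    the stable values 0 and 1, once reached, never change."""
    if old in (VALID, INVALID):
        return old          # 1 and 0 are stable
    return new              # u is unstable and may be revised

# A value trace v_k(S) over times k = 0, 1, 2, ...: once the w.f.f.
# becomes valid at some k, it stays valid afterwards.
trace = [UNCERTAIN]
for observed in (UNCERTAIN, VALID, UNCERTAIN):
    trace.append(update_value(trace[-1], observed))
# trace is now [0.5, 0.5, 1.0, 1.0]
```

The numeric ordering 0 < u < 1 is what later allows the inference mappings to be computed with Min and Max.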
We shall now introduce certain elementary notions that will enable us to formulate an effective basis for refutations (i.e., for recognizing that a given w.f.f. is not a theorem in a system 𝒩_σ(G)). Consider the set ℒ_X(G) of all terminal strings of a nonterminal X in G; this set was introduced in (2.5). Given a w.f.f. X ⇒ ω, where ω is in V_T, it is valid to conclude that ω is not X-derivable in G if we know that ω ∉ ℒ_X(G). Now, if we can state "simple" necessary conditions for membership in ℒ_X(G) (in the sense that they can be readily tested), then we can test ω with respect to these conditions, and if ω fails one of these tests, we can conclude that ω is not X-derivable in G.

A useful property of ℒ_X(G) is its support, s(X), which was defined in (2.6). Clearly, if l(ω) < s(X), then ω ∉ ℒ_X(G). There is a class of properties, similar to "support", that can be formulated in terms of lengths of strings in ℒ_X(G), and that provide ways for testing whether a given string is refutable. Such tests examine whether the length of a candidate string falls in a "forbidden region" of string lengths in ℒ_X(G); if it does, then we can conclude that the string is not X-derivable in G.
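The support test lends itself to direct computation. The sketch below is an illustration only (the dict encoding of rules and the function names are assumptions, and the grammar shown is merely in the spirit of the report's illustrative example): it computes s(X) as the length of a shortest terminal string derivable from X, by fixpoint iteration, and then applies the refutation test l(ω) < s(X).

```python
# Grammar rules as {nonterminal: [right-hand sides]}, right sides as tuples.
RULES = {
    "P": [("A", "B")],
    "A": [("a", "B", "b"), ("a",)],
    "B": [("B", "d"), ("b", "c")],
}

def supports(rules):
    """Compute s(X) = length of a shortest terminal string derivable
    from X, by iterating the rule system to a fixpoint."""
    INF = float("inf")
    s = {x: INF for x in rules}
    length = lambda sym: 1 if sym not in rules else s[sym]  # terminals count 1
    changed = True
    while changed:
        changed = False
        for x, rhss in rules.items():
            for rhs in rhss:
                cand = sum(length(sym) for sym in rhs)
                if cand < s[x]:
                    s[x], changed = cand, True
    return s

def refuted_by_support(x, omega, s):
    """Refutation test: if l(omega) < s(X), omega is not X-derivable."""
    return len(omega) < s[x]
```

For the rules above the fixpoint gives s(A) = 1, s(B) = 2, s(P) = 3, so a candidate string of length 2 is refuted for P at once.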
The set ℒ_X(G) is a subset of the set Σ_X(G), which includes all strings in V that are derivable in G from X, i.e.,

(5.1)    Σ_X(G) = {φ | φ is in V and φ is X-derivable in G} .

Note that X is also a member of Σ_X(G). Now, if it is known that φ ∉ Σ_X(G), then it can be validly inferred that φ is not X-derivable in G. We say that a string φ in V is X-derivable from the left in G if there are strings in Σ_X(G) whose first component is φ^(1). We denote this relation by X ⇒_ℓ φ. If a candidate string does not satisfy derivability from the left, then it is not in Σ_X(G). The relation of X-derivability from the right is similar, and we denote it by X ⇒_r φ. Left and right derivabilities can be tested with relative ease in the graph of the grammar, Γ(G′) (an example of such a graph for our illustrative problem is given in Figure 2). For a given X and φ, X ⇒_ℓ φ holds if it is possible to find in Γ(G′) a path from φ^(1) to X (by going against the arrows), such that all the branches that lie on the path and that leave rule nodes (here we refer to the direction of the arrows in Γ(G′)) are labeled 1. Similarly, X ⇒_r φ holds if it is possible to find in Γ(G′) a path from the last component of φ to X, such that all the branches that lie on the path and that leave rule nodes are labeled with the last position of the corresponding rule.
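In present-day terms, left derivability can be tested by a reflexive-transitive closure over the first components of rule right sides (and right derivability, symmetrically, over the last components). The sketch below is an illustrative rendering of that idea, not the report's graph algorithm; the rule encoding and names are assumptions.

```python
# Rules as {nonterminal: [right-hand sides]} (tuples), as in the earlier sketch.
RULES = {
    "P": [("A", "B")],
    "A": [("a", "B", "b"), ("a",)],
    "B": [("B", "d"), ("b", "c")],
}

def corner_closure(rules, index):
    """Reflexive-transitive closure of the 'first component' (index=0)
    or 'last component' (index=-1) relation between symbols."""
    corners = {x: {x} for x in rules}
    changed = True
    while changed:
        changed = False
        for x, rhss in rules.items():
            for rhs in rhss:
                y = rhs[index]
                new = corners.get(y, set()) | {y}
                if not new <= corners[x]:
                    corners[x] |= new
                    changed = True
    return corners

LEFT = corner_closure(RULES, 0)    # possible first elements of strings in Sigma_X
RIGHT = corner_closure(RULES, -1)  # possible last elements of strings in Sigma_X

def left_derivable(x, phi):
    """Test X =>_l phi: can some string derivable from X begin with
    the same element as phi?"""
    return phi[0] in LEFT[x]
```

A candidate w.f.f. X ⇒ φ can thus be refuted in time proportional to the closure lookup, without any search.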
Given a w.f.f. X ⇒ β, where X ∈ V_N and β ∈ V, it is important for our decision procedures to know whether the element β is X-derivable in G. Note that if β is derivable from X, then there must exist in P a finite sequence of unary replacement rules (possibly a single unary rule) that can take X to β. This can be easily tested in the graph Γ(G′).
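The unary-rule test reduces to reachability over the unary rules alone. A small illustrative sketch (the names and the pair encoding of unary rules are assumptions):

```python
# Unary replacement rules only, as (left, right) pairs; an illustrative set.
UNARY_RULES = [("X", "A"), ("A", "B"), ("C", "D")]

def unary_derivable(x, beta, unary_rules):
    """Test whether a single element beta is reachable from x by a
    (possibly empty) sequence of unary replacement rules."""
    seen, frontier = {x}, [x]
    while frontier:
        cur = frontier.pop()
        for left, right in unary_rules:
            if left == cur and right not in seen:
                seen.add(right)
                frontier.append(right)
    return beta in seen
```

Since each element enters the frontier at most once, the test terminates even if the unary rules contain cycles.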
In general, it is extremely important for our proof construction procedures to have a set of easily testable necessary conditions for strings in Σ_X(G), so that a candidate w.f.f. X ⇒ φ which is not valid can be refuted early; this minimizes the expected expenditure of search effort in the direction of "dead ends". We are facing here a classical problem of pattern recognition, namely that of selecting a set of features on the basis of which a decision about class membership can be efficiently made. (In our present case, the classes of interest are Σ_X(G), for all X in V_N.) The problem is not only to find features from which we can make logically valid inferences. The features should be such that they can be tested with a relatively small amount of "computational effort". This implies (among other things) that appropriate representations of the grammar should be available, in the sense that the grammar features that are to be used for decision should be easily testable in these representations. [We find the graph Γ(G′) (with the additional property that at each nonterminal node X, the "support" of X is also available) to be a satisfactory representation of the grammar, with respect to the elementary feature tests that we are using here.] The problems that exist in this general area need considerable further study; the recent work on "question-answering" systems is relevant here.
We shall use for the systems 𝒩_σ(G) a small set of refutation conditions which is sufficient for the formulation of the axioms of 𝒩_σ(G).

Axioms of 𝒩_σ(G) (for all σ):

Validation Axiom 𝒜: Same as in N_σ(G).

Refutation Axioms:

𝒜_{0,1}: For all strings x in V_T and all strings φ in V, if x ≠ φ, then the w.f.f. x ⇒ φ is not valid in 𝒩_σ(G).

𝒜_{0,2}: For all X ∈ V_N and all strings x in V_T, if l(x) < s(X), then X ⇒ x is not valid in 𝒩_σ(G).

𝒜_{0,3}: For all X ∈ V_N and all strings φ in V, if φ is not X-derivable from the left in G, or φ is not X-derivable from the right in G, then X ⇒ φ is not valid in 𝒩_σ(G).

𝒜_{0,4}: For all X ∈ V_N and all β ∈ V, if β is not X-derivable in G (by a sequence of applications of unary replacement rules), then X ⇒ β is not valid in 𝒩_σ(G).
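Taken together, the axioms act as a partial assignment of values: validation yields 1, refutation yields 0, and everything else remains uncertain. A minimal sketch of this dispatch follows (an illustration only: the helper predicates stand for the tests described above, and we assume for the sketch that the validation axiom validates identity w.f.f.'s x ⇒ x on terminal strings).

```python
VALID, UNCERTAIN, INVALID = 1.0, 0.5, 0.0

def axiom_value(lhs, omega, is_terminal_string, support,
                left_ok, right_ok, unary_ok):
    """Assign a value to the w.f.f. lhs => omega: 1 if validated,
    0 if refuted by one of A01..A04, u otherwise."""
    if is_terminal_string(lhs):
        # A (validation, assumed here to cover x => x); A01 refutes
        # x => phi with x != phi, since no rule applies on a terminal string.
        return VALID if lhs == omega else INVALID
    if is_terminal_string(omega) and len(omega) < support(lhs):
        return INVALID                        # A02: support test
    if not (left_ok(lhs, omega) and right_ok(lhs, omega)):
        return INVALID                        # A03: left/right derivability
    if len(omega) == 1 and not unary_ok(lhs, omega[0]):
        return INVALID                        # A04: unary-rule derivability
    return UNCERTAIN
```

Any additional refutation condition of the kind discussed above would simply become one more early-return branch.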
Lemma 5.1. For all a ∈ V and all strings φ in V, if a w.f.f. a ⇒ φ is refuted in 𝒩_σ(G) by 𝒜_{0,1}, 𝒜_{0,2}, 𝒜_{0,3}, or 𝒜_{0,4}, then φ is not a-derivable in G.

Proof: If 𝒜_{0,1} applies, then there is no rule of replacement which is applicable on a (since a is, by supposition, in V_T); hence, since a ≠ φ, no derivation exists in G from a to φ. If 𝒜_{0,2} applies, then (as discussed previously) φ ∉ ℒ_a(G); therefore φ is not a-derivable in G. If 𝒜_{0,3} applies, then (from our previous discussion) φ ∉ Σ_a(G); therefore φ is not a-derivable in G. Finally, the axiom 𝒜_{0,4} is clearly valid in G. This proves the lemma.
We can regard any validation axiom as assigning the value 1 to any
w.f.f. on which it applies. Also, any refutation axiom assigns the value 0 to
any w.f.f. on which it applies.
It is possible, and sometimes practically desirable, to augment the
set of axioms that we are using in 𝒩_σ(G), both for validation and refutation.
We have already discussed the problem of selecting additional conditions for
refutation; such conditions can be used to augment the set of refutation axioms.
A natural way to augment the set of validation axioms is to introduce for each
replacement rule R: A → φ an axiom which validates the w.f.f. A ⇒ φ. Another
natural way of increasing the efficiency of a decision procedure is to utilize
in a dynamic way previously proved theorems. In the course of executing
decision procedures, it happens sometimes that a w.f.f. has been proved valid
in one part of a search tree at time k (a search tree provides a trace of the
decision procedure up to a certain time - it will be discussed shortly), while
at a later time and in a different part of the search tree there appears the
same w.f.f. with a "v" value associated to it. Under these conditions, it is
useful to be able to consider the previously proved w.f.f. as a theorem which
provides validation for the new w.f.f., and hence to avoid the need for repeating the proof of the new w.f.f. A decision system with such a variable set of available theorems, while logically equivalent to a system without theorems that is based on the initial set of axioms, may provide the basis for an extremely
efficient decision procedure. In general, however, the optimal size of the set
of available valid w.f.f.'s and the mode of its growth depends on a variety of
pragmatic considerations that are related to the available means for storage
and retrieval.

Suppose that a rule application of type τ, where τ ∈ {t, ℓ, r}, is considered for a w.f.f. S_0 = (X ⇒ ω) whose value is uncertain. For each of the
three types of rule application, there exists a set of relevant replacement rules relative to S_0, which we denote by {R}_{S_0}^τ. The set of relevant replacement rules is such that if S_0 has a proof with a rule application R^τ at S_0, then the rule R must be among the relevant rules of the set {R}_{S_0}^τ. The set of relevant replacement rules is a subset of the set of applicable rules (of the given application type) on S_0. For example, if the "Top" approach on S_0 is considered, then {R}_{S_0}^t ⊆ {R | R^(0) = X}. The members of the set {R}_{S_0}^t are those members of the set of applicable rules {R | R^(0) = X} that satisfy certain necessary conditions of relevance. An example of a condition of relevance for a rule R: A → φ which is a candidate for a "Top" application on S_0 is that both the relations φ^(1) ⇒_ℓ ω and φ^(n) ⇒_r ω should be satisfied, where φ^(1) and φ^(n) are the first and last components of φ. The problem of relevance conditions will be discussed in more detail later. It suffices at this point to indicate that, for each w.f.f., we can effectively obtain the set of all relevant replacement rules that are applicable on the w.f.f. from the "Top", the "Left", or the "Right".

For the given S_0 and a relevant replacement rule R ∈ {R}_{S_0}^τ, there exists a set of relevant partitions, which we denote by {p_n(ω)}_R^τ. The set of relevant partitions is such that if S_0 has a tree proof with a rule application R^τ at S_0 and a given partition p_n(ω) associated with it, then p_n(ω) must be among the relevant partitions in {p_n(ω)}_R^τ. The set of relevant partitions is a subset of the set of all possible partitions of a string of N elements (where l(ω) = N) into n parts, where the restriction is based on certain necessary conditions of relevance, to be discussed later. Again, it suffices to indicate at present that the set of relevant partitions can be effectively enumerated.
We can regard a rule of inference I_j in one of the systems of natural inference N_σ(G) as the partial specification for a mapping between the values of a set of w.f.f.'s (the antecedents in the specification of I_j) and the value of another w.f.f. (the consequent), given that the constituent elements of antecedent and consequent w.f.f.'s are related in a specified manner. We shall define next the notion of an inference mapping, η_{R,p}, of which such a rule of inference is a partial specification, with the intention of augmenting the inferences that are possible in our systems of natural decision, in a way which would be advantageous for work with decision procedures and their associated problem-solving procedures.

Consider a w.f.f. S_0 = (X ⇒ ω), where a given application of a relevant rule of replacement from {R}_{S_0}^τ takes place; let R^τ denote this application and suppose that R labels the rule A → φ. For a given relevant partition
p_n(ω) ∈ {p_n(ω)}_R^τ, we have n antecedent w.f.f.'s S_j, 1 ≤ j ≤ n, whose values at time k we denote by v_k(S_j). [The nature of the antecedent w.f.f.'s has been discussed in detail in Section III for each of the three "pure" approaches to rule application.] In reference to this situation, we define the following inference mapping, η_{R,p}:

(5.2)    η_{R,p}: v_k(S_0 | R^τ, p_n(ω)) = Min_{1≤j≤n} v_k(S_j) .

We call the left side of η_{R,p} the conditional value of S_0 at time k, given R^τ and p_n(ω).
Lemma 5.2. The inferences obtained via the mapping η_{R,p} in a system 𝒩_τ(G), where τ ∈ {t, ℓ, r}, are valid in the linguistic system G.

Proof: Let us consider in turn the three possible valuations of the conditional value defined by η_{R,p}:

(i) If, for given R^τ and p_n(ω), we have v_k(S_j) = 1 for all j, 1 ≤ j ≤ n, then the conditional value of S_0 at time k is 1, according to η_{R,p}; this agrees with our definition of the rules of inference I_t, I_ℓ, I_r that were shown to be valid in G (see Lemmas 3.2). Note that the case just discussed covers precisely the partial specification of η_{R,p} defined by a rule of inference I_τ; the next two cases refer to new inferences introduced by η_{R,p}.

(ii) If v_k(S_j) = 0 for some j, then the conditional value of S_0 at time k (given a pair R^τ and p_n(ω)) is 0, according to η_{R,p}. This inference certainly satisfies our intended interpretation of ⇒ in G. For if, for the given partition of ω, one of the antecedent w.f.f.'s does not hold in G, then clearly the w.f.f. S_0 does not hold in G either. If 𝒩_τ(G) is to be consistent with G, then S_0 should not be a theorem of 𝒩_τ(G) under the given conditions, which is precisely what is inferred in η_{R,p}.

(iii) If for no j, 1 ≤ j ≤ n, we have v_k(S_j) = 0, but for some j we have v_k(S_j) = u, then the conditional value of S_0 at time k (given a pair R^τ and p_n(ω)) is u (uncertain), according to η_{R,p}. In other words, for a given partition of ω for which it is known at time k that some (or none) of the antecedent formulas hold in G, but it is uncertain whether some (or all) of the formulas hold in G, it is uncertain at that
time whether an X-derivation of ω in G exists on the basis of the given rule application and the given partition. It is therefore reasonable, and not inconsistent with G, to consider the theoremhood of S_0 uncertain at time k, which is what is inferred from η_{R,p}.

It should be noted that the notion of the inference mapping η_{R,p} is closely related to that of "truth function", a notion of extreme usefulness for reducing logical problems to algebraic and computational forms. Specifically, η_{R,p} corresponds to conjunction in a 3-valued logic of the type proposed by Post [18].
Next, we introduce two compound inference mappings that are closely related to η_{R,p}; we denote them by η_R and η, respectively. The mapping η_R is defined as follows:

(5.3)    η_R: v_k(S_0 | R^τ, {p_n(ω)}_R^τ) = Max_{{p_n(ω)}_R^τ} v_k(S_0 | R^τ, p_n(ω)) .

We call the left side of η_R the conditional value of S_0 at time k given R^τ and all the relevant partitions in {p_n(ω)}_R^τ. In our truth functional interpretation of inference mappings, η_R corresponds to a disjunction of conjunctions in a 3-valued Post logic. This "disjunctive normal form" contains as clauses the conditional values that are associated with all the relevant partitions of ω for a given R^τ.
Lemma 5.3. The inferences obtained via the mapping η_R in a system 𝒩_τ(G), (τ ∈ {t, ℓ, r}), are valid in the linguistic system G.

Proof: Consider the three possible valuations of the conditional value defined by η_R.

(i) The mapping η_R assigns to S_0 a conditional value of 1 (given an R^τ) if there exists at least one relevant partition for the given R^τ where v_k(S_j) = 1 for all j, 1 ≤ j ≤ n. This is certainly consistent with the rules of inference in the systems N_τ(G), and hence also with G.

(ii) η_R assigns to S_0 a conditional value of 0 (given an R^τ) if, for all the possible relevant partitions associated with the rule application R^τ, the conditional value v_k(S_0 | R^τ, p_n(ω)) is 0, i.e., if in each relevant partition we have v_k(S_j) = 0 for some j, 1 ≤ j ≤ n. It is clear that under these conditions no derivation in G is possible that utilizes R^τ at S_0. Hence, for consistency with G, S_0 should not be a
theorem of 𝒩_τ(G) under the given condition, which is what η_R asserts.

[Informally, if we consider the graphical interpretation of the situation, as presented in Figures 5, 7, and 9, we are dealing here with a case where, after a "bridgehead" is constructed at the application site of S_0 (by "connecting" there in the appropriate way the rule R^τ), we know at time k that it is impossible that all the remaining parts of the "bridge" are constructable. Once this fact is known, it is clear that we can say at that time that no bridging structure is possible between the apex X and the base string ω which starts at the given "bridgehead".]
(iii) The conditional value of S_0 (given R^τ) is uncertain at time k if for no possible partition associated with R^τ there exists a conditional value v_k(S_0 | R^τ, p_n(ω)) = 1, but it may be the case that some (but not all) conditional values are known to be 0 and some (at least one) are still uncertain. Since there remain uncertain conditional values, it is uncertain whether further search will not reveal one to be a 1 or all to be a 0, at which time the cases (i) or (ii) will hold, respectively. Therefore, it is not inconsistent with G, and it is in agreement with our intended interpretation of u, to conclude that the Lemma is valid in this case also.
The second compound inference mapping, η, is defined next:

(5.4)    η: v_k(S_0 | {R}_{S_0}^τ, {p_n(ω)}) = Max_{{R}_{S_0}^τ} v_k(S_0 | R^τ, {p_n(ω)}_R^τ) = Max_{{R}_{S_0}^τ} Max_{{p_n(ω)}_R^τ} Min_{1≤j≤n} v_k(S_j) .

We call the left side of η the conditional value of S_0 at time k given all the relevant applications (of given type τ) of rules of replacement and all the relevant partitions that are associated with each relevant rule of replacement.

Going back to our truth functional interpretation of inference mappings, η corresponds to a "disjunctive normal form" in a 3-valued Post logic, and its clauses are the conditional values that correspond to all the applications of relevant rules of replacement on S_0. We can now express the (absolute) value of S_0 in terms of the values of all the antecedent w.f.f.'s that are obtained from the inference mappings that we have defined.
Lemma 5.4. In a system 𝒩_τ(G), (τ ∈ {t, ℓ, r}), the (absolute) value of a nonconclusive w.f.f. S_0 = (X ⇒ ω) at time k is given by

v_k(S_0) = Max_{{R}_{S_0}^τ} Max_{{p_n(ω)}_R^τ} Min_{1≤j≤n} v_k(S_j) ,

where the w.f.f.'s S_j, 1 ≤ j ≤ n, denote the antecedents of S_0 relative to a relevant R^τ and a relevant p_n(ω); this value mapping is consistent with G.

Proof: Similar to proof of Lemma 5.3.
Let us introduce next a tree representation of the situation which underlies the specification of the inference mappings discussed above. Such a tree, which we call an inference mapping tree (in analogy to the inference trees defined in Section III), is shown in Figure 12. The role of an inference node in an inference tree (see for example Figure 7) is split here in two. Rule application nodes and partition nodes are shown separately; a sequence of branches going through R^τ and then through p_n(ω) in the inference mapping tree is equivalent to a branch going through a node R^τ, p_n(ω) in an inference tree. In an inference mapping tree, for a given direction of approach τ, there are branches descending from the w.f.f. node S_0 to all the relevant rule application nodes. For each relevant rule application node, say R^τ, there are branches descending to all the relevant partition nodes, and for each of these nodes, say p_n(ω), there are n branches that descend to the set of n antecedent w.f.f.'s that relate to S_0 via R^τ and p_n(ω). Axiom links are represented in the systems 𝒩_σ(G) in the same way as in the systems N_σ(G) (see Figure 8).
We can regard an inference mapping tree as the logical diagram of a circuit where values can be processed and transmitted. The inputs are the values associated with a set of antecedent w.f.f. nodes such as S_1, ..., S_n. These inputs are processed at the partition node p_n(ω) by η_{R,p} (see Figure 12); in view of our previous comments, a partition node can be considered as a 3-valued AND gate. The outputs of the AND gates are processed at the rule application nodes by η_R (see Figure 12); a rule application node can be considered as a 3-valued OR gate. The outputs of these gates are processed at the S_0 node by η (see Figure 12), which produces the output of the tree; the S_0 node can also be considered as an OR gate.
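With the encoding 0 < u < 1, the gate analogy can be written out directly: partition (AND) gates compute Min, and rule application and w.f.f. (OR) gates compute Max, exactly as in (5.2), (5.3), and (5.4). A minimal sketch follows (the nested-list encoding of an inference mapping tree is an assumption of the illustration).

```python
VALID, UNCERTAIN, INVALID = 1.0, 0.5, 0.0

def eta_R_p(antecedent_values):
    """(5.2): conditional value given one rule application and one
    partition = Min over the antecedent values (3-valued AND gate)."""
    return min(antecedent_values)

def eta_R(partition_values):
    """(5.3): conditional value given one rule application = Max over
    its relevant partitions (3-valued OR gate)."""
    return max(partition_values)

def eta(rule_values):
    """(5.4): value of S0 = Max over the relevant rule applications."""
    return max(rule_values)

def value_of_S0(tree):
    """tree = list of rule applications; each rule application = list of
    partitions; each partition = list of antecedent values."""
    return eta([eta_R([eta_R_p(p) for p in rule]) for rule in tree])

# One rule with a refuted partition and an uncertain one; a second rule refuted:
example = [
    [[VALID, INVALID], [VALID, UNCERTAIN]],   # Max(Min(1,0), Min(1,u)) = u
    [[INVALID]],                              # 0
]
```

Evaluating `value_of_S0(example)` yields u: the refuted alternatives cannot override the partition that is still uncertain.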
Fig. 12. Inference mapping tree. [The figure also marks an "elementary transition": a chain from a w.f.f. node through a rule application node and a partition node to an antecedent w.f.f. node.]
A decision tree in a natural decision system is a labeled tree which is made of inference mapping trees and of axiom links, and in which, furthermore, the value associated with its root node S_0 is stable (1 or 0). In a decision tree of a "pure" decision system, (τ ∈ {t, ℓ, r}), the inference mapping trees are all of type τ. In a "mixed" system 𝒩_σ′(G), (σ′ ∈ {(t,ℓ), (t,r), (ℓ,r), (t,ℓ,r)}), a decision tree may use as building blocks inference mapping trees from all its component "pure" systems. Thus a decision tree in 𝒩_(t,ℓ,r)(G) may have inference mapping trees based on "Top", "Left" or "Right" applications of replacement rules.

Each inference mapping tree in a decision tree is completely autonomous from the point of view of value computations. In a decision tree of any natural decision system ("pure" or "mixed"), value computations can be carried out homogeneously.

If v_k(S_0) = 1, where the w.f.f. S_0 = (P ⇒ x) is the root w.f.f. of a
decision tree in any of our natural decision systems, then S_0 is valid in 𝒩_σ(G), it is also valid in N_σ(G) (by the Lemmas in the present section), and x is well formed in L(G) (by the consistency theorem). Furthermore, the tree proof of S_0 in N_σ(G) can be easily obtained from the decision tree by tracing back from the root node a tree path which proceeds as follows through nodes with value 1: only one branch is taken below each OR node (the w.f.f. nodes and the rule application nodes), and all the branches are taken below an AND node (the partition nodes), until all the tree terminals reach conclusive w.f.f.'s linked to 𝒜.

If v_k(S_0) = 0, then S_0 is refuted in 𝒩_σ(G) and x is not well formed in
L(G) (by the lemmas of the present section). Furthermore, the entire decision tree provides now a detailed record of the unsuccessful attempts to form a tree proof of S_0. This decision tree has the logical status of a tree proof; it is indeed a tree proof of the nonvalidity of S_0. We call it a refutation tree of S_0. In view of the information included in a refutation tree, this tree (or a partial description of it) is an interesting candidate for the error description message which was discussed in (2.17).
As we will see shortly, decision trees are grown, from their root node down,* in the course of execution of decision procedures and of problem-solving procedures. We call a decision tree in its intermediate stages of growth a

* Contrary to the habits of live trees, and in view of our habits of mind, the direction of growth of our trees is downwards.
search tree (or a problem-solving tree). The value at the root node of a search tree is uncertain; it is the purpose of a decision procedure to direct the growth of a search tree in such a way that a stable value (1 or 0) is attained at its root node.
We shall now formulate an effective decision procedure for 𝒩_σ(G), (for all σ), which starts with the w.f.f. S_0 = (P ⇒ x) and produces a decision tree for S_0 from which we can obtain a P-marker of x if one exists.

A Class of Decision Procedures for 𝒩_σ(G) (for all σ)

We start by focusing attention on the initial w.f.f. S_0 = (P ⇒ x), whose value is uncertain. The objective of the decision procedures, which we denote by Π_d, is to eliminate this initial uncertainty and to assign a value of 1 or 0 to S_0. A decision procedure is entered at Π_1 and is exited at Π_4. The following is a description of Π_d:

(Π_1). For a given w.f.f. S under attention, where v_k(S) = u, (k = 1, 2, ...), and for a given σ, choose a direction of approach (in a "pure" system there is no choice; however, in a "mixed" system such as σ = (t,ℓ) we can choose between t and ℓ) and generate an appropriate inference mapping tree. This tree includes all the relevant rule applications for S and all the relevant partitions relative to these rule applications, and it results in a set of terminal w.f.f.'s.

(Π_2). Test, in a given order, whether any of the newly generated w.f.f.'s are conclusive, i.e., whether an axiom of 𝒩_σ(G) applies to them. If yes, assign the appropriate value to the w.f.f.; if no, assign the value u.

(Π_3). On the basis of the values assigned to terminal w.f.f.'s, compute the new value, v_k(S_0) (k = 1, 2, ...), of the initial w.f.f. (on the basis of (5.2), (5.3) and (5.4)) by a process of "backing up" values from the terminals to the root.

(Π_4). If v_k(S_0) = 1, stop and indicate that x is well formed in L(G) (by the consistency argument). [At this point it is also possible to extract the proof tree in N_σ(G) of the w.f.f. S_0 and from it, via the procedures mentioned in Section IV, to obtain the P-marker of x.] If v_k(S_0) = 0, stop and indicate that x is not well-formed in L(G) (by the lemmas in the present section). [At this point it is possible to output the decision tree which is rooted at S_0 for purposes of error control.] If v_k(S_0) = u, continue the process, as indicated in Π_5.
(Π_5). Direct attention (in a given order) to each terminal w.f.f. from the previous generation whose value is u. For each of these w.f.f.'s carry out the processes Π_1, Π_2. After these processes have been carried out, execute Π_3 over the entire tree, and then go to Π_4.
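The loop just described can be compressed into a short recursive sketch. This is an illustration only: the `axiom_value` and `expand` interfaces are assumed stand-ins for the axiom tests and for the relevant-rule and relevant-partition enumeration, the sketch proceeds depth-first with a depth bound rather than generation by generation, and the treatment of a w.f.f. with no relevant rule as refuted is an assumption of the sketch.

```python
VALID, UNCERTAIN, INVALID = 1.0, 0.5, 0.0

def decide(root_wff, axiom_value, expand, max_generations=100):
    """Grow a search tree from root_wff until its root value is stable
    (1 or 0) or the depth bound is reached.
    axiom_value(wff) -> 1/0/u tests conclusiveness; expand(wff) -> list
    of rule applications, each a list of partitions, each a list of
    antecedent w.f.f.'s."""

    def value(wff, depth):
        v = axiom_value(wff)                     # axiom test
        if v != UNCERTAIN or depth == 0:
            return v
        alternatives = expand(wff)               # inference mapping tree
        if not alternatives:
            return INVALID       # no relevant rule applies (sketch assumption)
        # Back up values: Max over rules and partitions, Min over antecedents.
        return max(max(min(value(s, depth - 1) for s in part)
                       for part in rule)
                   for rule in alternatives)

    return value(root_wff, max_generations)
```

If the bound is exhausted before a stable value is reached, the root keeps the unstable value u, mirroring a search tree whose growth is still in progress.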
We shall now prove the completeness of our systems of natural decision with respect to the grammar G. This proof provides a justification of the effectiveness of our decision procedures (in fact, of a large family of procedures that have the essential characteristics of the procedures Π_d in common) for the solution of the syntactic analysis problem that we have formulated in (2.17).
Completeness Theorem. For all strings x in V_T, if x is well formed in L(G), then the w.f.f. S = (P ⇒ x) is valid in any of the systems 𝒩_σ(G) (i.e., v_k(S) = 1, where k is a finite integer).
Proof: It suffices to show that the decision procedure Π_d which is associated with a given system 𝒩_σ(G) always terminates after a finite number of generations. Note that, as a search tree grows from S = (P ⇒ x), the lengths of the strings that enter in the right sides of w.f.f.'s in successive generations are non-increasing. The case where the string length is conserved between successive generations occurs when a unary replacement rule is applied; in all the other cases, the string lengths decrease from generation to generation.
Since any nonterminal element in the grammar must have non-zero support (see 2.7), any sequence of unary replacement rules in P is finite, and it has the following form:

(5.5)    A_1 → A_2, A_2 → A_3, ..., A_n → ψ ,

where A_1, A_2, ..., A_n ∈ V_N and ψ is either a single element in V_T, say a, or a string in V of length larger than 1. To a sequence of unary rules of the type shown in (5.5) there corresponds a segment of a chain* in the search tree where string length is conserved. Let us call such a segment a length-conserving segment.

* A tree chain is a path from the root of the tree to one of the tree terminals.
Let us consider a decision procedure for the mixed system 𝒩_(t,ℓ,r)(G). If this procedure terminates after growing a finite search tree, then the procedures for the other natural decision systems terminate also. We assume then that the search tree which is grown by the decision procedure under consideration is made of inference mapping trees from any of the three rule application types.
In each inference mapping tree (see Figure 12) there are several chains that go from the root w.f.f. node, through a rule application node and then through a partition node, to a terminal w.f.f. Let us call these chains elementary transitions. It is clear that in each case that a non-unary replacement rule is applied, the string length decreases over an elementary transition, in the direction of tree growth. If we consider a chain in a search tree which is made of a sequence of such elementary transitions, then we will reach on this chain a w.f.f. whose string length is 1.
Consider now length-conserving segments of the decision tree. They are made of one or more length-conserving elementary transitions in sequence, each of which is associated with a unary replacement rule. Since any sequence of unary rules is finite in length (see (5.5)), it follows that the length-conserving segments are also composed of a finite number of elementary transitions.
Let us consider now a w.f.f., S, that lies on a length-conserving segment, and let us examine the transitions that are obtainable by applying to S replacement rules that attempt to maintain the next w.f.f. on the length-conserving segment (if this is possible); we examine in succession rule applications from the "Top", "Left", and "Right".
(1) Suppose that the rule is applied at the "Top" of a w.f.f. S = (A_i ⇒ ω), where A_i (1 ≤ i ≤ n) is one of the nonterminal elements in a sequence of unary rules in P of the type shown in (5.5), and ω is a string in V. If 1 ≤ i < n, then we can apply on S one of the unary rules in the sequence (5.5). This yields the w.f.f. A_{i+1} ⇒ ω, which lies on the length-conserving segment. If ω consists of a single nonterminal element, then it may be the case that 𝒜_{0,4} applies and the growth stops. If i = n, then (by 5.5) we can apply on S either (i) a unary rule which yields a w.f.f. a ⇒ ω (where a ∈ V_T), or (ii) a non-unary rule, which causes the length-conserving segment to branch into new segments where the string length is smaller than in the previous length-conserving segment. In the case (i) where the w.f.f. a ⇒ ω is
obtained, the axioms 𝒜 or 𝒜_{0,1} apply, and the growth stops.
(2) Suppose that a rule is applied at the "Left" of a w.f.f. S = (X ⇒ A_i γ), where A_i is as in (1), X ∈ V_N, and γ is a string in V. If 1 < i ≤ n, then we can apply on S one of the unary rules in the sequence. This yields the w.f.f. X ⇒ A_{i-1} γ, which lies on the length-conserving segment. If the string γ is empty, then it may be the case that Ω_1 applies, and the growth stops. If i = 1, then no further unary rules can be applied on S (by definition of A_1). Hence, if the string γ is not empty, either Ω_{0,3} applies or a non-unary rule can be applied, which causes the length-conserving segment to branch into new segments where the string length is smaller than in the previous segment. If the string γ is empty, then either Ω_1 or Ω_{0,1} apply, and the growth stops.
(3) The case for rule applications at the "Right" of S is similar to the case (2) just described.
In general, the finite length-conserving segments always lead to chain segments with w.f.f.'s of smaller string length, or their growth is stopped by one of the axioms Ω_1, Ω_{0,1}, Ω_{0,3}, or Ω_{0,4}. It remains now to examine the situation at w.f.f.'s that have right-side strings of length 1.
If a chain of the search tree reaches a w.f.f. α ⇒ β, where α, β ∈ V_T, then Ω_1 or Ω_{0,1} apply and the growth stops. If a w.f.f. X ⇒ β is reached, where X ∈ V_N and β ∈ V_T, then either Ω_{0,2} applies or a length-conserving segment will grow below the w.f.f. X ⇒ β (as described above), and it will be stopped by Ω_1. If a w.f.f. X ⇒ A is reached, where X, A ∈ V_N, then either Ω_1 or Ω_{0,2} apply and the growth stops, or a length-conserving segment will grow below the w.f.f. X ⇒ A (as described above), and it will be eventually stopped by Ω_1.
We have shown then that all the chains of the search tree must stop in a finite number of generations by one of the axioms Ω_1, Ω_{0,1}, Ω_{0,3}, or Ω_{0,4}. The axiom Ω_{0,2} may also contribute to stoppage of growth in some chains prior to the application of the other axioms. Therefore, the decision procedures discussed before effectively compute the desired mapping δ(x) which is defined in (2.17) via the formation of appropriate decision trees; i.e., if the input string x is well-formed in L(G), then the procedure produces a tree proof which yields a single* P-marker of x. If x is not well-formed in L(G), the procedure indicates this, and the decision tree which is grown during its execution can be used as an error description.
* The choice of a specific P-marker depends on details of the decision procedure; these will be discussed in the next section.
VI. HEURISTIC PROCEDURES OF REDUCTION TYPE FOR SYNTACTIC ANALYSIS
While the decision procedures presented in the previous section are effective algorithms for syntactic analysis, they are not designed with computational efficiency in mind. But they provide the framework for the construction of efficient procedures. In what follows, we shall:
1. establish the correspondence between the decision procedures discussed above and certain reduction-type problem-solving procedures;
2. identify aspects of these procedures related to computational efficiency;
3. develop the procedures into efficient syntactic analyzers.
A search tree of the type grown during a decision procedure is the prototype of a tree which grows in the course of executing a problem-solving procedure of the reduction type [8]. We also call this tree a problem-solving tree. The main types of elements that appear in a problem-solving tree are states and moves.
A state is a description of the problem that confronts the problem solver at a given stage of his (its) activity. It includes an index of uncertainty about the solvability of the problem.
A move in a problem-solving tree is an operator that either effects a transition from one state to one or more subordinate states (with the intention of reducing the uncertainty of the initial state), or recognizes that a certain state is conclusive; a conclusive state is characterized by complete certainty, and when it is recognized as such no further problem-solving activity takes place from that state.
Problem-solving procedures of the reduction type are characterized by their mode of growing a search tree which is context free in the following essential sense: the choice of moves from a given state is independent of move choices in any other state of the search tree. This property allows a great flexibility of approach in organizing the solution activity for the set of subordinate problem-states that demand attention at any one time. It also suggests systematic and efficient computer realizations of these procedures via relatively simple mechanisms that can be used uniformly for classes of such procedures.
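The state/move vocabulary above maps onto a generic tree-search skeleton. The sketch below is our illustration of the reduction-type organization, not the report's implementation; all names are invented, and the move set at each state is computed from that state alone, reflecting the "context-free" growth property:

```python
# Generic reduction-type problem solving: a state is either recognized as
# conclusive (value 1 or 0) by a recognition move, or reduced by a move
# to a set of subordinate states, all of which must be solved.

def solve(state, recognize, moves, reduce):
    """Return 1 if some chain of reductions validates `state`, else 0."""
    value = recognize(state)             # recognition move: conclusive?
    if value is not None:
        return value
    for move in moves(state):            # replacement / partition moves
        subproblems = reduce(state, move)
        # The move validates `state` only if every subordinate
        # problem is solved (an AND-reduction).
        if all(solve(s, recognize, moves, reduce) for s in subproblems):
            return 1
    return 0                             # no relevant move validates the state
```

As a toy instantiation, take states to be integers, recognize 0 as valid and negatives as refuted, and let a move subtract a constant; the skeleton then decides reachability of 0.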
Clearly, our decision procedures are themselves problem-solving procedures of the reduction type (we will call them subsequently reduction procedures). In effect, our development of the natural decision systems was
intended to bring us to a point where our problem of devising efficient
procedures for syntactic analysis could be properly formulated as a problem
of devising efficient reduction procedures.
The description of the decision procedures in the previous section
provides the general outline of the structure of reduction procedures. The
execution of these procedures proceeds in cycles. In each cycle there is a
generation phase and an evaluation phase. During the generation phase,
decisions on the growth of the search tree are made. During the evaluation
phase the new growth is appraised, and various performance measures (such as value) are readjusted over the search tree. These new performance measures
guide the decisions in the next stage of generation. There are, in addition,
output procedures and executive procedures for tying together the entire problem-solving activity.
It is evident that we can obtain a variety of specific reduction procedures by fixing in different ways the following two essential features:
(1) the basis for choosing the direction of approach at different stages of solution, and the restrictive conditions for selecting the relevant set of rule applications as well as the relevant set of partitions;
(2) the strategy for selective direction of attention to the non-conclusive subordinate problems.
A. States, Moves, and Search Trees in Reduction Procedures for Syntactic Analysis
Before shaping the above two features to optimize computational
efficiency, we discuss the reduction procedures for syntactic analysis and
the search trees that they generate. We have two types of problem states.
(1) States that contain a specified w.f.f., such as S = (X ⇒ ω), together with its associated value v_k(S); we denote such states by Σ.
(2) States that contain a w.f.f. system, such as

[S_1 = (φ^(1) ⇒ χ_1), ..., S_n = (φ^(n) ⇒ χ_n); χ_1 χ_2 ··· χ_n = ω],

where χ_1, ..., χ_n are string variables, and ω is a string in V, together with the value associated with such a system, as defined in (5.3); we denote such states by Σ*.
The moves used in our reduction procedures are of four types, as follows:
(1) Replacement moves, where (i) a non-unary rule of replacement is applied at a certain site of a specified w.f.f. in a given state, and it produces a w.f.f. system in a subordinate state, and (ii) a unary rule of replacement effects a transition between two specified w.f.f.'s in consecutive states; we name such moves by the appropriate rule application.
(2) Compound replacement moves (maneuvers), where 2or 3 rules of replace-
ment are simultaneously applied to a specified w.f.f. in a given state and theyproduce a w.f.f. system in a subordinate state; we name such moves by theappropriate set of rule applications.
(3) Partition moves, where a state that includes a w.f.f. system of n.w.f.f.is transformed into n subordinate states containing one specified w.f.f. each;
(4) Recognition moves, where a specified w.f.f. in a given problem state
is recognized by an axiom of our system as conclusive and it receives a stablevalue (1 or 0); we name such a move by the appropriate axiom.
In our reduction procedures, we shall use bisection moves exclusively. This permits us to consider all the relevant partitions of a given string by composing bisection moves in an orderly manner. In the search tree, the effect of a full partition move is obtained by a cascade of bisection moves. We illustrate in Figure 13 the process of applying a replacement move followed by a partition move in the form of two bisection moves. In this figure, the replacement move R (which is one of the relevant replacement moves) is applied to the specified w.f.f. X ⇒ ω in the state Σ_1, and the state Σ* results, with a w.f.f. system which includes 3 w.f.f.'s, each with a string variable. A bisection move is then applied on the w.f.f. system. This move chooses a specific assignment for the pair of strings χ_1 and χ_2 χ_3, and it produces two new states, Σ_2 and Σ*_2. Here, Σ_2 includes the specified w.f.f. X ⇒ A χ_1, and Σ*_2 has a w.f.f. system with 2 w.f.f.'s. The second bisection move chooses a specific assignment for the strings χ_2 and χ_3, and it results in 2 new states, each of which includes a specified w.f.f.
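The cascade of bisection moves just described enumerates every partition of a string into n parts. A small sketch of that bookkeeping (the function names are our own):

```python
# Obtain all partitions of a string w into n non-empty parts by composing
# bisection moves: each bisection splits off a first part and leaves a
# smaller (n-1)-part partition problem, exactly as in the search tree.

def bisections(w):
    """All ways to bisect w into two non-empty strings."""
    return [(w[:i], w[i:]) for i in range(1, len(w))]

def partitions(w, n):
    """All partitions of w into n non-empty parts, via cascaded bisections."""
    if n == 1:
        return [(w,)] if w else []
    result = []
    for head, rest in bisections(w):
        for tail in partitions(rest, n - 1):
            result.append((head,) + tail)
    return result
```

For a string of length N there are N-1 possible first bisections, which is the main source of branching that the relevance tests of the following subsections are designed to restrict.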
B. Computational Effort
The search tree in Figure 13 is typical for the problem described in the state Σ_1. Mark the node on which computer attention is currently focused by a symbol denoting the sub-procedure of the main procedure which is in control at that time. We then obtain an instantaneous description of the computation
(in the sense used by Davis [9] for Turing machines). In Figure 13, we have marked Σ_3 by □; this is intended to indicate that a growth procedure focused on Σ_3 is now in control. We can regard the sequence of instantaneous descriptions that are recorded at the beginning of a cycle as a good representation of the course of computation of a reduction procedure. It is reasonable to express "computational effort" as a measure of complexity defined over such a course of computation. An important measure of computational complexity for any given instantaneous description is the number of its state nodes.

Fig. 13. Representation of the application of a replacement move, followed by two bisection moves, in a problem-solving tree. [Diagram not reproduced; A_1, A_2, A_3 stand for tree continuations.]
We define the computational effort E(S,P) associated with a procedure P, relative to a root w.f.f. S (the problem statement), as the sum of weights of state nodes in the last instantaneous description, when the computation stops. We assume here that a state "weight" exists which reflects the relative processing complexity associated with a state node in a given procedure.
This measure of effort reflects the size of the largest search tree grown by the procedure just before it provides a definite answer to the problem. The smallest tree that a procedure can construct before it stops has the same number of state nodes as the number of nodes in a P-marker of the string x.
Let E(N,P) denote the expected computational effort of the procedure P over all strings x in V that have length N. Given a class of reduction procedures for syntactic analysis, it is of interest to ask whether a given procedure in the class solves the syntactic analysis problem with the least E(N,P), for all N. Furthermore, it is natural to look for a new procedure which is better than any of the procedures in the given class in one of the senses just specified. Suppose that the goal of a proposed procedure is to satisfy this latter requirement. Since we cannot prove that the proposed procedure attains the goal, it has the status of a heuristic procedure relative to that goal.
The only way of ascertaining the relative ranking of candidate procedures with respect to expected computational effort is through computer experimentation. We feel that the notion of E(N,P) is useful for such experiments. It is formulated at a broad enough conceptual level, so that it can be estimated for a variety of procedures that are implemented in different forms.
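Such an experiment reduces to instrumentation and averaging. A minimal sketch, under the assumption (ours, not the report's) that each candidate procedure reports the state-node count of its final search tree, with weight 1 per node:

```python
# Estimate expected computational effort E(N, P) by running procedure P on
# sample strings of length N and averaging the number of state nodes
# in the final instantaneous description (weight 1 per state node).

def expected_effort(procedure, samples):
    """Mean state-node count of `procedure` over the sample input strings."""
    total = 0
    for x in samples:
        nodes = procedure(x)       # assumed to return its state-node count
        total += nodes
    return total / len(samples)

# Toy stand-in for a procedure: an exhaustive bisection search over x that
# touches one state per substring, i.e. len(x) * (len(x) + 1) / 2 nodes.
def toy_procedure(x):
    n = len(x)
    return n * (n + 1) // 2
```

Tabulating `expected_effort` against N for two procedures gives exactly the ranking experiment described in the text.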
C. Approaches to Move Selection
Note that the main cause of branching in a search tree is generally the multiplicity of relevant bisection moves. This is a critical problem for the design of efficient reduction procedures. Thus, it is highly desirable to use all the existing relevant information and also to develop special tests
towards the achievement of the largest a priori restriction possible over the set of relevant bisections. We shall consider next a concept which is extremely useful for restricting bisection moves.
Consider a pair [α, β], where α, β ∈ V_T, such that, for any two strings x, y in V_T and for a pair of nonterminal elements [A,B], the string xα is A-derivable from the right in G and the string βy is B-derivable from the left in G. The set of all such pairs [α, β], which we call boundary pairs, defines a boundary for the pair of nonterminals [A,B]. Any string z in V_T which is derivable from the string AB in G must contain a boundary pair belonging to the boundary of [A,B]. We will assume that a "boundary test" is available in the system, and we will use it in the definition of relevant sets of bisections.
Let us consider a w.f.f. S = (X ⇒ ω), where X ∈ V_N, and ω is a string in V. Let us examine sets of relevant replacement moves and relevant bisections for various "pure" and "mixed" approaches to solution.
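The boundary of a pair [A,B] can be computed once from the grammar: it consists of the possible last elements of strings derived from A, paired with the possible first elements of strings derived from B. A sketch of that computation (the grammar encoding and helper names are ours):

```python
# Compute the boundary of a nonterminal pair [A, B]: all pairs (a, b) with
# a a possible last element of a string derived from A and b a possible
# first element of one derived from B. Any string derived from AB must
# contain such a pair at the seam, so membership restricts the relevant
# bisections of a string.

def edge_elements(grammar, nonterminals, start, index):
    """Possible first (index=0) or last (index=-1) elements of strings
    derivable from `start`."""
    found, frontier = set(), {start}
    while frontier:
        sym = frontier.pop()
        for lhs, rhs in grammar:
            if lhs == sym:
                edge = rhs[index]
                if edge in nonterminals:
                    if edge not in found:
                        found.add(edge)
                        frontier.add(edge)
                else:
                    found.add(edge)
    return {e for e in found if e not in nonterminals}

def boundary(grammar, nonterminals, A, B):
    last_A = edge_elements(grammar, nonterminals, A, -1)
    first_B = edge_elements(grammar, nonterminals, B, 0)
    return {(a, b) for a in last_A for b in first_B}

# Example: S -> A B;  A -> A a | a;  B -> b B | b.
G = [("S", ("A", "B")), ("A", ("A", "a")), ("A", ("a",)),
     ("B", ("b", "B")), ("B", ("b",))]
```

For this grammar the boundary of [A,B] is the single pair (a, b), so only cut points where an "a" is followed by a "b" survive the boundary test.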
(1) "Pure" systems.
(a) Approach from the "Top" (see Fig. 3-2): Consider a non-unary replacement rule R: A → φ^(1) ... φ^(n), where A ∈ V_N and φ^(1), ..., φ^(n) ∈ V. The rule R is a relevant replacement rule for "Top" application on the w.f.f. S if it satisfies the following 4 conditions:

(1) A = X; (2) φ^(1) ≤ ω^(1) is satisfied;* (3) φ^(n) ≤ ω^(l(ω)) is satisfied; (4) s(φ) ≤ l(ω).** (6.1)
If R is applied on S from the "Top" it produces a w.f.f. system which includes n w.f.f.'s of the form S_j = (φ^(j) ⇒ χ_j), 1 ≤ j ≤ n, where the χ's are string variables whose values are to be specified by the relevant partitions.
In order to specify all the relevant partitions of the string ω (in n parts) we have to consider, one after the other, all the n−1 sets of bisection moves. The first bisection can be attempted from the left end or the right end of ω. Suppose that the bisection, [l_ℓ, l_r], is attempted from the left, with the purpose of producing a pair of strings χ_1 and χ_2 ··· χ_n. The relevant interval
* See the discussion on derivabilities from the left and the right in the previous section.
** See (2.6) and (2.8) for the definition of the support s.
for the set of first bisections can be defined by

s(φ^(1)) ≤ l_ℓ ≤ l(ω) − Σ_{j=2}^{n} s(φ^(j)); (6.2)
each pair of string elements in the relevant interval which is a member of the boundary set of [φ^(1), φ^(2)] defines a relevant bisection. For each specification of χ_1 (by a relevant bisection), we can now obtain the set of relevant bisections that will specify χ_2 in the same manner as before, and so on. Note that if the nonterminal element A is self-embedding, then the usefulness of the "boundary test" decreases, since the occurrence of "spurious" boundary pairs tends to be more frequent inside the relevance interval.
(b) Approach from the "Left" (see Figure 7): Consider again the rule
R given in (a) above. This rule is a relevant replacement rule for "Left" application on the w.f.f. S if it satisfies the following 4 conditions:

(1) φ^(1) = ω^(1); (2) A ≤ X is satisfied; (3) φ^(2) ≤ ω^(2) is satisfied; (4) Σ_{j=2}^{n} s(φ^(j)) ≤ l(ω) − 1. (6.3)
The first bisection can be attempted from the left end or the right end of the string ω. Suppose that the bisection is attempted from the right, with the purpose of producing a pair of strings χ_2 ··· χ_n and χ_1. The relevant interval for the set of first bisections can be defined by

0 ≤ l_r ≤ l(ω) − Σ_{j=2}^{n} s(φ^(j)); (6.4)

each string element in this interval which is A-derivable from the right in G defines a relevant bisection. Note that the restrictions on relevant bisections here are weaker than in the approach from the "Top". After specifying a χ_j by the first bisection, the process of successive bisections continues in the same way as with the first. Note that if A is left-recursive, then the usefulness of the "right-derivability" test decreases, since the occurrence of elements satisfying this test tends to be more frequent in the relevance interval.
(c) Approach from the "Right" (see Figure 9): The situation is similar to (b) above. In the present case, the choice of a relevant bisection involves a test of "left-derivability" from A; the usefulness of this test diminishes if A is right-recursive.
(2) "Mixed" Systems
We shall discuss only one combined approach, with the purpose of illustrating the advantages of mixed systems: from "Top" and "Left"; see Figure 14. Consider a non-unary replacement rule R_1: A_1 → B_1 C_1, which is a candidate for "Top" application, and a rule R_2: A_2 → B_2 C_2, which is a candidate for "Left" application. The pair of rules R_1, R_2 is a relevant pair if the following conditions are satisfied:

(1) A_1 = X; (2) B_1 ≤ A_2 is satisfied; (3) B_2 = ω^(1); (4) C_1 ≤ ω^(l(ω)) is satisfied; (5) C_2 ≤ ω^(2) is satisfied; (6) s(C_1) + Max[s(C_2), s(B_1)] ≤ l(ω) − 1. (6.5)
Suppose that the first bisection, [l_ℓ, l_r], is attempted from the right, with the purpose of producing a pair of strings χ_1 and χ_2. The relevant interval for the set of first bisections can be defined by

s(C_1) ≤ l_r ≤ l(ω) − Max[s(C_2), s(B_1)]; (6.6)

each pair of string elements in the relevant interval which is a member of the boundary set of [A_2, C_1] defines a relevant bisection.
The combined approach just described is a compound move which has
the property of an especially useful maneuver. It produces simultaneously two
compatible "bridgeheads" from triangle corners (see Figure 14), and in so doing
it appreciably reduces unnecessary growth in the search tree. Clearly, a
simultaneous approach from three corners would have reduced to a greater extent
the total search effort needed. In this connection, it is suggestive to consider
our problem as one of solving a triangular jigsaw puzzle of a special kind. It
is clearly reasonable to start by filling in as much of the boundaries as
possible before venturing into the central region.
In general, as we increase the sharpness of selection by specifying many simultaneous requirements, we reduce the maximal search tree needed for a solution. The local effort needed for selection, however, also increases under those circumstances.
In some cases it may be possible to have a complete "mixed" system (t, l, r), and yet to approach the problem at each state from a single (but not necessarily identical) direction, reducing in this way the amount of local computation needed at a state. The decision on the choice of approach can be carried out locally at each state by examining the recursiveness properties of the right-side element of the replacement rule which is under consideration for application at the w.f.f. of that state. If this element is left-recursive, an approach from the "Left" should be avoided. If it is self-embedding, a "Top" approach is to be avoided. If it is right-recursive, an approach from the "Right" is not indicated. This advice is based on the comments that we have made previously about the blunting effects of certain recursiveness situations on the selectivity of relevant bisection moves.

Fig. 14. A compound replacement move (a maneuver) from the "Top" and "Left". [Diagram not reproduced; the shaded areas correspond to problem areas that remain after the application of the move.]
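The local decision rule just stated is simple enough to sketch directly. In this illustration (ours, not the report's), the three recursiveness properties of the rule's right-side element are assumed to be precomputed from the grammar:

```python
# Choose approach directions for a replacement rule from the recursiveness
# properties of its right-side element: avoid "Left" for a left-recursive
# element, "Top" for a self-embedding one, and "Right" for a
# right-recursive one.

def choose_direction(left_recursive, self_embedding, right_recursive):
    """Return the directions of approach that are not contraindicated."""
    candidates = [("Left", not left_recursive),
                  ("Top", not self_embedding),
                  ("Right", not right_recursive)]
    return [name for name, ok in candidates if ok]
```

An element that is left-recursive but neither self-embedding nor right-recursive thus leaves "Top" and "Right" as usable directions; an element with all three properties leaves none, in which case some blunted test must be accepted.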
The variety of possible modes of selecting replacement and partition
moves from a state must be evident by now. A specific choice of mode depends
on the characteristics of the available computer system. An experimental
study of the relationship between various modes would be of considerable interest.
In our heuristic procedures, we assume that if no relevant (compound) replacement move is found from a state, or if no relevant bisection is found, then the state is immediately assigned the value 0.
We assume that the axiom set of N(G) is available for the recognition of conclusive w.f.f.'s in the heuristic reduction procedures. Our remarks in Section V on the possible augmentation of the axiom set, both for validation and refutation, are relevant here.
D. Approaches to Attention Control
We assume that at each generation cycle of a heuristic procedure a single "quantum of growth" is initiated from a terminal state of its search tree (the state under attention). The growth starts with the application of all the relevant replacement moves on the specified w.f.f. contained in the state under attention, and it proceeds with the application of all the relevant bisection moves, until all the new terminal states contain specified w.f.f.'s.
At each problem-solving cycle a decision is made as to "where to go next" so that the expected computational effort is minimized for the problem on hand. Estimates of expected effort at non-conclusive terminal states can be used to estimate expected effort for all the other non-terminal nodes of the search tree whose value is uncertain. These effort estimates can then be used in a systematic manner to control the formation of an "attention path" from the root node to a terminal node. An important factor in the choice of an approach to the control of attention which is oriented to the minimization of expected effort is the expected multiplicity of solutions. In our case, this is related to the question of syntactic ambiguity. If we assume that the solution is unique if it exists (this is a reasonable assumption for programming languages
without known ambiguities), then it is possible to formulate a scheme of attention control which is both simple and has satisfactory effort-saving properties. We shall outline this scheme below.
We associate with each node Z of the search tree the two following effort estimates:

e^1(Z): estimate of the expected number of state nodes in the subtree rooted at Z if the node Z is eventually assigned the value 1;
e^0(Z): estimate of the expected number of state nodes in the subtree rooted at Z if the node Z is eventually assigned the value 0.

These estimates change from cycle to cycle as the growth of the search tree evolves.
If Z is an OR* node (all nodes except bisection move nodes) with n descending branches, each having an associated pair of effort estimates e^1_i, e^0_i (1 ≤ i ≤ n), then the effort estimates at Z are computed as follows:

e^1(Z) = (1/n)[Σ_{i=1}^{n} e^1_i + (n−1)e^0_1 + (n−2)e^0_2 + ··· + e^0_{n−1}],
where e^0_1, ..., e^0_{n−1} are such that e^0_1 ≤ e^0_2 ≤ ··· ≤ e^0_{n−1} ≤ e^0_n;
e^0(Z) = Σ_{i=1}^{n} e^0_i. (6.7)

If Z is on an "attention path" descending from the root node, then it chooses the next segment of the path below it which has min e^1.
If Z is a bisection move node (an AND node) with two descending states that have effort estimates e^1_1, e^0_1 and e^1_2, e^0_2, then the effort estimates at Z are computed as follows:

e^1(Z) = e^1_1 + e^1_2;
e^0(Z) = (1/2)(e^0_1 + e^1_1 + e^0_2), where the states are ordered so that e^1_1 + e^0_1 ≤ e^1_2 + e^0_2. (6.8)

If such a move node is on an "attention path", then it chooses the next segment of the path below it which has min (e^0 + e^1).

* OR and AND in the sense of Section V.

Consider terminal states whose w.f.f.'s have strings of length N. Let
e^1(N) be the upper bound on the number of state nodes in the search tree that appear below such a state when the state is validated. Similarly, e^0(N) is the upper bound on the nodes when the state is refuted. We take e^1(N) and e^0(N) to be the effort estimates at the terminal states. In the computation of these upper bounds we consider that all the possible bisections of a string are taken at a bisection node; the maximum number of applicable replacement rules which is possible for any element of V is taken at a replacement node. Also, in this computation the rules for processing effort estimates (i.e., (6.7) and (6.8)) and for controlling the "attention path" are used consistently. It is evident that we can compute recursively e^1(N), e^0(N) under these assumptions, starting with e^1(1) = 1 and e^0(1) = 1, and using them in e^1(2), e^0(2), and so on.
These estimates can then be available in the form of tables or (approximate)
functions for use in any syntactic analysis problem. From preliminary hand
experiments with a heuristic procedure that uses the scheme for attention
control just outlined, we feel that it provides a strong basis for selectivity
of search along the most relevant lines.
It is important to note that the tighter the basis for effort estimation, the better a heuristic procedure P will be from the point of view of minimizing E(N,P).
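The effort-estimate propagation can be sketched directly, under our reading of (6.7) and (6.8): the branches of an OR node are examined in order of increasing e^0, and the two states under a bisection in order of increasing e^1 + e^0. The function names are illustrative:

```python
# Propagate effort estimates (e1, e0) up the search tree: e1/e0 estimate
# the state-node count below a node if it is eventually validated/refuted.
# OR nodes follow (6.7); bisection (AND) nodes follow (6.8).

def or_node(estimates):
    """estimates: list of (e1, e0) pairs of the n descending branches."""
    n = len(estimates)
    e0s = sorted(e0 for _, e0 in estimates)   # cheapest refutations first
    e1 = (sum(e1 for e1, _ in estimates)
          + sum((n - 1 - k) * e0s[k] for k in range(n - 1))) / n
    e0 = sum(e0s)                             # refuting Z refutes every branch
    return e1, e0

def and_node(a, b):
    """a, b: (e1, e0) pairs of the two descending states of a bisection."""
    (e1a, e0a), (e1b, e0b) = sorted([a, b], key=lambda p: p[0] + p[1])
    e1 = e1a + e1b                            # validating Z validates both
    e0 = (e0a + e1a + e0b) / 2                # average over which side fails
    return e1, e0
```

Starting from the terminal-state bounds e^1(N), e^0(N), applying `or_node` and `and_node` bottom-up yields the estimates used to steer the attention path.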
REFERENCES
[1] Chomsky, N., and Miller, G. A., "Introduction to the Formal Analysis of Natural Languages", in Handbook of Mathematical Psychology, (Eds.) Bush, Galanter and Luce, Vol. 2, Ch. 11, Wiley, 1962.
[2] Griffiths, T. V., and Petrick, S. R., "On the Relative Efficiencies of Context-Free Grammar Recognizers", Communications of the ACM, Vol. 8, No. 5, May 1965, pp. 289-300.
[3] Gorn, S., "Specification Languages for Mechanical Languages and Their Processors — A Baker's Dozen", Communications of the ACM, Vol. 4, No. 12, Dec. 1961.
[4] Chomsky, N., "Formal Properties of Grammars", in Handbook of Mathematical Psychology, (Eds.) Bush, Galanter and Luce, Vol. 2, Ch. 12, Wiley, 1962.
[5] Simmons, R. F., "Answering English Questions by Computer: A Survey",Communications of the ACM, Vol. 8, No. 1, January 1965, pp. 53-71.
[6] Newell, A. and Ernst, G., "The Search for Generality", Proc. of theIFIP Congress 1965, Vol. 1, Spartan Books, 1965.
[7] Newell, A., Shaw, J. C., and Simon, H. A., "Report on a General Problem-Solving Program for a Computer", in Information Processing: Proceedings of the International Conference on Information Processing, UNESCO, Paris, 1960, pp. 256-264.
[8] Walters, D., and Amarel, S., "Heuristic Theorem Proving", Parts I and II, Final Report AFCRL-62-367, on Contract AF19(604)-8422, May 1962.
[9] Davis, M., Computability and Unsolvability, McGraw Hill, 1958.
[10] Newell, A., and Simon, H. A., "The Logic Theory Machine: A Complex Information Processing System", IRE Transactions on Information Theory, Vol. IT-2, No. 3, September 1956.
[11] Wang, H., "Toward Mechanical Mathematics", IBM Journal of Research andDevelopment, Vol. 4, No. 1, January 1960, pp. 2-22.
[12] Davis, M., "Eliminating the Irrelevant from Mechanical Proofs", Proc. Symposium Applied Math., Vol. XV, pp. 15-30, American Mathematical Society, Providence, R. I., 1963.
[13] Gentzen, G., "Untersuchungen über das logische Schliessen", Math. Zeit., 39 (1934), pp. 176-210.
[14] Jaskowski, S., "On the Rules of Suppositions in Formal Logic," StudiaLogica, 1 (1934), pp. 5-32.
[15] Fitch, F. B., Symbolic Logic: An Introduction, Ronald Press Co., New York, 1952.
[16] Nidditch, P. H., Introductory Formal Logic of Mathematics, University Tutorial Press, London, 1957.
[17] Lambek, J., "On the Calculus of Syntactic Types", in Proceedings of Symposia in Applied Mathematics, Vol. XII, Ed. by R. Jakobson.
[18] See Rosenbloom, P., The Elements of Mathematical Logic, Ch. 11,Sec. 4, Dover Publications, 1950.
[19] Amarel, S., "An Approach to Heuristic Problem Solving and Theorem Proving in the Propositional Calculus", in Systems and Computer Science, Hart and Takasu, eds., University of Toronto Press, 1967.
UNCLASSIFIED

KEY WORDS: Syntactic analysis, Heuristic programming, Problem solving, Theorem proving, Natural inference systems, Computer linguistics
IN!
1. ORIGINATING ACTIVITY: Enter the name and addressof the contractor,
subcontractor,
grantee, Department of De-fense activity or other organization (corporate author) issuingthe report.
2a. REPORT SECUHTY CLASSIFICATION: Enter the over-all security classification of the report. Indicate whether"Restricted Data" is included. Marking is to be in eccoro-ance with sppropriate security regulationa.26.
GROUP:
Automatic downgrading is specified in DoD Di-rective 5200. 10 and Armed Forces Industrial Manual. Enterthe
group
number.
Also,
when applicable, show that optionalmarkings have been used
for
Group 3 snd Group 4 aa author-ized.3. REPORT TITLE: Enter the complete report title in allcapital tetters. Titles in all caaea should be unclaeeifled.If a meaningful title cannot be selected without claselflca.
tion,
show title classification in all capitals in parenthesisimmediately following the title.4. DESCRIPTIVE NOTES: If appropriate, enter the type ofreport,
e.g., interim,
progress,
summary, annual,
or »n«-Give the inclusive dates when a specific reporting period iscovered.5. AUTHOR(S): Enter the name(s) of authoKa) as shown onor in the report. Enter last name, first name, middle initial.If military, show rank and branch of aervice. The name
of
the principal author is an abaolute minimum requirement.6. REPORT DATE: Enter the date
of
the report as day,
month, year;
or month,
year.
If more than one date appearson the report, use date ofpublication.7a. TOTAL NUMBER OF PAGES: The total page countshould
follow
normal pagination proceduree.
i.e.,
enter thenumber of pages containing information.7b. NUMBER OF REFERENCES: Enter the total number ofreferences cited in the report.
Ba. CONTRACT OR GRANT NUMBER: If appropriate, enterthe applicable number ofthe contract or grant under whichthe report was
written,
86, Be,
fc td. PROJECT NUMBER: Enter the appropriatemilitary department
Identification,
auch aa project
number,
aubpraject
number,
ayatem
numbera,
taak
number,
etc.
9a. ORIGINATOR'S REPORT NUMBER(S): Enter the offi-cial report number by which the document will be identifiedand controlled by the originating activity. Thla number mustbe uniqueto thla report.
96. OTHER REPORT NUMBERfS): If the report haa beenaaslgned
any
other report numbera (either by the originatoror by the sponsor), alao enter this numbers).
ITIONS
10. AVAILABILITY/LIMITATION NOTICES: Enter any lim-itations on further dissemination of the report, othar than thoseimposed by security
classification,
using standard statementssuch as:
(1)
"Qualified
requesters
may
obtain coplea of thisreport from DDC"
(2) "Foreign announcement and dissemination of thisreport by DDC is not
authorized,"
(3) "U. S. Government agenciea
may
obtain copies ofthis report directly from DDC. Othar qualified DDCusers shall request through
(4) "U. S. militaryagenciea
may
obtain copies of thisreport directly from DDC Other qualified usersshall request through
M
(5) "All distribution of this report is controlled.
Qual-ified
DDC users shall request throughii
If the report hss been furnished to the Office of Technical
Services,
Department of
Commerce,
for sale to the public, Indi-cate this fact and enter the price. Ifknown.
IL SUPPLEMENTARY NOTES: Uae
for
additional explana-tory notes.li SPONSORING MILITARY ACTIVITY: Enter the name ofthe departmental project officeor laboratory sponsoring (pay-ing lor) the research and development. Include address.
13. ABSTRACT: Enter an abstrsct giving a brief and factual
summsry
of the document indicative of the report, even thoughit
may
also sppesr elsewhere in the body
of
the technical re-port. If sddltlonal apace is required, a continuation sheetshsll be attached.
It ls highly desirable that the abstract of classified re-ports be unclassified. Eech paragraph of tha abstract shallend with an Indication
of
the military security classificationof the information in the paragraph, represented ss (TS), (S),(C), or (V).
There is no Umltstlon on the length of the abstract. How-ever, the suggested length la
from
150 to 225 worda.14. KEY WORDS: Key words are technically meaningful termsor short phrases thst charscterlze s report snd
may
be used SS
index entries
for
cataloging the report. Key words must besetecteH so thst no security classification is required. Iden-
fiers,
such as equipment model designation, trade name, mili-tary project code name, geographic
location, may
be uaed askey words but will be followed by sn indlcstlon
of
technicalcontext. The assignment
of links, rules,
and welghta laoptional.
UNCLASSIFIEDSecurity Classification
*,