SCIENTIFIC REPORT NO. 1
PROBLEM-SOLVING PROCEDURES FOR EFFICIENT SYNTACTIC ANALYSIS

by

Dr. Saul Amarel

Prepared for the Air Force Office of Scientific Research of the Office of Aerospace Research under Contract No. AF 49(638)-1184.

RCA Laboratories
Princeton, New Jersey 08540
Qualified users may request copies of this report from DDC.
PREFACE
This paper was presented at the ACM 20th National Conference, held in
Cleveland, Ohio, on August 24-26, 1965. It was not included, however, in
the Proceedings of the Conference (partly because of its length). Plans to
write a modified version of this paper, to include some new results, have
delayed its publication. However, since this paper includes concepts from
problem-solving research that are applicable to the clarification of certain
questions of current interest in computer linguistics, we feel that it would
be appropriate not to delay its publication any longer, and to issue it at
present as a scientific report.
Saul Amarel
Princeton, N. J.
May 1968
ABSTRACT
The main purpose of this report is to present a logical framework in which the syntactic analysis problem can be represented. This framework originates from previous work on problem-solving procedures for theorem proving. Procedures for syntactic analysis are represented as reduction procedures, where a problem undergoes a sequence of nested transformations that result in a set of simpler subordinate problems whose solution implies the solution of the original problem. Our representation of the syntactic analysis problem provides a unifying basis for expressing a variety of syntactic analysis procedures, both existing ones as well as new, proposed procedures. Such a common basis contributes to a better understanding and systematization of the programming of syntax-directed compilers and of other translators whose source language is a context-free fragment of natural language, e.g., some "question-answering" systems. A useful concept of computational effort is defined, and it is used as a guide for the formulation of new efficient procedures. Heuristic procedures for syntactic analysis are suggested. Some features of these procedures are relevant to the design of advanced syntax-directed translators.
TABLE OF CONTENTS

I. INTRODUCTION

II. LINGUISTIC BACKGROUND AND FORMULATION OF THE SYNTACTIC ANALYSIS PROBLEM
    A. The CF-Grammar System
    B. The Syntactic Analysis Problem

III. FORMULATION OF THE SYNTACTIC ANALYSIS PROBLEM IN SYSTEMS OF NATURAL INFERENCE
    A. Theorem-Proving Formulation
    B. The Class of Natural Inference Systems N_σ(G)
       The Natural Inference System N_t(G)
       The Natural Inference System N_ℓ(G)
       The Natural Inference System N_r(G)
       Logical Consistency Theorem

IV. CONVERSION FROM PROOF TREES TO DESCRIPTION TREES

V. NATURAL DECISION SYSTEMS FOR SYNTACTIC ANALYSIS
    A Class of Decision Procedures for N_σ*(G) (for all σ)
    Completeness Theorem

VI. HEURISTIC PROCEDURES OF REDUCTION TYPE FOR SYNTACTIC ANALYSIS
    A. States, Moves, and Search Trees in Reduction Procedures for Syntactic Analysis
    B. Computational Effort
    C. Approaches to Move Selection
    D. Approaches to Attention Control

REFERENCES
ILLUSTRATIONS

Figure

1. Graph representation of a labelled n-ary replacement rule R_i : A → φ^(1) φ^(2) ... φ^(n), where φ^(1), φ^(2), ..., φ^(n) ∈ V
2. The graph, Γ(G_1'), of the grammar G_1
3. A P-derivation of a terminal string and its corresponding P-marker
4. Schematic representation of the alternative approaches to the construction of a proof of P ⇒ x in the systems N_σ(G)
5. Graphic interpretation of the situation considered in the specification of a rule of inference I_t,i of N_t(G), which corresponds to a "top" application of a rule of replacement R_i of G
6. The proof in tree form of P ⇒ abcd in the system N_t(G_1)
7. Graphic interpretation of the situation considered in the specification of a rule of inference I_ℓ,i of N_ℓ(G), which corresponds to a "left" application of a rule of replacement R_i of G
8. The proof in tree form of P ⇒ abcd in the system N_ℓ(G_1)
9. Graphic interpretation of the situation considered in the specification of a rule of inference I_r,i of N_r(G), which corresponds to a "right" application of a rule of replacement R_i of G
10. The proof in tree form of P ⇒ abcd in the system N_r(G_1)
11. One of the tree form proofs of P ⇒ abcd in the mixed system N_(ℓ,r)(G_1)
12. Inference mapping tree
13. Representation of the application of a replacement move, followed by two bisection moves, in a problem-solving tree
14. A compound replacement move (a maneuver) from the "top" and "left"
I. INTRODUCTION
The problem of efficient syntactic analysis in a context-free (CF) language is of considerable practical importance for the design of syntax-directed compilers and of computer processors whose source language is a CF fragment of natural language, e.g., "question-answering" systems. The problem is also of theoretical significance for the study of grammars and for the exploration of perceptual models of language.
Our main purpose in this paper is to develop a broad logical framework
in which the syntactic analysis problem can be represented, in a way that will
permit us to consider a large class of syntactic analysis procedures, among
them in particular, procedures that perform efficient syntactic analysis. The
framework to be presented originates from attempts to systematize certain
essential elements in heuristic problem-solving procedures. The problem of
syntactic analysis is a theorem-proving problem. We will demonstrate that it
can be effectively solved by procedures of the reduction type. In such
procedures a problem undergoes a sequence of nested transformations that result
in a set of "simpler" subordinate problems whose solution implies the solution
of the original problem. We will also show that heuristic problem-solving procedures of the reduction type are excellent candidates for the solution of the problem of efficient syntactic analysis. While most existing syntactic analysis procedures carry out their task in a rigid way, our proposed heuristic procedures exhibit considerable flexibility of approach (because of their more global view of the problem) and they can attain greater computational efficiency (in the sense of avoiding needless search) by selectively responding to special properties of the string at hand.

The point of view that we are proposing for the syntactic analysis problem provides a unifying basis for expressing a variety of syntactic analysis procedures, both existing ones as well as new, proposed procedures. We will show that it is possible to define a useful concept of computational effort within our general framework, and will use this concept directly as a guide for the formulation of efficient procedures; this same concept can also be used as a general basis for comparing procedures that are expressed within our framework.

A serious obstacle to the application of ideas that come from artificial intelligence research to problems of practical interest is the difficulty of representing the problem in an "appropriate" form, i.e., in a form that makes it
easy to transfer concepts and methods that were developed for a prototype problem to the problem on hand. Because of the importance of the question of transforming the problem representation from its original form to an "appropriate" form, we are giving it major emphasis in this paper.

We formulate in Section II the syntactic analysis problem in its conventional linguistic form, where the concept of a grammar G as a combinatorial system of concatenation is central, and where a language L(G) is defined as a set of strings generated by the grammar. We then show in Section III how the problem can be regarded as a theorem-proving problem in a logic, and we formulate a set of natural inference systems, N_σ(G), in which our problem can be represented. We then prove that the systems N_σ(G) are consistent with the linguistic system G, i.e., if a solution to our problem exists in N_σ(G), then it also exists in G. Our move from G to the systems N_σ(G) was suggested by previous work with heuristic theorem-proving procedures for the propositional calculus, where we have found that a formulation of the problem in a system of natural inference (a system of subordinate proofs) was especially fruitful.
A proof in a system N_σ(G) has the form of a tree and it corresponds to a structural description of an input string in the language L(G). Because of the flexibility of proof construction afforded by the natural inference systems, a proof may be obtained in a variety of tree forms. In Section IV we discuss the correspondence between a tree proof in any system N_σ(G) and a structural description of a string in the CF language; this clarifies the question of structural consistency of the natural inference systems with respect to the grammar.

The completeness of the systems N_σ(G) with respect to G [i.e., if the problem has a solution in G it is also solvable in any of the systems N_σ(G)] is proved in Section V by embedding the natural inference systems in a set of systems N_σ*(G) of broader scope. The latter systems are natural extensions of the systems N_σ(G), and they are obtained by strengthening the inferential mechanisms of N_σ(G) so that both proofs and refutations can be obtained in them in a uniform way. We call the stronger systems natural decision systems. The formulation of a general schema for decision procedures in the natural decision systems (given in Section V) provides us directly with a large class of reduction procedures for the solution of our syntactic analysis problem. In these procedures a search tree is grown as the computation proceeds, and the growth stops when either a proof or a refutation is obtained. In Section VI
we introduce a measure of computational effort which is related to the size of
the maximal search tree which is grown by a procedure. We then develop the
essential features of heuristic reduction procedures for syntactic analysis.
Our approach is guided by the goal of minimizing the expected computational
effort that a procedure is to expend in the course of attempting to construct
a structural description for a string in a CF language. The ability to choose
a different method of attack in response to the properties of the specific
problem on hand, the possibility of considering a restricted set of relevant
moves, the formulation of compound moves (or maneuvers), and the "on-line"
direction of the thread of computation to those subordinate problems that
promise a minimal expected expenditure of estimated computational effort are
the essential features of these heuristic procedures. The estimation of
expected effort along alternative lines of solution is an extremely useful
concept for the organization of intelligent search in heuristic problem-solving procedures. In our present case, we find that this concept can be
applied to great advantage since, as we will show in Section VI, it is possible
to formulate a reasonably good estimate of expected computational effort on
the basis of string length.
II. LINGUISTIC BACKGROUND AND FORMULATION OF THE SYNTACTIC ANALYSIS PROBLEM
A. The CF-Grammar System

The formal linguistic definition of a context-free (CF) language L(G) is commonly given in terms of its CF-grammar,

    G = < V, V_T, p, P >,   (2.1)

which is regarded as a combinatorial system of concatenation (see Chomsky).

V in (2.1) is a finite set of elements, called the vocabulary of G. The concatenation of a finite number (possibly 0) of elements in V forms a string in V*; in this paper we denote strings in V by φ, χ, ψ (possibly subscripted).* A string is represented as a juxtaposition of the symbols that denote its successive elements. If n is the number of elements in a string φ, then l(φ) = n; l(φ) is called the length of φ. We shall use the following notational convention for naming component elements of strings: If φ is a non-empty string, then φ^(1) denotes the leftmost symbol in φ, φ^(2) the next to the leftmost symbol, etc.; furthermore, φ^(1̄) denotes the rightmost symbol in φ, φ^(2̄) the one preceding the rightmost symbol, etc. If l(φ) = n, then we have

    φ = φ^(1) φ^(2) ... φ^(n-1) φ^(n) = φ^(n̄) φ^((n-1)̄) ... φ^(2̄) φ^(1̄).   (2.2)

The empty string, of length 0, is denoted by Λ. The concatenation of a pair of strings φ and χ is denoted by φχ.

V_T in (2.1) is a set which is properly included in V and is called the terminal vocabulary; its elements are called terminal elements. The concatenation of a finite number (possibly 0) of terminal elements forms a string in V_T*; we denote strings in V_T* by x, y, z (possibly subscripted). The sentences of the language L(G) form a subset of the set of all strings in V_T*.

The complement of the set V_T with respect to V is denoted by V_N and is called the nonterminal vocabulary; its elements are called nonterminal elements. We have then V = V_T ∪ V_N, and V_T ∩ V_N = ∅. The nonterminal elements are used to represent syntactic types in the grammar.

p in (2.1) is a finite set of replacement rules that are given in the form

    A → φ   (2.3)

where A ∈ V_N and φ is a non-empty string in V. The arrow, →, stands for a non-reflexive and asymmetric dyadic relation whose interpretation is "can be replaced by". As an example, the replacement rule A → ABb can be read as "A can be replaced by the string ABb". A replacement rule A → φ is called n-ary if l(φ) = n (in the previous example, we have a ternary rule); in most CF grammars that have been proposed for fragments of natural languages, as well as for programming languages, n is 1, 2, or 3.

* For brevity, we will use "φ is a string in V" or "φ is in V" for "φ is a string whose elements are taken exclusively from the set V"; similar comments hold for "x is a string in V_T" or "x is in V_T".
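The system (2.1)-(2.3) translates directly into a small data structure. The following sketch (in present-day notation, with names of our own choosing, not the report's) records a grammar as the quadruple < V, V_T, p, P > and checks the conditions just stated:

```python
# A CF-grammar G = <V, V_T, p, P> as in (2.1); p holds replacement rules
# A -> phi of (2.3), each stored as a pair (A, phi) with phi a tuple of symbols.
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    terminals: frozenset     # V_T
    nonterminals: frozenset  # V_N = V - V_T
    rules: tuple             # p: pairs (lhs, rhs)
    start: str               # the designated element P

    def check(self):
        # V_T and V_N are disjoint (V_T intersect V_N is empty)
        assert not (self.terminals & self.nonterminals)
        vocabulary = self.terminals | self.nonterminals  # V
        for lhs, rhs in self.rules:
            assert lhs in self.nonterminals  # A is in V_N
            assert len(rhs) >= 1             # phi is non-empty: l(phi) >= 1
            assert all(sym in vocabulary for sym in rhs)
        return True

# a hypothetical two-rule grammar, just to exercise the checks
example = Grammar(terminals=frozenset({"a", "b"}),
                  nonterminals=frozenset({"S"}),
                  rules=(("S", ("a", "S", "b")), ("S", ("a", "b"))),
                  start="S")
```

Here the rule ("S", ("a", "S", "b")) is ternary in the report's sense, since its right side has length 3.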
A string φ is derivable in G from a string ψ if and only if there exists a sequence of strings D(ψ,φ|G),

    D(ψ,φ|G) = [φ_1, φ_2, ..., φ_n],  (n ≥ 1),   (2.4)

such that

(i) φ_1 = ψ, φ_n = φ, and

(ii) for any two consecutive strings φ_i, φ_(i+1) (i < n for n > 1), there are strings χ_1, χ_2 (not necessarily distinct, possibly empty) and a replacement rule A → ω, such that φ_i = χ_1 A χ_2 and φ_(i+1) = χ_1 ω χ_2.

We call the sequence D(ψ,φ|G) a ψ-derivation of φ in G, and we denote the relation "φ is derivable in G from ψ" by ψ ⇒ φ. Clearly, the relation ⇒ is reflexive and transitive. If ψ ⇒ x holds, and if x is a string in the terminal vocabulary V_T, then x is called a terminal string of ψ in G.

For any nonterminal element X, let S_X(G) denote the set of all terminal strings of X in G, i.e.,

    S_X(G) = {x | x is in V_T, and X ⇒ x holds}.   (2.5)

We assume that G is such that S_X(G) is not empty for any nonterminal X in V_N. This is clearly a necessary condition for the inclusion of a nonterminal element in a grammar (of any practical interest).

We now introduce the notion of a support, s(α), of an element α ∈ V, which we find useful in the formulation of syntactic analysis procedures:

    s(α) = 1, if α ∈ V_T;
    s(α) = min over x ∈ S_α(G) of l(x), if α ∈ V_N.   (2.6)

Thus, the support of a nonterminal element X is the minimal length of a terminal string which is derivable from X in G; clearly, according to our previous assumption on nonterminals, we have

    s(X) > 0, for all X ∈ V_N.   (2.7)

The notion of support extends to strings in a natural way. Thus, if φ is a string in V, and l(φ) = n, then

    s(φ) = Σ_{i=1}^{n} s(φ^(i)).   (2.8)
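The supports of (2.6)-(2.8) can be computed without enumerating any terminal strings, by relaxing over the rules until a fixpoint is reached; a minimal sketch (the function name and representation are ours):

```python
def supports(rules, terminals):
    """s(alpha) of (2.6): 1 for a terminal; for a nonterminal, the minimal
    length of a terminal string derivable from it. `rules` is an iterable
    of (lhs, rhs) pairs, rhs a sequence of symbols."""
    INF = float("inf")
    s = {t: 1 for t in terminals}
    for lhs, _ in rules:
        s.setdefault(lhs, INF)
    changed = True
    while changed:          # relax until no support can be lowered further
        changed = False
        for lhs, rhs in rules:
            candidate = sum(s[sym] for sym in rhs)  # support of a string, (2.8)
            if candidate < s[lhs]:
                s[lhs] = candidate
                changed = True
    return s

# hypothetical illustration: with rules A -> a, B -> bc, P -> AB
# the minimal derivable lengths are s(A) = 1, s(B) = 2, s(P) = 3
demo = supports([("A", "a"), ("B", "bc"), ("P", "AB")], set("abc"))
```

The assumption stated above, that S_X(G) is non-empty for every nonterminal X, is exactly what guarantees that every support settles at a finite value.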
P in (2.1) is a designated element in V_N which has the following special linguistic significance: The CF-language L(G) generated by the grammar G is the set of all terminal strings of P in G. Thus,

    L(G) = {x | x is in V_T, and P ⇒ x holds}.   (2.9)

For each string, x, in L(G) there exists a phrase-marker (P-marker) of x in G, which is a structural description of x in G relative to P. The P-marker of x is based on the P-derivation, D(P,x|G), of x. A structural description of x in a grammar G is a detailed representation of the structure of x in terms of the replacement rules that determine the successive steps of its derivation within that grammar.

Before proceeding further with our definition of the notions of P-marker and structural description, let us introduce a grammar, G',

    G' = < V, V_T, V_R, p', P >   (2.10)

which is equivalent to G (in terms of generative capacity), but differs from it in the form of representing the replacement rules. p' in G' is a set of labeled replacement rules which is related to p in G as follows: For each rule A → φ in p there corresponds a labeled rule in p' which has the form

    R_i : A → φ,  (1 ≤ i ≤ m),   (2.11)

where m is the number of rules in p, and R_i is the label of the i'th rule (it names that rule). Thus, a reference to a rule R_i of p' is intended to designate the replacement rule A → φ. The finite set of rule labels R_1, ..., R_m is called the rule labels vocabulary, and it is denoted by V_R in G'.

By using the notational convention given in (2.2), we can write a labeled replacement rule in the form

    R_i : A → φ^(1) φ^(2) ... φ^(n),   (2.12)

where φ^(1), ..., φ^(n) ∈ V, and n = l(φ). In the interest of notational convenience, we also use the following labeling scheme for parts of a replacement rule:

R_i^(0) names the left side of the replacement rule R_i (in (2.12), R_i^(0) = A);

R_i^(j), 1 ≤ j ≤ n, names the appropriate component of the right side string in the replacement rule R_i (in (2.12), R_i^(1) = φ^(1), etc.).
Each labeled replacement rule can be represented by a special directed graph where the order of the string components φ^(1), φ^(2), etc. is explicitly indicated by numbering the graph branches in an appropriate way; such a graph is shown in Figure 1. We can extend the graph representation used for individual rules to obtain a combined overall description of the entire grammar G'; we denote this representation by Γ(G'), and we call it the graph of G'. The graph Γ(G') is constructed as follows:

(i) Each node of the graph corresponds to an element from the vocabularies V_T, V_N, V_R. The nodes are labeled appropriately and they are classified (in the obvious way) as terminal, nonterminal, and rule nodes.

(ii) From each nonterminal node (corresponding to a nonterminal element, say X) there emanate j_X branches (j_X ≥ 1) into the nodes corresponding to rules of which the nonterminal X is a left side. These branches are numbered 1, 2, ..., j_X; the numbering is arbitrary; however, some specific methods of numbering may be better suited than others for specific computer realizations of procedures that use the notion of Γ(G').

(iii) From each rule node (corresponding to a replacement rule, say R_i) there emanate n branches into the terminal and nonterminal nodes that correspond to the components φ^(1), φ^(2), ..., φ^(n) of the right-side string in R_i. These branches are numbered 1, 2, ..., n, so that a branch entering a node φ^(k) is assigned the number k. For convenience, we also mark the n'th branch with 1̄.

To illustrate the notion of the graph of a grammar, let us consider, as an example, the following simple CF-grammar, G_1':*

* This illustrative example was used by Griffiths and Petrick in [2].
    V_N = {P, A, B},
    V_T = {a, b, c, d},

    p' :  R_1 : P → AB
          R_2 : A → ABb
          R_3 : A → a          (2.13)
          R_4 : B → Bd
          R_5 : B → bc

The graph Γ(G_1') is shown in Figure 2.
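The construction (i)-(iii) of Γ(G_1') amounts to two adjacency maps, one from nonterminal nodes to rule nodes and one from rule nodes to the ordered components of their right sides. A sketch follows (our own encoding; we read the partly garbled rule R_4 as B → Bd, the reading consistent with the derivation of abcd given later in this section):

```python
# The rules p' of (2.13), keyed by their labels in V_R.
RULES = {
    "R1": ("P", ("A", "B")),
    "R2": ("A", ("A", "B", "b")),
    "R3": ("A", ("a",)),
    "R4": ("B", ("B", "d")),
    "R5": ("B", ("b", "c")),
}

def grammar_graph(rules):
    """Gamma(G'): each nonterminal node carries numbered branches into the
    rule nodes it heads (step (ii)); each rule node carries numbered branches
    into the ordered right-side components (step (iii))."""
    nt_branches = {}
    for label, (lhs, _) in rules.items():
        nt_branches.setdefault(lhs, []).append(label)
    rule_branches = {label: list(rhs) for label, (_, rhs) in rules.items()}
    return nt_branches, rule_branches

nt_branches, rule_branches = grammar_graph(RULES)
```

The list positions play the role of the branch numbers: nt_branches["A"] lists, in order, the rules of which A is a left side, and rule_branches["R1"] lists the right-side components of R_1 in their horizontal order.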
A derivation in G', denoted D(ψ,φ|G'), is a stronger version of the notion of derivation, D(ψ,φ|G), used in G (see 2.4). In addition to the strings that enter as steps of a derivation in G, we include in D(ψ,φ|G') a specific record of the rule applications that are associated with each transition between consecutive steps.

The notion of rule application is extremely important for problem-solving procedures in general and it has great significance in our present context (as will become apparent in the subsequent discussion). Its nature is that of a process which effects a specific transition between a pair of strings. The rule application process consists of two subprocesses in sequence. The first is a selection process which decides what part of what relevant rule is to be identified with what part of the input string (we call the latter part the application site); the second is an execution process which carries out the replacement of elements prescribed by the chosen rule at the chosen application site, and it produces the output string.

Consider, as an example, an input string φ which has the element X in one or more of its sites, say φ = χ_1 φ^(u) χ_2 φ^(v) χ_3 φ^(w) χ_4, where φ^(u) = φ^(v) = φ^(w) = X, and χ_1, χ_2, χ_3, χ_4 are strings in V (possibly empty). Consider next the set of all rules of replacement that have X at their left side, i.e., {R | R^(0) = X}. Let us assume that a specific rule application process takes place, where

(i) both a rule R_j ∈ {R | R^(0) = X} (suppose that R_j designates the rule X → ω) and an application site in φ (say φ^(v)) are selected, and

(ii) the element φ^(v) is replaced by ω and a new string, ψ = χ_1 φ^(u) χ_2 ω χ_3 φ^(w) χ_4, is generated. We represent the record of this rule application process by the following sequence:

    [φ, (R_j^(0), v), ψ].   (2.14)

The parenthesis (R_j^(0), v) indicates that the left side element of the rule R_j, i.e., R_j^(0), is applied to the v'th site of φ, i.e., φ^(v).

Fig. 1. Graph representation of a labelled n-ary replacement rule R_i : A → φ^(1) ... φ^(n), where φ^(1), φ^(2), ..., φ^(n) ∈ V.

Fig. 2. The graph, Γ(G_1'), of the grammar G_1.

We can now define a ψ-derivation of φ in G', i.e., D(ψ,φ|G'), as a sequence which has the following form:

    D(ψ,φ|G') = [φ_1, (R_{j_1}^(0), q_1), φ_2, (R_{j_2}^(0), q_2), ..., (R_{j_{n-1}}^(0), q_{n-1}), φ_n],  (n ≥ 1),   (2.15)

where

(i) φ_1 = ψ, φ_n = φ, and

(ii) a subsequence [φ_k, (R_{j_k}^(0), q_k), φ_{k+1}], for 1 ≤ k < n and n > 1, stands for the record of a rule application, where (R_{j_k}^(0), q_k) indicates that the left side element of a rule R_{j_k} (where R_{j_k} ∈ {R | R^(0) = φ_k^(q_k)}) is applied to the q_k'th site of φ_k (we have 1 ≤ q_k ≤ l(φ_k)).

Consider now, as an example, a specific P-derivation of the string x = abcd in the grammar G_1' (given in 2.13):

    D(P, abcd|G_1') = [P, (R_1^(0), 1), AB, (R_3^(0), 1), aB, (R_4^(0), 2), aBd, (R_5^(0), 2), abcd].   (2.16)

This derivation is shown in graphical form in Figure 3(a). In this figure, we are using the graph representation of replacement rules that we have introduced previously, and a representation of strings in the form of sequences of nodes of appropriate types; we are also using special branches (denoted by a double line) to indicate the "carrying over" of the string elements that are not affected by a rule application from one step of the derivation to the next. Let us now apply the following elementary (condensation and simplification) transformation (which we call α):

α: (i) condense the "carrying over" branches, (ii) substitute a rule label for a rule application parenthesis, and (iii) eliminate (for convenience) the numbering of branches, but maintain the horizontal ordering of the branches that descend from each rule node.
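The record notation (2.14)-(2.16) is mechanical enough to replay directly. The sketch below (names are ours; single-character symbols are assumed, as in G_1', and the garbled rule R_4 is read as B → Bd) executes each recorded rule application and reproduces (2.16):

```python
RULES = {"R1": ("P", "AB"), "R2": ("A", "ABb"), "R3": ("A", "a"),
         "R4": ("B", "Bd"), "R5": ("B", "bc")}  # p' of (2.13)

def apply_rule(phi, label, site):
    """One rule application (2.14): the selection step checks that the rule's
    left side matches the site'th symbol of phi (sites are 1-indexed, as in
    the report); the execution step performs the replacement."""
    lhs, rhs = RULES[label]
    assert phi[site - 1] == lhs
    return phi[:site - 1] + rhs + phi[site:]

def replay(psi, record):
    """Replay a psi-derivation given as (rule label, site) pairs, as in (2.15)."""
    phi = psi
    for label, site in record:
        phi = apply_rule(phi, label, site)
    return phi

x = replay("P", [("R1", 1), ("R3", 1), ("R4", 2), ("R5", 2)])  # as in (2.16)
```

Replaying the record of (2.16) passes through the intermediate strings AB, aB, aBd and terminates at abcd.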
Fig. 3. A P-derivation of a terminal string and its corresponding P-marker.
We obtain the tree graph shown in Figure 3(b). This tree graph is a P-marker of x = abcd in G_1'. The transformation α, which was just outlined, produces, in general, the P-marker of a string x in G_1', i.e., the tree, from its corresponding derivation D(P,x|G').

The P-marker of a terminal string provides the essential information contained in a specific derivation of that string. In particular, it shows the structure of replacement rules that transforms the designated element P into the terminal string. For each nonterminal element X that assumes the role of a rule application site during a derivation, the P-marker provides a trace of rule applications down to the part of the terminal string which is derived from X. The P-marker does not conserve information about the specific step of the derivation at which a specific rule has been applied; it just shows that the rule was applied at some step.

P-markers are trees that are rooted at the nonterminal node P and that have as terminal nodes (in the appropriate horizontal order) the terminal elements that compose the string whose structure they display.

The notion of a P-marker is a special case of the more general notion of a structural description. A structural description of the string φ in V relative to the nonterminal element X is a tree which can be obtained from a derivation D(X,φ|G') via the transformation α which we have discussed previously. The tree is rooted at X, its terminals are the (horizontally ordered) components of the string φ, and its structure displays the manner in which a specific set of replacement rules in G' is combined to effect a transition from X to φ.

If a string φ in V is derivable in G' from X ∈ V_N and if furthermore there exists a single structural description of φ relative to X, then the string is called syntactically unambiguous relative to X; if there is more than one structural description, then the string is called syntactically ambiguous relative to X. This notion of syntactic ambiguity is carried over to sets of strings as follows: Given a set {φ | X ⇒ φ holds}, the set is syntactically unambiguous relative to X ∈ V_N if there is no string in the set which is ambiguous relative to X; otherwise the set is syntactically ambiguous. These notions are carried over in the obvious way to P-markers and CF-languages L(G).
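The transformation α can likewise be sketched as a small program: applying each recorded rule at a leaf of a growing tree, and keeping the rule label at that node, yields the structural description directly (our own rendering, again using the G_1' rules with R_4 read as B → Bd):

```python
RULES = {"R1": ("P", "AB"), "R2": ("A", "ABb"), "R3": ("A", "a"),
         "R4": ("B", "Bd"), "R5": ("B", "bc")}  # p' of (2.13)

def structural_description(root_symbol, record):
    """Build the tree obtained by alpha from a derivation record of
    (rule label, site) pairs; return (tree, terminal string at the frontier).
    Each node keeps its symbol, its rule label (if any), and its children."""
    root = {"sym": root_symbol, "rule": None, "kids": []}
    frontier = [root]                      # the current string, as leaves
    for label, site in record:
        lhs, rhs = RULES[label]
        node = frontier[site - 1]
        assert node["sym"] == lhs          # the application site
        node["rule"] = label               # alpha keeps the rule label
        node["kids"] = [{"sym": s, "rule": None, "kids": []} for s in rhs]
        frontier[site - 1:site] = node["kids"]  # preserve horizontal order
    return root, "".join(leaf["sym"] for leaf in frontier)

tree, x = structural_description("P", [("R1", 1), ("R3", 1), ("R4", 2), ("R5", 2)])
```

Note that, exactly as stated above, the resulting tree records which rule was applied at which node but not at which step of the derivation it was applied.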
Since it is a desirable property of programming languages that they be syntactically unambiguous (and therefore it can be assumed that their designers make efforts to satisfy this desideratum), it can be assumed (for practical purposes, since the problem of syntactic ambiguity for CF-languages is undecidable in general) that every valid string in such a language has a single P-marker, and furthermore that all the substrings of valid strings have unique structural descriptions relative to the nonterminals from which they are derivable. This assumed property of programming languages has significant implications on the types of procedures that can be proposed for their syntactic analysis (as we will see later in our procedures where effort allocation decisions are made).
The notion of the graph of a grammar, Γ(G'), and the set of structural descriptions in that grammar are strongly related. For all nonterminals X ∈ V_N, and for all the strings φ in V that are derivable from X in G', the set of structural descriptions of φ in G' relative to X can be effectively generated from Γ(G') via a generation procedure of the type outlined in the next paragraph.

The generation procedure based on Γ(G') consists of systematically tracing and recording all the distinct tree paths of the graph Γ(G') that are rooted at X; each such tree path corresponds to a structural description relative to X. A tree path starts at X; it follows a single directed branch out of each nonterminal node which has been reached by the path; it follows all the branches that leave a rule node which has been reached by the path into the adjacent (terminal and nonterminal) nodes; and it stops only at terminal or nonterminal nodes. Several approaches are possible for organizing the systematic generation of the set of structural descriptions, and also for selectively generating structural descriptions that have certain properties, as well as for stopping the generation process under given conditions.
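A depth-bounded version of this generation procedure is easy to state. The sketch below (ours, with R_4 read as B → Bd) traces the tree paths of Γ(G_1') rooted at a given node and collects the terminal strings they span, with a stopping condition of the kind just mentioned:

```python
from itertools import product

RULES = {"R1": ("P", "AB"), "R2": ("A", "ABb"), "R3": ("A", "a"),
         "R4": ("B", "Bd"), "R5": ("B", "bc")}  # p' of (2.13)
TERMINALS = set("abcd")

def terminal_strings(sym, depth):
    """Terminal strings spanned by tree paths rooted at sym that nest at most
    `depth` rule applications along any path; without such a stop condition
    the set is infinite whenever the graph contains loops (recursive
    elements)."""
    if sym in TERMINALS:
        return {sym}
    if depth == 0:
        return set()
    out = set()
    for lhs, rhs in RULES.values():
        if lhs != sym:
            continue  # a path follows one branch out of each nonterminal node
        # ...but all branches out of a rule node, in horizontal order
        component_sets = [terminal_strings(s, depth - 1) for s in rhs]
        out |= {"".join(parts) for parts in product(*component_sets)}
    return out
```

Starting the process at the designated node P enumerates members of L(G_1') up to the chosen bound.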
A generation process of special interest is that which starts at the designated node P of Γ(G') and produces all the structural description trees of terminal strings, i.e., the set of P-markers of the language L(G'). In most of the interesting cases, where the membership of a language L(G') is infinite, the set of all P-markers is infinite. This set is generated by a Γ(G') which contains loops. All the nonterminal nodes that are parts of loops are associated with elements that are called recursive. Three types of recursive elements, which have been found significant in various linguistic studies, will be discussed later in connection with the formulation of our syntactic analysis procedures. They are:

(i) Left-recursive elements. These elements occur in loops of Γ(G') where, for each rule node on the loop, the loop branch leaving the node is marked with a 1. Left-recursive elements occur iteratively in the leftmost chain of a P-marker tree.

(ii) Right-recursive elements. They occur in loops of Γ(G') where, for each rule node on the loop, the loop branch leaving the node is marked with a 1̄. These elements may occur iteratively in the rightmost chain of P-marker trees.

(iii) Self-embedding elements. They occur in loops of Γ(G') where the loop branches leaving the rule nodes on the loop are not all marked consistently with either 1 or 1̄. Self-embedding elements may occur iteratively in a tree chain of the P-marker which is neither the leftmost nor the rightmost.

In the graph Γ(G_1') of our example (see Figure 2), it can be seen that the nonterminals A and B are left-recursive.
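Left- and right-recursive elements can be read off the rules directly: a nonterminal is left-recursive (right-recursive) exactly when it lies on a cycle of the relation that steps from a rule's left side to the first (last) symbol of its right side. A sketch follows (ours; it covers cases (i) and (ii) but not the self-embedding case (iii), and reads R_4 as B → Bd):

```python
RULES = [("P", "AB"), ("A", "ABb"), ("A", "a"), ("B", "Bd"), ("B", "bc")]
NONTERMINALS = {"P", "A", "B"}

def recursive_elements(rules, position):
    """Nonterminals on a cycle of 'lhs -> rhs[position]'; position 0 gives
    the left-recursive elements (loop branches marked 1), position -1 the
    right-recursive ones (loop branches marked 1-bar)."""
    step = {}
    for lhs, rhs in rules:
        if rhs[position] in NONTERMINALS:
            step.setdefault(lhs, set()).add(rhs[position])
    reach = {x: set(ys) for x, ys in step.items()}  # transitive closure
    changed = True
    while changed:
        changed = False
        for x in reach:
            extra = set().union(*(reach.get(y, set()) for y in reach[x]))
            if not extra <= reach[x]:
                reach[x] |= extra
                changed = True
    return {x for x in reach if x in reach[x]}
```

For the rules of (2.13) this reports A and B as left-recursive (via A → ABb and B → Bd) and no element as right-recursive, in agreement with the remark about Figure 2.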
B. The Syntactic Analysis Problem

Our general objective is to find efficient solution procedures for the following problem, which is commonly called the language recognition problem:

(π_0): Given an input string x in V_T, (i) determine whether x is well formed in the language L(G) (i.e., whether x is a member of L(G)), and (ii) if x is well formed, find a P-marker of x in G'.

Since the answer to the first part of π_0 is reducible, in most nontrivial cases, to that of attempting the construction of a P-marker for x, the central problem in language recognition is that of P-marker construction. This problem is also becoming of increasing practical significance: P-marker (parse) generation of an input string constitutes the first stage of processing in syntax-directed compilers for programming languages and in certain "question-answering" information retrieval systems that respond to restricted context-free fragments of English. In systems of this type, the parse generated by the syntactic analysis stage is the input to translation and interpretation rules that assemble the appropriate computations in response to the input string.
Since it can be assumed that the source of the input string x (namely the programmer or computer user) is restricted (by intention) to generation of well-formed strings, the reason for the possible occurrence of ill-formed input strings is the presence of communication error (remember that the programmer or computer user takes part in the communication process also). While it is sufficient, in many cases, to know simply that an error has occurred, it is more desirable, in general, to obtain error identification information from an unsuccessful attempt to construct a P-marker, and better still, to automatically use this information for error correction. Therefore, even in the case of ill-formed input strings it appears desirable to make serious attempts towards the construction of a P-marker, with the view to using information derived from these attempts for error control.
The statement of the problem π_0 is especially appropriate for language theorists whose main objective is to propose and subsequently test the validity of generative grammars that are intended to characterize given languages; since our main objective is to propose efficient procedures that "understand" (in our context, this has the limited meaning of "that respond appropriately to the form of") language strings that are generated by a given grammar (via a generation process which has certain error properties), it is appropriate for us to reformulate the statement of π_0 as follows:
(π_1): Given an input string x in V_T*, (i) construct a P-marker of x in G', and (ii) if no P-marker exists, provide information (for purposes of error control) about the unsuccessful attempt to construct a P-marker.
In this paper we are mainly concerned with the first part of the problem π_1. A satisfactory solution to the second part requires the formulation of a specific rationale for error control, which is outside our present
scope. However, our approach to the construction of P-markers makes it
possible to record considerable information about the construction process
itself — information which appears relevant to error identification and
correction.
The problem as stated in π_1 requires the construction of a single
P-marker of x, if it exists. The construction of a single P-marker doesn't raise any question if the language is syntactically unambiguous. As we
pointed out previously, procedure-oriented programming languages can be
assumed to be unambiguous (at least it can be assumed that the part of the
language which is considered as proper input to a syntax-directed analyzer
will be free of known ambiguities); a similar assumption can be made about
other computer source languages that are designed to approximate small fragments of natural language and that are intended for use in "question-answering" systems, where the system is assumed to selectively respond to syntactic forms of input strings.*
Even in the case of syntactically ambiguous languages, we believe
that the introduction of explicit, non-syntactic, rules of preference for
ordering the generation of alternative structural descriptions (so that a
single "preferred" structural description is produced) will not necessarily
be detrimental to the machine "understanding" of an input string; on the contrary, it may provide an independent handle for dealing with the difficult
problems posed by ambiguities. In some of our proposed procedures (to be
discussed in Section VI), the selection of a "preferred" P-marker is based on
the principle of minimizing expected computational effort, a purely pragmatic
and non-syntactic notion. Clearly, this notion is sensitive to the specific
formulation of the syntactic analysis problem, to the solution approach, and
to our definition of "computational effort"; we will return again to this
point later.We can how formulate our objective in more specific terms. We wish to
obtain syntactic analysis procedures
mapping 6;that efficiently compute the following
if P =» x holds (2.17)if P =» x does not hold
where indicates the "preferred" P-marker of x, and E(P,x|Gj) denotes
an error description (this can be a simple indication of failure or a more
elaborate message). Our emphasis is on the concept of efficient computation
of 6; roughly, we expect that such a computation should require a minimal
* As it can be seen from a recent survey [ 5 ]by
Simmons,
not all approachesto such systems attempt to avoid syntactic ambiguities of source language.
for all x inVT , 6(x)= j p^lG ')( E (P,x|G')
17
expected expenditure of computational resources (or computational effort). To make the notion of efficiency precise (and meaningful) we need to introduce a definition of computational effort which both satisfies our intuitive requirements and is applicable in a uniform way over alternative syntactic analysis procedures. We need, therefore, as a prerequisite, a broad framework for formulating alternative procedures in a uniform way. Furthermore, it is desirable that this framework be closer to real computer programs rather than to abstract machines. After all, the requirement for efficiency comes from a desire to attain solutions of "real life" problems faster and more economically, and it is clear that results regarding relative efficiency of procedures that are formulated in a framework closer to real computations will be more useful (as guides to actual selection of computational programs) than results obtained in the world of Turing machines. Since several procedures for syntactic analysis have already been formulated in the past (both for programming and natural languages), it is desirable that these procedures be interpretable in our framework; this way, their relationships can be better understood and they can be compared with the new procedures that we will formulate.
Given a definition of computational effort and a framework for formu-lating procedures, we are still faced with the problem of actually creatingclasses of new promising procedures, comparing them with existing procedures,and choosing among them those that are optimal with respect to computational
effort (or at least of ordering procedures by degree of optimality). This isan extremely difficult ordering and optimization problem which, at present,
can be approached only empirically (i.e., by computer experimentation) in most
nontrivial cases; in such cases we are not in a position to demonstrate con-clusively that a candidate procedure with alleged optimality properties isindeed optimal. Procedures of this type are heuristic procedures. A heuristicprocedure has a status similar to that of a theory in an empirical science;it is the best procedure that we know how to devise given existing ideas and
experience - however, it is always possible that its validity (in our case,its optimality) may be refuted at the next computer run. Heuristic procedures
[6] are a central subject of study in artificial intelligence. An important class of heuristic procedures is based on an overall scheme of flexible and selective search for solution which proceeds by successive reductions of the initial problem into subsidiary problems. We call such
procedures (heuristic) problem-solving procedures of the reduction type; they apply to a large variety of problem situations (well-defined problems of the theorem-proving type), provided that the problem is cast in the appropriate form. The "appropriate representation of the problem" is one of the main criteria for the choice of framework in which our procedures have to be formulated.

The central importance of building an appropriate framework for
problem and procedure representation, as well as for formulation of efficiency
measures should be evident by now. In the next section we shall introduce
such a general logical framework for our problem. We shall subsequently
formulate within this framework classes of problem solving procedures of the
reduction type for syntactic analysis. These procedures will incorporate in
their design features that reflect our intention to minimize (our notion of)
expected computational effort.
III. FORMULATION OF THE SYNTACTIC ANALYSIS PROBLEM IN SYSTEMS OF NATURAL INFERENCE

A. Theorem-Proving Formulation
It takes only a slight reformulation of our syntactic analysis problem to recognize that it is a theorem-proving problem. We note that the CF-grammar G is a combinatorial system (a restricted semi-Thue system; see Davis [9]) which has V as its alphabet, P as its single axiom, ρ as the basis for its productions, and strings in V* as its words. Under this interpretation, for any string (word) φ in V*, a derivation sequence, D(P,φ|G), is a proof of φ in G, and φ is a theorem of G. A derivation D(P,φ|G') can be regarded as a justified proof of φ, where, in addition to the words that constitute steps of the proof, justification for each step is provided in the form of applications of rules in ρ that form valid productions. Furthermore, a structural description can be regarded as a structure of justifications that "holds together" the proof D(P,φ|G'). The strings of the language L(G) form that subset of the theorems of G whose component elements are taken exclusively from the sub-alphabet V_T. If x is in L(G), then x is a theorem of G, and a P-marker of x corresponds to the structure of justifications in a proof of x in G. Thus, the problem of constructing a P-marker of x in G' corresponds to the problem of constructing a justified proof of x in G and then extracting from it the underlying structure of justifications.

If we adopt Davis' broad notion of logic [9], which includes in a natural way the notion of a combinatorial system such as G, then we can also regard our problem as proof construction in a logic which has a single axiom P and whose rules of inference are the productions of G.
In recent years there has been considerable work on the mechanization of proof construction in various systems of symbolic logic; both the propositional calculus and the predicate calculus have received attention by several investigators. The propositional calculus has already provided a fruitful proving ground for the development of concepts and methods in an important class of heuristic problem-solving procedures [7,8]. In other work with heuristic procedures for proof construction in the propositional calculus we have found that a formulation of the proof problem in a system of natural inference has considerable advantage over formulations

* The natural inference approach, or the method of "subordinate proofs", was developed in the early 1930's by Gentzen [13] and Jaskowski [14]; it was used more recently by Fitch [15] and Nidditch [16]; and it was first used by Wang [11] for obtaining proofs by computer.
within conventional axiomatic systems. The natural inference approach permits
us to cast the proof construction problem in a broad framework of machine
problem solving, which is appropriate for the formulation of procedures of the
reduction type. In these procedures, the steps towards solution appear to
have striking similarity to the natural steps of reasoning observed in human
problem solving. Furthermore, the organization of the search for solution
required by such procedures appears to be well suited for computer implementation.
Since the syntactic analysis problem is essentially the problem of
constructing a proof in a given logic, we will attempt a natural inference
approach to the problem,* with the expectation that it will provide us the
desired unifying conceptual framework for syntactic analysis procedures.
We shall discuss next a class of natural inference systems for our
proof construction problem; these systems will provide the basis for a class
of augmented systems, to be discussed subsequently, in which our problem
solving procedures can be naturally formulated.
B . The Class of Natural Inference Systems
We shall associate with a CF-grammar G a class of seven natural inference systems of logic, which we denote by N_σ(G), where σ ∈ {t, ℓ, r, (t,ℓ), (t,r), (ℓ,r), (t,ℓ,r)}. We intend to designate by σ the "strategic approach" to proof construction associated with each of the systems; σ = t designates an approach from the "Top", σ = ℓ an approach from the "Left", σ = r an approach from the "Right", and the other values designate combinations of these approaches (we will shortly give an interpretation of these terms).
The well-formed formulas (w.f.f.s) of N_σ(G), which we denote by S, S_i (i = 0, 1, 2, ...), are expressions of the form

    α ⇒ θ,    (3.1)

where α ∈ V, θ is a string in V*, and the intended interpretation of the double arrow, ⇒, in (3.1) is as in the linguistic system (see Section II.A); namely, it denotes the reflexive and transitive relation "θ is derivable in G from α". The systems N_σ(G) have a single axiom in common; they differ, however, in their respective sets of rules of inference; they also have in common a tree form of proof, which is especially well suited for the uniform formulation
* We have recently found that Lambek [17] has proposed a formulation which has some conceptual similarity to ours; it is mainly geared, however, to problems of grammar and dictionary construction.
of various proof construction procedures. We have chosen these systems so that for every w.f.f. P ⇒ x which is valid in any N_σ(G), x is well formed in the CF-language L(G), and moreover a tree proof of P ⇒ x in any system is in a simple one-one correspondence with a P-marker of x in G. In Figure 4 we give a schematic representation of the proof construction situation that we face in N_σ(G). We find it suggestive to represent a w.f.f., say P ⇒ x, as a triangular figure with P at its apex (we call P the "Top" element) and the string x at its base (we call x the "base string", x^(1) the "Left" element, x^(ℓ) the "Right" element, and we represent the string as a jagged line). We regard such a triangle as an outline of a required configuration of replacement rules (each rule having the tree form shown in Figure 1) which is to bridge the space between the "Top" element and the "base string"; an appropriate configuration of this type is, in effect, a structural description tree. The meaning of the different strategic approaches that characterize the different systems N_σ(G) can be easily interpreted now in the light of the triangular description: an approach of a certain type indicates the corner (or combination of corners) of the triangle from which the attempts to construct the bridging configuration are made. While all the systems N_σ(G) are logically equivalent (they all have the same set of theorems), they differ in their approach to proof construction; this property is of no serious consequence for logic per se, but it is of primary significance for us, since it provides us with a rich variety of alternatives, on the basis of which efficient proof construction procedures can be formulated.

We shall now consider in detail the systems of natural inference N_σ(G).
Axiom Ω:* For all strings ψ, θ in V*, the w.f.f. ψ ⇒ θ is valid in all the systems N_σ(G) if ψ = θ.
The Natural Inference System N_t(G).

Rules of Inference {I}_t: The set {I}_t contains m rules, where each rule I_{t,i} ∈ {I}_t corresponds to a rule of replacement R_i: A_i → φ_i^(1) φ_i^(2) ... φ_i^(n_i), (1 ≤ i ≤ m), of ρ' in G' in the following way:
* This is an axiom schema, and so are the "axioms" of Section V. Since no confusion is likely to arise in our present context, we are referring to them as axioms.
Lemma 3.1. If a w.f.f. ψ ⇒ θ is valid in N_σ(G) by Ω, then θ is ψ-derivable in G.

Proof: By reflexivity of the relation ⇒.
[Figure 4 shows a triangle with P (the "Top" element) at the apex and the base string x along the base, with x^(1) the "Left" element and x^(ℓ) the "Right" element; arrows mark the "Top", "Left", and "Right" approaches.]

Fig. 4. Schematic representation of the alternative approaches to the construction of a proof of P ⇒ x in the systems N_σ(G).
I_{t,i}:* For any nonterminal X such that X = A_i, and for any (non-empty) string ω in V*, if there exist non-empty strings χ_1, χ_2, ..., χ_{n_i} in V* such that χ_1 χ_2 ... χ_{n_i} = ω, and all the w.f.f.s φ_i^(1) ⇒ χ_1, φ_i^(2) ⇒ χ_2, ..., φ_i^(n_i) ⇒ χ_{n_i} are valid in N_t(G), then X ⇒ ω is also valid in N_t(G).

Lemma 3.2(t). If a w.f.f. X ⇒ ω is valid in N_t(G) by a rule of inference I_{t,i} ∈ {I}_t, then ω is X-derivable in G.

Proof: Since X = A_i, then (by R_i) X can be replaced by the string φ_i = φ_i^(1) ... φ_i^(n_i); therefore, X ⇒ φ_i holds in G (φ_i is X-derivable in G). Now, if a φ_i-derivation of ω exists in G, an X-derivation of ω also exists in G (by the transitivity of the relation ⇒). If it is possible to partition the string ω in n_i adjacent substrings χ_1, ..., χ_{n_i} such that χ_1 is φ_i^(1)-derivable in G, ..., χ_{n_i} is φ_i^(n_i)-derivable in G, then it is clear that ω is φ_i-derivable in G (by definition of the notion of derivation in G). Hence, a rule of inference I_{t,i} is consistent in G, and the Lemma is proved. Note that by introducing the partition idea, we break down the initial question of derivability into n_i subsidiary questions of derivability, each of which is identical in form with the first, and moreover can be resolved independently of the others (thanks to the context-free nature of G).

In Figure 5 we give a graphic interpretation (in terms of the triangular descriptions introduced in Figure 4) of the situation which underlies the specification of I_{t,i}. The rule of inference considers a part of a (hypothetical) configuration that can possibly establish the required bridge over the "problematic" triangular space. Specifically, I_{t,i} considers the application of a replacement rule R_i, where the element R_i^(0) is "connected" to the application site X; we call this a "Top" application of R_i and denote it by R_i^t (the more comprehensive notation, (R_i^t, 1), introduced in (2.10), is not needed here since we have an unambiguous "Top" application site). The "Top" application of R_i fills part of the triangular space between X and ω, and it results in n_i new "Top" elements (the elements of φ_i) dangling atop the problematic space. Given a partition of the base string in n_i substrings χ_1, ..., χ_{n_i}, each of the dangling "Top" elements couples in an orderly way with the substring "below it", and n_i new problem-triangles are created. The rule I_{t,i} asserts that a structure bridging X and ω exists if there exist bridging structures for the n_i new triangles.
* A graphic interpretation of these rules of inference will be given shortly(see Figure 5), and a proof based on them is shown in Figure 6.
[Figure 5 shows a triangle with "Top" element X (where X = A_i) at the apex and the string ω at the base; a "Top" application of the replacement rule R_i (denoted R_i^t) fills the apex. The dashed lines outline problem-triangles that need to be filled by appropriate configurations of rule applications.]

Fig. 5. Graphic interpretation of the situation considered in the specification of a rule of inference I_{t,i} of N_t(G), which corresponds to a "Top" application of a rule of replacement R_i of G.
Logically, the "Top" application of R_i acts as a keystone holding together, and closing, an argument; from the point of view of proof construction, the rule application acts as a bridgehead from which further lines of construction are initiated.

The idea of partition, that enters in the formulation of {I}_t, is central in all our natural inference systems and in all the problem-solving procedures that are based on them. It enables us to reduce our problem into parts and also to benefit by our global view of the situation, so that necessary constraints that appear at subproblem boundaries can be used for a priori elimination of irrelevant lines of construction.
A partition of a string ω in n parts is an n-tuple [l_1, l_2, ..., l_n], which we denote by p_n(ω); l_i, 1 ≤ i ≤ n, stands for the length of the i'th substring in the partition, and in general,

    Σ_{1≤i≤n} l_i = ℓ(ω).    (3.2)

In the system N_t(G), the substring lengths satisfy the condition,

    l_i > 0 for 1 ≤ i ≤ n.    (3.3)
It is convenient for our purposes to introduce a tree representation of the situation which is relevant to the specification of a rule of inference I_{t,i}. Such a tree, which we call an inference tree, has the following form:

    [inference tree (3.4): the consequent w.f.f. node (X ⇒ ω) at the top, an inference node labeled by the pair (R_i^t, p_{n_i}(ω)), and the antecedent w.f.f. nodes (φ_i^(1) ⇒ χ_1), ..., (φ_i^(n_i) ⇒ χ_{n_i}) at the bottom]    (3.4)

where the substrings χ_1, χ_2, ..., χ_{n_i} are the n_i parts resulting from the partition p_{n_i}(ω) under consideration, and χ_1 χ_2 ... χ_{n_i} = ω. This labelled tree is made of,

(i) w.f.f. nodes, graphically shown as parentheses that enclose the labeling w.f.f.'s,
(ii) an inference node, denoted by a dot in the graph, and labeled by the pair of conditions under which the inference takes place, namely the specific application of a replacement rule, R_i^t, and the specific segmentation p_{n_i}(ω),
(iii) directed branches denoting the direction of the logical argument.

It should be noted that the convention shown in (3.4) on the specific ordering of the bottom w.f.f. nodes is significant for some of the tree manipulation processes that we will discuss later.
The assertion that a w.f.f. is true by the axiom Ω (we call such a w.f.f. a conclusive w.f.f.) can be represented in tree form as follows:

    [axiom link (3.5): a single directed branch from an axiom node Ω to the conclusive w.f.f. node]    (3.5)

This degenerate tree (it is a single branch, or a link) is called an axiom link; it has a single w.f.f. node (the conclusive w.f.f. node), one axiom node (denoted in the graph by Ω), and a directed branch showing the direction of the argument.

A proof in tree form in the system N_t(G) is a labeled tree which is made of inference trees and terminates exclusively with axiom links to Ω. More specifically, the proof tree has a w.f.f. node as its root; this node is the consequent node of an inference tree, the antecedent nodes of that inference tree are themselves either consequent nodes of other inference trees or conclusive nodes (linked to Ω), and so on, until all the tree terminals are Ω's.
A w.f.f. S is a theorem of N_t(G) if it labels the root node of a tree proof in N_t(G). We denote the tree proof of S in N_t(G) by D(S|N_t(G)).

As an example, we show in Figure 6 the proof in tree form of the w.f.f. P ⇒ abcd in the system N_t(G_1), where G_1 is the grammar of our illustrative example (given in (2.12)). This proof gives the solution in N_t(G_1) of the syntactic analysis problem for x = abcd which was previously shown, in its conventional linguistic form, in Figure 3. [The general question of correspondence between a proof in N_t(G) and a solution in the linguistic system will be discussed shortly.]

To "read" the proof D(P ⇒ abcd|N_t(G_1)) in Figure 6, we have to follow the information associated with the tree nodes in the direction of the arrows (from the terminals to the root). Each conclusive w.f.f. is valid in N_t(G_1) by the axiom Ω (thus, a ⇒ a, b ⇒ b, c ⇒ c and d ⇒ d are valid); each w.f.f. which has only valid w.f.f.'s as antecedents in a given inference tree
Fig. 6. The proof in tree form of P ⇒ abcd in the system N_t(G_1).

[Figure legend, the rules of the illustrative grammar G_1: R_1: P → AB; R_2: A → ABb; R_3: A → a; R_4: B → Bd; R_5: B → bc; the root w.f.f. is (P ⇒ abcd).]
(these are the bottom nodes of the inference tree) is also valid (thus, first A ⇒ a and B ⇒ bc, then B ⇒ bcd, and finally the candidate theorem P ⇒ abcd are valid in N_t(G_1)).

It should be noted that all the subtrees of a proof tree in the system N_t(G) are themselves proof trees in N_t(G), and therefore their root w.f.f.'s are theorems; thus, all the intermediate w.f.f.'s in a proof tree are theorems in N_t(G).

It is most likely that if we are faced with the problem of constructing a tree proof in N_t(G), rather than with the task of "reading" it (ascertaining whether it is valid), the direction of our attention would run against the arrows of the proof tree; we would start with the candidate theorem P ⇒ x; we would attempt to apply in reverse an inference tree where (P ⇒ x) would be the top node and one or more (new) w.f.f. nodes would be the bottom nodes; we would next focus attention on the w.f.f.'s in the bottom nodes; and we would treat them recursively in the same manner as the candidate theorem, until we would reach terminal w.f.f.'s that are directly recognizable as valid by the axiom of the system. Our general approach to automatic proof construction is to develop problem-solving procedures that proceed intelligently in the construction of a proof, in accordance with the "backward reasoning" scheme that we have just outlined. We shall discuss such procedures later, after completing the presentation of the logical framework which provides a basis for their formulation and study.
The Natural Inference System N_ℓ(G).

Rules of Inference {I}_ℓ: The set {I}_ℓ contains m rules, where each rule I_{ℓ,i} ∈ {I}_ℓ corresponds to a rule of replacement R_i: A_i → φ_i^(1) φ_i^(2) ... φ_i^(n_i), (1 ≤ i ≤ m), of ρ' in G' in the following way:

I_{ℓ,i}:* For any nonterminal X, and for any (non-empty) string ω = ω^(1) ω^(2) ... ω^(ℓ) such that ω^(1) = φ_i^(1), if there exist strings χ_1, χ_2, ..., χ_{n_i} in V* (where χ_1 may be empty but the remaining strings are non-empty) such that χ_2 χ_3 ... χ_{n_i} χ_1 = ω̄^(1) (where ω̄^(1) denotes the complement of ω^(1) relative to ω, i.e., ω̄^(1) = ω^(2) ω^(3) ... ω^(ℓ)), and all the w.f.f.'s X ⇒ A_i χ_1, φ_i^(2) ⇒ χ_2, ..., φ_i^(n_i) ⇒ χ_{n_i} are valid in N_ℓ(G), then X ⇒ ω is also valid in N_ℓ(G).

Lemma 3.2(ℓ). If a w.f.f. X ⇒ ω is valid in N_ℓ(G) by a rule of inference I_{ℓ,i} ∈ {I}_ℓ, then ω is X-derivable in G.

* A graphic interpretation of these rules of inference will be given shortly (see Figure 7), and a proof based on them is shown in Figure 8.
[Figure 7 shows a triangle with "Top" element X at the apex and the string ω at the base; a "Left" application of R_i (denoted R_i^ℓ) is attached at the "Left" element ω^(1). The dashed lines outline problem-triangles that need to be filled by appropriate configurations of rule applications.]

Fig. 7. Graphic interpretation of the situation considered in the specification of a rule of inference I_{ℓ,i} of N_ℓ(G), which corresponds to a "Left" application of a rule of replacement R_i of G.
Proof: Suppose that, for the given R_i, (i) it is possible to partition the substring ω̄^(1) in n_i adjacent substrings χ_2, ..., χ_{n_i}, χ_1 such that χ_2 is φ_i^(2)-derivable in G, ..., χ_{n_i} is φ_i^(n_i)-derivable in G, and (ii) A_i χ_1 is X-derivable in G. Since R_i is such that φ_i^(1) = ω^(1), then the string ω^(1) χ_2 χ_3 ... χ_{n_i} is A_i-derivable in G (by suppositions (i) and properties of ⇒). Therefore (by supposition (ii), and properties of ⇒), the string ω^(1) χ_2 χ_3 ... χ_{n_i} χ_1 is X-derivable in G, which (see definitions of string parts) proves the Lemma.
In Figure 7 we give a graphic interpretation of the situation which underlies the specification of a rule I_{ℓ,i}. In the present case, the rule of inference considers an application of a replacement rule R_i, where the element R_i^(1) is "connected" to the application site ω^(1), i.e., the "Left" element of ω; we call this a "Left" application of R_i, and we denote it by R_i^ℓ. For a given partition of the string ω̄^(1) in n_i substrings, n_i new problem-triangles are created, as shown in the figure. The rule I_{ℓ,i} asserts that a structure bridging X and ω exists if there exist bridging structures for the n_i new triangles.
In the system N_ℓ(G), a partition n-tuple [l_1, ..., l_n], which we denote by p_n(ω̄^(1)), specifies the lengths of a set of n substrings χ_1, ..., χ_n of ω̄^(1); the substring lengths satisfy the conditions,

    l_1 ≥ 0, l_i > 0 for 1 < i ≤ n, and Σ_{1≤i≤n} l_i = ℓ(ω) − 1.    (3.6)
As in the case of N_t(G), we introduce in N_ℓ(G) the notion of an inference tree, which represents the situation in which a rule of inference I_{ℓ,i} is specified. Such a tree has the following form:

    [inference tree (3.7): the consequent w.f.f. node (X ⇒ ω) at the top, an inference node labeled by the pair (R_i^ℓ, p_{n_i}(ω̄^(1))), and the antecedent w.f.f. nodes (φ_i^(2) ⇒ χ_2), ..., (φ_i^(n_i) ⇒ χ_{n_i}), (X ⇒ A_i χ_1) at the bottom]    (3.7)

where the substrings χ_1, χ_2, ..., χ_{n_i} are the n_i parts that result from the partition p_{n_i}(ω̄^(1)) under consideration, and χ_2 χ_3 ... χ_{n_i} χ_1 = ω̄^(1). The convention shown in (3.7) on the specific ordering of the bottom w.f.f.'s will be used subsequently in our tree manipulation procedures.
The notion of a tree proof is the same in N_ℓ(G) as in N_t(G); the inference trees that enter as building blocks in a tree proof in N_ℓ(G) are of the type shown in (3.7). A w.f.f. S is a theorem of N_ℓ(G) if it labels the root node of a tree proof in N_ℓ(G); we denote such a proof by D(S|N_ℓ(G)).
It is of interest to note some special cases of partition that occur in N_ℓ(G) proofs, and that appear in the proof of Figure 8. (i) If a unary replacement rule R_i is the basis for an inference tree whose root node is (X ⇒ ω), then the partition is degenerate; since n_i = 1, the partition results in a single substring χ_1 with l_1 = ℓ(ω) − 1 (in Figure 8 this is the case with the rule application R_3^ℓ on (P ⇒ abcd)). (ii) If the substring χ_1 is empty, then the segmentation has the form [0, l_2, ..., l_n] (since l_1 = 0); this can occur if, for a rule of replacement R_i: A_i → φ_i, which is considered for a w.f.f. X ⇒ ω, we have both X = A_i and ω̄^(1) derivable in G from the remainder of φ_i (in the proof of Figure 8 this is the case with the rule applications R_1^ℓ on (P ⇒ Abcd) and R_4^ℓ on (B ⇒ Bd)).
The Natural Inference System N_r(G).

Rules of Inference {I}_r: The set {I}_r contains m rules, where each rule I_{r,i} ∈ {I}_r corresponds to a rule of replacement R_i: A_i → φ_i^(1) ... φ_i^(n_i−1) φ_i^(n_i), (1 ≤ i ≤ m), of ρ' in G' in the following way:

I_{r,i}:* For any nonterminal X, and for any (non-empty) string ω = ω^(1) ω^(2) ... ω^(ℓ) such that ω^(ℓ) = φ_i^(n_i), if there exist strings χ_1, χ_2, ..., χ_{n_i} in V* (where χ_1 may be empty but the remaining n_i−1 strings may not) such that χ_1 χ_{n_i} χ_{n_i−1} ... χ_2 = ω̄^(ℓ) (where ω̄^(ℓ) denotes the complement of ω^(ℓ) relative to ω, i.e., ω̄^(ℓ) = ω^(1) ω^(2) ... ω^(ℓ−1)), and all the w.f.f.'s X ⇒ χ_1 A_i, φ_i^(n_i−1) ⇒ χ_2, ..., φ_i^(1) ⇒ χ_{n_i} are valid in N_r(G), then X ⇒ ω is also valid in N_r(G).
Lemma 3.2(r). If a w.f.f. X ⇒ ω is valid in N_r(G) by a rule of inference I_{r,i} ∈ {I}_r, then ω is X-derivable in G.

Proof: Similar to that of Lemma 3.2(ℓ) (with appropriate adjustment for the change from "Left" to "Right").
In Figure 9 we show the situation which underlies the specification of a rule I_{r,i}. Here, I_{r,i} considers an application of the replacement rule
* A graphic interpretation of these rules of inference will be given shortly(see Figure 9), and a proof based on them is shown in Figure 10.
Fig. 8. The proof in tree form of P ⇒ abcd in the system N_ℓ(G_1).
[Figure 9 shows a triangle with "Top" element X at the apex and the string ω at the base; a "Right" application of R_i (denoted R_i^r) is attached at the "Right" element ω^(ℓ) of ω, where ω^(ℓ) = φ_i^(n_i). The dashed lines outline problem-triangles that need to be filled by appropriate configurations of rule applications.]

Fig. 9. Graphic interpretation of the situation considered in the specification of a rule of inference I_{r,i} of N_r(G), which corresponds to a "Right" application of a rule of replacement R_i of G.
R_i, where the element R_i^(n_i) is "connected" to the application site ω^(ℓ), i.e., the "Right" element of ω; we call this a "Right" application of R_i, and we denote it by R_i^r. For a given partition of the string ω̄^(ℓ) in n_i substrings, n_i new problem-triangles are created; according to I_{r,i}, a structure bridging X and ω exists if there exist bridging structures for the n_i new triangles.

In the system N_r(G), a partition n-tuple [l_1, ..., l_n], which we denote by p_n(ω̄^(ℓ)), specifies the lengths of a set of n substrings χ_1, ..., χ_n of ω̄^(ℓ); these lengths satisfy the conditions,

    l_1 ≥ 0, l_i > 0 for 1 < i ≤ n, and Σ_{1≤i≤n} l_i = ℓ(ω) − 1.    (3.8)
An inference tree in N_r(G) has the following form:

    [inference tree (3.9): the consequent w.f.f. node (X ⇒ ω) at the top, an inference node labeled by the pair (R_i^r, p_{n_i}(ω̄^(ℓ))), and the antecedent w.f.f. nodes (X ⇒ χ_1 A_i), (φ_i^(n_i−1) ⇒ χ_2), ..., (φ_i^(1) ⇒ χ_{n_i}) at the bottom]    (3.9)

where the substrings χ_1, ..., χ_{n_i} are the n_i parts that result from the partition p_{n_i}(ω̄^(ℓ)) under consideration, and χ_1 χ_{n_i} χ_{n_i−1} ... χ_2 = ω̄^(ℓ). Again, the convention shown here on the specific ordering of the bottom w.f.f.'s will be observed in the subsequent tree representations.

The definitions of tree proof and theoremhood that were made for the systems N_t(G) and N_ℓ(G) carry over to N_r(G) in the obvious way; of course, the tree proof of a w.f.f. S in N_r(G), which we denote by D(S|N_r(G)), is constructed with inference trees of the type shown in (3.9). As an example, we show in Figure 10 the tree proof in N_r(G_1) of our illustrative problem P ⇒ abcd (tree proofs in N_t(G_1) and N_ℓ(G_1) are shown in Figures 6 and 8, respectively). [It is of interest to note in the proof of Figure 10 examples of special cases of partition that occur in N_r(G); i.e., unary replacement rule cases, and l_1 = 0 cases, as well as combinations of these cases, e.g. the rule application R_3^r on (A ⇒ a).]
Fig. 10. The proof in tree form of P ⇒ abcd in the system N_r(G_1).
The Mixed Natural Inference Systems N_{σmixed}(G), where σmixed ∈ {(t,ℓ), (t,r), (ℓ,r), (t,ℓ,r)}.

Rules of Inference {I}_{σmixed}:

    {I}_{(t,ℓ)} = {I}_t ∪ {I}_ℓ, ..., {I}_{(t,ℓ,r)} = {I}_t ∪ {I}_ℓ ∪ {I}_r.

Thus, the rules of inference of a mixed system, which offers a combination of strategic approaches to the proof problem, are the union of the rules of inference of the "pure" systems whose approaches may be used in the mixed system.

A mixed system offers the convenience of constructing proofs where at each step of construction one can decide what form of inference to use ("Top", "Left", or "Right" inference, according to the choices available in the system), i.e., at what site of a candidate w.f.f. to attempt a rule application in order to advance the argument. The mixed system extends the problem-solver's flexibility of approach, a fact of great potential significance for the efficiency of the proof construction procedure.

Lemma 3.2(σmixed). If a w.f.f. X ⇒ ω is valid in N_{σmixed}(G) by a rule of inference I_{σmixed,i} ∈ {I}_{σmixed}, then ω is X-derivable in G.
Proof: By Lemmas 3.2(t), 3.2(ℓ), 3.2(r).
A proof in tree form in a mixed system of natural inference N_{σmixed}(G) is constructed of inference trees that may be taken from any of its component "pure" systems; we denote the tree proof of S in N_{σmixed}(G) by D(S|N_{σmixed}(G)). As an example, we show in Figure 11 one of the possible tree proofs of our problem P ⇒ abcd in N_{(t,ℓ,r)}(G_1) (the proofs in N_t(G_1), N_ℓ(G_1) and N_r(G_1) are shown in Figures 6, 8, and 10, respectively).
Logical Consistency Theorem. If a w.f.f. P ⇒ x is a theorem in any of the systems of natural inference N_σ(G), then x is well-formed in the CF-language L(G).

Proof: The string x is well-formed in L(G) if it is P-derivable in G. By Lemma 3.1 and the various versions of Lemma 3.2, we know that if a w.f.f. X ⇒ ω is valid in any of the systems N_σ(G) by the axiom Ω or by any of the
rules of inference in N_σ(G), then ω is X-derivable in G. Because of this, and since P ⇒ x is, by hypothesis, a theorem of N_σ(G) (i.e., it is at the root node of a proof tree in N_σ(G)), it follows (by the definition of a proof tree) that x is P-derivable in G. This proves the theorem.
Fig. 11. One of the tree form proofs of P ⇒ abcd in the mixed system N_(t,ℓ,r)(G_1). [Rules of G_1: R_1: P → AB; R_2: A → aBb; R_3: A → a; R_4: B → Bd; R_5: B → bc.]
IV. CONVERSION FROM PROOF TREES TO DESCRIPTION TREES
The substance of this section will be presented as a separate technical paper. The main result shows how to construct procedures for converting tree proofs in the seven systems N_σ(G) into P-markers of G. This is stated as:

Structural Consistency Theorem. If a w.f.f. P ⇒ x is a theorem in any of the systems of natural inference N_σ(G), then a P-marker tree of x in G is in one-one correspondence with the tree proof of P ⇒ x. Moreover, there exist effective procedures for obtaining the P-marker from the tree proof.
V. NATURAL DECISION SYSTEMS FOR SYNTACTIC ANALYSIS
We shall now extend our natural inference systems N_σ(G) into systems 𝒩_σ(G) that have richer inferential mechanisms and that are especially well suited for the formulation of natural decision procedures; we call the systems 𝒩_σ(G) natural decision systems.
We associate with each w.f.f. S in 𝒩(G) a value, v_k(S), (k = 0, 1, 2, ...), whose intended meaning is close to that of "truth value" in logic; more specifically, we intend it to reflect the state of knowledge of a problem solver at a given time, k, about the validity (theoremhood) of S in a given decision system. We find that this notion of value is extremely useful for the formulation of decision procedures and problem-solving procedures in general, and we believe that it will prove considerably fruitful for the theoretical study of the dynamics of knowledge and uncertainty in traces (or trajectories) of such procedures when they are observed as physical events. We assume that v_k(S) takes values from the set {1, u, 0}; we intend v_k(S) = 1 to mean that the problem solver knows at time k that S is valid in the given system of decision 𝒩_σ(G), i.e., S is provable in the system; v_k(S) = 0 is intended to mean that the problem solver knows that S is not valid in 𝒩_σ(G), i.e., S is not provable in 𝒩_σ(G); if the value of S is u, then we mean that the problem solver is uncertain about the validity of S in 𝒩_σ(G). The value "u" should be interpreted numerically as a number between 0 and 1. Since we are not considering in this paper notions such as "forgetting" or "error in problem solving", we will agree that if a w.f.f. has assumed at any time k a value 1 or 0, then its value will remain stable for all times after k. In other words, 1 and 0 are stable values. On the other hand, "u" is an unstable value which may change with time; moreover (as we shall see shortly), it is the intended function of a problem-solving procedure to attempt to "stabilize" the "u" value of a w.f.f. under consideration, by changing it to one of the two stable values. It may be desirable in some problem-solving situations to consider more than three values for v_k(S); however, we find this set sufficient for our present purposes.
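The value bookkeeping just described can be put in present-day code. The following minimal sketch is an illustration only; the numeric encoding 1, 0.5, 0 for the values 1, u, 0 and the function names are assumptions of the sketch, not notation from the report.

```python
# Three-valued "state of knowledge" values, encoded numerically so that
# u lies between the two stable values: 1 = valid, 0 = not valid, u = 0.5.
VALID, UNCERTAIN, INVALID = 1.0, 0.5, 0.0

def update_value(old, new):
    """Update the value of a w.f.f., enforcing the stability convention:
    the stable values 0 and 1, once reached, never change."""
    if old in (VALID, INVALID):
        return old          # 1 and 0 are stable
    return new              # u is unstable and may be revised

# A value trace v_k(S) over times k = 0, 1, 2, ...: once the w.f.f.
# becomes valid at some k, it stays valid afterwards.
trace = [UNCERTAIN]
for observed in (UNCERTAIN, VALID, UNCERTAIN):
    trace.append(update_value(trace[-1], observed))
# trace is now [0.5, 0.5, 1.0, 1.0]
```

The numeric ordering 0 < u < 1 is what later allows the inference mappings to be computed with Min and Max.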
We shall now introduce certain elementary notions that will enable us to formulate an effective basis for refutations (i.e., for recognizing that a given w.f.f. is not a theorem in a system 𝒩_σ(G)). Consider the set ℒ_X(G) of all terminal strings of a nonterminal X in G; this set was introduced in (2.5). Given a w.f.f. X ⇒ ω, where ω is in V_T, it is valid to conclude that ω is not X-derivable in G if we know that ω ∉ ℒ_X(G). Now, if we can state "simple" necessary conditions for membership in ℒ_X(G) (in the sense that they can be readily tested), then we can test ω with respect to these conditions, and if ω fails one of these tests, we can conclude that ω is not X-derivable in G.

A useful property of ℒ_X(G) is its support, s(X), which was defined in (2.6). Clearly, if l(ω) < s(X), then ω ∉ ℒ_X(G). There is a class of properties, similar to "support", that can be formulated in terms of lengths of strings in ℒ_X(G), and that provide ways for testing whether a given string is refutable. Such tests examine whether the length of a candidate string falls in a "forbidden region" of string lengths in ℒ_X(G); if it does, then we can conclude that the string is not X-derivable in G.
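The support test lends itself to direct computation. The sketch below is an illustration only (the dict encoding of rules and the function names are assumptions, and the grammar shown is merely in the spirit of the report's illustrative example): it computes s(X) as the length of a shortest terminal string derivable from X, by fixpoint iteration, and then applies the refutation test l(ω) < s(X).

```python
# Grammar rules as {nonterminal: [right-hand sides]}, right sides as tuples.
RULES = {
    "P": [("A", "B")],
    "A": [("a", "B", "b"), ("a",)],
    "B": [("B", "d"), ("b", "c")],
}

def supports(rules):
    """Compute s(X) = length of a shortest terminal string derivable
    from X, by iterating the rule system to a fixpoint."""
    INF = float("inf")
    s = {x: INF for x in rules}
    length = lambda sym: 1 if sym not in rules else s[sym]  # terminals count 1
    changed = True
    while changed:
        changed = False
        for x, rhss in rules.items():
            for rhs in rhss:
                cand = sum(length(sym) for sym in rhs)
                if cand < s[x]:
                    s[x], changed = cand, True
    return s

def refuted_by_support(x, omega, s):
    """Refutation test: if l(omega) < s(X), omega is not X-derivable."""
    return len(omega) < s[x]
```

For the rules above the fixpoint gives s(A) = 1, s(B) = 2, s(P) = 3, so a candidate string of length 2 is refuted for P at once.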
The set ℒ_X(G) is a subset of the set Σ_X(G), which includes all strings in V that are derivable in G from X, i.e.,

(5.1)    Σ_X(G) = {φ | φ is in V and φ is X-derivable in G} .

Note that X is also a member of Σ_X(G). Now, if it is known that φ ∉ Σ_X(G), then it can be validly inferred that φ is not X-derivable in G. We say that a string φ in V is X-derivable from the left in G if there are strings in Σ_X(G) whose first component is φ^(1). We denote this relation by X ⇒_ℓ φ. If a candidate string does not satisfy derivability from the left, then it is not in Σ_X(G). The relation of X-derivability from the right is similar, and we denote it by X ⇒_r φ. Left and right derivabilities can be tested with relative ease in the graph of the grammar, Γ(G′) (an example of such a graph for our illustrative problem is given in Figure 2). For a given X and φ, X ⇒_ℓ φ holds if it is possible to find in Γ(G′) a path from φ^(1) to X (by going against the arrows), such that all the branches that lie on the path and that leave rule nodes (here we refer to the direction of the arrows in Γ(G′)) are labeled 1. Similarly, X ⇒_r φ holds if it is possible to find in Γ(G′) a path from the last component of φ to X, such that all the branches that lie on the path and that leave rule nodes are labeled with the last position of the corresponding rule.
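In present-day terms, left derivability can be tested by a reflexive-transitive closure over the first components of rule right sides (and right derivability, symmetrically, over the last components). The sketch below is an illustrative rendering of that idea, not the report's graph algorithm; the rule encoding and names are assumptions.

```python
# Rules as {nonterminal: [right-hand sides]} (tuples), as in the earlier sketch.
RULES = {
    "P": [("A", "B")],
    "A": [("a", "B", "b"), ("a",)],
    "B": [("B", "d"), ("b", "c")],
}

def corner_closure(rules, index):
    """Reflexive-transitive closure of the 'first component' (index=0)
    or 'last component' (index=-1) relation between symbols."""
    corners = {x: {x} for x in rules}
    changed = True
    while changed:
        changed = False
        for x, rhss in rules.items():
            for rhs in rhss:
                y = rhs[index]
                new = corners.get(y, set()) | {y}
                if not new <= corners[x]:
                    corners[x] |= new
                    changed = True
    return corners

LEFT = corner_closure(RULES, 0)    # possible first elements of strings in Sigma_X
RIGHT = corner_closure(RULES, -1)  # possible last elements of strings in Sigma_X

def left_derivable(x, phi):
    """Test X =>_l phi: can some string derivable from X begin with
    the same element as phi?"""
    return phi[0] in LEFT[x]
```

A candidate w.f.f. X ⇒ φ can thus be refuted in time proportional to the closure lookup, without any search.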
Given a w.f.f. X ⇒ β, where X ∈ V_N and β ∈ V, it is important for our decision procedures to know whether the element β is X-derivable in G. Note that if β is derivable from X, then there must exist in P a finite sequence of unary replacement rules (possibly a single unary rule) that can take X to β. This can be easily tested in the graph Γ(G′).
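The unary-rule test reduces to reachability over the unary rules alone. A small illustrative sketch (the names and the pair encoding of unary rules are assumptions):

```python
# Unary replacement rules only, as (left, right) pairs; an illustrative set.
UNARY_RULES = [("X", "A"), ("A", "B"), ("C", "D")]

def unary_derivable(x, beta, unary_rules):
    """Test whether a single element beta is reachable from x by a
    (possibly empty) sequence of unary replacement rules."""
    seen, frontier = {x}, [x]
    while frontier:
        cur = frontier.pop()
        for left, right in unary_rules:
            if left == cur and right not in seen:
                seen.add(right)
                frontier.append(right)
    return beta in seen
```

Since each element enters the frontier at most once, the test terminates even if the unary rules contain cycles.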
In general, it is extremely important for our proof construction procedures to have a set of easily testable necessary conditions for strings in Σ_X(G), so that a candidate w.f.f. X ⇒ φ which is not valid can be refuted early; this minimizes the expected expenditure of search effort in the direction of "dead ends". We are facing here a classical problem of pattern recognition, namely that of selecting a set of features on the basis of which a decision about class membership can be efficiently made. (In our present case, the classes of interest are Σ_X(G), for all X in V_N.) The problem is not only to find features from which we can make logically valid inferences. The features should be such that they can be tested with a relatively small amount of "computational effort". This implies (among other things) that appropriate representations of the grammar should be available, in the sense that the grammar features that are to be used for decision should be easily testable in these representations. [We find the graph Γ(G′) (with the additional property that at each nonterminal node X, the "support" of X is also available) to be a satisfactory representation of the grammar, with respect to the elementary feature tests that we are using here.] The problems that exist in this general area need considerable further study; the recent work on "question-answering" systems is relevant here.
We shall use for the systems 𝒩_σ(G) a small set of refutation conditions which is sufficient for the formulation of the axioms of 𝒩_σ(G).

Axioms of 𝒩_σ(G) (for all σ):

Validation Axiom 𝒜: Same as in N_σ(G).

Refutation Axioms:

𝒜_{0,1}: For all strings x in V_T and all strings φ in V, if x ≠ φ, then the w.f.f. x ⇒ φ is not valid in 𝒩_σ(G).

𝒜_{0,2}: For all X ∈ V_N and all strings x in V_T, if l(x) < s(X), then X ⇒ x is not valid in 𝒩_σ(G).

𝒜_{0,3}: For all X ∈ V_N and all strings φ in V, if φ is not X-derivable from the left in G, or φ is not X-derivable from the right in G, then X ⇒ φ is not valid in 𝒩_σ(G).

𝒜_{0,4}: For all X ∈ V_N and all β ∈ V, if β is not X-derivable in G (by a sequence of applications of unary replacement rules), then X ⇒ β is not valid in 𝒩_σ(G).
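Taken together, the axioms act as a partial assignment of values: validation yields 1, refutation yields 0, and everything else remains uncertain. A minimal sketch of this dispatch follows (an illustration only: the helper predicates stand for the tests described above, and we assume for the sketch that the validation axiom validates identity w.f.f.'s x ⇒ x on terminal strings).

```python
VALID, UNCERTAIN, INVALID = 1.0, 0.5, 0.0

def axiom_value(lhs, omega, is_terminal_string, support,
                left_ok, right_ok, unary_ok):
    """Assign a value to the w.f.f. lhs => omega: 1 if validated,
    0 if refuted by one of A01..A04, u otherwise."""
    if is_terminal_string(lhs):
        # A (validation, assumed here to cover x => x); A01 refutes
        # x => phi with x != phi, since no rule applies on a terminal string.
        return VALID if lhs == omega else INVALID
    if is_terminal_string(omega) and len(omega) < support(lhs):
        return INVALID                        # A02: support test
    if not (left_ok(lhs, omega) and right_ok(lhs, omega)):
        return INVALID                        # A03: left/right derivability
    if len(omega) == 1 and not unary_ok(lhs, omega[0]):
        return INVALID                        # A04: unary-rule derivability
    return UNCERTAIN
```

Any additional refutation condition of the kind discussed above would simply become one more early-return branch.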
Lemma 5.1. For all a ∈ V and all strings φ in V, if a w.f.f. a ⇒ φ is refuted in 𝒩_σ(G) by 𝒜_{0,1}, 𝒜_{0,2}, 𝒜_{0,3}, or 𝒜_{0,4}, then φ is not a-derivable in G.

Proof: If 𝒜_{0,1} applies, then there is no rule of replacement which is applicable on a (since a is, by supposition, in V_T); hence, since a ≠ φ, no derivation exists in G from a to φ. If 𝒜_{0,2} applies, then (as discussed previously) φ ∉ ℒ_a(G); therefore φ is not a-derivable in G. If 𝒜_{0,3} applies, then (from our previous discussion) φ ∉ Σ_a(G); therefore φ is not a-derivable in G. Finally, the axiom 𝒜_{0,4} is clearly valid in G. This proves the lemma.
We can regard any validation axiom as assigning the value 1 to any
w.f.f. on which it applies. Also, any refutation axiom assigns the value 0 to
any w.f.f. on which it applies.
It is possible, and sometimes practically desirable, to augment the
set of axioms that we are using in 𝒩_σ(G), both for validation and refutation.
We have already discussed the problem of selecting additional conditions for
refutation; such conditions can be used to augment the set of refutation axioms.
A natural way to augment the set of validation axioms is to introduce for each
replacement rule R: A → φ an axiom which validates the w.f.f. A ⇒ φ. Another
natural way of increasing the efficiency of a decision procedure is to utilize
in a dynamic way previously proved theorems. In the course of executing
decision procedures, it happens sometimes that a w.f.f. has been proved valid
in one part of a search tree at time k (a search tree provides a trace of the
decision procedure up to a certain time - it will be discussed shortly), while
at a later time and in a different part of the search tree there appears the
same w.f.f. with a "v" value associated to it. Under these conditions, it is
useful to be able to consider the previously proved w.f.f. as a theorem which
provides validation for the new w.f.f., and hence to avoid the need for repeating the proof of the new w.f.f. A decision system with such a variable set of available theorems, while logically equivalent to a system without theorems that is based on the initial set of axioms, may provide the basis for an extremely
efficient decision procedure. In general, however, the optimal size of the set
of available valid w.f.f.'s and the mode of its growth depends on a variety of
pragmatic considerations that are related to the available means for storage
and retrieval.

Suppose that a rule application of type τ, where τ ∈ {t, ℓ, r}, is considered for a w.f.f. S_0 = (X ⇒ ω) whose value is uncertain. For each of the
three types of rule application, there exists a set of relevant replacement rules relative to S_0, which we denote by {R}_{S_0}^τ. The set of relevant replacement rules is such that if S_0 has a proof with a rule application R^τ at S_0, then the rule R must be among the relevant rules of the set {R}_{S_0}^τ. The set of relevant replacement rules is a subset of the set of applicable rules (of the given application type) on S_0. For example, if the "Top" approach on S_0 is considered, then {R}_{S_0}^t ⊆ {R | R^(0) = X}. The members of the set {R}_{S_0}^t are those members of the set of applicable rules {R | R^(0) = X} that satisfy certain necessary conditions of relevance. An example of a condition of relevance for a rule R: A → φ which is a candidate for a "Top" application on S_0 is that both the relations φ^(1) ⇒_ℓ ω and φ^(n) ⇒_r ω should be satisfied, where φ^(1) and φ^(n) are the first and last components of φ. The problem of relevance conditions will be discussed in more detail later. It suffices at this point to indicate that, for each w.f.f., we can effectively obtain the set of all relevant replacement rules that are applicable on the w.f.f. from the "Top", the "Left", or the "Right".

For the given S_0 and a relevant replacement rule R ∈ {R}_{S_0}^τ, there exists a set of relevant partitions, which we denote by {p_n(ω)}_R^τ. The set of relevant partitions is such that if S_0 has a tree proof with a rule application R^τ at S_0 and a given partition p_n(ω) associated with it, then p_n(ω) must be among the relevant partitions in {p_n(ω)}_R^τ. The set of relevant partitions is a subset of the set of all possible partitions of a string of N elements (where l(ω) = N) into n parts, where the restriction is based on certain necessary conditions of relevance, to be discussed later. Again, it suffices to indicate at present that the set of relevant partitions can be effectively enumerated.
We can regard a rule of inference I_j in one of the systems of natural inference N_σ(G) as the partial specification for a mapping between the values of a set of w.f.f.'s (the antecedents in the specification of I_j) and the value of another w.f.f. (the consequent), given that the constituent elements of antecedent and consequent w.f.f.'s are related in a specified manner. We shall define next the notion of an inference mapping, η_{R,p}, of which such a rule of inference is a partial specification, with the intention of augmenting the inferences that are possible in our systems of natural decision, in a way which would be advantageous for work with decision procedures and their associated problem-solving procedures.

Consider a w.f.f. S_0 = (X ⇒ ω), where a given application of a relevant rule of replacement from {R}_{S_0}^τ takes place; let R^τ denote this application and suppose that R labels the rule A → φ. For a given relevant partition
p_n(ω) ∈ {p_n(ω)}_R^τ, we have n antecedent w.f.f.'s S_j, 1 ≤ j ≤ n, whose values at time k we denote by v_k(S_j). [The nature of the antecedent w.f.f.'s has been discussed in detail in Section III for each of the three "pure" approaches to rule application.] In reference to this situation, we define the following inference mapping, η_{R,p}:

(5.2)    η_{R,p}: v_k(S_0 | R^τ, p_n(ω)) = Min_{1≤j≤n} v_k(S_j) .

We call the left side of η_{R,p} the conditional value of S_0 at time k, given R^τ and p_n(ω).
Lemma 5.2. The inferences obtained via the mapping η_{R,p} in a system 𝒩_τ(G), where τ ∈ {t, ℓ, r}, are valid in the linguistic system G.

Proof: Let us consider in turn the three possible valuations of the conditional value defined by η_{R,p}:

(i) If, for given R^τ and p_n(ω), we have v_k(S_j) = 1 for all j, 1 ≤ j ≤ n, then the conditional value of S_0 at time k is 1, according to η_{R,p}; this agrees with our definition of the rules of inference I_t, I_ℓ, I_r that were shown to be valid in G (see Lemmas 3.2). Note that the case just discussed covers precisely the partial specification of η_{R,p} defined by a rule of inference I_τ; the next two cases refer to new inferences introduced by η_{R,p}.

(ii) If v_k(S_j) = 0 for some j, then the conditional value of S_0 at time k (given a pair R^τ and p_n(ω)) is 0, according to η_{R,p}. This inference certainly satisfies our intended interpretation of ⇒ in G. For if, for the given partition of ω, one of the antecedent w.f.f.'s does not hold in G, then clearly the w.f.f. S_0 does not hold in G either. If 𝒩_τ(G) is to be consistent with G, then S_0 should not be a theorem of 𝒩_τ(G) under the given conditions, which is precisely what is inferred in η_{R,p}.

(iii) If for no j, 1 ≤ j ≤ n, we have v_k(S_j) = 0, but for some j we have v_k(S_j) = u, then the conditional value of S_0 at time k (given a pair R^τ and p_n(ω)) is u (uncertain), according to η_{R,p}. In other words, for a given partition of ω for which it is known at time k that some (or none) of the antecedent formulas hold in G, but it is uncertain whether some (or all) of the formulas hold in G, it is uncertain at that
time whether an X-derivation of ω in G exists on the basis of the given rule application and the given partition. It is therefore reasonable, and not inconsistent with G, to consider the theoremhood of S_0 uncertain at time k, which is what is inferred from η_{R,p}.

It should be noted that the notion of the inference mapping η_{R,p} is closely related to that of "truth function", a notion of extreme usefulness for reducing logical problems to algebraic and computational forms. Specifically, η_{R,p} corresponds to conjunction in a 3-valued logic of the type proposed by Post [18].
Next, we introduce two compound inference mappings that are closely related to η_{R,p}; we denote them by η_R and η, respectively. The mapping η_R is defined as follows:

(5.3)    η_R: v_k(S_0 | R^τ, {p_n(ω)}_R^τ) = Max_{{p_n(ω)}_R^τ} v_k(S_0 | R^τ, p_n(ω)) .

We call the left side of η_R the conditional value of S_0 at time k given R^τ and all the relevant partitions in {p_n(ω)}_R^τ. In our truth functional interpretation of inference mappings, η_R corresponds to a disjunction of conjunctions in a 3-valued Post logic. This "disjunctive normal form" contains as clauses the conditional values that are associated with all the relevant partitions of ω for a given R^τ.
Lemma 5.3. The inferences obtained via the mapping η_R in a system 𝒩_τ(G), (τ ∈ {t, ℓ, r}), are valid in the linguistic system G.

Proof: Consider the three possible valuations of the conditional value defined by η_R.

(i) The mapping η_R assigns to S_0 a conditional value of 1 (given an R^τ) if there exists at least one relevant partition for the given R^τ where v_k(S_j) = 1 for all j, 1 ≤ j ≤ n. This is certainly consistent with the rules of inference in the systems N_τ(G), and hence also with G.

(ii) η_R assigns to S_0 a conditional value of 0 (given an R^τ) if, for all the possible relevant partitions associated with the rule application R^τ, the conditional value v_k(S_0 | R^τ, p_n(ω)) is 0, i.e., if in each relevant partition we have v_k(S_j) = 0 for some j, 1 ≤ j ≤ n. It is clear that under these conditions no derivation in G is possible that utilizes R^τ at S_0. Hence, for consistency with G, S_0 should not be a
theorem of 𝒩_τ(G) under the given condition, which is what η_R asserts.

[Informally, if we consider the graphical interpretation of the situation, as presented in Figures 5, 7, and 9, we are dealing here with a case where, after a "bridgehead" is constructed at the application site of S_0 (by "connecting" there in the appropriate way the rule R^τ), we know at time k that it is impossible that all the remaining parts of the "bridge" are constructable. Once this fact is known, it is clear that we can say at that time that no bridging structure is possible between the apex X and the base string ω which starts at the given "bridgehead".]
(iii) The conditional value of S_0 (given R^τ) is uncertain at time k if for no possible partition associated with R^τ there exists a conditional value v_k(S_0 | R^τ, p_n(ω)) = 1, but it may be the case that some (but not all) conditional values are known to be 0 and some (at least one) are still uncertain. Since there remain uncertain conditional values, it is uncertain whether further search will not reveal one to be a 1 or all to be a 0, at which time the cases (i) or (ii) will hold, respectively. Therefore, it is not inconsistent with G, and it is in agreement with our intended interpretation of u, to conclude that the Lemma is valid in this case also.
The second compound inference mapping, η, is defined next:

(5.4)    η: v_k(S_0 | {R}_{S_0}^τ, {p_n(ω)}) = Max_{{R}_{S_0}^τ} v_k(S_0 | R^τ, {p_n(ω)}_R^τ) = Max_{{R}_{S_0}^τ} Max_{{p_n(ω)}_R^τ} Min_{1≤j≤n} v_k(S_j) .

We call the left side of η the conditional value of S_0 at time k given all the relevant applications (of given type τ) of rules of replacement and all the relevant partitions that are associated with each relevant rule of replacement.

Going back to our truth functional interpretation of inference mappings, η corresponds to a "disjunctive normal form" in a 3-valued Post logic, and its clauses are the conditional values that correspond to all the applications of relevant rules of replacement on S_0. We can now express the (absolute) value of S_0 in terms of the values of all the antecedent w.f.f.'s that are obtained from the inference mappings that we have defined.
Lemma 5.4. In a system 𝒩_τ(G), (τ ∈ {t, ℓ, r}), the (absolute) value of a nonconclusive w.f.f. S_0 = (X ⇒ ω) at time k is given by

v_k(S_0) = Max_{{R}_{S_0}^τ} Max_{{p_n(ω)}_R^τ} Min_{1≤j≤n} v_k(S_j) ,

where the w.f.f.'s S_j, 1 ≤ j ≤ n, denote the antecedents of S_0 relative to a relevant R^τ and a relevant p_n(ω); this value mapping is consistent with G.

Proof: Similar to proof of Lemma 5.3.
Let us introduce next a tree representation of the situation which underlies the specification of the inference mappings discussed above. Such a tree, which we call an inference mapping tree (in analogy to the inference trees defined in Section III), is shown in Figure 12. The role of an inference node in an inference tree (see for example Figure 7) is split here in two. Rule application nodes and partition nodes are shown separately; a sequence of branches going through R^τ and then through p_n(ω) in the inference mapping tree is equivalent to a branch going through a node R^τ, p_n(ω) in an inference tree. In an inference mapping tree, for a given direction of approach τ, there are branches descending from the w.f.f. node S_0 to all the relevant rule application nodes. For each relevant rule application node, say R^τ, there are branches descending to all the relevant partition nodes, and for each of these nodes, say p_n(ω), there are n branches that descend to the set of n antecedent w.f.f.'s that relate to S_0 via R^τ and p_n(ω). Axiom links are represented in the systems 𝒩_σ(G) in the same way as in the systems N_σ(G) (see Figure 8).
We can regard an inference mapping tree as the logical diagram of a circuit where values can be processed and transmitted. The inputs are the values associated with a set of antecedent w.f.f. nodes such as S_1, ..., S_n. These inputs are processed at the partition node p_n(ω) by η_{R,p} (see Figure 12); in view of our previous comments, a partition node can be considered as a 3-valued AND gate. The outputs of the AND gates are processed at the rule application nodes by η_R (see Figure 12); a rule application node can be considered as a 3-valued OR gate. The outputs of these gates are processed at the S_0 node by η (see Figure 12), which produces the output of the tree; the S_0 node can also be considered as an OR gate.
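With the encoding 0 < u < 1, the gate analogy can be written out directly: partition (AND) gates compute Min, and rule application and w.f.f. (OR) gates compute Max, exactly as in (5.2), (5.3), and (5.4). A minimal sketch follows (the nested-list encoding of an inference mapping tree is an assumption of the illustration).

```python
VALID, UNCERTAIN, INVALID = 1.0, 0.5, 0.0

def eta_R_p(antecedent_values):
    """(5.2): conditional value given one rule application and one
    partition = Min over the antecedent values (3-valued AND gate)."""
    return min(antecedent_values)

def eta_R(partition_values):
    """(5.3): conditional value given one rule application = Max over
    its relevant partitions (3-valued OR gate)."""
    return max(partition_values)

def eta(rule_values):
    """(5.4): value of S0 = Max over the relevant rule applications."""
    return max(rule_values)

def value_of_S0(tree):
    """tree = list of rule applications; each rule application = list of
    partitions; each partition = list of antecedent values."""
    return eta([eta_R([eta_R_p(p) for p in rule]) for rule in tree])

# One rule with a refuted partition and an uncertain one; a second rule refuted:
example = [
    [[VALID, INVALID], [VALID, UNCERTAIN]],   # Max(Min(1,0), Min(1,u)) = u
    [[INVALID]],                              # 0
]
```

Evaluating `value_of_S0(example)` yields u: the refuted alternatives cannot override the partition that is still uncertain.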
Fig. 12. Inference mapping tree. [The figure also marks an "elementary transition": a chain from a w.f.f. node through a rule application node and a partition node to an antecedent w.f.f. node.]
A decision tree in a natural decision system is a labeled tree which is made of inference mapping trees and of axiom links, and in which, furthermore, the value associated with its root node S_0 is stable (1 or 0). In a decision tree of a "pure" decision system, (τ ∈ {t, ℓ, r}), the inference mapping trees are all of type τ. In a "mixed" system 𝒩_σ′(G), (σ′ ∈ {(t,ℓ), (t,r), (ℓ,r), (t,ℓ,r)}), a decision tree may use as building blocks inference mapping trees from all its component "pure" systems. Thus a decision tree in 𝒩_(t,ℓ,r)(G) may have inference mapping trees based on "Top", "Left" or "Right" applications of replacement rules.

Each inference mapping tree in a decision tree is completely autonomous from the point of view of value computations. In a decision tree of any natural decision system ("pure" or "mixed"), value computations can be carried out homogeneously.

If v_k(S_0) = 1, where the w.f.f. S_0 = (P ⇒ x) is the root w.f.f. of a
decision tree in any of our natural decision systems, then S_0 is valid in 𝒩_σ(G), it is also valid in N_σ(G) (by the Lemmas in the present section), and x is well formed in L(G) (by the consistency theorem). Furthermore, the tree proof of S_0 in N_σ(G) can be easily obtained from the decision tree by tracing back from the root node a tree path which proceeds as follows through nodes with value 1: only one branch is taken below each OR node (the w.f.f. nodes and the rule application nodes), and all the branches are taken below an AND node (the partition nodes), until all the tree terminals reach conclusive w.f.f.'s linked to 𝒜.

If v_k(S_0) = 0, then S_0 is refuted in 𝒩_σ(G) and x is not well formed in
L(G) (by the lemmas of the present section). Furthermore, the entire decision tree provides now a detailed record of the unsuccessful attempts to form a tree proof of S_0. This decision tree has the logical status of a tree proof; it is indeed a tree proof of the nonvalidity of S_0. We call it a refutation tree of S_0. In view of the information included in a refutation tree, this tree (or a partial description of it) is an interesting candidate for the error description message which was discussed in (2.17).
As we will see shortly, decision trees are grown, from their root node down,* in the course of execution of decision procedures and of problem-solving procedures. We call a decision tree in its intermediate stages of growth a

* Contrary to the habits of live trees, and in view of our habits of mind, the direction of growth of our trees is downwards.
search tree (or a problem-solving tree). The value at the root node of a search tree is uncertain; it is the purpose of a decision procedure to direct the growth of a search tree in such a way that a stable value (1 or 0) is attained at its root node.
We shall now formulate an effective decision procedure for 𝒩_σ(G), (for all σ), which starts with the w.f.f. S_0 = (P ⇒ x) and produces a decision tree for S_0 from which we can obtain a P-marker of x if one exists.

A Class of Decision Procedures for 𝒩_σ(G) (for all σ)

We start by focusing attention on the initial w.f.f. S_0 = (P ⇒ x), whose value is uncertain. The objective of the decision procedures, which we denote by Π_d, is to eliminate this initial uncertainty and to assign a value of 1 or 0 to S_0. A decision procedure is entered at Π_1 and is exited at Π_4. The following is a description of Π_d:

(Π_1). For a given w.f.f. S under attention, where v_k(S) = u, (k = 1, 2, ...), and for a given σ, choose a direction of approach (in a "pure" system there is no choice; however, in a "mixed" system such as σ = (t,ℓ) we can choose between t and ℓ) and generate an appropriate inference mapping tree. This tree includes all the relevant rule applications for S and all the relevant partitions relative to these rule applications, and it results in a set of terminal w.f.f.'s.

(Π_2). Test, in a given order, whether any of the newly generated w.f.f.'s are conclusive, i.e., whether an axiom of 𝒩_σ(G) applies to them. If yes, assign the appropriate value to the w.f.f.; if no, assign the value u.

(Π_3). On the basis of the values assigned to terminal w.f.f.'s, compute the new value, v_k(S_0) (k = 1, 2, ...), of the initial w.f.f. (on the basis of (5.2), (5.3) and (5.4)) by a process of "backing up" values from the terminals to the root.

(Π_4). If v_k(S_0) = 1, stop and indicate that x is well formed in L(G) (by the consistency argument). [At this point it is also possible to extract the proof tree in N_σ(G) of the w.f.f. S_0 and from it, via the procedures mentioned in Section IV, to obtain the P-marker of x.] If v_k(S_0) = 0, stop and indicate that x is not well-formed in L(G) (by the lemmas in the present section). [At this point it is possible to output the decision tree which is rooted at S_0 for purposes of error control.] If v_k(S_0) = u, continue the process, as indicated in Π_5.
(Π_5). Direct attention (in a given order) to each terminal w.f.f. from the previous generation whose value is u. For each of these w.f.f.'s carry out the processes Π_1, Π_2. After these processes have been carried out, execute Π_3 over the entire tree, and then go to Π_4.
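The loop just described can be compressed into a short recursive sketch. This is an illustration only: the `axiom_value` and `expand` interfaces are assumed stand-ins for the axiom tests and for the relevant-rule and relevant-partition enumeration, the sketch proceeds depth-first with a depth bound rather than generation by generation, and the treatment of a w.f.f. with no relevant rule as refuted is an assumption of the sketch.

```python
VALID, UNCERTAIN, INVALID = 1.0, 0.5, 0.0

def decide(root_wff, axiom_value, expand, max_generations=100):
    """Grow a search tree from root_wff until its root value is stable
    (1 or 0) or the depth bound is reached.
    axiom_value(wff) -> 1/0/u tests conclusiveness; expand(wff) -> list
    of rule applications, each a list of partitions, each a list of
    antecedent w.f.f.'s."""

    def value(wff, depth):
        v = axiom_value(wff)                     # axiom test
        if v != UNCERTAIN or depth == 0:
            return v
        alternatives = expand(wff)               # inference mapping tree
        if not alternatives:
            return INVALID       # no relevant rule applies (sketch assumption)
        # Back up values: Max over rules and partitions, Min over antecedents.
        return max(max(min(value(s, depth - 1) for s in part)
                       for part in rule)
                   for rule in alternatives)

    return value(root_wff, max_generations)
```

If the bound is exhausted before a stable value is reached, the root keeps the unstable value u, mirroring a search tree whose growth is still in progress.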
We shall now prove the completeness of our systems of natural decision with respect to the grammar G. This proof provides a justification of the effectiveness of our decision procedures (in fact, of a large family of procedures that have the essential characteristics of the procedures Π_d in common) for the solution of the syntactic analysis problem that we have formulated in (2.17).
Completeness Theorem. For all strings x in V_T, if x is well formed in L(G), then the w.f.f. S = (P ⇒ x) is valid in any of the systems 𝒩_σ(G) (i.e., v_k(S) = 1, where k is a finite integer).
Proof: It suffices to show that the decision procedure Π_d which is associated with a given system 𝒩_σ(G) always terminates after a finite number of generations. Note that, as a search tree grows from S = (P ⇒ x), the lengths of the strings that enter in the right sides of w.f.f.'s in successive generations are non-increasing. The case where the string length is conserved between successive generations occurs when a unary replacement rule is applied; in all the other cases, the string lengths decrease from generation to generation.
Since any nonterminal element in the grammar must have non-zero support (see 2.7), any sequence of unary replacement rules in P is finite, and it has the following form:

(5.5)    A_1 → A_2, A_2 → A_3, ..., A_n → ψ ,

where A_1, A_2, ..., A_n ∈ V_N and ψ is either a single element in V_T, say a, or a string in V of length larger than 1. To a sequence of unary rules of the type shown in (5.5) there corresponds a segment of a chain* in the search tree where string length is conserved. Let us call such a segment a length-conserving segment.

* A tree chain is a path from the root of the tree to one of the tree terminals.
Let us consider a decision procedure for the mixed system 𝒩_(t,ℓ,r)(G). If this procedure terminates after growing a finite search tree, then the procedures for the other natural decision systems terminate also. We assume then that the search tree which is grown by the decision procedure under consideration is made of inference mapping trees from any of the three rule application types.
In each inference mapping tree (see Figure 12) there are several chains that go from the root w.f.f. node, through a rule application node and then through a partition node, to a terminal w.f.f. Let us call these chains elementary transitions. It is clear that in each case that a non-unary replacement rule is applied, the string length decreases over an elementary transition, in the direction of tree growth. If we consider a chain in a search tree which is made of a sequence of such elementary transitions, then we will reach on this chain a w.f.f. whose string length is 1.
Consider now length-conserving segments of the decision tree. They are made of one or more length-conserving elementary transitions in sequence, each of which is associated with a unary replacement rule. Since any sequence of unary rules is finite in length (see (5.5)), it follows that the length-conserving segments are also composed of a finite number of elementary transitions.
Let us consider now a w.f.f., S, that lies on a length-conserving segment, and let us examine the transitions that are obtainable by applying to S replacement rules that attempt to maintain the next w.f.f. on the length-conserving segment (if this is possible); we examine in succession rule applications from the "Top", "Left", and "Right".
(1) Suppose that the rule is applied at the "Top" of a w.f.f. S = (A_i ⇒ ω), where A_i (1 ≤ i ≤ n) is one of the nonterminal elements in a sequence of unary rules in P of the type shown in (5.5), and ω is a string in V. If 1 ≤ i < n, then we can apply on S one of the unary rules in the sequence (5.5). This yields the w.f.f. A_{i+1} ⇒ ω, which lies on the length-conserving segment. If ω consists of a single nonterminal element, then it may be the case that 𝒜_{0,4} applies and the growth stops. If i = n, then (by 5.5) we can apply on S either (i) a unary rule which yields a w.f.f. a ⇒ ω (where a ∈ V_T), or (ii) a non-unary rule, which causes the length-conserving segment to branch into new segments where the string length is smaller than in the previous length-conserving segment. In the case (i) where the w.f.f. a ⇒ ω is
obtained, the axioms 𝒜 or 𝒜_{0,1} apply, and the growth stops.
(2) Suppose that a rule is applied at the "Left" of a w.f.f. S = (X ⇒ A_i γ), where A_i is as in (1), X ∈ V_N, and γ is a string in V. If 1 < i ≤ n, then we can apply on S one of the unary rules in the sequence. This yields the w.f.f. X ⇒ A_{i-1} γ, which lies on the length-conserving segment. If the string γ is empty, then it may be the case that Ω_1 applies, and the growth stops. If i = 1, then no further unary rules can be applied on S (by definition of A_1). Hence, if the string γ is not empty, either Ω_{0,3} applies or a non-unary rule can be applied, which causes the length-conserving segment to branch into new segments where the string length is smaller than in the previous segment. If the string γ is empty, then either Ω_1 or Ω_{0,1} apply, and the growth stops.
(3) The case for rule applications at the "Right" of S is similar to the case (2) just described.
In general, the finite length-conserving segments always lead to chain segments with w.f.f.'s of smaller string length, or their growth is stopped by one of the axioms Ω_1, Ω_{0,1}, Ω_{0,3}, or Ω_{0,4}. It remains now to examine the situation at w.f.f.'s that have right-side strings of length 1.
If a chain of the search tree reaches a w.f.f. α ⇒ β, where α, β ∈ V_T, then Ω_1 or Ω_{0,1} apply and the growth stops. If a w.f.f. X ⇒ β is reached, where X ∈ V_N and β ∈ V_T, then either Ω_{0,2} applies or a length-conserving segment will grow below the w.f.f. X ⇒ β (as described above), and it will be stopped by Ω_1. If a w.f.f. X ⇒ A is reached, where X, A ∈ V_N, then either Ω_1 or Ω_{0,2} apply and the growth stops, or a length-conserving segment will grow below the w.f.f. X ⇒ A (as described above), and it will be eventually stopped by Ω_1.
We have shown then that all the chains of the search tree must stop in a finite number of generations by one of the axioms Ω_1, Ω_{0,1}, Ω_{0,3}, or Ω_{0,4}. The axiom Ω_{0,2} may also contribute to stoppage of growth in some chains prior to the application of the other axioms. Therefore, the decision procedures discussed before effectively compute the desired mapping δ(x) which is defined in (2.17) via the formation of appropriate decision trees; i.e., if the input string x is well-formed in L(G), then the procedure produces a tree proof which yields a single* P-marker of x. If x is not well-formed in L(G), the procedure indicates this, and the decision tree which is grown during its execution can be used as an error description.
* The choice of a specific P-marker depends on details of the decision procedure; these will be discussed in the next section.
VI. HEURISTIC PROCEDURES OF REDUCTION TYPE FOR SYNTACTIC ANALYSIS
While the decision procedures presented in the previous section are effective algorithms for syntactic analysis, they are not designed with computational efficiency in mind. But they provide the framework for the construction of efficient procedures. In what follows, we shall:
1. establish the correspondence between the decision procedures discussed above and certain reduction-type problem-solving procedures;
2. identify aspects of these procedures related to computational efficiency;
3. develop the procedures into efficient syntactic analyzers.
A search tree of the type grown during a decision procedure is the prototype of a tree which grows in the course of executing a problem-solving procedure of the reduction type [8]. We also call this tree a problem-solving tree. The main types of elements that appear in a problem-solving tree are states and moves.
A state is a description of the problem that confronts the problem solver at a given stage of his (its) activity. It includes an index of uncertainty about the solvability of the problem.
A move in a problem-solving tree is an operator that either effects a transition from one state to one or more subordinate states (with the intention of reducing the uncertainty of the initial state), or recognizes that a certain state is conclusive; a conclusive state is characterized by complete certainty, and when it is recognized as such no further problem-solving activity takes place from that state.
Problem-solving procedures of the reduction type are characterized by their mode of growing a search tree which is context free in the following essential sense: the choice of moves from a given state is independent of move choices in any other state of the search tree. This property allows a great flexibility of approach in organizing the solution activity for the set of subordinate problem-states that demand attention at any one time. It also suggests systematic and efficient computer realizations of these procedures via relatively simple mechanisms that can be used uniformly for classes of such procedures.
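The state/move vocabulary above maps onto a generic tree-search skeleton. The sketch below is our illustration of the reduction-type organization, not the report's implementation; all names are invented, and the move set at each state is computed from that state alone, reflecting the "context-free" growth property:

```python
# Generic reduction-type problem solving: a state is either recognized as
# conclusive (value 1 or 0) by a recognition move, or reduced by a move
# to a set of subordinate states, all of which must be solved.

def solve(state, recognize, moves, reduce):
    """Return 1 if some chain of reductions validates `state`, else 0."""
    value = recognize(state)             # recognition move: conclusive?
    if value is not None:
        return value
    for move in moves(state):            # replacement / partition moves
        subproblems = reduce(state, move)
        # The move validates `state` only if every subordinate
        # problem is solved (an AND-reduction).
        if all(solve(s, recognize, moves, reduce) for s in subproblems):
            return 1
    return 0                             # no relevant move validates the state
```

As a toy instantiation, take states to be integers, recognize 0 as valid and negatives as refuted, and let a move subtract a constant; the skeleton then decides reachability of 0.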
Clearly, our decision procedures are themselves problem-solving procedures of the reduction type (we will call them subsequently reduction procedures). In effect, our development of the natural decision systems was
intended to bring us to a point where our problem of devising efficient
procedures for syntactic analysis could be properly formulated as a problem
of devising efficient reduction procedures.
The description of the decision procedures in the previous section
provides the general outline of the structure of reduction procedures. The
execution of these procedures proceeds in cycles. In each cycle there is a
generation phase and an evaluation phase. During the generation phase,
decisions on the growth of the search tree are made. During the evaluation
phase the new growth is appraised, and various performance measures (such as value) are readjusted over the search tree. These new performance measures
guide the decisions in the next stage of generation. There are, in addition,
output procedures and executive procedures for tying together the entire problem-solving activity.
It is evident that we can obtain a variety of specific reduction procedures by fixing in different ways the following two essential features:
(1) the basis for choosing the direction of approach at different stages of solution, and the restrictive conditions for selecting the relevant set of rule applications as well as the relevant set of partitions;
(2) the strategy for selective direction of attention to the non-conclusive subordinate problems.
A. States, Moves, and Search Trees in Reduction Procedures for Syntactic Analysis
Before shaping the above two features to optimize computational
efficiency, we discuss the reduction procedures for syntactic analysis and
the search trees that they generate. We have two types of problem states.
(1) States that contain a specified w.f.f., such as S = (X ⇒ ω), together with its associated value v_k(S); we denote such states by Σ.
(2) States that contain a w.f.f. system, such as

[S_1 = (φ^(1) ⇒ χ_1), ..., S_n = (φ^(n) ⇒ χ_n); χ_1 χ_2 ··· χ_n = ω],

where χ_1, ..., χ_n are string variables, and ω is a string in V, together with the value associated with such a system, as defined in (5.3); we denote such states by Σ*.
The moves used in our reduction procedures are of four types, as follows:
(1) Replacement moves, where (i) a non-unary rule of replacement is applied at a certain site of a specified w.f.f. in a given state, and it produces a w.f.f. system in a subordinate state, and (ii) a unary rule of replacement effects a transition between two specified w.f.f.'s in consecutive states; we name such moves by the appropriate rule application.
(2) Compound replacement moves (maneuvers), where 2or 3 rules of replace-
ment are simultaneously applied to a specified w.f.f. in a given state and theyproduce a w.f.f. system in a subordinate state; we name such moves by theappropriate set of rule applications.
(3) Partition moves, where a state that includes a w.f.f. system of n.w.f.f.is transformed into n subordinate states containing one specified w.f.f. each;
(4) Recognition moves, where a specified w.f.f. in a given problem state
is recognized by an axiom of our system as conclusive and it receives a stablevalue (1 or 0); we name such a move by the appropriate axiom.
In our reduction procedures, we shall use bisection moves exclusively. This permits us to consider all the relevant partitions of a given string by composing bisection moves in an orderly manner. In the search tree, the effect of a full partition move is obtained by a cascade of bisection moves. We illustrate in Figure 13 the process of applying a replacement move followed by a partition move in the form of two bisection moves. In this figure, the replacement move R (which is one of the relevant replacement moves) is applied to the specified w.f.f. X ⇒ ω in the state Σ_1, and the state Σ* results, with a w.f.f. system which includes 3 w.f.f.'s, each with a string variable. A bisection move is then applied on the w.f.f. system. This move chooses a specific assignment for the pair of strings χ_1 and χ_2 χ_3, and it produces two new states, Σ_2 and Σ*_2. Here, Σ_2 includes the specified w.f.f. X ⇒ A χ_1, and Σ*_2 has a w.f.f. system with 2 w.f.f.'s. The second bisection move chooses a specific assignment for the strings χ_2 and χ_3, and it results in 2 new states, each of which includes a specified w.f.f.
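The cascade of bisection moves just described enumerates every partition of a string into n parts. A small sketch of that bookkeeping (the function names are our own):

```python
# Obtain all partitions of a string w into n non-empty parts by composing
# bisection moves: each bisection splits off a first part and leaves a
# smaller (n-1)-part partition problem, exactly as in the search tree.

def bisections(w):
    """All ways to bisect w into two non-empty strings."""
    return [(w[:i], w[i:]) for i in range(1, len(w))]

def partitions(w, n):
    """All partitions of w into n non-empty parts, via cascaded bisections."""
    if n == 1:
        return [(w,)] if w else []
    result = []
    for head, rest in bisections(w):
        for tail in partitions(rest, n - 1):
            result.append((head,) + tail)
    return result
```

For a string of length N there are N-1 possible first bisections, which is the main source of branching that the relevance tests of the following subsections are designed to restrict.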
B. Computational Effort
The search tree in Figure 13 is typical for the problem described in the state Σ_1. Mark the node on which computer attention is currently focused by a symbol denoting the sub-procedure of the main procedure which is in control at that time. We then obtain an instantaneous description of the computation
(in the sense used by Davis [9] for Turing machines). In Figure 13, we have marked Σ_3 by □; this is intended to indicate that a growth procedure focused on Σ_3 is now in control. We can regard the sequence of instantaneous descriptions that are recorded at the beginning of a cycle as a good representation of the course of computation of a reduction procedure. It is reasonable to express "computational effort" as a measure of complexity defined over such a course of computation. An important measure of computational complexity for any given instantaneous description is the number of its state nodes.

Fig. 13. Representation of the application of a replacement move, followed by two bisection moves, in a problem-solving tree. [Diagram not reproduced; A_1, A_2, A_3 stand for tree continuations.]
We define the computational effort E(S,P) associated with a procedure P, relative to a root w.f.f. S (the problem statement), as the sum of weights of state nodes in the last instantaneous description, when the computation stops. We assume here that a state "weight" exists which reflects the relative processing complexity associated with a state node in a given procedure.
This measure of effort reflects the size of the largest search tree grown by the procedure just before it provides a definite answer to the problem. The smallest tree that a procedure can construct before it stops has the same number of state nodes as the number of nodes in a P-marker of the string x.
Let E(N,P) denote the expected computational effort of the procedure P over all strings x in V that have length N. Given a class of reduction procedures for syntactic analysis, it is of interest to ask whether a given procedure in the class solves the syntactic analysis problem with the least E(N,P), for all N. Furthermore, it is natural to look for a new procedure which is better than any of the procedures in the given class in one of the senses just specified. Suppose that the goal of a proposed procedure is to satisfy this latter requirement. Since we cannot prove that the proposed procedure attains the goal, it has the status of a heuristic procedure relative to that goal.
The only way of ascertaining the relative ranking of candidate procedures with respect to expected computational effort is through computer experimentation. We feel that the notion of E(N,P) is useful for such experiments. It is formulated at a broad enough conceptual level, so that it can be estimated for a variety of procedures that are implemented in different forms.
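Such an experiment reduces to instrumentation and averaging. A minimal sketch, under the assumption (ours, not the report's) that each candidate procedure reports the state-node count of its final search tree, with weight 1 per node:

```python
# Estimate expected computational effort E(N, P) by running procedure P on
# sample strings of length N and averaging the number of state nodes
# in the final instantaneous description (weight 1 per state node).

def expected_effort(procedure, samples):
    """Mean state-node count of `procedure` over the sample input strings."""
    total = 0
    for x in samples:
        nodes = procedure(x)       # assumed to return its state-node count
        total += nodes
    return total / len(samples)

# Toy stand-in for a procedure: an exhaustive bisection search over x that
# touches one state per substring, i.e. len(x) * (len(x) + 1) / 2 nodes.
def toy_procedure(x):
    n = len(x)
    return n * (n + 1) // 2
```

Tabulating `expected_effort` against N for two procedures gives exactly the ranking experiment described in the text.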
C. Approaches to Move Selection
Note that the main cause of branching in a search tree is generally the multiplicity of relevant bisection moves. This is a critical problem for the design of efficient reduction procedures. Thus, it is highly desirable to use all the existing relevant information and also to develop special tests
towards the achievement of the largest a priori restriction possible over the set of relevant bisections. We shall consider next a concept which is extremely useful for restricting bisection moves.
Consider a pair [α, β], where α, β ∈ V_T, such that, for any two strings x, y in V_T and for a pair of nonterminal elements [A,B], the string xα is A-derivable from the right in G and the string βy is B-derivable from the left in G. The set of all such pairs [α, β], which we call boundary pairs, defines a boundary for the pair of nonterminals [A,B]. Any string z in V_T which is derivable from the string AB in G must contain a boundary pair belonging to the boundary of [A,B]. We will assume that a "boundary test" is available in the system, and we will use it in the definition of relevant sets of bisections.
Let us consider a w.f.f. S = (X ⇒ ω), where X ∈ V_N, and ω is a string in V. Let us examine sets of relevant replacement moves and relevant bisections for various "pure" and "mixed" approaches to solution.
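The boundary of a pair [A,B] can be computed once from the grammar: it consists of the possible last elements of strings derived from A, paired with the possible first elements of strings derived from B. A sketch of that computation (the grammar encoding and helper names are ours):

```python
# Compute the boundary of a nonterminal pair [A, B]: all pairs (a, b) with
# a a possible last element of a string derived from A and b a possible
# first element of one derived from B. Any string derived from AB must
# contain such a pair at the seam, so membership restricts the relevant
# bisections of a string.

def edge_elements(grammar, nonterminals, start, index):
    """Possible first (index=0) or last (index=-1) elements of strings
    derivable from `start`."""
    found, frontier = set(), {start}
    while frontier:
        sym = frontier.pop()
        for lhs, rhs in grammar:
            if lhs == sym:
                edge = rhs[index]
                if edge in nonterminals:
                    if edge not in found:
                        found.add(edge)
                        frontier.add(edge)
                else:
                    found.add(edge)
    return {e for e in found if e not in nonterminals}

def boundary(grammar, nonterminals, A, B):
    last_A = edge_elements(grammar, nonterminals, A, -1)
    first_B = edge_elements(grammar, nonterminals, B, 0)
    return {(a, b) for a in last_A for b in first_B}

# Example: S -> A B;  A -> A a | a;  B -> b B | b.
G = [("S", ("A", "B")), ("A", ("A", "a")), ("A", ("a",)),
     ("B", ("b", "B")), ("B", ("b",))]
```

For this grammar the boundary of [A,B] is the single pair (a, b), so only cut points where an "a" is followed by a "b" survive the boundary test.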
(1) "Pure" systems.
(a) Approach from the "Top" (see Fig. 3-2): Consider a non-unary replacement rule R: A → φ^(1) ... φ^(n), where A ∈ V_N and φ^(1), ..., φ^(n) ∈ V. The rule R is a relevant replacement rule for "Top" application on the w.f.f. S if it satisfies the following 4 conditions:

(1) A = X; (2) φ^(1) ≤ ω^(1) is satisfied;* (3) φ^(n) ≤ ω^(l(ω)) is satisfied; (4) s(φ) ≤ l(ω).** (6.1)
If R is applied on S from the "Top" it produces a w.f.f. system which includes n w.f.f.'s of the form S_j = (φ^(j) ⇒ χ_j), 1 ≤ j ≤ n, where the χ's are string variables whose values are to be specified by the relevant partitions.
In order to specify all the relevant partitions of the string ω (in n parts) we have to consider, one after the other, all the n−1 sets of bisection moves. The first bisection can be attempted from the left end or the right end of ω. Suppose that the bisection, [l_ℓ, l_r], is attempted from the left, with the purpose of producing a pair of strings χ_1 and χ_2 ··· χ_n. The relevant interval
* See the discussion on derivabilities from the left and the right in the previous section.
** See (2.6) and (2.8) for the definition of the support s.
for the set of first bisections can be defined by

s(φ^(1)) ≤ l_ℓ ≤ l(ω) − Σ_{j=2}^{n} s(φ^(j)); (6.2)
each pair of string elements in the relevant interval which is a member of the boundary set of [φ^(1), φ^(2)] defines a relevant bisection. For each specification of χ_1 (by a relevant bisection), we can now obtain the set of relevant bisections that will specify χ_2 in the same manner as before, and so on. Note that if the nonterminal element A is self-embedding, then the usefulness of the "boundary test" decreases, since the occurrence of "spurious" boundary pairs tends to be more frequent inside the relevance interval.
(b) Approach from the "Left" (see Figure 7): Consider again the rule
R given in (a) above. This rule is a relevant replacement rule for "Left" application on the w.f.f. S if it satisfies the following 4 conditions:

(1) φ^(1) = ω^(1); (2) A ≤ X is satisfied; (3) φ^(2) ≤ ω^(2) is satisfied; (4) Σ_{j=2}^{n} s(φ^(j)) ≤ l(ω) − 1. (6.3)
The first bisection can be attempted from the left end or the right end of the string ω. Suppose that the bisection is attempted from the right, with the purpose of producing a pair of strings χ_2 ··· χ_n and χ_1. The relevant interval for the set of first bisections can be defined by

0 ≤ l_r ≤ l(ω) − Σ_{j=2}^{n} s(φ^(j)); (6.4)

each string element in this interval which is A-derivable from the right in G defines a relevant bisection. Note that the restrictions on relevant bisections here are weaker than in the approach from the "Top". After specifying a χ_j by the first bisection, the process of successive bisections continues in the same way as with the first. Note that if A is left-recursive, then the usefulness of the "right-derivability" test decreases, since the occurrence of elements satisfying this test tends to be more frequent in the relevance interval.
(c) Approach from the "Right" (see Figure 9): The situation is similar to (b) above. In the present case, the choice of a relevant bisection involves a test of "left-derivability" from A; the usefulness of this test diminishes if A is right-recursive.
(2) "Mixed" Systems
We shall discuss only one combined approach, with the purpose of illustrating the advantages of mixed systems: from "Top" and "Left"; see Figure 14. Consider a non-unary replacement rule R_1: A_1 → B_1 C_1, which is a candidate for "Top" application, and a rule R_2: A_2 → B_2 C_2, which is a candidate for "Left" application. The pair of rules R_1, R_2 is a relevant pair if the following conditions are satisfied:

(1) A_1 = X; (2) B_1 ≤ A_2 is satisfied; (3) B_2 = ω^(1); (4) C_1 ≤ ω^(l(ω)) is satisfied; (5) C_2 ≤ ω^(2) is satisfied; (6) s(C_1) + Max[s(C_2), s(B_1)] ≤ l(ω) − 1. (6.5)
Suppose that the first bisection, [l_ℓ, l_r], is attempted from the right, with the purpose of producing a pair of strings χ_1 and χ_2. The relevant interval for the set of first bisections can be defined by

s(C_1) ≤ l_r ≤ l(ω) − Max[s(C_2), s(B_1)]; (6.6)

each pair of string elements in the relevant interval which is a member of the boundary set of [A_2, C_1] defines a relevant bisection.
The combined approach just described is a compound move which has
the property of an especially useful maneuver. It produces simultaneously two
compatible "bridgeheads" from triangle corners (see Figure 14), and in so doing
it appreciably reduces unnecessary growth in the search tree. Clearly, a
simultaneous approach from three corners would have reduced to a greater extent
the total search effort needed. In this connection, it is suggestive to consider
our problem as one of solving a triangular jigsaw puzzle of a special kind. It
is clearly reasonable to start by filling in as much of the boundaries as
possible before venturing into the central region.
In general, as we increase the sharpness of selection by specifying many simultaneous requirements, we reduce the maximal search tree needed for a solution. The local effort needed for selection, however, also increases under those circumstances.
In some cases it may be possible to have a complete "mixed" system (t, l, r), and yet to approach the problem at each state from a single (but not necessarily identical) direction, reducing in this way the amount of local computation needed at a state. The decision on the choice of approach can be carried out locally at each state by examining the recursiveness properties of the right-side element of the replacement rule which is under consideration for application at the w.f.f. of that state. If this element is left-recursive, an approach from the "Left" should be avoided. If it is self-embedding, a "Top" approach is to be avoided. If it is right-recursive, an approach from the "Right" is not indicated. This advice is based on the comments that we have made previously about the blunting effects of certain recursiveness situations on the selectivity of relevant bisection moves.

Fig. 14. A compound replacement move (a maneuver) from the "Top" and "Left". [Diagram not reproduced; the shaded areas correspond to problem areas that remain after the application of the move.]
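The local decision rule just stated is simple enough to sketch directly. In this illustration (ours, not the report's), the three recursiveness properties of the rule's right-side element are assumed to be precomputed from the grammar:

```python
# Choose approach directions for a replacement rule from the recursiveness
# properties of its right-side element: avoid "Left" for a left-recursive
# element, "Top" for a self-embedding one, and "Right" for a
# right-recursive one.

def choose_direction(left_recursive, self_embedding, right_recursive):
    """Return the directions of approach that are not contraindicated."""
    candidates = [("Left", not left_recursive),
                  ("Top", not self_embedding),
                  ("Right", not right_recursive)]
    return [name for name, ok in candidates if ok]
```

An element that is left-recursive but neither self-embedding nor right-recursive thus leaves "Top" and "Right" as usable directions; an element with all three properties leaves none, in which case some blunted test must be accepted.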
The variety of possible modes of selecting replacement and partition
moves from a state must be evident by now. A specific choice of mode depends
on the characteristics of the available computer system. An experimental
study of the relationship between various modes would be of considerable interest.
In our heuristic procedures, we assume that if no relevant (compound) replacement move is found from a state, or if no relevant bisection is found, then the state is immediately assigned the value 0.
We assume that the axiom set of N(G) is available for the recognition of conclusive w.f.f.'s in the heuristic reduction procedures. Our remarks in Section V on the possible augmentation of the axiom set, both for validation and refutation, are relevant here.
D. Approaches to Attention Control
We assume that at each generation cycle of a heuristic procedure a single "quantum of growth" is initiated from a terminal state of its search tree (the state under attention). The growth starts with the application of all the relevant replacement moves on the specified w.f.f. contained in the state under attention, and it proceeds with the application of all the relevant bisection moves, until all the new terminal states contain specified w.f.f.'s.
At each problem-solving cycle a decision is made as to "where to go next" so that the expected computational effort is minimized for the problem on hand. Estimates of expected effort at non-conclusive terminal states can be used to estimate expected effort for all the other non-terminal nodes of the search tree whose value is uncertain. These effort estimates can then be used in a systematic manner to control the formation of an "attention path" from the root node to a terminal node. An important factor in the choice of an approach to the control of attention which is oriented to the minimization of expected effort is the expected multiplicity of solutions. In our case, this is related to the question of syntactic ambiguity. If we assume that the solution is unique if it exists (this is a reasonable assumption for programming languages
without known ambiguities), then it is possible to formulate a scheme of attention control which is both simple and has satisfactory effort-saving properties. We shall outline this scheme below.
We associate with each node Z of the search tree the two following effort estimates:

e^1(Z): estimate of the expected number of state nodes in the subtree rooted at Z if the node Z is eventually assigned the value 1;
e^0(Z): estimate of the expected number of state nodes in the subtree rooted at Z if the node Z is eventually assigned the value 0.

These estimates change from cycle to cycle as the growth of the search tree evolves.
If Z is an OR* node (all nodes except bisection move nodes) with n descending branches, each having an associated pair of effort estimates e^1_i, e^0_i (1 ≤ i ≤ n), then the effort estimates at Z are computed as follows:

e^1(Z) = (1/n)[Σ_{i=1}^{n} e^1_i + (n−1)e^0_1 + (n−2)e^0_2 + ··· + e^0_{n−1}],
where e^0_1, ..., e^0_{n−1} are such that e^0_1 ≤ e^0_2 ≤ ··· ≤ e^0_{n−1} ≤ e^0_n;
e^0(Z) = Σ_{i=1}^{n} e^0_i. (6.7)

If Z is on an "attention path" descending from the root node, then it chooses the next segment of the path below it which has min e^1.
If Z is a bisection move node (an AND node) with two descending states that have effort estimates e^1_1, e^0_1 and e^1_2, e^0_2, then the effort estimates at Z are computed as follows:

e^1(Z) = e^1_1 + e^1_2;
e^0(Z) = (1/2)(e^0_1 + e^1_1 + e^0_2), where the states are ordered so that e^1_1 + e^0_1 ≤ e^1_2 + e^0_2. (6.8)

If such a move node is on an "attention path", then it chooses the next segment of the path below it which has min (e^0 + e^1).

* OR and AND in the sense of Section V.

Consider terminal states whose w.f.f.'s have strings of length N. Let
e^1(N) be the upper bound on the number of state nodes in the search tree that appear below such a state when the state is validated. Similarly, e^0(N) is the upper bound on the nodes when the state is refuted. We take e^1(N) and e^0(N) to be the effort estimates at the terminal states. In the computation of these upper bounds we consider that all the possible bisections of a string are taken at a bisection node; the maximum number of applicable replacement rules which is possible for any element of V is taken at a replacement node. Also, in this computation the rules for processing effort estimates (i.e., (6.7) and (6.8)) and for controlling the "attention path" are used consistently. It is evident that we can compute recursively e^1(N), e^0(N) under these assumptions, starting with e^1(1) = 1 and e^0(1) = 1, and using them in e^1(2), e^0(2), and so on.
These estimates can then be available in the form of tables or (approximate)
functions for use in any syntactic analysis problem. From preliminary hand
experiments with a heuristic procedure that uses the scheme for attention
control just outlined, we feel that it provides a strong basis for selectivity
of search along the most relevant lines.
It is important to note that the tighter the basis for effort estimation, the better a heuristic procedure P will be from the point of view of minimizing E(N,P).
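The effort-estimate propagation can be sketched directly, under our reading of (6.7) and (6.8): the branches of an OR node are examined in order of increasing e^0, and the two states under a bisection in order of increasing e^1 + e^0. The function names are illustrative:

```python
# Propagate effort estimates (e1, e0) up the search tree: e1/e0 estimate
# the state-node count below a node if it is eventually validated/refuted.
# OR nodes follow (6.7); bisection (AND) nodes follow (6.8).

def or_node(estimates):
    """estimates: list of (e1, e0) pairs of the n descending branches."""
    n = len(estimates)
    e0s = sorted(e0 for _, e0 in estimates)   # cheapest refutations first
    e1 = (sum(e1 for e1, _ in estimates)
          + sum((n - 1 - k) * e0s[k] for k in range(n - 1))) / n
    e0 = sum(e0s)                             # refuting Z refutes every branch
    return e1, e0

def and_node(a, b):
    """a, b: (e1, e0) pairs of the two descending states of a bisection."""
    (e1a, e0a), (e1b, e0b) = sorted([a, b], key=lambda p: p[0] + p[1])
    e1 = e1a + e1b                            # validating Z validates both
    e0 = (e0a + e1a + e0b) / 2                # average over which side fails
    return e1, e0
```

Starting from the terminal-state bounds e^1(N), e^0(N), applying `or_node` and `and_node` bottom-up yields the estimates used to steer the attention path.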
REFERENCES
[1] Chomsky, N., and Miller, G. A., "Introduction to the Formal Analysis of Natural Languages", in Handbook of Mathematical Psychology, (Eds.) Bush, Galanter and Luce, Vol. 2, Ch. 11, Wiley, 1962.
[2] Griffiths, T. V., and Petrick, S. R., "On the Relative Efficiencies of Context-Free Grammar Recognizers", Communications of the ACM, Vol. 8, No. 5, May 1965, pp. 289-300.
[3] Gorn, S., "Specification Languages for Mechanical Languages and Their Processors — A Baker's Dozen", Communications of the ACM, Vol. 4, No. 12, Dec. 1961.
[4] Chomsky, N., "Formal Properties of Grammars", in Handbook of Mathematical Psychology, (Eds.) Bush, Galanter and Luce, Vol. 2, Ch. 12, Wiley, 1962.
[5] Simmons, R. F., "Answering English Questions by Computer: A Survey",Communications of the ACM, Vol. 8, No. 1, January 1965, pp. 53-71.
[6] Newell, A. and Ernst, G., "The Search for Generality", Proc. of theIFIP Congress 1965, Vol. 1, Spartan Books, 1965.
[7] Newell, A., Shaw, J. C., and Simon, H. A., "Report on a General Problem-Solving Program for a Computer", in Information Processing: Proceedings of the International Conference on Information Processing, UNESCO, Paris, 1960, pp. 256-264.
[8] Walters, D., and Amarel, S., "Heuristic Theorem Proving", Parts I and II, Final Report AFCRL-62-367, on Contract AF19(604)-8422, May 1962.
[9] Davis, M., Computability and Unsolvability, McGraw Hill, 1958.
[10] Newell, A., and Simon, H. A., "The Logic Theory Machine: A Complex Information Processing System", IRE Transactions on Information Theory, Vol. IT-2, No. 3, September 1956.
[11] Wang, H., "Toward Mechanical Mathematics", IBM Journal of Research andDevelopment, Vol. 4, No. 1, January 1960, pp. 2-22.
[12] Davis, M., "Eliminating the Irrelevant from Mechanical Proofs", Proc. Symposium Applied Math., Vol. XV, pp. 15-30, American Mathematical Society, Providence, R. I., 1963.
[13] Gentzen, G., "Untersuchungen über das logische Schliessen", Math. Zeit., 39 (1934), pp. 176-210.
[14] Jaskowski, S., "On the Rules of Suppositions in Formal Logic," StudiaLogica, 1 (1934), pp. 5-32.
[15] Fitch, F. B., Symbolic Logic: An Introduction, Ronald Press Co., New York, 1952.
[16] Nidditch, P. H., Introductory Formal Logic of Mathematics, University Tutorial Press, London, 1957.
[17] Lambek, J., "On the Calculus of Syntactic Types", in Proceedings of Symposia in Applied Mathematics, Vol. XII, Ed. by R. Jakobson.
[18] See Rosenbloom, P., The Elements of Mathematical Logic, Ch. 11,Sec. 4, Dover Publications, 1950.
[19] Amarel, S., "An Approach to Heuristic Problem Solving and Theorem Proving in the Propositional Calculus", in Systems and Computer Science, Hart and Takasu, eds., University of Toronto Press, 1967.
UNCLASSIFIED

KEY WORDS: Syntactic analysis, Heuristic programming, Problem solving, Theorem proving, Natural inference systems, Computer linguistics
IN!
1. ORIGINATING ACTIVITY: Enter the name and addressof the contractor,
subcontractor,
grantee, Department of De-fense activity or other organization (corporate author) issuingthe report.
2a. REPORT SECUHTY CLASSIFICATION: Enter the over-all security classification of the report. Indicate whether"Restricted Data" is included. Marking is to be in eccoro-ance with sppropriate security regulationa.26.
GROUP:
Automatic downgrading is specified in DoD Di-rective 5200. 10 and Armed Forces Industrial Manual. Enterthe
group
number.
Also,
when applicable, show that optionalmarkings have been used
for
Group 3 snd Group 4 aa author-ized.3. REPORT TITLE: Enter the complete report title in allcapital tetters. Titles in all caaea should be unclaeeifled.If a meaningful title cannot be selected without claselflca.
tion,
show title classification in all capitals in parenthesisimmediately following the title.4. DESCRIPTIVE NOTES: If appropriate, enter the type ofreport,
e.g., interim,
progress,
summary, annual,
or »n«-Give the inclusive dates when a specific reporting period iscovered.5. AUTHOR(S): Enter the name(s) of authoKa) as shown onor in the report. Enter last name, first name, middle initial.If military, show rank and branch of aervice. The name
of
the principal author is an abaolute minimum requirement.6. REPORT DATE: Enter the date
of
the report as day,
month, year;
or month,
year.
If more than one date appearson the report, use date ofpublication.7a. TOTAL NUMBER OF PAGES: The total page countshould
follow
normal pagination proceduree.
i.e.,
enter thenumber of pages containing information.7b. NUMBER OF REFERENCES: Enter the total number ofreferences cited in the report.
Ba. CONTRACT OR GRANT NUMBER: If appropriate, enterthe applicable number ofthe contract or grant under whichthe report was
written,
86, Be,
fc td. PROJECT NUMBER: Enter the appropriatemilitary department
Identification,
auch aa project
number,
aubpraject
number,
ayatem
numbera,
taak
number,
etc.
9a. ORIGINATOR'S REPORT NUMBER(S): Enter the offi-cial report number by which the document will be identifiedand controlled by the originating activity. Thla number mustbe uniqueto thla report.
96. OTHER REPORT NUMBERfS): If the report haa beenaaslgned
any
other report numbera (either by the originatoror by the sponsor), alao enter this numbers).
ITIONS
10. AVAILABILITY/LIMITATION NOTICES: Enter any lim-itations on further dissemination of the report, othar than thoseimposed by security
classification,
using standard statementssuch as:
(1)
"Qualified
requesters
may
obtain coplea of thisreport from DDC"
(2) "Foreign announcement and dissemination of thisreport by DDC is not
authorized,"
(3) "U. S. Government agenciea
may
obtain copies ofthis report directly from DDC. Othar qualified DDCusers shall request through
(4) "U. S. militaryagenciea
may
obtain copies of thisreport directly from DDC Other qualified usersshall request through
M
(5) "All distribution of this report is controlled.
Qual-ified
DDC users shall request throughii
If the report hss been furnished to the Office of Technical
Services,
Department of
Commerce,
for sale to the public, Indi-cate this fact and enter the price. Ifknown.
IL SUPPLEMENTARY NOTES: Uae
for
additional explana-tory notes.li SPONSORING MILITARY ACTIVITY: Enter the name ofthe departmental project officeor laboratory sponsoring (pay-ing lor) the research and development. Include address.
13. ABSTRACT: Enter an abstrsct giving a brief and factual
summsry
of the document indicative of the report, even thoughit
may
also sppesr elsewhere in the body
of
the technical re-port. If sddltlonal apace is required, a continuation sheetshsll be attached.
It ls highly desirable that the abstract of classified re-ports be unclassified. Eech paragraph of tha abstract shallend with an Indication
of
the military security classificationof the information in the paragraph, represented ss (TS), (S),(C), or (V).
There is no Umltstlon on the length of the abstract. How-ever, the suggested length la
from
150 to 225 worda.14. KEY WORDS: Key words are technically meaningful termsor short phrases thst charscterlze s report snd
may
be used SS
index entries
for
cataloging the report. Key words must besetecteH so thst no security classification is required. Iden-
fiers,
such as equipment model designation, trade name, mili-tary project code name, geographic
location, may
be uaed askey words but will be followed by sn indlcstlon
of
technicalcontext. The assignment
of links, rules,
and welghta laoptional.
UNCLASSIFIEDSecurity Classification
*,