What Was Wrong with the Chomsky Hierarchy?

Makoto Kanazawa, Hosei University


Page 1

What Was Wrong with the Chomsky Hierarchy?

Makoto Kanazawa, Hosei University

[Diagram: the Chomsky Hierarchy of Formal Languages as four nested classes: regular ⊂ context-free ⊂ context-sensitive ⊂ recursively enumerable (Hopcroft and Ullman 1979).]

Enduring impact of Chomsky’s work on theoretical computer science. 15 pages devoted to the Chomsky hierarchy.

Page 2

[Scan of p. 450, the index of Sipser’s textbook, entries from “Carmichael, R. D.” to “Computation history”: it has entries for “Chomsky normal form” and “Chomsky, Noam,” but none for the Chomsky hierarchy.]

Sipser 2013

No mention of the Chomsky hierarchy. The same goes for the latest edition of Hopcroft, Motwani, and Ullman (2006).

Semi-Thue System/String Rewriting System

α → β   (production, with α, β ∈ (N ∪ Σ)*)

γαδ ⇒ γβδ   (one-step rewriting)

Language: L(G) = { w ∈ Σ* ∣ S ⇒* w }

Type 0: unrestricted
Type 1: |α| ≤ |β|
Type 2: α ∈ N
Type 3: α ∈ N, β ∈ Σ N

N: finite alphabet of nonterminal symbols
Σ: finite alphabet of terminal symbols
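To make the rewriting relation concrete, here is a minimal sketch in Python (an illustration, not from the slides; the helper name one_step_rewrites is made up). It enumerates every γβδ obtainable from a string γαδ by one application of a production α → β:

```python
def one_step_rewrites(w, productions):
    """All strings reachable from w in one step: for each production
    (alpha, beta) and each occurrence w = gamma + alpha + delta,
    yield gamma + beta + delta."""
    for alpha, beta in productions:
        start = 0
        while True:
            i = w.find(alpha, start)
            if i == -1:
                break
            yield w[:i] + beta + w[i + len(alpha):]
            start = i + 1

# Example with the type 2 (context-free) productions S -> aSb and S -> ε:
prods = [("S", "aSb"), ("S", "")]
print(set(one_step_rewrites("aSb", prods)))  # {'aaSbb', 'ab'}
```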

[Diagram: the four levels of the hierarchy, pairing each language class with a rewriting system and a machine model]

| language | rewriting system | machine model |
| regular | type 3 | FA |
| context-free | type 2 | PDA |
| context-sensitive | type 1 | LBA |
| recursively enumerable | type 0 | Turing machines |

LBA: linear-bounded automata = NSPACE(n) Turing machines
PDA: pushdown automata
FA: finite automata

Page 3

[Diagram: the hierarchy with the PSPACE-complete languages marked inside the context-sensitive class, outside the context-free class.]

• Context-sensitive languages are computationally very complex: some context-sensitive languages are PSPACE-complete.

• There is a huge gap between context-free and context-sensitive.

• Many intermediate classes of languages have been considered in formal language theory.

• In particular, “mildly context-sensitive” grammars have been investigated by computational linguists.

Indexed Grammars (Hopcroft and Ullman 1979)

Page 4

[Diagram: the hierarchy with a new class labeled “indexed??” squeezed between context-free and context-sensitive.]

What type of rewriting system is an indexed grammar?

Indexed:
T → ABC   (production)
T[fg] ⇒ A[fg] B[fg] C[fg]   (one-step rewriting: the whole index stack fg is copied to each nonterminal daughter)

Type 0:
α → β   (production)
γαδ ⇒ γβδ   (one-step rewriting)

An indexed grammar is not an instance of a type 0 grammar.

Page 5

[Diagram: the hierarchy with the indexed languages placed between context-free and context-sensitive, and the NP-complete languages marked inside the indexed class.]

Indexed languages are also sort of complex.

APPLICABILITY OF INDEXED GRAMMARS TO NATURAL LANGUAGES

GERALD GAZDAR (University of Sussex)

From U. Reyle and C. Rohrer (eds.), Natural Language Parsing and Linguistic Theories, 69–94. © 1988 by D. Reidel Publishing Company.

1. INTRODUCTION

If we take the class of context-free phrase structure grammars (CF-PSGs) and modify it so that (i) grammars are allowed to make use of finite feature systems and (ii) rules are permitted to manipulate the features in arbitrary ways, then what we end up with is equivalent to what we started out with. Suppose, however, that we take the class of context-free phrase structure grammars and modify it so that (i) grammars are allowed to employ a single designated feature that takes stacks of items drawn from some finite set as its values, and (ii) rules are permitted to push items onto, pop items from, and copy the stack. What we end up with now is no longer equivalent to the CF-PSGs but is significantly more powerful, namely the indexed grammars (Aho, 1968). This class of grammars has been alluded to a number of times in the recent linguistic literature: by Klein (1981) in connection with nested comparative constructions, by Dahl (1982) in connection with topicalised pronouns, by Engdahl (1982) and Gazdar (1982) in connection with Scandinavian unbounded dependencies, by Huybregts (1984) and Pulman and Ritchie (1984) in connection with Dutch, by Marsh and Partee (1984) in connection with variable binding, and doubtless elsewhere as well.

Indexed grammars fall in between CF-PSGs and context-sensitive grammars in their generative capacity. Every context-free language (CFL) is an indexed language (IL), but not conversely. Thus aⁿbⁿcⁿ is an IL, for example, but it is not a CFL. And every IL is a context-sensitive language, but not conversely. Until recently, there were no good arguments to suggest that the natural languages (NLs) fell outside the CFLs (see Pullum and Gazdar, 1982, for defense of this claim), but work by Culy (1985), Huybregts (1984) and Shieber (1985) does now indicate rather strongly that they do. However, no NL phenomena are known which would imply that NLs exist which fall outside the indexed class (see Gazdar and Pullum, 1985, for a survey).

Hopcroft and Ullman write that "of the many generalizations of context-free grammars that have been proposed, a class called 'indexed' appears the most natural, in that it arises in a wide variety of contexts" (1979:389). One purpose of the present paper is to make indexed grammars more accessible to linguists and computational linguists, and more directly relevant to their concerns. It assumes some passive competence in mathematical linguistics, but not very much. In accord with the purpose just mentioned, I shall present grammars informally as sets of rules, rather than as the official n-tuples. The start symbol will always be "S", and the relevant sets of terminals, indices, and nonterminals can simply be inferred from looking at what appears in the rules.

2. THE FORM OF RULES: A STACK-ORIENTED NOTATION

In exhibiting schematic rules and trees below, I shall maintain a number of orthographic conventions: nonterminal symbols are indicated by upper case letters (A, B, C), terminal symbols by lower case letters (a, b, c), possibly empty strings of terminals and nonterminals by W, W1, W2, etc., indices by lower case italic letters (i, j, k), and stacks of indices by square brackets and periods ([], [..], [i,..]), where [i,..] is a stack whose topmost index is i, [] is an empty stack, and [..] is a possibly empty stack of indices. As for the rules themselves, Aho (1968) uses one notation, and Hopcroft and Ullman (1979) use another. The notation used below is essentially just a redundant, and intendedly more perspicuous, variant of that employed by Hopcroft and Ullman.

In the standard formulations, an indexed grammar can contain rules of three different sorts:

(1) i. A[..] → W[..]
    ii. A[..] → B[i,..]
    iii. A[i,..] → W[..]

I shall refer to rules that have one or other of these three forms as H&U rules. The first type of rule simply copies the stack to all nonterminal daughters. The second type of rule pushes a new index onto the stack handed down to its unique nonterminal daughter. And the third type of rule pops an index off the stack and distributes what is left to its nonterminal daughters. In the rules, A and B are nonterminal symbols (not necessarily distinct) and W is a string of terminal and/or nonterminal symbols. A compound symbol of the form A[..] means that the nonterminal A bears the stack [..]. A compound symbol of the form W[..] stands for a string of terminal and/or nonterminal symbols, each nonterminal symbol of which bears the stack [..]. Terminal symbols cannot bear stacks. Thus, if W = BcDe, then W[..] = B[..]c D[..]e. Stacks of indices are thus associated with nonterminals and transmitted by rules. Indeed, the ILs are exactly characterised by a class of automata known as (one-way nondeterministic) nested stack automata (Aho, 1969). The indices that make up stacks are drawn from some finite vocabulary, though the stacks themselves are not bounded in size. Any upper bound on the size of stacks restricts the class of grammars to the context-free class.

The types of rule shown in (1) are just those permitted in Hopcroft and Ullman's definition of the class of grammars -- what I shall refer to as the standard definition. Notice that this definition (a) copies all or most of the


stack to all nonterminal daughters in all rule types, and (b) restricts additions to the stack to cases where the mother category has but a single daughter. Neither aspect of the definition appears to be essential, and things are probably only done that way in order to facilitate doing proofs. Suppose we allow the rule types shown in (2) in addition to those listed in (1):

(2) i. A[..] → W1[] B[..] W2[]
    ii. A[..] → W1[] B[i,..] W2[]
    iii. A[i,..] → W1[] B[..] W2[]

Rules of type (2.i) would allow the stack to be carried down onto a single designated nonterminal daughter, and rules of type (2.ii) and (2.iii) would also allow such a designated daughter to have its stack incremented or decremented, respectively. As shown in the appendix, permitting such additional rule types has no consequences for the class of languages such grammars can generate.

3. SOME EXAMPLE GRAMMARS

The best way to see how indexed grammars work is to look at an example, such as (3), which shows a grammar for aⁿbⁿ.

(3) S[..] → aA[z,..]
    A[..] → aA[a,..]
    A[..] → B[..]
    B[a,..] → bB[..]
    B[z,..] → b

This looks more complicated than it actually is. All it does is generate trees such as that shown in (4):

(4) [Derivation tree generated by (3): the A nodes produce a’s while pushing indices onto the stack; the B nodes produce b’s while popping them, ending at B[z].]

As we go down the tree, terminal a’s are produced, and index a’s are loaded onto the stack. When we get to the middle, signalled by the change of category from A to B, then terminal b’s are produced, and index a’s are removed from the stack. The stack thus records how many a’s were originally produced, and this record can then be used to produce exactly as many b’s. One slight complication is the use of the end-marker index z. This is necessary because the standard formulations of indexed grammars allow a non-empty stack to simply disappear if the category the stack appears on only has terminal daughters. We could avoid the need for end-marker indices of this kind by requiring that the stack distributed over daughters must be empty when every daughter is terminal. As shown in the appendix, such a constraint would not affect the class of languages generated, but it does simplify the formulation of certain kinds of grammar. Thus we could simplify the grammar in (3) above by replacing each occurrence of z with a. When I come to discuss certain abstract tree configurations later in the paper, I shall simply ignore end-marker indices.

Notice that (4) is a right-linear (and hence finite state) tree, and that …

Gazdar 1988
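As a quick sanity check on Gazdar’s grammar (3), here is a minimal Python sketch (an illustration; the encoding and the names expand/derive are made up) that carries index stacks on nonterminals and enumerates the strings the grammar derives:

```python
def expand(symbol):
    """One-step expansions of a (nonterminal, stack) pair under grammar (3);
    stacks are tuples with the topmost index first."""
    nt, stack = symbol
    out = []
    if nt == "S":
        out.append(["a", ("A", ("z",) + stack)])      # S[..] -> a A[z,..]
    elif nt == "A":
        out.append(["a", ("A", ("a",) + stack)])      # A[..] -> a A[a,..]
        out.append([("B", stack)])                    # A[..] -> B[..]
    elif nt == "B" and stack:
        if stack[0] == "a":
            out.append(["b", ("B", stack[1:])])       # B[a,..] -> b B[..]
        if stack[0] == "z":
            out.append(["b"])                         # B[z,..] -> b
    return out

def derive(form, max_len=8):
    """All terminal strings of length <= max_len derivable from `form`,
    a list of terminal characters and (nonterminal, stack) pairs."""
    if sum(1 for s in form if isinstance(s, str)) > max_len:
        return
    for i, sym in enumerate(form):
        if isinstance(sym, tuple):                    # expand leftmost nonterminal
            for repl in expand(sym):
                yield from derive(form[:i] + repl + form[i + 1:], max_len)
            return
    yield "".join(form)

print(sorted(set(derive([("S", ())])), key=len))      # ['ab', 'aabb', 'aaabbb', 'aaaabbbb']
```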


The productions in an Indexed Grammar can be written as (refer [Gazdar 85a]; the original definition of Indexed Grammars appears in [Aho 68])

X[..] → X1[i,..] … Xn[i,..]   (push), or
X[i,..] → X1[..] … Xn[..]   (pop)

In an Indexed Grammar, a stack of symbols (called the indices) is associated with nonterminals during derivations. [..] is used to represent the present stack associated with a nonterminal. In the first production the parent (lhs of the production) distributes copies of its stack to its children after pushing the symbol i at the top of the stack. Thus, we can see how the stack information is shared by all the symbols in the right-hand side of the production.

In an LIG the productions have the form

X[i,..] → X1[] … Xj[..] … Xn[]   (pop), or …

LIG differs from IG in that the stack is passed on to only one of the children. [] corresponds to a new but empty stack. The productions of LIG’s can be generalized to be of the form …


Vijayashanker 1987

Page 6

“Convergence” of Mildly Context-Sensitive Grammar Formalisms

TAG ≡ LIG ≡ HG

Vijay-Shanker and Weir 1994

TAG: tree-adjoining grammar; LIG: linear indexed grammar; HG: head grammar

[Diagram: the hierarchy with the linear indexed languages placed between context-free and context-sensitive, labeled “type 1.5?”.]

There is a book where the author calls the linear indexed grammars “type 1.5”.

Linear indexed:
T[] → A B[] C   (production)
T[fg] ⇒ A B[fg] C   (one-step rewriting: the stack is passed to the single designated daughter)

Type 0:
α → β   (production)
γαδ ⇒ γβδ   (one-step rewriting)

A linear indexed grammar is not an instance of a type 0 grammar.
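The contrast with the indexed case can be made concrete with a tiny sketch (illustrative names lig_step/ig_step, stacks again as tuples): under the LIG production only the designated daughter B inherits the stack, whereas the indexed production copies it to every daughter.

```python
def lig_step(stack):
    """T[..] -> A B[..] C: only the designated daughter B gets the stack."""
    return ["A", ("B", stack), "C"]

def ig_step(stack):
    """Indexed analogue T -> ABC: every nonterminal daughter copies the stack."""
    return [("A", stack), ("B", stack), ("C", stack)]

print(lig_step(("f", "g")))  # ['A', ('B', ('f', 'g')), 'C']
print(ig_step(("f", "g")))   # [('A', ('f', 'g')), ('B', ('f', 'g')), ('C', ('f', 'g'))]
```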

Page 7

[Diagram: the hierarchy with both the linear indexed and the indexed languages between context-free and context-sensitive.]

There are actually almost no other grammar formalisms that are instances of Chomsky’s type 0 grammars.

Thue → Post → Chomsky

…ments may provide useful, in fact, compelling evidence for such a theory.

To avoid what has been a continuing misunderstanding, it is perhaps worth while to reiterate that a generative grammar is not a model for a speaker or a hearer. It attempts to characterize in the most neutral possible terms the knowledge of the language that provides the basis for actual use of language by a speaker-hearer. When we speak of a grammar as generating a sentence with a certain structural description, we mean simply that the grammar assigns this structural description to the sentence. When we say that a sentence has a certain derivation with respect to a particular generative grammar, we say nothing about how the speaker or hearer might proceed, in some practical or efficient way, to construct such a derivation. These questions belong to the theory of language use — the theory of performance. No doubt, a reasonable model of language use will incorporate, as a basic component, the generative grammar that expresses the speaker-hearer's knowledge of the language; but this generative grammar does not, in itself, prescribe the character or functioning of a perceptual model or a model of speech production. For various attempts to clarify this point, see Chomsky (1957), Gleason (1961), Miller and Chomsky (1963), and many other publications.

Confusion over this matter has been sufficiently persistent to suggest that a terminological change might be in order. Nevertheless, I think that the term "generative grammar" is completely appropriate, and have therefore continued to use it. The term "generate" is familiar in the sense intended here in logic, particularly in Post's theory of combinatorial systems. Furthermore, "generate" seems to be the most appropriate translation for Humboldt's term erzeugen, which he frequently uses, it seems, in essentially the sense here intended. Since this use of the term "generate" is well established both in logic and in the tradition of linguistic theory, I can see no reason for a revision of terminology.

Chomsky 1965

The Journal of Symbolic Logic, Volume 12, Number 1, March 1947

RECURSIVE UNSOLVABILITY OF A PROBLEM OF THUE

EMIL L. POST

Alonzo Church suggested to the writer that a certain problem of Thue [6]¹ might be proved unsolvable by the methods of [5]. We proceed to prove the problem recursively unsolvable, that is, unsolvable in the sense of Church [1], but by a method meeting the special needs of the problem.

Thue's (general) problem is the following. Given a finite set of symbols a1, a2, …, aμ, we consider arbitrary strings (Zeichenreihen) on those symbols, that is, rows of symbols each of which is in the given set. Null strings are included. We further have given a finite set of pairs of corresponding strings on the ai's, (A1, B1), (A2, B2), …, (An, Bn). A string R is said to be a substring of a string S if S can be written in the form URV, that is, S consists of the letters, in order of occurrence, of some string U, followed by the letters of R, followed by the letters of some string V. Strings P and Q are then said to be similar if Q can be obtained from P by replacing a substring Ai or Bi of P by its correspondent Bi, Ai. Clearly, if P and Q are similar, Q and P are similar. Finally, P and Q are said to be equivalent if there is a finite set R1, R2, …, Rν of strings on a1, …, aμ such that in the sequence of strings P, R1, R2, …, Rν, Q each string except the last is similar to the following string. It is readily seen that this relation between strings on a1, …, aμ is indeed an equivalence relation. Thue's problem is then the problem of determining for arbitrarily given strings A, B on a1, …, aμ whether, or no, A and B are equivalent.

This problem, at least for the writer, is more readily placed if it is restated in terms of a special form of the canonical systems of [3]. In that notation, strings C and D are similar if D can be obtained from C by applying to C one of the following operations:

PAiQ produces PBiQ,  PBiQ produces PAiQ,  i = 1, 2, …, n.  (1)

In these operations the operational variables P, Q represent arbitrary strings. Strings A and B will then be equivalent if B can be obtained from A by starting with A, and applying in turn a finite sequence of operations (1). That is, A and B are equivalent if B is an assertion in the "canonical system"² with primitive assertion A and operations (1). Thue's general problem thus becomes the decision problem for the class of all canonical systems of this "Thue type."

This general problem could easily be proved recursively unsolvable if, instead of the pair of operations for each i of (1), we merely had the first operation of each pair.³ In fact, by direct methods such as those of [3], we easily reduce the decision problem of an arbitrary "normal system" [3] to the decision problem of such a system of "semi-Thue type," the known recursive unsolvability of the …

Received October 26, 1946. Presented to the American Mathematical Society November 2, 1946.
¹ Numbers in brackets refer to the bibliography at the end of the paper.
² Null assertions, however, now being allowed.
³ That is, using the language of propositions instead of operations, if we merely had an implication where (1) has an equivalence.

How did Chomsky come to use semi-Thue systems as the most general type of grammars? I believe Post (1947) coined the term “semi-Thue system”.

Post Canonical Systems

FORMAL REDUCTIONS OF THE GENERAL COMBINATORIAL DECISION PROBLEM.*

By EMIL L. POST.

1. Introduction. It is not new to the literature that the usual form of a symbolic logic with its parenthesis notation and infinite set of variables can be transformed into one in which the enunciations, i.e., formulas of the system, are finite sequences of letters,¹ the different letters constituting a once-and-for-all given finite set. If the primitive letters of such a system are represented by a1, a2, …, ap, an arbitrary enunciation of the system will take the form ai1 ai2 … ail, l = 1, 2, 3, …, ij = 1, 2, …, p. In describing the basis of such a system it is convenient to use new letters to represent finite sequences of the above primitive letters. If then A, B, …, E represent the sequences ai1 ai2 …, aj1 aj2 …, …, am1 am2 … respectively, AB…E will represent the sequence ai1 ai2 … aj1 aj2 … … am1 am2 ….

We shall say that such a system is in canonical form if its basis has the following structure.² The primitive assertions of the system are a specified finite set of enunciations of the above form. The operations of the system are a specified finite set of productions, each of the following form:

g11 P11 g12 P12 … g1m1 P1m1 g1(m1+1)
g21 P21 g22 P22 … g2m2 P2m2 g2(m2+1)
. . . . . . . . . .
gk1 Pk1 gk2 Pk2 … gkmk Pkmk gk(mk+1)
  produce
g1 Pi1 g2 Pi2 … gm Pim g(m+1)

* Received November 14, 1941; Revised April 11, 1942.
¹ More exactly, "strings" of "marks," to use terms of C. I. Lewis (A Survey of Symbolic Logic, Berkeley, 1918: chapter VI, sec. III).
² This formulation stems from the "Generalization by Postulation" of the writer's "Introduction to a general theory of elementary propositions," American Journal of Mathematics, vol. 43 (1921), pp. 163-185 (see p. 176). We take this opportunity to make the following Emendation: Lemma 1 thereof (pp. 177-178) requires the added condition that the expressions replacing the r's do not involve any letter upon which a substitution is made in the given deductive process. This necessitates several minor changes in the proof of the theorem there following. Actually, both Lemma 1 and its companion Lemma 2 admit of further simplification, with the proof of the theorem then being valid as it stands.


An indexed grammar is an instance of a Post canonical system.

Page 8

Why Didn’t Chomsky Use Post Canonical Systems?

…the production whenever it contains the enunciations represented by the several premises of the production.

A very special case of the canonical form is what we term the normal form. A system in canonical form will be said to be in normal form if it has but one primitive assertion, and each of its productions is in the form

gP
  produces
Pg′.

The main purpose of the present paper is to demonstrate that every system in canonical form can formally be reduced to a system in normal form. The two forms may therefore in fact be said to be equipotent. More precisely, we prove the following

THEOREM. Given a system in canonical form with primitive letters a1, a2, …, ap, a system in normal form with primitive letters a1, a2, …, ap, a1′, a2′, …, aσ′ can be set up such that the assertions of the system in canonical form are exactly those assertions of the system in normal form which involve no other letters than a1, a2, …, ap.

As a result of this theorem the decision problem for a system in canonical form is reduced to the decision problem for the corresponding system in normal form. For an enunciation of the former system is an assertion when and only when it is an assertion of the latter system. Hence any procedure which could effectively determine for an arbitrary enunciation of the system in normal form whether it is or is not an assertion thereof would automatically do the same for the system in canonical form. Now by methods such as those referred to in the opening sentence of this introduction, it can be shown that the problem of determining for an arbitrary well-formed formula in the λ-calculus of Church whether it has or has not a normal form (Church⁵) can be reduced to the decision problem for a particular system in our canonical form. While Church has proved the above problem unsolvable in a certain technical sense, in the interest of economy we invoke his identification of λ-definability with effective calculability to conclude that as a result the decision problem for that particular system in canonical form, and hence for the class of systems in canonical form, is unsolvable. We are thus led to the more surprising result…

⁵ Alonzo Church, "An unsolvable problem of elementary number theory," American Journal of Mathematics, vol. 58 (1936), pp. 345–363.

Post 1943

Perhaps because of this striking result?

[Diagram: the hierarchy once more, with the linear indexed and indexed languages between context-free and context-sensitive.]

Should Context-Free Grammars Be Thought of as String Rewriting Systems?

[S [NP Kim] [VP [V saw] [NP Sandy]]]

S ⇒ NP VP ⇒ Kim VP ⇒ Kim V NP ⇒ Kim saw NP ⇒ Kim saw Sandy

S ⇒ NP VP ⇒ Kim VP ⇒ Kim V NP ⇒ Kim V Sandy ⇒ Kim saw Sandy

S ⇒ NP VP ⇒ NP V NP ⇒ Kim V NP ⇒ Kim saw NP ⇒ Kim saw Sandy

S ⇒ NP VP ⇒ NP V NP ⇒ NP saw NP ⇒ Kim saw NP ⇒ Kim saw Sandy

S ⇒ NP VP ⇒ NP V NP ⇒ NP V Sandy ⇒ Kim V Sandy ⇒ Kim saw Sandy

S ⇒ NP VP ⇒ NP V NP ⇒ Kim V NP ⇒ Kim V Sandy ⇒ Kim saw Sandy

S ⇒ NP VP ⇒ NP V NP ⇒ NP saw NP ⇒ NP saw Sandy ⇒ Kim saw Sandy

S ⇒ NP VP ⇒ NP V NP ⇒ NP V Sandy ⇒ NP saw Sandy ⇒ Kim saw Sandy

It is silly to define well-formed sentences in terms of derivations if what we really want is to assign tree structures to sentences.

Page 9

Bottom-up Interpretation of Context-Free Rules

If Bi ⇒G* vi (i = 1, …, n) and A → w0 B1 w1 … Bn wn ∈ P, then A ⇒G* w0 v1 w1 … vn wn.

L(G) = { w ∈ Σ* | S ⇒G* w }

All we need is the subrelation of ⇒G*

restricted to N × Σ*.Just a general type of inductive definition.

Inductive Definition = CFG Interpreted Bottom-up

CFGs as Logic Programs on Strings

A → w0 B1 w1 … Bn wn

A(w0 x1 w1 … xn wn) ← B1(x1),…,Bn(xn)

Horn clause

L(G) = { w ∈ Σ* | G ⊢ S(w) }
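A minimal sketch of this reading in Python (an illustration; the rule encoding and the function name derivable are made up): each rule A → w0 B1 w1 … Bn wn acts as the clause A(w0 x1 w1 … xn wn) ← B1(x1), …, Bn(xn), and the derivable facts are computed as a least fixpoint, cut off at a maximum string length so the loop terminates.

```python
from itertools import product

# Rules encoded as A -> [w0, B1, w1, ..., Bn, wn]: even positions are
# terminal chunks, odd positions are nonterminals (an assumed encoding).
RULES = {
    "S": [["", "NP", " ", "VP", ""]],
    "NP": [["Kim"], ["Sandy"]],
    "VP": [["", "V", " ", "NP", ""]],
    "V": [["saw"]],
}

def derivable(max_len):
    """Least fixpoint of the clauses A(w0 x1 w1 ... xn wn) <- B1(x1), ..., Bn(xn),
    keeping only facts A(w) with len(w) <= max_len so the loop terminates."""
    facts = {nt: set() for nt in RULES}
    changed = True
    while changed:
        changed = False
        for a, bodies in RULES.items():
            for body in bodies:
                chunks, nts = body[0::2], body[1::2]
                for args in product(*(facts[b] for b in nts)):
                    w = chunks[0] + "".join(x + c for x, c in zip(args, chunks[1:]))
                    if len(w) <= max_len and w not in facts[a]:
                        facts[a].add(w)
                        changed = True
    return facts

print(derivable(15)["S"])  # the four 'X saw Y' sentences, X, Y in {Kim, Sandy}
```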

Page 10

Derivation = Proof Tree

S(Kim saw Sandy)
  NP(Kim)
  VP(saw Sandy)
    V(saw)
    NP(Sandy)

G ⊢ S(Kim saw Sandy)

It’s a small step to use n-ary predicates for n ≥ 2.

Multiple Context-Free Grammars

A(α1,…,αq) ← B1(x1,1,…,x1,q1),…,Bn(xn,1,…,xn,qn)

n ≥ 0; q, qi ≥ 1
αk ∈ (Σ ∪ { xi,j | i ∈ [1,n], j ∈ [1,qi] })*
each xi,j occurs exactly once in (α1,…,αq)

• q = dim(A) (dimension of A)
• dim(S) = 1
• L(G) = { w ∈ Σ* | G ⊢ S(w) }

It’s best to think of an MCFG as a kind of logic program. Each rule is a definite clause. Nonterminals are predicates on strings.

S(x1#x2) ← D(x1, x2)
D(ε, ε) ←
D(x1y1, y2x2) ← E(x1, x2), D(y1, y2)
E(ax1ā, āx2a) ← D(x1, x2)

{ w#w^R | w ∈ D1* }

Derivation tree (2-ary branching; the grammar is a 2-MCFG):

S(aaāāaā#āaāāaa)
  D(aaāāaā, āaāāaa)
    E(aaāā, āāaa)
      D(aā, āa)
        E(aā, āa)
          D(ε, ε)
        D(ε, ε)
    D(aā, āa)
      E(aā, āa)
        D(ε, ε)
      D(ε, ε)
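A minimal sketch of this 2-MCFG evaluated bottom-up as a logic program (an illustration; writing 'a' for a and 'A' for ā, with the function name mcfg_facts made up), again cut off at a maximum component length so the fixpoint loop terminates:

```python
from itertools import product

def mcfg_facts(max_len):
    """Bottom-up evaluation of the 2-MCFG above for { w#w^R | w in D1* },
    writing 'a' for a and 'A' for a-bar; components capped at max_len."""
    D, E = {("", "")}, set()                             # D(ε, ε) <-
    changed = True
    while changed:
        changed = False
        new_E = {("a" + x1 + "A", "A" + x2 + "a")        # E(a x1 ā, ā x2 a) <- D(x1, x2)
                 for (x1, x2) in D if len(x1) + 2 <= max_len}
        new_D = {(x1 + y1, y2 + x2)                      # D(x1 y1, y2 x2) <- E(x1, x2), D(y1, y2)
                 for (x1, x2), (y1, y2) in product(E, D)
                 if len(x1 + y1) <= max_len}
        if not (new_E <= E and new_D <= D):
            E |= new_E
            D |= new_D
            changed = True
    return {x1 + "#" + x2 for (x1, x2) in D}             # S(x1 # x2) <- D(x1, x2)

print(sorted(mcfg_facts(4)))  # ['#', 'aA#Aa', 'aAaA#AaAa', 'aaAA#AAaa']
```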

Page 11

S(x1…xm) ← A(x1,…,xm)
A(ε,…,ε) ←

A(a1 x1 a2,…,a2m−1 xm a2m) ← A(x1,…,xm)

non-branching m-MCFG

{ a1^n a2^n … a2m−1^n a2m^n | n ≥ 0 } ∈ m-MCFL ∖ (m−1)-MCFL

1-MCFL = CFL ⊊ 2-MCFL

Seki et al. 1991
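Since the grammar above is non-branching, each n determines a unique derivation; a minimal sketch (illustrative helper name m_mcfg_string) reads the string a1^n … a2m^n off that derivation:

```python
def m_mcfg_string(alphabet, n):
    """The unique string derived with n uses of the loop rule of the
    non-branching m-MCFG above; `alphabet` lists a1, ..., a2m in order."""
    m = len(alphabet) // 2
    args = [""] * m                       # A(ε, ..., ε) <-
    for _ in range(n):                    # A(a1 x1 a2, ..., a2m-1 xm a2m) <- A(x1, ..., xm)
        args = [alphabet[2 * i] + x + alphabet[2 * i + 1] for i, x in enumerate(args)]
    return "".join(args)                  # S(x1 ... xm) <- A(x1, ..., xm)

print(m_mcfg_string("abcd", 3))           # 'aaabbbcccddd' (m = 2, n = 3)
```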

MCFGs as Logic Programs on Strings

[Cover of Annius Groenink’s dissertation “Surface without Structure: word order and tractability issues in natural language analysis”, showing the cross-serial dependency examples:]

…, daß Frank Julia Fred schwimmen helfen sah
…that Frank saw Julia help Fred swim
…dat Frank Julia Fred zag helpen zwemmen

Page 12

Elementary Formal Systems (Smullyan 1961)

“Elementary Formal Systems can be looked at as variants of the canonical languages of Post …. The reason for our choice of elementary formal systems, in lieu of Post’s canonical systems, is that their structure is more simply described, and their techniques are more easily applied. The general notion of “production” used in Post systems is replaced by the simpler logistic rules of substitution and detachment (modus ponens), which are more easily formalized.”

Smullyan 1961

Chomsky Hierarchy

| Rewriting Systems | Machines | Logic Programs on Strings | Languages |
| Type 0 | Turing | Elementary Formal Systems (Smullyan 1961) | r.e. |
| Type 1 | LBA | Length-Bounded EFS (Arikawa et al. 1989) | CSL = NSPACE(n) |
| | Poly-time Turing | Simple LMG (Groenink 1997) / Hereditary EFS (Ikeda and Arimura 1997) | P |
| | | MCFG | MCFL |
| Type 2 | PDA | Simple EFS (Arikawa 1970) | CFL |
| Type 3 | FA | | Reg |

Page 13

[Diagram: a refined hierarchy: regular ⊂ context-free ⊂ head ⊂ multiple context-free ⊂ context-sensitive ⊂ recursively enumerable.]

Chomsky on Mathematical Linguistics (1979)

Chomsky (2004): Generative Enterprise Revisited

. . .

A long quote, from an interview held in the year when Hopcroft and Ullman’s textbook was published.

Page 14

Summary

• The four levels of the Chomsky hierarchy are not of equal importance.

• The choice of semi-Thue systems as the most general form of grammar hampered investigation of some important language classes.

• Smullyan’s elementary formal system provides the right level of generality and simplicity.

• Bottom-up semantics of rules is basic.

• Natural generalization of “inductive definitions”.

What is the Significance of Grammar Formalisms?

• Abstract conception of grammars.

• Makes possible investigation of algorithmic properties of grammars.

• Should have good formal properties (like familiar closure properties).

• The choice of a grammar formalism should not be thought of as a tight characterization of possible human grammars.

• A grammar formalism does not have explanatory power.