american mathematical society - titles in this series8 simon gindikin, editor, mathematical methods...


Page 1: American Mathematical Society - Titles in This Series8 Simon Gindikin, Editor, Mathematical Methods of Analysis of Biopolymer Sequences 7 Lyle A. McGeoch and Daniel D. Sleator, Editors,

Titles in This Series

17 Eric Sven Ristad, Editor, Language Computations
16 Panos M. Pardalos and Henry Wolkowicz, Editors, Quadratic Assignment and Related Problems
15 Nathaniel Dean and Gregory E. Shannon, Editors, Computational Support for Discrete Mathematics
14 Robert Calderbank, G. David Forney, Jr., and Nader Moayeri, Editors, Coding and Quantization: DIMACS/IEEE Workshop
13 Jin-Yi Cai, Editor, Advances in Computational Complexity Theory
12 David S. Johnson and Catherine C. McGeoch, Editors, Network Flows and Matching: First DIMACS Implementation Challenge
11 Larry Finkelstein and William M. Kantor, Editors, Groups and Computation
10 Joel Friedman, Editor, Expanding Graphs
9 William T. Trotter, Editor, Planar Graphs
8 Simon Gindikin, Editor, Mathematical Methods of Analysis of Biopolymer Sequences
7 Lyle A. McGeoch and Daniel D. Sleator, Editors, On-Line Algorithms
6 Jacob E. Goodman, Richard Pollack, and William Steiger, Editors, Discrete and Computational Geometry: Papers from the DIMACS Special Year
5 Frank Hwang, Clyde Monma, and Fred Roberts, Editors, Reliability of Computer and Communication Networks
4 Peter Gritzmann and Bernd Sturmfels, Editors, Applied Geometry and Discrete Mathematics, The Victor Klee Festschrift
3 E. M. Clarke and R. P. Kurshan, Editors, Computer-Aided Verification '90
2 Joan Feigenbaum and Michael Merritt, Editors, Distributed Computing and Cryptography
1 William Cook and Paul D. Seymour, Editors, Polyhedral Combinatorics


Language Computations


DIMACS Series in Discrete Mathematics
and Theoretical Computer Science

Volume 17

Language Computations

DIMACS Workshop on Human Language
March 20-22, 1992

Eric Sven Ristad
Editor

NSF Science and Technology Center in Discrete Mathematics and Theoretical Computer Science
A consortium of Rutgers University, Princeton University, AT&T Bell Labs, Bellcore

American Mathematical Society

https://doi.org/10.1090/dimacs/017


The DIMACS Workshop on Human Language was held at DIMACS, March 20-22, 1992.

1991 Mathematics Subject Classification. Primary 92K20; Secondary 68Q25, 68S05, 68T05.

ABSTRACT. The goal of this volume is to advance our understanding of the computations that underlie the comprehension, production, and acquisition of human languages. The focus of the volume is on computations related to the sound and meaning of words.

Library of Congress Cataloging-in-Publication Data

Language computations / Eric Sven Ristad, editor.
p. cm. -- (DIMACS series in discrete mathematics and theoretical computer science; v. 17)
Includes bibliographical references.
ISBN 0-8218-6608-7
1. Natural language processing (Computer science) 2. Computational complexity. I. Ristad, Eric Sven. II. Series.
QA76.9.N38L36 1994
415/.01/5113--dc20 94-28045
CIP

Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy an article for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given.

Republication, systematic copying, or multiple reproduction of any material in this publication (including abstracts) is permitted only under license from the American Mathematical Society. Requests for such permission should be addressed to the Manager of Editorial Services, American Mathematical Society, P.O. Box 6248, Providence, Rhode Island 02940-6248. Requests can also be made by e-mail to reprint-permission@math.ams.org.

The appearance of the code on the first page of an article in this publication indicates the copyright owner's consent for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law, provided that the fee of $1.00 plus $.25 per page for each copy be paid directly to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, Massachusetts 01923. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale.

© Copyright 1994 by the American Mathematical Society. All rights reserved.
The American Mathematical Society retains all rights except those granted to the United States Government.
Printed in the United States of America.

The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability.

Printed on recycled paper.

All articles in this volume were printed from copy prepared by the authors.
Some of the articles in this volume were typeset by the authors using AMS-LaTeX, the American Mathematical Society's TeX macro system.

10 9 8 7 6 5 4 3 2 1    99 98 97 96 95 94


To Morris Halle


Contents

Foreword  xi

Preface  xiii

Part I. Phonetics

C/D model: A computational model of phonetic implementation
OSAMU FUJIMURA  1

Relating phonetic and phonological categories
ANDRAS KORNAI  21

Part II. Metrical Phonology

General properties of stress and metrical structure
MORRIS HALLE AND WILLIAM IDSARDI  37

Acquiring stress systems
B. ELAN DRESHER  71

Metrical consistency
LUIGI BURZIO  93

Part III. Learning Frameworks

Inductive reasoning
MING LI AND PAUL VITANYI  127

Language acquisition in the MDL framework
JORMA RISSANEN AND ERIC SVEN RISTAD  149

Part IV. Morphology

Parsing morphology: "Factoring" words
STEPHEN R. ANDERSON  167

Complexity of morpheme acquisition
ERIC SVEN RISTAD  185


Foreword

This DIMACS volume on Language Computations contains refereed versions of papers from a workshop held at DIMACS, March 20-22, 1992. The workshop was sponsored by DIMACS, through a grant from the National Science Foundation.

We especially thank the workshop's organizer, Eric Ristad, for organizing the workshop, which brought together so many outstanding participants in this field.

Andras Hajnal, Director

Andrew Yao, Co-Director

Stephen Mahaney, Associate Director


PREFACE

A language computation is a computation that underlies the comprehension, production, or acquisition of human language. Language comprehension is a computation that maps a physical sensation to a representation of the language user's knowledge about that sensation. Language production is a computation that maps a language user's linguistic intention to the complex motor activity that expresses that intention. Language acquisition is a computation that maps a stream of linguistic evidence to an internal representation of the human language from which that evidence was culled. These computations lie at the very heart of human language. The goal of this volume is to advance our understanding of these language computations. Our focus is on computations related to the sounds and words of a language.

Our volume begins in part I with an investigation of the sensory-motor representation of speech sounds (phonetics). Osamu Fujimura opens this part with a revolutionary new theory of how the primitive sensory-motor goals of the speaker are expressed in an acoustic signal. Next, Andras Kornai presents an abstract mathematical framework for modeling the relation between the discrete representations attributed to the language user and the continuous physical events directly observable by the scientist. Together, these two articles present a comprehensive view of the relation between the phonetics and the acoustic signal.

In part II, we examine phonological stress, a phonological subsystem used to indicate constituency and hierarchy. Morris Halle and William Idsardi begin by presenting an explicit theory of the language user's knowledge of phonological stress. Their theory explains how stress is represented and computed in the language user, and it enumerates the humanly possible stress systems. Next, Elan Dresher proposes a resource-limited algorithm capable of acquiring the Halle and Idsardi stress system on the basis of limited phonological evidence. Finally, Luigi Burzio presents a radically different theory of phonological stress, one that relates the possible stress patterns of a language to the morphology of that language. These three articles each represent a significant advance in our understanding of phonological stress.

In part III, we consider the abstract question of how it is possible to reliably learn a human language in the face of noisy, incomplete, and inconsistent evidence. Ming Li and Paul Vitanyi present a general theory of how to reassign probability to a hypothesis space on the basis of empirical evidence. Next, Jorma Rissanen and Eric Ristad review the philosophy and practice of the Minimum Description Length (MDL) principle, and demonstrate its application to the metrical acquisition problem considered in part II. The central contribution of these two articles is to provide a powerful conceptual framework within which to address the logical problem of language acquisition.


In part IV, we study the relation between the sound and the meaning of words (morphology). Stephen Anderson introduces this part with a detailed explanation of the intricacy of morphological computations. Eric Ristad concludes with an explicit statement of the morphological acquisition problem, and a proof that the intricacies of this computation, as detailed by Anderson, straightforwardly give rise to NP-completeness. Traditionally, morphology has been viewed as a trivial component of linguistic knowledge, of little scientific or engineering importance. By detailing the empirical complexity of the morphology and demonstrating its corresponding computational demands, these two articles offer a serious challenge to traditional views of morphology.

This volume constitutes the official, refereed record of the DIMACS Workshop on Human Language, which was held March 20-22, 1992 at Princeton University. The workshop drew together many of the world's most prominent linguists, computer scientists, and learning theorists with the common goal of advancing our understanding of language computations. The workshop was sponsored by the NEC Research Institute and by the National Science Foundation, via grant number 91-19999 to the Center for Discrete Mathematics and Theoretical Computer Science (DIMACS).

The articles in this volume are directed toward the technically competent researcher with an interest in human language and in computation. No article in this volume requires an expertise in either linguistics or computer science. However, the reader may wish to strengthen his background in these areas by consulting the following introductory material: phonology and phonetics [3]; morphology [1]; computer science [4]; computational complexity theory [2]; computational theory of language [8]; inductive reasoning [5]; and minimum description length principle [6, 7].

REFERENCES

1. Laurie Bauer, Introducing linguistic morphology, Edinburgh University Press, Edinburgh, 1988.

2. Michael R. Garey and David S. Johnson, Computers and intractability, W.H. Freeman, New York, 1979.

3. Morris Halle, Phonology, Language: An Invitation to Cognitive Science (Cambridge) (Daniel N. Osherson and Howard Lasnik, eds.), MIT Press, Cambridge, 1990, pp. 43-68.

4. David Harel, Algorithmics: the spirit of computing, 2nd ed., Addison-Wesley, Reading, MA, 1992.

5. Ming Li and Paul Vitanyi, An introduction to Kolmogorov complexity and its applications, Springer-Verlag, New York, 1993.

6. Jorma Rissanen, Minimum description length principle, Encyclopedia of Statistical Sciences (Chichester) (S. Kotz and N.L. Johnson, eds.), vol. 5, John Wiley & Sons, Chichester, 1985, pp. 523-527.

7. Jorma Rissanen, Stochastic complexity, J. Royal Statistical Society, Series B 49 (1987), no. 3, 223-239 and 252-265.

8. Eric Sven Ristad, The language complexity game, MIT Press, Cambridge, MA, 1993.

ERIC SVEN RISTAD, PRINCETON, N.J.

E-mail address: [email protected] u
