
Rich Languages from Poor Inputs


Rich Languages from Poor Inputs

Edited by

MASSIMO PIATTELLI-PALMARINI AND

ROBERT C. BERWICK

OXFORD UNIVERSITY PRESS

OXFORD UNIVERSITY PRESS

Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries

© editorial matter and organization Massimo Piattelli-Palmarini and Robert C. Berwick 2013
© the chapters their several authors 2013

The moral rights of the authors have been asserted

First Edition published in 2013
Impression: 1

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this work in any other form and you must impose this same condition on any acquirer

British Library Cataloguing in Publication Data
Data available

ISBN 978-0-19-959033-9

Printed in Great Britain by
MPG Books Group, Bodmin and King's Lynn

Contents

Acknowledgments viii
The Authors ix

1 Introduction
Massimo Piattelli-Palmarini and Robert C. Berwick 1

Part I. Poverty of the Stimulus and Modularity Revisited

2 Poverty of the Stimulus Stands: Why Recent Challenges Fail
Robert C. Berwick, Noam Chomsky, and Massimo Piattelli-Palmarini 19

3 Children's Acquisition of Syntax: Simple Models are Too Simple
Xuan-Nga Cao Kam and Janet Dean Fodor 43

4 Poverty of the Stimulus: Willingness to be Puzzled
Noam Chomsky 61

5 Revisiting Modularity: Using Language as a Window to the Mind
Susan Curtiss 68

6 Every Child an Isolate: Nature's Experiments in Language Learning
Lila Gleitman and Barbara Landau 91

Part II. Discrepancies between Child Grammar and Adult Grammar

7 Recent Findings about Language Acquisition
Jean-Remy Hochmann and Jacques Mehler 107

8 Ways of Avoiding Intervention: Some Thoughts on the Development of Object Relatives, Passive, and Control
Adriana Belletti and Luigi Rizzi 115

9 Merging from the Temporal Input: On Subject-Object Asymmetries and an Ergative Language
Itziar Laka 127

10 Tough-Movement Developmental Delay: Another Effect of Phasal Computation
Ken Wexler 146

11 Assessing Child and Adult Grammar
Julie Anne Legate and Charles Yang 168

12 Three Aspects of the Relation between Lexical and Syntactic Knowledge
Thomas G. Bever 183

Part III. Broadening the Picture: Spelling and Reading

13 Children's Invented Spelling: What We Have Learned in Forty Years
Charles Read and Rebecca Treiman 195

14 How Insights into Child Language Change our Understanding of the Development of Written Language: The Unfolding Legacy of Carol Chomsky
Stephanie Gottwald and Maryanne Wolf 210

15 The Phonology of Invented Spelling
Wayne O'Neil 220

16 The Arts as Language: Invention, Opportunity, and Learning
Merryl Goldberg 227

Epilogue: Analytic Study of the Tadoma Method — Language Abilities of Three Deaf-Blind Subjects
Carol Chomsky 241

References 271
Index 307


I saw one cab flattened down to about one foot high. And my mechanics friend told me that the driver who got out of that cab that was squashed down by accident got out by a [narrow] escape.

(A person deaf-blind from nineteen months of age enthusiastically describing a recent field trip. Reported in Carol Chomsky, 1986a: 337. See also Gleitman and Landau, this volume p. 92.)


Acknowledgments

Although this volume, as explained in the Introduction, and as will be clear to the present reader, stands on its own as a highly integrated collection of essays, the initial occasion for its growth was a workshop of the same title held at MIT, under the auspices of the Department of Linguistics and Philosophy and the Laboratory for Information and Decision Systems, in December 2009 ('Rich Languages from Poor Inputs: A Workshop in Honor of Carol Chomsky'). Funds that made this workshop possible were generously provided by the National Science Foundation (under grant #0951620), a donation from the MIT Department of Linguistics and Philosophy, and grants from Oxford University Press and the Cognitive Science Program of the University of Arizona. We would like to extend special thanks for facilitating this workshop to Eric Potsdam, Linguistics Program Director of NSF at that time, and to Prof. Irene Heim, then Head of the Department of Linguistics and Philosophy at MIT. Additionally, Lynne Dell and Lisa Gaumond of MIT ensured that all details of the conference were taken care of. Finally, we express our special gratitude to Noam Chomsky, who affectionately and unfailingly assisted us in the preparation of the workshop and then of this volume. The remarkable success of that workshop and now, we fondly hope, of this book is also due to the active participation of members of the MIT Department of Linguistics and Philosophy and colleagues from many other universities in the lively discussions during the workshop. The tacit but pervasive impact of those discussions has contributed to making these chapters as interesting as they are.

MPP and RCB


The Authors

ADRIANA BELLETTI is Professor of Linguistics at the University of Siena. She studied at the Scuola Normale Superiore of Pisa and was a research affiliate at the Department of Linguistics of MIT; she taught Romance linguistics at the University of Geneva. Her research focuses on theoretical comparative syntax, comparative studies in first and second language acquisition, and atypical development, with regard to the syntax-discourse relation and the complexity of morphosyntactic derivations. Her recent publications include Structures and Strategies (Routledge, 2009).

ROBERT C. BERWICK is Professor of Computer Science and Computational Linguistics at the Massachusetts Institute of Technology. He has published more than a half-dozen books on the nature of language, language learnability, and computation, starting from his 1982 dissertation, The Acquisition of Syntactic Knowledge, to The Grammatical Basis of Linguistic Performance, Computational Complexity and Natural Language, and Principle-Based Parsing. Most recently, he has focused on the biology of language, particularly the evolution of language.

THOMAS BEVER is Regents' Professor of Linguistics, Psychology, Cognitive Science, Second Language Learning, and Education at the University of Arizona. He was in the first MIT PhD Linguistics class, working in phonology at that time. As a Harvard Junior Fellow, he then pursued graduate training in psychology. He has co-authored/edited six books, on general psycholinguistics, child development, animal cognition, sentence comprehension, and the relation between language and thought. His sustained research focus has been on models that reconcile and integrate statistical with symbolic/categorical processes in adult behavior and maturation. His current research includes a broad investigation of normal genetically controlled variation in the neurological organization for language and cognition: the primary case study involves right-handed people with and without heritable elements of left-handedness. He also holds five patents and patents pending on computational methods to improve text readability by implementing linguistic and informational structures in how the text is displayed.

NOAM CHOMSKY is Institute Professor (retired) at MIT, where he has taught since 1955. His work has focused on linguistics, cognitive science, philosophy, history of ideas, social and political theory, and contemporary affairs.

SUSAN CURTISS is Professor Emeritus of Linguistics at UCLA. Her research has focused on the issues of a 'Critical Period' for first language acquisition, the relationship of grammar as a mental faculty to non-linguistic cognition both in development and breakdown, and the ability of each isolated hemisphere of the brain to develop language. Professor Curtiss studied the famous case of 'Genie' as well as the case of Chelsea, a deaf woman exposed to language for the first time in her 30s, severely cognitively impaired children who despite pervasive retardation have selectively intact grammar, and adults with progressive dementia, who despite pervasive cognitive dissolution have remarkably spared grammatical function. She has studied the language of children with SLI (Specific Language Impairment) and adults with acquired aphasia. She has also studied the language of children, adolescents, and adults with sex chromosomal anomalies, e.g., Turner's syndrome (X) and Klinefelter's syndrome (XXY). Her recent work has focused on the effects of pediatric hemispherectomy—the removal of one entire hemisphere of the brain—for treatment of catastrophic epileptic diseases, on language acquisition, and on developing new techniques for evaluating grammar pre-operatively and intra-operatively.

JANET DEAN FODOR is Distinguished Professor of Linguistics at the Graduate Center of the City University of New York. Following a dissertation on semantics at MIT in 1970, her attention turned to psycholinguistics. She has contributed many papers on cross-linguistic aspects of sentence processing and language acquisition. Her most recent interests embrace the interface between syntactic structure and prosodic structure in silent reading, and computational models of syntactic parameter setting.

LILA GLEITMAN is Professor of Psychology and Linguistics at the University of Pennsylvania. Her areas of interest are language and mind with particular emphasis on universals of lexical structure and content. Relatedly, she studies language acquisition under varying input conditions, including cross-linguistic comparisons. Recent work includes studies of effects (and non-effects) of specific language encoding on cognitive representation and processing, and the interlocking influences of syntax and visual observation on growth of the lexicon.

MERRYL GOLDBERG, ED.D. is a Professor of Visual and Performing Arts at California State University San Marcos (CSUSM), where she teaches courses on Arts and Learning, and Music, and where she is founder and director of Center ARTES, a center dedicated to restoring arts to education. She has numerous publications including books, articles, chapters, editorials, and blogs, as well as grants from the National Endowment for the Arts, the Department of Education, Fulbright-Hays Foundation, and California Arts Council. Merryl was a student of Carol's at Harvard where she and Carol, along with several other Harvard and MIT faculty members, formed a musical group called 'Band in Boston'. The group performed regularly in Harvard Square, Cape Cod, and even once nationally on NPR. For over twenty years, she and Carol shared academics, music, friendship, and a mutual love for nature, especially on the Cape.

STEPHANIE GOTTWALD is the Assistant Director at the Center for Reading and Language Research at Tufts University. She is responsible for the administration of federally funded intervention studies for elementary-aged struggling readers and directing training workshops for educators on reading fluency and linguistics. She received her Master's degree in linguistics from Boston College in the Slavic and Eastern Languages Department and was the recipient of a Fulbright Scholarship to Germany. She is currently pursuing a PhD in literacy and child language acquisition at Tufts University.

JEAN-REMY HOCHMANN was born in France. He received his PhD in Cognitive Neuroscience from SISSA, the International School for Advanced Studies, Trieste, Italy. He now pursues his research in the Laboratory for Developmental Studies at Harvard University, studying speech processing in infants and its relation to other domains of cognition.

XUAN-NGA CAO KAM received her PhD in linguistics from the Graduate Center of the City University of New York in 2009 with a dissertation entitled 'Contributions of statistical induction to models of syntax acquisition'. She has been engaged in educational research projects in association with CUNY's Research Institute for the Study of Language in Urban Society. Her research has been reported at a number of conferences and has been published in Cognitive Science and the Proceedings of the Boston University Conference on Language Development. Xuan-Nga has taught extensively on various campuses of New York. Currently she is also the Production Director at Hotgrinds, Inc., which provides social media research to Fortune 500 companies.

ITZIAR LAKA received her PhD in 1990 from MIT with a dissertation entitled 'Negation in syntax: On the nature of functional categories and projections', published by Garland in 1994. She is currently Full Professor at the Department of Linguistics and Basque Studies at the University of the Basque Country, and director of The Bilingual Mind research group at the Elebilab Psycholinguistics Laboratory. Her current research combines theoretical linguistics and experimental methods from psycho/neurolinguistics to study the representation and processing of variable/invariable aspects of linguistic structure and bilingualism, with a focus on Basque and Spanish. She is the author of A Brief Grammar of Euskara: the Basque Language (1996), freely available on the web, and of a number of papers on theoretical and experimental linguistics.

BARBARA LANDAU is the Dick and Lydia Todd Professor of Cognitive Science at the Johns Hopkins University. Her areas of interest are spatial representation, language, and the relationship between these two systems of knowledge during development and in adulthood. Her work includes theoretical and empirical studies of the relationship between language and space in normally developing children, in the congenitally blind, and in people with Williams syndrome. Recent work emphasizes the roles of genes and brain structure in cognitive development.

JULIE ANNE LEGATE received her PhD from MIT in 2002, and she is now an Associate Professor at the University of Pennsylvania. Her research interests include syntactic theory, the syntax and morphology of understudied languages, and language acquisition.

JACQUES MEHLER is a member of the International School of Advanced Studies, Trieste, Italy. He arrived at this institution as a Professor in Cognitive Neuroscience. Mehler set up the Language, Cognition, and Development (LCD) Laboratory, which investigates how language is acquired, as well as higher mental processes. Before coming to Italy Mehler was a Directeur de Recherche at the CNRS in France. The language acquisition research being done at the LCD begins with the exploration of language precursors in neonates and the specialized mechanisms, used in the first months of life, that facilitate infants' path to language. LCD also explores how languages are acquired by infants raised with two languages from birth.

WAYNE O'NEIL is Professor of Linguistics at the Massachusetts Institute of Technology and Adjunct Lecturer on Human Development at Wheelock College/Boston. His most recent publications include Thinking Linguistically (Blackwell, 2008, with Maya Honda), articles in Language and Linguistics Compass, and (also with Maya Honda) two handbooks in the Indigenous Language Institute's 'Awakening our Languages' series. While at Harvard University in the mid-1960s, O'Neil was a member of Carol Chomsky's PhD dissertation committee, and during the 1980s he worked with Carol in an attempt to bring linguistics into the school curriculum through the Scientific Theory and Method Project within Harvard's Educational Technology Center.

MASSIMO PIATTELLI-PALMARINI is Professor of Cognitive Science at the University of Arizona and a member of the Department of Linguistics, the Cognitive Science Program, and the Department of Psychology. In October 1975 he organized the encounter between Jean Piaget and Noam Chomsky and in 1980 edited the proceedings (Language and Learning, Harvard University Press), now translated into eleven languages and the echoes of which still explicitly resonate in the present volume. In 2009, with Juan Uriagereka and Pello Salaburu, he edited the volume Of Minds and Language: A Dialogue with Noam Chomsky in the Basque Country (Oxford University Press). In 2010, with Jerry Fodor, he published the book What Darwin Got Wrong (Profile Books).

CHARLES READ is Professor of Linguistics Emeritus, and Dean Emeritus of the School of Education, at the University of Wisconsin-Madison. His research concerns the linguistic foundations of reading and writing, including studies of children's beginning spelling, auditory memory in adults of low literacy, and access to units of sound within syllables by readers of Chinese. He has also published work on acoustic phonetic analysis.

LUIGI RIZZI is Professor of Linguistics at the University of Siena. He was Associate Professor at MIT, Professor at the University of Geneva, and Visiting Professor at the École Normale Supérieure, Paris. His research interests are focused on theoretical and comparative syntax, with special reference to the theory of locality, the cartography of syntactic structures, and the study of language variation. He has also contributed to the study of language acquisition.

REBECCA TREIMAN is the Burke and Elizabeth High Baker Professor of Child Developmental Psychology at Washington University in St. Louis. Her research focuses on spelling, reading, and phonology. Many of her studies deal with spelling acquisition in typically developing learners of English. She has also examined spelling and reading in other languages and in children with deafness or dyslexia.

KEN WEXLER is Professor of Psychology and Linguistics in the Department of Brain and Cognitive Sciences and the Department of Linguistics and Philosophy at MIT. He works on theoretical and empirical studies of language learning and development, with a focus on syntax and semantics. His works include studies on learning and maturation of binding theory, chains, phases, verb movement, tense, null-subjects, and scope. He has a particular interest in the biolinguistic foundations of the field, with many studies of language impairment, including Specific Language Impairment, Williams syndrome, and Autism Spectrum Disorders.

MARYANNE WOLF is the John DiBiaggio Professor of Citizenship and Public Service and Director of the Center for Reading and Language Research at Tufts University. She is the author of Proust and the Squid: The Story and Science of the Reading Brain, which has been published in twelve languages and an audio version. Wolf's research interests include reading interventions, imaging studies of the reading brain, the genetic basis of dyslexia, early prediction, fluency and naming speed, cross-linguistic studies of reading, and the development of a reading tablet in work on global literacy.

CHARLES YANG teaches linguistics, computer science, and psychology at the University of Pennsylvania. His research focuses on formal and empirical issues in language acquisition, variation, and change, and he is the author of Knowledge and Learning in Natural Language (Oxford, 2002) and The Infinite Gift (Scribner, 2006). He holds a PhD in Computer Science from MIT and has previously taught at Yale University.


1

Introduction

MASSIMO PIATTELLI-PALMARINI AND ROBERT C. BERWICK

There is, we are told, a curious contrivance in the service of the English marine. The ropes in use in the royal navy, from the largest to the smallest, are so twisted that a red thread runs through them from end to end, which cannot be extracted without undoing the whole; and by which the smallest pieces may be recognized as belonging to the crown.

Johann Wolfgang von Goethe, The Elective Affinities
(1809; English trans. Boston: D. W. Niles, 1872), 163

This book is about state-of-the-art research on language structure and language growth in the child. We are all still puzzled today, as linguists were decades ago, by the dilemma posed by the Poverty of the Stimulus (POS), i.e., the richness of the language acquired by the child on the meager basis of the episodic, variable, and only implicitly structured linguistic input she receives. Several solutions, tentative as they may still be, are suggested in what follows. In spite of many old and new attacks on POS, one point appears to us incontrovertible: sufficiently rich internal resources have to be attributed to the child, and very early so, in order to have some hope of solving this puzzle. What these resources are, and why they are what they are, is the central concern of our authors. They lay out in a very clear and exhaustive though necessarily succinct way their findings and their hypotheses on this score. The red thread silently running through the volume from end to end is Carol Chomsky's pioneering research in child language and literacy and the lines of research which flow from her work. Scholars and young researchers take a fresh look at Chomsky's conclusions and show how current research owes much of its inspiration to the questions Chomsky posed more than fifty years ago.

The book is organized in three Parts.


Part I explains why all presently known refutations of POS fail, in principle and in fact. It is not limited, though, to refuting recent critiques. Several alternative and eminently plausible explanations are offered, acknowledging the full force of POS.

Part II examines in some detail the processes of language acquisition, and observed 'gaps' between adult and child grammar, concentrating on the late spontaneous acquisition by children of some key syntactic principles, basically, though not exclusively, between the ages of 5 and 9.

Part III widens the horizon beyond language acquisition in the narrow sense, examining the natural development of reading and writing and of the child's growing sensitivity for the fine arts.

Part I Poverty of the Stimulus and Modularity Revisited

Robert Berwick, Noam Chomsky, and Massimo Piattelli-Palmarini, in their chapter 'Poverty of the Stimulus Stands: Why Recent Challenges Fail', tackle head-on the 'red thread' of the POS. They note that recently several researchers have claimed that the POS argument can be deflected without resort to an 'innate schematism'. They then proceed to demonstrate that all these recent arguments fail, pinpointing why these failures occur. They conclude that the POS argument and its support for the availability of a priori structure dependence in the child stands, and that investigation of the POS question within standard approaches of the natural sciences yields interesting results and opens important questions for inquiry.

The next chapter, by Xuan-Nga Cao Kam and Janet Dean Fodor, 'Children's Acquisition of Syntax: Simple Models are Too Simple', picks up the same 'red thread' of the POS and the acquisition of polar interrogatives, focusing in more detail on the interaction between linear word-to-word relationships (bigrams and trigrams, more generally what are called n-grams) as opposed to hierarchical structural relationships. In brief, in accord with the finding of the previous chapter, they demonstrate that the only condition found so far that enables accurate across-the-board statistical auxiliary inversion acquisition presupposes that the learning model has access to the phrase structure properties of word strings. This finding further underwrites the POS argument for an innate biological specialization for language.

Their significant final conclusion is that a learner who only has access to word-level statistics as the input for syntax must, as a minimum, also possess an innate propensity to project phrase structure onto word strings, just as Noam Chomsky observed four decades ago.
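To make concrete what 'word-level statistics' amounts to, the following sketch (ours, not drawn from the chapter, with illustrative names such as `ngrams`) counts bigrams and trigrams over a word string. All such a learner registers are facts about adjacent words; nothing in these counts distinguishes the auxiliary of the main clause from one buried inside a relative clause, which is precisely the hierarchical information the chapter argues must come from elsewhere.

```python
# Minimal illustration of word-level n-gram statistics: the purely linear
# information available to a bigram/trigram learner, with no phrase structure.
from collections import Counter

def ngrams(words, n):
    """Return the list of n-grams (tuples of n adjacent words)."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "the dog that is barking is hungry".split()

bigrams = Counter(ngrams(sentence, 2))
trigrams = Counter(ngrams(sentence, 3))

# The learner sees only adjacency facts such as ('is', 'barking') and
# ('is', 'hungry'); nothing marks which 'is' heads the main clause, the
# distinction needed to form the question 'Is the dog that is barking hungry?'
print(bigrams[("is", "barking")])   # 1
print(trigrams[("that", "is", "barking")])  # 1
```

The example sentence echoes the classic auxiliary-inversion case discussed in the chapter: both occurrences of 'is' look identical at the level of bigram counts.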

In fact, stringing together the 'red thread' from Kam and Fodor's chapter to the chapters that both precede and follow it, one might advance an even stronger position: contrary to what some have supposed, word-level statistics might be doing very little for children in the domain of language syntax, simply because linear order isn't even available to the child's computational system there. The seemingly 'obvious' fact that one word follows another obscures the deeper principle of structure dependence that is the actual driving force behind language. Only if we are willing to shed what at first sight seems 'obvious' and grapple with what remains as a real puzzle can we make progress.

In just this way, in his individual contribution 'Poverty of the Stimulus: Willingness to be Puzzled', Noam Chomsky addresses this point, first by sketching an illuminating parallel between the development of Generative Grammar and the rise of the modern scientific revolution, when scientists chose no longer to be satisfied with the conventional explanation for why stones fall to the ground and steam rises to the sky: i.e., that objects simply seek their natural place. His first lesson is that willingness to be puzzled by what seem to be obvious truths is the first step towards gaining understanding of how the world works.

Chomsky then notes that external efficiency of communication has often been invoked as the main driving force in shaping the structure and evolution of language. Yet, as he notes, this 'functional' approach turns out to be often at odds with the constraints on the computational efficiency and transparency of internal linguistic representations, as an interface to the cognitive systems of interpretation and reasoning. Perhaps surprisingly, it is considerations of internal computational efficiency that override the demands of externalization: parsing and production, hard as they may be, or even sometimes impossible, are based on internal computations. These seem to be perfectly designed as a 'language of thought'.

Chomsky's conclusion is that, once we agree to be puzzled by the elementary POS problem (for example in Auxiliary-raising), and try to offer a principled answer to it, there are many important consequences that follow, and many new problems arise. This is precisely what we should anticipate (and hope for) in a research program concerned with fundamental questions. We should not be satisfied with merely descriptive reports, valuable as these may be for clearly formulating the questions. The fact that rich languages arise from poor inputs is an instance of the wider problem of growth and development in biological systems, which includes language in particular.

The next chapter, by Susan Curtiss, 'Revisiting Modularity: Using Language as a Window to the Mind', takes Chomsky's moral to heart, taking up old questions about the modularity of the mind to see what new puzzles they pose. Curtiss reviews evidence from a wide array of sources from which to examine modularity's basic tenets—evidence from studies on the neurology of language, the genetics of language, from cases of atypical development, cases of genetic anomalies, language breakdown, from cognitive dissolution, and from a variety of cognitive domains in addition to language. The author is well aware of the old and recent critiques of the modularity thesis and the skepticism voiced about a modular view of the mind. She believes, however, that when one considers the vast array of relevant evidence, new and old, only a small amount of which she could include here, there is strong reason to conclude that language, and in particular grammar, is a mental faculty that rests on structural organizing principles and constraints not shared in large part by other mental faculties and that, in its processing and computation, grammar is automatic and mandatory.

She concludes that data from domains outside of language also further support the fundamental notion of a modular mind. These modules under normal circumstances 'intricately interact in a beautiful dance'. Being human means possessing these different separable pieces and enjoying their dance.

The chapter by Lila Gleitman and Barbara Landau, 'Every Child an Isolate: Nature's Experiments in Language Learning', takes its cue from an insight due to Carol Chomsky: 'Successful language learning takes place under conditions of input deprivation that intuition suggests would pose insuperable problems'. Its central finding is that all learners acquire delicacies of syntactic form and interpretation that (if we are literal) are experienced by nobody.

They make it very clear that simple ostension, that is, presenting the child (as the traditional story goes) simultaneously with an object and a word, is very far removed from a real account of how the lexicon is acquired. The extremely sophisticated knowledge and the very fast mapping that the child develops between words and the objects, actions, circumstances, and (particularly important) internal states of mind that those words stand in relation with cannot be explained by the traditional empiricist hypotheses.

As Landau and Gleitman have famously discovered and rediscovered, congenitally blind infants acquire predicates that—to the sighted—refer to visual experience without having had any experience of seeing at all, and they acquire such items at the ordinary times—ages two and three. Many of their earliest words refer to objects, people, places, motions, and locations in ways that seem quite ordinary, even though their experience of such things was surely different from that of the sighted child. Even more surprisingly, among the earliest words in the blind child's vocabulary are the verbs look and see, followed shortly by a variety of color terms such as red, blue, and orange. Sighted blindfolded three-year-olds told to 'Look up!' turn their faces, i.e., their covered eyes, upward, suggesting that they interpret 'look' to implicate vision in particular. But a blind three-year-old given the same command raises her hands rather than her face, suggesting that for her the term is connected to the manual sense. The blind child's understanding of color is that it refers to an (unknown) quality of concrete objects and not to mental objects. These findings display the remarkable robustness of semantic acquisition in spite of variations in input. This special and very clear case of a rich vocabulary acquired from poor input is a counter to the empiricist theories, which claim that internal representations are abstracted directly from the external input. Further clear cases are presented and discussed by Gleitman and Landau.

Famously, Carol Chomsky asked if a blindfolded doll is 'hard to see'. And her four- and five-year-old subjects confidently replied yes, 'because of the blindfold'. One revelation from this work is thus that learning isn't all over and done with by three or four years of age; rather, complexities are still evolving through the school years, with particular structures appearing to elude some native speakers throughout life.

Gleitman and Landau also describe the case of deaf children of hearing parents, who have no available language input. Yet they spontaneously invent gesture systems called 'Home Signs'. Remarkably, their home sign systems spontaneously organize their world of experience in the same way as spoken languages do. Specifically, home sign systems possess nouns and verbs, distinguishable from each other by their positions in the children's gesture sequences and by their distinctive iconic properties.

Gleitman and Landau stress that multiple cues to a word's meaning are present simultaneously when a word is heard. These include not only a sound and its contingent situation, but also the whole structure in which the word occurs. From the age of two years through the early school years, all this syntactic knowledge, strictly connected with lexical knowledge, explains how the child succeeds in the formidable task of acquiring the lexicon.

Part 2 Discrepancies between Child Grammar and Adult Grammar

Jean-Remy Hochmann and Jacques Mehler, in their chapter 'Recent Findings about Language Acquisition', present novel data, framed by theoretical explanations that bring linguistics and cognitive science closer to one another and help explain the basic processes of language acquisition.

They review the hypothesis that infants use low-level cues to try to segregate the input into two categories. Once these categories become available they tend to be used for different functions.

Hochmann and Mehler illustrate two cases showing that infants have a tendency to apply a binary categorization to the different continua carried by speech. Once these two categories are established, each one becomes functional in acquiring language.

Pre-lexical infants are sensitive to a distributional property that allows them to identify function words in their language in terms of their frequency of occurrence. Function words, in fact, constitute the most frequent words of any language. Using a head-turn preference procedure with an artificial stream alternating frequent and infrequent syllables, Hochmann and Mehler asked whether infants would rather perceive this stream as a series of sequences starting with frequent syllables (Hi Lo Hi Lo) or ending with frequent syllables (Lo Hi Lo Hi). Interestingly, Italian and Japanese infants showed opposite preference patterns. Italian infants preferred to listen to sequences starting with frequent syllables, whereas Japanese infants preferred to listen to sequences ending with frequent syllables. This preference pattern correlates with the actual position of frequent words, and therefore that of function words, in their respective languages.
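
The distributional computation can be illustrated with a toy sketch (the syllable stream and the mean-frequency cut-off are our own illustrative assumptions, not the authors' actual stimuli):

```python
from collections import Counter

def split_by_frequency(syllables):
    """Binary categorization of a syllable stream: types whose token
    frequency exceeds the mean per-type frequency count as 'frequent'."""
    counts = Counter(syllables)
    threshold = sum(counts.values()) / len(counts)  # mean tokens per type
    frequent = {s for s, n in counts.items() if n > threshold}
    return frequent, set(counts) - frequent

# 'la' and 'bi' play the role of function-word-like (frequent) syllables
# interleaved with rarer content-like syllables, as in a Hi Lo Hi Lo stream.
stream = ["la", "pon", "bi", "tok", "la", "gur", "bi", "mep",
          "la", "pon", "bi", "dru", "la", "zat", "bi", "tok"]
frequent, infrequent = split_by_frequency(stream)
print(sorted(frequent))  # ['bi', 'la']
```

Nothing beyond raw token counts is needed for this first split; which class then gets used for which function (anchoring structure versus labeling objects) is the further step the chapter describes.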

They then asked whether seventeen-month-old infants have different expectations about the role of frequent and infrequent words in language acquisition. In particular, if the category of frequent words is related to the category of function words, infants should use frequent and infrequent words in a way relevant to the use of function and content words, respectively. They thus conjectured that infants should rely more on infrequent words when learning the label of a novel object. This study, together with various other controls, suggests that infants tend to build binary classes when confronted with a continuous property such as frequency. Moreover, once binary classes are established, infants tend to use the classes for different purposes.

Once this frequency distribution pattern is accessed, infants will be able to link the most frequent syllables with the hierarchical structure of syntactic structures and will use the infrequent syllables to learn the labels of the nouns or verbs that were highlighted in the input. Once this first step is achieved, more abstract computations and generalizations may take place in order to individuate other word classes such as adjectives, adverbs, cognitive verbs, etc.

Overall, Hochmann and Mehler's results suggest that the categories of consonants and vowels are not solely convenient constructs for linguists and psycholinguists, but are actually represented in infants' minds. Moreover, by the end of the first year of life, infants have different expectations about the type of information carried by consonants and vowels, expecting lexical information to be carried by consonants and structural information by vowels.

Language acquisition may thus initially rely on a series of core linguistic representations that are triggered by specific perceptual or distributional properties, as exemplified in the experiments reported in their chapter with sonority and frequency. Core representations will then be enriched by experience to yield the mature representations that adult speakers entertain. Their view fits well with the Principles and Parameters approach proposed by Noam Chomsky (1981). In this view, universal linguistic principles constrain the possible human languages. Some of these principles may take the form of core linguistic representations. In Noam Chomsky's view, parameters are seen as switches that should be put in one or another position according to the information extracted from the input. Parameter setting would thus serve to enrich core representations, and yield the more detailed representations found in the final state of language acquisition.

Moving again from very early language acquisition to later childhood acquisition, the next several chapters return to a deeper analysis of the 'easy/hard to see' classic experiment due to Carol Chomsky. Something interesting seems to happen in the child's belated understanding of the syntax of the tacit subjects of infinitives, traditionally designated with the capitalized PRO (not to be confused with the lower-case pro, the tacit pronoun so frequent in Romance languages). Adopting the convention of indicating unpronounced elements in a sentence, the sentence The doll is easy to see should be rewritten The doll is easy PRO to see. English-speaking adults and children above nine years of age clearly understand that PRO, the tacit subject of to see, refers to 'us', while younger children interpret PRO as referring to the doll. The syntax of PRO goes under the name of 'control'. Clearly, this syntactic component is acquired quite late by children. There are several ways to address this child-adult grammar gap. At least three chapters, those by Belletti and Rizzi, Laka, and Wexler, and, to a certain extent, a fourth, by Read and Treiman, all take up this issue. We will summarize the first three here, deferring the last until later in this introduction, since it deals with reading and writing.

In the chapter by Adriana Belletti and Luigi Rizzi, 'Ways of Avoiding Intervention: Some Thoughts on the Development of Object Relatives, Passive, and Control', a child-adult grammar gap is tackled by considering the central notion of 'intervention': the child cannot compute a local relation across an intervener close enough in structural type to the target of the relation. In fact, this follows from the general locality principle, called Relativized Minimality (RM), initially proposed by Luigi Rizzi in 1990, which also holds in adult grammars. The hypothesis concerning language acquisition by the child is that the intervention effect can be avoided through the adoption of certain structural strategies that become accessible only at later stages in development.

Two cases discussed in their previous work involve different adult strategies for avoiding intervention: object relatives and passive. Belletti and Rizzi then turn their attention to control and try to trace back subject control to a similar explanatory scheme.

Children below age five (and interestingly also several Broca's aphasics) have difficulty understanding sentences like the following (characterized as 'object relatives'):

Show me the lion that the elephant is wetting.

The stimulus presented to the child contains two drawings, one in which an elephant sprays a lion with water spouting from its trunk, the other showing a lion that sprays water from a hose onto the elephant. The child is asked to indicate the vignette that is adequately described by the above sentence. Younger children very frequently indicate the wrong animal. It's to be stressed that the lion has undergone a syntactic movement, being normally understood (by adults and older children) as the object of wetting. In contrast, if asked via a different syntactic construction (characterized as 'subject relatives'):

Show me the elephant that is wetting the lion.

children answer correctly (and so do Broca's aphasics, and even three-year-olds). Identical effects are revealed in wh-constructions:

Which is the lion that the elephant is wetting?

Versus:

Which is the elephant that is wetting the lion?

Why should the presence vs absence of a lexical restriction (two very similar Noun Phrases or two very similar wh-expressions—the lion, the elephant, or which, that) in the moved phrase make a difference?


In essence, if we look at the set-theoretic relations holding between the feature specification of the target and the intervener, three main cases arise: identity, inclusion, and disjunction. When the intervener's specification is identical to the target specification (the lion, the elephant or that, which), the structure is ruled out by Relativized Minimality. When the featural specification is disjoint (as in the elephant that...) the principle is satisfied and the structure is well formed.
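
The three set-theoretic cases can be made concrete with a toy classifier (the feature labels are our own illustrative stand-ins for the featural specifications discussed by Belletti and Rizzi, not their notation):

```python
def rm_relation(target, intervener):
    """Classify the set-theoretic relation between the feature
    specifications (sets of features) of a moved target and an
    intervening element."""
    if target == intervener:
        return "identity"      # ruled out by Relativized Minimality
    if intervener < target:    # proper subset: intervener included in target
        return "inclusion"     # tolerated by adults, problematic for children
    if target.isdisjoint(intervener):
        return "disjunction"   # well formed for all speakers
    return "overlap"           # partial overlap, not at issue here

# two identically specified phrases: 'the lion' ... 'the elephant'
print(rm_relation({"+R", "+NP"}, {"+R", "+NP"}))  # identity
# relative head [+R, +NP] crossing a lexically restricted subject [+NP]
print(rm_relation({"+R", "+NP"}, {"+NP"}))        # inclusion
# featurally disjoint target and intervener
print(rm_relation({"+R"}, {"+NP"}))               # disjunction
```

The classifier simply names the configuration; which configurations a given grammar tolerates (adult vs child) is the empirical question the chapter addresses.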

Comprehension improves significantly in children if the target and the intervener are made structurally dissimilar, in fact featurally disjoint, with only one of them being lexically restricted. As for adults and older children, the selective violability of weak islands shows that their grammar tolerates situations of featural inclusion. Therefore, the crucial case of headed object relatives crossing over a lexically restricted subject is expected to be unproblematic. Under this approach, the same formal principle, Relativized Minimality, applies in a slightly stricter form in children than in adults. Belletti and Rizzi do not say it explicitly, but we assume they would agree that positing the young child as a stricter syntactician than the older child is a far cry from any continuist and progressivist (inductivist) theory of language learning.

This selective violability opens up the interesting possibility that object relatives and passive develop independently. We may then expect that the cost in terms of complexity in child grammar will not be equal, one being favored over the other. Data on the acquisition of Italian passives and relatives by children before and after around age five confirm this hypothesis, as shown in their chapter.

Turning now to the case of the blindfolded doll, the syntactic issue is that of PRO and its controller. These must be connected by a search operation (an Agree-like one) constrained by Relativized Minimality (RM). Control is therefore local and obeys the Minimal Distance Principle, now subsumed under RM. If this is so, subject control across an intervening object should be barred in principle. This straightforwardly accounts for the fact that children systematically misinterpret such sentences as cases of object control (Carol Chomsky's results with the blindfolded doll). The next question is: Why is subject control possible at all in adult grammar? Pursuing a close analogy with the cases examined previously, Belletti and Rizzi suggest two possible techniques that avoid intervention in different types of local relations:

1. The intervention configuration holds, but the feature specification of the intervener is properly included in the feature specification of the target (this is the case of object relatives in adult grammars).

2. The intervention configuration is destroyed by a movement of a verbal chunk ('smuggling'), which bypasses the intervener.

The delay of subject control in children's acquisition of syntax can thus be looked at as a particular case of the delay that children experience with smuggling operations. Subject control involves, in this analysis, at least as much derivational machinery as passive, and in fact more: the obligatory extraposition of the infinitive, arguably motivated by case-theoretic considerations, makes the derivational computation of subject control more complex than passive. It may thus be expected that subject control develops even later than passive, and with the individual variation that the blindfolded doll experiment reveals.

The chapter by Itziar Laka, 'Merging from the Temporal Input: On Subject-Object Asymmetries and an Ergative Language', also tackles the issue of syntactically complex structures, those that are mastered by the child only at a later age. This makes the gap between children's language production and their linguistic knowledge quite evident. It also stresses the importance of finding new means to study language, based not solely on what speakers say, but also on how they comprehend what is said to them.

The asymmetries between subject relatives and object relatives (and subject gaps and object gaps) just reviewed above and expounded in Belletti and Rizzi's chapter are frequently assumed to be invariant across languages and rooted in deep universal aspects of linguistic structure. Laka refines this hypothesis considerably, suggesting instead that these asymmetries are subject to linguistic variation, and depend on external aspects of linguistic form largely independent of syntactic structure, though extremely relevant to the study of language use.

She discusses some recent results from studies on relative-clause processing in Basque that are incompatible with the widely held assumption that subject-object language-processing asymmetries are universal and that they tap into deep aspects of linguistic structure involving the core grammatical functions 'subject-of' and 'object-of'. She argues, instead, that the processing results obtained in Basque do not entail that the structural location of subjects and objects in ergative and nominative languages is different; rather, they entail that morphological differences and input-initial choices have non-trivial consequences for processing.

In fact, Basque has prenominal relative clauses, like Chinese, Japanese, and Korean. But unlike all these languages, including also those with postnominal relative clauses, it is an ergative language. Ergative languages mark actor/undergoer core arguments of the verb differently from nominative languages; this difference crucially involves the grammatical functions of subject and object. Hence, the study of processing asymmetries in an ergative grammar becomes particularly relevant in order to ascertain its cross-linguistic validity. Laka points out that, if subject gap syntactic structures are not universally easier to process than object gap structures, then accounts based on the inherent saliency or higher structural position of subjects cannot constitute a cross-linguistically valid account for processing asymmetries involving subjects and objects. These data make a constraint satisfaction approach based on frequency not suitable to account for the findings: there is no correlation between the frequency of occurrence of subject versus object relative clauses and the processing asymmetry found. In perfect agreement with the data and arguments offered in the preceding chapters, Laka uses this case to underline that if frequency were the factor modulating processing difficulty, then the subject gap relative clause should have turned out to be easier to process than the object gap relative, contrary to results.

Language-specific properties are a plausible candidate, because language processing handles externalized language forms and their particular syntactic forms. Given the view that the most plausible locus for language variation is morphology, and given the fact that ergativity in Basque is a morphological phenomenon, this linguistic trait stands out as a likely source for this divergent pattern of processing asymmetries, because it directly involves morphological case marking of core arguments.

This explains why morphologically unmarked antecedent-gap dependencies are easier to process: in nominative languages, unmarked dependencies correspond to nominative/subject gap relative clauses, while in ergative languages unmarked dependencies correspond to absolutive arguments, which include objects. In the specific case of transitive sentences, this predicts a subject gap relative-clause advantage for the class of nominative languages, but an object gap relative-clause advantage for the class of ergative languages.

Importantly, Laka underlines that we must not conclude that Basque violates the principles of Universal Grammar, in particular minimal parsing. A crucial feature is animacy, which strongly determines the processing choices speakers initially make for relative clauses. Prominence features like animacy, known to be active features in the morphology of many human grammars, can drive processing choices in the absence of other cues, and can have variable impact on processing cross-linguistically. In sum, the subject-initial processing strategy that follows from a minimalist processing perspective is modulated by grammatically active features like animacy, which favor an ergative/actor processing choice for a sentence-initial animate ambiguous form.

In linguistics, it is widely agreed that cross-linguistic comparative analysis is needed to discover the ultimate nature of linguistic structure. Laka explains why this is also so important in the analysis of input-processing mechanisms, provided that this is done at an adequate level of abstraction.

Akin in spirit to Carol Chomsky's foundational observation, and to what we have seen above with Belletti and Rizzi—a difference in the stringency of syntactic requirements as applied by young children as compared to adults and older children—the chapter by Ken Wexler, 'Tough-Movement Developmental Delay: Another Effect of Phasal Computation', is centered on a model of why certain other components of syntax mature relatively late. Wexler advances the Universal Phase Requirement as a partial explanation for this: that young children apply less (rather than more) stringent requirements on what is and isn't a phase, that is, a point of mandatory computational closure within a sentence. He too is inspired by the blindfolded doll data, and the child's understanding (or failure to understand) of sentences such as the following,

Is the doll easy to see or hard to see?
Would you make her easy/hard to see?


as well as sentences of the following type (called Tough-Movement, TM),

That house was easy/tough to knock down.

where the subject that house is clearly the object of knock down, suggesting that the DP that house is moving from the object position of the verb knock down to the subject position. No such movement is present in the closely parallel construction:

It was easy to knock down that house.

The underlying syntactic movement in these constructions is mixed, between A (argument) movement and A-bar (non-argument) movement. Only at about age eight can one become fairly confident that a child can distinguish between these two types of sentences, and successfully 'pass' a TM test.

Wexler stresses that it has been extremely difficult to capture such empirical generalizations in traditional terms, but that the insights developed on the strong role of local cyclic computation in Minimalist theory turn out to be crucial in explaining the observed developmental facts.

The background for Wexler's present proposal is the derivation-by-phase analysis in Minimalist theory. The idea is to severely restrict the computation needed in a sentence by proceeding to analyze by 'phases', from the bottom up, with only the minimal amount of material available from the next phase down. Passives, unaccusatives, and raising structures are grammatical because the relevant full-blown Verb Phrase (indicated by vP, which contains the more traditional VP—both letters in upper case—as a component) is defective. It is not a phase. Thus the complement of the vP (for example, the direct object in the case of passives and unaccusatives, the lower subject in the case of raising) is visible to syntactic operations.

In short, several structures require a non-phasal characterization of categories that are usually phasal. The Universal Phase Requirement (UPR) simply states that children don't count any phases as defective; all potentially phasal categories are phasal to them. Wexler then suggests that we ask the biological question: since the explanation for the progressive relaxation of the Universal Phase Requirement is biological and maturational, therefore depending on the biological/linguistic state of the organism, if the UPR explains the late development of Tough-Movement, we would predict that TM and other UPR-delayed structures emerge at about the same time. Indeed, we see that there is parallel development, as predicted, of the verbal passive and raising. This is quite revealing because these two structures are predicted to be delayed until the UPR is relaxed.

Like Wexler and the other authors in this section, Julie Anne Legate and Charles Yang's chapter 'Assessing Child and Adult Grammar' opens with the consideration that we stand to gain much from analyzing the transient stages in child language. Not all aspects of child language are acquired instantaneously or uniformly: acknowledging this in no way denies the critical contribution from UG and can only lead to a more complete understanding of child and adult language. To do so requires accurate measures of children's developmental trajectories, realistic estimates of the primary linguistic data, concrete formulations of linguistic theory, and precise mechanisms of language acquisition. It is in this spirit that they tackle the acquisition of the English metrical stress system, because the developmental patterns of stress acquisition may shed some light on the organization of the adult grammar.

Legate and Yang show that there is now a reasonable body of developmental data on stress acquisition, both longitudinal and cross-sectional, and that the main (early) stages in children's metrical system can be identified. This allows them to connect the phonological theory of stress with child language acquisition. In a more general framework, linguistic theories often have to decide what constitutes the core system of the grammar—such as basic word orders, default rules, unmarked forms—and what can be, more marginally, relegated to the lexicon. The complex metrical system of English is riddled with exceptions, thanks in part to the extensive borrowing in the history of the language. There are therefore decisions that the child learner needs to make, for the primary linguistic data do not arrive pre-labeled as core or peripheral. In this way, the child's navigation toward the adult grammar might shed light on the choices of linguistic theorizing as well.

Legate and Yang assume that the child learner has acquired sufficient phonological knowledge of her specific language to carry out the computation and acquisition of metrical stress. Specifically, it is plausibly assumed that the child has acquired the segmental inventory of her native language, which is typically fairly complete before her first birthday. Moreover, the child has acquired the basic phonotactic constraints of her language and is thus capable of building syllables from segments that are subsequently used to construct the metrical system. Additionally, the child is capable of extracting words from continuous speech, perhaps as early as seven-and-a-half months. Finally and importantly, the child can readily detect stress prominence. Indeed, very young infants appear to have identified the statistically dominant stress pattern of the language, as early as seven-and-a-half months old. The child is, therefore, able to locate primary stress on the metrical structure of words, and acquisition of the metrical system probably starts well before the onset of speech.

Legate and Yang share with Hochmann and Mehler a principles-and-parameters approach, namely, in their case, that stress acquisition can be viewed as an instance of parameter setting as the learner makes a set of choices made available by Universal Grammar. However, they part ways with previous efforts on metrical stress acquisition in several important ways, as explained in their chapter.

Refining previous work on the smooth integration of Universal Grammar with parameter-oriented probabilistic learning ('the variational model'), they show that the formal learnability motivations for cues are no longer necessary. The data and the model presented in this chapter consolidate the trend towards reducing the apparatus of Universal Grammar to a strict minimum (as also pursued quite explicitly in the chapters by Berwick, Chomsky, and Piattelli-Palmarini; Chomsky; and Wexler).
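
The learning mechanism can be sketched as a linear reward-penalty scheme in the spirit of the variational model (the learning rate, the toy input distribution, and the `can_parse` oracle below are our own illustrative assumptions, not the chapter's actual parameters or data):

```python
import random

def variational_learner(inputs, can_parse, gamma=0.02, p=0.5, seed=1):
    """Linear reward-penalty learning over one binary parameter.
    p is the probability of choosing parameter value 1; can_parse(v, s)
    says whether the grammar with value v can analyze input s."""
    rng = random.Random(seed)
    for s in inputs:
        value = 1 if rng.random() < p else 0
        rewarded = can_parse(value, s)
        # reward the chosen value if it parsed the input, penalize otherwise
        if (value == 1) == rewarded:
            p = p + gamma * (1 - p)   # shift probability toward value 1
        else:
            p = p * (1 - gamma)       # shift probability toward value 0
    return p

# Toy input: 70% of sentences are parsed under either value ('ambiguous'),
# 30% are compatible only with value 1 (say, an unambiguous word order).
def can_parse(value, s):
    return s == "ambiguous" or value == 1

data = (["ambiguous"] * 7 + ["unambiguous"] * 3) * 100
print(variational_learner(data, can_parse))  # drifts close to 1
```

Ambiguous inputs produce no net drift, while the unambiguous minority steadily rewards the correct parameter value, so the learner converges gradually rather than flipping a switch on a single cue.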


The chapter by Thomas G. Bever, 'Three Aspects of the Relation between Lexical and Syntactic Knowledge', picks up Legate and Yang's thread of integration, proposing several points of reconciliation between a more traditional nativist theory of Universal Grammar and the child's access to relevant statistical regularities and other inductive processes. The cases of late acquisition examined in the previous chapters, and other cases examined by Bever over several years, combine innate and statistical regularities, and late maturation, with aesthetic motivation, group-identity motivation, analysis by synthesis, and hypothesis formation and testing. Integrating inductive and deductive processes, Bever suggests an original approach to a very old problem: the theoretical and empirical status of the Extended Projection Principle (EPP), the constraint that all sentences have to have a syntactic subject. Bever's suggestion is that this principle, important as it is in the child's acquisition of her local grammar, is not, after all, part of syntax, not a universal principle of Universal Grammar. Rather, it is a broad guiding criterion, subject to variation across languages, derived from the manifest regularities of characteristic template sentences. Bever then proceeds to ask whether language development in the child is a case of discovery by the child, as if she were a 'little linguist' (a term coined by Virginia Valian). Bever stresses that his view does not deny or minimize the critical computational capacities that underlie the successive structural hypotheses that the child formulates to match the empirical generalizations.

Capitalizing on years of his own past and current research on the differences in language processing between right-handers with and without a family history of left-handedness, Bever explains why group differences might result from a general difference in the extent to which relevant neurological areas of the right and left hemispheres are equipotential in people with familial left-handedness. Turning his attention to reading, rereading, and intonation (a topic developed in Part 3, and also pioneered by Carol Chomsky), Bever re-examines the role of words versus syntax in acquisition. He asks whether there really is a 'voice in the head' that monitors the mapping between letters and sounds. Finally, he reports data on the facilitation that readers experience when slightly graded spaces are inserted graphically in printed text, taking into account what the reader/speaker tacitly knows very well: the existence of gaps between sentence constituents.

Part 3 Broadening the Picture: Spelling and Reading

Charles Read and Rebecca Treiman, in their chapter 'Children's Invented Spelling: What We Have Learned in Forty Years', extend, update, and refine Carol Chomsky's groundbreaking work on the child's development of writing and reading (yes, basically in this order: writing first, reading later). They summarize the early contributions and then examine what four decades of further research has uncovered. How well have the initial views of the nature of invented spelling and the early ideas about classroom instruction held up? Read and Treiman emphasize that children's invented spellings are their own creation, at least in part, not a defective imitation of the adults' writing system. The creative peculiarities of children's invented spelling cannot have been acquired from instruction or dictation, as the standard spellings (also found) may have been. Invented spellings may thus tell us something about the child's knowledge of language.

Children's non-standard spellings, because they are invented, inform us about their conceptions of sounds. Indeed, some spelling patterns have been found to be quite consistent. The consequence is that one is well advised to think about what invented spelling might mean for learning to read and how writing might be incorporated into a preschool or primary-grade classroom. They ask what else we can learn from the spellings of children in the US and other countries, and suggest how the early spellings fit into the larger picture of spelling development in general.

Early writing, like the acquisition of language itself, can be properly compared to artwork (a reflection further developed by Merryl Goldberg in her chapter). The recommendation, therefore, is that it must 'not degenerate into a form of exercise', and it must be guided by the child. How much writing the child will eventually produce depends on her own inclination and interest.

Read and Treiman conclude that teachers' own literacy is 'a double-edged sword': it can make it hard for them to think about how a language seems to a person who does not yet know how to spell it. Teachers may not appreciate, for example, the logic behind a child's non-standard categorizations of certain sounds. The children who ultimately benefit from a better understanding of the beginnings of writing and who have the opportunity to begin their writing in a supportive home or classroom will have much to thank Carol Chomsky for.

In close connection with the preceding chapter, Stephanie Gottwald and Maryanne Wolf's 'How Insights into Child Language Changed the Development of Written Language' takes its inspiration again from the blindfolded doll example described earlier, but now further extended to the developmental order of other syntactic constructions in the domain of written language. The syntactic structures under investigation by Gottwald and Wolf were the following:

1. Easy to see (The doll is easy to see), the classical case also discussed in previous chapters and earlier in this introduction.

2. Promise (Bozo promises Donald to stand on the book). The crucial test here is to ask the young child: Who will stand on the book?

3. Ask (The girl asked the boy what to paint). The test here is: Who will be doing the painting?

4. And followed by ellipsis (Mother scolded Gloria for answering the phone, and I would have done the same). The test here is: Does 'the same' mean answering the phone or scolding?


5. Although followed by ellipsis (Mother scolded Gloria for answering the phone, although I would have done the same). Here too the test is: Does 'the same' mean answering the phone or scolding?

These five structures constitute a developmental acquisition sequence: None of the children who did not comprehend structure 1 could comprehend any of the other constructions. All of the children who comprehended structure 5 also comprehended the four previous constructions.

Such data bear a strong resemblance to earlier data from researchers like Roger Brown and Ursula Bellugi, who discovered similar acquisition sequences in younger children. The implications of these findings are that syntactic aspects of language acquisition are not complete before entering primary school. Further, these data demonstrate that the acquisition process proceeds beyond the age of five in a manner identical to other aspects of language acquisition in the young child: systematically and without direct instruction. Throughout this research, one is cautioned that these findings may be just the 'tip of the iceberg' and that the linguistic structures yet to be acquired could be fairly extensive, beyond the five that have already been tested.

In further extensions of this line of inquiry, Wayne O'Neil's 'The Phonology of Invented Spelling' shows how the data clearly indicate that English inventive spellers aim for a taxonomic phonemic representation, one that is phonetically grounded and does not take the morphology of the language into account. However, the writing system that the English-speaking child must ultimately control is morphophonemic: Its general principle, obviously grossly violated at times, is to leave unrepresented what can be predicted by phonological rule. O'Neil stresses that a child is not hardwired to read and write; thus she cannot know what kind of writing system, if any, she will have to contend with. When we examine the range of writing systems that exist for the world's languages (alphabetic, alphasyllabic, syllabic, logographic, and combinations thereof), we begin to understand that writing systems can be 'friendly' or not relative to their different audiences.

The final chapter, Merryl Goldberg's 'The Arts as Language: Invention, Opportunity, and Learning', widens the horizons of the application of linguistic theory to practical applications even further. She argues that a large literature, which she characterizes as 'misconceptionist', often negates the child's thinking process. While in such literature children's pre-existing knowledge is mostly valued and it is acknowledged that it is constructed by each individual, the discourse is negative in nature. It attributes to the child an endless series of 'misconceptions'. On the other hand, the alternative championed here, the discourse of invention, respects the child's development and is written in a more positive language. Teachers who consider learners' work as invention acknowledge that children are engaged in creative knowledge building. Invention is a creative affair. Viewing the learner as an active creator of knowledge respects not only the learner but also acknowledges the very many different ways students invent


their understandings of phenomena. A training that emphasizes looking at children's inventions, no matter how they unfold, clearly is far more respectful toward children and their capabilities than one that finds 'misconceptions' and then figures out ways to fix them.

She underlines, in conclusion, that Carol Chomsky and her former students (well represented in this Part 3) found joy in how they viewed children as meaning-makers and creative individuals with a passion to present their ideas via messages with their invented spellings. Goldberg's belief is that 'this is a legacy that applies to how we should view education overall'.

The reprint chosen for the Epilogue selects one work out of the vast production of Carol Chomsky's papers and books. This paper has become a classic that encompasses the title of the present volume, the idea that children develop rich grammars from poor inputs. It reports a careful longitudinal analysis of language growth in cases that indubitably constitute extreme examples of the Poverty of the Stimulus: children that are both deaf and blind from a very early age. Notwithstanding such severe deprivation, language growth unfolds in essentially a normal way, matching the levels of production and understanding of normal children of the same age. We opened this volume with a revealing snippet of conversation, reported in Carol's paper. Vibrantly reporting a recent dramatic episode, a person deaf-blind from nineteen months of age said the following (we repeat it here):

I saw one cab flattened down to about one foot high ... And my mechanics friend told me that the driver who got out of that cab that was squashed down by accident got out by a [narrow] escape.

If we took the (considerable) trouble of drawing the complete syntactic tree of the last sentence we would be amazed by its complexity. Even greater amazement is, no doubt, felt when we consider the uniquely deprived perceptual conditions of the speaker. But, as Noam Chomsky likes to emphasize, the normal development of language in the congenitally deaf, blind, and deaf and blind children, fascinating as it is, strikes us, in a sense, for the wrong reasons. Wrong because normal children the world over and along the whole history of our species have received an input that looks, only superficially looks, to us vastly richer than the one received by these deprived children. The lesson that Noam Chomsky wants to bring home is that decades of attentive research on language acquisition show that it is not. Not really. Reading this volume will, we hope, explain why.


Part I

Poverty of the Stimulus and Modularity Revised


2 Poverty of the Stimulus Stands: Why Recent Challenges Fail1

ROBERT C. BERWICK, NOAM CHOMSKY, AND MASSIMO PIATTELLI-PALMARINI

2.1 Introduction: the Poverty of the Stimulus Revisited

Environmental stimuli greatly underdetermine developmental outcomes in all organisms, including their physical growth. This is in biology at large a familiar truism and is sometimes called, in the domain of language, the 'poverty of the stimulus' (POS). For example, the distinct genomes of insects and vertebrates give rise to quite different eye lenses, compound vs simple, independently of external environmental stimulus. In this case, placing the primary focus on the actual object of study, the internal organization of eyes and their development rather than extraneous external variation, has led to clearer understanding, as in most other biological examples.

Turning to cognition, only human infants are able to reflexively acquire language, selecting language-related data from the 'blooming buzzing confusion'2 of the external world, then developing capacities to use language that far exceed any data presented to them, much as in other areas of growth and development. The explanation for such differences in outcome arises from four typically interacting factors:3

(1) Innate, domain-specific factors (in the case of language, what is called 'universal grammar', obviously crucial at least in the initial mapping of external data to linguistic experience);

(2) Innate, domain-general factors;

1 For a more detailed critique of several of these challenges see Berwick, Pietroski, Yankama, and Chomsky, 'Poverty of the Stimulus Revisited' (2011).

2 This expression is famously due to William James (in The Principles of Psychology, 1890/1981), who characterized 'the stream of thought' and the baby's impression of the world 'as one great blooming, buzzing confusion'.

3 This view accommodates the familiar possibility of so-called 'epigenetic effects', the interaction of external stimuli (factor (3)) with innate factors (1) and (2).



(3) External stimuli, such as nutrition, modification of visual input in very early life, exposure to distinct languages such as Japanese vs English, or the like; and

(4) Natural law, e.g., physical constraints such as those determining that most dividing cells form spheres rather than other shapes, and none forms, say, rectangular prisms.

Addressing the same question, Descartes famously observed that an infant presented with a figure with three irregular sides—all that it ever experiences in the natural world—perceives it as a distorted triangle, not as a perfect example of what it actually is. In this case as well, the sample data, 'the stimulus' for selecting the correct concept 'triangle', seems too impoverished without positing antecedently the target concept in question. While Descartes's conclusion may well be too strong—the operative principle might be some kind of a priori Euclidean geometry applied to sensations yielding geometrical figures—the methodological approach stands.

The goal of this chapter is to re-examine familiar examples that were used to motivate one very elementary illustration of a POS question in linguistics, so-called yes-no questions or 'polar interrogatives' (N. Chomsky 1968, 1971, 1980), in an attempt to determine the proper formulation of factor (1), the domain-dependent linguistic factors required to explain them. We stress at the outset that these examples were selected for expository reasons, deliberately simplified so that they could be presented as illustrations without the need to present more than quite trivial linguistic theory. They are but one example out of a wide array of POS arguments given fifty years ago. Nevertheless, this simplified example, taken in isolation, has given rise to a substantial literature, much of it attempting to show that this knowledge of language can be accounted for by resort to factor (2), for example, statistical data analysis by presumably domain-general methods. Further, it is sometimes suggested that if that effort succeeds, something significant will be revealed about the POS, perhaps even its non-existence in the case of language. As we will show to the contrary, the question of POS in the case of language would scarcely be affected even if such efforts succeeded, since one can resolve this particular POS question with very minimal assumptions about factor (1) principles (that is, UG). However, even this much is academic, since as section 2.4 below demonstrates, these approaches fail completely.

In fact, there is good reason to expect continued failure, for several reasons. First, such approaches misconstrue what is actually at stake, even in this artificially simplified example. Second, they ignore the empirical range of relevant cases from which this example was selected. Perhaps most importantly however, there are long-known, straightforward answers to this particular POS question that have far wider scope. These answers are quickly discovered if we follow standard biological methodology, as in the case of animal eye lenses mentioned earlier. No one would have dreamt of trying to account for the POS problem in the case of animal eye lenses, or innumerably many


others like it, while knowing virtually nothing about eyes. Similarly, incorporating a small part about what is actually known about language happens to yield a very simple solution to the POS problem brought up in the case of yes-no questions, while also addressing the actual issues at stake and covering a much wider empirical range. Pursuing this course also opens new and intriguing questions that have yet to be explored carefully.

Specifically, we will consider some recent attempts to deal with the simple case of polar interrogatives on the basis of domain-general procedures, factor (2) above, eliminating factor (1). These alleged alternatives include a string-substitution inference algorithm (Clark and Eyraud 2007; Clark, Eyraud, and Habrard 2008; Clark 2010), a Bayesian model selection algorithm that chooses among different types of grammars (Perfors, Tenenbaum, and Regier 2006, 2011), and a bigram or trigram statistical method (Reali and Christiansen 2005).4 Though these particular approaches do not succeed, we show that it is indeed possible to reduce the domain-specific linguistic component (1) quite radically, perhaps even to what may well be a logical minimum. Our alternative arises from a very different way of looking at the problem than the one adopted by these other approaches, one closer to the biological method: an analysis of the internal system of language and its constraints, rather than data analysis of external events.

More generally we note that the prime concern of serious theoretical work in linguistics since the 1950s has been to uncover potential POS issues, and then attempt to eliminate them, reducing, not increasing, the linguistic domain-specific component (1). This approach is pursued for obvious reasons: the apparent complexity and diversity of descriptive linguistic proposals raises insuperable burdens for all relevant biolinguistic questions, including the acquisition and evolution of language as well as its neural basis.

The remainder of this paper is organized as follows. In section 2.2 we lay out the basic empirical facts regarding the expository question formation examples, striving to remain neutral as to any particular linguistic formulation insofar as possible, arriving at a basic list of empirical requirements that any explanatory account must address. Section 2.3 turns to explaining the empirical data in section 2.2 from a modern grammatical standpoint—what an answer to the original problems ought to look like. It aims at reducing the linguistic domain-dependent facts (1) to a minimum. We shall see that even when we consider extensions beyond the question formation examples, very few language-specific assumptions are required to provide a simple solution to this particular problem (though as expected, new and non-trivial issues arise). Section 2.4 proceeds to assess the claimed explanatory success of the recent approaches listed above. We shall see that all these approaches collapse, both on the original examples

4 See also Kam and Fodor, this volume, for a reanalysis of this method, with considerations and conclusions germane to our own.


and on the extended example set. We find that on balance, the elimination of POS problems and the reduction of factor (1) (the domain-dependent linguistic knowledge that must be taken as a priori) remains best advanced by current research in linguistic theory, rather than by the alternative approaches reviewed in section 2.4, a conclusion that we believe generalizes to other cases.

2.2 POS Revisited: Empirical Foundations

We begin our re-examination of the POS with the familiar expository example from N. Chomsky (1968, 1980). Consider a simple yes-no (polar interrogative) question structure as in (1a) below, where square brackets denote an assignment of phrase structure and lower-case v and v* denote possible positions for the interpretation of the word 'can':

(1a) [can [eagles that v* fly] v eat]

For (1a) to be properly understood, the occurrence of 'can' must be interpreted in the position marked by v, not v*, yielding a question about the predicate 'eat' rather than 'fly'; the question asks whether or not eagles can eat, not whether they can fly. Assigning the proper semantic interpretation to sentences like these has always been the real question of linguistic interest. We note further that the proper interpretation of example (1a) also depends on its bracketing into phrases, that is, the assignment of a structural description to the string of items 'can eagles that fly eat'. This is necessary in order to interpret, e.g., 'eagles that fly' as a single expression that serves as the subject of the question.

How then is the choice made between the alternative positions for interpretation, v and v*? Note that the question (1a) has a clear declarative counterpart with the same semantic properties, differing only in the property of being a declarative rather than an interrogative, where 'can' appears in the correct position for interpretation, v, rather than v*, i.e.,

(1b) [[eagles that fly] can eat]

With no tacit assumptions as to the actual principles involved, we may posit that examples (1a) and (1b) constitute a pairing, where the second item of the pair explicitly indicates the correct position for interpretation of 'can'. Such pairings are part of the knowledge of language that children attain, attesting to the relationship between structure and interpretation. It is the relationship between such pairs that is the fundamental question of interest, as clearly posed by the earliest expository examples, e.g., 'the dog that is in the corner is hungry'—'is the dog that is in the corner hungry', with the assumed bracketing and position for interpretation marked by v as: [is [the dog that is in the corner] v hungry] (Chomsky 1968: 61-2, 1980: 39-52). It is this


knowledge, at the very least, that factors (1)-(4) above must account for, as was explicit in the earliest presentations.5

Further insight into this knowledge may be gained by considering related pairings beyond this simple expository example. Let us consider some of these here, with the understanding that they by no means exhaust the possibilities, but simply serve to illustrate that there is a much wider range of related pairing examples demanding explanation, both within a single language, and, even more importantly, across all languages, universally. First, in English one may also substitute 'do' for the auxiliary verb 'can' or the main verb 'is', since 'do' bears morphological tense (cf. 'did') but is otherwise semantically a dummy or pleonastic item. We denote its correct position of interpretation by dv, and its incorrect position by dv*:

(2) [do [eagles that dv* fly] dv eat]

However, in languages that lack a dummy tense marker like 'do', e.g., German, we find that the entire tensed verb may be found heading the sentence:

(3) [Essen Adler [die v* fliegen] v]

Moreover, the same form appears in various constructions in languages that have basically VSO (verb-subject-object) order, as in Irish, even though these need not be questions (examples from McCloskey 2009):6

(4a) [gcuirfidh [sí isteach v ar an phost]]
put-future she in for the job

'She will apply for the job.'

(4b) [An gcuirfidh [sí isteach v ar an phost]]
Interrog put-future she in for the job

'Will she apply for the job?'

In other words, the element that may be paired depends on details about the language in question. Crucially, we find that in the rich variety of examples like these,

5 Such pairings are a part of every linguistic theory that takes the relationship between structure and interpretation seriously, including modern accounts such as HPSG (Head-driven Phrase Structure Grammar), LFG (Lexical Functional Grammar), and TAG (Tree Adjoining Grammar), as also stressed by Kam and Fodor in their chapter in this volume. As it stands, our formulation takes a deliberately neutral stance, abstracting away from details as to how pairings are determined, e.g., whether by derivational rules as in TAG or by relational constraints and lexical redundancy rules, as in LFG or HPSG. For example, HPSG (Bender, Sag, and Wasow 2003) adopts an 'inversion lexical rule' (a so-called 'post-inflectional' or 'pi-rule') that takes 'can' as input, and then outputs 'can' with the right lexical features so that it may appear sentence-initially and inverted with the subject, with the semantic mode of the sentence altered to be 'question' rather than 'proposition'. At the same time this rule makes the subject noun phrase a 'complement' of the verb, requiring it to appear after 'can'. In this way the HPSG implicational lexical rule defines a pair of exactly the sort described by (1a,b), though stated declaratively rather than derivationally.

6 See Chung and McCloskey (1987), McCloskey (1991, 1996) for extensive evidence for a v position in Irish.


the constraint that governs the correct choice for the position of interpretation for v continues to apply. Any explanation of pairings must therefore apply universally, cross-linguistically, to cases such as (3) and (4a,b) as well as (1a,b).

Probing further, the possibility of a construction like (1a) does not necessarily involve the semantic or underlying subject position, as illustrated in (5) below, where the position for interpretation, v, follows the surface subject 'there', not the underlying semantic subject 'eagles that eat while flying':

(5) [can [there v be [eagles that eat while flying]]]

Pairings may also include adjectival constructions, (6a,b), as well as forms with 'wh' words ('what', 'who', 'which book', etc.), as indicated below. We again mark the position for correct interpretation via a notation for adjectives, a, or wh-words, w. Examples (6c) and (7b) illustrate that here too certain pairings are possible, while other pairings appear to violate some constraint, as marked by the illicit positions for interpretation, a* and w*.7

(6a) [Happy though [the man who is tall] is a], he's in for trouble

(6b) [Though [the man who is tall] is happy], he's in for trouble

(6c) [Tall though [the man who is a*] is happy], he's in for trouble

(7a) [What did [the man who bought the book] read w]

(7b) [What did [the man who bought w*] read]

The constraints on v and w pairings partly overlap but are not identical. In both (5) and (7a,b) the legitimate v or w positions are in the main clause, while the forbidden v* or w* positions lie within an embedded clause. However, example (8) below shows that the constraints on v and w pairings must be distinguished. In (8), 'what' may be paired with the w position that lies within an embedded clause, 'that eagles like w'; in contrast, 'will' can never be paired with the v* position within that same embedded clause:

(8) [what will John v warn [people that we read w* to p] [that eagles v* like w]]
cf. 'John will warn people that we read to that eagles like what'

More generally, although not all languages will necessarily exhibit pairings like those in (1)-(8) due to other, extraneous factors (e.g., some languages might not form questions with wh-words along the lines of (8)), where such pairings are possible at all, the general constraints look the same as they do in English.

Our general conclusion then is that a proposed explanation for v-pairing must meet at least the following conditions:

7 There are of course many other possible construction pairings and constraints, including some that apparently 'violate' the embedding constraint described in the main text, but they are not relevant to the problem we address in this article. These would be part of a fully articulated theory of language, which we do not present here. We are simply illustrating example pairings that will have to be met by any fully satisfactory account.


I. Yield the correct pairings, for an infinite set of examples, those that exhaust the relevant cases;

II. Yield the correct structures, since interpretation is required by any serious linguistic/cognitive theory, also for an infinite set of examples;

III. Yield the correct language-universal patterning of possible/impossible pairings;

IV. Distinguish v- from w-pairings in part, while also accounting for their shared constraints.

Criteria I-IV impose a considerable empirical burden on possible explanations that go, as they should, beyond the simplified expository example. They exclude proposals that do not even attempt to account for the pairings and the various options for interpretation, or that do not extend beyond (1a), or, even worse, that limit themselves to generating only a correct surface string of words, rather than the correct bracketed structures. As we shall see, these problems arise for all the efforts we consider in section 2.4 below that attempt to frame an account in terms of factor (2), domain-general principles, though in fact they collapse on even simpler grounds.

As is familiar, Chomsky (1968, 1971, 1980) addressed the question of pairings like (1a,b) in terms of a grammatical rule relating the (1a) and (1b) forms, noting that whatever the correct formulation, such a rule must make reference to the structure (i.e., bracketing) of the sentence, rather than simply 'counting' until reaching the first occurrence of 'can', and ignoring the sentence structure. The question was framed (1968: 61-2, 1971: 26-7, 1980: 39-52) by imagining a learner faced with accounting for such declarative/question pairs by means of two competing rule hypotheses, H1 and H2. H1 'takes the left-most occurrence of "is" and then moves it to the front of the sentence' (1971: 27, 1980: 39) while H2 'first identifies the subject noun phrase of the sentence' and then moves 'the occurrence of "is" following this noun phrase to the front of the sentence' (ibid.: 26). Let's stress here and now (though we will develop this further in section 2.4.2.1) that the mere existence of hierarchical structures and the child's access to them are presupposed. The issue that is so central to this particular POS problem is tacit knowledge by the child that grammatical rules apply to such structures. Regrettable confusion persists on this distinction in much of the work we are reviewing here. By convention, we call this movement 'V-raising', and its generalization to other categories as described in examples (2)-(8), 'raising'.8

8 From the earliest work in generative grammar in the 1950s, both declaratives and corresponding interrogatives were assumed, for good reasons, to be derived from common underlying forms that yield the basic shared semantic interpretations of the paired constructions. These expressions differ only by a lexical property that in some structures 'attracts' the verbal auxiliary to the front: for example, in (i) but not in the semantically similar expression (ii):

(i) he asked 'are men happy?'

(ii) he asked whether men are happy


Crucially, rule H1 refers only to the analysis of the sentence into individual words or at most part-of-speech labels, along with the property 'left-most', that is, it does not depend on the sentence structure, and consequently is called structure-independent. In contrast, rule H2 refers to the abstract label 'noun phrase', a grouping of words into phrases, and consequently is called structure-dependent. In this case, the crucial domain-specific factor (1) is the structure dependence of rules (as is stressed in all the published work regarding this topic, see, e.g., Chomsky 1968, 1971, 1975, 1980).9
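The contrast between the two hypotheses can be made concrete with a toy sketch of our own (purely illustrative, not part of the original argument). Applying each rule to the expository sentence shows that only the structure-dependent H2 yields the attested question; the subject bracketing for H2 is simply stipulated by hand here, since the point at issue is precisely that H2 needs access to it.

```python
# Toy illustration of the two competing rule hypotheses H1 and H2.
# The sentence is a flat list of words; the subject noun phrase
# [the dog that is in the corner] spans the first 7 words.
words = 'the dog that is in the corner is hungry'.split()

def h1(ws):
    """H1, structure-independent: front the left-most 'is'."""
    i = ws.index('is')
    return ['is'] + ws[:i] + ws[i + 1:]

def h2(ws, subject_len):
    """H2, structure-dependent: front the 'is' that follows the
    subject noun phrase (whose length must be supplied)."""
    i = ws.index('is', subject_len)
    return ['is'] + ws[:i] + ws[i + 1:]

print(' '.join(h1(words)))     # fronts the 'is' inside the relative clause
print(' '.join(h2(words, 7)))  # fronts the 'is' after the subject
```

On this sentence H1 produces the ungrammatical 'is the dog that in the corner is hungry', while H2 produces the attested 'is the dog that is in the corner hungry'.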

We can describe the examples we have covered in terms of two principles, which happen to overlap in the case of subject relative clauses. For the V case, the pairing (or raising) indeed does keep to minimal distance, but 'minimal' is defined in structural (phrase-based) rather than linear (word-based) terms: the paired/raised element is the one structurally closest to the clause-initial position. More generally, there are no 'counting rules' in language (see Chomsky 1965, 1968; Berwick 1985, for further discussion).10 For all cases, the descriptive principle is that subject relative clauses act as 'islands', barring the pairing of an element inside the relative clause with an element outside it (whether an auxiliary verb, a verb, a do-auxiliary, an adjective, or a wh-word). Such 'island constraints' have been studied since the early 1960s.11 Tentatively, we can take these two principles to be factor (1) principles, that is, part of antecedent domain-specific knowledge. However, at least the first principle might reasonably be regarded as a factor (4) principle, reducing to minimal search, a natural principle of computational efficiency. We will not explore the source of the second principle here, but it has been the topic of a good deal of inquiry, which also seeks to reduce it substantially to locality and related principles that might fall within a general notion of efficient computation that is language- or possibly even organism-independent.

2.3 An Optimal General Framework

How can we construct a system that will cover the empirical examples in the previous section, while minimizing the contribution of domain-dependent factors (1)? We first note that in order to satisfy conditions I and II above, such a system must yield an

9 The 1980 publication includes a section explicitly headed 'Structure Dependence of Linguistic Rules', p. 39; in this regard, note also that Crain and Nakayama (1987: 522) concluded that their experiments 'support Chomsky's contention that children unerringly hypothesize structure-dependent rules' [our emphasis].

10 Different patterns of fMRI brain activation have been evidenced when a subject monitors structure-dependent (and therefore grammatically realistic) rules, as opposed to rules applying to words in a fixed position in a sentence (and therefore grammatically impossible) (Musso, Moro, et al. 2003). Further fMRI evidence showing this difference has recently been published by Pallier, Devauchelle, and Dehaene (2011; see also Moro's commentary in the same issue).

11 Note that this restriction to subject relative clauses is presumably part of some broader principle; for the initial observation, see Chomsky (1962: 38-47), followed by Ross's more general account (1967: 2-13), and many improvements since. V-pairing is more constrained than wh-pairing because it evidently requires a kind of adjacency; see section 2.5 for further discussion of this constraint, which holds much more generally for lexical items that are 'atoms' for computation in the sense discussed directly below.



infinite number of discrete, structured pairs. While there are many specific methods for accomplishing this, since the latter part of the nineteenth century it has been known that any approach will incorporate some primitive combinatory operation that forms larger elements out of smaller ones, whether this is done via a Peano-style axiom system, a Fregean ancestral, a Lambek-style calculus with 'valences', or by some other means. Call this basic operation Merge.

At a minimum, Merge takes as input two available syntactic objects X, Y, each an 'atom' for computation (drawn from the lexicon), or else constructed by Merge from such atoms, and from these constructs a new, extended object, Z.12 In the simplest case, X and Y are unchanged and unordered by the merge operation, so that Merge(X, Y) can be taken to be just the unordered set {X, Y}. We will refer to the condition that X and Y are unchanged as the 'no-tampering condition' (NTC), a general principle of efficient computation. Imposing an order on X and Y requires additional computation, which, it appears, does not belong within the syntactic-semantic component of language. There is substantial reason to suppose that ordering is a reflex of the process of externalization of structures by the sensory-motor system, and does not enter into the core processes of syntax and semantics that we are considering here (see Berwick and Chomsky 2011, and Chomsky's separate contribution to this volume).

Anything beyond the simplest case of Merge(X,Y) = {X,Y} requires additional stipulations and more complex computations, and therefore is to be rejected unless it receives adequate empirical support.

If X is a lexical item and Y any syntactic object (SO), then the output of Merge is the set {X, SO}, with SO traditionally called the complement of X. As a simple example, 'see the man', with part-of-speech labels v, det, n, traditionally written as a Verb Phrase consisting of the verb 'see' and its Noun Phrase complement 'the man', can for expository convenience be represented as {v, {det, n}}. Since Merge can apply to its own output, without limit, it generates an infinite number of discrete, structured expressions.
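As a concrete encoding of our own (purely illustrative, not part of the text's formal apparatus), Merge can be sketched as unordered set formation, with lexical atoms as strings and merged objects as frozensets, so that the inputs are left unchanged, in keeping with the no-tampering condition:

```python
def merge(x, y):
    """Merge(X, Y) = {X, Y}: combine two syntactic objects into an
    unordered set, leaving X and Y themselves unchanged (NTC)."""
    return frozenset([x, y])

# 'see the man' as {v, {det, n}}: Merge first builds the complement
# {det, n}, then applies to its own output.
np = merge('det', 'n')   # {det, n}, the Noun Phrase
vp = merge('v', np)      # {v, {det, n}}, the Verb Phrase

assert vp == frozenset(['v', frozenset(['det', 'n'])])
```

Because merge applies freely to its own output, repeated application generates an unbounded set of discrete, hierarchically structured objects, which is all that conditions I and II demand of the combinatory core.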

Each syntactic object X formed by the repeated application of Merge has properties that enter into further computation, including semantic/phonetic interpretation: a verb phrase VP functions differently from a noun phrase NP. In the best case, this information about X will be contained in a single designated element of X, its label, which can be located by a search procedure as the computation involving X proceeds. In the best case, the search procedure will be optimal, hence plausibly an instance of factor (4). We will put aside for the moment the interesting question of optimal labeling algorithms, noting only that in the simple case of lexical item ('head') H and complement XP, {H, XP}, the optimal minimal search algorithm will locate H

12 There may be justification for an additional operation of pair-Merge that forms ordered pairs. For some discussion of this point, see N. Chomsky (2009, and this volume). In the best case, we can reduce the number of merged items to exactly two; see Kayne (1984) for evidence on this point.

Page 43: Rich Languages From Poor Inputs

28 Berwick, Chomsky, & Piattelli-Palmarini

as the label, thus v in {v, {det, n}}, a Verb Phrase. (More on this in Chomsky's separate contribution to this volume.)

Let us take Y to be a term of X if Y is a subset of X or a subset of a term of X. If we think of Y merged to X, then without stipulation we have two possibilities: either Y is not a term of X, what is called external Merge (EM); or else Y is a term of X, what is called internal Merge (IM). In both cases the outputs are {X, Y}. External Merge typically underlies argument structure, as in 'see the man', with 'the man' the Noun Phrase object of 'see' in the Verb Phrase {X, Y} (omitting irrelevant details). Internal Merge typically underlies non-argument structure (discourse, scope-related, and the like). For example, in topicalization constructions such as 'him, John really admires v', an intonation peak is placed on the 'new' information, 'him', which is associated via Internal Merge (IM) with the position marked by v, where it receives its semantic role by External Merge (EM). This contrasts with the construction without IM operating, namely, 'John really admires him', with normal intonation.13

IM yields two copies of Y in {X, Y}, one copy internal to X, and the other external to X, in what is sometimes called the 'copy theory of movement'. Note that the fact that both copies appear, unchanged, follows from the optimal computational constraint NTC: it would require extra computational work to delete either one. Thus there is no need to explain the existence of copies, since they in effect 'come for free'. What would require explanation is a ban on copies. Furthermore, contrary to common misunderstandings, there is no operation of 'forming copies' or 'remerging copies'. Rather, the copy theory of movement follows from principles of minimal computation.
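The point that copies require no extra operation can be illustrated over the same set-based encoding used above (again, the frozenset representation is our expository choice, not the theory): merging a term of X back with X simply places one occurrence at the edge while the original occurrence remains, unchanged, inside.

```python
# Internal vs. external Merge over a set-based encoding of syntactic objects.
# With Merge as bare set formation, the 'two copies' of the copy theory
# fall out with no copying operation at all.

def merge(x, y):
    return frozenset([x, y])

def is_term(y, x):
    """Y is a term of X if Y is a member of X or a term of a member of X."""
    return isinstance(x, frozenset) and (y in x or any(is_term(y, m) for m in x))

inner = merge("wrote", "what")   # {wrote, what}
clause = merge("you", inner)     # {you, {wrote, what}}, simplified

# 'what' is already a term of the clause, so merging it again is IM:
raised = merge("what", clause)

assert is_term("what", clause)   # IM, not EM
assert "what" in raised          # one occurrence at the edge...
assert inner in clause           # ...while the original position is untouched (NTC)
```

Deleting either occurrence would require extra computational work, which is the sense in which copies 'come for free'.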

Suppose, for example, we have the structure (9a) below. Taking Y = what and X = the syntactic object corresponding to the structure of (9a), with Y a term of X, and applying internal Merge, we obtain the output (9b), where what is in the so-called 'Specifier' position of Comp:14

(9a) [Comp [you wrote what]]

(9b) [Spec what [Comp [you wrote what]]]

It is apparent that internal Merge—the special case where either X or Y is a term of the other—yields pairs, or 'raising' constructions of the kind discussed earlier in section 2.2: the structurally lower occurrence of what in (9b) is in its proper position for

13 This distinction between structures formed via EM and those formed by IM is sometimes called the 'duality of semantics', and is presumably part of UG. Relying on it, the child knows that in such structures as 'what eagles eat' (as in 'I know what eagles eat'), etc., 'what' is displaced from the underlying structure formed solely by EM that yields the correct interpretation of 'what' as the object of 'eat'. More complex systems that bar IM and instead add new mechanisms have to provide a more intricate account of this pervasive and quite salient duality property of semantics, which has to be captured in some way in any adequate theory of language. There are some apparent departures, presumably reflecting our current lack of understanding.

14 This 'specifier' position itself may well be eliminable. See section 2.5 and Chomsky's separate contribution in this volume. This possibility does not bear on the discussion in this section. Here, 'Comp' stands for the 'complementizer', sometimes overt, as in 'it seems that John wrote something'.


POS Stands: Why Recent Challenges Fail 29

interpretation (as an argument of 'wrote'), while the structurally higher occurrence of what is in the position where it is 'pronounced' (and, furthermore, interpreted as an operator ranging over the construction, so that the interpretation is roughly 'for which thing x, you wrote the thing x'). Thus this formulation meets requirement (II). Given the two descriptive principles mentioned earlier, one for 'atoms' and the other for all phrases, IM generates a structured object that provides precisely the proper positions for interpretation.15

Importantly, having Merge operate freely, including both EM and IM, is the simplest option. It would require some specific stipulation to rule out either IM or EM. And it would require further stipulation to develop new mechanisms to achieve the same results as in computation with unrestricted Merge. Such stipulations to construct pairings enrich UG, the domain-specific factor (1), and therefore require empirical evidence. What would be needed is evidence for the double stipulation of barring IM (or EM) and adding new descriptive technology to replace what IM and EM do without stipulation. Lacking such empirical evidence, we keep to the simplest Merge-based system.

As with (9a), (1a) may now be rewritten as (10), with two copies of can, the structurally lower copy indicating the proper place for interpretation associated with eat, and the structurally higher one indicating the position for pronunciation:

(10) [can [eagles that fly] can eat]

The relation between the (1a,b) pairs is thus established via the IM operation and the resulting copies.16 Note that the v notation used earlier for exposition may now be seen to be more than just an expository convenience. Understood as a copy, not a notational device, it captures the pairing in what appears to be an optimal way. (10) exhibits the syntactic structure transferred to the language components responsible both for articulating and interpreting the syntactic form. It is at this latter stage that explicit pronunciation of the second occurrence of 'can' is suppressed.17

15 Since the No Tampering Condition (NTC) does not permit any manipulation of the structure X, the only possible operation is to raise Y from within X; lowering Y into X is barred. Thus without stipulation the duality of semantics is determined in the right way: the structurally higher position is not the position where argument structure is determined but instead has to be the operator position, which also conforms, automatically, to the structural notion of 'c-command' determining scope, as necessary for independent reasons—as in standard quantification theory notation. (See also Chomsky, this volume.)

16 We adopt here and elsewhere the convention of underlining the unpronounced lower copy.
17 There is some (arguably marginal) evidence from child language studies (Nakamura and Crain 1987;

Ambridge et al. 2008) that there could be a presumptively performance tendency to repeat this second occurrence, so-called 'aux-doubling', a fact lending additional credence to the copy theory. Further, there are interesting cases where some residue of the lower copy is retained in pronunciation, for example, if the copy is in a position where an affix requires it. Even so, the overwhelming phenomenon is deletion of the lower copy, for reasons that are discussed in Berwick and Chomsky (2011): it saves considerable duplicated neural-mental and articulatory computation. It seems to be the case that there is no language that 'pronounces' the full set of copies, e.g., in 'which picture of John did you say Bill told Mary Tom took' the fully spelled-out other copies would amount to (at least) something like, 'which picture of John did you


Systematically running through examples (2)-(9), we can now readily check that in each case the copying account automatically fills in the legitimate locations for v, adv, a, or wh interpretation, meeting our requirements (I) and (II), and most of (III). For example, in (6a), repeated below as (11), 'happy' is interpreted properly in its position after the predicate 'is':

(11) [Happy though [the man who is tall] is happy], he's in for trouble
compare: Though the man who is tall is happy, he's in for trouble.

To capture the constraints on pairings, we need to add the two language-dependent principles mentioned earlier: first, for v-pairing, the 'raised' v is the one structurally closest to the clause-initial position; second, in all cases, subject relative clauses act as 'islands'.18 Given this, all of the criteria (I)-(IV) are satisfied.19

These are straightforward examples. But copying can also account for far more complex cases, where, for example, quantificational structure cannot simply be read directly off surface word order, another potentially serious POS problem. For instance, in (12a) below, 'which of his pictures' is understood to be the object of 'likes', analogous to 'one of his pictures' in (13). The copying account renders (12a) as (12b), with the copy 'which of his pictures' in exactly the correct position for interpretation. Further, the quantifier-variable relationship between 'every' and 'his' in (12a) is understood to be the same as that in (13), since the answer to (12a) can be 'his first one' (different for every painter, exactly as it is for one of the interpretations of (13)). No such answer is possible for the structurally very similar (14). Here too the correct structure is supplied by (12b). In contrast, in (14) 'which of his pictures' does not fall within the scope of 'every painter', the right result.

(12a) [which of his pictures] did they persuade the museum that [[every painter] likes best?]

(12b) [which of his pictures] did they persuade the museum that [[every painter] likes [which of his pictures] best?]

(13) they persuaded the museum that [[every painter] likes [one of his pictures] best]

say [which picture of John] Bill told Mary [which picture of John] Tom took [which picture of John]'. (There are some claims about Afrikaans which assert that this particular language violates this principle, but we put these to one side here.) In fact, in examples like these, sometimes called 'successive-cyclic movement', the position of the unpronounced copy is often marked by some device—morphology, special agreement, or, in a case of Spanish discussed by Torrego (1984), V-raising. V-raising meets the standard conditions as outlined in the main text, as expected.

18 For interesting data and arguments showing the crucial importance of 'islands' and why these cannot be extracted from statistical data, see also the chapter by Kam and Fodor in this volume.

19 We leave open the possibility that there might be some language-independent principles related to island constraints, as discussed in Berwick and Weinberg (1984).


(14) [which of his pictures] persuaded the museum that [[every painter] likes flowers?]

A wide range of similar cases involving such 'reconstruction effects' are readily accommodated by the copying account, all within this very restricted UG.

2.4 Other Explanatory Attempts

Since the first expository examples were formulated, there have been attempts to formulate alternative partitionings of factors (1)-(4), distinct from the account given in section 2.3. In this section we review three of the most recent such approaches in light of our criteria listed in section 2.2. In general, while these recent alternatives also strive to reduce the linguistic domain-specific factor (1), the right methodological goal, we shall see that they all fail. For one thing, they leave the principle of the structure dependence of linguistic rules untouched. Further, some aim only to generate the correct polar interrogative sentence strings, rather than addressing the only real question of linguistic interest, which is generating the correct structures for interpretation along with correct pairings, as we emphasized in section 2.1. Those that do aim to get the right pairings, sometimes implicitly, still fail to do so, as we shall show. Finally, in general they do not address the broader cross-linguistic and empirical examples and cannot generate the attested broader patterns of correct and incorrect pairings.

2.4.1 Clark and Eyraud (Clark and Eyraud 2007; Clark, Eyraud, and Habrard 2008; Clark 2010); hereafter, CE

We begin by considering a string-based approach that was motivated by considering some of Zellig Harris's proposals on 'discovery procedures' for grammars. CE advance an inference algorithm for grammars that, given positive examples such as (15a) and (15b) below, generalizes to a much larger derivable set of sentences that includes examples such as (15c), while correctly excluding ungrammatical examples such as (15d).

(15a) men are happy.

(15b) are men happy?

(15c) are men who are tall happy?

(15d) * are men who tall are happy?

Briefly, the method weakens the standard definition of syntactic congruence, positing that if two items u and v can be substituted for each other in a single sentence context, then they can be substituted for each other in all sentence contexts. E.g., given 'the man died' and 'the man who is hungry died', we can conclude that the strings 'the man'


and 'the man who is hungry' are substitutable for one another in these sentences, and therefore are substitutable in all sentences; similarly, given a new sentence, 'the man is hungry', we may use the congruence of 'the man' and 'the man who is hungry' to substitute for 'the man', yielding 'the man who is hungry is hungry'.

CE call this notion 'weak substitutability' to distinguish it from the more conventional and stronger definition of substitutability, which of course does not extend existential substitutability to universal substitutability. (The punctuation marks at the end of the example sentences are actually crucial for the operation of the algorithm; see Clark and Eyraud 2007.) Weak substitutability imposes a set of (syntactic) congruence classes, a notion of constituency, on the set of strings in a language. For example, 'the man' and 'the man who is hungry' are in the same congruence class according to the two simple strings given above. This yields an account of sentence structure, 'how words are grouped into phrases'. It is this extension that does the work in CE's system of generalizing to examples that have never been encountered by a learner—that is, generating novel strings. But it is evident that these notions collapse at once.

CE themselves remark that weak substitutability will 'overgenerate radically' and on 'more realistic samples this algorithm would eventually start to generate even the incorrect forms of polar questions'. That is true, but misleading. The problems do not arise only 'eventually' and with 'more realistic samples', but rather at once and with very simple ones. E.g., from the examples 'eagles eat apples' and 'eagles eat', we conclude that 'eat' is in the same class as 'eat apples', so that substituting 'eat apples' for 'eat' yields the ill-formed string 'eagles eat apples apples'. Note that 'eat' and 'eat apples' are both verb phrases, but cannot be substituted for each other in 'eagles — apples'. In fact, virtually no two phrases will be substitutable for each other in all texts. Similar elementary examples yield incorrect forms for polar sentences. Thus, from 'can eagles fly' and 'eagles fly' we conclude that 'can eagles' and 'eagles' are in the same congruence class, yielding the polar question 'can can eagles fly'.
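The collapse can be verified mechanically. The sketch below is our own drastic simplification of the idea behind CE's procedure, not their actual algorithm: it treats two strings as congruent whenever they share even one sentence context, and then checks the two counter-examples from the text.

```python
# A rough sketch of 'weak substitutability' (our simplification of the
# idea, not CE's exact algorithm): u and v are treated as substitutable
# everywhere if they occur in the same context in even one sentence.

def contexts(sentence, u):
    """All (left, right) word contexts in which the substring u occurs."""
    out, words, k = [], sentence.split(), u.split()
    for i in range(len(words) - len(k) + 1):
        if words[i:i + len(k)] == k:
            out.append((tuple(words[:i]), tuple(words[i + len(k):])))
    return out

def weakly_congruent(u, v, corpus):
    """u ~ v if they share at least one context somewhere in the corpus."""
    cu = {c for s in corpus for c in contexts(s, u)}
    cv = {c for s in corpus for c in contexts(s, v)}
    return bool(cu & cv)

corpus = ["eagles eat", "eagles eat apples", "can eagles fly", "eagles fly"]

# 'eat' ~ 'eat apples' (shared context 'eagles —'), licensing the
# ill-formed 'eagles eat apples apples':
assert weakly_congruent("eat", "eat apples", corpus)
# 'eagles' ~ 'can eagles' (shared context '— fly'), licensing the
# ill-formed polar question 'can can eagles fly':
assert weakly_congruent("eagles", "can eagles", corpus)
```

Even on this four-sentence corpus, the existential-to-universal generalization licenses the deviant strings discussed in the text, confirming that the overgeneration arises at once, not only 'eventually'.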

To take another example, consider the following simple sequence. It yields an ungrammatical sentence derivation (square brackets are introduced for readability; '=' denotes 'is weakly substitutable for'):

(16) does he think [well]?

(17) does he think [hitting is nice]?; ∴ well = hitting is nice

Accordingly, given the sentence 'is he well?', we may substitute 'hitting is nice' for 'well' to yield the invalid string 'is he hitting is nice'. In short, it is easy to construct many simple counter-examples like this that violate the weak generative capacity of English. As has long been known, such an approach cannot get off the ground, even with the simplest cases.

As we stressed earlier, the question of interest is generating the right structures for interpretation, along with the proper pairings. The simplest way we know


of—virtually without assumptions—is the one we just sketched (which, when spelled out carefully, incorporates the basic assumptions of note 5).20

To summarize, CE develop an approach that fails even for the simplest examples and completely avoids the original problem, and of course does not even address the question of why the principles at work generalize broadly, it seems universally. There seems to be no way to remedy the irrelevance of CE's proposal while keeping to anything like their general approach.

2.4.2 Perfors, Tenenbaum, and Regier (2011), PTR: Bayesian model selection of context-free grammars

PTR also consider the key question of domain-specific vs domain-general knowledge in language acquisition, but from a different perspective and with a very different way of partitioning factors (1)-(4). We review their approach briefly before turning to its evaluation.

Factor (1), prior, domain-specific linguistic knowledge:
For PTR, this consists of a series of crucial stipulations:

(i) Sentence words are assigned unambiguous parts of speech.
(ii) In particular, PTR represent a sentence such as 'eagles that can fly eat' as the

part-of-speech sequence 'n comp aux v vi'; the sentence 'eagles eat' as 'n v'; 'eagles can eat' as 'n aux v'; 'can eagles eat' as 'aux n vi'; and 'eagles are happy' as 'n aux adj'.

Here, the part-of-speech label 'n' denotes any noun; 'comp', the 'complementizer' that introduces embedded S's, typically the word 'that'; and 'adj', any adjective. For PTR's analysis, 'aux' denotes any auxiliary verb (including can, do, will, and the copula in its varied uses; thus appearing twice in 'is the child being obstinate'); 'vi' denotes any verb taken to be uninflected for tense, e.g., 'eat' in (5b); and 'v' any inflected verb, e.g., 'fly' in (5b). Note that 'fly' and 'eat' are actually ambiguous as to whether they are inflected or not, but PTR assume this choice to have been resolved in the required way before the analysis proceeds, by some means that they do not discuss. We note that the CHILDES training corpus they use does not in fact typically distinguish 'v' and 'vi'; the novel tag 'vi', which plays a crucial role in the analysis, has been introduced by PTR as a stipulation.

(iii) All the phrases S, NP, VP, IP, etc. required to build a context-free grammar to cover at least the sentences in the training corpus (PTR further assume as given the correct phrase boundaries for the part-of-speech sequences in the training corpus); a (stochastic) context-free grammar that can parse all the sentences of

20 Clark (2010) extends the distributional approach to include an explicit notion of structure, thereby remediating the issue of addressing only weak generative capacity, but as far as we are able to determine, as of yet there are no results that yield the desired auxiliary verb POS results.


the training corpus; and a finite-state (right-linear, regular) grammar usually derived from the Context-Free Grammar (CFG) that can also parse all the sentences of the training corpus.21 Note in particular that PTR's system does not learn any particular grammar rules; these too are stipulated.

Factor (2), domain-general knowledge:
PTR assume a Bayesian model selection procedure that can choose among the

three grammar types, picking the one with the largest posterior probability ('most likely') given the corpus. This probability is in turn the product of two factors: (i) the prior probability of a grammar, P(G), essentially a measure of the grammar's size, with larger grammars being less likely; and (ii) the likelihood of a grammar-corpus pair, which is the conditional probability of generating (parsing) the given corpus given the grammar, P(corpus|G). The 'best' likelihood P(corpus|G) is found by attempting to maximize P(corpus|G), by altering the initial uniform probabilities of the antecedently stipulated CFG or Finite-State Grammar (FSG).22
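The shape of the comparison can be rendered schematically as follows. All of the numbers in this sketch (rule counts, log-likelihoods, the per-rule size penalty) are invented toy values, chosen only to show how the posterior trades the prior (grammar size) off against fit; they are not PTR's figures, and the prior here is a simplified stand-in for their actual grammar prior.

```python
# Schematic Bayesian grammar comparison in the spirit of PTR's setup.
# Toy values throughout; in log space the product P(G) * P(corpus|G)
# becomes a sum.

def log_prior(n_rules, penalty_per_rule=2.0):
    """Smaller grammars get higher prior probability: log P(G) shrinks with size."""
    return -penalty_per_rule * n_rules

def log_posterior(n_rules, log_likelihood):
    """log P(G | corpus) = log P(G) + log P(corpus | G) + constant."""
    return log_prior(n_rules) + log_likelihood

# Toy comparison: a compact CFG vs. a much larger FSG covering the same corpus.
cfg_score = log_posterior(n_rules=20, log_likelihood=-1000.0)
fsg_score = log_posterior(n_rules=300, log_likelihood=-980.0)

# The size penalty outweighs the FSG's slightly better fit:
assert cfg_score > fsg_score
```

The qualitative point is that a far larger grammar must buy a substantially better likelihood to win; as discussed below, since FSGs covering the same corpus are in general much larger than the corresponding CFGs, the outcome of this comparison is largely built in.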

Factor (3), external stimuli:
PTR use a 'training' set of 2,336 'sentence types' selected from the CHILDES Adam

corpus. As mentioned, actual sentence words are replaced with pre-assigned part-of-speech tags; certain sentences have been removed from the corpus.23

Two basic elements are learned by PTR's system: (1) the re-estimated probabilities for the context-free or finite-state rules that (locally) maximize the likelihood of a corpus, grammar pair; and (2) which of the stipulated types of grammar (memorized sentence list, FSG, or CFG) yields the highest posterior probability. In two cases, PTR's method does construct grammar rules per se. One approach conducts an automatic 'local search' from a given hand-specified context-free grammar (and its corresponding finite-state grammar). The second attempts to carry out a (partially) global, automatic search of the space of possible FSGs, while also calculating the posterior probability of the resulting grammars.

21 PTR also posit a third 'grammar' type, which consists of simply a memorized list of the sentences (for them, part-of-speech sequences) in the corpus. Some versions of PTR's analyses start from a hand-built context-free grammar (CFG) and then carry out a 'local search' in the space of grammars around this starting point, to see whether this alters their Bayesian selection of CFGs over Finite-State Grammars (FSGs). It does not. But we should note, as do PTR, that there is no mechanical inference procedure provided for constructing CFGs generally; even for FSGs the problem is known to be NP-hard.

22 PTR include a third factor, the probability of the particular grammar type, T (i.e., memorized list, finite-state/regular, or context-free), but since these probabilities are all set to be the same, as PTR note, the T value does not alter the relative final posterior probability calculation. The maximization of P(corpus|grammar) is done by a local hill-climbing search method known as the 'inside-outside' algorithm; the details here are not relevant except to note, as PTR do, that this method is not guaranteed to find a global maximum.

23 'The most grammatically complex sentence types are removed...'; specifically (PTR fn. 5), 'Removed types included topicalized sentences (66 individual utterances), sentences containing subordinate phrases (845), sentential complements (1636), conjunctions (634), serial verb constructions (460), and ungrammatical sentences (443)'. For example, PTR exclude the sentence with the subordinate clause, 'are you as tall as Mommy' (Adam02.txt, example 1595).


Specifically, PTR argue that the Bayesian calculus works out so as to rank stochastic context-free grammars with higher posterior probabilities—a 'better fit to the corpus'—than the two other choices, which they take to lack hierarchical structure, establishing that this latter property of natural language is learnable without having to posit it a priori.24

PTR claim two main results. First, PTR conclude that 'a learner equipped with the capacity to explicitly represent both linear and hierarchical grammars—but without any initial bias to prefer either in the domain of language—can infer that the hierarchical grammar is a better fit'. Second, PTR assert that their 'best' (most probable) context-free grammars exhibit 'mastery' of the auxiliary system: '...we show that the hierarchical grammar favored by the model—unlike the other grammars it considers—masters auxiliary fronting, even when no direct evidence to that effect is available in the input data' (p. 315).25

However, as we show directly, PTR do not establish either of these results, and in particular have not confronted the original POS problem at all.

2.4.2.1 Learnability of hierarchical structure? Consider first the question of the learnability of hierarchical structure, given PTR's three choices for covering the given corpus: a memorized finite list of part-of-speech sequences; a (stochastic) context-free grammar; and a regular (finite-state) right-linear grammar derived from the covering context-free grammar.

We may immediately exclude (as they also do) the finite list option as a viable option for any realistic learning model for natural languages. The finite 'list grammar' simply memorizes each particular part-of-speech sequence in the corpus as a special case. Not only is each sentence then totally unrelated to every other sentence—the next sequence could even come from a completely unrelated language, such as German

24 PTR argue that their three-way choice is a reasonable starting point, though they agree these sorts of grammars are inadequate as models for human language. They also agree that a child does not actually follow this procedure. What is crucial to them is that such a completely general statistical Bayesian procedure can converge on the hypothesis that the grammar has hierarchical structure. But these three possibilities verge on straw-man possibilities—by their own admission, they are not alternatives a child would actually entertain. The finite memorized set is not even worth considering, for elementary memory reasons. Further, as we note in the main text, FSGs do yield hierarchical structure, unless we add an extra assumption of strict associativity. We are then left with two choices, not three, both with hierarchical structure, and, as has been known from the foundational work in formal language theory, finite-state grammars will in general be much larger than CFGs generating the same regular language (see Meyer and Fischer 1969; Berwick 1985). So FSGs are easily eliminated as possible candidates for natural languages, as was familiar from the earliest work in the field. Furthermore, ordinary CFGs are also ruled out, for reasons understood 40 years ago when they were eliminated in favor of X-bar theories of phrase structure (N. Chomsky 1970; Jackendoff 1972). The basic conclusions carry over to all work we know of making use of phrase structure grammar—which can apparently be eliminated in favor of the most elementary combinatorial operation along the lines discussed above.

25 'We argue that phenomena such as children's mastery of auxiliary fronting are not sufficient to require that the innate knowledge constraining generalization in language acquisition be language-specific. Rather it could be based on more general-purpose systems of representation and inductive biases that favor the construction of simpler representations over more complex ones.' (p. 311)


or Chinese—but storage quickly grows far beyond any conceivable memory capacity (and in the limit, is of course impossible). Furthermore, there is no way to address either of the basic conditions (I) or (II) above.

That leaves only the context-free and regular grammars as real candidates. Assuming language to be infinite, as do PTR, then there must be some operation that eventually applies to its own output, that is, recursively, or some logical equivalent like a Fregean ancestral. The sequence of applications of this operation always fixes some hierarchical structure (one notion of strong generation), which is not to be confused with the (weakly generated) string that is produced. E.g., assuming f to be a successor operation, applied to a single element a, we obtain the structured object f(...(f(f(a)))...) along with the weakly generated string aⁿ. The operation applies to the output of a previous application of that same operation, and so on. Note that hierarchical structure will always be produced when generating infinite languages, even in the finite-state case, though we can add an extra operation that removes it, such as right associativity in the previous example. Similarly, for CFGs, an operation to remove structure can be added, leaving the (weakly generated) 'terminal string'. Thus both of PTR's remaining options generate hierarchical structure, and so there is actually no choice as to whether language is to be represented with hierarchical structure or not. The only question that PTR actually address is whether context-free grammars are to be preferred to finite-state grammars—both inducing hierarchical structure, both long known to be inadequate as descriptions for natural language, as PTR themselves note (p. 329)—while excluding by stipulation the simpler and apparently much more adequate systems described above in section 2.3.
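The successor example can be made concrete with a toy encoding (tuples standing in for the structured objects; the encoding is ours, purely for illustration): the derivation object is nested, and producing the bare string requires a further, structure-erasing step.

```python
# Toy rendering of the point that generation induces structure:
# n applications of a successor operation f to the atom 'a' build the
# nested object f(...f(f(a))...); the weakly generated string is read
# off only by a separate step that erases that structure.

def f(x):
    """One application of the combinatory (successor) operation."""
    return ("f", x)

def derive(n):
    """n applications of f to the atom 'a'."""
    obj = "a"
    for _ in range(n):
        obj = f(obj)
    return obj

def flatten(obj):
    """Erase the structure, emitting one 'a' per application of f."""
    n = 0
    while isinstance(obj, tuple):
        n += 1
        obj = obj[1]
    return "a" * n

assert derive(3) == ("f", ("f", ("f", "a")))  # hierarchical derivation object
assert flatten(derive(3)) == "aaa"            # the weakly generated string
```

The same holds for a right-linear ('finite-state') derivation: the sequence of rule applications is itself nested, so flat strings emerge only if a structure-removing convention such as strict associativity is added.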

PTR assume that if a grammar produces hierarchical structures, then rules must be structure-dependent. But this is an error. Thus given the structure (1a) (= [can [eagles that v* fly] v eat]), we are free to interpret 'can' in the position v with a structure-dependent rule, or to interpret it in the position v* with a structure-independent rule. That was the original problem. It makes no difference whether structure is innately determined or learned, or if the latter, how it is learned. In all of these cases, the original POS problem remains unaffected.26
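The independence of the two issues can be shown with a toy sketch (our own encoding; the particular bracketing, word list, and 'verb' set are illustrative assumptions, not PTR's formalism or the chapter's): even with the hierarchical structure given, both kinds of rule remain statable, and only the structure-dependent one picks the right interpretation site.

```python
# Given bracketing for 'can eagles that fly eat' (cf. (1a), simplified):
# both a structure-independent and a structure-dependent rule can be
# stated over it; having the structure does not force the right rule.

tree = ("can", (("eagles", ("that", "fly")), "eat"))
words = ["can", "eagles", "that", "fly", "eat"]
VERBS = {"fly", "eat"}

def structure_independent_site(words):
    """'Interpret can at the first verb' -- linear order, brackets ignored."""
    return next(w for w in words[1:] if w in VERBS)

def structure_dependent_site(tree):
    """'Interpret can with the main-clause predicate' -- uses the brackets."""
    fronted, clause = tree
    subject, predicate = clause
    return predicate

assert structure_independent_site(words) == "fly"  # inside the relative clause: wrong
assert structure_dependent_site(tree) == "eat"     # main-clause predicate: right
```

Both rules are perfectly statable over the very same hierarchical object, which is the sense in which learning (or innately fixing) hierarchy leaves the choice between them, and hence the original POS problem, untouched.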

The confusion between hierarchical structure and structure dependence of rules appears throughout their paper. Thus they state, 'Henceforth, when we say that "language has hierarchical phrase structure" we mean, more precisely, that the rules of syntax are defined over hierarchical phrase-structure representations rather than a

26 An additional error is PTR's conflation of hierarchical structure with analysis into phrases. Thus suppose we have the following (non-hierarchical) sequence of phrases: [S [Aux is] [NP the eagle] [PP in the air]]. A structure-dependent rule can refer to the phrase-names S, Aux, NP, etc., remaining blind to the particular word tokens 'is', 'the', etc., and can front 'the eagle'. A structure-independent rule would ignore the brackets. Nothing in this presumes that the bracketing must be hierarchical, though of course it may be (and in fact generally is). The essential point is that grammatical operations make reference to phrases, rather than individual words; the 'hierarchical' addition is just that, PTR's own addition.


flat linear sequence of words. Is the knowledge that language is organized in this way innate?' (pp. 7-8).

But having hierarchical phrase structure does not entail that rules are defined over these structures. Rather, the question remains open. That was exactly the point of the original POS problem, which was originally posed on the assumption that structure is hierarchical.

Elsewhere they ask: 'is it [that the rules of syntax are defined over hierarchical phrase-structure representations] a part of the initial state of the language acquisition system and thus a necessary feature of any possible hypothesis that the learner will consider?' (p. 309). They do not address this question, contrary to what they assert. Rather, they consider an entirely different question: Is hierarchical structure innate or acquired? They claim to show that it is acquired, but they do not address this question either; rather, they beg the question by considering only a choice between two systems, both hierarchical (putting aside the inconceivable list option). And again, the answer to the question they beg leaves the POS problem unchanged. PTR do not address the original POS question regarding the learnability of the structure dependence of grammatical rules, as published in all the accounts regarding this topic (Chomsky 1968, 1971, 1975, 1980).

PTR go on to say that 'This question [learnability of hierarchical structure] has been the target of stimulus poverty arguments in the context of a number of different syntactic phenomena, but perhaps most famously auxiliary-fronted interrogatives in English' (p. 309). However, this is incorrect. The question has always been whether rules are structure-dependent, not whether language is hierarchical; the POS question remains as before, the choice between using structure or ignoring it when hypothesizing rules, regardless of whether children have to learn that language is hierarchical or not.27

27 PTR base their misconstrual on a single sentence from informal discussion in an international conference: 'We quote at some length from one of Chomsky's most accessible statements of this argument, in his debates with Piaget about the origins of knowledge' (Piattelli-Palmarini 1980). It is rather odd to take a sentence from informal discussion (not incidentally with Piaget) when so much is available in the very same conference proceedings, and in print elsewhere, directly refuting their misinterpretation. But even the passage they quote is clear in context. It refers to the suggestion that, if there is hierarchical structure, that would somehow solve the POS problem. It wouldn't, because while the child acquiring language can use the structure, giving the right result, the 'left-most' property is of course just as readily available as an induction base.

Furthermore, and more significantly, a few pages later (p. 124), Chomsky points out that the examples that were discussed (and that PTR rely on) 'are misleading in one important respect', namely, they are presented as if they are a list of properties of UG, but the important point is that 'this list of properties forms a highly integrated theory... [They] flow from a kind of common concept, an integrated theory of what the system is like. This seems to me exactly what we should hope to discover: that there is in the general initial cognitive state a subsystem (that we are calling [UG] for language) which has a specific integrated character and which in effect is the genetic program for a specific organ... It is evidently not possible now to spell it out in terms of nucleotides, although I don't see why someone couldn't do it, in principle'. The structure-dependent hypothesis discussed is one fragment of that integrated theory, which, we have suggested, can be reduced to much simpler terms that apply much more generally, and in crucial respects may be language- or even organism-independent.


38 Berwick, Chomsky, & Piattelli-Palmarini

As we have seen, PTR's proposals are irrelevant to the original question of structure-dependence, and are also irrelevant to the new question they raise of learnability of hierarchical structure. Furthermore, as already discussed, the conclusion that language is hierarchically structured follows virtually without assumptions, so the question they pose (and beg) does not arise.

In short, PTR do not deal with the only problem that had been posed: how the proper pairings are acquired by the child, in accord with the universal patterning of data as described in examples (5)-(8). PTR do not even answer the question as to why we should expect the acquired grammar to be a CFG in the face of overwhelming evidence that CFGs make far too many unwarranted stipulations; for example, there is no reason to choose the rule VP → V NP rather than VP → N PP. These are among the considerations that led to X-bar theory forty years ago.28 The merge-based system described in section 2.3 is simpler—requires fewer factor (i) language-specific stipulations—than PTR's 'best' CFG with its hundreds of rules. It also yields the required pairings straightforwardly, and appears to deal appropriately with the cross-linguistic examples and constraints that PTR's stipulations do not even address.

2.4.2.2 Reali and Christiansen (2005); RC: learning from bigrams and trigrams29

Besides PTR's Bayesian method, others have offered statistically based proposals for solving the POS problem for yes-no questions. We consider just one representative example here, a recent model by Reali and Christiansen (2005), hereafter RC. As summarized in a critique of this model, 'knowledge of which auxiliary to front is acquirable through frequency statistics over pairs of adjacent words (bigrams) in training corpus sentences' (Kam, Stoyneshka, Tornyova, Fodor, and Sakas 2008: 722).

RC's method is straightforward. Like PTR, RC use a corpus of child-directed speech from CHILDES as their test input data, but in this case, actual words, not just parts of speech; in this sense their approach is less stipulative than PTR's. This becomes the training data to calculate the frequency of word pairs. Given this, one can then calculate an overall sentence likelihood, even for previously unseen sequences of word pairs.30 This sentence likelihood is then used to select between opposing 'test sentence

28 It can be shown that, while PTR's system using tree substitution can correctly match a few of the correct and incorrect patterns for auxiliary-verb inversion, it fails on many others, both in terms of weak generative capacity as well as in terms of assigned parse trees, because the CFG rules sometimes interact together to yield the wrong results.

29 See also the critique offered by Kam and Fodor, this volume, and Berwick, Pietroski, Yankama, and Chomsky, 'Poverty of the Stimulus Revisited' (2011). This recent work and Kam's previous work (2007, 2009) show that RC's kind of statistical analysis can be extended to trigrams (sequences of three words) without success. Serious additional problems arise with the recurrent neural networks proposed by RC.

30 RC used cross-entropy as this likelihood measure, roughly like taking the product of each word-pair bigram, but in a log-transformed space (and so turning the product of probabilities into a sum). If we denote by P(w_i | w_{i−1}) the conditional probability of the word sequence w_{i−1}w_i, then the cross-entropy of a sentence N words long is −(1/N) Σ_{i=2}^{N} log₂ P(w_i | w_{i−1}). As Kam et al. (2008: 784, and Kam and Fodor this volume) note, 'cross-entropies and probabilities are intertranslatable (they are inversely proportional)'. Further, if the bigram count for a particular pair is 0, then RC use a smoothing method based on unigram likelihoods (the frequencies of the words considered by themselves).


pairs' similar to are men who are tall happy / are men who tall are happy, the idea being that sentences with the correct auxiliary fronting will have a greater likelihood than those with incorrect auxiliary fronting.
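For concreteness, the scoring procedure can be sketched in a few lines of Python. This is our own minimal reconstruction, not RC's code: the toy training corpus is invented for illustration, and unseen bigrams here back off to a scaled unigram estimate, a simplified stand-in for RC's unigram-based smoothing.

```python
from collections import Counter
from math import log2

# Toy "child-directed" training corpus (invented for illustration; the
# actual model was trained on ~10,000 Bernstein-Ratner utterances).
corpus = [
    "is the boy hungry",
    "the boy who is crying is hurt",
    "is the dog who is barking happy",
    "is the baby hurt",
]

tokens = [s.split() for s in corpus]
unigrams = Counter(w for ws in tokens for w in ws)
bigrams = Counter((ws[i], ws[i + 1]) for ws in tokens for i in range(len(ws) - 1))
total_words = sum(unigrams.values())

def cross_entropy(sentence, alpha=0.4):
    """Per-word bigram cross-entropy of a sentence; lower = more likely."""
    ws = sentence.split()
    logp = 0.0
    for i in range(1, len(ws)):
        c = bigrams.get((ws[i - 1], ws[i]), 0)
        if c:
            p = c / unigrams[ws[i - 1]]          # maximum-likelihood estimate
        else:
            # back off to a scaled unigram frequency for unseen bigrams
            p = alpha * unigrams.get(ws[i], 0.5) / total_words
        logp += log2(p)
    return -logp / len(ws)

good = "is the boy who is crying hurt"
bad = "is the boy who crying is hurt"
# The grammatical form should come out with the lower cross-entropy.
print(cross_entropy(good) < cross_entropy(bad))  # → True
```

Note that in this toy run the two forms tie on every bigram except the backed-off estimates for <crying hurt> versus <who crying>, so the outcome hinges entirely on the smoothing details, which anticipates the fragility discussed below.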

RC's (2005) Experiment 1 demonstrates that on 100 test pairs, so-called polar interrogatives with subject relative clauses (PIRCs), the bigram method successfully chooses the correct form 96 percent of the time (as restated by Kam et al. 2008: 773, Table 1). RC go on to demonstrate that simple recurrent neural networks, SRNs (Lewis and Elman 2001), can be trained on the same data, replicating this performance.

RC also extended their bigram approach to a trigram model. That is, they calculated sentence likelihoods according to the frequencies of three-word sequences, rather than just two-word sequences. Kam (2007, 2009; Kam and Fodor this volume) also addressed this question, and confirmed that trigram performance essentially tracked that of bigrams, and succeeded (or failed) for the same reason that bigrams did. We can also confirm (see Table 2.1 below) that the homophony issue (between pronouns and complementizers in English) that Kam et al. discovered was the real source of the apparent success of RC's bigrams to discriminate grammatical from ungrammatical auxiliary-inverted forms arises as well in the trigrams case, so moving to trigrams or beyond will not help. Berwick et al. (2011) further argue that a neural network is essentially an emulation of the bigram analysis, and so also fails.

To test RC's trigram approach, we used child-directed utterances by adults from two versions of the Bernstein-Ratner (1984) corpus, one used by Kam et al., with 9,634 sentences, and the second supplied to us by RC, with 10,705 sentences.

We first checked to see whether we could replicate the Reali and Christiansen (2005) results using trigrams. Using their trigram estimation method, we again tested whether trigram statistics could successfully discriminate between the same 100 grammatical and ungrammatical test sentence pairs used by RC. The results are given in row 1 of Table 2.1: 95 percent correct, 5 percent incorrect, and 0 undecided. This is quite close, but not exactly the same as Reali and Christiansen's (2005) results. Reali

Experiments                                 Sentences Tested   % Correct   % Incorrect   % Undecided

1. Replication of Reali and                       100              95            5             0
   Christiansen (2005), trigram test

2. Trigram test, using Reali and                  100              74           11            15
   Christiansen's (2005) corpus and
   test sentences, but with
   disambiguated rel-pronouns

TABLE 2.1. Percentage of test sentences classified correctly vs incorrectly as grammatical or undecided, using RC's trigram analysis and Kam et al.'s methodology

Page 55: Rich Languages From Poor Inputs

40 Berwick, Chomsky, & Piattelli-Palmarini

and Christiansen found that the trigram analysis gave exactly the same results as the bigram analysis, incorrectly judging the same four ungrammatical sentences as more likely than their grammatical counterparts. Our replication made a single additional mistake, classifying is the box that there is open as more likely than is the box that is there open. In this last case, aside from one exception, all trigrams in both the grammatical and ungrammatical sentences are 0, so the trigram values are actually estimated from bigram and unigram values. The sole exception is what makes the difference: the ungrammatical form contains a trigram with frequency 1, 'is open end-of-sentence', while the corresponding grammatical form has displaced 'is' to the front of the sentence, so this trigram does not occur. This single 'winning trigram' in the ungrammatical form is enough to make the ungrammatical form more likely under the trigram analysis than its grammatical counterpart.
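The effect of such a lone 'winning trigram' can be shown schematically. In the sketch below (ours; the trigram lists are abbreviated and the backoff value is arbitrary), every trigram but one is unattested and receives the same backed-off log-probability, so the single attested trigram decides the comparison in favor of the ungrammatical form.

```python
from math import log2

def score(trigram_counts, sentence_trigrams, backoff_logp=-12.0):
    """Sum of log2-probabilities; every unseen trigram receives the same
    backed-off estimate (a stylized stand-in for bigram/unigram smoothing)."""
    total = 0.0
    for tg in sentence_trigrams:
        count = trigram_counts.get(tg, 0)
        total += log2(count / 100) if count else backoff_logp
    return total

# One attested trigram in the training data: 'is open </s>' with count 1.
counts = {("is", "open", "</s>"): 1}
gram   = [("is", "the", "box"), ("box", "that", "is"), ("is", "there", "open")]
ungram = [("is", "the", "box"), ("box", "that", "there"), ("is", "open", "</s>")]
print(score(counts, ungram) > score(counts, gram))  # → True: the lone trigram tips the scale
```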

We then applied Kam et al.'s methodology regarding the effect of the homographic forms for who and that to the trigram case. We replaced each occurrence of who and that in the Reali and Christiansen test sentence data where these words introduced relative clauses with the new forms who-rel and that-rel. The training data was a similarly modified Bernstein corpus. We then applied the trigram calculation to classify the revised test sentences, with the results shown in row 2 of Table 2.1: 74 percent correct, 11 percent incorrect, 15 percent undecided. This is some improvement over the bigram results, of about 5 percent, but is still well below the close to perfect results found when 'winning bigrams' like who-is or that-is are not excluded. Thus, while trigrams boost performance slightly, the high accuracy for both bigrams and trigrams in discriminating between grammatical and ungrammatical sentences seems to be due to exactly the effect Kam et al. found: the accidental homophony between pronouns and complementizers in English, rather than anything to do with the yes-no question construction itself.
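Schematically, the disambiguation amounts to a relabeling pass over the corpus and test sentences. The sketch below is a toy stand-in: Kam et al. identified relative-clause who/that by inspection of the corpus, whereas here a crude rule (who/that immediately following a noun from a small hypothetical list) marks the relative pronouns.

```python
def disambiguate(sentence, nouns=("boy", "box", "dog", "men")):
    """Append '-rel' to 'who'/'that' when it immediately follows a noun,
    a crude proxy for identifying relative-clause contexts."""
    ws = sentence.split()
    out = []
    for i, w in enumerate(ws):
        if w in ("who", "that") and i > 0 and ws[i - 1] in nouns:
            out.append(w + "-rel")   # relative pronoun: give it its own token
        else:
            out.append(w)            # interrogative/demonstrative use: unchanged
    return " ".join(out)

print(disambiguate("is the boy who is crying hurt"))
# → "is the boy who-rel is crying hurt"
```

Once who-rel is a distinct token, the formerly 'winning' n-grams such as <who is> no longer inherit counts from interrogative who, which is what drives the drop in accuracy reported in row 2 of Table 2.1.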

In short, moving to a trigram model does not help solve this POS problem; indeed, this proposal too has been restricted to weak generative capacity, and has omitted the central issue of valid pairings.

Ultimately, the flaw here, parallel to that of CE, is that what the child (and adult) comes to know, as we have seen in our discussion, is indeed based on the structure dependence of rules, whether acquired or innate, and this knowledge cannot be replicated by simply examining string sequence frequencies.

More broadly, the bigram analysis makes no attempt to construct pairings. The bigram analysis takes any examples such as (1a,b) as a string of word pairs, with the declaratives unrelated to the corresponding interrogatives, thus avoiding the central issue of a semantic connection between a declarative sentence and its corresponding interrogative.

Further, the bigram analysis does not cover the desired range of cases in (6)-(15). Finally, the RC bigram analysis is not the simplest possible, since it demands a factor (2) probability calculation that does not otherwise seem to be required (and for longer


sentences becomes increasingly difficult). To be sure, it has been argued elsewhere (Saffran, Aslin, and Newport 1996; Hauser, Aslin, and Newport 2001) that such a facility might be available as part of some more general cognitive competence, even in other animals (though it has yet to be demonstrated that such high-precision numerical calculations are readily available). But as we have seen, there is a simpler alternative that gets the correct answers yet does not invoke any such likelihood calculation at all.

2.5 Conclusion: What POS Questions Remain?

Much progress has been made in the past half century in reducing the richness and complexity of the postulated innate language-specific properties, thus overcoming certain POS problems and laying a sounder basis for addressing further questions that arise within the biolinguistic framework: questions of acquisition/development, evolution, brain-language relations, and the like. Examples since the 1960s include the elimination of phrase structure grammar (PSG) with all of its complexity and stipulations, progressive simplification of transformational rules and finally their reduction to the same primitive and arguably minimal operation that yields the core of PSG properties, and much else. Needless to say, a great deal remains unexplained. And even if reduced, POS problems always remain, as apparent factor (i) elements are accounted for in terms of general principles—in the best case, natural law.

A good illustration is the example we have been considering: V-raising. As we discussed, there is a natural account in terms of minimal search, possibly a principle of computational efficiency that falls within laws of nature: namely, a clause-initial element C(omplementizer) that determines the category of the expression (declarative, interrogative, and the like) attracts the closest verbal element, where 'distance' is measured structurally, not linearly, the only possibility in a Merge-based system in which linear order is part of the mapping to the sensorimotor interface, hence does not enter into core syntactic/semantic computations. This proposal is based on a number of assumptions, some reasonable on empirical and conceptual grounds, but some of them illegitimate within the minimal Merge-based framework, hence stipulative, facts that have not hitherto been recognized. Note that any such stipulation amounts to a real POS problem, since it is a particular fact about language that must be antecedently available. As mentioned at the outset, we would like to eliminate such stipulations if possible, eliminating the POS problem that arises. While we cannot pursue the matter here in any depth, the general situation may be easily stated.

Abstracting from many important details, consider the syntactic object (18), exhibiting the basic structure of (1b), 'eagles that fly can eat':

(18) [C [AuxP Subject [AuxP Aux VP]]]

Here the subject, called the 'specifier of AuxP' (SPEC-AuxP), is 'eagles that fly', the inner Aux-phrase is 'can VP', and the VP is 'eat'. Aux is the head of the AuxP, carrying


all information about it that is relevant for further computation (its label). C searches for the closest label and finds Aux, which it raises to C. This is essentially the analysis in traditional grammar, though the terminology and framework are quite different. Further, it appears to capture the basic facts in the simplest way. But it is illegitimate on our assumptions, since the very notion of a Specifier position (SPEC) is an illegitimate borrowing from phrase structure grammar, which has long been abandoned, for good reasons (see Chomsky forthcoming, and in this volume).

On any account, putting the particular terminology to one side, the Subject is merged with AuxP to form a subject-predicate construction. In our terms, oversimplified here for expository reasons, the operation yields {subject, AuxP}. The Subject also has a label, say N (for simplicity): 'eagles that fly' is an NP. But the minimal search procedure that seeks the closest label in (1b) runs into an ambiguity: should it select N or Aux? The problem does not arise in the structure (18); here Aux is the closest label, by stipulation. But we have no basis for the stipulation that the Subject is the Specifier of the auxiliary phrase, SPEC-AuxP, rather than, say, AuxP being the Specifier of the Subject, SPEC-subject. The stipulation illustrated in (18) is therefore illegitimate. The learner must somehow know the right choice to pursue; this is a real POS question that one should try to eliminate, deriving the correct choice from more general principles that are already required.

Note that the same problem arises in any structure of the form A = {XP, YP} where neither XP nor YP is a lexical item (a head). A proposed solution should address the basic Aux/V-raising problems; and reciprocally, provide evidence for the assumptions on which they are based. They must be integrated within broader principles that hold for structures of the form A more generally. There is much to say about this topic, but this is not the place. These considerations about the special case of Aux/V-raising do, however, suggest ways to address a variety of problems that have resisted principled, non-stipulated analysis, while also opening new and quite intriguing questions. That is exactly the kind of outcome we should look forward to when principled solutions are sought for POS problems.

We cannot proceed with the matter here, but it is worth observing that the initial question proposed as a didactically simple illustration of the general POS problem has in this way opened up many productive lines of inquiry when addressed in the manner that is typical of questions of biology and linguistic science generally, as we discussed briefly at the outset.


3 Children's Acquisition of Syntax: Simple Models are Too Simple*

XUAN-NGA CAO KAM AND JANET DEAN FODOR

3.1 Introduction

3.1.1 Studying early syntax acquisition

There has been a renewal of interest in statistical analysis as a foundation for syntax acquisition by children. At issue is how much syntactic structure children could induce from the word sequences they hear. This factors into three more specific questions: How much structure-relevant information do word strings contain? What kinds of computation could extract that information? Are pre-school children capable of those kinds of computation? These points are currently being addressed from complementary perspectives in psycholinguistics and computational linguistics.

Experimental studies present a learning device with a sample of sentences from a target language, and assess what aspects of the target syntax are acquired. The learning device may be a child, an adult, or a computer program. The language may be artificial or (part of) a real natural language. Each of these combinations of learner and language is responsive to one of the methodological challenges in research on early syntax acquisition. In the research reported here, the language was natural but the learner was artificial. We explain below why we regard this combination as especially fruitful.

Testing infants has the undeniable advantage that the psychological resources (attention, memory, computational capacity) of the subjects match the resources available for real-life primary language acquisition. However, the input language in child studies is typically artificial, because it is improper to tamper with the acquisition of the subjects' native language, and also to control across subjects exactly what input

* This work originated in a joint project with Iglika Stoyneshka, Lidiya Tornyova, and William Sakas, published as Kam et al. (2008). The project was continued in Kam's (2009) dissertation. The present chapter has benefited from the technical expertise of Martin Chodorow and William Sakas.



they receive. In order for an infant to acquire properties of an artificial language in the span of an experimental session, the language must also be very simple. See Gomez and Gerken (1999) for a classic example.

Adult subjects can undergo more extensive and rigorous testing than infants, providing more data in less time. But again, the input language must be artificial and fairly simple for purposes of experimental control and uniformity across subjects (e.g., Thompson and Newport 2007). With adult subjects, moreover, it is not possible to exclude from the experimental context any expectations or biases they may have due to their existing knowledge of a natural language. For example, Takahashi and Lidz (2008) found that the adult subjects in their study respected a constituency constraint on movement in the test phase, even when the training sample contained no movement constructions. Although of considerable interest, this is prey to uncertainties similar to studies of 'normal' adult L2 acquisition: was the sensitivity of movement to constituency due to an innate bias, or to analogy or transfer from the subjects' L1?

Artificial language studies, with children or adults, provide no insight into what could be learned from word strings in the absence of any innate biases or prior linguistic experience. But this is the issue that has animated many recent computational studies of language acquisition, motivated in large part by a conjecture that language acquisition may not, after all, require any innate substrate, despite long-standing assumptions to the contrary by many linguists and psycholinguists. The focus of these computational studies is on pure distributional learning, relying solely on the information that is carried by regularities in the sequences of words.1 For investigating this, only an artificial learner will do. If the learning system is an algorithm implemented in a computer program, there is complete certainty as to whether, before exposure to the target input, it is innocent of linguistic knowledge of any kind (as in the model we discuss below), or whether it is equipped with certain biases concerning what language structure is like, such as the 'priors' of Bayesian learning models (Perfors et al. 2006, 2011) or some version of Universal Grammar as espoused by many linguists.

Another advantage of artificial learners is that the target language can be a real natural language, or a substantial part thereof. Since the learning algorithm has no L1, there are no concerns about transfer. More complex phenomena can be examined because there is little constraint on the extent of the training corpus or how many repetitions of it the learner is exposed to. Moreover, not only is the presence/absence of prior knowledge under the control of the experimenter, but so too are the computational sophistication and resources of the learning device. So this approach can provide systematic information concerning what types of computational system can extract what types of information from input data. We illustrate this in section 3.3 below.

1 While children make use of prosodic, morphological, and semantic properties of their input (Morgan 1986), these sources of information are set aside in many computational studies in order to isolate effects of co-occurrence and distribution of words.


The artificial learner approach has its disadvantages too, especially uncertainty as to which experimental outcomes have bearing on human native language acquisition. In compensation, however, a wide range of different algorithms can be fairly effortlessly tested, and informative comparisons can be drawn between them. The hope is that it may one day be possible to locate children's learning resources and achievements somewhere within that terrain, which could then provide guidance concerning the types of mental computation infants must be engaging in as they pick up the facts of their language from what they hear.

3.1.2 Transitional probabilities as a basis for syntax acquisition

The specific learning models we discuss here are founded on transitional probabilities. It has been demonstrated that infants are sensitive to transitional probabilities between syllabic units in an artificial language, and can use them to segment a speech stream into word-like units (Saffran et al. 1996). For syntax acquisition, what is relevant is transitional probabilities between one word and the next. Infant studies have documented sensitivity to between-word transitional probabilities which afford information about word order patterns and sentence structure (Gomez and Gerken 1999; Saffran and Wilson 2003). The type of learning model discussed below puts the word-level transitional probabilities to work by integrating them into probability scores for complete word strings, and on that basis predicts which strings are well-formed sentences of the target language (details in section 3.2.2). We assess the model's accuracy under various circumstances, and where it falls short we ask what additional resources would be needed to achieve a significant improvement in task performance.

The original stimulus for our series of experiments was a dramatic report by Reali and Christiansen (2003, 2005) (see also Berwick, Chomsky, and Piattelli-Palmarini in this volume). They found that an extremely simple model using transitional probabilities between words, trained on extremely simple input (speech directed to one-year-olds), was able to ace what is often regarded as the ultimate test of syntax acquisition: which auxiliary in a complex sentence moves to the front in an English question? If that finding could be substantiated, there would appear to be no need to develop more powerful acquisition models. Distributional learning of a complex syntactic construction would have been proved to be trivially easy.

We checked the finding and replicated it (results below). However, as we will explain, we found it to be fragile: almost any shift in the specific properties of the test sentences resulted in chance performance or worse. Thus, two questions presented themselves: (i) What distinguishes the circumstances of the original success from those of the subsequent failures? (ii) Does an understanding of that give grounds for anticipating that broader success is within easy reach, needing perhaps only slight enrichment of the original model or the information it has access to?


To address these points, the first author conducted a series of eighteen computer experiments, reported in full in Kam (2009). The earlier experiments, summarized here as background, showed that the model's use of transitional probabilities at the level of words, or even with part-of-speech categories, does not suffice for reliable discrimination between grammatical and ungrammatical auxiliary fronting (Kam 2007). In this paper we report our most recent experiments in the series, which were directed to the role of phrase structure in the acquisition of auxiliary movement. To anticipate: we found that if, but only if, the learning model had access to certain specific phrase structure information, it succeeded spectacularly well on the auxiliary-fronting construction. The implication is that transitional probabilities could be the basis for natural language syntax acquisition only if they can be deployed at several levels, building up from observable word-level transitions to relations between more abstract phrasal units.

3.2 The Original N-Gram Experiments

3.2.1 Linguistic preliminaries

The sentences tested in these experiments were instances of what we call the PIRC construction (Polar Interrogatives containing a Relative Clause), in which question formation requires fronting of the auxiliary in the main clause, not the auxiliary in the RC (N. Chomsky 1968 and since). Grammatical and ungrammatical forms were compared. Examples are shown in (1), with the trace of the moved auxiliary indicated here (though of course not in the experiments).2

(1) a. Isᵢ the little boy who is crying tᵢ hurt?

b. *Isᵢ the little boy who tᵢ crying is hurt?

Reali and Christiansen (henceforth R&C) tested n-gram models: a bigram model and a trigram model. A bigram is a sequence of two adjacent words; a trigram is a sequence of three adjacent words. These n-gram models did not differ radically in their performance, so for brevity here we focus on the bigram model. It gathers bigram data from a corpus of sentences, and feeds it into a calculation of the probability that any given sequence of bigrams would also occur in the corpus. The bigrams in sentences (1a,b) are shown, in angle brackets, in (2a,b) respectively.

(2) a. <is the> <the little> <little boy> <boy who> <who is> <is crying> <crying hurt>

b. <is the> <the little> <little boy> <boy who> <who crying> <crying is> <is hurt>
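The bigram inventories in (2) are mechanical to compute. The following fragment (our illustration, not part of R&C's implementation) extracts them and isolates the pairs on which the two versions differ:

```python
def bigrams(sentence):
    """Adjacent word pairs of a sentence, lowercased, final '?' stripped."""
    ws = sentence.lower().rstrip("?").split()
    return list(zip(ws, ws[1:]))

gram = "Is the little boy who is crying hurt?"
ungram = "Is the little boy who crying is hurt?"
print(bigrams(gram))
# The two versions share their first four bigrams; any likelihood
# difference must come from the remaining, non-shared pairs:
print(sorted(set(bigrams(ungram)) - set(bigrams(gram))))
```

Running this shows that the comparison turns on <who is>, <is crying>, and <crying hurt> in the grammatical version against <who crying>, <crying is>, and <is hurt> in the ungrammatical one.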

2 Following standard practice we refer to the inverting verbs as auxiliaries, though the examples often contain a copula (as in the main clause of (1) above). Below we also discuss do-support and inversion of main verbs.


Bigram statistics could be employed in many different ways within a learning model (see for example Chang et al. 2006; also section 3.4 below). The bigram model as defined by R&C puts bigrams to work in a direct and simple manner. It does not represent syntactic structure. It does not compose grammar rules. Its knowledge of the language consists solely of the set of all the bigrams in the training corpus, each assigned an estimated transitional probability (see below). R&C's experimental project thus raises the linguistic-theoretical question: Is it possible in principle to discriminate grammatical and ungrammatical forms of auxiliary inversion by reference solely to pairs of adjacent words?

We think most linguists would judge that it is not, for several reasons. One consideration is Chomsky's original point: that the generalization about the right auxiliary to move is not that it is in any particular position in the word string, but that it is in a particular position in the syntactic tree; auxiliary inversion is 'structure-dependent'. There are non-transformational analyses of the inversion facts, but they also crucially presuppose phrase structure concepts (see section 3.4.2 below). Also, auxiliary movement creates a long-distance dependency between the initial auxiliary and its trace if defined over the word string (six words intervene in (1a), clearly beyond the scope of a bigram), whereas the dependency spans just one element, an NP in every case, if defined over syntactic structure, bringing it within reach at least of a trigram model. So a purely linear analysis in terms of word pairs would seem unlikely to be able to capture the relevant differences that render (1a) grammatical and (1b) ungrammatical. However, R&C's noteworthy finding of successful discrimination by the bigram model suggests that we should pause and reconsider. Perhaps, after all, there are properties of the word pairs in the two sentence versions which, in some fashion, permit the grammatical one to be identified.

For instance, the bigram model might judge (1b) ungrammatical on the basis of its bigram <who crying>, which presumably is absent or vanishingly rare in a typical corpus. This may sound like a sensible strategy: judge a sentence ungrammatical if it contains an 'ungrammatical' (i.e., unattested) bigram. Against such a strategy the objection is often raised that a linguistic form may be unattested in a corpus for many reasons other than its being ungrammatical (cf. Colorless green ideas sleep furiously; Chomsky 1957). But in the case of auxiliary inversion, there is another and quite specific problem with this approach: the grammatical version (1a) also contains a vanishingly rare bigram <crying hurt>. By parity of reasoning, that should indicate to the model that (1a) is also ungrammatical, leaving no obvious basis for preferring one version of the sentence over the other. Thus, a decision strategy based on weighing one low-frequency bigram against another is delicately balanced: it might sometimes succeed, but not reliably so unless there were a systematic bias in the corpus against bigrams like <who crying> and in favor of bigrams like <crying hurt>. It is not clear why there would be; but that is just the sort of thing that corpus studies can usefully establish.


48 Kam and Fodor

An alternative strategy might focus instead on the higher-frequency bigrams in the test sentences. The learner might judge a sentence grammatical if it contains one or more strongly attested bigrams. A good candidate would be the bigram <who is> in (1a), which can be expected to have a relatively high corpus probability. Since the ungrammatical version has no comparably strong bigram in its favor, there is an asymmetry here that the learner might profit from. This generates an experimental prediction: if the grammatical version in all or most test pairs contains at least one strong bigram, a high percentage of correct sentence choices is likely;3 if not, the model's choices will not systematically favor the grammatical version. In the latter case, exactly how well the model performs will depend on details of the corpus, the test sentences, how bigram probabilities are calculated, and the sentence-level computations they are entered into. These we now turn to.
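The two candidate decision strategies just discussed can be sketched as simple rules over bigram counts. This is a toy illustration only: the counts and the attestation threshold below are invented for exposition, not drawn from the Bernstein-Ratner corpus.

```python
def attested(bigram, counts):
    """Weak-bigram veto: a sentence containing any unattested bigram is suspect."""
    return counts.get(bigram, 0) > 0

def has_strong_bigram(sentence, counts, threshold=50):
    """Strong-bigram endorsement: accept a sentence if some bigram in it
    is robustly attested (the threshold is an arbitrary illustrative value)."""
    pairs = zip(sentence, sentence[1:])
    return any(counts.get(p, 0) >= threshold for p in pairs)

# Invented toy counts: <who is> is frequent; <who crying> and <crying hurt>
# are unattested, so the veto strategy would reject BOTH versions of (1).
toy_counts = {("who", "is"): 120, ("is", "crying"): 8}
gram = "is the little boy who is crying hurt".split()
ungram = "is the little boy who crying is hurt".split()
```

On these toy counts the endorsement strategy favors the grammatical version (it alone contains the strong <who is> bigram), while the veto strategy condemns both versions, mirroring the dilemma described above.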

3.2.2 Procedure

For maximum comparability, all our experiments followed the method established by R&C except in the specific respects, indicated below, that we modified over the course of our multi-experiment investigation. The training corpus consisted of approximately 10,000 child-directed English utterances (drawn from the Bernstein-Ratner corpus in CHILDES; MacWhinney 2000). The test sentences were all instances of the PIRC construction. In a forced-choice task, grammatical versions were pitted against their ungrammatical counterparts (fronting of the RC auxiliary), as illustrated by (1) above.

For our Experiment 1, a replication of R&C's, we created 100 such sentence pairs from words ('unigrams') in the corpus, according to R&C's templates in (3), where variables A and B were instantiated by an adjective phrase, an adverbial phrase, a prepositional phrase, a nominal predicate, or a progressive participle with appropriate complements.

(3) Grammatical: Is NP who\that is A B?

Ungrammatical: Is NP who\that A is B?

The corpus contained monoclausal questions with auxiliary inversion (e.g., Are you sleepy?), and non-inverted sentences with RCs (e.g., That's the cow that jumped over the moon), but no PIRCs.

R&C computed the estimated probability of a sentence as the product of the estimated probabilities of the bigrams in the sentence.4 The sentence probability was entered into a standard formula for establishing the cross-entropy of the sentence (see details in R&C 2005 and Kam et al. 2008). The cross-entropy of a sentence is a measure of its unlikelihood relative to a corpus; a lower cross-entropy corresponds to a higher sentence probability. In the forced-choice task the model was deemed to select as grammatical whichever member of the test sentence pair had the lower cross-entropy relative to the training corpus. To simplify discussion in what follows, we refer to sentence probabilities rather than cross-entropies; this does not alter the overall shape of the results.

3 Of course it is possible that the bigrams in the ungrammatical version collectively outweigh the advantage of the strong bigram(s) in the grammatical version, so this strategy is not guaranteed to always lead to the correct choice. See results below.

4 The probability of a bigram not in the corpus must be estimated. We followed R&C in applying an interpolation smoothing technique. In what follows, we use the term 'bigram probability' to denote the smoothed bigram probability.

It is important to note that a bigram probability in this model is not the probability that a sequence of two adjacent words (e.g., boy and is) will occur in the corpus. It is the probability of the second word occurring in the corpus, given an occurrence of the first: the bigram probability of <boy is> is the probability that the word is will immediately follow an occurrence of the word boy. So defined, a bigram probability is equivalent to a transitional probability, as manipulated in the stimuli for the infant learning experiments noted above.
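The computation can be illustrated in a few lines of code. This is a toy sketch: the mini-corpus is invented, and simple add-k smoothing stands in for the interpolation smoothing that R&C actually used.

```python
import math
from collections import Counter

def bigrams(sent):
    """Adjacent word pairs of a tokenized sentence."""
    return list(zip(sent, sent[1:]))

def train(corpus):
    """Count unigrams and bigrams over a list of tokenized utterances."""
    uni, bi = Counter(), Counter()
    for sent in corpus:
        uni.update(sent)
        bi.update(bigrams(sent))
    return uni, bi

def bigram_prob(w1, w2, uni, bi, V, k=0.1):
    # Transitional probability P(w2 | w1): how likely w2 is to follow w1.
    # Add-k smoothing here is a stand-in for R&C's interpolation smoothing.
    return (bi[(w1, w2)] + k) / (uni[w1] + k * V)

def cross_entropy(sent, uni, bi, V):
    # Average negative log2 probability per bigram; lower = more probable.
    ps = [bigram_prob(a, b, uni, bi, V) for a, b in bigrams(sent)]
    return -sum(math.log2(p) for p in ps) / len(ps)

def choose(version_a, version_b, uni, bi, V):
    """Forced choice: select the version with the lower cross-entropy."""
    if cross_entropy(version_a, uni, bi, V) <= cross_entropy(version_b, uni, bi, V):
        return version_a
    return version_b
```

Trained on a few utterances containing <who is> but no PIRCs, such a model assigns the grammatical PIRC version the lower cross-entropy, just as described in the text.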

3.2.3 Initial results and their implications

In R&C's Experiment 1, the bigram model selected the grammatical version in 96 of the 100 test sentence pairs. In our Experiment 1 the model also performed well, predicting 87 percent of the test sentences correctly. Now we were in a position to be able to explore the basis of the model's correct predictions.

Some bigrams in the test sentences could not have contributed, because they were identical in the grammatical and ungrammatical versions. For the sentence pair (1), the bigrams <is the>, <the little>, <little boy>, and <boy who> are in both versions. The bigrams that differ are shown in Table 3.1; we refer to these as distinguishing bigrams. The model's selection of one sentence version over the other can depend only on the distinguishing bigrams.
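The shared/distinguishing split can be computed mechanically. A sketch (sentences are tokenized naively on spaces):

```python
from collections import Counter

def distinguishing_bigrams(version1, version2):
    """Return the bigrams unique to each version of a test sentence pair.
    Counter (multiset) subtraction handles any repeated bigrams correctly."""
    b1 = Counter(zip(version1, version1[1:]))
    b2 = Counter(zip(version2, version2[1:]))
    return sorted(b1 - b2), sorted(b2 - b1)

gram = "is the little boy who is crying hurt".split()
ungram = "is the little boy who crying is hurt".split()
```

Applied to the pair in (1), this yields exactly the three distinguishing bigrams per version listed in Table 3.1.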

The results showed, as anticipated in our speculations above, that the majority of correct choices were due to the contribution of the distinguishing bigram containing the relative pronoun in the grammatical version: either <who is> or <that is>. (Henceforth, we abbreviate these as <who\that is>.) This bigram had the opportunity to influence all judgments in the experiment because it appeared in every grammatical test sentence, and not in any ungrammatical versions. Note that this was by design: it was prescribed by the templates in (3) that defined the test items.

TABLE 3.1. Distinguishing bigrams for the test sentence pair (1a)/(1b)

(1a) grammatical      <who is>       <is crying>   <crying hurt>
(1b) ungrammatical    <who crying>   <crying is>   <is hurt>

The <who\that is> bigram boosted selection of the grammatical version in many cases because it had a higher corpus frequency than most other bigrams in the test sentences, in part because its elements are both closed-class 'functional' items, which recur more often than typical open-class lexical items.5 In the ungrammatical version, by comparison, the word who or that was followed by a lexical predicate, differing across the sentence pairs and mostly with low corpus frequency (e.g., <who crying> in (1b)).

In short: the <who\that is> bigram is the means by which the model was able to select the correct form of auxiliary inversion. Its performance rested on a strictly local word-level cue, without any need to recognize the auxiliary movement dependency per se or to learn anything at all about the structural properties of PIRCs. Thus, one part of our mission was accomplished. Discovering the decisive role of the <who\that is> bigram explains the model's strong performance in R&C's original experiment, and in our replication of it. But this discovery raises a doubt about whether the model could select the grammatical version of PIRCs that lack a helpful 'marker' bigram such as <who\that is>. Our next task, therefore, was to find out whether other varieties of PIRC contain bigrams that can play a similar role.

3.3 Limits of N-Gram-based Learning

3.3.1 Extending the challenge

The templates in (3) are very specific. They pick out just a subset of PIRC constructions, those with is as the auxiliary in both clauses, and an RC with a subject gap (i.e., the relative pronoun fills the subject role in the RC). But there are many other variants of the PIRC construction: the auxiliaries may differ, the RC could have a relativized object, the matrix clause might have a lexical main verb that requires do-support in the question form, or in some languages the main verb may itself invert. The rule is the same in all cases, but the bigrams it creates vary greatly. Table 3.2 shows some examples.

In our subsequent group of experiments, aimed at assessing how generally the bigram model could pick out grammatical versions, we tested PIRCs with is in both clauses but an object gap RC, and PIRCs with a main verb and do-support. We also tested Dutch examples in which the main verb inverts.

The bigram model did very poorly on these PIRC varieties not constrained by R&C's templates; see Table 3.3.

TABLE 3.2. More varied examples of auxiliary (or main verb) inversion

Subtype of PIRC                Example

Is-is subject gap              (1a) Is the little boy who is crying hurt?
Other auxiliaries              Can the lion that must sleep be fed carrots?
Is-is object gap               Is the wagon that your sister is pushing red?
Main verbs with do-support     Does the boy who plays the drum want a cookie?
Main verb inversion in Dutch   Wil de baby [die op de stoel zit] een koekje?
                               'Does the baby that is sitting on the chair want a cookie?'

TABLE 3.3. Bigram model performance for four varieties of PIRC

Subtype of PIRC                   % correct   % incorrect   % undecided

Is-is subject gap RC (as above)   87          13            0
Is-is object gap RC               35          15            50
Main verbs with do-support        49          51            0
Main verb inversion in Dutch6     32.5        55            12.5

These weak results suggest that the model did not find any reliable local cues to the grammatical version. Inspection of the distinguishing bigrams confirmed that these other PIRC varieties do not contain any useful 'marker' bigrams. These results thus support the diagnosis that when the bigram model does succeed, it does so on the basis of information that is neither general nor inherently related to the structurally relevant properties of PIRCs. It is no more than a lucky chance if some specific instantiation of the PIRC construction—such as the one originally tested—happens to offer a high-probability word sequence that correlates with grammaticality.

5 Other factors bestowing a powerful role on the <who\that is> bigram were the specific nature of R&C's smoothing formula, and the fact that many other bigrams in the test sentences were not in the corpus; for details see Kam et al. (2008: section 3.2).

A tempting conclusion at this point is therefore that this simple learning model is too simple to match the achievements of human learners. The original result was impressive, but subsequent tests appear to bear out the hunch that a word-level learner is not equipped to recognize the essential difference between correct and incorrect auxiliary inversion. Neither the early success on is-is subject gap PIRCs nor the nature of the subsequent failures encourages the view that broader success could be attained by minor adjustments of the model or its input. So perhaps one might rest the case here. However, we really hoped to be able to settle the matter once and for all, so that later generations of researchers would not need to revisit it.

Also, to be fair, it should be noted that no child acquisition study to date has investigated the age (and hence the level of input sophistication) at which learners of English or any language achieve mastery of object gap PIRCs and do-support PIRCs.7

This lacuna in the empirical record includes the much-cited early study by Crain and Nakayama (1987), which focused on the is-is subject gap variety. One step in the right direction is taken in a recent study by Ambridge et al. (2008), which extends the domain of inquiry from is-is to can-can PIRCs.

6 For Dutch, only forty sentence pairs were tested. All other experiments reported here had 100 test pairs for each subtype of PIRC.

7 It has been maintained (Ambridge et al. 2006) that children before five years do not have a fully productive rule for auxiliary inversion even in single-clause questions.

One last reason for not rejecting n-gram models out of hand for auxiliary inversion is that it is not at all an uncommon occurrence in current research to find that, as computational techniques have become ever more refined and powerful, they can achieve results which would once have been deemed impossible (Pereira 2000). Thus, given our goal of establishing an unchallengeable lower bound on learning mechanisms that could acquire a natural language, it was important to assess whether or not the failures we had documented stemmed from the inherent nature of the n-gram approach. Thus we entered the next phase of our project. We conducted additional experiments in which we provided the n-gram model with better opportunities to succeed if it could.

3.3.2 Increasing the resources

In Experiments 7-12, keeping the basic mechanism constant, we provided it with enriched training corpora:

• a longitudinal corpus of speech to a child (Adam) up to age 5;2;
• a corpus approximately ten times larger than the original, of adult speech to older children, up to age eight years, containing more sophisticated syntax;
• a corpus into which we inserted PIRC examples (fifty object gap; fifty do-support), providing direct positive information for the model to learn from if it were capable of doing so;
• the original corpus but with sentences coded into part-of-speech tags, as a bridge between specific words and syntactic structure.

In Experiments 13-15, we moved from the bigram model to a trigram model, gathering statistical data on three-word combinations, thus expanding the model's window on the word string. The trigram model was trained on the original corpus and the larger corpus with and without part-of-speech tags. (See Kam 2009: ch. 3 for detailed results.)8 In all these studies we used the object gap and do-support PIRCs as test cases for whether an n-gram model could go beyond reliance on an 'accidentally' supportive surface word sequence such as <who\that is> in the subject gap examples.

These resource enhancements did improve the n-gram models' success rate to some extent, but performance on object gap and do-support PIRCs was still lackluster. Performance did not rise over 70 percent correct, except in one case (out of twenty-one results) which could be attributed to the presence of a 'marker' trigram.9 Moreover, the n-gram models never did well across all PIRC varieties under the same conditions: sometimes performance on object gap PIRCs improved but do-support PIRCs did less well, and vice versa. Even the is-is subject gap type was less successful in many cases than in the original experiment. (See Kam 2009: ch. 3 for detailed results.)

Thus this series of experiments provided little support for the view that n-gram models are on basically the right track and need only a little more assistance from the environment to begin performing at a consistently high level. Two conclusions seem to be warranted. One is that either there wasn't rich information in the corpus or the n-gram models were too weak to extract it. Either way, the experimental findings offer no demonstration of 'the richness of the stimulus', which is the conclusion that R&C drew from their results: 'the general assumptions of the poverty of stimulus argument may need to be reappraised in the light of the statistical richness of language input to children' (R&C 2005: 1024). The second conclusion is that the n-gram models were unable to extend a pattern learned for one subvariety of PIRC onto other instantiations of the same linguistic phenomenon. The object gap and do-support forms were not mastered on their own terms, based on their own particular distributional properties; but equally clearly, the n-gram models did not form a general rule of auxiliary inversion which could be projected from the subject gap type to other varieties.

All of this points to a deep inability of a localistic word-oriented learning model to detect or deploy the true linguistic generalization at the heart of auxiliary inversion phenomena. Therefore a more radical shift seems called for: a qualitative rather than a merely quantitative augmentation of the learning model or its resources. Very different ideas are possible concerning what more is needed. Linguists may regard UG as the essential addition; computer scientists might call instead for stronger statistics, perhaps as embodied in neural networks;10 psychologists might argue that negative data (direct or indirect) plays an essential role in child syntax acquisition. These possibilities are worth pursuing. But we chose, in our most recent set of experiments, to examine the role of phrase structure as a basis for the acquisition of transformational operations such as auxiliary inversion.

This third phase of our project thus moves toward a more positive investigation of the computational resources needed for the acquisition of natural language syntax: How could the previous learning failures be rescued? Here we address the specific question: In acquiring the auxiliary inversion construction, could an n-gram model benefit from access to phrase structure information? Chomsky's observation concerning the structure dependence of auxiliary inversion suggests that it might. In non-transformational analyses, as in Head-driven Phrase Structure Grammar (HPSG; Sag et al. 2003), there is also crucial reference to phrase structure. Linguists disagree about many things, but on this point they are in full accord: there is no viable linguistic analysis that characterizes the auxiliary inversion construction in terms of unstructured word sequences.

8 The chapter by Berwick, Chomsky, and Piattelli-Palmarini in this volume, which includes a critique of R&C's approach to auxiliary inversion, presents data for the trigram model trained on an additional corpus: one created by Kam et al. (2008) in which the relative pronouns who and that were distinguished from interrogative who and demonstrative and complementizer that.

9 The trigram was <n v:aux&3S part-PROG> (e.g., sister is pushing). It appeared only in grammatical versions, and in most of them due to materials construction: object gap RCs needed transitive verbs rather than other predicate types such as adjectives. Apart from this, the only other success occurred when we ran the bigram model on the Wall Street Journal corpus (Marcus et al. 1999), which is presumably of little relevance to child language acquisition.

10 Neural network models are at the opposite end of the scale from n-gram models in respect of computing power. Simple Recurrent Networks (SRNs) have been applied to the PIRC construction in work by Lewis and Elman (2001) and R&C (2005) and have performed well. But so far they have been tested only on the is-is subject gap variety which even the bigram model mastered, so the results are uninformative. More telling will be how they perform with other PIRC varieties on which the bigram model failed. (See also Berwick, Chomsky, and Piattelli-Palmarini, this volume.)

3.4 Providing Phrase Structure Information

The aim of our phrase structure (PS) experiments was to integrate hierarchical structural representations into otherwise simple statistical learning models like those above, which rely solely on transitional probabilities between adjacent items. This project raises novel questions. How would such a learning system obtain PS information? How could it represent or use it?

On these matters we can only speculate at present. We suppose it might be possible to implement a sequence of n-gram analyses, at increasingly abstract levels, each feeding into the next: from words to lexical categories (parts of speech) to phrases and then larger phrases and ultimately clauses and sentences. The phrase structure information thus acquired would then enter into the PIRC discrimination task to assist in selecting the grammatical sentence. We emphasize that this is an experiment in imagination only at present. There do exist algorithms that compute phrase structure from word sequences,11 but it remains to be established whether they can do so without exceeding the computational resources plausibly attributable to a two-year-old child (however approximate any such estimate must be). Multi-level tracking of transitional probabilities has been proposed as a means for human syntax acquisition. Some of the data are from adult learning experiments (Takahashi and Lidz 2008). But Gómez and Gerken (1999: 132) speculated for children: 'A statistical learning mechanism that processes transitional probabilities among linguistic cues may also play a role in segmenting linguistic units larger than words (e.g. clauses and phrases)'. Of interest in this context are the findings of an infant acquisition study by Saffran and Wilson (2003), which suggest that one-year-olds can perform a multilevel analysis, simultaneously identifying word boundaries and learning the word order rules of a finite-state grammar.

The approach we are now envisaging is sketched in (4):

(4) Multilevel n-gram analysis -> phrase structure -> PIRC discrimination

We decided to tackle the second step first, temporarily imagining successful accomplishment of the first one via some sort of cascade of transitional probability analyses at higher and higher levels of structure. We thus made a gift of PS information to the bigram learning model, and then tested it again on the auxiliary inversion forced-choice discrimination to see whether it would now succeed more broadly. Whether it would do so was not a foregone conclusion. But if phrase structure knowledge did prove to be the key, that would represent a welcome convergence between theoretical and computational linguistics.

11 We cannot review this literature. Some points of interest include Brill (1993); Ramshaw and Marcus (1995); Bod (2009); Wang and Mintz (2010).

3.4.1 Method

To run these experiments we had to devise ways by which PS information could be injected into the learning situation. We did so by assuming that the PS building process produced as output a labeled bracketing of the word string. Thus we added labeled phrase brackets into all word strings in the training corpus and test sentences.12

We inserted only NP brackets in the present experiments, for two reasons. We were concerned that a full bracketing would overwhelm the system. Within the constraint of a limited bracketing, the fact that the word sequence following the initial auxiliary is an NP seemed likely to be of most benefit to the learner (see discussion below). In future work we can explore the consequences of supplying a full phrase structure bracketing.

NP brackets were manually inserted surrounding all noun phrases in the original corpus and in the test sentences used in our earlier experiments (subject gap, object gap, and do-support PIRCs). Left and right brackets were distinguished; see example (5).

(5) Let NP[ the boy ]NP talk on NP[ the phone ]NP.

For purposes of the bigram analysis, each bracket was treated on a par with words in the string. Thus a bigram now consisted of two adjacent items which might be words and/or labeled brackets. For example, one bigram in (5) is <the boy> and another is <boy ]NP>. Bigram and sentence probabilities (and cross-entropies) were then computed as before, and employed in the forced-choice discrimination task to select one sentence version as the grammatical one.
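One way to realize 'brackets on a par with words' is to tokenize the annotated strings so that each labeled bracket becomes a token of its own. A sketch: the regular expression assumes only the chapter's NP[, ]NP, *NP[, and ]*NP labels, and it simply drops sentence-final punctuation.

```python
import re

# Alternation order matters: try bracket tokens before plain words,
# so that "NP[" is not split into the word "NP" plus a stray "[".
TOKEN = re.compile(r"\*?NP\[|\]\*?NP|[\w']+")

def tokenize(annotated):
    """Split a bracket-annotated string into word and bracket tokens."""
    return TOKEN.findall(annotated)

def bigrams(tokens):
    """Adjacent token pairs, where a token may be a word or a bracket."""
    return list(zip(tokens, tokens[1:]))
```

On example (5)'s style of annotation, <boy ]NP> then emerges as an ordinary bigram alongside <the boy>, exactly as the text describes.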

Two experiments were conducted. They differed with respect to the labels on the brackets in the test sentences. In PS-experiment 1 the labeled bracketing was as illustrated in (6). It does not distinguish well-formed NPs such as the boy who is crying in (6a) from ungrammatical NPs such as the boy who crying in (6b).

(6) a. Gramm: Is NP[ NP[ the little boy ]NP NP[ who ]NP is crying ]NP hurt?
b. Ungramm: Is NP[ NP[ the little boy ]NP NP[ who ]NP crying ]NP is hurt?

This labeling would allow us to see whether the model could identify the grammatical version based solely on the locus of a sequence of an NP followed by a non-finite predicate, which is acceptable in the main clause of (6a) but not in the RC in (6b).

12 In other experiments we substituted the symbol NP for word sequences constituting noun phrases. (See Kam 2009: ch. 4 for details.)

Page 71: Rich Languages From Poor Inputs

56 KamandFodor

In PS-experiment 2 we used the label *NP on the brackets around the ill-formed complex NP in the ungrammatical sentence version, as in (7b).

(7) a. Gramm: Is NP[ NP[ the little boy ]NP NP[ who ]NP is crying ]NP hurt?
b. Ungramm: Is *NP[ NP[ the little boy ]NP NP[ who ]NP crying ]*NP is hurt?

This avoids giving the learning model misleading information about the grammatical status of the word sequence the little boy who crying; it is not in an equivalence class with strings like the little boy or Jim. Note, though, that employing this labeling presupposes that in the prior PS-assignment stage, the learning model would have been able to recognize the deviance of who crying and percolate that up from the RC to the NP. We return to this point in discussion below. In any case, explicit indication that a word sequence such as the little boy who crying is not a well-formed constituent could be expected to provide the strongest support for rejection of ungrammatical PIRCs in the discrimination task.

3.4.1.1 PS-experiment 1: Results and discussion The percentages of correct choices for the object gap and do-support PIRCs were essentially unchanged compared with the original experiment without brackets; see Table 3.4. For the subject gap PIRCs, on which the model had previously succeeded without bracketing, there was a highly significant drop in performance.

This may appear paradoxical: provided with richer relevant information, the model performed less well. A positive outcome might have been anticipated due to the coding of the whole complex subject as an NP. Yet the data suggest that this hindered rather than helped. To understand this, let us consider the is-is subject gap examples in (6), with distinguishing bigrams as in (8).

(8) a. <is crying>  <]NP hurt>

b. <]NP crying>  <is hurt>

The unlikely bigrams <crying hurt> and <who crying> in (1a) and (1b) respectively (section 3.2.1 above) have now been transformed by the bracketing into better-supported bigrams: <]NP hurt> in (8a) and <]NP crying> in (8b). These might well occur in the corpus, instantiated in sentences like Are NP[ you ]NP hurt? and Is NP[ Baby ]NP crying? (also in small-clause constructions such as I like NP[ my porridge ]NP hot). But since these bigrams with NP brackets benefit both sentence versions, they provide no net gain for the grammatical one. For the object gap and do-support PIRCs, comparable considerations apply, but we will not track through the details here.

TABLE 3.4. Bigram model performance in PS-experiment 1

Word string with NP-labeled brackets   % correct   % incorrect   % undecided

Is-is subject gap PIRCs                31          62            7
Is-is object gap PIRCs                 37          43            20
Do-support PIRCs                       45          55            0

Outcomes thus remain much as for the original unbracketed corpus—with the one exception of the is-is subject gap PIRCs, which have plummeted from 87 percent to 31 percent correct. The reason is clear: the bracketing has broken up the previously influential <who\that is> bigram into <who\that ]NP> and <]NP is>. The former is in both test sentence versions, and so is the latter although at different sentence positions, so they are not distinguishing bigrams and cannot affect the outcome. The original striking success without brackets is thus reduced to the general rough-and-tumble of which particular item sequences happen to be better represented in the corpus.

Thus there is no indication here that NP brackets can solve the discrimination problem for the bigram learner. Although the NP brackets carry relevant information, a bigram model is unable to make good use of that information because it has too local a view of the sentence patterns.13 Its problem is the same as before: there is a local oddity in both the grammatical and the ungrammatical word string, consisting of a non-finite predicate not immediately preceded by the sort of auxiliary that selects for it. The NP-bracketing adds only that what does precede the non-finite predicate is an NP. From a linguistic perspective, however, the relevant difference is that in the ungrammatical version what precedes the main predicate is a defective NP, while in the grammatical version it is a well-formed NP. These are distinguished in the next experiment.

3.4.1.2 PS-experiment 2: Results and discussion In PS-experiment 2 we supplied the model with the information it evidently could not compute for itself in the previous experiment: that an NP followed by a non-finite predicate is damaging to the sentence as a whole if it occurs in an RC inside an NP, but not if it is in the main clause. NPs containing an ill-formed RC were labeled with the *NP notation. The results in Table 3.5 show that there were now virtually no errors. The model overwhelmingly favored the grammatical sentence versions.

TABLE 3.5. Bigram model performance in PS-experiment 2

Word string with NP and *NP-labeled brackets   % correct   % incorrect   % undecided

Is-is subject gap PIRCs                        100         0             0
Is-is object gap PIRCs                         100         0             0
Do-support PIRCs                               99          1             0

What caused rejection of the ungrammatical sentences in this experiment was not the * symbol itself (which has no meaning for the learning model), but the fact that, unlike all other unigrams in the test sentences, including NP[ and ]NP, the unigrams *NP[ and ]*NP are not present in the corpus. (No utterances in the Bernstein-Ratner corpus were found to contain ungrammatical NPs.)14 Standard treatment in cases where a unigram is unknown in the corpus is to assign it an estimated probability; we did so using the Witten-Bell discounting technique (Witten and Bell 1991). However, the estimated probability is low relative to that of actually occurring unigrams, so its presence in the ungrammatical sentence can drag down the sentence probability, leading to preference for the grammatical version.

13 With trigrams, which have a wider compass than bigrams, results improved but were still unsatisfactory: 58% correct for subject gap; 52% for object gap; 47% for do-support. (See Kam 2009: ch. 4 for details.)

14 We re-ran the experiment after inserting sixty ungrammatical NPs into the corpus, so that the unigrams *NP[ and ]*NP had a positive probability without invoking the Witten-Bell formula. This made little difference: all three PIRC varieties showed 100% correct.
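For concreteness, here is a common textbook form of Witten-Bell discounting for the unigram case (a sketch only; the text does not guarantee this is the exact variant used in the experiments): each seen type w receives c(w)/(N+T), and the reserved mass T/(N+T) is shared evenly among the Z unseen types.

```python
def witten_bell_unigram(counts, vocab):
    """Witten-Bell discounting, unigram case.
    counts: observed type -> frequency; vocab: all types, seen and unseen.
    N = total tokens, T = number of seen types, Z = number of unseen types."""
    N = sum(counts.values())
    T = len(counts)
    Z = len(vocab) - T

    def prob(w):
        if counts.get(w, 0) > 0:
            return counts[w] / (N + T)          # discounted seen probability
        return T / (Z * (N + T))                # equal share of the escape mass
    return prob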

Together, these two experiments show that an n-gram-based learner could discriminate grammatical from ungrammatical PIRCs only if it could distinguish NPs from *NPs. Earlier, we postponed the question of whether and how it could do so. Now we must consider that.

3.4.2 How to recognize *NP?

Presumably, the recognition that 'the boy who crying' in (1b) is an ungrammatical noun phrase would have to occur during the process of assigning phrase structure to the sentence, based on recognition of 'who crying' as an ungrammatical RC, missing an auxiliary. However, in the grammatical version (1a) there is also a missing auxiliary in the bigram <NP hurt>. The absence of the needed auxiliary has a very different impact in the two cases: in (1b) it contaminates every larger phrase that contains it, while in (1a) it is amnestied by presence of the auxiliary at the start of the sentence. In general: since natural languages allow movement, absence of an obligatory item (a 'gap') in one location can be licensed by its presence elsewhere in the sentence. But there are constraints on where it can be. RCs are 'extraction islands', i.e., a gap inside an RC cannot be rescued by an item outside it (cf. the Complex NP Constraint of Ross 1967). By contrast, the main clause predicate is not an extraction island, so the lack of a needed auxiliary there can be rescued by association with the 'extra' auxiliary at the beginning of the sentence.

The notion of extraction islands has been refined and generalized as syntactic theory has progressed. In current theory, the contrast between legitimate and illegitimate movement is most often portrayed not in terms of specific constructions such as main clauses versus RCs but in terms of structural locality: local movement dependencies are favored over more distant ones by very general principles of economy governing syntactic computations. Deeper discussion of these matters within the framework of the Minimalist theory can be found in the chapters by Berwick, Chomsky, and Piattelli-Palmarini and Chomsky in the present volume; see also the chapter by Belletti and Rizzi, which shows locality/economy principles at work in child language.

By contrast with the transformational approach, recent discussions by Ambridge et al. (2008), Clark and Eyraud (2006), and Sag et al. (2003) suggest that as long as phrase structure is in place, the correct choice between grammatical and ungrammatical PIRCs follows even more naturally in a non-transformational theoretical framework, and hence might be even more readily accessible to a modest learning algorithm. In particular, a ternary structure for auxiliary inversion constructions, as in (9), is very simple, and would be frequently attested in the input in sentences such as Is Jim hurt?.

(9) Aux NP Predicate

Once acquired, this analysis would automatically extend from Is Jim hurt? to Is the little boy who is crying hurt?. Without a transformational operation that moves the auxiliary from one site to another, there would be no question of moving it from the wrong location. Ungrammatical PIRC examples like (1b) would be simply ungeneratable. It might even be argued, contrary to stimulus poverty reasoning, that it is actually beneficial for learners that they would hear many simple questions like Is Jim hurt? before ever encountering a PIRC.

However, the grammar must not allow a sequence of Aux, NP, and a non-finite predicate to be freely generated. There is a selectional dependency which must be captured between the sentence-initial aux and the non-adjacent main clause predicate, as Sag et al. note. The predicate must be of a type that is selected for by the auxiliary; see (10).

(10)  Is Jim running?     *Is Jim run?
      Jim is running.     *Jim is run.

      *Can Jim running?   Can Jim run?
      *Jim can running.   Jim can run.

In a transformational framework this selectional dependency across the subject NP is captured by the assumption that the auxiliary originates adjacent to the predicate. In HPSG, without movement operations, a lexical rule manipulates the argument structure of the auxiliary. In declaratives its first argument (the subject) is realized preceding the auxiliary while its other argument (the non-finite predicate) follows the auxiliary. The lexical rule modifies this pattern so that in interrogatives both of the auxiliary's arguments follow it.

A lexical rule is inherently local since it manipulates the argument structure of one lexical head. Therefore an error such as (1b), spanning two clauses, can never arise.
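The ternary analysis in (9), with the selectional dependency in (10), can be illustrated with a toy checker (my own sketch, not from any of the cited authors; the lexicon and function names are hypothetical): the sentence-initial auxiliary selects the form of the main-clause predicate, while the NP between them is treated as an opaque unit, so the error in (1b) cannot even be stated.

```python
# Toy sketch of the ternary Aux NP Predicate analysis with aux-predicate
# selection. Hypothetical mini-lexicon: each auxiliary selects the
# morphological form(s) its non-adjacent predicate may take.
SELECTS = {
    "is": {"prog", "adj"},   # is ... running / hurt
    "can": {"base"},         # can ... run
}

PRED_FORM = {
    "running": "prog", "crying": "prog",
    "hurt": "adj",
    "run": "base",
}

def well_formed_question(aux, np_words, pred):
    """License Aux + NP + Pred only if aux selects the predicate's form.

    The NP is opaque: material inside it (e.g. an auxiliary inside a
    relative clause) plays no role in the check, so an error like (1b),
    which spans two clauses, is simply unstatable."""
    return aux in SELECTS and PRED_FORM.get(pred) in SELECTS[aux]

# Is Jim running?  -> well formed
assert well_formed_question("is", ["Jim"], "running")
# *Is Jim run?    -> blocked by selection
assert not well_formed_question("is", ["Jim"], "run")
# Is the little boy who is crying hurt?  -> the complex NP is opaque
assert well_formed_question("is", "the little boy who is crying".split(), "hurt")
```

Because the subject NP is never inspected, the checker extends automatically from Is Jim hurt? to the PIRC case, just as the text describes.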


Note, however, that this nice solution to the auxiliary inversion learnability problem only holds if it is necessary for auxiliary inversion to be captured by a lexical rule. If not, there is still a risk of a learning mis-step, even in the HPSG framework. Long-distance phenomena such as wh-'movement' or topicalization cannot be handled by lexical rules. HPSG treats them by means of a different formal device: GAP features are passed through the tree, from one node to another, between the 'gap' position and the surface position of the item. While there are some constraints on the inheritance of GAP features, there is no bound on how far a GAP feature can be passed.

Therefore, an HPSG-based learner that encountered questions in the input, even simple questions like Is Jim running?, would have to choose between formulating a lexical rule, which is local, or establishing GAP feature passing for auxiliaries. If preference for a lexical rule were innate, then indeed a learner's grammar could not license displacement of the 'wrong' auxiliary as in (1b). But if a learner could opt for a GAP feature analysis of simple questions, then errors like (1b) could ensue on PIRCs. To prevent this, an innate constraint would be needed on GAP feature passing, comparable to the locality constraint needed in a transformational system: despite formal differences, both theories must make the RC an extraction island. (For discussion of complex NP islands in HPSG, see Pollard and Sag 1994: ch. 5.)

3.5 Conclusions

This study of the prospects for n-gram-based learning of natural language syntax leads to the following conclusions:

(I) Low-level statistics over word strings might contribute to syntax learning but cannot substitute for syntactic knowledge.

(II) Specifically: such statistics cannot capture the generalization about auxiliary inversion.

(III) Theoretical differences aside, the only route to the correct generalization requires a bias toward local syntactic dependencies, defined over a phrase structure analysis of the sentence.

(IV) Hence, a learner that makes use of word-level statistics as the basis for auxiliary inversion must, at a minimum, also have an innate propensity to project phrase structure onto word strings—just as Noam Chomsky observed four decades ago.


Poverty of the Stimulus: Willingness to be Puzzled

NOAM CHOMSKY

An accompanying chapter in this volume (Berwick et al., 'Poverty of the Stimulus Stands', henceforth POTSS) discusses some of the problems that arose in the 1950s when the study of language began to try to address more directly the core issue of natural language: how the internal system that each normal person has mastered (I-language) determines the infinite array of structured expressions that yield interpretations at the two interfaces, the sensory-motor system SM and the system of thought CI (conceptual-intentional). At once numerous puzzles arose, in ways rather reminiscent of the earliest days of the modern scientific revolution, when scientists chose no longer to be satisfied with the conventional explanation for why stones fall to the ground and steam rises to the sky: i.e., that they are seeking their natural place. Willingness to be puzzled by what seem to be obvious truths is the first step towards gaining understanding of how the world works.

As soon as this stance was adopted, it was quickly discovered, in a sharp departure from the prevailing mood of the time, that very little was understood and that conventional accounts were grounded in concepts far too obscure and indeterminate to bear the burden of explanation: for example, that the capacity to produce and comprehend novel utterances, perhaps the most elementary feature of normal language use, is simply a matter of induction and analogy, concepts left unanalyzed, and known for centuries to be highly problematic.

In particular, it quickly became apparent that little was understood about the huge gap between data available and state attained, a feature of all growth and development. In the study of language the problem was given a special name: Poverty of Stimulus (POS). POTSS discusses one of the simplest of the innumerable POS problems that came to light as soon as puzzlement was entertained, and many of the efforts to deal with it: the problem sometimes called Aux-fronting. To repeat, in (1),

(1) can eagles that fly swim


we see that the auxiliary can is associated with swim, not fly, as is evident from interpretation and many other properties: for example, morphology ('are eagles that fly swimming', 'have eagles that fly been swimming'). Other examples show that the pre-posed auxiliary is actually a bare inflectional element T (bearing tense and agreement features in a manner depending on morphological properties of the language in question) with whatever verbal element is attached to it, as we see in 'do eagles that fly swim'. Accordingly, any solution that merits consideration will associate T and what is attached to it with two positions, as in the abstract form:

(2) can eagles that fly can swim,

with the rightmost (embedded) occurrence of can unpronounced.

Adopting the conventional (and well-motivated) assumption that a clause is introduced by a complementizer C that indicates at least force (declarative, interrogative, etc.), we can say that in (1), the structurally less prominent (and unpronounced) occurrence of can is in the T(ense)-position, and the most prominent (pronounced) occurrence is in the C-position.

The C-T relation is particularly close in other ways as well. The most obvious is the requirement of strict adjacency, unlike, say, the relation of C to wh-phrases, as in (3), with the same convention for pronunciation.

(3) which book has the teacher has told the students that they should read which book

The interpretation is roughly 'for which book x, the teacher has told the students that they should read the book x'.

And there are other properties as well, among them shared features.

As discussed in POTSS, the simplest account, which satisfies the overriding principle of Minimal Computation (MC), takes the two occurrences of can in (2) to be copies formed by Internal Merge (IM), a special case of the simplest combinatorial operation Merge, followed by externalization processes that conform to Minimal Computation (MC), hence deleting all copies apart from one that is needed to indicate that the operation took place, the hierarchically most prominent one, yielding (1). Quite generally, such processes yield the structures required for straightforward semantic interpretation, including quite subtle cases, but at the sensory-motor SM output they yield structures that pose problems for processing (hence communication as a special case). This is one of many illustrations of conflict between computational and communicational efficiency. In cases that are at all understood, computational efficiency wins out, dramatically in fact, with implications for the general architectural design of language as well as its evolution, topics discussed briefly in POTSS and in more detail elsewhere (Berwick and Chomsky 2011).

That leaves the question why structural distance between C and T is selected as the operative mechanism, not linear distance between the two, which is far easier to deal with computationally (in parsing, for example). Furthermore, why do we find the same property—informally called 'structure-dependence'—universally in relevant structures, in a wide range of constructions and cross-linguistically? The principle of minimal computation (MC) is presumably a 'third factor'1 property, not specific to language or probably even to biological organisms, so it can be presupposed here. We would have a principled solution to the POS problem if it could be shown that linear distance is not available to the computational system that generates structures and assigns them interpretations at the interface. One possibility, mentioned in POTSS, is that linear order is a reflex of the SM system, where it is required for externalization (with conditions varying depending on the mode of externalization—speech vs sign for example, and with further complexities not relevant here). Hence it would not be accessible to the operations that generate structures and map them to the conceptual-intentional CI systems (where, it seems, they play no role for core semantic properties). Here numerous interesting questions arise, but it seems a plausible direction to pursue.

As discussed in POTSS, this very simple POS problem has a curious history. There have been extensive efforts to account for the facts on purely computational grounds. To the extent that these efforts even address relevant problems, they are complete failures, irremediable it seems. Furthermore, it would not much matter if some such procedure were to work for a particular language, or even for Aux-fronting generally. Similar procedures would (generally) work just as well for a non-human language in which Aux-fronting was based on linear rather than structural distance, and might even be simpler, since the concept is so much simpler in general computational terms. The significant question is why linear distance procedures are never chosen. The question is ignored or begged in this kind of work, without exception to my knowledge. It appears that we are left with only one proposal, the earliest one: linguistic operations are based on structural not linear distance, the principle of 'structure dependence'. (See POTSS, and also the chapter by Kam and Fodor in this volume for an interesting reanalysis and new confirming data.)

In the final section of POTSS a new POS problem is raised, one that has gone undetected, another case of insufficient puzzlement. Consider a simple subject-predicate construction such as (4), more generally (4'), with pred the phrase merged with T(ense) (fly, fly planes, feel angry, etc.). The corresponding interrogative is (5), not (6) (with the same convention for deletion):

(4) [young eagles] [T fly]

(4') [young eagles] [T pred]

1 The three factors, as discussed in POTSS, are: (1) Linguistic experience, that is, primary linguistic data (PLD); (2) Genetic endowment, in particular, Universal Grammar (UG), the linguistically specific initial state; (3) General laws of biology or beyond, specifically principles of computational efficiency. See N. Chomsky (2005), 'Three factors in language design', Linguistic Inquiry 36(1): 1-22.


(5) do [young eagles] [T pred]

(6) eagles [young eagles] [T pred]

The facts are entirely obvious, just as obvious as the fact that stones and steam move to their natural places, down and up respectively. But why?

The answer provided by Phrase Structure Grammar and its descendants (henceforth PSG+) is on a par with the Aristotelian explanation for motion down or up. The answer is stipulated by taking (4) to be a TP, with T the most prominent (projected) element, and taking the NP subject to be the specifier of TP, subordinate to T. But in the simplest system conforming to MC, (4) is of the form [Subject-Predicate] ({XP, YP}), with the nominal head of the subject and the T head of the predicate equally prominent. So we are left without any argument for choosing (5) over (6) in terms of minimal structural distance; and the other C-T properties suffer the same fate.

This puzzle would be resolved very simply if the Subject is not actually present at the stage of computation at which relations between C and T are established, including Aux-fronting. The structure at this point would therefore be (7):

(7) C [T pred]

The Subject is introduced later. It cannot be introduced by External Merge EM, which would violate MC (which entails, in this case, the No Tampering Condition NTC).2

The Subject can, however, be introduced by Internal Merge IM, and in fact that has been generally assumed on different grounds for some years (the Predicate-Internal Subject Hypothesis, PISH).3 The considerations adduced here seem to me to provide a much sounder argument for the conclusion. One consequence is that the duality of semantics, which appears to be an important semantic principle that probably derives from the Conceptual-Intentional (CI) interface, is captured in a narrow form by the EM-IM distinction, the former determining theta roles and the latter discourse-oriented and scopal properties (not an exhaustive classification, but an important first approximation).

Suppose that the construction in question is transitive, e.g., 'young eagles build nests'. The structure formed by EM is (8):

(8) C [T pred],

where pred = [[young eagles] [build nests]], hence

(9) C {T, {{young eagles}, {build nests}}}

2 The No Tampering Condition does not permit any further manipulation of the structure X containing Y and therefore forces Merge (be it external or internal) always to target the root. See N. Chomsky (2000) and POTSS, this volume, especially footnote 13.

3 Note that the IM operation appears to be counter-cyclic, but is not in terms of phase theory, and is motivated on other grounds. See Chomsky (2007b), and for another approach, see Epstein, Kitahara, and Seely (forthcoming).


Here {young eagles} is conventionally called the external argument of the verb 'build', and {nests} its internal argument. And semantic roles are assigned accordingly, within the syntactic object pred, considered to be a verbal phrase, though we have as yet no warrant for that reasonable decision.

The syntactic object pred is of the form {XP, YP}, and the question again arises as to what category it belongs to: X, Y, or neither? In PSG+, it is required that a category be assigned, and the choice is stipulated to be Y = v, the verbal head of the predicate, clearly the desired outcome. But that again is arbitrary, in fact doubly so: the insistence that there must be a label, and that the label is verbal, not nominal. To overcome these problems, we have to think a little more carefully about how and why categories are assigned to syntactic objects formed by Merge.

In early generative grammar, a basic distinction was made between contiguous and non-contiguous relations: e.g., the Verb-Object relation of 'build nests' and the displacement relations of (3) (among others). The former were to be captured by Phrase Structure Grammar and its descendants (PSG+), the latter by transformational rules. Subsequent work, eliminating the massive complexity and stipulations of PSG+, reduced the contiguous operations to External Merge EM (bare phrase structure). Later it was recognized that non-contiguous operations too fall under the simpler and more general operation Merge, thus unifying the two basic sets of properties. Keeping just to the special case of contiguous relations, PSG+ amalgamated three distinct properties: compositionality, linearity, and projection. Merge extracts compositionality from the amalgam, leaving linearity and projection. Linearity, I have suggested, may be a reflex of the Sensory-Motor SM interface, hence part of the ancillary system of externalization. That leaves projection, or labeling as it has come to be called in more recent work. The same three properties hold for Internal Merge IM.

The problem is not to find a way to assign labels; that can be done in various ways, a number of them discussed in published and forthcoming papers. Rather, the problem is to determine the consequences of assigning labels in the most principled way, observing Minimal Computation MC. To follow this course, let us ask why we have labels in the first place.

The motivation for labeling derives from the computational system itself. To carry out further computations on a syntactic object SO, it is sometimes necessary to know what category it belongs to: is it a nominal phrase or a verbal phrase, for example? If SO does not enter into further computation, it requires no label. Furthermore, labeling should conform to MC. Therefore, like the relation Agree, it should reduce to minimal search. The computational system itself should contain a labeling algorithm that searches an SO to discover what kind of an object it is, keeping to minimal search. Optimally, all of the information relevant to further computation should be contained in a designated minimal element, a head drawn from the lexicon. In a structure SO = {H, XP}, where H is a head and XP its complement, minimal search will assign the structure the category H. But in SO = {XP, YP}, where neither is a head, there may be (and typically is) an ambiguity, so there is no category assignment. That is a problem if and only if SO enters into further computation.

One particular case has been discussed by Andrea Moro (2000, 2009): copular constructions that he takes to be based on a small clause (SC), as in (10):

(10) {copula {XP, YP}}

As Moro shows, one of XP, YP must raise: his principle of 'dynamic antisymmetry'. The principle reduces to the labeling algorithm (see Chomsky 2008). It cannot apply to Small Clause SC = {XP, YP}, but SC must receive an interpretation at CI, so must be labeled. Suppose one of the two members of SC, say XP, raises by Internal Merge IM. Then the labeling algorithm 'sees' YP, but not XP, which is the lower part (tail) of a discontinuous element, a chain consisting of a series of copies headed by the structurally most prominent element. Since the chain is invisible to the algorithm, SC is labeled Y. Similarly if YP raises, and it is labeled X. The phenomenon is a familiar one. It is well known that the tail of a chain does not intervene to block Agree, but the most prominent element—equivalently, the whole chain—does induce an intervention effect. The small clause case therefore reduces to this more general phenomenon. The example illustrates that the PSG+ stipulation that all categories must be labeled is too strong: they should be labeled only when they enter into further computation, the only motivation for labeling.

There are other structures that illustrate the same conclusion (see section 2.3 in Berwick, Chomsky, and Piattelli-Palmarini's chapter in this volume for 'an optimal general framework').

With this in mind, let us return to the predicate-internal subject, now in the form (8)-(9). In these structures, pred (= {{young eagles}, {build nests}}) is an unlabelable XP-YP structure, but as in the case of the small clause, it must be labeled, for assignment of theta roles. Suppose that XP, the External Argument, raises by Internal Merge IM. Then exactly as in the case of the small clause, only YP is visible to the labeling algorithm, and the structure is labeled Y, that is verbal, the desired outcome. This yields core elements of the Extended Projection Principle EPP, which stipulates that something must fill the surface subject position.

For reasons that go beyond the discussion here, YP cannot raise. But its NP object can, in which case the residue will be of the form (11):

(11) {XP, {V, Z}}

Here Z is the lower copy of the raised NP, hence invisible to the algorithm, which will only detect (12):

(12) {XP, {V}}

In (12), the structurally most prominent member is V (which amounts to taking a singleton set to be identical to its member). Hence the label is V, with XP its complement; again a verb phrase, the desired outcome.


The minimal labeling conditions suggest that there should be a principle that in a structure of the form {{external argument}, {V, {internal argument}}} either the external or the internal argument must raise. Though the matter is complex, there is evidence to that effect (see Alexiadou and Anagnostopoulou 2007).

It should be evident that this is only the tip of an iceberg. Once we agree to be puzzled by the elementary POS problem of Aux-raising, and seek to provide a principled answer to it and related problems, many consequences follow, here only exemplified with regard to this simple case. Many new problems arise, exactly as we should anticipate (and hope) in a research program that seeks to answer fundamental questions, not to be satisfied with descriptive observations, valuable as these of course are for formulating clearly the questions that require answers. That is true for the entire rich and complex range of problems posed by the fact that rich outputs arise from poor inputs, the core problem of growth and development generally, for language in particular.


Revisiting Modularity: Using Language as a Window to the Mind

SUSAN CURTISS

5.1 Introduction

This chapter takes a new look at modularity, both so-called 'big modularity' (BMod) and 'little modularity' (LMod). In it, I reconsider the viability of the BMod claims that language, and in particular, grammar, represents a domain-specific mental faculty, one that rests on structural organizing principles and constraints not shared in large part by other mental faculties and in its processing and computation is automatic and mandatory, and the LMod assertion that language, itself, is comprised of distinct submodules, each of which can be seen to develop and function separately. The first part of the chapter concentrates on BMod issues, the second part on LMod. I will at the end summarize what I believe I have shown and draw a final conclusion.

Having written my dissertation on the Genie case, it is no surprise that I have devoted much of my research to following up on ideas this case led me to consider. Her cognitive profile—a severely limited grammar that lacked functional structure, including all I- and C-system functional elements and the syntactic operations, Move and Agree, alongside excellent vocabulary learning ability, good ability to initiate and sustain topics, excellent ability to apprehend complex hierarchical structure outside the realm of grammar, good ability to logically sequence pictures into stories, ability to count, ability to draw in silhouette and capture in drawing juxtapositions of objects and events that she could not communicate verbally, bafflingly powerful non-verbal communicative ability, and superior visual and spatial cognition—compelled me to explore the issue of modularity further and to attempt, throughout my career, to generate empirical data which bore on issues of modularity.

* In deciding what contribution I could make to this volume that honors Carol's work and reflects influences her work had on my own, I determined that her work's greatest contributions to my own were, perhaps, in encouraging me to think 'out of the box', to look where others had not looked to find data that spoke to those issues I have designed my research program to address, and to take a new look at individuals and populations that can teach us something about both language and human nature. Carol was always concerned about the humanity of those she studied and worked with, and we all know how wonderfully well she managed to do this. Her example has inspired me throughout my career to try to do the same.

This chapter brings to bear empirical data from a wide variety of sources that speak to the questions involved. The first part of this chapter focuses on data that speak to BMod, drawing from both normal and atypical data to demonstrate non-trivial dissociations between grammar and other mental faculties. To this end, I discuss relevant data from ERP and imaging studies of the neural basis for language, cases of linguistic isolation, mentally retarded children, normally developing children acquiring ASL as a first language, children and adolescents with Specific Language Impairment (SLI), with Turner's syndrome and with other developmental anomalies, and adults with acquired aphasia and progressive dementia.

The second part of this chapter addresses LMod. I again examine data from studies pertinent to the neural mediation of language and from children and adolescents with SLI, as well as data from adolescents and men with Klinefelter's syndrome, adults with acquired aphasia, and adults with progressive dementia. In both parts some of the data examined come from research done by others; some are data drawn from my own work.1

5.2 Big Modularity

What would it mean for language to be a distinct module of the mind/brain, based on domain-specific organizing principles? What kind of evidence would support such an idea? To begin, language, like all of cognition, lives in the brain. Is there evidence for the tenets of BMod from work examining the neurology of language? It should be noted that even if it were the case that grammar-dedicated brain tissue could not be segregated from brain regions not dedicated to the representation or processing of grammar, there is, I would argue, abundant evidence supporting the existence of language/grammar as a discrete, separable, functional human biological system. However, there are neurological arguments to be made supporting BMod. Space constrains me to mention but a small fraction of this ever-growing body of work that suggests that notions underlying the claims of BMod are correct.

5.2.1 The neurology of language

To begin, there is the columnar organization of neurons in the cytoarchitecture of the human cortex (e.g., Roland and Zilles 1998 and sources therein). Columnar neural cytoarchitecture is not, in and of itself, evidence that the mind is modular, but some experimental research appears to strongly support functional modularity, and it is telling that much of this research examines what might be predicted to be most closely aligned with and inseparable from spoken language representation and processing neurally—namely, acoustic processing.

1 A list of tests used in collecting my data on Genie, Chelsea, TS children, KS individuals, and mentally retarded children and adolescents is presented in the Appendix, which can be found on my website at <http://www.linguistics.ucla.edu/people/curtiss/index.html>. The Appendix also includes example items from many of these tests.

5.2.1.1 Phonological processing separable from (non-linguistic) acoustic processing

A number of studies appear to show that distinct columns of neurons or neural tissue respond to phonological categories and are different from those that respond to acoustic distinctions (e.g., Phillips et al. 2000; Dehaene-Lambertz et al. 2002; Dehaene-Lambertz and Pena 2001). These studies and numerous others have looked specifically at the neural instantiation of acoustic processing and whether it can be shown to underlie phonetic/phonological processing or even be prior to and/or inseparable from it, or whether evidence can be generated that argues for their separability. One technique studies have used to examine this issue is to look at the timing of neural responses to sounds or sound combinations utilizing Event Related Potentials (ERPs), Optical Scanning, or MEG. Using Oddball paradigms2 that generate Mismatch responses ('Mismatch Negativities', MMNs) to signals perceived to be distinct from others in a series, experimenters have manipulated acoustic signals such that while the outlier is distinct acoustically, it may or may not be phonologically.
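The Oddball design can be illustrated schematically (my own sketch, not from the studies cited; the function name and parameters are hypothetical, and real designs randomize deviant placement rather than fixing it):

```python
# Schematic Oddball (SSSSD) stimulus stream: runs of Standards with an
# occasional Deviant, the trial type on which an MMN is expected.
# Deviant position is fixed here for clarity; real designs jitter it.

def oddball_stream(standard, deviant, n_trials=40, every=5):
    """Return (stimulus, is_deviant) trials: a Deviant every `every`-th
    trial, Standards otherwise."""
    return [(deviant, True) if (i + 1) % every == 0 else (standard, False)
            for i in range(n_trials)]

stream = oddball_stream("ta", "da", n_trials=10, every=5)
assert stream[:5] == [("ta", False)] * 4 + [("da", True)]   # S S S S D
assert sum(is_dev for _, is_dev in stream) == 2             # two deviants
```

The experimental logic then hinges on what counts as a Deviant: acoustically distinct but phonologically identical, or crossing a phonemic boundary, as in the studies described next.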

One set of studies of this type has examined the MMN reflex of categorical perception. In these studies the 'Standards' are stimuli that are distinct acoustically but not phonologically; i.e., they are acoustically distinct but within-phonological-category (WC) signals. This series of WC stimuli is followed by an equivalently acoustically distinct signal, but one that crosses phonemic category boundaries (an AC stimulus) (e.g., Dehaene-Lambertz 1997; Dehaene-Lambertz and Gliga 2004). Such studies uniformly find that the AC signals generate a robust MMN, while the WC stimuli do not. The WC stimuli generate a weaker, slower, and spatially different neural response.

Building on this idea, Phillips et al. (2000) devised an especially compelling exper-iment to elicit a neural response that would unequivocally demonstrate a distinctionbetween phonological and acoustic processing. Phillips et al. constructed a set ofCV stimuli to demonstrate an MMN to the abstract phonological category, the fea-ture [voice], realized differently in stop consonants of different places of articulation(POA). Given that acoustically the signals corresponding to [-voice] or [+voice] instop consonants at different POAs are very different and can be treated as the sameonly at the level of phonology, the result that the brain indeed treats these quitedistinct acoustic signals as the same across labial, alveolar, and velar stops is per-suasive evidence that phonological representation and processing is both cognitivelyand neurally distinct from acoustic representations and processing.3 Furthermore,

2 The Oddball paradigm utilizes a design wherein a series of signals is presented which has the designSSSD—where the Ss (the 'Standards') are the same along some dimension, and the Ds ('Deviants') differfrom the Ss along that dimension.

3 The MMN disappeared when the S and D stimuli all fell into either the — or + voice category. Thusthe MMN response could only have been a phonological and not an acoustic response, since all the stimulidiffered acoustically.


comparing brain activation to sublexical units for both sign and spoken language, Petitto et al. (2000) report that specific neural tissue is sensitive to phonological patterning regardless of modality.

Other studies of adults whose findings have the same implications abound. As but a few examples, Zatorre et al. (1992) and Burton et al. (2000) both report that the same CVC sequence (thus the same acoustic stimulus) elicits different lateralized neural responses depending on whether the task is a phonological one (e.g., discriminating onsets, codas, or rimes) or a non-linguistic acoustic task (i.e., discriminating pitch). In addition, Jacquemot et al. (2003), again using the Oddball paradigm, studied speakers of French and Japanese and examined their neurological responses to stimuli that did or did not conform to phonotactically permissible sound sequences. In some sequences, the Ds were sequences that involved long vowels, phonemic in Japanese but not in French; in others the Ds involved a consonant cluster, permissible in French but not Japanese. They found (1) that there was a faster response to a sound sequence that conforms to the phonotactic constraints of one's native language than to acoustically distinct items that do not constitute a possible phonological sequence; and (2) that the phonological task elicited a spatially different neural response from the acoustic one.

We find the same response pattern in the brains of infants (Dehaene-Lambertz 2000; Dehaene-Lambertz and Baillet 1998; Dehaene-Lambertz and Gliga 2004; Dehaene-Lambertz and Peña 2001; Dehaene-Lambertz et al. 2006; Peña et al. 2003). Dehaene-Lambertz et al. (2002), measuring brain activation of awake and sleeping three-month-olds evoked by forward and backward speech, were able to show that the infant cortex is already structured into several functional regions sensitive to forward but not backward speech. This finding suggests that the precursors of adult cortical language areas are already present and active in infants well before the onset of speech production, despite the fact that synaptogenesis and myelination of these areas are not at all yet mature!

The fact that discrimination of phonological categories takes place more quickly and via different neural 'networks' than the processing of acoustic distinctions in both adults and infants (e.g., Dehaene-Lambertz 1997; Dehaene-Lambertz and Peña 2001) suggests that the widely argued assertion that an acoustic mapping is both prior to and more basic than a direct phonological mapping of discriminable signals is incorrect. More pertinently, it illustrates that discriminating among sounds that are not linguistically relevant is cognitively and neurally distinct from discriminating among sounds that are; i.e., phonological and phonetic discrimination is not reducible to acoustics, even for the sleeping neonate (Dehaene-Lambertz and Peña 2001).

5.2.1.2 Event Related Potential (ERP) evidence for BMod An early ERP component, the ELAN (Early Left Anterior Negativity), is associated with automatic syntactic processing and structure building and is present in both adults and children, including children as young as two (Hahne and Friederici 1999; Hahne et al. 1999, 2004; Oberecker et al. 1995; Pulvermüller and Shtyrov 2003; Pulvermüller et al. 2008).

The ELAN is not only a reflection of automatic processing: it is also insensitive to task demands or violation frequency (number of syntactic violations; Pulvermüller et al. 2008; Pulvermüller and Shtyrov 2006; MacGregor et al. forthcoming). Using ERPs to measure MMNs to syntactic violations, Pulvermüller et al. asked subjects to listen to sentences, some of them ungrammatical, some of them grammatical, while performing a demanding acoustic signal detection task.4 Their subjects had to listen for grammaticality and not only point out when they detected a grammatical violation, but correct the error. Pulvermüller et al. found, first, that the syntactic MMN was extremely rapid, occurring at or before 150 msecs following the point at which the relevant information occurred, and, second, that the magnitude of the MMN response was unaffected by attention load. This early time window appears to be very narrow, but is robustly present as an index of automatic syntactic processing, a key characteristic of a task-specific, modular response.

Similar findings have been reported for MMN responses to detecting native segments and syllables and word semantics while subjects concurrently carry out a difficult, attention-demanding task. Using, in the first case, native and non-native speech sounds and phonotactically possible or ungrammatical syllables, and in the second, pseudowords and real words, a number of different labs looking at responses to different languages report that neurophysiological signatures of language-specific phonological responses and language-specific, word-specific memory circuits/cell assemblies are activated in the human brain in a largely automatic and attention-independent fashion (e.g., MacGregor et al. 2012, forthcoming; Pulvermüller et al. 2009).

5.2.2 Language separable from spatial cognition

5.2.2.1 Sign language and spatial cognition It is well known that in the 'modal' brain (that of the right-handed, typically male individual), spatial cognition is mediated by the right hemisphere, while computational linguistic tasks asymmetrically engage the left hemisphere. This pattern in itself provides clear evidence that these two aspects of human cognition are separate at both the cognitive and neural level. The extent to which these two cognitive domains might become interrelated and therefore less dissociable when the language involved is in part a spatially coded system, i.e., a sign language, has also been explored. Such research in normals largely comprises studies comparing sign language processing with spoken language processing and these, with brain areas activated during spatial tasks. For example, McGuire et al. (1997) found that (outside of motor cortex) the same brain regions were activated by

4 Pulvermüller et al. divided their subjects into two groups and used two tasks. One group had to watch a silent video while listening to sentences; the other had to determine if a tone was briefly attenuated—a task with a high attentional load, whose stimuli were in the same modality as the language stimuli.


Deaf signers mentally articulating British Sign Language sentences as those activated by hearing speakers when silently articulating sentences of English, in neither case involving areas of the right hemisphere activated during non-linguistic visual/spatial processing. Emmorey (2002) provides a survey and discussion of much of the relevant research in this area. We return to this topic in 5.2.3 below.

5.2.2.2 Turner's syndrome (TS)5 Behavioral data also speak to this issue. Studies of TS, for example, provide compelling evidence for dissociations between language and non-linguistic spatial cognition. TS is a genetic disorder occurring in females that arises from partial or complete absence of an X chromosome. TS is associated with a peculiar cognitive profile characterized by normal grammatical development and function, enhanced reading ability compared to age-matched normals in childhood, including the ability to read irregularly spelled words and long unfamiliar regular words (Temple and Carney 1996), normal arithmetic abilities, but impaired number reasoning and severely impaired visual and spatial cognition (Bruandet et al. 2004; Money 1963, 1973; Money and Alexander 1966; Pennington et al. 1985; Rovet 1998 and references therein; Silbert et al. 1977; Waber 1979), a profile that persists from early childhood through adulthood (Temple and Shephard forthcoming).

In my own work, I have studied a number of TS children, including a mentally retarded girl, V, with an IQ of 68, tested over the course of several months (from 9;6-10;0)6 (Curtiss and Yamada 1981). One of the most notable aspects of the cognitive profile associated with TS is the discrepancy between the absence of visual and spatial defects in the realms of reading, writing, and performing arithmetic calculations alongside pervasive non-language visual and spatial deficits. To wit, despite V's inability to copy a simple square or circle, to draw representationally (see Figure 5.1), to string colored beads in accordance with a visually present model, to build even a simple bridge with blocks or copy any hierarchical stick structure, a preschool level performance on the block design, object assembly, and picture completion subtests of the WISC, a 'defective' level score on visual memory, below all norms performance on the Mooney Faces test, a performance in the 'defective' range on the Thurstone Mental Rotation test and Thurstone Closure Speed test, inability to do either embedded figures task, a preschool level of drawing and copying, the absence of Piagetian conservation in every area except possibly number and an inability to perform the

5 While there is no debate regarding substantial visual and spatial cognitive deficits in Williams syndrome (WS), TS provides a clearer example of the relevant dissociation than does the WS population, I would argue. For there is ongoing controversy over how intact language is in the WS population. Some research indicates lexical impairments (e.g., Jones 2007; Clahsen and Almazan 2001) or syntactic anomalies (e.g., Karmiloff-Smith et al. 1997; Perovic and Wexler 2007). Other research indicates largely intact syntax, even of complex structures such as complex nominal compounds and relative clauses (Zukowski 2005, forthcoming). However, there is no such controversy over the language or language development in TS.

6 This work was carried out with J. Yamada.


FIGURE 5.1 V's drawing of a house. The two smaller squares alongside the larger one represent windows of the house.

Localization of Topographical Stimuli task, V not only had an essentially mature grammar, but she could read and spell and do simple arithmetic problems, both in her head and on paper. (See Curtiss and Yamada 1981 for further details of this case.)

The profile V displayed is characteristic of that displayed by TS girls and women. Only within the language-related domains of reading and writing and in calculation does one see no evidence of a visual or spatial deficit. Outside these domains, one finds marked visual and spatial impairment.

TS raises the issue of the role of the sex chromosomes in verbal ability, one we will turn to again below in our discussion of research with Klinefelter's syndrome (KS) adolescents and adults. Of relevance here is that it provides empirical evidence for a clear dissociation between language, particularly grammar and language-dependent abilities such as reading, and a number of aspects of non-linguistic cognition, including visual memory, visual constructive ability, number reasoning, visual cognition, and spatial cognition.

5.2.3 Language separable from both spatial cognition and non-linguistic communication

Another population demonstrating striking dissociations between linguistic and non-linguistic visual and spatial cognition is brain-damaged Deaf signers. Several studies have documented the relevant dissociations (Bellugi et al. 1993; Corina et al. 1992; Emmorey 2002 and references therein), including also a dissociation between (1) the use of and ability to copy meaningless non-linguistic gestures alone and in combination and (2) linguistic gesture (sign). Importantly, comparing brain-damaged signers


with and without acquired aphasia, one finds a double dissociation7 between language on the one hand and non-linguistic communication and spatial cognition on the other, with aphasics showing normal visual and spatial cognition outside of the use of the spatialized syntax of ASL, and right-hemisphere-damaged non-aphasic signers displaying impaired visual and spatial cognition.8 A striking illustration of this double dissociation, reported in Bellugi et al. (1989), is left-hemisphere-damaged signers producing, via aphasic utterances, accurate descriptions of the spatial layout of the furniture in a room of their house, contrasting with right-hemisphere-damaged, non-aphasic signers producing, in grammatical sentences, descriptions of the arrangement of furniture in a manner that reflects hemi-neglect (neglect of the left side of space). Notably, their spatial cognitive deficits do not translate into deficits in comprehending or producing spatially transmitted aspects of sign grammar.

Thus we see deficits in the apprehension of linguistic gesture not transferring to the realm of non-linguistic gesture, communicative or not, and deficits in visual and spatial cognition not transferring to the realm of the use of space for grammatical purposes—the double dissociation referred to above.

The dissociation between linguistic and non-linguistic gesture seen in brain-damaged signers, and therefore between grammar and non-verbal communication, is also seen in the acquisition of sign vs communicative gesture in normally developing children. Pointing as a communicative gesture comes in prelinguistically, typically around the end of the first year of life (Bates et al. 1987; Petitto 1987). Completely isomorphic pronouns in ASL, however, emerge later, in correspondence with the timing of the acquisition of pronouns in English and other languages. This fact is rather striking, since not only are pronouns formationally isomorphic to the pointing gestures already established as part of the child's communicative repertoire far earlier, but also because ASL pronouns are iconic (i.e., a point to the speaker means 'you', a point to oneself means 'me', etc.). Children appear to process pronouns as formal linguistic units, as functional units of grammar, ignoring the iconicity and isomorphism they could readily exploit if grammatical development were driven by mechanisms shared with those underlying communicative development.

A similar picture is found with the acquisition of possessive determiners and negation in ASL (Jackson 1984). Again we find that ASL-learning children ignore the communicative isomorphism with the earlier learned and used communicative 'head-shaking' gesture of the negative or the transparency of the possessive determiner for first and second person 'my/mine' or 'your'. Rather, they master these structures in timing and manner so as to indicate that they are developing a grammatical system,

7 A double dissociation between two phenomena is the neuropsychological benchmark for determining their cognitive independence.

8 Recent research emanating from Helen Neville's lab at the University of Oregon suggests that cerebral organization of these systems may not be uniform across the deaf population. That, however, is not the issue under discussion.


with functional structure that is independent of their communicative repertoire, blind to iconicity as a cue for grammatical acquisition.

5.2.4 Dissociations between grammar and non-linguistic communication

A double dissociation can also be found between grammar and non-linguistic communication in adults and children, in that one finds selective deficits in one and not the other. Autism is defined in part by impairments in social interaction and communicative competence, such that regardless of where an individual may fall on the autism spectrum, some impairment in social interaction and communicative competence is always present. However, high-functioning autistic individuals and those with related Asperger's syndrome develop and maintain normal grammars. The opposite profile is seen in many acquired aphasics, in Deaf individuals who have little knowledge of a first language, with late L1 learners as exemplified by Genie and Chelsea and in most individuals with SLI. In these instances, we find impaired grammars alongside good non-verbal communication abilities.

5.2.5 Multiple dissociations between grammar and other mental faculties

Examining a number of different populations, I and others have demonstrated a double dissociation between grammar and non-linguistic cognition including communication.9 In one direction I have reported on several cases of mentally retarded children and adolescents (e.g., Curtiss 1982, 1988a,b, 1995, 2011) with an island of intact function, namely, grammar, within a sea of pervasive and comprehensive non-grammatical deficits, including deficits in areas some have hypothesized to underlie or be necessary for language development—sequencing, ability to construct and apprehend hierarchical structures, normal auditory/verbal short-term memory, symbolic thought as revealed through play or drawing, categorization, rule formation or generalization, to name just some. (See above references for details.)

Individuals with spina bifida (see Stough et al. 1988) and hydrocephalus (see Tew and Laurence 1979a,b) have also been reported to show a profile of good grammatical function coupled with mental retardation and atypical communicative behavior, giving rise to the terms 'chatterbox syndrome' and 'cocktail party syndrome'.

An additional study describing a case of intact grammatical development and function in the face of substantial non-linguistic impairments in non-grammatical domains is the case of Françoise (Rondal 1995). Françoise, a woman with Down Syndrome with an IQ of 64/65 and an MA of 7;4 when she was in her early to mid-thirties, like the MR children I report on, appears to have an intact, mature, age-appropriate grammar. Moreover, though Françoise displays short-term memory and vocabulary performance comparable to her MA-matched peers, she shows intact,

9 Again, tests used in my research are listed in the appendix to this paper, found at <http://www.linguistics.ucla.edu/people/curtiss/index.html>.


fully adult working memory for sentence processing and sentence repetition and, although unable to make semantic plausibility judgments, can make grammaticality judgments of long, syntactically complex sentences.

The mentally deficient individuals with intact grammars I and others have documented, though not autistic, could be considered Linguistic Savants, with savant abilities in grammar. So, too, is Christopher (e.g., Smith and Tsimpli 1995), who is autistic, and whose pervasive non-linguistic retardation makes his intact computational linguistic abilities quite remarkable. (At last estimate Christopher could speak and understand sentences in more than twenty languages, including British Sign Language.) An additional case is that of Daniel, a 31-year-old high-functioning autistic whose savant areas include number and language. Daniel is reported to speak eleven languages, and to have learned to speak Icelandic in a matter of days (for a television interview). He has also written an autobiography of sorts (Tammet 2007).

Establishing a double dissociation between grammar and non-linguistic cognition, we see the flip side to the profile of Françoise, the mentally retarded children I have written about, and cases like Christopher and Daniel in Genie, who has no clear deficits outside of grammar and psychosocial function, and in Chelsea, an adult linguistic isolate who shows a grammatical profile even more impaired than Genie's (Chelsea appears to have no grammatical system at all but evidences robust vocabulary learning10 (Curtiss 1995) alongside non-grammatical functioning at a 10-11-year-old level). (This profile is evidenced to a less extreme degree by Grammatical SLI children; see section 5.2.7 below.) Like Chelsea, other deaf adults who were not exposed to a sign language until adulthood (e.g., Newport 1990) also manifest significant grammatical deficits together with normal (even superior) non-verbal communication and normal cognitive function, even normal number knowledge and arithmetic ability.

5.2.6 Dissociations between language and number

Chelsea can add, subtract, multiply, and divide, manipulate money well enough to conduct restaurant and shopping transactions, keeps a correctly reconciled checkbook (Glusker, personal communication) and can tell time—all without a grammar (Grinstead et al. 1998, 2002).

There is evidence from other cases as well for the independence of language and number, both in development and breakdown. There is abundant literature documenting acquired impairments in language with the number faculty spared as well as acquired acalculia with no aphasia. In addition, there is other behavioral evidence that the domain of number can develop or remain functional despite the absence of language and grammar. Cases of individuals who appear to have fully developed

10 Genie has deficits in sociocultural aspects of discourse; namely, using the cultural rituals of discourse, including conversational operators and turn-holding devices that differ from culture to culture, while Chelsea has no difficulty in this area but shows difficulties in conversational aspects of discourse, such as contributing to a conversation's progress, most probably because of her poor comprehension.


number faculties; i.e., knowledge of how numbers work—knowing how to perform arithmetic operations at will, from counting to multiplication, despite a complete lack of language—are reported by Schaller (1991). A fascinating study described by Schaller of a community of immigrant deaf adults without language, with a focus on a particular case study within this community, documents how number systems can be developed or readily learned despite the total absence of language, even vocabulary. Galvan (reported in Schaller) describes another deaf man, who despite having no language, learned to tell time and utilize his knowledge of time to negotiate bus schedules. Both of these cases and others like them point to number being its own distinct mental module, separate from language. (See Curtiss 2011 and Grinstead et al. 1998, 2002 for more details and discussion of such cases.) A bit more on number is presented in section 5.3.7 below.

5.2.7 Specific Language Impairment (SLI)

Anywhere from 2 to 19 percent of children are estimated to have developmental language impairment not associated with hearing loss, mental retardation, frank neurological damage, or social/emotional impairments (Nelson et al. 2006). Although we will return to this population in section 5.3, the G-SLI subgroup within this population, so called because they present with selective grammatical (G) impairments while non-linguistic cognition remains intact, offers another source of compelling data in support of BMod and the double dissociation displayed by cases and populations already discussed.

The most relevant research on G-SLI has been done largely by van der Lely and colleagues.11 Van der Lely and her colleagues have demonstrated normal non-linguistic cognitive function not only by their selection criteria but also by demonstrating that G-SLI individuals perform like age-matched controls in their ability to solve logical problems as tapped by a modified game of Cluedo (van der Lely and Battell 2003) and the ability to perform a complex visual inference task (van der Lely et al. 2004). Van der Lely and her colleagues have also established that individuals with G-SLI are typically unimpaired in the acoustic processing that some researchers have hypothesized underlies SLI itself (Tallal 1976, 2000; Rosen et al. 2009; van der Lely et al. 2004).12

In addition, heritability studies of children with G-SLI (and others) suggest genetic factors dedicated to grammar separable from non-language cognition (see, for example, Stromswold 2001 for a general review; Bishop, North, and Donlan 1996; van der Lely and Stollwerck 1996; van der Lely 2004). G-SLI children are found to have a

11 Many other linguists have done important, even seminal work characterizing the specific linguistic impairments of individuals with SLI. I concentrate on van der Lely and her colleagues' research here, for its particular relevance to BMod issues.

12 The discovery of SLI in a signer (Morgan et al. 2007; Mason et al. 2010) is further evidence against the hypothesis that an impairment in the ability to process rapidly changing acoustic information underlies SLI, since in the sign signal, information is presented at a far slower rate.


higher incidence of family histories with language impairments than normally developing children. Taken together with data characterizing G-SLI as a domain-specific deficit, this then provides another source of evidence that there is a brain system dedicated to grammar alone.

There is increasingly abundant data indicating the heritability of language in children with a variety of language or language-related deficits: from twin studies showing greater heritability of language impairments in monozygotic (MZ) than dizygotic (DZ) twins, from familial aggregation studies showing significantly higher family histories of language or reading problems in families of children with SLI than in normally developing children, and from adoption studies that show a significantly higher correlation in language abilities between adopted children and their biological parents than with their adoptive parents (Bishop et al. 1996, 1999a,b; Felsenfeld and Plomin 1997; Hohnen and Stevenson 1999; Stromswold 2007). These data also constitute evidence for genetic factors dedicated to language alone. This is not surprising to anyone working in the field of grammatical development, since normal language development is characterized by uniformity in timing of emergence and of major developmental linguistic milestones across the species. Given the additional force of poverty of stimulus arguments, normal language development itself can be seen as strong evidence that language is a biological system that is part of our genetic endowment as humans, and therefore, genetically transmitted. It is simply because of the selective deficits manifested by the SLI population that evidence regarding the heritability of language and its subsystems by those with language impairments may seem more persuasive as evidence for a domain-specific, genetically determined mental module we call 'language'. I return to this topic briefly in section 5.3.5 below.

5.2.8 Acquired aphasia and Dementia of the Alzheimer's Type (DAT)

5.2.8.1 Acquired aphasia and intelligence With agrammatic (non-fluent) aphasia, one typically sees intact intelligence (Bay 1964; Varley and Siegal 2000), another noteworthy dissociation of grammar from extra-grammatical cognition. The manuals for and findings from clinical tests given worldwide for classifying aphasic disorders take as a given at this point in time that in aphasias whose predominant characteristics are anomia or belabored and halting production with particular difficulty with functional elements, intelligence remains largely intact. This fact is reflected in the depression that is a frequent artifact of these aphasias, a depression based on such aphasics' keen awareness of their functional loss. Largely intact intelligence alongside acquired language loss provides, then, yet another source of evidence that language and non-language cognition are not of a piece, but rather are separate and separable.13

13 Work demonstrating equivalent intelligence ('little g') of the right and left hemispheres also provides evidence of the independence of language and intelligence (e.g., Zaidel, Zaidel, and Sperry 1981).


5.2.8.2 Dementia of the Alzheimer's Type (DAT) With agrammatic aphasics and adults with DAT we have two populations that together reveal a double dissociation of grammar and non-grammatical mental faculties. Alongside cognitive dissolution, progressive dementia is characterized by lexical and other forms of semantic loss, even early on. While non-linguistic cognition and extra-grammatical aspects of language (e.g., lexicon and discourse functions) deteriorate, however, phonological and morphosyntactic knowledge appears to remain largely intact, often until late stages of DAT (Kempler 1984; Kempler et al. 1987). I return to this population in section 5.3.3, where I concentrate on the selective preservation of submodules of language; however, I mention the DAT population here, as they reflect in breakdown an adult parallel to developmentally retarded individuals with selectively intact grammatical development, much as adult acquired aphasia is in many respects a breakdown parallel to SLI.

5.2.9 Selectively impaired non-linguistic cognition

Selective developmental and acquired impairments in non-linguistic cognitive domains are well established in the clinical literature and include the domains of number, spatial cognition, facial recognition and other visual agnosias, visual cognition in general, proprioception, non-linguistic communication, and music. Selective impairments of a number of different cognitive domains or submodules within those domains will be discussed in more detail in section 5.3.6, but the fact that so many cognitive domains in addition to language can be selectively spared or damaged both in breakdown and development points to a model of the mind comprised of a number of distinct faculties which rest on task-specific principles, constraints, and mechanisms for processing domain-relevant information. As we will see, each of these can be fractionalized and subsystems within them develop abnormally or become impaired. Space constraints prevent me from elaborating on this additional, important area of evidence supporting the basic claims of BMod; however, I mention them here to point out that the issue of modularity of mind is one that speaks not just to language and mind, but to the broader issue of the nature of mind in general.

5.3 Little Modularity

Above I enumerated a variety of populations and kinds of evidence in support of the basic claims of BMod. In this section of the chapter we will do the same for the claims of little modularity (LMod); namely, that language (like other cognitive systems) is not all of a piece, and that different subsystems within language—lexicon, pragmatics, and the computational system (the grammar)—can be selectively impaired in development and breakdown.


5.3.1 Acquired aphasia

Lack of space prevents me from richly explicating relevant findings from studies of acquired aphasia, but linguistic aphasiology has documented selective impairments of (1) the lexicon, even semantic and syntactic category-specific deficits within lexical loss (e.g., Hart et al. 1985; Jodzio et al. 2008; Caramazza 1988); (2) morphology (Thompson et al. 2002), including selective impairments differentially affecting derivational and inflectional morphology (e.g., Miceli and Caramazza 1988); and (3) syntax (Grodzinsky 1986; Grodzinsky and Finkel 1998; Friedmann et al. 2006; Bastiaanse and van Zonneveld 1998; Bastiaanse and Thompson 2003; Buchert et al. 2008).14 A cottage industry seems to be devoted just to the selective loss of 'closed-class' elements in the lexicon (Bradley, Garrett, and Zurif 1980) and at which level of grammar or processing the relevant generalization regarding what is impaired can best be captured (e.g., Kean 1980; Grodzinsky 1986; Colston 1991). Researchers have also noted selective deficits in pragmatics (e.g., Bottini et al. 1994; Champagne-Lavau and Joanette 2009), typically following RH damage, where the grammar and lexicon remain essentially intact, but 'non-ordinary' language (e.g., appreciating metaphor, jokes) is affected.

Studies of rare forms of aphasia reveal additional interesting patterns of selectively impaired vs selectively intact pieces of language. Both mixed transcortical aphasics and transcortical sensory aphasics spontaneously and unconsciously (i.e., automatically and mandatorily) correct minor phonological or morphosyntactic errors (e.g., Whitaker 1976; Davis et al. 1978), but appear impervious to the semantic plausibility of the sentences they are asked to repeat.

Linguists have focused particular attention on agrammatism, with most attempting to delimit exactly which syntactic principles or piece(s) of computational machinery are affected in agrammatism. Many have suggested, for example, that the operation Move is selectively impaired (Bastiaanse and van Zonneveld 1998; Bastiaanse and Thompson 2003), providing a striking adult parallel with S-SLI (Friedmann, Gvion, and Novogrodsky 2006). There is ongoing debate as to whether the agrammatic's loss is one of being able to compute the entire syntactic tree, with all of the relevant functional heads and internal functional structure (e.g., the Trace Deletion Hypothesis (Grodzinsky 1986); the Tree-Pruning type Hypotheses (Hagiwara 1985; Friedmann 2001; Friedmann and Grodzinsky 1994, 1997)), or is one of selective loss either of parts of that functional structure (e.g., elements/features marking finiteness) or of the operation Move triggered by such features (e.g., Friedmann 2001; Friedmann and Grodzinsky 1994, 1997; Grillo 2009; Grodzinsky and Finkel 1998; Friedmann

14 It is intriguing that, to my knowledge, deficits in phonology unaccompanied by other deficits are not documented. One can speculate as to why this is the case, taking into consideration the interdependence and interrelation of phonological processes and the lexicon or between phonology and morphological realization, among other factors. It remains a curiosity, nonetheless.


et al. 2010). It is striking that problems with Move are noted for both the G-SLI/S-SLI population and agrammatics.15

Whatever the correct analysis, the acquired aphasias provide clear evidence that language is decomposable into separate and separable submodules or subcomponents, the principles of which can be selectively damaged or remain intact, including the piece of the parser by which grammatical form is filtered for errors—all evidence supporting the fundamental tenets of LMod.

5.3.2 Klinefelter's syndrome (KS)

In TS we saw the potential influence of the sex chromosomes, in particular, the X chromosome, on language function. In KS we see another instance implicating the X chromosome in language. In TS, there was the partial or complete absence of one of the X chromosomes, resulting in enhanced reading abilities. KS is a genetic disorder in which males have an extra X chromosome and have a sex chromosomal make-up of 47,XXY. There is very little linguistic research on KS, but the little research that exists on language, reading, and spelling in KS reports that KS boys are developmentally dyslexic, have spelling deficits, and are frequently mildly retarded, with Verbal IQ typically lower than Performance IQ (e.g., Bender et al. 1986; Netley and Rovet 1982; Rovet et al. 1996; Boone et al. 2001). We see an influence of the X chromosome again here, but this time an extra X has the effect of causing reading difficulties—the opposite profile to that seen in TS girls.

I have been conducting research on KS adolescents and men16 with the objectives (1) to determine how long such deficits persist across the lifespan in KS males and (2) to conduct a comprehensive linguistic-theoretically driven investigation of language production and comprehension in KS.17 Our subjects to date comprise twelve adolescents and men with KS, none retarded. Eight are highly educated professionals (engineers, lawyers, accountants, teachers, insurance brokers), two are high-school students, and two are college students. Subjects are presented in Table 5.1 below.

Our findings to date are summarized in Table 5.2 below. Surprisingly, we have not found any traces of persistent reading difficulties, at least at the level of the word, even

TABLE 5.1. KS subjects by age

Subj.   CA
MS      14
GS      18
TR      32
JK      34
GS      68
MC      16
PR      22
SP      28
PG      47
GM      49
CM      36
WL      17

15 This raises the possibility that Move itself and/or the principles that trigger or underlie it might represent a separable submodule within the syntax.

16 This research has been done in collaboration with S. de Bode and D. Geschwind and is still ongoing.

17 Some of the subjects overlap with those studied by Boone et al. (2001), yet our findings diverge quite a bit from theirs—the result most probably of the nature of the tests and tasks used.


TABLE 5.2. Test performance of KS subjects

Test                      MS      GS      TR      JK      GS      MC      PR      SP      PG      GM      CM      WL
SS Relatives              5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5
SO Relatives              5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5
OS Relatives              5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5
OO Relatives              4/5     5/5     3/5     5/5     5/5     3/5     5/5     5/5     5/5     5/5     3/5     5/5
Obj. Clefts               4/5     5/5     3/5     5/5     4/5     4/5     4/5     5/5     5/5     5/5     2/5     5/5
S-V Agreement             20/20   20/20   20/20   20/20   20/20   20/20   20/20   20/20   20/20   20/20   20/20   20/20
Negative Scope            5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5     5/5
Binding* Items            2/4     1/4     2/4     0/4     1/3     1/3     0/3     0/4     1/4     2/4     1/3     2/4
Control* Items            2/6     0/6     1/6     3/6     0/6     0/4     0/5     1/4     1/3     0/3     1/5     2/6
Presupp.* Items           2/4     1/4     2/4     3/4     1/4     3/4     0/4     0/4     4/4     0/4     4/4     2/4
Entlmt. Items             1/4     3/4     4/4     1/4     2/4     0/4     1/4     4/4     3/4     2/4     4/4     4/4
Homophones                18/18   18/18   18/18   18/18   18/18   18/18   18/18   18/18   18/18   18/18   18/18   18/18
Reading Homophones        30/30   29/30   30/30   29/30   30/30   30/30   29/30   29/30   30/30   30/30   30/30   30/30
Reading Rhymes            28/30   29/30   30/30   30/30   29/30   30/30   —       29/30   28/30   30/30   30/30   30/30
Reading Nonsense Words    18/20   19/20   20/20   18/20   17/20   20/20   20/20   18/20   20/20   19/20   17/20   17/20
Curtiss Agreement         32/32   32/32   32/32   28/32   32/32   32/32   32/32   32/32   32/32   32/32   32/32   32/32
CELF-PFS                  >norms  >norms  >norms  >norms  >norms  >norms  >norms  >norms  >norms  >norms  >norms  >norms
Gopnik Plural             20/20   20/20   20/20   20/20   20/20   20/20   20/20   20/20   20/20   20/20   20/20   20/20

* From the Binding, Control, and Presupposition Test


in our youngest subjects. All twelve read the words on our Reading Nonsense Words Test like the normal adults we used as controls, making at most three 'errors' (an error defined as a pronunciation that none of our twenty native English speaker controls produced); three of the twelve made one error on the Reading Homophones Test, and only two made two errors on the Reading Rhymes task,18 with the remainder making either no errors (eight) or one error (two). Additionally, all twelve have normal speech and conversational abilities and frequently produce sentences with complex syntactic structures, including passives and embedded clauses of all types, including relative clauses with object extractions.

However, to a man, all twelve show difficulty with Binding and Control, with some also doing poorly on tasks exploring Presupposition and Entailment as well. A few have also shown difficulty with the Object Clefts and OO relatives subtests of the CYCLE-R, and one subject made four errors with nonce items on the Curtiss Agreement Test, failing to add the /-s/ 3rd singular marker to stems that ended in a sibilant, and changing one stem from [pern] to [penz].

The Binding and Control items of the BCP Test provoked comments like 'This is giving me a headache. Can we please stop?' from almost all the subjects, and other comments like 'I have no idea' or 'Maybe' or 'How should I know?' were made repeatedly throughout the test on these items. Given how well these same subjects performed on the Sentence Judgement Test and on the large battery of comprehension and elicited production tests of grammar (which require correct sentence interpretation, because of the foils on the comprehension tasks put there to test just that, and control of the large number of grammatical structures tested on the production subtests), our KS subjects appear to have a quite selective and specific 'hole' in their grammars—one that involves ready access to the Binding Principles or the relevant C-Command relations for Principles A and B and construal of nominals. Some also appear to have deficits in the complex verb semantics of presupposition and entailment, all of these, perhaps, deficits at the level of the syntax/semantics interface. Such selective deficits are evidence that the computational component is divisible into subsystems far more specific than the submodules of phonology, morphology, syntax, and semantics (logical form). They provide support for a linguistic theory that incorporates interface levels of computation and representation, after the fashion of Minimalist models, as an example. In any event, these findings do not corroborate those of others who have worked on this population, though it should be stated that ours may be the first linguistically informed investigation into the grammar of the KS population.

5.3.3 DAT

In research with Kempler, we studied a population of twenty adults with DAT who varied along a continuum of severity (as measured by the Mini Mental Status exam).

18 All of these reading tests were designed to elicit errors by including different words with different pronunciations for the same letter sequence (said-paid), the same pronunciation for very different letter sequences (write-right), or by requiring close attention to orthographic content (dose-does, loose-lose).


We conducted several behavioral experiments on this population, including two that I will briefly describe here. (See Kempler et al. 1987 for a full description of this research.) In one study we examined spontaneous speech to determine whether the range of syntactic structures used, the frequency of a predetermined set of syntactic structures used, and the type and frequency of morphosyntactic vs lexical/semantic errors in our DAT samples differed, and if so, in what way(s), from that found in comparably sized samples of speech by SES-, age-, and sex-matched controls. The two groups differed significantly in the number of lexical/semantic errors made, with the DAT group making significantly more sem./lexical errors (t = 3.351, p < 0.005). The number of morphosyntactic errors made by the DAT subjects ranged from none to three, with four of the twenty subjects making no such errors. In contrast, all DAT subjects made at least three semantic errors, with three subjects making more than twenty-six such errors. Fourteen of the twenty normal controls made no errors of any kind; the remaining six made only six errors in total, four syntactic and two lexical. Moreover, the number of lexical/semantic errors made correlated significantly with severity of illness (r = .7057, p < 0.025), while morphosyntactic errors appeared to be independent of disease stage (r = .1287, p > 0.05).

Additionally, the frequency with which specific construction types were used was rank-ordered and compared and was almost identical between the two groups (r = .9833, p < 0.0001), and the structural complexity embodied in the sentences used was also compared (with complexity indexed by constituent movement and number of embedded clauses); again no significant differences were found (t = −.664, p > 0.05). In contrast, the DAT population showed difficulty with conversational pragmatics, often drifting off-topic or interjecting inappropriate comments.

The second experiment involved examining the disambiguation of differently spelled homophones by means of a semantic vs a syntactic cue (using the Curtiss and Kempler Written Homophones Test). Words were spoken aloud in pairs and subjects were asked to write the words just spoken. A word semantically related to one member of the homophone pair accompanied half the items; a word providing a syntactic cue occurred with the other half. (Each word was presented with each type of cue (e.g., lake-sea, the sea; look-see, I see).) There was a significant difference in the ability of the DAT subjects to make use of the two kinds of disambiguating cues, with the ability to utilize syntactic cues far more preserved than the ability to make use of the semantic cues (t = 6.147, p < 0.001). These findings paint a striking picture—morphosyntax is significantly more preserved than is the lexicon or phrasal semantics. This finding again points to the decomposability of the linguistic system, with semantics (lexical and phrasal) and pragmatics in this case far more impaired than the computational system.

5.3.4 SLI

In section 5.2.7 I discussed the selective impairment to the computational system evidenced by G-SLI children. However, SLI is an umbrella label for a heterogeneous


population, one that can readily be divided into subgroups with differing impairments. For many individuals with SLI, the linguistic deficits involved are widespread and affect linguistic performance seemingly across the board, from word learning, to articulation, to grammatical development, to pragmatic function. But for many with SLI their deficits are much more circumscribed. In a study examining just this, Friedmann and Novogrodsky (2008) identified SLI children who had only a phonological deficit ('PhoSLI'), only a syntactic deficit ('SySLI'), a selective lexical impairment ('LeSLI'), or a selective pragmatic deficit ('PraSLI'). A few children in their study showed deficits in more than one area, but the majority of the children in their sample did not. Together with the G-SLI children studied by van der Lely and colleagues, all of whom are reported to have deficits implicating the computational component, we find in this single developmental population evidence for selective disruption of the computational component as opposed to the lexicon or pragmatics (the G-SLI children), and a variety of SLI types, each of which manifests selective deficits in only one part of the linguistic system, across all its subsystems: phonology, morphosyntax, lexicon, and/or pragmatics.

Many other linguists studying SLI have identified the deficit in the children they studied as being very discrete, for example, involving only the marking of finiteness, or only tense, or only agreement, or primarily a deficit in affixal morphology, or verb movement, or the operation Move more generally, or hierarchical complexity (defined as a branching structure, regardless of the subsystem of the computational component involved). Although at this time there is no agreement among linguists as to the best characterization of SLI, the current state of the art in research on SLI from a theoretical linguistic perspective is that this population is one that illustrates LMod in a strong form. The impairments in the SLI population demonstrate that language can be divided into many distinct components and subsystems, and within these, particular principles or operations can, themselves, be selectively impaired or spared.

5.3.5 Evidence for LMod from studies on the genetics of language

Although still relatively new, the study of the genetics of language already provides support for LMod. Though many kinds of studies comprise this field (e.g., concordance studies, familial aggregation studies, linkage studies), most of the relevant evidence for our purpose comes from studies of MZ twins, one or both of whom have SLI. These studies examine the family histories of these twin pairs to determine if there is a greater incidence of language or language-related impairments in such twins than is found in the families of MZ twins neither of whom has SLI.

Stromswold (2007) conducted a meta-analysis of twin studies and reports that for both language-impaired and normal twins, genetic factors are found to affect vocabulary the least, syntax a bit more, articulation and phonology even more, and


general language function most of all. Interestingly, there is little genetic overlap between babbling and other areas of linguistic development, suggesting that different genetic factors underlie babbling than other aspects of language.19 Her meta-analysis of twin studies indicates a 68 percent heritability rate for MZ twins' phonological development, a 56 percent heritability rate for MZ twins' syntactic development and performance, and a 40 percent heritability rate for MZ twins' lexical and lexical access abilities.

Twin studies of this kind provide clear evidence of a significant heritable factor for language, indicating, as stated in section 5.2.7 above, that genetic factors play a non-negligible role in language, both the language of normals and that of those with SLI. Moreover, these studies demonstrate that genetic factors affect all aspects of language, though I report here only the findings for phonology, morphosyntax, and the lexicon. While clearly there are genetic factors that are not specific to language that together with others may influence language development, it appears that some genetic factors are specific to language, including genetic factors specific to only pieces of language. Thus, here is another source of data that speak to LMod and support its basic tenets.

5.3.6 Evidence for LMod from studies of the neurology of language

Recent electrophysiological evidence provides fascinating support for LMod. Using intracranial electrophysiological (ICE) recordings, a technique that allows microscopic temporal and spatial measurement of brain activity, Sahin et al. (2009) found robust evidence for distinct neural processing of lexical, morphological, and phonological information. In distinct, neighboring regions of Broca's area, three individuals revealed the very sequential computational processing hypothesized by models of lexical processing (e.g., Levelt et al. 1991). As the authors state, their findings 'suggest that a linguistic processing predicted on computational grounds is implemented in the brain in fine-grained spatiotemporally patterned activity.'

ERP evidence also supports LMod (as well as BMod). As noted above in section 5.2.1.2, an early ERP component referred to as the ELAN is a component associated with automatic syntactic processing and structure building present in both adults and children. This component, however, has been shown to be absent in G-SLI children. Comparing eighteen G-SLI children and adolescents with language-matched controls, age-matched controls, and normal adults, Fonteneau and van der Lely (2008) found that the ELAN was absent only in the G-SLI group. Moreover, as has been found with agrammatic aphasics (Hagoort et al. 2003), Fonteneau and van der Lely found that their population, impaired in specific syntactic computations, appears to attempt to compensate for their syntactic deficits by using extra-syntactic processing, in this case leading to the utilization of neural circuitry associated with semantic processing.

19 This is not to suggest that babbling is not part of language development and maturation of the language faculty, a fact that work by Petitto and Marentette (1991) and others has clearly established.


An ever-growing number of fMRI studies evidence distinct neural responses for different components of language. Dapretto and Bookheimer (1999), examining syntactic and semantic processing, reported distinct fMRI activation patterns for each. Indefrey et al. (2001) found lexical and syntactic processing to elicit distinct and separable neural responses. Yamada and Neville (2007) and many others report robust neural processing of syntactic form in the absence of meaning, and therefore a clear separability of the two, both neurally and cognitively. Newman et al. (2001) also found both temporally and spatially distinct patterns of activation for semantic vs syntactic acceptability judgments using fMRI and ERP. Moreover, the fMRI patterns temporally mirrored the distinct ERP patterns for the same stimuli—providing corroborating evidence that syntactic and semantic processing depend on distinct neurolinguistic processes and neural substrates. Even within the subsystem of syntax, we find neural reflexes of distinctions that syntactic theory posits. For example, Ben-Shachar et al. (2004) and several others report a consistent, spatially defined neural response to syntactic movement, and Santi and Grodzinsky (2010) demonstrate spatially distinct neural processing of movement vs clausal embedding.

Like the aphasic literature, the imaging literature is at this point too vast to cover here. What is important for the thesis of this chapter is that there is a plethora of studies from laboratories across continents with converging findings that demonstrate clear distinctions between syntactic and semantic processing, including many studies suggesting differential brain circuitry for different classes of syntactic structures (e.g., those involving object extraction vs subject-verb agreement) vs violations, as well as different brain activation patterns for phonological processing as opposed to other levels of linguistic processing.

Though too numerous to mention or discuss individually, the findings of these studies support two fundamental tenets of LMod: (1) the separability of language into submodules, each of which can be differentiated from the others in terms of neural circuitry as well as on linguistic-theoretic grounds, and (2) the further decomposability of the major components of language (phonology, morphology, syntax, or lexicon) into even smaller, discrete subsystems.

5.3.7 Non-linguistic evidence for LMod: the fractionation of other mental domains into submodules

Almost all of the above discussion has revolved around language as a test case for the viability of the Modularity hypothesis. Yet if the human mind is indeed modular in its make-up, one should find evidence of this regardless of which particular domain one examines. As many have asked, if language is like it is, then what does that indicate the mind must be like more generally? In addressing this question, one finds that language serves as a window into the nature of mind more broadly. When one examines cognitive domains outside of language, one finds that they, too, appear to be modular in composition. And, as with language, this is typically most visible from the purchase of impairment—the atypical case.


One can find instances of quite specific agnosias, for example, agnosias for specific objects or categories of objects (apperceptive agnosia or 'category-specific agnosia'), agnosia for faces (prosopagnosia), for words ('pure word deafness') or quite specific categories of words, for numbers, or for colors (achromatopsia). Moreover, non-linguistic cognitive domains can be fractionated into submodules, most apparent when the system is impaired, with the clinical literature providing the necessary evidence that these pieces can be selectively impaired or spared.

The system underlying our body awareness and sense of self (proprioception) is another cognitive domain that can be fractionated, as with loss of one's sense of body posture, not uncommon in Parkinson's, or failure to recognize one's body parts as one's own (asomatognosia), or the opposite, phantom limbs, in which one experiences as present a body part that is no longer there or was never there (e.g., Melzack 1992; Ramachandran and Blakeslee 1998; Sacks 1987; Shreeve 1993).

Spatial cognition can similarly be fractionated into discrete submodules. Clinical cases of selective impairments of spatial transformation (including selective deficits in processing mirror images or other degrees of mental rotation) and spatial location have been documented (e.g., Bricolo et al. 2000), and neglect of one half of space (hemi- or unilateral neglect) is not at all uncommon, typically, though not always, following right-hemisphere parietal damage.

Selective deficits within the domain of number knowledge and processing have also been documented; in addition to developmental dyscalculia, a selective impairment of the number faculty alongside otherwise normal cognitive function, deficits in procedural dyscalculia and number-facts dyscalculia have also been described as separate impairments (e.g., Temple 1991).

Visual cognition can be fractionated as well. Visual closure ('Gestalt Perception'), color agnosia, the ability to apprehend the relation between a part and the whole of which it is a part (e.g., arc to circle), the ability to locate embedded figures within other figures, the ability to recognize particular visual configurations, meaningful or not, the ability to reproduce (copy or construct) particular visual configurations, meaningful or not, and all the various visual agnosias are all part of the mental faculty of visual cognition.

A subcomponent of visual cognition is facial recognition, which is in and of itself decomposable. The system of facial recognition can be impaired broadly (prosopagnosia), resulting in extensive dysfunction in the otherwise automatic ability to recognize a face as a face (one form of visual agnosia), and such an impairment can be developmental (as in congenital prosopagnosia) or acquired (typically after right hemisphere damage). However, the cognitive faculty of facial recognition can be fractionated into more discrete pieces, such as the ability to recognize familiar faces, an impairment of which can have serious psychosocial consequences (as in Capgras syndrome). The facial recognition system can also be affected after brain damage in such a way as to lead to abnormal hyperemotional responses to unfamiliar faces, as in Fregoli syndrome (Ramachandran and Blakeslee 1998; Feinberg 2001).


The domain of music cognition, too, can be fractionated into discrete subsystems, each of which can be selectively impaired or spared, both developmentally and as a result of acquired brain damage. Deficits in rhythm, timbre, melody, and harmony have all been reported (e.g., Sacks 2007), and while recent research reveals tantalizing relationships and similarities between music and spoken language, one can be impaired while the other remains intact—amusia without aphasia and aphasia without amusia (e.g., Sacks 2007). Both patterns have been documented frequently in DAT (e.g., Cuddy et al. 2005; Cuddy and Duffin 2005; Piccirilli et al. 2000). Moreover, at least aspects of music cognition can be seen in the brains of newborns (Perani et al. 2010). Similar to brain imaging results for aspects of linguistic processing, Perani et al.'s results indicate that within the first hours of life the infant brain shows (right) hemispheric specialization for music and is differentially sensitive to subcomponents of music, in this case to differences in consonance and dissonance and to changes in tonal key. As with language, before the brain regions responsible for such processing in the adult are at all mature, the neural architecture underlying such processing appears to be hardwired into the species.

5.4 Summary and Conclusions

The primary objective of this chapter has been to take a new look at the question of modularity of mind, bringing to bear evidence from a wide array of sources from which to examine its basic tenets—evidence from studies on the neurology of language and the genetics of language, from cases of atypical development, cases of genetic anomalies, language breakdown, and cognitive dissolution, and from a variety of cognitive domains in addition to language. I am well aware of efforts to debunk this view, the arguments raised toward that end, and the disfavor into which a modular view of the mind has fallen. I believe, however, that when one considers the vast array of relevant evidence, new and old, only a small amount of which I was able to include here, there is strong reason to conclude that language, and in particular grammar, is a mental faculty that rests on structural organizing principles and constraints not shared in large part by other mental faculties, and that its processing and computation are automatic and mandatory. Further, the evidence I have described strongly indicates that language itself is comprised of distinct submodules which can be selectively reflected, impaired, or spared, neurologically and cognitively.

A look at domains outside of language further supports the fundamental notion of a modular mind. The various discrete agnosias and fragmentations of other mental systems only briefly mentioned provide potent non-linguistic evidence that the mind is composed of a set of mental faculties, which under normal circumstances intricately interact in a beautiful dance that we recognize as normal human function, or being human, but when examined carefully can be seen as separable pieces that together comprise the human mind—a modular mind.


6 Every Child an Isolate: Nature's Experiments in Language Learning

LILA GLEITMAN AND BARBARA LANDAU

In this chapter we will concentrate our attention on two specific issues that are implicit in Carol Chomsky's challenging work: to understand how children come to know as much as they do about language and its interpretation onto the world, when the information they receive is paltry. The first concerns the robustness of language acquisition to variability in learners' access to input that would seem crucial to the function being acquired, as dramatized by studies of language in people who became both deaf and blind during infancy. The second concerns the abilities of children to reconstruct the meanings of sentences with covert structure, as in Carol Chomsky's landmark studies of whether blindfolded dolls might be hard to see. These two themes are crucially related, of course, for both exemplify the general problem known as 'the poverty of the stimulus'; in the present case, how humans reconstruct linguistic form and meaning from the blatantly inadequate information offered in their usable environment (cf. Plato 380 BC; N. Chomsky 1965; J. A. Fodor 1981, inter alia).

6.1 See and the Blind Learner

Children ordinarily acquire their native tongue in circumstances where they can listen to speech that refers to the passing scene. To use a famous example, a lucky learner might hear 'Lo! Rabbit!' just as a rabbit hops by. Not only Quine (1960) but serious commentators of every theoretical persuasion are at pains to emphasize that simply alluding to this word-world pairing leaves us light years from the specifics of vocabulary acquisition; indeed, exposing the class of problems here is the very purpose of discussing rabbits spied by vexed field linguists (in related regards, see particularly N. Chomsky 1957; Goodman 1951). All the same, it is safe to say that the sensible pairing of sound to circumstance is a crucial precondition for learning, playing a causal role for both vocabulary and syntax acquisition, and most especially at early



stages of those learning processes. After all, children must have access to information allowing them to build representations of the words and sentences they hear, and to interpret them semantically; somehow preliminary information for doing so must be derived from situational contingencies. For this reason, it has been recognized at least since the time of the British empiricists that experience-deprived individuals can provide critical evidence for understanding the learning procedure. For instance, could one acquire the word red or even the concept that it linguistically encodes if one could not see? David Hume (1739/1978) voted no:

…wherever by any accident the faculties which give rise to any impressions are obstructed in their operations, as when one is born blind or deaf, not only the impressions are lost, but also their correspondent ideas; so that there never appear in the mind the least traces of either of them. (p. 49)

6.1.1 Children who are blind and deaf from early in life

Helen Keller, blinded and deafened by a sudden high fever in the middle of the second year of her life, learned to speak and understand English—and for good measure learned Latin and Greek and algebra and most of the social graces. She lectured all over the world, interacted easily with Presidents and literary celebrities, and wrote twelve books. Carol Chomsky (1986a) studied three people in very similar circumstances, deafened and blinded through illness (usually meningitis and its associated fever) very early in life. 'The unusual channel' (as Carol Chomsky called it) through which language is learned and used by these individuals shows how much can change in the learner's environment with little consequence for final attainment. To perceive speech at all, the deaf-blind must place their fingers strategically at the mouth and throat of the speaker, picking up the dynamic movements of the mouth and jaw, the timing and intensity of vocal-cord vibration, and release of air; the overall method is called Tadoma. From this information, differing radically in kind and quality from the continuously varying speech wave, the deaf-blind recover the same ornate system of structured facts as do hearing learners: for instance, that English has fundamental units including t, p, and a; that these combine into tap, apt, and pat, but not (in principle) tpa, and that these larger units are categorized into classes distributed differently in the sequences that make up sentences.

But how are these units, so acquired, to stick to the world that they are meant to describe? It seems almost impossible to imagine how these children make contact with the objects, activities, qualities, and relations being spoken about by their interlocutors but which they can neither see nor hear. Yet enthusiastically describing a recent field trip, one of Carol Chomsky's subjects, deaf-blind from nineteen months of age, remarks:

'I saw one cab flattened down to about one foot high… And my mechanics friend told me that the driver who got out of that cab that was squashed down by accident got out by a [narrow] escape!' (in C. Chomsky 1986a: 337)


So here is language, acquired on a puff of air in a world that ends at the finger tips, complete with embedded relative clauses and including the semantically appropriate use of the word see. How can this sophisticated knowledge be explained when the information supplied by the environment is so different and apparently diminished from the usual case?

6.1.2 Studies of the blind child

The present authors made an intensive study of language learning in children who were blind from birth (Landau and Gleitman 1985). This immediate and complete deprivation of visual information can fill some gaps left by the findings for Carol Chomsky's deaf-blind subjects, two of whom were blinded and deafened at 19-20 months and one at seven years of age. Arguably these individuals might have made their conceptual and linguistic breakthroughs in the period preceding the illness that robbed them of vision and hearing. Moreover, the tests of their competence were conducted when they were in their fifties, so that the learning course for them and for sighted individuals might be quite different. Finally, one might argue with the interpretation of the facts. It is noteworthy that clinicians who work with the blind, observing that children in these circumstances utter 'look' and 'see' freely, are unfazed in their belief that language and concept acquisition arise from the evidence of the senses, counseling parents not to let the children say these words because they must be 'empty verbalisms—sound without meaning'.

As we (re)discovered, congenitally blind infants acquire predicates that—to the sighted—refer to visual experience without having had any experience of seeing at all, and they acquire such items at the ordinary times—ages two and three. Many of their earliest words refer to objects, people, places, motions, and locations in ways that seem quite ordinary, even though their experience of such things is surely different from that of the sighted child. Even more surprisingly, and consistent with Carol Chomsky's findings, among the earliest words in the blind child's vocabulary were the verbs look and see, followed shortly by a variety of color terms such as red, blue, and orange. Sighted blindfolded three-year-olds told to 'Look up!' turn their faces, i.e., their covered eyes, upward, suggesting that they interpret look to implicate vision in particular. But a blind three-year-old given the same command raises her hands rather than her face, suggesting that for her the term is connected to the manual sense (see Figure 6.1).

So far so good for a theory of language learning rooted in experience of the world. The difference in observational opportunities—haptic rather than visual information—leads the blind and sighted to different interpretations of the same terms. Successful communication from mother to blind child using these visual words often occurred just when the objects to be 'looked at' were in the learner's hands, further suggesting a physical contact interpretation of blind looking. However, this interpretation turns out to be grossly inadequate to the facts of the blind child's semantic competence. First, several common verbs used by the mother to the blind child shared the property of being uttered—proportionally much more frequently than look—when the child had a relevant object in hand—including hold, give, put, and play, all of which are differentiated and used appropriately by blind toddlers. Thus 'used with an object in hand' is insufficient to account for why look and see are the items selected for this semantic purpose. Moreover, the blind interpretation of look goes far beyond manual contact. If one says 'You can touch that table but don't look at it!' the blind toddler gingerly taps or scratches at the table. Subsequently told 'Now you can look at it', she systematically explores the surfaces of the table manually. 'You can look at this table but don't touch it' elicits only confused complaints, as it should, i.e., blind looking entails touching whereas neither blind nor sighted touching (necessarily) entails looking. Somehow the blind child extracts from the contextualized speech in her environment that look and see are terms for perceptual exploration and achievement quite different in meaning from terms such as hold and touch.

FIGURE 6.1 The blind child Kelli responds to the command 'Look up' by raising her hands (Panel (a)), while the sighted/blindfolded child responds by raising her (unseeing) eyes (Panel (b)). This shows that the blind and sighted child share a representation for look that means 'perceive', but that the particular modality of perception differs. (Adapted from Landau and Gleitman 1985.)

The blind child's understanding of color terms offers a similar insight: by about three years of age she, like sighted peers, knew that color is the supernym of red and green but not of happy or round, though of course she had only hearsay knowledge of the actual colors of common things. For instance, asked at age five 'Can a dog be blue?' a blind child responded 'A dog is not even blue. It's gold or brown or something else.' But more interestingly, when asked 'Can an idea be green?' she responded—as did sighted peers—'Really isn't green; really just talked about—no color but we think about it in our mind.' Blind learners' experience with blue dogs and green ideas is exactly the same, namely none. But the response to whether either of these two 'could be' some color is different in a principled way.

Summarizing, blind looking differs from sighted looking by being linked to a different spatial sense modality: haptic rather than visual. But blind look and see differ from hold, touch, etc., in being terms of perception. The blind child's understanding of color is that it refers to an (unknown) quality of concrete objects and not to mental objects. These findings display the remarkable resilience of semantic acquisition over variations of input: lacking the ordinarily relevant observations that (one might guess) support solution of the mapping problem for visual terms, the blind are not helpless to do the same.

But then what is the basis for the learning of these terms? Two questions are urgent to engage here. The first is where the information came from. The finding that looking is visual for the sighted but haptic for the blind suggests that for both populations the word meanings are linked to the world, and conform in detail to how the learner infers the semantics from situational contingencies. Maybe this question is in calling distance of an answer if we guess that a sighted child hears look when in visual contact (with something) whereas a blind child hears look when in haptic contact; that is, adult caregivers are sure to adjust to the facts about the child's blindness. We will return to this matter later in discussion, trying to unravel a few of the questions we just begged by so saying. But the finding that terms like look are perceptual, distinct from such contact terms as touch and set eyes on, seems even less straightforward. It is this second question that brings us to the second major line of Carol Chomsky's investigations of language learning.

6.2 Why is Easy Hard? The Syntactic Encoding of Argument Structure

6.2.1 The experiments

Carol Chomsky approached the problem of how children learn predicate semantics from the point of view of how they learn syntax (1969). At a time when few language acquisition researchers studied anything more complex than two-word speech and its inchoate surface organization (e.g., Braine 1963), Carol Chomsky was studying children's knowledge of delicate aspects of English verbal syntax, using ingenious and carefully controlled elicitation procedures. Famously, she asked if a blindfolded doll is 'hard to see'. And her four- and five-year-old subjects confidently replied yes, 'because of the blindfold'. One revelation from this work is thus that learning isn't all over and done with by three or four years of age; rather, complexities are still evolving through the school years, with a certain few structures appearing to elude some native speakers throughout life. Notice that the root meaning of hard isn't what's making the difficulty, for this much young children understand by age two or three. Rather, they misunderstand the associated requirement that the (covert) subject of the infinitive in the complement clause is not doll but an implied party who can be anybody except the doll. In contrast, in 'This doll is eager to see' the subject of the infinitive, the one who sees, is indeed the very doll who is eager to do so.

6.2.2 Explanations: the minimum distance principle

Carol Chomsky studied several other English structures, very different in their syntax and semantics from easy/eager, finding the same disparities in learning rate and character depending, again, on how grammatical subjects are assigned to infinitival complement clauses. For instance, a cooperative response to 'Tell Bozo to jump' is 'Jump, Bozo!', a command directed to Bozo. But the appropriate response to 'Promise Bozo to jump' is something like 'Bozo, I promise I'll jump.' Tell treats its object NP as subject of the embedded clause but promise assigns this role to the subject NP.

A single principle predicts the facts of interpretation and learning disparity for both tell/promise and easy/eager. Chomsky expressed this as a 'minimum distance' principle (MDP), i.e., the structurally closest NP argument of the upstairs clause is the mandatory subject of the infinitive in the embedded clause. She further argued that because this principle holds very generally in English, any violations of it, such as in easy and promise constructions, should be and are hard for children to learn. Regularities first, exceptions later.

6.2.3 Syntactic principles and semantic interpretation: hard and easy, together again

Perhaps the apogee of this line of reasoning is Carol Chomsky's explication of the relationship between the syntactic principles she uncovered and the semantic interpretation of predicates for which they are licensed. In her words, 'We have two semantic classes and an unambiguous syntactic process associated with each.' Specifically, command verbs (tell, order, etc.) obey MDP while promise verbs do not. And verbs of requesting, which arguably fall semantically between the two, accept either choice. Hence the ambiguity of 'I asked the teacher to leave the room'1 and of 'These missionaries are ready to eat'. Generalized, the idea is that the argument-taking properties of predicates are reflected in the (interpreted) syntactic structures that they license: if easy is hard, then hard simply can't be easy. Thus one learning dictum implied by these studies is that knowledge of the root meaning predicts aspects of clause structure (for extensive discussion, see Pinker 1984, 1989; Grimshaw 1981; for experimental evidence of the predictive power and stability of these mappings, see Fisher, Gleitman, and Gleitman 1991; Fisher, Hall, Rakowitz, and Gleitman 1994). When these expectations are frustrated, sentences containing the predicate are likely to be misinterpreted.

1 A seven-year-old of our acquaintance explained what this sentence could mean: 'Either I asked the teacher if I could leave the room to go to the bathroom, or if she would leave the room so I could go to the bathroom in privacy.'

6.2.4 Learning effects of universal correspondence rules

In her early studies, Carol Chomsky described MDP and its associated semantic linking as a dominant pattern specific to English, explaining the late learning of easy structures as arising from their observed irregularity for the input corpus as a whole (Slobin (2001) has coined the term 'typological bootstrapping' to describe this learning phenomenon). There is of course considerable evidence that a significant proportion of these linkages represent universal tendencies in how languages map between clause structure and the argument-taking properties of predicates (Baker 2001; N. Chomsky 1981; Croft 1990; Dowty 1991; Fillmore 1968; Jackendoff 1983; Rappaport Hovav and Levin 1988, inter alia), and in several known cases cross-language stability in this regard predicts learning rate just as it did in Chomsky's early studies of child English.

One further extensively studied instance concerns locative verbs that describe the relation (spatial or at least metaphorically spatial) between some moving entity (the Figure) and its position (the Ground) (for discussion see Talmy 1985, and in the learnability context, Pinker 1989). These vary in their syntax both within and across languages as to whether the Figure or Ground term captures the direct object position. Sometimes a single language has a pair with different morphology representing these choices (e.g., substitute/replace, pour/fill) and sometimes not (e.g., load, as in both 'John loaded hay into a wagon/loaded a wagon with hay', Fillmore 1968). Early diary studies from Bowerman (1982) documented errorful learning for some of these items. Kim, Landau, and Philips (1999) examined the learning functions for a variety of such items in Korean and English. Of special interest in the present context, they showed that errors and late learning are largely confined to cases in which different languages vary in their patterning for verbs whose root meaning is the same. For instance, English three-year-olds say 'Fill water into the glass' almost 100 percent of the time even though fill is a Ground verb in English. But it is a Figure verb in Thai, and an Alternator in Korean and Singapore Malay. This again shows that universal correspondence patterns are playing a powerful learning role, for learning is decremented where there is cross-language variability in the mappings. There seem to be more and less natural correspondence patterns.

6.2.5 The scope and power of linking rules as information sources for learning

We have just discussed some instances in which NP positioning (including covert NPs) is conditioned by semantic factors and plays a role in acquisition. It is well known from extensive linguistic investigation that argument type and number also map systematically onto a semantic cross-classification of the verb lexicon (see earlier citations, also Levin for an English compendium, and Pinker 1989, Gleitman 1990, and Fisher, Hall, Rakowitz, and Gleitman 1994 for discussions of learning implications). Verbs that accept sentences as their complements describe relations between their subjects and an event or state; these include verbs of cognition (know, think), perception (see, hear), and communication (explain, say). Verbs that license three noun-phrase arguments describe relations among the referents of those three noun phrases, typically transfer of position (put, drop), possession (give, take), or information (explain, argue). These regularities can be recovered from samples of sentences produced in spontaneous child-directed speech in languages as disparate as English (Lederer, Gleitman, and Gleitman 1995), Mandarin Chinese (Li 1994), and Hebrew (Geyer 1991). Thus verbs' syntactic behavior provides a potential source of information that systematically cross-classifies the set of verbs in much the same way within and across languages, pointing to the same dimensions of semantic similarity; a corpus with these characteristics is readily available in natural speech to infant learners.

Recent experimentation demonstrates that this source of information is heavily exploited by learners in interpreting novel predicates (e.g., Fisher et al. 1994; Gleitman 1990; Fisher 1996; Gillette, Gleitman, Gleitman, and Lederer 1999; Gleitman et al. 2005; Lidz, Gleitman, and Gleitman 2003; Landau and Stecker 1990, among many sources). Thus, infants under two years of age interpret 'gorp' as encoding a causal predicate in 'The rabbit is gorping the dog' but not in 'The rabbit and the dog are gorping', though they are seeing the same scenario play out in both cases (Naigles 1990). Symmetrically, three- and four-year-olds reinterpret known verbs in new ways if they are used in novel constructions. 'Noah comes the elephant to the ark' is interpreted as a verb of transfer (bring) while 'Noah brings to the ark' is interpreted as a non-causal verb of motion (come) (Naigles, Gleitman, and Gleitman 1993). The same interpretive strategies, corrected for other architectural principles that differentiate languages, have been documented for young learners in languages as disparate as English, Greek, and Kannada (e.g., Lidz et al. 2003; Papafragou, Cassidy, and Gleitman 2007). The children's inferential method would seem analogous to how we understand Lewis Carroll's Jabberwocky consensually though its content words are apparently so much nonsense. Borogoves, for example, likely are indulging in some self-caused activity when they gyre in the wabe. This likelihood arises palpably from the fact that the nonsense verb is surfacing in a one-argument structure. We can further examine this argument number clue to predicate interpretation by reference to another of nature's experiments: language learning in young deaf children.

6.2.6 The isolated deaf: deprivation of linguistic input

Some of the most striking evidence that the structure of human cognition yields a language-appropriate division of our thoughts into semantically constrained predicates and arguments comes from learners who are isolated from ordinary exposure to a language and therefore have to invent one on their own. Most deaf children are born to hearing parents who do not sign, and therefore the children may not come into contact with gestural languages for years (Newport 1990). Deaf children with no available language model spontaneously invent gesture systems called Home Sign (Goldin-Meadow 2003; see also Senghas 2003 for evidence of how fully and rapidly such systems evolve if there is a viable interactive community). Remarkably, though these children are isolated from exposure to any conventional language, their home sign systems partition their experience into the same pieces that characterize the elements of sentences in Italian, Inuktitut, and English. Specifically, home sign systems have nouns and verbs, distinguishable from each other by their positions in the children's gesture sequences and by their distinctive iconic properties. Moreover, and especially pertinent to the issues that we have been discussing, sentence-like combinations of these gestures vary in both the number and positioning of the nouns as a function of what their verbs mean. Systematically appearing with each verb in a child's home sign system are other signs spelling out the thematic roles required by the logic of the verb: the agent of the act, the patient or thing affected, and so forth (Feldman, Goldin-Meadow, and Gleitman 1978). The nature of this relationship is easy to see from a few examples: Because crying involves only a single participant (the crier), a verb with this meaning is associated with only one nominal argument. Because tapping has two participants, the tapper and the thing tapped, such verbs may appear with two nominal arguments. Because giving requires a giver, a getter, and a gift, this verb is associated with three nominal phrases in the deaf children's spontaneous signing (cf. N. Chomsky 1981).
Thus the same fundamental relationships between verb meaning and nominal arguments surface in much the same way, and at the same developmental times, in the speech of children who are acquiring a conventional language, and in the gestures of linguistically isolated children who must invent one for themselves. Such findings tend to undermine some theories of acquisition positing that verb structures are learned one by one 'from the input' (e.g., Tomasello 2000). As deaf isolates factor experience into predicates and arguments of varying types without any input at all, it seems unlikely that there is a stage at which more fortunately circumstanced children have to learn the same facts in a one-by-one stipulative fashion.2

In sum, linguistically isolated children construct, out of their own thoughts and communicative needs, systems that resemble the languages of the world in at least the following universal regards: all have words of more than one kind, at minimum nouns and verbs, organized into sentences expressing predicate-argument relations. The number of noun phrases is predictable from the meaning of the verb; the positioning of the nouns expresses their semantic roles relative to the verb. Thus, the fundamental structure of the clause in both self-generated and more established communication systems derives from the non-linguistic conceptual structures by which humans represent events, with strong preferences about how to sequence these in linguistic expressions.

2 As in the speech of all young learners, the actual sequences produced by two- and three-year-old deaf isolates are usually very short, with a surface length of only two or a few words, but the covert structure of complex predications can be reconstructed by examining patterns in which some argument types are dropped ('deleted') selectively (L. Bloom 1970).

6.2.7 See must mean 'perceive haptically'

We can now revisit the specific question with which we began: how blind and deaf-blind learners come to believe that look and see are haptic perceptual terms, and that touch is merely a contact term. The answer comes apart into the two aspects of 'knowledge of the word' that Carol Chomsky considered in her studies of lexical and syntactic learning. The root meaning of look/see, as inferred from the situations in which it is said, requires haptic contact. Adults speaking to the blind do not tend to say 'Look at the moon' or 'Do you see that bird flying overhead?' whereas these are likely conversational topics when addressing sighted children who have the visual distance receptor. This difference in the environments of discourse predicts a difference in the meaning of these terms, and this is just what is found. There is no internal linguistic marking of perceptual modality that would rein in such a distinction.

In contrast, the argument-taking properties are recoverable by blind and sighted learners alike if, as we have argued, they have antecedent access to universal syntactic-semantic linking rules. Learning can here transcend observational information in the non-linguistic situation, by analyzing the heard sentence itself. As we have already mentioned, mental content verbs, including verbs of perception, belief, and desire, license clausal complements where other verbs do not (for experimental documentation with young children, see Papafragou et al. 2007). One can intelligibly and grammatically say 'Look who's coming to dinner' and 'Let's see if there's cheese in the refrigerator' but these structures are proscribed for action verbs such as jump or touch. Landau and Gleitman exhaustively coded transcripts of maternal speech to a blind child in the earliest period of word learning (before the learner uttered any verbs), and found that the structural contexts for the perception verbs (their subcategorization frames) selected look and see as the only items that appeared with embedded tensed clausal complements, e.g., 'Let's see if there's cheese in the refrigerator.' (See Figure 6.2, and for further documentation, Snedeker and Gleitman 2004.)

Before leaving this topic, we should say that in at least some central cases the kinds of syntax-semantics correspondences that are found in language after language are not simply stipulative; rather, the forms transparently embody their semantics. For example, what could be more natural than that each argument of a predicate should surface as a noun phrase, and that therefore pat will be treated transitively and snore intransitively? As for see, it expectedly appears with NP objects just because one can perceive things (and things generally surface as nouns, as any school teacher would tell you). But see also expectedly appears with sentential complements because one can perceive events and states of affairs (and whole events surface as clauses). Just as


[Figure 6.2 table: counts of subcategorization frames in maternal speech, by verb group. Group I: look, see; Group II: give, put, hold, play, get, have; Group III: go, come. The individual frame labels and cell counts are not recoverable from this scan.]

a. Verbs that occur with locative prepositions and adverbs.
b. A causative use of have: 'Will we have Barbara come baby sit?'
c. Play with the nonlocative (reciprocal) preposition with: 'You're not gonna play with the triangle, so forget it!'

FIGURE 6.2 Subcategorization frames used by the mother of the blind child for verbs look and see compared to other verbs. Note that only look/see occur with sentential complements, whereas verbs involved in transfer (give, put) or other activities involving the haptic modality (hold, play) participate in a different range of frames not used with look/see. More generally, the three sets of verbs participate in different sets of syntactic frames, suggesting that the verb classes can be distinguished by the syntactic frames in which they occur, which could serve as a crucial source of information for the learner (blind or sighted) about the verb's meaning. (Adapted from Landau and Gleitman 1985.)

expectedly, the contact term touch behaves like see in the first regard (it is transitive) but not the second (no clausal complements).

6.3 Every Child an Isolate

This chapter has reviewed some of Carol Chomsky's early experimental studies of language acquisition, emphasizing their continuing relevance to the theory of language acquisition. The deep subtext of this work concerns the poverty of the stimulus problem. Successful language learning takes place under conditions of input deprivation that intuition suggests would pose insuperable problems. These include deaf-blind people acquiring linguistic-conceptual categories whose instances they cannot experience. Thus usable input can differ radically across populations of learners, but the outcome is the same, contra Hume. The other symmetrically related finding is that all of the learners acquire delicacies of syntactic form and interpretation that (if we are literal) are experienced by nobody. Thus certain arguments of embedded infinitivals are never 'there' in the utterance; they are as a matter of syntactic necessity empty of phonetic content. Yet these arguments are reconstructed, and reconstructed differently, depending on their predicate context—systematically different cases of 'nothing'. But knowledge of these syntactic properties of lexical items often, and crucially, occurs with very significant delay: children know the semantic difference between easy and hard at age two or so, but they misinterpret sentences containing these words well into the school years.

6.3.1 Real world context: Crucial but limited

Part of the explanation for why defects in situational context are readily overcome and in part discounted in vocabulary learning is that their role is more limited than one might think. A three-year-old's vocabulary contains an impressive proportion of items for which the observed world yields little or no straightforward interpretive clues: you can't see thinking, or maybe, or seem, or wanting, or fair (as in 'not fair!'). It is hard to observe forests and pets, physical as these are, because one observes the trees and dogs instead. Even apparent 'action' predicates have subtle mental content that can't be observed. For instance, notice that while it may be a bit hard to get blood from a stone, it is impossible to give it any: give requires a sentient recipient. Yet very young preschoolers acquire such items and use them appropriately, so far as can be determined. Finally, careful inspection of the real contexts in which even concrete object nouns are acquired must leave one puzzled about how this could help very much. For example, a picture book context like that of Figure 6.3a might be envisaged as helpful indeed for learning the meaning of shoe, and maybe it is. But the fact is that the context of Figure 6.3b is more like what children experience every day, and 'from' which they learn most of their early vocabulary. One line of investigation finds that fewer than 8 percent of the situational contexts of natural parental talk to infants offers observers—child or adult—a fighting chance (50 percent correct) to guess a simple whole-object term—a concrete noun—that the mother was then uttering; this is studied by videotaping minute-long scenes of actual parent-infant conversation with the sound muted (Medina, Snedeker, Trueswell, and Gleitman 2011; Snedeker, Geren, and Shafto 2007). Inefficient and errorful as this guessing game is for concrete nouns, it is materially worse for learning other linguistic categories—the verbs, adjectives, and so forth.
Moreover, cross-situational observation, i.e., further situational input, complicates this picture rather than resolving it because, in the ever-changing situations in which some single word is uttered, plausible hypotheses proliferate almost limitlessly and overtax memorial resources.

FIGURE 6.3 What is the situational context for learning the meaning of shoe? Panel (a): an idealized environment; Panel (b): a snapshot of a child's everyday environment.
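The combinatorial problem can be made concrete with a toy simulation of cross-situational learning, in which a learner intersects the sets of candidate referents visible across the situations where a word was heard. The function name, the word, and the scene contents below are our own invented illustration, not materials from the studies cited; as the muted-video results suggest, real scenes often fail to contain the intended referent at all, in which case the intersection can go empty.

```python
# Toy cross-situational learner: for each heard word, intersect the sets of
# candidate referents that were visible across the situations in which it occurred.
def cross_situational_learn(observations):
    """observations: list of (word, set_of_visible_referents) pairs."""
    hypotheses = {}
    for word, scene in observations:
        if word not in hypotheses:
            hypotheses[word] = set(scene)   # first exposure: everything visible
        else:
            hypotheses[word] &= scene       # later exposures: intersect
    return hypotheses

# Invented example: 'shoe' uttered in three cluttered everyday scenes.
scenes = [
    ("shoe", {"shoe", "sock", "floor", "dog", "chair"}),
    ("shoe", {"shoe", "foot", "door", "toy"}),
    ("shoe", {"shoe", "lace", "sock", "cat"}),
]
result = cross_situational_learn(scenes)
print(result["shoe"])  # {'shoe'} -- but only because the referent was visible every time
```

The sketch succeeds only under the charitable assumption that the referent is present at every utterance; drop "shoe" from one scene and the hypothesis set becomes empty, which is the text's point about hypothesis proliferation and memory load.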

6.3.2 Single observations, multiple cues

We just noted that even adults are quite inept at guessing the meaning of simple words from their situational contingencies, and are successful at all only for concrete object nouns. Similarly, word learning in infants until about eighteen months is slow and heavily skewed toward concrete nouns (Gentner and Boroditsky 2001; Fenson et al. 1994). However, toward the end of the second year of life the rate of word learning accelerates materially, to about eight words a day, and continues at this rate for all the many months and years thereafter; as this implies, learning now seems to require only one or very few exposures, and is essentially errorless (Carey 1978; P. Bloom 2002). What has changed at this later stage?

The approach that seems most promising today takes into account the fact that multiple cues to a word meaning are present simultaneously when a word is heard. These include not only a sound and its contingent situation, but also the structure in which the word occurs (Landau and Gleitman 1985; Gleitman 1990; Gleitman et al. 2005). As already noted, this syntactic information, once acquired, has the potential for picking out certain semantic classes (e.g., the command verbs, the request verbs, the mental verbs, the perception verbs).

Thus we can envisage a learning procedure that begins by pairing words to their observational contingencies. Such a procedure will be slow and errorful, accruing primarily concrete ('observable') words. Limited as this early vocabulary is, it provides initial ways to refer to the world, and a scaffold for projecting the language specifics of clause-level syntactic structure, e.g., that English is SVO (Pinker 1984; Grimshaw 1981). These clause structures are further differentiated syntactically (and sometimes morphologically), in accord with their interpretations. In the mature machinery emerging from the age of two years through the early school years, and as was suggested by Carol Chomsky in her work, 'all this syntactic knowledge relating to the word' then becomes a further source of interpretive inference because it is keyed semantically to the argument-taking properties of the component predicates. Situational and syntactic cues can now trade and conspire with each other to overdetermine the meanings of words that observation, operating alone, cannot reveal. Easy.


Part II

Discrepancies between Child Grammar and Adult Grammar




7

Recent Findings about Language Acquisition

JEAN-RÉMY HOCHMANN AND JACQUES MEHLER

The complexity that cognitive scientists encounter when they try to understand language acquisition is twofold. On the one hand, experimentalists are generally tempted to propose their preferred powerful mechanism to explain language acquisition without taking into account that one mechanism is unlikely to account for this human learning skill. On the other hand, many theoretical linguists often ignore the advances made by experimentalists who explore how very young infants process speech, the habitual carrier of language. In this chapter, we present a theoretical view which, we hope, will bring the fields of linguistics and cognitive science closer to one another.

In the last thirty years, studies of language acquisition have tested infants and children of different ages depending on the structures that were investigated. Generative grammarians have considered syntax as the core of human languages. Therefore, most studies were initially carried out with young children that were capable of producing speech utterances; namely, starting at two-and-a-half years old. A group of developmental psycholinguists began studying infants younger than eighteen months of age, focusing on speech perception and comprehension rather than speech production (Bertoncini and Mehler 1981; Eimas et al. 1971; Jusczyk 1997). Other researchers were aiming to uncover biological endowments privy to humans which license language acquisition from the linguistic data. These researchers have discovered that the linguistic data allows very young infants to extract regularities and statistical distributions (Marcus et al. 1999; Saffran, Aslin, and Newport 1996).

The field of language acquisition has benefited from the three methodologies described above. To illustrate some recent progress in experimental cognitive neuroscience, we review the hypothesis that infants use low-level cues to try to segregate the input into two categories. Once these categories become available, they tend to be used for different functions.



The theoretical work of Noam Chomsky has influenced many linguists. It is important to acknowledge that cognitive scientists have had a tendency to ignore or misrepresent his view and converge instead toward the study of statistical models as well as the study of a few mechanisms that could help the infant to learn some properties of the target language rather than its grammar. This, we view as a diversion that has led to some positive discoveries. Indeed, some of the mechanisms that have been the object of study have influenced cognitive science because they had interesting properties. For instance, some statistical computations turned out to be interesting because they could apply to all languages. Unfortunately, the problem is that many non-human species are capable of computing the same statistics but do not converge to systems that have complex grammars like the natural languages deployed by humans. This is a general problem with classical learning, which assumes that there is one general-purpose mechanism that will serve for every possible kind of learning.

Thus Gallistel and King (2009) have recently stated that a 'computational/representational approach to learning, which assumes that learning is the extraction and preservation of useful information from experience, leads to a more modular, hence more biological conception of learning. For computational reasons, learning mechanisms must have a problem-specific structure, because the structure of a learning mechanism must reflect the computational aspects of the problem of extracting a given kind of representation from a given kind of data. Learning mechanisms are problem-specific modules—organs of learning (Chomsky, 1975)—because it takes different computations to extract different representations from different data' (p. 219).

The above quote is supportive of experimentalists who try to discover the mechanisms of language acquisition without ignoring the biological underpinnings that make humans the only species capable of learning from speech or sign languages. Formal descriptions of how the different components of language function are acquired tend to neglect the relevance of the specialized learning devices that are essential at different ages. In fact, language has distinct components such as phonology, morphology, and syntax that animals are not able to learn and use to produce and comprehend novel forms; only humans do it. It is our hunch that the consequence of ignoring the notion of ancestry of the different species is a disaster. In fact, the competence of one species will not emerge in other species. There is no learning that will allow a monkey to learn by classical learning the abilities of a bat or a bird. Nevertheless, some of the idealizations that behaviorists once proposed are making a comeback; an event that students of cognitive science working in the early sixties would not have conceived as possible. Our view is that behaviorist approaches are likely to fail again. Rather, we investigate the existence of language-specific representations and constraints on general learning mechanisms.

In the first ten years of this new century we have explored language acquisition focusing on the specific functions of the categories that infants build in order to acquire language. Below we illustrate two cases that show that infants have a tendency to split the different continua carried by speech into two categories. Once these categories are established, each category assumes a function that can be essential to acquire language.

When linguists and psycholinguists describe the organization and acquisition of languages, they use a series of categories, attributing special roles to each of them. However, it is unclear whether these categories are convenient constructs for describing experimental phenomena, or whether these are actually represented in infants' minds and play a role in language acquisition. In particular, linguists distinguish between two broad classes of words: content words (or open-class items), which carry most of the semantics of sentences, and function words (or closed-class items), which mainly serve syntax. In our laboratory, we discovered that pre-lexical infants are sensitive to a distributional property that could allow them to identify function words in their language, i.e., the frequency of occurrence. Indeed, function words constitute the most frequent words of a given language.

The first study addressing this problem showed that infants are sensitive to the distribution of frequent words in their language (Gervain et al. 2008). We exposed seven-month-old infants raised either in an Italian-speaking environment or in a Japanese-speaking environment to an artificial speech stream built as an alternation of high-frequency (Hi) and low-frequency (Lo) syllables (e.g., ...fi ba ge lu fi to ge me fi si ge ka..., where fi and ge are Hi syllables). All syllables had a Consonant-Vowel structure; they also had identical pitch and the same duration. The intensity of the speech stream slowly increased at the beginning, and slowly decreased at the end, so that infants could not perceive whether the stream started or ended with a frequent or infrequent syllable. Using a head-turn preference procedure, we asked whether infants would rather perceive this stream as a series of sequences starting with frequent syllables (Hi Lo Hi Lo) or ending with frequent syllables (Lo Hi Lo Hi). Interestingly, Italian and Japanese infants showed opposite preference patterns. Italian infants preferred to listen to sequences starting with frequent syllables, whereas Japanese infants preferred to listen to sequences ending with frequent syllables. Moreover, we showed that this preference pattern correlates with the actual position of frequent words, and therefore that of function words, in their language. Thus, the results of Gervain and colleagues show that seven-month-old infants learn the preferential position of frequent words in their language, and extend that property to novel frequent words (or syllables) in an artificial speech stream. This suggests that infants have created (or recognized) a category characterized by a distributional property, i.e., frequency of occurrence.
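The distributional cue at issue here can be sketched in a few lines: split the syllable inventory of a stream into two classes by frequency of occurrence. The function name and the mean-count threshold are our own assumptions made for illustration (the studies make no claim about how the split is computed); the syllables mimic the alternating Hi/Lo design, with fi and ge as the frequent items.

```python
from collections import Counter

# Split the syllables of a stream into two classes by frequency of occurrence:
# a frequent, function-word-like class and an infrequent, content-word-like class.
def split_by_frequency(stream):
    counts = Counter(stream)
    mean = sum(counts.values()) / len(counts)   # assumed threshold: mean count
    hi = {s for s, c in counts.items() if c > mean}
    lo = set(counts) - hi
    return hi, lo

# Alternating Hi Lo Hi Lo stream in the spirit of the Gervain et al. design.
stream = "fi ba ge lu fi to ge me fi si ge ka fi du ge po".split()
hi, lo = split_by_frequency(stream)
print(hi)  # {'fi', 'ge'}
```

Because the frequent syllables recur while each infrequent syllable appears only once, any threshold between those two count levels yields the same binary divide; the mean is just one convenient choice.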

In a subsequent series of experiments, Hochmann, Endress, and Mehler (2010) asked whether seventeen-month-old infants have different expectations about the role of frequent and infrequent words in language acquisition. In particular, if the category of frequent words is related to the category of function words, infants should use frequent and infrequent words in a way relevant to the use of function and content words, respectively. We thus conjectured that infants should rely more on infrequent words when learning the label of a novel object. In Hochmann et al.'s study infants were tested in an experiment consisting of three phases. First, participants were exposed to eighty-one short French sentences, e.g., ce chat mange vos mets. The participants in this experiment had no exposure to French, since they were born in Italy to Italian families. The sentences were constructed to ensure that the two determiners (ce and vos) were nine times more frequent than nouns and verbs for the familiarization phase. Thus the Hi and the Lo syllables have distributional properties that are similar to the ones found in infant-directed speech. All the words used in the French sentences were monosyllabic and sentences were pronounced by a native speaker of French. Following this exposure phase, we tested the conjecture that infants associate low-frequency rather than high-frequency syllables with a novel object. In the second phase of the experiment, a 3D object was presented on a computer screen, while a det-noun utterance (e.g., ce chat) was repeated. We showed in a pilot study that infants associate the bisyllabic phrase with the object. Finally, in the third phase, we asked whether infants formed a stronger association between the object and the frequent determiner or the infrequent noun. To this end, we displayed two objects on the screen; one was the object that they had just learned and the other was a novel one. While the infants were looking at this display, the participants heard a new det-noun phrase, e.g., vos chat, in which a new determiner appears with the familiar noun (same-noun condition), or ce mets, in which the same determiner appears associated with a new noun (same-determiner condition). We used infants' first looks to determine which objects infants associated with the test utterances.
Infants' first looks were more likely to go to the side of the familiar object in the same-noun condition than in the same-determiner condition. In a control experiment, we showed that infants do not look preferentially at one or the other object in either of the two conditions if they had not undergone the first phase of exposure to French sentences.

The above results suggest two important conclusions. First, infants who hear well-formed sentences drawn from a language they had never heard before classify frequent syllables in one group and infrequent syllables in another group, or category. Second, once the categories are established, infants who listen to a bisyllabic acoustic utterance, say ce chat, i.e., this cat, will rely on the infrequent chat syllable to pair with a referent object. The frequent syllables were mostly ignored in this task, even though the same evidence was provided for the association of the frequent and infrequent syllables with the object. Thus infants appear to have expectations about the roles of frequent and infrequent words in their language, expecting infrequent rather than frequent words to refer to objects in the world. This study, together with various other controls (Hochmann submitted), suggests that infants tend to build binary classes when confronted with a continuous property such as frequency. Moreover, once binary classes are established, infants tend to use the classes for different purposes.



If the above conjecture is correct, the most frequent syllables are ignored when parents pronounce 'this is a cat' and babies can understand the animal pointed at is called cat rather than acat, isacat, or thisisacat. Obviously, as has been pointed out by various theorists (e.g., Mintz, Bever, and Newport 2002), the high-frequency syllables are used as anchor points that inform the needs of syntax (see for example Braine 1966; Green 1979; Valian and Coulson 1988). We conjecture that infants start segregating the syllables carried by the linguistic data into two classes of syllables, defined on the basis of the frequency distributions. Once this is done, infants will be able to link the most frequent syllables with the hierarchical structure of the syntactic tree and will use the infrequent syllables to learn the labels of the nouns or verbs that were highlighted in the input. Once this first step is achieved, more abstract computations and generalizations may take place in order to individuate other word classes such as adjectives, adverbs, cognitive verbs, etc. However, the first step into syntax seems to arise from the computation of statistical distributions to form (or recognize) binary classes and use one of the classes for the needs of the lexicon and the other for the needs of syntax.

The above conjecture may be just a minor example of an application that infants display in the course of language acquisition. Possibly such an application arises with the statistical distribution that is computed to form a binary divide between high- and low-frequency syllables. Below we will argue that the formation of classes or categories on the basis of surface properties is often found while language acquisition is taking place.

Another categorical distinction that linguists and psycholinguists have extensively used in their models, and that infants appear to use in language acquisition, is the distinction between vowels and consonants. In fact, based on the observation of a series of linguistic phenomena, Nespor, Peña, and Mehler (2003) proposed that consonants and vowels serve different functions in language acquisition. According to the so-called CV hypothesis, consonants are favored in the acquisition of the lexicon while vowels mainly serve the development of syntax. A number of experimental results support the CV hypothesis. Specifically, observing that infants and adults are able to use statistical information, such as dips in transition probabilities (TPs) between syllables, to identify word boundaries in a continuous speech stream (Saffran, Aslin, and Newport 1996), Bonatti, Peña, Nespor, and Mehler (2005) tested whether adult participants could equally use TPs over consonants and TPs over vowels to segment a continuous speech stream. They found that participants would use consonantal statistical information, but not the vocalic statistical information. Moreover, Mehler, Peña, Nespor, and Bonatti (2006) showed that when, in one stream, TPs between consonants and TPs between vowels predict different segmentations, the consonant statistics are favored. Thus, consonants appear to be a privileged category for discovering words in a continuous speech stream.
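The TP computation underlying these segmentation results is simple to state: TP(x → y) = count(x y) / count(x), computed over adjacent syllables; TPs are high inside words and dip at word boundaries, and the dips can serve as segmentation points. The sketch below illustrates this with three invented trisyllabic nonce words (not the original stimuli) concatenated in random order.

```python
import random
from collections import Counter

# Transitional probability TP(x -> y) = count(x y) / count(x) over a syllable stream.
def transition_probs(syllables):
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(x, y): c / first_counts[x] for (x, y), c in pair_counts.items()}

# Invented familiarization stream: three nonce 'words' in random order.
words = ["tu pi ro", "go la bu", "bi da ku"]
rng = random.Random(0)
stream = []
for _ in range(300):
    stream += rng.choice(words).split()

tps = transition_probs(stream)
print(tps[("tu", "pi")])        # 1.0: within-word, 'tu' is always followed by 'pi'
print(tps[("ro", "go")] < 1.0)  # True: the TP dips across the word boundary
```

Within each word the TP is exactly 1.0 by construction; across boundaries it falls toward 1/3, since any of the three words may follow. A segmenter that cuts the stream wherever TP drops below a threshold recovers the words, which is the logic the cited experiments probe over consonant and vowel tiers.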



A series of word-learning experiments in infants also supports the hypothesized advantage of consonants in encoding lexical items. Nazzi and colleagues (Nazzi, Floccia, Moquet, and Butler 2009) showed that in a word-learning situation where thirty-month-olds must ignore either a consonantal change or a vocalic change (e.g., match a /duk/ with either a /guk/ or a /dok/), both French- and English-learning infants choose to neglect the vocalic change rather than the consonantal change. This preference was observed for word-initial (/guk/-/duk/-/dok/), word-final (/pib/-/pid/-/ped/), and word-internal consonants (/gito/-/gipo/-/gupo/), and did not depend on an inability to process fine vocalic information. In agreement with these results, sixteen- to twenty-month-old infants could acquire simultaneously two words differing only in one consonant, whereas they could not do so for minimal pairs differing in one vowel (Havy and Nazzi 2009; Nazzi 2005; Nazzi and Bertoncini 2009).

Whereas consonants are favored for learning novel words, the CV hypothesis predicts a reliance on vowels for extracting and generalizing structures. Testing this prediction, Toro, Nespor, Mehler, and Bonatti (2008) showed that while adult participants easily learned the repetition-based structure ABA over vowels (i.e., the vowels of the first and last syllables of trisyllabic words are identical; e.g., tepane, badeka, topano), they were unable to learn the same structure over consonants (i.e., the consonants of the first and last syllables of trisyllabic words are identical; e.g., binebo, nibeno, banube). Adults remain unable to generalize ABA over consonants even when vowel duration was reduced to one third of the duration of consonants, while they could generalize ABA over barely audible vowels (Toro, Shukla, Nespor, and Endress 2008). Thus, the reliance on vowels for extracting repetition-based structures is not solely due to greater acoustic salience. Rather, vowels and consonants are involved in different types of processes.
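The two structures at issue can be made explicit by projecting each trisyllabic item onto its consonant tier and its vowel tier and checking ABA on each tier separately. The helper names below are ours; the items are the ones quoted in the text (tepane-type items repeat vowels, binebo-type items repeat consonants).

```python
# Project a CVCVCV item onto its consonant and vowel tiers, then test an ABA
# repetition (first and third tier elements identical, middle one different)
# on each tier separately.
VOWELS = set("aeiou")

def tiers(word):
    consonants = [ch for ch in word if ch not in VOWELS]
    vowels = [ch for ch in word if ch in VOWELS]
    return consonants, vowels

def aba(tier):
    return len(tier) == 3 and tier[0] == tier[2] != tier[1]

for w in ["tepane", "badeka", "topano", "binebo", "nibeno", "banube"]:
    cs, vs = tiers(w)
    print(w, "vowel-ABA" if aba(vs) else "", "consonant-ABA" if aba(cs) else "")
```

Running this shows that the two item sets are perfectly symmetric on paper: the vowel-ABA items (e.g., tepane: e-a-e) and the consonant-ABA items (e.g., binebo: b-n-b) instantiate the same formal pattern on different tiers. The asymmetry in what adults and infants actually generalize is therefore a fact about the learners, not about the stimuli.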

Recently, Hochmann, Benavides-Varela, Nespor, and Mehler (2011) showed that the specialization of vowels and consonants has already emerged by the end of the first year of life. They tested twelve-month-old infants in two experiments. Participants needed to learn to predict the location where a puppet would appear on the basis of two speech sequences. In a first experiment, the toys would appear in one location after infants heard the pseudo-word kuku, and in a second location after they heard the pseudo-word dede. After this learning phase, participants were tested with two ambiguous words. Each one was composed with the original consonants of one of the former words and the vowels of the other (i.e., keke or dudu). We explored whether one-year-olds would expect the toy to appear in the location predicted by the consonant sequence or in the location predicted by the vowel sequence, or whether they would be utterly confused. We found that twelve-month-olds behave as adults: they learn to go toward the side predicted by consonants.

In a second, very similar experiment, the location of the toys' appearances was no longer predicted by single pseudo-words, but by the structure of several speech sequences characterized by a generalization, namely all of the items predicting one location had a vowel repetition (e.g., dulu, fodo, lala), whereas the other location was cued by other speech sequences that all had a consonant repetition structure (e.g., dodu, fufa, lalo). After a learning phase, infants were tested with novel sequences respecting the vowel or the consonant repetition. Infants correctly anticipated the toys' appearances for novel items that had a vowel repetition (e.g., meke), and not for novel words with a consonant repetition (e.g., memo), showing that they generalized the structure implemented over vowels, but not that implemented over consonants. Thus, Hochmann et al. showed that twelve-month-olds privilege consonants when processing single words, but rely on vowels instead when generalizing a structure.

The above results suggest that the categories of consonants and vowels are not solely convenient constructs for linguists and psycholinguists, but are actually represented in infants' minds. Moreover, by the end of the first year of life, infants have different expectations about the type of information carried by consonants and vowels, expecting lexical information to be carried by consonants and structural information by vowels. Whether these categories and the expectations about their functions are innate or learned from twelve months of exposure to speech remains a problem to explore. Still, our results make it clear that twelve-month-old infants make use of the categorical distinction between consonants and vowels in the course of language acquisition.

An interesting parallel can be drawn from the two series of studies discussed in this chapter. In both cases, infants appear to operate a partition of speech units on the basis of perceptual or distributional properties, and use the resulting categories differently to extract information about the lexicon or structural properties of the input. Infants appear to segregate words or syllables into two classes according to their frequency of occurrence, and understand that frequent words should not be privileged when learning about referents (possibly an important part of learning about semantics). Frequent words may rather be used when learning about syntax. At another level, in an initial stage, vowels and consonants may be recognized as more or less sonorant speech sounds. By the end of the first year of life at least, these speech sound categories are associated with different functions. Infants focus on consonants to encode words in memory, but rely on vowels to generalize structures. Certain perceptual and distributional properties thus appear to serve as cues for infants to recognize relevant categories associated with specialized functions in language acquisition. A new field of investigation will consist in understanding the origin of such specialized categories. These may be learned from the input, or, just like other core cognitive concepts (e.g., the concept of object), pre-exist in infants' minds, needing only to be triggered by specific perceptual properties (Carey 2009; Spelke 1990; Spelke et al. 1994). Language acquisition may thus initially rely on a series of core linguistic representations that are triggered by specific perceptual or distributional properties, as exemplified here with sonority and frequency. Core representations will then be enriched by experience so as to yield the mature representations that adult speakers entertain. The way core representations are enriched is however constrained, and not all representations feed the same learning mechanisms. In fact, this view fits well with the Principles and Parameters approach proposed by Noam Chomsky (1981). In this view, universal linguistic principles constrain the possible human languages. Some of these principles may take the form of core linguistic representations. In Noam Chomsky's view, parameters are seen as switches that should be put in one or another position according to the information extracted from the input. Parameter setting would thus serve to enrich core representations, and yield the more detailed representations found in the final state of language acquisition. In honor of Carol Chomsky, we hope to have provided an interesting novel framework that may trigger fruitful research on the mechanisms and representations involved in language acquisition.


8

Ways of Avoiding Intervention: Some Thoughts on the Development of Object Relatives, Passive, and Control

ADRIANA BELLETTI AND LUIGI RIZZI

Children at the stage of language learning which borders on adult competence can offer valuable material for studying degrees of complexity that may be otherwise difficult to detect.

C. Chomsky (1969: 121)

8.1 Introduction

In her The Acquisition of Syntax in Children from 5 to 10, Carol Chomsky observed that certain syntactic constructions such as subject control with promise-type verbs are acquired surprisingly late. She argued that initially children strictly adhere to Rosenbaum's (1967) Minimal Distance Principle, barring subject control, and exceptions to the principle are acquired at later points in development. In this chapter we would like to argue that these cases are part of a much larger class of syntactic configurations which are 'complex' for the child to compute. We propose that the critical notion is 'intervention': the child cannot compute a local relation across an intervener close enough in structural type to the target of the relation. In fact, this follows from a general locality principle, Relativized Minimality (RM: Rizzi 1990, 2004), also holding in adult grammars; our hypothesis is that the intervention effect can be voided through the adoption of certain structural strategies which become accessible only at later stages in development. We will first present two cases discussed in previous work, which involve different adult strategies avoiding intervention: object relatives and the passive; we will then go back to control and try to trace back subject control to a similar explanatory scheme.



It is a well-established fact that children until after age five are unable to properly comprehend and produce object relatives, while subject relatives are understood and produced as soon as the relative construction is mastered, around age three. In joint work with Naama Friedmann (Friedmann, Belletti, and Rizzi 2009), we have argued that the difficulty arises from the intervention of the subject in the chain connecting the head of the relative with the object gap:

(1) Show me: The lion that the elephant wets <the lion>

We traced back this effect to RM.

8.2 Some Relevant Background on RM

Theoretical studies have introduced RM as a locality principle capturing intervention effects.

In a configuration like the following:

(2) X . . . Z . . . Y

a local relation between X and Y cannot hold if Z intervenes, and Z is a position of the same type as X (Rizzi 1990, 2004). This principle has been expressed in slightly different forms under different names in various minimalist models (e.g., Minimal Link Condition, Minimal Search, Featural Relativized Minimality: see Chomsky 2000, 2001, 2008; Boeckx 2008; Starke 2001). The initial empirical motivation for the principle had to do with the unextractability of certain wh-elements from weak islands like indirect questions, as illustrated in (3):

(3) a. How do you think [ John behaved ]?

b. * How do you wonder [ who behaved <how> ]?
       X                  Z          Y

In representation (3b) the relation between how and its trace is disrupted by the intervention of who, an element belonging to the same featural type as how, a wh-operator.

The central idea of Friedmann, Belletti, and Rizzi (2009) is that the same principle is implicated in the difficulty that children experience with object relatives. If it seems natural to relate all intervention effects to the same core principle, an immediate difficulty is represented by the fact that the intervention of the subject does not appear to affect the acceptability of object relatives in adults. A possible analytic path is offered by the observation that intervention effects are modulated by certain characteristics of the intervener (Z) and the target (X) of the relation. For instance, the unacceptability of sentences like (3b) decreases if the constituent moving to the target position is not a simple wh-word, but a complex wh-phrase containing a full lexical noun phrase, which we will refer to from now on as 'lexical restriction'.¹ This is illustrated in (4):

(4) ? Which problem do you wonder how to solve <which problem>?

Straightforward evidence that presence vs absence of a lexical restriction plays a crucial role is offered by the paradigm of combien extraction in French illustrated in (5a, b):

(5) a. ? [Combien de problèmes] ne sait-il pas [ comment résoudre _ ]?
'How many of problems doesn't he know how to solve?'

b. * Combien ne sait-il pas [ comment résoudre [ _ de problèmes ]]?
'How many doesn't he know how to solve of problems?'

(6) a. [Combien de problèmes] a-t-il résolus _ ?
'How many of problems has he solved?'

b. Combien a-t-il résolu [ _ de problèmes ]?
'How many has he solved of problems?'

As illustrated in (6), the wh-element combien can either pied-pipe the whole DP as in (6a) or move alone, leaving the lexical restriction in situ, as in (6b). However, in the weak island environment in (5), only the pied-piped version yields a well-formed result. Why should the presence vs absence of a lexical restriction in the moved phrase make a difference? There is substantial cross-linguistic evidence that presence of a lexical restriction in the wh-phrase has a grammatical impact. As an illustration, consider the contrast in (7) in Italian:

(7) a. * Dove Gianni ha messo il libro?
       'Where did Gianni put the book?'

    b. In che cassetto Gianni ha messo il libro?
       'Into which drawer did Gianni put the book?'

The difference in (7) has been interpreted as due to the fact that the two types of wh-phrases target distinct positions in the left periphery of the clause, such that only the position targeted by the simple wh-element triggers obligatory leftward movement of a verbal constituent (Rizzi 1997, 2006), thus disallowing the subject to break the adjacency with the inflected verb.2

1 We will not discuss here the possible role of interpretive properties, such as the 'discourse-linked' character of the lexical restriction.

2 A particularly striking piece of evidence for a positional difference is offered by certain North Eastern Italian dialects, which exhibit patterns like the following, discussed in Munaro (1999), in which the


118 Belletti and Rizzi

Going back to the asymmetries in (4) and (5), the relevance of the presence of a lexical restriction within the wh-phrase undergoing movement can be expressed in terms of the featural approach to RM developed in Starke (2001) and summarized in the schema in (8), where A and B are abstract morphosyntactic features triggering movement:

If we look at the set-theoretic relations holding between the feature specification of the target X and the intervener Z, three main cases arise: identity, inclusion, and disjunction. When the intervener's specification is identical to the target specification (8I), as in examples (3b) and (5b), the structure is ruled out by RM. When the featural specification is disjoint (8III), as in, e.g., wh-movement across a subject (as in (3a)), the principle is satisfied and the structure is well formed. Under Starke's approach the principle is stated in such a way that it also rules in case (8II), in which the featural specification of the intervener is properly included in the featural specification of the target. Therefore examples like (4) and (5a) are ruled in under this approach. The inclusion configuration is illustrated in (9) for example (4), with the target specified as both +Q (the feature designating interrogative operators) and +NP (the feature designating nominal expressions with a lexical restriction):

(9) The Inclusion configuration:

? Which problem do you wonder how to solve <which problem>?

lexically restricted wh-phrase and the simple wh-phrase end up in clause-initial and clause-final position, respectively:

(i) a. Con che tosat a-tu parla?
       'With which boy did you speak?'

    b. Ave-o parla de chi?
       'Have you spoken of whom?'

According to Munaro (1999), both wh-phrases move to the left periphery, but the bare wh-phrase targets a lower position, and the derivation then involves further remnant movement of the rest of the clause to a higher topic-like position, while the lexically restricted wh-phrase targets an even higher position in the CP. See also Poletto and Pollock (2009).

X: +Q, +NP        Z: +Q        Y: +Q, +NP

(8)       X               Z              Y
   I)   ...+A...       ...+A...      ...<+A>...        *    (identity)
   II)  ...+A,+B...    ...+A...      ...<+A,+B>...     OK   (inclusion)
   III) ...+A...       ...+B...      ...<+A>...        OK   (disjunction)
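The set-theoretic logic of the schema in (8), together with the stricter child variant discussed below, can be rendered as a small explicit check. This is an illustrative sketch of ours, not part of the original analysis; feature sets are modeled as Python sets of labels taken from the text (e.g. '+Q', '+NP'):

```python
# Illustrative sketch (not from the chapter) of the featural RM check in (8).
# target = feature set of X; intervener = feature set of Z.
def rm_relation(target, intervener):
    """Classify the set-theoretic relation between the two feature sets."""
    t, z = frozenset(target), frozenset(intervener)
    if t == z:
        return "identity"      # (8I): ruled out by RM (*)
    if z < t:
        return "inclusion"     # (8II): intervener properly included in target
    if t.isdisjoint(z):
        return "disjunction"   # (8III): no shared features
    return "intersection"      # remaining case, cf. fn. 4

def acceptable(target, intervener, grammar="adult"):
    """Adult grammars (Starke 2001) tolerate inclusion; the stricter child
    system rules in only disjunction: one shared feature disrupts the chain."""
    rel = rm_relation(target, intervener)
    if grammar == "adult":
        return rel in ("inclusion", "disjunction")
    return rel == "disjunction"

# (9): 'which problem' (+Q, +NP) moving across 'how' (+Q): inclusion
print(rm_relation({"+Q", "+NP"}, {"+Q"}))                   # inclusion
# (10): headed object relative across a lexically restricted subject
print(acceptable({"+R", "+NP"}, {"+NP"}, grammar="adult"))  # True
print(acceptable({"+R", "+NP"}, {"+NP"}, grammar="child"))  # False
```

The last two lines reproduce the contrast at issue: the same inclusion configuration is ruled in by the adult system but out by the stricter child system.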


8.3 Object Relatives in Children

This approach permits a natural characterization of the difficulty that children experience with object relatives (and other A' constructions on the object).3 In Friedmann, Belletti, and Rizzi (2009) it is assumed that in the early systems a stricter version of the principle applies than in adult grammars, to the effect that also the inclusion configuration of (8) is ruled out: in this more restrictive system, even the identity of a single constitutive feature shared by the target and the intervener suffices to disrupt the relation. Thus in (1) it is the lexically restricted ([+NP]) character of both the relative head and the intervening subject which disrupts the chain connection between the relative head and the object gap for the child. This is illustrated in (10) for example (1); +R stands for the scope-discourse (or 'criterial') feature attracting the relative head, and +NP characterizes its lexically restricted nature, as is the case in headed relative clauses:

(10) Headed object relative crossing over a lexically restricted subject:

Show me the elephant that the lion is wetting <the elephant>
        +R, +NP           +NP                  +R, +NP
           X               Z                      Y

In the series of experiments reported in Friedmann, Belletti, and Rizzi (2009), twenty-two children speaking Modern Hebrew aged 3;7-5;0 (M = 4;6, SD = 0;5) were tested on the comprehension of different kinds of A' constructions in Hebrew. Expectedly, they comprehended well subject relatives, in which no intervention arises, but had severe problems in the comprehension of object relatives (90 percent SRs, 55 percent ORs).

In order to test the validity of the hypothesis that RM is involved in the problematic comprehension of (10), we manipulated the lexical restriction feature either on the relative head or on the intervening subject, thus turning the inclusion configuration of (10) into a disjunction configuration. The prediction was that comprehension should significantly improve in both cases. A construction illustrating the first type of manipulation is a free relative in which a lexically unrestricted wh-pronoun crosses over a subject as in (11):

(11) Free object relative:

Show me who the lion is wetting <who>
        +R      +NP             +R

The second manipulation is a structure where the lexically restricted relative head crosses over a lexically unrestricted subject pronoun as in (12):

3 The question whether the difficulty with object relatives is general or is restricted to languages with head-initial relative constructions is controversial. See Laka (this volume) for a discussion on Basque which highlights the possible relevance of ergativity over and above the headedness of the language.


(12) Headed object relative crossing over an intervening pronominal subject:

Show me the elephant that they are wetting <the elephant>
        +NP, +R            +Pron            +NP, +R

Here we have reported the English equivalent of the Hebrew constructions that were tested. The prediction of the RM approach was borne out in both cases. In the free relative case of the type in (11), comprehension of the object relative rose from 55 percent to 79 percent, getting close to the subject free relative value (84 percent). In the case of the crossing of a pronominal subject, comprehension of the headed object relative also rose, up to 83 percent (see Friedmann, Belletti, and Rizzi (2009): experiments 1, 3, and 4 for details).

Summarizing, these results showed that comprehension significantly improved in children if the target and the intervener were made structurally dissimilar, in fact featurally disjoint, with only one of them being lexically restricted.

As for adults, the selective violability of weak islands as in (9) shows that their grammar tolerates situations of featural inclusion. Hence, the crucial case of headed object relatives crossing over a lexically restricted subject is expected to be unproblematic. Under this approach, the same formal principle, RM, applies in a slightly stricter form in children than in adults. In the reference quoted, we speculated that the reluctance of children to admit configurations of proper inclusion relates to the difficulty of computing such configurations, which involve comparisons between feature sets. The child's system thus rules in only the simpler case of disjunction.4 This approach develops the same line of analysis as Grillo's (2008) approach to the difficulties that agrammatic speakers experience with different A' constructions on the object, a difficulty which is also related to RM, applying in systems with reduced computational resources.

8.4 Passive and Intervention

Another well-known instance of a structure which successfully overcomes a potential intervention configuration in adult grammars is the passive. If the external argument in a passive structure is syntactically projected (as would follow from Baker's 1988 Uniformity of Theta Assignment Hypothesis, a hypothesis also supported by the fact that the external argument is a syntactically active implicit argument), then movement of the object to the clausal EPP position would seem to cross over the position of the intervening external argument, a straight violation of RM. Collins (2005) has proposed that intervention here is voided by a preliminary leftward movement of a

4 See Belletti, Friedmann, Brunato, and Rizzi (2010/submitted) for the discussion of the remaining set-theoretic relation, intersection, in children and adults in the context of the discussion of the possible status of the gender feature.

The greater complexity of object relatives manifests itself also with adults in a variety of experimental conditions in psycholinguistics: see Warren and Gibson (2002) and much experimental work discussed there.


VP chunk containing the verb and the object and excluding the external argument, the operation he refers to as smuggling. The object can then move from the derived position of the VP chunk, without violating RM, as illustrated in (13); to enhance clarity, the moved VP chunk (smuggling the object DP, in Collins's terms) is highlighted in grey in (13):

The passive is also delayed in acquisition (a well-known developmental finding; e.g., Borer and Wexler 1987, and much subsequent work). Under Collins's analysis, it is natural to conjecture that the complexity of passive structures may be due to the operation moving the VP chunk (smuggling) that they involve (see Hyams and Snyder 2007 for a similar proposal).

If an analysis à la Collins is adopted, the way intervention is avoided in passive structures is different from what we have seen in object relatives: in the passive, intervention is simply eliminated through the additional movement operation of the VP chunk. This opens up the interesting possibility that object relatives and the passive develop independently. We may then expect that the cost in terms of complexity in child grammar will not be equal, one being favored over the other. That this may be so is shown by striking recent experimental results of elicited production of object relative clauses in Italian (an adaptation of Novogrodsky and Friedmann's 2006 design). At the age when the passive starts being acquired by children (after around age five in Italian), one clearly preferred strategy for coping with the production of the complex object relative structure is to transform it into a subject relative through the passive (Belletti 2009; Utzeri 2007; Belletti and Contemori 2010, and references cited there).

As an illustration, consider the pair in (14); instead of the object relative that was targeted in the elicitation experiment, children often opted for the production of a subject relative with the passive:

(14) Elicited Object Relative:
       Vorrei essere il bambino che la mamma copre
       'I would rather be the child that the mother covers'

     Produced (Subject) Relative:
       Vorrei essere il bambino che è coperto dalla mamma
       'I would rather be the child that is covered by the mother'

The passive choice follows a clear developmental path, with older children adopting it to a larger extent (up to 59 percent in children aged six to eleven; from 8 percent at age five to 22 percent at age six in younger children. Children had no problems in producing subject relatives, also elicited in the experiment, from the youngest group on: e.g., 89 percent at age four; see Belletti (2009) and Belletti and Contemori

(13) [TP [VP V DP] by [vP DP <[VP V DP]>]]


(2010) for detailed discussion).5 In the same elicitation experiment, two groups of Italian adults were tested: the results were particularly striking, as the two groups produced up to 93 percent and 87 percent of relatives with the passive, respectively, when an object relative of the type in (14) was the target of elicitation (elicited subject relatives were produced at ceiling, e.g., 99.5 percent in the second group). Clearly, use of the passive yielding the production of a subject relative in place of an object relative, with no change in the distribution of the theta roles associated to the different arguments, is a favored option under the experimental conditions set in the elicitation design; and children tend to approach the adult performance as they grow older.6

From the perspective of the question raised above, it can be concluded from these results that not only do object relatives and the passive develop independently, but also that the computation involved in the passive is favored over the one involved in the direct derivation of an object relative. So, through the passive, the (headed) object relative is realized as a subject relative, preserving the intended meaning and the interpretation of the arguments. And the passive is adopted in the same relative environment, as soon as development allows it.7

Why should the passive be favored over the direct computation of an object relative? It is natural to think that the relevant crucial notion is intervention: the derivation where intervention is totally avoided, as happens in the passive under the analysis implicating movement of the verbal chunk through smuggling, is the derivation which is adopted.8

Moving verbal chunks through operations ultimately reducible to Collins's smuggling, where portions of the verb phrase are involved, is a fairly widespread computational option; its instantiation in the passive is just one of its occurrences. The fact that intervention by the external argument is avoided through this derivational mechanism in the case of the passive is just one of its consequences. Hence, there is no special status of smuggling strictly related to passive in this respect.

Another structure where movement of a verbal chunk can be argued to be implicated in a manner similar to the passive is with psych-verbs of the preoccupare (worry) class, illustrated in (15) (see Belletti and Rizzi 2009 for further details; Belletti and Rizzi 1988 for the original analysis of psych-verbs on which the new account is based):

5 Data on older children, currently under collection in ongoing work by C. Contemori and A. Belletti, confirm this path.

6 Similar results have also been found in various other languages for children at age five (Friedmann et al. (in prep.) report results from several languages investigated through the same elicitation method under the frame of the COST Action A33, Cross-linguistically Robust Stages of Children's Linguistic Performance).

7 Younger children up to age five tend to produce different structures, often misinterpreting the task, when an object relative is elicited. See the references quoted for detailed discussion.

8 Of course, this holds all other things being equal, including the general appropriateness of the derived sentence, also from the point of view of its informational value, an aspect that we do not discuss here for reasons of space. See also Gehrke and Grillo (2009) on a semantic motivation for movement of a verbal chunk in the passive, tied to the event structure of the verb phrase.



(15) a. Questo problema preoccupa Gianni
        'This problem worries G.'

     b. [ ... [vP ... v [VP Experiencer v' [VP preoccupa Theme]]]]

In (15b), the verbal chunk containing the verb and the theme internal argument is moved to some higher position within a complex vP shell, crossing over the higher experiencer argument; extraction of the theme to the clausal EPP position to yield the structure in (15a) can occur from this derived position. Again, intervention is avoided in exactly the same fashion as in the passive in this derivation. The only difference is that the movement of the verbal chunk is triggered here by some lexical property of the verb of this class, not by some morphosyntactic property, as is the case in the passive.9 Similar manipulations of the clausal structure involving different portions of the verb phrase, triggered by different properties—lexical, morphosyntactic, informational—have been proposed in the literature: e.g., in Cinque (1999) the movement of the portion of functional structure immediately including the verb phrase produces an apparent reordering of the adverbial hierarchy; in Belletti (2004) the movement of a portion of the verb phrase containing the verb and object and excluding the external argument yields a vP-peripheral clause-internal topicalization of the verb-object sequence, with the subject remaining postverbal; and in ongoing work by D. Sportiche and I. Roberts, Romance causatives are analyzed along lines which capture the fundamental insight of classical analyses making reference to VP-preposing (Burzio 1986; Rouveret and Vergnaud 1980). This very sketchy list is simply meant to indicate that the process which moves a verbal chunk in the passive, with the effect of remodulating intervention from the external argument with respect to the internal argument, is not an isolated process. From an acquisition perspective the process seems to need some time to develop, as is shown by the delay of the passive in the first years of syntactic development; but once it develops, children and adults appear to resort to it quite extensively. Avoiding intervention through smuggling is thus a widespread option across the board for adults and older children.

8.5 Speculations on Subject Control and Intervention

Going back to control, we may now try to explain the delay of subject control with promise-type verbs through the same theoretical ingredients. Consider the following possible approach to control: PRO and its controller must be connected by a search operation (Agree-like, as in N. Chomsky 2000) constrained by RM. Control is

9 In Belletti and Rizzi (2009), we speculate that some 'cause'-type verbal head (or some close equivalent in systems of lexical decomposition as in, e.g., Ramchand 2008) could be the trigger of the verbal chunk movement in this case, thus incorporating Pesetsky's (1995) observations on the special interpretive status of the theme in his comments to our original (1988) analysis. See the reference quoted for more detailed discussion.


therefore local and obeys the Minimal Distance Principle, now subsumed under RM (a result that can thus be achieved without necessarily adopting a movement approach to control: see Hornstein 1999, Landau 2003 for discussion).

If this is so, subject control across an intervening object should be barred in principle:

(16) Bill promised John [ PRO to leave early ]

This straightforwardly accounts for the fact that children systematically misinterpret such sentences as cases of object control, Carol Chomsky's result. But why is subject control possible at all in adult grammar? We have considered two possible techniques that avoid intervention in different types of local relations:

1. The intervention configuration holds, but the featural specification of the intervener is properly included in the featural specification of the target (this is the case of object relatives in adult grammars).

2. The intervention configuration is destroyed by a movement of a verbal chunk ('smuggling'), which bypasses the intervener.

So, a natural possibility to explore is that subject control across an object in the adult grammar may involve one of these avoidance techniques. A recourse to technique 1 does not look very plausible: subject and object in (16) don't look amenable to a natural featural differentiation making the object's specification a proper subset of the subject's specification and making the case different from a well-formed object control:

(17) Bill ordered John [ PRO to leave early ]

So, we are left with hypothesis 2: subject control verbs undergo a smuggling-type process which makes the subject the closest controller for PRO. In this sense, subject control across an object would be akin to raising across an experiencer (John seems to Mary to be a nice guy), which, according to Collins (2005), also involves a smuggling operation voiding the intervention of the experiencer (again, under the approach to control we have outlined, this analogy between control and raising would not imply a raising analysis of control). Smuggling operations are costly (even though they may be less costly than the other avoidance technique of computing inclusion relations, as we have argued in section 8.4), and are acquired only in a relatively late temporal window, as the case of passive shows: this would give us a key to understanding the delay of subject control in acquisition.

But why would a smuggling-like operation apply with promise-type verbs, and not with order-type verbs? Another quote from Carol Chomsky's seminal book is helpful here:


Promise is in a distinct syntactic category from these command verbs [order, force, compel, require, ...]. We may say that each semantic class (command verbs on the one hand and promise on the other) has associated with it a separate syntactic process. (C. Chomsky 1969: 12)

We would like to try to follow Carol's hint that the syntax-lexical semantics interface plays a critical role here by sketching out an analysis in terms of lexical decomposition of the verb-structured meaning à la Hale and Keyser (1993) and much subsequent work in a similar vein on the lexical and syntactic properties of verbs (in particular, Folli and Harley 2007; Ramchand 2008; Travis 2000). Promise and order allow paraphrases with different light verbs plus nominal elements:

(18) John made Bill the promise to leave early

(19) John gave Bill the order to leave early

Following this type of approach, we will assume that lexical verbs such as promise and order are obtained through incorporation of the lexical object noun into the relevant light verb. If give in (19) is further decomposed as make-have, we have the following representation (in which v_x is a light verb with 'interpretive flavour' x):

(20) John v_make [ Bill v_have [ order [PRO to go ]]]

Here order incorporates first into the light verb v_have and then into v_make; the object Bill is the closest potential controller for PRO in the derived representation, hence we have object control in this case.

Consider now promise. If (18) is an accurate paraphrase, close enough to the abstract syntactic representation, and v_make does not further decompose, we do not have a verbal small clause comparable to (20). How is the object integrated into the structure then? We may think that Bill in (18) is a kind of benefactive of the promise. If the benefactive relation is mediated by a benefactive (ben) particle-like functional head, we would have the following representation:

(21) John v_make [ Bill ben [ promise [PRO to go ]]]

Here the noun promise should incorporate into the light verb v_make to create the verb promise. But in (21) the two elements aren't local enough for incorporation to take place, because of the intervening head ben. The problem can be overcome by a leftward movement of the chunk [promise [PRO to go]], which 'smuggles' promise to a position suitable for incorporation:

(22) John v_make [promise [PRO to go]] [ Bill ben t ]

At this point the noun promise can incorporate into the light verb; the surface word order is obtained through extraposition of the infinitive:

(23) John promise+v_make [[ t_promise t_infinitive ] [ Bill ben t ]] [PRO to go]10

10 We suggest that extraposition is made compulsory here by the necessity of having Bill adjacent to the case-assigning/checking v. One might observe that extraposition of the infinitival clause to a high


On this representation the object Bill does not c-command PRO, hence it does not structurally intervene between the subject and PRO. Subject control thus obtains, as the subject is the closest potential controller. A crucial step for producing the relevant configuration for subject control across the benefactive object is the 'smuggling' operation extracting the phrase [promise [PRO to go]] from the c-domain of the object. The delay of subject control can thus be looked at as a particular case of the delay that children experience with smuggling operations.

Subject control involves, in this analysis, at least as much derivational machinery as passive, and in fact more: the obligatory extraposition of the infinitive, arguably motivated by case-theoretic considerations (see fn. 10), makes the derivational computation of subject control, under our analysis, more complex than passive.11 It may thus be expected that subject control develops even later than passive, and with the individual variation that Carol Chomsky discovered.

structural position in the clause not c-commanded by the object would be sufficient to void the intervention configuration; no appeal to movement of the verbal chunk would be needed under this view. However, there would be no principled reason to force obligatory extraposition exactly with these verbs; the option would be stated in an ad hoc way, specific to this particular lexical choice and construction. In contrast, the approach we have elaborated in the text suggests principled reasons as to why both movement of a verbal chunk and extraposition should occur with subject control verbs. We have expressed extraposition in traditional rightward terms, but an antisymmetric analysis with double movement to the left (Kayne 1994) would also yield the required configuration.

11 The derivational complexity of subject control would be akin to the derivational complexity of raising across an overt experiencer, which also involves both smuggling and extraposition under Collins's analysis. Hirsch and Wexler (2007b) have in fact provided experimental evidence that comprehension of raising across an experiencer is substantially delayed in acquisition. We would expect its developmental course to be roughly on a par with the development of subject control.


9

Merging from the Temporal Input: On Subject-Object Asymmetries and an Ergative Language*

ITZIAR LAKA

Our first task in the study of a particular structure implicit to adult language behavior is to ascertain its source rather than immediately assuming that it is grammatically relevant.

Bever (1970)

9.1 Introduction

Research often reveals the abyssal depth of our ignorance in what might initially have seemed like a shallow puddle; discovering this is essential to formulating new questions, to finding new answers to old ones, and to discarding those questions that turn out to be ill-formed. Carol Chomsky saw a sea of mystery in a child's tentative, 'half-cooked' language, though according to most linguists and thinkers of her time this pool of evidence held shallow scientific potential. However, her experimental approach to language acquisition revealed that syntactically complex structures took significantly longer to master than was originally assumed, and exposed the gulf between children's language production and their linguistic knowledge. It also underscored the importance of finding new means to study language, not solely

* I wish to thank Robert Berwick and Massimo Piattelli-Palmarini for the invitation to take part in Rich Languages from Poor Inputs: A Workshop in Honor of Carol Chomsky. I also thank the audience and participants, particularly Adriana Belletti, discussant of my talk. Thanks to Jon Andoni Duñabeitia, Manuel Carreiras, and Maria Polinsky, and to colleagues in The Bilingual Mind research group, Kepa Erdocia, Irene de la Cruz-Pavía, Mikel Santesteban, Iraia Yetano, and Adam Zawiszewski, for their feedback and suggestions. Errors and misinterpretations are solely mine. This research has been funded by the Spanish Ministry of Education and Science (CSD2007-00012), the Spanish Ministry of Science and Innovation (FFI2009-09695), and the Basque Council for Education, Universities, and Research (IT414-10).



based on what speakers say, but also on how they comprehend what is said to them (C. Chomsky 1969).

Relative clauses are among those complex syntactic structures that children master relatively late, and adult language processing displays asymmetries that intriguingly parallel those found in acquisition. Children do not understand relatives in an adult-like fashion before the age of six (Roth 1984), and they start producing them at about three years of age (Crain, McKee, and Emiliani 1990). Relative clauses containing subject gaps (1a) are acquired and produced earlier than relatives containing object gaps (1b) (Brown 1972). It has been widely assumed in the literature that these asymmetries are invariant across languages and rooted in deep universal aspects of linguistic structure. I will contend instead that they are subject to variation, and depend on external aspects of linguistic form largely independent of syntactic structure, though extremely relevant to the study of language use.

I will discuss how certain aspects of linguistic form specific to a language belonging to an understudied type of languages (ergative languages) yield processing results and acquisition patterns that have hitherto hardly ever been reported from studies of a well-studied type of languages (nominative languages). Specifically, I will discuss some recent results from studies on relative-clause processing in Basque that are incompatible with the widely held assumption that subject-object language processing asymmetries are universal and that they tap into deep aspects of linguistic structure involving the core grammatical functions 'subject-of' and 'object-of'. As I will argue, the processing results obtained in Basque do not entail that the structural location of subjects and objects in ergative and nominative languages is different; rather, they entail that morphological differences and input-initial choices have non-trivial consequences for processing.

9.2 Subject-Object Asymmetries in Language Processing

Studies from many languages report that relative clauses containing a gap in subject position are easier to process than relative clauses containing a gap in object position. These two types of relative clauses are illustrated in (1a) and (1b) for English.

(1) a. The woman_i [CP that e_i saw the man] arrived early.

    b. The woman_i [CP that the man saw e_i] arrived early.

This effect has been widely reported for languages where relative clauses follow their antecedent noun, such as English, Dutch, French, German, and Spanish.1 In other

1 For English: Caplan et al. (2002); Ford (1983); King and Just (1991); King and Kutas (1995); Gibson, Hickok, and Schütze (1994); Gordon, Hendrick, and Johnson (2001); Traxler, Morris, and Seely (2002); Weckerly and Kutas (1999). For Dutch: Frazier (1987); Mak, Vonk, and Schriefers (2002, 2006). For French: Cohen and Mehler (1996); Frauenfelder, Segui, and Mehler (1980); Holmes and O'Regan (1981). For German: Mecklinger, Schriefers, Steinhauer, and Friederici (1995); Schriefers, Friederici, and Kühn (1995). For Spanish: Betancort, Carreiras, and Sturt (2009).


languages, relative clauses precede the antecedent noun, and the gap position precedes its antecedent, as in the Japanese examples in (2), where (2a) is a subject gap relative and (2b) is an object gap relative.

(2) a. [e_i uma-o ketta] robai-ga_i sinda.
          horse-acc kicked mule-nom died

       'The mule that kicked the horse died.'

    b. [CP uma-ga e_i ketta] robai-ga_i sinda.
          horse-nom kicked mule-nom died

       'The mule that the horse kicked died.' (from Ishizuka 2005)

Most studies from languages with prenominal relative clauses also report that subject gap relatives like (2a) are faster and easier to process than object gap relatives like (2b) (see Lin 2006, 2008 for an overview; Lin and Bever 2006 for Chinese; Miyamoto and Nakamura 2003, Ishizuka 2005, Ueno and Garnsey 2008 for Japanese; Kwon, Polinsky, and Kluender 2006, Kwon et al. forthcoming for Korean). However, Hsiao and Gibson 2003 report an object-relative advantage in doubly embedded Chinese relatives, and a couple of studies find that contextual cues facilitate object-relative processing (Ishizuka et al. 2006 for Japanese; Wu and Gibson 2008 for Chinese).

Several hypotheses have been put forth to explain these subject-object asymmetries in relative-clause processing. These accounts can be grouped into two classes. One group of hypotheses argues that this effect is rooted in fundamental properties of subjects and objects, and therefore predicts the effect to be universal, revealing a fact of the internal architecture of human language. Another group of hypotheses argues that the effect is ultimately due to the temporal arrangement of the linguistic input, and thus predicts that the effect can be reversed if the temporal arrangement of the input varies.

Let us consider the first group of accounts. Universal subject preference hypotheses argue that subject-object processing asymmetries follow from the greater saliency of subjects relative to objects in human language, where this saliency can be differently conceived, either as an inherent property of grammatical functions, understood as linguistic primitives, or as derived from general properties of syntactic structure.

The Accessibility Hierarchy (Keenan and Comrie 1977) ascribes the advantage of subjects to an inherent cognitive preeminence of this grammatical function as compared to others. In its original form it states that grammatical functions are universally arranged in a hierarchy that determines their relative accessibility for relative-clause formation, where subjects are higher than objects and therefore more accessible. The hierarchy was originally postulated to account for relativization patterns in language typology and was later extended to other domains of grammar, language acquisition, and processing (see Kwon, Polinsky, and Kluender 2006 for a critical discussion of its


130 Laka

explanatory force). Given this account, (1a) and (2a) are easier to process because they relativize subjects, whereas (1b) and (2b) are harder because they relativize objects.

(3) The Accessibility Hierarchy
    Subject > Direct Object > Indirect Object > Oblique > Genitive > Object of Comparison

The Perspective Shift Hypothesis (Bever 1970; MacWhinney 1977, 1982; MacWhinney and Pleh 1988) states that sentential subjects set the discourse perspective, and that a processing event involving a perspective shift is more costly than a processing event where perspective is kept constant. Processing a subject relative clause entails no perspective shift, but object relative clauses induce a shift to a new subject in the embedded clause, thus generating a complexity effect. That is, (1a) and (2a) are easier to process because they involve the same subject with no perspective shift, whereas (1b) and (2b) are harder to process because they involve different subjects in the main and relative clause, with a shift in perspective.

The Structural Distance Hypothesis (O'Grady et al. 2003) appeals to the saliency of subjects in syntax: subjects are higher than objects in syntactic structure, and thus the structural distance between antecedent and gap is always greater for an object gap structure, because the distance involved in a syntactic operation, calculated in terms of the number of nodes crossed, is always larger for object dependencies than for subject dependencies. It is widely agreed in linguistics that all known languages share this hierarchical arrangement (Chomsky 1965, 1995). If this is a universal property of language, then according to the Structural Distance Hypothesis too, subject gap relatives must be easier to process independently of the language observed.

Relativized Minimality, henceforth RM (Rizzi 1990; Belletti and Rizzi, this volume), is a general account of locality effects in syntax in which the intervention of a possible antecedent has an impact on syntactic dependencies. RM views relative-clause processing asymmetries as emerging from intervention effects on the resolution of the syntactic dependency between the gap and its antecedent (Friedmann et al. 2009). If an antecedent-like phrase structurally intervenes between the antecedent and the gap, it will increase processing difficulty. This account predicts that, given clauses involving similar subjects and objects (where similarity is dependent on the features of the DPs), object relative clauses are harder to process than subject relatives due to the structural intervention of the subject DP between the object gap and its antecedent, an intervention that does not occur in the case of the subject relative, as illustrated in (4):

(4) a. DPᵢ [CP ... eᵢ ... [VP ... DP ... ] ... ]

    b. [CP ... eᵢ ... [VP ... DP ... ] ... ] DPᵢ

    c. DPᵢ [CP ... DP ... [VP ... eᵢ ... ] ... ]

    d. [CP ... DP ... [VP ... eᵢ ... ] ... ] DPᵢ


The schematic structural representations in (4a) and (4b) replicate the syntactic skeleton of the subject gap relative clauses (1a) and (2a) respectively. Regardless of temporal order, no DP structurally intervenes between the gap and the antecedent. The reverse is true of (4c) and (4d), representations of the object gap relatives (1b) and (2b) respectively, where the subject DP structurally intervenes between the antecedent and the object gap. Both the Structural Distance Hypothesis and RM share the idea that processing difficulty depends on syntactic structure and is independent of the temporal order of linguistic elements. However, they differ significantly in the predictions they make regarding the similarity/dissimilarity of the phrases involved in the sentences, which, according to RM, crucially modulates the processing difficulty of the structure. Thus, according to the RM account, processing difficulty increases as the similarity of the antecedent and intervener increases (Friedmann et al. 2009; Adani et al. 2010). In this respect, this account relies both on invariable aspects of linguistic structure and on specific features of the arguments involved in the structures.

Many studies on subject/object relative-clause processing have shown that several factors related to the features of the DPs involved in the relative clauses modulate the relative processing difficulty of object relatives (Kidd et al. 2007). The processing difficulty of object relatives is reduced when the antecedent DP is inanimate (Mak et al. 2002, 2006; Traxler, Morris, and Seely 2002; Traxler, Williams, Blozis, and Morris 2005; Weckerly and Kutas 1999), and also when the clauses contain a pronominal subject or a proper noun subject (Gordon, Hendrick, and Johnson 2001; Warren and Gibson 2002, 2005). More recently, Adani et al. (2010) have shown that dissimilarities in grammatical features like number also diminish the relative difficulty of object relatives. Finally, another factor that has been shown to modulate the asymmetry between subject and object relative clauses is morphological case; in particular, relative clauses where the case of the gap and the case of the antecedent match are easier to process than clauses where the gap and the antecedent have different cases (Sauerland and Gibson 1998; Ishizuka 2005). Therefore, all these factors must be taken into account and controlled for to reliably inquire into the impact of subject versus object gaps in relative-clause processing asymmetries.

The group of hypotheses that claim the temporal arrangement of the linguistic input to be the locus of the asymmetry include working memory (Ford 1983; Frazier and Fodor 1978; Wanner and Maratsos 1978), integration costs (Gibson 1998, 2000; Hsiao and Gibson 2003), syntactic strategies such as the Active Filler Strategy and the Minimal Chain Principle (Clifton and Frazier 1989; Frazier and Flores d'Arcais 1989), the simultaneous influence of syntactic and non-syntactic information (MacDonald, Pearlmutter, and Seidenberg 1994; Trueswell, Tanenhaus, and Kello 1993), and differences in word order canonicity (e.g., Bever 1970; MacDonald and Christiansen 2002; Mitchell, Cuetos, Corley, and Brysbaert 1995; Tabor, Juliano, and Tanenhaus 1997). For an extensive review of these proposals, see Traxler et al. (2002) and Hsiao and Gibson (2003).


There are also frequency-based approaches, according to which processing complexity emerges from competition between alternative structures partially activated during processing, favoring the most frequent structure and rendering less frequent structures harder to process (Boland 1997; MacDonald 1994; MacDonald et al. 1994; McRae et al. 1998; Spivey-Knowlton and Sedivy 1995; Trueswell et al. 1994; Gennari and MacDonald 2008).

One relevant aspect of some of these accounts is the appeal to the temporal interval between the gap and the antecedent, with increasing temporal distance correlating with increasing complexity, as in the Dependency Locality Theory, DLT (Gibson 1998, 2000). The DLT predicts inverse asymmetry effects depending on the precedence relations: in prenominal relative clauses, the linear/temporal distance between the gap and the antecedent is greater in the subject gap relative (5a) than in the object gap relative (5b), regardless of the relative order of the verb and the object.2 The reverse is the case in languages with postnominal relative clauses (5c,d):

(5) a. [CP e_subj object/verb] antecedent    prenominal subject gap relative

    b. [CP subject e_obj/verb] antecedent    prenominal object gap relative

    c. antecedent [CP e_subj verb/object]    postnominal subject gap relative

    d. antecedent [CP subject verb/e_obj]    postnominal object gap relative

Therefore, a processing account based on the temporal interval between antecedent and gap such as the DLT predicts that whereas subject gap relatives will be easier to process in languages with postnominal relative clauses, object gap relatives will be easier in languages with prenominal relative clauses. This prediction is met in the results reported in Hsiao and Gibson (2003) for Chinese, but subsequent studies in Chinese, Japanese, and Korean that report a subject gap advantage cast doubt on the validity of the DLT as a general account of subject-object asymmetries (see references above, in particular Lin (2008) for a general discussion of this issue).

9.3 Subject/Object Prenominal Relative-Clause Asymmetries in an Ergative Grammar

In Carreiras et al. (2010), we studied the processing of subject and object relatives in Basque; our results showed faster and easier processing for object gap relative clauses as compared to subject gap relative clauses. Basque has prenominal relative clauses, like Chinese, Japanese, and Korean. But unlike all previously mentioned languages, including also those with postnominal relative clauses like (1), it is an ergative language. Ergative languages mark actor/undergoer core arguments of the verb in a way that is different from how nominative languages do it; this difference

2 This is indicated by the slanted bar in the examples: object/verb means either object-verb or verb-object order.


crucially involves the grammatical functions of subject and object. Hence, the study of processing asymmetries in an ergative grammar becomes particularly relevant in order to ascertain their cross-linguistic validity.

Approximately 25 percent of the world's languages exhibit ergativity in their grammars. Ergativity has been described in detail, and many ergative languages have been documented and studied, though the task of describing and understanding the phenomenon is far from complete (for overviews see Dixon 1994; Johns et al. 2006; Aldridge 2008; McGregor 2009; Laka and Fernandez 2012). Despite the growing amount of research on ergativity in linguistics, and despite the growing number of languages explored in language processing research in recent years, ergative languages have hardly been the object of processing studies so far.

The salient property of ergative languages is that the morphological marking of core verbal arguments diverges from that found in nominative languages. Nominative languages differentiate two classes of core arguments: (a) subjects (both transitive and intransitive) and (b) objects. Ergative languages also differentiate two main core argument types, but the classes are different: (a) one class consists of transitive subjects, and it is referred to as ergative; (b) the other class consists of intransitive subjects and objects,3 and it is referred to as absolutive. The ergative morphological pattern is illustrated in (6) for Basque:

(6) a. emakume-a-k gizon-a ikusi du
       woman-D-erg man-D seen has
       'the woman has seen the man'

    b. gizon-a etorri da
       man-D arrived is
       'the man arrived'

The object of the transitive sentence (6a) and the intransitive subject in (6b) are alike (they belong to the absolutive class), whereas the transitive subject in (6a) is different (ergative). In ergative languages, labels like absolutive and ergative most conveniently describe the morphological marking of core arguments. As can be seen by comparing (6) to their English translations, these two labels do not readily correspond to subject and object as they stand in this language. This trait makes ergative languages particularly relevant to the study of phenomena that involve subject-object asymmetries, precisely because these two types of arguments find no direct equivalents in the morphology of ergative languages. As has been repeatedly observed, '... ergativity raises a number of important problems for linguistic theory. (...) One such problem is the status and universality of subject (and to a lesser extent, object)

3 A full characterization of nominative and ergative marking systems is slightly more complex than the one just provided. Interested readers are referred to Dixon (1994), Johns, Massam, and Ndayiragije (2006), Aldridge (2008), McGregor (2009), Laka and Fernandez (2012), and references therein, where overviews of the phenomenon are provided in greater detail for a variety of languages.


as a grammatical relation, given the morphological groupings of ergative languages, (...). Ultimately we are led to questions of how grammatical relations are theorised' (McGregor 2009: 501).

The processing study in Carreiras et al. (2010) exploited a morphosyntactic ambiguity first used in Erdocia et al. (2009) for the exploration of word order processing complexity effects. This ambiguity involves the -ak ending on DPs,4 illustrated in (7). The ending is homophonous between a singular ergative morphology and a plural absolutive morphology. In each of the cases, the morphological structure of the DP is different. Plurality in Basque is marked solely in the determiner, which can be singular -a, as shown in (7a), or plural -ak, as shown in (7b).5 Determiner phrases must be marked with ergative case when they are transitive subjects, and the form of the ergative case marker is -k. As shown in (7c), the result of adding ergative case to a singular DP yields the sequence -ak, homophonous with the plural determiner in (7b). For completeness, (7d) shows the resulting form of merging the plural determiner -ak with the ergative marker -k, which is the ending -ek, unambiguously denoting a plural ergative DP.

(7) a. emakume-a                  c. emakume-a-k
       woman-D                       woman-D-erg
       'the woman'                   'the woman (ergative case)'

    b. emakume-ak                 d. emakume-ek
       woman-Dpl                     woman-Dpl+erg
       'the women'                   'the women (ergative case)'

Hence, upon encountering as input a word like emakumeak, two possible interpretations are compatible with Basque grammar: interpreting it as a singular ergative, meaning 'the woman', or interpreting it as a plural absolutive, meaning 'the women'. Remember that in ergative languages, subjects of transitive sentences are marked with ergative case, and objects are absolutive (8):

(8) a + k: [singular determiner] + [ergative case]

    a. emakume-a-k gizon-a ikusi du          Singular, Transitive Subject
       woman-D-erg man-D seen has
       'the woman has seen the man'

    ak: [plural determiner]

    b. zu-k emakume-ak ikusi dituzu          Plural, Object
       you-erg women-Dpl seen them-have-you
       'you have seen the women'

4 I am using the label DP, although it is less familiar to non-linguists, who are more used to the label NP. The label DP 'determiner phrase' is more accurate for this type of nominal structure (Bernstein 2001).

5 I leave aside other elements pertaining to the determiner category, such as demonstratives, which are not relevant to this discussion. The determiner -a, -ak is not always definite, that is, it is not always translationally equivalent to English the (Laka 1993).


    c. emakume-ak etorri dira                Plural, Unaccusative Subject
       women-Dpl arrived are
       'the women have arrived'
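The three-way pattern in (7)-(8) can be sketched as a toy morphological analyzer. This is only an illustrative sketch under my own labeling conventions (the function name and the Dsg/Dpl/erg/abs tags are mine, not from any Basque NLP tool); it encodes just the facts stated in the text: the determiner is -a (singular) or -ak (plural), ergative case adds -k, and plural determiner plus ergative fuses to -ek.

```python
def analyses(word: str):
    """Return the morphological analyses compatible with a DP's ending.

    Each analysis is (stem, determiner, case), using only the facts in
    (7)-(8): -a = sg determiner, -ak = pl determiner, -k = ergative,
    pl determiner + ergative -> -ek. Absolutive carries no overt marker.
    """
    results = []
    if word.endswith("ek"):                       # -e(k): pl det + ergative
        results.append((word[:-2], "Dpl", "erg"))
    elif word.endswith("ak"):                     # the ambiguous ending
        results.append((word[:-2], "Dsg", "erg"))   # -a (sg det) + -k (erg)
        results.append((word[:-2], "Dpl", "abs"))   # -ak (pl det), unmarked abs
    elif word.endswith("a"):                      # sg det, unmarked absolutive
        results.append((word[:-1], "Dsg", "abs"))
    return results

print(analyses("emakumea"))    # one parse: singular absolutive
print(analyses("emakumeak"))   # two parses: sg ergative vs. pl absolutive
print(analyses("emakumeek"))   # one parse: plural ergative
```

Only emakumeak comes back with two analyses, which is exactly the ambiguity the experimental sentences exploit.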

In Carreiras et al. (2010) we constructed relative clauses building upon this ambiguity. The experimental sentences were ambiguous between a subject gap and an object gap relative-clause interpretation until the verb at the end of the main sentence was encountered (9):

(9) Object gap relative clause
    a. [RC emakume-a-k [VP eᵢ ikusi] ditu-en] gizon-akᵢ lagun-ak dira orain
       [RC woman-D-erg [VP eᵢ seen] has-C] man-Dplᵢ friend-Dpl are now
       'the menᵢ [that the woman saw eᵢ] are friends now'

    Subject gap relative clause
    b. [RC eᵢ [VP emakume-ak ikusi] ditu-en] gizon-a-kᵢ lagun-ak ditu orain
       [RC eᵢ [VP woman-Dpl seen] has-C] man-D-ergᵢ friend-Dpl has now
       'the manᵢ [that eᵢ saw the women] has friends now'

The sentences were presented word by word, and given the ambiguity of the word strings, the asymmetric cost should arise at the final main verb. Note that there are no differences between the two types of clauses in (9) in terms of storage resources (Gibson 2000), because the number of unresolved dependencies is the same in the two clauses: only one verb is required to generate a grammatical sentence.6 The effect of word order canonicity can also be disregarded as a potential cause of an asymmetry (MacDonald and Christiansen 2002), because in neither type of relative clause do overt elements follow the SOV canonical order of the language (Erdocia et al. 2009).

In order to avoid effects due to the internal make-up of the arguments involved in the clauses, all the DPs in the experimental sentences are animate definite descriptions, and the cases of the gap and the antecedent always match: the gap and the antecedent bear absolutive case in the object relatives (9a), and the gap and the antecedent bear ergative case in the subject relatives (9b). Hence, the results reported here refer only to processing events where two animate definite descriptions are involved in the relative clauses, and the gap and the antecedent are matched for case. The results do not preclude a modulation of processing difficulty determined by dissimilarities between the arguments of the clause or by case mismatch effects. Our study intended to control for these further factors, and did so by making all arguments involved similar regarding their lexical/internal and morphological/external structures.

6 A grammatical sentence is created by simply adding a verb: ikasleak datoz 'the students arrive', akin to examples (6b), (8c), using a synthetically inflected verb.


Three experiments were carried out: two self-paced reading experiments and one ERP experiment. The only difference between the first and second self-paced reading experiments was the addition of a temporal adjunct after the verb of the main sentence in the second experiment, to avoid possible wrap-up effects at the end of the clause. The results of the two self-paced reading experiments showed that subject gap sentences took significantly longer to read than object gap sentences at the main verb (dira/ditu), the critical disambiguating region (10):

(10) Reading times for subject gap versus object gap relatives, taken from Carreiras et al. (2010): (a) Experiment 1; (b) Experiment 2. [figures not reproduced]


In addition, an ERP experiment was conducted with the materials of the second experiment. ERPs showed that subject gap sentences produced larger amplitudes than object gap sentences in the P600 window immediately after reading the critical disambiguating word (dira/ditu), which also indicates greater processing difficulty for subject gap sentences than for object gap sentences.

Thus, the results of the three experiments conducted clearly point to a greater processing difficulty for subject gap sentences than for object gap sentences, contrary to the majority of studies previously conducted in nominative languages. If subject gap syntactic structures are not universally easier to process than object gap structures, then accounts based on the inherent saliency or higher structural position of subjects cannot constitute a cross-linguistically valid account of processing asymmetries involving subjects and objects.

In order to determine whether frequency of occurrence correlated with these results, an initial corpus study was undertaken on a 25,000 word subset of the EPEC corpus (Aduriz et al. 2006), a 55,000 word sample collection of written standard Basque created and morphosyntactically tagged by the IXA group from the University of the Basque Country. Out of a total of 625 prenominal relative clauses found in the sample, 399 were subject gap relatives (approximately 64 percent), while only 226 were object relatives (approximately 36 percent). Subsequently, Yetano (2009) conducted a frequency study on the entirety of the EPEC corpus (300,000 words); results revealed that out of the total 1,509 relative clauses in the corpus that involved subjects or objects as gaps, 65.6 percent were subject relatives, whereas 34.4 percent were object relatives. These data make a constraint satisfaction approach based on frequency unsuitable to account for the findings: there is no correlation between the frequency of occurrence of subject versus object relative clauses and the processing asymmetry found. If frequency were the factor modulating processing difficulty, then the subject gap relative clause should have turned out to be easier to process than the object gap relative, contrary to results.
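The percentages reported for the EPEC subset follow directly from the cited counts; a one-line computation makes the arithmetic explicit (this is only a sanity check on the numbers quoted above, not part of the original study):

```python
# Relative-clause counts from the 25,000-word EPEC subset cited in the text.
subset = {"subject_gap": 399, "object_gap": 226}
total = sum(subset.values())                      # 625 prenominal relatives
pct = {k: round(100 * n / total) for k, n in subset.items()}
print(total, pct)                                 # 625 {'subject_gap': 64, 'object_gap': 36}
```

Subject gap relatives are thus roughly twice as frequent as object gap relatives, yet they were the harder ones to process.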

The object gap relative-clause advantage found in Basque is compatible with the Dependency Locality Theory (Gibson 1998), based on temporal/linear distance, because the object gap in (9a) is temporally/linearly closer to its antecedent than the subject gap in (9b). However, as mentioned before, results from other languages with prenominal relative clauses, where a subject preference has been reported, cast doubt on the cross-linguistic explanatory power of the DLT (Lin 2008). If results from languages with prenominal relative clauses do not converge, then some other factor or factors must be at play behind the inverse effect found in Basque.

If neither temporal nor structural distance can provide a comprehensive account of subject/object relative-clause processing asymmetries across languages, we must seek alternative factors underlying the effect. Language-specific properties are a plausible candidate, because language processing must handle externalized language forms, and


infer syntactic structure therefrom. Given the view that the most plausible locus for language variation is morphology (Chomsky 1995), and given the fact that ergativity in Basque is a morphological phenomenon (Levin 1983), this linguistic trait stands out as a likely source for this divergent pattern of processing asymmetries, because it directly involves morphological case marking of core arguments.

For instance, if we consider morphological markedness, core argument marking in ergative languages entails that, given a transitive clause, the object is generally unmarked, and morphologically identical to the subject of an intransitive clause. For the case of Basque in particular, this is certainly borne out, because the absolutive case carries no overt marker, whereas the ergative case is marked with the morpheme -k, as shown earlier in (6), (7), and (8).

(11) Nominative languages               Ergative languages
     Transitive Subject: nominative     Transitive Subject: ergative
     Intransitive Subject: nominative   Intransitive Subject: absolutive
     Object: accusative                 Object: absolutive

If morphological markedness correlates with processing difficulty (Baayen et al. 1997; Badecker and Kuminiak 2007), then opposite patterns of complexity should arise in each type of grammar: nominative-accusative languages should typically display a nominative (= subject) advantage, but, given the same underlying processing mechanism, ergative languages should display an absolutive advantage (remember that absolutive includes objects and intransitive subjects, that is, patient-like arguments, but excludes transitive subjects, that is, agent-like arguments). Hence, the ergative morphological marking pattern can explain why an absolutive/object gap advantage obtains when transitive relative clauses are processed in Basque.

An account involving morphological markedness and ergativity at its source gains plausibility given a recent relative-clause comprehension study in Basque, which converges with the results we have just discussed. In a picture-matching task study, carried out by two groups of children of four and six years of age respectively and a group of young adults, Gutierrez (2010) found, for all three groups, that performance on absolutive/object gap relatives was significantly better than on ergative/subject relatives: the former are comprehended with more accuracy than the latter, which suggests a measure of processing difficulty.

These results are thus compatible with the hypothesis that morphologically unmarked antecedent-gap dependencies are easier to process: for nominative languages, unmarked dependencies correspond to nominative/subject gap relative clauses, but in ergative languages unmarked dependencies are those established on absolutive arguments, which include objects. In the specific case of transitive


sentences, as the ones considered here, this predicts a subject gap relative-clause advantage for the class of nominative languages, but an object gap relative-clause advantage for the class of ergative languages.
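The markedness-based prediction for transitive relative clauses can be rendered as a small lookup. This is my own schematic sketch (the names are hypothetical, not from the cited work), encoding only two assumptions from the text: the case each alignment type assigns to transitive subjects and objects, and the claim that the morphologically unmarked case yields the easier antecedent-gap dependency:

```python
# Case assigned to each core argument role in a transitive clause.
ROLE_CASE = {
    "nominative": {"transitive subject": "nominative", "object": "accusative"},
    "ergative":   {"transitive subject": "ergative",   "object": "absolutive"},
}
# Assumption from the text: the morphologically unmarked case per alignment.
UNMARKED = {"nominative": "nominative", "ergative": "absolutive"}

def predicted_easier_gap(alignment: str) -> str:
    """Return the transitive relative-clause gap predicted to be easier:
    the argument role whose case is the unmarked one for this alignment."""
    for role, case in ROLE_CASE[alignment].items():
        if case == UNMARKED[alignment]:
            return role
    raise ValueError(f"no unmarked role for {alignment}")

print(predicted_easier_gap("nominative"))  # subject gap advantage
print(predicted_easier_gap("ergative"))    # object gap advantage, as in Basque
```

The same assumption thus derives the subject advantage reported for nominative languages and the object advantage found in Basque.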

9.4 Subject Processing Strategies in an Ergative Grammar

Another factor at play in the object gap advantage found in Basque involves input-initial processing choices, which, given the incremental/deterministic nature of language processing and the temporal order of the input, can lead to preferences that favor the object gap interpretation in this language. Since it was first proposed by Frazier and Fodor (1978), it has been generally accepted that, when confronted with a morphosyntactically ambiguous structure, processing hypotheses favor the one leading to the simplest structure: 'We assume in addition that if there is a choice of actions to take, then the processing system will again mirror the grammar, and so favor the "economy" condition that the closest adjacent feature should be checked, rather than delaying to a later point in the derivation' (Berwick 2011). This general principle of processing economy predicts that input-initial nominal phrases will be taken to be sentential subjects unless there is overt evidence to the contrary. The prediction is strongly borne out across languages; there is ample evidence for a subject-first processing strategy in languages like Dutch, German, English, Italian, and Turkish, some of which are head-initial, others head-final, some pro-drop and others not (e.g., de Vincenzi 1991; Frazier 1987; Schriefers et al. 1995; Demiral et al. 2008). It has been proposed that the subject-first preference is just a corollary of the minimalist nature of language processing, such that it postulates the minimal syntactic structure consistent with the input (Chomsky 1995; Bornkessel-Schlesewsky and Schlesewsky 2009; Berwick 2011).

When encountering a morphologically ambiguous nominal constituent like emakumeak, Basque speakers can in principle process that phrase as (i) a singular ergative case-marked DP, 'the woman', or as (ii) a plural absolutive DP, 'the women', as illustrated earlier in (8). In turn, these two morphological choices combine with the fact that Basque is a pro-drop language where arguments need not be overt, and with the fact that word order is not fixed. Hence, upon encountering a phrase like emakumeak, the grammar of the language allows at least the following (repeated here from examples in (8) for convenience):

(12) a. emakume-ak etorri dira               Plural Intransitive Subject
        women-Dpl arrived are
        'the women have arrived'

     b. emakume-a-k gizon-a ikusi du         Singular Transitive Subject
        woman-D-erg man-D seen has
        'the woman has seen the man'

     c. proᵢ emakume-ak ikusi dit-u-zuᵢ      Plural Object
        (you-erg) women-Dpl seen them-have-you
        'you have seen the women'

Minimalist considerations rule out the choice in (12c), for it requires postulating additional syntactic structure to accommodate the null argument in a transitive structure. Indeed, minimal processing predicts that the choice for the ambiguous emakumeak should be (12a), for this is the simplest syntactic structure consistent with the input; namely, the sentence containing an intransitive/unaccusative verb with a single argument. However, relative-clause processing data indicate that subjects choose to interpret the ambiguous phrase emakumeak as a singular ergative subject (12b), contrary to prediction. Let us see this in greater detail.

If the preferred choice for an ambiguous -ak phrase were absolutive plural (12a) instead of ergative singular (12b), then incremental processing should generate a garden path effect in the case of the object gap relative, but no garden path effect should emerge for the subject gap relative. Consider (13a), where a sentence-initial DP like emakumeak is encountered. Given that it can be an intransitive/unaccusative absolutive subject, minimal processing makes the choice of parsing it as a plural absolutive phrase, 'the women'. Once this choice is made, when the verb ikusi 'to see' is encountered (13b), we must now assume that emakumeak is the object of a transitive sentence; given that Basque is a head-final, free word order, and pro-drop language, the transitive subject can appear later in the sentence or not at all. Notice that in an ergative grammar it is not an obvious matter whether the change from absolutive subject to absolutive object involves a significant change in argument role identification, of the kind that would be involved for a similar change in a nominative language (nominative to accusative). In any event, once the verb ikusi is encountered, emakumeak (if absolutive) must be the object of the transitive clause.

(13) a. emakume-ak...
        woman-Dpl

     b. emakume-ak ikusi...
        woman-Dpl seen

     c. emakume-ak ikusi ditu-en...
        woman-Dpl seen has-Comp

When the inflected auxiliary verb is encountered (to which the complementizer is attached), its form makes it clear that (i) the ergative subject is singular and object agreement is plural (morpheme -it-), and (ii) this is a relative clause. Given the initial choice for emakumeak as a plural absolutive, and since the relative clause is now 'closed' by the complementizer -en, we must postulate the ergative gap in the clause, which is coreferent with the head of the relative immediately following the inflected auxiliary:


(14) [RC eᵢ [VP emakume-ak ikusi] ditu-en] gizon-a-kᵢ lagun-ak ditu orain
     [RC eᵢ [VP woman-Dpl seen] has-C] man-D-ergᵢ friend-Dpl has now
     'the manᵢ [that eᵢ saw the women] has friends now'

In other words, if subjects had made an intransitive subject choice for the ambiguous sentence-initial DP emakumeak, this should have resulted in a subject gap relative-clause processing preference, contrary to the results obtained. Instead, speakers preferred to process the ambiguous sentence-initial DP as a singular ergative phrase, that is, as a transitive subject, which explains why object gap relative clauses were processed faster and with greater ease:7

(15) [RC emakume-a-k [VP e_i ikusi] ditu-en] gizon-ak_i lagun-ak dira orain
     [RC woman-D-erg [VP e_i seen] has-C] man-Dpl_i friend-Dpl are now
     'the men_i [that the woman saw e_i] are friends now'

Hence, a sentence-initial choice favoring the ergative over the absolutive has as a consequence that the subject gap disambiguation at sentence-final position results in a garden path, while no such effect emerges in the case of the object gap disambiguation. The increase in reading times for the subject gap disambiguation as compared to the object gap disambiguation shown in (10) strongly suggests this is in fact the case.

Given that ergative marked arguments are unequivocally subjects, this ergative preference can be interpreted as strong evidence of a 'subject-first' processing strategy, independently reported in most processing studies to date. However, it is worthwhile noting that this processing choice in favor of the ergative would violate the principle of minimal structure (Chomsky 1995; Bornkessel-Schlesewsky and Schlesewsky 2009; Berwick 2011), because minimalist processing predicts that in the face of ambiguity, the parser chooses the option that generates the simplest structure. If, as is generally assumed, the simplest sentence structure is that corresponding to a monoargumental/unaccusative predicate, then we should expect that for the case of emakumeak, speakers should choose an absolutive interpretation, given that this is the form a subject takes in unaccusative predicates in ergative languages.

We must not conclude, however, that Basque violates the principles of minimal parsing. As mentioned before, all DPs in the experimental sentences in Carreiras et al. (2010) were animate definite descriptions. Ferreira and Clifton (1986) demonstrated that initial DP-animacy strongly influenced processing choices and garden path effects; since then, the effects of animacy in relative-clause processing have been

7 For reasons of space, I will not retrace in detail the consequences of an initial ergative choice regarding the object gap preference; the interested reader can verify that repeating the previous steps illustrating the consequences of an initial absolutive choice, replacing an initial ergative, does yield an object gap preference.


142 Laka

explored in depth (Traxler et al. 2002, 2005), and it has been suggested that animacy strongly determines the processing choices speakers initially make for relative clauses (Mak et al. 2002, 2006). Bornkessel-Schlesewsky and Schlesewsky (2008, 2009) argue that prominence features like animacy, known to be active features in the morphology of many human grammars, can drive processing choices in the absence of other cues, and can have variable impact on processing cross-linguistically. In particular, verb-final grammars would rely more on prominence features like animacy to determine actor/agent-like and undergoer/patient-like roles during online comprehension. The finding that morphologically ambiguous animate DPs in Basque are processed as ergative/transitive subjects appears to converge with the findings in Choudhary et al. (2009), where ERPs revealed N400 effects for clause-initial inanimate ergatives, which the authors interpreted as an index of difficulties in grammatical role assignment. In sum, the subject-initial processing strategy that follows from minimalist processing is modulated by grammatically active features like animacy, which favor an ergative/actor processing choice for a sentence with an initial animate ambiguous form.

9.5 Subject/Object Postnominal Relative Clause Asymmetries in an Ergative Grammar

If morphological markedness combined with sentence-initial processing choices are the main factors behind the object gap advantage found for relative clauses in Carreiras et al. (2010), then we predict that variations in the temporal arrangement of the linguistic input can have an impact on this effect. In Yetano et al. (2010) we explored this possibility by investigating a type of relative clause in Basque which is crucially postnominal. This type of relative clause is less frequent, and belongs to a higher register of the language. Unlike prenominal relative clauses, it involves a wh-element in complementizer position, much like an English relative clause. Examples of the postnominal relative clauses used in Yetano et al. (2010) are given in (16):

(16) Postnominal subject gap relative clause

a. gizon-a-k_i [zeinak_i e_i emakume-ak ikusi bait-ditu] lagunak ditu orain
   man-D-erg_i [who_i e_i woman-Dpl seen C-has] friend-Dpl has now

'the man who saw the women has friends now'

Postnominal object gap relative clause

b. gizon-ak_i [zeinak_i emakume-a-k e_i ikusi bait-ditu] lagunak dira orain
   man-Dpl_i [who_i woman-D-erg e_i seen C-has] friend-Dpl are now

'the men who the woman saw are friends now'


As shown in (16), we employed morphologically ambiguous -ak phrases to construct our experimental sentences. Hence, as in Carreiras et al. (2010), relative clauses were ambiguous between a subject gap or an object gap reading until the inflected verb of the main clause was reached (dira/ditu), in the next-to-last position. We generated twenty-six experimental sentences, thirteen subject relatives and thirteen object relatives, which were mixed with seventy-four fillers. The experiment consisted of phrase-by-phrase (= word-by-word) self-paced reading with a comprehension question after each sentence. Results from the forty native speakers who took part in the experiment revealed shorter reading times for the subject gap relatives at the critical disambiguating region and at the subsequent region, as shown in (17), and comprehension accuracy was higher for subject relatives (86 percent correct responses for subject relatives versus 81 percent correct responses for object relatives).

(17) Reading times for the different regions of the sentences containing SR and OR clauses, taken from Yetano et al. (2010)

The object gap preference found in prenominal relative clauses in Carreiras et al. (2010) is reversed in postnominal relatives, which display a subject gap advantage. These results are compatible with a linear distance account like the Dependency Locality Theory, DLT (Gibson 1998, 2000), because it predicts inverse asymmetry effects depending on the precedence relations, of the type we find in Basque. However, studies from languages with prenominal relatives that report a subject gap advantage render the DLT problematic as a cross-linguistic account of subject-object asymmetries (see Lin 2008 for a general discussion).


The results are also compatible with the hypothesis that the main factors behind the processing asymmetries are morphological markedness and sentence-initial processing choices. An interaction between an absolutive gap advantage and subject preference that is parallel to the one discussed here for Basque is also recently reported by Polinsky, Gomez-Gallo, Kravtchenko, and Testelets (2012) for relative-clause processing in Avar, an ergative language from the Caucasus. In the case of Basque, ergative morphology (which has absolutive as the unmarked class), combined with a DP sentence-initial processing choice that favors an ergative interpretation for ambiguous animate DPs, yields the subject advantage in the postnominal case, as can be seen by considering the syntactic structure in (16a), which takes the initial DP to be ergative, and thus leads to a subject gap interpretation, opposite to what was the case with the prenominal relative in (15). For convenience, we repeat the two representations in (18), where (18a) corresponds to a postnominal subject gap relative, and (18b) to a prenominal object relative:

(18) a. gizon-a-k_i [zeinak_i e_i emakume-ak ikusi bait-ditu] lagunak ditu orain
        man-D-erg_i [who_i e_i woman-Dpl seen C-has] friend-Dpl has now

'the man who saw the women has friends now'

b. [RC emakume-a-k [VP e_i ikusi] ditu-en] gizon-ak_i lagun-ak dira orain
   [RC woman-D-erg [VP e_i seen] has-C] man-Dpl_i friend-Dpl are now

'the men_i [that the woman saw e_i] are friends now'

Both syntactic structures, different as they are, share the fact that the first DP encountered is interpreted as an ergative singular subject. The object advantage of the prenominal relative and the subject advantage of the postnominal relative both follow given this initial processing choice.

9.6 Conclusion: Subject/Object Asymmetries Need Not be About Subjects and Objects

The results and processing mechanisms we have discussed in this chapter underscore that a full picture of language and the way in which it is processed in real time requires the study of a broad sample of significantly different languages and linguistic phenomena; only a truly cross-linguistic approach to language research will reveal the interplay of the various factors at work in molding the interface between grammatical form and language processing strategies.


The task of language processing is to infer syntactic structure from the temporal input in order to access (propositional) meaning. It is well established that several non-syntactic factors modulate the strategies involved in processing. Given this, complexity effects observed in processing do not necessarily reveal syntactic/grammatical complexity per se. Differential processing costs can result from the interplay of processing strategies that determine parsing choices for incoming input, as input is turned into language (words, phrases, sentences). A cross-linguistic outlook is thus essential to discover the ultimate nature not only of linguistic structure (as is widely accepted in linguistics), but also of input-processing mechanisms at an adequate level of abstraction.

In generative linguistics, notions like 'subject' or 'object' are not taken to be universal primitives; rather, they are considered derivative-descriptive categories that label syntactic configurations (Chomsky 1965). It is plausible that in language processing, too, what appear to be irreducible subject/object asymmetries can be reduced to more general, and less language-dependent, processing mechanisms. In this respect, the goals originally set out for linguistics in Syntactic Structures can be used to state what the ultimate goal of cross-linguistic studies is, not only in theoretical inquiry, but also when inquiring into language use:

More generally, linguists must be concerned with the problem of determining the fundamental underlying properties of successful grammars. The ultimate outcome of these investigations should be a theory of linguistic structure in which the descriptive devices utilized in particular grammars are presented and studied abstractly, with no specific reference to particular languages. (N. Chomsky 1957: 11)

This goal can be extended to research on language processing, by simply writing 'language researchers' instead of 'linguists', then 'processing mechanisms' instead of 'successful grammars', and finally 'language processing' instead of 'linguistic structure', thus:

More generally, language researchers must be concerned with the problem of determining the fundamental underlying properties of processing mechanisms. The ultimate outcome of these investigations should be a theory of language processing in which the descriptive devices utilized in particular grammars are presented and studied abstractly, with no specific reference to particular languages.


10

Tough-Movement Developmental Delay: Another Effect of Phasal Computation

KEN WEXLER

10.1 Introduction

The explanation for the very slow development of the tough-construction (TC)1 is a major open problem in the theory of language acquisition. Chomsky2 showed that good comprehension of this construction was very seriously delayed. I will propose an updated analysis of why the construction is delayed in terms of the Universal Phase Requirement (Wexler 2004). The analysis will explain why a large number of particular complex constructions develop synchronously, while other complex constructions show no such delay. It has been extremely difficult to capture the empirical generalizations in traditional terms, but the insights developed on the strong role of local cyclic computation in Minimalist theory turn out to be crucial in explaining the developmental facts. This kind of synthesis of linguistic theory and developmental theory is one of the hallmarks of contemporary developmental psycholinguistics, a striking feature that helps to make the field one of the most vibrant and productive in contemporary cognitive science. The whole field was set in motion by the important early contributions of pioneering scientists like Carol Chomsky, who made the leap into the careful experimental study of the development of significant grammatical abilities.

1 Traditionally the tough-construction was called tough-movement, and often still is. Since there have been attempts to explain its properties via non-movement analyses, some authors (e.g. Hicks 2009) have adopted the alternative more neutral name, and one which refers to the complex of properties that allows the sentences to be derived. I'll alternate between TM (tough-movement) and TC (tough-construction), though (like Hicks and many before) I will adopt a movement analysis.

2 To keep references clear in the text, when I refer to 'Chomsky' it means Carol Chomsky. A reference to the publications of Noam Chomsky will be labeled as N. Chomsky.


Chomsky (1969) is a study of several aspects of grammar, the development of which had never been previously studied. The book investigated the development of TC, obligatory control for various kinds of verb types (including promise, ask, tell), and aspects of what we now call binding theory. The original methodology had a certain flavor of closely detailed clinical interviews in comprehension experiments, including follow-up questions to the child. Rereading Chomsky's book after all these years makes one appreciate how much of the current fascination with small-sample, small number of stimuli, rich-context, interview-style methods that often goes under the name truth-value judgment task was foreshadowed in this book. The creative and very detailed study of particular grammatical properties was an omen for what was to come, as experimental developmental psycholinguistics bloomed. Looking back at Chomsky's results from a modern perspective, it is fair to say that they represent a groundbreaking demonstration of the slow growth of some specific grammatical capacities despite the existence of a biologically given design for language within the human species. Much of contemporary theorizing about linguistic development has concerned itself with this question (Borer and Wexler 1987, 1992; Wexler 1992, 1996, 1998, 2003, 2004). The currently active and promising attempts to understand some of language development in genetic terms, integrating biology and linguistics (Ganger, Wexler, and Soderstrom 1998; Bishop, Adams, and Norbury 2006; Ganger, Dunn, and Gordon 2005; Falcaro et al. 2006; Wexler forthcoming) also owe much to the surprising demonstration of slow development of some particular features of grammar.

10.2 Tough-Movement

TC has long been a subject of syntactic fascination and analysis. Its properties do not easily fit into the generalizations that have captured much of syntactic abilities. In this section I only attempt to give a very brief, very informal, hopefully intuitive description of the unusual properties of TC. In a later section, we'll follow a somewhat more formal analysis to see how and why the properties of TC arise. Consider a TC sentence like (1):

(1) That house was easy to knock down.

Clearly the subject that house is the patient of the verb knock down; it receives its thematic role as the object of knock down. It appears as if the DP that house is moving from object position of the verb knock down to the subject position. But how and why should this movement take place? There is a parallel construction (2) that shows no such movement:

(2) It was easy to knock down that house.


Movement of constituents is generally motivated either by semantic operations, e.g., scope, as in wh-movement, or by syntactic considerations, e.g., the Extended Projection Principle, which requires certain heads to have filled specifiers (or more abstract considerations involving features). Traditionally the second kind of movement (syntactically motivated) has been called A(rgument) movement, because the moved phrase ends up in an 'argument' position, e.g., in subject position. The first kind of movement (semantically motivated) is called A-bar (i.e., not A(rgument)) movement, because the moved phrases wind up in particular kinds of semantic-related positions; e.g., wh-movement and its relations place phrases in the specifier of the whole sentence, the CP.

In (1), the moved phrase that house isn't in a special semantic position; rather it is in subject position. So TM must be syntactically motivated (A) movement. But syntactic movement of this kind only works when the moved phrase hasn't received the necessary values in its original position; there are unvalued features on the phrase. For example, passive sentences (e.g., that house was knocked down) move objects to subject positions. But passive sentences only work when the verb is a participle, and no case is assigned to the object (a property of participles). But in (1), that house is the object of a transitive verb knock down; thus the question arises, how is it possible for syntactic movement to apply to that house if it already has accusative case from knock down? So the TC must involve semantically motivated (A-bar) movement.

This unusual combination of properties has led to central attempts to explain TC as a combination of both A and A-bar movements (N. Chomsky 1977 and a long list of references). With such an interesting odd duck characterization, it's crucial to see how and why they arise and are allowed by current theory, which we will do in a later section. It is also of much interest to study the developmental course of this construction. Does the child show the behavior that the combinations predict, given the known developmental course of the underlying syntactic processes that are responsible for the construction?

10.3 Chomsky's Experiment

Chomsky's (1969) experiment on TC (children aged five to nine years) used a doll that was sometimes blindfolded, and the verb see, to investigate whether the child took the surface subject of the TC as the object (correct) or subject (incorrect) of see. The child could be asked a yes/no question like (3a) or asked to produce an action like (3b).

(3) a. Is the doll easy to see or hard to see?

b. Would you make her easy/hard to see?

The idea was that if the doll was blindfolded, she still was easy to see. If the child misunderstood the question and thought that the doll is easy to see meant that the doll could see easily, then the child would answer (3a) as easy to see when the doll wasn't blindfolded, and hard to see when the doll was blindfolded. Similarly for (3b).


Chomsky concluded that twenty-six children understood the construction and fourteen didn't. Almost all the five-year-olds were incorrect. The six-, seven-, and eight-year-olds showed mixed performance. Only at nine years were all the children correct. We'll return to Chomsky's explanation for the severe delay in development.

10.4 Cromer's Replications

Such a strong result, very late development of a particular grammatical construction, is surprising (to some) and important (to everybody) enough to demand replication. Cromer (1970) set out to carry out these replications. The method was an act-out task. The child had to make hand puppets carry out the action that was described in a sentence that the child heard. For example, the child had to make the wolf bite the duck or the duck bite the wolf. There were three types of adjectives that take infinitival complements; the grammar of the complement of each adjective type led to a different interpretation.

(4b) is a list of TM adjectives, of the type we've studied. They lead to what Cromer calls an 'object' interpretation since the surface subject has the interpretation of the logical object of the main verb; in (5b), the wolf has the interpretation of the object of bite.

(4) a. S (subject) type, e.g., happy, anxious, willing, ...

b. O (object) type, e.g., tasty, easy, hard, fun

c. A (ambiguous) type, e.g., bad, horrible, nice, nasty

(5) Who bites?
    a. The wolf is happy to bite     wolf
    b. The wolf is easy to bite      duck
    c. The wolf is bad to bite       either

Cromer calls adjectives like happy S-type, since the surface subject has the interpretation of the logical subject of the main verb; in (5a), the wolf has the interpretation of the subject of bite. In modern terms we would call sentences like (5a) structures of '(adjectival) control.' The surface subject is coreferential with the (phonetically empty, usually designated PRO) subject of bite. (4c) adjectives are A (ambiguous) type because the surface subject has either the interpretation of the subject or object of the main verb. In (5c), we can interpret the wolf as either the biter or the bitten. So bad displays an ambiguity between TM and control complements.3

The point of the experiment was to see how children interpret the sentences with the different grammars induced by the adjectives. If, for example, children perform much better on S-type adjectives (5a) than O-type (5b), we might take this as evidence that

3 The structure and semantics of (5c) bad, etc. when it is interpreted as a 'control' structure may in fact be different from structures like (5a). We'll ignore this potential complication here.


it's TM that's difficult, not simply a structure with an adjective that takes an infinitival complement. Thus Cromer's experiment provides useful experimental controls that supplement Chomsky's methods. Forty-one children, ages 5;3 to 7;5, participated. There were four tokens of each type, namely one each of the adjectives in (4).

Children who made errors on S-type adjectives also made errors on O-type adjectives. Quite a few children, however, made errors on O-type adjectives, but not on S-type. So Cromer could classify the children into categories determined by their responses to the S-type and O-type adjectives in the following way:4

(6) a. Primitive Rule: Always identified deep subject as being surface subject, thus always right on S-type, always wrong on O-type

b. Intermediate: Gave mixed responses

c. Passers: Consistently answered correctly

Instead of analyzing his results by age, Cromer compared the child's category to his/her PPVT, a test of vocabulary. The age score in Cromer's table of results (7) is the 'mental age' of the child's vocabulary score as determined by PPVT. This is often called 'Verbal IQ.'

(7)
                     Mental Age (PPVT)
                  2;11-5;7   5;9-6;6   6;8-10;8
    Primitive        17         0          0
    Intermediate     10         8          1
    Passers           0         0          5

No child with MA less than 6;8 was a passer. All five children older (here we always mean MA) than 6;8 passed. But there were only five of them, and their ages ranged from 6;8 to 10;8. We would need many more participants from seven and up to determine a more exact range of ages. Moreover, MA is an inexact measure for biological (chronological) age. There were children with biological age as high as 7;4 who were classified as Primitive. Thus it seems that an approximate statement based on this data

4 It is no surprise that subject control structures (4a, 5a) are the ones that are interpreted correctly early. There are two possible explanations. If the child ignores the adjective and some of the surrounding material and only interprets the sentence according to the noun phrase and verb (the wolf bite), then sentences like (5a) will be consistently correct and sentences like (5b) will be consistently incorrect. So one possibility is no understanding of any of the adjectival control structures, leading to a response strategy. On the other hand, there is an excellent body of evidence, based mostly on experiments on verbal control, that shows that control structures develop fairly early (see Wexler 1992 for a review). If the results from verbal control were extended to adjectival control, then the child would answer sentences like (5a) correctly on the basis of the actual grammar. Then only sentences like (5b) would not be parsable by the child, and a response strategy (perhaps guessing, perhaps interpreting the sentences always as subject control) would be needed. Without more details on the pattern of child responses, it is difficult to tell what is actually going on in the experiment on these items, but it seems to me to be a reasonable hypothesis that the children actually are interpreting (5a) correctly. Further experiments would be useful.


is that only after biological age eight can we be fairly confident that a child will be a passer.

Age eight strikes a bell. This is also the minimum age at which we can be fairly confident that a child will perform well on a comprehension test of verbal passives of subject experiencer verbs (Mary was seen by John), as well as several related constructions. We'll briefly review the evidence later. But it's a hopeful sign for theory; the biolinguistic view (the generative grammar view) suggests that we would expect constructions sharing the appropriate syntactic processes to develop around the same time. This gives us a hint at how to pursue a theory of the development of tough-movement.

First, let us briefly review some further data. Cromer (1972) replicated the experiment,5 testing fifty-six children between 5;9 and 11;0. Though the paper shows significant difficulties with TM (O-adjectives), performance isn't reported by age. But Cromer (1987) writes that, 'Cross-sectional studies (e.g., Cromer, 1972) had shown that correct adult performance was not achieved by a majority of children until after 10 or 11 years of age.' So again, eight or nine doesn't seem to be a bad approximation of the age of development for the data in Cromer (1972).

Cromer (1983) studied 63 seven- and eight-year-old children. The ages were deliberately chosen so that the children would still be making errors on TM (O-adjectives). The proportion of errors for these older children ranged from 12 percent to 41 percent for the five O-adjectives, but only from 3 percent to 9 percent for the five S-adjectives. Clearly, TM has not yet been mastered by all of these seven- to eight-year-old children.

The experiment was done to test whether the children were consistent on any given adjective. Is it simply a question of learning particular adjectives at different times?6

To test this possibility, the experiment was carried out over a year, with two-day (consecutive-day) testing sessions at intervals of three months over the year. The same test was repeated on Day 1 and Day 2 of a session. Especially on the O-adjectives, the children were quite inconsistent from Day 1 to Day 2, often responding in the opposite way on the second day to their response on the same adjective on the first day.

The consistency analysis was quantified for the forty-eight children in an intermediate stage. Inconsistency was much greater for the O-adjectives, using a measure that Cromer calls 'non-growth inconsistency', i.e., one that reflects inconsistency only while the child remains in the intermediate stage. To take an extreme example, the O-adjective hard elicited 50 percent inconsistent responses.

Cromer uses these results to argue against a gradualist view of lexical acquisition as an explanation for the results. We'll come back to this argument. First, let's point out

5 There was a second methodology in Cromer (1972). In addition to the initial test of TM, the children were taught 'new' made-up adjectives in frames. The results of this study support the conclusion that children don't know the syntax of TM.

6 Even if there were consistency within the adjectives, we would still have to explain why it's always the O-adjectives that are much more difficult.


that in this paper Cromer tests whether frequency could be a possible explanation of the mean number of errors. The answer is no. The correlation between frequency and mean number of errors is non-significant. As Cromer points out, one of the most difficult adjectives, hard, is also one of the most frequent ('AA') words.

Step back from the data for a minute, concentrating on intuition. It does seem that hard is a word that children hear a lot. That's hard! Lots of things are hard, and even more so for kids. So kids hear these words in this context, they must know what the word means, e.g., This puzzle is hard is understood by the child.7 Yet in contexts with infinitival complements, where the lexical content of the subject is not a giveaway, children often fail to understand the sentence.

Cromer (1987) compared the performance of nine-year-olds who had undergone the year-long experiment in Cromer (1983) with nine-year-olds from other experiments that had not undergone this experience. The children who had the year-long experience did better on O-adjectives than the children without experience. The experience has no feedback—the children weren't told whether they were correct or incorrect. Nevertheless, at nine years of age, 55 percent of the children with the year-long experience performed 'in an adult manner' whereas only 14 percent of the children without experience did so.

As Cromer argues, this effect can't be one of hypothesis testing given feedback—there was no feedback. Rather, it seems to be an effect of the child 'using' the grammatical experience she has to figure out the correct interpretation. Experience in attempting the derivation seems to speed up maturation.

This is not surprising from a maturational, growth-driven point of view. There is no reason to think that use of an ability can't make its instantiation in performance increase, especially around the age of biological maturation of the ability. And notice that the effect is from age eight to nine, the age at which we'll predict the appropriate syntactic operations will be maturing. It would be an important experiment to see if the same year-long experience had such an effect at age four, say, when the child is (according to theory) far from the relevant grammatical capacity. So far, this seems to be an effect of use slightly speeding up a maturational process that is taking place.

In summary

(8) a. TM is not mastered until about age eight or nine, with children often showing quite poor performance.

7 There is no reason to think that the relevant syntactic operations that the child has difficulty with necessarily appear with the simpler uncomplemented form. For a comparable observation and an analysis, see Hirsch and Wexler (2006), where it is shown that sentences with main verb seem and no infinitival complement (Mary seems cold) are understood (produced) by the child despite the great lag in comprehension of sentences with seem and an infinitival complement (raising sentences), Mary seems (to John) to be wearing a hat. That paper argues that the simpler sentences don't involve the necessity for defective phases whereas the raised sentences do.


b. For most of the younger period, children perform quite well on adjectives with the kind of infinitival complements that we call 'control' constructions.8

c. Performance is not related to frequency of the adjectives; TM adjectives are difficult compared to control adjectives whether frequent or not.

d. There are two response patterns on TM sentences before children mature into the adult stage:

(i) Primitive: At the youngest ages, children always get TM constructions wrong.

(ii) Intermediate: At somewhat older ages, children sometimes get TM constructions correct and sometimes wrong.

10.5 The Reorganizational Principle

On the basis of the data that we have just reviewed, Cromer (1983: 316) concludes that

... it is not clear what organizational principle [my ital.—KW] underlies the differential categorization of S- and O-adjectives except that they are useable in differing linguistic frames. But that is the very puzzle [my ital.]... children are not slowly acquiring, piecemeal, one and another of the crucial words.... It is not clear how the acquisition of word knowledge necessary for correct adult performance on this linguistic structure comes about. What seems evident, however, is that theories that assume a process of gradual learning... cannot account for the acquisition of word knowledge seen in the results of this longitudinal study. It appears that some kind of reorganizational process must be posited [my ital.] but precisely how or why this reorganization occurs remains unanswered.

Cromer acknowledges that he doesn't know what the reorganizational principle is. Yet the quote is striking for its understanding of the problem. My own view, confirmed by the extensive experience of developmental psycholinguistics, is that it would have been impossible to come up with the exact solution to the question of what the reorganizational principle is without a careful analysis of the syntactic derivation of TM constructions, an enterprise that Cromer didn't undertake. So he couldn't find the solution. Nevertheless, his understanding of the problem was a necessary first step toward finding the solution.9

8 In fact, in addition to the data on S-adjectives that I have just reviewed, there is an extensive experimental literature on the development of complement control where the complement is of a verb, not adjective. This literature firmly shows that complement control develops quite early. There is no reason to think that control into the complement of an adjective is any more difficult, so that Cromer's results on adjective control are very compatible with a large experimental literature. For a summary of the literature on complement control, see Wexler (1992), and since then the consensus has remained the same. However, Becker (2004) has more recently questioned the consensus. For arguments that her experimental results are based on artifacts, and new experiments demonstrating this, see Hirsch and Wexler (2006) and Hirsch, Orfitelli, and Wexler (2007).

Chomsky (1969) in fact had understood this and made a first step toward a solution. She proposed that young children don't understand sentences where the deep and surface structures don't line up, aren't identical. TM sentences (O-objects) have this property. In (5b), the wolf is the surface subject but the deep object. So children should perform poorly on TM. But in control sentences the surface subject is also the deep subject. In (5a), the wolf is both surface subject and deep subject. So control sentences should not present a problem to the child. These predictions conform with the facts, as we have seen.

So Chomsky actually makes a proposal for the reorganizational principle (though she also puts some weight on lexical learning). The principle allows sentences to have non-aligned deep and surface structures. As a first step, it is on the right track, but further evidence shows that it can't be the whole story. There are many other constructions where the deep and surface structures aren't identical, yet the development of these structures isn't delayed.10 For example, in a wh-question in which the object is questioned, the surface structure places the object in an early position in the sentence (specifier of the Complementizer Phrase), whereas in the deep structure it is in object position. So in (9a), who is the deep object but not the surface object. Yet children have no difficulties on such structures. Similarly for relative clauses, which develop much earlier than TM constructions,11 although a bit later than wh-questions and with a few more errors.

(9) a. who did Mary kick?

b. the person who Mary kicked.

c. the book read Mary [in a V2 language, meaning Mary read the book]

Or consider V2 structures (German, Scandinavian, many others), for example (9c) using English words, in which a phrase is placed in first (Spec, CP) position in root

9 Compare this to so much in contemporary psychology, where an argument like Cromer's seems not to be understood. Rather, it so often seems to be assumed, without argument, that a gradualist, 'learning theory' (whether traditional or Bayesian) analysis must underlie all change. In this sense, there has been a certain loss of standards, a loss of knowledge, really, in much of current cognitive science.

10 For evidence of the very early comprehension of such structures, see Stromswold (1995) and Hirsch and Hartman (2006). There are many studies of production. Children know (pretty much 100%) that a deep object must raise to an early position, if the object is wh. E.g., Guasti (2000) shows of 2,809 wh-questions in child English, only 41 are in situ (all echo questions, where in fact the raising doesn't occur in the adult language). For a general survey of the precociousness of many properties of wh-questions, see de Villiers (1995).

11 Goodluck and Tavakolian (1982) and Hamburger and Crain (1982) show that object relative clauses develop before age four. Recently this standard claim for the early development of relative clauses has been challenged by Belletti and Rizzi (see this volume); however, I think that a good case can be made that the reason that their results contradict standard results in, for example, English, concerns processing considerations (temporary garden paths in the Italian and Hebrew structures) in the particular structures that they study.


clauses, even in declarative sentences, while the verb moves into second (C) position. The deep object (the book) shows up in this early non-object position in surface structure. One of the striking results of the last twenty years of large-scale research in very early syntactic development is how well young children know syntactic principles in this stage, even though the surface and deep structures don't align. For example, one early result (Poeppel and Wexler 1993) showed that one German-speaking child at 2;1 produced 30 percent of his utterances with a non-subject in this first Spec, CP position, always with a finite verb, raised to second position. There have been many replications in the literature; see Santelmann (1995) for a large amount of data on Swedish and Wexler, Schaeffer, and Bol (2004) for much data on Dutch.12

Confronted with this type of distinction, in particular the developmental distinction between wh-movement (early) and verbal passive (late), Borer and Wexler (1987, 1992) proposed that the distinction between forms that children have trouble with and those they don't was more specific and more related to underlying syntactic mechanism. They proposed that children up to a certain age (we now know this is about eight years, if measured by verbal passive and related constructions) don't have the mechanism to form non-trivial argument chains (the kind of chain relation that passives need). This hypothesis is called the A-Chain Deficit Hypothesis (ACDH). The proposal is that the capacity to form A-chains matures.

ACDH has been replaced by another theory, more empirically accurate, the Universal Phase Requirement (UPR) (Wexler 2004). The background is the derivation-by-phase analysis in Minimalist theory. The idea is to severely restrict the computation needed in a sentence by proceeding to analyze by phases, from the bottom up, with only the minimal amount of material available from the next phase down. N. Chomsky's (2001) Phase Impenetrability Condition (PIC) states that when working at a phase (taken to be vP or CP), only the edge (specifiers and head) of the next lower phase is visible to the derivation (to the probing feature in the higher phase). The complement of the lower head and anything below that in the phrase marker is not visible, but, rather, has already been interpreted. At first sight, several well-attested structures (passive, unaccusatives, and subject-to-subject raising among them) should not be derivable, since they raise material from within (not on the edge of) the lower phase. In order to derive these structures, the relevant categories are taken to not be phasal. Passives, unaccusatives, and raising structures are grammatical because the relevant vP is defective. It is not a phase. Thus the complement of the vP (for example, the direct object in the case of passives and unaccusatives, the lower subject in the case of raising) is visible. In short, several structures require a non-phasal characterization of categories that are usually phasal (we will return to the characterization of the relevant environments later in this chapter).

12 For a review of the OI field and many relevant results, see Wexler (2003 and forthcoming).


The Universal Phase Requirement (UPR) simply states that children don't count any phases as defective; all potentially phasal categories are phasal to them:

(10) Universal Phase Requirement (Wexler 2004): Children (to about age eight) take all vP and CP to define phases, rendering passives, unaccusatives, and (subject-to-subject) raising structures ungrammatical.

For example, in verbal passive derivations, the adult can move the direct object (more precisely, can allow it to be a goal for the T probe) because there is no phasal barrier between T and the object. A child subject to UPR, however, will take vP as a phasal node; thus the direct object, which is not on the edge of vP, but rather in the complement of v, will not be accessible to T. This is sketched in (11), where T is called I.

Replacing ACDH by UPR solved a major problem in the theory of syntactic development. Although many structures containing A-chains were known to be quite delayed in development, there were well-known structures with A-chains that were known to develop very early. Most prominent among these was the raising of a subject from the specifier of vP to the specifier of T. ACDH disallows this raising, since it is an A-chain. UPR notes that the subject comes from the specifier (edge) of vP, so there is no problem with T probing the subject. This is a natural, long-sought solution that only worked with the emergence of the derivation-by-phase theory, and the role given to edges in terms of their visibility to the next phase. The fact that developmental theory didn't work in a theory without these characteristics, and that it now does, provides strong support both for the linguistic theory and for the developmental theory.

(11)

A major strategy in developmental psycholinguistics when confronting a construction that is delayed, and the reasons for the delay are not known, is to attempt to assimilate the construction to known reasons for delay. What could the reasons be for the great delay in TM? In fact, the only explanation in the literature of which I am aware for the late development of TM (except for Chomsky's original proposal, which we have seen can't be exactly right) is one in Wexler (2004). Wexler points out that many analyses of TM contain an A-chain, so that ACDH could explain the late development of TM. The proposal wasn't worked out in detail, and to do so would require the adoption of a particular syntactic derivation for TM, and then asking: is the kind of A-chain needed one for which ACDH holds? The answer could be yes, but since ACDH didn't apply to all A-chains, it wouldn't have been completely satisfactory to adopt ACDH, only a partial result. But we can turn to UPR, provide an exact derivation, and ask unambiguously whether UPR can explain the late development of TM.

Given that UPR explains a wide variety of late-developing syntactic structures, it is natural to ask this question. First, we should ask the biological question. Since the explanation for the withering away of UPR is biological, maturational, depending on the biological/linguistic state of the organism, if UPR explains the late development of TM, we predict that TM and other UPR-delayed structures emerge about the same time. In this regard, we have an excellent answer.

To very briefly review some comparison structures, let us consider the development of the verbal passive. Comprehension studies seem to give the most precise numbers for how the passive develops, and should be compared to the comprehension studies on TM that we have looked at. There is good reason to not take any verbal passive study in English as the appropriate measure, because it seems that children who don't understand the syntax of verbal passive will nevertheless often perform correctly in the experiments by using the homophonous adjectival passive as a means toward understanding (Borer and Wexler 1987). Thus the best measure to use to study the development of the syntax of verbal passive in English is a measure of passives for which there is no homophonous adjectival passive. In practice to date, this meant the passives of subject experiencer verbs like like, see, remember, etc.

The dotted line in (12) shows the mean percentage correct in a two-choice picture selection task for verbal passives of subject experiencer verbs.13 Random guessing produces chance (50 percent) results. Three-year-olds are below chance, and it is not until the seven-year age range that the mean percentage correct is 75 percent, with eight-year-olds at about 80 percent. Although these data are probably the most complete study of the development of verbal passive of subject experiencer verbs over a full age range, their results are consistent with several other studies referenced in the paper.14 Reviewing experimental results on the development of TM in section 10.3, we concluded that only at about age eight could we become fairly confident that a child would be a 'passer' for TM. The age of development of TM looks quite similar to the age of development for verbal passive based on a correct syntactic analysis as measured in (12). Thus it appears as if TM and another UPR-delayed structure develop about the same time.

13 See the paper for full data on a two-by-two design crossing active/passive with agentive/experiencer verbs.

14 The original detailed study of the late development of passives of subject experiencer (called 'psychological') verbs was in Maratsos et al. (1985).


(12) Graph from Hirsch and Wexler (2006)

Comprehension of Raising & Passives

The black line in (12) graphs the percentage correct against age for a two-choice picture comprehension experiment for subject-to-subject raising on the same children who participated in the passive experiment. This measured comprehension of sentences like (13).15

(13) Bart seems to Lisa to be playing the saxophone

(13) demands a defective v for seems, and is thus subject to child errors due to UPR, as argued by Wexler (2004). Chance performance is 50 percent, as in the passive study. The graphs for passive and raising parallel each other extremely closely, as predicted. Raising and passive, two structures predicted to be delayed until the UPR withers away, develop in tandem. Thus we can see that the development of TM also patterns with the development of raising.16 TM thus patterns with two known UPR structures. Hirsch and Wexler (2007a) argued that (object) cleft sentences and specificational predicates (inverse copulas in the sense of Moro 1997) should also be delayed by UPR. Their experiments confirmed that they develop in a similar fashion to passive and raising. These are all structures that should be comparable in development to TM. To the extent that we have evidence, this prediction holds.

The discovery of the immature child's adjectival strategy for interpreting verbal passive structures allows us to clear up a puzzle raised by Cromer (1972), which points out that Cromer (1970), along with its study of TM, studied children's comprehension of two instances of a passive sentence, using bitten, as in (14).

15 See the paper, and for even more detail Hirsch and Wexler (2006), for a much fuller discussion of the data, including excellent comprehension of unraised sentences with seem.

16 What is needed to even more precisely paint this picture is an experiment that studies the comprehension of TM and passive in the same child, as the passive/raising study that we have just reviewed studies passive and raising in the same child. Similarly for TM and raising, and similarly for other UPR structures. Cromer's small study on passives carried out with TM that we are about to discuss adds useful information of this type.


(14) The duck is bitten by the wolf.

The children performed quite well on (14). Although he doesn't state the percentage correct, one can interpolate from his figure that the children perform on (14) with a mean percentage correct in the low 90 percents, quite an excellent performance. On the basis of this passive data, Cromer (1972) claims that TM difficulty is 'specific to the structure', which 'is shown by the fact that they can interpret passive sentences correctly.' If true, this would be a puzzle for our claim that the same underlying grammatical limitation that causes children difficulties with passives and other structures also causes difficulty with TM.

The issue is immediately resolved, on current understanding. Note that bite is an 'actional' verb, i.e., an activity verb; it makes a good resultant-state adjectival passive17 (Kratzer 2001), and the present tense reinforces this interpretation. The adjectival strategy will allow verbal passives of an activity verb to be understood as this particular kind of adjectival (stative) passive. More precisely, the child doesn't get the (eventive) meaning of the verbal passive, but the resultant-state adjectival passive interpretation allows the child to behave correctly in the experiments that are performed, i.e., the child knows who did what to whom. For the child's interpretation, the event took place before the time referred to by the sentence (e.g., before now in (14)), but that plays no role in the usual experiments.18

The children that Cromer studied were aged from 5;3 to 7;5. Hirsch and Wexler (2004) measure child performance on verbal passives of activity verbs (i.e., similar to bite) at about 90 percent in the six-year-old age range. This good performance on passives of activity verbs is the result of the adjectival strategy (when UPR prevents the adult syntactic analysis). Thus Cromer's results are strictly compatible with other results on passive. Children at this age haven't mastered the syntax of passive; they are at around chance levels on passives of subject experiencer verbs (see the approximately 50 percent performance on subject experiencer passives for six-year-olds in (12)), which do not make good adjectival passives, including resultant-state adjectival passives. But the adjectival strategy allows the child to perform much better than her understanding of the syntax of verbal passive would otherwise allow.

Thus there is no need to say, as Cromer did, that TM difficulty is 'specific to the structure'. The syntax of verbal passives and the syntax of TM seem to develop about the same time.

17 Kratzer shows that all activity verbs can form resultant-state adjectival passives. These have stative meaning of a particular kind. In particular, 'the duck is bitten' has the meaning that the duck at the present time is in a state such that the event of biting the duck took place before this time.

18 One would like a direct test of the child's interpretation, essentially a test of aspectual relations. This is a delicate experimental matter. My lab has attempted such an experiment, but we have not been successful to date.


Because of his empirical conclusions about the time course of development for passive and TM, Cromer couldn't attempt to assimilate their late development to the same cause. Knowing what we now know about the development of passive, we can. Which in science means, of course, that we should.

10.6 The Syntax of TM:19 Necessity for Defective Phases

TM is famously a puzzle, and famously complex. As we said in section 10.1, TM is an odd duck. There are at least two serious puzzles about its behavior.20

(15) a. Why should a DP which should be perfectly happy to stay in object position move to subject position of the matrix? How is it even possible, given that the object position is assigned accusative case by the verb?

b. It looks as if the moved DP (what ultimately becomes the subject, that house in (1)) must first move to an A-bar position in the (complementizer position of the) embedded clause, and then move up to the subject (an A-position) in the matrix clause. Otherwise, how could A-movement, a fairly local operation, move the DP straight up in one fell swoop from its original embedded object position to the subject position of the matrix clause?

But there is a large literature arguing that such A-bar followed by A-movement of a constituent (called Improper Movement) does not exist. So how can it appear to exist in TM? We might add that if the first movement is A-bar, it helps to solve the problem of how an object that has been valued for case in position can in fact move. A-bar movement moves such objects, with the right motivation. What is that motivation?

N. Chomsky (1977) proposed that the problems of TM could be solved if we assumed that there was first A-bar, then A-movement.21 Furthermore, he proposed that the A-bar movement was the movement of an 'empty operator.' An empty operator is simply a DP that has 'operator' semantics and is phonetically empty. For example, a relative clause with no visible relative pronoun contains a (moved) empty operator:

19 The presented syntactic analysis of TM, as I state, follows Hicks (2009). All phrase-markers that I present are either taken directly from those in Hicks (2009) or are part of his phrase-markers.

20 My presentation of the syntax of TM is quite informal. I am trying to explain the basic features to non-syntacticians (and to understand the guts of the analysis myself). But I follow a large explicit literature, and the proposal for the analysis that I adopt is the well-worked-out explicit analysis of Hicks (2009). (15) is in particular quite informal. I am trying to intuitively present the conclusions of a great deal of syntactic analysis that is done with care. Any reader who finds the presentation too informal has an ample literature to pursue to correct this deficiency in my presentation.

21 The leading alternative to the class of 'A-bar movement followed by A-movement' is the class of analyses in which the matrix subject (that house in (1)) is base-generated, thus vitiating the improper movement problem, but leading to a number of other problems. In recent important work, Hartman (forthcoming) shows that the kind of 'defective intervention' effects one would expect on an A-movement account (and not expect with base generation) exist. The innovative strategy of that paper is to find interveners (namely to-phrase experiencers) that are not ambiguous; unlike for-phrases, they can't be read as complementizer plus subject of the embedded clause.

(16) that book [O(i) John saw t(i)]22

Hicks adopts these ideas, detailing a precise account of what the empty operator is. A null operator is a complex DP with an Operator and its 'paired' associate DP. For example, in the TM sentence John is easy to see, the surface subject John was originally paired with a null operator in a complex DP in object position. Hicks gives the following structure for the null operator containing John.

(17) has several important features. First, the object is a complex DP with a null operator head. The 'visible' object John is the complement of this operator head. I won't comment in detail on the features (Hicks gives a thorough discussion of the derivation). But it is the whole complex DP that is raised by A-bar (empty operator) movement. Since the operator moves, the DP that contains it moves, and the ultimate subject of the sentence John also moves, as part of the DP. The movement, of course, is to Spec, CP.

The clever part of this treatment of the null operator is the solution to (15b), the problem of improper movement. The DP in (17) moves to Spec, CP. After that, the associate of the operator, John, raises to the surface subject position. Why is this second movement, of John, permitted? Isn't it Improper Movement? No, because John was never the target for an A-bar movement operation. Rather the (null operator) DP that contains John (or the operator Op itself) was the target of the movement. So the way that Hicks works out the syntax, the DP John was never the subject of A-bar movement.

Of course, when a complex DP is raised, we have to ask whether it should be possible for a DP that the complex contains to be later moved. Hicks ties the constraints on such later movement to the theory of phases. The first movement, of the complex object DP to Spec, C, brings the complex operator DP into the edge of C. That means that anything that the complex operator DP contains is in the edge of C. In particular the operator's associate (and surface subject) John is in the edge of C. So T in the matrix clause can attract John without violating the Phase Impenetrability Condition. T is attracting a constituent in the edge of the lower phase, permitted because at this stage of the operation the edge has not yet been subject to semantic interpretation.

22 For transparency of understanding, I have retained trace notation. If one preferred the copy theory of movement, the essentially equivalent properties of the empty operator would hold.

(17)

Let's go through (superficially; we won't analyze all the feature operations) Hicks's derivation of the TM construction (18), following his description closely, often using his words.

(18) everyone is tough for us to please

The complex null operator DP merges with V and then the whole with v. Crucially, case on the null operator DP is checked, but case on everyone is not checked, it not being in position for the probe/goal relationship. (As I stated, this is the benefit of the complex DP analysis of null operators. The unchecked case feature will ultimately allow object everyone to raise to subject position.) Note in particular that the null operator DP will have an (unchecked) wh feature. This will later force movement of the DP. Before movement, the result is (19).

(19)

Subject PRO merges. (There is much evidence that for us can be higher than the subject of the embedded infinitival.) Since [uwh] must be checked, it has to move, successive-cyclically, since vP is a phase. The null operator DP first adjoins to vP, pied-piping object everyone with it. (20) results.


(20)

The empty operator (DP (k)) must move from Spec, v to Spec, C (bringing the object everyone with it). That is, the successive-cyclic A-bar raising of the complex empty operator DP continues. Note the following properties (21) of the derivation that results in (22), making the derivation compatible with PIC and other principles.

(21) a. vP is shipped off to interpretation.

b. All the uninterpretable features are on the edge of vP.

c. PRO moves to Spec, T.

d. C merges with T.

e. Crucially, C has uwh (uQ and uEPP) features, which attract the edge DP of vP, which contains the object DP everyone, its features still unchecked.

(22)


After movement of the complex Op DP into Spec, CP, AP (the Adjective Phrase, with head tough) merges with CP. Note the experiencer PP for us in Spec, AP. A light adjective a merges with AP, and tough raises to it. This last movement of tough derives the correct word order, with tough preceding the experiencer phrase. The need to have some place to raise tough to provides empirical support for postulation of the light adjective a. (23) results.

(23)

T merges. uEPP in T probes and reaches the goal everyone, from within the complex DP in the edge of CP. This is the way that the second movement, the 'A-movement' of the two-movement approach, is realized in the complex null operator DP analysis. 'A-movement' takes place from part of a DP within Spec, CP.23 (24) results.

23 One should ask whether other movements of this general type are allowed from within Spec, CP.


(24)

Crucially, for this last movement of everyone into Spec, T to take place, AP (and aP) may not be phases. If, say, aP were a phase, then nothing in its complement AP is seen at T; interpretation has already taken place. Hicks in fact argues that aP (or AP) is not a phase. It's interesting that Hicks considers that AP might in fact be a phase (if there were an external argument).

10.7 The UPR and the Phasal Syntax of TM: Why the Construction Develops Late

Hicks's statements on the reasoning about whether aP should be a phase derive from two separate arguments (although he consistently derives the same conclusion, that aP is not a phase). On the one hand, he writes, following N. Chomsky (2000, 2001), that vP and CP are the phases, a substantive definition. On the other hand, he writes,


'Since aP projects no specifier,24 like passive and unaccusative vPs for example, it is not a phase.'

We have seen that it is crucial that aP not be a phase, or the derivation will crash. Suppose, however, that children take aP to be a phase. Then for children the derivation will crash. That is our hypothesis for why TM is so late. Children take aP to be a phase, so the crucial movement of the object from within the complex null DP to the subject position is not allowed. In (23), everyone is not in the edge (specifier) of the light adjective a; rather it is in the complement of a. Thus if aP is phasal, T can't see into the complement of a.

Our argument that UPR forces immature children to take aP as a phase comes via analogy to the argument that children take vP to be phasal. Hirsch and Wexler (2007a) take an empty predicative phrase25 to be phasal for children, in order to cash out their proposal that inverse copulas (and thus cleft sentences) crash for immature children via the UPR.

If predicative phrases are phasal for children, it seems that light adjective phrases should also be phasal for children. It is even possible that the (light) adjective phrases and predicate phrases reduce to the same thing.

At any rate, it is quite natural, it comes for free, that children should take aP to be phasal. If the idea that a specifier or external argument makes a phrase phasal is correct, then aP should be a candidate, and children should treat aP the same way that they do vP or PredP. It really does seem that children's taking aP to be phasal comes for free once we have UPR and syntax as we know it.26

Once UPR forces the child to take aP as phasal, T cannot probe beyond the edge of aP. uEPP and the phi-features of T cannot probe beyond aP and remain unchecked and existent at LF. Full interpretation then marks the TM sentence as ungrammatical.

Notice that we don't have to ask whether the movement of everyone to T from the complex null DP is or is not 'A-movement.' We don't have to define A-movement. Rather, the EPP and the theory of phases determine whether the movement can take place. This consideration has played a key role in developmental theory, for example in explaining why short scrambling in the Germanic languages is not delayed to the same extent as passives, etc. (Wexler 2002). In the derivation of scrambling, there is no phase to be moved across, except by successive cyclic movement. Thus there is no question of UPR ruling out this type of scrambling.

24 Usually it's thought that what is crucial to make a projection a phase is the existence of an external argument in the specifier of the projection.

25 For the necessity of predicative phrases with empty heads in these contexts, see Baker (2003) and Bowers (1993).

26 It remains a question exactly which phrases are phasal and which phrases children will take to be phasal. Perhaps all 'verbal-like' phrases are taken as phasal by the immature child; that would include vP, PredP, CP, aP.


10.8 Conclusion

In short, TM constructions are so slow in development because UPR doesn't allow the necessary derivation. The syntactic analysis of TM, and its relation to phase theory, has helped us greatly in our understanding of why TM is delayed. It seems fair to say that the developmental facts (delay till age nine or so) of TM and its derivation from UPR provide further empirical support for the phasal analysis of TM, as in Hicks, and for phase theory in syntax more generally. It will not be the first time that syntactic theory and developmental theory have provided results that beautifully interact with each other, supporting the analysis of UG and development at the same time.


11

Assessing Child and Adult Grammar

JULIE ANNE LEGATE AND CHARLES YANG*

11.1 Introduction

Idealization is the prerequisite for theoretical progress, yet it requires constant revision to keep in touch with reality. The assumption of the child as an instantaneous learner has helped sharpen the focus on the properties of Universal Grammar (UG), though it inevitably deprives us of insights into the process of language acquisition. As Carol Chomsky's pathbreaking research shows, we stand to gain much from the transient stages in child language. Not all aspects of child language are acquired instantaneously or uniformly: acknowledging this in no way denies the critical contribution from UG and can only lead to a more complete understanding of language. To do so requires accurate measures of children's developmental trajectories, realistic estimates of the primary linguistic data, concrete formulations of linguistic theory, and precise mechanisms of language acquisition. It is in this spirit that we tackle the acquisition of the English metrical stress system in the present paper.

Why stress? First, the stress system of English has played a central role in the development of phonological theories (N. Chomsky and Halle 1968; Liberman and Prince 1977; Hayes 1982, 1995; Halle and Vergnaud 1987; Halle 1998), yet considerable disagreement remains. The developmental patterns of stress acquisition may contribute to the understanding of grammatical theories, as Carol Chomsky's work demonstrated. Second, there is now a reasonable body of developmental data on stress acquisition, both longitudinal and cross-sectional, such that the main (early) stages in children's metrical system can be identified—although as we shall see, more studies are still required before the phonological theory of stress can be fully connected with child language acquisition. Third, and quite generally, linguistic theories frequently

* For helpful comments and suggestions, we would like to thank Morris Halle, Kyle Gorman, and the audiences of the 35th Penn Linguistics Colloquium and the Parallel Domains workshop at USC. An extended version of this chapter, Legate and Yang (2011), is available from the authors.


have to make decisions on what constitutes the core system of the grammar—e.g., basic word orders, default rules, unmarked forms—and what can be relegated to the lexicalized margins. The complex metrical system of English is riddled with exceptions, thanks in part to the extensive borrowing in the history of the language. As far as we can see, theoretical devices that express these idiosyncrasies—e.g., diacritics, exception marking, or 'lexical listing'—are frequently asserted without a principled basis. Of course, these are decisions the child learner needs to make as well, for the primary linguistic data does not arrive pre-labeled as core or peripheral; the child's navigation toward the adult grammar might shed light on the choices of linguistic theorizing. Indeed, one might go as far as to identify the failure of dealing with realistic linguistic input, and exceptions in particular, as the source of a long-standing challenge that has been magnified in recent years. As discussed by N. Chomsky and Halle (1968), the existence of exceptions and other idiosyncratic patterns that run counter to a theory of grammar is unremarkable unless it leads to the development of a theory with higher degrees of generality, since exceptions can always be memorized. But as illustrated most vividly in the so-called past tense debate, there is a slippery slope from 'some parts of language are memorized exceptions' to 'all of language are memorized exceptions'. And the temptation grows stronger by the day as long as one fails to produce a principled treatment of exceptions; it is presently not difficult to find radically lexicalized theories where everything is memorized (e.g., Sag 2010).

Linguistics would seem a dreary enterprise if language were no more than a collection of idiosyncrasies. The burden of proof must fall upon those who do wish to uphold a systematic grammar to develop a principled account for exceptions. Our approach here is learning-theoretic, as we try to develop a realistic acquisition model that operates on the type of data that a young English learner might encounter. As far as we know, no formal study of language acquisition has ever considered the full range of linguistic experience. Keeping to the topic of stress acquisition, all current learning models have been 'sanitized', as they only deal with what the researcher regards as the core patterns of language, thereby steering clear of noise, exceptions, and the like. At the same time, one cannot uncritically assume the ready availability of especially informative items in the input (Tesar and Smolensky 2000); the welfare of the child's metrical stress should not be left to chance—needing to hear words such as Manitoba or Winnipesaukee (Dresher and Kaye 1990; Dresher 1999).

Our learning model is designed to detect structural productivity, or lack thereof, in the face of exceptions—exactly the type of situation that a metrical stress learner faces, and exactly the type of theoretical choices that the linguist faces. We evaluate the validity of generalizations in the metrical system that the learner might arrive at, and we aim to relate these to the developmental stages in child grammar and the theoretical treatments of stress in adult grammar.


11.2 Learning Productivity

How many exceptions can a productive rule tolerate? Our approach is a throwback to the notion of an evaluation measure, which dates back to the foundations of generative grammar (Chomsky 1955; N. Chomsky and Halle 1968, in particular p. 172). It provides an evaluation metric, and hence a decision procedure, that the learner can deploy to determine whether a linguistic generalization is productive and thus can be extended to new items that meet its structural description.

Though many decision metrics are conceivable, the calculus in our analysis is based on the real-time processing complexity of linguistic processes to which there are exceptions. Suppose that there exists a rule R that can in principle apply to a set of N lexical items; of these, m items are exceptions and do not follow R. We state without further comment the following result:

(1) Tolerance Principle: R is productive if and only if

m ≤ N / ln N

The reader is referred to Yang (2005) for the mathematical details of the model. In essence, the empirical motivation comes from psycholinguistic evidence that the number of exceptions (m) contributes to the time complexity of processing, so much so that after m reaches a certain threshold, as specified above, it becomes more efficient to list all N items as exceptions, which can be processed in a frequency-sensitive fashion.
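Since the threshold is just N/ln N, the decision procedure can be sketched in a few lines of Python (the function names are ours, for illustration only):

```python
from math import log

def tolerance_threshold(n):
    """Yang's tolerance threshold: a rule applying to n lexical items
    remains productive only if its exceptions do not exceed n / ln(n)."""
    return n / log(n)

def is_productive(n, m):
    """True if a rule over n items with m exceptions is productive."""
    return m <= tolerance_threshold(n)

# A rule covering 1,000 items tolerates roughly 145 exceptions:
print(round(tolerance_threshold(1000)))  # → 145
```

The sublinear growth of the threshold is the key property: the tolerated proportion of exceptions, 1/ln N, shrinks as the vocabulary grows.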

The Tolerance Principle can be straightforwardly applied to identify both productive and unproductive processes in languages. The case of English past tense is obvious: supposing that there are 120 irregular verbs, one needs a total of 800 (800/ln 800 ≈ 120) verbs altogether, or 680 regulars, to sustain the productivity of the -d suffix, which is of course easily met. Take another well-known case in the psycholinguistic study of morphology: the plural formation of nouns in German. The failure of the Tolerance Principle would be total if pluralization in German operates as claimed in some quarters (e.g., Marcus et al. 1995), with only one productive rule ('-s'), which accounts for only a tiny fraction of nouns (about 5 percent; Sonnenstuhl and Huth 2002): the -s rule would have 5 percent coverage and 95 percent exceptions. Thus there must be productive processes within the so-called irregulars. One quickly discovers that the feminine nouns in German tend to take the -n suffix, though all grammatical descriptions are quick to point out the existence of a considerable number of feminine nouns that take other suffixes. The Tolerance Principle can be used to evaluate these generalizations. For monomorphemic1 feminine nouns that have appeared at least once per million in the Mannheim corpus,

1 This is the most conservative estimate. If one includes compound nouns, the number of -n suffixed feminine nouns greatly increases. We thank Kyle Gorman for verifying these counts.


709 take the -n suffix while 61 do not—which is well below the tolerance threshold of 770/ln 770 ≈ 116. Thus, the -n suffix is predicted to be productive for feminine nouns. Two converging lines of evidence support this prediction. First, German children overuse the -n suffix as frequently as the -s suffix (Szagun 2001): the two thus must both be productive, which is the prerequisite for over-regularization. Second, lexical decision tasks show no whole-word frequency effect among the -n suffixed nouns—a hallmark of productive word formation processes (Penke and Krauss 2002). The claim of a productive -n rule has been made by many specialists on German morphology (Wiese 1996; Wunderlich 1999), often in reaction to the dual-route position of Marcus et al. (1995). The novelty of the present approach lies in its ability to reach similar conclusions on a purely numerical basis.

Under the Tolerance Principle, a mere majority of a form does not entail productivity; only a filibuster-proof supermajority will do, as the sublinear function 1/ln N translates into a small number of exceptions.2 Another case in English past tense illustrates the opposite side of productivity: paradigmatic gaps. It is well known (e.g., Pinker 1999; see also Gorman 2012) that the irregular stem forgo has no generally accepted past tense form (*forwent, *forgoed) while stride has no generally accepted past participle form (*strided, *striden). Following the original discussion of such matters (Halle 1973, in particular footnote 1), these ineffable forms can only arise in the unproductive regions of word formation, for otherwise a productive rule would automatically apply (as in the case of the wug test). Suppose the learner has encountered a verb for which the past tense or past participle form is irregular, i.e., not the regular -d form. He now knows undergo and stride must be irregular but has not encountered the past tense of the former or the past participle of the latter. He may also notice the pattern among the irregular verbs that a majority of them have identical forms for the preterite and participle (e.g., hold-held-held, think-thought-thought). Indeed, in the CELEX English lexicon, 102 out of the 161 irregular verbs follow this pattern of syncretism, but the 59 exceptions (e.g., break-broke-broken, sing-sang-sung) prove fatal. For a set with N = 161 items, a valid generalization can tolerate no more than 161/ln 161 ≈ 32 exceptions, which is considerably fewer than the actual number of exceptions. Thus, even though the preterite-participle identity pattern holds for almost twice as many items as exceptions, it fails to reach the productivity threshold. We correctly predict that the learner will be at a loss when he needs to 'undergo' in the past or 'stride' in the past participle.
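The two calculations just cited can be verified directly (a minimal sketch; the counts are those reported in the text):

```python
from math import log

# German feminine -n plural: 770 monomorphemic feminine nouns, 61 exceptions.
n, m = 770, 61
print(round(n / log(n)), m <= n / log(n))   # threshold 116 → productive (True)

# Preterite-participle syncretism: 161 irregular verbs, 59 exceptions.
n, m = 161, 59
print(round(n / log(n)), m <= n / log(n))   # threshold 32 → unproductive (False)
```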

The application of the Tolerance Principle critically depends on the composition of the vocabulary—or syntactic constructions; see Yang (2010)—that resides in the individual learner. The productivity of a certain process may even change, along with its scope of application and exceptions—the two quantities N and m may fluctuate as

2 Clearly, none of the English irregular rules can be productive since each would have thousands of exceptions (i.e., regular verbs); this is clearly reflected in the virtually total absence of over-irregularization errors (e.g., bring-brang) in child English and other languages (Xu and Pinker 1995; Clahsen 1999).


the learner processes more primary linguistic data. We now turn to these issues in the acquisition of the metrical stress system of English.

11.3 The Learning Model

We assume that the child learner has acquired a sufficient amount of phonological knowledge of her specific language to carry out the computation and acquisition of metrical stress. Specifically, we assume

(2) a. That the child has acquired the segmental inventory of the native language, which is typically fairly complete before her first birthday, even though the mechanisms by which such learning takes place are currently unknown (Werker and Tees 1983; Kuhl et al. 1992; see Yang 2006 for a review).

b. That the child has acquired the basic phonotactic constraints of the language (Halle 1978) and is thus capable of building syllables from segments, which are subsequently used to construct the metrical system.3 For instance, Dutch- and English-learning infants at nine months prefer consonant clusters native to their languages despite the segmental similarities between these two languages (Jusczyk et al. 1993).

c. That the child is capable of extracting words from continuous speech, perhaps as early as seven-and-a-half months (Jusczyk and Aslin 1995). While statistical learning in word segmentation (Saffran, Aslin, and Newport 1996) is not as useful as previously thought, universal constraints on lexical stress (Halle and Vergnaud 1987; Yang 2004) and the bootstrapping use of previously segmented words (Jusczyk and Hohne 1997; Bortfeld et al. 2005) appear to be sufficient for the task of segmentation, at least for English (Yang 2004).

d. That the child can readily detect prominence of stress. Indeed, very young infants appear to have identified the statistically dominant stress pattern of the language, as seven-and-a-half-month-old English-learning infants perform better at recognizing trochaic than iambic words (Jusczyk, Cutler, and Redanz 1993; Jusczyk, Houston, and Newsome 1999): at the minimum, the child is able to locate primary stress on the metrical structure of words, and acquisition of the metrical system probably starts well before the onset of speech. We return to the issue of trochaic preference in early child language, as it appears to be a transient stage toward the target grammar.

3 See Gorman (2012) for a modern assessment of the extent to which phonotactics can be regarded as a consequence of phonological knowledge, as the traditional position holds (Halle 1962), rather than an independent component of grammar.


These assumptions are warranted by the current understanding of prosodic development in children and appear indispensable for any formal treatment of stress acquisition.

We share the insight emerging from metrical theories that stress acquisition can be viewed as an instance of parameter setting, as the learner makes a set of choices made available by UG. However, we part ways with previous efforts on metrical stress acquisition in the following ways. Unlike Tesar and Smolensky (2000) and much of the acquisition research in Optimality Theory, we do not assume that the learner has access to a target-like representation of the metrical structure, which would largely trivialize the learning process. Indeed, similar complaints may be lodged against all learning models that provide the learner with both the underlying and surface representations of linguistic data: recovering the underlying structure from the surface structure is the task of the grammar, the very target of learning.4 In addition, the criticisms lodged at the cue-based approach below, in particular the issue of productivity and exceptions, apply equally to OT and corresponding learning models: the data does not go away under constraints.

In what is known as the cue-based learning approach (Dresher and Kaye 1990; Dresher 1999),5 the metrical parameters are set in an ordered sequence, each of which is crucially conditioned upon the choices of prior decisions. For instance, while syllables containing a long vowel (VV) may universally be regarded as heavy and syllables with a short vowel without coda (V) light, the weight of those with short vowel and coda consonants (VC) is a choice of the rime parameter for the specific language. However, the rime parameter is only 'active' for metrical systems, as in English, that are quantity-sensitive, where the stress placement makes crucial reference to syllable weight. Languages such as Maranungku are, by contrast, quantity-insensitive: the primary stress falls on the initial syllable, and secondary stresses on every odd syllable thereafter, regardless of their weights. Thus, the quantity sensitivity parameter must be set prior to the rime parameter, which likewise must precede the setting of the stress placement parameters.

A major motivation for learning as a sequence of decisions is to uphold the idealization of the child as a deterministic learner. For instance, suppose the child has not yet determined the quantity sensitivity of the language: if he proceeds to the stress placement parameters in a quantity-sensitive language such as English, he may well need to retreat from these parameters. But this idealization of the deterministic learner is both empirically problematic and formally unnecessary. As we shall see, there is an initial stage in the stress acquisition of Dutch (Fikkert 1994), a quantity-sensitive language, that can be appropriately characterized as quantity-insensitive (cf.

4 Conceivably, a joint inference approach could be used to infer both the underlying structures and the grammar mapping them to surface structures, which the learner can directly observe from the input. However, these techniques, which have been used in natural language processing, rely on supervised training methods, and we are not aware of any successful application in models of language acquisition.

5 See Baker (2001) for a similar approach in syntax.


Kehoe and Stoel-Gammon 1997), and the child does seem able to backtrack from this incorrect hypothesis before heading toward the target. Moreover, with the advent of UG-based probabilistic learning such as the variational model (Yang 2002; Straus 2008), the formal learnability motivations for cues are no longer necessary (Yang 2011).

More important, and more general to the theory of language and language learning, is the issue of balancing generalizations with exceptions. In more recent treatments of cue-based learning (Dresher 1999), it was recognized that the learner's choice may be influenced by the composition of the linguistic data. For instance, if the child were to suppose that English has a quantity-insensitive stress system, then words with n syllables must be stressed consistently. Dresher points to the presence of a few counter-examples to this conjecture (e.g., America but Minnesota) as cues for the child's abandoning quantity insensitivity. However, this approach would disqualify all generalizations about English stress, as every generalization must deal with exceptions. The learner's dilemma reduces to that of productivity: quantity insensitivity may be upheld if patterns such as America and Minnesota are not sufficiently abundant and can be listed as lexical exceptions.

Thus, the productivity model outlined in section 11.2 will play a critical role in our approach to metrical stress. The preliminary success of the model reviewed in section 11.2, reported in comprehensive detail in Yang (in preparation), provides us with sufficient motivation for its applicability in the present case. We outline our approach below.

Universal Grammar provides a core set of parametric options that delimit a range of possible metrical structures (syllables, weight, feet) and possible computational operations (e.g., projection, foot building, edge marking) that manipulate these structures. Frequently the stress rules are subject to highly language-specific structural conditions beyond the metrical system: as noted earlier, English exhibits distinct stress patterns for nouns and verbs (see Roca 2005 for Spanish), and a variety of affixes with stress-shifting properties. It is inconceivable that the totality of these options is available to the learner. Rather, we envision the learner experimenting with and evaluating the core metrical hypotheses in an incremental fashion as he processes linguistic data, and the learner chooses the grammar most highly valued with respect to the present data:

(3) a. If a grammar fails to reach productivity as prescribed by the Tolerance Principle (1), it is rejected.

b. If there are multiple grammars meeting the Tolerance threshold, the learner selects the one with the fewest exceptions (i.e., the most productive).

c. If no grammar is productive, then the stress patterns of words are memorized as a lexicalized list.6
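The selection procedure in (3) can be rendered as a small sketch (our own formulation; the candidate names and counts in the example are hypothetical):

```python
from math import log

def select_grammar(candidates):
    """Each candidate is a (name, n, m) triple: n words the grammar applies
    to, m words contradicting it. Returns the name of the most productive
    candidate, or None if none is productive (3c: fall back to a lexicalized
    list)."""
    productive = [c for c in candidates if c[2] <= c[1] / log(c[1])]
    if not productive:
        return None
    return min(productive, key=lambda c: c[2])[0]   # (3b): fewest exceptions

# Hypothetical candidates, for illustration only:
print(select_grammar([("trochaic", 420, 26), ("iambic", 420, 390)]))  # → trochaic
```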

6 This is not to say that the learner directly memorizes the stress patterns of words. If the acquisition of morphophonology is of any relevance, it seems that the learner would use rules to generate the stress patterns of words—it's just that these rules are not productive. See Yang (2002) for such a treatment of the


Each grammar Gi, then, can be associated with a tuple (Ni, mi): the number of words (Ni) it could apply to, and the number of words that contradict it (mi). Thus, the learner traverses a sequence of grammars as learning proceeds, presumably reaching the target GT in the end:7

(4) G1 → G2 → G3 → ... → GT

Under this view, Gi+1 is more highly valued than Gi, resulting from additional linguistic evidence unavailable at the stage of Gi. In particular, the additional data may have the effect of rendering Gi unproductive, thereby forcing the learner to adopt a different grammar Gi+1.8 In general, it is possible that a grammar's productivity changes as learning proceeds; after all, the numerical basis of productivity (Ni and mi) changes as the child learns more words.

It is also possible that UG provides certain markedness hierarchies, which lead the learner to entertain some grammars before others. For instance, it is conceivable that quantity-insensitive systems are simpler than quantity-sensitive ones, and the learner will evaluate the latter only if the former has been rejected by the linguistic data. Alternatively, the learning mechanism may consist of a simplicity metric—e.g., the length of the grammar (Chomsky 1955)—that favors certain grammars over others. And all such constraints can be construed as categorical principles or stated in a probabilistic framework of learning.

To operationalize the conception of learning in (4), we will first construct an approximate sample of the child's vocabulary and then evaluate several leading treatments of the English metrical system reviewed in section 11.3. This exercise serves the dual purpose of testing, on the one hand, the plausibility of a productivity-driven learning model, and on the other, the descriptive adequacy of theoretical proposals.

11.4 The Learning Process

The English stress system is complex enough to have engendered a number of competing theoretical analyses, though several points of generalization are common to most. Space limitation prevents us from giving the topic even a cursory review. Roughly

English irregular verbs, in contrast to the direct memorization approach in the dual-route morphology literature (Pinker 1999).

7 Strictly speaking, of course, there is no target grammar that the learner converges to. The learner reaches a terminal state, his I-language, based on the linguistic data he receives during language acquisition. Since the data is necessarily a sample of the environment, it is possible that the learner converges to a grammar that is distinct from that of the previous generation of learners, thereby leading to language change. See Yang (in prep.) for an application of the productivity model to the well-known case of noun/verb diatonic stress shift in the history of English.

8 This process of learning, which we believe is what Chomsky put forward in Aspects (1965), is somewhat different in character from the acquisition process in syntactic learning, perhaps reflecting the differences between phonological and syntactic systems (Bromberger and Halle 1989). For additional discussion, see Yang (2010).


speaking, main stress in the nominal domain falls on a heavy penult, and otherwise on the antepenult. In verbs, main stress falls one syllable closer to the word boundary: on a heavy final, and otherwise on the penult. Major differences between the models arise largely in the treatment of nouns with long vowels in the final syllable. The influential treatment of Halle and Vergnaud (1987) predicts final primary stress, while Halle's later account (1998), based on a different conception of metrical calculation that needn't concern us here, predicts final secondary stress, except in the case of a final long unstressable syllable, which will not bear stress. We will provide a summary of these predictions momentarily; for the moment, let's develop a realistic assessment of the linguistic input.
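As a rough illustration (ours, not the authors' implementation), the noun/verb generalization can be coded over syllable-weight strings, counting the stressed syllable from the end of the word; it is meant only for words long enough to contain the relevant syllable:

```python
def main_stress(weights, category):
    """weights: string of 'H'/'L' syllable weights, left to right.
    Returns the position of main stress counted from the end
    (1 = final, 2 = penult, 3 = antepenult)."""
    if category == "noun":
        # a heavy penult is stressed; otherwise the antepenult
        return 2 if weights[-2] == "H" else 3
    # verbs: one syllable closer to the word boundary
    return 1 if weights[-1] == "H" else 2

print(main_stress("LHL", "noun"))  # agenda-like shape → 2 (penult)
print(main_stress("LLL", "noun"))  # Canada-like shape → 3 (antepenult)
```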

We took a random selection of about 1 million utterances from child-directed English in the CHILDES database. We approximate the growth of the learner's vocabulary, which serves as the raw material for grammar learning, by extracting words within two frequency ranges to reflect the development of the metrical system. In total, 4.5 million words are used, for a total of about 26,700 distinct types. Using a state-of-the-art part-of-speech tagger based on Brill (1995),9 we evaluate the words that have been automatically tagged as nouns and verbs, about 20,000 in all, which constitute the majority of the child's vocabulary for any frequency range. Since nouns and verbs have somewhat different stress patterns, considering them together will pose a realistic test for any model that seeks systematic regularity amidst a heterogeneous mix of patterns.

In some of the studies we describe below, for reasons that will become immediately clear, words are morphologically processed using a computerized database from the English Lexicon Project (Balota et al. 2007), as morphology is also known to play an important role in the computation of stress and it is worthwhile to explore its implications in acquisition. Based on the consistent developmental evidence that inflectional morphology is acquired relatively early—in some languages very early—we assume that the learner is capable of parsing inflectionally formed words into morphological structures and considering their roles in the acquisition of stress.

In all our studies, the computerized pronunciation dictionary CMUDICT version 0.7 is used to obtain the phonemic transcriptions of words, which are then syllabified following the Maximize Onset principle (Kahn 1976), with sonorants and glides in the coda treated as syllabic.10 We ignore the prosodic effects on lexical stress in the present study. We assume that syllables containing long vowels (diphthongs and the

9 Available at <http://gposttl.sourceforge.net/>.

10 Entries that could not be found in these lexical databases are omitted. These are almost exclusively transcription errors or nonsense words in the CHILDES database.

A technical note regarding the utility of electronic databases in the present study: the CMU pronunciation dictionary does not contain part-of-speech information, making it impossible to distinguish the homographic words with distinct stress patterns (e.g., record the verb and record the noun). Words in the CELEX database do contain parts of speech, but their phonemic transcription has systematic inaccuracies. We combined the two databases to obtain the correct transcription.


tense vowels /i/ and /u/) are heavy (H), and syllables containing short vowels and no coda are light (L); it is the learner's task to determine the proper treatment of syllables with short vowels and at least one coda consonant (C), which may be treated as either H or L depending on the language. For the present chapter, we only consider the placement of the main stress. Since the pronunciation dictionary marks primary, secondary, as well as no stress, we mark the former as 1 and collapse the latter two as 0. For instance, the word animals will be represented syllabically as LLC with the stress contour of 100.
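The coding scheme can be illustrated with a toy helper (ours; the real pipeline works from CMUDICT transcriptions), summarizing each syllable as a long or short vowel plus the presence of a coda:

```python
def weight(syllable):
    """syllable: a (vowel_type, has_coda) pair, vowel_type in {'long', 'short'}."""
    vowel, has_coda = syllable
    if vowel == "long":
        return "H"                       # long vowels: heavy
    return "C" if has_coda else "L"      # short vowel: C with a coda, else L

# 'animals': three short-vowel syllables, the last closed by /lz/
syllables = [("short", False), ("short", False), ("short", True)]
print("".join(weight(s) for s in syllables))  # → LLC
```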

A thorough assessment of the learning model as encapsulated in (4) would involve an incremental growth of the learner's vocabulary (via Monte Carlo sampling, for instance) and the evaluation of alternative grammars along the way. For simplicity, we consider only two specific points of stress development: one designed to capture the child's stress system under a very small vocabulary, and the other when the child has already learned enough words to potentially match the target state.

In the first study, on early stress development, we extracted words that appear more than once per 10,000 words, resulting in 420 words, most of which, as expected, are relatively simple. The distribution of stress patterns is summarized in Table 11.1.11

The distribution in Table 11.1 is clearly consistent with a quantity-insensitive trochaic system. A total of 402 words can tolerate 402/ln 402 ≈ 67 exceptions, where in fact there are 26. Interestingly, children learning English and similar languages go through an initial stage, which terminates at about 250, during which the child is limited to a maximum bisyllabic template with the primary stress falling on the first.12 In the most detailed longitudinal study of stress acquisition, Fikkert (1994) notes that children acquiring Dutch, a language with similar metrical properties to English, frequently stress the initial syllable in disyllabic words for which the primary

TABLE 11.1. Stress patterns for words with frequency > 1 in 10,000

contour    counts
1          287
10         107
100        13
01         7
010        3
1000       3

¹¹ These extraordinarily long words are everybody, anybody, and caterpillar.
¹² Fikkert provides evidence, noted immediately below, for this limitation. Also compatible with our model would be for the child not to be limited to a bisyllabic template, but rather for the child to conjecture a quantity-insensitive grammar with the primary stress on the initial syllable. This grammar is obviously productive, having even fewer exceptions than that discussed in the text.

stress falls on the final syllable (e.g., ballón → bállon, giráf → gíraf). Moreover, the few trisyllabic words are invariably reduced to a bisyllabic form, with the primary stress always preserved (e.g., vakantie → kantie, olifant → ofant). Similar patterns have been observed for English-learning children (Kehoe and Stoel-Gammon 1997) in a word imitation task.

The preference for a trochaic stress system is not surprising, since it is well known that English children's early language has a large number of nouns (Tardif, Shatz, and Naigles 1997), most of which are bisyllabic, thus heavily favoring the trochee. Of course, English stress is not quantity-insensitive, and there are further complications with respect to lexical category and morphological structure. Indeed, if we expand the vocabulary for learning, with more verbal forms coming in, the initial trochaic grammar starts to break down, prompting the learner to develop alternative grammars. To this end, we now consider words that appear at least once per million in our sample of child-directed English, again focusing only on nouns and verbs. There are 4,047 nouns, 2,402 verbs, and 5,763 lexically and prosodically distinct words altogether.¹³

Now the bisyllabic trochaic grammar drops below the productivity threshold: while still the numerical majority, there are 2,388 monosyllabic words and 2,145 bisyllabic words with initial stress. A total of 4,533 is well below the requisite amount for productivity (5,763 − 5,763/ln 5,763 ≈ 5,097). Even a grammar that is not subject to the two-syllable limit and one that always places the primary stress on the initial syllable fails to rise to the occasion. Even though it accounts for an overwhelming majority of words (4,960, or 86 percent), there has been no report of an initial-stress strategy in the later development of the metrical system: we take this to be a non-trivial result of the productivity model.
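The productivity calculations above instantiate a single formula: a rule over N items is productive only if its exceptions do not exceed N/ln N. The following sketch is ours, not the authors'; the function names are illustrative:

```python
import math

def tolerance(n):
    """Maximum number of exceptions a productive rule over n items tolerates: n / ln n."""
    return n / math.log(n)

def is_productive(n, exceptions):
    """A rule is productive iff its exceptions stay within the tolerance threshold."""
    return exceptions <= tolerance(n)

# Early-vocabulary study: 402 words tolerate ~67 exceptions; only 26 are attested.
assert is_productive(402, 26)

# Expanded vocabulary: 5,763 words, threshold ~666, so a productive rule must
# cover at least 5,763 - 666 = 5,097 words; the 4,533 covered by the bisyllabic
# trochee fall short.
assert round(5763 - tolerance(5763)) == 5097
assert not is_productive(5763, 5763 - 4533)
```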

The child, then, must seek alternatives—in the direction of quantity sensitivity, an option in the metrical system. Here the learner has several moves to make. One possibility is to discover regularities within separate lexical classes, e.g., nouns and verbs. Language-learning children are well prepared to undertake this task, as the knowledge of lexical categories is acquired extremely accurately (see, e.g., Valian 1986). Another possibility is to consider the interaction between morphology and stress: in English, the inflectional suffixes do not trigger stress shifts in the stems but some of the derivational affixes do (e.g., -ic but not -ment). This case merits some discussion.

¹³ Words that appear in the input as both nouns and verbs (such as walk and record) contribute to both the noun and the verb counts; these are used when the learner evaluates distinct grammars for nouns and verbs. In the case of walk, the word only contributes once to the total count of words, since the noun and verb forms of walk are metrically identical. A word like record, by contrast, counts twice in the total word count, since the verb and noun forms of the word are distinct.

An English-learning child is well positioned to take inflectional morphology into consideration in the computation of stress. All inflectional suffixes are learned before 3;6 when measured by Brown's 90 percent obligatory usage criterion in production, and it is likely that these suffixes are reliably put into use in comprehension even earlier: children as young as 20 months to 2 years old can interpret the inflected forms of verbs (Golinkoff et al. 1987), including novel ones (Naigles 1990). Derivational affixes, however, are an altogether different matter. While we do not subscribe to the commonly held view that inflectional and derivational morphologies reflect fundamentally different aspects of grammar (see also Halle 1973), the fact remains that derivational morphology is learned relatively late, perhaps well into the school years (Tyler and Nagy 1989), which may simply be the result of derivational forms being less frequent in the input data and thus providing the learner with fewer instances of data for acquisition. Taken together, we assume that the learner is capable of relating inflectional forms of verbs to their stem forms, but is incapable of parsing derivational forms into decomposable pieces (words such as growth and government will be treated as morphologically simplex). Furthermore, we assume that the learner has correctly learned that inflectional suffixes do not trigger stress shift—a task easily accomplished, again, by the use of the productivity model: there are no exceptions to the lack of stress shift with inflectional morphology. In other words, the child treats all inflectional forms of walk (i.e., walk, walks, walked, and walking) as walk for the purposes of stress acquisition. Toward the end of this section, we briefly discuss how the child may acquire the stress-shifting properties of derivational suffixes.

We now turn to the placement of primary stress under the Halle and Vergnaud (1987) and Halle (1998) proposals, which are summarized operationally as follows:

(5) The Halle and Vergnaud (1987) system (HV87)

a. Nouns:

• If the final syllable contains a long vowel (VV), it receives primary stress.
• Otherwise, if the penult is heavy (i.e., VV or VC+, a short vowel with at least one consonant coda), then the penult receives primary stress.
• Otherwise the antepenult receives primary stress.

b. Verbs:

• If the final syllable is super heavy (i.e., VV or VCC+, a short vowel with at least two consonants in the coda), then the final syllable receives primary stress.

• Otherwise the penult receives primary stress.

Page 195: Rich Languages From Poor Inputs

i8o Legate and Yang

(6) The Halle (1998) system (H98):

a. Nouns:

• If the penult is heavy (i.e., VV or VC+), then it receives primary stress.

• Otherwise the antepenult receives primary stress.

b. Verbs: Same as HV87 above (5b).
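The rule systems in (5) and (6) are deterministic enough to render as a short procedure. The sketch below is ours, not the authors': syllables are encoded as 'L' (short vowel, no coda), 'VC' and 'VCC' (short vowel with one or with two-or-more coda consonants), and 'VV' (long vowel); each function returns the position of primary stress counted from the right edge (1 = final, 2 = penult, 3 = antepenult), and the min() clauses handling words shorter than three syllables are our simplification.

```python
def heavy(syl):
    # Heavy: long vowel (VV), or short vowel with at least one coda consonant (VC+).
    return syl == 'VV' or syl.startswith('VC')

def superheavy(syl):
    # Superheavy: long vowel (VV), or short vowel with two or more coda consonants (VCC+).
    return syl == 'VV' or syl.startswith('VCC')

def noun_stress_hv87(syls):
    """HV87 nouns: final long vowel, else heavy penult, else antepenult."""
    if syls[-1] == 'VV':
        return 1
    if len(syls) >= 2 and heavy(syls[-2]):
        return 2
    return min(3, len(syls))

def noun_stress_h98(syls):
    """H98 nouns: as HV87, minus the final long-vowel clause."""
    if len(syls) >= 2 and heavy(syls[-2]):
        return 2
    return min(3, len(syls))

def verb_stress(syls):
    """Verbs under both systems: superheavy final syllable, else the penult."""
    if superheavy(syls[-1]):
        return 1
    return min(2, len(syls))
```

For instance, a trisyllabic noun with a light penult, such as animals (['L', 'L', 'VC'] in this encoding), receives antepenultimate stress under both systems, matching its contour 100.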

Table 11.2 below summarizes the results of evaluating HV87 and H98 under a variety of conditions with respect to inflectional decomposition (stem±) and lexical separation (lex±). When evaluating grammars without making the lexical distinction ([lex−]) between nouns and verbs, we use the noun rules of HV87 and H98. Since the vocabulary consists of far more nouns than verbs, the failure of the noun rules to reach productivity entails the failure of the verb rules. When evaluating grammars with separate rules for nouns and verbs, we consider a grammar to be successful only if its rules reach productivity for both nouns and verbs. The raw data can be found in Legate and Yang (2011).

The H98 system under (lex+, stem+) can successfully identify the stress patterns of English with a tolerable amount of exceptions. It also manages to reach productivity under (lex+, stem−), though it accumulates more exceptions and is thus disfavored. Unfortunately, there are no direct studies of the interaction between inflectional suffixes and stress—or lack thereof, to be precise—from the transient stages of metrical acquisition, although our results do support the H98 description of the target grammar.

It is interesting to examine the nature of the exceptions under the H98 system, which reveals some interesting patterns considered in Halle's discussion, as well as in the traditional literature. Upon inspection, most of these end in the long vowel /i/, including the final derivational suffix (e.g., the diminutive -y/-ie, as in kitty and doggie) as well as morphologically simplex words such as body and army. Halle notes (see also Liberman and Prince 1977) that these suffixes are unstressable and are therefore ignored by the rules for stress assignment. Although he does not address

TABLE 11.2. Evaluation of stress grammars for words with frequency > 1 per million

lex    stem    HV87    H98
−      −       no      no
−      +       no      no
+      −       no      yes^a
+      +       no      yes^b

a. With 515 exceptions.
b. With 355 exceptions.

TABLE 11.3. The validity of stress preservation for certain derivational suffixes that are factually stress-preserving

suffix    shifting    N      m    valid
-ment     no          201    0    yes
-ary      no          41     8    yes^a

a. 8 < 41/ln 41 ≈ 11

how the learner might reach such conclusions, the productivity model can be straightforwardly deployed for this task. The morpheme segmentations in the English Lexicon Project list 530 words with the -y suffix: none receives primary stress, or even secondary stress. The productivity model can clearly identify such generalizations; if so, the productivity of the H98 system will be further enhanced.

More broadly, the productivity model can be used to detect the metrical properties of all morphological processes.¹⁴ In the study presented here, we have assumed that the learner has not fully mastered the derivational morphology of English: indeed, the stress-shifting properties of derivational suffixes are acquired quite late, partly having to do with their low frequencies in the linguistic data (Jarmulowicz 2002). Here we sample a few representative derivational suffixes and explore their roles in affecting the stress contour of the stem; some of these, as we shall see, have exceptions and thus pose some challenges to a learning model. For instance, the suffix -ary is generally taken to be stress-preserving, as in station-stationary, but there are also pairs such as document-documentary where the stress does shift. Again using the morpheme segmentations provided in the English Lexicon Project, we compare the stress pattern of the stem and the suffixed form, while omitting words for which stress shift is not applicable (i.e., monosyllabic stems, as in tone-tonic). For all four suffixes, we consider whether the non-shifted variant is productive, as this is the assumption of the child at the time of acquisition—the child has learned that suffixes do not shift in English. Another motivation for this treatment is that young children may not have carried out derivational segmentation; once derivational suffixes begin to be acquired, they are initially assumed to be stress-preserving.

The results for the stress-preserving suffixes -ment and -ary are summarized in Table 11.3. We see that the stress-preserving suffix -ary remains productively so despite a few counter-examples.

As seen in Table 11.4, for the stress-shifting suffixes -ic and -ous, the non-shifting option is non-productive. The shifting option, in contrast, is exceptionless, assuming that the child analyzes -ous using the stress pattern for nouns.
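The 'valid' columns in Tables 11.3 and 11.4 apply the same N/ln N threshold to suffixed forms. A minimal sketch (our naming), where n is the number of applicable stems and m the number whose suffixed form shifts stress:

```python
import math

def preservation_valid(n, m):
    """Non-shifting (stress preservation) is productive iff m <= n / ln n."""
    return m <= n / math.log(n)

# -ary: 8 shifted forms out of 41 stay under the threshold of ~11, so valid.
# -ous: 30 shifted forms out of 90 exceed the threshold of 20, so not valid.
```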

¹⁴ It can be used to detect the productivity of morphological rules/affixes. Some examples are already reviewed in section 11.2; for a comprehensive treatment, see Yang (in prep.).

TABLE 11.4. The validity of stress preservation for certain derivational suffixes that are factually stress-shifting

suffix    shifting    N      m      valid
-ic       no          135    120    no
-ous      no          90     30     no^a

a. 30 > 90/ln 90 = 20

11.5 Conclusion

Given the complexity of the English metrical system and its interactions with the other components of grammar, our treatment here is admittedly preliminary. We do hope, however, that the quantitative approach guided by a precise model of learning can be used to evaluate the theories of metrical stress from the past and shed light on the directions of research in the future. And we hope that this study makes a suitable tribute to Carol Chomsky's legacy:

The information thus revealed about discrepancies between child grammar and adult grammar affords considerable insight into the process of acquisition, and in addition, into the nature of the structures themselves. (Carol Chomsky 1969: 2)

12

Three Aspects of the Relation between Lexical and Syntactic Knowledge

THOMAS G. BEVER

Carol Chomsky's work on language presents a balance of empirical, theoretical, and applied research. In this brief chapter, I outline three areas of current research that reflect different emphases in her work. First, language learning can proceed over a long period, possibly into adolescence. That is, the syntactic affordances of individual words can be acquired slowly, sometimes not until early adolescence. Second, language learning is robust despite many individual and environmental differences: the acquisition of basic syntactic patterns follows the same general patterns despite wide variation in individuals and linguistic environment. Third, (psycho)linguistic science can be usefully applied to such problems as reading; in particular, fluent reading involves integrating lexical and phrasal levels.

Each of these areas involves a balance between processing of the two major kinds of information one has about one's language: the lexicon and the syntax. Knowing and using a language necessarily requires both kinds of information: C. Chomsky was somewhat unusual in recognizing immediately that a central problem for understanding language is the relation between these two kinds of information.

12.1 How Long Does Language Learning Really Take?

A dominant explicit and implicit assumption of today's language science is 'the biolinguistic assumption': that language learning is paced by internal maturational factors. The apparent formal similarities of all languages initiate the idea that biological linguistic universals underlie linguistic structure, and hence, language learning. For many years, the matter seemed open and shut to many: the notionally available alternative model of language 'learning', 'associative stimulus-response training',

is hopeless in the face of several facts. First, language learning appears to proceed without direct feedback; second, the child is exposed to a very small number of grammatical utterances, usually without any didactic intent by its caretakers; finally, the similarity of stages in normal language learning in different languages—even sign languages—attests to universal constraints and computational stages that all children bring to language-learning experiences (for recent discussions, see Hauser et al. 2002; N. Chomsky 2007a).

The emerging biolinguistic program defined research on language acquisition as the close study of specific stages and of the prefigured typological dimensions of language (aka 'parameters') that a child must set for his or her native language (see, e.g., Hauser et al. 2002; Lightfoot 1991; Fodor 2001; Fodor and Sakas 2004). A strong empirical corollary of this research approach is that the critical features of each language are acquired by mid-childhood, certainly by age six years: children not only have mastered intricacies of syntactic patterns within clauses, they understand the structure of remote relations between clauses despite frequent patterns that seem identical but are not: for example, the difference between 'John told Bill to go', in which Bill should go, and 'John promised Bill to go', in which John should go.

C. Chomsky started her graduate research on language acquisition within this historical framework. But her eventual dissertation famously mitigated the categorical and punctate interpretation of how language learning proceeds (C. Chomsky 1969). She used innovative methods to show that while children seem to have mastered complex verb distinctions such as the difference between 'tell' and 'promise', they actually can systematically misunderstand 'promise' sentences as being like 'tell' sentences until at least age ten or later. That is, a child of nine might interpret 'the monkey promised the dog to leave' as meaning 'the monkey told the dog to leave'. Similarly, children confuse so-called 'tough' constructions with corresponding actives: if a child is asked to make a blindfolded monkey 'easy to see', s/he might simply remove the blindfold. To account for such data, Chomsky formulated a version of a 'minimum distance' principle, on which language-learning children apply a principle that the agent of a verb is its nearest leftward noun phrase. This principle is often discussed even today in comprehension models, generally without attribution to Carol Chomsky. In the language acquisition world, this led to a series of studies, actively pursued today, showing that many aspects of linguistic structure are mastered over a much longer period of childhood than was earlier believed (e.g., see articles collected in Frazier and DeVilliers 1990). C. Chomsky later broadened the evidence for the impact of experience by showing that extensive exposure to written English is associated with more sophisticated mastery of complex constructions (C. Chomsky 1980).

Chomsky's findings should have shattered supportive corollaries of the biolinguistic program—that language acquisition should be rapid and categorical stage by stage, and largely unaffected by experiential variables. But such results can be ignored by committed biolinguists, at least as having no deep implications for the biolinguistically paced model of language acquisition. If the period is longer than originally thought, it does not in itself deny the possibility of prefigured universals that channel the child's mastery of a native language into setting specific parameters—it remains the case that the incremental associationist model of language learning cannot account for structural phenomena, whether final language learning takes six or twelve years: either way, the language experience of the child is too impoverished, and often totally lacks any corrective feedback, to account for a purely incremental accumulation of linguistic structures. If enriched language experience facilitates fluent mastery of complex constructions and subtle vocabulary nuances, that does not undercut the fundamental core of acquisition as dependent on universal, maturationally emergent mechanisms. A recent example is evidence that certain neurological fiber tracts connecting different language-processing areas of the brain are not fully functional even at age five, and await further development (see Friederici 2009). This is a modern physiological confirmation of some facts that may underlie the gradual acquisition of syntax in a way consistent with the biolinguistic maturational hypothesis. Linguistically relevant neurological maturation itself may be much slower than had been usually thought.

Several developments in recent years have increased the salience of the finding that at least some linguistic features are acquired slowly. First, there has been a burst of interest in showing that random discourses do contain statistically valid information from which it is possible to extract categorical structures, given the right sort of statistical engine (Cartwright and Brent 1997; Moerk 2000; Mintz 2002, 2006; Yang 2006). At the same time, studies of infants and young children are showing that they do have pattern-extracting abilities that might interact with statistically valid information to aid, if not completely support, language acquisition (Gerken 1996). Finally, we have never actually been restricted to considering only two kinds of learning models: behaviorist associationism vs biolinguistic nativism. There is at least a third kind, which may be receiving renewed support, that integrates both biologically prefigured categories and the statistically valid features of experience: a hypothesis-testing model on which language learning utilizes both innate constraints and human problem-solving strategies (Bever et al. 1984; Bever 2009).

In the last century, the sustained work of the gestaltists (especially Wertheimer 1945) outlined several features of how problem solving works. The most important feature is the emotional importance of the so-called 'aha' reaction when a person thinks s/he has found the solution to some problem or task. What this shows is often overlooked in formulating theories of learning—humans experience an intrinsic thrill merely in solving a problem—and this is true whether the problem is an important one or not. Indeed, we often engage in creating otherwise useless problems to solve, just so we can enjoy the experience of solving them. An entire theory of aesthetic experience is based on this principle: music sets acoustic problems for resolution; the graphic arts do the same in vision; of course, drama, literature, and poetry are the flagship cases of problem creation and resolution.

Suppose the child treats discovering the syntax of her language as one of the first big life problems to solve. This would explain it as motivated not by the urge to communicate (as in the usual behaviorist explanation), nor as forced by maturation (as in the strong biolinguistic explanation), but as an activity that is cognitively, intrinsically thrilling and fun. That is, the child learns the language because it is an exciting, self-stimulating thing to do (Bever 1987). At the same time, current sociolinguistic research reminds us that language variation serves an important group-identifying purpose (see articles in Eckert and Rickford 1995). On this integrated view, children are determined to solve the problem of how their native language works because it helps them be 'just like' the grown-ups around them: the cognitive thrill involved in successive solutions to how the adult system works provides stage-by-stage feedback and intrinsic reward.

What are the structural features of problem-solving models, and what do they tell us about language and language learning? Miller et al. (1960) rehabilitated the older Gestalt model of problem solving as 'hypothesis formation and testing'. In the case of language, this requires a set of systems that formulate hypotheses and mechanisms for testing those hypotheses. Recently, we have formulated this in the framework of an analysis-by-synthesis model of language acquisition (Townsend and Bever 2001; Bever 2009). On this model, children apply both inductive and deductive computations for hypothesis formulation and confirmation. The overall goal is to find a coherent structure for the language experiences that systematizes the relation amongst and between meanings and forms.

This model makes several kinds of predictions:

(a) Languages should exhibit statistically valid patterns, independent from structural constraints. This is a necessity for the inductive component of the analysis-by-synthesis acquisition model to have data to formulate hypotheses for structural confirmation based on the child's structural, deductive, language component. A simple example of this is the universality of a 'Canonical Syntactic Form' in every language. In English, this appears with a general surface pattern: almost every sentence has the surface form 'noun phrase' followed by a 'predicate' that agrees with the noun phrase, followed by other material. For a time, this has been thought to motivate the existence of a particular configurational constraint on derivations, the so-called 'extended projection principle' (N. Chomsky 1981; Lasnik 2001; Epstein and Seely 2002; Svenonius 2002; Richards 2003). Other languages have other canonical forms, sometimes based on word order (e.g., German is 'inflected verb second'), sometimes based on inflectional morphology, sometimes on a combination of linguistic features. In each case, the universality and frequency of the canonical form is unmotivated by universal linguistic architectural constraints—thus, attested languages are a subset of architecturally possible languages, such that they exhibit forms that facilitate the discovery by the child of an initial set of generalizations for test and analysis in structural terms. The theoretical problem with the EPP is that it is an add-on configurational constraint on derivations. This did not matter so much in the context of GB theory, with its many 'filters'. But in the context of today's Minimalist program it is definitely a stipulated universal, not one that follows from more general principles. Thus if we can explain it as a function of what makes attested languages learnable, we have removed it as a theoretical carbuncle (Bever 2009).

(b) In English, for example, it is critical that the canonical form both have a near-universal surface appearance and also have critical differences in some of the mappings of that surface form onto thematic relations. In English almost every sentence with the canonical surface form assigns the initial noun phrase 'agent' or 'experiencer' status in relation to the following predicate. But it is critical for the model that not every such sentence is mapped the same way. This variation sets a problem for the child to solve: what is the overall structure that accounts for both the surface features and the variation in the thematic mapping? This calls on application of the structural component of the dialectic involved in building up syntactic knowledge.

(c) The canonical form is learned by the child through an inductive process rather than as an initial stage. Numerous studies have confirmed that the child starts to rely on the canonical form of its language by age three to four, not initially (Bever 1970; Slobin and Bever 1982).

(d) The problem-solving model can mitigate the 'poverty of the stimulus' by utilizing the canonical form to generate sets of meaning-form pairs that the child has not yet experienced. This helps the language-learning child to be a 'little linguist' (Valian 1999) without having memorized a large number of form-meaning pairs, and without querying the adult world the way grown-up linguists do. A classic reflection of this is in the research of Ruth Weir (1962), showing that children manifestly 'practice' to themselves the paradigms in their language—most important is the apparent fact that they utter sentences in canonical frames that they have never heard.

(e) Certain aspects of language learning may be relatively dependent on induction, and hence may take a longer time to be mastered than others. We can (and do) interpret C. Chomsky's findings of the relatively slow mastery of certain kinds of verb-intrinsic structural constraints as support for this prediction. In the framework of a hypothesis-testing model of acquisition, certain linguistic features will intrinsically emerge as the 'core' of the language, and others will be modifications of the core by virtue of their less frequent appearance. In this way, the frequency of a feature in the child's experience can actually explain some aspects of the order of acquisition of different components.

It should be emphasized that this view neither denies nor minimizes the critical computational capacities that underlie the successive structural hypotheses that the child formulates to match the empirical generalizations. The model requires the dynamic interaction of both biological constraints and statistical features of experience.

12.2 Language Learning is Robust Despite Variation in Individual Biology and Experience

C. Chomsky's work showing that some aspects of syntax are acquired slowly has been taken as support for language-learning theories that emphasize experience and induction primarily. However, Chomsky's other acquisition work showed that children are remarkably resistant to the effects of variation in experience and individual abilities in learning the basic syntactic forms of language (e.g., C. Chomsky 1986a). That is, despite the effects of experience on the final level of sophistication children can reach, they all learn the essential core of language roughly as fast and as well. In this sense, Chomsky's discoveries of how experience can affect vocabulary subtleties that interact with syntactic structures actually served to highlight the fundamental similarities in the formative stages of language acquisition.

These studies highlight the contrast between the acquisition of lexical items and the acquisition of syntactic patterns: since language learning involves both kinds of knowledge, this raises the possibility that there might be profound individual differences in even the early stages of how children approach language acquisition. An interesting possibility lies in genetic variation in the nature of cerebral asymmetries for language. More than half a century ago, A. Luria observed that right-handers with left-handed family members (RH-LHF) appear to have more right-hemisphere involvement in language than right-handers with only right-handed family members (RH-RHF): RH-LHF aphasics recover language function faster and more fully, and they show a greater incidence of 'crossed aphasia' (aphasia resulting from right-hemisphere damage) (Luria 1948, 1970). These findings have been replicated (Hutton et al. 1977); recently, a direct fMRI study showed that some RH-LHF people actually show no left-hemisphere asymmetry in brain activation in language tasks (e.g., Knecht et al. 2000; Khedr et al. 2002).

It is important that the frequency of RH-LHF people is not small: roughly 40 percent of all undergraduates we have studied are RH-LHF. The question then is: is the difference in brain laterality related to a difference in language processing, and hence in neurological organization for language? The emerging answer is yes. Over many years of research, we have found evidence for a major language-processing difference between the two biologically coded groups of people in the relative emphasis on accessing separate words and syntactic patterns: right-handed adults with familial left-handers (RH-LHF) access lexical items more readily than syntactic patterns, while right-handed people with no left-handed family members (RH-RHF) access syntactic patterns more readily. Before discussing a possible explanation for this initially strange finding, here are some published facts that support the generalization (Bever et al. 1989).

(a) RH-LHF subjects read texts faster in a self-paced word-by-word reading paradigm, where each button press brings the next word. RH-RHF subjects read texts faster in a self-paced clause-by-clause reading paradigm, where each button press brings the entire next clause (Bever et al. 1989).

(b) RH-LHF subjects recognize that a probe word occurred in a just-heard sentence faster than RH-RHF subjects do. But RH-LHF people recognize a short phrase that is synonymous with part of a just-heard sentence more slowly than RH-RHF subjects (Townsend et al. 2001).

(c) RH-LHF subjects understand short essays better when the text alternates words between the ears than when the text is presented monaurally to one ear or the other: the opposite obtains for RH-RHF subjects (Iverson and Bever, reported in Bever 1989).

These and many other studies support the behavioral distinction in adult language behavior, with some initial implications for language learning. Decades ago, we speculated that the group differences might result from a general difference in the extent to which relevant neurological areas of the right and left hemispheres are more equipotential in people with familial left-handedness. This relative equipotentiality could then result in more widespread and redundant representation for those aspects of language that are not as computationally demanding as syntactic processing. That is, the neurological representation of lexical items in RH-LHF people can be more widespread and hence offer more separately available representations of words.

Recently we tested this hypothesis in a brain-imaging study and found support for it (Bever et al. in preparation). We gave subjects words in random order, and asked them to think of the input in a logical order. In one case, a syntactic task, the words could be ordered into a sentence ('mothers upset daughters'): in the other case, a lexical task, the words could be ordered by class inclusion ('penny, coin, money'). With fMRI imaging, we found relatively faster processing for the lexical task in relevant areas of the right hemisphere than the left for RH-LHF subjects only (Bever et al. in prep.).

A recent study published evidence that this differentiation has implications for differences in how language is acquired. We looked at the effects of the initial age of exposure to ASL in a large population of deaf people: RH-LHF people show a considerably younger 'critical' age for mastery of ASL, roughly at eight years, while the corresponding age for RH-RHF people is at least twelve years (Ross and Bever 2004). We interpreted this difference as reflecting the usual age at which there is a burst of word learning, roughly between four and eight years—if RH-LHF people depend on word learning as a vehicle for language acquisition, then they would be more dependent on the period of rapid lexical learning. Of course, this would also make predictions for individual differences in the phenomena found by C. Chomsky involving later learning of the syntactic restrictions related to specific lexical items. However, that research remains to be done.


190 Bever

The preliminary conclusion from this aspect of our research is that there may indeed be more than one way to access the information critical for language learning, as implied by C. Chomsky's early studies. Of course, there is the further possibility that the neurological representation of some essential features of language may also reflect the biological variables. But that somewhat radical viewpoint remains to be shown.

12.3 Understanding Fluent Reading as Unifying Word Sequences into Phrases

The preceding discussions revolve around acquiring and using the relation between individual words and syntactic patterns as the child learns language. C. Chomsky addressed the corresponding problem in the acquisition of reading: how does the child pass from the stage of reading word-by-word to fluent phrase-by-phrase reading in which words are grouped into computationally appropriate phrases? How can we help this process? With this goal in mind, C. Chomsky hit on a paradigm which utilizes the child's relatively natural ability to group phrases in normal intonational units, to carry over to his/her reading ability. Chomsky developed a technique that sounds really simple, but like many such ideas is simple only after one has isolated it: have the child read and then reread the same passage repeatedly (C. Chomsky 1976a). On rereadings, the word-by-word reading child naturally begins to impose normal intonation patterns on the now familiar word sequence: Chomsky showed that the result was a notable increase in fluent reading of new material—the experience of discovering how to read known passages with normal grouping may play a role in stimulating grouping strategies in general. One interpretation of this is that the child learns to listen to a 'voice in the head' as s/he reads, after training with the 'voice outside the head'. That is, like St Augustine, the child can discover that it is possible to read 'silently', while hearing and utilizing natural intonation patterns generated internally (Bever 2009).

In our research, we have worked out a different kind of application of phrasing knowledge to the reading process. In our case, we systematically increase spaces between phrases.

Writing systems in general (but not always historically) have specific ways of indicating segmentation in words, thereby solving a major problem of speech comprehension. Today, we take it as obvious that putting a space between words is a good idea. We also rely on punctuation conventions that can mark major phrases from each other. But what about smaller phrasing such as in the previous sentence, as broken up below:

We also rely

on punctuation conventions

that can mark major phrases

from each other


Numerous published studies, starting in the 1960s, have shown that indicating phrase boundaries by some marker improves text comprehension (see review in Bever et al. 1990). This fact remained a laboratory curiosity for many years without practical value, for three reasons: identifying 'phrases' had to be done by actual people; implementing the phrase boundary markers was limited to actual characters or extra whole spaces, which looked odd if not downright ugly; the notion of what counted as a relevant 'phrase' was not well understood or uniform. Modern computer and printing techniques have offered solutions to each of these problems. Printers can be controlled to modify spaces and characters in very small increments that do not result in aesthetic disturbance; 'phrases' can be automatically identified by many algorithms; the algorithms themselves provide precise definition of the phrases.1

Why should phrase spacing improve reading? On the traditional view, it is because it reveals to the reader how to segment words together and build the correct surface phrase structure as an initial step in reading comprehension: this follows from the traditional view that the first step in comprehension is to determine the correct syntactic structure. But our phrase-formatting algorithms in fact do not find the syntactically correct phrase structure—rather, they isolate those kinds of phrases that are easily detected, based on distributional patterns of words and phrases in actual texts. For example, our algorithm phrases the two sentences below differently, as shown by extra spaces in them. Yet, from a linguistic standpoint, they have identical phrase structures, as shown by the bracketed examples.

The large dog  was barking  at the small cat
The large dog barked loudly  at the small cat

(the (large dog)) ((was barking) (at (the (small cat))))
(the (large dog)) ((barked loudly) (at (the (small cat))))

The different analyses assigned by our algorithms follow from the fact that function words such as /was/ and /at/ are easily learned as beginning phrases, while /barked/ is infrequent and will not be recognized by a model that learns phrase boundary cues from texts. This raises a question of theoretical interest: which kind of phrase boundaries are the best to use for implementing segmentation, syntactically correct ones or those assigned by ReadSmart? With linguistic colleagues to help us assign a correct surface phrase structure to standard font-testing texts, we examined this question carefully. We contrasted the comprehension of phrase-spaced formats based on syntactic vs ReadSmart phrases. The results (published) astounded even us: the

1 We have been testing the efficacy of a set of automatic programs we have written, called ReadSmart™ (now patented), which incrementally increase space size between phrases. We have shown that comprehension of ReadSmart texts and reading speed improve by roughly 15% each, more for poor readers. (See, e.g., Jandreau and Bever 1992; Bever 2009.) We have also found that the texts are enjoyed more by readers and found to be more convincing. In one semester-long classroom study, readers using the phrase-spaced format earned significantly more honor grades, and had significantly fewer failures than readers using the normal format.


ReadSmart-phrased texts were far easier to comprehend; in fact the syntactic-phrased texts were harder to understand than normal untreated texts (Bever et al. 1990).
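The distributional idea behind such segmentation can be made concrete with a toy sketch. This is our own illustration, not the patented ReadSmart algorithm: the function-word list and the fixed double-space increment are invented for the example, whereas a real system learns boundary cues from corpora and adjusts spacing in fine typographic increments.

```python
# Toy sketch of distribution-based phrase spacing (NOT the ReadSmart
# algorithm): widen the gap before a word that commonly begins a phrase,
# i.e. a function word that does not itself follow a function word.
FUNCTION_WORDS = {"the", "a", "an", "was", "is", "at", "in", "on",
                  "to", "of", "that", "from", "with", "by"}

def phrase_space(sentence: str, gap: str = "  ") -> str:
    """Insert a wider gap before likely phrase-initial words."""
    words = sentence.split()
    out = [words[0]]
    for prev, word in zip(words, words[1:]):
        starts_phrase = (word.lower() in FUNCTION_WORDS
                         and prev.lower() not in FUNCTION_WORDS)
        out.append((gap if starts_phrase else " ") + word)
    return "".join(out)

print(phrase_space("The large dog was barking at the small cat"))
# The large dog  was barking  at the small cat
print(phrase_space("The large dog barked loudly at the small cat"))
# The large dog barked loudly  at the small cat
```

As in the chapter's example pair, the heuristic marks a boundary before /was/ and /at/ but fails to mark one before the infrequent content word /barked/, so the two sentences receive different spacings despite identical linguistic phrase structures.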

This follows from the reconstructive view of reading comprehension, as refined by our consideration of details of the analysis-by-synthesis model of spoken language (see Goodman 1967 for the original proposal of this idea). That model involves two phases of structure assignment, an initial one based on readily available cues and patterns, and a later one based on a full syntactic analysis (Townsend and Bever 2001). Our results show that basing visually salient phrase information on readily available cues leads to the best comprehension, thereby giving empirical support to our claims about initial phases of reading comprehension itself. It also gives support to the larger claim that, like speech comprehension, reading involves several stages of extracting structure and assigning meaning. This notion is now receiving empirical support from neurolinguistic studies. For example, Dikker et al. (2009) have shown that an early sensory component of evoked potentials (within 100-200 milliseconds) occurs to local phrase violations only in cases involving explicit function words or morphemes. This is direct support for the initial prospective component of comprehension proposed by us and Goodman, and is consistent with our finding that phrase spacing based on morphologically explicit phrases is most effective in improving comprehension.

These considerations offer some perspective on how readers rapidly create a linguistic representation along with the ghostly voice offering an internal rendering of the text. C. Chomsky's technique of having children read and reread text to inculcate phrasing fluency brings out an initial 'voice outside the head'. Our phrase-based formatting technique aids the reader in discovering the corresponding 'voice in the head'.

12.4 Conclusion

Chomsky's sustained work was an early clarion, reminding us of the importance of input and time in language acquisition, even if 'environment', 'input', and 'incremental learning' cannot account for the fundamental linguistic structures that are universally acquired. This and her work on the teaching of reading were early frameworks for a more integrated theory of language behavior involving both words and syntactic patterns, and the applications of linguistics to practical problems.


Part III

Broadening the Picture: Spelling and Reading


13

Children's Invented Spelling: What We Have Learned in Forty Years*

CHARLES READ AND REBECCA TREIMAN

Some preschool and primary-grade children create their own spellings as they write, in many cases without prompting from adults. For example, a US five-year-old created the sign for his father's study that appears in Figure 13.1:

FIGURE 13.1 Sign written by US five-year-old. The child's writing of 'B CWIYIT' has been retraced with a darker crayon to make it more visible.

DOT MAK NOYS
MY DADAAY WRX HIR
B CWIYIT

* Preparation of this chapter was supported in part by NIH Grant HD051610.


196 Read and Treiman

The non-standard spellings in this message, including DOT [don't] and WRX [works] (we follow the convention here of placing children's spellings in upper case, and standard spellings in lower-case italics), are the child's creation, at least in part. They cannot have been acquired from instruction or dictation, as the standard spelling MY may have been. The invented spellings may thus tell us something about the child's knowledge of language. For example, the fact that WRX does not explicitly represent the morphology (base + inflection) tells us that, for this child at this time, writing is closer to a phonological representation than is standard orthography. That distinction defines part of what this child has to learn.

As it turns out, the child who created the sign for his father's office is not unique. Analyzing a collection of writings from twenty such children, Charles Read (1970, 1971) found some spelling patterns to be quite consistent. For example, a spelling like DOT for don't, with no representation of a nasal sound before a consonant, is common in young children's spelling. Read identified several such patterns and proposed explanations, mainly phonological, for each. Carol Chomsky contributed significantly to this line of work, with additional examples and close observation of children engaged in writing. Her greatest contribution was her thinking about what invented spelling might mean for learning to read and how writing might be incorporated into a preschool or primary-grade classroom.

In this chapter, we will summarize these early contributions. We will then examine what four decades of further research has uncovered. How well have the initial views of the nature of invented spelling and the early ideas about classroom instruction held up? What else can we learn from the spellings of children in the US and other countries? How do the early spellings fit into the larger picture of spelling development in general?

13.1 Early Work on the Nature of Invented Spelling

The basic assertions in the 1970s publications were that invented spelling exists, that it has common characteristics across children, and that these characteristics shed light on children's knowledge about language. As mentioned previously, Read (1970, 1971) examined spellings from twenty US children, ages three to five, who had begun to create their own spellings at home or in preschool. Read (1975) analyzed those and the spellings of twelve more children, who spelled 1,201 different words altogether. Carol Chomsky (1971, 1976b) contributed additional examples, including first-hand accounts of interacting with children as they wrote. In Chomsky (1975a), she described how one first-grade teacher encouraged independent writing as a regular classroom activity, fostering and valuing the children's spellings.

While there is considerable variation among children, certain features are observed again and again in US children who are learning English. These features are discussed in Read (1970, 1971, 1975) and further discussed and exemplified by Carol Chomsky (1971, 1975a, 1976b, 1979). We have chosen six such features to discuss here.


Children's Invented Spelling 197

First, young spellers often use a letter to represent its entire name, as in MAK for make, FEL for feel, TIM for time, KOK for Coke, and HUMIN for human. US children writing English frequently use A, E, I, O, and U in this way. Viewed in relation to standard spelling, these cases may suggest that a letter, such as the final e of make, has simply been omitted. However, other examples show more distinctively the influence of letter names, including YL for while, THAKQ for thank you, and R U DF for Are you deaf? (Bissex 1980).

To represent vowels that do not correspond directly to a letter name, children may use a letter whose name at least begins with a similar vowel. Thus, for example, they use A to represent /ɛ/ as in MAS for mess, SHALE for shelf, ALLS for else, and PRTAND for pretend. They use E to represent /ɪ/ in SEP for ship, FES for fish, LETL for little, and FLEPR for flipper. And I represents /ɑ/ in spellings such as GIT for got, BICS for box, DIKTR for doctor, and UPIN for upon.

The two patterns we have discussed so far appear to be strongly influenced by letter names, but other patterns may reflect details of pronunciation. For example, in American English, when /t/ or /d/ occurs between a stressed and an unstressed vowel, it is pronounced as a tap of the tongue tip. The tap is voiced, like the vowels on either side of it. In that respect, it is more like /d/, even when it is spelled t. Thus in most American pronunciation, letter is pronounced /lɛɾɚ/, where /ɾ/ represents the tap. Children sometimes spell this tap with a D, appearing to represent the voicing. Thus, we see spellings such as LADR for letter, BODOM for bottom, AODOV for out of, and WOODR for water. This spelling is not, however, in the majority in Read's (1975) tabulation.

As in spellings like DOT for don't and THAKQ for thank you, children often omit nasal sounds before stop consonants, as in the sequences /mp/, /mb/, /nt/, /nd/, /ŋk/ and /ŋɡ/. Such omissions are especially common when the consonant that follows the nasal is voiceless. For example, children may write STAPS for stamps, NUBRS for numbers, PLAT for plant, and THEKCE for thinks. In Read's (1975) study, these are the most frequent non-standard spellings; in fact, for the velar nasal /ŋ/, they are more frequent than the standard nk or ng. Read proposed that the explanation is phonetic and/or phonological. When a nasal occurs before another consonant within an English syllable, especially if that consonant is voiceless, the nasal is realized primarily as a nasalized vowel, not as a consonant. Moreover, the articulation of the nasal is predictable: only /n/ can occur before /t/ or /d/, only /m/ before /p/ or /b/, and only /ŋ/ before /k/ or /g/. This explanation assumes that young spellers can hear the difference between such pairs as set-sent and sick-sink. However, because that difference is not a segment, the children do not represent it in their spelling. As in the spelling of taps, the invented spellings seem to reflect a phonetic fact, but in this case children fail to represent a meaningful distinction. This suggests that perhaps children proceed segment by segment as they spell.
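The two facts at work here, that children drop the preconsonantal nasal and that its place of articulation is fully predictable from the following stop, can be sketched with a small toy program. This is our own illustration, not anything from the original studies, and it uses ordinary letter strings in place of phonetic transcription.

```python
import re

# Toy sketch (ours, not from the chapter) of two facts about English
# nasal + stop clusters, using letter strings in place of phonemes.
# Fact 1: children's spellings like PLAT for "plant" drop the nasal
# letter before a stop. Fact 2: the nasal's place is predictable from
# the stop, so the omitted letter adds no independent place information.
NASAL_FOR_STOP = {"p": "m", "b": "m",   # only m occurs before bilabial stops
                  "t": "n", "d": "n",   # only n before alveolar stops
                  "k": "n", "g": "n"}   # the velar nasal (spelled n) before velars

def child_form(standard: str) -> str:
    """Delete any nasal letter standing immediately before a stop."""
    return re.sub(r"[mn](?=[pbtdkg])", "", standard)

for word in ["plant", "stamps", "dont", "sink"]:
    print(word, "->", child_form(word).upper())
# plant -> PLAT, stamps -> STAPS, dont -> DOT, sink -> SIK
```

Note that the deletion cannot be undone from the child's spelling alone: nothing in PLAT distinguishes it from a spelling of a hypothetical *plat*, which is exactly the sense in which the child fails to represent a meaningful distinction.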

To represent sonorant consonants like /l/ or a nasal when they constitute an entire syllable (with no vowel), children often use only the letter for that consonant: LITL


for little, PESL for pencil, GOBL for gobble, KITN for kitten, SATNS for sentence. Standard spelling, on the other hand, consistently adds a vowel letter, making the syllable structure explicit. In this respect, too, the invented spellings are closer to a segmental representation.

When representing /tr/ and /dr/ at the beginning of a syllable, children sometimes write CHR and JR, respectively. For example, they may write AS CHRAY for ashtray, CHRIE for try, CHRAC for truck, JRAGIN for dragon, and JRADL for dreidel. These spellings are in the minority in the corpus analyzed in Read (1975). However, they have a plausible phonetic basis. Within a syllable, a /t/ or /d/ before /r/ is retracted and is released more slowly than is /t/ or /d/ before a vowel. The resulting turbulent sound is similar to the affrication in the sounds spelled ch and j in standard spelling. A writer with incomplete knowledge of standard spelling who wishes to write try is thus making a reasonable choice when he or she writes CHR.

The invented spellings, the 1970s researchers concluded, are a window into the conceptions of language and of writing that some children share during the course of their development. Children who create these spellings have acquired the crucial alphabetic principle (Rozin and Gleitman 1977) that spellings represent roughly phoneme-sized units in the stream of speech. They have also learned at least some letter names and some standard sound-spelling correspondences. While they have learned some standard spellings, they are willing and able to apply their partial knowledge to create spellings of their own. When they do, they represent language at the level of the phonemic segment, primarily, but they also represent some phonetic details that are not reflected in standard English orthography. The processes that children use sometimes yield spellings that appear strange to adults, especially when inventions are combined within a word. Our knowledge of standard spelling in effect tells us what categories sounds belong in; those categorizations are in some cases arbitrary from the perspective of someone who is starting afresh.

13.2 Early Ideas about Instructional Implications of Invented Spelling

As interesting as it is to infer something of children's linguistic development from their spelling, as well as from the content of their writing, a more intriguing question for most people is what children's writing might mean for preschool and primary-grade learning. Two of Carol Chomsky's first publications on the subject of invented spelling (1971, 1975a) addressed this topic and appeared in journals for teachers. They put forward two main arguments. The first is that children ought to 'learn how to read by creating their own spellings for familiar words as a beginning' (1971: 296), an argument previously made by Montessori (1912/1964: 282-3, 296). Chomsky's second argument is that teachers can and should encourage young children to write independently as part of ordinary classroom activities.

The first argument, encapsulated as 'Write first, read later,' is based on Chomsky's observations of children engaged in writing. She argued that with some degree of


phonemic awareness and a knowledge of which sounds some letters represent, a child may need little more than writing materials, such as plastic letters, and encouragement in order to begin to write. She emphasized the excitement of then reading one's own productions: 'And what better way to read for the first time than to try recognizing the very word you have just carefully built up on the table in front of you?' (1971: 296). Based on such observations, Chomsky (1971, 1972b) proposed that quasi-independent writing should precede and support learning to read. She argued that to start with one's own message and figure out how to inscribe it, whether in writing or with movable letters, is a more concrete, more accessible, and more natural operation than trying to deduce someone else's message from print.

Chomsky's second main argument is that young children should be encouraged to write as part of normal classroom activities. Doing so fosters the growth of literacy and, equally important, a child's joy and confidence in his or her own communicative abilities. Quoting a first-grade teacher: 'By providing [children with] immediate access to the printed word, writing can give them a sense of power very quickly' (1975a: 37). Invented spelling may make an essential contribution to this growing sense of mastery because it involves relying on one's own judgments: 'Let [the young writer] trust his linguistic judgments...' (1971: 299).

Not only is this self-motivated writing empowering, according to Chomsky, it is the kind of creative work that leads to genuine, lasting understanding. On this point, Chomsky cited Piaget: 'children have real understanding only of that which they invent themselves, and each time we try to teach them something too quickly, we keep them from reinventing it themselves' (1971b: 127). This emphasis on the value of learning through discovery became an important theme in the study of children's invented spelling in relation to schooling (Ferreiro and Teberosky 1982). The theory known as constructivism, based on the work of Vygotsky as well as Piaget, provided an epistemological framework for this view (Piaget 1973).

Chomsky argued that early writing is 'a creative feat' (Chomsky 1981: 145), like the acquisition of language itself, and is properly compared to artwork. It must 'not degenerate into a form of exercise', and it must be guided by the child. 'How much writing he will eventually produce, if any, depends on his own inclination and interest' (1981: 148). Both of these warnings, against turning early writing into a required exercise and against adults setting the pace, speak to debates that arose later over the place of early writing in the school curriculum.

13.3 Later Work on US Children's Classroom Writings

Not many four- or five-year-olds will spontaneously write a multi-word message asking people to refrain from making noise in father's office. There is no good basis for estimating what proportion of preschoolers will do this, but Read (1970) called such spontaneous spelling by preschoolers 'rare' (p. 16). One might therefore ask whether


the phenomena that he and Chomsky observed are limited to a special group of linguistically precocious youngsters. In fact, that question initially limited the impact of Read's (1971) article on thinking about school practices (Shanahan and Neuman 1997: 207).

It turns out, however, that somewhat older children produce very similar sorts of spellings when they are encouraged to write independently at school and when correct spelling is downplayed. That activity was common in many US primary schools in the 1970s and 1980s, in part because of Chomsky's writings on the educational value of early spelling. Many teachers of the time followed a whole-language approach, emphasizing the communicative function of reading and writing and deemphasizing correct spelling. Children, they believed, could and should construct an understanding of the writing system largely on their own. Teachers who followed a whole-language approach encouraged children to spell words as they thought best and, during the first few years of schooling, encouraged and accepted non-standard spelling.

Rebecca Treiman (1993) studied forty-three US first graders (aged between about six and seven) whose teacher advocated a whole-language approach. These children attended an ordinary public (i.e., state-supported) school; they were not especially privileged or precocious. The teacher set aside about half an hour each morning for independent writing. She told the children that they should spell words on their own. They should not copy from one another, and they should not ask an adult. A child who could not yet write words or letters was encouraged to draw pictures instead. When children had finished writing, they dictated their story to a teacher or teacher's aide. The adult wrote the children's words on the child's paper in standard form but did not point out how the child's spellings differed from the conventional ones.

Examining a total of 5,617 spellings that were produced by forty-three students in this classroom during two successive school years, Treiman (1993) found many of the same phenomena that the researchers of the 1970s had discovered among preschoolers who start to spell on their own. Thus, the first graders showed each of the phenomena discussed earlier—use of a letter to represent the entire syllable that is its name, spelling of vowels that are not the names of letters on the basis of their similarity to those that are, use of D as well as T to spell taps, omissions of nasals in words like don't and sink, use of single letters to represent syllabic consonants, and spellings of /d/ and /t/ before /r/ that represented the affrication of these sounds.

Treiman's (1993) results further suggested that some of the spellings that had been reported in the 1970s were manifestations of larger phenomena. For example, earlier researchers had noted that children sometimes failed to spell nasals that immediately preceded other consonants at the ends of words, especially when the following consonants were voiceless. Treiman verified this finding. However, she found that omissions of consonants were by no means restricted to nasals in final clusters. They occurred as well for other types of consonants in the initial positions of final clusters, as in OD for old, HOS for horse, SES for cents, and FUOS for fox. They also occurred for the


second (and third) consonants of initial clusters, as in BO for blow, AFAD for afraid, and SET for street. At syllable boundaries, too, the last consonant of the first syllable was susceptible to omission, as in PESEI for Pepsi. Thus, nasal omissions appeared to be one manifestation of a larger effect of syllable structure on children's spelling. These effects arise because the segments in a syllable do not have equal status. Segments in certain positions of a syllable are more easily conceptualized as separate units than others. For example, the /l/ of /blo/ (blow) is closely bound to the /b/, /bl/ forming the syllable's onset and /o/ its rime. Children apparently find it natural to spell /bl/ with a single letter, just as they find it natural to spell the nasalized vowel of don't with a single letter. Children's difficulties in analyzing syllables into units of the size required by the writing system are exacerbated, in the case of nasal-voiceless consonant clusters, by certain phonetic properties of the nasals. However, omissions of consonants in early spelling are not restricted to this structure.

Other phenomena, Treiman (1993) found, are more limited than the early work implied. For example, Read (1970, 1971, 1975) reported that children sometimes use a letter to stand for its full name, as in CRT for cart, HLP for help, HM for hem, DF for deaf, and BD for bead. Treiman confirmed the existence of such errors, but she found that those involving R were more common than those involving vowel-obstruent or consonant-vowel letter names. Letter-name errors involving the vowel-liquid letter name L and the vowel-nasal letter names M and N were fairly common as well. The explanation, Treiman proposed, lies in the internal structure of the syllable. Consonants that are high on the sonority scale form a strong unit with a preceding vowel. For example, the /ɑ/ and the following /r/ of a word like /kɑrt/ (cart) are tightly linked. The strong linguistic bond encourages children to spell the sequence as a unit, using their knowledge of letter names. In a word like /bɛst/ (best), in contrast, the vowel and the obstruent are not so tightly linked. Even when children know that /ɛs/ is the name of s, they are not likely to spell /ɛs/ as a unit. Similarly, the relatively weak link between /b/ (the onset) and /i/ (the first part of the rime) in a word like /bid/ (bead) means that children do not often spell these two sounds as a unit. Thus, spellings like BD for bead and BST for best are less common than spellings like CRT for cart. Treiman's results show that children who bring knowledge of letter names to the task of spelling sometimes spell letter-name sequences as units but that the knowledge of letter names interacts with the phonological properties of the units. It is not the case (as proposed by Henderson and Beers 1980) that children go through a stage during which they consistently spell letter-name sequences with the corresponding letters.

Treiman's (1993) results confirm that children sometimes make different choices than the standard writing system does when representing sounds. Thus, the first graders in her study sometimes chose G or J to represent /d/ before /r/ and CH or C to represent /t/ before /r/. They occasionally represented stop consonants after /s/ as voiced, as in SGIE for sky. This is another plausible but unconventional choice:


The second segment of a word like sky is indeed similar in lack of aspiration to /g/. And children sometimes produced spellings such as TEKN for chicken, implicitly recognizing that the affricate begins with a /t/ portion. Even professional linguists have differing opinions in these cases, and it is not surprising that children's analyses sometimes differ from those that are embodied in the conventional English writing system.

Also not surprisingly, children's identifications of segments are sometimes inexact. Treiman's (1993) results indicate that children sometimes confuse consonants that differ only in voicing. For example, they may spell the first consonant of care as if it were the voiced /g/ rather than the voiceless /k/: GARY. Some of children's substitutions reflect the visual similarity among letters, but many reflect phonological factors.

13.4 Experimental Studies

A child who spends ten or fifteen minutes drawing a boy standing on the ground at the bottom of a castle and writing the words JAC JUPT (Jack jumped) has worked hard to convey an idea. Whether the child does this at home or in a classroom where he is encouraged to write on topics of his own choosing, he has selected the message, the words, and the letters. That independence is important for learning, according to Chomsky, and we will talk more about its educational value later in the chapter. However, the independence can cause problems for researchers who are trying to learn about the nature of children's early spellings. One problem is that children who select their own messages may choose to spell some kinds of words and not others. The data that researchers get may be unbalanced and incomplete.

In an attempt to solve these problems, Read (1971, 1975) supplemented naturalistic data with experiments. For example, he asked children to write selected words that begin with /dr/ and /tr/ in order to verify the existence of spellings such as JR, GR, and CHR and to test ideas about their nature and development. The experiments that Read reported examined several of the phenomena that were mentioned earlier, including omissions of nasals in final clusters. Encouragingly, the results of the experiments aligned well with the results of the naturalistic study. For example, children produced spellings such as BET for bent when they were asked to write specific words, as when they composed messages of their own choosing. In experiments, moreover, children could be asked to perform tasks or make judgments designed to shed light on the basis for their spelling inventions. For example, the children in an experiment reported by Read (1975) tended to judge that the difference between bent and bet lay in the vowel.

In the 1980s and beyond, researchers increasingly adopted an experimental approach to the study of children's spelling (see Treiman 1998 for more on the experimental work carried out in the 1980s and 1990s). In what follows, we will consider the results of experiments as well as the results of naturalistic studies when discussing what we have learned about the nature of children's early spelling.


13.5 Beyond Phonology

Children's early spellings must be considered in light of the characteristics of the writing system that is their target. The English writing system is often considered deficient because many segments have more than one possible spelling and because those spellings may be complex, containing more than one letter. For example, /ʃ/ is spelled sh in shoot, ch in chute, and ci in magician. The ch in chute reflects the word's French origins. Users of English will find this spelling unexpected unless they know that the word comes from French and unless they know something about the spelling of that language. The c in magician reflects its relationship to magic. Users of English who know that a magician is someone who does magic and who know that the spellings of base words are often retained in the spellings of derived forms will not be surprised by the c of magician.

Carol Chomsky (1970) drew educators' attention to the view, put forward by Noam Chomsky and Morris Halle (1968), that the English writing system is more principled than it first appears. The language includes many spellings such as magician, which make morphology visible, and this means that people who think about related words can often find solutions to their spelling questions. For example, thinking about magic can help one to spell magician and thinking about preside can help one to spell the second vowel of president. Chomsky suggested that children be encouraged to look for reasons, morphological or other, why words are spelled the way they are. Teachers should understand and convey the idea that 'spelling very often is not arbitrary, but rather corresponds to something real that... [the child] already knows and can exploit' (C. Chomsky 1970: 307).

The five-year-old who spelled works as WRX seems not to have considered the base form work when doing so. However, children who are only a little older have been found to use morphology at least in simple spelling tasks. For example, the older five- and six-year-olds tested by Treiman, Cassar, and Zukowski (1994) were significantly more likely to spell the tap of a two-morpheme word like later with T (rather than D) than to spell the tap of a one-morpheme word like city with T. To at least some extent, the children could use their knowledge that late ends with /t/ to help infer the standard spelling of the tap of the related word later. Children of this age could probably not use preside to help spell president; indeed they probably don't know the word preside. However, the beginnings of the idea that spelling reflects morphology as well as phonology appear to be present from an early age in US children.

The spellings of a language represent aspects of its linguistic structure, including phonology and morphology in the case of English. The spellings also have a characteristic appearance: they follow certain graphotactic patterns. For example, a vowel or consonant letter sometimes appears twice in sequence in an English word, as in seem and sell. Sequences of three identical letters do not appear in English. Consonant doublets occur in the middles and at the ends of words but rarely at the beginnings;


words like pillow and ball are fairly common but words like llama are rare. Vowel doublets may appear in all positions, as in eel, Lee, and seem. The researchers of the 1970s, seeing invented spelling as a window onto children's ideas about spoken language, focused on its phonological patterns. They did not examine the degree to which invented spellers honored graphic patterns such as the ones just described.

In recent years, researchers have broadened their focus by examining children's knowledge of graphotactic patterns. The first graders in Treiman's (1993) study followed some such patterns in their classroom writings. For example, they produced a number of errors such as SUPRMORRKIT for supermarket and FASS for face, with consonant doublets in the middle or at the end of a word. They produced fewer errors such as MMNP for money, with consonant doublets at the beginning. In experiments in which first graders spelled non-words to dictation, spellings such as DASS for /des/ outnumbered spellings such as DDAS for /des/ (see Cassar and Treiman 1997). Moreover, children of this age showed some success in other experimental tasks that were designed to tap knowledge of graphotactic patterns. Thus, they performed above the level expected by random guessing when asked whether baff or bbaf looks more like a word of English (Cassar and Treiman 1997).

This early knowledge of graphic patterns, although unexpected from the perspective of theories that relegate such knowledge to a later stage of development (Henderson and Beers 1980), is not surprising given that children in the US and other literate societies are surrounded by print from an early age. An infant's name may be embroidered on her blanket, the alphabet may be written on toys and posters, and print abounds on the labels of commercial products, on street signs, and so on. Some of this writing, like that in many books, is not designed to draw children's attention and indeed does not (Evans and Saint-Aubin 2005). Other writing, like that on packages of breakfast cereals that children favor, is colorful and attractive. Parents and preschool teachers actively draw children's attention to written words when they do such things as write a child's name. Where writing is concerned, the input to most modern children is rich, not poor. From this exposure, it appears, children learn about the properties of writing as a graphic object.

Recent work suggests, in fact, that children learn about some of the graphic properties of writing even before they learn about its link to language. Before they invent spellings that reflect the sounds in spoken words, children sometimes produce ones that do not. For example, one four-year-old boy wrote a banner with the letters SSIDCA to tell his mother welcome home (Bissex 1980). A number of the first graders in Treiman's (1993) study wrote similar sorts of things before they began to produce phonologically based writing. Near the beginning of the school year, for example, Calvin wrote ACR and told the teacher that it said 'I like swings and I like slides. And I like the sun.' Such productions have traditionally been considered to reflect a stage of spelling development during which children string together random sequences of letters, but recent work suggests that they may be more than this.


Pollo, Kessler, and Treiman (2009) showed that US four-year-olds who do not yet represent phonology in their spelling do not string letters together purely randomly. The frequency with which they use individual letters is related to the frequency of the letters in the language, and the frequency with which they use pairs of letters is related to the frequency of the pairs in written texts. Children's exposure to the alphabet sequence is also influential: their non-phonological spellings include, more often than expected by chance, sequences of letters in alphabetical order such as BC and FG. Moreover, children use letters from their own first name—letters that are especially frequent in their own experience—at especially high rates. These results suggest that, from an early age, children pick up patterns in the writing around them. To do this, they appear to use the same statistical learning skills that they use in some other aspects of language learning and in other aspects of learning more generally. Young writers reproduce certain graphotactic patterns before knowing what they mean in terms of letter-sound correspondence.

13.6 Beyond US Learners of English

The 1970s work on invented spelling concentrated on monolingual US learners of English, but the reasoning behind such studies points to the value of studying early spelling in a variety of languages and educational contexts. The logic is that the relationship between the sound system of a language and its writing system defines what a speller must learn. In contrast to English, for example, Spanish uses alphabetic spelling that is closer to a consistent representation of phonemes. Chinese characters represent meaningful units that are generally one syllable in length; its writing system is not alphabetic. By comparing how beginning writers approach these very different systems, we stand to learn much more about their strategies.

Research with children in other cultures and learning other languages is still relatively sparse. In one early study, Temple (1980) studied children learning Spanish in the Dominican Republic. Other early studies summarized by Read (1986: 76-98) involved Dutch, French, German, and Spanish. More recently, a special issue of an online journal (Fijalkow 2007) included studies of French, Spanish, Greek, Japanese, and Mayan. Although less work has been done with other languages than with English, the findings point to many of the same underlying principles at work.

In many societies, children learn the names of letters from an early age and use this knowledge to help invent spellings. The most basic letter name strategy in spelling is to use a letter to symbolize all of the sounds in its name. The Portuguese-speaking child who spelled UUU for urubu 'vulture' (Nunes Carraher and Rego 1984) and the Spanish-speaking child who spelled AO for sapo 'frog' (Ferreiro and Teberosky 1982) seem to have used this strategy, just as US learners of English do. The tendency that has been reported by some investigators for beginning spellers to write the same number of letters as syllables (Ferreiro and Teberosky 1982) may reflect, in part, this


use of letter names. The spoken words of Portuguese and Spanish contain many vowel sounds that are the names of letters, and so a child who goes through a word and writes the letter names that he or she hears would produce many spellings that have the same number of letters as syllables (Pollo, Kessler, and Treiman 2005).

The effects of letter names are not limited to exact matches. Children sometimes use a letter to spell a sequence that is similar but not identical to the letter's name. We saw this earlier in the case of English vowels, and a similar phenomenon has been reported for the Portuguese letter q, which is named /ke/. Young Portuguese speakers sometimes spell /ke/ as Q, an exact match, but they sometimes also spell /ge/ as Q (Pollo, Treiman, and Kessler 2008), an inexact match and a highly unusual spelling from the point of view of standard Portuguese. The sequence /ge/ matches /ke/ in the vowel and in all features of the consonant except voicing. Evidently, the child classifies /ge/ as similar enough to /ke/ to merit the spelling of /ge/ with the same letter that is associated with /ke/. In making this judgment, the child generalizes over voicing, the same generalization we saw earlier in the case of English spellings such as GARY for care.

The effects we have been discussing depend on children knowing the names of letters. Children who are not familiar with the names of letters cannot use this information to guide their spellings. Currently in England teachers and parents often refer to letters by the sounds that they make in words (e.g., /s/ for S) rather than by their conventional names (e.g., /es/ for S), and children are expected to do the same. Consequently, the effects of letter names on spelling that have been reported in US children appear to be small or non-existent in English children (Ellefson, Treiman, and Kessler 2009).

US learners, we have seen, pick up some of the more obvious graphotactic patterns of their writing system from an early age. The same is true for children in other literate societies. Consider French, in which vowel letters may not double and consonant doublets may occur in the middles of words but hardly ever at the beginnings or the ends. Correspondingly, French six-year-olds are more likely to place a consonant doublet such as ll in the middle of a word than at the end (Pacton, Perruchet, Fayol, and Cleeremans 2001). Moreover, when asked to choose whether, for example, jukker or jjuker looks more like a word of their language, they tend to pick the item with the doublet in the middle. In Finland, children who have just started formal instruction in reading and writing already prefer items with medial consonant doublets, which are legal in Finnish, over items with initial consonant doublets, which are illegal (Lehtonen and Bryant 2005).

Further evidence for early knowledge of graphotactic patterns comes from the above-mentioned study of Pollo et al. (2009), which examined early spellings that did not represent phonology. In addition to studying US children who were exposed to English, Pollo et al. studied Brazilian children who were exposed to Portuguese. The productions of the two groups of children looked somewhat different.


For example, the Brazilian children used more vowel letters than the US children. This is probably because vowel letters are more common in Portuguese than in English. Neither group of children represented phonology, but the children's productions had already been molded by some of the properties of the writing in their environments.

13.7 Invented Spelling as a Part of Spelling Development

The researchers of the 1970s were interested in children's invented spellings, in large part because of what these spellings could show us about children's knowledge of spoken language. Thus, Read titled his 1975 book Children's Categorization of Speech Sounds in English; the term spelling did not appear in the title. More recently, investigators have been examining early invented spelling within the context of spelling development in general.

The 1970s research drew a sharp distinction between invented spelling and conventional spelling. Children's non-standard spellings, because they are invented, inform us about their conceptions of sounds. Children's conventional spellings may not be informative because they could have been produced through memorization or observation. Thus, Read (1975) did not include conventional spellings in his corpus. But, as we have seen, observation of the patterns in conventional spelling plays a role from early on. The distinction between invented spelling and conventional spelling is not as sharp as it first appeared.

Virtually every speller—child or adult, beginner or expert—invents in some situations. For example, a skilled speller may retrieve some parts of a complex spelling from memory but, having a poor memory for other parts of the spelling, may be forced to invent. What differentiates beginners from experts may be the sources of constraint on their spelling inventions. The spellings of young children who produce sequences like ACR for I like swings are constrained only graphically. Children who produce such spellings know something about the letters in their writing system and how often they occur individually and in groups. They know that English is usually written horizontally, from left to right. However, their choice of letters is not yet constrained by phonology. Learning to spell involves learning more and more constraints—phonological, graphotactic, and morphological. When a skilled speller invents part or all of a spelling, it is thus likely to be close to the conventional one. It may even be fully correct, not recognizable as an invention. Likewise, even a young child who writes me as ME may have used invention as well as memorization.

13.8 Back to Education

Earlier in this chapter, we discussed the value of experimental work on children's spelling. Experiments can provide data that would be difficult or impossible to obtain in naturalistic studies. However, experiments can draw us away from looking at


writing as it occurs in classrooms. One of Carol Chomsky's primary concerns was with the classroom, and we return in this final section of the chapter to that topic.

Children's spellings are creative achievements, and there are often good reasons why children produce the spellings that they do. At the same time, it is important for children to learn conventional spelling. One approach, adopted by the teacher of the children studied by Treiman (1993), is to assume that children will learn standard spelling largely on their own. This teacher therefore provided children with minimal feedback. If they asked how to spell a word as they were writing, the teacher did not answer.

Carol Chomsky did not agree with that approach, nor do we. Children need guidance and instruction in spelling, as in many other domains (Mayer 2004); they cannot be relied upon to discover on their own the principles that underlie the system. Explicit instruction is more important for learning to spell and read than for learning to speak and understand, for which humans are better equipped. Adults who care should appreciate and support children's independent spelling efforts, but they should also provide correction that is calibrated to children's level of development. For example, if a child asks a direct question about spelling, a teacher or parent may take that as evidence of what that child is ready to learn. The adult may respond with an accurate statement about standard spelling in a way that does not deny the legitimacy of the child's own judgments. Read (1975: 77-8) gives an example of feedback that is meant to acknowledge the phonetic basis of a child's own spelling while being truthful about standard spelling, and Chomsky (1976b: 503-5) tells of helping two nursery-school children at the very beginning of writing by posing questions that bring out what the child knows and providing explicit direction when the child is at a loss. This balance between accepting, even celebrating, a child's own invention and teaching toward an adult model is characteristic of thoughtful instruction in many domains, but many parents and teachers may find it more difficult to honor non-standard spelling than, say, stick-figure artwork. Traditional views of spelling are more like those of basic arithmetic: an area in which invention is unwelcome.

To provide useful feedback, teachers need to know about the characteristics of early spellings and why they occur. Teachers should also know about the characteristics of the target writing system, including the fact that most English spellings are not arbitrary and that there are often good reasons why words are spelled the way they are. Teachers' own literacy is a double-edged sword, because it can make it hard for them to think about how a language seems to a person who does not yet know how to spell it. Teachers may not appreciate, for example, the logic behind a child's non-standard categorizations of certain sounds. Teachers' skills can be increased through appropriate instruction (Moats 1994), and this may in turn benefit students.

If there were identifiable stages in the development of spelling, then we could design a curriculum for spelling instruction based on knowledge about the order in which concepts should be introduced. We would have a powerful basis for individualized


assessment and teaching; a child at stage B is ready for instruction at stage C, but not D. We could reliably evaluate instructional materials: those that move children from B to D in a year are probably better than those that manage only C. However, theories that attempt to identify discrete stages during the development of spelling, such as those proposed by Henderson and Beers (1980), are problematic. Some of the empirical challenges to these theories have been mentioned previously, such as the fact that children don't appear to go through a stage during which they spell all letter name sequences with the corresponding letter and that early spellings that have been interpreted as reflecting a random-letter stage may not be as random as they appear. Other challenges to the theories are ones that any stage theory of development faces: the fact that there is a great deal of variation within a child at a given time. Varnhagen (1995) challenged the concept of stages as it applies to spelling, noting that 'progression from stage to stage is not invariant' (p. 260) and that children have a variety of strategies available to them from an early age. At this point, we have come closer to consensus on identifying some of the strategies that children use in spelling than in arranging those strategies on a developmental continuum sufficiently precise that it could be used for instruction and evaluation.

Although questions remain, a great deal has been learned about children's spelling in the years since Chomsky and Read began to work on the topic. The continuing work has increased our appreciation for the variety and depth of cognitive activity that is involved, even at early ages, in acquiring what might seem to be a subordinate part of language and literacy.


14

How Insights into Child Language Change our Understanding of the Development of Written Language: The Unfolding Legacy of Carol Chomsky

STEPHANIE GOTTWALD AND MARYANNE WOLF

The insights of Carol Chomsky into how a young child acquires language across time have changed the field of reading research in ways that are only now coming to the fore. In her synthesis of work from linguistics, education, and child development, Chomsky challenged linguists to study language development beyond the age of five, and she challenged educators to incorporate research on language development for the children who could not learn to read easily. For teachers, she introduced the term 'invented spelling', a highly influential description of children's first writing attempts that could be useful in learning to read (C. Chomsky 1971). In the first part of this chapter, we review briefly some better-known aspects of her scholarly work.

The bulk of the chapter, however, will be devoted to a description of a recent and less-known outgrowth of Carol Chomsky's legacy—what she colorfully described in her article, 'After decoding—what?' (Chomsky 1978). In this context, we discuss a view of reading in which comprehensive linguistic knowledge is shown to be critical for both the diagnosis and intervention of reading disabilities. We describe a theoretical overview in which explicit instruction in multiple areas of linguistic development helps to propel children's acquisition of written language from decoding to fluent comprehension. We then present an application of these principles in an innovative curriculum designed by us to assist struggling readers in the development of their linguistic knowledge in areas ranging from phonology to syntax. Finally, we provide efficacy data that show the significant impact of these areas of knowledge on the


acquisition of written language, a fundamental assumption made by Chomsky three decades ago, but implemented only in the last few years.

14.1 Background

In her groundbreaking dissertation research, Chomsky described the orderliness of the developmental sequence in which older children learned various syntactic structures and how it shared a great deal of similarity with earlier data from researchers like Roger Brown (Brown 1973) and Ursula Bellugi (Brown and Bellugi 1964; Klima and Bellugi 1966) with younger children. The implications of Chomsky's findings demonstrated that syntactic aspects of language acquisition are not complete upon school entry. Further, she demonstrated that this acquisition process proceeds beyond the age of five in a manner identical to other aspects of language acquisition like semantic and morphological development in the young child—systematically and without direct instruction. Through this research Chomsky cautioned that these findings may be just the 'tip of the iceberg' and that the syntactic structures yet to be acquired could be fairly extensive.

Most important for our foci in this chapter, Chomsky analyzed the relationship between language development and exposure to reading material. Utilizing information concerning IQ, SES, and the amount and complexity of reading material to which the child was exposed, Chomsky found important insights into the relationships between the child's linguistic stage and three variables: the child's knowledge of books, the average complexity of books named by their parents, and the child's IQ. In other words, linguistic development was closely related to the quantity and complexity of reading material to which the child had been exposed by the parent. In one of the most influential of her papers she advocated that one of the best predictors of the young child's future reading was the amount of time the parent read to the child (C. Chomsky 1972a). As the children grew older, their linguistic development became more closely related to their own reading behavior, rather than to the reading performed by the parent. These early conclusions by Chomsky were borne out repeatedly over the last three decades (Snow 2000; Wolf 2007).

Discussed in other chapters, Chomsky expanded her interests in the links between oral and written language development through her insights into the 'invented spelling' attempts by children who had neither explicit spelling nor writing instruction. The characteristics of their 'invented spellings' bear some striking similarities, e.g., long vowels represented by letter names (BOT boat, FEL feel); short vowels represented by the letter name which contains the closest sound (BAD bed, WOTR water); nasals omitted before consonants (WOT won't, PLAT plant); and some words which are spelled using the full name of the letter (YL while, THAQ thank you).

Chomsky found that different children converge on a system of spelling, which is relatively systematic and uniform. Based on these and other findings, Chomsky


suggested that some aspects of reading acquisition may be approached through the teaching of writing before reading. In combination with large amounts of exposure to reading and listening materials, this method provides the children with the raw materials of letters, sounds, and words to interact with text. This experience with spelling, in turn, helps children actively hypothesize how words are represented in written language. Charles Read and many other researchers replicated and extended Chomsky's early insights into an entire body of research concerning 'invented spelling' (Read 1986).

The most pertinent directions in Carol Chomsky's work on written language have directly and indirectly influenced our own research into the development of reading in children with dyslexia. Both then and now, the largest research questions and emphases among researchers target the acquisition of early decoding at the word level. As alluded to earlier, with her own emphasis on later syntactic development, Chomsky asked a quite different level of question: what happens after decoding, when children must become fluent readers and comprehenders of more sophisticated, sentential-level text? Towards that end, Chomsky (1979) performed several experiments with instructional methods to improve reading fluency in a small group of teacher-referred struggling readers. In her method, children listened to a recording of text by a competent adult reader, while they read along silently. Chomsky proposed that by essentially memorizing a text, the children would begin a tacit, deeper discovery of text and would be motivated to read more on their own. Chomsky's end goal was that the child should be '... bathed in inputs with which he interacts' (C. Chomsky 1979). Very importantly to our present research, she suggested that if any of the children were not making progress listening and reading along with the recording, a range of activities were to be added which emphasized the development of automatic knowledge in English orthography, in morphology, in text writing, and in comprehension strategies. In so doing, Chomsky's methods foreshadowed both what would become one of the most-used methods for teaching fluency—called 'repeated reading' methods—and new multi-componential approaches designed by us for the remediation of fluency and decoding issues in struggling readers.

Until recently, Chomsky's influences were, in essence, only half understood and only partially incorporated in reading research and in classroom practice through the almost ubiquitous use of repeated reading methods. The second half of her thinking—about children who fail to make gains with repeated reading methods—has been consistently neglected for many years. Described below, her concomitant suggestion of directly and explicitly emphasizing other major linguistic systems for the children who do not progress with repeated reading is now an integral part of the foundation for our multi-component intervention for fluent comprehension in dyslexia research.

To summarize this brief review, the work of Carol Chomsky strongly emphasized the links between linguistic knowledge and the acquisition of written language, both for those who acquire reading easily and for those who struggle. In her view, the


child comes to school with a great deal of knowledge about language that can be actively engaged and put to work in the process of learning to read. Each new piece of information about language—from learning the correspondence between sounds and their spellings and the use of words in different contexts and their different meanings, to the varied but orderly progression in the structure of sentences—invites children to examine their theories about their native language and make adjustments that incorporate this new knowledge. To Chomsky, the more evidence about language that a child is exposed to, the more likely the child will become an advanced user of oral and written language. When this fails, for whatever reason, then all these other various sources of linguistic input need to be addressed supplementally for the child.

14.2 Theoretical and Applied Implications of Chomsky's Work

The latter insight by Chomsky has become a fundamental cornerstone of our work with struggling readers. Through a synthesis of cognitive neurosciences, child linguistics, education, and child development, we have examined the processes that young brains require to identify words when reading. To be sure, we now know far more about the particular linguistic and cognitive processes that together make up the reading brain (see Wolf 2007). Our research combined with imaging results from other researchers in the neurosciences (see, for example, Gaab et al. 2007; Gabrieli 2009; Hoeft et al. 2011) together provide a working template of the multiple attentional, linguistic, visual, and cognitive processes that are required by the reading circuitry. Together these systems enable the reading brain to recognize letters and familiar letter patterns; to connect this information to the stored, corresponding phonemes; and almost simultaneously, connect this cumulative information to the meaning(s) of the word, its grammatical uses, its potential incorporation of morphemes, and its usage in social contexts (pragmatic knowledge). Very importantly, the brain must retrieve, connect, and integrate all this information in a fraction of a second in order to have time to comprehend the word in text.

Although, as noted, a great deal of research over the last decades has been successfully devoted to understanding the decoding phases during the reading act, there has been significantly less success in understanding and remediating the goals of reading acquisition—that is, fluent comprehension. Just as Carol Chomsky recognized in her early work with struggling readers, many of these children can learn to decode, but laboriously, and never rapidly enough to allocate time to the critical comprehension processes. Using cognitive models of word retrieval and reading, we set out to understand each of the component processes involved in decoding single words, sentences, and text with fluent comprehension.

Page 229: Rich Languages From Poor Inputs

214 Gottwald and Wolf

The background for our evolving models of the reading circuit began with evidence from research in aphasiology, particularly with acquired alexia, and from children with developmental dyslexia. We sought to understand how the brain learns to read in typical development and fails to read in adults with discrete areas of brain injury and in children with reading disabilities (Pugh et al. 2005; Wolf 2007). An examination of the young reader's first 'reading circuit' illustrates the many components initially involved—from visual pattern recognition systems to varied cognitive and linguistic systems (Tan et al. 2005; Sandak et al. 2004). As the child progresses from pre-reading to early acquisition, progressively more linguistic knowledge becomes essential to understand the many dimensions contained within a written word: i.e., phonology, orthography, morphology, syntax, semantics, pragmatics. Each system activates specific areas of the brain when we read. A conclusion from this research and the cornerstone of our intervention is that almost everything the child knows about oral language contributes in some way to the development of the component processes of written language and to the automatic access and connections of all the processes needed to ensure rapid comprehension of what is read. Unfortunately, the converse is also true: what the child does NOT know will also influence the acquisition process.

Although correctly emphasized in most instructional decoding programs, phonological knowledge represents a necessary but insufficient component process for reading fluency. As Carol Chomsky recognized in her work on 'invented spelling', the development of knowledge about the common orthographic patterns in English is also very important, as well as semantic and morphological knowledge for comprehension and rich writing efforts. In essence, she was, to our knowledge, the first researcher to begin to target the development of additional linguistic systems, like semantics and morphology. Ironically, perhaps, she did not include in her particular study her own findings on the importance of increased syntactic knowledge for children between five and nine for the fluent comprehension of more sophisticated sentential and text-level reading.

It is now clear that the development of knowledge in each of these linguistic areas becomes even more essential for comprehension at the sentential level, with semantic and morphosyntactic knowledge of increasing importance over time. For example, rich semantic knowledge both plays a significant role in children's reading comprehension and impacts fluent word recognition. Semantic knowledge refers both to the size of a vocabulary, and also to the strength and depth of individual word knowledge (Frishkoff et al. 2008). Consider the multiple meanings of the word 'duck'. When functioning as a noun, it represents a relatively charming, web-footed, swimming bird; as a verb, it means to avoid; and if you live in Boston, it is an adjective for the charming 'duck tours' on vehicles that traverse both harbor and land! It is important to stress that a great many of the most common children's words are equally polysemous. The more knowledgeable children are about a word, its multiple meanings, and various pragmatic and syntactic contexts, the more rapidly the word is processed during its reading in sentential contexts (Locker, Simpson, and Yates 2003). As a result, children can move into more sophisticated text-level reading with greater fluency. This, in turn, allows more time to be allocated for understanding. This same scenario takes on particular significance for children who struggle to read because of a lack of fluency. If they are 'prepared' linguistically with the knowledge that a word can have multiple meanings, they are more likely to comprehend the polysemous word even if they read more slowly. In short, semantic knowledge not only affects the speed of accessing the word, but also significantly impacts a deeper comprehension of text for good and poor readers alike.

The implications of this conclusion are significant beyond work on dyslexia. Investigations into 'word poverty' (Moats 2000) and the effects of impoverished word environments demonstrate the significant and long-term impact of a child's vocabulary size on his/her reading comprehension (Stanovich 1985). Moats (2001), for example, estimates that there is a 15,000-word gap between lower-income and higher-income children who enter first grade. The significance of this finding is brought home by Biemiller (2005), who found that kindergarten children with a vocabulary in the bottom 25 percent remain behind in vocabulary and comprehension into middle school and often well beyond.

Related to both semantic and orthographic knowledge is the least emphasized linguistic component of reading—morphological awareness. Morphological knowledge in young children includes, among other things, the conventions that govern word formation, and the ways in which roots and affixes create new word meanings. For example, children need to learn that adding the suffix morpheme '-s' to the root 'duck' can create the plural noun 'ducks' or the present, singular verb; while adding '-ing' creates the present participle 'ducking'; and adding '-ed' creates the past verb form 'ducked'. Such morphological knowledge provides the child with critical disambiguating syntactic information in sentences (e.g., '-ed' rapidly clarifies that 'ducked' is the verb form). Because the role a word has in sentence structure helps determine its meaning, this collective morphosyntactic information propels comprehension.

While morphological awareness is critical in most languages, it plays a particularly important role in English, which is, of course, a morphophonemic language that represents both morphemes and phonemes in its spelling. Words that are irregularly spelled no longer seem as arbitrary in their spelling to children when they understand their morphemic roots. To take a well-used example (from N. Chomsky and Halle 1968), the word 'muscle' connects this seemingly irregularly spelled word to its basic roots. Thus, it illumines for children the semantic relationships among words like 'muscle', 'muscular', and 'musculature'. From this perspective, by conveying semantic, syntactic, and orthographic information, morphological knowledge contributes to the development of spelling, to faster word recognition, and to fluent comprehension.

We have reserved for last the singular importance of a growing syntactic base for comprehending text. Like morphology it represents another less emphasized component in reading instruction in general, and reading intervention in particular, despite the fact that syntactic knowledge is of exponentially increasing importance over the school years. Knowledge of how words are used within different grammatical or syntactic contexts is essential for the child's growing fluency and comprehension. Just as Chomsky demonstrated in her study of syntactic constructions, children's syntactic knowledge of different structures is acquired over time; so also is their understanding of different sentence constructions. Analogous to the extensive research into the reciprocal relationships between vocabulary knowledge and reading, a similar reciprocity exists between syntax and written language. Children who read a variety of increasingly sophisticated sentence constructions have enhanced comprehension and more syntactic knowledge. Children with highly developed syntactic knowledge, in turn, comprehend text with more complex syntactic constructions better and more rapidly than those with less syntactic knowledge.

If we were to try to summarize the existing research on what the young human brain learns to connect when it reads a single word, the result would be an impressive panoply of multiple linguistic components, perceptual systems, and cognitive processes. Further, that developing brain must learn to retrieve, connect, and integrate all the information from these processes in a fraction of a second. The precision and rapidity involved in integrating all these components (i.e., fluency) enables the young reader to have the time necessary both to comprehend the meaning of the author and to connect this meaning to his or her own thoughts and insights. Without fluency, without fluent comprehension, the reader is virtually bereft of the ultimate goal of reading: an understanding that goes beyond the text into insight and discovery.

14.3 RAVE-O Intervention: An Applied, Unfolding Legacy from Carol Chomsky

The RAVE-O intervention is an unusual reading program that shouldn't be unusual at all. Indeed with no small historical humility, the program bears notable resemblances to the first known reading pedagogy by the Sumerians, who had no previous models or methods to guide them (Wolf 2007)! The intervention's purposes are to teach the young reading brain how to build up and rapidly retrieve all the sources of visual, cognitive, and linguistic information described above and connect them during reading. The ultimate goal is to teach struggling readers to read rapidly enough to be able to understand the text and think for themselves about what they read.

Based on theoretical accounts of reading fluency and comprehension (Wolf and Katzir-Cohen 2001), the program attempts to simulate what the brain does when it tries to read a single word with fluency and comprehension. RAVE-O's basic premise is that the more the child knows about a word (i.e., phonemes, orthographic patterns, semantic meanings, syntactic and pragmatic uses, and morphological roots and affixes), the faster the word is decoded, retrieved, and comprehended. RAVE-O is not so much a wholly new program, as it is the application of insights from Chomsky and cognitive neuroscience when connected to 'best teaching practices' and some newly designed activities that can systematically address multiple linguistic, cognitive, and affective systems.

To make the program come to life, a few examples will suffice. Each week children learn all the relevant phonological, orthographic, semantic, and syntactic content for a small group of core words and learn to make explicit connections across these linguistic systems. Making these connections is key to re-enacting what the brain's 'reading circuit' does. For example, with the word 'jam', the instructor first reviews the individual phonemes, /j/ + /a/ + /m/, and then teaches the child to find the chunks in 'jam': that is, the rime (/am/) and the onset or starter (/j/). This step consolidates phoneme-level knowledge and connects it to orthographic patterns. Almost immediately this knowledge is then connected to the semantic base. The word 'jam' possesses at least three common meanings and can be used in different syntactic functions (as noun and verb) and pragmatic contexts (e.g., a musical 'jam'). Moreover, 'jam' can be easily changed by the addition of different morphemes (e.g., jams, jamming, unjammed) to show how words can change, but still have their root visible. The uniqueness of RAVE-O is that explicit attention is given to learning and connecting the major linguistic components in every word, and in every teaching unit of the program.

A continuum of game-like activities offers whimsical means to teach children to connect individual phonemes, to orthographic units, to meanings, to uses. In turn, these connections facilitate rapid decoding and comprehension processes and improve spelling along the way. Word Webs are a regularly recurring semantic exercise in which the child's knowledge of the different possible meanings is elicited and then represented by image cards on a huge web. The web provides a simple, visual way of illustrating how words are interconnected; simultaneously, the image cards for each of the various meanings of the core words provide important visual imagery that aids both storage and retrieval from memory for children who are often characterized by word retrieval difficulties.

A range of metacognitive strategies (called 'Magic Tips') enables children to segment the most common orthographic and morphological units in words. The tips are quick, often humorous mnemonics that teach key strategies about words. For example, the strategy called Ender Benders helps children quickly recognize common morpheme endings that change (that is, 'bend') the word's meaning. The Think Thrice Comprehension Tip is a set of three comprehension strategies to enhance the child's prediction skills, comprehension monitoring skills, and analytical and inferential skills. The third of these strategies sounds deceptively simple: 'Think For Yourself'. In reality, it represents a concrete embodiment of the 'Proustian principle' that is the ultimate goal of RAVE-O and reading itself—going beyond the author to think new thoughts.


These metacognitive comprehension strategies are implemented almost daily through a series of specially written RAVE-O Minute Stories. Each story is introduced and then followed by the teacher guiding the child in the use of the strategies. In addition, the stories' controlled vocabulary incorporates the phonemic and orthographic patterns, multiple meanings, and varied syntactic contexts of core words. The Minute Stories represent, therefore, multipurpose vehicles for facilitating more automatic rates within phonological, orthographic, syntactic, and semantic systems at the same time that they reinforce the use of the most important strategies for understanding stories and thinking their own thoughts. In the process, the stories build an ever-important affective dimension for the children, who often feel disenfranchised from learning and from their own language. The content of the Minute Stories provides a platform for exploring the dejected feelings struggling readers often have about learning to read.

The various 'Magic Tips' and whimsical activities for wordplay may appear lighthearted and fun-filled, but our goals for them are very serious. Children who are struggling readers need to learn the interconnected nature of words, and they usually don't. The collective activities, the use of specifically designed computer games, and the novel-sounding strategies provide a deeply important, systematic foundation for some of the most important skills used in all later learning.

14.4 Summary of Results

The effects of RAVE-O with struggling readers have now been studied for ten years in federally funded research in three contexts: (1) a pull-out intervention in the school during the school day; (2) an intensive summer-school remediation program; and (3) an after-school intervention. In each of these studies, RAVE-O is combined with a systematic phonological analysis and blending program (e.g., SRA Reading Mastery or Orton-Gillingham) and taught to small groups of four children.

Recent results are based on a three-city, federally funded (National Institute of Child Health and Human Development) randomized treatment-control study, which involved children who represented the most impaired readers in Grades 2 and 3. Children were randomly assigned to four treatment conditions and were controlled for socioeconomic status (SES), race, and IQ. Each group received seventy hours of treatment throughout the school year.

We compared the effects of four types of treatment on an extensive battery of tests on all aspects of reading—from accuracy and fluency in word attack to comprehension—and on many language measures. The four treatments included two programs with multi-component emphases (RAVE-O and PHAST), one phonologically based program, and one control group who received regular classroom reading instruction. The PHAST program (Lovett et al. 2000) emphasized phonology, orthography, and morphology, but did not include the semantic and syntactic emphases in the RAVE-O program. The RAVE-O and the PHAST programs outperformed the control group on every measure. When compared to the systematic phonological analysis and blending treatment, the RAVE-O and PHAST groups again proved better on every measure. When compared only to PHAST, RAVE-O made similar significant gains on standardized measures of decoding, but superior gains on the GORT-3 Oral Reading Quotient, a combined fluency and comprehension score, and on measures of vocabulary and semantic flexibility (see overview in Morris, Lovett, Wolf, et al. 2012). In other words, students who received instruction in programs that emphasized multiple dimensions of linguistic knowledge performed equally well or better on every word attack and word identification measure (the specific emphases of the more unidimensional decoding treatment). Most importantly, RAVE-O, with its additional emphases on semantic and syntactic development, outperformed all other treatments in vocabulary and on the GORT fluency comprehension measure.

The theoretical implications of these outcome data are critical. The premise of RAVE-O is that the plural linguistic emphases will enhance decoding, as well as vocabulary and fluent comprehension. The fact that RAVE-O instruction expends far less time on specific decoding skills and yet made comparable or superior gains in word attack and word identification to programs which spent more of their instructional time on these skills is compelling evidence supporting the theoretical premise of RAVE-O: the more the child knows about a word, the faster and better the word will be decoded and understood.

In addition, and very importantly, this NICHD study demonstrated that children with impaired reading could make significant gains in reading regardless of initial SES, race, or IQ factors (Morris et al. 2012; Wolf et al. 2009). The latter results cannot be overemphasized. They suggest that despite these known impediments to achievement, the two multidimensional interventions produced similar gains in children from privileged and underprivileged backgrounds, regardless of IQ level or race. This result directly answers the question of whether the linguistic demands in RAVE-O are too heavy for children in poverty or for children with lower cognitive aptitudes. We are well aware of Chomsky's lifelong concern with children from underprivileged backgrounds. We feel the results from the RAVE-O intervention program represent an important affirmation not only of her theoretical insights, but also of Carol Chomsky's deeply held goals for all children.

In summary, the contributions of Carol Chomsky are hardly over. Researchers in child language and early reading have long been indebted to her for work concerning 'invented spelling', the importance of reading to young children, and syntactic development. Her most important insights, however, may be her least known till now: the critical contributions of all aspects of oral language to the development of reading, particularly for struggling readers. Our evolving interventions are daily testimony to the unfolding legacy of Carol Chomsky.


15

The Phonology of Invented Spelling1

WAYNE O'NEIL

15.1 Introduction

Invented spelling is a term used to refer to young children's attempts to write prior to spelling instruction. Some characteristics of invented spelling also appear in the spelling errors that children make in the course of acquiring an alphabetic writing system, some of these often persisting into adulthood. Since, for the most part, the study of invented spelling has been conducted in the United States and on English, the phonological analysis of invented spelling presented below is based on American English data.

An example of invented spelling from Dylan, a five-year-old raised in New York City, is shown in Figure 15.1. Note that among its non-standard spellings, there are three instances of standard the and the child's correctly spelled name Dylan. For the other five words, Dylan is on her own: POL for Paul; JON for John (with, however, a mirror-image J); GOJ for George; and RGO for Ringo.2 Beatles is spelt as both BTELS and BETLS. As we shall see, these few words illustrate some basic phonological characteristics of invented spelling. They also reveal that children's invented spelling, influenced as it often is by standard spelling, is not as consistently non-standard as the following well-chosen examples might suggest.

However, before examining the phonology of invented spelling more fully, we note that in their early years, children are quite sensitive to phonetic nuance. For example, it has been observed that a 'striking fact about language acquisition in the young child is the degree of precision with which the child imitates the speech of its models.... The precision of phonetic detail goes far beyond what adults can perceive without special training and thus cannot possibly be the result of any form of training.... The child is evidently hearing—not consciously, of course—details of phonetic nuance that it will incorporate as part of its linguistic knowledge but that in adult life it will no longer be able to detect' (N. Chomsky 1988: 27).

FIGURE 15.1 Five-year-old Dylan's 'The Btels'

1 This phonological analysis, modulo some terminological differences, is largely based on Read (1971). For further detail, see Read (1970, 1975, 1980). Except for those from Dylan, all examples of invented spelling that follow are taken from Read's work.

2 I follow the standard practice for the discussion of invented spelling by presenting the child's spelling in capital letters.

Thus in advance of the evidence from invented spelling, we know that young children hover close to phonetic ground in phonological perception and production. Their invented spelling provides further evidence in support of this fact. And as we shall see, it also provides further support for Jakobsonian distinctive feature analysis (as in, e.g., Chomsky and Halle 1968).

15.2 Phonology

What is most intriguing about this phenomenon is that inventive spellers, given the abundance of written language surrounding them and the set of letter names, strike out into the world of writing, bringing their perception of phonetic detail to some level of consciousness, in order to make their own written labels, signs, and stories.

Using letter names to represent the relevant sounds, syllables (particularly in the case of the syllabic sonorants /m n l r/), and whole words is fairly straightforward, and not particularly remarkable: A for /ey/, as in MAK make; E for /iy/, as in FEL feel; I for /ay/, as in TIM time; Y /way/ for the question word why; YL for while; and so on.


15.3 Consonants

Thus, invention doesn't really begin until the child has to deal with the fact that the letter names are insufficient to the task and is then pushed into uncharted territory, dealing with the problems it faces in these ways, among others:

• Taking the onset or the coda of a consonant letter name to represent the relevant sound: thus /k/ stripped from /key/, as in KOK Coke; /t/ from /tiy/ and /m/ from /em/, as in TIM time; /f/ from /ef/ and /l/ from /el/, as in FEL feel; etc.

• Perceiving that the first segment of the syllable onsets /tr/ and /dr/ is affricated, thus alternating between CH/HC or T and J or D, but generally focusing on the feature [+strident] of /č/ and /ǰ/ as the more prominent one in representations of these initial consonants; thus AS CHRAY for ashtray; CHRIE for try; JRAGIN for dragon; JRADL for dreidel; etc.

• Recognizing that /č/ (more clearly seen in its IPA representation /tʃ/) begins as a stop [−continuant] before affricating into a [+continuant]; noting, moreover, that when followed by a [−low, −back] vowel such as the /ɪ/ of chicken, /t/ is itself palatalized, arguably slightly affricated; thus TEKN for chicken; etc.

• Perceiving the second segment of syllable onsets like /sp/, /st/, and /sk/ to be [−aspirated] like /b/, /d/, /g/, but [−voice] like /p/, /t/, /k/, then choosing to represent the stops with the symbol for either the [+voice] or [−voice] member of the pair; thus STAPS for stamps; SDROGIST for strongest; etc.

• Representing both the flapped /t/ of later and /d/ of ladder as D; PREDE for pretty; LADR for letter; LADE for lady; etc.

• Ignoring, however, the difference between oral and nasal vowels (that is, vowels followed by a nasal cluster like /mp/, /mb/, /nt/, /nd/, etc.—as in Dylan's RGO for Ringo), and more generally, sonorant-consonant clusters (her GOJ for George, perhaps3). For these sounds, a vowel symbol or nothing is the best children can do, evidently felt to be sufficient though it is clearly not adequate.

There are finally a number of English consonants for which the child has no easy representational solution: /θ ð š ž g ŋ h/, as in thin, that, fish, azure, get, sing, and hat. The spellings of /θ/, /ð/, /g/, and /h/ represented by TH, TH, G, and H, respectively, appear to be solved by guidance from 'above', for standard spellings predominate: THOPY for thumpy; FETHR for feather; GOWT for goat; HACC for Hank's; etc.

But for /š/, feature analysis again plays a role, for children represent this sound with S or H; thus FES for fish; SOGR for sugar; AS for ash; FEHEG for fishing; etc. /š/ differs from /s/ by the feature [anterior], and the /č/ of the consonant in the letter name H /eyč/ differs from /š/ by the feature [continuant]. So using either S or H to represent /š/ is off by one distinctive feature: [+anterior, +continuant] /s/ ~ [−anterior, +continuant] /š/ ~ [−anterior, −continuant] /č/.

3 Since Dylan was raised in an /r/-less environment, the missing /r/ is perhaps better explained by that fact.


15.4 Vowels

Inventive spellers are unusually perceptive and ingenious in their representation of the 'short' vowels of English, for given that all the letter names for vowels are diphthongal—that they trail off into the glides /y/ or /w/—simply stripping the glides away does not deal with the representational problem: E, that is /iy/ minus /y/ = /i/, solving nothing. So the child solves its problems by forgiving the difference, by a phonological feature or two, between the available vowel and the vowel it wants to represent; thus:

• The vowel /ɪ/ is represented by E, for with /iy/ stripped of its /y/-glide, the child is left with /i/ [+high, −back, +ATR], a vowel that differs by the value of the feature [ATR] from the target vowel /ɪ/ [+high, −back, −ATR]; thus, FLEPR for Flipper; FES for fish; etc.

• The vowel /ε/ is represented by A, for when the letter name /ey/ is stripped of its /y/-glide, the child is left with /e/ [−high, −low, −back, +ATR], a vowel that differs by the value of the feature [ATR] from the target vowel /ε/ [−high, −low, −back, −ATR]; thus FALL for fell; LAFFT for left; etc.

• The letter A can also serve for /æ/ [+low, −back, −ATR], which differs by the values for [low] and [ATR] from /e/ [−high, −low, −back, +ATR]; thus HCRAK for track; STAPS for stamps; etc.

• The vowel /ɔ/ is represented by O, for when this letter name /ow/ is stripped of its /w/-glide, the child is left with /o/ [−high, −low, +back, +ATR], a vowel that differs by distinctive feature values for [low] and [ATR] from the target vowel /ɔ/ [+low, +back, −ATR]; thus POL for Paul; COLD for called; etc.

• Representing /a/ with I is more straightforward; take /y/ from /ay/ and you're left with /a/, more or less; thus GIT for got; SCICHTAP for Scotch® tape; etc.

• /ʌ/, differing from /a/ only by the feature [low], is thus grouped with the /a/ of /ay/ and also represented by I: LIV love; WIS was; SINDAS Sundays; etc.

• The reduced, unstressed vowel, schwa/barred i (/ə~ɨ/), is given a number of representations, but E reduced to /ɪ/, being the closest vowel phonologically, predominates: ANEMEL for animal; BENANE for banana; CEPECOL for Cepacol®; etc.

15.5 Invented Spelling and Spelling

The data clearly indicate that English inventive spellers aim for a taxonomic phonemic representation, one that is phonetically grounded and does not take the morphology of the language into account. However, the writing system that the English-speaking child must ultimately control is morphophonemic: its general principle, obviously grossly violated at times, is to leave unrepresented what can be predicted by phonological rule (C. Chomsky 1970; N. Chomsky and Halle 1968). For example, for the regular verbs of English, the past tense and participial forms are generally spelled -ed: waited, talked, climbed, and so on. But, as we can easily see (and hear), the suffix of waited is pronounced /-ɨd/; for talked, it is /-t/; and for climbed, it is /-d/. Moreover, the distribution of the three different realizations of the past tense morpheme (/-ɨd/ ~ /-t/ ~ /-d/) is not random. It is predictable from the fact

• that the final /t/ of wait is [+anterior, −continuant]; i.e., {/t/, /d/};
• that like /-t/, the final /k/ of talk is [−voice];
• that like /-d/, the final /m/ of climb is [+voice];

a pattern carried out in the pronunciation of such forms as raided, coated; laughed, blocked; and fooled, bowed. The [+anterior, −continuant] features of the suffix are 'protected' by the epenthetic vowel /ɨ/ from the identical [+anterior, −continuant] features of the final sound of wait, raid, coat, etc. These set aside, the [voice] value of the final sound of the verb is then spread to the past-tense suffix: /d/ following vowels (including epenthetic /ɨ/) and voiced consonants; /t/ elsewhere.

There are many other spellings that illustrate the morpheme-based character of English orthography, including those in which the spelling of the root is invariant regardless of its pronunciation: telegraph, telegraph-ic, telegraph-y, for example:

• /tɛ́ləgræ̀f/ : /təlɛ́grəfi/ : /tɛ̀ləgrǽfɪk/, where /ˊ/ marks a primary stress, and /ˋ/ a secondary stress.

Moreover, irregularity—though not all of it—is generally accommodated in the orthography; for example, in the unpredictable past-tense and participial forms of irregular verbs and plural forms of irregular nouns:

• keep /kiyp/ : kept /kept/; leave /liyv/ : left /left/; etc.
• run /rʌn/ : ran /ræn/; drive /drayv/ : drove /drowv/; etc.
• leaf /liyf/ : leaves /liyvz/; shelf /šelf/ : shelves /šelvz/; etc.
• goose /guws/ : geese /giys/; ox /aks/ : oxen /aksən/; etc.

However, English orthography does often fail to represent irregularity and it overrepresents regularity; for example, the fact that /iy/ in obese is not reduced in obesity (as it regularly is in obscene ~ obscenity; supreme ~ supremacy, for example) is left unrepresented, while the regular reduction of /aw/ to /ʌ/ is represented in, for example, profound ~ profundity; abound ~ abundance. In many other cases, English orthography falls far short of ideal morphophonological representations.

Other writing systems, the Navajo system for example, come close to a taxonomic phonemic representation (Young and Morgan 1987). For in Navajo what is predictable by phonological rule is nevertheless always represented in writing.


The Phonology of Invented Spelling 225

For example, Navajo's [+continuant] consonants alternate in a predictable way between [-voice] and [+voice]; when prefixed, the initial consonant of the stem is [+voice]; otherwise, it is [-voice]:4

unprefixed          prefixed
łoh 'noose'         biloh 'his/her/their noose'
saad 'language'     bizaad 'his/... language'
shaazh 'bear'       bizhaazh 'his/... bear'
hosh 'cactus'       bighosh 'his/... cactus'

Since Navajo prefixes are vowel-final and all stems begin CV-, the initial consonant of a prefixed stem would be surrounded by [+voice] sounds, thus quite naturally voiced.
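The alternation in the examples above can be sketched as a toy rule. This is a hedged illustration under stated assumptions: the prefix bi- and the voiceless/voiced pairings come from the paradigm shown (with ł the voiceless /l/ of the chapter's footnote), while treating the orthographic digraphs as single units and the function name are mine; real Navajo morphophonology is far richer (see Young and Morgan 1987).

```python
# Toy sketch of Navajo stem-initial voicing under prefixation (illustrative).

VOICED = {"ł": "l", "s": "z", "sh": "zh", "h": "gh"}  # [-voice] -> [+voice]

def prefixed(stem: str, prefix: str = "bi") -> str:
    """Attach a vowel-final prefix, voicing the stem-initial continuant."""
    # try longer spellings first so "sh" is not matched as "s" plus "h"
    for voiceless in sorted(VOICED, key=len, reverse=True):
        if stem.startswith(voiceless):
            return prefix + VOICED[voiceless] + stem[len(voiceless):]
    return prefix + stem

for stem in ["łoh", "saad", "shaazh", "hosh"]:
    print(stem, "->", prefixed(stem))
```

Note that the rule never fires stem-internally; only the initial consonant, newly flanked by vowels, is voiced.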

Navajo's [anterior] harmony, whereby [-anterior] shi- alternates with [+anterior] si-, is another example of a predictable set of phonological alternations that is nevertheless fully represented in the writing system. (For details, see Hale, Honie, and O'Neil 1972/2010: Chs VII and IX.)

This characteristic of Young and Morgan's practical writing system makes it a very useful writing system for people who do not know the language—for Navajo heritage language learners, in particular.

15.6 Conclusion

A child is not hardwired to read and write; thus it cannot know what kind of writing system, if any, it will have to contend with. And when we examine the range of writing systems that exist for the world's languages (alphabetic, alphasyllabic, syllabic, logographic, and combinations thereof), we begin to understand that writing systems can be 'friendly' or not relative to their different audiences. For example, the Chinese system, a logographic system with thousands of characters, is no one's friend, for it is a barrier to literacy, acquired at great cost to the learner. This system does, however, establish a useful written link between the relatively small set of literate speakers of the mutually unintelligible varieties of Chinese, varieties that are so different that they would be considered separate languages if divided by international boundaries.

English orthography is only somewhat friendly, to those who know the language well, who know at some unconscious level that the English lexicon is bifurcated (perhaps trifurcated) between morphemes from its Germanic origins and those that flowed into the language from Romance and Greco-Roman conquests, both political and cultural. In accommodating the morphological complexities that arose in the language, English orthography, which the inventive speller is trying to repair, has metastasized beyond repair. But this is the system the child has to learn.

4 Navajo sh, zh, h, and gh represent the sounds /š/, /ž/, /x/, and /ɣ/, respectively. /ł/ represents a [-voice] /l/.


226 O'Neil

"It's all learning-is-fun and invented spelling, and then—bam!—second grade."

FIGURE 15.2 Cartoon by Barbara Smaller in The New Yorker, 14 December 2009

In the second to last paragraph of the final chapter, 'The Higher Learning', of his Theory of the Leisure Class, Thorstein Veblen puts the task of acquiring conventional English orthography and the consequences of failing to acquire it as follows:

As felicitous an instance of futile classicism as can well be found, outside of the Far East, is the conventional spelling of the English language. A breach of the proprieties in spelling is extremely annoying and will discredit any writer in the eyes of all persons who are possessed of a developed sense of the true and the beautiful. English orthography satisfies all the requirements of the canons of reputability under the law of conspicuous waste. It is archaic, cumbrous, and ineffective; its acquisition consumes much time and effort; failure to acquire it is easy of detection. Therefore it is the first and readiest test of reputability in learning, and conformity to its ritual is indispensable to a blameless scholastic life (Veblen 1899/1974: 244).

Prior to second grade, children may be invited to violate this 'first and readiest test of reputability in learning, and conformity to its ritual', 'and then—bam!—second grade', the beginning of acquiring an 'archaic, cumbrous, and ineffective' orthography, whose 'acquisition consumes much time and effort[,]... failure to acquire it [being] easy of detection.'


16

The Arts as Language: Invention, Opportunity, and Learning

MERRYL GOLDBERG

Two times two making four is a pert coxcomb who stands with arms akimbo barring your path and spitting. I admit that two times two making four is an excellent thing; but if we are to give everything its due, two times two making five is a very charming thing too.

Dostoyevsky, Notes from the Underground (in Guerney 1943)

One of the most important things, in my opinion, is the view of children as wonderfully creative and inventive human beings. This chapter is concerned with how uncovering the role of invented spelling in children's development highlights the imaginative, persistent, consistent, and creative work of young children.

It differs from most of the others in this volume, in that I am not a linguist, nor is my expertise in the area of linguistics. I am a musician and educator, and Carol Chomsky was my advisor at the Harvard Graduate School of Education.

It pulls together Carol's work with my own, connecting it directly to current issues in arts education. In this chapter I look closely at the notion of arts as language, the role of arts in literacy, but mostly, I uncover the intersection of the essence of invented spelling—not so much as a developmental phenomenon, but as a way to look at children's development in a positive creative light, and how that way of looking at children runs parallel to what I see as the role of arts in education.

16.1 Arts Integration

The arts are a natural bridge to communication across cultures and languages. The arts are also a historical repository of cultures. In embracing the arts, there are several ways to engage, reach, motivate, and capture the attention of students. Arts integration utilizes the power of the arts to communicate and express content. In simple terms,


228 Goldberg

arts integration is an approach to teaching and learning in which students approach subject matter through an art form. For example, in language arts, students may create 'picto-spells' which are drawings that illustrate the meaning of a particular vocabulary word, as you will see later in this chapter.

In reading, children might act out scenes in their reading texts as a way to understand characters and show their understanding of the storyline including inferences, or they may illustrate (through drawing) the main idea of a paragraph. Students may use art prints as writing prompts; for example, three different art prints might be the 'prompts' for the beginning, middle, and end of a story as a creative writing exercise. Arts integration works throughout the curriculum, as students dance science concepts (one of the most magnificent was college students enacting DNA!), create soundscapes for addition, subtraction, multiplication, and division, and study poems, historical paintings, or photographs to understand and interpret events throughout history.

Arts integration is not a new phenomenon. In fact, teaching and learning used to be far more integrated prior to the Sputnik incident in the late 1950s (alas, an educational history lesson would be an entirely different chapter) which really propelled the disciplinary-specific studies of each subject area. More recently, however, several theories emerged and blossomed onto the education scene starting in the 1970s and 1980s, including Howard Gardner's theory of Multiple Intelligences (MI) that bolsters the importance of arts integration techniques. Gardner's theory impacted education, broadening teachers' views on how to teach and reach students. The Multiple Intelligences theory reminded teachers that there was more to teaching than the teaching of subject matter. Teachers were reminded that they were teaching children in addition to subject matter. And, teachers were reminded that children entered the classroom with many capabilities which could be brought into the teaching and learning equation. The notion of broadening teaching tools to include learning through the arts was broached.

It was in the 1980s and 1990s when the arts started playing a role in education as a vehicle for learning. Rather than simply as a discipline to be studied (such as learning to play the trumpet or perspective drawing), educators and writers such as myself (Goldberg 2012) and Karen Gallas (1994) began uncovering and documenting the importance of arts in the classroom as a 'language' or mode of communication for children, especially children whose first language might not be English, and children for whom speaking and writing is enhanced when they can sing, dance, or draw an idea out prior to writing about it or speaking about it.

16.2 Art as Language

Art is many things and consists of many disciplines such as theater, music, dance, painting, sculpture, poetry, etc. Most artists consider the arts as both a language and


The Arts as Language 229

a discipline, in so much as it is a vehicle for the expression of ideas and feelings as it is a product, such as a musical tune, painting, dance, poem, etc.

Mickey Hart (1991: 7), one of the drummers from the Grateful Dead, describes the notion of music as a language through which emotions and thoughts can be expressed: 'Music is a reflection of our dreams, our lives, it represents every fiber of our being. It's an aural soundscape, a language of the deepest emotions; it's what we sound like as people.'

Art Hodes (in Gottlieb 1996: 66), jazz pianist who played with Louis Armstrong and Bessie Smith, describes music as a language more literally than that as he reflects on his own early experiences with playing jazz:

Many times they'd ask me to play. I was kidded plenty. Someone would holler, 'Play the blues, Art,' and when I would play they would laugh. That hurt, but I couldn't blame them. I hadn't as yet learned the idiom. I was entranced by their language, but I hadn't learned to speak it yet.

Both musicians underscore the notion that music, and I would argue that all arts, provide an opportunity through which one can experience a fluency of language in so much as it enables learners to work with ideas, reflect on them, and ultimately communicate and express them.

My qualifying paper,1 completed at Harvard University and titled '2+2 Doesn't Always Equal Four: Understanding Children's Inventions', explores the 'arts as language' notion and was based on the work of Carol and her students Charles Read and Glenda Bissex. The paper compared the literature of invented spelling to the literature of 'misconceptions' with regard to how children learn and are perceived. And to tie it all together, I contextualized the paper in the work and writing of Dostoyevsky, excerpting several passages from his short story Notes from the Underground. Remember, I was one of the Harvard oddballs, and an artist, and couldn't resist bringing the arts into everything I did! It was through this paper that I truly began to explore the connections between understanding children's development and how that understanding could inform the field of arts education. All these years later it still stands as a foundation upon which I have created many programs.

There is no question that in the course of their development, children come up with thoughts and ideas that differ from convention. Quite often when a child's thought differs from an adult's or an authority's the child's thought is labeled a 'misconception'. I believe that this label doesn't address the process or liveliness of the child's development and discovery; the educator has made their own judgment of the child without seeking to understand the child's process of reasoning. This labeling can lead, in some cases, to a negative view of children's development.

1 At the Harvard Graduate School of Education, one must submit a qualifying paper before writing a dissertation. While most people relate their qualifying paper to their eventual dissertation, I did not! However, I have never regretted the work associated with my qualifying paper.



From a place beyond the wall of convention and accepted fact the most inventive discoveries and ideas emerge. Artists live in this space, and educators often aspire to innovation, despite the constraints of federally and state-imposed frameworks and testing. Dostoyevsky, in his short story Notes from the Underground, provokes readers to question their beliefs about conventions and truths. His character from the underground is unwilling to accept mathematical certainty or scientific laws as truth. Instead, the laws offer us but a wall, a stopping point from which we can judge all else. In many ways, his character could be a metaphor for current trends in education where teachers and school districts have become so tied to testing and test scores that they leave little room for real learning aside from teaching to the test. How unfortunate.

One of the things about arts integration that I love is how it provides a way for students and teachers alike to re-enter the world of inventiveness, imagination, and creativity. Much like the core of invented spelling, which provides adults a unique view into the thinking of a child, arts integration opens adults toward seeing children as they truly are. Through arts, children reveal their thoughts and conceptions in a most engaging manner. Upon utilizing art as a language for children, teachers often will become more open to understanding and using children's conceptions. For instance, even aside from any artwork, when a child comes to us and presents us with a statement such that 2 + 2 equals 3, we could decide that the child has some sort of misconception, or we could seek to understand if the child is inventing a way to think about the problem.

A fun example of this particular problem, 2 + 2 = 3, came during an interview I did with some small children as a part of the work in preparation for my qualifying paper. Since children do not have the same filters we have, they often are far more inventive in their thoughts. When I asked the children to come up with an example of how 2 + 2 can equal something other than 4, one child raised his hand immediately and stated, 'when you add 2 cups of sugar to 2 cups of water you get 3 cups of mush!' This set off a landslide of inventive solutions and expanded the mathematical meaning of the word 'plus'. What delighted me was raw data of children being creative. How joyful.

16.3 The Discourse of Misconception

The difference between the discourse of those who represent children's ideas as 'misconceptions' and the discourse of invention is remarkable. In my research I was struck by the invention discourse that speaks in such positive and glowing terms of student development. This was in sharp contrast to the 'failures', 'problems', and 'difficulties' addressed in the misconceptions literature.

Several pupils showed much difficulty... some pupils failed to appreciate... Misconceptions exist which constituted affective barriers to understanding...



The misconceptions in science movement studies show why and how students misconstrue or fail to learn science concepts... young children not only lack... they often have an inaccurate knowledge... wrong knowledge... erroneous concepts, erroneous ideas, misunderstandings...

Of course, we could search for what the child is understanding and still label that thought a misconception: I don't mean to imply that the misconceptionists do not search for what their students understand, in fact my qualifying paper research indicated they were extremely diligent in this regard. Nonetheless, when a child's idea is labeled as a misconception the discourse is often negative. This is a sharp contrast to a very positive discourse which can be seen in the literature that describes children's thoughts and actions as 'inventions'.

It is also important to point out that there have been various movements and researchers who have taken the initiative to create alternative discourses and label students' notions in ways other than 'misconceptions'. Children's thoughts have been described as 'alternative conceptions', 'alternative frameworks' (Nussbaum and Novick 1982), 'conceptual systems' (Carey 1986), 'preconceptions' (Clement, Brown, and Zietsman 1989), and 'critical barriers' (Hawkins 1978). I. O. Abimbola (1988) writes that 'emphasis of student conceptions in science has resulted in a proliferation of terms that described these conceptions'. These terms, Abimbola writes, include 'alternative conceptions' and 'student conceptions'.

So why this wordplay? Misconception/invention—what difference does it make if both groups of people are devoted to understanding children's development? The difference is powerful in shaping our views of our students. Misconceptionist literature often negates the child's thinking process. While most misconceptionists value children's pre-existing knowledge and acknowledge that it is constructed by each individual, the discourse is negative in nature. The discourse of invention, on the other hand, respects the children's process and development and is written in a more positive language. Teachers who consider learners' work as invention acknowledge that children are engaged in creative knowledge building. Invention is a creative affair. Viewing the learner as an active creator of knowledge respects not only the learner but also acknowledges the very many different ways students invent their understandings of phenomena. A training that emphasizes looking at children's inventions, no matter how they unfold, clearly is far more respectful toward children and their capabilities than one that finds 'misconceptions' and then figures out ways to fix them.

16.4 Invention

Invention, innovation, risk-taking: these are qualities associated with the process of art, and the work of artists. Art also requires dedication, discipline, and specifically the learning of techniques. Children are constantly engaged in the act of inventing.



These terms are also associated with the inventive speller, as with most children when they engage in learning.

Piaget (1973) writes in To Understand is to Invent: The Future of Education:

...to understand is to discover, or reconstruct by rediscovery, and such conditions must be complied with if in the future individuals are to be formed who are capable of production and creativity and not simply repetition.

Creativity, which encompasses invention, is fundamentally a part of children's growth and development. Of course it is also at the core of the artistic process.

16.5 Invented Spelling

Invented spelling is essentially an act whereby a child is being naturally creative in communicating his or her needs and wants. There are children of four and five years of age, who know the letters of the alphabet but do not yet read, who show an uncanny ability to compose words and messages on their own, in creating their own spellings. 'They use the letters of the alphabet according to their names, or their sounds if they know them, and represent words as they hear them, carrying out splendid phonetic analyses' (C. Chomsky 1975b). This phenomenon is what was to become known as 'invented spelling'. The movement began as a result of curiosity over a child's seemingly unique ability to write before reading. Carol Chomsky reflected on invented spelling in a paper presented at the American Montessori Society symposium, 'Montessori in the contemporary American culture' (April 26, 1990):

I first encountered a child who wrote without knowing how to read in the early 1950s. He was the 3½-year-old son of some friends. They mentioned that he was producing writing that looked very strange, full of misspellings. They were surprised and then amused, not sure what to make of it.

Of course I was intrigued. I asked to see some of the writing, and they obliged. To the eye of a linguist and phonetician, it was anything but nonsense. It took a while to figure it out, but there was clearly a system at work. The spelling was startlingly regular, a principled rendering of the way English sounds. It was 'unreadable' because it was not like conventional spelling. One didn't read it, one deciphered it. What I stumbled onto was a remarkable construction of a private spelling system, apparently the idiosyncratic invention of a highly creative and unusual child. He had worked it all out his own way; this child had not yet read.

This was my first glimpse at what later came to be called invented spelling. For as it turns out, this child was not unique, nor were his abilities rare. His spellings were neither idiosyncratic nor the result of unusual creative ability. We were to learn later that children can quite commonly write, in their own spellings, before they can read, creating spellings that are surprisingly uniform from child to child. Thus, it was not my friend's son who was so unusual. Rather what was unusual, in retrospect, was the lack of recognition, on the part of the developmental and educational world, of this interesting and important ability that young children command.



It was some ten years later, in the mid-1960s, that Charles Read, then a student at the Harvard Graduate School of Education working with Chomsky, investigated children's early writing for his doctoral dissertation (see Read 1970; Read and Treiman, this volume). He worked with twenty children, in various stages of learning to read, all of whom were writing with invented spellings. What he discovered was that 'their spelling systems exhibited common features, different from conventional spelling and based on the way English sounds. They each work according to principles that were strikingly similar, though derived independently' (C. Chomsky 1990). Carol Chomsky continued:

What was exciting about this discovery was two-fold: that writing before reading (or early on in learning to read) was not an individual quirk but was practiced by large numbers of children; and that the specific spelling features invented by these children were not idiosyncratic, but shared by all of them. They all made up their spellings in the same way!

16.6 The Discourse of Invented Spelling

Glenda Bissex (1980), in her book GNYS AT WRK: A Child Learns to Write and Read, describes her work researching her son's spelling: 'I was amazed [emphasis mine throughout] and fascinated'. She understood her son Paul's spelling and language development as '[reflecting] the child's expanding grasp of the complex principles of written language' (p. vi).

What is striking about the invented spelling research is the enthusiasm and attention with which the investigators view children's abilities. Researchers such as Chomsky, Bissex, and Read continually used words such as 'intrigued', 'interesting', 'excited', 'discovery', 'important', 'construction', and 'remarkable'. The emphasis is positive, based upon children's actions and intentions, as opposed to emphasizing what they do not know in relation to convention.

Invented spelling is a creative developmental matter that provides an excellent foundation for later reading (see C. Chomsky 1971). It is important to point out that well before Chomsky, Read, and others documented this phenomenon, Montessori wrote about the ability of young children to write before reading, and indeed she taught reading through word composition (Montessori 1912/1964). Glenda Bissex (1980) takes us one step further in grasping the significance of invented spelling. Her book follows the development of her son Paul, who describes what he was doing 'as writing rather than spelling'.

Had his main interest been in spelling words, he would have written word lists. What he wrote, however, were messages. He cared about what he wrote, not just about how he wrote it. When we look over six years of his writing, patterns emerge that suggest how much a part of his person and growth his earliest writings were. Therefore this study, while looking closely at Paul's development as a speller, seeks also to keep in view his development as a writer.



16.7 Relating Invention to the Arts Integration

Glenda Bissex writes in the preface of GNYS AT WRK (1980) that rather than offer 'generalizations to be "applied" to other children' with regard to understanding invented spelling, what is needed 'is encouragement to look at individuals in the act of learning. And I do mean act, with all that implies of drama and action' (p. vi). In considering the relationships between the 'acts' of invented spelling and the act of creating art, numerous similarities can be noted. Qualities that describe the process of invented spelling and the process of making art(s) are noteworthy. In comparing keywords that describe each, the following words appear in both the discourse of invented spelling and the discourse associated with arts integration:

• desire
• care
• participation
• perseverance
• engagement
• passion
• wonder
• imagination
• creativity
• confidence
• discipline

Without really knowing it, I developed a visual extension of spelling several years ago for a program I developed aimed at reaching English language learners through arts methods. The program, called SUAVE (Socios Unidos para Artes Via Educacion or United Community for Arts in Education), placed artists in classrooms with teachers (and still does), to find ways to teach the curriculum through the arts. The artist's job is to find out what areas the teachers struggle with and then work with teachers to invent ways to present the subject matter through arts-based methods. Many teachers expressed interest in focusing on spelling and word comprehension, not a big surprise since here in southern California, a great majority of students are English-language learners. After some brainstorming, voila, the 'picto-spell' was born. Picto-spells (Figures 16.1-16.5) are visual representations of spelling words that incorporate the spelling of the word into the picture (Goldberg 2012).

I have found a remarkable overlap between my own concerns which have arisen in the course of my work in arts education, and the list of words I took from Carol Chomsky's studies of children's development. I have put in bold all the areas where I see a direct connection between arts education and the invented spelling movement with regard to children's work and how children are portrayed in their capabilities



FIGURE 16.1 Snap

FIGURE 16.2 Round

and promise. All but one of my top ten concerns directly overlap with Carol and her students' characterization of children and the way in which they learn and develop.

My top ten list revolves around concerns I have that I believe are unmet in educational practice.

Top ten concerns for education:

1. Desire/Care. My first concern is that school settings have gotten away from being a place where students have a desire to learn; where they can feel engaged in learning, and where they care about what it is that they are learning. Children who are inventing spellings clearly have a desire to communicate. This is also true of students in the arts; they often engage in the arts because it fulfills that



FIGURE 16.3 Dinosaur

FIGURE 16.4 Fish

desire to communicate, create, share, or be engaged with people and ideas. It seems to me that a no-nonsense approach to successful learning and successful teaching would be to bank on kids' desires and interests. Finding ways to create desire such that students want to learn and care about learning would also create learning environments rich with complexity and excitement. In arts education we also need to find ways to create the desire for arts again, especially in light of years of very little or non-funding of arts programs. It is hard to desire what we don't know. Therefore, a major task of educators is reintroducing the power and potential of the arts to children.

2. Passion. Adults are not the only people with passions! Kids develop passions very early on in their lives. I believe that when kids are engaged in invented spelling via writing messages of importance, they are generally doing so because they are passionate about what they wish to express. Passion goes a long way. In the arts, it is easy to uncover passions as children act, sing, dance, or draw. Creating spaces in schools where kids' passions can take hold (love of theater,



FIGURE 16.5 Dread. This 'dread' picto-spell was interesting in that the student was very concerned about the picture since the 'e' is repeated in the picture, but not in the word. After discussion, the student and I agreed that symmetry could trump spelling.

music, cooking, animals, bugs, photography, sports, knitting, skateboarding, so on...) taps into children's natural tendencies, and also sets the stage for creating a school environment whereby they can become passionate for learning itself.

3. Perseverance. Children who are engaged also tend to persevere in their pursuit of learning or interest in a particular idea. Children engaged in invented spelling persevere in that they want or need to get across a particular idea. The role of perseverance in learning is critical. Learning through the arts also teaches perseverance. When children engage in putting on a puppet performance, they must practice, learn (or invent) lines, and practice more before an audience is invited to view the play. This activity creates in students a sense of the importance of sticking to something, seeing it through, and the potential joy of sharing their efforts with a wider audience.

4. Wonder. Children naturally wonder about the world around them. Wonder and curiosity are probably two of the most potent openings for learning. As children invent spellings they are engaged in wonder. The arts at their core are a tool for wonder to take hold. Teaching and learning would blossom if we returned to education as a stage for the support of wonder.



5. Grace is something that is often experienced through the arts. As individuals work together they increase their empathy for each other. Furthermore, the experience sets the stage for individuals to recognize and value their own skills and attributes. Community is often formed around art-making, leading toward a greater mutual understanding of individuals in the group. In addition, through this collective art-making, which relies on a process of listening vs communication, cultural differences and values are often shared and appreciated.

6. Creativity and practical applications. The arts provide many opportunities to directly apply learning. Arts activities demand hands-on participation. Arts activities may in fact restate formulas, dates, and other facts, and/but are creative at the same time. When dancing out DNA or answering a math problem in sounds with a xylophone, students are thinking about the subject matter rather than reiterating it. One of my concerns is that schools have drifted away from the actual application of learning, while resorting to ready-made guided learning complete with worksheets and formulaic reiteration of facts. I am not convinced that this kind of learning will add up to much when all is said and done.

7. Engagement and risk-taking. This is a good thing! Risk-taking means that a student is going beyond some comfort level and is clearly engaged in a process of learning. The arts naturally teach the art of risk-taking since whenever a student improvises during theater games or acts a play in front of her peers or adults, a certain level of risk is involved. Finding ways to encourage students to take risks in a safe and accepting environment teaches students to be willing to try new ideas, to search for differences, and to act upon something new. Individuals who take risks are often the individuals who find new solutions to challenges no matter what field or career. Risk-taking also teaches students to think in new ways.

8. Confidence. I find that many of my college students have not had the opportunity to feel confident in their learning. This is truly a sad commentary on the state of education. Kids who are inventing spellings feel confident in their ability to get across their ideas. The arts also build confidence in students' abilities to be before an audience, to work in groups, and to put their own ideas on a piece of paper.

9. Participation. Invented spellers are participants. They are engaged with others. The performing arts require participation, and can provide a model for learning the skills of democracy. The arts and sports are areas that teach children skills of democracy in action. This is in sharp contrast to the emphasis on teaching to tests—or teaching test-taking skills. Test-taking, constant worksheets, these are elements of teaching that can and do create a sense of doubt in students, as opposed to inspiring learning. I see it in my students at the university and in my own daughter and it truly saddens me. Test-taking itself is not the core problem; it is the emphasis on tests, test-taking, and test-taking skills to the detriment of other ways of assessing understanding that concerns me. The same student who can do well on a math test might not have any sense of how the math applies to manufacturing the car, bus, or subway we use in everyday life. Likewise, a student who can answer all the questions on a history test might not have the same understanding as the child who, along with classmates, becomes the characters in history and acts out a particular event.

10. Imagination. Artists rarely see things as either/or. This sensibility is wonder-filled and sets the stage for expanded learning in the sense of searching for understandings as opposed to identifying answers. Artists tend to thrive on complexity, crossing boundaries, bridges, jumping walls, and taking risks. These artistic notions expand the possibilities for learning. Inventive spellers are clearly imaginative in how they seek to get across their messages!

16.8 Concluding Thoughts

If you put a musician in a place where he has to do something different from what he does all the time, then he can do that—but he's got to think differently in order to do it. He has to use his imagination, be more creative, more innovative; he's got to take risks.... I've always told the musicians in my band to play what they know and then play above that. Because then anything can happen, and that's where the great art and music happens.

Miles Davis (in Gottlieb 1996: 243-4)

Miles Davis, the great jazz trumpeter and composer, understood the potential to create great art and music when his musicians played 'above' what they knew. Just as performing ensembles seek to make great music happen, schools are ensembles where children should be able to learn the skills that will enable them to create a world where great things happen.

Each and every child has potential and is capable. What most kids need is opportunity. Sadly, it's increasingly the case that children are losing opportunities in learning as schools focus narrowly on reading and math and measuring success via test scores of those subjects. Diane Ravitch (2010: 226) put it this way: 'Our schools will not improve if we value only what tests measure. The tests we now have provide useful information about students' progress in reading and mathematics, but they cannot measure what matters most in education. Not everything that matters can be quantified.' She goes on to advocate for an emphasis on a curriculum that educates children to become responsible citizens and includes the arts. 'If we do not treasure our individualists, we will lose the spirit of innovation, inquiry, imagination, and dissent that has contributed so powerfully to the success of our society in many different fields of endeavor' (p. 226).



Carol Chomsky and her students found joy in how they viewed children as meaning-makers and creative individuals with a passion to present their ideas via messages with their invented spellings. I believe this is a legacy that applies to how we should view education overall.

In the world of the arts, creativity, imagination, passion, and perseverance are all underlying aspects of what it takes to be engaged as an artist. In the world of childhood, the same aspects apply. Carol gave of herself, and gave to us many things. Perhaps for me one of the most important legacies is that of a lens through which to see children as creative and exciting, passionate beings with a sense of purpose. I didn't connect all the dots to arts and education while we were working together, but I can connect them now. 'Anything can happen,' Miles Davis writes above, when the opportunities for risk-taking and creativity are open. Carol knew how to honor and keep that door open. And it is now our job to keep opening doors.


Epilogue: Analytic Study of the Tadoma Method—Language Abilities of Three Deaf-Blind Subjects*

CAROL CHOMSKY

A number of papers have recently appeared on the Tadoma method of speechreading, a vibrotactile method of speech perception used by deaf-blind subjects (Norton et al. 1977; Reed, Doherty, Braida, and Durlach 1982; Reed, Durlach, and Braida 1982; Reed, Durlach, Braida, and Schultz 1982; Reed et al. 1985; Reed, Rubin, Braida, and Durlach 1978; Snyder, Clements, Reed, Durlach, and Braida 1982). This method of speechreading has been used in training deaf and deaf-blind individuals for both receiving and producing speech, and for developing a knowledge of language (Alcorn 1932; Gruver 1955; Van Adestine 1932; Vivian 1966).

In the Tadoma method, the person receiving speech places a hand on the face and neck of the speaker and monitors the articulatory motions associated with normal speech production. In the typical hand placement, the thumb rests lightly on the talker's lips and the fingers spread out over the face and neck (Vivian 1966). For the deaf-blind Tadoma speechreader, there is no auditory or visual input. Speech perception is achieved through the tactile sense alone. One advantage of using Tadoma for speechreading is that Tadoma users can receive speech from virtually any speaker, and thus are not limited to communication with specially trained individuals with whom they share a manual system of communication (Reed et al. 1985; Schultz, Norton, Conway-Fithian, and Reed 1984).

Training in the skilled use of Tadoma for receiving and producing speech may extend over years of intensive, individual instruction. Students first receive training in speech reception, followed by training in speech production through imitating a teacher's articulatory motions (Schultz et al. 1984). In learning to produce speech, the student monitors the teacher's articulation by placing a hand on the teacher's face and neck, and then attempts to match the articulation while placing a hand on his or her own face.

* Originally published as Chomsky, Carol (1986). Analytic study of the Tadoma Method: Language abilities of three deaf-blind subjects, in Journal of Speech and Hearing Research 29 (September): 332-347. Reprinted with permission.



The extent of the use of Tadoma in schools for the hearing-impaired and deaf-blind in the United States and Canada is described in a recent survey article (Schultz et al. 1984). Schultz et al. report on the use of Tadoma with students of varying disabilities, both as a primary means of speech training and in conjunction with other methods of speech and language training. The method was most widely used from 1920 to 1960, and its use has apparently declined since then. The survey reports that there are some fifteen to twenty deaf-blind persons in the United States today who rely on Tadoma as their primary means of speech communication.

There was little discussion of the method in the research literature until the late 1970s, when reports on the speechreading abilities of experienced Tadoma users began to appear (see above). This recent research is motivated by an interest in developing tactile aids for speech communication by the hearing-impaired and the deaf-blind. The study of Tadoma is relevant to this goal in the information it provides about the capabilities of the tactile sense and the parameters involved in the use of Tadoma. The adequacy of the tactile sense for processing temporal information such as speech is clearly a question of basic importance with regard to the feasibility and design of such tactile aids. The degree of success that can be attained by Tadoma users in processing speech and developing language thus has serious implications for the potential of tactile aids to transmit spoken language information to deaf individuals.

A preliminary probe of the language knowledge of one deaf-blind Tadoma subject appeared in Norton et al. (1977). The purpose of this report is to extend the study to additional language measures on the original subject, and to present results of language testing with two additional highly experienced deaf-blind Tadoma users.

The three subjects are totally deaf and blind, two of them since they were one-and-a-half years old. To examine the language¹ of these Tadoma users, we administered several standardized verbal intelligence measures, a syntax test in use with deaf populations, and a number of special-purpose linguistic tests constructed for this study. In addition we analyzed samples of their oral and written language. The study is an exploratory one, and our purpose was to sample a wide range of diverse abilities, rather than attempt an exhaustive and systematic account in any one area. We thus have included in the testing an examination of vocabulary, and a range of syntactic, semantic, and prosodic features of language.

Method

Subjects

Subject LD. LD, age fifty-five, has been totally deaf and blind since age nineteen months, following a case of spinal meningitis. His development was normal until that time, but he emerged from a nine-week coma having lost both sight and hearing. His use of language ceased, and he received no language training for almost four years. At age 5:4 (years:months) he entered the Perkins School for the Blind in Watertown, MA, and his language training through Tadoma was begun. Records of his early school years, reported by Stenquist (1974), provide details of his progress. Within eight months he had forty expressive words, and by age seven he had 410 words and was combining three words into sentences. His schooling continued at the Perkins School until age twenty, at which time his Stanford Achievement Test scores show an average Grade Equivalent of 6.6, with Grade Equivalents of 7.6 in Language Usage, 9.0 in Spelling, and 7.0 in Word Meaning. LD lives today in his own home, is married, and holds a factory job. Tadoma is his primary means of communication, aided occasionally by tactile finger-spelling and sign which he learned as an adult. His oral language and Tadoma speechreading abilities are sufficient for him to engage in fluent conversation with untrained hearing individuals. His literacy skills enable him to read Braille and type his own letters.

¹ Throughout the body of this paper, the term language is used to refer to English. This is a study of the English of our subjects, and even though American Sign Language is mentioned, we did not analyze the subjects' competence in this area.

The results of audiometric testing indicate no pure-tone response in the right ear, while minimal low frequency response (probably vibrotactile) was observed in the left ear. LD demonstrates no ability to discriminate or identify speech sounds auditorily. He has attempted to use a variety of hearing aids with no success. Results of visual testing indicate no measurable visual acuity.

Subject RB. RB, age forty-nine, has been deaf and blind since twenty months of age, as a result of spinal meningitis. His development was normal until that time. When he was two-and-a-half he entered St. Mary's School for the Deaf in Buffalo, NY, where he remained until age seventeen. He was trained in Tadoma for speaking and receiving speech, and in the use of sign language, signed into his hand. After graduation from St. Mary's, he went on to study electronics at the Burgard Vocational High School. Today he works as an electronic technician, is a licensed ham radio operator, and has learned computer programming.² RB is able to read Braille and type his own letters. He is fluent in American Sign Language, and he often uses tactile sign or finger-spelling in communicating with people who command these manual systems. He reports that his communication with others is achieved about half the time using sign language, and half the time using Tadoma. His Tadoma proficiency and oral language are adequate for him to engage in conversation with untrained hearing speakers.

Results of pure-tone testing indicate minimal low frequency response (probably vibrotactile) in the right ear, with no response in the left ear. RB demonstrates no ability to discriminate or identify speech sounds auditorily and has never used a hearing aid. Visual tests reveal no measurable visual acuity.

² RB uses Morse code to talk on his ham radio, and reads the computer screen by tactile Morse code output.



Subject JC. JC, age fifty-four, developed normally until age seven, when she lost both sight and hearing as a result of spinal meningitis. She was subsequently trained in Tadoma at the Arizona State School and the California School for the Blind. She attended the University of the Pacific where she obtained a BA in Sociology. Today JC works for a State Department of Rehabilitation as the state-wide consultant for deaf-blind persons. JC reads Braille and is able to type her own correspondence and original short stories. Tadoma is JC's primary means of communication, and she is proficient enough to engage in fluent conversation with untrained hearing individuals. Her spoken language is sufficient to enable her to lecture at conferences which she attends in connection with her employment. She has not received training in sign language, but is skilled in tactile reception of finger-spelling.

JC has no measurable hearing or sight. She has no response to pure tones in either ear across the audiometric frequencies. She demonstrates no awareness of speech sounds and has never used a hearing aid. Similarly, tests of vision reveal no measurable visual acuity.

Description of tests

Verbal Subtests from WAIS, WISC-R, and Stanford-Binet. Verbal subtests from three intelligence scales in use with the hearing population were selected to provide some standardized measures of the subjects' verbal abilities. Performance was measured on tests of vocabulary, differences between abstract words, proverbs, and the like. These tests were all administered orally, with a copy in Braille available for reference in case of doubtful perception. The subjects often referred to the Braille copy to be certain they were perceiving the words and questions accurately.

Two subtests were administered from the Wechsler Adult Intelligence Scale (WAIS 1955). The WAIS Vocabulary subtest, which requires the subject to provide word definitions, has a maximum raw score of 80 and an average scaled score in the range of 8 to 12 (with a maximum of 19). The WAIS Similarities subtest, which requires a description of how two items are alike, has a maximum raw score of 26 and an average scaled score identical to that for the Vocabulary subtest. The Vocabulary subtest from the Wechsler Intelligence Scale for Children (WISC-R 1974) was administered as well to provide a fuller picture of vocabulary knowledge, even though the scaled score is applicable only to chronological age sixteen. Finally, four subtests of the Stanford-Binet Intelligence Scale (1960) were administered. These included (a) Differences between abstract words (which requires a subject to define differences between three pairs of words, e.g., laziness and idleness); (b) Essential differences (which requires the subject to describe the principal difference between three pairs of words, e.g., work and play); (c) Abstract words III (which requires definition of five words, e.g., generosity); and (d) Proverbs (which requires the subject to relate the meaning of several proverbs, e.g., 'All that glitters is not gold').



Test of Syntactic Abilities. The Test of Syntactic Abilities (Quigley, Steinkamp, Power, and Jones 1978) was administered to provide a measure of the subjects' abilities with reference to norms for deaf individuals. This test was developed for evaluating the English skills of deaf pupils. It contains a 120-item screening test to evaluate performance on a set of nine grammatical structures (e.g., negation, conjunction, verb processes). The test items, which are presented in multiple-choice format with four alternatives, were administered in Braille. Normative data on a population of 505 students ages eight to eighteen with a hearing loss greater than 20 and less than 120 dB are available. Average performance in the norming group is 68 percent correct.

Special-Purpose Linguistic Tests. Special-purpose linguistic tests were designed for this study to examine the subjects' knowledge of a range of syntactic structures and the principles of semantic interpretation of syntactic structures. Particular prosodic features of language were also studied. The tests cover a variety of grammatical properties within the domain of transformational-generative grammar (N. Chomsky 1965, 1975), examining the subjects' interpretations of sentences that involve fairly complex and subtle grammatical features of English. These are aspects of grammar that, for the most part, would not have been taught to the subjects but that they would have had to acquire independently through experience and exposure to language use. The features selected are ones that are commonly known to native speakers of English. The test questions have all been answered successfully by hearing native English speakers, both high-school students and a variety of adult volunteers whom we have questioned informally. We were interested in the degree to which the subjects have been able to acquire these basic but complex language forms, and the nature of the deficits, if any.

A Braille copy of the structural tests was prepared, and the subjects read the questions and reviewed them orally before responding. Considerable care was taken to ensure that the subjects perceived the questions accurately. For the tests of stress and intonation, the test materials were spoken to the subjects. They gave all of their answers orally.

The items included in each of the Special-Purpose tests are listed in the Appendix, and a brief description is included here. The individual tests contain different numbers of items, and scores are reported as percentage of items correct.

Structure. The structural tests examine the subjects' knowledge of syntactic and semantic features of English. Subjects are asked, for example, to report on the meaning or acceptability of sentences that contain semantic complexities, ambiguities, or grammatical anomalies. On tests entitled Deletions, Article Switch, Ambiguity, and Illicit Comparison and Conjunction, subjects answer questions about the meaning of the sentences, the correctness of the sentences, or how two sentences differ in meaning.



On Tag Questions, Contractions, and Phrase Analysis, subjects are asked to produce target linguistic forms.

For example, on the Article Switch test, the task is to describe the difference in meaning between two sentences which differ in placement of a and the:

1. Maggie looked at the puppy at Peter's Pet Shop, but later she decided not to buy a puppy.

2. Maggie looked at a puppy at Peter's Pet Shop, but later she decided not to buy the puppy.

In sentence 1 Maggie saw a particular puppy at the shop and later decided not to buy any puppy at all. In sentence 2 Maggie saw a puppy at the shop and later decided not to buy that particular puppy.

On the Ambiguity test, the task is to give more than one meaning for sentences such as 'The long drill was boring' and 'The chicken is ready to eat'.

Prosodies. These tests examine subjects' knowledge of and ability to utilize intonation and stress cues to derive meaning in phrases and sentences. The items consisted of Compound Noun Stress, Contrastive Stress, and Yes/no Question Intonation. In Compound Noun Stress, for example, subjects are asked to distinguish the meanings of GREENhouse (special place for growing plants) and green HOUSE (a house which is green).

In contrast to the other language tests, which were all presented in Braille or had Braille copies available for reference, the prosodic tests were delivered orally to the subjects. These tests examined the subjects' ability to perceive the prosodic features through Tadoma, as well as their recognition of the linguistic function of the prosodic information, if perceived. Sentences were repeated as often as necessary to ensure optimal tactile access to the prosodic features pronounced by the speaker.

Developmental Sentence Scoring. The Developmental Sentence Scoring (DSS) procedure of Lee's Developmental Sentence Analysis (Lee 1974) was applied to a sample of each subject's spontaneous speech, produced during normal conversation in the laboratory with members of the research group. The DSS procedure analyzes fifty complete consecutive sentences spoken by a subject, scoring occurrences of pronoun usage, verb types, conjunctions, negatives, and the like. Although the DSS measure is intended to assess developmental language disorders of young children and is normed only to age 6:11, a DSS score on adult speech can be informative in comparison to these early levels. An adult score well above the scores of the six-year-old norming group, for example, may be interpreted as evidence of language development beyond the middle childhood stage. The mean DSS score for the norming group at age 6:6 is 10.94, with a range from 8.11 to 13.78.

For each of the three subjects, the number of sentences analyzed was less than the recommended fifty. For this reason, the scores reported should be considered a 'rather tentative DSS' (Lee 1974: 163). In each case, scores were calculated by dividing the total sentence scores by the number of sentences contained in the sample.

Results

This section presents the results for each of the three subjects, along with samples of their spoken and written language. The test results across subjects are summarized in Tables 1 through 3.

Subject LD

Verbal Subtests from WAIS, WISC-R, Stanford-Binet (Table 1). LD performed well on six of the seven standardized tests, comparing favorably with the hearing population. His WAIS Vocabulary scaled score is 12, at the high end of average for hearing individuals. On the WISC-R Vocabulary, he scored 50 out of a possible 64. His definitions were generally thorough and well-stated, for example, sentence: 'complete group of words written in one sentence; a judge gives a sentence'; calamity: 'a great disaster.'

His WAIS Similarities score was within the average range for hearing individuals, but at the low end of the scale. He consistently described similarities between the items by naming common attributes: 'In what way are a coat and a dress alike?' 'Both made of cloth.' 'A dog and a lion?' 'Both have teeth, tails, four paws, both covered with hair.'

TABLE 1. Scores for the three subjects on the Standardized Measures for the Hearing Population

Tests                                         LD        RB       JC
WAIS Vocabulary: Scaled (raw)                 12 (54)   7 (22)   17 (76)
WAIS Similarities: Scaled (raw)               8 (6)     13 (19)  16 (22)
WISC-R Vocabulary: Raw (of 64)^a              50        32       61
Stanford-Binet:
  Differences between abstract words          83%       0        100%
  Essential differences                       100%      33%      100%
  Abstract words III                          80%       30%      100%
  Proverbs                                    0         0        100%
Developmental Sentence Scoring (DSS)^b        26.6      24.97    20.67

^a WISC-R Vocabulary scores are reported as raw scores only. A scaled score cannot be assigned because the test is applicable only up to age sixteen.
^b Tentative DSS, based on fewer than fifty sentences.



TABLE 2. Percent correct response for the three subjects on the Test of Syntactic Abilities

Structures            LD    RB    JC    (Deaf norms)^a
Negation              100   89    100   (83)
Conjunction           100   91    100   (64)
Determiners           100   86    100   (78)
Question formation    100   100   100   (73)
Verb processes        90    100   90    (63)
Pronominalization     100   90    100   (67)
Relativization        89    89    100   (59)
Complementation       94    89    100   (65)
Nominalization        89    83    100   (65)
Total                 95    90    99    (68)

^a Based on the data of Quigley, Steinkamp, Power, and Jones (1978).

Even with continued prompting, LD did not provide category-type answers such as 'They are both clothing' or 'both animals'.

On the Stanford-Binet Essential Differences subtest, he scored 100 percent, and on the Abstract Words subtests he missed one item each. His responses were accurate and to the point. For example, 'What is the principal difference between an optimist and a pessimist?': 'An optimist is a person who looks at the bright side of something, and who knows the best time's to come. A pessimist is a person who is on the dark side of things and who thinks nothing can be done.'

LD performed poorly on the Proverbs test. On this measure, he was able to give only literal restatements of the proverbs. In no case was he able to refer to the generality or larger truth embodied in a proverb, and we conclude that he does not understand the special character of proverbs.

Test of Syntactic Abilities (Table 2). LD's overall score of 95 percent correct on this test places him well above the norm of 68 percent for deaf subjects. His answers were 100 percent correct on five of the structures: Negation, Conjunction, Determiners, Question Formation, and Pronominalization. He missed one item on Verb Processes and Complementation and two items each on Relativization and Nominalization.

Special-Purpose Linguistic Tests (Table 3). Structure. LD's judgments on the structural tests were mixed. On the Deletions test, which required him to identify missing information, he answered correctly for the sentences that follow the general rules of English, and incorrectly on the exceptions. For example, in answer to 'Who is supposed to wash the dishes?' in the two sentences:



TABLE 3. Percent correct response for the three subjects on the Special-Purpose linguistic tests

Tests                                      LD     RB    JC
Structure
  Deletions                                50     63    100
  Article switch                           0      75    100
  Ambiguity: Sentences                     50     43    100
  Ambiguity: Subject phrases               40     20    100
  Illicit comparison                       100    80    100
  Illicit conjunction                      83     0     100
  Tag questions                            17^a   0     100
  Contractions                             50^a   27    100
  Phrase analysis                          100    100   100
Prosodies
  Compound noun stress                     67^a   75    100
  Contrastive stress: Pronoun reference    0      0     100
  Contrastive stress: Focus of negative    0      0     100
  Yes/no question intonation               0      0     0

^a In these cases, less than the full test was administered. The percentage listed is based on a count of correct responses with respect to the number of items actually given. See text for details.

1. John told Susan to wash the dishes.
2. John promised Susan to wash the dishes.

he answered 'Susan' to both, instead of 'John' in sentence 2. The general rule requires Susan in such constructions, and sentence 2 with the verb promise is an exception.

He performed poorly on all four pairs of sentences on the Article Switch test. He either did not recognize a meaning difference in the pair, or if he thought the two sentences were different, described the meanings inaccurately.

On the Ambiguity (Sentences) test, he easily detected lexical ambiguity, describing meaning differences accurately. He was successful in detecting about half of the structural ambiguities, detecting deep and surface structure ambiguity about equally. He paraphrased the structural ambiguities that he detected quite well. For example, for 'They are moving sidewalks' he said, 'Means two things. People are moving sidewalks, or it could mean they are conveyor sidewalks. The sidewalks are moving.'

On the Ambiguity (Subject phrases) test, LD initially filled in only is or are. When asked if the other verb was possible, he answered yes and reported the two meanings for two of the five sentences.

He achieved a perfect score on the Illicit Comparison sentences, easily and accurately describing what was the matter with each one. For example, he rejected 'This math problem is not as hard as that rock' because 'A rock is hard to knock, to touch (he rapped on the table). Math is hard in the head.'

On the Illicit Conjunction test he also achieved a high score. He distinguished the acceptable sentences from the unacceptable ones, accepting the good ones immediately, and pausing at some length over the unacceptable ones. These latter were clearly questionable to him, and he ended up accepting some and rejecting others.

Performance was poor on Tag Questions. LD was able to give the correct tag for only the first sentence: John is an engineer, 'isn't he?' On subsequent sentences, he resorted to a generalized tag such as 'Is that so?' and was unable to produce the syntactically accurate form. The test was discontinued after six sentences.

Only half of the items were answered correctly on the Contractions test. It is of interest that these were the items that have only one possible expansion: won't, should've, they're. The ones that he missed, you'd and two instances of what's, are ambiguous and depend on sentence context. You'd, for example, may derive from either you would or you had, and what's may derive from what is, what does, or what has. There was no ambiguity, of course, in the context of the test sentences.

On the Phrase Analysis measure, LD correctly filled in is or are in all the sentences, showing an accurate perception of the internal structure of the subject phrase.

Prosodies. LD did well on only one prosodic test, Compound Noun Stress. On the Compound Noun Stress test, LD gave correct meanings for hot DOG and HOT dog, green HOUSE and GREENhouse. Black BOARD and BLACKboard were indeterminate. LD was tested on only these three word-pairs. His success on the first two examples shows an ability to perceive and interpret stress appropriately at the level of this test, where stress functions to distinguish word and phrase meaning.

LD did poorly on both Contrastive Stress tests, where stress is used for emphasis or contrastive purposes. On the Contrastive Stress: Pronoun Reference measure, he did not perceive the stress differences that we pronounced in the sentences and reported that they sounded the same to him. On the Contrastive Stress: Focus of Negation measure, he did perceive the stress differences as pronounced, but reported no resulting difference in meaning.

On the Yes/no Question Intonation test, LD did not perceive the intonation differences between the questions and the statements. He was able to correctly identify the intonation as rising or falling about half the time, no better than chance. When it was explained to LD that rising intonation signals a question, he answered 'That's news to me.' It is of interest, however, that in conversation LD responds entirely appropriately to such 'questions'. Note the following exchanges in a conversation with LD:

You live in Kansas? LD: Yes.
You cross the border? LD: Yes.
Someone gives you a ride? LD: Yes, a friend from work.
In a taxi? By car? LD: By car.



LD always recognizes that such constructions require answers when they occur in context in conversation. Intonation, which he does not perceive, is apparently a superfluous cue under such circumstances.

Developmental Sentence Scoring (Table 1) The DSS procedure was applied to a corpus of thirty sentences spoken by LD in conversation with members of the research group. LD's DSS score, on the basis of the thirty sentences, is 26.6, well above the 10.94 mean for age 6:6.

Here is a portion of LD's oral language sample:

Oh, one time one of my friends took me to a huge trucking garage where he works. This trucking garage repairs transportation trailer trucks and trailer cabs. You know how high they are. Well, I stayed at the garage for more than an hour and a half or two hours and I saw all the giant mechanical equipment there is. And I saw the small equipment for testing and cleaning out carburetors. And I was taken to a place where trucks were smashed up in an accident. And I saw one cab flattened down to about a foot high. The cab—you know how big the cab is—but it was squashed down about one foot. And I was amazed to see the trucks that got smashed up in an accident. And my mechanics friend told me that the driver who got out of that cab that was squashed down by accident, got out by [?] escape. He came out alive. He was not killed but he was very badly injured.

As can be seen in this sample, LD's spoken language is of high quality, comparable to that of individuals with normal hearing. His vocabulary is mature, for example, mechanical equipment, carburetor, repairs, injured. He uses complex sentence structure, for example, two levels of subordination: I was amazed (main verb), to see the trucks (embedding 1), that got smashed up (embedding 2), and several passives: was taken, was squashed down, was not killed. This sample is typical of his speech, which is noteworthy for its fluency, naturalness, and low incidence of error.

Note also LD's appropriate use of the verb see: I saw all the giant mechanical equipment, I saw one cab..., I was amazed to see the trucks.... This use of sight verbs is typical of LD's productions. Throughout his conversations, there is frequent and appropriate use of visual terminology. For example, elsewhere he has commented: It's a beautiful place, I like to see the snow come and go. Further, his definitions of sight verbs like gaze, fade, and dazzle are exact. It is of interest that the accurate knowledge and use of sight terms has been noted for hearing blind subjects, whose access to language is not limited as in the case of a person who is also deaf (Landau and Gleitman 1985: 94-7). The linguistic and cognitive mastery of such sight vocabulary is all the more dramatic when it develops in the absence of both sight and hearing, as in LD's case.

Written Language A sample of LD's writing is included as another example of his language production, and to illustrate his level of literacy. This is an excerpt from a letter typed by LD to a member of the MIT research group.


252 C. Chomsky

Since my next trip will be in the summer, I hope we can find some free time to go surfboarding to see if I can handle the surfboard easier than waterskis, then maybe try the skis later. Also I want to spend more time examining the train engine, with some old workclothes on, and I hope you can find a man who knows about all the many valves, and devices on the engine, so he can really explain them to me.

The writing is comparable to the writing of hearing individuals. Sentence structure is complex, including four levels of subordination: I hope (main verb), we can find... time (embedding 1), to go surfboarding (embedding 2), to see (embedding 3), if I can handle... (embedding 4). Sentence length averages 40.5 words, indicating a good command of the written language. There are no errors in grammar, spelling, or punctuation. This sample is typical of LD's productions.

In summary, LD has an excellent command of English. His spoken and written language are fluent, mature, and largely error-free, comparable to the speech and writing of literate, hearing individuals. His vocabulary compares favorably with norms for the hearing, and his syntax is above norms for the deaf. The tests show above average or average performance on all but one of the standardized tests for the hearing population, with a lack of understanding of the special character of Proverbs. He scores well above norms for the deaf on a syntax test for deaf subjects. On the Special-Purpose linguistic tests he performs well on about half of the structural tests. Specifically, he succeeds with Illicit Comparison and Conjunction, deletions which follow the general rules of English, and some contractions. He recognizes lexical ambiguity more readily than structural ambiguity. He is unable to provide tags for tag questions, to interpret the semantic effect of Article Switch, and to fill in deletions that are exceptions to general rules. The prosodic features of language present difficulty for him. He is unable to perceive the intonation pattern that signals yes/no questions. His perception of stress differences is variable. He both perceives and interprets stress differences in compound nouns, but with contrastive stress shows variable perception and no knowledge of effect on sentence interpretation.

Subject RB

Verbal Subtests from WAIS, WISC-R, Stanford-Binet (Table 1) RB's best performance on these tests was on the WAIS Similarities, where his score is above average for hearing individuals. His answers were direct and accurate: In what way are a dog and a lion alike? 'Both animals.' A table and a chair? 'Both pieces of furniture.' He readily answered by naming the category to which both items belong for eight of the pairs.

On the WAIS Vocabulary subtest his scaled score of 7 is just below the average range for hearing individuals. His WISC raw score is 32 out of a possible 64. His responses to the words that he knew were well-stated, for example, fabric: 'piece of material like cloth, made of cotton, silk'; repair: 'to fix, to get things into shape.'


On the Stanford-Binet Differences between Abstract Words and Essential Differences, the only difference he stated was between work and play: 'Work is doing the duty to do the job. In play, you don't do the duty, you're just having fun.' In all other cases, he did not know the meaning of one or both words, and merely defined the individual words he knew. Of the Abstract Words III, he gave a full definition only for independent: 'do anything you please without anybody stopping you.'

For Proverbs, RB was unable to give any generalizations, and merely gave literal restatements. We provided considerable prompting and explanations of the special character of proverbs, and modeled the answers we were seeking, but RB persisted with literal interpretations only.

Test of Syntactic Abilities (Table 2) RB's overall score on this test was 90 percent, well above the norm for deaf individuals. He scored 100 percent on both Question Formation and Verb Processes. He missed one item each on Negation, Conjunction, and Pronominalization, two items each on Determiners, Relativization, and Complementation, and three items on Nominalization.

Special-Purpose Linguistic Tests (Table 3) Structure. RB's judgments on the structural tests were mixed. On the Deletions measure, he filled in the missing information correctly for the regular constructions, and incorrectly for most of the exceptions. The one exception with which he was successful was Mary is easy to see. He answered correctly for a variety of adjectives in this sentence frame:

Mary is anxious to see. Who wants to see? 'Mary.'
Mary is hard to see. Who is having trouble seeing? 'The person who is trying to see Mary.'

He did well on the Article Switch test, stating the meaning difference accurately for three of the four sentence pairs. This test performance is of particular interest because RB often omits the article a when speaking. He uses it more regularly in his writing. He clearly understands its function in these test sentences.

On the Ambiguities (Sentences) test, RB easily detected the lexical ambiguities and explained all of them well. He had trouble with the structural ambiguities, detecting only two of the deep structure ambiguities, 'Flying planes can be dangerous' and 'The chicken is ready to eat', and none of the surface structure ones.

On the test of Ambiguity (Subject phrases), RB initially filled in only is or are. When asked if the other verb was possible, he answered yes and gave the two meanings in only one case.

He did well on the Illicit Comparison measure, explaining what was wrong with all but one of the sentences very accurately. For example, on hearing The movie was longer than her hair, he laughed and said, 'No good. The movie gives you the length of time. Girl's hair is a measurement. Difference between time and measurement.'


The results on the Illicit Conjunction test were indeterminate. RB interpreted the task as one of judging if the two events could occur together, rather than attending to the conjoined sentence and making a judgment about its form. His judgments about the two events were sensible and carefully made, but not related to the point at issue here.

He performed poorly on Tag Questions, failing to supply any tags correctly. The task was discontinued after four sentences, because it appeared pointless and RB lost interest very quickly.

On the Contractions test, RB expanded approximately one quarter of them correctly, including some whose form varies with sentence context such as he'd. He expanded it correctly to he had in the sentence 'I knew he'd finished his work by 5 o'clock.'

On the Phrase Analysis measure, RB correctly filled in is or are for all the sentences, assigning the correct internal structure to all the subject phrases.

Prosodies. RB did well on only one prosodic test, Compound Noun Stress. On the Compound Noun Stress test, RB responded correctly to three of the four pairs. He perceived the stress differences and correctly differentiated the meanings of hot DOG and HOT dog, green HOUSE and GREEN house, and white HOUSE and WHITE house. At this level of word and phrase meanings, he is successful in using stress cues to signal linguistic distinctions.

He failed to make any of the relevant distinctions on the Contrastive Stress: Pronoun Reference measure, and it was difficult to determine whether he actually perceived the stress difference in all cases.

On the Contrastive Stress: Focus of Negation test, he did perceive the stress placement on different words in the sentence. However, he recognized no associated difference in meaning or implication. After the function of stress in such cases was explained to him with examples, RB did understand and subsequently gave three correct answers on additional sentences.

He did not perceive the intonation differences on the Yes/no Question Intonation test. To test his pitch perception we tried singing high and low notes, and he was able to detect large differences in pitch. The intonation changes in the sentences, however, were apparently not large enough for him to perceive.

Developmental Sentence Scoring (Table 1) The DSS procedure was applied to a corpus of twenty-nine sentences, spoken by RB in conversation with the research group. RB's DSS score on the basis of the reduced corpus is 24.97, well above the 10.94 mean for age 6:6.

Here is a portion of RB's oral language sample:

When I am ready to go back to work, I am thinking to take retirement because my job is 15 miles away from home, and I do not have good transportation. My father has to take me to work and he is going to be 76 years old. And if I should be working another 18 years till retirement, will he be in perfect health for another 18 years?

This sample is typical of RB's productions. His sentence structure shows frequent use of subordination, for example, I am thinking (main verb), to take (embedding 1), because my job is... (embedding 2). His vocabulary is mature, for example, transportation, retirement. His speech nevertheless contains deviations from idiomatic usage, for example, I am thinking to take retirement, in place of 'I am thinking of retiring', and if I should be working another 18 years in place of the colloquial 'if I work another 18 years' or perhaps the somewhat more formal 'if I should work another 18 years'. This last example reflects the common difficulty with verb tense that many deaf speakers experience, a frequent problem for RB. He also is sporadic in his use of the third-person singular marker -s on verbs. Other speech samples contain examples of both the presence of -s and its absence, for example, when I type it come out in Morse Code, my brother goes to Montreal, maybe he knows something. Another occasional error in RB's speech is omission of the article a. Some examples are: So I get bit rusty, I have IBM-PC computer, Montreal is nice place, I was going to bring camera.... Again, he reflects errors common to many deaf speakers.

Written Language A sample of RB's writing is included to demonstrate another aspect of his language production. This is an excerpt from a letter typed by RB to a member of the MIT research group.

Friday morning, we had some programs at the auditorium at Perkins School. I had a surprise when they called my name to give a talk about M.I.T. on the stage. I told about the different tests like Tadoma methods and some words and etc. After the programs, we had picnic and games in the play ground. I met many more people, and talked with them. At 4:30, C. took J. and me to the air port. We had supper there. E. met me in Buffalo, and I am glad to see family and friends again.

Today I was reading Popular Mechanics and Consumer Report and slept for a while. Tonight, I experimented with the buzzer. This buzzer will be experiment for the electric braille writer. I hope the braille typewriter will be here tomorrow.

In his writing, RB uses the article a with more consistency than in his spoken language: I had a surprise, to give a talk, for a while, although occasional omissions occur: we had picnic, will be experiment. His writing is for the most part grammatical. Sentences tend to be brief, with longer sentences showing the use of subordination: I had a surprise (main verb), when they called (embedding 1), to give a talk (embedding 2). Spelling is good, with errors in this sample limited to word-juncture conventions in compound nouns: air port, play ground. Punctuation is appropriate. This sample is typical of RB's written language.

In summary, RB has a good command of English. His spoken language is mature and fluent, and his written language is competent. Both his speech and writing exhibit some features common to deaf speakers, such as lack of verb tense agreement and omission of the article a. His vocabulary is just below the average range for hearing speakers, and his syntax is well above norms for the deaf. He scores above average for hearing subjects on one standardized test (WAIS Similarities), and below average for hearing subjects on the other standardized tests. On the Special-Purpose linguistic tests, he performs well on half of the structural tests. Specifically, he succeeds with Article Switch and Illicit Comparison, deletions which follow the general rules of English, and some contractions. He recognizes lexical ambiguity more readily than structural ambiguity. He is unable to provide tags for tag questions, to detect Illicit Conjunction, and to fill in deletions that are exceptions to general rules. The prosodic features of language present difficulties for him. He is unable to perceive the intonation pattern that signals yes/no questions. His perception of stress differences is variable. He both perceives and interprets stress differences appropriately in compound nouns, but with contrastive stress shows variable perception and no prior knowledge of effect on sentence interpretation. Considering that RB communicates with others only half the time using speech through Tadoma (by his own report he uses sign language about half the time), his grasp of English is impressively solid.

Subject JC

JC's language performance and all responses to the tests are extremely high-level. She is unusually sophisticated linguistically and scores well above average for the hearing population. Of course JC's language was well established before she became deaf and blind at age seven, but the excellence she displays today clearly results from continued adequate and even rich exposure to language. She has advanced not only beyond a seven-year-old's linguistic ability, but outdistances the average hearing adult.

Verbal Subtests from WAIS, WISC-R, Stanford-Binet (Table 1) JC's WAIS Vocabulary and Similarities scaled scores are well above average, 17 and 16, respectively. She missed only two words on the WAIS Vocabulary subtest, giving excellent definitions throughout. For example, breakfast: 'the morning meal when one has broken the fast of the night'; matchless: 'peerless; nothing is as good as what that is; incomparable'; travesty: 'a mockery, usually something ugly; something that was beautiful made to look ugly and obscene.' Her raw score on the WISC was 61 out of 64. Her explanations in the WAIS Similarities were quite sophisticated, for example, dog/lion: 'They are both animals. I could have said both are quadrupedal animals'; eye/ear: 'They are both used for receiving sensory impulses. They are both senses.'

All of JC's answers on the Stanford-Binet tests were correct. Some examples are: difference between poverty/misery: 'Poverty refers to not having earthly goods; misery refers to pain and agony'; definition of generosity: 'Noun that refers to being generous, open-handed, very giving, unselfish, very liberal.'


She interpreted the Proverbs appropriately. For 'We only know the worth of the water when the well is dry', she responded: 'We don't appreciate what we have until we no longer have it. Or, we don't think about our blessings until we lose them.' She was even able to construct a generalization for a proverb with which she was not familiar. For 'Large oaks from little acorns grow', she said: 'Well, I'm not sure, but it could mean that you might have to start from the bottom but you could build up into great strength. Like Tadoma—I started with single sounds first and built up finally into sentences.'

Test of Syntactic Abilities (Table 2) JC scored 99 percent on this test, missing only one item in Verb Processes. When asked this particular question a second time, she answered correctly. Her near-perfect performance indicates full command of the syntactic structures on this test.

Special-Purpose Linguistic Tests (Table 3) JC's record on the Special-Purpose linguistic tests is elegant and simple. She achieved a perfect score on every one of these tests, with the exception only of Yes/no Question Intonation under Prosodies. She completed all the structural tests and all the other prosodic tests with 100 percent accuracy, enjoying the type of thought they engendered and stopping to discuss meanings and words throughout the test. There is no need to discuss her scores individually as they were all 100 percent, but examples of her responses are provided.

Structure. JC's judgments on the structural tests showed considerable depth of understanding.

On the Deletions test, she handled the consistent examples and the exceptions with equal facility. Her explanations were excellent. For example, to 'I told him what to eat. Who is to eat?' she responded: 'Whoever the "him" is. The boy is to eat.' For 'I asked him what to eat. Who is to eat?' she said: 'The person speaking. The "I".'

On the Article Switch test, her responses captured the meaning differences precisely. For sentence pair 3, the one about the puppy in the pet shop, she said '(a) The implication is that Maggie looked at a particular puppy, but decided not to buy any puppy at all. (b) A puppy, one puppy. She decided against buying this particular puppy, but she might buy another one.'

On the Ambiguities (Sentences) test, she gave excellent paraphrases of the two meanings of the sentences, handling lexical, deep structure and surface structure ambiguity all with equal ease. For example, for 'Flying planes can be dangerous' she said 'A plane flying in the air can be a dangerous object. Flying planes yourself can be dangerous.' For 'I know a taller man than Bill': 'Well, it might mean—I know a man who is taller than Bill. Or, I know a man taller than the one Bill knows who is tall.' For 'Dick finally decided on the boat': 'He decided to take the boat, or he made his decision while on the boat.'


On the Ambiguity (Subject phrases) test, JC recognized the ambiguity for all the sentences. She supplied both is and are for each one, unprompted, and explained the meaning differences accurately.

She analyzed the nature of the problem with the Illicit Comparison sentences in a very sophisticated fashion. In each case she stated the two senses of the adjective, describing the difference with precision. For example, 'The movie was longer than her hair': 'Not a good comparison. Hair is long in terms of inches. A movie is long in terms of time or hours.' 'This math problem is not as hard as that rock': 'Wrong comparison as before. A math problem is difficult to work out. A rock is hard in terms of solidity, not in terms of working out.'

On the Illicit Conjunction test, she gave good explanations why the sentences were unacceptable. For example, 'Bill called John a fool and Susan up': 'No, it doesn't fit. These two don't go together. It sounds like he called Susan the word "up". It should mean he called her up on the telephone.'

On the Tag Questions test, JC finished the list easily, answering quickly and with certainty. Her response to the final sentence was exceptional: 'The one who robbed the bank was John, ___?' She supplied, 'wasn't it? You had me trapped there. I wasn't sure. I almost said "Is, that not so?"—"n'est-ce pas?" or in German "nicht wahr?"'

She expanded all Contractions readily, 100 percent correctly. On the Phrase Analysis measure, she supplied is or are correctly for all the sentences, interpreting the internal structure of the subject phrase appropriately.

Prosodies. JC succeeded on all prosodic tests except for Yes/no Question Intonation. On the Compound Noun Stress test, JC finished the test pairs successfully, and we gave her additional items on which she also was successful, for example, BLACKbird: 'species of bird' vs black BIRD: 'any bird black in color'; and FRENCH teacher: 'teacher who teaches the language French' vs French TEACHER: 'teacher herself is French'. Clearly she both perceived the stress differences and interpreted them correctly.

She completed both the Contrastive Stress: Pronoun Reference and Focus of Negation tests easily and correctly. She exhibited no difficulty in perceiving the stress, and in recognizing its use for this type of linguistic contrast and emphasis.

Her replies in the Focus of Negation test were as follows:

JOHN didn't sell Bill the car. 'A car was sold to Bill but it wasn't John who sold it.'
John didn't SELL Bill the car. 'John may have loaned Bill the car.'
John didn't sell BILL the car. 'John sold a car all right, but not to Bill.'
John didn't sell Bill the CAR. 'John sold Bill something, but not a car.'

The Yes/no Question Intonation test was the one test on which JC performed poorly. She did not perceive the rising/falling intonation differences, and classified most of the sentences as statements. It is of interest that the speaker took care to vary only the intonation while he spoke the sentences, holding his body still. In discussion afterward, JC explained, 'I don't believe I can do it on the basis of inflection. I go by the movements of the head. In real life if someone was asking me something like that they would say, "You missed the first part of the movie?"' (accompanied by drawing her body up and back, and spoken with neutral intonation).

Developmental Sentence Scoring (Table 1) The DSS procedure was applied to a sample of thirty-six sentences that JC spoke during a conversation with the MIT research group. JC's DSS score on the basis of this corpus is 20.67, well above the 10.94 mean for age 6:6.

A portion of JC's oral language sample is provided below:

E: What is the story about how you and Judy got together?

JC: Well, Judy was teaching a class at American River College. A night class. It's on interpreting. She wanted me to give a talk to the class, about interpreting for the deaf-blind. Well, she drove me home with her since it was at night, a night class. We had dinner, then started out. She has a big camper-car, and what did it do but just politely stop in the middle of the street at a stop sign and refused to go. So we sat there. She tried to signal somebody, to ask them to call her garage. Her gestures were unnoticed. But finally some young man passing stopped. He was on his way to college, so he phoned the garage for her, then he came back and waited. Nothing happened. Finally he went on to college to let everybody know, when we weren't there, what had happened. And then he came back again. Later a group of the students and some of the parents came out to rescue us. We never got to the class that night.

As can be determined by this sample, JC's spoken language is of high quality, comparable to the speech of hearing individuals. It is fluent, idiomatic, and conversational. Note the use of the interesting phrase and what did it do but just politely stop. This sample is typical of JC's oral language in its naturalness and freedom from error.

Written Language A sample of JC's writing is included to demonstrate her writing ability. This is an excerpt from a letter typed by JC to a member of the MIT research group.

Thank you enormously for all you did for me at MIT, and for being such an overwhelmingly nice person. And great thanks to N. also. You two together treated R. and me just about like royalty. We certainly did appreciate everything you did for us.

Now about that Abraham Lincoln robot in Disneyland. I didn't think of it until we were on the plane headed for home, and, when I did, I wondered if you or N. knew about it. Someone told me that when the robot recites the Gettysburg address the lips move so distinctively deaf persons can read them. I asked my boss J. about this; she said it didn't seem to her the lip-movement was all that distinctive. So I think you are right: it has more to do with lighting and sound than with any motion of the lips.

The writing is sophisticated and fluent. Sentence structure is complex, including several instances of two-level deep subordination: Someone told me (main verb), the lips move (embedding 1), deaf persons can read them (embedding 2); she said (main verb), it didn't seem (embedding 1), the lip-movement was... (embedding 2). Vocabulary is advanced, for example, overwhelmingly, distinctively. Grammar, spelling, and punctuation are error-free. This sample is typical of the writing in JC's letters. She also writes short stories, for both children and adults.

In summary, JC's knowledge of English is extraordinarily advanced. Indeed, her abilities exceed those of the average hearing adult. Her spoken and written language are mature, fluent, and error-free. She scored well above average for the hearing adult on all the standardized tests, and performed almost perfectly on the syntax test for the deaf. She achieved a perfect score on all the Special-Purpose tests, with the exception only of the Yes/no Question Test in which she did not perceive intonation differences. JC has the linguistic command of a highly literate and sophisticated adult, and we may assume she has reached her full potential in language.

Discussion

This section summarizes the results for the language areas studied: vocabulary, syntax, prosodies, and spoken and written language.

Vocabulary

The vocabulary skills of the three subjects were good, and compare favorably with hearing individuals. On standardized tests for the hearing (WAIS Vocabulary), JC scored well above average and LD scored at the high end of the average range; RB scored just below the average range. Their definitions were of high quality and contained many details. For example, LD defined diamond as 'a stone that comes out of the ground; very hard stone, for rings and machine tools' and JC gave this definition for espionage: 'undercover work which includes spying and destruction in enemy countries'. The high level was maintained throughout. On the WAIS Similarities test, the performance of all three equaled or exceeded the hearing standard.

Examples of complex vocabulary occurred in the spontaneous conversation of the subjects as well, in many cases where a simple word would have done. For example, note the use of varieties, intricate, and opportunity in LD's conversation: 'They transferred me to another department so I can do many varieties of work.' 'I do many different kinds of jobs such as making intricate wires for the tail lights...' 'So I never got the opportunity to practice with the hearing aid.'

On the other standardized tests drawn from Stanford-Binet, performance for two of the subjects was excellent. JC succeeded even with Proverbs, which were out of range for the other subjects. Knowledge of the special character of proverbs appears to require exposure or perhaps instruction beyond what has been available to LD and RB. We do not have information about the source of JC's knowledge, but it was clear she understood the principle of proverb interpretation, as she was able to offer a correct generalization even for a proverb with which she was unfamiliar.


Syntax

The syntactic abilities of all three subjects were excellent in comparison with a deaf population. Their scores on a syntax test normed on a deaf population (Quigley et al. 1978) were well above the norms for deaf speakers. In addition, the Special-Purpose linguistic tests indicated many areas of syntactic competence. In JC's case, there was complete command of all structures tested, and a high degree of metalinguistic skill as well. The other two subjects showed a good command of general syntax, with areas of deficit limited to particular details of English.

The deficits are evident in the following areas on the Special-Purpose linguistic tests. LD and RB, for example, interpreted syntactic constructions according to the general rules of English and often failed to take account of particular exceptions. An example is LD's processing of the verb promise as a regular verb, rather than as an exception, on the Deletions test. This processing according to general rules and lack of familiarity with exceptions is consistent with reduced language exposure imposed by deafness and with early stages in child language development as well. A strength of the language learner is the ability to construct implicit rules on the basis of a few examples, and then to use these rules widely, extending them to related constructions and new vocabulary. Specific exceptions must be learned one by one. Until each one is learned, the language user assumes the general rules apply. This was the case with LD and RB on the Deletions test for words whose exceptional status was unfamiliar to them. Their answers revealed knowledge of the basic rules, and a lack with regard to specific details. They simply used the regular rules too widely, LD for all the exceptions and RB for all but one set: the adjectives easy and hard in Mary is easy/hard to see. In this one case, RB was able to interpret the structure with easy/hard as an exception, recognizing that Mary is the object, rather than the subject, of the verb see.

Another syntactic difficulty that LD and RB exhibited was with Tag Questions. Neither one was able to supply correct tags on this test. The form of a tag in English is complex and constrained by the form of the sentence to which it is added. Because of their complexity, tags are typically learned fairly late by children and pose problems for foreigners learning English. They are an unusual construction, peripheral to the basic structure of the language. Other languages, for example, have only one form for tags (cf. German nicht wahr and French n'est-ce pas) rather than variable tags of complex form, and English itself offers the option of using the single word right? as in You ordered the roast beef, right? Non-mastery of tags affects a limited aspect of the language, not a basic structure.

The function of articles is far more basic to the language itself. English has both a definite and an indefinite article that occur with high frequency, and that interact in interesting and subtle ways as brought out in the Article Switch test. Incomplete command of the article system is a deeper problem than failure to accommodate particular exceptions or tag questions. What is interesting here is that LD performed poorly on the Article Switch test, although his use of articles in speech and writing was, so far as we could observe, flawless. This test uncovered a gap in his knowledge that was not apparent from observations of his productive language. By contrast, RB's speech and writing contain omissions of the article a, but his knowledge of the article system is complete enough to include the subtleties measured in the test. These are interesting examples of the distinction between linguistic competence and linguistic performance. Language production does not always reflect what speakers know about their language, and spoken language may be an inaccurate indicator of underlying knowledge. It is often possible, as in this case, to learn more about specific areas of competence by probing comprehension than by analyzing spoken language samples.

Ambiguity detection is another domain in which both LD and RB exhibited reduced performance. Ambiguity detection differs from the other tests of syntax in that it relies more heavily on linguistic awareness. Subjects may not succeed in detecting an ambiguity, that is, notice a second meaning for a sentence on their own, although they can recognize and confirm (or reject) a second meaning when it is suggested. Our testing examined detection ability, a metalinguistic skill. Both LD and RB easily detected lexical ambiguity, and had less success with structural ambiguity. This accords with the developmental picture in children, in whom the ability to detect structural ambiguity develops considerably later than detection of lexical ambiguity. Recognizing structural ambiguity appears to require greater metalinguistic skill, which the two subjects have not achieved.

In sum, the syntactic deficits of the two subjects tend to be limited to marginal aspects of English, with the basic syntax of the language largely in place. Syntactic knowledge exceeds that of most deaf persons for all three subjects, and in JC's case equals that of highly sophisticated hearing speakers. JC's superior language skills, of course, may well reflect the fact that her exposure to language, before loss of sight and hearing, was considerably longer than that of the other subjects.

Prosodies

These tests required the subjects to use suprasegmental aspects of the speech signal (stress and intonation) to make lexical and syntactic interpretations. Two separate questions were under investigation here. One, could the subjects perceive, with Tadoma, the physical differences in stress and intonation that the examiner pronounced? And, two, if the differences were perceived, could the subjects use this information to make correct syntactic interpretations?

Recall that in all the other tests, perception was not under examination. In the other tests attempts were made to overcome any limitations of Tadoma perception by providing Braille copies of the tests, and discussing the wording of the tests with the subjects to make sure they understood the questions. In these prosodic tests perception itself was examined, along with knowledge of the linguistic role of the suprasegmental features.


Epilogue: The Tadoma Method 263

All three subjects experienced difficulty with the prosodic tests. The one test they all did well on was the Compound Noun test. All three were able to distinguish compound nouns like GREENhouse from the phrase green HOUSE, reporting the meanings correctly. Clearly they perceived the stress difference, and understood the linguistic function it serves in distinguishing compound nouns from adjective-noun sequences.

JC was the only subject to succeed with the Contrastive Stress tests. In both Focus of Negation and Pronoun Reference she made the correct interpretations, clearly perceiving the variations in stress and understanding their function. LD and RB performed poorly on both Contrastive Stress tests. Although they reported perceiving the stress variations in the Focus of Negation sentences, they did not recognize any meaning differences associated with the stress differences. On the Pronoun Reference test LD did not perceive the stress differences, and results for RB were indeterminate.

It is of interest to consider the linguistic distinction in stress processes that LD and RB have and have not mastered. As noted, they succeeded with Compound Noun Stress and did not succeed on Contrastive Stress. The compound noun/adjective-noun phrase distinction with which they had no trouble is a stress-related syntactic and lexical process that is basic to the language. The contrastive case that they failed to interpret, although they perceived the stress, uses stress for emphasis and contrastive purposes. It appears that the basic processes are known, and it is the more peripheral processes (such as emphasis and contrast) which are missing. Once again, as was the case in syntax, LD and RB succeeded with structures that are general and regular in the language, and had trouble with the exceptional constructions.

Intonation posed a problem for all three subjects. None of them perceived the differences in rising/falling intonation on the Yes/no Question test. This failure to perceive intonation differences is consistent with the relatively poor ability of humans to discriminate frequency changes in tactile stimulation as documented by Rothenberg, Verrillo, Zahorian, Brachman, and Bolanowski (1977). Given that intonation was not available to be interpreted as a cue in such constructions, it is hardly surprising that the subjects were unable to succeed on this test.

Oral and Written Language

The oral language of the three subjects was fluent and mature. In the case of JC and LD it is comparable to the language of hearing individuals. RB's spoken language contained some features common to deaf speakers, such as lack of verb tense agreement and article omission.

The tentative DSS scores for all three subjects placed them well above the six-year-old level. All three scored above 20, in contrast to the mean of 10.94 for the six-year-old norming group.


264 C. Chomsky

The written language of all three subjects was fluent and grammatical. They did their own typing, and showed mastery of the mechanics of spelling and punctuation. They all used subordination in their writing.

General Summary

The three Tadoma users have a command of English that exceeds that of many deaf persons, and in many areas compares favorably with hearing speakers. Their backgrounds differ, and they provide evidence in different ways that spoken language can be learned effectively through an unorthodox sensory route. Touch might seem unlikely as a candidate for transmission of spoken language, but we see that it may function successfully for learning language both from the early stages, as with LD and RB who were deafened in infancy, and at the advanced level, as with JC who was deafened after language was well established. Even when a deaf-blind individual divides his/her already limited linguistic exposure between English and ASL, as in RB's case, spoken language can still develop to a high degree.

In JC's case, we are interested in aspects of language that she has learned since age seven. Her situation might be viewed as less dramatic than LD's and RB's, as in her case considerable vocabulary and a major portion of English grammar were already known. JC might have managed linguistically had she not continued to develop her language past age seven, but merely maintained what she knew at that time. Preservation of the status quo would have permitted communication with others, and, from the point of view of this study, provided satisfactory evidence that speech can indeed be perceived and language preserved through touch alone. JC, however, did not remain at the linguistic level of a seven-year-old. She has made normal progress into mature language. Her language today is in fact not only normal but extraordinarily advanced. As noted earlier, she has the linguistic command of a highly literate and sophisticated adult. With the exception of intonation, which is not available through Tadoma, all the linguistic details are in place.

JC is impressive not only in her high test scores, but also in the detailed accuracy and linguistic finesse with which she handled the questions. She has an extremely well-developed sense of language, a high degree of metalinguistic awareness, and an analytic ability with language that amazed us. Her responses were the sort that might have come from a graduate student in linguistics.

All of these results attest to JC's ability to progress with language fully and normally, indeed to a level well above average, with recourse only to touch as a source of input. It would seem that the tactile sense has enabled her to reach full potential in language, with mastery of all detail that we were able to test. Whereas with our other two subjects we observe Tadoma fostering language development from an initial (or very early) stage to a fairly advanced level but one that lacks various linguistic details, in JC's case Tadoma has supported extensive elaboration. The information is clearly available, at the early and at the most advanced levels, even through the ill-suited tactile sense.


Conclusion

This study demonstrates that the skin is able to transmit information about speech that is rich enough to permit the development of language. With our subjects the eye and the ear have been bypassed successfully and the necessary information delivered to the brain through touch alone. The three subjects also demonstrate that speech can be successfully processed online without sight or hearing, and that the tactile sense can suffice not only for perception of spoken language but also for learning to produce speech.

Further, our observations reveal a relatively minor effect on language achievement of severe restriction on amount and range of language input. Exposure to language is drastically reduced for the deaf-blind, whose world is eighteen inches away, arm's length. Language input is available only when a conversational partner is literally within reach, and from reading Braille. Nevertheless, on the basis of even such limited linguistic evidence, for these three subjects virtually normal language is established, and the areas of deficit are few.

We note certain conditions that are present for our subjects, and we are left to wonder which of them are critical to the success of their endeavor. With regard to background, our subjects are not multiply handicapped, but only sensorially deprived. Brain function is normal so far as we know. In all three cases, mental development was normal up to the time of illness, and language was developing normally.

With regard to training, various factors may contribute to the success of Tadoma. First, the subjects received many years of one-on-one training in this method from devoted teachers. Second, the nature of the Tadoma display (a talking face) is such that multidimensional access to information about the speech signal (including vibration, air flow, and lip and jaw movements) is provided. Third, Tadoma combines learning to produce speech with learning to perceive it. Finally, the use of the hand in Tadoma may provide a significant reception advantage over systems that employ other body sites.

Individual qualities may also play a role, and it should be recognized that our three subjects may not be typical of the deaf-blind population as a whole. They certainly do not represent individuals who were congenitally impaired. Personal aptitude and characteristics such as inquisitiveness and drive may be important factors in a person's ability to learn and use spoken language with a system such as Tadoma. This unorthodox sensory route to language, though available to some individuals, may not be equally accessible to all.

We simply do not know which of these factors, or what others that we have not considered, are critical to the success of our subjects. As observers of one of nature's experiments, we can only examine the outcome and speculate about conditions. What is clear, however, is that language is established under conditions of extreme stimulus poverty. The human language faculty is clearly adequate to the task of constructing a rich linguistic system even under the unusual conditions of an impoverished stimulus delivered through an unlikely channel.

Acknowledgments

The work reported here was carried out in the Sensory Communication Group of the Research Laboratory of Electronics at the Massachusetts Institute of Technology. We are indebted to the members of the research group for their close collaboration throughout all phases of the research. In particular we are grateful to Charlotte Reed for many discussions of the work and for her extensive help in the preparation of this report.

This work was supported by the National Institutes of Health (Grant No. 1 R01

Appendix

Special-Purpose Linguistic Tests

The Special-Purpose tests administered to the subjects are presented in full here. Items in these tests were drawn from a variety of sources in the linguistic literature. Among the sources are Fromkin and Rodman (1983) and Akmajian and Heny (1975).

An asterisk (*) preceding a sentence indicates an ungrammatical sentence.

Structure

Report on Sentence Meaning

(a) Deletions. Identify missing information.

1. Mary encouraged John to apply for the job. Who is to apply?
2. Mary was encouraged by John to apply for the job. Who is to apply?
3. John is eager to see. Who is doing the seeing?
4. John is easy to see. Who is doing the seeing?
5. John told Susan to wash the dishes. Who is to wash the dishes?
6. John promised Susan to wash the dishes. Who is to wash the dishes?
7. I told him what to eat. Who is going to eat?
8. I asked him what to eat. Who is going to eat?

(b) Article switch. Describe the difference in meaning between two sentences which differ in placement of a and the.

1a. I bumped into a man on Maple Street, and when I turned around to apologize, the man ran away.

b. I bumped into the man on Maple Street, and when I turned around to apologize, a man ran away.


2a. I didn't mind killing the chicken, but I didn't enjoy eating a chicken afterwards.

b. I didn't mind killing a chicken, but I didn't enjoy eating the chicken afterwards.

3a. Maggie looked at the puppy at Peter's Pet Shop, but later she decided not to buy a puppy.

b. Maggie looked at a puppy at Peter's Pet Shop, but later she decided not to buy the puppy.

4a. The police saw the robber on Main St., and shot a man on Walnut St.
b. The police saw a robber on Main St., and shot the man on Walnut St.

(c) Ambiguity. Sentences. Give two meanings for these sentences.

1. Is he really that kind?
2. The long drill was boring.
3. They fed her dog biscuits.
4. Leonard finally decided on the boat.
5. She hit the man with the glasses.
6. He bought the picture in her living room.
7. Congress passed a dangerous drug bill.
8. He kept the car in the garage.
9. They are moving sidewalks.
10. They are biting dogs.
11. Flying planes can be dangerous.
12. The chicken is ready to eat.
13. The shooting of the hunters was terrible.
14. I know a taller man than Bill.

Subject phrases. Fill in is or are, and give two meanings according to the ambiguous subject phrase.

1. Flying planes ___ dangerous.
2. Moving sidewalks ___ dangerous.
3. Exploding firecrackers ___ illegal.
4. Biting dogs ___ a nuisance.
5. Speeding cars ___ deadly.

Account for Ungrammaticality

(a) Illicit comparison. Tell what is wrong in sentences in which two items (though both long, for example) may not be compared.

1. *The movie was longer than her hair.
2. *This math problem is not as hard as that rock.


3. *John is as sad as the movie I saw last week.
4. *Hydrogen is lighter than the blue she painted her room.
5. *Red velvet is softer than her voice.

(b) Illicit conjunction. (6 sentence pairs: 3 acceptable, 3 unacceptable) Decide if two sentences may be legitimately conjoined. If not, explain.

1. John was looking for a hat.
   John was looking for a pair of gloves.
   John was looking for a hat and a pair of gloves.

2. Mary read an interesting book.
   Mary read a fascinating magazine.
   Mary read an interesting book and a fascinating magazine.

3. John walked along the crowded street.
   John walked down the steep steps.
   John walked along the crowded street and down the steep steps.

4. The station wagon looked like a good buy.
   The station wagon looked like a truck.
   *The station wagon looked like a good buy and a truck.

5. Peter took his sweater off.
   Peter took his time.
   *Peter took his sweater off and his time.

6. Bill called John a fool.
   Bill called Susan up.
   *Bill called John a fool and Susan up.

Produce Structure Dependent Forms

(a) Tag questions. Place an appropriate tag at the end of each statement to turn it into a question.

1. John is an engineer, ___?
2. You and Bill have been here since 6 o'clock, ___?
3. You aren't certain of what you think, ___?
4. Bill and I don't always agree, ___?
5. These points, the chairman will take up later, ___?
6. Mary shouldn't see him alone, ___?
7. They could have been going, ___?
8. There were three men in the park, ___?
9. In the park were three men, ___?
10. Three men were in the park, ___?
11. For you to do that would be crazy, ___?
12. What I just said bothered you, ___?
13. I bet Mary won't leave today, ___?
14. I expect John won't sing the songs, ___?


15. I don't expect John will sing the songs, ___?
16. John is the one who robbed the bank, ___?
17. It was John who robbed the bank, ___?
18. The one who robbed the bank was John, ___?

(b) Contractions. Give the full form of the contracted item.

1. What's he been doing all day?
2. He could've tried harder.
3. What's in that box on the table?
4. He won't do that again.
5. You'll never agree with me.
6. What's he want that book for?
7. He'd never been here before today.
8. I knew you'd be good at this.
9. I should've said no.
10. I knew he'd finished his work by 5 o'clock.
11. I know he's been here before.

(c) Phrase analysis. Fill in is or are according to the internal structure of the subject phrase containing Verb + -ing.

is:

1. Washing dishes ___ dull.
2. Raising flowers ___ fun.
3. Knitting sweaters ___ satisfying.
4. Painting pictures ___ hard.
5. Writing letters ___ interesting.

are:

1. Sleeping children ___ beautiful.
2. Dancing bears ___ amusing.
3. Growling lions ___ frightening.
4. Swimming ducks ___ pleasant.

Prosodies

These tests examine the subject's knowledge of and ability to utilize intonation and stress cues to meaning in phrases and sentences.

(a) Compound noun stress. Distinguish the meaning of compound nouns and adjective-noun sequences.

1. Look at that HOT dog/hot DOG on the front steps.
2. He stopped to look at the GREENhouse/green HOUSE on the corner.
3. Who lives in the WHITE House/white HOUSE?
4. There are three BLACKboard erasers/black BOARD erasers in that box.


(b) Contrastive stress: Pronoun reference. Identify pronoun reference with normal and contrastive stress.

1. Peter kicked Bill, and then I kicked 'im. Who did I kick?
2. Peter kicked Bill, and then I kicked him. Who did I kick?
3. Peter kicked Bill, and then 'e kicked Mary. Who kicked Mary?
4. Peter kicked Bill, and then he kicked Mary. Who kicked Mary?
5. Peter kicked Bill, and then 'e hit 'im. Who hit who?
6. Peter kicked Bill, and then he hit him. Who hit who?

Contrastive stress: Focus of negation. Identify the difference in implication as different words are stressed.

1. JOHN didn't sell Bill the car.
2. John didn't SELL Bill the car.
3. John didn't sell BILL the car.
4. John didn't sell Bill the CAR.

(c) Yes/no question intonation. Differentiate statements from questions on the basis of intonation.

Statements (falling intonation)

1. They don't know the girl's last name.
2. It snowed again on Thursday.
3. The children are asleep already.

Questions (rising intonation)

1. She tore her sweater in the fight?
2. He hurt himself this morning?
3. He'll be here at nine tomorrow?


References

Abimbola, I. (1988). The problem of terminology in the study of student conceptions in science. Science Education, 72/2 (April): 175-84.

Adani, F., Lely, H. K. J. van der, Forgiarini, M., and Guasti, M. T. (2010). Grammatical feature dissimilarities make relative clauses easier: A comprehension study with Italian children. Lingua, 120: 2148-66.

Adestine, G. van (1932). An evaluation of the Tadoma method. Volta Review, 34: 199.

Aduriz, I., Aranzabe, M. J., Arriola, J. M., Atutxa, A., Diaz de Ilarraza, A., Ezeiza, N., et al. (2006). Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing, in A. Wilson, P. Rayson, and D. Archer (eds), Corpus Linguistics Around the World: Language and Computers, 56. The Netherlands: Rodopi, 1-15.

Akmajian, A. and Heny, F. (1975). An Introduction to the Principles of Transformational Syntax. Cambridge, MA: MIT Press.

Alcorn, S. (1932). The Tadoma method. Volta Review, 34: 195-8.

Aldridge, E. (2008). Generative approaches to ergativity. Language and Linguistics Compass, 2/5: 966-95.

Alexiadou, A. and Anagnostopoulou, E. (2007). The Subject-in-Situ Generalization revisited, in U. Sauerland and H.-M. Gärtner (eds), Interfaces + Recursion = Language? Berlin: Mouton de Gruyter, 31-60.

Ambridge, B., Rowland, C. F., and Pine, J. (2008). Is structure dependence an innate constraint? New experimental evidence from children's complex-question production. Cognitive Science, 32: 184-221.

Ambridge, B., Rowland, C. F., Theakston, A., and Tomasello, M. (2006). Comparing different accounts of non-inversion errors in children's non-subject wh-questions: What experimental data can tell us? Journal of Child Language, 30: 519-57.

Artiagoitia, X. (2002). The functional structure of the Basque noun phrase, in X. Artiagoitia, P. Goenaga, and J. A. Lakarra (eds), Erramu Boneta: Festschrift for Rudolf P. G. de Rijk. Bilbao: Anex to ASJU, 73-90.

Baayen, H., Burani, C., and Schreuder, R. (1997). Effects of semantic markedness in the processing of regular nominal singulars and plurals in Italian, in G. Booij and J. van Marle (eds), Yearbook of Morphology 1996. Dordrecht: Kluwer Academic Publishers, 13-33.

Badecker, W. and Kuminiak, F. (2007). Morphology, agreement, and working memory retrieval in sentence production: Evidence from gender and case in Slovak. Journal of Memory and Language, 56: 65-85.

Baker, M. C. (1988). Incorporation. Chicago: Chicago University Press.

Baker, M. C. (2001). The Atoms of Language: The Mind's Hidden Rules of Grammar. New York, NY: Basic Books.

Baker, M. C. (2003). Lexical Categories: Verbs, Nouns, and Adjectives. Cambridge: Cambridge University Press.


Balota, D. A., Yap, M. J., Cortese, M. J., Hutchison, K. A., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., and Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39: 445-59.

Bastiaanse, R. and Thompson, C. (2003). Verb and auxiliary movement in agrammatic Broca's aphasia. Brain and Language, 87: 286-305.

Bastiaanse, R. and van Zonneveld, R. (1998). On the relation between verb inflection and verb position in Dutch agrammatic aphasics. Brain and Language, 64: 165-81.

Bates, E., Benigni, L., Bretherton, I., Camaioni, L., and Volterra, V. (1977). From gesture to the first word: On cognitive and social prerequisites, in M. Lewis and L. Rosenblum (eds), Interaction, Conversation and the Development of Language. New York: John Wiley.

Bates, E., Friederici, A., and Wulfeck, B. (1987). Comprehension in aphasia: A cross-linguistic study. Brain and Language, 32/1: 19-68.

Bay, E. (1964). Aphasia and intelligence. International Journal of Neurology, 4/3: 251-64.

Becker, M. (2004). Learning verbs that lack argument structure: The case of raising verbs, in Jacqueline van Kampen and Sergio Baauw (eds), Proceedings of GALA 2003, LOT Occasional Series 3. Netherlands: Utrecht University, Graduate School of Linguistics.

Belletti, A. (2004). Aspects of the low IP area, in L. Rizzi (ed.), The Structure of CP and IP. Oxford: Oxford University Press, 16-51.

Belletti, A. (2009). Notes on passive object relatives, in P. Svenonius (ed.), Functional Structure from Top to Toe. Oxford: Oxford University Press.

Belletti, A. and Contemori, C. (2010). Intervention and attraction: On the production of Subject and Object relatives by Italian (young) children and adults, in J. Costa (ed.), Language Acquisition and Development: Proceedings of GALA 2009, University of Lisbon, September 2009. Cambridge: Cambridge Scholars Publishing, 505-18.

Belletti, A., Friedmann, N., Brunato, D., and Rizzi, L. (2010/submitted). Does gender make a difference? Comparing the effect of gender on children's comprehension of relative clauses in Hebrew and Italian. Ms. CISCL, University of Siena, Language and Brain Lab, School of Education, and Tel Aviv University.

Belletti, A. and Rizzi, L. (1988). Psych-verbs and θ-theory. Natural Language and Linguistic Theory, 6: 291-352.

Belletti, A. and Rizzi, L. (2009). Moving verbal chunks, in L. Brugè, A. Cardinaletti, G. Giusti, N. Munaro, and C. Poletto (eds), Functional Heads. Oxford: Oxford University Press.

Bellugi, U., Poizner, H., and Klima, E. (1989). Language modality and the brain. Trends in the Neurosciences, 12: 380-8.

Bender, B., Puck, M., Salbenblatt, J., and Robinson, A. (1986). Dyslexia in 47 XXY boys identified at birth. Behavior Genetics, 16/3: 343-54.

Bender, E., Sag, I., and Wasow, T. (2003). Syntactic Theory: A Formal Introduction. Stanford: CSLI Publications.

Ben-Shachar, M., Palti, D., and Grodzinsky, Y. (2004). Neural correlates of syntactic movement: Converging evidence from two fMRI experiments. NeuroImage, 21: 1320-36.

Bernstein, J. (2001). The DP hypothesis: Identifying clausal properties in the nominal domain, in M. Baltin and C. Collins (eds), Handbook of Contemporary Syntactic Theory. Malden, MA: Blackwell Publishers, 536-61.


Bernstein-Ratner, N. (1984). Patterns of vowel modification in motherese. Journal of Child Language, 11: 557-78.

Bertoncini, J. and Mehler, J. (1981). Syllables as units in infant perception. Infant Behavior and Development, 4: 271-84.

Berwick, R. C. (1985). The Acquisition of Syntactic Knowledge. Cambridge, MA: MIT Press.

Berwick, R. C. (2011). Syntax facit saltum redux: Biolinguistics and the leap to syntax, in A. M. Di Sciullo and C. Aguero (eds), Biolinguistic Investigations. Cambridge, MA: MIT Press, 65-99.

Berwick, R. C. and Chomsky, N. (2011). Biolinguistics: The current state of its evolution and development, in A. M. Di Sciullo and C. Boeckx (eds), Biolinguistic Investigations. Oxford: Oxford University Press, 19-41.

Berwick, R. C., Pietroski, P., Yankama, B., and Chomsky, N. (2011). Poverty of the Stimulus revisited. Cognitive Science, 35/7: 1207-42.

Berwick, R. C. and Weinberg, A. (1984). The Grammatical Basis of Linguistic Performance. Cambridge, MA: MIT Press.

Betancort, M., Carreiras, M., and Sturt, P. (2009). The processing of subject and object relative clauses in Spanish: An eye-tracking study. The Quarterly Journal of Experimental Psychology, 62/10: 1915-29.

Bever, T. G. (1970). The cognitive basis for linguistic structures, in R. Hayes (ed.), Cognition and Language Development. New York: Wiley & Sons, 279-362.

Bever, T. G. (1975). Cerebral asymmetries in humans are due to the differentiation of two incompatible processes: Holistic and analytic, in D. Aaronson and R. Rieber (eds), Developmental Psycholinguistics and Communication Disorders. New York: New York Academy of Sciences, 263, 76-86.

Bever, T. G. (1981). Normal acquisition processes explain the critical period for language learning, in K. C. Diller (ed.), Individual Differences and Universals in Language Learning Aptitude. Rowley, MA: Newbury House, 176-98.

Bever, T. G. (1987). The aesthetic basis for cognitive structures, in W. Brand and R. Harnish (eds), The Representation of Knowledge and Belief. Tucson: University of Arizona Press, 314-56.

Bever, T. G. (2009). All language comprehension is a psycholinguistic guessing game: Explaining the still small voice, in P. Anders (ed.), Issues in the Present and Future of Reading. London and New York: Routledge, 249-81.

Bever, T. G., Carrithers, C., Cowart, W., and Townsend, D. J. (1989). Language processing and familial handedness, in A. Galaburda (ed.), From Neurons to Reading. Cambridge, MA: MIT Press.

Bever, T. G., Carroll, J. M., and Miller, L. A. (eds) (1984). Introduction, in Talking Minds: The Study of Language in the Cognitive Sciences. Cambridge, MA: MIT Press.

Bever, T. G., Chan, S., Hancock, R., and Ryan, L. (in prep.). Only right-handers from left-handed families have bilateral representation for words, but all people have left-hemisphere dominance for syntactic processing.

Bever, T. G., Jandreau, S., Burwell, R., Kaplan, R., and Zaenen, A. (1990). Spacing printed text to isolate major phrases improves readability. Visible Language, 25: 74-87.

Biemiller, A. (2005). Size and sequence in vocabulary development: Implications for choosing words for primary grade vocabulary instruction, in A. Hiebert and M. Kamil (eds), Teaching and Learning Vocabulary: Bringing Research to Practice. Mahwah, NJ: Lawrence Erlbaum, 223-42.

Bishop, D. V. M., Adams, C. V, and Norbury, C. F. (2006). Distinct genetic influences ongrammar and phonological short-term memory deficits: Evidence from 6-year-old twins.Genes, Brain, and Behavior, 5/2: 158-69.

Bishop, S. J., Bright, P., James, C., Delaney, T., and Tallal, P. (19993). Different ori-gin of auditory and phonological processing problems in children with language impair-ment: Evidence from a twin study. Journal of Speech, Language, and Hearing Research, 42:155-68.

Carlyon, R. P., Deeks, J. M., and Bishop, S. J. (i999b). Auditory temporal processingimpairment: Neither necessary nor sufficient for causing language impairment in children.Journal of Speech, Language, and Hearing Research, 42: 1295-310.

North, T., and Donlan, C. (1996). Nonword repetition as a behavioural marker for inher-ited language impairment: Evidence from a twin study. Journal of Child Psychology andPsychiatry, 37: 391-403.

Bissex, G. L. (1980). GNYS AT WRK: A Child Learns to Write and Read. Cambridge, MA:Harvard University Press.

Bloom, L. (1970). Language Development: Form and Function in Emerging Grammars. Cam-bridge, MA: MIT Press.

Bloom, Paul (2002). How Children Learn the Meanings of Words. Cambridge, MA: MIT Press.

Bod, R. (2009). From exemplar to grammar: A probabilistic analogy-based model of language learning. Cognitive Science, 33/4: 752-93.

Boeckx, C. (2008). Bare Syntax. New York: Oxford University Press.

Boland, J. E. (1997). The relationship between syntactic and semantic processes in sentence comprehension. Language and Cognitive Processes, 12: 423-84.

Bonatti, L. L., Peña, M., Nespor, M., and Mehler, J. (2005). Linguistic constraints on statistical computations: The role of consonants and vowels in continuous speech processing. Psychological Science, 16: 451-9.

Boone, K., Swerdloff, R., Miller, B., Geschwind, D., Razani, J., Lee, A., Gaw, I., Gonzalo, I., Haddal, A., and Rankin, K. (2001). Neuropsychological profiles of adults with Klinefelter's syndrome. Journal of the International Neuropsychological Society, 7/4: 446-56.

Borer, H. and Wexler, K. (1987). The maturation of syntax, in T. Roeper and E. Williams (eds), Parameter Setting. Dordrecht: Reidel.

(1992). Bi-unique relations and the maturation of grammatical principles. Natural Language and Linguistic Theory, 10: 147-89.

Bornkessel-Schlesewsky, I. and Schlesewsky, M. (2008). An alternative perspective on 'semantic P600' effects in language comprehension. Brain Research Reviews, 59: 55-73.

(2009). The role of prominence information in the real-time comprehension of transitive constructions: A cross-linguistic approach. Language and Linguistics Compass, 3/1: 19-58.

Bortfeld, H., Morgan, J., Golinkoff, R., and Rathbun, K. (2005). Mommy and me: Familiar names help launch babies into speech stream segmentation. Psychological Science, 16: 298-304.


Bottini, G., Corcoran, R., Sterzi, R., Paulesu, E., Schenone, P., Scarpa, P., Frackowiak, R., and Frith, C. (1994). The role of the right hemisphere in the interpretation of figurative aspects of language: A positron emission tomography activation study. Brain, 117: 1241-53.

Bowerman, M. (1982). Reorganizational processes in lexical and syntactic development, in E. Wanner and L. R. Gleitman (eds), Language Acquisition: State of the Art. Cambridge: Cambridge University Press, 319-46.

Bowers, J. (1993). The syntax of predication. Linguistic Inquiry, 24/4: 591-656.

Bradley, D., Garrett, M. F., and Zurif, E. B. (1982). Syntactic deficits in Broca's aphasia, in D. Caplan (ed.), Biological Studies of Mental Processes. Cambridge, MA: MIT Press, 269-86.

Braine, Martin D. S. (1963). The ontogeny of English phrase structure: The first phase. Language, 39: 1-13.

(1966). Learning the positions of words relative to a marker element. Journal of Experimental Psychology, 72/4: 532-40.

Bricolo, E., Shallice, T., Priftis, K., and Meneghello, F. (2000). Selective space transformation deficit in a patient with spatial agnosia. Neurocase, 6/4: 307-19.

Brill, E. (1993). Automatic grammar induction and parsing free text: A transformation-based approach, in Proceedings of the 3rd Conference on Applied Natural Language. Stroudsburg, PA: Association for Computational Linguistics, 259-65.

(1995). Transformation-based error-driven learning and natural language processing: A case study in part of speech tagging. Computational Linguistics, 21: 543-65.

Bromberger, S. and Halle, M. (1989). Why phonology is different. Linguistic Inquiry, 20: 51-70.

Brown, H. (1972). Children's comprehension of relativized English sentences. Child Language, 11: 89-107.

Brown, J. (1977). Mind, Brain and Consciousness: The Neuropsychology of Cognition. New York: Academic Press.

Brown, R. (1973). A First Language: The Early Stages. Cambridge, MA: Harvard University Press.

and Bellugi, U. (1964). Three processes in the child's acquisition of syntax. Harvard Educational Review, 34: 133-51.

Bruandet, M., Molko, N., Cohen, L., and Dehaene, S. (2004). A cognitive characterization of dyscalculia in Turner syndrome. Neuropsychologia, 42/3: 288-98.

Brunellière, A., Franck, J., Ludwig, C., and Frauenfelder, U. H. (2007). Early and automatic syntactic processing of person agreement. Neuroreport, 18/6: 537-41.

Buchert, R., Thomasius, R., Wilke, F., Petersen, K., Nebeling, B., Obrocki, J., Schulze, O., and Schmidt, U. (2008). Sustained effects of ecstasy on the human brain: A prospective neuroimaging study in novel users. Brain, 131/11: 2936-45.

Burton, M. W., Small, S., and Blumstein, S. E. (2000). The role of segmentation in phonological processing: An fMRI investigation. Journal of Cognitive Neuroscience, 12: 679-90.

Burzio, L. (1986). Italian Syntax: A Government and Binding Approach. Dordrecht: Reidel.

Caplan, D., Vijayan, S., Kuperberg, G., West, C., Waters, G., Greve, D., and Dale, A. M. (2002). Vascular responses to syntactic processing: Event-related fMRI study of relative clauses. Human Brain Mapping, 15: 26-38.


Caramazza, A. (1988). Some aspects of language processing as revealed through the analysis of acquired aphasia: The lexical system. Annual Review of Neuroscience, 11: 395-421.

Carey, S. (1978). The child as word learner, in M. Halle, J. Bresnan, and G. A. Miller (eds), Linguistic Theory and Psychological Reality. Cambridge, MA: MIT Press, 264-93.

(1986). Cognitive science and science education. American Psychologist, 41/10: 1123-30.

(2009). The Origin of Concepts. New York: Oxford University Press.

Carreiras, M., Duñabeitia, J. A., Vergara, M., de la Cruz-Pavía, I., and Laka, I. (2010). Subject relative clauses are not universally easier to process: Evidence from Basque. Cognition, 115: 79-92.

Cartwright, T. A. and Brent, M. R. (1997). Syntactic categorization in early language acquisition: Formalizing the role of distributional analysis. Cognition, 63/2: 121-70.

Cassar, M. and Treiman, R. (1997). The beginnings of orthographic knowledge: Children's knowledge of double letters in words. Journal of Educational Psychology, 89: 631-44.

Champagne-Lavau, M. and Joanette, Y. (2009). Pragmatics, theory of mind and executive functions after a right-hemisphere lesion: Different patterns of deficits. Journal of Neurolinguistics, 22: 413-26.

Chang, F., Lieven, E., and Tomasello, M. (2006). Using child utterances to evaluate syntax acquisition algorithms. Proceedings of the 28th Annual Conference of the Cognitive Science Society. Vancouver, Canada, 154-9.

Chomsky, C. (1969). The Acquisition of Syntax in Children from 5 to 10. Cambridge, MA: MIT Press.

(1970). Reading, writing, and phonology. Harvard Educational Review, 40: 287-309.

(1971). Write first, read later. Childhood Education, 47: 296-300.

(1972a). Stages in language development and reading exposure. Harvard Educational Review, 42/1: 1-33.

(1972b). Write now, read later, in C. Cazden (ed.), Language in Early Childhood Education. Washington, DC: National Association for the Education of Young Children, 119-27. Reprint of Chomsky (1971) with additions.

(1975). How sister got into the grog. Early Years, 6/3 (November): 36-9, 78-9.

(1976a). After decoding, what? Language Arts, 53: 288-96.

(1976b). Invented spelling in the open classroom. Word, 27: 499-518. A special issue titled Child Language—1975, guest-ed. W. von Raffler-Engel. Milford, CT: International Linguistics Association.

(1976c). Approaching reading through invented spelling. Paper presented at the conference on Theory and Practice of Beginning Reading Instruction, University of Pittsburgh, Learning Research and Development Center, Pittsburgh, PA, May 1976.

(1978). When you still can't read in third grade: After decoding, what?, in S. J. Samuels (ed.), What Research Has to Say about Reading Instruction. Newark, DE: International Reading Association, 13-30.

(1979). Approaching reading through invented spelling, in L. B. Resnick and P. A. Weaver (eds), Theory and Practice of Early Reading, Vol. 2. Hillsdale, NJ: Lawrence Erlbaum Associates, 43-65.

(1980). Developing facility with language structure, in G. S. Pinnel (ed.), Discovering Language with Children. Urbana, IL: National Council of Teachers of English, 56-9.


(1981). Write now, read later, in C. Cazden (ed.), Language in Early Childhood Education, rev. edn. Washington, DC: National Association for the Education of Young Children, 141-9.

(1986a). Analytic study of the Tadoma method: Language abilities of three deaf-blind subjects. Journal of Speech and Hearing Research, 29/3: 332-47.

(1986b). Language abilities of three deaf-blind subjects. Journal of Speech and Hearing Research, 29: 332-47.

(1990). Writing before reading: Eighty years later. Paper presented at the American Montessori Society symposium, Montessori in the Contemporary American Culture, April 26, 1990, Arlington, VA.

Chomsky, N. (1955). The logical structure of linguistic theory. Manuscript, Harvard University. Excerpts published in 1975. New York: Plenum.

(1957). Syntactic Structures. London and The Hague: Mouton.

(1962). Current issues in linguistic theory, in M. Halle (ed.), Preprints of papers for the Ninth International Congress of Linguists, August 27-31, 1962, Cambridge, Mass. Cambridge, MA: Morris Halle, 509-74.

(1965). Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.

(1968). Language and Mind. New York: Harcourt, Brace, Jovanovich.

(1970). Remarks on nominalization, in R. A. Jacobs and P. S. Rosenbaum (eds), Readings in English Transformational Grammar. Waltham, MA: Ginn, 184-221.

(1971). Problems of Knowledge and Freedom. London: Fontana.

(1975). Reflections on Language. New York: Pantheon Books.

(1977). On wh-movement, in P. Culicover, T. Wasow, and A. Akmajian (eds), Formal Syntax. New York: Academic Press, 77-132.

(1980). On cognitive structures and their development, in M. Piattelli-Palmarini (ed.), Language and Learning: The Debate between Jean Piaget and Noam Chomsky. Cambridge, MA: Harvard University Press.

(1981). Lectures on Government and Binding: The Pisa Lectures. Dordrecht: Foris Publications.

(1982). Rules and Representations. New York: Columbia University Press.

(1988). Language and Problems of Knowledge: The Managua Lectures. Cambridge, MA: MIT Press.

(1995). The Minimalist Program. Cambridge, MA: MIT Press.

(2000). Minimalist inquiries: The framework, in R. Martin, D. Michaels, and J. Uriagereka (eds), Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik. Cambridge, MA: MIT Press, 89-155.

(2001). Derivation by phase, in M. Kenstowicz (ed.), Ken Hale: A Life in Language. Cambridge, MA: MIT Press, 1-52.

(2007a). Biolinguistic explorations. International Journal of Philosophical Studies, 15/1 (March): 1-21.

(2007b). Approaching UG from below, in U. Sauerland and H.-M. Gärtner (eds), Interfaces + Recursion = Language? Berlin: Mouton de Gruyter.

(2008). On phases, in R. Freidin, C. P. Otero, and M. L. Zubizarreta (eds), Foundational Issues in Linguistic Theory: Essays in Honor of Jean-Roger Vergnaud. Cambridge, MA: MIT Press, 133-66.


Chomsky, N. (2009). Opening remarks, in M. Piattelli-Palmarini, J. Uriagereka, and P. Salaburu (eds), Of Minds and Language: A Dialogue with Noam Chomsky in the Basque Country. Oxford: Oxford University Press, 13-43.

(2012, forthcoming). Foreword, in A. Gallego (ed.), Phases: Developing the Framework. Berlin: Mouton de Gruyter, 1-7.

and Halle, M. (1968). The Sound Pattern of English. New York: Harper & Row. Repr. (1991) Cambridge, MA and London: MIT Press.

Choudhary, K. K., Schlesewsky, M., Roehm, D., and Bornkessel-Schlesewsky, I. (2009). The N400 as a correlate of interpretively relevant linguistic rules: Evidence from Hindi. Neuropsychologia, 47: 3012-22.

Chung, S. and McCloskey, J. (1987). Government, barriers and small clauses in modern Irish. Linguistic Inquiry, 18: 173-237.

Cinque, G. (1999). Adverbs and Functional Heads: A Cross-linguistic Perspective. New York: Oxford University Press.

Clahsen, H. (1999). Lexical entries and rules of language. Behavioral and Brain Sciences, 22: 991-1013.

Clahsen, H. and Almazan, M. (2001). Compounding and inflection in language impairment: Evidence from Williams Syndrome (and SLI). Lingua, 111/10: 729-57.

Clark, A. (2010). Efficient, correct, unsupervised learning of context-sensitive languages, in M. Lapata and A. Sarkar (eds), Proceedings of the 14th Meeting on Natural Language Learning of the Association for Computational Linguistics (CoNLL), July 15-16, 2010, Uppsala, Sweden. Stroudsburg, PA: Association for Computational Linguistics.

and Eyraud, R. (2006). Learning auxiliary fronting with grammatical inference. Presented at the Tenth Conference on Computational Natural Language Learning, New York.

(2007). Polynomial time identification in the limit of substitutable context-free languages. Journal of Machine Learning Research, 8: 1725-45.

and Habrard, A. (2008). A polynomial algorithm for the inference of context-free languages, in A. Clark, F. Coste, and L. Miclet (eds), Grammatical Inference: Algorithms and Applications, Lecture Notes in Computer Science, 5278. New York: Springer, 29-42.

Clement, J., Brown, D., and Zietsman, A. (1989). Not all preconceptions are misconceptions: Finding anchoring conceptions for grounding instruction on students' intuitions. Paper presented at AERA, March 1989, San Francisco, CA. Forthcoming in the International Journal of Science Education.

Clifton, C. and Frazier, L. (1989). Comprehending sentences with long-distance dependencies, in G. N. Carlson and M. K. Tanenhaus (eds), Linguistic Structure in Language Processing. Dordrecht: Kluwer Academic Publishers.

Cohen, L. and Mehler, J. (1996). Click monitoring revisited: An on-line study of sentence comprehension. Memory and Cognition, 24: 94-102.

Cohen, M. S. and Bookheimer, S. Y. (1994). Functional magnetic resonance imaging. Trends in Neurosciences, 17/7: 268-77.

Collins, C. (2005). A smuggling approach to the passive in English. Syntax, 8/2: 81-120.

Corina, D., Poizner, H., Bellugi, U., Feinberg, T., Dowd, D., and O'Grady-Batch, L. (1992). Dissociations between linguistic and nonlinguistic gestural systems: A case for compositionality. Brain and Language, 43: 414-47.


Crain, S., McKee, C., and Emiliani, M. (1990). Visiting relatives in Italy, in J. de Villiers and L. Frazier (eds), Language Processing and Language Acquisition. Dordrecht and New York: Kluwer Academic Publishers, 335-56.

and Nakayama, M. (1987). Structure dependence in grammar formation. Language, 63/3: 522-43.

Croft, W. (1990). Typology and Universals. New York: Cambridge University Press.

Cromer, R. F. (1970). 'Children are nice to understand': Surface structure clues for the recovery of a deep structure. British Journal of Psychology, 61: 397-408.

(1972). The learning of surface structure clues to deep structure by a puppet show technique. Quarterly Journal of Experimental Psychology, 24: 66-76.

(1983). A longitudinal study of the acquisition of word knowledge: Evidence against gradual learning. British Journal of Developmental Psychology, 1: 307-16.

(1987). Language growth with experience without feedback. Journal of Psycholinguistic Research, 16/3: 223-31.

Cuddy, L., Balkwill, L., Peretz, I., and Holden, R. (2005). Musical difficulties are rare: A study of 'tone deafness' among university students, in G. Avanzini, L. Lopez, S. Koelsch, and M. Majno (eds), The Neurosciences and Music, II: From Perception to Performance. Annals of the New York Academy of Sciences, 1060: 311-21.

and Duffin, J. (2005). Music, memory, and Alzheimer's disease: Is music recognition spared in dementia, and how can it be assessed? Medical Hypotheses, 64: 229-35.

Curtiss, S. (1982). Developmental dissociations of language and cognition, in L. Obler and L. Menn (eds), Exceptional Language and Linguistics. New York: Academic Press, 285-312.

(1988a). The special talent of grammar acquisition, in L. Obler and D. Fein (eds), The Exceptional Brain. New York: The Guilford Press, 364-86.

(1988b). Abnormal language acquisition and grammar: Evidence for the modularity of language, in L. Hyman and C. Li (eds), Language, Speech, and Mind: Studies in Honor of Victoria A. Fromkin. New York: Routledge, Kegan & Paul, 81-102.

(1995). Language as a cognitive system: Its independence and selective vulnerability, in C. Otero (ed.), Noam Chomsky: Critical Assessments, 4. New York: Routledge.

(2011). Revisiting modularity, in Yukio Otsu (ed.), Proceedings of the 11th Annual Tokyo Conference on Psycholinguistics. Tokyo: Hituzi Syobo, 1-33.

and Yamada, J. (1981). The relationship between language and cognition in a case of Turner's syndrome. UCLA Working Papers in Cognitive Linguistics, 3: 93-116.

Dapretto, M. and Bookheimer, S. (1999). Form and content: Dissociating syntax and semantics in sentence comprehension. Neuron, 24: 427-32.

Davis, L., Foldi, N., Gardner, H., and Zurif, E. (1978). Repetition in the transcortical aphasias. Brain and Language, 6: 226-38.

De Villiers, J. (1995). Empty categories and complex sentences: The case of wh-questions, in P. Fletcher and B. MacWhinney (eds), Handbook of Child Language. Oxford: Blackwell Publishing.

Dehaene-Lambertz, G. (1997). Electrophysiological correlates of categorical phoneme perception in adults. NeuroReport, 8/4: 919-24.

(2000). Cerebral specialization for speech and nonspeech stimuli in infants. Journal of Cognitive Neuroscience, 12/3: 449-60.


Dehaene-Lambertz, G. and Baillet, S. (1998). A phonological representation in the infant brain. NeuroReport, 9/8: 1885-8.

Dehaene, S., Anton, J.-L., Campagne, A., Ciuciu, P., Dehaene, G. P., Denghien, I., Jobert, A., LeBihan, D., Sigman, M., Pallier, C., and Poline, J.-B. (2006). Functional segregation of cortical language areas by sentence repetition. Human Brain Mapping, 27: 360-71.

and Hertz-Pannier, L. (2002). Functional neuroimaging of speech perception in infants. Science, 298: 2013-15.

and Gliga, T. (2004). Common neural basis for phoneme processing in infants and adults. Journal of Cognitive Neuroscience, 16/8: 1375-87.

Hertz-Pannier, L., and Dubois, J. (2006). Nature and nurture in language acquisition: Anatomical and functional brain-imaging studies in infants. Trends in Neuroscience, 29/7: 367-73.

Mériaux, S., Roche, A., Sigman, M., and Dehaene, S. (2006). Functional organization of perisylvian activation during presentation of sentences in preverbal infants. Proceedings of the National Academy of Sciences, 103/38: 14240-5.

and Peña, M. (2001). Electrophysiological evidence for automatic phonetic processing in neonates. NeuroReport, 12/14: 3155-8.

Christophe, A., and Landrieu, P. (2004). Phoneme perception in a neonate with a left sylvian infarct. Brain and Language, 88: 26-38.

Demiral, S. B., Schlesewsky, M., and Bornkessel-Schlesewsky, I. (2008). On the universality of language comprehension strategies: Evidence from Turkish. Cognition, 106: 484-500.

Dikker, S., Rabagliati, H., and Pylkkänen, L. (2009). Sensitivity to syntax in visual cortex. Cognition, 110: 293-321.

Dixon, R. M. W. (1994). Ergativity, Cambridge Studies in Linguistics. Cambridge: Cambridge University Press.

Dostoyevsky, Theodore Michailovich (1864/1943). Notes from the Underground, in B. G. Guerney (ed.), A Treasury of Russian Literature. New York: Vanguard Press, 1943.

Dowty, D. (1991). Thematic proto-roles and argument selection. Language, 67: 547-619.

Dresher, E. (1999). Charting the learning path: Cues to parameter setting. Linguistic Inquiry, 30: 27-67.

and Kaye, J. (1990). A computational learning model for metrical phonology. Cognition, 34: 137-95.

Eckert, P. and Rickford, J. R. (eds) (1995). Style and Sociolinguistic Variation. Cambridge: Cambridge University Press.

Eimas, P. D., Siqueland, E. R., Jusczyk, P., and Vigorito, J. (1971). Speech perception in infants. Science, 171: 303-6.

Ellefson, M., Treiman, R., and Kessler, B. (2009). Learning to label letters by sounds or names: A comparison of England and the United States. Journal of Experimental Child Psychology, 102: 323-41.

Emmorey, K. (2002). Language, Cognition and the Brain: Insights from Sign Language Research. Mahwah, NJ: Lawrence Erlbaum.

Epstein, S., Kitahara, H., and Seely, D. (forthcoming). Structure building that can't be, in M. Uribe-Etxebarria and V. Valmala (eds), Ways of Structure Building. Oxford: Oxford University Press.


and Seely, D. (2002). Derivation and Explanation in the Minimalist Program. Malden, MA: Blackwell Publishing.

Erdocia, K., Laka, I., Mestres, A., and Rodriguez-Fornells, A. (2009). Syntactic complexity and ambiguity resolution in a free word order language: Behavioral and electrophysiological evidences from Basque. Brain and Language, 109: 1-7.

and Rodriguez-Fornells, A. (forthcoming). Processing derived word orders in Basque, in P. de Swart and M. Lamers (eds), Case, Word Order, and Prominence: Psycholinguistic and Theoretical Approaches to Argument Structure. Berlin: Springer.

Evans, M. A. and Saint-Aubin, J. (2005). What children are looking at during shared storybook reading: Evidence from eye movement monitoring. Psychological Science, 16: 913-20.

Falcaro, M., Pickles, A., Newbury, D. F., Addis, L., Banfield, E., Fisher, S. E., Monaco, A. P., Simkin, Z., and Conti-Ramsden, G. (2006). Genetic and phenotypic effects of phonological short-term memory and grammatical morphology in specific language impairment. Genes, Brain and Behavior, 7/4: 393-402.

Feinberg, T. (2001). Altered Egos: How the Brain Creates the Self. Oxford: Oxford University Press.

Feldman, H., Goldin-Meadow, S., and Gleitman, L. R. (1978). Beyond Herodotus: The creation of language by linguistically deprived deaf children, in A. Lock (ed.), Action, Symbol, and Gesture: The Emergence of Language. New York: Academic Press, 351-414.

Felsenfeld, S. and Plomin, R. (1997). Epidemiological and offspring analyses of developmental speech disorders using data from the Colorado adoption project. Journal of Speech, Language, and Hearing Research, 40: 778-91.

Fenson, L., Dale, P. S., Reznick, J. S., Bates, E., Thal, D. J., Pethick, S. J., Tomasello, M., Mervis, C. B., and Stiles, J. (1994). Variability in early communicative development. Monographs of the Society for Research in Child Development, 59/5: 1-185.

Ferreira, F. and Clifton, C. J. (1986). The independence of syntactic processing. Journal of Memory and Language, 25: 348-68.

Ferreiro, E. and Teberosky, A. (1982). Literacy before Schooling. New York: Heinemann.

Fijalkow, J. (2007). Invented spelling in various contexts. L1-Educational Studies in Language and Literature, 7/3: 1-4. Accessed at: <http://l1.publication-archive.com/start>.

Fikkert, Paula (1994). On the acquisition of prosodic structure. Doctoral dissertation, Leiden University.

Fillmore, C. J. (1968). The case for case, in E. Bach and R. T. Harms (eds), Universals in Linguistic Theory. New York: Holt, Rinehart, & Winston, 1-88.

Fisher, C. (1996). Structural limits in verb mapping: The role of analogy in children's interpretation of sentences. Cognitive Psychology, 31: 41-81.

Gleitman, H., and Gleitman, L. R. (1991). On the semantic content of subcategorization frames. Cognitive Psychology, 23: 331-92.

Hall, D. G., Rakowitz, S., and Gleitman, L. (1994). When it is better to receive than to give: Syntactic and conceptual constraints on vocabulary growth. Lingua, 92: 333-75.

Fodor, J. A. (1981). The present status of the innateness controversy, in J. A. Fodor (ed.), Representations. Cambridge, MA: MIT Press.

Fodor, J. D. (2001). Setting syntactic parameters, in M. Baltin and C. Collins (eds), The Handbook of Contemporary Syntactic Theory. Oxford: Blackwell Publishing, 730-8.


Fodor, J. D. and Sakas, W. G. (2004). Evaluating models of parameter setting, in A. Brugos, L. Micciulla, and C. E. Smith (eds), BUCLD 28: Proceedings of the 28th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 1-27.

Folli, R. and Harley, H. (2007). Causation, obligation and argument structure: On the nature of little v. Linguistic Inquiry, 38/2: 197-238.

Fonteneau, E. and van der Lely, H. K. J. (2008). Electrical brain responses in language-impaired children reveal grammar-specific deficits. PLOS ONE, 3/3: 1-6.

Ford, M. (1983). A method for obtaining measures of local parsing complexity throughout sentences. Journal of Verbal Learning and Verbal Behavior, 22: 203-18.

Frauenfelder, U., Segui, J., and Mehler, J. (1980). Monitoring around the relative clause. Journal of Verbal Learning and Verbal Behavior, 19: 328-37.

Frazier, L. (1987). Syntactic processing: Evidence from Dutch. Natural Language & Linguistic Theory, 5: 519-59.

and De Villiers, J. (eds) (1990). Language Processing and Language Acquisition. Dordrecht: Kluwer Academic Publishers.

and Flores d'Arcais, G. (1989). Filler-driven parsing: A study of gap filling in Dutch. Journal of Memory and Language, 28: 331-44.

and Fodor, J. D. (1978). The sausage machine: A new two-stage parsing model. Cognition, 6: 291-325.

Friederici, A. D. (2009). Pathways to language: Fiber tracts in the human brain. Trends in Cognitive Sciences, 13: 175-81.

Friedmann, N. (2001). Agrammatism and the psychological reality of the syntactic tree. Journal of Psycholinguistic Research, 30: 71-90.

Belletti, A., and Rizzi, L. (2009). Relativized relatives: Types of intervention in the acquisition of A-bar dependencies. Lingua, 119: 67-88.

Friedmann, N. et al. (in prep.). Children's production of relative clauses. Final paper of WG-3 of the European COST Action A33.

and Grodzinsky, Y. (1994). Verb inflection in agrammatism: A dissociation between tense and agreement. Brain and Language, 47: 402-5.

(1997). Tense and agreement in agrammatic production: Pruning the syntactic tree. Brain and Language, 56: 397-425.

Gvion, A., Biran, M., and Novogrodsky, R. (2006). Do people with agrammatic aphasia understand verb movement? Aphasiology, 20: 136-53.

and Novogrodsky, R. (2006). Syntactic movement in agrammatism and S-SLI: Two different impairments, in A. Belletti, E. Bennati, C. Chesi, E. Di Domenico, and I. Ferrari (eds), Language Acquisition and Development. Newcastle: Cambridge Scholars Press, 197-210.

and Novogrodsky, R. (2007). Is the movement deficit in syntactic SLI related to traces or to thematic role transfer? Brain and Language, 101/1: 50-63.

(2008). Subtypes of SLI: SySLI, PhoSLI, LeSLI, and PraSLI, in A. Gavarró and M. João Freitas (eds), Language Acquisition and Development: Proceedings of GALA 2007. Newcastle: Cambridge Scholars Press, 205-17.

(2011). Which questions are most difficult to understand? The comprehension of wh questions in three sub-types of SLI. Lingua, 121: 367-82.


Reznick, J., Dolinski-Nuger, D., and Soboleva, K. (2010). Comprehension and production of movement-derived sentences by Russian speakers with agrammatic aphasia. Journal of Neurolinguistics, 23: 44-65.

Frishkoff, G. A., Collins-Thompson, K., Perfetti, C. A., and Callan, J. (2008). Measuring incremental changes in word knowledge: Experimental validation and implications for learning assessment. Behavioral Research Methods, 40/4: 907-25.

Fromkin, V. and Rodman, R. (1983). An Introduction to Language. New York: CBS College Publishing.

Gaab, N., Gabrieli, J. D. E., Deutsch, G., Tallal, P., and Temple, E. (2007). Neural correlates of rapid auditory processing are disrupted in children with developmental dyslexia and ameliorated with training: An fMRI study. Restorative Neurology and Neuroscience, 25/3-4: 295-310.

Gabrieli, J. D. E. (2009). Dyslexia: A new synergy between education and cognitive neuroscience. Science, 325/5938: 280-3.

Gallas, K. (1994). The Languages of Learning: How Children Talk, Write, Dance, Draw, and Sing Their Understanding of the World. New York: Teachers College Press.

Gallistel, C. R. and King, A. P. (2009). Memory and the Computational Brain: Why Cognitive Science will Transform Neuroscience. New York: Wiley/Blackwell.

Ganger, J., Dunn, S., and Gordon, P. (2005). Genes take over when the input fails: A twin study of the passive, in Online Proceedings of the 29th Annual Boston University Conference on Language Development, November 5-7, 2004, Boston, MA. Available at <http://faculty.tc.columbia.edu/upload/pg328/GangerBU04.pdf>.

Wexler, K., and Soderstrom, M. (1998). The genetic basis for the development of tense: A preliminary report on a twin study, in A. Greenhill, M. Hughes, H. Littlefield, and H. Walsh (eds), Proceedings of the 22nd Annual Boston University Conference on Language Development, Boston, 224-34.

Gardner, H. (1993). Multiple Intelligences. New York: Basic Books.

Gazdar, G. (1981). Unbounded dependencies and coordinate structure. Linguistic Inquiry, 12/2: 155-84.

Gehrke, B. and Grillo, N. (2009). How to BECOME passive, in K. K. Grohmann (ed.), Explorations of Phase Theory: Features, Arguments, and Interpretation at the Interfaces. Berlin and New York: De Gruyter, 231-68.

Gennari, S. P. and MacDonald, M. C. (2008). Semantic indeterminacy in object relative clauses. Journal of Memory and Language, 58: 161-87.

Gentner, D. and Boroditsky, L. (2001). Individuation, relativity and early word learning, in M. Bowerman and S. C. Levinson (eds), Language Acquisition and Conceptual Development. New York: Cambridge University Press, 215-56.

Gerken, L. A. (1996). Phonological and distributional cues to syntax acquisition, in J. Morgan and K. Demuth (eds), Signal to Syntax: Bootstrapping from Speech to Grammar in Early Acquisition. Mahwah, NJ: Lawrence Erlbaum, 411-26.

Gervain, J., Nespor, M., Mazuka, R., Horie, R., and Mehler, J. (2008). Bootstrapping word order in prelexical infants: A Japanese-Italian cross-linguistic study. Cognitive Psychology, 57: 56-74.

Geyer, H. (1991). Sub-categorization as a predictor of verb meaning: Evidence from modern Hebrew. Unpublished manuscript, University of Pennsylvania.


Gibson, E. (1998). Linguistic complexity: Locality of syntactic dependencies. Cognition, 68: 1-76.

(2000). The dependency locality theory: A distance-based theory of linguistic complexity, in Y. Miyashita, A. Marantz, and W. O'Neil (eds), Image, Language, Brain. Cambridge, MA: MIT Press, 95-126.

Hickok, G., and Schütze, C. (1994). Processing empty categories in a parallel parsing framework. Journal of Psycholinguistic Research, 23: 381-405.

Gillette, J., Gleitman, H., Gleitman, L. R., and Lederer, A. (1999). Human simulations of vocabulary learning. Cognition, 73: 135-76.

Gleitman, L. R. (1990). The structural sources of verb meanings. Language Acquisition, 1: 3-55.

Cassidy, K., Papafragou, A., Nappa, R., and Trueswell, J. T. (2005). Hard words. Journal of Language Learning and Development, 1/1: 23-64.

Goethe, J. W. von (1872). Elective Affinities, with an Introduction by V. C. Woodhull. Boston: D. W. Niles.

Goldberg, M. (1991). 2 + 2 doesn't always equal 4: Understanding children's inventions. Qualifying Paper, Harvard Graduate School of Education.

(2012). Arts Integration: Teaching Subject Matter through the Arts in Multicultural Settings, 4th edn. New York: Allyn and Bacon.

Goldin-Meadow, S. (2003). The resilience of language: What gesture creation in deaf children can tell us about how all children learn language, in J. Werker and H. Wellman (eds), The Essays in Developmental Psychology Series. New York: Psychology Press.

Golinkoff, R., Hirsh-Pasek, K., Cauley, K., and Gordon, L. (1987). The eyes have it: Lexical and syntactic comprehension in a new paradigm. Journal of Child Language, 14: 23-45.

Golston, C. (1991). Both Lexicons. Unpublished PhD dissertation, UCLA.

Gomez, R. L. and Gerken, L. A. (1999). Artificial grammar learning by 1-year-olds leads to specific and abstract knowledge. Cognition, 70/2: 109-35.

Goodluck, H. and Tavakolian, S. (1982). Competence and processing in children's grammar of relative clauses. Cognition, 11: 1-27.

Goodman, K. (1967). Reading: A psycholinguistic guessing game. Journal of the Reading Specialist, 6: 126-35.

Goodman, N. (1951). The Structure of Appearance. Cambridge, MA: Harvard University Press.

Gopnik, M., Dalalakis, J., Fukuda, S., and Hough-Eyamie, W. (1997). The biological basis of language: Familial language impairment, in M. Gopnik (ed.), The Inheritance and Innateness of Grammars. New York: Oxford University Press, 111-40.

Gordon, P. C., Hendrick, R., and Johnson, M. (2001). Memory interference during lan-guage processing. Journal of Experimental Psychology: Learning Memory and Cognition,27: 1411-23.

Gorman, K. (2012). Words and Gaps. Doctoral dissertation, Department of Linguistics,University of Pennsylvania. Forthcoming.

Gottlieb, R. (ed.) (1996). Reading Jazz: A Gathering of Autobiography, Reportage, and Criticismfrom 1919 to Now. New York: Pantheon Books.

Green, T. (1979). The necessity of syntax markers: Two experiments with artificial languages.Journal of Verbal Learning and Verbal Behavior, 18: 481-96.


Grillo, N. (2008). Generalized Minimality: Syntactic Underspecification in Broca's Aphasia, LOT Series 186. The Netherlands: University of Utrecht.
Grillo, N. (2009). Generalized Minimality: Feature impoverishment and comprehension deficits in agrammatism. Lingua, 119: 1426-43.
Grimshaw, J. (1981). Form, function, and the language acquisition device, in C. L. Baker and J. J. McCarthy (eds), The Logical Problem of Language Acquisition. Cambridge, MA: MIT Press, 165-82.
Grinstead, J., MacSwan, J., Curtiss, S., and Gelman, R. (2004). The independence of number and language. Unpublished MS.
Grodzinsky, Y. (1986). Language deficits and the theory of syntax. Brain and Language, 27: 135-59.
Grodzinsky, Y. and Finkel, L. (1998). The neurology of empty categories. Journal of Cognitive Neuroscience, 10/2: 281-92.
Gruver, M. (1955). The Tadoma method. Volta Review, 57: 17-19.
Guasti, M. T. (2000). An excursion into interrogatives in Early English and Italian, in M.-A. Friedemann and L. Rizzi (eds), The Acquisition of Syntax: Studies in Comparative Developmental Linguistics. Harlow: Longman, 105-28.
Gutierrez, M. J. (2010). Comprehension of relative clauses in L1 Basque, in K. Franich, K. M. Iserman, and L. L. Keil (eds), BUCLD 34: Proceedings of the 34th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 162-73.
Hagiwara, H. (1995). The breakdown of functional categories and the economy of derivation. Brain and Language, 50: 92-116.
Hagoort, P., Wassenaar, M. E. D., and Brown, C. M. (2003). Syntax-related ERP-effects in Dutch. Cognitive Brain Research, 16/1: 38-50.
Hahne, A., Eckstein, K., and Friederici, A. D. (2004). Brain signatures of syntactic and semantic processes during children's language development. Journal of Cognitive Neuroscience, 16: 1302-18.
Hahne, A. and Friederici, A. D. (1999). Electrophysiological evidence for two steps in syntactic analysis: Early automatic and late controlled processes. Journal of Cognitive Neuroscience, 11: 194-205.
Hale, K. and Honie, L. (1972). An Introduction to the Sound System of Navajo. MIT ms, revised and expanded in 2010 by Wayne O'Neil.
Hale, K. and Keyser, S. J. (1993). On argument structure and the lexical expression of grammatical relations, in K. Hale and S. J. Keyser (eds), The View from Building 20: Essays in Honor of Sylvain Bromberger. Cambridge, MA: MIT Press, 53-110.
Hale, K. and Keyser, S. J. (2002). Prolegomenon to a Theory of Argument Structure. Cambridge, MA: MIT Press.
Halle, M. (1962). Phonology in generative grammar. Word, 18: 54-72.
Halle, M. (1973). Prolegomena to a theory of word formation. Linguistic Inquiry, 4: 3-16.
Halle, M. (1978). Knowledge unlearned and untaught: What speakers know about the sounds of their language, in M. Halle, J. Bresnan, and G. Miller (eds), Linguistic Theory and Psychological Reality. Cambridge, MA: MIT Press, 294-303.
Halle, M. (1998). The stress of English words 1968-1998. Linguistic Inquiry, 29: 539-68.


Halle, M. and Kenstowicz, M. (1991). The Free Element Condition and cyclic vs. noncyclic stress. Linguistic Inquiry, 22: 457-501.
Halle, M. and Marantz, A. (1993). Distributed morphology and the pieces of inflection, in K. Hale and S. J. Keyser (eds), The View from Building 20: Essays in Honor of Sylvain Bromberger. Cambridge, MA: MIT Press, 111-76.
Halle, M. and Mohanan, K. P. (1985). Segmental phonology of Modern English. Linguistic Inquiry, 16: 57-116.
Halle, M. and Stevens, K. N. (1962). Speech recognition: A model and a program for research. RLE Reports. Repr. in J. A. Fodor and J. J. Katz (eds), The Structure of Language: Readings in the Philosophy of Language. Englewood Cliffs, NJ: Prentice-Hall.
Halle, M. and Vergnaud, J.-R. (1987). An Essay on Stress. Cambridge, MA: MIT Press.
Hamburger, H. and Crain, S. (1982). Relative acquisition, in S. Kuczaj (ed.), Language Development, Vol. 1: Syntax and Semantics. Hillsdale, NJ: Lawrence Erlbaum.
Hart, J., Berndt, R. S., and Caramazza, A. (1985). Category-specific naming deficit following cerebral infarction. Nature, 316: 439-40.
Hart, M. (1991). Planet Drum. San Francisco: Harper San Francisco.
Hartman, J. (forthcoming). Intervention in tough constructions, in NELS 39: Proceedings of the 39th Annual Meeting of the North East Linguistic Society. Amherst, MA: GLSA.
Hauser, M., Aslin, R., and Newport, E. (2001). Segmentation of the speech stream in a nonhuman primate: Statistical learning in cotton-top tamarin monkeys. Cognition, 78: B53-B64.
Hauser, M., Chomsky, N., and Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298: 1569-79.
Havy, M. and Nazzi, T. (2009). Better processing of consonantal over vocalic information in word learning at 16 months of age. Infancy, 14: 439-56.
Hawkins, D. (1978). Critical barriers to science learning. Outlook, 29: 3-23.
Hayes, B. (1982). Extrametricality and English stress. Linguistic Inquiry, 13/2: 227-76.
Hayes, B. (1995). Metrical Stress Theory: Principles and Case Studies. Cambridge: Cambridge University Press.
Henderson, E. H. and Beers, J. (eds) (1980). Developmental and Cognitive Aspects of Learning to Spell: A Reflection of Word Knowledge. Newark, DE: International Reading Association.
Hicks, G. (2009). 'Tough'-constructions and their derivation. Linguistic Inquiry, 40: 535-66.
Hirsch, C. and Hartman, J. (2006). Some (wh-) questions concerning passive interactions, in A. Belletti, E. Bennati, C. Chesi, E. Di Domenico, and I. Ferrari (eds), Proceedings of the Conference on Generative Approaches to Language Acquisition (GALA). Cambridge: Cambridge Scholars Press.
Hirsch, C., Orfitelli, R., and Wexler, K. (2007). When seem means think: The role of the experiencer-phrase in children's comprehension of raising, in A. Belikova, L. Meroni, and M. Umeda (eds), GALANA 2: Proceedings of the Conference on Generative Approaches to Language Acquisition-North America 2. Somerville, MA: Cascadilla Press.
Hirsch, C. and Wexler, K. (2004). Children's passives and their resulting interpretation, in K. U. Deen, J. Nomura, B. Schulz, and B. D. Schwartz (eds), The Proceedings of the Inaugural Conference on Generative Approaches to Language Acquisition-North America, Honolulu, HI, Occasional Papers in Linguistics, 4. Storrs, CT: University of Connecticut, 125-36.


Hirsch, C. and Wexler, K. (2006). The late development of raising: What children seem to think about seem, in W. Davies and S. Dubinsky (eds), New Horizons in the Analysis of Control and Raising. Dordrecht: Springer, 35-70.
Hirsch, C. and Wexler, K. (2007a). The development of inverse copulas (and clefts). Paper presented at the Western Conference on Linguistics, December 2007, San Diego, CA.
Hirsch, C. and Wexler, K. (2007b). The late acquisition of raising: What children seem to think about seem, in S. Dubinsky and B. Davies (eds), New Horizons in the Analysis of Control and Raising. New York: Springer.
Hochmann, J.-R. (submitted). Frequency, function words, and the second Gavagai problem.
Hochmann, J.-R., Benavides-Varela, S., Nespor, M., and Mehler, J. (2011). Vowels and consonants in early language acquisition. Developmental Science, 14: 1445-58.
Hochmann, J.-R., Endress, A. D., and Mehler, J. (2010). Word frequency as a cue for identifying function words in infancy. Cognition, 115: 444-57.
Hoeft, F., McCandliss, B. D., Black, J. M., Gantman, A., Zakerani, N., Hulme, C., et al. (2011). Neural systems predicting long-term outcome in dyslexia. Proceedings of the National Academy of Sciences of the United States of America, 108/1: 361-6.
Hohnen, B. and Stevenson, J. (1999). The structure of genetic influences on general cognitive, language, phonological, and reading abilities. Developmental Psychology, 35/2: 590-603.
Holmer, A. (2001). The ergativity parameter. Working Papers, 48. Lund University, Department of Linguistics, 101-13.
Holmes, V. M. and O'Regan, J. K. (1981). Eye fixation patterns during the reading of relative-clause sentences. Journal of Verbal Learning and Verbal Behavior, 20: 417-30.
Hornstein, N. (1999). Movement and control. Linguistic Inquiry, 30: 69-96.
Hornstein, N. (2009). A Theory of Syntax: Minimal Operations and Universal Grammar. Cambridge: Cambridge University Press.
Hsiao, F. and Gibson, E. (2003). Processing relative clauses in Chinese. Cognition, 90: 3-27.
Hume, D. (1739/1978). A Treatise of Human Nature. Oxford: Clarendon.
Hutton, J. T., Arsenina, N., Kotik, B., and Luria, A. R. (1977). On the problems of speech compensation and fluctuating intellectual performance. Cortex, 13/2 (June): 195-207.
Hyams, N. and Snyder, W. (2007). Young children never smuggle: Reflexive clitics and the Universal Freezing hypothesis. Manuscript, UCLA/University of Connecticut.
Indefrey, P., Hagoort, P., Herzog, H., Seitz, R. J., and Brown, C. M. (2001). Syntactic processing in left prefrontal cortex is independent of lexical meaning. NeuroImage, 14: 546-55.
Ishizuka, T. (2005). Processing relative clauses in Japanese, in R. Okabe and K. Nielsen (eds), Papers in Psycholinguistics, 2. UCLA Working Papers in Linguistics, 13. Los Angeles: UCLA, 135-57.
Ishizuka, T., Nakatani, K., and Gibson, E. (2006). Processing Japanese relative clauses in context. Paper presented at the 19th Annual CUNY Conference on Human Sentence Processing. New York: City University of New York.
Itti, E., Gaw, I., Gonzalo, I., Pawlikowska-Haddal, A., Boone, K. B., Mlikotic, A., Itti, L., Mishkin, F. S., and Swerdloff, R. S. (2006). The structural brain correlates of cognitive deficits in adults with Klinefelter's syndrome. Journal of Clinical Endocrinology and Metabolism, 91/4: 1423-7.
Jackendoff, R. (1972). Semantic Interpretation in Generative Grammar. Cambridge, MA: MIT Press.


Jackendoff, R. (1977). X-bar Syntax. Cambridge, MA: MIT Press.
Jackendoff, R. (1983). Semantics and Cognition. Cambridge, MA: MIT Press.
Jackson, C. (1984). Language acquisition in two modalities: Person deixis and negation in ASL and English. Unpublished Master's thesis, UCLA.
Jacquemot, C., Pallier, C., LeBihan, D., Dehaene, S., and Dupoux, E. (2003). Phonological grammar shapes the auditory cortex: A functional magnetic resonance imaging study. Journal of Neuroscience, 23: 9541-6.
Jakubowicz, C. (2011). Measuring derivational complexity: New evidence from typically-developing and SLI learners of L1 French. Lingua, 121/3: 339-51.
James, W. (1890/1981). The Principles of Psychology. Cambridge, MA: Harvard University Press.
Jandreau, S. and Bever, T. G. (1992). Phrase-spaced formats improve comprehension in average readers. Journal of Applied Psychology, 77: 143-6.
Jarmulowicz, L. (2002). English derivational suffix frequency and children's stress judgments. Brain and Language, 81: 192-204.
Jodzio, K., Biechowska, D., and Leszniewska-Jodzio, B. (2008). Selectivity of lexical-semantic disorders in Polish-speaking patients with aphasia: Evidence from single-word comprehension. Archives of Clinical Neuropsychology, 23/5: 543-51.
Johns, A., Massam, D., and Ndayiragije, J. (eds) (2006). Ergativity: Emerging Issues. Dordrecht and Berlin: Springer.
Jones, N. (2007). The use of deictic and cohesive markers in narratives by children with Williams Syndrome. Unpublished PhD dissertation, UCLA.
Jusczyk, P. W. (1997). The Discovery of Spoken Language. Cambridge, MA: MIT Press.
Jusczyk, P. W. and Aslin, R. (1995). Infants' detection of the sound patterns of words in fluent speech. Cognitive Psychology, 46: 65-97.
Jusczyk, P. W., Cutler, A., and Redanz, N. (1993). Preference for the predominant stress patterns of English words. Child Development, 64: 675-87.
Jusczyk, P. W., Friederici, A. D., Wessels, J. M. I., Svenkerud, V. Y., and Jusczyk, A. M. (1993). Infants' sensitivity to the sound patterns of native language words. Journal of Memory and Language, 32: 402-20.
Jusczyk, P. W. and Hohne, E. A. (1997). Infants' memory for spoken words. Science, 277: 1984-6.
Jusczyk, P. W., Houston, D. M., and Newsome, M. (1999). The beginnings of word segmentation in English-learning infants. Cognitive Psychology, 39: 159-207.
Kak, S. (1987). The Paninian approach to natural language processing. International Journal of Approximate Reasoning, 1: 117-30.
Kahn, D. (1976). Syllable-based generalizations in English phonology. Doctoral dissertation, Department of Linguistics and Philosophy, Massachusetts Institute of Technology.
Kam, X. N. C. (2007). Statistical induction in the acquisition of auxiliary-inversion, in H. Caunt-Nulton, S. Kulatilake, and I. Woo (eds), BUCLD 31: Proceedings of the 31st Boston University Conference on Language Development. Somerville, MA: Cascadilla Press.
Kam, X. N. C. (2009). Contributions of statistical induction to models of syntax acquisition. PhD dissertation, The Graduate Center of the City University of New York.
Kam, X. N. C., Stoyneshka, I., Tornyova, L., Fodor, J. D., and Sakas, W. G. (2008). Bigrams and the richness of the stimulus. Cognitive Science, 32: 771-87.


Karmiloff-Smith, A., Grant, J., Berthoud, I., Davies, M., Howlin, P., and Udwin, O. (1997). Language and Williams syndrome: How intact is 'intact'? Child Development, 68/2: 246-62.
Kayne, R. (1984). Connectedness and Binary Branching. Dordrecht: Foris Publications.
Kayne, R. (1994). The Antisymmetry of Syntax. Cambridge, MA: MIT Press.
Kean, M.-L. (1979). Agrammatism: A phonological deficit? Cognition, 7: 69-83.
Keenan, E. L. and Comrie, B. (1977). Noun phrase accessibility and universal grammar. Linguistic Inquiry, 8: 63-99.
Keenan, E. L. and Hawkins, S. (1987). The psychological validity of the accessibility hierarchy, in E. Keenan (ed.), Universal Grammar: 15 Essays. London: Routledge, 60-85.
Kehoe, M. and Stoel-Gammon, C. (1997). The acquisition of prosodic structure: An investigation of current accounts of children's prosodic development. Language, 73: 113-44.
Kempler, D. (1984). Syntactic and symbolic abilities in Alzheimer's disease. Unpublished PhD dissertation, UCLA.
Kempler, D., Curtiss, S., and Jackson, C. (1987). Syntactic preservation in Alzheimer's disease. Journal of Speech and Hearing Research, 30: 343-50.
Kessels, R., Hendriks, M., Schouten, J., Asselen, M. van, and Postma, A. (2004). Spatial memory deficits in patients after unilateral selective amygdalohippocampectomy. Journal of the International Neuropsychological Society, 10/6: 907-12.
Khedr, E. M., Hamed, E., Said, A., and Basahi, J. (2002). Handedness and language cerebral lateralization. European Journal of Applied Physiology, 87/4-5: 469-73.
Kidd, E., Brandt, S., Lieven, E., and Tomasello, M. (2007). Object relatives made easy: A cross-linguistic comparison of the constraints influencing young children's processing of relative clauses. Language and Cognitive Processes, 22/6: 860-97.
Kim, M., Landau, B., and Phillips, C. (1999). Cross-linguistic differences in children's syntax for locative verbs, in A. Greenhill, H. Littlefield, and C. Tano (eds), BUCLD 23: Proceedings of the 23rd Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 337-48.
King, J. and Just, M. A. (1991). Individual differences in syntactic processing: The role of working memory. Journal of Memory and Language, 30: 580-602.
King, J. and Kutas, M. (1995). Who did what and when? Using word- and clause-level ERPs to monitor working memory usage in reading. Journal of Cognitive Neuroscience, 7: 376-95.
Klima, E. S. and Bellugi, U. (1966). Syntactic regularities in the speech of children, in J. Lyons and R. J. Wales (eds), Psycholinguistics Papers. Edinburgh: University of Edinburgh Press, 183-208.
Knecht, S., Drager, B., Deppe, M., Bobe, L., Lohmann, H., Floel, A., Ringelstein, E.-B., and Henningsen, H. (2000). Handedness and hemispheric language dominance in healthy humans. Brain, 123/12: 2512-18.
Kratzer, A. (2001). Building statives. Proceedings of the Berkeley Linguistics Society, 26: 385-99.
Kuhl, P., Williams, K., Lacerda, F., Stevens, K., and Lindblom, B. (1992). Linguistic experience alters phonetic perception in infants by 6 months of age. Science, 255: 606-8.
Kwon, N., Lee, Y., Gordon, P., Kluender, R., and Polinsky, M. (2010). Cognitive and linguistic determinants of the subject-object asymmetry: An eye-tracking study of pre-nominal relative clauses in Korean. Language, 86/3: 546-82.


Kwon, N., Polinsky, M., and Kluender, R. (2006). Subject preference in Korean, in D. Baumer, D. Montero, and M. Scanlon (eds), Proceedings of the 25th West Coast Conference on Formal Linguistics WCCFL-25. Somerville, MA: Cascadilla Press, 1-14.
Laka, I. (1993). Unergatives that assign ergative, unaccusatives that assign accusative, in J. D. Bobaljik and C. Phillips (eds), Papers on Case and Agreement I: MIT Working Papers in Linguistics, 18. Cambridge, MA: MITWPL, 149-72.
Laka, I. and Fernandez, B. (eds) (2012). Accounting for ergativity. Lingua, 122/3: 177-80.
Lambek, J. (1958). The mathematics of sentence structure. American Mathematical Monthly, 65: 154-70.
Landau, B. and Gleitman, L. R. (1985). Language and Experience: Evidence from the Blind Child. Cambridge, MA: Harvard University Press.
Landau, B. and Stecker, D. (1990). Objects and places: Geometric and syntactic representations in early lexical learning. Cognitive Development, 5: 287-312.
Landau, I. (2003). Movement out of control. Linguistic Inquiry, 34: 471-98.
Lasnik, H. (2001). A note on the EPP. Linguistic Inquiry, 32/2: 356-62.
Lederer, A., Gleitman, H., and Gleitman, L. (1995). Verbs of a feather flock together: Semantic information in the structure of maternal speech, in M. Tomasello and W. E. Merriman (eds), Beyond Names for Things: Young Children's Acquisition of Verbs. Hillsdale, NJ: Lawrence Erlbaum, 277-97.
Lee, L. L. (1974). Developmental Sentence Analysis: A Grammatical Assessment Procedure for Speech and Language Clinicians. Evanston, IL: Northwestern University Press.
Legate, J. A. and Yang, C. (2002). Empirical re-assessment of stimulus poverty arguments. The Linguistic Review, 19: 151-62.
Legate, J. A. and Yang, C. (2011). Learning exceptions. Manuscript, University of Pennsylvania.
Lehtonen, A. and Bryant, P. (2005). Doublet challenge: Form comes before function in children's understanding of their orthography. Developmental Science, 8: 211-17.
Levelt, W. J. M., Schriefers, H., Vorberg, D., Meyer, A. S., Pechmann, T., and Havinga, J. (1991). The time course of lexical access in speech production: A study of picture naming. Psychological Review, 98: 122-42.
Levin, B. (1983). On the nature of ergativity. Doctoral dissertation, MIT, Cambridge, MA.
Levin, B. (1993). English Verb Classes and Alternations: A Preliminary Investigation. Chicago, IL: University of Chicago Press.
Lewis, B. (1992). Pedigree analysis of children with phonology disorders. Journal of Learning Disabilities, 25/9: 586-97.
Lewis, J. D. and Elman, J. (2001). Learnability and the statistical structure of language: Poverty of stimulus arguments revisited, in B. Skarabela, S. Fish, and A. H. Do (eds), Proceedings of the 26th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press, 359-70.
Li, P. (1994). Subcategorization as a Predictor of Verb Meaning: Cross-language Study in Mandarin. Unpublished manuscript, University of Pennsylvania.
Liberman, M. and Prince, A. S. (1977). On stress and linguistic rhythm. Linguistic Inquiry, 8: 249-336.
Lidz, J., Gleitman, H., and Gleitman, L. (2003). Understanding how input matters: Verb learning and the footprint of universal grammar. Cognition, 87: 151-78.


Lightfoot, D. (1991). How to Set Parameters. Cambridge, MA: MIT Press.
Lin, C. C. (2006). Grammar and parsing: A typological investigation of relative-clause processing. Dissertation, University of Arizona, Tucson.
Lin, C. C. (2008). The processing foundation of head-final relative clauses. Language and Linguistics, 9: 813-38.
Lin, C. C. and Bever, T. G. (2006). Subject preference in the processing of relative clauses in Chinese, in D. Baumer, D. Montero, and M. Scanlon (eds), Proceedings of the 25th West Coast Conference on Formal Linguistics, WCCFL-25. Somerville, MA: Cascadilla Press, 254-60.
Locker, L., Jr., Simpson, G. B., and Yates, M. (2003). Semantic neighborhood effects on the recognition of ambiguous words. Memory and Cognition, 31/4: 505-15.
Lovett, M. W., Lacerenza, L., Borden, S. L., Frijters, J. C., Steinbach, K. A., and DePalma, M. (2000). Components of effective remediation for developmental reading disabilities: Combining phonological and strategy-based instruction to improve outcomes. Journal of Educational Psychology, 92: 263-83.
Luria, A. R. (1948/1963). The Restoration of Brain Functions After War Trauma. Moscow: Press of the Academy of Medical Sciences of the USSR. English edn, The Hague: Pergamon Press, 1963.
Luria, A. R. (1970). Traumatic Aphasia (trans.). The Hague: Mouton.
McCloskey, J. (1991). Clause structure, ellipsis and proper government in Irish. Lingua, 85: 259-302.
McCloskey, J. (1996). On the scope of verb raising in Irish. Natural Language and Linguistic Theory, 14: 47-104.
McCloskey, J. (2009). Irish as a Configurational Language. Berkeley: Berkeley Syntax Circle.
MacDonald, M. C. (1994). Probabilistic constraints and syntactic ambiguity resolution. Language and Cognitive Processes, 9: 157-201.
MacDonald, M. C. and Christiansen, M. (2002). Reassessing working memory: Comment on Just and Carpenter (1992) and Waters and Caplan (1999). Psychological Review, 109: 35-54.
MacDonald, M. C., Pearlmutter, N., and Seidenberg, M. S. (1994). The lexical nature of syntactic ambiguity resolution. Psychological Review, 101: 676-703.
MacGregor, L., Pulvermuller, F., Casteren, M. van, and Shtyrov, Y. (2012, forthcoming). Ultra-rapid access to words in the brain: Neuromagnetic evidence. Nature Communications.
McGregor, W. B. (2009). Typology of ergativity. Language and Linguistics Compass, 3/1: 480-508.
McGue, M. and Broen, P. (1995). Familial aggregation of phonological disorders: Results from a 28-year follow-up. Journal of Speech and Hearing Research, 38: 1091-107.
McGuire, P. K., Robertson, D., Thacker, A., David, A. S., Kitson, N., Frackowiak, R. S. J., and Frith, C. D. (1997). Neural correlates of thinking in sign language. NeuroReport, 8/3: 695-8.
McRae, K., Spivey-Knowlton, M. J., and Tanenhaus, M. K. (1998). Modeling the influence of thematic fit (and other constraints) in on-line sentence comprehension. Journal of Memory and Language, 38: 283-312.
MacWhinney, B. (1977). Starting points. Language, 53: 152-68.
MacWhinney, B. (1982). Basic syntactic processes, in S. Kuczaj (ed.), Syntax and Semantics, 1: Language Acquisition. Hillsdale, NJ: Lawrence Erlbaum, 73-136.


MacWhinney, B. (2000). The CHILDES Project, Volume 2: Tools for Analyzing Talk: The Database, 3rd edn. Hillsdale, NJ: Lawrence Erlbaum.
MacWhinney, B. and Pleh, C. (1988). The processing of restrictive relative clauses in Hungarian. Cognition, 29: 95-141.
Mak, W. M., Vonk, W., and Schriefers, H. (2002). The influence of animacy on relative clause processing. Journal of Memory and Language, 47: 50-68.
Mak, W. M., Vonk, W., and Schriefers, H. (2006). Animacy in relative clause processing: The hiker that rocks crush. Journal of Memory and Language, 54: 466-90.
Maratsos, M., Fox, D., Becker, J., and Chalkley, M. (1985). Semantic restrictions on children's passives. Cognition, 19: 167-91.
Marcus, G. F., Brinkmann, U., Clahsen, H., Wiese, R., and Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29: 189-256.
Marcus, G. F., Vijayan, S., Bandi Rao, S., and Vishton, P. M. (1999). Rule-learning in seven-month-old infants. Science, 283: 77-80.
Marcus, M. P., Santorini, B., Marcinkiewicz, M. A., and Taylor, A. (1999). Treebank-3. Philadelphia: Linguistic Data Consortium.
Mason, K., Rowley, K., Marshall, C., Atkinson, J., Herman, R., Woll, B., and Morgan, G. (2010). Identifying specific language impairment in deaf children acquiring British Sign Language: Implications for theory and practice. British Journal of Developmental Psychology, 28: 33-49.
Mayer, R. E. (2004). Should there be a three-strikes rule against pure discovery learning? The case for guided methods of instruction. American Psychologist, 59: 15-19.
Mecklinger, A., Schriefers, H., Steinhauer, K., and Friederici, A. D. (1995). Processing relative clauses varying on syntactic and semantic dimensions: An analysis with event-related potentials. Memory and Cognition, 23: 477-94.
Medina, T. N., Snedeker, J., Trueswell, J. C., and Gleitman, L. R. (2011). How words can (and cannot) be learned by observation. Proceedings of the National Academy of Sciences, 108: 9014-19.
Mehler, J., Pena, M., Nespor, M., and Bonatti, L. L. (2006). The 'soul' of language does not use statistics: Reflections on vowels and consonants. Cortex, 42: 846-54.
Melzack, R. (1992). Phantom limbs. Scientific American, 266 (April): 120-6.
Meyer, A. and Fischer, M. (1969). Economy of description by automata, grammars, and formal systems. Mathematical Systems Theory, 3: 110-18.
Miceli, G. and Caramazza, A. (1988). Dissociation of inflectional and derivational morphology. Brain and Language, 35/1: 24-65.
Miller, G., Galanter, E., and Pribram, K. (1960). Plans and the Structure of Behavior. New York: Holt.
Mintz, T. H. (2002). Category induction from distributional cues in an artificial language: Grammatical categories in speech to young children. Cognitive Science, 26/4: 393-424.
Mintz, T. H. (2003). Frequent frames as a cue for grammatical categories in child-directed speech. Cognition, 90/1: 91-117.
Mintz, T. H. (2006). Finding the verbs: Distributional cues to categories available to young learners, in K. Hirsh-Pasek and R. M. Golinkoff (eds), Action Meets Word: How Children Learn Verbs. New York: Oxford University Press, 31-63.


Mintz, T. H., Newport, E. L., and Bever, T. G. (2002). The distributional structure of grammatical categories in speech to young children. Cognitive Science, 26: 393-424.
Mitchell, D. C., Cuetos, F., Corley, M. M. B., and Brysbaert, M. (1995). Exposure-based models of human parsing: Evidence for the use of coarse-grained (nonlexical) statistical records. Journal of Psycholinguistic Research, 24: 469-88.
Miyamoto, E. and Nakamura, M. (2003). Subject/object asymmetries in the processing of relative clauses in Japanese, in G. Garding and M. Tsujimura (eds), Proceedings of the 22nd West Coast Conference on Formal Linguistics WCCFL-22. Somerville, MA: Cascadilla Press, 342-55.
Moats, L. C. (1994). The missing foundation in teacher education: Knowledge of the structure of spoken and written language. Annals of Dyslexia, 44: 81-102.
Moats, L. C. (2000). Speech to Print: Language Essentials for Teachers. Baltimore, MD: Paul H. Brookes Publishing.
Moats, L. C. (2001). Overcoming the language gap. American Educator, 5-9.
Moerk, E. (2000). The Guided Acquisition of First-Language Skills. Westport, CT: Ablex.
Money, J. (1963). Cytogenetic and psychosexual incongruities with a note on space-form blindness. American Journal of Psychiatry, 119: 820-7.
Money, J. (1973). Turner's syndrome and parietal function. Cortex, 9: 385-93.
Money, J. and Alexander, D. (1966). Turner's syndrome: Further demonstrations of the presence of specific cognitional deficiencies. Journal of Medical Genetics, 3: 223-31.
Montessori, M. (1912/1964). The Montessori Method. New York: Schocken Books.
Morgan, G., Herman, R., and Woll, B. (2007). Language impairment in sign language: Breakthroughs and puzzles. International Journal of Language and Communication Disorders, 42/1: 97-105.
Morgan, J. L. (1986). From Simple Input to Complex Grammar. Cambridge, MA: MIT Press and Bradford Books.
Moro, A. (1997). The Raising of Predicates: Predicative Noun Phrases and the Theory of Clause Structure. Cambridge: Cambridge University Press.
Moro, A. (2000). Dynamic Antisymmetry, Linguistic Inquiry Monograph Series, 38. Cambridge, MA: MIT Press.
Moro, A. (2009). Rethinking symmetry: A note on labelling and the EPP, in P. Cotticelli-Kurras and A. Tomaselli (eds), La grammatica tra storia e teoria: Scritti in onore di Giorgio Graffi. Alessandria: Edizioni dell'Orso, 129-31.
Moro, A. (2011). A closer look at the turtle's eyes. Proceedings of the National Academy of Sciences, 108/6: 2177-8.
Morris, R., Lovett, M., Wolf, M., Sevcik, R., Steinbach, K., and Frijters, J. (2012). Multiple-component remediation for developmental reading disabilities: IQ, socioeconomic status, and race as factors in remedial outcome. Journal of Learning Disabilities, 45/2: 99-127.
Munaro, N. (1999). Sintagmi interrogativi nei dialetti italiani settentrionali. Padova: Unipress.
Musso, M., Moro, A., Glauche, V., Rijntjes, M., Reichenbach, J., Buechel, C., and Weiller, C. (2003). Broca's area and the language instinct. Nature Neuroscience, 6/7: 774-81.
Naatanen, R., Lehtokoski, A., Lennes, M., Cheour, M., Huotilainen, M., and Iivonen, A. (1997). Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature, 385: 432-4.


Naigles, L. G. (1990). Children use syntax to learn verb meanings. Journal of Child Language,

i7:357-74-Gleitman, H., and Gleitman, L. R. (1993). Children acquire word meaning components

from syntactic evidence, in E. Dromi (ed.), Language and Cognition: A Developmental Per-spective. Norwood, NJ: Ablex, 104-40.

Nazzi, T. (2005). Use of phonetic specificity during the acquisition of new words: Differencesbetween consonants and vowels. Cognition, 98: 13-30.

and Bertoncini, J. (2009). Consonant specificity in onset and coda positions in early lexicalacquisition. Language and Speech, 52: 463-80.

Floccia, C, Moquet, B., and Butler, J. (2009). Bias for consonantal over vocalic informa-tion in French- and English-learning 3o-month-olds: Crosslinguistic evidence in early wordlearning. Journal of Experimental Child Psychology, 102: 522-37.

Nelson, H. D., Ngyren, P., Walker, M., and Panoscha, R. (2006). Screening for speech and lan-guage delay in preschool children: Systematic evidence review for the US Preventive ServicesTask Force. Pediatrics, 117/2: 298-319.

Nespor, M., Pena, M., and Mehler, J. (2003). On the different roles of vowels and consonants inspeech processing and language acquisition. Lingue e Linguaggio, 2: 201-27.

Netley, C. and Rovet, J. (1982). Verbal deficits in children with 47 XXY and 47XXXkaryotypes:A descriptive and experimental study. Brain and Language, 17: 10-18.

Newman, A. J., Pancheva, R., Ozawa, K., Neville, H. J., and Ullman, M. T. (2001). An event-related fMRI study of syntactic and semantic violations. Journal of Psycholinguistic Research,

30/3: 339-64-Newport, E. L. (1990). Maturational constraints on language learning. Cognitive Science,

14: 11-28.Norton, S. J., Schultz, M. C., Reed, C. M., Braida, L. D., Durlach, N. L, Rabinowitz, W. M., and

Chomsky, C. (1977). Analytic study of the Tadoma method: Background and preliminaryresults. Journal of Speech and Hearing Research, 20: 574-95.

Novogrodsky, R. and Friedmann, N. (2006). The production of relative clauses in syntactic SLI: A window to the nature of the impairment. Advances in Speech-Language Pathology, 8: 364-75.

Nunes Carraher, T. and Rego, L. R. B. (1984). Desenvolvimento cognitivo e alfabetização [Cognitive development and the acquisition of literacy]. Revista Brasileira de Estudos Pedagógicos, 63: 38-55.

Nussbaum, J. and Novick, S. (1982). Alternative frameworks, conceptual conflict and accommodation: Toward a principled teaching strategy. Instructional Science, 11: 183-200.

Oberecker, R., Friedrich, M., and Friederici, A. (2005). Neural correlates of syntactic processing in two-year-olds. Journal of Cognitive Neuroscience, 17/10: 1667-78.

O'Grady, W., Miseon, L., and Miho, C. (2003). A subject-object asymmetry in the acquisition of relative clauses in Korean as a second language. Studies in Second Language Acquisition, 25: 433-48.

Pace, A. J. and Lucido, P. (n.d.). Intuitive conceptions and misconceptions in science: Specific cases of intractable prior knowledge. Unpublished paper, University of Missouri, Kansas City, MO 64110.


Pacton, S., Perruchet, P., Fayol, M., and Cleeremans, A. (2001). Implicit learning out of the lab: The case of orthographic regularities. Journal of Experimental Psychology: General, 130: 401-26.

Pallier, C., Devauchelle, A.-D., and Dehaene, S. (2011). Cortical representation of the constituent structure of sentences. Proceedings of the National Academy of Sciences, 108/6: 2522-7.

Papafragou, A., Cassidy, K., and Gleitman, L. R. (2007). When we think about thinking: The acquisition of belief verbs. Cognition, 105: 125-65.

Peirce, C. S. (1957). The Logic of Abduction, in V. Tomas (ed.), Peirce's Essays in the Philosophy of Science. New York: Liberal Arts Press.

Peña, M., Maki, A., Kovacic, D., Dehaene-Lambertz, G., Koizumi, H., Bouquet, F., and Mehler, J. (2003). Sounds and silence: An optical topography study of language recognition at birth. Proceedings of the National Academy of Sciences, 100/20: 11702-5.

Penke, M. and Krause, M. (2002). German noun plurals: A challenge to the dual-mechanism model. Brain and Language, 81: 303-11.

Pennington, B., Heaton, R., Karzmark, M., Pennington, M., Lehman, R., and Shucard, D. (1985). The neuropsychological phenotype in Turner's syndrome. Cortex, 21: 391-404.

Perani, D., Saccuman, M. C., Scifo, P., Spada, D., Andreolli, G., Rovelli, R., Baldoli, C., and Koelsch, S. (2010). Functional specializations for music processing in the human newborn brain. Proceedings of the National Academy of Sciences, 107/10 (March 9): 4758-63.

Pereira, F. (2000). Formal grammar and information theory: together again? Philosophical Transactions: Mathematical, Physical and Engineering Sciences, 358: 1239-53.

Perfors, A., Tenenbaum, J., and Regier, T. (2006). Poverty of the stimulus? A rational approach. Proceedings of the 28th Annual Conference of the Cognitive Science Society, Vancouver, Canada, 663-8.

Perfors, A., Tenenbaum, J., and Regier, T. (2011). The learnability of abstract syntactic principles. Cognition, 118/3: 306-38.

Perovic, A. and Wexler, K. (2007). Complex grammar in Williams syndrome. Clinical Linguistics and Phonetics, 21/9: 729-45.

Pesetsky, D. (1995). Zero Syntax: Experiencers and Cascades. Cambridge, MA: MIT Press.

Petitto, L. A. (1987). On the autonomy of language and gesture: Evidence from the acquisition of personal pronouns in ASL. Cognition, 27: 1-52.

—— Zatorre, R., Gauna, K., Nikelski, E. J., Dostie, D., and Evans, A. (2000). Speech-like cerebral activity in profoundly deaf people while processing signed languages: Implications for the neural basis of human language. Proceedings of the National Academy of Sciences, 97/25: 13961-6.

Phillips, C. (2001). Levels of representation in the electrophysiology of speech perception. Cognitive Science, 25: 711-31.

—— Kazanina, N., and Abada, S. (2005). ERP effects of the processing of syntactic long-distance dependencies. Cognitive Brain Research, 22: 407-28.

—— Pellathy, T., Marantz, A., Yellin, E., Wexler, K., Poeppel, D., McGinnis, M., and Roberts, T. (2000). Auditory cortex accesses phonological categories: An MEG mismatch study. Journal of Cognitive Neuroscience, 12: 1036-55.


Piaget, Jean (1973/1948). To Understand is to Invent: The Future of Education. New York: Grossman (The Viking Press).

Piattelli-Palmarini, M. (1980). Language and Learning: The Debate between Jean Piaget and Noam Chomsky. London: Routledge and Kegan Paul.

Piccirilli, M., Sciarma, T., and Luzzi, S. (2000). Modularity of music: Evidence from a case of pure amusia. Journal of Neurology, Neurosurgery and Psychiatry, 69: 541-5.

Pinker, S. (1984). Language Learnability and Language Development. Cambridge, MA: Harvard University Press.

—— (1989). Learnability and Cognition. Cambridge, MA: MIT Press.

—— (1999). Words and Rules: The Ingredients of Language. New York: Basic Books.

Plato (c.380 BCE/2010). Meno, in Dialogues of Plato, trans. into English, with Analyses and Introduction by B. Jowett. Cambridge: Cambridge University Press.

Poeppel, D. and Wexler, K. (1993). The full competence hypothesis of clause structure in early German. Language, 69: 1-33.

Poletto, C. and Pollock, J.-Y. (2009). Another look at wh-questions in Romance: The case of Mendrisiotto and its consequences for the analysis of French wh-in-situ and embedded interrogatives, in L. Wentzel (ed.), Romance Languages and Linguistic Theory 2006: Selected Papers from 'Going Romance', vol. 1, December 7-9, 2006. Amsterdam: John Benjamins, 199-258.

Polinsky, M., Gomez-Gallo, C., Kravtchenko, E., and Testelets, Y. (2012). Subject preference and ergativity, in I. Laka and B. Fernandez (eds), Accounting for Ergativity. Lingua, 122/3: 267-77.

Pollard, C. and Sag, I. (1994). Head-driven Phrase Structure Grammar. Chicago: University of Chicago Press.

Pollo, T. C., Kessler, B., and Treiman, R. (2005). Vowels, syllables, and letter names: Differences between young children's spelling in English and Portuguese. Journal of Experimental Child Psychology, 92: 161-81.

—— (2009). Statistical patterns in children's early writing. Journal of Experimental Child Psychology, 104: 410-26.

—— Treiman, R., and Kessler, B. (2008). Preschoolers use partial letter names to select spellings: Evidence from Portuguese. Applied Psycholinguistics, 29: 1-18.

Poser, William (1992). Blocking of phrasal constructions by lexical items, in I. Sag and A. Szabolcsi (eds), Lexical Matters. Stanford: CSLI Publications, 111-31.

Pugh, K. R., Sandak, R., Frost, S. J., Moore, D., and Mencl, W. E. (2005). Examining reading development and reading disability in English language learners: Potential contributions from functional neuroimaging. Learning Disabilities Research and Practice, 20/1: 24-30.

Pulvermüller, F. and Assadollahi, R. (2007). Grammar or serial order?: Discrete combinatorial brain mechanisms reflected by the syntactic Mismatch Negativity. Journal of Cognitive Neuroscience, 19/6: 971-80.

—— and Shtyrov, Y. (2003). Automatic processing of grammar in the human brain as revealed by the mismatch negativity. Neuroimage, 20/1: 159-72.

—— Hastings, A., and Carlyon, R. (2008). Syntax as a reflex: Neurophysiological evidence for early automaticity of grammatical processing. Brain and Language, 104/3: 244-53.

—— and Hauk, O. (2009). Understanding in an instant: Neurophysiological evidence for mechanistic language circuits in the brain. Brain and Language, 110: 81-94.


Quigley, S. P., Steinkamp, M. W., Power, D. J., and Jones, B. W. (1978). Test of Syntactic Abilities. Beaverton, OR: Dormac.

Quine, W. (1960). Word and Object. New York: Wiley.

Ramachandran, V. S. and Blakeslee, S. (1998). Phantoms in the Brain: Probing the Mysteries of the Human Mind. New York: Quill William Morrow.

Ramchand, G. (2008). Verb Meaning and the Lexicon: A First Phase Syntax. Cambridge: Cambridge University Press.

Ramshaw, L. A. and Marcus, M. P. (1995). Text chunking using transformation-based learning, in D. Yarowsky and K. Church (eds), Proceedings of the 3rd ACL Workshop on Very Large Corpora, June 30, 1995, Cambridge, MA: MIT, 82-94.

Rappaport Hovav, M. and Levin, B. (1988). What to do with theta-roles, in W. Wilkins (ed.), Syntax and Semantics, Vol. 21: Thematic Relations. New York: Academic Press, 7-36.

Ravitch, D. (2010). The Death and Life of the Great American School System: How Testing and Choice Are Undermining Education. New York: Basic Books.

Read, C. (1970). Children's perceptions of the sounds of English: Phonology from three to six. Unpublished doctoral dissertation, Harvard University.

—— (1971). Pre-school children's knowledge of English phonology. Harvard Educational Review, 41: 1-34.

—— (1975). Children's Categorization of Speech Sounds in English. Urbana, IL: National Council of Teachers of English.

—— (1980). Creative spelling by young children, in T. Shopen and J. M. Williams (eds), Standards and Dialects in English. Cambridge, MA: Winthrop Publishers, 106-36.

—— (1986). Children's Creative Spelling. London: Routledge & Kegan Paul.

Reali, F. and Christiansen, M. H. (2003). Reappraising poverty of stimulus argument: A corpus analysis approach, in BUCLD 28: Proceedings Supplement of the 28th Annual Boston University Conference on Language Development. Online at <http://www.bu.edu/linguistics/BUCLD/supp.html>.

—— (2005). Uncovering the richness of the stimulus: Structure dependence and indirect statistical evidence. Cognitive Science, 29: 1007-28.

Redington, M. and Chater, N. (1998). Connectionist and statistical approaches to language acquisition: A distributional perspective. Language and Cognitive Processes, 13/2-3: 129-91.

Reed, C. M., Doherty, M. J., Braida, L. D., and Durlach, N. I. (1982). Analytic study of the Tadoma method: Further experiments with inexperienced observers. Journal of Speech and Hearing Research, 25: 216-23.

—— Durlach, N. I., and Braida, L. D. (1982). Research on tactile communication of speech: A review. ASHA Monographs, 20.

—— and Schultz, M. C. (1982). Analytic study of the Tadoma method: Identification of consonants and vowels by an experienced Tadoma user. Journal of Speech and Hearing Research, 25: 108-16.

—— Rabinowitz, W. M., Durlach, N. I., Braida, L. D., Conway-Fithian, S., and Schultz, M. C. (1985). Research on the Tadoma method of speech communication. Journal of the Acoustical Society of America, 77: 247-57.

—— Rubin, S. I., Braida, L. D., and Durlach, N. I. (1978). Analytic study of the Tadoma method: Discrimination ability of untrained observers. Journal of Speech and Hearing Research, 21: 625-37.


Richards, N. (2003). Why there is an EPP. Gengo Kenkyu, 123: 221-56.

Ristad, E. (1986). Computational complexity of current GPSG theory. AI Lab Memo 894. Cambridge, MA: MIT.

—— (1993). The Language Complexity Game. Cambridge, MA: MIT Press.

Rizzi, L. (1990). Relativized Minimality. Cambridge, MA: MIT Press.

—— (1997). The fine structure of the left periphery, in L. Haegeman (ed.), Elements of Grammar. Dordrecht: Kluwer Academic Publishers, 281-337.

—— (2004). Locality and the left periphery, in A. Belletti (ed.), Structures and Beyond: The Cartography of Syntactic Structures, Vol. 3. New York: Oxford University Press, 223-51.

—— (2006). Selective residual V-2 in Italian interrogatives, in P. Brandt and E. Fuss (eds), Form, Structure and Grammar. Berlin: Akademie Verlag, 229-42.

Roca, I. (2005). Saturation of parameter setting in Spanish stress. Phonology, 22: 345-94.

Roland, P. E. and Zilles, K. (1998). Structural divisions and functional fields in the human cerebral cortex. Brain Research Reviews, 26/2-3: 87-105.

Rondal, J. (1995). Exceptional Language Development in Down Syndrome. Cambridge: Cambridge University Press.

Rosen, S., Adlard, A., and van der Lely, H. K. J. (2009). Backward and simultaneous masking in children with grammatical specific language impairment: No simple link between auditory and language abilities. Journal of Speech, Language and Hearing Research, 52/2: 396-411.

Rosenbaum, P. S. (1967). The Grammar of English Predicate Complement Constructions. Cambridge, MA: MIT Press.

Ross, D. S. and Bever, T. G. (2004). The time course for language acquisition in biologically distinct populations: Evidence from deaf individuals. Brain and Language, 89: 115-21.

Ross, J. R. (1967). Constraints on variables in syntax. Doctoral dissertation, Massachusetts Institute of Technology. Online at <http://hdl.handle.net/1721.1/15166>. Published 1986 as Infinite Syntax! Norwood, NJ: Ablex.

Roth, F. P. (1984). Accelerating language learning in young children. Journal of Child Language, 11: 89-107.

Rothenberg, M., Verrillo, R. T., Zahorian, S. A., Brachman, M. L., and Bolanowski, Jr, S. J. (1977). Vibrotactile frequency for encoding a speech parameter. Journal of the Acoustical Society of America, 62: 1003-12.

Rouveret, A. and Vergnaud, J. R. (1980). Specifying reference to the subject. Linguistic Inquiry, 11: 97-202.

Rovet, J. (1998). Turner's syndrome, in B. P. Rourke (ed.), Syndrome of Nonverbal Learning Disabilities: Neurodevelopmental Manifestations. New York: The Guilford Press, 351-71.

—— and Ireland, L. (1994). The behavioral phenotype in children with Turner's syndrome. Journal of Learning Disabilities, 26: 333-41.

—— Netley, C., Keenan, M., Bailey, J., and Stewart, D. (1996). The psychoeducational profile of boys with Klinefelter syndrome. Journal of Learning Disabilities, 29/2: 180-96.

Rozin, P. and Gleitman, L. R. (1977). The structure and acquisition of reading, II: The reading process and the acquisition of the alphabetic principle, in A. S. Reber and D. L. Scarborough (eds), Toward a Psychology of Reading. Hillsdale, NJ: Lawrence Erlbaum, 55-141.

Sacks, O. (1987). The Man who Mistook his Wife for a Hat and Other Clinical Tales. New York: Summit Books.


—— (2007). Musicophilia. New York: Alfred A. Knopf.

Saffran, J. R. (2001). Words in a sea of sounds: The output of infant statistical learning. Cognition, 81/2: 149-69.

—— Aslin, R. N., and Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274: 1926-8.

—— and Wilson, D. P. (2003). From syllables to syntax: Multi-level statistical learning by 12-month-old infants. Infancy, 4: 273-84.

Sag, I. A. (2010). English filler-gap constructions. Language, 86: 486-545.

—— Wasow, T., and Bender, E. M. (2003). Syntactic Theory: A Formal Introduction, 2nd edn. Chicago: University of Chicago Press.

Sahin, N., Pinker, S., Cash, S., Schomer, D., and Halgren, E. (2009). Sequential processing of lexical, grammatical, and phonological information within Broca's area. Science, 326/5951: 445-9.

Sandak, R., Mencl, W. E., Frost, S. J., and Pugh, K. R. (2004). The neurological basis of skilled and impaired reading: Recent findings and new directions. Scientific Studies of Reading, 8/3: 273-92.

Santelmann, L. (1995). The acquisition of verb-second grammar in child Swedish: Continuity of Universal Grammar in wh-questions, topicalization and verb raising. PhD dissertation, Department of Linguistics, Ohio State University.

Santi, A. and Grodzinsky, Y. (2010). fMRI adaptation dissociates syntactic complexity dimensions. NeuroImage, 51: 1285-93.

Sauerland, U. and Gärtner, H.-M. (eds) (2007). Interfaces + Recursion = Language? Berlin: Mouton de Gruyter.

Sauerland, U. and Gibson, E. (1998). How to predict the relative clause attachment preference. Paper presented at the 11th CUNY Sentence Processing Conference, Rutgers University, New Brunswick, NJ.

Schaller, S. (1995). A Man Without Words. Berkeley: University of California Press.

Schmitt, B. M., Schiltz, K., Zaake, W., Kutas, M., and Münte, T. F. (2001). An electrophysiological analysis of the time course of conceptual and syntactic encoding during tacit picture naming. Journal of Cognitive Neuroscience, 13/4: 510-22.

Schriefers, H., Friederici, A. D., and Kuhn, K. (1995). The processing of locally ambiguous relative clauses in German. Journal of Memory and Language, 34: 499-520.

Schultz, M. C., Norton, S. J., Conway-Fithian, S., and Reed, C. M. (1984). A survey of the use of the Tadoma method in the United States and Canada. Volta Review, 86: 282-92.

Senghas, A. (2003). Intergenerational influence and ontogenetic development in the emergence of spatial grammar in Nicaraguan sign language. Cognitive Development, 18/4: 511-31.

Shanahan, T. and Neuman, S. B. (1997). Conversations: Literacy research that makes a difference. Reading Research Quarterly, 32: 202-10.

Shreeve, J. (1993). Touching the phantom. Discover (June): 35-42.

Silbert, A., Wolff, P., and Lillienthal, J. (1977). Spatial and temporal processing in patients with Turner's syndrome. Behavior Genetics, 7: 11-21.

Slobin, D. I. (2001). Form-function relations: How do children find out what they are?, in M. Bowerman and S. C. Levinson (eds), Language Acquisition and Conceptual Development. New York: Cambridge University Press, 406-49.


—— and Bever, T. G. (1982). Children use canonical sentence schemas: A crosslinguistic study of word order and inflections. Cognition, 12: 229-65.

Smith, N. V. and Tsimpli, I. M. (1995). The Mind of a Savant: Language Learning and Modularity. Oxford: Blackwell.

Snedeker, J., Geren, J., and Shafto, C. (2007). Starting over: International adoption as a natural experiment in language development. Psychological Science, 18/1: 79-87.

—— and Gleitman, L. (2004). Why it is hard to label our concepts, in G. Hall and S. Waxman (eds), Weaving a Lexicon. Cambridge, MA: MIT Press.

Snyder, J. C., Clements, M. A., Reed, C. M., Durlach, N. I., and Braida, L. D. (1982). Tactile communication of speech, I: Comparison of Tadoma and a frequency-amplitude spectral display in a consonant discrimination task. Journal of the Acoustical Society of America, 71: 1249-54.

Sonnenstuhl, I. and Huth, A. (2002). Processing and representation of German -n plurals: A dual mechanism approach. Brain and Language, 81/1-3: 276-90.

Spelke, E. S. (1990). Principles of object perception. Cognitive Science, 14: 29-56.

—— Katz, G., Purcell, S. E., Ehrlich, S. M., and Breinlinger, K. (1994). Early knowledge of object motion: Continuity and inertia. Cognition, 51: 131-76.

Spivey-Knowlton, M. and Sedivy, J. C. (1995). Resolving attachment ambiguities with multiple constraints. Cognition, 55: 227-67.

Stanford-Binet Intelligence Scale (1960). Boston, MA: Houghton Mifflin.

Stanovich, K. E. (1985). Explaining the variance in reading ability in terms of psychological processes: What have we learned? Annals of Dyslexia, 35: 67-96.

Starke, M. (2001). Move dissolves into Merge. PhD dissertation, University of Geneva.

Stenquist, G. (1974). The story of Leonard Dowdy: Deafblindness acquired in infancy. Watertown, MA: Perkins School for the Blind.

Stough, C., Nettelbeck, T., and Ireland, G. (1988). Objectively identifying the Cocktail Party syndrome among children with spina bifida. The Exceptional Child, 35/1: 23-30.

Straus, K. (2008). Validations of a probabilistic model of language learning. PhD dissertation, Department of Mathematics, Northeastern University, Boston, MA.

Stromswold, K. (1995). The acquisition of subject and object wh-questions. Language Acquisition, 4: 5-48.

—— (2001). The heritability of language: A review and meta-analysis of twin, adoption and linkage studies. Language, 77/4: 647-723.

—— (2006). Biological and psychosocial factors affect linguistic and cognitive development differently: A twin study, in BUCLD 30: Proceedings of the 30th Annual Boston University Conference on Language Development, 2. Somerville, MA: Cascadilla Press, 595-606.

—— (2007). A gene linked to speech and language in the developing human brain. American Journal of Human Genetics, 87: 1144-57.

Suzuki, K. and Sakai, K. (2003). An event-related fMRI study of explicit syntactic processing of normal/anomalous sentences in contrast to implicit syntactic processing. Cerebral Cortex, 13/5: 517-26.

Svenonius, P. (2002). Subjects, Expletives and the EPP. New York: Oxford University Press.

Szagun, G. (2001). Learning different regularities: The acquisition of noun plurals by German-speaking children. First Language, 21: 109-41.


Tabor, W., Juliano, C., and Tanenhaus, M. K. (1997). Parsing in a dynamical system: An attractor-based account of the interaction of lexical and structural constraints in sentence processing. Language and Cognitive Processes, 12: 211-72.

Takahashi, E. and Lidz, J. (2008). Beyond statistical learning in syntax, in A. Gavarró and J. Freitas (eds), Language Acquisition and Development: Proceedings of GALA 2007. Newcastle-upon-Tyne: Cambridge Scholars Publishing, 446-56.

Tallal, P. (1976). Rapid auditory processing in normal and disordered language development. Journal of Speech and Hearing Research, 19: 561-71.

—— (2000). Experimental studies of language learning impairments: From research to remediation, in D. V. M. Bishop and L. B. Leonard (eds), Speech and Language Impairments in Children: Causes, Characteristics, Intervention and Outcome. Hove: Psychology Press, 131-55.

Talmy, L. (1985). Lexicalization patterns: Semantic structure in lexical forms, in T. Shopen (ed.), Language Typology and Syntactic Description. New York: Cambridge University Press, 57-149.

Tammet, D. (2007). Born on a Blue Day: Inside the Extraordinary Mind of an Autistic Savant. New York: Free Press.

Tan, L. H., Spinks, J. A., Eden, G. F., Perfetti, C. A., and Siok, W. T. (2005). Reading depends on writing, in Chinese. Proceedings of the National Academy of Sciences, 102/24: 8781-5.

Tardif, T., Shatz, M., and Naigles, L. (1997). Caregiver speech and children's use of nouns versus verbs: A comparison of English, Italian, and Mandarin. Journal of Child Language, 24: 535-65.

Temple, C. (1980). Learning to spell in Spanish, in M. L. Kamil and A. J. Moe (eds), Perspectives in Reading Research and Instruction: 29th Yearbook of the National Reading Conference. Washington, DC: National Reading Conference, 172-8.

—— (1991). Procedural dyscalculia and number facts dyscalculia: Double dissociation in developmental dyscalculia. Cognitive Neuropsychology, 8: 155-76.

—— and Carney, R. (1996). Reading skills in children with Turner's syndrome: An analysis of hyperlexia. Cortex, 32/2: 335-45.

—— and Shephard, C. M. (2012). Exceptional lexical skills but executive language deficits in school starters and young adults with Turner's syndrome: Implications for X chromosome effects on brain function. Brain and Language, 120/3 (March): 345-59.

Tesar, B. and Smolensky, P. (2000). Learnability in Optimality Theory. Cambridge, MA: MIT Press.

Tew, B. and Laurence, K. (1979a). The clinical and psychological characteristics of children with the 'cocktail party' syndrome. Kinderchir Grenzgeb, 28/4: 360-7.

—— (1979b). The 'cocktail party syndrome' in children with hydrocephalus and spina bifida. International Journal of Language and Communication Disorders, 14/2: 89-101. (Published online, March 2011.)

Thompson, C., Fix, S., and Gitelman, D. (2002). Selective impairment of morphosyntactic production in a neurological patient. Journal of Neurolinguistics, 15/3-5: 189-207.

Thompson, S. P. and Newport, E. L. (2007). Statistical learning of syntax: The role of transitional probability. Language Learning and Development, 3: 1-42.


Tomasello, M. (2000). Do young children have adult syntactic competence? Cognition, 74/3: 209-53.

Tomblin, J. and Buckwalter, P. (1998). Heritability of poor language achievement among twins. Journal of Speech and Hearing Research, 41: 188-99.

Toro, J. M., Nespor, M., Mehler, J., and Bonatti, L. L. (2008). Finding words and rules in a speech stream: Functional differences between vowels and consonants. Psychological Science, 19: 137-44.

—— Shukla, M., Nespor, M., and Endress, A. D. (2008). The quest for generalizations over consonants: Asymmetries between consonants and vowels are not the by-product of acoustic differences. Perception and Psychophysics, 70: 1515-25.

Torrego, E. (1984). On inversion in Spanish and some of its effects. Linguistic Inquiry, 15: 103-30.

Townsend, D. J. and Bever, T. G. (2001). Sentence Comprehension. Cambridge, MA: MIT Press.

—— Carrithers, C., and Bever, T. G. (2001). Familial handedness and access to words, meaning, and syntax during sentence comprehension. Brain and Language, 78: 308-31.

Travis, L. (2000). Event structure in syntax, in C. Tenny and J. Pustejovsky (eds), Events as Grammatical Objects: The Converging Perspectives of Lexical Semantics and Syntax. Stanford, CA: CSLI Publications.

Traxler, M. J., Morris, R. K., and Seely, R. E. (2002). Processing subject and object relative clauses: Evidence from eye movements. Journal of Memory and Language, 47: 69-90.

—— Williams, R. S., Blozis, S. A., and Morris, R. K. (2005). Working memory, animacy, and verb class in the processing of relative clauses. Journal of Memory and Language, 53: 204-24.

Treiman, R. (1993). Beginning to Spell: A Study of First-grade Children. New York: Oxford University Press.

—— (1998). Beginning to spell in English, in C. Hulme and R. M. Joshi (eds), Reading and Spelling: Development and Disorders. Mahwah, NJ: Lawrence Erlbaum, 371-93.

—— Cassar, M., and Zukowski, A. (1994). What types of linguistic information do children use in spelling? The case of flaps. Child Development, 65: 1310-29.

Trueswell, J. C., Tanenhaus, M. K., and Garnsey, S. M. (1994). Semantic influences on parsing: Use of thematic role information in syntactic ambiguity resolution. Journal of Memory and Language, 33: 285-318.

—— and Kello, C. (1993). Verb-specific constraints in sentence processing: Separating effects of lexical preference from garden-path. Journal of Experimental Psychology: Learning, Memory and Cognition, 19: 528-53.

Tyler, A. and Nagy, W. (1989). The acquisition of English derivational morphology. Journal of Memory and Language, 28: 649-67.

Ueno, M. and Garnsey, S. (2008). An ERP study of the processing of subject and object relative clauses in Japanese. Language and Cognitive Processes, 23: 646-88.

Uszkoreit, H. and Peters, S. (1986). On some formal properties of metarules. Linguistics and Philosophy, 9: 477-94.

Utzeri, I. (2007). The production and acquisition of subject and object relative clauses in Italian. Nanzan Linguistics, Special Issue 3: 283-314.

Valian, V. (1986). Syntactic categories in the speech of young children. Developmental Psychology, 22: 562-79.


—— (1999). Input and language acquisition, in W. C. Ritchie and T. K. Bhatia (eds), Handbook of Child Language Acquisition. New York: Academic Press, 497-530.

—— and Coulson, S. (1988). Anchor points in language learning: The role of marker frequency. Journal of Memory and Language, 27: 71-86.

van der Lely, H. K. J. (2004). Evidence for and implications of a domain-specific grammatical deficit, in L. Jenkins (ed.), Variations and Universals in Biolinguistics. Oxford: Elsevier, 117-45.

—— (2005a). Domain-specific cognitive systems: Insight from grammatical specific language impairment. Trends in Cognitive Sciences, 9/2: 53-9.

—— (2005b). Grammatical-SLI and the computational grammatical complexity hypothesis. Revue Fréquences, 17/3: 13-20.

—— and Battell, J. (2003). Wh-movement in children with grammatical SLI: A test of the RDDR hypothesis. Language, 79: 153-81.

—— and Marinis, T. (2007). On-line processing of wh-questions in children with G-SLI and typically developing children. International Journal of Language and Communication Disorders, 42/5: 557-82.

—— Rosen, S., and Adlard, A. (2004). Grammatical language impairment and the specificity of cognitive domains: Relations between auditory and language abilities. Cognition, 94/2: 167-83.

—— and McClelland, A. (1998). Evidence for a grammar-specific deficit in children. Current Biology, 8: 1253-8.

—— and Stollwerck, L. (1996). A grammatical specific language impairment in children: An autosomal dominant inheritance? Brain and Language, 52: 484-504.

Varley, R. and Siegal, M. (2000). Evidence for cognition without grammar from causal reasoning and theory of mind in an agrammatic patient. Current Biology, 10/12: 723-6.

Varnhagen, C. K. (1995). Children's spelling strategies, in V. Berninger (ed.), The Varieties of Orthographic Knowledge, II: Relationships to Phonology, Reading, and Writing. Dordrecht: Kluwer Academic Publishers, 251-90.

Veblen, Thorstein (1899/1974). Theory of the Leisure Class. New York: Dover Thrift Edition.

Vivian, R. (1966). The Tadoma method: A tactual approach to speech and speechreading. Volta Review, 68: 733-7.

Waber, D. (1979). Neuropsychological aspects of Turner's syndrome. Developmental Medicine and Child Neurology, 21: 58-70.

Wang, H. and Mintz, T. H. (2010). From linear sequences to abstract structures: Distributional information in infant-directed speech, in K. Franich, K. M. Iserman, and L. L. Keil (eds), BUCLD 34: Proceedings of the 34th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press.

Wanner, E. and Maratsos, M. (1978). An ATN approach to comprehension, in M. Halle, J. Bresnan, and G. Miller (eds), Linguistic Theory and Psychological Reality. Cambridge, MA: MIT Press, 119-61.

Warren, T. and Gibson, E. (2002). The influence of referential processing on sentence complexity. Cognition, 85: 79-112.

—— (2005). Effects of NP type in reading cleft sentences in English. Language and Cognitive Processes, 20: 751-67.

Wechsler Adult Intelligence Scale (WAIS) (1955). New York: The Psychological Corp.


Wechsler Intelligence Scale for Children-Revised (WISC-R) (1974). New York: The Psychological Corp.

Weckerly, J. and Kutas, M. (1999). An electrophysiological analysis of animacy effects in the processing of object relative sentences. Psychophysiology, 36/5: 559-70.

Weir, Ruth (1962). Language in the Crib. The Hague: Mouton.

Werker, J. and Tees, R. (1983). Developmental change across childhood in the perception of non-native speech sounds. Canadian Journal of Psychology, 37/2: 278-86.

Wertheimer, M. (1912). Experimentelle Studien über das Sehen von Bewegung [Experimental studies on the seeing of motion]. Zeitschrift für Psychologie, 61: 161-265.

—— (1945). Productive Thinking. New York: Harper.

Wexler, K. (1992). Some issues in the growth of control, in R. K. Larson, S. Iatridou, U. Lahiri, and J. Higginbotham (eds), Control and Grammar. Cambridge, MA: MIT Press, 253-95.

—— (1996). The development of inflection in a biologically based theory of language acquisition, in M. L. Rice (ed.), Toward a Genetics of Language. Mahwah, NJ: Lawrence Erlbaum, 113-44.

—— (1998). Maturation and growth of grammar, in W. C. Ritchie and T. K. Bhatia (eds), Handbook of Child Language Acquisition. San Diego: Academic Press, 55-110.

—— (2003). Lenneberg's dream: Learning, normal language development and specific language impairment, in J. Schaeffer and Y. Levy (eds), Towards a Definition of Specific Language Impairment. Mahwah, NJ: Lawrence Erlbaum, 11-62. Reprinted with small changes in L. Jenkins (ed.) (2004), Variations and Universals in Biolinguistics. Amsterdam: Elsevier, 239-90.

—— (2004). Theory of phasal development: Perfection in child grammar. MIT Working Papers in Linguistics, 48: 159-209.

—— (forthcoming). Grammatical computation in the Optional Infinitive stage, in J. de Villiers and T. Roeper (eds), Handbook of Generative Approaches to Language Acquisition. Berlin: Springer-Verlag.

—— Schaeffer, J., and Bol, G. (2004). Verbal syntax and morphology in Dutch normal and SLI children: How developmental data can play an important role in morphological theory. Syntax, 7/2: 148-98.

Whitaker, H. (1976). A case of the isolation of the language function, in H. Whitaker and H. A. Whitaker (eds), Studies in Neurolinguistics, vol. 2. New York: Academic Press, 1-58.

Wiese, R. (1996). The Phonology of German. Cambridge: Cambridge University Press.

Witten, I. and Bell, T. (1991). The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, 37/4: 1085-94.

Wolf, M. (2007). Proust and the Squid: The Story and Science of the Reading Brain. New York: HarperCollins.

—— Barzillai, M., Gottwald, S., Miller, L., Spencer, K., Norton, E., Lovett, M., and Morris, R. (2009). The RAVE-O Intervention: Connecting neuroscience to the classroom. Mind, Brain, and Education, 3/2: 84-93.

—— and Katzir-Cohen, T. (2001). Reading fluency and its intervention. Scientific Studies of Reading, Special Issue on Fluency, ed. E. Kame'enui and D. Simmons, 5: 211-38.


Wu, H. and Gibson, E. (2008). Processing Chinese relative clauses in context. Poster presented at the 21st CUNY Conference on Human Sentence Processing.
Wunderlich, D. (1999). German noun plural reconsidered. Manuscript, University of Düsseldorf.
Xu, F. and Pinker, S. (1995). Weird past tense forms. Journal of Child Language, 22: 531-56.
Yamada, Y. and Neville, H. (2007). An ERP study of syntactic processing in English and nonsense sentences. Brain Research, 1130/1: 167-80.
Yang, C. (2002). Knowledge and Learning in Natural Language. New York: Oxford University Press.
(2004). Universal grammar, statistics, or both. Trends in Cognitive Sciences, 8: 451-6.
(2005). On productivity. Language Variation Yearbook, 5: 333-70.
(2006). The Infinite Gift. New York: Scribner.
(2008). The great number crunch. Journal of Linguistics, 44: 205-28.
(2010). Three factors in language variation. Lingua, 120: 1160-77.
(2011). Computational models of syntactic acquisition. Wiley Interdisciplinary Reviews: Cognitive Science, 3/2: 205-13.
(in prep.). The price of productivity. Manuscript, University of Pennsylvania.
Yetano, I. (2009). A corpus-based study of Basque relative clauses. Unpublished manuscript, University of the Basque Country (UPV/EHU), Vitoria-Gasteiz.
Duñabeitia, J. A., de la Cruz-Pavia, I., Carreiras, M., and Laka, I. (2010). Processing postnominal relative clauses in Basque: An inquiry into the dependency locality theory. Poster presented at the 23rd Annual CUNY Conference on Human Sentence Processing, New York University, New York.
Young, R. W. and Morgan, Sr, W. (1987). The Navajo Language: A Grammar and Colloquial Dictionary. Albuquerque: University of New Mexico Press.
Zaidel, E., Zaidel, D. W., and Sperry, R. W. (1981). Left and right intelligence: Case studies of Raven's Progressive Matrices following brain bisection and hemidecortication. Cortex, 17: 167-86.
Zatorre, R. J., Evans, A. C., Meyer, E., and Gjedde, A. (1992). Lateralization of phonetic and pitch processing in speech perception. Science, 256: 846-9.
Zukowski, A. (2005). Knowledge of constraints on compounding in children and adolescents with Williams Syndrome. Journal of Speech, Language and Hearing Research, 48: 79-92.
(forthcoming). Elicited production of relative clauses reflects intact grammatical knowledge in Williams syndrome. Language and Cognitive Processes.


Index

ABA, over vowels and consonants 112
A-bar movement 148, 160-161, 163
Abimbola, I.O. 231
Accessibility Hierarchy 129-30
A-Chain Deficit Hypothesis (ACDH) 155, 156-7
acoustic processing 78

  vs. phonological processing 70-71
adjective phrases 164-7
adjectives 24, 253, 258, 261
  adjective-noun phrases 263, 269
  vs. object 149-54
agnosias 88-90
agrammatism 81, 87, 120
'aha' reaction 185
Alzheimer's disease 79-80
Ambridge, B. et al. 52
A-movement 148, 160-161, 164, 166
animacy 10, 141-2
aphasia 69, 75-6, 79, 81, 89, 188
artificial language learning 43-4
artificial learners 44-5
arts integration 228-40
Asperger's syndrome 76
autism 76, 77
aux-doubling 29fn
aux-fronting 61-2, 63
auxiliary inversion 46-60
Avar (ergative language) 144

babies, phonological processing 71
Baker, M.C. 120
Basque, subject-object asymmetries 132-44
Bayesian model selection 33-8
Becker, M. 154fn
behaviorist approach to learning 108
Belletti, A. 123
  and L. Rizzi 123fn

Bellugi, U. et al. 75
Ben-Shachar, M. et al. 88
Bernstein-Ratner corpus 39-40
Berwick, R.C.
  and N. Chomsky 29-30fn
  et al. 39
Bever, T.G. 127
Biemiller, A. 215
big modularity (BMod) 68, 69-80
bigram models 38-41, 46-53, 54-60
binary classes 110-111
Binding and Control 84
biolinguistic program 183
Bissex, G. 233-4
blindfolded doll experiment 91, 94, 95-6, 148, 184
blind people
  language acquisition 92-5
  perception 100-101
  Tadoma method 241-66
Bonatti, L.L. et al. 111
Borer, H. and K. Wexler 155, 157
Bornkessel-Schlesewsky, I. and M. Schlesewsky 142
Bowerman, M. 97
bracketing 22-3, 36fn, 55-8
brain 26fn, 69-88, 90, 188-90, 213-17
Brill, E. 176
Canonical Syntactic Form 186-8
canonicity 131, 135
Carreiras, M. et al. 132, 134-5, 141, 142-4
Carroll, Lewis 98
causatives 101, 123
'chatterbox syndrome' 76
child-directed utterances 48
CHILDES Adam corpus 34, 38, 176
Chinese, writing system 205, 225


Chomsky, C. 53, 68fn, 91, 92-7, 100-102, 104, 114, 115, 124-6, 127-8, 146-5, 148-50, 154, 182, 184, 190, 192, 196, 198-202, 208-9, 210-18, 232-3, 234, 240
Chomsky, N. 22, 25, 47, 108, 114, 145, 155, 160, 165, 221
  and M. Halle 169, 203
Choudhary, K.K. et al. 142
Cinque, G. 123
Clark, A. 33fn
  and R. Eyraud 31-3
cognitive science 108
Collins, C. 120-121, 124
computational efficiency vs. communicational efficiency 62
computational system 65-7, 108
Conceptual-Intentional (CI) interface 64
concrete nouns 102-4
consonants 70-71, 172-3
  spelling 197, 200-5, 211, 222
  vs. vowels 111-13
constructivism 199
content words vs. function words 109-11
Context Free Grammar (CFG) 34-8
control 123-6
copular constructions 66
copy theory of movement 28
copying account 29-30
core representations 113-14
Crain, S. and M. Nakayama 26fn, 51
creativity 232
Cromer, R.F. 149-53, 158-60
C-T relation 62-4
cue-based learning approach 173
CV hypothesis 111-12

Dapretto, M. and S. Bookheimer 87
Davis, Miles 238-239
deaf-blind people
  language acquisition 92-3, 100-101
  Tadoma method 241-65
deaf people 73, 74, 77-8, 98-100, 189
deep vs. surface structures 154-5
Dehaene-Lambertz, G. et al. 71
Dementia of the Alzheimer's Type (DAT) 80, 84-5, 90
Dependency Locality Theory (DLT) 132, 137, 143
derivational morphology 179, 181
Descartes, R. 20
Developmental Sentence Scoring (DSS) 246-7, 251, 254-4, 259-259
Dikker, S. et al. 192
disciplines, combining 216-18
discovery procedures 31
domain-general knowledge 19, 20-21, 33-4
domain-specific knowledge 19, 21, 26, 33-4, 68-9, 78-9
Dostoyevsky, F.M. 227, 229-230
Down Syndrome 76-7
Dresher, E. 174
Dutch
  auxiliary inversion 51
  stress acquisition 173, 177-8
dynamic antisymmetry 66
dyslexia 212-13

Early Left Anterior Negativity (ELAN) 71-2, 87
education
  arts integration 227-40
  reading 216-18
  spelling 198-199, 208-209
Elicited Object Relative 121
embedded can 62
Emmorey, K. 73
English
  as morphophonemic language 215
  orthography 203, 204, 214, 224-5
English Lexicon Project 176, 181
EPEC corpus 137
equipotentiality 189
Erdocia, K. et al. 134, 135
ergative languages, subject-object asymmetries 132-46
evaluation measure 170


Event Related Potential (ERP) 71-2, 87-8, 136-7
exceptions, accounting for 169-70
experience, variation in 188-91
Extended Projection Principle (EPP) 66, 186-7
External Merge (EM) 28-9, 64-5
external stimuli 20
extraction islands 58
facial cognition 89
Ferreira, F. and C.J. Clifton 141
Fikkert, P. 177
finite list grammar 35-6
Finite-State Grammar (FSG) 34, 35-6, 54
Finnish, spelling 206
fMRI imaging 26fn, 87-8, 189
Fonteneau, E. and H.K.J. van der Lely 87
Frazier, L. and J.D. Fodor 139
Fregoli syndrome 89
French
  combien 117
  phonological processing 71
  spelling 206
frequent syllables, preference for 109-11
Friedmann, N. 116
  and A. Belletti, and L. Rizzi 116, 119-20
  and R. Novogrodsky 86

function words vs. content words 109-11

Gallas, K. 228
Gallistel, C.R. and A.P. King 108
Galvan 78
GAP features 60
garden path effects 140-141
Gardner, H. 228
genetics of language 86-87
Genie case 68-9
German
  tensed verbs 23
  Tolerance Principle 170-171
Gervain, J. et al. 109
gestaltism 185
Goethe, J.W. von i
Goldberg, M. 234
Gomez, R.L. and L.A. Gerken 54
GORT reading fluency 219
grammar, dissociation from non-linguistic cognition 72-80
grammatical errors, recognition of 72
grammatical impairments (G-SLI) 78-9, 85-6
Grillo, N. 120
Gutierrez, M.J. 138
H98 system 180-182
Hale, K. and S.J. Keyser 125
Halle, M. 176, 179
  and J.-R. Vergnaud 176, 179
handedness 188-91
Harris, Z. 31
Hart, Mickey 229
Head-driven Phrase Structure Grammar (HPSG) 59-60
Hebrew, object relatives 119-20
Henderson, E.H. and J. Beers 209
heritability of language 79
heritability of language impairments 79
Hicks, G. 161-7
hierarchical structure 35-8
Hirsch, C. and K. Wexler 153fn, 158, 159, 166
Hochmann, J.-R.
  and S. Benavides-Varela, M. Nespor, and J. Mehler 112-13
  and A.D. Endress, and J. Mehler 109-11
Hodes, Art 229
Home Sign systems 99
Hume, David 92
HV87 system 180
hydrocephalus 76
hypothesis-testing model 185, 186, 187
idiosyncratic patterns 169
Inclusion configuration 118
Indefrey, P. et al. 88
innateness 19, 37
intelligence 79


intelligence tests 244
Internal Merge (IM) 28-9, 62, 64, 65-6
interpretation of sentences 22-6
  Merge 27-31
  probability 34-5
intervention effect 115-26
intracranial electrophysiological (ICE) recordings 87
invented spelling 193-209, 220-6, 232-9
invention 231-2
  vs. 'misconceptions' 230-1
Irish, tensed verbs 23
island constraints 26
Italian
  infants' preference for frequent syllables 109-10
  passive 121-3
  wh-phrases 117
Jabberwocky (Carroll) 98
Jacquemot, C. et al. 71
Japanese
  infants' preference for frequent syllables 109
  phonological processing 71
  relative clauses 129
Kam, X.N.C. 39, 46
  et al. 39-40
Keenan, E.L. and B. Comrie 129-30
Keller, Helen 92
Kempler, D. 84-5
Kim, M., B. Landau, and C. Phillips 97
Klinefelter's Syndrome (KS) 74, 82-4
knowledge in linguistic disciplines 214-18
Kratzer, A. 159

labeling 65-7
Landau, B. and L.R. Gleitman 100
language acquisition
  in blind children 92-5
  consonants vs. vowels 111-13
  in deaf children 98-100
  in deaf-blind children 92-3, 100-101
  preference for frequent syllables 109-11
  stress system 168-69, 172-83
  theories 108-9, 113-14
learning devices
  artificial 44-5
  and target languages 43
Lee, L.L. 246
Legate, J.A. and C. Yang 180
Lely, H.K.J. van der 78, 86
lexical acquisition vs. syntactic acquisition 188
lexical decomposition 125
linguistic cognition vs. non-linguistic cognition, dissociation 72-80
Linguistic Savants 77
little modularity (LMod) 68, 80-90
Luria, A. 188
Mannheim corpus 170-171
McGuire, P.K. et al. 72-3
meanings, multiple 214-15
Mehler, J., M. Peña, M. Nespor, and L.L. Bonatti 111
Merge 27-31, 41, 62, 64-6
metrical stress acquisition 168-70, 172-83
Miller, G. et al. 186
Minimal Computation (MC) 62-7
Minimal Distance Principle (MDP) 96-7, 115, 124
minimal processing 139-40
Minimalist program 187
'misconceptions' vs. invention 230-231
Mismatch Negativities (MMNs) 70, 72
Moats, L.C. 215
'modal' brain 72
modularity 68, 69-90
Montessori, M. 198, 233
Moro, A. 66
morphological acquisition 196, 203-4
morphological knowledge 215
morphological markedness 138, 142, 144
morphology
  role in stress system 176
  of spelling 223-6


Move operation 81
Multiple Intelligences (MI) 228
Munaro, N. 118fn
music 229, 239
music cognition 90
nasals, omission in children's spelling 197-198, 200-2
natural language with artificial learners 44-5
natural law 20, 41
Navajo writing system 224-4
Nazzi, T. et al. 112
Nespor, M., M. Peña, and J. Mehler 111
neurological differences in handedness 189-90
neurology of language 69-72, 87, 90
Newman, A.J. et al. 88
n-gram models 46-60
nominative languages, subject-object asymmetries 138
Norton, S.J. et al. 242
no-tampering condition (NTC) 27, 29fn, 64fn
null operators 161-3
number cognition 77-8
object control 124
object relatives, difficulty with 116-26
object vs. subject
  adjectives 149-54
  in relative clauses 128-46
observable world items 102-4
Oddball paradigms 70-71
Optimality Theory (OT) 173
order-type verbs 124-6
orthography 203, 204, 214, 224-5
O-type adjectives 149-53

pairings 22-6
paradigmatic gaps 171
passive 120-123
  development 157-60
Perani, D. et al. 90
Pereira, F. 52
Perfors, A., J. Tenenbaum and T. Regier 33-8
Perspective Shift Hypothesis 130
Pesetsky, D. 123fn
Petitto, L.A. et al. 71
Phase Impenetrability Condition (PIC) 155, 162
PHAST program 217-18
Phillips, C. et al. 70
phonological acquisition 172, 196-198, 214
phonological processing vs. acoustic processing 70-71
phonology of invented spelling 220-5
phrase boundary markers 191
phrase structure (PS) information 54-60
Phrase Structure Grammar (PSG) 41, 64-7
Piaget, J. 199, 232
plurality 134-4
polar interrogatives 20-21, 22-3
  with subject relative clauses (PIRCs) 39, 46, 48, 50-53, 56, 59
Polinsky, M., C. Gomez-Gallo, E. Kravtchenko, and Y. Testelets 144
Pollo, T.C., B. Kessler, and R. Treiman 205-6
polysemy 214-15
Portuguese spelling 206-7
postnominal relative clauses 142-4
Poverty of the Stimulus (POS) 2, 19-21, 25, 41-2, 187
  structure dependence of rules 36-8
  trigram approach 39-41
Poverty of the Stimulus Revisited (POSR) 61-3
predicate-internal subjects 66
preference for frequent syllables 109-11
primary stress placement 179-81
probability
  in sentence interpretation 34-5
  transitional 45-9
problem-solving model 185-8
Produced (Subject) Relative 121
productivity 170-172, 174-5, 180-181
prominence features 142
promise verbs 124-6, 184


psych verbs 122-3
Pulvermüller, F. et al. 72
quantity-sensitivity 173-4, 177, 178
question formation 46-60
Quine, W. 91
raising 25-6, 28, 41-2
RAVE-O intervention 216-18
Ravitch, D. 239
Read, C. 196-198, 199-201, 202, 207, 208-10, 233
reading acquisition 190-192, 211-18
reading experiments 136, 143
reading tasks for KS subjects 82-4
ReadSmart 191-2
Reali, F. and M.H. Christiansen 38-41, 45-8
regular grammar 35-6
relative clauses
  as extraction islands 58
  subject-object asymmetries 128-46
Relativized Minimality 115-26, 130-131
reorganizational principle 153-5
repetition of vowels and consonants 112-13
representational approach to learning 108
Rizzi, L. 115-16
Rosenbaum, P.S. 115
rule hypotheses 25-6
Saffran, J.R. and D.P. Wilson 54
Sag, I.A. et al. 59
Sahin, N. et al. 87
Santi, A. and Y. Grodzinsky 88
savants 77
Schaller, S. 78
Schultz, M.C. et al. 242
selective impairment 80-81, 85, 89
semantic competence 95-104
  in blind children 94-5
semantic knowledge tests 245-5
sign language 99-100, 189, 243
  processing 72-3, 74-6
Simple Recurrent Networks (SRNs) 39, 53fn

smuggling 121, 122, 124
Spanish spelling 206
spatial cognition 72-5, 80, 89
Special-Purpose linguistic tests 245, 248-50, 257-59, 266-70
Specific Language Impairment (SLI) 78-9, 85-6
speech in deaf-blind people 263-4
speechreading 241-66
spelling 211
  invented 196-209, 220-6, 232-40
spina bifida 76
Sportiche, D. and I. Roberts 123
Stanford-Binet Intelligence Scale 244, 256-7
Starke, M. 118
Stenquist, G. 243
stress acquisition 168-70, 172-82
  in deaf-blind people 262-3
strings, weakly generated 36
Stromswold, K. 86
Structural Distance Hypothesis 130, 131
structural tests for syntactic and semantic knowledge 245-6
structure-dependence 26, 36-8, 63
structure-independent rules 26
struggling readers 213, 216-18
S-type adjectives 149-53
subject vs. object
  adjectives 149-52
  in relative clauses 128-46
subject control 123-6
subject relative clauses 26
substitutability 31-3
successive-cyclic movement 30fn
surface forms 186
surface vs. deep structures 154-5
Syntactic Abilities test 245-6, 248, 257
syntactic ability in deaf-blind people 261-2
syntactic acquisition 43, 45-6, 54, 95-104
  beyond five years 211
  vs. lexical acquisition 188
  as problem-solving 185-6

syntactic knowledge 216


tactile aids for speech communication 241-66
Tadoma method 92, 241-66
Takahashi, E. and J. Lidz 44
target languages
  with artificial learners 44-5
  and learning devices 43
teaching
  arts integration 227-41
  to read 216-20
  to spell 198-199, 208-209
Tesar, B. and P. Smolensky 173
Tolerance Principle 170-173, 174-5
Toro, J.M., M. Nespor, J. Mehler, and L.L. Bonatti 112
Torrego, E. 30fn
tough-construction (TC)/tough-movement (TM) 146-53, 156-67
transformational framework 59
transition probabilities (TPs) 45-9, 111
Treiman, R. 200-202, 204
  and M. Cassar, and A. Zukowski 203
trigram approach 39-41, 46-7, 52-3
Turner's syndrome (TS) 73-4
twin studies 86-7
typological bootstrapping 97
Universal Grammar (UG) 19, 168, 174-6
Universal Phase Requirement (UPR) 145, 155, 156-59, 166-7

variation in experience 188-90
Varnhagen, C.K. 209
Veblen, T. 226
verbal passive, development 157-60
verbs
  irregular 171
  syntactic behavior 98
visual cognition 89
vocabulary 215
  in deaf-blind people 260
vowels
  vs. consonants 111-13
  spelling 223
V-raising 25, 30fn, 41-2
WC stimuli 70
weak substitutability 32
weakly generated strings 36
Wechsler Adult Intelligence Scale (WAIS) 244
Weir, R. 187
Wexler, K. 145, 154fn, 156, 158
wh-phrases 117-18, 154
word order rules, acquisition 54
'word poverty' 215
Word Webs 217
word-world pairing 91-2
writing
  children's 193-209
  by deaf-blind people 264
  systems 203, 204, 214, 224-7
X chromosome 82
Yamada, Y. and H. Neville 8
Yang, C. 170, 171, 174
yes-no questions 20-21, 22-3, 148-50
Yetano, I. 137
  et al. 142
Young, R.W. and W. Morgan Sr 224-6