
Introduction to Language Acquisition Theory
Janet Dean Fodor
St. Petersburg, July 2013

Class 2. From computer science (then) to psycholinguistics (now)

Syntax acquisition as parameter setting

Like playing "20 Questions". The learner's task is to detect the correct settings of the finite number of parameters.

Headedness parameter: Are syntactic phrases head-initial (e.g., in VP, the verb precedes its object) or head-final (the verb follows its object)?

Wh-movement parameter: Does a wh-phrase move to the top of a clause, or does it remain in situ?

Parameter values are 'triggered' by the learner's encountering a distinctive, revealing property of an input sentence.

This Principles-and-Parameters approach has been retained through many subsequent changes in TG theory.

It greatly reduces a learner's data-processing workload. It helps address the Poverty of Stimulus problem.
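To make the "20 Questions" picture concrete, here is a minimal illustrative sketch (not from the original slides; the parameter names and values are hypothetical examples) of a grammar represented as a finite vector of binary parameter settings:

```python
# Illustrative only: a grammar as a finite vector of binary parameter settings.
# The parameter names and the target values shown are hypothetical examples.
TARGET_GRAMMAR = {
    "head_initial": True,    # e.g., in VP the verb precedes its object
    "wh_movement": True,     # wh-phrases move to the top of the clause
    "null_subject": False,   # subjects may not be silently omitted
}

# On this view, learning a syntax is just fixing each of these finitely many values.
```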


Parameter setting as flipping switches

Chomsky never provided a specific implementation of parametric triggering. He often employed a metaphor of setting switches. (Chomsky 1981/1986)

The metaphor suggests that parameter setting is:

› Automatic, instantaneous, effortless: no linguistic reasoning is required of the learner. (Unlike hypothesis-formation models.)
› Input-guided (no trial-and-error process).
› A universal mechanism, but leading reliably to language-specific parameter settings.
› Non-interacting parameters: each can be set separately.
› Each parameter has unambiguous triggers, recognizable regardless of what else the learner does or doesn't know about the language.
› Deterministic learning: fully accurate, so no revision is ever needed.

A wonderful advance, if true – if psychologically feasible!


But computational linguists couldn't implement it (parameters yes; triggering no)

Syntacticians largely embraced this neat picture.

But as a mechanism, triggering was never implemented. Computational linguists deemed it unfeasible, due to the ambiguity and opacity of would-be triggers in the natural-language domain (Clark 1989). Examples on the next slide.

Only the concept of parameterization was retained: language acquisition is selection of a grammar from a finite set, which is defined by UG (innate principles + innate parametric choices).

The learning process was modeled as a trial-and-error search through the domain of all possible grammars, applying familiar domain-general learning algorithms from computer science.

There is no input guidance toward the correct grammar; input serves only as feedback on hypotheses selected partly at random.


Why doesn't instant triggering work?

Input ambiguity: e.g., Exceptional Case Marking (Clark 1989)

We consider him to be clever. (Is accusative case assigned by ECM or by the infinitive?)
I consider myself to be clever. (Long-distance anaphora?)

Derivational opacity: e.g., Adv P not Verb Subj.

This entails -NullSubj. Why?! Because P with no object must be due to obj-topicalization, then topic-drop. +NullTop entails -NS.

Conclusion: It's impossible or impractical to recognize the parameter values from the surface sentence. Learners have to guess. (Counter-argument in Classes 6 & 7.)

Also, classic triggering mis-predicts child data (Yang 2002): children's grammar changes are gradual; they must be contemplating two or more (many?) grammars simultaneously.


Trial-and-error domain-search methods: under-powered or over-resourced

Genetic algorithm. Clark & Roberts (1993)
Test many grammars, each on many sentences; rank them, breed them, repeat, repeat. (Over-resourced)

Triggering Learning Algorithm. Gibson & Wexler (1994)
Test one grammar at a time, on one sentence. If it fails, change one parameter at random. (Under-powered; fails often, slow. See the sketch after this list.)

Variational Model. Yang (2000) – next slide
Give the TLA a memory for the success rate of each parameter value. Test one grammar at a time, but sample the whole domain.

Bayesian Learner. Perfors, Tenenbaum & Regier (2006)
Test all grammars on the total input sample. Adopt the one with the best mix of simplicity and goodness of fit. (Over-resourced)
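To make the flavor of such a search procedure concrete, here is a minimal Python sketch of a TLA-style update. It is only an illustration, not Gibson & Wexler's implementation: the parameter names and the parse_succeeds function are hypothetical stand-ins.

```python
import random

# Hypothetical binary parameters, for illustration only.
PARAMETERS = ["head_initial", "wh_movement", "null_subject"]

def tla_step(grammar, sentence, parse_succeeds):
    """One TLA-style step: keep the grammar if it parses the sentence;
    otherwise flip one randomly chosen parameter, and adopt the flip only
    if the flipped grammar parses the sentence (greediness)."""
    if parse_succeeds(grammar, sentence):
        return grammar
    p = random.choice(PARAMETERS)                       # single-value change
    candidate = dict(grammar, **{p: not grammar[p]})
    return candidate if parse_succeeds(candidate, sentence) else grammar
```

The "under-powered" verdict is visible here: each input yields only a can/cannot-parse signal, and the single random flip makes progress slow and allows repeated backtracking.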


Variational Model's memory for how well each P-value has performed

[Diagram: for each parameter – Head-direction, Null subject, WH-movement, etc. – a weight between 0 and 1 records how well each of its two values has performed so far.]

Test one grammar at a time. If it succeeds, nudge the pointer for each parameter toward the successful P-value. If the grammar fails, nudge the pointers away from those P-values.

Select a grammar to test next, with probability based on the weights of its P-values.
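Here is a minimal Python sketch of how such a reward/punish memory could work. It is an illustration under assumptions, not Yang's actual code: the parameters, the learning rate GAMMA, and the parse_succeeds function are placeholders, and the update shown is a simple linear reward-penalty scheme.

```python
import random

GAMMA = 0.01  # learning rate; the value is assumed, purely for illustration

# One weight per binary parameter: the probability of choosing the value True.
weights = {"head_initial": 0.5, "wh_movement": 0.5, "null_subject": 0.5}

def sample_grammar():
    """Select a grammar to test, with probability based on the current weights."""
    return {p: random.random() < w for p, w in weights.items()}

def vm_step(sentence, parse_succeeds):
    """Test one sampled grammar on one sentence, then nudge every weight
    toward (on success) or away from (on failure) the values just used."""
    grammar = sample_grammar()
    success = parse_succeeds(grammar, sentence)
    for p, used_true in grammar.items():
        w = weights[p]
        reward_true = used_true if success else not used_true
        weights[p] = w + GAMMA * (1 - w) if reward_true else w - GAMMA * w
```

Note that every parameter's weight is adjusted on every trial, whether or not that parameter was relevant to the sentence – a point taken up in the critique below.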


Varieties of domain-search, illustrated

Think Easter egg hunt. The eggs are the parameter values, to be found. The search domain is the park.

Genetic Algorithm: Send out hordes of searchers, compare notes.

Triggering Learning Algorithm: A lone searcher, following her own nose, in small steps: "getting warmer".

Variational Model: Mark findings/failures on a rough map to focus the search; occasionally dash to another spot to see what's there.

Compare these with decoding: First consult the sentence! Read a clue, decipher its meaning, go where it says; the egg is there.


Varieties of domain-search, illustrated

GA: Send out hordes of searchers, compare notes. (Vast effort)

TLA: A lone searcher, following her own nose, in small steps: "getting warmer". (Slow progress)

VM: Mark findings/failures on a rough map; occasionally dash to another spot to see what's there. (Still a needle in a haystack)


Yang's VM: the best current search model

It can learn from every input sentence.

The choice of a grammar to try is based on its track record.

But there is no decoding, so it extracts little information per sentence: only can/cannot parse, not why, or what would help.

It can't recognize unambiguity.

It is non-deterministic: parameters may swing back and forth between their two values repeatedly.

Inefficiency increases with the size of the domain, perhaps exponentially (especially if the domain is not 'smooth'). See the illustration after this list.

Yang's simulations and ours agree: the VM consumes an order of magnitude more input than decoding models.
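For a sense of scale (the parameter counts here are chosen purely for the arithmetic, not taken from the slides): with n independent binary parameters the domain contains 2^n candidate grammars, so 10 parameters give 1,024 grammars, 20 give about a million, and 30 give about a billion. A learner that merely samples this space, with only a parse/no-parse signal per sentence to steer by, faces a rapidly growing haystack.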


Is VM plausible as psychology?

The VM improves on the TLA, achieving more effective search with modest resources. And it avoids getting permanently trapped in a wrong corner of the domain ('local minimum').

But it has some strange, un-human-like(?) properties:

Irrelevant parameter values are rewarded / punished, e.g., preposition-stranding in a sentence with no prepositions.

Without decoding, the VM can't know which parameters are relevant to the input sentence.

To explore, it tests some grammars that are NOT highly valued at present. So the child will often fail to parse a sentence, even if her currently best grammar can parse it!

Exploring fights normal language use.

What's more psychologically realistic?

A crucial aspect of the VM is that even low-valued grammars are occasionally tried out on input sentences. But is this what children do?

When a toddler hears an utterance, what goes on in her brain? Specifically: what grammar does she try to process the sentence with? Surely, she'd apply her currently 'highest-valued' grammar. Why would she use one that she believes to be wrong?

A low-valued grammar would often fail to deliver a successful parse of the sentence. When it fails, the child doesn't (linguistically) understand the sentence – even if it's one she understood yesterday and it is generated by her current 'best' grammar!

CUNY's alternative: Learning by parsing

This is a brief preview. We'll go into more detail in Class 7.

A child's aim is to understand what people are saying.

So, just like adults, children try to parse the sentences they hear. (Assign structure to the word string; semantic composition.)

When the child's grammar licenses an input, her parsing routines function just as in adult sentence comprehension.

When the sentence lies beyond her current grammar, the parsing mechanism can process parts of the sentence but not all. It seeks a way to complete the parse tree. (Not just yes/no.)

To do so, it draws on the additional parameter values that UG makes available, seeking one that can solve the problem.

If a parameter value succeeds in rescuing the parse, that means it's useful, so it is adopted into the grammar. (A schematic sketch of this loop follows below.)
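The following Python sketch is only a schematic rendering of that loop, under assumptions: the grammar is treated as a set of treelets, and try_parse, ug_candidate_treelets, and parse_with_extra_treelet are hypothetical placeholders, not CUNY functions.

```python
def learn_by_parsing(sentence, grammar, try_parse, ug_candidate_treelets,
                     parse_with_extra_treelet):
    """Schematic sketch: parse with the current grammar; on failure, look for a
    UG-supplied treelet that rescues the parse, and adopt it if one is found."""
    tree = try_parse(grammar, sentence)
    if tree is not None:
        return tree, grammar                    # parsed as an adult would; nothing to learn
    for treelet in ug_candidate_treelets():     # reach out to UG's wider pool
        tree = parse_with_extra_treelet(grammar, treelet, sentence)
        if tree is not None:
            return tree, grammar | {treelet}    # the rescuing treelet joins the grammar
    return None, grammar                        # no treelet helps; nothing learned from this input
```

On this picture the learner never samples low-valued grammars at random; it works with its single current grammar and consults UG only at the point where the parse breaks down.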

So a parameter value must be something the parser can use

What a parser (adult or child) really needs is a way to connect an incoming word into the tree structure being built: some linkage of syntactic nodes and branches.

At CUNY we take parameter values to be UG-specified 'treelets' that the parser can use. (Not switch-settings.)

A treelet is a sub-structure of larger sentential trees (typically underspecified in some respects).

Example treelet: a PP node immediately dominating a preposition and a nominal trace. It indicates a positive value for the preposition-stranding parameter (Who are you talking with now? vs. *Avec qui parles-tu maintenant?).
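As a purely illustrative rendering of that example treelet, here is one way such an underspecified piece of structure might be encoded; the Node type and the labels are assumptions, not the notation used in the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Node:
    """A node in a treelet; children=None marks a slot left underspecified."""
    label: str
    children: Optional[Tuple["Node", ...]] = None

# Illustrative treelet for a positive preposition-stranding value:
# a PP immediately dominating a preposition and a nominal trace of a
# fronted wh-phrase, as in "Who are you talking with __ now?"
PREP_STRANDING_TREELET = Node(
    label="PP",
    children=(
        Node(label="P"),      # the preposition itself, e.g. "with"
        Node(label="t_wh"),   # phonologically null trace of the wh-phrase
    ),
)
```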


Children do what adults do: Example

E.g., Which rock can you jump to from here? has a stranded preposition to, with no overt complement. That becomes evident at the word from.

For an adult English speaker, the parsing mechanism has access to a possible piece of tree structure (a 'treelet') which inserts a phonologically null complement to the preposition and links it to a fronted wh-phrase. See the tree diagram.

Now consider a child who already knows wh-movement but not yet preposition stranding (maybe not realistic!). The child's parser would do exactly the same as the adult's, up to the word from.

The child's current grammar offers no means of continuing the parse. It has no treelet that fits between to and from. So it must look and see whether UG can provide one.


In English, a preposition may have a null complement. Learners will discover this, as they parse.

[Tree diagram: the stranded preposition's complement is a phonologically null element (+null), coindexed with the fronted wh-phrase.]

Children must reach out to UG

The child's parser must search for a treelet in the wider pool of candidates made available by UG, to identify one that will fill that gap in the parse tree.

Once found, that treelet would become part of the learner's grammar, for future use in understanding and producing sentences with stranded prepositions.

Summary: In the treelet model, the learner's innate parsing mechanism works with the learner's single currently best grammar hypothesis, and upgrades it on-line just if and where it finds that a new treelet is needed in order to parse an incoming sentence.

A child's processing of sentences differs from an adult's only in the need to reach out to UG for new treelets.


Compared with domain-search systems

In this way, the specific properties of input sentences provide a word-by-word guide to the adoption of relevant parameter values, in a narrowly channeled process.

E.g., what to do if you encounter a sentence containing a preposition without an overt object.

This input-guidance gets the maximum benefit from the information the input contains.

It requires no specifically evolved learning mechanism for language. (But it does need access to UG.)

It makes use of the sentence-parsing mechanism, which is needed in any case – and which is generally regarded as being innate, ready to function as soon as the child knows some words.


Please read before Friday (Class 3)

The 2-page article "Positive and negative evidence in language acquisition", by Grimshaw & Pinker.

On the availability and utility of negative data. The key questions: Does negative evidence exist? Do language learners use it? Do language learners need to?
