9.012 Brain and Cognitive Sciences II
Part VIII: Intro to Language & Psycholinguistics
- Dr. Ted Gibson
Nathan Wilson
"Distributed Representations, Simple Recurrent Networks, and Grammatical Structure"
Jeffrey L. Elman (1991), Machine Learning
Distributed Representations / Neural Networks
• are meant to capture the essence of neural computation:
many small, independent units calculating very simple functions in parallel.
Why Apply Network / Connectionist Modeling to Language Processing?
• Connectionist modeling is good at what it does
• Language is a HARD problem
What We Are Going to Do
• Build a network
• Let it learn how to "read"
• Then test it!
– Give it some words in a reasonably grammatical sentence
– Let it try to predict the next word,
• based on what it knows about grammar
– BUT: we're not going to tell it any of the rules
[Figure: three-layer feedforward network: one-hot INPUT layer, HIDDEN layer, one-hot OUTPUT layer]
Methods > Network Implementation > Structure
Methods > Network Implementation > Training
Words We're Going to Teach It:
- Nouns: boy | girl | cat | dog | boys | girls | cats | dogs
- Proper Nouns: John | Mary
- "who"
- Verbs: chase | feed | see | hear | walk | live | chases | feeds | sees | hears | walks | lives
- "End Sentence"
Methods > Network Implementation > Training
1. Encode each word with a unique activation pattern
- boy => 000000000000000000000001
- girl => 000000000000000000000010
- feed => 000000000000000000000100
- sees => 000000000000000000001000
. . .
- who => 010000000000000000000000
- End sentence => 100000000000000000000000
2. Feed these words sequentially to the network
(only feed words in sequences that make good grammatical sense!)
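Step 2 can be sketched with a toy sentence generator. The templates and word subsets below are hypothetical stand-ins for Elman's actual phrase-structure grammar, restricted here to transitive verbs so every sentence is grammatical:

```python
import random

# Toy generator for step 2: stream words to the network only in
# grammatical orders, with subject-verb number agreement enforced.
SINGULAR = ["boy", "girl", "cat", "dog", "John", "Mary"]
PLURAL = ["boys", "girls", "cats", "dogs"]
VERBS_SG = ["chases", "feeds", "sees", "hears"]
VERBS_PL = ["chase", "feed", "see", "hear"]

def make_sentence(rng):
    """Subject, number-agreeing verb, object, end-of-sentence marker."""
    if rng.random() < 0.5:
        subj, verb = rng.choice(SINGULAR), rng.choice(VERBS_SG)
    else:
        subj, verb = rng.choice(PLURAL), rng.choice(VERBS_PL)
    obj = rng.choice(SINGULAR + PLURAL)
    return [subj, verb, obj, "END"]

rng = random.Random(0)
stream = [w for _ in range(3) for w in make_sentence(rng)]
print(stream)  # words arrive one at a time, in grammatical order
```

The network never sees sentence boundaries except via the "END" marker; it just receives one word per time step.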
If learning word relations, it needs some sort of memory from word to word!
[Figure: simple recurrent network: INPUT, HIDDEN, and OUTPUT layers, plus a CONTEXT layer that stores a copy of the previous hidden activations and feeds it back into the hidden layer]
Methods > Network Implementation > Structure
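A minimal sketch of this architecture, assuming a 24-word vocabulary and an arbitrary hidden-layer size: the context layer is just a copy of the previous hidden activations, fed back alongside the next word.

```python
import numpy as np

# Minimal simple-recurrent-network forward pass. Sizes and weight
# scales are illustrative, not Elman's exact choices.
N_WORDS, N_HIDDEN = 24, 16
rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.1, (N_HIDDEN, N_WORDS))    # input -> hidden
W_ctx = rng.normal(0, 0.1, (N_HIDDEN, N_HIDDEN))  # context -> hidden
W_out = rng.normal(0, 0.1, (N_WORDS, N_HIDDEN))   # hidden -> output

def step(x, context):
    """One time step: combine the current word with the stored context,
    then hand the new hidden state back to serve as the next context."""
    hidden = np.tanh(W_in @ x + W_ctx @ context)
    output = W_out @ hidden
    return output, hidden  # new hidden becomes next step's context

context = np.zeros(N_HIDDEN)
for t in range(3):                        # feed three (random) words
    x = np.eye(N_WORDS)[rng.integers(N_WORDS)]
    out, context = step(x, context)
print(out.shape)  # one score per word in the vocabulary
```

The copy-back connection is what gives the network its word-to-word memory: the hidden state at time t depends on the input at t and, through the context, on everything that came before.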
BACKPROP!
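One training step can be sketched as ordinary single-step backprop in Elman's style: the context activations are treated as a fixed extra input, so no unrolling through time is needed. Sizes, learning rate, and the softmax output are illustrative assumptions:

```python
import numpy as np

# One BACKPROP step for a simple recurrent network: predict the next
# word, then nudge the weights toward the word that actually came next.
N_WORDS, N_HIDDEN, LR = 24, 16, 0.1
rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.1, (N_HIDDEN, N_WORDS))
W_ctx = rng.normal(0, 0.1, (N_HIDDEN, N_HIDDEN))
W_out = rng.normal(0, 0.1, (N_WORDS, N_HIDDEN))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step(x, context, target):
    """Forward pass, cross-entropy loss, and one gradient update.
    The context is treated as a plain input (no unrolling)."""
    global W_in, W_ctx, W_out
    h = np.tanh(W_in @ x + W_ctx @ context)
    p = softmax(W_out @ h)
    loss = -np.log(p[target])
    d_out = p.copy(); d_out[target] -= 1.0   # dL/d(logits)
    d_h = (W_out.T @ d_out) * (1 - h ** 2)   # back through tanh
    W_out -= LR * np.outer(d_out, h)
    W_in  -= LR * np.outer(d_h, x)
    W_ctx -= LR * np.outer(d_h, context)
    return loss, h

# Repeatedly showing one (word, next-word) pair drives the loss down.
x = np.eye(N_WORDS)[0]
losses = [train_step(x, np.zeros(N_HIDDEN), 5)[0] for _ in range(20)]
print(losses[0] > losses[-1])
```

Because the error signal stops at the context layer, this is cheaper than full backpropagation through time, yet, as the results below show, it still lets the network pick up agreement and dependency structure.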
- After hearing: "boy…"
- Network SHOULD predict next word is: "chases"
- NOT: "chase"
Subject and verb should agree!
Results > Emergent Properties of Network > Subject-Verb Agreement
Results > Emergent Properties of Network > Noun-Verb Agreement
[Figure: bar chart of output activations (0.0 to 1.0) by word class ("What word network predicts is next") after input "boy…". Word classes: Single Noun, Plural Noun, "Who", Single Verb (DO Optional / Required / Impossible), Plural Verb (DO Optional / Required / Impossible), End of Sentence.]
- Likewise, after hearing: "boys…" (or "boyz"!)
- Network SHOULD predict next word is: "chase"
- NOT: "chases"
Again, subject and verb should agree!
Results > Emergent Properties of Network > Noun-Verb Agreement
[Figure: same bar chart of output activations by word class, now after input "boys…"]
There's a difference between nouns and verbs. There are even different kinds of nouns that require different kinds of verbs.
- After hearing: "chase"
- Network SHOULD predict next word is: some direct object (like "boys")
- NOT: "."
Hey, if a verb needs an argument, it only makes sense to give it one!
Results > Emergent Properties of Network > Verb-Argument Agreement
- Likewise, after hearing the verb: "lives"
- Network SHOULD predict next word is: "."
- NOT: "dog"
If the verb doesn't make sense with an argument, it falls upon us to withhold one from it.
Results > Emergent Properties of Network > Verb-Argument Agreement
[Figure: bar chart of output activations by word class after input "boy chases…"]
Results > Emergent Properties of Network > Verb-Argument Agreement
[Figure: bar chart of output activations by word class after input "boy lives…"]
Results > Emergent Properties of Network > Verb-Argument Agreement
There are different kinds of verbs that require different kinds of nouns.
- After hearing: "boy who mary chases…"
- Network might predict next word is: "boys", since it learned that "boys" follows "mary chases"
- But if it's smart, it may realize that "chases" is linked to "boys", not "mary", in which case you need a verb next, not a noun!
A good litmus test for some intermediate understanding?
Results > Emergent Properties of Network > Longer-Range Dependence
Results > Emergent Properties of Network > Verb-Argument Agreement
[Figure: bar chart of output activations by word class after input "boys who Mary…"]
Results > Emergent Properties of Network > Subject-Verb Agreement
[Figure: bar chart of output activations by word class after input "boys who mary chases…"]
Results > Emergent Properties of Network > Subject-Verb Agreement
[Figure: bar chart of output activations by word class after input "boys who mary chases feed…"]
Results > Emergent Properties of Network > Subject-Verb Agreement
[Figure: bar chart of output activations by word class after input "boys who mary chases feed cats…"]
Did the Network Learn About Grammar?
• It learned there are different classes of nouns that need singular and plural verbs.
• It learned there are different classes of verbs that have different requirements in terms of direct objects.
• It learned that sometimes there are long-distance dependencies that don't follow from immediately preceding words => relative clauses and constituent structure of sentences.
Once You Have a Successful Network, You Can Examine Its Properties with Controlled I/O Relationships
• Boys hear boys.
• Boy hears boys.
• Boy who boys chase chases boys.
• Boys who boys chase chase boys.
[Figure (recap): the simple recurrent network: INPUT, HIDDEN, OUTPUT, and CONTEXT layers, trained with BACKPROP]
Methods > Network Implementation > Structure
What Does It Mean, "No Explicit Rules"?
• Does it just mean the mapping is "too complicated"?
• "Too difficult to formulate?" "Unknown?"
• Possibly just our own failure to understand the mechanism, rather than a description of the mechanism itself.
General Advantages of Distributed Models
• Representation is distributed, which, while not limitless, is less rigid than models with a strict mapping from concept to node.
• Generalizations are captured at a higher, more abstract level than the input, so generalization to new input is possible.