Lecture 12: Production (computational approaches) - Stanford University
TRANSCRIPT
Today
1. Prominent theoretical alternative to Availability-based production: Uniform Information Density
2. Looking back: probability in different areas of language processing
Information selection vs. form selection
• Information selection: from the full thought/message/idea/intention the speaker wants to convey, select specific aspects to encode into a sentence (compress).
• Form selection: decide how to distribute the information across a sentence, within the constraints of syntax (distribute). This is the focus of most production research.
Choices at many levels of production
• Utterance level: "Move the triangle to the left." vs. "Select the triangle. Move it to the left."
• Phrasal level: "She gave {him the key / the key to him}"
• Word level: "She already ate (dinner)"; "She stabbed him (with a knife)"; "I read a book (that) she wrote."
• Morphological level: "I've/have gone there."
• Phonological level: t/d-deletion (tha[t] cat), metathesis (ask/aks)
• Phonetic level: speech rate, clarity of articulation
Many factors affect these choices. Let’s investigate.
Availability-based production
Not all production is priming! Many factors affect the choice of syntactic structure. A prominent perspective:
The Principle of Immediate Mention: "Production proceeds more efficiently if syntactic structures are used that permit quickly selected lemmas to be mentioned as soon as possible." (p. 299)
Ferreira & Dell 2000
…because it keeps working memory from getting cluttered with pieces of information speakers are waiting to produce
What determines speed of lemma selection? Ease of retrieval (accessibility), which depends on:
• imageability
• concreteness
• frequency
• predictability
• prior mention
• animacy
Communicating through a noisy channel
(diagram: transmitter → noisy channel → receiver)
Assuming language is an instance of communication through a noisy channel: information density is optimized near the channel capacity, where speakers maximize the rate of information transmission while minimizing the danger of a mistransmitted message.
Shannon 1949, Levy & Jaeger 2007
Speakers should provide more redundancy in the linguistic signal when the message is less inferable.
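Shannon's notion of channel capacity can be made concrete with the simplest noisy channel, the binary symmetric channel, where each bit is flipped with some probability. This sketch is illustrative background, not part of the slides; the function names are my own.

```python
import math

def binary_entropy(p):
    """Entropy H(p) of a biased coin, in bits."""
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(flip_prob):
    """Capacity of a binary symmetric channel: C = 1 - H(p) (Shannon 1949).
    Transmitting faster than C makes mistransmission unavoidable;
    transmitting near C is maximally efficient yet still recoverable."""
    return 1 - binary_entropy(flip_prob)

print(bsc_capacity(0.0))              # noiseless channel: 1 bit per symbol
print(round(bsc_capacity(0.11), 3))   # noisy channel: about 0.5 bits per symbol
```

The point for language: under noise, the usable information rate is strictly below 1 bit per symbol, so some redundancy in the signal is rational, not wasteful.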
Uniform Information Density (UID)
Within the bounds defined by grammar, speakers prefer utterances that distribute information uniformly across the signal (information per unit time). Where speakers have a choice between several variants to encode their message, they prefer the variant that results in more uniform information density.
Levy & Jaeger 2007; Jaeger 2010
…because it allows for efficient communication, minimizing effort and error on the listener's side
Efficient morpho-syntactic production (Frank & Jaeger 2008)
Information content of NOT
"Pres. Clinton {didn't / did not} have…"
(plot: likelihood of each form as NOT becomes more surprising/less predictable in context)
• contracted form (n't): less signal
• full form (not): more signal
Estimating the information carried by a contractible element
"Clinton did NOT have…"   (context: w₋₂ w₋₁; target: w)
I(NOT | context)
= −log p(NOT | context)   [definition of Shannon information]
≈ −log p(NOT | "Clinton did")
= −log[ p("not" | "Clinton did") + p("n't" | "Clinton did") ]
A trigram model was used to estimate the probability; I(NOT | context) is the same as surprisal.
Frank & Jaeger 2008
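The derivation above can be run on toy data. The counts below are invented for illustration (Frank & Jaeger estimated theirs from corpus data); the point is that the information of the abstract element NOT sums the probabilities of both surface forms before taking the log.

```python
import math
from collections import Counter

# Hypothetical trigram counts (illustrative only, not real corpus data).
trigram_counts = Counter({
    ("clinton", "did", "not"): 3,
    ("clinton", "did", "n't"): 12,
    ("clinton", "did", "have"): 5,
})

context = ("clinton", "did")
context_total = sum(n for (w2, w1, _), n in trigram_counts.items()
                    if (w2, w1) == context)

def p(word, ctx):
    """Trigram (maximum-likelihood) probability p(word | w-2 w-1)."""
    return trigram_counts[(*ctx, word)] / context_total

# I(NOT | context) = -log2[ p("not"|ctx) + p("n't"|ctx) ]:
# NOT is realized by either surface form, so their probabilities add.
info_NOT = -math.log2(p("not", context) + p("n't", context))
print(round(info_NOT, 3))  # -log2(0.75) ≈ 0.415 bits
```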
Surprisal
• Certain events (P = 1): 0 information (unsurprising)
• Impossible events (P = 0): infinite information (highly surprising)
• Equi-probable events (P = 0.5): 1 bit
(plot: surprisal as a function of probability, decreasing from ∞ at P = 0 to 0 at P = 1)
MacKay, David J. C. Information Theory, Inference, and Learning Algorithms
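The endpoints above follow directly from the definition of surprisal; a minimal sketch:

```python
import math

def surprisal(p, base=2):
    """Shannon information -log(p): 0 bits for certain events,
    growing without bound as p approaches 0."""
    return -math.log(p, base)

print(surprisal(1.0))    # certain event: 0 bits
print(surprisal(0.5))    # equi-probable binary event: 1 bit
print(surprisal(0.25))   # 2 bits
```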
Replicated for {WAS, WERE, AM, ARE, IS, WILL} and {HAD, HAS, HAVE}
The longer (full) form is more likely to be used the more surprising the contractible element is
Back to complement clauses
Jaeger 2010
My boss confirmed [∅ we were absolutely crazy]   (complement clause with null onset)
My boss confirmed [that we were absolutely crazy]   (complement clause with complementizer)
Can we predict complementizer omission?
POLL
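UID's prediction can be sketched numerically. All probabilities below are invented for illustration: if the onset of the complement clause would be very surprising right after the verb, inserting "that" spreads that information over two words and lowers the peak information per word.

```python
import math

def bits(p):
    """Surprisal in bits."""
    return -math.log2(p)

# Hypothetical conditional probabilities (illustrative only):
p_onset_given_verb = 0.05  # p(clause-initial word | "confirmed"), no "that"
p_that_given_verb = 0.60   # p("that" | "confirmed")
p_onset_given_that = 0.50  # p(clause-initial word | "confirmed that")

# Without "that": the clause-onset information lands on a single word.
peak_without_that = bits(p_onset_given_verb)
# With "that": the same information is distributed over two words.
peak_with_that = max(bits(p_that_given_verb), bits(p_onset_given_that))

# UID predicts the complementizer when it lowers the information peak.
print(peak_without_that > peak_with_that)
```

Under these numbers the peak drops from about 4.3 bits to 1 bit, so UID would favor producing "that"; with a highly predictable clause onset the comparison reverses and omission is preferred.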
Back to Availability-based production (Bock & Warren 1985)
BREAKOUT SESSION
What would UID predict for the choice between the prepositional dative and the double-object structure?
Looking back: probability in different areas of language processing
BREAKOUT SESSION
Think back to previous classes (e.g., on language acquisition, speech perception, word recognition, sentence processing):
1. Come up with (at least) 2 examples where probability played an important role, either in a particular experimental finding or in a theory.
2. Based on the examples you came up with and today's content: are there generalizations you can draw about the role of probability in language processing?
Brains as prediction machines engaged in error minimization
Clark 2013
(diagram, described:)
• Top-down information contains a prediction, which guides processing.
• The prediction predicts the upcoming linguistic unit (phoneme, e.g. /b/; word, e.g. "task"; structure, e.g. DO) and yields an expected input.
• Bottom-up information contains the linguistic unit actually encountered, which yields the observed input.
• Integration of the prediction with the bottom-up input guides processing and yields an ERROR SIGNAL.
• The error signal updates the prediction.
• Goal: minimize the error signal.
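The predict-integrate-update loop in the diagram can be caricatured in a few lines. This is a generic error-driven update with invented numbers, not a model from Clark 2013:

```python
# Minimal sketch of the loop: prediction meets bottom-up input,
# their mismatch is the error signal, and the error updates the prediction.
prediction = 0.2      # initial predicted evidence for a unit (e.g. /b/)
learning_rate = 0.5   # how strongly the error signal updates the prediction

for bottom_up in [0.9, 0.9, 0.9]:          # repeated bottom-up input
    error = bottom_up - prediction         # integration yields an error signal
    prediction += learning_rate * error    # error signal updates the prediction

print(prediction)  # converges toward the bottom-up value 0.9
```

Each pass shrinks the error, which is the sense in which processing gets faster and cheaper as predictions improve.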
Utility of error minimization
• faster, less resource-intensive processing (lower surprisal)
• more accurate processing
Summary
• Languages provide flexibility to allow speakers to avoid suspension of speech.
• Speakers take advantage of this flexibility by ordering and timing linguistic material in ways that allow for efficient information transfer.
• Probabilistic/statistical information fundamentally guides language acquisition, comprehension, and production.