nlp representations from a perspective of human cognition
TRANSCRIPT
![Page 1: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/1.jpg)
NLP representations from a perspective of human cognition
Allyson EttingerCLSP, Johns Hopkins University
Nov 18 2019
![Page 2: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/2.jpg)
The big goal
• NLP is trying to solve “natural language understanding”
• This can be defined in various ways
• Ideal: achieve human capacity to extract, represent, and deploy information from language input
![Page 3: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/3.jpg)
How to assess “understanding”?
• How do we assess the information that a system has captured?
• Downstream tasks?
![Page 4: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/4.jpg)
How to assess “understanding”?
• Current challenge in NLP: powerful pre-trained models are beating our current benchmarks
• But no one really thinks we have mastered “understanding”
• This is a mismatch that needs to be addressed
• Our dominant question: how can we better understand and more effectively evaluate what our models actually “know” about language
![Page 5: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/5.jpg)
Using human cognition as a lens
• We’re going to examine this from the perspective of human cognition
• What do we need humans for? Planes don’t flap their wings …
• Concept of understanding is defined based on humans
• Essentially all NLP benchmarks use human judgments at some level
![Page 6: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/6.jpg)
What about humans to aspire to
• Certain levels of human understanding make sense for us to emulate with our systems – specifically, endpoint of comprehension
• Other aspects (errors, early stages) not clear we want to emulate
• But sometimes our models do resemble these other aspects • Worth identifying, thinking about why this is happening, and
determining what needs to change to target the endpoint of comprehension
![Page 7: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/7.jpg)
Outline
1. Assessing systematic composition in sentence encoders
2. Simpler models as approximation of real-time predictive response
3. Evaluating pre-trained LMs against human predictive responses
![Page 8: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/8.jpg)
Outline
1. Assessing systematic composition in sentence encoders
2. Simpler models as approximation of real-time predictive response
3. Evaluating pre-trained LMs against human predictive responses
![Page 9: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/9.jpg)
Learning sentence representations
The turquoise giraffe recited the sonnet but did not
forgive the flight attendant
SENTENCE MEANING
COMPOSITION
![Page 10: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/10.jpg)
How are we doing at meaning composition?
[ .23 -.04 .45 .13 … ]
??
The turquoise giraffe recited the sonnet but did not
forgive the flight attendant
![Page 11: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/11.jpg)
“The cat scratched the dog”
Is “dog” in the sentence?
Probing tasksIs my sentence encoder capturing word content?
![Page 12: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/12.jpg)
Probing tasks
• Ettinger et al. (2016), Adi et al. (2016)
• Dates back over a decade in neuroscience: multivariate pattern analysis, Haxby et al. (2001)
![Page 13: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/13.jpg)
Our work
• Target aspects of sentence meaning relevant to composition
• Additional measures to control tests and increase confidence in conclusions
![Page 14: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/14.jpg)
Control 1: sentence generation
![Page 15: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/15.jpg)
“professor = AGENT of help”
The professor helped the student
The lawyer is being helped by the professor
The professor that the girl likes helped the man
The professor is not helping the executive
Control 1: sentence generation
![Page 16: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/16.jpg)
Control 2: Bag-of-words check
![Page 17: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/17.jpg)
Control 2: Bag-of-words check
the waitress served the customer
??
the customer served the waitress
AVERAGE
waitress served customer
![Page 18: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/18.jpg)
Control 2: Bag-of-words check
the waitress served the customer
??
the customer served the waitress
AVERAGE
waitress served customer
Should be at chance
![Page 19: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/19.jpg)
• What information do we know that humans extract systematically?
![Page 20: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/20.jpg)
Target information types
• Semantic role (who did what to whom?)
• Negation (what happened and what didn’t?)
![Page 21: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/21.jpg)
Semantic role: is x agent of y?
SENT: The waitress who served the customer is sleeping
X-PROBE: waitress
Y-PROBE: serve
LABEL: +1
SENT: The waitress who served the customer is sleepingX-PROBE: customer
Y-PROBE: sleep
LABEL: -1
![Page 22: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/22.jpg)
Negation: did y happen?
SENT: The waitress is serving the customer who is not actually sleeping Y-PROBE: sleep
LABEL: -1
SENT: The waitress is not actually serving the customer who is sleeping Y-PROBE: sleep
LABEL: +1
![Page 23: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/23.jpg)
sentence embedding probe vector(s)
single hidden layer
binary label prediction
MLP classifier
![Page 24: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/24.jpg)
Sentence embedding models
• BOW: Bag-of-words vector averaging
• SDAE: Sequential Denoising Autoencoder (Hill et al., 2015)
• ST-UNI, ST-BI: SkipThought – uniskip and biskip (Kiros et al., 2015)
• InferSent (Conneau et al. 2017)
• 2400 dimensions
![Page 25: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/25.jpg)
Sanity check: surface tasks
Adi et al., 2016
• word content• given probe x: is x present in sentence?
• word order • given probes x, y: does x precede y in sentence?
![Page 26: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/26.jpg)
Classification accuracy
CONTENT ORDER ROLE NEG
BOW 100.0 55.0 51.3 50.9
SDAE 100.0 92.9 63.7 99.0
ST-UNI 100.0 93.2 62.3 96.6
ST-BI 96.6 88.7 63.2 74.7
InferSent 100.0 86.4 50.1 97.2
![Page 27: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/27.jpg)
Classification accuracy
CONTENT ORDER ROLE NEG
BOW 100.0 55.0 51.3 50.9
SDAE 100.0 92.9 63.7 99.0
ST-UNI 100.0 93.2 62.3 96.6
ST-BI 96.6 88.7 63.2 74.7
InferSent 100.0 86.4 50.1 97.2
![Page 28: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/28.jpg)
Classification accuracy
CONTENT ORDER ROLE NEG
BOW 100.0 55.0 51.3 50.9
SDAE 100.0 92.9 63.7 99.0
ST-UNI 100.0 93.2 62.3 96.6
ST-BI 96.6 88.7 63.2 74.7
InferSent 100.0 86.4 50.1 97.2
![Page 29: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/29.jpg)
Classification accuracy
CONTENT ORDER ROLE NEG
BOW 100.0 55.0 51.3 50.9
SDAE 100.0 92.9 63.7 99.0
ST-UNI 100.0 93.2 62.3 96.6
ST-BI 96.6 88.7 63.2 74.7
InferSent 100.0 86.4 50.1 97.2
![Page 30: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/30.jpg)
Classification accuracy
CONTENT ORDER ROLE NEG
BOW 100.0 55.0 51.3 50.9
SDAE 100.0 92.9 63.7 99.0
ST-UNI 100.0 93.2 62.3 96.6
ST-BI 96.6 88.7 63.2 74.7
InferSent 100.0 86.4 50.1 97.2
![Page 31: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/31.jpg)
Classification accuracy
CONTENT ORDER ROLE NEG
BOW 100.0 55.0 51.3 50.9
SDAE 100.0 92.9 63.7 99.0
ST-UNI 100.0 93.2 62.3 96.6
ST-BI 96.6 88.7 63.2 74.7
InferSent 100.0 86.4 50.1 97.2
![Page 32: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/32.jpg)
Classification accuracy
CONTENT ORDER ROLE NEG
BOW 100.0 55.0 51.3 50.9
SDAE 100.0 92.9 63.7 99.0
ST-UNI 100.0 93.2 62.3 96.6
ST-BI 96.6 88.7 63.2 74.7
InferSent 100.0 86.4 50.1 97.2
The waitress is not actually serving the customer who is sleeping
![Page 33: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/33.jpg)
Classification accuracy
CONTENT ORDER ROLE NEG
BOW 100.0 55.0 51.3 50.9
SDAE 100.0 92.9 63.7 99.0
ST-UNI 100.0 93.2 62.3 96.6
ST-BI 96.6 88.7 63.2 74.7
InferSent 100.0 86.4 50.1 97.2
the waitress is not serving the customer customer the serving not is waitress the
The waitress is not actually serving the customer who is sleeping
![Page 34: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/34.jpg)
Classification accuracy
CONTENT ORDER ROLE NEG
BOW 100.0 55.0 51.3 50.9
SDAE 100.0 92.9 63.7 99.0
ST-UNI 100.0 93.2 62.3 96.6
ST-BI 96.6 88.7 63.2 74.7
InferSent 100.0 86.4 50.1 97.2
![Page 35: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/35.jpg)
Classification accuracy
CONTENT ORDER ROLE NEG
BOW 100.0 55.0 51.3 50.9
SDAE 100.0 92.9 63.7 99.0
ST-UNI 100.0 93.2 62.3 96.6
ST-BI 96.6 88.7 63.2 74.7
InferSent 100.0 86.4 50.1 97.2
![Page 36: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/36.jpg)
Classification accuracy
CONTENT ORDER ROLE NEG
BOW 100.0 55.0 51.3 50.9
SDAE 100.0 92.9 63.7 99.0
ST-UNI 100.0 93.2 62.3 96.6
ST-BI 96.6 88.7 63.2 74.7
InferSent 100.0 86.4 50.1 97.2
Sequence models appear to identify linking of negation to next verb
Work to be done on semantic roles
![Page 37: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/37.jpg)
Update from Sesame Street
• Davis Yoshida (TTIC) tested semantic role tasks on ELMo, BERT, GPT• Tested various configurations: CLS token, average of WordPiece
tokens – below reports best performance
• ELMo (68.60%)• BERT (63.00%)• GPT (61.4%)
![Page 38: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/38.jpg)
Outline
1. Assessing systematic composition in sentence encoders
2. Simpler models as approximation of real-time predictive response
3. Evaluating pre-trained LMs against human predictive responses
![Page 39: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/39.jpg)
Beyond the endpoint
• Part I also emphasized that BOW can’t capture sentence meaning, so a model that resembles BOW can’t be doing understanding
• But there are other stages of comprehension that might actually look a bit like this
![Page 40: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/40.jpg)
Measuring human brain activity (EEG)
![Page 41: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/41.jpg)
N400 component
I take coffee with cream and _____
… socks
… sugar
(Kutas & Hillyard, 1980)
![Page 42: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/42.jpg)
Cloze probability
I take coffee with cream and _____
… socks
… sugar
Cloze probability = 0
Cloze probability = .6
![Page 43: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/43.jpg)
Deviating from cloze
The restaurant owner forgot which customer the waitress had ____
… served
The restaurant owner forgot which waitress the customer had ____
… served
Chow et al., 2015
![Page 44: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/44.jpg)
Deviating from cloze
The restaurant owner forgot which customer the waitress had ____
… served
The restaurant owner forgot which waitress the customer had ____
… served
Chow et al., 2015
![Page 45: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/45.jpg)
Deviating from cloze
The restaurant owner forgot which customer the waitress had ____
… served
The restaurant owner forgot which waitress the customer had ____
… served
Chow et al., 2015
![Page 46: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/46.jpg)
Deviating from cloze
The restaurant owner forgot which customer the waitress had ____
… served
The restaurant owner forgot which waitress the customer had ____
… served
Chow et al., 2015
![Page 47: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/47.jpg)
Deviating from cloze
The restaurant owner forgot which customer the waitress had ____
… served
The restaurant owner forgot which waitress the customer had ____
… served
Chow et al., 2015
![Page 48: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/48.jpg)
Deviating from cloze
The restaurant owner forgot which customer the waitress had ____
… served
The restaurant owner forgot which waitress the customer had ____
… served
Chow et al., 2015
![Page 49: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/49.jpg)
N400
• Probably reflects most efficient available information for predicting upcoming words
• BOW-type representation may be a common go-to for this purpose
![Page 50: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/50.jpg)
Federmeier & Kutas (1999)
He caught the pass and scored another touchdown. There was nothing
he enjoyed more than a good game of ____
… football
… baseball
… monopoly
expected
within-category
between-category
![Page 51: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/51.jpg)
Federmeier & Kutas (1999)
He caught the pass and scored another touchdown. There was nothing
he enjoyed more than a good game of ____
… football
… baseball
… monopoly
expected
within-category
between-category
![Page 52: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/52.jpg)
Federmeier & Kutas (1999)
He caught the pass and scored another touchdown. There was nothing
he enjoyed more than a good game of ____
… football
… baseball
… monopoly
expected
within-category
between-category
![Page 53: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/53.jpg)
Federmeier & Kutas (1999)
He caught the pass and scored another touchdown. There was nothing
he enjoyed more than a good game of ____
… football
… baseball
… monopoly
expected
within-category
between-category
![Page 54: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/54.jpg)
Federmeier & Kutas (1999)
football necklace
earring
mascara
He caught the pass and scored another touchdown. There was nothing he enjoyed more than a good game of ____
![Page 55: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/55.jpg)
Federmeier & Kutas (1999)
monopoly
football necklace
earring
mascara
He caught the pass and scored another touchdown. There was nothing he enjoyed more than a good game of ____
![Page 56: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/56.jpg)
Federmeier & Kutas (1999)
monopoly
baseball
football necklace
earring
mascara
He caught the pass and scored another touchdown. There was nothing he enjoyed more than a good game of ____
![Page 57: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/57.jpg)
Federmeier & Kutas (1999)
monopoly
baseball
football necklace
earring
mascara
He caught the pass and scored another touchdown. There was nothing he enjoyed more than a good game of ____
Unexpected facilitation
![Page 58: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/58.jpg)
Federmeier & Kutas (1999)
monopoly
baseball
football
![Page 59: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/59.jpg)
Federmeier & Kutas account
He caught the pass and scored another touchdown. There was nothing he enjoyed more than a good game of ____
football
![Page 60: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/60.jpg)
Federmeier & Kutas account
He caught the pass and scored another touchdown. There was nothing he enjoyed more than a good game of ____
baseballfootball
![Page 61: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/61.jpg)
Federmeier & Kutas account
He caught the pass and scored another touchdown. There was nothing he enjoyed more than a good game of ____
baseballfootball(overlap
activation)
![Page 62: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/62.jpg)
Alternative account
He caught the pass and scored another touchdown. There was nothing he enjoyed more than a good game of ____
![Page 63: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/63.jpg)
Alternative account
He caught the pass and scored another touchdown. There was nothing he enjoyed more than a good game of ____
![Page 64: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/64.jpg)
Alternative account
He caught the pass and scored another touchdown. There was nothing he enjoyed more than a good game of ____
baseball
![Page 65: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/65.jpg)
BOW averaging simulation
caught the pass and scored a touchdown …
football/
baseball/
monopoly
(cosine)
AVERAGE
![Page 66: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/66.jpg)
Simulation results
Cosin
e sim
ilarit
y
![Page 67: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/67.jpg)
Simulation results
Cosin
e sim
ilarit
yExpected exemplar: facilitation
![Page 68: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/68.jpg)
Cosin
e sim
ilarit
yExpected exemplar: facilitation
Between-category: no facilitation
Simulation results
![Page 69: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/69.jpg)
Cosin
e sim
ilarit
yWithin-category (high-constraint):unexpected facilitation
Simulation results
![Page 70: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/70.jpg)
Interim takeaways
• Cognitive scientists: alternative explanation for observed result (made possible by availability of word embeddings)
• Our purposes: BOW model may not amount to comprehension – but it may align with other aspects of human processing
• Understanding which part of human processing we are approximating can help to improve in desired directions
![Page 71: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/71.jpg)
Outline
1. Assessing systematic composition in sentence encoders
2. Simpler models as approximation of real-time predictive response
3. Evaluating pre-trained LMs against human predictive responses
![Page 72: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/72.jpg)
Pre-trained language models
• Impressive generalization across large number of tasks
• What kinds of generalizable linguistic competence do these models acquire during LM pre-training?
• Is it “understanding”? Is it shallower?
![Page 73: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/73.jpg)
BERT(Devlin et al 2018)
![Page 74: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/74.jpg)
Probe representations?
• We could use probing tasks to probe the representations that pretrained models produces
• Few a priori expectations
• Should the CLS token represent all the sentence information? Should the average of token representations? At which layers?
![Page 75: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/75.jpg)
Test word predictions
• Alternative: test pre-trained BERT in its most natural setting of predicting words in context
• What information is BERT sensitive to when making word predictions in context?
![Page 76: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/76.jpg)
Psycholinguistic tests
• Designed to draw conclusions based on predictive responses in context
• Controlled to ask targeted questions about predictive mechanisms
![Page 77: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/77.jpg)
N400/cloze divergence
• Choose psycholinguistic tests for which the N400 and cloze response diverge
• N400 predictive response shows apparent insensitivity to certain useful information for prediction
• Will BERT show similar insensitivities, or will it be able to make use of the higher-level predictive information that cloze reflects?
![Page 78: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/78.jpg)
Psycholinguistic diagnostics
• Adapt three psycholinguistic datasets
• Three types of tests for each:1. Word prediction accuracy—how well can the model use the
relevant information to guide word predictions2. Sensitivity tests—how well can the model distinguish between
completions that the N400 has showed insensitivity on3. Qualitative analysis—what do BERT’s top predictions tell us about
the information it has access to?
![Page 79: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/79.jpg)
Datasets
• CPRAG-102: commonsense/pragmatic inference
• ROLE-88: event knowledge and semantic roles
• NEG-136: negation
![Page 80: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/80.jpg)
Datasets
• CPRAG-102: commonsense/pragmatic inference
• ROLE-88: event knowledge and semantic roles
• NEG-136: negation
![Page 81: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/81.jpg)
CPRAG-102: commonsense/pragmatic inference
He caught the pass and scored another touchdown. There was nothing
he enjoyed more than a good game of ____
He complained that after she kissed him, he couldn’t get the red color
off his face. He finally just asked her to stop wearing that ____
![Page 82: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/82.jpg)
Prediction accuracy test
• Need to use commonsense inference to discern what is being described in first sentence
• Need to use pragmatic inference (along with normal syntactic/semantic information) to determine how the second sentence relates to the first
![Page 83: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/83.jpg)
CPRAG-102: commonsense/pragmatic inference
He caught the pass and scored another touchdown. There was nothing
he enjoyed more than a good game of ____
He complained that after she kissed him, he couldn’t get the red color
off his face. He finally just asked her to stop wearing that ____
![Page 84: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/84.jpg)
Sensitivity test
• Can BERT distinguish between completions with semantic features in common?
![Page 85: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/85.jpg)
He caught the pass and scored another touchdown. There was nothing
he enjoyed more than a good game of ____
… football
… baseball
… monopoly between-category
Federmeier & Kutas (1999)
CPRAG-102: commonsense/pragmatic inference
![Page 86: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/86.jpg)
He caught the pass and scored another touchdown. There was nothing
he enjoyed more than a good game of ____
… football
… baseball
… monopoly between-category
Federmeier & Kutas (1999)
CPRAG-102: commonsense/pragmatic inference
![Page 87: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/87.jpg)
He caught the pass and scored another touchdown. There was nothing
he enjoyed more than a good game of ____
… football
… baseball
… monopoly
Federmeier & Kutas (1999)
CPRAG-102: commonsense/pragmatic inference
![Page 88: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/88.jpg)
Datasets
• CPRAG-102: commonsense/pragmatic inference
• ROLE-88: event knowledge and semantic roles
• NEG-136: negation
![Page 89: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/89.jpg)
ROLE-88: events and semantic roles
The restaurant owner forgot which customer the waitress had ____
The restaurant owner forgot which waitress the customer had ____
Original study: Chow et al., 2015
![Page 90: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/90.jpg)
Prediction accuracy test
• Need to use semantic role information and knowledge about typical events in order to make accurate predictions
![Page 91: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/91.jpg)
ROLE-88: events and semantic roles
The restaurant owner forgot which customer the waitress had ____
The restaurant owner forgot which waitress the customer had ____
Original study: Chow et al., 2015
![Page 92: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/92.jpg)
Sensitivity test
• Will BERT reliably prefer continuations in the appropriate contexts rather than the role-reversed contexts?
![Page 93: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/93.jpg)
ROLE-88: events and semantic roles
The restaurant owner forgot which customer the waitress had ____
… served
The restaurant owner forgot which waitress the customer had ____
… served
Chow et al., 2015
![Page 94: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/94.jpg)
ROLE-88: events and semantic roles
The restaurant owner forgot which customer the waitress had ____
… served
The restaurant owner forgot which waitress the customer had ____
… served
Chow et al., 2015
![Page 95: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/95.jpg)
ROLE-88: events and semantic roles
The restaurant owner forgot which customer the waitress had ____
… served
The restaurant owner forgot which waitress the customer had ____
… served
Chow et al., 2015
![Page 96: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/96.jpg)
Datasets
• CPRAG-102: commonsense/pragmatic inference
• ROLE-88: event knowledge and semantic roles
• NEG-136: negation
![Page 97: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/97.jpg)
NEG-136: negation
A robin is a ____
… bird
A robin is not a ____
… bird
Original study: Fischler et al., 1983
![Page 98: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/98.jpg)
NEG-136: negation
A robin is a ____
… bird
A robin is not a ____
… bird
Original study: Fischler et al., 1983
![Page 99: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/99.jpg)
NEG-136: negation
A robin is a ____
… bird
A robin is not a ____
… bird
Original study: Fischler et al., 1983
![Page 100: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/100.jpg)
NEG-136: negation
A robin is a ____
… bird
A robin is not a ____
… bird
Original study: Fischler et al., 1983
![Page 101: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/101.jpg)
NEG-136: negation
A robin is a ____
… bird
A robin is not a ____
… bird
Original study: Fischler et al., 1983
![Page 102: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/102.jpg)
Prediction accuracy
• This test doesn’t make sense in negated contexts, so test accuracy only on affirmative contexts
• Accurate predictions here require access to hypernym information
![Page 103: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/103.jpg)
Sensitivity test
• This is where the test of negation comes in
• Can BERT prefer true continuations to false continuations, with and without negation?
![Page 104: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/104.jpg)
Experiments
He caught the pass and scored another touchdown. There was nothing
he enjoyed more than a good game of [MASK]
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 105: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/105.jpg)
Experiments
He caught the pass and scored another touchdown. There was nothing
he enjoyed more than a good game of [MASK]
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 106: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/106.jpg)
Experiments
He caught the pass and scored another touchdown. There was nothing
he enjoyed more than a good game of [MASK]
Extract BERT word predictions on [MASK] token, as in pre-training
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 107: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/107.jpg)
Experiments
• BERTBase – 12 layers, hidden layer size 768 dimensions, 12 self-attention heads. Total parameters 110M
• BERTLarge – 24 layers, 1024 dim hidden size, 16 self-attention heads. Total parameters 340M
![Page 108: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/108.jpg)
Results: CPRAG accuracy test
He caught the pass and scored another touchdown. There was nothing
he enjoyed more than a good game of [MASK]
football in top k BERT predictions ?
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 109: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/109.jpg)
Results: CPRAG accuracy test
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 110: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/110.jpg)
Results: CPRAG accuracy test
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 111: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/111.jpg)
Results: CPRAG accuracy test
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 112: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/112.jpg)
Results: CPRAG sensitivity test
He caught the pass and scored another touchdown. There was nothing
he enjoyed more than a good game of [MASK]
football > baseball and monopoly ?
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 113: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/113.jpg)
Results: CPRAG sensitivity test
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 114: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/114.jpg)
CPRAG qualitative analysis
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 115: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/115.jpg)
Results: ROLE accuracy test
The restaurant owner forgot which customer the waitress had [MASK]
(served in top k BERT predictions?)
The restaurant owner forgot which waitress the customer had [MASK]
(tipped in top k BERT predictions?)
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 116: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/116.jpg)
Results: ROLE accuracy test
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 117: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/117.jpg)
Results: ROLE accuracy test
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 118: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/118.jpg)
Results: ROLE accuracy test
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 119: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/119.jpg)
Results: ROLE sensitivity test
The restaurant owner forgot which customer the waitress had [MASK]
served
>
The restaurant owner forgot which waitress the customer had [MASK]
served
?
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 120: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/120.jpg)
Results: ROLE sensitivity test
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 121: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/121.jpg)
ROLE qualitative analysis
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 122: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/122.jpg)
Results: NEG accuracy test
A robin is a [MASK]
bird in top k BERT predictions ?
A robin is not a [MASK]
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 123: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/123.jpg)
Results: NEG accuracy test
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 124: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/124.jpg)
Results: NEG sensitivity test
A robin is a [MASK]
bird > tree ?
A robin is not a [MASK]
tree > bird ?
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 125: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/125.jpg)
Results: NEG sensitivity test
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 126: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/126.jpg)
NEG qualitative analysis
Ettinger (2019). What BERT is not: Lessons from a new
suite of psycholinguistic diagnostics for language models
![Page 127: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/127.jpg)
Takeaways
• Decent on sensitivity to role reversal and differences within semantic category – but seemingly weaker sensitivity than cloze
• Great with hypernyms, determiners, grammaticality
• Struggles with challenging inference and event-based prediction
• Clear insensitivity to contextual impacts of negation
![Page 128: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/128.jpg)
Discussion
• Many of these results give general indication that these pre-trained models have a way to go to incorporate human inference
• Negation result is more striking and starker
• Not surprising, ultimately, given LM training – but possibly means that LM training isn’t suited for learning negation
• What other aspects of comprehension have this property?
![Page 129: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/129.jpg)
Outline
1. Assessing systematic composition in sentence encoders
2. Simpler models as approximation of real-time predictive response
3. Evaluating pre-trained LMs against human predictive responses
![Page 130: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/130.jpg)
Conclusions
• What we want to be able to do is capture the endpoint of comprehension
• What we’re good at right now is leveraging co-occurrence statistics in a way that maximizes our ability to predict surrounding/upcoming words
• This sometimes causes our models to better resemble earlier stages of human comprehension rather than the endpoint
• Understanding what part of human processing we’re capturing, and how that relates to what we do want to capture, could help us meet our goals
![Page 131: NLP representations from a perspective of human cognition](https://reader030.vdocuments.us/reader030/viewer/2022012409/616a409811a7b741a35079c7/html5/thumbnails/131.jpg)
Thank you!
GRF Grant DGE-1322106NRT Grant DGE-1449815
Toyota Technological Institute at Chicago
Philip Resnik Colin Phillips
Naomi Feldman Ahmed Elgohary