![Page 1: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/1.jpg)
Applied Natural Language Processing
Info 256Lecture 17: WordNet 2 (March 21, 2019)
David Bamman, UC Berkeley
![Page 2: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/2.jpg)
0
1
2
3
0 50 100 150
avg.
sus
pens
e ra
ting
Suspense
![Page 3: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/3.jpg)
0
1
2
3
0 50 100 150
avg.
sus
pens
e ra
ting
But the yawning policeman saw the thing, the busy crowds in the markets stopped agape, workmen going to their work betimes, milkmen, the
drivers of news-carts, dissipation going home jaded and pale, homeless wanderers, sentinels on their beats, and in the country, labourers trudging
afield, poachers slinking home, all over the dusky quickening country it could be seen--and out at sea by seamen watching for the day--a great
white star, come suddenly into the westward sky!
![Page 4: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/4.jpg)
0
1
2
3
0 50 100 150
avg.
sus
pens
e ra
ting
And voice after voice repeated, "It is nearer," and the clicking telegraph took that up, and it trembled along telephone wires, and
in a thousand cities grimy compositors fingered the type.
![Page 5: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/5.jpg)
0
1
2
3
0 50 100 150
avg.
sus
pens
e ra
ting
"It would seem, gentlemen, if I may put the thing clearly and briefly, that--Man has lived in vain.”
![Page 6: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/6.jpg)
0
1
2
3
0 50 100 150
avg.
sus
pens
e ra
ting
And then came a wonder.
Larger grew the star, and larger, hotter, and brighter with a terrible swiftness now. The tropical ocean had lost its phosphorescence, and the whirling steam rose in ghostly wreaths from the black waves that plunged incessantly, speckled with storm-tossed ships.
![Page 7: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/7.jpg)
Activity
• 9.annotation/IAAMetrics.ipynb
![Page 8: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/8.jpg)
WordNet
• WordNet encodes human-judged measures of similarity.
![Page 9: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/9.jpg)
Synsets
synset gloss
mark, grade, score a number or letter indicating quality
scratch, scrape, scar, mark an indication of damage
bell ringer, bull's eye, mark, home run
something that exactly succeeds in achieving its goal
chump, fool, gull, mark, patsy, fall guy, sucker, soft touch, mug
a person who is gullible and easy to take advantage of
mark, stigma, brand, stain a symbol of disgrace or infamy
![Page 10: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/10.jpg)
Synsets
Hypernyms of {chump, fool, gull, mark, patsy, fall guy, sucker, soft touch, mug} sunset
![Page 11: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/11.jpg)
WordNet• WordNet encodes human-judged measures of
similarity.
• Learn distributed representations of words that respect WordNet similarities (Faruqui et al. 2015)
• By indexing word senses, we can build annotated resources on top of it for word sense disambiguation.
![Page 12: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/12.jpg)
x
y
hate adore
love
dislike
detest
We want words that are known synonyms to have similar
representations
![Page 13: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/13.jpg)
Retrofittingthe likes wrong untrue …
4.1 4.2 0.1 0.12
-0.9 -0.7 0.3 0.28
• Start out with pre-trained word embeddings from any source.
wrong
untrue
love
likefalse
• Define an undirected graph from sets of synomyms
![Page 14: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/14.jpg)
• Learn new embedding ewrong that is simultaneously close to the original embedding êwrong and close to all of its synonyms in WordNet (eflawed euntrue, eincorrect,etc).
Retrofitting
![Page 15: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/15.jpg)
• Learn new embedding ewrong that is simultaneously close to the original embedding êwrong and close to all of its synonyms in WordNet (eflawed euntrue, eincorrect,etc).
Retrofitting
Ψ(Q) =N
∑i=1
αi ∥qi − ̂qi∥2 + ∑(i,j)∈E
βij ∥qi − qj∥2
Distance between new embedding and old one
Distance between new embedding
and synonyms
We want to minimize this
function
![Page 16: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/16.jpg)
Activity
• 10.wordnet/Retrofitting
![Page 17: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/17.jpg)
Semcor• Semcor: 200K+ words from Brown corpus tagged
with Wordnet senses.
• http://web.eecs.umich.edu/~mihalcea/downloads/semcor/semcor3.0.tar.gz
original It urged that the city take steps to remedy this problem
lemma sense It urge1 that the city2 take1 step1 to remedy1 this problem2
synset number It urge2:32:00 that the city1:15:01 take2:41:04 step1:04:02 to remedy2:30:00 this problem1:10:00
![Page 18: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/18.jpg)
WordNet
• WordNet encodes human-judged measures of similarity. Learn distributed representations of words that respect WordNet similarities (Faruqui et al. 2015)
• By indexing word senses, we can build annotated resources on top of it for word sense disambiguation.
![Page 19: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/19.jpg)
“All-word” WSD
“Onlyonly1 a relativerelative1 handfulhandful1 of suchsuch0 reportsreport3 was receivedreceive2”
• For all content words in a sentence, resolve each token to its sense in an fixed sense inventory (e.g., WordNet).
![Page 20: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/20.jpg)
WSD
• Dictionary methods (Lesk)
• Supervised (machine learning)
• Semi-supervised (Bootstrapping)
![Page 21: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/21.jpg)
Dictionary methods
• Predict the sense a given token that has the highest overlap between the token’s context and sense’s dictionary gloss.
![Page 22: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/22.jpg)
“The boat washed up on the river bank.”
Dictionary methods
![Page 23: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/23.jpg)
Lesk Algorithm
![Page 24: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/24.jpg)
Lesk Algorithm
• Extension (Basile et al. 2014): measure similarity between gloss g = {g1, … gG} and context c = {c1, …, cC} as cosine similarity between sum of distributed representations
cos
�G�
i=1
gi,C�
i=1
ci
�
![Page 25: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/25.jpg)
Supervised WSD
• We have labeled training data; let’s learn from it.
• Decision trees (Yarowsky 1994)
• Naive Bayes, log-linear classifiers, support vector machines (Zhong and Ng 2010)
• Bidirectional LSTM (Raganato et al. 2017)
![Page 26: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/26.jpg)
Supervised WSD• Collocational: words
in specific positions before/after the target word to be disambiguated
• Bag-of-words: words in window around target (without encoding specific position)
feature
wi-1 = fish
wi-2= fish
wi+1 = fish
wi+2 = fish
word in context = fish
…
![Page 27: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/27.jpg)
Raganato et al. 2017
![Page 28: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/28.jpg)
One sense per discourse
• If a word appears multiple times in a document, it’s usually with the same sense. (Gale et al. 1992)
• Articles about financial banks don’t usually talk about river banks.
![Page 29: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/29.jpg)
But WSD is not POS tagging
Supervised learning
![Page 30: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/30.jpg)
Semi-supervised WSD1. Produce seeds (dictionary definitions, single
defining collocate, or label common collocates)
2. Repeat until convergence:
1. Train supervised classifier on labeled examples
2. Label all examples, and keep labels for high-confidence instances
![Page 31: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/31.jpg)
“Plant”
Semi-supervised WSD
![Page 32: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/32.jpg)
“Plant”
Semi-supervised WSD
![Page 33: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/33.jpg)
Evaluation
• Annotated data; cross-validation.
• Semcor
• Ontonotes
• Semeval/Senseval competitions
![Page 34: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/34.jpg)
Ungulate
Equine
Horse
Mammal
Vertebrate
Animal
… Entity
Self-propelled vehicle
Motor vehicle
Car
Wheeled vehicle
Conveyance
Instrumentality
Artifact
… EntityHyponymy
![Page 35: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/35.jpg)
Ciarmita and Altun 2003
![Page 36: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/36.jpg)
Supersense tagging
The station wagons arrived at noon, a long shining line
that coursed through the west campus.
artifact artifact motion time group
motion locationlocation
Noun supersenses (Ciarmita and Altun 2003)
![Page 37: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/37.jpg)
• Ciarameta and Altun (2006). Trained on data from Semcor (Miller et al. 1993); Brown corpus annotated with WordNet synset labels
• Token-level predictor – each instance of a word has its own supersense tag.
• Maximum-entropy Markov Model (MEMM) trained with averaged perceptron. Features for: word token identity, part-of-speech tag, word shape, previous label + supersense for most frequent synset for word.
• In-domain accuracy: 77.1 F score (cf. 66 F MFS baseline)
Supersense tagging
![Page 38: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/38.jpg)
Data
• Semcor: 200K+ words tagged with Wordnet senses. http://www.cse.unt.edu/~rada/downloads.html#semcor
• WordNethttps://wordnet.princeton.edu/wordnet/download/
![Page 39: Applied Natural Language Processingdbamman/anlp19... · 2019. 4. 4. · Applied Natural Language Processing Info 256 ... • For all content words in a sentence, resolve each token](https://reader034.vdocuments.us/reader034/viewer/2022051916/60084521cfd9fa43fa6f6f50/html5/thumbnails/39.jpg)
Activity
• 10.wordnet/Lesk_TODO.ipynb