june 14 2017 berlin nlp crosslinguistic kate mccurdy word...
TRANSCRIPT
Grammatical and topical gender in crosslinguistic word embeddings
Kate McCurdy
Berlin NLP
June 14 2017
Word embeddings: From (almost) scratch to NLP
● Goal: word representations that...
○ capture maximal semantic/syntactic information, yet
○ require minimal task-specific feature engineering
● Neural embeddings to the rescue!
○ Input: barely processed, massive corpora
■ In general: tokenization + trimming the long tail in vocab
■ Collobert et al.: capitalization as feature + a few extra tweaks
■ Mikolov et al: n-gram phrase identification
○ Output: dense, magically performant vectors
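The "barely processed" input step can be pictured with a minimal sketch. It assumes a crude regex tokenizer, a frequency cutoff of 2, and an `<unk>` placeholder for rare words; none of these choices come from the talk, and real pipelines differ in all three.

```python
import re
from collections import Counter

def tokenize(line):
    # Deliberately crude: lowercase, keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", line.lower())

def trim_vocab(sentences, min_count=2, unk="<unk>"):
    # Count every token, then replace the long tail (rare tokens) with a placeholder.
    # min_count and unk are illustrative assumptions.
    counts = Counter(tok for sent in sentences for tok in sent)
    return [[tok if counts[tok] >= min_count else unk for tok in sent]
            for sent in sentences]

# Toy corpus, invented for this example.
raw = ["The cat sat on the mat.", "The dog sat on the rug.", "A cat chased the dog."]
sentences = trim_vocab([tokenize(line) for line in raw])
print(sentences)
```

The trimmed sentences are what an embedding trainer would consume; everything else about the words is left for the model to work out from context.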
… but there are pitfalls
You shall know a word by the company it keeps.
Firth 1957
Pitfall #1
What if your words keep company with some unsavory stereotypes?
Analogous relations in the GloVe word embedding; from Caliskan-Islam et al. 2016
Stereotypes in word embeddings: Bolukbasi et al. 2016
addiction : eating disorder
accountant : paralegal
pilot : flight attendant
athlete : gymnast
professor emeritus : associate professor
Bias in humans: the Implicit Association Test
● Standard psychological test to assess implicit bias
● Design:
○ Two sets of attribute words
■ Male, man, boy, …
■ Female, woman, …
○ Two sets of target words
■ Children, wedding, …
■ Office, salary, …
○ Task: left vs right fast categorization of both sets
● Measurement: differential association in average response time
Greenwald et al. 1998
● WEAT: the Word Embedding Association Test
● Parallels the Implicit Association Test
● Measures the differential association between paired target and attribute word sets via cosine similarity
● Core finding: nearly every single prejudice uncovered by the IAT is replicated by the WEAT on Google News + GloVe word embeddings
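The differential association can be sketched as an effect size in the style of Caliskan et al.: each target word gets a score (mean cosine similarity to one attribute set minus mean similarity to the other), and the difference between the two target sets' mean scores is divided by the standard deviation of all scores. The 2-D vectors below are invented toy values, and using the sample standard deviation is an assumption of this sketch.

```python
import math
from statistics import mean, stdev

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def association(w, A, B):
    # s(w, A, B): how much closer w is to attribute set A than to attribute set B.
    return mean([cosine(w, a) for a in A]) - mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Difference of mean associations of the two target sets,
    # scaled by the spread of associations over all target words.
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)  # sample std: an assumption

# Toy 2-D "embeddings" (made up for illustration, not real data).
male = [[1.0, 0.0], [0.9, 0.1]]      # attribute set A
female = [[0.0, 1.0], [0.1, 0.9]]    # attribute set B
career = [[0.8, 0.2], [0.7, 0.3]]    # target set X
family = [[0.2, 0.8], [0.3, 0.7]]    # target set Y

effect = weat_effect_size(career, family, male, female)
print(effect)
```

A positive value here means the `career` vectors sit closer to the `male` vectors than the `family` vectors do; swapping the two target sets flips the sign.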
Pitfall #1
What if your words keep company with some unsavory stereotypes?
Pitfall #2
What if your content words hang out with your function words and make weird artefacts?
Work with Oguz Serbetci (not pictured)
Crosslinguistic word embeddings
Data
● Corpus: OpenSubtitles
● ~5.5K movies with subtitles in 4 languages (2.6-2.9m ws):
○ German - grammatical gender
○ Spanish - grammatical gender
○ Dutch - grammatical gender orthogonal to “natural” gender
○ English - “natural” gender
● Lemmatized each corpus to remove gender
● Trained 10 word2vec CBOW embeddings per condition:
○ Language (4) x
○ Corpus version (2 - unprocessed vs lemmatized)
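The experimental grid above works out to 4 x 2 = 8 conditions and 80 trained models. A minimal sketch of enumerating those runs follows; the language codes, the dictionary layout, and the idea that the 10 runs per condition differ by random seed are assumptions, since the slides do not say how the repeated runs were distinguished.

```python
from itertools import product

LANGUAGES = ["de", "es", "nl", "en"]             # German, Spanish, Dutch, English
CORPUS_VERSIONS = ["unprocessed", "lemmatized"]
RUNS_PER_CONDITION = 10

# One entry per embedding to train; "seed" is a hypothetical way to tell runs apart.
runs = [
    {"language": lang, "corpus": version, "seed": seed}
    for lang, version in product(LANGUAGES, CORPUS_VERSIONS)
    for seed in range(RUNS_PER_CONDITION)
]

conditions = {(run["language"], run["corpus"]) for run in runs}
print(len(conditions), len(runs))
```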
Method
● Measurement:
○ differential association using the Word Embedding Association Test (WEAT - Caliskan et al.)
{male} {female}
{career} {family}
Method
● Measurement:
○ differential association using the Word Embedding Association Test (WEAT - Caliskan et al.)
● Comparisons:
○ “Topical” semantic gender bias
■ replicate IAT findings of Caliskan et al. on dimension male:career::female:family
○ Grammatical gender bias
■ use stimuli from Phillips & Boroditsky on dimension male:masculine::female:feminine
■ e.g. Spanish el sol (m), German die Sonne (f)
Topical gender bias
≈ average increase in cosine similarity per word
Topical gender bias Grammatical gender bias
Pitfall #2
What if your content words hang out with your function words and make weird artefacts?
Words can keep strange company!
And arbitrary properties like grammatical gender can distort your embeddings.
Thanks! Questions?
References
Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V., & Kalai, A. (2016). Quantifying and reducing stereotypes in word embeddings. arXiv Preprint arXiv:1606.06121.
Caliskan-Islam, A., Bryson, J. J., & Narayanan, A. (2016). Semantics derived automatically from language corpora necessarily contain human biases. arXiv Preprint arXiv:1608.07187.
Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183–186.
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12(Aug), 2493–2537.
Firth, J. R. (1957). A synopsis of linguistic theory 1930–1955. In Studies in linguistic analysis (pp. 1–32). Oxford: Blackwell.
Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. (1998). Measuring individual differences in implicit cognition: the implicit association test. Journal of Personality and Social Psychology, 74(6), 1464.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111–3119).
Appendix
Interaction between topical and grammatical gender effects in DE + ES
Stereotypes in word embeddings: Bolukbasi et al. 2016
1. Define gender subspace
2. Project profession names onto subspace
3. Generate analogies & get stereotype ratings from MTurk
addiction : eating disorder
accountant : paralegal
pilot : flight attendant
athlete : gymnast
professor emeritus : associate professor
Stereotypes in word embeddings: Bolukbasi et al. 2016
1. Define gender subspace
2. Project profession names onto subspace
3. Generate analogies & get stereotype ratings from MTurk
4. Compute transformation matrix to debias designated words
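Steps 1, 2 and 4 can be illustrated with a much simpler stand-in than the procedure on the slide. The sketch below assumes a one-dimensional gender "subspace" given by a single he/she difference vector, and it replaces the transformation matrix with plain removal of the projection onto that direction. It is a simplification for illustration, not the authors' method, and all vectors and names (`profession_a`, `profession_b`) are invented.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def gender_direction(he, she):
    # Step 1 (simplified): one unit-length direction from a single word pair.
    return normalize([a - b for a, b in zip(he, she)])

def gender_component(w, g):
    # Step 2: scalar projection of a word vector onto the gender direction.
    return dot(w, g)

def neutralize(w, g):
    # Step 4 (simplified): subtract the component along g, leaving w orthogonal to g.
    c = dot(w, g)
    return [wi - c * gi for wi, gi in zip(w, g)]

# Toy 3-D vectors, made up for this example.
he, she = [1.0, 0.2, 0.0], [-1.0, 0.2, 0.0]
words = {"profession_a": [0.4, 0.5, 0.3], "profession_b": [-0.3, 0.6, 0.2]}

g = gender_direction(he, she)
scores = {name: gender_component(vec, g) for name, vec in words.items()}
debiased = {name: neutralize(vec, g) for name, vec in words.items()}
print(scores)
print(debiased)
```

After neutralizing, each designated word has zero component along `g`, while its other coordinates are unchanged.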