Word2Vec: Learning of word representations in a vector space - Di Mitri & Hermans
TRANSCRIPT
Word2Vec: Learning of word representations in a vector space
Daniele Di Mitri - Joeri Hermans
23 March 2015
Student Lecture - Di Mitri & Hermans
Outline

1. Classic NLP techniques limitations
2. Skip-gram
3. Negative sampling
4. Learning of word representations
5. Applications
6. References
Classic NLP techniques limitations

Classic NLP techniques (n-grams, bag of words) treat:
• words as atomic units
• or as vectors in a vector space: [0, 0, 0, 0, 1, 0, …, 0], also known as one-hot

These models are simple and robust, even when trained on huge amounts of data, BUT:
• no semantic relationships between words: they are not designed to model linguistic knowledge
• data is extremely sparse due to the high number of dimensions
• scaling up will not result in significant progress

Example phrase: "love candy store"
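The "atomic units" limitation can be seen in a tiny sketch (toy vocabulary invented for illustration): any two distinct one-hot vectors are orthogonal, so the representation carries no notion of similarity at all.

```python
# Toy vocabulary from the slide's example phrase.
vocab = ["love", "candy", "store"]

def one_hot(word):
    """Return the one-hot vector for a word over the toy vocabulary."""
    return [1 if w == word else 0 for w in vocab]

love, candy = one_hot("love"), one_hot("candy")

# Dot product of any two distinct one-hot vectors is 0: every pair of
# different words is equally (and maximally) dissimilar.
similarity = sum(a * b for a, b in zip(love, candy))
print(similarity)  # 0
```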
Word's context

Successful intuition: the context represents the semantics. (Slide figure: the words surrounding "banking" represent "banking".)
Feature vectors

• The one-hot problem: [0, 0, 1] AND [1, 0, 0] = 0!
• Bengio et al. (2003) introduce word features (feature vectors) learned using a neural architecture:

P(w_t | w_{t-(n-1)}, …, w_{t-1})

candy = {0.124, -0.553, 0.923, 0.345, -0.009}

• Dimensionality reduction using word vectors.
• Data sparsity is no longer a problem.
• Not computationally efficient.
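The dimensionality-reduction point can be made concrete (vocabulary size and word index are invented for illustration): a one-hot vector grows with the vocabulary, while a learned feature vector keeps a small, fixed dimensionality.

```python
# Hypothetical vocabulary of 50,000 words; "candy" sits at an arbitrary index.
vocab_size = 50_000
one_hot_candy = [0] * vocab_size
one_hot_candy[4242] = 1  # a single 1 at the word's index, zeros everywhere else

# Dense feature vector from the slide: 5 dimensions regardless of vocabulary size.
candy = [0.124, -0.553, 0.923, 0.345, -0.009]

print(len(one_hot_candy), len(candy))  # 50000 5
```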
Importance of efficiency

• Mikolov et al. introduce in 2013 two more computationally efficient neural architectures: skip-gram and continuous bag-of-words (CBOW).
• Hypothesis: simpler models trained on (a lot) more data will result in better word representations.
• How to evaluate these word representations? Semantic similarity (cosine similarity)!
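Cosine similarity compares the direction of two word vectors while ignoring their length. A minimal sketch, with 3-d vectors invented for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical word vectors: "candy" and "sweet" point in a similar direction,
# "table" does not.
candy = [0.9, 0.1, 0.3]
sweet = [0.8, 0.2, 0.25]
table = [-0.4, 0.9, 0.1]

print(cosine(candy, sweet) > cosine(candy, table))  # True
```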
Example

vec("king") − vec("man") + vec("woman") ≈ vec("queen")
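The analogy can be checked on toy vectors: solve vec("king") − vec("man") + vec("woman") ≈ ? by taking the nearest remaining word under cosine similarity. These 2-d embeddings are chosen by hand so the analogy works exactly; real word2vec vectors are learned, high-dimensional, and satisfy it only approximately.

```python
import math

# Hand-crafted toy embeddings for illustration only.
emb = {
    "man":   [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king":  [3.0, 0.0],
    "queen": [3.0, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(a, b, c):
    """Solve a - b + c ~= ?, excluding the three query words themselves."""
    target = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy("king", "man", "woman"))  # queen
```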
Skip-gram

• A feedforward neural network for classification.
• Classification task: predict the next and previous words (the context) of the current word.
• The features learned in the weight matrix to the hidden layer are our word vectors.
• Supervised learning with unlabeled input data!
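This is why unlabeled text suffices: the (center, context) training pairs come straight from the running text. A sketch of how skip-gram extracts them, using a hypothetical window size:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs: the model learns to
    predict each context word within the window from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:  # the center word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

sentence = "i love the candy store".split()
pairs = skipgram_pairs(sentence, window=1)
print(pairs[:2])  # [('i', 'love'), ('love', 'i')]
```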
Negative sampling

• Computing similarity against every word in the vocabulary is very expensive.
• Alongside the correct context, select multiple incorrect contexts at random.
• Faster training: only a few words change instead of all words in the language.
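A minimal sketch of how negative sampling builds its training examples (toy vocabulary invented; real word2vec draws negatives from a smoothed unigram distribution rather than uniformly): one positive pair labeled 1 plus k random incorrect contexts labeled 0, so only k+1 output words receive weight updates instead of the whole vocabulary.

```python
import random

random.seed(0)  # deterministic for the example

vocab = ["i", "love", "the", "candy", "store", "bank", "money"]

def training_examples(center, context, k=2):
    """One positive (center, context, 1) plus k sampled negatives (center, noise, 0)."""
    examples = [(center, context, 1)]
    negatives = [w for w in vocab if w not in (center, context)]
    examples += [(center, w, 0) for w in random.sample(negatives, k)]
    return examples

ex = training_examples("candy", "store", k=2)
print(len(ex))  # 3: one positive and two negatives, not |V| updates
```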
Example applications

• In machine learning: machine translation.
• In data mining: dimensionality reduction.
References

1. Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model.
2. Ronan Collobert and Jason Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning.
3. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space.
4. Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic regularities in continuous space word representations.

• Try the code: word2vec.googlecode.com
Questions?
Thank you for your attention!