4/29/19
BBM 495: WORD EMBEDDINGS
2018-2019 SPRING
§ How similar is pizza to pasta?
§ How related is pizza to Italy?
§ Representing words as vectors allows easy computation of similarity
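As a minimal sketch of this idea, cosine similarity between word vectors can be computed directly with NumPy. The three-dimensional vectors below are made up purely for illustration; real embeddings would be learned and have hundreds of dimensions.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings, invented for illustration only.
pizza = np.array([0.9, 0.8, 0.1])
pasta = np.array([0.8, 0.9, 0.2])
italy = np.array([0.5, 0.4, 0.9])

def cosine(a, b):
    """Cosine similarity: the dot product of length-normalized vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(pizza, pasta))  # high: similar foods
print(cosine(pizza, italy))  # lower, but still related
```

With vector representations, "how similar" and "how related" both reduce to one cheap arithmetic operation.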
§ Increase in size with vocabulary
§ Very high dimensional: require a lot of storage
§ Subsequent classification models have sparsity issues
§ Models are less robust
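A toy sketch of the problem with such sparse representations, using one-hot vectors (a toy vocabulary is assumed here): the dimensionality equals the vocabulary size, and every pair of distinct words is orthogonal, so the representation itself carries no notion of similarity.

```python
import numpy as np

vocab = ["pizza", "pasta", "italy", "the", "of"]  # toy vocabulary
V = len(vocab)

def one_hot(word):
    """One-hot vector: its dimensionality grows with the vocabulary size."""
    v = np.zeros(V)
    v[vocab.index(word)] = 1.0
    return v

# Distinct one-hot vectors are orthogonal: the dot product is always 0,
# so "pizza" looks no more similar to "pasta" than to "the".
print(np.dot(one_hot("pizza"), one_hot("pasta")))  # 0.0
print(np.dot(one_hot("pizza"), one_hot("pizza")))  # 1.0
```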
Predict!
§ Remember: two vectors are similar if they have a high dot product
§ Cosine similarity is just a normalized dot product
§ So: similarity(o, c) ∝ u_o · v_c
§ We will need to normalize to get a probability
§ We use the softmax to turn dot products into probabilities:

p(o | c) = exp(u_o · v_c) / Σ_w exp(u_w · v_c)

The dot product compares the similarity of o and c: a larger dot product means a larger probability. After taking the exponent, we normalize over all words w in the vocabulary.
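The softmax over dot products can be sketched in a few lines. The toy vocabulary size, embedding dimension, and random vectors below are assumptions for illustration, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 5, 4                      # toy vocabulary size and embedding dim
U = rng.normal(size=(V, d))      # outside ("context") vectors u_w, one per word
v_c = rng.normal(size=d)         # center-word vector

scores = U @ v_c                 # dot products u_w . v_c for every word w
probs = np.exp(scores) / np.exp(scores).sum()   # softmax normalization

print(probs)        # one probability per vocabulary word
print(probs.sum())  # sums to 1
```

Note that the word with the largest dot product always gets the largest probability, since exp is monotonic.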
§ Take gradients at each window
§ Go through the gradient for each center vector v in a window
§ In each window, we compute updates for all parameters that are being used in that window. For example: