![Page 1: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/1.jpg)
A word is worth a thousand vectors
(word2vec, lda, and introducing lda2vec)
Christopher Moody @ Stitch Fix
Welcome, thanks for coming, having me, organizer
NLP can be a messy affair because you have to teach a computer about the irregularities and ambiguities of the English language in this sort of hierarchical sparse nature in all the grammar
3rd trimester, pregnant“wears scrubs” — medicinetaking a trip — a fix for vacation clothing
power of word vectors promise is to sweep away a lot of issues
![Page 2: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/2.jpg)
About
@chrisemoody Caltech Physics PhD. in astrostats supercomputing sklearn t-SNE contributor Data Labs at Stitch Fix github.com/cemoody
Gaussian Processes t-SNE
chainer deep learning
Tensor Decomposition
![Page 3: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/3.jpg)
CreditLarge swathes of this talk are from
previous presentations by:
• Tomas Mikolov • David Blei • Christopher Olah • Radim Rehurek • Omer Levy & Yoav Goldberg • Richard Socher • Xin Rong • Tim Hopper
![Page 4: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/4.jpg)
word2vec
lda
1
23ld
a2vec
![Page 5: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/5.jpg)
1. king - man + woman = queen 2. Huge splash in NLP world 3. Learns from raw text 4. Pretty simple algorithm 5. Comes pretrained
word2vec
1. Learns what words mean — can solve analogies cleanly.1. Not treating words as blocks, but instead modeling relationships
2. Distributed representations form the basis of more complicated deep learning systems3. Shallow — not deep learning!
1. Power comes from this simplicity — super fast, lots of data4. Get a lot of mileage out of this
1. Don’t need to model the wikipedia corpus before starting your own
![Page 6: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/6.jpg)
word2vec
1. Set up an objective function 2. Randomly initialize vectors 3. Do gradient descent
![Page 7: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/7.jpg)
word
2vec
word2vec: learn word vector vin from it’s surrounding context
vin
1. Let’s talk about training first2. In SVD and n-grams we built a co-occurence and transition probability matrices3. Here we will learn the embedded representation directly, with no intermediates, update it w/ every example
![Page 8: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/8.jpg)
word
2vec
“The fox jumped over the lazy dog”Maximize the likelihood of seeing the words given the word over.
P(the|over) P(fox|over)
P(jumped|over) P(the|over) P(lazy|over) P(dog|over)
…instead of maximizing the likelihood of co-occurrence counts.
1. Context — the words surrounding the training word2. Naively assume P(*|over) is independent conditional on the training word3. Still a pretty simple assumption!
Conditioning on just *over* no other secret parameters or anything
![Page 9: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/9.jpg)
word
2vec
P(fox|over)
What should this be?
![Page 10: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/10.jpg)
word
2vec
P(vfox|vover)
Should depend on the word vectors.
P(fox|over)
Trying to learn the word vectors, so let’s start with those(we’ll randomly initialize them to begin with)
![Page 11: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/11.jpg)
word
2vec
Twist: we have two vectors for every word. Should depend on whether it’s the input or the output.
Also a context window around every input word.
“The fox jumped over the lazy dog”
P(vOUT|vIN)
![Page 12: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/12.jpg)
word
2vec
“The fox jumped over the lazy dog”
vIN
P(vOUT|vIN)
Twist: we have two vectors for every word. Should depend on whether it’s the input or the output.
Also a context window around every input word.
IN = training word
![Page 13: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/13.jpg)
word
2vec
“The fox jumped over the lazy dog”
vOUT
P(vOUT|vIN)
vIN
Twist: we have two vectors for every word. Should depend on whether it’s the input or the output.
Also a context window around every input word.
![Page 14: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/14.jpg)
word
2vec
“The fox jumped over the lazy dog”
vOUT
P(vOUT|vIN)
vIN
Twist: we have two vectors for every word. Should depend on whether it’s the input or the output.
Also a context window around every input word.
![Page 15: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/15.jpg)
word
2vec
“The fox jumped over the lazy dog”
vOUT
P(vOUT|vIN)
vIN
Twist: we have two vectors for every word. Should depend on whether it’s the input or the output.
Also a context window around every input word.
![Page 16: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/16.jpg)
word
2vec
“The fox jumped over the lazy dog”
vOUT
P(vOUT|vIN)
vIN
Twist: we have two vectors for every word. Should depend on whether it’s the input or the output.
Also a context window around every input word.
![Page 17: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/17.jpg)
word
2vec
“The fox jumped over the lazy dog”
vOUT
P(vOUT|vIN)
vIN
Twist: we have two vectors for every word. Should depend on whether it’s the input or the output.
Also a context window around every input word.
![Page 18: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/18.jpg)
word
2vec
“The fox jumped over the lazy dog”
vOUT
P(vOUT|vIN)
vIN
Twist: we have two vectors for every word. Should depend on whether it’s the input or the output.
Also a context window around every input word.
![Page 19: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/19.jpg)
word
2vec
P(vOUT|vIN)
“The fox jumped over the lazy dog”
vIN
Twist: we have two vectors for every word. Should depend on whether it’s the input or the output.
Also a context window around every input word.
![Page 20: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/20.jpg)
word
2vec
“The fox jumped over the lazy dog”
vOUT
P(vOUT|vIN)
vIN
Twist: we have two vectors for every word. Should depend on whether it’s the input or the output.
Also a context window around every input word.
…So that at a high level is what we want word2vec to do.
![Page 21: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/21.jpg)
word
2vec
“The fox jumped over the lazy dog”
vOUT
P(vOUT|vIN)
vIN
Twist: we have two vectors for every word. Should depend on whether it’s the input or the output.
Also a context window around every input word.
…So that at a high level is what we want word2vec to do.
![Page 22: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/22.jpg)
word
2vec
“The fox jumped over the lazy dog”
vOUT
P(vOUT|vIN)
vIN
Twist: we have two vectors for every word. Should depend on whether it’s the input or the output.
Also a context window around every input word.
…So that at a high level is what we want word2vec to do.
![Page 23: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/23.jpg)
word
2vec
“The fox jumped over the lazy dog”
vOUT
P(vOUT|vIN)
vIN
Twist: we have two vectors for every word. Should depend on whether it’s the input or the output.
Also a context window around every input word.
…So that at a high level is what we want word2vec to do.
![Page 24: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/24.jpg)
word
2vec
“The fox jumped over the lazy dog”
vOUT
P(vOUT|vIN)
vIN
Twist: we have two vectors for every word. Should depend on whether it’s the input or the output.
Also a context window around every input word.
…So that at a high level is what we want word2vec to do.
![Page 25: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/25.jpg)
word
2vec
“The fox jumped over the lazy dog”
vOUT
P(vOUT|vIN)
vIN
Twist: we have two vectors for every word. Should depend on whether it’s the input or the output.
Also a context window around every input word.
…So that at a high level is what we want word2vec to do.
two for loops
That’s it! A bit disengious to make this a giant network
![Page 26: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/26.jpg)
objectiv
e
Measure loss between vIN and vOUT?
vin . vout
How should we define P(vOUT|vIN)?
Now we’ve defined the high-level update path for the algorithm.
Need to define this prob exactly in order to define our updates.
Boils down to diff between in & out — want to make as similar as possible, and then the probability will go up.
Use cosine sim.
![Page 27: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/27.jpg)
word
2vec
vin . vout ~ 1
objectiv
e
vin
vout
Dot product has these properties:Similar vectors have similarly near 1
![Page 28: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/28.jpg)
word
2vec
objectiv
e
vin
vout
vin . vout ~ 0
Orthogonal vectors have similarity near 0
![Page 29: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/29.jpg)
word
2vec
objectiv
e
vin
vout
vin . vout ~ -1
Orthogonal vectors have similarity near -1
![Page 30: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/30.jpg)
word
2vec
vin . vout ∈ [-1,1]
objectiv
e
But the inner product ranges from -1 to 1 (when normalized)…and we’d like a probability
![Page 31: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/31.jpg)
word
2vec
But we’d like to measure a probability.
vin . vout ∈ [-1,1]
objectiv
e
But the inner product ranges from -1 to 1 (when normalized)…and we’d like a probability
![Page 32: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/32.jpg)
word
2vec
But we’d like to measure a probability.
softmax(vin . vout ∈ [-1,1])
objectiv
e
∈ [0,1]
Transform again using softmax
![Page 33: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/33.jpg)
word
2vec
But we’d like to measure a probability.
softmax(vin . vout ∈ [-1,1])
Probability of choosing 1 of N discrete items. Mapping from vector space to a multinomial over words.
objectiv
e
Similar to logistic function for binary outcomes, but instead for 1 of N outcomes.
So now we’re modeling the probability of a word showing up as the combination of the training word vector and the target word vector and transforming it to a 1 of N
![Page 34: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/34.jpg)
word
2vec
But we’d like to measure a probability.
exp(vin . vout ∈ [0,1])softmax ~
objectiv
e
So here’s the actual form of the equation — we normalize by the sum of all of the other possible pairs of word combinations
![Page 35: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/35.jpg)
word
2vec
But we’d like to measure a probability.
exp(vin . vout ∈ [-1,1])Σexp(vin . vk)
softmax =
objectiv
e
Normalization term over all words
k ∈ V
So here’s the actual form of the equation — we normalize by the sum of all of the other possible pairs of word combinations
two effectsmake vin and vout more similarmake vin and every other word less similar
![Page 36: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/36.jpg)
word
2vec
But we’d like to measure a probability.
exp(vin . vout ∈ [-1,1])Σexp(vin . vk)
softmax = = P(vout|vin)
objectiv
e
k ∈ V
This is the kernel of the word2vec. We’re just going to apply this operation every time we want to update the vectors.
For every word, we’re going to have a context window, and then for every pair of words in that window and the input word, we’ll measure this probability.
![Page 37: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/37.jpg)
word
2vec
Learn by gradient descent on the softmax prob.
For every example we see update vin
vin := vin + P(vout|vin)
objectiv
e
vout := vout + P(vout|vin)
…I won’t go through the derivation of the gradient, but this is the general idea
relatively simple, fast — fast enough to read billions of words in a day
![Page 38: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/38.jpg)
word2vec
explain table
![Page 39: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/39.jpg)
word2vec
if not convinced by qualitative results….
![Page 40: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/40.jpg)
![Page 41: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/41.jpg)
Showing just 2 of the ~500 dimensions. Effectively we’ve PCA’d it
![Page 42: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/42.jpg)
![Page 43: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/43.jpg)
![Page 44: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/44.jpg)
![Page 45: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/45.jpg)
![Page 46: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/46.jpg)
![Page 47: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/47.jpg)
![Page 48: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/48.jpg)
![Page 49: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/49.jpg)
![Page 50: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/50.jpg)
![Page 51: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/51.jpg)
![Page 52: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/52.jpg)
If we only had locality and not regularity, this wouldn’t necessarily be true
![Page 53: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/53.jpg)
![Page 54: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/54.jpg)
![Page 55: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/55.jpg)
So we live in a vector space where operations like addition and subtraction are meaningful.
So here’s a few examples of this working.
Really get the idea of these vectors as being ‘mixes’ of other ideas & vectors
![Page 56: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/56.jpg)
ITEM_3469 + ‘Pregnant’
SF is a person service
Box
![Page 57: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/57.jpg)
+ ‘Pregnant’
I love the stripes and the cut around my neckline was amazing
someone else might write ‘grey and black’
subtlety and nuance in that language
We have lots of this interaction — of order wikipedia amount — far too much to manually annotate anything
![Page 58: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/58.jpg)
= ITEM_701333 = ITEM_901004 = ITEM_800456
![Page 59: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/59.jpg)
Stripes and are safe for maternityAnd also similar tones and flowy — still great for expecting mothers
![Page 60: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/60.jpg)
what about?LDA?
![Page 61: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/61.jpg)
LDA on Client Item Descriptions
This shows the incredible amount of structure
![Page 62: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/62.jpg)
LDA on Item
Descriptions (with Jay)
clunky jewelrydangling delicate jewelry elsewhere
![Page 63: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/63.jpg)
LDA on Item
Descriptions (with Jay)
topics on patterns, styles — this cluster is similarly described as high contrast tops with popping colors
![Page 64: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/64.jpg)
LDA on Item
Descriptions (with Jay)
bright dresses for a warm summer
![Page 65: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/65.jpg)
LDA on Item
Descriptions (with Jay)
maternity line clothes
![Page 66: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/66.jpg)
LDA on Item
Descriptions (with Jay)
not just visual topics, but also topics about fit
![Page 67: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/67.jpg)
Latent style vectors from textPairwise gamma correlation
from style ratings
Diversity from ratings Diversity from text
Lots of structure in both — but the diversity much higher in the text
Maybe obvious: but the way people describe items is fundamentally richer than the style ratings
![Page 68: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/68.jpg)
lda vs word2vec
![Page 69: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/69.jpg)
word2vec is local: one word predicts a nearby word
“I love finding new designer brands for jeans”
as if the world where one very long text string. no end of documents, no end of sentence, etc.
and a window across words
![Page 70: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/70.jpg)
“I love finding new designer brands for jeans”
But text is usually organized.
as if the world where one very long text string. no end of documents, no end of sentence, etc.
![Page 71: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/71.jpg)
“I love finding new designer brands for jeans”
But text is usually organized.
as if the world where one very long text string. no end of documents, no end of sentence, etc.
![Page 72: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/72.jpg)
“I love finding new designer brands for jeans”
In LDA, documents globally predict words.
doc 7681
these are client comment which are short, only predict dozens of words
but could be legal documents, or medical documents, 10k words — here the difference between global and local algorithms is much more important
![Page 73: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/73.jpg)
[ -0.75, -1.25, -0.55, -0.12, +2.2] [ 0%, 9%, 78%, 11%]
typical word2vec vector typical LDA document vector
![Page 74: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/74.jpg)
typical word2vec vector
[ 0%, 9%, 78%, 11%]
typical LDA document vector
[ -0.75, -1.25, -0.55, -0.12, +2.2]
All sum to 100%All real values
![Page 75: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/75.jpg)
5D word2vec vector
[ 0%, 9%, 78%, 11%]
5D LDA document vector
[ -0.75, -1.25, -0.55, -0.12, +2.2]
Sparse All sum to 100%
Dimensions are absolute
Dense All real values
Dimensions relative
LDA is a *mixture* w2v is a bunch of real numbers — more like and *address*much easier to say to another human 78% of something rather than it is +2.2 of something and -1.25 of something else
![Page 76: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/76.jpg)
100D word2vec vector
[ 0%0%0%0%0% … 0%, 9%, 78%, 11%]
100D LDA document vector
[ -0.75, -1.25, -0.55, -0.27, -0.94, 0.44, 0.05, 0.31 … -0.12, +2.2]
Sparse All sum to 100%
Dimensions are absolute
Dense All real values
Dimensions relative
dense sparse
![Page 77: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/77.jpg)
100D word2vec vector
[ 0%0%0%0%0% … 0%, 9%, 78%, 11%]
100D LDA document vector
[ -0.75, -1.25, -0.55, -0.27, -0.94, 0.44, 0.05, 0.31 … -0.12, +2.2]
Similar in fewer ways (more interpretable)
Similar in 100D ways (very flexible)
+mixture +sparse
![Page 78: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/78.jpg)
can we do both? lda2vec
series of expgrain of saltvery new — no good quantitative results only qualitative (but promising!)
![Page 79: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/79.jpg)
The goal: Use all of this context to learn
interpretable topics.
P(vOUT |vIN)word2vec
@chrisemoody
Use this at SF. typical tablew2v will use w-w
![Page 80: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/80.jpg)
word2vecLDA P(vOUT |vDOC)
The goal: Use all of this context to learn
interpretable topics.
this document is 80% high fashion
this document is 60% style
@chrisemoody
LDA will use that doc ID columnyou can use this to steer the business as a whole
![Page 81: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/81.jpg)
word2vecLDA
The goal: Use all of this context to learn
interpretable topics.
this zip code is 80% hot climate
this zip code is 60% outdoors wear
@chrisemoody
But doesn’t predict word-to-word relationships.
in texas, maybe i want more lonestars & stirrup iconsin austin, maybe i want more bats
![Page 82: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/82.jpg)
word2vecLDA
The goal: Use all of this context to learn
interpretable topics.
this client is 80% sporty
this client is 60% casual wear
@chrisemoody
love to learn client topicsare there ‘types’ of clients? q every biz asks
so this is the promise of lda2vec
![Page 83: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/83.jpg)
lda2
vec
word2vec predicts locally: one word predicts a nearby word
P(vOUT |vIN)
vIN vOUT
“PS! Thank you for such an awesome top”
But doesn’t predict word-to-word relationships.
![Page 84: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/84.jpg)
lda2
vec
LDA predicts a word from a global context
doc_id=1846
P(vOUT |vDOC)
vOUTvDOC
“PS! Thank you for such an awesome top”
But doesn’t predict word-to-word relationships.
![Page 85: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/85.jpg)
lda2
vec
doc_id=1846
vIN vOUTvDOC
can we predict a word both locally and globally ?
“PS! Thank you for such an awesome top”
![Page 86: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/86.jpg)
lda2
vec
“PS! Thank you for such an awesome top”doc_id=1846
vIN vOUTvDOC
can we predict a word both locally and globally ?
P(vOUT |vIN+ vDOC)
doc vector captures long-distance dependencies
word vector captures short-distance
![Page 87: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/87.jpg)
lda2
vec
doc_id=1846
vIN vOUTvDOC
*very similar to the Paragraph Vectors / doc2vec
can we predict a word both locally and globally ?
“PS! Thank you for such an awesome top”
P(vOUT |vIN+ vDOC)
![Page 88: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/88.jpg)
lda2
vec
This works! 😀 But vDOC isn’t as interpretable as the LDA topic vectors. 😔
Too many documents. I really like that document X is 70% in topic 0, 30% in topic1, …
![Page 89: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/89.jpg)
lda2
vec
This works! 😀 But vDOC isn’t as interpretable as the LDA topic vectors. 😔
Too many documents. I really like that document X is 70% in topic 0, 30% in topic1, …
about as interpretable a hash
![Page 90: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/90.jpg)
lda2
vec
This works! 😀 But vDOC isn’t as interpretable as the LDA topic vectors. 😔
Too many documents. I really like that document X is 70% in topic 0, 30% in topic1, …
![Page 91: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/91.jpg)
lda2
vec
This works! 😀 But vDOC isn’t as interpretable as the LDA topic vectors. 😔
We’re missing mixtures & sparsity.
Too many documents. I really like that document X is 70% in topic 0, 30% in topic1, …
![Page 92: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/92.jpg)
lda2
vec
This works! 😀 But vDOC isn’t as interpretable as the LDA topic vectors. 😔
Let’s make vDOC into a mixture…
Too many documents. I really like that document X is 70% in topic 0, 30% in topic1, …
![Page 93: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/93.jpg)
lda2
vec
Let’s make vDOC into a mixture…
vDOC = a vtopic1 + b vtopic2 +… (up to k topics)
sum of other word vectors
intuition here is that ‘hanoi = vietnam + capital’ and lufthansa = ‘germany + airlines’
so we think that document vectors should also be some word vector + some word vector
![Page 94: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/94.jpg)
lda2
vec
Let’s make vDOC into a mixture…
vDOC = a vtopic1 + b vtopic2 +…Trinitarian baptismal
Pentecostals Bede
schismatics excommunication
twenty newsgroup dataset, free, canonical
![Page 95: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/95.jpg)
lda2
vec
Let’s make vDOC into a mixture…
vDOC = a vtopic1 + b vtopic2 +…
topic 1 = “religion” Trinitarian baptismal
Pentecostals Bede
schismatics excommunication
![Page 96: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/96.jpg)
lda2
vec
Let’s make vDOC into a mixture…
vDOC = a vtopic1 + b vtopic2 +…
Milosevic absentee
Indonesia Lebanese Isrealis
Karadzic
topic 1 = “religion” Trinitarian baptismal
Pentecostals Bede
schismatics excommunication
![Page 97: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/97.jpg)
lda2
vec
Let’s make vDOC into a mixture…
vDOC = a vtopic1 + b vtopic2 +…
topic 1 = “religion” Trinitarian baptismal
Pentecostals bede
schismatics excommunication
topic 2 = “politics” Milosevic absentee
Indonesia Lebanese Isrealis
Karadzic
purple a,b coefficients tell you how much it is that topic
![Page 98: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/98.jpg)
lda2
vec
Let’s make vDOC into a mixture…
vDOC = 10% religion + 89% politics +…
topic 2 = “politics” Milosevic absentee
Indonesia Lebanese Isrealis
Karadzic
topic 1 = “religion” Trinitarian baptismal
Pentecostals bede
schismatics excommunication
Doc is now 10% religion 89% politics
mixture models are powerful for interpretability
![Page 99: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/99.jpg)
lda2
vec
Let’s make vDOC sparse
[ -0.75, -1.25, …]
vDOC = a vreligion + b vpolitics +…
Now 1st time I did this…
Hard to interpret. What does -1.2 politics mean? math works, but not intuitive
![Page 100: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/100.jpg)
lda2
vec
Let’s make vDOC sparse
vDOC = a vreligion + b vpolitics +…
How much of this doc is in religion, how much in poltics
but doesn’t work when you have more than a few
![Page 101: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/101.jpg)
lda2
vec
Let’s make vDOC sparse
vDOC = a vreligion + b vpolitics +…
How much of this doc is in religion, how much in cars
but doesn’t work when you have more than a few
![Page 102: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/102.jpg)
lda2
vec
Let’s make vDOC sparse
{a, b, c…} ~ dirichlet(alpha)
vDOC = a vreligion + b vpolitics +…
trick we can steal from bayesian
make it dirichlet skipping technical detailsmake everything sum to 100%penalize non-zeroforce model to only make it non-zero w/ lots of evidence
![Page 103: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/103.jpg)
lda2
vec
Let’s make vDOC sparse
{a, b, c…} ~ dirichlet(alpha)
vDOC = a vreligion + b vpolitics +…
sparsity-inducing effect. similar to the lasso or l1 reg, but dirichletfew dimensions, sum to 100%I can say to the CEO, set of docs could have been in 100 topics, but we picked only the best topics
![Page 104: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/104.jpg)
word2vecLDA
P(vOUT |vIN + vDOC)lda2vec
The goal: Use all of this context to learn
interpretable topics.
@chrisemoody
this document is 80% high fashion
this document is 60% style
go back to our problem lda2vec is going to use all the info here
![Page 105: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/105.jpg)
word2vecLDA
P(vOUT |vIN+ vDOC + vZIP)lda2vec
The goal: Use all of this context to learn
interpretable topics.
@chrisemoody
add column = adding a termadd features in an ML model
![Page 106: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/106.jpg)
word2vecLDA
P(vOUT |vIN+ vDOC + vZIP)lda2vec
The goal: Use all of this context to learn
interpretable topics.
this zip code is 80% hot climate
this zip code is 60% outdoors wear
@chrisemoody
in addition to doc topics, like ‘rec SF’
![Page 107: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/107.jpg)
word2vecLDA
P(vOUT |vIN+ vDOC + vZIP +vCLIENTS)lda2vec
The goal: Use all of this context to learn
interpretable topics.
this client is 80% sporty
this client is 60% casual wear
@chrisemoody
client topics — sporty, casual, this is where if she says ‘3rd trimester’ — identify a future mother‘scrubs’ — medicine
![Page 108: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/108.jpg)
word2vecLDA
P(vOUT |vIN+ vDOC + vZIP +vCLIENTS) P(sold | vCLIENTS)
lda2vec
The goal: Use all of this context to learn
interpretable topics.
@chrisemoody
Can also make the topics supervised so that they predict
an outcome.
helps fine-tune topics so that correlate with your favorite business metricalign topics w/ expectationshelps us guess when revenue goes up what the leading causes are
![Page 109: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/109.jpg)
github.com/cemoody/lda2vec
uses pyldavisAPI Ref docs (no narrative docs) GPU Decent test coverage
@chrisemoody
![Page 110: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/110.jpg)
“PS! Thank you for such an awesome idea”
@chrisemoody
doc_id=1846
Can we model topics to sentences? lda2lstm
SF is all about mixing cutting edge algorithms but we absolutely need interpretability. human component to algos is not negotiable
Could we demand the model make us a sentence that is 80% religion, 10% politics?
classify word level, LSTM on sentence, LDA on document level
![Page 111: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/111.jpg)
“PS! Thank you for such an awesome idea”
@chrisemoody
doc_id=1846
Can we represent the internal LSTM states as a dirichlet mixture?
Dirichlet-squeeze internal states and manipulations, that maybe will help us understand the science of LSTM dynamics — because seriously WTF is going on there
![Page 112: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/112.jpg)
Can we model topics to sentences? lda2lstm
“PS! Thank you for such an awesome idea”doc_id=1846
@chrisemoody
Can we model topics to images? lda2ae
TJ Torres
Can we also extend this to image generation? TJ is working on a ridiculous VAE/GAN model… can we throw in a topic model? Can we say make me an image that is 80% sweater, and 10% zippers, and 10% elbow patches?
![Page 114: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/114.jpg)
![Page 115: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/115.jpg)
![Page 116: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/116.jpg)
Bonus slides
![Page 117: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/117.jpg)
![Page 118: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/118.jpg)
![Page 119: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/119.jpg)
Crazy Approaches
Paragraph Vectors (Just extend the context window)
Content dependency (Change the window grammatically)
Social word2vec (deepwalk) (Sentence is a walk on the graph)
Spotify (Sentence is a playlist of song_ids)
Stitch Fix (Sentence is a shipment of five items)
![Page 120: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/120.jpg)
See previous
![Page 121: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/121.jpg)
CBOW
“The fox jumped over the lazy dog”
Guess the word given the context
~20x faster. (this is the alternative.)
vOUT
vIN vINvIN vINvIN vIN
SkipGram
“The fox jumped over the lazy dog”
vOUT vOUT
vIN
vOUT vOUT vOUTvOUT
Guess the context given the word
Better at syntax. (this is the one we went over)
CBOW sums words vectors, loses the order in the sentenceBoth are good at semantic relationships Child and kid are nearby Or gender in man, woman If you blur words over the scale of context — 5ish words, you lose a lot grammatical nuanceBut skipgram preserves order Preserves the relationship in pluralizing, for example
![Page 122: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/122.jpg)
Shows that are many words similar to vacation actually come in lots of flavors — wedding words (bachelorette, rehearsals)— holiday/event words (birthdays, brunch, christmas, thanksgiving)— seasonal words (spring, summer,)— trip words (getaway)— destinations
![Page 123: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/123.jpg)
LDA Results
context
History
I loved every choice in this fix!! Great job!
Great Stylist Perfect
![Page 124: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/124.jpg)
LDA Results
context
History
Body Fit
My measurements are 36-28-32. If that helps. I like wearing some clothing that is fitted.
Very hard for me to find pants that fit right.
![Page 125: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/125.jpg)
LDA Results
context
History
Sizing
Really enjoyed the experience and the pieces, sizing for tops was too big.
Looking forward to my next box!
Excited for next
![Page 126: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/126.jpg)
LDA Results
context
History
Almost Bought
It was a great fix. Loved the two items I kept and the three I sent back were close!
Perfect
![Page 127: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/127.jpg)
What I didn’t mention
A lot of text (only if you have a specialized vocabulary)
Cleaning the text
Memory & performance
Traditional databases aren’t well-suited
False positives
hundreds of millions of words, 1,000 books, 500,000 comments, or 4,000,000 tweets
high-memory and high-performance multicore machine. Training can take several hours to several days but shouldn't need frequent retraining.
If you use pretrained vectors, then this isn't an issue.
Databases. Modern SQL systems aren't well-suited to performing the vector addition, subtraction and multiplication searching in vector space requires. There are a few libraries that will help you quickly find the most similar items12: annoy, ball trees, locality-sensitive hashing (LSH) or FLANN.
False-positives & exactness. Despite the impressive results that come with word vectorization, no NLP technique is perfect. Take care that your system is robust to results that a computer deems relevant but an expert human wouldn't.
![Page 128: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/128.jpg)
and now for something completely crazy
![Page 129: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/129.jpg)
All of the following ideas will change what ‘words’ and ‘context’ represent.
But we’ll still use the same w2v algo
![Page 130: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/130.jpg)
parag
raph
vecto
r
What about summarizing documents?
On the day he took office, President Obama reached out to America’s enemies, offering in his first inaugural address to extend a hand if you are willing to unclench your fist. More than six years later, he has arrived at a moment of truth in testing that
![Page 131: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/131.jpg)
On the day he took office, President Obama reached out to America’s enemies, offering in his first inaugural address to extend a hand if you are willing to unclench your fist. More than six years later, he has arrived at a moment of truth in testing that
The framework nuclear agreement he reached with Iran on Thursday did not provide the definitive answer to whether Mr. Obama’s audacious gamble will pay off. The fist Iran has shaken at the so-called Great Satan since 1979 has not completely relaxed.
parag
raph
vecto
r
Normal skipgram extends C words before, and C words after.
IN
OUT OUT
Except we stay inside a sentence
![Page 132: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/132.jpg)
On the day he took office, President Obama reached out to America’s enemies, offering in his first inaugural address to extend a hand if you are willing to unclench your fist. More than six years later, he has arrived at a moment of truth in testing that
The framework nuclear agreement he reached with Iran on Thursday did not provide the definitive answer to whether Mr. Obama’s audacious gamble will pay off. The fist Iran has shaken at the so-called Great Satan since 1979 has not completely relaxed.
parag
raph
vecto
r
A document vector simply extends the context to the whole document.
IN
OUT OUT
OUT OUTdoc_1347
![Page 133: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/133.jpg)
fromgensim.modelsimportDoc2Vecfn=“item_document_vectors”model=Doc2Vec.load(fn)model.most_similar('pregnant')matches=list(filter(lambdax:'SENT_'inx[0],matches))
#['...Iamcurrently23weekspregnant...',#'...I'mnow10weekspregnant...',#'...notshowingtoomuchyet...',#'...15weeksnow.Babybump...',#'...6weekspostpartum!...',#'...12weekspostpartumandamnursing...',#'...Ihavemybabyshowerthat...',#'...amstillbreastfeeding...',#'...Iwouldloveanoutfitforababyshower...']
sente
nce
sear
ch
![Page 134: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/134.jpg)
translation
(using just a rotation matrix)
Miko
lov
2013
English
Spanish
Matrix Rotation
Blows my mind
Explain plot
Not a complicated NN here
Still have to learn the rotation matrix — but it generalizes very nicely.
Have analogies for every linalg op as a linguistic operator: + and - and matrix multiplies
Robust framework and new tools to do science on words
![Page 135: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/135.jpg)
context dependent
Levy
& G
oldberg
2014
Australian scientist discovers star with telescopecontext +/- 2 words
![Page 136: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/136.jpg)
context dependent
context
Australian scientist discovers star with telescope
Levy
& G
oldberg
2014
What if we
![Page 137: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/137.jpg)
context dependent
context
Australian scientist discovers star with telescopecontext
Levy
& G
oldberg
2014
![Page 138: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/138.jpg)
context dependent
context
BoW DEPS
topically-similar vs ‘functionally’ similar
Levy
& G
oldberg
2014
![Page 139: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/139.jpg)
context dependent
context
Levy
& G
oldberg
2014
Also show that SGNS is simply factorizing:
w * c = PMI(w, c) - log kThis is completely amazing!
Intuition: positive associations (canada, snow) stronger in humans than negative associations
(what is the opposite of Canada?)
Also means we can do SVD-like techniques to get a convex w2v, uses fast lining libs, uses compressed word count matrix so also better storage…. but not online
![Page 140: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/140.jpg)
deepwalk
Perozz
i
et al 2
014
learn word vectors from sentences
“The fox jumped over the lazy dog”
vOUT vOUT vOUT vOUT vOUTvOUT
‘words’ are graph vertices ‘sentences’ are random walks on the graph
word2vec
![Page 141: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/141.jpg)
Playlists at Spotify
context
sequence
lear
ning
‘words’ are songs ‘sentences’ are playlists
![Page 142: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/142.jpg)
Playlists at Spotify
contextErik
Bernhar
dsson
Great performance on ‘related artists’
![Page 143: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/143.jpg)
Fixes at Stitch Fix
sequence
lear
ning
Let’s try: ‘words’ are styles ‘sentences’ are fixes
![Page 144: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/144.jpg)
Fixes at Stitch Fix
context
Learn similarity between styles because they co-occur
Learn ‘coherent’ styles
sequence
lear
ning
![Page 145: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/145.jpg)
Fixes at Stitch Fix?
context
sequence
lear
ningGot lots of structure!
![Page 146: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/146.jpg)
Fixes at Stitch Fix?
context
sequence
lear
ning
![Page 147: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/147.jpg)
Fixes at Stitch Fix?
context
sequence
lear
ning
Nearby regions are consistent ‘closets’
![Page 148: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/148.jpg)
![Page 149: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/149.jpg)
![Page 150: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/150.jpg)
A specific lda2vec model
Our text blob is a comment that comes from a region_id and a style_id
![Page 151: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/151.jpg)
![Page 152: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/152.jpg)
![Page 153: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/153.jpg)
![Page 154: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/154.jpg)
![Page 155: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/155.jpg)
![Page 156: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/156.jpg)
![Page 157: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/157.jpg)
![Page 158: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/158.jpg)
![Page 159: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/159.jpg)
Can measure similarity between topic vectors m and n, and word vectors w
This gets you the ‘top’ words in a topic, can figure out what that topic is
![Page 160: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/160.jpg)
![Page 161: word2vec, LDA, and introducing a new hybrid algorithm: lda2vec](https://reader035.vdocuments.us/reader035/viewer/2022081722/587a52a51a28ab520b8b4af1/html5/thumbnails/161.jpg)
lda2
vec
Let’s make vDOC into a mixture…
vDOC = 10% religion + 89% politics +…
topic 2 = “politics” Milosevic absentee
Indonesia Lebanese Isrealis
Karadzic
topic 1 = “religion” Trinitarian baptismal
Pentecostals bede
schismatics excommunication
This is now on the 20 newsgroups dataset…
Doc is now 10% religion 89% politics
mixture models are powerful for interpretability