Distributed Representation of Sentences and Documents


Page 1

Distributed Representations of Words and Phrases and their Compositionality

Abdullah Khan Zehady

Page 2

Neural Word Embedding

● Continuous vector space representation
  o Words are represented as dense real-valued vectors in R^d.
● Distributed word representation ↔ word embedding
  o Embed an entire vocabulary into a relatively low-dimensional linear space where the dimensions are latent continuous features.
● The classical n-gram model works in terms of discrete units.
  o There is no inherent relationship between n-grams.
● In contrast, word embeddings capture regularities and relationships between words.
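To make the lookup concrete, here is a minimal sketch (NumPy; the vocabulary, sizes, and names such as E and embed are illustrative, not from the slides) of how an embedding layer maps discrete word IDs to dense vectors in R^d:

```python
import numpy as np

# Toy vocabulary of V words, embedded in d dimensions (in practice d << V).
vocab = ["king", "queen", "man", "woman", "river"]
V, d = len(vocab), 4
word_to_id = {w: i for i, w in enumerate(vocab)}

# The embedding matrix is V x d; row i is the dense vector of word i.
# Random here; training (e.g., word2vec) is what makes the rows meaningful.
rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(V, d))

def embed(word):
    # The lookup replaces a sparse 1-of-V vector with a dense row of E.
    return E[word_to_id[word]]

print(embed("king"))  # a dense real-valued vector in R^d
```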

Page 3

Syntactic & Semantic Relationships

Regularities are observed as a roughly constant offset vector between pairs of words sharing the same relationship.

● Gender relation: KING − QUEEN ≈ MAN − WOMAN
● Singular/plural relation: KING − KINGS ≈ QUEEN − QUEENS
● Other relations:
  o Language: France − French ≈ Spain − Spanish
  o Past tense: Go − Went ≈ Capture − Captured
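A hedged sketch of how such offsets support analogy queries (hand-built 3-d toy vectors chosen so the arithmetic works out; real systems use learned embeddings and search the whole vocabulary by cosine similarity):

```python
import numpy as np

# Toy vectors: dimension 0 ~ royalty, dimension 1 ~ gender.
vec = {
    "king":  np.array([1.0,  1.0, 0.0]),
    "queen": np.array([1.0, -1.0, 0.0]),
    "man":   np.array([0.0,  1.0, 0.0]),
    "woman": np.array([0.0, -1.0, 0.0]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# "king - man + woman ~ ?": take the offset and find its nearest neighbor,
# excluding the three query words themselves.
target = vec["king"] - vec["man"] + vec["woman"]
best = max((w for w in vec if w not in ("king", "man", "woman")),
           key=lambda w: cosine(vec[w], target))
print(best)  # -> queen
```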

Page 4

Neural Net

Page 5

Language Model (LM)

Different models for estimating continuous representations of words:

● Latent Semantic Analysis (LSA)
● Latent Dirichlet Allocation (LDA)
● Neural Network Language Model (NNLM)

Page 6

Feed-Forward NNLM

● Consists of input, projection, hidden, and output layers.
● The N previous words are encoded using 1-of-V coding, where V is the size of the vocabulary. Ex: A = (1,0,...,0), B = (0,1,...,0), …, Z = (0,0,...,1) in R^26.
● The NNLM becomes computationally complex between the projection (P) and hidden (H) layers: for N = 10, the size of P is 500–2000 and the size of H is 500–1000.
● The hidden layer is used to compute a probability distribution over all the words in the vocabulary V, which is expensive; hierarchical softmax comes to the rescue.
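For scale, the per-example training complexity of this architecture is commonly written as follows (this formula is from Mikolov et al.'s companion paper "Efficient Estimation of Word Representations in Vector Space", not from the slide itself):

```latex
Q = N \times D + N \times D \times H + H \times V
```

Each of the N context words is projected to D dimensions, H is the hidden-layer size, and the H × V output term dominates; hierarchical softmax cuts that term to roughly H × log2(V).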

Page 7

Recurrent NNLM

● No projection layer; consists of input, hidden, and output layers only.
● No need to specify the context length in advance, unlike the feed-forward NNLM.
● What is special in the RNN model? The recurrent matrix that connects the hidden layer to itself.

Page 8

Recurrent NNLM

● w(t): input word at time t
● y(t): output layer; produces a probability distribution over words
● s(t): hidden layer
● U: each column represents a word

The RNN is trained with backpropagation to maximize the log-likelihood.
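The slide's network diagram is not in the transcript. As a sketch of the standard recurrence behind this notation (following Mikolov's recurrent NNLM; W here is the recurrent matrix and V the output weight matrix, not the vocabulary size):

```latex
s(t) = f\big(U\,w(t) + W\,s(t-1)\big), \qquad y(t) = g\big(V\,s(t)\big)
```

where f is the sigmoid and g is the softmax that turns the output scores into a probability distribution over words.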

Page 9

Continuous Bag-of-Words (CBOW) Model
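The slide presents the model only as a figure. As a minimal sketch (toy NumPy code; sizes and names like W_in, W_out, and cbow_probs are illustrative), CBOW averages the input vectors of the surrounding context words and predicts the center word with a softmax over the vocabulary:

```python
import numpy as np

V, d = 1000, 50                              # toy vocabulary and embedding sizes
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, d))    # input (context) word vectors
W_out = rng.normal(scale=0.1, size=(V, d))   # output (center-word) vectors

def cbow_probs(context_ids):
    # Average the context word vectors into one hidden representation;
    # word order inside the window is ignored, hence "bag of words".
    h = W_in[context_ids].mean(axis=0)
    # Score every vocabulary word as the center word, then softmax.
    scores = W_out @ h
    e = np.exp(scores - scores.max())
    return e / e.sum()

p = cbow_probs([3, 17, 42, 99])    # two words on each side of the center
print(p.shape, round(p.sum(), 6))  # (1000,) 1.0
```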

Page 10

Hierarchical Softmax
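Only the title survives in the transcript. In the paper's formulation, hierarchical softmax arranges the vocabulary as the leaves of a binary tree (word2vec uses a Huffman tree) and defines

```latex
p(w \mid w_I) = \prod_{j=1}^{L(w)-1} \sigma\Big( [\![\, n(w,j+1) = \mathrm{ch}(n(w,j)) \,]\!] \cdot {v'_{n(w,j)}}^{\top} v_{w_I} \Big)
```

where n(w, j) is the j-th node on the path from the root to w, L(w) is that path's length, ch(n) is an arbitrary fixed child of n, and [[x]] is 1 if x is true and -1 otherwise. This reduces the cost of computing one probability from O(V) to O(log2 V).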

Pages 11–12

Negative Sampling

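The equations on these two slides are not captured in the transcript. The paper's negative-sampling objective, used in place of the log-softmax term for a training pair (w_I, w_O), is

```latex
\log \sigma\big({v'_{w_O}}^{\top} v_{w_I}\big) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)} \Big[ \log \sigma\big(-{v'_{w_i}}^{\top} v_{w_I}\big) \Big]
```

where k negative words are drawn from a noise distribution P_n(w). The paper reports that the unigram distribution raised to the 3/4 power works best, with k = 5–20 for small datasets and k = 2–5 for large ones.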

Page 13

Subsampling of Frequent words
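Only the title is preserved. The paper's subsampling rule discards each occurrence of a word w_i in the training set with probability

```latex
P(w_i) = 1 - \sqrt{\frac{t}{f(w_i)}}
```

where f(w_i) is the frequency of w_i and t is a chosen threshold (around 10^{-5} in the paper). This aggressively thins very frequent words such as "the" while leaving rare words untouched, speeding up training and improving the vectors of rare words.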

Page 14

Skip-gram model
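The model figure is not in the transcript. The skip-gram objective from the paper is to maximize the average log probability of the context words given each center word:

```latex
\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\; j \ne 0} \log p(w_{t+j} \mid w_t), \qquad p(w_O \mid w_I) = \frac{\exp\big({v'_{w_O}}^{\top} v_{w_I}\big)}{\sum_{w=1}^{V} \exp\big({v'_{w}}^{\top} v_{w_I}\big)}
```

where c is the context window size. The full-softmax denominator over all V words is exactly what hierarchical softmax and negative sampling are introduced to avoid.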

Page 15

Empirical Results

Page 16

Skip-gram model

Page 17

Learning Phrases
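Only the title survives here. The paper identifies phrases with a simple data-driven bigram score,

```latex
\mathrm{score}(w_i, w_j) = \frac{\mathrm{count}(w_i w_j) - \delta}{\mathrm{count}(w_i) \times \mathrm{count}(w_j)}
```

joining bigrams whose score exceeds a threshold into single tokens (e.g., "New_York"). The discount δ prevents forming phrases from very infrequent word pairs, and the pass is run a few times with decreasing thresholds so longer phrases can form.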

Page 18

Phrase skip-gram results

Pages 19–20

Additive compositionality
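The slide's examples are not in the transcript. The paper demonstrates additive compositionality with element-wise vector sums: vec("Russia") + vec("river") is close to vec("Volga River"), and vec("Germany") + vec("capital") is close to vec("Berlin"). This works because word vectors are trained to predict the surrounding context log-linearly, so the sum of two vectors behaves roughly like an AND over their contexts.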

Pages 21–22

Compare with published word representations

Pages 23–24

Skip-gram model

Page 25

Skip-gram model