Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Navid Rekabsaz ([email protected]) Mihai Lupu ([email protected]) Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity NeuIR at SIGIR, 21st July 2016 Navid Rekabsaz, Mihai Lupu, Allan Hanbury @NRekabsaz [email protected]

Upload: navid-rekabsaz

Post on 14-Apr-2017


TRANSCRIPT

Page 1: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Navid Rekabsaz ([email protected]) Mihai Lupu ([email protected])

Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

NeuIR at SIGIR, 21st July 2016

Navid Rekabsaz, Mihai Lupu, Allan Hanbury

@NRekabsaz [email protected]

Page 2: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Similar Terms

Page 3: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Similar Terms

book: books 0.82, foreword 0.77, author 0.74, published 0.73, preface 0.69, republished 0.68, reprinted 0.68, afterword 0.67, memoir 0.67

dwarfish: corpulent 0.44, hideous 0.43, unintelligent 0.42, wizened 0.42, catoblepas 0.42, creature 0.42, humanoid 0.41, grotesquely 0.41, tomtar 0.41

Page 4: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

TopN:

Similar Terms


Page 5: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

TopN:

Similar Terms

Using a threshold (SIGIR 2016 papers, tuned by brute-force search):

• Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising. Mihajlo Grbovic et al.
• Robust and Collective Entity Disambiguation through Semantic Embeddings. Stefan Zwicklbauer et al.
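The difference between the two selection strategies can be sketched on the neighbor lists shown on the slides. This is only an illustration: the function names and the cut-off value 0.7 are assumptions chosen for the example, not values from the talk.

```python
# Neighbor lists as read from the slides: (term, similarity).
NEIGHBORS = {
    "book": [("books", 0.82), ("foreword", 0.77), ("author", 0.74),
             ("published", 0.73), ("preface", 0.69), ("republished", 0.68),
             ("reprinted", 0.68), ("afterword", 0.67), ("memoir", 0.67)],
    "dwarfish": [("corpulent", 0.44), ("hideous", 0.43),
                 ("unintelligent", 0.42), ("wizened", 0.42),
                 ("catoblepas", 0.42), ("creature", 0.42),
                 ("humanoid", 0.41), ("grotesquely", 0.41),
                 ("tomtar", 0.41)],
}

# Hypothetical cut-off, used only to illustrate the idea.
ILLUSTRATIVE_THRESHOLD = 0.7


def top_n(term, n):
    """Keep the n most similar neighbors, whatever their similarity is."""
    ranked = sorted(NEIGHBORS[term], key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:n]]


def above_threshold(term, threshold):
    """Keep only neighbors whose similarity reaches the threshold."""
    return [word for word, s in NEIGHBORS[term] if s >= threshold]


for term in NEIGHBORS:
    print(term, "TopN(3):", top_n(term, 3))
    print(term, "threshold:", above_threshold(term, ILLUSTRATIVE_THRESHOLD))
```

With TopN, "dwarfish" always receives n neighbors even though none of them is closely related; with a threshold, a term may receive no neighbors at all.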


Page 6: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Contribution

Page 7: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Our Contribution

• Analytical exploration of a general threshold on term similarity
• Defined for all the terms in the lexicon
• Tested on ad hoc retrieval
• Showing the advantage of a threshold-based over a TopN-based approach

Contribution

Page 8: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Roadmap

• Uncertainty in NN-based word embeddings
• Similarity distribution
• Proposing threshold
• Experiments


Page 9: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Observations

Page 10: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Observations

W2V

W2V: Word2Vec SkipGram with 200 Dimensions

Page 11: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Observations

[Figure: W2V panel, terms "Book" and "Dwarfish"]

Page 12: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Observations


Page 13: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Observations

[Figure: two W2V panels, terms "Book" and "Dwarfish"]

Page 14: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Observations

[Figure: two W2V panels, each with the terms "Book" and "Dwarfish"]

Page 15: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Observations


Page 16: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Observations

[Figure: two W2V panels, each with the terms "Book" and "Dwarfish"; second panel marked "X 4"]

Page 17: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Observations

Similarity values: Base

Average of similarity values of 5 models: Avg

[Figure: two W2V panels, each with the terms "Book" and "Dwarfish"; second panel marked "X 4"]

Page 18: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Uncertainty

Page 19: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Uncertainty

Uncertainty:

Page 20: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Uncertainty

• Arbitrary term: average of 100 representative terms [Schnabel et al. 2015]

Page 21: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Uncertainty

• Less uncertainty at higher similarity values
• Less uncertainty in higher dimensions

Page 22: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Similarity Probability Distribution

• Similarity between terms as a probability distribution instead of concrete values
• Assumption: normal distribution based on 5 observed similarities
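A minimal sketch of this assumption: fit a normal distribution to the similarity values that five separately trained models give for one term pair. The five numbers below are hypothetical, and using the sample standard deviation is an assumption; the slides do not say which estimator was used.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical similarities of one term pair in 5 trained models.
observed = [0.71, 0.68, 0.74, 0.70, 0.72]

mu = mean(observed)
sigma = stdev(observed)  # sample standard deviation (assumed estimator)
similarity = NormalDist(mu, sigma)


def prob_at_least(x):
    """Probability that the pair's similarity is at least x."""
    return 1.0 - similarity.cdf(x)


print(f"mean={mu:.3f} std={sigma:.3f}")
print(f"P(similarity >= 0.70) = {prob_at_least(0.70):.3f}")
```

The standard deviation here plays the role of the pair's uncertainty: the wider the distribution, the less a single model's similarity value can be trusted.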

Page 23: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Mixture of Cumulative Similarity Distributions

• Y axis: number of neighbors located in the space between the X value and the term, i.e. with similarity of at least X
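One way to read this plot: for a given term, sum over all other terms the probability that their similarity is at least x. The sketch below does that for four hypothetical neighbors, each described by an assumed mean and standard deviation of its similarity to the term.

```python
from statistics import NormalDist

# Hypothetical similarity distributions between one term and 4 other terms.
neighbor_dists = [
    NormalDist(0.80, 0.02),
    NormalDist(0.72, 0.03),
    NormalDist(0.45, 0.05),
    NormalDist(0.30, 0.06),
]


def expected_neighbors(x, dists=neighbor_dists):
    """Expected number of neighbors with similarity of at least x."""
    return sum(1.0 - d.cdf(x) for d in dists)


for x in (0.2, 0.5, 0.76, 0.9):
    print(f"x={x:.2f} expected neighbors={expected_neighbors(x):.3f}")
```

The curve is non-increasing in x: it starts at the total number of terms for very low x and falls to zero as x approaches the highest similarities.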

Page 24: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Mixture of Cumulative Similarity Distributions


Page 25: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Mixture of Cumulative Similarity Distributions


Page 26: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Mixture of Cumulative Similarity Distributions


Page 27: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Mixture of Cumulative Similarity Distributions


Page 28: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Filtering Neighbors

What is the best threshold for filtering the related terms?

Page 29: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Filtering Neighbors

Hypothesis: the threshold can be estimated based on the average number of synonyms over the terms

Page 30: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Filtering Neighbors

What is the expected number of synonyms for a word in English?

Page 31: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Filtering Neighbors

# of terms: 147306
Average # of synonyms per term: 1.6
Standard deviation: 3.1

Page 32: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Threshold

• Proposed threshold: cumulative frequency equal to 1.6
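A rough sketch of how such a threshold could be located: average the expected neighbor count over all terms and search for the similarity value at which it equals 1.6. The two-term lexicon and its (mean, standard deviation) pairs are hypothetical, and bisection is just one convenient search method, not necessarily the one used in the talk.

```python
from statistics import NormalDist

TARGET = 1.6  # average number of synonyms per term, from the slides

# Hypothetical lexicon: per term, (mean, std) of similarity to its neighbors.
lexicon = {
    "book": [(0.82, 0.02), (0.77, 0.02), (0.74, 0.03), (0.40, 0.05)],
    "dwarfish": [(0.44, 0.04), (0.43, 0.04), (0.42, 0.05), (0.20, 0.05)],
}


def avg_expected_neighbors(x):
    """Expected number of neighbors with similarity >= x, averaged over terms."""
    total = 0.0
    for params in lexicon.values():
        total += sum(1.0 - NormalDist(m, s).cdf(x) for m, s in params)
    return total / len(lexicon)


def find_threshold(target, lo=-1.0, hi=1.0, iterations=60):
    """Bisection; valid because the averaged count is non-increasing in x."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if avg_expected_neighbors(mid) > target:
            lo = mid  # too many neighbors: move the threshold up
        else:
            hi = mid  # too few neighbors: move the threshold down
    return (lo + hi) / 2


threshold = find_threshold(TARGET)
print(f"threshold={threshold:.4f}")
```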

Page 33: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Threshold

Page 34: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Threshold

Page 35: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Threshold

Page 36: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Experiments Setup

• Translation Language Model with Dirichlet smoothing
• Use word embedding similarity value for translation probability (Zuccon et al. [2015])
• Apply the proposed threshold to select the similar words
• Brute-force search to find the optimal threshold
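The setup above can be sketched as a query-likelihood scorer in which each document term may "translate" into a query term with a probability derived from embedding similarity. Everything below is an illustrative assumption: the toy collection, the value of mu, and the choice to normalize similarities over the vocabulary. The slides do not give the exact formula.

```python
import math
from collections import Counter

# Hypothetical toy collection; not one of the test collections from the talk.
DOCS = {
    "d1": ["foreword", "author", "books"],
    "d2": ["creature", "humanoid", "creature"],
    "d3": ["book", "memoir"],
}
# Similarities kept after thresholding; other pairs are treated as unrelated.
SIM = {("book", "books"): 0.82, ("book", "foreword"): 0.77}

COLLECTION = Counter(t for doc in DOCS.values() for t in doc)
COLLECTION_LEN = sum(COLLECTION.values())
VOCAB = sorted(COLLECTION)


def sim(a, b):
    """Symmetric similarity; pairs removed by the threshold count as 0."""
    if a == b:
        return 1.0
    return SIM.get((a, b), SIM.get((b, a), 0.0))


def p_translate(q, w):
    """Probability of translating document term w into query term q.
    Assumption: similarities to w are normalized over the vocabulary."""
    norm = sum(sim(u, w) for u in VOCAB)
    return sim(q, w) / norm


def score(query, doc, mu=10.0):
    """Log query likelihood: translation model plus Dirichlet smoothing."""
    tf = Counter(doc)
    dlen = len(doc)
    log_p = 0.0
    for q in query:
        p_t = sum(p_translate(q, w) * tf[w] / dlen for w in tf)
        p_c = COLLECTION[q] / COLLECTION_LEN  # collection language model
        log_p += math.log((dlen * p_t + mu * p_c) / (dlen + mu))
    return log_p


scores = {name: score(["book"], doc) for name, doc in DOCS.items()}
print(scores)
```

In this toy run, d1 does not contain "book" but scores above d2 because "books" and "foreword" pass the threshold, while d2 is scored by the smoothing term alone.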

Page 37: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Experiments Setup

• Test collections:

Page 38: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Experiments Setup

• Baseline: language model (Dirichlet smoothing)
• Significance test: t-test, p < 0.05

Page 39: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Experiments Results

• Gain of MAP over baseline, averaged over four collections.
• Conclusion 1: The optimal threshold is either the same as, or within the confidence interval of, the proposed threshold.

Page 40: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Experiments Results

Page 41: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Threshold vs. TopN

• Conclusion 2: Threshold outperforms TopN

[Figure panels: Threshold-based, TopN]

Page 42: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Take Home Message

• Uncertainty in neural network word embeddings:
  • depends on similarity value
  • depends on dimensionality
• Threshold to filter unrelated terms:
  • Proposed threshold as good as optimal threshold
  • Threshold approach much better than Top-N approach

Page 43: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

@NRekabsaz [email protected]

Questions?

Ideas!

Follow-up paper in CIKM 2016: Generalizing Translation Models in the Probabilistic Relevance Framework. Navid Rekabsaz, Mihai Lupu, Allan Hanbury

Page 44: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Dimensionality

Page 45: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

Proposed vs. Optimal Threshold