Linguistic Regularities in Sparse and Explicit Word Representations

Omer Levy, Yoav Goldberg
Bar-Ilan University, Israel


Post on 01-Jan-2016



TRANSCRIPT

Page 1: Linguistic Regularities in  Sparse and Explicit  Word Representations

Linguistic Regularities in Sparse and Explicit Word Representations

Omer Levy, Yoav Goldberg
Bar-Ilan University, Israel

Page 2: Linguistic Regularities in  Sparse and Explicit  Word Representations

Papers in ACL 2014*

[Chart with two categories: "Neural Networks & Word Embeddings" and "Other Topics"]

* Sampling error: +/- 100%

Page 3: Linguistic Regularities in  Sparse and Explicit  Word Representations

Neural Embeddings

Page 4: Linguistic Regularities in  Sparse and Explicit  Word Representations

Representing words as vectors is not new!

Page 5: Linguistic Regularities in  Sparse and Explicit  Word Representations

Explicit Representations (Distributional)

Page 6: Linguistic Regularities in  Sparse and Explicit  Word Representations

Questions

• Are analogies unique to neural embeddings?
  Compare neural embeddings with explicit representations

• Why does vector arithmetic reveal analogies?
  Unravel the mystery behind neural embeddings and their “magic”

Page 7: Linguistic Regularities in  Sparse and Explicit  Word Representations

Background

Page 10: Linguistic Regularities in  Sparse and Explicit  Word Representations

Mikolov et al. (2013a,b,c)

• Neural embeddings have interesting geometries

• These patterns capture “relational similarities”

• Can be used to solve analogies:
  man is to woman as king is to queen
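
As a rough illustration of how vector arithmetic can answer "man is to woman as king is to ?", here is a minimal sketch. The four-dimensional vectors and their feature meanings are hypothetical toy values invented for this example, not real embeddings, and `solve_analogy` is an illustrative helper name rather than anything from the talk.

```python
import numpy as np

# Hypothetical toy vectors; dimensions loosely mean [royal, male, female, fruit].
vectors = {
    "king":  np.array([1.0, 1.0, 0.0, 0.0]),
    "queen": np.array([1.0, 0.0, 1.0, 0.0]),
    "man":   np.array([0.0, 1.0, 0.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0, 0.0]),
    "apple": np.array([0.0, 0.0, 0.0, 1.0]),
}

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def solve_analogy(a, a_star, b, vectors):
    """Return the word closest to b - a + a_star, skipping the query words."""
    target = vectors[b] - vectors[a] + vectors[a_star]
    candidates = [w for w in vectors if w not in (a, a_star, b)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

answer = solve_analogy("man", "woman", "king", vectors)
print(answer)
```

With these toy values the offset king - man + woman lands exactly on the "queen" vector, so "queen" is returned. Real embeddings are noisier, which is why the nearest neighbour is searched for instead of an exact match.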

Page 22: Linguistic Regularities in  Sparse and Explicit  Word Representations

Are analogies unique to neural embeddings?

• Experiment: compare embeddings to explicit representations

• Learn different representations from the same corpus:

Page 24: Linguistic Regularities in  Sparse and Explicit  Word Representations

Analogy Datasets

Page 26: Linguistic Regularities in  Sparse and Explicit  Word Representations

Embedding vs Explicit (Round 1)

Accuracy      MSR    Google
Embedding     54%    63%
Explicit      29%    45%

Many analogies recovered by explicit, but many more by embedding.

Page 35: Linguistic Regularities in  Sparse and Explicit  Word Representations

Why does vector arithmetic reveal analogies?

royal? female?
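
The equations on these slides did not survive in the transcript. As a sketch of the standard argument, assume all word vectors are normalized to unit length and write the analogy as "a is to a* as b is to x" (for example a = man, a* = woman, b = king). The symbols here are notation chosen for this sketch. Because the norm of b - a + a* does not depend on the candidate x, maximizing the cosine to the offset vector is the same as maximizing a sum and difference of three similarities:

```latex
\begin{align*}
\arg\max_{x} \cos(x,\; b - a + a^*)
  &= \arg\max_{x} \frac{x \cdot b - x \cdot a + x \cdot a^*}
                       {\lVert x \rVert \, \lVert b - a + a^* \rVert} \\
  &= \arg\max_{x} \bigl( \cos(x, b) - \cos(x, a) + \cos(x, a^*) \bigr)
\end{align*}
```

Under this reading, the labels "royal?" and "female?" plausibly annotate the terms cos(x, king) and cos(x, woman): the answer should be similar to king and to woman, and dissimilar to man.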

Page 36: Linguistic Regularities in  Sparse and Explicit  Word Representations

What does each similarity term mean?

• Observe the joint features with explicit representations!

uncrowned    Elizabeth
majesty      Katherine
second       impregnate
…            …

Page 37: Linguistic Regularities in  Sparse and Explicit  Word Representations

Can we do better?

Page 38: Linguistic Regularities in  Sparse and Explicit  Word Representations

Let’s look at some mistakes…

Page 47: Linguistic Regularities in  Sparse and Explicit  Word Representations

The Additive Objective

• Problem: one similarity might dominate the rest
• Much more prevalent in explicit representation
• Might explain why explicit underperformed
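
To make the domination problem concrete, here is a minimal sketch with hypothetical similarity values, chosen only to illustrate the effect. The candidate names "balanced" and "lopsided" and the function `additive_score` are illustrative, not taken from the talk.

```python
# Hypothetical similarities of each candidate x to the three query words,
# given as (sim to b, sim to a, sim to a*) for "a is to a* as b is to x".
candidates = {
    "balanced": (0.50, 0.20, 0.50),  # moderately similar to both b and a*
    "lopsided": (0.95, 0.20, 0.10),  # very similar to b, barely to a*
}

def additive_score(sim_b, sim_a, sim_a_star):
    """Additive objective: add the two wanted similarities, subtract the unwanted one."""
    return sim_b - sim_a + sim_a_star

scores = {word: additive_score(*sims) for word, sims in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Here "lopsided" wins (about 0.85 against 0.80) even though it is hardly similar to a*, because its single large similarity to b outweighs everything else in the sum.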

Page 49: Linguistic Regularities in  Sparse and Explicit  Word Representations

How can we do better?

• Instead of adding similarities, multiply them!
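
A minimal sketch of the idea, assuming the multiplicative form given in the accompanying paper: the two wanted similarities are multiplied and divided by the unwanted one plus a small constant. The similarity values are hypothetical, the function names are illustrative, and the sketch assumes all similarities are non-negative.

```python
EPS = 1e-3  # small constant that avoids division by zero

def additive(sim_b, sim_a, sim_a_star):
    """Additive objective: sum of similarities."""
    return sim_b - sim_a + sim_a_star

def multiplicative(sim_b, sim_a, sim_a_star, eps=EPS):
    """Multiplicative objective; assumes non-negative similarities."""
    return sim_b * sim_a_star / (sim_a + eps)

# Hypothetical (sim to b, sim to a, sim to a*) for two candidate answers.
candidates = {
    "balanced": (0.50, 0.20, 0.50),
    "lopsided": (0.95, 0.20, 0.10),
}

best_add = max(candidates, key=lambda w: additive(*candidates[w]))
best_mul = max(candidates, key=lambda w: multiplicative(*candidates[w]))
print(best_add, best_mul)
```

With these values the additive objective picks "lopsided" while the multiplicative one picks "balanced": in a product, a near-zero similarity to either wanted word pulls the whole score down, so no single term can carry a candidate on its own.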

Page 52: Linguistic Regularities in  Sparse and Explicit  Word Representations

Embedding vs Explicit (Round 2)

Page 53: Linguistic Regularities in  Sparse and Explicit  Word Representations

Multiplication > Addition

Accuracy    Embedding          Explicit
            MSR     Google     MSR     Google
Add         54%     63%        29%     45%
Mul         59%     67%        57%     68%

Page 54: Linguistic Regularities in  Sparse and Explicit  Word Representations

Explicit is on-par with Embedding

Accuracy      MSR    Google
Embedding     59%    67%
Explicit      57%    68%

Page 55: Linguistic Regularities in  Sparse and Explicit  Word Representations

Explicit is on-par with Embedding

• Embeddings are not “magical”

• Embedding-based similarities have a more uniform distribution

• The additive objective performs better on smoother distributions

• The multiplicative objective overcomes this issue

Page 56: Linguistic Regularities in  Sparse and Explicit  Word Representations

Conclusion

• Are analogies unique to neural embeddings?
  No! They occur in sparse and explicit representations as well.

• Why does vector arithmetic reveal analogies?
  Because vector arithmetic is equivalent to similarity arithmetic.

• Can we do better?
  Yes! The multiplicative objective is significantly better.
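
The equivalence between vector arithmetic and similarity arithmetic can be checked numerically. The sketch below uses random unit-length vectors as stand-ins for word vectors (an assumption made for illustration, not data from the talk) and confirms that ranking candidates by cosine to b - a + a* picks the same winner as ranking them by cos(x, b) - cos(x, a) + cos(x, a*).

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in "word vectors", normalized to unit length.
X = rng.normal(size=(50, 8))
X /= np.linalg.norm(X, axis=1, keepdims=True)

a, a_star, b = X[0], X[1], X[2]  # the three query words
candidates = X[3:]               # every other word is a candidate answer

# Vector arithmetic: cosine of each candidate to the offset vector.
target = b - a + a_star
target_norm = np.linalg.norm(target)
cos_to_target = candidates @ target / target_norm

# Similarity arithmetic: for unit vectors, a dot product is a cosine.
sim_arith = candidates @ b - candidates @ a + candidates @ a_star

same_winner = int(np.argmax(cos_to_target)) == int(np.argmax(sim_arith))
print(same_winner)
```

The two score vectors differ only by the positive constant `target_norm`, so they rank the candidates identically.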

Page 57: Linguistic Regularities in  Sparse and Explicit  Word Representations

More Results and Analyses (in the paper)

• Evaluation on closed-vocabulary analogy questions (SemEval 2012)

• Experiments with a third objective function (PairDirection)

• Do different representations reveal the same analogies?

• Error analysis

• A feature-level interpretation of how word similarity reveals analogies
