Why Spectral Retrieval Works
Holger Bast, Max-Planck-Institut für Informatik (MPII), Saarbrücken, Germany
joint work with Debapriyo Majumdar
SIGIR 2005, Salvador, Brazil, August 15 – 19

Page 1: Why Spectral Retrieval Works

Why Spectral Retrieval Works

Holger Bast, Max-Planck-Institut für Informatik (MPII)

Saarbrücken, Germany

joint work with Debapriyo Majumdar

SIGIR 2005 in Salvador, Brazil, August 15 – 19


Page 3: Why Spectral Retrieval Works

What we mean by spectral retrieval

Ranked retrieval in the term space. The term-document matrix and the query vector q:

              d1  d2  d3  d4  d5     q
   internet    2   0   0   1   0     1
   web         1   2   0   1   0     0
   surfing     1   1   0   2   1     0
   beach       0   0   1   1   2     0

Cosine similarities q^T d_i / (|q| |d_i|):   0.82  0.00  0.00  0.38  0.00
"True" similarities to the query:            1.00  1.00  0.00  0.50  0.00

Spectral retrieval = linear projection to an eigensubspace, given by a projection matrix L:

   L = [  0.42  0.51  0.66  0.37 ]
       [  0.33  0.43 -0.08 -0.84 ]

Projected query and documents:

   Lq = (0.42, 0.33)

           Ld1   Ld2   Ld3   Ld4   Ld5
          2.01  1.67  0.37  2.61  1.39
          1.01  0.79 -0.84 -0.21 -1.75

Cosine similarities in the subspace, (Lq)^T (Ld_i) / (|Lq| |Ld_i|):   0.98  0.98  -0.25  0.73  0.01
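The projection on this slide can be reproduced with a truncated SVD. A minimal sketch using NumPy, on the toy matrix from the slide (the signs of individual subspace coordinates may differ from the slide, which does not affect the cosine scores):

```python
import numpy as np

# Toy term-document matrix from the slide (rows: internet, web, surfing, beach).
A = np.array([[2, 0, 0, 1, 0],
              [1, 2, 0, 1, 0],
              [1, 1, 0, 2, 1],
              [0, 0, 1, 1, 2]], dtype=float)
q = np.array([1.0, 0.0, 0.0, 0.0])  # query: "internet"

# Projection matrix L: the top-k left singular vectors of A, one per row.
k = 2
U, _, _ = np.linalg.svd(A, full_matrices=False)
L = U[:, :k].T

def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

plain = [cosine(q, A[:, j]) for j in range(A.shape[1])]             # term space
spectral = [cosine(L @ q, L @ A[:, j]) for j in range(A.shape[1])]  # subspace
print([round(s, 2) for s in plain])  # -> [0.82, 0.0, 0.0, 0.38, 0.0]
```

Note how d2, which shares no term with the query, gets a high subspace similarity, much closer to the "true" similarities than the plain cosine scores.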

Page 4: Why Spectral Retrieval Works

Why and when does this work? Previous work: if the term-document matrix is a slight

perturbation of a rank-k matrix then projection to a k-dimensional subspace works

– Papadimitriou, Tamaki, Raghavan, Vempala PODS'98

– Ding SIGIR'99

– Ando and Lee SIGIR'01

– Azar, Fiat, Karlin, McSherry, Saia STOC'01

Our explanation: spectral retrieval works through its ability to identify pairs of terms with similar co-occurrence patterns

– no single subspace is appropriate for all term pairs

– we fix that problem


Page 7: Why Spectral Retrieval Works

Spectral retrieval — alternative view

Spectral retrieval = linear projection to an eigensubspace, with projection matrix L:

   L = [  0.42  0.51  0.66  0.37 ]
       [  0.33  0.43 -0.08 -0.84 ]

The subspace cosine similarity can be rewritten in terms of the expansion matrix L^T L:

   (Lq)^T (Ld1) / (|Lq| |Ld1|)  =  q^T (L^T L d1) / (|Lq| |L^T L d1|)

   L^T L = [  0.29  0.36  0.25 -0.12 ]
           [  0.36  0.44  0.30 -0.17 ]
           [  0.25  0.30  0.44  0.30 ]
           [ -0.12 -0.17  0.30  0.84 ]

Expanded documents L^T L d1 ... L^T L d5:

   [  1.18  0.96 -0.12  1.03  0.01 ]
   [  1.45  1.19 -0.17  1.22 -0.05 ]
   [  1.24  1.04  0.30  1.73  1.04 ]
   [ -0.11 -0.04  0.84  1.15  1.98 ]

Replacing |Lq| by |q| rescales all scores by the same query-dependent constant, so the ranking is that of the original query q against the expanded documents:

   q^T (L^T L d_i) / (|q| |L^T L d_i|)   ... similarities after document expansion

Spectral retrieval = document expansion (not query expansion)
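The key identity, that projecting both vectors gives the same inner product as expanding only the document, can be checked directly. A small sketch using the (rounded) projection matrix from the slide:

```python
import numpy as np

# Projection matrix from the slide (entries rounded to two decimals).
L = np.array([[0.42, 0.51, 0.66, 0.37],
              [0.33, 0.43, -0.08, -0.84]])
q = np.array([1.0, 0.0, 0.0, 0.0])   # query: "internet"
d1 = np.array([2.0, 1.0, 1.0, 0.0])  # first document column

E = L.T @ L                # 4 x 4 expansion matrix
lhs = (L @ q) @ (L @ d1)   # inner product after projecting both vectors
rhs = q @ (E @ d1)         # inner product of raw query with expanded document
print(abs(lhs - rhs))      # essentially zero: the two views agree
```

The equality of the numerators holds for any L by associativity; since the rows of the true L are orthonormal eigenvectors, one also has |L^T L d| = |L d|, so the normalized scores agree as well (here only up to the rounding of L's entries).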


Page 9: Why Spectral Retrieval Works

Why document "expansion"

A 0-1 expansion matrix adds "internet" whenever "web" is present:

              internet  web  surfing  beach
   internet      1       1      0       0
   web           1       1      0       0
   surfing       0       0      1       0
   beach         0       0      0       1

Applied to a document containing "web" and "surfing":

   [ 1 1 0 0 ]   [ 0 ]   [ 1 ]
   [ 1 1 0 0 ] · [ 1 ] = [ 1 ]
   [ 0 0 1 0 ]   [ 1 ]   [ 1 ]
   [ 0 0 0 1 ]   [ 0 ]   [ 0 ]
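The 0-1 expansion above is an ordinary matrix-vector product. A minimal sketch:

```python
import numpy as np

# 0-1 expansion matrix coupling "internet" and "web"
# (rows/columns: internet, web, surfing, beach).
E = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
d = np.array([0, 1, 1, 0])  # document containing "web" and "surfing"
expanded = E @ d
print(expanded)             # -> [1 1 1 0]: "internet" has been added
```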

Page 10: Why Spectral Retrieval Works

Why document "expansion"

The ideal expansion matrix has
– high scores for intuitively related terms
– low scores for intuitively unrelated terms

With the matrix L projecting to 2 dimensions,

   L = [  0.42  0.51  0.66  0.37 ]
       [  0.33  0.43 -0.08 -0.84 ]

the expansion matrix is

   L^T L = [  0.29  0.36  0.25 -0.12 ]     (rows and columns:
           [  0.36  0.44  0.30 -0.17 ]      internet, web,
           [  0.25  0.30  0.44  0.30 ]      surfing, beach)
           [ -0.12 -0.17  0.30  0.84 ]

and it adds "internet" if "web" is present:

   L^T L · (0, 1, 1, 0)^T = (0.61, 0.74, 0.74, 0.13)^T

The expansion matrix depends heavily on the subspace dimension!

Page 11: Why Spectral Retrieval Works

Why document "expansion"

The ideal expansion matrix has
– high scores for intuitively related terms
– low scores for intuitively unrelated terms

With the matrix L projecting to 3 dimensions,

   L = [  0.42  0.51  0.66  0.37 ]
       [  0.33  0.43 -0.08 -0.84 ]
       [ -0.80  0.59  0.06 -0.01 ]

the expansion matrix is

   L^T L = [  0.93 -0.12  0.20 -0.11 ]     (rows and columns:
           [ -0.12  0.80  0.34 -0.18 ]      internet, web,
           [  0.20  0.34  0.44  0.30 ]      surfing, beach)
           [ -0.11 -0.18  0.30  0.84 ]

and now it barely adds "internet" when "web" is present:

   L^T L · (0, 1, 1, 0)^T = (0.08, 1.13, 0.78, 0.12)^T

The expansion matrix depends heavily on the subspace dimension!
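The dimension dependence of L^T L can be seen by recomputing the expansion of the same document for k = 2, 3, 4 on the toy matrix. A sketch; for this full-rank 4 x 5 matrix, U is orthogonal, so k = 4 returns the document unchanged:

```python
import numpy as np

# Toy matrix from the earlier slides (rows: internet, web, surfing, beach).
A = np.array([[2, 0, 0, 1, 0],
              [1, 2, 0, 1, 0],
              [1, 1, 0, 2, 1],
              [0, 0, 1, 1, 2]], dtype=float)
d = np.array([0.0, 1.0, 1.0, 0.0])  # document with "web" and "surfing"

U, _, _ = np.linalg.svd(A, full_matrices=False)
expansions = {}
for k in (2, 3, 4):
    L = U[:, :k].T                  # project to k dimensions
    expansions[k] = (L.T @ L) @ d   # expanded document
    print(k, np.round(expansions[k], 2))
```

The weight added to "internet" shrinks as k grows, matching the slides: roughly 0.61 at k = 2, 0.08 at k = 3, and exactly 0 at k = 4.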


Page 13: Why Spectral Retrieval Works

Our Key Observation

We studied how the entries in the expansion matrix depend on the dimension of the subspace to which the documents are projected.

[Figure: expansion matrix entry as a function of the subspace dimension (0 to 600) for three term pairs: node / vertex, logic / logics, and logic / vertex.]

No single dimension is appropriate for all term pairs, but the shape of the curve is a good indicator for relatedness!

Page 14: Why Spectral Retrieval Works

Curves for related terms

We call two terms perfectly related if they have an identical co-occurrence pattern, i.e. identical rows in the term-document matrix. A slight perturbation yields nearly identical patterns, e.g.

   term 1:  0 0 1 1 1 1 0 1 0
   term 2:  0 0 1 1 1 0 1 0 1

[Figure: expansion matrix entry vs. subspace dimension (0 to 600), three panels: the proven shape for perfectly related terms; a provably small change after a slight perturbation; half way to a real matrix, where the up-and-then-down shape remains.]

But the point of fall-off is different for every term pair!
Page 15: Why Spectral Retrieval Works

Curves for unrelated terms

Co-occurrence graph:
– vertices = terms
– edge = two terms co-occur

We call two terms perfectly unrelated if no path connects them in the graph.

[Figure: expansion matrix entry vs. subspace dimension (0 to 600), three panels: the proven shape for perfectly unrelated terms; a provably small change after a slight perturbation; half way to a real matrix.]

Curves for unrelated terms are random oscillations around zero.

Page 16: Why Spectral Retrieval Works

Telling the shapes apart — TN

1. Normalize the term-document matrix so that the theoretical point of fall-off is equal for all term pairs.

2. For each term pair: if the curve is never negative before this point, set the entry in the expansion matrix to 1, otherwise to 0.

[Figure: three example curves over subspace dimensions 0 to 600; the first two get entry 1, the third gets entry 0.]

A simple 0-1 classification, no fractional entries!
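The two steps above can be sketched as follows. This is a simplified reading that takes the fall-off index as given and skips the normalization step; the function names are ours:

```python
import numpy as np

def expansion_curve(A, i, j):
    """Entry (i, j) of the expansion matrix as the subspace dimension grows:
    E_k[i, j] = sum of u[i] * u[j] over the first k left singular vectors u."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.cumsum(U[i, :] * U[j, :])

def tn_entry(A, i, j, fall_off):
    """TN: entry 1 iff the curve is never negative before the point of fall-off."""
    curve = expansion_curve(A, i, j)
    return 1 if np.all(curve[:fall_off] >= 0) else 0

# Toy matrix from the earlier slides (rows: internet, web, surfing, beach).
A = np.array([[2, 0, 0, 1, 0],
              [1, 2, 0, 1, 0],
              [1, 1, 0, 2, 1],
              [0, 0, 1, 1, 2]], dtype=float)
print(tn_entry(A, 0, 1, 2))  # internet / web: related -> 1
print(tn_entry(A, 0, 3, 2))  # internet / beach: unrelated -> 0
```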

Page 17: Why Spectral Retrieval Works

An alternative algorithm — TM

1. Again, normalize the term-document matrix so that the theoretical point of fall-off is equal for all term pairs.

2. For each term pair, compute the monotonicity of its initial curve (= 1 if perfectly monotone, approaching 0 as the number of turns increases).

3. If the monotonicity is above some threshold, set the entry in the expansion matrix to 1, otherwise to 0.

[Figure: three example curves with monotonicity 0.82, 0.69, and 0.07; the first two get entry 1, the third gets entry 0.]

Again: a simple 0-1 classification!
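The monotonicity score in step 2 can be implemented in several ways; here is one plausible proxy (our choice, not necessarily the paper's exact measure): the fraction of consecutive steps that keep their direction.

```python
def monotonicity(curve):
    """1.0 for a perfectly monotone curve, shrinking toward 0.0 as the
    number of turns (direction changes) grows. Assumed proxy measure."""
    diffs = [b - a for a, b in zip(curve, curve[1:])]
    steps = [d for d in diffs if d != 0]          # ignore flat segments
    if len(steps) <= 1:
        return 1.0
    turns = sum(1 for a, b in zip(steps, steps[1:]) if a * b < 0)
    return 1.0 - turns / (len(steps) - 1)

print(monotonicity([0, 1, 2, 3, 2, 1]))  # one turn in five steps -> 0.75
print(monotonicity([0, 1, 0, 1, 0]))     # pure oscillation -> 0.0
```

Step 3 then sets the expansion matrix entry to 1 if this score exceeds the threshold, and to 0 otherwise.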

Page 18: Why Spectral Retrieval Works

Experimental results (average precision)

   TIME (425 docs, 3882 terms)

   COS     LSI*    LSI-RN*  CORR*   IRR*    TN      TM
   63.2%   62.8%   58.6%    59.1%   62.2%   64.9%   64.1%

   COS     baseline: cosine similarity in term space
   LSI     Latent Semantic Indexing, Dumais et al. 1990
   LSI-RN  term-normalized LSI, Ding et al. 2001
   CORR    correlation-based LSI, Dupret et al. 2001
   IRR     Iterative Residual Rescaling, Ando & Lee 2001
   TN      our non-negativity test
   TM      our monotonicity test

   * the numbers for LSI, LSI-RN, CORR, and IRR are for the best subspace dimension!

Page 19: Why Spectral Retrieval Works

Experimental results (average precision)

                                        COS     LSI*    LSI-RN*  CORR*   IRR*    TN      TM
   TIME    (425 docs, 3882 terms)       63.2%   62.8%   58.6%    59.1%   62.2%   64.9%   64.1%
   REUTERS (21578 docs, 5701 terms)     36.2%   32.0%   37.0%    32.3%   ——      41.9%   42.9%
   OHSUMED (233445 docs, 99117 terms)   13.2%    6.9%   13.0%    10.9%   ——      14.4%   15.3%

   * the numbers for LSI, LSI-RN, CORR, and IRR are for the best subspace dimension!


Page 21: Why Spectral Retrieval Works

Conclusions

Obrigado! (Thank you!)

Main message: spectral retrieval works through its ability to identify pairs of terms with similar co-occurrence patterns

– a simple 0-1 classification that considers a sequence of subspaces is at least as good as schemes that commit to a fixed subspace

Some useful corollaries …

– new insights into the effect of term-weighting and other normalizations for spectral retrieval

– straightforward integration of known word relationships

– consequences for spectral link analysis?

Page 23: Why Spectral Retrieval Works

Why document "expansion"

The ideal expansion matrix has
– high scores for related terms
– low scores for unrelated terms

The expansion matrix L^T L depends on the subspace dimension. With the matrix L projecting to all 4 dimensions,

   L = [  0.42  0.51  0.66  0.37 ]
       [  0.33  0.43 -0.08 -0.84 ]
       [ -0.80  0.59  0.06 -0.01 ]
       [  0.27  0.45 -0.75  0.41 ]

the expansion matrix is the identity, so nothing is added; "add 'internet' if 'web' is present" no longer happens:

   L^T L = [ 1 0 0 0 ]     (rows and columns:
           [ 0 1 0 0 ]      internet, web,
           [ 0 0 1 0 ]      surfing, beach)
           [ 0 0 0 1 ]

   L^T L · (0, 1, 1, 0)^T = (0, 1, 1, 0)^T