Why Spectral Retrieval Works
Holger Bast, Max-Planck-Institut für Informatik (MPII), Saarbrücken, Germany
joint work with Debapriyo Majumdar
SIGIR 2005, Salvador, Brazil, August 15 – 19
What we mean by spectral retrieval

Ranked retrieval in the term space. Term-document matrix (rows: internet, web, surfing, beach) and query q = (1, 0, 0, 0):

              d1  d2  d3  d4  d5      q
  internet     2   0   0   1   0      1
  web          1   2   0   1   0      0
  surfing      1   1   0   2   1      0
  beach        0   0   1   1   2      0

Cosine similarities q^T d_i / (|q| |d_i|):   0.82  0.00  0.00  0.38  0.00
"True" similarities to the query:            1.00  1.00  0.00  0.50  0.00
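These numbers are easy to reproduce; a minimal NumPy check using the matrix and query from the slide:

```python
import numpy as np

# term-document matrix from the slide (rows: internet, web, surfing, beach)
A = np.array([[2, 0, 0, 1, 0],
              [1, 2, 0, 1, 0],
              [1, 1, 0, 2, 1],
              [0, 0, 1, 1, 2]], dtype=float)
q = np.array([1, 0, 0, 0], dtype=float)  # query: "internet"

# cosine similarity of the query with each document column
sims = (q @ A) / (np.linalg.norm(q) * np.linalg.norm(A, axis=0))
print(np.round(sims, 2))  # matches the slide: 0.82 0.00 0.00 0.38 0.00
```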
Spectral retrieval = linear projection to an eigensubspace

Projection matrix L:

   0.42  0.51  0.66  0.37
   0.33  0.43 -0.08 -0.84

Projected query and documents:

    Lq     Ld1    Ld2    Ld3    Ld4    Ld5
    0.42   2.01   1.67   0.37   2.61   1.39
    0.33   1.01   0.79  -0.84  -0.21  -1.75

Cosine similarities in the subspace, (Lq)^T (Ld_i) / (|Lq| |Ld_i|):
    0.98   0.98  -0.25   0.73   0.01
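The same check in the subspace, with L copied from the slide (its entries are rounded to two decimals, so the reproduced similarities can drift in the last digit):

```python
import numpy as np

A = np.array([[2, 0, 0, 1, 0],
              [1, 2, 0, 1, 0],
              [1, 1, 0, 2, 1],
              [0, 0, 1, 1, 2]], dtype=float)
q = np.array([1, 0, 0, 0], dtype=float)
L = np.array([[0.42, 0.51,  0.66,  0.37],
              [0.33, 0.43, -0.08, -0.84]])  # projection matrix from the slide

Lq, LA = L @ q, L @ A  # projected query and documents
sims = (Lq @ LA) / (np.linalg.norm(Lq) * np.linalg.norm(LA, axis=0))
print(np.round(sims, 2))  # close to the slide's 0.98 0.98 -0.25 0.73 0.01
```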
Why and when does this work?

Previous work: if the term-document matrix is a slight perturbation of a rank-k matrix, then projection to a k-dimensional subspace works
– Papadimitriou, Tamaki, Raghavan, Vempala, PODS'98
– Ding, SIGIR'99
– Ando and Lee, SIGIR'01
– Azar, Fiat, Karlin, McSherry, Saia, STOC'01

Our explanation: spectral retrieval works through its ability to identify pairs of terms with similar co-occurrence patterns
– no single subspace is appropriate for all term pairs
– we fix that problem
Spectral retrieval — alternative view

Spectral retrieval = linear projection to an eigensubspace, with projection matrix L:

   0.42  0.51  0.66  0.37
   0.33  0.43 -0.08 -0.84

The cosine similarities in the subspace can be rewritten:

  (Lq)^T (Ld_i) / (|Lq| |Ld_i|)  =  q^T (L^T L d_i) / (|Lq| |L^T L d_i|)

Expansion matrix L^T L:

   0.29  0.36  0.25 -0.12
   0.36  0.44  0.30 -0.17
   0.25  0.30  0.44  0.30
  -0.12 -0.17  0.30  0.84

Expanded documents L^T L d_1, …, L^T L d_5:

   1.18   0.96  -0.12   1.03   0.01
   1.45   1.19  -0.17   1.22  -0.05
   1.24   1.04   0.30   1.73   1.04
  -0.11  -0.04   0.84   1.15   1.98

Similarities after document expansion:  q^T (L^T L d_i) / (|q| |L^T L d_i|)

Spectral retrieval = document expansion (not query expansion)
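The rewriting on this slide is just associativity, (Lq)^T (Ld) = q^T (L^T L d), which holds for any L; a quick numerical confirmation with the slide's numbers:

```python
import numpy as np

L = np.array([[0.42, 0.51,  0.66,  0.37],
              [0.33, 0.43, -0.08, -0.84]])
A = np.array([[2, 0, 0, 1, 0],
              [1, 2, 0, 1, 0],
              [1, 1, 0, 2, 1],
              [0, 0, 1, 1, 2]], dtype=float)
q = np.array([1, 0, 0, 0], dtype=float)

E = L.T @ L  # the expansion matrix
# scoring in the subspace == scoring the plain query against expanded documents
assert np.allclose((L @ q) @ (L @ A), q @ (E @ A))
print(np.round(E, 2))  # first row matches the slide: 0.29 0.36 0.25 -0.12
```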
Why document "expansion"

0-1 expansion matrix (rows/columns: internet, web, surfing, beach):

  1 1 0 0
  1 1 0 0
  0 0 1 0
  0 0 0 1

Applied to the document (0, 1, 1, 0) it yields (1, 1, 1, 0): add "internet" if "web" is present.
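As a multiplication, with the 0-1 matrix from the slide:

```python
import numpy as np

# 0-1 expansion matrix (rows/columns: internet, web, surfing, beach):
# "add 'internet' if 'web' is present", and vice versa
B = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
d = np.array([0, 1, 1, 0])  # a document containing "web" and "surfing"
print(B @ d)  # [1 1 1 0]: "internet" has been added
```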
Why document "expansion"

Matrix L projecting to 2 dimensions:

   0.42  0.51  0.66  0.37
   0.33  0.43 -0.08 -0.84

Expansion matrix L^T L (rows/columns: internet, web, surfing, beach):

   0.29  0.36  0.25 -0.12
   0.36  0.44  0.30 -0.17
   0.25  0.30  0.44  0.30
  -0.12 -0.17  0.30  0.84

Applied to the document (0, 1, 1, 0) it yields (0.61, 0.74, 0.74, 0.13): "internet" is added because "web" is present.

An ideal expansion matrix has
– high scores for intuitively related terms
– low scores for intuitively unrelated terms

The expansion matrix depends heavily on the subspace dimension!
Why document "expansion"

Matrix L projecting to 3 dimensions:

   0.42  0.51  0.66  0.37
   0.33  0.43 -0.08 -0.84
  -0.80  0.59  0.06 -0.01

Expansion matrix L^T L:

   0.93 -0.12  0.20 -0.11
  -0.12  0.80  0.34 -0.18
   0.20  0.34  0.44  0.30
  -0.11 -0.18  0.30  0.84

Applied to the document (0, 1, 1, 0) it yields (0.08, 1.13, 0.78, 0.12): now "internet" is barely added even though "web" is present.

The expansion matrix depends heavily on the subspace dimension!
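The dimension dependence can be seen directly in the internet/web entry of L^T L, stacking the projection rows from the slides above:

```python
import numpy as np

# rows of the projection matrices shown on the slides
# (2-dim uses the first two rows, 3-dim the first three)
rows = np.array([[ 0.42, 0.51,  0.66,  0.37],
                 [ 0.33, 0.43, -0.08, -0.84],
                 [-0.80, 0.59,  0.06, -0.01]])

for k in (2, 3):
    L = rows[:k]
    E = L.T @ L
    print(k, round(float(E[0, 1]), 2))  # internet/web: 0.36 at k=2, -0.12 at k=3
```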
Our Key Observation

We studied how the entries in the expansion matrix depend on the dimension of the subspace to which documents are projected.

[Plots: expansion matrix entry vs. subspace dimension (0 – 600) for the term pairs node/vertex, logic/logics, and logic/vertex]

No single dimension is appropriate for all term pairs, but the shape of the curve is a good indicator for relatedness!
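These curves are cheap to compute for all dimensions at once: if A = UΣV^T and the projection keeps the top k left singular vectors (as in LSI), the expansion matrix is U_k U_k^T, so entry (i, j) as a function of k is a cumulative sum. A sketch (the name `expansion_curve` is ours, not the paper's):

```python
import numpy as np

def expansion_curve(A, i, j):
    """Entry (i, j) of the expansion matrix U_k U_k^T for k = 1, ..., rank(A)."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.cumsum(U[i, :] * U[j, :])

A = np.array([[2, 0, 0, 1, 0],
              [1, 2, 0, 1, 0],
              [1, 1, 0, 2, 1],
              [0, 0, 1, 1, 2]], dtype=float)

# at full dimension the expansion matrix is the identity, so diagonal
# curves end at 1 and off-diagonal curves end at 0
print(expansion_curve(A, 0, 0)[-1], expansion_curve(A, 0, 1)[-1])
```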
Curves for related terms

We call two terms perfectly related if they have an identical co-occurrence pattern.

[Slide shows example 0/1 co-occurrence patterns: · · · · · 1 1 0 0 | · · · · · 0 0 1 1 | · · · · · 1 1 1 1 | 0 0 1 1 1 1 0 1 0 | 0 0 1 1 1 0 1 0 1]

[Plots: expansion matrix entry vs. subspace dimension (0 – 600), three panels:
– proven shape for perfectly related terms
– provably small change after slight perturbation
– half way to a real matrix: the up-and-then-down shape remains]

The point of fall-off is different for every term pair!
Curves for unrelated terms

Co-occurrence graph:
– vertices = terms
– edge = two terms co-occur

We call two terms perfectly unrelated if no path connects them in the graph.

[Plots: expansion matrix entry vs. subspace dimension (0 – 600), three panels:
– proven shape for perfectly unrelated terms
– provably small change after slight perturbation
– half way to a real matrix: the curves for unrelated terms are random oscillations around zero]
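The "no path connects them" condition is plain graph reachability; a small sketch over the co-occurrence graph (terms are adjacent when some document contains both):

```python
import numpy as np
from collections import deque

def perfectly_unrelated(A, i, j):
    """True iff no path connects terms i and j in the co-occurrence graph."""
    adj = (A @ A.T) > 0              # terms co-occur iff they share a document
    seen, queue = {i}, deque([i])
    while queue:
        u = queue.popleft()
        if u == j:
            return False
        for v in np.flatnonzero(adj[u]):
            if int(v) not in seen:
                seen.add(int(v))
                queue.append(int(v))
    return True

# in the slide's example corpus, d4 links every term to every other one
A = np.array([[2, 0, 0, 1, 0],
              [1, 2, 0, 1, 0],
              [1, 1, 0, 2, 1],
              [0, 0, 1, 1, 2]], dtype=float)
print(perfectly_unrelated(A, 0, 3))  # False: even internet/beach are connected
```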
Telling the shapes apart — TN

1. Normalize the term-document matrix so that the theoretical point of fall-off is equal for all term pairs
2. For each term pair: if the curve is never negative before this point, set the entry in the expansion matrix to 1, otherwise to 0

[Plots: three curves over subspace dimension 0 – 600; entries set to 1, 1, 0]

A simple 0-1 classification, no fractional entries!
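A sketch of the TN test. The normalization that makes one fall-off point valid for all term pairs is the technical core of the paper and is not reproduced here; the `falloff` parameter below is a hypothetical stand-in for that precomputed point:

```python
import numpy as np

def tn_entry(A, i, j, falloff):
    """0-1 expansion-matrix entry: 1 iff the curve for the pair (i, j)
    is never negative before the (assumed) fall-off dimension."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    curve = np.cumsum(U[i, :] * U[j, :])  # expansion entry vs. dimension
    return 1 if (curve[:falloff] >= 0).all() else 0

A = np.array([[2, 0, 0, 1, 0],
              [1, 2, 0, 1, 0],
              [1, 1, 0, 2, 1],
              [0, 0, 1, 1, 2]], dtype=float)
print(tn_entry(A, 0, 0, 4))  # a term's curve against itself is a sum of squares: 1
```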
An alternative algorithm — TM

1. Again, normalize the term-document matrix so that the theoretical point of fall-off is equal for all term pairs
2. For each term pair compute the monotonicity of its initial curve (= 1 if perfectly monotone, approaching 0 as the number of turns increases)
3. If the monotonicity is above some threshold, set the entry in the expansion matrix to 1, otherwise to 0

[Plots: three curves with monotonicity 0.82, 0.69, 0.07; entries set to 1, 1, 0]

Again: a simple 0-1 classification!
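The monotonicity score can be instantiated, for example, as a count of direction changes; the paper's exact measure may differ, this is one plausible reading of "1 if perfectly monotone, falling toward 0 as turns accumulate":

```python
import numpy as np

def monotonicity(curve):
    """1.0 for a perfectly monotone curve; smaller as the number of turns grows."""
    steps = np.sign(np.diff(curve))
    steps = steps[steps != 0]                    # ignore flat segments
    turns = int(np.sum(steps[1:] != steps[:-1]))
    return 1.0 / (1.0 + turns)

print(monotonicity([0, 1, 2, 3]))      # perfectly monotone -> 1.0
print(monotonicity([0, 1, 0, 1, 0]))   # three turns -> 0.25
```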
Experimental results (average precision)

              TIME           REUTERS         OHSUMED
              425 docs       21578 docs      233445 docs
              3882 terms     5701 terms      99117 terms

  COS         63.2%          36.2%           13.2%
  LSI*        62.8%          32.0%            6.9%
  LSI-RN*     58.6%          37.0%           13.0%
  CORR*       59.1%          32.3%           10.9%
  IRR*        62.2%          ——              ——
  TN          64.9%          41.9%           14.4%
  TM          64.1%          42.9%           15.3%

  COS: baseline, cosine similarity in term space
  LSI: Latent Semantic Indexing, Dumais et al. 1990
  LSI-RN: term-normalized LSI, Ding et al. 2001
  CORR: correlation-based LSI, Dupret et al. 2001
  IRR: Iterative Residual Rescaling, Ando & Lee 2001
  TN: our non-negativity test
  TM: our monotonicity test

  * the numbers for LSI, LSI-RN, CORR, IRR are for the best subspace dimension!
Conclusions
Main message: spectral retrieval works through its ability to identify pairs of terms with similar co-occurrence patterns
– a simple 0-1 classification that considers a sequence of subspaces is at least as good as schemes that commit to a fixed subspace
Some useful corollaries …
– new insights into the effect of term-weighting and other normalizations for spectral retrieval
– straightforward integration of known word relationships
– consequences for spectral link analysis?
Obrigado!
Why document "expansion"

Matrix L projecting to 4 dimensions:

   0.42  0.51  0.66  0.37
   0.33  0.43 -0.08 -0.84
  -0.80  0.59  0.06 -0.01
   0.27  0.45 -0.75  0.41

Expansion matrix L^T L is the identity:

  1 0 0 0
  0 1 0 0
  0 0 1 0
  0 0 0 1

Applied to the document (0, 1, 1, 0) it yields (0, 1, 1, 0): no expansion at all; "internet" is not added although "web" is present.

An ideal expansion matrix has
– high scores for related terms
– low scores for unrelated terms

The expansion matrix L^T L depends on the subspace dimension.