Less is More: Probabilistic Models for Retrieving Fewer Relevant Documents
Harr Chen, David R. Karger (MIT CSAIL)
ACM SIGIR 2006, August 9, 2006

TRANSCRIPT

Slide 1: Title

Less is More: Probabilistic Models for Retrieving Fewer Relevant Documents
Harr Chen, David R. Karger (MIT CSAIL)
ACM SIGIR 2006, August 9, 2006

Slide 2: Outline

• Motivations

• Expected Metric Principle

• Metrics

• Bayesian Retrieval

• Objectives

• Heuristics

• Experimental Results

• Related Work

• Future Work and Conclusions

Slide 3: Motivation

• In IR, we have formal models and formal metrics

• Models provide a framework for retrieval
  – E.g., probabilistic models

• Metrics provide a rigorous evaluation mechanism
  – E.g., precision and recall

• The probability ranking principle (PRP) is provably optimal for precision/recall
  – Rank by probability of relevance

• But other metrics capture other notions of result-set quality, and PRP isn't necessarily optimal for them

Slide 4: Example: Diversity

• User may be satisfied with one relevant result
  – Navigational queries, question answering

• In this case, we want to "hedge our bets" by retrieving for diversity in the result set
  – Better to satisfy different users with different interpretations than to satisfy one user many times over

• Reciprocal rank / search length metrics capture this notion

• PRP is suboptimal

Slide 5: IR System Design

• Metrics define a preference ordering on result sets
  – Metric[Result set 1] > Metric[Result set 2] means Result set 1 is preferred to Result set 2

• Traditional approach: try out heuristics that we believe will improve relevance performance
  – Heuristics not directly motivated by the metric
  – E.g., synonym expansion, pseudorelevance feedback

• Observation: given a model, we can try to directly optimize for some metric

Slide 6: Expected Metric Principle (EMP)

• Knowing which metric to use tells us what to maximize for – the expected value of the metric for each result set, given a model

[Diagram: from a corpus of three documents, enumerate the candidate result sets (1,2), (1,3), (2,1), (2,3), (3,1), (3,2); calculate E[metric] for each set using the model; return the set with the maximum score.]
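The enumerate-and-score procedure in the diagram can be sketched directly. This is an illustrative toy, not the paper's system: the independence assumption and the `expected_one_call` metric are simplifications (under independent relevance, E[1-call] has the closed form 1 - prod(1 - Pr[rel])).

```python
from itertools import permutations

def emp_retrieve(docs, rel_probs, expected_metric, n):
    """Brute-force EMP: score every ordered result set of size n by its
    expected metric value under the model, and return the best one."""
    best_set, best_score = None, float("-inf")
    for result_set in permutations(docs, n):
        score = expected_metric(result_set, rel_probs)
        if score > best_score:
            best_set, best_score = result_set, score
    return best_set

def expected_one_call(result_set, rel_probs):
    """E[1-call] assuming independent relevance: the probability that at
    least one result is relevant is 1 - prod(1 - Pr[rel])."""
    p_none_relevant = 1.0
    for d in result_set:
        p_none_relevant *= 1.0 - rel_probs[d]
    return 1.0 - p_none_relevant

rel_probs = {"d1": 0.9, "d2": 0.5, "d3": 0.4}
best = emp_retrieve(["d1", "d2", "d3"], rel_probs, expected_one_call, 2)
```

Enumerating all ordered result sets is exponential in n, which is exactly why the later slides introduce greedy heuristics.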

Slide 7: Our Contributions

• Primary: EMP – metric as retrieval goal
  – Metric designed to measure retrieval quality
    • Metrics we consider: precision/recall @ n, search length, reciprocal rank, instance recall, k-call
  – Build a probabilistic model
  – Retrieve to maximize an objective: the expected value of the metric
    • Expectations calculated according to our probabilistic model
  – Use computational heuristics to make the optimization problem tractable

• Secondary: retrieving for diversity (a special case)
  – A natural side effect of optimizing for certain metrics

Slide 8: Detour: What is a Heuristic?

Ad hoc approach

• Use heuristics that are believed to be correlated with good performance

• Heuristics used to improve relevance

• Heuristics (probably) make system slower

• Infinite number of possibilities, no formalism

• Model, heuristics intertwined

Our approach

• Build model that directly optimizes for good performance

• Heuristics used to improve efficiency

• Heuristics (probably) make optimization worse

• Well-known space of optimization techniques

• Clean separation between model and heuristics

Slide 9: Our Contributions (recap of Slide 7)

Slide 10: Search Length / Reciprocal Rank

• (Mean) search length (MSL): number of irrelevant results until first relevant

• (Mean) reciprocal rank (MRR): one over rank of first relevant

Example: if the first relevant document appears at rank 3, search length = 2 and reciprocal rank = 1/3.
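Both metrics depend only on the rank of the first relevant result, and can be sketched as follows (a minimal illustration, assuming binary relevance judgments in rank order):

```python
def search_length(relevance):
    """Search length: count of irrelevant results before the first
    relevant one; `relevance` is a list of 0/1 judgments in rank order."""
    for rank, rel in enumerate(relevance):
        if rel:
            return rank
    return len(relevance)  # no relevant result in the list

def reciprocal_rank(relevance):
    """Reciprocal rank: 1 / (rank of the first relevant result),
    or 0 if no result is relevant."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0
```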

Slide 11: Instance Recall

• Each topic has multiple instances (subtopics, aspects)

• Instance recall @ n is the number of instances covered (in union) by the first n results, divided by the total number of instances

Example: if the first 5 results together cover 3 of a topic's 4 instances, instance recall @ 5 = 0.75.
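The union-over-ranks definition can be sketched as follows; the per-result instance sets are illustrative inputs, not the TREC data format:

```python
def instance_recall_at_n(result_instances, all_instances, n):
    """Instance recall @ n: fraction of the topic's instances covered by
    the union of the instances of the first n results. `result_instances`
    is a list (in rank order) of the instance sets each result covers."""
    covered = set()
    for instances in result_instances[:n]:
        covered |= set(instances)
    return len(covered & set(all_instances)) / len(all_instances)
```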

Slide 12: k-call @ n

• Binary metric: 1 if the top n results contain at least k relevant documents, 0 otherwise

• 1-call is (1 – %no)
  – See the TREC robust track

Example: if the top 5 results contain exactly two relevant documents, then 1-call @ 5 = 1, 2-call @ 5 = 1, and 3-call @ 5 = 0.
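The metric itself is a one-liner over binary judgments (a minimal sketch):

```python
def k_call_at_n(relevance, k, n):
    """k-call @ n: 1 if at least k of the top n results are relevant,
    0 otherwise. `relevance` is a list of 0/1 judgments in rank order."""
    return 1 if sum(relevance[:n]) >= k else 0
```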

Slide 13: Motivation for k-call

• 1-call: want one relevant document
  – Many queries are satisfied by one relevant result
  – Since only one relevant document is needed, there is more room to explore, which promotes result-set diversity

• n-call: want all n results relevant
  – "Perfect precision"
  – Home in on one interpretation and stick to it!

• Intermediate k
  – Risk/reward tradeoff

• Plus, easily modeled in our framework
  – Binary variable

Slide 14: Our Contributions (recap of Slide 7)

Slide 15: Bayesian Retrieval Model

• There exist distributions that generate relevant documents and irrelevant documents

• PRP: rank by Pr[r | d]

• Remaining modeling questions: form of rel/irrel distributions and parameters for those distributions

• In this paper, we assume multinomial models and choose parameters by maximum a posteriori (MAP) estimation
  – Prior is the background corpus word distribution

Pr[r | d] = Pr[d | r] Pr[r] / Pr[d]
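One way to realize the multinomial-with-MAP setup is sketched below. The smoothing scheme and the `prior_weight` parameter are assumptions for illustration, not taken from the paper:

```python
import math
from collections import Counter

def map_multinomial(observed_counts, prior_probs, prior_weight=1000.0):
    """MAP estimate of multinomial word probabilities: observed counts
    smoothed toward the background corpus distribution (the prior).
    Words outside the prior's vocabulary are ignored."""
    total = sum(observed_counts.values()) + prior_weight
    return {w: (observed_counts.get(w, 0) + prior_weight * p) / total
            for w, p in prior_probs.items()}

def log_likelihood(doc_counts, word_probs, floor=1e-12):
    """log Pr[d | distribution] under a multinomial, dropping the
    multinomial coefficient (constant across candidate distributions)."""
    return sum(c * math.log(word_probs.get(w, floor))
               for w, c in doc_counts.items())
```

PRP-style ranking would then score each document by comparing log-likelihoods under the relevant and irrelevant distributions estimated this way.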

Slide 16: Our Contributions (recap of Slide 7)

Slide 17: Objective

• Probability Ranking Principle (PRP): maximize Pr[r | d] at each step in the ranking

• Expected Metric Principle (EMP): maximize E[metric | d1, ..., dn] for the complete result set

• In particular, for k-call, maximize:

E[at least k relevant | d1, ..., dn] = Pr[at least k relevant | d1, ..., dn]

Slide 18: Our Contributions (recap of Slide 7)

Slide 19: Optimization of Objective

• Exact optimization of the objective is usually NP-hard
  – E.g., exact optimization for k-call is reducible to the NP-hard maximum clique problem

• Approximation heuristic: greedy algorithm
  – Select documents successively in rank order
  – Hold previous documents fixed; optimize the objective at each rank
  – Rank 1: choose d1 to maximize E[metric | d1]

  – Rank 2: hold d1 fixed; choose d2 to maximize E[metric | d1, d2]
  – Rank 3: hold d1 and d2 fixed; choose d3 to maximize E[metric | d1, d2, d3], and so on

Slide 22: Greedy on 1-call and n-call

• 1-greedy
  – The greedy algorithm reduces to ranking each successive document assuming all previous documents are irrelevant
  – The algorithm has "discovered" incremental negative pseudorelevance feedback

• n-greedy: assume all previous documents are relevant

Derivation for 1-greedy, where ri denotes the event "document i is relevant":

Pr[r1 ∨ r2 ∨ ... ∨ ri]
  = Pr[r1 ∨ ... ∨ r(i-1)] + Pr[¬(r1 ∨ ... ∨ r(i-1))] · Pr[ri | ¬r1, ¬r2, ..., ¬r(i-1)]

The terms involving only documents 1 through i-1 are fixed by the previous choices, so maximizing the objective at rank i amounts to maximizing Pr[ri | ¬r1, ..., ¬r(i-1)].

Slide 23: Greedy on Other Metrics

• Greedy with precision/recall reduces to PRP!

• Greedy on k-call for general k (k-greedy)
  – More complicated…

• Greedy with MSL, MRR, and instance recall works out to the 1-greedy algorithm
  – Intuition: to make the first relevant document appear earlier, we want to hedge our bets as to the query interpretation (i.e., diversify)

Slide 24: Experiments Overview

• Experiments verify that optimizing for a metric improves performance on that metric
  – They do not tell us which metrics to use

• Looked at ad hoc diversity examples

• TREC topics/queries

• Tuned weights on a separate development set

• Tested on:
  – Standard ad hoc (robust track) topics
  – Topics with multiple annotators
  – Topics with multiple instances

Slide 25: Diversity on Google Results

• Task: reranking top 1,000 Google results

• In optimizing 1-call, our algorithm finds more diverse results than either PRP or the original Google ranking

Slide 26: Experiments: Robust Track

• TREC 2003 and 2004 robust tracks
  – 249 topics
  – 528,000 documents

• 1-call and 10-call results are statistically significant

            1-call   10-call   MRR     MSL     P@10
PRP         0.791    0.020     0.563   3.052   0.333
1-greedy    0.835    0.004     0.579   2.763   0.269
10-greedy   0.671    0.084     0.517   3.992   0.337

Slide 27: Experiments: Instance Retrieval

• TREC-6, 7, and 8 interactive tracks
  – 20 topics
  – 210,000 documents
  – 7 to 56 instances per topic

• PRP baseline: instance recall @ 10 = 0.234

• Greedy 1-call: instance recall @ 10 = 0.315

Slide 28: Experiments: Multi-annotator

• TREC-4 and TREC-6 ad hoc retrieval
  – Independent annotators assessed the same topics
  – TREC-4: 49 topics, 568,000 documents, 3 annotators
  – TREC-6: 50 topics, 556,000 documents, 2 annotators

• More annotators are satisfied when using 1-greedy

                  1-call (1)   1-call (2)   1-call (3)   Total
TREC-4 PRP        0.735        0.551        0.653        1.939
TREC-4 1-greedy   0.776        0.633        0.714        2.122
TREC-6 PRP        0.660        0.620        N/A          1.280
TREC-6 1-greedy   0.800        0.820        N/A          1.620

Slide 29: Related Work

• Fits in risk minimization framework (objective as negative loss function)

• Other approaches look at optimizing for metrics directly, with training data

• Pseudorelevance feedback

• Subtopic retrieval

• Maximal marginal relevance

• Clustering

• See paper for references

Slide 30: Future Work

• General k-call (k = 2, etc.)
  – Determine whether this is what users want

• Better underlying probabilistic model
  – Our contribution is in the ranking objective, not the model; the model can be arbitrarily sophisticated

• Better optimization techniques
  – E.g., local search would differentiate the algorithms for MRR and 1-call

• Other metrics
  – Preliminary work on mean average precision and precision @ recall
    • (Perhaps) surprisingly, these metrics are not optimized by PRP!

Slide 31: Conclusions

• EMP: Metric can motivate model – choosing and believing in a metric already gives us a reasonable objective, E[metric]

• Can potentially apply EMP on top of a variety of different underlying probabilistic models

• Diversity is one practical example of a natural side effect of using EMP with the right metric

Slide 32: Acknowledgments

• Harr Chen supported by the Office of Naval Research through a National Defense Science and Engineering Graduate Fellowship

• Jaime Teevan, Susan Dumais, and anonymous reviewers provided constructive feedback

• ChengXiang Zhai, William Cohen, and Ellen Voorhees provided code and data