on statistical analysis and optimization of information retrieval effectiveness metrics
![Page 1: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/1.jpg)
On Statistical Analysis and Optimization of Information Retrieval
Effectiveness Metrics
Jun Wang
Joint work with Jianhan Zhu
Department of Computer Science
University College London
![Page 2: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/2.jpg)
Motivation
IR Models
Calculate (relevance) scores for individual documents
Probability Indexing
BM25
Language Models
The Binary Independence Relevance Model
![Page 3: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/3.jpg)
Motivation
[Figure: a ranked list of documents marked ✔ ✖ ✔ ✖]
A general definition of an IR metric:
m(a rank order | “true” relevance of documents)
![Page 4: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/4.jpg)
Motivation
We have different rank preferences and thus different IR metrics
IR Models → ? → IR metrics (NDCG, MRR, MAP, …)
Something is missing in between
![Page 5: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/5.jpg)
Motivation
The fundamental question
What is the underlying generative retrieval process?
![Page 6: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/6.jpg)
Outline
• What is happening right now
• The statistical retrieval process
• Text retrieval experiments
![Page 7: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/7.jpg)
What is happening right now (1)?
• Still focusing on (relevance) scores, but acknowledging the final rank context
– The “less is more” model [Chen&Karger 2006] extended the relevance model
– It assumes the previously retrieved documents are non-relevant when calculating the relevance of documents for the current rank position,
– which is equivalent to maximizing the Reciprocal Rank measure
![Page 8: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/8.jpg)
What is happening right now (2)?
• Still focusing on (relevance) scores, but acknowledging the final rank context
– In the Language Model framework, various loss functions were defined to incorporate various ranking strategies [Zhai&Lafferty 2006]
![Page 9: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/9.jpg)
What is happening right now (3)?
• Focusing on IR metrics and ranking
– bypass the step of estimating the relevance states of individual documents
– construct a document ranking model from training data by directly optimizing an IR metric [Volkovs&Zemel 2009]
• However, not all IR metrics necessarily summarize the (training) data well; thus, training data may not be fully explored [Yilmaz&Robertson2009]
![Page 10: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/10.jpg)
A “balanced” view of the retrieval process
– First, let us understand (infer) the relevance of documents as accurately as possible,
– and summarize it by the joint probability of the documents’ relevance
– dependency between documents is considered
– Secondly, rank preference is specified by an IR metric.
– The rank decision is a stochastic one due to the uncertainty about relevance
– As a result, the optimal ranking action is the one that maximizes the expected value of the given IR metric
![Page 11: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/11.jpg)
The statistical document ranking process
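$$\hat{a} = \arg\max_{a} E(m \mid q) = \arg\max_{a_1,\dots,a_N} \sum_{r_1,\dots,r_N} m(a_1,\dots,a_N \mid r_1,\dots,r_N)\, p(r_1,\dots,r_N \mid q)$$
• $p(r_1,\dots,r_N \mid q)$: the joint probability of relevance given a query
• $m(a_1,\dots,a_N \mid r_1,\dots,r_N)$: the IR metric. Input: 1. a rank order $a_1,\dots,a_N$; 2. the relevance of the documents $r_1,\dots,r_N$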
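The arg max above can be illustrated with a brute-force sketch in Python. It is not the authors' implementation: it enumerates every rank order and every relevance assignment, so it is only practical for a handful of documents. The joint distribution `joint`, the three-document example, and the function names are all hypothetical, and Reciprocal Rank is used simply as one choice of metric $m$.

```python
from itertools import permutations

def reciprocal_rank(order, rel):
    """m(a | r): 1 / position of the first relevant document, 0 if none."""
    for pos, doc in enumerate(order, start=1):
        if rel[doc]:
            return 1.0 / pos
    return 0.0

def expected_metric(order, joint, metric):
    """E(m | q): the metric summed over relevance assignments, weighted by p(r | q)."""
    return sum(p * metric(order, rel) for rel, p in joint.items())

def optimal_ranking(joint, metric, n_docs):
    """Exhaustive arg max over all rank orders (feasible only for tiny N)."""
    return max(permutations(range(n_docs)),
               key=lambda order: expected_metric(order, joint, metric))

# Hypothetical joint p(r0, r1, r2 | q): documents 0 and 1 are always
# relevant together, document 2 is relevant only when they are not.
joint = {
    (1, 1, 0): 0.5,
    (0, 0, 1): 0.3,
    (0, 0, 0): 0.2,
}

best = optimal_ranking(joint, reciprocal_rank, 3)
print(best, expected_metric(best, joint, reciprocal_rank))
print(expected_metric((0, 1, 2), joint, reciprocal_rank))
```

In this toy example, sorting by marginal probability of relevance gives the order (0, 1, 2), yet the search prefers to place document 2 second, because document 1 adds nothing once document 0 has been shown. This is the kind of document dependency the joint probability is meant to capture.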
![Page 12: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/12.jpg)
The Optimal Ranker
Uncertainty: $p(r_1,\dots,r_N \mid q)$
Fixed: an IR metric $m$
Rank order: $a_1,\dots,a_N$
OUTPUT: the estimated performance score
$$E(m \mid q) = \sum_{r_1,\dots,r_N} m(a_1,\dots,a_N \mid r_1,\dots,r_N)\, p(r_1,\dots,r_N \mid q)$$
![Page 13: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/13.jpg)
Now the question is how to calculate the expected IR metric under the joint probability of relevance
$$E(m \mid q) = \sum_{r_1,\dots,r_N} m(a_1,\dots,a_N \mid r_1,\dots,r_N)\, p(r_1,\dots,r_N \mid q)$$
if we predefine the IR metric $m(a_1,\dots,a_N \mid r_1,\dots,r_N)$
![Page 14: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/14.jpg)
We worked it out for the major IR metrics (Average Precision, DCG, Precision at N, Reciprocal Rank)
• Certain assumptions are needed
• The joint distribution of relevance $p(r_1,\dots,r_N \mid q)$ is summarized by the marginal means $E(r_1 \mid q),\dots,E(r_N \mid q)$ and covariances $\mathrm{cov}(r_i, r_j \mid q)$
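For a metric that is linear in the relevance variables, linearity of expectation means the marginal means alone determine the expected value. The Python sketch below checks this for DCG with a linear gain and a log2 discount. That DCG form is an assumption of this sketch, not necessarily the variant used in the talk, and the joint distribution and function names are hypothetical. Metrics such as Reciprocal Rank and Average Precision involve products of relevance variables, which is where the covariances come in.

```python
import math

def dcg(order, rel):
    """DCG with linear gain: sum of rel / log2(rank + 1) down the ranking."""
    return sum(rel[d] / math.log2(k + 1) for k, d in enumerate(order, start=1))

def expected_dcg_exact(order, joint):
    """E(DCG | q) by summing over every relevance assignment in the joint."""
    return sum(p * dcg(order, rel) for rel, p in joint.items())

def expected_dcg_from_means(order, means):
    """E(DCG | q) from the marginal means E(r_i | q) only."""
    return sum(means[d] / math.log2(k + 1) for k, d in enumerate(order, start=1))

# Hypothetical, strongly correlated joint p(r0, r1, r2 | q).
joint = {
    (1, 1, 0): 0.5,
    (0, 0, 1): 0.3,
    (0, 0, 0): 0.2,
}
means = [sum(p * rel[i] for rel, p in joint.items()) for i in range(3)]

order = (0, 1, 2)
exact = expected_dcg_exact(order, joint)
shortcut = expected_dcg_from_means(order, means)
print(exact, shortcut)  # the two values agree up to floating-point error
```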
![Page 15: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/15.jpg)
Some of the results
• Expected Average Precision:
• Expected Reciprocal Rank (two documents):
E[ m ]
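The formulas on this slide did not survive extraction. As a worked illustration only, and not necessarily in the form the slide used, the two-document Reciprocal Rank case can be derived directly. Assume binary relevance $r_1, r_2 \in \{0, 1\}$ for the documents at rank 1 and rank 2:

```latex
\begin{aligned}
RR &= r_1 + \tfrac{1}{2}\,(1 - r_1)\, r_2 \\
E(RR \mid q) &= E(r_1 \mid q) + \tfrac{1}{2}\bigl( E(r_2 \mid q) - E(r_1 r_2 \mid q) \bigr) \\
             &= E(r_1 \mid q) + \tfrac{1}{2}\bigl( E(r_2 \mid q) - E(r_1 \mid q)\,E(r_2 \mid q) - \mathrm{cov}(r_1, r_2 \mid q) \bigr)
\end{aligned}
```

Under these assumptions the expectation depends only on the two marginal means and the covariance, and a positive covariance between the two documents lowers the expected Reciprocal Rank.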
![Page 16: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/16.jpg)
Properties of IR metrics under the uncertainty
![Page 17: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/17.jpg)
But can this analysis be used in practice?
• The key question is how to obtain the joint probability of relevance
– Click-through data
– Marginal mean $E(r_1 \mid q),\dots,E(r_N \mid q)$
• Current IR models: relevance models, language models
– Covariance of relevance $\mathrm{cov}(r_i, r_j \mid q)$
• Use the documents’ score correlation to estimate the relevance correlation.
• It is query-independent. We approximate it by sampling queries and calculating the correlation between documents’ ranking scores
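The covariance estimate described above might be sketched as follows with NumPy. The slide only says that score correlation across sampled queries stands in for relevance correlation. Scaling that correlation by Bernoulli standard deviations $\sqrt{E(r_i)(1 - E(r_i))}$ to get a covariance is an assumption of this sketch, as are the function name, the synthetic scores, and the example means.

```python
import numpy as np

def relevance_covariance(scores, means):
    """Approximate cov(r_i, r_j | q) from ranking scores.

    scores: array of shape (n_sampled_queries, n_docs), the ranking score
            of each document for each sampled query.
    means:  array of shape (n_docs,), the marginal means E(r_i | q).
    """
    # Query-independent correlation between documents' ranking scores.
    corr = np.corrcoef(scores, rowvar=False)
    # Assumption: binary relevance, so std(r_i) = sqrt(p_i * (1 - p_i)).
    std = np.sqrt(means * (1.0 - means))
    return corr * np.outer(std, std)

# Synthetic scores for 50 sampled queries and 3 documents (hypothetical data):
# document 1 is made to score almost identically to document 0.
rng = np.random.default_rng(0)
scores = rng.normal(size=(50, 3))
scores[:, 1] = scores[:, 0] + 0.1 * rng.normal(size=50)

means = np.array([0.5, 0.5, 0.3])
cov = relevance_covariance(scores, means)
print(np.round(cov, 3))
```

The diagonal of the result is the Bernoulli variance of each document, and near-duplicate documents such as 0 and 1 receive a covariance close to that maximum.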
![Page 18: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/18.jpg)
TREC evaluation
![Page 19: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/19.jpg)
No free lunch
![Page 20: On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics](https://reader033.vdocuments.us/reader033/viewer/2022060200/5598b9c11a28abb24a8b462b/html5/thumbnails/20.jpg)
The idea can be applied to evaluation too.
Input: an IR model
Input: relevance judgments
Uncertainty: $p(r_1,\dots,r_N \mid q)$
Fixed: an IR metric $m$
Rank order: $a_1,\dots,a_N$
Output: the estimated performance score $E(m \mid q)$