Improvements to BM25 and Language Models Examined

ANDREW TROTMAN, ANTTI PUURULA, BLAKE BURGESS

AUSTRALASIAN DOCUMENT COMPUTING SYMPOSIUM 2014

MELBOURNE, AUSTRALIA

PRESENTED BY ANTTI PUURULA

Introduction

TREC evaluations of the 1990s established the current ranking functions for ad-hoc document retrieval

The mid-1990s introduced BM25 [23], the most successful ranking function to date

Armstrong et al. [1, 2] showed in 2009 that there was no evidence of improvement in a decade, but multiple recent publications claim improvements

Introduction

Has there been any improvement in ranking function precision?

We examine this question by testing several recent BM25 and LM ranking functions

We test each function, then add relevance feedback, stemming, and stopping

Mean Average Precision (MAP) is compared on INEX Wikipedia 2010 and TREC Ad-hoc 1-8, with functions optimized on INEX Wikipedia 2009
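
As a reminder of what the comparison metric measures, here is a minimal sketch of MAP under its usual definition. The function names and the dictionary layout of `runs` and `qrels` are illustrative assumptions, not the evaluation tooling used in the study.

```python
from typing import Dict, Sequence, Set


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against the relevant document ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, Sequence[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Mean of the per-query average precision over all queries in `runs`."""
    return sum(average_precision(r, qrels.get(q, set()))
               for q, r in runs.items()) / len(runs)
```

Relevant documents that are never retrieved still count in the denominator, so a run is penalized for missing them.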

Ranking functions: ATIRE BM25

Trotman et al. (2012) [27]: ATIRE version of BM25

Retrieval status value for query q

Robertson-Walker IDF
◦ N = number of documents
◦ df_t = number of documents in which term t occurs

BM25 term frequency normalization
◦ tf_td = count of term t in document d
◦ L_d = length (L1-norm) of document d
◦ L_avg = average length of documents
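
The components named on this slide can be put together as a short sketch. It assumes the commonly stated ATIRE form, rsv_q = Σ_t log(N/df_t) · (k1+1)·tf_td / (k1·(1 − b + b·L_d/L_avg) + tf_td). The function name, argument layout, and default k1 and b values are illustrative assumptions, not the tuned parameters from the talk.

```python
import math
from typing import Dict, Iterable


def atire_bm25(query: Iterable[str], doc_tf: Dict[str, int], doc_len: float,
               avg_len: float, df: Dict[str, int], n_docs: int,
               k1: float = 0.9, b: float = 0.4) -> float:
    """Score one document as the sum over query terms of IDF times normalized TF."""
    score = 0.0
    for t in query:
        tf = doc_tf.get(t, 0)
        if tf == 0 or df.get(t, 0) == 0:
            continue  # unmatched terms contribute nothing
        idf = math.log(n_docs / df[t])  # log(N / df_t), never negative
        length_norm = k1 * (1.0 - b + b * doc_len / avg_len)
        score += idf * ((k1 + 1.0) * tf) / (length_norm + tf)
    return score
```

Because df_t can never exceed N, this IDF stays non-negative, so a matching term cannot lower a document's score.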

Ranking functions: BM25L

Lv & Zhai (2011) [12]: BM25 corrected for very long documents

= BM25 with smoothed parameter estimates (with 1.0, 0.5, and δ added)

Smoothed Robertson-Walker IDF

Length-corrected BM25 term frequency normalization
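
A minimal sketch of how the added constants (1.0, 0.5, and δ) enter the score, assuming the usual BM25L form: IDF = log((N+1)/(df_t+0.5)) and a length-normalized frequency c_td = tf_td / (1 − b + b·L_d/L_avg) that is shifted by δ before saturation. The function name and default parameter values are illustrative assumptions.

```python
import math
from typing import Dict, Iterable


def bm25l(query: Iterable[str], doc_tf: Dict[str, int], doc_len: float,
          avg_len: float, df: Dict[str, int], n_docs: int,
          k1: float = 1.2, b: float = 0.75, delta: float = 0.5) -> float:
    """BM25L sketch: shift the length-normalized TF by delta before saturating."""
    score = 0.0
    for t in query:
        tf = doc_tf.get(t, 0)
        if tf == 0 or df.get(t, 0) == 0:
            continue
        idf = math.log((n_docs + 1.0) / (df[t] + 0.5))  # smoothed IDF
        c = tf / (1.0 - b + b * doc_len / avg_len)      # length-corrected TF
        score += idf * ((k1 + 1.0) * (c + delta)) / (k1 + c + delta)
    return score
```

The shift by δ keeps the TF component of a matching term above (k1+1)·δ/(k1+δ) however long the document is, which is the correction for very long documents.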

Ranking functions: BM25+

Lv & Zhai (2011) [11]: BM25 with lower-bounded term weights

Smoothed Robertson-Walker IDF

Lower-bounding parameter
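
The lower bound can be illustrated with a short sketch, assuming the usual BM25+ form in which δ is added to the saturated TF component and the IDF is log((N+1)/df_t). The function name and defaults are illustrative assumptions.

```python
import math
from typing import Dict, Iterable


def bm25_plus(query: Iterable[str], doc_tf: Dict[str, int], doc_len: float,
              avg_len: float, df: Dict[str, int], n_docs: int,
              k1: float = 1.2, b: float = 0.75, delta: float = 1.0) -> float:
    """BM25+ sketch: every matched term contributes at least delta * IDF."""
    score = 0.0
    for t in query:
        tf = doc_tf.get(t, 0)
        if tf == 0 or df.get(t, 0) == 0:
            continue
        idf = math.log((n_docs + 1.0) / df[t])  # smoothed IDF
        length_norm = k1 * (1.0 - b + b * doc_len / avg_len)
        tf_part = ((k1 + 1.0) * tf) / (length_norm + tf)
        score += idf * (tf_part + delta)        # delta is the lower bound
    return score
```

However long the document, a matched term still scores more than δ times its IDF, so occurrence in a long document always beats absence.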

Ranking functions: BM25-adpt

Lv & Zhai (2011) [10]: BM25 with term-dependent k1, using information gain G_q^r

Smoothed Robertson-Walker IDF

Term-dependent component

k′1 is solved offline for each term from the index, using curve fitting with the least-squares method

Ranking functions: BM25T

Lv & Zhai (2012) [13]: BM25 with term-dependent k1, using a log-logistic method

k′1 is solved offline for each term from the index, using the Newton-Raphson method
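
The offline solve can be sketched as a one-dimensional Newton-Raphson root search. This sketch assumes the condition being solved is g(k′1) = target, with g(k) = k/(k − 1)·log k and the target taken as the mean of log(c_td + 1) over the documents containing the term. Both that form of g and all function names here are assumptions for illustration, not taken from the slides.

```python
import math
from typing import Sequence


def g(k: float) -> float:
    """Assumed function k/(k-1) * log(k), with g(1) = 1 by continuity."""
    x = k - 1.0
    if abs(x) < 1e-6:
        return 1.0 + x / 2.0  # first-order series around k = 1
    return k / x * math.log(k)


def g_prime(k: float) -> float:
    """Derivative of g: (k - 1 - log k) / (k - 1)^2, which tends to 1/2 at k = 1."""
    x = k - 1.0
    if abs(x) < 1e-4:
        return 0.5 - x / 3.0  # series avoids cancellation near k = 1
    return (x - math.log(k)) / (x * x)


def target_from_index(c_values: Sequence[float]) -> float:
    """Mean of log(c_td + 1) over documents containing the term (assumed target)."""
    return sum(math.log(c + 1.0) for c in c_values) / len(c_values)


def solve_k1(target: float, k: float = 1.5, tol: float = 1e-10,
             max_iter: int = 50) -> float:
    """Newton-Raphson iteration for g(k) = target, keeping k positive."""
    for _ in range(max_iter):
        step = (g(k) - target) / g_prime(k)
        k = max(k - step, 1e-6)  # clamp to the domain k > 0
        if abs(step) < tol:
            break
    return k
```

Under this assumption g is increasing for k > 0, so each positive target has a single root and the iteration does not depend much on the starting point.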

Ranking functions: TFl∘δ∘p×IDF

Rousseau & Vazirgiannis (2013) [25]: Composite non-linear TF normalizations

Smoothed Robertson-Walker IDF

BM25 soft length normalization

Log-concavity normalization

Lower-bounding parameter
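
The composition of the four labeled components can be sketched as follows, assuming the form log((N+1)/df_t) · (1 + log(1 + log(tf_td / (1 − b + b·L_d/L_avg) + δ))). The function name and defaults are illustrative assumptions.

```python
import math
from typing import Dict, Iterable


def tf_ldp_idf(query: Iterable[str], doc_tf: Dict[str, int], doc_len: float,
               avg_len: float, df: Dict[str, int], n_docs: int,
               b: float = 0.2, delta: float = 1.0) -> float:
    """Composite TF sketch: length normalization, lower bound, then double log."""
    score = 0.0
    for t in query:
        tf = doc_tf.get(t, 0)
        if tf == 0 or df.get(t, 0) == 0:
            continue
        idf = math.log((n_docs + 1.0) / df[t])             # smoothed IDF
        pivoted = tf / (1.0 - b + b * doc_len / avg_len)   # soft length normalization
        bounded = pivoted + delta                          # lower-bounding parameter
        score += idf * (1.0 + math.log(1.0 + math.log(bounded)))  # log-concavity
    return score
```

With δ = 1 and tf ≥ 1 the argument of each logarithm is greater than 1, so the sketch needs no extra guards; smaller δ values would need a check.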

Ranking functions: LM-DS

Zhai & Lafferty (2001): Unigram Language Model with Dirichlet Prior Smoothing

Smoothing component

Matched term component
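
The split into a smoothing component and a matched term component follows from the Dirichlet-smoothed query likelihood. The sketch below computes both the full log-likelihood and the rank-equivalent two-part form. The function names, the `coll_prob` dictionary of collection term probabilities, and the default μ are illustrative assumptions.

```python
import math
from typing import Dict, Sequence


def lm_ds_log_likelihood(query: Sequence[str], doc_tf: Dict[str, int],
                         doc_len: float, coll_prob: Dict[str, float],
                         mu: float = 2500.0) -> float:
    """Full query log-likelihood with P(t|d) = (tf_td + mu * P(t|C)) / (L_d + mu)."""
    return sum(math.log((doc_tf.get(t, 0) + mu * coll_prob[t]) / (doc_len + mu))
               for t in query)


def lm_ds_rank_equivalent(query: Sequence[str], doc_tf: Dict[str, int],
                          doc_len: float, coll_prob: Dict[str, float],
                          mu: float = 2500.0) -> float:
    """Same ranking, written as smoothing component plus matched term component."""
    smoothing = len(query) * math.log(mu / (doc_len + mu))
    matched = sum(math.log(1.0 + doc_tf[t] / (mu * coll_prob[t]))
                  for t in query if doc_tf.get(t, 0) > 0)
    return smoothing + matched
```

The two functions differ only by the sum of log P(t|C) over the query terms, which is the same for every document, so they rank documents identically while the second touches only matched terms.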

Ranking functions: LM-PYP

Momtazi & Klakow (2010): Unigram LM with Pitman-Yor Process smoothing

Power-law discounting

Ranking functions: LM-PYP-TFIDF

Puurula (2012): LM-PYP with TFIDF feature weighting

TF-IDF feature weighting

ATIRE KL-divergence feedback

Rank terms in the top k retrieved documents R_i using KL-divergence

Expand the query with the top n ranked terms using Rocchio feedback:

Top-k document model

Collection model

Feedback query vector

Original query vector
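
The two steps can be sketched as below. This is only an illustration under stated assumptions: terms are scored by their contribution to the KL-divergence between the top-k document model and the collection model, p(t|R)·log(p(t|R)/p(t|C)), and the expansion is a Rocchio-style weighted sum of the original and feedback query vectors. The function names, the α and β weights, and the normalization of the feedback weights are hypothetical and are not ATIRE's actual implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def kl_term_scores(top_docs: Sequence[Sequence[str]],
                   coll_prob: Dict[str, float]) -> Dict[str, float]:
    """Score terms by p(t|R) * log(p(t|R) / p(t|C)) over the top-k documents."""
    counts: Counter = Counter()
    for doc in top_docs:
        counts.update(doc)
    total = sum(counts.values())
    scores = {}
    for term, count in counts.items():
        p_r = count / total          # top-k document model
        p_c = coll_prob.get(term)    # collection model
        if p_c:
            scores[term] = p_r * math.log(p_r / p_c)
    return scores


def rocchio_expand(query_vec: Dict[str, float], term_scores: Dict[str, float],
                   n: int = 2, alpha: float = 1.0, beta: float = 0.5) -> Dict[str, float]:
    """Add the top-n positively scored terms to the query with total weight beta."""
    top: List = sorted(term_scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
    top = [(t, s) for t, s in top if s > 0]
    norm = sum(s for _, s in top) or 1.0
    expanded = {t: alpha * w for t, w in query_vec.items()}  # original query vector
    for term, score in top:                                  # feedback query vector
        expanded[term] = expanded.get(term, 0.0) + beta * score / norm
    return expanded
```

A term that is as frequent in the top-k documents as in the collection scores zero, so common words are not added to the query.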

Truncated model-based feedback

Reweight original query terms using the top-k documents, using posterior probabilities of documents as mixture weights

Interpolate with original query weights

Original query vector

Feedback query vector

Parameter optimization

Parameters for each ranking function optimized on INEX Wikipedia 2009

◦ Parameters constrained to reasonable ranges
◦ Particle Swarm Optimization with 64 particles and 20 generations
◦ 50 generations used for models with feedback (with up to 8 parameters)

Functions tested on INEX Wikipedia 2010 and TREC 1-8 datasets

◦ INEX Wikipedia 2010: same documents as INEX 2009, different queries
◦ TREC 1-8: different documents, different queries
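
The optimization loop described above can be sketched with a basic particle swarm. Only the 64 particles, 20 generations, and the bounded parameter ranges come from the slide. The inertia and acceleration constants, the fixed seed, and the toy objective standing in for MAP on the tuning collection are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_maximize(objective: Callable[[List[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_particles: int = 64, generations: int = 20,
                 w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Maximize `objective` inside `bounds` with a basic global-best swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(generations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Clamp to the allowed parameter range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val > g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def toy_map(params: List[float]) -> float:
    """Hypothetical stand-in for MAP as a function of (k1, b)."""
    k1, b = params
    return -((k1 - 0.9) ** 2 + (b - 0.4) ** 2)
```

A call such as `pso_maximize(toy_map, [(0.0, 3.0), (0.0, 1.0)])` returns the best (k1, b) pair found and its objective value. In a real setting each objective call would run the full query set, which is why the number of particles and generations bounds the tuning cost.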

First observations

Same documents, different queries (INEX 2010):

◦ Differences between ranking functions very small

Different documents, different queries (TREC 1-8):

◦ BM25-adpt slightly better than others on 5 out of 9 collections
◦ Most likely due to the collection-adaptive k1 parameters
◦ LMs generally worse than BM25 variants
◦ But ATIRE LM implementations not extensively optimized, unlike BM25

More observations

Feedback is very effective for both BM25 and LM

◦ ATIRE KL-feedback fails on LMs, truncated model-based feedback works

Stopping harms BM25 strongly, stemming can help

◦ Porter stemming seems to harm
◦ S-stemmer and Krovetz help

Final observations: feedback+stemming

Feedback+stemming improves BM25 and LM+DP

◦ No ranking function clearly better than the rest
◦ Stemming is effective
◦ Again ATIRE KL-feedback fails on LMs, truncated model-based feedback works

Paired 1-tailed t-tests of best-performing functions:

◦ Feedback is better than no feedback (p=0.0267)
◦ Stemming with feedback is better than just feedback (p=0.0292)
◦ Stemming with feedback is better than neither (p<0.0001)
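
For reference, the statistic behind a paired t-test can be sketched with the standard library alone. The function name and the example scores in the usage note are made up for illustration and are not results from the study.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for the paired differences a[i] - b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample standard deviation).
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

For example, `paired_t_statistic([0.30, 0.35, 0.41], [0.29, 0.33, 0.38])` pairs per-collection scores of two systems. The one-tailed p-value is the upper-tail probability of Student's t distribution at `t` with `n - 1` degrees of freedom, which needs a t-distribution function such as SciPy's `scipy.stats.t.sf`.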

Conclusions

Differences between the suggested BM25 ranking functions become very small when parameters are optimized on a different but similar dataset

◦ LM power-law discounting particularly brittle, BM25 parameters more stable

Feedback works for both BM25 and LM, but different feedback functions are needed

Stopping harms BM25, stemming can help

Results were exploratory, but in this scenario BM25 seems to outperform LM

◦ Implementation differences can reduce ranking function performance
◦ Optimization becomes increasingly difficult with many parameters

Rewriting BM25 (BM25L example)

Robertson & Sparck-Jones 1976