IR Models Part I — Foundations
Thomas Roelleke, Queen Mary University of London
Ingo Frommholz, University of Bedfordshire
Autumn School for Information Retrieval and Foraging
Schloss Dagstuhl, September 2014
Acknowledgements
The knowledge presented in this tutorial and the Morgan & Claypool book is the result of many discussions with colleagues.
People involved in the production and reviewing: Gianni Amati and Djoerd Hiemstra (the experts), Diane Cerra and Gary Marchionini (Morgan & Claypool), Ricardo Baeza-Yates, Norbert Fuhr, and Mounia Lalmas.
Thomas’ PhD students (who had no choice): Jun Wang, Hengzhi Wu, Fred Forst, Hany Azzam, Sirvan Yahyaei, Marco Bonzanini, Miguel Martinez-Alvarez.
Many more IR experts including Fabio Crestani, Keith van Rijsbergen, Stephen Robertson, Fabrizio Sebastiani, Arjen de Vries, Tassos Tombros, Hugo Zaragoza, ChengXiang Zhai.
And non-IR experts Fabrizio Smeraldi, Andreas Kaltenbrunner and Norman Fenton.
Table of Contents
1 Introduction
2 Foundations of IR Models
Introduction
Warming Up
Background: Time-Line of IR Models
Notation
Warming Up
Information Retrieval Conceptual Model
[Diagram: conceptual IR model. Documents D and queries Q are mapped by representation functions (αD, αQ) and description functions (βD, βQ) to document and query representations, which the retrieval function ρIR compares; relevance judgements relate D × Q to relevance R.]
[Fuhr, 1992]
Vector Space Model, Term Space
The Vector Space Model (VSM) is still one of the most prominent IR frameworks
A term space is a vector space where each dimension represents one term in our vocabulary
If we have n terms in our collection, we get an n-dimensional term or vector space
Each document and each query is represented by a vector in the term space
Formal Description
Set of terms in our vocabulary: T = {t1, . . . , tn}. T spans an n-dimensional vector space.
Document dj is represented by a vector of document term weights
Query q is represented by a vector of query term weights
Document Vector
Document dj is represented by a vector of document term weights dji ∈ R:

~dj = (dj1, dj2, . . . , djn)

Document term weights can be computed, e.g., using tf and idf (see below)
Query Vector
Like documents, a query q is represented by a vector of query term weights qi ∈ R:

~q = (q1, q2, . . . , qn)

qi denotes the query term weight of term ti.
qi is 0 if the term does not appear in the query.
qi may be set to 1 if the term does appear in the query. Further query term weights are possible, for example:
2 if the term is important, 1 if the term is just “nice to have”
Retrieval Function
The retrieval function computes a retrieval status value (RSV) using a vector similarity measure, e.g. the scalar product:

RSV(dj, q) = ~dj × ~q = Σi=1..n dji × qi
[Figure: query q and documents d1, d2 as vectors in a two-dimensional term space with axes t1 and t2]
Ranking of documents according to decreasing RSV
Example Query
Query: “side effects of drugs on memory and cognitive abilities”

ti                 Query ~q   ~d1   ~d2   ~d3   ~d4
side effect        2          1     0.5   1     1
drug               2          1     1     1     1
memory             1          1     0     1     0
cognitive ability  1          0     1     1     0.5
RSV                           5     4     6     4.5
Produces the ranking d3 > d1 > d4 > d2
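As a minimal sketch, the scalar-product RSV for this example can be reproduced in a few lines of Python (the weights are taken from the table above; the variable names are ours):

```python
# Scalar-product RSV for the example query (weights from the table above).
query = {"side effect": 2, "drug": 2, "memory": 1, "cognitive ability": 1}
docs = {
    "d1": {"side effect": 1, "drug": 1, "memory": 1, "cognitive ability": 0},
    "d2": {"side effect": 0.5, "drug": 1, "memory": 0, "cognitive ability": 1},
    "d3": {"side effect": 1, "drug": 1, "memory": 1, "cognitive ability": 1},
    "d4": {"side effect": 1, "drug": 1, "memory": 0, "cognitive ability": 0.5},
}

def rsv(d, q):
    """RSV(d, q) = sum_i d_i * q_i (scalar product over the term space)."""
    return sum(w * q.get(t, 0) for t, w in d.items())

# Rank documents by decreasing RSV: d3 (6.0), d1 (5.0), d4 (4.5), d2 (4.0)
for name, d in sorted(docs.items(), key=lambda kv: -rsv(kv[1], query)):
    print(name, rsv(d, query))
```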
Term weights: Example Text
In his address to the CBI, Mr Cameron is expected to say: “Scotland does twice as much trade with the rest of the UK than with the rest of the world put together – trade that helps to support one million Scottish jobs.” Meanwhile, Mr Salmond has set out six job-creating powers for Scotland that he said were guaranteed with a “Yes” vote in the referendum. During their televised BBC debate on Monday, Mr Salmond had challenged Better Together head Alistair Darling to name three job-creating powers that were being offered to the Scottish Parliament by the pro-UK parties in the event of a “No” vote.
Source: http://www.bbc.co.uk/news/uk-scotland-scotland-politics-28952197
What are good descriptors for the text? Which are more, which are less important? Which are “informative”? Which are good discriminators?
How can a machine answer these questions?
Frequencies
The answer is counting. Different assumptions:
The more frequently a term appears in a document, the more suitable it is to describe its content
Location-based count. Think of term positions or locations. In how many locations of a text do we observe the term? The term ‘scotland’ appears in 2 out of 138 locations in the example text
The fewer documents a term occurs in, the more discriminative or informative it is
Document-based count. In how many documents do we observe the term? Think of stop-words like ‘the’, ‘a’ etc.
Location- and document-based frequencies are the building blocks of all (probabilistic) models to come
Background: Time-Line of IR Models
Timeline of IR Models: 50s, 60s and 70s
Zipf and Luhn: distribution of document frequencies
[Croft and Harper, 1979]: BIR without relevance
[Robertson and Sparck-Jones, 1976]: BIR
[Salton, 1971, Salton et al., 1975]: VSM, TF-IDF
[Rocchio, 1971]: Relevance feedback
[Maron and Kuhns, 1960]: On Relevance, Probabilistic Indexing, and IR
Timeline of IR Models: 80s
[Cooper, 1988, Cooper, 1991, Cooper, 1994]: Beyond Boole, Probability Theory in IR: An Encumbrance
[Dumais et al., 1988, Deerwester et al., 1990]: Latent semantic indexing
[van Rijsbergen, 1986, van Rijsbergen, 1989]: P(d → q)
[Bookstein, 1980, Salton et al., 1983]: Fuzzy, extended Boolean
Timeline of IR Models: 90s
[Ponte and Croft, 1998]: LM
[Brin and Page, 1998, Kleinberg, 1999]: Pagerank and Hits
[Robertson et al., 1994, Singhal et al., 1996]: Pivoted Document Length Normalisation
[Wong and Yao, 1995]: P(d → q)
[Robertson and Walker, 1994, Robertson et al., 1995]: 2-Poisson, BM25
[Margulis, 1992, Church and Gale, 1995]: Poisson
[Fuhr, 1992]: Probabilistic Models in IR
[Turtle and Croft, 1990, Turtle and Croft, 1991]: PIN’s
[Fuhr, 1989]: Models for Probabilistic Indexing
Timeline of IR Models: 00s
ICTIR 2009 and ICTIR 2011
[Roelleke and Wang, 2008]: TF-IDF Uncovered
[Luk, 2008, Robertson, 2005]: Event Spaces
[Roelleke and Wang, 2006]: Parallel Derivation of Models
[Fang and Zhai, 2005]: Axiomatic approach
[He and Ounis, 2005]: TF in BM25 and DFR
[Metzler and Croft, 2004]: LM and PIN’s
[Robertson, 2004]: Understanding IDF
[Sparck-Jones et al., 2003]: LM and Relevance
[Croft and Lafferty, 2003, Lafferty and Zhai, 2003]: LM book
[Zaragoza et al., 2003]: Bayesian extension to LM
[Bruza and Song, 2003]: probabilistic dependencies in LM
[Amati and van Rijsbergen, 2002]: DFR
[Lavrenko and Croft, 2001]: Relevance-based LM
[Hiemstra, 2000]: TF-IDF and LM
[Sparck-Jones et al., 2000]: probabilistic model: status
Timeline of IR Models: 2010 and Beyond
Models for interactive and dynamic IR (e.g. iPRP [Fuhr, 2008])
Quantum models [van Rijsbergen, 2004, Piwowarski et al., 2010]
Notation
A tedious start ... but a must-have.
Sets
Locations
Documents
Terms
Probabilities
Notation: Sets
Notation and description of events, sets, and frequencies:

t, d, q, c, r: term t, document d, query q, collection c, relevant r
Dc, Dr: Dc = {d1, . . .} is the set of Documents in collection c; Dr: relevant documents
Tc, Tr: Tc = {t1, . . .} is the set of Terms in collection c; Tr: terms that occur in relevant documents
Lc, Lr: Lc = {l1, . . .} is the set of Locations in collection c; Lr: locations in relevant documents
Notation: Locations
Notation, description, and traditional notation:

nL(t, d): number of Locations at which term t occurs in document d (traditional: tf, tfd)
NL(d): number of Locations in document d, i.e. the document length (traditional: dl)
nL(t, q): number of Locations at which term t occurs in query q (traditional: qtf, tfq)
NL(q): number of Locations in query q, i.e. the query length (traditional: ql)
Notation: Locations
nL(t, c): number of Locations at which term t occurs in collection c (traditional: TF, cf(t))
NL(c): number of Locations in collection c
nL(t, r): number of Locations at which term t occurs in the set Lr
NL(r): number of Locations in the set Lr
Notation: Documents
nD(t, c): number of Documents in which term t occurs in the set Dc of collection c (traditional: nt, df(t))
ND(c): number of Documents in the set Dc of collection c (traditional: N)
nD(t, r): number of Documents in which term t occurs in the set Dr of relevant documents (traditional: rt)
ND(r): number of Documents in the set Dr of relevant documents (traditional: R)
Notation: Terms
nT(d, c): number of Terms in document d in collection c
NT(c): number of Terms in collection c
Notation: Average and Pivoted Length
Let u denote a collection associated with a set of documents. For example: u = c, or u = r, or u = r̄.

avgdl(u): average document length: avgdl(u) = NL(u)/ND(u) (avgdl if the collection is implicit)
pivdl(d, u): pivoted document length: pivdl(d, u) = NL(d)/avgdl(u) = dl/avgdl(u) (pivdl(d) if the collection is implicit)
λ(t, u): average term frequency over all documents in Du: λ(t, u) = nL(t, u)/ND(u)
avgtf(t, u): average term frequency over elite documents in Du: avgtf(t, u) = nL(t, u)/nD(t, u)
Notation: Location-based Probabilities
PL(t|d) := nL(t, d)/NL(d): Location-based within-document term probability; traditionally P(t|d) = tfd/|d|, with |d| = dl = NL(d)
PL(t|q) := nL(t, q)/NL(q): Location-based within-query term probability; traditionally P(t|q) = tfq/|q|, with |q| = ql = NL(q)
PL(t|c) := nL(t, c)/NL(c): Location-based within-collection term probability; traditionally P(t|c) = tfc/|c|, with |c| = NL(c)
PL(t|r) := nL(t, r)/NL(r): Location-based within-relevance term probability
Event space PL: Locations (LM, TF)
Notation: Document-based Probabilities
PD(t|c) := nD(t, c)/ND(c): Document-based within-collection term probability; traditionally P(t) = nt/N, with N = ND(c)
PD(t|r) := nD(t, r)/ND(r): Document-based within-relevance term probability; traditionally P(t|r) = rt/R, with R = ND(r)
PT(d|c) := nT(d, c)/NT(c): Term-based document probability
Pavg(t|c) := avgtf(t, c)/avgdl(c): probability that t occurs in a document with average length; avgtf(t, c) ≤ avgdl(c)
Event space PD: Documents (BIR, IDF)
Toy Example
Collection c: NL(c) = 20; ND(c) = 10; avgdl(c) = 20/10 = 2

Documents (doc1, doc2, doc3):
NL(d): 2, 3, 3
pivdl(d, c): 2/2, 3/2, 3/2

Terms (sailing, boats):
nL(t, c): 8, 6
nD(t, c): 6, 5
PL(t|c): 8/20, 6/20
PD(t|c): 6/10, 5/10
λ(t, c): 8/10, 6/10
avgtf(t, c): 8/6, 6/5
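A short sketch recomputing the derived quantities from the raw counts above (variable names and layout are ours):

```python
# Derived statistics of the toy example, from the raw counts.
N_L_c, N_D_c = 20, 10              # locations and documents in collection c
avgdl = N_L_c / N_D_c              # average document length: 2.0

doc_len = {"doc1": 2, "doc2": 3, "doc3": 3}
pivdl = {d: dl / avgdl for d, dl in doc_len.items()}   # 1.0, 1.5, 1.5

n_L = {"sailing": 8, "boats": 6}   # location-based counts n_L(t, c)
n_D = {"sailing": 6, "boats": 5}   # document-based counts n_D(t, c)

for t in n_L:
    P_L = n_L[t] / N_L_c           # location-based P_L(t|c)
    P_D = n_D[t] / N_D_c           # document-based P_D(t|c)
    lam = n_L[t] / N_D_c           # λ(t, c): avg frequency over all docs
    avgtf = n_L[t] / n_D[t]        # avg frequency over elite docs
    print(t, P_L, P_D, lam, avgtf)
```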
Foundations of IR Models
TF-IDF
PRF: The Probability of Relevance Framework
BIR: Binary Independence Retrieval
Poisson and 2-Poisson
BM25
LM: Language Modelling
PIN’s: Probabilistic Inference Networks
Relevance-based Models
Foundations: Summary
TF-IDF
Still a very popular model
Best known outside IR research, very intuitive
“TF-IDF is not a model; it is just a weighting scheme in the vector space model”
“TF-IDF is purely heuristic; it has no probabilistic roots.”
But:
TF-IDF and LM are dual models that can be shown to be derived from the same root.
TF-IDF is a simplified version of BM25.
TF Variants: TF(t, d)
TFtotal(t, d) := lftotal(t, d) := nL(t, d)  (= tfd)

TFsum(t, d) := lfsum(t, d) := nL(t, d)/NL(d) = PL(t|d)  (= tfd/dl)

TFmax(t, d) := lfmax(t, d) := nL(t, d)/nL(tmax, d)

TFlog(t, d) := lflog(t, d) := log(1 + nL(t, d))  (= log(1 + tfd))

TFfrac,K(t, d) := lffrac,K(t, d) := nL(t, d)/(nL(t, d) + Kd)  (= tfd/(tfd + Kd))

TFBM25,k1,b(t, d) := nL(t, d) / (nL(t, d) + k1 · (b · pivdl(d, c) + (1 − b)))
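The variants translate directly into code. A sketch in Python (the k1 = 1.2, b = 0.75 defaults are common BM25 settings, not prescribed by the slide):

```python
import math

def tf_total(tf_d):
    return tf_d                                  # TF_total = n_L(t, d)

def tf_sum(tf_d, dl):
    return tf_d / dl                             # TF_sum = P_L(t|d)

def tf_max(tf_d, tf_max_d):
    return tf_d / tf_max_d                       # normalise by the most frequent term

def tf_log(tf_d):
    return math.log(1 + tf_d)                    # TF_log

def tf_frac(tf_d, K):
    return tf_d / (tf_d + K)                     # TF_frac,K

def tf_bm25(tf_d, pivdl, k1=1.2, b=0.75):
    K = k1 * (b * pivdl + (1 - b))               # K from the pivoted document length
    return tf_d / (tf_d + K)                     # TF_BM25 = TF_frac with this K
```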
TF Variants: Collection-wide TF(t, c)
Analogously to TF(t, d), variants of TF(t, c), the collection-wide term frequency, can be defined. They are not considered any further here.
TFtotal and TFlog
[Plots: TFtotal (linear) and TFlog (logarithmic) as functions of nL(t, d)]
Bias towards documents with many terms (e.g. books vs. Twitter tweets)

TFtotal:
too steep; assumes all occurrences are independent (same impact)

TFlog:
less impact to subsequent occurrences; the base of the logarithm is ranking invariant, since it is a constant:

TFlog,base(t, d) := ln(1 + tfd) / ln(base)
Logarithmic TF: Dependence Assumption
The logarithmic TF assigns less impact to subsequent occurrences than the total TF does. This aspect becomes clear when reconsidering that the logarithm is an approximation of the harmonic sum:

TFlog(t, d) = ln(1 + tfd) ≈ 1 + 1/2 + . . . + 1/tfd,  tfd > 0

Note: ln(n + 1) = ∫1..n+1 (1/x) dx.

Whereas: TFtotal(t, d) = 1 + 1 + . . . + 1

The first occurrence of a term counts in full, the second counts 1/2, the third counts 1/3, and so forth.
This gives a particular insight into the type of dependence that is reflected by “bending” the total TF into a saturating curve.
TFsum, TFmax and TFfrac: Graphical Illustration
[Plots: left — tfsum (NL(d) = 200, 2000) and tfmax (nL(tmax, d) = 400, 500) as functions of nL(t, d); right — tffrac for K = 1, 5, 10, 20, 100, 200, 1000]
TFsum, TFmax and TFfrac: Analysis
Document length normalisation (TFfrac: K may depend on the document length)
Usually TFmax yields higher TF-values than TFsum
Linear TF variants are not really important anymore, since TFfrac (TFBM25) delivers better and more stable quality
TFfrac yields relatively high TF-values already for small frequencies, and the curve saturates for large frequencies
The good and stable performance of BM25 indicates that this non-linear nature is key for achieving good retrieval quality.
Fractional TF: Dependence Assumption
What we refer to as “fractional TF” is a ratio:

ratio(x, y) = x/(x + y);  tfd/(tfd + Kd) = TFfrac,K(t, d)

The ratio is related to the harmonic sum of squares:

n/(n + 1) ≈ 1 + 1/2² + . . . + 1/n²,  n > 0

This approximation is based on the following integral:

∫1..n+1 (1/z²) dz = [−1/z]1..n+1 = 1 − 1/(n + 1) = n/(n + 1)

TFfrac assumes more dependence than TFlog: the k-th occurrence of a term has an impact of 1/k².
TFBM25
TFBM25 is a special TFfrac used in the BM25 model.
K is proportional to the pivoted document length (pivdl(d, c) = dl/avgdl(c)) and involves adjustment parameters (k1, b).
The common definition is:

KBM25,k1,b(d, c) := k1 · (b · pivdl(d, c) + (1 − b))

For b = 1, K is equal to k1 for average documents, less than k1 for short documents and greater than k1 for long documents.
Large b and k1 lead to a strong variation of K with a high impact on the retrieval score.
Documents shorter than the average have an advantage over documents longer than the average.
Inverse Document Frequency IDF
The IDF (inverse document frequency) is the negative logarithm of the DF (document frequency).
Idea: the fewer documents a term appears in, the more discriminative or ‘informative’ it is.
DF Variants
DF(t, c) is a quantification of the document frequency, df(t, c). The main variants are:

df(t, c) := dftotal(t, c) := nD(t, c)

dfsum(t, c) := nD(t, c)/ND(c) = PD(t|c)  (= df(t, c)/ND(c))

dfsum,smooth(t, c) := (nD(t, c) + 0.5)/(ND(c) + 1)

dfBIR(t, c) := nD(t, c)/(ND(c) − nD(t, c))

dfBIR,smooth(t, c) := (nD(t, c) + 0.5)/(ND(c) − nD(t, c) + 0.5)
IDF Variants
IDF(t, c) is the negative logarithm of a DF quantification. The main variants are:

idftotal(t, c) := − log dftotal(t, c)
idf(t, c) := idfsum(t, c) := − log dfsum(t, c) = − log PD(t|c)
idfsum,smooth(t, c) := − log dfsum,smooth(t, c)
idfBIR(t, c) := − log dfBIR(t, c)
idfBIR,smooth(t, c) := − log dfBIR,smooth(t, c)
IDF is high for rare terms and low for frequent terms.
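A sketch of the DF/IDF variants in Python (the example numbers at the end are made up for illustration):

```python
import math

def df_sum(n_D_t, N_D):
    return n_D_t / N_D                                   # df_sum = P_D(t|c)

def df_sum_smooth(n_D_t, N_D):
    return (n_D_t + 0.5) / (N_D + 1)                     # df_sum,smooth

def df_bir(n_D_t, N_D):
    return n_D_t / (N_D - n_D_t)                         # df_BIR

def df_bir_smooth(n_D_t, N_D):
    return (n_D_t + 0.5) / (N_D - n_D_t + 0.5)           # df_BIR,smooth

def idf(df_value):
    return -math.log(df_value)                           # IDF = -log DF

# e.g. a term occurring in 10 of 1000 documents:
print(idf(df_sum(10, 1000)))         # ≈ 4.61
print(idf(df_bir_smooth(10, 1000)))  # ≈ 4.55
```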
Burstiness
A term is bursty if it occurs often in the documents in which it occurs.
Burstiness is measured by the average term frequency in the elite set:

avgtf(t, c) = nL(t, c)/nD(t, c)

Intuition (relevant vs. non-relevant documents):
A good term is rare (not frequent, high IDF) and solitude (not bursty, low avgtf) in all documents (all non-relevant documents).
Among relevant documents, a good term is frequent (low IDF, appears in many relevant documents) and bursty (high avgtf).
IDF and Burstiness
[Plots: idf(t, c) over PD(t|c) for λ(t, c) = 1, 2, 4; and avgtf(t, c) over P(t|c), partitioning terms into rare/frequent and solitude/bursty regions; BIR]
Example: nD(t1, c) = nD(t2, c) = 1,000. Same IDF.
nL(t1, d) = 1 for 1,000 documents, whereas nL(t2, d) = 1 for 999 documents, and nL(t2, d) = 1,001 for one document.
nL(t1, c) = 1,000 and nL(t2, c) = 2,000.
avgtf(t1, c) = 1 and avgtf(t2, c) = 2.
λ(t, c) = avgtf(t, c) · PD(t|c)
TF-IDF Term Weight
Definition (TF-IDF term weight wTF-IDF)
wTF-IDF(t, d , q, c) := TF(t, d) · TF(t, q) · IDF(t, c)
TF-IDF RSV
Definition (TF-IDF retrieval status value RSVTF-IDF)
RSVTF-IDF(d, q, c) := Σt wTF-IDF(t, d, q, c)

RSVTF-IDF(d, q, c) = Σt TF(t, d) · TF(t, q) · IDF(t, c)
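A minimal TF-IDF scoring sketch, assuming TFlog for the document, raw counts for the query, and idfsum for the IDF (other variant combinations from the previous slides are equally valid):

```python
import math
from collections import Counter

def rsv_tf_idf(doc_terms, query_terms, doc_freq, N_D):
    """RSV_TF-IDF(d, q, c) = sum_t TF(t, d) * TF(t, q) * IDF(t, c).

    doc_terms / query_terms: lists of terms; doc_freq: n_D(t, c);
    N_D: N_D(c), the number of documents in the collection."""
    tf_d, tf_q = Counter(doc_terms), Counter(query_terms)
    score = 0.0
    for t in tf_d.keys() & tf_q.keys():          # terms in both d and q
        idf = -math.log(doc_freq[t] / N_D)       # idf_sum(t, c)
        score += math.log(1 + tf_d[t]) * tf_q[t] * idf
    return score
```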
Probabilistic IDF: Probability of Being Informative
IDF so far is not probabilistic
In probabilistic scenarios, a normalised IDF value such as

0 ≤ idf(t, c)/maxidf(c) ≤ 1

can be useful.

maxidf(c) := − log(1/ND(c)) = log(ND(c)) is the maximal value of idf(t, c) (reached when a term occurs in only 1 document).

[Plot: maxidf(c) as a function of ND(c)]
The normalisation does not affect the ranking
Probability that term t is informative
A probabilistic semantics of a max-normalised IDF can be achieved by introducing an informativeness-based probability [Roelleke, 2003], as opposed to the normal notion of occurrence-based probability. We denote this probability as P(t informs|c), to contrast it with the usual P(t|c) := P(t occurs|c) = nD(t, c)/ND(c).
PRF: The Probability of Relevance Framework
Relevance is at the core of any information retrieval model
With r denoting relevance, d a document and q a query, the probability that a document is relevant is

P(r|d, q)
Probability Ranking Principle
[Robertson, 1977], “The Probability Ranking Principle (PRP) in IR”, describes the PRP as a framework to discuss formally “what is a good ranking?”. [Robertson, 1977] quotes Cooper’s formal statement of the PRP:

“If a reference retrieval system’s response to each request is a ranking of the documents in the collections in order of decreasing probability of usefulness to the user who submitted the request, ..., then the overall effectiveness of the system ... will be the best that is obtainable on the basis of that data.”

Formally, we can capture the principle as follows. Let A and B be rankings. Then, a ranking A is better than a ranking B if at every rank, the probability of satisfaction in A is higher than for B, i.e.:

∀rank: P(satisfactory|rank, A) > P(satisfactory|rank, B)
PRF: Illustration
Example (Probability of relevance)
Let users u1, . . . , u4 have judged document-query pairs: all four judged (d1, q1) and (d2, q1); u1 and u2 additionally judged (d3, q1). Three of the four judgements for d1 are r, one of the four for d2 is r, and neither of the two for d3 is r.

PU(r|d1, q1) = 3/4, and PU(r|d2, q1) = 1/4, and PU(r|d3, q1) = 0/2.
Subscript in PU: “event space”, a set of users.
Total probability:

PU(r|d, q) = Σu∈U P(r|d, q, u) · P(u)
Bayes Theorem
Relevance judgements can be incomplete
In any case, for a new query we often do not have judgements
The probability of relevance is estimated via Bayes’ Theorem:

P(r|d, q) = P(d, q|r) · P(r) / P(d, q) = P(d, q, r) / P(d, q)

The decision whether or not to retrieve a document is based on the so-called “Bayesian decision rule”: retrieve document d if the probability of relevance is greater than the probability of non-relevance:

retrieve d for q if P(r|d, q) > P(r̄|d, q)
Probabilistic Odds, Rank Equivalence
We can express the probabilistic odds of relevance:

O(r|d, q) = P(r|d, q)/P(r̄|d, q) = P(d, q, r)/P(d, q, r̄) = ( P(d, q|r)/P(d, q|r̄) ) · ( P(r)/P(r̄) )

Since P(r)/P(r̄) is a constant, the following rank equivalence holds:

O(r|d, q) =rank P(d, q|r)/P(d, q|r̄)

Often we don’t need the exact probability, but an easier-to-compute value that is rank equivalent (i.e. it preserves the ranking).
Probabilistic Odds
The document-query pair probabilities can be decomposed in two ways:

P(d, q|r)/P(d, q|r̄) = ( P(d|q, r) · P(q|r) ) / ( P(d|q, r̄) · P(q|r̄) )  (⇒ BIR/Poisson/BM25)
= ( P(q|d, r) · P(d|r) ) / ( P(q|d, r̄) · P(d|r̄) )  (⇒ LM?)

The equation where d depends on q is the basis of BIR, Poisson and BM25: the document likelihood P(d|q, r).
The equation where q depends on d has been related to LM, P(q|d), [Lafferty and Zhai, 2003], “Probabilistic Relevance Models Based on Document and Query Generation”.
This relationship, and the assumptions required to establish it, are controversial [Luk, 2008].
Documents as Feature Vectors
The next step represents document d as a vector ~d = (f1, . . . , fn) in a space of features fi:

P(d|q, r) = P(~d|q, r)

A feature could be, for example, the frequency of a word (term), the document length, document creation time, time of last update, document owner, number of in-links, or number of out-links.
See also the Vector Space Model in the Introduction – here we used term weights as features.
Assumptions based on features:
Feature independence assumption
Non-query term assumption
Term frequency split
Feature Independence Assumption
The features (terms) are independent events:

P(~d|q, r) ≈ ∏i P(fi|q, r)

A weaker assumption for the fraction of feature probabilities (linked dependence):

P(d|q, r)/P(d|q, r̄) ≈ ∏i P(fi|q, r)/P(fi|q, r̄)

Here it is not required to distinguish between these two assumptions.
P(fi|q, r) and P(fi|q, r̄) may be estimated, for instance, by means of relevance judgements.
Non-Query Term Assumption
Non-query terms can be ignored for retrieval (ranking).
This reduces the number of features/terms/dimensions to consider when computing probabilities.
For non-query terms, the feature probability is assumed to be the same in relevant documents and non-relevant documents:

for all non-query terms: P(fi|q, r)/P(fi|q, r̄) = 1

Then, the product over all features reduces to a product over the query terms.
Term Frequency Split
The product over query terms is split into two parts:
The first part captures the fi > 0 features, i.e. the document terms.
The second part captures the fi = 0 features, i.e. the non-document terms.
BIR: Binary Independence Retrieval
The BIR instantiation of the PRF assumes the vector components to be binary term features, i.e. ~d = (x1, x2, . . . , xn), where xi ∈ {0, 1}.
Term occurrences are represented in a binary feature vector ~d in the term space.
The event xt = 1 is expressed as t, and xt = 0 as t̄.
BIR Term Weight
We save the derivation ... after a few steps from O(r|d, q) ...

Event Space: d and q are binary vectors.

Definition (BIR term weight wBIR)

wBIR(t, r, r̄) := log( ( PD(t|r)/PD(t|r̄) ) · ( PD(t̄|r̄)/PD(t̄|r) ) )

Simplified form (referred to as F1), considering term presence only:

wBIR,F1(t, r, c) := log( PD(t|r)/PD(t|c) )
BIR RSV
Definition (BIR retrieval status value RSVBIR)

RSVBIR(d, q, r, r̄) := Σt∈d∩q wBIR(t, r, r̄)

RSVBIR(d, q, r, r̄) = Σt∈d∩q log( ( P(t|r) · P(t̄|r̄) ) / ( P(t|r̄) · P(t̄|r) ) )
BIR Estimations
Relevance judgements for 20 documents (example from [Fuhr, 1992]):

di  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
x1  1 1 1 1 1 1 1 1 1 1  0  0  0  0  0  0  0  0  0  0
x2  1 1 1 1 1 0 0 0 0 0  1  1  1  1  1  0  0  0  0  0
R   r or r̄ per document, with the counts below

ND(r) = 12; ND(r̄) = 8; P(t1|r) = P(x1 = 1|r) = 8/12 = 2/3;
P(t1|r̄) = 2/8 = 1/4; P(t̄1|r) = 4/12 = 1/3; P(t̄1|r̄) = 6/8 = 3/4.
Ditto for t2.
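A sketch recomputing the estimates for t1 from these statistics, and the resulting BIR term weight (variable names are ours):

```python
import math

# BIR probabilities from the judgement statistics above.
N_r, N_rbar = 12, 8            # relevant / non-relevant documents
n_t1_r, n_t1_rbar = 8, 2       # documents containing t1 in each set

P_t_r        = n_t1_r / N_r          # P(t1|r)   = 2/3
P_t_rbar     = n_t1_rbar / N_rbar    # P(t1|r̄)  = 1/4
P_not_t_r    = 1 - P_t_r             # P(t̄1|r)  = 1/3
P_not_t_rbar = 1 - P_t_rbar          # P(t̄1|r̄) = 3/4

w_bir = math.log((P_t_r / P_t_rbar) * (P_not_t_rbar / P_not_t_r))
print(w_bir)   # log(6) ≈ 1.79
```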
Missing Relevance Information I
Estimation for non-relevant documents:

Take the collection-wide term probability as an approximation (r̄ ≡ c):

P(t|r̄) ≈ P(t|c)

Use the set of all documents minus the set of relevant documents (r̄ = c \ r):

P(t|r̄) ≈ ( nD(t, c) − nD(t, r) ) / ( ND(c) − ND(r) )
Missing Relevance Information II
“Empty” set problem: for ND(r) = 0 (no relevant documents), P(t|r) is not defined. Smoothing deals with this situation.

Add the query to the set of relevant documents and the collection:

P(t|r) = (nD(t, r) + 1)/(ND(r) + 1)

Other forms of smoothing, e.g.:

P(t|r) = (nD(t, r) + 0.5)/(ND(r) + 1)

This variant may be justified by Laplace’s law of succession.
RSJ Term Weight
Recall wBIR,F1(t, r, c) := log( PD(t|r)/PD(t|c) ).

Definition (RSJ term weight wRSJ)

The RSJ term weight is a smooth BIR term weight. The probability estimation is as follows:
P(t|r) := (nD(t, r) + 0.5)/(ND(r) + 1);
P(t|r̄) := ((nD(t, c) + 1) − (nD(t, r) + 0.5)) / ((ND(c) + 2) − (ND(r) + 1)).

wRSJ,F4(t, r, r̄, c) :=
log( ((nD(t, r) + 0.5)/(ND(r) − nD(t, r) + 0.5)) / ((nD(t, c) − nD(t, r) + 0.5)/((ND(c) − ND(r)) − (nD(t, c) − nD(t, r)) + 0.5)) )
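A sketch of the F4 weight in Python; the final line illustrates the missing-relevance case (nD(t, r) = ND(r) = 0), where the weight reduces to an IDF-like value:

```python
import math

def w_rsj_f4(n_t_r, N_r, n_t_c, N_c):
    """Smoothed RSJ term weight (F4); arguments follow the notation above:
    n_t_r = n_D(t, r), N_r = N_D(r), n_t_c = n_D(t, c), N_c = N_D(c)."""
    rel = (n_t_r + 0.5) / (N_r - n_t_r + 0.5)
    non = (n_t_c - n_t_r + 0.5) / ((N_c - N_r) - (n_t_c - n_t_r) + 0.5)
    return math.log(rel / non)

# Without relevance information, the weight behaves like a smoothed IDF:
print(w_rsj_f4(0, 0, 10, 1000))   # ≈ 4.55
```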
Poisson and 2-Poisson
Poisson Model I
A less known model; however, there are good reasons to look at it:

“Demystify” the Poisson probability – some research students resign when hearing “Poisson” ;-)
Next to the BIR model, it is the natural instantiation of a PRF model; the BIR model is a special case of the Poisson model
The 2-Poisson probability is arguably the foundation of the BM25-TF quantification [Robertson and Walker, 1994], “Some Simple Effective Approximations to the 2-Poisson Model”
Poisson Model II
The Poisson probability is a model of randomness. Divergence from randomness (DFR) is based on the probability P(t ∈ d|collection) = P(tfd > 0|collection); the probability P(tfd|collection) can be estimated by a Poisson probability.

The Poisson parameter λ(t, c) = nL(t, c)/ND(c), i.e. the average number of term occurrences, relates Document-based and Location-based probabilities:

avgtf(t, c) · PD(t|c) = λ(t, c) = avgdl(c) · PL(t|c)

We refer to this relationship as the Poisson Bridge, since the average term frequency is the parameter of the Poisson probability.
Poisson Model III
The Poisson model yields a foundation of TF-IDF (see Part II)
The Poisson bridge helps to relate TF-IDF and LM (see Part II)
Poisson Distribution
Let t be an event that occurs on average λt times. The Poisson probability of k occurrences is:

PPoisson,λt(k) := (λt^k / k!) · e^(−λt)
Poisson Example
The probability that k = 4 sunny days occur in a week, given the average λ = p · n = 180/360 · 7 = 3.5 sunny days per week, is:

PPoisson,λ=3.5(k = 4) = (3.5⁴/4!) · e^(−3.5) ≈ 0.1888

[Plot: Poisson distribution P3.5(k) for k = 0, . . . , 15]
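A one-line check of this example in Python:

```python
import math

def poisson(k, lam):
    """P_Poisson,λ(k) = λ^k / k! · e^(−λ)"""
    return lam ** k / math.factorial(k) * math.exp(-lam)

print(poisson(4, 3.5))   # ≈ 0.1888, as above
```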
Poisson PRF: Basic Idea
The Poisson model could be used to estimate the document probability P(d|q, r) as the product of the probabilities of the within-document term frequencies kt:

P(d|q, r) = P(~d|q, r) = ∏t P(kt|q, r),  kt := tfd := nL(t, d)

~d is a vector of term frequencies (cf. BIR: binary vector of occurrence/non-occurrence).
Poisson PRF: Odds
Rank equivalence, the non-query term assumption and probabilistic odds lead us to:

O(r|d, q) =rank ∏t∈q P(kt|r)/P(kt|r̄),  kt := tfd := nL(t, d)

Splitting the product into document and non-document terms yields:

O(r|d, q) =rank ∏t∈d∩q P(kt|r)/P(kt|r̄) · ∏t∈q\d P(0|r)/P(0|r̄)
Poisson PRF: Meaning of Poisson Probability
For the set r (or r̄) and a document d of length dl, we look at:

How many times would we expect to see a term t in d?
How many times do we actually observe t in d (= kt)?

P(kt|r) is highest if our expectation is met by our observation.

λ(t, d, r) = dl · PL(t|r) is the number of expected occurrences in a document of length dl (how many times will we draw t if we had dl trials?)

P(kt|r) = PPoisson,λ(t,d,r)(kt): the probability to observe kt occurrences of term t in dl trials
Poisson PRF: Example
dl = 5, PL(t|r) = 5/10 = 1/2
λ(t, d, r) = 5 · 1/2 = 2.5
P(2|r) = PPoisson,2.5(2) ≈ 0.2565

[Plot: Poisson distribution P2.5(k) for k = 0, . . . , 10]
Poisson Term Weight
Event Space: d and q are frequency vectors.
Definition (Poisson term weight wPoisson)

wPoisson(t, d, q, r, r̄) := TF(t, d) · log( λ(t, d, r)/λ(t, d, r̄) )

wPoisson(t, d, q, r, r̄) = TF(t, d) · log( PL(t|r)/PL(t|r̄) )

λ(t, d, r) = dl · PL(t|r)
TF(t, d) = kt (smarter TF quantification possible)
Poisson RSV
Definition (Poisson retrieval status value RSVPoisson)

RSVPoisson(d, q, r, r̄) := Σt∈d∩q wPoisson(t, d, q, r, r̄) + len_normPoisson

RSVPoisson(d, q, r, r̄) = Σt∈d∩q TF(t, d) · log( PL(t|r)/PL(t|r̄) ) + dl · Σt ( PL(t|r̄) − PL(t|r) )
2-Poisson
Viewed as a motivation for the BM25-TF quantification [Robertson and Walker, 1994], “Some Simple Effective Approximations to the 2-Poisson Model”. In an exchange with Stephen Robertson, he explained:

“The investigation into the 2-Poisson probability motivated the BM25-TF quantification. Regarding the combination of TF and RSJ weight in BM25, TF can be viewed as a factor to reflect the uncertainty about whether the RSJ weight wRSJ is correct; for terms with a relatively high within-document TF, the weight is correct; for terms with a relatively low within-document TF, there is uncertainty about the correctness. In other words, the TF factor can be viewed as a weight to adjust the impact of the RSJ weight.”
2-Poisson Example: How many cars to expect? I
How many cars are expected on a given commuter car park?

Approach 1: On average, there are 700 cars per week. The daily average is λ = 700/7 = 100 cars/day.
Then, Pλ=100(k) is the probability that there are k cars wanting to park on a given day.
This estimation is less accurate than an estimation based on a 2-dimensional model – Mon–Fri are the busy days, and on weekends the car park is nearly empty.
This means that a distribution such as (130, 130, 130, 130, 130, 25, 25) is more likely than 100 each day.
2-Poisson Example: How many cars to expect? II
Approach 2: In a more detailed analysis, we observe 650 cars Mon–Fri (work days) and 50 cars Sat–Sun (weekend days).
The averages are λ1 = 650/5 = 130 cars/work-day and λ2 = 50/2 = 25 cars/weekend-day.
Then, Pµ1=5/7,λ1=130,µ2=2/7,λ2=25(k) is the 2-dimensional Poisson probability that there are k cars looking for a car park.
2-Poisson Example: How many cars to expect? III
The main idea of the 2-Poisson probability is to combine (interpolate, mix) two Poisson probabilities:

P2-Poisson,λ1,λ2,µ(kt) := µ · (λ1^kt / kt!) · e^(−λ1) + (1 − µ) · (λ2^kt / kt!) · e^(−λ2)

Such a mixture model is also used with LM (later).
λ1 could be over all documents whereas λ2 could be over an elite set (e.g. documents that contain at least one query term).
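A sketch of the mixture, reusing poisson() from the sketch above (the printed value is the mixture probability of observing k = 100 cars in the car-park example):

```python
def two_poisson(k, lam1, lam2, mu):
    """Mixture of two Poisson distributions with mixing weight mu."""
    return mu * poisson(k, lam1) + (1 - mu) * poisson(k, lam2)

# Car-park example: 130 cars/work-day (5/7 of days), 25 cars/weekend-day.
print(two_poisson(100, 130, 25, 5 / 7))
```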
BM25
One of the most prominent IR models. The ingredients have already been prepared:

TFBM25,K(t, d)
wRSJ,F4(t, r, r̄, c)

Wikipedia 2014 formulation: given a query Q containing keywords q1, . . . , qn:

score(D, Q) = Σi=1..n IDF(qi) · f(qi, D) · (k1 + 1) / ( f(qi, D) + k1 · (1 − b + b · |D|/avgdl) )

with f(qi, D) = nL(qi, D) (qi’s term frequency). Here:

Ignore (k1 + 1) (ranking invariant!)
Use wRSJ,F4(t, r, r̄, c). If relevance information is missing: wRSJ,F4(t, r, r̄, c) ≈ IDF(t)
BM25 Term Weight
Definition (BM25 term weight wBM25)

wBM25,k1,b,k3(t, d, q, r, r̄) := TFBM25,k1,b(t, d) · TFBM25,k3(t, q) · wRSJ(t, r, r̄)

TFBM25,k1,b(t, d) := tfd / ( tfd + k1 · (b · pivdl(d) + (1 − b)) )
BM25 RSV
Definition (BM25 retrieval status value RSVBM25)

RSVBM25,k1,b,k2,k3(d, q, r, r̄, c) := Σt∈d∩q wBM25,k1,b,k3(t, d, q, r, r̄) + len_normBM25,k2

Additional length normalisation:

len_normBM25,k2(d, q, c) := k2 · ql · ( avgdl(c) − dl ) / ( avgdl(c) + dl )

(ql is the query length). Suppresses long documents (negative length norm).
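A sketch of BM25 scoring without relevance information, so wRSJ,F4 ≈ IDF (smoothed); raw query term frequencies stand in for TFBM25,k3, and the k2 length norm is omitted. The k1 = 1.2, b = 0.75 defaults are common settings, not prescribed by the slides:

```python
import math
from collections import Counter

def rsv_bm25(doc_terms, query_terms, doc_freq, N_D, avgdl, k1=1.2, b=0.75):
    """BM25 score; doc_freq = n_D(t, c), N_D = N_D(c).
    The (k1+1) factor is omitted (ranking invariant)."""
    tf_d = Counter(doc_terms)
    dl = len(doc_terms)
    K = k1 * (b * (dl / avgdl) + (1 - b))        # K from the pivoted length
    score = 0.0
    for t, tf_q in Counter(query_terms).items():
        if tf_d[t] == 0:
            continue                             # only terms in d ∩ q
        idf = math.log((N_D - doc_freq[t] + 0.5) / (doc_freq[t] + 0.5))
        score += (tf_d[t] / (tf_d[t] + K)) * tf_q * idf
    return score
```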
BM25: Summary
We have discussed the foundations of BM25, an instance of the probability of relevance framework (PRF):

TF quantification TFBM25 using TFfrac and the pivoted document length
TFBM25 can be related to probability theory through semi-subsumed event occurrences (out of scope of this talk)
RSJ term weight wRSJ as a smooth variant of the BIR term weight (the “0.5-smoothing” can be explained through Laplace’s law of succession)
Document- and Query-Likelihood Models
We are now leaving the world of document-likelihood models and moving towards LM, the query-likelihood model.

P(d, q|r)/P(d, q|r̄) = ( P(d|q, r) · P(q|r) ) / ( P(d|q, r̄) · P(q|r̄) )  (⇒ BIR/Poisson/BM25)
= ( P(q|d, r) · P(d|r) ) / ( P(q|d, r̄) · P(d|r̄) )  (⇒ LM?)
LM: Language Modelling
Language Modelling (LM)
Popular retrieval model since the late 90s [Ponte and Croft, 1998, Hiemstra, 2000, Croft and Lafferty, 2003]
Compute the probability P(q|d) that a document generates a query
q is a conjunction of term events: P(q|d) ∝ ∏ P(t|d)
Zero-probability problem: terms t that don’t appear in d lead to P(t|d) = 0 and hence P(q|d) = 0
Mix the within-document term probability P(t|d) and the collection-wide term probability P(t|c)
Probability Mixture
Let three events x, y, z and two conditional probabilities P(z|x) and P(z|y) be given. Then, P(z|x, y) can be estimated as a linear combination/mixture of P(z|x) and P(z|y):

P(z|x, y) ≈ δx · P(z|x) + (1 − δx) · P(z|y)

Here, 0 < δx < 1 is the mixture parameter. The mixture parameters can be constant (Jelinek-Mercer mixture), or can be set proportional to the total probabilities.
Probability Mixture Example
Let P(sunny, warm, rainy, dry, windy|glasgow) describe the probability that a day in Glasgow is sunny, the next day is warm, the next rainy, and so forth.
If for one event (e.g. sunny) the probability were zero, then the probability of the conjunction (product) would be zero. A mixture solves the problem.
For example, mix P(x|glasgow) with P(x|uk), where P(x|uk) > 0 for each event x.
Then, in a week in winter, when P(sunny|glasgow) = 0, and for the whole of the UK the weather office reports 2 of 7 days as sunny, the mixed probability is:

P(sunny|glasgow, uk) = δ · 0/7 + (1 − δ) · 2/7
LM1 Term Weight
Event Space: d and q are sequences of terms.
Definition (LM1 term weight wLM1)

P(t|d, c) := δd · P(t|d) + (1 − δd) · P(t|c)

wLM1,δd(t, d, q, c) := TF(t, q) · log( δd · P(t|d) + (1 − δd) · P(t|c) )  (the argument of the log is P(t|d, c))

P(t|d): foreground probability
P(t|c): background probability
![Page 99: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/99.jpg)
IR Models
Foundations of IR Models
LM: Language Modelling
Language Modelling Independence Assumption
We assume terms are independent, meaning the product over theterm probabilities P(t|d) is equal to P(q|d):
Probability that mixture of background and foregroundprobabilities generates query as sequence of terms:
t IN q: sequence of terms (e.g.q = (sailing, boat, sailing))
t ∈ q: set of terms; TF(sailing, q) = 2
Note: log(P(t|d , c)TF(t,q)
)= TF(t, q) · log (P(t|d , c))
LM1 RSV
Definition (LM1 retrieval status value RSVLM1)

RSVLM1,δd(d, q, c) := Σt∈q wLM1,δd(t, d, q, c)
JM-LM Term Weight
For constant δ, the score can be divided by ∏t IN q (1 − δ). This leads to the following equation [Hiemstra, 2000]:

P(q|d, c) / ( P(q|c) · ∏t IN q (1 − δ) ) = ∏t∈d∩q ( 1 + (δ/(1 − δ)) · P(t|d)/P(t|c) )^TF(t,q)

Definition (JM-LM (Jelinek-Mercer) term weight wJM-LM)

wJM-LM,δ(t, d, q, c) := TF(t, q) · log( 1 + (δ/(1 − δ)) · P(t|d)/P(t|c) )
JM-LM RSV
Definition (JM-LM retrieval status value RSVJM-LM)

RSVJM-LM,δ(d, q, c) := Σt∈d∩q wJM-LM,δ(t, d, q, c)

RSVJM-LM,δ(d, q, c) = Σt∈d∩q TF(t, q) · log( 1 + (δ/(1 − δ)) · P(t|d)/P(t|c) )

We only need to look at terms that appear in both document and query.
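A sketch of RSVJM-LM (δ = 0.2 is an illustrative choice; coll_tf and coll_len provide the background statistics nL(t, c) and NL(c)):

```python
import math
from collections import Counter

def rsv_jm_lm(doc_terms, query_terms, coll_tf, coll_len, delta=0.2):
    """RSV_JM-LM: sum over terms in both d and q of
    TF(t, q) * log(1 + delta/(1-delta) * P(t|d)/P(t|c))."""
    tf_d = Counter(doc_terms)
    dl = len(doc_terms)
    score = 0.0
    for t, tf_q in Counter(query_terms).items():
        if tf_d[t] == 0:
            continue                      # only terms in both d and q matter
        p_t_d = tf_d[t] / dl              # foreground P(t|d)
        p_t_c = coll_tf[t] / coll_len     # background P(t|c)
        score += tf_q * math.log(1 + delta / (1 - delta) * p_t_d / p_t_c)
    return score
```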
Dirichlet-LM Term Weight
Document-dependent mixture parameter: δd = dl/(dl + µ)

Definition (Dirich-LM term weight wDirich-LM)

wDirich-LM,µ(t, d, q, c) := TF(t, q) · log( µ/(µ + |d|) + (|d|/(|d| + µ)) · P(t|d)/P(t|c) )
Dirich-LM RSV
Definition (Dirich-LM retrieval status value RSVDirich-LM)

RSVDirich-LM,µ(d, q, c) := Σt∈q wDirich-LM,µ(t, d, q, c)

RSVDirich-LM,µ(d, q, c) = Σt∈q TF(t, q) · log( µ/(µ + |d|) + (|d|/(|d| + µ)) · P(t|d)/P(t|c) )
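The same sketch for the Dirichlet variant (µ = 1000 is a commonly used illustrative value; the sum runs over all query terms, as in the definition):

```python
import math
from collections import Counter

def rsv_dirichlet_lm(doc_terms, query_terms, coll_tf, coll_len, mu=1000):
    """RSV_Dirich-LM with document-dependent mixture delta_d = dl/(dl+mu)."""
    tf_d = Counter(doc_terms)
    dl = len(doc_terms)
    score = 0.0
    for t, tf_q in Counter(query_terms).items():
        p_t_d = tf_d[t] / dl              # foreground P(t|d), may be 0
        p_t_c = coll_tf[t] / coll_len     # background P(t|c)
        score += tf_q * math.log(mu / (mu + dl) + dl / (dl + mu) * p_t_d / p_t_c)
    return score
```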
PIN’s: Probabilistic Inference Networks
Random variables and conditional dependencies as a directed acyclic graph (DAG)

Minterm: conjunction of term events (e.g. t_1 ∧ t_2)

Decomposition over the (disjoint) minterms X leads to the computation

P(q|d) = ∑_{x∈X} P(q|x) · P(x|d)
106 / 133
![Page 107: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/107.jpg)
IR Models
Foundations of IR Models
PIN’s: Probabilistic Inference Networks
Link Matrix
Probability flow in a PIN can be described via a link or transition matrix

Link matrix L contains the transition probabilities P(target|x, source)

Usually, P(target|x, source) = P(target|x) is assumed; this assumption is referred to as the linked independence assumption.
107 / 133
![Page 108: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/108.jpg)
IR Models
Foundations of IR Models
PIN’s: Probabilistic Inference Networks
PIN Link Matrix
L := [ P(q|x_1)  …  P(q|x_n)
       P(¬q|x_1) …  P(¬q|x_n) ]

( P(q|d)  )       ( P(x_1|d) )
( P(¬q|d) ) = L · (    ⋮     )
                  ( P(x_n|d) )
108 / 133
![Page 109: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/109.jpg)
IR Models
Foundations of IR Models
PIN’s: Probabilistic Inference Networks
Link Matrix for 3 Terms
To illustrate the link matrix, the matrix for three terms is shown next (¬t denotes the event that term t is absent):

L^T = ( P(q|t1,t2,t3)     P(¬q|t1,t2,t3)
       P(q|t1,t2,¬t3)    P(¬q|t1,t2,¬t3)
       P(q|t1,¬t2,t3)    P(¬q|t1,¬t2,t3)
       P(q|t1,¬t2,¬t3)   P(¬q|t1,¬t2,¬t3)
       P(q|¬t1,t2,t3)    P(¬q|¬t1,t2,t3)
       P(q|¬t1,t2,¬t3)   P(¬q|¬t1,t2,¬t3)
       P(q|¬t1,¬t2,t3)   P(¬q|¬t1,¬t2,t3)
       P(q|¬t1,¬t2,¬t3)  P(¬q|¬t1,¬t2,¬t3) )
109 / 133
![Page 110: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/110.jpg)
IR Models
Foundations of IR Models
PIN’s: Probabilistic Inference Networks
Special Link Matrices
The matrices L_or and L_and reflect the Boolean combination of the linkage between a source and a target.

L_or  = [ 1 1 1 1 1 1 1 0
          0 0 0 0 0 0 0 1 ]

L_and = [ 1 0 0 0 0 0 0 0
          0 1 1 1 1 1 1 1 ]
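A small sketch (assumed P(t_i|d) values, term independence) that builds the minterm probabilities P(x|d) in the ordering used above and checks that the L_or and L_and rows reproduce the usual noisy-OR and conjunction probabilities:

```python
from itertools import product
from math import prod

p_t_d = [0.9, 0.5, 0.2]   # assumed P(t_i|d) for three independent terms

# Minterms x in the slides' ordering: (t1,t2,t3), (t1,t2,¬t3), ..., (¬t1,¬t2,¬t3)
minterms = list(product([True, False], repeat=3))
p_x_d = [prod(p if b else 1 - p for p, b in zip(p_t_d, bits)) for bits in minterms]

l_or = [1, 1, 1, 1, 1, 1, 1, 0]    # first row of L_or: q is true unless all terms are false
l_and = [1, 0, 0, 0, 0, 0, 0, 0]   # first row of L_and: q is true only if all terms are true

p_q_or = sum(l * p for l, p in zip(l_or, p_x_d))
p_q_and = sum(l * p for l, p in zip(l_and, p_x_d))
assert abs(p_q_or - (1 - prod(1 - p for p in p_t_d))) < 1e-12   # noisy OR
assert abs(p_q_and - prod(p_t_d)) < 1e-12                        # conjunction
```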
110 / 133
![Page 111: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/111.jpg)
IR Models
Foundations of IR Models
PIN’s: Probabilistic Inference Networks
y = A x
( P(q|d)  )       ( P(t1,t2,t3|d)    )
( P(¬q|d) ) = L · ( P(t1,t2,¬t3|d)   )
                  ( P(t1,¬t2,t3|d)   )
                  ( P(t1,¬t2,¬t3|d)  )
                  ( P(¬t1,t2,t3|d)   )
                  ( P(¬t1,t2,¬t3|d)  )
                  ( P(¬t1,¬t2,t3|d)  )
                  ( P(¬t1,¬t2,¬t3|d) )
111 / 133
![Page 112: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/112.jpg)
IR Models
Foundations of IR Models
PIN’s: Probabilistic Inference Networks
Turtle/Croft Link Matrix
Let w_t := P(q|t) be the query term probabilities, and let w_0 := w_1 + w_2 + w_3 be their sum. Then the link matrix elements P(q|x), where x is a Boolean combination of terms, are estimated as follows.

L_Turtle/Croft =
[ 1   (w1+w2)/w0   (w1+w3)/w0   w1/w0        (w2+w3)/w0   w2/w0        w3/w0        0
  0   w3/w0        w2/w0        (w2+w3)/w0   w1/w0        (w1+w3)/w0   (w1+w2)/w0   1 ]
[Turtle and Croft, 1992, Croft and Turtle, 1992]
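A sketch of how the first row of this matrix can be generated from assumed weights w_i; by linearity, P(q|d) = ∑_x P(q|x) · P(x|d) then collapses to ∑_i (w_i/w_0) · P(t_i|d), i.e. a weighted-sum retrieval function:

```python
from itertools import product

w = [0.3, 0.5, 0.2]   # assumed query term weights w_i = P(q|t_i)
w0 = sum(w)            # normalisation, so that P(q|t1,t2,t3) = 1

minterms = list(product([True, False], repeat=3))   # same minterm ordering as above
row_q = [sum(wi for wi, b in zip(w, bits) if b) / w0 for bits in minterms]
row_not_q = [1 - v for v in row_q]                  # second row is the complement

# row_q == [1, (w1+w2)/w0, (w1+w3)/w0, w1/w0, (w2+w3)/w0, w2/w0, w3/w0, 0]
print(row_q)
```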
112 / 133
![Page 113: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/113.jpg)
IR Models
Foundations of IR Models
PIN’s: Probabilistic Inference Networks
PIN Term Weight
Definition (PIN-based term weight wPIN)
w_PIN(t, d, q, c) := ( 1 / ∑_{t′} P(q|t′, c) ) · P(q|t, c) · P(t|d, c)
113 / 133
![Page 114: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/114.jpg)
IR Models
Foundations of IR Models
PIN’s: Probabilistic Inference Networks
PIN RSV
Definition (PIN-based retrieval status value RSVPIN)
RSV_PIN(d, q, c) := ∑_t w_PIN(t, d, q, c)

RSV_PIN(d, q, c) = ( 1 / ∑_t P(q|t, c) ) · ∑_{t∈d∩q} P(q|t, c) · P(t|d, c)

P(q|t, c) ∝ pidf(t, c); P(t|d, c) ∝ TF(t, d)
114 / 133
![Page 115: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/115.jpg)
IR Models
Foundations of IR Models
PIN’s: Probabilistic Inference Networks
PIN RSV Computation Example
Let the following term probabilities be given:
t_i       P(t_i|d)   P(q|t_i)
sailing   2/3        1/10,000
boats     1/2        1/1,000

Terms are independent events. P(t|d) is proportional to the within-document term frequency, and P(q|t) is proportional to the IDF. The RSV is:

RSV_PIN(d, q, c) = 1/(1/10,000 + 1/1,000) · (1/10,000 · 2/3 + 1/1,000 · 1/2)
                 = 1/(11/10,000) · (2/30,000 + 1/2,000)
                 = 10,000/11 · (2 + 15)/30,000
                 = 1/11 · (2 + 15)/3 = 17/33 ≈ 0.52
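The computation can be checked with exact rational arithmetic:

```python
from fractions import Fraction as F

p_t_d = {"sailing": F(2, 3), "boats": F(1, 2)}           # P(t|d) from the table
p_q_t = {"sailing": F(1, 10_000), "boats": F(1, 1_000)}  # P(q|t) from the table

norm = sum(p_q_t.values())                                # sum over t of P(q|t,c)
rsv = sum(p_q_t[t] * p_t_d[t] for t in p_t_d) / norm
print(rsv, float(rsv))   # 17/33 ≈ 0.52
```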
115 / 133
![Page 116: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/116.jpg)
Foundations of IR Models
Relevance-based Models
![Page 117: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/117.jpg)
IR Models
Foundations of IR Models
Relevance-based Models
Relevance-based Models
VSM (Rocchio’s formulae)
PRF
[Lavrenko and Croft, 2001], “Relevance-based Language Models”: LM-based approach to estimate P(t|r). “Massive query expansion”.
117 / 133
![Page 118: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/118.jpg)
IR Models
Foundations of IR Models
Relevance-based Models
Relevance Feedback in the VSM
[Rocchio, 1971], “Relevance Feedback in Information Retrieval”, is the must-have reference and background for what a relevance feedback model aims at. There are two formulations that aggregate term weights (R: set of relevant documents, R̄: set of non-relevant documents; weight(t, d) is the weight of term t in the document vector d⃗):

weight(t, q) = weight(t, q) + 1/|R| · ∑_{d∈R} weight(t, d) − 1/|R̄| · ∑_{d∈R̄} weight(t, d)

weight(t, q) = α · weight(t, q) + β · ∑_{d∈R} weight(t, d) − γ · ∑_{d∈R̄} weight(t, d)
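A minimal sketch of the second (α, β, γ) formulation over term-weight dictionaries; the vectors and parameter values are made up for illustration:

```python
from collections import defaultdict

def rocchio(q, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Second Rocchio formulation: q' = alpha*q + beta*sum(rel) - gamma*sum(nonrel)."""
    new_q = defaultdict(float)
    for t, w in q.items():
        new_q[t] += alpha * w
    for d in rel_docs:                  # d ∈ R
        for t, w in d.items():
            new_q[t] += beta * w
    for d in nonrel_docs:               # d ∈ R̄
        for t, w in d.items():
            new_q[t] -= gamma * w
    return dict(new_q)

q = {"sailing": 1.0}
rel = [{"sailing": 0.8, "boats": 0.6}]
nonrel = [{"weather": 0.9}]
print(rocchio(q, rel, nonrel))   # boats enters the query; weather gets a negative weight
```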
118 / 133
![Page 119: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/119.jpg)
Foundations of IR Models
Foundations: Summary
![Page 120: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/120.jpg)
IR Models
Foundations of IR Models
Foundations: Summary
Foundations: Summary
1 TF-IDF: Semantics of TF quantification.
2 PRF: basis of BM25. LM?
3 BIR
4 Poisson
5 BM25: The P(d |q)/P(d) side of IR models.
6 LM: The P(q|d)/P(q) side of IR models.
7 PIN’s: Special link matrix.
8 Relevance-based Models
120 / 133
![Page 121: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/121.jpg)
IR Models
Foundations of IR Models
Foundations: Summary
Model Overview: TF-IDF, BIR, Poisson
RSV_TF-IDF(d, q, c) := ∑_t w_TF-IDF(t, d, q, c)
w_TF-IDF(t, d, q, c) := TF(t, d) · TF(t, q) · IDF(t, c)

RSV_BIR(d, q, r, r̄) := ∑_{t∈d∩q} w_BIR(t, r, r̄)
w_BIR(t, r, r̄) := log( P_D(t|r)/P_D(t|r̄) · P_D(t̄|r̄)/P_D(t̄|r) )

RSV_Poisson(d, q, r, r̄) := ∑_{t∈d∩q} w_Poisson(t, d, q, r, r̄) + len_norm_Poisson
w_Poisson(t, d, q, r, r̄) := TF(t, d) · log( λ(t, d, r)/λ(t, d, r̄) )  ( = TF(t, d) · log( P_L(t|r)/P_L(t|r̄) ) )
121 / 133
![Page 122: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/122.jpg)
IR Models
Foundations of IR Models
Foundations: Summary
Model Overview: BM25, LM, DFR
RSV_BM25,k1,b,k2,k3(d, q, r, r̄, c) := ∑_{t∈d∩q} w_BM25,k1,b,k3(t, d, q, r, r̄) + len_norm_BM25,k2
w_BM25,k1,b,k3(t, d, q, r, r̄) := TF_BM25,k1,b(t, d) · TF_BM25,k3(t, q) · w_RSJ(t, r, r̄)

RSV_JM-LM,δ(d, q, c) := ∑_{t∈d∩q} w_JM-LM,δ(t, d, q, c)
w_JM-LM,δ(t, d, q, c) := TF(t, q) · log( 1 + δ/(1−δ) · P(t|d)/P(t|c) )

RSV_Dirich-LM,µ(d, q, c) := ∑_{t∈q} w_Dirich-LM,µ(t, d, q, c)
w_Dirich-LM,µ(t, d, q, c) := TF(t, q) · log( µ/(µ+|d|) + |d|/(|d|+µ) · P(t|d)/P(t|c) )

RSV_DFR,M(d, q, c) := ∑_{t∈d∩q} w_DFR,M(t, d, c)
w_DFR-1,M(t, d, c) := −log P_M(t ∈ d | c)    w_DFR-2,M(t, d, c) := −log P_M(tf_d | c)
122 / 133
![Page 123: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/123.jpg)
IR Models
Foundations of IR Models
Foundations: Summary
The End (of Part I)
Material from the book “IR Models: Foundations and Relationships”, Morgan & Claypool, 2013. References and textual explanations of the formulae can be found in the book.
123 / 133
![Page 124: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/124.jpg)
IR Models
Foundations of IR Models
Foundations: Summary
Bib I
Amati, G. and van Rijsbergen, C. J. (2002).
Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Transactions on Information Systems (TOIS), 20(4):357–389.

Bookstein, A. (1980).
Fuzzy requests: An approach to weighted Boolean searches. Journal of the American Society for Information Science, 31:240–247.

Brin, S. and Page, L. (1998).
The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30(1-7):107–117.

Bruza, P. and Song, D. (2003).
A comparison of various approaches for using probabilistic dependencies in language modeling. In SIGIR, pages 419–420. ACM.

Church, K. and Gale, W. (1995).
Inverse document frequency (idf): A measure of deviation from Poisson. In Proceedings of the Third Workshop on Very Large Corpora, pages 121–130.
124 / 133
![Page 125: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/125.jpg)
IR Models
Foundations of IR Models
Foundations: Summary
Bib II
Cooper, W. (1991).
Some inconsistencies and misnomers in probabilistic IR. In Bookstein, A., Chiaramella, Y., Salton, G., and Raghavan, V., editors, Proceedings of the Fourteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 57–61, New York.

Cooper, W. S. (1988).
Getting beyond Boole. Information Processing and Management, 24(3):243–248.

Cooper, W. S. (1994).
Triennial ACM SIGIR award presentation and paper: The formalism of probability theory in IR: A foundation for an encumbrance. In [Croft and van Rijsbergen, 1994], pages 242–248.

Croft, B. and Lafferty, J., editors (2003).
Language Modeling for Information Retrieval. Kluwer.

Croft, W. and Harper, D. (1979).
Using probabilistic models of document retrieval without relevance information. Journal of Documentation, 35:285–295.
125 / 133
![Page 126: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/126.jpg)
IR Models
Foundations of IR Models
Foundations: Summary
Bib III
Croft, W. and Turtle, H. (1992).
Retrieval of complex objects. In Pirotte, A., Delobel, C., and Gottlob, G., editors, Advances in Database Technology — EDBT’92, pages 217–229, Berlin et al. Springer.

Croft, W. B. and van Rijsbergen, C. J., editors (1994).
Proceedings of the Seventeenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, London, et al. Springer-Verlag.

Deerwester, S., Dumais, S., Furnas, G., Landauer, T., and Harshman, R. (1990).
Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391–407.

Dumais, S. T., Furnas, G. W., Landauer, T. K., and Deerwester, S. (1988).
Using latent semantic analysis to improve information retrieval. Pages 281–285.

Fang, H. and Zhai, C. (2005).
An exploration of axiomatic approaches to information retrieval. In SIGIR ’05: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 480–487, New York, NY, USA. ACM.
126 / 133
![Page 127: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/127.jpg)
IR Models
Foundations of IR Models
Foundations: Summary
Bib IV
Fuhr, N. (1989).
Models for retrieval with probabilistic indexing. Information Processing and Management, 25(1):55–72.

Fuhr, N. (1992).
Probabilistic models in information retrieval. The Computer Journal, 35(3):243–255.

Fuhr, N. (2008).
A probability ranking principle for interactive information retrieval. Information Retrieval, 11:251–265.

He, B. and Ounis, I. (2005).
Term frequency normalisation tuning for BM25 and DFR models. In ECIR, pages 200–214.

Hiemstra, D. (2000).
A probabilistic justification for using tf.idf term weighting in information retrieval. International Journal on Digital Libraries, 3(2):131–139.

Kleinberg, J. (1999).
Authoritative sources in a hyperlinked environment. Journal of the ACM, 46.
127 / 133
![Page 128: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/128.jpg)
IR Models
Foundations of IR Models
Foundations: Summary
Bib V
Lafferty, J. and Zhai, C. (2003).
Probabilistic Relevance Models Based on Document and Query Generation, chapter 1. In [Croft and Lafferty, 2003].

Lavrenko, V. and Croft, W. B. (2001).
Relevance-based language models. In SIGIR, pages 120–127. ACM.

Luk, R. W. P. (2008).
On event space and rank equivalence between probabilistic retrieval models. Inf. Retr., 11(6):539–561.

Margulis, E. (1992).
N-Poisson document modelling. In Belkin, N., Ingwersen, P., and Pejtersen, M., editors, Proceedings of the Fifteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 177–189, New York.

Maron, M. and Kuhns, J. (1960).
On relevance, probabilistic indexing, and information retrieval. Journal of the ACM, 7:216–244.

Metzler, D. and Croft, W. B. (2004).
Combining the language model and inference network approaches to retrieval. Information Processing & Management, 40(5):735–750.
128 / 133
![Page 129: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/129.jpg)
IR Models
Foundations of IR Models
Foundations: Summary
Bib VI
Piwowarski, B., Frommholz, I., Lalmas, M., and Van Rijsbergen, K. (2010).
What can Quantum Theory Bring to Information Retrieval? In Proc. 19th International Conference on Information and Knowledge Management, pages 59–68.

Ponte, J. and Croft, W. (1998).
A language modeling approach to information retrieval. In Croft, W. B., Moffat, A., van Rijsbergen, C. J., Wilkinson, R., and Zobel, J., editors, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 275–281, New York. ACM.

Robertson, S. (1977).
The probability ranking principle in IR. Journal of Documentation, 33:294–304.

Robertson, S. (2004).
Understanding inverse document frequency: On theoretical arguments for idf. Journal of Documentation, 60:503–520.

Robertson, S. (2005).
On event spaces and probabilistic models in information retrieval. Information Retrieval Journal, 8(2):319–329.
129 / 133
![Page 130: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/130.jpg)
IR Models
Foundations of IR Models
Foundations: Summary
Bib VII
Robertson, S., Walker, S., Jones, S., Hancock-Beaulieu, M., and Gatford, M. (1994).
Okapi at TREC-3. In Text REtrieval Conference.

Robertson, S. and Sparck-Jones, K. (1976).
Relevance weighting of search terms. Journal of the American Society for Information Science, 27:129–146.

Robertson, S. E. and Walker, S. (1994).
Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In [Croft and van Rijsbergen, 1994], pages 232–241.

Robertson, S. E., Walker, S., and Hancock-Beaulieu, M. (1995).
Large test collection experiments on an operational interactive system: Okapi at TREC. Information Processing and Management, 31:345–360.

Rocchio, J. (1971).
Relevance feedback in information retrieval. In [Salton, 1971].

Roelleke, T. (2003).
A frequency-based and a Poisson-based probability of being informative. In ACM SIGIR, pages 227–234, Toronto, Canada.
130 / 133
![Page 131: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/131.jpg)
IR Models
Foundations of IR Models
Foundations: Summary
Bib VIII
Roelleke, T. and Wang, J. (2006).
A parallel derivation of probabilistic information retrieval models. In ACM SIGIR, pages 107–114, Seattle, USA.

Roelleke, T. and Wang, J. (2008).
TF-IDF uncovered: A study of theories and probabilities. In ACM SIGIR, pages 435–442, Singapore.

Salton, G., editor (1971).
The SMART Retrieval System - Experiments in Automatic Document Processing. Prentice Hall, Englewood Cliffs, New Jersey.

Salton, G., Fox, E., and Wu, H. (1983).
Extended Boolean information retrieval. Communications of the ACM, 26:1022–1036.

Salton, G., Wong, A., and Yang, C. (1975).
A vector space model for automatic indexing. Communications of the ACM, 18:613–620.
131 / 133
![Page 132: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/132.jpg)
IR Models
Foundations of IR Models
Foundations: Summary
Bib IX
Singhal, A., Buckley, C., and Mitra, M. (1996).
Pivoted document length normalisation. In Frei, H., Harman, D., Schauble, P., and Wilkinson, R., editors, Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 21–39, New York. ACM.

Sparck-Jones, K., Robertson, S., Hiemstra, D., and Zaragoza, H. (2003).
Language modelling and relevance. Language Modelling for Information Retrieval, pages 57–70.

Sparck-Jones, K., Walker, S., and Robertson, S. E. (2000).
A probabilistic model of information retrieval: development and comparative experiments: Part 1. Information Processing and Management, 36:779–808.

Turtle, H. and Croft, W. (1991).
Efficient probabilistic inference for text retrieval. In Proceedings RIAO 91, pages 644–661, Paris, France.

Turtle, H. and Croft, W. (1992).
A comparison of text retrieval models. The Computer Journal, 35.
132 / 133
![Page 133: Information Retrieval Models Part I](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a995760da3da068b703d/html5/thumbnails/133.jpg)
IR Models
Foundations of IR Models
Foundations: Summary
Bib X
Turtle, H. and Croft, W. B. (1990).
Inference networks for document retrieval. In Vidick, J.-L., editor, Proceedings of the 13th International Conference on Research and Development in Information Retrieval, pages 1–24, New York. ACM.

van Rijsbergen, C. J. (1986).
A non-classical logic for information retrieval. The Computer Journal, 29(6):481–485.

van Rijsbergen, C. J. (1989).
Towards an information logic. In Belkin, N. and van Rijsbergen, C. J., editors, Proceedings of the Twelfth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 77–86, New York.

van Rijsbergen, C. J. (2004).
The Geometry of Information Retrieval. Cambridge University Press, New York, NY, USA.

Wong, S. and Yao, Y. (1995).
On modeling information retrieval with probabilistic inference. ACM Transactions on Information Systems, 13(1):38–68.

Zaragoza, H., Hiemstra, D., Tipping, M. E., and Robertson, S. E. (2003).
Bayesian extension to the language model for ad hoc information retrieval. In ACM SIGIR, pages 4–9, Toronto, Canada.
133 / 133