sigir 2012 - explicit relevance models in intent-oriented information retrieval diversification
IRGIR Group @ UAM
Explicit Relevance Models in Intent-Aware IR Diversification 35th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012)
Portland, OR, 13 August 2012
http://ir.ii.uam.es
Explicit Relevance Models in Intent-Aware IR Diversification
35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012)
Saúl Vargas, Pablo Castells and David Vallet Universidad Autónoma de Madrid
http://ir.ii.uam.es
Portland, OR, 13 August 2012
Outline
Context: IR diversification formulation and algorithms
Proposed approach: relevance-based reformulation of diversification algorithms
Experiments
Adjustable tolerance to redundancy
Conclusion
IR diversity – Brief recap
Appliance
Golf
Chemical element
Nutrition / Health
Mining / Metallurgy
Diversity as a means to address uncertainty in user queries
– The same query may stand for different intents or aspects of the underlying information need
Revision of document relevance independence
– Marginal utility of additional relevant documents decreases fast
Trade diminishing marginal utility for increased intent coverage
– Thus maximize the number of users who obtain at least some useful document
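IR diversification – Problem statement
Given a query $q$ on a document collection, find a subset $S$ of the collection, of given size, maximizing:
$$p(\text{some } d \in S \text{ relevant} \mid q)$$
(Agrawal 2009, Santos 2010, Chen 2006, …)
The problem is NP-hard, so a greedy approximation is used. At each step, pick
$$\arg\max_{d \in R - S} \varphi(d, S \mid q), \qquad \varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$$
– $S$: diversified ranking
– $R - S$: baseline ranking $p(d \mid q)$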
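The greedy approximation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `greedy_diversify`, the callable `objective` standing in for $\varphi(d, S \mid q)$, and the toy topic data are hypothetical and do not come from the slides.

```python
from typing import Callable, Hashable, List, Sequence


def greedy_diversify(
    baseline: Sequence[Hashable],
    objective: Callable[[Hashable, List[Hashable]], float],
    size: int,
) -> List[Hashable]:
    """Greedily build S by moving the best-scoring document from R - S into S."""
    selected: List[Hashable] = []   # S: the diversified ranking so far
    remaining = list(baseline)      # R - S: baseline documents not yet picked
    while remaining and len(selected) < size:
        # max() returns the first maximal element, so ties follow baseline order.
        best = max(remaining, key=lambda d: objective(d, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy objective (hypothetical): a document scores 1 if its topic is not yet covered.
topics = {"d1": "golf", "d2": "golf", "d3": "metal"}


def toy_objective(d, S):
    covered = {topics[s] for s in S}
    return 0.0 if topics[d] in covered else 1.0


reranked = greedy_diversify(["d1", "d2", "d3"], toy_objective, size=3)
print(reranked)
```

Any of the objective functions discussed in the deck could be plugged in as `objective`; the loop itself does not depend on how $\varphi$ is defined.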
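IR diversity – Instantiations of objective function
State of the art aspect-based approaches
IA-Select scheme (Agrawal 2009)
$$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$$
xQuAD scheme (Santos 2010)
$$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$$
$$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$$
– Explicit query aspects: $z$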
– Query aspect coverage: the factor $p(z \mid q)$ in both schemes
– Document “relevance” for query aspect: $p(z \mid d)\, p(d \mid q)$ in IA-Select, $p(d \mid q, z)$ in xQuAD
– Redundancy penalization: the product over $d' \in S$, which discounts aspects already covered by documents in $S$
– Mixture with baseline (xQuAD): $\lambda$ sets the degree of diversification
– Probability to observe documents: $p(d \mid q)$ and $p(d \mid q, z)$ are distributions over documents, whereas the problem statement is about relevance: $\varphi(d, S \mid q) \propto p(d \text{ is } \textbf{relevant} \wedge \text{no } d' \in S \text{ is } \textbf{relevant} \mid q)$
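The IA-Select and xQuAD objectives can be written down directly as functions. The following is a minimal sketch, assuming the probabilities are stored in plain dictionaries; the function names, the argument layout and all toy numbers are illustrative assumptions rather than code from the cited papers.

```python
from math import prod


def ia_select(d, S, p_z_q, p_z_d, p_d_q):
    """sum_z p(z|q) p(z|d) p(d|q) prod_{d' in S} (1 - p(z|d') p(d'|q))."""
    return sum(
        p_z_q[z] * p_z_d[d].get(z, 0.0) * p_d_q[d]
        * prod(1.0 - p_z_d[s].get(z, 0.0) * p_d_q[s] for s in S)
        for z in p_z_q
    )


def xquad(d, S, p_z_q, p_d_q, p_d_qz, lam):
    """(1 - lam) p(d|q) + lam sum_z p(z|q) p(d|q,z) prod_{d' in S} (1 - p(d'|q,z))."""
    diversity = sum(
        p_z_q[z] * p_d_qz[z].get(d, 0.0)
        * prod(1.0 - p_d_qz[z].get(s, 0.0) for s in S)
        for z in p_z_q
    )
    return (1.0 - lam) * p_d_q[d] + lam * diversity


# Hypothetical toy data: two aspects, three documents.
p_z_q = {"golf": 0.5, "metal": 0.5}
p_d_q = {"d1": 0.5, "d2": 0.3, "d3": 0.2}
p_z_d = {"d1": {"golf": 1.0}, "d2": {"golf": 1.0}, "d3": {"metal": 1.0}}
p_d_qz = {"golf": {"d1": 0.6, "d2": 0.4}, "metal": {"d3": 1.0}}

# With d1 already selected, the document on the uncovered aspect (d3) scores higher.
for doc in ("d2", "d3"):
    print(doc,
          ia_select(doc, ["d1"], p_z_q, p_z_d, p_d_q),
          xquad(doc, ["d1"], p_z_q, p_d_q, p_d_qz, lam=1.0))
```

With `lam=0.0` the xQuAD function falls back to the baseline score $p(d \mid q)$, which is the "mixture with baseline" role of $\lambda$.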
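IR diversity – Relevance-based instantiation of objective function
Our proposal, following $\varphi(d, S \mid q) \propto p(d \text{ is } \textbf{relevant} \wedge \text{no } d' \in S \text{ is } \textbf{relevant} \mid q)$:
IA-Select scheme – relevance-based
$$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$$
xQuAD scheme – relevance-based
$$\varphi(d, S \mid q) = (1 - \lambda)\, p(r_d \mid q) + \lambda\, p(r_d, \neg r_S \mid q)$$
$$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$$
– Probability of relevance: $p(r \mid d, q)$ and $p(r \mid d, q, z)$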
– More literal interpretation of the initial problem statement
– The two relevance-based schemes are equivalent for $\lambda = 1$
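The equivalence at $\lambda = 1$ can be checked numerically. The sketch below assumes the relevance probabilities are given as dictionaries; `rel_ia_select`, `rel_xquad` and the toy values are hypothetical names and numbers chosen for illustration.

```python
from math import prod


def rel_ia_select(d, S, p_z_q, p_r_dqz):
    """sum_z p(z|q) p(r|d,q,z) prod_{d' in S} (1 - p(r|d',q,z))."""
    return sum(
        p_z_q[z] * p_r_dqz[z].get(d, 0.0)
        * prod(1.0 - p_r_dqz[z].get(s, 0.0) for s in S)
        for z in p_z_q
    )


def rel_xquad(d, S, p_z_q, p_r_dq, p_r_dqz, lam):
    """(1 - lam) p(r|d,q) + lam * (relevance-based IA-Select term)."""
    return (1.0 - lam) * p_r_dq[d] + lam * rel_ia_select(d, S, p_z_q, p_r_dqz)


# Hypothetical relevance probabilities for two aspects and three documents.
p_z_q = {"a": 0.7, "b": 0.3}
p_r_dq = {"d1": 0.6, "d2": 0.5, "d3": 0.2}
p_r_dqz = {"a": {"d1": 0.9, "d2": 0.8}, "b": {"d3": 0.6}}

S = ["d1"]
for doc in ("d2", "d3"):
    print(doc,
          rel_ia_select(doc, S, p_z_q, p_r_dqz),
          rel_xquad(doc, S, p_z_q, p_r_dq, p_r_dqz, lam=1.0))
```

Both columns of the printout agree for every document, since the baseline term is multiplied by $1 - \lambda = 0$.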
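Relevance distribution vs. document distribution
$$(1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$$
Relevance distribution: $\sum_d p(r \mid d, q, z) = E[\text{nr. relevant docs}] \ge 1$
Document distribution: $\sum_d p(d \mid q, z) = 1$
Different potential behavior, e.g. stronger redundancy penalization
$p(r \mid d, \cdot)$ vs. $p(d \mid \cdot)$
– The difference does matter (in this context)
– Potential rank equivalences do not apply here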
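A small numeric illustration of why the two kinds of distribution behave differently inside the redundancy product. All numbers here are made up for the example; only the two sum constraints come from the slide.

```python
# Hypothetical retrieval scores of five documents for one aspect z.
scores = [5.0, 4.0, 3.0, 2.0, 1.0]

# Document distribution: p(d|q,z) is normalized over documents, so it sums to 1.
p_d = [s / sum(scores) for s in scores]

# Relevance probabilities: each p(r|d,q,z) is a probability on its own,
# and their sum is the expected number of relevant documents.
p_r = [0.9, 0.8, 0.5, 0.2, 0.1]


def novelty(probs, n_selected):
    """Redundancy factor prod (1 - p) over the first n_selected documents."""
    factor = 1.0
    for p in probs[:n_selected]:
        factor *= 1.0 - p
    return factor


for n in range(4):
    print(n, round(novelty(p_d, n), 4), round(novelty(p_r, n), 4))
```

Because the document distribution has to share a total mass of 1 among all documents, its individual values are small and the product stays close to 1. The relevance probabilities are not constrained in this way, so the same aspect is discounted much faster once relevant documents are selected.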
Relevance-based greedy diversification
Relevance-based reformulation of diversification algorithm
1. Need to estimate $p(r \mid d, q, z)$
2. Does it work? Test empirically
3. Further development: parameterized tolerance to redundancy
Aspect-based relevance model
Estimate $p(r \mid d, q, z)$
Cannot use odds, logs, constant removal… or any other rank-preserving step (we need the specific values)
Components of the model:
• $p(r \mid d, q)$: positional relevance, $p(r \mid \mathrm{rank}(d, q))$
• $p(d \mid q)$: normalized baseline IR system score (as in e.g. Bache 2009)
• $p(z \mid d)$, $p(z \mid q)$, $p(z)$: estimate $p(z \mid d)$ or $p(z \mid q)$ depending on available observations, then derive the other two parameters
  – $z$ as document classes (e.g. ODP)
  – $z$ as subqueries (e.g. reformulations)
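As a rough illustration of two of these components, the sketch below normalizes baseline scores into $p(d \mid q)$ and derives $p(z \mid q)$ from $p(z \mid d)$ by marginalizing over the retrieved documents. Both choices are assumptions made for the example: sum normalization presumes non-negative scores, and the marginalization $p(z \mid q) = \sum_d p(z \mid d)\, p(d \mid q)$ is one plausible way to "derive the other parameters", not necessarily the one used by the authors.

```python
def normalize_scores(scores):
    """Turn non-negative baseline scores into a distribution p(d|q) (assumed sum normalization)."""
    total = sum(scores.values())
    return {d: s / total for d, s in scores.items()}


def aspect_given_query(p_z_d, p_d_q):
    """Assumed derivation: p(z|q) = sum_d p(z|d) p(d|q) over the retrieved documents."""
    p_z_q = {}
    for d, p_d in p_d_q.items():
        for z, p in p_z_d.get(d, {}).items():
            p_z_q[z] = p_z_q.get(z, 0.0) + p * p_d
    return p_z_q


# Hypothetical baseline scores and aspect model.
p_d_q = normalize_scores({"d1": 3.0, "d2": 1.0})
p_z_d = {"d1": {"a": 1.0}, "d2": {"a": 0.5, "b": 0.5}}
p_z_q = aspect_given_query(p_z_d, p_d_q)
print(p_d_q, p_z_q)
```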
Positional relevance distribution estimate
$$p(r \mid d, q) \sim p(r \mid \mathrm{rank}(d, q)) = p(r \mid k)$$
[Figure: $p(r \mid k)$ against rank $k$ (0 to 200), log scale from 1E-05 to 1E+00. Curves: pLSA, Lemur, AOL. Sources: click log statistics and precision estimates.]
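One simple way to obtain such a curve from relevance judgments is to count, for each rank position, how often the document at that position is relevant across training queries. The sketch below follows that reading; it is an assumption about the estimation procedure, and the function name and toy judgments are hypothetical.

```python
from typing import List, Sequence


def positional_relevance(judged_rankings: Sequence[Sequence[bool]], depth: int) -> List[float]:
    """Estimate p(r|k) as the fraction of training rankings whose rank-k document is relevant."""
    estimates = []
    for k in range(depth):
        # Rankings shorter than k + 1 count as a non-relevant observation at rank k.
        hits = sum(1 for ranking in judged_rankings if k < len(ranking) and ranking[k])
        estimates.append(hits / len(judged_rankings))
    return estimates


# Hypothetical judgments: one list per training query, True = relevant at that rank.
training = [
    [True, False, True],
    [True, True, False],
    [False, False, False],
    [True, False],
]
p_r_k = positional_relevance(training, depth=3)
print(p_r_k)
```

The same function could be fed click indicators instead of judgments, which would correspond to the click-log variant mentioned on the slide.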
Experiments
Collection: ClueWeb09 category B (50M documents)
Query/subtopic set: TREC 2009/10 diversity task (100 queries)
Baseline ranking: Lemur Indri search engine (Web service)
Diversified top n: 100
Query aspect space:
a) ODP categories level 4 (~7K categories)
b) TREC subtopics (oracle for reference)
Specific parameter estimates (search diversity):
• $p(z \mid q)$: uniform
• $p(z \mid d)$:
  – ODP categories: semi-supervised text classification by Textwise
  – TREC subtopics: Indri search system run on $z$ as if a query
• $p(r \mid k)$:
  i. P@k estimates with TREC relevance judgments (2-fold 2009/10 cross validation)
  ii. Click statistics from AOL log (thus different IR system)
Experiments – Search diversity on TREC
[Figure: ERR-IA against λ for the xQuAD scheme, with $p(r \mid k)$ from qrels. Panels: ODP categories, TREC subtopics. Curves: based on $p(d \mid q, z)$ vs. based on $p(r \mid d, q, z)$.]
Experiments – Search diversity on TREC

                         λ     α-nDCG@20   ERR-IA@20   nDCG-IA@20   S-recall@20
Lemur                    -     0.2587      0.1630      0.2396       0.4636
a) ODP categories
  IA-Select              -     0.2651      0.1681      0.2423       0.4483
  xQuAD                  0.9   0.2675      0.1656      0.2451       0.4864
  Rel-based xQuAD
    i. Qrels             0.1   0.2858△▲    0.1828△▲    0.2655△▲     0.4898▲△
    ii. Clicks           0.4   0.2841▲△    0.1831△△    0.2605△▲     0.4830▲▽
b) TREC subtopics
  IA-Select              -     0.3541      0.2346      0.3213       0.5787
  xQuAD                  1.0   0.3445      0.2241      0.3127       0.5704
  Rel-based xQuAD
    i. Qrels             1.0   0.3543△△    0.2349△△    0.3192▽△     0.5782▽△
    ii. Clicks           1.0   0.3512▽△    0.2320▽△    0.3166▽△     0.5748▽△

λ: chosen by “informally” maximizing ERR-IA by 0.1 steps for each diversifier
Best value in bold green
▲ ▼: $p < 0.05$
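For readers less familiar with the metrics in the table, here is a minimal sketch of ERR-IA and S-recall under simplifying assumptions. The functions take per-document stop probabilities directly (how graded judgments map to these probabilities is left out), and all names and toy data are hypothetical; this is not the official TREC evaluation code.

```python
def err(stop_probs):
    """Expected reciprocal rank: sum_k (1/k) R_k prod_{j<k} (1 - R_j)."""
    score, not_stopped = 0.0, 1.0
    for k, r in enumerate(stop_probs, start=1):
        score += not_stopped * r / k
        not_stopped *= 1.0 - r
    return score


def err_ia(ranking, p_z_q, rel, cutoff=20):
    """ERR-IA@cutoff: aspect-weighted average of the per-aspect ERR."""
    top = ranking[:cutoff]
    return sum(w * err([rel.get((d, z), 0.0) for d in top]) for z, w in p_z_q.items())


def s_recall(ranking, aspects_of, n_aspects, cutoff=20):
    """Subtopic recall: fraction of aspects covered by the top documents."""
    covered = set()
    for d in ranking[:cutoff]:
        covered |= aspects_of.get(d, set())
    return len(covered) / n_aspects


# Hypothetical judgments: d1 and d2 serve aspect "a", d3 serves aspect "b".
p_z_q = {"a": 0.5, "b": 0.5}
rel = {("d1", "a"): 0.5, ("d2", "a"): 0.5, ("d3", "b"): 0.5}
aspects_of = {"d1": {"a"}, "d2": {"a"}, "d3": {"b"}}

baseline, diversified = ["d1", "d2", "d3"], ["d1", "d3", "d2"]
print(err_ia(baseline, p_z_q, rel), err_ia(diversified, p_z_q, rel))
print(s_recall(baseline, aspects_of, 2, cutoff=2), s_recall(diversified, aspects_of, 2, cutoff=2))
```

In this toy case the diversified order scores higher on both metrics, because the second position is spent on the aspect that was still uncovered.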
Experiments – Recommendation diversity
Adaptation of IR diversity paradigm (Vargas, Castells & Vallet SIGIR 2011):
• Queries → users
• Documents → items (movies, music artists)
• Subtopics → item features (genres, tags)
• Relevance judgments → test ratings from data split
Dataset 1: MovieLens 1M
– Collection: 6K users, 4K movies, 1M ratings
– Subtopic set: 10 movie genres
Dataset 2: Last.fm crawl
– Collection: 1K users, 175K artists, 20M playcounts
– Subtopic set: 120K social tags on artists by Last.fm users
Baseline rankings:
a) pLSA
b) Popularity-based recommendation
Diversified top n: 100
Specific parameter estimates:
• $p(z \mid q)$: uniform
• $p(z \mid d)$: uniform on $d$ (based on binary aspect/item association)
• $p(r \mid k)$: P@k estimates with 2-fold cross-validation on test users
Experiments – Recommendation diversity on MovieLens and Last.fm
[Figure: ERR-IA against λ. Columns: MovieLens 1M, Last.fm. Rows: pLSA recommender, recommendation by item popularity. Curves: based on $p(d \mid q, z)$ vs. based on $p(r \mid d, q, z)$.]
Adjustable tolerance to redundancy
Generalization of relevance-based diversification scheme
– Formally support adjustable redundancy penalization
– Approach: generalize relevance to browsing model
$$\varphi(d, S \mid q) = (1 - \lambda)\, p(r \mid d, q) + \lambda\, p(r_d, \neg \mathrm{stop}_S \mid q) = \cdots$$
$$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\, p(\mathrm{stop} \mid r)\bigr)$$
Adjustable redundancy tolerance parameter $p(\mathrm{stop} \mid r) \in [0, 1]$
– High $p(\mathrm{stop} \mid r)$ for aggressive penalization, low for e.g. high-recall searches
– In this view, the original formulations implicitly assume $p(\mathrm{stop} \mid r) = 1$, i.e. a single relevant document is sought
– Tolerance to redundancy: the $p(\mathrm{stop} \mid r)$ factor inside the product
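The effect of the stop probability can be seen on a toy case. The sketch below implements the generalized objective with dictionaries as inputs; the function name `rel_xquad_stop` and all numbers are hypothetical and only meant to illustrate the formula.

```python
from math import prod


def rel_xquad_stop(d, S, p_z_q, p_r_dq, p_r_dqz, lam, p_stop):
    """(1 - lam) p(r|d,q) + lam sum_z p(z|q) p(r|d,q,z) prod_{d' in S} (1 - p(r|d',q,z) p(stop|r))."""
    diversity = sum(
        p_z_q[z] * p_r_dqz[z].get(d, 0.0)
        * prod(1.0 - p_r_dqz[z].get(s, 0.0) * p_stop for s in S)
        for z in p_z_q
    )
    return (1.0 - lam) * p_r_dq[d] + lam * diversity


# Hypothetical data: d1 and d2 serve aspect "a", d3 serves the less likely aspect "b".
p_z_q = {"a": 0.7, "b": 0.3}
p_r_dq = {"d1": 0.6, "d2": 0.5, "d3": 0.2}
p_r_dqz = {"a": {"d1": 0.9, "d2": 0.8}, "b": {"d3": 0.6}}

for p_stop in (0.0, 0.5, 1.0):
    scores = {
        doc: rel_xquad_stop(doc, ["d1"], p_z_q, p_r_dq, p_r_dqz, lam=1.0, p_stop=p_stop)
        for doc in ("d2", "d3")
    }
    print(p_stop, scores)
```

With `p_stop=0.0` nothing is penalized and the redundant but highly relevant `d2` wins; with `p_stop=1.0` the objective reduces to the relevance-based xQuAD formula and `d3` is preferred.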
Adjustable tolerance to redundancy
Empirical observation: $p(\mathrm{stop} \mid r)$ vs. $\alpha$ in $\alpha$-nDCG
[Figure: two panels with axes from 0 to 1, one axis being $p(\mathrm{stop} \mid r)$. Search task: Lemur on TREC / Subtopics. Recommendation task: pLSA on MovieLens / Genres. Marked for each column: best and worst $\alpha$-nDCG value.]
Conclusion
Alternative, relevance-based formulation of greedy aspect-based diversification
– Unifies two previous aspect-based algorithms
– More literal expression of formal problem statement (and metrics?)
$p(r \mid d, q, z)$ vs. $p(d \mid q, z)$
– Literal value estimates needed (rather than rank-equivalent approximations)
– Estimate based on positional relevance (relevance or click data needed)
Seems to perform well empirically
– Light requirements on relevance or click data for training positional relevance
– Improvement trend, but needs to be tested under further optimizations
Formal support for redundancy tolerance adjustment