What Makes a Query Difficult?
David Carmel, Elad Yom-Tov, Adam Darlow, Dan Pelleg
IBM Haifa Research Labs
SIGIR 2006
Outline
- Introduction
- A model for topic difficulty
- Validating the model
- Uses of the model
- Conclusion
Introduction
Typical TREC topics for comparison between systems are defined by:
- A textual description
- A set of documents relevant to the information need
Experimental results of TREC participants show a wide diversity in effectiveness among topics as well as among systems (the Robust track).
Introduction
The goals of the TREC Robust track:
a. Encouraging systems to decrease variance by focusing on poorly performing topics
b. Estimating the relative difficulty of each topic
c. Studying whether an old and difficult topic is still difficult for current state-of-the-art IR systems
d. Studying whether topics difficult in one collection are still difficult in another collection
Why are some topics more difficult than others?
Related Work
- Clarity measure for queries
- Linguistic features of the query
- Number of topic aspects
- Features of the entire collection
- Reliable Information Access (RIA) workshop
  - Ten failure categories were identified
  - Most failures are related to failing to identify all aspects of the topic
Topic Difficulty Model
Components of a topic: the queries Q and the relevant documents R, both dependent on the collection C:

  Topic = (Q, R | C)

1. d(Q, C) – the distance between the queries, Q, and the collection, C
2. d(Q, Q) – the distance among the queries
3. d(R, C) – the distance between the relevant documents, R, and the collection, C
4. d(R, R) – the distance among the relevant documents
5. d(Q, R) – the distance between the queries, Q, and the relevant documents, R

Figure 1: a general model for topic difficulty
Distance Measure
Jensen-Shannon divergence (JSD)
- A symmetric version of the Kullback-Leibler divergence (KLD)
- Applied to d(Q, C), d(R, C) and d(Q, R)
- For the distributions P(w) and Q(w) over the words w \in W of the collection, the JSD is:

  M(w) = \frac{1}{2}\left(P(w) + Q(w)\right)

  D_{JS}(P \,\|\, Q) = \frac{1}{2}\left(D_{KL}(P \,\|\, M) + D_{KL}(Q \,\|\, M)\right)
                     = \frac{1}{2}\left(\sum_{w \in W} P(w)\log\frac{P(w)}{M(w)} + \sum_{w \in W} Q(w)\log\frac{Q(w)}{M(w)}\right)
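As a concrete sketch, the JSD above can be computed over word-probability dictionaries like this (a minimal illustration; the function and variable names are ours, not from the paper):

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q), summing only over words with p(w) > 0."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetrized KLD against the mean M."""
    words = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in words}
    return 0.5 * (kl_divergence(p, m) + kl_divergence(q, m))
```

The JSD is symmetric and bounded above by log 2 (with the natural log), and its square root is a proper metric, which matters for the d(R, R) measurement later.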
Distribution of Terms
The probability distribution of a word w within the document or query x:

  P(w \mid x) = \lambda \cdot \frac{n_w}{\sum_{w'} n_{w'}} + (1 - \lambda) \cdot P_c(w)

where n_w is the number of occurrences of w in x and P_c(w) is the collection model.
- λ = 0.9 for d(Q, Q), d(Q, R) and d(R, R)
- λ = 0.99 for d(Q, C) and d(R, C)
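The smoothed model above can be sketched directly (illustrative names; `p_collection` stands for the collection model P_c):

```python
def smoothed_prob(word, counts, p_collection, lam=0.9):
    """P(w|x) = lam * n_w / sum(n_w') + (1 - lam) * P_c(w).

    counts: word -> occurrence count in the document or query x.
    p_collection: the background collection model P_c.
    """
    total = sum(counts.values())
    rel_freq = counts.get(word, 0) / total if total else 0.0
    return lam * rel_freq + (1.0 - lam) * p_collection.get(word, 0.0)
```

The interpolation keeps every collection word at nonzero probability, so the KLD terms inside the JSD stay finite.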
Topic Aspects and Topic Broadness
The aspect coverage problem is to find documents that cover as many different aspects as possible
- Providing more information to the user
In the model, topic broadness (difficulty) is measured by the distance d(R, R)
JSD suffers from the drawback that identical documents are very close together
Using topic aspects to measure d(R, R):
- The number of clusters of the relevant documents
- The square root of the JSD is used as the distance measure between documents
Document Coverage and Query Coverage
Rarely does the information pertaining to both facets of the model exist
When only Q or R is available, the missing part is approximated by minimizing the JSD:
- Document coverage (DC):  DC(Q) = \arg\min_{R'} D_{JS}(Q \,\|\, R')
- Query coverage (QC):  QC(R) = \arg\min_{Q'} D_{JS}(Q' \,\|\, R)
Practical Considerations for Document Coverage
Computing document coverage for a given query is NP-hard, so it is approximated:
- Only the top 100 documents retrieved for the query are considered
- A greedy algorithm:
  - The document closest to the query is found first
  - Documents are added iteratively, each time choosing the one that causes the largest decrease in JSD between the query and the selected documents
  - Once a minimum is reached, the value of the JSD is measured and the set of accumulated documents is used as an approximation of the true DC set

Figure 2: A typical JSD curve obtained by the greedy algorithm for document coverage detection
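The greedy procedure can be sketched as follows (a simplified illustration: documents and the query are word-probability dictionaries, the selected set is modeled as a uniform mixture of its documents, and all names are ours):

```python
import math

def jsd(p, q):
    """Jensen-Shannon divergence between two word-probability dicts."""
    words = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in words}
    def kl(a):
        return sum(v * math.log(v / m[w]) for w, v in a.items() if v > 0)
    return 0.5 * (kl(p) + kl(q))

def mix(docs):
    """Uniform mixture (centroid) of the selected documents' distributions."""
    words = {w for d in docs for w in d}
    return {w: sum(d.get(w, 0.0) for d in docs) / len(docs) for w in words}

def greedy_document_coverage(query, docs):
    """Greedily add the doc that most decreases JSD(query, selected set);
    stop at the first local minimum, return (selected docs, final JSD)."""
    remaining = list(docs)
    first = min(remaining, key=lambda d: jsd(query, d))  # closest doc first
    selected, best = [first], jsd(query, first)
    remaining.remove(first)
    while remaining:
        cand = min(remaining, key=lambda d: jsd(query, mix(selected + [d])))
        score = jsd(query, mix(selected + [cand]))
        if score >= best:      # JSD stopped decreasing: local minimum reached
            break
        selected.append(cand)
        best = score
        remaining.remove(cand)
    return selected, best
```

The returned JSD value plays the role of the approximated d(Q, R'), and the accumulated set approximates the DC set.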
Practical Considerations for Query Coverage
Query coverage for given relevant documents:
- Only the set of terms belonging to R is considered by the greedy algorithm
- The iterative process results in a list of ranked words: the most representative words of the topic
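A matching sketch for the query-coverage side (again purely illustrative: `r` is the term distribution of the relevant documents, candidate queries are modeled as uniform distributions over the chosen words, and the stopping rule here is our own assumption):

```python
import math

def jsd(p, q):
    """Jensen-Shannon divergence between two word-probability dicts."""
    words = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in words}
    def kl(a):
        return sum(v * math.log(v / m[w]) for w, v in a.items() if v > 0)
    return 0.5 * (kl(p) + kl(q))

def greedy_query_coverage(r, max_words=10):
    """Rank the most representative words of r by greedily growing a
    uniform query model that minimizes the JSD to r."""
    chosen, candidates = [], set(r)
    best = float('inf')
    while candidates and len(chosen) < max_words:
        def score(w):
            words = chosen + [w]
            return jsd({t: 1.0 / len(words) for t in words}, r)
        word = min(candidates, key=score)
        if score(word) >= best:
            break  # adding any further word no longer reduces the divergence
        best = score(word)
        chosen.append(word)
        candidates.remove(word)
    return chosen  # ranked list: most representative first
```

The order in which words enter the query gives the ranked word list used by the findability experiments later.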
Experiment Environment
- Search engine: Juru
- Topics: the 100 topics of the TREC 2004 and 2005 Terabyte tracks
- Document collection: .GOV2 (25 million docs)
Model-Induced Distances vs. Average Precision

Table 1: Pearson and Spearman correlation coefficients between the different distances induced by the topic difficulty model and the AP of the 100 topics

Distance   |      Juru's AP       |    TREC median AP
           | Pearson | Spearman ρ | Pearson | Spearman ρ
d(Q, C)    |  0.167  |   0.170    |  0.298  |   0.292
d(R, C)    |  0.322  |   0.290    |  0.331  |   0.323
d(Q, R)    | -0.065  |  -0.134    | -0.019  |   0.004
d(R, R)    | +0.150  |   0.141    |  0.119  |   0.155
Combined   |  0.447  |            |  0.476  |
Model-Induced Distances vs. Topic Aspect Coverage
Topic aspect coverage:
- The average precision of the top-ranked document for each aspect

Table 2: Correlations between the different distances and the aspect coverage (Juru's AP)

Distance   | Pearson | Spearman ρ
d(Q, C)    |  0.047  |   0.047
d(R, C)    |  0.143  |   0.194
d(Q, R)    | -0.271  |  -0.285
d(R, R)    | -0.364  |  -0.418
Combined   |  0.482  |
Uses of the Model
- Estimating query average precision
- Estimating topic aspect coverage
- Estimating topic findability
  - The likelihood of documents in the domain (topic) returning as answers to queries related to the domain
Estimating Average Precision
R' is an approximation of the set of relevant documents, obtained by approximating document coverage
d(Q, C), d(Q, R') and d(R', C) are used as features for a Support Vector Machine (SVM)
- Leave-one-out cross-validation
The Pearson correlation between the actual average precision and the predicted average precision is 0.362
Estimating Aspect Coverage
The same approach as for estimating average precision
The Pearson correlation between the actual aspect coverage and the predicted one is 0.397
The same features are also used to train an estimator that detects low-coverage (<10%) queries

Figure 3: Receiver operating characteristic (ROC) curve for distinguishing queries with low aspect coverage from other queries (the area under the curve is 0.88). The x-axis is P(decide low coverage | query with high coverage); the y-axis is P(decide low coverage | query with low coverage).
Estimating Topic Findability
Given a set of documents of a domain, findability represents how easy it is for a user to find these documents
- Related to the field of search engine optimization
For each topic, the 10 best words are selected from the result of the query coverage approximation
- A sequence of queries is formed: the best one word, the best two words, and so on
- For each topic, the resulting values of AP against the number of terms are its features for K-means clustering
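The clustering step can be illustrated with a minimal K-means over toy AP curves (a pure-Python sketch with made-up data; the paper's experiments of course use AP curves from real retrieval runs):

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Toy K-means: points are equal-length lists of AP values."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assign each point to the nearest center (squared Euclidean distance)
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            clusters[i].append(p)
        # recompute centers as cluster means (keep old center if cluster empty)
        new_centers = [
            [sum(col) / len(c) for col in zip(*c)] if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```

The cluster centers correspond to the "typical findability behavior" curves shown in the next slide.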
Results of Estimating Topic Findability

Figure 4: Cluster centers of the AP curves versus the number of best words. The curves represent three typical findability behaviors.
Conclusion
- A novel topic difficulty model is proposed, capturing the main components of a topic and relating those components to topic difficulty.
- The larger the distance of the queries and the Qrels from the entire collection, the better the topic can be answered.
- The applicability of the difficulty model is demonstrated.
- More features affecting topic difficulty are left for further research, e.g. ambiguity of the query terms, or topics with missing content.