Structured Query Formulation and Result Organization for Session
Search
A Thesis
submitted to the Faculty of the
Graduate School of Arts and Sciences
of Georgetown University
in partial fulfillment of the requirements for the
degree of
Master of Science
in Computer Science
By
Dongyi Guan
Washington, DC
April 22, 2013
Copyright © 2013 by Dongyi Guan
All Rights Reserved
Structured Query Formulation and Result Organization for Session
Search
Dongyi Guan
Thesis Advisor: Dr. Grace Hui Yang
Abstract
Complicated search tasks, such as making a travel plan, usually require more than
one search query. A user interacts with a search engine over multiple iterations, which
we call a session. Session search is the task of document retrieval within a session. A
session often involves a series of interactions between the user and the search engine.
To make use of all the queries and the various interactions in a session, we propose
an effective structured query formulation method for session search. By identifying
phrase-like textual nuggets, we investigate different degrees of importance for phrases
in queries, aggregate them to create a highly effective session-wise query, and send it
to a state-of-the-art search engine to retrieve the relevant documents. Our system
participated in the TREC 2012 Session track evaluation and won second place in
whole-session search (RL2-RL4).
A second main contribution of this thesis is to increase the stability of result organization
for session search. Search result clustering (SRC) hierarchies are widely used in
organizing search results; these hierarchies provide users with overviews of their search
results. Search result organization is usually sensitive to even slight changes in queries.
Within a session, queries are related, and hence the search result organization should
be related as well, maintaining a more stable representation across queries. We
propose two monothetic concept hierarchy approaches that exploit external knowledge
to build more stable SRC hierarchies for session search. One approach corrects
erroneous relations generated by Subsumption, a state-of-the-art concept hierarchy
construction approach. The other employs external knowledge to build SRC hierarchies
directly. Evaluations show that our approaches generate statistically significantly
more stable search result organizations while keeping the organization quality high.

Index words: Information retrieval, session search, structured query, search result organization
Acknowledgments
This thesis would not have been completed without the guidance and help of the
people who contributed their valuable assistance to the preparation and completion
of this study.
First and foremost, I would like to express my utmost gratitude to my advisor Dr.
Grace H. Yang for her continuous support and inspiring instruction throughout my
study and research. Dr. Grace H. Yang is a great advisor with patience, motivation,
enthusiasm, and immense knowledge. I would also like to thank her for encouraging
and helping me to shape my interest and ideas.
Besides my advisor, I am deeply grateful to the rest of my thesis committee: Dr.
Lisa Singh and Dr. Calvin Newport, for their insightful comments and high-quality
questions.
My sincere thanks also go to the professors at Georgetown University for their
great support and kind help: Dr. Ophir Frieder, Dr. Evan Barba, Dr. Eric Burger,
Dr. Der-Chen Chang, Dr. Jeremy Fineman, Dr. Nazli Goharian, Dr. Bala Kalyanasun-
daram, Dr. Mark Maloof, Dr. Jami Montgomery, Dr. Micah Sherr, Dr. Clay Shields,
Dr. Richard Squier, Dr. Mahendran Velauthapillai, and Dr. Wenchao Zhou. I also
thank my friends Yifan Gu, Jiyun Luo, Jon Parker, Henry Tan, Amin Teymorian,
Chris Wacek, Yifang Wei, Andrew Yates, and Sicong Zhang for the stimulating
discussions and the sleepless nights we spent working together before deadlines.
I owe my warm thanks to my family for their continuous love and support of my
decisions. My parents always give me advice to help me get through the difficult times.
I am so grateful to my fiancée, whose love and unconditional support allowed me to
finish this journey. Finally, I would like to dedicate this work to my late Grandma,
who left us too soon. I hope that this work makes her proud.
Table of Contents
Chapter
1 Introduction
  1.1 Motivation
  1.2 Session Search
    1.2.1 Overview
    1.2.2 Query Formulation
    1.2.3 Search Result Organization
  1.3 Challenges
    1.3.1 Challenges in Query Formulation for Session Search
    1.3.2 Challenges in Result Organization for Session Search
  1.4 TREC Session Tracks
  1.5 Our Approaches
    1.5.1 Structured Query Formulation for Session Search
    1.5.2 Stable Search Result Organization by Exploiting External Knowledge
  1.6 Contributions of this Thesis
  1.7 Outline
2 Related Work
  2.1 Session Search and TREC Session Tracks
  2.2 Query Formulation
  2.3 Search Result Organization
    2.3.1 Hierarchical Clustering
    2.3.2 Subsumption
    2.3.3 Exploiting External Knowledge
3 Effective Structured Query Formulation for Session Search
  3.1 Identifying Nuggets and Formulating Structured Queries
    3.1.1 The Strict Method
    3.1.2 The Relaxed Method
  3.2 Query Aggregation within a Session
    3.2.1 Aggregation Schemes
  3.3 Query Expansion by Anchor Text
  3.4 Removing Duplicated Queries
  3.5 Document Re-ranking
  3.6 Evaluation for Session Search
    3.6.1 Datasets, Baseline, and Evaluation Metrics
    3.6.2 Results for TREC 2011 Session Track
    3.6.3 Results for TREC 2012 Session Track
    3.6.4 Official Evaluation Results for TREC 2012 Session Track
  3.7 Chapter Summary
4 Increasing Stability of Result Organization for Session Search
  4.1 Utilizing External Knowledge to Increase Stability of Search Result Organization
  4.2 Identifying Reference Wikipedia Entries
  4.3 Improving Stability of Subsumption
  4.4 Building Concept Hierarchy Purely Based on Wikipedia
  4.5 Evaluation for Search Result Organization
    4.5.1 Hierarchy Stability
    4.5.2 Hierarchy Quality
  4.6 Chapter Summary
5 Conclusion
  5.1 Research Summary
  5.2 Significance of the Thesis
  5.3 Future Directions
Bibliography
List of Figures
1.1 Typical procedure of session search.
1.2 Retrieved documents by Lemur (TREC 2011 Session 25). The top document only describes the symptoms and treatments for communicable diseases, which is not relevant to the session topic "collagen vascular disease".
1.3 Search result clustering (SRC) hierarchies by Yippy (TREC 2010 Session 123). SRC hierarchies (a) and (b) are for the queries "diet" and "low carb diet" respectively. A low carb diet, "South Beach Diet", that should have appeared in both (a) and (b) is missing in (b); the cluster "Diet And Weight Loss" in (a) is dramatically changed in (b). Screenshot was taken at 15:51 EST, 6/15/2012 from Yippy.
3.1 A sample nugget in the TREC 2012 Session 53 query "servering spinal cord paralysis".
3.2 Words in a snippet built from the TREC 2012 Session 53 query "servering spinal cord consequenses", where "spinal" is always connected to "cord".
3.3 Words in a snippet built from the TREC 2011 Session 20 query "dooney bourke purses", where "dooney and bourke" is a brand name but the user omits the word "and".
3.4 nDCG@10 values of retrieved documents using the TREC 2011 Session track dataset. Two cases, with threshold and without threshold, are compared.
3.5 Anchor text in a web page.
3.6 Changes in nDCG@10 from RL1 to RL2 presented by the TREC 2012 Session track. Error bars are 95% confidence intervals (Figure 1 in [26]).
3.7 All results by nDCG@10 for the current query in the session for each subtask (Table 2 in [26]).
4.1 Framework overview of the Wikipedia-enhanced concept hierarchy construction system.
4.2 Mapping to a relevant Wikipedia entry. Text in circles denotes Wikipedia entries, while text in rectangles denotes concepts. Based on the context of the current search session, the entry "Gestational diabetes" is selected as the most relevant Wikipedia entry. Therefore the concept "GDM" is mapped to "Gestational diabetes", whose supercategories are "Diabetes" and "Health issues in pregnancy".
4.3 An example of Wikipedia-enhanced Subsumption. The concepts "Diabetes" and "type 2 diabetes" satisfy Eq. (4.5) and are identified as a potential subsumption pair. The reference Wikipedia entry of "Diabetes" is a category, and the reference Wikipedia entry of "type 2 diabetes" is the Wikipedia entry "Diabetes mellitus type 2". Therefore we check whether "Diabetes" is one of the supercategories of "Diabetes mellitus type 2" and confirm that "diabetes" subsumes "type 2 diabetes".
4.4 An example of Wikipedia-only hierarchy construction. From the concept "Diabetes mellitus" we find the reference Wikipedia entry "Diabetes mellitus", then we find its start category "Diabetes". Similarly, for another concept, "joslin", we find its reference Wikipedia entry "Joslin Diabetes Center" and its start category "Diabetes organizations". We then expand from these two start categories. "Diabetes organizations" is one of the subcategories of "Diabetes", thus we merge them together.
4.5 Major clusters in hierarchies built by Clusty for TREC 2010 Session 3. (a) is for the query "diabetes education" and (b) is for "diabetes education videos books".
4.6 Major clusters in hierarchies built by Wiki-only for TREC 2010 Session 3. (a) is for the query "diabetes education" and (b) is for "diabetes education videos books".
4.7 Major clusters in hierarchies built by Subsumption for TREC 2010 Session 3. (a) is for the query "diabetes education" and (b) is for "diabetes education videos books".
4.8 Major clusters in hierarchies built by Subsumption+Wiki for TREC 2010 Session 3. (a) is for the query "diabetes education" and (b) is for "diabetes education videos books".
4.9 Search result organization quality improvement vs. stability for Subsumption and Subsumption+Wiki.
4.10 Extreme case 1. A totally static hierarchy for two queries in a session (TREC 2010 Session 107).
4.11 Extreme case 2. A totally different hierarchy for two queries in a session (TREC 2010 Session 75).
List of Tables
3.1 nDCG@10 for TREC 2011 Session track RL1. The Dirichlet smoothing method is used; µ = 4000, f = 10 for the strict method and µ = 4000, f = 20 for the relaxed method. Methods are compared to the baseline (the original query). A significant improvement over the baseline is indicated with a † at the p < 0.05 level and a ‡ at the p < 0.005 level (t-test, single-tailed). The best and median runs in TREC 2011 are listed for comparison.
3.2 nDCG@10 for TREC 2011 Session track RL2. The Dirichlet smoothing method and the strict method are used; µ = 4000, f = 5 for uniform, µ = 4500, f = 5 for previous vs. current (PvC) and distance-based. Methods are compared to the baseline (the original query). A significant improvement over the baseline is indicated with a † at the p < 0.05 level and a ‡ at the p < 0.005 level (t-test, single-tailed). The best and median runs in TREC 2011 are listed for comparison.
3.3 nDCG@10 for TREC 2011 Session track RL3 and RL4. All runs use the strict method and the configuration µ = 4500, f = 5. Methods are compared to the baseline (the original query). A significant improvement over the baseline is indicated with a † at the p < 0.05 level and a ‡ at the p < 0.005 level (t-test, single-tailed). The best and median runs in TREC 2011 are listed for comparison.
3.4 Methods and parameter settings for the TREC 2012 Session track. µ is the Dirichlet smoothing parameter; f is the number of pseudo-relevance feedback documents.
3.5 nDCG@10 for the TREC 2012 Session track. The mean of the medians of the evaluation results in TREC 2012 is listed.
3.6 AP for the TREC 2012 Session track. The mean of the medians of the evaluation results in TREC 2012 is listed.
4.1 Statistics of the TREC 2010 and TREC 2011 Session track datasets.
4.2 Stability of search result organization for TREC 2010 Session queries. Approaches are compared to the baseline, Subsumption. A significant improvement over the baseline is indicated with a † at p < 0.05 and a ‡ at p < 0.005 (t-test, single-tailed).
4.3 Stability of search result organization for TREC 2011 Session queries. Approaches are compared to the baseline, Subsumption. A significant improvement over the baseline is indicated with a † at p < 0.05 and a ‡ at p < 0.005 (t-test, single-tailed).
Chapter 1
Introduction
1.1 Motivation
Complicated search tasks, such as planning a trip, buying a product, or looking for
a good elementary school, are common in daily life. These tasks often contain
multiple sub-topics and thus require more than one query. A user usually interacts
with a search engine when performing such tasks; these interactions form a session.
Session search is the task of document retrieval within a session.

Major Web search engines, including Google and Bing, return a list of documents
ranked in decreasing order of relevance to a single query. However, this representation
may not fully satisfy users: a complicated information need may contain multiple
sub-topics, so documents relevant to different sub-topics may be mixed together in
the returned list. If a search engine organizes the search results into a hierarchical
representation that explicitly shows the sub-topics emerging in the documents, a user
will probably be able to locate the needed information more easily and efficiently.
For example, if a user is preparing an article about the "Pocono Mountains region",
he or she would want to search for information about many sub-topics of the region,
such as national parks, resorts, and shopping. It is not easy to retrieve information
about all these aspects with one query. Consequently, the user may begin with the
query "pocono mountain region" and then turn to queries like "pocono mountains
region things to do", "pocono mountains region activities", and "pocono mountains
region national park" to search the sub-topics. Because the queries are about
different sub-topics, the user may expect a system that organizes the relevant
documents into a hierarchical structure, placing documents about different sub-topics
in different groups such as "activities" or "national park".

Figure 1.1: Typical procedure of session search.
1.2 Session Search
Session search is a field devoted to finding documents relevant to a session. A system
that supports session search accepts an entire session, which includes a series of
previous queries with their corresponding search results and a current (last) query,
and retrieves documents relevant to the topic of the session.
1.2.1 Overview
Figure 1.1 shows a typical procedure for session search. The interaction between a
user and a search engine can be represented as a session. The session contains a series
of previous queries q1, q2, · · · , qn−1, each associated with a set of relevant documents,
i.e., the previous results D1, D2, · · · , Dn−1, and a current (last) query qn. The search
engine usually formulates a query that represents the entire session. It then applies
retrieval models to the formulated query over an indexed corpus to retrieve relevant
documents and presents them to the user. The user may be satisfied with the search
results and finish the procedure, or may be unsatisfied and modify the session to
re-retrieve documents.
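The session just described can be represented by a small data structure; the class and field names below are my own, for illustration, not the thesis's notation:

```python
from dataclasses import dataclass

@dataclass
class Session:
    """A session: previous queries q_1..q_{n-1}, each paired with its
    result list D_1..D_{n-1}, plus the current (last) query q_n."""
    previous_queries: list   # ["q_1", ..., "q_{n-1}"]
    previous_results: list   # [D_1, ..., D_{n-1}], one doc-id list per query
    current_query: str       # q_n, the retrieval target

    def all_queries(self):
        # query formulation typically draws on the whole sequence q_1..q_n
        return self.previous_queries + [self.current_query]

session = Session(
    previous_queries=["pocono mountain region",
                      "pocono mountains region things to do"],
    previous_results=[["doc12", "doc7"], ["doc7", "doc31"]],
    current_query="pocono mountains region activities",
)
print(session.all_queries()[-1])  # prints the current query q_n
```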
This thesis studies two crucial components in this procedure: query formulation
and search result organization.
1.2.2 Query Formulation
Query formulation is important because the retrieval model directly relies on the
formulated query. The system can retrieve more relevant documents if the formulated
query represents the topic of the session more accurately. Structured query
formulation, such as combining terms, assigning weights to terms, or expanding
queries, focuses on the underlying meanings of queries. Structured queries identify
the concepts in queries and emphasize the important concepts as individual atoms.
In other words, structured queries express user intentions more precisely, and thus
retrieve relevant documents more effectively.
In the example in Section 1.1, the session contains multiple concepts: "pocono
mountains region", "things to do", "activities", and "national park". Since "pocono
mountains region" appears in every query, it is probably much more important than
the others. Therefore, the search engine can build a structured query that assigns a
higher weight to "pocono mountains region" to express the importance of that
concept.
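As a concrete sketch, such a weighted structured query can be written in the Indri query language, which the Lemur toolkit supports (`#weight` for weighted combination, `#1` for an ordered phrase window); the weight values below are illustrative, not the tuned parameters of this thesis:

```python
def build_structured_query(weighted_concepts):
    """Build an Indri-style #weight query from (concept, weight) pairs.

    Multi-word concepts become ordered-window operators (#1) so the
    retrieval model treats them as atomic phrases rather than bags of words.
    """
    parts = []
    for concept, weight in weighted_concepts:
        terms = concept.split()
        atom = terms[0] if len(terms) == 1 else "#1(" + " ".join(terms) + ")"
        parts.append(f"{weight} {atom}")
    return "#weight( " + " ".join(parts) + " )"

# "pocono mountains region" recurs in every query, so it gets a higher weight;
# the sub-topic weights are made-up values for illustration.
query = build_structured_query([
    ("pocono mountains region", 0.7),
    ("activities", 0.15),
    ("national park", 0.15),
])
print(query)
# #weight( 0.7 #1(pocono mountains region) 0.15 activities 0.15 #1(national park) )
```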
1.2.3 Search Result Organization
A clear organization of search results gives users an overview of the results and
may help them discover further information needs effectively. Since the results
of a session search often contain multiple aspects, a search engine serves the user
better if it applies search result clustering to organize the results into hierarchies,
which we call SRC hierarchies. SRC hierarchies support better information access by
improving the display of information: search results are presented in a "lay of the
land" format, which groups similar results together and reveals important concepts
in lower-ranked results.
In the example in Section 1.1, an SRC hierarchy is appropriate for organizing the
search results because the documents relevant to the topic contain multiple sub-topics,
and some sub-topics can be further divided. For example, the query "pocono
mountains things to do" may return documents that can be divided into more
detailed groups such as "hiking" or "camping".
1.3 Challenges
The complexity of session search poses great challenges to researchers, especially in
the two crucial components of the session search procedure: query formulation and
search result organization.
Figure 1.2: Retrieved documents by Lemur (TREC 2011 Session 25). The top document only describes the symptoms and treatments for communicable diseases, which is not relevant to the session topic "collagen vascular disease".
1.3.1 Challenges in Query Formulation for Session Search
Words within a query may form phrases that express coherent meanings, or concepts.
A word group may describe a topic different from that of any single word, and the
words in the group may be more important than the rest. Furthermore, a session
contains multiple queries, some of which are more important than others for
expressing the topic of the session. If a search engine treats all the words in a session
individually and identically, it may rank documents relevant to individual words
highly. However, documents relevant to single words are not necessarily relevant to
the topic of the query, which may decrease search accuracy.
Figure 1.2 shows an example of directly submitting all the words of the queries in a
session to Lemur1, a powerful search engine. The session is composed of three queries:
"collagen vascular disease causes symptoms treatments effects", "CVD causes
symptoms treatments", and "collagen vascular disease causes symptoms treatments".
As we can see, the search engine processes "collagen", "vascular", and "disease" as
separate words. Moreover, the common words "disease", "symptoms", and
"treatments", which occur repeatedly in all the queries, heavily bias the search results,
giving high ranks to documents about the "symptoms" and "treatments" of other
"diseases". Consequently, relevant documents about the topic "collagen vascular
disease" do not appear in the top retrieval results.
Session search thus poses two major challenges: (1) how to identify word groups that
express coherent unit meanings, that is, concepts, in queries within a session; and
(2) how to formulate these word groups into a structured query according to their
importance.
1.3.2 Challenges in Result Organization for Session Search
SRC hierarchies (see an example in Figure 1.3) are suitable for organizing the search
results of a regular search. However, most SRC hierarchies created by state-of-the-art
algorithms are overly sensitive to minor query changes, regardless of whether the
queries are similar and belong to the same session. Such minor query changes often
occur within a session: about 38.6% of adjacent queries in the TREC 2010-2011
Session tracks show only a one-word change and 26.4% show a two-word change.
Figure 1.3 shows hierarchies generated by Yippy2 for the adjacent queries "diet" and
"low carb diet". The second query, "low carb diet", is a specification of the first. We
1 http://www.lemurproject.org/, version 5.0
2 http://www.yippy.com
Figure 1.3: Search result clustering (SRC) hierarchies by Yippy (TREC 2010 Session 123). SRC hierarchies (a) and (b) are for the queries "diet" and "low carb diet" respectively. A low carb diet, "South Beach Diet", that should have appeared in both (a) and (b) is missing in (b); the cluster "Diet And Weight Loss" in (a) is dramatically changed in (b). Screenshot was taken at 15:51 EST, 6/15/2012 from Yippy.
observe many changes between the two SRC hierarchies, (a) and (b). Overall,
hierarchies (a) and (b) share only four common words ("weight", "loss", "review", and
"diet") and no common pairwise relations. This is a very low overlap given that the
two queries are closely related and within the same session.

The dramatic change, a.k.a. instability, of SRC hierarchies in session search
weakens their ability to serve as an information overview. With rapidly changing
SRC hierarchies, users may perceive them as random search result organizations and
find it difficult to re-find relevant documents identified for previous queries. We argue
that although SRC hierarchies need not be static, while making changes they should
maintain the basic topics and structure across the entire session.

Ideally, SRC hierarchies should not only be closely related to the current query
and its search results but also reflect changes in adjacent queries to the right degree
and at the right places. In this work, we address this new challenge of producing
stable SRC hierarchies for session search.
1.4 TREC Session Tracks
The National Institute of Standards and Technology (NIST) held TREC Session
tracks [24, 25, 26] for three years, from 2010 to 2012. The TREC Session tracks aim
to test whether IR systems can improve search accuracy with the assistance of
previous queries and the corresponding user interactions. The session data is
composed of sequences of queries q1, q2, · · · , qn−1, and qn, with only the current (last)
query qn being the subject of retrieval; q1, q2, · · · , qn−1 are called the previous queries.

NIST invited faculty, staff, and students at the University of Sheffield as users to
generate session queries. In addition to these queries, NIST provided a user interaction
for each previous query in a session. A user interaction contained a ranked document
list that was retrieved for a previous query and user-click information, including the
click order, start time, and end time.

The TREC participants (we are one of them) were requested to submit their
retrieval results as ranked document lists. NIST assessors evaluated the submissions,
and TREC released official evaluation results every year.
1.5 Our Approaches
In this work, we tackle the challenges in two crucial components of session search:
query formulation and search result organization. We formulate structured queries
for sessions to improve search accuracy, and we propose to build stable, high-quality
SRC hierarchies for session search.
1.5.1 Structured Query Formulation for Session Search
Observation shows that a query often contains phrases that describe a coherent
meaning as a group. For example, the query "russian politics kursk submarine"
(TREC 2012, Session 18) contains two phrases, "russian politics" and "kursk
submarine", each of which expresses a concept and cannot be split. Phrases are
usually more related to the topic of a session and thus more important than single
words. A structured query can represent the phrases in a query. Therefore, we focus
on formulating effective structured queries for search tasks within a session.
In order to represent phrases, we introduce the nugget, a substring of a query whose
terms frequently occur together. We propose to identify nuggets by examining, in
the pseudo-relevance feedback, the distance between terms that are adjacent in a
query. Two rules, named strict and relaxed, are applied when calculating the term
distance.
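The nugget idea can be sketched as follows. The whitespace tokenization, the strict-style "close in every snippet" rule, and the `max_gap` threshold are simplifying assumptions for illustration; the precise strict and relaxed rules are defined in Chapter 3:

```python
def find_nuggets(query_terms, snippets, max_gap=2):
    """Merge adjacent query terms into a nugget when, across the
    pseudo-relevance-feedback snippets, they co-occur within a small
    positional window, i.e., they behave like a phrase."""
    def close_in(snippet, a, b):
        toks = snippet.lower().split()
        pos_a = [i for i, t in enumerate(toks) if t == a]
        pos_b = [i for i, t in enumerate(toks) if t == b]
        return any(0 < j - i <= max_gap for i in pos_a for j in pos_b)

    nuggets, current = [], [query_terms[0]]
    for a, b in zip(query_terms, query_terms[1:]):
        # strict-style rule: the pair must sit close together in every
        # snippet containing both terms (a relaxed rule would average)
        hits = [s for s in snippets
                if a in s.lower().split() and b in s.lower().split()]
        if hits and all(close_in(s, a, b) for s in hits):
            current.append(b)
        else:
            nuggets.append(current)
            current = [b]
    nuggets.append(current)
    return [" ".join(n) for n in nuggets]

snippets = [
    "injury to the spinal cord may cause paralysis",
    "the spinal cord carries signals; paralysis can follow damage",
]
print(find_nuggets(["spinal", "cord", "paralysis"], snippets))
# → ['spinal cord', 'paralysis']
```

"spinal" and "cord" are merged because they are adjacent in both snippets, while "paralysis" floats free of "cord" and stays a single term.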
We can generate a set of terms and nuggets from every query in a session. However,
the importance of these queries is not identical. We combine the terms and nuggets
from every query into one structured query using different aggregation schemes.
We compare three schemes: uniform, previous vs. current, and distance-based. The
schemes are designed based on the order of queries in a session.
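The three schemes can be read as weight assignments over the query sequence q1 .. qn; the particular values of current_weight and decay below are illustrative assumptions, not the settings evaluated in Chapter 3:

```python
def aggregation_weights(n, scheme, current_weight=0.7, decay=0.5):
    """Return a weight for each of the n queries q_1..q_n in a session.

    uniform:  all queries count equally.
    pvc:      previous vs. current; q_n gets a fixed high weight and the
              previous queries share the remainder equally.
    distance: weight decays with distance from the current query, so
              later queries matter more.
    """
    if scheme == "uniform":
        return [1.0 / n] * n
    if scheme == "pvc":
        if n == 1:
            return [1.0]
        prev = (1.0 - current_weight) / (n - 1)
        return [prev] * (n - 1) + [current_weight]
    if scheme == "distance":
        raw = [decay ** (n - i) for i in range(1, n + 1)]
        total = sum(raw)
        return [w / total for w in raw]
    raise ValueError(f"unknown scheme: {scheme}")

print(aggregation_weights(3, "uniform"))   # equal weights
print(aggregation_weights(3, "pvc"))       # previous queries share 0.3, current gets 0.7
print(aggregation_weights(3, "distance"))  # weights increase toward the last query
```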
9
Our approach also includes query expansion and document re-ranking. The top
k terms in the anchor texts of the pseudo-relevance feedback are extracted to expand
the structured query. We then re-rank the retrieved documents by comparing them to
the clicked documents in the user interactions, using the dwell times as weights.
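A simplified reading of this re-ranking step, with Jaccard overlap standing in for whatever document similarity is actually used and a linear combination as an assumed scoring rule (the thesis's exact formulation appears in Chapter 3):

```python
def rerank(retrieved, clicked, alpha=0.5):
    """Boost retrieved documents that resemble previously clicked ones,
    weighting each clicked document by its share of the total dwell time.

    retrieved: list of (doc_tokens, retrieval_score)
    clicked:   list of (doc_tokens, dwell_seconds)
    """
    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    total_dwell = sum(d for _, d in clicked) or 1.0
    rescored = []
    for doc, score in retrieved:
        boost = sum((d / total_dwell) * jaccard(doc, cdoc)
                    for cdoc, d in clicked)
        rescored.append((doc, score + alpha * boost))
    return sorted(rescored, key=lambda x: x[1], reverse=True)

reranked = rerank(
    retrieved=[(["low", "carb", "diet"], 1.2), (["diet", "pills"], 1.3)],
    clicked=[(["low", "carb", "diet", "plan"], 45.0)],
)
# the document similar to the long-dwell click now outranks the
# slightly higher-scored one
print([doc for doc, _ in reranked])
```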
1.5.2 Stable Search Result Organization by Exploiting External
Knowledge
External knowledge sources such as Wikipedia and WordNet are compiled manually;
therefore, they are widely used to enhance automatic information retrieval. Correct
relations between concepts are crucial for generating high-quality SRC hierarchies.
We apply external knowledge as a reference to build relations between concepts. We
choose Wikipedia in this work because it contains extensive definitions of concepts
and relations represented by links and categories. Wikipedia is used in two ways:
(1) fixing incorrect relations generated by an existing approach, which we name
Subsumption+Wiki; (2) extracting category information to build the concept
hierarchies directly, which we name Wiki-only.
The issue of unstable SRC hierarchies may arise for various reasons, of which the
most significant is the popular bottom-up clustering strategy. In contrast, monothetic
concept hierarchy approaches first extract labels (or concepts) from the retrieved
documents and then organize these concepts into hierarchies. Since the labels are
obtained before the clusters are formed, they are not derived from the clusters.
Monothetic concept hierarchy approaches hence produce more stable hierarchies
than clustering approaches. Therefore, we build our system on the monothetic
concept hierarchy approach.
In both methods that exploit Wikipedia, we extract a set of concepts from the
document set, i.e., the search results. In the first, we apply an existing approach to
draw possible parent-child relations between pairs of concepts; we then identify the
locations of each pair of concepts in the Wikipedia category network and filter out
incorrect relations. In the second, for each concept we identify the most relevant
Wikipedia page and extract its category structure. The category structures of all
concepts are merged to build the SRC hierarchies.
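The first use of Wikipedia, filtering proposed relations against the category network, can be sketched against a toy category table; the SUPERCATEGORIES dictionary below is a hard-coded stand-in for real Wikipedia category lookups, and the function name is hypothetical:

```python
# Toy stand-in for Wikipedia's category network: entry -> supercategories.
# A real system would query Wikipedia for these.
SUPERCATEGORIES = {
    "Diabetes mellitus type 2": ["Diabetes"],
    "Gestational diabetes": ["Diabetes", "Health issues in pregnancy"],
}

def confirm_subsumption(parent_concept, child_entry):
    """Subsumption+Wiki-style check: keep a proposed parent-child relation
    only if the parent concept matches one of the supercategories of the
    child's reference Wikipedia entry; otherwise filter it out."""
    supers = SUPERCATEGORIES.get(child_entry, [])
    return any(parent_concept.lower() == s.lower() for s in supers)

print(confirm_subsumption("diabetes", "Diabetes mellitus type 2"))  # True
print(confirm_subsumption("symptoms", "Diabetes mellitus type 2"))  # False
```

The second check mirrors Figure 4.3: a spurious pair such as ("symptoms", "type 2 diabetes"), which term co-occurrence alone might propose, fails the category test and is dropped.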
1.6 Contributions of this Thesis
This thesis focuses on improving search accuracy for session search and building
stable SRC hierarchies for queries in a session. By combining the nugget approach
and aggregation schemes, a structured query represents the topic of a session more
accurately. In addition, our approach integrates external knowledge into a monothetic
concept hierarchy algorithm and significantly increases the stability of SRC hierarchies
without loss of quality. The specific contributions are: 1) we propose an approach
that introduces the concept of a nugget to formulate a session into a structured
query; 2) we propose an efficient method to predict the window size for a nugget;
3) we present two effective approaches that organize search results into SRC
hierarchies of high stability and high quality; 4) we evaluate the stability of concept
hierarchies built by monothetic concept hierarchy approaches and by clustering
approaches over the datasets of the TREC Session tracks.
We propose to formulate a structured query to represent the topic of a session
precisely. We try to find the phrases in queries that express atomic meanings.
In particular, we introduce the concept of a nugget, a phrase-like substring of a
query. Based on nugget identification, we propose an effective approach to generate
a structured query from a session. Evaluation indicates that nuggets increase the
accuracy of session search. Moreover, we propose an efficient relaxed method to
predict an appropriate window size for a nugget according to the average distance
between two terms in the pseudo-relevance feedback. Experiments show that the
relaxed method gives an advantage on the single-query task over the traditional
method of examining n-grams.
Furthermore, we study three aggregation schemes for the multiple queries in a
session. A session contains multiple queries, from which we can obtain a set of
nuggets. The queries may differ in importance, hence some nuggets may play more
important roles than others. We find that the last query is commonly more important
than the earlier ones.
This thesis further studies result organization for session search. Search result
organization gives the user an overview of the relevant documents, which helps
the user locate the needed information rapidly. We present a novel framework based
on the monothetic concept hierarchy approach, which shows advantages in terms of
stability over the popular organization approaches, most of which are based on
hierarchical clustering. Our algorithm dynamically maps the concepts to Wikipedia
entries and generates the hierarchical structure, which can extract Wikipedia category
structures about a specific topic efficiently.
We are the first to evaluate the stability of concept hierarchies built by monothetic
concept hierarchy approaches and by clustering approaches. Moreover, we are the first
to integrate external knowledge into a monothetic concept hierarchy approach, to
correct the erroneous parent-child relationships between concepts. The results indicate
that our approach improves the quality of the hierarchies.
1.7 Outline
The rest of this thesis is organized as follows. Chapter 2 discusses the related work.
Chapter 3 presents the methods of generating an effective structured query from
the session data. Chapter 4 presents the enhancement of result organization for
session search by integrating the Wikipedia category structure. Chapter 5 summarizes
the thesis and describes possible directions for future work.
Chapter 2
Related Work
This chapter reviews the work related to this thesis. The related work includes
the submissions to the TREC Session tracks, query formulation, and search result
organization.
2.1 Session Search and TREC Session Tracks
In TREC 2011 and TREC 2012 Session tracks [25, 26], a session contained multiple
queries q1, q2, · · · , qn−1, qn, and the user interactions such as the previous search results
and click information. Four subtasks were requested:
• RL1. Only using the current query qn.
• RL2. Including the previous queries q1, q2, · · · , qn−1 and the current query qn.
• RL3. Including top retrieved documents for previous queries.
• RL4. Considering additional information about which top results are clicked by
users.
The ClueWeb09 collection1 is used as the corpus in the TREC Session tracks. Participants
are allowed to use only the first 50 million documents of ClueWeb09, named
"Category B" or "CatB", as the corpus; however, they are then evaluated as if they
were using the entire collection (named "Category A" or "CatA").
1http://www.lemurproject.org/clueweb09.php
Twenty teams participated in the TREC Session tracks over three years [22, 6, 45,
32, 28, 19, 23, 13, 33, 2, 15, 30, 50]. Evaluation results showed significant improvements
from the first subtask to the last one in most submissions. These results indicate
that considering all session information contributes to search accuracy.
Jiang et al. [22, 23] applied the sequential dependence model (SDM) [36] as the
basic retrieval model. SDM features, including all single terms, ordered phrases, and
unordered phrases, were extracted from the query. The features were then incorporated
into the Lemur system by using the Indri query language. The session historical
query model (SH-QM) was used when including previous queries. For each SDM
feature, a weight was assigned by linearly combining the frequencies of this feature in
the current and previous queries. After introducing previous search results in RL3 and
RL4, the authors applied the pseudo-relevance feedback query model (PRF-QM) on the
single-term features. The weight of a single-term feature was adjusted by its term
frequency in the pseudo-relevance feedback. In RL3, the top 10 ranked Wikipedia
documents served as the pseudo-relevance feedback, while in RL4, the clicked documents
together with their snippets were considered the pseudo-relevance feedback. Furthermore,
Jiang et al. introduced document novelty to adjust document scores in
retrieval: scores of the documents previously clicked by the user were lowered based
on their ranks in previous search results. Jiang et al. achieved the top rank in the TREC
2011 and TREC 2012 Session tracks.
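The SDM query structure described above can be sketched as follows. This is a minimal illustration, not Jiang et al.'s implementation; the weights (0.85/0.1/0.05) and the unordered window size (#uw8) are the commonly cited SDM defaults, assumed here for concreteness.

```python
def sdm_query(terms, w_t=0.85, w_o=0.1, w_u=0.05):
    """Build an SDM-style Indri query from query terms: single terms,
    ordered bigram windows (#1), and unordered windows (#uw8)."""
    singles = " ".join(terms)
    bigrams = [f"{a} {b}" for a, b in zip(terms, terms[1:])]
    ordered = " ".join(f"#1({bg})" for bg in bigrams)
    unordered = " ".join(f"#uw8({bg})" for bg in bigrams)
    return (f"#weight( {w_t} #combine({singles}) "
            f"{w_o} #combine({ordered}) "
            f"{w_u} #combine({unordered}) )")

print(sdm_query(["spinal", "cord", "injury"]))
```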
Albakour et al. from the University of Essex [34] utilized anchor texts to expand
queries. The anchor log file provided by the University of Twente2 was used as the
reference to find terms topically similar to the session. First, stop words were removed
from the queries in a session. They then searched the anchor log for lines containing
any of these queries, and the terms in those lines were extracted to expand the queries.
The anchor text approach was proven effective and was adopted by other teams, such
as the BUPT team [31].
2http://wwwhome.cs.utwente.nl/hiemstra/2010/anchor-text-for-clueweb09-category-a.html
The Nootropia model [38] was applied in another approach proposed by the University
of Essex [34]. The authors built a Nootropia network based on previous search results
and then re-ranked the documents retrieved for the current query. They experimented
with two opposite strategies. The "positive" one assumed that previous search
results in a session were relevant to the topic of the session, so the documents with
higher Nootropia scores were ranked higher in the final results. The "negative" one
made the opposite assumption, that previous search results in a session dissatisfied the
user who submitted the session, and hence the documents with higher Nootropia scores
were ranked lower in the final results. Evaluation indicated that the "positive"
strategy was more valid than the "negative" one.
The CWI team [17] presented a discount rate model for the previous queries.
They assumed two classes of users: "good" users and "bad" users. A "good" user
learned from previous search results in a session to generate a high-quality query,
so that the current query in a session expressed the topic of the session
precisely. On the contrary, a "bad" user failed to adjust queries to fit the topic of a
session; consequently, for such a user all previous queries had equal value in representing
the topic of the session. Based on this assumption, a session submitted by a "better" user
received a more discounted rate for its previous queries. When only considering the
queries in the session, the authors used the average number of interactions over all
sessions as the standard to determine whether a session was submitted by a "good"
user or a "bad" user. A session submitted by a "good" user was supposed to finish
within the average number of interactions, while a session submitted by a "bad" user
was supposed to contain more interactions than the average. After adding
the information about the previous search results, the average adjacent interaction
overlap over all sessions became the standard for differentiating sessions submitted by
"good" users from those by "bad" users. The authors assumed that a session submitted
by a "better" user would have less overlap between its adjacent interactions.
The BUPT team [31] exploited the dwell time of documents clicked by users. They
built a reference document set which contained all clicked documents in a session.
The dwell time of every document in the set was then transformed into attention time by
an exponential decay function with respect to the rank of the document. Next, they
predicted the attention time for every retrieved document based on its cosine similarity
to the reference document set. Finally, the authors re-ranked the retrieved documents
according to their predicted attention time.
Most teams modified the retrieval models and used query expansion [22, 23, 33, 2,
30, 50] to fit session search. However, they did not apply query formulation to generate
structured queries. Structured queries can represent phrases, which emphasize
the important terms in a query. We propose to build structured queries for session
search in our work.
2.2 Query Formulation
The process of query formulation modifies the original query submitted by a user [8].
The goal is to understand the user intention underlying the query more accurately.
Query formulation includes spelling correction, term proximity, etc.
As a crucial component of search engines, spelling correction has been studied
thoroughly [29, 20, 10]. For example, Li et al. proposed a generalized hidden Markov
model to correct query spelling errors [29]. They divided spelling errors into six
types and, for every type, designed a rule to fix the error. Each word in the query
submitted by a user was classified into one type based on the Markov model. The
parameters of the Markov model are trained using manually corrected documents.
Many structured query formulation approaches were based on n-grams, defined as
continuous terms of length n [3, 37, 42]. For example, Bendersky
et al. focused on optimizing the weights of concepts in a query [3]. The authors first
extracted bi-grams from a query as concept candidates. They then consulted multiple
information sources, such as ClueWeb09 and Wikipedia, concurrently to evaluate the
relatedness between a bi-gram and the query. With this evaluation, they filtered out
meaningless bi-grams and assigned a weight to each of the remaining bi-grams, i.e.,
the concepts. Finally, a structured query consisted of the concepts associated with
their weights.
Mishne et al. applied proximity terms to web retrieval. They extracted n-grams
from a query and then experimented with multiple ways to define a term frequency (tf)
and an inverse document frequency (idf) for an n-gram. For example, idf could be
defined as the minimum or maximum idf of the terms in a group. Finally, they applied
a traditional tf-idf retrieval model with the extended tf's and idf's for n-grams.
Zhao and Callan tried to identify term mismatches and fix them by expanding
the query using boolean conjunctive normal form (CNF) [51]. CNF queries contain the
operators "AND" and "OR" to describe relations between terms in a query. The authors
experimented with two measurements, highest inverse document frequency and lowest
probability of a term in the pseudo-relevance feedback, to diagnose mismatched terms.
Evaluation showed that using the probability of a term in the pseudo-relevance
feedback was more accurate. After identifying mismatched terms, they used a set of
manually built CNF queries to expand the original query.
Huston and Croft detected key concepts in queries with a classifier [18]. The features
used in the classifier included term frequency, inverse document frequency, residual
inverse document frequency, weighted information gain, n-gram term frequency, and
query frequency. The classifier was trained using the GOV2 dataset.
The approaches using n-grams can effectively represent phrases in a query. However,
some phrases have multiple forms. For example, both "pull out a book" and "pull
it out" contain the phrase "pull out". The n-gram method is therefore sometimes too
strict: if we only identify continuous terms, we may miss some relevant documents.
Boolean conjunctive normal form, on the contrary, can hardly represent phrases. A
classifier can precisely detect key concepts in queries given a good training dataset;
however, it is not easy to find a training dataset that fits all queries. We propose a
relaxed method, which relies on the query itself, to predict window sizes for nuggets
and then build structured queries.
2.3 Search Result Organization
Meta search engines such as Clusty (now Yippy) employ search result clustering (SRC)
[1, 5, 41] to automatically organize search results into hierarchical clusters. There are
two strategies for clustering the search results: hierarchical clustering and monothetic
concept hierarchy construction. Furthermore, external knowledge bases are increasingly
exploited to improve the quality of clustering. The remainder of this subsection
discusses the related work on hierarchical clustering, Subsumption, and exploiting
external knowledge.
2.3.1 Hierarchical Clustering
Most search result organization systems adopt clustering-based approaches, which share
a common scheme that first clusters similar documents and then assigns labels
to the clusters [5, 9]. Clustering-based approaches often produce non-interpretable
clusters and semantically ill-formed hierarchies due to their data-driven nature and poor
cluster labeling. Even in the best-known commercial clustering-based search engine,
Clusty (now Yippy), which presents search results in hierarchical clusters and labels
the clusters with variable-length sentences, cluster labeling remains a challenging issue.
2.3.2 Subsumption
Monothetic concept hierarchy approaches build concept hierarchies differently. They
avoid cluster labeling by first extracting concepts from documents and then organizing
the concepts into a hierarchy where each concept attaches to the subset of documents
containing it. Hence, for document sets about similar topics, monothetic concept
hierarchy approaches usually generate hierarchies with more stable nodes, which are
concepts extracted from the entire document set.
The Subsumption approach [39] is a classic and state-of-the-art monothetic concept
hierarchy approach. It builds browsing hierarchies based on conditional probability.
The query is expanded by Local Context Analysis [46], and the expanded query is
used to retrieve documents. Terms with a high ratio of occurrence in the retrieved
documents to occurrence in the collection are added to the concept set, which initially
consists of the query terms. For a term pair (x, y) in the concept set, x is said to
subsume y if P(x|y) ≥ 0.8 and P(y|x) < 1.
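The subsumption test above can be sketched over document sets as follows. This is a minimal illustration of the conditional-probability rule, with toy documents; it is not the full Subsumption pipeline (query expansion and concept selection are omitted).

```python
def subsumes(x, y, docs, t=0.8):
    """x subsumes y if P(x|y) >= t while P(y|x) < 1, where probabilities
    are estimated from co-occurrence over the document set."""
    dx = {i for i, d in enumerate(docs) if x in d}  # docs containing x
    dy = {i for i, d in enumerate(docs) if y in d}  # docs containing y
    if not dx or not dy:
        return False
    p_x_given_y = len(dx & dy) / len(dy)
    p_y_given_x = len(dx & dy) / len(dx)
    return p_x_given_y >= t and p_y_given_x < 1

# "animal" appears in every document that mentions "dog", but not vice versa,
# so "animal" is placed as the parent of "dog".
docs = [{"animal", "dog"}, {"animal", "cat"}, {"animal"}, {"dog", "animal"}]
print(subsumes("animal", "dog", docs))
```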
2.3.3 Exploiting External Knowledge
Computer scientists have used Wikipedia to improve their research because Wikipedia
is compiled by thousands of experts from all over the world [16, 44, 4, 40, 27]. Not
only the texts but also the links and the categories in Wikipedia are widely
exploited.
Carmel et al. improved cluster labeling accuracy by using labels from Wikipedia
[4]. They first look for a set of terms that maximizes the Jensen-Shannon divergence
(JSD) distance between the specific cluster and the entire corpus. They then search
Wikipedia with these terms for a list of documents, from which the titles and
corresponding categories are picked as candidate cluster labels. After that, they rank
the candidates by a Mutual Information judgment and a Score Propagation judgment.
The results show high-quality labels.
Han et al. organized search results by topic by leveraging the knowledge provided
by Wikipedia links [16]. They chose Wikipedia concepts from the link words
in the retrieved Wikipedia documents. A semantic graph was then built based
on the semantic relatedness between these Wikipedia concepts, and the graph was
divided into communities according to the internal link density. The terms in a
community represent the subtopics of the query. Finally, the search results for the
query were assigned to the communities by comparing their similarity to the communities.
Wang et al. [44] constructed a thesaurus of concepts from Wikipedia and used this
thesaurus to improve text classification. They used an out-link category-based
measure to help decide whether two articles were related. The out-link categories
of an article were defined as the categories to which the articles out-linked from the
original one belong. Two articles were more closely related if their out-link categories
overlapped more. They reported significant improvement in text classification
from introducing the out-link category-based measure.
Many SRC hierarchy construction approaches are data-driven, such as the widely
used hierarchical clustering algorithms. These algorithms first group similar documents
into clusters and then label the clusters as hierarchy nodes. The multiple aspects in
textual search results often yield mixed clusters, which reduce the stability
of SRC hierarchies. Moreover, when clustering algorithms build clusters bottom-up,
small changes in leaf clusters propagate to upper levels and amplify the instability.
Furthermore, hierarchy labels are automatically generated from the documents in a
cluster, which is often data-sensitive, so SRC hierarchies can look even more unstable.
Monothetic concept hierarchy approaches usually generate hierarchies with stable
nodes. However, they often produce hierarchies short of semantic meaning because
only term frequencies, not meanings, are taken into account. This work fills the gap
by exploiting external knowledge to correct the relations between concepts.
Chapter 3
Effective Structured Query Formulation for Session Search
In session search, a user feeds a session into a search engine. The session includes
a series of previous queries q1, q2, · · · , qn−1 with corresponding previous results
D1, D2, · · · , Dn−1, and a current/last query qn. All the queries share a common
underlying topic, the session topic. The search engine is expected to retrieve
documents relevant to the session topic.
Each query in a session is composed of terms and phrases. In order to represent
the topic of a session precisely, we extract phrases from each query and combine the
words and phrases from all queries when performing document retrieval. The Indri
query language1 supports complex queries such as proximity terms and combined
beliefs, which benefits building structured queries from all queries in a session. In
this work, we further expand structured queries with anchor texts. Anchor texts are
texts in a document, each of which is associated with a link to another document.
The research reported in this chapter has been published in the Proceedings of the
21st Text REtrieval Conference (TREC 2012) [13].
3.1 Identifying Nuggets and Formulating Structured Queries
In a query, several words sometimes bundle together as a phrase to express a coherent
meaning. We identify phrase-like text nuggets and formulate them into Lemur queries1
for retrieval. Nuggets are substrings of a query, similar to phrases but not necessarily
as semantically coherent as phrases.
1http://www.lemurproject.org/, version 5.0

Figure 3.1: A sample nugget in the TREC 2012 session 53 query "servering spinal cord paralysis".

Figure 3.1 shows an example of a nugget. The words "spinal" and "cord" often occur
together to represent a specific concept. We observe that a valid nugget appears
frequently in the top returned snippets for a query. Hence, we identify nuggets to
formulate new structured queries in the Lemur query language. In particular, we look
for nuggets in the top s snippets returned by Lemur for a query q. Nuggets are
identified by two methods, a strict one and a relaxed one, as described below.
3.1.1 The Strict Method
First, a query is represented as a word list q = w1w2 · · ·wn. We send this word list
to Lemur and retrieve the top s snippets over an inverted index built for ClueWeb09
CatB. Then all snippets are concatenated into a reference document R.
For every bi-gram in q, we count its occurrences in R. The occurrence of a bi-gram
is normalized by the smaller occurrence count of the two words in the bi-gram. A
bi-gram is marked as a nugget candidate if its normalized occurrence exceeds a
threshold, as shown in Eq. (3.1):

count(wi wi+1; R) / min(count(wi; R), count(wi+1; R)) ≥ θ    (3.1)
Figure 3.2: Words in a snippet built from TREC 2012 session 53 query "servering spinal cord consequenses", where "spinal" is always connected to "cord".
where count(x; R) denotes the occurrence of x in the reference document R, wi and
wi+1 are adjacent words in the query, and θ is the threshold, which is tuned to 0.97
over all of the TREC 2011 session data. For example, in the TREC 2012 session 53
query "servering spinal cord consequenses", we identify the bi-gram "spinal cord" as
a candidate.
Bi-grams can connect to form longer n-grams. For instance, consider the query
"hawaii real estate average resale value house or condo news" in TREC 2011 session
11. We discover that "hawaii real" and "real estate" are both marked as nugget
candidates, so they can be merged into the longer sequence "hawaii real estate". On
the contrary, "estate average" is not a candidate, hence we cannot append it to form
"hawaii real estate average". Therefore, "hawaii real estate" is the longest sequence
and is recognized as a nugget.
Consequently, the query is broken down into nuggets and single words. All serve
as the elements to build up a structured query using the Lemur query language:

#combine(nugget1 nugget2 · · · nuggetm w1 w2 · · · wr)    (3.2)

where we suppose there are m nuggets and r single words.
Figure 3.3: Words in a snippet built from TREC 2011 session 20 query "dooney bourke purses", where "dooney and bourke" is a brand name but the user omits the word "and".
The example nugget detection for TREC 2012 session 53 is shown in Figure 3.2.
We obtain the structured query "#1(spinal cord) servering consequenses".
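The strict method can be sketched as follows. This is a simplified illustration of Eq. (3.1) and the merging step, with a toy reference document in place of real snippets; the tokenization and data are invented for the example.

```python
from collections import Counter

def strict_nuggets(query, reference, theta=0.97):
    """Mark a bi-gram as a nugget candidate when its count in the reference
    document R, normalized by the smaller count of its two words (Eq. 3.1),
    reaches theta; then merge adjacent candidates into maximal nuggets."""
    q = query.split()
    r = reference.split()
    uni = Counter(r)                 # count(w; R)
    bi = Counter(zip(r, r[1:]))      # count(w_i w_{i+1}; R)
    cand = set()
    for i in range(len(q) - 1):
        lo = min(uni[q[i]], uni[q[i + 1]])
        if lo and bi[(q[i], q[i + 1])] / lo >= theta:
            cand.add(i)
    nuggets, singles, i = [], [], 0
    while i < len(q):
        if i in cand:                # extend a run of candidate bi-grams
            j = i
            while j in cand:
                j += 1
            nuggets.append(" ".join(q[i:j + 1]))
            i = j + 1
        else:
            singles.append(q[i])
            i += 1
    return nuggets, singles

ref = "spinal cord injury and spinal cord damage affect the spinal cord"
nuggets, singles = strict_nuggets("servering spinal cord consequenses", ref)
query = "#combine(" + " ".join(f"#1({n})" for n in nuggets) + " " + " ".join(singles) + ")"
print(query)  # #combine(#1(spinal cord) servering consequenses)
```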
3.1.2 The Relaxed Method
Operator #1 is a strict structure operator and may miss relevant documents. For
example, the queries in TREC 2011 session 20 all contain "dooney bourke". However,
"dooney and bourke" is a brand name that is sometimes written as "dooney bourke".
We would miss relevant documents containing the phrase "dooney and bourke" if we
formulated the query as "#1(dooney bourke)". Hence, we introduce a relaxed method
for query formulation. We relax the constraints based on the intuition that the distance
between two words reflects their associativity. In particular, we first retrieve
the reference document R as in Section 3.1.1. Every word's position in the snippet is
Figure 3.4: nDCG@10 values of retrieved documents using the TREC 2011 Session track dataset. Two cases, with threshold and without threshold, are compared.
marked as shown in Figure 3.3. We then estimate the centroid of a word wi by

x̄(wi) = Σj xj(wi; R) / count(wi; R)    (3.3)

where s is the number of snippets, R is the reference document (the concatenated
snippets), xj(wi; R) is the position of the jth instance of wi in R, and count(wi; R)
is the occurrence of wi in R.
For every bi-gram in a query, the distance between the estimated centroids of its
two words is calculated. We predict the window size (X in #X) of a nugget based on
this distance. Intuitively, it is reasonable to assume that the window size is proportional
to the distance between the estimated centroids, which can be written as:

nugget = #⌈|x̄(wi) − x̄(wi+1)| / ξ⌉(wi wi+1)    (3.4)
where ξ is an empirical factor. However, some terms in a query do not form nuggets.
The distance between the centroids of such terms may be so large that it generates
a large window size, which introduces noise and hurts the search precision. Therefore,
we set a threshold to filter out term pairs whose centroids are too far apart. Figure 3.4
compares the nDCG@10 values of the retrieved documents over the TREC 2011 sessions,
with and without the threshold. It shows that the precision increases greatly
with the threshold. A decision tree can be derived from Eq. (3.4) with the threshold:
nugget = #1(wi wi+1)    if |x̄(wi) − x̄(wi+1)| ≤ ξ
nugget = #2(wi wi+1)    if ξ < |x̄(wi) − x̄(wi+1)| ≤ 2ξ
nugget = ∅              if |x̄(wi) − x̄(wi+1)| > 2ξ    (3.5)

where we set the threshold to 2ξ by experiments, i.e., we only consider nuggets
with a window size no larger than 2. A structured query is then formulated as in
Eq. (3.2).
We tune ξ from 2 to 8 using the TREC 2011 Session track dataset. Figure 3.4 shows
the nDCG@10 values for different ξ. We find that the precision of session search is
not sensitive to the value of ξ. Hence, we choose the ξ value with the largest
nDCG@10, which is 5.
For the query "dooney bourke purses" above, Figure 3.3 shows the procedure of
generating the structured query "#2(dooney bourke) purses".
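The relaxed method's decision rule (Eq. 3.5) can be sketched as follows. The reference string here is a toy stand-in for the concatenated snippets; real centroids would be computed over the retrieved reference document R.

```python
def relaxed_nugget(w1, w2, reference, xi=5):
    """Estimate each word's centroid position in the reference document R
    (Eq. 3.3), then map the centroid distance to a #1 or #2 proximity
    operator, or drop the pair when the distance exceeds 2*xi (Eq. 3.5)."""
    r = reference.split()

    def centroid(w):
        pos = [i for i, t in enumerate(r) if t == w]
        return sum(pos) / len(pos) if pos else None

    c1, c2 = centroid(w1), centroid(w2)
    if c1 is None or c2 is None:
        return None
    d = abs(c1 - c2)
    if d <= xi:
        return f"#1({w1} {w2})"
    if d <= 2 * xi:
        return f"#2({w1} {w2})"
    return None  # centroids too far apart: not a nugget

ref = "dooney and bourke purses dooney bourke outlet"
print(relaxed_nugget("dooney", "bourke", ref, xi=5))
```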
3.2 Query Aggregation within a Session
A session contains multiple queries, from each of which we can build a structured
query. Therefore, we aggregate over all queries in a session to generate a large
structured query. We first obtain a set of nuggets and single words from every query
qk = {nuggetik, wjk} by the approach presented in Section 3.1. Then we merge these
nuggets to form a structured query:

#weight( λ1 #combine(nugget11 nugget12 · · · nugget1m w11 w12 · · · w1r)
         λ2 #combine(nugget21 nugget22 · · · nugget2m w21 w22 · · · w2r)
         · · ·
         λn #combine(nuggetn1 nuggetn2 · · · nuggetnm wn1 wn2 · · · wnr) )    (3.6)

where λk denotes the weight of query qk. Note that the last #combine is for the
current query qn.
3.2.1 Aggregation Schemes
Three weighting schemes are designed to determine the weight λk, namely uniform,
previous vs. current, and distance-based.

• uniform. Queries are assigned the same weight, i.e., λk = 1.

• previous vs. current. All previous queries share the same weight while the current
query uses a complementary and higher weight. Particularly, we define:

λk = λp        for k = 1, 2, · · · , n − 1
λk = 1 − λp    for k = n    (3.7)
where λp is tuned to be 0.4 on the TREC 2011 Session track data.

• distance-based. The weights are distributed based on how far a query's position
in the session is from the current query. We use a reciprocal function to model
it:

λk = λp / (n − k)    for k = 1, 2, · · · , n − 1
λk = 1 − λp          for k = n    (3.8)

where λp is tuned to be 0.4 based on the TREC 2011 Session track data and k is
the position of a query.
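The three aggregation schemes can be sketched as a single weight-generating function; the function name and scheme labels are our own for illustration, with λp = 0.4 as tuned in the text.

```python
def query_weights(n, scheme="distance", lam_p=0.4):
    """Weights for the n queries of a session: uniform, previous-vs-current
    (Eq. 3.7), or distance-based reciprocal decay (Eq. 3.8). The last weight
    always belongs to the current query q_n."""
    if scheme == "uniform":
        return [1.0] * n
    if scheme == "prev_current":
        return [lam_p] * (n - 1) + [1 - lam_p]
    if scheme == "distance":
        return [lam_p / (n - k) for k in range(1, n)] + [1 - lam_p]
    raise ValueError(f"unknown scheme: {scheme}")

# Earlier queries decay with their distance from the current query q_n.
print(query_weights(4, "distance"))
```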
3.3 Query Expansion by Anchor Text
A session also provides previous search results, which are pages relevant to the
previous queries. An anchor text pointing to a page often provides a valuable
human-created description of this page [34], as shown in Figure 3.5, which enables
us to expand a query with words from anchor texts. An anchor log is extracted by
harvestlinks in the Lemur toolkit.
We collect anchor texts for all previous search results and sort them by term
frequency in decreasing order. The top 5 most frequent anchor texts are appended to
the structured query generated in Section 3.2, each with a weight proportional to its
term frequency.

Figure 3.5: Anchor text in a web page.
#weight( λ1 #combine(nugget11 nugget12 · · · nugget1m w11 w12 · · · w1r)
         λ2 #combine(nugget21 nugget22 · · · nugget2m w21 w22 · · · w2r)
         · · ·
         λn #combine(nuggetn1 nuggetn2 · · · nuggetnm wn1 wn2 · · · wnr)
         βω1 #combine(e1) βω2 #combine(e2) · · · βω5 #combine(e5) )    (3.9)
where ei (i = 1 · · · 5) are the top 5 anchor texts, ωi (i = 1 · · · 5) denotes the
corresponding frequency of the anchor text, normalized by the maximum frequency,
and β is a factor to adjust the influence of the anchor texts, which is tuned to be 0.1
based on the TREC 2011 session data.
For example, in TREC 2012 session 53, the anchor texts with the top frequencies
are "type of paralysi", "quadriplegia paraplegia", "paraplegia", "spinal cord
injury", and "quadriplegic tetraplegic", hence the final structured query becomes
"#weight(1.0 #1(spinal cord) 0.6 consequenses 0.4 paralysis 1.0 servering 0.380723
#combine(type of paralysi) 0.004819 #combine(quadriplegia paraplegia) 0.004819
paraplegia 0.004819 #combine(spinal cord injury) 0.00241 #combine(quadriplegic
tetraplegic))", where the underlined part is from the anchor texts.
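The anchor-text expansion of Eq. (3.9) can be sketched as follows. The anchor texts and the base query fragment are invented for illustration; the real system reads them from the harvestlinks anchor log.

```python
from collections import Counter

def expand_with_anchors(session_query, anchor_texts, beta=0.1, top=5):
    """Append the top-5 most frequent anchor texts of the previous search
    results, each weighted by beta times its frequency normalized by the
    maximum frequency (Eq. 3.9)."""
    ranked = Counter(anchor_texts).most_common(top)
    max_f = ranked[0][1]
    parts = [session_query]
    for text, f in ranked:
        parts.append(f"{beta * f / max_f:.4f} #combine({text})")
    return "#weight( " + " ".join(parts) + " )"

anchors = ["spinal cord injury"] * 3 + ["paraplegia"] * 2 + ["quadriplegia"]
print(expand_with_anchors("1.0 #combine(#1(spinal cord) servering)", anchors))
```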
3.4 Removing Duplicated Queries
The trace of how a user modifies queries in a session may suggest the intention of the
user, so it can be exploited to study the user's real information need. We
notice that sometimes a user repeats a previous query, producing duplicated queries.
Thus, we make two assumptions to refine the final structured query, as follows.

• If a previous query is the same as the current query qn, we only use
the current query to generate the final structured query. For example, in TREC
2011 session 22, the current query "shoulder joint pain" is the same as the first
query "shoulder joint pain". A possible reason is that the search results for
the intermediate queries did not satisfy the user, so the user returned to one of
the previous queries.

• If multiple previous queries are duplicated but they are all different from qn, we
remove these queries when formulating the final structured query. For example,
in TREC 2011 session 60, the query "non-extinct marsupials" occurs three
times and the query "marsupial manure" occurs twice. Using all of these
duplicate queries would bias the search results.
In duplicate detection, we consider the following special situation: if a substring
of one query is the abbreviation of a substring of another, we consider the two queries
duplicated. For example, the only difference between the queries "History of DSEC"
and "History of dupont science essay contest" is "DSEC" versus "dupont science essay
contest", where the former is the abbreviation of the latter; hence they are considered
duplicates. To detect abbreviations, we scan a query string and split a word into
letters if the word is entirely uppercase. In the example above, the first query is
transformed to "History of D S E C". When comparing two queries, two words in
corresponding positions are considered the same if one of them contains only one
capital letter and both start with the same letter. In the above example, "dupont"
and "D" are considered the same.
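The abbreviation-aware duplicate test can be sketched as follows; the function names are ours, and the matching rule is the one described above (uppercase words split into letters, and a single capital letter matching any word that starts with it).

```python
def normalize(query):
    """Split fully-uppercase words into single letters ("DSEC" -> D S E C)."""
    out = []
    for w in query.split():
        out.extend(list(w) if w.isalpha() and w.isupper() and len(w) > 1 else [w])
    return out

def duplicates(q1, q2):
    """Two queries are duplicates when they match word by word, where a
    single capital letter matches any word starting with that letter."""
    a, b = normalize(q1), normalize(q2)
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x == y:
            continue
        if len(x) == 1 and x.isupper() and y.lower().startswith(x.lower()):
            continue
        if len(y) == 1 and y.isupper() and x.lower().startswith(y.lower()):
            continue
        return False
    return True

print(duplicates("History of DSEC", "History of dupont science essay contest"))
```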
3.5 Document Re-ranking
A user tends to stay longer on a page that he or she is interested in [11, 14, 47].
We use dwell time, defined as the elapsed time that a user stays on a page,
to re-rank the search results of the structured query generated in Section 3.4.
Click information in a session is associated with a start time ts and an end time te.
Therefore, the dwell time ∆t can be derived as te − ts. In a session, we retrieve all
clicked pages ci with their dwell times ∆ti. For each document dj returned for the
structured query generated after Section 3.4, its cosine similarity to each ci is computed.
We calculate the score of dj by

s(dj) = Σi Sim(dj, ci) · ∆ti    (3.10)
where Sim(dj, ci) is the cosine similarity between dj and ci. We rank the dj by s(dj)
in decreasing order as the final search results.
In our experiments, the raw dwell time strongly biases the document weights
towards documents with long dwell times, which corresponds to satisfying visits
receiving much higher weights. For example, if a document has been viewed by a
user for more than 30 seconds, we consider that the user is satisfied with this document.
On the contrary, if the dwell time of a document is only a few seconds, the user
might just have glanced at this document and found the content not relevant. Since
the dwell time is multiplied by the similarity, the former document achieves a much
higher score than the latter.
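Eq. (3.10) can be sketched as follows, with toy term-frequency dictionaries standing in for real document vectors; the documents and dwell times are invented for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two term-frequency dicts."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rerank(results, clicked):
    """Score each retrieved document by its similarity to every clicked page,
    weighted by that page's dwell time (Eq. 3.10), then sort by score."""
    def score(doc):
        return sum(cosine(doc, c) * dt for c, dt in clicked)
    return sorted(results, key=score, reverse=True)

clicked = [({"spinal": 2, "cord": 2}, 45.0)]        # (term freqs, dwell seconds)
results = [{"purses": 3}, {"spinal": 1, "cord": 1}]
print(rerank(results, clicked)[0])
```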
3.6 Evaluation for Session Search
We participated in the TREC 2012 Session track and submitted three runs using
different approach combinations, which are listed in Table 3.4. The four subtasks,
RL1, RL2, RL3, and RL4, are described in Section 2.1. In the evaluation results
officially released by NIST [26], we achieved the highest improvement from RL1 to
RL2. Our retrieval results for RL2-RL4 won the second rank among the participants.
3.6.1 Datasets, Baseline, and Evaluation Metrics
We build an inverted index over ClueWeb09 CatB. An anchor log is acquired by
applying harvestlinks over ClueWeb09 CatA, since the official previous search results
are from CatA. Previous research demonstrates that the ClueWeb09 collection contains
many spam documents. We filter out documents whose Waterloo "GroupX"
spam ranking score2 is less than 70 [7].
2http://durum0.uwaterloo.ca/clueweb09spam/
The Lemur search engine is employed in our experiments as the baseline. Lemur's
language model based on the Bayesian belief network is applied [35]. The language model
is a multinomial distribution, for which the conjugate prior for Bayesian analysis is
the Dirichlet distribution [49]:

p_\mu(w|d) = \frac{c(w;d) + \mu\, p(w|C)}{\sum_w c(w;d) + \mu} \qquad (3.11)

where c(w;d) denotes the number of occurrences of term w in document d, p(w|C) is the
collection language model, and \mu is the smoothing parameter. The parameter \mu is
tuned on the TREC 2011 session data.
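As an illustration, the smoothed probability in Eq. (3.11) can be computed as below. This is a hedged sketch of Dirichlet smoothing itself, not of Lemur's implementation; the function name and toy data are ours.

```python
def dirichlet_prob(word, doc_terms, collection_terms, mu=4000.0):
    """P_mu(w|d) = (c(w;d) + mu * P(w|C)) / (sum_w c(w;d) + mu).

    doc_terms and collection_terms are token lists; mu adds pseudo-count
    mass drawn from the collection language model P(w|C)."""
    c_wd = doc_terms.count(word)                                  # c(w; d)
    p_wc = collection_terms.count(word) / len(collection_terms)   # P(w|C)
    return (c_wd + mu * p_wc) / (len(doc_terms) + mu)
```

By construction, the smoothed probabilities over the vocabulary still sum to one, and a larger µ pulls every document model closer to the collection model.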
The metrics provided by the TREC 2012 Session track [26] are used to evaluate
retrieval performance: Expected Reciprocal Rank (ERR), ERR@10, ERR normalized
by the maximum ERR per query (nERR), nERR@10, normalized discounted cumulative
gain (nDCG), nDCG@10, Average Precision (AP), and Precision@10, where
nDCG@10 serves as the primary metric. It is defined as [21]:

nDCG@10 = \sum_{i=1}^{10} \frac{rel(i)}{1+\log_2(i)} \Big/ \sum_{i=1}^{10} \frac{rel^*(i)}{1+\log_2(i)} \qquad (3.12)
where rel(i) is the relevance score of the document at rank i in the ranked document
list retrieved for a session, and rel^*(i) denotes the relevance score of the document at rank
i in the ideal ranked document list for the session. Only the top 10 documents are taken
into account because search engines usually display the top 10 relevant documents on the first
page, which are the most attractive to a user.
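For concreteness, a small sketch of nDCG@10 following Eq. (3.12); the discount 1 + log2(i) is taken verbatim from the formula above, and the function names are ours.

```python
import math

def dcg_at_10(rels):
    # discounted cumulative gain over the top 10 relevance scores
    return sum(r / (1.0 + math.log2(i)) for i, r in enumerate(rels[:10], start=1))

def ndcg_at_10(rel, rel_ideal):
    """rel: relevance scores of the retrieved ranking; rel_ideal: the
    relevance scores available for the session, from which the ideal
    ranking is formed by sorting in decreasing order."""
    ideal = dcg_at_10(sorted(rel_ideal, reverse=True))
    return dcg_at_10(rel) / ideal if ideal else 0.0
```

A perfect ranking scores 1.0; any inversion of relevant documents toward lower ranks reduces the score.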
3.6.2 Results for TREC 2011 Session Track
For RL1, where only the current query q_n is available, we generate a structured
query from q_n by the approach described in Section 3.1 and send it to Lemur. The
Dirichlet parameter µ and the number of pseudo relevance feedback documents f are tested on
Table 3.1: nDCG@10 for TREC 2011 Session track RL1. Dirichlet smoothing method is used; µ = 4000, f = 10 for the strict method and µ = 4000, f = 20 for the relaxed method. Methods are compared to the baseline, the original query. A significant improvement over the baseline is indicated with a † at the p < 0.05 level and a ‡ at the p < 0.005 level (t-test, single-tailed). The best run and median run in TREC 2011 are listed for comparison.

Method     original query   strict     relaxed    TREC Best
nDCG@10    0.3378           0.3834     0.3979     0.3789
%chg       0.00%            13.50%†    17.79%‡    12.17%
TREC 2011 session data. The documents retrieved by directly searching q_n serve
as the baseline. Table 3.1 shows the nDCG@10 results for RL1 on TREC 2011. By
formulating structured queries using nuggets, we boost search accuracy
over the baseline by 13.50%. The relaxed form achieves even better search accuracy of
0.3979 (+17.79%).
For RL2, we apply query expansion with the previous queries as explained in Section
3.2. We observe that the strict method performs much better, because the window
size in the relaxed method is hard to optimize for multiple queries. Table 3.2 presents the
nDCG@10 for RL2 on the TREC 2011 session data. We find that the "previous vs. current"
scheme gives the best search accuracy. It is worth noting that the distance-based scheme
performs even worse than the uniform scheme, which implies that the modification of user
intention is complex and we cannot assume that an early query has less importance
in the entire session.
For RL3 and RL4, we combine several methods, including anchor texts, removing
duplicated queries, and re-ranking by dwell time. Table 3.3 displays the nDCG@10
for RL3 and RL4 on the TREC 2011 Session track data. It illustrates that removing duplicated
queries significantly improves the performance. However, neither re-ranking nor only
Table 3.2: nDCG@10 for TREC 2011 Session track RL2. Dirichlet smoothing method and the strict method are used; µ = 4000, f = 5 for uniform, µ = 4500, f = 5 for previous vs. current (PvC) and distance-based. Methods are compared to the baseline, the original query. A significant improvement over the baseline is indicated with a † at the p < 0.05 level and a ‡ at the p < 0.005 level (t-test, single-tailed). The best run and median run in TREC 2011 are listed for comparison.

Scheme     original query   uniform    PvC        distance-based   TREC Best
nDCG@10    0.3378           0.4475     0.4626     0.4431           0.4281
%chg       0.00%            32.47%‡    36.94%‡    31.17%‡          26.73%
Table 3.3: nDCG@10 for TREC 2011 Session track RL3 and RL4. All runs use the strict method and the configuration µ = 4500, f = 5. Methods are compared to the baseline, the original query (nDCG@10 = 0.3378). A significant improvement over the baseline is indicated with a † at the p < 0.05 level and a ‡ at the p < 0.005 level (t-test, single-tailed). The best TREC 2011 runs are listed for comparison.

                         anchor text (all documents)   anchor text (clicked documents)
Method                   nDCG@10    %chg               nDCG@10    %chg
all queries              0.4695     38.99%‡            0.4680     38.54%‡
remove duplicate         0.4836     43.16%‡            0.4542     34.46%‡
re-rank by dwell time    0.4435     31.29%‡            -          -

TREC Best: RL3 = 0.4307, RL4 = 0.4540.
considering clicked documents contributes to the results. The reason may be that
we calculate cosine similarity based on the full text of documents, which perhaps
introduces a lot of noise.
3.6.3 Results for TREC 2012 Session Track
We submitted three runs to the TREC 2012 Session track. The run names, methods, and
parameters are listed in Table 3.4, where µ is the Dirichlet smoothing parameter and
f is the number of pseudo relevance feedback documents.
Table 3.4: Methods and parameter settings for TREC 2012 Session track. µ is the Dirichlet smoothing parameter, f is the number of pseudo relevance feedback documents.

guphrase1
  RL1: strict method; µ = 4000, f = 10
  RL2: strict method, query expansion; µ = 4500, f = 5
  RL3: strict method, query expansion, anchor text, remove duplicates; µ = 4500, f = 5
  RL4: strict method, query expansion, anchor text, all queries; µ = 4500, f = 5

guphrase2
  RL1: strict method; µ = 3500, f = 10
  RL2: strict method, query expansion; µ = 5000, f = 5
  RL3: strict method, query expansion, anchor text, remove duplicates; µ = 5000, f = 5
  RL4: strict method, query expansion, anchor text, all queries; µ = 5000, f = 5

gurelaxphr
  RL1: relaxed method; µ = 4000, f = 20
  RL2: relaxed method, query expansion; µ = 4500, f = 20
  RL3: relaxed method, query expansion, anchor text, remove duplicates; µ = 4500, f = 20
  RL4: strict method, query expansion, anchor text, re-ranking by time; µ = 4500, f = 5
Table 3.5: nDCG@10 for TREC 2012 Session track. The mean of the median evaluation results in TREC 2012 is listed.

run    original query   guphrase1   guphrase2   gurelaxphr   TREC Best
RL1    0.2474           0.2298      0.2265      0.2334       0.2615
RL2    0.2474           0.2932      0.2839      0.2832       0.3100
RL3    0.2474           0.3021      0.2995      0.3033       0.3221
RL4    0.2474           0.3021      0.2995      0.2900       0.3153
Table 3.6: AP for TREC 2012 Session track. The mean of the median evaluation results in TREC 2012 is listed.

run    original query   guphrase1   guphrase2   gurelaxphr   TREC Best
RL1    0.1274           0.1185      0.1186      0.1223       0.1286
RL2    0.1274           0.1466      0.1457      0.1455       0.1496
RL3    0.1274           0.1490      0.1483      0.1482       0.1538
RL4    0.1274           0.1490      0.1483      0.1467       0.1542
Figure 3.6: Changes in nDCG@10 from RL1 to RL2 presented by the TREC 2012 Session track. Error bars are 95% confidence intervals (Figure 1 in [26]).
The evaluation results for nDCG@10 and Average Precision (AP) released by TREC are
presented in Table 3.5 and Table 3.6. They show trends similar to what we observe
on the TREC 2011 data, but in a much lower range, sometimes even below the results using
the original query. This may imply that our query formulation methods overfit
the TREC 2011 session data. Nonetheless, using previous queries and eliminating
duplicates continues to demonstrate significant improvement in search accuracy.
3.6.4 Official Evaluation Results for TREC 2012 Session Track
The TREC 2012 Session track released the evaluation results for all participants [26]. The runs
were compared on both the individual subtasks and the improvements between the
Figure 3.7: All results by nDCG@10 for the current query in the session for each subtask (Table 2 in [26]).
pairs of subtasks. Our runs achieved the highest improvement from RL1 to RL2, as
shown in Figure 3.6 (Figure 1 in [26]). This improvement placed us second
among all the groups in RL2, RL3, and RL4, as shown in Figure 3.7 (Table 2 in
[26]). The evaluation results demonstrate the effectiveness of query formulation that
combines nuggets and user interaction information in session search.
3.7 Chapter Summary
In this chapter, we describe an approach to building effective structured queries in session
search. The concept of nuggets is introduced to represent the phrase-like semantic units
in a query. A window size can be predicted for a nugget by the relaxed method
of nugget identification. Nuggets from all queries are combined by three
aggregation schemes. Experiments indicate that injecting nuggets into all queries in
a session increases search accuracy significantly. Removing duplicated queries in
a session improves search accuracy further. In addition, nuggets from the
current query are more important than those from previous queries. However, no
evidence shows a difference in importance among the previous queries.
Query expansion and document re-ranking are applied to further improve search
accuracy. Moreover, we design two rules to remove duplicate queries within
a session, which effectively improves search accuracy. All these techniques placed
our results second among all the participants in the subtasks that involve a
session in the TREC 2012 Session track.
Chapter 4
Increasing Stability of Result Organization for Session Search
The relatedness of the queries in a session requires high stability of the search result
organization. In order to improve the stability of SRC hierarchies, we present an
original system framework based on the monothetic concept hierarchy approach. In
particular, we first extract concepts from the document set. Then the hierarchies are
built according to statistics of the concepts, such as document frequency in the
document set. Additionally, we apply the category information in Wikipedia to
regulate the parent-child relationships between pairs of concepts.
It is worth mentioning that we investigate how to increase the stability of concept
hierarchies by considering only the current query and its search results. One may
argue that the instability issue could be resolved by considering all queries in the same
session together when building SRC hierarchies. However, in Web search, session
membership is not always available. Therefore, our task is closer to the
real application. Moreover, our task is to independently generate similar hierarchies
for queries as long as these queries are similar, which poses a greater
challenge. Furthermore, our algorithms can be extended to include other queries in the
session if the session segmentation is known.
The research results reported in this chapter have been published in Proceedings
of the 35th European Conference on Information Retrieval (ECIR 2013) [12].
Figure 4.1: Framework overview of the Wikipedia-enhanced concept hierarchy construction system.
4.1 Utilizing External Knowledge to Increase Stability of Search
Result Organization
We propose to exploit external knowledge to increase stability of SRC hierarchies.
Wikipedia, a broadly used knowledge base, is used as the main source of external
knowledge. We refer to each article in Wikipedia as a page, which usually discusses a
single topic. The title of a page is called an entry. Every entry belongs to one or more
categories. The categories in Wikipedia are organized following the subsumption (also
called is-a) relations; together all Wikipedia categories form a network that consists
of many connected hierarchies.
Our framework consists of three components: concept extraction, identifying reference
Wikipedia entries, and relationship construction, as shown in Figure 4.1. Initially,
the framework takes in a single query q and its search results D and extracts the
concept set C that best represents D by an efficient version of the algorithm in [48, Chapter 4]. Next,
for each concept c ∈ C, the framework identifies its most relevant Wikipedia entry e,
which is called a reference Wikipedia entry. Finally, relationship construction adopts
two schemes to incorporate Wikipedia category information. One applies Subsumption
[39] first and then refines the relationships according to Wikipedia categories,
while the other connects the concepts purely based on Wikipedia. We present mapping
to reference Wikipedia entries in Section 4.2, followed by enhancing Subsumption with
Wikipedia in Section 4.3 and constructing hierarchies purely based on Wikipedia in
Section 4.4.
4.2 Identifying Reference Wikipedia Entries
Given the set of concepts C acquired by concept extraction, we identify the reference
Wikipedia entry for each concept. In particular, we first obtain potential Wikipedia
entries by retrieval. We employ the Lemur toolkit to build an index from the entire
Wikipedia collection in the ClueWeb09 CatB dataset. Each concept c ∈ C is sent as
a query to the index and the top 10 returned Wikipedia pages are kept. The titles
of these pages are considered as Wikipedia entry candidates for c. We denote these
entries as {e_i}, i = 1, ..., 10.
We then select the most relevant Wikipedia entry as the reference Wikipedia entry.
Although we have obtained a ranked list of Wikipedia pages for c, the top result is not
always the best-suited Wikipedia entry for the search session. For instance, TREC
2010 session 3 is about "diabetes education"; the top Wikipedia entry returned by Lemur
for the concept "GDM" is "GNOME Display Manager", which is not relevant. Instead, the
second-ranked entry "Gestational diabetes" is relevant. We propose to disambiguate
among the top returned Wikipedia entries by the following measures.
Figure 4.2: Mapping to the relevant Wikipedia entry. Text in circles denotes Wikipedia entries, while text in rectangles denotes concepts. Based on the context of the current search session, the entry "Gestational diabetes" is selected as the most relevant Wikipedia entry. Therefore the concept "GDM" is mapped to "Gestational diabetes", whose supercategories are "Diabetes" and "Health issues in pregnancy".
Cosine Similarity. Selected by the concept extraction component, most concepts
in C are meaningful phrases and map exactly to a Wikipedia entry. However, many
multiple-word concepts and entries only partially match each other. If they
partially match with a good portion, they should still be considered as matched. We
therefore measure the similarity between a concept c and its candidate Wikipedia
entries by cosine similarity. In particular, we represent the concept and the entry as
term vectors after stemming and stop word removal. If a candidate entry, i.e., the title
of a Wikipedia page, starts with "Category:", we remove the prefix "Category". The cosine
similarity of c and a Wikipedia entry candidate e_i is:
similarity of c and Wikipedia entry candidate ei is:
Sim(c, ei) =~vc · ~vei|~vc||~vei |
(4.1)
45
where ~vc and ~vei are term vectors of c and ei respectively.
Mutual Information. To resolve the ambiguity among Wikipedia entry candidates, we
select the entry that best fits the current search query q and its search results D. For
example, in Figure 4.2, the concept "GDM" could mean "GNOME Display Manager" or
"Gestational Diabetes Mellitus". Given the query "diabetes education", only the latter
is relevant. We need a measure of the similarity between a candidate entry e_i and
the search query. Since the concept set C can be used to represent the search results D,
we convert this problem into measuring the similarity between e_i and C. We calculate
the mutual information MI(e_i, C) between an entry candidate e_i and the extracted
concept set C as described in [4], but with a modified formula for calculating the
weight of a concept:
w(c) = log(1 + ctf(c)) · idf(c) (4.2)
where ctf(c) is the term frequency of concept c with regard to the entire document
set, and idf(c) is the inverse document frequency of concept c with regard to the entire
document set. It is worth noting that [4] clustered the document set first; the
weight formula in [4] therefore counted the term frequency of a concept with regard to the
cluster to which the concept belongs. In addition, the weight formula in [4] slightly
biased the weights of terms distributed over many cluster documents by multiplying
an extra term

cdf(c, L) = log(N(c, L) + 1) \qquad (4.3)

where N(c, L) is the document frequency of the concept c with regard to the cluster
L.
Finally, we aggregate the scores. Each candidate entry is scored by a linear
combination of cosine similarity and MI:

score(e_i) = \alpha\, Sim(e_i, c) + (1 - \alpha)\, MI(e_i, C) \qquad (4.4)

where \alpha is set to 0.8 empirically. The aggregated score considers both the word
similarity and the topic relevancy of a candidate entry. The highest-scored candidate entry
is selected as the reference Wikipedia entry. Figure 4.2 illustrates the procedure of
finding the reference Wikipedia entry.
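The selection procedure (Eqs. 4.1 and 4.4) can be sketched as follows. The MI measure of [4] is abstracted behind a caller-supplied function, stemming and stop word removal are omitted, and all names and the toy data are illustrative, so this is a sketch of the scoring scheme rather than the thesis implementation.

```python
import math

def term_vector(text):
    # naive term-frequency vector; the thesis also stems and removes stop words
    vec = {}
    for t in text.lower().split():
        vec[t] = vec.get(t, 0) + 1
    return vec

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def select_reference_entry(concept, candidates, mutual_info, alpha=0.8):
    """candidates: candidate entry titles; mutual_info: fn(title) -> MI(e_i, C).

    Scores each candidate by alpha * Sim(e_i, c) + (1 - alpha) * MI(e_i, C)
    (Eq. 4.4) and returns the highest-scored one."""
    def score(title):
        return (alpha * cosine(term_vector(concept), term_vector(title))
                + (1 - alpha) * mutual_info(title))
    return max(candidates, key=score)
```

With an acronym like "GDM", the cosine term contributes nothing to either candidate, so the MI term alone breaks the tie, mirroring the example in Figure 4.2.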
4.3 Improving Stability of Subsumption
Subsumption is a popular approach for building concept hierarchies [39]. It identifies
the is-a relationship between two concepts based on conditional probabilities: concept
x subsumes concept y if 0.8 < P(x|y) < 1 and P(y|x) < 1. The main weakness of
Subsumption is that a minor fluctuation in document frequency may lead to the opposite
conclusion. For example, in the search results for the query "diabetes education", the two
concepts "type 1 diabetes" and "type 2 diabetes" show very similar document frequencies.
Small changes in the search result documents may completely flip the decision
from "type 1 diabetes" subsuming "type 2 diabetes" to "type 2 diabetes" subsuming
"type 1 diabetes". Neither conclusion is reliable or stable. In this work, we
propose to inject Wikipedia category information into Subsumption to build more
stable hierarchies.
First, we build a concept hierarchy by Subsumption. For the sake of efficiency,
we sort all concepts in C by their document frequencies in D from high to low.
We compare the document frequency of a concept c with every concept that has a higher
document frequency than c. Since the concepts are all relevant to the same session,
we slightly relax the decision condition in Subsumption: for concepts x and y with
document frequencies df_x > df_y, we say x potentially subsumes y if

\frac{\log(1 + df_y)}{\log(1 + df_x)} > 0.6 \qquad (4.5)

where df_x and df_y are the document frequencies of concepts x and y respectively,
evaluated in D.
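The relaxed test in Eq. (4.5) can be sketched as below; document frequencies are assumed to be precomputed over D, and the function names are ours.

```python
import math
from itertools import combinations

def potential_subsumptions(df, threshold=0.6):
    """df: {concept: document frequency in D}.

    Returns (x, y) pairs with df[x] > df[y] where
    log(1 + df_y) / log(1 + df_x) > threshold (Eq. 4.5)."""
    ordered = sorted(df, key=df.get, reverse=True)   # high document frequency first
    pairs = []
    for x, y in combinations(ordered, 2):            # guarantees df[x] >= df[y]
        if df[x] > df[y] and math.log1p(df[y]) / math.log1p(df[x]) > threshold:
            pairs.append((x, y))
    return pairs
```

The log ratio keeps frequent concept pairs together (e.g., df 50 vs. 20 gives a ratio of about 0.77) while rejecting pairs whose frequencies differ by an order of magnitude.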
Second, based on reference Wikipedia entries ex and ey for concepts x and y, we
evaluate all potential subsumption pairs (x, y) in the following cases:
• e_x is marked as a Wikipedia category: We extract the Wikipedia categories
that e_y belongs to, including the case that e_y itself is a Wikipedia category,
from e_y's Wikipedia page. Note that e_y may have multiple categories. The list
of Wikipedia categories for e_y is called the super-categories of e_y and denoted as S_y.
The relation "x subsumes y" is confirmed if e_x ∈ S_y.
• Neither e_x nor e_y is marked as a Wikipedia category: We extract the Wikipedia
categories that contain e_y (e_x) to form its super-category set S_y (S_x). For each
s_{yi} ∈ S_y, we again extract its super-categories and form the super-supercategory
set SS_y for e_y. Next we calculate a subsumption score by counting the overlap
between SS_y and S_x, normalized by the smaller size of SS_y and S_x. The
subsumption score for concepts x and y is defined as:

Score_{sub}(x, y) = \frac{count(s;\, s \in S_x \text{ and } s \in SS_y)}{\min(|S_x|, |SS_y|)} \qquad (4.6)

where count(s; s ∈ S_x and s ∈ SS_y) denotes the number of categories that
appear in both S_x and SS_y. If Score_{sub}(x, y) for a potential subsumption pair
(x, y) passes a threshold (set to 0.6), x subsumes y.
• e_y is marked as a Wikipedia category but e_x is not: The potential subsumption
relationship between x and y is canceled.
By employing Wikipedia to refine and expand the relationships identified by
Subsumption, we remove the majority of the noise in hierarchies built by Subsumption. Figure
4.3 demonstrates this procedure.
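The three cases above can be sketched as one decision function. Here `is_category` and `categories_of` stand in for lookups against the Wikipedia pages; they, and the helper's name, are our own abstractions rather than an actual API.

```python
def confirm_subsumption(ex, ey, is_category, categories_of, threshold=0.6):
    """Decide whether a potential pair (x, y), with reference entries
    ex and ey, is confirmed by Wikipedia category information."""
    # S_y: categories of e_y, including e_y itself when it is a category
    s_y = set(categories_of(ey)) | ({ey} if is_category(ey) else set())
    if is_category(ex):
        return ex in s_y                 # case 1: e_x is a category
    if is_category(ey):
        return False                     # case 3: only e_y is a category
    s_x = set(categories_of(ex))         # case 2: neither is a category
    ss_y = set()                         # super-supercategories of e_y
    for c in s_y:
        ss_y |= set(categories_of(c))
    if not s_x or not ss_y:
        return False
    return len(s_x & ss_y) / min(len(s_x), len(ss_y)) > threshold  # Eq. (4.6)
```

On the example of Figure 4.3, "Diabetes" is a category and appears among the categories of "Diabetes mellitus type 2", so the pair is confirmed; the reverse direction is canceled.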
Figure 4.3: An example of Wikipedia-enhanced Subsumption. The concepts "Diabetes" and "type 2 diabetes" satisfy Eq. (4.5) and are identified as a potential subsumption pair. The reference Wikipedia entry of "Diabetes" is a category, and the reference Wikipedia entry of "type 2 diabetes" is the Wikipedia entry "Diabetes mellitus type 2". Therefore we check whether "Diabetes" is one of the supercategories of "Diabetes mellitus type 2" and confirm that "diabetes" subsumes "type 2 diabetes".
4.4 Building Concept Hierarchy Purely Based on Wikipedia
This section describes how to build SRC hierarchies purely based on Wikipedia. We
observed that categories on the same topic often share common super-categories or
common subcategories. This inspired us to create hierarchies by joining Wikipedia
subtrees. The algorithm proceeds as follows:

First, identify the start categories. For each concept c ∈ C, we collect all Wikipedia
categories that c's reference Wikipedia entry belongs to. We call these categories start
categories. If an entry is itself marked as a category, it is the start category.
Second, expand from the start categories. For each start category, we extract its
subcategories from its Wikipedia page. Among these subcategories, we choose those
relevant to the current query for further expansion. The relevance is measured
by the MI measure described in Section 4.2. The subcategories with the MI
Figure 4.4: An example of Wikipedia-only hierarchy construction. From the concept "Diabetes mellitus" we find the reference Wikipedia entry "Diabetes mellitus", then we find its start category "Diabetes". Similarly, for another concept "joslin", we find its reference Wikipedia entry "Joslin Diabetes Center" and its start category "Diabetes organizations". We then expand from these two start categories. "Diabetes organizations" is one of the subcategories of "Diabetes", thus we merge them together.
score higher than a threshold (set to 0.9) are kept. For the sake of efficiency as well as
hierarchy quality, we expand the subcategories to three levels at most. Since concepts
in the search session share many start categories, expanding to a limited number of
levels hardly misses relevant categories. At the end of this step, we generate a forest
of trees consisting of all concepts in C as well as their related Wikipedia categories.
Third, select the right nodes to merge the trees. We apply the MI score described
in Section 4.2 to determine which super-category fits the search session and
assign the common node as its child. For example, the start categories "Diabetes" and
"Medical and health organizations by medical condition" share a common child node,
"Diabetes organizations", which is a start category too. "Diabetes" is selected as the
super-category of "Diabetes organizations". The trees that have common nodes get
connected and form a larger hierarchy.
Last, clean up the hierarchy. For every internal node in the joined structure, we
traverse downwards to the leaves. Along the way, we trim the nodes that have no
offspring in the concept set C to eliminate noise that is irrelevant to the current
query. Figure 4.4 shows the Wikipedia-only algorithm.
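A much-simplified sketch of the expansion and clean-up steps follows. Here `subcats` stands in for a Wikipedia category lookup, the relevance filtering and tree merging of the full algorithm are omitted, and all names are illustrative.

```python
def build_wiki_hierarchy(start_categories, subcats, concepts, max_depth=3):
    """start_categories: iterable of category names; subcats: fn(category) ->
    list of subcategories; concepts: set of concept/entry names to keep.

    Expands each start category up to max_depth levels and returns
    {parent: [children]}, keeping only branches that reach a concept."""
    edges = {}

    def expand(node, depth):
        # True if this subtree contains something from the concept set
        keep = node in concepts
        if depth < max_depth:
            for child in subcats(node):
                if expand(child, depth + 1):
                    edges.setdefault(node, []).append(child)
                    keep = True
        return keep

    for cat in start_categories:
        expand(cat, 0)
    return edges
```

Bounding the depth at three levels also guards against cycles in the real Wikipedia category graph, which is a network rather than a strict tree.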
4.5 Evaluation for Search Result Organization
We evaluate our approach using the datasets of the TREC 2010 and 2011 Session tracks.
For each q, to obtain its search results D, we retrieve the top 1000 documents returned
by Lemur from an index built from the ClueWeb09 CatB collection. All relevant
documents identified by TREC assessors are merged into the result set. Table 4.1
summarizes the data used in this evaluation.
We compare our approaches, Subsumption+Wikipedia (Section 4.3) and Wikipedia-only
(Section 4.4), with the following systems:

• Clusty (now Yippy): We could not re-implement Clusty's algorithm. Instead,
we sent queries to yippy.com and saved the returned hierarchies.

• Hierarchical clustering: We employ WEKA¹ to form hierarchical document clusters
and then assign labels to the clusters. The labeling is done by a highly
effective cluster labeling algorithm [4].

• Subsumption: A popular monothetic concept hierarchy construction algorithm [39],
used as the baseline. We modify Subsumption's decision parameters to
suit our dataset. In particular, we consider that x subsumes y if P(x|y) ≥ 0.6 and
P(y|x) < 1.

¹ http://www.cs.waikato.ac.nz/ml/weka/, version 3.6.6, bottom-up hierarchical clustering based on cosine similarity.
51
Table 4.1: Statistics of TREC 2010 and TREC 2011 Session track datasets.

Dataset     #sessions   #q    #q per session   #doc
TREC2010    100         200   2                200,000
TREC2011    24          99    4.12             99,000
Total       124         299   2.41             299,000
4.5.1 Hierarchy Stability
To quantitatively evaluate the stability of SRC hierarchies, we compare the similarity
between SRC hierarchies created within one search session. Given a query session Q
with queries q_1, q_2, ..., q_n, the stability of SRC hierarchies for Q is measured by the
average pairwise hierarchy similarity over the unique query pairs in Q. It is defined
as:

Stability(Q) = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} Sim_{hie}(H_i, H_j) \qquad (4.7)

where n is the number of queries in Q, H_i and H_j are the SRC hierarchies for queries q_i
and q_j, and Sim_{hie}(H_i, H_j) is the hierarchy similarity between H_i and H_j.
We apply three methods to calculate Sim_{hie}(H_i, H_j). Suppose there are M nodes
in H_i and N nodes in H_j:

• node overlap: Measures the percentage of identical nodes in H_i and H_j, normalized
by min(M, N).

• parent-child precision: Measures the percentage of similar parent-child pairs in
H_i and H_j, normalized by min(M, N).

• fragment-based similarity (FBS) [48]: Given two hierarchies H_i and H_j, FBS
compares their similarity by calculating

\frac{1}{\max(M, N)} \sum_{p=1}^{m} Sim_{cos}(c_{ip}, c_{jp}) \qquad (4.8)

where c_{ip} ⊆ H_i, c_{jp} ⊆ H_j, and they are the p-th matched pair among the m
matched fragment pairs.
These metrics measure different aspects of two hierarchies. Node overlap measures
the content difference between hierarchies and ignores structure differences. Parent-child
precision measures local content and structure differences and is a very strict measure.
FBS considers both content and structure differences; it measures differences at the
fragment level and tolerates minor changes in hierarchies.
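The stability measure of Eq. (4.7) paired with the node-overlap similarity can be sketched as follows; hierarchies are reduced to node sets here, which suffices for node overlap but not for the structural metrics.

```python
from itertools import combinations

def node_overlap(hi, hj):
    # identical nodes normalized by the smaller hierarchy size
    return len(set(hi) & set(hj)) / min(len(hi), len(hj))

def stability(hierarchies, sim_hie=node_overlap):
    """Average pairwise similarity over all unique query pairs in a
    session (Eq. 4.7); hierarchies is one SRC hierarchy per query."""
    pairs = list(combinations(hierarchies, 2))
    return sum(sim_hie(hi, hj) for hi, hj in pairs) / len(pairs)
```

Averaging over the C(n, 2) unique pairs is exactly the 2/(n(n-1)) normalization written out in Eq. (4.7).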
Table 4.2 and Table 4.3 summarize the stability evaluation over the TREC 2010
and 2011 datasets, respectively. The most stable hierarchies are generated by the
proposed approaches. Our approaches statistically significantly outperform Subsumption
in terms of stability in FBS on the evaluation datasets.

Both our approaches and Subsumption tremendously improve the stability
of SRC hierarchies compared to Clusty. Our observation is that monothetic
concept hierarchy approaches acquire concepts directly from the search results;
they probably learn from a more complete dataset rather than a segment of the data (one
cluster) and are thus able to avoid minor changes.
Figure 4.5 and Figure 4.6 exhibit the major clusters in the SRC hierarchies for TREC
2010 session 3 generated by Clusty and Wiki-only (Section 4.4), respectively. The
queries are "diabetes education" and "diabetes education videos books". We observe
that the Clusty hierarchies (Figure 4.5(a)(b)) are less stable than those built by Wiki-only
(Figure 4.6(a)(b)). For example, Clusty groups the search results by types of
services (Figure 4.5(a)); however, a test indicator of diabetes, "Blood Sugar", which is
Table 4.2: Stability of search result organization for TREC 2010 Session queries. Approaches are compared to the baseline, Subsumption. A significant improvement over the baseline is indicated with a † at p < 0.05 and a ‡ at p < 0.005 (t-test, single-tailed).

Method                     FBS              Node overlap      Parent-child precision
                           Average  %chg    Average  %chg     Average  %chg
Clusty                     0.463    -       0.415    -        0.144    -
Hierarchical clustering    0.347    -       0.342    -        0.061    -
Subsumption                0.573    0.00%   0.518    0.00%    0.394    0.00%
Subsumption + Wikipedia    0.603†   5.24%   0.529†   2.12%    0.450‡   14.21%
Wikipedia only             0.634‡   10.65%  0.516    -0.39%   0.452‡   14.72%
Table 4.3: Stability of search result organization for TREC 2011 Session queries. Approaches are compared to the baseline, Subsumption. A significant improvement over the baseline is indicated with a † at p < 0.05 and a ‡ at p < 0.005 (t-test, single-tailed).

Method                     FBS              Node overlap      Parent-child precision
                           Average  %chg    Average  %chg     Average  %chg
Clusty                     0.440    -       0.327    -        0.115    -
Hierarchical clustering    0.350    -       0.129    -        0.043    -
Subsumption                0.483    0.00%   0.420    0.00%    0.262    0.00%
Subsumption + Wikipedia    0.504†   4.35%   0.420    0.00%    0.247    -5.73%
Wikipedia only             0.532‡   10.14%  0.425†   1.19%    0.255    -2.67%
not a type of service, is added after the query is slightly changed (Figure 4.5(b)).
Moreover, the largest cluster in Figure 4.5(a), "Research", disappears completely in
Figure 4.5(b). These changes make the Clusty hierarchies less stable and less desirable.
The Wiki-only approach (Figure 4.6(a)(b)), which employs an external knowledge base,
better maintains a single classification dimension, in this case types of diabetes, and
is easier to follow.
Figure 4.5: Major clusters in hierarchies built by Clusty for TREC 2010 session 3. (a) is for the query "diabetes education" and (b) is for "diabetes education videos books".

Figure 4.6: Major clusters in hierarchies built by Wiki-only for TREC 2010 session 3. (a) is for the query "diabetes education" and (b) is for "diabetes education videos books".
4.5.2 Hierarchy Quality
One may point out that perfect stability can be achieved by a static SRC hierarchy
that ignores query changes in a session. To avoid evaluating SRC hierarchies only
by stability while sacrificing other important features, such as hierarchy quality, we
manually evaluate the hierarchies. In particular, we compare two approaches, Subsumption
and Subsumption+Wikipedia, to see how much quality improvement is
gained by adding Wikipedia information.
Figure 4.7 and Figure 4.8 illustrate the major clusters in the hierarchies built for
TREC 2010 session 3 by Subsumption (Section 2.3.2) and Subsumption+Wiki (Section
4.3). We observe errors in Figure 4.7(a): "Type 1 diabetes" is misplaced under
"type 2 diabetes". Figure 4.8(a) corrects this relationship, and the two concepts
are both correctly placed under "diabetes".
Moreover, we find that the hierarchies created with Wikipedia (Figure 4.8(a)(b))
exhibit higher stability than those built by Subsumption alone (Figure 4.7(a)(b)). For example, in
Figure 4.7, "type 2 diabetes" becomes the root of the hierarchy when the query changes.
In Figure 4.8, the main structure of the hierarchy, "Diabetes" with its two children
"type 2 diabetes" and "Type 1 diabetes", is maintained.
We further compare the hierarchies generated by Subsumption+Wiki (Section 4.3)
and Wiki-only (Section 4.4). The Wiki-only approach generates more stable hierarchies
because it utilizes Wikipedia entries, which are standardized concepts, to connect
the concepts extracted from the search results. This leads to high overlap between
the hierarchies generated from queries about a similar topic. In contrast, the
hierarchies generated by the Subsumption+Wiki approach are more related to the query,
because it primarily builds the relations between the concepts extracted from the
search results and only uses Wikipedia to filter out inappropriate relations.
Figure 4.7: Major clusters in hierarchies built by Subsumption for TREC 2010 session 3. (a) is for the query "diabetes education" and (b) is for "diabetes education videos books".

Figure 4.8: Major clusters in hierarchies built by Subsumption+Wiki for TREC 2010 session 3. (a) is for the query "diabetes education" and (b) is for "diabetes education videos books".
Figure 4.9: Search result organization quality improvement vs. stability for Subsumption and Subsumption+Wiki.
Quantitatively, we measure the quality improvement of Subsumption+Wiki over
Subsumption by checking the correctness of parent-child concept pairs in a hierarchy
H:

\frac{(count_{w,corr} - count_{w,err}) - (count_{s,corr} - count_{s,err})}{count_w + count_s} \qquad (4.9)

where count_* denotes the number of concept pairs in H,
w denotes Subsumption+Wikipedia, s denotes Subsumption, corr marks correct pairs,
and err marks incorrect pairs.
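Under our reading of Eq. (4.9), with the second subtracted quantity taken as the error count of Subsumption, the measure can be sketched as:

```python
def quality_improvement(w_corr, w_err, s_corr, s_err):
    """Net correct parent-child pairs of Subsumption+Wiki (w) minus net
    correct pairs of Subsumption (s), normalized by the total number of
    pairs produced by both approaches (Eq. 4.9, as we read it)."""
    count_w = w_corr + w_err
    count_s = s_corr + s_err
    return ((w_corr - w_err) - (s_corr - s_err)) / (count_w + count_s)
```

The measure is positive when adding Wikipedia yields more net-correct pairs than Subsumption alone, and zero when both produce the same balance of correct and incorrect pairs.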
Figure 4.9 plots the quality improvement vs. stability for Subsumption and
Subsumption+Wiki over all evaluated query sessions. Stability here is measured by the
number of differing parent-child pairs in the corresponding hierarchies generated by these
58
Figure 4.10: Extreme case 1. A totally static hierarchy for two queries in a session(TREC 2010 session 107).
two approaches. Figure 4.9 demonstrates that quality and stability could correlate
well. Moreover, we calculate the Spearman's rank correlation coe�cient [43] and the
Pearson's correlation coe�cient [43] between quality improvement and stability and
the values are 0.764 and 0.760 respectively.
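The stability measure above is simply the symmetric difference between the two hierarchies' parent-child pair sets; a minimal sketch (with hypothetical hierarchies encoded as parent-to-children maps) follows.

```python
def hierarchy_pairs(hierarchy):
    """Flatten a hierarchy (parent -> list of children) into a set of
    parent-child pairs."""
    return {(p, c) for p, children in hierarchy.items() for c in children}

def instability(h1, h2):
    """Number of parent-child pairs that differ between two hierarchies
    (symmetric difference); 0 means the hierarchies are identical."""
    return len(hierarchy_pairs(h1) ^ hierarchy_pairs(h2))

# Hypothetical hierarchies for two adjacent queries in a session:
h_q1 = {"Diabetes": ["type 2 diabetes", "Type 1 diabetes"]}
h_q2 = {"Diabetes": ["type 2 diabetes", "Type 1 diabetes", "education"]}
print(instability(h_q1, h_q2))  # 1 -- only the "education" pair differs
```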
Queries change only slightly within a session, so the user may not expect a totally
static hierarchy in session search. Figures 4.10 and 4.11 show two extreme cases. In the first
case, the two queries in a session are "elliptical trainer" and "elliptical trainer benefits"
(TREC 2010 session 107). The hierarchies are exactly the same for these two queries, but
the user may want a more detailed hierarchy about "benefits" for the query "elliptical
trainer benefits". If the hierarchies are completely unstable, the user would not be satisfied either,
as shown in Figure 4.11 (TREC 2010 session 75). Therefore, in these extreme cases,
the quality of the hierarchies is poor regardless of stability, shown as the red line in Figure
4.9. The comparison indicates that our proposed techniques increase the quality of
hierarchies while improving stability.
Figure 4.11: Extreme case 2. A totally different hierarchy for two queries in a session (TREC 2010 session 75).
4.6 Chapter Summary
This chapter presents a system framework that generates stable hierarchies for
session search. Because queries usually change little within a session, stability
is required in search result organization. Our system first extracts concepts from
the document set, and then uses the concepts as nodes to build the hierarchy. We
present two approaches that exploit Wikipedia categories to improve the stability of
the hierarchy. The first corrects the mistaken relationships generated by Subsumption,
while the second builds the hierarchy purely from the Wikipedia categories related
to the concepts.
The monothetic concept hierarchy approaches show significant improvement
in stability over the hierarchical clustering approaches. The evaluation further shows
that Wikipedia category information increases not only the stability but also the
quality of the hierarchy.
Chapter 5
Conclusion
This chapter concludes the thesis. Section 5.1 summarizes the research work. Section
5.2 highlights the significance of this thesis. Section 5.3 proposes promising future
research directions.
5.1 Research Summary
This thesis contains two components: (1) applying query formulation to improve the
accuracy of session search; (2) presenting a new system based on a monothetic concept
hierarchy approach to stably organize the retrieved documents for session search.
First, to obtain documents relevant to queries in a session, we translate individual
queries into a set of nuggets associated with weights. The nuggets from all queries are
aggregated by three schemes. We then extract anchor texts from the previous interactions
to build an expansion term set. To handle duplicated queries in a
session, we design two rules for the two cases of whether or not the duplicate involves the
current query. The dwell time of clicked documents is applied to re-rank the results
returned by Lemur. Evaluation results on both the TREC 2011 and 2012 datasets show
that search accuracy is significantly improved by introducing previous queries,
expanding queries with anchor texts, and removing duplicates sequentially. Furthermore,
the strict method of identifying nuggets performs better over an entire session,
while the relaxed method performs better for searching a single query. The official evaluation
of the TREC 2012 Session track showed that our submission achieved the highest
improvement in RL2. Our results for RL2, RL3, and RL4 were ranked second among
all groups.
Second, we study result organization for session search. We present two algorithms
to generate stable SRC hierarchies. The first inserts a module that utilizes
Wikipedia into the Subsumption approach to refine the parent-child relationships. The
second builds SRC hierarchies directly from the category information of Wikipedia.
We extract noun phrases using a POS tagger and then filter them through Google to
generate a concept set. The concepts are mapped to Wikipedia entries using cosine
similarity and mutual information. Evaluation indicates that the monothetic concept
hierarchy approaches benefit the stability of SRC hierarchies for session search.
Furthermore, applying external knowledge such as Wikipedia improves the quality of SRC
hierarchies as well.
5.2 Significance of the Thesis
This thesis addresses how to search for a series of queries and associated user interactions
effectively and efficiently. To retrieve documents relevant to the topic
of a session, search engines are expected to precisely understand the queries in a session.
This expectation requires the system to identify important concepts in a session.
Nuggets, which represent the phrases in a query, can identify the concepts emphasized
by the user and thereby increase the accuracy of session search.

Nuggets can find the important concepts in queries, while aggregation schemes and
removal rules for duplicated queries deal with the importance of queries in a session.
Some queries in a session are much more important because they are submitted more
recently, while others fail to satisfy the user who submits them and become noise.
Aggregation schemes help the system decide the importance of queries with regard
to when they are submitted. Removal rules for duplicated queries filter out the noise
queries, which could bias the search results.
This thesis further generates stable result organization for session search. We
adopt the idea of a monothetic concept hierarchy because its top-down strategy benefits
the stability of hierarchies. We implement a filter module that utilizes Wikipedia to
improve the accuracy of the parent-child relationships generated by Subsumption, the most
prevalent monothetic concept hierarchy approach. Evaluation results show significant
improvements in both stability and quality.
Building highly stable SRC hierarchies requires dynamic access to Wikipedia
category information, because the concepts, which become the cluster labels, are
dynamically extracted from the search results. We propose an approach that combines
cosine similarity and mutual information for disambiguation to identify the
Wikipedia entry relevant to a concept. With this effective approach for building a
map between concepts and Wikipedia entries, we present an original concept hierarchy
construction algorithm that dynamically generates stable hierarchies from Wikipedia
categories.
5.3 Future Directions
The field of session search is attractive. We can continue to improve
the accuracy of session search based on our present system. We think that the nugget
approach is promising since it greatly boosted the performance of the system on the
TREC 2011 data. Therefore, it is important to improve the accuracy of identifying
nuggets in a query. Phrases in queries could be identified by their appearance. For
example, "American Girl" with both words capitalized represents a brand name; in
lowercase, the connection between these two words is much looser. External knowledge
like dictionaries could be used to improve the accuracy of identifying nuggets. We
could build a component that compares candidate nuggets with an external knowledge
base so as to filter out the erroneous nuggets.
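The capitalization cue from the "American Girl" example can be captured by a small heuristic; this is an illustrative sketch, not the thesis's actual nugget identifier.

```python
def looks_like_proper_nugget(phrase):
    """Heuristic from the example above: treat a multi-word phrase as a
    strict nugget when every word is capitalized (e.g. the brand name
    "American Girl"); lowercase phrases are bound more loosely."""
    words = phrase.split()
    return len(words) > 1 and all(w[0].isupper() for w in words)

print(looks_like_proper_nugget("American Girl"))   # True
print(looks_like_proper_nugget("american girl"))   # False
```

A real component would combine such surface cues with the external knowledge base lookup described above.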
Some phrases are not well suited to be nuggets. For example, verb phrases are more
volatile than noun phrases, so they could wipe out relevant documents if we
formulate them into strict nuggets. A better method is perhaps to identify nuggets
by different rules according to the type of phrase. A POS tagger could be applied to
classify a phrase and then select an appropriate rule for identifying nuggets.
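The proposed rule selection can be sketched as follows. The Penn Treebank tags and the strict/relaxed rule names are assumptions for illustration; a real system would obtain the tags from a POS tagger rather than hand-labeling them.

```python
def nugget_rule(tagged_phrase):
    """Choose a nugget formulation rule from (hypothetical) POS tags:
    noun phrases are stable enough for a strict (exact-phrase) nugget,
    while phrases containing a verb get a relaxed (proximity) nugget so
    they do not wipe out relevant documents."""
    tags = [tag for _, tag in tagged_phrase]
    if any(tag.startswith("VB") for tag in tags):
        return "relaxed"   # e.g. an unordered-window proximity match
    if any(tag.startswith("NN") for tag in tags):
        return "strict"    # e.g. an exact-phrase match
    return "none"

print(nugget_rule([("diabetes", "NN"), ("education", "NN")]))  # strict
print(nugget_rule([("running", "VBG"), ("shoes", "NNS")]))     # relaxed
```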
Our approaches to search result organization can be improved in several aspects.
First, other knowledge bases, such as Freebase and WordNet, can be adopted in
the proposed framework. The synonym and antonym information in these knowledge
bases may help us build more stable hierarchies.
Second, anchor texts in Wikipedia pages often link related concepts together. Therefore,
besides Wikipedia categories, we consider extracting relations between concepts
by analyzing anchor texts, which might help build SRC hierarchies of high
quality. In the future, we will additionally consider using lexico-syntactic patterns to
create SRC hierarchies with better stability and quality. An extended user study will also
be conducted to evaluate the quality of SRC hierarchies.
Bibliography
[1] D. C. Anastasiu, B. J. Gao, and D. Buttler. A framework for personalized and collaborative clustering of search results. In Proceedings of the 20th international conference on Information and knowledge management, CIKM '11, pages 573–582, New York, NY, USA, 2011.
[2] S. Araujo, G. Gebremeskel, J. He, C. Bosscarino, and A. de Vries. CWI at TREC 2012, KBA track and Session track. In Proceedings of the 21st Text REtrieval Conference, TREC '12, Gaithersburg, MD, USA, 2012.
[3] M. Bendersky, D. Metzler, and W. B. Croft. Effective query formulation with multiple information sources. In Proceedings of the 5th international conference on Web search and data mining, WSDM '12, pages 443–452, New York, NY, USA, 2012.
[4] D. Carmel, H. Roitman, and N. Zwerdling. Enhancing cluster labeling using Wikipedia. In Proceedings of the 32nd international SIGIR conference on Research and development in information retrieval, SIGIR '09, pages 139–146, New York, NY, USA, 2009.
[5] C. Carpineto and G. Romano. Optimal meta search results clustering. In Proceedings of the 33rd international SIGIR conference on Research and development in information retrieval, SIGIR '10, pages 170–177, New York, NY, USA, 2010.
[6] B. Carterette and P. Chander. Implicit feedback and document filtering for retrieval over query sessions. In Proceedings of the 20th Text REtrieval Conference, TREC '11, Gaithersburg, MD, USA, 2011.
[7] G. V. Cormack, M. D. Smucker, and C. L. Clarke. Efficient and effective spam filtering and re-ranking for large web datasets. Information Retrieval, 14(5):441–465, Oct. 2011.
[8] W. B. Croft, M. Bendersky, H. Li, and G. Xu. Query representation and understanding workshop. SIGIR Forum, 44(2):48–53, Jan. 2011.
[9] D. R. Cutting, D. R. Karger, J. O. Pedersen, and J. W. Tukey. Scatter/Gather: a cluster-based approach to browsing large document collections. In Proceedings of the 15th annual international SIGIR conference on Research and development in information retrieval, SIGIR '92, pages 318–329, New York, NY, USA, 1992.
[10] H. Duan and B.-J. P. Hsu. Online spelling correction for query completion. In Proceedings of the 20th international conference on World Wide Web, WWW '11, pages 117–126, New York, NY, USA, 2011.
[11] S. Fox, K. Karnawat, M. Mydland, S. Dumais, and T. White. Evaluating implicit measures to improve web search. Transactions on Information Systems, 23(2):147–168, Apr. 2005.
[12] D. Guan and H. Yang. Increasing stability of result organization for session search. In Proceedings of the 35th European conference on Advances in information retrieval, ECIR '13, Berlin, Heidelberg, 2013. Springer-Verlag.
[13] D. Guan, H. Yang, and N. Goharian. Effective structured query formulation for session search. In Proceedings of the 21st Text REtrieval Conference, TREC '12, Gaithersburg, MD, USA, 2012.
[14] Q. Guo and E. Agichtein. Ready to buy or just browsing? Detecting web searcher goals from interaction data. In Proceedings of the 33rd international SIGIR conference on Research and development in information retrieval, SIGIR '10, pages 130–137, New York, NY, USA, 2010.
[15] M. Hagen, M. Potthast, M. Busse, J. Gomoll, J. Harder, and B. Stein. Webis at the TREC 2012 Session track. In Proceedings of the 21st Text REtrieval Conference, TREC '12, Gaithersburg, MD, USA, 2012.
[16] X. Han and J. Zhao. Topic-driven web search result organization by leveraging Wikipedia semantic knowledge. In Proceedings of the 19th international conference on Information and knowledge management, CIKM '10, pages 1749–1752, New York, NY, USA, 2010.
[17] J. He, V. Hollink, C. Bosscarino, R. Cornacchia, and A. de Vries. CWI at TREC 2011: Session, Web, and Medical. In Proceedings of the 20th Text REtrieval Conference, TREC '11, Gaithersburg, MD, USA, 2011.
[18] S. Huston and W. B. Croft. Evaluating verbose query processing techniques. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, SIGIR '10, pages 291–298, New York, NY, USA, 2010.
[19] B. Huurnink, R. Berendsen, K. Hofmann, E. Meij, and M. de Rijke. The University of Amsterdam at the TREC 2011 Session track. In Proceedings of the 20th Text REtrieval Conference, TREC '11, Gaithersburg, MD, USA, 2011.
[20] A. Islam and D. Inkpen. Real-word spelling correction using Google Web 1T n-gram data set. In Proceedings of the 18th conference on Information and knowledge management, CIKM '09, pages 1689–1692, New York, NY, USA, 2009.
[21] K. Järvelin and J. Kekäläinen. Cumulated gain-based evaluation of IR techniques. Transactions on Information Systems, 20(4):422–446, Oct. 2002.
[22] J. Jiang, S. Han, J. Wu, and D. He. Pitt at TREC 2011 Session track. In Proceedings of the 20th Text REtrieval Conference, TREC '11, Gaithersburg, MD, USA, 2011.
[23] J. Jiang, D. He, and S. Han. Pitt at TREC 2012 Session track. In Proceedings of the 21st Text REtrieval Conference, TREC '12, Gaithersburg, MD, USA, 2012.
[24] E. Kanoulas, B. Carterette, P. D. Clough, and M. Sanderson. Overview of the TREC 2010 Session track. In Proceedings of the 19th Text REtrieval Conference, TREC '10, Gaithersburg, MD, USA, 2010.
[25] E. Kanoulas, B. Carterette, M. Hall, P. Clough, and M. Sanderson. Overview of the TREC 2011 Session track. In Proceedings of the 20th Text REtrieval Conference, TREC '11, Gaithersburg, MD, USA, 2011.
[26] E. Kanoulas, B. Carterette, M. Hall, P. Clough, and M. Sanderson. Overview of the TREC 2012 Session track. In Proceedings of the 21st Text REtrieval Conference, TREC '12, Gaithersburg, MD, USA, 2012.
[27] R. Kaptein, P. Serdyukov, A. de Vries, and J. Kamps. Entity ranking using Wikipedia as a pivot. In Proceedings of the 19th international conference on Information and knowledge management, CIKM '10, pages 69–78, New York, NY, USA, 2010.
[28] L. Leal, S. Kharazmi, J. Dhaliwal, M. Sanderson, F. Scholer, S. Sadeghi, and F. Alahmari. RMIT at TREC 2011 Session track. In Proceedings of the 20th Text REtrieval Conference, TREC '11, Gaithersburg, MD, USA, 2011.
[29] Y. Li, H. Duan, and C. Zhai. A generalized hidden Markov model with discriminative training for query spelling correction. In Proceedings of the 35th international SIGIR conference on Research and development in information retrieval, SIGIR '12, pages 611–620, New York, NY, USA, 2012.
[30] C. Liu, M. Cole, E. Baik, and N. J. Belkin. Rutgers at the TREC 2012 Session track. In Proceedings of the 21st Text REtrieval Conference, TREC '12, 2012.
[31] T. Liu, C. Zhang, Y. Gao, W. Xiao, and H. Huang. BUPT_WILDCAT at TREC 2011 Session track. In Proceedings of the 20th Text REtrieval Conference, TREC '11, Gaithersburg, MD, USA, 2011.
[32] W. Liu, H. Lin, Y. Ma, and T. Chang. DUTIR at the Session track in TREC 2011. In Proceedings of the 20th Text REtrieval Conference, TREC '11, Gaithersburg, MD, USA, 2011.
[33] A. M-Dyaa and K. Udo. University of Essex at the TREC 2012 Session track. In Proceedings of the 21st Text REtrieval Conference, TREC '12, Gaithersburg, MD, USA, 2012.
[34] A. M-Dyaa, K. Udo, N. Nikolaos, N. Brendan, L. Deirdre, and F. Maria. University of Essex at the TREC 2011 Session track. In Proceedings of the 20th Text REtrieval Conference, TREC '11, Gaithersburg, MD, USA, 2011.
[35] D. Metzler and W. B. Croft. Combining the language model and inference network approaches to retrieval. Information Processing and Management, 40(5):735–750, Sept. 2004.
[36] D. Metzler and W. B. Croft. A Markov random field model for term dependencies. In Proceedings of the 28th annual international SIGIR conference on Research and development in information retrieval, SIGIR '05, pages 472–479, New York, NY, USA, 2005.
[37] G. Mishne and M. de Rijke. Boosting web retrieval through query operations. In Proceedings of the 27th European conference on Advances in information retrieval, ECIR '05, pages 502–516, Berlin, Heidelberg, 2005. Springer-Verlag.
[38] N. Nanas and A. Roeck. Autopoiesis, the immune system, and adaptive information filtering. Natural Computing, 8(2):387–427, June 2009.
[39] M. Sanderson and B. Croft. Deriving concept hierarchies from text. In Proceedings of the 22nd annual international SIGIR conference on Research and development in information retrieval, SIGIR '99, pages 206–213, New York, NY, USA, 1999.
USA, 1999.
[40] C. Santamaría, J. Gonzalo, and J. Artiles. Wikipedia as sense inventory to improve diversity in web search results. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL '10, pages 1357–1366, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.
[41] U. Scaiella, P. Ferragina, A. Marino, and M. Ciaramita. Topical clustering of search results. In Proceedings of the 5th international conference on Web search and data mining, WSDM '12, pages 223–232, New York, NY, USA, 2012.
[42] R. Song, M. J. Taylor, J.-R. Wen, H.-W. Hon, and Y. Yu. Viewing term proximity from a different perspective. In Proceedings of the 30th European conference on Advances in information retrieval, ECIR '08, pages 346–357, Berlin, Heidelberg, 2008. Springer-Verlag.
[43] D. Wackerly, W. Mendenhall, and R. L. Scheaffer. Mathematical Statistics with Applications. Duxbury Advanced Series, 2002.
[44] P. Wang, J. Hu, H.-J. Zeng, and Z. Chen. Using Wikipedia knowledge to improve text classification. Knowledge and Information Systems, 19(3):265–281, May 2009.
[45] M. Wei, Y. Xue, C. Xu, X. Yu, Y. Liu, and X. Cheng. ICTNET at Session track TREC 2011. In Proceedings of the 20th Text REtrieval Conference, TREC '11, Gaithersburg, MD, USA, 2011.
[46] J. Xu and W. B. Croft. Query expansion using local and global document analysis. In Proceedings of the 19th annual international SIGIR conference on Research and development in information retrieval, SIGIR '96, pages 4–11, New York, NY, USA, 1996.
[47] S. Xu, H. Jiang, and F. C. Lau. User-oriented document summarization through vision-based eye-tracking. In Proceedings of the 14th international conference on Intelligent user interfaces, IUI '09, pages 7–16, New York, NY, USA, 2009.
[48] H. Yang. Personalized Concept Hierarchy Construction. Ph.D. dissertation, Carnegie Mellon University, 2011.
[49] C. Zhai and J. Lafferty. A study of smoothing methods for language models applied to information retrieval. Transactions on Information Systems, 22(2):179–214, Apr. 2004.
[50] C. Zhang, X. Wang, S. Wen, and R. Li. BUPT_PRIS at TREC 2012 Session track. In Proceedings of the 21st Text REtrieval Conference, TREC '12, Gaithersburg, MD, USA, 2012.
[51] L. Zhao and J. Callan. Automatic term mismatch diagnosis for selective query expansion. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, SIGIR '12, pages 515–524, New York, NY, USA, 2012.