Session Search by Direct Policy Learning
Dynamic IR
• Statistical Modeling of Information Seeking
• Aims to connect user’s information seeking behaviors with a set of new retrieval models
• most are generative
• the ‘dynamics’ in the search process are the primary elements to be modeled
Research Agenda
• Evaluation of DIR: CubeTest, CIKM’13
• POMDP: stochastic game, SIGIR’14
• MDP: query change model, SIGIR’13
• Design of POMDPs: ECIR’15
• Direct Policy Learning: reduce complexity, ICTIR’15
• Two-Way Communication: ICTIR’15
• TREC Dynamic Domain Track’15
Session Search by Direct Policy Learning
Jiyun Luo, Xuchu Dong, Grace Hui Yang Georgetown University
ICTIR 2015
Session Search
• The information retrieval task that aims to find relevant documents for a session of multiple queries.
• We would like to call it ‘dynamic search’, too
• It happens when information needs are complex, vague, evolving, often containing multiple subtopics
• Such needs cannot be resolved by one-shot ad-hoc search
An Illustration
• Information need: e.g. find what city and state Dulles airport is in, what shuttles, ride-sharing vans, and taxi cabs connect the airport to other cities, what hotels are close to the airport, what cheap off-airport parking is available, and what metro stops are close to the Dulles airport.
[Figure: User and Search Engine]
• Search by “test-the-water”
• Trial-and-error
• Repeatedly trying different search paths by writing different queries, until the information need is satisfied by finding relevant documents
• The search engine receives immediate feedback (reward) from the user
• Aim for optimization of long-term gain
A good fit for Reinforcement Learning
• Existing work on RL in IR
• Query Change Model (SIGIR’13, Guan et al.)
• POMDP Modeling (SIGIR’14, Luo et al.)
• POMDP in Re-ranking (WWW’13, Jin et al.)
• More prior work
• UCAIR (CIKM’05, X. Shen, B. Tan, and C. Zhai)
• Online Learning (ECIR’11, K. Hofmann, S. Whiteson, and M. de Rijke)
POMDP: Partially Observable Markov Decision Process
• Hidden states
• Actions
• Rewards
• Markov
• Long Term Optimization
• Observations
• Beliefs
[Figure: a chain of hidden states s0, s1, s2, s3, …, with actions a0, a1, a2, rewards r0, r1, r2, and observations o1, o2, o3]
1 R. D. Smallwood et al., ’73
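As a minimal sketch of how a belief over hidden states can be maintained in a POMDP, the snippet below applies the standard Bayesian belief update b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). The two-state model, the state and observation names, and all numbers in `T` and `O` are made-up toy values for illustration, not the model used in the talk.

```python
def belief_update(belief, action, observation, T, O):
    """Standard POMDP belief update.

    belief: dict state -> probability
    T[(s, action)]: dict next_state -> probability
    O[(next_state, action)]: dict observation -> probability
    """
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        # Predict the next state, then weight by the observation likelihood.
        predicted = sum(T[(s, action)][s_next] * belief[s] for s in states)
        unnormalized[s_next] = O[(s_next, action)][observation] * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


# Toy model (hypothetical): two hidden states, one action, two observations.
T = {
    ("relevant", "rank"): {"relevant": 0.8, "nonrelevant": 0.2},
    ("nonrelevant", "rank"): {"relevant": 0.3, "nonrelevant": 0.7},
}
O = {
    ("relevant", "rank"): {"click": 0.9, "skip": 0.1},
    ("nonrelevant", "rank"): {"click": 0.2, "skip": 0.8},
}
b0 = {"relevant": 0.5, "nonrelevant": 0.5}
b1 = belief_update(b0, "rank", "click", T, O)
```

In this toy setting, observing a click shifts probability mass toward the "relevant" state.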
Challenges of RL in IR
• Formulation of the Problem (ECIR’15, Luo et al.)
• What are the states, actions, rewards, observations, agents?
• Math vs. Physical Meanings
• Efficiency
• RL training is expensive
• Existing Solutions - Reduce the states, actions
• Four Decision-Making states in our POMDP modeling (SIGIR’14)
Contribution of this Paper
• Addresses high complexity of RL in IR
• directly learns mappings from observations to actions
• skips states, beliefs
• flattens the model structure (a more down-to-earth model)
• … but, still complex enough to be interesting
• less model complexity leads to higher efficiency
A Direct Policy Learning Framework
• At each search iteration, the search engine maximizes long-term rewards (value function)
• Learns a direct mapping from observations to actions by gradient descent
• Experimented on TREC Session Tracks 2012-2014
$$V_\theta(s_0) = E\left[\sum_{t=0}^{\infty} \gamma^{t} r(t) \,\middle|\, s_0\right]$$
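The value function above is an expected discounted sum of per-iteration rewards. As a rough illustration, the sketch below computes the discounted return of one finite session and averages it over sampled sessions as a Monte Carlo estimate. The reward values, the discount factor 0.5, and the function names are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r(t) over one finite sampled session (t starts at 0)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def estimate_value(sessions, gamma):
    """Monte Carlo estimate of V(s0): the average discounted return."""
    returns = [discounted_return(rewards, gamma) for rewards in sessions]
    return sum(returns) / len(returns)


# Hypothetical per-iteration rewards for two sampled sessions.
sessions = [[1.0, 0.0, 2.0], [0.0, 1.0]]
value = estimate_value(sessions, gamma=0.5)
```

With a discount factor below 1, rewards earned late in a session count less than early ones, which is why the infinite sum stays finite for bounded rewards.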
Defining a History
• History: the record of a session from search iteration 0 to the current iteration t
• A chain of events happening in a session
• the dynamic changes of states, actions, observations, and rewards in a session
$$h_t = [h_{t-1}, C_t, T_t, q_t, \Delta q_t, D_t]$$
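One possible way to hold such a history in code is a linked record per iteration. The sketch below is only an illustration: it reads C_t as the clicked ranks and T_t as the dwell times (which is how the `clicked:[rank, dwell time, SAT]` entries in the talk's session example appear to be used), and the class name, field names, and document ids are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class History:
    """One link of the chain h_t = [h_{t-1}, C_t, T_t, q_t, dq_t, D_t]."""
    previous: Optional["History"]      # h_{t-1}; None at the first iteration
    clicks: List[int]                  # C_t: ranks of clicked documents (assumed)
    dwell_times: List[float]           # T_t: seconds spent per click (assumed)
    query: str                         # q_t
    added_terms: List[str]             # +dq_t
    removed_terms: List[str]           # -dq_t
    documents: List[str] = field(default_factory=list)  # D_t

    def length(self) -> int:
        """Number of search iterations recorded so far."""
        return 1 + (self.previous.length() if self.previous else 0)


# Placeholder document ids; the queries follow the "quit smoking" session.
h1 = History(None, [], [], "quit smoking", ["quit", "smoking"], [], ["doc-a", "doc-b"])
h2 = History(h1, [3, 6], [24.0, 40.0], "smoking quitting hypnosis",
             ["hypnosis"], [], ["doc-c"])
```

Keeping a reference to the previous record mirrors the recursive definition, so the full session can be recovered by walking back from the latest h_t.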
Example: TREC 2014 Session 1011 “quit smoking”

Iteration 1
• q1: quit smoking
• D1:
  Rank 1: Easy Ways to Quit Smoking | Quit Smoking Help …
  …
  Rank 3: Quit Smoking Toolbox - Quit Smoking - Nicotine Addiction …
  …
  Rank 6: Quit Smoking Hypnosis, Stop Smoking Hypnosis CDs…
• h1 = [clicked:none, q1, +∆q:quit smoking, -∆q:none, D1]

Iteration 2
• C2 (clicks on D1):
  Rank 3: Clicked. Dwell time: 24 seconds
  Rank 6: SAT-Clicked. Dwell time: 40 seconds
• q2: smoking quitting hypnosis
• +∆q: hypnosis
• Exploitation: query reformulation using words in previous search results
• D2:
  Rank 1: Quit Smoking Hypnosis | Stop Smoking Hypnosis CDs Quit Smoking Hypnosis Neuro…
  …
  Rank 4: Quit Smoking with Video Hypnosis Home Shopping Cart…
• h2 = [h1, clicked:[[3,24,SAT-Clicked=F],[6,40,SAT-Clicked=T]], q2, +∆q:hypnosis, -∆q:none, D2]

Iteration 3
• C3 (clicks on D2):
  Rank 4: SAT-Clicked. Dwell time: 31 seconds
• q3: quit smoking side effects
• +∆q: side effects
• -∆q: hypnosis
• Exploration: query reformulation excluding words in previous search results
• D3:
  Rank 1: Side Effects Of Quitting Smoking | Self Hypnosis To Quit Smoking …
  …
• h3 = [h2, clicked:[[4,31,SAT-Clicked=T]], q3, +∆q:side effects, -∆q:hypnosis, D3]
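The query changes in this session can be reproduced with simple set operations. The sketch below is an illustration under stated assumptions: the third query is taken to read "quit smoking side effects", result lists are represented only by the title snippets shown on the slides, and `normalize` is a toy stand-in for real stemming that maps "quitting" to "quit". All function names are hypothetical.

```python
def query_change(prev_query, query, normalize):
    """Return (+dq, -dq): terms added to and removed from the previous query."""
    prev_terms = {normalize(w) for w in prev_query.split()}
    terms = {normalize(w) for w in query.split()}
    return terms - prev_terms, prev_terms - terms


def reformulation_type(added_terms, prev_documents, normalize):
    """'exploitation' if any added term occurs in the previous results."""
    seen = {normalize(w) for doc in prev_documents for w in doc.split()}
    return "exploitation" if added_terms & seen else "exploration"


def normalize(word):
    """Toy normalizer (assumption): lowercase, trim punctuation, one stem rule."""
    word = word.lower().strip(".,|-")
    return {"quitting": "quit"}.get(word, word)


d1 = ["Easy Ways to Quit Smoking | Quit Smoking Help",
      "Quit Smoking Toolbox - Quit Smoking - Nicotine Addiction",
      "Quit Smoking Hypnosis, Stop Smoking Hypnosis CDs"]
d2 = ["Quit Smoking Hypnosis | Stop Smoking Hypnosis CDs Quit Smoking Hypnosis Neuro",
      "Quit Smoking with Video Hypnosis Home Shopping Cart"]

added2, removed2 = query_change("quit smoking", "smoking quitting hypnosis", normalize)
added3, removed3 = query_change("smoking quitting hypnosis",
                                "quit smoking side effects", normalize)
kind2 = reformulation_type(added2, d1, normalize)
kind3 = reformulation_type(added3, d2, normalize)
```

Under these assumptions the second query adds a term already present in D1, while the third adds terms absent from D2, matching the exploitation and exploration labels in the example.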
Decompose a history
• First level: iteration by iteration
• Second level: break down an iteration into
  • browse phase
  • query phase
  • retrieval phase
Browse Phase
• Actor: the user
• It happens
  • after the search results are shown to the user
  • before the user starts to write the next query
• Records how the user perceives and examines the (previously retrieved) search results
[Figure: node n1(t) with state s(t), observation o_rank(t), and action a_browse(t)]
Query Phase
• Actor: the user
• It happens
  • when the user writes a query
• Assuming the query is created based on
  • what has been seen in the browse phase
  • the information need
[Figure: state s(t); node n1(t) with o_rank(t) and a_browse(t); node n2(t) with o_browse(t) and a_query(t)]
Rank Phase
• Actor: the search engine
• It happens
  • after the query is entered
  • before the search results are returned
• It is where the search algorithm takes place
[Figure: state s(t) leading to s(t+1) through three nodes: n1(t) (browse) with o_rank(t) and a_browse(t); n2(t) (query) with o_browse(t) and a_query(t); n3(t) (rank) with o_query(t) and a_rank(t)]
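$$P(h|\theta) = \prod_{t=1}^{len(h)} P\big(o_{rank}(t), a_{browse}(t), o_{browse}(t), a_{query}(t), o_{query}(t), a_{rank}(t) \,\big|\, h_{t-1}, \theta\big)$$
$$\propto \prod_{t=1}^{len(h)} P\big(a_{browse}(t) \,\big|\, o_{rank}(t), \theta_1\big) \times P\big(a_{query}(t) \,\big|\, o_{browse}(t), \theta_2\big) \times P\big(a_{rank}(t) \,\big|\, o_{browse}(t), o_{query}(t), o_{rank}(t), \theta_3\big)$$
$$\propto \prod_{t=1}^{len(h)} \prod_{i \in \{1,2,3\}} P\big(a_i(t) \,\big|\, n_i(t), \theta_i\big)$$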
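Because the history probability factorizes over iterations and over the three phases, its logarithm is a plain sum of per-action log-probabilities (up to the constant hidden by ∝). The sketch below illustrates this; the input layout, the phase keys, and the probability values are hypothetical.

```python
import math


def history_log_likelihood(history):
    """Sum of log P(a_i(t) | n_i(t), theta_i) over iterations and phases.

    history: list of iterations; each iteration maps a phase name to the
    probability the policy assigned to the action actually taken.
    """
    phases = ("browse", "query", "rank")
    return sum(math.log(step[phase]) for step in history for phase in phases)


# Hypothetical action probabilities for a two-iteration session.
history = [
    {"browse": 0.5, "query": 0.25, "rank": 0.5},
    {"browse": 0.5, "query": 0.5, "rank": 0.25},
]
log_p = history_log_likelihood(history)
```

Working in log space avoids numerical underflow when sessions are long, since the raw product of many probabilities shrinks quickly.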
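Gradient of our objective function:
$$\frac{\partial V_\theta(s_0)}{\partial \theta_k} = \sum_{t=1}^{\infty} \gamma^{t} \sum_{h \in H} r(t,h)\, \frac{\partial P(h|\theta)}{\partial \theta_k}$$
$$= \sum_{t=1}^{\infty} \gamma^{t} \sum_{h \in H} r(t,h)\, P(h|\theta) \times \sum_{i=0}^{t} \frac{\partial \ln\big[P(a_{browse}|n_1,\theta_1)\, P(a_{query}|n_2,\theta_2)\, P(a_{rank}|n_3,\theta_3)\big]}{\partial \theta_k}$$
where the Action Selection Distribution is the softmax function:
$$P(a_i|n_i,\theta_i) = \frac{e^{\theta_i \cdot \phi(a_i,n_i)}}{\sum_{a'_i} e^{\theta_i \cdot \phi(a'_i,n_i)}}$$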
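For a softmax action distribution, the derivative of the log-probability with respect to the parameters has a well-known closed form: the feature vector of the chosen action minus the expected feature vector under the current policy. The sketch below implements both pieces for a single decision node. The feature vectors, parameter values, and function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def softmax_probs(theta, features):
    """P(a | n, theta) for every candidate action; features[a] is phi(a, n)."""
    scores = [dot(theta, phi) for phi in features]
    top = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(theta, features, chosen):
    """d ln P(chosen | n, theta) / d theta = phi(chosen) - E[phi]."""
    probs = softmax_probs(theta, features)
    expected = [sum(p * phi[k] for p, phi in zip(probs, features))
                for k in range(len(theta))]
    return [features[chosen][k] - expected[k] for k in range(len(theta))]


# Hypothetical 2-dimensional features for three candidate actions.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
theta = [0.2, -0.1]
probs = softmax_probs(theta, features)
grad = grad_log_prob(theta, features, chosen=0)
```

A finite-difference comparison against `math.log(softmax_probs(...)[chosen])` is a quick way to sanity-check the analytic gradient.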
Ranking Function
$$P(a_{rank}|n_3,\theta_3) = \frac{e^{\theta_3 \cdot \phi(a_{rank},n_3)}}{\sum_{a'_{rank}} e^{\theta_3 \cdot \phi(a'_{rank},n_3)}}$$
• It originally represents the probability of selecting a (ranking) action
• In our context, it is the probability of selecting d to be put at the top of a ranked list under n3 and θ3 at the t-th iteration
• Then we sort the documents by it to generate the document list
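A minimal sketch of turning the softmax ranking probability into a ranked list: each document gets the probability of being placed at the top, and documents are sorted by that value. The two-dimensional features, the parameter vector, and the document ids are hypothetical.

```python
import math


def rank_documents(theta, doc_features):
    """Sort documents by P(d at top | n3, theta3), highest first.

    doc_features: dict doc_id -> feature vector phi(a_rank, n3) for the
    action "put this document at the top".
    """
    scores = {d: sum(t * x for t, x in zip(theta, phi))
              for d, phi in doc_features.items()}
    top = max(scores.values())              # subtract the max for stability
    exps = {d: math.exp(s - top) for d, s in scores.items()}
    total = sum(exps.values())
    probs = {d: e / total for d, e in exps.items()}
    ranking = sorted(probs, key=probs.get, reverse=True)
    return ranking, probs


# Hypothetical features: [tf.idf-like score, SAT-click indicator].
doc_features = {"d1": [0.2, 0.0], "d2": [0.9, 1.0], "d3": [0.5, 0.0]}
ranking, probs = rank_documents([1.0, 0.5], doc_features)
```

Since the softmax is monotone in the linear score, sorting by probability gives the same order as sorting by θ3·φ; the probabilities matter for learning rather than for the final ordering.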
Updates:
$$\Delta\theta_3 = \sum_{h \in H} \sum_{t=1}^{len(h)} \gamma^{t} r(t,h) \times \sum_{i=1}^{t} \Big[\phi(a_{rank}, n_3) - \sum_{a'_{rank}} \phi(a'_{rank}, n_3)\, P(a'_{rank}|n_3,\theta_3)\Big]$$
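The update can be sketched as an accumulation over sampled histories. This is an illustrative reading, not the authors' implementation: the inner sum is taken to run over iterations i ≤ t, each step supplies the candidate feature vectors, the chosen action, and a reward, and the learning rate 0.1 is an assumption. All names and numbers are hypothetical.

```python
import math


def softmax_probs(theta, features):
    scores = [sum(t * x for t, x in zip(theta, phi)) for phi in features]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def delta_theta(theta, histories, gamma):
    """Accumulate the update over sampled histories.

    Each history is a list of steps (features, chosen, reward), where
    features[a] is phi(a, n3) for candidate action a at that iteration.
    """
    delta = [0.0] * len(theta)
    for history in histories:
        running = [0.0] * len(theta)     # sum over i <= t of (phi - E[phi])
        for t, (features, chosen, reward) in enumerate(history, start=1):
            probs = softmax_probs(theta, features)
            for k in range(len(theta)):
                expected = sum(p * phi[k] for p, phi in zip(probs, features))
                running[k] += features[chosen][k] - expected
                delta[k] += (gamma ** t) * reward * running[k]
    return delta


# Hypothetical session: two iterations, two candidate actions each.
features = [[1.0, 0.0], [0.0, 1.0]]
histories = [[(features, 0, 1.0), (features, 1, 0.0)]]
theta = [0.0, 0.0]
step = delta_theta(theta, histories, gamma=0.9)
theta = [w + 0.1 * d for w, d in zip(theta, step)]  # learning rate is an assumption
```

In this toy run the rewarded action becomes more likely after one step, which is the intended effect of following the gradient of the value function.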
Feature function φ(a_rank, n3):
Query Features
• Test if a search term w ∈ q_t and w ∈ q_{t−1}
• # of times that a term w occurs in q_1, q_2, …, q_t
Query-Document Features
• Test if a search term w ∈ +∆q_t and w ∈ D_{t−1}
• Test if a document d contains a term w ∈ −∆q_t
• tf.idf score of a document d to q_t
Click Features
• Test if there are SAT-Clicks in D_{t−1}
• # of times a document is clicked in the current session
• # of seconds a document is viewed and reviewed in the current session
Query-Document-Click Features
• Test if q_i leads to SAT-Clicks in D_i, where i = 0...t−1
Session Features
• position in the current session
[Figure labels: Browse, Query, Rank]
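To make the feature families above concrete, the sketch below builds a small feature vector for one candidate document. It covers only a handful of the listed features, uses exact case-insensitive term matching, and substitutes a crude term-overlap count for a real tf.idf score. The function name, argument names, and inputs are hypothetical.

```python
def rank_features(doc_terms, query, prev_query, removed_terms,
                  doc_click_count, had_sat_click):
    """Return a small feature vector phi(a_rank, n3) for one document.

    All inputs are plain Python values; term matching is exact and
    case-insensitive, which is a simplification.
    """
    doc = {w.lower() for w in doc_terms}
    q = {w.lower() for w in query.split()}
    prev_q = {w.lower() for w in prev_query.split()}
    removed = {w.lower() for w in removed_terms}
    return [
        float(len(q & prev_q)),            # query terms kept from q_{t-1}
        float(bool(doc & removed)),        # document contains a -dq_t term
        float(len(doc & q)),               # crude overlap stand-in for tf.idf
        float(had_sat_click),              # SAT-Click seen in D_{t-1}
        float(doc_click_count),            # clicks on this document so far
    ]


phi = rank_features(
    doc_terms="Side Effects Of Quitting Smoking".split(),
    query="quit smoking side effects",
    prev_query="smoking quitting hypnosis",
    removed_terms=["hypnosis"],
    doc_click_count=0,
    had_sat_click=True,
)
```

Because the policy is linear in these features, new signals can be added by appending entries to the vector and extending θ3 by the same number of dimensions.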
Experiments
• Data: TREC 2012, 2013, 2014 Session Tracks
• Corpus: ClueWeb09, ClueWeb12
Baselines
• lemur
• qcm (the query change model, MDP [Guan et al., SIGIR’13])
• winwin: a POMDP model [Luo et al., SIGIR’14]
  • winwin-short: user clicks are used as reward
  • winwin-long: nDCG, an ideal reward function, is used as reward
• dpl (proposed in this paper)
• dpl+upper bound
  • using ground truth as reward function
  • an upper bound of the proposed approach
Experiments: Whole-Session Search Accuracy
• dpl achieves a significant improvement over the TREC best run
• We found similar conclusions on TREC 2013 and 2014 Session Tracks
Efficiency
• lemur > dpl > qcm > winwin
• dpl achieves a good balance between accuracy and efficiency
• the conclusions are consistent across experiments on the TREC’12 ~ 14 Session Tracks
Conclusions
• A novel document retrieval algorithm by Direct Policy Learning
• Define the history and three phases in search
• a way to describe the (messy) information seeking process
• The approach achieves a good balance between effectiveness and efficiency
• less complexity
• more flexible to incorporate a large set of features