Session Search by Direct Policy Learning


Upload: grace-yang

Post on 08-Apr-2017


Page 1: Session Search by Direct Policy Learning

Dynamic IR

• Statistical Modeling of Information Seeking

• Aims to connect users’ information-seeking behaviors with a set of new retrieval models

• Most of these models are generative

• The ‘dynamics’ in the search process are the primary elements to be modeled

Page 2: Session Search by Direct Policy Learning

Research Agenda

• Evaluation of DIR: CubeTest, CIKM’13

• POMDP stochastic game, SIGIR’14

• MDP query change model, SIGIR’13

• Design of POMDPs, ECIR’15

• Direct Policy Learning (reduce complexity), ICTIR’15

• Two-Way Communication, ICTIR’15

• TREC Dynamic Domain Track’15

Page 3: Session Search by Direct Policy Learning

Session Search by Direct Policy Learning

Jiyun Luo, Xuchu Dong, Grace Hui Yang

Georgetown University

ICTIR 2015

Page 4: Session Search by Direct Policy Learning

Session Search

• The information retrieval task that aims to find relevant documents for a session of multiple queries

• We would like to call it ‘dynamic search’, too

• It happens when information needs are complex, vague, and evolving, often containing multiple subtopics

• Such needs cannot be resolved by one-shot ad-hoc search

Page 5: Session Search by Direct Policy Learning

An Illustration

Information need: e.g., find what city and state Dulles airport is in, what shuttles, ride-sharing vans, and taxi cabs connect the airport to other cities, what hotels are close to the airport, where to find cheap off-airport parking, and what metro stops are close to Dulles airport.

[Figure: the user and the search engine interacting over the information need]

• Search by “test-the-water”

• Trial and error

• Repeatedly trying different search paths by writing different queries, until the information need is satisfied by finding relevant documents

• The search engine receives immediate feedback (reward) from the user

• The aim is to optimize long-term gain

Page 6: Session Search by Direct Policy Learning

A good fit for Reinforcement Learning

• Existing work on RL in IR

• Query Change Model (SIGIR’13, Guan et al.)

• POMDP Modeling (SIGIR’14, Luo et al.)

• POMDP in Re-ranking (WWW’13, Jin et al.)

• More Prior work

• UCAIR (CIKM’05, X. Shen, B. Tan, and C. Zhai)

• Online Learning (ECIR’11, K. Hofmann, S. Whiteson, and M. de Rijke)


Page 7: Session Search by Direct Policy Learning

POMDP: Partially Observable Markov Decision Process¹

[Figure: a chain of hidden states s0, s1, s2, s3, …, with actions a0, a1, a2, rewards r0, r1, r2, and observations o1, o2, o3]

● Hidden states ● Actions ● Rewards

● Markov ● Long Term Optimization ● Observations ● Beliefs

¹ R. D. Smallwood et al., ’73

Page 8: Session Search by Direct Policy Learning

Challenges of RL in IR

• Formulation of the Problem (ECIR’15, Luo et al.)

• What are the states, actions, rewards, observations, agents?

• Math vs. Physical Meanings

• Efficiency

• RL training is expensive

• Existing Solutions: reduce the states and actions

• Four Decision-Making states in our POMDP modeling (SIGIR’14)

Page 9: Session Search by Direct Policy Learning

Contribution of this Paper

• Addresses the high complexity of RL in IR

• Directly learns mappings from observations to actions

• Skips states and beliefs

• Flattens the model structure (a more down-to-earth model)

• … but still complex enough to be interesting

• Less model complexity leads to higher efficiency

Page 10: Session Search by Direct Policy Learning

A Direct Policy Learning Framework

• At each search iteration, the search engine maximizes long-term rewards (value function)

• Learns a direct mapping from observations to actions by gradient descent

• Experimented on TREC Session Tracks 2012-2014

$$V_\theta(s_0) = E\left[\sum_{t=0}^{\infty} \gamma^t r(t) \,\middle|\, s_0\right]$$
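As a rough illustration of the value function on this slide, the discounted sum can be computed for one finite session as follows. This is a minimal sketch only: the function name, the reward values, and the discount factor are made-up assumptions, and the infinite sum is truncated at the last iteration of the session.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**t * r(t) over the iterations t = 0, 1, ... of one session."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical per-iteration rewards for a three-query session.
session_rewards = [1.0, 0.0, 2.0]
value = discounted_return(session_rewards, gamma=0.5)
print(value)  # 1.0 + 0.5 * 0.0 + 0.25 * 2.0 = 1.5
```

A smaller gamma makes the engine favor immediate rewards; a gamma close to 1 weights later iterations almost as much as the first.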

Page 11: Session Search by Direct Policy Learning

Defining a History

• History: the record of a session from search iteration 0 to the current iteration t

• A chain of events happening in a session

• The dynamic changes of states, actions, observations, and rewards in a session

$$h_t = [h_{t-1}, C_t, T_t, q_t, \Delta q_t, D_t]$$
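One way to picture the recursive history definition is as a linked record, one link per search iteration. The sketch below is an assumption-laden illustration, not the authors' data structure: the class and field names are hypothetical, and it assumes that C_t and T_t stand for the clicked results and their dwell times, folded here into one list of tuples. The sample values follow the "quit smoking" session used as the running example in this deck.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class History:
    """One link of h_t = [h_{t-1}, C_t, T_t, q_t, delta q_t, D_t]."""
    previous: Optional["History"]          # h_{t-1}; None for the first iteration
    clicks: List[Tuple[int, int, bool]]    # (rank, dwell seconds, SAT-clicked)
    query: str                             # q_t
    added_terms: List[str]                 # +delta q_t
    removed_terms: List[str]               # -delta q_t
    documents: List[str]                   # D_t, titles of the retrieved documents

    def length(self) -> int:
        """Number of search iterations recorded so far."""
        return 1 + (self.previous.length() if self.previous else 0)


h1 = History(None, [], "quit smoking", ["quit", "smoking"], [],
             ["Easy Ways to Quit Smoking"])
h2 = History(h1, [(3, 24, False), (6, 40, True)], "smoking quitting hypnosis",
             ["hypnosis"], [], ["Quit Smoking Hypnosis"])
print(h2.length())  # 2
```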

Page 12: Session Search by Direct Policy Learning

Example: TREC 2014 Session 1011 “quit smoking”

q1: quit smoking

D1:

Rank 1: Easy Ways to Quit Smoking | Quit Smoking Help …

…

Rank 3: Quit Smoking Toolbox - Quit Smoking - Nicotine Addiction …

…

Rank 6: Quit Smoking Hypnosis, Stop Smoking Hypnosis CDs …

Page 13: Session Search by Direct Policy Learning

Example: TREC 2014 Session 1011 “quit smoking”

After q1 (quit smoking) returns D1, the history is:

h1 = [clicked: none, q1, +∆q: quit smoking, -∆q: none, D1]

Page 14: Session Search by Direct Policy Learning

Example: TREC 2014 Session 1011 “quit smoking”

C2 (clicks on D1):

Rank 3: Quit Smoking Toolbox - Quit Smoking - Nicotine Addiction … Clicked. Dwell time: 24 seconds

Rank 6: Quit Smoking Hypnosis, Stop Smoking Hypnosis CDs … SAT-Clicked. Dwell time: 40 seconds

Page 15: Session Search by Direct Policy Learning

Example: TREC 2014 Session 1011 “quit smoking”

q2: smoking quitting hypnosis

D2:

Rank 1: Quit Smoking Hypnosis | Stop Smoking Hypnosis CDs Quit Smoking Hypnosis Neuro…

…

Rank 4: Quit Smoking with Video Hypnosis Home Shopping Cart…

Page 16: Session Search by Direct Policy Learning

Example: TREC 2014 Session 1011 “quit smoking”

q2: smoking quitting hypnosis

+∆q: hypnosis

Exploitation: query reformulation using words in previous search results (“hypnosis” appears in D1, Rank 6, which was SAT-Clicked)

Page 17: Session Search by Direct Policy Learning

Example: TREC 2014 Session 1011 “quit smoking”

After q2 (smoking quitting hypnosis) returns D2, the history is:

h2 = [h1, clicked: [[3, 24, SAT-Clicked=F], [6, 40, SAT-Clicked=T]], q2, +∆q: hypnosis, -∆q: none, D2]

Page 18: Session Search by Direct Policy Learning

Example: TREC 2014 Session 1011 “quit smoking”

C3 (click on D2):

Rank 4: Quit Smoking with Video Hypnosis Home Shopping Cart… SAT-Clicked. Dwell time: 31 seconds

q3: quit smoking side effects

D3:

Rank 1: Side Effects Of Quitting Smoking | Self Hypnosis To Quit Smoking …

…

Page 19: Session Search by Direct Policy Learning

Example: TREC 2014 Session 1011 “quit smoking”

q3: quit smoking side effects

+∆q: side effects

-∆q: hypnosis

Exploitation: query reformulation using words in previous search results

Exploration: query reformulation excluding words in previous search results
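The +∆q and -∆q terms in this example can be thought of as set differences between consecutive queries. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the one-entry `NORMALIZE` table is a stand-in for a real stemmer, included only so that "quitting" and "quit" count as the same term, as the example session appears to treat them.

```python
# Stand-in for a real stemmer: maps word variants to a shared form (assumption).
NORMALIZE = {"quitting": "quit"}


def terms(query):
    """Split a query into a set of normalized terms."""
    return {NORMALIZE.get(w, w) for w in query.lower().split()}


def query_change(prev_query, curr_query):
    """Return (+delta q, -delta q): terms added to and removed from the query."""
    prev, curr = terms(prev_query), terms(curr_query)
    return curr - prev, prev - curr


def appears_in_results(term, result_titles):
    """True if the term occurs in any previously retrieved title (exploitation)."""
    return any(term in title.lower() for title in result_titles)


added, removed = query_change("smoking quitting hypnosis",
                              "quit smoking side effects")
print(sorted(added), sorted(removed))  # ['effects', 'side'] ['hypnosis']
```

An added term that already occurs in earlier results points to exploitation; an added term that does not points to exploration.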

Page 20: Session Search by Direct Policy Learning

Example: TREC 2014 Session 1011 “quit smoking”

After q3 (quit smoking side effects) returns D3, the history is:

h1 = [clicked: none, q1, +∆q: quit smoking, -∆q: none, D1]

h2 = [h1, clicked: [[3, 24, SAT-Clicked=F], [6, 40, SAT-Clicked=T]], q2, +∆q: hypnosis, -∆q: none, D2]

h3 = [h2, clicked: [[4, 31, SAT-Clicked=T]], q3, +∆q: side effects, -∆q: hypnosis, D3]

Page 21: Session Search by Direct Policy Learning

Decompose a History

• First level: iteration by iteration

• Second level: break down an iteration into

• browse phase

• query phase

• retrieval phase

Page 22: Session Search by Direct Policy Learning

Browse Phase

• Actor: the user

• It happens

• after the search results are shown to the user

• before the user starts to write the next query

• Records how the user perceives and examines the (previously retrieved) search results

[Figure: phase diagram with labels s(t); n1(t): o_rank(t), a_browse(t); …]

Page 23: Session Search by Direct Policy Learning

Query Phase

• Actor: the user

• It happens

• when the user writes a query

• Assumes the query is created based on

• what has been seen in the browse phase

• the information need

[Figure: phase diagram with labels s(t); n1(t): o_rank(t), a_browse(t); n2(t): o_browse(t), a_query(t); …]

Page 24: Session Search by Direct Policy Learning

Rank Phase

• Actor: the search engine

• It happens

• after the query is entered

• before the search results are returned

• It is where the search algorithm takes place

[Figure: phase diagram with labels s(t); n1(t): o_rank(t), a_browse(t); n2(t): o_browse(t), a_query(t); n3(t): o_query(t), a_rank(t); s(t+1); phases: browse, query, rank]

Page 25: Session Search by Direct Policy Learning

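Our objective function:

$$P(h|\theta) = \prod_{t=1}^{\text{len}(h)} P(o_{rank}(t), a_{browse}(t), o_{browse}(t), a_{query}(t), o_{query}(t), a_{rank}(t) \mid h_{t-1}, \theta)$$

where

$$P(h|\theta) \propto \prod_{t=1}^{\text{len}(h)} P(a_{browse}(t) \mid o_{rank}(t), \theta_1) \times P(a_{query}(t) \mid o_{browse}(t), \theta_2) \times P(a_{rank}(t) \mid o_{browse}(t), o_{query}(t), o_{rank}(t), \theta_3)$$

$$\propto \prod_{t=1}^{\text{len}(h)} \prod_{i \in \{1,2,3\}} P(a_i(t) \mid n_i(t), \theta_i)$$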
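To make the factorization concrete, the log-probability of a history (up to the proportionality constant) is the sum of the log-probabilities of the browse, query, and rank actions taken at each iteration. This is a minimal sketch; the function name and the probabilities below are made up for illustration.

```python
import math


def history_log_prob(phase_probs):
    """Sum of log P(a_i(t) | n_i(t), theta_i) over iterations t and phases i.

    phase_probs holds, per iteration, the probabilities of the chosen
    browse, query, and rank actions (hypothetical input format).
    """
    return sum(math.log(p) for step in phase_probs for p in step)


# Two iterations, three phases each (made-up action probabilities).
log_p = history_log_prob([(0.5, 0.4, 0.2), (0.6, 0.3, 0.1)])
print(log_p)
```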

Page 26: Session Search by Direct Policy Learning

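Action Selection Distribution

Softmax Function:

$$P(a_i \mid n_i, \theta_i) = \frac{e^{\theta_i \cdot \phi(a_i, n_i)}}{\sum_{a'_i} e^{\theta_i \cdot \phi(a'_i, n_i)}}$$

Gradient:

$$\frac{\partial V_\theta(s_0)}{\partial \theta_k} = \sum_{t=1}^{\infty} \gamma^t \sum_{h \in H} r(t, h) \frac{\partial P(h|\theta)}{\partial \theta_k}$$

$$= \sum_{t=1}^{\infty} \gamma^t \sum_{h \in H} r(t, h) P(h|\theta) \times \sum_{i=0}^{t} \frac{\partial \ln[P(a_{browse} \mid n_1, \theta_1) P(a_{query} \mid n_2, \theta_2) P(a_{rank} \mid n_3, \theta_3)]}{\partial \theta_k}$$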
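The softmax action-selection distribution can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function name, the parameter vector, and the feature vectors are hypothetical, and subtracting the maximum score is a standard numerical-stability step that is not part of the slide.

```python
import math


def action_probabilities(theta, features_per_action):
    """Softmax over theta . phi(a, n) for every candidate action a."""
    scores = [sum(w * f for w, f in zip(theta, phi)) for phi in features_per_action]
    top = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


theta = [0.5, -1.0]                          # hypothetical parameters theta_i
phis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical phi(a, n), one row per action
probs = action_probabilities(theta, phis)
print(probs)
```

Here the first action has the highest score (0.5) and therefore the highest probability; the probabilities always sum to 1.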

Page 27: Session Search by Direct Policy Learning

Ranking Function

• It originally represents the probability of selecting a (ranking) action

• In our context, it is the probability of selecting document d to be put at the top of a ranked list under n3 and θ3 at the t-th iteration

• We then sort the documents by this probability to generate the document list

$$P(a_{rank} \mid n_3, \theta_3) = \frac{e^{\theta_3 \cdot \phi(a_{rank}, n_3)}}{\sum_{a'_{rank}} e^{\theta_3 \cdot \phi(a'_{rank}, n_3)}}$$
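The ranking step described on this slide can be sketched as: score each candidate document with the softmax, then sort by the resulting probability. The sketch below rests on assumptions: the function name, document ids, parameter values, and feature vectors are all hypothetical.

```python
import math


def rank_documents(theta, doc_features):
    """Sort document ids by P(a_rank | n3, theta3), highest first."""
    scores = {d: sum(w * f for w, f in zip(theta, phi))
              for d, phi in doc_features.items()}
    top = max(scores.values())               # subtract the max for numerical stability
    exps = {d: math.exp(s - top) for d, s in scores.items()}
    total = sum(exps.values())
    probs = {d: e / total for d, e in exps.items()}
    return sorted(probs, key=probs.get, reverse=True), probs


theta3 = [1.0, 2.0]                          # hypothetical parameters
candidates = {"d1": [0.2, 0.1], "d2": [0.9, 0.5], "d3": [0.0, 0.3]}
ranking, probs = rank_documents(theta3, candidates)
print(ranking)  # ['d2', 'd3', 'd1']
```

Because the softmax is monotonic in the score, sorting by probability gives the same order as sorting by the raw dot product; the probabilities matter for learning, not for the ordering itself.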

Page 28: Session Search by Direct Policy Learning

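Updates:

$$\Delta\theta_3 = \sum_{h \in H} \sum_{t=1}^{\text{len}(h)} \gamma^t r(t, h) \times \sum_{i=1}^{t} \Big[\phi(a_{rank}, n_3) - \sum_{a'_{rank}} \phi(a'_{rank}, n_3) P(a'_{rank} \mid n_3, \theta_3)\Big]$$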
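The update for θ3 can be sketched for a single history as follows. This is an illustration under assumptions, not the authors' implementation: the function names, the learning rate, the discount factor, and the input format are hypothetical, and the outer sum over all histories in H is reduced to one history for brevity.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_policy(theta, phis, chosen):
    """phi(a, n) of the chosen action minus the expected phi under the policy."""
    probs = softmax([sum(w * f for w, f in zip(theta, phi)) for phi in phis])
    expected = [sum(p * phi[k] for p, phi in zip(probs, phis))
                for k in range(len(theta))]
    return [phis[chosen][k] - expected[k] for k in range(len(theta))]


def update(theta, steps, gamma=0.9, learning_rate=0.1):
    """steps: (phis, chosen_index, reward) for iterations t = 1, 2, ... of one history."""
    delta = [0.0] * len(theta)
    running = [0.0] * len(theta)             # inner sum over i = 1..t
    for t, (phis, chosen, reward) in enumerate(steps, start=1):
        g = grad_log_policy(theta, phis, chosen)
        running = [a + b for a, b in zip(running, g)]
        delta = [d + (gamma ** t) * reward * r for d, r in zip(delta, running)]
    return [w + learning_rate * d for w, d in zip(theta, delta)]


# One iteration, two candidate actions, the first one chosen and rewarded.
new_theta = update([0.0, 0.0], [([[1.0, 0.0], [0.0, 1.0]], 0, 1.0)])
print(new_theta)
```

With a positive reward, the weights move toward the features of the chosen action and away from the features the policy expected on average.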

Feature function φ(a_rank, n3):

Query Features

• Test if a search term w ∈ q_t and w ∈ q_{t−1}

• # of times that a term w occurs in q_1, q_2, …, q_t

Query-Document Features

• Test if a search term w ∈ +∆q_t and w ∈ D_{t−1}

• Test if a document d contains a term w ∈ −∆q_t

• tf.idf score of a document d to q_t

Click Features

• Test if there are SAT-Clicks in D_{t−1}

• # of times a document has been clicked in the current session

• # of seconds a document has been viewed and reviewed in the current session

Query-Document-Click Features

• Test if q_i leads to SAT-Clicks in D_i, where i = 0...t−1

Session Features

• position at the current session

[Slide labels: Browse, Query, Rank]
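A few of the listed features can be sketched as a function that maps one candidate document to a numeric vector. This is a rough illustration only: the function name, argument names, and the choice of four features are assumptions, and the term-overlap count is a crude stand-in for a real tf.idf score.

```python
def rank_features(doc_terms, query, prev_query, removed_terms, sat_clicked_before):
    """A few illustrative entries of phi(a_rank, n3) for one candidate document."""
    q, prev_q = set(query.split()), set(prev_query.split())
    return [
        float(len(q & prev_q)),                   # terms shared by q_t and q_{t-1}
        float(len(q & doc_terms)),                # crude stand-in for a tf.idf score
        float(bool(removed_terms & doc_terms)),   # document contains a -delta q term
        float(sat_clicked_before),                # SAT-Clicks seen in D_{t-1}
    ]


phi = rank_features(
    doc_terms={"side", "effects", "quit", "smoking"},
    query="quit smoking side effects",
    prev_query="quit smoking hypnosis",
    removed_terms={"hypnosis"},
    sat_clicked_before=True,
)
print(phi)  # [2.0, 4.0, 0.0, 1.0]
```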

Page 29: Session Search by Direct Policy Learning

Experiments

• Data: TREC 2012, 2013, 2014 Session Tracks

• Corpus: ClueWeb09, ClueWeb12

Page 30: Session Search by Direct Policy Learning

Baselines

• lemur

• qcm (the query change model, an MDP [Guan et al., SIGIR’13])

• winwin: a POMDP model [Luo et al., SIGIR’14]

• winwin-short: user clicks are used as the reward

• winwin-long: nDCG, an ideal reward function, is used as the reward

• dpl (proposed in this paper)

• dpl+upper bound

• uses the ground truth as the reward function

• an upper bound of the proposed approach

Page 31: Session Search by Direct Policy Learning

Experiments: Whole-Session Search Accuracy

• dpl achieves a significant improvement over the TREC best run

• We found similar conclusions on the TREC 2013 and 2014 Session Tracks

Page 32: Session Search by Direct Policy Learning

Efficiency

• lemur > dpl > qcm > winwin

• dpl achieves a good balance between accuracy and efficiency

• The conclusions are consistent across experiments on the TREC’12 to ’14 Session Tracks

Page 33: Session Search by Direct Policy Learning

Conclusions

• A novel document retrieval algorithm by Direct Policy Learning

• Defines the history and three phases in search

• a way to describe the (messy) information seeking process

• The approach achieves a good balance between effectiveness and efficiency

• less complexity

• more flexibility to incorporate a large set of features