
Rutgers Information Interaction Lab at TREC 2005:

Trying HARD

N.J. Belkin, M. Cole, J. Gwizdka, Y.-L. Li, J.-J. Liu, G. Muresan, D. Roussinov*,

C.A. Smith, A. Taylor, X.-J. Yuan

Rutgers University; *Arizona State University

Our Major Goal

• Clarification forms (CFs) are simulations of user-system interaction

• Users are unwilling to engage in explicit interaction unless payoff is high, and interaction is understood as relevant

• Is explicit interaction worthwhile, and if so, under what circumstances?

General Approach to the Question

• Use relatively “standard” interactive elicitation techniques to enhance/ disambiguate original query

• Compare results to baseline

• Compare results to baseline plus relatively “standard” non-interactive query enhancement techniques, in particular, pseudo-rf

Methods for Automatic Query Enhancement

• Pseudo-relevance feedback (standard Lemur; see the sketch after this list)

• Language modeling-based query expansion (clarity), derived from collection

• Web-based query expansion
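As a concrete illustration of the first bullet, here is a minimal pseudo-relevance feedback sketch. The actual runs used Lemur's built-in feedback models; the frequency-based term selection and the function below are illustrative stand-ins, not the lab's implementation.

```python
from collections import Counter

def pseudo_relevance_feedback(query_terms, ranked_docs, k_docs=10, k_terms=10):
    """Pick expansion terms from the top-ranked documents of an initial run.

    ranked_docs: documents from the baseline run, each a list of tokens,
    best first. Returns the original query plus the k_terms most frequent
    new terms from the top k_docs documents -- a crude frequency-based
    stand-in for Lemur's feedback models.
    """
    term_counts = Counter()
    for doc in ranked_docs[:k_docs]:
        term_counts.update(doc)
    # Do not re-add terms that are already in the query.
    candidates = [(t, c) for t, c in term_counts.items() if t not in query_terms]
    candidates.sort(key=lambda tc: tc[1], reverse=True)
    return list(query_terms) + [t for t, _ in candidates[:k_terms]]

# Toy example on the "human smuggling" topic:
docs = [["aliens", "smuggling", "border", "arrested"],
        ["trafficked", "haitians", "border", "smuggling"]]
print(pseudo_relevance_feedback(["human", "smuggling"], docs, k_docs=2, k_terms=3))
```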

Methods for User-Based Query Enhancement

• User selection of terms suggested by “clarity” and web methods (user selection based on Koenemann & Belkin, 1996; Belkin, et al., 2000)

• Elicitation of extended information problem descriptions (elicitation based on Kelly, Dollu & Fu, 2004; 2005)

Hypotheses for Automatic Enhancement

• H1: Query expansion using “clarity”-derived terms will improve performance over baseline & baseline + pseudo-rf

• H2: Query expansion using web-derived terms will improve performance, ditto

• H2b: Query expansion using both clarity- and web-derived terms will improve performance, ditto

Hypotheses for User-Based Query Enhancement

• H3: Query expansion with terms selected by the user from those suggested by clarity- and web-derived terms will improve performance, over everything else

• H4: Query expansion using “problem statements” elicited from users will increase performance over baseline & baseline + pseudo-rf

Hypothesis for When Elicitation is Useful

• H5: The effectiveness of query expansion using problem statements will be negatively correlated with query clarity.

Query Run Designations

• RUTGBL: Baseline query (title + description)

• RUTGBF3: Baseline + pseudo-rf (Lemur)

• RUTGWS1: Baseline + 0.1(Web-suggested terms)

• RUTGLS1: Baseline + 0.1(clarity-suggested terms)

• RUTGAS1: Baseline + 0.1(all suggested terms)

• RUTGUS1: Baseline + 0.1(terms selected by user)

• RUTGUG1: Baseline + 0.1(user-generated terms)

• RUTGALL: Baseline + all suggested terms and all user-generated terms
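The “Baseline + 0.1(terms)” notation above denotes a differential weight on the added terms relative to the baseline (title + description) terms. Below is a rough sketch of how such a run could be assembled as a weighted structured query; the InQuery-style #wsum operator and the exact syntax accepted by Lemur's StructQueryEval are assumptions, and the real runs may have combined weights differently.

```python
def weighted_query(baseline_terms, added_terms, added_weight=0.1):
    """Build a weighted structured query: baseline (title + description)
    terms at weight 1.0, suggested or user-generated terms at the smaller
    differential weight (0.1 in the runs listed above).

    The InQuery-style #wsum operator is an assumption about the query
    syntax actually fed to Lemur's StructQueryEval.
    """
    parts = [f"1.0 {t}" for t in baseline_terms]
    parts += [f"{added_weight} {t}" for t in added_terms]
    return "#wsum( " + " ".join(parts) + " )"

# e.g. a RUTGWS1-style query for the "human smuggling" topic:
print(weighted_query(["human", "smuggling", "incidents"],
                     ["aliens", "border", "trafficked"]))
```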

Identification of Suggested Terms

• Clarity: Compute query clarity for topic baseline (Lemur QueryClarity); sort terms accordingly; choose top ten (sketched below)

• Web: Next slide, please
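A rough sketch of the clarity-based selection referred to in the first bullet: candidate terms are ranked by their contribution to the query clarity score, P(w|Q) * log2(P(w|Q)/P(w|C)). The actual runs used Lemur's QueryClarity; the model estimation and smoothing below are simplifying assumptions.

```python
import math
from collections import Counter

def clarity_ranked_terms(top_docs, collection_counts, collection_len, lam=0.5):
    """Rank candidate terms by their contribution to query clarity:
    P(w|Q) * log2( P(w|Q) / P(w|C) ).

    P(w|Q) is estimated from the top-retrieved documents for the baseline
    query and smoothed with the collection model using mixture weight lam;
    P(w|C) comes from collection term counts. Both estimates are
    simplifications of what Lemur's QueryClarity computes.
    """
    query_model = Counter()
    for doc in top_docs:
        for w, c in Counter(doc).items():
            query_model[w] += (c / len(doc)) / len(top_docs)

    scores = {}
    for w, p_wq in query_model.items():
        p_wc = collection_counts.get(w, 0) / collection_len
        if p_wc == 0:
            continue
        p_wq = lam * p_wq + (1 - lam) * p_wc
        scores[w] = p_wq * math.log2(p_wq / p_wc)
    return sorted(scores, key=scores.get, reverse=True)

# Expansion candidates = the ten highest-contribution terms:
# clarity_ranked_terms(top_docs, collection_counts, collection_len)[:10]
```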

Navigation by Expansion Paradigm (NBE): Example

Title: human smuggling
Description: Identify incidents of human smuggling

[Diagram: web-mined expansion terms for this topic: aliens, arrested, border, haitians, trafficked, undocumented]

Navigation by Expansion Paradigm (NBE)

• Step 1: Overview of the surroundings
– Produces words and phrases “clearly related” to the topic
– Internet mining: topic sent to Google
– Logistic regression on the “signal to noise” ratio:
  • Signal = df(results)/#results
  • Noise = df(web)/#web
  • Pr = 1 – exp(-(signal/noise – 1)/a)

• Step 2: Valid “moves” identified
– Related concepts from Step 1 that:
  • Are present in AQUAINT
  • Would affect search results if selected: impact estimate = P*df*idf

• Step 3: Selected moves executed
– E.g., by query expansion:
  • Score = original score + expansion score * expansion factor
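The Step 1 and Step 2 arithmetic can be restated in a few lines. The sketch below only re-implements the formulas shown above; the scaling constant a and the document-frequency inputs (from the Google result set, the web at large, and AQUAINT) are placeholders for whatever the NBE implementation actually obtains.

```python
import math

def relatedness_probability(df_results, n_results, df_web, n_web, a=1.0):
    """Step 1: probability that a web-mined term is clearly related to the topic.

    signal = df(results) / #results   (frequency among the Google results)
    noise  = df(web) / #web           (frequency on the web at large)
    Pr     = 1 - exp(-(signal/noise - 1) / a), clipped to [0, 1].
    The value of the scaling constant a is an assumption.
    """
    signal = df_results / n_results
    noise = df_web / n_web
    p = 1.0 - math.exp(-(signal / noise - 1.0) / a)
    return max(0.0, min(1.0, p))

def move_impact(p_related, df_aquaint, n_aquaint_docs):
    """Step 2: impact estimate = P * df * idf for a candidate term that
    occurs in df_aquaint of the n_aquaint_docs AQUAINT documents."""
    return p_related * df_aquaint * math.log(n_aquaint_docs / df_aquaint)

# Step 3 then scores documents as:
#   score = original_score + expansion_score * expansion_factor
```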

“Combination” Run

• Combining pseudo-rf with user-selected terms from CF1 (run RUTBE)

• R-Prec. for RUTBE: 0.334

• Substantially better than all other runs, but not comparable, because it used a different ranking function (BM25) and different differential weighting (0.3 for added terms)

• Indicative of possible improvements

User Selection (CF1)

User Generation (CF2)

System Implementation

• Lemur 3.1, 4.0, 4.1, using StructQueryEval

• Could we ask for somewhat more detailed documentation from the Lemur group?

Comparison to Other Sites

Run                      R-precision       MAP               P@10
                         Mean     SD       Mean     SD       Mean    SD
Overall Baseline median  0.252    0.149    0.190    0.147    0.408   0.28
RUTGBL                   0.270    0.167    0.206    0.163    0.408   0.30
Overall Final median     0.264    0.152    0.207    0.161    0.45    0.30
RUTGALL                  0.299*   0.182    0.253    0.188    0.49**  0.31

R-Precision for Test Runs

[Chart: per-topic R-precision (0.0 to 0.8) for runs rp.ALL, rp.AS1, rp.BF3, rp.BL, rp.LS1, rp.UG1, rp.US1, rp.WS1]

Summary of Significant Differences, R-Prec.

Run   R-Prec.   BL    AS1   LS1   WS1   US1   UG1   BF3   ALL
BL    0.270     ----
AS1   0.278     *     ----
LS1   0.279     *     n/s   ----
WS1   0.281     *     n/s   n/s   ----
US1   0.282     *     n/s   n/s   n/s   ----
UG1   0.286     *     n/s   n/s   n/s   n/s   ----
BF3   0.287     n/s   n/s   n/s   n/s   n/s   n/s   ----
ALL   0.299     n/s   n/s   n/s   n/s   n/s   n/s   n/s   ----

(* = significant difference; n/s = not significant)
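The slides do not say which test produced the * and n/s entries; a per-topic paired test such as the Wilcoxon signed-rank test is one conventional choice, sketched here with SciPy. Treating each run as a vector of per-topic R-precision values is likewise an assumption about how the analysis was organized.

```python
from scipy.stats import wilcoxon

def compare_runs(rprec_a, rprec_b, alpha=0.05):
    """Paired test on per-topic R-precision for two runs.

    rprec_a, rprec_b: equal-length lists, one value per topic, in the same
    topic order. Returns the p-value and a '*' / 'n/s' verdict. The choice
    of Wilcoxon signed-rank (rather than, say, a paired t-test) is an
    assumption; the slides do not name the test used.
    """
    _, p = wilcoxon(rprec_a, rprec_b)
    return p, ("*" if p < alpha else "n/s")

# e.g. compare_runs(rprec_by_topic["RUTGBL"], rprec_by_topic["RUTGALL"])
```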

Varying Weights of Baseline Terms w.r.t. CF2 Terms

[Chart: retrieval performance (0.000 to 0.350) as the baseline-term weight varies from 0.0 to 1.0, for runs Q1, Q2, Q3, Q1Q2, Q1Q3, Q1Q2Q3]

Varying Weights of CF2 Terms w.r.t. Baseline Terms

[Chart: retrieval performance (0.000 to 0.350) as the CF2-term weight varies from 0.0 to 1.0, for runs Q1, Q2, Q3, Q1Q2, Q1Q3, Q1Q2Q3]

CF2 & Baseline Terms, Equal Weights

Run name   R-Precision        Precision at 10     Mean Average Precision
           Mean      SD       Mean       SD       Mean       SD
RUTGBL     0.270     0.167    0.408      0.30     0.206      0.16
Q1         0.290     0.178    0.498*     0.325    0.236      0.183
Q2         0.274     0.181    0.474*     0.321    0.223      0.181
Q3         0.295     0.164    0.498**    0.303    0.237**    0.175
Q1Q2       0.298*    0.182    0.514**    0.326    0.248**    0.190
Q1Q3       0.313*    0.176    0.538***   0.314    0.263**    0.186
Q1Q2Q3     0.314**   0.179    0.564***   0.304    0.268**    0.190

Results w.r.t. Hypotheses

• H1, H2, H3, H4 weakly supported w.r.t. baseline, but not w.r.t. pseudo-rf

• H5 not supported
– No correlation between baseline query clarity and effectiveness of expanding with CF2 terms
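One way to check H5 is to correlate per-topic baseline clarity with the per-topic gain from expanding with CF2 terms. The sketch below assumes a Pearson correlation and measures the gain as the difference in average precision; both choices are assumptions, since the slides do not say how the correlation was computed.

```python
from scipy.stats import pearsonr

def clarity_vs_cf2_gain(clarity_by_topic, ap_baseline, ap_cf2):
    """Correlate baseline query clarity with the benefit of CF2 expansion.

    All arguments are dicts keyed by topic id. H5 predicts a negative
    correlation: unclear queries should gain most from the elicited
    problem statements. Returns (r, p_value).
    """
    topics = sorted(clarity_by_topic)
    clarity = [clarity_by_topic[t] for t in topics]
    gain = [ap_cf2[t] - ap_baseline[t] for t in topics]
    return pearsonr(clarity, gain)
```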

Discussion (1)

• Both automatic and user-based query enhancement improved performance over baseline, but not over pseudo-rf

• No significant differences in performance between any of the enhancement methods, except Q1 vs. Q1Q3 (R-precision 0.290 vs. 0.313)

Discussion (2)

• Some benefit both from automatic methods and from explicit interaction with the user that requires effort beyond the initial query formulation

• This interpretation of the results depends on the assumption that title+description queries are accurate simulations of user behavior

(Tentative) Conclusions

• Results indicate that invoking user interaction for query clarification is unlikely to be cost effective

• An alternative might be to develop ways to encourage more elaborate query formulation in the first instance, enhanced with automatic methods.

• Subsequent enhancement could be via implicit sources of evidence, rather than explicit questioning, requiring no additional effort from the user.