
Overview of the Living Labs for IR Evaluation (LL4IR) CLEF Lab

http://living-labs.net | @livinglabsnet

“Give us your ranking, we’ll have it clicked!”

Krisztian Balog, University of Stavanger

Liadh Kelly, Trinity College Dublin

Anne Schuth, Blendle

7th International Conference of the CLEF Association (CLEF 2016) | Évora, Portugal, 2016

Living Labs for IR Evaluation

Motivation

- Overall goal: make information retrieval evaluation more realistic

[Diagram: new retrieval method, users, live site, interaction data]

How to test a new method with real users in their natural task environment (i.e., on the live site)?

How to make interaction data available for method development?

Key idea

[Diagram: new retrieval methods, users, live site, and data (docs/products, logs, etc.)]

K. Balog, L. Kelly, and A. Schuth. Head First: Living Labs for Ad-hoc Search Evaluation. CIKM'14

Key idea #1: An API orchestrates all data exchange between the live site and experimental systems

Key idea #2: Focus on frequent (head) queries

- Ranked result lists can be generated offline
- There is enough traffic on them (historical & live)

Key idea #3: Medium to large organizations with a fair amount of search volume

- Such organizations typically lack their own R&D department

Methodology

1. Queries, candidate documents, and historical search and click data are made available

{ "queries": [ { "creation_time": "Wed, 22 Apr 2015 09:15:41 -0000", "qid": "R-q1", "qstr": "monster high", "type": "train" }, { "creation_time": "Wed, 22 Apr 2015 09:15:41 -0000", "qid": "R-q51", "qstr": "puzzle",

{ "doclist": [ { "docid": "R-d1291", "site_id": "R", "title": "LEGO DUPLO Hamupip\u0151ke hint\u00f3ja 6153" }, { "docid": "R-d1306", "site_id": "R", "title": "LEGO Rend\u0151rkapit\u00e1nys\u00e1g 5681" },

{ "content": { "age_max": 3, "age_min": 1, "arrived": "2014-08-28", "available": 0, "brand": "Lego", "category": "LEGO", "category_id": "38", "characters": [], "description": "Lego Duplo - \u00c9p\u00edt\u0151-\u00e9s j\u00e1t\u00e9kkock\u00e1k kicsiknek 10553<br />[…]",
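A participant's first step is to split the query set by its "type" field and collect the candidate documents for each query. The sketch below assumes only the fields visible in the excerpts above ("queries", "qid", "qstr", "type", "doclist", "docid"); the function names and the test query "R-q99" are hypothetical, and fetching the payloads from the API is left out.

```python
import json
from collections import defaultdict


def split_queries_by_type(payload: str) -> dict[str, list[tuple[str, str]]]:
    """Group (qid, qstr) pairs by their "type" field, e.g. "train" vs. "test"."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for q in json.loads(payload)["queries"]:
        groups[q["type"]].append((q["qid"], q["qstr"]))
    return dict(groups)


def candidate_docids(payload: str) -> list[str]:
    """Return the candidate document ids of a "doclist" payload, in order."""
    return [d["docid"] for d in json.loads(payload)["doclist"]]


# Hypothetical sample payloads shaped like the excerpts above.
queries_json = """{"queries": [
  {"qid": "R-q1", "qstr": "monster high", "type": "train"},
  {"qid": "R-q99", "qstr": "duplo", "type": "test"}]}"""
doclist_json = '{"doclist": [{"docid": "R-d1291"}, {"docid": "R-d1306"}]}'

print(split_queries_by_type(queries_json))
print(candidate_docids(doclist_json))
```

Only the candidate documents returned for a query can appear in a ranking, which is why the doclist is kept per query.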

2. Rankings are generated for each query and uploaded through an API

{ "qid": "U-q22", "runid": "82", "creation_time": "Wed, 04 Jun 2014 15:03:56 -0000", "doclist": [ { "docid": "U-d4" }, { "docid": "U-d2" }, ... ],
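A ranking can be serialized into the run shape shown above with the standard library alone. This is a sketch that assumes the platform accepts exactly these four fields; `build_run` is a hypothetical helper, and the HTTP upload itself is omitted. `email.utils.format_datetime` renders a naive datetime as UTC with the `-0000` zone, matching the example's `creation_time`.

```python
import json
from datetime import datetime
from email.utils import format_datetime


def build_run(qid: str, runid: str, ranked_docids: list[str], when: datetime) -> str:
    """Serialize a ranking (best document first) in the run format shown above."""
    run = {
        "qid": qid,
        "runid": runid,
        # A naive datetime is treated as UTC and rendered with "-0000",
        # e.g. "Wed, 04 Jun 2014 15:03:56 -0000".
        "creation_time": format_datetime(when),
        # Order matters: the list position is the rank.
        "doclist": [{"docid": d} for d in ranked_docids],
    }
    return json.dumps(run)


print(build_run("U-q22", "82", ["U-d4", "U-d2"], datetime(2014, 6, 4, 15, 3, 56)))
```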

3. When any of the test queries is fired, the live site requests rankings from the API and interleaves them with those of the production system

Interleaving

- The site provides the set of candidate items that can be re-ranked (safety mechanism)
- The experimental ranking is interleaved with the production ranking
- Needs 1-2 orders of magnitude less data than A/B testing (also, it is a within-subject rather than a between-subject design)

[Figure: rankings of system A and system B are merged into a single interleaved list; clicks lead to the inference A > B]
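One widely used interleaving method is team-draft interleaving; the feedback format further on reports an interleaving type of "tdi". The sketch below is a generic textbook version, not the platform's own code: the two rankings take turns picking their highest-ranked unused document (the side with fewer picks goes first, ties broken by a coin flip), and clicks are credited to the side that contributed the clicked document. The names "A" and "B" stand for, e.g., the production and the experimental ranking.

```python
import random


def team_draft_interleave(a: list[str], b: list[str], rng: random.Random) -> list[tuple[str, str]]:
    """Merge rankings a and b; returns (docid, team) pairs with team in {"A", "B"}."""
    rankings = {"A": a, "B": b}
    picks = {"A": 0, "B": 0}
    seen: set[str] = set()
    interleaved: list[tuple[str, str]] = []

    def next_unseen(ranking: list[str]):
        return next((d for d in ranking if d not in seen), None)

    # Stop as soon as either ranking has no unused documents left.
    while next_unseen(a) is not None and next_unseen(b) is not None:
        # The team with fewer picks goes next; a tie is broken by a coin flip.
        if picks["A"] < picks["B"] or (picks["A"] == picks["B"] and rng.random() < 0.5):
            team = "A"
        else:
            team = "B"
        doc = next_unseen(rankings[team])
        interleaved.append((doc, team))
        seen.add(doc)
        picks[team] += 1
    return interleaved


def infer_preference(interleaved: list[tuple[str, str]], clicked: set[str]) -> str:
    """Credit each click to the contributing team; return "A", "B", or "tie"."""
    credit = {"A": 0, "B": 0}
    for doc, team in interleaved:
        if doc in clicked:
            credit[team] += 1
    if credit["A"] == credit["B"]:
        return "tie"
    return "A" if credit["A"] > credit["B"] else "B"


il = team_draft_interleave(["d1", "d2", "d3"], ["d2", "d1", "d4"], random.Random(0))
print(il, infer_preference(il, {il[0][0]}))
```

Because each side contributes about half of the top positions, a click on either side is directly comparable, which is what makes interleaving more data-efficient than A/B testing.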

4. Participants get detailed feedback on user interactions (clicks)

{ "feedback": [ { "qid": "S-q1", "runid": "baseline", "type": "tdi", "doclist": [ { "docid": "S-d1", "clicked": true, "team": "site" },
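Each feedback entry can be reduced to a win, loss, or tie for the participant by counting clicks per team. This is a sketch under an assumption: the excerpt only shows the team label "site", so any clicked document with a different label is credited to the participant. The label "participant" in the sample below and the helper name `impression_outcome` are hypothetical.

```python
import json


def impression_outcome(entry: dict) -> str:
    """Return "win", "loss", or "tie" for the participant in one feedback entry.

    Assumption: documents whose team is not "site" came from the participant's run.
    """
    site = participant = 0
    for doc in entry["doclist"]:
        if not doc.get("clicked"):
            continue
        if doc.get("team") == "site":
            site += 1
        else:
            participant += 1
    if participant > site:
        return "win"
    if participant < site:
        return "loss"
    return "tie"


# Hypothetical feedback payload shaped like the excerpt above.
feedback = json.loads("""{"feedback": [{"qid": "S-q1", "runid": "baseline", "type": "tdi",
  "doclist": [{"docid": "S-d1", "clicked": true, "team": "site"},
              {"docid": "S-d2", "clicked": false, "team": "participant"}]}]}""")
print([impression_outcome(e) for e in feedback["feedback"]])
```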

5. The ultimate measure is the number of “wins” against the production system (aggregated over a period of time):

$$\text{Outcome} = \frac{\#\text{Wins}}{\#\text{Wins} + \#\text{Losses}}$$
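The outcome measure above ignores ties, so it is the fraction of decided impressions that the participant won; a value above 0.5 means the experimental system beat the production system more often than not. A minimal sketch, assuming per-impression results are already labeled "win", "loss", or "tie" (the labels and the function name are hypothetical):

```python
from collections import Counter
from typing import Optional


def outcome(results: list[str]) -> Optional[float]:
    """Outcome = #Wins / (#Wins + #Losses); ties do not count."""
    counts = Counter(results)
    decided = counts["win"] + counts["loss"]
    if decided == 0:
        return None  # undefined until at least one impression is decided
    return counts["win"] / decided


print(outcome(["win", "loss", "tie", "win"]))
```

Returning `None` instead of 0.5 when nothing is decided keeps "no data" distinguishable from "evenly matched".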

What is in it for participants?

- Access to privileged commercial data (search and click-through data)

- Opportunity to test IR systems with real, unsuspecting users in a live setting (not the same as crowdsourcing!)

- Continuous evaluation is possible, not limited to a yearly evaluation cycle

The Living Labs Platform

Source code: https://bitbucket.org/living-labs/ll-api

Documentation: http://doc.living-labs.net/

Dashboard: http://dashboard.living-labs.net/

CLEF LL4IR

Use-cases

• Product search (REGIO Játék)

• Web search (Seznam)

Benchmark organization: a training period followed by a test period; what participants get depends on the query type

- train queries: feedback available; individual feedback; update possible
- test queries, training period: feedback available; no individual feedback; update possible
- test queries, test period: no feedback available; no individual feedback; update not possible

Product search

- Ad-hoc retrieval over a product catalog
- Several thousand products
- Limited amount of text, lots of structure
  - Categories, characters, brands, etc.

Product data

- Product name
- Price / bonus price
- Short description
- Recommended age from/to
- Gender recommendation
- Categories
- Brands
- Long description
- (Links to) photos

{ "content": { "age_max": 10, "age_min": 6, "arrived": "2014-08-28", "available": 1, "brand": "Mattel", "category": "Bab\u00e1k, kell\u00e9kek", "category_id": "25", "characters": [], "description": "A Monster High\u00ae iskola sz\u00f6rnycsemet\u00e9i […]", "gender": 2, "main_category": "Baba, babakocsi", "main_category_id": "3", "photos": [ "http://regiojatek.hu/data/regio_images/normal/20777_0.jpg", "http://regiojatek.hu/data/regio_images/normal/20777_1.jpg", […] ], "price": 8675.0, "product_name": "Monster High Scaris Parav\u00e1rosi baba t\u00f6bbf\u00e9le", "queries": { "clawdeen": "0.037", "monster": "0.222", "monster high": "0.741" }, "short_description": "A Monster High\u00ae iskola sz\u00f6rnycsemet\u00e9i els\u0151 k\u00fclf\u00f6ldi \u00fatjukra indulnak..." }, "creation_time": "Mon, 11 May 2015 04:52:59 -0000", "docid": "R-d43", "site_id": "R", "title": "Monster High Scaris Parav\u00e1rosi baba t\u00f6bbf\u00e9le" }

Frequent queries that led to the product
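The "queries" field is historical evidence a ranker can exploit directly. In the example its values sum to 1 (0.037 + 0.222 + 0.741), so they appear to be relative frequencies of the queries that led to the product. The sketch below is a hypothetical baseline, not a method from the lab: it adds that frequency to the fraction of query terms found (as substrings) in the product title, with equal weights chosen arbitrarily.

```python
def score(product: dict, query: str) -> float:
    """Hypothetical baseline: historical query frequency + title term overlap."""
    content = product.get("content", {})
    # Values are stored as strings, e.g. "0.741"; missing queries score 0.
    historical = float(content.get("queries", {}).get(query, 0.0))
    terms = query.lower().split()
    title = product.get("title", "").lower()
    lexical = sum(t in title for t in terms) / len(terms) if terms else 0.0
    return historical + lexical


def rank(products: list[dict], query: str) -> list[str]:
    """Return docids sorted by descending score (stable for ties)."""
    return [p["docid"] for p in sorted(products, key=lambda p: score(p, query), reverse=True)]


# Hypothetical, trimmed-down products in the shape of the example above.
products = [
    {"docid": "R-d1", "title": "LEGO Duplo", "content": {"queries": {"duplo": "0.9"}}},
    {"docid": "R-d43", "title": "Monster High Scaris",
     "content": {"queries": {"monster high": "0.741", "monster": "0.222"}}},
]
print(rank(products, "monster high"))
```

Such a ranker only reorders the candidate documents supplied for a query; it never adds new ones.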

Queries

- Typically very short, e.g.: monster high, magnetiz, duplo, lego friends, geomag, trash+pack, barbie, monopoly, lego duplo, transformers, star wars, nerf, carrera, baba

Results (2015)

[Figure: Outcome by evaluation round (0 to 5) for Baseline, UiS, GESIS, and IRIT; inventory changes (new arrival, became available, became unavailable) from 05-01 to 05-15]

Summary and Outlook

Summary

- Successes
  - Experimental methodology
    - Many interesting opportunities to address current limitations (come to the NewsREEL & LL4IR session tomorrow)
  - The living labs platform
    - Open source, can be used for a variety of tasks
  - Some interesting work for product search
    - See the Best of the Labs session
- Lack of success
  - Did not raise sufficient interest in the use-cases at CLEF

Limitations / Open issues

- Head queries only: a considerable portion of traffic, but only popular information needs
- Lack of context: no knowledge of the searcher’s location, previous searches, etc.
- No real-time feedback: the API provides detailed feedback, but it is not immediate
- Limited control: experimentation is limited to single searches, where results are interleaved with those of the production system; there is no control over the entire result list
- Ultimate measure of success: search is only a means to an end, not the ultimate goal

TREC Open Search: http://trec-open-search.org/

- Use-case: academic search
  - Ad-hoc document search
- Sites
  - CiteSeerX
  - SSOAR (German Social Sciences)
  - Microsoft Academic Search
- Round #3 runs from Oct 1 to Nov 15

We you!

living-labs.net

Thanks to
