Exploring Composite Retrieval from the Users' Perspective

Post on 17-Jul-2015

School of Computing Science

Exploring Composite Retrieval from the Users’ Perspective

by Horațiu Bota, Ke Zhou, Joemon Jose

@horatiubota

Cluster based Web search

Metasearch / Mashups

Aggregated Search

Knowledge Card

Composite Search?

Background: Aggregated Search

Background: Composite Search

• Finding accessories for an iPhone under budget constraints (Basu Roy et al., 2010)
• Tourist itineraries in a city within a time budget (De Choudhury et al., 2010)
• Course recommendations with constraints (Parameswaran et al., 2011)
• A general framework (Amer-Yahia et al., 2013)
• Web search (Bota et al., 2014)

Background: Composite Search

“Rather than returning and merging results from different verticals into the SERP, we propose to return to users a set of information objects (bundles) which are composed of results from several verticals.” (Bota et al., WWW 2014)
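To make the quoted idea concrete, here is a minimal sketch of a bundle as a data structure: a group of results on one subtopic, drawn from several verticals. The class names `Result` and `Bundle`, their fields, and the sample data are all illustrative assumptions; the slides do not describe any implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Result:
    """One search result; `vertical` is a label such as "News" or "Wiki" (hypothetical schema)."""
    doc_id: str
    vertical: str
    title: str


@dataclass
class Bundle:
    """A composite information object: results on one subtopic, drawn from several verticals."""
    subtopic: str
    results: list[Result] = field(default_factory=list)

    def verticals(self) -> set[str]:
        """Return the distinct verticals represented in this bundle."""
        return {r.vertical for r in self.results}


# Invented sample data, for illustration only.
bundle = Bundle(
    subtopic="iPhone accessories",
    results=[
        Result("d1", "GW", "Accessory buying guide"),
        Result("d2", "Image", "Case photos"),
        Result("d3", "Video", "Charger review"),
    ],
)
print(sorted(bundle.verticals()))  # ['GW', 'Image', 'Video']
```

Keeping the vertical on each result, rather than on the bundle, lets vertical diversity be derived from the contents instead of being stored separately.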

Problem

• What do searchers expect from these information objects?
• What characteristics of these objects are most important to searchers?

Roadmap

1. Background
   • Aggregated Search
   • Composite Search
2. Our work
   • Study design
   • Research questions
3. Findings
   • Contents
   • Characteristics
4. Discussion
   • Insights
   • Future

Our work: study design

• Exploratory user study, 40 participants
• Simulated work task: manual document aggregation (“Select most useful search results for writing a blog post”)
• Compensated £10 for ~1h total duration
• 40 different topics (MillionQuery, FedWeb)
• Live collection, 8 verticals: GW, Images, News, Videos, Social, Blog, QA, Wiki
• Sources: Bing Web Search API, YouTube API, Twitter API, Google Custom Search Engine

Our work: task design

• Pre-task questionnaire
• Topic briefing
• Subtopic selection
• Document selection
• Build bundles
• Document relevance judgements
• Bundle characteristics assessments
• Rate bundles
• Pairwise preference
• Post-task questionnaire

4X

Interface

Our work: questions

1. Do users agree with each other on the subtopics they form bundles on?

2. How do users aggregate information to build bundles?

3. Which bundle characteristics are most important to users?

Findings: subtopic agreement

(1) Do users agree with each other on the subtopics they form bundles on?

% of participants / topic involved in
determining subtopic agreement           100%    75%    50%

% of bundles “about” same subtopic        12%    14%    16%

% of topics with
  at least 1 common subtopic              32%    75%    90%
  at least 2 common subtopics              0%    32%    85%
  at least 3 common subtopics              0%     5%    60%
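One way to read the table above is as a threshold computation: a subtopic counts as "common" for a topic when at least a given share of that topic's participants built a bundle on it. The sketch below follows that reading. It is an assumption about the procedure, not the authors' method, and the function names and toy data are invented for illustration.

```python
from collections import Counter
import math


def common_subtopics(participant_subtopics, threshold):
    """Subtopics chosen by at least `threshold` (a fraction) of a topic's participants."""
    needed = math.ceil(threshold * len(participant_subtopics))
    # Count each subtopic once per participant, even if they bundled it twice.
    counts = Counter(s for chosen in participant_subtopics for s in set(chosen))
    return {s for s, c in counts.items() if c >= needed}


def share_of_topics(topics, threshold, k):
    """Fraction of topics with at least `k` common subtopics at the given threshold."""
    hits = sum(1 for p in topics.values() if len(common_subtopics(p, threshold)) >= k)
    return hits / len(topics)


# Toy data (invented): topic -> one list of subtopic labels per participant.
topics = {
    "t1": [["price", "cases"], ["price", "chargers"], ["price", "cases"], ["cases"]],
    "t2": [["history"], ["food"], ["museums"], ["food"]],
}

print(share_of_topics(topics, threshold=0.5, k=1))   # 1.0
print(share_of_topics(topics, threshold=0.75, k=1))  # 0.5
print(share_of_topics(topics, threshold=1.0, k=1))   # 0.0
```

As in the table, lowering the participant threshold can only keep or increase the share of topics with common subtopics. The sketch also assumes that subtopic labels from different participants have already been matched to each other.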

Findings: content

(2) How do users aggregate information to build bundles?

[Figure: number of documents (y-axis, 0 to 3) by vertical: GW, Image, Video, News, Social, Blog, Wiki, QA]

[Figure: percentage of bundles (y-axis, 10% to 30%) by number of verticals in bundle (1 to 7)]

[Figure: vertical distribution in 3-vertical bundles (y-axis, 7.5% to 30%) for GW, Image, Video, News, Blog, Wiki, QA]
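The "percentage of bundles by number of verticals" view can be derived from bundle contents with a simple count. The sketch below is illustrative only: the function name and the toy bundles are assumptions, and it does not reproduce the study's data.

```python
from collections import Counter


def vertical_count_distribution(bundles):
    """Map 'number of distinct verticals in a bundle' to the share of bundles with that count."""
    counts = Counter(len(set(verticals)) for verticals in bundles)
    total = len(bundles)
    return {n: counts[n] / total for n in sorted(counts)}


# Toy bundles (invented): each bundle is the list of verticals of its documents.
bundles = [
    ["GW", "Image", "Video"],
    ["GW", "Wiki"],
    ["Wiki", "Image", "QA"],
    ["GW", "GW"],
]
print(vertical_count_distribution(bundles))  # {1: 0.25, 2: 0.25, 3: 0.5}
```

Each bundle's verticals are de-duplicated before counting, so two documents from the same vertical count as one vertical.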

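Findings: document roles

[Diagram: bundle A contains documents D1, D2, D3, D4; bundle B contains documents D3, D4, D5, D6. In each bundle, three documents are labelled REL and one NREL. Annotations: pivot documents, ornament documents.]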

Findings: document roles

Ornament type    Pivot type: GW    Pivot type: Wiki
GW               -                 24.6%
Image            23.5%             31.1%
Video            21.3%             18%
News             7.1%              1.6%
Social           <1%               6.6%
Blog             9%                11.5%
QA               17.4%             4.9%
Wiki             19.7%             -

Average document relevance per vertical type, by number of verticals in bundle:

Vertical    2 verts    3 verts
GW          3.872      3.667
Image       3.208      3.352
Video       3.228      3.649
News        2.954      3.156
Social      2.667      2.200
Blog        3.593      3.402
QA          2.560      2.652
Wiki        3.553      3.584

Findings: characteristics

(3) Which bundle characteristics are most important to users?

[Figure: percentage of selections (y-axis, 0 to 40) per criterion: Relevance, Diversity, Overall, Freshness, None, Cohesion; value labels as extracted: 5%, 6%, 7%, 21%, 24%, 37%]

Pearson’s R

Criterion    All      Chosen
Relevance    0.332    0.496
Cohesion     0.228    0.432
Diversity    0.334    0.487
Freshness    0.208    0.213
Overall      0.453    0.454

             Relevance    Diversity    Cohesion    Freshness    Overall
Relevance    -            0.272        0.538       0.334        0.630
Diversity    0.272        -            0.144       0.485        0.478
Cohesion     0.538        0.144        -           0.250        0.548
Freshness    0.334        0.485        0.250       -            0.537
Overall      0.630        0.478        0.548       0.537        -
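The tables above report Pearson's R. As a reminder of what that coefficient measures, here is a small self-contained sketch that computes it for two lists of ratings. The slides do not say which variables were paired or how the coefficients were computed, so the ratings below are invented and the function is only a textbook implementation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length rating lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations, then the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / math.sqrt(var_x * var_y)


# Toy ratings (invented): one value per bundle.
relevance = [5, 4, 2, 3, 1]
overall = [4, 5, 2, 3, 1]
print(round(pearson_r(relevance, overall), 3))  # 0.9
```

A pairwise matrix like the one above would come from applying the same function to every pair of criteria.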

Our work: conclusions

1. Do users agree with each other on the subtopics they form bundles on?
   • Some agreement between users
   • Information objects could be focused on popular facets

2. How do users aggregate information to build bundles?
   • Vertically diverse
   • Relatively compact
   • Pivots & ornaments

3. Which bundle characteristics are most important to users?
   • Hard to assess ind.
   • Relevance
   • Cohesion / diversity

Take home

Searchers want more complex ways of interacting with SERPs, if done right.

Results composition can be used to generate more complex information objects.

Understanding searcher needs in such contexts is very important.

Thank you!

from Horațiu Bota, Ke Zhou, Joemon Jose
School of Computing Science
@horatiubota

This work was partially funded by the LiMoSINe project.
www.limosine-project.eu

Interface code is on GitHub; check www.horatiubota.com for details.