Coping with Noisy Search Experiences

Pierre-Antoine Champin, Peter Briggs, Maurice Coyle, Barry Smyth
LIRIS, Université de Lyon, France
Clarity, University College Dublin, Ireland

16 December 2009


Page 1: Coping with Noisy Search Experiences

Page 2: Coping with Noisy Search Experiences

Structure of the Talk

1. Context
2. Addressed Problem
3. Proposals and results
4. Perspectives

Page 3: Coping with Noisy Search Experiences

Context

Structure of the Talk

1. Context
   - Recommender Systems
   - Context Aware Recommendation
   - Social Search
2. Addressed Problem
3. Proposals and results
4. Perspectives

Page 4: Coping with Noisy Search Experiences

Context

HeyStaks

HeyStaks is a social, context-aware recommender system for web searches.

Page 5: Coping with Noisy Search Experiences

Context

Recommender Systems

Recommender Systems

Recommender systems aim at presenting users with information that suits their particular preferences or needs.

General-purpose search engines provide results based on an objective measure of relevance w.r.t. the query → same results for everyone.

Recommender systems for web search aim at personalising the results of search engines.

Page 6: Coping with Noisy Search Experiences

Context

Recommender Systems

HeyStaks as a Recommender System

HeyStaks is an extension for Firefox. It aims at integrating into users' habits rather than forcing them to change.

It recognizes result pages from popular search engines, and alters them in order to:

- promote links (move them up in the list),
- insert new links,

based on the user's past search experiences.

Past search experiences are acquired by:

- implicit feedback: query result click-through
- explicit feedback: tagging pages, voting, sharing
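The promote/insert step described on this slide can be sketched as follows. This is not HeyStaks' actual code, just a minimal illustration with made-up names: results already selected in the active stak move to the top, and recommended pages missing from the result list are inserted after them.

```python
# Minimal sketch (not HeyStaks' implementation) of promoting and
# inserting links in a search result page, based on the active stak.

def rerank(results, stak_urls, recommendations, max_inserts=3):
    """results: ordered result URLs from the search engine.
    stak_urls: URLs previously selected in the active stak.
    recommendations: URLs the stak suggests for this query."""
    promoted = [r for r in results if r in stak_urls]      # move up
    rest = [r for r in results if r not in stak_urls]
    inserted = [r for r in recommendations
                if r not in results][:max_inserts]         # new links
    return promoted + inserted + rest

page = rerank(
    results=["a.com", "b.com", "c.com"],
    stak_urls={"c.com"},
    recommendations=["d.com"],
)
# "c.com" is promoted to the top, "d.com" is inserted before the rest
```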


Page 8: Coping with Noisy Search Experiences

Context

Recommender Systems

Recommendations in HeyStaks

[screenshot: HeyStaks recommendations in a search result page]

Page 9: Coping with Noisy Search Experiences

Context

Context Aware Recommendation

Context Aware Recommendation

Search engines provide the same results for every user.

Recommender systems provide personalised results... but provide the same personalisation every time.

Nobody is only one user... their needs depend on the context, especially when considering Web searches.

My searches are sometimes related to my research, my teaching, my leisure... → need for different recommendations in different contexts.


Page 11: Coping with Noisy Search Experiences

Context

Context Aware Recommendation

Search Staks

A search stak is a repository of search experiences, all related to the same context.

Users can create as many staks as they need.

They manually select the active stak (current context).

The active stak is where search experiences will be collected, and where they will be tapped to provide recommendations.

Page 12: Coping with Noisy Search Experiences

Context

Social Search

Social Search

Social search is the process of sharing search experiences between like-minded Web searchers.

In HeyStaks, social search is made possible by shared staks: several users can contribute to, and receive recommendations from, the same stak.

Staks can be:

- private: only people I invite can join.
- public: anyone can join.

Page 13: Coping with Noisy Search Experiences

Context

Social Search

HeyStaks Portal

[screenshot: the HeyStaks portal]

Page 14: Coping with Noisy Search Experiences

Addressed Problem

Structure of the Talk

1. Context
2. Addressed Problem
3. Proposals and results
4. Perspectives

Page 15: Coping with Noisy Search Experiences

Addressed Problem

The Problem of Stak Selection

Users fail to select the appropriate stak before starting a search.

- The recommendations they get will be less relevant.
- Their search experience is filed in the wrong stak.

→ HeyStaks ends up with noisy experience repositories, and provides less accurate recommendations (even when the correct stak is selected).

Page 16: Coping with Noisy Search Experiences

Addressed Problem

Implemented Workarounds

Fall back to the default stak when idle.

- limits the input noise
- potentially reduces context awareness

Signal when other staks provide recommendations.

- improves the relevance of recommendations, despite a wrong active stak
- encourages the user to select the right stak

Page 17: Coping with Noisy Search Experiences

Addressed Problem

Other Possible Solutions

Automatically select the right stak at query time.

- almost impossible if based on the query terms alone
- hazardous if based on the available recommendations
- technically complicated if based on external indicators (time-tracking tools, browsing history...)

Help stak owners to maintain (or curate) their staks.

- use classification techniques
- recommend the correct stak for a page
- pages are easier to classify than queries


Page 20: Coping with Noisy Search Experiences

Addressed Problem

Training the Page Classifier

Most work on coping with noise in recommender systems assumes that a clean training set is available before noise is encountered.

We need to find the kernel of each stak: the set (or a subset) of the pages actually relevant to that stak.

How can we find a reliable kernel? How can we evaluate its reliability?

Page 21: Coping with Noisy Search Experiences

Proposals and results

Structure of the Talk

1. Context
2. Addressed Problem
3. Proposals and results
   - Clustering
   - Popularity weighting
   - Popularity-based kernel
4. Perspectives

Page 22: Coping with Noisy Search Experiences

Proposals and results

Clustering

Clustering

Idea: use clustering techniques to identify a candidate kernel.

Rationale: kernel pages must be somehow similar, while noisy pages will be heterogeneous.

Problem: huge variability depending on numerous parameters:

- comparing terms or pages
- different similarity measures
- different clustering algorithms
- threshold values
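The clustering idea above can be illustrated with a toy sketch. This is not the paper's implementation; the similarity measure (Jaccard over term sets), the threshold, and the page data are all assumptions for illustration: similar-enough pages are grouped together, and the largest group is taken as the candidate kernel.

```python
# Toy sketch (assumed similarity measure and data) of using clustering
# to find a candidate kernel: pages are bags of terms, pages whose
# Jaccard similarity exceeds a threshold join the same group, and the
# largest group is taken as the candidate kernel.

def jaccard(a, b):
    return len(a & b) / len(a | b)

def candidate_kernel(pages, threshold=0.3):
    """pages: dict mapping page id -> set of terms."""
    clusters = []  # list of lists of page ids
    for pid, terms in pages.items():
        for cluster in clusters:
            # join the first cluster containing a similar-enough page
            if any(jaccard(terms, pages[q]) >= threshold for q in cluster):
                cluster.append(pid)
                break
        else:
            clusters.append([pid])
    return max(clusters, key=len)

pages = {
    "p1": {"python", "recursion", "tutorial"},
    "p2": {"python", "recursion", "examples"},
    "p3": {"python", "tutorial", "examples"},
    "p4": {"cooking", "pasta"},  # noisy page, ends up in its own cluster
}
candidate_kernel(pages)  # p1, p2, p3 form the candidate kernel
```

The threshold value is exactly the kind of parameter the slide warns about: small changes can split or merge the candidate kernel.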

Page 23: Coping with Noisy Search Experiences

Proposals and results

Popularity weighting

Popularity Weighting

Idea: use a measure of the popularity of pages as a proxy to relevance, in order to provide a fuzzy kernel.

Rationale: kernel pages are repeatedly selected, while noisy pages will only be accidentally selected.
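The deck defines its actual popularity measure on a figure-only slide, so the formula below is a stand-in assumption: count how often each page was selected in the stak, then normalize by the maximum count, yielding a fuzzy degree of kernel membership in [0, 1].

```python
# Assumed popularity measure (the deck's exact formula is on a figure
# not reproduced here): selection counts normalized by the maximum,
# giving each page a fuzzy degree of membership in the kernel.

from collections import Counter

def normalized_popularity(selections):
    """selections: list of page ids, one entry per selection event."""
    counts = Counter(selections)
    top = max(counts.values())
    return {page: n / top for page, n in counts.items()}

normalized_popularity(["a", "a", "a", "a", "b", "b", "c"])
# a -> 1.0, b -> 0.5, c -> 0.25 ("c" was only accidentally selected)
```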

Page 24: Coping with Noisy Search Experiences

Proposals and results

Popularity weighting

Popularity Measure

[figure: definition of the popularity measure]


Page 28: Coping with Noisy Search Experiences

Proposals and results

Popularity weighting

User Evaluation

Poll:

- for each of the 20 biggest shared staks
- 15 most popular pages & 15 least popular pages
- presented in random order to the stak owner
- asked if each page is relevant to the stak
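The poll construction above can be sketched as follows; the page ids and scores are made up, and the seeding is only there to make the sketch reproducible. For each stak, the 15 most and 15 least popular pages are selected and shuffled so the owner cannot infer the popularity order.

```python
# Sketch of the poll construction (assumed data): take the 15 most and
# 15 least popular pages of a stak, then shuffle before presenting
# them to the stak owner.

import random

def poll_pages(popularity, n=15, seed=0):
    """popularity: dict mapping page id -> popularity score."""
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    selected = ranked[:n] + ranked[-n:]       # most and least popular
    random.Random(seed).shuffle(selected)     # hide the popularity order
    return selected

scores = {f"page{i}": i for i in range(100)}
len(poll_pages(scores))  # 30 pages per stak
```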

Page 29: Coping with Noisy Search Experiences

Proposals and results

Popularity weighting

Poll Results

[figure: histogram of the number of documents (0–100) per popularity bin (0.00–0.90), broken down by owner answer: Irrelevant / I don't know / Relevant]

Page 30: Coping with Noisy Search Experiences

Proposals and results

Popularity weighting

Experiment

Classifier:

- decision tree / Naive Bayes
- for each of the 20 biggest shared staks
- trained with every page, weighted by normalized popularity
- 10-fold cross-validation

Accuracy:

- each page contributes to the accuracy proportionally to its normalized popularity
→ it is more important for the classifier to recognize popular pages than unpopular pages.
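The weighted accuracy described above can be written out directly; the labels and weights below are made-up illustration data. Each test page contributes proportionally to its normalized popularity, so a mistake on a popular page costs more than one on an unpopular page.

```python
# Sketch of the popularity-weighted accuracy (assumed data): each page
# contributes to the score proportionally to its normalized popularity.

def weighted_accuracy(y_true, y_pred, weights):
    correct = sum(w for t, p, w in zip(y_true, y_pred, weights) if t == p)
    return correct / sum(weights)

y_true = ["stak1", "stak1", "stak2", "stak2"]
y_pred = ["stak1", "stak2", "stak2", "stak2"]
weights = [1.0, 0.1, 0.8, 0.5]  # normalized popularities
weighted_accuracy(y_true, y_pred, weights)
# the single error falls on an unpopular page, so the score stays high
```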

Page 31: Coping with Noisy Search Experiences

Proposals and results

Popularity weighting

Experimental Results

[figure: weighted accuracy (0.1–0.8) of the J48, NaiveBayes and ZeroR classifiers, comparing weighted, boolean and unweighted training]

Page 33: Coping with Noisy Search Experiences

Proposals and results

Popularity-based kernel

Validity of the Measure

[figure: weighted accuracy (0–1) of J48, NaiveBayes and ZeroR as a function of the minimum normalized popularity (0–1)]

Page 34: Coping with Noisy Search Experiences

Proposals and results

Popularity-based kernel

Kernel-Trained Classifier

[figure: weighted accuracy for np > 0.6 of J48, NaiveBayes and ZeroR, comparing whole-trained and kernel-trained classifiers]
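The kernel-trained setup can be sketched as a filtering step before training; the page data below is made up, and the 0.6 threshold is the one shown on the slide: only pages whose normalized popularity exceeds the threshold are kept as training examples.

```python
# Sketch of the popularity-based kernel (assumed data; threshold 0.6
# from the slides): only pages above the threshold are kept, and the
# classifier is then trained on this cleaner subset.

def kernel(pages, np_scores, threshold=0.6):
    """pages: dict page id -> (terms, stak label);
    np_scores: dict page id -> normalized popularity."""
    return {pid: pages[pid]
            for pid in pages if np_scores[pid] > threshold}

pages = {
    "p1": ({"python", "recursion"}, "programming"),
    "p2": ({"pasta", "recipe"}, "programming"),  # noisy, rarely selected
    "p3": ({"python", "tutorial"}, "programming"),
}
np_scores = {"p1": 1.0, "p2": 0.2, "p3": 0.7}
kernel(pages, np_scores)  # only p1 and p3 remain in the training set
```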

Page 35: Coping with Noisy Search Experiences

Perspectives

Structure of the Talk

1. Context
2. Addressed Problem
3. Proposals and results
4. Perspectives

Page 36: Coping with Noisy Search Experiences

Perspectives

Further Work

Integrate the classifier into the stak maintenance page.

- Which user interface?
- How to integrate user feedback into the classifier?

Use the classifier to recommend a stak at query time.

- First experiments are not satisfying.

Page 37: Coping with Noisy Search Experiences

Perspectives

Perspectives

Beyond the specific application domain of this work...

Coping with noisy knowledge bases.

- Relevant for experience repositories
- Relevant for large-scale (Web-based) systems

Using knowledge for a different purpose.

- What other purposes can a knowledge base serve?
- What properties of the knowledge base make it easy/hard to reuse?

"Experience is the name everyone gives to their mistakes."
— Oscar Wilde