TRANSCRIPT

Coping with Noisy Search Experiences
Pierre-Antoine Champin, Peter Briggs, Maurice Coyle, Barry Smyth
LIRIS, Université de Lyon, France
Clarity, University College Dublin, Ireland
16 December 2009
Structure of the Talk
1 Context: Recommender Systems; Context-Aware Recommendation; Social Search
2 Addressed Problem
3 Proposals and Results: Clustering; Popularity Weighting; Popularity-Based Kernel
4 Perspectives
Context
HeyStaks
HeyStaks is a social, context-aware recommender system for web searches.
Recommender Systems
Recommender systems aim at presenting users with information that suits their particular preferences or needs.
General-purpose search engines provide results based on an objective measure of relevance w.r.t. the query → the same results for everyone.
Recommender systems for web search aim at personalising the results of search engines.
HeyStaks as a Recommender System
HeyStaks is an extension for Firefox. It aims at integrating into users' habits rather than forcing them to change.
It recognizes result pages from popular search engines, and alters them in order to:
promote links (move them up in the list),
insert new links,
based on the user's past search experiences.
Past search experiences are acquired by:
implicit feedback: click-through on query results;
explicit feedback: tagging pages, voting, sharing.
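The promote/insert mechanism described above can be sketched as follows. This is a minimal illustration only, assuming a stak exposes a set of previously experienced pages and a list of recommendable pages; the actual HeyStaks scoring and placement logic is not shown here.

```python
def rerank(results, stak_pages, recommendations, max_inserts=3):
    """Sketch of HeyStaks-style result manipulation:
    move results already known to the stak to the top,
    then insert new links the search engine did not return."""
    promoted = [r for r in results if r in stak_pages]
    others = [r for r in results if r not in stak_pages]
    inserted = [r for r in recommendations if r not in results][:max_inserts]
    return promoted + inserted + others
```

For instance, with results ["a", "b", "c"], a stak containing "c" and a recommendation "d", the page "c" is promoted and "d" is inserted before the remaining results.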
Recommendations in HeyStaks
[screenshot: HeyStaks recommendations integrated into a search-engine result page]
Context-Aware Recommendation
Search engines provide the same results for every user.
Recommender systems provide personalised results... but provide the same personalisation every time.
Nobody is only one user... their needs depend on the context, especially when considering Web searches.
My searches are sometimes related to my research, my teaching, my leisure... → the need for different recommendations in different contexts.
Search Staks
A search stak is a repository of search experiences, all related to the same context.
Users can create as many staks as they need.
They manually select the active stak (current context).
The active stak is where search experiences will be collected, and where they will be tapped to provide recommendations.
Social Search
Social search is the process of sharing search experiences between like-minded Web searchers.
In HeyStaks, social search is made possible by shared staks: several users can contribute to, and receive recommendations from, the same stak.
Staks can be:
private: only people I invite can join;
public: anyone can join.
HeyStaks Portal
[screenshot: the HeyStaks web portal]
Addressed Problem
The Problem of Stak Selection
Users fail to select the appropriate stak before starting a search.
The recommendations they get will be less relevant.
Their search experiences are filed in the wrong stak.
→ HeyStaks ends up with noisy experience repositories, and provides less accurate recommendations (even when the correct stak is selected).
Implemented Workarounds
Fall back to the default stak when idle:
limits the input noise;
potentially reduces context awareness.
Signal when other staks provide recommendations:
improves the relevance of recommendations, despite a wrong active stak;
encourages users to select the right stak.
Other Possible Solutions
Automatically select the right stak at query time:
almost impossible if based on the query terms alone;
hazardous if based on the available recommendations;
technically complicated if based on external indicators (time-tracking tools, browsing history...).
Help stak owners maintain (or curate) their staks:
use classification techniques;
recommend the correct stak for a page;
pages are easier to classify than queries.
Training the Page Classifier
Most work on coping with noise in recommender systems assumes that a clean training set is available before noise is encountered.
We need to find the kernel of each stak: the set (or a subset) of the pages actually relevant to that stak.
How can we find a reliable kernel? How can we evaluate its reliability?
Proposals and Results
Clustering
Idea: use clustering techniques to identify a candidate kernel.
Rationale: kernel pages must be somehow similar, while noisy pages will be heterogeneous.
Problem: huge variability depending on numerous parameters:
comparing terms or pages;
different similarity measures;
different clustering algorithms;
threshold values.
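To illustrate the parameter sensitivity noted above, here is a minimal sketch of one such configuration: cosine similarity over term-weight vectors, keeping pages similar to at least one other page. Both the similarity measure and the 0.5 threshold are arbitrary choices among the parameters the slide lists, not the ones evaluated in the talk.

```python
import math

def cosine(a, b):
    """Cosine similarity between two term-weight dictionaries."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def candidate_kernel(pages, threshold=0.5):
    """Candidate kernel: pages similar enough (above the threshold)
    to at least one other page in the stak."""
    return {p for p, vec in pages.items()
            if any(q != p and cosine(vec, other) >= threshold
                   for q, other in pages.items())}
```

Changing the threshold, the measure, or whether terms or whole pages are compared can change the candidate kernel substantially, which is precisely the variability problem described.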
Popularity Weighting
Idea: use a measure of the popularity of pages as a proxy for relevance, in order to provide a fuzzy kernel.
Rationale: kernel pages are repeatedly selected, while noisy pages will only be accidentally selected.
Popularity Measure
[formula slide: definition of the normalized popularity of a page within a stak; the formula itself is not preserved in this transcript]
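The exact popularity formula is not preserved in this transcript. As a plausible stand-in only (an assumption, not the HeyStaks definition), normalized popularity can be sketched as a page's selection count divided by the highest selection count in the stak, yielding a score in [0, 1]:

```python
def normalized_popularity(selections):
    """Hypothetical normalized popularity: selection count of each page
    divided by the maximum selection count in the stak.
    selections: dict page -> number of times the page was selected."""
    top = max(selections.values())
    return {page: count / top for page, count in selections.items()}
```

Under this sketch, the most popular page gets weight 1.0 and rarely selected (likely noisy) pages get weights close to 0.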
User Evaluation
Poll: for each of the 20 biggest shared staks,
the 15 most popular and the 15 least popular pages
were presented in random order to the stak owner,
who was asked whether each page is relevant to the stak.
Poll Results
[bar chart: number of documents (0–100) by popularity (0.00–0.90), each bar split into "Irrelevant", "I don't know" and "Relevant"]
Experiment
Classifier: decision tree / Naive Bayes,
for each of the 20 biggest shared staks,
trained with every page, weighted by normalized popularity,
10-fold cross-validation.
Accuracy: each page contributes to the accuracy proportionally to its normalized popularity → it is more important for the classifier to recognize popular pages than unpopular pages.
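The weighted accuracy described above can be written as follows (a straightforward reading of the slide; the variable names are mine):

```python
def weighted_accuracy(y_true, y_pred, popularity):
    """Accuracy where each page counts proportionally to its
    normalized popularity: correctly classified weight / total weight."""
    correct = sum(w for t, p, w in zip(y_true, y_pred, popularity) if t == p)
    return correct / sum(popularity)
```

For example, with weights [1.0, 0.5, 0.5] and one misclassified page of weight 0.5, the weighted accuracy is 1.5 / 2.0 = 0.75, whereas plain accuracy would be 2/3.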
Experimental Results
[bar chart: weighted accuracy (0.1–0.8) for the J48, NaiveBayes and ZeroR classifiers, comparing popularity-weighted training with boolean unweighted training]
Validity of the Measure
[line chart: weighted accuracy of J48, NaiveBayes and ZeroR as a function of the minimum normalized popularity, both axes ranging from 0 to 1]
Kernel-Trained Classifier
[bar chart: weighted accuracy for pages with np > 0.6, for J48, NaiveBayes and ZeroR, comparing whole-trained and kernel-trained classifiers]
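The kernel-trained variant restricts the training set to high-popularity pages. A minimal sketch, with the 0.6 threshold taken from the slide (the selection rule itself is an assumption based on the "np > 0.6" label):

```python
def kernel_training_set(pages, norm_pop, threshold=0.6):
    """Keep only pages whose normalized popularity exceeds the threshold;
    these form the popularity-based kernel used to train the classifier."""
    return [p for p in pages if norm_pop[p] > threshold]
```

The classifier is then trained on this kernel only, instead of on the whole (noisy) stak contents.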
Perspectives
Further Work
Integrate the classifier into the stak maintenance page:
Which user interface? How to integrate user feedback into the classifier?
Use the classifier to recommend a stak at query time:
first experiments are not satisfying.
Perspectives
Beyond the specific application domain of this work...
Coping with noisy knowledge bases:
relevant for experience repositories;
relevant for large-scale (Web-based) systems.
Using knowledge for a different purpose:
What other purposes can a knowledge base serve?
What properties of the knowledge base make it easy/hard to reuse?

"Experience is the name everyone gives to their mistakes."
— Oscar Wilde