Barbara Made the News:
Mining the Behavior of Crowds
for Time-Aware Learning to Rank
Flávio Martins*, João Magalhães*, and Jamie Callan†
* NOVA LINCS, Universidade NOVA de Lisboa
† LTI, Carnegie Mellon University
The 9th ACM International Conference on Web Search and Data Mining
San Francisco, California, USA. 20-25 February, 2016
Twitter real-time ad-hoc retrieval
• Vocabulary mismatch
• Temporal dimension
• High volume / Lots of users
Why Time-aware Ranking?

What do people search on Twitter? #TwitterSearch [Teevan et al. 2011]

Temporally relevant information
• breaking news, real-time content, popular trends, etc.

Information related to people
• information about people of interest, general sentiment and opinion, etc.
Time-aware Ranking
Recency-based ranking
• Time-based language models [Li and Croft 2003]
• Estimation methods for ranking recent information [Efron and Golovchinsky 2011]
Time-dependent ranking
• Temporal profiles of queries [Jones and Diaz 2007]
• Time-sensitive queries [Dakka et al. 2012]
• Temporal feedback for tweet search [Efron et al. 2014]
Temporal distribution of documents
• Histograms vs Probability Density function
P(D) ∝ e^(−λ·age(D))

P(D | Q) ∼ P(W_D | Q) · P(T_D | Q) ∝ P(Q | W_D) · P(W_D) · P(T_D)
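The exponential recency prior can be sketched in a few lines (a minimal illustration; the decay rate `lam` is a hypothetical value, not one from the paper):

```python
import math

def recency_prior(age_days: float, lam: float = 0.01) -> float:
    """Exponential recency prior P(D) ∝ exp(−λ·age(D)):
    newer documents receive a higher prior probability of relevance."""
    return math.exp(-lam * age_days)

# A document published today has a higher prior than one 100 days old.
fresh = recency_prior(0.0)
stale = recency_prior(100.0)
```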
Time-aware Ranking
Most prior work gathers temporal information from a single source, typically
the corpus itself or an external source such as Wikipedia.
These strategies assume that all events will have an impact on Twitter or
Wikipedia, and that the temporal signal available from these sources will be
clear and unambiguous; however, that is not always the case.
Temporal feedback: Twitter as the only
source of data can fail for some queries.
Mining the Behavior of Crowds
What is happening now prompts users on the Web to produce and interact
with new posts about newsworthy events, giving rise to trending topics.
We leverage the behavioral dynamics of the crowd to estimate a topic's
temporal relevance, i.e., the most relevant time periods for a search query.
Time-aware Ranking with Multiple Sources
Our approach builds on two major novelties:
1. First, a unifying approach that, given a query q, mines and represents temporal
evidence from multiple sources of crowd signals. This allows us to predict
the temporal relevance of documents for query q.
2. Second, a principled retrieval model that integrates temporal signals in a
learning to rank framework, to rank results according to the predicted
temporal relevance.
Time-aware Ranking with Multiple Sources
A Unified Representation of Temporal Crowd Signals
• Considering a set S = {S₁, S₂, …, S_k} of information sources that reflect a real-world event, our goal is to discover the relevance of a timestamp t_d^b for a particular query, P(r | t_d^b, q_a, s_k). In other words, we wish to infer a function:

f_{s_k}(q_a, t_b) ∈ [0, 1]

• The probability density function f_{s_k}(t_d^b | q_a) is approximated by a kernel density estimate, which is advantageous due to the natural smoothness of the resulting function:

f_{s_k}(t) = (1 / (n·h)) · Σ_{i=0}^{n} w_i · K((t − t_i) / h)

where K(z) is the kernel function and h the bandwidth, set by the rule of thumb ĥ* = 1.06 · σ̂ · n^(−1/5).
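The kernel density estimate above can be sketched as follows (an illustrative implementation with a Gaussian kernel and the rule-of-thumb bandwidth; the function names are ours, not the paper's):

```python
import math

def silverman_bandwidth(timestamps):
    """Rule-of-thumb bandwidth: h = 1.06 * sigma_hat * n^(-1/5)."""
    n = len(timestamps)
    mean = sum(timestamps) / n
    sigma = math.sqrt(sum((t - mean) ** 2 for t in timestamps) / n)
    return 1.06 * sigma * n ** (-0.2)

def kde(t, timestamps, weights, h):
    """Weighted kernel density estimate with a Gaussian kernel K."""
    gauss = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    n = len(timestamps)
    return sum(w * gauss((t - ti) / h)
               for ti, w in zip(timestamps, weights)) / (n * h)

# Tweets clustered around day 3 yield a high density there and a low one at day 30.
days = [2.0, 3.0, 3.5, 4.0, 20.0]
w = [1.0] * len(days)
h = silverman_bandwidth(days)
```

The smoothness of the Gaussian kernel is what makes the resulting density preferable to a raw histogram of timestamps.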
A Unified Representation of Temporal Crowd Signals

Temporal Feedback (query-likelihood document score):
    w_i^{s_t} = P(q_a | d_i) / Σ_{j=1}^{n} P(q_a | d_j)

Wikipedia Edits (frequency of query q_a terms in edit):
    w_i^{s_e} = Σ_j TF(q_a, e_j)

News (Jaccard similarity coefficient of the terms of query q_a with news headline h_i):
    w_i^{s_h} = J(q_a, h_i)

Wikipedia Views (normalized page views per day):
    w_i^{s_v} = v_i − min(v)
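As an illustration of one of these weights, the News weight is a plain Jaccard coefficient over term sets (a minimal sketch; whitespace tokenization and lowercasing are our simplifying assumptions):

```python
def jaccard(query: str, headline: str) -> float:
    """Jaccard coefficient J(q_a, h_i) between the term sets
    of a query and a news headline."""
    q = set(query.lower().split())
    h = set(headline.lower().split())
    union = q | h
    return len(q & h) / len(union) if union else 0.0

# 2 shared terms ("barbara", "walters") out of 7 distinct terms.
score = jaccard("barbara walters chicken pox",
                "who gave barbara walters chickenpox")
```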
Evaluation: Baselines and methods
Standard retrieval functions
• BM25 retrieval function [Robertson et al. 1994]
• IDF – Inverse Document Frequency
• LM.Dir – Language models with Dirichlet smoothing [Zhai and Lafferty 2004]
Time-aware ranking methods
• Recency – Time-based language models [Li and Croft 2003]
• KDE(score) – Temporal feedback for tweet search [Efron et al. 2014]
Learning to rank
• LTR – Learning to rank model with lexical and domain features (Non-temporal)
• RMTS – Learning to rank model with multiple temporal sources (Temporal)
Learning to Rank: Ranking framework

LTR(q_a, d_b) = Σ_i λ_i·f_{l_i}(q_a, d_b) + Σ_j λ_j·f_{c_j}(d_b)
                (lexical features)         (domain features)

RMTS(q_a, t_d^b, d_b) = Σ_i λ_i·f_{l_i}(q_a, d_b) + Σ_j λ_j·f_{c_j}(d_b) + Σ_k λ_k·f_{s_k}(q_a, t_d^b)
                        (lexical features)         (domain features)      (temporal features)
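Both ranking functions are plain weighted linear combinations of feature values, which can be sketched as (illustrative only; the feature values and weights below are placeholders):

```python
def rmts_score(weights, lexical, domain, temporal):
    """RMTS as a weighted linear combination of lexical, domain,
    and temporal feature values."""
    features = list(lexical) + list(domain) + list(temporal)
    assert len(weights) == len(features)
    return sum(l * f for l, f in zip(weights, features))

def ltr_score(weights, lexical, domain):
    """LTR is the same combination without the temporal block."""
    return rmts_score(weights, lexical, domain, [])
```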
LTR: Lexical and domain features

Lexical features
• BM25: Okapi BM25 score for tweet-text.
• LM.Dir: Language modeling score for tweet-text.
• IDF: Sum of term IDF in tweet-text.

Tweet features
• Length: Tweet-text length.
• NumURLs: Number of URLs in tweet-text.
• HasURLs: True if tweet-text contains URLs.
• NumHashtags: Number of hashtags in tweet-text.
• HasHashtags: True if tweet-text contains hashtags.
• NumMentions: Number of mentions in tweet-text.
• HasMentions: True if tweet-text contains mentions.
• isReply: True if tweet-text is a reply.

Credibility features
• NumStatuses: Number of user's statuses.
• NumFollowers: Number of user's followers.
RMTS: LTR + Temporal ranking features

• Each temporal feature represents the likelihood of relevance of a document (tweet) according to its publication timestamp.
• This value is obtained using Kernel Density Estimation over the data mined from each source.

Temporal features and their sources:
• Recency (R): Recency prior [15].
• Twitter Feedback (TF): Temporal feedback [9].
• Wikipedia Views (WV): Wikipedia article page views.
• Wikipedia Edits (WE): Wikipedia article page edits.
• News: News headlines.
Learning to Rank: Coordinate Ascent
We can learn the optimal coefficients for the features with a learning to rank
method optimizing retrieval performance for a set of training queries.
Coordinate Ascent [Metzler and Croft 2007] can optimize a retrieval metric
(e.g., MAP) directly. It was shown to obtain higher performance than other
learning-to-rank methods on Twitter datasets [Xu et al. 2014].
It also allows us to interpret the model by inspecting the learned feature weights.
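Coordinate Ascent itself can be sketched as a greedy search over one weight at a time (a toy version: the candidate grid and sweep count are our assumptions, and a real objective would be MAP over a set of training queries):

```python
def coordinate_ascent(objective, n_dims,
                      candidates=(0.0, 0.25, 0.5, 1.0, 2.0), sweeps=3):
    """Greedy coordinate ascent: optimize one weight at a time over a
    grid of candidate values, holding the others fixed, directly on
    the (possibly non-smooth) retrieval metric."""
    w = [1.0] * n_dims
    for _ in range(sweeps):
        for i in range(n_dims):
            best_v, best_obj = w[i], objective(w)
            for v in candidates:
                w[i] = v
                score = objective(w)
                if score > best_obj:
                    best_v, best_obj = v, score
            w[i] = best_v  # keep the best value found for this coordinate
    return w
```

Because the metric is evaluated directly, no gradient is needed, which is what makes rank-based metrics like MAP usable as objectives.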
Evaluation: TREC Microblog datasets
The Tweets2013 corpus is a 1% sample of Twitter spanning the period from
1 February to 31 March 2013, containing approximately 240 million tweets.
Topics
• TREC 2013 Microblog queries: train set of 60 queries (20% used for validation)
• TREC 2014 Microblog queries: test set of 55 queries
Relevance judgments are on a three-point scale of “interestingness”:
• Not relevant
• Relevant
• Highly relevant
Results: Temporal and Atemporal queries
Temporal queries/model Atemporal queries/model
Results: All model vs T+A model
Contributions of Individual Sources

R: Recency
TF: Twitter feedback
WV: Wikipedia Views
WE: Wikipedia Edits
News: AP, Reuters, USA Today, BBC

Of the external sources, News provided the greatest performance increase.
Twitter feedback provided the best increase in MAP, but not in P@30.

* denotes a statistically significant improvement over LTR at p < 0.05 (** for p < 0.01)
Feature contribution: Recency
19
Feature contribution: Temporal Feedback
Feature contribution: Wikipedia
Feature contribution: News
Barbara Walters, chicken pox
UK passes marriage bill
Port Said football riot, death sentences
Hostess bought by Apollo
Barbara made the news!
News about Barbara's recovery and her return to the U.S. television talk show "The View" on ABC:

• "Who gave Barbara Walters chickenpox?"
• "Barbara Walters plans 'View' return"
• "Barbara Walters to return to 'The View' March 4"
• "Barbara Walters returning to 'The View' on Monday"
• "Barbara Walters to return to TV's 'The View' next week"
Time-aware Ranking with Multiple Sources
• Starting from the temporal cluster hypothesis:
“in search tasks where time plays an important role, do
relevant documents tend to cluster together in time?”
[Efron et al. 2014]
• This leads to our research hypothesis: do the time periods where
relevant documents cluster together correlate highly with an increased
"interest" or "activity" in pages on the Web related to the topic?
Query: Barbara Walters, chicken pox
Conclusions
Time-aware ranking improves precision: our approach statistically
significantly outperforms the BM25 and LM.Dir models by approximately 13.2%
in MAP, and a strong learning-to-rank model (LTR) by 6.2%.
RMTS is less biased: it explores temporal signals from multiple Web sources
to estimate the temporal relevance of search topics.
Unified representation of temporal signals: a representation that allows
predicting temporal relevance from multiple heterogeneous sources.
Thanks
SIGIR Student Travel Grant
http://cognitus-h2020.eu