Barbara Made the News: Mining the Behavior of Crowds for Time-Aware Learning to Rank


Barbara Made the News:

Mining the Behavior of Crowds

for Time-Aware Learning to Rank

Flávio Martins*, João Magalhães*, and Jamie Callan†

* NOVA LINCS, Universidade NOVA de Lisboa
† LTI, Carnegie Mellon University

The 9th ACM International Conference on Web Search and Data Mining

San Francisco, California, USA. 20-25 February, 2016


Why Time-aware Ranking?

Twitter real-time ad-hoc retrieval:
• Vocabulary mismatch
• Temporal dimension
• High volume / lots of users

What do people search on Twitter? #TwitterSearch [Teevan et al. 2011]
• Temporally relevant information: breaking news, real-time content, popular trends, etc.
• Information related to people: information about people of interest, general sentiment and opinion, etc.

Time-aware Ranking

Recency-based ranking

• Time-based language models [Li and Croft 2003]

• Estimation methods for ranking recent information [Efron and Golovchinsky 2011]

Time-dependent ranking

• Temporal profiles of queries [Jones and Diaz 2007]

• Time-sensitive queries [Dakka et al. 2012]

• Temporal feedback for tweet search [Efron et al. 2014]

Temporal distribution of documents

• Histograms vs. probability density functions

Recency prior [Li and Croft 2003]:

$P(D) \propto e^{-r \cdot \mathrm{age}(D)}$

Time-based language model, combining word ($W_D$) and time ($T_D$) evidence:

$P(D \mid Q) \sim P(W_D \mid Q)\, P(T_D \mid Q) \propto P(Q \mid W_D)\, P(Q \mid T_D)\, P(W_D)\, P(T_D)$
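The recency prior above can be sketched in a few lines; the decay rate `r` here is an illustrative value, not the tuned parameter from the paper:

```python
import math

def recency_prior(age_days: float, r: float = 0.01) -> float:
    # Exponential recency prior: P(D) proportional to e^(-r * age(D)).
    # r is an illustrative decay rate; in practice it is tuned per collection.
    return math.exp(-r * age_days)

# A document published right now gets prior 1.0; older documents decay toward 0.
```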

Time-aware Ranking

Most prior work gathers temporal information from a single source, typically the corpus itself or an external source such as Wikipedia.

These strategies assume that all events will have an impact on Twitter or Wikipedia, and that the temporal signal available from Twitter or Wikipedia will be clear and unambiguous; however, that is not always the case.

Temporal feedback: Twitter as the only source of data can fail for some queries.


Mining the Behavior of Crowds

What is happening now prompts users on the Web to produce and interact with new posts about newsworthy events, giving rise to trending topics.

We leverage the behavioral dynamics of the crowd to estimate a topic's temporal relevance, i.e., the most relevant time periods for a search query.


Time-aware Ranking with Multiple Sources

Our approach builds on two major novelties:

1. A unifying approach that, given a query q, mines and represents temporal evidence from multiple sources of crowd signals. This allows us to predict the temporal relevance of documents for query q.

2. A principled retrieval model that integrates the temporal signals in a learning-to-rank framework, ranking results according to the predicted temporal relevance.


Time-aware Ranking with Multiple Sources


A Unified Representation of Temporal Crowd Signals

• Considering a set of information sources $S = \{S_1, S_2, \ldots, S_k\}$ that reflect a real-world event, our goal is to discover the relevance of a timestamp for a particular query. In other words, we wish to infer a function

$f_{s_k}(q_a, t_{d_b}) \rightarrow [0, 1]$,

interpreted as $P(r \mid t_{d_b}, q_a, s_k)$: the probability of relevance of a document published at time $t_{d_b}$, given query $q_a$ and source $s_k$.

• The probability density function $f_{s_k}(t_{d_b} \mid q_a)$ is approximated by a kernel density estimation, which is advantageous due to the natural smoothness of the resulting function:

$f_{s_k}(t) = \frac{1}{n_{s_k} h} \sum_{i=1}^{n_{s_k}} w_i\, K\!\left(\frac{t - t_i}{h}\right)$,

where $K(z)$ is the kernel and $h$ the bandwidth, set by Silverman's rule of thumb:

$h^* = 1.06\, \hat{\sigma}\, n^{-1/5}$
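The estimate above can be sketched as a weighted Gaussian KDE with Silverman's bandwidth; the function and variable names here are illustrative, not the authors' implementation:

```python
import numpy as np

def silverman_bandwidth(timestamps: np.ndarray) -> float:
    # Silverman's rule of thumb: h* = 1.06 * sigma_hat * n^(-1/5)
    return 1.06 * timestamps.std(ddof=1) * len(timestamps) ** (-1 / 5)

def temporal_relevance(t: float, timestamps: np.ndarray, weights: np.ndarray) -> float:
    # Weighted Gaussian KDE: with the weights normalized to sum to 1,
    # f(t) = (1/h) * sum_i w_i * K((t - t_i) / h) is a proper density.
    h = silverman_bandwidth(timestamps)
    z = (t - timestamps) / h
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)  # Gaussian K(z)
    w = weights / weights.sum()
    return float((w * kernel).sum() / h)
```

Timestamps come from the crowd-signal items matching the query, so the density peaks where relevant activity clusters in time.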


A Unified Representation of Temporal Crowd Signals

For each source, the KDE sample weights $w_i$ reflect how strongly each item matches the query:

• Temporal Feedback: query-likelihood document score, normalized:
$w_i^{s_t} = \frac{P(q_a \mid d_i)}{\sum_{j=1}^{n} P(q_a \mid d_j)}$

• Wikipedia Edits: frequency of query $q_a$ terms in the edit:
$w_i^{s_e} = \mathrm{TF}(q_a, e_i)$

• News: Jaccard similarity coefficient of the terms of query $q_a$ with news headline $h_i$:
$w_i^{s_h} = J(q_a, h_i)$

• Wikipedia Views: normalized page views per day:
$w_i^{s_v} = v_i - \min(v)$
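The per-source weights above are simple to compute; a sketch under the definitions on this slide (names are illustrative, and the views weight is a simplified reading):

```python
def news_weight(query_terms: set, headline_terms: set) -> float:
    # News source: Jaccard similarity of query terms and headline terms.
    union = query_terms | headline_terms
    return len(query_terms & headline_terms) / len(union) if union else 0.0

def edits_weight(query_terms: set, edit_tokens: list) -> int:
    # Wikipedia edits: frequency of query terms in the edit text.
    return sum(1 for tok in edit_tokens if tok in query_terms)

def feedback_weights(ql_scores: list) -> list:
    # Temporal feedback: query-likelihood scores normalized to sum to 1.
    total = sum(ql_scores)
    return [s / total for s in ql_scores]
```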


Evaluation: Baselines and methods

Standard retrieval functions

• BM25 retrieval function [Robertson et al. 1994]

• IDF – Inverse Document Frequency

• LM.Dir – Language models with Dirichlet smoothing [Zhai and Lafferty 2004]

Time-aware ranking methods

• Recency – Time-based language models [Li and Croft 2003]

• KDE(score) – Temporal feedback for tweet search [Efron et al. 2014]

Learning to rank

• LTR – Learning to rank model with lexical and domain features (Non-temporal)

• RMTS – Learning to rank model with multiple temporal sources (Temporal)


Learning to Rank: Ranking framework

$\mathrm{LTR}(q_a, d_b) = \sum_i \lambda_i\, f_l^i(q_a, d_b) + \sum_j \lambda_j\, f_c^j(d_b)$

Lexical features + domain features.

$\mathrm{RMTS}(q_a, t_{d_b}, d_b) = \sum_i \lambda_i\, f_l^i(q_a, d_b) + \sum_j \lambda_j\, f_c^j(d_b) + \sum_k \lambda_k\, f_{s_k}(q_a, t_{d_b})$

Lexical features + domain features + temporal features.
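Both scoring functions are plain weighted sums of feature values; a minimal sketch, with hypothetical feature vectors and weights:

```python
def ltr_score(lex_feats, dom_feats, lex_w, dom_w):
    # LTR: weighted sum of lexical and domain (tweet/credibility) features.
    return sum(w * f for w, f in zip(lex_w, lex_feats)) + \
           sum(w * f for w, f in zip(dom_w, dom_feats))

def rmts_score(lex_feats, dom_feats, tmp_feats, lex_w, dom_w, tmp_w):
    # RMTS: LTR plus one KDE-based temporal feature per crowd source.
    return ltr_score(lex_feats, dom_feats, lex_w, dom_w) + \
           sum(w * f for w, f in zip(tmp_w, tmp_feats))
```

With all temporal weights at zero, RMTS reduces exactly to LTR, which is what makes the two models directly comparable in the evaluation.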


LTR: Lexical and domain features

Lexical features:
• BM25: Okapi BM25 score for tweet-text.
• LM.Dir: Language modeling score for tweet-text.
• IDF: Sum of term IDF in tweet-text.

Tweet features:
• Length: Tweet-text length.
• NumURLs: Number of URLs in tweet-text.
• HasURLs: True if tweet-text contains URLs.
• NumHashtags: Number of hashtags in tweet-text.
• HasHashtags: True if tweet-text contains hashtags.
• NumMentions: Number of mentions in tweet-text.
• HasMentions: True if tweet-text contains mentions.
• isReply: True if tweet-text is a reply.

Credibility:
• NumStatuses: Number of user's statuses.
• NumFollowers: Number of user's followers.
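The tweet-side features above are cheap to extract; a sketch using simple regexes (the patterns are illustrative, not the authors' tokenization):

```python
import re

def tweet_features(text: str) -> dict:
    # Extract a subset of the LTR tweet features with simple patterns.
    urls = re.findall(r"https?://\S+", text)
    hashtags = re.findall(r"#\w+", text)
    mentions = re.findall(r"@\w+", text)
    return {
        "Length": len(text),
        "NumURLs": len(urls), "HasURLs": bool(urls),
        "NumHashtags": len(hashtags), "HasHashtags": bool(hashtags),
        "NumMentions": len(mentions), "HasMentions": bool(mentions),
        "isReply": text.startswith("@"),  # crude reply heuristic
    }
```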


RMTS: LTR + Temporal ranking features

• Each temporal feature represents the likelihood of relevance of a document (tweet) according to its publication timestamp.

• This value is obtained using Kernel Density Estimation over the data mined from each source.

Temporal features and their sources:
• Recency (R): Recency prior [15].
• Twitter Feedback (TF): Temporal feedback [9].
• Wikipedia Views (WV): Wikipedia article page views.
• Wikipedia Edits (WE): Wikipedia article page edits.
• News: News headlines.


Learning to Rank: Coordinate Ascent

We can learn the optimal coefficients for the features with a learning-to-rank method, optimizing retrieval performance for a set of training queries.

Coordinate Ascent [Metzler and Croft 2007] can optimize a retrieval metric directly (MAP). It was shown to obtain higher performance than other learning-to-rank methods on Twitter datasets [Xu et al. 2014].

It also allows us to interpret the model by looking at the feature weights.
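The coordinate-ascent loop can be sketched as follows: sweep the weights one at a time, keeping any step that improves the black-box retrieval metric. The step size and sweep count are illustrative, not the values used in the paper:

```python
def coordinate_ascent(weights, evaluate, step=0.1, sweeps=25):
    # Greedy coordinate ascent over a black-box metric such as MAP:
    # perturb one coordinate at a time, keeping only improving moves.
    weights = list(weights)
    best = evaluate(weights)
    for _ in range(sweeps):
        for i in range(len(weights)):
            for delta in (step, -step):
                trial = list(weights)
                trial[i] += delta
                score = evaluate(trial)
                if score > best:
                    weights, best = trial, score
    return weights, best
```

Because the learned model stays linear, the optimized weights remain directly inspectable, which is the interpretability property noted above.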


Evaluation: TREC Microblog datasets

The Tweets2013 corpus is a 1% sample of Twitter over the period from 1 February to 31 March 2013, containing approximately 240 million tweets.

Topics:
• TREC 2013 Microblog queries: train set of 60 queries (20% used for validation)
• TREC 2014 Microblog queries: test set of 55 queries

Relevance judgments are on a three-point scale of "interestingness":
• Not relevant
• Relevant
• Highly relevant


Results: Temporal and Atemporal queries


(Figure: per-query results for the Temporal queries/model and the Atemporal queries/model.)

Results: All model vs T+A model


Contributions of Individual Sources

R – Recency
TF – Twitter feedback
WV – Wikipedia Views
WE – Wikipedia Edits
News – AP, Reuters, USAToday, BBC

Of the external sources, News provided the greatest performance increase. Twitter feedback provided the best increase in MAP but not in P@30.

* indicates a statistically significant improvement over LTR at p < 0.05 (** for p < 0.01)

Feature contribution: Recency


Feature contribution: Temporal Feedback


Feature contribution: Wikipedia


Feature contribution: News

• Barbara Walters, chicken pox
• UK passes marriage bill
• Port Said football riot, death sentences
• Hostess bought by Apollo


Barbara made the news!


News about Barbara's recovery and her return to the U.S. television talk show "The View" on ABC:

• "Who gave Barbara Walters chickenpox?"
• "Barbara Walters plans 'View' return"
• "Barbara Walters to return to 'The View' March 4"
• "Barbara Walters returning to 'The View' on Monday"
• "Barbara Walters to return to TV's 'The View' next week"

Time-aware Ranking with Multiple Sources

• Starting from the temporal cluster hypothesis: "in search tasks where time plays an important role, do relevant documents tend to cluster together in time?" [Efron et al. 2014]

• This motivates our research hypothesis: do the time periods where relevant documents cluster together correlate highly with an increased "interest" or "activity" in pages on the Web related to the topic?


Query: Barbara Walters, chicken pox

Conclusions

Time-aware ranking improves precision: our approach statistically significantly outperforms the BM25 and LM.Dir models by approximately 13.2% in MAP, and a strong learning-to-rank model (LTR) by 6.2%.

RMTS is less biased: it explores temporal signals from multiple Web sources

to estimate the temporal relevance of search topics.

Unified representation of temporal signals: a representation that allows

predicting temporal relevance from multiple heterogeneous sources.


Thanks

SIGIR Student Travel Grant

http://cognitus-h2020.eu

