generating event storylines from microblogs

CIKM’12 Generating Event Storylines from Microblogs

Upload: moresmile

Post on 14-Jul-2015


TRANSCRIPT

Page 1: Generating event storylines from microblogs

CIKM’12

Generating Event Storylines from Microblogs

Page 2: Generating event storylines from microblogs

ABSTRACT

We explore the problem of generating storylines from microblogs for user input queries.

Given a query about an ongoing event, we propose to sketch the real-time storyline of the event with a two-level solution:

1. Propose a language model with dynamic pseudo relevance feedback to obtain relevant tweets.

2. Generate storylines via graph optimization.

Page 3: Generating event storylines from microblogs

INTRODUCTION

Generating Event Storylines from Microblogs (GESM)

Page 4: Generating event storylines from microblogs

INTRODUCTION

Differences between GESM and prior studies:

1. Prior studies work with well-edited facts; GESM works with short, noisy text.

2. GESM provides a personalized service.

3. A two-level framework is necessary: at the low level, a retrieval model finds all relevant tweets along the timeline of the event; at the high level, the relevant tweets and their latent structure are summarized to produce a storyline.

Page 5: Generating event storylines from microblogs

INTRODUCTION

Challenges

1. The dynamic and sparse nature of microblogs: how to match the underlying event expressed by a vague event query to potentially relevant tweets that may not contain any query terms.

2. Numerous duplicate tweets, and direct and indirect re-tweets.

Page 6: Generating event storylines from microblogs

INTRODUCTION

Contributions

1. Generating event storylines from microblogs.

2. A dynamic pseudo relevance feedback (DPRF) language model.

3. Storyline generation is formulated as a graph-based optimization problem and solved by approximation algorithms for minimum-weight dominating set and directed Steiner tree.

Page 7: Generating event storylines from microblogs

THE FRAMEWORK OVERVIEW

The generated storyline should be a graph structure.

Each node is labeled with a summary.

Each edge represents a causal relationship between two phases.

Offline layer

Online layers

Page 8: Generating event storylines from microblogs

THE RETRIEVAL MODEL

Preliminaries

The original query is usually short and vague.

Query expansion

In a pseudo-relevance manner, suppose the few top-ranked documents d+ retrieved by the initial query Q build a relevance model θF; we can then set the new query to be a linear combination of the original query Q and the relevance model θF.
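
The linear combination described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the interpolation weight `lam`, the whitespace tokenizer, and the maximum-likelihood unigram estimates are all assumptions made for the example.

```python
from collections import Counter


def unigram_model(texts):
    """Maximum-likelihood unigram distribution over whitespace tokens."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def expand_query(query, feedback_docs, lam=0.5):
    """Return (1 - lam) * theta_Q + lam * theta_F as a dict of term weights.

    `lam` is a hypothetical interpolation weight; the slides do not give a value.
    """
    theta_q = unigram_model([query])
    theta_f = unigram_model(feedback_docs)
    vocab = set(theta_q) | set(theta_f)
    return {
        t: (1 - lam) * theta_q.get(t, 0.0) + lam * theta_f.get(t, 0.0)
        for t in vocab
    }


# Top-ranked tweets for the initial query act as pseudo-relevant feedback.
expanded = expand_query(
    "egypt protest",
    ["egypt protest cairo", "cairo tahrir square"],
    lam=0.5,
)
```

Terms such as "cairo" and "tahrir" receive non-zero weight in the expanded query even though they are absent from the original one, which is how tweets without any query term can still be retrieved.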

Page 9: Generating event storylines from microblogs

THE RETRIEVAL MODEL

Dynamic Pseudo Relevance Feedback

K burst periods

Assume that the prior probability of a relevant document d+ depends on the distance from its timestamp t_d+ to the centroids of the burst periods, denoted Φ = {φ1, ..., φK}.

Three probability functions model the effective range of a burst period, its decay coefficient, and its skewness:

1. Mixture Gaussian Distribution

2. Local Power Distribution

3. Skewed Linear Distribution

Page 10: Generating event storylines from microblogs

THE RETRIEVAL MODEL

Mixture Gaussian Distribution

Local Power Distribution

Skewed Linear Distribution
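
As a rough illustration of a time-dependent prior of the mixture Gaussian kind, the sketch below scores a timestamp by its distance to the burst centroids. Equal mixture weights, a shared width `sigma`, and the lack of normalization are assumptions for the example, not the parameterization used in the paper.

```python
import math


def gaussian_mixture_prior(t, centroids, sigma=1.0):
    """Unnormalized prior for a document posted at time t.

    Averages Gaussian kernels centered on each burst centroid phi_k.
    Equal weights and a shared sigma are assumptions of this sketch.
    """
    if not centroids:
        raise ValueError("at least one burst centroid is required")
    return sum(
        math.exp(-((t - phi) ** 2) / (2 * sigma ** 2)) for phi in centroids
    ) / len(centroids)


# Documents posted close to a burst centroid get a higher prior.
near = gaussian_mixture_prior(5.0, [5.0, 20.0], sigma=1.0)
far = gaussian_mixture_prior(12.0, [5.0, 20.0], sigma=1.0)
```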

Page 11: Generating event storylines from microblogs

THE RETRIEVAL MODEL

Burst Period Detection

In a burst period, the query terms should:

1. appear more frequently than usual;

2. be continuously frequent around the time point.

Detect the burst periods of the event by:

1. for each query term, finding the time intervals of arbitrary length in which the term appears constantly frequent;

2. picking the time points within these intervals with the largest sum of frequencies over all query terms.

Page 12: Generating event storylines from microblogs

THE RETRIEVAL MODEL

“bursty score”

Find the time interval T_{w,j} = <st, et, LS, RS> with the maximal cumulative burst score B(w, T_{w,j}).

Compute the score H(q, t) of each query term q at each time point t.

Rank each time point t by ∑_{q∈Q} H(q, t) and choose the K highest-ranked time points φk.
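
The final ranking step can be sketched as below. The per-term scores H(q, t) are taken as given input here, since the slides do not spell out how they are computed; the function name and the tie-breaking rule (earlier time point first) are assumptions.

```python
def top_k_burst_points(scores, k):
    """Pick the k time points with the largest summed score over query terms.

    scores: {query_term: {time_point: H(q, t)}}
    Ties are broken in favor of the earlier time point (an assumption).
    """
    totals = {}
    for per_time in scores.values():
        for t, h in per_time.items():
            totals[t] = totals.get(t, 0.0) + h
    ranked = sorted(totals, key=lambda t: (-totals[t], t))
    return ranked[:k]


scores = {
    "egypt": {1: 0.1, 2: 0.9, 3: 0.4},
    "protest": {1: 0.2, 2: 0.5, 3: 0.7},
}
centroids = top_k_burst_points(scores, k=2)
```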

Page 13: Generating event storylines from microblogs

STORYLINE GENERATION

1. Representative tweets

2. Depict the evolving structure of the event

3. an optimistic connection

A multi-view tweet graph is constructed.

A minimum-weight dominating set on the tweet graph

A minimum Steiner tree

Page 14: Generating event storylines from microblogs

STORYLINE GENERATION

Three non-negative real parameters α, τ1, τ2, with τ1 < τ2.

Define E (undirected edges): text similarity > α

Define A (directed arcs): τ1 ≤ tj − ti ≤ τ2

Vertex weight: w(vi) = 1 − score(Q, vi)
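
A minimal sketch of building such a graph from the definitions above. The `Tweet` record and the use of Jaccard similarity over tokens are assumptions for illustration; the slides only say "text similarity".

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    time: float
    score: float  # relevance score(Q, v) from the retrieval model, assumed in [0, 1]


def jaccard(a, b):
    """Token-level Jaccard similarity (a stand-in for the paper's text similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_graph(tweets, alpha, tau1, tau2):
    """Return (vertex weights, undirected edges E, directed arcs A)."""
    weights = [1.0 - t.score for t in tweets]
    edges, arcs = set(), set()
    for i, ti in enumerate(tweets):
        for j, tj in enumerate(tweets):
            if i == j:
                continue
            # E: similar content, stored once per unordered pair.
            if i < j and jaccard(ti.text, tj.text) > alpha:
                edges.add((i, j))
            # A: tweet j follows tweet i within the allowed time gap.
            if tau1 <= tj.time - ti.time <= tau2:
                arcs.add((i, j))
    return weights, edges, arcs


tweets = [
    Tweet("quake hits city", 0, 0.9),
    Tweet("quake hits city center", 1, 0.8),
    Tweet("rescue teams arrive", 5, 0.6),
]
weights, edges, arcs = build_graph(tweets, alpha=0.5, tau1=2, tau2=10)
```

The pairwise loop is quadratic in the number of tweets, which is acceptable for a sketch but would need indexing or blocking on a real tweet stream.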

Page 15: Generating event storylines from microblogs

STORYLINE GENERATION

A subset S of the vertex set of an undirected graph is a dominating set if, for each vertex u, either u is in S or u is adjacent to a vertex in S.

Page 16: Generating event storylines from microblogs

STORYLINE GENERATION

greedy algorithm
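
One common greedy heuristic for the minimum-weight dominating set is sketched below: repeatedly pick the vertex with the lowest weight per newly dominated vertex. This is the standard textbook heuristic and is offered as an assumption about the general approach, not as the exact algorithm from the slide.

```python
def greedy_dominating_set(weights, edges):
    """Greedy approximation of a minimum-weight dominating set.

    weights: list of vertex weights; edges: iterable of undirected (u, v) pairs.
    """
    n = len(weights)
    # Closed neighborhoods: each vertex dominates itself and its neighbors.
    closed = {v: {v} for v in range(n)}
    for u, v in edges:
        closed[u].add(v)
        closed[v].add(u)

    uncovered = set(range(n))
    chosen = []
    while uncovered:
        # Choose the vertex with the smallest weight per newly covered vertex.
        best = min(
            (v for v in range(n) if closed[v] & uncovered),
            key=lambda v: (weights[v] / len(closed[v] & uncovered), v),
        )
        chosen.append(best)
        uncovered -= closed[best]
    return chosen


example_weights = [0.1, 0.2, 0.4]
example_edges = {(0, 1)}
representatives = greedy_dominating_set(example_weights, example_edges)
```

With vertex weights set to 1 − score(Q, vi), low-weight vertices are highly relevant tweets, so the selected set favors relevant tweets that also cover many near-duplicates.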

Page 17: Generating event storylines from microblogs

STORYLINE GENERATION

A Steiner tree of a graph G with respect to a vertex subset S is the edge-induced subtree of G that contains all the vertices of S and has the minimum total cost, where the cost is the total weight of the vertices.

Page 18: Generating event storylines from microblogs

STORYLINE GENERATION

Page 19: Generating event storylines from microblogs

STORYLINE GENERATION

Page 20: Generating event storylines from microblogs

EXPERIMENTS

Data Set

Page 21: Generating event storylines from microblogs

EXPERIMENTS

Tweet Retrieval

49 queries

Evaluation metrics:

Precision at top 30 tweets (P@30)

Mean average precision (MAP)

Precision at top 100 tweets (P@100)

R-precision (R-PREC)
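
For reference, these metrics can be computed per query as in the sketch below, using their standard definitions; MAP is the mean of `average_precision` over all queries. The tweet identifiers are made up for the example.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def average_precision(ranked, relevant):
    """Mean of precision values at the rank of each retrieved relevant item."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant items."""
    r = len(relevant)
    return precision_at_k(ranked, relevant, r) if r else 0.0


ranked = ["t1", "t2", "t3", "t4", "t5"]
relevant = {"t1", "t3", "t6"}
p_at_3 = precision_at_k(ranked, relevant, 3)
ap = average_precision(ranked, relevant)
r_prec = r_precision(ranked, relevant)
```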

Page 22: Generating event storylines from microblogs

EXPERIMENTS

Comparative Study

Page 23: Generating event storylines from microblogs

EXPERIMENTS

Parameter Tuning

Page 24: Generating event storylines from microblogs

EXPERIMENTS

Summarization Capability

Page 25: Generating event storylines from microblogs

EXPERIMENTS

Parameter Tuning

Page 26: Generating event storylines from microblogs

EXPERIMENTS

A User Study

Page 27: Generating event storylines from microblogs

CONCLUSION

The proposed dynamic pseudo relevance feedback model

Minimum-weight Steiner tree on a dominating set

Extensive experiments

Page 28: Generating event storylines from microblogs

OMG, I Have to Tweet That!

A Study of Factors that Influence Tweet Rates

Page 29: Generating event storylines from microblogs

Abstract

Key limitation: analysis of social media depends on people self-reporting their own behaviors and observations.

A large-scale quantitative analysis of some of the factors that influence self-reporting bias.

The daily variations in tweet rates about weather events

Page 30: Generating event storylines from microblogs

Introduction

Treating social media as a signal to measure the relative real-world occurrence of events

Critical challenge: the bias introduced by the self-reported nature of social media

What is it about an event that makes it more or less “tweetable”?

A first large-scale, quantitative analysis of some of the factors that influence self-reporting bias by comparing a year of tweets about weather events in cities across the United States and Canada to ground-truth knowledge about actual weather occurrences.

Page 31: Generating event storylines from microblogs

Introduction

Three potential factors:

1. How extreme is the weather?

2. How expected is the weather given the time-of-year?

3. How much did the weather change?

Page 32: Generating event storylines from microblogs

Data Preparation

Tweets collected between Jun 1, 2010 and Jun 30, 2011

56 different metropolitan areas

Historical weather data provided by the National Oceanic and Atmospheric Administration of the United States.

Page 33: Generating event storylines from microblogs

Identifying Weather-related Tweets

Discovering the rate of weather-related tweets that occurred per day across metropolitan areas:

1. Filter the full archive of tweets for tweets that contain at least one weather-related word from a list of 179 weather-related words and phrases.

2. Build a classifier for weather-related tweets.
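
The first filtering step can be sketched as a whole-word keyword match. The `WEATHER_TERMS` list below is a hypothetical four-item stand-in for the 179-item list, which the slides do not reproduce, and the normalization rule is an assumption.

```python
import re

# Hypothetical subset; the actual list has 179 words and phrases.
WEATHER_TERMS = ["rain", "snow", "thunderstorm", "heat wave"]


def mentions_weather(text, terms=WEATHER_TERMS):
    """True if the tweet contains at least one term as a whole word or phrase."""
    # Lowercase and keep only letter runs so punctuation does not block a match.
    normalized = " ".join(re.findall(r"[a-z']+", text.lower()))
    padded = f" {normalized} "
    return any(f" {term} " in padded for term in terms)


candidates = [
    t for t in ["Heat wave again today!", "I love trains", "Snow, finally"]
    if mentions_weather(t)
]
```

Matching on padded whole words avoids false hits such as "trains" for "rain"; ambiguous uses of weather words are what the second-stage classifier is meant to handle.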

Page 34: Generating event storylines from microblogs

A simple classifier that estimates the probability of a tweet being weather-related

Page 35: Generating event storylines from microblogs

Identifying the Location of Tweets

geo-coded

The textual user-provided location field in a user’s Twitter profile

Normalize the arbitrary textual user-provided location information into concrete geo-coded coordinates:

1. Build a mapping from user-provided location fields to latitude-longitude coordinates.

2. Merge location fields with similar geo-mappings together to create clusters for roughly metropolitan-sized areas.

Page 36: Generating event storylines from microblogs

Identifying the Location of Tweets

Page 37: Generating event storylines from microblogs

Historical Weather Data

Calculate daily summaries.

For each daily summary of weather data at a location:

Expectation: how normal the observed weather is at a location

Extremeness: how extreme the weather is on a particular day

Change: how different the observed weather data is from previous days’ weather
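
Two of these features can be illustrated with simple statistics. The concrete formulas below (difference from the mean of the previous `window` days for change, and a z-score against the same calendar day in past years for deviation from normal) are assumptions chosen for the example; the slides give only the verbal definitions.

```python
from statistics import mean, pstdev


def change(observations, window=1):
    """Today's value minus the mean of the previous `window` days.

    observations: chronological daily values, last element is today.
    """
    if len(observations) <= window:
        raise ValueError("need more observations than the window size")
    today = observations[-1]
    previous = observations[-1 - window:-1]
    return today - mean(previous)


def deviation_from_normal(today, same_day_history):
    """Z-score of today's value against historical values for this date."""
    mu = mean(same_day_history)
    sd = pstdev(same_day_history)
    return (today - mu) / sd if sd else 0.0


temp_change = change([10, 12, 20], window=2)
temp_deviation = deviation_from_normal(30, [20, 22, 24])
```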

Page 38: Generating event storylines from microblogs

Analysis and Results

Tweet Rates and Weather Reports

Page 39: Generating event storylines from microblogs

Analysis and Results

Linear Regression

The relationship between a set of weather-derived features and the daily rate of weather-related tweets
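
A minimal single-feature sketch of this kind of regression, using ordinary least squares and reporting R² as the share of variation explained. The study regresses on a set of features; restricting to one feature and the toy data are simplifications for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, with R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    ss_tot = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must each vary")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1 - ss_res / ss_tot


# Toy data: a weather feature per day and the matching tweet rate.
feature = [0, 1, 2, 3]
tweet_rate = [1, 3, 5, 7]
slope, intercept, r_squared = fit_line(feature, tweet_rate)
```

Comparing R² across models built from different feature groups is one way to ask which group explains more of the variation in tweet rates.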

Page 40: Generating event storylines from microblogs

Analysis and Results

Correlating Basic Weather Data and Tweet Rates

Page 41: Generating event storylines from microblogs

Analysis and Results

Correlating Expectation and Tweet Rates

The expectation measure adds little information about likely tweet rates beyond what is already contained in basic weather data.

Correlating Extremeness and Tweet Rates

Extremeness can independently explain more of the variation in weather-related tweet rates than basic weather alone.

Correlating Delta Change and Tweet Rates

There is little difference in the amount of information gained from building these delta-change models.

Combining Extremeness, Expectation, and Delta Change Models

Page 42: Generating event storylines from microblogs

Analysis and Results

Per-Location Models

Page 43: Generating event storylines from microblogs
Page 44: Generating event storylines from microblogs

Discussion

Additional Factors Likely to Affect Tweet Rates

Sentiment

Privacy concerns, embarrassment, and safety

Population segments

Mobile devices

Time of day, day of week, holidays, and other effects of time

Page 45: Generating event storylines from microblogs

Conclusions

The correlation between daily tweet rates and the expectation, extremeness, and change in observed weather

Global models

Location-specific models

Extremeness > change > expectation