Post on 17-Jul-2015

A Network-Based Model for Predicting Hashtag Breakouts in Twitter

Agenda

Background

Methodology

Our visualization tool

Experiment & Results

Introduction

Tweets:

Textual contents

User interaction: retweeting, mentioning, replying, etc.

Hashtags:

Tagging mechanism created by users

Help categorize tweets

Become very popular in trending topics

Some Definitions

Tweet Hashtag Volume: number of tweets containing a given hashtag per day.

Spike: sharp increase in the volume
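
As a rough illustration of the volume definition above, the sketch below counts, per day, the tweets that contain a given hashtag. The `(day, text)` input shape, the function name `daily_hashtag_volume`, and the sample tweets are assumptions made for this example; they are not taken from the talk.

```python
from collections import Counter
from datetime import date

def daily_hashtag_volume(tweets, hashtag):
    """Count tweets per day whose text contains the given hashtag.

    `tweets` is an iterable of (day, text) pairs (an assumed input shape).
    A tweet that repeats the hashtag still counts once.
    """
    tag = hashtag.lower()
    volume = Counter()
    for day, text in tweets:
        # Compare whole tokens, case-insensitively, ignoring trailing punctuation.
        tokens = {tok.lower().rstrip(".,!?:;") for tok in text.split()}
        if tag in tokens:
            volume[day] += 1
    return dict(volume)

# Hypothetical sample data for illustration only.
tweets = [
    (date(2015, 7, 1), "Loving the new release #python"),
    (date(2015, 7, 1), "#Python tips, part 2"),
    (date(2015, 7, 2), "Nothing tagged here"),
    (date(2015, 7, 2), "#python!"),
]
volume = daily_hashtag_volume(tweets, "#python")
print(volume)
```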

Research Question

Some hashtags become viral.

Can we predict whether a hashtag will go viral at its nascent stages?

Network-based?

Text content-based?

Viral Diffusion

Network Based Analysis

• Arruda et al. examined the role of centrality measures in disease spread under an SIR model and in rumor spreading on a social network.

• In the SIR model for rumors, infected individuals recover with some probability, while a spreader becomes a carrier through contacts in social networks.

Content Based Analysis

• Hypothesis: specific groups of words are more likely to appear in viral tweets.

• Li et al. analyzed tweets in terms of emotional divergence aspects (or sentiment analysis) and noted that highly interactive tweets tend to contain more negative emotions than other tweets.

Running average and standard deviation

20-day sliding window
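
A minimal sketch of the running statistics, assuming the window covers the 20 days before the current day and that the population standard deviation is used. The slides do not state either choice, and the name `rolling_stats` is hypothetical.

```python
from statistics import mean, pstdev

def rolling_stats(volumes, window=20):
    """Return a (mean, std) pair per day over the trailing `window` days.

    Days with fewer than `window` earlier observations get None.
    Assumption: the window excludes the current day.
    """
    stats = []
    for t in range(len(volumes)):
        if t < window:
            stats.append(None)
            continue
        history = volumes[t - window:t]  # the `window` days before day t
        stats.append((mean(history), pstdev(history)))
    return stats

# Small window so the example is easy to check by hand.
stats = rolling_stats([2, 4, 6, 8], window=3)
print(stats)
```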

Hashtag Volume

Utilizing Three Sigma Rule

68-95-99.7 Rule

Empirical rule
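
The 68-95-99.7 figures can be checked numerically for a normal distribution, and the same rule gives a simple spike test. The sketch below assumes a one-sided test (only volumes above the mean count as spikes) and a default of three standard deviations; `normal_coverage` and `is_spike` are illustrative names.

```python
import math

def normal_coverage(k):
    """P(|Z| <= k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))

def is_spike(volume, mu, sigma, k=3.0):
    """Flag a volume more than k standard deviations above the mean.

    Assumption: only upward deviations count as spikes.
    """
    return volume > mu + k * sigma

for k in (1, 2, 3):
    print(k, round(100 * normal_coverage(k), 1))
```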

Hashtags Distribution

Accumulative Period

Break out or die out?

Build a predictive learning model based on …

Break out vs Die out

Break out / Non-break out (die out)

Our Approach

Can we predict #Hashtag breakouts in Twitter at their early stages using local and global network interaction measures?

Local measures: interaction network within the 20-day accumulation window.

Global measures: interaction network from earlier until the end of the current window.
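
One way to picture the local and global networks is as two edge lists cut from the same timestamped interaction log. The sketch assumes interactions are `(day, source_user, target_user)` tuples with integer day indices; that representation and the name `split_interactions` are assumptions for illustration.

```python
def split_interactions(interactions, window_start, window_end):
    """Split timestamped edges into local and global edge lists.

    Local edges fall inside [window_start, window_end]; global edges are
    everything up to window_end, including the window itself.
    """
    local_edges = [(u, v) for day, u, v in interactions
                   if window_start <= day <= window_end]
    global_edges = [(u, v) for day, u, v in interactions
                    if day <= window_end]
    return local_edges, global_edges

# Hypothetical interaction log: (day index, source user, target user).
interactions = [(1, "a", "b"), (25, "b", "c"), (30, "c", "a"), (45, "d", "a")]
local_edges, global_edges = split_interactions(interactions, 21, 40)
print(local_edges)
print(global_edges)
```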

1. Define a 3-sigma/empirical rule based breakout measure

2. Model evolutionary episodes of hashtag volumes as:

• Accumulation, Breakout, Die-Out

3. Extract local and global network features

4. Train and test a classifier to:

• Predict if Accumulation leads to Breakout or Die-Out
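
The labelling step that feeds the classifier could look roughly like the sketch below: an accumulation window is labelled by whether the volume shortly afterwards exceeds the window mean plus three standard deviations. The `horizon` parameter (how many days ahead to look) and the function name are hypothetical; the slides do not specify them.

```python
from statistics import mean, pstdev

def label_accumulation(volumes, end, window=20, horizon=5, k=3.0):
    """Label the accumulation window that ends just before day `end`.

    Returns "breakout" if any of the next `horizon` days exceeds
    mean + k * std of the window, otherwise "die-out".
    `horizon` is an assumed parameter, not taken from the talk.
    """
    if end < window:
        raise ValueError("not enough history for a full window")
    history = volumes[end - window:end]
    threshold = mean(history) + k * pstdev(history)
    future = volumes[end:end + horizon]
    return "breakout" if any(v > threshold for v in future) else "die-out"

# Hypothetical daily volumes: 20 quiet days followed by 3 more days.
quiet = [10, 12] * 10 + [11, 12, 10]
viral = [10, 12] * 10 + [11, 40, 90]
print(label_accumulation(quiet, end=20))
print(label_accumulation(viral, end=20))
```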

Identify evolutionary episodes in #Hashtag volume time series

Breakout, Accumulation, Die-out, Accumulation, Die-out

Trending Hashtag Forecaster

Local and global network measures are computed as features

Network measures:

Eigenvector Centrality

PageRank

Closeness Centrality

Betweenness Centrality

Degree Centrality

Indegree Centrality

Outdegree Centrality

Link Rate

Distinct Link Rate

Number of Uninfected neighbors of early adopters

Neighborhood average degree
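
A few of the simpler measures in this list can be computed directly from an edge list. The sketch below covers in-degree, out-degree, and degree centrality (normalised by n - 1, one common convention) and the number of uninfected neighbors of early adopters, where "uninfected" is read as "has not yet used the hashtag". The function name, the edge-list input, and that reading are assumptions; the remaining measures would normally come from a graph library.

```python
from collections import defaultdict

def degree_features(edges, early_adopters):
    """Compute simple measures on a directed interaction graph.

    `edges` is a list of (source, target) pairs; `early_adopters` is the
    set of users who already used the hashtag (assumed meaning).
    """
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for u, v in edges:
        out_nbrs[u].add(v)
        in_nbrs[v].add(u)
    nodes = set(out_nbrs) | set(in_nbrs)
    scale = max(len(nodes) - 1, 1)  # n - 1 possible neighbours per node

    indegree = {n: len(in_nbrs[n]) / scale for n in nodes}
    outdegree = {n: len(out_nbrs[n]) / scale for n in nodes}
    degree = {n: indegree[n] + outdegree[n] for n in nodes}

    # Neighbours (in either direction) of early adopters who are not adopters.
    exposed = set()
    for user in early_adopters:
        exposed |= out_nbrs[user] | in_nbrs[user]
    uninfected = len(exposed - set(early_adopters))

    return {"indegree": indegree, "outdegree": outdegree,
            "degree": degree, "uninfected_neighbors": uninfected}

# Hypothetical toy graph.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "a")]
features = degree_features(edges, early_adopters={"a"})
print(features["degree"]["a"], features["uninfected_neighbors"])
```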

PCA Ranking of Features

Exploratory method: reduces the original measure variables through an orthogonal transformation.

PCA returns linearly uncorrelated components, sorted by the variance each one explains.

The top components capture the highest variance among instances.
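
To make the PCA description concrete, here is a dependency-free sketch that extracts components by power iteration with deflation and returns them sorted by explained variance. It is a teaching sketch under stated assumptions, not the procedure used in the talk: in practice a library routine would be used, and the toy data and function names are invented for illustration.

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance matrix of the feature columns in `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iterations=500, seed=0):
    """Leading eigenvector and eigenvalue of `cov` by power iteration."""
    d = len(cov)
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    vec = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * cov[i][j] * vec[j] for i in range(d) for j in range(d))
    return vec, value

def pca(rows):
    """Return (explained_variance, component) pairs, largest variance first."""
    cov = covariance_matrix(rows)
    d = len(cov)
    result = []
    for _ in range(d):
        vec, value = top_component(cov)
        result.append((value, vec))
        # Deflate: remove the component just found before extracting the next.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return result

# Toy data: feature 0 varies more than feature 1 and they are uncorrelated.
rows = [(-2, 0), (2, 0), (0, -1), (0, 1)]
components = pca(rows)
for variance, direction in components:
    print(round(variance, 3), [round(x, 3) for x in direction])
```

In this toy case the first component lines up with feature 0, which carries the larger variance. Reading a feature ranking off the component loadings in this way is one possible interpretation of the slide.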

Prediction Accuracies

• Break out

• Non-break out (die out)
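
Reporting accuracy separately for the two classes amounts to computing, per true label, the fraction of instances predicted correctly. The sketch below shows that calculation; the labels in the example are made-up toy values, not results from the talk, and `per_class_accuracy` is a hypothetical helper name.

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly predicted instances for each true label."""
    total, correct = Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}

# Toy labels for illustration only.
y_true = ["breakout", "breakout", "die-out", "die-out", "die-out", "die-out"]
y_pred = ["breakout", "die-out", "die-out", "die-out", "die-out", "breakout"]
accuracy = per_class_accuracy(y_true, y_pred)
print(accuracy)
```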

Conclusion and Future Work

• A content-independent, network-based classifier for predicting hashtag breakouts

• Next, we propose to study the utility of content-based features such as keywords, named entities, topics, and sentiment.

Thank you for listening!

Any questions?
