FINDING EVENT-SPECIFIC INFLUENCERS IN DYNAMIC SOCIAL NETWORKS
Master's Thesis – Chris Schenk
December 1st, 2010


Page 1: Finding Event-Specific Influencers in Dynamic Social Networks

FINDING EVENT-SPECIFIC INFLUENCERS IN DYNAMIC SOCIAL NETWORKS
Master's Thesis – Chris Schenk
December 1st, 2010

Page 2: Finding Event-Specific Influencers in Dynamic Social Networks

OUTLINE

Problem overview
  Influencers, reputation, validation and security
  Summary of analysis methods
  Boulder fire data
Twitter Data
  API, formats, collection and data limitations
  Statistics
Finding event-specific influencers – Rankings
  Stats
  Hyperlink-Induced Topic Search (HITS)
  Context-specific in-degree (original work)
Conclusions and Future Work

Page 3: Finding Event-Specific Influencers in Dynamic Social Networks

PROBLEM OVERVIEW

Page 4: Finding Event-Specific Influencers in Dynamic Social Networks

INFLUENCERS

Social dynamics vs online social dynamics
Social network features
  Search, friends, re-tweets
Influencers and sheep
  What is meant by influence?
Understanding the data
  Sampling and baseline statistics
  Similarity measures, clustering
  Semantics, intent (NLP)
  Baseline activity

Page 5: Finding Event-Specific Influencers in Dynamic Social Networks

INFLUENCERS – NETWORK STRUCTURE

Betweenness/Closeness centrality
PageRank/TwitterRank/TunkRank
Local/Global hierarchical clustering
K-core decomposition
K-clique percolation
Nearest Neighbor Networks
Assortative mixing
HITS
Activity Network

Page 6: Finding Event-Specific Influencers in Dynamic Social Networks

TWITTER DATA STATS – BOULDER FIRE

Tweets
  First day – September 6th, 2010 10:00am to September 7th, 2010 10:00am, Mountain time
  First week – September 6th, 2010 10:00am to September 13th, 2010 10:00am, Mountain time
Social graph
  Five one-day snapshots beginning September 7th, 2010 12:40pm, Mountain time
Tweet example
  RT @garytx: Article on Twitter's use during #eqnz, #boulderfire, and #sanbrunofire: http://bit.ly/cwI1fi
  kate30_CU - 2010-09-13 15:29:24+00:00
Keywords: boulder, boulderfire, fourmilefire, fourmilecanyon, 4milefire
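The tweet example carries the features that the later statistics count: hashtags, username mentions, re-tweets and event keywords. A minimal sketch of pulling them out of the tweet text follows. The regular expressions, the `tweet_features` name, and the rule that an addressed message starts with `@username` are assumptions made for illustration, not the parser used in the thesis.

```python
import re

# Event keywords listed on the slide.
KEYWORDS = {"boulder", "boulderfire", "fourmilefire", "fourmilecanyon", "4milefire"}

HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")
RETWEET_RE = re.compile(r"\bRT @(\w+)")


def tweet_features(text):
    """Return hashtags, mentions, re-tweeted user and addressee for one tweet."""
    hashtags = [h.lower() for h in HASHTAG_RE.findall(text)]
    mentions = MENTION_RE.findall(text)
    rt = RETWEET_RE.search(text)
    # Assumption: an "addressed message" is a tweet that starts with @username.
    addressed = re.match(r"@(\w+)", text)
    words = {w.lower() for w in re.findall(r"\w+", text)}
    return {
        "hashtags": hashtags,
        "mentions": mentions,
        "retweet_of": rt.group(1) if rt else None,
        "addressed_to": addressed.group(1) if addressed else None,
        "matches_event": bool(words & KEYWORDS),
    }


example = ("RT @garytx: Article on Twitter's use during #eqnz, "
           "#boulderfire, and #sanbrunofire: http://bit.ly/cwI1fi")
features = tweet_features(example)
```

For the slide's example, `features["retweet_of"]` is `"garytx"` and the tweet matches the event through the `boulderfire` keyword.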

Page 7: Finding Event-Specific Influencers in Dynamic Social Networks

QUALITATIVELY INFLUENTIAL USERS

Sixteen users gathered by Jo White
  Used as “ground truth” data for ranking comparison

  epiccolorado
  laurasrecipes
  HumaneBoulder
  fishnette
  suzanbond
  CampSteve
  ConnectColorado
  Org9
  metroseen
  palen
  sophiabliu
  Mediamum
  Tanukun
  eadvocate
  kate30_CU
  BoulderChannel1
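One plausible way to compare a ranking against this ground-truth list is to count how many ground-truth users appear among the top k entries. The slides do not state which comparison measure was used, so the overlap count below is an assumption, and the ranking shown is placeholder data.

```python
def ground_truth_hits(ranking, ground_truth, k):
    """Count ground-truth users among the first k entries of a ranking."""
    truth = {name.lower() for name in ground_truth}
    top_k = ranking[:k]
    found = [user for user in top_k if user.lower() in truth]
    return len(found), found


# A subset of the ground-truth usernames from the slide.
ground_truth = ["epiccolorado", "laurasrecipes", "eadvocate", "kate30_CU"]

# Hypothetical ranking output; "user_a" etc. are placeholders, not real data.
ranking = ["user_a", "epiccolorado", "user_b", "kate30_cu", "user_c", "eadvocate"]

count, found = ground_truth_hits(ranking, ground_truth, k=5)
```

Usernames are compared case-insensitively here, which is a design choice of the sketch rather than something the slides specify.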

Page 8: Finding Event-Specific Influencers in Dynamic Social Networks

TWITTER API AND DATA COLLECTION

Search+Track+REST
  Unique users for a given event
Profiles
  Periodic collection
Friends/Followers
  Periodic collection
Tweets
  One-time collection
Limitations
  Rate limits, multi-threading
  Improper SQL query

Page 9: Finding Event-Specific Influencers in Dynamic Social Networks

TWEET STATS

Stat                         First Day       First Week
# Tweets (total)             12,147          2,314,700
# Users                      398             13,955
Avg. Tweets/user             30.5            165.9
Med. Tweets/user             9.0             38.0
# Hashtags (total)           7,422           756,785
# Hashtags (unique)          895             66,765
Avg. Hashtag occurrence      8.3             11.3
Med. Hashtag occurrence      1.0             1.0
# Mentions (total)           7,877           1,224,851
Avg. Mentions/User           19.9            87.8
Med. Mentions/User           1.0             1.0
# Users mentioning others    308 (77.39%)    11,036 (79.08%)
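The per-user averages, medians and percentages in this table can be derived from one pass over the collected tweets. The sketch below uses toy records, and the `(username, mentions)` record layout is an assumption for illustration, not the thesis's database schema.

```python
from collections import Counter
from statistics import mean, median

# Toy records: (username, number of @mentions in that tweet). Not thesis data.
tweets = [
    ("alice", 2), ("alice", 0), ("alice", 1),
    ("bob", 0),
    ("carol", 0), ("carol", 3),
]

tweets_per_user = Counter(user for user, _ in tweets)
mentions_per_user = Counter()
for user, n_mentions in tweets:
    mentions_per_user[user] += n_mentions

n_users = len(tweets_per_user)
stats = {
    "tweets_total": len(tweets),
    "users": n_users,
    "avg_tweets_per_user": mean(tweets_per_user.values()),
    "med_tweets_per_user": median(tweets_per_user.values()),
    "mentions_total": sum(mentions_per_user.values()),
    "users_mentioning_others": sum(1 for c in mentions_per_user.values() if c > 0),
}
stats["pct_users_mentioning"] = 100.0 * stats["users_mentioning_others"] / n_users
```

The percentages on the slide follow the same pattern: 308 of 398 first-day users mentioning others gives 77.39%.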

Page 10: Finding Event-Specific Influencers in Dynamic Social Networks

TWEET STATS (CONT.)

Stat                           First Day         First Week
# Addressed Msgs.              2,291 (18.85%)    368,047 (15.90%)
# Users addressing msgs.       227 (57.04%)      8,404 (60.22%)
# Re-tweet Msgs.               3,994 (32.88%)    504,836 (21.81%)
# Users re-tweeted (global)    1,456             134,204
# Users re-tweeted (fire)      356 (24.45%)      2,085 (1.55%)
# URLs (unique)                4,105             1,200,927
# Source applications          85                1,026
# Users giving location        30 (7.53%)        858 (6.14%)
# Tweets with location         172 (1.42%)       17,093 (0.77%)

Page 11: Finding Event-Specific Influencers in Dynamic Social Networks

GRAPH STATS
Timezone: Mountain

Snapshot        2010-09-07    2010-09-08    2010-09-09    2010-09-10    2010-09-11
                12:40:01      12:40:01      12:40:01      12:40:01      15:10:01
Users (fire)    448           1,631         1,623         1,622         4,093
Users (all)     821,609       2,292,929     2,295,885     2,300,838     4,075,573
Edges (fire)    3,142         25,193        25,484        25,664        87,539
Edges (all)     1,510,036     5,361,650     5,370,451     5,372,597     30,458,948
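The table separates "fire" users and edges from "all" users and edges. One reading, which is an assumption since the slides do not define the terms, is that fire edges are follow relationships whose two endpoints are both event users, while "all" counts everything reached through the friend/follower lists. A sketch of that split on toy data:

```python
def induced_edges(edges, event_users):
    """Keep follow edges whose two endpoints are both event users."""
    return {
        (src, dst)
        for src, dst in edges
        if src in event_users and dst in event_users
    }


# Toy snapshot with hypothetical users; an edge (u, v) means u follows v.
all_edges = {("a", "b"), ("b", "c"), ("c", "x"), ("y", "a")}
fire_users = {"a", "b", "c"}

fire_edges = induced_edges(all_edges, fire_users)
all_users = {user for edge in all_edges for user in edge}

snapshot_stats = {
    "users_fire": len(fire_users),
    "users_all": len(all_users),
    "edges_fire": len(fire_edges),
    "edges_all": len(all_edges),
}
```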

Page 12: Finding Event-Specific Influencers in Dynamic Social Networks

LOCATION DATA – U.S.

Page 13: Finding Event-Specific Influencers in Dynamic Social Networks

LOCATION DATA – DENVER METRO

Page 14: Finding Event-Specific Influencers in Dynamic Social Networks

LOCATION DATA – BOULDER, LONGMONT, BROOMFIELD

Page 15: Finding Event-Specific Influencers in Dynamic Social Networks

USER “FISHNETTE” DATA - AGGREGATE HOURLY TWEET COUNTS

Page 16: Finding Event-Specific Influencers in Dynamic Social Networks

USER “FISHNETTE” DATA – AGGREGATE MONTHLY TWEET COUNTS

Page 17: Finding Event-Specific Influencers in Dynamic Social Networks

HASHTAG COUNTS

Page 18: Finding Event-Specific Influencers in Dynamic Social Networks

ADDRESSED MESSAGES

Page 19: Finding Event-Specific Influencers in Dynamic Social Networks

RE-TWEETS

Page 20: Finding Event-Specific Influencers in Dynamic Social Networks

FINDING INFLUENCERS - RANKINGS

Tweets
  Number of tweets
  Username mentions
  Number of re-tweets
Graph
  In-degree
  HITS
    All users (sorted by frequency)
    Active users
    Mentions
    Addressed messages (replies)
  Context-specific in-degree
    Global followers count
    Active edges (pre-existing network)
    New edges

Page 21: Finding Event-Specific Influencers in Dynamic Social Networks

RANKINGS - NUMBER OF TWEETS

Page 22: Finding Event-Specific Influencers in Dynamic Social Networks

RANKINGS – USERNAME MENTIONS

Page 23: Finding Event-Specific Influencers in Dynamic Social Networks

RANKINGS – RE-TWEETS

Page 24: Finding Event-Specific Influencers in Dynamic Social Networks

RANKINGS – IN-DEGREE (FOLLOWERS)

Page 25: Finding Event-Specific Influencers in Dynamic Social Networks

HYPERLINK-INDUCED TOPIC SEARCH (HITS)

Hubs
  Those that link to many authorities
Authorities
  Those that are linked to by many hubs
Process
  Calculate the principal eigenvector of two matrices
    Followers adjacency matrix (authorities)
    Friends adjacency matrix (hubs)
  Iterative
Rankings by highest value descending in eigenvectors
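The iterative process can be sketched as a power iteration over follow edges. In the standard formulation of HITS, with adjacency matrix A, the authority scores converge to the principal eigenvector of AᵀA and the hub scores to that of AAᵀ. The code below is a minimal pure-Python sketch on a toy graph with hypothetical users, not the thesis implementation; the fixed iteration count is an assumption.

```python
import math


def hits(edges, iterations=50):
    """Power-iteration HITS on directed edges (follower, followed)."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the users who follow you.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: sum of authority scores of the users you follow.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        # Normalize (L2) so the scores stay bounded between iterations.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in new_auth.items()}
        hub = {n: v / h_norm for n, v in new_hub.items()}
    return hub, auth


# Toy follower graph (hypothetical users): an edge (u, v) means u follows v.
edges = [("a", "c"), ("b", "c"), ("d", "c"), ("a", "e"), ("c", "e")]
hub, auth = hits(edges)

# Rankings by highest value, descending.
authority_ranking = sorted(auth, key=auth.get, reverse=True)
hub_ranking = sorted(hub, key=hub.get, reverse=True)
```

In this toy graph `c` is the top authority because three users follow it, and `a` is the top hub because it follows both authorities.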

Page 26: Finding Event-Specific Influencers in Dynamic Social Networks

RANKINGS – HITS – ALL USERS

Page 27: Finding Event-Specific Influencers in Dynamic Social Networks

RANKINGS – HITS – ACTIVE USERS

Page 28: Finding Event-Specific Influencers in Dynamic Social Networks

RANKINGS – HITS – MENTIONS

Page 29: Finding Event-Specific Influencers in Dynamic Social Networks

RANKINGS – HITS – ADDRESSED MSGS.

Page 30: Finding Event-Specific Influencers in Dynamic Social Networks

CONTEXT-SPECIFIC IN-DEGREE RANKING

Global followers count
  Periodically download user profiles
  Calculate change in followers count for each snapshot
  Rank based on overall change, descending
Active edges (includes pre-existing edges)
  Periodically download friend/follower lists
  Calculate change in followers count for each snapshot
  Rank based on overall change, descending
New edges
  Periodically download friend/follower lists
  Calculate change in followers count for each snapshot
  Do not count edges that existed prior to the start of the event
  Rank based on overall change, descending
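Two of the three variants can be sketched from a pair of snapshots: the global followers count change, and the new-edges count that ignores pre-existing edges. The sketch assumes the first snapshot stands in for the pre-event network and that only edges between event users are counted. Both are assumptions of this example, as are the function names and the toy data.

```python
from collections import Counter


def followers_count_change(first_counts, last_counts):
    """Global followers count: change in profile follower counts."""
    return {u: last_counts[u] - first_counts.get(u, 0) for u in last_counts}


def new_edge_in_degree(baseline_edges, latest_edges, event_users):
    """New edges: in-degree from event users, ignoring pre-existing edges."""
    gained = Counter()
    for follower, followed in latest_edges - baseline_edges:
        if follower in event_users and followed in event_users:
            gained[followed] += 1
    return gained


def rank_descending(scores):
    """Usernames ordered by score, highest first (ties broken by name)."""
    return sorted(scores, key=lambda u: (-scores[u], u))


# Toy data with hypothetical users; an edge (u, v) means u follows v.
event_users = {"a", "b", "c", "d"}
baseline = {("a", "b"), ("x", "b"), ("y", "b")}
latest = baseline | {("b", "c"), ("a", "c"), ("d", "c"), ("c", "d"), ("z", "b")}

new_edge_scores = new_edge_in_degree(baseline, latest, event_users)
new_edge_ranking = rank_descending(new_edge_scores)

global_change = followers_count_change(
    {"b": 3, "c": 0, "d": 0},
    {"b": 4, "c": 3, "d": 1},
)
global_ranking = rank_descending(global_change)
```

In the toy data, user `b` has the most followers overall, but `c` gains the most new followers from event users during the event, which is the kind of user this ranking is meant to surface.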

Page 31: Finding Event-Specific Influencers in Dynamic Social Networks

RANKINGS – GLOBAL FOLLOWERS COUNT

Page 32: Finding Event-Specific Influencers in Dynamic Social Networks

RANKINGS – ACTIVE EDGES

Page 33: Finding Event-Specific Influencers in Dynamic Social Networks

RANKINGS – NEW EDGES

Page 34: Finding Event-Specific Influencers in Dynamic Social Networks

LIMITATIONS AND MODIFICATIONS

On-going influence
  Can only measure when a user becomes influential
Global popularity masking local influence
  User “andrewhyde”
News and bot activity
  Extra data needed to ignore these users
Large events
  Data collection limitations
How important is a de-follow?
  Can identify individual user activity
Identifying the sheep
  Can equivalently count friends (out-links) created

Page 35: Finding Event-Specific Influencers in Dynamic Social Networks

CONCLUSIONS

Notions of influence and interaction are heavily dependent on social network features
  No agreement on definitions
Influence measured by features not 100% in use
  Or features not used in the same way by everyone
Composability problem
HITS ranking no better than global in-degree
Context-specific in-degree ranking good!
  Needs to be tested on multiple events of varying sizes

Page 36: Finding Event-Specific Influencers in Dynamic Social Networks

FUTURE WORK

Understanding “baseline” behavior
  For users active (using keywords) during an event
  Calculate all given statistics for a user (Klout.com?)
Lots of ways to cut the data
  Composable factors/measures/attributes
Explaining new links created
  Models for searching, re-tweeting, hashtags, #ff, etc.
  Incorporating blogs, forums, news websites
Real-time vs not
Informing algorithms with other techniques
  NLP and more automation
  Qualitative analysis (crowdsourcing?)

Page 37: Finding Event-Specific Influencers in Dynamic Social Networks

THANKS! QUESTIONS?

Page 38: Finding Event-Specific Influencers in Dynamic Social Networks

REPUTATION

Definitions?
Scores
  Composability
Explicit reputation
  Ratings, votes
Implicit reputation
  Client
  Server

Page 39: Finding Event-Specific Influencers in Dynamic Social Networks

VALIDATION

Ground truth
  Authorities
  Armies of grad students
  Crowd-sourcing?
More data
  Cross-referencing
    News websites
    Blogs
    Public health and safety (or other)

Page 40: Finding Event-Specific Influencers in Dynamic Social Networks

SECURITY

Malicious users
  Inflation of reputation
  Sybil attacks
Reporting
  Audience?
  Anonymization