who will rt this?: automatically identifying and engaging strangers on twitter to spread information

27
@jwnichls 1 Who will RT this? Automatically Identifying and Engaging Strangers on Twitter to Spread Information Kyumin Lee, Jalal Mahmud, Jilin Chen, Michelle X. Zhou, Jeffrey Nichols Utah State University & IBM Research – Almaden [email protected] @jwnichls

Upload: jeffrey-nichols

Post on 10-May-2015

718 views

Category:

Technology


1 download

DESCRIPTION

There has been much effort on studying how social media sites, such as Twitter, help propagate information in differ- ent situations, including spreading alerts and SOS messages in an emergency. However, existing work has not addressed how to actively identify and engage the right strangers at the right time on social media to help effectively propagate intended information within a desired time frame. To ad- dress this problem, we have developed two models: (i) a feature-based model that leverages peoples’ exhibited social behavior, including the content of their tweets and social interactions, to characterize their willingness and readiness to propagate information on Twitter via the act of retweeting; and (ii) a wait-time model based on a user's previous retweeting wait times to predict her next retweeting time when asked. Based on these two models, we build a recommender system that predicts the likelihood of a stranger to retweet information when asked, within a specific time window, and recommends the top-N qualified strangers to engage with. Our experiments, including live studies in the real world, demonstrate the effectiveness of our work. Presented at Intelligent User Interfaces 2014, Haifa, Israel. February 27, 2014.

TRANSCRIPT

Page 1: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 1

Who will RT this?Automatically Identifying and Engaging Strangers on Twitter to Spread Information

Kyumin Lee, Jalal Mahmud, Jilin Chen, Michelle X. Zhou, Jeffrey NicholsUtah State University & IBM Research – Almaden

[email protected] @jwnichls

Page 2: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 2

Public Social Media Contains a Wealth of Information about Individuals…

Page 3: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 3

Public Social Media Contains a Wealth of Information about Individuals…

Can we harness this informationfor something useful?

• Identify people to recruit to do various tasks

• Collect Information

• Spread Information

Page 4: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 4

Information Collection: TSA TrackerCrowdsourcing airport security wait times through Twitter

Step #1. Watch for people tweeting about beingin airport

Step #2. Ask nicely if theywould share wait time tohelp others

Step #3. Collect responsesand share relevant dataon web site

Step #4. Say thank you!Key Result: Would people respond to questions from strangers? Yes…around 40% of the time

http://tsatracker.org/@tsatracker, @tsatracking

[Nichols et al. CSCW 2012, Mahmud et al. IUI 2013]

Page 5: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 5

Today: Information Spreading

• Relevant marketing campaign messages• Alerts and SOS messages in an emergency• Etc.

Page 6: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 6

Today: Information Spreading

Challenge: Low percentage of people respond to this task• Can we predict who will retweet and direct requests only to them?• Can we predict who will retweet more quickly?

Page 7: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 7

Our Process

1. Data Collection2. Feature Extraction3. Feature Selection4. Model Building5. Evaluation

Page 8: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 8

Ground-Truth Data Collection

Public Safety (location-based)

• Randomly selected users who tweeted from the San Francisco bay area (via geo-tags)

• Contacted 1,902 users• 52 (2.8%) retweeted our message• Message reached a total of 18,670

followers

Bird Flu (topic-based)

• Randomly selected users who posted one of the following words in at least one tweet: “bird flu”, “H5N1” and “avian influenza”

• Contacted 1,859 users • 155 (8.4%) retweeted our message• Message reached a total of 184,325

followers

Page 9: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 9

Resulting Data

In total: • Contacted 3,761 strangers• 207 positive examples, 3554

negative examples

For each user we contacted, we collected:

• Twitter profile (screen name, tweet count, etc.)

• People they followed, followers,

• Up to 200 recent messages • Ground truth (“retweeter” or

“non-retweeter”)

Page 10: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 10

Feature Extraction

Feature Categories

• Profile Features• Social Network Features• Personality Features• Activity Features• Past Retweeting Features• Readiness Features

Page 11: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 11

Feature Extraction

Feature Categories

• Profile Features• Social Network Features• Personality Features• Activity Features• Past Retweeting Features• Readiness Features

Profile Features• longevity (age) of an account• length of screen name• whether the user profile has a

description• length of the description• whether the user profile has a

URL

Social Network Features• number of users following

(friends)• number of followers• and the ratio of number of

friends to number of followers

Page 12: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 12

Feature Extraction

Feature Categories

• Profile Features• Social Network Features• Personality Features• Activity Features• Past Retweeting Features• Readiness Features

Users’ word usage has been found to predict their personality•Linguistic Inquiry and Word Count (LIWC) dictionary•Personality features derived from LIWC categories [Yarkoni 2010, Mahmud 2013]

Page 13: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 13

Feature Extraction

Feature Categories

• Profile Features• Social Network Features• Personality Features• Activity Features• Past Retweeting Features• Readiness Features

• Number of status messages• Number of direct mentions (e.g., @johny)

per status message• Number of URLs per status message• Number of hashtags per status message• Number of status messages per day during

her entire• Account life (= total number of posted

status messages / longevity)• Number of status messages per day during

last one month• Number of direct mentions per day during

last one month• Number of URLs per day during last one

month• Number of hashtags per day during last

one month

Page 14: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 14

Feature Extraction

Feature Categories

• Profile Features• Social Network Features• Personality Features• Activity Features• Past Retweeting Features• Readiness Features

Past Retweeting Behavior• Number of retweets per status message:

R/N• Average number of retweets per day• Fraction of retweets for which original

messages are posted by strangers who are not in her social network

Readiness Based on Previous Activity• Tweeting Likelihood (Day)• Tweeting Likelihood (Hour)• Entropy of Tweeting Likelihood (Day)• Entropy of Tweeting Likelihood (Hour)• Tweeting Steadiness• Tweeting Inactivity

Page 15: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 15

Predicting Retweeters

Training and Test Sets:• Each dataset (public safety

and bird flu) was randomly split to training set (2/3 data) and testing set (1/3 data)

5 Predictive Models• Random Forest, Naïve Bayes,

Logistic Regression, SMO (SVM) and AdaboostM1

Handing Class Imbalance• Used both over-sampling

(SMOTE) and weighting approaches (cost-sensitive approach)

Page 16: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 16

Feature SelectionComputed χ2 value for each feature in training

21 Features Selected by χ2 in Publish Safety Dataset

46 Features Selected by χ2 in Bird Flu Dataset

Activity, personality, readiness and past retweeting feature groups have more significant power.Six significant features (bolded names) are common to both sets.

Feature Group

Significant Features (bolded is common to both data sets)

Profile the longevity of the account

Social-network|following|

ratio of number of friends to number of followers

Activity

|URLs| per day

|direct mentions| per day

|hashtags| per day

|status messages|

|status messages| per day during entire account life

|status messages| per day during last one month

Past Retweeting

|retweets| per status message

|retweets| per day

Readiness Tweeting Likelihood of the Day

Tweeting Likelihood of the Day (Entropy)

Personality

7 LIWC features: Inclusive, Achievement, Humans, Time, Sadness, Articles, Nonfluencies

1 Facet feature: Modesty

Feature Group

Significant Features (bolded is common to both data sets)

Profilethe length of description

has description in profile

Activity

|URLs| per day

|direct mentions| per day

|hashtags| per day

|URLs| per status message

|direct mentions| per status message

|hashtags| per status message

Past Retweeting

|retweets| per status message

|retweets| per day

|URLs| per retweet message

Readiness Tweeting Likelihood of the Hour (Entropy)

Personality

34 LIWC features: Inclusive, Total Pronouns, 1st Person Plural, 2nd Person, 3rd Person, Social Processes, Positive Emotions, Numbers, Other References, Occupation, Affect, School, Anxiety, Hearing, Certainty, SZensory Processes, Death, Body States, Positive Feelings, Leisure, Optimism, Negation, Physical States, Communication 8 Facet features: Liberalism, Assertiveness, Achievement Striving, Self-Discipline, Gregariousness, Cheerfulness, Activity Level, Intellect

2 Big5 features: Conscientiousness, Openness

Page 17: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 17

Evaluating Retweeter PredictionOnly the significant features are used for prediction

Prediction accuracy (Public Safety) Prediction accuracy (Bird Flu)

We use Random Forest for all following experiments.

Classifier AUC F1F1 of

Retweeter

Basic

Random Forest 0.638 0.958 0

Naïve Bayes 0.619 0.939 0.172

Logistic 0.640 0.958 0

SMO 0.500 0.96 0

AdaBoostM1 0.548 0.962 0.1

SMOTE

Random Forest 0.606 0.916 0.119

Naïve Bayes 0.637 0.923 0.132

Logistic 0.664 0.833 0.091

SMO 0.626 0.813 0.091

AdaBoostM1 0.633 0.933 0.129

Cost-Sensitive (Weighting, showing the best results in each model)

Random Forest 0.692 0.954 0.125

Naïve Bayes 0.619 0.93 0.147

Logistic 0.623 0.938 0.042

SMO 0.633 0.892 0.123

AdaBoostM1 0.678 0.956 0.133

Classifier AUC F1F1 of

Retweeter

Basic

Random Forest 0.707 0.877 0.066Naïve Bayes 0.670 0.834 0.222Logistic 0.751 0.878 0.067SMO 0.500 0.876 0AdaBoostM1 0.627 0.878 0.067

SMOTE

Random Forest 0.707 0.819 0.236Naïve Bayes 0.679 0.724 0.231Logistic 0.76 0.733 0.258SMO 0.729 0.712 0.278AdaBoostM1 0.709 0.837 0.292

Cost-Sensitive (Weighting, showing the best results in each model)

Random Forest 0.785 0.815 0.296Naïve Bayes 0.670 0.767 0.24Logistic 0.735 0.742 0.243SMO 0.676 0.738 0.256AdaBoostM1 0.669 0.87 0.031

Page 18: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 18

Comparison with Two Baselines

BaselinesRandom people contact

• Randomly select and ask a sub-set of qualified candidates

Popular people contact• Sort candidates in our

test set by their follower count in the descending order

Comparison of retweeting rates

ApproachRetweeting Rate in Testing Set

Public Safety Bird flu

Random People Contact 2.6% 8.3%

Popular People Contact 3.1% 8.5%

Our Prediction Approach 13.3% 19.7%

Page 19: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 19

Incorporating Time Constraints

• Can we predict when a person will retweet given a request?• We use an exponential distribution model to estimate a user’s retweeting

wait time with a probability.• Assume that retweeting events follow a poisson process.

• 1/λ = the average wait time for a user based on prior retweeting wait time

When a person’s average wait time 1/λ = 180 min, retweeting probability within 200 minutes is larger than 0.6

Page 20: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 20

Evaluation with Time Constraints

• Compare two baselines with our prediction approach and our prediction approach with wait-time model

• For the wait-time model, if a user has at least 70% probability to retweet within a certain time window (say, 6 hours), we will contact the user.

• We experimented with different time windows such as 6, 12, 18 or 24 hours. The below table shows the averaged retweeting rate

ApproachAverage Retweeting Rate in Testing Set

under Time Constraints

Public Safety Bird flu

Random People Contact 2.2% 6.5%

Popular People Contact 2.7% 6.4%

Our Prediction Approach 13.3% 13.6%

Our Prediction Approach + Wait-Time Model

19.3% 14.7%

Comparison of retweeting rates with time constraints

Page 21: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 21

Evaluating in Terms of Benefit & Cost

Benefit: Number of people exposed to messageCost (N): the number of people contactedK: the number of retweeters

(normalized benefit)

ApproachUnit-Info-Reach-Per-Person

Public Safety Bird flu

Random People Contact 6 85

Popular People Contact 11 116

Our Prediction Approach 106 135

Our Prediction Approach + Wait-Time Model

153 155

Comparison of information reach

Page 22: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 22

Live Experiment

• To validate the effectiveness of our approach in a live setting, we used our recommender system to test our approach against the two baselines

• Randomly selected 426 candidates who had recently tweeted about “bird flu” in July 2013

• Each approach selected top 100 candidates based on its criteria

Approach Retweeting Rate

Random People Contact 4%

Popular People Contact 9%

Our Prediction Approach 19%

ApproachAverage

Retweeting Rate

Random People Contact 4%

Popular People Contact 8.7%

Our Prediction Approach 18%

Our Prediction Approach + Wait time model

18.5%Comparison of retweeting rates in live experiment

Comparison of retweeting rates in live experiment with time constraints

Page 23: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 23

To wrap up…• We have presented a feature-based prediction model that can

automatically identify the right individuals at the right time on Twitter

• We have also described a time estimation model

• In the experiments, our approaches doubled the retweeting rates over the two baselines

• With our time estimation model, our approach outperformed other approaches significantly

• Overall, our approach effectively identifies qualified candidates for retweeting a message within a given time window

Page 24: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 24

Thanks!For more information, contact:Jeffrey [email protected] @jwnichls

Page 25: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 25

Page 26: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 26

Why Retweet a Stranger’s Request?

We randomly selected 50 people who retweeted and asked them why they chose to retweet (33 replied)

Main reasons to retweet our requested message• Trustworthiness of the content

“Because it contained a link to a significant report from a reputable media news source”

• Content relevance“Because it happened in my neighborhood”

• Content value“my followers should know this or they may think this info is valuable”

Page 27: Who will RT this?: Automatically Identifying and Engaging Strangers on Twitter to Spread Information

@jwnichls 27

Real-Time Retweeter Recommendation

The interface of our retweeter recommendation system: (a) left panel: system-recommended candidates, and (b) right panel: a user can edit

and compose a retweeting request.