It’s not in their tweets: Modeling topical expertise of Twitter users Claudia Wagner, Vera Liao, Peter Pirolli, Les Nelson and Markus Strohmaier Amsterdam, 16.4.2012

DESCRIPTION

Presented at the ASE/IEEE International Conference on Social Computing 2012 in Amsterdam

TRANSCRIPT

Page 1: It’s not in their tweets: Modeling topical expertise of Twitter users

It’s not in their tweets: Modeling topical expertise of Twitter users

Claudia Wagner, Vera Liao, Peter Pirolli, Les Nelson and Markus Strohmaier

Amsterdam, 16.4.2012

with…

Vera Liao

Markus Strohmaier

Les Nelson

Peter Pirolli

Motivation

On Twitter, information consumption is mainly driven by social networks

Users need to decide whom to follow in order to get trustworthy and relevant information about the topics they are interested in

Evidence from real-life

Search online for evidence

Searching for evidence on a Twitter user’s profile page

Bio

Tweets and Retweets

List Memberships

Research Questions

How useful are different types of user-related data for humans to inform their expertise judgments of Twitter users?

How useful are different types of user-related data for learning computational expertise models of users?

User Study: Expertise judgments of humans

16 participants

Task: Rate (1-5) the expertise level of selected Twitter users (with high and low expertise) for the topic “semanticweb”

3 Conditions under which the user accounts were presented to subjects:

Condition 1: Tweets, Retweets, List, Bio

Condition 2: Only Tweets and Retweets are shown

Condition 3: Only List and Bio are shown

For each condition and expertise level we have 4 Twitter pages (4 replicates)

4 * 3 * 2 = 24 pages to rate per subject
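The page count above follows from a fully crossed design. A minimal sketch of that enumeration, assuming every condition is combined with every expertise level and every replicate (the condition labels below are illustrative names, not taken from the study materials):

```python
from itertools import product

# Illustrative labels for the three presentation conditions on the slide.
conditions = [
    "tweets+retweets+lists+bio",  # condition 1
    "tweets+retweets only",       # condition 2
    "lists+bio only",             # condition 3
]
expertise_levels = ["high", "low"]
replicates = [1, 2, 3, 4]  # 4 Twitter pages per condition and expertise level

# Every combination is one page a subject has to rate.
pages = list(product(replicates, conditions, expertise_levels))
print(len(pages))  # 4 * 3 * 2 = 24
```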

User Study: Expertise judgments of humans

2-way ANOVA

Within-subject variables:

Twitter user expertise (high/low)

3 conditions

Interaction between conditions and Twitter user expertise is significant (F(2) = 8.326, p < 0.01)

Post-hoc test shows that users’ ability to correctly judge the expertise of Twitter users differs significantly between conditions 1 and 2 and between conditions 2 and 3.

Research Questions

How useful are different types of user-related data for humans to inform their expertise judgments of Twitter users?

How useful are different types of user-related data for learning computational expertise models of users?

Dataset

10 topics: semanticweb, biking, wine, democrat, republican, medicine, surfing, dogs, nutrition and diabetes

We use Wefollow directories as a manually created proxy ground truth for expertise

Top 150 users per Wefollow directory

Excluded users who appear in more than one of the 10 directories and users who mainly tweet in languages other than English
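The selection steps above can be sketched as follows. This is only an illustration under stated assumptions: `select_exclusive_users` and the toy directory contents are hypothetical, the input is assumed to be a ranked user list per directory, and the language filter is omitted because the slides do not say how non-English accounts were detected.

```python
from collections import Counter

def select_exclusive_users(directories, top_n=150):
    """Take the top_n users per directory, then drop users found in several directories.

    directories: dict mapping a topic name to a ranked list of user names
    (hypothetical input format).
    """
    top = {topic: users[:top_n] for topic, users in directories.items()}
    # Count in how many directories each user appears.
    membership = Counter(user for users in top.values() for user in set(users))
    return {
        topic: [user for user in users if membership[user] == 1]
        for topic, users in top.items()
    }

# Toy data with made-up account names.
toy = {
    "wine": ["alice", "bob", "carol"],
    "dogs": ["bob", "dave"],
}
selected = select_exclusive_users(toy, top_n=2)
print(selected)
```

Applying the top-N cut before the overlap check mirrors the order of the bullets; whether the study applied the steps in exactly this order is an assumption.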

Dataset

1145 users

Most recent 1000 tweets and retweets

Most recent 300 user lists

Bio info

Information on Twitter is sparse

Extend URLs in tweets, RTs and bio

Use list names as search query terms

Use top 5 search query result snippets obtained from Yahoo BOSS to enrich list information

Computational Expertise Models: Methodology

Learn latent semantic structures (topics) from Twitter communication by fitting an LDA model

[Figure: top 20 stemmed words of 3 randomly selected topics (T1, T2, T3) learned by an LDA model with T=50]
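To make the LDA step concrete, here is a minimal sketch of fitting a topic model with collapsed Gibbs sampling. The slides do not state which LDA implementation, hyperparameters, or inference procedure were used, so `fit_lda_gibbs`, the values of `alpha`, `beta` and `iterations`, and the toy documents are all assumptions made for illustration.

```python
import random

def fit_lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01,
                  iterations=100, top_n=3, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents (lists of words)."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables: topic counts per document, word counts per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            z = rng.randrange(num_topics)
            doc_assignments.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[word]] += 1
            topic_total[z] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed per-document topic distributions.
    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    # Most frequent words per topic.
    top_words = [
        [vocab[v] for v in sorted(range(vocab_size),
                                  key=lambda v, k=k: -topic_word[k][v])[:top_n]]
        for k in range(num_topics)
    ]
    return theta, top_words

# Toy, already stemmed documents (made up for illustration).
docs = [
    ["wine", "grape", "tast", "wine"],
    ["dog", "puppi", "train", "dog"],
    ["wine", "tast", "grape"],
    ["dog", "train", "puppi"],
]
theta, top_words = fit_lda_gibbs(docs, num_topics=2)
print(theta)
print(top_words)
```

Each row of `theta` is a distribution over topics for one document. Treating all text of one type for one user (for example, all list names) as a single document would give one topic distribution per user and data type, which is one possible reading of the methodology.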

Computational Expertise Models: Methodology

Associate users with topics by using statistical inference based on different types of user-related data, yielding the user’s topical expertise profile

[Figure: a separate distribution over the topics T1, T2, T3 is inferred from each data type: bio, lists, tweets and RTs]

Topical Similarity between lists/bio/tweets/RTs
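The slide does not name the similarity measure behind this comparison. As one common choice for comparing topic distributions, the sketch below computes the Jensen-Shannon divergence between two distributions for the same user. Both the choice of measure and the example numbers are assumptions.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, 0 for identical inputs, at most 1 bit."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Made-up topic distributions for one user over topics T1, T2, T3.
tweet_topics = [0.7, 0.2, 0.1]
list_topics = [0.1, 0.2, 0.7]
print(js_divergence(tweet_topics, list_topics))
```

A lower value means the two data types lead to more similar topic annotations for that user.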

Types of User Lists

Manual inspection of user lists

Selected 10 users at random and inspected their user list memberships (455 user lists)

We found 3 main classes of user lists:

Personal judgments (e.g., “great people”, “geeks”)

Personal relationships (e.g., “my family”, “colleagues”)

Topical Lists (e.g., “science”, “researcher”, “healthcare”)

Value of User Lists

3 human raters judged if a list (label and/or description) belongs to the class Topical Lists

77.67% of user lists were topical lists

Inter-rater agreement: Kappa = 0.62
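The slide reports a kappa value for three raters without naming the variant. Fleiss' kappa is a common choice for more than two raters, so the sketch below uses it as an assumption; the toy ratings are made up and do not reproduce the reported value.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table of per-item category counts.

    ratings: one row per item, one column per category; every row must sum
    to the same number of raters.
    """
    num_items = len(ratings)
    num_raters = sum(ratings[0])
    num_categories = len(ratings[0])

    # Share of all assignments that went to each category.
    category_share = [
        sum(row[j] for row in ratings) / (num_items * num_raters)
        for j in range(num_categories)
    ]
    # Observed agreement per item, then averaged over items.
    per_item = [
        (sum(c * c for c in row) - num_raters) / (num_raters * (num_raters - 1))
        for row in ratings
    ]
    observed = sum(per_item) / num_items
    expected = sum(share * share for share in category_share)
    return (observed - expected) / (1 - expected)

# Columns: [topical, not topical]; 3 raters per list (toy data).
toy_ratings = [[3, 0], [2, 1], [0, 3], [1, 2]]
print(fleiss_kappa(toy_ratings))
```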

Quantify the Value of Lists/Bio/Tweets/RTs

Which type of information best reflects the topical expertise of a user?

Information-Theoretic Evaluation

Which type of topic distribution best reflects the underlying category information of the user?

Normalized Mutual Information (NMI) between user’s topic distributions and user’s Wefollow directory

Task-based Evaluation

Which type of topic distribution is most useful for classifying users into their Wefollow directories?

F1-score of classification models
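A minimal sketch of the NMI computation follows. It assumes each user is reduced to a single most probable topic before comparison, and it normalizes by the mean of the two entropies. The slides specify neither choice, so both are assumptions, as are the toy values.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(xs, ys):
    """Mutual information (in nats) between two equally long label lists."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x, count_y = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in joint.items()
    )

def nmi(xs, ys):
    """NMI = 2 * I(X; Y) / (H(X) + H(Y)); other normalizations exist."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 1.0  # both labelings are constant
    return 2 * mutual_information(xs, ys) / (hx + hy)

def dominant_topic(theta):
    """Index of the most probable topic in one topic distribution."""
    return max(range(len(theta)), key=theta.__getitem__)

# Toy example: list-based topic distributions and Wefollow directories.
user_thetas = [[0.8, 0.2], [0.9, 0.1], [0.3, 0.7], [0.1, 0.9]]
directories = ["wine", "wine", "dogs", "dogs"]
topics = [dominant_topic(theta) for theta in user_thetas]
print(nmi(topics, directories))
```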

Information-Theoretic Evaluation of Computational Expertise Models

Task-based Evaluation of Computational Expertise Models

Compare topic distributions inferred via different types of user-related data within a classification task

Objective: Classifying users into Wefollow directories by using topic distribution as features

Classification Task:

Train a Partial Least Squares classifier with topic distributions inferred via different types of user-related data as features

Perform 5-fold cross-validation

Use F-measure (harmonic mean of precision and recall) to compare classifiers’ performance
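The evaluation protocol above can be sketched without committing to a particular classifier. The code below shows only the fold splitting and the per-class F-measure. The Partial Least Squares classifier itself is not reproduced, and the interleaved fold assignment is an assumption, since the slides do not say how users were assigned to folds or how scores were averaged across classes.

```python
def k_fold_indices(num_items, k=5):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, num_items, k)) for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]

def f1_score(y_true, y_pred, label):
    """F-measure for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy reference labels and predictions for one test fold.
y_true = ["wine", "wine", "dogs", "dogs"]
y_pred = ["wine", "dogs", "dogs", "dogs"]
print(f1_score(y_true, y_pred, "wine"), f1_score(y_true, y_pred, "dogs"))

splits = list(k_fold_indices(10, k=5))
print(splits[0])
```

In a full run, a classifier would be trained on each `train` split with the topic distributions as features, and the F-measure would be computed on the matching `test` split.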

Task-based Evaluation of Computational Expertise Models

[Figure: classification results for T=300; the x-axis shows reference values, the y-axis shows predictions]

Conclusions

Different types of user-related data lead to different topic annotations

List-based topic annotations are most distinct from all others

Bio-, tweet- and retweet-based topic annotations are quite similar

For creating topical expertise profiles of users, information about their list memberships is most useful

For informing humans’ expertise judgments about Twitter users, contextual information (users’ bios and list memberships) is most useful

Implications & Limitations

User Interface

Make user lists and bio information more prominent

Incentives for people to use lists more heavily

E.g., provide weekly list summaries

Search and Recommender Systems could benefit from exploiting user list information

Results are biased towards users with high Wefollow rank

Experimental Setup

THANK YOU

[email protected]

claudiawagner.info

src: http://adobeairstream.com/green/a-natural-predicament-sustainability-in-the-21st-century/

Bio and User Lists are useful for judging topical expertise