Online Reputation Monitoring in Twitter from an Information Access Perspective Damiano Spina UNED NLP & IR Group [email protected] @damiano10 January 29, 2014 FdI UCM, Madrid, Spain

DESCRIPTION

Slides of my talk about the research I'm doing for my PhD thesis, given at Grasia, UCM (http://grasia.fdi.ucm.es/) in January 2014

TRANSCRIPT

Page 1: Online Reputation Monitoring in Twitter from an Information Access Perspective

Online Reputation Monitoring in Twitter

from an Information Access Perspective

Damiano Spina

UNED NLP & IR Group

[email protected] @damiano10

January 29, 2014 FdI UCM, Madrid, Spain

Page 2: Online Reputation Monitoring in Twitter from an Information Access Perspective

In Collaboration with

● Julio Gonzalo

● Enrique Amigó

● Jorge Carrillo de Albornoz

● Irina Chugur

● Tamara Martín

University of Amsterdam

● Maarten de Rijke

● Edgar Meij (Yahoo! Barcelona)

● Mª Hendrike Peetz

Llorente & Cuenca

● Vanessa Álvarez

● Ana Pitart

● Adolfo Corujo

LiMoSINe EU Project

www.limosine-project.eu

Page 3: Online Reputation Monitoring in Twitter from an Information Access Perspective

Arab Spring in Egypt, Jan 2011

Page 8: Online Reputation Monitoring in Twitter from an Information Access Perspective

Online Reputation Monitoring (ORM)

● Reputation/public image is key for entities:

– Companies, Organizations, Personalities

● Social Media:

– Necessity (and opportunity) of handling the public image

of entities on the Web

– Online Reputation Managers/Analysts

● Handle the reputation of an entity of interest (i.e., customer)

● Among other tasks, monitoring Social Media (manually!)

– Early detection of issues/conversations/topics that may damage the

reputation of the entity of interest

Page 9: Online Reputation Monitoring in Twitter from an Information Access Perspective

Automatic Tools for ORM

Information Access (IA) techniques for:

– Tracking relevant mentions

– Sentiment analysis

– Discovering keywords/topics

Page 11: Online Reputation Monitoring in Twitter from an Information Access Perspective

Problem

● Lack of standard benchmarks

for evaluation

● It is hard for the analysts to know

how automatic tools will perform

on their real data

Page 14: Online Reputation Monitoring in Twitter from an Information Access Perspective

Goals

● Formalize the Online Reputation Monitoring

problem as scientific challenges

– Build standard test collections

– Organize International evaluation campaigns

– Bring together ORM and IA experts from Industrial and

Academic communities

● Propose automatic solutions that may assist the

reputation manager, reducing the effort in their daily

work

Page 18: Online Reputation Monitoring in Twitter from an Information Access Perspective

Outline

● Online Reputation Monitoring in Twitter

● Formalization from an Information Access perspective

– Tasks Definition

– Evaluation Framework

● How much of the problem can be solved automatically?

– Filtering

– Topic Detection

● Putting the Human in the Loop: A Semi-Automatic ORM

Assistant

Page 21: Online Reputation Monitoring in Twitter from an Information Access Perspective

Online Reputation Monitoring in

Twitter

● Analysts' daily work

– Focus on a given entity of interest

– Recall oriented

● They have to check all potential mentions!

● They also filter out irrelevant mentions manually

– They make a summary to report to the client periodically

– Summary

● What is being said about the entity in Twitter?

What are the topics that may damage its reputation?

Page 23: Online Reputation Monitoring in Twitter from an Information Access Perspective

Why Twitter?

● (Bad) news spreads earlier, faster and more unpredictably

than through any other source on the Web

● Most popular microblogging service

– >230M monthly active users

– 5k tweets published per second

● Challenging for Information Access

– Little context (only 140 characters)

– Non-standard, SMS-like language

Page 25: Online Reputation Monitoring in Twitter from an Information Access Perspective

Online Reputation Monitoring in

Twitter

?

Page 26: Online Reputation Monitoring in Twitter from an Information Access Perspective

Problem Formalization

ORM from an Information Access Perspective

Page 28: Online Reputation Monitoring in Twitter from an Information Access Perspective

Filtering Task

● Is the tweet related to the entity of interest?

● Example: Suzuki

● Input: Entity of interest (name + representative

URL) + tweets that potentially mention the entity

● Output: Binary classification at tweet-level

(relevant/not relevant)

related unrelated

Page 30: Online Reputation Monitoring in Twitter from an Information Access Perspective

Polarity for Reputation Task

● Does the tweet affect the reputation of the entity

negatively or positively?

● Example: Goldman Sachs

● Input: Entity of interest (name + representative URL) +

Stream of tweets that potentially mention the entity

● Output: Multi-class classification at tweet-level

(positive/negative/neutral)

Page 32: Online Reputation Monitoring in Twitter from an Information Access Perspective

Topic Detection Task

● What are the topics discussed in the tweets?

● Input: Entity of interest (name + representative URL) +

Stream of tweets that mention the entity

● Output: Topics (clusters of tweets)

Page 33: Online Reputation Monitoring in Twitter from an Information Access Perspective

Topic Priority Task

● What is the priority of each topic

in terms of reputational issues?

● Input: Topics

● Output: Ranking of Topics

– Alerts go first
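
The ranking step can be sketched as a stable sort over priority levels. This is only an illustration: the level names (`alert`, `mildly_important`, `unimportant`), the topic labels and the `rank_topics` helper are assumptions made for the example, since the slide only states that alerts go first.

```python
# Hypothetical priority levels, ordered from most to least urgent.
PRIORITY_ORDER = {"alert": 0, "mildly_important": 1, "unimportant": 2}

def rank_topics(topics):
    """Return topics sorted so that alerts come first.

    Python's sort is stable, so topics with the same priority keep
    their original relative order.
    """
    return sorted(topics, key=lambda topic: PRIORITY_ORDER[topic["priority"]])

# Invented example topics for one entity.
topics = [
    {"label": "new model launch", "priority": "mildly_important"},
    {"label": "safety recall", "priority": "alert"},
    {"label": "fan chatter", "priority": "unimportant"},
    {"label": "executive scandal", "priority": "alert"},
]

ranked_labels = [topic["label"] for topic in rank_topics(topics)]
print(ranked_labels)
```

A real system would first have to predict the priority of each topic; the sort only covers the final presentation step.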

Page 35: Online Reputation Monitoring in Twitter from an Information Access Perspective

Evaluation Framework

● Reusable Test Collections

● Evaluation Measures

– Compare systems to annotated ground truth

● Evaluation Campaigns

– Involve community

– Compare different approaches

Page 37: Online Reputation Monitoring in Twitter from an Information Access Perspective

RepLab: Evaluating Online Reputation

Management Systems

● Organized as CLEF Labs

Cross-Language Evaluation Forum

● 2 editions so far (+1 this year)

– RepLab 2012

● Filtering and Polarity for Reputation

● Topic Detection and Topic Priority as Monitoring Pilot Task

– RepLab 2013

– RepLab 2014 (in progress)

E. Amigó, J. Carrillo de Albornoz, I. Chugur, A. Corujo, J. Gonzalo, T. Martín, E. Meij, M. de Rijke, D. Spina. Overview of RepLab 2013: Evaluating Online Reputation Monitoring Systems. Proceedings of the Fourth International Conference of the CLEF Initiative, 2013.

Page 38: Online Reputation Monitoring in Twitter from an Information Access Perspective

Building Test Collections

Page 39: Online Reputation Monitoring in Twitter from an Information Access Perspective

Annotation Process

Page 40: Online Reputation Monitoring in Twitter from an Information Access Perspective

RepLab 2013 Annotation Tool

Page 41: Online Reputation Monitoring in Twitter from an Information Access Perspective

The RepLab 2013 Dataset

Page 42: Online Reputation Monitoring in Twitter from an Information Access Perspective

Evaluation

Page 43: Online Reputation Monitoring in Twitter from an Information Access Perspective

Why Do We Need All This?

● To Evaluate Automatic Systems

● To be able to answer the questions:

– Which system performs better?

– Can tasks be solved automatically?

Page 44: Online Reputation Monitoring in Twitter from an Information Access Perspective

Automatic Solutions for ORM:

Filtering + Topic Detection

Page 46: Online Reputation Monitoring in Twitter from an Information Access Perspective

Evaluation: Filtering Task

Automatic systems can significantly help when there is enough training data for each entity (750 tweets)

How?

● Supervised learning

POPSTAR (Univ. of Porto):

– Features: Twitter metadata, textual features, keyword similarity + external resources such as the entity’s homepage, Freebase and Wikipedia

Page 48: Online Reputation Monitoring in Twitter from an Information Access Perspective

Evaluation: Topic Detection

Much more difficult than the Filtering Task

What performed best in RepLab?

UNED_ORM: Clustering of wikified tweets

● Tweets are represented as a bag of Wikipedia concepts

● Tweet content is linked to Wikipedia concepts based on intra-Wikipedia links

Page 49: Online Reputation Monitoring in Twitter from an Information Access Perspective

Topic Detection Approach

● Tweet -> Set of Wikipedia Concepts/Articles

● Clustering: Tweets sharing x% of identified

Wikipedia articles are grouped together

D. Spina, J. Carrillo de Albornoz, T. Martín, E. Amigó, J. Gonzalo, F. Giner. UNED Online Reputation Monitoring Team at RepLab 2013. CLEF 2013 Labs and Workshops Notebook Papers, 2013.
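
The clustering idea on this slide can be sketched as follows. This is not the UNED_ORM implementation: the slide does not say how the x% overlap is measured, so the sketch assumes Jaccard overlap between concept sets and merges tweets transitively with a small union-find. The names `jaccard`, `cluster_tweets`, the threshold value and the example tweets are all hypothetical.

```python
def jaccard(a, b):
    """Share of concepts two tweets have in common (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_tweets(tweet_concepts, threshold=0.5):
    """Group tweets whose Wikipedia concept sets overlap by at least `threshold`.

    tweet_concepts maps a tweet id to the set of concepts found in it.
    Returns a sorted list of clusters, each a sorted list of tweet ids.
    """
    ids = list(tweet_concepts)
    parent = {t: t for t in ids}

    def find(t):
        # Follow parent links to the cluster representative (with path halving).
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if jaccard(tweet_concepts[a], tweet_concepts[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for t in ids:
        clusters.setdefault(find(t), []).append(t)
    return sorted(sorted(members) for members in clusters.values())

# Invented wikified tweets: tweet id -> set of Wikipedia concepts.
tweets = {
    "t1": {"Ferrari S.p.A.", "Formula One"},
    "t2": {"Ferrari S.p.A.", "Formula One", "Fernando Alonso"},
    "t3": {"Stock market"},
}
topics = cluster_tweets(tweets, threshold=0.5)
print(topics)
```

Here t1 and t2 share two of three distinct concepts, so they end up in the same topic, while t3 stays on its own. Tweets with no identified concepts never reach the threshold and remain singletons.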

Page 52: Online Reputation Monitoring in Twitter from an Information Access Perspective

Wikification: Commonness probability

WP concept c, n-gram q

q = “ferrari”

\[ \mathrm{COMMONNESS}(\text{“Ferrari S.p.A.”}, \text{“ferrari”}) = \frac{4}{4 + 2 + 1} \approx 0.57 \]
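
The worked example can be reproduced with a few lines of Python. The sketch assumes commonness is the fraction of intra-Wikipedia links with anchor text q that point to concept c, which matches the arithmetic on the slide. Only the counts 4, 2 and 1 come from the slide; the two competing concept names and the `commonness` helper are hypothetical.

```python
def commonness(link_counts, concept):
    """Fraction of links with a given anchor text that point to `concept`.

    link_counts maps each candidate concept to the number of links
    that use the anchor text and point to that concept.
    """
    total = sum(link_counts.values())
    if total == 0:
        return 0.0
    return link_counts.get(concept, 0) / total

# Links whose anchor text is "ferrari". The counts are from the slide;
# the names of the second and third concept are placeholders.
ferrari_links = {
    "Ferrari S.p.A.": 4,
    "Scuderia Ferrari": 2,  # hypothetical concept name
    "Enzo Ferrari": 1,      # hypothetical concept name
}

score = commonness(ferrari_links, "Ferrari S.p.A.")
print(round(score, 2))  # 0.57
```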

Page 53: Online Reputation Monitoring in Twitter from an Information Access Perspective

Putting the Human in the Loop

Page 54: Online Reputation Monitoring in Twitter from an Information Access Perspective

Building Semi-Automatic Tools for

ORM

Page 55: Online Reputation Monitoring in Twitter from an Information Access Perspective

ORMA: A Semi-Automatic Tool for

Online Reputation Monitoring

J. Carrillo de Albornoz, E. Amigó, D. Spina, J. Gonzalo. ORMA: A Semi-Automatic Tool for Online Reputation Monitoring in Twitter. 36th European Conference on Information Retrieval (ECIR), 2014.

Page 57: Online Reputation Monitoring in Twitter from an Information Access Perspective

Basic Filtering Approach

Training tweet

Test tweet (unknown label)

Related/Unrelated

Bag of Words: Tokenization + Preprocessing + Term Weighting

Support Vector Machines (SVM)

Filtering Classifier

F = 0.42: similar to the best RepLab system
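
The pipeline on this slide (bag of words, term weighting, linear SVM) can be sketched in plain Python. This is a toy illustration, not the system evaluated in the talk. It assumes TF-IDF as the term weighting and trains the linear SVM with a Pegasos-style subgradient method, which the slides do not mention. The stopword list, the training tweets and all function names are invented for the example.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list. The entity name is included because it occurs
# in every candidate tweet and so cannot separate the two classes.
STOPWORDS = {"a", "the", "of", "for", "at", "on", "was", "what", "suzuki"}

def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def fit_idf(token_lists):
    """Smoothed inverse document frequency for every training term."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}

def vectorize(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in training are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}

def train_linear_svm(vectors, labels, epochs=50, lam=0.01):
    """Pegasos-style subgradient descent on the hinge loss (labels are +1/-1)."""
    w, step = {}, 0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            step += 1
            eta = 1.0 / (lam * step)
            margin = y * sum(w.get(t, 0.0) * v for t, v in x.items())
            shrink = 1.0 - 1.0 / step  # effect of the L2 regulariser
            for t in w:
                w[t] *= shrink
            if margin < 1.0:  # example violates the margin: move towards it
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + eta * y * v
    return w

def predict(w, idf, text):
    """Classify a tweet as related/unrelated from the sign of the SVM score."""
    x = vectorize(tokenize(text), idf)
    score = sum(w.get(t, 0.0) * v for t, v in x.items())
    return "related" if score > 0 else "unrelated"

# Invented training tweets for the entity "Suzuki" (+1 related, -1 unrelated).
train = [
    ("New Suzuki Swift review: great city car", 1),
    ("Suzuki unveils a new motorcycle at the motor show", 1),
    ("Ichiro Suzuki hits a home run for the Yankees", -1),
    ("What a baseball game, Ichiro Suzuki was on fire", -1),
]
token_lists = [tokenize(text) for text, _ in train]
idf = fit_idf(token_lists)
vectors = [vectorize(tokens, idf) for tokens in token_lists]
w = train_linear_svm(vectors, [label for _, label in train])

print(predict(w, idf, "Test drive of the new Suzuki Swift city car"))
print(predict(w, idf, "Ichiro Suzuki baseball highlights"))
```

The classifier has no bias term, which keeps the sketch short. A production system would more likely rely on an established SVM library and a larger feature set.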

Page 58: Online Reputation Monitoring in Twitter from an Information Access Perspective

Active Learning for Filtering

M. H. Peetz, D. Spina, M. de Rijke, J. Gonzalo. Towards an Active Learning System for Company Name Disambiguation in Microblog Streams. CLEF 2013 Labs and Workshops Notebook Papers, 2013.

Page 60: Online Reputation Monitoring in Twitter from an Information Access Perspective

Active Learning for Filtering

● Margin Sampling (confidence of the classifier)

● After inspecting 2% of test data (30 out of 1500 tweets):

– 0.42 -> 0.52 F(R,S) (19.2% improvement)

– Higher than the best RepLab contribution

● The cost of initial training data can be reduced

substantially:

– Effectiveness:

10% of the training data + feedback on 10% of the test data ≈ 100% of the training data
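
Margin sampling can be sketched as picking the tweets the classifier is least sure about, that is, those whose decision score lies closest to the separating hyperplane. The scores, tweet ids and the `margin_sample` helper below are invented for illustration; only the 2% budget over 1500 test tweets comes from the slide.

```python
def margin_sample(scores, k):
    """Return the ids of the k tweets whose decision score is closest to 0.

    scores maps a tweet id to the signed classifier score, for example
    the distance to the SVM hyperplane.
    """
    return sorted(scores, key=lambda tweet_id: abs(scores[tweet_id]))[:k]

# Annotation budget from the slide: 2% of 1500 test tweets.
budget = round(0.02 * 1500)
print(budget)  # 30

# Invented decision scores for four unlabeled tweets.
scores = {"a": 2.1, "b": -0.05, "c": 0.4, "d": -1.3}
to_annotate = margin_sample(scores, 2)
print(to_annotate)  # ['b', 'c']
```

In an active learning loop the selected tweets would be labeled by the analyst, added to the training data, and the classifier retrained before the next round of selection.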

Page 65: Online Reputation Monitoring in Twitter from an Information Access Perspective

Conclusions

● Online Reputation Monitoring in Twitter

● Formalized as Information Access Tasks

– Reusable Test Collections

– Systematic Evaluation

● Can tasks be solved automatically?

– Filtering: Almost solved with enough training data

(F = 0.49, accuracy = 0.91)

– Topic: Systems are useful but not perfect

● We need the expert in the loop

– With a substantial reduction of manual effort

Page 66: Online Reputation Monitoring in Twitter from an Information Access Perspective

Online Reputation Monitoring in Twitter

from an Information Access Perspective

Damiano Spina

UNED NLP & IR Group

[email protected] @damiano10

January 29, 2014 FdI UCM, Madrid, Spain