Deep Tweets: from Entity Linking to Sentiment Analysis

Pierpaolo Basile, Valerio Basile, Malvina Nissim, Nicole Novielli

{pierpaolo.basile,nicole.novielli}@uniba.it

{v.basile,m.nissim}@rug.nl

Timeline of Tasks

SemEval’13

Sentiment Analysis in Twitter

SemEval’14

- Sentiment Analysis in Twitter
- Aspect Based Sentiment Analysis

Evalita 2014

SENTIPOLC

SemEval’15

- Implicit Polarity of Events
- Sentiment Analysis in Twitter
- Sentiment Analysis of Figurative Language in Twitter
- Aspect Based Sentiment Analysis

SemEval’16

- Sentiment Analysis in Twitter
- Aspect Based Sentiment Analysis
- Detecting Stance in Tweets

SENTIPOLC @Evalita 2014

• Tasks
  – Subjectivity Classification
  – Polarity Classification (most popular)
  – Irony Detection

• Best system: supervised (Uniba)
  – Two rule-based systems (Unibo, Ca’ Foscari-Venezia)
  – All ML systems supervised

• Most popular task at Evalita 2014
  – 11 teams
  – 35 submitted runs (only from research institutions)
  – Interest from industry

Timeline of Tasks

#Micropost2014

Named Entity Extraction and Linking (NEEL)

#MSM2013

Concept Extraction Challenge

SemEval’13

Sentiment Analysis in Twitter

SemEval’14

- Sentiment Analysis in Twitter
- Aspect Based Sentiment Analysis

Evalita 2014

SENTIPOLC

SemEval’15

- Implicit Polarity of Events
- Sentiment Analysis in Twitter
- Sentiment Analysis of Figurative Language in Twitter
- Aspect Based Sentiment Analysis

#Micropost2015

Named Entity Extraction and Linking (NEEL)

SemEval’15

Multilingual All-Words Sense Disambiguation and Entity Linking

SemEval’16

- Sentiment Analysis in Twitter
- Aspect Based Sentiment Analysis
- Detecting Stance in Tweets

Evalita 2016?

Entity-Based Sentiment Analysis

• Detecting the sentiment attached to an entity in a tweet
• Stance detection
• Relevant for modelling socio-economic phenomena
  – Mining political sentiment, predicting election results
  – Commercial application
  – Health issues

Annotation of Entities

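@FabioClerici sono altri a dire che un reato. E il "politometro" come lo chiama #Grillo vale per tutti. Anche per chi fa #antipolitica.

(English: "@FabioClerici others are the ones saying it is a crime. And the 'politometro', as #Grillo calls it, applies to everyone. Including those who engage in #antipolitica.")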

FabioClerici (offsets 1-13) linked as NIL (no resources in DBpedia)

Grillo (offsets 85-91) linked with the respective URI in DBpedia: http://dbpedia.org/resource/Beppe_Grillo
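A minimal sketch of how such an annotation could be represented in code. The class and field names are illustrative and are not the dataset's actual schema. The offsets are assumed to be 0-based with an exclusive end, which is consistent with FabioClerici at 1-13 after the leading "@". The tweet used here is a shortened, hypothetical variant of the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntityAnnotation:
    """One linked mention; field names are illustrative, not the dataset's schema."""
    start: int           # 0-based offset of the first character (assumed convention)
    end: int             # offset one past the last character (assumed convention)
    uri: Optional[str]   # DBpedia URI, or None when the mention is NIL

    @property
    def is_nil(self) -> bool:
        return self.uri is None

    def surface(self, text: str) -> str:
        """Return the mention string that the offsets select in `text`."""
        return text[self.start:self.end]


def annotate(text: str, mention: str, uri: Optional[str] = None) -> EntityAnnotation:
    """Locate the first occurrence of `mention` and wrap it as an annotation."""
    start = text.find(mention)
    if start < 0:
        raise ValueError(f"mention {mention!r} not found")
    return EntityAnnotation(start, start + len(mention), uri)


# Shortened, hypothetical variant of the example tweet.
tweet = "@FabioClerici il politometro vale per tutti, dice #Grillo"
clerici = annotate(tweet, "FabioClerici")  # no DBpedia resource, so NIL
grillo = annotate(tweet, "Grillo", "http://dbpedia.org/resource/Beppe_Grillo")
```

Storing offsets rather than the mention string keeps the annotation tied to one position in the tweet, so a repeated name can be linked differently at each occurrence.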

Challenge-oriented Sentiment Analysis?

• Prevalence of supervised ML systems in both SemEval and Evalita

• Beyond the challenge, are they valid in the real world?
  – Domain-dependence and low temporal validity
  – Political debates: countries afflicted by war
  – Technology: ‘killer’ features in positive reviews

Distribution in SENTIPOLC Data

[Figure: pie charts of positive vs. negative tweets for #Grillo and #Monti. Values shown: 39% / 61% and 34% / 66%.]
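A per-hashtag polarity distribution of this kind could be computed roughly as follows. This is only a sketch: the function name and the `(hashtag, label)` input layout are assumptions, and the toy labels are invented for illustration and are not SENTIPOLC data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def polarity_shares(labelled: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Map each hashtag to {label: percentage} over the tweets that mention it."""
    counts: Dict[str, Counter] = {}
    for hashtag, label in labelled:
        counts.setdefault(hashtag, Counter())[label] += 1
    return {
        tag: {label: 100 * n / sum(c.values()) for label, n in c.items()}
        for tag, c in counts.items()
    }


# Toy (hashtag, label) pairs, invented for illustration only.
toy = [
    ("#topicA", "positive"),
    ("#topicA", "negative"),
    ("#topicA", "negative"),
    ("#topicA", "negative"),
    ("#topicB", "positive"),
]
shares = polarity_shares(toy)
```

A strongly skewed share for one hashtag is the situation a supervised system can exploit by associating the entity name itself with a polarity.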

Sentiment Analysis of Figurative Language

• Complex relation between sentiment and figurative language
  – Irony mainly acts as a polarity reverser
  – Metaphor, sarcasm and other linguistic devices might impact sentiment in different ways
• Necessary treatment: > 20% of tweets show some form of figurative usage (irony/sarcasm)

Annotation of Irony

• Extension of the SENTIPOLC schema

subj  pos  neg  irony  opos  oneg  Description
1     1    0    1      0     1     Subjective tweet; positive literal polarity; negative overall polarity

Botta di ottimismo a #lInfedele: Governo Monti, o la va o la spacca

(English: "A burst of optimism on #lInfedele: Monti government, it's make or break")
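The six flags of the extended schema can be read off mechanically, as in the sketch below. The class and function names are illustrative. The "mixed" and "neutral" readings for the 1/1 and 0/0 flag combinations are assumptions that the slide does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IronyAnnotation:
    subj: int   # 1 if the tweet is subjective
    pos: int    # 1 if the literal polarity is positive
    neg: int    # 1 if the literal polarity is negative
    irony: int  # 1 if the tweet is ironic
    opos: int   # 1 if the overall polarity is positive
    oneg: int   # 1 if the overall polarity is negative


def polarity(positive: int, negative: int) -> str:
    """Turn a pair of flags into a label ("mixed"/"neutral" are assumed readings)."""
    if positive and negative:
        return "mixed"
    if positive:
        return "positive"
    if negative:
        return "negative"
    return "neutral"


def describe(a: IronyAnnotation) -> str:
    """Build a short description from the six flags."""
    parts = ["subjective" if a.subj else "objective"]
    parts.append(f"literal: {polarity(a.pos, a.neg)}")
    parts.append(f"overall: {polarity(a.opos, a.oneg)}")
    if a.irony:
        parts.append("ironic")
    return ", ".join(parts)


def reverses_polarity(a: IronyAnnotation) -> bool:
    """True when the tweet is ironic and literal and overall polarity differ."""
    return bool(a.irony) and polarity(a.pos, a.neg) != polarity(a.opos, a.oneg)


# The row from the table above: 1 1 0 1 0 1
example = IronyAnnotation(subj=1, pos=1, neg=0, irony=1, opos=0, oneg=1)
summary = describe(example)
```

Keeping literal and overall polarity as separate flag pairs makes the "polarity reverser" effect of irony directly visible in the annotation.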

Resources

• SENTIPOLC Dataset [1]
  – Training set: tweets on political topics
• TWITA [2]
  – Expand the training set
  – Test set (non-political topics)
• Italian dataset of manually annotated tweets for Named Entity Linking [3]
  – Add sentiment annotation

1 - http://www.di.unito.it/~tutreeb/sentipolc-evalita14/data.html (Basile et al., 2014)
2 - http://valeriobasile.github.io/twita/about.html (Basile and Nissim 2013)
3 - https://github.com/swapUniba/neel-it-twitter (Basile et al., @CLIC 2015)



Conclusion and Open Issues

• Entity linking and sentiment analysis on Twitter are challenging, attractive, and timely tasks for the Italian NLP community
  – Options: running the two tasks on shared data?
  – How does SA differ at message level and entity level? Techniques, features, results.
  – How to deal with the layer of figurative language?
  – How is annotation affected?
• How to prevent challenge-bound systems?
  – Train and test set from different domains
  – Multiple runs of submission
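The idea of drawing train and test sets from different domains can be sketched as a topic-based split. The record layout `(text, topic, label)`, the function names, and the toy records are all hypothetical and serve only to illustrate the idea.

```python
from typing import List, Set, Tuple

# Hypothetical record layout: (text, topic, label).
Tweet = Tuple[str, str, str]


def split_by_domain(
    tweets: List[Tweet], train_topics: Set[str]
) -> Tuple[List[Tweet], List[Tweet]]:
    """Put tweets whose topic is in `train_topics` in train, all others in test."""
    train = [t for t in tweets if t[1] in train_topics]
    test = [t for t in tweets if t[1] not in train_topics]
    return train, test


def topics(tweets: List[Tweet]) -> Set[str]:
    """Collect the set of topics that occur in a list of tweets."""
    return {topic for _, topic, _ in tweets}


# Toy records invented for illustration.
corpus = [
    ("tweet about a minister", "politics", "negative"),
    ("tweet about an election", "politics", "positive"),
    ("tweet about a phone", "technology", "positive"),
    ("tweet about a match", "sport", "negative"),
]
train, test = split_by_domain(corpus, {"politics"})
disjoint = topics(train).isdisjoint(topics(test))
```

Because no topic appears on both sides of the split, a system cannot score well simply by memorising topic-specific vocabulary from the training data.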

Evalita 2016?