Deep Tweets: from Entity Linking to Sentiment Analysis
Pierpaolo Basile, Valerio Basile, Malvina Nissim, Nicole Novielli
{pierpaolo.basile,nicole.novielli}@uniba.it
{v.basile,m.nissim}@rug.nl
Timeline of Tasks
SemEval’13
- Sentiment Analysis in Twitter
SemEval’14
- Sentiment Analysis in Twitter
- Aspect Based Sentiment Analysis
Evalita 2014
- SENTIPOLC
SemEval’15
- Implicit Polarity of Events
- Sentiment Analysis in Twitter
- Sentiment Analysis of Figurative Language in Twitter
- Aspect Based Sentiment Analysis
SemEval’16
- Sentiment Analysis in Twitter
- Aspect Based Sentiment Analysis
- Detecting Stance in Tweets
SENTIPOLC @Evalita 2014
• Tasks
– Subjectivity Classification
– Polarity Classification (most popular)
– Irony Detection
• Best system: supervised (Uniba)
– Two rule-based systems (Unibo, Ca’ Foscari-Venezia)
– All ML systems supervised
• Most popular task at Evalita 2014
– 11 teams
– 35 submitted runs (only from research institutions)
– Interest from industry
Timeline of Tasks
#Micropost2014
- Named Entity Extraction and Linking (NEEL)
#MSM2013
- Concept Extraction Challenge
SemEval’13
- Sentiment Analysis in Twitter
SemEval’14
- Sentiment Analysis in Twitter
- Aspect Based Sentiment Analysis
Evalita 2014
- SENTIPOLC
SemEval’15
- Implicit Polarity of Events
- Sentiment Analysis in Twitter
- Sentiment Analysis of Figurative Language in Twitter
- Aspect Based Sentiment Analysis
#Micropost2015
- Named Entity Extraction and Linking (NEEL)
SemEval’15
- Multilingual All-Words Sense Disambiguation and Entity Linking
SemEval’16
- Sentiment Analysis in Twitter
- Aspect Based Sentiment Analysis
- Detecting Stance in Tweets
Evalita 2016?
Entity-Based Sentiment Analysis
• Detecting the sentiment attached to an entity in a tweet
• Stance detection
• Relevant for modelling socio-economic phenomena
– Mining political sentiment, predicting election results
– Commercial application
– Health issues
Annotation of Entities
@FabioClerici sono altri a dire che un reato. E il "politometro" come lo chiama #Grillo vale per tutti. Anche per chi fa #antipolitica.
FabioClerici (offsets 1-13) linked as NIL (no resources in DBpedia)
Grillo (offsets 85-91) linked with the respective URI in DBpedia: http://dbpedia.org/resource/Beppe_Grillo
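A minimal sketch of how such an annotation could be represented in code. The class name `EntityAnnotation`, the helper `annotate`, and the 0-based, end-exclusive offset convention are assumptions made for illustration, not the format of the original dataset. Offsets are recomputed from the tweet text as it appears in this transcript, so they need not coincide with the ones on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityAnnotation:
    """One linked mention (hypothetical record layout)."""
    mention: str
    start: int           # 0-based offset of the first character
    end: int             # offset one past the last character
    uri: Optional[str]   # None stands for NIL (no DBpedia resource)

    @property
    def is_nil(self) -> bool:
        return self.uri is None


def annotate(text: str, mention: str, uri: Optional[str] = None) -> EntityAnnotation:
    """Locate the first occurrence of `mention` in `text` and record its span."""
    start = text.find(mention)
    if start < 0:
        raise ValueError(f"mention {mention!r} not found in text")
    return EntityAnnotation(mention, start, start + len(mention), uri)


tweet = (
    '@FabioClerici sono altri a dire che un reato. '
    'E il "politometro" come lo chiama #Grillo vale per tutti. '
    'Anche per chi fa #antipolitica.'
)

annotations = [
    annotate(tweet, "FabioClerici"),  # no DBpedia resource, so NIL
    annotate(tweet, "Grillo", "http://dbpedia.org/resource/Beppe_Grillo"),
]

for a in annotations:
    # The recorded span must reproduce the mention exactly.
    assert tweet[a.start:a.end] == a.mention
    print(a.mention, a.start, a.end, "NIL" if a.is_nil else a.uri)
```

Storing the span next to the link target makes it cheap to check that an annotation still points at the intended characters after any preprocessing of the tweet.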
Challenge-oriented Sentiment Analysis?
• Prevalence of supervised ML systems in both SemEval and Evalita
• Beyond the challenge, are they valid in the real world?
– Domain-dependence and low temporal validity
– Political debates: countries afflicted by war
– Technology: ‘killer’ features in positive reviews
Distribution in SENTIPOLC Data
[Charts: share of positive vs. negative tweets for #Grillo and #Monti; values shown: 39% / 61% and 34% / 66%]
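Per-topic shares of this kind can be derived from polarity-annotated tweets. The sketch below assumes each tweet has already been reduced to a (hashtag, polarity) pair; the function name `polarity_share` and the toy data are hypothetical and do not reproduce the SENTIPOLC figures.

```python
from collections import Counter, defaultdict


def polarity_share(tweets):
    """Return, for each hashtag, the percentage of tweets per polarity label."""
    counts = defaultdict(Counter)
    for hashtag, polarity in tweets:
        counts[hashtag][polarity] += 1

    shares = {}
    for hashtag, label_counts in counts.items():
        total = sum(label_counts.values())
        shares[hashtag] = {
            label: 100.0 * n / total for label, n in label_counts.items()
        }
    return shares


# Toy input (invented labels, for illustration only).
toy = [
    ("#A", "pos"), ("#A", "neg"), ("#A", "neg"), ("#A", "neg"),
    ("#B", "pos"), ("#B", "neg"),
]

shares = polarity_share(toy)
for hashtag, dist in shares.items():
    print(hashtag, dist)
```

Comparing such distributions across topics is one quick way to see how skewed a training set is toward a particular entity or domain.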
Sentiment Analysis of Figurative Language
• Complex relation between sentiment and figurative language
– Irony mainly acts as a polarity reverser
– Metaphor, sarcasm and other linguistic devices might impact sentiment in different ways
• Necessary treatment: > 20% of tweets show some form of figurative usage (irony/sarcasm)
Annotation of Irony
• Extension of the SENTIPOLC schema
subj  pos  neg  irony  opos  oneg  Description
1     1    0    1      0     1     Subjective tweet; positive literal polarity; negative overall polarity

Example: Botta di ottimismo a #lInfedele: Governo Monti, o la va o la spacca
(English: "A shot of optimism on #lInfedele: Monti government, make or break")
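The extended schema can be sketched as a small record with a consistency check. The class name `IronyAnnotation`, the method names, and the rule that an objective tweet carries no polarity or irony are assumptions made for illustration. The field names follow the table, with `pos`/`neg` read as literal polarity and `opos`/`oneg` as overall polarity, as the description suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IronyAnnotation:
    """One row of the extended schema (hypothetical record layout)."""
    subj: int    # 1 = subjective tweet
    pos: int     # literal positive polarity
    neg: int     # literal negative polarity
    irony: int   # 1 = ironic tweet
    opos: int    # overall positive polarity
    oneg: int    # overall negative polarity

    def is_polarity_reversed(self) -> bool:
        """True when overall polarity is the exact opposite of literal polarity."""
        return (self.pos, self.neg) == (self.oneg, self.opos) and self.pos != self.neg

    def validate(self) -> list:
        """Return a list of problems; an empty list means the row is consistent."""
        problems = []
        values = (self.subj, self.pos, self.neg, self.irony, self.opos, self.oneg)
        if any(v not in (0, 1) for v in values):
            problems.append("all fields must be 0 or 1")
        # Assumed rule: an objective tweet has neither polarity nor irony.
        if self.subj == 0 and any(values[1:]):
            problems.append("objective tweet cannot carry polarity or irony")
        return problems


# The row from the slide: literally positive, overall negative, ironic.
example = IronyAnnotation(subj=1, pos=1, neg=0, irony=1, opos=0, oneg=1)
print(example.is_polarity_reversed(), example.validate())
```

Keeping literal and overall polarity as separate fields lets the same record describe tweets where irony reverses polarity and tweets where it does not.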
Resources
• SENTIPOLC Dataset¹
– Train set using tweets about political topic
• TWITA²
– Expand train set
– Test (no political topic)
• Italian dataset of manually annotated tweets for Named Entity Linking³
– Add sentiment annotation
1 - http://www.di.unito.it/~tutreeb/sentipolc-evalita14/data.html (Basile et al., 2014)
2 - http://valeriobasile.github.io/twita/about.html (Basile and Nissim 2013)
3 - https://github.com/swapUniba/neel-it-twitter (Basile et al., @CLIC 2015)
Conclusion and Open Issues
• Entity linking and sentiment analysis on Twitter are challenging, attractive, and timely tasks for the Italian NLP community
– Options: running the two tasks on shared data?
– How does SA differ at the message and entity level? Techniques, features, results.
– How to deal with the layer of figurative language?
– How is annotation affected?
• How to prevent challenge-bound systems?
– Train and test set from different domains
– Multiple runs of submission
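A minimal sketch of the "train and test set from different domains" idea: hold out one whole domain for testing instead of sampling test tweets at random. The function name `split_by_domain`, the `domain` field, and the toy records are hypothetical.

```python
def split_by_domain(tweets, held_out_domain):
    """Train on every domain except `held_out_domain`; test only on that domain."""
    train = [t for t in tweets if t["domain"] != held_out_domain]
    test = [t for t in tweets if t["domain"] == held_out_domain]
    if not train or not test:
        raise ValueError("both splits must be non-empty")
    return train, test


# Toy records (invented, for illustration only).
tweets = [
    {"id": 1, "domain": "politics", "polarity": "neg"},
    {"id": 2, "domain": "politics", "polarity": "pos"},
    {"id": 3, "domain": "technology", "polarity": "pos"},
    {"id": 4, "domain": "health", "polarity": "neg"},
]

train, test = split_by_domain(tweets, held_out_domain="technology")
print(len(train), "train tweets;", len(test), "test tweets")
```

With a split like this, a system that only memorises domain-specific cues from the training data has nothing to fall back on at test time.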
Evalita 2016?