REACTION Workshop 2011.01.06

Task 1 – Progress Report & Plans

Lisbon, PT and Austin, TX

Mário J. Silva, University of Lisbon, Portugal

Grants (paid by Reaction)

Sílvio Moreira (BI: Oct 1, 2010 – March 31, 2011)

João Ramalho (BIC: Jan 1, 2011 – April 31, 2011)

Mining resources

Development of robust linguistic resources to process different types and genres of texts:

knowledge resources about media personalities: recognizing and resolving references to named entities;

sentiment lexicons and grammars: detecting the polarity of opinions about media personalities;

annotated corpora: training different text classifiers and evaluating classification procedures.

Mining resources

POWER – Political Ontology for Web Entity Retrieval

SentiLex-PT01 – Sentiment Lexicon for Portuguese

SentiCorpus-PT09 – Sentiment annotated corpus of user comments to political debates

POWER

POWER is an ontology that formalizes the domain knowledge defining a political landscape, i.e., the political actors and their roles in the political scene, their relationships and interactions. The ontology is focused on describing:

Politicians

Political Institutions with different levels of authority (International, National, Regional,...)

Political Associations

Political Affiliations and Endorsements

Elections

Mandates

POWER

Currently, the ontology describes the following, all from the Portuguese political scene:

587 Political actors

17 (editions) of Political Institutions

16 Political Associations

900 Mandates

1 Election

6 Candidate Lists

SentiLex-PT01

SentiLex-PT01 is a sentiment lexicon for Portuguese made up of 6,321 adjective lemmas and 25,406 inflected forms.

The sentiment entries correspond to human predicate adjectives.

The sentiment attributes described in SentiLex-PT01 concern:

the predicate polarity,

the target of sentiment, and

the polarity assignment (which was performed manually or automatically, by JALC)

SentiLex-lem-PT01

6,321 lemmas

abatido.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN

abelhudo.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN

abençoado.PoS=Adj;TG=HUM;POL=1;ANOT=JALC

atrevido.PoS=Adj;TG=HUM;POL=0;ANOT=MAN

bem-educado.PoS=Adj;TG=HUM;POL=1;ANOT=MAN

brega.PoS=Adj;TG=HUM;POL=-1;ANOT=JALC

violento.PoS=Adj;TG=HUM;POL=-1;ANOT=JALC

Recently made publicly available on: http://xldb.fc.ul.pt/wiki/SentiLex-PT01
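As an illustration of how the lemma entries might be read programmatically, here is a minimal Python sketch. It assumes the `lemma.PoS=...;TG=...;POL=...;ANOT=...` layout seen in the sample entries, with a period separating the lemma from its attributes and no period inside the lemma itself. The names `LemmaEntry` and `parse_lemma_entry` are hypothetical and not part of the SentiLex-PT01 distribution.

```python
from typing import NamedTuple


class LemmaEntry(NamedTuple):
    lemma: str
    pos: str         # PoS attribute, e.g. "Adj"
    target: str      # TG attribute, e.g. "HUM"
    polarity: int    # POL attribute: -1, 0 or 1 in the sample entries
    annotation: str  # ANOT attribute: "MAN" or "JALC" in the sample entries


def parse_lemma_entry(line: str) -> LemmaEntry:
    """Parse one 'lemma.PoS=...;TG=...;POL=...;ANOT=...' line."""
    # Assumption: the first period ends the lemma.
    lemma, _, attributes = line.strip().partition(".")
    fields = dict(pair.split("=", 1) for pair in attributes.split(";"))
    return LemmaEntry(
        lemma=lemma,
        pos=fields["PoS"],
        target=fields["TG"],
        polarity=int(fields["POL"]),
        annotation=fields["ANOT"],
    )


entry = parse_lemma_entry("abatido.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN")
print(entry.lemma, entry.polarity, entry.annotation)  # abatido -1 MAN
```

Keeping the polarity as an integer makes it easy to filter or sum entries later; the other attributes stay as plain strings because their full value sets are not listed here.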

SentiLex-flex-PT01

25,406 inflected forms

abatida,abatido.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=MAN

abatidas,abatido.PoS=Adj;GN=fp;TG=HUM;POL=-1;ANOT=MAN

abatido,abatido.PoS=Adj;GN=ms;TG=HUM;POL=-1;ANOT=MAN

abatidos,abatido.PoS=Adj;GN=mp;TG=HUM;POL=-1;ANOT=MAN

bem-educada,bem-educado.PoS=Adj;GN=fs;TG=HUM;POL=1;ANOT=MAN

bem-educadas,bem-educado.PoS=Adj;GN=fp;TG=HUM;POL=1;ANOT=MAN

bem-educado,bem-educado.PoS=Adj;GN=ms;TG=HUM;POL=1;ANOT=MAN

bem-educados,bem-educado.PoS=Adj;GN=mp;TG=HUM;POL=1;ANOT=MAN

brega,brega.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=JALC

brega,brega.PoS=Adj;GN=ms;TG=HUM;POL=-1;ANOT=JALC

bregas,brega.PoS=Adj;GN=mp;TG=HUM;POL=-1;ANOT=JALC

bregas,brega.PoS=Adj;GN=fp;TG=HUM;POL=-1;ANOT=JALC

Recently made publicly available on: http://xldb.fc.ul.pt/wiki/SentiLex-PT01
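The inflected-form entries can be handled in a similar way. The sketch below assumes the `form,lemma.PoS=...;GN=...;TG=...;POL=...;ANOT=...` layout of the sample entries, where `GN` appears to encode gender and number (for example `fs` on "abatida" and `mp` on "abatidos"). The function names `parse_flex_entry` and `forms_by_lemma` are hypothetical.

```python
from collections import defaultdict


def parse_flex_entry(line: str) -> dict:
    """Parse one 'form,lemma.PoS=...;GN=...;TG=...;POL=...;ANOT=...' line."""
    # Assumption: the first period ends the 'form,lemma' head.
    head, _, attributes = line.strip().partition(".")
    form, _, lemma = head.partition(",")
    entry = dict(pair.split("=", 1) for pair in attributes.split(";"))
    entry["form"] = form
    entry["lemma"] = lemma
    entry["POL"] = int(entry["POL"])
    return entry


def forms_by_lemma(lines):
    """Group the inflected forms found in `lines` under their lemma."""
    index = defaultdict(set)
    for line in lines:
        entry = parse_flex_entry(line)
        index[entry["lemma"]].add(entry["form"])
    return dict(index)


sample = [
    "abatida,abatido.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=MAN",
    "abatidas,abatido.PoS=Adj;GN=fp;TG=HUM;POL=-1;ANOT=MAN",
    "brega,brega.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=JALC",
    "brega,brega.PoS=Adj;GN=ms;TG=HUM;POL=-1;ANOT=JALC",
]
print(forms_by_lemma(sample))
```

A set is used per lemma because the same surface form can occur more than once with different `GN` values, as with "brega".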

SentiCorpus-PT09

SentiCorpus-PT09 is a collection of comments posted by readers of the Público newspaper on a series of 10 news articles, each covering a televised face-to-face debate between the main candidates in the 2009 parliamentary elections.

The collection is composed of 2,795 comments (~8,000 sentences).

3,537 sentences, from 736 comments (27% of the corpus), were manually labeled with sentiment information.

Sentiment annotation involves different relevant dimensions, such as polarity, opinion target, target mention and verbal irony.
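To make the annotation dimensions concrete, the following Python sketch models one annotated sentence and computes how often sentences of a given polarity are marked as ironic. This is only an illustration: the corpus file format is not described here, so the class `SentenceAnnotation`, its field names, the integer polarity encoding, the mention-type labels and the sample records are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class SentenceAnnotation:
    text: str
    polarity: int      # assumed encoding: -1, 0 or 1
    target: str        # opinion target (placeholder names below)
    mention_type: str  # assumed labels: "name", "pronoun", "description", "nickname"
    ironic: bool       # whether verbal irony was annotated


def irony_rate(annotations, polarity):
    """Share of sentences with the given polarity that are marked ironic."""
    selected = [a for a in annotations if a.polarity == polarity]
    if not selected:
        return 0.0
    return sum(a.ironic for a in selected) / len(selected)


# Made-up records, not taken from the corpus.
sample = [
    SentenceAnnotation("sentence 1", 1, "Candidate A", "name", True),
    SentenceAnnotation("sentence 2", 1, "Candidate A", "pronoun", False),
    SentenceAnnotation("sentence 3", -1, "Candidate B", "nickname", False),
]
print(irony_rate(sample, polarity=1))
```

Storing the mention type next to the target makes it straightforward to count how often targets are referred to by name rather than by pronoun, description or nickname.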

Main findings

The real challenge in performing opinion mining on user-generated content is correctly identifying the positive opinions:

Positive opinions are less frequent than negative opinions (20%).

Positive opinions are particularly exposed to verbal irony (11%).

Other opinion mining challenges are related to the entity recognition and co-reference resolution sub-tasks:

Mentions of human targets are frequently made through pronouns, definite descriptions and nicknames.

The most frequent type of mention is the person name, but it only covers 36% of the analyzed cases.

Next steps

April 2011:

POWER:

Populating the ontology using text-mining approaches

Internal release

SentiLex-PT01:

Exploring other methods and algorithms (SVM, Active Learning) for automatic polarity classification

Enlarging the sentiment lexicon (verbs, predicate nouns, idiomatic expressions)

Next steps

August 2011:

POWER:

First release to the general public via a SPARQL endpoint and a web user interface

SentiCorpus-PT09:

Publicly available

Analysis and (semi-automated) annotation of a collection of documents from industrial and social media, over a period of 6 months