entity sentiment extraction using text ranking - sigir

27
Entity Sentiment Extraction Using Text Ranking John O’Neil Attivio, Inc. 15 August 2012

Upload: others

Post on 11-Feb-2022

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Entity Sentiment Extraction Using Text Ranking - sigir

Entity Sentiment Extraction Using Text Ranking

John O’Neil

Attivio, Inc.

15 August 2012

Page 2: Entity Sentiment Extraction Using Text Ranking - sigir

An Example

I already hated AT&T. It’s my fixed telephony andinternet provider (because it has something of amonopoly on such services). I go through periods wheremy internet becomes intermittent, which AT&T refusesto acknowledge. . . I love love love my iPhone. It’s mymini-computer on the go. I use it for texting, socialsharing, photography, editing, keeping track of mycalendar, storing contacts, finding directions, listening tomusic and podcasts, watching videos, reading, andblogging. Sometimes, I even make a phone call.

— http://stumbledownunder.com/2012/01/07/using-my-beloved-iphone-in-australia/

Page 3: Entity Sentiment Extraction Using Text Ranking - sigir

Entity Sentiment

I Entity extraction and document sentiment are well-knowntechniques.

I For many uses, it’s important to assign sentiment to entitiesin a document, not to the document as a whole.

I How best to accomplish this?

Page 4: Entity Sentiment Extraction Using Text Ranking - sigir

TextRank (Mihalcea & Tarau 2004)

I Document as graph.

I Choose representation appropriately!

I Power iteration finds the dominant eigenvector.

Page 5: Entity Sentiment Extraction Using Text Ranking - sigir

Prerequisite: Entity Extraction

I A combination of statistical and rule-based approaches.

I We get the positions of the entity mentions in the document,and resolve matches.

Page 6: Entity Sentiment Extraction Using Text Ranking - sigir

Prerequisite: Document Sentiment

I Train on a corpus of positive and negative BOWs using yourfavorite linear classifier.

I This associates a (positive or negative) sentiment weight foreach word (and optionally phrase) in the training corpus.

Page 7: Entity Sentiment Extraction Using Text Ranking - sigir

TextRank Highlights

I Document graph

Nodes Words and entitiesEdges Between nearby words-entity pairs and

word-word pairs.Edge Weights Word sentiment

I PageRank

I De-sparsified matrix

Page 8: Entity Sentiment Extraction Using Text Ranking - sigir

TextRank algorithm

Input: Initial set of vertex weights WSIterate until convergence:

WS(Vi ) = (1 − d) + d ∗∑

Vj∈In(Vi )

wji∑Vk∈Out(Vj )

wjk

WS(Vj)

I wij is the weight of the edge going from vertex Vi to vertex Vj

I In(Vi ) are the edges that point to Vi

I Out(Vi ) are the edges that point away from Vi .

I d is a constant damping factor (typically 0.85).

I At convergence, WS contains the final sentiment weights.

Page 9: Entity Sentiment Extraction Using Text Ranking - sigir

An Example, Again

I already hated AT&T. It’s my fixed telephony andinternet provider (because it has something of amonopoly on such services). I go through periods wheremy internet becomes intermittent, which AT&T refusesto acknowledge. . . I love love love my iPhone. It’s mymini-computer on the go. I use it for texting, socialsharing, photography, editing, keeping track of mycalendar, storing contacts, finding directions, listening tomusic and podcasts, watching videos, reading, andblogging. Sometimes, I even make a phone call.

— http://stumbledownunder.com/2012/01/07/using-my-beloved-iphone-in-australia/

Page 10: Entity Sentiment Extraction Using Text Ranking - sigir

Simple Text Graph

ATT monopoly

refusesiPhone

hated

love

−1.5

−0.5

3.2

−0.9

−0.9 3.2

Page 11: Entity Sentiment Extraction Using Text Ranking - sigir

Simple Text Node Weights

initial final

AT&T 1.0 -1.25iPhone 1.0 0.95hated 1.0 -0.84monopoly 1.0 -0.09refuses 1.0 0.28love 1.0 0.95

Page 12: Entity Sentiment Extraction Using Text Ranking - sigir

Main Uses of Entity Sentiment (for us)

Faceting Filling facets with entries relevant to the query.

Entities Creating metadata for entities, improving search.

Time Viewing entity sentiment changes over time.

Page 13: Entity Sentiment Extraction Using Text Ranking - sigir

Entity Sentiment Evaluation

I Without test corpus, compare systems:

TextRank The one described here.Baseline System using document’s sentiment for each

entity in the document.

I Task: get most highly correlated (and anti-correlated)entity-&-sentiment pairs

Page 14: Entity Sentiment Extraction Using Text Ranking - sigir

Entity Sentiment Evaluation Corpus

I One day of the Moreover feed: 23 September 2011.

I Approximately 423,000 news articles in English, mostly U.S.

Page 15: Entity Sentiment Extraction Using Text Ranking - sigir

Top Headlines for 23 September, 2011

I Idaho to seek waiver for No Child Left Behind law

I Spending Dispute Threatens U.S. Government Shutdown

I Faster than light? CERN findings bewilder scientists

I Saleh Returns to Yemen amid Increased Violence

I GOP Candidates Debate in Orlando; Audience Boos GaySoldier

Page 16: Entity Sentiment Extraction Using Text Ranking - sigir

Baseline: Top Document Co-occurrences on query Obama

entity log likelihood

Barack Obama White House 4344.15Barack Obama Mitt Romney 3677.81Barack Obama Rick Perry 3612.59Barack Obama West Bank 3120.62Barack Obama Mahmoud Abbas 2879.53Barack Obama Jon Huntsman 2644.31Barack Obama Michele Bachmann 2526.38Barack Obama United States 2520.69Barack Obama Benjamin Netanyahu 2508.19Barack Obama Rick Santorum 2083.20

Page 17: Entity Sentiment Extraction Using Text Ranking - sigir

Baseline: Top Document Co-occurrences on query StephenHill

entity log likelihood

Stephen Hill Rick Santorum 815.64Stephen Hill Gay Soldier 220.70Stephen Hill Rick Perry 195.20Stephen Hill Megyn Kelly 171.66Stephen Hill Ron Paul 165.22Stephen Hill Brian Williams 141.52Stephen Hill John Kerry 109.87Stephen Hill Mitt Romney 105.68Stephen Hill Herman Cain 90.21Stephen Hill Newt Gingrich 86.38

Page 18: Entity Sentiment Extraction Using Text Ranking - sigir

Baseline: Positive and Negative Entity Sentiment inCorpus

entity %pos %neg

Barack Obama 70.7 29.3Congress 87.6 12.4Michelle Bachmann 94.2 5.8Rick Perry 79.7 20.3Rick Santorum 82.5 17.5Ron Paul 90.0 10.0John Kerry 88.0 12.0Mitt Romney 82.3 17.7Herman Cain 85.1 14.9Newt Gingrich 85.0 15.0Ali Abdullah Saleh 38.9 61.1

Page 19: Entity Sentiment Extraction Using Text Ranking - sigir

TextRank: Positive and Negative Entity Sentiment inCorpus

entity %pos %neg

Barack Obama 46.25 53.75Congress 46.0 54.0Michelle Bachmann 0.0 100.0Rick Perry 39.2 60.8Rick Santorum 13.3 86.7Ron Paul 34.5 65.5Mitt Romney 70.5 29.5Newt Gingrich 77.4 22.6Ali Abdullah Saleh 5.6 94.4

Page 20: Entity Sentiment Extraction Using Text Ranking - sigir

TextRank: Top Same-Polarity Correlations on queryObama

entity log likelihood

Barack Obama Idaho 314.57Barack Obama Eric Holder 134.66Barack Obama Arne Duncan 107.16Barack Obama Angela Merkel 103.03Barack Obama Education Department 74.15

Page 21: Entity Sentiment Extraction Using Text Ranking - sigir

TextRank: Top Opposite-Polarity Correlations on queryObama

entity log likelihood

Barack Obama Mumbai Attackers 388.06Barack Obama Capitol Hill 282.10Barack Obama Republicans 144.94Barack Obama Congress 84.18Barack Obama Michele Bachmann 76.61

Page 22: Entity Sentiment Extraction Using Text Ranking - sigir

TextRank: Top Same-Polarity Correlations on queryStephen Hill

entity log likelihood

Stephen Hill Gay Soldier 13.58

Page 23: Entity Sentiment Extraction Using Text Ranking - sigir

TextRank: Top Opposite-Polarity Correlations on queryStephen Hill

entity log likelihood

Stephen Hill Fox News 114.39Stephen Hill Rick Santorum 116.08Stephen Hill Republican Debate 30.96Stephen Hill Rick Perry 18.43Stephen Hill Mitt Romney 3.98

Page 24: Entity Sentiment Extraction Using Text Ranking - sigir

TextRank: Top Same-Polarity Correlations on queryCongress

entity log likelihood

Congress Mitch Daniels 466.28Congress Senate 122.13Congress Democrats 117.57Congress Treasury Department 95.66Congress Sonia Gandhi 64.59

Page 25: Entity Sentiment Extraction Using Text Ranking - sigir

TextRank: Top Opposite-Polarity Correlations on queryCongress

entity log likelihood

Congress Capitol Hill 278.19Congress Americans 110.84Congress Barack Obama 54.55Congress Janet Napolitano 27.03Congress Senate 17.43

Page 26: Entity Sentiment Extraction Using Text Ranking - sigir

Conclusions & Future Directions

I Extraction of recognizably useful information.

I Need test corpus.

Page 27: Entity Sentiment Extraction Using Text Ranking - sigir

The End

Thanks! Questions?