Just how mad are you? Finding strong and weak opinion clauses. Theresa Wilson, Janyce Wiebe, and Rebecca Hwa, University of Pittsburgh. Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-2004).


Page 1

Just how mad are you? Finding strong and weak opinion clauses

Theresa Wilson, Janyce Wiebe, and Rebecca Hwa

University of Pittsburgh

Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-2004).

Page 2

Abstract & Introduction

To date, most research has focused on classification at the document or sentence level.

Many applications would benefit from being able to determine not just whether something is opinionated but also the strength of the opinion. Ex: Flame detection system.

Presents the first experimental results classifying the strength of opinions and classifying the subjectivity of deeply nested clauses.

Uses a wide range of features, including new syntactic features developed for opinion recognition.

Page 3

Subjective Expressions

Subjective expressions are words and phrases that express opinions, emotions, sentiments, speculations, etc.

A general covering term for such states is private state.

There are three main ways that private states are expressed in language:
- direct mentions of private states: “fears” in (1)
- speech events expressing private states: “continued” in (2)
- expressive subjective elements: “full of absurdities” in (2)

(1) “The US fears a spill-over,” said Xirao-Nima.
(2) “The report is full of absurdities,” he continued.

Page 4

Corpus

The MPQA corpus was released in 2003. Of the annotated expressions, 53% are unique strings.

Each annotation has several attributes:
- who is expressing the opinion
- the target of the opinion
- the type of attitude
- strength (key for our purposes): neutral, low, medium, or high
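A minimal sketch of how one such annotation record might be represented in code (the field names are ours, not the MPQA schema's):

```python
from dataclasses import dataclass

@dataclass
class OpinionAnnotation:
    start: int        # character offset where the expression begins
    end: int          # character offset where it ends
    source: str       # who is expressing the opinion
    target: str       # what the opinion is about
    attitude: str     # type of attitude
    strength: str     # "neutral", "low", "medium", or "high"

ann = OpinionAnnotation(8, 13, "Xirao-Nima", "spill-over", "fear", "medium")
print(ann.strength)  # medium
```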

Page 5

Exploring Strength

Strong subjectivity is expressed in many different ways:
- Some words are clearly strong. Ex: reckless, praise
- Modifications can increase or decrease strength. Ex: very reckless
- Expressions marked high often contain words that are very infrequent. Ex: petards
- The pattern “expressed <direct-object>”. Ex: expressed hope, expressed concern
- Syntactic modifications and syntactic patterns have subjective force.
- Some patterns have a cumulative effect on strength. Ex: criticize and condemn
- Polarity: the strength of subjective expressions will be informative for recognizing polarity, and vice versa.

Page 6

Subjective clues – previous (1)

Clues from manually developed resources include:
- entries from (Levin, 1993; Ballmer and Brennenstuhl, 1981)
- FrameNet lemmas with frame element EXPERIENCER (Baker et al., 1998)
- adjectives manually annotated for polarity (Hatzivassiloglou and McKeown, 1997)
- subjectivity clues listed in (Wiebe, 1990)

Clues learned from annotated data:
- distributionally similar adjectives and verbs (Wiebe, 2000)
- the hypothesis: seeded with strong potentially subjective elements, Lin's distributional similarity process would find others (not all of them synonyms)

Page 7

Subjective clues – previous (2)

From unannotated data:
- extraction patterns and subjective nouns learned using two different bootstrapping algorithms and a set of seed words (Riloff et al., 2003; Riloff and Wiebe, 2003)
- extraction patterns are lexico-syntactic patterns

Riloff and Wiebe (2003) show that AutoSlog-TS, an algorithm for automatically generating extraction patterns, is able to find extraction patterns that are correlated with subjectivity.
Ex: <subj> was asked
Ex: <subj> is fact
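As a rough illustration (this is not AutoSlog-TS itself), a pattern such as “<subj> was asked” can be checked against dependency triples; the triple format and relation labels here are assumptions:

```python
# Hypothetical dependency triples: (head, relation, dependent).

def subj_was_asked(triples, verb="asked"):
    """Return words filling the <subj> slot of '<subj> was asked'."""
    is_passive = any(h == verb and r == "auxpass" for h, r, d in triples)
    return [d for h, r, d in triples
            if is_passive and h == verb and r == "nsubjpass"]

# "The official was asked about the report."
triples = [("asked", "nsubjpass", "official"), ("asked", "auxpass", "was")]
print(subj_was_asked(triples))  # ['official']
```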

Page 8

Subjective clues – previous (3)

n-grams (Dave et al., 2003; Wiebe et al., 2001)

Potentially subjective collocations were selected based on their precision, using two criteria:
- The precision of the n-gram must be at least 0.1.
- The precision of the n-gram must be at least as good as the precision of its constituents.
Ex: p(w1 w2) > 0.1 and p(w1 w2) >= max(p(w1), p(w2))
Ex: “it be time”, “to do so”

Low-frequency words, which require no training to identify (Wiebe et al., 2001), are also used as clues.
Ex: U-adj and-cc U-adj // charming and cheerful
Ex: U-noun be-verb U-adj // idealism be hopeless
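A minimal sketch of this two-part filter (computing precision from raw counts is our assumption; the cited work measures precision against subjectivity annotations):

```python
def precision(subjective_count, total_count):
    """Fraction of an item's occurrences that fall in subjective text."""
    return subjective_count / total_count if total_count else 0.0

def keep_collocation(p_ngram, p_w1, p_w2, floor=0.1):
    """Both criteria: precision floor, and no worse than the constituents."""
    return p_ngram >= floor and p_ngram >= max(p_w1, p_w2)

p_ngram = precision(4, 12)                    # bigram precision: ~0.33
print(keep_collocation(p_ngram, 0.15, 0.20))  # True
```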

Page 9

Subjective clues – syntax clues

The learning procedure consists of three steps.

1) First, parse the training sentences in the MPQA corpus with a broad-coverage lexicalized English parser. For this study, we use 48 POS tags and 24 grammatical relationships.

Page 10

Subjective clues – syntax clues

2) Next, we form five classes of syntactic clues from each word w in every dependency parse tree:

- root(w, t): word w with POS tag t is the root of a dependency tree (i.e., the main verb of the sentence).
- leaf(w, t): word w with POS tag t is a leaf in a dependency tree (i.e., it has no modifiers).
- node(w, t): word w with POS tag t.
- bilex(w, t, r, wc, tc): word w with POS tag t is modified by word wc with POS tag tc, and the grammatical relationship between them is r.
- allkids(w, t, r1, w1, t1, ..., rn, wn, tn): word w with POS tag t has n children. Each child word wi has POS tag ti and modifies w with grammatical relationship ri, where 1 <= i <= n.
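A minimal sketch of extracting the five clue classes from a toy dependency tree (the tree encoding is ours, not the paper's):

```python
# Each node is (word, tag, [(relation, child_node), ...]).

def extract_clues(node, is_root=True):
    word, tag, kids = node
    clues = [("node", word, tag)]
    if is_root:
        clues.append(("root", word, tag))      # main verb of the sentence
    if not kids:
        clues.append(("leaf", word, tag))      # no modifiers
    else:
        for rel, (cw, ct, _) in kids:
            clues.append(("bilex", word, tag, rel, cw, ct))
        clues.append(("allkids", word, tag,
                      tuple((r, c[0], c[1]) for r, c in kids)))
    for _, child in kids:
        clues.extend(extract_clues(child, is_root=False))
    return clues

# "fears spill-over": the verb "fears" with a single object child.
tree = ("fears", "VBZ", [("obj", ("spill-over", "NN", []))])
for clue in extract_clues(tree):
    print(clue)
```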

Page 11

Subjective clues – syntax clues

3) Finally, evaluate the collected clues.

A clue is considered to be potentially useful if more than x% of its occurrences in the MPQA training data are in phrases marked as subjective, where x is a parameter tuned on development data (in our experiments, we chose x = 70%).
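This filter translates directly into code:

```python
def is_potentially_useful(subjective_count, total_count, x=0.70):
    """Keep a clue if more than x of its MPQA occurrences are subjective."""
    return total_count > 0 and subjective_count / total_count > x

print(is_potentially_useful(8, 10))  # True: 80% of occurrences subjective
print(is_potentially_useful(6, 10))  # False: only 60%
```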

Page 12

Subjective clues – syntax clues

Potentially useful clues are further categorized into one of three reliability levels:

1. A useful clue is highly reliable if it occurs frequently in the MPQA training data. For those that occur fewer than five times, we check their reliability on the larger corpus of automatically identified subjective and objective sentences.

2. Clues that do not occur in the larger unannotated corpus are considered not very reliable.

3. Clues that occur in the subjective set at least y times more often than in the objective set are considered somewhat reliable (y = 4 in our experiments).

4. The rest are rejected as not useful clues.
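A sketch of this categorization, with the frequency cutoff of 5 and y = 4 as stated above (the exact ordering of the checks is our assumption):

```python
def reliability(mpqa_freq, subj_count, obj_count, min_freq=5, y=4):
    """Assign one of the three reliability levels, or reject the clue.
    subj_count/obj_count are occurrences in the larger unannotated corpus
    of automatically identified subjective and objective sentences."""
    if mpqa_freq >= min_freq:
        return "highly reliable"
    if subj_count == 0 and obj_count == 0:
        return "not very reliable"    # absent from the larger corpus
    if subj_count >= y * obj_count:
        return "somewhat reliable"
    return "rejected"

print(reliability(12, 0, 0))  # highly reliable (frequent in MPQA)
print(reliability(2, 9, 2))   # somewhat reliable (9 >= 4 * 2)
```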

Page 13

Feature Organization

How best can the clues be organized into features for strength classification?

Making each clue a separate feature gives poor results. Instead, we aggregate clues into sets and create one feature per set. For example, there are two features for the polar adjectives: one for the set of positive adjectives and one for the set of negative adjectives.

The value of each feature is the number of instances in the sentence or clause of all the members of the set.

We define 29 features for the PREV clues, reflecting how they were presented in the original research. These are called PREV-type.
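A minimal sketch of this set-based feature organization (the clue sets here are toy examples, not the paper's 29 PREV sets):

```python
from collections import Counter

CLUE_SETS = {
    "positive_adj": {"charming", "cheerful"},
    "negative_adj": {"reckless", "hopeless"},
}

def set_count_features(tokens):
    """One feature per clue set: count of all set members in the clause."""
    counts = Counter(tokens)
    return {name: sum(counts[w] for w in members)
            for name, members in CLUE_SETS.items()}

print(set_count_features("a reckless and hopeless plan".split()))
# {'positive_adj': 0, 'negative_adj': 2}
```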

Page 14

Feature Organization

There are 15 features for the SYNTAX clues. For example, one feature represents the set of highly reliable bilex clues. These features are called SYNTAX-type.

Although the above sets of subjectivity clues were selected because of their correlation with subjective language, they are not necessarily geared to discriminate between strong and weak subjectivity, and the groupings of clues into sets were not created with strength in mind.

We hypothesized that a feature organization that takes into consideration the potential strength of clues would do better for strength classification.

Page 15

Feature Organization

To adapt the clues to strength classification, we use the annotations in the training data to filter the clues and organize them into new sets based on strength.

For each clue c and strength rating s, we calculate P(strength(c) = s), the probability of c being in an annotation of strength s. If P(strength(c) = s) > T, for some threshold T, we put c in the set for strength s. It is possible for a clue to be in more than one set.

When PREV and SYNTAX clues are used in this feature organization, they are called PREV-strength and SYNTAX-strength.
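A sketch of building the strength-based sets from annotated data (the threshold value is illustrative; with T below 0.5 a clue can land in more than one set, as noted above):

```python
from collections import Counter, defaultdict

def strength_sets(observations, T=0.25):
    """observations: (clue, strength) pairs from the training annotations.
    Returns strength -> {clues with P(strength(c) = s) > T}."""
    per_clue = defaultdict(Counter)
    for clue, strength in observations:
        per_clue[clue][strength] += 1
    sets = defaultdict(set)
    for clue, counts in per_clue.items():
        total = sum(counts.values())
        for s, n in counts.items():
            if n / total > T:
                sets[s].add(clue)
    return dict(sets)

obs = [("reckless", "high"), ("reckless", "high"), ("reckless", "medium")]
print(strength_sets(obs))  # {'high': {'reckless'}, 'medium': {'reckless'}}
```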

Page 16

Experiment

We wished to confirm three hypotheses:

1. It is possible to classify the strength of clauses, both those that are deeply nested and those at the sentence level. Classifying subjectivity at deeper levels can be challenging because there is less information to use for classification.

2. Classifying the strength of subjectivity depends on a wide variety of features, including both lexical and syntactic clues.

3. Organizing features by strength is beneficial.

Page 17

Experiment

To test our hypotheses, we performed the experiments under different settings:

1. The learning algorithm used to train the classifiers. The three machine learning algorithms are:
- boosting, using BoosTexter (Schapire and Singer, 2000)
- rule learning, using Ripper (Cohen, 1995)
- support vector regression, using SVMlight, discretizing the resulting output into the ordinal strength classes

2. The depth of the clauses to be classified. We vary the depth of clauses to determine the effect of clausal depth. In our experiments, clauses are determined based on the non-leaf verbs in the parse tree (see the sketch after this list). For example, this sentence has three clauses:
“They were driven out by rival warlord Saif Ullah, who  // level 1
has refused                                             // level 2
to give up power.”                                      // level 3

3. The types of features used.
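A rough sketch of identifying clauses by non-leaf verbs and their nesting depth, using the same toy tree encoding as earlier (how the paper delimits the exact clause spans is not shown here):

```python
def clauses_by_depth(node, depth=1):
    """Yield (verb, level) for each non-leaf verb in a dependency tree."""
    word, tag, kids = node
    heads_clause = tag.startswith("VB") and bool(kids)
    if heads_clause:
        yield word, depth
    for _, child in kids:
        yield from clauses_by_depth(child, depth + 1 if heads_clause else depth)

# "driven ... who has refused to give up power"
tree = ("driven", "VBN",
        [("rcmod", ("refused", "VBN",
                    [("xcomp", ("give", "VB",
                                [("obj", ("power", "NN", []))]))]))])
print(list(clauses_by_depth(tree)))
# [('driven', 1), ('refused', 2), ('give', 3)]
```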

Page 18

Experiment results

Results are averaged over 10-fold cross-validation using 9313 sentences from the MPQA corpus. Significance is measured using a one-tailed t-test.

Because the strength classes are ordinal, predicting medium for a high-strength clause is a smaller error than predicting neutral. Mean-squared error (MSE) captures this distinction, so it is perhaps more important than accuracy as an evaluation metric.

[Chart of classification accuracy, not reproduced in this transcript.]
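Mean-squared error over the four ordinal classes makes the near/far distinction concrete (mapping the classes to 0..3 is the usual reading; the paper does not spell out the encoding):

```python
def mse(golds, preds, order=("neutral", "low", "medium", "high")):
    """MSE over ordinal strength classes mapped to 0..3."""
    idx = {s: i for i, s in enumerate(order)}
    return sum((idx[g] - idx[p]) ** 2 for g, p in zip(golds, preds)) / len(golds)

# A near miss costs far less than a distant one.
print(mse(["high"], ["medium"]))   # 1.0
print(mse(["high"], ["neutral"]))  # 9.0
```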

Page 19

Classification results

Strength classification results for clauses of depth 1-4:
- (1) baseline: choose the most frequent class
- (2) BAG: the words in each sentence are given to the classification algorithm as features

Setting (6) gives the best result.

Page 20

Classification results

Note that BoosTexter and Ripper are non-ordinal classification algorithms, whereas support vector regression takes ordinal values into account. This difference is reflected in the results.

The best experiments use all the features, supporting the hypothesis that using a wide variety of features is effective.

The new syntax clues contribute information over and above the bag-of-words and the previous clues.

Organizing features by strength is beneficial.

Page 21

Conclusions

This paper presents promising results in identifying opinions in deeply nested clauses and classifying their strengths.

In 10-fold cross-validation experiments using boosting, we achieve improvements over baseline in mean-squared error ranging from 48% to 60% and improvements in accuracy ranging from 23% to 79%.

Experiments using support vector regression show even stronger mean-squared error results, with improvements ranging from 57% to 64% over baseline.

Applications such as question answering and summarization will benefit from the rich subjectivity analysis performed by our system.