Automatic Identification of Pro and Con Reasons in Online Reviews
Soo-Min Kim and Eduard Hovy
COLING’06
Advisor: Chia-Hui Chang
Presenter: Teng-Kai Fan
Date: 2008-05-20
Abstract
The authors present a system that automatically extracts pros and cons from online reviews. Their focus is on extracting the reasons behind the opinions, which may be stated as either facts or opinions.
They propose a system based on a maximum entropy model for aligning the pros and cons with their sentences in the review texts.
Outline
Introduction
Pro and Con in Online Reviews
Finding Pros and Cons
Dataset
Experiments and Results
Conclusion
Introduction
Many opinions are being expressed on the Web in such settings as product reviews, personal blogs, and newsgroup messages.
This trend has raised many interesting research topics such as subjectivity detection, semantic orientation classification, and review classification.
Introduction cont.
Subjectivity detection: the task of identifying subjective words, expressions, and sentences.
Semantic orientation classification: the task of determining the positive or negative sentiment of words, phrases, sentences, or documents.
Introduction cont.
The opinion reason identification problem seeks to answer the question “What are the reasons that the author of this review likes or dislikes the product?”
Hence, they focus on extracting pros and cons which include not only sentences that contain opinion-bearing expressions about products and features but also sentences with reasons.
Introduction cont.
Labeling each sentence by hand is a time-consuming and costly task.
The authors propose a framework for automatically identifying reasons in online reviews and introduce a novel technique to label training data.
The experimental results show that the system identifies pros and cons with 66% precision and 76% recall.
Pros and Cons in Online Reviews
Researchers study opinions at three different levels: word, sentence, and document level.
They assume that the reasons in a review are closely related to the pros and cons expressed in the review.
Pros in a product review are sentences that describe reasons why the author of the review likes the product.
Automatically Labeling Pro and Con Sentences
Many web sites that host product reviews, such as amazon.com and epinions.com, explicitly state pro and con phrases.
Hence, the automatic labeling system first collects the phrases in the pro and con fields and then searches the main review text to collect the sentences corresponding to those phrases.
Automatically Labeling Pro and Con Sentences cont.
First, the system generates two sets of phrases, {P1, P2,…,Pn} and {C1, C2,…,Cn}, by extracting the pro and con fields. Ex.: beautiful display.
Then, the system checks each sentence in the main review to find one that covers most of the words in the phrase. Ex.: I’m personally quite happy with it because of the beautiful display.
Last, the system annotates this sentence with the “pro” label.
[Figure labels: Pro, Con, Main Review]
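The labeling steps above can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function names, the simple word tokenizer, and the `min_coverage` threshold (one reading of "covers most of the words") are all assumptions.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def best_matching_sentence(phrase, sentences, min_coverage=0.5):
    """Return the sentence that covers the largest share of the phrase's words.

    min_coverage is a hypothetical threshold; returns None if no sentence
    reaches it.
    """
    phrase_words = set(tokenize(phrase))
    if not phrase_words:
        return None
    best, best_coverage = None, 0.0
    for sentence in sentences:
        overlap = phrase_words & set(tokenize(sentence))
        coverage = len(overlap) / len(phrase_words)
        if coverage >= min_coverage and coverage > best_coverage:
            best, best_coverage = sentence, coverage
    return best


def label_review(pro_phrases, con_phrases, sentences):
    """Map each matched sentence of the main review to 'pro' or 'con'."""
    labels = {}
    for label, phrases in (("pro", pro_phrases), ("con", con_phrases)):
        for phrase in phrases:
            match = best_matching_sentence(phrase, sentences)
            if match is not None:
                labels[match] = label
    return labels


review = [
    "I bought it last month.",
    "I'm personally quite happy with it because of the beautiful display.",
    "The battery life is short.",
]
labels = label_review(["beautiful display"], ["short battery life"], review)
```

Sentences that match no phrase stay unlabeled here; how such sentences are treated in training is not specified on this slide.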
Modeling with Maximum Entropy Classification
They use Maximum Entropy classification for the task of finding pro and con sentences in a given review.
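The conditional probability of a class c given a feature vector x:
$$P(c \mid x) = \frac{1}{Z_x} \exp\Big(\sum_i \lambda_i f_i(c, x)\Big)$$
where:
f_i(c, x): a feature function with a Boolean value.
λ_i: the weight parameter for the feature function f_i.
Z_x: a normalization factor that makes the probabilities over all classes sum to 1.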
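As a rough illustration of how this conditional probability is evaluated, the sketch below computes P(c | x) for every class from Boolean feature functions and their weights. The two feature functions, the weights, and the class names are made up for the example and are not the features used in the paper.

```python
import math


def maxent_probabilities(x, classes, features, weights):
    """Return P(c | x) for each class c under a maximum entropy model.

    features: list of functions f_i(c, x) returning 0 or 1.
    weights:  list of lambda_i, one per feature function.
    """
    scores = {
        c: math.exp(sum(w * f(c, x) for f, w in zip(features, weights)))
        for c in classes
    }
    z_x = sum(scores.values())  # normalization factor Z_x
    return {c: score / z_x for c, score in scores.items()}


# Hypothetical Boolean features over a bag of words x.
features = [
    lambda c, x: 1 if c == "pro" and "beautiful" in x else 0,
    lambda c, x: 1 if c == "con" and "broken" in x else 0,
]
weights = [1.0, 1.5]  # assumed values; in practice learned from training data

probs = maxent_probabilities({"beautiful", "display"}, ["pro", "con"], features, weights)
```

With these made-up weights only the first feature fires, so "pro" receives the larger probability.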
Modeling with Maximum Entropy Classification cont.
To build an efficient model, the task of finding pro and con sentences is separated into two phases:
Identification separates pro and con candidate sentences (PR and CR) from sentences irrelevant to either of them (NR).
Classification classifies the candidates into pros and cons.
Identification → Classification
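The two phases can be read as a simple cascade. In the sketch below, `is_candidate` and `classify_candidate` are hypothetical stand-ins for the two trained models; the keyword rules used in the example are placeholders, not the paper's classifiers.

```python
def find_pros_and_cons(sentences, is_candidate, classify_candidate):
    """Run identification first, then classify only the surviving candidates."""
    results = {}
    for sentence in sentences:
        if not is_candidate(sentence):
            results[sentence] = "NR"  # irrelevant to both pros and cons
        else:
            results[sentence] = classify_candidate(sentence)  # "PR" or "CR"
    return results


# Placeholder rules standing in for the two trained models.
def toy_is_candidate(sentence):
    return any(word in sentence.lower() for word in ("display", "battery"))


def toy_classify(sentence):
    return "PR" if "beautiful" in sentence.lower() else "CR"


cascade = find_pros_and_cons(
    ["I bought it last month.", "The display is beautiful.", "The battery died."],
    toy_is_candidate,
    toy_classify,
)
```

One consequence of this design is that the second model only ever sees sentences the first model accepted.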
Features
1. News Corpus
2. WordNet
Dataset
Two different sources: Epinions.com for training and Complaints.com for testing.
Dataset 1: Automatically labeled data
Mp3 player: 3241 reviews (115029 sentences)
Restaurant: 7524 reviews (194391 sentences)
Dataset 2: Complaints.com data
Mp3 player: 59 reviews
Restaurant: 322 reviews
Experimental Results
Two goals:
How well the pro and con detection model performs.
How well the trained model performs on Complaints.com.
Data split: 80% for training, 10% for development, and 10% for testing.
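An 80/10/10 split like the one above could be produced along these lines. The shuffling and the fixed seed are assumptions; the slides do not say how the data were partitioned.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle a copy of the items and cut it into train/dev/test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # assumed: random partition
    n_train = len(shuffled) * 8 // 10
    n_dev = len(shuffled) // 10
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


train, dev, test = split_80_10_10(range(100))
```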
Experiments on Dataset 1: Identification step
Experiments on Dataset 1: Classification step
Experiments on Dataset 2
Gold standard annotation: four human annotators labeled the test sets.
Identification step only:
Conclusions
This paper proposes a framework for identifying the reasons behind opinions (pros and cons) in online product reviews.
The authors present a novel technique that automatically labels a large set of pro and con sentences by using clue phrases.