reviewmining presentation 2014 04 22courses.washington.edu/ling575/spr2014/slides/tp_review...slide...
TRANSCRIPT
Topic Presentation on Review Mining
1. M. Hu and B. Liu. “Mining and Summarizing Customer Reviews”.
2. M. Ganapathibhotla and B. Liu. “Mining Opinions in Comparative Sentences”.
Slide 2
Papers
• M. Hu and B. Liu. “Mining and Summarizing Customer Reviews,” In Proceedings of KDD, 2004. (Primary reference)
• M. Ganapathibhotla and B. Liu. “Mining Opinions in Comparative Sentences,” In Proceedings of COLING, 2008. (Supplemental reference)
Slide 3
Summary of [Hu 2004], “Mining and Summarizing Customer Reviews”
• Problem:
  – Products are sold online (e.g. digital cameras, MP3 players, DVD players)
  – Customers write reviews stating their opinions
  – How can we generate a summary of these customer reviews?
• Approach:
  – Identify product features/attributes/aspects
  – Identify opinion sentences (those that contain a product feature and an opinion word)
  – Determine the positive or negative orientation of the opinion sentences
Slide 4
Overview of workflow (1/7)
• Retrieve data
• Experiments used reviews from Amazon.com and Cnet.com
Slide 5
Overview of workflow (2/7)
• POS tagging
  – Find N and NP to identify product features
  – Remove stopwords, apply stemming
• Finding product features in sentences
  – Only get explicitly mentioned N/NPs
    • The pictures are clear.
    • While light, it won’t easily fit in pockets.
  – Frequently occurring N/NPs are likely to be product features
    • Used Apriori algorithm
    • Kept N/NPs that occurred in >1% of review sentences
• Pruning
  – Keep only NPs whose words occur together in a specific order
  – Remove single N that are also part of a larger NP
  – Improves precision
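The frequency step above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it counts single candidate terms per sentence rather than running full Apriori itemset mining, it skips the pruning rules, and the function name `frequent_features` and the pre-extracted noun lists are assumptions made for the example.

```python
from collections import Counter


def frequent_features(sentence_nouns, min_support=0.01):
    """Return candidate terms that occur in more than min_support of sentences.

    sentence_nouns: one list of noun / noun-phrase strings per review sentence,
    assumed to come from an upstream POS tagger (hypothetical input format).
    """
    n_sentences = len(sentence_nouns)
    if n_sentences == 0:
        return set()
    counts = Counter()
    for nouns in sentence_nouns:
        counts.update(set(nouns))  # count each candidate once per sentence
    return {term for term, c in counts.items() if c / n_sentences > min_support}


# Toy data: "picture" occurs in 3 of 4 sentences, the others in 1 of 4.
sentences = [["picture"], ["picture", "battery"], ["strap"], ["picture"]]
features = frequent_features(sentences, min_support=0.5)
```

With the 1% threshold from the slides, the same call would simply use the default `min_support`.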
Slide 6
Overview of workflow (3/7)
• Extract opinion words from each opinion sentence
  – An opinion word is the adjective that:
    • is closest to a product feature (section 3.6)
    • has a positive or negative orientation
  – An opinion sentence has 1+ product features and 1+ opinion words
  – Example: The strap is horrible and gets in the way of parts of the camera you need access to.
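A rough sketch of the "closest adjective" rule, applied to the strap example. It assumes the sentence has already been tokenized and tagged with Penn Treebank style tags (adjectives start with `JJ`); the function name `closest_adjective` is made up for illustration, and the orientation check is left out.

```python
def closest_adjective(tagged, feature_index):
    """Return the adjective nearest (in tokens) to the feature at feature_index.

    tagged: list of (word, tag) pairs; tags are assumed to be Penn Treebank style.
    Returns None if the sentence has no adjective.
    """
    best = None  # (distance, word) of the nearest adjective seen so far
    for i, (word, tag) in enumerate(tagged):
        if tag.startswith("JJ"):
            distance = abs(i - feature_index)
            if best is None or distance < best[0]:
                best = (distance, word)
    return best[1] if best else None


tagged = [("The", "DT"), ("strap", "NN"), ("is", "VBZ"), ("horrible", "JJ"),
          ("and", "CC"), ("gets", "VBZ"), ("in", "IN"), ("the", "DT"), ("way", "NN")]
opinion_word = closest_adjective(tagged, feature_index=1)  # "horrible"
```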
Slide 7
Overview of workflow (4/7)
• Identify the positive or negative orientation of opinion words (adjectives)
  – Applied bootstrapping to grow list of oriented words
  – Started with 30 seed words that are clearly positive or negative (e.g. great, fantastic, bad, dull)
  – For a given opinion word:
    • use WordNet to find a related opinionated adjective
    • assign same orientation
    • add to growing list of oriented words
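The bootstrapping loop might look something like the sketch below. The `synonyms` and `antonyms` dictionaries are hypothetical stand-ins for WordNet lookups, and flipping the orientation for antonyms is an assumption based on the later remark about searching for synonyms and antonyms; the slide itself only mentions assigning the same orientation.

```python
def grow_orientations(seeds, synonyms, antonyms, candidates):
    """Propagate +1 / -1 orientations from seed words to candidate adjectives.

    seeds: dict word -> +1 or -1.
    synonyms, antonyms: dict word -> list of related words (stand-in for WordNet).
    candidates: adjectives found in the reviews.
    """
    known = dict(seeds)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for word in candidates:
            if word in known:
                continue
            for syn in synonyms.get(word, ()):
                if syn in known:
                    known[word] = known[syn]  # synonym: same orientation
                    changed = True
                    break
            else:  # no oriented synonym found, so try antonyms
                for ant in antonyms.get(word, ()):
                    if ant in known:
                        known[word] = -known[ant]  # antonym: opposite orientation
                        changed = True
                        break
    return known


seeds = {"great": 1, "bad": -1}
synonyms = {"superb": ["great"], "stellar": ["superb"]}
antonyms = {"good": ["bad"]}
lexicon = grow_orientations(seeds, synonyms, antonyms,
                            ["stellar", "superb", "good", "blue"])
```

Here "stellar" is only resolved on the second pass, after "superb" has joined the list, and "blue" stays unoriented because nothing links it to a known word.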
Slide 8
Overview of workflow (5/7)
• Infrequent features
  – Find features that did not pass Apriori’s minimum support (occurred in less than 1% of review sentences)
  – For a sentence with known opinion words (adjectives) but no known frequent features, pick the N/NP closest to the opinion word
  – Improves recall
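One way to picture the infrequent-feature rule is the sketch below. It assumes Penn Treebank style tags (nouns start with `NN`) and treats single nouns only, not full noun phrases; `infrequent_feature` is a hypothetical helper name.

```python
def infrequent_feature(tagged, opinion_words, frequent):
    """Pick the noun closest to an opinion word when no frequent feature is present.

    tagged: list of (word, tag) pairs for one sentence.
    opinion_words: set of known opinion adjectives.
    frequent: set of already-known frequent features.
    Returns None if the sentence has a frequent feature, no noun, or no opinion word.
    """
    nouns = [(i, w) for i, (w, t) in enumerate(tagged) if t.startswith("NN")]
    if any(w in frequent for _, w in nouns):
        return None  # handled by the frequent-feature step instead
    opinions = [i for i, (w, _) in enumerate(tagged) if w in opinion_words]
    if not nouns or not opinions:
        return None
    # smallest token distance wins; ties fall back to alphabetical order
    _, word = min((abs(i - j), w) for i, w in nouns for j in opinions)
    return word


tagged = [("The", "DT"), ("software", "NN"), ("is", "VBZ"), ("amazing", "JJ")]
candidate = infrequent_feature(tagged, {"amazing"}, frequent={"picture"})
```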
Slide 9
Overview of workflow (6/7)
• Identify the positive or negative orientation of opinion sentences
  – Count the number of positive and negative opinion words
  – The orientation with the higher count wins and is assigned to the sentence
  – Tie-breaking rules, such as:
    • Use average orientation for feature(s)
    • Use orientation of previous sentence
  – Rules for handling but/however clauses
  – Negate orientation of opinion word if a negation word occurs within 5 words
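The counting and negation rules can be sketched as below. This is a simplification under stated assumptions: the `NEGATIONS` list is invented for the example, ties fall back only to the previous sentence's orientation, and the but/however rules are not modelled.

```python
NEGATIONS = {"not", "no", "never", "n't"}  # hypothetical negation word list


def word_orientation(tokens, i, lexicon, window=5):
    """Orientation of tokens[i], flipped if a negation word is within `window` tokens."""
    score = lexicon[tokens[i]]
    lo = max(0, i - window)
    hi = min(len(tokens), i + window + 1)
    if any(tokens[j] in NEGATIONS for j in range(lo, hi) if j != i):
        score = -score
    return score


def sentence_orientation(tokens, lexicon, previous=0):
    """Return +1, -1, or the previous sentence's orientation on a tie."""
    total = sum(word_orientation(tokens, i, lexicon)
                for i, tok in enumerate(tokens) if tok in lexicon)
    if total > 0:
        return 1
    if total < 0:
        return -1
    return previous  # tie-break: reuse orientation of the previous sentence


lexicon = {"good": 1, "great": 1, "bad": -1}
negated = sentence_orientation("the picture is not good".split(), lexicon)  # -1
```

Summing +1 and -1 scores is equivalent to comparing the positive and negative counts, which keeps the tie case (a sum of zero) easy to detect.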
Slide 10
Overview of workflow (7/7)
• Generate summary
  – 12 opinion sentences related to picture with positive orientation
Slide 11
Experimental results
• For each product, authors created list of features by manual inspection
• Recall and precision for features found by algorithm at each stage
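For reference, precision and recall against a manually built feature list can be computed as in this small sketch. The feature sets are made-up toy data, not results from the paper.

```python
def precision_recall(found, gold):
    """Precision and recall of extracted features against a manual gold list."""
    found, gold = set(found), set(gold)
    true_positives = len(found & gold)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy example: 2 of 3 extracted features are correct, 2 of 4 gold features are found.
p, r = precision_recall({"picture", "battery", "thing"},
                        {"picture", "battery", "lens", "strap"})
```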
Slide 12
Experimental results
• Authors manually identified opinion sentences and determined orientation
• Recall and precision of determining if a sentence is an opinion sentence
• Accuracy of orientation
Slide 13
Critiques
• Hard to critique
  – Paper published in 2004
  – 1992 citations on Google Scholar
  – Almost 200 citations per year
Slide 14
Critiques
• The output is a summary of individual features from individual sentences, not of the overall opinion
  – Do customers like the product or not?
• No context
  – The laptop had a fast hard drive.
  – The fast hard drive nonetheless produced a lot of noise.
• Context is discussed further in [Ganapathibhotla 2008]
Slide 15
Critiques
• No reference resolution, so a lot of information is lost
  – The lens turns quickly. It is easy to use.
  – The former was great, but the latter was horrible.
• How to handle mixed products in the review?
  – Example: I liked using CameraX. I previously owned CameraY. The lens was horrible on that one.
Slide 16
Critiques
• Lots of heuristics, section 3.6
• Regarding negation:
  – “By ‘closely’ we mean that the word distance between a negation word and the opinion word should not exceed a threshold (in our experiment, we set it to 5).”
• Regarding subordinate clauses:
  – “For a sentence that contains a ‘but’ clause … we first use the effective opinion in the clause to decide the orientation of the features. If no opinion appears in the clause, the opposite orientation of the sentence will be used.”
• Regarding adjectives and nouns:
  – “Effective opinion is the closest opinion word for a feature in an opinion sentence.”
Slide 17
Critiques
• Assumed reviews are legitimate
  – Promotional spam / astroturfing
  – Malicious trolling
  – Sarcasm
Slide 18
Things I learned
• Identifying product features by (1) finding N/NP and (2) applying a minimum threshold
  – Seems to work well in practice
    • A. Popescu and O. Etzioni, “Extracting product features and opinions from reviews.” In Proceedings of Empirical Methods in NLP, 2005.
    • R. Feldman, “Techniques and Applications for Sentiment Analysis,” CACM, April 2013.
• Adjectives as sentiment-bearing tokens
  – As discussed in 4/15 lecture
  – Growing a list of opinionated adjectives by searching WordNet for synonyms and antonyms is useful
  – Some opinionated adjectives are unambiguous, or are they?
Slide 19
Summary of [Ganapathibhotla 2008], “Mining Opinions in Comparative Sentences”
• Problem:
  – Customers write reviews stating their opinion, often with sentences comparing two products
  – How can we identify which product the user prefers?
• Approach:
  – Clearly opinionated comparatives are easy
    • CameraX is better than CameraY.
    • CameraX is more beautiful than CameraY. (where beautiful is a known positive adjective from Hu’s paper)
  – Context-dependent comparatives are hard
    • CameraX has a higher build quality than CameraY.
    • CameraX has a higher failure rate than CameraY.
    • Check external source (e.g. epinions.com’s Pros and Cons lists) to see if higher build quality and higher failure rate are positive or negative
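The external-source lookup for context-dependent comparatives could be pictured roughly as follows. The counts in `pros` and `cons` are invented toy data, and the simple majority comparison is a stand-in chosen for illustration, not the association measure used in the paper.

```python
def preferred_entity(first, second, comparative, feature, pros, cons):
    """Guess which entity is preferred in '<first> has a <comparative> <feature> than <second>'.

    pros / cons: dict mapping (comparative, feature) -> how often that pair
    shows up in Pros or Cons lists (hypothetical counts).
    Returns None when the evidence is tied or missing.
    """
    key = (comparative, feature)
    in_pros = pros.get(key, 0)
    in_cons = cons.get(key, 0)
    if in_pros > in_cons:
        return first   # the comparative is desirable, so the first entity wins
    if in_cons > in_pros:
        return second  # the comparative is undesirable, so the second entity wins
    return None


pros = {("higher", "build quality"): 12}
cons = {("higher", "build quality"): 1, ("higher", "failure rate"): 9}

a = preferred_entity("CameraX", "CameraY", "higher", "build quality", pros, cons)
b = preferred_entity("CameraX", "CameraY", "higher", "failure rate", pros, cons)
```

With these toy counts, the first sentence favours CameraX and the second favours CameraY, even though both use the same comparative word.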
Slide 20
Applications
Slide 21
Applications
If I were an algorithm, I would look for this
Slide 22
Applications
Product features
Slide 23
Thank you for staying awake