Sentiment Analysis on Amazon Movie Reviews Dataset
SENTIMENT ANALYSIS: AMAZON MOVIE REVIEW DATASET
IS 688 – WEB MINING
INSTRUCTOR: CHRISTOPHER MARKSON
TEAM MEMBERS: Maham | Amit | Mashael | Karan | Nidhish
OUTLINE
• Data Source, Collection & Parsing
• Model Selection & Optimizing Parameters
• Methods / Code Sample
• Results Overview & Value
DATA SOURCE, COLLECTION & PARSING
Amazon movie reviews, published by Jure Leskovec, Assistant Professor of Computer Science at Stanford University, on his personal site.
PROBLEMS
• Format was not R-friendly
• Only partial information was available; data context was missing
• We had reviews but no information about the movies
WORKAROUND / SOLUTION
• Wrote a parser in R to convert the JSON text file into CSV
• Developed a NodeJS middleware to gather information about each movie
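The parsing step can be illustrated with a minimal Python sketch. This is not the team's R parser. It assumes the raw file stores each review as a block of `key: value` lines separated by blank lines, which is an assumption about the layout; the field names, the product ID, and the sample text are illustrative only.

```python
import csv
import io


def parse_records(text):
    """Split blank-line-separated blocks of 'key: value' lines into dicts."""
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current record, if one is open.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(": ")
        if sep:  # skip lines that do not look like 'key: value'
            current[key] = value
    if current:
        records.append(current)
    return records


def records_to_csv(records, fieldnames):
    """Serialize the parsed records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Hypothetical two-review sample in the assumed layout.
sample = """product/productId: B00000TEST
review/score: 5.0
review/text: A wonderful, moving film.

product/productId: B00000TEST
review/score: 2.0
review/text: Poor pacing, weak plot.
"""

fields = ["product/productId", "review/score", "review/text"]
records = parse_records(sample)
csv_text = records_to_csv(records, fields)
print(csv_text)
```

Values that contain commas are quoted automatically by the `csv` module, so review text survives the conversion intact.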
PREPARED FILES
After parsing and gathering more data using Amazon Web Service, we had the following 2 files:
• Reviews
• Movie Details
MODEL SELECTION & OPTIMIZATION
• Basic Sentiment Score for Each Review, using Syuzhet package
• Provides multiple methods, including bing, afinn, nrc, and stanford; AFINN is a weighted lexicon of 2,477 words and phrases
• Relies mainly on the coreNLP and stringr libraries; captures the emotional trajectory of a review
• Create WordCloud for Each Movie, using wordcloud package
• Combined all reviews into one variable, calculated term frequency & generated WordCloud images
• Used tm (text mining), SnowballC (text stemming), RColorBrewer (color palettes) alongside
• Pointwise Mutual Information (PMI) Sentiment Score for Each Movie, using RCurl package
• Wrote our own function
• Movie_Title vs Excellent/Poor, Movie_Genre vs Excellent/Poor
• Final score was the ratio of Movie_Title / Movie_Genre
MODEL SELECTION & OPTIMIZATION
• Aggregated all the sentiment scores
• Took the median of all user review scores
• Took the median of all user review text sentiment scores
• Assigned an overall sentiment score to each movie
• Took the median of: user review score aggregate, user review text sentiment score aggregate, and Movie_Title vs Genre PMI score
METHODS / CODE SAMPLE
Basic Sentiment Score
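The original code for this slide is not part of the transcript. As a rough illustration of lexicon-based scoring in the AFINN style, here is a minimal Python sketch. It is not the Syuzhet package or the team's R code; the mini-lexicon and its weights are hypothetical and far smaller than a real lexicon.

```python
import re

# Hypothetical mini-lexicon in the AFINN style (signed integer weights).
# The weights are illustrative, not the published AFINN values.
TOY_LEXICON = {
    "excellent": 3,
    "love": 3,
    "good": 2,
    "poor": -2,
    "boring": -2,
    "terrible": -3,
}


def sentiment_score(text, lexicon=TOY_LEXICON):
    """Sum the weights of all lexicon words found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


# Hypothetical review texts.
reviews = [
    "An excellent film, I love it",
    "Boring and poor",
    "It was a movie",
]
scores = [sentiment_score(review) for review in reviews]
print(scores)
```

Words missing from the lexicon contribute zero, so a review with no opinion words gets a neutral score.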
WordCloud
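The original code for this slide is not part of the transcript. The term-frequency step that feeds a word cloud (combine all reviews of one movie, then count terms) can be sketched in Python as follows. This is not the team's R code: the stopword list is a tiny hypothetical one, stemming is omitted, and rendering the image is left out.

```python
import re
from collections import Counter

# Tiny hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "is", "it", "this", "was", "to", "in"}


def term_frequencies(reviews, stopwords=STOPWORDS):
    """Combine all reviews into one string and count non-stopword terms."""
    combined = " ".join(reviews).lower()
    tokens = re.findall(r"[a-z]+", combined)
    return Counter(token for token in tokens if token not in stopwords)


# Hypothetical reviews for a single movie.
reviews = [
    "The story is gripping and the acting is superb",
    "Superb acting, gripping story",
]
freqs = term_frequencies(reviews)
top_terms = freqs.most_common(3)  # the most frequent terms would be drawn largest
print(top_terms)
```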
METHODS / CODE SAMPLE
Aggregation
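The original code for this slide is not part of the transcript. The median-based aggregation described in the Model Selection & Optimization slides can be sketched in Python as below. The function name and the sample numbers are hypothetical, and the slides do not say whether the three inputs were rescaled to a common range first, so this sketch uses the raw values.

```python
from statistics import median


def overall_movie_score(review_scores, text_sentiment_scores, pmi_score):
    """Median of the two per-movie aggregates and the PMI score."""
    score_aggr = median(review_scores)              # median of user star ratings
    sentiment_aggr = median(text_sentiment_scores)  # median of text sentiment scores
    return median([score_aggr, sentiment_aggr, pmi_score])


# Hypothetical inputs for one movie.
overall = overall_movie_score(
    review_scores=[5.0, 4.0, 2.0],
    text_sentiment_scores=[3, 1, -1],
    pmi_score=1.5,
)
print(overall)
```

Using the median rather than the mean keeps a handful of extreme reviews from dominating the per-movie score.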
PMI
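The original code for this slide is not part of the transcript, and the slides do not give the exact formula. A common way to compute a PMI-based orientation against the reference words "excellent" and "poor" uses co-occurrence hit counts; the Python sketch below follows that formulation as an assumption. The function names, the smoothing constant, and all hit counts are hypothetical, and the step that fetches the counts is omitted.

```python
import math


def pmi_orientation(hits_with_excellent, hits_with_poor,
                    hits_excellent, hits_poor, smoothing=0.01):
    """log2 ratio of co-occurrence with 'excellent' versus 'poor'.

    The smoothing term avoids division by zero when a phrase never
    co-occurs with one of the reference words.
    """
    numerator = (hits_with_excellent + smoothing) * hits_poor
    denominator = (hits_with_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


def title_vs_genre_score(title_orientation, genre_orientation):
    """Ratio of the title's orientation to its genre's orientation."""
    if genre_orientation == 0:
        return None  # ratio is undefined for a perfectly neutral genre
    return title_orientation / genre_orientation


# Hypothetical hit counts for one title and its genre.
title_so = pmi_orientation(400, 100, hits_excellent=1000, hits_poor=1000)
genre_so = pmi_orientation(200, 100, hits_excellent=1000, hits_poor=1000)
final_score = title_vs_genre_score(title_so, genre_so)
print(final_score)
```

A ratio above 1 means the title leans more positive than its genre does under these assumptions.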
RESULT OVERVIEW & VALUE
The Count of Monte Cristo [Region 2]
Far from Home
Phonics Volume 1
RESULT OVERVIEW & VALUE
• Alongside aggregate user reviews, Amazon can present:
• an overall rating score, and
• a word cloud local to that product
• This saves users the time of reading through every review and lets them quickly picture the overall user sentiment toward that product.
THANK YOU