prediction of box office success of movies using hype analysis of twitter data
Presentation on Seminar Topic
PREDICTION OF BOX OFFICE SUCCESS OF MOVIES USING HYPE
ANALYSIS OF TWITTER DATA
By
SAMEER THIGALE
TUSHAR PRASAD
USTAT KAUR
VIBHA RAVICHANDRAN
Guided By PROF. REENA PAGARE
Sponsored By PERSISTENT SYSTEMS LIMITED
24 November 2014
A BRIEF OUTLINE
• PRESENCE OF “RICH INSIGHTS” IN SOCIAL NETWORKS
• IDEA OF PREDICTING BOX OFFICE SUCCESS OF MOVIES
• PRE-RELEASE HYPE- A SUCCESS FACTOR
AGENDA
• LITERATURE SURVEY
• PROBLEM STATEMENT
• MODEL EMPLOYED
• BLOCK DIAGRAM
• ACTIVITY DIAGRAM
• PLATFORM AND TECHNOLOGY
• LIMITATIONS
• FUTURE SCOPE
• FEASIBILITY ASSESSMENT
• MATHEMATICAL MODEL
• FUNCTIONAL POINT ANALYSIS
• PROJECT PLAN AND INDIVIDUAL CONTRIBUTION
• CONCLUSION
• REFERENCES
LITERATURE SURVEY
• CURRENT SCENARIO IN MOVIE INDUSTRY
• FORECASTING METHODS EMPLOYED
– QUANTITATIVE
• TIME SERIES / EXPLANATORY
– QUALITATIVE
– UNPREDICTABLE
• SOCIAL MEDIA: A KNOWLEDGE REPOSITORY
– HYPE ANALYSIS
• SENTIMENT ANALYSIS
REFERENCE DESCRIPTION
Forecasting: Methods and Applications, by Spyros M., Steven W., Rob H.
From this reference we studied the models used for forecasting and applied them to some cases. We use the regression model for better accuracy and efficiency. There are many other models, such as the weighted average model, which might be less efficient in terms of results.
Predicting the Future with Social Media, by S. Asur and B. Huberman, HP Labs, HP Journal, January 2012
From this paper we analyzed the various factors that could be considered when calculating the success rate, such as hype, distribution, cast, budget and type of film. We also studied the concept of sentiment analysis from the same paper.
Box-Office Opening Prediction of Movies Based on Hype Analysis through Data Mining, by A. Reddy, St. Francis Institute of Technology, International Journal of Computer Application, October 2012
From this paper we studied how to calculate the hype factor and how to gather tweets from Twitter.
PROBLEM STATEMENT
• To demonstrate how social media content can be used to predict real-world outcomes. In particular, we use the chatter from Twitter.com to forecast box-office revenues for movies.
• We further demonstrate how sentiments extracted from Twitter can be used to improve the forecasting power of social media.
TERMS AFFILIATED WITH #MARYKOM
MODEL EMPLOYED
• MULTIPLE LINEAR REGRESSION
– WITH TIME SERIES REGRESSION
• The regression coefficients are calculated from the available data set using partial differentiation.
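$Y = \beta_a A + \beta_p P + \beta_d D + \beta_b B + \beta_e E + \beta_s S + e$
REGRESSION COEFFICIENTS: $\beta_a, \beta_p, \beta_d, \beta_b, \beta_e, \beta_s$
TERMS: A = ATTENTION SEEKING, P = POLARITY, D = HEATNESS, B = CATEGORY, E = STAR CAST, S = SEQUEL, e = ERROR FACTOR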
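The coefficient calculation can be sketched as follows. Setting the partial derivatives of the squared error with respect to each coefficient to zero gives the normal equations (X^T X) b = X^T y, which this illustrative Python solves by Gaussian elimination. The function names and the toy numbers are assumptions made for illustration and are not taken from the project; the sketch has no intercept term, matching the equation on the slide.

```python
def fit_coefficients(rows, targets):
    """Least-squares fit of targets on rows (no intercept) via the normal equations."""
    n = len(rows[0])
    # Build X^T X (n x n) and X^T y (length n).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]
    # Augmented matrix [X^T X | X^T y], reduced by Gauss-Jordan elimination
    # with partial pivoting.
    aug = [xtx[i] + [xty[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent; cannot solve")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict(coefficients, features):
    """Evaluate the fitted linear model for one feature vector."""
    return sum(b * x for b, x in zip(coefficients, features))


# Toy data with two predictors (hypothetical numbers); the slide's model has six.
rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
revenue = [2.0, 3.0, 5.0]
betas = fit_coefficients(rows, revenue)
print(betas, predict(betas, [2.0, 1.0]))
```

In practice a numerical library would normally be used for this step; the hand-written solver only makes the link to partial differentiation visible.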
MODEL EMPLOYED
A ATTENTION SEEKING FACTOR
P POLARITY, CALCULATED USING SENTIMENT ANALYSIS
D (FOLLOWER COUNT - T) / FOLLOWER COUNT, WHERE T = AVG NO. OF FOLLOWERS ACROSS ALL USERS WHO TWEETED
B CATEGORY OF MOVIE: ACTION, THRILLER, COMEDY, SCI-FI, ANIMATION, 3-D, ROMANCE
E STAR CAST, INPUT BY THE USER
S SEQUEL FACTOR
e ERROR FACTOR
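A minimal sketch of the D term, assuming the slide's expression is read as (follower count - T) / follower count evaluated per user, with T the mean follower count over everyone who tweeted. That reading, the function name and the handling of users with zero followers are all assumptions made for illustration.

```python
def distribution_factors(follower_counts):
    """Return (f - T) / f for each user, where T is the mean follower count."""
    if not follower_counts:
        raise ValueError("need at least one user")
    t = sum(follower_counts) / len(follower_counts)
    # Users with zero followers are skipped to avoid division by zero
    # (an assumption; the slides do not cover this case).
    return [(f - t) / f for f in follower_counts if f > 0]


# Hypothetical follower counts for three users who tweeted about a movie.
print(distribution_factors([100, 300, 200]))
```

Under this reading, users with more followers than average get a positive value and users with fewer get a negative one.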
BLOCK DIAGRAM
SERVER SIDE ACTIVITY DIAGRAM
PLATFORM AND TECHNOLOGY
PLATFORM
– UBUNTU
TECHNOLOGY
– JAVA
– SENTIMENT ANALYSER: SENTIWORD, LINGPIPE
– TWITTER API
– OAUTH 2.0
HARDWARE
– COMMODITY HARDWARE
– SERVER
APPLICATION SOFTWARE
– ECLIPSE
– MYSQL SERVER
– APACHE TOMCAT
– TWITTER4J
PLATFORM USED
• Linux: open source
• MySQL: database optimized for web-based applications
• Twitter4J: for accessing tweets through the Twitter API
• Sentiment analyzer: Sentinet
• OAuth 2.0: for authorization
FEATURES & APPLICATIONS
• FORECAST MOVIE SUCCESS RATE
• ESTIMATE REVENUE FROM MOVIE
• COMPARE MOVIES
– SCHEDULING
• HYPE ANALYSIS
• EFFECT OF PUBLIC HOLIDAYS ON SUCCESS
• Heatmap: pleasantness/hype of tweets
• Twitter affinity: which tweets are interrelated
LIMITATIONS
• FORECASTING ACCURACY IMPROVES OVER TIME
• TWITTER LIMITATIONS
– IT’S CONSIDERED A NEWS NETWORK
FUTURE SCOPE
• TWEETS IN OTHER LANGUAGES CAN BE TAKEN INTO ACCOUNT WITH TRANSLATORS
• SENTIMENTS FROM FACEBOOK AND OTHER SITES CAN BE ADDED
MATHEMATICAL MODEL
Let S be the system:
S = {S, AP, A, P, D, B, Q, E, DB, En, C, Y, Er | f1, f2, f3, f4, f5}
S - Server
AP - Application
A - Attention Seeking Factor
P - Polarity
D - Distribution Factor
B - Budget
Q - Star Cast
E - Sequel
DB - Database for movie
En - Engine
C - Category
Y - Regression Output
Er - Error Factor
MATHEMATICAL MODEL
SET THEORY:
A = {A1, A2, …, An}
A is the attention seeking factor
A = (R, S, T)
R = rate of tweets
S = seasonal variables
T = time specified

P = {P1, P2, …, Pn}
P is the polarity
P = (Pos, Neg, Neu)
Pos = positive tweet
Neg = negative tweet
Neu = neutral tweet

D = {D1, D2, …, Dn}
D is the distribution area
D = (f, t)
f = follower count
t = average number of followers across all users who tweeted
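The R and P components can be sketched as below. The slides name SentiWord and LingPipe for sentiment analysis; the tiny word lists here are a hypothetical stand-in used only to show how the (Pos, Neg, Neu) tuple and the tweet rate could be produced, and all function names are assumptions.

```python
from collections import Counter

# Hypothetical toy lexicon, standing in for a real sentiment analyser.
POSITIVE = {"great", "awesome", "love", "excited"}
NEGATIVE = {"boring", "bad", "hate", "awful"}


def classify(tweet):
    """Label a tweet 'pos', 'neg' or 'neu' by counting lexicon hits."""
    # Whitespace tokenisation only: a word with attached punctuation
    # (for example "great!") is not matched.
    words = tweet.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "pos" if score > 0 else "neg" if score < 0 else "neu"


def polarity_counts(tweets):
    """Return the (Pos, Neg, Neu) tuple for a list of tweet texts."""
    counts = Counter(classify(t) for t in tweets)
    return counts["pos"], counts["neg"], counts["neu"]


def tweet_rate(tweet_count, hours):
    """Rate of tweets R: tweets per hour over the specified time window T."""
    if hours <= 0:
        raise ValueError("time window must be positive")
    return tweet_count / hours


sample = ["love this trailer", "so boring", "releasing friday"]
print(polarity_counts(sample), tweet_rate(len(sample), 1.0))
```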
MATHEMATICAL MODEL
Sr. No.  Function      Description
1        f1(AP)->S     Function invoked by AP to send a request to the server
2        f2(S)->DB     Function invoked by S to fetch information from the database
3        f3(En)->Y     Function invoked by the Engine to calculate the output through regression at a certain interval of time
4        f4(S)->AP     Function invoked by S to display the response output to the AP
5        f5(AP)->DB    Function invoked to store the results in the database
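One way to read the function table is as a request pipeline. The sketch below uses an in-memory stand-in for the database, and the way f1 chains into f2, f3 and f4 is an assumption; the slides do not specify the call order, and every name here is hypothetical.

```python
class Database:
    """In-memory stand-in for the movie database DB."""

    def __init__(self):
        self.features = {}  # movie name -> feature vector
        self.results = {}   # movie name -> stored prediction


def f2_fetch(db, movie):
    """f2(S)->DB: the server fetches the movie's features from the database."""
    return db.features[movie]


def f3_regress(coefficients, features):
    """f3(En)->Y: the engine computes the regression output Y."""
    return sum(b * x for b, x in zip(coefficients, features))


def f1_request(db, coefficients, movie):
    """f1(AP)->S: the application sends a request; the return value plays
    the role of f4(S)->AP, the response shown to the application."""
    features = f2_fetch(db, movie)
    return f3_regress(coefficients, features)


def f5_store(db, movie, prediction):
    """f5(AP)->DB: the result is stored in the database."""
    db.results[movie] = prediction


db = Database()
db.features["Demo Movie"] = [1.0, 2.0]            # hypothetical features
y = f1_request(db, [0.5, 0.25], "Demo Movie")     # hypothetical coefficients
f5_store(db, "Demo Movie", y)
print(db.results)
```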
The following functions can be mapped onto the elements of the set:
MATHEMATICAL MODEL
MAPPINGS:
Sr. No.  Function      Mapping of the Function
1        f1(AP)->S     One-to-one
2        f2(S)->DB     One-to-many
3        f3(En)->Y     One-to-many
4        f4(S)->AP     One-to-one
5        f5(AP)->DB    One-to-many
Failure Condition: The application works on inputs from the user, so allocation of resources is an important factor. The integrity of the data maintained by the system is of utmost importance, and the set of these parameters should be mutually exclusive. This will ensure that no resource entity is left unaccounted for. If a sufficient number of tweets is not available, the accuracy may be hampered.
FUNCTION POINT ANALYSIS
EXTERNAL INPUTS
• Movie Details (Movie Name, Release Date, Budget, Star Cast)
• User Details (Stakeholder)
EXTERNAL OUTPUTS
• Predicted Revenue of Movies
• Success Rate of the Movies
EXTERNAL INQUIRY
• Information about Past Transactions
• Comparison of Movies
INTERNAL LOGICAL FILES
• User Log File
• User Command regarding Analysis
EXTERNAL INTERFACE FILES
• Regression Coefficients
• Twitter Data
• Polarity and Other Parameters such as Hype and Distribution Factor
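As a worked example, an unadjusted function point count can be derived from the items listed above. The slides give no complexity ratings, so this sketch assumes every item is of average complexity and applies the standard IFPUG average weights; the resulting figure is illustrative and is not a number reported by the project.

```python
# IFPUG weights for average-complexity components
# (the complexity level is an assumption).
AVERAGE_WEIGHTS = {"EI": 4, "EO": 5, "EQ": 4, "ILF": 10, "EIF": 7}

# Item counts read off the function point slide.
COUNTS = {"EI": 2, "EO": 2, "EQ": 2, "ILF": 2, "EIF": 3}


def unadjusted_function_points(counts, weights=AVERAGE_WEIGHTS):
    """Sum count * weight over the five function types."""
    return sum(counts[kind] * weights[kind] for kind in counts)


# 2*4 + 2*5 + 2*4 + 2*10 + 3*7 = 67
print(unadjusted_function_points(COUNTS))
```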
PROJECT PLAN
[Gantt chart]
PHASES: Communication, Planning, Modeling, Construction, Deployment
MONTHS ON TIMELINE: July, Aug, Sept, Oct, Nov, Dec, Jan, Feb, Mar, May
ACTIVITIES:
• Group formation, search and finalization, guide allocation
• Literature survey, group discussion, basic study
• Development of mathematical model, UML and project plan
• Refinement, delivery, documentation, feedback
• Coding, testing, deployment and presentation of topic at seminars
INDIVIDUAL CONTRIBUTION
GROUP MEMBER NAME WORK DONE
SAMEER THIGALE TWITTER DATA ANALYSIS, OAUTH, UML
TUSHAR PRASAD SENTIMENT ANALYSIS, PLATFORM SURVEY
USTAT KAUR LITERATURE SURVEY, SRS
VIBHA RAVICHANDRAN FORECASTING MODEL, PROJECT PLAN, FPA
CONCLUSION
• In this project we have shown how social media can be utilized to forecast future outcomes.
• Specifically, using the rate of chatter from tweets from the popular site Twitter, we constructed a multiple linear regression model for predicting box-office revenues of movies in advance of their release.
• At a deeper level, this work shows how social media expresses a collective wisdom which, when properly tapped, can yield an extremely powerful and accurate indicator of future outcomes.
REFERENCES
[1] Forecasting: Methods and Applications, by Spyros M., Steven W., Rob H.
[2] Predicting the Future with Social Media, by S. Asur and B. Huberman, HP Labs, HP Journal, January 2012
[3] Box-Office Opening Prediction of Movies Based on Hype Analysis through Data Mining, by A. Reddy, St. Francis Institute of Technology, International Journal of Computer Application, October 2012
THANK YOU!