Forecast Model for Box-Office Revenue of Bollywood Feature Films
Prerit Kohli Rajat Taneja Saumya Bansal
Department of Computer Engineering
Netaji Subhas Institute of Technology, University of Delhi
New Delhi, ND 110078, India
Email : [email protected]; [email protected]; [email protected]
Abstract:
We consider the problem of forecasting the net revenue collections of a feature film.
Previous work on this problem has largely addressed Hollywood films, with very limited work
on motion pictures produced by the Hindi film industry, Bollywood. In this work, we use the
parameters governing a movie’s revenue and historical revenue patterns for forecasting. We
also show that the model can be used for low-budget movies, which are usually left out by
technology giants like Google, Twitter etc. because they generate negligible buzz compared
with high-budget ones.
Keywords: Forecasting, Machine Learning, Bollywood, Regression model.
Introduction:
Bollywood is the Hindi-language film industry based in Mumbai, India. With 1,000 films
produced annually, it is the world’s largest filmmaking entity. Bollywood gross receipts have
almost tripled since 2004. It generated revenue of around Rs. 15,000 crores in 2011 and this figure
has been growing by 10 percent a year. So, it is of immense importance to study the behavior of
the Indian viewers, the Bollywood industry and the revenue forecast of a particular movie.
In recent years, forecasting of movie revenues has been linked to the volume of
Google searches, popularity on YouTube, fan-following on Facebook, buzz in the Twitter world,
and so on. But these models may not be very useful for forecasting the revenues of novels, music
albums or even low-budget movies that are yet to be released, because little buzz is associated
with them. We therefore adopt a model based on the historical sales patterns of similar products,
which can be used efficiently for movies and beyond.
The study of Bollywood is distinct from the study of other film industries. Apart from the
generic parameters that govern the revenue of a feature film in any film industry, Hindi cinema
has a number of distinctive features that also play a deciding role. The first and most important
is the inclusion of a number of songs within the film, which are released a few weeks before the
movie release date. These songs often top the charts once the movie hits the theaters, but the
effect also runs the other way: a popular album can lift a film’s collections. Some musical movies
like Rockstar [2011] and Aashiqui 2 [2013] owe a major chunk of their revenue collections to
the popularity of their music albums among fans, weeks before the movies were released. The
second is the “masala” film genre, with which many Bollywood films are associated. The genre is
named after “masala”, the mixture of spices in South Asian cuisine, and reflects Bollywood’s
practice of freely mixing genres like action, comedy, drama and romance into one movie. This is
done in order to attract audiences with diverse interests.
We therefore decided to study the factors that make Bollywood distinct. We apply Machine
Learning to develop a forecast model whose primary objective is to forecast the net revenue of
Bollywood feature films at the domestic level, based on early Box Office data. Our intention is
to develop a model that can assist movie studios, as even a single movie can be the difference
between crores of rupees of profit or loss for a production house in a given year. The model is
also of great interest to motion picture exhibitor chains (retailers) in managing their
exhibition capacity with distributors (studios), as it allows them to project the Box Office
potential of the movies they plan to exhibit or are currently exhibiting.
Guidelines:
Gross revenue of a movie mentioned herein refers to the total sales of movie tickets in
India. It does not include auxiliary revenues such as international market revenues, video rentals,
merchandise and soundtrack sales, etc.
The variable of interest is box-office net revenue. Net revenue refers to the actual revenue
a movie makes after the deduction of taxes. We record this variable for each previously released
film in the database and use it to forecast revenue for new feature films.
Data Set:
The dataset draws on the Internet Movie Database for movie information, Koimoi.com for
revenue details and cast & crew data, and Top10Bollywood.com for the music popularity index.
The responses of film critics have been recorded from CNN IBN, Hindustan Times, Dainik
Bhaskar, Zoom and Bollywood Hungama.
The Machine Learning database covers 100 films released in India in the years 2010-2014.
For power lists on popular and trending members of the film fraternity, movie performances
during the decade 2004-2014 have been considered.
Forecasting Task:
We explore the use of Machine Learning in forecasting the commercial success of a
movie at the Box Office. Machine Learning is a branch of Artificial Intelligence that involves the
study and construction of systems that can learn from data. The Machine learning technique used
is Regression analysis.
The process of regression is used to take into account all the revenue-determining factors
of movies which have already been released. This is done so as to determine the influence of each
parameter that contributes to a film’s Box office performance. The parameters range from general
factors like cast & crew and genre, to specific ones like music-album popularity, competition for
same release date, etc.
General Parameters:
Table 1 lists the general factors used for forecast analysis, whose default values are constant for
each film considered. Some factors, such as Star Power, the past success of a movie franchise and
the type of release (festive season), play a major role in the commercial success of a feature film.
Best Actor (2004-2014)
Trending Actor
Best Actress (2004-2014)
Trending Actress
Best Director (2004-2014)
Promising Director
Production House
Sequel
Trilogy Finale/Extended Trilogy
Successful Pair (actor-actor)
Successful Pair (actor-director)
Festive Season Release: Diwali/Christmas
Genre
CBFC Rating: U/UA/A
Table 1: List of General Parameters
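One way the general parameters of Table 1 could be turned into numeric regression inputs is sketched below. This is an illustration only: the paper does not describe its encoding, so the 0/1 flags, the one-hot treatment of the CBFC rating, the dictionary keys and the function name `encode_general` are all assumptions.

```python
# Hypothetical encoding of a subset of the Table 1 parameters.
# The flag names and the 0/1 scheme are assumptions, not the paper's method.
FLAG_NAMES = [
    "best_actor", "trending_actor", "best_actress", "trending_actress",
    "best_director", "sequel", "festive_release",
]
CBFC_RATINGS = ["U", "UA", "A"]


def encode_general(film):
    """Map a film description (a dict) to a list of numeric features."""
    # Binary flags: 1.0 if the film has the property, 0.0 otherwise.
    features = [1.0 if film.get(name, False) else 0.0 for name in FLAG_NAMES]
    # One-hot encode the CBFC rating so no artificial ordering is implied.
    rating = film["cbfc_rating"]
    if rating not in CBFC_RATINGS:
        raise ValueError(f"unknown CBFC rating: {rating!r}")
    features += [1.0 if rating == r else 0.0 for r in CBFC_RATINGS]
    return features


# Made-up film used only to exercise the function.
example_film = {
    "best_actor": True,
    "sequel": True,
    "festive_release": True,
    "cbfc_rating": "UA",
}
vector = encode_general(example_film)
```

Categorical parameters with many values, such as Genre or Production House, would need a similar one-hot or grouped encoding before they could enter a linear model.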
Movie Specific Parameters:
Table 2 lists the factors whose values depend on the movie or its release date. Some factors, such
as Production and Marketing budget, Music album popularity, Level of competition (a movie
releasing on the same day or a blockbuster the next week) and Number of screens booked nationwide,
play a major role in improving the forecast results for a film.
Production and Marketing Budget
Fan-following/Adaptation/Remake
Music Album Popularity
Number of Screens
Publicity/Reviews on Paid-Previews
Level of Competition
Critics’ review
Audience response
Post-release Promotion
Word of Mouth
Entertainment Tax promotion
Table 2: List of Movie-specific Parameters
Implementation:
The model comprises three methods for revenue forecasting, each applied at a different
stage of a movie’s lifecycle.
The first stage is when the film is completed and sent to the studio. This forecast data is used
by movie studios for purposes such as deciding the prerelease marketing budget, the number of
cinema screens to book, etc.
Figure 1: Average-Error Graph for implementation of Method 1
The second stage is when the movie prints are sent to the theaters a few days before the
release. This forecast data is used by movie exhibitors for finalizing the number of screens to be
devoted to that movie being released in the next Box Office week.
Figure 2: Average-Error Graph for implementation of Method 2
The third stage is at the end of the first weekend after release. Usually, more than 25% of
the total revenue is grossed in the first weekend, but the rate at which collections decay
afterwards depends on a number of factors such as Critics’ reviews, Audience response,
magnitude of word of mouth, etc.
Figure 3: Average-Error Graph for implementation of Method 3
Results and Discussion:
We use linear regression to forecast the net earnings of a film, denoted R, based on
parameters P which govern the revenue of a movie. The formula is as follows:
\[ R_i = \beta_1 (P_1)_i + \beta_2 (P_2)_i + \dots + \beta_n (P_n)_i \]
where $R_i$, $(P_n)_i$ and $\beta_n$ denote the forecasted revenue of film $i$, the value of the
$n$th parameter for film $i$, and the coefficient of the $n$th parameter, respectively.
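The regression formula above can be fitted by ordinary least squares. The following is a minimal sketch, assuming no intercept term (as in the formula) and solving the normal equations in plain Python. The function names and the toy numbers are hypothetical and are not taken from the paper's dataset.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the caller's inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: parameters are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_coefficients(params, revenues):
    """Least-squares coefficients for R_i = sum_j beta_j * (P_j)_i."""
    n = len(params[0])
    # Normal equations: (P^T P) beta = P^T R.
    ptp = [[sum(row[j] * row[k] for row in params) for k in range(n)]
           for j in range(n)]
    ptr = [sum(row[j] * r for row, r in zip(params, revenues))
           for j in range(n)]
    return solve_linear_system(ptp, ptr)


def forecast(beta, film_params):
    """Apply the fitted coefficients to one film's parameter values."""
    return sum(b * p for b, p in zip(beta, film_params))


# Toy data (not real films), generated as R = 30*P1 + 0.02*P2 with
# P1 = a festive-release flag and P2 = number of screens.
toy_params = [[1.0, 1000.0], [0.0, 2000.0], [1.0, 3000.0], [0.0, 500.0]]
toy_revenues = [50.0, 40.0, 90.0, 10.0]
beta = fit_coefficients(toy_params, toy_revenues)
predicted = forecast(beta, [1.0, 2000.0])
```

With many parameters on very different scales, a library routine based on QR or SVD would be numerically safer than the normal equations. The plain version is used here only to keep the sketch self-contained.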
Studios make fewer films that are expected to get a CBFC rating of A, since these are limited to
a narrower market, while films with CBFC ratings U and UA have a much larger potential
audience. The UA rating is quite desirable, as it can pull in both adults and children, and excludes
virtually no one.
Like the parameter devoted to Music-album popularity, a number of the parameters analyzed
do not appear in the list of conventional factors such as Best Actor, Trending Actress,
Production House, etc. Among them are Number of Screens booked, Competition with
movies sharing the same Release-date and viral Word of Mouth.
The number of prints/screens booked for a potential blockbuster film has doubled over the
past 5 years. Dabangg [2010] booked 1800 screens nationwide while Jai Ho [2014]
booked 3900 screens worldwide, including single screens and multiplexes. That Dabangg
nevertheless collected more revenue than Jai Ho is a separate matter, pertaining to the “masala”
element in a film.
The next factor considered is the Competition faced by movies sharing the same Release-date. In
India, the Diwali season of 2012 saw the clash of two potential blockbusters, Jab Tak Hai Jaan
[2012] and Son Of Sardaar [2012], releasing on the same date. Though both movies collected
above 100 crores (net) at the Box Office, these collections did not do justice to the amount of
prerelease marketing done for them.
The third factor is the Word of Mouth publicity surrounding a movie, with the most prominent
example being Queen [2014]. Its first weekend net gross was Rs. 10 Crores. Being a low-budget
film with a debutant director and a decent star cast, the film would not have been expected to
make more than Rs. 20-25 Crores, but it collected a total of Rs. 61 Crores at the Box Office, a
figure that cannot be forecast without the Word of Mouth parameter.
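The Queen figures can be checked against the earlier observation that more than 25% of a film's total revenue is usually grossed in the first weekend. The short calculation below uses only numbers quoted in the text. The "implied ceiling" is a derived illustration (first-weekend gross divided by the 25% share), not an estimate made in the paper.

```python
# Figures quoted in the text, in Rs. crores.
first_weekend_gross = 10.0
total_gross = 61.0
typical_min_share = 0.25  # "more than 25%" of revenue in the first weekend

# Share of Queen's total that came from the first weekend.
first_weekend_share = first_weekend_gross / total_gross

# If the first weekend were at least 25% of the total, the total could be
# at most first_weekend_gross / 0.25.
implied_ceiling = first_weekend_gross / typical_min_share

print(f"First-weekend share: {first_weekend_share:.1%}")
print(f"Ceiling implied by the 25% rule: Rs. {implied_ceiling:.0f} crores")
```

The first weekend accounts for roughly 16% of Queen's total, well below the usual share, and the actual total of Rs. 61 Crores exceeds the Rs. 40 Crores ceiling that the 25% rule would imply.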
Net revenues of movies are displayed on a logarithmic scale to account for the disparity in
earnings across movies.
Forecasting Accuracy:
The forecast accuracy for box office grosses increases as we go from Stage 1 (the film is
completed and sent to the studio) to Stage 2 (the movie prints are sent to the theaters a few days
before the release) and finally, Stage 3 (at the end of the first weekend of the release date).
The average errors for Stages 1, 2 and 3 were 36.05%, 19.52% and 11.74% respectively. This
shows that forecast accuracy improves with the number of factors incorporated in the forecast
dataset for a given movie.
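The paper does not state how the average error is computed. A common choice, assumed here, is the mean absolute percentage error between forecasted and actual net revenue. The function name and the toy numbers below are hypothetical.

```python
def average_percentage_error(actual, predicted):
    """Mean absolute percentage error, in percent (assumed metric)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Per-film relative error |actual - predicted| / |actual|, then averaged.
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Toy revenues in Rs. crores (not from the paper's dataset).
toy_actual = [100.0, 50.0, 20.0]
toy_forecast = [90.0, 60.0, 20.0]
error = average_percentage_error(toy_actual, toy_forecast)
```

Under this definition the per-film errors in the toy data are 10%, 20% and 0%, giving an average of about 10%. Because each error is relative to the film's own revenue, low-budget films weigh as much as blockbusters.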
References:
[1] Jeffrey S. Simonoff, Ilana R. Sparrow (2000), Predicting movie grosses: Winners and losers,
blockbusters and sleepers. Stern School of Business, New York University.
[2] Nikhil Apte, Mats Forssell, Anahita Sidhwa, Predicting Movie Revenue, Dec 2011.
[3] Chrysanthos Dellarocas, Xiaoquan (Michael) Zhang, Neveen F. Awad. (2007, Aug.).
Exploring the value of online product reviews in forecasting sales: The case of motion
pictures. Journal of Interactive Marketing. [Online]. Available:
http://blog.mikezhang.com/files/movieratings.pdf
[4] Jae-Mook Lee, Tae-Hyung Pyo, Forecast Model for Box-office Revenue of Motion Pictures,
Dec 2009.
[5] Mahesh Joshi, Dipanjan Das, Kevin Gimpel, Noah A. Smith, Movie Reviews and Revenues:
An Experiment in Text Regression. Language Technologies Institute, Carnegie Mellon
University.
[6] Alec Kennedy, “Predicting Box Office Success: Do Critical Reviews Really Matter?”,
unpublished.
[7] Márton Mestyán, Taha Yasseri, János Kertész (2013, Aug.). Early Prediction of Movie Box
Office Success Based on Wikipedia Activity Big Data. Institute of Physics, Budapest
University of Technology and Economics.