Metis Project 2: Predicting Box Office Gross


Uploaded by jamie-fradkin on 15-Apr-2017


Analysis of features most influential in the success of

Jamie Fradkin, January 29, 2016

PREDICTING SUCCESS FOR MOVIES

Problem Statement/Motivation
Create a linear regression model that predicts the Worldwide Gross of movies Based on a True Story by identifying the features most influential to their success.

All Features
1) MPAA Rating (G/PG/PG-13/R)
2) Runtime
3) IMDB Score
4) Opening # Theaters (Domestic)
5) Opening Gross (Domestic)
6) Total # Theaters (Domestic)
7) Total Gross (Domestic)
8) Peak movie season*
9) Budget
10) Genre: Action, Adventure, Biography, Comedy, Crime, Documentary, Drama, Family, History, Horror, Music, Mystery, Romance, Sport, Thriller, War, Western

*May, June, July, November, and December are the highest-grossing months (BoxOfficeMojo.com)
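Peak movie season is a yes/no feature. A minimal sketch of how it might be encoded from a release month, assuming months are numbered 1 (January) to 12; the month set comes from the footnote above, while the `peak_season_flag` helper is hypothetical:

```python
# Release months flagged as "peak movie season" on the feature slide:
# May, June, July, November, December (per BoxOfficeMojo.com).
PEAK_MONTHS = {5, 6, 7, 11, 12}


def peak_season_flag(release_month: int) -> int:
    """Return 1 if the release month is in peak season, else 0.

    Hypothetical helper; assumes months are numbered 1 (January) to 12.
    """
    if not 1 <= release_month <= 12:
        raise ValueError(f"release_month must be 1-12, got {release_month}")
    return 1 if release_month in PEAK_MONTHS else 0


# One flag per calendar month, January through December.
print([peak_season_flag(m) for m in range(1, 13)])
# [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1]
```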

Procedure
• Scrape all relevant data from various sources* and merge it into a data frame by Title
• Perform OLS regression on the training set (70% of the data), beginning with all features
• Evaluate the model based on the p-value of each feature and R², removing features as needed
• Apply the new model to the remainder of the data set

*BoxOfficeMojo.com, TheNumbers.com, IMDB.com
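The merge and 70/30 split steps can be sketched with the standard library alone. This assumes each source has already been scraped into a list of dicts keyed by "Title"; the column names, the toy rows, and the `merge_by_title` / `split_70_30` helpers are hypothetical, and keeping only titles found in every source (an inner join) is an assumed choice, not something the deck specifies:

```python
import random


def merge_by_title(*sources):
    """Inner-join several lists of row dicts on their "Title" key.

    Only titles present in every source survive. If two sources share a
    column name, the value from the later source wins.
    """
    if not sources:
        return []
    common = set.intersection(*({row["Title"] for row in src} for src in sources))
    merged = {}
    for src in sources:
        for row in src:
            if row["Title"] in common:
                merged.setdefault(row["Title"], {}).update(row)
    return list(merged.values())


def split_70_30(rows, seed=0):
    """Shuffle a copy of rows reproducibly; return (70% train, 30% test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy rows standing in for the three scraped sources.
mojo = [{"Title": "A", "TotalGross": 10}, {"Title": "B", "TotalGross": 20}]
numbers = [{"Title": "A", "Budget": 5}, {"Title": "B", "Budget": 8},
           {"Title": "C", "Budget": 1}]
imdb = [{"Title": "A", "IMDBScore": 7.1}, {"Title": "B", "IMDBScore": 6.0}]

# Titles A and B are kept; C is missing from two sources and is dropped.
movies = merge_by_title(mojo, numbers, imdb)
train, test = split_70_30(movies)
```

Fixing the shuffle seed keeps the training/test membership stable between runs, so p-values from successive refits are computed on the same 70% of movies.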

Training Set Results
R² = 0.925    Adjusted R² = 0.918

Feature                     P>|t|
MPAA Rating                 0.123
Runtime                     0.300
IMDB Score                  0.155
Opening Theaters            0.002
Opening Gross               0.005
Total Theaters (Domestic)   0.000
Total Gross (Domestic)      0.000
Peak Movie Season           0.067
Budget                      0.000
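R² and adjusted R² follow standard definitions: R² = 1 - SS_res/SS_tot, and adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) for n observations and p predictors (intercept excluded). A minimal sketch with hypothetical helper names; the deck does not report the training-set size, so the numbers below are toy values rather than a reproduction of the slide:

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjust R^2 for p predictors (intercept excluded) fit on n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Toy check: predicting the mean everywhere explains none of the variance.
print(r_squared([1, 2, 3], [2, 2, 2]))  # 0.0
# With 11 observations and 1 predictor, R^2 = 0.5 adjusts to 1 - 0.5 * 10 / 9.
print(adjusted_r_squared(0.5, n=11, p=1))
```

Because adding a predictor can never lower the training R² of an OLS fit, the adjusted value is the fairer yardstick when comparing models with different numbers of features.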

Rule: remove feature if p-value > 0.100

Training Set Results
R² = 0.920    Adjusted R² = 0.915

Feature                     P>|t|
Opening Theaters            0.007
Opening Gross               0.011
Total Theaters (Domestic)   0.000
Total Gross (Domestic)      0.000
Peak Movie Season           0.069
Budget                      0.000

Rule: remove feature if p-value > 0.005
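Both pruning rules drop every feature whose p-value exceeds the threshold in a single pass, after which the model is refit. A minimal sketch of that filtering step, fed with the p-values reported in the training-set tables; the `features_to_keep` helper is hypothetical, and in practice the p-values would come from the fitted model (for example, the `pvalues` attribute of a statsmodels OLS results object):

```python
def features_to_keep(p_values, threshold):
    """Apply the rule "remove feature if p-value > threshold".

    All offending features are dropped in a single pass; the model
    would then be refit on the survivors to get new p-values.
    """
    return [name for name, p in p_values.items() if p <= threshold]


# p-values from the first training-set fit (reported to 3 decimals).
first_fit = {
    "MPAA Rating": 0.123, "Runtime": 0.300, "IMDB Score": 0.155,
    "Opening Theaters": 0.002, "Opening Gross": 0.005,
    "Total Theaters (Domestic)": 0.000, "Total Gross (Domestic)": 0.000,
    "Peak Movie Season": 0.067, "Budget": 0.000,
}
print(features_to_keep(first_fit, 0.100))

# p-values after refitting on the surviving features.
second_fit = {
    "Opening Theaters": 0.007, "Opening Gross": 0.011,
    "Total Theaters (Domestic)": 0.000, "Total Gross (Domestic)": 0.000,
    "Peak Movie Season": 0.069, "Budget": 0.000,
}
print(features_to_keep(second_fit, 0.005))
# ['Total Theaters (Domestic)', 'Total Gross (Domestic)', 'Budget']
```

The second pass leaves exactly the three features of the final model. Loosening `threshold` is how the "more lenient p-values" idea under Next Steps would be explored.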

Final Model
R² = 0.904    Adjusted R² = 0.902

Feature                     P>|t|
Total Theaters (Domestic)   0.000
Total Gross (Domestic)      0.000
Budget                      0.000

Next step: Add the genre categories to determine which one(s) have the lowest p-values and how they affect the model overall.
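Adding genre categories to an OLS model means turning each genre into a 0/1 indicator column. A minimal sketch, assuming each movie's genres are stored as a list of strings; a movie can belong to several genres, so this is multi-hot rather than one-hot encoding. The `genre_indicators` helper and the sample movie are hypothetical:

```python
# Genre categories listed on the feature slide.
GENRES = [
    "Action", "Adventure", "Biography", "Comedy", "Crime", "Documentary",
    "Drama", "Family", "History", "Horror", "Music", "Mystery", "Romance",
    "Sport", "Thriller", "War", "Western",
]


def genre_indicators(movie_genres):
    """Return one 0/1 indicator per known genre for a single movie.

    A movie can carry several genres at once, so more than one
    indicator may be 1 (multi-hot rather than one-hot encoding).
    """
    present = set(movie_genres)
    unknown = present - set(GENRES)
    if unknown:
        raise ValueError(f"unrecognized genres: {sorted(unknown)}")
    return {genre: int(genre in present) for genre in GENRES}


# Hypothetical movie tagged with three genres.
row = genre_indicators(["Biography", "Drama", "Romance"])
print(row["Romance"], row["Action"], sum(row.values()))  # 1 0 3
```

Each indicator column can then be added to the three-feature model and its p-value inspected, which is one way to carry out the next step described above.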

Final Model
R² = 0.905    Adjusted R² = 0.902

Feature                     P>|t|
Total Theaters (Domestic)   0.000
Total Gross (Domestic)      0.000
Budget                      0.000
Romance ❤*                  0.053

*Runners-up: Thriller, Sport, Family
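Written out, the final model on this slide is an ordinary least-squares fit of the following form. The coefficients $\beta_0, \dots, \beta_4$ are the usual OLS notation, introduced here for illustration; the deck reports only the p-values, not the fitted coefficient values.

```latex
\widehat{\mathrm{WorldwideGross}}_i
  = \beta_0
  + \beta_1\,\mathrm{TotalTheaters}_i
  + \beta_2\,\mathrm{TotalDomesticGross}_i
  + \beta_3\,\mathrm{Budget}_i
  + \beta_4\,\mathrm{Romance}_i,
\qquad \mathrm{Romance}_i \in \{0, 1\}
```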

Key Features: Trends in Raw Data

Model Fit: Actual Worldwide Gross vs. Predicted, by Feature
[Figure: model predictions and raw data for each feature, with 95% confidence interval]

Model Fit: Prediction Residuals

Residuals grow in magnitude as Worldwide Gross increases, so the model's predictions are less accurate for extreme (very high-grossing) cases.
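One simple way to check for residuals that grow with Worldwide Gross is to sort movies by actual gross, split them into groups, and compare the mean absolute residual in each group; a steadily rising mean from the lowest to the highest group is the pattern described above. A minimal sketch; the group count, the `mean_abs_residual_by_group` helper, and the toy values are all assumptions, not data from the deck:

```python
def mean_abs_residual_by_group(actual, predicted, n_groups=3):
    """Sort movies by actual gross, split them into n_groups groups of
    near-equal size, and return the mean |actual - predicted| per group."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) < n_groups:
        raise ValueError("need at least one movie per group")
    pairs = sorted(zip(actual, predicted))
    size, extra = divmod(len(pairs), n_groups)
    means, start = [], 0
    for g in range(n_groups):
        # The first `extra` groups take one extra movie each.
        end = start + size + (1 if g < extra else 0)
        group = pairs[start:end]
        means.append(sum(abs(y - y_hat) for y, y_hat in group) / len(group))
        start = end
    return means


# Hypothetical toy values (not from the deck).
actual = [10, 20, 30, 40, 50, 60]
predicted = [11, 19, 33, 37, 58, 52]
print(mean_abs_residual_by_group(actual, predicted))  # [1.0, 3.0, 8.0]
```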

Model Fit: Prediction Residuals

Accuracy could be improved with more data points in the extreme high-grossing group.

Conclusion: Predicting Success of Movies Based on a True Story
• Domestic release (measured by Total Theaters and Total Domestic Gross) is the key indicator of Worldwide Gross
• Total Domestic Gross and Budget are linearly related to Worldwide Gross
• Of the genre categories, Romance is the most strongly associated with Worldwide Gross (lowest genre p-value, 0.053)
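How strongly two quantities are linearly related is commonly summarized with the Pearson correlation coefficient r, which ranges from -1 to 1. A minimal plain-Python sketch (on Python 3.10+, `statistics.correlation` computes the same quantity); the `pearson_r` helper and the toy numbers are hypothetical:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: the sum of co-deviations divided by the product
    of the root sums of squared deviations (the 1/n factors cancel)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / (sd_x * sd_y)


# Hypothetical toy values: worldwide is an exact linear function of budget,
# so r is 1 up to floating-point rounding.
budget = [10, 20, 30, 40]
worldwide = [25, 45, 65, 85]
print(round(pearson_r(budget, worldwide), 6))  # 1.0
```

A value of r near 1 or -1 indicates a strong linear relationship; on its own it does not show that one variable drives the other.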

Next Steps
• Revisit a larger data set, knowing the key features
• Additional features to explore:
  • Actors/actresses
  • Award nominations or wins
• Analyze the model with more features, allowing more lenient p-values