
PREDICTING A MOVIE’S GENRE FROM ITS POSTER

GABRIEL BARNEY AND KRIS KAYA

STANFORD UNIVERSITY SYMBOLIC SYSTEMS PROGRAM

{barneyga, kkaya23}@stanford.edu

Motivation

● The film industry is heavily reliant on posters to promote movies.
  ○ Posters must convey a movie's theme and genre to make the film seem as appealing as possible to a wide variety of people.
  ○ This makes the features included on a poster incredibly important to how a movie is portrayed.
● Our project attempted to train a model that could learn features from a movie poster and predict the movie's genre or genres on the basis of those features.

Results

References

Discussion

Model            At Least One Genre   All Genres   Hamming Loss
Baseline         12.34%               ---          ---
ML-kNN (K=40)    34.28%               7.77%        0.117
OVR-kNN (K=40)   35.428%              9.71%        0.118

Data/Features

Table 1: Performance of ML-kNN and OVR-kNN models

Future

● For this project, we used poster features to make predictions about genre; however, movie posters contain more potential information about a film.
  ○ Given more time, we would attempt to predict other attributes of a movie from its poster, such as viewer ratings or cast members.
● The current dataset has a large number of dramas and few TV movies; we could augment our dataset to expose our model to more examples and make it more robust.

Figure 1: A visualization of our problem with an example poster

1. "The Movies Dataset." https://www.kaggle.com/rounakbanik/the-movies-dataset
2. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/cvpr.2016.90
3. Zhang, Min-Ling and Zhi-Hua Zhou. "ML-KNN: A Lazy Learning Approach to Multi-Label Learning." Pattern Recognition 40 (2007): 2038-2048.

Methods and Models

1. Random Baseline - Test Size: (2000). Randomly selects genres for a given movie.

2. K-Nearest Neighbors [3] - Dataset split: (2800, 350, 350). Uses MAP estimation to predict labels for unseen examples on the basis of their K nearest neighbors. Can be framed as a single multi-label classification problem (ML) or as multiple one-vs-rest binary classification problems (OVR). The objective is to minimize the Hamming loss (defined below):
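For N test examples and L genre labels, with predicted label set \hat{Y}_i and true label set Y_i for example i, the Hamming loss takes its standard form [3]:

\mathrm{HammingLoss} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{L} \left| \hat{Y}_i \,\triangle\, Y_i \right|

where \triangle denotes the symmetric difference between the two label sets, i.e. the number of labels that are predicted or true but not both.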

3. ResNet34 [2] - Dataset split: (28000, 3500, 3500). Pretrained network. We replaced the final softmax layer with a sigmoid layer and changed the loss function from cross entropy loss to binary cross entropy loss. ResNets address the problem of vanishing gradients by using residual blocks (a code sketch of this modification follows the figures below).

Figure 3: ResNet34 Architecture Figure 4: Residual Block
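The head replacement described in (3) could look roughly like the following PyTorch sketch. It is illustrative only, and assumes torchvision's resnet34 and a 20-class genre vocabulary rather than reproducing the exact training code.

import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained ResNet34 backbone.
model = models.resnet34(pretrained=True)

# Swap the 1000-way ImageNet classifier for a 20-way genre head.
model.fc = nn.Linear(model.fc.in_features, 20)

# For multi-label genre prediction, score each genre independently with a
# sigmoid and train with binary cross entropy (BCEWithLogitsLoss fuses both).
criterion = nn.BCEWithLogitsLoss()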

4. Custom Architecture - Dataset split: (28000, 3500, 3500). A simple CNN with max-pooling and dropout layers, trained with the Adam optimizer and binary cross entropy loss (shown in Figure 5 below; a code sketch follows the figure).

● We used the Full MovieLens Dataset [1] from Kaggle, which consists of metadata collected from TMDB and GroupLens.
  ○ The dataset contains entries for 45,466 movies; each entry contains various elements about the film such as genre, user rating, cast, and, most importantly, poster.
  ○ We preprocessed the dataset to remove entries with improper formatting, which simplified working with the data, and isolated the genres.
● We formatted each individual poster into a 224x224 square grid.
  ○ Our raw input data is the color of each pixel in the image expressed in terms of RGB values, i.e. a 224x224x3 matrix.
  ○ The genres of each movie are encoded as a binary indicator vector with one position per genre class (a preprocessing sketch follows below).
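A minimal sketch of this preprocessing, under the assumption that posters are loaded from image files; the genre vocabulary shown is abbreviated (the full vocabulary has 20 classes) and the helper names are hypothetical.

import numpy as np
from PIL import Image

# Abbreviated genre vocabulary for illustration; the full vocabulary has 20 classes.
GENRES = ["Action", "Animation", "Comedy", "Drama", "Horror"]

def poster_to_array(path):
    # Resize a poster to 224x224 and return its RGB pixel values as a 224x224x3 array.
    img = Image.open(path).convert("RGB").resize((224, 224))
    return np.asarray(img, dtype=np.float32)

def encode_genres(genre_names):
    # Binary indicator vector: one position per genre class, set to 1 for each
    # genre the movie belongs to.
    vec = np.zeros(len(GENRES), dtype=np.float32)
    for name in genre_names:
        if name in GENRES:
            vec[GENRES.index(name)] = 1.0
    return vec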

Figure 2: An example poster from our dataset

● The ResNet network and the Custom Architecture performed slightly better in pure accuracy than ML-kNN.
  ○ The ResNet and Custom Architecture also performed better on other evaluation metrics such as F1 score, recall, and top-K categorical accuracy.
● The distribution of our dataset may predispose our model towards certain genres.
  ○ Class sizes ranged from 655 members (TV Movies) to 15,941 members (Dramas).
● Our model would also sometimes learn features that were prevalent in, but not intrinsic to, a given genre, which could account for some errors.
  ○ This may also have been caused by issues of resolution.

Type      Size   Stride   Outputs
CONV      5x5    2        64
MAXPOOL   2x2    ---      64
CONV      5x5    2        128
MAXPOOL   2x2    ---      128
CONV      5x5    2        256
MAXPOOL   2x2    ---      256
DROPOUT   ---    ---      256
FLATTEN   ---    ---      1024
RELU      ---    ---      128
DROPOUT   ---    ---      128
SIGMOID   ---    ---      20

Figure 5: The custom architecture
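A PyTorch sketch of the Figure 5 architecture. This is illustrative rather than the exact training code: the ReLU activations after each convolution, the dropout rates, and the lack of padding are assumptions, chosen so that the flattened size matches the 1024 units in the table.

import torch
import torch.nn as nn

custom_cnn = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=5, stride=2),    # 224x224x3 -> 110x110x64
    nn.ReLU(),
    nn.MaxPool2d(2),                              # -> 55x55x64
    nn.Conv2d(64, 128, kernel_size=5, stride=2),  # -> 26x26x128
    nn.ReLU(),
    nn.MaxPool2d(2),                              # -> 13x13x128
    nn.Conv2d(128, 256, kernel_size=5, stride=2), # -> 5x5x256
    nn.ReLU(),
    nn.MaxPool2d(2),                              # -> 2x2x256
    nn.Dropout(0.5),                              # dropout rate assumed
    nn.Flatten(),                                 # -> 1024
    nn.Linear(1024, 128),
    nn.ReLU(),
    nn.Dropout(0.5),                              # dropout rate assumed
    nn.Linear(128, 20),
    nn.Sigmoid(),                                 # 20 independent genre probabilities
)

criterion = nn.BCELoss()                          # binary cross entropy on sigmoid outputs
optimizer = torch.optim.Adam(custom_cnn.parameters())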

Binary Cross Entropy Loss
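For N examples and L = 20 genre labels, with sigmoid outputs \hat{y}_{ij} and binary targets y_{ij}, the binary cross entropy loss takes its standard form:

\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{L} \left[ y_{ij} \log \hat{y}_{ij} + (1 - y_{ij}) \log (1 - \hat{y}_{ij}) \right]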

Model      At Least One Genre   All Genres   Hamming Loss   Test Loss   Accuracy
ResNet34   38.26%               12.49%       0.0938         0.2486      90.62%

Table 2: Performance of ResNet34 Model

Genre      Recall   Precision   F1     Count
Animation  0.44     0.84        0.58   135
Comedy     0.40     0.77        0.52   1135
Drama      0.39     0.68        0.50   1558
Horror     0.32     0.52        0.40   406
Family     0.23     0.75        0.35   233

Table 3: Top 5 class performances by ResNet34

Genre      Recall   Precision   F1     Count
Drama      0.46     0.48        0.47   1558
Comedy     0.38     0.46        0.41   1135
Thriller   0.17     0.35        0.23   632
Horror     0.10     0.27        0.15   406
Action     0.08     0.26        0.12   526

Figure 6: Top 5 class performances by Custom Architecture

Figure 7: Top K Categorical Accuracy - Custom Architecture