And the Oscar goes to… Creating an Academy Award winner predictor using supervised learning methods
Romi Gelman Noa Rogozinski
Introduction to Artificial Intelligence, The Hebrew University, 2015
ABSTRACT: Our goal in this project is to predict whether an Oscar nominated movie would indeed win an Oscar for “Best
Picture”. In order to create a worthy predictor we implemented two supervised learning techniques – the decision tree algorithm
with two different heuristics and the binary perceptron model. In this project we present the results, analyze how the implemented algorithms behave with different parameters, and compare the methods to arrive at the best predictor.
1. INTRODUCTION
1.1. The Problem
The Academy Awards, or The Oscars, is an annual American awards ceremony honoring cinematic
achievements in the film industry. It is considered to be the most important and prestigious cinema awards
ceremony in the world. “The Best Picture” award is the most important award category, as it represents all
of the efforts that are put into a film, including directing, acting, music composition, writing, and editing.
Thus, each year after the nominees for “Best Picture” are announced, the entire film industry, and the media that cover it, fill with speculations and predictions about the winner. Traditionally, these predictions have
been made by movie experts, but advances in technology have opened the way for new prediction methods.
For instance, a team of researchers1 has recently found crowdsourcing to be a very successful predictor,
translating Oscar predictions data from online betting markets into probabilities.
1.2. Our Approach
In this project, we wish to predict which movies out of a given list of real Oscar-nominees for “Best Picture”
would win an Oscar. We base our project on the presumption that Oscar-winning movies are picked, year after year, according to recurring patterns relating to the movies’ attributes.
Our approach for predicting whether Oscar nominated movies will win an Oscar is to address the problem as
a supervised learning problem. Supervised learning was a natural choice, since each movie can be defined as
a set of its attributes, and, with the use of existing information about past nominees, can be considered as an
example, with the attributes as the input and a label ("won" or "lost") as the desired output. The attributes of
these Oscar-winning and Oscar-losing examples can be then fed into various models from the supervised
learning field in order to train the models, and also to later test them.
For this project, we used the decision tree and the binary perceptron model, as they were the clear choice for
supervised learning models and fit well with our representation of the problem. The algorithms in both
models identify which of a movie’s attributes are more relevant to its chance at winning an Oscar and then
try to predict if a movie would win according to the values of its attributes. The performance of each model
and comparisons between them will be discussed later in the project.
We believe information about the significance of the different components of a film and their effect on
whether a film would have a higher chance to win an Oscar may be highly valuable for production
companies.
1 http://blog.revolutionanalytics.com/2014/02/oscars-betfair.html, http://www.predictwise.com/node/2837
2. REPRESENTATION OF THE PROBLEM
2.1. The Movie Attributes
Each of the movies that were nominated for “Best Picture” in the 87 years the Academy Awards have taken
place is represented as a set of attributes retrieved from IMDb and stored as an instance of the Movie class.
We decided to concentrate on attributes relating to the movie’s cast and crew and their previous success at
the Oscars, dry-stats attributes, content-related attributes, and whether the movie won other cinematic
awards that took place before the Oscars.
In order for the learning algorithms to allow generalization (and to avoid over-fitting the learned examples),
we discretized (made a coarser division of) the possible values of some attributes.
Following is the list of attributes used in our project –
a) Attributes relating to the content:
Genre, as extracted from the IMDb database (Drama, Comedy, Crime, Action, Biography,
Adventure, Animation, Western, Romance, Musical, History, Fantasy, War, Mystery, Thriller,
Sport, Family, Music, Sci-Fi)
MPAA rating – the Motion Picture Association of America rating of a film’s suitability for
certain audiences, based on its content: “G” – for general audiences, “PG” – parental guidance
suggested, “PG-13” – parents strongly cautioned, “R” – restricted
Age Category of Main Character – the age of the main character, rounded down to the nearest
10-year mark it belongs to (0, 10, 20, …, 60, 70); determined by the real age of the first
actor/actress listed for the movie in IMDb
Theme of movie – we picked three themes: war, physical and mental disease, and show
business; created a pool of representative keywords for each theme; and checked for each movie
whether it contains any of the keywords. If so, we classified the movie as belonging to the
corresponding theme.
Note – the selection of the themes is based on “movie experts' chatter” during the 2015 Oscars
ceremony.
b) Attributes relating to dry stats:
IMDb rating – the rating given to the movie by IMDb users: from 0 to 10, rounded to the
nearest multiple of 0.5
Runtime category - length of movie, rounded down to the closest half hour
Season of release – January to March, April to June, July to September, October to December
c) Attributes relating to the quality of the cast and crew:
Total number of times cast won an Oscar – counts the total number of times any of the first 3
actors listed for the movie in IMDb had won an Oscar (not including current movie)
Total number of times cast was nominated for an Oscar – counts the total number of times
any of the first 3 actors listed for the movie in IMDb were nominated for an Oscar, including
wins (not including current movie). This attribute was also discretized.
Producers’ quality – this attribute takes one of 3 values: “won” – if any of the producers of
the film had won an Oscar for other movies; “nominated” – if any of the producers of the film
had been nominated for an Oscar; “nada” – no nominations for any of the producers for other
films
Director’s quality – takes one of the same 3 values as producers' quality.
Writer’s quality – takes one of the same 3 values as producers' quality.
Editors' quality – takes one of the same 3 values as producers' quality.
d) Attributes relating to other cinematic achievements:
Won Bafta – whether the movie had won a BAFTA award
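To illustrate, the coarser divisions described above can be sketched as simple rounding helpers. The function names and exact bin edges below are illustrative, not the project's actual code:

```python
def discretize_rating(rating):
    """Round an IMDb rating (0-10) to the nearest multiple of 0.5."""
    return round(rating * 2) / 2

def discretize_runtime(minutes):
    """Round a runtime down to the closest half hour (in minutes)."""
    return (minutes // 30) * 30

def discretize_age(age):
    """Round the main character's age down to its 10-year mark, capped at 70."""
    return min((age // 10) * 10, 70)

def season_of_release(month):
    """Map a release month (1-12) to one of the four quarters."""
    seasons = ["JanToMar", "AprToJune", "JulyToSept", "OctToDec"]
    return seasons[(month - 1) // 3]
```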
2.2. Training Set and Test Set
In order to measure the predictive power of each of the models (their ability to correctly predict whether a
nominated movie will win an Oscar), we had to test the performance of each model on movies that weren’t
given as training examples during the model’s learning process. Therefore, for each trial, we divided our list
of movie nominees (our set of examples) into a training set and a test set; the split was done so that all of the
movies nominated in the same year were in the same subset. The movies in the training set were used to
teach the model, and the ones in the test set were used to measure the quality of the learning of the model.
The size of the training set differs from trial to trial. In each trial we applied one of two methods to divide
the examples into a training set and a test set. In the first method, the set is randomly created each time
and is therefore independent of the training sets in the other trials. As for the second method, we decided it
is more natural for predictions regarding Oscar winners to rely on the history of winnings, so the
training set contains the earlier years and the test set the later years.
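The two splitting methods can be sketched as follows; the `movies_by_year` mapping and the function name are hypothetical, not the project's actual interfaces:

```python
import random

def split_by_year(movies_by_year, train_fraction, chronological=True):
    """Split {year: [nominees]} into a training and a test set so that all
    movies nominated in the same year land in the same subset."""
    years = sorted(movies_by_year)
    if not chronological:
        random.shuffle(years)        # method 1: an independent random split
    cut = int(len(years) * train_fraction)
    train = [m for y in years[:cut] for m in movies_by_year[y]]
    test = [m for y in years[cut:] for m in movies_by_year[y]]
    return train, test
```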
2.3. Extracting the Data
We used a table2 listing all Oscar nominations and wins for “Best Picture”, “Best Actor”, “Best Supporting
Actor”, “Best Actress”, “Best Supporting Actress”, “Best Director”, “Best Film Editing”, “Best Writing –
Original Screenplay”, and “Best Writing – Adapted Screenplay”. This table was used as the base input list
of movies nominated for “Best Picture”, as well as for fetching the Oscar nomination and winning statistics
of the cast and crew for each movie. In order to extract the attributes for each movie in the input list, we
used the IMDb python package and the OMDb API web service (using the Python requests package).
Note: some of the attributes for some of the movies were missing from the database, so we created a
separate “doesn’t exist in db” value for each attribute (we will later regard this as the “missing attributes”
problem).
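A minimal sketch of fetching and normalizing attributes through the OMDb API with the requests package; the chosen fields and the handling of the sentinel value are simplified assumptions, and a valid OMDb API key is required:

```python
MISSING = "doesn't exist in db"  # sentinel for the "missing attributes" problem

def parse_omdb_response(data):
    """Map an OMDb JSON response onto a few of our attribute names,
    substituting the MISSING sentinel where a field is absent or 'N/A'."""
    if data.get("Response") != "True":
        return None  # movie not found in the database
    def field(key):
        value = data.get(key, "N/A")
        return MISSING if value == "N/A" else value
    return {"mpaa": field("Rated"), "imdbRating": field("imdbRating"),
            "runtime": field("Runtime"), "genre": field("Genre")}

def fetch_movie_attributes(title, api_key):
    """Fetch a nominee by title; assumes a valid OMDb API key."""
    import requests  # the external package the project used for OMDb calls
    resp = requests.get("http://www.omdbapi.com/",
                        params={"t": title, "apikey": api_key})
    return parse_omdb_response(resp.json())
```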
3. METHODS
3.1. Models and Algorithms
3.1.1. Decision Tree:
The Decision Tree model uses a list of attributes and a set of classified examples defined by their
attributes in order to create a flowchart-like structure (the tree) that is used to classify new inputs. Each
internal node of the tree represents an attribute, and its branches represent the different values for that
attribute. Each leaf node represents a classification. A new input can be mapped, according to its
attributes, to a path from the root of the tree to one of the tree's leaves, and is classified according to
that leaf (“true” or “false”).
The Decision Tree Learning algorithm (DTL) is based on recursively choosing the “best” attribute to
divide the current examples by and setting it as the root of the subtree. Choosing the “best” attribute is
done using various heuristics. We used the entropy based heuristics "Information Gain" and
"Information Gain Ratio" in our implementation.
The "Information Gain" heuristic chooses the attribute A with the largest IG, i.e. minimizes the
average conditional entropy - H(Examples|A).
IG(Examples|A) = H(Examples) − H(Examples|A)
H(Examples|A) = Σ_{v∈A} p(v) · H(Examples|A = v)
The "Information Gain Ratio" heuristic chooses the attribute with the largest ratio of information
gain to the intrinsic information.
2 http://www.ya-shin.com/awards/awards.html
IGR(A) = IG(Examples|A) / H(A)
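Both heuristics can be sketched in a few lines. Here an example is assumed to be an (attributes, label) pair, an illustrative representation rather than the project's Movie class:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H of a list of labels: -sum p * log2(p)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute):
    """IG(Examples|A) = H(Examples) - sum_v p(v) * H(Examples|A=v)."""
    groups = {}
    for attrs, label in examples:
        groups.setdefault(attrs[attribute], []).append(label)
    remainder = sum(len(g) / len(examples) * entropy(g) for g in groups.values())
    return entropy([label for _, label in examples]) - remainder

def gain_ratio(examples, attribute):
    """IGR(A) = IG(Examples|A) / H(A), H(A) being the intrinsic information."""
    split_info = entropy([attrs[attribute] for attrs, _ in examples])
    return information_gain(examples, attribute) / split_info if split_info else 0.0
```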
3.1.2. Binary Perceptron:
The binary perceptron model is a classical classification model taken from the field of artificial neural
networks and supervised learning. This model is based on the idea of a single neuron that fires if and
only if its output, which is calculated as the total weighted sum of its inputs, is higher than a certain
threshold (in our case, 0). An n-dimensional binary perceptron has n inputs and n weights. During its
training process, it receives examples, which are n-dimensional inputs with their classification label
(⟨X⃗, y₀⟩), and adjusts its weights to better fit each example seen (according to the online algorithm shown below). Once the training process is done, an input can be classified into one of two categories according to whether the trained perceptron fired or not upon receiving it.
In our case, the n-dimensional input X⃗ is the set of attributes of a certain nominated movie. A movie’s
(binary) label in the training set (y₀) represents whether or not it won an Oscar. The weight given to the
i’th attribute is marked Wᵢ, and the perceptron’s output for a movie is marked Y.
The following equation describes the classification done by a trained perceptron –
Y = sgn(W⃗ · X⃗) = sgn(Σᵢ₌₁ⁿ Wᵢ · Xᵢ)
The Binary Perceptron Learning Algorithm –
Given a set of P inputs {⟨X⃗₁, y₀₁⟩, …, ⟨X⃗ₚ, y₀ₚ⟩}, where X⃗ᵢ is the i’th n-dimensional input and
y₀ᵢ ∈ {1, −1} is that input’s label, and given that the inputs are linearly separable, the following
algorithm finds a set of weights W⃗ ∈ ℝⁿ so that for each i ∈ {1…P} it holds that y₀ᵢ = Yᵢ.
1. Start with an arbitrary set of weights W⃗₀.
2. Loop over the given set of inputs from first to last repeatedly until convergence:
If y₀ᵢ = sgn(W⃗ᵢ₋₁ · X⃗ᵢ), go to the next example (the perceptron's output for the example
matches its label). Else, update the weights: W⃗ᵢ = W⃗ᵢ₋₁ + y₀ᵢ · X⃗ᵢ
3.2. Implementation
3.2.1. Decision Tree Implementation:
The implementation of the Decision Tree and the two different heuristics is as learned in class. Since
we build the decision tree using only the attribute values of the examples in the training set, when
testing movies from the test set we occasionally encounter a missing attribute value. In this case we
return the default value of false. We made this choice since most nominated movies don't win an Oscar,
and changing the default value to true did not noticeably affect the results.
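A sketch of the recursive tree construction and of the false-default classification described above, assuming an example is an (attributes, label) pair and a heuristic function such as information gain is passed in; this is an illustration, not the project's code:

```python
from collections import Counter

def build_tree(examples, attributes, score):
    """Recursive DTL sketch; `score` is a heuristic such as information gain."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:
        return labels[0]                              # pure leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]   # majority leaf
    best = max(attributes, key=lambda a: score(examples, a))
    by_value = {}
    for attrs, label in examples:
        by_value.setdefault(attrs[best], []).append((attrs, label))
    rest = [a for a in attributes if a != best]
    return {"attribute": best,
            "branches": {v: build_tree(subset, rest, score)
                         for v, subset in by_value.items()}}

def classify(tree, attrs):
    """Walk the tree; an attribute value unseen during training falls back to
    the default False, since most nominees lose."""
    while isinstance(tree, dict):
        value = attrs.get(tree["attribute"])
        if value not in tree["branches"]:
            return False
        tree = tree["branches"][value]
    return tree
```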
3.2.2. Perceptron Implementation:
The implementation of the Binary Perceptron is according to the algorithm described above. Two
problems we had to address while implementing the learning algorithm and testing of this model were
how to handle missing attributes and the convergence of the weights vector.
Any "missing attributes" (i.e. attributes of the perceptron that couldn’t be determined when creating the
movie instance of a nominee) were given a default value of 0. We chose 0 as the default value so that in
the case that the output of the perceptron doesn’t match the true winning status of the nominee and an
update of the weights vector is made, the update wouldn’t affect the current learned weight value of the
weight that represents the missing attribute.
Regarding the convergence of the algorithm, the algorithm described above converges only if the
examples are linearly separable (can be divided by a hyperplane into two classification areas so that all
of the examples with the same label are in the same area). According to Cover’s Theorem, as long as
P < 2N (the number of examples is less than double the dimension of the perceptron), all dichotomies
are possible – i.e. the examples can be linearly separable. In our case, N ≈ 30, and in most of the cases
we tested P > 60, so the examples are not necessarily linearly separable, and the algorithm is not
guaranteed to converge. For this reason, we limited the number of iterations as part of the algorithm's
implementation.
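The iteration-limited training loop can be sketched as follows; the cap of 6 passes mirrors the base configuration used later, and the list-based vectors are an illustrative simplification:

```python
def train_perceptron(examples, max_iterations=6):
    """Capped perceptron learning: since our examples (P > 60, N ~ 30) are not
    guaranteed to be linearly separable, stop after max_iterations passes.
    An example is (x, y0) with x a list of numbers and y0 in {1, -1}."""
    w = [0.0] * len(examples[0][0])     # an arbitrary initial weight vector
    for _ in range(max_iterations):
        converged = True
        for x, y0 in examples:
            s = sum(wi * xi for wi, xi in zip(w, x))
            if (1 if s > 0 else -1) != y0:                  # output != label
                w = [wi + y0 * xi for wi, xi in zip(w, x)]  # w <- w + y0 * x
                converged = False
        if converged:
            break
    return w
```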
The implementation of the input vectors for the perceptron was done with consideration to the fact that
some movie attributes have continuous numerical values (such as “imdb rating” and “times cast was
nominated”), while others have non-numeric values (such as “genre” and “plot keywords”).
“Continuously-valued” attributes were assigned a single weight in the perceptron (and so a single index
in the input vector), and “discrete-valued” attributes were assigned several “Boolean” weights – one for
each of its possible values.
In the process of teaching the perceptron a single example, translating a movie’s “continuously-valued”
attribute value into a value in its input vector involved simply extracting the value of the attribute and
assigning the attribute’s corresponding index in the input vector a value that matches the attribute’s
value in its magnitude. For example, a movie with a higher “imdb rating” would have a greater value in
the index representing “imdb rating” in its input vector, compared to a movie with a lower rating. On
the other hand, translating a movie’s “discrete-valued” attribute value into perceptron inputs involved
assigning a positive value to the index that represents that value in the input vector and assigning a
negative value to the indices that represent the other possible values for that attribute.
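The encoding described in this paragraph might be sketched like this; the structure of the arguments and the exact ±1 scheme are simplified assumptions rather than the project's actual vectors:

```python
def encode_movie(movie, continuous_attrs, discrete_values):
    """Translate a movie's attributes into a perceptron input vector: a
    continuous attribute keeps one index carrying its value; each possible
    value of a discrete attribute gets its own index, set to +1 when the
    movie has that value and -1 otherwise; missing attributes become 0 so a
    weight update leaves their weights untouched."""
    vector = []
    for attr in continuous_attrs:
        value = movie.get(attr)
        vector.append(0.0 if value is None else float(value))
    for attr, possible in discrete_values.items():
        value = movie.get(attr)
        for v in possible:
            vector.append(0.0 if value is None else (1.0 if value == v else -1.0))
    return vector
```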
4. THE TESTING PROCESS
After implementing the basic algorithm, we played with several parameters relating to different aspects of the
project (algorithms, attributes, and training examples) in order to find the configuration that gives the best
predictions. All the parameters chosen have some degree of freedom that allows us to change their values within
the limits of the chosen methods.
The parameters we chose are:
a) Method for choosing the test set examples – randomly or not
b) Number of perceptron training iterations
c) Training set size
d) Default choice for missing attribute values – the value returned by the decision tree when, while
testing an example from the test set, it follows a path that reaches an attribute value that didn’t exist
in the training set.
e) Attributes – which attributes from the above description are taken under consideration (inserted to the list of
attributes given to the DTL algorithm and perceptron model)
f) Level of Discretization – regular (the level of discretization described for each attribute in the second
chapter) or a more coarse division of the values for some attributes. A stricter discretization can be done on
attributes such as ‘genre’ (by defining five main genres – Action, History, Drama, Family and Sci-fi),
‘runtime’, ‘IMDb rating’ and ‘age category of main character’ (by narrowing it down to only 4 age groups).
5. RESULTS AND ANALYSIS
We examined our algorithms on two different sets of data: the first contained all the movies nominated from the
last 15 years and the second contained all movies since 1928, the first year the Academy Awards were held. For
each data set, we performed several tests, differing in several parameters.
In this section, we will show the results of the different tests, and in the next section we will analyze the results
and try to understand if we can answer the question of whether it is possible to predict if a movie will win an
Oscar, and if so, which method works best.
5.1. Evaluating the quality of the prediction
Since it is problematic to determine the quality of our predictors according to a single form of measurement,
we will use 5 different indicators:
Overall correct = (number of matches) / (test set size)
True positives = (number of positive matches) / (number of Oscar winners in test set) = percent of actual winners that were rightly predicted to be winners
True negatives = (number of negative matches) / (number of Oscar losers in test set) = percent of actual losers that were rightly predicted to be losers
False positives = (number of Oscar losers with a positive predictor outcome) / (number of positive predictor outcomes)
False negatives = (number of Oscar winners with a negative predictor outcome) / (number of negative predictor outcomes)
* A match signifies that the predictor’s output for the movie was the same as its real label, a positive match
is a match for a movie with a “true” label (did win the Oscar) and a negative match is a match for a movie
with a “false” label (didn’t win an Oscar).
We used several measures and not only the “Overall correct” measure due to the fact that most movies do
not win the Oscars (only 1 out of 5-6 movies wins each year), and thus even a predictor that returns “false”
for every input would show “good” performance according to this measure. The use of several relative
measures, when taken into account together, can portray a predictor’s quality in a more accurate way.
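The five indicators can be computed from parallel lists of predicted and true labels. This is a sketch, with an empty denominator arbitrarily mapped to 0.0:

```python
def evaluate(predictions, labels):
    """Compute the five indicators from parallel lists of predicted and true
    boolean labels; rates are returned as fractions in [0, 1]."""
    pairs = list(zip(predictions, labels))
    winners = sum(labels)
    losers = len(labels) - winners
    positives = sum(predictions)               # positive predictor outcomes
    negatives = len(predictions) - positives
    def rate(count, total):
        return count / total if total else 0.0
    return {
        "overall": rate(sum(p == y for p, y in pairs), len(pairs)),
        "true_positives": rate(sum(p and y for p, y in pairs), winners),
        "true_negatives": rate(sum(not p and not y for p, y in pairs), losers),
        "false_positives": rate(sum(p and not y for p, y in pairs), positives),
        "false_negatives": rate(sum(not p and y for p, y in pairs), negatives),
    }
```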
5.2. Results from the first data set
This data-set contains movies nominated for the "best picture" award, starting from the 2000 Academy
Awards and up to the 2014 Academy Awards.
5.2.1. Base Configuration results
We began our testing with a base configuration of parameters:
a) Training set size – 80% of the data set (training set consisted of movies from 2000-2011 and testing
set consisted of movies from 2012-2014).
b) Choosing the test set examples – examples were not chosen randomly, i.e. the movies in the training
set were taken from the chronologically first 80% of years in the last 15 years.
c) Perceptron training iterations – 6
d) Default choice for missing attribute values – False
e) Attributes - using all of the attributes described above
f) Level of Discretization – regular
Chart 1 shows that the Decision Tree with the information gain ratio heuristic gave the best prediction
for the base configuration. A figure of the learned tree can be seen in figure 1 in the appendix.
As can be seen in Chart 1, and repeatedly in the rest of the results, the false positives are a lot higher
than the false negatives. This may be misleading, since the false positive rate is calculated in relation to
the total number of positive predictor outcomes, which is small, whereas the false negative rate is
calculated in relation to the much larger number of negative predictor outcomes. Therefore, even a
single false positive creates a giant leap in the percentage of false positives.
Chart 1 - base configuration results (small data set)
When looking at the predictions this configuration returns for the movies nominated in the 2015
Academy Awards we receive the following –
Movie | Did the movie win the Oscars? | Prediction of the Decision Tree with the Information Gain heuristic | Prediction of the Decision Tree with the Information Gain Ratio heuristic | Prediction of the Perceptron
American Sniper | False | False | True | False
Birdman | True | False | True | True
Boyhood | False | False | False | False
Selma | False | False | False | False
The Grand Budapest Hotel | False | True | False | False
The Imitation Game | False | False | True | True
The Theory Of Everything | False | False | False | True
Whiplash | False | False | False | False
5.2.2. Comparisons to the base configuration
Our goal was to improve the results of the base configuration by playing with the values of the different
parameters listed in the previous chapter. For each of the parameters for comparison, we compared the
results of several different values in order to find the best one. We will now discuss the conclusions
drawn from the analysis of these comparisons.
When comparing different numbers of training iterations of the perceptron, we found that the
optimal number of training iterations in our model is approximately 6 (as can be seen in chart 2 in the
appendix – 7.1). Furthermore, we found that the size of the training set changes the quality of the
prediction significantly; an 80-20 split achieves the best results for this data-set (see chart 3).
Most of the attributes we used to represent a movie (listed above) have a positive effect on the quality
of the prediction. An exception is the 'age category of main character' attribute: removing it completely
improves the results of both heuristics of the decision tree (see chart 4).
When creating a strict discretization of some of the movie attributes ('genre', 'runtime', 'imdb rating',
'age category of main character' and 'times cast nominated'), we notice a clear loss in the quality of the
prediction of the decision tree with the information gain ratio heuristic and a clear gain in the quality of
the prediction of the decision tree with the information gain heuristic. The perceptron does better in
some measurements and worse in others. On the other hand, when narrowing the range of values of
only the 'genre' and 'age category of main character' attributes (partial discretization), we get a
significant improvement in the quality of the prediction of the decision tree with the information gain
ratio heuristic (this is in fact the best predictor found for this data-set). In the two other methods, some
measurements improve while others do worse (see chart 5).
Figure 2 - The best predictor found for the first data set. Trained on 80% of data set with base configuration
parameter values and discretization of the ‘genre’ and 'age category of main character' attributes.
In addition to these results, changes such as changing the default classification of the decision trees,
shuffling the years before assigning them to the training and test sets, balancing the number of winners
and losers in the data set, and removing additional attributes resulted in worse results compared to the
base configuration described above.
To conclude the results of the different comparisons, the best predictor for this data set was received
when using the decision tree model with the information gain ratio heuristic and setting the parameters
to the base configuration parameter values with the further discretization of the ‘genre’ and 'age
category of main character' attributes. This prediction gave 66.6% true positives, 95.6% true negatives,
33.3% false positives and 4.3% false negatives. The learned tree is shown in figure 2; the perceptron
weights of this configuration can be seen in the appendix.
5.3. Results from the second data set
The second data-set contains all the movies ever nominated for the "best picture" award, starting from the
1928 Academy Awards and up to the 2014 Academy Awards.
5.3.1. Base Configuration results
We began our testing with a base configuration of parameters:
a) Training set size – 70% of the data set
b) Choosing the test set examples – examples were not chosen randomly, i.e. the movies in the training
set were taken from the chronologically first 70% of years in the data set
c) Perceptron training iterations – 6
d) Default choice for missing attribute values – False
e) Attributes - using all of the attributes described above
f) Level of discretization - regular
Chart 6 shows that the Decision Tree with the information gain ratio heuristic gave the best prediction
for the base configuration when taking into account all of the measures together.
Chart 6 - base configuration results (complete data set)
When looking at the predictions this configuration returns for the movies nominated in the 2015
Academy Awards we get the following –
Movie | Did the movie win the Oscars? | Prediction of the Decision Tree with the Information Gain heuristic | Prediction of the Decision Tree with the Information Gain Ratio heuristic | Prediction of the Perceptron
American Sniper | False | False | False* | True*
Birdman | True | False | False* | True
Boyhood | False | False | False | False
Selma | False | False | False | True*
The Grand Budapest Hotel | False | False* | False | True*
The Imitation Game | False | False | False* | False*
The Theory Of Everything | False | False | False | True
Whiplash | False | False | False | True*
Predictions marked with * differ from the corresponding results on the small data set
We can see that training on the complete data set caused the decision trees to become more “negative”,
and the perceptron to become more “positive”, and overall the results are less accurate.
5.3.2. Comparisons to the base configuration
Similarly to the comparisons done with the first data set, we tried to improve the above results by
playing with the different parameter values and also by applying some of the conclusions from the first
data set. The results are presented here.
We found that the size of the training set does indeed impact the results, and that the 70%-30% split
seems to be best, though the comparison is not decisive, since the impact on the performance differs
from measure to measure (see chart 7 in the appendix – 7.2). Attempts to completely
remove an attribute didn’t result in an improvement (some of these attempts are shown in chart 8).
We also tried to recreate the success of the predictor that was achieved for the first data set by further
discretizing attributes. When we applied discretization to the ‘genre’ attribute no significant change
was noticed. By discretizing both the ‘genre’ and ‘age of main character’ attributes, the decision tree
with the information gain ratio heuristic and the perceptron both produced worse results (became
too “negative”), in contrast to the beneficial impact this change had on the first data set
(see chart 9).
In addition, once again we saw that changing the default classification of the decision trees and
shuffling the years before assigning them to the training and test sets resulted in the same or worse
results compared to the base configuration described above.
To conclude, the predictors for the complete data set were not very effective, and it is hard to say which
configuration was best. We present here one of the best configurations with its results. The following
prediction was received when using the decision tree model with the information gain ratio heuristic
and setting the parameters to the base configuration parameter values. This prediction gave results of
48% true positives, 78% true negatives, 69% false positives and 12% false negatives. The learned tree
is shown in an attached file named bestDecisionTreeBigDataSet(iGRatio).gv.pdf.
6. CONCLUSIONS AND DISCUSSION
6.1. Can we predict if a movie will win an Oscar?
The question of whether it is possible to predict, through the use of “dry” attributes regarding the movie and
its cast and crew, whether a nominated movie will win an Oscar doesn’t have a very clear answer.
According to the results, it seems that predictions have a better chance at being accurate if made based on a
relatively small range of years immediately prior to the range of years in the test set. Overall, the results for
the first and smaller data set (2000-2014) were better than the results for the large and complete data set. To
illustrate, the “true positives” rate in the complete data set never reaches the 50% mark, while the better
configurations used on the small data set all exceed 65% for “true positives”. Perhaps the qualities by which
movies are picked for winning the Oscars have changed over the years, and so a predictor that is trained from
a training set that contains movies from a long span of years may in fact be learning from “inconsistent” data,
which results in poor performance.
The results show that for most of the configurations, the prediction is better than chance level, but we still
wouldn’t put our money on it. The only configuration we found that we can consider a good predictor
involved using the decision tree model with the information gain ratio heuristic, setting the parameters to
the base configuration parameter values with the further discretization of the ‘genre’ and 'age category of
main character’ attributes, and running this configuration on the smaller data set (from the last 15 years)
with an 80-20 split. We can assume the discretization of these two attributes helped to avoid overfitting, and thus
produced better results.
Chart 10 – prediction results of applying the decision tree model with the information gain ratio heuristic on the small
data set, when setting the parameters to the base configuration parameter values and with the further discretization of the ‘genre’ and 'age category of main character’ attributes
6.2. Comparing between the methods
As described above, for both data sets the best results were achieved when using the decision tree model with
the information gain ratio heuristic. Even so, throughout most of the configurations in both data sets, the
performance of the perceptron is similar in quality to that of the IG ratio heuristic of the decision tree.
It is difficult to make an accurate distinction between the methods because their performance varies
from measure to measure: the perceptron does relatively well in the “true positives”
measurement, whereas the IG ratio heuristic of the decision tree usually has more “true negatives”.
When comparing the two heuristics of the decision tree, we see that the information gain ratio heuristic
usually does better. This is probably due to the fact that the information gain heuristic has a bias for choosing
attributes with many possible values, a bias that is reduced in the information gain ratio heuristic because it
takes the number and size of branches into account when choosing an attribute. This can also explain why
when we created a stricter discretization of attribute values or removed attributes with many values, the
performance of information gain improved, and sometimes even exceeded that of information gain ratio.
It is interesting to see that adding “Boolean” inputs to the perceptron (which classically receives
“continuous” inputs) did not have a negative impact on its prediction abilities. In fact, removing these kinds
of attributes from the perceptron did have a negative effect.
6.3. Going Forward
Throughout our project, we identified several issues that should be taken under consideration, and that could
be used to make improvements:
1) Using even distribution to divide attribute values –
When discretizing the different attributes' values, we chose bins using fixed intervals. Dividing the
values according to their distribution could contribute to making better distinctions between them.
2) A single winner per year –
A main issue is that when we implemented our algorithms we simplified the problem and didn’t take into
account the fact that only one of the nominees wins each year. Our algorithms test each movie separately
and not as part of the group of nominees for that year. Our predictor in fact seeks the “Oscar winner
quality” and not the “relative Oscar winner quality” (the movie with the best chances of winning the
Oscars among a group of specific nominees).
This desired type of prediction does not fit the decision tree and binary perceptron models. Trying to
solve this problem with different methods that address this issue could improve our results and lower our
typically high “false positives” rate.
3) Improving the Movie object –
We could of course expand our data sources and also improve the plot analysis in order to find new
attributes that might make more essential distinctions between movies. For example, we could add more
attributes relating to other prestigious awards the nominated movies had won.
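The first suggested improvement, even-distribution (quantile) binning, could be sketched as follows; this was not implemented in the project:

```python
def quantile_boundaries(values, num_bins):
    """Pick bin boundaries so each bin holds roughly the same number of
    training values, instead of the fixed intervals we actually used."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // num_bins] for k in range(1, num_bins)]

def assign_bin(value, boundaries):
    """Index of the bin a value falls into, given sorted boundaries."""
    return sum(value >= b for b in boundaries)
```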
7. APPENDIX
7.1. Comparison results of small data-set
Figure 1 – The decision tree received using the gain ratio heuristic on the first data set with the base configuration. An example of the path for the movie “The Imitation Game”, which was wrongly predicted:
wonBafta: false → timesCastNominated: 1 (rounded to 0) → timesCastWon: 0 → imdbRating: 8.0 → monthOfRelease: OctToDec → true
Chart 2 - comparing results of perceptron with different number of training iterations
Training set size – 70% | Training set size – 90%
Chart 3 - comparing the results of different sizes of training sets
Results when removing ‘times cast nominated’ attribute | Results when removing ‘age category of main character’ attribute
Chart 4 – examples of results when removing attributes
High discretization (genre, runtime, imdb rating etc.) | Added discretization of ‘genre’ and ‘age category of main character’
Chart 5 – comparing results when changing the level of discretization
Example of weight vector of the perceptron –
'wonBafta': 72, 'Sci-Fi': -24, 'Comedy': -11, 'Drama': -11, 'Action': -11, 'History': -11, 'Family': -11,
'ageCategoryOfMainCharacter-20': 0, 'ageCategoryOfMainCharacter-30': 16, 'ageCategoryOfMainCharacter-45': -8,
'ageCategoryOfMainCharacter-60': -8, 'AprToJune': 14, 'OctToDec': 6, 'JulyToSept': -26, 'JanToMar': 6,
'timesCastNominated': 0, 'timesCastWon': 6, 'editorQual': 20, 'writerQual': 0, 'producerQual': 10,
'directorQual': -26, 'imdbRating': 9.0, 'war': -26, 'retarded': 50, 'showBiz': -26, 'runtime': 0, 'mpaa': 6
7.2. Comparison results of big data set
Training set size – 60% | Training set size – 80%
Chart 7 - comparing the results of different sizes of training sets of the complete data set
Results when removing ‘genre’ attribute | Results when removing ‘times cast nominated’ attribute
Chart 8 – examples of results when removing attributes in complete data set
With discretization of the ‘genre’ attribute | With discretization of ‘genre’ and ‘age category of main character’
Chart 9 – comparing results when changing the level of discretization in complete data set