Predicting the Popularity of Songs - CS221
Maika Isogawa ([email protected])Sam Masling ([email protected])
Monica Pan ([email protected])Stanford University
1 Overview1
The music industry has become one of the central hubs for fame, fortune, and artistic commentary.2
The success of a musical artist and their work depends heavily on how popular their songs are, or3
how widely their songs are heard. With the development of modern streaming services like Spotify,4
YouTube, and more, the ability of songs to go viral has changed. But what makes a song popular?5
Does the success of a song depend on the lyrics? The artist? Or perhaps the musicality of the song6
plays a larger role than both? This study aims to use various machine learning methods to discover7
what components of a song contribute its popularity and success.8
The success of a song is heavily dependent on an individual’s subjective tastes. To streamline what9
is categorized as a ’successful’ song, we use the Billboard Hot 100 [1]. This ranking is the music10
industry standard record chart in the United States. The rankings are determined by sales, radio play,11
and online streaming.12
The code for the entire project can be found in our GitHub repository [2]13
2 Task Definition14
The goal of our project is to classify whether or not a given song will make it onto the Billboard Hot15
100 chart. Essentially, we aim to predict the popularity of a song. Using historical data from decades16
of past songs, we aim to develop models that utilize both text and audio features of a song to generate17
a prediction. We designed a simple binary prediction: ’1’ if the song makes it to the Billboard Hot18
100, and ’0’ if it does not.19
The evaluation metric for the success of our project will be classification accuracy, an F1 score, and20
a confusion matrix. We have implemented three main methods: a binary linear classifier, logistic21
regression, and SVM.22
3 Infrastructure23
3.1 Data Sets24
To properly implement our algorithms, we cleaned, compiled, and restructured two large data sets.25
The first data set was compiled through the usage of the Billboard API[3]. The Billboard API was26
used to access the title and artist of every song that has made the weekly Top 100 since 2000 through27
present day. We ended up with a list of over 7000 unique songs that had made the weekly Billboard28
Hot 100 list. The songs in this data set were exclusively songs that made it to the Billboard Hot 100.29
Below are two example data points from this data set:30
• Artist: Ed Sheeran, Song Title: Perfect.31
• Artist: Fergie, Song Title: Fergalicious.32
The second data set is from Kaggle[4], an online community that allows users to find and publish a33
variety of data sets. This data set is a collection of over 55,000 unique songs, the majority of which34
did not make it onto the Billboard Hot 100. This data set included the artist, song name, a link to the35
song, and the song lyrics.36
Below are two example data points from this data set:37
• artist: ABBA, song: As Good As New, link: /a/abba/as+good+as+new_20003033.html, text:38
I’ll never know why I had to go Why I had to put up such a lousy rotten show Boy, I was39
tough, packing all my stuff Saying I don’t need you anymore, I’ve had enough And now,40
look at me standin...41
• artist: Aerosmith, song: My Fist Your Face, link:42
/a/aerosmith/my+fist+your+face_20004249.html, text: Wake up baby, what you in43
for Start the day upon your knees What you pissin’ in the wind for You musta snorted too44
much bleas East house pinball wizard Full tilt bozo plague Second floor t...45
3.2 Data Pre-processing46
In order to format the data in a way that would be useful for our project, we put our data through47
heavy pre-processing. We combined the two data sets into one large data set. Our master data set48
includes a label of whether or not the song made it to the Billboard Hot 100, in addition to a variety49
of text and audio features. To create our master data set, we used a variety of techniques to gather50
data that each data set was lacking.51
The first data set only included the artist and song title. To gather the rest of the data, we utilized the52
Genius API [6] to scrape the lyrics for the songs from the first data set. The second data included53
most of the text data that was necessary for our project. We removed the link to the song, and labeled54
whether or not each song made it to the Billboard Hot 100.55
3.3 Data Cleaning56
As we were compiling our master data set, many sections of the data were left empty or unusable.57
Much of these issues came from the Spotify and Genius APIs themselves. Thus, we had to be careful58
when reviewing our data set, and re-scraped data when necessary.59
To balance our data, we took an equal amount of songs that made it onto the Hot 100, and songs that60
did not. In addition, to account for differences in era and genre, we attempted to mitigate superfluous61
imbalances by selecting an equal amount of songs from similar eras and genres. For the Genius API,62
and the Spotify API [5] which we discuss later, we implemented a multithreading tool that allowed63
us to scrape data far more efficiently, saving us hours of dead time.64
3.4 Features65
For our basic features, we used the lyrics of the song (text features). We extracted word features from66
the lyrics and created a feature vector.67
For more advanced features, we utilized the Spotify API to gather metadata from each track. Our68
advanced features include: duration of song, key, acousticness, danceability, energy, instrumentalness,69
liveliness, loudness, speechiness, tempo, valence (musical features). These advanced features will70
give us insight into the audio qualities of the track, so that our classification will be more robust71
compared to the text data alone. A full table describing the audio features that are available through72
Spotify’s API can be found in the appendix [9]. The descriptions are officially defined by Spotify.73
3.5 Example Input74
Below are two examples of our data. Each song has both text and audio data, and our models were75
trained on this data.76
77
4 Approach78
In order to build our system, we want to understand the complex relationships between different79
aspects of a song and its popularity. Given that our data include many important features, such as80
lyrics and audio, it is hard to identify the real trends and patterns. We want the system to be able to81
make decisions based on them. A machine learning model serves the purpose really well. The weight82
of each feature would be improved as each song gets trained. After learning, the model would be able83
to predict the classification result based on the trained weights of the feature vector.84
There are many pros and cons using machine learning models. Machine learning is really good at85
reviewing a large amount of data and looking for hidden patterns that human cannot easily see between86
thousands of word and audio features. It is also a highly adaptive method as we are continuously87
improving our data set over time. However, the machine learning results are very susceptible to error.88
With the automation, machine learning models cannot identify inaccurate data or implementation89
mistakes by themselves. Therefore, some diagnosis and correction process are necessary in building90
the system.91
4.1 Baseline and Oracle92
The baseline of the project is training a linear classifier only with the lyrics of songs. The weights93
of the feature vector was trained based on the frequency of each words. The baseline algorithm is94
expected to give some statistical truth about the relationships between the lyrics and the popularity95
of songs. As a result, the baseline algorithm works pretty well: the overall accuracy is about 80%96
while the training error is as low as 12%. Therefore, it can be seen that training on lyrics of songs is a97
promising approach.98
The oracle is having a professional music producer listen to 50 songs randomly selected from our99
data set and determine if they will be popular. As a result, the music producer guessed 48 songs100
correctly out of 50 songs, giving a high accuracy of 96%. It is nearly impossible for us to implement101
an algorithm that analyzes rhythms or process lyrics like a human. However, we believe that it is102
important to measure the audio features of songs since the popularity of music itself heavily influences103
the success of songs nowadays.104
4.2 Models105
We explored several effective algorithms in three machine learning models. We utilized stochastic106
gradient descent in the binary linear classifier, regularized logistic regression, and support vector107
machine (SVM). A few implementation details are shown in the section below.108
To separate our full data set: the training set consisted of 80% of the total data, and the testing set109
consisted of 20% of the total data. The training and testing sets are selected using the train_test_split110
method from Scikit-learn library. Each model uses the same feature extractor. The feature vector is111
presented as a sparse matrix efficiently. The feature extractor extracts each word from the lyrics of112
the songs and maps it to its frequency.113
114
4.2.1 Binary Linear Classifier115
We drew inspiration from the Sentiment Analysis assignment. We used a linear predictor for the116
binary classification. The model calculates the score of the given inputs, defined as the weighted117
combination of features and predicts positive if the score is equal to or larger than 0. In order to get118
the optimized weight for each feature, we used stochastic gradient descent to minimize the hinge loss.119
4.2.2 Logistic Regression120
We implemented a regularized logistic regression using the Scikit-learn LinearRegression class. In121
order to solve the binary problem, we used the liblinear solver. The method DictVectorizer is used to122
transform the feature vector to valid sparse matrix inputs.123
4.2.3 SVM124
Implemented using the Scikit-learn SVM class. We explored four kernel SVM: linear, polynomial,125
Gaussian, and Sigmoid. The fit method was used to train, and the predict method was run on the126
testing set to predict the classification.127
4.3 Challenges128
One challenge was building the master data set. Not only were there technical challenges, but there129
were conceptual challenges as well. On the technical side, we had problems with making multiple130
request to the Genius and Spotify servers. As we were running a large data set, we had to make131
thousands of requests. Not only did this take a long time, but we also had issues with disconnection,132
barring by the website, and other credential issues. Many of these issues were likely on the company’s133
end, and we were able to work around them by utilizing multiple techniques such as multithreading,134
and using multiple account credentials. Conceptually, since the success of a song is not binary, it135
was difficult to classify a song as ’not successful’ for the purpose of our project. A popular song that136
barely missed the Billboard Hot 100 would be classified as not-successful. With a binary classifier,137
we had no way of representing the range in successes, even within the songs that did make it to the138
Billboard hot 100.139
Another challenge we encountered was that the models with text features only gave much better140
results than with both text and audio features. This seemed counter-intuitive because we expected141
that feeding the classifiers with more information would certainly improve the results. It was possibly142
caused by the noise in the data and the fact that modern pop songs put more emphasis on the music143
instead of lyrics. We continuously improved the data set by cleaning out some unnecessary data144
points. The final data set only contained songs from 2000 to 2019.145
5 Literature Review146
Prediction tasks are common in machine learning. Thus, there is already a lot of literature on147
predicting the popularity of songs, or predicting hit songs. These projects ask questions similar148
to ours: are there certain characteristics for hit songs? What has the largest influence on a song’s149
success? Can old songs predict the popularity of new songs? Music is likely a common area of150
interest, as it adds another layer of complexity beyond text and natural language alone.151
Here we briefly review two projects who’s goals are very similar to ours.152
The first project is "Song Popularity Predictor" by Mohamed Nasreldin, Stephen Ma, Eric Dailey,153
and Phuc Dang [7]. Like we did, this project utilized Spotify’s API to gather metadata on tracks.154
They also defined the "success" of a song by whether or not it made it on to the Billboard Hot 100155
chart. This project used models like logistic regression and SVM as we did, but they also looked at156
models like KNN, and a decision tree. The most accurate model predicted popular songs with a 68%157
accuracy.158
The second is the article, "Predicting Hit Songs with Machine Learning," by Minna Reiman and159
Philippa Ornell [8]. This project theorized that hit songs had features in common that made them160
appealing to a majority of people. They also utilized Spotify’s API. The other models used for this161
project were K-Nearest Neighbors and Gaussian Naive Bayes. Their most accurate model was their162
Gaussian Naive Bayes model, with 60.17% accuracy. This project explored the changes in certain163
music features over time. This is further analysis of the data and results that would be great to include164
in a more rigorous approach of this problem.165
Compared to projects similar to ours, our main differential is our data set and our balance of text166
vs audio features. Our data set is well balanced, and it is also modernized to contain songs from a167
similar age. These are components that may have led to a higher accuracy rate with our models.168
6 Error Analysis169
The main metrics for our classification task are accuracy, an F1 score, and a confusion matrix. For170
each model, there are results for basic text features and results for both text and audio features. Our171
final results for each of the models are found below.172
173
174
175
176
For text features alone, the binary linear classifier, logistic regression, and most of SVC models177
performed equally well. Among them, the logistic regression performed the best with an 87%178
accuracy and an F1 score of 0.89. The polynomial SVC model did not perform as well as the rest of179
models but still had an accuracy of 61%.180
For the text and audio features, all classifiers produced slightly more accurate results, and performed181
better than with text features alone. The most significant improvement is in the results of the bi-linear182
classifier. The accuracy increased from 81% to 89% and the F1 score from 0.86 to 0.91. The increases183
in the results of other models are within 1%. One of the possible reasons why the results changed184
very little is that the feature vector mostly consisted of word features. Since we only have 13 audio185
features but thousands of unique words in the feature vector, the predictions were heavily based on186
lyrics itself.187
Below is a comparison of the prediction accuracy of all of the models with text features and with text188
and audio features.189
190
7 Conclusion and Next Steps191
To predict the popularity of a song, we built and explored several machine learning models: bi-linear192
classifier, logistic regression and SVM. We explored different aspects of songs, lyrics and many audio193
features in order to understand which feature affects the popularity the most. Our final results showed194
that focusing on both lyrics and audio features generally gave us a better classifier than if we only195
had text features. The results were expected because the musical features of a song should impact the196
popularity of it. We also found that the bi-linear classifier is the most successful among all models197
we tested. The high accuracy and F1 score showed that it is a very promising prediction tool of the198
success of a song. It also yielded a clear statistical relationship between the performance of the model199
and different features.200
As the literature review and the multitude of API tools exemplify, music and music success are201
popular topics for research. Music seems to be able to convey meaning and feeling, similar to words,202
but without the semantics. This phenomenon lends itself well to machine learning techniques, as203
models may reveal things about music that we may not know or yet understand. Our project could204
lend itself to be developed into a more sophisticated one, but nonetheless, the techniques we learned205
in this project will be useful to apply in various problems that we may face in the future.206
8 Appendix207
[1] https://www.billboard.com/charts/hot-100208
[2]https://github.com/smasling/FinalCS221Project209
[3]https://github.com/guoguo12/billboard-charts210
[4]https://www.kaggle.com/mousehead/songlyrics211
[5]https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/212
[6]https://docs.genius.com/213
[7]https://towardsdatascience.com/song-popularity-predictor-1ef69735e380214
[8]https://pdfs.semanticscholar.org/e6cc/edb50d2c2b01bca108cb090943e86fb58135.pdf215
[9]216
217