Running head: APPLYING PSYCHOLOGICAL MODELS TO MUSIC RECOMMENDATION
Exploring the Uses of Psychological Models of Generalization in Music Recommendation
Systems
Tiffany Hwu
University of California, Berkeley
Abstract
Recommender systems are integrated into many internet-based services, including
movie recommendations, shopping suggestions, and automatic playlist generation.
Although there are a number of effective recommender models inspired by traditional
machine learning methods, the use of psychological models in recommendation is far less
explored. We propose that the process of finding similar music tracks parallels the
cognitive task of generalization, which could potentially be used to aid in music playlist
recommendation. In generalization, stimuli are defined within a psychological space, in
which previously experienced stimuli are used to create generalizations about newly
presented stimuli. Similarly, a person who is trying to construct a playlist will use their
prior musical knowledge or intuitions to find songs of similar taste. The main objective of
this work is to evaluate the effectiveness of applying psychological models to large scale
online datasets which contain the listening histories of users. The models are tested both
qualitatively and quantitatively by holding out portions of the dataset and evaluating how
well they can predict the missing information. Using common metrics from information
retrieval, we explore the advantages and differences of using psychological models over
traditional machine learning models in recommender systems. Additionally, we provide an
example of how large existing databases of human behavior can be used to conduct
psychology experiments in a robust and affordable manner.
Introduction
Recommendation systems, also known as recommender systems, have had a wide
variety of uses in recommending music, shopping items, and movies, to the benefit of
websites hoping to maximize profit and customers searching for items that meet their needs
and preferences. The basic recommender system consists of a collection of items described
by content and user ratings, accompanied by a model that uses this data to generate
predictions on which items a particular user will prefer. There are two main approaches to
recommendation: the collaborative filtering approach, which relies on user ratings for
prediction, and a content-based approach, which uses data on features innate to the items,
such as the audio features of songs. Specifically within the collaborative filtering approach,
there are memory-based algorithms which compare users and their preferred items, and
model-based algorithms which use the data to train models and learn latent
representations of the users and items (Su & Khoshgoftaar, 2009).
Music recommendation systems are a subset of recommendation systems containing a
collection of tracks and user preferences for those tracks, which are either explicit in the
form of numerical ratings or implicit in other behavioral data. With websites such as
Last.fm (CBS Interactive, n.d.) containing vast amounts of information on the contents of
user playlists, we are able to create and test our own recommendation models.
The process of providing recommendations based on user and item data can be
viewed as a task of finding which stimuli are most similar to each other and which
conceptual groups of stimuli can be formed. In other words, the process of recommendation
can be viewed as a form of generalization as described by Shepard (1987). This different
perspective on recommendation models naturally presents the question of whether we can
use the field of psychology to augment current recommendation algorithms. A number of
computational models of generalization could readily be used as recommendation
algorithms and contrasted with traditional models to see what they may contribute. For
instance, Tenenbaum and Griffiths (2001) suggest that human generalization can be
captured within a simple Bayesian framework that is able to generalize from an arbitrary
number of consequential stimuli and with an arbitrary representational structure.
This Bayesian generalization model is an example of the many topics of psychology
relevant to our topic of recommendation.
Such an exploration would not only benefit the world of recommender systems, but
also serve as an example of incorporating large preexisting datasets into psychology
research. The availability of music playlist data makes it a perfect medium for observing
psychological trends. Rather than the traditional paradigm of hand-running experiments
on a small pool of subjects, we can move toward analyzing large preexisting records of
human behavior, which buffers against the risks of small, homogeneous testing
populations and expensive experimental procedures. With these
motivations, we compare and contrast various psychological and non-psychological models
in music recommendation.
This paper begins by providing background on the Bayesian generalization
framework. We then extend this model and other models to the task of music
recommendation, and describe the datasets and methods used to compare them. Finally,
we discuss the ways in which recommendations using psychological models can provide far
different results from recommendations of more traditional models.
Background
Bayesian Generalization. The Bayesian generalization framework (Tenenbaum &
Griffiths, 2001) has been successfully used in a variety of different psychological domains.
The framework consists of a query X of positively seen examples and a hypothesis space
H, a collection of hypotheses h defined by collections of positively seen examples. The
likelihood of any one hypothesis is defined by
P(X | h) = 1/|h|^n   if x^(j) ∈ h for all j = 1, …, n
           0         otherwise    (1)

which demonstrates the size principle, the idea that hypotheses of smaller size (those
containing fewer examples) are more likely than hypotheses of larger size. Here |h| is the
size of hypothesis h and n is the number of examples in the query. To find the posterior
probability P(h | X) of a hypothesis being correct, we
apply Bayes’ rule. The prior, P (h), is adjusted according to the particular task that is
being modeled, as shown here:
P(h | X) = P(X | h) P(h) / ∑_{h′ ∈ H} P(X | h′) P(h′)    (2)
Once there is a posterior probability of each hypothesis being correct, we can
determine whether a new object y is part of a concept C by applying the equation below.
C represents the concept embodied by our query X, and P(y ∈ C | h) is either 1 or 0
depending on whether the element y exists in the hypothesis:

P(y ∈ C | X) = ∑_{h ∈ H} P(y ∈ C | h) P(h | X)    (3)
Abbott, Austerweil, and Griffiths (2012) apply a Bayesian generalization framework
to large-scale word learning. With a hypothesis space constructed from WordNet (Miller,
1995), the model is able to learn the taxonomic relationships between words. The success
of the Bayesian generalization framework in this domain is the main motivation for
expanding into other applications such as music playlist recommendation.
Recommendation as Generalization
Recommendation can be viewed as a generalization task in which a group of items
exists in a user’s history and the goal is to determine which other items would belong in a
similar category. In this section we discuss how to apply models of generalization to this
new domain.
Datasets
Our primary method for exploring psychological and traditional models follows the
format of the Million Song Dataset Challenge, a music recommendation challenge that
provides half of the listening histories of a large collection of users and asks contestants to
predict the missing half of the data (McFee, Bertin-Mahieux, Ellis, & Lanckriet, 2012). Since the contest was
hosted by Kaggle in 2012 and is no longer accepting submissions, the missing half of the
data has been released, allowing us to calculate scores that our models would have achieved
if entered in the competition. While all contestants have access to advanced audio features
of each song through the Million Song Dataset, our solution relies mainly on user ratings.
The listening histories for 110,000 users are provided in the form of triplets consisting
of user id, song id, and playcount. A dataset containing the listening histories of an
additional 1 million users is available in the Echo Nest Taste Profile Subset, which follows
the same format.
Additionally, we repeat the same procedure on the AOTM-2011 dataset (McFee &
Lanckriet, 2012), a large dataset compiled from Art of the Mix, which is a website where
users post their favorite playlists. The playlist data was separated into equally-sized
training and testing sets, split randomly. While both datasets draw from the Million Song
Dataset, the AOTM-2011 dataset consists of consciously-selected playlists as opposed to
entire listening histories as in the MSD challenge.
Constructing a Hypothesis Space
The testing and training datasets were converted into matrices with columns
representing users and rows representing songs. The elements of the matrix are ‘1’ if the
song has been played by the user at least once or is contained in a playlist, and ‘0’
otherwise. This process resulted in two binary matrices for each dataset, as summarized
below.
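As an illustration of this binarization step, the following sketch builds a song-by-user binary matrix from hypothetical (user id, song id, playcount) triplets; all identifiers are made up:

```python
# Hypothetical (user_id, song_id, playcount) triplets in the format described above;
# all ids are made up for illustration.
triplets = [
    ("u1", "s1", 3), ("u1", "s2", 1),
    ("u2", "s2", 5), ("u2", "s3", 2),
]

users = sorted({u for u, _, _ in triplets})
songs = sorted({s for _, s, _ in triplets})
u_idx = {u: j for j, u in enumerate(users)}
s_idx = {s: i for i, s in enumerate(songs)}

# Rows are songs and columns are users; an entry is 1 if the user
# played the song at least once, and 0 otherwise.
M = [[0] * len(users) for _ in songs]
for u, s, count in triplets:
    if count > 0:
        M[s_idx[s]][u_idx[u]] = 1

print(M)  # → [[1, 0], [1, 1], [0, 1]]
```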
Models
The primary task of our work involves applying psychological models on the visible
half of user listening histories to see how well they can predict the songs in the missing
half. All models will be applied to the binary matrices described above. In addition, each
model will be tested along two conditions: query size (the number of songs in the visible
half of a user's history) and popularity threshold (the matrix filtered to contain only
songs above a specified total playcount).
Music Recommendation as Bayesian Inference. To apply this framework to
our data, we treat each column as a hypothesis. The corresponding query X of
positively seen examples is a collection of songs representing the visible half of a particular
user’s listening history. The likelihood is as described above, with hypotheses of fewer
songs being more likely.
The prior in this case is assigned an Erlang distribution, representative of the
intuition that intermediate-sized playlists are more likely than small or large playlists. This
is described by P(h) ∝ (|h|/σ²) e^(−|h|/σ), where σ was hand-selected as 10.
One further adjustment was made to the likelihood calculation, allowing for an error
term ε = 1 × 10⁻¹⁵ accounting for noise in the dataset. This allows a likelihood to be
calculated even if not all songs in a query are members of a particular hypothesis. This is
expressed in
P(d | h) = (1/|h|)(1 − ε) + ε   if d ∈ h
           ε                    otherwise    (4)

P(X | h) = ∏_{d ∈ X} P(d | h)    (5)
Finally, we can use the generalization probability of Equation 3 to create a ranking of all
songs in the dataset, listed in order of how likely each song is to appear in the missing
half of the user's listening history.
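A toy sketch of this pipeline, combining the noisy likelihood of Equations 4-5, the Erlang prior, and the final ranking, might look as follows, computed in log space for numerical stability (the toy hypotheses and catalog are assumptions):

```python
import math

EPS = 1e-15   # the error term ε from Equation 4
SIGMA = 10.0  # Erlang prior parameter σ, hand-selected in the text

def log_likelihood(query, h):
    """Equations 4-5: per-song likelihood with a noise term, multiplied over the query."""
    total = 0.0
    for song in query:
        if song in h:
            total += math.log((1.0 / len(h)) * (1.0 - EPS) + EPS)
        else:
            total += math.log(EPS)
    return total

def log_prior(h):
    """Erlang prior: P(h) proportional to (|h|/σ²) exp(−|h|/σ)."""
    return math.log(len(h) / SIGMA ** 2) - len(h) / SIGMA

def rank_songs(query, hypotheses, catalog):
    """Rank every catalog song by its generalization probability (Equation 3)."""
    log_post = {name: log_likelihood(query, h) + log_prior(h)
                for name, h in hypotheses.items()}
    m = max(log_post.values())
    post = {name: math.exp(lp - m) for name, lp in log_post.items()}
    z = sum(post.values())
    score = {song: sum(p for name, p in post.items() if song in hypotheses[name]) / z
             for song in catalog}
    return sorted(catalog, key=lambda s: -score[s])

# Toy data: each hypothesis is another user's visible history (ids are made up).
H = {"u1": {"a", "b", "c"}, "u2": {"a", "b"}, "u3": {"x", "y", "z"}}
ranking = rank_songs({"a"}, H, {"a", "b", "c", "x", "y", "z"})
print(ranking[:3])
```

The error term keeps the likelihood of a non-matching hypothesis small but nonzero, so histories that contain most, but not all, of the query still contribute to the ranking.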
Exemplar/Prototype Theory. The exemplar and prototype models perform
categorization tasks through probability density estimates (Ashby & Alfonso-Reese, 1995)
as described below.
Prototype theory is the idea that some objects are more prototypical of a category
than others and can be used as the basis of comparison when deciding whether or not a
new stimulus belongs to the category. A formal model based on this idea can be stated as
Equation 6, where dist is the Hamming distance between the candidate y and the
prototype y_proto constructed from the query, and λ_p is a hand-picked value (0.15 in
this case) used to optimize results. The score is thus calculated by

P_score(y) = exp{−λ_p dist(y, y_proto)}    (6)
Exemplar theory is the idea that all instances in memory belonging to a certain
category are used in the process of comparison with a new stimulus. The formalized model
of this idea is similar to that of prototype theory, except that it uses the sum of
comparisons with all items in a category rather than a single prototype. As with the
prototype model, λ_e is a hand-picked value of 0.15. The score is found through

E_score(y) = ∑_{x_j ∈ X} exp{−λ_e dist(y, x_j)}    (7)
The models described above can be applied as they are to the binarized matrices of
user listening history. We can interpret the prototype model as a construction of a
prototypical song representing all of the songs in a query. We can then rank all songs by
similarity to the prototype. The exemplar model compares all songs in the dataset to each
song in the query and sums up the comparisons.
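A minimal sketch of both scoring rules, assuming toy binary vectors and taking the prototype to be an element-wise majority vote over the query (the text does not specify how the prototype is constructed, so the majority vote is an assumption):

```python
import math

LAM = 0.15  # the hand-picked λ value from the text

def hamming(u, v):
    """Hamming distance between two equal-length binary vectors."""
    return sum(a != b for a, b in zip(u, v))

def prototype_score(y, query_vectors):
    """Equation 6: similarity of candidate y to a single prototype of the query.
    The prototype here is an element-wise majority vote (an assumption)."""
    proto = [1 if 2 * sum(col) >= len(query_vectors) else 0
             for col in zip(*query_vectors)]
    return math.exp(-LAM * hamming(y, proto))

def exemplar_score(y, query_vectors):
    """Equation 7: summed similarity of candidate y to every song in the query."""
    return sum(math.exp(-LAM * hamming(y, x)) for x in query_vectors)

# Toy binary song rows (columns would be users in the paper's matrices).
query = [[1, 1, 0, 0], [1, 1, 1, 0]]
near, far = [1, 1, 0, 0], [0, 0, 0, 1]
assert prototype_score(near, query) > prototype_score(far, query)
assert exemplar_score(near, query) > exemplar_score(far, query)
```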
Baseline Models
Performance of the psychological models is measured alongside non-psychological
models as a standard of comparison.
Bayesian Sets. The Bayesian sets model is a machine learning method used to
categorize elements into sets (Ghahramani & Heller, 2005). It can be applied very
efficiently with a single matrix multiplication and has seen success in modeling judgments
of representativeness in images (Abbott, Heller, Ghahramani, & Griffiths, 2011). The
Bayesian sets score is found by the following:

score(x) = p(x | H_c) / p(x)    (8)
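For binary data, the Bayesian sets score can be computed in closed form with independent Beta priors per feature; the sketch below follows that construction with hand-picked hyperparameters (alpha = beta = 0.5 is an assumption, not a value from the text):

```python
import math

def bayesian_sets_score(x, cluster, alpha=0.5, beta=0.5):
    """Equation 8, score(x) = p(x | H_c) / p(x), for binary features with an
    independent Beta(alpha, beta) prior per feature (Ghahramani & Heller, 2005).
    `cluster` holds the binary vectors of the set; `x` is the candidate vector."""
    n = len(cluster)
    log_score = 0.0
    for j, x_j in enumerate(x):
        s = sum(item[j] for item in cluster)      # count of feature j in the cluster
        a_post, b_post = alpha + s, beta + n - s  # posterior pseudo-counts
        if x_j:
            log_score += math.log(a_post / (a_post + b_post)) \
                       - math.log(alpha / (alpha + beta))
        else:
            log_score += math.log(b_post / (a_post + b_post)) \
                       - math.log(beta / (alpha + beta))
    return math.exp(log_score)

# Toy binary vectors: a candidate that matches the cluster scores above 1,
# while an unrelated candidate scores below 1.
cluster = [[1, 1, 0], [1, 0, 0]]
assert bayesian_sets_score([1, 1, 0], cluster) > 1.0
assert bayesian_sets_score([0, 0, 1], cluster) < 1.0
```

Because the log score is linear in x, scoring an entire song catalog reduces to the single matrix multiplication mentioned above.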
TF-IDF. Term frequency-inverse document frequency (TF-IDF) is a common
technique for determining the uniqueness of a term within a document. A term is
devalued if it appears in many documents, and valued if it appears frequently within
specific documents. TF-IDF is purportedly used in many commercial music
recommendation algorithms (Mims, 2011) and thus serves as a good standard of
comparison. The two equations below summarize our use of TF-IDF:
score(h) = ∑_{x ∈ X} TF(x, h) · IDF(x)    (9)

P(y ∈ C | X) = ∑_{h ∈ H} P(y ∈ C | h) score(h)    (10)
Term frequency (TF) is simply the frequency of a term x in a document h and inverse
document frequency (IDF) is the reciprocal of the frequency of documents containing the
term. In our case, users are analogous to documents and terms are analogous to songs. We
compute the sum of TFIDF scores for each song in the query X and use a probability
generalization scheme identical to the one used in the Bayesian generalization framework.
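A rough sketch of this scheme on binary listening data; since each song appears at most once in a history, the term frequency is taken here as 1/|h| for songs in the history h (an assumption for illustration):

```python
import math

def user_scores(query, users):
    """Equation 9 sketch: score each user ('document') by the summed TF-IDF of the
    query songs ('terms'). With binary play data a song appears at most once, so
    TF is taken as 1/|h| for songs in the history h (an assumption)."""
    n_users = len(users)
    def idf(song):
        df = sum(1 for h in users.values() if song in h)
        return math.log(n_users / df) if df else 0.0
    return {name: sum(idf(song) / len(h) for song in query if song in h)
            for name, h in users.items()}

def recommend(query, users, catalog):
    """Equation 10 sketch: rank candidate songs by the scores of users containing them."""
    scores = user_scores(query, users)
    song_score = {song: sum(s for name, s in scores.items() if song in users[name])
                  for song in catalog if song not in query}
    return sorted(song_score, key=lambda s: -song_score[s])

# Toy histories (ids made up): users who share the query song pull their
# other songs up the ranking.
users = {"u1": {"a", "b"}, "u2": {"a", "c"}, "u3": {"d"}}
print(recommend({"a"}, users, {"a", "b", "c", "d"}))
```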
Popularity. As our simplest baseline, we rank all songs by total playcount.
Songs with higher playcounts are recommended more highly across all users. This serves
as a sanity check, since any reasonably effective model should fare no worse than this.
Metrics
The submission and scoring process is as follows. A submission uses the visible half of
the data in whatever way it wishes, and returns a list of songs for each user, ranked in
order of how likely the song is to be in the missing half of the data for a specific user. Just
as in the Million Song Dataset Challenge, we use four standard metrics in information
retrieval to compare the ranked output y for user u with the actual hidden data, which is
represented in the matrix M.
Precision at 10. Precision is a common metric of information retrieval,
representing the proportion of correct items in a top-k ranking. The particular form of
precision we use here is the precision at rank 10 of the ranked list of relevant songs,
calculated by
P_k(u, y) = (1/10) ∑_{j=1}^{10} M_{u, y(j)}    (11)
Truncated mAP. The main metric used to compare submissions in the
competition is the mean average precision (mAP) of the ranked song suggestions, with a
cutoff τ of the first 500 songs. Here n_u is the number of songs in user u's hidden
listening history. Average precision is found by

AP(u, y) = (1/n_u) ∑_{k=1}^{τ} P_k(u, y) · M_{u, y(k)}    (12)
while mean average precision is simply the mean of these AP scores:
mAP = (1/m) ∑_u AP(u, y_u)    (13)

where m is the number of users.
DCG. Discounted cumulative gain (DCG) rewards relevant documents for having
a high ranking and penalizes them for having a low ranking. We compute the DCG only
up to the 10th element in the ranking, so n = 10. The equation for this is

DCG(n) = ∑_{j=1}^{n} (2^{relevant(j)} − 1) / log(1 + j)    (14)
MRR. Mean reciprocal rank (MRR) is simply the mean of the reciprocal ranks of
all items in the query set, where n is the number of items in the query and rank(j) is the
rank at which item j appears in the output. Mean reciprocal rank is calculated by

MRR(n) = (1/n) ∑_{j=1}^{n} 1/rank(j)    (15)
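The precision-based metrics can be sketched as follows; this AP normalizes by the number of hidden songs capped at τ, a common convention for this style of challenge (an assumption for illustration):

```python
def precision_at_k(ranked, hidden, k):
    """Equation 11 generalized to any k: fraction of the top k found in the hidden half."""
    return sum(1 for song in ranked[:k] if song in hidden) / k

def average_precision(ranked, hidden, tau=500):
    """Equation 12 sketch: truncated AP, normalized here by the number of hidden
    songs capped at τ (a common convention, assumed for this sketch)."""
    if not hidden:
        return 0.0
    total = sum(precision_at_k(ranked, hidden, k)
                for k, song in enumerate(ranked[:tau], start=1) if song in hidden)
    return total / min(len(hidden), tau)

def mean_average_precision(rankings, hiddens, tau=500):
    """Equation 13: the mean of the per-user AP scores."""
    return sum(average_precision(r, h, tau)
               for r, h in zip(rankings, hiddens)) / len(rankings)

ranked = ["a", "b", "c", "d"]   # a model's ranked output (toy ids)
hidden = {"a", "c"}             # the user's held-out songs
print(precision_at_k(ranked, hidden, 2))  # → 0.5
print(average_precision(ranked, hidden))  # hits at ranks 1 and 3
```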
Results
Since all of the metrics used show similar trends, we will use mAP for illustrating
results, as it was the main metric for the Million Song Dataset Challenge. Full results can
be found in the appendix.
*NOTE: The metrics have not yet been run on prototype and exemplar models.
Varying Popularity Threshold
All models show an increase in performance as the popularity threshold increases,
as presented in Figures 1 and 2. This makes sense, as the precision scores should increase
when there is more listening history data available for each song. For the Million Song
Dataset Challenge, the Bayesian generalization model appears to outperform TF-IDF,
while the reverse is true for the AOTM-2011 dataset.
Varying Query Size
For the Million Song Dataset Challenge, the models perform poorly on query sizes
1-10, and perform much better when the full query sizes are used. For the AOTM-2011
dataset, Bayesian generalization and TF-IDF show an increase in performance as the query
size increases. The Bayesian generalization framework shows a particular strength in
generating correct recommendations when the query size is small. Interestingly, Bayesian
sets performs worse as the query size increases.
Qualitative Comparison
A quick glance at the actual content of the recommendations provided by different
models tangibly shows the large difference in results. In a sample query of 3 Michael
Jackson songs, the Bayesian generalization model has inferred the theme of our query and
suggested only Michael Jackson songs. Bayesian sets appears to have guessed themes of
'80s music and Halloween (likely from Michael Jackson's 'Thriller'). TF-IDF may have
picked up on these but also recommends a few songs which have a less clear relation to
Michael Jackson. Further, if we provide just one song, ’Thriller’, in our query for Bayesian
generalization, it directly picks up the Halloween theme and offers several Halloween songs.
Discussion
The qualitative and quantitative results both show that Bayesian generalization can
often provide insightful recommendations that more traditional models overlook. A
particular strength of the Bayesian generalization framework is its ability to detect the
theme of a query whether given one song (e.g., 'Thriller') or three songs (e.g., three
Michael Jackson songs). This suggests that applying such psychological models to music
recommendation could perhaps provide a more human-like quality to current
recommendation systems. The difference in trends between the scores on the MSD taste
profile and the scores on the AOTM-2011 dataset shows the importance of selecting the
proper model for the task. A proposed explanation of this discrepancy lies in the fact that
the AOTM-2011 dataset consists of users selecting songs that they believe would go well
together. Thus, a Bayesian generalization model might do well at modeling a real
human who makes recommendations based on their knowledge of particular song
combinations. Additionally, since the MSD taste profile hypotheses contain entire listening
histories and are less thematic, smaller queries may be insufficient to generate good
recommendations, as suggested by the query size results.
These results additionally call into question the use of traditional information
retrieval metrics when thinking about the problem of recommendation. A typical
quantitative approach in machine learning consists of the methods we used, in which half
of the dataset is removed and then recovered. While our results can detect a few trends
across increasing popularity thresholds and query sizes, many phenomena, such as the
poor performance of Bayesian sets with increasing query size, are hard to deconstruct. In
contrast, a qualitative
survey of a sample query leads to clear contrasts amongst the models and a good intuition
for which model would be best for the task.
Conclusion
The world of music recommendation and recommender systems in general shows
strong use of traditional machine learning techniques and metrics. We have seen that
applying psychological models of generalization can contribute significantly to current
systems and perhaps provide more insight into how a human would recommend songs
versus how a typical machine learning algorithm would recommend songs.
References
Abbott, J. T., Austerweil, J. L., & Griffiths, T. L. (2012). Constructing a hypothesis space
from the web for large-scale Bayesian word learning. In Proceedings of the 34th
Annual Conference of the Cognitive Science Society.
Abbott, J. T., Heller, K. A., Ghahramani, Z., & Griffiths, T. L. (2011). Testing a Bayesian
measure of representativeness using a large image database. In NIPS (Vol. 24, pp.
2321–2329).
Ashby, F. G., & Alfonso-Reese, L. A. (1995). Categorization as probability density
estimation. Journal of Mathematical Psychology, 39(2), 216–233.
McFee, B., Bertin-Mahieux, T., Ellis, D. P. W., & Lanckriet, G. R. (2012). The million
song dataset challenge. In Proceedings of the 21st International Conference
Companion on World Wide Web (pp. 909–916). Retrieved from
http://cosmal.ucsd.edu/~gert/papers/msdc.pdf
CBS Interactive. (n.d.). Last.fm. Retrieved April 16, 2014, from http://www.last.fm
Ghahramani, Z., & Heller, K. A. (2005). Bayesian sets. In NIPS (Vol. 2, pp. 22–23).
McFee, B., & Lanckriet, G. R. (2012). Hypergraph models of playlist dialects. In ISMIR
(pp. 343–348).
Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the
ACM, 38(11), 39–41. (Princeton University dataset available at
http://wordnet.princeton.edu)
Mims, C. (2011). How iTunes Genius really works. Technology Review. Retrieved from
www.technologyreview.com/view/419198/how-itunes-genius-really-works/
Shepard, R. N. (1987). Towards a universal law of generalization for psychological science.
Science, 237 , 1317-1323.
Su, X., & Khoshgoftaar, T. M. (2009). A survey of collaborative filtering techniques.
Advances in Artificial Intelligence, 2009(421425).
Tenenbaum, J., & Griffiths, T. L. (2001). Generalization, similarity, and Bayesian
inference. Behavioral and Brain Sciences, 24, 629–641.
                     train set          test set
matrix dimensions    286213 × 110000    286213 × 110000
avg songs per user   13.1903            1.2358
avg users per song   3.7568             0.3520

Table 1
Summary of MSD Challenge Dataset
                     train set       test set
matrix dimensions    1818 × 13514    1818 × 13514
avg songs per user   9.8113          8.4004
avg users per song   1.3199          1.1301

Table 2
Summary of AOTM-2011 Dataset
Psychological Models Traditional Models
Bayesian Generalization Bayesian Sets
Exemplar TF-IDF
Prototype Popularity
Table 3
Summary of models being tested
Bayesian Generalization TF-IDF Bayesian Sets
*Bad *Smooth Criminal The Monster Mash
*I Just Can’t Stop Loving You Tiny Dancer *Smooth Criminal
*Smooth Criminal Like A Prayer Nightmare on My Street
*Man in the Mirror *Let’s Get It On *Bad
*Wanna Be Startin’ Somethin’ I Believe in a Thing Called Love *Can You Feel It
*PYT *Money Love is a Battlefield
*You Rock My World Kiss Every Day is Halloween
*Baby Be Mine *Bad *I Just Can’t Stop Loving You
*Why You Wanna Trip on Me? Dead Man’s Party Shake Your Groove Thing
*Speechless Halloween *Stranger in Moscow
Table 4
Rankings for query: Billie Jean, Thriller, The Way You Make Me Feel. Michael Jackson
songs are denoted by asterisks (*).
Bayesian Generalization
Halloween
The Monster Mash
Werewolves of London
I Put a Spell on You
*Billie Jean
Dead Man’s Party
Girls Just Wanna Have Fun
*Smooth Criminal
Ghost Town
*The Way You Make Me Feel
I’m a Mummy
Video Killed the Radio Star
Table 5
Rankings for Bayesian generalization for query: Thriller. Michael Jackson songs are
denoted by asterisks (*).
Figure 1. The mean average precision (mAP) of each model on the Million Song Dataset
Challenge as a function of popularity threshold.
Figure 2. The mean average precision (mAP) of each model on the AOTM dataset as a
function of popularity threshold.
Figure 3. The mean average precision (mAP) of each model on the Million Song Dataset
Challenge as a function of query size.
Figure 4. The mean average precision (mAP) of each model on the AOTM dataset as a
function of query size.
Appendix A: Full Results - Varying Query Size
AOTM-2011 Dataset
N 1 2 3 5 10 all
Bayesian Generalization 0.0025 0.0029 0.0033 0.0037 0.0039 0.0039
TF-IDF 0.0029 0.0026 0.0029 0.0031 0.0032 0.0032
Bayesian Sets 0.0029 0.0015 0.0014 0.0017 0.0016 0.0016
Popularity 0.0033 0.0033 0.0033 0.0033 0.0033 0.0033
Prototype - - - - - -
Exemplar - - - - - -
Table 6
P at 10
N 1 2 3 5 10 all
Bayesian Generalization 0.0029 0.0032 0.0033 0.0037 0.0039 0.0039
TF-IDF 0.0024 0.0024 0.0027 0.0030 0.0031 0.0031
Bayesian Sets 0.0032 0.0015 0.0015 0.0015 0.0015 0.0015
Popularity 0.0035 0.0035 0.0035 0.0035 0.0035 0.0035
Prototype - - - - - -
Exemplar - - - - - -
Table 7
mAP
N 1 2 3 5 10 all
Bayesian Generalization 6.5692 6.5703 6.5720 6.5740 6.5749 6.5749
TF-IDF 6.5722 6.5685 6.5700 6.5707 6.5709 6.5709
Bayesian Sets 6.5744 6.5631 6.5630 6.5639 6.5633 6.5631
Popularity 6.5767 6.5767 6.5767 6.5767 6.5767 6.5767
Prototype - - - - - -
Exemplar - - - - - -
Table 8
DCG
N 1 2 3 5 10 all
Bayesian Generalization 0.0025 0.0027 0.0028 0.0030 0.0031 0.0031
TF-IDF 0.0020 0.0020 0.0022 0.0024 0.0025 0.0025
Bayesian Sets 0.0029 0.0012 0.0013 0.0013 0.0013 0.0013
Popularity 0.0032 0.0032 0.0032 0.0032 0.0032 0.0032
Prototype - - - - - -
Exemplar - - - - - -
Table 9
MRR
Million Song Dataset Challenge
N 1 2 3 5 10 all
Bayesian Generalization 2.8425e-05 1.4213e-04 5.6850e-05 0 0 1.3758e-02
TF-IDF 5.6850e-05 5.6850e-05 2.8425e-05 0 2.4417e-02 4.4940e-02
Bayesian Sets 5.6850e-05 5.6850e-05 2.8425e-05 0 2.8425e-05 4.4940e-02
Popularity 3.1950e-02 3.1950e-02 3.1950e-02 3.1950e-02 3.1950e-02 3.1950e-02
Prototype - - - - - -
Exemplar - - - - - -
Table 10
P at 10
N 1 2 3 5 10 all
Bayesian Generalization 5.9968e-05 3.2262e-05 1.8260e-05 8.5345e-06 1.3408e-05 1.9592e-02
TF-IDF 5.8601e-05 2.1097e-05 1.3903e-05 9.6758e-06 1.1140e-05 3.8134e-02
Bayesian Sets 3.5465e-05 6.7142e-05 9.7076e-06 8.9287e-06 8.0981e-06 3.4094e-02
Popularity 1.9151e-02 1.9151e-02 1.9151e-02 1.9151e-02 1.9151e-02 1.9151e-02
Prototype - - - - - -
Exemplar - - - - - -
Table 11
mAP
N 1 2 3 5 10 all
Bayesian Generalization 6.5554 6.5559 6.5553 6.5550 6.5550 6.6172
TF-IDF 6.5555 6.5555 6.5552 6.5551 6.5550 6.6688
Bayesian Sets 6.5553 6.5556 6.5551 6.5550 6.5551 6.7747
Popularity 6.7804 6.7804 6.7804 6.7804 6.7804 6.7804
Prototype - - - - - -
Exemplar - - - - - -
Table 12
DCG
N 1 2 3 5 10 all
Bayesian Generalization 0.4728 0.5347 0.5349 0.5349 0.5349 0.5349
TF-IDF 0.4440 0.5319 0.5347 0.5349 0.5349 0.5349
Bayesian Sets 0.0032 0.4137 0.4318 0.4487 0.4552 0.4554
Popularity 0.0034 0.0034 0.0034 0.0034 0.0034 0.0034
Prototype - - - - - -
Exemplar - - - - - -
Table 13
MRR
Appendix B: Full Results - Varying Popularity Threshold
AOTM-2011 Dataset
T none 2 3 5
Bayesian Generalization 0.0039 0.0057 0.0060 0.0066
TF-IDF 0.0032 0.0050 0.0055 0.0058
Bayesian Sets 0.0016 0.0025 0.0028 0.0032
Popularity 0.0033 0.0043 0.0047 0.0056
Prototype - - - -
Exemplar - - - -
Table 14
P at 10
T none 2 3 5
Bayesian Generalization 0.0039 0.0085 0.0101 0.0128
TF-IDF 0.0031 0.0071 0.0085 0.0103
Bayesian Sets 0.0015 0.0040 0.0048 0.0071
Popularity 0.0035 0.0073 0.0089 0.0126
Prototype - - - -
Exemplar - - - -
Table 15
mAP
T none 2 3 5
Bayesian Generalization 6.5749 6.5851 6.5873 6.5912
TF-IDF 6.5709 6.5810 6.5841 6.5856
Bayesian Sets 6.5631 6.5688 6.5705 6.5731
Popularity 6.5767 6.5835 6.5860 6.5921
Prototype - - - -
Exemplar - - - -
Table 16
DCG
T none 2 3 5
Bayesian Generalization 0.0031 0.0073 0.0088 0.0113
TF-IDF 0.0025 0.0061 0.0074 0.0092
Bayesian Sets 0.0013 0.0036 0.0044 0.0066
Popularity 0.0032 0.0068 0.0082 0.0117
Prototype - - - -
Exemplar - - - -
Table 17
MRR
Million Song Dataset Challenge
T none 10 25 50 100 200
Bayesian Generalization 0.0138 0.0205 0.0248 0.0300 0.0345 0.0380
TF-IDF 0.0244 0.0408 0.0479 0.0508 0.0523 0.0523
Bayesian Sets 0.0449 0.0404 0.0408 0.0408 0.0405 0.0359
Popularity 0.0319 0.0322 0.0327 0.0337 0.0366 0.0434
Prototype - - - - - -
Exemplar - - - - - -
Table 18
P at 10
T none 10 25 50 100 200
Bayesian Generalization 0.0196 0.0275 0.0332 0.0400 0.0502 0.0669
TF-IDF 0.0381 0.0509 0.0553 0.0589 0.0638 0.0736
Bayesian Sets 0.0341 0.0363 0.0379 0.0391 0.0400 0.0444
Popularity 0.0192 0.0222 0.0256 0.0305 0.0409 0.0634
Prototype - - - - - -
Exemplar - - - - - -
Table 19
mAP
T none 10 25 50 100 200
Bayesian Generalization 6.6172 6.6524 6.6775 6.7075 6.7376 6.7674
TF-IDF 6.6688 6.7515 6.7920 6.8122 6.8265 6.8335
Bayesian Sets 6.7747 6.7521 6.7570 6.7600 6.7623 6.7459
Popularity 6.7804 6.7822 6.7857 6.7931 6.8130 6.8616
Prototype - - - - - -
Exemplar - - - - - -
Table 20
DCG
T none 10 25 50 100 200
Bayesian Generalization 0.0091 0.0137 0.0181 0.0232 0.0322 0.0466
TF-IDF 0.0142 0.0214 0.0263 0.0309 0.0380 0.0479
Bayesian Sets 0.0145 0.0170 0.0195 0.0220 0.0257 0.0315
Popularity 0.0120 0.0142 0.0167 0.0204 0.0282 0.0454
Prototype - - - - - -
Exemplar - - - - - -
Table 21
MRR