Recommender Systems
Information Systems M
Prof. Paolo Ciaccia
http://www-db.deis.unibo.it/courses/SI-M/
Recommender Systems
• A recommender system (RS) helps people evaluate the potentially huge number of alternatives offered by a Web site
• In their simplest form, RS’s recommend personalized, ranked lists of items to their users
  - They provide consumers with information to help them decide which items to purchase
• Given a set of users and items (documents, products, …), recommend items to a user based on
  - past behavior of this and other users
  - additional information on users/items
• A multitude of applications, and a big market!
  - E-commerce
  - Medical applications (e.g., matching patients to doctors)
  - Customer Relationship Management (e.g., matching customer problems to experts)
Recommender Systems 2Sistemi Informativi M
What book should I buy?
What movie should I watch?
• The Internet Movie Database (IMDb) provides information about actors, films, television shows, television stars, video games…
• Owned by Amazon.com since 1998
• 796,328 titles and 2,127,371 people
• More than 50M users per month
The Netflix prize (1)
• Netflix is a US online movie rental service
• Over 100K titles and 55 million DVDs in total
• A proprietary recommendation system called “Cinematch”
• Approximately 60% of Netflix members select their movies based on movie recommendations
In October 2006, Netflix announced it would pay $1 million to whoever created a movie-recommending algorithm 10% better than Cinematch
12.01.2011
The Netflix prize (2)
• Within two weeks, Netflix received 169 submissions, including three that were slightly superior to Cinematch
• After a month, more than a thousand programs had been entered, and the top scorers were almost halfway to the goal
• Three years later, on 21 September 2009, Netflix announced the winner
What news should I read?
Where should I spend my vacation?
Remarkable examples
Amazon.com Books, movies, music
CDNOW.com Music
Ebay.com (feedback forms) Anything
Reel.com Movies
Barnes & Noble Books
[Table: technologies used by each system. The per-system check marks are not recoverable from the slide; only the number of systems marked for each technology is.]

Systems (16): Jinni, TasteKid, Nanocrowd, Clerkdogs, Criticker, IMDb, Flixster, Movielens, Netflix, Shazam, Pandora, LastFM, YooChoose, ThinkAnalytics, iTunes, Amazon

Technology                                                          Systems marked
Collaborative Filtering                                             12
Content-Based Techniques                                            11
Knowledge-Based Techniques                                          7
Ontologies and Semantic Web Technologies for Recommender Systems    3
Hybrid Techniques                                                   7
Context Dependent Recommender Systems                               6
Inputs to a RS
• Behavior of the user in past “transactions”
  - which items were viewed/purchased
  - content/attributes of items
  - pages bookmarked
  - explicit ratings on items
• Context (used in context-based recommendations)
  - what the user appears to be doing now
• Role/domain
  - additional info about users, items, …
Content-Based Recommendation
• In content-based recommendation, the system tries to recommend items that match the user profile
• The profile is based on items that the user liked in the past or on explicit interests that s/he defines
[Figure: the recommender system matches new books against the user profile]
Implementing content-based RS’s
• The basic idea is borrowed from the Vector Space Model
• Each item is characterized by a set of (weighted) features
  - Movie: actors, director, title, …
  - Weight: use tf.idf
  - Also works for “unstructured” data (web pages, docs, etc.)
• The user profile is built using user history
  - E.g., a vector representing the relevance of features/keywords for that user
  - Either implicit or explicit “rating of features” (or both)
• Cosine similarity can be used to match the user profile with an item vector
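
The idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy catalog, its feature names, the idf variant log(N/df), and the choice of building the profile as the sum of the vectors of liked items are all hypothetical, not taken from the slides.

```python
import math
from collections import Counter

# Hypothetical toy catalog: each item is described by a bag of features.
items = {
    "movie_a": ["scifi", "space", "robot"],
    "movie_b": ["scifi", "space", "alien"],
    "movie_c": ["romance", "paris", "comedy"],
    "movie_d": ["comedy", "robot", "family"],
}

def tfidf_vectors(docs):
    """Return {item: {feature: tf * idf}}, with idf = log(N / df) (one common variant)."""
    n = len(docs)
    df = Counter(f for feats in docs.values() for f in set(feats))
    vectors = {}
    for name, feats in docs.items():
        tf = Counter(feats)
        vectors[name] = {f: (c / len(feats)) * math.log(n / df[f]) for f, c in tf.items()}
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def build_profile(liked, vectors):
    """Assumed profile model: sum of the tf.idf vectors of the liked items."""
    profile = Counter()
    for name in liked:
        for f, w in vectors[name].items():
            profile[f] += w
    return dict(profile)

def recommend(liked, vectors):
    """Rank the items the user has not seen by similarity to the profile."""
    profile = build_profile(liked, vectors)
    scores = {n: cosine(profile, v) for n, v in vectors.items() if n not in liked}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

vectors = tfidf_vectors(items)
ranking = recommend(["movie_a"], vectors)
for name, score in ranking:
    print(name, round(score, 3))
```

With a user who liked only movie_a, the item sharing the most features with it ranks first and the item sharing none gets a score of zero.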
Pros and cons of content-based RS’s
Pros:
• Able to recommend new and unpopular items
• No need for data on other users
• Can provide explanations of recommended items
Cons:
• Limited content analysis
  - Not always easy to find the appropriate features to use
• Overspecialization
  - Can only recommend items similar to previously seen/rated ones
  - Further, items too similar to ones the user already knows might not be of interest (e.g., news articles)
• New users
  - How to build a profile?
Collaborative filtering (CF)
• Unlike content-based recommendation methods, CF recommender systems try to predict the utility of items for a particular user based on the items previously rated by other users
• Two basic variants of CF:
  - User-based: to predict a user’s opinion of an item, use the opinions of similar users, where similarity between users depends on their opinions of other items
  - Item-based: as in content-based RS’s, the assumption is that a user is likely to have the same opinion of similar items; however, similarity between items now depends on how other users have rated them
User-based CF
         Item 1   Item 2   Item 3   Item 4   Item 5
User 1     8        1        ?        2        7
User 2     2        ?        5        7        5
User 3     5        4        7        4        7
User 4     7        1        7        3        8
User 5     1        7        4        6        5
User 6     8        3        8        3        7
Similarity between users: simple way
• Only consider items both users have rated
• For each item, compute the difference in the users’ ratings
  - If Item j has been rated by both User 1 and User 2:
    | rating (User 1, Item j) – rating (User 2, Item j) |
• Take the average of these differences over all common items
         Item 1   Item 2   Item 3   Item 4   Item 5
User 1     8        1        ?        2        7
User 2     2        ?        5        7        5
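
As a worked example, the simple measure can be computed for User 1 and User 2 as follows. This is a sketch: the function name is illustrative, and `None` stands in for the "?" entries. Note that the result is really a dissimilarity: the smaller the value, the more similar the two users.

```python
# Ratings from the table above; None marks a missing rating ("?").
user1 = [8, 1, None, 2, 7]
user2 = [2, None, 5, 7, 5]

def mean_abs_difference(a, b):
    """Average |a_j - b_j| over the items rated by both users; None if there is no overlap."""
    diffs = [abs(x - y) for x, y in zip(a, b) if x is not None and y is not None]
    return sum(diffs) / len(diffs) if diffs else None

# Common items are Item 1, Item 4 and Item 5: (6 + 5 + 2) / 3
d = mean_abs_difference(user1, user2)
print(round(d, 2))  # 4.33
```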
Similarity between users: more realistic
• Can use either all items or only those rated by both users
• We have a user-item matrix R of ratings, where $r_{a,i}$ is the rating of user a for item i, and $\bar{r}_a$ is the average rating of user a
• Two major alternatives for measuring the similarity between users:

Pearson correlation:
\[
\mathrm{sim}(a,b) = \frac{\sum_i (r_{a,i} - \bar{r}_a)(r_{b,i} - \bar{r}_b)}{\sqrt{\sum_i (r_{a,i} - \bar{r}_a)^2}\,\sqrt{\sum_i (r_{b,i} - \bar{r}_b)^2}}
\]

Cosine:
\[
\mathrm{sim}(a,b) = \frac{\sum_i r_{a,i}\, r_{b,i}}{\sqrt{\sum_i r_{a,i}^2}\,\sqrt{\sum_i r_{b,i}^2}}
\]
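
Both measures can be tried on the ratings table of the user-based CF example. The sketch below makes two assumptions that the slides leave open: the sums run only over items rated by both users, and each user's average is taken over all of that user's known ratings.

```python
import math

# User-item ratings from the user-based CF example; None marks "?".
R = {
    "User 1": [8, 1, None, 2, 7],
    "User 2": [2, None, 5, 7, 5],
    "User 3": [5, 4, 7, 4, 7],
    "User 4": [7, 1, 7, 3, 8],
    "User 5": [1, 7, 4, 6, 5],
    "User 6": [8, 3, 8, 3, 7],
}

def mean_rating(ratings):
    """Average of a user's known ratings (the r-bar term of the Pearson formula)."""
    known = [r for r in ratings if r is not None]
    return sum(known) / len(known)

def co_rated(a, b):
    """Pairs of ratings for the items that both users have rated."""
    return [(x, y) for x, y in zip(a, b) if x is not None and y is not None]

def pearson(a, b):
    ma, mb = mean_rating(a), mean_rating(b)
    pairs = co_rated(a, b)
    num = sum((x - ma) * (y - mb) for x, y in pairs)
    den = (math.sqrt(sum((x - ma) ** 2 for x, _ in pairs))
           * math.sqrt(sum((y - mb) ** 2 for _, y in pairs)))
    return num / den if den else 0.0

def cosine(a, b):
    pairs = co_rated(a, b)
    num = sum(x * y for x, y in pairs)
    den = (math.sqrt(sum(x * x for x, _ in pairs))
           * math.sqrt(sum(y * y for _, y in pairs)))
    return num / den if den else 0.0

for other in ["User 2", "User 3", "User 4", "User 5", "User 6"]:
    print(other,
          round(pearson(R["User 1"], R[other]), 3),
          round(cosine(R["User 1"], R[other]), 3))
```

Pearson subtracts each user's average, so it can be negative for users with opposite tastes, whereas cosine on non-negative ratings never is.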
Rating prediction and recommendation
• To predict the rating $r_{a,i}$ for the (target) user a and item i, a weighted sum can be used:
\[
r_{a,i} = \sum_u \mathrm{sim}(a,u) \times r_{u,i}
\]
• Rather than considering all the users, only the k most similar to user a can be used
• Based on rating predictions, the top-N items can be recommended to user a

[Figure: the other users’ ratings (5, 4, 7, 7, 8) are combined by a weighted sum]
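
A possible way to turn the weighted sum into a prediction for User 1's missing rating of Item 3 is sketched below. Several choices here are assumptions rather than part of the slides: Pearson correlation over co-rated items is the similarity, only neighbors with positive similarity are kept, and the sum is divided by the total similarity so that the result stays on the rating scale.

```python
import math

# User-item ratings from the user-based CF example; None marks "?".
R = {
    "User 1": [8, 1, None, 2, 7],
    "User 2": [2, None, 5, 7, 5],
    "User 3": [5, 4, 7, 4, 7],
    "User 4": [7, 1, 7, 3, 8],
    "User 5": [1, 7, 4, 6, 5],
    "User 6": [8, 3, 8, 3, 7],
}

def pearson(a, b):
    """Pearson correlation over co-rated items, using each user's overall mean."""
    ma = sum(r for r in a if r is not None) / sum(r is not None for r in a)
    mb = sum(r for r in b if r is not None) / sum(r is not None for r in b)
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    num = sum((x - ma) * (y - mb) for x, y in pairs)
    den = (math.sqrt(sum((x - ma) ** 2 for x, _ in pairs))
           * math.sqrt(sum((y - mb) ** 2 for _, y in pairs)))
    return num / den if den else 0.0

def predict(R, target, item, k=2):
    """Normalized weighted sum over the k most similar users who rated the item."""
    neighbors = []
    for user, ratings in R.items():
        if user == target or ratings[item] is None:
            continue
        s = pearson(R[target], ratings)
        if s > 0:  # assumption: ignore dissimilar users
            neighbors.append((s, ratings[item]))
    top = sorted(neighbors, reverse=True)[:k]
    total = sum(s for s, _ in top)
    return sum(s * r for s, r in top) / total if total else None

# Item 3 has index 2 in the rating lists.
pred = predict(R, "User 1", item=2, k=2)
print(round(pred, 2))
```

To produce a top-N list, the same prediction would be computed for every item the target user has not rated, and the N highest predictions returned.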
Problems with user-based CF
• User Cold-Start problem
  - Not enough is known about a new user to decide who is similar
• Sparsity of the rating matrix
  - With large item sets, users will have rated only some of the items (which makes it hard to find similar users)
  - With 2M books, rating 2K of them is only 0.1%
• Scalability
  - With millions of users and items, computations become slow
• Item Cold-Start problem
  - Cannot predict ratings for a new item until some users have rated it
  - Also a problem with “esoteric” items
• Popularity bias
  - Cannot recommend items to a user with unique tastes
Item-based CF
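• Pearson correlation (or cosine) is now used to measure the similarity of items
• Still based on ratings, not on items’ content!

Pearson correlation, where $\bar{r}_i$ is the average rating of item i:
\[
\mathrm{sim}(i,j) = \frac{\sum_u (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_u (r_{u,i} - \bar{r}_i)^2}\,\sqrt{\sum_u (r_{u,j} - \bar{r}_j)^2}}
\]

Cosine:
\[
\mathrm{sim}(i,j) = \frac{\sum_u r_{u,i}\, r_{u,j}}{\sqrt{\sum_u r_{u,i}^2}\,\sqrt{\sum_u r_{u,j}^2}}
\]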
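
Item similarity uses the same formulas applied to the columns of the rating matrix instead of its rows. The sketch below reuses the ratings of the user-based CF example and assumes, as one possible reading, that sums run over the users who rated both items while each item's average is taken over all of its known ratings.

```python
import math

# Rows are User 1..User 6, columns are Item 1..Item 5; None marks "?".
R = [
    [8, 1, None, 2, 7],
    [2, None, 5, 7, 5],
    [5, 4, 7, 4, 7],
    [7, 1, 7, 3, 8],
    [1, 7, 4, 6, 5],
    [8, 3, 8, 3, 7],
]

def column(R, j):
    """All users' ratings for item j (one column of the matrix)."""
    return [row[j] for row in R]

def pearson(a, b):
    """Pearson correlation of two rating vectors over their co-rated positions."""
    ma = sum(r for r in a if r is not None) / sum(r is not None for r in a)
    mb = sum(r for r in b if r is not None) / sum(r is not None for r in b)
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    num = sum((x - ma) * (y - mb) for x, y in pairs)
    den = (math.sqrt(sum((x - ma) ** 2 for x, _ in pairs))
           * math.sqrt(sum((y - mb) ** 2 for _, y in pairs)))
    return num / den if den else 0.0

def item_similarity(R, i, j):
    return pearson(column(R, i), column(R, j))

# Similarity of Item 1 (index 0) to every other item.
for j in range(1, 5):
    print(f"sim(Item 1, Item {j + 1}) =", round(item_similarity(R, 0, j), 3))
```

No item attributes appear anywhere in the computation: two items are similar only because the same users rated them in a similar way.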
Generating predictions
• As with user-based CF, can use all items or only the k most similar ones
\[
r_{a,i} = \frac{\sum_j \mathrm{sim}(i,j) \times r_{a,j}}{\sum_j \mathrm{sim}(i,j)}
\]

[Figure: User 1’s ratings for Item 1, Item 2, Item 4 and Item 5 (8, 1, 2, 7) are combined by a weighted sum to predict the rating for Item 3]
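
The item-based prediction for User 1 and Item 3 can be sketched as follows, again on the ratings of the user-based CF example. Restricting the sum to the k most similar items with positive similarity is an assumption made here so that the denominator stays positive; the slides only say that all items or the k most similar ones can be used.

```python
import math

# Rows are User 1..User 6, columns are Item 1..Item 5; None marks "?".
R = [
    [8, 1, None, 2, 7],
    [2, None, 5, 7, 5],
    [5, 4, 7, 4, 7],
    [7, 1, 7, 3, 8],
    [1, 7, 4, 6, 5],
    [8, 3, 8, 3, 7],
]

def column(R, j):
    return [row[j] for row in R]

def pearson(a, b):
    """Pearson correlation of two rating vectors over their co-rated positions."""
    ma = sum(r for r in a if r is not None) / sum(r is not None for r in a)
    mb = sum(r for r in b if r is not None) / sum(r is not None for r in b)
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    num = sum((x - ma) * (y - mb) for x, y in pairs)
    den = (math.sqrt(sum((x - ma) ** 2 for x, _ in pairs))
           * math.sqrt(sum((y - mb) ** 2 for _, y in pairs)))
    return num / den if den else 0.0

def predict_item_based(R, user, item, k=2):
    """Weighted sum of the user's own ratings, weighted by similarity to the target item."""
    target = column(R, item)
    candidates = []
    for j, rating in enumerate(R[user]):
        if j == item or rating is None:
            continue
        s = pearson(target, column(R, j))
        if s > 0:  # assumption: keep only positively correlated items
            candidates.append((s, rating))
    top = sorted(candidates, reverse=True)[:k]
    total = sum(s for s, _ in top)
    return sum(s * r for s, r in top) / total if total else None

# User 1 is row 0, Item 3 is column 2.
pred = predict_item_based(R, user=0, item=2, k=2)
print(round(pred, 2))
```

Unlike the user-based variant, the ratings being averaged all come from the target user; the other users only contribute through the item-item similarities.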
Problems with item-based CF
• Item Cold-Start problem
  - This is a major problem here
Important Issues
• Cold Start, Implicit/Explicit Rating, Sparsity, Portfolio Effect (non-diversity problem), Security, Privacy, …
• A lot of work exists on RS’s, and many other alternatives have been developed
  - Hybrid RS’s
  - Model-based CF: develop a model of user ratings (probabilistic, based on clustering, etc.)
  - Context-based RS’s: vary the predictions depending on user context
  - …
• See also the survey [AT05] on the web site