Music Recommender Systems
Algorithms for recommender systems.
Who is Using Recommender Systems?
Recommender Systems
• Summary :
– http://en.wikipedia.org/wiki/Recommender_system
• Keywords :
recommender systems, association rules, collaborative filtering, Slope One, SVD, KNN, ...
Algorithms
• Association Rules
• Slope one
• SVD
• ….
Association Rules
TID   Items
1     Bread, Milk
2     Bread, Diaper, Beer, Egg
3     Diaper, Beer, Cola
4     Bread, Milk, Diaper, Beer
5     Bread, Milk, Diaper, Cola

Item pair      Transactions
Beer, Diaper   3
Bread, Milk    3
Beer, Bread    2
Diaper, Milk   2
Beer, Milk     1
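You can reproduce the pair counts above by listing every two-item combination in each transaction. The following is a minimal Python sketch. `count_pairs` is an illustrative helper and is not part of the Orange demo.

```python
from collections import Counter
from itertools import combinations

# The five example transactions from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Egg"},
    {"Diaper", "Beer", "Cola"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Cola"},
]

def count_pairs(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for items in transactions:
        # sorted() gives each pair one canonical order, e.g. ("Beer", "Diaper")
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts

pair_counts = count_pairs(transactions)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

The sketch also reports pairs that the table leaves out. One example is (Bread, Diaper), which appears in three transactions.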
Association Rules
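• Support: $s(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{N}$
• Confidence: $c(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}$
  where $\sigma(\cdot)$ is the number of transactions that contain an itemset and $N$ is the total number of transactions.
• Algorithms: Apriori algorithm, FP-growth algorithm
• http://en.wikipedia.org/wiki/Association_rule_learning
• Demo: Python + Orange
http://www.fuchaoqun.com/2008/08/data-mining-with-python-orange-association_rule/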
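Below is a minimal plain-Python sketch of support, confidence and a naive Apriori search over the same five transactions. It is a teaching sketch and not the Orange implementation. The names `support_count`, `rule_metrics` and `apriori` are illustrative. FP-growth is not shown.

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Egg"},
    {"Diaper", "Beer", "Cola"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Cola"},
]

def support_count(itemset, transactions):
    """sigma(itemset): number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def rule_metrics(X, Y, transactions):
    """Support and confidence of the rule X -> Y."""
    both = support_count(X | Y, transactions)
    return both / len(transactions), both / support_count(X, transactions)

def apriori(transactions, min_support):
    """Level-wise search for all itemsets whose support is >= min_support."""
    n = len(transactions)
    frequent = {}
    current = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while current:
        # keep only candidates that meet the support threshold
        current = {c for c in current if support_count(c, transactions) / n >= min_support}
        for c in current:
            frequent[c] = support_count(c, transactions) / n
        k += 1
        # join step: merge frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # prune step: every (k-1)-subset of a candidate must itself be frequent
        current = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
    return frequent

s, c = rule_metrics({"Beer"}, {"Diaper"}, transactions)
print(f"Beer -> Diaper: support={s:.2f}, confidence={c:.2f}")  # support=0.60, confidence=1.00
print(sorted(tuple(sorted(i)) for i in apriori(transactions, min_support=0.6)))
```

The prune step is the core idea of Apriori. If an itemset is infrequent, none of its supersets can be frequent, so those supersets are never counted.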
Slope One
User   "That is it"   "Straight Through My Heart"
Jim    4              5
Mike   2              4
Fred   3              ?
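Here is a worked example of plain (unweighted) Slope One on this table. Take the average amount by which Jim and Mike rate "Straight Through My Heart" above "That is it", then add that average to Fred's rating:

$$\text{dev} = \frac{(5 - 4) + (4 - 2)}{2} = 1.5, \qquad P(\text{Fred}) = 3 + 1.5 = 4.5$$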
Slope One
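• By Daniel Lemire in 2005
– http://www.daniel-lemire.com/fr/abstracts/SDM2005.html
• Simpler Could Be Better
• Weighted Average: for a user who has rated items A and C, the prediction for item B is
  $P(B) = \frac{m\,(R_A + r_{AB}) + n\,(R_C + r_{CB})}{m + n}$
  where $R_A$ and $R_C$ are the user's ratings, $r_{AB}$ is the average amount by which users rate B above A, $r_{CB}$ is the same for B relative to C, $m$ is the number of users who rated both A and B, and $n$ is the number who rated both C and B.
• http://en.wikipedia.org/wiki/Slope_One
• Implementations:
http://taste.sourceforge.net/ (Java)
http://code.google.com/p/openslopeone (PHP & MySQL)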
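Below is a minimal Python sketch of weighted Slope One, following the formula above. It is a plain in-memory version. It is not the Taste or OpenSlopeOne code, and the function names are illustrative.

```python
from collections import defaultdict

def slope_one_deviations(ratings):
    """ratings: {user: {item: rating}}.
    Returns dev[(j, i)] = average of (r_j - r_i) over users who rated both,
    and count[(j, i)] = how many such users there are."""
    diff_sum = defaultdict(float)
    count = defaultdict(int)
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    diff_sum[(j, i)] += r_j - r_i
                    count[(j, i)] += 1
    dev = {pair: diff_sum[pair] / count[pair] for pair in diff_sum}
    return dev, dict(count)

def predict_weighted(user_ratings, target, dev, count):
    """Weighted Slope One: each rated item i votes r_i + dev[(target, i)],
    weighted by the number of users who rated both items."""
    numerator, denominator = 0.0, 0
    for i, r_i in user_ratings.items():
        pair = (target, i)
        if pair in dev:
            numerator += count[pair] * (r_i + dev[pair])
            denominator += count[pair]
    return numerator / denominator if denominator else None

ratings = {
    "Jim":  {"That is it": 4, "Straight Through My Heart": 5},
    "Mike": {"That is it": 2, "Straight Through My Heart": 4},
    "Fred": {"That is it": 3},
}
dev, count = slope_one_deviations(ratings)
print(predict_weighted(ratings["Fred"], "Straight Through My Heart", dev, count))  # 4.5
```

In this example Fred has rated only one item, so the weighted and unweighted predictions agree. The weights matter only when a user has rated several items and each one is backed by a different number of co-raters.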
Similarity
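• Cosine similarity between items i and j, where $R_{u,i}$ is user u's rating on item i and m is the number of users:
$$\cos(i, j) = \frac{R_{1,i}R_{1,j} + R_{2,i}R_{2,j} + \cdots + R_{m,i}R_{m,j}}{\sqrt{R_{1,i}^2 + R_{2,i}^2 + \cdots + R_{m,i}^2}\,\sqrt{R_{1,j}^2 + R_{2,j}^2 + \cdots + R_{m,j}^2}}$$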
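The cosine formula translates directly into a few lines of Python. This sketch assumes that missing ratings are stored as 0, so they add nothing to the dot product.

```python
import math

def cosine(col_i, col_j):
    """Cosine similarity of two item columns; missing ratings are assumed to be 0."""
    dot = sum(a * b for a, b in zip(col_i, col_j))
    norm_i = math.sqrt(sum(a * a for a in col_i))
    norm_j = math.sqrt(sum(b * b for b in col_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)

# Columns from the Slope One table, restricted to Jim and Mike (who rated both songs).
that_is_it = [4, 2]
straight_through = [5, 4]
print(round(cosine(that_is_it, straight_through), 4))  # 0.9778
```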
SVD In Image Compression
[Figure: the original image next to its SVD reconstructions with K = 10 and K = 20]
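The image-compression idea can be sketched with NumPy. Keep only the k largest singular values, then store the k-column factors in place of the full pixel matrix. The slide's photo is not available, so this sketch uses a synthetic 64 × 64 array as a stand-in.

```python
import numpy as np

def rank_k_approximation(A, k):
    """Best rank-k approximation of A (least-squares sense) via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def stored_numbers(m, n, k):
    """Values kept by the rank-k factors versus the full m x n image."""
    return k * (m + n + 1), m * n

# Synthetic 64 x 64 grayscale "image": a smooth pattern plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 3, 64)
image = np.add.outer(np.sin(x), np.cos(2 * x)) + 0.05 * rng.standard_normal((64, 64))

for k in (10, 20):
    approx = rank_k_approximation(image, k)
    err = np.linalg.norm(image - approx) / np.linalg.norm(image)
    kept, full = stored_numbers(*image.shape, k)
    print(f"k={k}: relative error {err:.3f}, storing {kept} of {full} values")
```

As k grows, the reconstruction error can only stay the same or shrink, while the storage cost grows linearly in k.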
SVD Process
1. Define the original user-item matrix $R$, of size m × n, which holds the ratings of m users on n items. $r_{ij}$ is the rating of user $u_i$ on item $i_j$.
2. Preprocess $R$ to fill in all missing values (for example, with each item's average rating), because SVD needs a complete matrix.
3. Compute the SVD of $R$ to obtain matrices $U$, $S$ and $V$, of size m × m, m × n and n × n respectively, such that $R = U \cdot S \cdot V^T$.
4. Reduce the dimensionality by keeping only the k largest diagonal entries of $S$, giving a k × k matrix $S_k$. Likewise, keep the first k columns of $U$ and $V$ to obtain $U_k$ (m × k) and $V_k$ (n × k). The reduced user-item matrix is $R' = U_k \cdot S_k \cdot V_k^T$, and $r'_{ij}$ denotes the rating of user $u_i$ on item $i_j$ in this reduced matrix.
5. Compute $\sqrt{S_k}$ and then two matrix products: $U_k \cdot \sqrt{S_k}^T$, which represents the m users, and $\sqrt{S_k} \cdot V_k^T$, which represents the n items in the k-dimensional feature space. The second one, of size k × n, is the one we are most interested in.
6. Run KNN on the user matrix and on the item matrix to find similar users or items. Alternatively, multiply the two matrices to predict every user's rating on every item.
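Steps 2 to 6 can be sketched with NumPy on a small dense matrix, with one assumption: missing ratings are filled with item means. The sketch uses the economy-size SVD (`full_matrices=False`), which drops the parts of $U$ and $V$ that cannot affect $R$. The ratings below are made up.

```python
import numpy as np

def svd_predict(R, k):
    """Steps 2-6 on a small dense matrix; np.nan marks a missing rating."""
    R = np.asarray(R, dtype=float)
    # Step 2 (assumed strategy): fill each gap with that item's mean rating.
    filled = np.where(np.isnan(R), np.nanmean(R, axis=0), R)
    # Step 3: numpy returns S as a 1-D array s and V already transposed (Vt).
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    # Step 4: s is sorted in descending order, so the first k are the largest.
    Uk, Sk, Vtk = U[:, :k], np.diag(s[:k]), Vt[:k, :]
    # Step 5: users (m x k) and items (k x n) in the k-dimensional feature space.
    root = np.sqrt(Sk)
    users, items = Uk @ root, root @ Vtk
    # Step 6: multiplying them gives R' = Uk Sk Vk^T, the predicted ratings.
    return users, items, users @ items

nan = np.nan
ratings = [
    [5, 4, nan, 1],
    [4, nan, 1, 1],
    [1, 1, 5, 4],
    [nan, 1, 4, 5],
]
users, items, predicted = svd_predict(ratings, k=2)
print(np.round(predicted, 2))
```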
Demo
Which two people have the most similar tastes?
Which two seasons are the closest?
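The demo's data is not in the slides, so this sketch uses a small hypothetical users × seasons matrix. It answers both questions by comparing rows of $U_k\sqrt{S_k}$ (users) and rows of $V_k\sqrt{S_k}$ (seasons) with cosine similarity. The user and season names are made up, and user_a and user_b were given identical ratings on purpose.

```python
import numpy as np

def latent_factors(R, k):
    """Users and items as k-dim rows: U_k sqrt(S_k) and V_k sqrt(S_k)."""
    U, s, Vt = np.linalg.svd(np.asarray(R, dtype=float), full_matrices=False)
    root = np.sqrt(s[:k])
    return U[:, :k] * root, Vt[:k, :].T * root

def most_similar_pair(names, X):
    """Return the two rows of X with the highest cosine similarity."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.where(norms == 0, 1.0, norms)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)          # ignore each row's similarity to itself
    i, j = np.unravel_index(np.argmax(sim), sim.shape)
    return names[i], names[j], float(sim[i, j])

# Hypothetical data: rows are users, columns are seasons of a show.
users = ["user_a", "user_b", "user_c", "user_d"]
seasons = ["season_1", "season_2", "season_3", "season_4", "season_5"]
ratings = [
    [5, 5, 4, 1, 1],
    [5, 5, 4, 1, 1],
    [1, 1, 2, 5, 5],
    [3, 1, 5, 1, 3],
]
user_vecs, season_vecs = latent_factors(ratings, k=2)
print(most_similar_pair(users, user_vecs))
print(most_similar_pair(seasons, season_vecs))
```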
SVD
• SVD:
– MATLAB
– LAPACK, BLAS (Fortran)
– NumPy, SciPy (Python)
– SVDLIBC, Meschach (C)
– http://en.wikipedia.org/wiki/Singular_value_decomposition
– ……
• KNN:
– MATLAB
– FLANN
– ……
• All-in-one solution:
– DIVISI
– ……
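Real rating matrices are mostly empty. With SciPy you can build a sparse matrix and ask `scipy.sparse.linalg.svds` for only the top k singular triplets, then run a brute-force cosine KNN over the item vectors. This is a sketch on random hypothetical data. FLANN would take over the KNN step at larger scale. Duplicate (user, song) entries are summed by `csr_matrix`.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

def item_factors(user_ids, song_ids, weights, shape, k):
    """Truncated SVD of a sparse user x song matrix; one k-dim row per song."""
    R = csr_matrix((weights, (user_ids, song_ids)), shape=shape, dtype=float)
    u, s, vt = svds(R, k=k)                 # k must be smaller than min(shape)
    order = np.argsort(s)[::-1]             # svds does not promise descending order
    return vt[order].T * np.sqrt(s[order])

def nearest_neighbors(X, query, n_neighbors):
    """Brute-force cosine KNN: indices of the rows most similar to X[query]."""
    norms = np.linalg.norm(X, axis=1)
    unit = X / np.where(norms == 0, 1.0, norms)[:, None]
    sims = unit @ unit[query]
    sims[query] = -np.inf                   # never return the query itself
    return np.argsort(sims)[::-1][:n_neighbors]

rng = np.random.default_rng(0)
n_users, n_songs = 50, 30
user_ids = rng.integers(0, n_users, size=400)
song_ids = rng.integers(0, n_songs, size=400)
weights = rng.random(400)
factors = item_factors(user_ids, song_ids, weights, shape=(n_users, n_songs), k=5)
print(nearest_neighbors(factors, query=0, n_neighbors=5))
```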
MAGIC DIVISI!
#!/usr/bin/env python
# coding=utf-8
import divisi
from divisi.cnet import *

data = divisi.SparseLabeledTensor(ndim=2)
# read the ratings into data, e.g. data[user_id, song_id] = 4
svd_result = data.svd(k=128)

# get songs that the user may like
# predict_features(svd_result, user_id).top_items(100)
# get similar songs
# feature_similarity(svd_result, song_id).top_items(100)
# get users that have similar tastes
# concept_similarity(svd_result, user_id).top_items(100)
Music Recommender Systems
• Data Collection
• Data Cleaning
• Data Preprocessing
• Data Mining
• Tracking & Optimization
Data Collection
• User rating
• User collection
• User listen log
• User view log
• ….
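The Times column in the tables below looks like play counts. One way to derive such counts from a listen log is to count events per (user, song) pair. This sketch assumes a hypothetical log format of one (user_id, song_id) event per play.

```python
from collections import Counter

def play_counts(listen_log):
    """listen_log: iterable of (user_id, song_id) events, one per play (assumed format).
    Returns sorted (user_id, song_id, times) rows."""
    counts = Counter((user_id, song_id) for user_id, song_id in listen_log)
    return [(u, s, n) for (u, s), n in sorted(counts.items())]

log = [(3306, 3654), (3306, 6950), (3306, 3654)]
print(play_counts(log))  # [(3306, 3654, 2), (3306, 6950, 1)]
```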
Data Cleaning
• Missing data
• Wrong data
• Noise data
• Duplicate data
• ….
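UserId  SongId  Times
3306    3654    200
3306    6950    236
3306    6528    268
3306    5874
3306    9527    foo
3306    5624    1000000
3306    9635    5
3306    6950    236
….      ….      ….
In this sample, Times (the play count) is missing for SongId 5874 and wrong ("foo") for 9527. It is implausibly large (1000000, likely noise) for 5624, and the row for SongId 6950 appears twice.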
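Each cleaning rule above can be applied as a simple filter. In this sketch the noise cap `MAX_PLAUSIBLE_PLAYS` is a hypothetical threshold. A duplicate is taken to be any repeated (UserId, SongId) pair, and only its first occurrence is kept.

```python
MAX_PLAUSIBLE_PLAYS = 10_000  # hypothetical cap; tune it on real data

raw_rows = [
    ("3306", "3654", "200"),
    ("3306", "6950", "236"),
    ("3306", "6528", "268"),
    ("3306", "5874", ""),
    ("3306", "9527", "foo"),
    ("3306", "5624", "1000000"),
    ("3306", "9635", "5"),
    ("3306", "6950", "236"),
]

def clean(rows, max_plays=MAX_PLAUSIBLE_PLAYS):
    """Drop missing, wrong, noisy and duplicate rows; return (user, song, plays) ints."""
    seen = set()
    cleaned = []
    for user_id, song_id, times in rows:
        if not times.strip():           # missing data
            continue
        try:
            plays = int(times)          # wrong data: not a number
        except ValueError:
            continue
        if not 0 < plays <= max_plays:  # noise data: implausible count
            continue
        key = (user_id, song_id)
        if key in seen:                 # duplicate data: keep the first row only
            continue
        seen.add(key)
        cleaned.append((int(user_id), int(song_id), plays))
    return cleaned

print(clean(raw_rows))
```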
Data Preprocessing
UserId  SongId  Times
3306    3654    200
3306    6950    236
3306    6528    268
3306    5874    325
3306    9527    126
3306    5624    98
3306    9635    115
3306    6962    210
….      ….      ….
Each weight is the play count divided by the user's largest play count (325, for SongId 5874). For example, 200 / 325 ≈ 0.62.
UserId  SongId  Weight
3306    3654    0.62
3306    6950    0.73
3306    6528    0.82
3306    5874    1
3306    9527    0.39
3306    5624    0.30
3306    9635    0.35
3306    6962    0.65
….      ….      ….
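The weight column can be reproduced by dividing each play count by that user's maximum play count. A minimal sketch:

```python
from collections import defaultdict

def play_weights(rows):
    """rows: (user_id, song_id, times) triples.
    Returns (user_id, song_id, weight) with weight = times / user's max times."""
    rows = list(rows)
    max_times = defaultdict(int)
    for user_id, _, times in rows:
        max_times[user_id] = max(max_times[user_id], times)
    return [(u, s, t / max_times[u]) for u, s, t in rows]

rows = [
    (3306, 3654, 200), (3306, 6950, 236), (3306, 6528, 268), (3306, 5874, 325),
    (3306, 9527, 126), (3306, 5624, 98), (3306, 9635, 115), (3306, 6962, 210),
]
for user_id, song_id, weight in play_weights(rows):
    print(user_id, song_id, round(weight, 2))
```

Normalizing per user keeps heavy listeners from dominating the similarity computations that follow.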
Data Mining
Input: the UserId / SongId / Weight table produced by Data Preprocessing.
UserId  Similar Users' Ids
….      ….
SongId  Similar Songs' Ids
….      ….
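Here is one way to turn the weight table into a "similar songs" list: item-based cosine similarity over the users who listened to both songs. This is a sketch, not the deck's production code. It compares every pair of songs, so it suits small data only, and the rows below are hypothetical.

```python
import math
from collections import defaultdict

def similar_songs(rows, top_n=10):
    """rows: (user_id, song_id, weight) triples.
    Returns {song_id: [(other_song_id, cosine), ...]} sorted by similarity."""
    vectors = defaultdict(dict)                      # song_id -> {user_id: weight}
    for user_id, song_id, weight in rows:
        vectors[song_id][user_id] = weight
    norms = {s: math.sqrt(sum(w * w for w in v.values())) for s, v in vectors.items()}
    result = {}
    for a, va in vectors.items():
        scores = []
        for b, vb in vectors.items():
            if a == b:
                continue
            dot = sum(w * vb[u] for u, w in va.items() if u in vb)
            if dot > 0:                              # skip songs with no common listeners
                scores.append((b, dot / (norms[a] * norms[b])))
        scores.sort(key=lambda pair: pair[1], reverse=True)
        result[a] = scores[:top_n]
    return result

# Hypothetical weights for three users.
rows = [
    (3306, 3654, 1.0), (3306, 6950, 0.9), (3306, 6528, 0.1),
    (3307, 3654, 0.8), (3307, 6950, 0.7),
    (3308, 6528, 1.0), (3308, 5874, 0.9),
]
print(similar_songs(rows)[3654])
```

Passing the rows with the first two fields swapped, as (song_id, user_id, weight), makes the same function return similar users.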
Tracking & Optimization
• Show the recommended results
• Users view them and click what they like
• Store the users' clicks
• Mine the click data
• Make better recommendations
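The feedback loop above needs a record of what was shown and what was clicked. This sketch uses a hypothetical in-memory tracker that stores clicks and computes a per-song click-through rate. A real system would write these events to persistent storage.

```python
from collections import Counter

class ClickTracker:
    """Hypothetical in-memory tracker for recommendation impressions and clicks."""

    def __init__(self):
        self.impressions = Counter()   # song_id -> times recommended
        self.clicks = Counter()        # song_id -> times clicked
        self.click_log = []            # stored (user_id, song_id) clicks for later mining

    def log_recommendations(self, user_id, song_ids):
        self.impressions.update(song_ids)

    def log_click(self, user_id, song_id):
        self.clicks[song_id] += 1
        self.click_log.append((user_id, song_id))

    def click_through_rate(self, song_id):
        shown = self.impressions[song_id]
        return self.clicks[song_id] / shown if shown else 0.0

tracker = ClickTracker()
tracker.log_recommendations(3306, [3654, 6950])
tracker.log_recommendations(3307, [3654])
tracker.log_click(3306, 3654)
print(tracker.click_through_rate(3654))  # 0.5
```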
That's it. Thanks!
Q&A