Music Recommender Systems

超群 [email protected]

http://www.fuchaoqun.com

Who is Using Recommender Systems?

Recommender Systems

• Summary:

– http://en.wikipedia.org/wiki/Recommender_system

• Keywords:

recommender systems, association rules, collaborative filtering, Slope One, SVD, KNN, ...

Algorithms

• Association Rules

• Slope one

• SVD

• ….

Association Rules

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Egg
3    Diaper, Beer, Cola
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Cola

Items         Times
Beer, Diaper  3
Bread, Milk   3
Beer, Bread   2
Diaper, Milk  2
Beer, Milk    1
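The pair counts in the second table can be reproduced by counting every two-item combination per transaction. The following is a minimal sketch using only the Python standard library; the function name `count_pairs` is illustrative and not from the slides, and the slide lists only a selection of the pairs this code produces.

```python
from collections import Counter
from itertools import combinations

# The five transactions from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Egg"},
    {"Diaper", "Beer", "Cola"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Cola"},
]


def count_pairs(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for items in transactions:
        # Sorting makes ("Beer", "Diaper") and ("Diaper", "Beer") the same key.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


pair_counts = count_pairs(transactions)
for pair in [("Beer", "Diaper"), ("Bread", "Milk"), ("Beer", "Milk")]:
    print(pair, pair_counts[pair])
```

Counting pairs this way is the brute-force version of what Apriori and FP-growth do more efficiently for larger itemsets.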

Association Rules

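• Support: the fraction of all N transactions that contain both X and Y

  $s(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{N}$

• Confidence: the fraction of transactions containing X that also contain Y

  $c(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}$

  Here $\sigma(\cdot)$ is the number of transactions that contain the given itemset.

• Algorithms: Apriori algorithm, FP-growth algorithm

• http://en.wikipedia.org/wiki/Association_rule_learning

• Demo: Python + Orange

http://www.fuchaoqun.com/2008/08/data-mining-with-python-orange-association_rule/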
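As a worked example, support and confidence for the rule Diaper → Beer can be computed directly from the five transactions shown earlier. This is a sketch in plain Python, not the Orange demo linked above; the helper names `support_count`, `support` and `confidence` are assumptions made for illustration.

```python
# The five example transactions (TID 1 to 5).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Egg"},
    {"Diaper", "Beer", "Cola"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Cola"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """Fraction of all transactions that contain both X and Y."""
    return support_count(x | y, transactions) / len(transactions)


def confidence(x, y, transactions):
    """Fraction of the transactions containing X that also contain Y."""
    return support_count(x | y, transactions) / support_count(x, transactions)


s = support({"Diaper"}, {"Beer"}, transactions)
c = confidence({"Diaper"}, {"Beer"}, transactions)
print(s, c)  # 3 of 5 transactions, and 3 of the 4 Diaper transactions
```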




Slope One

User  That is it  Straight Through My Heart
Jim   4           5
Mike  2           4
Fred  3           ?

Slope One

• By Daniel Lemire in 2005

– http://www.daniel-lemire.com/fr/abstracts/SDM2005.html

• Simpler Could Be Better

• Weighted Average:

  $P(B) = \frac{m (R_A - r_{A,B}) + n (R_C - r_{C,B})}{m + n}$

  Here $R_A$ and $R_C$ are the user's own ratings of items A and C, $r_{A,B}$ is the average difference (rating of A minus rating of B) over the m users who rated both A and B, and $r_{C,B}$ is the same quantity over the n users who rated both C and B.

• http://en.wikipedia.org/wiki/Slope_One

• Implementations:

– http://taste.sourceforge.net/ (Java)

– http://code.google.com/p/openslopeone (PHP & MySQL)
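To make the weighted average concrete, here is a minimal sketch that predicts Fred's missing rating from the Jim/Mike/Fred table. It assumes the average difference between two songs is taken as (other song minus target song) over the users who rated both; the function name `slope_one_predict` and the dictionary layout are illustrative and not taken from Taste or openslopeone.

```python
# Ratings from the example table; Fred has not rated the second song.
ratings = {
    "Jim": {"That is it": 4, "Straight Through My Heart": 5},
    "Mike": {"That is it": 2, "Straight Through My Heart": 4},
    "Fred": {"That is it": 3},
}


def slope_one_predict(ratings, user, target):
    """Weighted Slope One prediction of `user`'s rating for `target`."""
    numerator = 0.0
    denominator = 0
    for other, user_rating in ratings[user].items():
        if other == target:
            continue
        # Differences (other - target) over users who rated both items.
        diffs = [
            r[other] - r[target]
            for name, r in ratings.items()
            if name != user and other in r and target in r
        ]
        if not diffs:
            continue
        avg_diff = sum(diffs) / len(diffs)
        # Each item pair is weighted by the number of users supporting it.
        numerator += len(diffs) * (user_rating - avg_diff)
        denominator += len(diffs)
    return numerator / denominator if denominator else None


prediction = slope_one_predict(ratings, "Fred", "Straight Through My Heart")
print(prediction)  # 4.5
```

Jim and Mike rate the second song 1 and 2 points higher than the first, so the average difference is 1.5 and Fred's predicted rating is 3 + 1.5 = 4.5.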


Similarity

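Similarity (cosine):

$$\cos(i, j) = \frac{R_{i,1} R_{j,1} + R_{i,2} R_{j,2} + \cdots + R_{i,m} R_{j,m}}{\sqrt{R_{i,1}^2 + R_{i,2}^2 + \cdots + R_{i,m}^2}\ \sqrt{R_{j,1}^2 + R_{j,2}^2 + \cdots + R_{j,m}^2}}$$

Here $R_{i,k}$ is the k-th entry of the rating vector of i, and m is the length of the vectors.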
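Cosine similarity is the dot product of two rating vectors divided by the product of their lengths. The following is a small sketch in plain Python; the function name and the zero-vector convention (return 0.0) are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention: a vector with no ratings is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])
orthogonal = cosine_similarity([1, 0], [0, 1])
print(same_direction, orthogonal)
```

Vectors pointing in the same direction score close to 1 regardless of their magnitude, which is why the measure suits ratings on different personal scales.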

SVD

Image copied from http://games.cs.ualberta.ca/~greiner/R/OLD-BiCluster/SVDApproaches.html

SVD In Image Compression

[Figure: the original image next to its SVD reconstructions with K=10 and K=20]

Process SVD

1. Define the original user-item matrix R, of size m x n, which holds the ratings of m users on n items. r_ij is the rating of user u_i on item i_j.

2. Preprocess R to eliminate all missing values.

3. Compute the SVD of R to obtain matrices U, S and V, of size m x m, m x n and n x n respectively, such that R = U * S * V^T.

4. Reduce the dimensionality by keeping only the k largest singular values on the diagonal of S, which gives a k x k matrix S_k. Correspondingly, U_k (m x k) keeps the first k columns of U and V_k^T (k x n) keeps the first k rows of V^T. The "reduced" user-item matrix is R' = U_k * S_k * V_k^T, and r'_ij denotes the rating of user u_i on item i_j in this reduced matrix.

5. Compute sqrt(S_k) and then two matrix products: U_k * sqrt(S_k)^T, which represents the m users, and sqrt(S_k) * V_k^T, which represents the n items, in the k-dimensional feature space. We are particularly interested in the latter matrix, of size k x n.

6. Run KNN on the user matrix and the item matrix, or multiply the two matrices to get each user's predicted rating for every item.
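Steps 3 to 6 can be sketched with NumPy as follows. The rating matrix below is made up for illustration and has no missing values, so step 2 is assumed to be done already; `reduce_rank` is a hypothetical helper name. Note that `np.linalg.svd(..., full_matrices=False)` returns the thin factorization, so U and V^T are smaller than the m x m and n x n matrices named in step 3, which does not affect the first k columns and rows that are kept.

```python
import numpy as np

# Hypothetical ratings of 4 users (rows) on 5 items (columns).
R = np.array([
    [5, 3, 4, 4, 2],
    [3, 1, 2, 3, 3],
    [4, 3, 4, 3, 5],
    [3, 3, 1, 5, 4],
], dtype=float)


def reduce_rank(R, k):
    """Return U_k (m x k), S_k (k x k) and V_k^T (k x n) for the top k singular values."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)  # s is sorted in descending order
    return U[:, :k], np.diag(s[:k]), Vt[:k, :]


k = 2
Uk, Sk, Vtk = reduce_rank(R, k)

# Step 4: the reduced user-item matrix R'.
R_reduced = Uk @ Sk @ Vtk

# Step 5: users and items in the k-dimensional feature space.
sqrt_Sk = np.sqrt(Sk)              # element-wise; Sk is diagonal
user_features = Uk @ sqrt_Sk.T     # m x k
item_features = sqrt_Sk @ Vtk      # k x n

# Step 6: multiplying the two gives every user's predicted rating on every item.
predicted = user_features @ item_features
print(np.round(predicted, 2))
```

The product of the two feature matrices equals R', so the feature matrices can be stored instead of the full reduced matrix and reused for nearest-neighbour search.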

Demo

From http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/

Which two people have the most similar tastes?

Which two seasons are the closest?

SVD

• SVD:

– MATLAB

– LAPACK, BLAS (Fortran): http://www.netlib.org/lapack/ and http://www.netlib.org/blas/

– NumPy, SciPy (Python): http://numpy.scipy.org/ and http://www.scipy.org/

– SVDLIBC, Meschach (C): http://tedlab.mit.edu/~dr/svdlibc/ and http://www.cs.uiowa.edu/~dstewart/meschach/

– http://en.wikipedia.org/wiki/Singular_value_decomposition

– ……

• KNN:

– MATLAB

– FLANN: http://www.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN

– ……

• All-in-one solution:

– DIVISI: http://divisi.media.mit.edu/

– ……

MAGIC DIVISI!

#!/usr/bin/env python
# coding=utf-8

import divisi
from divisi.cnet import *

data = divisi.SparseLabeledTensor(ndim=2)

# Read some ratings into data, for example:
# data[user_id, song_id] = 4

svd_result = data.svd(k=128)

# Get songs that the user may like:
# predict_features(svd_result, user_id).top_items(100)
# Get similar songs:
# feature_similarity(svd_result, song_id).top_items(100)
# Get users that have similar tastes:
# concept_similarity(svd_result, user_id).top_items(100)

Music Recommender Systems

• Data collection

• Data Cleaning

• Data Preprocessing

• Data Mining

• Tracking & Optimization

Data collection

• User ratings

• User collections

• User listening logs

• User viewing logs

• ….

Data Cleaning

• Missing data

• Wrong data

• Noise data

• Duplicate data

• ….

UserId  SongId  Times
3306    3654    200
3306    6950    236
3306    6528    268
3306    5874
3306    9527    foo
3306    5624    1000000
3306    9635    5
3306    6950    236
….      ….      ….
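A cleaning pass over rows like these might look as follows. This is a sketch under stated assumptions: the play counts arrive as raw strings, the upper bound `max_times` is a made-up threshold for rejecting implausible counts, and the function name `clean` is illustrative. A real pipeline would choose its own rules for what counts as noise.

```python
# Raw rows as in the table above: (user_id, song_id, times as text).
raw_rows = [
    (3306, 3654, "200"),
    (3306, 6950, "236"),
    (3306, 6528, "268"),
    (3306, 5874, ""),         # missing data
    (3306, 9527, "foo"),      # wrong data
    (3306, 5624, "1000000"),  # implausibly large
    (3306, 9635, "5"),
    (3306, 6950, "236"),      # duplicate
]


def clean(rows, max_times=10_000):
    """Drop missing, non-numeric, out-of-range and duplicate rows."""
    seen = set()
    cleaned = []
    for user_id, song_id, times in rows:
        text = "" if times is None else str(times).strip()
        if not text.isdigit():
            continue  # missing or non-numeric play count
        count = int(text)
        if count > max_times:
            continue  # assumed threshold, not from the slides
        key = (user_id, song_id)
        if key in seen:
            continue  # keep only the first row per (user, song)
        seen.add(key)
        cleaned.append((user_id, song_id, count))
    return cleaned


cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```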

Data Preprocessing

UserId  SongId  Times
3306    3654    200
3306    6950    236
3306    6528    268
3306    5874    325
3306    9527    126
3306    5624    98
3306    9635    115
3306    6962    210
….      ….      ….

UserId  SongId  Weight
3306    3654    0.62
3306    6950    0.73
3306    6528    0.82
3306    5874    1
3306    9527    0.39
3306    5624    0.30
3306    9635    0.35
3306    6962    0.65
….      ….      ….
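The weights in the second table are consistent with dividing each play count by the user's largest play count (325 for user 3306) and rounding to two decimals. The slides do not state the formula, so the following sketch is an inference from the numbers; `normalize` is an illustrative name.

```python
from collections import defaultdict

# (user_id, song_id, times) rows from the first table.
rows = [
    (3306, 3654, 200),
    (3306, 6950, 236),
    (3306, 6528, 268),
    (3306, 5874, 325),
    (3306, 9527, 126),
    (3306, 5624, 98),
    (3306, 9635, 115),
    (3306, 6962, 210),
]


def normalize(rows):
    """Scale each play count by the user's maximum so weights fall in (0, 1]."""
    max_times = defaultdict(int)
    for user_id, _, times in rows:
        max_times[user_id] = max(max_times[user_id], times)
    return [
        (user_id, song_id, round(times / max_times[user_id], 2))
        for user_id, song_id, times in rows
    ]


weighted_rows = normalize(rows)
for row in weighted_rows:
    print(row)
```

Normalizing per user keeps a heavy listener's raw counts from dominating those of a casual listener when the weights are later compared across users.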

Data Mining

UserId  SongId  Weight
3306    3654    0.62
3306    6950    0.73
3306    6528    0.82
3306    5874    1
3306    9527    0.39
3306    5624    0.30
3306    9635    0.35
3306    6962    0.65
….      ….      ….

UserId  Similar Users' IDs
….      ….

SongId  Similar Songs' IDs
….      ….
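One way to go from the weight table to a "similar songs" table is a nearest-neighbour lookup over per-song weight vectors. The sketch below is one possible approach rather than the method used in the talk: it compares songs by the cosine of their sparse user-weight vectors, and the rows for users 4001 and 4002 are invented so that there is more than one listener to compare.

```python
import math
from collections import defaultdict

rows = [
    (3306, 3654, 0.62), (3306, 6950, 0.73), (3306, 6528, 0.82),
    (4001, 3654, 0.60), (4001, 6950, 0.70),  # hypothetical user
    (4002, 6528, 1.00),                      # hypothetical user
]


def song_vectors(rows):
    """Map each song to a sparse vector {user_id: weight}."""
    vectors = defaultdict(dict)
    for user_id, song_id, weight in rows:
        vectors[song_id][user_id] = weight
    return vectors


def sparse_cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b[u] for u, w in a.items() if u in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_songs(song_id, vectors, top_n=10):
    """Return up to top_n (song_id, score) pairs, most similar first."""
    scores = [
        (other, sparse_cosine(vectors[song_id], vec))
        for other, vec in vectors.items()
        if other != song_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


vectors = song_vectors(rows)
neighbours = similar_songs(3654, vectors)
print(neighbours)
```

Swapping the roles of users and songs in `song_vectors` gives the "similar users" table in the same way.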

Tracking & Optimization

• Recommended result

• The user views and clicks what they like

• Store the user's clicks

• Data Mining

• Better recommendation

That's it, thanks. Q&A
