Download - Movie Recommendation with DBpedia - IIR 2012

3rd Italian Information Retrieval Workshop (IIR 2012) – Bari January 26, 2012

MOVIE RECOMMENDATION WITH DBPEDIA

Politecnico di Bari

Via Orabona, 4

70125 Bari (ITALY)

Roberto Mirizzi, Tommaso Di Noia, Azzurra Ragone, Vito Claudio Ostuni, Eugenio Di Sciascio


Outline

DBpedia: a nucleus for a Web of Open Data
    Social knowledge bases for similarity detection

Semantic Vector Space Model
    Vector Space Model adapted to RDF graphs

MORE: More than Movie Recommendation
    Content-based recommendation in action

Evaluation
    Precision and Recall experiments with MovieLens

Conclusion


What is Linked Data?

Linked Data is about using the Web to connect related data that wasn't previously linked, or using the Web to lower the barriers to linking data currently linked using other methods. More specifically, Wikipedia defines Linked Data as “a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.” [www.linkeddata.org]


DBpedia: a Nucleus for a Web of Data (i)


DBpedia: a Nucleus for a Web of Data (ii)

Let’s use all this knowledge to build smarter content-based recommender systems.

The DBpedia knowledge base currently describes more than 3.64 million things, highly interconnected in the RDF graph.


Social KBs for similarity detection

[Figure: graph fragment around Ocean’s Eleven and Ocean’s Twelve, with the nodes George Clooney, Brad Pitt, Steven Soderbergh, Catherine Zeta-Jones, 2000s crime films, American criminal comedy films, Crime films, and Crime.]


Semantic Vector Space Model (i)

[http://en.wikipedia.org/wiki/File:Vector_space_model.jpg]

Quick recap on the Vector Space Model

The Vector Space Model is an algebraic model that represents both text documents and queries as vectors of index terms. The term weights $w_{t,d}$ are positive and non-binary.

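$$\vec{v}_d = [w_{1,d}, w_{2,d}, \ldots, w_{N,d}]^T$$

$$w_{t,d} = tf_{t,d} \cdot idf_t$$

$$tf_{t,d} = \frac{n_{t,d}}{\sum_k n_{k,d}}$$

$$idf_t = \log \frac{|D|}{|\{d' \in D : t \in d'\}|}$$

$$sim(d_j, q) = \frac{\vec{d}_j \cdot \vec{q}}{\|\vec{d}_j\| \, \|\vec{q}\|} = \frac{\sum_{i=1}^{N} w_{i,j} \, w_{i,q}}{\sqrt{\sum_{i=1}^{N} w_{i,j}^2} \, \sqrt{\sum_{i=1}^{N} w_{i,q}^2}}$$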
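The recap formulas can be tried out on a toy corpus. The following is a minimal sketch, not the authors' code: the three token lists are invented for illustration, and the base-10 logarithm is our choice, since the recap does not fix a base.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document, with w = tf * idf."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())
        vectors.append({
            term: (n / total) * math.log10(n_docs / df[term])
            for term, n in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented toy documents (assumption, for illustration only).
docs = [
    ["crime", "heist", "casino"],
    ["crime", "heist", "europe"],
    ["romance", "paris"],
]
vecs = tf_idf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])  # the documents share two terms
sim_02 = cosine(vecs[0], vecs[2])  # the documents share no terms
```

Documents that share rare terms score higher than documents that share only common ones, because a term that occurs in every document gets an idf of zero.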


Semantic Vector Space Model (ii)

Each resource (movie) is expressed as a tensor in a multi-dimensional space where each dimension corresponds to a specific property of the considered datasets (e.g., starring, subject/broader, director, genre, …)

[Figure: RDF graph in which Ocean’s Eleven and Ocean’s Twelve are linked, through the properties starring, director, subject/broader and genre, to the resources George Clooney, Brad Pitt, Catherine Zeta-Jones, Steven Soderbergh, 2000s crime films, Crime films, American criminal…, and Crime. The same resources appear again as axis labels.]

Vector Space Model applied to RDF graphs

[Figure: the starring slice, with the movies Ocean’s Eleven and Ocean’s Twelve on one axis and the actors George Clooney, Brad Pitt and Catherine Zeta-Jones on the other.]
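The idea of splitting the RDF graph into one slice per property can be sketched as follows. This is a minimal illustration under our own assumptions: the triples are hand-written rather than fetched from DBpedia, and the function names `slice_by_property` and `binary_matrix` are hypothetical.

```python
from collections import defaultdict

# Hand-written (subject, property, object) triples; a real system would
# read them from DBpedia. Only starring and director are shown here.
triples = [
    ("Ocean's Eleven", "starring", "George Clooney"),
    ("Ocean's Eleven", "starring", "Brad Pitt"),
    ("Ocean's Eleven", "director", "Steven Soderbergh"),
    ("Ocean's Twelve", "starring", "George Clooney"),
    ("Ocean's Twelve", "starring", "Brad Pitt"),
    ("Ocean's Twelve", "starring", "Catherine Zeta-Jones"),
    ("Ocean's Twelve", "director", "Steven Soderbergh"),
]


def slice_by_property(triples):
    """Group triples into one {movie: set(objects)} mapping per property."""
    slices = defaultdict(lambda: defaultdict(set))
    for subject, prop, obj in triples:
        slices[prop][subject].add(obj)
    return slices


def binary_matrix(property_slice):
    """Turn one slice into a movies-by-resources 0/1 matrix."""
    rows = sorted(property_slice)
    columns = sorted({obj for objs in property_slice.values() for obj in objs})
    matrix = [
        [1 if col in property_slice[row] else 0 for col in columns]
        for row in rows
    ]
    return rows, columns, matrix


slices = slice_by_property(triples)
rows, columns, matrix = binary_matrix(slices["starring"])
```

Each row of such a matrix is the raw material for one movie vector in that property's space, which can then be weighted and compared independently of the other properties.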


Semantic Vector Space Model (iii)

STARRING

George Clooney [gc] (38 movies)

Catherine Z. Jones [czj] (22 movies)

Brad Pitt [bp] (35 movies)

Ocean’s Eleven [o11] (13 actors)

Ocean’s Twelve [o12] (15 actors)

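$$w_{actor_x, movie_y} = tf_{actor_x, movie_y} \cdot idf_{actor_x}$$

$$sim^{starring}(o_{12}, o_{11}) = \frac{w_{gc,o_{12}} \cdot w_{gc,o_{11}} + w_{czj,o_{12}} \cdot w_{czj,o_{11}} + w_{bp,o_{12}} \cdot w_{bp,o_{11}}}{\sqrt{w_{gc,o_{12}}^2 + w_{czj,o_{12}}^2 + w_{bp,o_{12}}^2} \cdot \sqrt{w_{gc,o_{11}}^2 + w_{bp,o_{11}}^2}}$$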


Semantic Vector Space Model (iv)

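$$w_{gc,o_{11}} = tf_{gc,o_{11}} \cdot idf_{gc} = \frac{1}{13} \cdot \log\frac{49184}{38} = 0.239$$

$$w_{gc,o_{12}} = tf_{gc,o_{12}} \cdot idf_{gc} = \frac{1}{15} \cdot \log\frac{49184}{38} = 0.207$$

$$w_{czj,o_{11}} = tf_{czj,o_{11}} \cdot idf_{czj} = 0 \cdot \log\frac{49184}{22} = 0$$

$$w_{czj,o_{12}} = tf_{czj,o_{12}} \cdot idf_{czj} = \frac{1}{15} \cdot \log\frac{49184}{22} = 0.223$$

$$w_{bp,o_{11}} = tf_{bp,o_{11}} \cdot idf_{bp} = \frac{1}{13} \cdot \log\frac{49184}{35} = 0.242$$

$$w_{bp,o_{12}} = tf_{bp,o_{12}} \cdot idf_{bp} = \frac{1}{15} \cdot \log\frac{49184}{35} = 0.210$$

$$\alpha_{starring} \cdot sim^{starring}(o_{12}, o_{11}) + \alpha_{genre} \cdot sim^{genre}(o_{12}, o_{11}) + \alpha_{subject} \cdot sim^{subject}(o_{12}, o_{11}) + \ldots = sim(o_{12}, o_{11})$$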
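The worked numbers for the starring property can be reproduced in a few lines. This is a sketch under stated assumptions: we read 49184 as the total number of movies considered, use a base-10 logarithm (which matches the weights on the slide), and restrict the vectors to the three actors shown, even though the two movies have 13 and 15 actors.

```python
import math

TOTAL_MOVIES = 49184  # taken from the slide; assumed to be the movie count
movies_per_actor = {"gc": 38, "czj": 22, "bp": 35}
actors_per_movie = {"o11": 13, "o12": 15}
# Which of the three actors star in which movie.
cast = {"o11": {"gc", "bp"}, "o12": {"gc", "czj", "bp"}}


def weight(actor, movie):
    """w = tf * idf, with tf = 1 / (actors in the movie) if the actor stars in it."""
    tf = 1 / actors_per_movie[movie] if actor in cast[movie] else 0.0
    idf = math.log10(TOTAL_MOVIES / movies_per_actor[actor])
    return tf * idf


def starring_similarity(m1, m2):
    """Cosine similarity between the starring vectors of two movies."""
    actors = sorted(movies_per_actor)
    v1 = [weight(a, m1) for a in actors]
    v2 = [weight(a, m2) for a in actors]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(y * y for y in v2))
    return dot / (norm1 * norm2)


sim_starring = starring_similarity("o12", "o11")
```

With these inputs the starring similarity between the two movies comes out at roughly 0.80. The value is below 1 only because Catherine Zeta-Jones appears in one cast and not the other.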


MORE: More than Movie Recommendation

http://apps.facebook.com/movie-recommendation/

MORE is a Facebook application that semantically recommends movies to the user leveraging the knowledge within DBpedia. MORE supports the user in exploratory browsing tasks by guiding their search through a semantic knowledge space. Similarities between movies are computed by a Semantic version of the classical Vector Space Model (sVSM), applied to semantic datasets.


Semantic Content-based Recommender

Given a user profile, defined as:

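$$profile(u) = \{ m_j \mid u \text{ likes } m_j \}$$

We compute a similarity between $m_i$ and the information encoded in $profile(u)$:

$$r(u, m_i) = \frac{\sum_{m_j \in profile(u)} \frac{1}{P} \sum_{p} \alpha_p \cdot sim^p(m_j, m_i)}{|profile(u)|}$$

Here $P$ is the number of properties and $\alpha_p$ is the weight of property $p$.

If this similarity is greater than or equal to 0.5, we suggest the movie $m_i$ to the user $u$.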
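The scoring rule can be sketched as follows. This is an illustration, not the MORE implementation: the per-property similarities, the property weights and the title "Some other movie" are invented values, and the function names are hypothetical.

```python
def combined_similarity(sims_by_property, alphas):
    """(1/P) * sum over properties p of alpha_p * sim^p, for one movie pair."""
    return sum(alphas[p] * sims_by_property[p] for p in alphas) / len(alphas)


def score(profile, candidate, pair_sims, alphas):
    """r(u, m_i): mean combined similarity between the candidate and liked movies."""
    if not profile:
        return 0.0
    total = sum(
        combined_similarity(pair_sims[(liked, candidate)], alphas)
        for liked in profile
    )
    return total / len(profile)


def recommend(profile, candidates, pair_sims, alphas, threshold=0.5):
    """Keep the candidates whose score reaches the threshold."""
    return [
        m for m in candidates
        if score(profile, m, pair_sims, alphas) >= threshold
    ]


# Invented property weights and similarities (assumptions, for illustration).
alphas = {"starring": 0.9, "director": 0.6, "subject": 1.0}
pair_sims = {
    ("Ocean's Eleven", "Ocean's Twelve"):
        {"starring": 0.8, "director": 1.0, "subject": 0.6},
    ("Ocean's Eleven", "Some other movie"):
        {"starring": 0.0, "director": 0.0, "subject": 0.3},
}
profile = ["Ocean's Eleven"]
candidates = ["Ocean's Twelve", "Some other movie"]
recommended = recommend(profile, candidates, pair_sims, alphas)
```

In this toy run only Ocean's Twelve passes the 0.5 threshold, with a score of 0.64 against 0.1 for the unrelated title.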


Training the system

In order to identify the best possible values for the coefficients $\alpha_p$ (i.e., the weights associated with each property $p$), we train the system via a genetic algorithm, adopting an N-fold cross-validation approach (with N = 5) on the MovieLens 100k dataset. At the end we obtain a set $A_p = \{\alpha_p^1, \ldots, \alpha_p^5\}$ of 5 different values for each $\alpha_p$.

Then we evaluate performance with standard precision and recall tests, when $\alpha_p$ is one of the following:

$min(A_p)$, $max(A_p)$, $avg(A_p)$, $median(A_p)$, $lowestError(A_p)$
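The five candidate values for one property weight can be derived from the per-fold results as sketched below. The fold values and errors are invented, and reading lowestError as "the value learned in the fold with the lowest misclassification error" is our interpretation; the genetic algorithm itself is not shown.

```python
import statistics

# Invented results of 5-fold training for one property:
# (weight learned in the fold, misclassification error of that fold).
fold_results = [
    (0.62, 0.21),
    (0.70, 0.18),
    (0.55, 0.25),
    (0.66, 0.19),
    (0.73, 0.22),
]


def candidate_values(fold_results):
    """Summarise the per-fold weights into the five candidates to evaluate."""
    weights = [w for w, _ in fold_results]
    best_weight, _ = min(fold_results, key=lambda result: result[1])
    return {
        "min": min(weights),
        "max": max(weights),
        "avg": statistics.mean(weights),
        "median": statistics.median(weights),
        "lowestError": best_weight,
    }


candidates_for_property = candidate_values(fold_results)
```

Each of the five entries would then be plugged in as the property weight and scored with precision and recall.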


Evaluation: Precision & Recall

$$P@N = \frac{|Rec@N \cap TestSet|}{N}$$

$$R@N = \frac{|Rec@N \cap TestSet|}{|TestSet|}$$
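Both measures are straightforward to compute from a ranked recommendation list. The sketch below uses invented movie identifiers, and `precision_recall_at_n` is a hypothetical helper name.

```python
def precision_recall_at_n(ranked, test_set, n):
    """Return (P@N, R@N) for a ranked list of recommendations."""
    top_n = set(ranked[:n])
    hits = len(top_n & set(test_set))
    precision = hits / n
    recall = hits / len(test_set) if test_set else 0.0
    return precision, recall


# Invented identifiers (assumption, for illustration only).
ranked = ["m1", "m2", "m3", "m4", "m5", "m6", "m7"]
test_set = {"m2", "m5", "m9"}

p3, r3 = precision_recall_at_n(ranked, test_set, 3)
p5, r5 = precision_recall_at_n(ranked, test_set, 5)
```

Precision divides by N while recall divides by the size of the test set, so longer lists tend to trade precision for recall.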

The figure shows high values of precision and recall for $N \in \{3, 4, 5, 6, 7\}$. The best values are obtained by choosing, for each coefficient $\alpha_p$ (the weight of property $p$), the value in $A_p$ with the lowest misclassification error.

We also evaluated the importance of the subject/broader property. The information carried by this property is peculiar to ontological datasets. As shown in the figure, performance decreases drastically if we do not consider this property.


Conclusion & Future directions

The huge amount of data available on DBpedia can be successfully exploited to build content-based recommender systems.

We have presented MORE, a Facebook application that leverages the knowledge within DBpedia to produce movie recommendations by means of a semantic version of the classical vector space model (sVSM).

Evaluation against historical datasets and high values of precision and recall prove the validity of our approach.

We are currently working on:

Testing the approach with different domains

Improving the recommendation with a hybrid approach (content-based and collaborative filtering)

We acknowledge partial support of HP IRP 2011. Grant CW267313.


Q? A!
