transferring semantic categories with vertex kernels: recommendations with semanticsvd++

TRANSFERRING SEMANTIC CATEGORIES WITH VERTEX KERNELS: RECOMMENDATIONS WITH SemanticSVD ++ MATTHEW ROWE SCHOOL OF COMPUTING AND COMMUNICATIONS @MROWEBOT | [email protected] International Semantic Web Conference 2014 Riva del Garda, Trento, Italy


DESCRIPTION

Presentation slides from the International Semantic Web Conference 2014

TRANSCRIPT

Page 1: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

TRANSFERRING SEMANTIC CATEGORIES WITH VERTEX KERNELS: RECOMMENDATIONS WITH SemanticSVD++

MATTHEW ROWE SCHOOL OF COMPUTING AND COMMUNICATIONS @MROWEBOT | [email protected] International Semantic Web Conference 2014 Riva del Garda, Trento, Italy

Page 2: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

Predicting Ratings

Items:    1    2    3
User 1:   8*   8*   2*
User 2:   9*   ?    1*
User 3:   9*   8*   1*

Induce Model and Predict Ratings:

Items:    1    2    3
User 1:   8*   8*   2*
User 2:   9*   8*   1*
User 3:   9*   8*   1*

Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++ 1

Page 3: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

Latent Factor Models: The Factor Consistency Problem

[Figure: the user-item rating matrix is factorised into F latent factors (F = #factors, set a priori); over Time a new factorisation is induced, and the new factors (marked '?') cannot be matched to the earlier ones]

• Cannot 'accurately' align latent factors
• Cannot tell how users' tastes have evolved

Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++ 2

Page 4: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

Solution: Semantic Categories

[Figure: each item i is mapped to its <URI> and its set of {<SKOS_CATEGORY>} entries, so user preferences are modelled over c semantic categories rather than latent factors; we track the preference for category c at time s, over Time]

Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++ 3 SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings. M Rowe. In the proceedings of the International Conference on Web Intelligence 2014. Warsaw, Poland. (2014)

Page 5: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

Problem: Cold-Start Categories

Page 6: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

Cold-start Categories

[Figure: items rated 9*, 8* and ? link via dcterms:subject to the semantic categories dbpedia:c1, dbpedia:c2, dbpedia:c3, dbpedia:c4 and dbpedia:c5; the categories of the unrated item (dbpedia:c4, dbpedia:c5) are Unrated Categories]

Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++ 5

Page 7: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

Outline

- Dataset + URI Alignment
- Quantification: Cold-Start Categories Problem
- Transferring Semantic Categories
- User Profiling
  - Tracking Taste Evolution
- SemanticSVD++
  - Incorporating Transferred Categories
- Evaluation
- Conclusions

Page 8: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

SPARQL Query for Candidate URIs from Movie's Title -> Get Semantic Categories of each Candidate -> Disambiguate based on Movie's Year

explicit factors. The web of linked data provided a resource for such information, where movies appear within the linked data cloud as Uniform Resource Identifiers (URIs) which, upon dereferencing, return information about the movie: director, year of release, actors, and the semantic categories in which the film has been placed. For instance, for the movie 'Alien' released in 1979, which we shall now use as a running example, the following categories are found:

<http://dbpedia.org/resource/Alien_(film)>
    dcterms:subject category:Alien_(franchise)_films ;
    dcterms:subject category:1979_horror_films .

In this work we use DBPedia URIs, given their relation to semantic (SKOS) categories. In order to provide such information, however, we require a link between a given item within one of our recommendation datasets and the URI that denotes that movie item, prompting the question: how can items be aligned to their semantic web URIs? Our method for semantic URI alignment functioned as follows: first, we used the SPARQL query from [8] to extract all films (instances of dbpedia-owl:Film) from DBPedia which contained a year within one of their categories:4

SELECT DISTINCT ?movie ?title WHERE {
  ?movie rdf:type dbpedia-owl:Film ;
         rdfs:label ?title ;
         dcterms:subject ?cat .
  ?cat rdfs:label ?year .
  FILTER langMatches( lang(?title), "EN" ) .
  FILTER regex(?year, "^[0-9]{4} film", "i")
}

Using the extracted mapping between the movie URI (?movie) and title (?title) we then identified the set of candidate URIs (C) based on performing fuzzy matches between a given item's title and the extracted title from DBPedia. Fuzzy matches were performed using the Levenshtein similarity metric (derived from the normalised reciprocal Levenshtein distance) and setting the similarity threshold to 0.9. We use fuzzy matches here due to the different forms of the movie titles and abbreviations within the datasets and linked data labels. After deriving the set of candidate URIs, we then dereferenced each URI and looked up its year to see if it appears within an associated category (i.e. ?movie dcterms:subject ?category). If the year of the movie item appears within a mapped category (?category) then we identified the given semantic URI as denoting the item. This disambiguation was needed here as multiple films can share the same title - this often happens with film remakes. This approach achieved coverage (i.e. proportion of items mapped) of 83% and 69% for MovieLens and MovieTweetings respectively; this reduced coverage for MovieTweetings is explained by the recency of the movies being reviewed and the lack of coverage of this on DBPedia at present.
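A minimal sketch of the alignment step just described, assuming the (URI, title) pairs have already been retrieved by the SPARQL query and that `categories(uri)` returns the SKOS category labels of a dereferenced URI; the function names are illustrative, not the authors' implementation.

```python
# Sketch of title-based URI alignment with year disambiguation (illustrative only).
def levenshtein_similarity(a: str, b: str) -> float:
    """Normalised reciprocal Levenshtein distance in [0, 1]."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1,
                         prev[j - 1] + (a[i - 1] != b[j - 1]))
        prev = cur
    return 1.0 - prev[n] / max(m, n) if max(m, n) else 1.0

def align_item(item_title, item_year, dbpedia_films, categories, threshold=0.9):
    # 1) fuzzy-match titles to build the candidate set C
    candidates = [uri for uri, title in dbpedia_films
                  if levenshtein_similarity(item_title.lower(), title.lower()) >= threshold]
    # 2) disambiguate: keep the candidate whose categories mention the item's year
    for uri in candidates:
        if any(item_year in cat for cat in categories(uri)):
            return uri
    return None  # item left unmapped (e.g. no Wikipedia/DBPedia entry)
```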

B. Reduced Datasets and the Hipster Dilemma

Based on our alignment of the movie items within each respective dataset with their semantic URIs within the web of linked data we derived new reduced datasets - as not all items can be mapped to URIs. We then further reduced the dataset based on two conditions, by only considering users who: (i) have posted ratings within the training and test sets; and (ii) have posted at least 10 ratings within the training and validation segment. The reason for the first condition is that in this work we do not consider cold-start situations; in the discussions section of the paper we explain, however, how the work can be extended to consider such a scenario. The reason for the second condition is that we need sufficient information to understand how a given user's tastes have evolved semantically; with only a few posts to go on we are limited in doing this - we expand on this reasoning in the following section. Table I demonstrates the extent to which the items, users and ratings have been reduced. There is a large reduction for MovieTweetings, where the nature of our approach means that we only consider 11% of users within the dataset. We also note that the reduction in the number of ratings is not as great; this suggests two things: (i) mapped items are popular, and thus dominate the ratings; and (ii) obscure items are present within the data.

4. We used a local copy of DBPedia 3.9 for our experiments: http://wiki.dbpedia.org/Downloads39

TABLE I. Statistics of the revised review datasets used for our analysis and experiments. Reduction over the original datasets shown in parentheses.

Dataset          #Users          #Items           #Ratings
MovieLens        5,390 (-11%)    3,231 (-12.1%)   841,602 (-6.7%)
MovieTweetings   2,357 (-89%)    7,913 (-30.8%)   73,397 (-38.2%)
Total            7,747           11,144           914,999

As Table I suggests, certain more 'obscure' movies do not have DBPedia URIs; despite our use of the most recent DBPedia datasets (i.e. version 3.9), coverage is still limited in certain places. The reason for this lack of coverage for certain items is largely that the film is too obscure to have a Wikipedia page. For instance, for the MovieLens dataset we fail to map the three movies 'Never Met Picasso', 'Diebinnen' and 'Follow the Bitch': despite these films having IMDB pages, they have no Wikipedia page, and hence no DBPedia entry. For the MovieTweetings dataset we fail to map 'Summer Coda' and 'The Potted Psalm', both of which, again, have IMDB pages but no Wikipedia page. We define the challenge of addressing the obscurity of movies, and thus their lack of a DBPedia entry, as the hipster dilemma. In the discussions section of the paper we outline several strategies for dealing with this in the future, given its presence not only in this work but in the prior work of Ostuni et al. [8], and its potential effect on future work within this domain.

IV. SEMANTIC TASTE EVOLUTION

Now that we have a mapping between the items within our movie recommendation datasets and their URIs on the web of linked data, we can examine how users have rated semantic categories in the past and how their tastes have evolved, at this semantic level, over time. From this point onwards we reserve the following special characters for set notations, as follows:

• u, v denote users, and i, j denote items.

• r denotes a known rating value (where r ∈ [1, 5] or r ∈ [1, 10]), and r̂ denotes a predicted rating value.

• Datasets are provided as quadruples of the form (u, i, r, t) ∈ D, where t denotes the time of the rating, and are segmented into training (D_train), validation (D_val) and test (D_test) sets by the above mentioned cutoff points.

Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++ 7

Dataset: MovieTweetings + URI Alignment

{(ItemID,<URI>)}

Training (first 80%), Validation (next 10%), Testing (final 10%)

Movie recommendation datasets provide numeric IDs for movie items, which users have rated, and assign metadata to these IDs: title, year, etc. As we shall explain below, the user profiling approach models users' affinity to semantic categories, therefore we need to align the numeric IDs of movie items to their semantic web URIs - to enable the semantic categories that the movie has been placed within to be extracted.4 In this work we use DBPedia SKOS categories as our semantic categories.

The URI alignment method functioned as follows: first, we used the SPARQL query from [7] to extract all films (instances of dbpedia-owl:Film) from DBPedia which contained a year within one of their categories.5 Using the extracted mapping between the movie URI (?movie) and title (?title) we then identified the set of candidate URIs (C) by performing fuzzy matches between a given item's title and the extracted title from DBPedia - using the normalised reciprocal Levenshtein distance and setting the similarity threshold to 0.9. We used fuzzy matches here due to the different forms of the movie titles and abbreviations within the dataset and linked data labels. After deriving the set of candidate URIs, we then dereferenced each URI and looked up its year to see if it appears within an associated category (i.e. ?movie dcterms:subject ?category). If the year of the movie item appears within a mapped category (?category) then we identified the given semantic URI as denoting the item. This disambiguation was needed here as multiple films can share the same title (i.e. film remakes). This approach achieved coverage (i.e. proportion of items mapped) of 69%: this reduced coverage is explained by the recency of the movies being reviewed and the lack of coverage of this on DBPedia at present. Table 1 presents the statistics of the dataset following URI alignment.

From this point on we use the following notations to aid comprehension: u, v denote users, i, j denote items, r denotes a known rating value (where r ∈ [1, 10]), r̂ denotes a predicted rating value, c denotes a semantic category that an item has been mapped to, and cats(i) is a convenience function that returns the set of semantic categories of item i.

Table 1. Statistics of the MovieTweetings dataset with the reduction from the original dataset shown in parentheses.

#Users            #Items           #Ratings          Ratings Period
14,749 (-22.5%)   7,913 (-30.8%)   86,779 (-25.9%)   [28-03-2013, 23-09-2013]

4. E.g. the 1979 Sci-Fi Horror film Alien is placed within the semantic categories of Alien (franchise) films and 1979 horror films, by the dcterms:subject predicate.

5. We used a local copy of DBPedia 3.9 for our experiments: http://wiki.dbpedia.org/Downloads39

"I rated Aliens 8/10 http://www.imdb.com/title/tt0133093/ #IMDb"!

https://github.com/sidooms/MovieTweetings

Page 9: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

Quantification: Cold-Start Categories Problem

[Figure: relative frequency distributions p(x) of the number of unreviewed (cold-start) categories per user, on a log-scaled x-axis (10^0 to 10^3); Validation split mean = 99, Test split mean = 138]

Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++ 8

Page 10: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

Transferring Semantic Categories: Vertex Kernels

[Figure: the semantic categories dbpedia:c1 to dbpedia:c5 (as in the cold-start figure) link, via dcterms:subject or an arbitrary predicate, to the objects dbpedia:_1, dbpedia:_2, dbpedia:_3, ..., dbpedia:_n]

Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++ 9

Page 11: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

Triple-Object Vectors of Categories

k(φ(c2), φ(c4)): the vertex kernel compares categories c2 and c4 via their triple-object vectors

Category Transfer Function

here should the semantic categories of the item not have been rated by the user previously: we term this issue the cold-start categories problem. To demonstrate the extent to which this issue occurs, consider the plots shown in Fig. 1. On the left subfigure there are two plots: the leftmost shows the association between the number of reviewed and unreviewed categories for each user when using the training dataset as background knowledge from which to form the profile of the user and the validation split as the unseen data; while the rightmost figure shows the same association but with the training and validation splits as background data and the test split as the unseen data. In this instance we see that, generally, the more categories a user has rated, the fewer categories they have missed. The right subfigure shows the relative frequency distribution of these missed categories across the validation and test splits, respectively. We find here that there is a heavy-tailed distribution, indicating that only a relatively small number of categories are missed, given the means, but that this occurs across many users. It is therefore evident that cold-start categories hold additional information that a recommendation approach should consider.

4.1 Category Transfer Function and Vertex Kernels

To overcome the problem of cold-start categories we can include a user's preferences for rated semantic categories, where such categories are similar to the unreviewed categories. This is where the utility of the semantic web becomes evident: as the semantic categories are related to one another via SKOS relations (i.e. broader, narrower, related, etc.), we can identify semantic categories that are similar to the unreviewed categories by how they are related within the linked data graph - in a similar vein to prior work [8, 2]:

Definition 1 (Linked Data Graph). Let $G = \langle V, E, L \rangle$ denote a linked data graph, where $c \in V$ is the set of concept vertices within the graph (i.e. resources, classes), $e_{cd} \in E$ is an edge, or link, connecting $c, d \in V$, and $\lambda(e_{cd}) \in L$ denotes a label of the edge - i.e. the predicate associating c with d.

Now, let C be the set of categories that a given user has rated previously, and D be the set of categories that a given item has been mapped to; then we define the Category Transfer function as follows:

$$f(C, D) = \{\arg\max_{c \in C} k(\phi(c), \phi(d)) : d \in D\} \qquad (1)$$

The codomain of the Category Transfer function is therefore a subset of the set of categories that the user has rated beforehand (i.e. $C' \subset C$).
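A small sketch of the Category Transfer function in Eq. 1. Here `vertex_kernel` stands in for whichever kernel k is chosen and `phi` for the category-to-feature-vector mapping; both names are placeholders, not identifiers from the paper.

```python
# Sketch of the Category Transfer function f(C, D) from Eq. 1 (illustrative).
# C: categories the user has rated; D: categories of the item;
# phi(c): feature vector of category c; vertex_kernel: the chosen kernel k.
def category_transfer(C, D, phi, vertex_kernel):
    transferred = set()
    for d in D:
        # pick the rated category most similar to the (possibly unrated) category d
        best = max(C, key=lambda c: vertex_kernel(phi(c), phi(d)))
        transferred.add(best)
    return transferred  # a subset C' of the user's rated categories
```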

[Figure: triple-object vectors φ(c2) and φ(c4) for categories dbpedia:c2 and dbpedia:c4 over the object concepts dbpedia:_1, dbpedia:_2, dbpedia:_3, ..., dbpedia:_n, with entries such as 0 and 0.5 weighted by the category's out-degree]

(a) Reviewed vs. Unreviewed Categories: scatter plots of the number of unreviewed categories against the number of reviewed categories per user, for the Validation and Test splits.

(b) Distribution of Unreviewed Categories: relative frequency distributions p(x) of the number of unreviewed categories (log-scaled x-axis; Validation mean = 23, Test mean = 26).

Fig. 1. Distribution of unreviewed categories across the dataset splits: the left plot showing the association between the number of reviewed and unreviewed categories for each user; and the right plot showing the relative frequency distribution of unreviewed categories within each split.

In the above function, the vertex kernel k computes the similarity between categories c and d: this is often between the feature vectors of the categories returned by the mapping function φ:

Definition 2 (Vertex Kernel). Given graph vertices u and v from a linked data graph G′, we define a vertex kernel (k) as a surjective function that maps the product of two vertices' attribute vectors into a real valued space, where φ(u) is a convenience function that returns kernel-specific attributes to be used by the function (i.e. an n-dimensional attribute vector of node u: $\phi(u) \in \mathbb{R}^n$). Hence:

$$k : V \times V \to \mathbb{R} \qquad (2)$$

$$k(\phi(u), \phi(v)) \mapsto x \qquad (3)$$

Given this formulation, we can vary the kernel function (k(., .)) to measure the similarity between arbitrary categories based on the topology of the linked data graph that surrounds them. All the kernels considered in this paper function over two nodes' feature vectors. Therefore, to derive the feature vector for a given category node (c), we include information about the objects that c is linked to within the linked data graph. Let <c ?p ?o> define a triple where c appears within the subject position. We can then populate a vector (x) based on the object concepts that c links to over 1-hop: $\phi_1 \in \mathbb{R}^n$ - where n denotes the dimensionality of the vector space. This can also be extended to n hops away from c by traversing edges away from c and collecting the objects within the traversed triples. Each element in the vector is weighted by the out-degree of the category node.

Vertex Kernel Function

Rated Categories

Categories of the Item

Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++ 10

Vary k(.,.): Cosine, Dice, Squared Euclidean, JS-Divergence
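A sketch of one of the kernels listed above (Cosine) over 1-hop triple-object vectors. The dictionary-based representation, the `triples(c)` accessor and the exact out-degree weighting are assumptions of this sketch, not the paper's implementation.

```python
import math
from collections import Counter

# Sketch: build a 1-hop triple-object vector for a category and compare two
# categories with a cosine vertex kernel (illustrative).
def phi(c, triples):
    """triples(c) is assumed to return the (predicate, object) pairs with c as subject."""
    objects = [o for _, o in triples(c)]
    out_degree = len(objects) or 1
    # each linked object weighted by the category's out-degree (an assumption)
    return Counter({o: 1.0 / out_degree for o in objects})

def cosine_kernel(x, y):
    dot = sum(x[k] * y.get(k, 0.0) for k in x)
    nx = math.sqrt(sum(v * v for v in x.values()))
    ny = math.sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0
```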

Page 12: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

User Profiling with Semantic Categories

Split the user's training ratings into 5 stages. For each stage: derive the user's average rating per semantic category, then calculate the probability of the user rating the category highly ($P^u_s$).


• c denotes a semantic category that an item has been mapped to, and cats(i) is a convenience function that returns the set of semantic categories of item i.

A. Semantic Taste Profiles

Semantic taste profiles describe the preferences that a user has at a given point in time for given semantic categories. We are interested in understanding how a profile at one point in time compares to a profile at an earlier point in time, in essence observing whether taste evolution has taken place. In recent work by McAuley and Leskovec [5] the assessment of user-specific evolution in the context of review platforms (e.g. BeerAdvocate and Beer Review) demonstrated the propensity of users to evolve based on their own 'personal clock'. This means that if we are to segment a user's lifetime (i.e. the time between first and last rating) in the recommendation dataset into discrete lifecycle periods where each period is the same width in time, then we will have certain periods with no activity in them: the user may go away from the system during the mid-point of their lifetime, and then return later. To counter this we divided each user's lifecycle into 5 stages where each stage contains the same number of reviews; we denote this 'activity-based lifecycle slicing'. Prior work has used 20 lifecycle stages [10], [5] to model user development, however we found this number to be too high as it dramatically reduced the number of users for whom we could mine taste evolution information - i.e. a greater number of stages requires more ratings.

To form a semantic taste profile for a given user we used the user's ratings distribution per semantic category within the allotted time window (provided by the lifecycle stage of the user as this denotes a closed interval - i.e. s = [t, t′], t < t′). We formed a discrete probability distribution for category c at time period s ∈ S (where S is the set of 5 lifecycle stages) by interpolating the user's ratings within the distribution. We first defined two sets, the former ($D^{u,s,c}_{train}$) corresponding to the ratings by u during period/stage s for items from category c, and the latter ($D^{u,s}_{train}$) corresponding to ratings by u during s, hence $D^{u,s,c}_{train} \subseteq D^{u,s}_{train}$; these sets are formed as follows:

$$D^{u,s,c}_{train} = \{(u, i, r, t) : (u, i, r, t) \in D_{train},\ t \in s,\ c \in cats(i)\} \qquad (1)$$

$$D^{u,s}_{train} = \{(u, i, r, t) : (u, i, r, t) \in D_{train},\ t \in s\} \qquad (2)$$

We then defined the function avrating to derive the average rating value from all rating quadruples in a given set:

$$avrating(D^{u,s}_{train}) = \frac{1}{|D^{u,s}_{train}|} \sum_{(u,i,r,t) \in D^{u,s}_{train}} r \qquad (3)$$

From these definitions we then derived the discrete probability distribution of the user rating the category favourably as follows, defining the set $C^{u,s}_{train}$ as containing all unique categories of items rated by u in stage s:

$$Pr(c \mid D^{u,s}_{train}) = \frac{avrating(D^{u,s,c}_{train})}{\sum_{c' \in C^{u,s}_{train}} avrating(D^{u,s,c'}_{train})} \qquad (4)$$

When implementing this approach, we only consider the categories that item URIs are directly mapped to; that is, only those categories that are connected to the URI by the dcterms:subject predicate. Prior work by Ostuni et al. [8] performed a mapping where grandparent categories were mapped to URIs, however we chose the parent categories in this instance to open up the possibility of other mappings in the future - i.e. via linked data node vertex kernels.
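A sketch of Eqs. 1-4 above: building the per-stage rating sets, the average-rating function and the discrete category preference distribution. The data structures (quadruple lists, a `cats` lookup) are illustrative assumptions.

```python
# Sketch of Eqs. 1-4: per-stage, per-category taste profile of a user (illustrative).
# ratings: list of (user, item, rating, time) quadruples from the training split;
# stage: a (start, end) time interval; cats(item): set of categories of the item.
def taste_profile(ratings, user, stage, cats):
    start, end = stage
    in_stage = [(u, i, r, t) for (u, i, r, t) in ratings
                if u == user and start <= t < end]                    # D^{u,s}_train
    def avrating(quads):
        return sum(r for _, _, r, _ in quads) / len(quads) if quads else 0.0
    categories = {c for _, i, _, _ in in_stage for c in cats(i)}      # C^{u,s}_train
    per_cat = {c: avrating([q for q in in_stage if c in cats(q[1])])  # D^{u,s,c}_train
               for c in categories}
    total = sum(per_cat.values())
    # Eq. 4: normalise the average ratings into a discrete distribution P^u_s(c)
    return {c: v / total for c, v in per_cat.items()} if total else {}
```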

B. User Taste Evolution: From Prior Taste Profiles

We now turn to looking at the evolution of users' tastes over time in order to understand how their preferences change. Given our use of probability distributions to model the lifecycle-stage-specific taste profile of each user, we can apply information theoretic measures based on information entropy. One such measure is conditional entropy: it enables one to assess the information needed to describe the taste profile of a user at one time step (Q) using his taste profile from the previous stage (P). A reduction in conditional entropy indicates that the user's taste profile is similar to that of his previous stage's profile, while an increase indicates the converse:

$$H(Q \mid P) = \sum_{x \in P,\, y \in Q} p(x, y) \log \frac{p(x)}{p(x, y)} \qquad (5)$$

We derived the conditional entropy over the 5 lifecycle periods in a pairwise fashion, i.e. H(P2|P1), ..., H(P5|P4), for each user, and plotted the curve of the mean conditional entropy in Figure 2 over each dataset's training split - including the 95% confidence intervals to show the variation in the conditional entropies. Figure 2 indicates that MovieLens users tend to diverge in their ratings and categories over time, given the increase in the mean curve towards later portions of the users' lifecycles; the same is also evident for MovieTweetings, however the increase is more gradual there. This means that the semantic taste profiles of users evolve away from their previous preferences, suggesting that they are less likely to follow their prior tastes and instead branch out to new semantic categories.
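A sketch of Eq. 5. How the joint distribution p(x, y) between consecutive stage profiles is estimated is not spelled out above, so the sketch assumes it is supplied directly; base-2 logarithms are a choice of this sketch.

```python
import math

# Sketch of Eq. 5: conditional entropy H(Q|P) of a later-stage profile Q given the
# earlier-stage profile P (illustrative; the joint p(x, y) is assumed precomputed).
def conditional_entropy(p_joint, p_x):
    """p_joint: {(x, y): p(x, y)}; p_x: {x: p(x)} marginal of the earlier stage."""
    h = 0.0
    for (x, y), pxy in p_joint.items():
        if pxy > 0 and p_x.get(x, 0) > 0:
            h += pxy * math.log(p_x[x] / pxy, 2)
    return h
```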

C. User Taste Evolution: Susceptibility to Global Influence

Our second analysis looks at the influence that users in general have on the taste profiles of individual users - modelling user-specific (local) taste development and global development as two different processes. We used transfer entropy to assess how the taste profile ($P_s$) of a user at one time step (s) has been influenced by (his own) local profile ($P_{s-1}$) and the global taste profile ($Q_{s-1}$) at the previous lifecycle stage (s-1). For the latter taste profile ($Q_{s-1}$), we formed a global probability distribution (as above for a single user) using all users who posted ratings within the time interval of stage s.

[Figure: a user's lifetime split into lifecycle stages 1, 2, ..., 5 over Time]

Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++ 11

Page 13: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

Taste Evolution over Rated Categories

[Plot: mean Conditional Entropy across Lifecycle Stages 1-5] Users diverge from prior tastes (increase)

[Plot: mean Transfer Entropy across Lifecycle Stages 1-5] Users are not influenced by global tastes (increase)

Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++ 12

Page 14: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

Putting it all together: SemanticSVD++!

Fig. 2. Conditional entropy between consecutive lifecycle stages (e.g. H(P2|P1)) across the datasets ((a) MovieLens, (b) MovieTweetings), together with the bounds of the 95% confidence interval for the derived means.

Now, assume that we have a random variable that describes the local categories that have been reviewed at the current stage ($Y_s$), a random variable of local categories at the previous stage ($Y_{s-1}$), and a third random variable of global categories at the previous stage ($X_{s-1}$); we then define the transfer entropy of one lifecycle stage to another as follows [11]:

$$T_{X \to Y} = H(Y_s \mid Y_{s-1}) - H(Y_s \mid Y_{s-1}, X_{s-1}) \qquad (6)$$

Using the above probability distributions we can calculate the transfer entropy based on the joint and conditional probability distributions given the values of the random variables from $Y_s$, $Y_{s-1}$ and $X_{s-1}$:

$$T_{X \to Y} = \sum_{y \in Y_s,\ y' \in Y_{s-1},\ x \in X_{s-1}} p(y, y', x) \log \frac{p(y \mid y', x)}{p(y \mid y')} \qquad (7)$$

We derived the transfer entropy between consecutive lifecycle periods, as with the conditional entropy above, to examine how the influence of global and local dynamics on users' taste profiles developed over time. Figure 3 plots the means of these values across the lifecycle periods together with the 95% confidence intervals. We find that for users of MovieLens transfer entropy decreases over time, indicating that global dynamics have a stronger influence on users' taste profiles towards later lifecycle stages. Such an effect is characteristic of users becoming more involved and familiar with the review system, and as a consequence paying more attention to others' ratings. With MovieTweetings we find a different effect: users' transfer entropy actually increases over time, indicating that users are less influenced by global taste preferences, and therefore the ratings of other users.
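A sketch of Eq. 7. As with conditional entropy, the estimation of the joint p(y, y′, x) over current local, previous local and previous global categories is not detailed here, so the sketch assumes it is given and derives the needed marginals from it.

```python
import math

# Sketch of Eq. 7: transfer entropy T_{X->Y} from the global profile to a user's
# profile, given an (assumed) joint distribution p(y, y', x).
def transfer_entropy(p_yyx):
    """p_yyx: {(y, y_prev, x_prev): probability}."""
    p_yy, p_y2, p_y2x = {}, {}, {}
    for (y, y2, x), p in p_yyx.items():                  # marginalise the joint
        p_yy[(y, y2)] = p_yy.get((y, y2), 0.0) + p
        p_y2[y2] = p_y2.get(y2, 0.0) + p
        p_y2x[(y2, x)] = p_y2x.get((y2, x), 0.0) + p
    t = 0.0
    for (y, y2, x), p in p_yyx.items():
        if p <= 0:
            continue
        cond_full = p / p_y2x[(y2, x)]                   # p(y | y', x)
        cond_local = p_yy[(y, y2)] / p_y2[y2]            # p(y | y')
        t += p * math.log(cond_full / cond_local, 2)
    return t
```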

V. SEMANTICSVD++

Examining the semantic taste evolution of users indicates two aspects: firstly, users tend to diverge away from their past rating behaviour; and secondly, users are susceptible to global influence in different ways - i.e. MovieLens users are more susceptible than MovieTweetings users. This leads to one of our research questions: how can semantic taste evolution be incorporated within a recommender system? In this section we address this by incorporating semantic taste information into a matrix factorisation model that we have named SemanticSVD++, an extension of Koren et al.'s earlier SVD++ model [2].

Fig. 3. Transfer entropy between consecutive lifecycle stages across the datasets ((a) MovieLens, (b) MovieTweetings), together with the bounds of the 95% confidence interval for the derived means.

The predictive function of the model is shown in full in Eq. 8; we now explain each component in greater detail.

$$\hat{r}_{ui} = \overbrace{\mu + b_i + b_u}^{\text{Static Biases}} + \overbrace{\alpha_i\, b_{i,cats(i)} + \alpha_u\, b_{u,cats(i)}}^{\text{Category Biases}} + \overbrace{q_i^{\top}\Big(p_u + |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} y_j + |cats(R(u))|^{-\frac{1}{2}} \sum_{c \in cats(R(u))} z_c\Big)}^{\text{Personalisation Component}} \qquad (8)$$
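A sketch of the prediction rule in Eq. 8; the latent vectors are plain NumPy arrays, the bias terms are assumed to have been precomputed as described in the following subsections, and R(u)/cats(R(u)) are assumed non-empty (users have at least 10 ratings).

```python
import numpy as np

# Sketch of Eq. 8: SemanticSVD++ rating prediction for user u and item i (illustrative).
def predict(mu, b_i, b_u, alpha_i, alpha_u, b_i_cats, b_u_cats,
            q_i, p_u, y, z, rated_items, rated_item_cats):
    """y[j], z[c]: f-dimensional latent vectors; rated_items = R(u);
    rated_item_cats = cats(R(u))."""
    static = mu + b_i + b_u
    category = alpha_i * b_i_cats + alpha_u * b_u_cats
    implicit = sum(y[j] for j in rated_items) / np.sqrt(len(rated_items))
    cat_factors = sum(z[c] for c in rated_item_cats) / np.sqrt(len(rated_item_cats))
    personal = q_i.dot(p_u + implicit + cat_factors)
    return static + category + personal
```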

A. Static Biases

The static biases include the general bias of the given dataset (μ), which is the mean rating score across all ratings within the training segment; the item bias ($b_i$); and the user bias ($b_u$). The item bias is the average deviation from the mean bias for the item i within the training segment, while the user bias is the average deviation from the mean bias over the training segment's ratings by user u.

B. Category Biases

The examination of user taste profiles demonstrated the evolution of users' tastes for different semantic categories over time. We can encapsulate such information within our recommendation model by including: (i) biases towards categories given general rating behaviour, and (ii) biases towards categories by a specific user. We begin with the former.

1) Item Biases Towards Categories: We model the biases that an item may have given the categories it has been linked to by capturing the proportional change in category ratings across the entire dataset - i.e. in general over the provided training portion. To do this we derived the development of all users' preferences for a given category c throughout the training segment, where $Q_s$ is the global taste profile (a discrete probability distribution over all categories) in stage s, and k is the number of stages back in the training segment from which either a monotonic increase or decrease in the probability of rating category c began (Eq. 9, shown on a later slide).

Modified version of SVD++ with:
• User taste evolution captured in semantic category biases
• Semantic personalisation component

Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++ 13

Page 15: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++


Transferring Categories using Vertex Kernels

General Category Biases

User Biases to Categories

$$b_{u,cats(i)} = \beta_k \overbrace{\frac{1}{|C \cap cats(i)|} \sum_{c \in \{cats(i) \cap C\}} Pr(+ \mid c, u)}^{\text{Prior Rated Categories}} \;+\; (1 - \beta_k) \overbrace{\frac{1}{|f_k(C,\, cats(i) \setminus C)|} \sum_{c \in f_k(C,\, cats(i) \setminus C)} Pr(+ \mid c, u)}^{\text{Transferred Categories}} \qquad (16)$$

Here we have $\beta_k$-weighted the influence of the transferred categories on the bias in order to assess the effects of the transferred categories on recommendation accuracy. In essence, $\beta_k$ forms one of our hyperparameters that we optimise when tuning the model over the validation set for a given vertex kernel (k). As $\beta_k \in [0, 1]$ we can assess its effect: a larger $\beta_k$ places more emphasis on known information, while a lower $\beta_k$ places more emphasis on the categories transferred by the given kernel (k). As the category biases are, in essence, static features we included two weights, one for each category bias, defined as $\alpha_i$ and $\alpha_u$ for the item biases to categories and the user biases to categories respectively - these weights are then learnt during the training phase of inducing the model.
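A sketch of the β_k-weighted user category bias from Eq. 16; `category_transfer` is the transfer function f_k sketched earlier and `pr_plus` stands for Pr(+|c, u), both assumed to be supplied.

```python
# Sketch of Eq. 16: blend the user's bias over the item's already-rated categories
# with the bias over categories transferred in by the vertex kernel (illustrative).
def user_category_bias(rated_cats, item_cats, pr_plus, category_transfer, beta_k):
    known = rated_cats & item_cats                    # cats(i) ∩ C
    unrated = item_cats - rated_cats                  # cats(i) \ C
    known_bias = (sum(pr_plus(c) for c in known) / len(known)) if known else 0.0
    transferred = category_transfer(rated_cats, unrated) if unrated else set()
    trans_bias = (sum(pr_plus(c) for c in transferred) / len(transferred)) if transferred else 0.0
    return beta_k * known_bias + (1 - beta_k) * trans_bias
```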

6.4 Personalisation Component

The personalisation component of the SemanticSVD++ model builds on the existing SVD++ model by Koren et al. [4] by including four latent factor vectors: $q_i \in \mathbb{R}^f$ denotes the f latent factors associated with the item i; $p_u \in \mathbb{R}^f$ denotes the f latent factors associated with the user u; $y_j \in \mathbb{R}^f$ denotes the f latent factors for item j from the set of items rated by user u: R(u); and we have defined a new vector $z_c \in \mathbb{R}^f$ which captures the latent factor vector, of f dimensions, for a given semantic category c. This latter component captures the affinity of semantic categories with latent factors: for instance the DBPedia category category:1970s_science_fiction_films will have a strong positive affinity with the latent factor corresponding to Science Fiction films.

6.5 Model Learning

In order to learn our recommendation model (item and user biases, category bias weights, latent factor vectors) we sought to minimise the following objective function (including L2-regularisation of parameters):

$$\min_{b_*, \alpha_*, p_*, q_*} \sum_{(u,i,t,r) \in D} (r_{ui} - \hat{r}_{ui})^2 + \lambda\big(b_i^2 + b_u^2 + \alpha_i^2 + \alpha_u^2 + \|q_i\|_2^2 + \|p_u\|_2^2\big)$$

Stochastic Gradient Descent (SGD) [1] was used to learn the parameters by first shuffling the order of the ratings within the training set, and then running through the ratings one at a time.
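A heavily simplified sketch of one SGD pass over the objective above: only the bias terms and the q_i, p_u factors are updated, and the updates for y_j, z_c and the α weights of the full model are omitted. This illustrates the training loop under those assumptions, not the authors' code.

```python
import random

# Simplified sketch of one SGD epoch for the regularised objective above (illustrative).
# b_i, b_u: dicts of item/user biases; q, p: dicts of NumPy factor vectors;
# predict(u, i): the current model's prediction r̂_ui.
def sgd_epoch(ratings, predict, b_i, b_u, q, p, eta=0.05, lam=0.01):
    random.shuffle(ratings)
    for (u, i, r, t) in ratings:
        err = r - predict(u, i)                  # r_ui - r̂_ui
        b_u[u] += eta * (err - lam * b_u[u])     # user bias step
        b_i[i] += eta * (err - lam * b_i[i])     # item bias step
        q_i_old = q[i].copy()
        q[i] += eta * (err * p[u] - lam * q[i])  # item factor step
        p[u] += eta * (err * q_i_old - lam * p[u])  # user factor step
```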

Vertex Kernel (k) can be varied

The change rate of the global preference for a category c, measured from the stage k at which its monotonic increase or decrease began, is:

$$\Delta_c = \frac{1}{4 - k} \sum_{s=k}^{4} \frac{Q_{s+1}(c) - Q_s(c)}{Q_s(c)} \qquad (9)$$

From this we then calculated the conditional probability of a given category being rated highly, by accounting for the change rate of rating preference for the category, as follows:

$$Pr(+ \mid c) = \overbrace{Q_5(c)}^{\text{Prior Rating}} + \overbrace{\Delta_c\, Q_5(c)}^{\text{Change Rate}} \qquad (10)$$

By averaging this over all categories of the item i we can calculate the evolving item bias from the provided training segment:

$$b_{i,cats(i)} = \frac{1}{|cats(i)|} \sum_{c \in cats(i)} Pr(+ \mid c) \qquad (11)$$
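A sketch of Eqs. 9-11: the change rate of the global preference for a category and the resulting item bias. The list-of-profiles representation, the `k_of(c)` lookup for the start of the monotonic run, and the small-epsilon guard against empty-probability divisions are assumptions of this sketch.

```python
# Sketch of Eqs. 9-11: item category bias from global taste development (illustrative).
def change_rate(Q, c, k):
    """Eq. 9. Q: list of global stage profiles indexed 1..5 (index 0 unused);
    k: stage where the monotonic increase/decrease in Q_s(c) began."""
    steps = [(Q[s + 1].get(c, 0.0) - Q[s].get(c, 0.0)) / max(Q[s].get(c, 0.0), 1e-9)
             for s in range(k, 5)]
    return sum(steps) / max(4 - k, 1)

def pr_plus(Q, c, k):
    return Q[5].get(c, 0.0) * (1 + change_rate(Q, c, k))   # Eq. 10

def item_category_bias(Q, item_cats, k_of):
    return sum(pr_plus(Q, c, k_of(c)) for c in item_cats) / len(item_cats)  # Eq. 11
```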

2) User Biases Towards Categories: In the previous section, we induced per-user discrete probability distributions that captured the probability of the user u rating a given category c highly during lifecycle stage s: $P^u_s(c)$. Given that users' tastes evolve, our goal is to estimate the probability of the user rating an item highly given its categories, by capturing how the user's preferences for each category have changed in the past (decaying or growing). To capture the development of a user's preference for a category we derived the average change rate ($\Delta^u_c$) over the k lifecycle periods coming before the final lifecycle stage in the training set. The parameter k is the number of stages back in the training segment from which either a monotonic increase or decrease in the probability of rating category c began. We define the change rate ($\Delta^u_c$) as follows:

$$\Delta^u_c = \frac{1}{4 - k} \sum_{s=k}^{4} \frac{P^u_{s+1}(c) - P^u_s(c)}{P^u_s(c)} \qquad (12)$$

In a similar vein, we also capture the influence of global dynamics on the user's taste profile. We found in the previous section that the transfer entropy of the users across all three platforms either reduced or increased as users progressed throughout their lifecycles, indicating an increase or decrease of influence by global taste dynamics respectively. We can capture such signals on a per-user basis by assessing the change in transfer entropy for each user over time and modelling this as a global influence factor $\gamma^u$. We derive this as follows, based on measuring the proportional change in transfer entropy starting from the lifecycle period k that produced a monotonic increase or decrease in transfer entropy:

$$\gamma^u = \frac{1}{4 - k} \sum_{s=k}^{4} \frac{T^{s+1|s}_{Q \to P} - T^{s|s-1}_{Q \to P}}{T^{s|s-1}_{Q \to P}} \qquad (13)$$

By combining the average change rate ($\Delta^u_c$) of the user highly rating a given category c with the global influence factor ($\gamma^u$), we then derived the conditional probability of a user rating a given category highly as follows, where $P^u_5$ denotes the taste profile of the user observed for the final lifecycle stage (5):

$$Pr(+ \mid c, u) = \overbrace{P^u_5(c)}^{\text{Prior Rating}} + \overbrace{\Delta^u_c\, P^u_5(c)}^{\text{Change Rate}} + \overbrace{\gamma^u\, Q_5(c)}^{\text{Global Influence}} \qquad (14)$$

Given that a single item can be linked to many categories on the web of linked data, we take the average across all categories as the bias of the user given the categories of the item:

$$b_{u,cats(i)} = \frac{1}{|cats(i)|} \sum_{c \in cats(i)} Pr(+ \mid c, u) \qquad (15)$$

Other schemes for calculating the biases towards categories (both item and user) could be used, e.g. choosing the maximum bias; however, we use the average as an initial scheme.

3) Weighting Category Biases: The above category biases are derived as static features within the recommendation model (Eq. 8), mined from the provided training portion; however, each user may be influenced by these factors in different ways when performing their ratings. To this end we included two weights, one for each category bias, defined as $\alpha_i$ and $\alpha_u$ for the item biases to categories and the user biases to categories respectively. As we will explain below, these weights are then learnt during the training phase of inducing the model.

C. Personalisation Component

The personalisation component of the SemanticSVD++ model builds on the existing SVD++ model by Koren et al. [2]. The modified model has four latent factor vectors: $q_i \in \mathbb{R}^f$ denotes the f latent factors associated with the item i; $p_u \in \mathbb{R}^f$ denotes the f latent factors associated with the user u; $y_j \in \mathbb{R}^f$ denotes the f latent factors for item j from the set of items rated by user u: R(u); and we have defined a new vector $z_c \in \mathbb{R}^f$ which captures the latent factor vector, of f dimensions, for a given semantic category c. We denote this latter, additional component as the category factors component, and its inclusion is based on the notion that semantic categories have a stronger affinity with certain factors; for instance the DBPedia category category:1970s_science_fiction_films will have a strong positive affinity with the latent factor corresponding to Science Fiction films, and this can be tailored to the user given that we derive a single vector for each semantic category of their rated items. By including this information we anticipated that additional cues for user preferences, functioning between the semantic categories and latent factors, would be captured. The latent factors are derived during learning and the number of factors to use is set as a hyperparameter in the model - as we shall explain below.

D. Model Learning and Hyperparameter Optimisation

To learn the parameters in our recommendation model (item and user biases, category bias weights, latent factor vectors) our goal was to minimise the following objective function, regularising model parameters to control for overfitting using L2-regularisation:

$$\min_{b_*, \alpha_*, p_*, q_*} \sum_{(u,i,t,r) \in D} (r_{ui} - \hat{r}_{ui})^2 + \lambda\big(b_i^2 + b_u^2 + \alpha_i^2 + \alpha_u^2 + \|q_i\|_2^2 + \|p_u\|_2^2\big)$$

To learn the parameters we used Stochastic Gradient Descent (SGD) [12], using the standard process of first shuffling the order of the ratings within the training set, and then running through the set of ratings one at a time.

Average change in Transfer Entropy of the User

Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++ 14

β controls the transfer of category ratings

Page 16: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

- Aim: Predict users' ratings of items
  - Minimise the Root Mean Square Error (RMSE)

1.  How does a semantic model perform against existing MF-based approaches?

2.  Does transferring categories actually reduce error?

Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++ 15

Experiments

Page 17: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

Experimental Setup

- Tested four models (trained using Stochastic Gradient Descent)
  - SVD and SVD++ (baselines)
  - SB-SVD++: SVD++ with Semantic Category Biases
  - S-SVD++: SB-SVD++ with the personalisation component
- Tuned hyperparameters over the validation split:
  - All models: learning rate, regularisation weight, #factors
  - Semantic models: transfer parameter β
- Model testing:
  - Trained models using both training+validation splits
  - Applied to held-out final 10% of reviews
- Evaluation measure: Root Mean Square Error

Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

16

Page 18: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

Results: Ratings Prediction Error

Tuned β gives preference to known rated categories over transferred ones

16Gb RAM and 2Tb disk space) - i.e. optimising the model's parameters with SGD given the hyperparameters and reporting the error over the validation split.

7.2 Results: Ratings Prediction Error

We now report on the results of forecasting the ratings within the test set based on the optimised models following hyperparameter tuning. Table 4 presents the RMSE values that we achieved. In order to assess for chance effects we performed significance testing using the Mann-Whitney test to assess for differences in location between the SVD++ baseline model and each of the proposed models (with different kernels) - after randomly splitting the test segment into 25 folds and macro-averaging the RMSE.7 We find that for all models we achieved a statistically significant reduction in RMSE over the baseline - with the significance probability levels indicated. The results also indicate that the inclusion of transferred categories reduces prediction error over the use of no vertex kernel, thereby suggesting that the use of prior rating information from related categories boosts performance.
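A sketch of the significance-testing procedure described above: split the test predictions into 25 random folds, compute per-fold RMSEs for the baseline and a proposed model, and compare them with a Mann-Whitney test (scipy's `mannwhitneyu`); the exact fold construction is an assumption.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Sketch of the evaluation above: per-fold RMSEs over a random 25-fold split of the
# test set, compared with a Mann-Whitney test (illustrative; inputs are NumPy arrays).
def fold_rmses(y_true, y_pred, n_folds=25, seed=0):
    idx = np.random.default_rng(seed).permutation(len(y_true))
    folds = np.array_split(idx, n_folds)
    return [float(np.sqrt(np.mean((y_true[f] - y_pred[f]) ** 2))) for f in folds]

def compare(y_true, pred_baseline, pred_model):
    rmse_base = fold_rmses(y_true, pred_baseline)
    rmse_model = fold_rmses(y_true, pred_model)
    # one-sided test: does the proposed model have lower per-fold RMSE?
    stat, p = mannwhitneyu(rmse_model, rmse_base, alternative='less')
    return np.mean(rmse_model), np.mean(rmse_base), p
```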

We find that the Cosine kernel performs best over both SVD++ with semantic biases and SemanticSVD++, in each case with a higher β_k weighting. Under this weighting scheme, a higher β_k places more emphasis on the item's categories that the user has previously rated, rather than transferring in ratings to cover the unreviewed categories. We find varying levels across the other kernels where, aside from the JS-Divergence kernel, the optimised β_k places more emphasis on using rated semantic categories that the item is aligned to.

Table 4. Root Mean Square Error (RMSE) results, with each model's best kernel highlighted in bold and the p-value of the Mann-Whitney test against the baseline marked.

Model       Kernel              Tuned Parameters                               RMSE
SVD         -                   λ = 0.001, η = 0.1, f = 50                     1.786
SVD++       -                   λ = 0.01, η = 0.05, f = 100                    1.591
SB-SVD++    -                   λ = 10^-5, η = 0.05, f = 100                   1.590*
            Cosine              λ = 10^-5, η = 0.05, f = 20, βk = 0.9          1.588***
            Dice                λ = 0.001, η = 0.05, f = 20, βk = 0.7          1.589**
            Squared-Euclidean   λ = 10^-5, η = 0.05, f = 20, βk = 0.6          1.589**
            JS-Divergence       λ = 0.01, η = 0.05, f = 50, βk = 0.3           1.590*
S-SVD++     -                   λ = 0.001, η = 0.05, f = 20                    1.590*
            Cosine              λ = 0.01, η = 0.05, f = 5, βk = 0.8            1.588***
            Dice                λ = 0.001, η = 0.05, f = 20, βk = 0.9          1.590*
            Squared-Euclidean   λ = 0.05, η = 0.05, f = 5, βk = 0.7            1.590*
            JS-Divergence       λ = 10^-4, η = 0.05, f = 10, βk = 0.8          1.589**

Significance codes: p-value < 0.001 '***', < 0.01 '**', < 0.05 '*', < 0.1 '.'

7. N.b. all tested models significantly outperformed SVD at p < 0.001, so we do not report the different p-values here.

Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

17

Page 19: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

Conclusions

- Users' taste evolution can be tracked with semantic categories
- Vertex kernels overcome the cold-start categories problem
- Significant reduction in error:
  - Over baselines, and when transferring semantic categories
- Future Work:
  1. Vertex kernels based on linked data graph traversals
  2. Ranked-loss objectives (i.e. top-k models)

Page 20: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++

@mrowebot | [email protected] http://www.lancaster.ac.uk/staff/rowem/

Questions?