wise2017 - factorization machines leveraging lightweight linked open data-enabled features for top-n...

Post on 22-Jan-2018


TRANSCRIPT

Factorization Machines Leveraging Lightweight Linked Open Data-enabled Features for Top-N Recommendations

Guangyuan Piao, John G. Breslin
Insight Centre for Data Analytics, National University of Ireland Galway

The 18th International Conference on Web Information Systems Engineering (WISE 2017), Moscow, Russia, 7-10 October 2017

Background


Linked Open Data (LOD) provides domain knowledge and rich information about items for content-based recommender systems

[source]: http://lod-cloud.net

•  DBpedia:
   •  a first-class citizen in the LOD cloud
   •  structured information extracted from Wikipedia
   •  4.58 million things, including 1,445,000 persons, 87,000 films, etc.


Background Knowledge from DBpedia

(Figure: example DBpedia categories such as Chase_films, Auto_racing_films …)

•  Knowledge is represented as SPO triples
   •  SPO: Subject → Property → Object
•  Knowledge is freely accessible via a public SPARQL Endpoint
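Since DBpedia knowledge is a set of SPO triples behind a public SPARQL Endpoint, fetching an item's outgoing triples amounts to one SELECT query. The sketch below only builds that query text (sending it to an endpoint such as https://dbpedia.org/sparql is left out); the helper name `property_object_query` is hypothetical and not taken from the slides.

```python
# Hypothetical helper: build a SPARQL query listing every (property, object)
# pair whose subject is a given DBpedia resource. No network access is done here.
DBR = "http://dbpedia.org/resource/"


def property_object_query(resource_name: str, limit: int = 100) -> str:
    """Return a SPARQL SELECT query for all triples <resource> ?p ?o."""
    subject = f"<{DBR}{resource_name}>"
    return (
        "SELECT ?p ?o WHERE {\n"
        f"  {subject} ?p ?o .\n"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(property_object_query("The_Godfather"))
```

Swapping the triple pattern to `?s ?p <resource>` would instead return the incoming (subject, property) pairs.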

(Figure: an example DBpedia triple labelled (Subject), (Property) and (Object); the property shown is musicComposer)

(Some) Related Work

•  Semantic similarity/distance measures
   •  [Passant et al. ISWC’10, AAAI’10]
   •  [Piao et al. SAC’16]
•  Graph-based algorithms such as PageRank
   •  [Musto et al. UMAP’16]
   •  [Nguyen et al. WWW’15]
•  Machine learning approaches
   •  [Di Noia et al. RecSys’12]: VSM + SVM classifier
   •  [Di Noia et al. TIST’16]: semantic paths + learning-to-rank (SPRank)


(Figure: pipeline of existing approaches: user-item interactions and item background knowledge from a SPARQL Endpoint → build a graph (Combined Graph) → extract features → feed to algorithms)


Proposed Approach: Features

•  Using lightweight LOD features from DBpedia
   •  lightweight: obtained directly via the SPARQL Endpoint
•  LOD features
   •  Property-Object list (PO)
   •  Subject-Property list (SP)
   •  PageRank score (PR)

(Figure: user-item interactions and item background knowledge from the SPARQL Endpoint are fed directly to the algorithms; example resource dbr:The_Godfather linked to dbr:Francis_Ford_Coppola, dbr:Carlo_Savina, dbr:The_Godfather_Returns and dbc:Gangster_films via dbo:director, dbo:knownFor, dbo:series and dc:subject)
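To make the PO and SP lists concrete, here is a small sketch over an in-memory list of triples: PO collects (property, object) pairs where the item is the subject, SP collects (subject, property) pairs where the item is the object. The triples, and in particular their directions, are illustrative assumptions based on the labels in the figure, not data retrieved from DBpedia; `po_sp_features` is a hypothetical helper. The PR feature would be a per-item score looked up separately and is not computed here.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Illustrative triples only; the edge directions are assumed for this example.
TRIPLES: List[Triple] = [
    ("dbr:The_Godfather", "dbo:director", "dbr:Francis_Ford_Coppola"),
    ("dbr:The_Godfather", "dc:subject", "dbc:Gangster_films"),
    ("dbr:Carlo_Savina", "dbo:knownFor", "dbr:The_Godfather"),
    ("dbr:The_Godfather_Returns", "dbo:series", "dbr:The_Godfather"),
]


def po_sp_features(item: str, triples: Iterable[Triple]) -> Tuple[Set[str], Set[str]]:
    """Return (PO, SP) feature names for one item.

    PO: "property|object" for triples where the item is the subject.
    SP: "subject|property" for triples where the item is the object.
    """
    po: Set[str] = set()
    sp: Set[str] = set()
    for s, p, o in triples:
        if s == item:
            po.add(f"{p}|{o}")
        if o == item:
            sp.add(f"{s}|{p}")
    return po, sp


if __name__ == "__main__":
    po, sp = po_sp_features("dbr:The_Godfather", TRIPLES)
    print(sorted(po))
    print(sorted(sp))
```

Each distinct PO or SP string then becomes one column of the item's feature vector.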

Proposed Approach: Algorithms

•  Factorization Machines (FMs)
•  Optimization: Bayesian Personalized Ranking (BPR)
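A second-order FM scores a feature vector x as y(x) = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j, and BPR trains it on pairs by minimizing −ln σ(y(x_pos) − y(x_neg)), where x_pos pairs a user with an observed item and x_neg the same user with an unobserved one. The plain-Python sketch below shows the scoring (with the usual O(k·nnz) rewrite of the pairwise term, checked against the direct formula) and the per-pair BPR loss. It is not LODFM's implementation; class and function names are hypothetical, and the latent size k presumably corresponds to the m varied in the dimensionality analysis, which the slides do not state explicitly.

```python
import math
import random
from typing import Dict

SparseVec = Dict[int, float]  # feature index -> non-zero value


class FactorizationMachine:
    """Second-order FM: y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j."""

    def __init__(self, n_features: int, k: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.k = k
        self.w0 = 0.0
        self.w = [0.0] * n_features
        # v[i] is the k-dimensional latent factor vector of feature i.
        self.v = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_features)]

    def predict(self, x: SparseVec) -> float:
        score = self.w0 + sum(self.w[i] * xi for i, xi in x.items())
        # Pairwise interactions in O(k * nnz(x)) via the identity
        # sum_{i<j} <v_i,v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]
        for f in range(self.k):
            s = sum(self.v[i][f] * xi for i, xi in x.items())
            s2 = sum((self.v[i][f] * xi) ** 2 for i, xi in x.items())
            score += 0.5 * (s * s - s2)
        return score

    def predict_naive(self, x: SparseVec) -> float:
        """Direct O(k * nnz(x)^2) evaluation of the same formula, for checking."""
        idx = list(x)
        score = self.w0 + sum(self.w[i] * x[i] for i in idx)
        for a in range(len(idx)):
            for b in range(a + 1, len(idx)):
                i, j = idx[a], idx[b]
                dot = sum(self.v[i][f] * self.v[j][f] for f in range(self.k))
                score += dot * x[i] * x[j]
        return score


def bpr_loss(model: FactorizationMachine, x_pos: SparseVec, x_neg: SparseVec) -> float:
    """BPR loss -ln(sigmoid(y(x_pos) - y(x_neg))) for one observed/unobserved pair."""
    d = model.predict(x_pos) - model.predict(x_neg)
    # -ln(sigmoid(d)) = ln(1 + e^{-d}); the two branches avoid overflow in exp.
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))
```

Gradient updates, regularization and negative sampling are deliberately omitted. Note that w0 cancels in the BPR difference, so only the linear and pairwise terms affect the ranking.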

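Proposed Approach

•  Overall features for Factorization Machines

|    | user  | item  | PO        | SP      | PR  | Target y |
|----|-------|-------|-----------|---------|-----|----------|
| x1 | 1 0 … | 1 0 … | 0.2 0.2 … | 0.1 0 … | 0.1 | 1        |
| x2 | 0 1 … | 0 1 … | 0.3 0.5 … | 0 0.3 … | 0.2 | 0        |

(The user, item, PO, SP and PR columns together form the feature vector x.)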
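The layout of x on this slide (user | item | PO | SP | PR) can be mirrored by giving each feature group its own contiguous block of indices in one sparse vector. This is a sketch under stated assumptions: `build_feature_vector` is hypothetical, the PO/SP weights are taken as given inputs (the slides do not say how values such as 0.2 or 0.5 are computed), and zero entries are simply not stored.

```python
from typing import Dict

SparseVec = Dict[int, float]  # feature index -> non-zero value


def build_feature_vector(
    user: int,
    item: int,
    po: Dict[int, float],
    sp: Dict[int, float],
    pr: float,
    n_users: int,
    n_items: int,
    n_po: int,
    n_sp: int,
) -> SparseVec:
    """Lay out the user | item | PO | SP | PR groups as consecutive index blocks.

    user and item are one-hot; po and sp map a local feature index to its weight;
    pr occupies a single final column. Zero entries are dropped.
    """
    entries = [(user, 1.0), (n_users + item, 1.0)]
    offset = n_users + n_items
    entries += [(offset + j, w) for j, w in po.items()]
    offset += n_po
    entries += [(offset + j, w) for j, w in sp.items()]
    offset += n_sp
    entries.append((offset, pr))
    return {idx: val for idx, val in entries if val != 0.0}


if __name__ == "__main__":
    # Tiny universe: 2 users, 2 items, 2 PO features, 2 SP features, 1 PR column.
    print(build_feature_vector(0, 0, {0: 0.2, 1: 0.2}, {0: 0.1}, 0.1, 2, 2, 2, 2))
```

A sparse dictionary keeps each row proportional to the number of non-zero features, which matters once the PO and SP vocabularies grow to many thousands of columns.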

Experimental Setup: Dataset

•  MovieLens dataset for LOD-enabled recommender systems
•  80% of the data for training and 20% for testing
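An 80/20 split of user-item interactions can be sketched as a seeded shuffle followed by a cut. The slides do not say whether the split was done globally or per user; this sketch assumes a global random split, and `split_80_20` is a hypothetical name.

```python
import random
from typing import List, Tuple

Interaction = Tuple[int, int]  # (user, item)


def split_80_20(
    interactions: List[Interaction], seed: int = 42
) -> Tuple[List[Interaction], List[Interaction]]:
    """Shuffle a copy of the interactions; the first 80% go to train, the rest to test."""
    shuffled = list(interactions)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    data = [(u, i) for u in range(5) for i in range(4)]
    train, test = split_80_20(data)
    print(len(train), len(test))
```

Fixing the seed keeps the split reproducible across the compared methods.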

Experimental Setup: Evaluation Metrics

•  P@N: precision at rank N
•  R@N: recall at rank N
•  nDCG@N: normalized Discounted Cumulative Gain at rank N
•  MRR: Mean Reciprocal Rank
•  MAP: Mean Average Precision
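For a single user's ranked list, the five metrics can be sketched as follows, assuming binary relevance (an item is relevant if it appears in that user's test set). MRR and MAP are then the means of `reciprocal_rank` and `average_precision` over all users. Several AP variants exist (e.g. normalizing by min(|relevant|, N)); this sketch normalizes by the number of relevant items, which may differ from the evaluation actually used.

```python
import math
from typing import List, Set


def precision_at(ranked: List[int], relevant: Set[int], n: int) -> float:
    """Fraction of the top-n items that are relevant."""
    return sum(1 for i in ranked[:n] if i in relevant) / n


def recall_at(ranked: List[int], relevant: Set[int], n: int) -> float:
    """Fraction of the relevant items that appear in the top n."""
    if not relevant:
        return 0.0
    return sum(1 for i in ranked[:n] if i in relevant) / len(relevant)


def ndcg_at(ranked: List[int], relevant: Set[int], n: int) -> float:
    """Binary-relevance DCG@n divided by the best achievable DCG@n."""
    dcg = sum(1.0 / math.log2(pos + 2) for pos, i in enumerate(ranked[:n]) if i in relevant)
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(relevant), n)))
    return dcg / ideal if ideal > 0 else 0.0


def reciprocal_rank(ranked: List[int], relevant: Set[int]) -> float:
    """1 / rank of the first relevant item (0 if none is ranked)."""
    for pos, i in enumerate(ranked):
        if i in relevant:
            return 1.0 / (pos + 1)
    return 0.0


def average_precision(ranked: List[int], relevant: Set[int]) -> float:
    """Mean of precision@k over the ranks k where a relevant item appears."""
    hits, total = 0, 0.0
    for pos, i in enumerate(ranked):
        if i in relevant:
            hits += 1
            total += hits / (pos + 1)
    return total / len(relevant) if relevant else 0.0
```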

Experimental Setup: Compared Methods

•  PopRank: popularity-based baseline
•  kNN-item: item-based k-nearest neighbors algorithm
•  BPRMF: matrix factorization optimized with BPR
•  SPRank: learning-to-rank using semantic paths based on LOD
•  LODFM: our proposed approach
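Assuming PopRank is the usual non-personalized popularity ranking, it reduces to counting how often each item occurs in the training interactions and recommending the most frequent items a user has not yet seen. The function name and the tie-breaking rule (lower item id first) are assumptions of this sketch.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def pop_rank(train: Iterable[Tuple[int, int]], seen: Set[int], n: int) -> List[int]:
    """Return the n most frequent training items that are not in `seen`."""
    counts = Counter(item for _, item in train)
    # Sort by descending count; ties broken by item id for determinism.
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return [item for item in ranked if item not in seen][:n]


if __name__ == "__main__":
    train = [(0, 10), (1, 10), (2, 10), (0, 20), (1, 20), (2, 30)]
    print(pop_rank(train, seen={10}, n=2))
```

Because every user receives the same ordering (minus already-seen items), it is a useful floor for the personalized methods.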

Results

Best tuned parameters: m = 200, features PO+PR

Model Analysis: Features (m=10)


Model Analysis: Dimensionality


Conclusions

•  LODFM provides state-of-the-art performance
•  Using FMs with lightweight LOD-enabled features
   •  obtained directly via the public SPARQL Endpoint of DBpedia
   •  without maintaining a graph and extracting features from it
•  Useful features: Property-Object list & PageRank
•  Future work
   •  investigate other lightweight LOD-enabled features
   •  evaluate on datasets from other domains


Guangyuan Piao
e-mail: guangyuan.piao@insight-centre.org
twitter: https://twitter.com/parklize
slideshare: http://www.slideshare.net/parklize
