wise2017 - factorization machines leveraging lightweight linked open data-enabled features for top-n...

Factorization Machines Leveraging Lightweight Linked Open Data-enabled Features for Top-N Recommendations

Guangyuan Piao, John G. Breslin Insight Centre for Data Analytics, National University of Ireland Galway

The 18th International Conference on Web Information Systems Engineering (WISE 2017), Moscow, Russia, 7-10 October 2017

Background

Linked Open Data (LOD) provides domain knowledge and rich information about items, which content-based recommender systems can exploit

[source]: http://lod-cloud.net

•  DBpedia: a first-class citizen in the LOD cloud

•  Structured information extracted from Wikipedia

•  4.58 million things, including 1,445,000 persons, 87,000 films, etc.


Background Knowledge from DBpedia

(Figure: example DBpedia resources such as Chase_films, Auto_racing_films …)

•  Knowledge is represented as SPO triples
   •  SPO: Subject → Property → Object

•  Knowledge is freely accessible via a public SPARQL Endpoint
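To make "accessible via a public SPARQL Endpoint" concrete, here is a minimal sketch that builds SPARQL queries for an item's outgoing (property, object) pairs and incoming (subject, property) pairs, plus a GET URL for them. It assumes the public DBpedia endpoint at https://dbpedia.org/sparql and a Virtuoso-style `format` parameter; the query shapes are illustrative, not the authors' exact queries, and no network call is made.

```python
from urllib.parse import urlencode

# Assumed public DBpedia endpoint; the slides only say "public SPARQL Endpoint".
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"


def po_query(resource: str) -> str:
    """SPARQL query for triples where the resource is the Subject: returns (Property, Object)."""
    return f"SELECT ?p ?o WHERE {{ <{resource}> ?p ?o . }}"


def sp_query(resource: str) -> str:
    """SPARQL query for triples where the resource is the Object: returns (Subject, Property)."""
    return f"SELECT ?s ?p WHERE {{ ?s ?p <{resource}> . }}"


def request_url(query: str) -> str:
    """GET URL asking the endpoint for SPARQL JSON results (format parameter is an assumption)."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_SPARQL + "?" + urlencode(params)


godfather = "http://dbpedia.org/resource/The_Godfather"
print(po_query(godfather))
# SELECT ?p ?o WHERE { <http://dbpedia.org/resource/The_Godfather> ?p ?o . }
print(request_url(sp_query(godfather)))
```

The URL could then be fetched with any HTTP client; keeping query construction separate from fetching makes it easy to inspect or cache the queries.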

(Figure: example triple, Subject → musicComposer (Property) → Object)

(Some) Related Work

•  Semantic similarity/distance measures
   •  [Passant et al. ISWC’10, AAAI’10]
   •  [Piao et al. SAC’16]

•  Graph-based algorithms such as PageRank
   •  [Musto et al. UMAP’16]
   •  [Nguyen et al. WWW’15]

•  Machine learning approaches
   •  [Noia et al. RecSys’12]: VSM + SVM classifier
   •  [Noia et al. TIST’16]: semantic paths + learning-to-rank (SPRank)


(Figure: user-item interactions + item background knowledge (via SPARQL Endpoint) → build a graph (Combined Graph, e.g. Chase_films …) → extract features → feed to algorithms)

Proposed Approach: Features

•  Using lightweight LOD features from DBpedia
   •  lightweight: obtained directly via the SPARQL Endpoint

(Figure: for an item such as dbr:The_Godfather, triples are retrieved from the SPARQL Endpoint, involving properties such as dbo:director, dbo:series, dc:subject and dbo:knownFor, and resources such as dbr:Francis_Ford_Coppola, dbr:The_Godfather_Returns, dbc:Gangster_films and dbr:Carlo_Savina; the resulting features are fed to the algorithms)


•  Lightweight LOD features
   •  Property-Object list (PO)
   •  Subject-Property list (SP)
   •  PageRank score (PR)
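The PO and SP lists above can be sketched as follows: turn SPARQL JSON result bindings for an item into "property|object" and "subject|property" feature names, then weight them. The sample bindings are hypothetical, written in the W3C SPARQL 1.1 JSON results layout; skipping literals and the 1/count weighting are assumptions, since the slides do not state how features are filtered or normalized.

```python
from collections import Counter

# Hypothetical endpoint output for dbr:The_Godfather (illustrative only).
po_result = {"head": {"vars": ["p", "o"]}, "results": {"bindings": [
    {"p": {"type": "uri", "value": "http://purl.org/dc/terms/subject"},
     "o": {"type": "uri", "value": "http://dbpedia.org/resource/Category:Gangster_films"}},
    {"p": {"type": "uri", "value": "http://dbpedia.org/ontology/director"},
     "o": {"type": "uri", "value": "http://dbpedia.org/resource/Francis_Ford_Coppola"}},
    {"p": {"type": "uri", "value": "http://www.w3.org/2000/01/rdf-schema#label"},
     "o": {"type": "literal", "xml:lang": "en", "value": "The Godfather"}},
]}}
sp_result = {"head": {"vars": ["s", "p"]}, "results": {"bindings": [
    {"s": {"type": "uri", "value": "http://dbpedia.org/resource/Carlo_Savina"},
     "p": {"type": "uri", "value": "http://dbpedia.org/ontology/knownFor"}},
]}}


def pair_features(result, left, right):
    """Build 'left|right' feature names from bindings, keeping URI values only (assumption)."""
    feats = []
    for b in result["results"]["bindings"]:
        if b[left]["type"] == "uri" and b[right]["type"] == "uri":
            feats.append(b[left]["value"] + "|" + b[right]["value"])
    return feats


def normalize(feats):
    """Hypothetical weighting: each feature's share of the item's features in this list."""
    counts = Counter(feats)
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}


po = normalize(pair_features(po_result, "p", "o"))  # two PO features, 0.5 each
sp = normalize(pair_features(sp_result, "s", "p"))  # one SP feature, weight 1.0
print(po)
print(sp)
```

Each feature name would later be mapped to a column index in the FM input vector.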

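Since all lightweight features, PR included, are described as obtained directly via the SPARQL Endpoint, LODFM does not need to compute PageRank itself. Purely to illustrate what a PR score captures, here is a textbook power-iteration PageRank on a tiny hypothetical graph; the edges, damping factor 0.85 and iteration count are assumptions.

```python
def pagerank(edges, damping=0.85, iters=50):
    """Power-iteration PageRank; dangling nodes spread their rank uniformly."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for s, t in edges:
        out[s].append(t)
    n_nodes = len(nodes)
    rank = {n: 1.0 / n_nodes for n in nodes}
    for _ in range(iters):
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {n: (1.0 - damping) / n_nodes + damping * dangling / n_nodes for n in nodes}
        for s in nodes:
            for t in out[s]:
                new[t] += damping * rank[s] / len(out[s])
        rank = new
    return rank


# Hypothetical toy graph loosely based on the slide's example resources.
edges = [
    ("Carlo_Savina", "The_Godfather"),
    ("Francis_Ford_Coppola", "The_Godfather"),
    ("The_Godfather", "Gangster_films"),
    ("The_Godfather", "Francis_Ford_Coppola"),
]
ranks = pagerank(edges)
print(max(ranks, key=ranks.get))  # The_Godfather
```

Resources with many incoming links (here The_Godfather) receive higher scores, which is why PR can serve as a popularity-like item feature.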

Proposed Approach: Algorithms

•  Factorization Machines (FMs)

•  Optimization: Bayesian Personalized Ranking (BPR)
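For reference, the standard second-order FM model, its linear-time rewriting of the pairwise term, and the BPR criterion can be written as below. The notation (n features, k latent dimensions, sampled triples D_S where user u interacted with item i but not with item j) is ours; the slides do not show the exact objective or regularization LODFM uses.

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j,
\qquad \mathbf{v}_i \in \mathbb{R}^{k}

\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
  = \frac{1}{2} \sum_{f=1}^{k} \Big[ \Big( \sum_{i=1}^{n} v_{i,f} x_i \Big)^{2}
  - \sum_{i=1}^{n} v_{i,f}^{2} x_i^{2} \Big]

\max_{\Theta} \sum_{(u,i,j) \in D_S}
  \ln \sigma\big( \hat{y}(\mathbf{x}_{u,i}) - \hat{y}(\mathbf{x}_{u,j}) \big)
  - \lambda_{\Theta} \lVert \Theta \rVert^{2}
```

The second line is what makes FMs practical on sparse inputs: the pairwise term costs O(k) per non-zero feature instead of O(k n^2).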

Proposed Approach

•  Overall features for Factorization Machines

Feature vector x (figure; each row is one example, paired with a target y):
user  | item  | PO        | SP      | PR
1 0 … | 1 0 … | 0.2 0.2 … | 0.1 0 … | 0.1
0 1 … | 0 1 … | 0.3 0.5 … | 0 0.3 … | 0.2
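A minimal pure-Python sketch of scoring such a feature vector with an FM and training it with BPR stochastic gradient steps. The index layout, the toy `ITEM_LOD` weights, `K`, learning rate and regularization are all hypothetical; this illustrates the technique, not the authors' implementation.

```python
import math
import random

# Hypothetical layout: [user one-hot | item one-hot | PO | SP | PR].
N_USERS, N_ITEMS = 2, 3          # users -> indices 0-1, items -> indices 2-4
ITEM_LOD = {                     # item -> {feature index: weight}, LOD features at 5-8
    0: {5: 0.5, 6: 0.5, 8: 0.1},
    1: {6: 0.5, 7: 0.5, 8: 0.3},
    2: {7: 1.0, 8: 0.2},
}
N_FEATURES, K = 9, 4             # total feature count, latent dimensions


def make_x(user, item):
    """Sparse feature vector {index: value} for one (user, item) pair."""
    x = {user: 1.0, N_USERS + item: 1.0}
    x.update(ITEM_LOD[item])
    return x


def fm_score(x, w0, w, V):
    """FM prediction using the O(k * nnz) form of the pairwise term."""
    score = w0 + sum(w[i] * xi for i, xi in x.items())
    for f in range(len(V[0])):
        s = sum(V[i][f] * xi for i, xi in x.items())
        s2 = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        score += 0.5 * (s * s - s2)
    return score


def fm_grads(x, V):
    """d y/d w_i = x_i and d y/d v_if = x_i * (sum_j v_jf x_j - v_if x_i)."""
    sums = [sum(V[i][f] * xi for i, xi in x.items()) for f in range(len(V[0]))]
    gV = {i: [xi * (sums[f] - V[i][f] * xi) for f in range(len(sums))]
          for i, xi in x.items()}
    return dict(x), gV


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bpr_step(user, pos, neg, w0, w, V, lr=0.05, reg=0.01):
    """One SGD ascent step on ln sigmoid(y(u, pos) - y(u, neg)) with L2 decay."""
    x_pos, x_neg = make_x(user, pos), make_x(user, neg)
    diff = fm_score(x_pos, w0, w, V) - fm_score(x_neg, w0, w, V)
    g = sigmoid(-diff)  # derivative of ln sigmoid(diff) w.r.t. diff
    gw_p, gV_p = fm_grads(x_pos, V)
    gw_n, gV_n = fm_grads(x_neg, V)
    zeros = [0.0] * len(V[0])
    for i in set(x_pos) | set(x_neg):
        dw = gw_p.get(i, 0.0) - gw_n.get(i, 0.0)
        w[i] += lr * (g * dw - reg * w[i])
        dv_p, dv_n = gV_p.get(i, zeros), gV_n.get(i, zeros)
        for f in range(len(V[i])):
            V[i][f] += lr * (g * (dv_p[f] - dv_n[f]) - reg * V[i][f])
    # w0 is left unchanged: it cancels in the pairwise difference.


random.seed(0)
w0 = 0.0
w = [0.0] * N_FEATURES
V = [[random.gauss(0.0, 0.01) for _ in range(K)] for _ in range(N_FEATURES)]
for _ in range(200):
    bpr_step(0, pos=0, neg=1, w0=w0, w=w, V=V)  # user 0 prefers item 0 over item 1
print(fm_score(make_x(0, 0), w0, w, V) > fm_score(make_x(0, 1), w0, w, V))  # True
```

Because both vectors in a BPR triple share the user index, the user's linear weight cancels; each step only moves item-side weights (including the LOD features) and the factor interactions.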

Experimental Setup: Dataset

•  MovieLens dataset for LOD-enabled recommender systems

•  80% for the training set and 20% for the test set
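An 80/20 split like the one above can be sketched as follows. This assumes a single random split over all (user, item) interactions with a fixed seed; the slides do not say whether the split is global, per user, or time-based.

```python
import random


def split_80_20(interactions, seed=42):
    """Shuffle interactions with a fixed seed, then cut into 80% train / 20% test."""
    data = list(interactions)
    random.Random(seed).shuffle(data)
    cut = int(0.8 * len(data))
    return data[:cut], data[cut:]


# Hypothetical interactions: (user, item) pairs.
ratings = [(u, i) for u in range(2) for i in range(5)]
train, test = split_80_20(ratings)
print(len(train), len(test))  # 8 2
```

Using a dedicated `random.Random` instance keeps the split reproducible without touching the global random state.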

Experimental Setup: Evaluation Metrics

•  P@N: precision at rank N

•  R@N: recall at rank N

•  nDCG@N: normalized Discounted Cumulative Gain at rank N

•  MRR: Mean Reciprocal Rank

•  MAP: Mean Average Precision
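The metrics above can be sketched for one user's ranked list with binary relevance (an assumption; graded relevance would change nDCG). MRR and MAP are then the means of reciprocal rank and average precision over all test users.

```python
import math


def precision_at(ranked, relevant, n):
    return sum(1 for item in ranked[:n] if item in relevant) / n


def recall_at(ranked, relevant, n):
    return sum(1 for item in ranked[:n] if item in relevant) / len(relevant)


def ndcg_at(ranked, relevant, n):
    """Binary-relevance nDCG with log2(position + 1) discount (1-based positions)."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:n]) if item in relevant)
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(relevant), n)))
    return dcg / ideal


def reciprocal_rank(ranked, relevant):
    for pos, item in enumerate(ranked):
        if item in relevant:
            return 1.0 / (pos + 1)
    return 0.0


def average_precision(ranked, relevant):
    hits, total = 0, 0.0
    for pos, item in enumerate(ranked):
        if item in relevant:
            hits += 1
            total += hits / (pos + 1)
    return total / len(relevant)


def mean_over_users(metric, runs):
    """MRR = mean_over_users(reciprocal_rank, ...); MAP = mean_over_users(average_precision, ...)."""
    return sum(metric(ranked, rel) for ranked, rel in runs) / len(runs)


ranked, relevant = ["a", "b", "c", "d", "e"], {"b", "e"}
print(precision_at(ranked, relevant, 5))  # 0.4
print(reciprocal_rank(ranked, relevant))  # 0.5
```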

Experimental Setup: Compared Methods

•  PopRank: non-personalized baseline that ranks items by popularity

•  kNN-item: item-based k-nearest neighbors algorithm

•  BPRMF: matrix factorization optimized with BPR

•  SPRank: learning-to-rank using semantic paths based on LOD

•  LODFM: our proposed approach
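A popularity baseline like PopRank can be sketched as below: count how many training interactions each item has and recommend the most popular items the user has not already seen. Excluding seen items and breaking ties by item id are assumptions, not details from the slides.

```python
from collections import Counter


def pop_rank(train, user, n):
    """Top-n unseen items for `user`, ordered by training-set popularity."""
    popularity = Counter(item for _, item in train)
    seen = {item for u, item in train if u == user}
    candidates = [i for i in popularity if i not in seen]
    candidates.sort(key=lambda i: (-popularity[i], i))  # most popular first, ties by id
    return candidates[:n]


# Hypothetical (user, item) training interactions.
train = [("u1", "a"), ("u2", "a"), ("u2", "b"), ("u3", "c"), ("u3", "a"), ("u1", "c")]
print(pop_rank(train, "u4", 3))  # ['a', 'c', 'b']
print(pop_rank(train, "u1", 3))  # ['b']
```

Every user receives the same ordering minus their own history, which is why such a baseline is non-personalized.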

Results

Best tuned parameters: m = 200, with PO+PR features

Model Analysis: Features (m=10)

Model Analysis: Dimensionality

Conclusions

•  LODFM provides state-of-the-art performance

•  Using FMs with lightweight LOD-enabled features
   •  directly obtained via a public SPARQL Endpoint of DBpedia
   •  without maintaining a graph and extracting features from it

•  Useful features: Property-Object list & PageRank

•  Future work
   •  investigate other lightweight LOD-enabled features
   •  evaluate on datasets from other domains

Guangyuan Piao e-mail: guangyuan.piao@insight-centre.org twitter: https://twitter.com/parklize slideshare: http://www.slideshare.net/parklize
