When Recommenders Met Big Data: An Architectural Proposal and Evaluation [CERI '14 slides]

Post on 18-Feb-2017

When Recommenders Met Big Data: An Architectural Proposal and Evaluation

Daniel Valcarce, Javier Parapar, Alvaro Barreiro

CERI 2014

3rd Spanish Conference on Information Retrieval

A Coruña, June 2014

Introduction Recommender System Architecture Experiments and results Conclusions and Future Work

Table of Contents

Introduction
    Motivation
    Recommender Systems

Recommender System Architecture
    Overview
    Front-end
    Recommendation engine
    Storage

Experiments and results
    Rating Insertion
    Recommendation Generation
    Recommendation Serving

Conclusions and Future Work

Motivation

According to Shareaholic, in 2013...

• Web traffic generated by search engines dropped 6%

• Traffic from social networks increased by more than 100%

Users...

• used to query for what they wanted

• now want personalised recommendations

Recommender Systems

Objective

Predict user preferences over items

Approaches

• Content-based: uses properties of the items

• Collaborative filtering: based on similar users

• Hybrid approaches: combination of both
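
As an illustration of the collaborative filtering approach, the following is a minimal sketch of a user-based predictor. The toy ratings, the function names `cosine` and `predict`, and the choice of cosine similarity over co-rated items are assumptions made for this example; the slides do not specify a similarity measure at this point.

```python
from math import sqrt

# Toy ratings (hypothetical data): user -> {item: rating}
ratings = {
    "ana":   {"A": 5, "B": 3, "C": 4},
    "ben":   {"A": 4, "B": 3, "C": 5, "D": 4},
    "carla": {"A": 1, "B": 5, "D": 2},
}


def cosine(u, v):
    """Cosine similarity between two users, restricted to co-rated items."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    numerator = 0.0
    denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        similarity = cosine(ratings[user], their_ratings)
        numerator += similarity * their_ratings[item]
        denominator += abs(similarity)
    return numerator / denominator if denominator else None
```

Calling `predict("ana", "D")` weights ben's rating more heavily than carla's, because ben's ratings on the shared items are closer to ana's.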

Our work

• Recommender architecture proposal for Big Data

• Detail specific technologies for each component

• Efficiency study of MySQL Cluster and Cassandra as alternatives for storing ratings and recommendations in the proposed architecture

Generic Recommender System Architecture

Front-end

Storage

Recommendation engine

Our goals

• Scalability
    • More machines → more computational power
    • Big Data capable

• High availability
    • Fault tolerance
    • No single point of failure

Our proposal: Front-end

Use cases

• Search items

• Emit ratings

• Get recommendations

Proposed architecture

• Distributed web application (Django)

• Redundant load balancers (Perlbal)

• Two levels of cache
    • Reverse proxy cache (Varnish)
    • Distributed memory cache (Memcached)

Our proposal: Recommendation Engine

• Recommendations are precalculated and stored

• A batch process refreshes the suggestions regularly

• Use of MapReduce distributed model
    • State-of-the-art paradigm for large-scale data processing
    • Hadoop: MapReduce open source implementation
    • Mahout: scalable machine learning library
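
To make the MapReduce model concrete, the following single-process sketch counts how often two items are rated by the same user, a typical building block of item-based collaborative filtering. It does not use the Hadoop or Mahout APIs; the input records and the function names `map_phase`, `reduce_phase` and `run` are hypothetical, and the grouping by user would itself be a separate job in a real pipeline.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: (user, item) pairs, one per rating
records = [
    ("u1", "A"), ("u1", "B"), ("u1", "C"),
    ("u2", "A"), ("u2", "B"),
    ("u3", "B"), ("u3", "C"),
]


def map_phase(user, items):
    """Emit ((item_a, item_b), 1) for every pair of items rated by one user."""
    for item_a, item_b in combinations(sorted(items), 2):
        yield (item_a, item_b), 1


def reduce_phase(pair, counts):
    """Sum the partial counts collected for one item pair."""
    return pair, sum(counts)


def run(records):
    # Group ratings by user (a previous job in a real pipeline).
    items_by_user = defaultdict(set)
    for user, item in records:
        items_by_user[user].add(item)

    # Map, then shuffle: collect all emitted values under their key.
    shuffled = defaultdict(list)
    for user, items in items_by_user.items():
        for key, value in map_phase(user, items):
            shuffled[key].append(value)

    # Reduce each key independently; this is the step a cluster parallelises.
    return dict(reduce_phase(key, values) for key, values in shuffled.items())


cooccurrence = run(records)
```

Because each key is reduced independently, the reduce step can be spread over as many machines as there are keys, which is what makes the model attractive for batch recomputation.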

Our proposal: Storage Component I

Information to be stored

• Common web application data (e.g., user profiles)

• Large amounts of ratings and recommendations

• Data about items

Requirements

• Read scalability and fault tolerance (replication)

• Write scalability (sharding)

• Linear scalability with the number of nodes

Our proposal: Storage Component II

Proposed technologies

• Relational database (MySQL Cluster)

• NoSQL column store (Cassandra)

• Inverted indexes (Solr)

Experiment: storing ratings and recommendations

Candidates

• MySQL Cluster

• Cassandra

Netflix Prize Dataset

• 100M ratings

• 480k users

• 17.7k films

Cluster configuration

• Number of machines: 4

• Replication factor: 2
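
What a replication factor of 2 on 4 machines means for data placement can be sketched as follows. This is a simplified hash-and-modulo scheme written for illustration only: it is neither Cassandra's nor MySQL Cluster's actual placement algorithm, and the node names and the function `replicas_for` are hypothetical.

```python
import hashlib

NODES = ["node0", "node1", "node2", "node3"]  # 4 machines, as in the experiment
REPLICATION_FACTOR = 2                         # each row is stored on 2 nodes


def replicas_for(key):
    """Return the nodes that hold the row identified by `key`.

    The key is hashed to choose a primary node (sharding); the next nodes
    in the list hold the extra copies (replication).
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    primary = int.from_bytes(digest[:8], "big") % len(NODES)
    return [NODES[(primary + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]
```

With this layout every row survives the loss of any single machine, and writes for different keys are spread over all four machines.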

Rating Insertion

Figure: Average insertion rate obtained by inserting from 10 to 100 million ratings using 8 concurrent requests. (Plot not reproduced. x-axis: number of ratings, from 1e+07 to 1e+08; y-axis: milliseconds per insertion, from 0.00 to 0.30; series: MySQL Cluster and Cassandra, 8 concurrent requests each.)

Figure: Average insertion rate obtained by inserting from 10 to 100 million ratings using 8, 16, 32 and 64 concurrent requests. (Plot not reproduced. x-axis: number of ratings, from 1e+07 to 1e+08; y-axis: milliseconds per insertion, from 0.00 to 0.30; series: MySQL Cluster and Cassandra, each with 8, 16, 32 and 64 concurrent requests.)
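
A measurement of this kind can be sketched as below. The slides do not describe the benchmarking harness, so everything here is an assumption: `InMemoryStore` is a stand-in for a database client, `average_ms_per_insertion` is a hypothetical helper, and the 2,000 synthetic ratings are far smaller than the workload in the figure.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


class InMemoryStore:
    """Stand-in for a database client (hypothetical); inserts are thread-safe."""

    def __init__(self):
        self._rows = []
        self._lock = Lock()

    def insert(self, user, item, rating):
        with self._lock:
            self._rows.append((user, item, rating))

    def __len__(self):
        return len(self._rows)


def average_ms_per_insertion(store, ratings, workers):
    """Insert all ratings with `workers` concurrent threads.

    Returns the wall-clock time divided by the number of insertions, in ms.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every insert and re-raises any worker exception.
        list(pool.map(lambda row: store.insert(*row), ratings))
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / len(ratings)


# Synthetic (user, item, rating) triples, for illustration only
ratings = [(user, user % 50, 1 + user % 5) for user in range(2000)]

results = {}
for workers in (8, 16, 32, 64):
    store = InMemoryStore()
    results[workers] = average_ms_per_insertion(store, ratings, workers)
```

Swapping `InMemoryStore` for a real client and repeating the loop at growing dataset sizes would give one curve per concurrency level, as in the figure.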

Recommendation Generation

Table: Times for Mahout’s Item-based Collaborative Filtering algorithm reading and writing directly to/from the database

Storage system    Time (min)    Time per recommendation (ms)
Cassandra         68.85         8.6
MySQL Cluster     crash!        crash!

Table: Times for Mahout’s Item-based Collaborative Filtering algorithm

Storage system     Time (min)    Time per recommendation (ms)
Cassandra          68.85         8.6
MySQL Cluster *    274.73        34.3

* Using Sqoop, a tool for transferring bulk data between Hadoop Distributed File System and relational databases.
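
The two columns of the table can be related by a short calculation. The sketch below assumes that "time per recommendation" is the total running time divided by the 480k users of the dataset (one recommendation list per user); the slides do not state this, but the figures are consistent with it.

```python
USERS = 480_000  # "480k users", from the dataset description


def ms_per_user(total_minutes, users=USERS):
    """Convert a total running time in minutes to milliseconds per user."""
    return total_minutes * 60 * 1000 / users


cassandra_ms = ms_per_user(68.85)   # about 8.6 ms
mysql_ms = ms_per_user(274.73)      # about 34.3 ms

# Ratio of the two total times: MySQL Cluster (via Sqoop) vs. Cassandra
slowdown = 274.73 / 68.85           # roughly 4x
```

Under that assumption, generating recommendations through MySQL Cluster with Sqoop took roughly four times as long as reading and writing Cassandra directly.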

Recommendation Serving

Figure: Average serving rate obtained by querying the top 10 recommended items for 25 million users using 8, 16, 32 and 64 concurrent requests. (Plot not reproduced. x-axis: number of threads, 8, 16, 32 and 64; y-axis: milliseconds per serving, from 0.00 to 0.30; series: MySQL Cluster and Cassandra.)

Conclusions and Future Work

• We have proposed a highly scalable and fault-tolerant platform for recommender systems.

• We have benchmarked Cassandra and MySQL Cluster in the context of recommender systems.

• Future: study and benchmark more parts of the proposed platform.

• Future: develop more effective recommender algorithms on the platform.
