When Recommenders Met Big Data: An Architectural Proposal and Evaluation [CERI '14 Slides]

When Recommenders Met Big Data: An Architectural Proposal and Evaluation. Daniel Valcarce, Javier Parapar, Álvaro Barreiro. CERI 2014, 3rd Spanish Conference on Information Retrieval, A Coruña, June 2014.


When Recommenders Met Big Data: An Architectural Proposal and Evaluation

Daniel Valcarce, Javier Parapar, Álvaro Barreiro

CERI 2014

3rd Spanish Conference on Information Retrieval

A Coruña, June 2014

Introduction Recommender System Architecture Experiments and results Conclusions and Future Work

Table of Contents

Introduction
    Motivation
    Recommender Systems

Recommender System Architecture
    Overview
    Front-end
    Recommendation engine
    Storage

Experiments and results
    Rating Insertion
    Recommendation Generation
    Recommendation Serving

Conclusions and Future Work


Motivation

According to Shareaholic, in 2013...

• web traffic generated by search engines dropped 6%
• web traffic generated by social networks increased more than 100%

Users...

• used to query for what they want
• now want personalised recommendations


Recommender Systems

Objective

Predict user preferences over items

Approaches

• Content-based: uses properties of the items
• Collaborative filtering: based on similar users
• Hybrid approaches: combination of both
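
As a rough illustration of the collaborative filtering idea ("based on similar users"), the sketch below predicts a rating as a similarity-weighted average of what other users gave the same item. The toy ratings, the user and film names, and the choice of cosine similarity over co-rated items are all assumptions made for the example; the slides do not specify any of them.

```python
from math import sqrt

# Toy ratings: user -> {item: rating}. All values are invented for illustration.
RATINGS = {
    "ana":   {"film_a": 5, "film_b": 4, "film_c": 1},
    "brais": {"film_a": 4, "film_b": 5, "film_d": 4},
    "carme": {"film_a": 1, "film_c": 5, "film_d": 2},
}


def cosine(u, v):
    """Cosine similarity between two users, computed over co-rated items only."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item, ratings=RATINGS):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    # None signals that no neighbour with a usable similarity rated the item.
    return num / den if den else None
```

Here `predict("ana", "film_d")` leans towards the rating given by "brais", whose tastes are closer to those of "ana" than those of "carme" are.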


Our work

• Recommender architecture proposal for Big Data
• Detail specific technologies for each component
• Efficiency study of MySQL Cluster and Cassandra as alternatives for storing ratings and recommendations in the proposed architecture


Generic Recommender System Architecture

Front-end

Storage

Recommendation engine


Our goals

• Scalability
  • More machines → more computational power
  • Big Data capable
• High availability
  • Fault-tolerance
  • No single point of failure


Our proposal: Front-end

Use cases

• Search items
• Emit ratings
• Get recommendations

Proposed architecture

• Distributed web application (Django)
• Redundant load balancers (Perlbal)
• Two levels of cache
  • Reverse proxy cache (Varnish)
  • Distributed memory cache (Memcached)
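
The two cache levels can be pictured as a read path that tries the cheapest layer first. The following is only a sketch under stated assumptions: plain dictionaries stand in for the reverse proxy cache and the distributed memory cache, and `TwoLevelCache`, `get_page`, `load_from_storage` and the URL scheme are hypothetical names, not part of Varnish, Memcached or Django.

```python
class TwoLevelCache:
    """Toy read path: rendered-page cache, then object cache, then storage."""

    def __init__(self, load_from_storage):
        self.page_cache = {}    # stand-in for the reverse proxy cache (whole responses)
        self.object_cache = {}  # stand-in for the distributed memory cache (data objects)
        self.load_from_storage = load_from_storage  # hypothetical storage accessor
        self.storage_hits = 0   # counts how often the storage layer is reached

    def get_page(self, user_id):
        url = f"/recommendations/{user_id}"  # hypothetical URL scheme
        # Level 1: a cached response is returned without touching the application.
        if url in self.page_cache:
            return self.page_cache[url]
        # Level 2: the application looks for the data object in memory.
        recs = self.object_cache.get(user_id)
        if recs is None:
            # Both caches missed, so the request goes to the storage component.
            recs = self.load_from_storage(user_id)
            self.storage_hits += 1
            self.object_cache[user_id] = recs
        page = "Recommended: " + ", ".join(recs)
        self.page_cache[url] = page
        return page
```

Only the first request for a given user reaches storage in this sketch; cache expiry and invalidation, which a real deployment would need once recommendations are refreshed, are left out.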


Our proposal: Recommendation Engine

• Recommendations are precalculated and stored
• A batch process refreshes the suggestions regularly
• Use of MapReduce distributed model
  • State-of-the-art paradigm for large-scale data processing
  • Hadoop: MapReduce open source implementation
  • Mahout: scalable machine learning library
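
To make the MapReduce model concrete, here is a single-process imitation of its map, shuffle and reduce stages, applied to counting how often two items are rated by the same user (a typical building block of item-based recommenders). It is a sketch for illustration only: it does not use Hadoop or Mahout, and the function names and input layout are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(user, items):
    """Emit ((item_a, item_b), 1) for every pair of items rated by one user."""
    for a, b in combinations(sorted(items), 2):
        yield (a, b), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the partial counts for one item pair."""
    return key, sum(values)


def cooccurrences(user_items):
    """Run the three stages in order over a dict of user -> rated items."""
    emitted = (
        kv
        for user, items in user_items.items()
        for kv in map_phase(user, items)
    )
    return dict(reduce_phase(k, vs) for k, vs in shuffle(emitted).items())
```

In a real cluster the map and reduce calls would run in parallel on different machines, each handling a slice of the users or of the keys; that independence between calls is what lets the batch process scale with the number of nodes.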


Our proposal: Storage Component I

Information to be stored

• Common web application data (e.g., user profiles)
• Large amounts of ratings and recommendations
• Data about items

Requirements

• Read scalability and fault tolerance (replication)
• Write scalability (sharding)
• Linear scalability with the number of nodes
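
The interplay of sharding and replication in these requirements can be sketched as a placement function: a hash of the key picks a primary node, and copies go to the following nodes. This is a deliberately simplified scheme (modulo hashing over a fixed node list), not the partitioning used by MySQL Cluster or Cassandra; `replicas_for` and the node names are hypothetical.

```python
import hashlib


def replicas_for(key, nodes, replication_factor=2):
    """Return the nodes that would hold `key` under a simple hash placement.

    Sharding: the hash spreads keys (and therefore writes) across all nodes.
    Replication: each key is also copied to the next node(s), so reads can be
    served from several machines and one failure does not lose the data.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    primary = int(digest, 16) % len(nodes)
    return [nodes[(primary + i) % len(nodes)] for i in range(replication_factor)]


# Example: four nodes and two copies of every key.
NODES = ["node0", "node1", "node2", "node3"]
placement = replicas_for("user:1234", NODES)
```

With modulo hashing, adding a node moves most keys to a different primary, which is one reason production systems tend to use more elaborate partitioning schemes.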


Our proposal: Storage Component II

Proposed technologies

• Relational database (MySQL Cluster)
• NoSQL column store (Cassandra)
• Inverted indexes (Solr)


Experiment: storing ratings and recommendations

Candidates

• MySQL Cluster
• Cassandra

Netflix Prize Dataset

• 100M ratings
• 480k users
• 17.7k films

Cluster configuration

• Number of machines: 4
• Replication factor: 2


Rating Insertion

Figure: Average insertion rate obtained by inserting from 10 to 100 million ratings using 8 concurrent requests. [Plot: x-axis, # ratings (1e+07 to 1e+08); y-axis, milliseconds/insertion (0.00 to 0.30); series: MySQL Cluster 8, Cassandra 8.]


Figure: Average insertion rate obtained by inserting from 10 to 100 million ratings using 8, 16, 32 and 64 concurrent requests. [Plot: x-axis, # ratings (1e+07 to 1e+08); y-axis, milliseconds/insertion (0.00 to 0.30); series: MySQL Cluster 8, 16, 32, 64 and Cassandra 8, 16, 32, 64.]
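
The metric plotted in these figures, average milliseconds per insertion at a given level of concurrency, can be approximated with a small harness like the one below. It is a sketch, not the benchmark used in the talk: `fake_insert` is a hypothetical stand-in that would have to be replaced by a real database client call, and the use of a thread pool is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_insert(rating):
    """Hypothetical stand-in for a database write; swap in a real client call."""
    return rating


def ms_per_insertion(ratings, workers, insert=fake_insert):
    """Insert all ratings with `workers` concurrent threads; return average ms each."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() consumes the iterator so every insert finishes before timing stops.
        list(pool.map(insert, ratings))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms / len(ratings)


# Tiny example workload of (user, item, rating) tuples, invented for illustration.
SAMPLE = [(user, item, 3) for user in range(100) for item in range(10)]
results = {workers: ms_per_insertion(SAMPLE, workers) for workers in (8, 16, 32, 64)}
```

Repeating the run for growing numbers of ratings and for each worker count would yield one curve per configuration, which is the shape of the data the figures describe.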


Recommendation Generation

Table: Times for Mahout’s Item-based Collaborative Filtering algorithm reading and writing directly to/from the database

Storage system    Time (min)    Time per recommendation (ms)
Cassandra         68.85         8.6
MySQL Cluster     crash!        crash!


Table: Times for Mahout’s Item-based Collaborative Filtering algorithm

Storage system     Time (min)    Time per recommendation (ms)
Cassandra          68.85         8.6
MySQL Cluster *    274.73        34.3

* Using Sqoop, a tool for transferring bulk data between the Hadoop Distributed File System and relational databases.
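
The two columns of the table can be cross-checked against each other. Dividing each total time by its per-recommendation time gives a count close to 480k in both rows, which matches the number of users in the dataset; reading the per-recommendation figure as "total time divided by number of users" is an inference from the numbers, not something the slides state.

```python
def implied_recommendations(total_minutes, ms_per_recommendation):
    """Number of recommendations implied by a total time and a per-item time."""
    return total_minutes * 60_000 / ms_per_recommendation


# Figures taken from the table above.
cassandra_count = implied_recommendations(68.85, 8.6)
mysql_count = implied_recommendations(274.73, 34.3)

# Ratio of total run times: MySQL Cluster (via Sqoop) against Cassandra.
slowdown = 274.73 / 68.85
```

Under that reading, the MySQL Cluster run with Sqoop took about four times as long as the Cassandra run for the same workload.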


Recommendation Serving

Figure: Average serving rate obtained by querying the top 10 recommended items for 25 million users using 8, 16, 32 and 64 concurrent requests. [Plot: x-axis, # threads (8, 16, 32, 64); y-axis, milliseconds/serving (0.00 to 0.30); series: MySQL Cluster, Cassandra.]


Conclusions and Future Work

• We have proposed a highly scalable and fault-tolerant platform for recommender systems.
• We have benchmarked Cassandra and MySQL Cluster in the context of recommender systems.
• Future: study and benchmark more parts of the proposed platform.
• Future: develop more effective recommender algorithms on the platform.
