when recommenders met big data: an architectural proposal and evaluation [ceri '14 slides]
TRANSCRIPT
When Recommenders Met Big Data: An Architectural Proposal and Evaluation
Daniel Valcarce Javier Parapar Alvaro Barreiro
CERI 2014
3rd Spanish Conference on Information Retrieval
A Coruna, June 2014
Introduction Recommender System Architecture Experiments and results Conclusions and Future Work
Table of Contents
Introduction: Motivation, Recommender Systems
Recommender System Architecture: Overview, Front-end, Recommendation engine, Storage
Experiments and results: Rating Insertion, Recommendation Generation, Recommendation Serving
Conclusions and Future Work
Motivation
According to Shareaholic, in 2013...
• web traffic generated by search engines dropped 6%
• web traffic generated by social networks increased more than 100%
Users...
• used to query for what they wanted
• now want personalised recommendations
Recommender Systems
Objective
Predict user preferences over items
Approaches
• Content-based: uses properties of the items
• Collaborative filtering: based on similar users
• Hybrid approaches: combination of both
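To make the collaborative filtering idea concrete, here is a minimal sketch of a user-based prediction. It is one textbook variant, not the method used in this work: the toy ratings, the cosine similarity and the `predict` helper are all assumptions made for illustration.

```python
from math import sqrt

# Toy ratings (user -> {item: rating}); purely illustrative data.
ratings = {
    "ana":   {"film_a": 5, "film_b": 3, "film_c": 4},
    "brais": {"film_a": 4, "film_b": 2, "film_c": 5, "film_d": 4},
    "carme": {"film_a": 1, "film_b": 5, "film_d": 2},
}

def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of the ratings other users gave to `item`."""
    num = den = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine(ratings[user], their_ratings)
        num += sim * their_ratings[item]
        den += sim
    # None means that nobody comparable has rated the item.
    return num / den if den else None

print(predict("ana", "film_d"))
```

The prediction for "ana" lands closer to the rating given by "brais" than to the one given by "carme", because "brais" rated the shared films more similarly.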
Our work
• Recommender architecture proposal for Big Data
• Detail specific technologies for each component
• Efficiency study of MySQL Cluster and Cassandra as alternatives for storing ratings and recommendations in the proposed architecture
Generic Recommender System Architecture
Front-end
Storage
Recommendation engine
Our goals
• Scalability
  • More machines → more computational power
  • Big Data capable
• High availability
  • Fault-tolerance
  • No single point of failure
Our proposal: Front-end
Use cases
• Search items
• Emit ratings
• Get recommendations
Proposed architecture
• Distributed web application (Django)
• Redundant load balancers (Perlbal)
• Two levels of cache
  • Reverse proxy cache (Varnish)
  • Distributed memory cache (Memcached)
Our proposal: Recommendation Engine
• Recommendations are precalculated and stored
• A batch process refreshes the suggestions regularly
• Use of MapReduce distributed model
  • State-of-the-art paradigm for large-scale data processing
  • Hadoop: MapReduce open source implementation
  • Mahout: scalable machine learning library
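As a rough picture of the MapReduce model, the sketch below counts item co-occurrences in a single process. Co-occurrence counting is a common building block of item-based approaches, but this is not Hadoop or Mahout code: the toy data and the `map_phase` and `reduce_phase` helpers are assumptions, and the shuffle step is simulated with a dictionary.

```python
from collections import defaultdict
from itertools import combinations

# Toy (user, item) pairs; illustrative data only.
ratings = [
    ("u1", "a"), ("u1", "b"), ("u1", "c"),
    ("u2", "a"), ("u2", "b"),
    ("u3", "b"), ("u3", "c"),
]

def map_phase(user, items):
    """Emit ((item_i, item_j), 1) for every pair of items rated by one user."""
    for i, j in combinations(sorted(items), 2):
        yield (i, j), 1

def reduce_phase(pair, counts):
    """Sum the partial counts emitted for one item pair."""
    return pair, sum(counts)

# Group ratings by user (in a real cluster this would be a job of its own).
by_user = defaultdict(set)
for user, item in ratings:
    by_user[user].add(item)

# Shuffle: bring together all values emitted under the same key.
shuffled = defaultdict(list)
for user, items in by_user.items():
    for key, value in map_phase(user, items):
        shuffled[key].append(value)

cooccurrence = dict(reduce_phase(key, values) for key, values in shuffled.items())
print(cooccurrence)
```

Each mapper call only needs the items of one user, and each reducer call only needs the values of one key, which is what allows the work to be spread over many machines.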
Our proposal: Storage Component I
Information to be stored
• Common web application data (e.g., user profiles)
• Large amounts of ratings and recommendations
• Data about items
Requirements
• Read scalability and fault tolerance (replication)
• Write scalability (sharding)
• Linear scalability with the number of nodes
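A small sketch of how sharding and replication combine: each key is hashed to a primary node (sharding) and also copied to the next nodes in the list (replication). The node names, the node count, the number of copies and the `replicas_for` helper are illustrative assumptions. Neither MySQL Cluster nor Cassandra is claimed to place data exactly this way.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2", "node-3"]  # hypothetical node names
REPLICATION_FACTOR = 2                            # illustrative value

def replicas_for(key, nodes=NODES, rf=REPLICATION_FACTOR):
    """Return the nodes that hold a copy of `key`.

    Sharding: a stable hash of the key selects the primary node.
    Replication: the following rf - 1 nodes keep additional copies.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    primary = int(digest, 16) % len(nodes)
    return [nodes[(primary + k) % len(nodes)] for k in range(rf)]

for user_id in (1, 2, 3):
    print(user_id, replicas_for(user_id))
```

Writes spread across nodes because different keys hash to different primaries, and a read can still be answered when one of the nodes holding a key is down.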
Our proposal: Storage Component II
Proposed technologies
• Relational database (MySQL Cluster)
• NoSQL column store (Cassandra)
• Inverted indexes (Solr)
Experiment: storing ratings and recommendations
Candidates
• MySQL Cluster
• Cassandra
Netflix Prize Dataset
• 100M ratings
• 480k users
• 17.7k films
Cluster configuration
• Number of machines: 4
• Replication factor: 2
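Some back-of-the-envelope figures follow from the numbers on this slide. The sketch uses the rounded values as given and assumes that copies are spread evenly over the machines, which the slides do not state.

```python
# Rounded figures taken from the slide.
ratings = 100_000_000
users = 480_000
films = 17_700
machines = 4
replication_factor = 2

# Fraction of the user x film matrix that actually holds a rating.
density = ratings / (users * films)

# Average number of ratings emitted by one user.
ratings_per_user = ratings / users

# Rating copies each machine stores, assuming an even spread (an assumption).
copies_per_machine = ratings * replication_factor / machines

print(f"density: {density:.2%}")
print(f"ratings per user: {ratings_per_user:.0f}")
print(f"copies per machine: {copies_per_machine:,.0f}")
```

Under these assumptions the rating matrix is a little over 1% full, and each machine ends up holding about half of the ratings.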
Rating Insertion
Figure: Average insertion rate obtained by inserting from 10 to 100 million ratings using 8 concurrent requests
[Plot omitted. x-axis: # ratings (1e+07 to 1e+08); y-axis: milliseconds/insertion (0.00 to 0.30); series: MySQL Cluster 8, Cassandra 8]
Rating Insertion
Figure: Average insertion rate obtained by inserting from 10 to 100 million ratings using 8, 16, 32 and 64 concurrent requests
[Plot omitted. x-axis: # ratings (1e+07 to 1e+08); y-axis: milliseconds/insertion (0.00 to 0.30); series: MySQL Cluster 8, 16, 32, 64 and Cassandra 8, 16, 32, 64]
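A measurement of this kind (average milliseconds per insertion at a given concurrency level) could be taken with a harness along the following lines. This is a sketch and not the benchmark code behind the figure: `fake_insert` is a hypothetical stand-in for a real database write, and the sample size is tiny.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_insert(rating):
    """Hypothetical stand-in for one database insert (just sleeps briefly)."""
    time.sleep(0.0005)

def average_ms_per_insertion(ratings, concurrency, insert=fake_insert):
    """Insert all ratings using `concurrency` worker threads.

    Returns wall-clock milliseconds divided by the number of insertions.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() waits for every task and re-raises any worker exception.
        list(pool.map(insert, ratings))
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / len(ratings)

# Toy (user, item, rating) tuples; illustrative data only.
sample = [(u, u % 50, 3) for u in range(400)]
for threads in (8, 16, 32, 64):
    print(threads, round(average_ms_per_insertion(sample, threads), 4))
```

Passing a function that writes to a real store as `insert` would turn the same loop into an actual measurement. The numbers printed with the stand-in say nothing about either database.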
Recommendation Generation
Table: Times for Mahout’s Item-based Collaborative Filtering algorithm reading and writing directly to/from the database
Storage system    Time (min)    Time per recommendation (ms)
Cassandra         68.85         8.6
MySQL Cluster     crash!        crash!
Recommendation Generation
Table: Times for Mahout’s Item-based Collaborative Filtering algorithm
Storage system     Time (min)    Time per recommendation (ms)
Cassandra          68.85         8.6
MySQL Cluster *    274.73        34.3
* Using Sqoop, a tool for transferring bulk data between Hadoop Distributed File System and relational databases.
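The two columns of the table can be cross-checked with a little arithmetic. Dividing the total time by the time per recommendation gives roughly 480,000 for both systems, which is close to the number of users in the dataset. That suggests, as an inference the slides do not state, that one "recommendation" here is the output computed for one user.

```python
# (total minutes, milliseconds per recommendation) as reported in the table.
runs = {
    "Cassandra": (68.85, 8.6),
    "MySQL Cluster (via Sqoop)": (274.73, 34.3),
}

# Total time in ms divided by ms per recommendation = implied number of recommendations.
implied = {
    name: minutes * 60_000 / ms_each
    for name, (minutes, ms_each) in runs.items()
}

# Ratio between the two total times.
slowdown = runs["MySQL Cluster (via Sqoop)"][0] / runs["Cassandra"][0]

for name, count in implied.items():
    print(f"{name}: about {count:,.0f} recommendations")
print(f"MySQL Cluster via Sqoop took about {slowdown:.1f}x as long as Cassandra")
```

The ratio of the total times is about 4, which matches the ratio of the per-recommendation times.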
Recommendation Serving
Figure: Average serving rate obtained by querying the top 10 recommended items for 25 million users using 8, 16, 32 and 64 concurrent requests
[Plot omitted. x-axis: # threads (8, 16, 32, 64); y-axis: milliseconds/serving (0.00 to 0.30); series: MySQL Cluster, Cassandra]
Conclusions and Future Work
• We have proposed a highly scalable and fault-tolerant platform for recommender systems.
• We have benchmarked Cassandra and MySQL Cluster in the context of recommender systems.
• Future: study and benchmark more parts of the proposed platform.
• Future: develop more effective recommender algorithms on the platform.