A File-based Approach for Recommender Systems in High-Performance Computing Environments
Simon Dooms
@sidooms
Introduction
Is a database always the best option?
09/02/2011 Simon Dooms - Ghent University - RSmeetDB '11 2
0.5%
99.5%
Hardware
Shared storage (RAID5)
InfiniBand ConnectX DDR
194 computing nodes:
– 8 cores @ 2.5 GHz
– 16 GB RAM
– 146 GB local storage
Recommendation workflow
Phase 1: Item Similarity
Item Metadata -> Item Similarity Calculation -> Item Similarities
Phase 2: User Similarity
Consumptions + Item Similarities -> User Similarity Calculation -> User Similarities
Phase 3: Recommendation
Consumptions + Item Similarities + User Similarities -> Recommendation Calculation
Item similarity
Item Metadata -> Item Similarity Calculation -> Item Similarities
Item similarity
node node node node node
C C C C C C C C C C
File buckets
MODULO
Example for 3 buckets
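The slide names only the modulo operation and an example with 3 buckets. A minimal sketch of how such a bucketing could look, assuming integer item IDs and one file per bucket; the function names and the `bucket_<n>.txt` naming scheme are hypothetical and not taken from the presentation:

```python
from collections import defaultdict


def bucket_for(item_id: int, n_buckets: int) -> int:
    """Pick a bucket with the modulo operation (assumes integer IDs)."""
    return item_id % n_buckets


def bucket_filename(bucket: int) -> str:
    """Hypothetical naming scheme: one file per bucket."""
    return f"bucket_{bucket}.txt"


def assign_buckets(item_ids, n_buckets):
    """Group item IDs by the bucket they fall into."""
    buckets = defaultdict(list)
    for item_id in item_ids:
        buckets[bucket_for(item_id, n_buckets)].append(item_id)
    return dict(buckets)


# Example for 3 buckets, using illustrative item IDs 1..9.
buckets = assign_buckets(range(1, 10), 3)
for bucket, ids in sorted(buckets.items()):
    print(bucket_filename(bucket), ids)
```

With consecutive IDs the modulo spreads items evenly, so each bucket file ends up roughly the same size.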
Writing item similarities
C C C C C C
Local Storage
Shared Storage
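The diagram on this slide shows only the calculations, local storage and shared storage. A minimal sketch of one possible reading, assuming each calculation first writes its result file to node-local storage and the finished file is then copied to shared storage; the function name, file name and tab-separated layout are hypothetical, and temporary directories stand in for both storage tiers:

```python
import shutil
import tempfile
from pathlib import Path


def write_then_publish(similarities, local_dir, shared_dir, name):
    """Write results locally first, then copy the finished file to shared storage."""
    local_path = Path(local_dir) / name
    with local_path.open("w") as fh:
        for item_a, item_b, score in similarities:
            fh.write(f"{item_a}\t{item_b}\t{score:.4f}\n")
    shared_path = Path(shared_dir) / name
    shutil.copy(local_path, shared_path)  # one bulk copy per finished file
    return shared_path


# Temporary directories stand in for local and shared storage.
with tempfile.TemporaryDirectory() as local_dir, \
        tempfile.TemporaryDirectory() as shared_dir:
    published = write_then_publish(
        [(1, 2, 0.5)], local_dir, shared_dir, "bucket_0.tsv"
    )
    content = published.read_text()
```

Under this assumption the shared file system sees one large sequential copy per file instead of many small writes.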
User Similarity
Item Similarities + Consumptions -> User Similarity Calculation -> User Similarities
User Similarity
C C C C
node node node node
Recommendation calculation
User Similarities + Consumptions + Item Similarities -> Recommendation Calculation
Recommendation calculation
…
Item Similarities
User Similarities
Results
• Proof of concept implementation
• Cultural events dataset
– 5 months of data
– 53,000 items
– 1,700 users
– 14,000 => 6,800 consumptions
Number of nodes used: 10, 20, 40, 80, 160
Execution time scales inversely with the number of nodes
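To make the inverse-scaling statement concrete, the sketch below computes the ideal relative execution time for each node count, normalised to the 10-node run. These are values of the idealised model T(n) = T(10) * 10 / n, not measured timings from the presentation; the function name is hypothetical:

```python
NODE_COUNTS = [10, 20, 40, 80, 160]


def ideal_relative_time(nodes: int, baseline_nodes: int = 10) -> float:
    """Ideal inverse scaling: doubling the nodes halves the execution time."""
    return baseline_nodes / nodes


relative = {n: ideal_relative_time(n) for n in NODE_COUNTS}
for n, fraction in relative.items():
    print(f"{n:>3} nodes: {fraction:.4f} x the 10-node execution time")
```

Under this model, going from 10 to 160 nodes would cut the execution time to one sixteenth of the 10-node run.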
Conclusion
• A file-based approach for HPC
• Workflow as independent subjobs
• Workflow ≈ embarrassingly parallel
• Approach both scalable and memory efficient
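The idea of independent subjobs can be sketched as follows. This is an illustration under stated assumptions, not the presented implementation: threads on one machine stand in for computing nodes, the per-item calculation is a placeholder, and all names are hypothetical. Each subjob reads only its own bucket and writes only its own result file, so no coordination between subjobs is needed:

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_subjob(bucket_id, item_ids, out_dir):
    """Process one bucket in isolation and write a private result file."""
    out_path = Path(out_dir) / f"result_{bucket_id}.txt"
    with out_path.open("w") as fh:
        for item_id in item_ids:
            # Placeholder for the real per-item calculation.
            fh.write(f"{item_id}\n")
    return out_path


def run_workflow(buckets, out_dir, workers=3):
    """Run every subjob independently, then gather the result files."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_subjob, bucket_id, ids, out_dir)
            for bucket_id, ids in sorted(buckets.items())
        ]
        return [future.result() for future in futures]


buckets = {0: [3, 6, 9], 1: [1, 4, 7], 2: [2, 5, 8]}
with tempfile.TemporaryDirectory() as out_dir:
    paths = run_workflow(buckets, out_dir)
    processed = sorted(
        int(line) for path in paths for line in path.read_text().splitlines()
    )
```

Because the subjobs share no state, adding workers changes only how many buckets are processed at once, not the result.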
With the support of IWT Vlaanderen, Stevin Supercomputer Infrastructure at Ghent University, the Hercules Foundation and EWI