Recommendations @ Rakuten Group
RecSys 2015/12/01 Vincent Michel & David Mas [email protected] & [email protected] Big Data Europe, Big Data Department, Rakuten Inc. / Big Data, PriceMinister
Presentation Overview
• Rakuten Group
• Recommendations Challenges
  • Challenges of Recommendations @ Rakuten
  • Items Catalogues and Similarities
  • Exploring Recommendations Models
  • Recommendations Evaluation and Public Initiatives
• Conclusion
Rakuten Group Worldwide
Recommendation challenges
• Different languages
• User behavior
• Business areas
Rakuten Group in Numbers
Rakuten in Japan
• > 12,000 employees
• > 48 billion euros of GMS
• > 100,000,000 users
• > 250,000,000 items
• > 40,000 merchants
Rakuten Group
• Kobo: 18,000,000 users
• Viki: 28,000,000 users
• Viber: 345,000,000 users
Rakuten Ecosystem
• Rakuten global ecosystem:
  • Member-based business model that connects Rakuten services
  • Rakuten ID common to various Rakuten services
  • Online shopping and services
Main business areas
• E-commerce
• Internet finance
• Digital content
Recommendation challenges
• Cross-services
• Aggregated data
• Complex user features
Rakuten’s e-commerce: B2B2C Business Model
• Business to Business to Consumer:
  • Merchants located in different regions / online virtual shopping mall
  • Main profit sources:
    • Fixed fees from merchants
    • Fees based on each transaction and other services
Recommendation challenges
• Many shops
• Item references
• Global catalog
Big Data Department @ Rakuten
Big Data Department 150+ engineers – Japan / Europe / US
Missions
• Development and operations of internal systems for:
  • Recommendations
  • Search
  • Targeting
  • User behavior tracking
Average traffic
• > 100,000,000 events / day
• > 40,000,000 item views / day
• > 50,000,000 searches / day
• > 750,000 purchases / day
Technology stack
• Java / Python / Ruby
• Solr / Lucene
• Cassandra / Couchbase
• Hadoop / Hive / Pig
• Redis / Kafka
Recommendations on Rakuten Marketplaces
Non-personalized recommendations
• All-shop recommendations:
  • Item to item
  • User to item
• In-shop recommendations
• Review-based recommendations
Personalized recommendations
• Purchase history recommendations
• Cart add recommendations
• Order confirmation recommendations
System status and scale
• In production in over 35 services of Rakuten Group worldwide
• Several hundred servers running:
  • Hadoop
  • Cassandra
  • APIs
Challenges in Recommendations
(Diagram: Items Catalogue, Items Similarity, Recommendations engine, Evaluation Process)
• Items catalogues
  • Catalogue for multiple shops with different item references?
• Items similarity / distances
  • Cross-services aggregation?
  • Lots of parameters?
• Recommendations engine
  • Best / optimal recommendations logic?
• Evaluation process
  • Offline / online evaluation?
  • Long tail? KPI?
Recommendations Architecture: Constantly Evolving
Browsing Events
Cocounts
Storage
Purchase Events
Catalogue(s)
Distribution layer
Recommendations Offline / materialized
Recommendations Online algebra / multi-arm
Items Catalogues
Use different levels of aggregation to improve recommendations
Category-level (e.g. food, soda, clothes, …)
Product-level (manufactured items)
Item-in-shop level (a specific product sold by a specific shop)
Increased statistical power in co-events computation
Easier business handling (picking the right item)
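A minimal sketch of why aggregation increases statistical power: co-event counts recorded at item-in-shop level are rolled up to product level, so sparse pairs from different shops add up. The mapping, the IDs, and the `roll_up` helper are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical mapping: item-in-shop ID -> product ID
ITEM_TO_PRODUCT = {
    "shopA:cola-33cl": "cola-33cl",
    "shopB:cola-33cl": "cola-33cl",
    "shopA:chips": "chips",
    "shopC:chips": "chips",
}


def roll_up(pair_counts, mapping):
    """Aggregate co-event counts from a fine level to a coarser level."""
    rolled = Counter()
    for (a, b), n in pair_counts.items():
        pa, pb = mapping[a], mapping[b]
        if pa != pb:  # skip pairs that collapse onto the same coarse entity
            rolled[tuple(sorted((pa, pb)))] += n
    return rolled


# Illustrative counts: each shop-level pair was seen only once
item_level = Counter({
    ("shopA:cola-33cl", "shopA:chips"): 1,
    ("shopB:cola-33cl", "shopC:chips"): 1,
    ("shopA:cola-33cl", "shopB:cola-33cl"): 2,
})
product_level = roll_up(item_level, ITEM_TO_PRODUCT)
```

The same function could be applied again with a product-to-category mapping to reach category level.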
Enriching Catalogues using Record Linkage
Record linkage
• Use external sources (e.g., Wikidata) to align markets' products
• Fuzzy matching of 600K vs 350K items for the movie alignment use case
• Blocking algorithm
Cross recommendation
• Global catalog
• Items aggregation
• Helps with cold start issues
• Improved navigation
(Diagram: Marketplace 1, Marketplace 2, Reference database)
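A minimal sketch of record linkage with blocking and fuzzy matching, using only the Python standard library. The blocking rule (release year plus first letter of the title), the 0.85 threshold, the record fields, and the IDs are assumptions made for illustration; the slides do not describe the actual matching logic.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and collapse whitespace before comparing titles."""
    return " ".join(title.lower().split())


def blocking_key(record):
    # Hypothetical blocking rule: only records sharing this key are compared
    return (record["year"], normalize(record["title"])[:1])


def link_records(market_items, reference_items, threshold=0.85):
    """Return {market item id: reference id} for the best fuzzy match."""
    blocks = defaultdict(list)
    for ref in reference_items:
        blocks[blocking_key(ref)].append(ref)

    links = {}
    for item in market_items:
        best_id, best_score = None, threshold
        for ref in blocks.get(blocking_key(item), []):
            score = SequenceMatcher(
                None, normalize(item["title"]), normalize(ref["title"])
            ).ratio()
            if score >= best_score:
                best_id, best_score = ref["id"], score
        if best_id is not None:
            links[item["id"]] = best_id
    return links


# Illustrative records (IDs are made up)
reference = [
    {"id": "ref-1", "title": "The Matrix", "year": 1999},
    {"id": "ref-2", "title": "The Mummy", "year": 1999},
]
market = [
    {"id": "m1", "title": "the  matrix", "year": 1999},
    {"id": "m2", "title": "Unknown Film", "year": 2001},
]
links = link_records(market, reference)
```

Blocking keeps the number of pairwise comparisons far below the full 600K × 350K product, at the cost of missing matches whose blocking keys differ.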
Co-occurrences and Similarities Computation
Multiple possible parameters:
• Size of the time window to consider: do browsing and purchase data reflect similar behavior?
• Threshold on co-occurrences: is one co-occurrence significant enough to be used? Two? Three?
• Symmetric or asymmetric: is the order important in the co-occurrence? A then B == B then A?
• Similarity metrics: which similarity metric should be computed from the co-occurrences?
Only access to unitary data (purchase / browsing)
Use co-occurrences for computing items similarity
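The parameters above can be sketched as follows: co-occurrences are counted per session, either symmetrically or in "A then B" order, a minimum count acts as the threshold, and two of the metrics named later in the deck (cosine similarity and conditional probability) are derived from the counts. Function names, the `min_count` parameter, and the session data are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def count_cooccurrences(sessions, symmetric=True):
    """Count items and item pairs over sessions (ordered lists of item IDs).

    Symmetric mode treats (A, B) and (B, A) as one pair; asymmetric mode
    only counts "A then B".
    """
    item_counts, pair_counts = Counter(), Counter()
    for session in sessions:
        items = list(dict.fromkeys(session))  # dedupe, keep first-seen order
        item_counts.update(items)
        for a, b in combinations(items, 2):  # a was seen before b
            key = tuple(sorted((a, b))) if symmetric else (a, b)
            pair_counts[key] += 1
    return item_counts, pair_counts


def cosine_similarity(a, b, item_counts, pair_counts, min_count=2):
    """Symmetric similarity; pairs below the threshold score 0."""
    n = pair_counts[tuple(sorted((a, b)))]
    if n < min_count:
        return 0.0
    return n / sqrt(item_counts[a] * item_counts[b])


def conditional_probability(a, b, item_counts, pair_counts, min_count=2):
    """Asymmetric estimate of P(b | a) from "a then b" counts."""
    n = pair_counts[(a, b)]
    if n < min_count:
        return 0.0
    return n / item_counts[a]


# Illustrative sessions
sessions = [["tv", "cable"], ["tv", "cable", "stand"], ["cable", "tv"], ["tv"]]
item_counts, sym_pairs = count_cooccurrences(sessions, symmetric=True)
_, asym_pairs = count_cooccurrences(sessions, symmetric=False)
cos = cosine_similarity("tv", "cable", item_counts, sym_pairs)
p_cable_given_tv = conditional_probability("tv", "cable", item_counts, asym_pairs)
p_tv_given_cable = conditional_probability("cable", "tv", item_counts, asym_pairs)
```

The time window is not modeled here; it would determine which events end up in the same session list.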
Co-occurrences Example
(Figure: timeline of browsing and purchase events dated between 08/09/2015 and 24/11/2015, grouped either by session or by time window 1 and time window 2.)
Co-occurrences Computation
Classical co-occurrences
• Co-purchases: complementary items
• Co-browsing: substitute items
Other possible co-occurrences
• Items browsed and bought together
• Items browsed and not bought together
Example widget titles: “You may also want…”, “Similar items…”
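A minimal sketch of the "other possible co-occurrences", assuming each session provides a set of browsed items and a set of purchased items. Reading "browsed and not bought together" as a hint of substitute items is an interpretation, and the function name and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def split_pairs(sessions):
    """sessions: iterable of (browsed_items, purchased_items) sets.

    Returns two counters: pairs bought together, and pairs seen in the
    same session but not bought together.
    """
    bought_together, not_bought_together = Counter(), Counter()
    for browsed, purchased in sessions:
        for a, b in combinations(sorted(browsed | purchased), 2):
            if a in purchased and b in purchased:
                bought_together[(a, b)] += 1
            else:
                not_bought_together[(a, b)] += 1
    return bought_together, not_bought_together


# Illustrative sessions
sessions = [
    ({"tv_a", "tv_b", "cable"}, {"tv_a", "cable"}),
    ({"tv_a", "tv_b"}, {"tv_b"}),
]
bought_together, not_bought_together = split_pairs(sessions)
```

Here the two TVs are repeatedly browsed together but never bought together, while the TV and the cable are bought together.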
Recommendations Algebra
Key ideas
• Reuse already existing logics and combine them easily.
• Write business logic, not code!
• Handle multiple input/output formats.
Algebra for defining and combining recommendations engines
Available Logics
• Content-based
• Collaborative-filtering
• Item-item
• User-item (personalization)
Available Backends
• In-memory
• HDF5 files
• Cassandra
• Couchbase
Available Hybridization
• Linear algebra / weighting
• Mixed
• Cascade engines
• Meta-level
Python Algebra Example
>>> engine1 = RecommendationsEngine(nb_recos=20, datatype='purchase',
...                                 asymmetric=True,
...                                 distance='conditional_probability')
>>> engine2 = RecommendationsEngine(similarity_th=0.01, datatype='browsing',
...                                 asymmetric=False,
...                                 distance='cosine_similarity')
>>> composite_engine = engine1 + 0.2 * engine2
>>> # Get recommendations from items (item-to-item)
>>> recos = composite_engine.recommendations_by_items([123, 456, 789, …])
(Diagram: purchase-based engine (top-20, asymmetric, conditional probability) + 0.2 × browsing-based engine (similarity > 0.01, symmetric, cosine similarity) = composite engine)
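One way such an algebra could be built in Python is operator overloading. The sketch below is not the Rakuten implementation: `ToyEngine`, its dictionary-backed scores, and the `nb_recos` default are hypothetical, and only the `engine1 + 0.2 * engine2` syntax and the `recommendations_by_items` method name are taken from the slide.

```python
from collections import defaultdict


class ToyEngine:
    """Hypothetical engine backed by {item: {candidate: score}}."""

    def __init__(self, scores):
        self.scores = scores

    def score(self, item):
        return dict(self.scores.get(item, {}))

    def __add__(self, other):
        # engine + engine -> sum of scores
        return _Combined([(1.0, self), (1.0, other)])

    def __rmul__(self, weight):
        # 0.2 * engine -> scaled scores
        return _Combined([(float(weight), self)])

    def recommendations_by_items(self, items, nb_recos=3):
        totals = defaultdict(float)
        for item in items:
            for candidate, s in self.score(item).items():
                if candidate not in items:  # do not recommend the inputs
                    totals[candidate] += s
        ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
        return [candidate for candidate, _ in ranked[:nb_recos]]


class _Combined(ToyEngine):
    """Weighted linear combination of engines."""

    def __init__(self, weighted_engines):
        self.weighted_engines = weighted_engines

    def score(self, item):
        totals = defaultdict(float)
        for weight, engine in self.weighted_engines:
            for candidate, s in engine.score(item).items():
                totals[candidate] += weight * s
        return dict(totals)


# Illustrative scores
purchase = ToyEngine({123: {"a": 0.5, "b": 0.1}})
browsing = ToyEngine({123: {"b": 1.0, "c": 1.0}})
composite = purchase + 0.2 * browsing
recos = composite.recommendations_by_items([123])
```

Because the combination is itself an engine, composites can be nested and reused like any single engine.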
Python Algebra with Personalization
>>> history = HistoryEngine(datatype='purchase', time_window=180, time_decay=0.01)
>>> engine1.register_history_engine(history)
>>> # …same code as previously (user-to-item)
>>> recos = composite_engine.recommendations_by_user('userid')
(Diagram: purchase-history engine (time window 180 days, time decay 0.01) feeding the composite engine: purchase-based (top-20, asymmetric, conditional probability) + 0.2 × browsing-based (similarity > 0.01, symmetric, cosine similarity))
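A minimal sketch of how a time window and a time decay could weight a user's purchase history. The exponential form `exp(-time_decay * age_in_days)` is an assumption; the slides give only the parameter values (180 days, 0.01), not the formula. The function name and the data are hypothetical.

```python
from datetime import date
from math import exp


def history_weights(purchases, today, time_window=180, time_decay=0.01):
    """purchases: list of (item_id, purchase_date). Returns {item_id: weight}.

    Purchases older than `time_window` days are ignored; the rest are
    weighted by an assumed exponential decay on their age in days.
    """
    weights = {}
    for item, day in purchases:
        age = (today - day).days
        if 0 <= age <= time_window:
            weight = exp(-time_decay * age)
            weights[item] = max(weight, weights.get(item, 0.0))
    return weights


# Illustrative history
today = date(2015, 12, 1)
weights = history_weights(
    [
        ("book", date(2015, 11, 21)),  # 10 days old
        ("lamp", date(2015, 8, 23)),   # 100 days old
        ("sofa", date(2015, 1, 1)),    # outside the 180-day window
    ],
    today,
)
```

User-to-item recommendations could then be obtained by summing the item-to-item scores of each history item, multiplied by its weight.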
Python Algebra – Complete Example
Diagram summary:
• Item-level composite engine: purchase-based (top-20, asymmetric, conditional probability) + 0.2 × browsing-based (similarity > 0.01, symmetric, cosine similarity)
• Purchase history: time window 180 days, time decay 0.01
• X (cascade)
• Category-level composite engine: purchase-based (category-level, similarity > 0.01, asymmetric, conditional probability) + 0.1 × browsing-based (category-level, similarity > 0.1, symmetric, cosine similarity)
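One possible reading of the cascade operator, sketched under assumptions: the item-level ranking is kept as is, and the category-level engine only fills the slots that remain empty. The slides do not define the cascade semantics, so the `cascade` function and its behavior are hypothetical.

```python
def cascade(primary, fallback, nb_recos):
    """Keep the primary ranking and fill remaining slots from the fallback."""
    result = list(primary[:nb_recos])
    for candidate in fallback:
        if len(result) >= nb_recos:
            break
        if candidate not in result:
            result.append(candidate)
    return result


# Illustrative rankings: the item-level engine returns only two items
item_level = ["a", "b"]
category_level = ["b", "c", "d"]
recos = cascade(item_level, category_level, nb_recos=3)
```

With this reading, long-tail items that have few item-level co-occurrences still receive a full list of recommendations.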
Recommendation Quality Challenges
(Figure: products placed in four quadrants (A) to (D) along two axes: minor vs. major (popular) product, and new vs. old product.)
Recommendations categories
• Cold start issue
  • External data?
  • Cross-services?
• Hot products (A)
  • Top-N items?
• Short tail (B)
• Long tail (C + D)
Long Tail is Fat
Long tail numbers
• Most of the items are long tail
• They still represent a large portion of the traffic
(Figure: browsing share and number of items for popular, short-tail, and long-tail items.)
Long tail approaches
• Content-based
• Aggregation / clustering
• Personalization
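A minimal sketch of how the "long tail is fat" observation could be measured from item view counts. The 20% head fraction and the numbers are made up for illustration and are not Rakuten figures.

```python
def tail_shares(view_counts, head_fraction=0.2):
    """Rank items by views and return the tail's share of items and of views."""
    ranked = sorted(view_counts.values(), reverse=True)
    n_head = max(1, round(len(ranked) * head_fraction))
    total_views = sum(ranked)
    return {
        "tail_item_share": (len(ranked) - n_head) / len(ranked),
        "tail_view_share": sum(ranked[n_head:]) / total_views,
    }


# Illustrative data: 2 popular items and 8 rarely viewed items
views = {"hit-1": 40, "hit-2": 20}
views.update({f"tail-{i}": 5 for i in range(8)})
shares = tail_shares(views)
```

In this toy data the tail holds 80% of the items and still receives 40% of the views.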
Evaluation
(Diagram: Browsing History, Query History, and Purchase History feed Datasets and Algorithms; Offline Test (long-term research) is used as a prior for Online Test (KPI maximization); a link is labeled "Correlation between offline metrics & value".)
Hybrid approach
• Offline for long-term and prior
• Online for short-term and maximizing KPIs
Offline Evaluation
Pros/Cons
• Convenient way to try new ideas
• Fast and cheap
• But hard to align with online KPI
Approaches
• Rescoring
• Prediction game
• Business simulator
Target = item bought by user
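A minimal sketch of a prediction-game style offline metric, where the target is the item the user actually bought: the hit rate at k is the fraction of test cases whose bought item appears in the top-k recommendations. The metric choice, the function names, and the toy recommender are assumptions for illustration.

```python
def hit_rate_at_k(recommend, test_cases, k=10):
    """test_cases: list of (history_items, bought_item).

    `recommend(history)` must return a ranked list of item IDs.
    """
    if not test_cases:
        return 0.0
    hits = sum(
        1 for history, target in test_cases if target in recommend(history)[:k]
    )
    return hits / len(test_cases)


# Hypothetical recommender: looks up the last item of the history
TOY_RECOS = {"tv": ["cable", "stand"], "phone": ["case", "charger"]}


def recommend(history):
    return TOY_RECOS.get(history[-1], [])


cases = [
    (["tv"], "cable"),             # hit
    (["phone"], "screen"),         # miss
    (["lamp"], "bulb"),            # miss: no recommendations
    (["book", "phone"], "charger"),  # hit
]
score = hit_rate_at_k(recommend, cases, k=2)
```

Such a metric is fast and cheap to compute, but as noted above it may not align with the online KPI.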
Offline Evaluation for Online Learning
Public Initiative – Viki Recommendation Challenge
567 submissions from 132 participants
http://www.dextra.sg/challenges/rakuten-viki-video-challenge
Conclusion
• Items catalogue: reinforce the statistical power of co-occurrences across shops and services
• Items similarities: find the right parameters for the different use cases
• Recommendations models: what are the best models for in-shop, all-shops, and personalization?
• Evaluation: how to handle the long tail? How to compare different models?
Rakuten provides marketplaces worldwide
Specific challenges for recommendations
We are Hiring!
Data Scientist / Software Developer
• Build algorithms for recommendations, search, targeting
• Predictive modeling, machine learning, natural language processing
• Working close to business
• Python, Java, Hadoop, Couchbase, Cassandra…
• Also hiring: search engine developers, big data system administrators, etc.
Big Data Department – team in Paris http://global.rakuten.com/corp/careers/bigdata/
http://www.priceminister.com/recrutement/?p=197
THANKS !
Questions ?
More on Rakuten tech initiatives
http://www.slideshare.net/rakutentech
http://rit.rakuten.co.jp/oss.html
http://rit.rakuten.co.jp/opendata.html
Positions
• http://global.rakuten.com/corp/careers/bigdata/
• http://www.priceminister.com/recrutement/?p=197