Find it! Nail it! Boosting e-commerce search conversions with machine learning at scale
![Page 1: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/1.jpg)
October 28, 2017
Giuseppe “Pino” Di Fabbrizio
Rakuten Institute of Technology – Boston
![Page 2: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/2.jpg)
![Page 3: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/3.jpg)
3
• Motivations
• Traditional information retrieval models
• Learning-to-rank models
• Relevance
• Ranking Metrics
• Algorithms
• Ranking optimization
• Use cases
• Summary
• What is next?
Disclaimer: Unless otherwise specified, images in this presentation comply with the Creative Commons (CC) license
![Page 4: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/4.jpg)
4
• E-commerce is growing faster than the traditional brick-and-mortar market ($4.06T by 2020)
• Mobile shopping adoption is increasing worldwide (46% of shoppers in Asia and 28% in North America)
• Online catalogs offer broader selections and competitive products
• Electronic money transactions are gaining more consumer trust
• Massive data collected during web and mobile interactions provides the foundation for machine learning-driven optimizations
1.61B shoppers
$1.86T sales
$150B* revenues
ML
*2016 Combined revenues for Amazon, Otto Group, and Rakuten
https://www.statista.com/topics/871/online-shopping/
![Page 5: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/5.jpg)
5
![Page 6: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/6.jpg)
6
250M+ Products
40k+ Categories
![Page 7: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/7.jpg)
7
How do we find the most relevant products for a search query?
www.rakuten.com
Oct 10, 2017
![Page 8: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/8.jpg)
8
[Figure: query, ranking function, and documents, with result positions numbered 1 to 9 on a www.rakuten.com screenshot, Nov 2016]
![Page 9: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/9.jpg)
9
• Relevance is estimated by lexical matches of query terms with document terms
• Examples:
  • Boolean models
  • Vector space models
  • Latent semantic indexing
  • Okapi BM25
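[Figure: retrieval pipeline. Off-line: an indexer builds the index from the documents. On-line: the scoring model matches the query against the index and returns the top-n retrieved documents.]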
![Page 10: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/10.jpg)
10
www.rakuten.com
Oct 10, 2017
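Query (Q)
Document 1 (D1)
Document 2 (D2)

| | iphone | 7 | case |
|---|---|---|---|
| Q | 1 | 1 | 1 |
| D1 | 2 | 2 | 2 |
| D2 | 3 | 1 | 0 |

[Figure: Q, D1, and D2 drawn as vectors over the term axes iphone, 7, and case]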
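The term counts on this slide can be compared with a vector space similarity. The sketch below assumes cosine similarity, a common choice that the slide text itself does not name; the function name is illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)

# Term counts over the vocabulary (iphone, 7, case), copied from the slide.
q = [1, 1, 1]
d1 = [2, 2, 2]
d2 = [3, 1, 0]

sim_d1 = cosine_similarity(q, d1)
sim_d2 = cosine_similarity(q, d2)
print(f"sim(Q, D1) = {sim_d1:.3f}, sim(Q, D2) = {sim_d2:.3f}")
```

D1 points in the same direction as Q, so it ranks above D2 even though D2 repeats "iphone" more often; cosine similarity ignores vector length and looks only at the mix of terms.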
![Page 11: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/11.jpg)
11
• Basic ideas
  • Lexical similarity metrics
  • Penalizing repeated occurrences of the same term
  • Penalizing term frequency for longer documents
• Only a few features
• Hand-tuned feature weights based on heuristics
• Cannot include important search signals such as user feedback, product popularity, purchase history, etc.
• Fast and scalable
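The two penalties listed above correspond to the term-frequency saturation and length normalization in Okapi BM25. The following is a minimal sketch, not the production scoring model: the parameter values `k1=1.2` and `b=0.75` are commonly used defaults assumed here, the smoothed IDF is one of several variants in use, and the toy documents are hypothetical.

```python
import math
from collections import Counter

def term_weight(tf, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 term-frequency part: saturates as tf grows, shrinks for long documents."""
    length_norm = 1 - b + b * doc_len / avg_len
    return tf * (k1 + 1) / (tf + k1 * length_norm)

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`."""
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        score = 0.0
        for term in query_terms:
            if counts[term] == 0:
                continue
            # Smoothed IDF variant (assumption); it is always positive.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * term_weight(counts[term], len(doc), avg_len, k1, b)
        scores.append(score)
    return scores

# Hypothetical tokenized product titles.
docs = [
    ["iphone", "7", "case"],
    ["iphone", "iphone", "iphone", "7", "cable"],
    ["iphone", "7", "case", "black", "leather", "slim", "cover"],
    ["galaxy", "charger"],
]
scores = bm25_scores(["iphone", "7", "case"], docs)
```

The first and third documents match the same query terms, but the longer third document scores lower. The hand-tuned weights the slide mentions are `k1` and `b` here.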
![Page 12: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/12.jpg)
12
• Data-driven approach
• Directly optimizes product ranking based on relevance (unlike classification and regression ML tasks)
• Handle thousands of features
• Robust to noisy data
• Handle personalization
• Industry & research state-of-the-art (Amazon, eBay, Microsoft, Yahoo!, Yandex, etc.)
![Page 13: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/13.jpg)
13
A document is relevant if it contains the information the user was looking for when submitting the query
Relevance is subjective and depends on many factors:
• context (what is displayed and how)
• task (purchase, search info, answer, etc.)
• novelty (unexpected data, ads, etc.)
• time and user’s effort involved
![Page 14: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/14.jpg)
14
[Figure: www.rakuten.com screenshot, Nov 2016, with callouts 1, 2, and 3]
![Page 15: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/15.jpg)
15
[Figure: www.rakuten.com screenshot, Nov 2016, highlighting the click, add, and buy actions]
![Page 16: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/16.jpg)
16
• Clickthrough data (user’s implicit feedback) as source of relevance for search query / document pairs
• Pros
  • Abundant and easy to harvest
  • Always fresh
  • Unbiased
• Cons
  • Noisy
  • Long tail queries
• Simple relevance mapping:
  • score = 0 (not relevant), score = 3 (highly relevant)
  • Purchase > cart > click > impression
| Score | User’s implicit feedback |
|---|---|
| 3 | Product purchased |
| 2 | Product added to the shopping cart |
| 1 | Product clicked |
| 0 | No clicks |
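The mapping above can be turned into training labels for query / product pairs. This is a minimal sketch: the event names and log layout are hypothetical, and keeping the strongest signal per pair is an assumption suggested by the purchase > cart > click > impression ordering.

```python
# Scores follow the slide's table; the event names are hypothetical.
EVENT_SCORE = {"purchase": 3, "cart": 2, "click": 1, "impression": 0}

def relevance_labels(events):
    """Map (query, product_id, event) tuples to one label per query/product pair.

    The strongest signal observed for a pair wins (assumption).
    """
    labels = {}
    for query, product_id, event in events:
        key = (query, product_id)
        labels[key] = max(labels.get(key, 0), EVENT_SCORE[event])
    return labels

# Hypothetical interaction log for one query.
log = [
    ("iphone 7 case", "p1", "impression"),
    ("iphone 7 case", "p1", "click"),
    ("iphone 7 case", "p1", "purchase"),
    ("iphone 7 case", "p2", "impression"),
    ("iphone 7 case", "p2", "click"),
    ("iphone 7 case", "p3", "impression"),
]
labels = relevance_labels(log)
```

Here `p1` is labeled 3, `p2` is labeled 1, and `p3`, which was shown but never clicked, is labeled 0.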
![Page 17: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/17.jpg)
17
[Figure: www.rakuten.com results page, Aug 2017, annotated with the browser viewport, the click, and regions of seen, potentially seen, and unseen products]
![Page 18: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/18.jpg)
18
Normalized Discounted Cumulative Gain (NDCG)
[Figure: plot with documents (1 to 10) on the x-axis and NDCG (0 to 1) on the y-axis]
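The slide does not spell out the formula, so the sketch below assumes a widely used NDCG formulation: a gain of 2^rel - 1 for each document, a log2(rank + 1) position discount, and normalization by the DCG of the ideal ordering. The relevance list is a hypothetical example on the 0 to 3 scale.

```python
import math

def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of graded relevance labels."""
    rels = relevances[:k] if k is not None else relevances
    # enumerate() is 0-based, so the document at rank r gets discount log2(r + 1).
    return sum((2 ** rel - 1) / math.log2(idx + 2) for idx, rel in enumerate(rels))

def ndcg(relevances, k=None):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical labels in ranked order (3 = purchased ... 0 = no clicks).
ranked = [1, 3, 0, 2, 0]
score = ndcg(ranked, k=5)
print(f"NDCG@5 = {score:.3f}")
```

A list that is already sorted by relevance scores 1, and the score drops as highly relevant documents move down the list.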
![Page 19: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/19.jpg)
19
• Tree ensemble method
• Handle sparse data
• Handle missing values and various value types
• Robust to outliers
• Learn higher-order feature interactions
• Invariant to feature scaling
• Highly scalable and optimized open source implementation (XGBoost)
![Page 20: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/20.jpg)
20
Point-wise
• Input: single documents / Output: class labels or scores
• Classify each document as relevant or non-relevant.
• Adjust w to reduce classification errors
Pairwise ranking
• Input: document pairs / Output: partial order preferences
• Classify pairs of documents – D1 > D2?
• Adjust w to reduce discordant pairs
List-wise ranking
• Input: document collections / Output: ranked document list
• Score permutations -- Is {D1,D2,…} > {D1’,D2’,…} ?
• Adjust w to directly maximize ranking measure of interest (NDCG)
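[Figure: for a query Q, point-wise scores a single document Di, pairwise compares Di > Dj, and list-wise orders Di > Dj > Dk]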
![Page 21: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/21.jpg)
21
Green = relevant
Gray = not-relevant
Blue arrows = boost for pair-wise loss function
Red arrows = boost for list-wise loss function
(a) is the perfect ranking;
(b) is ranking with 10 pairwise errors;
(c) is ranking with 8 pairwise errors
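The sketch below shows why reducing pairwise errors does not always improve a list-wise metric. The document positions are hypothetical, chosen only to reproduce the 10 and 8 pairwise errors quoted above; the layout of the original figure is not recoverable from the slide text. NDCG is computed with binary gains and a log2 discount (an assumption).

```python
import math

def pairwise_errors(relevances):
    """Count pairs where a less relevant document is ranked above a more relevant one."""
    errors = 0
    for i in range(len(relevances)):
        for j in range(i + 1, len(relevances)):
            if relevances[i] < relevances[j]:
                errors += 1
    return errors

def ndcg(relevances):
    """NDCG with binary gains and a log2 position discount."""
    def dcg(rels):
        return sum(rel / math.log2(idx + 2) for idx, rel in enumerate(rels))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def ranking(relevant_positions, n=12):
    """Binary relevance list with 1 at the given 1-based positions."""
    return [1 if pos in relevant_positions else 0 for pos in range(1, n + 1)]

ranking_b = ranking({1, 12})  # hypothetical: relevant documents first and last
ranking_c = ranking({4, 7})   # hypothetical: both relevant documents mid-list

for name, r in (("b", ranking_b), ("c", ranking_c)):
    print(f"({name}) pairwise errors = {pairwise_errors(r)}, NDCG = {ndcg(r):.3f}")
```

Ranking (c) has fewer pairwise errors than (b) but a lower NDCG, because NDCG rewards having a relevant document at the very top. A list-wise loss pushes toward that outcome, while a pairwise loss only counts discordant pairs.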
![Page 22: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/22.jpg)
22
• Relevance: User’s behavior signals
• Ranking Metrics: NDCG
• Machine Learning Algorithm: Gradient Tree Boosting
• Ranking optimization: List-wise with NDCG metrics
![Page 23: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/23.jpg)
23
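[Figure: learning-to-rank architecture. Off-line: the indexer builds the index from the documents, and learning to rank trains a re-ranking model from training data (query, features, scores, relevance). On-line: the scoring model returns the top-n ranked documents (n > 1M) for the query, and the re-ranking model returns the top-m re-ranked documents (m < 1k).]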
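The two-stage flow in this diagram can be sketched as a cheap first-pass retrieval followed by a re-ranking model applied to the surviving candidates. Everything below is illustrative: the toy index, the popularity values, the term-overlap scorer, and the hand-written linear `model` (a stand-in for a trained learning-to-rank model) are all hypothetical.

```python
# Hypothetical toy catalog and popularity signal.
INDEX = {
    "p1": "iphone 7 case black",
    "p2": "iphone 7 case",
    "p3": "iphone 7 charger",
    "p4": "galaxy case",
}
POPULARITY = {"p1": 0.9, "p2": 0.2, "p3": 0.95, "p4": 0.5}

def lexical_overlap(query, text):
    """First-stage score: number of query terms that appear in the document."""
    return len(set(query.split()) & set(text.split()))

def retrieve(query, n):
    """Stage 1: score every indexed document lexically and keep the top n."""
    scored = sorted(((lexical_overlap(query, text), doc_id)
                     for doc_id, text in INDEX.items()), reverse=True)
    return [doc_id for _, doc_id in scored[:n]]

def features(query, doc_id):
    """Feature vector for a query/document pair: lexical score plus a behavioral signal."""
    return [lexical_overlap(query, INDEX[doc_id]), POPULARITY[doc_id]]

def model(feature_vector):
    """Stand-in for a trained re-ranking model (hand-picked weights, an assumption)."""
    overlap, popularity = feature_vector
    return 0.5 * overlap + 2.0 * popularity

def rerank(query, candidates, m):
    """Stage 2: re-score only the candidates with the richer model and keep the top m."""
    rescored = sorted(((model(features(query, doc_id)), doc_id)
                       for doc_id in candidates), reverse=True)
    return [doc_id for _, doc_id in rescored[:m]]

candidates = retrieve("iphone 7 case", n=3)
top = rerank("iphone 7 case", candidates, m=2)
```

Only the first stage touches the whole index; the more expensive model runs on the much smaller candidate set, which is what keeps the on-line path fast.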
![Page 24: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/24.jpg)
24
www.rakuten.com
Mar 2017
![Page 25: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/25.jpg)
25
Search query: “40inch tv”
[Figure: result lists compared for regular text search versus search with user’s signals and learning-to-rank models; three results are marked "Not relevant"]
![Page 26: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/26.jpg)
26
Conversion rate (simulation)

| | NDCG | CTR | Simulated queries |
|---|---|---|---|
| Relative gain | 15.58% | 7.50% | 10,000 |

| Depth / Estimators | 5 / 500 | 3 / 500 | 10 / 500 | 3 / 500 |
|---|---|---|---|---|
| NDCG | 0.687 | 0.688 | 0.685 | 0.689 |
| Relative gain | 15.14% | 15.41% | 14.92% | 15.58% |
| Training time (56 cores) | 2:45:48 | 1:20:57 | 35:25:44 | 1:58:07 |
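The slide reports relative gains without stating the baseline NDCG. As a consistency check, the sketch below backs out the implied baseline under the assumption that relative gain means (model NDCG - baseline NDCG) / baseline NDCG; the slide does not define it, so the result is an inference rather than a reported number.

```python
# (depth / estimators, NDCG, relative gain) copied from the slide, in column order.
results = [
    ("5 / 500", 0.687, 0.1514),
    ("3 / 500", 0.688, 0.1541),
    ("10 / 500", 0.685, 0.1492),
    ("3 / 500", 0.689, 0.1558),
]

def implied_baseline(ndcg, relative_gain):
    """Baseline NDCG implied by gain = (ndcg - baseline) / baseline (assumed definition)."""
    return ndcg / (1.0 + relative_gain)

baselines = [implied_baseline(ndcg, gain) for _, ndcg, gain in results]
for (config, _, _), base in zip(results, baselines):
    print(f"{config:>8}: implied baseline NDCG = {base:.3f}")
```

All four columns imply a baseline NDCG of roughly 0.596, which suggests the gains were measured against the same reference ranking.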
![Page 27: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/27.jpg)
27
• Automatic Speech Recognition: 2011
• Computer Vision: 2013
• Natural Language Processing: 2013-2015
• Information Retrieval: 2017?
![Page 28: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/28.jpg)
28
Bhaskar Mitra, Fernando Diaz, and Nick Craswell. 2017. Learning to Match using Local and Distributed Representations of Text for Web Search. In Proceedings of the 26th International Conference on World Wide Web (WWW '17).
![Page 29: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/29.jpg)
29
Bhaskar Mitra, Fernando Diaz, and Nick Craswell. 2017. Learning to Match using Local and Distributed Representations of Text for Web Search. In Proceedings of the 26th International Conference on World Wide Web (WWW '17).
![Page 30: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/30.jpg)
30
Bhaskar Mitra, Fernando Diaz, and Nick Craswell. 2017. Learning to Match using Local and Distributed Representations of Text for Web Search. In Proceedings of the 26th International Conference on World Wide Web (WWW '17).
![Page 31: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/31.jpg)
31
• Traditional IR methods do not scale to modern e-commerce needs
• User’s implicit feedback is a proxy for the relevance of search query / document pairs
• Learning-to-rank (LTR) methods scale to thousands of features and are robust to data noise
• LTR with a list-wise loss function substantially improves search relevance (15.6% NDCG increase on e-commerce data)
• NDCG improvements directly correlate with conversion rates (7.5% simulated CTR increase on e-commerce data)
• DNN methods for IR are starting to outperform traditional ML methods
![Page 32: Find it! Nail it!Boosting e-commerce search conversions with machine learning at scale](https://reader031.vdocuments.us/reader031/viewer/2022030317/5a6703637f8b9a91298b5477/html5/thumbnails/32.jpg)