Grocery Shopping Recommendation Based on Basket-Sensitive Random Walk
Unilever Discover (Colworth), UK(1)
University of Manchester, UK(2)
Liverpool John Moores University, UK(3)
Ming Li(1), Ben Dias(1), Wael El-Deredy(2), Ian Jarman(3) and Paulo Lisboa(3)
The 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Paris, 2009
The Problem
• Grocery Shopping Recommendation:
– Grocery shopping is considered a real drudgery
– High repeat-purchase rate with a low injection of new products
– Implicit product preference feedback
– Recommendation based on the current context
• Three Issues in a Recommendation Model
– A method to derive user-wise or product-wise similarity
– A method to generate recommendations based on those similarities
– An evaluation strategy to tune the model on retrospective data for optimal live performance
Prior Art
• Item-Item Collaborative Filtering
– memory efficient (e.g. 54GB -> 24GB -> 9MB)
– subject to the sparsity problem
  • only exploits direct (i.e. first-order) neighbourhood information
• Movie/Book Recommendation via Random Walk Model
– alleviates the sparsity problem by exploiting transitive (i.e. higher-order) neighbourhood information
– two concerns:
  • definition of the transition probability (column-normalized similarity)
  • ranking is insensitive to the current context
Random-walk ranking: $R^{(n)} = P^{\mathsf{T}} R^{(n-1)}$
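The second concern can be made concrete with a minimal sketch. Iterating $R^{(n)} = P^{\mathsf{T}} R^{(n-1)}$, with $P^{\mathsf{T}}$ taken as a column-normalized symmetric similarity matrix, converges to a stationary vector that does not depend on the starting basket. The similarity values and helper names below are hypothetical, and plain Python lists are used to stay dependency-free.

```python
def column_normalize(sim):
    """Divide each column of a square matrix by its column sum.

    For a symmetric similarity matrix this equals P^T, where P is the
    row-stochastic transition matrix P[i][j] = sim[i][j] / sum_k sim[i][k].
    """
    n = len(sim)
    col_sums = [sum(sim[i][j] for i in range(n)) for j in range(n)]
    return [[sim[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def mat_vec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def random_walk(p_t, r0, steps=200):
    """Iterate R(n) = P^T R(n-1), starting from r0."""
    r = list(r0)
    for _ in range(steps):
        r = mat_vec(p_t, r)
    return r


# Hypothetical symmetric item-item similarities for four products.
SIM = [
    [0.0, 0.5, 0.2, 0.1],
    [0.5, 0.0, 0.4, 0.3],
    [0.2, 0.4, 0.0, 0.6],
    [0.1, 0.3, 0.6, 0.0],
]
P_T = column_normalize(SIM)

# Two different "baskets" used as starting vectors.
r_a = random_walk(P_T, [1.0, 0.0, 0.0, 0.0])
r_b = random_walk(P_T, [0.0, 0.0, 0.5, 0.5])
```

Both starting vectors end at the same scores (proportional to each product's total similarity), which is the context insensitivity that a basket-sensitive model sets out to remove.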
Proposed: Basket-Sensitive Random Walk
• Define the product transition probability via a bipartite consumer-product network:
– i.e. first-order similarity
– α: penalization of consumers (products) with too many transactions
– subject to data sparsity
• Enhance the similarity with higher-order information, but with reference to a user basket U:
– introduce context information through $U_{\text{basket}}$ (i.e. the personalization vector)
– 1-d controls the bias between the current and past baskets
[Figure: bipartite network linking consumers c1, c2, c3, c4 to products p1, p2, p3; each edge f(i,j) is weighted by the number of purchases (#purchase)]
$R_{\text{basket}} = d\,P^{\mathsf{T}} R_{\text{basket}} + (1-d)\,U_{\text{basket}}$
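The slides do not spell out the bipartite transition probability, so the sketch below assumes one plausible form, purely for illustration: a product walks to a consumer and on to another product, and each consumer's contribution is divided by (that consumer's total purchases)^α. It then iterates $R_{\text{basket}} = d\,P^{\mathsf{T}} R_{\text{basket}} + (1-d)\,U_{\text{basket}}$, with $U_{\text{basket}}$ assumed uniform over the basket items. The purchase counts, function names and the default d = 0.8 are hypothetical.

```python
def transition_matrix(f, alpha):
    """Hypothetical row-stochastic product-to-product transition matrix P.

    f[c][j] is the number of purchases of product j by consumer c.
    Product j walks to consumer c and then to product k; each consumer's
    contribution is divided by (total purchases of c) ** alpha, so a larger
    alpha penalizes heavy shoppers more. Assumes every product was bought
    at least once (otherwise its row sum would be zero).
    """
    n_prod = len(f[0])
    w = [[0.0] * n_prod for _ in range(n_prod)]
    for row in f:
        total = sum(row)
        if total == 0:
            continue  # consumers with no purchases contribute nothing
        penalty = total ** alpha
        for j in range(n_prod):
            for k in range(n_prod):
                w[j][k] += row[j] * row[k] / penalty
    row_sums = [sum(w_row) for w_row in w]
    return [[w_jk / s for w_jk in w_row] for w_row, s in zip(w, row_sums)]


def basket_sensitive_walk(p, basket, d=0.8, steps=200):
    """Iterate R = d * P^T R + (1 - d) * U_basket to (approximate) convergence.

    U_basket is assumed uniform over the product indices in `basket`.
    """
    n = len(p)
    u = [1.0 / len(basket) if i in basket else 0.0 for i in range(n)]
    r = list(u)
    for _ in range(steps):
        # (P^T R)[k] = sum_j P[j][k] * R[j]
        spread = [sum(p[j][k] * r[j] for j in range(n)) for k in range(n)]
        r = [d * s + (1 - d) * u_k for s, u_k in zip(spread, u)]
    return r


# Hypothetical purchase counts: rows are consumers c1..c4, columns products p1..p3.
F = [
    [3, 1, 0],
    [0, 2, 1],
    [1, 0, 4],
    [0, 5, 2],
]

P = transition_matrix(F, alpha=0.5)
r1 = basket_sensitive_walk(P, {0})  # basket containing p1
r2 = basket_sensitive_walk(P, {2})  # basket containing p3
```

Unlike the plain random walk, the two baskets produce different score vectors, because the restart term (1-d)U_basket keeps pulling probability mass back toward the current basket.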
Proposed (continued)
• Basket-Sensitive Random Walk Model
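– A straightforward implementation is infeasible: the fixed point of $R_{\text{basket}} = d\,P^{\mathsf{T}} R_{\text{basket}} + (1-d)\,U_{\text{basket}}$ would have to be recomputed online for every basket
• Quick approximation of $R_{\text{basket}}$ by $\hat{R}_{\text{basket}} = \sum_{i \in p_{\text{basket}}} R^{i}_{\text{item}}$
– $R^{i}_{\text{item}}$: offline calculation (also called 'random walk with restart')
– leads to the same ordered list of recommendations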
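Why summing offline vectors gives the same ordering: the fixed point is linear in the restart vector, $R_{\text{basket}} = (1-d)(I - dP^{\mathsf{T}})^{-1}U_{\text{basket}}$, so if $U_{\text{basket}}$ is uniform over the basket items (an assumption here), $\hat{R}_{\text{basket}} = \sum_{i} R^{i}_{\text{item}} = |\text{basket}|\,R_{\text{basket}}$, a positive rescaling. A minimal sketch with a hypothetical three-product transition matrix:

```python
def rwr(p, restart, d=0.8, steps=200):
    """Random walk with restart: iterate R = d * P^T R + (1 - d) * restart."""
    n = len(p)
    r = list(restart)
    for _ in range(steps):
        spread = [sum(p[j][k] * r[j] for j in range(n)) for k in range(n)]
        r = [d * s + (1 - d) * q for s, q in zip(spread, restart)]
    return r


def one_hot(i, n):
    """Restart vector concentrated on product i."""
    return [1.0 if k == i else 0.0 for k in range(n)]


# Hypothetical row-stochastic transition matrix for three products.
P = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
]
N = len(P)

# Offline: one random-walk-with-restart vector per product.
R_ITEM = [rwr(P, one_hot(i, N)) for i in range(N)]


def approx_basket_scores(basket):
    """Online: add up the precomputed vectors of the products in the basket."""
    return [sum(R_ITEM[i][k] for i in basket) for k in range(N)]


def exact_basket_scores(basket):
    """Reference: solve the basket-sensitive walk directly with a uniform U_basket."""
    u = [1.0 / len(basket) if i in basket else 0.0 for i in range(N)]
    return rwr(P, u)
```

Here `approx_basket_scores({0, 2})` equals `2 * exact_basket_scores({0, 2})` up to rounding. Only one vector per product is stored offline; serving a basket then costs one vector addition per basket item.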
Performance Metric
[Figure: Characteristics of performance metrics. The metrics range from basket oriented to product oriented, and from bias toward the most popular products to bias toward the least popular products.]
• binary Hit Rate via popularity-based split, bHR(pop)
• macro-averaged Hit Rate via leave-one-out split, macroHR
• micro-averaged Hit Rate via leave-one-out split
• weighted Hit Rate via leave-one-out split, wHR
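The slides name these metrics without giving formulas. Under the usual reading of micro- and macro-averaging (an assumption here), a leave-one-out test case is a held-out item plus the top-N list built from the rest of its basket; micro-averaging pools all cases, while macro-averaging first computes a hit rate per product and then averages over products. The toy cases are hypothetical.

```python
from collections import defaultdict


def micro_hit_rate(cases):
    """Pool all leave-one-out cases: hits / number of cases."""
    hits = sum(1 for held_out, top_n in cases if held_out in top_n)
    return hits / len(cases)


def macro_hit_rate(cases):
    """Compute a hit rate per held-out product, then average over products."""
    per_product = defaultdict(lambda: [0, 0])  # product -> [hits, cases]
    for held_out, top_n in cases:
        per_product[held_out][0] += int(held_out in top_n)
        per_product[held_out][1] += 1
    rates = [hits / n for hits, n in per_product.values()]
    return sum(rates) / len(rates)


# Hypothetical cases: (held-out product, top-N list built from the rest of the basket).
CASES = [
    ("milk", ["milk", "bread"]),
    ("milk", ["milk", "eggs"]),
    ("milk", ["bread", "milk"]),
    ("saffron", ["milk", "bread"]),
]

print(micro_hit_rate(CASES))  # 0.75
print(macro_hit_rate(CASES))  # 0.5
```

The single miss on the rare product costs the micro average only a quarter but halves the macro average, consistent with macroHR's stronger bias toward the least popular products.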
Experiments
• Three real grocery data sets
– One from the collaborator
• online grocery store www.Leshop.ch
– Two from previously published research
• membership retailer warehouse (Chun-Nan Hsu et al., JML04)
• anonymized retail store (Tom Brijs et al., KDD99)
• Several performance metrics with different characteristics
• Experiments
– Impact of model parameters
– Comparison with other models
– Data sparsity
– Personalization
Impact of model parameters: α and d
Performance metrics: bHR(pop) (left) and macroHR (right)
Observation: bHR and macroHR are inconsistent on some data sets (macroHR has a stronger bias toward the least popular products)
Model parameters:
– α: a larger value penalizes products with too many transactions more strongly
– 1-d: bias toward the current basket
Comparison with other models
Performance metrics: bHR(pop) and bHR(rnd), the binary hit rate using a popularity-based split and a random split
Observations:
(1) Empirical advantage of network-based similarity over metric-based similarities
(2) Performance is overestimated by a random basket split
• An appropriate performance metric needs to be chosen according to business rules, e.g. bHR(pop) for grocery products and bHR(rnd) for movies
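The overestimation by a random split can be illustrated with a toy popularity recommender. The slide does not give the popularity-based split rule; this sketch assumes it hides the least popular item(s) of each basket, and it computes the random-split bHR as an exact expectation over every possible hidden item rather than by sampling. All data and names are hypothetical.

```python
from itertools import combinations

# Hypothetical purchase counts, used both as "popularity" and by the toy recommender.
POPULARITY = {"milk": 90, "bread": 80, "eggs": 50, "saffron": 2, "capers": 1}


def recommend_top_n(observed, n):
    """Toy recommender (assumption): the n most popular items not already observed."""
    candidates = [item for item in POPULARITY if item not in observed]
    candidates.sort(key=lambda item: -POPULARITY[item])
    return candidates[:n]


def is_hit(basket, hidden, n):
    """Binary hit: 1 if any hidden item appears in the top-n list built from the rest."""
    observed = set(basket) - set(hidden)
    return int(any(item in hidden for item in recommend_top_n(observed, n)))


def bhr_popularity_split(baskets, n_hidden=1, n=1):
    """Hide the n_hidden least popular items of each basket (assumed split rule)."""
    hits = 0
    for basket in baskets:
        hidden = sorted(basket, key=lambda item: POPULARITY[item])[:n_hidden]
        hits += is_hit(basket, hidden, n)
    return hits / len(baskets)


def bhr_random_split(baskets, n_hidden=1, n=1):
    """Expected bHR when hidden items are chosen uniformly at random.

    Every possible hidden subset is enumerated, so the result is deterministic.
    """
    total = 0.0
    for basket in baskets:
        splits = list(combinations(sorted(basket), n_hidden))
        total += sum(is_hit(basket, hidden, n) for hidden in splits) / len(splits)
    return total / len(baskets)


BASKETS = [{"milk", "bread", "saffron"}, {"milk", "eggs", "capers"}]
```

On these two baskets the popularity-based split gives 0.0 while the random split gives 0.5: a random split often hides popular items, which are easy to recommend.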
Experiments on data sparsity
Observations:
(1) The two proposed models perform similarly in bHR
(2) The performance difference in wHR becomes more pronounced as data sparsity increases
• attributed to the higher-order similarities introduced by the BSRW scheme
Performance metrics: bHR and wHR (weighted hit rate via leave-one-out split)
Models: CF(bn), CF(bn)+BSRW, CF(cp)+BSRW
Personalized Models
Personalisation can better meet consumers' requirements, and one simple way to achieve it is to re-arrange the ordered recommendation list according to personal preference.
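One simple realization of this re-arrangement, offered only as an illustrative sketch: sort the model's ranked list by the consumer's own past purchase counts, relying on Python's stable sort so that items with equal counts keep the model's original order. The function name and data are hypothetical.

```python
def personalise(recommendations, purchase_history):
    """Re-order a ranked list by how often this consumer bought each item.

    sorted() is stable, so items with equal counts (including never-bought
    items) keep the order produced by the recommendation model.
    """
    return sorted(recommendations, key=lambda item: -purchase_history.get(item, 0))


ranked = ["bread", "eggs", "milk", "tea", "rice"]
history = {"milk": 12, "tea": 3}
print(personalise(ranked, history))  # ['milk', 'tea', 'bread', 'eggs', 'rice']
```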
Summary
• Grocery shopping recommendations
– product preferences are implicit
– repeat purchases are overwhelmingly more frequent than purchases of new products
• Basket-Sensitive Random Walk Model (BSRW)
– derives the product transition probability via network-based similarity instead of normalizing ad hoc metric-based similarities
– online adaptation of recommendations based on the current basket
• Poster in the Tuesday night poster session