
Grocery Shopping Recommendation Based on Basket-Sensitive Random Walk

Unilever Discover (Colworth), UK(1)

University of Manchester, UK(2)

Liverpool John Moores University, UK(3)

Ming Li(1), Ben Dias(1), Wael El-Deredy(2), Ian Jarman(3) and Paulo Lisboa(3)

The 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Paris, 2009

The Problem

• Grocery Shopping Recommendation:

– Grocery shopping is considered a real drudgery

– High repeat purchase rate with low injection of new products

– Implicit product preference feedback

– Recommendation based on current context

• Three Issues in Recommendation Model

– A method to derive user-wise or product-wise similarity

– A method to generate recommendation based on those similarities

– An evaluation strategy to tune the model on retrospective data for optimal live performance

Prior Art

• Item-Item Collaborative Filtering

– memory efficient (e.g. 54GB -> 24GB -> 9MB)

– subject to the sparsity problem

• only exploits direct (i.e. first-order) neighbourhood information

• Movie/Book Recommendation via Random Walk Model

– alleviates the sparsity problem by exploring transitive (i.e. high-order) neighbourhood information: $R^{(n)} = P^{T} R^{(n-1)}$

– Two concerns:

• definition of the transition probability (column-normalized similarity)

• ranking is insensitive to the current context
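The second concern can be illustrated with a minimal sketch of the plain random walk $R^{(n)} = P^{T} R^{(n-1)}$. The 3-product matrix below is hypothetical, and the orientation of `P` (entry `P[i][j]` is the probability of stepping from product i to product j, so rows sum to 1) is an assumption, since the slides do not spell it out.

```python
def random_walk(P, R0, steps):
    """Apply R(n) = P^T R(n-1) for a fixed number of steps."""
    n = len(P)
    R = list(R0)
    for _ in range(steps):
        # (P^T R)[j] = sum over i of P[i][j] * R[i]
        R = [sum(P[i][j] * R[i] for i in range(n)) for j in range(n)]
    return R


# Hypothetical transition matrix (illustration only, not from the talk).
# ASSUMPTION: P[i][j] = probability of moving from product i to product j.
P = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]

# Two different starting contexts: the walk starts at product 0 or product 1.
from_p0 = random_walk(P, [1.0, 0.0, 0.0], steps=50)
from_p1 = random_walk(P, [0.0, 1.0, 0.0], steps=50)
```

For this matrix both runs end at practically the same vector, so the final ranking no longer depends on where the walk started. This is the sense in which the plain walk is insensitive to the current context.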

Proposed: Basket-Sensitive Random Walk

• Define the product transition probability via a bipartite network:

– i.e. first–order similarity

– α penalization of consumers (products) with too many transactions

– Subject to data sparsity

• Enforce similarity from higher-order information but with reference to a user basket U: $R_{basket} = d P^{T} R_{basket} + (1-d) U_{basket}$

– introduce context information by $U_{basket}$ (i.e. personalization vector)

– 1-d controls the bias between current and past baskets

[Figure: bipartite network linking consumers (c1, c2, c3, c4) to products (p1, p2, p3); edges are weighted by #purchase, written f(1,1), f(4,2), ..., f(*,*)]
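The two steps on this slide can be sketched as follows. The slides do not give the similarity formula, so the α penalty below is only one plausible form and should be read as an assumption: each co-purchase weight is divided by (consumer total)^α and (destination product total)^α, then rows are renormalised. The purchase counts `f` and the function names are hypothetical.

```python
def transition_matrix(f, alpha):
    """Product-to-product transitions from the bipartite purchase network.

    f[c][p] is the number of times consumer c bought product p.
    ASSUMPTION (not from the slides): the weight of i -> j sums
    f[c][i] * f[c][j] over consumers, penalised by the consumer's and the
    destination product's transaction totals raised to alpha.
    """
    n_c, n_p = len(f), len(f[0])
    cons_total = [sum(row) for row in f]
    prod_total = [sum(f[c][p] for c in range(n_c)) for p in range(n_p)]
    P = []
    for i in range(n_p):
        row = [
            sum(
                f[c][i] * f[c][j] / (cons_total[c] ** alpha * prod_total[j] ** alpha)
                for c in range(n_c)
            )
            for j in range(n_p)
        ]
        total = sum(row)
        P.append([w / total for w in row])  # each row sums to 1
    return P


def basket_walk(P, basket, d, iters=100):
    """Iterate R = d * P^T R + (1 - d) * U, with U uniform over the basket."""
    n = len(P)
    U = [1.0 / len(basket) if p in basket else 0.0 for p in range(n)]
    R = list(U)
    for _ in range(iters):
        R = [d * sum(P[i][j] * R[i] for i in range(n)) + (1 - d) * U[j]
             for j in range(n)]
    return R


# Hypothetical counts for 4 consumers (rows) and 3 products (columns).
f = [[2, 1, 0], [0, 3, 1], [1, 0, 0], [0, 2, 2]]
P = transition_matrix(f, alpha=0.5)
R = basket_walk(P, basket={0}, d=0.5)
# Rank only products that are not already in the basket.
recommended = sorted((p for p in range(len(P)) if p not in {0}),
                     key=lambda p: -R[p])
```

Setting `d=0.0` returns the basket vector itself, while larger `d` lets the higher-order neighbourhood information dominate.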


Proposed: continued

• Basket-Sensitive Random Walk Model: $R_{basket} = d P^{T} R_{basket} + (1-d) U_{basket}$

– Straightforward implementation is infeasible

• Quick approximation of $R_{basket}$ by $\hat{R}_{basket} = \sum_{p_i \in basket} R_{item}^{i}$

– $R_{item}^{i}$: offline calculation (also called 'random walk with restart')

– $\hat{R}_{basket}$ leads to the same ordered list of recommendations
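A minimal sketch of why the approximation preserves the ranking: the recursion $R = d P^{T} R + (1-d) U$ is linear in $U$, so the vector for a basket is a scaled sum of the per-item restart vectors. The sketch assumes that $U_{basket}$ puts equal weight on each basket item and uses a hypothetical 4-product matrix `P` with rows summing to 1.

```python
def rwr(P, U, d, iters=200):
    """Random walk with restart: iterate R = d * P^T R + (1 - d) * U."""
    n = len(P)
    R = list(U)
    for _ in range(iters):
        R = [d * sum(P[i][j] * R[i] for i in range(n)) + (1 - d) * U[j]
             for j in range(n)]
    return R


# Hypothetical row-stochastic transition matrix (illustration only).
P = [
    [0.0, 0.6, 0.3, 0.1],
    [0.5, 0.0, 0.2, 0.3],
    [0.2, 0.7, 0.0, 0.1],
    [0.4, 0.4, 0.2, 0.0],
]
n, d = len(P), 0.7

# Offline: one restart vector per product, computed once.
R_item = [rwr(P, [1.0 if p == i else 0.0 for p in range(n)], d)
          for i in range(n)]


def approx_scores(basket):
    """Online: sum the precomputed vectors of the items in the basket."""
    return [sum(R_item[i][j] for i in basket) for j in range(n)]


def ranking(scores, basket):
    """Products outside the basket, best score first."""
    return sorted((p for p in range(n) if p not in basket),
                  key=lambda p: -scores[p])


basket = [0, 2]
exact = rwr(P, [0.5, 0.0, 0.5, 0.0], d)  # full walk for this basket
approx = approx_scores(basket)           # sum of offline item vectors
```

Here `approx` equals `exact` multiplied by the basket size, so `ranking(exact, basket)` and `ranking(approx, basket)` return the same list while the online step costs only one vector sum per basket item.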


Performance Metric

[Figure: Characteristics of performance metrics. Axes: basket oriented vs. product oriented; bias toward least popular products vs. bias toward most popular products. Metrics shown:]

– binary Hit Rate via popularity-based split

– macro-averaged Hit Rate via leave-one-out split

– micro-averaged Hit Rate via leave-one-out split

– weighted Hit Rate via leave-one-out split
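The slides name these metrics without defining them, so the following sketch rests on assumed definitions: in a leave-one-out split each basket item is hidden in turn and counts as a hit if it appears in the top-k recommendations for the remaining items; the micro average pools all trials, while the macro average first computes a hit rate per product and then averages over products. The baskets and the popularity recommender are hypothetical.

```python
from collections import Counter


def leave_one_out_hit_rates(baskets, recommend, k=1):
    """Return (micro, macro) hit rates under the assumed definitions."""
    hits, trials = Counter(), Counter()
    for basket in baskets:
        for target in basket:
            rest = [p for p in basket if p != target]  # hide one item
            trials[target] += 1
            if target in recommend(rest, k):
                hits[target] += 1
    micro = sum(hits.values()) / sum(trials.values())  # pooled over trials
    macro = sum(hits[p] / trials[p] for p in trials) / len(trials)  # per product
    return micro, macro


# Hypothetical baskets of product ids; each id occurs once per basket.
baskets = [[0, 1], [0, 1], [0, 2], [0, 1, 3]]
popularity = Counter(p for b in baskets for p in b)


def most_popular(rest, k):
    """Baseline: recommend the k most popular products not in the basket."""
    ranked = sorted(popularity, key=lambda p: (-popularity[p], p))
    return [p for p in ranked if p not in rest][:k]


micro, macro = leave_one_out_hit_rates(baskets, most_popular)
```

On this toy data the popularity baseline scores a micro average of 7/9 but a macro average of only 0.5, because it never recovers the two rare products. Under these assumed definitions the macro average gives the least popular products more influence.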


Experiments

• Three real grocery data sets

– One from the collaborator

• online grocery store www.Leshop.ch

– Two from other published research works

• membership retailer warehouse (Chun-Nan Hsu et al., JML04)

• anonymized retail store (Tom Brijs et al., KDD99)

• Several performance metrics with different characteristics

• Experiments

– Impact of model parameters

– Comparison with other models

– Data sparsity

– Personalization



Impact of model parameters: α and d

Performance metric: left: bHR(pop); right: macroHR

Observation: bHR and macroHR are inconsistent on some data sets (macroHR has a stronger bias toward the least popular products)

Model parameters:

– α: a bigger value indicates stronger penalization of products with too many transactions

– 1-d: bias toward the current basket


Comparison with other models

Performance metric: bHR(pop) and bHR(rnd): binary hit rate using a popularity-based split and a random split

Observations:

(1) Empirical advantage of network-based similarity over other metric-based ones

(2) Performance overestimation by the random basket split

• an appropriate performance metric needs to be determined by business rules, e.g. bHR(pop) for grocery products and bHR(rnd) for movies


Experiments on data sparsity

Observations:

(1) The two proposed models have similar performance in bHR

(2) The performance difference in wHR is more pronounced with increased data sparsity

• attributed to the high-order similarities introduced by the BSRW scheme

Performance metric: bHR and wHR (weighted hit rate via leave-one-out)

Models: CF(bn), CF(bn)+BSRW, CF(cp)+BSRW


Personalized Models


Personalisation can better meet the consumer’s requirements; one simple way to achieve it is to re-arrange the ordered recommendation list according to personal preference
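One way to read this re-arrangement step, as a hedged sketch: take the model's ordered list and move products the consumer has bought more often toward the front, keeping the model's order among ties. Using past purchase counts as the measure of personal preference is an assumption, and the function name and data are hypothetical.

```python
def rerank(recommended, purchase_counts):
    """Re-order a recommendation list by the consumer's own purchase counts.

    ASSUMPTION: personal preference is approximated by how often the
    consumer bought each product. sorted() is stable, so products with
    equal counts keep the order given by the recommendation model.
    """
    return sorted(recommended, key=lambda item: -purchase_counts.get(item, 0))


model_list = ["milk", "tea", "bread", "jam"]  # hypothetical model output
history = {"bread": 5, "tea": 2}              # hypothetical purchase counts
personalised = rerank(model_list, history)
```

With this history, "bread" and "tea" move ahead of "milk" and "jam", which keep their original relative order.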


Summary

• Grocery shopping recommendations

– product preferences are implicit

– repeated purchases are overwhelmingly more frequent than purchases of new products

• Basket-Sensitive Random Walk Model (BSRW)

– Derives product transition probability via network-based similarity instead of normalizing ad-hoc metric-based ones

– On-line adaptation of recommendation based on current basket

• Poster in the Tuesday night poster session


Acknowledgements

• Dominique Locher and his team at LeShop

– www.LeShop.ch: the No.1 e-grocer of Switzerland since 1998

– Good Friends, Collaborators and Data Providers

• All the other data providers and anonymous reviewers

