Hierarchical Exploration for Accelerating Contextual Bandits
Yisong Yue, Sue Ann Hong and Carlos Guestrin
Personalized Recommender Systems
• Every day, a user visits a news portal
• We wish to personalize to her preferences
• Can only learn from feedback
• E.g., user clicks on or “likes” an article
• Leads to an exploration vs. exploitation dilemma
• Goal is to satisfy the user
• Must make exploratory recommendations to learn the user’s preferences
• Formalized as a contextual bandit problem
Linear Stochastic Bandit Problem
Balancing Exploration vs. Exploitation
CoFineUCB: Coarse-to-Fine Hierarchical Exploration
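Feature Hierarchies
• Suppose “stereotypical users” span a K-dimensional space
• E.g., “European vs. Asian news”
• Let $U$ be a $D \times K$ matrix
• Define the projection of articles into the subspace: $\tilde{x} = U^\top x$
• Define the representation of the user profile: $w = U\tilde{w} + w_\perp$
• Thus: $w^\top x = \tilde{w}^\top \tilde{x} + w_\perp^\top x$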
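The decomposition can be checked numerically. This is a minimal sketch that assumes $U$ has orthonormal columns. It reads the profile split as $\tilde w = U^\top w$ with residual $w_\perp = w - U\tilde w$, so that $w^\top x = \tilde w^\top \tilde x + w_\perp^\top x$ with $\tilde x = U^\top x$. The helper names (`project`, `lift`, `decompose`) and all numbers are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # D rows x K columns


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def project(U: Matrix, x: Vector) -> Vector:
    """x~ = U^T x: coordinates of x in the K-dimensional subspace."""
    return [sum(U[d][k] * x[d] for d in range(len(x))) for k in range(len(U[0]))]


def lift(U: Matrix, v: Vector) -> Vector:
    """U v: map K subspace coordinates back to the D-dimensional feature space."""
    return [sum(U[d][k] * v[k] for k in range(len(v))) for d in range(len(U))]


def decompose(U: Matrix, w: Vector) -> Tuple[Vector, Vector]:
    """Split w into (w~, w_perp) with w = U w~ + w_perp."""
    w_tilde = project(U, w)
    w_perp = [wi - ui for wi, ui in zip(w, lift(U, w_tilde))]
    return w_tilde, w_perp


# Hypothetical numbers: D = 3 features, K = 1 "stereotypical" direction.
U = [[0.6], [0.8], [0.0]]
w = [1.0, 2.0, 0.5]
x = [0.5, 0.25, 1.0]

w_tilde, w_perp = decompose(U, w)
x_tilde = project(U, x)
full_score = dot(w, x)
split_score = dot(w_tilde, x_tilde) + dot(w_perp, x)
print(full_score, split_score)  # equal up to floating-point rounding
```

Because $w_\perp$ is orthogonal to the columns of $U$, the subspace term carries the whole score whenever an article lies inside the subspace.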
News Recommender Simulations & User Study
• Two-tiered exploration:
• First in the subspace
• Then in the full space
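The two tiers can be sketched as follows. This is a simplified illustration, not the exact CoFineUCB update, and rests on several assumptions. A ridge-regression estimate of $\tilde w$ is fitted on projected features $U^\top x$. A second ridge estimate of a residual $w_\perp$ is fitted in the full space to whatever the subspace model leaves unexplained. The exploration bonus adds a subspace confidence width to a smaller full-space width. The class `TwoTierUCB` and the parameters `lam_sub`, `lam_full`, `alpha_sub`, `alpha_full` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(A: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in A]


def scaled_identity(n: int, scale: float) -> Matrix:
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]


def sherman_morrison(A_inv: Matrix, x: Vector) -> Matrix:
    """Return (A + x x^T)^{-1} given a symmetric A^{-1} (rank-one update)."""
    Ax = matvec(A_inv, x)
    denom = 1.0 + dot(x, Ax)
    n = len(x)
    return [[A_inv[i][j] - Ax[i] * Ax[j] / denom for j in range(n)] for i in range(n)]


class TwoTierUCB:
    """Hypothetical two-tier learner: subspace model plus full-space residual."""

    def __init__(self, U: Matrix, lam_sub: float = 1.0, lam_full: float = 10.0,
                 alpha_sub: float = 1.0, alpha_full: float = 0.1):
        self.U = U
        D, K = len(U), len(U[0])
        self.sub_inv = scaled_identity(K, 1.0 / lam_sub)    # (lam_sub I + sum x~ x~^T)^-1
        self.full_inv = scaled_identity(D, 1.0 / lam_full)  # (lam_full I + sum x x^T)^-1
        self.b_sub = [0.0] * K
        self.history: List[Tuple[Vector, float]] = []
        self.alpha_sub, self.alpha_full = alpha_sub, alpha_full

    def project(self, x: Vector) -> Vector:
        """x~ = U^T x."""
        return [sum(self.U[d][k] * x[d] for d in range(len(x))) for k in range(len(self.U[0]))]

    def estimates(self) -> Tuple[Vector, Vector]:
        """Tier 1: ridge fit of w~ on x~. Tier 2: ridge fit of w_perp on residuals."""
        w_tilde = matvec(self.sub_inv, self.b_sub)
        b_full = [0.0] * len(self.full_inv)
        for x, y in self.history:
            r = y - dot(w_tilde, self.project(x))
            b_full = [bi + r * xi for bi, xi in zip(b_full, x)]
        return w_tilde, matvec(self.full_inv, b_full)

    def select(self, actions: List[Vector]) -> int:
        w_tilde, w_perp = self.estimates()

        def ucb(x: Vector) -> float:
            xt = self.project(x)
            mean = dot(w_tilde, xt) + dot(w_perp, x)
            width_sub = math.sqrt(dot(xt, matvec(self.sub_inv, xt)))
            width_full = math.sqrt(dot(x, matvec(self.full_inv, x)))
            return mean + self.alpha_sub * width_sub + self.alpha_full * width_full

        return max(range(len(actions)), key=lambda i: ucb(actions[i]))

    def update(self, x: Vector, y: float) -> None:
        xt = self.project(x)
        self.sub_inv = sherman_morrison(self.sub_inv, xt)
        self.b_sub = [bi + y * xi for bi, xi in zip(self.b_sub, xt)]
        self.full_inv = sherman_morrison(self.full_inv, x)
        self.history.append((x, y))


# D = 2 features, K = 1 subspace direction; this user lies mostly in the subspace.
U = [[1.0], [0.0]]
w_star = [1.0, 0.2]
articles = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
agent = TwoTierUCB(U)
picks = []
for _ in range(30):
    i = agent.select(articles)
    picks.append(i)
    agent.update(articles[i], dot(w_star, articles[i]))
print(picks.count(0))  # 30
```

Here the user's profile lies mostly in the subspace, so the subspace tier does most of the work, and the small `alpha_full` keeps full-space exploration modest.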
Theorem: with probability 1 − δ, the average regret is bounded by
Comparison              Win / Tie / Loss   Gain / Day
CoFineUCB vs. Naïve     24 / 1 / 3         0.69
CoFineUCB vs. Reshaped  21 / 3 / 6         0.27
[Figure: Mean Estimate by Topic + Uncertainty of Estimate]
• At each iteration $t$:
• Set of available actions $X_t = \{x_{t,1}, \dots, x_{t,n}\}$ (available articles)
• Algorithm chooses action $x_t$ from $X_t$ (recommends an article)
• User provides feedback $\hat{y}_t$ (user clicks on or “likes” the article)
• Algorithm incorporates feedback
• Assumption: $\mathbb{E}[\hat{y}_t] = w_*^\top x_t$ ($w_*$ is unknown to the system)
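• Regret: $R_T(w_*) = \sum_{t=1}^{T} \left( w_*^\top x_t^* - w_*^\top x_t \right)$, where $x_t^* = \arg\max_{x \in X_t} w_*^\top x$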
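When $w_*$ is known, as in a simulation, the interaction protocol and the regret can be computed directly. This is a minimal sketch: `run_bandit`, the `select`/`update` callables, and the optional Gaussian noise model are illustrative assumptions. Regret is measured against $w_*^\top x$ rather than the noisy feedback.

```python
import random
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def run_bandit(
    w_star: Vector,
    action_sets: List[List[Vector]],
    select: Callable[[List[Vector]], int],
    update: Callable[[Vector, float], None],
    noise: float = 0.0,
    seed: int = 0,
) -> float:
    """Simulate T rounds and return the cumulative regret R_T(w_star)."""
    rng = random.Random(seed)
    regret = 0.0
    for X_t in action_sets:
        x_t = X_t[select(X_t)]                   # algorithm picks an article
        y_hat = dot(w_star, x_t)                 # E[y_hat] = w_star^T x_t
        if noise > 0:
            y_hat += rng.gauss(0.0, noise)       # hypothetical Gaussian noise
        update(x_t, y_hat)                       # algorithm incorporates feedback
        best = max(dot(w_star, x) for x in X_t)  # w_star^T x_t^*
        regret += best - dot(w_star, x_t)
    return regret


def oracle_select(X: List[Vector]) -> int:
    """Knows w_star, so it always picks the best article (zero regret)."""
    return max(range(len(X)), key=lambda i: dot(w_star, X[i]))


# Two articles per round; this user only cares about the second feature.
w_star = [0.0, 1.0]
rounds = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(5)]

always_first = run_bandit(w_star, rounds, lambda X: 0, lambda x, y: None)
oracle = run_bandit(w_star, rounds, oracle_select, lambda x, y: None)
print(always_first, oracle)  # 5.0 0.0
```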
• At each iteration: recommend the article with the highest estimated gain plus uncertainty
• In the example below, the article on economy is selected
[Figure: Estimated Gain + Uncertainty per article, giving the “Upper Confidence Bound”]
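The selection rule can be sketched with a LinUCB-style learner (the poster lists Naïve LinUCB as a baseline). The sketch assumes a ridge-regression estimate with regularizer `lam` and an uncertainty bonus `alpha * sqrt(x^T A^{-1} x)`. It maintains `A^{-1}` with Sherman-Morrison rank-one updates. The class and parameter names are hypothetical, and the one-hot topic features only mimic the economy example.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class LinUCB:
    """Ridge-regression estimate plus an uncertainty bonus (LinUCB-style sketch)."""

    def __init__(self, dim: int, lam: float = 1.0, alpha: float = 1.0):
        # A^{-1} with A = lam * I initially; b accumulates y * x.
        self.A_inv = [[1.0 / lam if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim
        self.alpha = alpha

    def _A_inv_times(self, v: Vector) -> Vector:
        return [dot(row, v) for row in self.A_inv]

    def scores(self, actions: List[Vector]) -> List[Tuple[float, float]]:
        """(estimated gain, uncertainty) for each available article."""
        w = self._A_inv_times(self.b)
        return [(dot(w, x), self.alpha * math.sqrt(dot(x, self._A_inv_times(x))))
                for x in actions]

    def select(self, actions: List[Vector]) -> int:
        """Pick the article with the largest upper confidence bound."""
        s = self.scores(actions)
        return max(range(len(actions)), key=lambda i: s[i][0] + s[i][1])

    def update(self, x: Vector, y: float) -> None:
        """Add feedback (x, y): Sherman-Morrison update of A^{-1}, then b += y x."""
        Ax = self._A_inv_times(x)
        denom = 1.0 + dot(x, Ax)
        n = len(x)
        self.A_inv = [[self.A_inv[i][j] - Ax[i] * Ax[j] / denom for j in range(n)]
                      for i in range(n)]
        self.b = [bi + y * xi for bi, xi in zip(self.b, x)]


# Hypothetical one-hot topic features: [economy, sports, politics].
economy, sports, politics = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
agent = LinUCB(dim=3)
agent.update(economy, 1.0)      # one "like" on economy
for _ in range(3):
    agent.update(sports, 0.6)   # sports seen often, lukewarm feedback
articles = [economy, sports, politics]
print(agent.select(articles))   # 0 -> economy
```

Economy wins with estimated gain 0.5 plus uncertainty of about 0.71. Sports has a similar gain (0.45) but less remaining uncertainty (0.5). Politics is unexplored (0 + 1.0).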
Constructing Feature Hierarchies Using Prior Knowledge
• Given an empirical sample of learned user profiles W
• Can also be used to reshape the full space (use LearnU(W,D))
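The poster does not spell out LearnU. This sketch assumes a common choice: take the top-K left singular vectors of the D×N profile matrix W, which are the top eigenvectors of $W W^\top$. It finds them by power iteration with Gram-Schmidt against the directions already found. `learn_u`, `orthonormalize`, and the sample profiles are hypothetical. Passing K = D returns a full orthonormal basis ordered by how much the profiles vary along each direction. How LearnU(W,D) then uses such a basis to reshape the full space is not modeled here.

```python
import math
import random
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(v: Vector, basis: List[Vector]) -> Optional[Vector]:
    """Remove components along `basis` (Gram-Schmidt) and rescale to unit length."""
    for c in basis:
        p = dot(v, c)
        v = [vi - p * ci for vi, ci in zip(v, c)]
    norm = math.sqrt(dot(v, v))
    return [vi / norm for vi in v] if norm > 1e-12 else None


def learn_u(profiles: List[Vector], K: int, iters: int = 200, seed: int = 0) -> Matrix:
    """Hypothetical LearnU: top-K eigenvectors of sum_i w_i w_i^T as a D x K matrix."""
    D = len(profiles[0])
    # C = W W^T, where the columns of W are the learned profiles w_i.
    C = [[sum(w[i] * w[j] for w in profiles) for j in range(D)] for i in range(D)]
    rng = random.Random(seed)
    columns: List[Vector] = []
    for _ in range(K):
        v = orthonormalize([rng.gauss(0.0, 1.0) for _ in range(D)], columns)
        for _ in range(iters):
            nxt = orthonormalize([dot(row, v) for row in C], columns)  # power step
            if nxt is None:  # no remaining energy outside the directions found so far
                break
            v = nxt
        columns.append(v)
    return [[columns[k][d] for k in range(K)] for d in range(D)]


# Hypothetical profiles that vary mostly along feature 0, less along feature 1.
W = [[2.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
U = learn_u(W, K=2)
print([[round(abs(u), 3) for u in row] for row in U])  # [[1.0, 0.0], [0.0, 1.0]]
```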
[Figure labels: “Atypical Users”, Naïve LinUCB, Reshaped Full Space, “All Users”, Coarse-to-Fine Approach, Subspace]
• Leave-one-out simulation validation
• Compared against hierarchy-free baselines
• CoFineUCB combines the efficiency of Subspace Learning with the flexibility of Full Space Learning
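The leave-one-out protocol can be outlined as follows. Each user profile is held out in turn as the ground truth, and only the remaining profiles may be used to build the hierarchy. This is a minimal sketch: `leave_one_out` and the toy `cosine_to_mean` evaluator are hypothetical. The evaluator measures cosine similarity to the mean of the other profiles and stands in for learning U and running a full bandit simulation.

```python
import math
from typing import Callable, List, TypeVar

Vector = List[float]
R = TypeVar("R")


def leave_one_out(profiles: List[Vector],
                  evaluate: Callable[[Vector, List[Vector]], R]) -> List[R]:
    """Hold out each profile in turn; evaluate may only use the other profiles."""
    results = []
    for i, held_out in enumerate(profiles):
        rest = profiles[:i] + profiles[i + 1:]
        results.append(evaluate(held_out, rest))
    return results


def cosine_to_mean(held_out: Vector, rest: List[Vector]) -> float:
    """Toy stand-in for 'learn a hierarchy from rest, then simulate on held_out'."""
    mean = [sum(col) / len(rest) for col in zip(*rest)]
    num = sum(a * b for a, b in zip(held_out, mean))
    den = math.sqrt(sum(a * a for a in held_out)) * math.sqrt(sum(b * b for b in mean))
    return num / den


profiles = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
scores = leave_one_out(profiles, cosine_to_mean)
print([round(s, 3) for s in scores])  # [0.707, 0.707, 0.0]
```

The atypical third user gets no help from the others' average, which is the situation where full-space exploration is still needed.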
• Live User Study
• Showed real users real articles
• 10 articles/day, 10 days
• Counted #likes
• If the user’s profile lies mostly in the subspace, then it suffices to learn primarily in the subspace
• The K-dimensional space is much more efficient to explore
• Explore the full space as needed