
Hierarchical Exploration for Accelerating Contextual Bandits

Yisong Yue, Sue Ann Hong and Carlos Guestrin

Personalized Recommender Systems
• Every day, a user visits a news portal
• We wish to personalize recommendations to her preferences

• We can only learn from feedback
• E.g., the user clicks on or “likes” an article

• This leads to an exploration vs. exploitation dilemma
• The goal is to satisfy the user
• But the system must make exploratory recommendations to learn the user’s preferences
• Formalized as a contextual bandit problem

Linear Stochastic Bandit Problem

Balancing Exploration vs. Exploitation

CoFineUCB: Coarse-to-Fine Hierarchical Exploration

Feature Hierarchies
• Suppose “stereotypical users” span a K-dimensional subspace
• E.g., “European vs. Asian news”

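• Let U be a D × K matrix whose columns span this subspace (D is the dimension of the article features)

• Define the projection of articles into the subspace: $\tilde{x} = U^\top x$

• Define the representation of the user profile: $w^* = U\tilde{w}^* + w^*_\perp$

• Thus: $w^{*\top} x = \tilde{w}^{*\top} \tilde{x} + w_\perp^{*\top} x$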
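Assuming the profile decomposes as $w^* = U\tilde{w}^* + w^*_\perp$ and articles project as $\tilde{x} = U^\top x$, the expected reward splits into a subspace term plus a residual term, $w^{*\top}x = \tilde{w}^{*\top}\tilde{x} + w^{*\top}_\perp x$. The sketch below checks this identity numerically. It is a minimal illustration only: the orthonormal U, the sizes D = 6 and K = 2, and the random vectors are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 6, 2  # hypothetical sizes: D article features, K-dimensional subspace

# Assumed U: a D x K matrix with orthonormal columns (QR of a random matrix).
U, _ = np.linalg.qr(rng.normal(size=(D, K)))

x = rng.normal(size=D)      # an article's feature vector
x_tilde = U.T @ x           # projection of the article into the subspace

w_tilde = rng.normal(size=K)            # user's coordinates in the subspace
r = rng.normal(size=D)
w_perp = 0.1 * (r - U @ (U.T @ r))      # small residual orthogonal to span(U)
w_star = U @ w_tilde + w_perp           # full user profile

lhs = w_star @ x                         # expected reward in the full space
rhs = w_tilde @ x_tilde + w_perp @ x     # subspace term + residual term
print(np.isclose(lhs, rhs))              # True
```

The identity holds because $(U\tilde{w}^*)^\top x = \tilde{w}^{*\top} U^\top x$; when the residual $w^*_\perp$ is small, almost all of the reward is explained by the K subspace coordinates.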

News Recommender Simulations & User Study

• Two-tiered exploration:
  • First in the subspace
  • Then in the full space
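The two tiers can be illustrated with a simplified scoring rule. This is a hypothetical sketch, not the exact CoFineUCB estimator or its confidence terms: it fits ridge regression on the K-dimensional projections, fits a full-space ridge estimate shrunk toward that subspace solution, and adds one exploration bonus per tier. The helper `two_tier_scores` and the constants `lam`, `lam_sub`, `alpha`, `alpha_sub` are assumptions.

```python
import numpy as np

def two_tier_scores(U, X_hist, y_hist, candidates,
                    lam=1.0, lam_sub=1.0, alpha=0.5, alpha_sub=1.0):
    """Hypothetical coarse-to-fine UCB scores (simplified; not the exact CoFineUCB).

    U: D x K subspace matrix; X_hist: n x D past articles; y_hist: n feedback
    values; candidates: m x D available articles. Constants are assumptions.
    """
    D, K = U.shape
    # Tier 1: ridge regression on the K-dimensional projections.
    Xs = X_hist @ U
    M_sub = Xs.T @ Xs + lam_sub * np.eye(K)
    w_sub = np.linalg.solve(M_sub, Xs.T @ y_hist)
    # Tier 2: full-space ridge regression shrunk toward the subspace fit U @ w_sub,
    # i.e. argmin_w sum (w.x - y)^2 + lam * ||w - U w_sub||^2.
    M = X_hist.T @ X_hist + lam * np.eye(D)
    w = np.linalg.solve(M, X_hist.T @ y_hist + lam * (U @ w_sub))
    # One exploration bonus per tier: Mahalanobis widths sqrt(v^T M^-1 v).
    C_sub = candidates @ U
    width_sub = np.sqrt(np.einsum("ik,kj,ij->i", C_sub, np.linalg.inv(M_sub), C_sub))
    width = np.sqrt(np.einsum("id,de,ie->i", candidates, np.linalg.inv(M), candidates))
    return candidates @ w + alpha_sub * width_sub + alpha * width


rng = np.random.default_rng(0)
D, K = 8, 2
U, _ = np.linalg.qr(rng.normal(size=(D, K)))    # assumed orthonormal subspace
X_hist = rng.normal(size=(5, D))                # five past recommendations
y_hist = X_hist @ (U @ np.array([1.0, -0.5]))   # noiseless feedback, user in subspace
candidates = rng.normal(size=(4, D))            # four available articles
scores = two_tier_scores(U, X_hist, y_hist, candidates)
choice = int(np.argmax(scores))                 # recommend the top-scoring article
```

Shrinking the full-space estimate toward $U\tilde{w}$ lets the cheap subspace fit dominate early, while the full-space term can still correct it as feedback accumulates.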

Theorem: with probability 1 − δ, the average regret is bounded by:

Comparison               Win / Tie / Loss   Gain / Day
CoFineUCB vs. Naïve      24 / 1 / 3         0.69
CoFineUCB vs. Reshaped   21 / 3 / 6         0.27

[Figure: “Mean Estimate by Topic” + “Uncertainty of Estimate”]

• At each iteration t:
• Set of available actions $X_t = \{x_{t,1}, \ldots, x_{t,n}\}$ (available articles)

• Algorithm chooses action $x_t$ from $X_t$ (recommends an article)

• User provides feedback $\hat{y}_t$ (clicks on or “likes” the article)
• Algorithm incorporates the feedback

• Assumption: $\mathbb{E}[\hat{y}_t] = w^{*\top} x_t$ ($w^*$ is unknown to the system)

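• Regret: $R_T = \sum_{t=1}^{T} \left( w^{*\top} x_t^* - w^{*\top} x_t \right)$, where $x_t^* = \arg\max_{x \in X_t} w^{*\top} x$ is the best available article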
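The protocol and regret above can be simulated in a few lines. This is a minimal sketch, assuming random Gaussian article features and Gaussian feedback noise (the poster fixes only the expected feedback); `run_protocol` and the `choose(X_t, history)` policy interface are hypothetical.

```python
import numpy as np

def run_protocol(choose, w_star, T=100, n=10, noise=0.1, seed=0):
    """Simulate T rounds of the linear stochastic bandit protocol.

    `choose(X_t, history)` is a hypothetical policy returning an index into X_t.
    Feedback is w_star @ x_t plus Gaussian noise (an assumption; only the
    expectation is specified). Returns the cumulative regret.
    """
    rng = np.random.default_rng(seed)
    D = w_star.shape[0]
    history, regret = [], 0.0
    for _ in range(T):
        X_t = rng.normal(size=(n, D))                  # available articles
        x_t = X_t[choose(X_t, history)]                # recommended article
        y_hat = w_star @ x_t + noise * rng.normal()    # user feedback
        history.append((x_t, y_hat))                   # incorporate feedback
        regret += np.max(X_t @ w_star) - w_star @ x_t  # best minus chosen
    return regret


w_star = np.array([1.0, -2.0, 0.5])  # hypothetical user profile, unknown to policies
policy_rng = np.random.default_rng(1)

def oracle(X, history):
    """Knows w_star, so it always picks the best article (zero regret)."""
    return int(np.argmax(X @ w_star))

def uniform(X, history):
    """Pure exploration: picks an article uniformly at random."""
    return int(policy_rng.integers(len(X)))

print(run_protocol(oracle, w_star), run_protocol(uniform, w_star))
```

The two reference policies bracket the dilemma: the oracle never explores and has zero regret only because it already knows $w^*$, while uniform exploration learns nothing useful from its picks and accumulates regret linearly.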

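• At each iteration: recommend the article with the highest upper confidence bound, i.e., estimated gain $w_t^\top x$ plus uncertainty $c_t(x)$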

• In the example below, the algorithm selects the article on the economy.
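A LinUCB-style selection step scores each available article by its estimated gain plus a scaled uncertainty and picks the maximum. This is a minimal sketch with a fixed bonus weight `alpha` and ridge parameter `lam` (assumptions, not theoretically tuned confidence widths); `linucb_choose` and the two-dimensional toy articles are hypothetical.

```python
import numpy as np

def linucb_choose(X_t, history, lam=1.0, alpha=1.0):
    """Return the index of the article with the highest upper confidence bound.

    Minimal LinUCB-style sketch: `lam` (ridge) and `alpha` (bonus weight) are
    assumed constants, not theoretically tuned confidence widths.
    """
    D = X_t.shape[1]
    M = lam * np.eye(D)          # regularized design matrix
    b = np.zeros(D)
    for x, y in history:         # history holds (article, feedback) pairs
        M += np.outer(x, x)
        b += y * x
    w = np.linalg.solve(M, b)    # current estimate of the user profile
    gain = X_t @ w               # estimated gain per article
    uncertainty = np.sqrt(np.einsum("nd,de,ne->n", X_t, np.linalg.inv(M), X_t))
    return int(np.argmax(gain + alpha * uncertainty))


articles = np.array([[1.0, 0.0],    # hypothetical article, topic A
                     [0.0, 2.0]])   # hypothetical article, topic B
print(linucb_choose(articles, []))                                  # 1
print(linucb_choose(articles, [(np.array([1.0, 0.0]), 5.0)] * 20))  # 0
```

With no feedback, every estimated gain is zero and the more uncertain article wins; after repeated positive feedback on topic A, its estimated gain outweighs topic B's remaining uncertainty.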

[Figure: “Estimated Gain” + “Uncertainty” = “Upper Confidence Bound”]

Constructing Feature Hierarchies Using Prior Knowledge

• Given an empirical sample of learned user profiles W, learn the subspace U from W
• Can also be used to reshape the full space (use LearnU(W, D))
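One plausible construction of LearnU, labeled here as an assumption rather than the authors' definition, is a truncated SVD of the profile matrix W: keep the top-K left singular vectors, each scaled by its singular value. The function `learn_u`, the scaling by $1/\sqrt{N}$, and the synthetic profiles are hypothetical. With K = D the same routine returns a D × D matrix that stretches the full space along the directions where learned profiles vary most, which is one way to read “reshape the full space”.

```python
import numpy as np

def learn_u(W, K):
    """Hypothetical LearnU: build a D x K matrix from learned user profiles.

    W is D x N with one learned profile per column. Assumed construction: the
    top-K left singular vectors of W, each scaled by its singular value over
    sqrt(N), so directions in which profiles vary most get the largest scale.
    """
    D, N = W.shape
    if K > min(D, N):
        raise ValueError("need K <= min(D, N)")
    A, s, _ = np.linalg.svd(W, full_matrices=False)
    return A[:, :K] * (s[:K] / np.sqrt(N))


rng = np.random.default_rng(0)
D, N = 10, 200
B, _ = np.linalg.qr(rng.normal(size=(D, 2)))   # hidden 2-D "stereotype" basis
W = B @ rng.normal(size=(2, N)) + 0.01 * rng.normal(size=(D, N))
U_sub = learn_u(W, 2)    # coarse subspace for two-tiered exploration
U_full = learn_u(W, D)   # D x D matrix that reshapes the full space
```

When the sampled profiles really do cluster near a low-dimensional subspace, the learned `U_sub` recovers that subspace up to small noise.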

[Figure: panels “Naïve LinUCB”, “Reshaped Full Space”, “Subspace”, and “Coarse-to-Fine Approach”; labels “All Users” and “Atypical Users”]

• Leave-one-out simulation validation
• Compared against hierarchy-free baselines
• CoFineUCB combines the efficiency of subspace learning with the flexibility of full-space learning

• Live user study
• Showed real users real articles
• 10 articles/day for 10 days
• Counted the number of likes

• If the user profile lies mostly in the subspace (small residual outside it), it suffices to learn primarily in the subspace

• The K-dimensional subspace is much more efficient to explore
• Explore the full space as needed
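To see why the subspace is cheaper to explore, one can compare the average confidence width $\sqrt{v^\top M^{-1} v}$ of candidate articles after a few observations in the full D-dimensional space with the width of their projections into a K-dimensional subspace. This is a minimal sketch with hypothetical sizes (D = 50, K = 5, 20 observations), a random orthonormal U, and a hypothetical helper `mean_widths`.

```python
import numpy as np

def mean_widths(U, X_hist, candidates, lam=1.0):
    """Average confidence width sqrt(v^T M^-1 v) in the full space and in span(U).

    Hypothetical helper; assumes U (D x K) has orthonormal columns and a ridge
    regularizer lam on both design matrices.
    """
    D, K = U.shape
    M_full = lam * np.eye(D) + X_hist.T @ X_hist
    Xs = X_hist @ U                                  # observations projected to K dims
    M_sub = lam * np.eye(K) + Xs.T @ Xs
    Cs = candidates @ U                              # candidates projected to K dims
    full = np.sqrt(np.einsum("nd,de,ne->n", candidates, np.linalg.inv(M_full), candidates))
    sub = np.sqrt(np.einsum("nk,kj,nj->n", Cs, np.linalg.inv(M_sub), Cs))
    return full.mean(), sub.mean()


rng = np.random.default_rng(0)
D, K, n_obs = 50, 5, 20                         # hypothetical sizes: n_obs < D
U, _ = np.linalg.qr(rng.normal(size=(D, K)))    # assumed orthonormal subspace
X_hist = rng.normal(size=(n_obs, D))            # past recommended articles
candidates = rng.normal(size=(100, D))          # articles to score next
full, sub = mean_widths(U, X_hist, candidates)
print(f"full space: {full:.2f}, subspace: {sub:.2f}")
```

For orthonormal U the subspace width can never exceed the full-space width, since $U(U^\top M U)^{-1}U^\top \preceq M^{-1}$; the gap is largest when the number of observations is much smaller than D. The subspace width ignores the part of each article outside span(U), which is why the full space still needs some exploration.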
