TRANSCRIPT
Introduction of “Fairness in Learning: Classic and Contextual Bandits”
authored by Matthew Joseph, Michael Kearns, Jamie Morgenstern, and Aaron Roth
NIPS2016-Yomi, January 19, 2017
Presenter: Kazuto Fukuchi
Fairness in Machine Learning
Consequential decisions made using machine learning may lead to unfair treatment.
E.g., Google’s ad suggestion system [Sweeney 13]: searches for names associated with African descent triggered negative ads (“Arrested?”), while names associated with European descent triggered neutral ads (“Located”).
This work: fairness in the contextual bandit problem.
Individual Fairness
• Choose one person to receive an action
• E.g., a loan, a job offer, admission, etc.
When can we preferentially choose one person? Only if that person has the highest quality; there must be no other reason for the preferential choice.
[Figure: a person with 90% payback probability is preferred (>) over a person with 60% payback probability]
Contextual Bandit Problem
k arms, each associated with a reward function f_j that is unknown to the learner.
Each round t:
1. Obtain a context x_j^t for each arm j
2. Choose one arm i_t
3. Observe reward r_{i_t}^t s.t. E[r_j^t] = f_j(x_j^t) and r_j^t ∈ [0, 1] a.s.
[Figure: 5 arms with unknown reward functions f_1, …, f_5]
Goal: maximize the expected cumulative reward.
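The round-by-round protocol above can be sketched in code; everything here (the function names, the noise model, and the uniform baseline policy) is illustrative rather than taken from the paper:

```python
import random

# Minimal sketch of the contextual bandit protocol: k arms, each with a
# hidden reward function f_j, rewards bounded in [0, 1].

def run_round(fs, contexts, policy, rng):
    """One round: the learner sees all contexts, picks one arm,
    and observes a noisy reward only for that arm."""
    probs = policy(contexts)               # distribution over arms
    arm = rng.choices(range(len(fs)), weights=probs)[0]
    mean = fs[arm](contexts[arm])          # E[reward] = f_j(x_j^t)
    reward = min(1.0, max(0.0, mean + rng.uniform(-0.1, 0.1)))
    return arm, reward

# Illustrative instance: 3 arms with linear reward functions f_j(x) = theta_j * x
thetas = [0.2, 0.5, 0.9]
fs = [lambda x, th=th: th * x for th in thetas]
uniform = lambda contexts: [1 / len(contexts)] * len(contexts)

rng = random.Random(0)
total = 0.0
for t in range(100):
    contexts = [rng.random() for _ in fs]
    arm, reward = run_round(fs, contexts, uniform, rng)
    total += reward
```

The uniform policy here is only a placeholder; the learner's job is to choose a better policy from the observed history.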
Example: Linear Contextual Bandit
• Define f_j(x) = ⟨θ_j, x⟩
• Suppose ‖θ_j‖ ≤ 1 and ‖x_j^t‖ ≤ 1
E.g., online recommendation:
• θ_j: feature vector of a product
• x_j^t: feature vector of a user regarding the product
• The score of a user for a product is their inner product
Example: Classic Bandit
• The expected reward of arm j is a constant μ_j
• Set f_j(x) = μ_j for any x
• Then the contextual bandit reduces to the classic bandit
[Figure: 5 arms with expected rewards μ_1, …, μ_5]
Regret
• History h: a record of experiences — the contexts observed, arms chosen, and rewards received
• A policy π: a mapping from a history h and the current contexts to a distribution over arms
• π_{j,t}: the probability of choosing arm j at round t
Regret: the reward dropped compared to the optimal policy,
R(T) = Σ_{t=1}^T ( max_j f_j(x_j^t) − E[f_{i_t}(x_{i_t}^t)] )
The regret bound is non-trivial if R(T) = o(T).
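As a concrete check of the definition, a small sketch (with hypothetical reward functions and contexts) that computes cumulative regret against the optimal policy:

```python
# Per-round regret is the gap between the best arm's expected reward and the
# chosen arm's expected reward; cumulative regret sums the gaps over rounds.

def regret(fs, context_seq, chosen_arms):
    """Cumulative regret against the (unknown) optimal policy,
    which plays argmax_j f_j(x_j^t) each round."""
    total = 0.0
    for contexts, arm in zip(context_seq, chosen_arms):
        best = max(f(x) for f, x in zip(fs, contexts))
        total += best - fs[arm](contexts[arm])
    return total

fs = [lambda x: 0.3 * x, lambda x: 0.8 * x]
context_seq = [(1.0, 1.0), (1.0, 0.0)]
chosen = [0, 0]  # always play arm 0
# Round 1: best is arm 1 (0.8 vs 0.3), gap 0.5; round 2: arm 0 is optimal, gap 0.
print(regret(fs, context_seq, chosen))  # ≈ 0.5
```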
Fairness Constraint
It is unfair to preferentially choose one individual without an acceptable reason.
A policy is δ-fair if, with probability at least 1 − δ, for all rounds t and all pairs of arms j, j′:
π_{j,t} > π_{j′,t} only if f_j(x_j^t) > f_{j′}(x_{j′}^t)
That is, an arm may be chosen with higher probability only if the quality of that individual is strictly larger than the other’s.
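The pairwise condition can be checked mechanically for a single round; this sketch is illustrative (the tolerance parameter is an assumption, not from the paper):

```python
# Fairness condition on one round: arm j may receive strictly higher
# probability than arm j' only if its quality f_j(x_j^t) is strictly larger.

def round_is_fair(probs, qualities, tol=1e-12):
    """True iff for every pair (j, j'), probs[j] > probs[j'] implies
    qualities[j] > qualities[j']."""
    k = len(probs)
    for j in range(k):
        for jp in range(k):
            if probs[j] > probs[jp] + tol and not qualities[j] > qualities[jp]:
                return False
    return True

# Equal-quality arms must receive equal probability:
print(round_is_fair([0.5, 0.5], [0.7, 0.7]))  # True
print(round_is_fair([0.9, 0.1], [0.7, 0.7]))  # False
```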
Intuition behind the Fairness Constraint
• The optimal policy is fair
• But we can’t implement the optimal policy because the f_j are unknown
[Figure: two groups of arms. Within the left group we can’t distinguish which arm has the higher expected reward; every arm in the right group has expected reward lower than the left group w.h.p.]
The fairness constraint forces the learner to choose an arm from the left group with uniform probability.
Fairness in Classic Bandit
• Maintain a confidence interval for each arm’s expected reward
• Link arms whose confidence intervals overlap, transitively, starting from the arm with the highest upper bound; the linked arms form the chained group
• Choose uniformly from the chained group
[Figure: confidence intervals of the expected rewards of arms 1–5; arms 1–3 are chained, while the expected rewards of arms 4 and 5 are lower than those of every arm in the chained group]
Fair Algorithm for Classic Bandit
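A minimal sketch of the chaining step, assuming confidence intervals are already computed (how the intervals shrink with observations, and the exact pseudocode, are in the paper and not reproduced here):

```python
# Interval chaining: arms whose confidence intervals overlap the top arm's
# interval, transitively, form the "chained" group; the algorithm then plays
# uniformly over that group, which keeps the choice fair.

def chained_group(intervals):
    """intervals: list of (lo, hi) per arm. Returns indices of the chain
    containing the arm with the highest upper confidence bound."""
    order = sorted(range(len(intervals)), key=lambda j: -intervals[j][1])
    chain = [order[0]]
    lo = intervals[order[0]][0]        # lowest lower bound in the chain so far
    for j in order[1:]:
        if intervals[j][1] >= lo:      # overlaps the chain -> link it
            chain.append(j)
            lo = min(lo, intervals[j][0])
        else:
            break                      # sorted by upper bound: rest can't link
    return sorted(chain)

# Arms 0-2 overlap transitively; arms 3-4 lie strictly below the chain.
ivals = [(0.6, 0.9), (0.55, 0.8), (0.5, 0.7), (0.2, 0.45), (0.1, 0.3)]
print(chained_group(ivals))  # [0, 1, 2]
```

Processing arms in decreasing order of upper bound is what makes the single pass sufficient: once an arm’s upper bound falls below the chain’s lowest lower bound, no later arm can link either.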
Regret Upper Bound
If T ≥ k³, then FairBandits has regret R(T) = O(√(k³ T ln(Tk/δ))).
• Ω(k³) rounds are required to obtain a non-trivial regret bound, i.e., R(T) = o(T)
• Non-fair case: k rounds suffice
• k becomes k³ under the fairness constraint
• The dependence on k is optimal
Regret Lower Bound
Any fair algorithm experiences constant per-round regret for at least Ω(k³) rounds.
• Constant per-round regret means trivial regret, i.e., R(T) = Ω(T)
• Hence, to achieve non-trivial regret, at least Ω(k³) rounds are needed
• Thus, k³ rounds are necessary and sufficient
Fairness in Contextual Bandit
KWIK learnable ⟺ fair bandit learnable
KWIK (Knows What It Knows) Learning
• Online regression
• Given a feature x_t, the learner outputs either a prediction ŷ_t or ⊥
• ⊥ denotes “I Don’t Know”
• Only when the learner outputs ⊥ does it observe feedback y_t s.t. E[y_t] = f(x_t)
[Figure: the learner receives a feature x_t and outputs either an accurate prediction or “I Don’t Know”]
KWIK Learnability
A class F is (ε, δ)-KWIK learnable with bound m(ε, δ) if, with probability at least 1 − δ:
1. |ŷ_t − f(x_t)| ≤ ε for all t with ŷ_t ≠ ⊥
2. The number of ⊥ answers is at most m(ε, δ)
Intuition
• Every prediction is accurate whenever the learner does not answer ⊥
• The learner answers ⊥ only a small number of times, bounded by m(ε, δ)
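To make the definition concrete, here is a hedged sketch of a KWIK learner for a finite hypothesis class with noiseless feedback; this toy construction is illustrative and is not the learner used in the paper:

```python
# Predict only when every hypothesis still consistent with the data agrees
# within epsilon; otherwise answer "I Don't Know" and learn from the feedback.

class FiniteClassKWIK:
    def __init__(self, hypotheses, epsilon):
        self.version_space = list(hypotheses)  # all still-consistent f's
        self.epsilon = epsilon

    def predict(self, x):
        preds = [h(x) for h in self.version_space]
        if max(preds) - min(preds) <= self.epsilon:
            return sum(preds) / len(preds)     # safe: all agree within eps
        return None                            # None plays the role of ⊥

    def observe(self, x, y):
        # Feedback arrives only after a ⊥ answer; prune inconsistent f's.
        self.version_space = [h for h in self.version_space
                              if abs(h(x) - y) <= self.epsilon / 2]

hs = [lambda x: 0.0 * x, lambda x: 1.0 * x]   # two candidate functions
learner = FiniteClassKWIK(hs, epsilon=0.1)
print(learner.predict(0.0))   # 0.0  (both candidates agree at x = 0)
print(learner.predict(1.0))   # None (they disagree: must say "I Don't Know")
learner.observe(1.0, 1.0)     # feedback reveals the true function
print(learner.predict(1.0))   # 1.0
```

The number of ⊥ answers is bounded because each one eliminates at least one hypothesis, mirroring the role of m(ε, δ) in the definition.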
KWIK Learnability Implies Fair Bandit Learnability
Suppose F is (ε, δ)-KWIK learnable with bound m(ε, δ). Then there is a δ-fair algorithm for F whose regret, for sufficiently large T, is bounded in terms of k, T, and the KWIK bound m.
Linear Contextual Bandit Case
• Let F be the class of linear functions f_j(x) = ⟨θ_j, x⟩
• Then F is KWIK learnable with a polynomially bounded m(ε, δ), so there is a δ-fair algorithm with sublinear regret
KWIK to Fair
Intuition behind KWIKToFair
• Predict the expected reward of each arm using a per-arm KWIK algorithm
• If none of the KWIK outputs is ⊥, the same chaining strategy as in the classic bandit is applicable, with arms chained when their predictions are within 2ε*
[Figure: predicted expected rewards of arms 1–5, chained at width 2ε*]
Fair Bandit Learnability Implies KWIK Learnability
Suppose:
• There is a δ-fair algorithm for F with regret R(T)
• There exists a context on which every f ∈ F takes a known value
Then there is an (ε, δ)-KWIK algorithm for F whose bound m(ε, δ) is obtained as the solution of an equation involving the regret R(T).
An Exponential Separation Between Fair and Unfair Learning
• Boolean conjunctions: let F be the class of Boolean conjunctions over d variables
• Without the fairness constraint, the worst-case regret bound for F is polynomial in d
• With fairness, the KWIK bound for F is exponential in d, so any fair algorithm suffers regret exponential in d
Fair to KWIK
Intuition behind FairToKWIK
• Divide the range of f into intervals of width ε*, with grid points x^(0), x^(1), x^(2), …
• Using the fair algorithm on a pair of arms, compare f(x_t) with each grid value; let p_{ℓ,1} and p_{ℓ,2} be the probabilities of choosing the left and right arm in comparison ℓ
• By δ-fairness, p_{ℓ,1} > p_{ℓ,2} certifies that the left arm’s value exceeds the right arm’s (and vice versa)
• If p_{ℓ,1} = p_{ℓ,2} for all ℓ, the value of f(x_t) cannot be localized: output ⊥
• Otherwise, the certified comparisons localize f(x_t) to an interval of width ε*: output the corresponding prediction
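A hedged sketch of this localization idea, where a `compare` oracle abstracts the fair algorithm’s arm-choice probabilities (all names here are illustrative, not the paper’s):

```python
# compare(v) returns +1 if the fair algorithm certifies f(x_t) > v (it put
# strictly more probability on x_t's arm), -1 if it certifies f(x_t) < v,
# and 0 if it played both arms with equal probability (no certificate).

def localize(compare, eps):
    """Scan grid points 0, eps, 2*eps, ..., 1. If every comparison is
    uninformative, return None (⊥); otherwise return the midpoint of the
    interval the certified comparisons pin f(x_t) into."""
    grid = [i * eps for i in range(int(1 / eps) + 1)]
    lo, hi, informative = 0.0, 1.0, False
    for v in grid:
        c = compare(v)
        if c > 0:
            lo, informative = max(lo, v), True
        elif c < 0:
            hi, informative = min(hi, v), True
    return (lo + hi) / 2 if informative else None

# Hypothetical true value 0.33; this oracle certifies every strict comparison.
oracle = lambda v: (0.33 > v) - (0.33 < v)
print(localize(oracle, 0.1))           # midpoint of (0.3, 0.4), ≈ 0.35
print(localize(lambda v: 0, 0.1))      # None: must answer "I Don't Know"
```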
Conclusions
• Fairness in the contextual bandit problem and the classic bandit problem
• δ-fair: with probability at least 1 − δ, an arm is chosen with higher probability only if its quality is strictly larger
Results
• Classic bandits: the number of rounds necessary and sufficient to achieve non-trivial regret is Θ(k³)
• Contextual bandits: a tight relationship with Knows What It Knows (KWIK) learning