TRANSCRIPT
Introduction of “Fairness in Learning: Classic and Contextual Bandits”
authored by Matthew Joseph, Michael Kearns, Jamie Morgenstern, and Aaron Roth
NIPS2016-Yomi, January 19, 2017
Presenter: Kazuto Fukuchi
Fairness in Machine Learning
Consequential decisions made using machine learning may lead to unfair treatment.
E.g., Google’s ad suggestion system [Sweeney 13]: searches for names associated with African descent triggered negative ads (“Arrested?”), while names associated with European descent triggered neutral ads (“Located”).
This work: fairness in the contextual bandit problem.
Individual Fairness
• Choose one person to receive an action
• E.g., a loan, a job offer, admission, etc.
When can we preferentially choose one person? Only if that person has the highest quality; there must be no other reason for the preferential choice.
[Figure: a person with 90% payback probability is preferred (>) over a person with 60% payback probability]
Contextual Bandit Problem
k arms, each associated with a reward function f_j that is unknown to the learner.
Each round t:
1. Obtain a context x_j^t for each arm j
2. Choose one arm i_t
3. Observe reward r_{i_t}^t s.t. E[r_j^t] = f_j(x_j^t) and r_j^t ∈ [0, 1] a.s.
[Figure: 5 arms with unknown reward functions f_1, …, f_5]
Goal: maximize the expected cumulative reward.
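The round-by-round protocol above can be sketched in code; everything here (the function names, the noise model, and the uniform baseline policy) is illustrative rather than taken from the paper:

```python
import random

# Minimal sketch of the contextual bandit protocol: k arms, each with a
# hidden reward function f_j, rewards bounded in [0, 1].

def run_round(fs, contexts, policy, rng):
    """One round: the learner sees all contexts, picks one arm,
    and observes a noisy reward only for that arm."""
    probs = policy(contexts)               # distribution over arms
    arm = rng.choices(range(len(fs)), weights=probs)[0]
    mean = fs[arm](contexts[arm])          # E[reward] = f_j(x_j^t)
    reward = min(1.0, max(0.0, mean + rng.uniform(-0.1, 0.1)))
    return arm, reward

# Illustrative instance: 3 arms with linear reward functions f_j(x) = theta_j * x
thetas = [0.2, 0.5, 0.9]
fs = [lambda x, th=th: th * x for th in thetas]
uniform = lambda contexts: [1 / len(contexts)] * len(contexts)

rng = random.Random(0)
total = 0.0
for t in range(100):
    contexts = [rng.random() for _ in fs]
    arm, reward = run_round(fs, contexts, uniform, rng)
    total += reward
```

The uniform policy here is only a placeholder; the learner's job is to choose a better policy from the observed history.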
Example: Linear Contextual Bandit
• Define f_j(x) = ⟨θ_j, x⟩
• Suppose ‖θ_j‖ ≤ 1 and ‖x_j^t‖ ≤ 1
E.g., online recommendation:
• θ_j: feature vector of a product
• x_j^t: feature vector of a user regarding the product
• The score of a user for a product is their inner product
Example: Classic Bandit
• The expected reward of arm j is a constant μ_j
• Set f_j(x) = μ_j for any x
• Then the contextual bandit reduces to the classic bandit
[Figure: 5 arms with expected rewards μ_1, …, μ_5]
Regret
• History h: a record of experiences — the contexts observed, arms chosen, and rewards received
• A policy π: a mapping from a history h and the current contexts to a distribution over arms
• π_{j,t}: the probability of choosing arm j at round t
Regret: the reward dropped compared to the optimal policy,
R(T) = Σ_{t=1}^T ( max_j f_j(x_j^t) − E[f_{i_t}(x_{i_t}^t)] )
The regret bound is non-trivial if R(T) = o(T).
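As a concrete check of the definition, a small sketch (with hypothetical reward functions and contexts) that computes cumulative regret against the optimal policy:

```python
# Per-round regret is the gap between the best arm's expected reward and the
# chosen arm's expected reward; cumulative regret sums the gaps over rounds.

def regret(fs, context_seq, chosen_arms):
    """Cumulative regret against the (unknown) optimal policy,
    which plays argmax_j f_j(x_j^t) each round."""
    total = 0.0
    for contexts, arm in zip(context_seq, chosen_arms):
        best = max(f(x) for f, x in zip(fs, contexts))
        total += best - fs[arm](contexts[arm])
    return total

fs = [lambda x: 0.3 * x, lambda x: 0.8 * x]
context_seq = [(1.0, 1.0), (1.0, 0.0)]
chosen = [0, 0]  # always play arm 0
# Round 1: best is arm 1 (0.8 vs 0.3), gap 0.5; round 2: arm 0 is optimal, gap 0.
print(regret(fs, context_seq, chosen))  # ≈ 0.5
```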
Fairness Constraint
It is unfair to preferentially choose one individual without an acceptable reason.
A policy is δ-fair if, with probability at least 1 − δ, for all rounds t and all pairs of arms j, j′:
π_{j,t} > π_{j′,t} only if f_j(x_j^t) > f_{j′}(x_{j′}^t)
That is, an arm may be chosen with higher probability only if the quality of that individual is strictly larger than the other’s.
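The pairwise condition can be checked mechanically for a single round; this sketch is illustrative (the tolerance parameter is an assumption, not from the paper):

```python
# Fairness condition on one round: arm j may receive strictly higher
# probability than arm j' only if its quality f_j(x_j^t) is strictly larger.

def round_is_fair(probs, qualities, tol=1e-12):
    """True iff for every pair (j, j'), probs[j] > probs[j'] implies
    qualities[j] > qualities[j']."""
    k = len(probs)
    for j in range(k):
        for jp in range(k):
            if probs[j] > probs[jp] + tol and not qualities[j] > qualities[jp]:
                return False
    return True

# Equal-quality arms must receive equal probability:
print(round_is_fair([0.5, 0.5], [0.7, 0.7]))  # True
print(round_is_fair([0.9, 0.1], [0.7, 0.7]))  # False
```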
Intuition behind the Fairness Constraint
• The optimal policy is fair
• But we can’t implement the optimal policy because the f_j are unknown
[Figure: two groups of arms. Within the left group we can’t distinguish which arm has the higher expected reward; every arm in the right group has expected reward lower than the left group w.h.p.]
The fairness constraint forces the learner to choose an arm from the left group with uniform probability.
Fairness in Classic Bandit
• Maintain a confidence interval for each arm’s expected reward
• Link arms whose confidence intervals overlap, transitively, starting from the arm with the highest upper bound; the linked arms form the chained group
• Choose uniformly from the chained group
[Figure: confidence intervals of the expected rewards of arms 1–5; arms 1–3 are chained, while the expected rewards of arms 4 and 5 are lower than those of every arm in the chained group]
Fair Algorithm for Classic Bandit
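A minimal sketch of the chaining step, assuming confidence intervals are already computed (how the intervals shrink with observations, and the exact pseudocode, are in the paper and not reproduced here):

```python
# Interval chaining: arms whose confidence intervals overlap the top arm's
# interval, transitively, form the "chained" group; the algorithm then plays
# uniformly over that group, which keeps the choice fair.

def chained_group(intervals):
    """intervals: list of (lo, hi) per arm. Returns indices of the chain
    containing the arm with the highest upper confidence bound."""
    order = sorted(range(len(intervals)), key=lambda j: -intervals[j][1])
    chain = [order[0]]
    lo = intervals[order[0]][0]        # lowest lower bound in the chain so far
    for j in order[1:]:
        if intervals[j][1] >= lo:      # overlaps the chain -> link it
            chain.append(j)
            lo = min(lo, intervals[j][0])
        else:
            break                      # sorted by upper bound: rest can't link
    return sorted(chain)

# Arms 0-2 overlap transitively; arms 3-4 lie strictly below the chain.
ivals = [(0.6, 0.9), (0.55, 0.8), (0.5, 0.7), (0.2, 0.45), (0.1, 0.3)]
print(chained_group(ivals))  # [0, 1, 2]
```

Processing arms in decreasing order of upper bound is what makes the single pass sufficient: once an arm’s upper bound falls below the chain’s lowest lower bound, no later arm can link either.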
Regret Upper Bound
If T ≥ k³, then FairBandits has regret R(T) = O(√(k³ T ln(Tk/δ))).
• Ω(k³) rounds are required to obtain a non-trivial regret bound, i.e., R(T) = o(T)
• Non-fair case: k rounds suffice
• k becomes k³ under the fairness constraint
• The dependence on k is optimal
Regret Lower Bound
Any fair algorithm experiences constant per-round regret for at least Ω(k³) rounds.
• Constant per-round regret means trivial regret, i.e., R(T) = Ω(T)
• Hence, to achieve non-trivial regret, at least Ω(k³) rounds are needed
• Thus, k³ rounds are necessary and sufficient
Fairness in Contextual Bandit
KWIK learnable ⟺ fair bandit learnable
KWIK (Knows What It Knows) Learning
• Online regression
• Given a feature x_t, the learner outputs either a prediction ŷ_t or ⊥
• ⊥ denotes “I Don’t Know”
• Only when the learner outputs ⊥ does it observe feedback y_t s.t. E[y_t] = f(x_t)
[Figure: the learner receives a feature x_t and outputs either an accurate prediction or “I Don’t Know”]
KWIK Learnability
A class F is (ε, δ)-KWIK learnable with bound m(ε, δ) if, with probability at least 1 − δ:
1. |ŷ_t − f(x_t)| ≤ ε for all t with ŷ_t ≠ ⊥
2. The number of ⊥ answers is at most m(ε, δ)
Intuition
• Every prediction is accurate whenever the learner does not answer ⊥
• The learner answers ⊥ only a small number of times, bounded by m(ε, δ)
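To make the definition concrete, here is a hedged sketch of a KWIK learner for a finite hypothesis class with noiseless feedback; this toy construction is illustrative and is not the learner used in the paper:

```python
# Predict only when every hypothesis still consistent with the data agrees
# within epsilon; otherwise answer "I Don't Know" and learn from the feedback.

class FiniteClassKWIK:
    def __init__(self, hypotheses, epsilon):
        self.version_space = list(hypotheses)  # all still-consistent f's
        self.epsilon = epsilon

    def predict(self, x):
        preds = [h(x) for h in self.version_space]
        if max(preds) - min(preds) <= self.epsilon:
            return sum(preds) / len(preds)     # safe: all agree within eps
        return None                            # None plays the role of ⊥

    def observe(self, x, y):
        # Feedback arrives only after a ⊥ answer; prune inconsistent f's.
        self.version_space = [h for h in self.version_space
                              if abs(h(x) - y) <= self.epsilon / 2]

hs = [lambda x: 0.0 * x, lambda x: 1.0 * x]   # two candidate functions
learner = FiniteClassKWIK(hs, epsilon=0.1)
print(learner.predict(0.0))   # 0.0  (both candidates agree at x = 0)
print(learner.predict(1.0))   # None (they disagree: must say "I Don't Know")
learner.observe(1.0, 1.0)     # feedback reveals the true function
print(learner.predict(1.0))   # 1.0
```

The number of ⊥ answers is bounded because each one eliminates at least one hypothesis, mirroring the role of m(ε, δ) in the definition.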
KWIK Learnability Implies Fair Bandit Learnability
Suppose F is (ε, δ)-KWIK learnable with bound m(ε, δ). Then there is a δ-fair algorithm for F whose regret, for sufficiently large T, is bounded in terms of k, T, and the KWIK bound m.
Linear Contextual Bandit Case
• Let F be the class of linear functions f_j(x) = ⟨θ_j, x⟩
• Then F is KWIK learnable with a polynomially bounded m(ε, δ), so there is a δ-fair algorithm with sublinear regret
KWIK to Fair
Intuition behind KWIKToFair
• Predict the expected reward of each arm using a per-arm KWIK algorithm
• If none of the KWIK outputs is ⊥, the same chaining strategy as in the classic bandit is applicable, with arms chained when their predictions are within 2ε*
[Figure: predicted expected rewards of arms 1–5, chained at width 2ε*]
Fair Bandit Learnability Implies KWIK Learnability
Suppose:
• There is a δ-fair algorithm for F with regret R(T)
• There exists a context on which every f ∈ F takes a known value
Then there is an (ε, δ)-KWIK algorithm for F whose bound m(ε, δ) is obtained as the solution of an equation involving the regret R(T).
An Exponential Separation Between Fair and Unfair Learning
• Boolean conjunctions: let F be the class of Boolean conjunctions over d variables
• Without the fairness constraint, the worst-case regret bound for F is polynomial in d
• With fairness, the KWIK bound for F is exponential in d, so any fair algorithm suffers regret exponential in d
Fair to KWIK
Intuition behind FairToKWIK
• Divide the range of f into intervals of width ε*, with grid points x^(0), x^(1), x^(2), …
• Using the fair algorithm on a pair of arms, compare f(x_t) with each grid value; let p_{ℓ,1} and p_{ℓ,2} be the probabilities of choosing the left and right arm in comparison ℓ
• By δ-fairness, p_{ℓ,1} > p_{ℓ,2} certifies that the left arm’s value exceeds the right arm’s (and vice versa)
• If p_{ℓ,1} = p_{ℓ,2} for all ℓ, the value of f(x_t) cannot be localized: output ⊥
• Otherwise, the certified comparisons localize f(x_t) to an interval of width ε*: output the corresponding prediction
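A hedged sketch of this localization idea, where a `compare` oracle abstracts the fair algorithm’s arm-choice probabilities (all names here are illustrative, not the paper’s):

```python
# compare(v) returns +1 if the fair algorithm certifies f(x_t) > v (it put
# strictly more probability on x_t's arm), -1 if it certifies f(x_t) < v,
# and 0 if it played both arms with equal probability (no certificate).

def localize(compare, eps):
    """Scan grid points 0, eps, 2*eps, ..., 1. If every comparison is
    uninformative, return None (⊥); otherwise return the midpoint of the
    interval the certified comparisons pin f(x_t) into."""
    grid = [i * eps for i in range(int(1 / eps) + 1)]
    lo, hi, informative = 0.0, 1.0, False
    for v in grid:
        c = compare(v)
        if c > 0:
            lo, informative = max(lo, v), True
        elif c < 0:
            hi, informative = min(hi, v), True
    return (lo + hi) / 2 if informative else None

# Hypothetical true value 0.33; this oracle certifies every strict comparison.
oracle = lambda v: (0.33 > v) - (0.33 < v)
print(localize(oracle, 0.1))           # midpoint of (0.3, 0.4), ≈ 0.35
print(localize(lambda v: 0, 0.1))      # None: must answer "I Don't Know"
```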
Conclusions
• Fairness in the contextual bandit problem and the classic bandit problem
• δ-fair: with probability at least 1 − δ, an arm is chosen with higher probability only if its quality is strictly larger
Results
• Classic bandits: the number of rounds necessary and sufficient to achieve non-trivial regret is Θ(k³)
• Contextual bandits: a tight relationship with Knows What It Knows (KWIK) learning