TRANSCRIPT
S. Rady, Yonsei University 2012 Lecture 1: Poisson Bandits – 1
Strategic Experimentation
Lecture 1: Poisson Bandits
Introduction
• Optimal Learning
• Strategic Issues
• Bandits
The Model
The Single-Agent Problem
The Cooperative Problem
The Strategic Problem
Conclusion
A Model of Rational Learning
People are uncertain about their environment
They learn from experience
Bayesian learning:
• Agents possess a prior belief about some unknown parameter(s) of the environment
• They know the distribution of possible outcomes conditional on the parameter(s)
• They use Bayes’ rule to update their belief on the basis of what they observe
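As a toy numerical illustration of this updating rule (a sketch with made-up parameter values, not part of the lecture), suppose a binary parameter θ is observed through the number of Poisson arrivals in a window of length t:

```python
from math import exp, factorial

def poisson_pmf(n, rate, t):
    """Probability of exactly n arrivals in time t at the given arrival rate."""
    mean = rate * t
    return exp(-mean) * mean ** n / factorial(n)

def update(prior, n, t, lam1, lam0):
    """Bayes' rule: posterior probability of theta = 1 after observing
    n arrivals in a window of length t."""
    good = prior * poisson_pmf(n, lam1, t)
    bad = (1 - prior) * poisson_pmf(n, lam0, t)
    return good / (good + bad)

# An arrival in a short window is evidence for the high rate;
# a long quiet spell is evidence for the low rate
print(update(0.5, 1, 0.1, lam1=2.0, lam0=0.5))   # above 1/2
print(update(0.5, 0, 1.0, lam1=2.0, lam0=0.5))   # below 1/2
```

An arrival in a short window pushes the belief up because arrivals are much more likely under the high rate; a quiet spell pushes it down.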
A Single-Agent Example
Optimal learning typically involves experimentation
– Sacrifice current rewards for better information which will generate better rewards in the future
– Exploitation versus exploration
Examples:
– Seller of a new good
– Farmer choosing between a traditional crop and a gene-modified one
– Researcher pursuing a new research agenda
A Single-Agent Example
Multi-period monopolist learning about the demand curve
No physical links between periods
• Rothschild (1974)
• Aghion, Bolton, Harris & Julien (1991)
• Keller & Rady (1999)
Optimal learning may be incomplete
Identical sellers may charge different prices in the long run because of different sequences of observations
Strategic Experimentation
Agents learn from the experiments of others, as well as from their own
Examples:
– Oligopolists in a new market
– Farmers with neighboring fields
– Researchers pursuing a common research agenda
With publicly observable actions and outcomes, there is an informational externality
Strategic Experimentation
• Aghion, Espinoza & Julien (1993), Harrington (1995), Keller & Rady (2003):
– Differentiated-goods duopoly
• Bergemann & Valimaki (1996, 1997, 2000):
– Sellers and buyers learning about a new product
• Bolton & Harris (1999, 2000), Keller, Rady & Cripps (2005):
– Pure informational externality
– Two-armed bandits in continuous time
Multi-Armed Bandit Problem
Stylized model of the exploitation-exploration trade-off
– Agent decides repeatedly which slot machine to play
– Each machine produces a random stream of payoffs
– Uncertainty about the distribution of payoff streams
Sequential design of experiments
• Robbins (1952)
• Bellman (1956), Bradt, Johnson & Karlin (1956)
• Gittins & Jones (1974)
• Karatzas (1984), Presman (1990)
• Bergemann & Valimaki (2006)
Strategic Experimentation with Bandits
Bolton & Harris (1999):
– Two-armed bandits in continuous time
– One arm is “safe”
– The other arm is either “good” or “bad” (Brownian motion with high or low drift)
– Type of risky arm identical across players
– Publicly observable actions and outcomes
– Pure informational externality
– Unique symmetric Markov perfect equilibrium
– Free-riding versus encouragement
Strategic Experimentation with Bandits
Keller, Rady & Cripps (2005), Keller & Rady (2010):
– Poisson process with high or low arrival rate
– Unique symmetric Markov perfect equilibrium
– Multiplicity of asymmetric Markov perfect equilibria
– No equilibria in cutoff strategies
– Inefficiency
– Encouragement effect
The Model
Poisson Bandits
Two-armed bandit in continuous time
One arm is safe (S): generates a known flow payoff s
Other arm is risky (R): yields i.i.d. lump-sums of known mean h which arrive according to a Poisson process
if good (θ = 1), Poisson intensity is λ1 (≡ flow payoff λ1h)
if bad (θ = 0), Poisson intensity is λ0 (≡ flow payoff λ0h)
s > 0 and λ1 > λ0 ≥ 0 known to players
True value of θ initially unknown to players
Assumption: λ1h > s > λ0h
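A quick simulation makes the payoff assumption concrete (illustrative parameter values chosen to satisfy λ1h > s > λ0h, not from the lecture): the risky arm’s long-run average payoff per unit of time is λθh, which brackets the safe flow s.

```python
import random

random.seed(1)

# Illustrative parameters (not from the lecture) satisfying lam1*h > s > lam0*h
s, h = 1.0, 1.0
lam1, lam0 = 2.0, 0.5

def average_risky_payoff(lam, horizon=10_000.0):
    """Simulate the risky arm: lump-sums of mean h arrive at Poisson rate lam.
    Returns the realised average payoff per unit of time."""
    t, total = 0.0, 0.0
    while True:
        t += random.expovariate(lam)        # exponential inter-arrival times
        if t > horizon:
            break
        total += random.expovariate(1 / h)  # lump-sum with mean h
    return total / horizon

good, bad = average_risky_payoff(lam1), average_risky_payoff(lam0)
# Law of large numbers: good is near lam1*h > s, bad is near lam0*h < s
print(good, s, bad)
```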
Exponential Bandits
Keller, Rady & Cripps (2005):
• λ0 = 0
• First lump-sum resolves all uncertainty about θ
• Arrives at an exponentially distributed random time (→ “exponential bandit”)
Much simpler than the case λ0 > 0 in Keller & Rady (2010):
• No encouragement effect
• Backward induction from single-agent cut-off
Actions and Payoffs
One unit of a perfectly divisible resource per unit of time
If fraction kt ∈ [0, 1] is allocated to R on [t, t + dt[:
S yields (1 − kt)s dt
R pays a lump-sum with probability ktλθ dt
Expected payoff increment conditional on θ:
[(1 − kt)s + ktλθh] dt
Payoffs and Beliefs
pt = posterior probability that θ equals 1
Et[λθ] = ptλ1 + (1 − pt)λ0 = λ(pt)
Expected payoff increment conditional on observations up to time t:
[(1 − kt)s + ktλ(pt)h] dt
Total payoff in per-period units:
E[ ∫₀^∞ r e^{−rt} [(1 − kt)s + ktλ(pt)h] dt ]
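The weighting by r e^{−rt} is what puts total payoffs “in per-period units”: since ∫₀^∞ r e^{−rt} dt = 1, a constant flow w has total payoff exactly w. A small numerical check (illustrative values, not from the lecture):

```python
from math import exp

r, w = 0.5, 1.3       # illustrative discount rate and constant flow payoff
dt, T = 1e-4, 100.0   # step size and truncation horizon

# Riemann sum for  integral_0^infty  r e^{-rt} w dt : the weights
# r e^{-rt} dt integrate to one, so the total payoff equals the flow w
total = sum(r * exp(-r * i * dt) * w * dt for i in range(int(T / dt)))
print(total)   # close to w = 1.3
```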
Common Posterior Beliefs
Each player has a replica two-armed bandit
– same θ
– i.i.d. arrival times
Common prior
Observable actions and outcomes
Hence common posterior
Applying Bayes’ Rule
Let player n = 1, . . . , N allocate kn,t to R on [t, t + dt[
Define
Kt = ∑_{n=1}^{N} kn,t
Conditional on θ, the probability of none of the players receiving a lump-sum in [t, t + dt[ is
∏_{n=1}^{N} e^{−kn,tλθ dt} = ∏_{n=1}^{N} (1 − kn,tλθ dt) = 1 − Ktλθ dt
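The last equality holds to first order in dt; a quick check with hypothetical allocations confirms that the gap between the exact product and 1 − Ktλθ dt vanishes at rate dt²:

```python
from math import exp

lam = 2.0                     # illustrative arrival rate conditional on theta
k = [0.3, 1.0, 0.5]           # hypothetical allocations k_{n,t} of N = 3 players
K = sum(k)

gaps = []
for dt in (1e-2, 1e-3, 1e-4):
    exact = 1.0
    for kn in k:
        exact *= exp(-kn * lam * dt)   # P(player n sees no lump-sum in dt)
    gaps.append(abs(exact - (1 - K * lam * dt)))

# the error of the first-order approximation shrinks like dt^2
print(gaps)
```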
No News is Bad News
Given pt at time t and no lump-sum in [t, t + dt[:
pt + dpt = pt(1 − Ktλ1 dt) / [(1 − pt)(1 − Ktλ0 dt) + pt(1 − Ktλ1 dt)]
or
dpt = −Kt ∆λ pt(1 − pt) dt
with ∆λ = λ1 − λ0
When there is a lump-sum on any of the risky arms, the posterior belief jumps up from pt− to
pt = j(pt−) = λ1 pt− / λ(pt−)
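The two pieces of the belief dynamics, downward drift absent news and upward jumps at lump-sums, can be simulated with a crude Euler scheme (illustrative parameters, not from the lecture):

```python
import random

random.seed(0)

# Illustrative parameters (not from the lecture): lam1 > lam0 >= 0
lam1, lam0 = 2.0, 0.5
dlam = lam1 - lam0

def lam(p):
    """Expected arrival rate at belief p."""
    return p * lam1 + (1 - p) * lam0

def j(p):
    """Upward jump of the belief when a lump-sum arrives."""
    return lam1 * p / lam(p)

def belief_path(p0, K, T, theta, dt=1e-3):
    """Euler scheme: drift dp = -K*dlam*p*(1-p)*dt when no news arrives,
    jump p -> j(p) when a lump-sum arrives (rate K*lambda_theta)."""
    p = p0
    rate = K * (lam1 if theta == 1 else lam0)
    for _ in range(int(T / dt)):
        if random.random() < rate * dt:   # a lump-sum arrives somewhere
            p = j(p)
        else:                             # no news is bad news
            p -= K * dlam * p * (1 - p) * dt
    return p

# Average terminal belief over many paths with theta = 1
paths = [belief_path(0.5, K=1, T=10.0, theta=1) for _ in range(100)]
print(sum(paths) / len(paths))
```

Conditional on θ = 1, lump-sums keep arriving and the posterior concentrates near 1, as consistency of Bayesian updating requires.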
Stationary Markov Strategies
With pt− as the natural state variable:
kt = k(pt−)
where k : [0, 1] → [0, 1] is left-continuous and piecewise Lipschitz-continuous
Call the strategy simple if k(p) ∈ {0, 1} for all p
Cut-off Strategies
For some p̄ ∈ [0, 1]:
If p > p̄, then k(p) = 1 (play R exclusively)
If p ≤ p̄, then k(p) = 0 (play S exclusively)
Example:
Myopic agent ignores information content of own actions (as if r = ∞)
Payoff = max{s, λ(p)h}
Cut-off:
pm = (s − λ0h) / (λ1h − λ0h)
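Plugging in illustrative numbers (again not from the lecture) confirms the defining property of pm: at the cut-off, the expected risky flow λ(pm)h exactly equals the safe flow s.

```python
# Illustrative parameters satisfying lam1*h > s > lam0*h
s, h = 1.0, 1.0
lam1, lam0 = 2.0, 0.5

def lam(p):
    """Expected arrival rate at belief p."""
    return p * lam1 + (1 - p) * lam0

# Myopic cut-off: the belief at which the expected risky flow equals s
pm = (s - lam0 * h) / (lam1 * h - lam0 * h)

print(pm)             # 1/3 for these parameters
print(lam(pm) * h)    # equals s: indifference at the cut-off
```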
The Single-Agent Problem
Principle of Optimality
u(p) = max_{k∈[0,1]} { r[(1 − k)s + kλ(p)h] dt + e^{−r dt} E[u(p′) | p, k] }
With probability kλ(p) dt: lump-sum, and
u(p′) = u(j(p))
With probability 1 − kλ(p) dt: no lump-sum, and
u(p′) = u(p) − k∆λ p(1 − p)u′(p) dt
Insert and simplify to get the Bellman equation!
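Spelled out, the substitution runs as follows (expand e^{−r dt} ≈ 1 − r dt, insert the two cases above, drop terms of order dt², then cancel u(p) and divide by r dt):

```latex
u(p) = \max_{k\in[0,1]}\Big\{ r\big[(1-k)s + k\lambda(p)h\big]dt
     + (1 - r\,dt)\Big[ k\lambda(p)\,dt\;u(j(p))
     + \big(1 - k\lambda(p)\,dt\big)\big(u(p) - k\,\Delta\lambda\,p(1-p)\,u'(p)\,dt\big)\Big]\Big\}
% cancel u(p) on both sides, divide by r dt, drop the remaining O(dt) terms:
u(p) = \max_{k\in[0,1]}\Big\{ (1-k)s + k\lambda(p)h
     + k\big[\lambda(p)\big(u(j(p)) - u(p)\big) - \Delta\lambda\,p(1-p)\,u'(p)\big]/r \Big\}
     = s + \max_{k\in[0,1]} k\,\{\,b(p,u) - c(p)\,\}
```

which matches the Bellman equation displayed on the next slide.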
Bellman Equation
u(p) = s + max_{k∈[0,1]} k{b(p, u) − c(p)}
with the expected short-term opportunity cost
c(p) = s − λ(p)h
and the expected long-term learning benefit
b(p, u) = [λ(p)(u(j(p)) − u(p)) − ∆λ p(1 − p)u′(p)]/r
(left-hand derivative!)
⇒ k = 1 or k = 0 is optimal
Single-Agent Optimum
When k = 1 is optimal, u satisfies a differential-difference equation (DDE) that admits an explicit solution
Optimal cut-off p∗1 < pm determined by
• value matching: u(p∗1) = s
• smooth pasting: u′(p∗1) = 0
[Figure: single-agent value function u, equal to s below the cut-off p∗1 and rising towards λ1h as p → 1, with p∗1 < pm]
The Cooperative Problem
Efficient Experimentation
Bellman equation for N players jointly maximising their average payoff:
u(p) = s + max_{K∈[0,N]} K{b(p, u) − c(p)/N}
⇒ K = N or K = 0 is optimal
⇒ Optimal cut-off p∗N < p∗1, decreasing in N
[Figure: efficient experimentation intensity K(p), equal to 0 below p∗N and to N above it, with p∗N < p∗1]
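The lecture obtains these cut-offs in closed form; as a rough numerical cross-check (a crude discrete-time value iteration under illustrative parameters, all names hypothetical), one can recover the ordering p∗N < p∗1 < pm:

```python
from math import exp

# Illustrative parameters (not from the lecture) satisfying lam1*h > s > lam0*h
s, h, r = 1.0, 1.0, 1.0
lam1, lam0 = 2.0, 0.5
dlam = lam1 - lam0

def lam(p):
    return p * lam1 + (1 - p) * lam0

def jump(p):
    return lam1 * p / lam(p)

def cutoff(N, dt=0.02, m=250, iters=1000):
    """Crude discrete-time value iteration for the N-player cooperative
    problem (per-capita payoffs); N = 1 is the single-agent problem.
    Returns the approximate belief below which all experimentation stops."""
    grid = [i / m for i in range(m + 1)]
    u = [s] * (m + 1)
    disc = exp(-r * dt)

    def interp(p, v):
        p = min(max(p, 0.0), 1.0)
        i = min(int(p * m), m - 1)
        w = p * m - i
        return (1 - w) * v[i] + w * v[i + 1]

    for _ in range(iters):
        new = []
        for p in grid:
            # all N players on R: flow lam(p)*h each, news at rate N*lam(p),
            # downward drift N*dlam*p*(1-p) when no news arrives
            pr = N * lam(p) * dt
            drift = p - N * dlam * p * (1 - p) * dt
            risky = (1 - disc) * lam(p) * h + disc * (
                pr * interp(jump(p), u) + (1 - pr) * interp(drift, u))
            new.append(max(s, risky))   # K = 0 forever yields the safe flow s
        u = new

    # lowest belief at which experimenting beats the safe payoff
    return next(p for p, v in zip(grid, u) if v > s + 1e-6)

pm = (s - lam0 * h) / (dlam * h)   # myopic cut-off for comparison
p1, p2 = cutoff(1), cutoff(2)
print(p2, p1, pm)
```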
The Strategic Problem
Characterising Best Responses
Bellman equation for player n, given K¬n = K − kn:
un(p) = s + K¬n b(p, un) + max_{kn∈[0,1]} kn{b(p, un) − c(p)}
Indifference diagonal:
DK¬n = {(p, u) ∈ [0, 1] × ℝ₊ : u = s + K¬n c(p)}
[Figure: best responses in (p, u)-space; R is the best response to K¬n above the indifference diagonal DK¬n, S below it]
General Results for Markov Perfect Equilibria
(1) In any MPE, each player obtains at least the single-agent optimum payoff
(2) All MPE are inefficient (free-riding)
(3) There is no MPE where all players use cut-off strategies
(4) In any MPE, at least one player experiments at some beliefs below p∗1 (encouragement) if and only if λ0 > 0
No MPE in Cut-Off Strategies
[Figure: candidate two-player profile in cut-off strategies with cut-offs p1 < p2; value functions u1 and u2 against the diagonal D1, with R dominant above D1 and S the best response to K¬n = 1 below it]
Player 1 must change action at p2 as well as at p1
Encouragement Effect for λ0 > 0
Suppose all players play S at all beliefs p ≤ p∗1
un(p∗1) = V∗1(p∗1) = s and u′n(p∗1) = (V∗1)′(p∗1) = 0
b(p∗1, un) ≤ c(p∗1) = b(p∗1, V∗1)
⇒ un(j(p∗1)) ≤ V∗1(j(p∗1))
⇒ un(j(p∗1)) = V∗1(j(p∗1)) (since un ≥ V∗1 by result (1))
un − V∗1 minimal at j(p∗1)
⇒ u′n(j(p∗1)) ≤ (V∗1)′(j(p∗1))
un(j²(p∗1)) ≥ V∗1(j²(p∗1))
⇒ b(j(p∗1), un) ≥ b(j(p∗1), V∗1) > c(j(p∗1))
So all players must use R at the belief j(p∗1)
Encouragement Effect for λ0 > 0
Each player’s Bellman equation now yields
un(j(p∗1)) = s + N b(j(p∗1), un) − c(j(p∗1))
 ≥ s + N b(j(p∗1), V∗1) − c(j(p∗1))
 > s + b(j(p∗1), V∗1) − c(j(p∗1))
 = V∗1(j(p∗1))
This contradicts un(j(p∗1)) = V∗1(j(p∗1))
No Encouragement Effect for λ0 = 0
Encouragement effect rests on two conditions
• Experimentation by a pioneer must increase the likelihood that others will return to the risky arm
• These future experiments must be valuable to the pioneer
The second condition is not met for λ0 = 0
So for λ0 = 0, experimentation in any MPE stops at p∗1
Symmetric Equilibrium
No symmetric MPE in pure strategies
Indifference between R and S requires
b(p, u) = c(p)
— DDE with closed-form solution
Bellman equation then implies
k(p) = (u(p) − s) / ((N − 1)c(p))
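To spell this out: on the indifference locus b(p, u) = c(p) the max term vanishes, so player n’s Bellman equation reduces to u(p) = s + K¬n c(p), and in a symmetric profile K¬n = (N − 1)k(p), hence:

```latex
u(p) = s + (N-1)\,k(p)\,c(p)
\quad\Longrightarrow\quad
k(p) = \frac{u(p) - s}{(N-1)\,c(p)}
```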
Symmetric MPE
Construction and uniqueness by elementary methods
[Figure: symmetric MPE value function against the diagonal DN−1, with cut-offs pN < p†N: R played with probability 0 below pN, strict mixing on ]pN, p†N], R played with probability 1 above p†N]
Smooth pasting at pN
Comparative statics as N↑: common payoffs↑, pN↓, p†N↑
Symmetric MPE
Interior allocation between cut-offs pN < p†N
[Figure: symmetric-MPE experimentation intensity K(p): zero below pN, rising through the interior allocation on ]pN, p†N], equal to N above p†N; shown against the efficient cut-off p∗N and the single-agent cut-off p∗1]
Inefficiency: pN > p∗N (not reached in finite time)
Encouragement effect: pN < p∗1 if λ0 > 0
Asymmetric Equilibria
No asymmetric MPE are exhibited in Bolton & Harris (1999)
Keller, Rady & Cripps (2005):
• λ0 = 0 (no encouragement effect)
• variety of asymmetric MPE in simple strategies
• complete classification for N = 2
• most inequitable MPE for arbitrary N
• higher average payoff than symmetric MPE
Keller & Rady (2010):
• λ0 > 0 (encouragement effect)
• construction of a particular asymmetric MPE
• Pareto improvement over symmetric MPE
The Most Inequitable MPE for λ0 = 0 and N = 2
[Figure: most inequitable MPE for λ0 = 0 and N = 2 in (p, u)-space; value functions u1 and u2 against the diagonal D1, with the single-agent cut-off p∗1, the myopic cut-off pm, and switching points p1 and p2]
Average payoff higher than in symmetric MPE
Average payoff increases as burden of experimentation is shared more equitably
Constructing Asymmetric MPE for λ0 > 0
Two ideas:
• Give all players identical continuation values after a success on a risky arm
• Let players alternate between the roles of experimenter and free-rider before all experimentation stops
Construction below for N = 2 generalises to arbitrary N
Asymmetric MPE for λ0 > 0 and N = 2
[Figure: asymmetric MPE for λ0 > 0 and N = 2; total intensity K(p) equals 0 on [0, p♭], 1 on ]p♭, p♯], lies strictly between 1 and 2 on ]p♯, p‡], and equals 2 above p‡; value functions shown against the diagonals D1/2 and D1]
• On ]p♯, 1] both players play symmetrically
• On ]p♭, p♯] players take turns playing R
• On [0, p♭] both players play S
Strictly higher average payoff on ]p♭, 1] than in the symmetric equilibrium
Pareto improvement for sufficiently frequent turns on ]p♭, p♯]
Why is Taking Turns More Efficient?
Under symmetry:
• A player who deviates to the safe action slows down the gradual slide of beliefs towards more pessimism
• This makes the other players experiment more than they would on the equilibrium path
In the alternation phase of an asymmetric MPE:
• A deviation from the risky to the safe action freezes the belief in its current state
• This delays the time at which another player takes over
Best, Worst and Most Inequitable MPE
Keller, Rady & Cripps (2005):
• λ0 = 0
• Symmetric MPE yields uniformly lowest average payoffs
• For N = 2, the limits as λ0 ↓ 0 of the MPE just constructed yield uniformly highest average payoffs
• For N = 2, the most inequitable MPE yield uniformly extremal individual payoffs; these are in simple strategies
Do such MPE exist for λ0 > 0?
Consider simple MPE for N = 2 and small λ0 > 0
Large Jumps
For λ0 > 0 small enough:
• j(p∗2) ≥ pm
• Everybody plays R after a success
• We can again give the players common continuation values after a success on a risky arm
Precise condition:
λ0 (λ0/λ1)^{λ0/∆λ} ≤ r/2
Sufficient condition:
λ0 ≤ r/2
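The sufficient condition follows because λ0 ≤ λ1 makes the factor (λ0/λ1)^{λ0/∆λ} at most one. A brute-force sanity check over random parameter draws (purely illustrative):

```python
import random

random.seed(0)

def precise(lam0, lam1, r):
    """Large-jumps condition: lam0 * (lam0/lam1)^(lam0/(lam1-lam0)) <= r/2."""
    return lam0 * (lam0 / lam1) ** (lam0 / (lam1 - lam0)) <= r / 2

# whenever lam0 <= r/2 holds, the precise condition should hold as well
ok = True
for _ in range(10_000):
    lam1 = random.uniform(0.1, 5.0)
    lam0 = random.uniform(0.0, lam1 * 0.99)   # ensure lam1 > lam0 >= 0
    r = random.uniform(0.01, 5.0)
    if lam0 <= r / 2 and not precise(lam0, lam1, r):
        ok = False
print(ok)   # True: the sufficient condition implies the precise one
```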
A Simple MPE for N = 2
[Figure: simple MPE for N = 2 in (p, u)-space; R dominant above the diagonal D1, S and R mutual best responses below it, with three switching points]
• on ]p̄, 1] both players play R
• on ]ps, p̄] player 1 plays S, player 2 plays R
• on ]p, ps] player 1 plays R, player 2 plays S
• on [0, p] both players play S
Lower average payoff than in the previous MPE
◦ Monotonicity? ‘Anticipation’
Rewarding the Last Experimenter
• Maintain assumption of ‘large jumps’
• Relax payoff symmetry above D1
– Give player 1 higher payoff at j(p)
– Two effects:
◦ Lower intensity just to the right of p
◦ Player 1 is willing to play R left of p
– Average payoff
◦ decreases at high beliefs
◦ increases around p
Rewarding the Last Experimenter
• Consequences:
– Can construct equilibria with progressively increasing payoff asymmetry, approaching the most inequitable MPE for λ0 = 0
– There exist no uniformly best or worst MPE when λ0 > 0
Rewarding the Last Experimenter
[Figure: construction in (p, u)-space, showing p∗2, pm, the diagonal D1, and candidate switching points p2, p1, p and ps]
• Fix (p2, u1) above D1; find p1, p and ps
• Fill in u2 above D1 and just below
• Fill in u2 above p . . . does it join up?
• Adjust (p2, u1) if necessary
Conclusion
Concluding Remarks
• Poisson bandits
– Alternative to Brownian setup
– News comes in ‘lumps’
• Asymmetric equilibria with encouragement
– Construction by elementary methods
– Taking turns is superior to ‘mixing’
– No uniformly ‘best’ or ‘worst’ MPE unless λ0 = 0
Concluding Remarks
• What else?
– Ongoing work on ‘bad news’ scenario
– Smooth pasting does not hold
– Characterization of symmetric MPE
– Viscosity solutions of the Bellman equation
• What next?
– Introduce negative correlation of type of risky arm across players
– Consider non-Markovian equilibria