The Canonical Reputation Model and Reputation Effects
George J. Mailath
University of Pennsylvania
Lecture at the Summer School in Economic Theory, Jerusalem, Israel
June 25, 2015
The Canonical Reputation Model
A long-lived player 1 faces a sequence of short-lived players, in the role of player 2 of the stage game.

$A_i$, finite action set for each player $i$.
$Y$, finite set of public signals of player 1's actions $a_1$.
$\rho(y \mid a_1)$, prob of signal $y \in Y$, given $a_1 \in A_1$.

Player 2's ex post stage game payoff is $u_2^*(a_1, a_2, y)$, and 2's ex ante payoff is
$$u_2(a_1, a_2) := \sum_{y \in Y} u_2^*(a_1, a_2, y)\,\rho(y \mid a_1).$$
Each player 2 maximizes her stage game payoff $u_2$.
The player 2s are uncertain about the characteristics of player 1: player 1's characteristics are described by his type, $\xi \in \Xi$.
All the player 2s have a common prior $\mu$ on $\Xi$. The type space is partitioned into two sets, $\Xi = \Xi_1 \cup \Xi_2$, where
$\Xi_1$ is the set of payoff types and $\Xi_2$ is the set of behavioral (or commitment, or action) types.

For $\xi \in \Xi_1$, player 1's ex post stage game payoff is $u_1^*(a_1, a_2, y, \xi)$, and 1's ex ante payoff is
$$u_1(a_1, a_2, \xi) := \sum_{y \in Y} u_1^*(a_1, a_2, y, \xi)\,\rho(y \mid a_1).$$
Each type $\xi \in \Xi_1$ of player 1 maximizes
$$(1 - \delta) \sum_{t \ge 0} \delta^t u_1(a_1^t, a_2^t, \xi).$$
Player 1 knows his type and observes all past actions and signals, while each player 2 observes only the history of past signals.

A strategy for player 1:
$$\sigma_1 : \bigcup_{t=0}^{\infty} (A_1 \times A_2 \times Y)^t \times \Xi \to \Delta(A_1).$$
If $\hat\xi \in \Xi_2$ is a simple action type, then there exists a unique $\hat\alpha_1 \in \Delta(A_1)$ such that $\sigma_1(h_1^t, \hat\xi) = \hat\alpha_1$ for all $h_1^t$.

A strategy for player 2:
$$\sigma_2 : \bigcup_{t=0}^{\infty} Y^t \to \Delta(A_2).$$
Space of outcomes: $\Omega := \Xi \times (A_1 \times A_2 \times Y)^{\infty}$.

A profile $(\sigma_1, \sigma_2)$ with prior $\mu$ induces the unconditional distribution $P \in \Delta(\Omega)$.
For a fixed simple type $\hat\xi = \xi(\hat\alpha_1)$, the probability measure on $\Omega$ conditioning on $\hat\xi$ (and so induced by $\hat\alpha_1$ in every period and $\sigma_2$) is denoted $\hat P \in \Delta(\Omega)$.
Denoting by $\tilde P$ the measure induced by $(\sigma_1, \sigma_2)$ and conditioning on $\xi \ne \hat\xi$, we have
$$P = \mu(\hat\xi)\hat P + (1 - \mu(\hat\xi))\tilde P. \tag{1}$$
Given a strategy profile $\sigma$, $U_1(\sigma, \xi)$ denotes the type-$\xi$ long-lived player's payoff in the repeated game,
$$U_1(\sigma, \xi) := E^{P}\!\left[(1 - \delta) \sum_{t=0}^{\infty} \delta^t u_1(a^t, y^t, \xi) \,\middle|\, \xi\right].$$
Denote by $\Gamma(\mu, \delta)$ the game of incomplete information.

Definition
A strategy profile $(\sigma_1', \sigma_2')$ is a Nash equilibrium of the game $\Gamma(\mu, \delta)$ if, for all $\xi \in \Xi_1$, $\sigma_1'$ maximizes $U_1(\sigma_1, \sigma_2', \xi)$ over player 1's repeated game strategies, and if for all $t$ and all $h_2^t \in H_2$ that have positive probability under $(\sigma_1', \sigma_2')$ and $\mu$ (i.e., $P(h_2^t) > 0$),
$$E^{P}\!\left[u_2(\sigma_1'(h_1^t, \xi), \sigma_2'(h_2^t)) \mid h_2^t\right] = \max_{a_2 \in A_2} E^{P}\!\left[u_2(\sigma_1'(h_1^t, \xi), a_2) \mid h_2^t\right].$$

Our goal: Reputation Bound (Fudenberg & Levine '89, '92). Fix a payoff type $\xi \in \Xi_1$. What is a "good" lower bound, uniform across Nash equilibria $\sigma'$ and across $\Xi$, for $U_1(\sigma', \xi)$?
Our tool (Gossner 2011): relative entropy.
Relative Entropy

$X$ a finite set of outcomes.
The relative entropy or Kullback-Leibler distance between probability distributions $p$ and $q$ over $X$ is
$$d(p\|q) := \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}.$$
By convention, $0 \log \frac{0}{q} = 0$ for all $q \in [0, 1]$ and $p \log \frac{p}{0} = \infty$ for all $p \in (0, 1]$. In our applications of relative entropy, the support of $q$ will always contain the support of $p$.
Since relative entropy is not symmetric, one often says $d(p\|q)$ is the relative entropy of $q$ with respect to $p$.
$d(p\|q) \ge 0$, and $d(p\|q) = 0 \iff p = q$.
Relative entropy is expected prediction error

$d(p\|q)$ measures an observer's expected prediction error on $x \in X$ when using $q$ while the true distribution is $p$:
A sample of $n$ i.i.d. draws from $X$ under $p$ has probability $\prod_x p(x)^{n_x}$, where $n_x$ is the number of realizations of $x$ in the sample.
The observer assigns the same sample probability $\prod_x q(x)^{n_x}$.
The log likelihood ratio is
$$L(x_1, \ldots, x_n) = \sum_x n_x \log \frac{p(x)}{q(x)},$$
and so
$$\tfrac{1}{n} L(x_1, \ldots, x_n) \to d(p\|q).$$
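A small simulation (again an illustrative sketch, not from the lecture) shows the normalized log likelihood ratio converging to $d(p\|q)$:

```python
import math, random

def loglik_ratio_per_draw(p, q, n, rng=random.Random(0)):
    """Average per-draw log likelihood ratio of p against q
    over n i.i.d. draws from the true distribution p."""
    outcomes, probs = zip(*p.items())
    total = 0.0
    for _ in range(n):
        x = rng.choices(outcomes, weights=probs)[0]
        total += math.log(p[x] / q[x])
    return total / n

p = {"H": 2/3, "L": 1/3}
q = {"H": 1/2, "L": 1/2}
for n in (100, 10_000, 1_000_000):
    print(n, loglik_ratio_per_draw(p, q, n))
# converges to d(p||q) = (5/3) log 2 - log 3 ~ 0.0566
```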
The chain rule

Lemma
Suppose $P, Q \in \Delta(X \times Y)$, with $X$ and $Y$ finite sets. Then
$$d(P\|Q) = d(P_X\|Q_X) + \sum_x P_X(x)\, d(P_Y(\cdot\mid x)\|Q_Y(\cdot\mid x)) = d(P_X\|Q_X) + E^{P_X} d(P_Y(\cdot\mid x)\|Q_Y(\cdot\mid x)).$$

Proof.
$$d(P\|Q) = \sum_{x,y} P(x,y) \log \left( \frac{P_X(x)}{Q_X(x)} \cdot \frac{P(x,y)/P_X(x)}{Q(x,y)/Q_X(x)} \right)$$
$$= d(P_X\|Q_X) + \sum_{x,y} P(x,y) \log \frac{P_Y(y\mid x)}{Q_Y(y\mid x)}$$
$$= d(P_X\|Q_X) + \sum_x P_X(x) \sum_y P_Y(y\mid x) \log \frac{P_Y(y\mid x)}{Q_Y(y\mid x)}.$$
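To see the decomposition concretely, here is a small numeric check (an illustrative sketch; the joint distributions are made up):

```python
import math

def d(p, q):
    """Relative entropy d(p||q) for dict-valued distributions."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

# Joint distributions on X x Y as dicts {(x, y): prob} (made-up numbers).
P = {("a","y0"): 0.30, ("a","y1"): 0.20, ("b","y0"): 0.10, ("b","y1"): 0.40}
Q = {("a","y0"): 0.25, ("a","y1"): 0.25, ("b","y0"): 0.25, ("b","y1"): 0.25}

def marginal_X(J):
    m = {}
    for (x, _), pr in J.items():
        m[x] = m.get(x, 0.0) + pr
    return m

def conditional_Y(J, x):
    mx = marginal_X(J)[x]
    return {y: pr / mx for (xx, y), pr in J.items() if xx == x}

PX, QX = marginal_X(P), marginal_X(Q)
lhs = d(P, Q)
rhs = d(PX, QX) + sum(PX[x] * d(conditional_Y(P, x), conditional_Y(Q, x))
                      for x in PX)
print(lhs, rhs)   # the two sides agree (chain rule)
```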
A grain of truth

Lemma
Let $X$ be a finite set of outcomes. Suppose $p, p' \in \Delta(X)$ and $q = \varepsilon p + (1 - \varepsilon)p'$ for some $\varepsilon > 0$. Then,
$$d(p\|q) \le -\log \varepsilon.$$

Proof.
Since $q(x)/p(x) \ge \varepsilon$, we have
$$-d(p\|q) = \sum_x p(x) \log \frac{q(x)}{p(x)} \ge \sum_x p(x) \log \varepsilon = \log \varepsilon.$$
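A quick numeric check of the bound (sketch; reuses the `d` helper defined above):

```python
import math

def d(p, q):
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

p  = {"H": 0.9, "L": 0.1}
pp = {"H": 0.2, "L": 0.8}
for eps in (0.5, 0.1, 0.01):
    q = {x: eps * p[x] + (1 - eps) * pp[x] for x in p}
    print(eps, d(p, q), -math.log(eps))  # d(p||q) <= -log eps in each case
```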
Back to reputations!

Fix $\hat\alpha_1 \in \Delta(A_1)$ and suppose $\mu(\xi(\hat\alpha_1)) > 0$. In a Nash equilibrium, at history $h_2^t$, $\sigma_2(h_2^t)$ is a best response to
$$\alpha_1(h_2^t) := E^{P}[\sigma_1(h_1^t, \xi) \mid h_2^t] \in \Delta(A_1),$$
that is, $\sigma_2(h_2^t)$ maximizes
$$\sum_{a_1} \sum_y u_2^*(a_1, a_2, y)\,\rho(y \mid a_1)\,\alpha_1(a_1 \mid h_2^t).$$
At $h_2^t$, 2's predicted distribution on the signal $y^t$ is
$$p(h_2^t) := \rho(\cdot \mid \alpha_1(h_2^t)) = \sum_{a_1} \rho(\cdot \mid a_1)\,\alpha_1(a_1 \mid h_2^t).$$
If player 1 plays $\hat\alpha_1$, the true distribution on $y^t$ is
$$\hat p := \rho(\cdot \mid \hat\alpha_1) = \sum_{a_1} \rho(\cdot \mid a_1)\,\hat\alpha_1(a_1).$$
Player 2's one-step ahead prediction error is $d(\hat p \,\|\, p(h_2^t))$.
Bounding prediction errors

Player 2 is best responding to an action profile $\alpha_1(h_2^t)$ that is $d(\hat p\|p(h_2^t))$-close to $\hat\alpha_1$ (as measured by the relative entropy of the induced signals).
To bound player 1's payoff, it suffices to bound the number of periods in which $d(\hat p\|p(h_2^t))$ is large.

For any $t$, $P_2^t$ is the marginal of $P$ on $Y^t$. Then,
$$P_2^t = \mu(\hat\xi)\hat P_2^t + (1 - \mu(\hat\xi))\tilde P_2^t,$$
and so
$$d(\hat P_2^t \,\|\, P_2^t) \le -\log \mu(\hat\xi).$$
Applying the chain rule:
$$-\log \mu(\hat\xi) \ge d(\hat P_2^t \,\|\, P_2^t) = d(\hat P_2^{t-1} \,\|\, P_2^{t-1}) + E^{\hat P} d(\hat p \,\|\, p(h_2^{t-1})) = \sum_{\tau=0}^{t-1} E^{\hat P} d(\hat p \,\|\, p(h_2^\tau)).$$
Since this holds for all $t$,
$$\sum_{\tau=0}^{\infty} E^{\hat P} d(\hat p \,\|\, p(h_2^\tau)) \le -\log \mu(\hat\xi).$$
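One can watch this bound operate in a toy simulation (a sketch under assumptions that are mine, not the lecture's: the commitment type plays H every period, the normal type plays L every period, and a binary signal is g with probability 0.8 after H and 0.4 after L). Player 2's expected cumulative one-step prediction error, estimated over many sample paths drawn under $\hat P$, stays below $-\log\mu(\hat\xi)$:

```python
import math, random

def d(p, q):
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

rho = {"H": {"g": 0.8, "b": 0.2}, "L": {"g": 0.4, "b": 0.6}}
mu0, T, paths = 0.1, 200, 2000
rng = random.Random(1)

total = 0.0
for _ in range(paths):
    mu = mu0
    for t in range(T):
        p_hat = rho["H"]                              # true signal dsn under commitment type
        p_t = {y: mu * rho["H"][y] + (1 - mu) * rho["L"][y] for y in p_hat}
        total += d(p_hat, p_t)                        # one-step prediction error
        y = "g" if rng.random() < p_hat["g"] else "b" # draw the signal under P-hat
        mu = mu * rho["H"][y] / p_t[y]                # Bayes update on the commitment type
print(total / paths, "<=", -math.log(mu0))            # bound: -log mu0 ~ 2.303
```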
From prediction bounds to payoff bounds

Definition
An action $\alpha_2 \in \Delta(A_2)$ is an $\varepsilon$-entropy confirming best response to $\alpha_1 \in \Delta(A_1)$ if there exists $\alpha_1' \in \Delta(A_1)$ such that
1. $\alpha_2$ is a best response to $\alpha_1'$; and
2. $d(\rho(\cdot\mid\alpha_1)\,\|\,\rho(\cdot\mid\alpha_1')) \le \varepsilon$.
The set of $\varepsilon$-entropy confirming best responses to $\alpha_1$ is denoted $B_\varepsilon^d(\alpha_1)$.

In a Nash equilibrium, at any on-the-equilibrium-path history $h_2^t$, player 2's action is a $d(\hat p\|p(h_2^t))$-entropy confirming best response to $\hat\alpha_1$.

Define, for all payoff types $\xi \in \Xi_1$,
$$v_{\alpha_1}^\xi(\varepsilon) := \min_{\alpha_2 \in B_\varepsilon^d(\alpha_1)} u_1(\alpha_1, \alpha_2, \xi),$$
and denote by $w_{\alpha_1}^\xi$ the largest convex function below $v_{\alpha_1}^\xi$.
The product-choice game I

          c       s
    H   2, 3    0, 2
    L   3, 0    1, 1

Suppose $\hat\alpha_1 = 1 \circ H$.
$c$ is the unique best response to $\alpha_1$ if $\alpha_1(H) > \tfrac12$.
$s$ is also a best response to $\alpha_1$ if $\alpha_1(H) = \tfrac12$.
$$d(1 \circ H \,\|\, \tfrac12 \circ H + \tfrac12 \circ L) = \log \tfrac{1}{1/2} = \log 2 \approx 0.69.$$
$$v_H^\xi(\varepsilon) = \begin{cases} 2, & \text{if } \varepsilon < \log 2, \\ 0, & \text{if } \varepsilon \ge \log 2. \end{cases}$$
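For this game, $v_H^\xi$ and its convexification are easy to compute directly (a sketch, assuming perfect monitoring of $a_1$ so that signals coincide with actions; the payoff numbers are those of the table above):

```python
import math

log2 = math.log(2)

def v_H(eps):
    """Worst payoff to player 1 from playing H against an
    eps-entropy confirming best response: player 2 can play s
    only if some alpha_1' with alpha_1'(H) <= 1/2 is within eps,
    which requires eps >= d(1 H || 1/2 H + 1/2 L) = log 2."""
    return 2 if eps < log2 else 0

def w_H(eps):
    """Largest convex function below v_H: the line from (0, 2)
    to (log 2, 0), and 0 thereafter."""
    return max(0.0, 2 * (1 - eps / log2))

print(v_H(0.5), w_H(0.5))   # 2, ~0.557  (0.5 < log 2 ~ 0.693)
print(v_H(0.7), w_H(0.7))   # 0, 0.0
```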
A picture is worth a thousand words

[Figure: player 1 payoffs plotted against $\varepsilon$. $v_H^\xi$ is the step function equal to 2 for $\varepsilon < \log 2$ and 0 thereafter; $w_H^\xi$ is its convexification, the line from $(0, 2)$ to $(\log 2, 0)$ and 0 beyond.]
The product-choice game II

          c       s
    H   2, 3    0, 2
    L   3, 0    1, 1

Suppose $\hat\alpha_1 = \tfrac23 \circ H + \tfrac13 \circ L$.
$c$ is the unique best response to $\alpha_1$ if $\alpha_1(H) > \tfrac12$.
$s$ is also a best response to $\alpha_1$ if $\alpha_1(H) = \tfrac12$.
$$d\!\left(\hat\alpha_1 \,\middle\|\, \tfrac12 \circ H + \tfrac12 \circ L\right) = \tfrac23 \log \tfrac{2/3}{1/2} + \tfrac13 \log \tfrac{1/3}{1/2} = \tfrac53 \log 2 - \log 3 =: \bar\varepsilon \approx 0.06.$$
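The same `d` helper from above verifies this number (sketch):

```python
import math

def d(p, q):
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

alpha_hat = {"H": 2/3, "L": 1/3}
mix = {"H": 1/2, "L": 1/2}
eps_bar = d(alpha_hat, mix)
print(eps_bar, (5/3) * math.log(2) - math.log(3))  # both ~0.0566
```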
Two thousand?

[Figure: player 1 payoffs plotted against $\varepsilon$, comparing $v_{\hat\alpha_1}^\xi$ and $w_{\hat\alpha_1}^\xi$ (which start at $2\tfrac13$ and drop at $\bar\varepsilon$ to $\tfrac13$) with $v_H^\xi$ and $w_H^\xi$ (which start at 2 and drop at $\log 2$).]
The reputation bound

Proposition
Suppose the action type $\hat\xi = \xi(\hat\alpha_1)$ has positive prior probability, $\mu(\hat\xi) > 0$, for some potentially mixed action $\hat\alpha_1 \in \Delta(A_1)$. Then player 1 type $\xi$'s payoff in any Nash equilibrium of the game $\Gamma(\mu, \delta)$ is greater than or equal to $w_{\hat\alpha_1}^\xi(\hat\varepsilon)$, where
$$\hat\varepsilon := -(1 - \delta) \log \mu(\hat\xi).$$

The only aspect of the set of types and the prior that plays a role in the proposition is the probability assigned to $\hat\xi$. The set of types may be very large, and other quite crazy types may receive significant probability under the prior $\mu$.
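In the product-choice example with the pure commitment type, the bound is easy to evaluate (a sketch; $\mu(\hat\xi) = 0.1$ is an assumed prior weight, and `w_H` is the convexification computed earlier):

```python
import math

log2 = math.log(2)

def w_H(eps):
    # Convexification of v_H in the product-choice game (signals = actions).
    return max(0.0, 2 * (1 - eps / log2))

mu_hat = 0.1                                   # assumed prior on the H-commitment type
for delta in (0.9, 0.99, 0.999):
    eps_hat = -(1 - delta) * math.log(mu_hat)  # eps-hat = -(1 - delta) log mu
    print(delta, eps_hat, w_H(eps_hat))
# As delta -> 1, eps_hat -> 0 and the bound approaches v_H(0) = 2.
```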
The proof

Since in any Nash equilibrium $(\sigma_1', \sigma_2')$, each payoff type $\xi$ has the option of playing $\hat\alpha_1$ in every period, we have
$$\begin{aligned}
U_1(\sigma', \xi) &= (1 - \delta) \sum_{t=0}^{\infty} \delta^t E^{P}[u_1(\sigma_1'(h_1^t), \sigma_2'(h_2^t), \xi) \mid \xi] \\
&\ge (1 - \delta) \sum_{t=0}^{\infty} \delta^t E^{\hat P} u_1(\hat\alpha_1, \sigma_2'(h_2^t), \xi) \\
&\ge (1 - \delta) \sum_{t=0}^{\infty} \delta^t E^{\hat P} v_{\hat\alpha_1}^\xi\big(d(\hat p\|p(h_2^t))\big) \\
&\ge (1 - \delta) \sum_{t=0}^{\infty} \delta^t E^{\hat P} w_{\hat\alpha_1}^\xi\big(d(\hat p\|p(h_2^t))\big) \\
&\ge w_{\hat\alpha_1}^\xi\!\left((1 - \delta) \sum_{t=0}^{\infty} \delta^t E^{\hat P} d(\hat p\|p(h_2^t))\right) \\
&\ge w_{\hat\alpha_1}^\xi\!\left(-(1 - \delta) \log \mu(\hat\xi)\right) = w_{\hat\alpha_1}^\xi(\hat\varepsilon).
\end{aligned}$$
(The penultimate inequality is Jensen's inequality, since $w_{\hat\alpha_1}^\xi$ is convex and the weights $(1-\delta)\delta^t$ sum to one; the last uses $\sum_t E^{\hat P} d(\hat p\|p(h_2^t)) \le -\log\mu(\hat\xi)$ together with $w_{\hat\alpha_1}^\xi$ being decreasing.)
Patient player 1

Corollary
Suppose the action type $\hat\xi = \xi(\hat\alpha_1)$ has positive prior probability, $\mu(\hat\xi) > 0$, for some potentially mixed action $\hat\alpha_1 \in \Delta(A_1)$. Then, for all $\xi \in \Xi_1$ and $\eta > 0$, there exists $\bar\delta < 1$ such that, for all $\delta \in (\bar\delta, 1)$, player 1 type $\xi$'s payoff in any Nash equilibrium of the game $\Gamma(\mu, \delta)$ is greater than or equal to
$$v_{\hat\alpha_1}^\xi(0) - \eta.$$
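In the product-choice example the required patience is explicit (a sketch, again with the assumed $\mu(\hat\xi) = 0.1$): the bound $w_H^\xi(\hat\varepsilon) \ge 2 - \eta$ holds whenever $\hat\varepsilon \le \eta \log 2 / 2$, i.e.

```python
import math

mu_hat = 0.1                 # assumed prior on the H-commitment type
for eta in (0.5, 0.1, 0.01):
    # Solve -(1 - delta) log mu <= eta log2 / 2 for delta.
    delta_bar = 1 - eta * math.log(2) / (-2 * math.log(mu_hat))
    print(eta, delta_bar)
# e.g. eta = 0.1 requires delta > ~0.985
```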
When does $B_0^d(\alpha_1) = \mathrm{BR}(\alpha_1)$?

Suppose $\rho(\cdot\mid a_1) \ne \rho(\cdot\mid a_1')$ for all $a_1 \ne a_1'$. Then the pure action Stackelberg payoff is a reputation lower bound, provided the simple Stackelberg type has positive probability.
Suppose $\rho(\cdot\mid \alpha_1) \ne \rho(\cdot\mid \alpha_1')$ for all $\alpha_1 \ne \alpha_1'$. Then the mixed action Stackelberg payoff is a reputation lower bound, provided the prior includes in its support a dense subset of $\Delta(A_1)$.
How general is the result?

The same argument (with slightly worse notation) works if the monitoring distribution depends on both players' actions (though statistical identifiability is a more demanding requirement, particularly for extensive form stage games).
The same argument (with slightly worse notation) also works if the game has private monitoring. Indeed, notice that player 1's observation of the signal played no role in the proof.