The Canonical Reputation Model and Reputation Effects
George J. Mailath
University of Pennsylvania
Lecture at the Summer School in Economic Theory, Jerusalem, Israel
June 25, 2015
The Canonical Reputation Model
A long-lived player 1 faces a sequence of short-lived players, in the role of player 2 of the stage game.

$A_i$, finite action set for each player $i$.
$Y$, finite set of public signals of player 1's actions $a_1$.
$\rho(y \mid a_1)$, prob of signal $y \in Y$, given $a_1 \in A_1$.

Player 2's ex post stage game payoff is $u_2^*(a_1, a_2, y)$, and 2's ex ante payoff is
$$u_2(a_1, a_2) := \sum_{y \in Y} u_2^*(a_1, a_2, y)\,\rho(y \mid a_1).$$
Each player 2 maximizes her stage game payoff $u_2$.
The player 2s are uncertain about the characteristics of player 1: player 1's characteristics are described by his type, $\xi \in \Xi$.
All the player 2s have a common prior $\mu$ on $\Xi$. The type space is partitioned into two sets, $\Xi = \Xi_1 \cup \Xi_2$, where
$\Xi_1$ is the set of payoff types and $\Xi_2$ is the set of behavioral (or commitment, or action) types.

For $\xi \in \Xi_1$, player 1's ex post stage game payoff is $u_1^*(a_1, a_2, y, \xi)$, and 1's ex ante payoff is
$$u_1(a_1, a_2, \xi) := \sum_{y \in Y} u_1^*(a_1, a_2, y, \xi)\,\rho(y \mid a_1).$$
Each type $\xi \in \Xi_1$ of player 1 maximizes
$$(1 - \delta) \sum_{t \ge 0} \delta^t u_1(a_1^t, a_2^t, \xi).$$
Player 1 knows his type and observes all past actions and signals, while each player 2 observes only the history of past signals.

A strategy for player 1:
$$\sigma_1 : \bigcup_{t=0}^{\infty} (A_1 \times A_2 \times Y)^t \times \Xi \to \Delta(A_1).$$
If $\hat\xi \in \Xi_2$ is a simple action type, then there exists a unique $\hat\alpha_1 \in \Delta(A_1)$ such that $\sigma_1(h_1^t, \hat\xi) = \hat\alpha_1$ for all $h_1^t$.

A strategy for player 2:
$$\sigma_2 : \bigcup_{t=0}^{\infty} Y^t \to \Delta(A_2).$$
Space of outcomes: $\Omega := \Xi \times (A_1 \times A_2 \times Y)^{\infty}$.

A profile $(\sigma_1, \sigma_2)$ with prior $\mu$ induces the unconditional distribution $P \in \Delta(\Omega)$.
For a fixed simple type $\hat\xi = \xi(\hat\alpha_1)$, the probability measure on $\Omega$ conditioning on $\hat\xi$ (and so induced by $\hat\alpha_1$ in every period and $\sigma_2$) is denoted $\hat P \in \Delta(\Omega)$.
Denoting by $\tilde P$ the measure induced by $(\sigma_1, \sigma_2)$ and conditioning on $\xi \ne \hat\xi$, we have
$$P = \mu(\hat\xi)\hat P + (1 - \mu(\hat\xi))\tilde P. \tag{1}$$
Given a strategy profile $\sigma$, $U_1(\sigma, \xi)$ denotes the type-$\xi$ long-lived player's payoff in the repeated game,
$$U_1(\sigma, \xi) := E^{P}\!\left[(1 - \delta) \sum_{t=0}^{\infty} \delta^t u_1(a^t, y^t, \xi) \,\middle|\, \xi\right].$$
Denote by $\Gamma(\mu, \delta)$ the game of incomplete information.

Definition
A strategy profile $(\sigma_1', \sigma_2')$ is a Nash equilibrium of the game $\Gamma(\mu, \delta)$ if, for all $\xi \in \Xi_1$, $\sigma_1'$ maximizes $U_1(\sigma_1, \sigma_2', \xi)$ over player 1's repeated game strategies, and if for all $t$ and all $h_2^t \in H_2$ that have positive probability under $(\sigma_1', \sigma_2')$ and $\mu$ (i.e., $P(h_2^t) > 0$),
$$E^{P}\!\left[u_2(\sigma_1'(h_1^t, \xi), \sigma_2'(h_2^t)) \mid h_2^t\right] = \max_{a_2 \in A_2} E^{P}\!\left[u_2(\sigma_1'(h_1^t, \xi), a_2) \mid h_2^t\right].$$

Our goal: Reputation Bound (Fudenberg & Levine '89, '92). Fix a payoff type $\xi \in \Xi_1$. What is a "good" lower bound, uniform across Nash equilibria $\sigma'$ and across $\Xi$, for $U_1(\sigma', \xi)$?
Our tool (Gossner 2011): relative entropy.
Relative Entropy

$X$ a finite set of outcomes.
The relative entropy or Kullback-Leibler distance between probability distributions $p$ and $q$ over $X$ is
$$d(p\|q) := \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}.$$
By convention, $0 \log \frac{0}{q} = 0$ for all $q \in [0, 1]$ and $p \log \frac{p}{0} = \infty$ for all $p \in (0, 1]$. In our applications of relative entropy, the support of $q$ will always contain the support of $p$.
Since relative entropy is not symmetric, one often says $d(p\|q)$ is the relative entropy of $q$ with respect to $p$.
$d(p\|q) \ge 0$, and $d(p\|q) = 0 \iff p = q$.
Relative entropy is expected prediction error

$d(p\|q)$ measures an observer's expected prediction error on $x \in X$ when using $q$ while the true distribution is $p$:
A sample of $n$ i.i.d. draws from $X$ under $p$ has probability $\prod_x p(x)^{n_x}$, where $n_x$ is the number of realizations of $x$ in the sample.
The observer assigns the same sample probability $\prod_x q(x)^{n_x}$.
The log likelihood ratio is
$$L(x_1, \ldots, x_n) = \sum_x n_x \log \frac{p(x)}{q(x)},$$
and so
$$\tfrac{1}{n} L(x_1, \ldots, x_n) \to d(p\|q).$$
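A small simulation (again an illustrative sketch, not from the lecture) shows the normalized log likelihood ratio converging to $d(p\|q)$:

```python
import math, random

def loglik_ratio_per_draw(p, q, n, rng=random.Random(0)):
    """Average per-draw log likelihood ratio of p against q
    over n i.i.d. draws from the true distribution p."""
    outcomes, probs = zip(*p.items())
    total = 0.0
    for _ in range(n):
        x = rng.choices(outcomes, weights=probs)[0]
        total += math.log(p[x] / q[x])
    return total / n

p = {"H": 2/3, "L": 1/3}
q = {"H": 1/2, "L": 1/2}
for n in (100, 10_000, 1_000_000):
    print(n, loglik_ratio_per_draw(p, q, n))
# converges to d(p||q) = (5/3) log 2 - log 3 ~ 0.0566
```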
The chain rule

Lemma
Suppose $P, Q \in \Delta(X \times Y)$, with $X$ and $Y$ finite sets. Then
$$d(P\|Q) = d(P_X\|Q_X) + \sum_x P_X(x)\, d(P_Y(\cdot\mid x)\|Q_Y(\cdot\mid x)) = d(P_X\|Q_X) + E^{P_X} d(P_Y(\cdot\mid x)\|Q_Y(\cdot\mid x)).$$

Proof.
$$d(P\|Q) = \sum_{x,y} P(x,y) \log \left( \frac{P_X(x)}{Q_X(x)} \cdot \frac{P(x,y)/P_X(x)}{Q(x,y)/Q_X(x)} \right)$$
$$= d(P_X\|Q_X) + \sum_{x,y} P(x,y) \log \frac{P_Y(y\mid x)}{Q_Y(y\mid x)}$$
$$= d(P_X\|Q_X) + \sum_x P_X(x) \sum_y P_Y(y\mid x) \log \frac{P_Y(y\mid x)}{Q_Y(y\mid x)}.$$
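To see the decomposition concretely, here is a small numeric check (an illustrative sketch; the joint distributions are made up):

```python
import math

def d(p, q):
    """Relative entropy d(p||q) for dict-valued distributions."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

# Joint distributions on X x Y as dicts {(x, y): prob} (made-up numbers).
P = {("a","y0"): 0.30, ("a","y1"): 0.20, ("b","y0"): 0.10, ("b","y1"): 0.40}
Q = {("a","y0"): 0.25, ("a","y1"): 0.25, ("b","y0"): 0.25, ("b","y1"): 0.25}

def marginal_X(J):
    m = {}
    for (x, _), pr in J.items():
        m[x] = m.get(x, 0.0) + pr
    return m

def conditional_Y(J, x):
    mx = marginal_X(J)[x]
    return {y: pr / mx for (xx, y), pr in J.items() if xx == x}

PX, QX = marginal_X(P), marginal_X(Q)
lhs = d(P, Q)
rhs = d(PX, QX) + sum(PX[x] * d(conditional_Y(P, x), conditional_Y(Q, x))
                      for x in PX)
print(lhs, rhs)   # the two sides agree (chain rule)
```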
A grain of truth

Lemma
Let $X$ be a finite set of outcomes. Suppose $p, p' \in \Delta(X)$ and $q = \varepsilon p + (1 - \varepsilon)p'$ for some $\varepsilon > 0$. Then,
$$d(p\|q) \le -\log \varepsilon.$$

Proof.
Since $q(x)/p(x) \ge \varepsilon$, we have
$$-d(p\|q) = \sum_x p(x) \log \frac{q(x)}{p(x)} \ge \sum_x p(x) \log \varepsilon = \log \varepsilon.$$
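A quick numeric check of the bound (sketch; reuses the `d` helper defined above):

```python
import math

def d(p, q):
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

p  = {"H": 0.9, "L": 0.1}
pp = {"H": 0.2, "L": 0.8}
for eps in (0.5, 0.1, 0.01):
    q = {x: eps * p[x] + (1 - eps) * pp[x] for x in p}
    print(eps, d(p, q), -math.log(eps))  # d(p||q) <= -log eps in each case
```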
Back to reputations!

Fix $\hat\alpha_1 \in \Delta(A_1)$ and suppose $\mu(\xi(\hat\alpha_1)) > 0$. In a Nash equilibrium, at history $h_2^t$, $\sigma_2(h_2^t)$ is a best response to
$$\alpha_1(h_2^t) := E^{P}[\sigma_1(h_1^t, \xi) \mid h_2^t] \in \Delta(A_1),$$
that is, $\sigma_2(h_2^t)$ maximizes
$$\sum_{a_1} \sum_y u_2^*(a_1, a_2, y)\,\rho(y \mid a_1)\,\alpha_1(a_1 \mid h_2^t).$$
At $h_2^t$, 2's predicted distribution on the signal $y^t$ is
$$p(h_2^t) := \rho(\cdot \mid \alpha_1(h_2^t)) = \sum_{a_1} \rho(\cdot \mid a_1)\,\alpha_1(a_1 \mid h_2^t).$$
If player 1 plays $\hat\alpha_1$, the true distribution on $y^t$ is
$$\hat p := \rho(\cdot \mid \hat\alpha_1) = \sum_{a_1} \rho(\cdot \mid a_1)\,\hat\alpha_1(a_1).$$
Player 2's one-step ahead prediction error is $d(\hat p \,\|\, p(h_2^t))$.
Bounding prediction errors

Player 2 is best responding to an action profile $\alpha_1(h_2^t)$ that is $d(\hat p\|p(h_2^t))$-close to $\hat\alpha_1$ (as measured by the relative entropy of the induced signals).
To bound player 1's payoff, it suffices to bound the number of periods in which $d(\hat p\|p(h_2^t))$ is large.

For any $t$, $P_2^t$ is the marginal of $P$ on $Y^t$. Then,
$$P_2^t = \mu(\hat\xi)\hat P_2^t + (1 - \mu(\hat\xi))\tilde P_2^t,$$
and so
$$d(\hat P_2^t \,\|\, P_2^t) \le -\log \mu(\hat\xi).$$
Applying the chain rule:
$$-\log \mu(\hat\xi) \ge d(\hat P_2^t \,\|\, P_2^t) = d(\hat P_2^{t-1} \,\|\, P_2^{t-1}) + E^{\hat P} d(\hat p \,\|\, p(h_2^{t-1})) = \sum_{\tau=0}^{t-1} E^{\hat P} d(\hat p \,\|\, p(h_2^\tau)).$$
Since this holds for all $t$,
$$\sum_{\tau=0}^{\infty} E^{\hat P} d(\hat p \,\|\, p(h_2^\tau)) \le -\log \mu(\hat\xi).$$
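One can watch this bound operate in a toy simulation (a sketch under assumptions that are mine, not the lecture's: the commitment type plays H every period, the normal type plays L every period, and a binary signal is g with probability 0.8 after H and 0.4 after L). Player 2's expected cumulative one-step prediction error, estimated over many sample paths drawn under $\hat P$, stays below $-\log\mu(\hat\xi)$:

```python
import math, random

def d(p, q):
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

rho = {"H": {"g": 0.8, "b": 0.2}, "L": {"g": 0.4, "b": 0.6}}
mu0, T, paths = 0.1, 200, 2000
rng = random.Random(1)

total = 0.0
for _ in range(paths):
    mu = mu0
    for t in range(T):
        p_hat = rho["H"]                              # true signal dsn under commitment type
        p_t = {y: mu * rho["H"][y] + (1 - mu) * rho["L"][y] for y in p_hat}
        total += d(p_hat, p_t)                        # one-step prediction error
        y = "g" if rng.random() < p_hat["g"] else "b" # draw the signal under P-hat
        mu = mu * rho["H"][y] / p_t[y]                # Bayes update on the commitment type
print(total / paths, "<=", -math.log(mu0))            # bound: -log mu0 ~ 2.303
```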
From prediction bounds to payoff bounds

Definition
An action $\alpha_2 \in \Delta(A_2)$ is an $\varepsilon$-entropy confirming best response to $\alpha_1 \in \Delta(A_1)$ if there exists $\alpha_1' \in \Delta(A_1)$ such that
1. $\alpha_2$ is a best response to $\alpha_1'$; and
2. $d(\rho(\cdot\mid\alpha_1)\,\|\,\rho(\cdot\mid\alpha_1')) \le \varepsilon$.
The set of $\varepsilon$-entropy confirming best responses to $\alpha_1$ is denoted $B_\varepsilon^d(\alpha_1)$.

In a Nash equilibrium, at any on-the-equilibrium-path history $h_2^t$, player 2's action is a $d(\hat p\|p(h_2^t))$-entropy confirming best response to $\hat\alpha_1$.

Define, for all payoff types $\xi \in \Xi_1$,
$$v_{\alpha_1}^\xi(\varepsilon) := \min_{\alpha_2 \in B_\varepsilon^d(\alpha_1)} u_1(\alpha_1, \alpha_2, \xi),$$
and denote by $w_{\alpha_1}^\xi$ the largest convex function below $v_{\alpha_1}^\xi$.
The product-choice game I

          c       s
    H   2, 3    0, 2
    L   3, 0    1, 1

Suppose $\hat\alpha_1 = 1 \circ H$.
$c$ is the unique best response to $\alpha_1$ if $\alpha_1(H) > \tfrac12$.
$s$ is also a best response to $\alpha_1$ if $\alpha_1(H) = \tfrac12$.
$$d(1 \circ H \,\|\, \tfrac12 \circ H + \tfrac12 \circ L) = \log \tfrac{1}{1/2} = \log 2 \approx 0.69.$$
$$v_H^\xi(\varepsilon) = \begin{cases} 2, & \text{if } \varepsilon < \log 2, \\ 0, & \text{if } \varepsilon \ge \log 2. \end{cases}$$
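For this game, $v_H^\xi$ and its convexification are easy to compute directly (a sketch, assuming perfect monitoring of $a_1$ so that signals coincide with actions; the payoff numbers are those of the table above):

```python
import math

log2 = math.log(2)

def v_H(eps):
    """Worst payoff to player 1 from playing H against an
    eps-entropy confirming best response: player 2 can play s
    only if some alpha_1' with alpha_1'(H) <= 1/2 is within eps,
    which requires eps >= d(1 H || 1/2 H + 1/2 L) = log 2."""
    return 2 if eps < log2 else 0

def w_H(eps):
    """Largest convex function below v_H: the line from (0, 2)
    to (log 2, 0), and 0 thereafter."""
    return max(0.0, 2 * (1 - eps / log2))

print(v_H(0.5), w_H(0.5))   # 2, ~0.557  (0.5 < log 2 ~ 0.693)
print(v_H(0.7), w_H(0.7))   # 0, 0.0
```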
A picture is worth a thousand words

[Figure: player 1 payoffs plotted against $\varepsilon$. $v_H^\xi$ is the step function equal to 2 for $\varepsilon < \log 2$ and 0 thereafter; $w_H^\xi$ is its convexification, the line from $(0, 2)$ to $(\log 2, 0)$ and 0 beyond.]
The product-choice game II

          c       s
    H   2, 3    0, 2
    L   3, 0    1, 1

Suppose $\hat\alpha_1 = \tfrac23 \circ H + \tfrac13 \circ L$.
$c$ is the unique best response to $\alpha_1$ if $\alpha_1(H) > \tfrac12$.
$s$ is also a best response to $\alpha_1$ if $\alpha_1(H) = \tfrac12$.
$$d\!\left(\hat\alpha_1 \,\middle\|\, \tfrac12 \circ H + \tfrac12 \circ L\right) = \tfrac23 \log \tfrac{2/3}{1/2} + \tfrac13 \log \tfrac{1/3}{1/2} = \tfrac53 \log 2 - \log 3 =: \bar\varepsilon \approx 0.06.$$
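The same `d` helper from above verifies this number (sketch):

```python
import math

def d(p, q):
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

alpha_hat = {"H": 2/3, "L": 1/3}
mix = {"H": 1/2, "L": 1/2}
eps_bar = d(alpha_hat, mix)
print(eps_bar, (5/3) * math.log(2) - math.log(3))  # both ~0.0566
```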
Two thousand?

[Figure: player 1 payoffs plotted against $\varepsilon$, comparing $v_{\hat\alpha_1}^\xi$ and $w_{\hat\alpha_1}^\xi$ (which start at $2\tfrac13$ and drop at $\bar\varepsilon$ to $\tfrac13$) with $v_H^\xi$ and $w_H^\xi$ (which start at 2 and drop at $\log 2$).]
The reputation bound

Proposition
Suppose the action type $\hat\xi = \xi(\hat\alpha_1)$ has positive prior probability, $\mu(\hat\xi) > 0$, for some potentially mixed action $\hat\alpha_1 \in \Delta(A_1)$. Then player 1 type $\xi$'s payoff in any Nash equilibrium of the game $\Gamma(\mu, \delta)$ is greater than or equal to $w_{\hat\alpha_1}^\xi(\hat\varepsilon)$, where
$$\hat\varepsilon := -(1 - \delta) \log \mu(\hat\xi).$$

The only aspect of the set of types and the prior that plays a role in the proposition is the probability assigned to $\hat\xi$. The set of types may be very large, and other quite crazy types may receive significant probability under the prior $\mu$.
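In the product-choice example with the pure commitment type, the bound is easy to evaluate (a sketch; $\mu(\hat\xi) = 0.1$ is an assumed prior weight, and `w_H` is the convexification computed earlier):

```python
import math

log2 = math.log(2)

def w_H(eps):
    # Convexification of v_H in the product-choice game (signals = actions).
    return max(0.0, 2 * (1 - eps / log2))

mu_hat = 0.1                                   # assumed prior on the H-commitment type
for delta in (0.9, 0.99, 0.999):
    eps_hat = -(1 - delta) * math.log(mu_hat)  # eps-hat = -(1 - delta) log mu
    print(delta, eps_hat, w_H(eps_hat))
# As delta -> 1, eps_hat -> 0 and the bound approaches v_H(0) = 2.
```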
The proof

Since in any Nash equilibrium $(\sigma_1', \sigma_2')$, each payoff type $\xi$ has the option of playing $\hat\alpha_1$ in every period, we have
$$\begin{aligned}
U_1(\sigma', \xi) &= (1 - \delta) \sum_{t=0}^{\infty} \delta^t E^{P}[u_1(\sigma_1'(h_1^t), \sigma_2'(h_2^t), \xi) \mid \xi] \\
&\ge (1 - \delta) \sum_{t=0}^{\infty} \delta^t E^{\hat P} u_1(\hat\alpha_1, \sigma_2'(h_2^t), \xi) \\
&\ge (1 - \delta) \sum_{t=0}^{\infty} \delta^t E^{\hat P} v_{\hat\alpha_1}^\xi\big(d(\hat p\|p(h_2^t))\big) \\
&\ge (1 - \delta) \sum_{t=0}^{\infty} \delta^t E^{\hat P} w_{\hat\alpha_1}^\xi\big(d(\hat p\|p(h_2^t))\big) \\
&\ge w_{\hat\alpha_1}^\xi\!\left((1 - \delta) \sum_{t=0}^{\infty} \delta^t E^{\hat P} d(\hat p\|p(h_2^t))\right) \\
&\ge w_{\hat\alpha_1}^\xi\!\left(-(1 - \delta) \log \mu(\hat\xi)\right) = w_{\hat\alpha_1}^\xi(\hat\varepsilon).
\end{aligned}$$
(The penultimate inequality is Jensen's inequality, since $w_{\hat\alpha_1}^\xi$ is convex and the weights $(1-\delta)\delta^t$ sum to one; the last uses $\sum_t E^{\hat P} d(\hat p\|p(h_2^t)) \le -\log\mu(\hat\xi)$ together with $w_{\hat\alpha_1}^\xi$ being decreasing.)
Patient player 1

Corollary
Suppose the action type $\hat\xi = \xi(\hat\alpha_1)$ has positive prior probability, $\mu(\hat\xi) > 0$, for some potentially mixed action $\hat\alpha_1 \in \Delta(A_1)$. Then, for all $\xi \in \Xi_1$ and $\eta > 0$, there exists $\bar\delta < 1$ such that, for all $\delta \in (\bar\delta, 1)$, player 1 type $\xi$'s payoff in any Nash equilibrium of the game $\Gamma(\mu, \delta)$ is greater than or equal to
$$v_{\hat\alpha_1}^\xi(0) - \eta.$$
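In the product-choice example the required patience is explicit (a sketch, again with the assumed $\mu(\hat\xi) = 0.1$): the bound $w_H^\xi(\hat\varepsilon) \ge 2 - \eta$ holds whenever $\hat\varepsilon \le \eta \log 2 / 2$, i.e.

```python
import math

mu_hat = 0.1                 # assumed prior on the H-commitment type
for eta in (0.5, 0.1, 0.01):
    # Solve -(1 - delta) log mu <= eta log2 / 2 for delta.
    delta_bar = 1 - eta * math.log(2) / (-2 * math.log(mu_hat))
    print(eta, delta_bar)
# e.g. eta = 0.1 requires delta > ~0.985
```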
When does $B_0^d(\alpha_1) = \mathrm{BR}(\alpha_1)$?

Suppose $\rho(\cdot\mid a_1) \ne \rho(\cdot\mid a_1')$ for all $a_1 \ne a_1'$. Then the pure action Stackelberg payoff is a reputation lower bound, provided the simple Stackelberg type has positive probability.
Suppose $\rho(\cdot\mid \alpha_1) \ne \rho(\cdot\mid \alpha_1')$ for all $\alpha_1 \ne \alpha_1'$. Then the mixed action Stackelberg payoff is a reputation lower bound, provided the prior includes in its support a dense subset of $\Delta(A_1)$.
How general is the result?

The same argument (with slightly worse notation) works if the monitoring distribution depends on both players' actions (though statistical identifiability is a more demanding requirement, particularly for extensive form stage games).
The same argument (with slightly worse notation) also works if the game has private monitoring. Indeed, notice that player 1's observation of the signal played no role in the proof.