Issues on the Border of Economics and Computation
Speaker: Dr. Michael Schapira
Topic: Dynamics in Games (slides on weighted-majority algorithms from Prof. Avrim Blum's course at CMU)

Reminder: n-Player Games
• Consider a game:
– Si is the set of (pure) strategies for player i
– S = S1 × S2 × … × Sn
– s = (s1, s2, …, sn) ∈ S is a vector of strategies
– ui : S → R is the payoff function for player i
• Notation: given a strategy vector s, let s-i = (s1, …, si-1, si+1, …, sn)
– the vector s with the i'th item omitted
• s is a (pure) Nash equilibrium if for every i, ui(si, s-i) ≥ ui(si', s-i) for every si' ∈ Si
Best-Response Dynamics
• The (arguably) most natural way of reaching a pure Nash equilibrium (PNE) in a game
• Best-response dynamics:
– start at an arbitrary strategy vector
– let players take turns best-responding to the other players' actions (in any order)
– … until a pure Nash equilibrium is reached
Best-Response Dynamics: Illustration

A 2×2 coordination game (players' best responses shown on the slide; two pure Nash equilibria, at (x,x) and (y,y)):

                Column Player
                 x      y
Row Player  x   1,1    0,0
            y   0,0    1,1
Best-Response Dynamics: Illustration
• start at some strategy vector
• let players take turns best-responding (Row, Column, …) until a PNE is reached

A matching-pennies-like game:

                Column Player
                 x      y
Row Player  x   1,0    0,1
            y   0,1    1,0
Do Best-Response Dynamics Always Converge?
1. A PNE might not even exist.
2. Even if a PNE exists, convergence is not guaranteed!
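Both phenomena can be seen by simulating the dynamics directly. Below is a minimal sketch (the dict-based payoff encoding, the round-robin turn order, and the step cap are illustrative choices, not from the slides):

```python
# Best-response dynamics in a 2x2 game: players alternately switch to a best
# response against the other's current strategy; stop at a PNE or after a cap.

def best_response_dynamics(payoff, start, max_steps=20):
    """payoff[(r, c)] = (row_utility, col_utility); strategies are 'x'/'y'."""
    strategies = ['x', 'y']
    s = list(start)  # s[0] = Row's strategy, s[1] = Column's strategy
    for _ in range(max_steps):
        moved = False
        for i in (0, 1):  # players take turns (Row, then Column)
            def u(choice):
                prof = (choice, s[1]) if i == 0 else (s[0], choice)
                return payoff[prof][i]
            best = max(strategies, key=u)
            if u(best) > u(s[i]):
                s[i] = best
                moved = True
        if not moved:
            return tuple(s)  # pure Nash equilibrium reached
    return None  # the dynamics cycled without converging

coordination = {('x', 'x'): (1, 1), ('x', 'y'): (0, 0),
                ('y', 'x'): (0, 0), ('y', 'y'): (1, 1)}
matching_pennies = {('x', 'x'): (1, 0), ('x', 'y'): (0, 1),
                    ('y', 'x'): (0, 1), ('y', 'y'): (1, 0)}

print(best_response_dynamics(coordination, ('x', 'y')))      # ('y', 'y'): a PNE
print(best_response_dynamics(matching_pennies, ('x', 'x')))  # None: no PNE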
Better Responses
• When player B plays b and player A plays a, A's strategy a* is a better response to b if uA(a*, b) > uA(a, b).
A 3×2 example:

                 Column Player
                  L      R
             U   0,1    3,2
Row Player   M   1,5    4,0
             D   2,2    1,3
Better-Response Dynamics
• Start at an arbitrary strategy vector
• Let players take turns better-responding to other players’ actions (in any order)
• … until a pure Nash equilibrium is reached
Do Better-Response Dynamics Always Converge?
• No: best-response dynamics is a special case of better-response dynamics, so the same counterexamples apply.

                Column Player
                 x      y
Row Player  x   1,0    0,1
            y   0,1    1,0
Reminder: Potential Games
• Definition (exact potential game): a game is an exact potential game if there is a function Φ: S → R such that for every s ∈ S, every player i, and every ti ∈ Si,
Φ(ti, s-i) − Φ(si, s-i) = ui(ti, s-i) − ui(si, s-i)
• Definition (ordinal potential game): a game is an ordinal potential game if there is a function Φ: S → R such that for every s ∈ S, every player i, and every ti ∈ Si,
Φ(ti, s-i) − Φ(si, s-i) > 0 if and only if ui(ti, s-i) − ui(si, s-i) > 0
Reminder: Equilibria in Potential Games
• Theorem: every (finite) potential game has a pure Nash equilibrium.
• Theorem: in every (finite) potential game better-response dynamics (and so also best-response dynamics) converge to a PNE
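The exact-potential condition can be checked mechanically on a small game. A minimal sketch, using a toy 2×2 coordination game and a hand-picked Φ (both illustrative, not from the slides):

```python
from itertools import product

# Check the exact-potential condition on a small 2-player game. Every
# improving move changes Phi by the same amount as the mover's utility,
# and Phi takes finitely many values, so better-response dynamics must stop.

S = ['x', 'y']
u = {('x', 'x'): (1, 1), ('x', 'y'): (0, 0),
     ('y', 'x'): (0, 0), ('y', 'y'): (1, 1)}
phi = {('x', 'x'): 1, ('x', 'y'): 0, ('y', 'x'): 0, ('y', 'y'): 1}

def is_exact_potential(S, u, phi):
    for s in product(S, repeat=2):
        for i in (0, 1):
            for t in S:  # deviation by player i from s[i] to t
                s2 = (t, s[1]) if i == 0 else (s[0], t)
                if phi[s2] - phi[s] != u[s2][i] - u[s][i]:
                    return False
    return True

print(is_exact_potential(S, u, phi))  # True
```

Any strategy vector maximizing Φ is then automatically a PNE, which is the idea behind the first theorem above.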
Example: Internet Routing
Establish routes between the smaller networks that make up the Internet
Currently handled by the Border Gateway Protocol (BGP).
[Figure: interconnected networks: AT&T, Qwest, Comcast, Level3]
Why is Internet Routing Hard?
Not shortest-paths routing!!!
[Figure: the same networks, each with its own routing policy:]
• "My link to UUNET is for backup purposes only."
• "Load-balance my outgoing traffic."
• "Always choose shortest paths."
• "Avoid routes through AT&T if at all possible."
BGP Dynamics

[Figure: nodes 1 and 2, each connected to destination d and to each other]
• Node 1 prefers routes through 2; its ranking: 1 2 d > 1 d
• Node 2 prefers routes through 1; its ranking: 2 1 d > 2 d

Under BGP, each router repeatedly selects its best available route until a stable state is reached.
BGP Might Not Converge!

[Figure: nodes 1, 2, 3, each connected to destination d and to each other]
• Node 1's ranking: 1 2 d > 1 d > …
• Node 2's ranking: 2 3 d > 2 d > …
• Node 3's ranking: 3 1 d > 3 d > …

In fact, sometimes a stable state does not even exist.
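The oscillation in the three-node example can be reproduced with a tiny simulation. A sketch under simplifying assumptions (synchronous updates and each node restricted to its two listed routes; real BGP activations are asynchronous):

```python
# Three nodes around destination d; node i ranks the route through its
# "next" neighbour above its direct route. With simultaneous best-route
# selection the system flips between two states and never stabilizes.

NODES = [1, 2, 3]

def neighbour(i):  # node 1 prefers via 2, node 2 via 3, node 3 via 1
    return {1: 2, 2: 3, 3: 1}[i]

def step(routes):
    """routes[i] is i's current path to d, e.g. (1, 'd') or (1, 2, 'd')."""
    new = {}
    for i in NODES:
        n = neighbour(i)
        # i's only permitted routes: direct (i, d) or (i, n, d); the latter
        # is available exactly when n currently uses its direct route.
        if routes[n] == (n, 'd'):
            new[i] = (i, n, 'd')  # preferred route through the neighbour
        else:
            new[i] = (i, 'd')     # fall back to the direct route
    return new

routes = {i: (i, 'd') for i in NODES}
seen = []
for _ in range(6):
    seen.append(routes)
    routes = step(routes)
print(seen[0] == seen[2] and seen[0] != seen[1])  # True: a 2-cycle, no stable state
```

The system alternates between "everyone routes directly" and "everyone routes through a neighbour", which is exactly the non-convergence the slide describes.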
Implications of BGP Instability
Almost 50% of problems with VoIP result from bad BGP convergence…
Internet Routing as a Game
• BGP can be modeled as best-response dynamics!
• The (source) nodes are the players.
• Player i's strategy set is Si = N(i), where N(i) is the set of i's neighbors.
• Player i's utility from a strategy vector s is ui(s) = i's rank for the route from i to d in the directed graph induced by s (the more preferred a route, the higher its rank).
• A PNE in this game corresponds to a stable routing state.
Next-Hop Preferences
• A node i has next-hop preferences if all paths that go through the same neighbor have the same rank.
– i's route preferences depend only on its "next-hop node"

[Figure: node i has two routes, R1 and R2, to d through the same neighbor k; under next-hop preferences they have the same rank]
Positive Result
• Theorem: when all nodes have next-hop preferences, the Internet routing game is a potential game:
– a PNE (= stable state) always exists
– better-response (and best-response) dynamics converge to a PNE
• Proof (sketch): we define the (exact) potential function Φ: S → R as Φ(s) = Σi ui(s).
Positive Result (Proof Sketch)
• Need to prove that for every s ∈ S, every player i, and every ti ∈ Si,
Φ(ti, s-i) − Φ(si, s-i) = ui(ti, s-i) − ui(si, s-i)
• Observe that under next-hop preferences, a change in i's strategy does not affect the utility of any player but (possibly) i. Hence
Φ(ti, s-i) − Φ(si, s-i) = Σj uj(ti, s-i) − Σj uj(si, s-i) = ui(ti, s-i) − ui(si, s-i)
Other Game Dynamics
• We will next learn about other dynamics that converge to equilibria in games.
• But first…
Motivation
• Many situations involve online repeated decision making in an uncertain environment:
– deciding how to invest your money (buy or sell stocks)
– what route to drive to work each day
– …

Online Learning, Minimizing Regret, and Combining Expert Advice
Using "Expert" Advice
• Assume we want to predict the stock market: will the market go up or down?
• We solicit n "experts" for their advice, and we then want to use their advice somehow to make our prediction.
• Can we do nearly as well as the best expert in hindsight?
• Note: here an "expert" is just someone with an opinion, not necessarily someone who knows anything.
Formal Model
• There are n experts.
• At each round t = 1, 2, …, T:
– each expert makes a prediction in {0,1}
– the learner (using the experts' predictions) makes a prediction in {0,1}
– the learner observes the actual outcome; there is a mistake if the predicted outcome is different from the actual outcome
– the learner gets to update his hypothesis
• Goal: do nearly as well as the best expert in hindsight.
• We are not given any other information besides the experts' yes/no answers, and we make no assumptions about the quality or independence of the experts. So we cannot hope to achieve an absolute level of quality in our predictions.
Simpler Question
• We have n "experts". One of them is perfect (never makes a mistake), but we don't know which one.
• Is there a strategy that makes no more than lg(n) mistakes?

Halving Algorithm
• Take a majority vote over all experts that have been correct so far: if # surviving experts predicting 1 > # surviving experts predicting 0, predict 1; else predict 0.
• Claim: if one of the experts is perfect, then we make at most lg(n) mistakes.
• Proof: each of our mistakes cuts the # of surviving experts by at least a factor of 2, so we make ≤ lg(n) mistakes.
• Note: this means it is ok for n to be very large.
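The halving algorithm can be sketched in a few lines (the random experts and the seed below are illustrative; the lg(n) bound itself is deterministic whenever a perfect expert exists):

```python
import math
import random

# Halving algorithm: keep the set of experts that have never erred and
# follow the majority among them. Each of our mistakes removes at least
# half of the surviving experts, so with a perfect expert we err <= lg(n).

def run_halving(predictions, outcomes):
    """predictions[t][i]: expert i's {0,1} prediction at round t."""
    n = len(predictions[0])
    alive = set(range(n))
    mistakes = 0
    for preds, outcome in zip(predictions, outcomes):
        votes_one = sum(1 for i in alive if preds[i] == 1)
        guess = 1 if votes_one > len(alive) - votes_one else 0
        if guess != outcome:
            mistakes += 1
        alive = {i for i in alive if preds[i] == outcome}  # cross off the wrong
    return mistakes

# 8 experts; expert 0 is perfect (always predicts the true outcome).
random.seed(0)
outcomes = [random.randint(0, 1) for _ in range(50)]
predictions = [[o if i == 0 else random.randint(0, 1) for i in range(8)]
               for o in outcomes]
m = run_halving(predictions, outcomes)
print(m, m <= math.log2(8))  # at most lg(8) = 3 mistakes
```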
Using "Expert" Advice
• If one expert is perfect, we get ≤ lg(n) mistakes with the halving algorithm.
• But what if no expert is perfect? Can we do nearly as well as the best one in hindsight?

Strategy #1: Iterated halving algorithm.
• Same as before, but once we've crossed off all the experts, restart from the beginning.
• Makes at most log(n)·[OPT + 1] mistakes, where OPT is the # of mistakes of the best expert in hindsight.
– Divide the whole history into epochs: an epoch begins when we restart halving and ends when we have crossed off all the available experts. We make at most log(n) mistakes per epoch. By the end of an epoch every single expert, in particular the best one, has made a mistake, so there are at most OPT + 1 epochs.
• If OPT = 0 we get the previous guarantee.
• But this is wasteful: we are constantly forgetting what we've "learned". Can we do better?
Weighted Majority Algorithm
• Key point: a mistake doesn't completely disqualify an expert. Instead of crossing it off, just lower its weight.
– Start with all experts having weight 1.
– Predict based on a weighted majority vote: if the total weight of experts predicting 1 exceeds the total weight of experts predicting 0, predict 1; else predict 0.
– Penalize mistakes by cutting the mistaken experts' weights in half.
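A minimal sketch of the algorithm (the synthetic experts below are illustrative; the prediction and halving rules are as stated above):

```python
import math
import random

# Weighted majority: every expert starts at weight 1; predict with the
# weighted majority vote; halve the weight of each expert that was wrong.

def run_weighted_majority(predictions, outcomes):
    n = len(predictions[0])
    w = [1.0] * n
    mistakes = 0
    for preds, outcome in zip(predictions, outcomes):
        weight_one = sum(wi for wi, p in zip(w, preds) if p == 1)
        weight_zero = sum(w) - weight_one
        guess = 1 if weight_one > weight_zero else 0
        if guess != outcome:
            mistakes += 1
        w = [wi / 2 if p != outcome else wi for wi, p in zip(w, preds)]
    return mistakes

random.seed(1)
n, T = 16, 200
outcomes = [random.randint(0, 1) for _ in range(T)]
# expert 0 is right 90% of the time; the rest are random coin flips
predictions = [[o if (i == 0 and random.random() < 0.9) or
                (i != 0 and random.random() < 0.5) else 1 - o
                for i in range(n)] for o in outcomes]
opt = min(sum(p[i] != o for p, o in zip(predictions, outcomes))
          for i in range(n))
M = run_weighted_majority(predictions, outcomes)
print(M <= 2.4 * (opt + math.log2(n)))  # True: the 2.4(OPT + lg n) bound holds
```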
Analysis: Does Nearly as Well as Best Expert
• Theorem: if M = # mistakes we've made so far and OPT = # mistakes the best expert has made so far, then M ≤ 2.4(OPT + lg n).
• Proof:
– Analyze W = total weight (starts at n).
– After each of our mistakes, W drops by at least 25%: at least half of the weight was on experts that predicted wrongly, and that weight is cut in half. So after M mistakes, W ≤ n(3/4)^M.
– The weight of the best expert after it makes OPT mistakes is (1/2)^OPT. So (1/2)^OPT ≤ n(3/4)^M, and solving for M gives M ≤ 2.4(OPT + lg n), i.e., a constant ratio between M and OPT.
Randomized Weighted Majority
• 2.4(OPT + lg n) is not so good if the best expert makes a mistake 20% of the time. Can we do better?
• Yes. Instead of taking a majority vote, use the weights as probabilities and predict each outcome with probability proportional to its weight (e.g., with 70% of the weight on up and 30% on down, predict up with probability 0.7). This is equivalent to selecting an expert with probability proportional to its weight.
• Key point: smooth out the worst case.
• Also, generalize the penalty factor from ½ to (1 − ε).
Formal Guarantee for RWM
• Theorem: if M = expected # mistakes we've made so far and OPT = # mistakes the best expert has made so far, then
M ≤ (1 + ε)·OPT + (1/ε)·log(n)
Analysis
• Say at time t a fraction Ft of the total weight is on experts that make a mistake at time t.
• Ft is our expected loss at time t, i.e., the probability that we make a mistake at time t.
• Key idea: if the algorithm has significant expected loss, then the total weight must drop substantially.
• At each time t every mistaken expert's weight is multiplied by (1 − ε), so we remove an εFt fraction of the total weight:
– Wfinal = n(1 − εF1)(1 − εF2)…
– ln(Wfinal) = ln(n) + Σt ln(1 − εFt) < ln(n) − ε·Σt Ft (using ln(1 − x) < −x)
– = ln(n) − εM, since Σt Ft = E[# mistakes] = M
• If the best expert makes OPT mistakes, ln(Wfinal) > ln((1 − ε)^OPT) = OPT·ln(1 − ε).
• Now solve: ln(n) − εM > OPT·ln(1 − ε).
Additive Regret Bounds
• Solving the inequality above gives M < OPT + ε·OPT + (1/ε)·log(n).
• Say we know we will play for T time steps. Then we can set ε = (log(n)/T)^(1/2) and get M < OPT + 2(T·log(n))^(1/2).
• If we don't know T in advance, we can guess and double.
• These are called "additive regret" bounds.
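A sketch of randomized weighted majority with the ε tuning above (the synthetic experts and the slack in the final check are illustrative; `random.Random.choices` implements the weight-proportional selection):

```python
import math
import random

# Randomized weighted majority: pick an expert with probability proportional
# to its weight and copy its prediction; multiply each mistaken expert's
# weight by (1 - eps). Expected mistakes <= (1 + eps)*OPT + log(n)/eps.

def run_rwm(predictions, outcomes, eps, rng):
    n = len(predictions[0])
    w = [1.0] * n
    mistakes = 0
    for preds, outcome in zip(predictions, outcomes):
        i = rng.choices(range(n), weights=w)[0]  # expert ~ its weight
        if preds[i] != outcome:
            mistakes += 1
        w = [wi * (1 - eps) if p != outcome else wi
             for wi, p in zip(w, preds)]
    return mistakes

rng = random.Random(2)
n, T = 16, 2000
outcomes = [rng.randint(0, 1) for _ in range(T)]
# expert 0 is right 80% of the time; the rest are random coin flips
predictions = [[o if (i == 0 and rng.random() < 0.8) or
                (i != 0 and rng.random() < 0.5) else 1 - o
                for i in range(n)] for o in outcomes]
opt = min(sum(p[i] != o for p, o in zip(predictions, outcomes))
          for i in range(n))
eps = math.sqrt(math.log(n) / T)  # the tuning from the slide
M = run_rwm(predictions, outcomes, eps, rng)
print(M, opt)  # M stays close to opt: additive regret of order sqrt(T log n)
```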
Extensions: Experts as Actions
• What if the experts are actions?
– different ways to drive to work each day
– different ways to invest our money
– rows in a matrix game…
• At each time t, each action has a loss (cost) in {0,1}.
• We can still run the algorithm: rather than viewing it as "pick a prediction with probability proportional to its weight", view it as "pick an expert with probability proportional to its weight". The same analysis applies.
• Note: we did not need to see the experts' predictions to select an expert; we only needed to see their losses to update our weights.
Extensions: Losses in [0,1]
• What if the experts' losses are not in {0,1} but in the continuous interval [0,1]?
• If expert i has loss li, do: wi := wi·(1 − li·ε). (Before, an expert with loss 1 had its weight multiplied by (1 − ε) and an expert with loss 0 was left alone; now we interpolate linearly in between.)
• The same analysis applies.
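The generalized update is a one-liner (the weights, losses, and ε below are illustrative values):

```python
# Multiplicative-weights update for real-valued losses in [0, 1]:
# w_i <- w_i * (1 - loss_i * eps). Loss 1 reproduces the old (1 - eps)
# penalty, loss 0 leaves the weight alone, and intermediate losses
# interpolate linearly in between.

def mw_update(weights, losses, eps):
    assert all(0.0 <= l <= 1.0 for l in losses)
    return [w * (1 - l * eps) for w, l in zip(weights, losses)]

w = [1.0, 1.0, 1.0]
w = mw_update(w, [0.0, 0.5, 1.0], eps=0.1)
print(w)  # [1.0, 0.95, 0.9]
```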