Issues on the Border of Economics and Computation
Speaker: Dr. Michael Schapira
Topic: Dynamics in Games (slides on weighted-majority algorithms from Prof. Avrim Blum's course at CMU)

Reminder: n-Player Games
• Consider a game:
– Si is the set of (pure) strategies for player i
– S = S1 × S2 × … × Sn
– s = (s1, s2, …, sn) ∈ S is a vector of strategies
– ui : S → R is the payoff function for player i
• Notation: given a strategy vector s, let s-i = (s1, …, si-1, si+1, …, sn)
– the vector s with the i'th item omitted
• s is a (pure) Nash equilibrium if for every i, ui(si, s-i) ≥ ui(si', s-i) for every si' ∈ Si
Best-Response Dynamics
• The (arguably) most natural way of reaching a pure Nash equilibrium (PNE) in a game
• Best-response dynamics:
– start at an arbitrary strategy vector
– let players take turns best-responding to the other players' actions (in any order)
– … until a pure Nash equilibrium is reached
Best-Response Dynamics: Illustration

A 2×2 coordination game (players' best responses shown on the slide; two pure Nash equilibria, at (x,x) and (y,y)):

                Column Player
                 x      y
Row Player  x   1,1    0,0
            y   0,0    1,1
Best-Response Dynamics: Illustration
• start at some strategy vector
• let players take turns best-responding (Row, Column, …) until a PNE is reached

A matching-pennies-like game:

                Column Player
                 x      y
Row Player  x   1,0    0,1
            y   0,1    1,0
Do Best-Response Dynamics Always Converge?
1. A PNE might not even exist.
2. Even if a PNE exists, convergence is not guaranteed!
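Both phenomena can be seen by simulating the dynamics directly. Below is a minimal sketch (the dict-based payoff encoding, the round-robin turn order, and the step cap are illustrative choices, not from the slides):

```python
# Best-response dynamics in a 2x2 game: players alternately switch to a best
# response against the other's current strategy; stop at a PNE or after a cap.

def best_response_dynamics(payoff, start, max_steps=20):
    """payoff[(r, c)] = (row_utility, col_utility); strategies are 'x'/'y'."""
    strategies = ['x', 'y']
    s = list(start)  # s[0] = Row's strategy, s[1] = Column's strategy
    for _ in range(max_steps):
        moved = False
        for i in (0, 1):  # players take turns (Row, then Column)
            def u(choice):
                prof = (choice, s[1]) if i == 0 else (s[0], choice)
                return payoff[prof][i]
            best = max(strategies, key=u)
            if u(best) > u(s[i]):
                s[i] = best
                moved = True
        if not moved:
            return tuple(s)  # pure Nash equilibrium reached
    return None  # the dynamics cycled without converging

coordination = {('x', 'x'): (1, 1), ('x', 'y'): (0, 0),
                ('y', 'x'): (0, 0), ('y', 'y'): (1, 1)}
matching_pennies = {('x', 'x'): (1, 0), ('x', 'y'): (0, 1),
                    ('y', 'x'): (0, 1), ('y', 'y'): (1, 0)}

print(best_response_dynamics(coordination, ('x', 'y')))      # ('y', 'y'): a PNE
print(best_response_dynamics(matching_pennies, ('x', 'x')))  # None: no PNE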
Better Responses
• When player B plays b and player A plays a, A's strategy a* is a better response to b if uA(a*, b) > uA(a, b).
A 3×2 example:

                 Column Player
                  L      R
             U   0,1    3,2
Row Player   M   1,5    4,0
             D   2,2    1,3
Better-Response Dynamics
• Start at an arbitrary strategy vector
• Let players take turns better-responding to other players’ actions (in any order)
• … until a pure Nash equilibrium is reached
Do Better-Response Dynamics Always Converge?
• No: best-response dynamics is a special case of better-response dynamics, so the same counterexamples apply.

                Column Player
                 x      y
Row Player  x   1,0    0,1
            y   0,1    1,0
Reminder: Potential Games
• Definition (exact potential game): a game is an exact potential game if there is a function Φ: S → R such that for every s ∈ S, every player i, and every ti ∈ Si,
Φ(ti, s-i) − Φ(si, s-i) = ui(ti, s-i) − ui(si, s-i)
• Definition (ordinal potential game): a game is an ordinal potential game if there is a function Φ: S → R such that for every s ∈ S, every player i, and every ti ∈ Si,
Φ(ti, s-i) − Φ(si, s-i) > 0 if and only if ui(ti, s-i) − ui(si, s-i) > 0
Reminder: Equilibria in Potential Games
• Theorem: every (finite) potential game has a pure Nash equilibrium.
• Theorem: in every (finite) potential game better-response dynamics (and so also best-response dynamics) converge to a PNE
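The exact-potential condition can be checked mechanically on a small game. A minimal sketch, using a toy 2×2 coordination game and a hand-picked Φ (both illustrative, not from the slides):

```python
from itertools import product

# Check the exact-potential condition on a small 2-player game. Every
# improving move changes Phi by the same amount as the mover's utility,
# and Phi takes finitely many values, so better-response dynamics must stop.

S = ['x', 'y']
u = {('x', 'x'): (1, 1), ('x', 'y'): (0, 0),
     ('y', 'x'): (0, 0), ('y', 'y'): (1, 1)}
phi = {('x', 'x'): 1, ('x', 'y'): 0, ('y', 'x'): 0, ('y', 'y'): 1}

def is_exact_potential(S, u, phi):
    for s in product(S, repeat=2):
        for i in (0, 1):
            for t in S:  # deviation by player i from s[i] to t
                s2 = (t, s[1]) if i == 0 else (s[0], t)
                if phi[s2] - phi[s] != u[s2][i] - u[s][i]:
                    return False
    return True

print(is_exact_potential(S, u, phi))  # True
```

Any strategy vector maximizing Φ is then automatically a PNE, which is the idea behind the first theorem above.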
Example: Internet Routing
Establish routes between the smaller networks that make up the Internet
Currently handled by the Border Gateway Protocol (BGP).
[Figure: interconnected networks: AT&T, Qwest, Comcast, Level3]
Why is Internet Routing Hard?
Not shortest-paths routing!!!
[Figure: the same networks, each with its own routing policy:]
• "My link to UUNET is for backup purposes only."
• "Load-balance my outgoing traffic."
• "Always choose shortest paths."
• "Avoid routes through AT&T if at all possible."
BGP Dynamics

[Figure: nodes 1 and 2, each connected to destination d and to each other]
• Node 1 prefers routes through 2; its ranking: 1 2 d > 1 d
• Node 2 prefers routes through 1; its ranking: 2 1 d > 2 d

Under BGP, each router repeatedly selects its best available route until a stable state is reached.
BGP Might Not Converge!

[Figure: nodes 1, 2, 3, each connected to destination d and to each other]
• Node 1's ranking: 1 2 d > 1 d > …
• Node 2's ranking: 2 3 d > 2 d > …
• Node 3's ranking: 3 1 d > 3 d > …

In fact, sometimes a stable state does not even exist.
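The oscillation in the three-node example can be reproduced with a tiny simulation. A sketch under simplifying assumptions (synchronous updates and each node restricted to its two listed routes; real BGP activations are asynchronous):

```python
# Three nodes around destination d; node i ranks the route through its
# "next" neighbour above its direct route. With simultaneous best-route
# selection the system flips between two states and never stabilizes.

NODES = [1, 2, 3]

def neighbour(i):  # node 1 prefers via 2, node 2 via 3, node 3 via 1
    return {1: 2, 2: 3, 3: 1}[i]

def step(routes):
    """routes[i] is i's current path to d, e.g. (1, 'd') or (1, 2, 'd')."""
    new = {}
    for i in NODES:
        n = neighbour(i)
        # i's only permitted routes: direct (i, d) or (i, n, d); the latter
        # is available exactly when n currently uses its direct route.
        if routes[n] == (n, 'd'):
            new[i] = (i, n, 'd')  # preferred route through the neighbour
        else:
            new[i] = (i, 'd')     # fall back to the direct route
    return new

routes = {i: (i, 'd') for i in NODES}
seen = []
for _ in range(6):
    seen.append(routes)
    routes = step(routes)
print(seen[0] == seen[2] and seen[0] != seen[1])  # True: a 2-cycle, no stable state
```

The system alternates between "everyone routes directly" and "everyone routes through a neighbour", which is exactly the non-convergence the slide describes.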
Implications of BGP Instability
Almost 50% of problems with VoIP result from bad BGP convergence…
Internet Routing as a Game
• BGP can be modeled as best-response dynamics!
• The (source) nodes are the players.
• Player i's strategy set is Si = N(i), where N(i) is the set of i's neighbors.
• Player i's utility from a strategy vector s is ui(s) = i's rank for the route from i to d in the directed graph induced by s (the more preferred a route, the higher its rank).
• A PNE in this game corresponds to a stable routing state.
Next-Hop Preferences
• A node i has next-hop preferences if all paths that go through the same neighbor have the same rank.
– i's route preferences depend only on its "next-hop node"

[Figure: node i has two routes, R1 and R2, to d through the same neighbor k; under next-hop preferences they have the same rank]
Positive Result
• Theorem: when all nodes have next-hop preferences, the Internet routing game is a potential game:
– a PNE (= stable state) always exists
– better-response (and best-response) dynamics converge to a PNE
• Proof (sketch): we define the (exact) potential function Φ: S → R as Φ(s) = Σi ui(s).
Positive Result (Proof Sketch)
• Need to prove that for every s ∈ S, every player i, and every ti ∈ Si,
Φ(ti, s-i) − Φ(si, s-i) = ui(ti, s-i) − ui(si, s-i)
• Observe that under next-hop preferences, a change in i's strategy does not affect the utility of any player but (possibly) i. Hence
Φ(ti, s-i) − Φ(si, s-i) = Σj uj(ti, s-i) − Σj uj(si, s-i) = ui(ti, s-i) − ui(si, s-i)
Other Game Dynamics
• We will next learn about other dynamics that converge to equilibria in games.
• But first…
Motivation
• Many situations involve online repeated decision making in an uncertain environment:
– deciding how to invest your money (buy or sell stocks)
– what route to drive to work each day
– …

Online Learning, Minimizing Regret, and Combining Expert Advice
Using "Expert" Advice
• Assume we want to predict the stock market: will the market go up or down?
• We solicit n "experts" for their advice, and we then want to use their advice somehow to make our prediction.
• Can we do nearly as well as the best expert in hindsight?
• Note: here an "expert" is just someone with an opinion, not necessarily someone who knows anything.
Formal Model
• There are n experts.
• At each round t = 1, 2, …, T:
– each expert makes a prediction in {0,1}
– the learner (using the experts' predictions) makes a prediction in {0,1}
– the learner observes the actual outcome; there is a mistake if the predicted outcome is different from the actual outcome
– the learner gets to update his hypothesis
• Goal: do nearly as well as the best expert in hindsight.
• We are not given any other information besides the experts' yes/no answers, and we make no assumptions about the quality or independence of the experts. So we cannot hope to achieve an absolute level of quality in our predictions.
Simpler Question
• We have n "experts". One of them is perfect (never makes a mistake), but we don't know which one.
• Is there a strategy that makes no more than lg(n) mistakes?

Halving Algorithm
• Take a majority vote over all experts that have been correct so far: if # surviving experts predicting 1 > # surviving experts predicting 0, predict 1; else predict 0.
• Claim: if one of the experts is perfect, then we make at most lg(n) mistakes.
• Proof: each of our mistakes cuts the # of surviving experts by at least a factor of 2, so we make ≤ lg(n) mistakes.
• Note: this means it is ok for n to be very large.
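The halving algorithm can be sketched in a few lines (the random experts and the seed below are illustrative; the lg(n) bound itself is deterministic whenever a perfect expert exists):

```python
import math
import random

# Halving algorithm: keep the set of experts that have never erred and
# follow the majority among them. Each of our mistakes removes at least
# half of the surviving experts, so with a perfect expert we err <= lg(n).

def run_halving(predictions, outcomes):
    """predictions[t][i]: expert i's {0,1} prediction at round t."""
    n = len(predictions[0])
    alive = set(range(n))
    mistakes = 0
    for preds, outcome in zip(predictions, outcomes):
        votes_one = sum(1 for i in alive if preds[i] == 1)
        guess = 1 if votes_one > len(alive) - votes_one else 0
        if guess != outcome:
            mistakes += 1
        alive = {i for i in alive if preds[i] == outcome}  # cross off the wrong
    return mistakes

# 8 experts; expert 0 is perfect (always predicts the true outcome).
random.seed(0)
outcomes = [random.randint(0, 1) for _ in range(50)]
predictions = [[o if i == 0 else random.randint(0, 1) for i in range(8)]
               for o in outcomes]
m = run_halving(predictions, outcomes)
print(m, m <= math.log2(8))  # at most lg(8) = 3 mistakes
```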
Using "Expert" Advice
• If one expert is perfect, we get ≤ lg(n) mistakes with the halving algorithm.
• But what if no expert is perfect? Can we do nearly as well as the best one in hindsight?

Strategy #1: Iterated halving algorithm.
• Same as before, but once we've crossed off all the experts, restart from the beginning.
• Makes at most log(n)·[OPT + 1] mistakes, where OPT is the # of mistakes of the best expert in hindsight.
– Divide the whole history into epochs: an epoch begins when we restart halving and ends when we have crossed off all the available experts. We make at most log(n) mistakes per epoch. By the end of an epoch every single expert, in particular the best one, has made a mistake, so there are at most OPT + 1 epochs.
• If OPT = 0 we get the previous guarantee.
• But this is wasteful: we are constantly forgetting what we've "learned". Can we do better?
Weighted Majority Algorithm
• Key point: a mistake doesn't completely disqualify an expert. Instead of crossing it off, just lower its weight.
– Start with all experts having weight 1.
– Predict based on a weighted majority vote: if the total weight of experts predicting 1 exceeds the total weight of experts predicting 0, predict 1; else predict 0.
– Penalize mistakes by cutting the mistaken experts' weights in half.
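A minimal sketch of the algorithm (the synthetic experts below are illustrative; the prediction and halving rules are as stated above):

```python
import math
import random

# Weighted majority: every expert starts at weight 1; predict with the
# weighted majority vote; halve the weight of each expert that was wrong.

def run_weighted_majority(predictions, outcomes):
    n = len(predictions[0])
    w = [1.0] * n
    mistakes = 0
    for preds, outcome in zip(predictions, outcomes):
        weight_one = sum(wi for wi, p in zip(w, preds) if p == 1)
        weight_zero = sum(w) - weight_one
        guess = 1 if weight_one > weight_zero else 0
        if guess != outcome:
            mistakes += 1
        w = [wi / 2 if p != outcome else wi for wi, p in zip(w, preds)]
    return mistakes

random.seed(1)
n, T = 16, 200
outcomes = [random.randint(0, 1) for _ in range(T)]
# expert 0 is right 90% of the time; the rest are random coin flips
predictions = [[o if (i == 0 and random.random() < 0.9) or
                (i != 0 and random.random() < 0.5) else 1 - o
                for i in range(n)] for o in outcomes]
opt = min(sum(p[i] != o for p, o in zip(predictions, outcomes))
          for i in range(n))
M = run_weighted_majority(predictions, outcomes)
print(M <= 2.4 * (opt + math.log2(n)))  # True: the 2.4(OPT + lg n) bound holds
```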
Analysis: Does Nearly as Well as Best Expert
• Theorem: if M = # mistakes we've made so far and OPT = # mistakes the best expert has made so far, then M ≤ 2.4(OPT + lg n).
• Proof:
– Analyze W = total weight (starts at n).
– After each of our mistakes, W drops by at least 25%: at least half of the weight was on experts that predicted wrongly, and that weight is cut in half. So after M mistakes, W ≤ n(3/4)^M.
– The weight of the best expert after it makes OPT mistakes is (1/2)^OPT. So (1/2)^OPT ≤ n(3/4)^M, and solving for M gives M ≤ 2.4(OPT + lg n), i.e., a constant ratio between M and OPT.
Randomized Weighted Majority
• 2.4(OPT + lg n) is not so good if the best expert makes a mistake 20% of the time. Can we do better?
• Yes. Instead of taking a majority vote, use the weights as probabilities and predict each outcome with probability proportional to its weight (e.g., with 70% of the weight on up and 30% on down, predict up with probability 0.7). This is equivalent to selecting an expert with probability proportional to its weight.
• Key point: smooth out the worst case.
• Also, generalize the penalty factor from ½ to (1 − ε).
Formal Guarantee for RWM
• Theorem: if M = expected # mistakes we've made so far and OPT = # mistakes the best expert has made so far, then
M ≤ (1 + ε)·OPT + (1/ε)·log(n)
Analysis
• Say at time t a fraction Ft of the total weight is on experts that make a mistake at time t.
• Ft is our expected loss at time t, i.e., the probability that we make a mistake at time t.
• Key idea: if the algorithm has significant expected loss, then the total weight must drop substantially.
• At each time t every mistaken expert's weight is multiplied by (1 − ε), so we remove an εFt fraction of the total weight:
– Wfinal = n(1 − εF1)(1 − εF2)…
– ln(Wfinal) = ln(n) + Σt ln(1 − εFt) < ln(n) − ε·Σt Ft (using ln(1 − x) < −x)
– = ln(n) − εM, since Σt Ft = E[# mistakes] = M
• If the best expert makes OPT mistakes, ln(Wfinal) > ln((1 − ε)^OPT) = OPT·ln(1 − ε).
• Now solve: ln(n) − εM > OPT·ln(1 − ε).
Additive Regret Bounds
• Solving the inequality above gives M < OPT + ε·OPT + (1/ε)·log(n).
• Say we know we will play for T time steps. Then we can set ε = (log(n)/T)^(1/2) and get M < OPT + 2(T·log(n))^(1/2).
• If we don't know T in advance, we can guess and double.
• These are called "additive regret" bounds.
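A sketch of randomized weighted majority with the ε tuning above (the synthetic experts and the slack in the final check are illustrative; `random.Random.choices` implements the weight-proportional selection):

```python
import math
import random

# Randomized weighted majority: pick an expert with probability proportional
# to its weight and copy its prediction; multiply each mistaken expert's
# weight by (1 - eps). Expected mistakes <= (1 + eps)*OPT + log(n)/eps.

def run_rwm(predictions, outcomes, eps, rng):
    n = len(predictions[0])
    w = [1.0] * n
    mistakes = 0
    for preds, outcome in zip(predictions, outcomes):
        i = rng.choices(range(n), weights=w)[0]  # expert ~ its weight
        if preds[i] != outcome:
            mistakes += 1
        w = [wi * (1 - eps) if p != outcome else wi
             for wi, p in zip(w, preds)]
    return mistakes

rng = random.Random(2)
n, T = 16, 2000
outcomes = [rng.randint(0, 1) for _ in range(T)]
# expert 0 is right 80% of the time; the rest are random coin flips
predictions = [[o if (i == 0 and rng.random() < 0.8) or
                (i != 0 and rng.random() < 0.5) else 1 - o
                for i in range(n)] for o in outcomes]
opt = min(sum(p[i] != o for p, o in zip(predictions, outcomes))
          for i in range(n))
eps = math.sqrt(math.log(n) / T)  # the tuning from the slide
M = run_rwm(predictions, outcomes, eps, rng)
print(M, opt)  # M stays close to opt: additive regret of order sqrt(T log n)
```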
Extensions: Experts as Actions
• What if the experts are actions?
– different ways to drive to work each day
– different ways to invest our money
– rows in a matrix game…
• At each time t, each action has a loss (cost) in {0,1}.
• We can still run the algorithm: rather than viewing it as "pick a prediction with probability proportional to its weight", view it as "pick an expert with probability proportional to its weight". The same analysis applies.
• Note: we did not need to see the experts' predictions to select an expert; we only needed to see their losses to update our weights.
Extensions: Losses in [0,1]
• What if the experts' losses are not in {0,1} but in the continuous interval [0,1]?
• If expert i has loss li, do: wi := wi·(1 − li·ε). (Before, an expert with loss 1 had its weight multiplied by (1 − ε) and an expert with loss 0 was left alone; now we interpolate linearly in between.)
• The same analysis applies.
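The generalized update is a one-liner (the weights, losses, and ε below are illustrative values):

```python
# Multiplicative-weights update for real-valued losses in [0, 1]:
# w_i <- w_i * (1 - loss_i * eps). Loss 1 reproduces the old (1 - eps)
# penalty, loss 0 leaves the weight alone, and intermediate losses
# interpolate linearly in between.

def mw_update(weights, losses, eps):
    assert all(0.0 <= l <= 1.0 for l in losses)
    return [w * (1 - l * eps) for w, l in zip(weights, losses)]

w = [1.0, 1.0, 1.0]
w = mw_update(w, [0.0, 0.5, 1.0], eps=0.1)
print(w)  # [1.0, 0.95, 0.9]
```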