Algorithmic Problems in the Internet
Christos H. Papadimitriou
Post on 22-Dec-2015
Iowa State, April 2003 2
Goals of TCS (1950-2000): Develop a productive mathematical understanding of the capabilities and limitations of the von Neumann computer and its software (the dominant and most novel computational artifacts of that time)
Mathematical tools: combinatorics, logic
What should the goals of TCS be today? (And what math tools will be handy?)
The Internet
• huge, growing, open, emergent, mysterious
• built, operated and used by a multitude of diverse economic interests
• as information repository: open, huge, available, unstructured, critical
• foundational understanding urgently needed
Today…
• Games and mechanism design
• Getting lost in the web
• The Internet’s heavy tail
e.g.
matching pennies:
1,-1   -1,1
-1,1   1,-1
chicken:
0,0   0,1
1,0   -1,-1
prisoner’s dilemma:
3,3   0,4
4,0   1,1
Nash equilibrium
• Definition: double best response (problem: may not exist)
• randomized Nash equilibrium
Theorem [Nash 1951]: Always exists.
• Problem: there are usually many...
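The "double best response" definition can be checked by brute force on small games. The sketch below is only an illustration: it encodes the three payoff tables from the earlier slide as they read in this transcript (row player's payoff first), and the names `GAMES` and `pure_nash_equilibria` are made up for the example.

```python
from itertools import product

# Payoff tables: game[row][col] = (row player's payoff, column player's payoff).
GAMES = {
    "matching pennies": [[(1, -1), (-1, 1)], [(-1, 1), (1, -1)]],
    "chicken": [[(0, 0), (0, 1)], [(1, 0), (-1, -1)]],
    "prisoner's dilemma": [[(3, 3), (0, 4)], [(4, 0), (1, 1)]],
}

def pure_nash_equilibria(game):
    """Return all (row, col) cells where each strategy is a best response to the other."""
    rows, cols = len(game), len(game[0])
    equilibria = []
    for r, c in product(range(rows), range(cols)):
        row_best = all(game[r][c][0] >= game[r2][c][0] for r2 in range(rows))
        col_best = all(game[r][c][1] >= game[r][c2][1] for c2 in range(cols))
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria

for name, game in GAMES.items():
    print(name, pure_nash_equilibria(game))
```

Under this reading, matching pennies has no pure equilibrium (the "may not exist" problem), chicken has two (the "usually many" problem), and the prisoner's dilemma has exactly one.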
The price of anarchy
price of anarchy = (cost of worst Nash equilibrium) / (“socially optimum” cost) [Koutsoupias and P, 1998]
in network routing: = 2 [Roughgarden and Tardos, 2000; Roughgarden, 2002]
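To make the ratio concrete, here is a small sketch on a standard textbook network (Pigou's example), which is not taken from the talk. The assumptions: one unit of traffic, two parallel links, one with latency equal to the fraction of traffic x it carries and one with constant latency 1. The function name `social_cost` and the grid search are illustrative choices.

```python
def social_cost(x):
    """Total latency when a fraction x of the traffic uses the link with latency x
    and the remaining 1 - x uses the link with constant latency 1."""
    return x * x + (1 - x) * 1.0

# Selfish routing: the variable link never costs more than the constant one (x <= 1),
# so at equilibrium all traffic ends up on it.
nash_cost = social_cost(1.0)

# Social optimum: minimise total latency over a fine grid of splits.
GRID = 10_000
optimal_cost = min(social_cost(i / GRID) for i in range(GRID + 1))

price_of_anarchy = nash_cost / optimal_cost
print(nash_cost, optimal_cost, price_of_anarchy)
```

In this particular example the ratio comes out at about 4/3, which stays below the value of 2 quoted for network routing.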
mechanism design (or inverse game theory)
• agents have utilities – but these utilities are known only to them
• game designer prefers certain outcomes depending on players’ utilities
• designed game (mechanism) has designer’s goals as dominant strategies
e.g., Vickrey auction
• sealed-bid first-price auction (highest bidder wins, pays own bid) encourages gaming and speculation
• Vickrey auction: Highest bidder wins, pays second-highest bid
Theorem: Vickrey auction is a truthful mechanism.
(Theorem: It maximizes social benefit and auctioneer expected revenue.)
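The truthfulness claim can be spot-checked numerically. This is a minimal sketch, not a proof: the valuations and rival bids are hypothetical, bids are restricted to an integer grid, and ties are broken in favor of the lower bidder index (an assumption the slides do not address).

```python
def vickrey_auction(bids):
    """Return (winner index, price): the highest bidder wins and pays the second-highest bid."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winner, runner_up = order[0], order[1]
    return winner, bids[runner_up]

def utility(value, bidder, bids):
    """Bidder's utility: value minus price if they win, zero otherwise."""
    winner, price = vickrey_auction(bids)
    return value - price if winner == bidder else 0

# Hypothetical example: bidder 0 values the item at 10; rivals bid 7 and 4.
true_value, rival_bids = 10, [7, 4]
truthful = utility(true_value, 0, [true_value] + rival_bids)

# No alternative bid on the grid does better than bidding the true value.
best_deviation = max(utility(true_value, 0, [b] + rival_bids) for b in range(0, 21))
print(truthful, best_deviation)
```

Both numbers coincide here: underbidding can only lose the item, and overbidding can only win it at a price above its value.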
Vickrey shortest paths
[Figure: network from s to t with edge costs 6, 6, 3, 4, 5, 11, 10, 3]
pay each edge e on the shortest path its declared cost c(e), plus a bonus equal to dist(s,t)|c(e)=∞ − dist(s,t)
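The payment rule can be sketched with two shortest-path computations per edge: one with the edge present and one with it removed, which has the same effect as declaring its cost infinite. The network below is hypothetical (the slide's figure did not survive in this transcript), and `dist` and `vickrey_payment` are illustrative names.

```python
import heapq

def dist(edges, source, target, removed=None):
    """Dijkstra shortest-path length on directed edges {(u, v): cost}, skipping `removed`."""
    adjacency = {}
    for (u, v), cost in edges.items():
        if (u, v) != removed:
            adjacency.setdefault(u, []).append((v, cost))
    best = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in adjacency.get(u, []):
            nd = d + cost
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

def vickrey_payment(edges, edge, source, target):
    """Declared cost plus bonus: dist with the edge removed minus dist with it present."""
    bonus = dist(edges, source, target, removed=edge) - dist(edges, source, target)
    return edges[edge] + bonus

# Hypothetical network (not the one on the slide): two disjoint s-t routes.
EDGES = {("s", "a"): 3, ("a", "t"): 4, ("s", "b"): 5, ("b", "t"): 6}
path = [("s", "a"), ("a", "t")]  # the shortest path, cost 7
payments = {e: vickrey_payment(EDGES, e, "s", "t") for e in path}
print(payments, sum(payments.values()))
```

In this toy network the total payment is roughly twice the true cost of the path, which is the kind of overcharge the next slide discusses.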
But…
• …in the Internet Vickrey overcharge would be only about 30% on the average [FPSS 2002]
• Could this be the manifestation of rational behavior at network creation?
• [FPSS 2002]: Vickrey charges
– Depend on origin and destination
– Can be computed on top of BGP
But… (cont)
• [with Mihail and Saberi, 2003]
– They are small in expectation in random graphs.
– (Also: Why traffic grows moderately as the Internet grows…)
The web as a graph
cf: [Google 98], [Kleinberg 98]
• how do you sample the web?
[Bar-Yossef, Berg, Chien, Fakcharoenphol, Weitz, VLDB 2000]
• e.g.: 42% of web documents are in html. How do you find that?
• What is a “random” web document?
[Figure: web graph; nodes are documents, edges are hyperlinks]
Idea: random walk
Problems:
1. asymmetric
2. uneven degree
3. 2nd eigenvalue? λ = 0.99999
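One standard way around the first two problems is to walk on an undirected version of the graph and pad low-degree nodes with self-loops until every node has the same degree, so that the uniform distribution becomes stationary. The sketch below illustrates that idea on a hypothetical four-page web; it is not claimed to be the exact procedure of the cited paper, and all names are made up for the example.

```python
from fractions import Fraction

def regularized_walk_matrix(links, nodes):
    """Transition probabilities of a walk on the undirected, self-loop-padded graph."""
    neighbours = {u: set() for u in nodes}
    for u, v in links:              # forget link direction (addresses asymmetry)
        neighbours[u].add(v)
        neighbours[v].add(u)
    max_degree = max(len(n) for n in neighbours.values())
    matrix = {}
    for u in nodes:
        row = {v: Fraction(1, max_degree) for v in neighbours[u]}
        # Self-loops absorb the leftover probability (addresses uneven degree).
        row[u] = Fraction(max_degree - len(neighbours[u]), max_degree)
        matrix[u] = row
    return matrix

def step(distribution, matrix):
    """One step of the walk applied to a probability distribution over nodes."""
    result = {u: Fraction(0) for u in matrix}
    for u, mass in distribution.items():
        for v, p in matrix[u].items():
            result[v] += mass * p
    return result

# Hypothetical four-page web: directed links between pages A-D (no self-links).
NODES = ["A", "B", "C", "D"]
LINKS = [("A", "B"), ("A", "C"), ("D", "A"), ("B", "C")]
MATRIX = regularized_walk_matrix(LINKS, NODES)

distribution = {u: Fraction(int(u == "A")) for u in NODES}  # start at page A
for _ in range(50):
    distribution = step(distribution, MATRIX)
print({u: float(p) for u, p in distribution.items()})
```

After a few dozen steps every page holds close to a quarter of the probability mass. How fast this happens is governed by the second eigenvalue, which is the third problem on the list.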
The web walker: results
• mixing time is ~ log N/(1-λ), with λ the 2nd eigenvalue
• WW mixing time: 3,000,000
• actual WW mixing time: 100
• .com 49%, .jp 9%, .edu 7%, .cn 0.8%
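The 3,000,000 figure is consistent with plugging the second eigenvalue 0.99999 into the mixing-time estimate. The arithmetic below is a rough check under two assumptions that the slides do not state: a web of about one billion pages and a base-2 logarithm.

```python
import math

def mixing_time_estimate(num_nodes, second_eigenvalue):
    """Rule-of-thumb mixing time: log N divided by the spectral gap 1 - lambda."""
    return math.log2(num_nodes) / (1.0 - second_eigenvalue)

# Assumed web size of one billion pages; 0.99999 is the eigenvalue quoted on the slide.
estimate = mixing_time_estimate(1e9, 0.99999)
print(round(estimate))
```

Under these assumptions the estimate lands just under three million steps, far above the roughly 100 steps reported as the actual mixing time.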
Q: Is the web a random graph?
• Many K3,3’s (“communities”)
• Indegrees/outdegrees obey “power laws”
• Model [Kumar et al, FOCS 2000]: copying
Also the Internet
• [Faloutsos³ 1999] the degrees of the Internet are power law distributed
• Both autonomous systems graph and router graph
• Eigenvalues: ditto!??!
• Model?
The world according to Zipf
• Power laws, Zipf’s law, heavy tails,…
• i-th largest is ~ i^(-a) (cities, words: a = 1, “Zipf’s Law”)
• Equivalently: prob[greater than x] ~ x^(-b)
• (compare with law of large numbers)
• “the signature of human activity”
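The equivalence between the rank form and the tail form can be checked numerically: if the i-th largest item has size i^(-a), then the number of items larger than x grows like x^(-1/a), so b = 1/a. The sketch below estimates b from two thresholds; the thresholds, the population size n, and the function names are arbitrary choices for the illustration.

```python
import math

def count_greater(sizes, x):
    """Number of items whose size exceeds x (proportional to prob[greater than x])."""
    return sum(1 for s in sizes if s > x)

def tail_exponent(a, x_small, x_large, n=10_000):
    """Estimate b in prob[greater than x] ~ x^(-b) when the i-th largest size is i^(-a)."""
    sizes = [i ** (-a) for i in range(1, n + 1)]
    rise = math.log(count_greater(sizes, x_large)) - math.log(count_greater(sizes, x_small))
    run = math.log(x_large) - math.log(x_small)
    return -rise / run

print(tail_exponent(1.0, 0.0015, 0.015))   # close to 1
print(tail_exponent(2.0, 2e-6, 2e-4))      # close to 1/2
```

For Zipf's law proper (a = 1) the two exponents coincide.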
Models
• Size-independent growth (“the rich get richer,” or random walk in log paper)
• Growing number of growing cities
• In the web: copying links [Kumar et al, 2000]
• Carlson and Doyle 1999: Highly optimized tolerance (HOT)
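As one common reading of "the rich get richer" (an assumption; the slides do not spell out a specific process), the sketch below grows a tree in which each new node links to an existing node with probability proportional to that node's current degree. The function name, the seed, and the size are illustrative.

```python
import random

def rich_get_richer(num_nodes, seed=0):
    """Grow a tree where each new node links to an endpoint of a uniformly random
    existing edge, i.e. to an existing node with probability proportional to its degree."""
    rng = random.Random(seed)
    degree = [1, 1]          # start from a single edge between nodes 0 and 1
    endpoints = [0, 1]       # every node appears here once per incident edge
    for new_node in range(2, num_nodes):
        target = rng.choice(endpoints)
        degree[target] += 1
        degree.append(1)
        endpoints.extend([target, new_node])
    return degree

degrees = rich_get_richer(5000)
print(max(degrees), sorted(degrees)[len(degrees) // 2])
```

The typical node keeps a single link while a few early nodes collect a large number of them, which is the qualitative shape of a heavy tail.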
Theorem:
• if α < const, then graph is a star
degree = n - 1
• if α > √n, then there is exponential concentration of degrees
prob(degree > x) < exp(-ax)
• otherwise, if const < α < √n, heavy tail:
prob(degree > x) > x^(-b)
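The slide defining the model behind this theorem is not part of the transcript. The sketch below assumes it is the tradeoff model in which points arrive at random in the unit square and each attaches to the earlier point j minimizing alpha * distance + hops(j), where hops(j) is j's distance in tree edges from the first point. The parameter name `alpha`, the function name, and the seed are all assumptions of this example.

```python
import math
import random

def tradeoff_tree(num_nodes, alpha, seed=0):
    """Grow a tree: each arriving point attaches to the earlier point j that minimises
    alpha * distance(i, j) + hops(j), where hops counts tree edges to the first point."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(num_nodes)]
    hops = [0]                       # the first point is the root
    degree = [0] * num_nodes
    for i in range(1, num_nodes):
        def cost(j):
            return alpha * math.dist(points[i], points[j]) + hops[j]
        parent = min(range(i), key=cost)
        hops.append(hops[parent] + 1)
        degree[parent] += 1
        degree[i] += 1
    return degree

print(max(tradeoff_tree(200, 0.5)))      # small weight on distance
print(max(tradeoff_tree(200, 1000.0)))   # large weight on distance
```

With a small weight every point connects straight to the root: any two points in the unit square are at most √2 apart, so the distance term stays below 1, while attaching anywhere else costs at least one hop. With a very large weight each point simply attaches to its nearest predecessor and no single hub forms.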
Heuristically optimized tradeoffs
• Also: file sizes (trade-off between communication costs and file overhead)
• Power law distributions seem to come from tradeoffs between conflicting objectives (a signature of human activity?)
• cf HOT, [Mandelbrot 1954]
• Other examples?
• General theorem?
PS: eigenvalues
Model: Edge [i,j] has prob. ~ d_i d_j
Theorem [with Mihail, 2002]: If the d_i’s obey a power law, then the n^b largest eigenvalues are almost surely very close to √d_1, √d_2, √d_3, …
(NB: The eigenvalue exponent observed in Faloutsos³ is about ½ of the degree exponent)
Corollary: Spectral methods are of dubious value in the presence of large features
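The factor of ½ between the two exponents matches eigenvalues that scale like the square root of the degree. Some intuition, offered as an illustration and not as the proof: a very high-degree node looks locally like a star, and a star with d leaves has largest adjacency eigenvalue exactly √d. The sketch verifies this by checking the eigenvector equation directly; all names are made up for the example.

```python
import math

def star_adjacency(num_leaves):
    """Adjacency lists of a star: node 0 is the centre, nodes 1..num_leaves are leaves."""
    adjacency = {0: list(range(1, num_leaves + 1))}
    for leaf in range(1, num_leaves + 1):
        adjacency[leaf] = [0]
    return adjacency

def multiply(adjacency, vector):
    """Apply the adjacency matrix to a vector indexed by node."""
    return [sum(vector[v] for v in adjacency[u]) for u in sorted(adjacency)]

def is_eigenpair(adjacency, vector, value, tol=1e-9):
    """Check A v = value * v entrywise."""
    image = multiply(adjacency, vector)
    return all(abs(image[u] - value * vector[u]) <= tol for u in range(len(vector)))

for d in (4, 9, 16):
    star = star_adjacency(d)
    candidate = [math.sqrt(d)] + [1.0] * d   # centre weight sqrt(d), leaf weight 1
    print(d, is_eigenpair(star, candidate, math.sqrt(d)))
```

The top eigenvector is concentrated on the hub and its neighbours, which is one way to read the corollary: the spectrum mostly reports the largest degrees.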