Dong Lu, Peter A. Dinda
Prescience Laboratory
Department of Computer Science
Northwestern University
Evanston, IL 60201
GridG: Synthesizing Realistic Computational Grids
2
Outline
• Why GridG?
• What is GridG?
• Topology generation
– Hierarchical vs. degree based?
– What are the relationships among the power laws of Internet topology?
• Annotation
– What are the intra- and inter-host correlations among the hosts and within a host?
– How to build the correlations into GridG?
• Conclusions and future work
3
Why GridG?
• Synthetic Grids needed to evaluate Middleware
• Existing physical grids too small
• Can’t control parameters
• Example: Evaluation of our RGIS system
• Example: Grid simulation projects
– GridSim and SimGrid
• Example: overlay network simulations
– Application level multicast
4
GridG: A Synthetic Grid Generator
• Output: Network topology annotated with the hardware and software available on each node and link.
– Layer 3 network: hosts, routers, links
– Hosts: memory, architecture, number of CPUs, disk, operating system, vendor, clock rate
– Routers: switching capacity
– Links: bandwidth and latency
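A minimal sketch of how such an annotated topology could be held in memory. The class names, field names, and units (MHz, MB, GB, Mbps, ms) are assumptions made for illustration; they are not GridG's actual text format.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    # Attribute list follows the slide; the units are assumed.
    name: str
    arch: str
    num_cpus: int
    clock_mhz: int
    mem_mb: int
    disk_gb: int
    os: str
    os_vendor: str


@dataclass
class Router:
    name: str
    switching_capacity_mbps: float  # unit is an assumption


@dataclass
class Link:
    src: str
    dst: str
    bandwidth_mbps: float  # unit is an assumption
    latency_ms: float      # unit is an assumption


@dataclass
class Grid:
    hosts: dict = field(default_factory=dict)
    routers: dict = field(default_factory=dict)
    links: list = field(default_factory=list)

    def add_link(self, src, dst, bandwidth_mbps, latency_ms):
        """Add a link; both endpoints must already be known hosts or routers."""
        for end in (src, dst):
            if end not in self.hosts and end not in self.routers:
                raise KeyError(f"unknown node: {end}")
        self.links.append(Link(src, dst, bandwidth_mbps, latency_ms))


grid = Grid()
grid.routers["r1"] = Router("r1", 10_000.0)
grid.hosts["h1"] = Host("h1", "IA32", 1, 1800, 512, 80, "Win2k", "Microsoft")
grid.add_link("h1", "r1", bandwidth_mbps=100.0, latency_ms=0.5)
```

Keeping hosts, routers, and links in separate collections mirrors the three kinds of annotation listed above.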
5
Example 1
[Figure: example grid topology. Legend: Router (switching capacity); Host (arch, numcpu, clock rate, osvendor, mem, disk); Link (bw, latency).]
6
Requirements
• Realistic topologies
– Connected
– Hierarchical topology
– Power laws of Internet topology
• Realistic annotations
– Distributions of attributes
– Correlations of attributes: intra-host and inter-host
7
GridG architecture
• A sequence of transformations on a text-based representation of an annotated graph.
[Figure: GridG pipeline. Base Topology Generator (Tiers) → Structured Topology → Translation To Common Format → GridG Power Law Enforcer → Structured Topology that obeys power laws → GridG Annotator → Grid. Other transformations on the common format (Cluster maker, etc.) can also be applied. The Grid feeds the GIS Simulator, DOT Visualization, RGIS Database, and other tools.]
8
Outline
• Why GridG?
• What is GridG?
• Topology generation
– Hierarchical vs. degree based?
– What are the relationships among the power laws of Internet topology?
• Annotation
– What are the intra- and inter-host correlations among the hosts and within a host?
– How to build the correlations into GridG?
• Conclusions and future work
9
Quick review of the Power laws of Internet topology
Power Law             Expression
Rank exponent R       $d_v \propto r_v^{R}$
Outdegree exponent O  $f_d \propto d^{O}$
Eigen exponent ε      $\lambda_i \propto i^{\varepsilon}$
Hop-plot exponent H   $P(h) \propto h^{H}$
10
Current Graph generators
• Random (Waxman)
• Hierarchical: Tiers, Transit-Stub, etc. have clear network hierarchy, but don’t follow power laws
• Degree based: Inet, Brite, PLRG, etc. follow power laws, but don’t have clear network hierarchy
11
Topology Generation in GridG (1/2)
1. Generate a basic graph without any redundant links using Tiers
• This is a hierarchical graph
2. Assign each node an outdegree randomly using the outdegree exponent power law as the distribution
• This enforces all the power laws!
• Scale-free
3. Determine the remaining outdegree of each node by taking original hierarchical links into consideration
12
Topology Generation in GridG (2/2)
4. Add redundant links between randomly chosen pairs of nodes with sufficient remaining outdegree
• Nodes at higher levels (e.g., WAN) are given priority over nodes at lower levels (e.g., MAN)
5. Repeat step 4 until there is no pair of nodes with positive remaining outdegree
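The five steps can be sketched roughly as follows. This is an illustration under stated assumptions, not GridG's implementation: the function names, the `levels` map (0 = highest level, e.g., WAN), the default exponent, and the degree cap `max_degree` are all hypothetical. The loop also stops when the only nodes with remaining outdegree are already linked to each other, which is an added assumption to guarantee termination.

```python
import random


def sample_outdegree(exponent, max_degree, rng):
    """Draw d in [1, max_degree] with probability proportional to d**exponent."""
    degrees = list(range(1, max_degree + 1))
    weights = [d ** exponent for d in degrees]
    return rng.choices(degrees, weights=weights, k=1)[0]


def add_redundant_links(base_edges, levels, exponent=-2.5, max_degree=20, seed=0):
    """Steps 2-5: add redundant links to a hierarchical base graph (step 1)."""
    rng = random.Random(seed)
    edges = {frozenset(e) for e in base_edges}
    nodes = sorted(levels)

    # Degree each node already has from the hierarchical links.
    degree = {n: 0 for n in nodes}
    for edge in edges:
        for n in edge:
            degree[n] += 1

    # Steps 2 and 3: sample a target outdegree, then subtract existing links.
    remaining = {}
    for n in nodes:
        target = sample_outdegree(exponent, max_degree, rng)
        remaining[n] = max(target - degree[n], 0)

    # Steps 4 and 5: keep linking pairs that still have remaining outdegree.
    while True:
        candidates = [n for n in nodes if remaining[n] > 0]
        # Higher levels first; random order within the same level.
        candidates.sort(key=lambda n: (levels[n], rng.random()))
        pair = None
        for i, u in enumerate(candidates):
            for v in candidates[i + 1:]:
                if frozenset((u, v)) not in edges:
                    pair = (u, v)
                    break
            if pair is not None:
                break
        if pair is None:
            break
        edges.add(frozenset(pair))
        remaining[pair[0]] -= 1
        remaining[pair[1]] -= 1
    return edges, remaining


# A small tree standing in for the Tiers output: one WAN node, two MAN, three LAN.
base = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5)]
levels = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2}
edges, remaining = add_redundant_links(base, levels, seed=1)
```

The hierarchical links are never removed, so the result stays connected whenever the base graph is connected.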
13
Evaluation: Topology Obeys Rank Exponent Law
[Figure: two plots comparing GridG and Tiers. Left x-axis: Ranking. Right (log-log) x-axis: log(r).]
14
Evaluation: Topology Obeys Outdegree Exponent Law
[Figure: two plots comparing GridG and Tiers. Left x-axis: Outdegree. Right (log-log) x-axis: log(d).]
15
Evaluation : Topology Obeys Hop-plot Law
[Figure: two plots comparing GridG and Tiers. Left x-axis: Number of hops. Right (log-log) x-axis: log(number of hops).]
16
Evaluation : Topology Obeys Eigenvalue Exponent Law
[Figure: two plots comparing GridG and Tiers. Left x-axis: Order. Right (log-log) x-axis: log(Order).]
17
Comparing To The Internet
Power Law    Internet Routers    GridG    Tiers
Rank         -0.49               -0.51    -0.18
  R²         -                   0.94     0.89
Outdegree    -2.49               -2.63    -3.4
  R²         -                   0.97     0.55
Eigen        -0.18               -0.24    -0.23
  R²         -                   0.97     0.97
Hop-plot     2.84                2.88     1.64
  R²         -                   0.99     0.99
Notice the close match.
18
Relationship among power laws (0)
• An interesting phenomenon: GridG and several other graph generators generate graphs according to the outdegree law only. But the generated graphs follow all four power laws!
• How is this possible?
The power laws are closely related
Can we deduce other power laws from the outdegree power law?
19
Relationship among power laws (1)
• Eigenvalue law follows from the outdegree law [Mihail and Papadimitriou]
• Hop-plot and Eigenvalue power laws are followed by many topologies [Medina, et al]
Our results:
• Outdegree law follows from the rank law
• Rank law does not follow from outdegree law
• Alternative rank law follows from outdegree law and fits data better
20
Relationship among power laws (2)
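Rank law ⇒ Outdegree law
Rank law: $d_v \propto r_v^{R}$
Derived outdegree frequency: $f_d \propto d_v^{1/R} - (d_v + 1)^{1/R}$
Outdegree law: $f_d \propto d^{O}$
This is a power law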
21
Relationship among power laws (3)
[Figure: log-log plot of the derived outdegree law; x-axis: log(d).]
Log-log plot of the derived outdegree law: a perfect power-law fit. So the outdegree law can be derived from the rank law (Rank law ⇒ Outdegree law).
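The claim can be checked numerically with a small sketch. It assumes the derived frequency takes the form d^(1/R) - (d+1)^(1/R), obtained by inverting the rank law and differencing consecutive ranks, and it uses R = -0.49 (the Internet router rank exponent from the comparison table). The degree range 1 to 25 and the function names are illustrative choices.

```python
import math


def derived_outdegree_frequency(d, rank_exponent):
    """Unnormalized f_d obtained by inverting the rank law d = c * r**R."""
    inv = 1.0 / rank_exponent
    return d ** inv - (d + 1) ** inv


def loglog_fit(xs, ys):
    """Least-squares line through (log10 x, log10 y); returns (slope, R^2)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    syy = sum((y - mean_y) ** 2 for y in ly)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


R = -0.49
degrees = list(range(1, 26))
freqs = [derived_outdegree_frequency(d, R) for d in degrees]
slope, r2 = loglog_fit(degrees, freqs)
print(f"fitted outdegree exponent: {slope:.2f}, R^2: {r2:.4f}")
```

For large d the derived frequency behaves like d^(1/R - 1), so the fitted slope approaches 1/R - 1 as the degree range grows.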
22
Relationship among power laws (4)
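Outdegree law ⇒ Rank law?
Derived rank: $r_v \propto N\,[\zeta(-O) - \zeta(-O,\, d_v - 1)]$
where $\zeta(t) = \sum_{n=1}^{\infty} \frac{1}{n^{t}}$ and $\zeta(t, k) = \sum_{n=1}^{k} \frac{1}{n^{t}}$
Rank law: $d_v \propto r_v^{R}$
This is NOT a power law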
23
Relationship among power laws (5)
[Figure: log-log plot of the derived rank law; x-axis: log(r).]
Log-log plot of the derived rank law. It is not a power law, so the rank law cannot be derived from the outdegree law (Outdegree law ⇏ Rank law).
The derived curve corresponds well to the Faloutsos Internet data.
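A rough way to reproduce such a curve is to count, for each outdegree d, how many nodes have outdegree at least d under the outdegree law. The sketch below is an assumption-laden illustration: it truncates the sums at a hypothetical `max_degree` instead of using infinite zeta sums, normalizes so that all frequencies add up to `num_nodes`, and uses O = -2.49 from the comparison table.

```python
def rank_from_outdegree_law(d, outdegree_exponent, num_nodes, max_degree):
    """Approximate rank of a node with outdegree d: nodes with outdegree >= d."""
    weights = {n: n ** outdegree_exponent for n in range(1, max_degree + 1)}
    total = sum(weights.values())
    tail = sum(w for n, w in weights.items() if n >= d)
    return num_nodes * tail / total


O = -2.49
ranks = [rank_from_outdegree_law(d, O, 10_000, 30) for d in range(1, 31)]
for d in (1, 2, 5, 10, 30):
    print(d, round(ranks[d - 1], 1))
```

Plotting log(d) against log(rank) for these pairs lets the shape be compared with the straight line that a power law would give.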
24
Relationship among power laws (6)
• Log-log plot of the derived outdegree law using the new rank law. It is a perfect power law.
[Figure: log-log plot of the derived outdegree law; x-axis: log(d).]
25
Relationship among power laws (7)
We propose the following as the relationships among Internet topology power laws
New rank law ⇔ Outdegree power law ⇒ Eigenvalue law
26
Outline
• Why GridG?
• What is GridG?
• Topology generation
– Hierarchical vs. degree based?
– What are the relationships among the power laws of Internet topology?
• Annotation
– What are the intra- and inter-host correlations among the hosts and within a host?
– How to build the correlations into GridG?
• Conclusions and future work
27
Annotation Generator
• Distributions for attributes
– Example: Smith MDS trace for memory
• Intra-host correlation of attributes
– Example: Memory and CPU
• Inter-host correlations of attributes
– Example: cluster of identical machines
28
Intra-host correlations
• The memory size, architecture, CPU clock rate, number of CPUs, disk size, etc. all have certain distributions. These distributions are not independent, however.
– Example: a host with 64 CPUs is likely to have very large memory. Similarly, a host with a 3 GHz processor is likely to have more memory than a host with a 1 GHz processor.
• Many intra-host correlations are unknown
• GridG has heuristic rules and can be extended by the user
29
Heuristic Intra-host rules
• One processor will have memory between 64M and 4G
• More CPUs, more likely to have bigger memory and disk
• More memory, more likely to have bigger disk, and vice versa
• Windows machines won’t have more than 4 processors
• Machines with different architectures have different distributions of CPU clock rate
• Host load is not correlated to other attributes.
31
Inter-host correlations
• Hosts that are close to each other are likely to share some attributes.
• For example: OS concentration
– Every IP subnet we probed had a dominant OS
• OS concentration rule built into GridG
– User can disable
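One way an OS concentration rule could work is sketched below: each subnet first draws a dominant OS, and each host in the subnet then takes that OS with some probability. The function name, the `dominance` parameter, and the example distribution are hypothetical; the slides do not say how GridG implements the rule.

```python
import random


def assign_os_by_subnet(subnets, os_distribution, dominance=0.8, seed=0):
    """Give every host an OS so that each subnet has a dominant OS.

    subnets: mapping of subnet name to a list of host names.
    os_distribution: mapping of OS name to its overall weight.
    dominance: probability that a host takes its subnet's dominant OS.
    """
    rng = random.Random(seed)
    names = list(os_distribution)
    weights = [os_distribution[name] for name in names]
    assignment = {}
    for hosts in subnets.values():
        dominant = rng.choices(names, weights=weights, k=1)[0]
        for host in hosts:
            if rng.random() < dominance:
                assignment[host] = dominant
            else:
                # Fall back to the overall OS distribution.
                assignment[host] = rng.choices(names, weights=weights, k=1)[0]
    return assignment


subnets = {"net-a": ["a1", "a2", "a3"], "net-b": ["b1", "b2"]}
os_dist = {"Win2k": 0.5, "Solaris": 0.3, "FreeBSD": 0.2}
assignment = assign_os_by_subnet(subnets, os_dist)
```

Setting `dominance` to 0 makes every host draw independently from the overall distribution, which corresponds to disabling the rule.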
32
Annotation Algorithm : Basic
• Based on the dependence tree, make the grid conform to the correlations by applying conditional probability
– Choose the distribution of an attribute based on the attributes picked before it.
• For example: first choose the architecture according to a distribution, then choose the number of CPUs based on it, and finally choose the size of memory based on the previous two choices.
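The architecture, then CPU count, then memory example can be sketched as a chain of conditional draws. All distributions below are invented placeholders, not GridG's measured data; the only constraint taken from the slides is the heuristic of 64 MB to 4 GB of memory per processor.

```python
import random

# Placeholder distributions (assumptions for illustration only).
ARCH_DIST = {"IA32": 0.7, "SPARC32": 0.2, "MIPS": 0.1}
NUM_CPUS_GIVEN_ARCH = {
    "IA32": {1: 0.7, 2: 0.2, 4: 0.1},
    "SPARC32": {1: 0.3, 4: 0.4, 16: 0.3},
    "MIPS": {16: 0.5, 512: 0.5},
}
MEM_MB_PER_CPU_GIVEN_ARCH = {
    "IA32": [64, 128, 256, 512, 1024],
    "SPARC32": [256, 512, 1024, 2048],
    "MIPS": [128, 512, 2048, 4096],
}


def pick(dist, rng):
    """Draw one key from a {value: weight} mapping."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values], k=1)[0]


def annotate_host(rng):
    """Walk the dependence chain: arch, then CPUs given arch, then memory."""
    arch = pick(ARCH_DIST, rng)
    num_cpus = pick(NUM_CPUS_GIVEN_ARCH[arch], rng)
    # Memory depends on both earlier choices: per-CPU size by arch, times CPUs.
    mem_mb = num_cpus * rng.choice(MEM_MB_PER_CPU_GIVEN_ARCH[arch])
    return {"arch": arch, "num_cpus": num_cpus, "mem_mb": mem_mb}


rng = random.Random(42)
hosts = [annotate_host(rng) for _ in range(5)]
```

Because each draw is conditioned on the earlier ones, combinations such as a 512-CPU host with 256 MB of memory cannot occur.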
33
Annotation Algorithm: user rules
• Users can add rules to GridG, for example: “all hosts with N or more processors will have more than N*1024 MB of memory”.
• User rules appear as Perl functions.
• User can also configure the distribution of host attributes in the config file.
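GridG's user rules are Perl functions, and their exact interface is not shown in these slides. The Python analogue below only illustrates the idea: a rule is a predicate over a generated host, and hosts are regenerated until every rule holds. Rejection sampling is a technique chosen here for simplicity, and all names are hypothetical.

```python
import random


def min_memory_rule(n_threshold):
    """Rule: hosts with n_threshold or more CPUs need > n_threshold * 1024 MB."""
    def rule(host):
        if host["num_cpus"] >= n_threshold:
            return host["mem_mb"] > n_threshold * 1024
        return True
    return rule


def generate_with_rules(generate, rules, max_attempts=1000):
    """Call generate() until the host satisfies every rule."""
    for _ in range(max_attempts):
        host = generate()
        if all(rule(host) for rule in rules):
            return host
    raise RuntimeError("no host satisfied all rules")


rng = random.Random(0)


def toy_generator():
    # Stand-in for the annotator; the value lists are arbitrary.
    return {
        "num_cpus": rng.choice([1, 4, 16]),
        "mem_mb": rng.choice([512, 8192, 65536]),
    }


host = generate_with_rules(toy_generator, [min_memory_rule(4)])
```

A rule returns True for hosts it does not apply to, so several rules can be combined without interfering with one another.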
34
Examples: Silly hosts
Host  Num CPU  Clock rate  Mem (MB)  Disk (GB)  Arch     OS       OS vendor
1     512      1200        256       40         IA32     DUX      Sun
2     16       1000        512       800        PARISC   NetBSD   Microsoft
3     4        1600        512       160        SPARC32  DUX      RedHat
4     1        1800        65536     400        IA32     Solaris  Microsoft
Hosts generated without considering intra-host correlations; each attribute follows its own distribution.
35
Examples: Sensible hosts
Host  Num CPU  Clock rate  Mem (MB)  Disk (GB)  Arch     OS       OS vendor
1     512      1200        65536     10240      MIPS     FreeBSD  FreeBSD
2     16       1000        8192      800        PARISC   NetBSD   NetBSD
3     4        1600        1024      160        SPARC32  Solaris  Sun
4     1        1800        512       80         IA32     Win2k    Microsoft
Hosts generated with intra-host correlations taken into account.
36
Open questions
• What are the real distributions of host attributes?
• What are the real intra- and inter-host correlations?
Difficult to answer without measurement data
Difficult to acquire measurement data (see paper)
We would appreciate your help!
37
Conclusions
1. We have presented GridG, a tool kit for generating synthetic computational grids.
2. The topology generation component can produce structured network topologies that obey the power laws of Internet topology.
3. The annotation generation component of GridG is built upon Internet measurements and a set of heuristic rules.
38
Conclusions
4. While developing GridG’s topology generator, we discovered an interesting relationship among the power laws, and proposed a new one that better fits the data.
5. While measuring the Internet, we found the OS concentration phenomenon and built it into GridG as a user option.
39
For More Information
GridG is released online at:
• http://www.cs.northwestern.edu/~urgis/GridG
• http://www.cs.northwestern.edu/~urgis
Related RGIS project papers:
• “Nondeterministic queries in a Relational Grid Information Service”, in Proceedings of SC03.
• “Scoped and Approximate queries in a Relational Grid Information Service”, in Proceedings of Grid2003.