Chapter 3
Graphs, Trees, and Tours
Presented by Qibin Cai
Overview
Terminology in graph theory
Trees
- Minimum spanning tree (MST)
- Shortest path tree (SPT)
Tours
- TSP tours
Overview cont’d
Building trees
- Kruskal’s algorithm
- Prim’s algorithm
- Dijkstra’s algorithm
- Prim-Dijkstra algorithm
Building tours
- Nearest-neighbor algorithm
- Improved nearest-neighbor heuristic
- Divide and conquer strategy
Terminology
What is a graph?
Observation:
A graph is a set of points in a plane (or in 3-space) and a set of line segments (possibly curved), each of which either joins two points or joins a point to itself.
Some definitions: Graphs
- A, B, C, etc. are vertices (nodes).
- (A,X), (X,Y), etc. are edges.
- P, Q, Z is a cycle.
- The degree of a node is the number of edges at the node.
  – Degree of Y = 3, degree of C = 1.
[Figure: example graph on nodes A, B, C, D, P, Q, X, Y, Z]
Terminology cont’d
Definition in mathematical language?
A graph G = (V, E) is a mathematical structure consisting of two sets V and E. The elements of V are called vertices (or nodes), and the elements of E are called edges.
Digraph: a directed graph.
Terminology cont’d
Endpoints: a set of one or two vertices associated with each edge.
Loop: an edge whose two endpoints are the same. Also called a self-loop.
Parallel edges: a collection of two or more edges having identical endpoints. Also called a multi-edge.
Terminology cont’d
A graph is simple if it has no loops or parallel edges.
Most of our discussions will involve simple graphs. Sometimes, when we consider reliability, we will introduce parallel edges if the network has parallel links.
Terminology cont’d
The degree of a node: the number of edges in the graph that have the node as an endpoint (plus twice the number of self-loops).
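Indegree (digraphs): the number of edges directed into the node.
Outdegree (digraphs): the number of edges directed out of the node.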
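The degree rules can be checked with a small sketch. It assumes, purely for illustration, that a graph is stored as a list of `(u, v)` edge pairs; `degrees` and `in_out_degrees` are hypothetical helper names. A self-loop adds 2 to its node's degree, and in a digraph each edge `(u, v)` adds 1 to the outdegree of `u` and 1 to the indegree of `v`.

```python
from collections import Counter

def degrees(edges):
    """Degree of each node of an undirected multigraph given as (u, v) pairs.

    A self-loop (u, u) is counted twice, matching the definition above.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1  # when u == v this is the second count for the loop
    return dict(deg)

def in_out_degrees(arcs):
    """Indegree and outdegree of each node of a digraph given as (source, target) pairs."""
    indeg, outdeg = Counter(), Counter()
    for src, dst in arcs:
        outdeg[src] += 1
        indeg[dst] += 1
    return dict(indeg), dict(outdeg)

print(degrees([("A", "B"), ("B", "C"), ("B", "B")]))  # {'A': 1, 'B': 4, 'C': 1}
```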
Terminology cont’d
Adjacent vertices:
Two nodes are adjacent if there is an edge that has them as endpoints.
Incidence:
The relationship between an edge and its endpoints.
Terminology cont’d
Walk from vertex u to vertex v:
an alternating sequence of vertices and edges, representing a continuous traversal from vertex u to vertex v.
Trail: a walk with no repeated edges.
Path: a walk with no repeated vertices.
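The three definitions nest: every path is a trail, and every trail is a walk. A minimal sketch, assuming a simple graph where a walk is given as its vertex sequence (so consecutive vertices determine the edges) and that the caller has already checked consecutive vertices are adjacent; `classify_walk` is a hypothetical name.

```python
def classify_walk(vertices):
    """Classify a walk v0, v1, ..., vk in a simple graph as 'path', 'trail' or 'walk'.

    The most specific class is returned. Adjacency of consecutive vertices
    is assumed, not checked.
    """
    edges = [frozenset(pair) for pair in zip(vertices, vertices[1:])]
    if len(set(vertices)) == len(vertices):
        return "path"   # no repeated vertices, hence no repeated edges
    if len(set(edges)) == len(edges):
        return "trail"  # some vertex repeats, but no edge does
    return "walk"

print(classify_walk(["A", "B", "C", "A", "D"]))  # trail
```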
Terminology cont’d
Cycle:
a closed path with at least one edge.
Connected graph:
a graph in which every pair of distinct vertices has a walk between them.
Terminology cont’d
Subgraph: A graph \(G' = (N', A')\) is a subgraph of \(G = (N, A)\) if \(N' \subseteq N\) and \(A' \subseteq A\).
Component:
a maximal connected subgraph of a graph.
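Components can be found by breadth-first search: starting from any unvisited node, everything reachable from it forms one maximal connected subgraph. A minimal sketch, assuming nodes are given as a list and edges as `(u, v)` pairs; `components` is a hypothetical name. A graph is connected exactly when it has one component.

```python
from collections import deque

def components(nodes, edges):
    """Return the connected components (maximal connected subgraphs) as node sets."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:                      # breadth-first search from start
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

print(components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))  # [{1, 2, 3}, {4, 5}]
```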
Terminology cont’d
Isomorphism:
Two graphs \(G_1 = (V_1, E_1)\) and \(G_2 = (V_2, E_2)\) are isomorphic if there is a one-to-one, onto mapping \(f: V_1 \to V_2\) such that \((u, v) \in E_1\) if and only if \((f(u), f(v)) \in E_2\).
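For small simple graphs the definition can be tested directly by trying every one-to-one mapping. This brute-force sketch (hypothetical name `are_isomorphic`) takes O(n!) time and is only meant to make the "if and only if" condition concrete. Because \(|E_1| = |E_2|\) and \(f\) is a bijection, checking that every edge of \(E_1\) maps onto an edge of \(E_2\) is enough for the reverse direction too.

```python
from itertools import permutations

def are_isomorphic(v1, e1, v2, e2):
    """Brute-force isomorphism test for small simple graphs without duplicate edges."""
    if len(v1) != len(v2) or len(e1) != len(e2):
        return False
    edges2 = {frozenset(e) for e in e2}
    v1 = list(v1)
    for image in permutations(v2):
        f = dict(zip(v1, image))  # candidate one-to-one mapping f: V1 -> V2
        if all(frozenset((f[a], f[b])) in edges2 for a, b in e1):
            return True
    return False

# A 4-node chain relabeled is still a chain; a star is not a chain.
print(are_isomorphic("abcd", [("a", "b"), ("b", "c"), ("c", "d")],
                     [1, 2, 3, 4], [(3, 1), (1, 4), (4, 2)]))  # True
```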
Trees
Tree: a connected, simple graph without cycles.
Star: a tree in which only 1 node has degree greater than 1.
Chain: a tree in which no node has degree greater than 2.
Any tree with n nodes has n-1 edges.
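The edge count gives a convenient test: a graph on n nodes is a tree exactly when it is connected and has n - 1 edges. A minimal sketch built on that equivalence, with hypothetical helper names, checking the tree, star, and chain definitions:

```python
from collections import Counter

def is_tree(nodes, edges):
    """True if the graph is connected and has exactly n - 1 edges (hence no cycles)."""
    nodes = list(nodes)
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:                          # depth-first search from the first node
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)

def _degrees(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def is_star(nodes, edges):
    """A tree in which exactly one node has degree greater than 1."""
    return is_tree(nodes, edges) and sum(d > 1 for d in _degrees(edges).values()) == 1

def is_chain(nodes, edges):
    """A tree in which no node has degree greater than 2."""
    return is_tree(nodes, edges) and all(d <= 2 for d in _degrees(edges).values())
```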
Trees
A tree is a connected simple graph with no cycles, e.g.:
[Figure: example tree on nodes A, B, C, D, P, Q, X, Y, Z]
Star
A tree is a star if only 1 node has degree >1
[Figure: example star on nodes A, B, C, D, P, Q, X, Y, Z]
Chains
A chain is a tree with no nodes of degree >2
[Figure: example chain on nodes A, B, C, D, P, Q, X, Y, Z]
Weighted Graph
A graph G is weighted if there is a value associated with each edge (e.g., link speed, cost, etc.). The weight of edge \(e_i\) is \(w(e_i)\).
We often denote this graph (G, w). If \(G'\) is any subgraph of G, then \(w(G') = \sum_{e \in G'} w(e)\).
To optimize a connected graph, find the connected spanning subgraph with the minimum weight:
The Minimal Spanning Tree (MST)
Minimal Spanning Trees
Let G be a connected weighted graph.
A spanning subgraph includes all the nodes of G.
A tree T is a spanning tree of G if T is a spanning subgraph of G.
MST: A spanning tree of G whose total edge-weight is a minimum.
Finding the MST
Two algorithms Kruskal and Prim
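Kruskal builds the MST by examining the edges in ascending order of weight, keeping each edge that joins two separate components and discarding edges that would form a cycle.
Prim starts by selecting a node, then repeatedly adds the “least expensive edge” connecting the tree to a new node, and iterates until the tree is built.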
Example MST
Use of MSTs
Small design problems (few nodes)
Highly reliable links with low “downtime”, or a network that can tolerate unreliability
Nodes vs. reliability: as the number of nodes increases, reliability decreases (exponentially!)
Kruskal’s Algorithm (1956)
1. Check that the graph G is connected. If it is not connected, abort.
2. Sort the edges of the graph G in ascending order of weight.
3. Mark each node as a separate component.
4. Examine each of the sorted edges: if the edge connects two separate components, add it; otherwise, discard it.
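The four steps above can be sketched with a union-find structure, which answers "do these two nodes belong to separate components?" quickly. This is a minimal sketch with a hypothetical `kruskal` function on nodes `0..n-1`, edges given as `(weight, u, v)`. As a simplification, the connectivity check of step 1 is done at the end: a spanning tree must have exactly n - 1 edges.

```python
def kruskal(n, edges):
    """Kruskal's algorithm on nodes 0..n-1 with edges given as (weight, u, v).

    Returns the MST as (u, v, weight) triples; raises ValueError if G is disconnected.
    """
    parent = list(range(n))          # step 3: every node is its own component

    def find(x):                     # component representative, with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):    # step 2: ascending order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                 # step 4: joins two separate components, keep it
            parent[ru] = rv
            tree.append((u, v, w))
        # otherwise the edge would close a cycle, so it is discarded
    if len(tree) != n - 1:           # step 1, checked at the end
        raise ValueError("graph is not connected")
    return tree

print(kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]))
```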
Prim’s Algorithm (1957)
Input : a weighted connected graph G=(N,E).
Output : a minimum spanning tree T.
U = the set of nodes already in the MST.
V = the set of nodes that are NOT yet in the MST but are adjacent to nodes in U.
Prim’s Algorithm (cont’d)
1. Place any node in U, and update V.
2. Find the edge with the smallest weight that connects a node in V to a node in U.
3. Add that edge to the tree, and update U and V.
4. Repeat steps 2 and 3 until all nodes are included, i.e., |U| = |N|.
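A common way to carry out step 2 efficiently is to keep the candidate U-V edges in a binary heap. The sketch below, with the hypothetical name `prim` and an adjacency dictionary `node -> [(neighbor, weight), ...]`, follows the steps above. Heap entries whose far end has already joined U are skipped. Nodes must be mutually comparable (e.g. all ints) because ties in weight fall back to comparing nodes.

```python
import heapq

def prim(adj, start):
    """Prim's algorithm; adj maps node -> list of (neighbor, weight). Returns MST edges."""
    in_tree = {start}                                  # step 1: place a node in U
    heap = [(w, start, v) for v, w in adj[start]]      # candidate edges from U to V
    heapq.heapify(heap)
    tree = []
    while heap and len(in_tree) < len(adj):            # step 4: stop when |U| = |N|
        w, u, v = heapq.heappop(heap)                  # step 2: cheapest U-V edge
        if v in in_tree:
            continue                                   # stale entry: both ends in U
        in_tree.add(v)                                 # step 3: add edge, update U and V
        tree.append((u, v, w))
        for x, wx in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (wx, v, x))
    return tree

adj = {0: [(1, 1), (2, 3)], 1: [(0, 1), (2, 2), (3, 5)],
       2: [(1, 2), (0, 3), (3, 4)], 3: [(2, 4), (1, 5)]}
print(prim(adj, 0))  # [(0, 1, 1), (1, 2, 2), (2, 3, 4)]
```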
How to use Delite to Calculate MSTs
Invoke the code for Prim’s algorithm from the Design menu.
Select to produce a trace file
Demonstration
Tree Designs
Overview
• Square world
• Coordinate systems:
  - V & H
  - L & L
• MSTs do not scale
• Definitions: hops(n1, n2), average hops
Square World
We will create a little world with several properties that make it a nice place to work on network design problems.
The world is 1000 miles by 1000 miles. There is 1 type of transmission line, with a capacity of 1,000,000 bps. Given 2 sites, S1 at location \((x_1, y_1)\) and S2 at location \((x_2, y_2)\), the cost of a link between them is
($1000 + $10 × d) / month
where
\[ d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \]
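The square-world tariff translates directly into code. A minimal sketch with hypothetical names `link_cost` and `network_cost`; sites are `(x, y)` tuples in miles and results are dollars per month.

```python
import math

def link_cost(s1, s2):
    """Monthly cost ($) of one square-world link between sites s1 = (x1, y1) and s2 = (x2, y2)."""
    d = math.hypot(s1[0] - s2[0], s1[1] - s2[1])  # Euclidean distance in miles
    return 1000 + 10 * d

def network_cost(sites, links):
    """Total monthly cost of a design; sites maps name -> (x, y), links are name pairs."""
    return sum(link_cost(sites[a], sites[b]) for a, b in links)

sites = {"S1": (0, 0), "S2": (3, 4), "S3": (3, 0)}
print(link_cost(sites["S1"], sites["S2"]))                 # 1050.0
print(network_cost(sites, [("S1", "S2"), ("S2", "S3")]))   # 2090.0
```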
2 Coordinate Systems
We will use a problem generator to set up a series of network design problems. Before we can do this, we need to know something about the methods used to locate sites in the real world.
Vertical and horizontal (V&H)
- a grid of lines (or, more accurately, curves) is drawn;
- allows a simplified computation of distances.
Latitude and longitude (L&L)
- defined for all locations on the surface of the earth;
- the distance calculation is essentially an exercise in spherical geometry.
MSTs Do Not Scale
Why? First look at an example.
[Figure 3.2: An MST for 5 nodes (N1 to N5) in square world; MAX_UTIL = 0.6%]
[Figure 3.3: An MST for 10 nodes (N1 to N10) in square world; MAX_UTIL = 2.5%]
The network is beginning to have a leggy look, which means that the traffic is taking a circuitous route between its source and destination.
To quantify the legginess in the network, we make the following definitions.
Two Definitions
Definition 3.17: The number of hops between nodes n1 and n2 is the number of edges in the path chosen by the routing algorithm for the traffic flowing from n1 to n2. It is denoted by hops(n1, n2).
Definition 3.18: The average number of hops in a network, denoted by \(\overline{\mathrm{hops}}\), is
\[ \overline{\mathrm{hops}} = \frac{\sum_{n_1, n_2} \mathrm{Traffic}(n_1, n_2)\, \mathrm{hops}(n_1, n_2)}{\sum_{n_1, n_2} \mathrm{Traffic}(n_1, n_2)} \]
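Definition 3.18 can be evaluated directly for a tree, where the routing algorithm has only one path to choose. A minimal sketch with hypothetical names: it assumes traffic follows minimum-hop routes (found by breadth-first search; for a tree this is the unique path) and that traffic is a dictionary `(n1, n2) -> volume`. For a star with uniform traffic it gives (2n - 2)/n, e.g. 1.98 for 100 nodes, which matches the Star row of the later comparison.

```python
from collections import deque

def hops_from(adj, src):
    """Hop count from src to every reachable node, via breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_hops(adj, traffic):
    """Traffic-weighted average hop count (Definition 3.18)."""
    num = den = 0.0
    cache = {}                                   # one BFS per source node
    for (n1, n2), t in traffic.items():
        if n1 not in cache:
            cache[n1] = hops_from(adj, n1)
        num += t * cache[n1][n2]
        den += t
    return num / den

def uniform_traffic(nodes):
    return {(a, b): 1 for a in nodes for b in nodes if a != b}

star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(average_hops(star, uniform_traffic(star)))  # 1.6
```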
MSTs Do Not Scale (cont’d)
We summarize the values of \(\overline{\mathrm{hops}}\) below:
Number of nodes   \(\overline{\mathrm{hops}}\)
5                 1.8
10                3.1778
20                4.4158
50                8.5159
100               13.9479
The average number of hops grows past a reasonable level, so MSTs are not good solutions as the number of nodes and the traffic grow.
Next, we consider whether we can design better trees.
Shortest-Path Trees (SPT)
Definition 3.19: Given a weighted graph (G, w) and nodes n1 and n2, the shortest path from n1 to n2 is a path P such that \(\sum_{e \in P} w(e)\) is a minimum.
Definition 3.20: Given a weighted graph (G, w) and a node n1, a shortest-path tree rooted at n1 is a tree T such that, for any other node \(n_2 \in G\), the path from n1 to n2 in the tree T is a shortest path between the nodes.
Dijkstra’s Algorithm
1. Mark every node as unscanned and give each node a label of ∞.
2. Set the label of the root to 0 and the predecessor of the root to itself. The root will be the only node that is its own predecessor.
Dijkstra’s Algorithm (cont’d)
3. Loop until you have scanned all the nodes.
-Find the node n with the smallest label. Since the label represents the distance to the root we call it d_min.
-Mark the node as scanned.
-Scan all the adjacent nodes m and see if the distance to the root through n is better than the distance stored in the label of m. If it is, update the label and update pred[m]=n.
4. When the loop finishes, we have a tree stored in pred format rooted at root.
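A minimal sketch of the steps above, with the hypothetical name `dijkstra` and an adjacency dictionary `node -> [(neighbor, weight), ...]` with nonnegative weights. Instead of scanning all nodes for the smallest label, it keeps labels in a binary heap, which is a common implementation choice rather than something the steps require. Nodes must be mutually comparable because equal labels fall back to comparing nodes.

```python
import heapq
import math

def dijkstra(adj, root):
    """Shortest-path tree rooted at root; returns (label, pred) with pred[root] == root."""
    label = {n: math.inf for n in adj}     # step 1: every label starts at infinity
    pred = {n: None for n in adj}
    label[root], pred[root] = 0.0, root    # step 2: the root is its own predecessor
    scanned = set()
    heap = [(0.0, root)]
    while heap:                            # step 3: loop until all nodes are scanned
        d_min, n = heapq.heappop(heap)     # unscanned node with the smallest label
        if n in scanned:
            continue                       # outdated heap entry
        scanned.add(n)
        for m, w in adj[n]:
            if d_min + w < label[m]:       # better route to the root through n
                label[m] = d_min + w
                pred[m] = n
                heapq.heappush(heap, (label[m], m))
    return label, pred                     # step 4: tree stored in pred format

adj = {"A": [("B", 1), ("C", 3)], "B": [("A", 1), ("C", 1)], "C": [("A", 3), ("B", 1)]}
print(dijkstra(adj, "A"))
```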
SPT vs. MST
Lower utilization of the links
More cost
Important:
Smaller average number of hops
Actually, we will compare the star (a kind of SPT) with the MST, because …
Star vs. MST
If we run Dijkstra’s algorithm on a sparse graph, we will get a tree with a fair number of nodes not connected directly to the root.
If we run Dijkstra’s algorithm on a complete graph (exactly what we’re studying now), then we usually get a star.
Star vs. MST (cont’d)
Design name   \(\overline{\mathrm{hops}}\)   MAX_UTIL   Cost
MST           13.9479                        0.493      $325,516
Star          1.9800                         0.09       $453,861
Dijkstra’s algorithm produces much shorter paths, but it can produce very expensive networks. So the SPT is not good, either.
Is there some middle ground between MST and SPT?
Prim – Dijkstra Trees
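Algorithm: Label
1) Prim’s: \(\min_{neighbors} \mathrm{dist}(node, neighbor)\)
2) Dijkstra’s: \(\min_{neighbors} \big(\mathrm{dist}(root, neighbor) + \mathrm{dist}(neighbor, node)\big)\)
3) Prim-Dijkstra’s: \(\min_{neighbors} \big(\alpha \cdot \mathrm{dist}(root, neighbor) + \mathrm{dist}(neighbor, node)\big)\), where \(0 \le \alpha \le 1\)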
Prim – Dijkstra Trees (cont’d)
If \(\alpha = 0\), we build an MST.
If \(\alpha = 1\), we build an SPT.
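A minimal O(n²) sketch of the Prim-Dijkstra label on a complete graph of square-world sites, assuming Euclidean link lengths; `prim_dijkstra` is a hypothetical name. Here dist(root, neighbor) is taken as the length of the tree path from the root to the tree node, which makes \(\alpha = 0\) reproduce Prim's label and \(\alpha = 1\) reproduce Dijkstra's.

```python
import math

def prim_dijkstra(points, root=0, alpha=0.5):
    """Prim-Dijkstra tree on a complete graph; returns pred with pred[root] == root."""
    n = len(points)

    def dist(i, j):
        return math.dist(points[i], points[j])

    in_tree = [False] * n
    path_len = [0.0] * n        # length of the tree path from the root to each tree node
    label = [math.inf] * n
    pred = [None] * n
    label[root], pred[root] = 0.0, root
    for _ in range(n):
        node = min((i for i in range(n) if not in_tree[i]), key=lambda i: label[i])
        in_tree[node] = True
        if node != root:
            path_len[node] = path_len[pred[node]] + dist(pred[node], node)
        for m in range(n):
            if not in_tree[m]:
                cand = alpha * path_len[node] + dist(node, m)   # Prim-Dijkstra label
                if cand < label[m]:
                    label[m], pred[m] = cand, node
    return pred

pts = [(0, 0), (10, 0), (10, 1)]
print(prim_dijkstra(pts, 0, alpha=0.0))  # [0, 0, 1]: the MST chains the far sites
print(prim_dijkstra(pts, 0, alpha=1.0))  # [0, 0, 0]: the SPT links both to the root
```

Sweeping alpha between 0 and 1 traces the cost/hops trade-off shown in the table that follows.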
The average hops, link delay, and cost for various Prim-Dijkstra trees:
\(\alpha\)   Design   \(\overline{\mathrm{hops}}\)   Link delay   Cost
0 (MST)      N0       13.9479                        0.3066       $325,516
0.1          N1       10.5717                        0.1451       $280,162
0.2          N2       7.8640                         0.1067       $247,217
0.3          N3       6.7762                         0.0913       $243,551
0.4          N4       5.6679                         0.0746       $248,650
0.5          N5       4.6303                         0.0598       $253,579
0.6          N6       3.7063                         0.0467       $273,742
0.7          N7       3.0186                         0.0380       $295,012
0.8          N8       2.2879                         0.0277       $378,792
0.9          N9       1.9800                         0.0233       $453,861
Given a set of designs as large as the one in the table, how do we select the best?
Dominance among Designs
If we have a large set of designs, the problem is to decide which merit consideration and which should be discarded.
To help us do this we impose a partial ordering on the designs.
Definition 3.21:
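Given a set S and an operator \(\succ\) that maps \(S \times S \to \{\mathrm{TRUE}, \mathrm{FALSE}\}\), we call S a partially ordered set, or poset, if
1) For any \(s \in S\), \(s \succ s\) is FALSE.
2) For any \(s_1, s_2 \in S\), \(s_1 \ne s_2\), if \(s_1 \succ s_2\) is TRUE, then \(s_2 \succ s_1\) is FALSE.
3) If \(s_1 \succ s_2\) and \(s_2 \succ s_3\) are TRUE, then \(s_1 \succ s_3\) is TRUE.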
Dominance among Designs(cont’d)
Definition 3.22:
Suppose design D1 has cost C1 and performance P1. Suppose design D2 has cost C2 and performance P2. We will say D1 dominates D2, or \(D_1 \succ D_2\), if \(C_1 < C_2\) and \(P_1 > P_2\).
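Definition 3.22 can be applied mechanically to the cost and link-delay values from the Prim-Dijkstra table. In this sketch, lower link delay is treated as higher performance, so "P1 > P2" becomes "delay1 < delay2". The list `edges` is the directed dominance graph (an edge `(a, b)` means a dominates b), and the dominated designs are the ones with an incoming edge.

```python
# (cost in $, link delay) for the Prim-Dijkstra designs in the table
designs = {
    "N0": (325516, 0.3066), "N1": (280162, 0.1451), "N2": (247217, 0.1067),
    "N3": (243551, 0.0913), "N4": (248650, 0.0746), "N5": (253579, 0.0598),
    "N6": (273742, 0.0467), "N7": (295012, 0.0380), "N8": (378792, 0.0277),
    "N9": (453861, 0.0233),
}

def dominates(d1, d2):
    """D1 dominates D2: strictly cheaper AND strictly better (smaller link delay)."""
    (c1, delay1), (c2, delay2) = d1, d2
    return c1 < c2 and delay1 < delay2

# Directed dominance graph: an edge (a, b) means design a dominates design b
edges = [(a, b) for a in designs for b in designs
         if dominates(designs[a], designs[b])]
dominated = {b for _, b in edges}
print(sorted(dominated))  # ['N0', 'N1', 'N2']
```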
Dominance among Designs(cont’d)
Design   Dominates        Link delay   Cost
N0       (none)           0.3066       $325,516
N1       N0               0.1451       $280,162
N2       N0, N1           0.1067       $247,217
N3       N0, N1, N2       0.0913       $243,551
N4       N0, N1           0.0746       $248,650
N5       N0, N1           0.0598       $253,579
N6       N0, N1           0.0467       $273,742
N7       N0               0.0380       $295,012
N8       (none)           0.0277       $378,792
N9       (none)           0.0233       $453,861
Dominance among Designs(cont’d)
Show dominance relationships as a directed graph:
A directed graph is a graph G = (V, E) in which each edge e has been given an orientation. If the edge has endpoints v1 and v2, we denote it e = (v1, v2) when v1 is the source vertex.
Dominance among Designs(cont’d)
Think of the designs (N0,N1,N2,…,N9) as the nodes of a graph.
A directed edge runs from Ni to Nj if \(N_i \succ N_j\).
We can see that we don’t want to consider N0, N1, or N2: each of them is dominated by at least one other design.
[Figure: dominance graph on the designs N0 to N9]
Further Analysis of Prim-Dijkstra Trees
Given a pair of designs S1 and S2, neither of which dominates the other, one must be cheaper and the other must have lower delay.
After rejecting the dominated designs, we still have 7 designs left to choose from. One way to clarify their differences further is to discuss the marginal cost of delay:
\[ \frac{C_1 - C_2}{P_2 - P_1} \]
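As a worked example of the marginal cost of delay (C1 - C2)/(P2 - P1), this sketch compares two nondominated designs using the cost and link-delay values from the Prim-Dijkstra table, with P read as link delay. S1 is taken to be the more expensive, lower-delay design, so the result is the extra monthly cost paid per unit of delay removed. The helper name is hypothetical.

```python
def marginal_cost_of_delay(s1, s2):
    """(C1 - C2) / (P2 - P1) for designs given as (cost, delay) pairs."""
    (c1, p1), (c2, p2) = s1, s2
    return (c1 - c2) / (p2 - p1)

n3 = (243551, 0.0913)   # cheaper design
n4 = (248650, 0.0746)   # lower-delay design
print(round(marginal_cost_of_delay(n4, n3)))  # 305329: $5,099 buys 0.0167 less delay
```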
Using Delite to Produce Prim-Dijkstra Trees
Unlike Prim’s algorithm, the choice of the node at the center of the tree is important in the Prim-Dijkstra algorithm.
The value of \(\alpha\).
Create trace file.
Tours
Sometimes a tree is just too unreliable to be a good network design.
Tours are far more reliable than trees, yet have only 1 additional link.
In graph theory, a tour refers to a possible solution of the traveling salesman problem (TSP).
Tours (cont’d)
Definition 3.24:
Given a set of vertices \(\{v_1, v_2, \ldots, v_n\}\), a tour T is a set of n edges E such that each vertex v has degree 2 and the graph is connected.
Tours (cont’d)
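The number of tours is \((n-1)!/2\).
Proof:
1) Represent the tour as a permutation \((v_{t_1}, v_{t_2}, \ldots, v_{t_n})\).
2) There are n! such permutations, but each of the n cyclic rotations of a permutation, and the reverse of each, gives the same tour. So every tour is counted 2n times, giving \(n!/(2n) = (n-1)!/2\).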
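The counting argument can be checked by brute force for small n: turn every permutation into its set of n undirected edges, so that rotations and reversals collapse to the same tour, and count the distinct sets. `count_tours` is a hypothetical name and the check is only practical for n up to about 8.

```python
from itertools import permutations
from math import factorial

def count_tours(n):
    """Count distinct tours on vertices 0..n-1 (n >= 3) by enumerating permutations."""
    tours = set()
    for perm in permutations(range(n)):
        # the tour as an unordered set of undirected edges
        tours.add(frozenset(frozenset((perm[i], perm[(i + 1) % n])) for i in range(n)))
    return len(tours)

for n in range(3, 8):
    print(n, count_tours(n), factorial(n - 1) // 2)  # the two counts agree
```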
TSP
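Definition 3.25:
Given a set of vertices \(V = \{v_1, v_2, \ldots, v_n\}\) and a distance function \(d: V \times V \to \mathbb{R}\), the traveling salesman problem is to find the tour \(T = (v_{t_1}, v_{t_2}, \ldots, v_{t_n})\) such that
\[ \sum_{i=1}^{n} d(v_{t_i}, v_{t_{i+1}}) \]
is a minimum.
In this notation we identify \(v_{t_{n+1}}\) with \(v_{t_1}\).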
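For a handful of sites the TSP can be solved exactly by enumeration, which makes the objective concrete and shows why this does not scale. This sketch assumes d is the Euclidean distance between `(x, y)` points; the helper names are hypothetical. Vertex 0 is fixed in first position because rotations describe the same tour, leaving (n - 1)! orders to try.

```python
import math
from itertools import permutations

def tour_length(points, tour):
    """Sum of d(v_{t_i}, v_{t_{i+1}}), wrapping around so v_{t_{n+1}} = v_{t_1}."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))

def brute_force_tsp(points):
    """Exact TSP by enumeration with vertex 0 fixed first; only for very small n."""
    rest = min(permutations(range(1, len(points))),
               key=lambda r: tour_length(points, (0,) + r))
    best = (0,) + rest
    return best, tour_length(points, best)

pts = [(0, 0), (1, 1), (1, 0), (0, 1)]
print(brute_force_tsp(pts))  # the optimal tour goes around the unit square, length 4
```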
Reliability
Definition 3.26: The reliability of a network is the probability that the functioning nodes are connected by working links.
Assumption:
1) The probability of each node working is 1;
2) The probability of a link failing is p .
Typically, p is a small number.
3) There is no correlation between the link failures.
Reliability (cont’d)
For the 5-node tree:
[Figure: a 5-node tree on nodes a, b, c, d, e]
P(no_link_failures) = \((1-p)^4\)
P(failure) = \(1 - (1-p)^4\)
= \(4p - 6p^2 + 4p^3 - p^4 \approx 4p\)
Reliability (cont’d)
For the 5-node ring network:
[Figure: a 5-node ring on nodes a, b, c, d, e]
If q = 1 - p, then
\[ P_{ring}(failure) = 1 - q^5 - 5pq^4 = 10p^2q^3 + 10p^3q^2 + 5p^4q + p^5 \approx 10p^2q^3 \]
Reliability (cont’d)
p              4p (tree)          \(10p^2q^3\) (ring)
0.1            0.4                0.0729
0.01           0.04               0.00097
0.001          0.004              \(\approx 10^{-5}\)
\(10^{-4}\)    \(4 \times 10^{-4}\)   \(\approx 10^{-7}\)
…              …                  …
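Both failure formulas can be confirmed by enumerating all \(2^m\) up/down states of the m links under the assumptions above (nodes never fail; links fail independently with probability p). The tree and ring figures did not survive extraction, so the edge lists below are assumptions. This is harmless: a tree's failure probability depends only on its number of links, and every 5-node ring behaves the same. Function names are hypothetical.

```python
from itertools import product

def is_connected(nodes, edges):
    """Depth-first search check that the given edges connect every node."""
    nodes = list(nodes)
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)

def failure_probability(nodes, links, p):
    """Exact P(working links do not connect all nodes), enumerating all link states."""
    total = 0.0
    for up in product((True, False), repeat=len(links)):   # True = link works
        if not is_connected(nodes, [link for link, ok in zip(links, up) if ok]):
            failed = up.count(False)
            total += p ** failed * (1 - p) ** (len(links) - failed)
    return total

nodes = "abcde"
tree = [("a", "b"), ("a", "c"), ("c", "d"), ("d", "e")]
ring = [("a", "b"), ("b", "d"), ("d", "e"), ("e", "c"), ("c", "a")]
p, q = 0.1, 0.9
print(failure_probability(nodes, tree, p), 1 - q ** 4)                 # both about 0.3439
print(failure_probability(nodes, ring, p), 1 - q ** 5 - 5 * p * q ** 4)  # both about 0.08146
```

The exact ring value at p = 0.1 (about 0.081) is slightly above the leading-term approximation \(10p^2q^3 = 0.0729\) in the table; the two converge as p shrinks.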
Building Tours
TSP is NP-hard.
No polynomial-time algorithm for it is known.
We will use heuristic algorithms.
For our purposes, a heuristic algorithm for tours will build some tour, but not necessarily the optimal TSP tour.
Nearest-neighbor Algorithm
1. Start at a distinguished node we call root and set current_node = root.
2. Loop until we have all the nodes in the tour.
Now loop through the nodes and find the node closest to the current_node that is not in the tour. We call this best_node.
Create an edge between current_node and best_node.
Reset the current_node to the best_node.
3. Finally create an edge between the last node and the root to complete the tour.
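The three steps above can be sketched as follows, assuming sites are `(x, y)` points with Euclidean distances; `nearest_neighbor_tour` is a hypothetical name. The tour is returned as a visiting order, and the closing edge from the last node back to the root (step 3) is implicit.

```python
import math

def nearest_neighbor_tour(points, root=0):
    """Nearest-neighbor tour heuristic; returns the visiting order starting at root."""
    n = len(points)
    tour, in_tour = [root], {root}
    current = root                                          # step 1
    while len(tour) < n:                                    # step 2
        best = min((i for i in range(n) if i not in in_tour),
                   key=lambda i: math.dist(points[current], points[i]))
        tour.append(best)                                   # edge current -> best_node
        in_tour.add(best)
        current = best
    return tour   # step 3: the edge tour[-1] -> root closes the tour

pts = [(0, 0), (5, 0), (1, 0), (2, 0)]
print(nearest_neighbor_tour(pts))  # [0, 2, 3, 1]
```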
Nearest-neighbor Algorithm(cont’d)
Observation:
* Good (?): We are trying to produce a short tour, so we always move to the best possible next location.
* Bad (?): When we look at the figure produced, we can see that the lines may cross frequently.
* Can we improve it? How can we measure the goodness (creditability) of an algorithm?
Creditable Algorithms
If such a design were presented, a little analysis would let someone uncross the lines by hand, and the creditability of all the work would be brought into question.
Definition 3.27:
A heuristic optimization algorithm produces a creditable result if the result is a local optimum for the problem. Otherwise, it produces an uncreditable result.
Creditable Algorithms (cont’d)
Example:
Prim’s and Dijkstra’s algorithms solve the MST and SPT problems exactly, so their results are always creditable.
The nearest-neighbor tour-building algorithm frequently produces uncreditable designs.
Creditable Algorithms (cont’d)
Definition 3.28: A suite of network design problems S is a set of triples \((\mathrm{Location}_i, \mathrm{Traffic}_i, \mathrm{Cost}_i)\) for \(i = 1, \ldots, |S|\).
Definition 3.29: A creditability test is a program test(net, traffic, cost) that takes a network problem as input and returns FAIL if test() can manipulate net into another valid network of lower cost, and OK otherwise.
Definition 3.30: Given a suite of network design problems S, a design algorithm A, and a creditability test t(), the creditability of A on S is
\[ C_S = \frac{|\{net \in A(S) : t(net) = \mathrm{OK}\}|}{|S|} \]
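For tours, a natural creditability test is to look for a 2-opt move. Such a move replaces edges (a, b) and (c, d) by (a, c) and (b, d), which is exactly how crossing lines get uncrossed by hand. This sketch assumes that test; it is not necessarily the test used in the Appendix B code. Each problem in the suite is reduced to a list of `(x, y)` sites (traffic and cost are ignored), and `two_opt_test` and `creditability` are hypothetical names.

```python
import math

def two_opt_test(points, tour):
    """'FAIL' if some 2-opt move shortens the tour, otherwise 'OK'."""
    n = len(tour)

    def d(i, j):                      # distance between tour positions i and j
        return math.dist(points[tour[i]], points[tour[j]])

    for i in range(n - 1):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue              # these two edges share node tour[0]
            a, b, c, e = i, i + 1, j, (j + 1) % n
            if d(a, c) + d(b, e) < d(a, b) + d(c, e) - 1e-9:
                return "FAIL"         # a shorter valid tour exists
    return "OK"

def creditability(suite, algorithm, test):
    """C_S: fraction of problems whose design passes the creditability test."""
    passed = sum(test(points, algorithm(points)) == "OK" for points in suite)
    return passed / len(suite)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(two_opt_test(square, [0, 1, 2, 3]), two_opt_test(square, [0, 2, 1, 3]))  # OK FAIL
```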
Creditable Algorithms (cont’d)
First creditability test on the nearest-neighbor heuristic. (Refer to the code in Appendix B “Computing creditability of Simple Nearest Neighbor”)
The creditability list:
Sites   50 trials   500 trials   5000 trials
6       64.0%       59.4%        58.0%
8       56.0%       52.4%        47.94%
10      34.0%       37.8%        39.84%
15      22.0%       21.8%        22.62%
20      10.0%       13.8%        12.58%
30      8.0%        3.2%         3.84%
40      0.0%        0.8%         1.50%
A more creditable nearest-neighbor algorithm
It differs from the simpler algorithm in two ways:
First, the closest node doesn’t mean the closest node to the last added node; it means the closest node to any node in the partial tour we have built.
Second, when we add best_node to the tour, we don’t add it at the end of the tour; rather, we do a test to find the best place to add it, such that
\[ \mathrm{dist}(N_i, best\_node) + \mathrm{dist}(best\_node, N_j) - \mathrm{dist}(N_i, N_j) \]
is the smallest possible value among all nodes \(N_i\) and \(N_j\) that are adjacent in the partially built tour.
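A minimal sketch of the improved heuristic (often called nearest insertion), assuming Euclidean distances between `(x, y)` sites; `nearest_insertion_tour` is a hypothetical name. It keeps, for every site outside the tour, its distance to the closest tour node. That makes each step a few O(n) scans and the whole heuristic O(n²).

```python
import math

def nearest_insertion_tour(points, root=0):
    """Improved nearest-neighbor heuristic; returns the tour as a visiting order."""
    n = len(points)

    def d(a, b):
        return math.dist(points[a], points[b])

    tour = [root]
    # d_tour[v]: distance from site v to the closest site already in the tour
    d_tour = {v: d(v, root) for v in range(n) if v != root}
    while d_tour:
        best = min(d_tour, key=d_tour.get)      # closest to ANY node in the tour
        del d_tour[best]
        m = len(tour)
        # position k minimizing dist(Ni, best) + dist(best, Nj) - dist(Ni, Nj)
        k = min(range(m), key=lambda i: d(tour[i], best) + d(best, tour[(i + 1) % m])
                                        - d(tour[i], tour[(i + 1) % m]))
        tour.insert(k + 1, best)                # between tour[k] and its successor
        for v in d_tour:                        # O(n) update per step
            d_tour[v] = min(d_tour[v], d(v, best))
    return tour

print(nearest_insertion_tour([(0, 0), (1, 1), (1, 0), (0, 1)]))  # [0, 3, 1, 2]
```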
A more creditable nearest-neighbor algorithm (cont’d)
The increase in the creditability of tours built by the improved nearest-neighbor heuristic (simple -> improved):
Sites   50 trials        500 trials       5000 trials
6       64.0% -> 98.0%   59.4% -> 94.8%   58.0% -> 95.32%
8       56.0% -> 92.0%   52.4% -> 92.6%   47.94% -> 92.78%
10      34.0% -> 90.0%   37.8% -> 90.8%   39.84% -> 91.04%
15      22.0% -> 86.0%   21.8% -> 82.6%   22.62% -> 82.18%
20      10.0% -> 80.0%   13.8% -> 72.6%   12.58% -> 74.06%
30      8.0% -> 54.0%    3.2% -> 60.4%    3.84% -> 58.94%
40      0.0% -> 56.0%    0.8% -> 55.0%    1.50% -> 48.84%
Time Complexity
Simple nearest-neighbor algorithm: \(O(n^2)\)
Improved nearest-neighbor algorithm: \(O(2n^2) = O(n^2)\)
Both heuristics have the same computational complexity.
A related heuristic
The furthest-neighbor heuristic vs. the nearest-neighbor heuristic:
The nearest-neighbor heuristic tends to strand the furthest sites, because it always brings the site with the smallest dtour (distance to the partial tour) into the tour at the best position.
The furthest-neighbor heuristic instead brings the site with the largest value of dtour into the tour.
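A sketch of the furthest-neighbor variant under the same assumptions as a nearest-insertion tour (Euclidean `(x, y)` sites). It also assumes, since the text does not say otherwise, that the furthest site is inserted at the cheapest position. Only the selection rule changes: the site with the largest dtour comes in next, so outlying sites are placed early instead of being stranded. `farthest_insertion_tour` is a hypothetical name.

```python
import math

def farthest_insertion_tour(points, root=0):
    """Furthest-neighbor heuristic: add the site with the LARGEST distance to the tour."""
    n = len(points)

    def d(a, b):
        return math.dist(points[a], points[b])

    tour = [root]
    d_tour = {v: d(v, root) for v in range(n) if v != root}   # distance to the tour
    while d_tour:
        far = max(d_tour, key=d_tour.get)       # furthest site from the tour
        del d_tour[far]
        m = len(tour)
        k = min(range(m), key=lambda i: d(tour[i], far) + d(far, tour[(i + 1) % m])
                                        - d(tour[i], tour[(i + 1) % m]))
        tour.insert(k + 1, far)                 # cheapest insertion position
        for v in d_tour:
            d_tour[v] = min(d_tour[v], d(v, far))
    return tour

print(farthest_insertion_tour([(0, 0), (1, 1), (1, 0), (0, 1)]))  # [0, 2, 1, 3]
```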
TSP Tours Do Not Scale
Theorem 3.3: Given uniform traffic, any TSP tour of n nodes has \(\overline{\mathrm{hops}} = \frac{n+1}{4}\) if n is odd and \(\overline{\mathrm{hops}} = \frac{n^2}{4(n-1)}\) if n is even. (Proof in the text, p. 86.)
Comparison of the average number of hops for MST and TSP:
Number of nodes   MST \(\overline{\mathrm{hops}}\)   TSP \(\overline{\mathrm{hops}}\)
5                 1.8                               1.5
10                3.1778                            2.777
20                4.4158                            5.263
50                8.5159                            12.755
100               13.9479                           25.252
Intolerable!
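Theorem 3.3 can be checked numerically by counting hops on a ring directly. This sketch assumes traffic between ring positions i and j takes the shorter way around, i.e. min(|i - j|, n - |i - j|) hops, and averages over all ordered pairs with uniform traffic. Exact fractions avoid rounding; the printed values agree with the TSP column of the comparison table up to rounding. Function names are hypothetical.

```python
from fractions import Fraction

def ring_average_hops(n):
    """Average hops on an n-node ring with uniform traffic and shortest-way routing."""
    total = sum(min(abs(i - j), n - abs(i - j))
                for i in range(n) for j in range(n) if i != j)
    return Fraction(total, n * (n - 1))

def theorem_3_3(n):
    """(n + 1)/4 for odd n, n^2 / (4(n - 1)) for even n."""
    return Fraction(n + 1, 4) if n % 2 else Fraction(n * n, 4 * (n - 1))

for n in (5, 10, 20, 50, 100):
    assert ring_average_hops(n) == theorem_3_3(n)
    print(n, round(float(ring_average_hops(n)), 3))
```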
Tour Building and Delite
The Delite tool can build tours using both the improved nearest-neighbor heuristic and the furthest-neighbor heuristic.
Starting node.
Trace file.
Return to Graph Theory: 2-Connectivity
Connectivity: Two nodes i and j are connected if the graph contains at least one path from node i to node j.
A graph is connected if every pair of its nodes is connected.
Return to Graph Theory: 2-Connectivity (cont’d)
The attraction of tours is that they are 2-connected.
Definition 3.31:
Given a connected graph G=(V,E), the vertex v is an articulation point if removing the vertex and all the attached edges disconnects the graph.
Definition 3.32:
If a connected graph G=(V,E) has no articulation points, then the graph is 2-connected.
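Definitions 3.31 and 3.32 can be applied literally: delete each vertex in turn and check whether the rest is still connected. This brute-force sketch takes O(n(n + m)) time and is written for clarity; Hopcroft and Tarjan's depth-first-search method finds all articulation points in linear time. Function names are hypothetical.

```python
def is_connected(nodes, edges):
    """Do the edges connect all of `nodes`? Edges touching other vertices are ignored."""
    nodes = set(nodes)
    if not nodes:
        return True
    adj = {v: [] for v in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adj[a].append(b)
            adj[b].append(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes

def articulation_points(nodes, edges):
    """Vertices whose removal (with their attached edges) disconnects the graph."""
    nodes = set(nodes)
    return sorted(v for v in nodes if not is_connected(nodes - {v}, edges))

def is_two_connected(nodes, edges):
    """Definition 3.32: connected and without articulation points."""
    return is_connected(nodes, edges) and not articulation_points(nodes, edges)

ring = [(i, (i + 1) % 5) for i in range(5)]
print(is_two_connected(range(5), ring), articulation_points([1, 2, 3], [(1, 2), (2, 3)]))
# True [2]
```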
* Our goal is to produce a design that is 2-connected and that has fewer hops. To this end we quote a helpful theorem.
Return to Graph Theory: 2-Connectivity (cont’d)
Theorem 3.4: Suppose \(G_1 = (V_1, E_1)\) and \(G_2 = (V_2, E_2)\) are 2-connected graphs with \(V_1 \cap V_2 = \emptyset\). Let \(v_1, v_2 \in V_1\) and \(v_3, v_4 \in V_2\), with \(v_1 \ne v_2\) and \(v_3 \ne v_4\). Then the graph G with vertices \(V_1 \cup V_2\) and edges \(E_1 \cup E_2 \cup \{(v_1, v_3), (v_2, v_4)\}\) is 2-connected.
How can we use this theorem on joining together 2-connected graphs to help us with the hop count problem in tours?
The answer is a divide-and-conquer strategy.
Divide and Conquer
In the figure, we have a TSP tour on 20 nodes. The average number of hops is 5.263. We want to reduce the average hop count but keep the 2-connectivity.
[Figure: a TSP tour on the 20 nodes N1 to N20]
Divide and Conquer (cont’d)
1. Divide the 20 nodes into 2 “compact” clusters of 10 nodes each. Call these clusters C1 and C2.
(We might divide the 20 nodes by ranges of their coordinates, for example, to create the 2 clusters.)
2. Use the nearest-neighbor algorithm to design a TSP tour on each cluster.
3. Select \(v_1 \in C_1\) and \(v_2 \in C_2\) to be the 2 nodes such that the distance \(d(v_1, v_2)\) is the minimum.
4. Now select \(v_3 \in C_1 - \{v_1\}\) and \(v_4 \in C_2 - \{v_2\}\) to be the 2 nodes such that the distance \(d(v_3, v_4)\) is the minimum.
5. Add the edges (v1,v2), (v3,v4) to the design.
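A minimal sketch of steps 1 to 5, assuming Euclidean `(x, y)` sites and an even split by x-coordinate (one of the coordinate-range splits the text suggests); all function names are hypothetical. Each cluster tour is a cycle through at least 3 sites, and the two joining edges use distinct endpoints on each side, so Theorem 3.4 guarantees the result is 2-connected. The brute-force check confirms this on a sample.

```python
import math

def nn_tour_edges(points, ids):
    """Nearest-neighbor tour over the sites in ids (at least 3); returns its edges."""
    order, left = [ids[0]], sorted(ids[1:])
    while left:
        cur = order[-1]
        nxt = min(left, key=lambda i: math.dist(points[cur], points[i]))
        order.append(nxt)
        left.remove(nxt)
    return [(order[k], order[(k + 1) % len(order)]) for k in range(len(order))]

def divide_and_conquer(points):
    """Two-cluster 'ring of rings' following steps 1-5."""
    def d(pair):
        return math.dist(points[pair[0]], points[pair[1]])

    ids = sorted(range(len(points)), key=lambda i: points[i])       # by x, then y
    c1, c2 = ids[: len(ids) // 2], ids[len(ids) // 2:]              # step 1
    edges = nn_tour_edges(points, c1) + nn_tour_edges(points, c2)   # step 2
    v1, v2 = min(((a, b) for a in c1 for b in c2), key=d)           # step 3
    v3, v4 = min(((a, b) for a in c1 if a != v1 for b in c2 if b != v2), key=d)  # step 4
    return edges + [(v1, v2), (v3, v4)]                             # step 5

def is_two_connected(nodes, edges):
    """Brute force: connected, and still connected after deleting any one vertex."""
    def connected(vs):
        adj = {v: set() for v in vs}
        for a, b in edges:
            if a in vs and b in vs:
                adj[a].add(b)
                adj[b].add(a)
        start = next(iter(vs))
        seen, stack = {start}, [start]
        while stack:
            for w in adj[stack.pop()] - seen:
                seen.add(w)
                stack.append(w)
        return seen == vs
    nodes = set(nodes)
    return connected(nodes) and all(connected(nodes - {v}) for v in nodes)

pts = [((37 * i) % 100, (61 * i) % 100) for i in range(20)]
design = divide_and_conquer(pts)
print(len(design), is_two_connected(range(20), design))  # 22 True
```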
Divide and Conquer (cont’d)
Many questions about the procedure will be addressed later in the book. At this point, we will focus on the generalization to more clusters.
One approach to building networks composed of multiple 2-connected clusters can be stated in the following theorem.
Divide and Conquer (cont’d)
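Theorem 3.5: Suppose that \(G = (V, E)\) is a 2-connected graph with \(|V| > 2\). Suppose that each node \(v_i \in V\) is replaced by a 2-connected graph \(G_i = (V_i, E_i)\). Suppose each edge \(e = (u, w) \in E\) is replaced by an edge \(e'\) from \(u' \in G_u\) to \(w' \in G_w\). Then, if no 2 of these replacement edges have a common vertex, the graph
\[ H = \Big(\bigcup_i V_i,\; \bigcup_i E_i \cup E'\Big) \]
is a 2-connected graph, where \(E'\) is the set of replacement edges.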
Divide and Conquer (cont’d)
[Figure: the 2-connected graph G = (V, E)]
Divide and Conquer (cont’d)
[Figure: the resulting 2-connected graph H]
Summary
If the traffic is small when compared to link size, then the optimal networks are MSTs and TSP tours, depending on the reliability desired.
Neither MSTs nor TSP tours scale.
The growth in the average # of hops is at the heart of the problem. It’s better to build a Prim-Dijkstra tree or a “ring of rings” to control the length of the routes.
The notion of creditability.
The End
Thank you!