a survey on dynamic graph algorithms

A survey on the algorithms of dynamic graphs

By

Sayantani Dutta

A Graduate Paper

Submitted to the Faculty of

Mississippi State University

in Partial Fulfillment of the Requirements

for the Degree of Doctor of Philosophy

in Computer Science

in the Department of Computer Science and Engineering

Mississippi State, Mississippi

December 2014

I. INTRODUCTION

A dynamic graph is a graph that undergoes a sequence of updates, such as insertions or deletions of edges and vertices. An efficient dynamic graph algorithm updates the solution to a problem after every update operation, so that the solution does not have to be recomputed from scratch. Dynamic graphs can be directed or undirected and can be classified by the types of updates they allow. A fully dynamic graph allows unrestricted insertions and deletions of edges and vertices, whereas a partially dynamic graph allows either insertions or deletions, but not both. An incremental partially dynamic graph allows only insertions, and a decremental partially dynamic graph allows only deletions. [2]
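To make the three kinds of updates concrete, the hypothetical `DynamicGraph` class below accepts an update only if the graph's kind allows it. It is a minimal sketch that handles edge updates only; vertex insertions and deletions are omitted for brevity.

```python
class DynamicGraph:
    """Undirected graph whose allowed edge updates depend on its kind (sketch)."""

    # kind -> (insertions allowed, deletions allowed)
    KINDS = {
        "fully": (True, True),
        "incremental": (True, False),
        "decremental": (False, True),
    }

    def __init__(self, kind, edges=()):
        self.can_insert, self.can_delete = self.KINDS[kind]
        # frozenset makes (u, v) and (v, u) the same undirected edge
        self.edges = {frozenset(e) for e in edges}

    def insert(self, u, v):
        if not self.can_insert:
            raise ValueError("insertions are not allowed in a decremental graph")
        self.edges.add(frozenset((u, v)))

    def delete(self, u, v):
        if not self.can_delete:
            raise ValueError("deletions are not allowed in an incremental graph")
        self.edges.discard(frozenset((u, v)))
```

A fully dynamic graph accepts both operations; each partially dynamic kind rejects one of them.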

Dynamic graphs are used in communication networks, VLSI design, graphics, assembly planning, financial transaction networks, disease transmission networks, ecological food networks, sensor networks, gene regulatory networks, citation networks, protein-interaction networks, ground transportation networks, power distribution networks, computational phylogeny, web crawlers, and various others. [1, 2, 7]

In this survey, I discuss algorithms for directed and undirected dynamic graphs. I also discuss the challenges faced by the algorithms used in web search engines and by large dynamic graphs in general.

II. ALGORITHMS USED FOR UNDIRECTED GRAPHS

Most algorithms for undirected dynamic graphs rely on decomposing or partitioning the graph. The three main tools are clustering, sparsification, and randomization. [2,7]

Clustering: Clustering decomposes the graph into clusters so that each update involves only a handful of clusters. Clusters are suitably chosen collections of connected subgraphs. The decomposition is generally applied recursively, and the information about the subgraphs is combined in topology trees that maintain properties of the dynamically changing structure. Ambivalent data structures are also used in clustering: each edge may belong to multiple groups, only one of which is selected, depending on the topology of the spanning tree. Clustering partitions the vertex set into subtrees that are connected in a designated spanning tree, so that each subtree is adjacent to only a few other subtrees. The recursive partitioning of the spanning tree is represented using two-dimensional topology trees that maintain information about the edges in the spanning tree.
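The partitioning step can be illustrated with a simplified, hypothetical stand-in: a greedy bottom-up pass that splits a spanning tree into vertex-disjoint connected clusters of at most `z` vertices. The names `partition_tree` and `z` are illustrative; unlike Frederickson's actual partition, this sketch only bounds cluster size from above and does not keep clusters from becoming too small.

```python
def partition_tree(adj, root, z):
    """Split a tree into vertex-disjoint connected clusters of at most z vertices.

    adj: dict mapping each vertex to a list of its tree neighbours.
    A vertex absorbs its children's open clusters while the result stays within
    z vertices; a child's cluster that does not fit is closed as it is.
    """
    parent = {root: None}
    order = []  # DFS preorder: every parent appears before its children
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for w in adj[v]:
            if w != parent[v]:
                parent[w] = v
                stack.append(w)

    clusters = []
    open_cluster = {}  # v -> vertices of the still-open cluster containing v
    for v in reversed(order):  # children are processed before their parent
        current = [v]
        for w in adj[v]:
            if w == parent[v]:
                continue
            child = open_cluster.pop(w)
            if len(current) + len(child) <= z:
                current.extend(child)  # stays connected through tree edge (v, w)
            else:
                clusters.append(child)  # close the child's cluster
        open_cluster[v] = current
    clusters.append(open_cluster.pop(root))
    return clusters
```

Every cluster contains the vertex it was opened at and only vertices reached through tree edges, so each cluster is connected in the spanning tree, and an edge update touches only the clusters containing its endpoints.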

The update time of fully dynamic algorithms based on a single level of clustering is $O(m^{2/3})$, but if the partition is applied recursively using two-dimensional topology trees, the update time becomes $O(m^{1/2})$, where $m$ is the number of edges in the graph.

According to Frederickson’s theorem, “The minimum spanning forest of an undirected graph can be maintained in time $O(m^{1/2})$ per update, where $m$ is the current number of edges in the graph.” The same bound holds for fully dynamic connectivity and 2-edge connectivity. The main drawback of this type of clustering is that it is highly problem-dependent, which makes it very difficult to use as a black box.

Figure 1: Clustering of nodes in a graph

Sparsification: Sparsification is a divide-and-conquer technique that can be used as a black box to design and dynamize graph algorithms. It reduces the dependence of the running time on the number of edges, so that the time bound for maintaining a property of a graph matches the time needed to compute it on a sparse graph: a time bound $T(n, m)$ for a graph with $n$ vertices and $m$ edges is sped up to $T(n, O(n))$, the time needed for a sparse graph. This requires the notion of a certificate.

For any graph property P and graph G, a certificate for G is a graph G’ such that G has property P if and only if G’ has property P; a certificate is sparse if it has $O(n)$ edges. The edges of a graph G with $n$ vertices and $m$ edges are partitioned into $O(m/n)$ subgraphs, each with $n$ vertices and $O(n)$ edges, and a sparse certificate is computed for each subgraph. Larger subgraphs are produced by merging the certificates in pairs, and these are made sparse again by computing their own certificates. The result is a balanced binary tree in which each node is represented by a sparse certificate. Each update therefore involves $O(\log(m/n))$ graphs with $O(n)$ edges each, instead of one graph with $m$ edges.

Sparsification comes in two variants. The first is used when no dynamic algorithm is available: a static algorithm recomputes the sparse certificate in each tree node affected by an edge update. If certificates can be found in time $O(m + n)$, this variant gives time bounds of $O(n)$ per update.
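For connectivity, a spanning forest is a sparse certificate: it has at most $n - 1$ edges and the same connected components as the graph. Assuming that certificate, the hypothetical `SparsificationTree` below sketches the first (static-recompute) variant: edges are split into groups of about $n$ edges at the leaves of a balanced binary tree, each internal node stores the certificate of the union of its children's certificates, and an update recomputes only the ancestors of one leaf. It is a simplified sketch: the tree never grows or rebalances, and leaf groups can exceed $n$ edges after many insertions.

```python
def spanning_forest(n, edges):
    """Certificate for connectivity: a spanning forest of (V, edges), <= n - 1 edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v))
    return forest


class SparsificationTree:
    """Static-recompute sparsification for connectivity (simplified sketch)."""

    def __init__(self, n, edges):
        self.n = n
        groups = [list(edges[i:i + n]) for i in range(0, len(edges), n)] or [[]]
        self.size = 1
        while self.size < len(groups):
            self.size *= 2
        self.groups = groups + [[] for _ in range(self.size - len(groups))]
        self.where = {e: i // n for i, e in enumerate(edges)}  # edge -> leaf index
        self.cert = [[] for _ in range(2 * self.size)]  # heap layout, root at 1
        for i, g in enumerate(self.groups):
            self.cert[self.size + i] = spanning_forest(n, g)
        for node in range(self.size - 1, 0, -1):
            self._merge(node)

    def _merge(self, node):
        # Two certificates of <= n - 1 edges each: their union is still sparse.
        both = self.cert[2 * node] + self.cert[2 * node + 1]
        self.cert[node] = spanning_forest(self.n, both)

    def _refresh(self, leaf):
        node = self.size + leaf
        self.cert[node] = spanning_forest(self.n, self.groups[leaf])
        node //= 2
        while node >= 1:  # only the O(log(m/n)) ancestors of the leaf change
            self._merge(node)
            node //= 2

    def insert(self, u, v):
        leaf = min(range(self.size), key=lambda i: len(self.groups[i]))
        self.groups[leaf].append((u, v))
        self.where[(u, v)] = leaf
        self._refresh(leaf)

    def delete(self, u, v):
        leaf = self.where.pop((u, v))
        self.groups[leaf].remove((u, v))
        self._refresh(leaf)

    def is_connected(self):
        # The root certificate is connected iff the whole graph is.
        return len(self.cert[1]) == self.n - 1
```

`is_connected` inspects only the root certificate, which has at most $n - 1$ edges however many edges the graph has.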

When a dynamic algorithm is available, the second variant is used, in which certificates are maintained with a dynamic data structure. This requires a stability property of certificates, ensuring that a small change in the input graph does not lead to a large change in the certificates. This variant transforms time bounds of the form $O(m^p)$ into $O(n^p)$.

A time bound $T(n)$ is said to be well-behaved if, for some $c < 1$, $T(n/2) < cT(n)$; this assumption rules out time bounds that fluctuate wildly as $n$ changes.
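As a quick check of the definition (a worked example, not taken from the source), a polynomial bound is well-behaved, while a logarithmic one is not, because its ratio $T(n/2)/T(n)$ approaches 1 and so exceeds any fixed $c < 1$ for large $n$:

```latex
T(n) = n^{p},\ p > 0:\qquad
T(n/2) = 2^{-p}\,n^{p} = 2^{-p}\,T(n) < c\,T(n)
\quad \text{for any } c \in (2^{-p}, 1).

T(n) = \log_2 n:\qquad
\frac{T(n/2)}{T(n)} = \frac{\log_2 n - 1}{\log_2 n} \longrightarrow 1 \quad (n \to \infty).
```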

The first variant is formalized by the following theorem. Let P be a property for which sparse certificates can be found in time $f(n, m)$ for some well-behaved $f$, and for which a data structure for testing P can be constructed in time $g(n, m)$ and can answer queries in time $q(n, m)$. Then there is a fully dynamic data structure for testing whether a graph has property P, for which edge insertions and deletions can be performed in time $O(f(n, O(n))) + g(n, O(n))$ and for which the query time is $q(n, O(n))$.

A similar theorem covers the second variant. Let P be a property for which stable sparse certificates can be maintained in time $f(n, m)$ per update, where $f$ is well-behaved, and for which there is a data structure for property P with update time $g(n, m)$ and query time $q(n, m)$. Then P can be maintained in time $O(f(n, O(n))) + g(n, O(n))$ per update, with query time $q(n, O(n))$.

Sparsification can be applied to minimum spanning forests and to edge and vertex connectivity. It can also be used orthogonally to various other data structures, and combining clustering with sparsification can produce efficient dynamic graph algorithms.

Randomization: Randomization helps achieve faster update times. The algorithm of Henzinger and King combines three ideas: maintaining spanning forests, random sampling, and graph decomposition. The trees of the spanning forest are maintained with the Euler tour data structure, which gives logarithmic update and query times within the forest. In random sampling, when a tree edge is deleted, non-tree edges are sampled at random to find a replacement for the deleted edge. Graph decomposition is combined with randomization: the edges of G are partitioned into $O(\log n)$ edge-disjoint subgraphs, which are hierarchically ordered. The higher levels contain tightly connected portions (dense edge cuts) and the lower levels contain loosely connected portions (sparse edge cuts). For each level, a spanning forest of the graph defined by all the edges in that level or below is also maintained.
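The Euler tour representation stores each tree as the sequence of directed arcs visited by a depth-first traversal, so every tree edge appears twice. Deleting a tree edge cuts that sequence: the arcs strictly between the two copies of the edge form the tour of the detached subtree, and the remaining arcs form the tour of the rest of the tree. The sketch below illustrates this with plain Python lists, where slicing costs $O(n)$; the actual Euler tour data structure keeps the sequence in a balanced search tree so that such splits and concatenations take $O(\log n)$ time. The function names are hypothetical.

```python
def euler_tour(adj, root):
    """Euler tour of a tree: the directed arcs met by a DFS, each tree edge twice."""
    tour = []

    def visit(u, parent):
        for w in adj[u]:
            if w != parent:
                tour.append((u, w))  # go down the edge
                visit(w, u)
                tour.append((w, u))  # come back up

    visit(root, None)
    return tour


def cut(tour, u, v):
    """Delete tree edge {u, v}; return (outer_tour, inner_tour, parent, child)."""
    i, j = tour.index((u, v)), tour.index((v, u))
    if i > j:  # make (u, v) the downward arc, so v is the child
        i, j = j, i
        u, v = v, u
    inner = tour[i + 1:j]            # tour of the subtree hanging below v
    outer = tour[:i] + tour[j + 1:]  # tour of the rest of the tree
    return outer, inner, u, v


def tour_vertices(tour, anchor):
    """Vertex set of a tree from its tour; a single-vertex tree has an empty tour."""
    return {x for arc in tour for x in arc} | {anchor}
```

After a cut, `tour_vertices(outer, parent)` and `tour_vertices(inner, child)` give the vertex sets of the two new trees, which is what a replacement-edge search needs.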

The goal is an update time of $O(\log^3 n)$. After a tree edge $e$ is deleted, $O(\log^2 n)$ sampled non-tree edges are examined for a replacement. However, if the candidate set of $e$ (the non-tree edges that could replace it) is only a small fraction of all non-tree edges adjacent to the tree, a replacement is unlikely to be found in such a small sample. If no candidate is found among the sampled edges, all non-tree edges adjacent to the tree must be checked explicitly.

After random sampling has failed to produce a replacement edge, this explicit check is necessary; otherwise, correct answers to queries could not be guaranteed. Since there may be many edges adjacent to the tree T, the check can be time-consuming, so the randomized algorithm must make it a low-probability event. This alone admits pathological update sequences, however: deleting all edges in a relatively small candidate set, reinserting them, deleting them again, and so on will almost surely trigger many of these expensive checks. The graph decomposition is used to prevent this undesirable behavior.
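The sample-then-scan replacement search described above can be sketched as follows. This is a simplified, hypothetical illustration: `find_replacement` receives the non-tree edges incident to one of the two trees as a plain list (in the actual algorithm they are reached through the Euler tour structure), the sample size $\lceil (\log_2 n)^2 \rceil$ is an arbitrary choice within $O(\log^2 n)$, and the level-based graph decomposition is omitted.

```python
import math
import random


def find_replacement(side, adjacent_nontree, n, rng=random):
    """Search for a non-tree edge with exactly one endpoint in `side`.

    side: vertex set of one of the two trees left after deleting a tree edge.
    adjacent_nontree: list of the non-tree edges incident to that tree.
    """
    def crosses(e):
        return (e[0] in side) != (e[1] in side)

    # Phase 1: O(log^2 n) random samples (cheap, but may miss rare candidates).
    samples = math.ceil(math.log2(max(n, 2)) ** 2)
    if adjacent_nontree:
        for _ in range(samples):
            e = rng.choice(adjacent_nontree)
            if crosses(e):
                return e, "sampled"

    # Phase 2: explicit scan of every adjacent non-tree edge, needed for correctness.
    for e in adjacent_nontree:
        if crosses(e):
            return e, "scanned"
    return None, "none"
```

Returning `(None, "none")` means the deleted edge was a bridge, so the tree really splits; the explicit scan in phase 2 is the expensive event the algorithm tries to make unlikely.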

According to the theorem of Henzinger and King, let G be a graph with $m_0$ edges and $n$ vertices subject to edge deletions only. A spanning forest of G can be maintained in $O(\log^3 n)$ expected amortized time per deletion, if there are at least $\Omega(m_0)$ deletions. The time per query is $O(\log n)$.

III. ALGORITHMS FOR DIRECTED DYNAMIC GRAPHS

The main tools for directed dynamic graphs are Kleene closures, locality, matrices, and long paths.

Kleene Closures: Path problems such as transitive closure and shortest paths are tightly related to matrix sum and matrix multiplication over a closed semiring. The transitive closure of a directed graph can be obtained from the adjacency matrix of the graph via operations on the semiring of Boolean matrices. Similarly, the shortest path distances in a directed graph with real-valued edge weights can be obtained from the weight matrix of the graph via operations on the semiring of real-valued matrices, where min plays the role of sum and + the role of product. The distance matrix of the graph is the Kleene closure of its weight matrix. This Kleene closure can be computed by either recursive decomposition or logarithmic decomposition.
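As a static illustration of the logarithmic decomposition, the closure $X^* = I \oplus X \oplus X^2 \oplus \cdots$ can be computed by repeatedly squaring $I \oplus X$ until paths of up to $n - 1$ edges are covered, which takes about $\log_2 n$ matrix products. The same code gives the transitive closure over the Boolean semiring and all-pairs distances over the (min, +) semiring. This is a hypothetical sketch (`mat_mult`, `kleene_closure`, `BOOLEAN`, and `MIN_PLUS` are illustrative names); it recomputes from scratch, which is exactly what a dynamic algorithm avoids, and it assumes there are no negative-weight cycles.

```python
import operator

INF = float("inf")


def mat_mult(A, B, add, mul, zero):
    """Matrix product over a semiring given by (add, mul, zero)."""
    n = len(A)
    C = [[zero] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            for j in range(n):
                C[i][j] = add(C[i][j], mul(A[i][k], B[k][j]))
    return C


def kleene_closure(X, add, mul, zero, one):
    """X* by logarithmic decomposition: square (I + X) until n - 1 edges are covered."""
    n = len(X)
    Y = [[add(one if i == j else zero, X[i][j]) for j in range(n)] for i in range(n)]
    covered = 1  # Y accounts for all paths with at most `covered` edges
    while covered < n - 1:
        Y = mat_mult(Y, Y, add, mul, zero)
        covered *= 2
    return Y


# Boolean semiring -> transitive closure; (min, +) semiring -> distances.
BOOLEAN = (operator.or_, operator.and_, False, True)
MIN_PLUS = (min, operator.add, INF, 0)
```

For example, `kleene_closure(W, *MIN_PLUS)` turns a weight matrix `W` (with `INF` for missing edges) into a distance matrix.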

Locality: According to Demetrescu and Italiano, dynamic path problems can be solved by maintaining classes of paths characterized by local properties. A path π in a graph is locally shortest if and only if every proper subpath of π is a shortest path. A path is historical if it has been a shortest path at least once since it was last updated. A path π in a graph is locally historical if and only if every proper subpath of π is historical. For updates that are not fully dynamic, the following theorem holds: “Let G be a graph subject to a sequence of increase-only or decrease-only edge weight updates. Then the amortized number of paths that start or stop being locally shortest at each update is $O(n^2)$.” For general update sequences, Demetrescu and Italiano’s theorem states: “Let G be a graph subject to a sequence of update operations. If at any time throughout the sequence of updates there are at most $O(h)$ historical paths in the graph, then the amortized number of paths that become locally historical at each update is $O(h)$.”
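Checking whether a path is locally shortest does not require examining every proper subpath: since every subpath of a shortest path is itself shortest (assuming no negative cycles), it is enough to test the two maximal proper subpaths, obtained by dropping the last or the first vertex. The hypothetical helpers below do this given a weight dictionary and a precomputed distance matrix; in the test graph, the path 0, 1, 2 is locally shortest without being a shortest path.

```python
INF = float("inf")


def path_weight(w, path):
    """Total weight of a path given as a vertex list; w maps (u, v) to a weight."""
    return sum(w[(a, b)] for a, b in zip(path, path[1:]))


def is_shortest(w, dist, path):
    return path_weight(w, path) == dist[path[0]][path[-1]]


def is_locally_shortest(w, dist, path):
    """True iff every proper subpath of `path` is a shortest path."""
    if len(path) <= 2:  # a single edge: its proper subpaths are single vertices
        return True
    # All other proper subpaths lie inside one of these two maximal ones.
    return is_shortest(w, dist, path[:-1]) and is_shortest(w, dist, path[1:])
```

Every shortest path is locally shortest, but not conversely, which is why the class of locally shortest paths can be larger than the set of shortest paths.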

Matrices: A matrix subject to dynamic changes is a useful data structure for keeping information about the paths in dynamic directed graphs. Since Kleene closures can be constructed by evaluating polynomials over matrices, it is natural to consider data structures for maintaining polynomials of matrices subject to updates of entries.
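The simplest instance of this idea is maintaining the product $P = AB$, a degree-two matrix polynomial, under single-entry updates: changing $A_{ik}$ by $\delta$ changes only row $i$ of $P$, by $\delta$ times row $k$ of $B$, so each update costs $O(n)$ instead of a full recomputation. The hypothetical `DynamicProduct` class below works over the integers; using counts rather than Booleans is what allows a deletion to be undone by subtraction. With $A = B$ equal to an adjacency matrix, $P_{ij}$ counts the paths of two edges from $i$ to $j$.

```python
class DynamicProduct:
    """Maintain P = A * B for square integer matrices under single-entry updates."""

    def __init__(self, A, B):
        self.A = [row[:] for row in A]
        self.B = [row[:] for row in B]
        n = len(A)
        self.P = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
                  for i in range(n)]

    def update_A(self, i, k, value):
        delta = value - self.A[i][k]
        self.A[i][k] = value
        for j in range(len(self.P)):  # only row i of P changes: O(n)
            self.P[i][j] += delta * self.B[k][j]

    def update_B(self, k, j, value):
        delta = value - self.B[k][j]
        self.B[k][j] = value
        for i in range(len(self.P)):  # only column j of P changes: O(n)
            self.P[i][j] += self.A[i][k] * delta
```

When $A$ and $B$ are the same adjacency matrix, an edge update is applied to both copies, first with `update_A` and then with `update_B`.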

Long paths: Long paths in a graph have an intuitive combinatorial property: a long path is very likely to contain a vertex from a random sample, so long paths can be found by combining short searches. According to the theorem of Ullman and Yannakakis, “Let $S \subseteq V$ be a set of vertices chosen uniformly at random. Then the probability that a given simple path has a sequence of more than $(cn \log n)/|S|$ vertices, none of which are from $S$, for any $c > 0$, is, for sufficiently large $n$, bounded by $2^{-\alpha c}$ for some positive $\alpha$.”
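A back-of-envelope calculation conveys the intuition behind this theorem, though it is not its proof. It assumes, as a simplification, that each vertex joins $S$ independently with probability $p = |S|/n$ and that $\log$ is the natural logarithm. A fixed run of $L$ consecutive path vertices then avoids $S$ with probability

```latex
\Pr[\text{run avoids } S] = (1 - p)^{L} \le e^{-pL},
\qquad p = \frac{|S|}{n},\quad L = \frac{c\,n \ln n}{|S|}
\quad\Longrightarrow\quad e^{-pL} = e^{-c \ln n} = n^{-c}.
```

A simple path has at most $n$ starting positions for such a run, so by the union bound the probability that some run of length $L$ avoids $S$ is at most $n \cdot n^{-c} = n^{1-c}$, which is small once $c > 1$.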

IV. CHALLENGES IN LARGE DYNAMIC GRAPHS AND WEB CRAWLERS

Managing large dynamic graphs and running web crawlers pose many challenges. [1, 6, 7, 8]

A distributed system should keep communication overhead to a minimum; this overhead depends on two factors: query latencies and replica maintenance. In real-time applications, read/write operations are latency-critical, and failing to keep latencies within acceptable limits may lead to the demise of those applications. Also, in a dynamically evolving graph, the cost of keeping replicas up to date may exceed the benefits of replication.

Load balancing across sites is required to prevent over-utilization or under-utilization of resources, and skewed replication decisions may lead to load imbalance.

Flash traffic, a burst of unexpected read/write requests issued to the system within a short period of time, is a very important problem.

All queries should be executed with very low latency, which means minimizing the number of pulls needed to gather the information required to answer a query. This property is called the fairness criterion; its value is at most 1, and for a real-time system it is always less than 1.

There are few efficient distributed-memory parallel implementations of even the simplest algorithms for sparse, arbitrary graphs.

Dynamic graphs can be enormous, with very limited (potentially abysmal) locality at all levels of the memory hierarchy, and they are highly unstructured and not partitionable.

The edges and vertices of these graphs may have types, and access patterns may be data-dependent.

Web crawlers face the problem of sampling web pages. A technique for sampling web pages uniformly at random could be used to estimate how many pages are on the web, how many of them are indexed by a search engine, the average length of a page, what percentage of web pages are homepages, how these properties change over time, and so on. Unfortunately, no such technique is known.

No random graph model is known that models the behavior of the web at both the page level and the host level.

Web search engines must also deal with duplicate or near-duplicate pages. Even though duplicate host detection is easier than mirror detection, there are a million different hosts, and comparing all pairs is simply infeasible.

Changes in query logs give rise to data-stream problems, in which two sequences, one increasing and the other decreasing, must be compared by reading each sequence only once.

The web contains many densely connected directed bipartite subgraphs; here, dense means that a subgraph contains at least a constant fraction of the edges of the corresponding complete bipartite subgraph.
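The density condition can be written as a simple ratio: the number of directed edges from the left side to the right side, divided by the product of the two side sizes (the edge count of the complete bipartite subgraph). The hypothetical helpers below compute it, with `alpha` standing for the constant fraction; they only test a given pair of vertex sets and do not address the harder problem of finding such subgraphs in the web graph.

```python
def bipartite_density(edges, left, right):
    """Fraction of the |left| * |right| possible directed edges left -> right present."""
    present = {(u, v) for u, v in edges if u in left and v in right}
    return len(present) / (len(left) * len(right))


def is_dense(edges, left, right, alpha):
    """True if the subgraph has at least an alpha fraction of the complete bipartite edges."""
    return bipartite_density(edges, left, right) >= alpha
```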

V. CONCLUSION

In this paper, I have surveyed the tools used to analyze directed and undirected dynamic graphs. The tools for directed graphs are Kleene closures, long paths, matrices, and locality, whereas the tools for undirected graphs are randomization, clustering, and sparsification. I have also given the time bounds of these algorithms, which are generally close to optimal, and described the challenges faced in analyzing dynamic and large graphs and in the algorithms used by web crawlers in web search engines.

VI. REFERENCES

[1] David A. Bader, “Petascale Computing for Large-Scale Graph Problems”, Georgia Tech College of Computing

[2] Carlos Castillo, Mauricio Marin, Andrea Rodriguez, Ricardo Baeza-Yates, “Scheduling Algorithms for Web Crawling”, Center for Web Research

[3] Camil Demetrescu, Irene Finocchi, Giuseppe F. Italiano, “Dynamic Graphs”, Chapter 1, CRC Press, 2001, pp. 1-20

[4] David Ediger, Karl Jiang, E. Jason Riedy, David A. Bader, “GraphCT: Multithreaded Algorithms for Massive Graph Analysis”, IEEE Transactions on Parallel and Distributed Systems, IEEE, 2012, pp. 1-11

[5] Oded Green, “High Performance Computing for Irregular Algorithms and Applications with an Emphasis on Big Data Analysis”, Georgia Institute of Technology, May 2014, pp. 1-280

[6] Monika R. Henzinger, “Algorithmic Challenges in Web Search Engines”, Internet Mathematics, Vol. 1, No. 1, 2003, pp. 115-126

[7] Jayanta Mondal, Amol Deshpande, “Managing Large Dynamic Graphs Efficiently”, Special Interest Group on Management of Data, Scottsdale, Arizona, ACM, May 2012, pp. 1-12

[8] Pak Chung Wong, Chaomei Chen, Carsten Gorg, Ben Shneiderman, John Stasko, Jim Thomas, “Graph Analytics - Lessons Learned and Challenges Ahead”, IEEE Computer Society, 2011, pp. 18-29

[9] Clustering image from: http://i11www.iti.uni-karlsruhe.de/_media/members/robert_goerke/clustering_titlelogo_onethird.jpg
