A PRACTICAL GUIDE TO DRAWING AND COMPUTING WITH COMPLEX NETWORKS

ROSS M. RICHARDSON

Abstract. These notes are an attempt to document various graph drawing resources which I have compiled in the course of my work. They are available to anyone who is interested, and I welcome comments and suggestions. They are very much a work in progress, and you are advised to check the last change date. I offer no warranty, implicit or explicit, and I make no claim as to the relevance of this information to your own computer system.

Last changed: September 28, 2006

Contents

1. Prerequisites
2. Graph Formats
    2.1. A Rogues Gallery
    2.2. Conversion
3. Graph Computing
    3.1. NetworkX
    3.2. Boost Graph Library
4. Graph Drawing
    4.1. Algorithms
    4.2. Presentation
    4.3. Worked Examples
    4.4. A Sample Drawing Code
5. Degree Distributions
    5.1. Visualization
    5.2. Powerlaw Exponent
    5.3. An example
6. A Sample Project
Acknowledgments
7. Appendix: Datasources.
References

The reader who enjoys presentations might enjoy a talk I gave on many of these topics. The relevant PDF file can be obtained at http://www.math.ucsd.edu/~rmrichar/talks/graph_drawing_talk.pdf (warning: 5MB).


1. Prerequisites

This guide assumes the reader is sufficiently familiar with computers and computer programming to be able to comprehend the code and procedures contained herein. The author does not intend in any way for this guide to serve as a method of instruction for learning these skills. However, for the reader already familiar with computer programming, we do hope to provide enough examples that the reader feels comfortable tinkering on their own. The documentation also assumes that the reader has access to the tools on their own system; this guide makes no effort to explain their installation. For those members of Fan Chung’s research group, I will try to the best of my abilities to make sure this guide corresponds to currently installed software on math107.

This guide includes code in Python, C++, and Maple.

The mathematical content in this guide is minimal, and should not be distracting to anyone for whom these notes might be of interest. That said, someone not familiar with the basic terminology of graph theory would do well to have a reference on hand. I suggest [14].

2. Graph Formats

For almost every graph tool out there, there is some sort of graph file format. Sadly, few apply generally to a large swath of graph drawing contexts. When discussing specific tools that require proprietary formats, we shall discuss the relevant formats. Here we just present a rogues gallery to help you quickly identify those files you come across in the wild. We also discuss some strategies for converting between the formats, which is often the most time-consuming task in any computing project.

2.1. A Rogues Gallery.

2.1.1. GraphXML. This is a newer format, based on XML (eXtensible Markup Language). It should not be confused with the custom XML format used in Lincoln Lu’s graph tools. We don’t currently have tools to use this format, but it is easy to recognize if you come across it. The basic syntax is simple; here is a sample:

    <?xml version="1.0"?>
    <!DOCTYPE GraphXML SYSTEM "file:GraphXML.dtd">
    <GraphXML>
      <graph>
        <node name="first" />
        <node name="second" />
        <edge source="first" target="second" />
      </graph>
    </GraphXML>

Figure 1. A GraphXML file.

For further reference, see [1].


2.1.2. Lincoln’s XML Format. This is an evolving format that I don’t feel very capable of documenting. Questions should be directed to Lincoln [2] or me if you have some reason to use this format.

Files which begin:

    <?xml version="1.0"?>
    <graph>
    <node id="v100">
    <point x="3.21" y = "9.18">
    </node>
    ...

are probably in Lincoln’s format.

2.1.3. Large Graph Layout. This is the input format to the Large Graph Layout drawing engine. Proper documentation can be found at the Large Graph Layout web site, found in the references [3].

LGL actually accepts a number of different file formats. The first of these, the .ncol file format, is a simple whitespace-delimited file in which each line names the two endpoints of an edge. Thus, to place an edge between Paul and Endre and another between Endre and Laszlo, a file would contain the lines:

    Paul Endre 3.2
    Endre Laszlo 4.5

Note here the optional edge weight following the two endpoints.

An .lgl file is somewhat different. It lists vertices first, followed by neighbors. Thus, the same relations would be represented as follows:

    # Endre
    Paul 3.2
    Laszlo 4.5

There are a few caveats to this file format. For use, please see the section on Large Graph Layout, or read the documentation found in the references.
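As a minimal sketch of reading the .ncol layout shown above, the following Python function turns such text into a list of edges. It assumes only what the example shows (two whitespace-separated endpoint names and an optional numeric weight); the name `parse_ncol` is hypothetical and is not part of LGL.

```python
def parse_ncol(text):
    """Parse .ncol-style text into a list of (u, v, weight) triples.

    weight is None when the optional third column is absent.
    """
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) not in (2, 3):
            raise ValueError("expected 2 or 3 columns: %r" % line)
        u, v = fields[0], fields[1]
        weight = float(fields[2]) if len(fields) == 3 else None
        edges.append((u, v, weight))
    return edges


sample = "Paul Endre 3.2\nEndre Laszlo 4.5\n"
ncol_edges = parse_ncol(sample)
print(ncol_edges)
```

Keeping the weight as `None` rather than defaulting to 1.0 leaves the decision about unweighted edges to the caller.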

2.1.4. Walrus. This is a strange one. If you see something akin to figure 2 you should back away very slowly. The file format is quite complicated. I refer the interested reader to the project site [4]. Good luck¹.

    Graph
    {
       ### metadata ###
       @name="IMDB1";
       @description=;
       @numNodes=2798;
       @numLinks=11135;
       @numPaths=0;
       @numPathLinks=0;
       ### structural data ###
       @links=[
          { 712; 0; },
          { 0; 735; },
          { 0; 2499; },
          { 0; 2744; },
          { 1; 2; },
          { 1; 942; },
          ...

Figure 2. Beware the Walrus.

    graph SD {
        OceanBeach -- PacificBeach [pos="1.0, 2.0"];
        PacificBeach -- LaJolla [pos="1.0, -3.0"];
        LaJolla -- ScrippsRanch [pos="-2.5, 1.2"];
        Hillcrest -- OceanBeach [pos="0.3,4.0"];
        MissionBeach -- OceanBeach [pos="2.7,4.2"];
        NationalCity;
    }

Figure 3. A simple DOT file.

2.1.5. Matlab. A .mat file is not human-readable. Say you have a file labeled graph.mat. In Matlab™² (or Octave), issue the following:

    >> load graph.mat
    >> whos
      Name      Size      Bytes  Class

      X         3x3          72  double array

    Grand total is 9 elements using 72 bytes

    >>

Here we see that graph.mat contained a 3 by 3 array labeled X.

Matlab is not explicitly a graph format, but very often graphs are stored as adjacency matrices or lists. The use of Matlab in this context is well outside the scope of the present section.
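Once such a matrix has been exported from Matlab by whatever means, turning it into an edge list is straightforward. The sketch below assumes a symmetric 0/1 adjacency matrix held as a plain Python list of rows; `adjacency_to_edges` and the sample matrix are hypothetical illustrations, not the contents of any actual graph.mat.

```python
def adjacency_to_edges(matrix):
    """Return the edges (i, j) with i <= j of a symmetric adjacency matrix."""
    n = len(matrix)
    edges = []
    for i in range(n):
        # Scan only the upper triangle (diagonal included, so loops are kept).
        for j in range(i, n):
            if matrix[i][j]:
                edges.append((i, j))
    return edges


X = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
matrix_edges = adjacency_to_edges(X)
print(matrix_edges)
```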

2.1.6. DOT. The DOT format, which comes from the AT&T Graphviz collection, is by now the default choice for graph storage and manipulation. This is due to two factors: the widespread use of AT&T’s Graphviz tools, and the generality and extensibility of the format itself.

The format itself is easily recognized. We present an example in figure 3.

Both undirected and directed graphs are supported. The format allows for arbitrary attributes to be associated at the node, edge, and graph level, though there is a set of attributes which are standardized for use with the Graphviz tools.

I strongly urge all users who are looking for a format to store their data in to consider DOT. The advantages are many: it is human readable, standardized, popular, and highly extensible (only GraphXML is more extensible). The primary disadvantage, which is shared by all non-trivial formats, is that a full parser is required to read and manipulate DOT files. Luckily, there are a number of pre-written parsers, including the newly available pygraphviz parser (an add-on to the NetworkX package).

¹ I do have some code which allows me to convert into the Walrus format from Lincoln’s format. I will release said code if asked.

² A registered trademark of The MathWorks.

Full documentation for the DOT format is available at the Graphviz project site [5].
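Reading DOT needs a parser, but writing a plain undirected edge list as DOT does not. The following is a minimal sketch under that assumption; `quote_id` and `to_dot` are hypothetical helper names, every identifier is double-quoted to sidestep the rules for bare IDs, and attributes are not handled at all.

```python
def quote_id(name):
    """Return name as a double-quoted DOT ID, escaping embedded double quotes."""
    return '"' + str(name).replace('"', '\\"') + '"'


def to_dot(graph_name, edges, isolated=()):
    """Return DOT text for an undirected graph.

    edges    -- iterable of (u, v) pairs
    isolated -- nodes that should appear even though they have no edges
    """
    lines = ["graph %s {" % quote_id(graph_name)]
    for u, v in edges:
        lines.append("    %s -- %s;" % (quote_id(u), quote_id(v)))
    for n in isolated:
        lines.append("    %s;" % quote_id(n))
    lines.append("}")
    return "\n".join(lines) + "\n"


dot_text = to_dot(
    "SD",
    [("OceanBeach", "PacificBeach"), ("PacificBeach", "LaJolla")],
    isolated=["NationalCity"],
)
print(dot_text)
```

For anything richer than this (attributes, subgraphs, directed graphs), an existing library is the safer route.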

2.2. Conversion. TBD

3. Graph Computing

Graph computing, in the context of this section, refers to an integrated system or library for manipulating graphs. There is a large collection of such systems; indeed, the digital representation of a graph is an all-too-common project for beginning computer science undergraduates. For our purposes, we focus only on those systems which:

(1) are very general
(2) contain a large number of primitive algorithms
(3) are well documented and actively supported

In my assessment, there are two systems which meet these requirements. One, the Boost Graph Library, is a C++/Python library which has been around for a few years. This library seeks to be a generic set of data structures and algorithms suitable for constructing robust graph algorithms. The other, NetworkX, is a pure Python package with a host of tools which reflect recent trends in complex network research, and which puts an emphasis on interactivity.

In what follows, we give a general overview of the two packages, and assess their suitability to various computing tasks. We go over some of the basics of their operation, and provide a sample application of each.

3.1. NetworkX. According to their project description:

“NetworkX (NX) is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.”

More precisely, NetworkX is a collection of complex network tools (many already in existence) which are collected in one place and given a more or less common Python interface. This is not unlike the SAGE project [6] in computational number theory.

The project itself is due to Aric Hagberg at LANL, and it is currently under very active development.

3.1.1. Strengths and Weaknesses. There are a number of features to recommend the NetworkX package.

NetworkX is accessed through a simple Python interface, which is useful for a number of reasons. One is the ease of interaction; the cost of experimenting with examples is fairly low. Another is the short development cycle. Indeed, because Python is a dynamically typed language and has no separate compile or link phase, code can be designed and modified on a much shorter time-scale than C/C++.

NetworkX is also useful because it serves as an interface to a number of standard tools. Indeed, the Graphviz tools are available through NetworkX, as is the matplotlib drawing and graphing library. The package also allows for interaction with the scientific computing package NumPy, allowing for spectral investigations, for example.

One other benefit of NetworkX is the pragmatic balance between commonly used features and very general algorithms. Many general graph libraries lack primitives for such common operations as finding vertex neighborhoods or computing connected components; NetworkX has easily learned functions which compute both. Indeed, most natural graph manipulations which make sense from a theory point of view (taking subgraphs, computing diameters, etc.) are available as atomic operations to the user. Moreover, NetworkX provides a number of generators to create a whole host of common graph families, both deterministic and random (hypercubes, the Petersen graph, and random regular graphs, for example). On the other hand, the user has complete control over their graph objects, and NetworkX includes a number of fundamental algorithms from which to construct one’s own code (a general Dijkstra implementation, for example).

There are a few negatives to NetworkX, which make it inappropriate for a number of applications.

Most notably, NetworkX is written almost entirely in Python. While this makes the code accessible, it is also correspondingly slow. This makes computing with large graphs unrealistic.

As mentioned, NetworkX is also pragmatically designed to be quick to learn. The price one pays for this is that not all the code is separable into small, efficient pieces. As a result, NetworkX is inappropriate for designing major applications.

A final caveat is that NetworkX is under active development, and is subject to both a changing interface and occasional bugs. In particular, this makes any code which is heavily reliant on the NetworkX interface liable to break with every upgrade.

In summary, NetworkX is well suited to experimentation and to rapidly testing out ideas. Because it integrates many tools, it is the preferred platform for day-to-day computing with graphs. However, given the speed and inflexibility of design, this package is not appropriate for building efficient applications, or anything of large complexity. Roughly speaking, anything that takes more than a few days to code is probably inappropriate for NetworkX.

3.1.2. Basic Commands. To begin with NetworkX, we first open Python.

    [triton@math107 ~]$ python
    Python 2.3.3 (#1, May  7 2004, 10:31:40)
    [GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from networkx import *
    >>> G = Graph()
    >>> G.add_node(0)
    >>> G.add_edge((1,2))
    >>> G.add_edge((2,3))
    >>> G.add_path([2,4,5])
    >>> G.nodes()
    [0, 1, 2, 3, 4, 5]
    >>> G.edges()
    [(1, 2), (2, 3), (2, 4), (4, 5)]
    >>> shortest_path(G,1,5)
    [1, 2, 4, 5]
    >>> connected_components(subgraph(G,[1,2,5]))
    [[1, 2], [5]]

Most commands in the NetworkX package are available by invoking the magic line from networkx import *³. Alternatively, one could issue the command import networkx as NX. In this case, all NetworkX commands would need to be invoked with a preceding NX., as in G = NX.Graph().

Full documentation, as well as a better quickstart guide, can be found at the networkx web site [7].
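To make clear what the shortest path and connected component calls in the session compute, here is a library-free sketch on the same six-vertex graph. It stores the graph as a dictionary of neighbor sets and uses breadth-first search; the function names are hypothetical and this is not how NetworkX itself is implemented.

```python
from collections import deque


def make_graph(nodes, edges):
    """Build an undirected graph as a dict mapping node -> set of neighbors."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_shortest_path(adj, source, target):
    """Return one shortest (fewest edges) path as a list, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:  # walk parent pointers back to the source
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


def induced_subgraph(adj, keep):
    """Return the subgraph induced by the nodes in keep."""
    keep = set(keep)
    return {n: adj[n] & keep for n in adj if n in keep}


def components(adj):
    """Return connected components as sorted lists, largest first."""
    seen, result = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        result.append(sorted(comp))
    return sorted(result, key=len, reverse=True)


G = make_graph([0], [(1, 2), (2, 3), (2, 4), (4, 5)])
print(bfs_shortest_path(G, 1, 5))                 # [1, 2, 4, 5]
print(components(induced_subgraph(G, [1, 2, 5])))  # [[1, 2], [5]]
```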

3.1.3. Recipes.

Reading from a DOT file. Perhaps one of the most frequently performed tasks. Say we have a well-formed DOT file labeled sd.dot.

    >>> from pygraphviz import *
    >>> f = open("sd.dot", "r")
    >>> G = Agraph()
    >>> G.read(f)
    >>> from networkx import *
    >>> H = networkx_from_pygraphviz(G)

Here, we use the separate library pygraphviz for reading in DOT files. Note that pygraphviz imports files into an Agraph object; this is meant to replicate the functionality of a similar library in the Graphviz package. In particular, an Agraph structure maintains all the attribute data found in DOT files. NOTE: NetworkX graphs do not have attribute data, and networkx_from_pygraphviz does not preserve this data. It is also worth paying attention to how pygraphviz deals with files; the G.read() method expects a file handle, which is returned here from the system standard open() command. Don’t worry about closing files (i.e. f.close()), since Python will do this for you at exit.

Getting and setting attributes in an Agraph() is easily done.

    >>> from pygraphviz import *
    >>> G = Agraph()
    >>> G.add_node('erdos')
    >>> G.add_nodes('turan')
    >>> G.set_node_attr(['erdos', 'turan'], pos='0,0')
    >>> nodea = G.get_node('erdos')
    >>> nodea.set_attr(pos='1,2')
    >>> for n in G.nodes():
    ...     print n, n.get_attr('pos')
    ...
    erdos 1,2
    turan 0,0
    >>> G.write(open('foo.dot', 'w'))

³ This is a standard method for importing a module in Python.

Note that attributes can be accessed one or many at a time. One should be careful about when to pass strings; the above example is paradigmatic in this regard. Finally, note that we include the syntax for writing a DOT file. In particular, one should observe that the file was opened with a ’w’ option, indicating that we are opening the file for writing.

There is, alas, little documentation as to how to use the Agraph class (though the above examples illustrate much of its use). One should see the file pygraphviz.py for all available features. This is found by issuing a locate pygraphviz.py command in UNIX™ or by browsing the project source, found on the NetworkX website [7].

Randomly Generating a Graph. There are many ways to generate random graphs, or non-random graphs for that matter, in NetworkX.

    >>> G = gnp_random_graph(1000, .002)
    >>> G.info()
    Name: gnp_random_graph(1000, 0.002)
    Type: Graph
    Number of nodes: 1000
    Number of edges: 975
    Average degree: 1.95
    >>> G = hypercube_graph(10)
    >>> G.info()
    Name: hypercube_graph(10)
    Type: Graph
    Number of nodes: 1024
    Number of edges: 5120
    Average degree: 10.0

Some fancier graphs need to be imported. For example, the G(w) model is accessed as follows:

    >>> H = barabasi_albert_graph(100,1)
    >>> from networkx.generators.degree_seq import *
    >>> G = expected_degree_graph(H.degree())

Here we create a Barabási-Albert preferential attachment graph, and then use the resulting degree sequence to create a G(w) graph.
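For intuition about the G(w) model: in its usual formulation, vertices i and j are joined independently with probability w_i w_j / Σ_k w_k. The sketch below is a direct quadratic-time implementation written from that definition, not taken from NetworkX. It makes two simplifying assumptions that should be read as such: self-loops are omitted, and the probability is capped at 1. The function name is hypothetical.

```python
import random


def expected_degree_edges(weights, seed=None):
    """Sample the edge list of a G(w)-style random graph.

    Vertices are 0..n-1; pair (i, j), i < j, is joined with probability
    min(1, w_i * w_j / sum(w)).  Self-loops are omitted (an assumption).
    """
    rng = random.Random(seed)
    n = len(weights)
    total = float(sum(weights))
    edges = []
    if total == 0:
        return edges  # all expected degrees are zero: no edges
    for i in range(n):
        for j in range(i + 1, n):
            p = min(1.0, weights[i] * weights[j] / total)
            if rng.random() < p:
                edges.append((i, j))
    return edges


# 200 vertices, each with expected degree close to 3.
sampled = expected_degree_edges([3.0] * 200, seed=42)
print(len(sampled), "edges sampled")
```

The double loop is fine for a few thousand vertices; larger graphs call for a library implementation.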

Iterating over Nodes/Edges. NetworkX obeys the Python convention for iteration. For instance, we could do the following:

    >>> G = gnp_random_graph(1000, .02)
    >>> L = []
    >>> for n in G.nodes():
    ...     if G.degree(n) > 20:
    ...         L.append(n)
    ...
    >>> H = subgraph(G,L)


This shows how one can construct the subgraph of G induced by vertices of degree greater than twenty.
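The same filter-then-induce pattern can be written without any graph library, which may help when checking results by hand. This sketch assumes the graph is a dictionary of neighbor sets; the helper names and the small example graph are hypothetical.

```python
def high_degree_nodes(adj, threshold):
    """Return the nodes whose degree is strictly greater than threshold."""
    return [n for n in adj if len(adj[n]) > threshold]


def induced_by(adj, keep):
    """Return the subgraph of adj induced by the nodes in keep."""
    keep = set(keep)
    return {n: adj[n] & keep for n in adj if n in keep}


# A star centered at 0 with leaves 1..4, plus the extra edge 1-2.
star = {
    0: {1, 2, 3, 4},
    1: {0, 2},
    2: {0, 1},
    3: {0},
    4: {0},
}
kept = high_degree_nodes(star, 1)
H = induced_by(star, kept)
print(sorted(kept))  # [0, 1, 2]
```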

3.2. Boost Graph Library. The Boost Graph Library (BGL) is a C++ graph library made for fast and efficient graph computation. It is due to a team at Indiana University, copyright 2000-2001. There is also a parallel version of the BGL currently in active development for large scale projects [9].

It should also be noted that the research machine, math107, does not currently have the BGL Python bindings installed.

3.2.1. Strengths and Weaknesses. One of the primary advantages of the BGL is the large selection of algorithms available as primitives for the construction of larger algorithms. These include the categories of shortest paths, minimum spanning trees, connected components, max flow, sorting, layout, and others. The algorithms are both highly generic and optimized for speed. It is noteworthy that a number of software packages currently utilize BGL for their production code; in particular, the Large Graph Layout graph drawing package is written using the BGL.

Another major advantage of the BGL is the highly generic interface. For those familiar with C++, BGL is generic in the same manner as the Standard Template Library. In any case, the generality has a number of practical consequences. One such consequence is the ability to use a large class of graph data-structures; any graph data-structure need only satisfy some interface conventions (there are a range of conventions, from highly specific to highly generic) to be usable by the majority of the library. Another consequence is the ability to fine-tune any pre-built algorithm with very little code. For instance, the library includes an implementation of Dijkstra’s single-source shortest-paths algorithm. Customizations include the ability to act on directed or undirected graphs, weighted graphs, to specify an arbitrary distance function, to specify an arbitrary method of combining distances, and even the ability to perform arbitrary actions at various standard points in the algorithm’s execution. In general, the design of the BGL allows for highly functional code which is at the same time highly customizable.
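To illustrate what "an arbitrary method of combining distances" buys, here is a small Python sketch of Dijkstra's algorithm in which the combining operation is a parameter. It is only an analogy to the BGL design, not the BGL interface; the function name, the `combine` and `zero` parameters, and the example graph are all assumptions made for illustration. With `combine=max` the same code computes bottleneck (minimax) distances instead of path lengths.

```python
import heapq
import itertools
import operator


def dijkstra(adj, source, combine=operator.add, zero=0):
    """Single-source Dijkstra with a pluggable distance-combining function.

    adj maps node -> list of (neighbor, weight).  combine must be monotone:
    combine(d, w) >= d for every weight w.
    """
    dist = {source: zero}
    counter = itertools.count()  # tie-breaker so nodes are never compared
    heap = [(zero, next(counter), source)]
    done = set()
    while heap:
        d, _, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, w in adj.get(u, ()):
            nd = combine(d, w)
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, next(counter), v))
    return dist


roads = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2)],
    "B": [("D", 5)],
    "D": [],
}
print(dijkstra(roads, "A"))               # ordinary path lengths
print(dijkstra(roads, "A", combine=max))  # largest edge weight along the best path
```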

It is important to note that the BGL is the most comprehensively documented graph library currently available. Documentation is available in both web and book formats, written by the package’s authors. The documentation consists of both a how-to section and a full listing of the data-structure and algorithm interface. The listing is written in a style very similar to that of the STL documentation, and in particular thus includes complexity guarantees for most operations. For any application that requires speed, this level of control is essential.

It is also of some use that the BGL has a number of Python bindings (in other words, an interface to the Python language that still allows one to access the library). The interface is designed to mimic the C++ interface in function while maintaining Python syntax and simplicity. This can be useful either in getting started quickly with the BGL or as a means of prototyping code for later migration to a faster C++ implementation.

The primary disadvantage to the BGL is the added complexity associated with the highly generic C++ interface. To fully utilize many of the features, the user needs some familiarity with basic C++ programming, as well as some knowledge about more esoteric features such as templates and iterators. Moreover, as the BGL shares the advantages of the Standard Template Library, it also shares its well known weaknesses. The two most important of these are the lack of full compatibility with all but the most modern compilers⁴ and the difficulty in obtaining useful debugging information given the inherent complexity of the underlying data-structures⁵. Another aspect of the complexity is reflected in the greatly increased development time due to both the inherent complexity in the BGL and the use of C++ as the development language. This is somewhat moderated by the availability of the Python binding, but at the cost of much of the efficiency and flexibility of the C++ interface.

3.2.2. A Small Example. We offer only a small snippet to give the flavor of the BGL⁶. The BGL documentation [8] offers much more.

    #include <iostream>   // for std::cout
    #include <utility>    // for std::pair
    #include <algorithm>  // for std::for_each
    #include <boost/graph/graph_traits.hpp>
    #include <boost/graph/adjacency_list.hpp>
    #include <boost/graph/dijkstra_shortest_paths.hpp>

    using namespace boost;

    int main(int, char*[])
    {
        // create a typedef for the Graph type
        typedef adjacency_list<vecS, vecS, bidirectionalS> Graph;

        // Make convenient labels for the vertices
        enum { A, B, C, D, E, N };
        const int num_vertices = N;
        const char* name = "ABCDE";

        // writing out the edges in the graph
        typedef std::pair<int, int> Edge;
        Edge edge_array[] =
        { Edge(A,B), Edge(A,D), Edge(C,A), Edge(D,C),
          Edge(C,E), Edge(B,D), Edge(D,E) };
        const int num_edges = sizeof(edge_array)/sizeof(edge_array[0]);

        // declare a graph object
        Graph g(num_vertices);

        // add the edges to the graph object
        for (int i = 0; i < num_edges; ++i)
            add_edge(edge_array[i].first, edge_array[i].second, g);
        ...
        return 0;
    }

⁴ This is a decreasingly common problem as most production C++ compilers have finally become compliant with the full language specification.

⁵ We will not attempt to flesh out this difficulty, sufficing only to mention that anyone who has used STL will be familiar with the multipage string-of-symbols error messages commonly reported for simple errors.

⁶ These examples are lifted from the BGL documentation [8].

Here we have an example constructing a simple graph with seven edges.

Next we have an example of the Python bindings used to read in a DOT file, compute the minimum spanning tree, and return a DOT file with the MST highlighted.

    import boost.graph as bgl

    # Load a graph from the GraphViz file 'mst.dot'
    graph = bgl.Graph.read_graphviz('mst.dot')

    # Convert the weight into floating-point values
    weight = graph.convert_property_map(graph.edge_properties['weight'],
                                        'float')

    # Compute the minimum spanning tree of the graph
    mst_edges = bgl.kruskal_minimum_spanning_tree(graph, weight)

    # Compute the weight of the minimum spanning tree
    print 'MST weight =', sum([weight[e] for e in mst_edges])

    # Put the weights into the label. Make MST edges solid while all other
    # edges remain dashed.
    label = graph.edge_property_map('string')
    style = graph.edge_property_map('string')
    for e in graph.edges:
        label[e] = str(weight[e])
        if e in mst_edges:
            style[e] = 'solid'
        else:
            style[e] = 'dashed'

    # Associate the label and style property maps with the graph for output
    graph.edge_properties['label'] = label
    graph.edge_properties['style'] = style

    # Write out the graph in GraphViz DOT format
    graph.write_graphviz('mst-out.dot')
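The bindings hide the algorithm itself. As a rough, library-free sketch of what a Kruskal minimum spanning tree computation and the solid/dashed styling amount to, consider the following. It assumes vertices numbered 0..n-1 and edges given as (u, v, weight) triples; the names and the four-vertex example are hypothetical and unrelated to the contents of mst.dot.

```python
def kruskal(num_vertices, weighted_edges):
    """Return the edges of a minimum spanning forest, lightest first."""
    parent = list(range(num_vertices))

    def find(x):
        # Find the representative of x's component, halving paths as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:          # edge joins two different components: keep it
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 2.5), (2, 3, 1.5)]
mst = kruskal(4, edges)
mst_weight = sum(w for _, _, w in mst)

# Mirror the styling step: MST edges solid, all other edges dashed.
style = {(u, v): ("solid" if (u, v, w) in mst else "dashed")
         for u, v, w in edges}

print("MST weight =", mst_weight)
print(style)
```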


4. Graph Drawing

The general topic of graph drawing is quite broad, so some definitions are in order. By a graph drawing we intend a geometrical embedding of a graph G (possibly directed and with loops) into $\mathbb{R}^2$ or $\mathbb{R}^3$. The intended application of such an embedding is the production of an image suitable for illustration of some feature of the graph.

Before we discuss the topic of interest to this guide, let us mention some graph drawing topics which are not going to be discussed further. One such topic is the problem of finding embeddings for very well-behaved families of graphs. These include trees, planar/outerplanar graphs, graphs of bounded genus, graphs arising from some natural and regular geometry (e.g. lattices), and algebraic graphs (e.g. Cayley graphs). Another topic is drawings which seek to optimize or meet very strict criteria, such as planar drawings/sphere embeddings, minimum crossing embeddings, lattice embeddings, symmetric embeddings, and the like.

So what, then, is our goal? We would like to present here techniques for drawing “large real-world graphs” quickly, automatically, and in a form suitable for publication. Naturally, we would feel more comfortable with a definition that did not resort to the use of quotation marks, but the nature of the problem makes any such description necessarily heuristic. However, we note that practically speaking, the graphs of interest tend to be large (≥ 100 vertices, say), very sparse, and quite often display power-law degree distributions.

This section is organized as follows. We present two algorithms which are concerned with producing the desired embedding, and illustrate their use. We then discuss issues of presentation, including the use of color, edge order, translucence and the like, and present tools based on NetworkX. Finally, we include some worked examples. The reader who would like to begin immediately should skip ahead to the examples section.

4.1. Algorithms.

4.1.1. Force-Directed Algorithms: Kamada-Kawai. Perhaps the most natural class of algorithms for graph layout are the spring or force-directed algorithms. The archetypical algorithm begins with a random placement of the vertices in $\mathbb{R}^d$. To each edge we attach an ideal spring (obeying Hooke’s law, say) with some prescribed ideal length. The algorithm then iterates to minimize the energy of the system, typically until the change in energy drops below a given threshold.

While there are a number of implementations of this general algorithm, we discuss one in particular due to Kamada and Kawai [13], with numerical improvements by Gansner et al. [12]. The input to the algorithm is a list of edge weights $(w_{ij})$ and edge lengths $(d_{ij})$. The $(d_{ij})$ are taken to be the graph-theoretic distances if not otherwise specified. Given some positions $X = (X_1, \ldots, X_n)$ of the vertices, the stress is defined to be

$$\mathrm{stress}(X) = \sum_{i<j} w_{ij} \left( \|X_i - X_j\|_2 - d_{ij} \right)^2.$$

The algorithms of [13] and [12] both seek to minimize this stress function through different numerical schemes. Note that when the $d_{ij}$ are the graph-theoretic distances, the resulting embedding is an optimal Euclidean embedding of the graph in the sense of weighted least squares.
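To make the stress function concrete, the sketch below evaluates it for a given two-dimensional layout, taking the ideal lengths to be graph-theoretic distances computed by breadth-first search. It only evaluates the objective; it does not minimize it, and it is not the code of [12] or [13]. The unit weights used by default, the function names, and the example path graph are assumptions, and the graph is assumed connected.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Return graph-theoretic distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def stress(pos, adj, weight=lambda i, j: 1.0):
    """Sum over pairs i < j of w_ij * (||X_i - X_j|| - d_ij)^2.

    pos maps node -> (x, y); adj maps node -> iterable of neighbors.
    """
    nodes = sorted(pos)
    total = 0.0
    for a, i in enumerate(nodes):
        d_i = bfs_distances(adj, i)
        for j in nodes[a + 1:]:
            (xi, yi), (xj, yj) = pos[i], pos[j]
            gap = math.hypot(xi - xj, yi - yj) - d_i[j]
            total += weight(i, j) * gap ** 2
    return total


path = {0: [1], 1: [0, 2], 2: [1]}
straight = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0)}
folded = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 0.0)}
print(stress(straight, path))  # 0.0: every pair sits at its ideal distance
print(stress(folded, path))    # 4.0: vertices 0 and 2 coincide but d = 2
```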


The formulation due to [12] forms the core of the neato algorithm, found in the popular AT&T Graphviz library. In particular, the algorithm accepts as input both edge weights and edge lengths, and has the default behavior as described. The version due to Kamada and Kawai [13] is found in the Boost Graph Library. These are perhaps the two most commonly used implementations of the force-directed class of drawing tools.

Figure 4. A graph with layout produced by the neato algorithm of the Graphviz package.

4.1.2. Spanning Tree Algorithms: LGL. The force-directed algorithm is characterized by an attempt to find an approximate isometry of some n-point metric space into our two (or three) dimensional space. If one dispenses with this notion of graph layout, the next natural choice is to lay out important spanning subgraphs of a given graph.

One important choice of spanning subgraph is a spanning tree. Given some (possibly rooted) spanning tree T of a connected graph G, one can then apply any of a host of tree layout algorithms. As this produces a set of coordinates for the vertices, it induces a drawing of the full graph.

This scheme is implemented in the fairly recent code Large Graph Layout (LGL) [3], written by Alex Adai. The algorithm can be described as follows. Beginning with a rooted spanning tree, place the root at the origin. Place each neighbor of the root on a unit circle (or unit sphere) about the root so as to maximize the angle between any two neighbors (thus on the unit circle the neighbors are spaced evenly). For each neighbor, draw a circle about it and lay out its remaining neighbors on the part of this circle which falls outside the prior circle, distributing vertices evenly on this partial circle. This is then repeated iteratively for all points. Details of the exact procedure can be found in a paper linked from the LGL website [3]. In particular, it is worth noting that the full algorithm includes a smoothing procedure, where neighborhoods of the leaves of our spanning tree are passed through a force-directed method for improved layout.
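The first step of this description, spacing the neighbors of the root evenly on a unit circle, can be sketched in a few lines of Python. This is only an illustration of that single step and is not taken from the LGL source; the function name `place_on_unit_circle` is hypothetical, and the later placement on partial circles and the smoothing pass are not shown.

```python
import math

def place_on_unit_circle(center, count):
    """Return `count` points spaced evenly on the unit circle about `center`."""
    cx, cy = center
    points = []
    for k in range(count):
        angle = 2.0 * math.pi * k / count  # equal angular gaps between neighbors
        points.append((cx + math.cos(angle), cy + math.sin(angle)))
    return points

# Four neighbors of a root placed at the origin.
root_neighbors = place_on_unit_circle((0.0, 0.0), 4)
```

Equal angular gaps are exactly what maximizes the smallest angle between any two neighbors on a full circle.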

In practice, this algorithm does a remarkable job at “spreading out” real world graphs. In part, this is due to the “octopus” structure of complex networks CITE FAN.


Figure 5. A graph with layout produced by the LGL code. Compare with figure 4; both represent the same graph.

4.2. Presentation. The prior section considered the question of graph layout, assigning to each vertex a spatial coordinate⁷. In this section, we consider the question of actually drawing a graph given an embedding.

In principle, drawing a graph given its vertex coordinates is simple: one simply calls an appropriate draw operation for every edge. A simple-minded implementation, however, quickly yields problems. Consider the following figure DRAW THIS, drawn with the LGL code. The resulting tangle of edges is not very informative. However, given the performance of LGL, we expect the high degree vertices to be clustered in the center. We suspect our layout indeed has this property, but it is hard to really determine which edges participate in any such subgraph.

One solution is the use of color or, for publication, shades of gray. Examples of this technique have appeared previously in this guide; see figure 5. A few remarks are in order. First, while the vertex degrees are generating the color choices for our figure, they are reflected in colored edges. Why not simply color the vertices instead? From a practical standpoint, vertices take up less area, so they are harder to correctly identify⁸. Another criticism of coloring vertices is that there is no way to visually distinguish important from negligible edges. A second note about figure 5 is that the edges are drawn with a given ordering, where the order of drawing is least to most important.

In practice, we suggest the following general principle for edge drawing.
(1) Give all edges a weight from, say, [0, 1].
(2) Order the edges by weight.
(3) Draw the edges from least weight to most weight.
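The three steps above amount to a sort followed by a loop. A minimal Python sketch, assuming edges are stored as (u, v, weight) triples (a representation chosen here for illustration, not one prescribed by any of the tools discussed):

```python
def drawing_order(weighted_edges):
    """Sort (u, v, weight) triples so that the heaviest edge comes last."""
    return sorted(weighted_edges, key=lambda edge: edge[2])

edges = [("a", "b", 0.9), ("b", "c", 0.1), ("a", "c", 0.5)]
for u, v, w in drawing_order(edges):
    # A real renderer would issue its line-drawing call here; later
    # (heavier) edges are painted over earlier (lighter) ones.
    print(u, v, w)
```

Because later strokes cover earlier ones, drawing in increasing weight order keeps the most important edges visible on top.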

Why use weights instead of colors or grayscale values? With weights, one can then utilize different colormaps⁹ for various effects.

⁷ We have assumed all graph embeddings are straight-line embeddings, and as such our layout is reduced to a vertex embedding. One could, however, look at the case of curved edges. Though we shall not do so, it should be noted that many graph formats and tools have the ability to work with spline edges in addition to their linear counterparts.

⁸ Often vertex sizes are changed to reflect degree. While this can help identify large degree vertices, in large graphs this has the practical effect of covering much of the detail of the graph.

⁹ For the uninitiated, a colormap is formally a function c from [0, 1] into some colorspace, say RGB for concreteness. In most computer implementations, a colormap assigns n evenly spaced


So how then to assign weights? Any scheme appropriate to the graph is fine, though for general graphs I prefer the notion of weighted average degree. If $d(u)$ is the degree of vertex $u$, then we weight edge $(u, v)$ as

\[ w(u, v) = d(u)\, d(v)\, \Delta^{-2}(G), \]

where $\Delta(G)$ is the maximum degree. For graphs with a power-law degree distribution, we need a minor modification in order to obtain an appropriate gradient, namely

\[ w(u, v) = \frac{\ln d(u) + \ln d(v)}{2 \ln \Delta(G)}. \]

One could use a multiplicative logarithmic version in place of the one above, but then degree-one vertices need a special case.
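As a small illustration, the two weighting formulas can be written as plain Python functions. The function names are hypothetical; the arithmetic follows the two displayed formulas, and the logarithmic version assumes a maximum degree greater than one.

```python
import math

def degree_weight(du, dv, max_degree):
    """w(u, v) = d(u) d(v) / Delta(G)^2."""
    return du * dv / float(max_degree ** 2)

def log_degree_weight(du, dv, max_degree):
    """w(u, v) = (ln d(u) + ln d(v)) / (2 ln Delta(G)); needs max_degree > 1."""
    return (math.log(du) + math.log(dv)) / (2.0 * math.log(max_degree))
```

For an edge joining vertices of degree 2 and 5 in a graph with maximum degree 100, the first formula gives 10/10000 = 0.001 while the logarithmic one gives ln 10 / (2 ln 100) = 0.25, which is why the latter produces a more usable gradient when a few vertices have very large degree.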

Of course, the choice of edge weights may vary with application. Consider the following example (see figure 6).

Figure 6. A very homogeneous graph with substructure highlighted.

Here, the edge weights are not given by degree information, but rather highlight some pre-computed substructure. In contrast, a degree-based weighting would be very homogeneous, hence little improved over a monochromatic drawing. The lesson to be drawn here is simple: for large graphs, drawings should include edge weights and an appropriate method for differentiating the small from the large.

Finally, we discuss a few miscellaneous issues of which one should be aware. The use of transparency, for example, is to be encouraged. Properly used, transparency allows greater detail in medium sized graph drawings. As a rough guide, transparency shows a noticeable effect on graphs of up to a few thousand edges. In addition to transparency, it is well known that a proper background can highlight contrast between edges of different edge weights. When used in conjunction with an appropriate colormap, a background can make a surprising difference. See figure 7.

(Footnote 9, continued) points in [0, 1] to n specific colors, utilizing an interpolation rule to assign the remainder of the interval [0, 1].


Figure 7. Note how the background emphasizes contrast in the edge colors.

4.3. Worked Examples. In this example we shall show how to draw a moderately sized graph using both force-directed and LGL codes. We shall also examine presentation issues. The tools we shall use are primarily NetworkX and the accompanying package pygraphviz. In keeping with the rest of the document, we shall work with the DOT format as our primary storage medium, and show how to manipulate this format using pygraphviz. We shall require NetworkX 0.31 and pygraphviz 0.32, and we include all the relevant code in the following section.

To begin, we need a data source. We shall use collaboration data from the discrete geometry literature. Our primary file geom-1.net begins

    *Vertices 7343
    1 "S. Kambhampati" 0.0000 0.0000 0.5000
    2 "Christian A. Duncan" 0.0000 0.0000 0.5000
    3 "V. P. Grishukhin" 0.0000 0.0000 0.5000
    4 "T. T. Moh" 0.0000 0.0000 0.5000
    5 "R. P. Brent" 0.0000 0.0000 0.5000
    6 "Afra Zomorodian" 0.0000 0.0000 0.5000
    7 "Gianfranco Bilardi" 0.0000 0.0000 0.5000
    8 "Y. I. Yoon" 0.0000 0.0000 0.5000
    9 "N. Bourbaki" 0.0000 0.0000 0.5000
    10 "F. W. Levi" 0.0000 0.0000 0.5000
    11 "J. P. Kermode" 0.0000 0.0000 0.5000
    12 "B. B. Kimia" 0.0000 0.0000 0.5000
    13 "R. Livne" 0.0000 0.0000 0.5000
    14 "H. N. Gabow" 0.0000 0.0000 0.5000
    15 "Jonathan C. Hardwick" 0.0000 0.0000 0.5000
    ...

and 7343 lines later we find

    7343 "J. Wilson" 0.0000 0.0000 0.5000
    *Arcs
    *Edges
    272 6588 1
    3308 6588 1
    4884 6588 1
    272 3308 1
    272 4884 1
    3308 4884 1
    3867 6582 1
    2990 3867 1
    257 4949 2
    601 4949 2
    1219 4949 2
    1077 4949 2
    257 601 2
    257 1219 2
    257 1077 10
    601 1219 2
    601 1077 2
    1077 1219 3
    ...

We shall disregard the label data, caring only about the edge data. We thus craft a small Perl script (geom.pl) which will parse this data and give us a basic DOT file.

    #!/usr/bin/perl -w

    print "graph disc_geom {\n";

    while (<>) {
        chomp;
        @vert = split;
        print "$vert[0] -- $vert[1];\n";
    }

    print "}";

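We thus issue

    [triton@math107 ~]$ tail -n +7347 geom-1.net | ./geom.pl > geom.dot

to obtain the file geom.dot. Note that here we use the UNIX tail command to ask for line 7347 and everything that follows. With the layout shown above (one *Vertices line, 7343 vertex lines, then the *Arcs and *Edges lines), line 7347 is the first edge.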
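For readers who prefer to stay in Python, the same conversion can be sketched as follows. The function name `pajek_edges_to_dot` is hypothetical, and unlike the Perl script this version skips blank or malformed lines rather than printing them.

```python
def pajek_edges_to_dot(lines, name="disc_geom"):
    """Turn 'u v weight' edge lines into the text of an undirected DOT graph."""
    out = ["graph %s {" % name]
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        out.append("%s -- %s;" % (fields[0], fields[1]))
    out.append("}")
    return "\n".join(out)

print(pajek_edges_to_dot(["272 6588 1", "3308 6588 1"]))
```

As in the Perl version, the third column (the edge multiplicity) is simply dropped.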

Now, we shift into Python.

    >>> import pygraphviz
    >>> import networkx
    >>> G = pygraphviz.AGraph('geom.dot') # Read our data
    >>> H = G.subgraph(G.nodes()[1:1000]) # Get an induced subgraph on 999 nodes
    >>> L = networkx.connected_components(H)[0] # Find the largest connected component
    >>> K = H.subgraph(L)
    >>> import os
    >>> f = open('geom_small.dot', 'w') # Open the output file for writing
    >>> K.write(f)
    >>> f.close()
    >>> len(K.nodes())
    699
    >>> len(K.edges())
    2635

At this point, in the file geom_small.dot we have a connected graph on 699 vertices and 2635 edges. We shall now lay out this graph using the force-directed method previously discussed. Specifically, we use the algorithm neato, which is part of the AT&T Graphviz package. We issue the command

    [triton@math107 ~]$ neato geom_small.dot > geom_small_layout.dot

We could also have done this inside of Python, using the command K.layout(prog='neato').

We now use the drawing code supplied in the next section, which uses the drawing engine of Matplotlib. Thus, we go to Python.

    >>> import pygraphviz
    >>> import graph_draw_agraph
    >>> G = pygraphviz.AGraph('geom_small_layout.dot')
    >>> graph_draw_agraph.compute_weights(G, scheme='logdeg') # Create a logarithmic weighting
    >>> graph_draw_agraph.draw(G, 'draw1.png')

Here we used a logarithmic weighting scheme. The result is in figure 8.

Figure 8. draw1.png

The default coloring scheme uses a colormap called matplotlib.cm.jet. Other standard colormaps are available; see the matplotlib.cm documentation. Of course, one can create colormaps as well. Good information about colormaps can be found at http://www.scipy.org/Cookbook/Matplotlib/Show_colormaps. Let us create a small colormap.

To do so, we issue the following:

    >>> from pylab import *
    >>> import matplotlib.colors
    >>> cdict = {'red':   ((0.0, 0.0, 0.0),
                           (0.3, 0.0, 0.0),
                           (0.7, 0.98, 0.98),
                           (1.0, 1.0, 1.0)),
                 'green': ((0.0, 0.0, 0.0),
                           (0.3, 0.15, 0.15),
                           (0.7, 0.75, 0.75),
                           (1.0, 1.0, 1.0)),
                 'blue':  ((0.0, 0.0, 0.0),
                           (0.3, 0.34, 0.34),
                           (0.7, 0.08, 0.08),
                           (1.0, 1.0, 1.0))}
    >>> my_cmap = matplotlib.colors.LinearSegmentedColormap('my_colormap', cdict, 256)
    >>> graph_draw_agraph.draw(G, 'draw2.png', my_cmap) # pylab also exports a draw(), so we qualify ours

An explanation is provided in the reference above. Briefly, the first coordinate in every triple corresponds to a value in [0, 1], and the second coordinate represents the intensity value of the given color (we ignore the third coordinate for now). We thus have four reference values given above, and the rest of the colormap is filled in via interpolation. We thus obtain figure 9.
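To see what "filled in via interpolation" means for a single channel, here is a small Python sketch that linearly interpolates between the reference values. It is an illustration under the simplification stated above (only the second coordinate of each triple is used), not Matplotlib's own implementation, and the function name `channel_value` is hypothetical.

```python
def channel_value(segments, t):
    """Linearly interpolate one color channel at t in [0, 1].

    segments -- tuple of (position, intensity, intensity) triples sorted by
                position, as in the cdict above; only the second entry is used.
    """
    for (x0, y0, _), (x1, y1, _) in zip(segments, segments[1:]):
        if x0 <= t <= x1:
            return y0 + (y1 - y0) * (t - x0) / (x1 - x0)
    raise ValueError("t must lie in [0, 1]")

# The red channel from the cdict above.
red = ((0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.7, 0.98, 0.98), (1.0, 1.0, 1.0))
```

For example, halfway between the reference points 0.3 and 0.7 the red intensity is halfway between 0.0 and 0.98, that is 0.49.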

Figure 9. draw2.png

Next, we wish to lay out the same graph using LGL. To do so, we need to do some format conversion. The format for LGL is described in a prior section, and in the following subsection we give a simple code which performs the desired conversion. Using this code, which we call conversion.py, we can thus create the desired file.

    >>> from pygraphviz import *
    >>> G = AGraph('geom_small_layout.dot')
    >>> from conversion import *
    >>> to_lgl(G, 'geom.lgl')

We now should have a file geom.lgl in the current directory. We shall now assume that our current directory also contains the files setup.pl and lgl.pl, which are included with the LGL package. We further assume that we are at the root of our user account, in this case /home/triton.


Layout using LGL requires a few steps. First, we must create a directory where LGL will leave its output. We issue the command mkdir /tmp/lgl to create this work directory. Next, we issue the command

    [triton@math107 ~]$ ./setup.pl -c config

This creates our configuration file, which we now edit. We need to edit two specific lines to indicate where the work directory and LGL file are located. Note that the tools require absolute path names.

Editing config we find the following:

    ...
    # All paths should be absolute.
    ###########################################################

    # The output directory for all the LGL results. Note that
    # several files and subdirectories will be generated.
    # This has to be a valid directory name.
    tmpdir = '/tmp/lgl'

    # The edge file to use for the layout. Has to be a file readable
    # by LGLFormatHandler.pm. It has to be an existing/valid file name,
    # with the absolute path.
    inputfile = ''

    # The output file that will have the final coordinates of
    # each vertex. This has to be a valid file name, and it
    # will be place in 'tmpdir'.
    ...

We see that tmpdir is already set to the correct value, since we made /tmp/lgl our work directory. We thus need to change inputfile = '' to inputfile = '/home/triton/geom.lgl'.

With these edits made, we can now run LGL by typing

    [triton@math107 ~]$ ./lgl.pl -c config

where the argument we pass is the name of our configuration file, config.

When the layout is finished, /tmp/lgl will be filled with a number of files. As we are interested simply in the final layout, the relevant file will be /tmp/lgl/final.coords. Opening this file, we obtain:

    1001 6.4891 5.8631
    4938 6.2929 7.6615
    5564 6.1897 6.9687
    3851 6.3344 5.1216
    101 8.2566 6.0554
    696 7.7475 6.8037
    ...

The file format thus lists triples consisting of the vertex label followed by the first and second coordinates. The file conversion.py contains a function which will allow us to import these position values into our graph.
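The triple format is simple enough to parse without any graph library. The following sketch reads such lines into a dictionary; the function name `read_coords` is hypothetical and this is independent of the conversion.py routine used in this guide.

```python
def read_coords(lines):
    """Map each vertex label to an (x, y) tuple of floats."""
    positions = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or unexpected lines
        label, x, y = fields
        positions[label] = (float(x), float(y))
    return positions

sample = ["1001 6.4891 5.8631", "4938 6.2929 7.6615"]
print(read_coords(sample))
```

Labels are kept as strings here, which matches how they appear in the DOT file.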

    >>> from pygraphviz import *
    >>> G = AGraph('geom_small_layout.dot')
    >>> from conversion import *
    >>> from_lgl_coords(G, '/tmp/lgl/final.coords') # Note that we overwrite prior 'pos' attributes

At this point, we have an AGraph object with 'pos' attributes, and we can thus draw such a graph using the techniques we've developed earlier. See figure 10.

Figure 10. An LGL produced layout.

4.4. A Sample Drawing Code. Here we present an extended drawing code which takes advantage of the graphics services of Matplotlib. The full code may be found here.

    # GRAPHDRAW A flexible graph drawing code. Pass in a graph with weighted edges.
    # Copyright (C) 2006 Ross M. Richardson
    #
    # This program is free software; you can redistribute it and/or
    # modify it under the terms of the GNU General Public License
    # as published by the Free Software Foundation; either version 2
    # of the License, or (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program; if not, write to the Free Software
    # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

    # The author can be contacted via email at <rmrichardson@math.ucsd.edu>
    # or by postal mail at
    # 9500 Gilman Drive
    # Dept. of Mathematics
    # UCSD
    # La Jolla, CA 92093-0112

    import matplotlib
    import pygraphviz
    import networkx
    import sys
    import math

    matplotlib.use('Agg')
    from matplotlib.backends.backend_agg import RendererAgg
    from matplotlib.transforms import *
    from matplotlib.cm import jet

    def draw(graph,
             output_file,
             colormap=matplotlib.cm.jet,
             height=400, width=400,
             dots_per_inch=72.0,
             linewidth=3):
        """
        This will produce a (currently) PNG image given an AGraph instance
        (according to pygraphviz 0.32).

        Note that we expect a graph to have POS attributes, and to have edge
        weights.

        Parameters:
        graph -- an AGraph instance.
        output_file -- filename (we do not add the '.png' suffix).
        colormap -- matplotlib.cm instance.
        height, width -- in points.
        dots_per_inch -- a float.
        linewidth -- how thick to make the lines.

        Only the first two arguments are necessary.
        """
        # Here we set up our instance.
        dpi = Value(dots_per_inch)
        r = RendererAgg(height, width, dpi)
        gc = r.new_gc()

        # Here we convert to an XGraph instance
        G = networkx.from_agraph(graph)

        E = G.edges()


        N = G.nodes()
        Pos = {}
        xmin = 0
        xmax = 0
        ymin = 0
        ymax = 0

        # Establish plotting positions
        for n in N:
            posString = G.node_attr[n]['pos']
            if posString != None:
                p = posString.split(',')
                x = float(p[0])
                y = float(p[1])
                Pos[n] = [x, y]
                if x > xmax:
                    xmax = x
                if x < xmin:
                    xmin = x
                if y > ymax:
                    ymax = y
                if y < ymin:
                    ymin = y
            else:
                print "No 'pos' attribute on vertex ", n

        # Now we need to set up a coordinate transform
        displayLim = Bbox(Point(Value(0), Value(0)),
                          Point(Value(width), Value(height)))
        viewLim = Bbox(Point(Value(xmin), Value(ymin)),
                       Point(Value(xmax), Value(ymax)))

        # We just need to pass this transformation to every
        # draw call.
        trans = SeparableTransformation(viewLim, displayLim,
                                        Func(IDENTITY), Func(IDENTITY))

        # Some display properties
        gc.set_antialiased(1)
        gc.set_linewidth(linewidth)
        gc.set_alpha(.8) # This is a global parameter.

        # A little magic to get the edges sorted correctly
        sortedEdges = []


        for e in E:
            weight = e[2]['weight']
            weight = float(weight)
            sortedEdges.append([weight, e])

        sortedEdges.sort()

        for [weight, e] in sortedEdges:
            source = e[0]
            dest = e[1]
            gc.set_foreground(colormap.__call__(weight))
            gc.set_alpha(.5 + .5*weight) # A magical value
            x = [Pos[source][0], Pos[dest][0]]
            y = [Pos[source][1], Pos[dest][1]]
            r.draw_lines(gc, x, y, trans)

        # Need to implement vertex drawing!!

        # Output
        r._renderer.write_png(output_file)

    def compute_weights(graph, scheme='uniform'):
        """
        Write edgeweights to an AGraph instance.

        Parameters:
        graph -- AGraph instance.
        scheme -- 'uniform', 'degree', or 'logdeg'
        """

        G = networkx.from_agraph(graph)

        maxdeg = max(G.degree())
        maxdeg2 = float(maxdeg * maxdeg)
        for e in graph.edges():
            if scheme == 'degree':
                e.attr['weight'] = str(G.degree(e[0]) * G.degree(e[1]) / maxdeg2)
            # elif, so that the 'degree' weight is not overwritten by the uniform branch
            elif scheme == 'logdeg':
                e.attr['weight'] = str(math.log(G.degree(e[0]) * G.degree(e[1])) / \
                                       math.log(maxdeg2))
            else: # uniform
                e.attr['weight'] = str(1.0)

Next, we list conversion.py. This file contains code for converting DOT to LGL, and for importing coordinates from LGL's final.coords file into an AGraph object (and hence a DOT file). The code can be found at http://www.math.ucsd.edu/~rmrichar/drawing/links/conversion.py.


    import pygraphviz
    import networkx

    def to_lgl(graph, filename):
        """
        Write an LGL file given an AGraph file
        """

        f = open(filename, 'w')

        for n in graph.nodes():
            f.write('#' + str(n) + '\n')
            for nbr in graph[n]:
                if n < nbr:
                    f.write(nbr + '\n')

        f.close()

    def from_lgl_coords(graph, coord_file):
        """
        Fill in the pos attribute
        from a LGL coord file.
        """

        f = open(coord_file, 'r')
        for line in f:
            L = line.split()
            n = graph.get_node(L[0])
            n.attr['pos'] = str(L[1]) + ',' + str(L[2])

        f.close()

5. Degree Distributions

One very standard procedure in computing with large graphs is the analysis of the degree distribution. Recall that the degree distribution of a graph $G$ is the sequence $d = (d_1, d_2, \ldots, d_n)$, where $d_i$ is the degree of vertex $i$ of $G$. It is sometimes convenient to order the degree distribution.
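As a small illustration of this definition, the ordered degree sequence can be computed directly from an edge list. This is a sketch using only the Python standard library; the function name `degree_sequence` is hypothetical, and isolated vertices are not seen because they appear in no edge.

```python
from collections import Counter

def degree_sequence(edges):
    """Return the sorted list of vertex degrees of an undirected simple graph."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(degree.values())

# A star with center 1 and three leaves.
print(degree_sequence([(1, 2), (1, 3), (1, 4)]))
```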

5.1. Visualization. Visualizing the degree distribution is often very useful. Perhaps the most common procedure is to plot a histogram of the degrees against the associated frequencies. We can do this easily in MATLAB. If D is the degree distribution of a graph, then

    >> hist(D)

produces figure 11.

Figure 11. Simple histogram.

The most common use of degree distributions is the verification of power-law distributions. For this, we use hist to put our data in bins. Again, we assume D is the degree distribution.

    >> [freq, bins] = hist(D, 47)
    >> loglog(bins, freq, 'x')

Here, the 47 refers to the number of bins, which we set here to make comparison with the next figure possible.

Applied to the same graph as before, we now get a clear powerlaw relationship in figure 12. You will note, however, that the resulting figure is a bit messy toward the high degree end. This is because high degree vertices occur much less frequently, resulting in a number of singletons toward the end.

To solve this problem, we use exponential binning, or simply stated, we make the bin sizes grow exponentially. To do this, we set the center of each bin to be r times the center of the previous, giving

\[ \mathrm{bin}(i + 1) = r \cdot \mathrm{bin}(i), \qquad r > 1. \]

Figure 12. Note the right hand side.

The function degree_dist (found here) will do this automatically if we pass the desired value of r as the second argument.

    >> [freq, bins] = degree_dist(D, 1.1);
    >> loglog(bins, freq, 'x')

See figure 13 for the difference.
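Since the source of degree_dist is not listed in these notes, the following Python sketch shows one possible reading of exponential binning and should not be taken as that function's actual behavior. It assumes bin centers 1, r, r^2, ... and assigns each positive degree to the center nearest to it on a logarithmic scale; the name `exponential_bins` and that assignment rule are assumptions, and counts are not normalized by bin width.

```python
import math

def exponential_bins(degrees, r):
    """Count degrees in bins whose centers are 1, r, r^2, ... (r > 1).

    Each positive degree goes to the center nearest to it on a log scale.
    Returns (centers, counts) as parallel lists.
    """
    if r <= 1:
        raise ValueError("r must exceed 1")
    positive = [d for d in degrees if d > 0]
    if not positive:
        return [], []
    index = [int(round(math.log(d) / math.log(r))) for d in positive]
    counts = [0] * (max(index) + 1)
    for k in index:
        counts[k] += 1
    centers = [r ** k for k in range(len(counts))]
    return centers, counts

centers, counts = exponential_bins([1, 1, 2, 3, 4, 8], 2)
```

With r = 2 the degrees 3 and 4 share the bin centered at 4, which is the intended effect: sparse high degrees are pooled into progressively wider bins.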

5.2. Powerlaw Exponent. We can extract the powerlaw exponent simply from our degree distribution by regression as follows:

    >> [freq, bins] = degree_dist(D, 1.1);
    >> size(freq)
    ans =
         1    43

    >> lb = []
    >> lf = []
    >> for i=1:43
    >> if (freq(i) ~= 0)
    >> lf = [lf log(freq(i))];
    >> lb = [lb log(bins(i))];
    >> end
    >> end
    >> polyfit(lb, lf, 1)
    ans =
       -1.6333    7.3827

Figure 13. Exponential binning really helps out.

Here, the 43 above just applies to the particular degree sequence used (yours will differ). The above code basically linearizes the data and removes zero frequency entries. The powerlaw exponent is given here as −1.6333, namely, the first value returned by polyfit.
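The same regression can be sketched in plain Python for readers without MATLAB. This is an ordinary least-squares line through the points (ln bin, ln freq) with zero-frequency entries skipped; the function name `fit_loglog_line` is hypothetical, and bin centers are assumed to be positive.

```python
import math

def fit_loglog_line(bins, freq):
    """Least-squares line through (ln bin, ln freq), skipping zero frequencies.

    Returns (slope, intercept), the same order as MATLAB's polyfit(lb, lf, 1).
    """
    pts = [(math.log(b), math.log(f)) for b, f in zip(bins, freq) if f != 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic data following freq = 1000 * bin^(-2), plus one empty bin.
slope, intercept = fit_loglog_line([1, 2, 4, 8, 16], [1000.0, 250.0, 62.5, 15.625, 0])
```

On this synthetic input the recovered slope is -2 up to rounding, and the intercept is ln 1000, mirroring how the exponent is read off as the first value returned by polyfit.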

5.3. An example. Here we extract the degrees of a random preferential attachment graph. We first work in Python.

    >>> from networkx import *
    >>> G = barabasi_albert_graph(1000, 1)
    >>> D = G.degree()
    >>> D.sort()
    >>> import sys
    >>> sys.stdout = open('degrees.txt', 'a') # We now redirect output to a file
    >>> for d in D:
    ...     print d
    ...

We should now have a file labeled degrees.txt in our current directory. We switch to MATLAB.

    >> load('degrees.txt')
    >> [bins, freq] = degree_dist(degrees, 1.1)
    >> size(freq)

    ans =

         1    45
    >> lf = []
    >> lb = []
    >> for i=1:45
    if (freq(i) ~= 0)
    lf = [lf log(freq(i))];
    lb = [lb log(bins(i))];
    end
    end
    >> polyfit(lb, lf, 1)

    ans =

       -1.7641    5.8714
    >> fid = fopen('degrees_processed.txt', 'wt');
    >> fprintf(fid, '%e\t%e\n', [bins', freq']');
    >> fclose(fid);

Now back to Python.

    >>> from pylab import *
    >>> X = load('degrees_processed.txt') # Read the two-column text file written by MATLAB
    >>> B = []
    >>> F = []
    >>> for x in X:
    ...     if x[1] != 0:
    ...         B.append(x[0])
    ...         F.append(x[1])
    >>> loglog(B, F, 'or')
    >>> xlabel('log(degree)')
    >>> ylabel('log(freq)')
    >>> title('Degree Plot')
    >>> savefig('plot.png')

Figure 14. Our example. (Log-log plot titled “Degree Plot”; horizontal axis log(degrees), vertical axis log(freq).)

Here, we demonstrated the use of matplotlib to produce nice output. A simple plot is found in figure 14. The nice output can be seen here.


6. A Sample Project

Acknowledgments

I thank Fan for the support which made this documentation possible (as well as much of the graph drawing knowledge). I thank Lincoln for his work in graph drawing and computing which he freely shared with me, as well as Reid for a number of enlightening conversations. I would like to thank in advance any possible reader who alerts me to errors, omissions, or ideas regarding these notes.

7. Appendix: Datasources.

References

[1] GraphXML. http://ftp.cwi.nl/CWIreports/INS/INS-R0009.pdf
[2] Lu, Lincoln. [email protected]. http://www.math.sc.edu/~lu/
[3] Large Graph Layout. http://bioinformatics.icmb.utexas.edu/lgl/
[4] Walrus. http://www.caida.org/tools/visualization/walrus/
[5] AT&T Graphviz Tools. http://www.graphviz.org/
[6] SAGE. http://modular.math.washington.edu/sage/
[7] NetworkX. https://networkx.lanl.gov/
[8] Boost Graph Library. http://www.boost.org/libs/graph/doc/
[9] Parallel BGL. http://www.osl.iu.edu/research/pbgl/
[10] Roberto Tamassia's Graph Drawing Resources. http://www.cs.brown.edu/~rt/gd.html
[11] Goodman, J.E. and J. O'Rourke, Handbook of Discrete and Computational Geometry, 2nd Edition, CRC Press, 2004.
[12] Gansner, E., Koren, Y. and S. North. “Graph Drawing by Stress Majorization”. Graph Drawing: 12th International Symposium, GD 2004, New York, NY, 2004. Lecture Notes in CS 3383, 2005.
[13] T. Kamada and S. Kawai, An Algorithm for Drawing General Undirected Graphs, Information Processing Letters 31 (1989), pp. 7–15.
[14] West, D. Introduction to Graph Theory, 2nd Edition, Prentice-Hall, 2001.

9500 Gilman Drive., University of California, San Diego, La Jolla, CA 92093-0112

E-mail address: [email protected]