the hitchhiker’s guide to graph exchange formats · the hitchhiker’s guide to graph exchange...
TRANSCRIPT
The Hitchhiker’s Guide to Graph ExchangeFormats
Prof. Matthew [email protected]
http://www.maths.adelaide.edu.au/matthew.roughan/Work with Jono Tuke
UoA
June 4, 2015
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 1 / 31
Graphs
Graph: G(N,E)I N = set of nodes (vertices)I E = set of edges (links)
Often we have additional information, e.g.,I link distanceI node typeI graph name
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 2 / 31
Why?
To represent data where “connections” are 1st class objects intheir own right
I storing the data in the right format improves access, processing, ...I it’s natural, elegant, efficient, ...
Many, many datasets
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 3 / 31
ISPs: Internode: layer 3
http:
//www.internode.on.net/pdf/network/internode-domestic-ip-network.pdf
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 4 / 31
ISPs: Level 3 (NA)
http://www.fiberco.org/images/Level3-Metro-Fiber-Map4.jpg
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 5 / 31
Telegraph submarine cables
http://en.wikipedia.org/wiki/File:1901_Eastern_Telegraph_cables.png
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 6 / 31
Electricity grid
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 7 / 31
Bus network (Adelaide CBD)
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 8 / 31
French Rail
http://www.alleuroperail.com/europe-map-railways.htm
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 9 / 31
Protocol relationships
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 10 / 31
Food web
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 11 / 31
Network of Musicians (last.fm)
http://sixdegrees.hu/last.fm/
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 12 / 31
Exchange
Open research requires exchange of dataI replications of, and comparisons with resultsI comparisons between datasets
Working on data in closed formats is badI vendor lock-inI not freeI not portable (or sometimes even backward compatible)I often black boxes
Exchange formats are designed to facilitate open exchange ofdata
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 13 / 31
Portability
Main requirement is portabilityPortability between software
I graph entryI graph analysisI graph visualisationI ...
Portability between architectureI OS (Mac, Linux, Windows, FreeBSD, ...)I Hardware (big-endian v little-endian)
RequirementsOpen formatDocumented
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 14 / 31
Graph exchange file formats1 bintsv4 bintsv4 (GraphLab)2 BioGRID TAB X BioGRID TAB 2.0 Format3 BLAG, GDToolkit Batch layout generator (GDToolkit)4 BVGraph X Boldi-Vigna graph compression5 Chaco X Chaco graph format6 Cluto Cluto/Metis/Graclus format7 DGS X Dynamic GraphStream Format8 DGML Directed Graph Markup Language9 DIMACS DIMACS graph format
10 Dot GraphVis Dot Language11 DotML Dot Markup Language12 DyNetML DyNetML XML13 GAMFF A Graph and Matrix Format14 GDF Guess Data Format15 GDL Graph Description Language16 GEDCOM Geneaological data17 GEXF X Graph Exchange XML Format18 GML X Graph Modelling Language19 Graph6 X Graph620 Graph::Easy X Perl Graph::Easy format21 GraphEd GraphEd simple format22 GraphJSON Graph JSON23 GraphML X Graph Markup Language24 GraphSON TinkerPop’s JSON-based Graph format25 GraphXML X XML-Based Graph Description Language26 GraX GraX27 GRXL XML Specification for Grrr Program28 GT-ITM Georgia Tech Internetwork Topology Models29 GXL X Graph eXchange Language30 Harwell-Boeing Harwell-Boeing sparse (TGFaceny) matri31 Inet Inet Topology Generator file32 ITDK X CAIDA Internet Topology Data Kit33 JSON Graph json-graph-specification34 LEDA LEDA format35 LGF LEMON Graph Format36 LGL Large Graph Layout37 LibSea X CAIDA LibSea format38 KrackPlot X KrackPlot data format39 Matlab Matlab saved workspace40 Matrix Matrix Market sparse matrix41 Mivia Mivia ARG database format42 MultiNet MultiNet43 Netdraw VNA Netdraw VNA44 NetML Network Markup Language45 Ncol Large Graph Layout46 NNF Nested Network Format47 Nod KrackPlot Node format48 NOS Neo Org Stat format49 ns-tcl ns-2 Tcl network definition50 OGDL X Ordered Graph Data Language51 OGML Open Graph Markup Language52 Osprey Osprey file format53 Otter Otter’s native format54 Pajek (.net) X Pajek Tool’s .net format55 Pajek (.paj) X Pajek Tool project (.clu, .vec, .per, ...)56 Planar X Plantri Planar Code andedgeCode57 PSI MI Protenomics Standards Initiative Molecular Interaction58 RSF Rigi Standard Format59 Rocketfuel Rocketfuel ISP Maps60 Rutherford-Boeing Rutherford-Boeing sparse (TGFaceny) matri61 SGB Stanford GraphBase62 SGF Structured Graph Format63 S-Dot S-Dot (lisp interface to Graphviz)64 SIF Simple Interaction Format65 SNAP X Stanford Network Analysis Platform66 SoNIA X So NIA Son format67 Sparse6 X Sparse668 StOCNET StOCNET native format69 TEI Text Encoding Initiative Graph Format (XML-compatible)70 TGF, TGF X Trival Graph Format, and other simpleedgelists (CSV, TSV, Excel, ...)71 Tulip TLP Tulip graph format72 UCINET DL UCINET Data Language73 XGMML eXtensible Graph Markup and Modeling Language74 XMLBIF XML-based BayesNets Interchange Format75 XTND XML Transition Network Definition76 YGF X Y Graph Format
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 15 / 31
Graph exchange file formats
Over 100 considered76 analysed
I aiming at exchange formatsF not all designed for exchange,
but some became de factoexchange formats
I sought minimal level ofdocumentations
F not all exchange formats aredocumented (still)
0
1
2
1990 1995 2000 2005 2010 2015Year
coun
t
structureBNFIntermediateJSONOtherSimpleXML
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 16 / 31
Descriptors
Many possibilities (considered 26)File type
I encodingI representationI meta-dataI compressionI ...
Graph typesI directed, multi-, hyper-, ...
AttributesI what attributes can be stored with graphI extensibility
GeneralI extensibilityI
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 17 / 31
Encoding Types
Storage typeasciiascii/binarybinaryISO 8859unicodeUTF−8
StructureBNFIntermediateJSONOtherSimpleXML
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 18 / 31
Graph RepresentationsList of links: explicitly give
I N: e.g., N = {1,2,3}I E : e.g., E = {(1,2), (1,3)}
Adjacency matrix: define connectivity through a (0,1) matrix Adefined by
Aij =
{1, if (i , j) ∈ E0, otherwise
e.g.,
A =
0 1 11 0 01 0 0
List of neighbour lists:
I for each node: list its neighbours
I e.g.1 2, 323
PathsConstructive and/or proceduralM.Roughan (UoA) Hitch Hikers Guide June 4, 2015 19 / 31
Graph Representations
Representationedgeedge/const/procedge/matrixedge/neighedge/neigh/matrixedge/pathedge/pathsedge/proceduralmatrixmatrix/smatrixneighneigh/edge/matrixsmatrixsmatrix/matrix
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 20 / 31
Descriptors: Graph Types and Generalisations
directed/undirectedacyclic/treesmultigraph or pseudograph: has multiple parallel links betweentwo nodes
I e.g. its easy to have two links between two routersI also allows self-loop
hypergraph: links connect more than two nodesI e.g., where you have a connective medium (rather than a wire), for
instance in a wireless network.hierarchy
I nodes have subgraphs
meta-graph
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 21 / 31
Descriptors: Attributes
Static DynamicNode name, location, type, ... up/down, ...Edge distance, capacity, ... up/down, utilisation, ...
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 22 / 31
Descriptors: Attributes
Single number/weight (per edge)Multiple
I fixed vs extensibleDefaults
I multiple inheritanceVisualization
I not really attributes of graphI used for drawing itI color, shape, layout, ...
PortsTemporal dynamics
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 23 / 31
Descriptors: General
ExtensibleSchema checkingChecksumsMultiple graphsExternal references
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 24 / 31
Common Attributes
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
multiple attributes
edge edge
built in compression
checksums
multiple inheritance
incremental specifications
temporal data dynamics
external data references
ports
hyper graph
multiple graphs
default values
extensible
multi graph
hierarchy
schema checking
checked
visualisation data
edge weights
constructive
procedural
smatrix
matrix
neigh
edge
general attributerepresentation type
0.0 0.2 0.4 0.6Proportion with attribute
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 25 / 31
Attribute Correlations
●
●
●
●
●
●
●
●
●
●
●
●
●
●
multi graph : checked
structure : proc
edgeweights : multiple attributes
default values : ports
multi graph : ports
multiple attributes : schema checking
multi graph : default values
default values : visualisation data
multiple attributes : edge
hyper graph : ports
integral metadata : schema checking
multi graph : multiple attributes
integral metadata : multiple attributes
structure : schema checking
0.00000 0.00004 0.00008 0.00012P−value
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 26 / 31
Lot’s of other considerations
Software supportI documentationI maintenanceI issue of partial support
Public dataI how many people already use it
EfficiencyHuman readability
I self-describing
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 27 / 31
What’s missing?
A lot of DB conceptsMost are designed to be read in one piece
I no data partitioningI no parallelisationI no serialisationI no random access
Most are not designed with data curation in mindI no support for editingI no support for version, ...
Graph DBs handle theseI but not the portability/exchange issues
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 28 / 31
What’s missing?
CompressionStoring large graphs
I similar to storing imagesI nice to have native compression
Only one (BVGraph) treats compression seriouslyI couple of others take a little care but aren’t true compressions
Most have XML-like bloatI file compresses OK, but read/write performance?
There are a few papers on graph compression, but little work onformats that support it
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 29 / 31
Conclusion
Graph Exchange FormatsI very manyI lots of overlapI lots of varietyI no “one true” format
Perhaps we need three:I TGF (Trivial Graph Format)I GraphML (Feature rich, extensible, ...)I Format that can cope with really big graphs
Maybe a container format is what we really need?
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 30 / 31
M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 30 / 31