Multilevel Network Visualization

Emmanuel Oppong

Computer Science and Engineering

The Pennsylvania State University

SROP 2014 Report

August 4, 2014

Abstract

In this research project, we investigate the problem of visualizing large networks. Networks, or graphs, are used to describe relationships between different objects. Graphs are widely used in social networks, roadway systems, and in general, to describe a system that has interactions among multiple entities. Visualizing relationships through graph drawings is important so that information can be easily comprehended and navigated. Some networks, for instance social networks, can become very large when they represent a large number of entities. In this project, we develop a new multilevel method for visualizing graphs, using existing tools and algorithms for graph drawing. We tested this method on real-world networks from several online repositories, such as the Koblenz network collection and Stanford large network collection. We evaluated the method and compared it to alternatives. This research tool will allow users to generate multilevel network visualizations for describing systems such as social connections, microorganism relationships, highway systems, large populations, and map topologies.

Introduction

A network is a system of interconnected objects. Graph theory is the mathematical language used to describe networks. It is an old branch of mathematics that began in 1736, when Leonhard Euler addressed the problem of the seven bridges of Königsberg. He proved that there was no way to walk through the city crossing each bridge exactly once [1]. Since then, graph theory has evolved. Currently, researchers study how networks arise in real-world scenarios and analyze their properties. Graphs are used to model relations in physical, social, biological, and information systems. They are a unifying abstraction for capturing various types of data. Graphs are now widely used on the internet to make sense of large datasets. In 2012, Google announced the Knowledge Graph feature as an addition to its search engine [3]. The idea was to build a massive graph of real-world objects and the connections between them. The Knowledge Graph uses links between documents on the web to understand their semantic context. The graph contains millions of objects and billions of facts connecting them, which it uses to understand the meaning of the keywords entered for a search. Facebook also uses a graph-based search engine. It combines data from its more than a billion users with external data into one search engine that provides user-specific search results.

The amount of data on the internet continues to grow each day. Graphs are used to model the connections in this data, making it easier to understand both the information coming in and the information already on the web. With the growth of data, especially on the internet, graphs have become very large. They can contain millions of vertices and billions of connections between them. Visual representation of networks is an important way of describing the data they represent. Visualization of graphs is done with graph drawing techniques. A graph drawing is a visual representation of the vertices and edges a graph contains. A typical drawing consists of shaded circles depicting the vertices and line segments depicting the edges that connect related vertices. Graph drawing makes the information in the graph legible and navigable. The data within a network can be explored by displaying the vertices and edges in various layouts and assigning them colors, sizes, and other properties. The display highlights patterns, shows connections, and provides visual information about individual vertices. These features are used to draw conclusions about a dataset in order to solve complex problems.

There are many graph drawing techniques that use mathematical algorithms to position the vertices and edges. The arc diagram method (see Figure 1) lays out all the vertices evenly on a single line and draws the edges as semicircles above or below the line to connect the vertices. The layered drawing method (also shown in Figure 1) places the vertices of a directed graph in horizontal rows, with the edges directed downwards. These methods work well for displaying networks with few vertices and edges. However, they are not ideal for drawing larger graphs.

Figure 1: Arc diagram (left) [5] and layered method (right) [4].

The force-directed method (see the example in Figure 2) is a physics-based method that calculates the attractive and repulsive forces between vertices and moves each vertex along the direction of the net force acting on it [7]. The process is repeated until the edges are close to equal in length and there are as few edge crossings as possible. This method is better suited for displaying clustered graphs. The larger the graph, however, the longer it takes to reposition the vertices. The spring-electrical model (Figure 3) is a type of force-directed algorithm in which the system is visualized as electrically charged vertices connected by springs [7]. Springs are imagined to be placed between vertices that share an edge. The vertices are pulled together by the springs, while a repulsive electrical force acts between all pairs of vertices. This process is also repeated until the system reaches equilibrium [7].
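To make the spring-electrical idea concrete, the following Python sketch runs a fixed number of force-directed sweeps on a small graph. It is an illustration under stated assumptions, not the algorithm that Sigma Js implements: the force formulas (repulsion k^2/d between all pairs, attraction d^2/k along edges) are a common textbook form, and the capped step, the cooling factor, and the name spring_electrical_layout are choices made for this example only.

```python
import math
import random


def spring_electrical_layout(vertices, edges, iterations=200, k=1.0, step=0.1, seed=0):
    """Return {vertex: (x, y)} after a fixed number of force-directed sweeps.

    `vertices` is a list of hashable labels and `edges` a list of (u, v) pairs.
    All parameter names and defaults are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    pos = {v: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for v in vertices}

    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in vertices}

        # Repulsive "electrical" force between every pair of vertices: k^2 / d.
        for i, u in enumerate(vertices):
            for v in vertices[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                force[u][0] += dx / d * f
                force[u][1] += dy / d * f
                force[v][0] -= dx / d * f
                force[v][1] -= dy / d * f

        # Attractive "spring" force along each edge: d^2 / k.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            force[u][0] -= dx / d * f
            force[u][1] -= dy / d * f
            force[v][0] += dx / d * f
            force[v][1] += dy / d * f

        # Move each vertex along its net force, by at most `step`.
        for v in vertices:
            fx, fy = force[v]
            magnitude = math.hypot(fx, fy)
            if magnitude > 0.0:
                scale = min(magnitude, step) / magnitude
                pos[v][0] += fx * scale
                pos[v][1] += fy * scale

        # Shrink the step so that the layout settles instead of oscillating.
        step *= 0.98

    return {v: (x, y) for v, (x, y) in pos.items()}


if __name__ == "__main__":
    layout = spring_electrical_layout(
        ["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")]
    )
    for vertex, (x, y) in layout.items():
        print(vertex, round(x, 3), round(y, 3))
```

The all-pairs repulsion loop is quadratic in the number of vertices, which is one way to see why repositioning takes longer as the graph grows.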

Figure 2: Force-directed graph drawing technique [6].

Figure 3: Spring-Electrical Models [7].

The multilevel approach to graph drawing aims to scale very large graphs down to smaller ones. This is done by taking the edge connections between multiple vertices and separating them into layers. Figure 4 demonstrates the multilevel approach. First, the original graph is broken down into parts, and then new vertices are created, each encapsulating the part it represents. The new vertices are used to construct a smaller graph, which is then displayed. This smaller graph allows easy visualization of the entire graph.

Figure 4: Multilevel graph visualization approach.

Visualization of large networks, i.e., graphs with millions of entities or more, is very challenging. This is due to the constraints of screen displays and the limitations of current graph drawing algorithms. To solve this problem, we implement a multilevel approach in which the network is partitioned into smaller graphs that hold different parts of the larger graph, as illustrated in Figure 4.

A network can be partitioned in many different ways. It can be partitioned by labeled categories in the dataset, by weights associated with the vertices, or by a user-defined parameter present in the data. For example, if a dataset consists of a list of interactions between different animals, the data can be partitioned by grouping together animals that belong to the same species. This way, we can visualize a higher-level view in which each species is represented by a new vertex in a smaller graph. We can then navigate to a specific species to view an animal that belongs to that category.
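The species example can be sketched in a few lines of Python. This is a minimal illustration, assuming the dataset has already been read into a dictionary from vertex to category label and a list of edges; the function name coarsen_by_category and the sample animal labels are hypothetical.

```python
from collections import defaultdict


def coarsen_by_category(category_of, edges):
    """Group vertices by category and count the connections between groups.

    Returns (members, links), where members[category] lists the vertices in
    that category and links[(c1, c2)] counts the edges joining two different
    categories (the pair is stored in sorted order).
    """
    members = defaultdict(list)
    for vertex, category in category_of.items():
        members[category].append(vertex)

    links = defaultdict(int)
    for u, v in edges:
        cu, cv = category_of[u], category_of[v]
        if cu != cv:  # edges inside one category stay in the lower level
            links[tuple(sorted((cu, cv)))] += 1

    return dict(members), dict(links)


# Hypothetical interaction data: each animal is labelled with its species.
category_of = {
    "lion_1": "lion",
    "lion_2": "lion",
    "zebra_1": "zebra",
    "hyena_1": "hyena",
}
edges = [
    ("lion_1", "lion_2"),
    ("lion_1", "zebra_1"),
    ("lion_2", "zebra_1"),
    ("hyena_1", "zebra_1"),
]
members, links = coarsen_by_category(category_of, edges)
```

Each key of members becomes one vertex of the higher-level graph, and each key of links becomes one edge between two such vertices.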

There are many software tools currently used to visualize small graphs. Gephi [2] is a desktop application for interactive visualization and exploration of networks and complex systems. It can be used for social network analysis, exploratory data analysis, and biological network analysis. It provides tools for people to explore and understand graphs through graphical visualization. Sigma Js [9], D3 Js, and Processing Js are browser-based JavaScript libraries that can be used for graph drawing. JavaScript is a dynamic programming language used to develop browser-based applications. These libraries simplify network visualization in a browser and allow application developers to integrate network exploration. We chose the Sigma Js library because it is the most lightweight of the three libraries and allows more user interaction with the display. We are creating a web application with a user interface through which users can upload a large formatted graph with multiple connections. Sigma Js takes a specific input format that lists the vertices and edges along with properties such as label, color, and size. We are developing a PHP script for preprocessing that converts the user's input to the format that Sigma Js recognizes. The end goal of this project is to enable users to upload their networks, consisting of millions of vertices and billions of edges, and visualize them in a multilevel manner.
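The preprocessing step can be illustrated with a short Python sketch (the project itself uses a PHP script, which is not reproduced here). Two things are assumed: that the input is a plain edge list in which lines starting with # are comments and every other line holds two whitespace-separated vertex identifiers, and that the output is a JSON object with a nodes list (id, label, x, y, size, color) and an edges list (id, source, target). The function name edge_list_to_sigma_json and the default color are hypothetical.

```python
import json
import math


def edge_list_to_sigma_json(text, color="#6699cc"):
    """Convert an edge-list string into a JSON string with nodes and edges."""
    nodes = {}
    edges = []

    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        source, target = line.split()[:2]
        for vertex in (source, target):
            if vertex not in nodes:
                nodes[vertex] = {
                    "id": vertex,
                    "label": vertex,
                    "size": 1,
                    "color": color,
                }
        edges.append({"id": "e%d" % len(edges), "source": source, "target": target})

    # Give every vertex a starting position on a circle; a layout algorithm
    # is expected to move the vertices afterwards.
    count = max(len(nodes), 1)
    for i, node in enumerate(nodes.values()):
        angle = 2.0 * math.pi * i / count
        node["x"] = math.cos(angle)
        node["y"] = math.sin(angle)

    return json.dumps({"nodes": list(nodes.values()), "edges": edges}, indent=2)


if __name__ == "__main__":
    sample = "# FromNodeId ToNodeId\n0\t1\n1\t2\n2\t0\n"
    print(edge_list_to_sigma_json(sample))
```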

Methodology

The process begins with a formatted graph that consists of multiple vertices and edges. The graph is split into smaller graphs according to their connections. This creates multiple layers holding the different parts of the graph. The formatted description of each vertex in a smaller graph holds the identifier of the lower-level network it represents. When the user wants to navigate to a certain part of the graph, we use the identifier to locate that part and zoom the display onto it. The vertex zoom functionality will be created using JavaScript. Mouse click functionality will also be implemented. The user can use the mouse to navigate through the network by zooming onto specific layers of the graph or directly onto a vertex. The Sigma Js library uses the force-directed method for drawing, through a plug-in called force atlas. When the user's network is ready for display, the force atlas plug-in is called to calculate the positions of the vertices. The algorithm positions the vertices so that the edges are close to equal in length and edge crossings are reduced as much as possible.
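The identifier lookup that drives navigation can be sketched as a two-step search: first find the lower-level part that holds an item, then find the item itself. The Python below is a hypothetical illustration only; the application performs this step in JavaScript, and the names build_search_index, navigation_steps, zoom_to_part, and zoom_to_item are invented for this example.

```python
def build_search_index(parts):
    """Map each item to the identifier of the lower-level part that holds it.

    `parts` maps a part identifier to the list of items stored in that part.
    """
    index = {}
    for part_id, items in parts.items():
        for item in items:
            index[item] = part_id
    return index


def navigation_steps(index, item):
    """Return the two zoom targets for a search, or None if the item is unknown."""
    part_id = index.get(item)
    if part_id is None:
        return None
    # Zoom to the part that contains the item first, then to the item itself.
    return [("zoom_to_part", part_id), ("zoom_to_item", item)]


index = build_search_index({"part_0": ["v0", "v1"], "part_1": ["v100", "v101"]})
steps = navigation_steps(index, "v101")
```

Building the index once keeps each search to a single dictionary lookup, regardless of how many parts the network has been split into.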

During the first four weeks of the eight-week research term, we worked on creating the user interface and building example networks to display. The goal of the application is to allow users to better visualize and interact with their large networks. The user interface is designed to allow users to move vertices around the screen, zoom in and out of specific items, and display textual information about a vertex. We also added functionality to change the color of the vertices. Most importantly, the user interface comes with a search bar where users can search for particular items. The user interface was designed using HTML, the markup language used to structure web pages. It consists of input boxes and buttons with which the user can interact using a mouse and a keyboard. Using JavaScript, we connected the user's actions to specific aspects of the network display, thereby making the display interactive. We tested networks of different sizes (small, large, and very large) to analyze the visualization, interactivity, and performance of the displays. We found that Sigma Js can process networks with up to 1000 vertices with good performance; when the vertex count exceeds that amount, performance begins to degrade. This finding is acceptable for the multilevel approach we use to solve our problem. If a network with a million vertices is chosen for visualization, it can be scaled down to a network with 1000 vertices, where each vertex holds another network with 1000 vertices.
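The scaling argument can be written out as a short calculation. It assumes that every displayed graph holds at most k vertices (k = 1000, taken from the measurement above) and that the partitions are balanced, which real networks do not guarantee. Under those assumptions, the number of levels L needed for a network with n vertices is the smallest L with k^L >= n:

```latex
L = \left\lceil \log_k n \right\rceil,
\qquad
n = 10^{6},\; k = 10^{3}
\;\Longrightarrow\;
L = \left\lceil \tfrac{6}{3} \right\rceil = 2 .
```

By the same formula, a network with a billion vertices would need three levels.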

The last four weeks of the research term were dedicated to partitioning large graphs into smaller-scale representations. To test the multilevel approach, we chose a network with 1000 vertices and partitioned it into 10 parts. We partitioned it by vertex number: 0 to 99, 100 to 199, and so on. We used C++ to write the code for breaking up the larger graph, following the format of the datasets downloaded from the large network databases. The partitions were written to new files, and another file was created with vertices linked to the partitions. The files are in JSON format, which Sigma Js reads to create the display of the vertices and edges.

Findings

We tested many networks from two main sources: KONECT, the Koblenz Network Collection [10], and the Stanford Large Network Dataset Collection [11]. We also tested many randomly generated graphs with arbitrary sizes and vertex positions. Here are some of the results from displaying the networks using Sigma Js. Figure 5(a) shows a display of a randomly generated graph using Sigma Js. Figure 5(b) shows the same graph with the force-directed plug-in from Sigma Js applied to it. As mentioned before, when the force-directed algorithm is applied, the vertices of the network are moved so that the edges are close to equal in length.

Figure 5: (a) Randomly generated graph with Sigma Js. (b) Force-directed plug-in applied.

Figures 6(a), 6(b), and 6(c) show examples of networks visualized using Sigma Js. These networks were downloaded from the Stanford Large Network Dataset Collection. The format of the datasets was defined by their creators and therefore had to be converted to the format required by Sigma Js. After converting from Stanford's graph data format to the JSON format of Sigma Js, we displayed each graph along with its properties. We also tested the effects of the user interface dialog box on these networks. We found that the vertices responded to the mouse and keyboard actions designed in the program. The vertices move accordingly and change color when the option to change a vertex color is selected in the user interface. Figure 6(a) displays a network with 1000 vertices, Figure 6(b) a network with 5000 vertices, and Figure 6(c) a network with 10,000 vertices. As the displays show, the network becomes cluttered with vertices as its size grows. It becomes very hard to visualize, and it is not easy to interpret the information conveyed by the graph. It also takes a long time to navigate through the graph to find a specific item.

Figure 6: (a) 1000 vertices. (b) 5000 vertices. (c) 10,000 vertices.

The initial view of the user interface (Figure 7) allows the user to interact directly with the displayed network. Interactivity plays an important role in the visualization of networks, especially when implementing the multilevel approach. The user interface makes the information within the network easily accessible through a navigable display. The display can be repositioned to visualize specific parts of the network or to move onto certain vertices. The user interface allows the user to search for specific items within the dataset, change the color of the vertices and edges, and change how the edges are drawn. The user can also fit the network to the screen if they have navigated too far into the display. We fixed the number of iterations of the force-directed algorithm that run when the network is first loaded in the web browser. The user interface has an option for the user to continue iterating the algorithm to get a better display of the network.

Figure 7: User interface dialog box.

To test the multilevel approach to visualizing large networks, we chose to partition the network from Figure 6(a). Our goal was to partition it into 10 parts and create a new display that links each part to the file it is stored in. When Sigma Js is loaded, the new display is drawn onto the screen and follows the interactivity of the user interface. The color, position, and style of the vertices in those displays can be changed. We can now navigate to the specific parts of the network we want to display. We added two animations that display either the part of the graph the user wants to navigate to or a specific item. If the user searches for an item in the search box, the first animation zooms in to the part of the graph that the item belongs to, and that part of the graph is loaded onto the screen. The second animation zooms onto the item searched for and displays it along with its attributes.

Figure 8 shows the display of the scaled-down version of the test network (Figure 6(a)). Each vertex is color-coded to match the color of the part of the larger network it is linked to. The graph is displayed with the force-directed algorithm applied to it. Figure 9 shows the different partitions that were created. The display of each part consists of the items that belong to it and the items from other parts that are linked to them, all following the color coding. If there is at least one connection between two parts of the graph, an edge is drawn in Figure 8 to connect those two parts.
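The partitioning rule described above can be summarized in a short Python sketch. It is not the C++ code used in the project; it assumes integer vertex identifiers and an in-memory edge list, and the function name partition_by_id_range and the dictionary keys own, linked, and edges are hypothetical. Vertices that have no edges never appear in the edge list and are therefore not assigned to a part by this sketch.

```python
def partition_by_id_range(edges, part_size=100):
    """Split integer-labelled vertices into consecutive identifier ranges.

    Returns (parts, coarse_edges):
      parts[p]["own"]    vertices whose identifier falls in range p
      parts[p]["linked"] vertices from other parts connected to this part
      parts[p]["edges"]  edges with at least one endpoint in part p
      coarse_edges       set of (p, q) pairs, p < q, with at least one
                         connection between part p and part q
    """
    parts = {}
    coarse_edges = set()

    def part(p):
        return parts.setdefault(p, {"own": set(), "linked": set(), "edges": []})

    for u, v in edges:
        pu, pv = u // part_size, v // part_size
        part(pu)["own"].add(u)
        part(pv)["own"].add(v)
        part(pu)["edges"].append((u, v))
        if pu != pv:
            # The edge crosses two parts: show each endpoint in the other
            # part's display and connect the two parts in the scaled graph.
            part(pu)["linked"].add(v)
            part(pv)["linked"].add(u)
            part(pv)["edges"].append((u, v))
            coarse_edges.add((min(pu, pv), max(pu, pv)))

    return parts, coarse_edges


parts, coarse_edges = partition_by_id_range(
    [(0, 1), (5, 150), (150, 160), (250, 251)], part_size=100
)
```

In this small example the only coarse edge joins parts 0 and 1, because the single crossing edge connects vertex 5 to vertex 150.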

Figure 8: Scaled display result from partitioning network in Figure 6(a).

Figure 9: The different parts of the larger graph the vertices in Figure 8 are linked to.

Discussion

The results of the tests run on network displays using Sigma Js confirmed our assumptions about the multilevel approach. When the network is scaled down to a smaller representation, we are able to analyze the larger network much more easily. For our tests, we chose to use a network consisting of 1000 vertices. Given the results from these tests, we believe that partitioning and visualizing networks with over a million vertices will follow the same process and produce similar results. We have set goals to test our partition algorithm on these much larger networks. The next step is to create multiple stages when partitioning the networks. For example, if a network has a million vertices, we can partition it into 1000 parts, each consisting of 1000 vertices from the larger network. We can then partition further by taking the resulting display, which consists of 1000 vertices linked to the 1000 parts of the larger graph, and splitting it into 10 parts.

We encountered several challenges while conducting this research. The primary concern when designing the application was to do as much processing as possible on the client side and as little as possible on the server side. In web applications, server-side processes are handled on the host's computer, and client-side processes are handled on the computer of the user accessing the application. We aim to perform the partitioning of the graph on the user's end. However, in the current approach, this is done on the server. We faced another problem with the force-directed plug-in provided by Sigma Js. In some displays, the algorithm ran continuously without stopping, and some vertices kept moving, sometimes back and forth around the same position. We resolved this by running the plug-in for a fixed number of iterations and then halting it for the first display. As mentioned earlier, we provided an option in the user interface for the user to continue iterating the plug-in if they want a better display than the one provided.

We are now developing this work further to possibly include an improved user interface dialog box and parallel partitioning of larger networks. We will test different partitioning algorithms on networks consisting of millions of vertices and billions of edges. The goal is to minimize the time it takes to partition the items in the larger datasets and to display the results. If the partitioning time can be made small enough, we will add a function to the user interface that allows users to partition the network in real time. They will be able to define how they want the data to be separated, in accordance with the format provided, and see the result on the display screen. We will also improve how the force-directed plug-in is iterated so that the first display of the network is satisfactory.

We believe that the findings of this research project will greatly benefit those interested in analyzing and interpreting their large datasets through visualization. The end result of the research will be a website that gives users access to the application. The website will allow anyone to upload their datasets and easily visualize and interact with the information they convey. The website will also support multiple dataset formats and will provide guidelines for users to follow so that they upload a correctly formatted document.

References

1. Rhishikesh S. Fansalkar, “Graph Theory Origin and Seven Bridges of Königsberg”, New York University, 2007.
2. The Gephi team, “Gephi”, http://gephi.github.io/, last accessed August 2014.
3. The Google Team, “Inside Search”, http://www.google.com/insidesearch/features/search/knowledge.html, last accessed August 2014.
4. “Graph layout”, http://goblin2.sourceforge.net/refman/pageGraphLayout.html, last accessed August 2014.
5. Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, “A Tour Through the Visualization Zoo”, http://homes.cs.washington.edu/~jheer/files/zoo/, last accessed August 2014.
6. John Howse, Peter Rodgers, and Gem Stapleton, “VL/HCC Tutorial 2009: Automated Diagram Drawing”, http://www.eulerdiagrams.com/tutorial/AutomatedDiagramDrawing.html, last accessed August 2014.
7. Yifan Hu, “Current and Future Challenges in the Visualization of Large Networks”, Encyclopedia of Social Network Analysis and Mining, 2013.
8. Yifan Hu, “Efficient, High-Quality Force-Directed Graph Drawing”, The Mathematica Journal 10(1), 2006.
9. Alexis Jacomy, “Sigma js library”, http://sigmajs.org/, last accessed August 2014.
10. Jérôme Kunegis, “KONECT-The Koblenz Network Collection”, http://konect.uni-koblenz.de/networks/, last accessed August 2014.
11. Jure Leskovec, “Stanford Large Network Dataset Collection”, http://snap.stanford.edu/data/index.html, last accessed August 2014.