diffusion processes on complex networksprac.im.pwr.wroc.pl/~szwabin/assets/diff/2.pdf ·...

28.02.2018 2_network_properties


Diffusion processes on complex networks

Lecture 2 - Network properties

Janusz Szwabiński

Overview:

- Degrees and their distribution
- Adjacency matrix
- Paths and distances
- Network diameter
- Connectedness
- Clustering coefficient

In [1]:

%matplotlib inline

In [2]:

import networkx as nx

Degrees and their distributions

- the degree $k_i$ of a node $i$ is the number of links the node has to other nodes
- it is a key property of each node in a network
- in an undirected graph the total number of links $L$ can be expressed as the sum of the node degrees:

$$L = \frac{1}{2} \sum_{i=1}^{N} k_i$$

The factor $1/2$ corrects for the fact that in the sum each link is counted twice.

In [3]:

G = nx.barabasi_albert_graph(100,4)

In [4]:

len(G.edges())

Out[4]:

384



In [5]:

def count_edges(graph):
    """Sums node degrees to count the number of edges"""
    count = 0
    for n in graph.nodes():
        count = count + graph.degree(n)
    # each link is counted twice in the sum, once for each of its end nodes
    return count/2

In [6]:

count_edges(G)

Out[6]:

384.0

In [7]:

H = nx.erdos_renyi_graph(100,0.2)

In [8]:

len(H.edges()) == int(count_edges(H))

Out[8]:

True

Average degree

- an important property of a whole network
- for undirected networks:

$$\langle k \rangle = \frac{1}{N} \sum_{i=1}^{N} k_i = \frac{2L}{N}$$

In [9]:

degrees = dict(G.degree())
kaver = sum(degrees.values())/len(G)
print(kaver)

7.68

In [10]:

2*len(G.edges())/len(G)

Out[10]:

7.68



Directed networks

- incoming degree $k_i^{in}$: the number of links pointing to node $i$
  - for example, it is the number of WWW pages that include hyperlinks pointing to a given document
- outgoing degree $k_i^{out}$: the number of links that point from node $i$ to other nodes
  - for example, the number of webpages a given document is pointing to (i.e. the number of hyperlinks contained in this document)
- a node's total degree is then given by

$$k_i = k_i^{in} + k_i^{out}$$

- degree and the number of edges:

$$L = \sum_{i=1}^{N} k_i^{in} = \sum_{i=1}^{N} k_i^{out}$$

- average degree:

$$\langle k^{in} \rangle = \frac{1}{N} \sum_{i=1}^{N} k_i^{in} = \frac{L}{N}$$

$$\langle k^{out} \rangle = \frac{1}{N} \sum_{i=1}^{N} k_i^{out} = \frac{L}{N}$$

Thus:

$$\langle k^{in} \rangle = \langle k^{out} \rangle$$

In [11]:

D = nx.scale_free_graph(20)

In [12]:

nx.draw(D)

In [13]:

len(D.edges())

Out[13]:

46

In [14]:

count_edges(D)

Out[14]:

46.0

In [15]:

outdegrees = dict(D.out_degree())
koa = sum(outdegrees.values())/len(D)
print(koa)

2.3

In [16]:

indegrees = dict(D.in_degree())
kia = sum(indegrees.values())/len(D)
print(kia)

2.3

In [17]:

len(D.edges())/len(D)

Out[17]:

2.3

Degree distribution

In [18]:

G.degree()

Out[18]:

DegreeView({0: 12, 1: 24, 2: 24, 3: 1, 4: 37, 5: 30, 6: 14, 7: 16, 8: 26, 9: 10,
10: 21, 11: 9, 12: 11, 13: 14, 14: 15, 15: 15, 16: 10, 17: 6, 18: 10, 19: 12,
20: 6, 21: 8, 22: 6, 23: 8, 24: 11, 25: 12, 26: 11, 27: 7, 28: 11, 29: 7,
30: 7, 31: 8, 32: 10, 33: 8, 34: 7, 35: 8, 36: 5, 37: 8, 38: 10, 39: 7,
40: 4, 41: 5, 42: 7, 43: 4, 44: 5, 45: 8, 46: 5, 47: 4, 48: 7, 49: 4,
50: 6, 51: 8, 52: 4, 53: 8, 54: 5, 55: 4, 56: 8, 57: 6, 58: 5, 59: 5,
60: 5, 61: 6, 62: 5, 63: 5, 64: 4, 65: 5, 66: 6, 67: 4, 68: 4, 69: 4,
70: 4, 71: 4, 72: 4, 73: 4, 74: 4, 75: 4, 76: 4, 77: 4, 78: 4, 79: 7,
80: 4, 81: 5, 82: 4, 83: 5, 84: 4, 85: 4, 86: 5, 87: 4, 88: 4, 89: 4,
90: 4, 91: 4, 92: 4, 93: 4, 94: 4, 95: 4, 96: 4, 97: 4, 98: 4, 99: 4})



- nodes have different degrees
- their distribution is another important property of a network
- the degree distribution $p_k$ provides the probability that a randomly selected node in the network has degree $k$
- since $p_k$ is a probability, it must be normalized:

$$\sum_{k=1}^{\infty} p_k = 1$$

- for a network with $N$ nodes the degree distribution is nothing but the normalized histogram

$$p_k = \frac{N_k}{N},$$

where $N_k$ is the number of nodes with degree $k$. Hence we have $N_k = N p_k$.

Example 1

Consider the following network:

In [19]:

G1 = nx.Graph()
G1.add_edges_from([(3,4),(4,2),(3,2),(2,1)])

In [20]:

nx.draw(G1,with_labels=True)

We have $N = 4$ nodes. One node has the degree 1, thus

$$p_1 = \frac{1}{4} = 0.25.$$

There are 2 nodes with the degree 2:

$$p_2 = \frac{2}{4} = 0.5$$

And we have one node with degree 3:

$$p_3 = \frac{1}{4} = 0.25.$$

There are no other nodes, i.e.

$$p_{k>3} = 0.$$

Let us plot the corresponding histogram:

In [21]:

import matplotlib.pyplot as plt
degs = [1,2,3]
pvals = [0.25,0.5,0.25]
plt.stem(degs,pvals)
plt.xlabel("degree $k$")
plt.ylabel("$p_k$")
plt.title("Degree distribution")
plt.xticks(range(1,4))

Out[21]:

([<matplotlib.axis.XTick at 0x7fb7fe257e80>,
  <matplotlib.axis.XTick at 0x7fb8350e87b8>,
  <matplotlib.axis.XTick at 0x7fb7fe220c88>],
 <a list of 3 Text xticklabel objects>)

Now, let us calculate the histogram on a computer:



In [22]:

import collections
degree_sequence = sorted([d for n, d in G1.degree()], reverse=True) # degree sequence
print(degree_sequence)
degreeCount = collections.Counter(degree_sequence)
print(degreeCount)
deg, cnt = zip(*degreeCount.items())
print(deg,cnt)

[3, 2, 2, 1]
Counter({2: 2, 3: 1, 1: 1})
(3, 2, 1) (1, 2, 1)

In [23]:

#the histogram
plt.stem(deg, cnt)
plt.title("Degree Histogram")
plt.ylabel("Count")
plt.xlabel("Degree")
#and the graph as inset
plt.axes([0.6, 0.6, 0.2, 0.2]) #left bottom width height
pos = nx.spring_layout(G1)
plt.axis('off')
nx.draw_networkx_nodes(G1, pos, node_size=20)
nx.draw_networkx_edges(G1, pos, alpha=0.4)

Out[23]:

<matplotlib.collections.LineCollection at 0x7f56df005a20>

Example 2



In [24]:

G2 = nx.watts_strogatz_graph(11,2,0)
nx.draw_circular(G2,with_labels=True)

In this case we have a ring with all nodes having the degree 2, thus

$$p_k = \begin{cases} 1, & k = 2 \\ 0, & \text{otherwise} \end{cases}$$

The corresponding histogram is as follows:

In [25]:

#count the degrees
degree_sequence = sorted([d for n, d in G2.degree()], reverse=True) # degree sequence
degreeCount = collections.Counter(degree_sequence)
deg, cnt = zip(*degreeCount.items())
#plot the histogram
plt.stem(deg, cnt)
plt.title("Degree Histogram")
plt.ylabel("Count")
plt.xlabel("Degree")
#and the graph as inset
plt.axes([0.6, 0.6, 0.2, 0.2]) #left bottom width height
pos = nx.spring_layout(G2)
plt.axis('off')
nx.draw_networkx_nodes(G2, pos, node_size=20)
nx.draw_networkx_edges(G2, pos, alpha=0.4)

Out[25]:

<matplotlib.collections.LineCollection at 0x7f56df0aacc0>

Please note that in this case the degree distribution is a Kronecker delta function, i.e. $p_k = \delta(k - 2)$.



Importance of the degree distribution

- it allows one to calculate many network properties
  - e.g. the average degree of a network may be written as

$$\langle k \rangle = \sum_{k=0}^{\infty} k p_k$$

- the precise functional form of $p_k$ impacts many network phenomena, from network robustness to the spread of viruses

Degree distributions of real networks

- node degrees can vary widely
- networks (c), (d) and (f) appear to have power-law distributions, as indicated by their approximately straight-line forms on the doubly logarithmic scales
- (b) has a power-law tail, but deviates from power-law behavior for small degrees
- the power grid network is described by an exponential distribution (note the log-linear scale)
- network (a) appears to have a truncated power-law degree distribution of some type, or possibly two separate power-law regimes with different exponents
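The relations $p_k = N_k / N$ and $\langle k \rangle = \sum_k k p_k$ can be checked by hand on the four-node network from Example 1. The following is a minimal sketch in plain Python (no NetworkX), assuming the network is given as an edge list; the variable names are chosen for illustration only. Nodes without any link would not appear in an edge list, so the sketch assumes there are none.

```python
from collections import Counter

# Edge list of the four-node example network G1 from Example 1.
edges = [(3, 4), (4, 2), (3, 2), (2, 1)]

# Degree of each node: every edge adds one to both of its end nodes.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

N = len(degree)                    # number of nodes (assumes no isolated nodes)
N_k = Counter(degree.values())     # N_k: number of nodes with degree k
p_k = {k: n / N for k, n in sorted(N_k.items())}   # p_k = N_k / N

normalization = sum(p_k.values())                # sum_k p_k
k_from_pk = sum(k * p for k, p in p_k.items())   # <k> = sum_k k p_k
k_from_links = 2 * len(edges) / N                # <k> = 2L / N

print(p_k)
print(normalization, k_from_pk, k_from_links)
```

Both ways of computing the average degree agree, and the values of $p_k$ match those derived by hand in Example 1.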



Adjacency matrix

- for mathematical purposes networks are often represented through their adjacency matrices
- the adjacency matrix of a directed network of $N$ nodes has $N$ rows and $N$ columns, its elements being:
  - $A_{ij} = 1$ if there is a link pointing from node $j$ to node $i$
  - $A_{ij} = 0$ if nodes $i$ and $j$ are not connected to each other
- the adjacency matrix of an undirected network is symmetric, i.e. $A_{ij} = A_{ji}$

Example 1

Consider the following undirected network:

In [26]:

G = nx.Graph()
G.add_edges_from([(1,3),(1,2),(3,2),(2,4)])
nx.draw_spring(G,with_labels=True)

The adjacency matrix of the graph is

$$A = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$

Having the adjacency matrix, we can express the degree of a node as a sum over the appropriate column or row of the matrix, e.g.

$$k_2 = \sum_{j=1}^{4} A_{2j} = \sum_{i=1}^{4} A_{i2} = 3$$

Since $A_{ij} = A_{ji}$ and $A_{ii} = 0$, then

$$L = \frac{1}{2} \sum_{i,j=1}^{N} A_{ij}$$

In NetworkX, once we have built a graph, we can look at its adjacency matrix with

In [27]:

A = nx.adjacency_matrix(G)

In [28]:

print(A.todense())

[[0 1 1 0]
 [1 0 1 0]
 [1 1 0 1]
 [0 0 1 0]]

The rows and columns follow the order in which the nodes are stored in the graph, which here differs from the natural order:

In [29]:

G.nodes()

Out[29]:

NodeView((1, 3, 2, 4))

To obtain the matrix in the order 1, 2, 3, 4 we pass the node list explicitly:

In [30]:

A = nx.adjacency_matrix(G,nodelist=[1,2,3,4])
print(A.todense())

[[0 1 1 0]
 [1 0 1 1]
 [1 1 0 0]
 [0 1 0 0]]

Example 2

Let us consider now a directed network, e.g.



In [31]:

D = nx.DiGraph()
D.add_edges_from([(3,2),(1,2),(3,1),(2,4)])
nx.draw_spring(D, with_labels=True)

The corresponding adjacency matrix has the form

$$A = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$

Again, we can use the matrix to calculate the degree of a node and the number of edges:

$$k_2^{in} = \sum_{j=1}^{4} A_{2j} = 2$$

$$k_2^{out} = \sum_{i=1}^{4} A_{i2} = 1$$

$$L = \sum_{i,j=1}^{N} A_{ij}$$

Important note

In some texts you may find a different convention for the adjacency matrix of a directed graph, i.e.

$$A^*_{ij} = \begin{cases} 1, & \text{if there is an edge from } i \text{ to } j \\ 0, & \text{otherwise} \end{cases}$$

Coming back to our example

In [32]:

nx.draw_spring(D, with_labels=True)

this alternative definition gives the following adjacency matrix:

$$A^* = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

Note that

$$A^* = A^T$$

In other words, the sum that gave the incoming degree before now gives the outgoing one and vice versa:

$$k_2^{in} = \sum_{i=1}^{4} A^*_{i2} = 2$$

$$k_2^{out} = \sum_{j=1}^{4} A^*_{2j} = 1$$

It actually does not matter which convention is used, provided it is used consistently.
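The two conventions can be compared side by side. The following is a minimal sketch in plain Python, assuming the directed example network is given as a list of (source, target) pairs; the helper names `row_sum` and `col_sum` are introduced here for illustration only.

```python
# Directed edges (source, target) of the example network D.
edges = [(3, 2), (1, 2), (3, 1), (2, 4)]
nodes = [1, 2, 3, 4]
index = {node: pos for pos, node in enumerate(nodes)}
n = len(nodes)

# Lecture convention: A[i][j] = 1 if a link points from j to i.
A = [[0] * n for _ in range(n)]
# Alternative convention: A_star[i][j] = 1 if a link points from i to j.
A_star = [[0] * n for _ in range(n)]
for source, target in edges:
    A[index[target]][index[source]] = 1
    A_star[index[source]][index[target]] = 1

def row_sum(M, node):
    """Sum over the row that belongs to `node`."""
    return sum(M[index[node]])

def col_sum(M, node):
    """Sum over the column that belongs to `node`."""
    return sum(row[index[node]] for row in M)

transpose_of_A = [list(col) for col in zip(*A)]

print(A_star == transpose_of_A)                # A* is the transpose of A
print(row_sum(A, 2), col_sum(A, 2))            # in- and out-degree of node 2 from A
print(col_sum(A_star, 2), row_sum(A_star, 2))  # the same two degrees from A*
```

When comparing with library output, note that NetworkX follows the alternative convention: for a directed graph, the entry in row $i$ and column $j$ of `nx.adjacency_matrix` corresponds to an edge from $i$ to $j$.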



Paths and distances

- in physical systems distance plays a key role in determining the interaction between their components
  - e.g. the distance between the Sun and the Earth determines the gravitational force that acts between them
- in case of networks distance is a challenging concept
  - e.g. what is the distance between two webpages?
  - the physical distance is not relevant here (webpages could be hosted on servers on opposite sides of the globe)
- in networks the physical distance is usually replaced by the path length
  - a path is a route that runs along the links of a network
  - the length of a path is the number of links the path contains
- the shortest path between nodes 1 and 7 is the path with the fewest edges
- there can be multiple shortest paths of the same length
- below, the length of the shortest path (the distance) between nodes $i$ and $j$ will be denoted as $d_{ij}$
- in an undirected network $d_{ij} = d_{ji}$
- in directed networks usually $d_{ij} \neq d_{ji}$



Adjacency matrix and paths

- the number of shortest paths $N_{ij}$ and the distance $d_{ij}$ between nodes $i$ and $j$ can be calculated directly from the adjacency matrix $A_{ij}$

$d_{ij} = 1$

If there is a direct link between $i$ and $j$, then $A_{ij} = 1$.

$d_{ij} = 2$

If there is a path of length 2 between nodes $i$ and $j$, then it must be $A_{ik} A_{kj} = 1$ for some $k$. Then the number of $d_{ij} = 2$ paths is given by

$$N^{(2)}_{ij} = \sum_{k=1}^{N} A_{ik} A_{kj} = (A^2)_{ij}$$

This is nothing but an element of $A^2$.

$d_{ij} = d$

If there is a path of length $d$ between nodes $i$ and $j$, then $A_{ik} \cdots A_{rj} = 1$.

As before, the number of paths of length $d$ between $i$ and $j$ is given by

$$N^{(d)}_{ij} = (A^d)_{ij}$$

- these equations hold for both directed and undirected networks
- the distance between nodes $i$ and $j$ is the smallest $d$ for which $N^{(d)}_{ij} > 0$
- an elegant approach that works well for networks of moderate sizes
- for large networks it is more efficient to use BFS to determine the distances between nodes
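The rule "the distance is the smallest $d$ with $(A^d)_{ij} > 0$" can be tried out on the undirected four-node network from the adjacency matrix section. The following is a minimal sketch in plain Python; the function names `mat_mul` and `distance_from_powers` are introduced here for illustration, and the sketch assumes $i \neq j$.

```python
def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def distance_from_powers(A, i, j):
    """Return (d, count): the smallest d with (A^d)[i][j] > 0 and that entry.

    Returns (None, 0) if no power up to N - 1 connects i and j.
    Assumes i != j.
    """
    n = len(A)
    power = A                      # A^1
    for d in range(1, n):          # a shortest path has at most N - 1 links
        if power[i][j] > 0:
            return d, power[i][j]
        power = mat_mul(power, A)  # A^(d+1)
    return None, 0

# Adjacency matrix of the undirected example network, node order 1, 2, 3, 4.
A = [[0, 1, 1, 0],
     [1, 0, 1, 1],
     [1, 1, 0, 0],
     [0, 1, 0, 0]]

# Nodes 1 and 4 have indices 0 and 3.
d, n_paths = distance_from_powers(A, 0, 3)
print(d, n_paths)
```

Each step costs a full matrix product, which is why this approach is only practical for networks of moderate size.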



Breadth-first search algorithm

- an algorithm for traversing or searching tree or graph data structures
- it starts at the tree root (or some arbitrary node of a graph, sometimes referred to as a 'search key') and explores the neighbor nodes first, before moving to the next-level neighbours
- BFS and its application in finding connected components of graphs were invented in 1945 by Konrad Zuse, but this was not published until 1972
- it was reinvented in 1959 by E. F. Moore, who used it to find the shortest path out of a maze
- it was discovered independently by C. Y. Lee as a wire routing algorithm (published 1961)
- applications:
  - copying garbage collection, Cheney's algorithm
  - finding the shortest path between two nodes u and v, with path length measured by number of edges
  - (reverse) Cuthill–McKee mesh numbering
  - Ford–Fulkerson method for computing the maximum flow in a flow network
  - serialization/deserialization of a binary tree, which allows the tree to be re-constructed in an efficient manner
  - construction of the failure function of the Aho-Corasick pattern matcher
  - testing bipartiteness of a graph

Consider the following network:

The identification of the shortest path between nodes $i$ and $j$ goes along the following steps:

1. Start at node $i$ and label it with 0.

2. Find nodes directly linked with the node $i$. Label them with distance 1 and put them in a queue.

3. Take the first node, labelled $n$, out of the queue ($n = 1$ in the first step). Find the unlabelled nodes adjacent to it in the graph. Label them with $n + 1$ and put them in the queue.

4. Repeat step 3 until you find the target node $j$ or there are no more nodes in the queue.

5. The distance between $i$ and $j$ is the label of $j$.

6. If $j$ does not have a label, the nodes belong to different components. Then $d_{ij} = \infty$.

The computational complexity of the BFS algorithm is

$$O(N + L)$$

- linear in both $N$ and $L$
- each node needs to be entered and removed from the queue at most once
- each link has to be tested only once

In [33]:

G = nx.Graph()



In [34]:

G.add_edges_from([(1,2),(2,3),(2,4),(3,5),(4,5),(3,6),(5,6),(6,7),(7,8),(7,9)])

In [35]:

nx.draw_spring(G,with_labels=True)

In [37]:

bfs5 = nx.bfs_tree(G,5)

In [42]:

nx.draw_spring(bfs5,with_labels=True)
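The labelling steps of the BFS algorithm can also be written out by hand. The following is a minimal sketch in plain Python, assuming the network is stored as a dictionary mapping each node to a list of its neighbors; the function name `bfs_distances` is introduced here for illustration. Unlike step 4 of the description, the sketch does not stop at a target node but labels every reachable node.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Label every node reachable from `source` with its distance."""
    distance = {source: 0}          # the start node gets label 0
    queue = deque([source])
    while queue:
        node = queue.popleft()      # take the first node out of the queue
        for neighbor in adjacency[node]:
            if neighbor not in distance:   # only unlabelled nodes are labelled
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance                 # nodes without a label are unreachable

# The nine-node example network used in this section.
edges = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5),
         (3, 6), (5, 6), (6, 7), (7, 8), (7, 9)]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, []).append(v)
    adjacency.setdefault(v, []).append(u)

from_1 = bfs_distances(adjacency, 1)
d_1_8 = from_1.get(8, float("inf"))   # infinite distance if node 8 has no label
print(from_1)
print(d_1_8)
```

Every node enters and leaves the queue at most once and every link is inspected from each of its two ends, which is where the $O(N + L)$ cost comes from.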



In [44]:

for e in nx.bfs_edges(G,5):
    print(e)

(5, 3)
(5, 4)
(5, 6)
(3, 2)
(6, 7)
(2, 1)
(7, 8)
(7, 9)

Network diameter

- the diameter $d_{max}$ of the network is the maximum shortest path in the network
- it is the largest distance recorded between any pair of nodes

In [45]:

print(nx.diameter(G))

5

Average path length

The average path length $\langle d \rangle$ is the average distance between all pairs of nodes in the network.

For a directed network of $N$ nodes it is given by:

$$\langle d \rangle = \frac{1}{N(N-1)} \sum_{\substack{i,j=1,N \\ i \neq j}} d_{ij}$$

In case of an undirected network each pair is counted only once:

$$\langle d \rangle = \frac{2}{N(N-1)} \sum_{\substack{i,j=1,N \\ i < j}} d_{ij}$$

- the average path length is measured only for node pairs that are in the same component
- it distinguishes an easily negotiable network from one which is complicated and inefficient (a shorter average path length being more desirable)
- even if the average path length is small, the network itself might have some very remotely connected nodes (and many nodes which are neighbors of each other)
- we can use the BFS algorithm to determine $\langle d \rangle$ for a large network:

1. Determine the distances between the first node and all other nodes with BFS.
2. Determine the distances between the second node and all other nodes.
3. Repeat the procedure for all remaining nodes.
4. Sum the distances and divide them by the number of pairs.
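The four steps above can be sketched in plain Python as follows. This is a minimal illustration, assuming an undirected network with at least one connected pair of nodes, stored as a dictionary of neighbor lists; the function names are introduced here for illustration. Ordered pairs are counted, so each unordered pair contributes twice to both the sum and the number of pairs, which leaves the average unchanged.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Distances from `source` to every node reachable from it."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance

def path_statistics(adjacency):
    """Return (average path length, diameter) over all connected pairs."""
    total = 0
    pairs = 0
    longest = 0
    for source in adjacency:                       # one BFS per node
        for target, d in bfs_distances(adjacency, source).items():
            if target != source:
                total += d
                pairs += 1
                longest = max(longest, d)
    return total / pairs, longest

# The nine-node example network used in the BFS section.
edges = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5),
         (3, 6), (5, 6), (6, 7), (7, 8), (7, 9)]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, []).append(v)
    adjacency.setdefault(v, []).append(u)

average_d, d_max = path_statistics(adjacency)
print(average_d, d_max)
```

Since one BFS costs $O(N + L)$, running it from every node costs $O(N(N + L))$ in total.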



In [46]:

nx.average_shortest_path_length(G)

Out[46]:

2.388888888888889

Connectedness

- the key utility of most networks is to ensure connectedness
  - a phone would be of limited use as a communication device if we could not call any valid phone number
  - the network behind the phone must be capable of establishing a path between any two nodes
- in an undirected network nodes $i$ and $j$ are connected if there is a path between them
- they are disconnected if such a path does not exist, in which case we have $d_{ij} = \infty$

In [47]:

TCN = nx.Graph()
TCN.add_edges_from([(1,2),(2,3),(1,3),(4,7),(7,6),(7,5),(5,6)])

In [48]:

nx.draw_spring(TCN,with_labels=True)



- the above network consists of two disconnected clusters (components)
- within each cluster, there are paths between any two nodes
- there are no paths between nodes belonging to different clusters
- a network is connected if all pairs of nodes in the network are connected
- a network is disconnected if there is at least one pair with $d_{ij} = \infty$
- for small networks visual inspection can help us to decide if they are connected or not
- for networks of moderate sizes the adjacency matrix can be rearranged into a block diagonal form if they are disconnected

In [49]:

A = nx.adjacency_matrix(TCN)
print(A.todense())

[[0 1 1 0 0 0 0]
 [1 0 1 0 0 0 0]
 [1 1 0 0 0 0 0]
 [0 0 0 0 1 0 0]
 [0 0 0 1 0 1 1]
 [0 0 0 0 1 0 1]
 [0 0 0 0 1 1 0]]

- thus, tools of linear algebra may be used to determine if the adjacency matrix is block diagonal
- for large networks the components are more efficiently identified using the BFS algorithm:

1. Start from a randomly chosen node $i$ and perform a BFS. Label all nodes reached this way with $n = 1$.

2. If the total number of labeled nodes equals $N$, then the network is connected. Otherwise, it consists of several components. To identify them, proceed to step 3.

3. Increase the label, $n \to n + 1$. Choose an unmarked node $j$ and start BFS to find all nodes reachable from $j$. Label them with $n$ and return to step 2.

In [51]:

for cc in nx.connected_components(TCN):
    print(cc)

{1, 2, 3}
{4, 5, 6, 7}

In [52]:

for cc in nx.connected_components(G):
    print(cc)

{1, 2, 3, 4, 5, 6, 7, 8, 9}

In [53]:

nx.is_connected(G)

Out[53]:

True
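The component labelling procedure described in this section can be sketched in plain Python as follows. This is a minimal illustration, assuming the network is stored as a dictionary of neighbor lists in which every node (including isolated ones) appears as a key; the function name `label_components` is introduced here for illustration. For simplicity the sketch starts from the first unlabelled node in the dictionary rather than a randomly chosen one.

```python
from collections import deque

def label_components(adjacency):
    """Assign the label n = 1, 2, ... to the nodes of each component."""
    label = {}
    n = 0
    for start in adjacency:
        if start in label:
            continue                # already reached by an earlier BFS
        n += 1                      # increase the label: n -> n + 1
        label[start] = n
        queue = deque([start])
        while queue:                # BFS restricted to unlabelled nodes
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in label:
                    label[neighbor] = n
                    queue.append(neighbor)
    return label

# The two-component example network TCN.
edges = [(1, 2), (2, 3), (1, 3), (4, 7), (7, 6), (7, 5), (5, 6)]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, []).append(v)
    adjacency.setdefault(v, []).append(u)

labels = label_components(adjacency)
components = {}
for node, n in labels.items():
    components.setdefault(n, set()).add(node)

is_connected = len(components) == 1
print(components)
print(is_connected)
```

The network is connected exactly when a single label suffices for all nodes.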



In [54]:

nx.is_connected(TCN)

Out[54]:

False

Clustering coefficient

- a clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together
- evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterised by a relatively high density of ties
- this likelihood tends to be greater than the average probability of a tie randomly established between two nodes
- two versions of this measure exist, the global and the local:
  - the global version was designed to give an overall indication of the clustering in the network
  - the local version gives an indication of the embeddedness of single nodes

For a node $i$ with degree $k_i$ the local clustering coefficient is defined as

$$c_i = \frac{2 L_i}{k_i (k_i - 1)}$$

Here:

- $L_i$ is the number of links between neighbors of node $i$
- $\frac{k_i (k_i - 1)}{2}$ is the maximum possible number of links between neighbors of node $i$

The values of $c_i$ range from 0 to 1:

In [55]:

C0 = nx.Graph()
C0.add_edges_from([(1,3),(4,3),(5,3),(2,3)])
nx.draw_spring(C0,with_labels=True)

In this case we have for instance:

$$c_3 = \frac{0}{6} = 0$$

In [56]:

C1 = nx.Graph()
C1.add_edges_from([(1,3),(4,3),(5,3),(2,3),(1,4),(1,2),(2,5)])

For the above network, the local clustering coefficient of the node 3 is equal to

$$c_3 = \frac{3}{6} = 0.5$$

In [57]:

C2 = nx.Graph()
C2.add_edges_from([(1,3),(4,3),(5,3),(2,3),(1,4),(1,2),(2,5),(4,5),(1,5),(4,2)])

In this case we have:

$$c_3 = \frac{6}{6} = 1$$

Please note that in the last example the neighbors of the target node 3 are connected via a complete graph. Indeed, if we remove node 3 from the network, the resulting graph will have the maximal possible number of links:

In [58]:

C2.remove_node(3)
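The definition $c_i = 2 L_i / (k_i (k_i - 1))$ can be evaluated directly for node 3 in the three example networks. The following is a minimal sketch in plain Python, assuming each network is given as an edge list; the function names are introduced here for illustration. For nodes with fewer than two neighbors the sketch returns 0, which is an assumed convention (the definition itself is undefined in that case).

```python
from itertools import combinations

def build_adjacency(edges):
    """Map each node to the set of its neighbors."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def local_clustering(adjacency, node):
    """c_i = 2 L_i / (k_i (k_i - 1)) for a single node."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0   # assumed convention for nodes with fewer than two neighbors
    # L_i: number of links between pairs of neighbors of the node
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return 2 * links / (k * (k - 1))

C0 = [(1, 3), (4, 3), (5, 3), (2, 3)]
C1 = C0 + [(1, 4), (1, 2), (2, 5)]
C2 = C1 + [(4, 5), (1, 5), (4, 2)]

c3_values = [local_clustering(build_adjacency(e), 3) for e in (C0, C1, C2)]
print(c3_values)
```

The three values reproduce the hand calculations for the star, the partially linked neighborhood and the fully linked neighborhood.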



- $c_i$ measures the network's local link density
- the more densely interconnected the neighborhood of node $i$, the higher is its local clustering coefficient
- having the clustering coefficients of each node, we can calculate the average clustering coefficient of the whole network:

$$\langle c \rangle = \frac{1}{N} \sum_i c_i$$

- $\langle c \rangle$ may be interpreted as the probability that two neighbors of a randomly selected node link to each other

In [63]:

c3 = nx.Graph()
c3.add_edges_from([(1,2),(2,3),(2,4),(2,5),(4,5),(4,6),(4,7),(5,7)])
# local clustering coefficients of nodes 1..7, used as node labels
ccs = ["0","1/6","0","1/3","2/3","0","1"]
labels={}
for i in range(1,8):
    labels[i]=ccs[i-1]
nx.draw_spring(c3,labels=labels)

For the above network we have:

$$\langle c \rangle = \frac{13}{42} \simeq 0.31$$



In [64]:

nx.clustering(c3)

Out[64]:

{1: 0,
 2: 0.16666666666666666,
 3: 0,
 4: 0.3333333333333333,
 5: 0.6666666666666666,
 6: 0,
 7: 1.0}

In [65]:

nx.average_clustering(c3)

Out[65]:

0.3095238095238095

- the global clustering coefficient $c_\Delta$ is based on triplets of nodes
- a triplet consists of three connected nodes
- a triangle includes three closed triplets, one centered on each of the nodes

In [66]:

#a triangle
tria = nx.Graph()
tria.add_edges_from([(1,2),(2,3),(3,1)])
nx.draw_spring(tria,with_labels=True)

Corresponding triplets:

1 → 2 → 3
2 → 3 → 1
3 → 1 → 2



- the global clustering coefficient is the number of closed triplets (or 3 x triangles) over the total number of triplets (both open and closed)
- the first attempt to measure it was made by Luce and Perry (1949)
- this measure gives an indication of the clustering in the whole network (global)
- it can be applied to both undirected and directed networks (often called transitivity)
- the roots of the global clustering coefficient go back to the social network literature of the 1940s (ratio of transitive triplets)

$\langle c \rangle$ vs $c_\Delta$

- if we look at the definition of the local clustering coefficient, $L_i$ is the number of triangles node $i$ is participating in, as each link between two neighbors of node $i$ closes a triangle
- thus, the global coefficient also captures the degree of network clustering
- however, $\langle c \rangle$ and $c_\Delta$ are not equivalent
- in random networks the measures differ slightly
  - the average coefficient places more weight on the low degree nodes
  - the global one places more weight on the high degree nodes
  - a weighted average where each local clustering score is weighted by $k_i (k_i - 1)$ is identical to the global clustering coefficient
- you can find networks in which the metrics give significantly different results:

$$c_i = \begin{cases} 1, & i \geq 3 \\ \frac{2}{N-1}, & i = 1, 2 \end{cases} \;\Rightarrow\; \langle c \rangle = 1 - O(1/N)$$

$$c_\Delta \simeq \frac{1}{N}$$
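The relation between the two measures, including the statement that weighting each $c_i$ by $k_i (k_i - 1)$ recovers the global coefficient, can be checked on the seven-node example network used for the average clustering coefficient. The following is a minimal sketch in plain Python, assuming the network is given as an edge list; all variable names are introduced here for illustration, and nodes with fewer than two neighbors are assigned $c_i = 0$ by assumption.

```python
from itertools import combinations

# The seven-node example network c3.
edges = [(1, 2), (2, 3), (2, 4), (2, 5), (4, 5), (4, 6), (4, 7), (5, 7)]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

closed = 0      # closed triplets, one per linked pair of neighbors of a centre node
triplets = 0    # all triplets (open and closed)
local = {}      # local clustering coefficient of each node
for node, neighbors in adjacency.items():
    k = len(neighbors)
    pairs = k * (k - 1) // 2       # triplets centred on this node
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    triplets += pairs
    closed += links
    local[node] = links / pairs if pairs else 0.0

global_c = closed / triplets                     # closed triplets / all triplets
average_c = sum(local.values()) / len(local)     # <c>

# Weighted average of the local coefficients with weights k_i (k_i - 1).
weights = {node: len(nb) * (len(nb) - 1) for node, nb in adjacency.items()}
weighted_c = sum(local[n] * weights[n] for n in local) / sum(weights.values())

print(global_c, average_c, weighted_c)
```

For this network the average coefficient comes out below the global one, while the weighted average coincides with the global coefficient.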
