network centrality measures and their effectiveness

Centrality measures: survey and comparisons. Authors: Antonio Esposito, Emanuele Pesce. Supervisors: Prof. Vincenzo Auletta, Ph.D. Diodato Ferraioli. April 2015. University of Salerno, Department of Computer Science.

Upload: emapesce

Post on 17-Aug-2015

Page 1: Network centrality measures and their effectiveness

Centrality measures: survey and comparisons

Authors: Antonio Esposito, Emanuele Pesce

Supervisors: Prof. Vincenzo Auletta, Ph.D. Diodato Ferraioli

April 2015

University of Salerno, Department of Computer Science

Page 2: Network centrality measures and their effectiveness

outline

Introduction

Centrality measures

Geometric measures

Path-based measures

Spectral measures

Effectiveness of centrality measures

Axioms for centrality

Information retrieval

Conclusions

Page 3: Network centrality measures and their effectiveness

introduction

Page 4: Network centrality measures and their effectiveness

centrality of a network

What is a centrality measure?

∙ Given a network, centrality is a quantitative measure which aims at revealing the importance of a node
∙ The more central a node is, the more important it is
∙ Formally, a centrality measure is a real-valued function on the nodes of a graph

What do we mean by center?

∙ There are many intuitive ideas about what a center is, so there are many different centrality measures

Page 5: Network centrality measures and their effectiveness

definition of center

The center of a star is at the same time:

∙ the node with the largest degree
∙ the node that is closest to the other nodes
∙ the node through which most shortest paths pass
∙ the node with the largest number of incoming paths
∙ the node with the largest entry in the dominant eigenvector of the graph matrix

Several centrality indices

∙ Different centrality indices capture different properties of a network

Page 6: Network centrality measures and their effectiveness

centrality: some applications

Centrality is often used to assess:

∙ how influential a person is in a social network
∙ how well used a road is in a transportation network
∙ how important a web page is
∙ how important a room is in a building

Page 7: Network centrality measures and their effectiveness

centrality measures

Page 8: Network centrality measures and their effectiveness

centrality measures

Geometric measures

∙ Indegree
∙ Closeness
∙ Harmonic
∙ Lin’s index

Path-based measures

∙ Betweenness

Spectral measures

∙ The left dominant eigenvector
∙ Seeley’s index
∙ Katz’s index
∙ PageRank
∙ HITS
∙ SALSA

Page 9: Network centrality measures and their effectiveness

different centrality measures

Example of different centrality measures applied to the same network

Page 10: Network centrality measures and their effectiveness

geometric measures

The idea

∙ In geometric measures the importance is a function of distances
∙ A geometric centrality depends on how many nodes exist at every distance

Page 11: Network centrality measures and their effectiveness

geometric measures: indegree centrality

∙ Indegree centrality is defined as the number of incoming arcs of a node x

$$C_{\text{indegree}}(x) = d^{-}(x) \tag{1}$$

∙ The node with the highest indegree is the most important

When to use it?

∙ To identify people whom you can talk to
∙ To identify people who will do favors for you
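As a rough illustration of Eq. (1), the following Python sketch counts incoming arcs. The arc-list representation, the function name `indegree_centrality`, and the small star graph are assumptions made for this example; they do not come from the slides.

```python
def indegree_centrality(arcs):
    """Return {node: number of incoming arcs} for a list of (source, target) arcs."""
    scores = {}
    for source, target in arcs:
        scores.setdefault(source, 0)          # make sure sources appear with score 0
        scores[target] = scores.get(target, 0) + 1
    return scores

# Hypothetical directed star: every leaf points at the hub "c".
star = [("a", "c"), ("b", "c"), ("d", "c")]
scores = indegree_centrality(star)
print(scores)
```

On this star the hub collects every arc, which matches the intuition that the center of a star is the node with the largest degree.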

Page 12: Network centrality measures and their effectiveness

indegree centrality: examples

Indegree measure applied to different networks

Page 13: Network centrality measures and their effectiveness

indegree centrality: examples

Indegree centrality can be deceiving because it is a local measure

Indegree centrality does not work well for:

∙ detecting nodes that are brokers between two groups
∙ predicting whether a piece of information will reach a node

Page 14: Network centrality measures and their effectiveness

geometric measures: closeness centrality

∙ Closeness centrality of x is defined by:

$$C_{\text{closeness}}(x) = \frac{1}{\sum_{d(y,x)<\infty} d(y,x)} \tag{2}$$

∙ To normalize, multiply the closeness by the maximum number of other nodes (n − 1); equivalently, divide the sum of distances by n − 1

∙ Nodes with empty coreachable set have centrality 0

∙ The closer a node is to all others, the more important it is

When to use it?

∙ To identify people who tend to be very influential within their local network
  ∙ They may often not be public figures, but they are often respected locally

∙ To measure how long it will take to spread information from node x to all other nodes
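A minimal sketch of Eq. (2), assuming an unweighted directed graph given as a list of (source, target) arcs. The helper names `distances_to` and `closeness` are hypothetical; distances d(y, x) are obtained with a breadth-first search over reversed arcs.

```python
from collections import deque

def distances_to(arcs, x):
    """BFS over reversed arcs: return {y: d(y, x)} for every y that can reach x."""
    predecessors = {}
    for source, target in arcs:
        predecessors.setdefault(target, []).append(source)
    dist = {x: 0}
    queue = deque([x])
    while queue:
        node = queue.popleft()
        for pred in predecessors.get(node, []):
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)
    return dist

def closeness(arcs, x):
    """Reciprocal of the summed distances to x; 0 if no other node reaches x."""
    total = sum(distances_to(arcs, x).values())
    return 1 / total if total > 0 else 0.0

# Hypothetical directed path a -> b -> c.
path = [("a", "b"), ("b", "c")]
print(closeness(path, "c"), closeness(path, "a"))
```

Here node c is reached from b (distance 1) and a (distance 2), while nothing reaches a, so a falls under the "empty coreachable set" convention.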

Page 15: Network centrality measures and their effectiveness

closeness centrality: example

Closeness measure applied to different networks

Page 16: Network centrality measures and their effectiveness

geometric measures: harmonic centrality

∙ Harmonic centrality of x, with the convention ∞^{−1} = 0, is defined by:

$$C_{\text{harmonic}}(x) = \sum_{y \neq x} \frac{1}{d(y,x)} \tag{3}$$

∙ It is correlated to closeness centrality in simple networks, but it also accounts for nodes y that cannot reach x

When to use it?

∙ In the same situations as closeness, but it can also be applied to graphs that are not connected
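The sketch below illustrates Eq. (3) under the same assumptions as before (unweighted arcs, BFS over reversed arcs). The function name `harmonic` and the two-component example graph are hypothetical. Nodes that cannot reach x simply never enter the sum, which is how the convention ∞^{−1} = 0 shows up in code.

```python
from collections import deque

def harmonic(arcs, x):
    """Sum of 1/d(y, x) over all y != x that can reach x; unreachable y add 0."""
    predecessors = {}
    for source, target in arcs:
        predecessors.setdefault(target, []).append(source)
    dist = {x: 0}
    queue = deque([x])
    while queue:
        node = queue.popleft()
        for pred in predecessors.get(node, []):
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)
    return sum(1 / d for y, d in dist.items() if y != x)

# Hypothetical disconnected graph: a -> b -> c and, separately, d -> e.
arcs = [("a", "b"), ("b", "c"), ("d", "e")]
print(harmonic(arcs, "c"), harmonic(arcs, "e"))
```

The graph is not connected, yet every node still receives a finite, comparable score.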

Page 17: Network centrality measures and their effectiveness

harmonic centrality: examples

Harmonic and indegree measures applied to the same network (Zachary’s karate club)

Page 18: Network centrality measures and their effectiveness

lin’s index

∙ Lin’s index of x

$$C_{\text{lin}}(x) = \frac{|\{y \mid d(y,x) < \infty\}|^{2}}{\sum_{d(y,x)<\infty} d(y,x)} \tag{4}$$

∙ As closeness, but here nodes with a larger coreachable set are more important

A fact

∙ Surprisingly, Lin’s index was ignored in the literature, even though it seems to provide a reasonable solution for detecting centers in networks
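A small sketch of Eq. (4), again assuming unweighted arcs. The name `lin_index` and the example graph are hypothetical. Two choices here are assumptions rather than statements from the slides: the set {y | d(y, x) < ∞} is taken to include x itself (since d(x, x) = 0), and a node that no other node reaches is given score 1 by convention.

```python
from collections import deque

def lin_index(arcs, x):
    """Squared size of the set of nodes reaching x, divided by the summed distances."""
    predecessors = {}
    for source, target in arcs:
        predecessors.setdefault(target, []).append(source)
    dist = {x: 0}
    queue = deque([x])
    while queue:
        node = queue.popleft()
        for pred in predecessors.get(node, []):
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)
    total = sum(dist.values())
    if total == 0:
        return 1.0  # assumed convention for nodes that nobody reaches
    return len(dist) ** 2 / total

# Hypothetical graph: three leaves point at "h"; separately, d -> e.
arcs = [("a", "h"), ("b", "h"), ("c", "h"), ("d", "e")]
print(lin_index(arcs, "h"), lin_index(arcs, "e"))
```

In this example plain closeness would rank e (summed distance 1) above h (summed distance 3), while Lin's index ranks h higher because more nodes can reach it.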

Page 19: Network centrality measures and their effectiveness

path-based measures

The idea

∙ Path-based measures exploit not only the existence of shortest paths but actually take into account all shortest paths (or all paths) coming into a node

Page 20: Network centrality measures and their effectiveness

path-based measures: betweenness centrality

∙ The intuition behind betweenness centrality is to measure the probability that a random shortest path passes through a given node. Betweenness of x is defined as:

$$C_{\text{betweenness}}(x) = \sum_{y,z \neq x,\ \alpha_{yz} \neq 0} \frac{\alpha_{yz}(x)}{\alpha_{yz}} \tag{5}$$

∙ α_{yz} is the number of shortest paths going from y to z
∙ α_{yz}(x) is the number of shortest paths from y to z that pass through x
∙ The higher the fraction of shortest paths which pass through a node, the more important the node is

When to use it?

∙ To identify nodes which have a large influence on the transfer of items through the network
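The following sketch evaluates Eq. (5) directly on a small graph. It assumes an unweighted directed graph stored as a successor dictionary; `bfs_counts` and `betweenness` are hypothetical names. It uses the fact that a shortest path from y to z passes through x exactly when d(y, x) + d(x, z) = d(y, z), in which case α_{yz}(x) = α_{yx} · α_{xz}. This is a plain quadratic-pairs computation, not an optimized algorithm.

```python
from collections import deque

def bfs_counts(succ, source):
    """Return (dist, sigma): distances and numbers of shortest paths from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in succ.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]   # every shortest path to u extends to v
    return dist, sigma

def betweenness(succ, x):
    nodes = set(succ) | {v for targets in succ.values() for v in targets}
    info = {n: bfs_counts(succ, n) for n in nodes}
    dist_x, sigma_x = info[x]
    total = 0.0
    for y in nodes - {x}:
        dist_y, sigma_y = info[y]
        if x not in dist_y:
            continue                   # y cannot reach x at all
        for z in nodes - {x, y}:
            if z in dist_y and z in dist_x and dist_y[x] + dist_x[z] == dist_y[z]:
                total += sigma_y[x] * sigma_x[z] / sigma_y[z]
    return total

# Hypothetical diamond: a -> b -> d and a -> c -> d.
diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
print(betweenness(diamond, "b"))
```

In the diamond there are two shortest paths from a to d and one of them goes through b, so b carries half of that pair.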

Page 21: Network centrality measures and their effectiveness

betweenness centrality: examples

Betweenness applied to different networks

Page 22: Network centrality measures and their effectiveness

betweenness and indegree

Betweenness and indegree measures applied to the same network (Zachary’s karate club)

Page 23: Network centrality measures and their effectiveness

betweenness and closeness

∙ Betweenness and closeness measures applied to the same network

∙ The nodes are sized by degree and colored by betweenness

Page 24: Network centrality measures and their effectiveness

spectral measures

The idea

∙ In spectral measures the importance is related to the iterated computation of the left dominant eigenvector of the adjacency matrix

∙ In spectral centrality the importance of a node is given by the importance of its neighbourhood

∙ The more important the nodes pointing at you are, the more important you are

Page 25: Network centrality measures and their effectiveness

spectral measures

How many of them?

∙ The dominant eigenvector
∙ Seeley’s index
∙ Katz’s index
∙ PageRank
∙ HITS
∙ SALSA

Page 26: Network centrality measures and their effectiveness

spectral measures: some useful notation

Given the adjacency matrix A we can compute:

∙ The ℓ1-normalized matrix Ā: each element of row i is divided by the sum of the elements of that row

∙ The transpose graph G′ of the given graph G (all arcs reversed): its adjacency matrix is the transpose A^T of A

∙ The number of paths of length k from a node i to another node j: in the matrix A^k, each element a_ij is the number of paths of length k from node i to node j
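The three operations above can be sketched in plain Python on nested lists. The function names and the 3-node example matrix are assumptions made for illustration. Note that the "paths" counted by A^k may revisit nodes.

```python
def row_normalize(A):
    """Divide each row by its sum; rows that sum to zero are left as zeros."""
    return [[a / sum(row) if sum(row) else 0.0 for a in row] for row in A]

def transpose(A):
    """Adjacency matrix of the graph with every arc reversed."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matrix_power(A, k):
    """A^k: entry (i, j) counts the paths of length k from i to j."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(k):
        result = matmul(result, A)
    return result

# Hypothetical graph with arcs 0->1, 0->2, 1->2, 2->0.
A = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
print(row_normalize(A)[0])   # row 0 split equally between its two successors
print(transpose(A))
print(matrix_power(A, 2))    # e.g. one path of length 2 from 0 to 2 (0->1->2)
```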

Page 27: Network centrality measures and their effectiveness

spectral measures: the left dominant eigenvector

Dominant eigenvector

∙ Taking the left dominant eigenvector into consideration means considering the incoming edges of a node

∙ To find out a node’s importance, we perform an iterated computation of:

$$x_i^{(t+1)} = \frac{1}{\lambda} \sum_{j=1}^{n} A_{ji}\, x_j^{(t)} \tag{6}$$

where:

∙ x_i^{(0)} = 1 ∀ i at step 0
∙ x^{(t)} is the score after t iterations
∙ λ is the dominant eigenvalue of the adjacency matrix A

∙ After each step, the vector x is normalized and the process is iterated until convergence

∙ Each node starts with the same score. Then, at each iteration, it receives the sum of the scores of the nodes pointing at it
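A minimal power-iteration sketch of this process, under stated assumptions: the graph is small, strongly connected and aperiodic (so the iteration converges), the score vector is renormalized to sum 1 at each step instead of dividing by λ explicitly, and the iteration count is fixed. The function name and example matrix are hypothetical.

```python
def left_dominant_eigenvector(A, iterations=200):
    n = len(A)
    x = [1.0] * n                      # every node starts with the same score
    for _ in range(iterations):
        # Node i receives the scores of the nodes j that point at it (A[j][i] = 1).
        y = [sum(x[j] * A[j][i] for j in range(n)) for i in range(n)]
        norm = sum(y)
        x = [v / norm for v in y]      # normalize so scores sum to 1
    return x

# Hypothetical graph with arcs 0->1, 0->2, 1->2, 2->0.
A = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
x = left_dominant_eigenvector(A)
print(x)
```

Normalizing by the sum plays the role of the 1/λ factor: at convergence, one more multiplication by A rescales every entry by the same constant, which is the dominant eigenvalue.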

Page 28: Network centrality measures and their effectiveness

eigenvector centrality: example

Figure 1 shows degree and eigenvector centrality applied to the same graph

Figure 1: Degree and eigenvector centrality

Page 29: Network centrality measures and their effectiveness

spectral measures: seeley’s index

∙ Why give away all of our importance?

∙ It would make more sense to divide our importance equally among our successors

∙ The process remains the same, but from an algebraic point of view that means normalizing each row of the adjacency matrix:

$$x_i^{(t+1)} = \frac{1}{\lambda} \sum_{j=1}^{n} \bar{A}_{ji}\, x_j^{(t)} \tag{7}$$

where:

∙ x_i^{(0)} = 1 ∀ i at step 0
∙ x^{(t)} is the score after t iterations
∙ λ is the dominant eigenvalue of the normalized matrix Ā
∙ Ā is the row-normalized form of the adjacency matrix A

∙ In a graph that is not strongly connected, isolated nodes end up with a null score as the iteration proceeds
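The same iteration with a row-normalized matrix can be sketched as follows. The function name, the fixed iteration count, and the example graph are assumptions; the graph is chosen to be strongly connected and aperiodic so that the iteration settles.

```python
def seeley_index(A, iterations=200):
    n = len(A)
    # Each node divides its importance equally among its successors.
    A_bar = [[a / sum(row) if sum(row) else 0.0 for a in row] for row in A]
    x = [1.0] * n
    for _ in range(iterations):
        y = [sum(x[j] * A_bar[j][i] for j in range(n)) for i in range(n)]
        total = sum(y)
        x = [v / total for v in y]
    return x

# Hypothetical graph with arcs 0->1, 0->2, 1->2, 2->0.
A = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
print(seeley_index(A))
```

For this graph the scores approach 0.4, 0.2 and 0.4: node 1 only receives half of node 0's score, while node 2 receives the other half plus all of node 1's.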

Page 30: Network centrality measures and their effectiveness

spectral measures: katz’s index

Katz’s index weighs all incoming paths to a node and then computes:

$$x = \mathbf{1} \sum_{i=0}^{\infty} \beta^{i} A^{i} \tag{8}$$

where:

∙ x is the output score vector
∙ 1 is the weight vector (for example, all ones)
∙ β is an attenuation factor (β < 1/λ, where λ is the dominant eigenvalue of A)
∙ the element of A^i in row j and column k is the number of paths of length i from j to k
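Eq. (8) can be approximated by summing the series term by term. In this sketch the function name `katz_index`, the truncation at a fixed number of terms, and the example graph are assumptions. The example is a directed acyclic graph, for which the series is finite, so the truncated sum is exact.

```python
def katz_index(A, beta, terms=100):
    n = len(A)
    term = [1.0] * n                   # the i = 0 term: the weight vector itself
    x = list(term)
    for _ in range(terms):
        # Next term of the series: multiply the previous one by beta * A.
        term = [beta * sum(term[j] * A[j][i] for j in range(n)) for i in range(n)]
        x = [xi + ti for xi, ti in zip(x, term)]
    return x

# Hypothetical directed path 0 -> 1 -> 2 (a DAG).
A = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
print(katz_index(A, beta=0.5))
```

Node 2 is reached by one path of length 1 and one of length 2, so its score is 1 + 0.5 + 0.25; node 0 has no incoming paths and keeps only the base weight.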

Page 31: Network centrality measures and their effectiveness

spectral measures: pagerank

PageRank - a little overview

∙ It is commonly described as the idea behind Google’s search engine
∙ It is the unique vector p satisfying

$$p = (1 - \alpha)\, v\, (I - \alpha \bar{A})^{-1}$$

∙ where:
  ∙ α ∈ [0, 1) is a damping factor
  ∙ v is a preference vector (a distribution)
  ∙ Ā is the ℓ1-normalized adjacency matrix

∙ Since (I − αĀ)^{−1} = Σ_i α^i Ā^i, PageRank and Katz’s index differ only by a constant factor and by the ℓ1 normalization of the adjacency matrix A
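One common way to approximate this vector without inverting a matrix is to iterate p ← (1 − α) v + α p Ā, whose fixed point satisfies the closed form when Ā is the ℓ1-normalized matrix. The sketch below assumes every node has at least one outgoing arc (no dangling nodes), a uniform preference vector by default, and a fixed iteration count; the function name is hypothetical.

```python
def pagerank(A, alpha=0.85, v=None, iterations=200):
    n = len(A)
    v = v or [1.0 / n] * n             # uniform preference vector by default
    # l1-normalize rows; assumes every node has at least one outgoing arc.
    A_bar = [[a / sum(row) for a in row] for row in A]
    p = list(v)
    for _ in range(iterations):
        p = [(1 - alpha) * v[i] + alpha * sum(p[j] * A_bar[j][i] for j in range(n))
             for i in range(n)]
    return p

# Hypothetical graph with arcs 0->1, 0->2, 1->2, 2->0.
A = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
print(pagerank(A))
```

Because each update mixes the preference vector with the propagated scores, the result stays a probability distribution when v is one.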

Page 32: Network centrality measures and their effectiveness

spectral measures: eigenvector and pagerank

Figure 2 shows eigenvector and PageRank centrality applied to the same graph

Figure 2: Eigenvector and PageRank centrality

Page 33: Network centrality measures and their effectiveness

spectral measures: hits

HITS - a little overview by Kleinberg

∙ The key here is mutual reinforcement
∙ A node (such as a page) is authoritative if it is pointed to by many good hubs
∙ Hubs: pages containing good lists of authoritative pages
∙ So a hub is good if it points to many authoritative pages
∙ We iteratively compute:
  ∙ a_i: authoritativeness score (where a_0 = 1)
  ∙ h_i: hubbiness score

as follows:

$$h_{i+1} = a_i A^{T}, \qquad a_{i+1} = h_{i+1} A$$

∙ This process converges to the left dominant eigenvector of the matrix A^T A, giving the final authoritativeness score, called “HITS”
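A short sketch of the mutual-reinforcement loop, treating a and h as row vectors. The function name `hits`, the normalization of both vectors to sum 1 at each step, the fixed iteration count, and the example graph are assumptions made for illustration.

```python
def hits(A, iterations=50):
    n = len(A)
    a = [1.0] * n
    h = [1.0] * n
    for _ in range(iterations):
        # h = a A^T: a hub collects the authority of the nodes it points to.
        h = [sum(a[k] * A[j][k] for k in range(n)) for j in range(n)]
        # a = h A: an authority collects the hubbiness of the nodes pointing at it.
        a = [sum(h[k] * A[k][j] for k in range(n)) for j in range(n)]
        total_h, total_a = sum(h), sum(a)
        h = [v / total_h for v in h]
        a = [v / total_a for v in a]
    return a, h

# Hypothetical graph: 0 -> 2, 0 -> 3, 1 -> 2.
A = [[0, 0, 1, 1],
     [0, 0, 1, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
authority, hub = hits(A)
print(authority, hub)
```

In this example node 2 ends up as the best authority (two hubs point at it) and node 0 as the best hub (it points at both authorities).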

Page 34: Network centrality measures and their effectiveness

spectral measures: salsa

SALSA was devised by Lempel and Moran

∙ Based on the same mutual reinforcement between authoritativeness and hubbiness, but ℓ1-normalizing the matrices A and A^T
∙ Starting value: a_0 = 1
∙ h_{i+1} = a_i \overline{A^T}
∙ a_{i+1} = h_{i+1} Ā

∙ Contrary to HITS, there is no need for a large number of iterations with SALSA
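A tentative sketch of the SALSA iteration, assuming the update rule reads h_{i+1} = a_i · (row-normalized A^T) followed by a_{i+1} = h_{i+1} · (row-normalized A). The function names, the final normalization to sum 1, and the example graph are assumptions for illustration only.

```python
def row_normalize(M):
    """l1-normalize each row; rows that sum to zero stay zero."""
    out = []
    for row in M:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out

def salsa_authority(A, iterations=50):
    n = len(A)
    A_bar = row_normalize(A)
    At_bar = row_normalize([list(col) for col in zip(*A)])
    a = [1.0] * n
    for _ in range(iterations):
        h = [sum(a[j] * At_bar[j][k] for j in range(n)) for k in range(n)]
        a = [sum(h[k] * A_bar[k][j] for k in range(n)) for j in range(n)]
    total = sum(a)
    return [v / total for v in a]

# Hypothetical graph: 0 -> 2, 0 -> 3, 1 -> 2.
A = [[0, 0, 1, 1],
     [0, 0, 1, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
print(salsa_authority(A))
```

On this small graph the authority scores settle at 2/3 and 1/3 for nodes 2 and 3, the same proportions as their indegrees (2 and 1), which hints at why many iterations are not needed.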

Page 35: Network centrality measures and their effectiveness

spectral measures: some applications

∙ Left dominant eigenvector: the idea on which network structure analysis is based
∙ Seeley’s index: feedback networks
∙ Katz’s index: citation networks
  ∙ especially good with directed acyclic graphs (where the basic dominant eigenvector does not perform well)
∙ HITS: web page citations
∙ PageRank: Google’s search engine
∙ SALSA: link structure analysis

Page 36: Network centrality measures and their effectiveness

effectiveness of centrality measures

Page 37: Network centrality measures and their effectiveness

axioms for centrality

∙ In 2013, Boldi and Vigna tried to provide a method to evaluate and compare different centrality measures

∙ They defined three axioms that an index should satisfy to behave predictably:
  ∙ Size axiom
  ∙ Density axiom
  ∙ Score-monotonicity axiom

Page 38: Network centrality measures and their effectiveness

axioms for centrality: size axiom

Given a graph S_{k,p} (figure 3), made of a k-clique and a directed p-cycle, the size axiom is satisfied if there are threshold values P_k and K_p such that:

∙ if p > P_k (the cycle is very large), the nodes of the cycle are more important
∙ if k > K_p (the clique is very large), the nodes of the clique are more important
∙ intuitively, for p = k, the nodes of the clique are more important

Figure 3: Graph S_{k,p}

Page 39: Network centrality measures and their effectiveness

axioms for centrality: density axiom

∙ Consider a graph D_{k,p} (figure 4), made of a k-clique and a directed p-cycle connected by a bidirectional bridge x ↔ y, where x is a node of the clique and y a node of the cycle

∙ A centrality measure satisfies the density axiom if, for k = p, the centrality of x is strictly larger than the centrality of y

Figure 4: Graph D_{k,p}

Page 40: Network centrality measures and their effectiveness

axioms for centrality: the score-monotonicity axiom

∙ A centrality measure satisfies the score-monotonicity axiom if, for every graph G and every pair of nodes x, y such that x ↛ y, adding the arc x → y to G increases the centrality of y

Page 41: Network centrality measures and their effectiveness

axioms for centrality: comparisons

Figure 5: For each centrality measure and each axiom, whether the axiom is satisfied

The harmonic centrality satisfies all axioms.

Page 42: Network centrality measures and their effectiveness

information retrieval: sanity check

∙ Boldi and Vigna applied centrality measures to standard datasets in order to find out how the different indices behave

∙ There are standard datasets with associated queries and ground truth about which documents are relevant for every query

∙ Those collections are typically used to compare the merits and demerits of retrieval methods

Page 43: Network centrality measures and their effectiveness

information retrieval: datasets

Dataset GOV2, tested in two different ways:

∙ with all links: complete dataset
∙ with inter-host links only: links between pages of the same host are excluded from the graph

Measures of effectiveness chosen:

∙ P@10: precision at 10, the fraction of relevant documents among the first ten retrieved

∙ NDCG@10: normalized discounted cumulative gain at 10, which measures the usefulness, or gain, of a document based on its position in the result list
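The two effectiveness measures can be sketched as follows. Several details are assumptions rather than facts from the slides: relevance judgments are given as a list of numbers in ranked order (0 means not relevant), the gain of a document is its relevance divided by log2(rank + 1), and the ideal ranking used for normalization is obtained by re-sorting the same list. Other formulations of DCG exist, and the function names are hypothetical.

```python
import math

def precision_at_k(relevances, k=10):
    """Fraction of the first k results that are relevant (relevance > 0)."""
    return sum(1 for r in relevances[:k] if r > 0) / k

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain: later positions contribute less."""
    return sum(r / math.log2(rank + 1)
               for rank, r in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k=10):
    """DCG divided by the DCG of the best possible ordering of the same list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical judgments for a ranked list of ten results.
ranked = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(precision_at_k(ranked), ndcg_at_k(ranked))
```

Two of the ten results are relevant, so P@10 is 0.2; NDCG@10 is below 1 because the second relevant document sits at rank 3 instead of rank 2.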

Page 44: Network centrality measures and their effectiveness

information retrieval: results

For each centrality measure, the normalized discounted cumulative gain and the precision at 10 on the GOV2 dataset, using all links (on the left) and using only inter-host links (on the right).

Figure 6: All links

Figure 7: Inter-host links

Page 45: Network centrality measures and their effectiveness

conclusions

Page 46: Network centrality measures and their effectiveness

conclusions

∙ A very simple measure such as harmonic centrality turned out to be a good notion of centrality:
  ∙ it satisfies all the proposed centrality axioms
  ∙ it works well for information retrieval

Choose the right measure

∙ No centrality measure is better than the others in every situation
∙ Some are better than others for reaching a particular goal, but this depends on the specific application domain
∙ So, the best approach is to understand which measure fits the problem better

Page 47: Network centrality measures and their effectiveness

references and useful resources

Paolo Boldi and Sebastiano Vigna. Axioms for centrality.

Nicola Perra and Santo Fortunato. Spectral centrality measures in complex networks.

M. E. J. Newman. Networks: An Introduction.

Page 48: Network centrality measures and their effectiveness

Thank you for your attention!
