graph mining - lesson 1 introduction to graphs and networks - … · 2020. 5. 7. · a brief...

96
Graph mining - lesson 1 Introduction to graphs and networks Nathalie Vialaneix [email protected] http://www.nathalievialaneix.eu M2 Statistics & Econometrics January 8th, 2020 Nathalie Vialaneix | Graph mining 1/30

Upload: others

Post on 19-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Graph mining - lesson 1Introduction to graphs and networks

Nathalie Vialaneix

[email protected]://www.nathalievialaneix.eu

M2 Statistics & EconometricsJanuary 8th, 2020

Nathalie Vialaneix | Graph mining 1/30

Page 2: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

A brief overview for this class...

Who am I? Statistician working in biostatistics at INRAE ToulouseMy research interests are: data mining, network inference andmining, machine learning

Purpose of this talk: presenting a few statistical tools for graphmining (graph structure, important vertices) and clustering

Nathalie Vialaneix | Graph mining 2/30

Page 3: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Outline

A brief introduction to networks/graphs

Visualization

Global characteristics

Numerical characteristics calculation

Nathalie Vialaneix | Graph mining 3/30

Page 4: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

What is a network/graph?Mathematical object used to model relational data betweenentities.

A relation between two entities is modeled by an edge

Nathalie Vialaneix | Graph mining 4/30

Page 5: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

What is a network/graph?Mathematical object used to model relational data betweenentities.

The entities are called the nodes or the vertices

Arelation between two entities is modeled by an edge

Nathalie Vialaneix | Graph mining 4/30

Page 6: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

What is a network/graph?Mathematical object used to model relational data betweenentities.

A relation between two entities is modeled by an edge

Nathalie Vialaneix | Graph mining 4/30

Page 7: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Where does graph theory come from?Seven Bridges of Königsberg: notable problem in mathematics.Königsberg set on both sides of the Pregel River and included twolarge islands.Question: Is there a walk through the city that crosses each bridgeonce and only once?

Image: Public Domain. CC BY-SA 3.0.

Leonhard Euler proved that the problem has no solution using amathematical proof which was the starting point of graph theory.

Nathalie Vialaneix | Graph mining 5/30

Page 8: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Where does graph theory come from?Seven Bridges of Königsberg: notable problem in mathematics.Königsberg set on both sides of the Pregel River and included twolarge islands.Question: Is there a walk through the city that crosses each bridgeonce and only once?

Image: Public Domain. CC BY-SA 3.0.

Leonhard Euler proved that the problem has no solution using amathematical proof which was the starting point of graph theory.

Nathalie Vialaneix | Graph mining 5/30

Page 9: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Examples of networks

Social networks:

Credits: Frauhoelle (CC BY-SA 2.0) and Caseorganic (CC BY-NC 2.0) on flickr

Nathalie Vialaneix | Graph mining 6/30

Page 10: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Examples of networks

Blog co-citations and internet routes:

Credits: Porternovelli on flikr (CC BY-SA 2.0) and Matt Britt on wikimedia commons (CC BY 2.5)

Nathalie Vialaneix | Graph mining 6/30

Page 11: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Examples of networks

Consumers/products graphs or co-purchase networks:

Credits: Loop© and http://www.annehelmond.nl

Nathalie Vialaneix | Graph mining 6/30

Page 12: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Examples of networks(biological) Neural networks:

Credits: Soon-Beom et al. (CC BY-SA 3.0) and Andreashorn (CC BY-SA 4.0) on wikimedia commonsNathalie Vialaneix | Graph mining 6/30

Page 13: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Examples of networksIngredient networks... and many others!

Credits: http://www.ladamic.com

Nathalie Vialaneix | Graph mining 6/30

Page 14: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

More complex relational models

Vertices...

can be labelled with a factor or a numeric variables or severalvariables (caracteristics attached to the entities in relation)

Edges...

I can be orientedI can be weightedI can be described by numerical attributes or factors

(caracteristics attached to the relation)

Nathalie Vialaneix | Graph mining 7/30

Page 15: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

More complex relational models

Vertices...

can be labelled with a factor or a numeric variables or severalvariables (caracteristics attached to the entities in relation)

Edges...

I can be orientedI can be weightedI can be described by numerical attributes or factors

(caracteristics attached to the relation)

Nathalie Vialaneix | Graph mining 7/30

Page 16: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Many applications...

I Viral marketing: find a way to efficiently spread the informationabout a new product using social network informations

I Recommandation systems: recommand a product tosomeone based on his/her previous purchase andco-purchase information

I Biological network: acquire knowledge about biologicalnetworks (genes, metabolomic pathway...) in order tounderstand diseases associated with disfunctionning

I ...

Nathalie Vialaneix | Graph mining 8/30

Page 17: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Standard issues associated with networksInference

Giving data, how to build a graph whose edges represent the directlinks between variables?Example: co-expression networks built from microarray data (vertices = genes;edges = significant “direct links” between expressions of two genes)

Graph mining (examples)

1. Network visualization: vertices are not a priori associated to agiven position. How to represent the network in a meaningfulway?

2. Network clustering: identify “communities” (groups of vertices that

are densely connected and share a few links with the other groups)

Nathalie Vialaneix | Graph mining 9/30

Page 18: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Standard issues associated with networksInference

Giving data, how to build a graph whose edges represent the directlinks between variables?

Graph mining (examples)

1. Network visualization: vertices are not a priori associated to agiven position. How to represent the network in a meaningfulway?

Random positions or positions aiming at representingconnected vertices closer.

2. Network clustering: identify “communities” (groups of vertices that

are densely connected and share a few links with the other groups)

Nathalie Vialaneix | Graph mining 9/30

Page 19: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Standard issues associated with networksInference

Giving data, how to build a graph whose edges represent the directlinks between variables?

Graph mining (examples)

1. Network visualization: vertices are not a priori associated to agiven position. How to represent the network in a meaningfulway?

2. Network clustering: identify “communities” (groups of vertices that

are densely connected and share a few links with the other groups)

Nathalie Vialaneix | Graph mining 9/30

Page 20: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Notations for this class

Notations

In the following, a graph G = (V ,E,W) with:I V : set of vertices {x1, . . . , xn};I E: set of (undirected) edges. m = |E |;I W : weights on edges s.t. Wij ≥ 0, Wij = Wji and Wii = 0.

If needed, attributes for the vertices will be denoted by fj(xi) (jthattribute for vertex i) and attributes for the edges (other than theweights) by gj(xi , xi′) (jth attribute for the edge (xi , xi′)).

Nathalie Vialaneix | Graph mining 10/30

Page 21: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Notations for this class

Notations

In the following, a graph G = (V ,E,W) with:I V : set of vertices {x1, . . . , xn};I E: set of (undirected) edges. m = |E |;I W : weights on edges s.t. Wij ≥ 0, Wij = Wji and Wii = 0.

If needed, attributes for the vertices will be denoted by fj(xi) (jthattribute for vertex i) and attributes for the edges (other than theweights) by gj(xi , xi′) (jth attribute for the edge (xi , xi′)).

Nathalie Vialaneix | Graph mining 10/30

Page 22: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Online graph datasets and ressources

I Mark Newman’s collection:http://www-personal.umich.edu/~mejn/netdata

I Stanford Large Network Dataset Collection (SNAP):http://snap.stanford.edu/data

I KONECT collection (Koblenz university):http://konect.uni-koblenz.de/networks

I Colorado Index of Complex Networks (ICON):https://icon.colorado.edu

Online course: http://barabasi.com/networksciencebook(Alberto Barabasi)

Nathalie Vialaneix | Graph mining 11/30

Page 23: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Mining graphs/networks

I Visualizing and manipulating graphs in an interactive way:Gephi https://gephi.org, Tulip http://tulip.labri.fror Cytoscape http://cytoscape.org;

I Packages/librairies in data mining languages:I for Python: igraph, NetworkX and graph-toolI for R: igraph, statnet, bipartite and tnet. See also the CRAN

task view:https://cran.r-project.org/web/views/gR.html(graphical models)

Nathalie Vialaneix | Graph mining 12/30

Page 24: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Special graphsFull graphs

A full graph with verticesV = {x1, . . . , xn} is thegraph with edge listE = {(xi , xj) : xi , xj ∈ V}

1

2

34

5

6

Bipartite graphsA graph with verticesV = {x1, . . . , xn} partitionned intotwo groups {xi : f(xi) = 1} and{xi : f(xi) = −1} and such thatedges are a subset of {(xi , xj) :f(xi) = 1 and f(xj) = −1} (e.g.,purchase network)

−1

1

−1

1

−1

1

−1

1

−1

1

Nathalie Vialaneix | Graph mining 13/30

Page 25: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Special graphsFull graphs

A full graph with verticesV = {x1, . . . , xn} is thegraph with edge listE = {(xi , xj) : xi , xj ∈ V}

1

2

34

5

6

Bipartite graphsA graph with verticesV = {x1, . . . , xn} partitionned intotwo groups {xi : f(xi) = 1} and{xi : f(xi) = −1} and such thatedges are a subset of {(xi , xj) :f(xi) = 1 and f(xj) = −1} (e.g.,purchase network)

−1

1

−1

1

−1

1

−1

1

−1

1

Nathalie Vialaneix | Graph mining 13/30

Page 26: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Most standard ways to record a graph

I adjacency matrix: matrix W if the network is weighted or

Aij =

{1 if (xi , xj) ∈ E0 otherwise

if it is unweighted.

requires to store n2 values

I edge list: matrix B of dimension m × 2 (unweighted network)or m×3 (weighted network), Bk . = (xi , xj ,Wij) for a (xi , xj) ∈ E.requires to store 3m values

Since usually m � n2, the second solution is often prefered.

Other standard formats (readable by interactive software andallowing metadata) such as graphml (a graph version of XML)http://graphml.graphdrawing.org

Nathalie Vialaneix | Graph mining 14/30

Page 27: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Most standard ways to record a graph

I adjacency matrix: matrix W if the network is weighted or

Aij =

{1 if (xi , xj) ∈ E0 otherwise

if it is unweighted.

requires to store n2 values

I edge list: matrix B of dimension m × 2 (unweighted network)or m×3 (weighted network), Bk . = (xi , xj ,Wij) for a (xi , xj) ∈ E.requires to store 3m values

Since usually m � n2, the second solution is often prefered.

Other standard formats (readable by interactive software andallowing metadata) such as graphml (a graph version of XML)http://graphml.graphdrawing.org

Nathalie Vialaneix | Graph mining 14/30

Page 28: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Most standard ways to record a graph

I adjacency matrix: matrix W if the network is weighted or

Aij =

{1 if (xi , xj) ∈ E0 otherwise

if it is unweighted.

requires to store n2 values

I edge list: matrix B of dimension m × 2 (unweighted network)or m×3 (weighted network), Bk . = (xi , xj ,Wij) for a (xi , xj) ∈ E.requires to store 3m values

Since usually m � n2, the second solution is often prefered.

Other standard formats (readable by interactive software andallowing metadata) such as graphml (a graph version of XML)http://graphml.graphdrawing.org

Nathalie Vialaneix | Graph mining 14/30

Page 29: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Most standard ways to record a graph

I adjacency matrix: matrix W if the network is weighted or

Aij =

{1 if (xi , xj) ∈ E0 otherwise

if it is unweighted.

requires to store n2 values

I edge list: matrix B of dimension m × 2 (unweighted network)or m×3 (weighted network), Bk . = (xi , xj ,Wij) for a (xi , xj) ∈ E.requires to store 3m values

Since usually m � n2, the second solution is often prefered.

Other standard formats (readable by interactive software andallowing metadata) such as graphml (a graph version of XML)http://graphml.graphdrawing.org

Nathalie Vialaneix | Graph mining 14/30

Page 30: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Running example 1 “GOT”“Game of Thrones” coappearances network: weighted andundirected network with 107 vertices corresponding to uniquecharacters and 353 edges weighted by the number of times thetwo characters’ names appeared within 15 words of each other inthe Game of Thrones series by George R.R. Martin.

Reference: [Beveridge and Shan, 2016]

http://www.jstor.org/stable/10.4169Dataset available at: http://www.macalester.edu/~abeverid/data/stormofswords.csv(edgelist format)

Aemon

GrennSamwell

AerysJaime

Robert

Tyrion

Tywin

AlliserMance

Amory

Oberyn

AryaAnguy

BericBran

Brynden Cersei

Gendry

GregorJoffrey

Jon

Rickon

Roose

Sandor

ThorosBalon

Loras

BelwasBarristan

Illyrio

HodorJojen

LuwinMeera

Nan Theon

Brienne

BronnPodrickLothar

WalderCatelynEdmure

Hoster

Jeyne

LysaPetyr

Robb

Roslin

Sansa

Stannis

Elia

IlynMerynPycelle

Shae

Varys

Craster

Karl

DaarioDrogo

IrriDaenerys

AegonJorah

Kraznys

MissandeiRakharoRhaegar

Viserys

Worm

Davos

Cressen

Salladhor

Eddard

Eddison

Gilly

Qyburn

Renly

Tommen

Janos

Bowen

Kevan

Margaery

Myrcella

Dalla

Melisandre

Orell

QhorinRattleshirtStyr

ValYgritte

Jon Arryn

Lancel

OlennaMarillion

Robert ArrynEllaria

MaceRickard

Ramsay

Chataya

Shireen

Doran

Walton

Nathalie Vialaneix | Graph mining 15/30

Page 31: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Running example 1 “GOT”“Game of Thrones” coappearances network: weighted andundirected network with 107 vertices corresponding to uniquecharacters and 353 edges weighted by the number of times thetwo characters’ names appeared within 15 words of each other inthe Game of Thrones series by George R.R. Martin.Reference: [Beveridge and Shan, 2016]

http://www.jstor.org/stable/10.4169

Dataset available at: http://www.macalester.edu/~abeverid/data/stormofswords.csv(edgelist format)

Aemon

GrennSamwell

AerysJaime

Robert

Tyrion

Tywin

AlliserMance

Amory

Oberyn

AryaAnguy

BericBran

Brynden Cersei

Gendry

GregorJoffrey

Jon

Rickon

Roose

Sandor

ThorosBalon

Loras

BelwasBarristan

Illyrio

HodorJojen

LuwinMeera

Nan Theon

Brienne

BronnPodrickLothar

WalderCatelynEdmure

Hoster

Jeyne

LysaPetyr

Robb

Roslin

Sansa

Stannis

Elia

IlynMerynPycelle

Shae

Varys

Craster

Karl

DaarioDrogo

IrriDaenerys

AegonJorah

Kraznys

MissandeiRakharoRhaegar

Viserys

Worm

Davos

Cressen

Salladhor

Eddard

Eddison

Gilly

Qyburn

Renly

Tommen

Janos

Bowen

Kevan

Margaery

Myrcella

Dalla

Melisandre

Orell

QhorinRattleshirtStyr

ValYgritte

Jon Arryn

Lancel

OlennaMarillion

Robert ArrynEllaria

MaceRickard

Ramsay

Chataya

Shireen

Doran

Walton

Nathalie Vialaneix | Graph mining 15/30

Page 32: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Running example 1 “GOT”“Game of Thrones” coappearances network: weighted andundirected network with 107 vertices corresponding to uniquecharacters and 353 edges weighted by the number of times thetwo characters’ names appeared within 15 words of each other inthe Game of Thrones series by George R.R. Martin.Reference: [Beveridge and Shan, 2016]

http://www.jstor.org/stable/10.4169Dataset available at: http://www.macalester.edu/~abeverid/data/stormofswords.csv(edgelist format)

Aemon

GrennSamwell

AerysJaime

Robert

Tyrion

Tywin

AlliserMance

Amory

Oberyn

AryaAnguy

BericBran

Brynden Cersei

Gendry

GregorJoffrey

Jon

Rickon

Roose

Sandor

ThorosBalon

Loras

BelwasBarristan

Illyrio

HodorJojen

LuwinMeera

Nan Theon

Brienne

BronnPodrickLothar

WalderCatelynEdmure

Hoster

Jeyne

LysaPetyr

Robb

Roslin

Sansa

Stannis

Elia

IlynMerynPycelle

Shae

Varys

Craster

Karl

DaarioDrogo

IrriDaenerys

AegonJorah

Kraznys

MissandeiRakharoRhaegar

Viserys

Worm

Davos

Cressen

Salladhor

Eddard

Eddison

Gilly

Qyburn

Renly

Tommen

Janos

Bowen

Kevan

Margaery

Myrcella

Dalla

Melisandre

Orell

QhorinRattleshirtStyr

ValYgritte

Jon Arryn

Lancel

OlennaMarillion

Robert ArrynEllaria

MaceRickard

Ramsay

Chataya

Shireen

Doran

Walton

Nathalie Vialaneix | Graph mining 15/30

Page 33: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Running example 1 “GOT”“Game of Thrones” coappearances network

Aemon

GrennSamwell

AerysJaime

Robert

Tyrion

Tywin

AlliserMance

Amory

Oberyn

AryaAnguy

BericBran

Brynden Cersei

Gendry

GregorJoffrey

Jon

Rickon

Roose

Sandor

ThorosBalon

Loras

BelwasBarristan

Illyrio

HodorJojen

LuwinMeera

Nan Theon

Brienne

BronnPodrickLothar

WalderCatelynEdmure

Hoster

Jeyne

LysaPetyr

Robb

Roslin

Sansa

Stannis

Elia

IlynMerynPycelle

Shae

Varys

Craster

Karl

DaarioDrogo

IrriDaenerys

AegonJorah

Kraznys

MissandeiRakharoRhaegar

Viserys

Worm

Davos

Cressen

Salladhor

Eddard

Eddison

Gilly

Qyburn

Renly

Tommen

Janos

Bowen

Kevan

Margaery

Myrcella

Dalla

Melisandre

Orell

QhorinRattleshirtStyr

ValYgritte

Jon Arryn

Lancel

OlennaMarillion

Robert ArrynEllaria

MaceRickard

Ramsay

Chataya

Shireen

Doran

Walton

Nathalie Vialaneix | Graph mining 15/30

Page 34: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Running example 2 “NVV”my facebook network (extracted from facebook in 2015) with 152vertices (my friends on facebook) and 551 edges (mutualfriendship between my friends)

Dataset available at: http://www.nathalievialaneix.eu/doc/txt/fbnet-el-2015.txt(edge list) and http://www.nathalievialaneix.eu/doc/txt/fbnet-name-2015.txt (metadata -initials- for the vertices)

J.M

C.L

C.RJ.R

M.VK.Af.p

S.L

K.E

S.H

C.T

C.EC.V

F.A

N.V

S.TN.P

C.C

C.H

J.V

C.T

N.T

F.R

D.C

A.D.D

O.H

F.RB.M C.A

F.R

N.S

B.L.G

M.D

O.C

C.D

G.P

M.M

E.P

P.C

S.G

I.GA.G

M.P

S.CG.B

L.T

P.N.V.D

U.B

T.K

G.B

J.D

C.TC.P

M.C

L.F

N.H.G

M.C

J.A.SS.S.T

M.P.L

C.D

M.C

V.T

N.P

Y.C

M.T

J.D.V

E.L

A.R

E.C

V.V

A.R

D.L

V.G

S.D

P.L

N.L.CK.L

B.PE.V

A.B

D.L.B

N.P

J.T

R.BA.LL.A

P.P C.C

E.P

V.D

M.C

N.R

S.B

A.D

C.T

C.V.T

K.F

R.F

E.AP.B

F.C

M.C

E.B.C

N.ET.B

S.E

A.J

E.R

A.I

J.L

L.R.PR.R

M.M

S.B

C.HA.I

C.D

B.SS.R.C

T.P

K.M

G.E

A.C

F.D

R.A.R.F

I.L.TR.C

H.S

G.H

L.F.C

M.B

Z.T

C.D

E.B

N.A.A

M.V

M.F

M.C

A.S

S.F

I.T

S.I

C.B

V.R

F.P

G.M

T.C

G.S

C.N

L.MP.B

Nathalie Vialaneix | Graph mining 16/30

Page 35: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Running example 2 “NVV”my facebook network (extracted from facebook in 2015) with 152vertices (my friends on facebook) and 551 edges (mutualfriendship between my friends)Dataset available at: http://www.nathalievialaneix.eu/doc/txt/fbnet-el-2015.txt(edge list) and http://www.nathalievialaneix.eu/doc/txt/fbnet-name-2015.txt (metadata -initials- for the vertices)

J.M

C.L

C.RJ.R

M.VK.Af.p

S.L

K.E

S.H

C.T

C.EC.V

F.A

N.V

S.TN.P

C.C

C.H

J.V

C.T

N.T

F.R

D.C

A.D.D

O.H

F.RB.M C.A

F.R

N.S

B.L.G

M.D

O.C

C.D

G.P

M.M

E.P

P.C

S.G

I.GA.G

M.P

S.CG.B

L.T

P.N.V.D

U.B

T.K

G.B

J.D

C.TC.P

M.C

L.F

N.H.G

M.C

J.A.SS.S.T

M.P.L

C.D

M.C

V.T

N.P

Y.C

M.T

J.D.V

E.L

A.R

E.C

V.V

A.R

D.L

V.G

S.D

P.L

N.L.CK.L

B.PE.V

A.B

D.L.B

N.P

J.T

R.BA.LL.A

P.P C.C

E.P

V.D

M.C

N.R

S.B

A.D

C.T

C.V.T

K.F

R.F

E.AP.B

F.C

M.C

E.B.C

N.ET.B

S.E

A.J

E.R

A.I

J.L

L.R.PR.R

M.M

S.B

C.HA.I

C.D

B.SS.R.C

T.P

K.M

G.E

A.C

F.D

R.A.R.F

I.L.TR.C

H.S

G.H

L.F.C

M.B

Z.T

C.D

E.B

N.A.A

M.V

M.F

M.C

A.S

S.F

I.T

S.I

C.B

V.R

F.P

G.M

T.C

G.S

C.N

L.MP.B

Nathalie Vialaneix | Graph mining 16/30

Page 36: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Running example 2 “NVV”my facebook network (extracted from facebook in 2015)

J.M

C.L

C.RJ.R

M.VK.Af.p

S.L

K.E

S.H

C.T

C.EC.V

F.A

N.V

S.TN.P

C.C

C.H

J.V

C.T

N.T

F.R

D.C

A.D.D

O.H

F.RB.M C.A

F.R

N.S

B.L.G

M.D

O.C

C.D

G.P

M.M

E.P

P.C

S.G

I.GA.G

M.P

S.CG.B

L.T

P.N.V.D

U.B

T.K

G.B

J.D

C.TC.P

M.C

L.F

N.H.G

M.C

J.A.SS.S.T

M.P.L

C.D

M.C

V.T

N.P

Y.C

M.T

J.D.V

E.L

A.R

E.C

V.V

A.R

D.L

V.G

S.D

P.L

N.L.CK.L

B.PE.V

A.B

D.L.B

N.P

J.T

R.BA.LL.A

P.P C.C

E.P

V.D

M.C

N.R

S.B

A.D

C.T

C.V.T

K.F

R.F

E.AP.B

F.C

M.C

E.B.C

N.ET.B

S.E

A.J

E.R

A.I

J.L

L.R.PR.R

M.M

S.B

C.HA.I

C.D

B.SS.R.C

T.P

K.M

G.E

A.C

F.D

R.A.R.F

I.L.TR.C

H.S

G.H

L.F.C

M.B

Z.T

C.D

E.B

N.A.A

M.V

M.F

M.C

A.S

S.F

I.T

S.I

C.B

V.R

F.P

G.M

T.C

G.S

C.N

L.MP.B

Nathalie Vialaneix | Graph mining 16/30

Page 37: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Running example 3 “FB”Amherst College https://www.amherst.edu facebook network :Snapshots of within-college social networks of the first 100colleges and universities admitted to thefacebook.com, inSeptember 2005. Vertices are annotated with metadata giving thetype of account (student, faculty, alumni, etc.), dorm, major,gender, and graduation year (2,235 vertices and 90,954 edges).

Reference: [Traud et al., 2012] http://arxiv.org/abs/1102.2166Dataset available at:https://escience.rpi.edu/data/DA/fb100 (Matlab© format;adjacency matrix + data frame with information on vertices: 7columns)

Nathalie Vialaneix | Graph mining 17/30

Page 38: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Running example 3 “FB”Amherst College https://www.amherst.edu facebook network :Snapshots of within-college social networks of the first 100colleges and universities admitted to thefacebook.com, inSeptember 2005. Vertices are annotated with metadata giving thetype of account (student, faculty, alumni, etc.), dorm, major,gender, and graduation year (2,235 vertices and 90,954 edges).Reference: [Traud et al., 2012] http://arxiv.org/abs/1102.2166

Dataset available at:https://escience.rpi.edu/data/DA/fb100 (Matlab© format;adjacency matrix + data frame with information on vertices: 7columns)

Nathalie Vialaneix | Graph mining 17/30

Page 39: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Running example 3 “FB”Amherst College https://www.amherst.edu facebook network :Snapshots of within-college social networks of the first 100colleges and universities admitted to thefacebook.com, inSeptember 2005. Vertices are annotated with metadata giving thetype of account (student, faculty, alumni, etc.), dorm, major,gender, and graduation year (2,235 vertices and 90,954 edges).Reference: [Traud et al., 2012] http://arxiv.org/abs/1102.2166Dataset available at:https://escience.rpi.edu/data/DA/fb100 (Matlab© format;adjacency matrix + data frame with information on vertices: 7columns)

Nathalie Vialaneix | Graph mining 17/30

Page 40: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Running example 3 “FB”Amherst College https://www.amherst.edu facebook network

Nathalie Vialaneix | Graph mining 17/30

Page 41: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Connected componentThe graph is said to be connected if any vertex can be reachedfrom any other vertex by a path along the edges.The connected components of a graph are all its connectedsubgraphs with maximum sizes.

Examples: GOT and FB are connected graphs. NVV is notconnected and contains 21 connected components, among whichthe largest has 122 vertices.

Nathalie Vialaneix | Graph mining 18/30

Page 42: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Connected componentThe graph is said to be connected if any vertex can be reachedfrom any other vertex by a path along the edges.The connected components of a graph are all its connectedsubgraphs with maximum sizes.Examples: GOT and FB are connected graphs. NVV is notconnected and contains 21 connected components, among whichthe largest has 122 vertices.

Nathalie Vialaneix | Graph mining 18/30

Page 43: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Outline

A brief introduction to networks/graphs

Visualization

Global characteristics

Numerical characteristics calculation

Nathalie Vialaneix | Graph mining 19/30

Page 44: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Visualization tools help understand the graphmacro-structure

Purpose: How to display the vertices in a meaningful and aestheticway?

Standard approach: force directed placement algorithms (FDP)(e.g., [Fruchterman and Reingold, 1991])

I attractive forces: similar to springs along the edgesI repulsive forces: similar to electric forces between all pairs of

vertices

iterative algorithm until stabilization of the vertex positions.

Nathalie Vialaneix | Graph mining 20/30

Page 45: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Visualization tools help understand the graphmacro-structure

Purpose: How to display the vertices in a meaningful and aestheticway?Standard approach: force directed placement algorithms (FDP)(e.g., [Fruchterman and Reingold, 1991])

I attractive forces: similar to springs along the edgesI repulsive forces: similar to electric forces between all pairs of

vertices

iterative algorithm until stabilization of the vertex positions.

Nathalie Vialaneix | Graph mining 20/30

Page 46: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Visualization tools help understand the graphmacro-structure

Purpose: How to display the vertices in a meaningful and aestheticway?Standard approach: force directed placement algorithms (FDP)(e.g., [Fruchterman and Reingold, 1991])

I attractive forces: similar to springs along the edges

I repulsive forces: similar to electric forces between all pairs ofvertices

iterative algorithm until stabilization of the vertex positions.

Nathalie Vialaneix | Graph mining 20/30

Page 47: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Visualization tools help understand the graphmacro-structure

Purpose: How to display the vertices in a meaningful and aestheticway?Standard approach: force directed placement algorithms (FDP)(e.g., [Fruchterman and Reingold, 1991])

I attractive forces: similar to springs along the edgesI repulsive forces: similar to electric forces between all pairs of

vertices

iterative algorithm until stabilization of the vertex positions.

Nathalie Vialaneix | Graph mining 20/30

Page 48: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Visualization tools help understand the graphmacro-structure

Purpose: How to display the vertices in a meaningful and aestheticway?Standard approach: force directed placement algorithms (FDP)(e.g., [Fruchterman and Reingold, 1991])

I attractive forces: similar to springs along the edgesI repulsive forces: similar to electric forces between all pairs of

vertices

iterative algorithm until stabilization of the vertex positions.

Nathalie Vialaneix | Graph mining 20/30

Page 49: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Visualization softwareI package igraph1 [Csardi and Nepusz, 2006] (static

representation with useful tools for graph mining)

I free software Gephi2 (interactive software, supportszooming and panning)

1http://igraph.sourceforge.net/

2http://gephi.org

Nathalie Vialaneix | Graph mining 21/30

Page 50: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Visualization softwareI package igraph1 [Csardi and Nepusz, 2006] (static

representation with useful tools for graph mining)

I free software Gephi2 (interactive software, supportszooming and panning)

1http://igraph.sourceforge.net/

2http://gephi.org

Nathalie Vialaneix | Graph mining 21/30

Page 51: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Outline

A brief introduction to networks/graphs

Visualization

Global characteristics

Numerical characteristics calculation

Nathalie Vialaneix | Graph mining 22/30

Page 52: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Density / TransitivityDensity: Number of edges divided by the number of pairs ofvertices. Is the network densely connected?

Transitivity (sometimes called clustering coefficient): Number oftriangles divided by the number of triplets connected by at leasttwo edges. What is the probability that two people with a commonfriend are also friends?

Examples

Example 1: GOTI density ' 6.2%, transitivity ' 32.9%.

Example 2: NVVI density ' 4.8%, transitivity ' 56.2%;I LCC: density ' 7.2%, transitivity ' 56%.

Example 3: FBI density ' 3.6%, transitivity ' 23.3%.

Nathalie Vialaneix | Graph mining 23/30

Page 53: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Density / TransitivityDensity: Number of edges divided by the number of pairs ofvertices. Is the network densely connected?

Examples

Example 1: GOTI 107 vertices, 352 edges⇒ density = 352

107×106/2 ' 6.2%.

Example 2: NVVI 152 vertices, 551 edges⇒ density ' 4.8%;I largest connected component: 122 vertices, 535 edges⇒

density ' 7.2%.

Example 3: FBI 2235 vertices, 9.0954 × 104 edges⇒ density ' 3.6%.

Transitivity (sometimes called clustering coefficient): Number oftriangles divided by the number of triplets connected by at leasttwo edges. What is the probability that two people with a commonfriend are also friends?

Examples

Example 1: GOTI density ' 6.2%, transitivity ' 32.9%.

Example 2: NVVI density ' 4.8%, transitivity ' 56.2%;I LCC: density ' 7.2%, transitivity ' 56%.

Example 3: FBI density ' 3.6%, transitivity ' 23.3%.

Nathalie Vialaneix | Graph mining 23/30

Page 54: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Density / TransitivityDensity: Number of edges divided by the number of pairs ofvertices. Is the network densely connected?Transitivity (sometimes called clustering coefficient): Number oftriangles divided by the number of triplets connected by at leasttwo edges. What is the probability that two people with a commonfriend are also friends?

Density is equal to 44×3/2 = 2/3 ; Transitivity is equal to 1/3.

Examples

Example 1: GOTI density ' 6.2%, transitivity ' 32.9%.

Example 2: NVVI density ' 4.8%, transitivity ' 56.2%;I LCC: density ' 7.2%, transitivity ' 56%.

Example 3: FBI density ' 3.6%, transitivity ' 23.3%.

Nathalie Vialaneix | Graph mining 23/30

Page 55: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Density / TransitivityDensity: Number of edges divided by the number of pairs ofvertices. Is the network densely connected?Transitivity (sometimes called clustering coefficient): Number oftriangles divided by the number of triplets connected by at leasttwo edges. What is the probability that two people with a commonfriend are also friends?

Examples

Example 1: GOTI density ' 6.2%, transitivity ' 32.9%.

Example 2: NVVI density ' 4.8%, transitivity ' 56.2%;I LCC: density ' 7.2%, transitivity ' 56%.

Example 3: FBI density ' 3.6%, transitivity ' 23.3%.

Nathalie Vialaneix | Graph mining 23/30

Page 56: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Density / TransitivityDensity: Number of edges divided by the number of pairs ofvertices. Is the network densely connected?Transitivity (sometimes called clustering coefficient): Number oftriangles divided by the number of triplets connected by at leasttwo edges. What is the probability that two people with a commonfriend are also friends?

Examples

Example 1: GOTI density ' 6.2%, transitivity ' 32.9%.

Example 2: NVVI density ' 4.8%, transitivity ' 56.2%;I LCC: density ' 7.2%, transitivity ' 56%.

Example 3: FBI density ' 3.6%, transitivity ' 23.3%.

Nathalie Vialaneix | Graph mining 23/30

Page 57: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Density / TransitivityDensity: Number of edges divided by the number of pairs ofvertices. Is the network densely connected?Transitivity (sometimes called clustering coefficient): Number oftriangles divided by the number of triplets connected by at leasttwo edges. What is the probability that two people with a commonfriend are also friends?

Examples

Example 1: GOTI density ' 6.2%, transitivity ' 32.9%.

Example 2: NVVI density ' 4.8%, transitivity ' 56.2%;I LCC: density ' 7.2%, transitivity ' 56%.

Example 3: FBI density ' 3.6%, transitivity ' 23.3%.

Nathalie Vialaneix | Graph mining 23/30

Page 58: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Density / TransitivityDensity: Number of edges divided by the number of pairs ofvertices. Is the network densely connected?Transitivity (sometimes called clustering coefficient): Number oftriangles divided by the number of triplets connected by at leasttwo edges. What is the probability that two people with a commonfriend are also friends?

Examples

Example 1: GOTI density ' 6.2%, transitivity ' 32.9%.

Example 2: NVVI density ' 4.8%, transitivity ' 56.2%;I LCC: density ' 7.2%, transitivity ' 56%.

Example 3: FBI density ' 3.6%, transitivity ' 23.3%.

Nathalie Vialaneix | Graph mining 23/30

Page 59: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Diameter and 6 degrees of separationDiameter (of a connected graph): length of the longest shortestpath between two vertices in the graph.

0e+00

5e+05

1e+06

2 4 6

shortest path length

coun

t

shortest path length distribution − FB

Nathalie Vialaneix | Graph mining 24/30

Page 60: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Diameter and 6 degrees of separationDiameter (of a connected graph): length of the longest shortestpath between two vertices in the graph.

The diameter of this graph is 2.

0e+00

5e+05

1e+06

2 4 6

shortest path length

coun

t

shortest path length distribution − FB

Nathalie Vialaneix | Graph mining 24/30

Page 61: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Diameter and 6 degrees of separationDiameter (of a connected graph): length of the longest shortestpath between two vertices in the graph.

The diameter of this graph is 2.

0e+00

5e+05

1e+06

2 4 6

shortest path length

coun

t

shortest path length distribution − FB

Nathalie Vialaneix | Graph mining 24/30

Page 62: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Diameter and 6 degrees of separationDiameter (of a connected graph): length of the longest shortestpath between two vertices in the graph.

The diameter of this graph is 2.

0e+00

5e+05

1e+06

2 4 6

shortest path length

coun

t

shortest path length distribution − FB

Nathalie Vialaneix | Graph mining 24/30

Page 63: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Diameter and 6 degrees of separationDiameter (of a connected graph): length of the longest shortestpath between two vertices in the graph.

The diameter of this graph is 2.

0e+00

5e+05

1e+06

2 4 6

shortest path length

coun

t

shortest path length distribution − FB

Nathalie Vialaneix | Graph mining 24/30

Page 64: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Diameter and 6 degrees of separationDiameter (of a connected graph): length of the longest shortestpath between two vertices in the graph.

The diameter of this graph is 2.

0e+00

5e+05

1e+06

2 4 6

shortest path length

coun

t

shortest path length distribution − FB

Nathalie Vialaneix | Graph mining 24/30

Page 65: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Diameter and 6 degrees of separationDiameter (of a connected graph): length of the longest shortestpath between two vertices in the graph.

The diameter of this graph is 2.

0e+00

5e+05

1e+06

2 4 6

shortest path length

coun

t

shortest path length distribution − FB

Nathalie Vialaneix | Graph mining 24/30

Page 66: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Diameter and 6 degrees of separationDiameter (of a connected graph): length of the longest shortestpath between two vertices in the graph.

The diameter of this graph is 2.

0e+00

5e+05

1e+06

2 4 6

shortest path length

coun

t

shortest path length distribution − FB

Nathalie Vialaneix | Graph mining 24/30

Page 67: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Diameter and 6 degrees of separationDiameter (of a connected graph): length of the longest shortestpath between two vertices in the graph.

6 degrees of separation

From a volume of novels by Frigyes Karinthy: all living things andeverything else in the world is six or fewer steps away from eachother (i.e., a chain of “a friend of a friend” statements can be madeto connect any two people in a maximum of six steps).Hypothesis tested by Milgram with a letter chain (1967): ' 2 − 10intermediates to reach a target person from any starting people(known as the “small world experiment”).

0e+00

5e+05

1e+06

2 4 6

shortest path length

coun

t

shortest path length distribution − FB

Nathalie Vialaneix | Graph mining 24/30

Page 68: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Diameter and 6 degrees of separationDiameter (of a connected graph): length of the longest shortestpath between two vertices in the graph.

Examples of diameter

Example 1: GOT diameter = 6 (unweighted) and 85 (weighted)Example 2: NVV diameter in LCC: 18Example 3: FB diameter = 7

0e+00

5e+05

1e+06

2 4 6

shortest path length

coun

t

shortest path length distribution − FB

Nathalie Vialaneix | Graph mining 24/30

Page 69: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Diameter and 6 degrees of separationDiameter (of a connected graph): length of the longest shortestpath between two vertices in the graph.

0

500

1000

1500

2000

2 4 6

shortest path length

coun

t

shortest path length distribution − GOT (unweighted)

0e+00

5e+05

1e+06

2 4 6

shortest path length

coun

t

shortest path length distribution − FB

Nathalie Vialaneix | Graph mining 24/30

Page 70: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Diameter and 6 degrees of separationDiameter (of a connected graph): length of the longest shortestpath between two vertices in the graph.

0

100

200

300

0 20 40 60 80

shortest path length

coun

t

shortest path length distribution − GOT (weighted)

0e+00

5e+05

1e+06

2 4 6

shortest path length

coun

t

shortest path length distribution − FB

Nathalie Vialaneix | Graph mining 24/30

Page 71: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Diameter and 6 degrees of separationDiameter (of a connected graph): length of the longest shortestpath between two vertices in the graph.

0

250

500

750

1000

1250

0 5 10 15

shortest path length

coun

t

shortest path length distribution − NVV

0e+00

5e+05

1e+06

2 4 6

shortest path length

coun

t

shortest path length distribution − FB

Nathalie Vialaneix | Graph mining 24/30

Page 72: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Diameter and 6 degrees of separationDiameter (of a connected graph): length of the longest shortestpath between two vertices in the graph.

0e+00

5e+05

1e+06

2 4 6

shortest path length

coun

t

shortest path length distribution − FB

Nathalie Vialaneix | Graph mining 24/30

Page 73: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

girth, cohesion

I girth: number of vertices in the shortest circle (equal to 3 iftransitivity is not equal to 0)

I cohesion (of a connected graph): minimum number of verticesto remove in order to disconnect the graph

Examples

Example 1: GOT girth = 3 and cohesion = 1Example 2: NVV girth = 3 and cohesion = 1Example 3: FB girth = 3 and cohesion = 1

Nathalie Vialaneix | Graph mining 25/30

Page 74: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

girth, cohesion

I girth: number of vertices in the shortest circle (equal to 3 iftransitivity is not equal to 0)

I cohesion (of a connected graph): minimum number of verticesto remove in order to disconnect the graph

Examples

Example 1: GOT girth = 3 and cohesion = 1Example 2: NVV girth = 3 and cohesion = 1Example 3: FB girth = 3 and cohesion = 1

Nathalie Vialaneix | Graph mining 25/30

Page 75: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Outline

A brief introduction to networks/graphs

Visualization

Global characteristics

Numerical characteristics calculation

Nathalie Vialaneix | Graph mining 26/30

Page 76: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Extracting important vertices: hubsvertex degree: number of edges adjacent to the vertex∣∣∣{xj : (xi , xj) ∈ E, j , i}

∣∣∣. The weighted version∑

j,i Wij is called thestrength.Vertices with a high degree are called hubs: measure of the vertexpopularity.

degrees

JaimeTyrion

Tywin

Jon

RobbSansa

(unweighted)

degreesV222 456

V1423 467V1700 467

(degree larger than 400)

Nathalie Vialaneix | Graph mining 27/30

Page 77: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Extracting important vertices: hubsvertex degree: number of edges adjacent to the vertex∣∣∣{xj : (xi , xj) ∈ E, j , i}

∣∣∣. The weighted version∑

j,i Wij is called thestrength.Vertices with a high degree are called hubs: measure of the vertexpopularity.

degreesJaime 24Tyrion 36Tywin 22

Jon 26Robb 25

Sansa 26

(degree larger than 20)

degreesV222 456

V1423 467V1700 467

(degree larger than 400)

Nathalie Vialaneix | Graph mining 27/30

Page 78: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Extracting important vertices: hubsvertex degree: number of edges adjacent to the vertex∣∣∣{xj : (xi , xj) ∈ E, j , i}

∣∣∣. The weighted version∑

j,i Wij is called thestrength.Vertices with a high degree are called hubs: measure of the vertexpopularity.

strength

Tyrion

Jon

(weighted)

degreesV222 456

V1423 467V1700 467

(degree larger than 400)

Nathalie Vialaneix | Graph mining 27/30

Page 79: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Extracting important vertices: hubsvertex degree: number of edges adjacent to the vertex∣∣∣{xj : (xi , xj) ∈ E, j , i}

∣∣∣. The weighted version∑

j,i Wij is called thestrength.Vertices with a high degree are called hubs: measure of the vertexpopularity.

S.L

M.P

V.G

N.E

Hubs (degree larger than 25) are two students who have been keptback one year at school.

degreesV222 456

V1423 467V1700 467

(degree larger than 400)

Nathalie Vialaneix | Graph mining 27/30

Page 80: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Extracting important vertices: hubsvertex degree: number of edges adjacent to the vertex∣∣∣{xj : (xi , xj) ∈ E, j , i}

∣∣∣. The weighted version∑

j,i Wij is called thestrength.Vertices with a high degree are called hubs: measure of the vertexpopularity.

degreesS.L 31

M.P 29V.G 27N.E 27

(degree larger than 25)

degreesV222 456

V1423 467V1700 467

(degree larger than 400)

Nathalie Vialaneix | Graph mining 27/30

Page 81: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Extracting important vertices: hubsvertex degree: number of edges adjacent to the vertex∣∣∣{xj : (xi , xj) ∈ E, j , i}

∣∣∣. The weighted version∑

j,i Wij is called thestrength.Vertices with a high degree are called hubs: measure of the vertexpopularity.

V222V1423

V1700

degreesV222 456

V1423 467V1700 467

(degree larger than 400)

Nathalie Vialaneix | Graph mining 27/30

Page 82: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Extracting important vertices: hubs

vertex degree: number of edges adjacent to the vertex∣∣∣{xj : (xi , xj) ∈ E, j , i}∣∣∣. The weighted version

∑j,i Wij is called the

strength.Vertices with a high degree are called hubs: measure of the vertexpopularity.

degreesV222 456

V1423 467V1700 467

(degree larger than 400)

Nathalie Vialaneix | Graph mining 27/30

Page 83: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Degree distribution

In real graphs (WWW, social networks...), the degree distribution isoften found to fit a power law: P(degree = k) ∼ k−γ for a γ > 0.

0.000

0.025

0.050

0.075

0.100

0.125

0 10 20 30

degrees

dens

ity

degree distribution − GOT (unweighted)

Nathalie Vialaneix | Graph mining 28/30

Page 84: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Degree distribution

In real graphs (WWW, social networks...), the degree distribution isoften found to fit a power law: P(degree = k) ∼ k−γ for a γ > 0.

0.0000

0.0025

0.0050

0.0075

0.0100

0 200 400

degrees

dens

ity

degree distribution − GOT (weighted)

Nathalie Vialaneix | Graph mining 28/30

Page 85: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Degree distribution

In real graphs (WWW, social networks...), the degree distribution isoften found to fit a power law: P(degree = k) ∼ k−γ for a γ > 0.

0.00

0.02

0.04

0.06

0.08

0 10 20 30

degrees

dens

ity

degree distribution − NVV

Nathalie Vialaneix | Graph mining 28/30

Page 86: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Degree distribution

In real graphs (WWW, social networks...), the degree distribution isoften found to fit a power law: P(degree = k) ∼ k−γ for a γ > 0.

0.000

0.002

0.004

0.006

0 100 200 300 400

degrees

dens

ity

degree distribution − FB

Nathalie Vialaneix | Graph mining 28/30

Page 87: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Degree distribution

In real graphs (WWW, social networks...), the degree distribution isoften found to fit a power law: P(degree = k) ∼ k−γ for a γ > 0.

+

+

++

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

++

+

++

+

+

+

+

++

+

+++

+

+

++++

+

+

+

++

++

+

+

+

+

++

++

+

+

+

+

+

+

+

+

++++

++

+

++

+

+

+

+

+

+

+

++++

++

+

++

+

++

+

+

+

+

+

+

++

+

++

+

+

+

++

+

+

++

+

+

+

+++

++

+

+

+

+

+

+

++

+

+

+

+

++

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+++

+

+

+

+

+

+

+

+

+

+

++

+

+

+

+

+

+

++

++

+

+

+

+

+

+

++

+

+

+

+

+

+

++

+++

+

+

+

+

++

++

+

+

+

+

+

+

+

++

+

+

+

+

+

+

+

+

+

+

+

+

+

++

++

+++

+

+

+

++

+

++++

++

+

+

+

+

++

+

+

++

+

+

+

+

+

++

+

++++

++

++

+

+

++

+

+

+++

+++++

+

++

+

++

+

+

+

+++

+

++

+

++

++

+

++

+

+

++++

++

+

+

++++++

+

+

+

++

+

+++++

+

+++

+

++

+

+++++++

+

+++++

+

++++++++

+

+++++++++

+

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

+

++++++++++

+−7

−6

−5

−4

0 2 4 6

log(k)

log(

P(d

egre

e =

k))

Nathalie Vialaneix | Graph mining 28/30

Page 88: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Extracting important vertices: betweennessvertex betweenness: number of shortest paths between all pairs ofvertices that pass through the vertex. Betweenness is a centralitymeasure indicating which vertices are the most important toconnect the network.

Robert

Tyrion

betweenness degreeV177 30797 253V222 30649 456V340 49127 299

V1173 30778 313V1423 60272 467V1700 37868 467

(betweenness larger than 30,000; hubs had a degree larger than400)

Nathalie Vialaneix | Graph mining 29/30

Page 89: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Extracting important vertices: betweennessvertex betweenness: number of shortest paths between all pairs ofvertices that pass through the vertex. Betweenness is a centralitymeasure indicating which vertices are the most important toconnect the network.

betweenness degreeRobert 1166 18Tyrion 1164 36

(betweenness larger than 1000; hubs had a degree larger than 20)

betweenness degreeV177 30797 253V222 30649 456V340 49127 299

V1173 30778 313V1423 60272 467V1700 37868 467

(betweenness larger than 30,000; hubs had a degree larger than400)

Nathalie Vialaneix | Graph mining 29/30

Page 90: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Extracting important vertices: betweennessvertex betweenness: number of shortest paths between all pairs ofvertices that pass through the vertex. Betweenness is a centralitymeasure indicating which vertices are the most important toconnect the network.

B.M

L.F

Vertices with largest betweenness (larger than 3000) are publicfigures.

betweenness degreeV177 30797 253V222 30649 456V340 49127 299

V1173 30778 313V1423 60272 467V1700 37868 467

(betweenness larger than 30,000; hubs had a degree larger than400)

Nathalie Vialaneix | Graph mining 29/30

Page 91: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Extracting important vertices: betweennessvertex betweenness: number of shortest paths between all pairs ofvertices that pass through the vertex. Betweenness is a centralitymeasure indicating which vertices are the most important toconnect the network.

betweennessB.M 3439L.F 3146

betweenness degreeV177 30797 253V222 30649 456V340 49127 299

V1173 30778 313V1423 60272 467V1700 37868 467

(betweenness larger than 30,000; hubs had a degree larger than400)

Nathalie Vialaneix | Graph mining 29/30

Page 92: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Extracting important vertices: betweennessvertex betweenness: number of shortest paths between all pairs ofvertices that pass through the vertex. Betweenness is a centralitymeasure indicating which vertices are the most important toconnect the network.

V177

V222V340V1173V1423

V1700

betweenness degreeV177 30797 253V222 30649 456V340 49127 299

V1173 30778 313V1423 60272 467V1700 37868 467

(betweenness larger than 30,000; hubs had a degree larger than400)

Nathalie Vialaneix | Graph mining 29/30

Page 93: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Extracting important vertices: betweennessvertex betweenness: number of shortest paths between all pairs ofvertices that pass through the vertex. Betweenness is a centralitymeasure indicating which vertices are the most important toconnect the network.

betweenness degreeV177 30797 253V222 30649 456V340 49127 299

V1173 30778 313V1423 60272 467V1700 37868 467

(betweenness larger than 30,000; hubs had a degree larger than400)

Nathalie Vialaneix | Graph mining 29/30

Page 94: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Other centrality measure: eccentricity and closenessvertex eccentricity: shortest path length from the farthest othervertex in the graph (the smallest eccentricity is the radius)vertex closeness: inverse of the average length of the shortestpaths from this vertex to all the other vertices in the graph:

1∑j,i spl(i,j)

eccentricity closeness

betweenness

Radius

Example 1: GOT radius = 3Example 2: NVV radius in LCC: 9Example 3: FB radius = 4

Nathalie Vialaneix | Graph mining 30/30

Page 95: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Other centrality measure: eccentricity and closeness

vertex eccentricity: shortest path length from the farthest othervertex in the graph (the smallest eccentricity is the radius)vertex closeness: inverse of the average length of the shortestpaths from this vertex to all the other vertices in the graph:

1∑j,i spl(i,j)

Radius

Example 1: GOT radius = 3Example 2: NVV radius in LCC: 9Example 3: FB radius = 4

Nathalie Vialaneix | Graph mining 30/30

Page 96: Graph mining - lesson 1 Introduction to graphs and networks - … · 2020. 5. 7. · A brief introduction to networks/graphs Visualization Global characteristics Numerical characteristics

Beveridge, A. and Shan, J. (2016).Network of thrones.Math Horizons, 23(4):18–22.

Csardi, G. and Nepusz, T. (2006).The igraph software package for complex network research.InterJournal, Complex Systems.

Fruchterman, T. and Reingold, B. (1991).Graph drawing by force-directed placement.Software, Practice and Experience, 21:1129–1164.

Traud, A., Mucha, P., and Porter, M. (2012).Social structure of facebook networks.Physica A, 391(16):4165–4180.

Nathalie Vialaneix | Graph mining 30/30