exploratory social network analysis with pajek
DESCRIPTION
Exploratory social network analysis with pajekTRANSCRIPT
Exploratory Social Network Analysis with Pajek
Fundamentals in Social Network Analysis Fundamentals in Social Network Analysis by Wouter de Nooy, Andres Mrvar andVladimir Batagelj
Slides created byThomas Plotkowiak
26.08.2010
Agenda
1. Fundamentals
2. Attributes and Relations
3. Cohesion
4. Sentiments and Friendship
5. Affiliations5. Affiliations
6. Core - Periphery
1 - Fundamentals1 - Fundamentals
Fundamentals
SociometrySociometrySociometrySociometry studies interpersonal relations. Society is not an
aggregate of individuals and the characteristics (as statisticians
assume) but a structure of interpersonal ties. Therefore, the
individual is not the basic social unit. The social atom consists
of an individual and his or her social, economic, or cultural ties.
Social atoms are linked into groups, and , ultimately, society Social atoms are linked into groups, and , ultimately, society
consists of interrelated groups.
Example of a Sociogram
Choices of twenty-six girls living in one dormitory at a New York state training school.
The girls were asked to choose the girls they liked best as their dining-table partners.
The mainmainmainmain goalgoalgoalgoal of social network analysis is detecting and
interpreting patterns of social ties among actors.
Exploratory Social Network Analysis
It consists of four parts:It consists of four parts:
1. The definition of a network
2. Network manipulation
3. Determination of structural features
4. Visual inspection
Network Definition
A graphgraphgraphgraph is a set of vertices and a set of lines between pairs of
vertices.
A vertexvertexvertexvertex is the smallest unit in a network. In SNA it represents
an actor (girl, organization, country…)an actor (girl, organization, country…)
A linelinelineline is a tie between two vertices in a network. In SNA it can
be any social relation.
A looplooplooploop is a special kind of line, namely, a line that connects a
vertex to itself.
Network Definition II
A directeddirecteddirecteddirected linslinslinslins is called an arcarcarcarc.... Whereas an undirected line is an
edge.
A directeddirecteddirecteddirected graphgraphgraphgraph or digraphdigraphdigraphdigraph contains one or more arcs. An
undirectedundirectedundirectedundirected graphgraphgraphgraph contains no arcs (all of its lines are edgesedgesedgesedges).undirectedundirectedundirectedundirected graphgraphgraphgraph contains no arcs (all of its lines are edgesedgesedgesedges).
A simple undirected graphsimple undirected graphsimple undirected graphsimple undirected graph contains neither multiple edges nor
loops.
A simplesimplesimplesimple directeddirecteddirecteddirected graphgraphgraphgraph contains no multiple arcs.
Network Definition III
A networknetworknetworknetwork consists of a graph and additional information on the
vertices or the lines of the graph.
Application
1. We use the computer program Pajek – Slovenian for spider –
to analyzed and draw social networks. (get it from
http://vlado.fmf.uni-lj.si/pub/networks/)
Number of vertices
Specific vertex and orientation
List of Arcs
Pajek Main Screen
Manipulation
Suppose we want to change reciprocated choices in the dining-
table partners network into edges.
Calculation
Suppose we want to calculate the total number of lines:
Visualization
Automatic Drawing
• Layout by Energy: Move vertices to locations that minimize
the variation in line length. ( Imagine that the lines are springs
pulling vertices together, though never too close)
• Energy Layouts:
– Kamada-Kawai (computationally expensive)– Kamada-Kawai (computationally expensive)
– Fruchtemann Reingold (faster)
• Draw by Hand
Exporting
VRML, MDL, Kinemages
Bitmap
SVG, EPS
2 - Attributes and Relations2 - Attributes and Relations
Example – The world system
In 1974,ImmanuelWallerstein introduced the concept of a
capitalist world systemcapitalist world systemcapitalist world systemcapitalist world system, which came into existence in the
sixteenth century. This system is characterized by a world economy that is stratified into a core, a semiperiphery, and a
periphery. Countries owe their wealth or poverty to their
positionin the world economy. The core,Wallerstein argues, positionin the world economy. The core,Wallerstein argues,
exists because it succeeds in exploiting the periphery and, to a lesser extent, the semiperiphery.The semiperiphery profi ts
from being an intermediary between the coreand the
periphery.
�Which countries belong to the core, semiperiphery or
periphery?
The world system network
• Network contains 80 countries with attributes:
• continent
• world system position in 1980
• gross domestic product per capita in U.S. dollars in 1995
• The arcs represent imports (of metal) into one country from
another.another.
Partition
A partitionpar titionpar titionpar tition of a network is a classification or clustering of the
vertices in the network such that each vertex is assigned to
exactly one class or cluster.
Partition Load & Edit
• File > Partition > Read (.clu File)
• File> Partition> Edit
Info on Partition Distribution
• Info > Partition
Application – Draw Partition
• Draw > Draw Partition
Reduction of a Network
• Operations > Extract from Network (select class 6)
To extractextractextractextract a subnetworksubnetworksubnetworksubnetwork from a network, select a subset of its
vertices and all lines that are only incident with the selected
vertices.
Partition – Local View
1. Partitions > Extract Second from First
Global View
1. Operations > Shrink Network
To shrinkshrinkshrinkshrink a network, replace a subset of its vertices by one new
vertex that is incident to all lines that were incident with the
vertices of the subset in the original network.
Contextual View
• Operations > Shrink Network (Don't shrink class 6)
In a contextualcontextualcontextualcontextual view,view,view,view, all classes are shunk except the one in
which you are particularly interested.
Vectors and Coordinates Load & Edit
• File > Vector > Read (.vec File)
• File> Vector > Edit
A vectorvectorvectorvector assigns a numerical value to each vertex in a network.
Info on a Vector
• Info > Vector
Vector � Partition
• Vector > Make Partition > by Truncating (Abs)
• Vector > Make Partition by Intervals > First Threshold and
Step
• Vector > Make Partition by Intervals >Selected Thresholds
Draw Vector & Partition
Global View & Vectors
1. Vector > Shrink Vector (Sum)
Network Analysis and Statistics
• Example: Crosstabulation of two partitions and some
measures of association between the classifications
represented by two partitions.
• Partition > Info > Cramer's , Rajski
• Cramer's V measures the statistical dependence between two
classifications. classifications.
• Rajski's indices measure the degree to which the information in one
classification is preserved in the other classification.
END OF LESSON 1END OF LESSON 1
3 - Cohesion3 - Cohesion
Cohesive Subgroups
Cohesive subgroups: We hypothesize that cohesive subgroups
are the basis for solidarity, shared norms, identity and
collective behavior. Perceived similarity, for instance,
membership of a social group, is expected to promote
interaction. We expect similar people to interact a lot, at least
more often than with dissimilar people. This peonomenon is more often than with dissimilar people. This peonomenon is
called homophily: "Birds of a feather flock together."Birds of a feather flock together."Birds of a feather flock together."Birds of a feather flock together."
Example – Families in Haciendas (1948)
Each arc represents "frequent visits" from one family to another.
Density & Degree I
DensityDensityDensityDensity is the number of lines in a simple network, expressed as
a proportion of the maximum possible number of lines.
A completecompletecompletecomplete networknetworknetworknetwork is a network with maximum density.
The degreedegreedegreedegree of a vertex is the number of lines incident with it.
Density & Degree II
Two vertices are adjacentadjacentadjacentadjacent if they are connected by a line.
The indegreeindegreeindegreeindegree of a vertex is the number of arcs it receives.
The outdegreeoutdegreeoutdegreeoutdegree is the number of arcs it sends.The outdegreeoutdegreeoutdegreeoutdegree is the number of arcs it sends.
To symmetrizesymmetrizesymmetrizesymmetrize a directed network is to replace unilateral and
bidirectional arcs by edges.
Computing Density
• Info > Network > General
Computing Degree
• Net > Transform > Arcs � Edges > All
• Net > Partitions > Degree > {In, Out, All}
Components
A semiwalksemiwalksemiwalksemiwalk from vertex u to vertex v is a sequence of lines such
that the end vertex of one line is the starting vertex of the next
line and the sequence starts at vertex u and end at vertex v.
A walkwalkwalkwalk is a semiwalk with the additional condition that none of
its lines are an arc of which the end vertex is the arc's tailits lines are an arc of which the end vertex is the arc's tail
Note that v5�v3� v4�v5�v3
is also a walk to v3
Paths
A semipathsemipathsemipathsemipath is a semiwalk in which no vertex in between the first
and last vertex of the semiwalk occurs more than once.
A pathpathpathpath is a walk in which no vertex in between the first and last
vertex of the walk occurs more than once.vertex of the walk occurs more than once.
Connectedness
A network is (weakly)(weakly)(weakly)(weakly) connectedconnectedconnectedconnected if each pair of vertices is
connected by a semipath.
A network is stronglystronglystronglystrongly connectedconnectedconnectedconnected if each pair of vertices is
connected by a path.connected by a path.
This network is not connected
because v2 is isolated.
Connected Components
A (weak)(weak)(weak)(weak) componentcomponentcomponentcomponent is a maximal (weakly) connected
subnetwork.
A strongstrongstrongstrong componentcomponentcomponentcomponent is a maximal strongly connected
subnetwork.subnetwork.
v1,v3,v4,v5 are a weak component v3,v4,v5 are a strong component
Example Strong Components
1. Net > Components > {Strong, Weak}
Cliques and Complete Subnetworks
A cliquecliquecliqueclique is a maximal complete subnetwork containing three
vertices or more. (cliques can overlap)
v1,v6,v5 is a clique
v2,v4,v5 is not a clique
v2,v3,v4,v5 is a clique
n-Clique & n-Clan
n-Clique: Is a maximal complete subgraph, in the analyzed graph,
each vertex has maximally the distance n. A Clique is a n-Clique
with n=1.
n-Clan: Ist a maximal complete subgraph, where each vertex has
maximally the distance n in the resulting graph
2-Clique
2-Clan
maximally the distance n in the resulting graph
n-Clans & n-Cliques
1
2
56
4
3
2-Clans: 123,234,345,456,561,612
2-Cliquen: 123,234,345,456,561,612 andandandand 135,246135,246135,246135,246
k-Plexes
56
k-Plex: A k-Plex is a maximal complete subgraph with gs Vertext,
in which each vertex has at least connections with gs-k vertices.
2-Plexe:s 1234, 2345, 3456, 4561, 5612, 6123In general k-Plexes are more robust than Cliques und Clans.
1
2
56
4
3
Overview Subgroups
1 2
4 3
1 2
4 3
1 2
4 3
1 2
4 3
1 2
4 3
2 Components 1 Component
2 2-Clans (341,412)
2 2-Cliques (341,412)
1 Component
1 2-Clans (124)
1 2-Clique (124)
1 Component
1 2-Clan (1234)
1 2-Clique (1234)
1 2-Plex (1234)
1 Component
1 2-Clan (1234)
1 2-Clique (1234)
1 2-Plex (1234)
1 Clique
Overview Groupconcepts
• 1-Clique, 1-Clan und 1-Plex are identical
• A n-Clan is always included in a higher order n-Clique
Component
2-Clique
2-Clan
2-Plex
Clique
Finding Cliques
• Example: We are looking for occurences of triads
• Nets > First Network, Second Network
• Nets > Fragment (1 in 2 ) > Find
The figure shows the hierarchy for the example of overlapping complete triads. There are five complete triads; each of the triads
is represented by a gray vertex. Each triad consits of three vertices.
Finding Social Circles
• Partitions > First, Second
• Partitions > Extract Second from First
We have found three social circles.
k-Cores
• Net > Components > {Strong, Weak}A kkkk----corecorecorecore is a maximal subnetwork in which each vertex has at
least degree k within the subnetwork.
k-Cores
k-cores are nestednestednestednested which means that a vertex in a 3-core is also
part of a 2-core but not all members of a 2-core belong to a 3-
core.
k-Cores Application
• K-cores help to detect cohesive subgroups by removing the
lowes k-cores from the network until the network breaks up
into relatively dense components.
• Net > Partitions > Core >{Input, Output, All}
4 - Sentiments and Friendship4 - Sentiments and Friendship
Balance Theory
Franz Heider (1940): A person (P) feels uncomfortable whe he
ore she disagrees with his ore her friend(O) on a topic (X).
P feels an urge to change this imbalance. He can adjust his
opinion, change his affection for O, or convince himself that O is
not really opposed to X.
Signed Graphs
{O,P,X} form a cycle. All balanced cycles contain an even number
of negative lines or no negative lines at all.
A signedsignedsignedsigned graphgraphgraphgraph is a graph in which each line carries either a
positive or a negative sign.
Signed Graphs with Arcs
A cyclecyclecyclecycle is a closed path.
A semicyclesemicyclesemicyclesemicycle is a closed semipath.
A (semi-)cycle is balancedbalancedbalancedbalanced if it does not contain an uneven
number of negative arcs.
Balanced Networks
A signed graph is balancedbalancedbalancedbalanced if all of its (semi-)cycles are balanced.
A signed graph is balancedbalancedbalancedbalanced if it can be partitioned into two
clusters such that all positive ties are contained within the clusters
and all negative ties are situated between the clusters.
Clusterability = Generalized Balance
A cycle or a semicycle is clusterableclusterableclusterableclusterable if it does not contain exactly
one negative arc.
A signed graph is clusterableclusterableclusterableclusterable if it can be partitioned into clusters
such that all positive ties are contained within clusters and all
negative ties are situated between clusters.
Example – Community in a New England
monastery
Options > Values of Lines > Similaritiies
Young Turks (1), Loyal Opposition (2), Outcasts (3) Interstitial Group (4)
Issues on Clustering
1. An optimization may find several solutions that fit equally well.
It is up to the researcher to select one or present all.
2. There is no guarantee that there is not a better solution than
the found one, unless it is optimal.
3. Different starting options yield different results. (It is hard to
tell the exact number of clusters that will yield the lowest tell the exact number of clusters that will yield the lowest
error score)
4. Negative ars are often tolerated less in a cluster than positive
arcs between clusters.
Calculating Clustering
1. Partition > Create Random Partition (ex. 3 Clusters)
2. Operations > Balance (alpha 0.5)
Development in Time
List of Vertices with their presence
List of Arcs with their presence
Loading and Drawing Networks in Time
• Net > Transform > Generate in Time
• Draw > {Previous, Next}
Network consists of 3 choices, hence the bigger errors.
We see a tendency towards clusterability.
BREAK LESSON 2BREAK LESSON 2
5 - Affiliations5 - Affiliations
Example – Corporate interlocks in Scotland in
the beginning of the twentieth century (1904-5)
A fragment of the Scottish directorates network. Companies are classified according t
oil & mining, railway, engineering...
Directors (grey) and Firms (black)
Two-Mode and One-Mode Networks
In a oneoneoneone modemodemodemode networknetworknetworknetwork, each vertex can be related to each
other vertex.
In a twotwotwotwo----modemodemodemode network,network,network,network, vertices are divided into two sets and
vertices can only be related to vertices in the other set.
• The degree of a firm specifies the number of its multiple
directors, also known as size of an event.size of an event.size of an event.size of an event.
• The degree of a director equals the number of boards he sits
on, also known as the rate of par ticipation of an actor. par ticipation of an actor. par ticipation of an actor. par ticipation of an actor.
• Also note that some measures must be computed Also note that some measures must be computed Also note that some measures must be computed Also note that some measures must be computed
differently for twodifferently for twodifferently for twodifferently for two----node networks.node networks.node networks.node networks.
vertices can only be related to vertices in the other set.
Transforming two-mode networks into one-
mode networks
Whenever two firms share a director in the two-mode network, there is a line between
them in the one-mode network.
Transforming two-mode networks into one-
mode networks II
The events of the two-mode networks are represented by lines and loops in the one-
mode network of actors. J.S.T ait meets W. Sanderson in board meetings of two
companies.
Transforming two-mode networks into one-
mode networks III
• Net > Transform > 2-Mode to 1-Mode {Rows,Columns}
• Net > Transform > 2-Mode to 1-Mode > Include Loops,
Multiple Lines
• Info > Network > Line Values
m-Slices
An mmmm----sliceslicesliceslice is a maximal subnetwork containing the lines with a
multiplicity equal to or greater than m and the vertices incident
with these lines.
2-Slice in the network of Scottish firms
Computing m-Slices
• Net > Partitions > Values Core {use max}
• Net > Partitions > Values Core > First Threshold and Step
• Net > Transform > Remove >lines with value > lower than 2
• Operations > Extract from Network > Partition
• Net > Components > Weak• Net > Components > Weak
3-slice
m-slices 3D
• Layers > Type of Layout >
3D
• Layers > In z direction
• Options> Scroll Bar > On
Drawing m-Slices
1. Layers > Type of Layout > 3D
2. Layers > In z direction
3. Options> Scroll Bar >
6 – Center and Periphery6 – Center and Periphery
//Slides need to be translated
//Input from book 2 needed.
Example – Communication ties within a sawmill
H – Hispanic
E – English
M- Mill
P – Planer section
Y -Yard
Vertex labels indicate the ethnicity and the type of work of each employee, for example
HP-10 is an Hispanic (H) working in the planer section (P)
Distance
• The larger the number of sources accessible to a person, the
easier it is to obtain information. Social ties constitute a social
capital that may be used to mobilize social resources.
• The simples indicator of centrality is the number of its
neighbors (degree in a simple undirected network)
Degree centrality I
The degreedegreedegreedegree centralitycentralitycentralitycentrality of a vertex is its degree.
DegreeDegreeDegreeDegree centralizationcentralizationcentralizationcentralization ofofofof aaaa networknetworknetworknetwork iiiis the variation in the
degrees of vertices divided by the maximum degree variation
which is possible in a networks of the same size.
Degree Centrality II
1. //TODO missing…
Closeness Centrality
1. Closeness centrality : Eine Person ist dann zentral, wenn sie
bezüglich der Netzwerkrelation sehr nah bei allen anderen
Liegt. Eine solche zentrale Lage steigert die Effizienz, mit der
ein Akteur im Netzwerk agieren kann. Ein solcher Akteur kann
Informationen schnell schnell schnell schnell empfangen und verbreiten.
1
1( )
( , )c i g
i jj
gC n
d n n=
−=∑
Closeness Centrality
1
2
3
4
5
6
7
8 9
10
11
3
11 1( ) 0, 43
23cC n−= =
nininini njnjnjnj dddd
3 1 1
3 2 1
3 4 1
3 5 1
3 6 2
3 7 2
3 8 3
3 9 4
3 10 5
3 11 5
23
nnnn CcCcCcCc
1 0,27
2 0,29
3333 0,400,400,400,40
4 0,45
5 0,45
6 0,45
7 0,45
8 0,45
9 0,37
10 0,27
Achtung: Hier wurde können nur
symmetrische Verbindungen symmetrische Verbindungen symmetrische Verbindungen symmetrische Verbindungen
betrachtet werden und nur nur nur nur
verbundene Netzeverbundene Netzeverbundene Netzeverbundene Netze.
Zentralisierung
1. Zentralisierung =! Zentralität
2. Zentralisierung ist eine strukturelle Eigenschaft der Gruppe
und nicht der relationalen Eigenschaft einzelner Akteure.
3. Index für Zentralisierung: Man berechnet die Differenzen
zwischen der Zentralität des zentralsten Akteurs und der
Zentralität aller Anderen. Man summiert dann diese diff. über Zentralität aller Anderen. Man summiert dann diese diff. über
alle anderen Akteure.
1
( *) ( )
1
g
ii
C n C nC
g=
−=
−
∑
Zentralisierung
1. Dieser weißt nur dann einen hohen Wert auf wenn genau ein Akteur zentral ist, und nicht mehrere Akteure ein Zentrum
bilden.
2. Nur der Vergleich von Daten einer Gruppe zwischen
mehreren Zeitpunkten erlaubt sinnvoll interpretierbare
Aussagen.Aussagen.
Betweenness centrality
1. Betweenness Centrality: Personen (Cutpoints), die zwei die
ansonsten unverbundene Teilpopulationen miteinander
verbinden, sind Akteure mit einer hohen betweennesscentrality. (Annahme: man nutzt nur die kürzesten Verbindungen zur Kommunikation)
2. Indem man für jedes Paar von Akteuren j, k != i unter allen kürzesten Pfaden, die j un k verbinden , den Anteil von Pfaden
bestimmt die über Akteur i laufen. Anschließens müssen diese Anteile über alle Paare j, k != i gemittelt werden.
,
( )
( )( 1)( 2)
jk i jkj k
i j kb i
g n g
C ng g
≠≠=
− −
∑
Betweenness centrality
1. Achtung: Es ist möglich das einige Akteure zwar nicht
erreichbar sind, selbst aber die anderen von sich aus erreichen
können.
1
3
4 6
8 9
10
2
3
5 7
8 9
11
1111 2222 3333 4444 5555 6666 7777 8888 9999 10101010 11111111
0 0 0,37 0,22 0,22 0,22 0,22 0,48 0,37 0 0
Degree Prestige
1. Prestige lässt sich sinnvoll messen als relativer Innengrad dieses
Akteurs (degreedegreedegreedegree prestigeprestigeprestigeprestige) [Wasserman und K. Faust 1994:
202]
( ) / ( 1)P n x g= −
Akteur j Matrix-Eintrag Zeile i, Spalte j
Anzahl Knoten im Netzwerk
( ) / ( 1)d j jP n x g+= −
jn ijx
ijxj ij
i
x x+ =∑
Prestige Beispiel
1
2
3
4
5
6
7
8 9
10
11
3 3( ) 2 / (11 1) 0,2dP n += − =Prestige von Knoten 3:
Generell: Prestige ist unabhängig von der Gruppengröße und sein Wert liegt zwischen
0 und 1 (Stern).