course on social network analysis' - oxford statisticssnijders/blockmodels.pdf · analysis...
TRANSCRIPT
'
&
$
%
Course onSocial NetworkAnalysisBlockmodeling
Anuska Ferligoj,Vladimir Batagelj,Andrej Mrvar
University of Ljubljana
Slovenia
Padova, April 10-11, 2003
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 1'
&
$
%
Outline1 Introduction to Blockmodeling. . . . . . . . . . . . . . . . . . 1
2 Blockmodeling as a clustering problem. . . . . . . . . . . . . 2
12 Example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12
20 Pre-specified blockmodeling. . . . . . . . . . . . . . . . . . . 20
28 Final Remarks. . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 1'
&
$
%
Introduction to Blockmodeling
Pajek - shadow 0.00,1.00 Sep- 5-1998World trade - alphabetic order
afg alb alg arg aus aut bel bol bra brm bul bur cam can car cha chd chi col con cos cub cyp cze dah den dom ecu ege egy els eth fin fra gab gha gre gua gui hai hon hun ice ind ins ire irn irq isr ita ivo jam jap jor ken kmr kod kor kuw lao leb lib liy lux maa mat mex mla mli mon mor nau nep net nic nig nir nor nze pak pan par per phi pol por rum rwa saf sau sen sie som spa sri sud swe swi syr tai tha tog tri tun tur uga uki upv uru usa usr ven vnd vnr wge yem yug zai
afg
a
lb
alg
a
rg
aus
a
ut
bel
b
ol
bra
b
rm
bul
b
ur
cam
c
an
car
c
ha
chd
c
hi
col
c
on
cos
c
ub
cyp
c
ze
dah
d
en
dom
e
cu
ege
e
gy
els
e
th
fin
fra
gab
g
ha
gre
g
ua
gui
h
ai
hon
h
un
ice
ind
ins
ire
irn
irq
isr
ita
ivo
jam
ja
p
jo
r
k
en
km
r
k
od
kor
k
uw
lao
leb
lib
liy
lux
maa
m
at
mex
m
la
mli
mon
m
or
nau
n
ep
net
n
ic
nig
n
ir
n
or
nze
p
ak
pan
p
ar
per
p
hi
pol
p
or
rum
r
wa
saf
s
au
sen
s
ie
som
s
pa
sri
sud
s
we
sw
i
s
yr
tai
tha
tog
tri
tun
tur
uga
u
ki
upv
u
ru
usa
u
sr
ven
v
nd
vnr
w
ge
yem
y
ug
zai
Pajek - shadow 0.00,1.00 Sep- 5-1998World Trade (Snyder and Kick, 1979) - cores
uki net bel lux fra ita den jap usa can bra arg ire swi spa por wge ege pol aus hun cze yug gre bul rum usr fin swe nor irn tur irq egy leb cha ind pak aut cub mex uru nig ken saf mor sud syr isr sau kuw sri tha mla gua hon els nic cos pan col ven ecu per chi tai kor vnr phi ins nze mli sen nir ivo upv gha cam gab maa alg hai dom jam tri bol par mat alb cyp ice dah nau gui lib sie tog car chd con zai uga bur rwa som eth tun liy jor yem afg mon kod brm nep kmr lao vnd
uki
n
et
bel
lu
x
fr
a
it
a
d
en
jap
usa
c
an
bra
a
rg
ire
sw
i
s
pa
por
w
ge
ege
p
ol
aus
h
un
cze
y
ug
gre
b
ul
rum
u
sr
fin
sw
e
n
or
irn
tur
irq
egy
le
b
c
ha
ind
pak
a
ut
cub
m
ex
uru
n
ig
ken
s
af
mor
s
ud
syr
is
r
s
au
kuw
s
ri
th
a
m
la
gua
h
on
els
n
ic
cos
p
an
col
v
en
ecu
p
er
chi
ta
i
k
or
vnr
p
hi
ins
nze
m
li
s
en
nir
ivo
upv
g
ha
cam
g
ab
maa
a
lg
hai
d
om
jam
tr
i
b
ol
par
m
at
alb
c
yp
ice
dah
n
au
gui
li
b
s
ie
tog
car
c
hd
con
z
ai
uga
b
ur
rw
a
s
om
eth
tu
n
li
y
jo
r
y
em
afg
m
on
kod
b
rm
nep
k
mr
lao
vnd
Alphabetic order of countries (left) and rearrangement (right)
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 2'
&
$
%
Blockmodeling as a clustering problemThe goal ofblockmodelingis to
reduce a large, potentially in-
coherent network to a smaller
comprehensible structure that
can be interpreted more readily.
Blockmodeling, as an empiri-
cal procedure, is based on the
idea that units in a network can
be grouped according to the ex-
tent to which they are equiva-
lent, according to somemean-
ingful definition of equivalence.
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 3'
&
$
%
Cluster, clustering, blocks
One of the main procedural goals of blockmodeling is to identify, in a given
networkN = (U, R), R ⊆ U ×U, clusters(classes) of units that share
structural characteristics defined in terms ofR. The units within a cluster
have the same or similar connection patterns to other units. They form a
clusteringC = {C1, C2, . . . , Ck} which is apartition of the setU. Each
partition determines an equivalence relation (and vice versa). Let us denote
by∼ the relation determined by partitionC.
A clusteringC partitions also the relationR into blocks
R(Ci, Cj) = R ∩ Ci × Cj
Each such block consists of units belonging to clustersCi andCj and all
arcs leading from clusterCi to clusterCj . If i = j, a blockR(Ci, Ci) is
called adiagonalblock.
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 4'
&
$
%
Equivalences
Regardless of the definition of equivalence used, there are two basic
approaches to the equivalence of units in a given network (Faust, 1988):
• the equivalent units have the same connection pattern to thesameneighbors;
• the equivalent units have the same or similar connection pattern to
(possibly)different neighbors.
The first type of equivalence is formalized by the notion ofstructural
equivalence (Lorrain and White, 1971) and the second by the notion of
regular equivalence (White and Reitz, 1983) with the latter a generalization
of the former.
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 5'
&
$
%
Structural Equivalence
Units arestructurally equivalentif they are connected to the rest of the
network in identical ways. From the definition it follows that there are four
possible ideal blocks:
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 6'
&
$
%
Regular Equivalence
Intuitively, two units areregularly equivalentif they are equally connected
to equivalent others.
Batagelj, Doreian, and Ferligoj (1992) proved that regular equivalence
produces two types of blocks: null blocks and 1-covered blocks:
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 7'
&
$
%
Generalized equivalence / Block Types
Y1 1 1 1 1
X 1 1 1 1 11 1 1 1 11 1 1 1 1complete
Y0 1 0 0 0
X 1 1 1 1 10 0 0 0 00 0 0 1 0
row-dominant
Y0 0 1 0 0
X 0 0 1 1 01 1 1 0 00 0 1 0 1
col-dominant
Y0 1 0 0 0
X 1 0 1 1 00 0 1 0 11 1 0 0 0
regular
Y0 1 0 0 0
X 0 1 1 0 01 0 1 0 00 1 0 0 1row-regular
Y0 1 0 1 0
X 1 0 1 0 01 1 0 1 10 0 0 0 0col-regular
Y0 0 0 0 0
X 0 0 0 0 00 0 0 0 00 0 0 0 0
null
Y0 0 0 1 0
X 0 0 1 0 01 0 0 0 00 0 0 1 0
row-functional
Y1 0 0 00 1 0 0
X 0 0 1 00 0 0 00 0 0 1
col-functional
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 8'
&
$
%
Establishing Blockmodels
The problem of establishing a partition of units in a network in terms of a
selected type of equivalence is a special case ofclustering problemthat can
be formulated as an optimization problem(Φ, P ) as follows:
Determine the clusteringC? ∈ Φ for which
P (C?) = minC∈Φ
P (C)
whereΦ is the set offeasible clusteringsandP is acriterion function.
Since the set of unitsU is finite, the set of feasible clusterings is also
finite. Therefore the setMin(Φ, P ) of all solutions of the problem (optimal
clusterings) is not empty.
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 9'
&
$
%
Criterion function
Criterion functions can be constructed
• indirectlyas a function of a compatible (dis)similarity measure between
pairs of units, or
• directly as a function measuring the fit of a clustering to an ideal
one with perfect relations within each cluster and between clusters
according to the considered types of connections (equivalence).
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 10'
&
$
%
Indirect Approach
JJJ
�?
?
R
Q
D
hierarchical algorithms,
relocation algorithm, leader algorithm
RELATION
DESCRIPTIONS
OF UNITS
original relation
path matrix
triadsorbits
DISSIMILARITY
MATRIX
STANDARD
CLUSTERING
ALGORITHMS
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 11'
&
$
%
Dissimilarities
The dissimilarity measured is compatiblewith a considered equivalence∼if for each pair of units holds
Xi ∼ Xj ⇔ d(Xi, Xj) = 0
Not all dissimilarity measures typically used are compatible with structuralequivalence. For example, thecorrected Euclidean-like dissimilarity
d(Xi, Xj) =
=
√√√√(rii − rjj)2 + (rij − rji)2 +
n∑s=1
s 6=i,j
((ris − rjs)2 + (rsi − rsj)2)
is compatible with structural equivalence.
The indirect clustering approach does not seem suitable for establishing
clusterings in terms of regular equivalence since there is no evident way
how to construct a compatible (dis)similarity measure.
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 12'
&
$
%
ExampleThe analyzed network consists of social support exchange relation among
fifteen students of the Social Science Informatics fourth year class
(2002/2003) at the Faculty of Social Sciences, University of Ljubljana.
Interviews were conducted in October 2002.
Support relation among students was identified by the following question:
Introduction: You have done several exams since you are in the
second class now. Students usually borrow studying material from
their colleagues.
Enumerate (list) the names of your colleagues that you have most
often borrowed studying material from. (The number of listed
persons is not limited.)
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 13'
&
$
%
Class network
b02
b03
g07
g09
g10
g12
g22 g24
g28
g42
b51
g63
b85
b89
b96
class.net
Vertices represent students
in the class; circles – girls,
squares – boys. Opposite
pairs of arcs are replaced by
edges.
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 14'
&
$
%
Indirect approachPajek - Ward [0.00,7.81]
b51
b89
b02
b96
b03
b85
g10
g24
g09
g63
g12
g07
g28
g22
g42
Using Corrected Euclidean-like
dissimilarity and Ward clustering
method we obtain the following
dendrogram.
From it we can determine the num-
ber of clusters: ’Natural’ cluster-
ings correspond to clear ’jumps’ in
the dendrogram.
If we select 3 clusters we get the
partitionC.
C = {{b51, b89, b02, b96, b03, b85, g10, g24},{g09, g63, g12}, {g07, g28, g22, g42}}
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 15'
&
$
%
Partition in 3 clusters
b02
b03
g07
g09
g10
g12
g22 g24
g28
g42
b51
g63
b85
b89
b96
On the picture, ver-
tices in the same
cluster are of the
same color.
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 16'
&
$
%
MatrixPajek - shadow [0.00,1.00]
b02
b03
g10
g24 C1
b51
b85
b89
b96
g07
g22 C2
g28
g42
g09
g12 C3
g63
b02
b03
g10
g24
C
1
b51
b85
b89
b96
g07
g22
C
2
g28
g42
g09
g12
C
3
g63
The partition can be used
also to reorder rows and
columns of the matrix repre-
senting the network. Clus-
ters are divided using blue
vertical and horizontal lines.
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 17'
&
$
%
Direct Approach
The second possibility for solving the blockmodeling problem is to construct
an appropriate criterion function directly and then use a local optimization
algorithm to obtain a ‘good’ clustering solution.
Criterion functionP (C) has to besensitiveto considered equivalence:
P (C) = 0 ⇔ C defines considered equivalence.
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 18'
&
$
%
A Criterion Function
One of the possible ways of constructing a criterion function that directly
reflects the considered equivalence is to measure the fit of a clustering to
an ideal one with perfect relations within each cluster and between clusters
according to the considered equivalence.
Given a clusteringC = {C1, C2, . . . , Ck}, letB(Cu, Cv) denote the set of
all ideal blocks corresponding to blockR(Cu, Cv). Then the global error of
clusteringC can be expressed as
P (C) =∑
Cu,Cv∈C
minB∈B(Cu,Cv)
d(R(Cu, Cv), B)
where the termd(R(Cu, Cv), B) measures the difference (error) between
the blockR(Cu, Cv) and the ideal blockB. The functiond has to be
compatible with the selected type of equivalence.
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 19'
&
$
%
Local Optimization
For solving the blockmodeling problem we use a local optimization
procedure (a relocation algorithm):
Determine the initial clusteringC;
repeat:if in the neighborhood of the current clusteringCthere exists a clusteringC′ such thatP (C′) < P (C)then move to clusteringC′ .
The neighborhood in this local optimization procedure is determined by the
following two transformations:
• movinga unitXk from clusterCp to clusterCq (transition);
• interchangingunits Xu andXv from different clustersCp andCq
(transposition).
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 20'
&
$
%
Pre-specified blockmodelingIn the previous slides the inductive approaches for establishing blockmodels
for a set of social relations defined over a set of units were discussed.
Some form of equivalence is specified and clusterings are sought that are
consistent with a specified equivalence.
Another view of blockmodeling is deductive in the sense of starting with a
blockmodel that is specified in terms of substance prior to an analysis.
In this case given a network, set of types of ideal blocks, and a reducedmodel, a solution (a clustering) can be determined which minimizes thecriterion function.
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 21'
&
$
%
Some Blockmodel Types
1. Cohesive subgroupswith intraposition ties and no ties between
positions:
1 0 0
0 1 0
0 0 1
2. A center-peripherymodel with onecore position (center) which is
internally cohesive and connected with all other positions. The other
positions (periphery) are all connected to the core position and are not
connected to each other:
1 1 1
1 0 0
1 0 0
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 22'
&
$
%
. . . Some Blockmodel Types
3. A centralizedmodel which, a special case of the center-periphery
model, where all ties are from the core position or toward it:
1 1 1
0 0 0
0 0 0
or
1 0 0
1 0 0
1 0 0
4. A hierarchicalmodel with the positions on a single path:
0 1 0
0 0 1
0 0 0
or
0 0 0
1 0 0
0 1 0
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 23'
&
$
%
. . . Some Blockmodel Types
5. A transitivity model similar to hierarchical model with permitted ties
from lower positions to all higher positions:
0 1 1
0 0 1
0 0 0
or
0 0 0
1 0 0
1 1 0
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 24'
&
$
%
Prespecified blockmodeling example
We expect that center-periphery model exists in the network: some students
having good studying material, some not.
Prespecified blockmodel: (com/complete, reg/regular, -/null block)
1 2
1 [com reg] -
2 [com reg] -
Using local optimization we get the partition:
C = {{b02, b03, b51, b85, b89, b96, g09},
{g07, g10, g12, g22, g24, g28, g42, g63}}
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 25'
&
$
%
2 clusters solution
b02
b03
g07
g09
g10
g12
g22 g24
g28
g42
b51
g63
b85
b89
b96
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 26'
&
$
%
ModelPajek - shadow [0.00,1.00]
g07
g10
g12
g22
g24
g28
g42
g63
b85
b02
b03
g09
b51
b89
b96
g07
g10
g12
g22
g24
g28
g42
g63
b85
b02
b03
g09
b51
b89
b96
Image and Error Matrices:
1 2
1 reg -
2 reg -
1 2
1 0 3
2 0 2
Total error = 5
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 27'
&
$
%
Benefits from Optimization Approach
• ordinary/inductive blockmodeling: Given a networkN and set oftypes of connectionT , determine the modelM;
• evaluationof the quality of a model,comparingdifferent models,
analyzing the evolutionof a network(Sampson data, Doreian andMrvar 1996): Given a networkN, a modelM, and blockmodelingµ,compute the corresponding criterion function;
• model fitting/deductive blockmodeling: Given a networkN, set oftypesT , and a family of models, determineµ which minimizes thecriterion function (Batagelj, Ferligoj, Doreian, 1998).
• we can fit the network to apartial modeland analyze the residualafterward;
• we can also introduce differentconstraintson the model, for example:unitsX andY are of the same type; or, types of unitsX andY are notconnected; . . .
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖
A. Ferligoj, V. Batagelj, A. Mrvar: Social Network Analysis / Blockmodeling 28'
&
$
%
Final RemarksThe current, local optimization based, programs for generalized block-
modeling can deal only with networks with at most some hundreds of
units.
What to do with larger networks is an open question. For some specialized
problems also procedures for (very) large networks can be developed
(Doreian, Batagelj, Ferligoj, 1998; Batagelj, Zaversnik, 2002).
Another interesting problem is the development ofblockmodeling of valued
networksor more generalrelational data analysis(Batagelj, Ferligoj, 2000).
The generalized blockmodeling is implemented in Pajek – program for
analysis and visualization of large networks. It is freely available, for
noncommercial use, at:
http://vlado.fmf.uni-lj.si/pub/networks/pajek/
Padova, April 10-11, 2003 ▲ ▲ ❙ ▲ ● ▲ ❙ ▲▲ ☛ ✖