SOCIAL NETWORK ANALYSIS VIA FACTOR GRAPH MODEL
Zi Yang
OUTLINE
Background
Challenge
Unsupervised case 1: Representative user finding
Unsupervised case 2: Community discovery
Experiments
Supervised case: Modeling information diffusion in social network
BACKGROUND
Social network
Example: Digg.com
  A popular social news website for people to discover and share content
  Various types of user behavior: submit, digg, comment, and reply to a comment
  Edges: an edge exists if one user diggs or comments on a story of another
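BACKGROUND
Community discovery
  Modularity property
  Pair-wise constraint:
  $$\exp\left\{\left[A_{i,j} - \frac{k_i k_j}{2m}\right]\delta(y_i, y_j)\right\}$$
  where $A_{i,j}$ is the weight of the edge between $v_i$ and $v_j$, $k_i$ is the degree of $v_i$, $m$ is the total edge weight, and $\delta(y_i, y_j) = 1$ if $y_i = y_j$ and 0 otherwise
Affinity propagation
  Clustering via factor graph model
  Update rules:
  $$r(i,k) \leftarrow s(i,k) - \max_{k' \,\text{s.t.}\, k' \neq k} \{a(i,k') + s(i,k')\}$$
  $$a(i,k) \leftarrow \min\Big\{0,\; r(k,k) + \sum_{i' \,\text{s.t.}\, i' \notin \{i,k\}} \max\{0, r(i',k)\}\Big\}$$
  $$a(k,k) \leftarrow \sum_{i' \,\text{s.t.}\, i' \neq k} \max\{0, r(i',k)\}$$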
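The affinity propagation updates (responsibility r, availability a, similarity s) can be sketched in NumPy as follows. This is a minimal illustration, not the presenter's code: the function name `affinity_propagation`, the damping factor, and the fixed iteration count are assumptions added to keep the example short and stable.

```python
import numpy as np

def affinity_propagation(s, iterations=200, damping=0.5):
    """Cluster with the standard r/a message updates on similarity matrix s.

    s[k, k] holds the preference of point k to be an exemplar.
    Returns, for every point, the index of its chosen exemplar.
    """
    n = s.shape[0]
    r = np.zeros((n, n))
    a = np.zeros((n, n))
    rows = np.arange(n)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        total = a + s
        best = total.argmax(axis=1)
        first = total[rows, best]
        total[rows, best] = -np.inf
        second = total.max(axis=1)
        r_new = s - first[:, None]
        r_new[rows, best] = s[rows, best] - second
        r = damping * r + (1 - damping) * r_new  # damping is an added assumption

        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        pos = np.maximum(r, 0)
        pos[rows, rows] = r[rows, rows]      # keep r(k,k) unclipped
        col = pos.sum(axis=0)
        a_new = col[None, :] - pos           # drop each row's own contribution
        diag = a_new[rows, rows].copy()
        a_new = np.minimum(a_new, 0)
        a_new[rows, rows] = diag             # a(k,k) is not clipped at 0
        a = damping * a + (1 - damping) * a_new
    return (a + r).argmax(axis=1)
```

A typical call builds `s` from negative squared distances and sets the diagonal to a shared preference value; a lower preference tends to produce fewer exemplars.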
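BACKGROUND
Affinity propagation
$$S(c_{1:N}) = \sum_{i=1}^{N} s(i, c_i) + \sum_{k=1}^{N} \delta_k(c_{1:N})$$
where
$$\delta_k(c_{1:N}) = \begin{cases} -\infty, & \text{if } c_k \neq k \text{ but } \exists i: c_i = k \\ 0, & \text{otherwise} \end{cases}$$
Local factor: $s(i, c_i)$
Regional constraint: $\delta_k(c_{1:N})$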
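As a small illustration of how the local factor and the regional constraint combine, the sketch below scores an exemplar assignment: it sums the similarities s(i, c_i) and returns negative infinity when some point is chosen as an exemplar without choosing itself. The function name `ap_objective` and the list-of-lists input format are assumptions made for the example.

```python
import math

def ap_objective(s, c):
    """Score assignment c, where c[i] is the exemplar chosen by point i."""
    n = len(c)
    # Local factors: similarity of each point to its chosen exemplar.
    total = sum(s[i][c[i]] for i in range(n))
    # Regional constraint: if anyone picks k, then k must pick itself.
    for k in range(n):
        chosen_by_someone = any(c[i] == k for i in range(n))
        if chosen_by_someone and c[k] != k:
            return -math.inf
    return total
```

With this scoring, a configuration such as `c = [1, 2, 2]` is rejected because point 1 is used as an exemplar while pointing at point 2.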
CHALLENGES
How to capture the local properties for social network analysis?
Community discovery is a graph clustering problem: how can the edge information be considered directly?
  Homophily
What constraints can be applied to describe the formation and evolution of a community?
REPRESENTATIVE USER FINDING
Problem definition: given a social network $G = (V, E)$ and (optionally) a confidence for each user $v_i$, the objective is to find a pair-wise representativeness on each edge in the network, and to estimate the representative degree of each user in the network, which is denoted by a set of variables $\{y_i\}$ satisfying $y_i \in \{1, \dots, N\}$. In other words, $y_i$ represents the user that $v_i$ mostly trusts (or relies on).
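REPRESENTATIVE USER FINDING
Modeling
Input: the social network, with nodes v1, v2, v3, v4 in the example
Variables: y1, y2, y3, y4, one per node; each variable represents the representative of its node
[Figure: example network with nodes v1 to v4 and the corresponding variables y1 to y4]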
REPRESENTATIVE USER FINDING
Modeling
Node feature function $g_i(y_i)$, one for each variable $y_i$
  Neighbor representative: the case $y_i \in O(i)$
  Self-representative: the case $y_i = i$
  Otherwise: $g_i(y_i) = 0$
  Observation $w_{i,y_i}$: similarity between the node and the variable
  Normalization factor: taken over the neighbors $NB(i)$
[Figure: factor graph with nodes v1 to v4, variables y1 to y4, and node factors g1(y1), g2(y2), g3(y3), g4(y4)]
REPRESENTATIVE USER FINDING
Modeling
Edge feature function $f_{i,j}(y_i, y_j)$, piecewise in two cases
  $y_i = y_j$: the vertices of the edge have the same representative
  $y_i \neq y_j$: the vertices of the edge have different representatives
Undirected edge: bidirected influence
[Figure: the factor graph extended with edge factors f2,1(y2,y1), f2,3(y2,y3), f3,2(y3,y2), f2,4(y2,y4)]
REPRESENTATIVE USER FINDING
Modeling
Regional feature function: a feature function defined on the set of neighboring nodes of $v_k$ and itself
$$h_k\big(y_{I(k) \cup \{k\}}\big) = \begin{cases} 0 & \text{if } y_k = k \text{ and } y_i \neq k \;\; \forall i \in I(k) \\ 1 & \text{otherwise} \end{cases}$$
To avoid "leader without followers"
[Figure: the factor graph extended with regional factors h1(y1,y2), h2(y2,y3,y4), h3(y3,y1), h4(y4,y2)]
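The "leader without followers" rule can be written as a short predicate. The sketch below assumes that I(k) denotes the in-neighbors of user k and that `y[i]` stores the representative chosen by user i; the function name `regional_feature` and the dictionary layout are illustrative only.

```python
def regional_feature(k, y, in_neighbors):
    """Return 0 if k represents itself but no in-neighbor chooses k, else 1."""
    is_leader = y[k] == k
    has_follower = any(y[i] == k for i in in_neighbors[k])
    return 0 if is_leader and not has_follower else 1
```

Because the factor is 0 for a forbidden configuration, any assignment containing a leader without followers gets probability 0 in a product of factors.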
REPRESENTATIVE USER FINDING
Modeling
Objective function
$$\max_{y_1, \dots, y_N} \log P(y_1, \dots, y_N)$$
$$P(y_1, \dots, y_N) = \frac{1}{Z} \prod_{i=1}^{N} g_i \prod_{e_{i,j} \in E} f_{i,j} \prod_{k=1}^{N} h_k = \frac{1}{Z} \prod_{i=1}^{N} g_i(y_i) \prod_{e_{i,j} \in E} f_{i,j}(y_i, y_j) \prod_{k=1}^{N} h_k\big(y_{I(k) \cup \{k\}}\big)$$
Solving: max-sum algorithm
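To make the objective concrete, the sketch below evaluates the unnormalized log-probability (node, edge, and regional factors multiplied together) on a hypothetical four-user network and finds the best assignment by brute-force enumeration. Enumeration stands in for the max-sum algorithm here and only works for tiny graphs. All numbers are assumptions for illustration: the edge weights, the self-representative constant 0.1 in `g`, and the edge-factor values `LAM` and `1 - LAM`.

```python
import itertools
import math

# Hypothetical directed, weighted edges (i, j): user i acted on a story of user j.
weights = {(0, 1): 3.0, (2, 1): 2.0, (2, 3): 1.0, (3, 1): 1.0}
n = 4
out_nb = {i: {j for (a, j) in weights if a == i} for i in range(n)}
in_nb = {i: {a for (a, j) in weights if j == i} for i in range(n)}
LAM = 0.6  # assumed edge-factor value when both endpoints share a representative

def g(i, yi):
    # Assumed node factor: normalized edge weight for an out-neighbor,
    # a small constant for self-representation, 0 otherwise.
    if yi in out_nb[i]:
        return weights[(i, yi)] / sum(weights[(i, j)] for j in out_nb[i])
    if yi == i:
        return 0.1
    return 0.0

def f(yi, yj):
    # Assumed edge factor: favors neighbors sharing a representative.
    return LAM if yi == yj else 1.0 - LAM

def h(k, y):
    # Regional factor: forbid a leader that nobody follows.
    leader = y[k] == k
    followed = any(y[i] == k for i in in_nb[k])
    return 0.0 if leader and not followed else 1.0

def log_score(y):
    """Unnormalized log P(y); the constant log Z is dropped."""
    factors = [g(i, y[i]) for i in range(n)]
    factors += [f(y[i], y[j]) for (i, j) in weights]
    factors += [h(k, y) for k in range(n)]
    if any(v == 0.0 for v in factors):
        return -math.inf
    return sum(math.log(v) for v in factors)

best = max(itertools.product(range(n), repeat=n), key=log_score)
```

Dropping Z is safe for maximization because it does not depend on y. On this toy input every user ends up represented by user 1, the only user with followers on all incoming edges.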
REPRESENTATIVE USER FINDING
Model learning
[Equations: max-sum update rules for the messages $a_{ij}$, $r_{ij}$, $p_{ijk}$, and $c_{ijk}$, taken over the neighbor sets $I(i) \cup O(i)$]
REPRESENTATIVE USER FINDING
A brief explanation
$p_{ijk}$: how likely user $v_i$ is to persuade $v_j$ to take $v_k$ as his representative
$c_{ijk}$: how likely user $v_i$ is to comply with the suggestion from $v_j$ that he consider $v_k$ as his representative
The direction of this process: along the directed edges
[Figure: three diagrams on nodes v1, v2, v3]
REPRESENTATIVE USER FINDING
Algorithm
COMMUNITY DISCOVERY
Problem definition: given a social network $G$ and an expected number of communities $C$, a virtual node $u_c \in U$ is introduced for each community, and the objective is to find a community $y_i$ for each person $v_i$ satisfying $y_i \in \{1, \dots, C\}$, which represents the community that $v_i$ belongs to, so as to maximize the preservation of structure (or maximize the modularity $Q$ of the community).
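COMMUNITY DISCOVERY
Feature definition: what's different?
Node feature function $g_i(y_i)$: an exponential of a sum over the neighbors $j \in I(i) \cup O(i)$, involving $y_j$, $y_i$, and $|X|$
Edge feature function
$$f_{i,j}(y_i, y_j) = \exp\{q_{i,j}\} = \exp\left\{\left[A_{i,j} - \frac{k_i k_j}{2m}\right]\delta(y_i, y_j)\right\}$$
with $A_{i,j}$ the edge weight, $k_i$ the degree of $v_i$, $m$ the total edge weight, and $\delta$ the indicator of $y_i = y_j$
[Figure: factor graph with nodes v1 to v4, variables y1 to y4, virtual community nodes u1 and u2, node factors g1(y1) to g4(y4), and edge factors f1,3(y1,y3), f2,1(y2,y1), f2,3(y2,y3), f2,4(y2,y4), f3,2(y3,y2)]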
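The modularity-based edge term can be checked numerically. The sketch below computes the standard Newman pair term, (A_ij - k_i k_j / 2m) when both nodes share a community and 0 otherwise, and sums it into the modularity Q. The function names and the dense adjacency-matrix input are assumptions; an edge factor would then be `math.exp` of each pair term.

```python
import math

def modularity_edge_terms(adj, labels):
    """Return the pair terms q[(i, j)] and the normalizer 2m."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    q = {}
    for i in range(n):
        for j in range(n):
            same = 1.0 if labels[i] == labels[j] else 0.0
            q[(i, j)] = (adj[i][j] - degree[i] * degree[j] / two_m) * same
    return q, two_m

def modularity(adj, labels):
    """Q = (1 / 2m) * sum of the pair terms over all ordered node pairs."""
    q, two_m = modularity_edge_terms(adj, labels)
    return sum(q.values()) / two_m

def edge_factor(q_ij):
    """Exponentiated pair term, so a product of factors maximizes the sum."""
    return math.exp(q_ij)
```

If the factor is evaluated over all ordered node pairs, the log of the product of factors equals 2m times Q, so maximizing that product and maximizing modularity select the same labeling.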
COMMUNITY DISCOVERY
Algorithm
Result output and Variable updates
Experiments
Dataset: Digg.com
  A popular social news website for people to discover and share content
  9,583 users, 56,440 contacts
  Various types of user behavior: submit, digg, comment, and reply to a comment
  Edges (in total: 308,362): an edge exists if one user diggs or comments on a story of another
  Weight of an edge: the total number of diggs and comments
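The edge construction described for the dataset can be sketched as a simple count. The record layout `(actor, story_owner, action)`, the action names, and the choice to skip a user's actions on his own stories are assumptions for illustration; the actual Digg data format is not given here.

```python
from collections import Counter

def build_weighted_edges(interactions):
    """Count diggs and comments from each actor on stories of each owner.

    interactions: iterable of (actor, story_owner, action) tuples (assumed layout).
    Returns a Counter mapping (actor, story_owner) to the edge weight.
    """
    weights = Counter()
    for actor, owner, action in interactions:
        # Only diggs and comments create or strengthen an edge.
        if action in {"digg", "comment"} and actor != owner:
            weights[(actor, owner)] += 1
    return weights
```

The resulting counter has one key per directed edge, so its length gives the number of edges and its values give the weights.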
Experiments
Dataset: Digg.com
  9,583 users, 56,440 contacts, 308,362 edges
  Weight of an edge: the total number of diggs and comments
Settings: parameter 0.6
Experiments
Result: 3 most self-representative users on 3 different topics for Digg user network
Experiments
Result: 3 most representative users of 5 communities on 3 different subsets
Experiments
Result: Representative network on a subgraph in the Digg-2 network
[Figure: representative network over the users pyrates, mikek814, rocr69, 1nfiniteLoop, pavelmah, GordonFree, maxthreepwood, upick, ritubpant, wonderwal, mklopez, Omek, SirPopper, irfanmp, numberneal, mpind176, louiebaur, zohaibusman, and optimusprime01; edge labels range from 0.0000 to 0.0024]
Modeling information diffusion in social network
Supervised model
  Bridging the actual value (label) with the variable
  More variables to come?
  Learning the weights
Thanks